The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on hook failures and hook execution time.
The new data points are:
- nomad.client.alloc_hook.prerun.success (counter)
- nomad.client.alloc_hook.prerun.failed (counter)
- nomad.client.alloc_hook.prerun.elapsed (sample)
- nomad.client.task_hook.prestart.success (counter)
- nomad.client.task_hook.prestart.failed (counter)
- nomad.client.task_hook.prestart.elapsed (sample)
Hook execution time is also useful to Nomad engineering: it will
help us optimize code where possible and understand how job
specifications affect hook performance.
For now, only the PreRun and PreStart hooks have telemetry
enabled, to limit the number of new metrics being produced.
This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.
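As a sketch, an operator who sets `function_denylist` explicitly in the client `template` block would need to list `executeTemplate` alongside the other entries to keep this protection. The example assumes that an explicit list replaces the default denylist rather than extending it, and that `plugin` and `writeToFile` were the previous defaults; check the client configuration docs for your Nomad version.

```hcl
client {
  template {
    # Assumption: an explicit list replaces the defaults, so keep
    # executeTemplate here to retain protection against recursion.
    function_denylist = ["plugin", "writeToFile", "executeTemplate"]
  }
}
```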
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Make the default env.denylist more comprehensive: it now includes additional
token, token file, and license variables.
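For reference, a minimal sketch of setting the denylist through the client `options` map. `MY_SECRET_TOKEN` is a hypothetical variable name, and the sketch assumes that setting `env.denylist` replaces the default list rather than appending to it, so a custom value would also drop the newly added defaults.

```hcl
client {
  options = {
    # Comma-separated environment variable names to withhold from tasks.
    # Assumption: a custom value replaces the default denylist entirely.
    "env.denylist" = "MY_SECRET_TOKEN"
  }
}
```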
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
To help users understand multi-region federated deployments, this
change adds two new sections to the website. The first expands the
architecture section with an initial federation page, leaving room to
add further detail over time. The second adds a federation operations
page that covers failure planning and mitigation.
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
* escaping newlines is not allowed in go-sockaddr templates
* `client {}` block in the client section
* tiny extra clarification that NOMAD_ADDR is an example
The core scheduler relies on a special table in the state store, the TimeTable,
to figure out which objects can be GC'd. The TimeTable correlates Raft indices
with object insertion times, a solution we adopted before most of the objects
in the state store carried timestamps. This added some memory overhead and
complexity, but most importantly it meant that any GC threshold users set
greater than `timeTableLimit = 72 * time.Hour` was ignored. This PR removes
the TimeTable and relies on object timestamps to determine whether objects
can be GC'd.
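With the TimeTable removed, server GC thresholds longer than 72 hours should now take effect as configured. A sketch of such a server configuration; the threshold values are illustrative only.

```hcl
server {
  enabled = true

  # Previously, values above 72h (timeTableLimit) were ignored;
  # with object timestamps they now apply as written.
  job_gc_threshold        = "168h"
  eval_gc_threshold       = "96h"
  deployment_gc_threshold = "168h"
}
```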
* initial content from Daniel's doc
* Add IPv6 support doc to operations section.
* daniel obsessively re-refactors his docs
* Style guide edits
* a few more style nits
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
When using transparent proxy mode with the `connect` block, the UID of the
workload cannot be the same as the UID of the Envoy sidecar (currently 101 in
the default Envoy container image).
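A sketch of a group that uses transparent proxy while running the workload under a UID different from Envoy's. The service name, port, image, and UID 1000 are illustrative assumptions.

```hcl
group "api" {
  network {
    mode = "bridge"
  }

  service {
    name = "api"
    port = "8080"

    connect {
      sidecar_service {
        proxy {
          transparent_proxy {}
        }
      }
    }
  }

  task "api" {
    driver = "docker"

    # Must not match the Envoy sidecar's UID (101 in the default
    # Envoy container image).
    user = "1000"

    config {
      image = "example/api:latest"
    }
  }
}
```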
Fixes: https://github.com/hashicorp/nomad/issues/23508
* docs: warn about Consul auth method locality
The locality of Consul tokens we mint via Workload Identity is governed by the
Consul auth method configuration. By default tokens are local to the Consul
datacenter, which typically maps 1:1 with a Nomad region. Cluster administrators
who need cross-datacenter tokens can get them by setting the locality to global,
at the risk of placement problems if the primary DC isn't available.
Ref: https://github.com/hashicorp/consul/issues/21863
Fixes: https://github.com/hashicorp/nomad/issues/23505
* Docs: Update CLI job tag unset
CLI help order was wrong, so updating the docs.
* change usage to [options]. Move general options into an expandable.
* change "to see" to "for"
* Add language from CLI help to job revert for version|tag
* Add CLI job tag subcommand page
* Add API create and delete tag
Examples use the same names in the CLI and API
* Update CLI revert, tag; API jobs
* Add job version content
* add tag name unique per job to CLI/API; address Phil's feedback
Add a partial explaining why to tag versions, and add it to CLI/API
* Add diff_version to API jobs list job versions
* Apply suggestions from code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* remove tutorial links since they are not published yet.
---------
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories so they are owned by the task's `user`. This is useful for
drivers like raw_exec and exec2, which are subject to the host filesystem's
user permissions. Previously, these drivers might not be able to use or
manage the downloaded artifacts, since they would be owned by the root
user in a typical Nomad client configuration.
* api: no need for pointer of chown field
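A sketch of the new field in a jobspec. The task name, `app` user, artifact URL, and command path are hypothetical; only the `chown` field itself comes from this change.

```hcl
task "fetch" {
  driver = "raw_exec"

  # Hypothetical unprivileged user that should own the artifacts.
  user = "app"

  artifact {
    source      = "https://example.com/releases/app.tar.gz"
    destination = "local/app"

    # Chown downloaded files and directories to the task user.
    chown = true
  }

  config {
    command = "local/app/run.sh"
  }
}
```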
As of #24166, Nomad agents will use their own token to deregister services and
checks from Consul. This returns the deregistration path to the pre-Workload
Identity workflow. Expand the documentation to make clear why certain ACL
policies are required for clients.
Additionally, we did not explicitly call out that auth methods should not set an
expiration on Consul tokens. Nomad does not have a facility to refresh these
tokens if they expire. Even if Nomad could, there's no way to re-inject them
into Envoy sidecars for Consul Service Mesh without recreating the task anyway,
which is what happens today. Warn users that they should not set an expiration.
Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix)
Ref: https://hashicorp.atlassian.net/browse/NET-10262
A few small updates to the recent "Federate access to AWS with Nomad Workload Identity" documentation, most notably that a restart isn't needed because AWS SDKs handle OIDC reauthentication gracefully. Other types of auth cache credentials statically at startup, so nothing but a full restart helps if those credentials expire.
Nomad v1.9.0 (finally!) removes support for HCL1 and the `-hcl1` flag.
See #23912 for details.
One of the uses of HCL1 over HCL2 was that HCL1 allowed quoted keys in
blocks such as env, meta, and Docker's labels:
```hcl
some_block {
  "foo.bar" = "baz"
}
```
This works in HCL1 but is invalid HCL2. In HCL2 you must use a map
instead of a block:
```hcl
some_map = {
  "eggs.spam" = "works!"
}
```
This was such a hassle for users that we special-cased the `env` and `meta`
blocks to be accepted as either blocks or maps in #9936.
However, Docker `labels`, being a task config option, is much harder to
special-case, and it commonly needs dots in keys for things like Datadog
Autodiscovery via Docker container labels:
https://docs.datadoghq.com/containers/docker/integrations/?tab=labels
Luckily `labels` can be specified as a list-of-maps instead:
```hcl
labels = [
  {
    "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
    "com.datadoghq.ad.init_configs" = "[{}]"
  }
]
```
So instead of adding more awkward HCL1/HCL2 backward-compatibility code to
Nomad, I updated the docs to help people hit by this.
The only other known workaround is dropping HCL in favor of JSON
jobspecs altogether, but that forces a huge migration and maintenance
burden on users:
https://discuss.hashicorp.com/t/docker-based-autodiscovery-with-datadog-how-can-we-make-it-work/18870
* TaggedVersion information in structs, rather than job_endpoint (#23841)
* TaggedVersion information in structs, rather than job_endpoint
* Test for taggedVersion description length
* Some API plumbing
* Tag and Untag job versions (#23863)
* Tag and Untag at API level on down, but am I unblocking the wrong thing?
* Code and comment cleanup
* Unset methods generally now I stare long into the namespace abyss
* Namespace passes through with QueryOptions removed from a write requesting struct
* Comment and PR review cleanup
* Version back to VersionStr
* Generally consolidate unset logic into apply for version tagging
* Addressed some PR comments
* Auth check and RPC forwarding
* uint64 instead of pointer for job version after api layer and renamed copy
* job tag command split into apply and unset
* latest-version convenience handling moved to CLI command level
* CLI tests for tagging/untagging
* UI parts removed
* Add to job table when unsetting job tag on latest version
* Vestigial no more
* Compare versions by name and version number with the nomad history command (#23889)
* First pass at passing a tagname and/or diff version to plan/versions requests
* versions API now takes compare_to flags
* Job history command output can have tag names and descriptions
* compare_to to diff-tag and diff-version, plus adding flags to history command
* 0th version now shows a diff if a specific diff target is requested
* Addressing some PR comments
* Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI
* Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon
* Version diff tests
* re-implement JobVersionByTagName
* Test mods and simplification
* Documentation for nomad job history additions
* Prevent pruning and reaping of TaggedVersion jobs (#23983)
Tagged versions should not count against JobTrackedVersions; that is,
inserting new job versions should not evict tagged versions, and GC
should not delete a job if any of its versions are tagged.
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* [ui] Version Tags on the job versions page (#24013)
* Timeline styles and their buttons modernized, and tags added
* styled but not yet functional version blocks
* Rough pass at edit/unedit UX
* Styles consolidated
* better UX around version tag crud, plus adapter and serializers
* Mirage and acceptance tests
* Modify percy to not show time-based things
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* Job revert command and API endpoint can take a string version tag name (#24059)
* Job revert command and API endpoint can take a string version tag name
* RevertOpts as a signature-modified alternative to Revert()
* job revert CLI test
* Version pointers in endpoint tests
* Don't copy over the tag when a job is reverted to a version with a tag
* Convert tag name to version number at CLI level
* Client method for version lookup by tag
* No longer double-declaring client
* [ui] Add tag filter to the job versions page (#24064)
* Rough pass at the UI for version diff dropdown
* Cleanup and diff fetching via adapter method
* TaggedVersion now VersionTag (#24066)
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>