* Add language from CLI help to job revert for version|tag
* Add CLI job tag subcommand page
* Add API create delete tag
Examples use same names between CLI and API
* Update CLI revert, tag; API jobs
* Add job version content
* Add tag name unique per job to CLI/API; address Phil's feedback
Add partial explaining why tag, add to CLI/API
* Add diff_version to API jobs list job versions
* Apply suggestions from code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* Remove tutorial links since they are not published yet.
When the local Consul agent receives a deregister request, it performs a
pre-flight check using the locally cached ACL token. The agent then sends the
request upstream to the Consul servers as part of anti-entropy, using its own
token. This means the token used for the pre-flight check must still be valid,
even though it is not the token ultimately used to write to the Consul server.
There are several cases where the service identity token might no longer exist
at the time of deregistration:
* A race condition between the sync and destroying the allocation.
* Misconfiguration of the Consul auth method with a TTL.
* Out-of-band destruction of the token.
Additionally, Nomad's sync with Consul returns early if there are any errors,
which means that a single broken token can prevent any other service on the
Nomad agent from being registered or deregistered.
Update Nomad's sync with Consul to use the Nomad agent's own Consul token for
deregistration, regardless of which token the service was registered
with. Accumulate errors from the sync so that they no longer block
deregistration of other services.
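The error-accumulation side of this is sketched below with placeholder names (`syncer`, `deregisterService`, and `agentToken` are illustrative, not Nomad's actual sync types), assuming Go 1.20+ for `errors.Join`:

```go
package consulsync

import "errors"

// syncer is a stand-in for the component that reconciles services with Consul.
type syncer struct {
	agentToken        string
	deregisterService func(id, token string) error
}

// deregisterAll collects failures instead of returning on the first one, so a
// single broken registration can't block deregistration of the other services.
func (s *syncer) deregisterAll(ids []string) error {
	var errs error
	for _, id := range ids {
		// always use the agent's own Consul token, not the workload token the
		// service was registered with (which may have expired or been deleted)
		if err := s.deregisterService(id, s.agentToken); err != nil {
			errs = errors.Join(errs, err)
			continue
		}
	}
	return errs
}
```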
Fixes: https://github.com/hashicorp/nomad/issues/20159
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.
* api: no need for pointer of chown field
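In practice, "chown the downloaded files and directories" amounts to a recursive walk like the sketch below (illustrative only, not the client's actual implementation; resolving `task.user` to a uid/gid is omitted):

```go
package artifact

import (
	"io/fs"
	"os"
	"path/filepath"
)

// chownTree hands ownership of every file and directory under root to the
// given uid/gid, so drivers without filesystem isolation (raw_exec, exec2)
// can read and manage the downloaded artifact.
func chownTree(root string, uid, gid int) error {
	return filepath.WalkDir(root, func(path string, _ fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		return os.Chown(path, uid, gid)
	})
}
```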
As of #24166, Nomad agents will use their own token to deregister services and
checks from Consul. This returns the deregistration path to the pre-Workload
Identity workflow. Expand the documentation to make clear why certain ACL
policies are required for clients.
Additionally, we did not explicitly call out that auth methods should not set an
expiration on Consul tokens. Nomad does not have a facility to refresh these
tokens if they expire. Even if Nomad could, there's no way to re-inject them
into Envoy sidecars for Consul Service Mesh without recreating the task anyway,
which is what happens today. Warn users that they should not set an expiration.
Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix)
Ref: https://hashicorp.atlassian.net/browse/NET-10262
While testing with agents built with the race-detection option enabled, I
encountered a data race while draining a node.
When we upsert a node we copy the `NodeResources` struct and then perform a
fixup for backwards compatibility of the topology struct. This fixup was being
executed on the original struct rather than the copy, which meant we were
uselessly fixing up the wrong struct and corrupting the state store in the
process (albeit harmlessly, I suspect).
Fix the data race by calling the method on the correct pointer.
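A stripped-down illustration of the fix; the type and method names here are simplified placeholders, not the state store's real API:

```go
package state

// NodeResources stands in for the struct that needs the topology fixup.
type NodeResources struct{ /* ... */ }

// Canonicalize is the backwards-compatibility fixup for the topology struct.
func (r *NodeResources) Canonicalize() { /* ... */ }

// Node stands in for the node object held in the state store.
type Node struct{ NodeResources *NodeResources }

func (n *Node) Copy() *Node {
	c := *n
	r := *n.NodeResources
	c.NodeResources = &r
	return &c
}

// upsertNode copies the incoming node and runs the fixup on the copy. Running
// it on the original mutates a struct that concurrent readers may already
// hold, which is the data race the race detector reported.
func upsertNode(existing *Node) *Node {
	nodeCopy := existing.Copy()
	nodeCopy.NodeResources.Canonicalize()
	return nodeCopy
}
```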
When using the Client FS APIs, we check to ensure that reads don't traverse into
the allocation's secret dir and private dir. But this check can be bypassed on
case-insensitive file systems (ex. Windows, macOS, and Linux with obscure ext4
options enabled). This allows a user with `read-fs` permissions but not
`alloc-exec` permissions to read from the secrets dir.
This changeset updates the check so that it's case-insensitive. This risks false
positives for escape (see linked Go issue), but only if a task without
filesystem isolation deliberately writes into the task working directory to do
so, which is a fail-safe failure mode.
Ref: https://github.com/golang/go/issues/18358
Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
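Roughly, the containment check needs to compare paths case-insensitively before deciding whether a read lands inside the secrets or private directory. The helper below illustrates the idea; the name and exact semantics are assumptions, not Nomad's actual helper:

```go
package fsisolation

import (
	"path/filepath"
	"strings"
)

// reachesInto reports whether path falls inside dir, lowercasing both sides so
// the check matches the behaviour of case-insensitive filesystems (Windows,
// macOS, and some ext4 configurations). Both arguments are assumed to already
// be absolute and cleaned.
func reachesInto(dir, path string) bool {
	rel, err := filepath.Rel(strings.ToLower(dir), strings.ToLower(path))
	if err != nil {
		// conservative: if we can't tell, treat it as reaching inside
		return true
	}
	rel = filepath.ToSlash(rel)
	return rel == "." || (rel != ".." && !strings.HasPrefix(rel, "../"))
}
```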
In #23977 we merged a change to how the keyring was stored. Because keyring
initialization takes slightly longer now, this uncovered existing timing bugs in
some of our tests where tests that require the keyring (ex. plan applier tests)
were waiting for the leader but not the keyring initialization. Fix another
example we've seen causing test flakes.
* Modify variable access permissions for UI users with write in only certain namespaces
* Addressing some PR comments
* Variables index namespaces on * and ability checks are now namespaced
* Mistook Delete for Destroy, and update unit tests for multi-return allPaths
We initialize this slice with a non-zero length, so it starts out full of empty
strings, and then append to it, which means we have to clean out the empty
strings later. Initialize with zero length and the correct capacity up front so
there are no empty values.
Ref: https://github.com/hashicorp/nomad/pull/24104
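For illustration, the Go behaviour behind the empty-string cleanup described above:

```go
package main

import "fmt"

func main() {
	// length 3 up front: append leaves three empty strings at the head
	withLen := make([]string, 3)
	withLen = append(withLen, "a", "b")
	fmt.Printf("%q\n", withLen) // ["" "" "" "a" "b"]

	// zero length, capacity 3: only appended values are present, no cleanup needed
	withCap := make([]string, 0, 3)
	withCap = append(withCap, "a", "b")
	fmt.Printf("%q\n", withCap) // ["a" "b"]
}
```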
In #24007 we merged new HCL files but they were missing copywrite headers
because the scan didn't run on this PR for some reason. I've already backported
this to the Enterprise branches.
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the preceding `ContainerExecStart` call prevents the attach from
succeeding, failing with an error that the exec has already run.
* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
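The intended flow looks roughly like this; the Docker SDK option struct names changed across versions, so treat the exact types as assumptions rather than the code Nomad ships:

```go
package docker

import (
	"context"
	"io"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// execOutput runs cmd inside a container and returns the raw output stream.
// ContainerExecAttach both starts the exec and hijacks its stdout/stderr, so
// calling ContainerExecStart first consumes the exec and makes the later
// attach fail with "exec has already run".
func execOutput(ctx context.Context, c *client.Client, containerID string, cmd []string) ([]byte, error) {
	exec, err := c.ContainerExecCreate(ctx, containerID, container.ExecOptions{
		Cmd:          cmd,
		AttachStdout: true,
		AttachStderr: true,
	})
	if err != nil {
		return nil, err
	}
	// note: attach by the exec object's own ID, not the container ID
	resp, err := c.ContainerExecAttach(ctx, exec.ID, container.ExecAttachOptions{})
	if err != nil {
		return nil, err
	}
	defer resp.Close()
	// without a TTY the stream is multiplexed; real code would demultiplex it
	// with stdcopy.StdCopy instead of reading it raw
	return io.ReadAll(resp.Reader)
}
```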
When jobs are submitted with a scaling policy, the scaling policy's target only
includes the job's namespace if the `namespace` field is set in the jobspec; it
is not populated from the request. Normally jobs are canonicalized in the RPC handler before
being written to Raft. But the scaling policy targets are instead written during
the conversion from `api.Job` to `structs.Job`. We populate the `structs.Job`
namespace from the request here as well, but only after the conversion has
occurred. Swap the order of these operations so that the conversion is always
happening with a correct namespace.
Long-term we should not be making mutations during conversion either. But we
can't remove it immediately because API requests may come from any agent across
upgrades. Move the scaling target creation into the `Canonicalize` method and
mark it for future removal in the API conversion code path.
Fixes: https://github.com/hashicorp/nomad/issues/24039
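The ordering problem, reduced to placeholder types (not the actual `api.Job`/`structs.Job` conversion code):

```go
package jobs

// apiJob and structsJob stand in for api.Job and structs.Job.
type apiJob struct{ Namespace *string }

type structsJob struct {
	Namespace     string
	ScalingTarget map[string]string
}

// convert mirrors the api->structs conversion, which also builds the scaling
// policy target and snapshots the namespace at that moment.
func convert(j *apiJob) *structsJob {
	ns := ""
	if j.Namespace != nil {
		ns = *j.Namespace
	}
	return &structsJob{
		Namespace:     ns,
		ScalingTarget: map[string]string{"Namespace": ns},
	}
}

// prepare applies the request's namespace *before* converting; applying it
// after the conversion leaves the scaling target with an empty namespace,
// which is the bug described above.
func prepare(j *apiJob, requestNS string) *structsJob {
	if j.Namespace == nil || *j.Namespace == "" {
		j.Namespace = &requestNS
	}
	return convert(j)
}
```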
In #23966, when we switched to using the official Docker SDK client, this
included new API calls for attaching to the "exec objects" created for running
processes inside a running Docker task. When we updated the API for the
non-streaming cases (script health checks, and `change_mode = "script"`), we
used the container ID and not the exec object ID. These IDs aren't identical
because you can have multiple exec objects for a given container. This results
in errors like "unable to upgrade to tcp, received 404" because the Docker API
can't find the exec object with the container ID.
* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
In #23966, when we switched to using the official Docker SDK client, we had to
rework the stats collection loop for the new client. But we missed resetting the
timer on the collection loop, which meant that we'd only collect stats once and
then never again.
* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=550814)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
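The shape of the fix, as a generic Go collection loop (illustrative only, not Nomad's exact stats code):

```go
package stats

import (
	"context"
	"time"
)

// collectLoop emits stats every interval until ctx is cancelled. Forgetting
// the Reset call means the timer fires exactly once and the loop then blocks
// on timer.C forever, which is the bug described above.
func collectLoop(ctx context.Context, interval time.Duration, collect func()) {
	timer := time.NewTimer(interval)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-timer.C:
			collect()
			timer.Reset(interval) // re-arm the timer for the next cycle
		}
	}
}
```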
A few small updates to the recent "Federate access to AWS with Nomad Workload Identity" documentation, most notably that a restart isn't needed because AWS SDKs handle OIDC reauthentication gracefully. Every other auth type is cached statically on startup, so nothing short of a full restart helps if those credentials expire.
In #23966, when we switched to using the official Docker SDK client, we had more
contexts to add because most of the library methods take one. But for some APIs,
like waiting for a container to exit after we've started it, we never want to
cancel this context, because the operation can outlive the Nomad agent itself.
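For example, the wait on container exit can be detached from the agent's lifecycle like this (a sketch; the Docker SDK types reflect recent client versions and may not match what Nomad vendors):

```go
package docker

import (
	"context"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// waitOnContainer blocks until the container stops and returns its exit code.
// It deliberately uses context.Background(): the container can keep running
// after the Nomad agent shuts down, so this wait must not be tied to a context
// that is cancelled on agent shutdown.
func waitOnContainer(c *client.Client, containerID string) (int64, error) {
	waitCh, errCh := c.ContainerWait(context.Background(), containerID,
		container.WaitConditionNotRunning)
	select {
	case res := <-waitCh:
		return res.StatusCode, nil
	case err := <-errCh:
		return 0, err
	}
}
```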
Nomad v1.9.0 (finally!) removes support for HCL1 and the `-hcl1` flag.
See #23912 for details.
One advantage of HCL1 over HCL2 was that HCL1 allowed quoted keys in blocks
such as `env`, `meta`, and Docker's `labels`:
```hcl
some_block {
"foo.bar" = "baz"
}
```
This works in HCL1 but is invalid in HCL2. In HCL2 you must use a map
instead of a block:
```hcl
some_map = {
"eggs.spam" = "works!"
}
```
This was such a hassle for users that we special-cased the `env` and `meta`
blocks to be accepted as either blocks or maps in #9936.
However, Docker `labels`, being a task config option, is much harder to
special-case, and it commonly needs dots in keys for things like Datadog
autodiscovery via Docker container labels:
https://docs.datadoghq.com/containers/docker/integrations/?tab=labels
Luckily `labels` can be specified as a list-of-maps instead:
```hcl
labels = [
  {
    "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
    "com.datadoghq.ad.init_configs" = "[{}]"
  }
]
```
So instead of adding more awkward HCL1/HCL2 backward-compatibility code to
Nomad, I just updated the docs to hopefully help people hit by this.
The only other known workaround is dropping HCL in favor of JSON
jobspecs altogether, but that forces a huge migration and maintenance
burden on users:
https://discuss.hashicorp.com/t/docker-based-autodiscovery-with-datadog-how-can-we-make-it-work/18870