In #16872 we added support for configuring the API client with a unix domain
socket. In order to set the host correctly, we parse the address before mutating
the Address field in the configuration. But this prevents the configuration from
being reused across multiple clients, as the next time we parse the address it
will no longer be pointing to the socket. This breaks consumers like the
autoscaler, which reuse the API config between plugins.
Update the `NewClient` constructor to only override the `url` field if it hasn't
already been parsed. Include a test demonstrating safe reuse with a unix domain
socket.
Ref: https://github.com/hashicorp/nomad-autoscaler/issues/944
Ref: https://github.com/hashicorp/nomad-autoscaler/pull/945
Although we have `client.allocations` metrics to track allocation states on a
client, having separate metrics for `client.tasks` will allow operators to
identify that there are individual tasks in an unexpected state in an otherwise
healthy allocation.
Fixes: https://github.com/hashicorp/nomad/issues/23770
The Docker driver's `volume` field to specify bind-mounts takes a list of
strings that consist of three `:`-delimited fields: source, destination, and
options. We append the SELinux label from the plugin configuration as the third
field. But when the user has already specified the volume is read-only with
`:ro`, we're incorrectly appending the SELinux label with another `:` instead of
the required `,`.
Combine the options into a single field value before appending them to the bind
mounts configuration. Updated the tests to split out Windows behavior (which
doesn't accept options) and to ensure the test task has the expected environment
for bind mounts.
Fixes: https://github.com/hashicorp/nomad/issues/23690
Tasks have a default identity created during canonicalization. If this default
identity is somehow missing, we'll hit panics when trying to create and sign the
claims in the plan applier. Fallback to the default identity if it's missing
from the task.
This changeset will need a different implementation in 1.7.x+ent backports, as
the constructor for identities was refactored significantly in #23708. The panic
cannot occur in 1.6.x+ent.
Fixes: https://github.com/hashicorp/nomad/issues/23758
a change to the network{cni{}} block means that
the user wants the network config to change,
and that only happens during initial alloc setup,
so we need to replace the alloc(s) to get
fresh network(s) to reconfigure from scratch.
e.g. a job plan diff like this
```
+/- Task Group: "g" (1 in-place update)
+ Network {
+ CNIConfig {
+ a: "ayy"
}
```
should instead be
```
+/- Task Group: "g" (1 create/destroy update)
+ Network {
+ CNIConfig {
+ a: "ayy"
}
```
The path for a Variable never begins with a leading `/`, because it's stripped
off in the API before it ever gets to the state store. The CLI and UI allow the
leading `/` for convenience, but this can be misleading when it comes to writing
ACL policies. An ACL policy with a path starting with a leading `/` will never
match.
Update the ACL policy parser so that we prevent an incorrect variable path in
the policy.
Fixes: https://github.com/hashicorp/nomad/issues/23730
In #23720 we removed the 1.7.x and 1.6.x release branches from the "active"
versions for backports. But we forgot that we needed these backports to remain
active for documentation backports because of versioned docs.
- Pulled common content from multiple pages into new partials
- Refactored install/index to be OS-based so I could add linux-distro-based instructions to install-consul-cni-plugins.mdx partial. The tab groups on the install/index page do match and change focus as expected.
- Moved CNI overview-type content to networking/index
- Refactored networking/cni to include install CNI plugins and configuration content (from install/index).
- Moved CNI plugins explanation in bridge mode configuration section into bullet points. They had been #### headings, which aren't rendered in the R page TOC. I tried to simplify and format the bullet point content to be easier to scan.
Ref: https://hashicorp.atlassian.net/browse/CE-661
Fixes: https://github.com/hashicorp/nomad/issues/23229
Fixes: https://github.com/hashicorp/nomad/issues/23583
Resolves symlink escape when unarchiving by removing existing paths within the same allocation directory which can occur by writing a header that points to a symlink that lives outside of the sandbox environment. This exploit requires first compromising the Nomad client agent at the source allocation.
Ref: https://hashicorp.atlassian.net/browse/NET-10607
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1725
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.
This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.
Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.
For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.
Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
In 1.6.0 we shipped the ability to review the original HCL in the web UI, but
didn't follow-up with an equivalent in the command line. Add a `-hcl` flag to
the `job inspect` command.
Closes: https://github.com/hashicorp/nomad/issues/6778
Although we encourage users to use Vault roles, sometimes they're going to want
to assign policies based on entity and pre-create entities and aliases based on
claims. This allows them to use single default role (or at least small number of
them) that has a templated policy, but have an escape hatch from that.
When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.
Add support for an `extra_claims` block on the server's `default_identity`
blocks for Vault. This allows a cluster administrator to add a custom claim on
all allocations. The values for these claims are interpolatable with a limited
subset of fields, similar to how we interpolate the task environment.
Fixes: https://github.com/hashicorp/nomad/issues/23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
Upcoming work to add extensibility to identity claims for Vault (ref #23510)
will require exposing server configuration and more objects from state to the
process of creating an `IdentityClaims` struct.
Depending on how we inject these parameters into the constructor, we end up
creating circular dependencies or a lot more logic in the setup in the plan
applier and alloc endpoint. There are three contexts where we call
`NewIdentityClaims`: the plan applier (where we only care about the default
identity), signing task identities, and signing service identities. Each needs
different parameters. So we'll refactor the constructor as a builder with
methods that the caller can decide to use (or not) depending on context. I've
pulled this work out of #23675 to make it easier to review separately.
Ref: https://github.com/hashicorp/nomad/issues/23510
Ref: https://github.com/hashicorp/nomad/pull/23675
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
As of Nomad 1.8.0 LTS we're no longer backporting changes to Nomad CE versions
1.7.x and 1.6.x. We never got around to removing the flags in the file the
backport assistant uses to control automated backports.
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In version of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."
This property has been long-deprecated and leaving it in place may lead to false
assumptions about how cipher suites are negotiated in connection to a server. So
we want to remove it in Nomad 1.9.0.
Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
In Nomad 1.6.0 we started sending the node secret with RPCs that previously did
not include it. We planned to deprecate the older auth workflow but didn't set a
release. Removing the legacy support means that nodes running <1.6.0 will fail
to heartbeat.
Ref: https://hashicorp.atlassian.net/browse/NET-10009
Add a section to the docs describing planned upcoming deprecations and
removals. Also added some missing upgrade guide sections missed during the last
release.
The default root:root is used as this provides permissions to run
both server and client agents. The comment details what changes
can be made to operators if needed.
When running the service file prior to this change, root:root
would be the default.
When updating a `JobScalingEvent`, the state store function did not copy the
existing object before mutating it. This corrupts the state store because it
modifies the leaf node without committing it in a transaction. It can also cause
the Nomad server to crash with a "fatal error: concurrent map read and map
write" if its `ScalingEvents` map is read via the `ScaleStatus` RPC at the same
time as it's being written.
This changeset also removes some mostly-unused public methods on the struct that
dangerously encourage you to mutate it outside of a copy.
Ref: https://hashicorp.atlassian.net/browse/NET-10529
For templates with `change_mode = "script"`, we set a driver handle in the
poststart method, so the template runner can execute the script inside the
task. But when the client is restarted and the template contents change during
that window, we trigger a change_mode in the prestart method. In that case, the
hook will not have the handle and so returns an errror trying to run the change
mode.
We restore the driver handle before we call any prestart hooks, so we can pass
that handle in the constructor whenever it's available. In the normal task start
case the handle will be empty but also won't be called.
The error messages are also misleading, as there's no capabilities check
happening here. Update the error messages to match.
Fixes: https://github.com/hashicorp/nomad/issues/15851
Ref: https://hashicorp.atlassian.net/browse/NET-9338
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.
Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.
This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:
* Periodic root key rotation would never happen because the default
`root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
time table. We now compare the `CreateTime` against the wall clock time instead
of the time table. (We expect to remove the time table in future work, ref
https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
identities. We now wait until `root_key_rotation_threshold` +
`root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
the rekey was complete.
Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
The documentation for the `SSL` option for the Docker driver is
misleading inasmuch as it's both deprecated and non-functional in current
versions of Docker. Remove this option from the docs and add a section
explaining how to use insecure registries.
Fixes: https://github.com/hashicorp/nomad/issues/23616