Fix a regression introduced in Nomad 1.9.0 where CNI bridge networking with
IPv6-only interfaces would fail with a 'no interface with an address' error.
The issue was that while the code correctly populated the AddressIPv6 field
for IPv6 addresses, several validation checks only examined the Address field
(IPv4), causing IPv6-only configurations to be rejected.
Changes:
- Update interface selection logic in cniToAllocNet to accept interfaces with
either IPv4 or IPv6 addresses (not just IPv4)
- Update fallback logic to check both Address and AddressIPv6 fields
- Update error check to only fail when both IPv4 and IPv6 are missing
- Update AllocNetworkStatus.IsZero() to check AddressIPv6 field
This allows CNI configurations with IPv6-only interfaces to work correctly,
restoring functionality from Nomad 1.8.x.
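As a rough sketch of the behaviour being restored (the `ifaceResult` type and field names below are illustrative stand-ins, not the actual cniToAllocNet code), the selection logic has to treat an interface as usable when either address family is present:

```go
package main

import "fmt"

// ifaceResult mirrors the shape of a per-interface result as Nomad records
// it: an IPv4 address, an IPv6 address, or both. The struct and field names
// here are illustrative only.
type ifaceResult struct {
	Address     string // IPv4, may be empty
	AddressIPv6 string // IPv6, may be empty
}

// usable reports whether the interface carries at least one address. Checking
// only Address rejects valid IPv6-only results; checking both fields is the
// behaviour being restored here.
func (r ifaceResult) usable() bool {
	return r.Address != "" || r.AddressIPv6 != ""
}

func main() {
	v6only := ifaceResult{AddressIPv6: "fd00::2"}
	fmt.Println(v6only.usable()) // true; the old check would have rejected this
}
```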
Fixes #26905
Add a test case to verify that CNI results containing only IPv6 addresses
are handled correctly. This is a regression test for a bug introduced in
GH-23882 where IPv6-only interfaces would fail with a 'no interface with an
address' error.
The test verifies that when a CNI plugin returns only an IPv6 address
(without IPv4), the allocation network status should be properly populated
with the IPv6 address in the AddressIPv6 field.
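A minimal sketch of that regression test, using stand-in types rather than Nomad's real test helpers:

```go
package cni

import "testing"

// allocNetStatus and toNetStatus are stand-ins for Nomad's allocation network
// status type and conversion logic; they exist only to keep the sketch
// self-contained.
type allocNetStatus struct {
	Address     string
	AddressIPv6 string
}

func toNetStatus(ipv4, ipv6 string) (allocNetStatus, bool) {
	if ipv4 == "" && ipv6 == "" {
		return allocNetStatus{}, false // "no interface with an address"
	}
	return allocNetStatus{Address: ipv4, AddressIPv6: ipv6}, true
}

// TestIPv6OnlyResult checks that a CNI result carrying only an IPv6 address
// is accepted and that the address lands in AddressIPv6.
func TestIPv6OnlyResult(t *testing.T) {
	status, ok := toNetStatus("", "fd00:a110:c8::1")
	if !ok {
		t.Fatal("expected IPv6-only result to be accepted")
	}
	if status.AddressIPv6 != "fd00:a110:c8::1" {
		t.Fatalf("expected AddressIPv6 to be populated, got %q", status.AddressIPv6)
	}
}
```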
If a Nomad job is started with a large number of instances (e.g. 4 billion),
then the Nomad servers that attempt to schedule it will run out of memory and
crash. While it's unlikely that anyone would intentionally schedule a job with 4
billion instances, we have occasionally run into issues with bugs in external
automation. For example, an automated deployment system running on a test
environment had an off-by-one error, and deployed a job with count = uint32(-1),
causing the Nomad servers for that environment to run out of memory and crash.
To prevent this, this PR introduces a job_max_count Nomad server configuration
parameter. job_max_count limits the number of allocs that may be created from a
job. The default value is 50000 - this is low enough that a job with the maximum
possible number of allocs will not require much memory on the server, but is
still much higher than the number of allocs in the largest Nomad job we have
ever run.
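A minimal sketch of the kind of admission check this enables (the function name and flat count slice are illustrative, not the actual Nomad implementation; 50000 stands in for the configured job_max_count):

```go
package main

import "fmt"

// jobMaxCount stands in for the server's configured job_max_count value.
const jobMaxCount = 50000

// validateJobCount sums the count of every task group in a job and rejects
// the registration when the total exceeds the configured maximum, so a
// runaway count (e.g. uint32(-1) from buggy automation) is refused at submit
// time instead of exhausting server memory during scheduling.
func validateJobCount(groupCounts []int64) error {
	var total int64
	for _, c := range groupCounts {
		total += c
	}
	if total > jobMaxCount {
		return fmt.Errorf("job requests %d allocations, exceeding job_max_count (%d)", total, jobMaxCount)
	}
	return nil
}

func main() {
	fmt.Println(validateJobCount([]int64{3, 5}))          // <nil>
	fmt.Println(validateJobCount([]int64{4_294_967_295})) // error
}
```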
The 'cni' make target was hardcoded to download amd64 CNI plugins,
which prevented users on arm64/aarch64 systems from using this target.
This change adds automatic architecture detection that maps the
system architecture (from uname -m) to the appropriate CNI plugin
architecture:
- x86_64 -> amd64 (existing behavior)
- aarch64 -> arm64 (new support)
The implementation uses the existing THIS_ARCH variable already
defined in the Makefile, ensuring consistency with other build
targets.
Fixes #26864
When a Nomad server joins or leaves the cluster via Serf, we emit
a log message to indicate this. The message previously packed the
server name, datacenter, and address into a single kv context
string, which was likely written before the switch to hclog and
the use of contextual logging.
This change splits the log message so that each server attribute
is its own kv entry, which is easier to parse with log collection
tools.
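In go-hclog terms the difference looks roughly like this (the server values are made up; these are not the exact Nomad log sites):

```go
package main

import hclog "github.com/hashicorp/go-hclog"

func main() {
	logger := hclog.New(&hclog.LoggerOptions{Name: "nomad"})

	// Before: one pre-formatted value, awkward for log collectors to parse.
	logger.Info("adding server",
		"server", "node1.dc1 (Addr: 10.0.0.1:4648) (DC: dc1)")

	// After: each server attribute is its own key/value pair.
	logger.Info("adding server",
		"name", "node1",
		"address", "10.0.0.1:4648",
		"datacenter", "dc1")
}
```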
* Add preserve-resources flag when registering a job
* Add preserve-resources flag to website docs
* Add changelog
* Update tests, docs
* Preserve counts & resources in fsm
* Update doc
* Update preservation of resources/count to happen in StateStore
The transit keyring uses the go-kms-wrapping library to parse the Vault
config, and that parsing errors if tlsSkipVerify is an empty string.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
When we attempt to drop unneeded evals from the eval broker, if the eval has
been GC'd before the check is made, we hit a nil pointer. Check that the eval
actually exists before attempting to remove it from the broker.
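A minimal sketch of the guard, with stand-in types rather than the real broker and state store code:

```go
package broker

// eval, evalStore, and evalBroker are stand-ins for the real state store and
// broker types; they exist only to keep the sketch self-contained.
type eval struct{ ID string }

type evalStore interface{ EvalByID(id string) *eval }

type evalBroker interface{ Remove(e *eval) }

// dropEval removes an eval from the broker only if it still exists in state.
// Without the existence check, an eval garbage-collected in the meantime
// leads to a nil pointer dereference.
func dropEval(b evalBroker, s evalStore, evalID string) {
	e := s.EvalByID(evalID)
	if e == nil {
		return // already GC'd; nothing left to remove from the broker
	}
	b.Remove(e)
}
```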
Fixes: https://github.com/hashicorp/nomad/issues/26871
The node reconciler never took node feasibility into account. When nodes
were excluded from allocation placement, for example because constraints
were not met, the desired total and desired canary counts were never
updated in the reconciler to account for this. As a result, deployments
would never become successful.
If the latest Consul version has known bugs that cause test suite
failures, it is useful to be able to skip that version. Otherwise,
CI will fail on all PRs and release branches until a fixed version
is released.
In cases where system jobs had the same number of canary allocations
deployed as there were eligible nodes, the scheduler would incorrectly
mark the deployment as complete, as if auto-promotion were enabled. This
edge case uncovered a bug in the setDeploymentStatusAndUpdates method, and
since we round up canary node counts, it may not be such an edge case
after all.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
ACL policies are parsed when they are created or updated, and again when
compiling the resulting ACL object for use. This parsing silently ignored
duplicate singleton keys and invalid keys; such keys do not grant any
additional access, but the behaviour is poor UX and can be unexpected.
This change parses all new policy writes and updates, so that
duplicate or invalid keys return an error to the caller. This is
called strict parsing. In order to correctly handle upgrades of
clusters which have existing policies that would fall foul of the
change, a lenient parsing mode is also available. This allows
the policy to continue to be parsed and compiled after an upgrade
without the need for an operator to correct the policy document
prior to further use.
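A rough sketch of how the two modes differ (the names and shapes here are illustrative, not Nomad's actual parser):

```go
package acl

import "fmt"

// parseMode selects how an ACL policy document is parsed.
type parseMode int

const (
	// parseStrict is used for new policy writes and updates: duplicate or
	// invalid keys are returned to the caller as errors.
	parseStrict parseMode = iota
	// parseLenient is used when compiling policies that already exist in
	// state, so clusters upgraded with non-conforming policies keep working.
	parseLenient
)

// checkKeys applies the chosen mode to keys flagged by the parser.
func checkKeys(mode parseMode, duplicates, unknown []string) error {
	if mode == parseLenient {
		return nil
	}
	if len(duplicates) > 0 || len(unknown) > 0 {
		return fmt.Errorf("invalid policy: duplicate keys %v, unknown keys %v", duplicates, unknown)
	}
	return nil
}
```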
Co-authored-by: Tim Gross <tgross@hashicorp.com>
In a cluster where the Nomad servers have been upgraded before the
clients, the cluster leader will generate an identity for each
client at every heartbeat. The clients will not have the code path
to handle this response, so the object is thrown away. This wastes
server resources.
This change introduces a minimum version check to the logic which
decides whether an identity should be generated. In the situation
above, the leader will now decline to generate identities for Nomad
clients running pre-1.11 versions.
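A sketch of the version gate using hashicorp/go-version; the helper name and minimum version constant are illustrative stand-ins for the real leader logic:

```go
package main

import (
	"fmt"

	version "github.com/hashicorp/go-version"
)

// minNodeIdentityVersion stands in for the first client version that knows
// how to handle an identity in the heartbeat response.
var minNodeIdentityVersion = version.Must(version.NewVersion("1.11.0"))

// shouldGenerateIdentity reports whether the leader should mint an identity
// for a client reporting the given version. Pre-1.11 clients would simply
// discard the identity, so generating one only wastes server work.
func shouldGenerateIdentity(nodeVersion string) bool {
	v, err := version.NewVersion(nodeVersion)
	if err != nil {
		return false // unknown version: skip generation rather than waste work
	}
	return v.GreaterThanOrEqual(minNodeIdentityVersion)
}

func main() {
	fmt.Println(shouldGenerateIdentity("1.10.3")) // false
	fmt.Println(shouldGenerateIdentity("1.11.0")) // true
}
```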
In #26831 we're preventing unexpected node RPCs by ensuring that the volume
watcher only unpublishes when allocations are client-terminal.
To mitigate any remaining similar issues, add serialization of node plugin RPCs,
as we did for controller plugin RPCs in #17996 and as recommended ("SHOULD") by
the CSI specification. Here we can do per-volume serialization rather than
per-plugin serialization.
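The per-volume serialization can be pictured as a map of per-volume locks, roughly like the sketch below (this is the general pattern, not the actual volumeManager code):

```go
package volumes

import "sync"

// volumeLocks serializes node plugin RPCs per volume ID: concurrent calls for
// the same volume wait their turn while different volumes proceed in
// parallel.
type volumeLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{locks: map[string]*sync.Mutex{}}
}

// lock acquires the per-volume lock, creating it on first use, and returns an
// unlock function for the caller to defer around the node RPC.
func (v *volumeLocks) lock(volID string) (unlock func()) {
	v.mu.Lock()
	l, ok := v.locks[volID]
	if !ok {
		l = &sync.Mutex{}
		v.locks[volID] = l
	}
	v.mu.Unlock()

	l.Lock()
	return l.Unlock
}
```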
Reorder the methods of the `volumeManager` in the client so that each interface
method and its directly-associated helper methods read from top-to-bottom,
instead of a mix of directions.
Ref: https://github.com/hashicorp/nomad/pull/17996
Ref: https://github.com/hashicorp/nomad/pull/26831
While working on #26831 and #26832 I made some minor improvements to our
end-to-end test setup for CSI:
* bump the AWS EBS plugin versions to latest release (1.48.0)
* remove the unnecessary `datacenters` field from the AWS EBS plugin jobs
* add a name tag to the EBS volumes we create
* add a user-specific name tag to the cluster name when using the makefile to
deploy a cluster
* add volumes and other missing variables from the `provision-infra` module to
the main E2E module
Ref: https://github.com/hashicorp/nomad/pull/26832
Ref: https://github.com/hashicorp/nomad/pull/26831
The volume watcher checks whether any allocations that have claims are terminal
so that it knows if it's safe to unpublish the volume. This check considered a
claim eligible for unpublishing if the allocation was terminal on either the
server or the client, rather than on the client alone. In many circumstances
this is safe.
But if an allocation takes a while to stop (ex. it has a `shutdown_delay`), it's
possible for garbage collection to run in the window between when the alloc is
marked server-terminal and when the task is actually stopped. The server
unpublishes the volume which sends a node plugin RPC. The plugin unmounts the
volume while it's in use, and then unmounts it again when the allocation stops
and the CSI postrun hook runs. If the task writes to the volume during the
unmounting process, some providers end up in a broken state and the volume is
not usable unless it's detached and reattached.
Fix this by considering a claim a "past claim" only when the allocation is
client terminal. This way if garbage collection runs while we're waiting for
allocation shutdown, the alloc will only be server-terminal and we won't send
the extra node RPCs.
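A minimal sketch of the tightened check (plain strings stand in for Nomad's allocation client status constants):

```go
package csimanager

// pastClaim reports whether a volume claim can be treated as finished for the
// purpose of unpublishing.
func pastClaim(clientStatus string) bool {
	// Only client-terminal states count. An allocation that is merely
	// server-terminal (for example, still waiting out a shutdown_delay) may
	// still be running on the node and writing to the volume.
	switch clientStatus {
	case "complete", "failed", "lost":
		return true
	default:
		return false
	}
}
```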
Fixes: https://github.com/hashicorp/nomad/issues/24130
Fixes: https://github.com/hashicorp/nomad/issues/25819
Ref: https://hashicorp.atlassian.net/browse/NMD-1001
The node identity TTL defaults to 24hr but can be altered by
setting the node identity TTL parameter. In order to allow setting
and viewing the value, the field is now plumbed through the CLI
and HTTP API.
In order to parse the HCL, a new helper package has been created
which contains generic parsing and decoding functionality for
dealing with HCL that contains time durations. hclsimple can be
used when this functionality is not needed. In order to parse the
JSON, custom marshal and unmarshal functions have been created as
used in many other places.
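The JSON side follows the common pattern of marshalling durations as strings and accepting either a string or a nanosecond count on input; a sketch with illustrative field names:

```go
package api

import (
	"encoding/json"
	"time"
)

// nodeIdentityConfig shows the pattern with a made-up field name: durations
// go out as human-readable strings ("24h") and come in as either a string or
// a raw nanosecond count.
type nodeIdentityConfig struct {
	TTL time.Duration
}

func (c nodeIdentityConfig) MarshalJSON() ([]byte, error) {
	return json.Marshal(struct {
		TTL string
	}{TTL: c.TTL.String()})
}

func (c *nodeIdentityConfig) UnmarshalJSON(data []byte) error {
	var raw struct {
		TTL interface{}
	}
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	switch v := raw.TTL.(type) {
	case string:
		d, err := time.ParseDuration(v)
		if err != nil {
			return err
		}
		c.TTL = d
	case float64:
		// JSON numbers decode as float64; treat them as nanoseconds.
		c.TTL = time.Duration(int64(v))
	}
	return nil
}
```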
The node pool init command has been updated to include this new
parameter, commented out, for reference. The info command now
includes the TTL in its output too.
Expand on the documentation of allocation garbage collection:
* Explain that server-side GC of allocations is tied to the GC of the
evaluation that spawned the allocation.
* Explain that server-side GC of allocations will force them to be immediately
GC'd on the client regardless of the client-side configurations.
Ref: https://github.com/hashicorp/nomad/issues/26765
Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
I cannot replicate this locally, but it appears that on CI some of our
system jobs take longer than the default 20s to finish deploying. This
PR is just to make sure this isn't the reason these tests fail.
Nomad's periodic block includes a "time_zone" parameter which lets
operators set the time zone against which the next launch interval is
checked. For this to work, Nomad uses "time.LoadLocation", which in
turn can draw on multiple TZ data sources. When the Docker image is
used to trigger Nomad job registrations, it currently has no access
to any TZ data, meaning it is only aware of UTC. Adding the tzdata
package contents to the release image provides the required data for
this to work.
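A small illustration of why the data matters: without tzdata on the image (or an embedded timezone database), time.LoadLocation only resolves "UTC" and "Local":

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Without tzdata, anything other than "UTC" and "Local" fails to
	// resolve, which is what broke periodic jobs using time_zone when
	// registered from the release container.
	loc, err := time.LoadLocation("America/Chicago")
	if err != nil {
		fmt.Println("no timezone data available:", err)
		return
	}
	fmt.Println("next launch interval evaluated in", loc)
}
```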
It would also have been possible to embed a copy of the timezone
database in the binary by passing the appropriate build tag via
"-tags" when releasing Nomad. We decided against the build tag
approach because it is a subtle way to introduce bugs that are very
difficult to track down, and we prefer the approach taken in this commit.
In most RPC endpoints we use the resolved ACL object to determine whether a
given auth token or identity has access to the object of interest to the
RPC. In #15870 we adjusted this across most of the RPCs to handle workload identity.
But in the ACL endpoints that read policies, we can't use the resolved ACL
object and have to go back to the original token and look up the policies it has
access to. So we need to resolve any workload-associated policies during that
lookup as well.
Fixes: https://github.com/hashicorp/nomad/issues/26764
Ref: https://hashicorp.atlassian.net/browse/NMD-990
Ref: https://github.com/hashicorp/nomad/pull/15870