When we attempt to drop unneeded evals from the eval broker, if the eval has
been GC'd before the check is made, we hit a nil pointer. Check that the eval
actually exists before attempting to remove it from the broker.
Fixes: https://github.com/hashicorp/nomad/issues/26871
Node reconciler never took node feasibility into account. In cases when
there were nodes excluded from allocation placement due to constraints
not being met, for example, the desired total or desired canary numbers
were never updated in the reconciler to account for that. Thus,
deployments would never become successful.
If the latest Consul version has known bugs that cause failures of
the test suite, it is useful to be able to skip this. Otherwise,
CI will fail on all PRs and release branches until a new version
is released.
In cases where system jobs had the same amount of canary allocations
deployed as there were eligible nodes, the scheduler would incorrectly
mark the deployment as complete, as if auto promotion was set. This edge
case uncovered a bug in the setDeploymentStatusAndUpdates method, and
since we round up canary nodes, it may not be such an edge case
afterall.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
ACL policies are parsed when creating, updating, or compiling the
resulting ACL object when used. This parsing was silently ignoring
duplicate singleton keys, or invalid keys which does not grant any
additional access, but is a poor UX and can be unexpected.
This change parses all new policy writes and updates, so that
duplicate or invalid keys return an error to the caller. This is
called strict parsing. In order to correctly handle upgrades of
clusters which have existing policies that would fall foul of the
change, a lenient parsing mode is also available. This allows
the policy to continue to be parsed and compiled after an upgrade
without the need for an operator to correct the policy document
prior to further use.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
In a cluster where the Nomad servers have been upgraded before the
clients, the cluster leader will generate an identity for each
client at every heartbeat. The clients will not have the code path
to handle this response, so the object is thrown away. This wastes
server resources.
This change introduces a minimum version check to the logic which
decides whether an identity should be generated. In the situation
above, the leader will now decline to generate identities for Nomad
clients running pre-1.11 versions.
In #26831 we're preventing unexpected node RPCs by ensuring that the volume
watcher only unpublishes when allocations are client-terminal.
To mitigate any remaining similar issues, add serialization of node plugin RPCs,
as we did for controller plugin RPCs in #17996 and as recommended ("SHOULD") by
the CSI specification. Here we can do per-volume serialization rather than
per-plugin serialization.
Reorder the methods of the `volumeManager` in the client so that each interface
method and its directly-associated helper methods read from top-to-bottom,
instead of a mix of directions.
Ref: https://github.com/hashicorp/nomad/pull/17996
Ref: https://github.com/hashicorp/nomad/pull/26831
While working on #26831 and #26832 I made some minor improvements to our
end-to-end test setup for CSI:
* bump the AWS EBS plugin versions to latest release (1.48.0)
* remove the unnnecessary `datacenters` field from the AWS EBS plugin jobs
* add a name tag to the EBS volumes we create
* add a user-specific name tag to the cluster name when using the makefile to
deploy a cluster
* add volumes and other missing variables from the `provision-infra` module to
the main E2E module
Ref: https://github.com/hashicorp/nomad/pull/26832
Ref: https://github.com/hashicorp/nomad/pull/26831
The volume watcher checks whether any allocations that have claims are terminal
so that it knows if it's safe to unpublish the volume. This check was
considering a claim as unpublishable if the allocation was terminal on either
the server or client, rather than the client alone. In many circumstances this
is safe.
But if an allocation takes a while to stop (ex. it has a `shutdown_delay`), it's
possible for garbage collection to run in the window between when the alloc is
marked server-terminal and when the task is actually stopped. The server
unpublishes the volume which sends a node plugin RPC. The plugin unmounts the
volume while it's in use, and then unmounts it again when the allocation stops
and the CSI postrun hook runs. If the task writes to the volume during the
unmounting process, some providers end up in a broken state and the volume is
not usable unless it's detached and reattached.
Fix this by considering a claim a "past claim" only when the allocation is
client terminal. This way if garbage collection runs while we're waiting for
allocation shutdown, the alloc will only be server-terminal and we won't send
the extra node RPCs.
Fixes: https://github.com/hashicorp/nomad/issues/24130
Fixes: https://github.com/hashicorp/nomad/issues/25819
Ref: https://hashicorp.atlassian.net/browse/NMD-1001
The node identity TTL defaults to 24hr but can be altered by
setting the node identity TTL parameter. In order to allow setting
and viewing the value, the field is now plumbed through the CLI
and HTTP API.
In order to parse the HCL, a new helper package has been created
which contains generic parsing and decoding functionality for
dealing with HCL that contains time durations. hclsimple can be
used when this functionality is not needed. In order to parse the
JSON, custom marshal and unmarshal functions have been created as
used in many other places.
The node pool init command has been updated to include this new
parameter, although commented out, so reference. The info command
now includes the TTL in its output too.
Expand on the documentation of allocation garbage collection:
* Explain that server-side GC of allocations is tied to the GC of the
evaluation that spawned the allocation.
* Explain that server-side GC of allocations will force them to be immediately
GC'd on the client regardless of the client-side configurations.
Ref: https://github.com/hashicorp/nomad/issues/26765
Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
I cannot replicate this locally, but it appears that on CI some of our
system jobs take longer than the default 20s to finish deploying. This
PR is just to make sure this isn't the reason these tests fail.
Nomad's periodic block includes a "time_zone" parameter which lets
operators set the time zone at which the next launch interval is
checked against. For this to work, Nomad needs to use the
"time.LoadLocation" which in-turn can use multiple TZ data sources.
When using the Docker image to trigger Nomad job registrations, it
currently does not have access to any TZ data, meaning it is only
aware of UTC. Adding the tzdata package contents to the release
image provides the required data for this to work.
It would have also been possible to set the "-tags" build tag when
releasing Nomad which would embed a copy of the timezone database
in the code. We decided against using the build tag approach as it
is a subtle way that we could introduce bugs that are very
difficult to track down and we prefer the commit approach.
In most RPC endpoints we use the resolved ACL object to determine whether a
given auth token or identity has access to the object of interest to the
RPC. In #15870 we adjusted this across most of the RPCs to handle workload identity.
But in the ACL endpoints that read policies, we can't use the resolved ACL
object and have to go back to the original token and lookup the policies it has
access to. So we need to resolve any workload-associated policies during that
lookup as well.
Fixes: https://github.com/hashicorp/nomad/issues/26764
Ref: https://hashicorp.atlassian.net/browse/NMD-990
Ref: https://github.com/hashicorp/nomad/pull/15870
The test was incorrectly writing to state that registration had
been finished before writing the node identity token. This is the
opposite of what happens in the client code and caused a timing
issue which meant we read registration as completed before we had
the identity available and therefore returned the secret ID.
In Nomad Enterprise we can fingerprint multiple Consul datacenters. If neither
is `"default"` then we end up with warning logs about adding a "link".
The `Link` field on the `Node` struct is a map of attributes that only
contributes to the node's computed hash. The `"consul"` key's value is derived
from the `unique.consul.name` attribute, which only exists if there's a default
Consul cluster.
Update the fingerprint to skip setting the link field if there's no
`unique.consul.name`, and lower the warning log for malformed fields to debug;
this is a minor scheduling optimization largely captured by existing Consul
fields in the node computed class. The only reason not to remove it entirely is
to avoid changing computed classes on existing large clusters.
Fixes: https://github.com/hashicorp/nomad/issues/26781
Ref: https://hashicorp.atlassian.net/browse/NMD-998
On Windows, the `os.Process.Signal` method returns an error when sending
`os.Interrupt` (SIGINT) because it isn't implemented. This causes test servers
in the `testutil` packages to break on Windows. Use the platform specific
syscalls to generate the SIGINT instead.
The agent's signal handler also did not correctly handle the Ctrl-C because we
were masking os.Interrupt instead of SIGINT.
Fixes: https://github.com/hashicorp/nomad/issues/26775
Co-authored-by: Chris Roberts <croberts@hashicorp.com>
When calling the client identity renew API, it is possible the
target node ID is provided by either the URI or within the request
body. This change fixes a bug where all calls using a node_id query
parameter would be reject as it failed to decode the empty request
body.
Co-authored-by: Tim Gross <tgross@hashicorp.com>