* Stopped status passed through to the statuses endpoint and observed on job model and steady-state panel
* Status passed to statuses endpoint and test for FE model statuses
Update `runc` to 1.1.13 to pick up build support for Go 1.22.4+, in order to
ensure we've resolved errors cloning processes into Linux namespaces for
libcontainer (`exec` driver) with new versions of Go and older but still
supported versions of glibc.
This changeset has two minor quirks:
* Testing shows that the reported issue is already resolved on `main` by
upgrading to Go 1.22.4 without this dependency bump, at least for glibc 2.31.
Upgrading the dependency should ensure there isn't another glibc version where
the problem still appears.
* This version of `runc` refers to fields in `cilium/ebpf` which are not present
in more recent versions of that library. So in order to build, we have to
downgrade `cilium/ebpf`. Fortunately, `runc` is the only consumer of that
transitive dependency.
Closes: https://github.com/hashicorp/nomad/issues/20212
Ref: https://hashicorp.atlassian.net/browse/NET-10078
* Rework of attributes using pathTree
* Pack meta reintroduced and made local
* attributes table test updated for new pathTree syntax
* removed flat import and extended the PathTree type signature to include prefix
* Slightly darken the is-faded text in tables
Nomad client agents run as privileged processes and require access to much of
the cluster state, secrets, etc. to operate. But we can improve upon this by
tightening up the virtual policy we use for RPC requests authenticated by the
node secret ID. This changeset removes the `node:read`, `plugin:read`, and
`plugin:list` policies, as well as namespace operations. In return, we add an
`AllowClientOp` check to the RPCs the client uses that would otherwise need
those policies.
Where possible, the update RPCs have also been changed to match on node ID so
that a client can only make the RPC that impacts itself. In future work, we may
be able to downscope further by adding node pool filtering to `AllowClientOp`.
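The sketch below shows the rough shape of that check with illustrative,
self-contained stand-ins (the real Nomad types, method names, and RPC handlers
differ): the caller must hold the client policy and may only target the node
that owns the authenticating secret ID.

```go
package main

import (
	"errors"
	"fmt"
)

var errPermissionDenied = errors.New("permission denied")

// clientACL stands in for the virtual policy resolved from a node secret ID.
type clientACL struct {
	allowClientOp bool
	nodeID        string // node that owns the secret ID
}

func (a *clientACL) AllowClientOp() bool { return a.allowClientOp }

// authorizeNodeUpdate sketches the node-scoped check: require the client
// policy, and only allow an update that targets the caller's own node.
func authorizeNodeUpdate(acl *clientACL, targetNodeID string) error {
	if acl == nil || !acl.AllowClientOp() {
		return errPermissionDenied
	}
	if acl.nodeID != targetNodeID {
		return errPermissionDenied
	}
	return nil
}

func main() {
	acl := &clientACL{allowClientOp: true, nodeID: "node-1"}
	fmt.Println(authorizeNodeUpdate(acl, "node-1")) // <nil>
	fmt.Println(authorizeNodeUpdate(acl, "node-2")) // permission denied
}
```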
Ref: https://github.com/hashicorp/nomad-enterprise/issues/1528
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1529
Ref: https://hashicorp.atlassian.net/browse/NET-9925
The NUMA topology struct field `NodeIDs` is an `idset.Set`, which has no public
members. As a result, this field is never serialized via msgpack and persisted
in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil
field, which panics the scheduler worker.
Ideally we would fix this by adding a msgpack serialization extension, but
because the field already exists and is just always empty, this breaks RPC wire
compatibility across upgrades. Instead, create a new field that's populated at
the same time we populate the more useful `idset.Set`, and repopulate the set on
demand.
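A minimal sketch of the approach, with invented names in place of the real
Nomad structs: the set type has no exported fields (so the codec skips it),
while an exported slice is written alongside it and used to rebuild the set
after decoding.

```go
package main

import "fmt"

// idSet stands in for idset.Set: it has no exported fields, so a
// reflection-based codec like msgpack writes nothing for it and it comes
// back nil/empty after decoding.
type idSet struct{ items map[uint]struct{} }

func newIDSet(ids []uint) *idSet {
	s := &idSet{items: map[uint]struct{}{}}
	for _, id := range ids {
		s.items[id] = struct{}{}
	}
	return s
}

func (s *idSet) Slice() []uint {
	out := make([]uint, 0, len(s.items))
	for id := range s.items {
		out = append(out, id)
	}
	return out
}

// topology pairs the convenient set with an exported slice that does survive
// the msgpack round trip.
type topology struct {
	nodeIDs     *idSet // never serialized: no exported members
	NodeIDsList []uint // serialized; written whenever nodeIDs is written
}

func (t *topology) setNodes(ids []uint) {
	t.nodeIDs = newIDSet(ids)
	t.NodeIDsList = t.nodeIDs.Slice()
}

// nodes repopulates the set on demand, e.g. after decoding from state.
func (t *topology) nodes() *idSet {
	if t.nodeIDs == nil {
		t.nodeIDs = newIDSet(t.NodeIDsList)
	}
	return t.nodeIDs
}

func main() {
	decoded := &topology{NodeIDsList: []uint{0, 1}} // as if restored from state
	fmt.Println(len(decoded.nodes().Slice()))       // 2, no nil dereference
}
```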
Fixes: https://hashicorp.atlassian.net/browse/NET-9924
Update the Consul/Vault build downloader functions so that we include the
current prerelease build (if any) in the E2E compatibility testing we run on
each PR. The prerelease will automatically cycle out when the GA build is
released, because that build sorts "higher" in the sorted set.
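The version strings below are invented and the real downloader helpers may be
structured differently; this only illustrates why the cycle-out is automatic:
with semver ordering (here via `hashicorp/go-version`), a prerelease sorts
below its GA release, so once the GA build exists it becomes the "higher"
entry in the sorted set.

```go
package main

import (
	"fmt"
	"sort"

	version "github.com/hashicorp/go-version"
)

func main() {
	// Illustrative version strings, not real release data.
	raw := []string{"1.18.2", "1.19.0-rc1", "1.17.5"}
	vs := make(version.Collection, 0, len(raw))
	for _, s := range raw {
		v, err := version.NewVersion(s)
		if err != nil {
			panic(err)
		}
		vs = append(vs, v)
	}
	sort.Sort(vs)
	// 1.19.0-rc1 is the newest entry today; once 1.19.0 GA is added it sorts
	// above the rc, and the rc drops out of the "newest" pick.
	fmt.Println(vs[len(vs)-1])
}
```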
The `testing.go` test helpers file for the driver manager initializes the NUMA
scan as a package-global variable. This causes the scan to run even in
production builds, so even a command like `nomad version` triggers a NUMA scan.
Move the scan into the test helper setup.
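For illustration only (names are stand-ins, not the actual driver manager
code), the difference is package-level initialization versus a lazy call inside
the test helper:

```go
package drivermanager

type topology struct{} // placeholder for the scanned NUMA layout

func scanTopology() *topology { return &topology{} }

// Before: a package-level variable runs the scan at init time for every
// importer, including production binaries, so even `nomad version` pays
// for it.
//
//	var nodeTopology = scanTopology()

// After: the scan happens only when a test helper explicitly asks for it.
func newTestTopology() *topology {
	return scanTopology()
}
```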
When an allocation fails it triggers an evaluation. The evaluation is processed
and the scheduler sees it needs to reschedule, which triggers a follow-up
eval. The follow-up eval creates a plan to `(stop 1) (place 1)`. The replacement
alloc has a `RescheduleTracker` (or gets its `RescheduleTracker` updated).
But when the follow-up eval can't place all allocs (there aren't enough
resources), it can create a partial plan to `(stop 1) (place 0)` and then
creates a blocked eval. The plan applier stops the failed alloc. When the
blocked eval is processed, the job is missing an allocation, so the scheduler
creates a new allocation. From the scheduler's perspective this allocation is
_not_ a replacement, so it isn't handed a `RescheduleTracker`.
This changeset fixes the bug by annotating the reschedule tracker whenever the
scheduler can't place a replacement allocation. When filtering out allocations
to pass to the reschedule tracker, we check this annotation for allocations
that have the `stop` desired status. I've also included tests that cover this
case and expand coverage of the relevant area of the code.
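A simplified sketch of the filtering idea (field and function names are
stand-ins, not the actual scheduler code): stopped allocations are normally
dropped, but the annotated ones are kept so the eventual replacement still
inherits the reschedule history.

```go
package main

import "fmt"

type allocation struct {
	ID                  string
	DesiredStatus       string // "run", "stop", ...
	ClientStatus        string // "failed", "complete", ...
	RescheduleAnnotated bool   // set when stopped without a placed replacement
}

// allocsForRescheduleTracking drops stopped allocations except those that
// were annotated because the scheduler couldn't place their replacement.
func allocsForRescheduleTracking(allocs []*allocation) []*allocation {
	var out []*allocation
	for _, a := range allocs {
		if a.DesiredStatus == "stop" && !a.RescheduleAnnotated {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	allocs := []*allocation{
		{ID: "failed-alloc", DesiredStatus: "stop", ClientStatus: "failed", RescheduleAnnotated: true},
		{ID: "old-alloc", DesiredStatus: "stop", ClientStatus: "complete"},
	}
	for _, a := range allocsForRescheduleTracking(allocs) {
		fmt.Println(a.ID) // only failed-alloc feeds the RescheduleTracker
	}
}
```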
Fixes: https://github.com/hashicorp/nomad/issues/12147
Fixes: https://github.com/hashicorp/nomad/issues/17072