When setting up the timer for heartbeat invalidation, there's no control that
allows us to remove that timer when the node is GC'd. If the GC window is narrow
enough, it's possible to GC a node that has a waiting heartbeat timer. In this
case, we hit a bug where querying for the node returns `nil` and this is
incorrectly handled when checking for disconnect/reconnect state. Fix this bug
by correctly handling a `nil` node and allowing the `Node.Update` RPC to fire
normally (which then errors correctly).
Fixes: https://github.com/hashicorp/nomad/issues/23376
Ref: https://hashicorp.atlassian.net/browse/NET-10109
* Stopped status passed through to the statuses endpoint and observed on job model and steady-state panel
* Status passed to statuses endpoint and test for FE model statuses
Update `runc` to 1.1.13 to pick up build support for Go 1.22.4+, in order to
ensure we've resolved errors cloning processes into Linux namespaces for
libcontainer (`exec` driver) with new versions of Go and older but still
supported versions of glibc.
This changeset has two minor quirks:
* Testing shows that the reported issues is already resolved on `main` by
upgrading to Go 1.22.4 without this dependency bump, at least for glibc 2.31.
Upgrading the dependency should make sure there isn't another glibc version
where the problem will still appear.
* This version of `runc` refers to fields in `cilium/ebpf` which are not present
in more recent versions of that library. So in order to build, we have to
downgrade `cilium/ebpf`. Fortunately, `runc` is the only consumer of that
transitive dependency.
Closes: https://github.com/hashicorp/nomad/issues/20212
Ref: https://hashicorp.atlassian.net/browse/NET-10078
* Rework of attributes using pathTree
* Pack meta reintroduced and made local
* attributes table test updated for new pathTree syntax
* removed flat import and extended the PathTree type signature to include prefix
* Slightly darken the is-faded text in tables
Nomad client agents run as privileged processes and require access to much of
the cluster state, secrets, etc. to operate. But we can improve upon this by
tightening up the virtual policy that use for RPC requests authenticated by the
node secret ID. This changeset removes the `node:read`, `plugin:read`, and
`plugin:list` policy, as well as namespace operations. In return, we add a
`AllowClientOp` check to the RPCs the client uses that would otherwise need
those policies.
Where possible, the update RPCs have also been changed to match on node ID so
that a client can only make the RPC that impacts itself. In future work, we may
be able to downscope further by adding node pool filtering to `AllowClientOp`.
Ref: https://github.com/hashicorp/nomad-enterprise/issues/1528
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1529
Ref: https://hashicorp.atlassian.net/browse/NET-9925
The NUMA topology struct field `NodeIDs` is a `idset.Set`, which has no public
members. As a result, this field is never serialized via msgpack and persisted
in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil
field and panics the scheduler worker.
Ideally we would fix this by adding a msgpack serialization extension, but
because the field already exists and is just always empty, this breaks RPC wire
compatibility across upgrades. Instead, create a new field that's populated at
the same time we populate the more useful `idset.Set`, and repopulate the set on
demand.
Fixes: https://hashicorp.atlassian.net/browse/NET-9924
Update the Consul/Vault build downloader functions so that we include the
current prerelease build (if any) in our E2E compatibility testing we do on each
PR. This will automatically cycle out when the GA build is released, because
that build is "higher" in the sorted set.
The `testing.go` test helpers file for the driver manager initializes the NUMA
scan as a package-global variable. This causes it to be pulled in even in
production builds, so even running commands like `nomad version` will cause the
NUMA scan to happen. Move the scan into the test helper setup.