* HashiCorp Design System upgraded to 3.6.0
* Fresh yarn
* Responses out of range are brought back within range
* General pass at a11y fixes with updated components and node
* Further tooltip updates
* 3 more partitions worth of toggle and tooltip updates
* scale-events-accordion and topo-viz node fixes
When a job eval is blocked due to missing capacity, the `nomad job run`
command will monitor the deployment, which may succeed once additional
capacity is made available.
But the current implementation would exit with code `2` even when the
deployment succeeded because it only took the first eval status into account.
This commit updates the eval monitoring logic to reset the scheduling
error state if the deployment eventually succeeds.
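The reset described above can be sketched roughly as follows. This is a minimal illustration, not the actual monitor code: the `deploymentStatus` type, its constants, and the `exitCode` function are hypothetical names invented for this example.

```go
package main

import "fmt"

// deploymentStatus is a hypothetical stand-in for the statuses a monitor
// would observe; these names are illustrative, not Nomad's API.
type deploymentStatus string

const (
	deploymentRunning    deploymentStatus = "running"
	deploymentSuccessful deploymentStatus = "successful"
)

// exitCode returns the code the monitor would report. The scheduling error
// starts out set when the first eval was blocked, and is reset as soon as
// the deployment is observed to succeed.
func exitCode(firstEvalBlocked bool, statuses []deploymentStatus) int {
	schedulingErr := firstEvalBlocked
	for _, s := range statuses {
		if s == deploymentSuccessful {
			schedulingErr = false
		}
	}
	if schedulingErr {
		return 2
	}
	return 0
}

func main() {
	// Blocked at first, but capacity arrived and the deployment succeeded.
	fmt.Println(exitCode(true, []deploymentStatus{deploymentRunning, deploymentSuccessful})) // 0
	// Blocked and never succeeded: the scheduling error is still reported.
	fmt.Println(exitCode(true, []deploymentStatus{deploymentRunning})) // 2
}
```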
Add a new configuration option on a task's volume_mount to give fine-grained control over the SELinux "z" label
* Update website/content/docs/job-specification/volume_mount.mdx
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* fix: typo
* func: make volume mount verification happen even on mounts with no volume
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Although Nomad itself is not vulnerable to CVE-2024-21626, we want to update
dependencies that bring in the vulnerable packages so as not to trip
vulnerability scanners. Update `containerd` and `go-dockerclient` as well as the
various transitive dependencies these bring in.
We don't run the whole suite of unit tests on all platforms to keep CI times
reasonable, so the only things we've been running on Windows are
platform-specific.
I'm working on some platform-specific `template` related work and having these
tests run on Windows will reduce the risk of regressions. Our Windows CI box
doesn't have Consul or Vault, so I've skipped those tests for the time being,
and can follow up with that later. There's also a test with assertions looking
for specific paths, and the results are different on Windows. I've skipped those
for the moment as well and will follow up under a separate PR.
Also swap `testify` for `shoenig/test`
This PR refactors a helper function for getting the UID associated with
a given username to also return the GID and home directory. Also adds
unit tests on the known values of the root and nobody users on Ubuntu Linux.
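A helper of this shape could look something like the sketch below, built on the standard library's `os/user` package. The function name `lookupUser` and its signature are assumptions made for illustration; the actual helper in the PR may differ.

```go
package main

import (
	"fmt"
	"os/user"
	"strconv"
)

// lookupUser is a hypothetical helper that resolves a username to its UID,
// GID, and home directory. It assumes a POSIX system where the IDs are
// decimal numbers; on Windows the Uid field is a SID and Atoi would fail.
func lookupUser(username string) (uid, gid int, home string, err error) {
	u, err := user.Lookup(username)
	if err != nil {
		return 0, 0, "", fmt.Errorf("failed to look up user %q: %w", username, err)
	}
	uid, err = strconv.Atoi(u.Uid)
	if err != nil {
		return 0, 0, "", fmt.Errorf("failed to parse uid %q: %w", u.Uid, err)
	}
	gid, err = strconv.Atoi(u.Gid)
	if err != nil {
		return 0, 0, "", fmt.Errorf("failed to parse gid %q: %w", u.Gid, err)
	}
	return uid, gid, u.HomeDir, nil
}

func main() {
	uid, gid, home, err := lookupUser("root")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(uid, gid, home)
}
```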
Some packages licensed under MPL-2.0 were incorrectly importing code
from packages licensed under BUSL-1.1.
Not all imports are fixed here as they will require additional work to
untangle them. To help track progress this commit adds a Semgrep rule
that detects incorrect BUSL-1.1 imports in MPL-2.0 packages.
Fixes #19781
Do not mark the envoy bootstrap hook as done after successfully running once.
Since the bootstrap file is written to /secrets, which is a tmpfs on supported
platforms, it is not persisted across reboots. This causes the task and
allocation to fail on reboot (see #19781).
This fixes it by *always* rewriting the envoy bootstrap file every time the
Nomad agent starts. This does mean we may write a new bootstrap file to an
already running Envoy task, but in my testing that doesn't have any impact.
This commit doesn't necessarily fix every use of Done by hooks, but hopefully
improves the situation. The comment on Done has been expanded to hopefully
avoid misuse in the future.
Done assertions were removed from tests as they add more noise than value.
*Alternative 1: Use a regular file*
An alternative approach would be to write the bootstrap file somewhere
other than the tmpfs, but this is *unsafe* as when Consul ACLs are
enabled the file will contain a secret token:
https://developer.hashicorp.com/consul/commands/connect/envoy#bootstrap
*Alternative 2: Detect if file is already written*
An alternative approach would be to detect if the bootstrap file exists,
and only write it if it doesn't.
This is just a more complicated form of the current fix. I think in
general in the absence of other factors task hooks should be idempotent
and therefore able to rerun on any agent startup. This simplifies the
code and our ability to reason about task restarts vs agent restarts vs
node reboots by making them all take the same code path.
Script checks don't support Consul's `success_before_passing`, `failures_before_critical`, or `failures_before_warning` because they're run by Nomad and not by Consul
Adds Namespace UI to Access Control - Also adds two-step buttons to other Access Control pages
---------
Co-authored-by: Phil Renaud <phil@riotindustries.com>
The new `nomad setup vault -check` command can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
* client/allocdir: use an interface in place of AllocDir structs
This PR replaces *allocdir.AllocDir with allocdir.Interface so that we
may eventually have another implementation of alloc directories. This is
in support of the exec2 driver, which will need an implementation of the
alloc directory incompatible with the current version.
* use rlock
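As a rough illustration of swapping a concrete struct pointer for an interface, consider the sketch below. The method set (`Build`, `AllocDirPath`), the `defaultAllocDir` type, and the `prepare` function are assumptions made for this example and are much smaller than whatever the real `allocdir.Interface` contains.

```go
package main

import (
	"fmt"
	"sync"
)

// Interface is a hypothetical, reduced view of an alloc directory.
type Interface interface {
	Build() error
	AllocDirPath() string
}

// defaultAllocDir is one implementation; another driver could supply its own.
type defaultAllocDir struct {
	mu    sync.RWMutex
	path  string
	built bool
}

// Build mutates state, so it takes the write lock.
func (d *defaultAllocDir) Build() error {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.built = true
	return nil
}

// AllocDirPath only reads state, so a read lock is sufficient.
func (d *defaultAllocDir) AllocDirPath() string {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.path
}

// prepare depends only on the interface, not on the concrete struct.
func prepare(d Interface) (string, error) {
	if err := d.Build(); err != nil {
		return "", err
	}
	return d.AllocDirPath(), nil
}

func main() {
	path, err := prepare(&defaultAllocDir{path: "/tmp/alloc-example"})
	if err != nil {
		panic(err)
	}
	fmt.Println(path)
}
```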
When jobs are deleted with -purge, all their deployments and allocations should
be deleted from the state store, and the evals' status should be set to complete.
Otherwise we end up in a situation where users could re-submit previously
failing jobs, but these new jobs would not get deployments allocated unless
system gc got called.
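The intended purge behavior can be modeled with a toy in-memory store. Everything here (`store`, `eval`, `purgeJob`, the map layout) is hypothetical and far simpler than Nomad's state store; it only shows deployments and allocations being removed while evals are marked complete.

```go
package main

import "fmt"

// eval is a hypothetical, minimal evaluation record.
type eval struct {
	JobID  string
	Status string
}

// store is a toy stand-in for a state store, keyed by job ID.
type store struct {
	deployments map[string][]string
	allocs      map[string][]string
	evals       []*eval
}

// purgeJob removes the job's deployments and allocations and marks its
// evals complete, so a re-submitted job starts from a clean slate.
func (s *store) purgeJob(jobID string) {
	delete(s.deployments, jobID)
	delete(s.allocs, jobID)
	for _, e := range s.evals {
		if e.JobID == jobID {
			e.Status = "complete"
		}
	}
}

func main() {
	s := &store{
		deployments: map[string][]string{"web": {"d1"}},
		allocs:      map[string][]string{"web": {"a1", "a2"}},
		evals:       []*eval{{JobID: "web", Status: "blocked"}},
	}
	s.purgeJob("web")
	fmt.Println(len(s.deployments), len(s.allocs), s.evals[0].Status)
}
```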
Even with the new workload identity-based flow, the Nomad servers still
need the `acl = "write"` permission in order to revoke service identity
tokens.
* e2e: move rawexec oversub tests into oversubscription e2e test suite
This PR moves two tests for raw_exec and memory oversubscription into
the oversubscription test suite, which has the necessary plumbing to
activate and restore the oversubscription configuration of the scheduler
during the test.
* cr: rename files for better readability
When using the no-op Vault client the Nomad server still needs to delete
the revoked Vault accessors from state to prevent them from lingering
forever after the cluster migrates to the workload identity flow.