After changes introduced in #23284 we no longer need to make a if
!st.SupportsNUMA() check in the GetNodes() topology method. In fact this check
will now cause panic in nomadTopologyToProto method on systems that don't
support NUMA.
As part of the work for 1.7.0 we moved portions of the task cgroup setup down
into the executor. This requires that the executor constructor get the
`TaskConfig.Resources` struct, and this was missing from the `qemu` driver. We
fixed a panic caused by this change in #19089 before we shipped, but this fix
was effectively undo after we added plumbing for custom cgroups for `raw_exec`
in 1.8.0. As a result, running `qemu` tasks always fail on Linux.
This was undetected in testing because our CI environment doesn't have QEMU
installed. I've got all the unit tests running locally again and have added QEMU
installation when we're running the drivers tests.
Fixes: https://github.com/hashicorp/nomad/issues/23250
We no longer intend to release 32-bit builds for any platform. We'd previously
removed the builds for i386 on both Linux and Windows, but never got around to
removing the ARM builds. Add a note about this deprecation in the release notes
for 1.8.x.
Configuration changes to use backport assistant with LTS support. These include:
* adding a manifest file for active releases
* adding configuration to send backport to ENT repo
* add LICENSE(.txt) to zip that goes on releases.hashicorp.com
* add LICENSE(.txt) to linux packages and docker image
* add some more docker labels (including license)
The Nomad client renders templates in the same privileged process used for most
other client operations. During internal testing, we discovered that a malicious
task can create a symlink that can cause template rendering to read and write to
arbitrary files outside the allocation sandbox. Because the Nomad agent can be
restarted without restarting tasks, we can't simply check that the path is safe
at the time we write without encountering a time-of-check/time-of-use race.
To protect Nomad client hosts from this attack, we'll now read and write
templates in a subprocess:
* On Linux/Unix, this subprocess is sandboxed via chroot to the allocation
directory. This requires that Nomad is running as a privileged process. A
non-root Nomad agent will warn that it cannot sandbox the template renderer.
* On Windows, this process is sandboxed via a Windows AppContainer which has
been granted access to only to the allocation directory. This does not require
special privileges on Windows. (Creating symlinks in the first place can be
prevented by running workloads as non-Administrator or
non-ContainerAdministrator users.)
Both sandboxes cause encountered symlinks to be evaluated in the context of the
sandbox, which will result in a "file not found" or "access denied" error,
depending on the platform. This change will also require an update to
Consul-Template to allow callers to inject a custom `ReaderFunc` and
`RenderFunc`.
This design is intended as a workaround to allow us to fix this bug without
creating backwards compatibility issues for running tasks. A future version of
Nomad may introduce a read-only mount specifically for templates and artifacts
so that tasks cannot write into the same location that the Nomad agent is.
Fixes: https://github.com/hashicorp/nomad/issues/19888
Fixes: CVE-2024-1329
We don't run the whole suite of unit tests on all platforms to keep CI times
reasonable, so the only things we've been running on Windows are
platform-specific.
I'm working on some platform-specific `template` related work and having these
tests run on Windows will reduce the risk of regressions. Our Windows CI box
doesn't have Consul or Vault, so I've skipped those tests for the time being,
and can follow up with that later. There's also a test with assertions looking
for specific paths, and the results are different on Windows. I've skipped those
for the moment as well and will follow up under a separate PR.
Also swap `testify` for `shoenig/test`
Set up a new test suite that exercises Nomad's compatibility with Consul. This
suite installs all currently supported versions of Consul, spins up a Consul
agent with appropriate configuration, and a Nomad agent running in dev
mode. Then it runs a Connect job against each pair.
* Color the status cell for servers and nodes
* Testfix and changelog
* Leader indicator moved post-word
* Icon and badge treatment
* Capitalizing test checks
* HDS badges dont expose statusClass like we used to, so stop checking for it
6747ef8803 fixes the Nomad client to support using the raw_exec
driver while running as a non-root user. Remove the use of sudo
in the test-e2e workflow for running integration (vaultcompat)
tests.
Trusted Supply Chain Component Registry (TSCCR) enforcement starts Monday and an
internal report shows our semgrep action is pinned to a version that's not
currently permitted. Update all the action versions to whatever's the new
hotness to maximum the time-to-live on these until we have automated pinning
setup.
Also version bumps our chromedriver action, which randomly broke upstream today.
This adds a quick smoke test of our binaries to verify we haven't exceeeded the
maximum GLIBC (2.17) version during linking which would break our ability to
execute on EL7 machines.
this is basically to avoid Fear/Uncertainty/Doubt
the github action actions/setup-go
(and, with a different chache key, hashicorp/setup-golang)
caches both GOMODCACHE (go source files), which is good,
and GOCACHE (build outputs), which *might* be bad,
if the cache was built on an OS with an older glibc
than we want to support. from `go help cache`:
> [...] the build cache does not detect changes to
> C libraries imported with cgo.
so in enterprise we can use Vault for secrets,
without merge conflicts from oss->ent.
also:
* use hashicorp/setup-golang
* setup-js for self-hosted runners
they don't come with yarn, nor chrome,
and might not always match node version.
Update the revision used by the docker action. This should always reflect the commit that's being built as this may differ from the default <github.sha> that the workflow was invoked at.
Goes with https://github.com/hashicorp/actions-docker-build/pull/59 - and should not be merged until this PR is merged and a new version of the action is cut.
Although #17669 fixed the permissions of the release pipeline to push new
commits, there was still an error when invoking the `build` workflow.
The format of the reference was changed in #17103 such that we're sending the
git ref (a SHA) and not the "--ref" argument required by the GH actions workflow
API, which in this case is apparently specially defined as "The branch or tag
name which contains the version of the workflow file you'd like to run" and not
what git calls a "ref".
This changeset:
* Removes the third-party action entirely so that we're using GitHub's own
tooling. This removes one more thing from the supply chain to pin and ensures a
1:1 mapping of args to what's documented by GitHub.
* Removes the `--ref` argument entirely, which causes it to default to the
current branch that the release workflow is running on (which is always what
we want).
Since the matrix exercises different test cases, it's better to allow
all partitions to completely run, even if one of them fails, so it's
easier to catch multiple test failures.
In #17103 we set read-only permissions on all the workflows. Unfortunately we
missed that the `release` workflow makes git commits and pushes them to the
repository, so it needs to have write permissions.