Some time ago the Windows host we were using as a Nomad client agent test target
started failing to allow ssh connections. The underlying problem appears to be
with sysprep but I wasn't able to debug the exact cause as it's not an area I
have a lot of expertise in.
Swap out the deprecated Windows 2016 host for a Windows 2022 host. This will use
a base image provided by Amazon and then we'll use a userdata script to
bootstrap ssh and some target directories for Terraform to upload files to. The
more modern Windows will let us drop some of extra powershell scripts we were
using as well.
Fixes: https://hashicorp.atlassian.net/browse/NMD-151
Fixes: https://github.com/hashicorp/nomad-e2e/issues/125
As of April 1, Docker Hub rate limits tightened. With only 10 pulls/hr/IP, we're
likely to encounter test failures. Switch all Docker images getting pulled from
this repository to use the HashiCorp managed registry mirror.
Note that most of our tests in `drivers/docker` don't pull from the remote
registry but load a local image, while others will need to pull from the remote
and fetch different images depending on OS/arch. Refactor the definition of test
task configuration to make it clear which is which, and de-factor some false
sharing of setup functions.
Updates the E2E tests to use that registry by configuring the Docker
daemon. This required changing out a few container images that we don't have in
the registry, but these new images are all smaller. There are a couple of tests
that still use explicitly-tagged `docker.io` images or other third-party
registries, which have been left in place.
Ref: https://hashicorp.atlassian.net/browse/NET-12233
update E2E images to those in the registry mirror
fix windows and docklog test build
fix stopsignal test
mop-up
more mop-up
No functional changes, just cleaning up deprecated usages that are
removed in v2 and replace one call of .Slice with .ForEach to avoid
making the intermediate copy.
Fix the e2e case where we download the go-getter bomb.zip test file, which
is being removed on main. We can still get it from the version tag - yay git!
* artifact: protect against unbounded artifact decompression
Starting with 1.5.0, set defaut values for artifact decompression limits.
artifact.decompression_size_limit (default "100GB") - the maximum amount of
data that will be decompressed before triggering an error and cancelling
the operation
artifact.decompression_file_count_limit (default 4096) - the maximum number
of files that will be decompressed before triggering an error and
cancelling the operation.
* artifact: assert limits cannot be nil in validation
This PR fixes the artifact sandbox (new in Nomad 1.5) to allow downloading
artifacts into the shared 'alloc' directory made available to each task in
a common allocation. Previously we assumed the 'alloc' dir would be mounted
under the 'task' dir, but this is only the case in fs isolation: chroot; in
other modes the alloc dir is elsewhere.
* client: sandbox go-getter subprocess with landlock
This PR re-implements the getter package for artifact downloads as a subprocess.
Key changes include
On all platforms, run getter as a child process of the Nomad agent.
On Linux platforms running as root, run the child process as the nobody user.
On supporting Linux kernels, uses landlock for filesystem isolation (via go-landlock).
On all platforms, restrict environment variables of the child process to a static set.
notably TMP/TEMP now points within the allocation's task directory
kernel.landlock attribute is fingerprinted (version number or unavailable)
These changes make Nomad client more resilient against a faulty go-getter implementation that may panic, and more secure against bad actors attempting to use artifact downloads as a privilege escalation vector.
Adds new e2e/artifact suite for ensuring artifact downloading works.
TODO: Windows git test (need to modify the image, etc... followup PR)
* landlock: fixup items from cr
* cr: fixup tests and go.mod file