Commit Graph

893 Commits

Author SHA1 Message Date
Juanadelacuesta
445b19ce3e docs: update func docs 2024-10-30 12:35:06 +01:00
Juanadelacuesta
f707a02f4d fix: update test to force recreation of idvalidator 2024-10-30 12:28:59 +01:00
Juanadelacuesta
bba0407250 style: remove unused code and duplicated test 2024-10-30 11:43:04 +01:00
Juanadelacuesta
3fa2717195 style: remove unused code 2024-10-30 11:36:25 +01:00
Juanadelacuesta
a491ceff5f fix: put back MSL license header 2024-10-30 11:25:27 +01:00
Juanadelacuesta
e1a0c7cb43 fix: move exclusive unix test back from driver tests 2024-10-30 11:22:41 +01:00
Juanadelacuesta
9a6d2648c8 style: improve debug logging 2024-10-30 11:21:51 +01:00
Juanadelacuesta
2b9bb7a289 license: change missing file to BUSL 2024-10-30 10:24:35 +01:00
Juanadelacuesta
1751b618e4 func: Add conditional to validation init, to allow for easy testing 2024-10-29 16:45:33 +01:00
Juanadelacuesta
a9a452341c license: update headers to BUSL 2024-10-29 15:54:09 +01:00
Juanadelacuesta
0227788e22 fix: update tests configuration 2024-10-29 15:24:12 +01:00
Juanadelacuesta
0cd1b5ff13 func: move the validation to a dependency and use id sets 2024-10-28 18:59:51 +01:00
Juanadelacuesta
65be613be9 fix: rename test to avoid conflict 2024-10-28 12:17:57 +01:00
Juanadelacuesta
d77dc7dfa4 style: format 2024-10-28 11:46:51 +01:00
Juanadelacuesta
ed04b1bf64 style: remove print 2024-10-28 11:35:03 +01:00
Mike Nomitch
fd7e81dbce Fixing accidental move of helper fn to unix only validators file 2024-10-28 11:15:41 +01:00
Mike Nomitch
c4f2a41da6 Splitting validators unix functions into own file 2024-10-28 11:15:41 +01:00
Mike Nomitch
ff5ab3776c Tweaking user lookup code 2024-10-28 11:15:41 +01:00
Mike Nomitch
e1c226e633 Restructuring IDRange 2024-10-28 11:15:41 +01:00
Mike Nomitch
0fbf592131 moving user out of validators 2024-10-28 11:15:41 +01:00
Mike Nomitch
916af5a948 Moving idrange struct location 2024-10-28 11:15:41 +01:00
Mike Nomitch
9565dde138 Only parsing id ranges once 2024-10-28 11:15:41 +01:00
Mike Nomitch
d0049b1e63 Fixed error in denied_uids spec 2024-10-28 11:15:41 +01:00
Mike Nomitch
6b6a1b5bc4 Fixed windows build error 2024-10-28 11:15:41 +01:00
Mike Nomitch
cf36509474 Removing unnecessary int conversion 2024-10-28 11:15:40 +01:00
Mike Nomitch
9cc3992ca6 Adds ability to restrict uid and gids in exec and raw_exec 2024-10-28 11:15:37 +01:00
Seth Hoenig
b539b54c9e docker: close hijacked write connection when exec ends (#24244) 2024-10-17 11:41:29 -05:00
Seth Hoenig
b18851617f docker: close response connection once stdin is exhausted (#24202) 2024-10-17 11:07:23 -05:00
Piotr Kazmierczak
1ac14f4869 docker: always use API version negotiation when initializing clients (#24237)
During a refactoring of the docker driver in #23966 we introduced a bug: API
version negotiation option was not passed to every new client call.
2024-10-17 15:23:14 +02:00
Tim Gross
d12128c380 docker: use streaming stats collection to correct CPU stats (#24229)
In #23966 we switched to the official Docker SDK for the `docker` driver. In the
process we refactored code around stats collection to use the "one shot" version
of stats. Unfortunately this "one shot" stats collection does not include the
`PreCPU` stats, which are the stats from the previous read. This breaks the
calculation we use to determine CPU ticks, because now we're subtracting 0 from
the current value to get the delta.

Switch back to using the streaming stats collection. Add a test that fully
exercises the `TaskStats` API.

Fixes: https://github.com/hashicorp/nomad/issues/24224
Ref: https://hashicorp.atlassian.net/browse/NET-11348
2024-10-17 08:25:59 -04:00
Piotr Kazmierczak
f9cbaaf6c7 docker: fix a bug where auth for private registries wasn't parsed correctly (#24215)
In #23966 we introduced an official Docker client and did not notice that in
contrast to our previous 3rd party client, the official SDK PullOptions object
expects a base64 encoded JSON with username and password, instead of username/
password pair.
2024-10-16 22:04:54 +02:00
Tim Gross
6b8ddff1fa windows: set job object for executor and children (#24214)
On Windows, if the `raw_exec` driver's executor exits, the child processes are
not also killed. Create a Windows "job object" (not to be confused with a Nomad
job) and add the executor to it. Child processes of the executor will inherit
the job automatically. When the handle to the job object is freed (on executor
exit), the job itself is destroyed and this causes all processes in that job to
exit.

Fixes: https://github.com/hashicorp/nomad/issues/23668
Ref: https://learn.microsoft.com/en-us/windows/win32/procthread/job-objects
2024-10-16 09:20:26 -04:00
Tim Gross
fec91d1dc8 windows: trade heap for stack to build process tree for stats in linear space (#24182)
In #20619 we overhauled how we were gathering stats for Windows
processes. Unlike in Linux where we can ask for processes in a cgroup, on
Windows we have to make a single expensive syscall to get all the processes and
then build the tree ourselves. Our algorithm to do so is recursive and quadratic
in both steps and space with the number of processes on the host. For busy hosts
this hits the stack limit and panics the Nomad client.

We already build a map of parent PID to PID, so modify this to be a map of
parent PID to slice of children and then traverse that tree only from the root
we care about (the executor PID). This moves the allocations to the heap but
makes the stats gathering linear in steps and space required.

This changeset also moves as much of this code as possible into an area
 not conditionally-compiled by OS, as the tagged test file was not being run in CI.

Fixes: https://github.com/hashicorp/nomad/issues/23984
2024-10-14 11:26:38 -04:00
Tim Gross
e9ba630639 docker: fix script check execution (#24098)
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call will prevent this from
running successfully with an error that the exec has already run.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
2024-10-01 16:41:38 -04:00
Tim Gross
7a88d5d626 docker: fix non-streaming exec attachment (#24095)
In ##23966 when we switched to using the official Docker SDK client, this
included new API calls for attaching to the "exec objects" created for running
processes inside a running Docker task. When we updated the API for the
non-streaming cases (script health checks, and `change_mode = "script"`), we
used the container ID and not the exec object ID. These IDs aren't identical
because you can have multiple exec objects for a given container. This results
in errors like "unable to upgrade to tcp, received 404" because the Docker API
can't find the exec object with the container ID.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
2024-10-01 11:27:13 -04:00
Tim Gross
bf0a65f2d6 docker: reset timer after collecting stats (#24092)
In ##23966 when we switched to using the official Docker SDK client, we had to
rework the stats collection loop for the new client. But we missed resetting the
timer on the collection loop, which meant that we'd only collect stats once and
then never again.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=550814)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
2024-10-01 08:31:03 -04:00
Tim Gross
154aeb77af docker: fix bug in waiting for container to exit (#24081)
In ##23966 when we switched to using the official Docker SDK client, we had more
contexts to add because most of the library methods take one. But for some APIs
like waiting for a container to exit after we've started it, we never want to
close this context, because the operation can outlive the Nomad agent itself.
2024-09-30 08:50:07 -04:00
Piotr Kazmierczak
ec42aa2a1b docker: use docker errdefs instead of string comparisons when checking errors (#24075) 2024-09-27 15:32:29 +02:00
Piotr Kazmierczak
981ca36049 docker: use official client instead of fsouza/go-dockerclient (#23966)
This PR replaces fsouza/go-dockerclient 3rd party docker client library with
docker's official SDK.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-09-26 18:41:44 +02:00
Seth Hoenig
51215bf102 deps: update to go-set/v3 and refactor to use custom iterators (#23971)
* deps: update to go-set/v3

* deps: use custom set iterators for looping
2024-09-16 13:40:10 -05:00
Tim Gross
192d70cee7 docker: update infra_image to new registry (#23927)
The gcr.io container registry is shutting down in March. Update the default
`image_image` for Docker's "pause" containers to point to the new location
hosted by the k8s project.

Fixes: https://github.com/hashicorp/nomad/issues/23911
Ref: https://hashicorp.atlassian.net/browse/NET-10942
2024-09-06 14:34:03 -04:00
Tim Gross
6aa503f2bb docker: disable cpuset management for non-root clients (#23804)
Nomad clients manage a cpuset cgroup for each task to reserve or share CPU
cores. But Docker owns its own cgroups, and attempting to set a parent cgroup
that Nomad manages runs into conflicts with how runc manages cgroups via
systemd. Therefore Nomad must run as root in order for cpuset management to ever
be compatible with Docker.

However, some users running in unsupported configurations felt that the changes
we made in Nomad 1.7.0 to ensure Nomad was running correctly represented a
regression. This changeset disables cpuset management for non-root Nomad
clients. When running Nomad as non-root, the driver will not longer reconcile
cpusets with Nomad and `resources.cores` will behave incorrectly (but the driver
will still run).

Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. This PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.

Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
Ref: https://hashicorp.atlassian.net/browse/NET-10652
Ref: https://github.com/opencontainers/runc/blob/main/docs/systemd.md
2024-08-14 16:44:13 -04:00
Tim Gross
9543e740af docker: fix delimiter for selinux label for read-only volumes (#23750)
The Docker driver's `volume` field to specify bind-mounts takes a list of
strings that consist of three `:`-delimited fields: source, destination, and
options. We append the SELinux label from the plugin configuration as the third
field. But when the user has already specified the volume is read-only with
`:ro`, we're incorrectly appending the SELinux label with another `:` instead of
the required `,`.

Combine the options into a single field value before appending them to the bind
mounts configuration. Updated the tests to split out Windows behavior (which
doesn't accept options) and to ensure the test task has the expected environment
for bind mounts.

Fixes: https://github.com/hashicorp/nomad/issues/23690
2024-08-08 09:08:01 -04:00
Tim Gross
b25f1b66ce resources: allow job authors to configure size of secrets tmpfs (#23696)
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.

This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.

Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.

For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.

Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
2024-08-05 16:06:58 -04:00
Piotr Kazmierczak
f22ce921cd docker: adjust capabilities on Windows (#23599)
Adjusts Docker capabilities per OS, and checks for runtime on Windows.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-07-17 09:01:45 +02:00
Tim Gross
eedbd36fef qemu: pass task resources into driver for cgroup setup (#23466)
As part of the work for 1.7.0 we moved portions of the task cgroup setup down
into the executor. This requires that the executor constructor get the
`TaskConfig.Resources` struct, and this was missing from the `qemu` driver. We
fixed a panic caused by this change in #19089 before we shipped, but this fix
was effectively undo after we added plumbing for custom cgroups for `raw_exec`
in 1.8.0. As a result, running `qemu` tasks always fail on Linux.

This was undetected in testing because our CI environment doesn't have QEMU
installed. I've got all the unit tests running locally again and have added QEMU
installation when we're running the drivers tests.

Fixes: https://github.com/hashicorp/nomad/issues/23250
2024-07-01 11:41:10 -04:00
Piotr Kazmierczak
d5e1515e80 docker: default to hyper-v isolation on Windows (#23452) 2024-07-01 08:56:43 +02:00
Piotr Kazmierczak
0ece7b5c16 docker: validate that containers do not run as ContainerAdmin on Windows (#23443)
This enables checks for ContainerAdmin user on docker images on Windows. It's
only checked if users run docker with process isolation and not hyper-v,
because hyper-v provides its own, proper sandboxing.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-06-27 16:22:24 +02:00
Piotr Kazmierczak
85430be6dd raw_exec: oom_score_adj support (#23308) 2024-06-14 11:36:27 +02:00
Luke Palmer
75874136ac fix cgroup setup for non-default devices (#22518) 2024-06-13 09:27:19 -04:00