Commit Graph

417 Commits

Author SHA1 Message Date
James Rasell
0726e4cc3e driver/docker: Fix container CPU stats collection (#24768)
The recent change to collection via a "one-shot" Docker API call
did not update the stream boolean argument. This results in the
PreCPUStats values being zero and therefore breaking the CPU
calculations which rely on this data. The base fix is to update
the passed boolean parameter to match the desired non-streaming
behaviour. The non-streaming API call correctly returns the
PreCPUStats data which can be seen in the added unit test.

The most recent change also modified the behaviour of the
collectStats go routine, so that any error encountered results in
the routine exiting. In the event this was a transient error, the
container will continue to run, however, no stats will be collected
until the task is stopped and replaced. This PR reverts the
behaviour, so that an error encountered during a stats collection
run results in the error being logged but the collection process
continuing with a backoff used.
2025-01-07 07:42:31 +00:00
Vincent Ducamps
6469b59a0a docker: Fix a bug where images with port number and no tags weren't parsed correctly 2025-01-03 11:38:43 +01:00
Juana De La Cuesta
a9e7166b6b [gh-24339] Move from streaming stats to polling for docker (#24525)
* fix: dont stream the docker stats, read them one by one

* func: add a NewSafeTicker to the herlper functions

* style: remove commented code
2024-11-21 17:36:53 +01:00
Seth Hoenig
b539b54c9e docker: close hijacked write connection when exec ends (#24244) 2024-10-17 11:41:29 -05:00
Seth Hoenig
b18851617f docker: close response connection once stdin is exhausted (#24202) 2024-10-17 11:07:23 -05:00
Piotr Kazmierczak
1ac14f4869 docker: always use API version negotiation when initializing clients (#24237)
During a refactoring of the docker driver in #23966 we introduced a bug: API
version negotiation option was not passed to every new client call.
2024-10-17 15:23:14 +02:00
Tim Gross
d12128c380 docker: use streaming stats collection to correct CPU stats (#24229)
In #23966 we switched to the official Docker SDK for the `docker` driver. In the
process we refactored code around stats collection to use the "one shot" version
of stats. Unfortunately this "one shot" stats collection does not include the
`PreCPU` stats, which are the stats from the previous read. This breaks the
calculation we use to determine CPU ticks, because now we're subtracting 0 from
the current value to get the delta.

Switch back to using the streaming stats collection. Add a test that fully
exercises the `TaskStats` API.

Fixes: https://github.com/hashicorp/nomad/issues/24224
Ref: https://hashicorp.atlassian.net/browse/NET-11348
2024-10-17 08:25:59 -04:00
Piotr Kazmierczak
f9cbaaf6c7 docker: fix a bug where auth for private registries wasn't parsed correctly (#24215)
In #23966 we introduced an official Docker client and did not notice that in
contrast to our previous 3rd party client, the official SDK PullOptions object
expects a base64 encoded JSON with username and password, instead of username/
password pair.
2024-10-16 22:04:54 +02:00
Tim Gross
e9ba630639 docker: fix script check execution (#24098)
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call will prevent this from
running successfully with an error that the exec has already run.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
2024-10-01 16:41:38 -04:00
Tim Gross
7a88d5d626 docker: fix non-streaming exec attachment (#24095)
In ##23966 when we switched to using the official Docker SDK client, this
included new API calls for attaching to the "exec objects" created for running
processes inside a running Docker task. When we updated the API for the
non-streaming cases (script health checks, and `change_mode = "script"`), we
used the container ID and not the exec object ID. These IDs aren't identical
because you can have multiple exec objects for a given container. This results
in errors like "unable to upgrade to tcp, received 404" because the Docker API
can't find the exec object with the container ID.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
2024-10-01 11:27:13 -04:00
Tim Gross
bf0a65f2d6 docker: reset timer after collecting stats (#24092)
In ##23966 when we switched to using the official Docker SDK client, we had to
rework the stats collection loop for the new client. But we missed resetting the
timer on the collection loop, which meant that we'd only collect stats once and
then never again.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=550814)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
2024-10-01 08:31:03 -04:00
Tim Gross
154aeb77af docker: fix bug in waiting for container to exit (#24081)
In ##23966 when we switched to using the official Docker SDK client, we had more
contexts to add because most of the library methods take one. But for some APIs
like waiting for a container to exit after we've started it, we never want to
close this context, because the operation can outlive the Nomad agent itself.
2024-09-30 08:50:07 -04:00
Piotr Kazmierczak
ec42aa2a1b docker: use docker errdefs instead of string comparisons when checking errors (#24075) 2024-09-27 15:32:29 +02:00
Piotr Kazmierczak
981ca36049 docker: use official client instead of fsouza/go-dockerclient (#23966)
This PR replaces fsouza/go-dockerclient 3rd party docker client library with
docker's official SDK.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-09-26 18:41:44 +02:00
Seth Hoenig
51215bf102 deps: update to go-set/v3 and refactor to use custom iterators (#23971)
* deps: update to go-set/v3

* deps: use custom set iterators for looping
2024-09-16 13:40:10 -05:00
Tim Gross
192d70cee7 docker: update infra_image to new registry (#23927)
The gcr.io container registry is shutting down in March. Update the default
`image_image` for Docker's "pause" containers to point to the new location
hosted by the k8s project.

Fixes: https://github.com/hashicorp/nomad/issues/23911
Ref: https://hashicorp.atlassian.net/browse/NET-10942
2024-09-06 14:34:03 -04:00
Tim Gross
6aa503f2bb docker: disable cpuset management for non-root clients (#23804)
Nomad clients manage a cpuset cgroup for each task to reserve or share CPU
cores. But Docker owns its own cgroups, and attempting to set a parent cgroup
that Nomad manages runs into conflicts with how runc manages cgroups via
systemd. Therefore Nomad must run as root in order for cpuset management to ever
be compatible with Docker.

However, some users running in unsupported configurations felt that the changes
we made in Nomad 1.7.0 to ensure Nomad was running correctly represented a
regression. This changeset disables cpuset management for non-root Nomad
clients. When running Nomad as non-root, the driver will not longer reconcile
cpusets with Nomad and `resources.cores` will behave incorrectly (but the driver
will still run).

Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. This PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.

Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
Ref: https://hashicorp.atlassian.net/browse/NET-10652
Ref: https://github.com/opencontainers/runc/blob/main/docs/systemd.md
2024-08-14 16:44:13 -04:00
Tim Gross
9543e740af docker: fix delimiter for selinux label for read-only volumes (#23750)
The Docker driver's `volume` field to specify bind-mounts takes a list of
strings that consist of three `:`-delimited fields: source, destination, and
options. We append the SELinux label from the plugin configuration as the third
field. But when the user has already specified the volume is read-only with
`:ro`, we're incorrectly appending the SELinux label with another `:` instead of
the required `,`.

Combine the options into a single field value before appending them to the bind
mounts configuration. Updated the tests to split out Windows behavior (which
doesn't accept options) and to ensure the test task has the expected environment
for bind mounts.

Fixes: https://github.com/hashicorp/nomad/issues/23690
2024-08-08 09:08:01 -04:00
Piotr Kazmierczak
f22ce921cd docker: adjust capabilities on Windows (#23599)
Adjusts Docker capabilities per OS, and checks for runtime on Windows.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-07-17 09:01:45 +02:00
Piotr Kazmierczak
d5e1515e80 docker: default to hyper-v isolation on Windows (#23452) 2024-07-01 08:56:43 +02:00
Piotr Kazmierczak
0ece7b5c16 docker: validate that containers do not run as ContainerAdmin on Windows (#23443)
This enables checks for ContainerAdmin user on docker images on Windows. It's
only checked if users run docker with process isolation and not hyper-v,
because hyper-v provides its own, proper sandboxing.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-06-27 16:22:24 +02:00
Piotr Kazmierczak
830297bcf0 docker: update image in TestDockerDriver_Start_Image_HTTPS (#23309) 2024-06-12 16:13:39 +02:00
Piotr Kazmierczak
0e8a67f0e1 docker: oom_score_adj support (#23297) 2024-06-12 10:49:20 +02:00
Tim Gross
71fd5c2474 testing: pull Docker images from mirror (#23190)
In https://github.com/hashicorp/nomad/pull/17401 we added test helpers that
would allow `docker` driver tests to pull from a mirror of the Docker Hub
registry. Extend the use of this helper a test that recently hit
rate-limiting.

Fixes: https://github.com/hashicorp/nomad/issues/23174
2024-06-06 11:21:45 -04:00
Piotr Kazmierczak
307fd590d7 docker: new container_exists_attempts configuration field (#22419)
This allows users to set a custom value of attempts that will be made to purge
an existing (not running) container if one is found during task creation.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-05-30 19:22:14 +02:00
Piotr Kazmierczak
bf11e39ac8 docker: add a unit test for "container already exists" error when creating containers (#22238) 2024-05-30 11:24:28 +02:00
Seth Hoenig
825efc3925 docker: use correct effective cpuset filename on legacy cgroups v1 systems (#20294) 2024-04-05 08:05:51 -05:00
Yorick Gersie
6124ee8afb cpuset fixer: use correct cgroup path for updates (#20276)
* cpuset fixer: use correct cgroup path for updates

fixes #20275

* docker: flatten switch statement and add test cases

* cl: add cl

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-04-04 15:54:10 -05:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
Seth Hoenig
4d83733909 tests: swap testify for test in more places (#20028)
* tests: swap testify for test in plugins/csi/client_test.go

* tests: swap testify for test in testutil/

* tests: swap testify for test in host_test.go

* tests: swap testify for test in plugin_test.go

* tests: swap testify for test in utils_test.go

* tests: swap testify for test in scheduler/

* tests: swap testify for test in parse_test.go

* tests: swap testify for test in attribute_test.go

* tests: swap testify for test in plugins/drivers/

* tests: swap testify for test in command/

* tests: fixup some test usages

* go: run go mod tidy

* windows: cpuset test only on linux
2024-02-29 12:11:35 -06:00
Seth Hoenig
e3c8700ded deps: upgrade to go-set/v2 (#18638)
No functional changes, just cleaning up deprecated usages that are
removed in v2 and replace one call of .Slice with .ForEach to avoid
making the intermediate copy.
2023-10-05 11:56:17 -05:00
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.

Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
James Rasell
a9d5beb141 test: use correct parallel test setup func (#18326) 2023-08-25 13:51:36 +01:00
Seth Hoenig
f5b0da1d55 all: swap exp packages for maps, slices (#18311) 2023-08-23 15:42:13 -05:00
Piotr Kazmierczak
53ef6391a5 drivers/docker: fix a hostConfigMemorySwappiness panic (#18238)
cgroupslib.MaybeDisableMemorySwappiness returned an incorrect type, and was
incorrectly typecast to int64 causing a panic on non-linux and non-windows hosts.
2023-08-17 14:45:31 +02:00
Tim Gross
f00bff09f1 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (ex. at one place I tested it occurs after only
24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper with a check on the cap before the bitshift to avoid overflow in all 
places this can occur.

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:38:18 -04:00
hashicorp-copywrite[bot]
2d35e32ec9 Update copyright file headers to BUSL-1.1 2023-08-10 17:27:15 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
Patric Stout
e190eae395 Use config "cpu_total_compute" (if set) for all CPU statistics (#17628)
Before this commit, it was only used for fingerprinting, but not
for CPU stats on nodes or tasks. This meant that if the
auto-detection failed, setting the cpu_total_compute didn't resolved
the issue.

This issue was most noticeable on ARM64, as there auto-detection
always failed.
2023-07-19 13:30:47 -05:00
Charlie Voiselle
3a687930bd Fix typos (#17962) 2023-07-18 13:31:36 -04:00
Devashish Taneja
b31e891e5f Include parent job ID as a Docker container label (#17843)
Fixes: #17751
2023-07-10 11:27:45 -04:00
Seth Hoenig
ec4fa55bbf drivers/docker: refactor use of clients in docker driver (#17731)
* drivers/docker: refactor use of clients in docker driver

This PR refactors how we manage the two underlying clients used by the
docker driver for communicating with the docker daemon. We keep two clients
- one with a hard-coded timeout that applies to all operations no matter
what, intended for use with short lived / async calls to docker. The other
has no timeout and is the responsibility of the caller to set a context
that will ensure the call eventually terminates.

The use of these two clients has been confusing and mistakes were made
in a number of places where calls were making use of the wrong client.

This PR makes it so that a user must explicitly call a function to get
the client that makes sense for that use case.

Fixes #17023

* cr: followup items
2023-06-26 15:21:42 -05:00
Piotr Kazmierczak
807907e001 chore: gofmt docker driver handle.go (#17721) 2023-06-26 10:38:23 +02:00
Johan Forssell
5b46b74b94 drivers: OOM kill logging for Docker driver (#17518)
Explicit error log of the docker ID and container image name
2023-06-26 10:13:23 +02:00
Seth Hoenig
89ce092b20 docker: stop network pause container of lost alloc after node restart (#17455)
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
2023-06-09 08:46:29 -05:00
KamilCuk
da9ec8ce1e Add group_add docker option (#17313) 2023-06-02 20:26:01 -04:00
Daniel Bennett
e0dd940439 tests: enable newer windows (#17401)
* "allow" (don't try to drop) linux capabilities
  in the docker test driver harness (see #15181)
* refactor to allow different busybox images
  since windows containers need to be the same
  version as the underlying OS, and we're
  moving from 2016 to 2019
* one docker test was flaky from apparently
  being a bit slower on windows, so add Wait()
2023-06-02 11:38:38 -05:00
Tim Gross
bf7b82b52b drivers: make internal DisableLogCollection capability public (#17196)
The `DisableLogCollection` capability was introduced as an experimental
interface for the Docker driver in 0.10.4. The interface has been stable and
allowing third-party task drivers the same capability would be useful for those
drivers that don't need the additional overhead of logmon.

This PR only makes the capability public. It doesn't yet add it to the
configuration options for the other internal drivers.

Fixes: #14636 #15686
2023-05-16 09:16:03 -04:00
Tim Gross
30bc456f03 logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00