Commit Graph

128 Commits

Author SHA1 Message Date
Vincent Ducamps
6469b59a0a docker: Fix a bug where images with port number and no tags weren't parsed correctly 2025-01-03 11:38:43 +01:00
Tim Gross
d12128c380 docker: use streaming stats collection to correct CPU stats (#24229)
In #23966 we switched to the official Docker SDK for the `docker` driver. In the
process we refactored code around stats collection to use the "one shot" version
of stats. Unfortunately this "one shot" stats collection does not include the
`PreCPU` stats, which are the stats from the previous read. This breaks the
calculation we use to determine CPU ticks, because now we're subtracting 0 from
the current value to get the delta.

Switch back to using the streaming stats collection. Add a test that fully
exercises the `TaskStats` API.

Fixes: https://github.com/hashicorp/nomad/issues/24224
Ref: https://hashicorp.atlassian.net/browse/NET-11348
2024-10-17 08:25:59 -04:00
Piotr Kazmierczak
f9cbaaf6c7 docker: fix a bug where auth for private registries wasn't parsed correctly (#24215)
In #23966 we introduced an official Docker client and did not notice that in
contrast to our previous 3rd party client, the official SDK PullOptions object
expects a base64 encoded JSON with username and password, instead of username/
password pair.
2024-10-16 22:04:54 +02:00
Piotr Kazmierczak
ec42aa2a1b docker: use docker errdefs instead of string comparisons when checking errors (#24075) 2024-09-27 15:32:29 +02:00
Piotr Kazmierczak
981ca36049 docker: use official client instead of fsouza/go-dockerclient (#23966)
This PR replaces fsouza/go-dockerclient 3rd party docker client library with
docker's official SDK.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-09-26 18:41:44 +02:00
Tim Gross
9543e740af docker: fix delimiter for selinux label for read-only volumes (#23750)
The Docker driver's `volume` field to specify bind-mounts takes a list of
strings that consist of three `:`-delimited fields: source, destination, and
options. We append the SELinux label from the plugin configuration as the third
field. But when the user has already specified the volume is read-only with
`:ro`, we're incorrectly appending the SELinux label with another `:` instead of
the required `,`.

Combine the options into a single field value before appending them to the bind
mounts configuration. Updated the tests to split out Windows behavior (which
doesn't accept options) and to ensure the test task has the expected environment
for bind mounts.

Fixes: https://github.com/hashicorp/nomad/issues/23690
2024-08-08 09:08:01 -04:00
Piotr Kazmierczak
f22ce921cd docker: adjust capabilities on Windows (#23599)
Adjusts Docker capabilities per OS, and checks for runtime on Windows.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-07-17 09:01:45 +02:00
Piotr Kazmierczak
0ece7b5c16 docker: validate that containers do not run as ContainerAdmin on Windows (#23443)
This enables checks for ContainerAdmin user on docker images on Windows. It's
only checked if users run docker with process isolation and not hyper-v,
because hyper-v provides its own, proper sandboxing.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-06-27 16:22:24 +02:00
Piotr Kazmierczak
bf11e39ac8 docker: add a unit test for "container already exists" error when creating containers (#22238) 2024-05-30 11:24:28 +02:00
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.

Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
James Rasell
a9d5beb141 test: use correct parallel test setup func (#18326) 2023-08-25 13:51:36 +01:00
hashicorp-copywrite[bot]
2d35e32ec9 Update copyright file headers to BUSL-1.1 2023-08-10 17:27:15 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
KamilCuk
da9ec8ce1e Add group_add docker option (#17313) 2023-06-02 20:26:01 -04:00
Daniel Bennett
e0dd940439 tests: enable newer windows (#17401)
* "allow" (don't try to drop) linux capabilities
  in the docker test driver harness (see #15181)
* refactor to allow different busybox images
  since windows containers need to be the same
  version as the underlying OS, and we're
  moving from 2016 to 2019
* one docker test was flaky from apparently
  being a bit slower on windows, so add Wait()
2023-06-02 11:38:38 -05:00
Tim Gross
30bc456f03 logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
Seth Hoenig
74b16da272 deps: update docker to 23.0.3 (#16862)
* [no ci] deps: update docker to 23.0.3

This PR brings our docker/docker dependency (which is hosted at github.com/moby/moby)
up to 23.0.3 (forward about 2 years). Refactored our use of docker/libnetwork to
reference the package in its new home, which is docker/docker/libnetwork (it is
no longer an independent repository). Some minor nearby test case cleanup as well.

* add cl
2023-04-12 14:13:36 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Michael Schurter
db2325b17a docker: default device.container_path to host_path (#16811)
* docker: default device.container_path to host_path

Matches docker cli behavior.

Fixes #16754
2023-04-06 14:44:33 -07:00
Lance Haig
3160c76209 deps: Update ioutil library references to os and io respectively for drivers package (#16331)
* Update ioutil library references to os and io respectively for drivers package

No user facing changes so I assume no change log is required

* Fix failing tests
2023-03-08 10:31:09 -06:00
Seth Hoenig
dab4d7ed7a ci: swap freeport for portal in packages (#15661) 2023-01-03 11:25:20 -06:00
Michael Schurter
1bc5c718b4 Data race fixes in tests and a new semgrep rule (#14594)
* test: don't use loop vars in goroutines

fixes a data race in the test

* test: copy objects in statestore before mutating

fixes data race in test

* test: @lgfa29's segmgrep rule for loops/goroutines

Found 2 places where we were improperly using loop variables inside
goroutines.
2022-09-15 10:35:08 -07:00
Seth Hoenig
0bd42a501c docker: create a docker task config setting for disable built-in healthcheck
This PR adds a docker driver task configuration setting for turning off
built-in HEALTHCHECK of a container.

References)
https://docs.docker.com/engine/reference/builder/#healthcheck
https://github.com/docker/engine-api/blob/master/types/container/config.go#L16

Closes #5310
Closes #14068
2022-08-11 10:33:48 -05:00
Tim Gross
2415d72bb6 test: disable docker OOM detection test on cgroups v2 (#13928)
OOM detection under cgroups v2 is flaky under versions of `containerd` before
v1.6.3, but our `containerd` dependency is transitive on `moby/moby`, who have
not yet updated. Disable this test for cgroups v2 environments until we can
update the dependency chain.
2022-07-28 14:47:06 -04:00
Seth Hoenig
410834b705 drivers/docker: do not set cgroup parent in v1 mode
This PR fixes a bug where the CgroupParent on the docker
HostConfig struct was accidently being set when running in
cgroups v1 mode.
2022-05-24 11:22:50 -05:00
Seth Hoenig
d91e4160da cli: update default redis and use nomad service discovery
Closes #12927
Closes #12958

This PR updates the version of redis used in our examples from 3.2 to 7.
The old version is very not supported anymore, and we should be setting
a good example by using a supported version.

The long-form example job is now fixed so that the service stanza uses
nomad as the service discovery provider, and so now the job runs without
a requirement of having Consul running and configured.
2022-05-17 10:24:19 -05:00
Eng Zer Jun
fca4ee8e05 test: use T.TempDir to create temporary test directory (#12853)
* test: use `T.TempDir` to create temporary test directory

This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, temporary directory created using `ioutil.TempDir`
needs to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix TestLogmon_Start_restart on Windows

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestConsul_Integration

t.TempDir fails to perform the cleanup properly because the folder is
still in use

testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-05-12 11:42:40 -04:00
Seth Hoenig
5da1a31e94 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Shishir Mahajan
479442e682 Add support for --init to docker driver.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-10-15 12:53:25 -07:00
Mahmood Ali
6c414cd5f9 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Seth Hoenig
c34beb48b1 drivers/docker: reuse capabilities plumbing in docker driver
This changeset does not introduce any functional change for the
docker driver, but rather cleans up the implementation around
computing configured capabilities by re-using code written for
the exec/java task drivers.
2021-05-17 12:37:40 -06:00
Seth Hoenig
003d68fe6d drivers/docker+exec+java: disable net_raw capability by default
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.

A future version of nomad will enable similar configurability for the
exec and java task drivers.
2021-05-12 13:22:09 -07:00
Isabel Suchanek
276644470e Clean up docker driver test to make it less flaky (#10559)
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2021-05-10 14:58:19 -07:00
Isabel Suchanek
1b2296400b Fix test panic in docker driver test 2021-05-07 12:12:33 -07:00
Isabel Suchanek
379c09513c drivers/docker: add support for STOPSIGNAL
This fixes a bug where Nomad overrides a Dockerfile's STOPSIGNAL with
the default kill_signal (SIGTERM).

This adds a check for kill_signal. If it's not set, it calls
StopContainer instead of Signal, which uses STOPSIGNAL if it's
specified. If both kill_signal and STOPSIGNAL are set, Nomad tries to
stop the container with kill_signal first, before then calling
StopContainer.

Fixes #9989
2021-05-05 10:27:58 -07:00
Mahmood Ali
b1ff06fd19 oversubscription: docker to honor MemoryMaxMB values 2021-03-30 16:55:58 -04:00
Florian Apolloner
8b3ea4ea9a docker: support configuring default log driver in plugin options 2021-03-12 16:04:33 -05:00
Adrian Todorov
2748d2a895 driver/docker: add extra labels ( job name, task and task group name) 2021-03-08 08:59:52 -05:00
Mahmood Ali
8879645ab9 docker: introduce a new hcl2-friendly mount syntax (#9635)
Introduce a new more-block friendly syntax for specifying mounts with a new `mount` block type with the target as label:

```hcl
config {
  image = "..."

  mount {
    type = "..."
    target = "target-path"
    volume_options { ... }
  }
}
```

The main benefit here is that by `mount` being a block, it can nest blocks and avoids the compatibility problems noted in https://github.com/hashicorp/nomad/pull/9634/files#diff-2161d829655a3a36ba2d916023e4eec125b9bd22873493c1c2e5e3f7ba92c691R128-R155 .

The intention is for us to promote this `mount` blocks and quietly deprecate the `mounts` type, while still honoring to preserve compatibility as much as we could.

This addresses the issue in https://github.com/hashicorp/nomad/issues/9604 .
2020-12-15 14:13:50 -05:00
Shishir Mahajan
850694634f Fix circleci. 2020-11-11 12:30:00 -08:00
Shishir Mahajan
161bec4ada Add cpuset_cpus to docker driver. 2020-11-11 12:30:00 -08:00
Tim Gross
306cfabb62 docker: disallow volume mounts from host by default (#9321)
The default behavior for `docker.volumes.enabled` is intended to be `false`,
but the HCL schema defaults to `true` if the value is unset. Set the default
literal value to `true`.

Additionally, Docker driver mounts of type "volume" (but not "bind") are not
being properly sandboxed with that setting. Disable Docker mounts with type
"volume" entirely whenever the `docker.volumes.enabled` flag is set to
false. Note this is unrelated to the `volume_mount` feature, which is
constrained to preconfigured host volumes or whatever is mounted by a CSI
plugin.

This changeset includes updates to unit tests that should have been failing
under the documented behavior but were not.
2020-11-11 10:03:46 -05:00
Russell Rollins
93c7909951 Use Dockerhub Mirror. (#9220)
Dockerhub is going to rate limit unauthenticated pulls.

Use our HashiCorp internal mirror for builds run through CircleCI.

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-11-02 09:28:02 -05:00
Yoan Blanc
c14c616194 use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Seth Hoenig
a299b76c5a docs: document docker signal fix, add tests
This PR adds a version specific upgrade note about the docker stop
signal behavior. Also adds test for the signal logic in docker driver.

Closes #8932 which was fixed in #8933
2020-10-02 10:06:43 -05:00
Nick Ethier
2176fbd90a docker: use Nomad managed resolv.conf when DNS options are set (#8600) 2020-08-17 10:22:08 -04:00
James Rasell
4f39d161ed Merge pull request #8589 from hashicorp/f-gh-5718
driver/docker: allow configurable pull context timeout setting.
2020-08-14 16:07:59 +02:00
James Rasell
a40a14064a driver/docker: allow configurable pull context timeout setting.
Pulling large docker containers can take longer than the default
context timeout. Without a way to change this it is very hard for
users to utilise Nomad properly without hacky work arounds.

This change adds an optional pull_timeout config parameter which
gives operators the possibility to account for increase pull times
where needed. The infra docker image also has the option to set a
custom timeout to keep consistency.
2020-08-12 08:58:07 +01:00