In #25963 we added normalization of CPU shares for large hosts whose total
compute exceeds the maximum CPU shares. But if the result after normalization
is less than 2, runc hits an integer overflow. We prevent this in the shared
executor for the `exec`/`rawexec` drivers by clamping to the safe minimum
value. Do the same for the `docker` driver, and add test coverage for the
shared executor as well.
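A minimal sketch of the clamp, with a hypothetical helper name; values below 2 trigger the runc integer overflow described above, so they are bumped up to that floor:

```go
package executor

// cpuSharesFloor is the smallest CPU shares value that avoids the runc
// integer overflow described above.
const cpuSharesFloor = 2

// clampCPUShares is a hypothetical helper: clamp a normalized shares
// value up to the safe minimum before handing it to runc.
func clampCPUShares(shares int64) int64 {
	if shares < cpuSharesFloor {
		return cpuSharesFloor
	}
	return shares
}
```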
Fixes: https://github.com/hashicorp/nomad/issues/26080
Ref: https://github.com/hashicorp/nomad/pull/25963
The `resources.cpu` field is scheduled in MHz. On most Linux task drivers, this
value is then mapped to `cpu.shares` (cgroups v1) or `cpu.weight` (cgroups
v2). But this means that on very large hosts, where the total compute is greater
than the Linux kernel's maximum CPU shares, you can't set a `resources.cpu`
value large enough to consume the entire host.
The `cpu.shares`/`cpu.weight` value is relative within the parent cgroup's slice,
which is owned by Nomad. So we can fix this by re-normalizing the weight on very
large hosts such that the maximum `resources.cpu` matches up with the largest
possible CPU share. This happens in the task driver so that the rest of Nomad
doesn't need to be aware of this implementation detail. Note that these functions
will produce an invalid shares configuration if the request exceeds the available
compute, but that case is supposed to be caught in the scheduler, so by not
catching it here we intentionally surface the runc error.
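A minimal sketch of the normalization, with illustrative names, assuming the kernel's maximum `cpu.shares` value of 262144 (2^18); combined with the clamp above, this keeps the final value inside runc's safe range:

```go
package executor

// maxCPUShares is the Linux kernel's maximum cpu.shares value (2^18).
const maxCPUShares = 262144

// normalizeCPUShares is a hypothetical helper: on hosts whose total
// compute in MHz exceeds the kernel maximum, scale the requested MHz so
// that a request for the whole host maps onto the largest share value.
func normalizeCPUShares(requestedMHz, totalMHz int64) int64 {
	if totalMHz <= maxCPUShares {
		return requestedMHz // the request fits as-is; no scaling needed
	}
	return requestedMHz * maxCPUShares / totalMHz
}
```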
Fixes: https://hashicorp.atlassian.net/browse/NMD-297
Fixes: https://github.com/hashicorp/nomad/issues/7731
Ref: https://go.hashi.co/rfc/nmd-211
* fix: fix the docker image parser to account for private repos
* style: change the local regex for docker image identifiers and use the docker package instead (see the sketch after this list)
* func: return early when no repo found on the image name
* func: return error if no path found in image
* Update drivers/docker/utils.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update coordinator.go
* Update driver.go
* Update network.go
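A minimal sketch of parsing image identifiers with the upstream reference package instead of a local regex; the import path `github.com/distribution/reference` and the helper name are assumptions for illustration:

```go
package main

import (
	"fmt"

	"github.com/distribution/reference"
)

// parseImage splits an image identifier into repository and tag,
// handling private registries with ports and multi-segment paths.
func parseImage(image string) (repo, tag string, err error) {
	named, err := reference.ParseNormalizedNamed(image)
	if err != nil {
		return "", "", fmt.Errorf("failed to parse image %q: %w", image, err)
	}
	named = reference.TagNameOnly(named) // default to :latest when untagged
	repo = reference.FamiliarName(named)
	if tagged, ok := named.(reference.Tagged); ok {
		tag = tagged.Tag()
	}
	return repo, tag, nil
}

func main() {
	repo, tag, _ := parseImage("registry.example.com:5000/team/app:v1")
	fmt.Println(repo, tag) // registry.example.com:5000/team/app v1
}
```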
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This PR replaces the fsouza/go-dockerclient third-party Docker client library
with Docker's official SDK.
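For context, a minimal sketch of constructing a client with the official SDK (`github.com/docker/docker/client`):

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(
		client.FromEnv,                     // honor DOCKER_HOST et al.
		client.WithAPIVersionNegotiation(), // match the daemon's API version
	)
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ping, err := cli.Ping(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Println("daemon API version:", ping.APIVersion)
}
```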
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
Nomad clients manage a cpuset cgroup for each task to reserve or share CPU
cores. But Docker owns its own cgroups, and attempting to set a parent cgroup
that Nomad manages runs into conflicts with how runc manages cgroups via
systemd. Therefore Nomad must run as root in order for cpuset management to ever
be compatible with Docker.
However, some users running in unsupported configurations felt that the changes
we made in Nomad 1.7.0 to ensure Nomad was running correctly represented a
regression. This changeset disables cpuset management for non-root Nomad
clients. When running Nomad as non-root, the driver will no longer reconcile
cpusets with Nomad and `resources.cores` will behave incorrectly (but the driver
will still run).
Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. This PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.
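A hypothetical sketch of the gate this adds; cpuset reconciliation is simply skipped when the client lacks root:

```go
package docker

import "os"

// cpusetManageable is an illustrative helper: cpuset reconciliation is
// attempted only when the Nomad client runs as root, since runc/systemd
// otherwise conflict over cgroup ownership as described above.
func cpusetManageable() bool {
	return os.Geteuid() == 0
}
```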
Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
Ref: https://hashicorp.atlassian.net/browse/NET-10652
Ref: https://github.com/opencontainers/runc/blob/main/docs/systemd.md
The Docker driver's `volumes` field for specifying bind mounts takes a list of
strings that consist of three `:`-delimited fields: source, destination, and
options. We append the SELinux label from the plugin configuration as the third
field. But when the user has already marked the volume read-only with `:ro`,
we were incorrectly appending the SELinux label with another `:` instead of
the required `,`.
Combine the options into a single field value before appending them to the bind
mounts configuration, as sketched below. Update the tests to split out Windows
behavior (which doesn't accept options) and to ensure the test task has the
expected environment for bind mounts.
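A minimal sketch of the combined-options approach, with illustrative names:

```go
package docker

import "strings"

// bindSpec builds a Docker bind-mount string, joining all options
// (e.g. "ro" plus the SELinux label "Z") into one comma-separated
// third field rather than appending extra ":" segments.
func bindSpec(src, dst string, opts []string) string {
	parts := []string{src, dst}
	if len(opts) > 0 {
		parts = append(parts, strings.Join(opts, ","))
	}
	return strings.Join(parts, ":")
}
```

For example, `bindSpec("/host/data", "/data", []string{"ro", "Z"})` yields `/host/data:/data:ro,Z` rather than the invalid `/host/data:/data:ro:Z`.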
Fixes: https://github.com/hashicorp/nomad/issues/23690
This enables checks for the ContainerAdmin user on Docker images on Windows. The
check only runs when users run Docker with process isolation rather than Hyper-V,
because Hyper-V provides its own, proper sandboxing.
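A hypothetical sketch of the check via the Docker SDK's image inspect call; the names and the isolation string are illustrative:

```go
package docker

import (
	"context"
	"strings"

	"github.com/docker/docker/client"
)

// runsAsContainerAdmin reports whether an image would run as the
// ContainerAdmin user. Hyper-V isolation is exempt because it provides
// its own sandboxing.
func runsAsContainerAdmin(ctx context.Context, cli *client.Client, image, isolation string) (bool, error) {
	if isolation == "hyperv" {
		return false, nil
	}
	info, _, err := cli.ImageInspectWithRaw(ctx, image)
	if err != nil {
		return false, err
	}
	if info.Config == nil {
		return false, nil
	}
	return strings.EqualFold(info.Config.User, "ContainerAdmin"), nil
}
```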
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This allows users to configure the number of attempts that will be made to purge
an existing (but not running) container if one is found during task creation.
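A hypothetical sketch of the retry loop, using recent Docker SDK types; the function name is illustrative:

```go
package docker

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// purgeContainer attempts to remove a leftover container up to a
// configurable number of times before giving up.
func purgeContainer(ctx context.Context, cli *client.Client, id string, attempts int) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = cli.ContainerRemove(ctx, id, container.RemoveOptions{Force: true}); err == nil {
			return nil
		}
	}
	return fmt.Errorf("failed to purge container %s after %d attempts: %w", id, attempts, err)
}
```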
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
No functional changes, just cleaning up deprecated usages that are
removed in v2 and replacing one call of `.Slice` with `.ForEach` to avoid
making the intermediate copy.
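A minimal before/after sketch, assuming this refers to `hashicorp/go-set/v2`:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/go-set/v2"
)

func main() {
	s := set.From([]string{"a", "b", "c"})

	// Before: .Slice materializes an intermediate copy of the elements.
	for _, item := range s.Slice() {
		fmt.Println(item)
	}

	// After: .ForEach visits elements in place; returning true continues
	// the iteration.
	s.ForEach(func(item string) bool {
		fmt.Println(item)
		return true
	})
}
```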
* drivers: plumb hardware topology via grpc into drivers
This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken into account
consistently in all references to system topology.
* cr: use enum instead of bool for core grade
* cr: fix test split tables to be possible
* client: refactor cpuset partitioning
This PR updates the way the Nomad client manages the split between tasks
that make use of `resources.cpu` vs. `resources.cores`.
Previously, each task was explicitly assigned which CPU cores it was
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client tried to run ~400 or so tasks.
Now we make use of the cgroup hierarchy and cpuset inheritance to efficiently
manage cpusets.
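A hypothetical sketch of the inheritance-based layout: tasks land under one of two parent cgroups, so starting or stopping a task only updates the parents' cpusets rather than every sibling's:

```go
package client

import (
	"os"
	"path/filepath"
)

// Illustrative parent cgroups: tasks using resources.cpu share a pool
// of cores under "share", while resources.cores tasks get dedicated
// cores under "reserve". Child cgroups inherit cpuset.cpus from their
// parent, so only these two files need updating.
const (
	shareParent   = "/sys/fs/cgroup/nomad.slice/share.slice"
	reserveParent = "/sys/fs/cgroup/nomad.slice/reserve.slice"
)

// setPool writes a usable core list (e.g. "0-3,7") to a parent cgroup.
func setPool(parent, cores string) error {
	return os.WriteFile(filepath.Join(parent, "cpuset.cpus"), []byte(cores), 0o644)
}
```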
* cr: tweaks for feedback
cgroupslib.MaybeDisableMemorySwappiness returned an incorrect type and was
incorrectly typecast to int64, causing a panic on non-Linux and non-Windows hosts.
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (e.g., in one place I tested, it occurred after
only 24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.
Introduce a helper with a check on the cap before the bitshift to avoid overflow in all
places this can occur.
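A minimal sketch of such a helper, with illustrative names; the cap is compared before the shift, so the computed backoff can never wrap negative:

```go
package helper

import "time"

// Backoff returns base << attempt clamped to limit. Comparing against
// limit >> attempt means the left shift only happens when it provably
// cannot overflow int64.
func Backoff(base, limit time.Duration, attempt uint) time.Duration {
	if attempt >= 63 || base > limit>>attempt {
		return limit
	}
	return base << attempt
}
```

Callers can then loop with `time.Sleep(Backoff(base, limit, attempt))` indefinitely without the overflow hazard.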
Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
* drivers/docker: refactor use of clients in docker driver
This PR refactors how we manage the two underlying clients used by the
docker driver for communicating with the docker daemon. We keep two clients:
one with a hard-coded timeout that applies to all operations no matter
what, intended for short-lived / async calls to docker; the other with no
timeout, where it is the caller's responsibility to set a context that
ensures the call eventually terminates.
The use of these two clients has been confusing and mistakes were made
in a number of places where calls were making use of the wrong client.
This PR makes it so that a user must explicitly call a function to get
the client that makes sense for that use case.
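A hypothetical sketch of the explicit accessors, with illustrative names:

```go
package docker

import (
	"github.com/docker/docker/client"
)

// clients holds the two Docker clients described above; callers must
// pick one explicitly instead of reaching for an ambient client.
type clients struct {
	timeout  *client.Client // every call capped by a hard-coded timeout
	infinite *client.Client // no timeout; callers bound calls via ctx
}

// shortLivedClient is for quick, fire-and-forget daemon calls.
func (c *clients) shortLivedClient() *client.Client { return c.timeout }

// longLivedClient is for calls such as container waits; the caller owns
// the context that eventually terminates the call.
func (c *clients) longLivedClient() *client.Client { return c.infinite }
```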
Fixes #17023
* cr: followup items
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, and then the node comes back up. See the issue below
for full repro conditions.
Basically, in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil, which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
could not find the pause container to remove. Now, we manually find the
pause container by scanning all containers and looking for the associated
alloc ID.
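A hypothetical sketch of the scan; the `nomad_init_` name prefix is an assumption for illustration:

```go
package docker

import (
	"context"
	"fmt"
	"strings"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// findPauseContainer lists all containers and matches the pause
// container by the alloc ID embedded in its name, rather than relying
// on in-memory state that a reboot destroys.
func findPauseContainer(ctx context.Context, cli *client.Client, allocID string) (string, error) {
	list, err := cli.ContainerList(ctx, container.ListOptions{All: true})
	if err != nil {
		return "", err
	}
	for _, c := range list {
		for _, name := range c.Names {
			if strings.Contains(name, "nomad_init_"+allocID) {
				return c.ID, nil
			}
		}
	}
	return "", fmt.Errorf("no pause container found for alloc %s", allocID)
}
```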
Fixes #17299
Some Nomad users ship application logs out-of-band via syslog. For these users,
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling logmon and pointing the task's stdout/stderr at /dev/null.
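A minimal sketch of the redirection, with illustrative names:

```go
package logmon

import "os"

// stdioPaths returns where the task's stdout and stderr should go:
// the logmon-managed FIFOs normally, or the null device when log
// collection is disabled.
func stdioPaths(disabled bool, stdout, stderr string) (string, string) {
	if disabled {
		return os.DevNull, os.DevNull
	}
	return stdout, stderr
}
```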
This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.
---
Fixes: #11175
Co-authored-by: Thomas Weber <towe75@googlemail.com>
When we added recovery of pause containers in #16352 we called the recovery
function from the plugin factory function. But in our plugin setup protocol, a
plugin isn't ready for use until we call `SetConfig`. This meant that
recovering pause containers was always done with the default
config. Setting up the Docker client only happens once, so setting the wrong
config in the recovery function also means that all other Docker API calls will
use the default config.
Move the `recoveryPauseContainers` call into `SetConfig`. Fix the error
handling so that we return any error but don't log when the context is
canceled, which happens twice during normal startup as we fingerprint the
driver.
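A hypothetical sketch of that error handling, assuming hclog:

```go
package docker

import (
	"context"
	"errors"

	hclog "github.com/hashicorp/go-hclog"
)

// handleRecoveryErr returns the error to SetConfig's caller but skips
// logging when it is only a canceled context from fingerprinting.
func handleRecoveryErr(logger hclog.Logger, err error) error {
	if err != nil && !errors.Is(err, context.Canceled) {
		logger.Error("failed to recover pause containers", "error", err)
	}
	return err
}
```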
* Update ioutil library references to os and io respectively for the drivers package
No user-facing changes, so I assume no changelog is required
* Fix failing tests
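For reference, the mechanical mapping applied (standard-library deprecations since Go 1.16):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// ioutil.ReadAll   -> io.ReadAll
// ioutil.ReadFile  -> os.ReadFile
// ioutil.WriteFile -> os.WriteFile
// ioutil.TempDir   -> os.MkdirTemp
func main() {
	data, err := io.ReadAll(strings.NewReader("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```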
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time, and in any
case the test has been marked as skipped because the downloader doesn't
work.
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroup. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.
Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.
Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroup. This means that on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroup/unified (as is typical), Nomad will continue to
use the v1 logic and should operate as before. Systems that do not support
cgroups v2 are also not affected.
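A hypothetical sketch of the detection:

```go
package cgutil

import "golang.org/x/sys/unix"

// usableV2 reports whether a cgroup2 filesystem is mounted directly at
// /sys/fs/cgroup. Hybrid mounts at /sys/fs/cgroup/unified do not
// activate the v2 logic.
func usableV2() bool {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false
	}
	return st.Type == unix.CGROUP2_SUPER_MAGIC
}
```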
When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client config), and create cgroups for tasks using
the naming convention <allocID>-<task>.scope. This follows the naming convention
set by systemd, which Docker also uses when cgroups v2 is detected.
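An illustrative helper matching that convention:

```go
package cgutil

import (
	"fmt"
	"path/filepath"
)

// pathForTask returns the cgroup path for a task under the v2 layout,
// e.g. /sys/fs/cgroup/nomad.slice/<allocID>-<task>.scope.
func pathForTask(allocID, task string) string {
	return filepath.Join("/sys/fs/cgroup", "nomad.slice",
		fmt.Sprintf("%s-%s.scope", allocID, task))
}
```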
Client nodes now export a new fingerprint attribute, unique.cgroups.version,
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.
The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.
[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
Closes #11289
Fixes #11705 #11773 #11933
In Nomad 1.1.1 we generate a hosts file based on the Nomad-owned network
namespace, rather than using the default hosts file from the pause
container. This hosts file should be shared between tasks in the same
allocation so that tasks can update the file and have the results propagated
between tasks.
When `network.mode = "bridge"`, we create a pause container in Docker with no
networking so that we have a process to hold the network namespace we create
in Nomad. The default `/etc/hosts` file of that pause container is then used
for all the Docker tasks that share that network namespace. Some applications
rely on this file being populated.
This changeset generates a `/etc/hosts` file and bind-mounts it to the
container when Nomad owns the network, so that the container's hostname has an
IP in the file as expected. The hosts file will include the entries added by
the Docker driver's `extra_hosts` field.
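A minimal sketch of generating that file, with illustrative names; `extra_hosts` entries use Docker's "hostname:ip" form:

```go
package docker

import (
	"fmt"
	"strings"
)

// buildHostsFile renders the shared /etc/hosts content: localhost
// entries, the task's hostname mapped to its IP in the Nomad-owned
// namespace, then any extra_hosts entries.
func buildHostsFile(hostname, ip string, extraHosts []string) string {
	var b strings.Builder
	b.WriteString("127.0.0.1 localhost\n")
	b.WriteString("::1 localhost\n")
	fmt.Fprintf(&b, "%s %s\n", ip, hostname)
	for _, entry := range extraHosts {
		if host, addr, ok := strings.Cut(entry, ":"); ok {
			fmt.Fprintf(&b, "%s %s\n", addr, host)
		}
	}
	return b.String()
}
```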
In this changeset, only the Docker task driver will take advantage of this
option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts`
file and this can't be changed without breaking backwards compatibility. But
the fields are available in the task driver protobuf for community task
drivers to use if they'd like.
This changeset does not introduce any functional change for the
docker driver, but rather cleans up the implementation around
computing configured capabilities by re-using code written for
the exec/java task drivers.