62 Commits

Author SHA1 Message Date
Tim Gross
6d17a9bf64 deps: update go-plugin to 1.6.2 (#24265)
During allocation recovery, the go-plugin reattachment workflow checks to see if
the PID we stored in client state is still running. If so, we try to connect to
that process. If that fails, we kill the process under the presumption it's not
working. But during reattachment we don't know that the PID we have is still
valid. Which means that the process we're trying to attach to may have exited
and a different process has spawned with the same PID. This results in some
unrelated process getting silently killed.

Update go-plugin to 1.6.2, which includes a bug fix so we do not kill the
process.

Fixes: https://github.com/hashicorp/nomad/issues/23969
2024-10-21 15:48:26 -04:00
Piotr Kazmierczak
981ca36049 docker: use official client instead of fsouza/go-dockerclient (#23966)
This PR replaces fsouza/go-dockerclient 3rd party docker client library with
docker's official SDK.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-09-26 18:41:44 +02:00
Tim Gross
b25f1b66ce resources: allow job authors to configure size of secrets tmpfs (#23696)
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.

This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.

Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.

For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.

Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
2024-08-05 16:06:58 -04:00
Seth Hoenig
d9416afee5 testing: fix the value of NOMAD_SECRETS_DIR in test harness (#23166)
This PR fixes the value of NOMAD_SECRETS_DIR to be the alloc_mounts
secrets directory instead of the real secrets directory, which is protected
by root 0700 even when running tests.

Needed for https://github.com/hashicorp/nomad-driver-exec2/issues/29
2024-06-04 10:58:10 -05:00
Seth Hoenig
5f64e42d73 client: fixup how alloc mounts directory are setup (#20463) 2024-04-26 07:29:52 -05:00
Seth Hoenig
77889a16fb exec2: more tweaks to driver harness (#20221)
Also add an explicit exit code to subproc package for when a child
process is instructed to run an unrunnable command (i.e. cannot be
found or is not executable) - with the 127 return code folks using bash
are familiar with
2024-03-26 08:02:41 -05:00
Seth Hoenig
c36db1b005 drivers/testutil: set full filepath for envs when using unveil fs isolation (#20187) 2024-03-22 09:46:17 -05:00
Michael Schurter
23e4b7c9d2 Upgrade go-msgpack to v2 (#20173)
Replaces #18812

Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```

see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details
2024-03-21 11:44:23 -07:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
Seth Hoenig
4d83733909 tests: swap testify for test in more places (#20028)
* tests: swap testify for test in plugins/csi/client_test.go

* tests: swap testify for test in testutil/

* tests: swap testify for test in host_test.go

* tests: swap testify for test in plugin_test.go

* tests: swap testify for test in utils_test.go

* tests: swap testify for test in scheduler/

* tests: swap testify for test in parse_test.go

* tests: swap testify for test in attribute_test.go

* tests: swap testify for test in plugins/drivers/

* tests: swap testify for test in command/

* tests: fixup some test usages

* go: run go mod tidy

* windows: cpuset test only on linux
2024-02-29 12:11:35 -06:00
dependabot[bot]
398b5000c1 chore(deps): bump github.com/hashicorp/go-plugin from 1.4.10 to 1.6.0 (#19646)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2024-01-08 08:26:34 +00:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.

Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
James Rasell
81c14dee3c test: enable exec test previously disabled due to CircleCI (#17975) 2023-07-19 15:15:11 +01:00
Tim Gross
88323bab4a allocrunner: provide factory function so we can build mock ARs (#17161)
Tools like `nomad-nodesim` are unable to implement a minimal implementation of
an allocrunner so that we can test the client communication without having to
lug around the entire allocrunner/taskrunner code base. The allocrunner was
implemented with an interface specifically for this purpose, but there were
circular imports that made it challenging to use in practice.

Move the AllocRunner interface into an inner package and provide a factory
function type. Provide a minimal test that exercises the new function so that
consumers have some idea of what the minimum implementation required is.
2023-05-12 13:29:44 -04:00
Luiz Aoqui
ee5a08dbb2 Revert "hashicorp/go-msgpack v2 (#16810)" (#17047)
This reverts commit 8a98520d56.
2023-05-01 17:18:34 -04:00
Ian Fijolek
8a98520d56 hashicorp/go-msgpack v2 (#16810)
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0

Fixes #16808

* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack

* deps: use go-msgpack v2.0.0

go-msgpack v2.1.0 includes some code changes that we will need to
investigate furthere to assess its impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's no-op.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-17 17:02:05 -04:00
Seth Hoenig
74b16da272 deps: update docker to 23.0.3 (#16862)
* [no ci] deps: update docker to 23.0.3

This PR brings our docker/docker dependency (which is hosted at github.com/moby/moby)
up to 23.0.3 (forward about 2 years). Refactored our use of docker/libnetwork to
reference the package in its new home, which is docker/docker/libnetwork (it is
no longer an independent repository). Some minor nearby test case cleanup as well.

* add cl
2023-04-12 14:13:36 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Lance Haig
0e74431b01 Update ioutil library references to os and io respectively for API and Plugins package (#16330)
No user facing changes so I assume no change log is required
2023-03-08 10:25:09 -06:00
Tim Gross
263a0497dd testing: skip exec stream child process test (#14601)
This test is broken in CircleCI only. It works on GHA in both 20.04 and 22.04
and has been verified to work on real Nomad; temporarily commenting-out so that
we don't block unrelated CI runs.

WIP to fix in https://github.com/hashicorp/nomad/pull/14600
2022-09-15 11:53:12 -04:00
Seth Hoenig
a6f345c8f5 ci: fixup task runner chroot test
This PR is 2 fixes for the flaky TestTaskRunner_TaskEnv_Chroot test.

And also the TestTaskRunner_Download_ChrootExec test.

- Use TinyChroot to stop copying gigabytes of junk, which causes GHA
to fail to create the environment in time.

- Pre-create cgroups on V2 systems. Normally the cgroup directory is
managed by the cpuset manager, but that is not active in taskrunner tests,
so create it by hand in the test framework.
2022-04-19 10:37:46 -05:00
Seth Hoenig
be7ec8de3e raw_exec: make raw exec driver work with cgroups v2
This PR adds support for the raw_exec driver on systems with only cgroups v2.

The raw exec driver is able to use cgroups to manage processes. This happens
only on Linux, when exec_driver is enabled, and the no_cgroups option is not
set. The driver uses the freezer controller to freeze processes of a task,
issue a sigkill, then unfreeze. Previously the implementation assumed cgroups
v1, and now it also supports cgroups v2.

There is a bit of refactoring in this PR, but the fundamental design remains
the same.

Closes #12351 #12348
2022-04-04 16:11:38 -05:00
Seth Hoenig
5da1a31e94 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Seth Hoenig
3888a59c47 testing: use a smaller chroot when running exec driver tests
The default chroot copies all of /bin, /usr, etc. which can ammount
to gigabytes of stuff not actually needed for running our tests.

Use a smaller chroot in test cases so that CI infra with poor disk
IO has a chance.
2022-03-09 16:24:07 -06:00
James Rasell
ab9ba35e6a chore: fixup inconsistent method receiver names. (#11704) 2021-12-20 11:44:21 +01:00
Michael Schurter
eeb1da8a2e test: update tests to properly use AllocDir
Also use t.TempDir when possible.
2021-10-19 10:49:07 -07:00
Michael Schurter
37f053ff89 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
James Rasell
e34fa583f9 allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not current implement it.
2021-09-16 08:13:09 +02:00
James Rasell
ff4cd338d9 plugins: fix test data race. 2021-06-15 09:31:08 +02:00
Drew Bailey
d72a9b6f29 remove flakey exec test that tests non-deterministic docker behavior (#10291) 2021-04-02 11:15:22 -04:00
Chris Baker
11f0fed935 update template and artifact interpolation to use client-relative paths
resolves #9839
resolves #6929
resolves #6910

e2e: template env interpolation path testing
2021-01-04 22:25:34 +00:00
Kris Hicks
7747124ef0 Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Kris Hicks
85ed8ddd4f Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Mahmood Ali
edaa16589b honor task user when execing into raw_exec task (#9439)
Fix #9210 .

This update the executor so it honors the User when using nomad alloc exec. The bug was that the exec task didn't honor the init command when execing.
2020-11-25 09:34:10 -05:00
Michael Schurter
f44c04ecd1 s/0.13/1.0/g
1.0 here we come!
2020-10-14 15:17:47 -07:00
Yoan Blanc
c14c616194 use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Mahmood Ali
a181ff3f79 Compare to the correct host network setting
In systemd-resolved hosts with no DNS customizations, the docker driver
DNS setting should be compared to /run/systemd/resolve/resolv.conf while
exec/java drivers should be compared to /etc/resolv.conf.

When system-resolved is enabled, /etc/resolv.conf is a stub that points
to 127.0.0.53. Docker avoids this stub because this address isn't
accessible from the container.  The exec/java drivers that don't create
network isolations use the stub though in the default configuration.
2020-10-01 10:23:14 -04:00
Nick Ethier
e9ff8a8daa Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Anthony Scalisi
e1287846ae fix spelling errors (#6985) 2020-04-20 09:28:19 -04:00
Yoan Blanc
c3928fe360 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc
887f23a351 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Tim Gross
e17901d667 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Lucas BEE
dfd673f3c6 Fix missing plugin driver capabilities (#6128)
NetIsolationModes and MustInitiateNetwork were left out of the
driver Capabilities when using an external task driver plugin

Signed-off-by: Lucas BEE <pouulet@gmail.com>
2019-08-14 09:10:10 -04:00
Nick Ethier
bc71cafcee plugin/driver: fix tests and add new dep to vendor 2019-07-31 01:03:17 -04:00
Chris Baker
3a96683131 docker: DestroyTask was not cleaning up Docker images because it was erroring early due to an attempt to inspect an image that had already been removed 2019-06-03 19:04:27 +00:00
Mahmood Ali
3a77502cae Add basic drivers conformance tests
Add consolidated testing package to serve as conformance tests for all drivers.
2019-05-09 16:49:08 -04:00
Nick Ethier
cc12c8443f logmon: fix logmon handling in driver test harness 2019-03-20 21:14:08 -04:00
Mahmood Ali
b70442a325 drivers: pass logger through driver plugin client
This fixes a panic whenever driver plugin attempts to log a message.
2019-01-25 09:38:41 -05:00