Commit Graph

85 Commits

Author SHA1 Message Date
Seth Hoenig
8492c6576e build: upgrade and speedup circleci configuration
This PR upgrades our CI images and fixes some affected tests.

- upgrade go-machine-image to premade latest ubuntu LTS (ubuntu-2004:202111-02)

- eliminate go-machine-recent-image (no longer necessary)

- manage GOPATH in GNUMakefile (see https://discuss.circleci.com/t/gopath-is-set-to-multiple-directories/7174)

- fix tcp dial error check (message seems to be OS specific)

- spot check values measured instead of specifically 'RSS' (rss no longer reported in cgroups v2)

- use safe MkdirTemp for generating tmpfiles

NOT applied: (too flakey)

- eliminate setting GOMAXPROCS=1 (build tools were also affected by this setting)

- upgrade resource type for all imanges to large (2C -> 4C)
2022-01-24 08:28:14 -06:00
Seth Hoenig
87dbc7162b deps: upgrade docker and runc
This PR upgrades
 - docker dependency to the latest tagged release (v20.10.12)
 - runc dependency to the latest tagged release (v1.0.3)

Docker does not abide by [semver](https://github.com/moby/moby/issues/39302), so it is marked +incompatible,
and transitive dependencies are upgrade manually.

Runc made three relevant breaking changes

 * cgroup manager .Set changed to accept Resources instead of Cgroup
   3f65946756

 * config.Device moved to devices.Device
   https://github.com/opencontainers/runc/pull/2679

 * mountinfo.Mounted now returns an error if the specified path does not exist
   https://github.com/moby/sys/blob/mountinfo/v0.5.0/mountinfo/mountinfo.go#L16
2022-01-18 08:35:26 -06:00
Alessandro De Blasis
759397533a metrics: added mapped_file metric (#11500)
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
Co-authored-by: Nate <37554478+servusdei2018@users.noreply.github.com>
2022-01-10 15:35:19 -05:00
Mahmood Ali
feb450a393 executor: set CpuWeight in cgroup-v2 (#11287)
Cgroup-v2 uses `cpu.weight` property instead of cpu shares:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files
. And it uses a different range (i.e. `[1, 10000]`) from cpu.shares
(i.e. `[2, 262144]`) to make things more interesting.

Luckily, the libcontainer provides a helper function to perform the
conversion
[`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value).

I have confirmed that docker/libcontainer performs the conversion as
well in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541
, and that CpuShares is ignored by libcontainer in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29
.
2021-10-14 08:46:07 -04:00
Mahmood Ali
6c414cd5f9 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Mahmood Ali
0be58d72f4 drivers/exec: Don't inherit Nomad oom_score_adj value (#10698)
Explicitly set the `oom_score_adj` value for `exec` and `java` tasks.

We recommend that the Nomad service to have oom_score_adj of a low value
(e.g. -1000) to avoid having nomad agent OOM Killed if the node is
oversubscriped.

However, Nomad's workloads should not inherit Nomad's process, which is
the default behavior.

Fixes #10663
2021-06-03 14:15:50 -04:00
Seth Hoenig
595cef8136 drivers/exec: pass capabilities through executor RPC
Add capabilities to the LaunchRequest proto so that the
capabilities set actually gets plumbed all the way through
to task launch.
2021-05-17 12:37:40 -06:00
Seth Hoenig
191144c3bf drivers/exec: enable setting allow_caps on exec driver
This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.

Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.

Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.
2021-05-17 12:37:40 -06:00
Seth Hoenig
003d68fe6d drivers/docker+exec+java: disable net_raw capability by default
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.

A future version of nomad will enable similar configurability for the
exec and java task drivers.
2021-05-12 13:22:09 -07:00
Nick Ethier
e0a599ed9c nit: code cleanup/organization 2021-04-16 15:14:29 -04:00
Nick Ethier
5377be43ff executor: add support for cpuset cgroup 2021-04-15 10:24:31 -04:00
Yoan Blanc
a814f0253f chore: bump golangci-lint from v1.24 to v1.39
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2021-04-03 09:50:23 +02:00
Mahmood Ali
4532272931 drivers/exec: Account for cgroup-v2 memory stats
If the host is running with cgroup-v2, RSS and Max Usage doesn't get
reported anymore.
2021-04-01 12:13:21 -04:00
zhsj
46b335d652 deps: update runc to v1.0.0-rc93
includes updates for breaking changes in runc v1.0.0-rc93
2021-03-31 10:57:02 -04:00
Mahmood Ali
43549b46fc driver/exec: set soft memory limit
Linux offers soft memory limit:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html#soft-limits
, and
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?highlight=memory.low
.

We can set soft memory limits through libcontainer
`Resources.MemoryReservation`: https://pkg.go.dev/github.com/opencontainers/runc@v0.1.1/libcontainer/configs#Resources
2021-03-30 16:55:58 -04:00
Mahmood Ali
5e3fbd5774 oversubscription: driver/exec to honor MemoryMaxMB 2021-03-30 16:55:58 -04:00
Seth Hoenig
836ee9e4a2 drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.

Closes #9970
2021-02-08 14:26:35 -06:00
Seth Hoenig
6dd5de4b69 docs: fixup comments, var names 2021-02-08 10:58:44 -06:00
Seth Hoenig
b682371a22 drivers/exec+java: Add configuration to restore previous PID/IPC namespace behavior.
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.

Closes #9969
2021-02-05 15:52:11 -06:00
Chris Baker
7f06adf1af Merge tag 'v1.0.3' into post-release-1.0.3
Version 1.0.3
2021-01-29 19:30:08 +00:00
Chris Baker
109fb53e50 put exec process in a new IPC namespace 2021-01-28 12:03:19 +00:00
Kris Hicks
677353a205 Add PID namespacing and e2e test 2021-01-28 12:03:19 +00:00
Kris Hicks
bcd4752fc9 executor_linux: Remove unreachable PATH= code (#9778)
This has to have been unused because the HasPrefix operation is
backwards, meaning a Command.Env that includes PATH= never would have
worked; the default path was always used.
2021-01-15 11:19:09 -08:00
Kris Hicks
071f4c7596 Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Shengjing Zhu
274bf2ee1c Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Thomas Lefebvre
5a017acd0b client: support no_pivot_root in exec driver configuration 2020-02-18 09:27:16 -08:00
Mahmood Ali
f794b49ec6 simplify cgroup path lookup 2019-12-11 12:43:25 -05:00
Mahmood Ali
596d0be5d8 executor: stop joining executor to container cgroup
Stop joining libcontainer executor process into the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make it consistent with other plugin processes.

Previously, executor process is added to the container cgroup so the
executor process resources get aggregated along with user processes in
our metric aggregation.

However, adding executor process to container cgroup adds some
complications with much benefits:

First, it complicates cleanup.  We must ensure that the executor is
removed from container cgroup on shutdown.  Though, we had a bug where
we missed removing it from the systemd cgroup.  Because executor uses
`containerState.CgroupPaths` on launch, which includes systemd, but
`cgroups.GetAllSubsystems` which doesn't.

Second, it may have advese side-effects.  When a user process is cpu
bound or uses too much memory, executor should remain functioning
without risk of being killed (by OOM killer) or throttled.

Third, it is inconsistent with other drivers and plugins.  Logmon and
DockerLogger processes aren't in the task cgroups.  Neither are
containerd processes, though it is equivalent to executor in
responsibility.

Fourth, in my experience when executor process moves cgroup while it's
running, the cgroup aggregation is odd.  The cgroup
`memory.usage_in_bytes` doesn't seem to capture the full memory usage of
the executor process and becomes a red-harring when investigating memory
issues.

For all the reasons above, I opted to have executor remain in nomad
agent cgroup and we can revisit this when we have a better story for
plugin process cgroup management.
2019-12-11 11:28:09 -05:00
Mahmood Ali
2f4b9da61a drivers/exec: test all cgroups are destroyed 2019-12-11 11:12:29 -05:00
Danielle Lancashire
afb59bedf5 volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers an don the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
2019-10-14 14:09:58 +02:00
Nick Ethier
2f16eb9640 executor: run exec commands in netns if set 2019-09-30 11:50:22 -04:00
Nick Ethier
1ff85f09f3 executor: cleanup netns handling in executor 2019-07-31 01:04:05 -04:00
Nick Ethier
e26192ad49 Driver networking support
Adds support for passing network isolation config into drivers and
implements support in the rawexec driver as a proof of concept
2019-07-31 01:03:20 -04:00
Mahmood Ali
c72bf13f8a exec: use an independent name=systemd cgroup path
We aim for containers to be part of a new cgroups hierarchy independent
from nomad agent.  However, we've been setting a relative path as
libcontainer `cfg.Cgroups.Path`, which makes libcontainer concatinate
the executor process cgroup with passed cgroup, as set in [1].

By setting an absolute path, we ensure that all cgroups subsystem
(including `name=systemd` get a dedicated one).  This matches behavior
in Nomad 0.8, and behavior of how Docker and OCI sets CgroupsPath[2]

Fixes #5736

[1] d7edf9b2e4/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/apply_raw.go (L326-L340)
[2] 238f8eaa31/vendor/github.com/containerd/containerd/oci/spec.go (L229)
2019-06-10 22:00:12 -04:00
Mahmood Ali
1a6454d242 special case root capabilities 2019-05-24 14:10:10 -04:00
Mahmood Ali
82611af925 drivers/exec: Restore 0.8 capabilities
Nomad 0.9 incidentally set effective capabilities that is higher than
what's expected of a `nobody` process, and what's set in 0.8.

This change restores the capabilities to ones used in Nomad 0.9.
2019-05-20 13:11:29 -04:00
Lang Martin
568a120e7b Merge pull request #5649 from hashicorp/b-lookup-exe-chroot
lookup executables inside chroot
2019-05-17 15:07:41 -04:00
Mahmood Ali
7f76aedfae use pty/tty terminology similar to github.com/kr/pty 2019-05-10 19:17:14 -04:00
Mahmood Ali
976bfbc41a executors: implement streaming exec
Implements streamign exec handling in both executors (i.e. universal and
libcontainer).

For creation of TTY, some incidental complexity leaked in.  The universal
executor uses github.com/kr/pty for creation of TTYs.

On the other hand, libcontainer expects a console socket and for libcontainer to
create the underlying console object on process start.  The caller can then use
`libcontainer.utils.RecvFd()` to get tty master end.

I chose github.com/kr/pty for managing TTYs here.  I tried
`github.com/containerd/console` package (which is already imported), but the
package did not work as expected on macOS.
2019-05-10 19:17:14 -04:00
Lang Martin
a9da4bac11 executor_linux only do path resolution in the taskDir, not local
split out lookPathIn to show it's similarity to exec.LookPath
2019-05-10 11:33:35 -04:00
Lang Martin
0c4a45fa16 executor_linux pass the command to lookupTaskBin to get path 2019-05-08 10:01:20 -04:00
Lang Martin
9688710c10 executor/* Launch log at top of Launch is more explicit, trace 2019-05-07 17:01:05 -04:00
Lang Martin
538f387d8d move lookupTaskBin to executor_linux, for os dependency clarity 2019-05-07 16:58:27 -04:00
Lang Martin
7a2fdf7a2d executor and executor_linux debug launch prep and process start 2019-05-03 14:42:57 -04:00
Lang Martin
83930dcb9c executor_linux call new lookupTaskBin 2019-05-03 11:55:19 -04:00
Mahmood Ali
9bf54eae97 comment what refer to 2019-04-19 09:49:04 -04:00
Mahmood Ali
b6af5c9dca Move libcontainer helper to executor package 2019-04-19 09:49:04 -04:00
Michael Schurter
56048bda0a executor/linux: comment this bizarre code 2019-04-02 11:25:45 -07:00
Michael Schurter
21e895e2e7 Revert "executor/linux: add defensive checks to binary path"
This reverts commit cb36f4537e.
2019-04-02 11:17:12 -07:00
Michael Schurter
cb36f4537e executor/linux: add defensive checks to binary path 2019-04-02 09:40:53 -07:00