Explicitly set the `oom_score_adj` value for `exec` and `java` tasks.
We recommend that the Nomad service to have oom_score_adj of a low value
(e.g. -1000) to avoid having nomad agent OOM Killed if the node is
oversubscriped.
However, Nomad's workloads should not inherit Nomad's process, which is
the default behavior.
Fixes#10663
This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.
Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.
Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.
A future version of nomad will enable similar configurability for the
exec and java task drivers.
* Fixup uses of `sanity`
* Remove unnecessary comments.
These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.
* Update nomad/fsm_test.go
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.
Closes#9970
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.
Closes#9969
When raw_exec is configured with [`no_cgroups`](https://www.nomadproject.io/docs/drivers/raw_exec#no_cgroups), raw_exec shouldn't attempt to create a cgroup.
Prior to this change, we accidentally always required freezer cgroup to do stats PID tracking. We already have the proper fallback in place for metrics, so only need to ensure that we don't create a cgroup for the task.
Fixes https://github.com/hashicorp/nomad/issues/8565
My latest Vagrant box contains an empty cgroup name that isn't used for
isolation:
```
$ cat /proc/self/cgroup | grep ::
0::/user.slice/user-1000.slice/session-17.scope
```
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.
Similar to Kubernetes, we expose 3 options for configuring mount
propagation:
- private, which is equivalent to `rprivate` on Linux, which does not allow the
container to see any new nested mounts after the chroot was created.
- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
that have been created _outside of the container_ to be visible
inside the container after the chroot is created.
- bidirectional, which is equivalent to `rshared` on Linux, which allows both
the container to see new mounts created on the host, but
importantly _allows the container to create mounts that are
visible in other containers an don the host_
private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.
To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
* master: (71 commits)
Fix output of 'nomad deployment fail' with no arg
Always create a running allocation when testing task state
tests: ensure exec tests pass valid task resources (#4992)
some changes for more idiomatic code
fix iops related tests
fixed bug in loop delay
gofmt
improved code for readability
client: updateAlloc release lock after read
fixup! device attributes in `nomad node status -verbose`
drivers/exec: support device binds and mounts
fix iops bug and increase test matrix coverage
tests: tag image explicitly
changelog
ci: install lxc-templates explicitly
tests: skip checking rdma cgroup
ci: use Ubuntu 16.04 (Xenial) in TravisCI
client: update driver info on new fingerprint
drivers/docker: enforce volumes.enabled (#4983)
client: Style: use fluent style for building loggers
...
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.
I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.