This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.
Similar to Kubernetes, we expose 3 options for configuring mount
propagation:
- private, which is equivalent to `rprivate` on Linux, which does not allow the
container to see any new nested mounts after the chroot was created.
- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
that have been created _outside of the container_ to be visible
inside the container after the chroot is created.
- bidirectional, which is equivalent to `rshared` on Linux, which allows both
the container to see new mounts created on the host, but
importantly _allows the container to create mounts that are
visible in other containers an don the host_
private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.
To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
Nomad 0.9 incidentally set effective capabilities that is higher than
what's expected of a `nobody` process, and what's set in 0.8.
This change restores the capabilities to ones used in Nomad 0.9.
Implements streamign exec handling in both executors (i.e. universal and
libcontainer).
For creation of TTY, some incidental complexity leaked in. The universal
executor uses github.com/kr/pty for creation of TTYs.
On the other hand, libcontainer expects a console socket and for libcontainer to
create the underlying console object on process start. The caller can then use
`libcontainer.utils.RecvFd()` to get tty master end.
I chose github.com/kr/pty for managing TTYs here. I tried
`github.com/containerd/console` package (which is already imported), but the
package did not work as expected on macOS.
Reverts hashicorp/nomad#5433
Apparently, channel communications can constitute Happens-Before even for proximate variables, so this syncing isn't necessary.
> _The closing of a channel happens before a receive that returns a zero value because the channel is closed._
https://golang.org/ref/mem#tmp_7
strings.Replace call with n=0 argument makes no sense
as it will do nothing. Probably -1 is intended.
Signed-off-by: Iskander Sharipov <quasilyte@gmail.com>
* CVE-2019-5736: Update libcontainer depedencies
Libcontainer is vulnerable to a runc container breakout, that was
reported as CVE-2019-5736[1]. Upgrading vendored libcontainer with the fix.
The runc changes are captured in 369b920277 .
[1] https://seclists.org/oss-sec/2019/q1/119
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends. This number is closer
what Docker reports.
Related to https://github.com/hashicorp/nomad/issues/5165 .
plugins/driver: update driver interface to support streaming stats
client/tr: use streaming stats api
TODO:
* how to handle errors and closed channel during stats streaming
* prevent tight loop if Stats(ctx) returns an error
drivers: update drivers TaskStats RPC to handle streaming results
executor: better error handling in stats rpc
docker: better control and error handling of stats rpc
driver: allow stats to return a recoverable error
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.
I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle. Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
We ultimately decided to provide a limited set of devices in exec/java
drivers instead of all of host ones. Pre-0.9, we made all host devices
available to exec tasks accidentally, yet most applications only use a
small subset, and this choice limits our ability to restrict/isolate GPU
and other devices.
Starting with 0.9, by default, we only provide the same subset of
devices Docker provides, and allow users to provide more devices as
needed on case-by-case basis.
This reverts commit 5805c64a9f.
This reverts commit ff9a4a17e5.
Use a dedicated /dev mount so we can inject more devices if necessary,
and avoid allowing a container to contaminate host /dev.
Follow up to https://github.com/hashicorp/nomad/pull/5143 - and fixes master.
Restores pre-0.9 behavior, where Nomad makes /dev available to exec
task. Switching to libcontainer, we accidentally made only a small
subset available.
Here, we err on the side of preserving behavior of 0.8, instead of going
for the sensible route, where only a reasonable subset of devices is
mounted by default and user can opt to request more.
* master: (71 commits)
Fix output of 'nomad deployment fail' with no arg
Always create a running allocation when testing task state
tests: ensure exec tests pass valid task resources (#4992)
some changes for more idiomatic code
fix iops related tests
fixed bug in loop delay
gofmt
improved code for readability
client: updateAlloc release lock after read
fixup! device attributes in `nomad node status -verbose`
drivers/exec: support device binds and mounts
fix iops bug and increase test matrix coverage
tests: tag image explicitly
changelog
ci: install lxc-templates explicitly
tests: skip checking rdma cgroup
ci: use Ubuntu 16.04 (Xenial) in TravisCI
client: update driver info on new fingerprint
drivers/docker: enforce volumes.enabled (#4983)
client: Style: use fluent style for building loggers
...
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.