Commit Graph

134 Commits

Author SHA1 Message Date
Mahmood Ali
596d0be5d8 executor: stop joining executor to container cgroup
Stop joining libcontainer executor process into the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make it consistent with other plugin processes.

Previously, executor process is added to the container cgroup so the
executor process resources get aggregated along with user processes in
our metric aggregation.

However, adding executor process to container cgroup adds some
complications with much benefits:

First, it complicates cleanup.  We must ensure that the executor is
removed from container cgroup on shutdown.  Though, we had a bug where
we missed removing it from the systemd cgroup.  Because executor uses
`containerState.CgroupPaths` on launch, which includes systemd, but
`cgroups.GetAllSubsystems` which doesn't.

Second, it may have advese side-effects.  When a user process is cpu
bound or uses too much memory, executor should remain functioning
without risk of being killed (by OOM killer) or throttled.

Third, it is inconsistent with other drivers and plugins.  Logmon and
DockerLogger processes aren't in the task cgroups.  Neither are
containerd processes, though it is equivalent to executor in
responsibility.

Fourth, in my experience when executor process moves cgroup while it's
running, the cgroup aggregation is odd.  The cgroup
`memory.usage_in_bytes` doesn't seem to capture the full memory usage of
the executor process and becomes a red-harring when investigating memory
issues.

For all the reasons above, I opted to have executor remain in nomad
agent cgroup and we can revisit this when we have a better story for
plugin process cgroup management.
2019-12-11 11:28:09 -05:00
Mahmood Ali
2f4b9da61a drivers/exec: test all cgroups are destroyed 2019-12-11 11:12:29 -05:00
Danielle Lancashire
afb59bedf5 volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers an don the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
2019-10-14 14:09:58 +02:00
Nick Ethier
149578ca1e executor: rename wrapNetns to withNetworkIsolation 2019-09-30 21:38:31 -04:00
Nick Ethier
e6ce6d2c2b comment wrapNetns 2019-09-30 12:06:52 -04:00
Nick Ethier
159b911820 executor: removed unused field from exec_utils.go 2019-09-30 11:57:34 -04:00
Nick Ethier
2f16eb9640 executor: run exec commands in netns if set 2019-09-30 11:50:22 -04:00
Nick Ethier
1ff85f09f3 executor: cleanup netns handling in executor 2019-07-31 01:04:05 -04:00
Nick Ethier
d28d865100 executor: support network namespacing on universal executor 2019-07-31 01:03:58 -04:00
Nick Ethier
e26192ad49 Driver networking support
Adds support for passing network isolation config into drivers and
implements support in the rawexec driver as a proof of concept
2019-07-31 01:03:20 -04:00
Lang Martin
c7cd018655 executor_universal_linux log a link to the docs on cgroup error 2019-07-24 12:37:33 -04:00
Lang Martin
cab04997f0 executor_universal_linux raw_exec cgroup failure is not fatal 2019-07-22 15:16:36 -04:00
Lang Martin
7bd881cbf7 default e.getAllPids in executor_basic 2019-07-18 10:57:27 -04:00
Lang Martin
ab3e6259d0 executor_unix and _windows stub getAllPids ByScanning 2019-07-17 17:34:06 -04:00
Lang Martin
1a9c598fc2 executor_universal_linux getAllPids chooses cgroup when available 2019-07-17 17:33:55 -04:00
Lang Martin
3834616691 executor use e.getAllPids() 2019-07-17 17:33:11 -04:00
Lang Martin
d3ef456bd7 resource_container_linux new getAllPidsByCgroup 2019-07-17 17:31:36 -04:00
Lang Martin
412997f566 pid_collector getAllPids -> getAllPidsByScanning 2019-07-17 17:31:20 -04:00
Mahmood Ali
8fb9d25041 comment on use of init() for plugin handlers 2019-06-18 20:54:55 -04:00
Mahmood Ali
eeaa95ddf9 Use init to handle plugin invocation
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by command CLI framework.  Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.

This has few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary as the tests need to invoke the nomad
executable to start plugin.  This is frequently overlooked, resulting in
puzzlement.

The more significant issue with `executor` in particular is in relation
to external driver:

* Plugin must identify the path of invoking nomad binary, which is not
trivial; `discvoer.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.

* The external driver may get a different version of executor than it
expects (specially if we make a binary incompatible change in future).

This commit addresses both downside by having the plugin invocation
handling through an `init()` call, similar to how libcontainer init
handler is done in [1] and recommened by libcontainer [2].  `init()`
will be invoked and handled properly in tests and external drivers.

For external drivers, this change will cause external drivers to launch
the executor that's compiled against.

There a are a couple of downsides to this approach:
* These specific packages (i.e executor, logmon, and dockerlog) need to
be careful in use of `init()`, package initializers.  Must avoid having
command execution rely on any other init in the package.  I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.

[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
2019-06-13 16:48:01 -04:00
Mahmood Ali
2cc2e60ded update comment 2019-06-11 13:00:26 -04:00
Mahmood Ali
c72bf13f8a exec: use an independent name=systemd cgroup path
We aim for containers to be part of a new cgroups hierarchy independent
from nomad agent.  However, we've been setting a relative path as
libcontainer `cfg.Cgroups.Path`, which makes libcontainer concatinate
the executor process cgroup with passed cgroup, as set in [1].

By setting an absolute path, we ensure that all cgroups subsystem
(including `name=systemd` get a dedicated one).  This matches behavior
in Nomad 0.8, and behavior of how Docker and OCI sets CgroupsPath[2]

Fixes #5736

[1] d7edf9b2e4/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/apply_raw.go (L326-L340)
[2] 238f8eaa31/vendor/github.com/containerd/containerd/oci/spec.go (L229)
2019-06-10 22:00:12 -04:00
Mahmood Ali
6217d50803 Fix test comparisons 2019-05-24 21:38:22 -05:00
Mahmood Ali
a1414bd360 Test for expected capabilities specifically 2019-05-24 16:07:05 -05:00
Mahmood Ali
e855738e0c use /bin/bash 2019-05-24 14:50:23 -04:00
Mahmood Ali
1a6454d242 special case root capabilities 2019-05-24 14:10:10 -04:00
Mahmood Ali
3e1b136929 tests: Fix binary dir permissions 2019-05-24 11:31:12 -04:00
Mahmood Ali
67188714a3 fix 2019-05-20 15:30:07 -04:00
Mahmood Ali
82611af925 drivers/exec: Restore 0.8 capabilities
Nomad 0.9 incidentally set effective capabilities that is higher than
what's expected of a `nobody` process, and what's set in 0.8.

This change restores the capabilities to ones used in Nomad 0.9.
2019-05-20 13:11:29 -04:00
Lang Martin
568a120e7b Merge pull request #5649 from hashicorp/b-lookup-exe-chroot
lookup executables inside chroot
2019-05-17 15:07:41 -04:00
Mahmood Ali
7f76aedfae use pty/tty terminology similar to github.com/kr/pty 2019-05-10 19:17:14 -04:00
Mahmood Ali
976bfbc41a executors: implement streaming exec
Implements streamign exec handling in both executors (i.e. universal and
libcontainer).

For creation of TTY, some incidental complexity leaked in.  The universal
executor uses github.com/kr/pty for creation of TTYs.

On the other hand, libcontainer expects a console socket and for libcontainer to
create the underlying console object on process start.  The caller can then use
`libcontainer.utils.RecvFd()` to get tty master end.

I chose github.com/kr/pty for managing TTYs here.  I tried
`github.com/containerd/console` package (which is already imported), but the
package did not work as expected on macOS.
2019-05-10 19:17:14 -04:00
Mahmood Ali
efc4249f85 executor: scaffolding for executor grpc handling
Prepare executor to handle streaming exec API calls that reuse drivers protobuf
structs.
2019-05-10 19:17:14 -04:00
Lang Martin
a9da4bac11 executor_linux only do path resolution in the taskDir, not local
split out lookPathIn to show it's similarity to exec.LookPath
2019-05-10 11:33:35 -04:00
Lang Martin
60735deb7b executor_linux_test call lookupTaskBin with an ExecCommand 2019-05-08 10:01:51 -04:00
Lang Martin
0c4a45fa16 executor_linux pass the command to lookupTaskBin to get path 2019-05-08 10:01:20 -04:00
Lang Martin
9688710c10 executor/* Launch log at top of Launch is more explicit, trace 2019-05-07 17:01:05 -04:00
Lang Martin
538f387d8d move lookupTaskBin to executor_linux, for os dependency clarity 2019-05-07 16:58:27 -04:00
Lang Martin
2ea860a087 driver_test leave cat in the test, but add cat to the chroot 2019-05-07 16:14:01 -04:00
Lang Martin
bc5eaf6cdb executor_test cleanup old lookupBin tests 2019-05-04 10:21:59 -04:00
Lang Martin
8a7b5c6830 executor lookupTaskBin also does PATH expansion, anchored in taskDIR 2019-05-03 16:22:09 -04:00
Lang Martin
512bc52af5 executor_linux_test test PATH lookup inside the container 2019-05-03 16:21:58 -04:00
Lang Martin
7a2fdf7a2d executor and executor_linux debug launch prep and process start 2019-05-03 14:42:57 -04:00
Lang Martin
103d37d4f9 executor_linux_test new TestExecutor_EscapeContainer 2019-05-03 14:38:42 -04:00
Lang Martin
c16f82bb9a executor_test test for more edges of lookupBin behavior 2019-05-03 11:55:19 -04:00
Lang Martin
83930dcb9c executor_linux call new lookupTaskBin 2019-05-03 11:55:19 -04:00
Lang Martin
4538014cc3 executor split up lookupBin 2019-05-03 11:55:19 -04:00
Mahmood Ali
6747195682 comment on using init() for libcontainer handling 2019-04-19 09:49:04 -04:00
Mahmood Ali
9bf54eae97 comment what refer to 2019-04-19 09:49:04 -04:00
Mahmood Ali
b6af5c9dca Move libcontainer helper to executor package 2019-04-19 09:49:04 -04:00