Commit Graph

56 Commits

Author SHA1 Message Date
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.

Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
Seth Hoenig
6747ef8803 drivers/raw_exec: restore ability to run tasks without nomad running as root (#18206)
Although nomad officially does not support running the client as a non-root
user, doing so has been more or less possible with the raw_exec driver as
long as you don't expect features to work like networking or running tasks
as specific users. In the cgroups refactoring I bulldozed right over the
special casing we had in place for raw_exec to continue working if the cgroups
were unable to be created. This PR restores that behavior - you can now
(as before) run the nomad client as a non-root user and make use of the
raw_exec task driver.
2023-08-15 11:22:30 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
Patric Stout
e190eae395 Use config "cpu_total_compute" (if set) for all CPU statistics (#17628)
Before this commit, it was only used for fingerprinting, but not
for CPU stats on nodes or tasks. This meant that if the
auto-detection failed, setting the cpu_total_compute didn't resolved
the issue.

This issue was most noticeable on ARM64, as there auto-detection
always failed.
2023-07-19 13:30:47 -05:00
Tim Gross
7fc37fa2ad logs: fix logs.disabled on Windows (#17199)
On Windows the executor returns an error when trying to open the `NUL` device
when we pass it `os.DevNull` for the stdout/stderr paths. Instead of opening the
device, use the discard pipe so that we have platform-specific behavior from the
executor itself.

Fixes: #17148
2023-05-18 09:14:39 -04:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Lance Haig
3160c76209 deps: Update ioutil library references to os and io respectively for drivers package (#16331)
* Update ioutil library references to os and io respectively for drivers package

No user facing changes so I assume no change log is required

* Fix failing tests
2023-03-08 10:31:09 -06:00
Seth Hoenig
be7ec8de3e raw_exec: make raw exec driver work with cgroups v2
This PR adds support for the raw_exec driver on systems with only cgroups v2.

The raw exec driver is able to use cgroups to manage processes. This happens
only on Linux, when exec_driver is enabled, and the no_cgroups option is not
set. The driver uses the freezer controller to freeze processes of a task,
issue a sigkill, then unfreeze. Previously the implementation assumed cgroups
v1, and now it also supports cgroups v2.

There is a bit of refactoring in this PR, but the fundamental design remains
the same.

Closes #12351 #12348
2022-04-04 16:11:38 -05:00
Seth Hoenig
5da1a31e94 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Seth Hoenig
143705fb28 deps: pty has new home
github.com/kr/pty was moved to github.com/creack/pty

Swap this dependency so we can upgrade to the latest version
and no longer need a replace directive.
2022-01-19 12:33:05 -06:00
James Rasell
3bffe443ac chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
Seth Hoenig
595cef8136 drivers/exec: pass capabilities through executor RPC
Add capabilities to the LaunchRequest proto so that the
capabilities set actually gets plumbed all the way through
to task launch.
2021-05-17 12:37:40 -06:00
Seth Hoenig
191144c3bf drivers/exec: enable setting allow_caps on exec driver
This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.

Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.

Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.
2021-05-17 12:37:40 -06:00
Seth Hoenig
003d68fe6d drivers/docker+exec+java: disable net_raw capability by default
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.

A future version of nomad will enable similar configurability for the
exec and java task drivers.
2021-05-12 13:22:09 -07:00
Seth Hoenig
836ee9e4a2 drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.

Closes #9970
2021-02-08 14:26:35 -06:00
Seth Hoenig
6dd5de4b69 docs: fixup comments, var names 2021-02-08 10:58:44 -06:00
Seth Hoenig
b682371a22 drivers/exec+java: Add configuration to restore previous PID/IPC namespace behavior.
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.

Closes #9969
2021-02-05 15:52:11 -06:00
Mahmood Ali
edaa16589b honor task user when execing into raw_exec task (#9439)
Fix #9210 .

This update the executor so it honors the User when using nomad alloc exec. The bug was that the exec task didn't honor the init command when execing.
2020-11-25 09:34:10 -05:00
Mahmood Ali
bd745fa3e5 raw_exec: don't use cgroups when no_cgroup is set (#9328)
When raw_exec is configured with [`no_cgroups`](https://www.nomadproject.io/docs/drivers/raw_exec#no_cgroups), raw_exec shouldn't attempt to create a cgroup.

Prior to this change, we accidentally always required freezer cgroup to do stats PID tracking. We already have the proper fallback in place for metrics, so only need to ensure that we don't create a cgroup for the task.

Fixes https://github.com/hashicorp/nomad/issues/8565
2020-11-11 16:20:34 -05:00
Thomas Lefebvre
5a017acd0b client: support no_pivot_root in exec driver configuration 2020-02-18 09:27:16 -08:00
Nick Ethier
149578ca1e executor: rename wrapNetns to withNetworkIsolation 2019-09-30 21:38:31 -04:00
Nick Ethier
2f16eb9640 executor: run exec commands in netns if set 2019-09-30 11:50:22 -04:00
Nick Ethier
d28d865100 executor: support network namespacing on universal executor 2019-07-31 01:03:58 -04:00
Nick Ethier
e26192ad49 Driver networking support
Adds support for passing network isolation config into drivers and
implements support in the rawexec driver as a proof of concept
2019-07-31 01:03:20 -04:00
Lang Martin
3834616691 executor use e.getAllPids() 2019-07-17 17:33:11 -04:00
Lang Martin
568a120e7b Merge pull request #5649 from hashicorp/b-lookup-exe-chroot
lookup executables inside chroot
2019-05-17 15:07:41 -04:00
Mahmood Ali
976bfbc41a executors: implement streaming exec
Implements streamign exec handling in both executors (i.e. universal and
libcontainer).

For creation of TTY, some incidental complexity leaked in.  The universal
executor uses github.com/kr/pty for creation of TTYs.

On the other hand, libcontainer expects a console socket and for libcontainer to
create the underlying console object on process start.  The caller can then use
`libcontainer.utils.RecvFd()` to get tty master end.

I chose github.com/kr/pty for managing TTYs here.  I tried
`github.com/containerd/console` package (which is already imported), but the
package did not work as expected on macOS.
2019-05-10 19:17:14 -04:00
Mahmood Ali
efc4249f85 executor: scaffolding for executor grpc handling
Prepare executor to handle streaming exec API calls that reuse drivers protobuf
structs.
2019-05-10 19:17:14 -04:00
Lang Martin
9688710c10 executor/* Launch log at top of Launch is more explicit, trace 2019-05-07 17:01:05 -04:00
Lang Martin
538f387d8d move lookupTaskBin to executor_linux, for os dependency clarity 2019-05-07 16:58:27 -04:00
Lang Martin
8a7b5c6830 executor lookupTaskBin also does PATH expansion, anchored in taskDIR 2019-05-03 16:22:09 -04:00
Lang Martin
7a2fdf7a2d executor and executor_linux debug launch prep and process start 2019-05-03 14:42:57 -04:00
Lang Martin
4538014cc3 executor split up lookupBin 2019-05-03 11:55:19 -04:00
Mahmood Ali
244544b735 an alternative order 2019-04-02 20:00:54 -04:00
Mahmood Ali
d441cdd52f try not without checking stat first 2019-04-02 19:55:44 -04:00
Mahmood Ali
714c41185c rename fifo methods for clarity 2019-04-01 16:52:58 -04:00
Mahmood Ali
48259078df avoid opening files just to close them 2019-04-01 13:24:18 -04:00
Mahmood Ali
a0d025e90d Revert "executor: synchronize exitState accesses" (#5449)
Reverts hashicorp/nomad#5433

Apparently, channel communications can constitute Happens-Before even for proximate variables, so this syncing isn't necessary.

> _The closing of a channel happens before a receive that returns a zero value because the channel is closed._
https://golang.org/ref/mem#tmp_7
2019-03-20 07:33:05 -04:00
Nick Ethier
d9d90fa5f0 Merge pull request #5429 from hashicorp/b-blocking-executor-shutdown
executor: block shutdown on process exiting
2019-03-19 15:18:01 -04:00
Mahmood Ali
989175fc59 executor: synchronize exitState accesses
exitState is set in `wait()` goroutine but accessed in a different
`Wait()` goroutine, so accesses must be synchronized by a lock.
2019-03-17 11:56:58 -04:00
Nick Ethier
c2c984ea50 executor: block shutdown on process exiting 2019-03-15 23:50:17 -04:00
Danielle Tomlinson
17dbace46b executor: Always close stdout/stderr fifos 2019-01-15 16:47:27 +01:00
Alex Dadgar
109c5ef650 Merge pull request #5173 from hashicorp/b-log-levels
Plugins use parent loggers
2019-01-14 16:14:30 -08:00
Nick Ethier
fbf9a4c772 executor: implement streaming stats API
plugins/driver: update driver interface to support streaming stats

client/tr: use streaming stats api

TODO:
 * how to handle errors and closed channel during stats streaming
 * prevent tight loop if Stats(ctx) returns an error

drivers: update drivers TaskStats RPC to handle streaming results

executor: better error handling in stats rpc

docker: better control and error handling of stats rpc

driver: allow stats to return a recoverable error
2019-01-12 12:18:22 -05:00
Alex Dadgar
270ae48b82 Plugins use parent loggers
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.
2019-01-11 11:36:37 -08:00
Mahmood Ali
800a3522e3 drivers: re-export ResourceUsage structs
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.

I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle.  Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
2019-01-08 09:11:47 -05:00
Nick Ethier
8a344412e8 Merge branch 'master' into f-grpc-executor
* master: (71 commits)
  Fix output of 'nomad deployment fail' with no arg
  Always create a running allocation when testing task state
  tests: ensure exec tests pass valid task resources (#4992)
  some changes for more idiomatic code
  fix iops related tests
  fixed bug in loop delay
  gofmt
  improved code for readability
  client: updateAlloc release lock after read
  fixup! device attributes in `nomad node status -verbose`
  drivers/exec: support device binds and mounts
  fix iops bug and increase test matrix coverage
  tests: tag image explicitly
  changelog
  ci: install lxc-templates explicitly
  tests: skip checking rdma cgroup
  ci: use Ubuntu 16.04 (Xenial) in TravisCI
  client: update driver info on new fingerprint
  drivers/docker: enforce volumes.enabled (#4983)
  client: Style: use fluent style for building loggers
  ...
2018-12-13 14:41:09 -05:00
Mahmood Ali
97f33bb153 drivers/exec: support device binds and mounts 2018-12-11 18:35:21 -05:00
Nick Ethier
224c68860d executor: use drivers.Resources as resource model 2018-12-06 21:22:02 -05:00