Commit Graph

56 Commits

Author SHA1 Message Date
Seth Hoenig
5da1a31e94 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
James Rasell
e34fa583f9 allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not current implement it.
2021-09-16 08:13:09 +02:00
Tim Gross
2a640f0b2d docker: generate /etc/hosts file for bridge network mode (#10766)
When `network.mode = "bridge"`, we create a pause container in Docker with no
networking so that we have a process to hold the network namespace we create
in Nomad. The default `/etc/hosts` file of that pause container is then used
for all the Docker tasks that share that network namespace. Some applications
rely on this file being populated.

This changeset generates a `/etc/hosts` file and bind-mounts it to the
container when Nomad owns the network, so that the container's hostname has an
IP in the file as expected. The hosts file will include the entries added by
the Docker driver's `extra_hosts` field.

In this changeset, only the Docker task driver will take advantage of this
option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts`
file and this can't be changed without breaking backwards compatibility. But
the fields are available in the task driver protobuf for community task
drivers to use if they'd like.
2021-06-16 14:55:22 -04:00
Michael Schurter
d50fb2a00e core: propagate remote task handles
Add a new driver capability: RemoteTasks.

When a task is run by a driver with RemoteTasks set, its TaskHandle will
be propagated to the server in its allocation's TaskState. If the task
is replaced due to a down node or draining, its TaskHandle will be
propagated to its replacement allocation.

This allows tasks to be scheduled in remote systems whose lifecycles are
disconnected from the Nomad node's lifecycle.

See https://github.com/hashicorp/nomad-driver-ecs for an example ECS
remote task driver.
2021-04-27 15:07:03 -07:00
Nick Ethier
9c014b62c6 plugins/driver: add cpuset_cpus back and mark cpuset_mems as reserved 2021-04-15 13:31:18 -04:00
Nick Ethier
5377be43ff executor: add support for cpuset cgroup 2021-04-15 10:24:31 -04:00
Nick Ethier
f897ac79e8 client/ar: thread through cpuset manager 2021-04-13 13:28:36 -04:00
Mahmood Ali
b1ff06fd19 oversubscription: docker to honor MemoryMaxMB values 2021-03-30 16:55:58 -04:00
Adrian Todorov
2748d2a895 driver/docker: add extra labels ( job name, task and task group name) 2021-03-08 08:59:52 -05:00
Nick Ethier
c11dbcd001 docker: support group allocated ports and host_networks (#8623)
* docker: support group allocated ports

* docker: add new ports driver config to specify which group ports are mapped

* docker: update port mapping docs
2020-08-11 18:30:22 -04:00
Nick Ethier
e9ff8a8daa Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali
d6c75e301e cleanup driver eventor goroutines
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting.  Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
Tim Gross
8860b72bc3 volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
Chris Baker
e2ea6713ea fix typo in comment 2020-03-13 09:09:46 -05:00
Mahmood Ali
943854469d driver: allow disabling log collection
Operators commonly have docker logs aggregated using various tools and
don't need nomad to manage their docker logs.  Worse, Nomad uses a
somewhat heavy docker api call to collect them and it seems to cause
problems when a client runs hundreds of log collections.

Here we add a knob to disable log aggregation completely for nomad.
When log collection is disabled, we avoid running logmon and
docker_logger for the docker tasks in this implementation.

The downside here is once disabled, `nomad logs ...` commands and API
no longer return logs and operators must corrolate alloc-ids with their
aggregated log info.

This is meant as a stop gap measure.  Ideally, we'd follow up with at
least two changes:

First, we should optimize behavior when we can such that operators don't
need to disable docker log collection.  Potentially by reverting to
using pre-0.9 syslog aggregation in linux environments, though with
different trade-offs.

Second, when/if logs are disabled, nomad logs endpoints should lookup
docker logs api on demand.  This ensures that the cost of log collection
is paid sparingly.
2019-12-08 14:15:03 -05:00
Danielle Lancashire
afb59bedf5 volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers an don the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
2019-10-14 14:09:58 +02:00
Tim Gross
e17901d667 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Grégoire Delattre
50068fc348 Fix the ExecTask function in DriverExecTaskNotSupported (#6145)
This fixes the ExecTask definition to match with the DriverPlugin
interface.
2019-08-29 11:36:29 -04:00
Danielle Lancashire
c3c003dbd6 client: Add volume_hook for mounting volumes 2019-08-12 15:39:08 +02:00
Nick Ethier
da3978b377 plugins/driver: make DriverNetworkManager interface optional 2019-07-31 01:03:19 -04:00
Nick Ethier
bc71cafcee plugin/driver: fix tests and add new dep to vendor 2019-07-31 01:03:17 -04:00
Nick Ethier
4a8a96fa1a ar: initial driver based network management 2019-07-31 01:03:17 -04:00
Nick Ethier
e20fa7ccc1 Add network lifecycle management
Adds a new Prerun and Postrun hooks to manage set up of network namespaces
on linux. Work still needs to be done to make the code platform agnostic and
support Docker style network initalization.
2019-07-31 01:03:17 -04:00
Mahmood Ali
6d711d054b add nomad streaming exec core data structures and interfaces
In this commit, we add two driver interfaces for supporting `nomad exec`
invocation:

* A high level `ExecTaskStreamingDriver`, that operates on io reader/writers.
  Drivers should prefer using this interface
* A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of
  input structs; useful when a driver delegates handling to driver backend (e.g.
  across RPC/grpc).

The interfaces are optional for a driver, as `nomad exec` support is opt-in.
Existing drivers continue to compile without exec support, until their
maintainer add such support.

Furthermore, we create protobuf structures to represent exec stream entities:
`ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`.  We aim to reuse the
protobuf generated code as much as possible, without translation to avoid
conversion overhead.

`ExecTaskStream` abstract fetching and sending stream entities.  It's influenced
by the grpc bi-directional stream interface, to avoid needing any adapter.  I
considered using channels, but the asynchronisity and concurrency makes buffer
reuse too complicated, which would put more pressure on GC and slows exec operation.
2019-04-30 14:02:29 -04:00
Nick Ethier
994c66f7d7 drivers: use consts for task handle version 2019-01-18 18:31:01 -05:00
Nick Ethier
fbf9a4c772 executor: implement streaming stats API
plugins/driver: update driver interface to support streaming stats

client/tr: use streaming stats api

TODO:
 * how to handle errors and closed channel during stats streaming
 * prevent tight loop if Stats(ctx) returns an error

drivers: update drivers TaskStats RPC to handle streaming results

executor: better error handling in stats rpc

docker: better control and error handling of stats rpc

driver: allow stats to return a recoverable error
2019-01-12 12:18:22 -05:00
Mahmood Ali
800a3522e3 drivers: re-export ResourceUsage structs
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.

I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle.  Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
2019-01-08 09:11:47 -05:00
Mahmood Ali
c0162fab35 move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Mahmood Ali
694e3010c2 use drivers.FSIsolation 2019-01-08 09:11:47 -05:00
Danielle Tomlinson
994057abf4 drivers: Add internal interface for Shutdown
This allows us to correctly terminate internal state during runs of the
nomad test suite, e.g closing eventer contexts correctly.
2019-01-08 13:48:49 +01:00
Preetha Appan
26594aa31e Standardize driver health description messages for all drivers 2019-01-06 22:06:38 -06:00
Alex Dadgar
99df4c98c7 Store device envs separately and pass to drivers 2018-12-19 14:23:09 -08:00
Nick Ethier
6951ca487d drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Alex Dadgar
da6925bfc1 utilities 2018-12-18 15:48:52 -08:00
Danielle Tomlinson
23197ec6b4 drivers: Create drivers/shared/structs
This creates a drivers/shared/structs package and moves the buffer size
checks into it.
2018-11-30 10:46:13 +01:00
Preetha
be65442002 Merge pull request #4919 from hashicorp/f-fingerprint-attribute-type
Modify fingerprint interface to use typed attribute struct
2018-11-28 14:18:28 -06:00
Preetha Appan
829bf74aa8 modify fingerprint interface to use typed attribute struct 2018-11-28 10:01:03 -06:00
Alex Dadgar
429c5bb885 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Chris Baker
e3108507c5 Merge pull request #4891 from hashicorp/b-1150-rkt-volume-names
drivers/rkt: fix invalid volumes
2018-11-27 18:55:00 -05:00
Preetha Appan
22ac0498ca Fix nil dereference in copy method 2018-11-26 15:53:15 -06:00
Chris Baker
790fe0b1db modified TaskConfig to include AllocID
use this for volume names in drivers/rkt to address #1150
2018-11-26 18:54:26 +00:00
Nick Ethier
fc53f5f635 docker: sync access to exit result within a handle 2018-11-20 20:41:32 -05:00
Nick Ethier
3601e4241d plugins/driver: remove NodeResources from task Resources and use PercentTicks field for docker driver 2018-11-19 22:59:17 -05:00
Nick Ethier
37ed75502e docker: move recoverable error proto to shared structs 2018-11-19 22:59:16 -05:00
Nick Ethier
9c154d7135 drivers: added NodeResources to drivers.TaskConfig 2018-11-19 22:59:16 -05:00
Nick Ethier
98b295d617 docker: started work on porting docker driver to new plugin framework 2018-11-19 22:59:15 -05:00
Alex Dadgar
9d42f4d039 Plugin client's handle plugin dying
This PR plumbs the plugins done ctx through the base and driver plugin
clients (device already had it). Further, it adds generic handling of
gRPC stream errors.
2018-11-12 17:09:27 -08:00
Michael Schurter
fdbe446ea6 client: first pass at implementing task restoring
Task restoring works but dead tasks may be restarted
2018-11-05 12:32:05 -08:00
Preetha Appan
3c6d6b9377 Review comments 2018-10-16 16:56:56 -07:00
Preetha Appan
0ebc3bdd2f RKT driver plugin and unit tests 2018-10-16 16:56:56 -07:00