41 Commits

Author SHA1 Message Date
Michael Smithhisler
47c14ddf28 remove remote task execution code (#24909) 2025-01-29 08:08:34 -05:00
Seth Hoenig
bb54d16e4a exec2: setup RPC plumbing for dynamic workload users (#20129)
And pass the dynamic users pool from the client into the hook.
2024-03-13 14:06:52 -05:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
Seth Hoenig
89ce092b20 docker: stop network pause container of lost alloc after node restart (#17455)
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
2023-06-09 08:46:29 -05:00
Tim Gross
bf7b82b52b drivers: make internal DisableLogCollection capability public (#17196)
The `DisableLogCollection` capability was introduced as an experimental
interface for the Docker driver in 0.10.4. The interface has been stable and
allowing third-party task drivers the same capability would be useful for those
drivers that don't need the additional overhead of logmon.

This PR only makes the capability public. It doesn't yet add it to the
configuration options for the other internal drivers.

Fixes: #14636 #15686
2023-05-16 09:16:03 -04:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
stswidwinski
d16a2c9467 Fix goroutine leakage (#15180)
* Fix goroutine leakage

* cl: add cl entry

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-11-17 09:47:11 -06:00
fyn
b6ec83b59b fix(plugins): should return when ctx.Done 2022-04-09 01:04:29 +08:00
Thomas Lefebvre
4c9f476d32 fix: update incorrect DriverNetworkManager interface implementation in plugins/drivers/client.go and drivers/mock/driver.go
And add assertions to catch drifts at compilation time.
2022-03-15 11:51:01 -07:00
Michael Schurter
d50fb2a00e core: propagate remote task handles
Add a new driver capability: RemoteTasks.

When a task is run by a driver with RemoteTasks set, its TaskHandle will
be propagated to the server in its allocation's TaskState. If the task
is replaced due to a down node or draining, its TaskHandle will be
propagated to its replacement allocation.

This allows tasks to be scheduled in remote systems whose lifecycles are
disconnected from the Nomad node's lifecycle.

See https://github.com/hashicorp/nomad-driver-ecs for an example ECS
remote task driver.
2021-04-27 15:07:03 -07:00
Tim Gross
8860b72bc3 volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
Tim Gross
e17901d667 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Lucas BEE
dfd673f3c6 Fix missing plugin driver capabilities (#6128)
NetIsolationModes and MustInitiateNetwork were left out of the
driver Capabilities when using an external task driver plugin

Signed-off-by: Lucas BEE <pouulet@gmail.com>
2019-08-14 09:10:10 -04:00
Nick Ethier
4a8a96fa1a ar: initial driver based network management 2019-07-31 01:03:17 -04:00
Mahmood Ali
d8a43b2066 Signal plugin shutdown for driver.TaskStats
The driver plugin stub client must call `grpcutils.HandleGrpcErr` to handle plugin
shutdown similar to other functions.  This ensures that TaskStats returns
`ErrPluginShutdown` when plugin shutdown.
2019-07-11 13:57:35 +08:00
Mahmood Ali
94ed649489 implemment streaming exec handling in driver grpc handlers
Also add a helper that converts the adapts the high level interface to the
low-level interface of nomad exec interfaces.
2019-05-09 16:49:08 -04:00
Michael Schurter
c9fe5d26b3 plugins: squelch context Canceled error logs
As far as I can tell this is the most straightforward and resilient way
to skip error logging on context cancellation with grpc streams. You
cannot compare the error against context.Canceled directly as it is of
type `*status.statusError`. The next best solution I found was:

```go
resp, err := stream.Recv()
if code, ok := err.(interface{ Code() code.Code }); ok {
	if code.Code == code.Canceled {
		return
	}
}
```

However I think checking ctx.Err() directly makes the code much easier
to read and is resilient against grpc API changes.
2019-02-21 15:32:18 -08:00
Nick Ethier
f38612c3b3 plugins/drivers: change stats interval to duration type in proto 2019-01-24 22:19:18 -05:00
Michael Schurter
158c74887e goimports until make check is happy 2019-01-23 06:27:14 -08:00
Michael Schurter
0d61ff0fb9 move pluginutils -> helper/pluginutils
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar
fe2fa21a7d gofmt 2019-01-22 15:43:34 -08:00
Alex Dadgar
b9f36134dc move catalog + grpcutils 2019-01-22 15:11:57 -08:00
Nick Ethier
9904463da2 executor: fix failing stats related test 2019-01-12 12:18:23 -05:00
Nick Ethier
f6af1d4d04 docker: add test for stats collection 2019-01-12 12:18:22 -05:00
Nick Ethier
fbf9a4c772 executor: implement streaming stats API
plugins/driver: update driver interface to support streaming stats

client/tr: use streaming stats api

TODO:
 * how to handle errors and closed channel during stats streaming
 * prevent tight loop if Stats(ctx) returns an error

drivers: update drivers TaskStats RPC to handle streaming results

executor: better error handling in stats rpc

docker: better control and error handling of stats rpc

driver: allow stats to return a recoverable error
2019-01-12 12:18:22 -05:00
Mahmood Ali
800a3522e3 drivers: re-export ResourceUsage structs
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.

I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle.  Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
2019-01-08 09:11:47 -05:00
Mahmood Ali
c0162fab35 move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Mahmood Ali
694e3010c2 use drivers.FSIsolation 2019-01-08 09:11:47 -05:00
Alex Dadgar
437f03d877 recover 2019-01-07 14:49:40 -08:00
Alex Dadgar
ffadab1b20 remove nil logger 2019-01-07 14:48:01 -08:00
Nick Ethier
6951ca487d drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Nick Ethier
467930f650 executor: use grpc instead of netrpc as plugin protocol
* Added protobuf spec for executor
 * Seperated executor structs into their own package
2018-12-05 11:03:56 -05:00
Preetha Appan
829bf74aa8 modify fingerprint interface to use typed attribute struct 2018-11-28 10:01:03 -06:00
Nick Ethier
37ed75502e docker: move recoverable error proto to shared structs 2018-11-19 22:59:16 -05:00
Nick Ethier
c2d94dc86a drivers: support recoverable errors in the plugin RPC layer 2018-11-19 22:59:15 -05:00
Alex Dadgar
9d42f4d039 Plugin client's handle plugin dying
This PR plumbs the plugins done ctx through the base and driver plugin
clients (device already had it). Further, it adds generic handling of
gRPC stream errors.
2018-11-12 17:09:27 -08:00
Michael Schurter
fd2fcd7cb6 drivers: only log non-cancellation errors 2018-10-30 17:13:35 -07:00
Nick Ethier
2e055fe18a client: add test for driverfailure during fingerprinting 2018-10-16 16:56:56 -07:00
Nick Ethier
d335a82859 client: begin driver plugin integration
client: fingerprint driver plugins
2018-10-16 16:56:56 -07:00
Nick Ethier
c9f0d2e0b4 driver/raw_exec: port existing raw_exec tests and add some testing utilities 2018-10-16 16:53:31 -07:00
Nick Ethier
e2bf0a388e clientv2: base driver plugin (#4671)
Driver plugin framework to facilitate development of driver plugins.

Implementing plugins only need to implement the DriverPlugin interface.
The framework proxies this interface to the go-plugin GRPC interface generated
from the driver.proto spec.

A testing harness is provided to allow implementing drivers to test the full
lifecycle of the driver plugin. An example use:

func TestMyDriver(t *testing.T) {
    harness := NewDriverHarness(t, &MyDiverPlugin{})
    // The harness implements the DriverPlugin interface and can be used as such
    taskHandle, err := harness.StartTask(...)
}
2018-10-16 16:53:31 -07:00