406 Commits

Author SHA1 Message Date
Danielle Lancashire
cd0c2a6df0 csi: Setup gRPC Clients with a logger 2020-03-23 13:58:29 -04:00
Danielle Lancashire
0d412c0d7d csi: Add NodeGetCapabilities RPC 2020-03-23 13:58:29 -04:00
Danielle Lancashire
91e3bfea2e plugins_csi: Add GetControllerCapabilities RPC 2020-03-23 13:58:28 -04:00
Danielle Lancashire
86de04d840 csi: Add initial plumbing for controller rpcs 2020-03-23 13:58:28 -04:00
Danielle Lancashire
d296efd2c6 CSI Plugin Registration (#6555)
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:

* A `csi_plugin` stanza as part of a Nomad task configuration, to
  allow a task to expose that it is a plugin.

* A new task runner hook: `csi_plugin_supervisor`. This hook does two
  things. When the `csi_plugin` stanza is detected, it will
  automatically configure the plugin task to receive bidirectional
  mounts to the CSI intermediary directory. At runtime, it will then
  perform an initial heartbeat of the plugin and handle submitting it to
  the new `dynamicplugins.Registry` for further use by the client, and
  then run a lightweight heartbeat loop that will emit task events
  when health changes.

* The `dynamicplugins.Registry` for handling plugins that run
  as Nomad tasks, in contrast to the existing catalog that requires
  `go-plugin` type plugins and to know the plugin configuration in
  advance.

* The `csimanager` which fingerprints CSI plugins, in a similar way to
  `drivermanager` and `devicemanager`. It currently only fingerprints
  the NodeID from the plugin, and assumes that all plugins are
  monolithic.

Missing features

* We do not use the live updates of the `dynamicplugin` registry in
  the `csimanager` yet.

* We do not deregister the plugins from the client when they shutdown
  yet, they just become indefinitely marked as unhealthy. This is
  deliberate until we figure out how we should manage deploying new
  versions of plugins/transitioning them.
2020-03-23 13:58:28 -04:00
Chris Baker
e2ea6713ea fix typo in comment 2020-03-13 09:09:46 -05:00
Mahmood Ali
d2ddef5ba3 update grpc
Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.
2020-03-03 08:39:54 -05:00
Mahmood Ali
8b7db37613 update protobuf generated files 2020-03-02 15:19:46 -05:00
Mahmood Ali
943854469d driver: allow disabling log collection
Operators commonly have docker logs aggregated using various tools and
don't need nomad to manage their docker logs.  Worse, Nomad uses a
somewhat heavy docker api call to collect them and it seems to cause
problems when a client runs hundreds of log collections.

Here we add a knob to disable log aggregation completely for nomad.
When log collection is disabled, we avoid running logmon and
docker_logger for the docker tasks in this implementation.

The downside here is once disabled, `nomad logs ...` commands and API
no longer return logs and operators must corrolate alloc-ids with their
aggregated log info.

This is meant as a stop gap measure.  Ideally, we'd follow up with at
least two changes:

First, we should optimize behavior when we can such that operators don't
need to disable docker log collection.  Potentially by reverting to
using pre-0.9 syslog aggregation in linux environments, though with
different trade-offs.

Second, when/if logs are disabled, nomad logs endpoints should lookup
docker logs api on demand.  This ensures that the cost of log collection
is paid sparingly.
2019-12-08 14:15:03 -05:00
Tim Gross
f02f59d579 fix plugin launcher SetConfig msgpack params (#6776)
* fix plugin launcher SetConfig msgpack params

The plugin launcher tool was passing the wrong byte array into
`SetConfig`, resulting in msgpack decoding errors. This was fixed in
a949050 (#6725) but accidentally reverted in 6aff18d (#6590).

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>
2019-11-26 10:49:22 -05:00
Lang Martin
6aff18dcd0 plugins device: remove trace level containing config contents 2019-11-25 14:49:40 -05:00
Chris Baker
2dfced01e5 the plugin launcher tool was passing the wrong byte array into
SetConfig, resulting in msgpack decoding errors
2019-11-19 14:53:34 +00:00
Danielle Lancashire
afb59bedf5 volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers an don the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
2019-10-14 14:09:58 +02:00
Tim Gross
e17901d667 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Grégoire Delattre
50068fc348 Fix the ExecTask function in DriverExecTaskNotSupported (#6145)
This fixes the ExecTask definition to match with the DriverPlugin
interface.
2019-08-29 11:36:29 -04:00
Mahmood Ali
a72a0f8832 Merge pull request #5676 from hashicorp/f-b-upgrade-ugorji-dep-20190508
Update ugorji/go to latest
2019-08-23 18:29:49 -04:00
Lucas BEE
04f5ab391f Add NetworkIsolation in TaskConfig (#6135)
NetworkIsolation was left out of the task config when using an
external task driver plugin
2019-08-15 13:05:55 -04:00
Lucas BEE
dfd673f3c6 Fix missing plugin driver capabilities (#6128)
NetIsolationModes and MustInitiateNetwork were left out of the
driver Capabilities when using an external task driver plugin

Signed-off-by: Lucas BEE <pouulet@gmail.com>
2019-08-14 09:10:10 -04:00
Danielle Lancashire
c3c003dbd6 client: Add volume_hook for mounting volumes 2019-08-12 15:39:08 +02:00
Nick Ethier
e26192ad49 Driver networking support
Adds support for passing network isolation config into drivers and
implements support in the rawexec driver as a proof of concept
2019-07-31 01:03:20 -04:00
Nick Ethier
9fa47daf5c ar: fix lint errors 2019-07-31 01:03:19 -04:00
Nick Ethier
da3978b377 plugins/driver: make DriverNetworkManager interface optional 2019-07-31 01:03:19 -04:00
Nick Ethier
bc71cafcee plugin/driver: fix tests and add new dep to vendor 2019-07-31 01:03:17 -04:00
Nick Ethier
4a8a96fa1a ar: initial driver based network management 2019-07-31 01:03:17 -04:00
Nick Ethier
e20fa7ccc1 Add network lifecycle management
Adds a new Prerun and Postrun hooks to manage set up of network namespaces
on linux. Work still needs to be done to make the code platform agnostic and
support Docker style network initalization.
2019-07-31 01:03:17 -04:00
Jasmine Dahilig
e31db578e0 add formatting for hcl parsing error messages (#5972) 2019-07-19 10:04:39 -07:00
Mahmood Ali
d8a43b2066 Signal plugin shutdown for driver.TaskStats
The driver plugin stub client must call `grpcutils.HandleGrpcErr` to handle plugin
shutdown similar to other functions.  This ensures that TaskStats returns
`ErrPluginShutdown` when plugin shutdown.
2019-07-11 13:57:35 +08:00
Chris Baker
3a96683131 docker: DestroyTask was not cleaning up Docker images because it was erroring early due to an attempt to inspect an image that had already been removed 2019-06-03 19:04:27 +00:00
Mahmood Ali
a8a6515e9f Update ugorji/go to latest
Our testing so far indicates that ugorji/go/codec maintains backward
compatiblity with the version we are using now, for purposes of Nomad
serialization.

Using latest ugorji/go allows us to get back to using upstream library,
get get the optimizations benefits in RPC paths (including code
generation optimizations).

ugorji/go introduced two significant changes:
* time binary format in debb8e2d2e.  Setting `h.BasicHandle.TimeNotBuiltin = true` restores old behavior
* ugorji/go started honoring `json` tag as well:

v1.1.4 is the latest but has a bug in handling RawString that's fixed in
d09a80c1e0
.
2019-05-09 19:35:58 -04:00
Mahmood Ali
3a77502cae Add basic drivers conformance tests
Add consolidated testing package to serve as conformance tests for all drivers.
2019-05-09 16:49:08 -04:00
Mahmood Ali
94ed649489 implemment streaming exec handling in driver grpc handlers
Also add a helper that converts the adapts the high level interface to the
low-level interface of nomad exec interfaces.
2019-05-09 16:49:08 -04:00
Mahmood Ali
6d711d054b add nomad streaming exec core data structures and interfaces
In this commit, we add two driver interfaces for supporting `nomad exec`
invocation:

* A high level `ExecTaskStreamingDriver`, that operates on io reader/writers.
  Drivers should prefer using this interface
* A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of
  input structs; useful when a driver delegates handling to driver backend (e.g.
  across RPC/grpc).

The interfaces are optional for a driver, as `nomad exec` support is opt-in.
Existing drivers continue to compile without exec support, until their
maintainer add such support.

Furthermore, we create protobuf structures to represent exec stream entities:
`ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`.  We aim to reuse the
protobuf generated code as much as possible, without translation to avoid
conversion overhead.

`ExecTaskStream` abstract fetching and sending stream entities.  It's influenced
by the grpc bi-directional stream interface, to avoid needing any adapter.  I
considered using channels, but the asynchronisity and concurrency makes buffer
reuse too complicated, which would put more pressure on GC and slows exec operation.
2019-04-30 14:02:29 -04:00
Mahmood Ali
b8f80e5124 Simplify proto conversion and handle swap
Convert all cpu and memory usage fields regardless of stated measured
fields, and handle swap fields
2019-03-30 15:18:28 -04:00
Mahmood Ali
484ccfafab deserialize total ticks 2019-03-30 07:14:57 -04:00
Mahmood Ali
a880f93a61 Always report TotalTicks when percent is measured
Fix a case where TotalTicks doesn't get serialized across executor grpc
calls.

Here, I opted to implicit add field, rather than explicitly mark it as a
measured field, because it's a derived field and to preserve 0.8
behavior where total ticks aren't explicitly marked as a measured field.
2019-03-29 22:34:28 -04:00
Mahmood Ali
7299e0015a Merge pull request #5428 from hashicorp/b-dropped-logs-on-task-restart
client/logmon: restart log collection correctly when a task is restarted
2019-03-21 14:02:08 -04:00
Nick Ethier
cc12c8443f logmon: fix logmon handling in driver test harness 2019-03-20 21:14:08 -04:00
Mahmood Ali
eb5ab38ae5 Regenerate Proto files (#5421)
Noticed that the protobuf files are out of sync with ones generated by 1.2.0 protoc go plugin.

The cause for these files seem to be related to release processes, e.g. [0.9.0-beta1 preperation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preperation](b849d84f2f).

This restores the changes to that of the pinned protoc version and fails build if protobuf files are out of sync.  Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
2019-03-14 10:56:27 -04:00
Michael Schurter
b849d84f2f Generate files for 0.9.0-beta3 release 2019-02-26 09:44:49 -08:00
Michael Schurter
c9fe5d26b3 plugins: squelch context Canceled error logs
As far as I can tell this is the most straightforward and resilient way
to skip error logging on context cancellation with grpc streams. You
cannot compare the error against context.Canceled directly as it is of
type `*status.statusError`. The next best solution I found was:

```go
resp, err := stream.Recv()
if code, ok := err.(interface{ Code() code.Code }); ok {
	if code.Code == code.Canceled {
		return
	}
}
```

However I think checking ctx.Err() directly makes the code much easier
to read and is resilient against grpc API changes.
2019-02-21 15:32:18 -08:00
Michael Schurter
1d17fbc681 simplify hcl2 parsing helper
No need to pass in the entire eval context
2019-02-04 11:07:57 -08:00
Alex Dadgar
ecec3d38de Nomad 0.9.0-beta1 generated code 2019-01-30 10:49:44 -08:00
Nick Ethier
8aa451806b executor: fix bug and add tests for incorrect stats timestamp reporting 2019-01-28 21:57:45 -05:00
Mahmood Ali
b70442a325 drivers: pass logger through driver plugin client
This fixes a panic whenever driver plugin attempts to log a message.
2019-01-25 09:38:41 -05:00
Nick Ethier
f38612c3b3 plugins/drivers: change stats interval to duration type in proto 2019-01-24 22:19:18 -05:00
Mahmood Ali
4781f1dd7b Merge pull request #5229 from hashicorp/r-grabbag-201901019
Grab bag of small changes
2019-01-23 13:06:51 -05:00
Mahmood Ali
cabe5e90f2 spell check 2019-01-23 10:54:52 -05:00
Michael Schurter
158c74887e goimports until make check is happy 2019-01-23 06:27:14 -08:00
Michael Schurter
0d61ff0fb9 move pluginutils -> helper/pluginutils
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar
95297c608c goimports 2019-01-22 15:44:31 -08:00