Commit Graph

4052 Commits

Author SHA1 Message Date
Danielle Lancashire
9bc0ff16f6 volume_manager: NodeStageVolume Support
This commit introduces support for staging volumes when a plugin
implements the STAGE_UNSTAGE_VOLUME capability.

See the following for further reference material:
 4731db0e0b/spec.md (nodestagevolume)
2020-03-23 13:58:29 -04:00
Danielle Lancashire
6b28db98b6 volume_manager: Introduce helpers for staging
This commit adds helpers that create and validate the staging directory
for a given volume. It is currently missing usage options as the
interfaces are not yet in place for those.

The staging directory is only required when a volume has the
STAGE_UNSTAGE Volume capability and has to live within the plugin root
as the plugin needs to be able to create mounts inside it from within
the container.
2020-03-23 13:58:29 -04:00
Lang Martin
5c26cdf08b csi: pluginmanager use PluginID instead of Driver 2020-03-23 13:58:29 -04:00
Danielle Lancashire
1250d56333 csi: Add VolumeManager (#6920)
This changeset is some pre-requisite boilerplate that is required for
introducing CSI volume management for client nodes.

It extracts out fingerprinting logic from the csi instance manager.
This change is to facilitate reusing the csimanager to also manage the
node-local CSI functionality, as it is the easiest place for us to
guaruntee health checking and to provide additional visibility into the
running operations through the fingerprinter mechanism and goroutine.

It also introduces the VolumeMounter interface that will be used to
manage staging/publishing unstaging/unpublishing of volumes on the host.
2020-03-23 13:58:29 -04:00
Lang Martin
ee8496a88e client structs: use nstructs rather than s for nomad/structs 2020-03-23 13:58:29 -04:00
Lang Martin
c37621cc98 client structs: move CSIVolumeAttachmentMode and CSIVolumeAccessMode 2020-03-23 13:58:29 -04:00
Danielle Lancashire
cd0c2a6df0 csi: Setup gRPC Clients with a logger 2020-03-23 13:58:29 -04:00
Danielle Lancashire
3f36dae246 csimanager: Fingerprint Node Service capabilities 2020-03-23 13:58:29 -04:00
Danielle Lancashire
406984ca8d csimanager: Fingerprint controller capabilities 2020-03-23 13:58:29 -04:00
Danielle Lancashire
a7f7114590 client_csi: Validate Access/Attachment modes 2020-03-23 13:58:28 -04:00
Danielle Lancashire
19d06d5bb2 csi: ClientCSIControllerPublish* -> ClientCSIControllerAttach* 2020-03-23 13:58:28 -04:00
Danielle Lancashire
964ede4301 csi: Model Attachment and Access modes 2020-03-23 13:58:28 -04:00
Danielle Lancashire
778a32de4a client: Setup CSI RPC Endpoint
This commit introduces a new set of endpoints to a Nomad Client:
ClientCSI.

ClientCSI is responsible for mediating requests from a Nomad Server to
a CSI Plugin running on a Nomad Client. It should only really be used to
make controller RPCs.
2020-03-23 13:58:28 -04:00
Danielle Lancashire
d296efd2c6 CSI Plugin Registration (#6555)
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:

* A `csi_plugin` stanza as part of a Nomad task configuration, to
  allow a task to expose that it is a plugin.

* A new task runner hook: `csi_plugin_supervisor`. This hook does two
  things. When the `csi_plugin` stanza is detected, it will
  automatically configure the plugin task to receive bidirectional
  mounts to the CSI intermediary directory. At runtime, it will then
  perform an initial heartbeat of the plugin and handle submitting it to
  the new `dynamicplugins.Registry` for further use by the client, and
  then run a lightweight heartbeat loop that will emit task events
  when health changes.

* The `dynamicplugins.Registry` for handling plugins that run
  as Nomad tasks, in contrast to the existing catalog that requires
  `go-plugin` type plugins and to know the plugin configuration in
  advance.

* The `csimanager` which fingerprints CSI plugins, in a similar way to
  `drivermanager` and `devicemanager`. It currently only fingerprints
  the NodeID from the plugin, and assumes that all plugins are
  monolithic.

Missing features

* We do not use the live updates of the `dynamicplugin` registry in
  the `csimanager` yet.

* We do not deregister the plugins from the client when they shutdown
  yet, they just become indefinitely marked as unhealthy. This is
  deliberate until we figure out how we should manage deploying new
  versions of plugins/transitioning them.
2020-03-23 13:58:28 -04:00
Mahmood Ali
2ad338ef38 address review feedback 2020-03-21 17:52:58 -04:00
Mahmood Ali
83b08ab158 tr: proceed to mark other tasks as dead if alloc fails 2020-03-21 17:52:58 -04:00
Mahmood Ali
4558fa6aec fix test 2020-03-21 17:52:57 -04:00
Jasmine Dahilig
6c1474398f change jobspec lifecycle stanza to use sidecar attribute instead of
block_until status
2020-03-21 17:52:57 -04:00
Jasmine Dahilig
dcd317745d fix restart policy for system jobs with no lifecycle 2020-03-21 17:52:56 -04:00
Jasmine Dahilig
3688a2b7a3 refactor TaskHookCoordinator tests to use mock package and add failed init and sidecar test cases 2020-03-21 17:52:56 -04:00
Jasmine Dahilig
db7e8614f3 remove debugging test code from TestAllocRunner_TaskLeader_StopRestoredTG 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
60671f880d fix bug in lifecycle restore tests after refactor 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
90fa242d83 fix failing ci test: TestTaskRunner_UnregisterConsul_Retries 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
da3eb69a2f fix linting errors 2020-03-21 17:52:53 -04:00
Jasmine Dahilig
ee92c98d4e add task hook coordinator many init tasks test case 2020-03-21 17:52:53 -04:00
Jasmine Dahilig
88d3e232a2 refactor task hook coordinator helper method and tests 2020-03-21 17:52:53 -04:00
Jasmine Dahilig
0031b6777f clean up restore test 2020-03-21 17:52:52 -04:00
Jasmine Dahilig
aced15ea27 partial test for restore functionality 2020-03-21 17:52:52 -04:00
Jasmine Dahilig
48ce093dd5 account for client restarts in task lifecycle hooks 2020-03-21 17:52:51 -04:00
Jasmine Dahilig
c6cd7b523b clean up restart conditions and restart tests for task lifecycle 2020-03-21 17:52:50 -04:00
Jasmine Dahilig
4be7d056ac put lifecycle nil and empty checks in api Canonicalize 2020-03-21 17:52:50 -04:00
Jasmine Dahilig
fa19007dfb update task hook coordinator tests 2020-03-21 17:52:46 -04:00
Jasmine Dahilig
c2ab4c9c90 add test for lifecycle coordinator 2020-03-21 17:52:42 -04:00
Jasmine Dahilig
262d204096 incorporate lifecycle into restart tracker 2020-03-21 17:52:40 -04:00
Mahmood Ali
5377b4cb58 Add a coordinator for alloc runners 2020-03-21 17:52:38 -04:00
Yoan Blanc
77cf2f0573 vendor: vault api and sdk
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-21 17:57:48 +01:00
Mahmood Ali
ac53c110a6 Merge pull request #7236 from hashicorp/b-remove-rkt
Remove rkt as a built-in driver
2020-03-17 09:07:35 -04:00
Mahmood Ali
4e4c3873cb Update gopsutil code
Latest gosutil includes two backward incompatible changes:

First, it removed unused Stolen field in
cae8efcffa (diff-d9747e2da342bdb995f6389533ad1a3d)
.

Second, it updated the Windows cpu stats calculation to be inline with
other platforms, where it returns absolate stats rather than
percentages.  See https://github.com/shirou/gopsutil/pull/611.
2020-03-15 09:37:05 +01:00
Yoan Blanc
f80cbe86a1 gopsutils: v2.20.2
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-15 09:36:59 +01:00
Michael Schurter
64c40af018 Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade
Update consul-template to v0.24.1 and remove deprecated vault grace
2020-03-10 14:15:47 -07:00
Mahmood Ali
3284a34b42 Merge pull request #7255 from hashicorp/vendor-update-grpc-20200302
update grpc
2020-03-04 09:32:16 -05:00
Mahmood Ali
d2ddef5ba3 update grpc
Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.
2020-03-03 08:39:54 -05:00
Mahmood Ali
e812954bd9 Simplify Bootstrap logic in tests
This change updates tests to honor `BootstrapExpect` exclusively when
forming test clusters and removes test only knobs, e.g.
`config.DevDisableBootstrap`.

Background:

Test cluster creation is fragile.  Test servers don't follow the
BootstapExpected route like production clusters.  Instead they start as
single node clusters and then get rejoin and may risk causing brain
split or other test flakiness.

The test framework expose few knobs to control those (e.g.
`config.DevDisableBootstrap` and `config.Bootstrap`) that control
whether a server should bootstrap the cluster.  These flags are
confusing and it's unclear when to use: their usage in multi-node
cluster isn't properly documented.  Furthermore, they have some bad
side-effects as they don't control Raft library: If
`config.DevDisableBootstrap` is true, the test server may not
immediately attempt to bootstrap a cluster, but after an election
timeout (~50ms), Raft may force a leadership election and win it (with
only one vote) and cause a split brain.

The knobs are also confusing as Bootstrap is an overloaded term.  In
BootstrapExpect, we refer to bootstrapping the cluster only after N
servers are connected.  But in tests and the knobs above, it refers to
whether the server is a single node cluster and shouldn't wait for any
other server.

Changes:

This commit makes two changes:

First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or
`DevMode` flags.  This change is relatively trivial.

Introduce a `Bootstrapped` flag to track if the cluster is bootstrapped.
This allows us to keep `BootstrapExpected` immutable.  Previously, the
flag was a config value but it gets set to 0 after cluster bootstrap
completes.
2020-03-02 13:47:43 -05:00
Mahmood Ali
e265e4c7b2 Remove rkt as a built-in driver
Rkt has been archived and is no longer an active project:
* https://github.com/rkt/rkt
* https://github.com/rkt/rkt/issues/4024

The rkt driver will continue to live as an external plugin.
2020-02-26 22:16:41 -05:00
Fredrik Hoem Grelland
26cca14f27 Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170) 2020-02-23 16:24:53 +01:00
Nick Ethier
3cd4d11efa Merge pull request #7163 from hashicorp/b-driver-plugin-recovery
drivermanager: attempt dispense on reattachment failure
2020-02-21 10:33:20 -05:00
Mahmood Ali
a3b0b25acb update rest of consul packages 2020-02-16 16:25:04 -06:00
Nick Ethier
64b7c91538 drivermanager: attempt dispense on reattachment failure 2020-02-15 00:50:06 -05:00
Seth Hoenig
1ced8ba47d Merge pull request #7106 from hashicorp/f-ctag-override
client: enable configuring enable_tag_override for services
2020-02-13 12:34:48 -06:00
Michael Schurter
3a01ad4892 Merge pull request #7102 from hashicorp/test-limits
Fix some race conditions and flaky tests
2020-02-13 10:19:11 -08:00