Commit Graph

4254 Commits

Author SHA1 Message Date
Lang Martin
ee8496a88e client structs: use nstructs rather than s for nomad/structs 2020-03-23 13:58:29 -04:00
Lang Martin
c37621cc98 client structs: move CSIVolumeAttachmentMode and CSIVolumeAccessMode 2020-03-23 13:58:29 -04:00
Danielle Lancashire
cd0c2a6df0 csi: Setup gRPC Clients with a logger 2020-03-23 13:58:29 -04:00
Danielle Lancashire
3f36dae246 csimanager: Fingerprint Node Service capabilities 2020-03-23 13:58:29 -04:00
Danielle Lancashire
406984ca8d csimanager: Fingerprint controller capabilities 2020-03-23 13:58:29 -04:00
Danielle Lancashire
a7f7114590 client_csi: Validate Access/Attachment modes 2020-03-23 13:58:28 -04:00
Danielle Lancashire
19d06d5bb2 csi: ClientCSIControllerPublish* -> ClientCSIControllerAttach* 2020-03-23 13:58:28 -04:00
Danielle Lancashire
964ede4301 csi: Model Attachment and Access modes 2020-03-23 13:58:28 -04:00
Danielle Lancashire
778a32de4a client: Setup CSI RPC Endpoint
This commit introduces a new set of endpoints to a Nomad Client:
ClientCSI.

ClientCSI is responsible for mediating requests from a Nomad Server to
a CSI Plugin running on a Nomad Client. It should only really be used to
make controller RPCs.
2020-03-23 13:58:28 -04:00
Danielle Lancashire
d296efd2c6 CSI Plugin Registration (#6555)
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:

* A `csi_plugin` stanza as part of a Nomad task configuration, to
  allow a task to expose that it is a plugin.

* A new task runner hook: `csi_plugin_supervisor`. This hook does two
  things. When the `csi_plugin` stanza is detected, it will
  automatically configure the plugin task to receive bidirectional
  mounts to the CSI intermediary directory. At runtime, it will then
  perform an initial heartbeat of the plugin and handle submitting it to
  the new `dynamicplugins.Registry` for further use by the client, and
  then run a lightweight heartbeat loop that will emit task events
  when health changes.

* The `dynamicplugins.Registry` for handling plugins that run
  as Nomad tasks, in contrast to the existing catalog that requires
  `go-plugin` type plugins and to know the plugin configuration in
  advance.

* The `csimanager` which fingerprints CSI plugins, in a similar way to
  `drivermanager` and `devicemanager`. It currently only fingerprints
  the NodeID from the plugin, and assumes that all plugins are
  monolithic.

Missing features

* We do not use the live updates of the `dynamicplugin` registry in
  the `csimanager` yet.

* We do not deregister the plugins from the client when they shutdown
  yet, they just become indefinitely marked as unhealthy. This is
  deliberate until we figure out how we should manage deploying new
  versions of plugins/transitioning them.
2020-03-23 13:58:28 -04:00
Drew Bailey
ae5777c4ea Audit config, seams for enterprise audit features
allow oss to parse sink duration

clean up audit sink parsing

ent eventer config reload

fix typo

SetEnabled to eventer interface

client acl test

rm dead code

fix failing test
2020-03-23 13:47:42 -04:00
Mahmood Ali
525623c53c health tracker: account for group service checks 2020-03-22 12:38:37 -04:00
Mahmood Ali
1454af731d health check account for task lifecycle
In service jobs, lifecycles non-sidecar task tweak health logic a bit:
they may terminate successfully without impacting alloc health, but fail
the alloc if they fail.

Sidecars should be treated just like a normal task.
2020-03-22 12:37:40 -04:00
Mahmood Ali
3132176acd health: fail health if any task is pending
Fixes a bug where an allocation is considered healthy if some of the
tasks are being restarted and as such, their checks aren't tracked by
consul agent client.

Here, we fix the immediate case by ensuring that an alloc is healthy
only if tasks are running and the registered checks at the time are
healthy.

Previously, health tracker tracked task "health" independently from
checks and leads to problems when a task restarts.  Consider the
following series of events:

1. all tasks start running -> `tracker.tasksHealthy` is true
2. one task has unhealthy checks and get restarted
3. remaining checks are healthy -> `tracker.checksHealthy` is true
4. propagate health status now that `tracker.tasksHealthy` and
`tracker.checksHealthy`.

This change ensures that we accurately use the latest status of tasks
and checks regardless of their status changes.

Also, ensures that we only consider check health after tasks are
considered healthy, otherwise we risk trusting incomplete checks.

This approach accomodates task dependencies well.  Service jobs can have
prestart short-lived tasks that will terminate before main process runs.
These dead tasks that complete successfully will not negate health
status.
2020-03-22 11:13:41 -04:00
Mahmood Ali
3719ff3059 tests: add a check for failing service checks
Add tests to check for failing or missing service checks in consul
update.
2020-03-22 11:13:40 -04:00
Mahmood Ali
2ad338ef38 address review feedback 2020-03-21 17:52:58 -04:00
Mahmood Ali
83b08ab158 tr: proceed to mark other tasks as dead if alloc fails 2020-03-21 17:52:58 -04:00
Mahmood Ali
4558fa6aec fix test 2020-03-21 17:52:57 -04:00
Jasmine Dahilig
6c1474398f change jobspec lifecycle stanza to use sidecar attribute instead of
block_until status
2020-03-21 17:52:57 -04:00
Jasmine Dahilig
dcd317745d fix restart policy for system jobs with no lifecycle 2020-03-21 17:52:56 -04:00
Jasmine Dahilig
3688a2b7a3 refactor TaskHookCoordinator tests to use mock package and add failed init and sidecar test cases 2020-03-21 17:52:56 -04:00
Jasmine Dahilig
db7e8614f3 remove debugging test code from TestAllocRunner_TaskLeader_StopRestoredTG 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
60671f880d fix bug in lifecycle restore tests after refactor 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
90fa242d83 fix failing ci test: TestTaskRunner_UnregisterConsul_Retries 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
da3eb69a2f fix linting errors 2020-03-21 17:52:53 -04:00
Jasmine Dahilig
ee92c98d4e add task hook coordinator many init tasks test case 2020-03-21 17:52:53 -04:00
Jasmine Dahilig
88d3e232a2 refactor task hook coordinator helper method and tests 2020-03-21 17:52:53 -04:00
Jasmine Dahilig
0031b6777f clean up restore test 2020-03-21 17:52:52 -04:00
Jasmine Dahilig
aced15ea27 partial test for restore functionality 2020-03-21 17:52:52 -04:00
Jasmine Dahilig
48ce093dd5 account for client restarts in task lifecycle hooks 2020-03-21 17:52:51 -04:00
Jasmine Dahilig
c6cd7b523b clean up restart conditions and restart tests for task lifecycle 2020-03-21 17:52:50 -04:00
Jasmine Dahilig
4be7d056ac put lifecycle nil and empty checks in api Canonicalize 2020-03-21 17:52:50 -04:00
Jasmine Dahilig
fa19007dfb update task hook coordinator tests 2020-03-21 17:52:46 -04:00
Jasmine Dahilig
c2ab4c9c90 add test for lifecycle coordinator 2020-03-21 17:52:42 -04:00
Jasmine Dahilig
262d204096 incorporate lifecycle into restart tracker 2020-03-21 17:52:40 -04:00
Mahmood Ali
5377b4cb58 Add a coordinator for alloc runners 2020-03-21 17:52:38 -04:00
Yoan Blanc
77cf2f0573 vendor: vault api and sdk
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-21 17:57:48 +01:00
Mahmood Ali
ac53c110a6 Merge pull request #7236 from hashicorp/b-remove-rkt
Remove rkt as a built-in driver
2020-03-17 09:07:35 -04:00
Mahmood Ali
4e4c3873cb Update gopsutil code
Latest gosutil includes two backward incompatible changes:

First, it removed unused Stolen field in
cae8efcffa (diff-d9747e2da342bdb995f6389533ad1a3d)
.

Second, it updated the Windows cpu stats calculation to be inline with
other platforms, where it returns absolate stats rather than
percentages.  See https://github.com/shirou/gopsutil/pull/611.
2020-03-15 09:37:05 +01:00
Yoan Blanc
f80cbe86a1 gopsutils: v2.20.2
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-15 09:36:59 +01:00
Michael Schurter
64c40af018 Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade
Update consul-template to v0.24.1 and remove deprecated vault grace
2020-03-10 14:15:47 -07:00
Mahmood Ali
3284a34b42 Merge pull request #7255 from hashicorp/vendor-update-grpc-20200302
update grpc
2020-03-04 09:32:16 -05:00
Mahmood Ali
d2ddef5ba3 update grpc
Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.
2020-03-03 08:39:54 -05:00
Mahmood Ali
e812954bd9 Simplify Bootstrap logic in tests
This change updates tests to honor `BootstrapExpect` exclusively when
forming test clusters and removes test only knobs, e.g.
`config.DevDisableBootstrap`.

Background:

Test cluster creation is fragile.  Test servers don't follow the
BootstapExpected route like production clusters.  Instead they start as
single node clusters and then get rejoin and may risk causing brain
split or other test flakiness.

The test framework expose few knobs to control those (e.g.
`config.DevDisableBootstrap` and `config.Bootstrap`) that control
whether a server should bootstrap the cluster.  These flags are
confusing and it's unclear when to use: their usage in multi-node
cluster isn't properly documented.  Furthermore, they have some bad
side-effects as they don't control Raft library: If
`config.DevDisableBootstrap` is true, the test server may not
immediately attempt to bootstrap a cluster, but after an election
timeout (~50ms), Raft may force a leadership election and win it (with
only one vote) and cause a split brain.

The knobs are also confusing as Bootstrap is an overloaded term.  In
BootstrapExpect, we refer to bootstrapping the cluster only after N
servers are connected.  But in tests and the knobs above, it refers to
whether the server is a single node cluster and shouldn't wait for any
other server.

Changes:

This commit makes two changes:

First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or
`DevMode` flags.  This change is relatively trivial.

Introduce a `Bootstrapped` flag to track if the cluster is bootstrapped.
This allows us to keep `BootstrapExpected` immutable.  Previously, the
flag was a config value but it gets set to 0 after cluster bootstrap
completes.
2020-03-02 13:47:43 -05:00
Mahmood Ali
e265e4c7b2 Remove rkt as a built-in driver
Rkt has been archived and is no longer an active project:
* https://github.com/rkt/rkt
* https://github.com/rkt/rkt/issues/4024

The rkt driver will continue to live as an external plugin.
2020-02-26 22:16:41 -05:00
Fredrik Hoem Grelland
26cca14f27 Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170) 2020-02-23 16:24:53 +01:00
Nick Ethier
3cd4d11efa Merge pull request #7163 from hashicorp/b-driver-plugin-recovery
drivermanager: attempt dispense on reattachment failure
2020-02-21 10:33:20 -05:00
Mahmood Ali
a3b0b25acb update rest of consul packages 2020-02-16 16:25:04 -06:00
Nick Ethier
64b7c91538 drivermanager: attempt dispense on reattachment failure 2020-02-15 00:50:06 -05:00
Seth Hoenig
1ced8ba47d Merge pull request #7106 from hashicorp/f-ctag-override
client: enable configuring enable_tag_override for services
2020-02-13 12:34:48 -06:00