nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-05 18:05:42 +03:00

Author	SHA1	Message	Date
Danielle Lancashire	01ff8960b5	volume_hook: Loosen validation in host volume prep	2020-03-23 13:58:30 -04:00
Danielle Lancashire	7d044a340f	allocrunner: Push state from hooks to taskrunners This commit is an initial (read: janky) approach to forwarding state from an allocrunner hook to a taskrunner using a similar `hookResources` approach that tr's use internally. It should eventually probably be replaced with something a little bit more message based, but for things that only come from pre-run hooks, and don't change, it's probably fine for now.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	246f210975	csi_hook: Stage/Mount volumes as required This commit introduces the first stage of volume mounting for an allocation. The csimanager.VolumeMounter interface manages the blocking and actual minutiae of the CSI implementation, allowing this hook to do the minimal work of volume retrieval and creating mount info. In the future the `CSIVolume.Get` request should be replaced by `CSIVolume.Claim(Batch?)` to minimize the number of RPCs and to handle external triggering of a ControllerPublishVolume request as required. We also need to ensure that if pre-run hooks fail, we still get a full unwinding of any published and staged volumes to ensure that there are no hanging references to volumes. That is not handled in this commit.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	69cbb964e1	client: Pass an RPC Client to AllocRunners As part of introducing support for CSI, AllocRunner hooks need to be able to communicate with Nomad Servers for validation of and interaction with storage volumes. Here we create a small RPCer interface and pass the client (rpc client) to the AR in preparation for making these RPCs.	2020-03-23 13:58:30 -04:00
Tim Gross	27e5cea0c5	csi: implement CSI controller detach request/response (#7107) This changeset implements the minimal structs on the client-side we need to compile the work-in-progress implementation of the server-to-controller RPCs. It doesn't include implementing the `ClientCSI.DettachVolume` RPC on the client.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	5291c3acd0	csi: Fix broken call to newVolumeManager	2020-03-23 13:58:29 -04:00
Danielle Lancashire	b3a1110c50	csi: Provide plugin-scoped paths during RPCs When providing paths to plugins, the path needs to be in the scope of the plugins container, rather than that of the host. Here we enable that by providing the mount point through the plugin registration and then use it when constructing request target paths.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	30527e3e51	csimanager: Cleanup volumemanager setup	2020-03-23 13:58:29 -04:00
Danielle Lancashire	c8e21f4547	csimanager: Instantiate fingerprint manager's csiclient	2020-03-23 13:58:29 -04:00
Danielle Lancashire	5d9f560f59	volume_manager: cleanup of mount detection No functional changes, but makes ensure.*Dir follow a nicer return style.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	bead8dc8fc	volume_manager: Add support for publishing volumes	2020-03-23 13:58:29 -04:00
Danielle Lancashire	2e3599d35c	volume_manager: Initial support for unstaging volumes	2020-03-23 13:58:29 -04:00
Danielle Lancashire	9bc0ff16f6	volume_manager: NodeStageVolume Support This commit introduces support for staging volumes when a plugin implements the STAGE_UNSTAGE_VOLUME capability. See the following for further reference material: `4731db0e0b/spec.md (nodestagevolume)`	2020-03-23 13:58:29 -04:00
Danielle Lancashire	6b28db98b6	volume_manager: Introduce helpers for staging This commit adds helpers that create and validate the staging directory for a given volume. It is currently missing usage options as the interfaces are not yet in place for those. The staging directory is only required when a volume has the STAGE_UNSTAGE Volume capability and has to live within the plugin root as the plugin needs to be able to create mounts inside it from within the container.	2020-03-23 13:58:29 -04:00
Lang Martin	5c26cdf08b	csi: pluginmanager use PluginID instead of Driver	2020-03-23 13:58:29 -04:00
Danielle Lancashire	1250d56333	csi: Add VolumeManager (#6920) This changeset is some pre-requisite boilerplate that is required for introducing CSI volume management for client nodes. It extracts out fingerprinting logic from the csi instance manager. This change is to facilitate reusing the csimanager to also manage the node-local CSI functionality, as it is the easiest place for us to guarantee health checking and to provide additional visibility into the running operations through the fingerprinter mechanism and goroutine. It also introduces the VolumeMounter interface that will be used to manage staging/publishing and unstaging/unpublishing of volumes on the host.	2020-03-23 13:58:29 -04:00
Lang Martin	ee8496a88e	client structs: use nstructs rather than s for nomad/structs	2020-03-23 13:58:29 -04:00
Lang Martin	c37621cc98	client structs: move CSIVolumeAttachmentMode and CSIVolumeAccessMode	2020-03-23 13:58:29 -04:00
Danielle Lancashire	cd0c2a6df0	csi: Setup gRPC Clients with a logger	2020-03-23 13:58:29 -04:00
Danielle Lancashire	3f36dae246	csimanager: Fingerprint Node Service capabilities	2020-03-23 13:58:29 -04:00
Danielle Lancashire	406984ca8d	csimanager: Fingerprint controller capabilities	2020-03-23 13:58:29 -04:00
Danielle Lancashire	a7f7114590	client_csi: Validate Access/Attachment modes	2020-03-23 13:58:28 -04:00
Danielle Lancashire	19d06d5bb2	csi: ClientCSIControllerPublish* -> ClientCSIControllerAttach*	2020-03-23 13:58:28 -04:00
Danielle Lancashire	964ede4301	csi: Model Attachment and Access modes	2020-03-23 13:58:28 -04:00
Danielle Lancashire	778a32de4a	client: Setup CSI RPC Endpoint This commit introduces a new set of endpoints to a Nomad Client: ClientCSI. ClientCSI is responsible for mediating requests from a Nomad Server to a CSI Plugin running on a Nomad Client. It should only really be used to make controller RPCs.	2020-03-23 13:58:28 -04:00
Danielle Lancashire	d296efd2c6	CSI Plugin Registration (#6555) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog that requires `go-plugin` type plugins and to know the plugin configuration in advance. * The `csimanager` which fingerprints CSI plugins, in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features: * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shut down yet; they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Drew Bailey	ae5777c4ea	Audit config, seams for enterprise audit features: allow oss to parse sink duration; clean up audit sink parsing; ent eventer config reload; fix typo; SetEnabled to eventer interface; client acl test; rm dead code; fix failing test	2020-03-23 13:47:42 -04:00
Mahmood Ali	525623c53c	health tracker: account for group service checks	2020-03-22 12:38:37 -04:00
Mahmood Ali	1454af731d	health check account for task lifecycle In service jobs, lifecycle non-sidecar tasks tweak health logic a bit: they may terminate successfully without impacting alloc health, but fail the alloc if they fail. Sidecars should be treated just like a normal task.	2020-03-22 12:37:40 -04:00
Mahmood Ali	3132176acd	health: fail health if any task is pending Fixes a bug where an allocation is considered healthy if some of the tasks are being restarted and as such, their checks aren't tracked by consul agent client. Here, we fix the immediate case by ensuring that an alloc is healthy only if tasks are running and the registered checks at the time are healthy. Previously, health tracker tracked task "health" independently from checks, which led to problems when a task restarts. Consider the following series of events: 1. all tasks start running -> `tracker.tasksHealthy` is true 2. one task has unhealthy checks and gets restarted 3. remaining checks are healthy -> `tracker.checksHealthy` is true 4. propagate health status now that `tracker.tasksHealthy` and `tracker.checksHealthy` are both true. This change ensures that we accurately use the latest status of tasks and checks regardless of their status changes. It also ensures that we only consider check health after tasks are considered healthy; otherwise we risk trusting incomplete checks. This approach accommodates task dependencies well. Service jobs can have prestart short-lived tasks that will terminate before main process runs. These dead tasks that complete successfully will not negate health status.	2020-03-22 11:13:41 -04:00
Mahmood Ali	3719ff3059	tests: add a check for failing service checks Add tests to check for failing or missing service checks in consul update.	2020-03-22 11:13:40 -04:00
Mahmood Ali	2ad338ef38	address review feedback	2020-03-21 17:52:58 -04:00
Mahmood Ali	83b08ab158	tr: proceed to mark other tasks as dead if alloc fails	2020-03-21 17:52:58 -04:00
Mahmood Ali	4558fa6aec	fix test	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	6c1474398f	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	dcd317745d	fix restart policy for system jobs with no lifecycle	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	3688a2b7a3	refactor TaskHookCoordinator tests to use mock package and add failed init and sidecar test cases	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	db7e8614f3	remove debugging test code from TestAllocRunner_TaskLeader_StopRestoredTG	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	60671f880d	fix bug in lifecycle restore tests after refactor	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	90fa242d83	fix failing ci test: TestTaskRunner_UnregisterConsul_Retries	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	da3eb69a2f	fix linting errors	2020-03-21 17:52:53 -04:00
Jasmine Dahilig	ee92c98d4e	add task hook coordinator many init tasks test case	2020-03-21 17:52:53 -04:00
Jasmine Dahilig	88d3e232a2	refactor task hook coordinator helper method and tests	2020-03-21 17:52:53 -04:00
Jasmine Dahilig	0031b6777f	clean up restore test	2020-03-21 17:52:52 -04:00
Jasmine Dahilig	aced15ea27	partial test for restore functionality	2020-03-21 17:52:52 -04:00
Jasmine Dahilig	48ce093dd5	account for client restarts in task lifecycle hooks	2020-03-21 17:52:51 -04:00
Jasmine Dahilig	c6cd7b523b	clean up restart conditions and restart tests for task lifecycle	2020-03-21 17:52:50 -04:00
Jasmine Dahilig	4be7d056ac	put lifecycle nil and empty checks in api Canonicalize	2020-03-21 17:52:50 -04:00
Jasmine Dahilig	fa19007dfb	update task hook coordinator tests	2020-03-21 17:52:46 -04:00
Jasmine Dahilig	c2ab4c9c90	add test for lifecycle coordinator	2020-03-21 17:52:42 -04:00


4169 Commits