nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 10:25:42 +03:00

Author	SHA1	Message	Date
Derek Strickland	bc03aadf3b	csi_hook: valid if any driver supports csi (#13446) * csi_hook: valid if any driver supports csi volumes	2022-06-22 10:43:43 -04:00
Tim Gross	a8d5e5e7a3	CSI: don't block client shutdown for node unmount (#12457) When we unmount a volume we need to be able to recover from cases where the plugin has been shut down before the allocation that needs it, so in #11892 we blocked shutting down the alloc runner hook. But this blocks client shutdown if we're in the middle of unmounting. The client won't be able to communicate with the plugin or send the unpublish RPC anyway, so we should cancel the context and assume that we'll resume the unmounting process when the client restarts. For `-dev` mode we don't send the graceful `Shutdown()` method and instead destroy all the allocations. In this case, we'll never be able to communicate with the plugin but also never close the context we need to prevent the hook from blocking. To fix this, move the retries into their own goroutine that doesn't block the main `Postrun`.	2022-04-05 13:05:10 -04:00
Tim Gross	98e122c7e2	CSI: reorder controller volume detachment (#12387) In #12112 and #12113 we solved for the problem of races in releasing volume claims, but there was a case that we missed. During a node drain with a controller attach/detach, we can hit a race where we call controller publish before the unpublish has completed. This is discouraged in the spec but plugins are supposed to handle it safely. But if the storage provider's API is slow enough and the plugin doesn't handle the case safely, the volume can get "locked" into a state where the provider's API won't detach it cleanly. Check the claim before making any external controller publish RPC calls so that Nomad is responsible for the canonical information about whether a volume is currently claimed. This has a couple of side effects that also had to get fixed here: * Changing the order means that the volume will have a past claim without a valid external node ID because it came from the client, and this uncovered a separate bug where we didn't assert the external node ID was valid before returning it. Fall through to getting the ID from the plugins in the state store in this case. We avoided this originally because of concerns around plugins getting lost during node drain but now that we've fixed that we may want to revisit it in future work. * We should make sure we're handling `FailedPrecondition` cases from the controller plugin the same way we handle other retryable cases. * Several tests had to be updated because they were assuming we fail in a particular order that we're no longer doing.	2022-03-29 09:44:00 -04:00
Tim Gross	649f1e3967	CSI: retry claims from client when max claims are reached (#12113 ) When the alloc runner claims a volume, an allocation for a previous version of the job may still have the volume claimed because it's still shutting down. In this case we'll receive an error from the server. Retry this error until we succeed or until a very long timeout expires, to give operators a chance to recover broken plugins. Make the alloc runner hook tolerant of temporary RPC failures.	2022-02-24 10:39:07 -05:00
Tim Gross	89ca3d9d75	csi: don't wait to fire initial unmount RPC (#12102 ) In PR #11892 we updated the `csi_hook` to unmount the volume locally via the CSI node RPCs before releasing the claim from the server. The timer for this hook was initialized with the retry time, forcing us to wait 1s before making the first unmount RPC calls. Use the new helper for timers to ensure we clean up the timer nicely.	2022-02-22 13:43:06 -05:00
Tim Gross	8364eda1d7	CSI: node unmount from the client before unpublish RPC (#11892 ) When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.	2022-01-28 08:30:31 -05:00
Tim Gross	d27b1370ae	CSI: tests to exercise csi_hook (#11788 ) Small refactoring of the allocrunner hook for CSI to make it more testable, and a unit test that covers most of its logic.	2022-01-07 15:23:47 -05:00
James Rasell	ab9ba35e6a	chore: fixup inconsistent method receiver names. (#11704 )	2021-12-20 11:44:21 +01:00
Tim Gross	a37af3100e	CSI: use AccessMode/AttachmentMode from CSIVolumeClaim Registration of Nomad volumes previously allowed for a single volume capability (access mode + attachment mode pair). The recent `volume create` command requires that we pass a list of requested capabilities, but the existing workflow for claiming volumes and attaching them on the client assumed that the volume's single capability was correct and unchanging. Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to set the initial claim value, and add backwards compatibility logic to handle the existing volumes that already have claims without these fields.	2021-04-07 11:24:09 -04:00
Tim Gross	7c7569674c	CSI: unique volume per allocation Add a `PerAlloc` field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex. `[0]`), rather than the exact source ID. Read the `PerAlloc` field when making the volume claim at the client to determine if the allocation index suffix (ex. `[0]`) should be added to the volume source ID.	2021-03-18 15:35:11 -04:00
Tim Gross	0ed0b945c9	csi: Postrun hook should not change mode (#9323 ) The unpublish workflow requires that we know the mode (RW vs RO) if we want to unpublish the node. Update the hook and the Unpublish RPC so that we mark the claim for release in a new state but leave the mode alone. This fixes a bug where RO claims were failing node unpublish. The core job GC doesn't know the mode, but we don't need it for that workflow, so add a mode specifically for GC; the volumewatcher uses this as a sentinel to check whether claims (with their specific RW vs RO modes) need to be claimed.	2020-11-11 13:06:30 -05:00
Tim Gross	1ce58e8000	csi: fix incorrect comment on csi_hook context lifetime	2020-10-09 11:03:51 -04:00
Lang Martin	c0bf46da1e	CSI RPC Token (#8626 ) * client/allocrunner/csi_hook: use the Node SecretID * client/allocrunner/csi_hook: include the namespace for Claim	2020-08-11 13:08:39 -04:00
Tim Gross	9384b1f77e	csi: release claims via csi_hook postrun unpublish RPC (#8580 ) Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This may forward client RPCs to the node plugins or to the controller plugins, depending on whether other allocations on this node have claims on this volume. By making clients responsible for running the `CSIVolume.Unpublish` RPC (and making the RPC available to a `nomad volume detach` command), the volumewatcher becomes only used by the core GC job and we no longer need async volume GC from job deregister and node update.	2020-08-06 14:51:46 -04:00
Tim Gross	8860b72bc3	volumes: return better error messages for unsupported task drivers (#8030 ) When an allocation runs for a task driver that can't support volume mounts, the mounting will fail in a way that can be hard to understand. With host volumes this usually means failing silently, whereas with CSI the operator gets inscrutable internals exposed in the `nomad alloc status`. This changeset adds a MountConfig field to the task driver Capabilities response. We validate this when the `csi_hook` or `volume_hook` fires and return a user-friendly error. Note that we don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this changeset. Also note that the MountConfig enum starts with "supports all" in order to support community plugins in a backwards compatible way, rather than cutting them off from volume mounting unexpectedly.	2020-05-21 09:18:02 -04:00
Tim Gross	22d4b88b69	csi: checkpoint volume claim garbage collection (#7782 ) Adds a `CSIVolumeClaim` type to be tracked as current and past claims on a volume. Allows for a client RPC failure during node or controller detachment without having to keep the allocation around after the first garbage collection eval. This changeset lays groundwork for moving the actual detachment RPCs into a volume watching loop outside the GC eval.	2020-04-23 11:06:23 -04:00
Tim Gross	414caf76e5	CSI: move node unmount to server-driven RPCs (#7596) If a volume-claiming alloc stops and the CSI Node plugin that serves that alloc's volumes is missing, there's no way for the allocrunner hook to send the `NodeUnpublish` and `NodeUnstage` RPCs. This changeset addresses this issue with a redesign of the client side for CSI. Rather than unmounting in the alloc runner hook, the alloc runner hook will simply exit. When the server gets the `Node.UpdateAlloc` for the terminal allocation that had a volume claim, it creates a volume claim GC job. This job will make client RPCs to a new node plugin RPC endpoint, and only once that succeeds, move on to making the client RPCs to the controller plugin. If the node plugin is unavailable, the GC job will fail and be requeued.	2020-04-02 16:04:56 -04:00
Tim Gross	74e5c90b42	csi: annotate remaining missing cancellation contexts (#7552 )	2020-03-30 16:46:43 -04:00
Lang Martin	1bef8b8879	csi: add mount_options to volumes and volume requests (#7398) Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host. Individual tasks just bind-mount into the host-mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context. closes #7007 * nomad/structs/volumes: add MountOptions to volume request * jobspec/test-fixtures/basic.hcl: add mount_options to volume block * jobspec/parse_test: add expected MountOptions * api/tasks: add mount_options * jobspec/parse_group: use hcl decode not mapstructure, mount_options * client/allocrunner/csi_hook: pass MountOptions through client/allocrunner/csi_hook: add a VolumeMountOptions client/allocrunner/csi_hook: drop Options client/allocrunner/csi_hook: use the structs options * client/pluginmanager/csimanager/interface: UsageOptions.MountOptions * client/pluginmanager/csimanager/volume: pass MountOptions in capabilities * plugins/csi/plugin: remove todo 7007 comment * nomad/structs/csi: MountOptions * api/csi: add options to the api for parsing, match structs * plugins/csi/plugin: move VolumeMountOptions to structs * api/csi: use specific type for mount_options * client/allocrunner/csi_hook: merge MountOptions here * rename CSIOptions to CSIMountOptions * client/allocrunner/csi_hook * client/pluginmanager/csimanager/volume * nomad/structs/csi * plugins/csi/fake/client: add PrevVolumeCapability * plugins/csi/plugin * client/pluginmanager/csimanager/volume_test: remove debugging * client/pluginmanager/csimanager/volume: fix odd merging logic * api: rename CSIOptions -> CSIMountOptions * nomad/csi_endpoint: remove a 7007 comment * command/alloc_status: show mount options in the volume list * nomad/structs/csi: include MountOptions in the volume stub * api/csi: add MountOptions to stub * command/volume_status_csi: clean up csiVolMountOption, add it * command/alloc_status: csiVolMountOption lives in volume_csi_status * command/node_status: display mount flags * nomad/structs/volumes: npe * plugins/csi/plugin: npe in ToCSIRepresentation * jobspec/parse_test: expand volume parse test cases * command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions * command/volume_status_csi: copy paste error * jobspec/test-fixtures/basic: hclfmt * command/volume_status_csi: clean up csiVolMountOption	2020-03-23 13:59:25 -04:00
Lang Martin	7083cdd8cb	csi: csi-hostpath plugin unimplemented error on controller publish (#7299 ) * client/allocrunner/csi_hook: tag errors * nomad/client_csi_endpoint: tag errors * nomad/client_rpc: remove an unnecessary error tag * nomad/state/state_store: ControllerRequired fix intent We use ControllerRequired to indicate that a volume should use the publish/unpublish workflow, rather than that it has a controller. We need to check both RequiresControllerPlugin and SupportsAttachDetach from the fingerprint to check that. * nomad/csi_endpoint: tag errors * nomad/csi_endpoint_test: longer error messages, mock fingerprints	2020-03-23 13:58:30 -04:00
Danielle Lancashire	4a105a6679	csi: Claim CSI Volumes during csi_hook.Prerun This commit is the initial implementation of claiming volumes from the server and passes through any publishContext information as appropriate. There's nothing too fancy here.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	4fe9c74f28	csi: Pass through usage options to the csimanager The CSI Spec requires us to attach and stage volumes based on different types of usage information when it may affect how they are bound. Here we pass through some basic usage options in the CSI Hook (specifically the volume alias's ReadOnly field), and the attachment/access mode from the volume. We pass the attachment/access mode separately from the volume as it simplifies some handling and doesn't necessarily force every attachment to use the same mode should more be supported (i.e., if we let each `volume "foo" {}` specify an override in the future).	2020-03-23 13:58:30 -04:00
Danielle Lancashire	01b69ef7bb	csi: Unpublish volumes during ar.Postrun This commit introduces initial support for unmounting csi volumes. It takes a relatively simplistic approach to performing NodeUnpublishVolume calls, optimising for cleaning up any leftover state rather than terminating early in the case of errors. This is because it happens during an allocation's shutdown flow and may not always have a corresponding call to `NodePublishVolume` that succeeded.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	7d044a340f	allocrunner: Push state from hooks to taskrunners This commit is an initial (read: janky) approach to forwarding state from an allocrunner hook to a taskrunner using a similar `hookResources` approach that tr's use internally. It should eventually probably be replaced with something a little bit more message based, but for things that only come from pre-run hooks, and don't change, it's probably fine for now.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	246f210975	csi_hook: Stage/Mount volumes as required This commit introduces the first stage of volume mounting for an allocation. The csimanager.VolumeMounter interface manages the blocking and actual minutiae of the CSI implementation, allowing this hook to do the minimal work of volume retrieval and creating mount info. In the future the `CSIVolume.Get` request should be replaced by `CSIVolume.Claim(Batch?)` to minimize the number of RPCs and to handle external triggering of a ControllerPublishVolume request as required. We also need to ensure that if pre-run hooks fail, we still get a full unwinding of any published and staged volumes to ensure that there are no hanging references to volumes. That is not handled in this commit.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	69cbb964e1	client: Pass an RPC Client to AllocRunners As part of introducing support for CSI, AllocRunner hooks need to be able to communicate with Nomad Servers for validation of and interaction with storage volumes. Here we create a small RPCer interface and pass the client (rpc client) to the AR in preparation for making these RPCs.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	1250d56333	csi: Add VolumeManager (#6920) This changeset is some prerequisite boilerplate that is required for introducing CSI volume management for client nodes. It extracts out fingerprinting logic from the csi instance manager. This change is to facilitate reusing the csimanager to also manage the node-local CSI functionality, as it is the easiest place for us to guarantee health checking and to provide additional visibility into the running operations through the fingerprinter mechanism and goroutine. It also introduces the VolumeMounter interface that will be used to manage staging/publishing and unstaging/unpublishing of volumes on the host.	2020-03-23 13:58:29 -04:00
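
Commit 7c7569674c describes appending the allocation index suffix (ex. `[0]`) to the volume source ID when `PerAlloc` is set. A minimal sketch of that rule follows. The `volumeRequest` type and `claimSourceID` function are hypothetical names used only for illustration and are not taken from the Nomad source.

```go
package main

import "fmt"

// volumeRequest is a hypothetical stand-in for a group's volume request.
// Only the two fields relevant to the per-allocation rule are modeled.
type volumeRequest struct {
	Source   string // volume source ID as written in the job
	PerAlloc bool   // when true, each allocation claims its own volume
}

// claimSourceID returns the volume ID a client would claim for the
// allocation with the given index. With PerAlloc set, the index is
// appended in square brackets; otherwise the exact source ID is used.
func claimSourceID(req volumeRequest, allocIndex int) string {
	if req.PerAlloc {
		return fmt.Sprintf("%s[%d]", req.Source, allocIndex)
	}
	return req.Source
}

func main() {
	shared := volumeRequest{Source: "data"}
	unique := volumeRequest{Source: "data", PerAlloc: true}

	fmt.Println(claimSourceID(shared, 2)) // exact source ID
	fmt.Println(claimSourceID(unique, 2)) // source ID plus index suffix
}
```

Under this assumed rule, three allocations of a group with `PerAlloc` set would claim `data[0]`, `data[1]`, and `data[2]`, while a group without it would have every allocation claim `data`.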

27 Commits