nomad/.changelog/26831.txt at main


mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00


Tim Gross 40241b261b CSI: ensure only client-terminal allocs are treated as past claims (#26831)

The volume watcher checks whether any allocations that have claims are terminal
so that it knows if it's safe to unpublish the volume. This check was
considering a claim as unpublishable if the allocation was terminal on either
the server or client, rather than the client alone. In many circumstances this
is safe.

But if an allocation takes a while to stop (e.g. it has a `shutdown_delay`), it's
possible for garbage collection to run in the window between when the alloc is
marked server-terminal and when the task is actually stopped. The server
unpublishes the volume, which sends a node plugin RPC. The plugin unmounts the
volume while it's in use, and then unmounts it again when the allocation stops
and the CSI postrun hook runs. If the task writes to the volume during the
unmounting process, some providers end up in a broken state and the volume is
not usable unless it's detached and reattached.

Fix this by considering a claim a "past claim" only when the allocation is
client-terminal. This way if garbage collection runs while we're waiting for
allocation shutdown, the alloc will only be server-terminal and we won't send
the extra node RPCs.
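
The difference between the two checks can be sketched as follows. This is a
minimal illustration, not the actual volume watcher code: the `Alloc` type, its
status strings, and the `isPastClaimBefore`/`isPastClaimAfter` helpers are all
hypothetical names, loosely modeled on the server-side desired status and the
client-reported status described above.

```go
package main

import "fmt"

// Alloc is a hypothetical, pared-down allocation holding only the two
// status fields that matter for the terminal checks.
type Alloc struct {
	DesiredStatus string // set by the server (assumed values: "run", "stop", "evict")
	ClientStatus  string // reported by the client (assumed values: "running", "complete", ...)
}

// serverTerminal reports whether the server has decided the alloc should stop.
func (a Alloc) serverTerminal() bool {
	return a.DesiredStatus == "stop" || a.DesiredStatus == "evict"
}

// clientTerminal reports whether the client has confirmed the alloc is done.
func (a Alloc) clientTerminal() bool {
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// isPastClaimBefore models the old check: terminal on either side.
func isPastClaimBefore(a Alloc) bool {
	return a.serverTerminal() || a.clientTerminal()
}

// isPastClaimAfter models the fixed check: client-terminal only.
func isPastClaimAfter(a Alloc) bool {
	return a.clientTerminal()
}

func main() {
	// Alloc has been asked to stop, but the task is still running
	// (for example, it is waiting out a shutdown_delay).
	stopping := Alloc{DesiredStatus: "stop", ClientStatus: "running"}
	fmt.Println(isPastClaimBefore(stopping), isPastClaimAfter(stopping))

	// The task has exited and the client has reported it.
	stopped := Alloc{DesiredStatus: "stop", ClientStatus: "complete"}
	fmt.Println(isPastClaimBefore(stopped), isPastClaimAfter(stopped))
}
```

In this sketch the two checks disagree only for the alloc that is
server-terminal but still running on the client, which is the window where the
premature unpublish could happen.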

Fixes: https://github.com/hashicorp/nomad/issues/24130
Fixes: https://github.com/hashicorp/nomad/issues/25819
Ref: https://hashicorp.atlassian.net/browse/NMD-1001

2025-09-25 09:24:53 -04:00



```release-note:bug
csi: Fixed a bug where volumes could be unmounted while in use by a task that was shutting down
```