Files
nomad/client
Tim Gross f3d53e3e2b CSI: restart task on failing initial probe, instead of killing it (#25307)
When a CSI plugin is launched, we probe it until the csi_plugin.health_timeout
expires (by default 30s). But if the plugin never becomes healthy, we're not
restarting the task as documented.

Update the plugin supervisor to trigger a restart instead. We still exit the
supervisor loop at that point to avoid having the supervisor send probes to a
task that isn't running yet. This requires reworking the poststart hook to allow
the supervisor loop to be restarted when the task restarts.

In doing so, I identified that we weren't respecting the task kill context from
the post start hook, which would leave the supervisor running in the window
between when a task is killed because it failed and its stop hooks were
triggered. Combine the two contexts to make sure we stop the supervisor
whichever context gets closed first.

Fixes: https://github.com/hashicorp/nomad/issues/25293
Ref: https://hashicorp.atlassian.net/browse/NET-12264
2025-03-07 10:04:59 -05:00
..