nomad/client/allocrunner/taskrunner/plugin_supervisor_hook.go at 65c7f34f2d4c92ffbb51646aacc264f8db33cc06

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 09:25:46 +03:00


Tim Gross f3d53e3e2b CSI: restart task on failing initial probe, instead of killing it (#25307)

When a CSI plugin is launched, we probe it until the csi_plugin.health_timeout
expires (by default 30s). But if the plugin never becomes healthy, we kill the
task instead of restarting it as documented.

Update the plugin supervisor to trigger a restart instead. We still exit the
supervisor loop at that point to avoid having the supervisor send probes to a
task that isn't running yet. This requires reworking the poststart hook to allow
the supervisor loop to be restarted when the task restarts.

In doing so, I identified that we weren't respecting the task kill context from
the poststart hook, which would leave the supervisor running in the window
between when a task is killed because it failed and when its stop hooks are
triggered. Combine the two contexts to make sure we stop the supervisor
whichever context gets closed first.

Fixes: https://github.com/hashicorp/nomad/issues/25293
Ref: https://hashicorp.atlassian.net/browse/NET-12264

2025-03-07 10:04:59 -05:00

17 KiB
