CSI: set mounts in alloc hook resources atomically (#16722)

The allocrunner has a facility for passing data written by allocrunner hooks to
taskrunner hooks. Currently the only consumers of this facility are the
allocrunner CSI hook (which writes data) and the taskrunner volume hook (which
reads that same data).

The allocrunner hook for CSI volumes doesn't set the alloc hook resources
atomically. Instead, it gets the current resources and then writes a new version
back. Because the CSI hook is currently the only writer and all readers run
long afterwards, this should be safe, but #16623 shows there's some sequence of
events during restore where this breaks down.

Refactor hook resources so that hook data is accessed via setters and getters
that hold the mutex.
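
A minimal sketch of the accessor pattern described above, not the code from this commit. The names `HookResources`, `MountInfo`, `GetMounts`, and `SetMounts` are hypothetical stand-ins, and the copy-on-read behavior is an assumption made for illustration.

```go
package main

import "sync"

// MountInfo is a hypothetical stand-in for the per-volume mount data
// written by the allocrunner CSI hook.
type MountInfo struct {
	Source string
}

// HookResources is a simplified, hypothetical container for hook output.
// The mutex lives inside the struct, so callers can only reach the data
// through the accessors below.
type HookResources struct {
	mu     sync.RWMutex
	mounts map[string]*MountInfo
}

// NewHookResources returns an empty, ready-to-use container.
func NewHookResources() *HookResources {
	return &HookResources{mounts: map[string]*MountInfo{}}
}

// GetMounts returns a shallow copy of the mounts while holding the read
// lock, so callers never iterate the map that a writer may replace.
func (r *HookResources) GetMounts() map[string]*MountInfo {
	r.mu.RLock()
	defer r.mu.RUnlock()

	out := make(map[string]*MountInfo, len(r.mounts))
	for name, info := range r.mounts {
		out[name] = info
	}
	return out
}

// SetMounts replaces the stored mounts in a single step while holding
// the write lock, instead of reading the struct and writing it back.
func (r *HookResources) SetMounts(m map[string]*MountInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mounts = m
}
```

In this sketch a writer calls `SetMounts` once with the complete map and readers call `GetMounts`; neither side ever holds a reference to the whole resources struct across a separate read and write.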
Tim Gross
2023-04-03 11:03:36 -04:00
committed by GitHub
parent db2a4e579e
commit f3fc54adcf
9 changed files with 69 additions and 77 deletions


@@ -128,9 +128,9 @@ type allocRunner struct {
// transitions.
runnerHooks []interfaces.RunnerHook
-// hookState is the output of allocrunner hooks
-hookState *cstructs.AllocHookResources
-hookStateMu sync.RWMutex
+// hookResources holds the output from allocrunner hooks so that later
+// allocrunner hooks or task runner hooks can read them
+hookResources *cstructs.AllocHookResources
// tasks are the set of task runners
tasks map[string]*taskrunner.TaskRunner
@@ -238,6 +238,7 @@ func NewAllocRunner(config *Config) (*allocRunner, error) {
serviceRegWrapper: config.ServiceRegWrapper,
checkStore: config.CheckStore,
getter: config.Getter,
+hookResources: cstructs.NewAllocHookResources(),
}
// Create the logger based on the allocation ID
@@ -293,6 +294,7 @@ func (ar *allocRunner) initTaskRunners(tasks []*structs.Task) error {
ShutdownDelayCtx: ar.shutdownDelayCtx,
ServiceRegWrapper: ar.serviceRegWrapper,
Getter: ar.getter,
+AllocHookResources: ar.hookResources,
}
if ar.cpusetManager != nil {