full task cleanup when alloc prerun hook fails (#17104)

to avoid leaking task resources (e.g. containers,
iptables rules) if the allocRunner prerun hooks
fail during restore on client restart.

now if prerun fails, TaskRunner.MarkFailedKill()
only emits an event, marks the task as failed,
and cancels the task runner's killCtx, so that
ar.runTasks() -> tr.Run() performs the actual cleanup.

removed from tr.MarkFailedKill() (formerly
tr.MarkFailedDead()), now handled by tr.Run():
 * set task state as dead
 * save task runner local state
 * task stop hooks

also done in tr.Run() now that it's not skipped:
 * handleKill() to kill tasks while respecting
   their shutdown delay, and retrying as needed
   * also includes task preKill hooks
 * clearDriverHandle() to destroy the task
   and associated resources
 * task exited hooks
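
The flow described above can be sketched roughly as follows. This is a minimal illustration, not Nomad's actual code: the `taskRunner` type, its fields, the `record` helper, and the exact order of the cleanup steps are all assumptions made for the example. Only the overall idea comes from the commit message: MarkFailedKill() just flags the failure and cancels killCtx, and Run() does the cleanup.

```go
package main

import (
	"context"
	"fmt"
)

// taskRunner is a hypothetical, heavily simplified stand-in for a task
// runner. It only models who is responsible for cleanup.
type taskRunner struct {
	killCtx       context.Context
	killCtxCancel context.CancelFunc
	failed        bool
	state         string
	steps         []string // steps in the order they ran
}

func newTaskRunner() *taskRunner {
	ctx, cancel := context.WithCancel(context.Background())
	return &taskRunner{killCtx: ctx, killCtxCancel: cancel, state: "pending"}
}

// record is a hypothetical helper that notes a step for inspection.
func (tr *taskRunner) record(step string) {
	tr.steps = append(tr.steps, step)
}

// MarkFailedKill emits an event, marks the task as failed, and cancels
// killCtx. It deliberately performs no cleanup itself.
func (tr *taskRunner) MarkFailedKill(reason string) {
	tr.record("emit event: " + reason)
	tr.failed = true
	tr.killCtxCancel()
}

// Run skips the task body when killCtx is already cancelled, but always
// runs the cleanup steps afterwards (order here is illustrative only).
func (tr *taskRunner) Run() {
	select {
	case <-tr.killCtx.Done():
		// Prerun failed: do not run the task body.
	default:
		tr.record("run task")
	}

	tr.record("handleKill (preKill hooks, shutdown delay, retries)")
	tr.record("clearDriverHandle (destroy task and its resources)")
	tr.record("task exited hooks")
	tr.state = "dead"
	tr.record("save task runner local state")
	tr.record("task stop hooks")
}

func main() {
	tr := newTaskRunner()
	tr.MarkFailedKill("alloc prerun hook failed")
	tr.Run()

	for _, s := range tr.steps {
		fmt.Println(s)
	}
	fmt.Println("final state:", tr.state, "failed:", tr.failed)
}
```

Keeping the cleanup in one place means a task restored after a client restart goes through the same teardown as a task that ran normally, so nothing is left behind when prerun fails.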
Daniel Bennett
2023-05-08 13:17:10 -05:00
committed by GitHub
parent 58a7d40122
commit c2dc1c58dd
10 changed files with 334 additions and 35 deletions


@@ -11,10 +11,11 @@ import (
"time"
"github.com/google/go-cmp/cmp"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/kr/pretty"
"github.com/shoenig/test/must"
"github.com/shoenig/test/wait"
"github.com/hashicorp/nomad/nomad/structs"
)
type testFn func() (bool, error)
@@ -241,6 +242,7 @@ func WaitForVotingMembers(t testing.TB, rpc rpcFn, nPeers int) {
// RegisterJobWithToken registers a job and uses the job's Region and Namespace.
func RegisterJobWithToken(t testing.TB, rpc rpcFn, job *structs.Job, token string) {
t.Helper()
WaitForResult(func() (bool, error) {
args := &structs.JobRegisterRequest{}
args.Job = job