Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:
* restart: when the tasks of an allocation fail and we try to restart the tasks
in place.
* reschedule: when the `restart` block runs out of attempts (or the allocation
fails before tasks even start), and we need to move
the allocation to another node to try again.
* migrate: when the user has asked to drain a node and we need to move the
allocations. These are not failures, so we don't want to propagate the
reschedule tracker.
* replacement: when a node is lost, we don't count that against the `reschedule`
tracker for the allocations on the node (it's not the allocation's "fault",
after all). We don't want to run the `migrate` machinery here here either, as we
can't contact the down node. To the scheduler, this is effectively the same as
if we bumped the `group.count`
* replacement for `disconnect.replace = true`: this is a replacement, but the
replacement is intended to be temporary, so we propagate the reschedule tracker.
Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.
Fixes: https://github.com/hashicorp/nomad/issues/24918
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
stopAlloc() checks if an allocation represents a system job like this:
```
if *alloc.Job.Type == api.JobTypeSystem {
...
}
```
This caused the cli to crash:
```
==> 2024-02-29T08:45:53+01:00: Restarting 2 allocations
2024-02-29T08:45:54+01:00: Rescheduling allocation "6a9da11a" for group "redacted-group"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x10686affc]
goroutine 36 [running]:
github.com/hashicorp/nomad/command.(*JobRestartCommand).stopAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
github.com/hashicorp/nomad/command/job_restart.go:968 +0x25c
github.com/hashicorp/nomad/command.(*JobRestartCommand).handleAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
github.com/hashicorp/nomad/command/job_restart.go:868 +0x34
github.com/hashicorp/nomad/command.(*JobRestartCommand).Run.(*JobRestartCommand).Run.func1.func2()
github.com/hashicorp/nomad/command/job_restart.go:392 +0x28
github.com/hashicorp/go-multierror.(*Group).Go.func1()
github.com/hashicorp/go-multierror@v1.1.1/group.go:23 +0x60
created by github.com/hashicorp/go-multierror.(*Group).Go in goroutine 1
github.com/hashicorp/go-multierror@v1.1.1/group.go:20 +0x84
```
Attaching a debugger revealed that `alloc.Job` was set, but
`alloc.Job.Type` was nil. After guarding the `.Type` check with a
`alloc.Job.Type != nil`, it still crashed. This time, `alloc.Job` was
nil.
I was scrambling to get the job running again, so I didn't have the
opportunity to find out why those values were nil, but this change
ensures the CLI does not crash in these situations.
Fixes#20048
The `-reschedule` flag stops allocations and assumes the Nomad scheduler
will create new allocations to replace them. But this is only true for
service and batch jobs.
Restarting non-service jobs with the `-reschedule` flag causes the
command to loop forever waiting for the allocations to be replaced,
which never happens.
Allocations for system jobs may be replaced by triggering an evaluation
after each stop to cause the reconciler to run again.
Sysbatch jobs should not be allowed to be rescheduled as they are never
replaced by the scheduler.
The `nomad job restart` command should skip allocations that already
have replacements. Restarting an allocation with a replacement is a
no-op because the allocation status is terminal and the command's
replacement monitor returns immediatelly.
But by not skipping them, the effective batch size is computed
incorrectly.
No functional changes, just cleaning up deprecated usages that are
removed in v2 and replace one call of .Slice with .ForEach to avoid
making the intermediate copy.
* build: update to go1.21
* go: eliminate helpers in favor of min/max
* build: run go mod tidy
* build: swap depguard for semgrep
* command: fixup broken tls error check on go1.21
Implement the new `nomad job restart` command that allows operators to
restart allocations tasks or reschedule then entire allocation.
Restarts can be batched to target multiple allocations in parallel.
Between each batch the command can stop and hold for a predefined time
or until the user confirms that the process should proceed.
This implements the "Stateless Restarts" alternative from the original
RFC
(https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180).
The original concept is still worth implementing, as it allows this
functionality to be exposed over an API that can be consumed by the
Nomad UI and other clients. But the implementation turned out to be more
complex than we initially expected so we thought it would be better to
release a stateless CLI-based implementation first to gather feedback
and validate the restart behaviour.
Co-authored-by: Shishir Mahajan <smahajan@roblox.com>