mirror of https://github.com/kemko/nomad.git
synced 2026-01-06 18:35:44 +03:00
scheduler: RescheduleTracker dropped if follow-up fails placements (#12319)
When an allocation fails, it triggers an evaluation. The evaluation is processed and the scheduler sees that it needs to reschedule, which triggers a follow-up eval. The follow-up eval creates a plan to `(stop 1) (place 1)`. The replacement alloc has a `RescheduleTracker` (or gets its `RescheduleTracker` updated).

But in the case where the follow-up eval can't place all allocs (there aren't enough resources), it can create a partial plan to `(stop 1) (place 0)`. It then creates a blocked eval. The plan applier stops the failed alloc. Then, when the blocked eval is processed, the job is missing an allocation, so the scheduler creates a new allocation. This allocation is _not_ a replacement from the perspective of the scheduler, so it's not handed off a `RescheduleTracker`.

This changeset fixes this by annotating the reschedule tracker whenever the scheduler can't place a replacement allocation. We check this annotation for allocations that have the `stop` desired status when filtering out allocations to pass to the reschedule tracker. I've also included tests that cover this case and expand coverage of the relevant area of the code.

Fixes: https://github.com/hashicorp/nomad/issues/12147
Fixes: https://github.com/hashicorp/nomad/issues/17072
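The failure sequence and the fix described above can be sketched as follows. This is a minimal, self-contained model, not Nomad's scheduler code: the types are simplified stand-ins, and `stopWithoutReplacement`, `reschedulable`, `replace` and the annotation value `"no placement"` are hypothetical names chosen for illustration. The point it shows is that a stopped alloc is normally ignored, but if its tracker records a failed placement, the later placement from the blocked eval still treats it as the previous alloc and inherits its reschedule events.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for Nomad's scheduler types.
const (
	desiredRun  = "run"
	desiredStop = "stop"

	// Assumed annotation value written when a replacement could not be placed.
	lastRescheduleFailedToPlace = "no placement"
)

type RescheduleEvent struct {
	RescheduleTime int64
	PrevAllocID    string
}

type RescheduleTracker struct {
	Events         []*RescheduleEvent
	LastReschedule string
}

type Allocation struct {
	ID                string
	DesiredStatus     string
	RescheduleTracker *RescheduleTracker
}

// stopWithoutReplacement models the partial plan (stop 1) (place 0): the
// failed alloc is stopped and its tracker is annotated so history survives.
func stopWithoutReplacement(a *Allocation) {
	a.DesiredStatus = desiredStop
	if a.RescheduleTracker == nil {
		a.RescheduleTracker = &RescheduleTracker{}
	}
	a.RescheduleTracker.LastReschedule = lastRescheduleFailedToPlace
}

// reschedulable filters allocs whose history should be handed to a new
// placement. Stopped allocs are skipped unless their tracker records a failed
// placement. (A real filter would also check client status and the reschedule
// policy; this sketch only models the desired-status check.)
func reschedulable(allocs []*Allocation) []*Allocation {
	var out []*Allocation
	for _, a := range allocs {
		if a.DesiredStatus == desiredStop {
			t := a.RescheduleTracker
			if t == nil || t.LastReschedule != lastRescheduleFailedToPlace {
				continue
			}
		}
		out = append(out, a)
	}
	return out
}

// replace builds a replacement that inherits the previous reschedule events
// and records one more event pointing back at prev.
func replace(prev *Allocation, id string, now int64) *Allocation {
	var events []*RescheduleEvent
	if prev.RescheduleTracker != nil {
		events = append(events, prev.RescheduleTracker.Events...)
	}
	events = append(events, &RescheduleEvent{RescheduleTime: now, PrevAllocID: prev.ID})
	return &Allocation{
		ID:                id,
		DesiredStatus:     desiredRun,
		RescheduleTracker: &RescheduleTracker{Events: events},
	}
}

func main() {
	failed := &Allocation{ID: "a1", DesiredStatus: desiredRun}
	stopWithoutReplacement(failed) // follow-up eval: not enough resources

	stoppedByUser := &Allocation{ID: "a2", DesiredStatus: desiredStop}

	// Later, the blocked eval runs and finds capacity.
	for _, prev := range reschedulable([]*Allocation{failed, stoppedByUser}) {
		next := replace(prev, "b1", 100)
		fmt.Println(next.ID, len(next.RescheduleTracker.Events), next.RescheduleTracker.Events[0].PrevAllocID)
	}
}
```

Running it prints `b1 1 a1`: only the annotated alloc is carried forward, and the new allocation's tracker points back at it, which is the hand-off that was previously lost.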
@@ -558,7 +558,8 @@ type GenericResponse struct {
 
 // RescheduleTracker encapsulates previous reschedule events
 type RescheduleTracker struct {
-	Events []*RescheduleEvent
+	Events         []*RescheduleEvent
+	LastReschedule string
 }
 
 // RescheduleEvent is used to keep track of previous attempts at rescheduling an allocation