Files
nomad/scheduler
Tim Gross f811169267 scheduler: recover from panic (#12009)
If processing a specific evaluation causes the scheduler (and
therefore the entire server) to panic, that evaluation will never
get a chance to be nack'd and cleared from the state store. It will
get dequeued by another scheduler, causing that server to panic, and
so forth until all servers are in a panic loop. This prevents the
operator from intervening to remove the evaluation or update the
state.

Recover the goroutine from the top-level `Process` methods for each
scheduler so that this condition can be detected without panicking the
server process. This will lead to a loop of recovering the scheduler
goroutine until the eval can be removed or nack'd, but that's much
better than taking a downtime.
2022-02-07 11:47:53 -05:00
..
2018-12-06 15:09:26 -08:00
2018-09-15 16:23:13 -07:00
2021-10-01 10:14:28 -04:00