nomad/.changelog/25726.txt at main - nomad - Gitea: Git with a cup of tea

kemko/nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Files

Tim Gross 5208ad4c2c scheduler: allow canaries to be migrated on node drain (#25726 )

When a node is drained that has canaries that are not yet healthy, the canaries
may not be properly migrated and the deployment will halt. This happens only if
there are more than `migrate.max_parallel` canaries on the node and the canaries
are not yet healthy (ex. they have a long `update.min_healthy_time`). In this
circumstance, the first batch of canaries are marked for migration by the
drainer correctly. But then the reconciler counts these migrated canaries
against the total number of expected canaries and no longer progresses the
deployment. Because an insufficient number of allocations have reported they're
healthy, the deployment cannot be promoted.

When the reconciler looks for canaries to cancel, it leaves in the list any
canaries that are already terminal (because there shouldn't be any work to
do). But this ends up skipping the creation of a new canary to replace terminal
canaries that have been marked for migration. Add a conditional for this case to
cause the canary to be removed from the list of active canaries so we can
replace it.

Ref: https://hashicorp.atlassian.net/browse/NMD-560
Fixes: https://github.com/hashicorp/nomad/issues/17842

2025-04-24 09:24:28 -04:00

4 lines

118 B

Plaintext

Raw Permalink Blame History

	```release-note:bug
	`scheduler: Fixed a bug where draining a node with canaries could result in a stuck deployment`
	```