Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:
* restart: when the tasks of an allocation fail and we try to restart the tasks
in place.
* reschedule: when the `restart` block runs out of attempts (or the allocation
fails before tasks even start), and we need to move
the allocation to another node to try again.
* migrate: when the user has asked to drain a node and we need to move the
allocations. These are not failures, so we don't want to propagate the
reschedule tracker.
* replacement: when a node is lost, we don't count that against the `reschedule`
tracker for the allocations on the node (it's not the allocation's "fault",
after all). We don't want to run the `migrate` machinery here either, as we
can't contact the down node. To the scheduler, this is effectively the same as
if we bumped the `group.count`.
* replacement for `disconnect.replace = true`: this is a replacement, but the
replacement is intended to be temporary, so we propagate the reschedule tracker.
Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.
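For a quick summary, here is a minimal Go sketch of which of these cases carry
the reschedule tracker forward to the replacement allocation; the reason names
and helper are hypothetical, not Nomad's internals:

```go
// stopReason labels the five behaviors described above (hypothetical names).
type stopReason int

const (
	reasonRestart           stopReason = iota // tasks restarted in place; no new allocation
	reasonReschedule                          // restart attempts exhausted, or the alloc failed before tasks started
	reasonMigrate                             // node drain; not a failure
	reasonLostNodeReplace                     // node lost; not the allocation's fault
	reasonDisconnectReplace                   // disconnect.replace = true; temporary replacement
)

// propagatesRescheduleTracker reports whether a replacement allocation should
// inherit the previous allocation's reschedule tracker.
func propagatesRescheduleTracker(r stopReason) bool {
	switch r {
	case reasonReschedule, reasonDisconnectReplace:
		return true
	default:
		// Restarts happen in place, while migrations and lost-node
		// replacements are not failures, so the tracker is not carried over.
		return false
	}
}
```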
Fixes: https://github.com/hashicorp/nomad/issues/24918
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
* func: remove the lists used to override the nomad_local_binary for servers and clients
* docs: add a note to the terraform e2e readme
* fix: remove the extra 'windows' from the aws_ami filter
* style: hcl fmt
Similarly to #6732, this removes checking affinity and spread for in-place updates.
Both affinity and spread should be soft preferences for the Nomad scheduler rather than strict constraints, so modifying them should not trigger job reallocation.
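A rough sketch of that idea, using a simplified group type rather than Nomad's
actual structs:

```go
import "reflect"

// group is a simplified stand-in for a task group's placement-relevant fields.
type group struct {
	Constraints []string
	Affinities  []string
	Spreads     []string
	TaskConfig  map[string]string
}

// needsDestructiveUpdate intentionally ignores Affinities and Spreads: they are
// soft placement preferences, so editing them should not replace allocations.
func needsDestructiveUpdate(old, updated *group) bool {
	return !reflect.DeepEqual(old.Constraints, updated.Constraints) ||
		!reflect.DeepEqual(old.TaskConfig, updated.TaskConfig)
}
```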
Fixes #25070
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Add dead (stopped) to status mapping to clarify Stopped
CE-816
* Pull status mapping into partial and include in job status command
* change `complete` to dead in table after discussing with Michael
* added clarifications; add CLI status definitions
* fixed line endings
* fixed typo
In #20165 we fixed a bug where a partially configured `client.template` retry
block would set any unset fields to nil instead of their default values. But
this patch introduced a regression in the default values, so we were now
defaulting to unlimited retries if the retry block was unset. Restore the
correct behavior and add better test coverage to both the config parsing and
the template configuration code.
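A sketch of the intended default handling; the field names and default values
here are illustrative, not the exact `client.template` retry schema:

```go
import "time"

// retryConfig mirrors the shape of a template retry block; pointer fields let
// us tell "unset" apart from an explicit zero value.
type retryConfig struct {
	Attempts   *int
	Backoff    *time.Duration
	MaxBackoff *time.Duration
}

// defaultRetry returns the fallback values (illustrative numbers here).
func defaultRetry() *retryConfig {
	attempts := 12
	backoff := 250 * time.Millisecond
	maxBackoff := time.Minute
	return &retryConfig{Attempts: &attempts, Backoff: &backoff, MaxBackoff: &maxBackoff}
}

// withDefaults fills any unset field from the defaults. An absent block uses
// the defaults wholesale instead of silently becoming "retry forever".
func (r *retryConfig) withDefaults() *retryConfig {
	d := defaultRetry()
	if r == nil {
		return d
	}
	out := *r
	if out.Attempts == nil {
		out.Attempts = d.Attempts
	}
	if out.Backoff == nil {
		out.Backoff = d.Backoff
	}
	if out.MaxBackoff == nil {
		out.MaxBackoff = d.MaxBackoff
	}
	return &out
}
```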
Ref: https://github.com/hashicorp/nomad/pull/20165
Ref: https://github.com/hashicorp/nomad/issues/23305#issuecomment-2643731565
In #24526 we updated the Consul and Vault fingerprints so that they are no
longer periodic. This fixed a problem that cluster admins reported where rolling
updates of Vault or Consul would cause a thundering herd of fingerprint updates
across the whole cluster.
But if Consul/Vault is not available during the initial fingerprint, it will
never get fingerprinted again. This is challenging for cluster updates and black
starts because the implicit service startup ordering may require
reloads. Instead, have the fingerprinter run periodically but mark that it has
made its first successful fingerprint of all Consul/Vault clusters. At that
point, we can skip further periodic updates. The `Reload` method will reset the
mark and allow the subsequent fingerprint to run normally.
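A minimal sketch of that approach; the type and method names are illustrative
rather than the actual fingerprinter code:

```go
import (
	"sync"
	"time"
)

// consulStateSketch tracks whether the initial fingerprint of every configured
// Consul/Vault cluster has succeeded.
type consulStateSketch struct {
	mu          sync.Mutex
	initialDone bool
}

// Periodic reports whether another periodic fingerprint run is needed.
func (f *consulStateSketch) Periodic() (bool, time.Duration) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.initialDone {
		return false, 0 // first success recorded; skip further periodic runs
	}
	return true, 15 * time.Second
}

// markSuccess records a fingerprint pass that covered all clusters.
func (f *consulStateSketch) markSuccess(allClustersOK bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if allClustersOK {
		f.initialDone = true
	}
}

// Reload resets the mark so the next fingerprint runs normally again.
func (f *consulStateSketch) Reload() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.initialDone = false
}
```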
Fixes: https://github.com/hashicorp/nomad/issues/25097
Ref: https://github.com/hashicorp/nomad/pull/24526
Ref: https://github.com/hashicorp/nomad/issues/24049
* fix: change the value of the version used for testing to account for ent versions
* func: add more specific test for servers stability
* func: change the criteria we use to verify the cluster stability after server upgrades
* style: syntax
In #24650 we switched to using ephemeral state for CNI plugins, so that when a
host reboots and we lose all the allocations we don't end up trying to use IPs
we created in network namespaces we just destroyed. Unfortunately upgrade
testing missed that in a non-reboot scenario, the existing CNI state was being
used by plugins like the ipam plugin to hand out the "next available" IP
address. So with no state carried over, we might allocate new addresses that
conflict with existing allocations. (This can be avoided by draining the node
first.)
As a compatibility shim, copy the old CNI state directory to the new CNI state
directory during agent startup, if the new CNI state directory doesn't already
exist.
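Roughly, the shim looks like this; the helper name is illustrative and error
handling is simplified:

```go
import (
	"os"
	"path/filepath"
)

// copyLegacyCNIState copies the old CNI state directory into the new location
// at agent startup, but only if the new directory does not already exist, so
// the ipam plugin's "next available address" bookkeeping survives the upgrade.
func copyLegacyCNIState(oldDir, newDir string) error {
	if _, err := os.Stat(newDir); err == nil {
		return nil // new state dir already exists; nothing to do
	} else if !os.IsNotExist(err) {
		return err
	}
	return filepath.Walk(oldDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(oldDir, path)
		if err != nil {
			return err
		}
		dst := filepath.Join(newDir, rel)
		if info.IsDir() {
			return os.MkdirAll(dst, info.Mode())
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		return os.WriteFile(dst, data, info.Mode())
	})
}
```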
Ref: https://github.com/hashicorp/nomad/pull/24650
Add a README describing the setup required for running upgrade testing via
Enos. Also fix the authorization header of our `wget` call to use the proper header
for short-lived tokens, and fix the output path variable of the artifactory step.
Co-authored-by: Juanadelacuesta <8647634+Juanadelacuesta@users.noreply.github.com>
A return statement was missing in the sticky volume check: when we weren't able
to find a suitable volume, we did not return false. This was caught by an e2e
test.
This PR fixes the issue, and corrects and expands the unit test.
CE side of ENT PR:
task schedule: pauses are not restart "attempts"
distinguish between these two cases:
1. task dies because we "paused" it (on purpose)
- should not count against restarts,
because nothing is wrong.
2. task dies because it didn't work right
- should count against restart attempts,
so users can address application issues.
with this, the restart{} block is back to its normal
behavior, so its documentation applies without caveat.
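In sketch form (hypothetical names, not the client's actual restart tracker):

```go
// exitReason distinguishes why a task stopped.
type exitReason int

const (
	exitPausedBySchedule exitReason = iota // killed on purpose by the task schedule
	exitFailed                             // the task itself failed
)

// countsAgainstRestartAttempts reports whether the exit should consume one of
// the restart{} block's attempts. Pauses do not: nothing is wrong with the task.
func countsAgainstRestartAttempts(r exitReason) bool {
	return r == exitFailed
}
```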
* Docs SEO: task drivers and plugins; refactor virt section
* add redirects for virt driver files
* Some updates. committing rather than stashing
* fix content-check errors
* Remove docs/devices/ and redirect to plugins/devices
* Update docs/drivers descriptions
* Move USB device plugin up a level. Finish descriptions.
* Apply suggestions from Jeff's code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* Apply title case suggestions from code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* apply title case suggestions; fix indentation
---------
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
This dependency is only used to generate mock `Variables`. The only place the
faked values would be meaningful is in the state store and RPC handler tests,
where we always set the values directly so that we can control
unblocking behaviors. Remove most of the random generation and remove the
dependency.
Closes: https://github.com/hashicorp/nomad/pull/25066
* func: add a new output that merges both windows and linux clients, but add tags to distinguish them
* fix: outputs can't reference other outputs in terraform
* Update e2e/terraform/provision-infra/compute.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: add module to upgrade clients
* func: add polling to verify the metadata to make sure all clients are up
* style: remove unused code
* fix: Give the allocations a little time to get to the expected number on the test health check, to avoid possible flaky tests in the future
* fix: set the upgrade version as the clients' version for the last health check
I merged #24869 having forgotten we don't run these tests in PR CI, so there's a compile error in the test. Fix that error and add the no-op import we use to catch this kind of thing.
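The no-op import pattern looks roughly like this; blank-importing the package
keeps it in the build graph so compile errors surface in PR CI even though the
tests only run later. The package name and import path here are illustrative:

```go
// Package compilecheck exists only so PR CI compiles test-only packages whose
// tests are not executed there; a build break then fails CI immediately.
package compilecheck

import (
	_ "github.com/hashicorp/nomad/e2e/v3/cluster3" // illustrative import path
)
```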
Ref: https://github.com/hashicorp/nomad/pull/24869
Add tests for dynamic host volumes where the claiming jobs have `volume.sticky =
true`. Includes a test for forced rescheduling and a test for node drain.
This changeset includes a new `e2e/v3`-style package for creating dynamic host
volumes, so we can reuse that across other tests.
At least one bug has been created because it's easy to miss a future.set() in
pullImageImpl(). This pulls future.set() out to PullImage(), the same level
where the future is created and wait()ed.
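A sketch of the resulting shape; the future and driver types here are
simplified stand-ins, not the driver's real ones:

```go
// pullFuture is a minimal stand-in for the driver's image-pull future.
type pullFuture struct {
	done chan struct{}
	id   string
	err  error
}

func newPullFuture() *pullFuture { return &pullFuture{done: make(chan struct{})} }

// set resolves the future; it must be called exactly once.
func (f *pullFuture) set(id string, err error) {
	f.id, f.err = id, err
	close(f.done)
}

// wait blocks until set has been called.
func (f *pullFuture) wait() (string, error) {
	<-f.done
	return f.id, f.err
}

type driver struct{}

// pullImageImpl does the actual pull; elided here.
func (d *driver) pullImageImpl(name string) (string, error) { return name, nil }

// PullImage creates, resolves, and waits on the future in one place, so no
// early return inside pullImageImpl can forget to call set().
func (d *driver) PullImage(name string) (string, error) {
	fut := newPullFuture()
	go func() {
		fut.set(d.pullImageImpl(name))
	}()
	return fut.wait()
}
```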
We introduce an alternative solution to the one presented in #24960, one that is
based on the state store rather than previous-next allocation tracking in the
reconciler. This new solution reduces cognitive complexity of the scheduler
code at the cost of slightly more boilerplate code, but also opens up new
possibilities in the future, e.g., allowing users to explicitly "un-stick"
volumes with workloads still running.
The new logic works as follows:
* On the scheduler side, `SetVolumes()` records the namespace, job, and task group,
and `hasVolumes()` consults the state and returns true if there is a matching claim
or if there is no previous claim.
* On the state store side, `upsertAllocsImpl()` checks whether an allocation requests
sticky volumes and consults the state; if there is no claim, it creates one.
* A `TaskGroupVolumeClaim` holds the namespace, jobID, task group name, and volume ID,
which together uniquely identify the volume.
* `DeleteJobTxn()` removes the claim from the state.
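A minimal Go sketch of the claim record described above; field names are
illustrative of its shape, not necessarily the exact state store schema:

```go
import "strings"

// TaskGroupVolumeClaim ties a sticky volume to the task group that claimed it.
type TaskGroupVolumeClaim struct {
	Namespace     string
	JobID         string
	TaskGroupName string
	VolumeID      string
}

// key uniquely identifies the claimed volume for state store lookups.
func (c *TaskGroupVolumeClaim) key() string {
	return strings.Join([]string{c.Namespace, c.JobID, c.TaskGroupName, c.VolumeID}, "/")
}
```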
The variable definitions for Enos upgrade scenarios have a couple of unused
variables and some of the documentation strings are ambiguous:
* `nomad_region` and `binary_local_path` variables are unused and can be removed.
* `nomad_local_binary` refers to the directory where the binaries will be
downloaded, not the binaries themselves. Rename to make it clear this belongs to
the artifactory fetch and not the provisioning step (which uses the
artifactory fetch outputs).