Commit Graph

26725 Commits

Author SHA1 Message Date
Tim Gross
1788bfb42e remove addresses from node class hash (#24942)
When a node is fingerprinted, we calculate a "computed class" from a hash over a
subset of its fields and attributes. In the scheduler, when a given node fails
feasibility checking (before fit checking) we know that no other node of that
same class will be feasible, and we add the hash to a map so we can reject them
early. This hash cannot include any values that are unique to a given node,
otherwise no other node will have the same hash and we'll never save ourselves
the work of feasibility checking those nodes.

In #4390 we introduce the `nomad.advertise.address` attribute and in #19969 we
introduced `consul.dns.addr` attribute. Both of these are unique per node and
break the hash.

Additionally, we were not correctly filtering attributes out when checking if a
node escaped the class by not filtering for attributes that start with
`unique.`. The test for this introduced in #708 had an inverted assertion, which
allowed this to pass unnoticed since the early days of Nomad.

Ref: https://github.com/hashicorp/nomad/pull/708
Ref: https://github.com/hashicorp/nomad/pull/4390
Ref: https://github.com/hashicorp/nomad/pull/19969
2025-03-03 09:28:32 -05:00
Michael Smithhisler
7867957811 e2e: remove legacy consul token tests (#25174) 2025-02-28 11:31:33 -05:00
David
c52623d7d4 client: remove unused nodeID parameter from host stats metric functions (#25247) 2025-02-28 10:29:38 +01:00
James Rasell
7268053174 vault: Remove legacy token based authentication workflow. (#25155)
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.

The deprecated Vault config options used byi the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.

Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-28 07:40:02 +00:00
Tim Gross
4a62d1b75c upgrade tests: add CSI workload (#25223)
Add an upgrade test workload for CSI with the AWS EFS plugin. In order to
validate this workload, we'll need to deploy the plugin job and then register a
volume with it. So this extends the `run_workloads` module to allow for "pre
scripts" and "post scripts" to be run before and after a given job has been
deployed. We can use that as a model for other test workloads.

Ref: https://hashicorp.atlassian.net/browse/NET-12217
2025-02-27 15:16:04 -05:00
James Rasell
c34f17c377 deps: Consolidated update of dependabot PRs (#25240)
* chore(deps): bump github.com/moby/term from 0.5.0 to 0.5.2

Bumps [github.com/moby/term](https://github.com/moby/term) from 0.5.0 to 0.5.2.
- [Commits](https://github.com/moby/term/compare/v0.5.0...v0.5.2)

---
updated-dependencies:
- dependency-name: github.com/moby/term
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump golang.org/x/mod from 0.22.0 to 0.23.0

Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.22.0 to 0.23.0.
- [Commits](https://github.com/golang/mod/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump github.com/zclconf/go-cty from 1.16.0 to 1.16.2

Bumps [github.com/zclconf/go-cty](https://github.com/zclconf/go-cty) from 1.16.0 to 1.16.2.
- [Release notes](https://github.com/zclconf/go-cty/releases)
- [Changelog](https://github.com/zclconf/go-cty/blob/main/CHANGELOG.md)
- [Commits](https://github.com/zclconf/go-cty/compare/v1.16.0...v1.16.2)

---
updated-dependencies:
- dependency-name: github.com/zclconf/go-cty
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump github.com/docker/go-connections from 0.4.0 to 0.5.0

Bumps [github.com/docker/go-connections](https://github.com/docker/go-connections) from 0.4.0 to 0.5.0.
- [Commits](https://github.com/docker/go-connections/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/docker/go-connections
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-27 16:54:44 +00:00
Piotr Kazmierczak
73a193f6d9 stateful deployments: task group host volume claims CLI (#25116)
CLI for interacting with task group host volume claims.
2025-02-27 17:04:48 +01:00
Tim Gross
6ae1444cf4 upgrade testing: debugging assistance (#25232)
Enos buries the Terraform output from provisioning. Add a shell script to load
the environment from provisioning for debugging Nomad during development of
upgrade tests.
2025-02-27 08:35:45 -05:00
James Rasell
bc52ca3142 sec: Update x/crypto and x/oauth2 to resolve scan failures. (#25236)
Fixes:
- vulnerability GO-2025-3487 in golang.org/x/crypto@v0.32.0
- vulnerability GO-2025-3488 in golang.org/x/oauth2@v0.25.0

The go.mod declaration is also needed by the update.
2025-02-27 10:37:54 +00:00
dependabot[bot]
2ea0846b0d chore(deps): bump github.com/go-jose/go-jose/v3 from 3.0.3 to 3.0.4 (#25234)
Bumps [github.com/go-jose/go-jose/v3](https://github.com/go-jose/go-jose) from 3.0.3 to 3.0.4.
- [Release notes](https://github.com/go-jose/go-jose/releases)
- [Changelog](https://github.com/go-jose/go-jose/blob/main/CHANGELOG.md)
- [Commits](https://github.com/go-jose/go-jose/compare/v3.0.3...v3.0.4)

---
updated-dependencies:
- dependency-name: github.com/go-jose/go-jose/v3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-27 10:35:11 +01:00
Juana De La Cuesta
461d4268e2 func: add python servers to raw exec workloads (#25230) 2025-02-26 18:05:46 +01:00
Juana De La Cuesta
b13132043b Add new workloads (#25106)
* func: Add more workloads

* Update jobs.sh

* Update versions.sh

* style: format

* Update enos/modules/test_cluster_health/scripts/allocs.sh

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* docs: improve outputs descriptions

* func: change docker workloads to be redis boxes and add healthchecks

* func: register the services on consul

* style: format

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-02-26 17:02:27 +01:00
Tim Gross
3b9290a11e E2E: fix column parsing for dynamic host volumes test (#25228)
In #25185 we changed the output of `volume status` to include both DHV and CSI
volumes by default. When the E2E test parses the output, it's not expecting the
new section header.

Ref: https://github.com/hashicorp/nomad/pull/25185
2025-02-26 09:52:47 -05:00
Aimee Ukasick
4693f0be2e docs: update Virt beta callout to use Note component (#25184) 2025-02-25 11:26:41 -06:00
Phil Renaud
7d08e79da3 [ui] System, Batch, and Sysbatch jobs' Start Job buttons let you revert to previous versions (#25104)
* Changes the behaviour of system/batch/sysbatch jobs not to look for a latest stable version, as their versions never go to stable

* Dont show job stability on versions page for system/sysbatch/batch jobs

* Tests that depend on jobs to revert specify that they are Service jobs

* Batch jobs added to detail-restart test loop

* Right, they're not stable, they're just versions
2025-02-25 11:10:14 -05:00
Tim Gross
db5022b965 deps: remove actions updates from dependabot (#25211)
Dependabot can update actions to versions that are not in the TSCCR
allowlist. The TSCCR check doesn't happen in CE, which means we don't learn we
have a problem until after we've spent the effort to backport them. Remove the
automation that updates actions automatically until this issue is resolved on
the security team's side.
2025-02-25 10:18:50 -05:00
dependabot[bot]
d9d5e7351a chore(deps): bump github.com/go-jose/go-jose/v4 from 4.0.4 to 4.0.5 (#25204)
Bumps [github.com/go-jose/go-jose/v4](https://github.com/go-jose/go-jose) from 4.0.4 to 4.0.5.
- [Release notes](https://github.com/go-jose/go-jose/releases)
- [Changelog](https://github.com/go-jose/go-jose/blob/main/CHANGELOG.md)
- [Commits](https://github.com/go-jose/go-jose/compare/v4.0.4...v4.0.5)

---
updated-dependencies:
- dependency-name: github.com/go-jose/go-jose/v4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-25 15:55:36 +01:00
Piotr Kazmierczak
58c6387323 stateful deployments: task group host volume claims API (#25114)
This PR introduces API endpoints /v1/volumes/claims/ and /v1/volumes/claim/:id
for listing and deleting task group host volume claims, respectively.
2025-02-25 15:51:59 +01:00
Tim Gross
7997a760df docs: improve cross-links for scheduler preemption (#25203)
Fix a broken link from the preemption concepts docs to the relevant API. Also include a link to the relevant command.

Ref: #25038
2025-02-25 08:56:54 -05:00
Tim Gross
cee11356ab deps: update go-msgpack and net-rpc-msgpackrpc (#25201)
Fixes a bug where connections would not be closed on write errors in the
msgpack encoder, which would cause the reader end of RPC connections to hang
indefinitely. This resulted in clients in widely-distributed geographies being
unable to poll for allocation updates.

Fixes: https://github.com/hashicorp/nomad/issues/23305
2025-02-24 14:23:50 -05:00
dependabot[bot]
7b9ef94e7e chore(deps): bump github.com/aws/aws-sdk-go-v2/config (#25193)
Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.29.6 to 1.29.7.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.29.6...config/v1.29.7)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-24 13:53:42 -05:00
dependabot[bot]
62e02be050 chore(deps): bump actions/upload-artifact from 4.6.0 to 4.6.1 (#25191)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.0 to 4.6.1.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](65c4c4a1dd...4cec3d8aa0)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-24 13:40:03 -05:00
dependabot[bot]
8d0dd622bd chore(deps): bump github.com/prometheus/client_golang (#25194)
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.20.5 to 1.21.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.20.5...v1.21.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-24 13:37:00 -05:00
dependabot[bot]
11399d98f4 chore(deps): bump github.com/mattn/go-colorable from 0.1.13 to 0.1.14 (#25195)
Bumps [github.com/mattn/go-colorable](https://github.com/mattn/go-colorable) from 0.1.13 to 0.1.14.
- [Commits](https://github.com/mattn/go-colorable/compare/v0.1.13...v0.1.14)

---
updated-dependencies:
- dependency-name: github.com/mattn/go-colorable
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-24 13:31:46 -05:00
dependabot[bot]
6fa8741cbc chore(deps): bump github.com/aws/smithy-go from 1.22.2 to 1.22.3 (#25197)
Bumps [github.com/aws/smithy-go](https://github.com/aws/smithy-go) from 1.22.2 to 1.22.3.
- [Release notes](https://github.com/aws/smithy-go/releases)
- [Changelog](https://github.com/aws/smithy-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/smithy-go/compare/v1.22.2...v1.22.3)

---
updated-dependencies:
- dependency-name: github.com/aws/smithy-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-24 13:31:32 -05:00
dependabot[bot]
8545c47cd9 chore(deps): bump github.com/hashicorp/go-kms-wrapping/wrappers/transit/v2 (#25196)
Bumps [github.com/hashicorp/go-kms-wrapping/wrappers/transit/v2](https://github.com/hashicorp/go-kms-wrapping) from 2.0.12 to 2.0.13.
- [Commits](https://github.com/hashicorp/go-kms-wrapping/compare/v2.0.12...v2.0.13)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-kms-wrapping/wrappers/transit/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-24 13:31:08 -05:00
Tim Gross
4cdfa19b1e volume status: default type to show both DHV and CSI volumes (#25185)
The `-type` option for `volume status` is a UX papercut because for many
clusters there will be only one sort of volume in use. Update the CLI so that
the default behavior is to query CSI and/or DHV.

This behavior is subtly different when the user provides an ID or not. If the
user doesn't provide an ID, we query both CSI and DHV and show both tables. If
the user provides an ID, we query DHV first and then CSI, and show only the
appropriate volume. Because DHV IDs are UUIDs, we're sure we won't have
collisions between the two. We only show errors if both queries return an error.

Fixes: https://hashicorp.atlassian.net/browse/NET-12214
2025-02-24 11:38:07 -05:00
Tim Gross
8c95f5f17e upgrade testing: make sure we capture last error if not exiting (#25186)
While testing #25172 I found a few spots where #25152 wasn't capturing the
errors from transient failures correctly or exiting early instead of
retrying.

Ref: https://hashicorp.atlassian.net/browse/NET-11546
2025-02-24 09:37:17 -05:00
Juana De La Cuesta
0529c0247d Only take one snapshot when upgrading servers (#25187)
* func: add possibility of having different binaries for server and clients

* style: rename binaries modules

* func: remove the check for last configuration log, and only take one snapshot when upgrading the servers

* Update enos/modules/upgrade_servers/main.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-02-24 15:06:16 +01:00
Juana De La Cuesta
4a75d2de63 Adjust the servers to be always linux instances (#25172)
* func: add possibility of having different binaries for server and clients

* style: rename binaries modules

* docs: update comments

* fix: correct the token input variable for fetch binaries
2025-02-24 13:09:57 +01:00
Phil Renaud
11cf98abe3 Pins trim-newlines for the website package (#25189) 2025-02-21 16:50:18 -05:00
Tim Gross
38840e6b62 dynamic host volumes: mark volumes on down nodes as unavailable (#25188)
We update the status of a volume when the node fingerprint changes. But if a
node goes down, we still show the volume as available. The scheduler behavior is
correct because a down node can never have work scheduled on it, but it might be
confusing for job authors who are looking at volumes that are showing as
available.

Update the volume update logic we have on node updates to include marking the
volume as unavailable when a node goes down.

Fixes: https://hashicorp.atlassian.net/browse/NET-12068
2025-02-21 16:50:04 -05:00
Tim Gross
e02ef73abf fingerprint: only log failed Consul fingerprint once (#25182)
In #24526 we updated Consul and Vault fingerprinting so that we no longer
periodically fingerprint. In #25102 we made it so that we fingerprint
periodically on start until the first fingerprint, in order to tolerate Consul
or Vault not being available on start. For clusters not running Consul, this
leads to a warn-level log every 15s. This same log exists for Vault, but Vault
support is opt-in via `vault.enable = true` whereas you have to manually disable
the fingerprinter for Consul.

Make it so that we only log a failed Consul fingerprint once per Consul
cluster. Reset the gate on this once we have a successful fingerprint, so that
we get the logs after a reload if Consul is unavailable.

Ref: https://github.com/hashicorp/nomad/pull/24526
Ref: https://github.com/hashicorp/nomad/pull/25102
Fixes: https://github.com/hashicorp/nomad/issues/25181
2025-02-21 13:09:34 -05:00
James Rasell
5c7e3bbe2f changelog: add entry for #25140 (#25178) 2025-02-21 08:24:32 +00:00
Phil Renaud
3f764a21bc [ui, tests] Various acceptance test fixups (v main) (#25031)
* Add factory hooks for jobs to have previously stable versions and stopped status

* Since #24973 node-read isn't presupposed and so should regex match only on the common url parts

* Job detail tests for title buttons are now bimodal and default to having previously-stable version in history

* prettier plz

* Breaking a thing on purpose to see if my other broken thing is broken

* continue-on-error set to false to get things red when appropriate

* OK what if continue-on-error=true but we do a separate failure reporting after the fact

* fail-fast are you the magic incantation that I need?

* Re-fix my test now that fast-fail is off

* Fix to server-leader by adding a region first, and always()-append to uploading partition results

* Express failure step lists failing tests so you don't have to click back into ember-exam step

* temporary snapshot and logging for flakey test in service job detail

* Bunch of region and tasklogs test fixups

* using allocStatusDistribution to ensure service job always has a non-queued alloc
2025-02-20 16:56:14 -05:00
Tim Gross
d63c1d0bad volumes: add capabilities to volume status (#25173)
Job authors need to be able to review what capabilities a dynamic host volume or
CSI volume has so that they can set the correct access mode and attachment mode
in their job. Add these to the CLI output of `volume status`.

Ref: https://hashicorp.atlassian.net/browse/NET-12063
2025-02-20 14:40:31 -05:00
dependabot[bot]
dde8eac42c chore(deps-dev): bump prettier from 3.4.1 to 3.5.1 in /website (#25125)
Bumps [prettier](https://github.com/prettier/prettier) from 3.4.1 to 3.5.1.
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prettier/prettier/compare/3.4.1...3.5.1)

---
updated-dependencies:
- dependency-name: prettier
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-20 16:58:53 +01:00
Phil Renaud
6a091b4c9c fix code scanning alert ws affected by a dos when handling a request with many http headers (#25159)
* Pin socket ws for ui

* Website ws pinned
2025-02-20 10:36:21 -05:00
James Rasell
8bce0b0954 e2e: Migrate legacy Vault token based workflow to workload ID (#25139)
Nomad 1.10.0 is removing the legacy Vault token based workflow
which means the legacy e2e compatibility tests will fail and not
work.

The Nomad e2e cluster was using the legacy Vault token based
workflow for initial cluster build. This change migrates to using
the workload identity flow which utilizes authentication methods,
roles, and policies.

The Nomad server network has been modified to allow traffic from
the HCP Vault HVN which is a private network peered into our AWS
account. This is required, so that Vault can pull JWKS
information from the Nomad API without going over the public
internet.

The cluster build will now also configure a Vault KV v2 mount at
a unique indentifier for the e2e cluster. This allows all Nomad
workloads and tests to use this if required.

The vaultsecrets suite has been updated to accommodate the new
changes and extended to test the default workload ID flow for
allocations which use Vault for secrets.
2025-02-20 14:06:25 +00:00
Tim Gross
73cd934e1a upgrade testing: make script error handling more robust (#25152)
We're using `set -eo pipefail` everywhere in the Enos scripts, several of the
scripts used for checking assertions didn't take advantage of pipefail in such a
way that we could avoid early exits from transient errors. This meant that if a
server was slightly late to come back up, we'd hit an error and exit the whole
script instead of polling as expected.

While fixing this, I've made a number of other improvements to the shell scripts:

* I've changed the design of the polling loops so that we're calling a function
that returns an exit code and sets `last_error` value, along with any global
variables required by downstream functions. This makes the loops more readable
by reducing the number of global variables, and helped identify some places
where we're exiting instead of returning into the loop.

* Using `shellcheck -s bash` I fixes some unused variables and undefined
variables that we were missing because they were only used on the error paths.
2025-02-20 08:44:35 -05:00
James Rasell
ec0cf86a37 github: update numerous workflow dependencies (#25160) 2025-02-20 08:32:29 +00:00
Tim Gross
86e1d6da52 E2E: use repo root to find correct git sha for AMI (#25151)
The nightly E2E run only builds a new AMI when required by changes to the
build. The AMI is tagged with the SHA of the commit that forced that build,
which may not be the commit that's spawning a particular test run. So we have a
resource in the `provision-infra` module that finds that SHA.

But when we run upgrade testing via Enos, we're running the E2E Terraform
configuration from outside the `e2e/terraform` folder. So the script that
resource runs will fail and prevent us from getting the AMI. Fix the script so
it can be run from any folder.

We also have duplicate resources for the "ubuntu jammy" AMI, but this is because
the Enos matrix might (in the near future) test with ARM64. For now, we'll pin
the Consul server to AMD64. Rename the resource appropriately to make the source
of the duplicate obvious.
2025-02-19 08:59:22 -05:00
James Rasell
32c25d3935 cli: Remove warning notes from Vault and Consul setup commands. (#25153) 2025-02-19 09:18:42 +00:00
Michael Smithhisler
ae21ae54a7 docs: add auth-methods section in acl concepts (#24917)
---------

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-18 12:29:44 -05:00
Etienne Bruines
0739095d2b docs: remove non-existing option in task-config.mdx for virt driver (#25146)
The "network" block does not have a "provider" option, therefore it was removed.
2025-02-18 17:17:04 +01:00
Tim Gross
dc58f247ed docs: clarify reschedule, migrate, and replacement terminology (#24929)
Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:

* restart: when the tasks of an allocation fail and we try to restart the tasks
  in place.

* reschedule: when the `restart` block runs out of attempts (or the allocation
  fails before tasks even start), and we need to move
  the allocation to another node to try again.

* migrate: when the user has asked to drain a node and we need to move the
  allocations. These are not failures, so we don't want to propagate the
  reschedule tracker.

* replacement: when a node is lost, we don't count that against the `reschedule`
  tracker for the allocations on the node (it's not the allocation's "fault",
  after all). We don't want to run the `migrate` machinery here here either, as we
  can't contact the down node. To the scheduler, this is effectively the same as
  if we bumped the `group.count`

* replacement for `disconnect.replace = true`: this is a replacement, but the
  replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.

Fixes: https://github.com/hashicorp/nomad/issues/24918
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-18 09:31:03 -05:00
James Rasell
37fb418a16 deps: Update consul-template to 0.40.0 (#25140) 2025-02-18 14:14:14 +00:00
Juana De La Cuesta
af2ac87409 Simplify binary overrides on e2e provision (#25122)
* func: remove the lists to override the nomad_local_binary for servers and clients

* docs: add a note to the terraform e2e readme

* fix: remove the extra 'windows' from the aws_ami filter

* style: hcl fmt
2025-02-17 16:13:32 +01:00
dependabot[bot]
05681afa57 chore(deps): bump golang.org/x/sys from 0.29.0 to 0.30.0 (#25127)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.29.0 to 0.30.0.
- [Commits](https://github.com/golang/sys/compare/v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-17 13:50:55 +01:00
dependabot[bot]
b7b18d1c50 chore(deps): bump go.etcd.io/bbolt from 1.3.11 to 1.4.0 (#25130)
Bumps [go.etcd.io/bbolt](https://github.com/etcd-io/bbolt) from 1.3.11 to 1.4.0.
- [Release notes](https://github.com/etcd-io/bbolt/releases)
- [Commits](https://github.com/etcd-io/bbolt/compare/v1.3.11...v1.4.0)

---
updated-dependencies:
- dependency-name: go.etcd.io/bbolt
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-17 13:50:00 +01:00