Commit Graph

27277 Commits

Author SHA1 Message Date
Tim Gross
192dec4297 docs: fix self-referencing link for raw_exec driver config (#26353)
During the big docs rearchitecture, we split up the task driver pages into
separate job declaration and driver configuration pages. The link for the
`raw_exec` driver to the configuration page is a self-reference.
2025-07-28 13:48:23 -04:00
Tim Gross
513ec02486 docs: explain access modes for CSI and DHV volumes (#26352)
The documentation for CSI and DHV has a list of the available access modes, but
doesn't explain what they mean in terms of what jobs can request, the scheduler
behavior, or the CSI plugin behavior. Expand on the information available in the
CSI specification and provide a description of DHV's behavior as well.

Ref: https://github.com/container-storage-interface/spec/blob/master/spec.md#createvolume
2025-07-28 13:48:01 -04:00
Tim Gross
6e5ecb6bb0 E2E: update Consul/Vault compat versions tested (#26369)
Update our E2E compatibility test for Consul and Vault to only include back to
the oldest-supported LTS versions of Consul and Vault. This will still leave
a few unsupported non-LTS versions in the matrix between the two oldest LTS, but
this is a small number of tests and fixing it would mean hard-coding the LTS
support matrix in our tests.
2025-07-28 12:03:30 -04:00
dependabot[bot]
d418260b6d chore(deps): bump google.golang.org/grpc from 1.73.0 to 1.74.2 (#26357)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.73.0 to 1.74.2.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.73.0...v1.74.2)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.74.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-28 11:27:49 -04:00
dependabot[bot]
a90f82bd0f chore(deps): bump github.com/aws/smithy-go from 1.22.4 to 1.22.5 (#26355)
Bumps [github.com/aws/smithy-go](https://github.com/aws/smithy-go) from 1.22.4 to 1.22.5.
- [Release notes](https://github.com/aws/smithy-go/releases)
- [Changelog](https://github.com/aws/smithy-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/smithy-go/compare/v1.22.4...v1.22.5)

---
updated-dependencies:
- dependency-name: github.com/aws/smithy-go
  dependency-version: 1.22.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-28 11:00:15 -04:00
James Rasell
fe42c5bab0 ci: Revert hclogvet running across entire codebase. (#26365)
It seems the tool requires a little attention and does not run
well across our enterprise codebase. Rolling back that makefile
change, so it does not stop enterprise work, backport, CI, etc.
2025-07-28 15:53:40 +01:00
dependabot[bot]
e561bdb476 chore(deps): bump github.com/hashicorp/consul-template (#26356)
Bumps [github.com/hashicorp/consul-template](https://github.com/hashicorp/consul-template) from 0.41.0 to 0.41.1.
- [Release notes](https://github.com/hashicorp/consul-template/releases)
- [Changelog](https://github.com/hashicorp/consul-template/blob/v0.41.1/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/consul-template/compare/v0.41.0...v0.41.1)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/consul-template
  dependency-version: 0.41.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-28 10:02:59 -04:00
dependabot[bot]
5bc5f4f9f1 chore(deps): bump github.com/aws/aws-sdk-go-v2/config (#26358)
Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.29.17 to 1.29.18.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.29.17...config/v1.29.18)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.29.18
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-28 10:02:27 -04:00
James Rasell
f2417ffb89 ci: Update hclogvet and correctly run across codebase. (#26362) 2025-07-28 14:15:33 +01:00
Tim Gross
26554e544e scheduler: move result mutation into computeUpdates (#26336)
The `computeUpdate` method returns 4 different values, some of which are just
different shapes of the same data and only ever get used to be applied to the
result in the caller. Move the mutation of the result into `computeUpdates` to
match the work done in #26325. Clean up the return signature so that only slices
we need downstream are returned, and fix the incorrect docstring.

Also fix a silent bug where the `inplace` set includes the original alloc and
not the updated version. This has no functional change because all existing
callers only ever look at the length of this slice, but it will prevent future
bugs if that ever changes.

Ref: https://github.com/hashicorp/nomad/pull/26325
Ref: https://hashicorp.atlassian.net/browse/NMD-819
2025-07-25 08:21:37 -04:00
James Rasell
5989d5862a ci: Update golangci-lint to v2 and fix highlighted issues. (#26334) 2025-07-25 10:44:08 +01:00
James Rasell
2ef837f02f cli: Ensure all no argument console messages are the same. (#26331)
Use a constant to ensure consistency across the CLI when displaying
a console message indicating the command was passed arguments when
it takes none.
2025-07-25 07:05:10 +01:00
Aimee Ukasick
ccaa3b7325 add table to service.port entry (#26344) 2025-07-24 14:00:05 -05:00
Tim Gross
b91d1726ce docs: clarify namespace support in autoscaler (#26337)
The current autoscaler docs implies that it has minimal or non-working support
for Nomad namespaces. Whereas in fact the namespace support works fine but just
doesn't allow configuring multiple namespaces without using a wildcard (for
now). Make this more clear and fix the reference to the configuration "below",
which is no longer on that same page.

Ref: https://github.com/hashicorp/nomad-autoscaler/issues/65
2025-07-24 12:16:24 -04:00
Aimee Ukasick
55926afe11 Docs: Clarify service.connect examples (#26330)
* Docs: CE-997 clarify connect examples

* fix DSN typos

* CE-996 clarify agent config consul.client_auto_join

* add (formerly Consul Connect)

* remove 'Nomad and Consul are
2025-07-24 10:59:03 -05:00
Tim Gross
2c4be7fc2e Reconciler mutation improvements (#26325)
Refactors of the `computeGroup` code in the reconciler to make understanding its
mutations more manageable. Some of this work makes mutation more consistent but
more importantly it's intended to make it readily _detectable_ while still being
readable. Includes:

* In the `computeCanaries` function, we mutate the dstate and the result and
  then the return values are used to further mutate the result in the
  caller. Move all this mutation into the function.

* In the `computeMigrations` function, we mutate the result and then the return
  values are used to further mutate the result in the caller. Move all this
  mutation into the function.

* In the `cancelUnneededCanaries` function, we mutate the result and then the
  return values are used to further mutate the result in the caller. Move all
  this mutation into the function, and annotate which `allocSet`s are mutated by
  taking a pointer to the set.

* The `createRescheduleLaterEvals` function currently mutates the results and
  returns updates to mutate the results in the caller. Move all this mutation
  into the function to help cleanup `computeGroup`.

* Extract `computeReconnecting` method from `computeGroup`. There's some tangled
  logic in `computeGroup` for determining changes to make for reconnecting
  allocations. Pull this out into its own function. Annotate mutability in the
  function by passing pointers to `allocSet` where needed, and mutate the result
  to update counts. Rename the old `computeReconnecting` method to
  `appendReconnectingUpdates` to mirror the naming of the similar logic for
  disconnects.

* Extract `computeDisconnecting` method from `computeGroup`. There's some
  tangled logic in `computeGroup` for determining changes to make for
  disconnected allocations. Pull this out into its own function. Annotate
  mutability in the function by passing pointers to `allocSet` where needed, and
  mutate the result to update counts.

* The `appendUnknownDisconnectingUpdates` method used to create updates for
  disconnected allocations mutates one of its `allocSet` arguments to change the
  allocations that the reschedule now set points to. Pull this update out into
  the caller.

* A handful of small docstring and helper function fixes


Ref: https://hashicorp.atlassian.net/browse/NMD-819
2025-07-24 08:33:49 -04:00
Tim Gross
e675491eb6 refactor uses of allocSet in reconciler (#26324)
The reconciler contains a large set of methods and functions that operate on
`allocSet` (a map of allocation IDs to their allocs). Update these so that they
are consistently methods that are documented to not consume the `allocSet`. This
sets the stage for further improvements around mutability in the reconciler.

This changeset also includes a few related refactors:
* Use the `allocSet` alias in every location it's relevant in the reconciler,
  for consistency and clarity.
* Move the filter functions and related helpers in the `allocs.go` file into the
  `filters.go` file.
* Update the method receiver on `allocSet` to match everywhere and generally
  improve the docstrings on the filter functions.

Ref: https://hashicorp.atlassian.net/browse/NMD-819
2025-07-23 08:57:41 -04:00
Jeff Boruszak
61cb8f6f10 Merge pull request #26270 from hashicorp/docs/redirects-for-versioning
docs: Versioned redirect logic
2025-07-22 14:14:23 -07:00
Aimee Ukasick
e6d63faf58 Fix typo (#26319) 2025-07-22 09:53:31 -05:00
Michael Smithhisler
36b4aa79df docs: fix link to nomad schedulers (#26302) 2025-07-21 08:53:29 -05:00
dependabot[bot]
66c22971b0 chore(deps): bump github.com/klauspost/cpuid/v2 from 2.2.11 to 2.3.0 (#26305) 2025-07-21 13:20:34 +01:00
dependabot[bot]
c6584e241c chore(deps): bump github.com/docker/cli (#26307) 2025-07-21 12:31:50 +01:00
dependabot[bot]
e26f7af91c chore(deps): bump github.com/miekg/dns from 1.1.66 to 1.1.67 (#26306) 2025-07-21 11:26:44 +01:00
dependabot[bot]
9facc80924 chore(deps): bump github.com/golang-jwt/jwt/v5 from 5.2.2 to 5.2.3 (#26308) 2025-07-21 10:36:03 +01:00
dependabot[bot]
5b05d99257 chore(deps): bump github.com/aws/aws-sdk-go-v2/feature/ec2/imds (#26304) 2025-07-21 08:24:15 +01:00
Piotr Kazmierczak
973a554808 scheduler: remove unnecessary reconnecting and ignore allocset assignment (#26298)
These values aren't used anywhere, and the code is confusing as is.
2025-07-21 09:06:52 +02:00
dependabot[bot]
edd7b5b8e7 chore(deps): bump golang.org/x/crypto from 0.39.0 to 0.40.0 (#26276)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.39.0 to 0.40.0.
- [Commits](https://github.com/golang/crypto/compare/v0.39.0...v0.40.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-18 16:06:26 -04:00
dependabot[bot]
0d3056ef8a chore(deps): bump golang.org/x/sys from 0.33.0 to 0.34.0 (#26275)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.33.0 to 0.34.0.
- [Commits](https://github.com/golang/sys/compare/v0.33.0...v0.34.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-18 13:39:38 -04:00
Deniz Onur Duzgun
f23ca749e7 build: update toolchain to go 1.24.5 (#26297) 2025-07-18 13:38:11 -04:00
dependabot[bot]
da77c0134e chore(deps): bump golang.org/x/mod from 0.25.0 to 0.26.0 (#26274)
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.25.0 to 0.26.0.
- [Commits](https://github.com/golang/mod/compare/v0.25.0...v0.26.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-18 10:55:38 -04:00
Tim Gross
333dd94362 scheduler: exit early on count=0 and filter out server-terminal (#26292)
When a task group is removed from a jobspec, the reconciler stops all
allocations and immediately returns from `computeGroup`. We can do the same for
when the group has been scaled-to-zero, but doing so runs into an inconsistency
in the way that server-terminal allocations are handled.

Prior to this change server-terminal allocations fall through `computeGroup`
without being marked as `ignore`, unless they are terminal canaries, in which
case they are marked `stop` (but this is a no-op). This inconsistency causes a
_tiny_ amount of extra `Plan.Submit`/Raft traffic, but more importantly makes it
more difficult to make test assertions for `stop` vs `ignore` vs
fallthrough. Remove this inconsistency by filtering out server-terminal
allocations early in `computeGroup`.

This brings the cluster reconciler's behavior closer to the node reconciler's
behavior, except that the node reconciler discards _all_ terminal allocations
because it doesn't support rescheduling.

This changeset required adjustments to two tests, but the tests themselves were
a bit of a mess:
* In https://github.com/hashicorp/nomad/pull/25726 we added a test of how
  canaries were treated when on draining nodes. But the test didn't correctly
  configure the job with an update block, leading to misleading test
  behavior. Fix the test to exercise the intended behavior and refactor for
  clarity.
* While working on reconciler behaviors around stopped allocations, I found it
  extremely hard to follow the intent of the disconnected client tests because
  many of the fields in the table-driven test are switches for more complex
  behavior or just tersely named. Attempt to make this a little more legible by
  moving some branches directly into fields, renaming some fields, and
  flattening out some branching.

Ref: https://hashicorp.atlassian.net/browse/NMD-819
2025-07-18 08:51:52 -04:00
dependabot[bot]
c9ebf01b4a chore(deps): bump github.com/docker/docker (#26272)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.3.1+incompatible to 28.3.2+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v28.3.1...v28.3.2)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.3.2+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-17 15:40:00 -04:00
Allison Larson
918e1eb123 Correctly canonicalize lifecycle block when missing hook value (#26285) 2025-07-16 11:40:16 -07:00
Aimee Ukasick
0d620607fe add blog links and video to nomad vs k8s (#26286) 2025-07-16 12:56:42 -05:00
Tim Gross
35f3f6ce41 scheduler: add disconnect and reschedule info to reconciler output (#26255)
The `DesiredUpdates` struct that we send to the Read Eval API doesn't include
information about disconnect/reconnect and rescheduling. Annotate the
`DesiredUpdates` with this data, and adjust the `eval status` command to display
only those fields that have non-zero values in order to make the output width
manageable.

Ref: https://hashicorp.atlassian.net/browse/NMD-815
2025-07-16 08:46:38 -04:00
Tim Gross
9a288ef493 deployment watcher: refactoring testing (#26284)
While investigating whether the deploymentwatcher would need updates to
implement system deployments, I discovered that some of the tests are racy and
make assertions about called functions without waiting.

Update these tests to wait where needed, and generally clean them up while we're
in here. In particular I've removed the heavyweight mocking in lieu of checking
the call counts and then asserting the expected state store changes.

Ref: https://hashicorp.atlassian.net/browse/NMD-892
2025-07-16 08:46:24 -04:00
Allison Larson
3ca518e89c Add node_pool to blockedEval metric (#26215)
Adds the node_pool to the blockedEval metrics that get emitted for
resource/cpu, along with the dc and node class.
2025-07-15 09:48:04 -07:00
Tim Gross
279775082c sysbatch: correctly validate that reschedule policy is not allowed (#26279)
System and sysbatch jobs don't support the reschedule block, because we'd always
replace allocations back onto the same node. The job validation for system jobs
asserts that the user hasn't set a `reschedule` block so that users aren't
submitting jobs expecting it to be supported. But this validation was missing
for sysbatch jobs.

Validate that sysbatch jobs don't have a reschedule block.
2025-07-15 10:47:02 -04:00
Daniel Bennett
089c148236 allocrunner: run all postrun hooks, even on error (#26271)
e.g. if the consul postrun hook fails, continue running
the subsequent postrun hooks, which among other things
includes network/CNI/iptables cleanup.
2025-07-14 13:55:33 -04:00
boruszak
a70e09f508 typo fix 2025-07-11 13:34:27 -07:00
boruszak
a6ec64063d neater formatting 2025-07-11 13:19:50 -07:00
boruszak
9c6ca4fd21 Tutorial archive redirects for versioned docs. 2025-07-11 13:14:05 -07:00
boruszak
5cf2d36b29 Versioned redirect logic 2025-07-11 13:13:42 -07:00
Tim Gross
b23ab5ac15 docs: clarify requirements for deleting volumes (#26240)
If you delete a CSI volume, the volume cannot be currently claimed by an
allocation or in the process of being unpublished. This is documented in the CLI
but not the API. Also, the documentation incorrectly says that the `volume
delete` command silently returns without error if the volume doesn't exist, but
that's incorrect.

Fixes: https://github.com/hashicorp/nomad/issues/24756
2025-07-11 15:01:06 -04:00
Tim Gross
bf44eddd9f docs: note that CSI volume name must be unique (#26249)
When we originally implemented CSI, Nomad did not support the `CreateVolume`
workflow, so the volume name field was just a display name. The `CreateVolume`
CSI RPC requires that the volume name be unique. In retrospect, Nomad should
probably have mapped the namespace + ID to the volume name field, but because we
didn't the name field must be unique per storage provider. In future work we
should try to figure out a way to unwind that decision but in the meantime let's
make that requirement clear in the documentation.

Ref: https://gitlab.com/rocketduck/csi-plugin-nfs/-/issues/21
2025-07-11 14:57:53 -04:00
Piotr Kazmierczak
08b3db104d docs: update reconciler diagram to reflect recent refactors (#26260)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-07-11 15:34:07 +02:00
Tim Gross
26302ab25d reconciler: share assertions in property tests (#26259)
Refactor the reconciler property tests to extract functions for safety property
assertions we'll share between different job types for the same reconciler.
2025-07-11 09:27:22 -04:00
Aimee Ukasick
9af1642a1f add redirect to handle new 1.9, 1.8 /commands path (#26254) 2025-07-10 15:08:44 -05:00
Frédéric Praca
7e47aa3a1f fix(doc): fix links for task driver plugins (#26250)
host URL was wrong, changed from develoepr to developer
2025-07-10 14:33:45 -05:00
Tim Gross
3bb1c9aeaf docs: more details for alloc status (#26243)
The `alloc status` documentation is missing information about placement metrics.

Ref: https://hashicorp.atlassian.net/browse/NMD-818
2025-07-10 08:57:37 -04:00