27509 Commits

Author SHA1 Message Date
Tim Gross
fbcdb125da end-to-end testing improvements for CSI (#26834)
While working on #26831 and #26832 I made some minor improvements to our
end-to-end test setup for CSI:

* bump the AWS EBS plugin versions to latest release (1.48.0)
* remove the unnecessary `datacenters` field from the AWS EBS plugin jobs
* add a name tag to the EBS volumes we create
* add a user-specific name tag to the cluster name when using the makefile to
  deploy a cluster
* add volumes and other missing variables from the `provision-infra` module to
  the main E2E module

Ref: https://github.com/hashicorp/nomad/pull/26832
Ref: https://github.com/hashicorp/nomad/pull/26831
2025-09-25 09:27:15 -04:00
Tim Gross
40241b261b CSI: ensure only client-terminal allocs are treated as past claims (#26831)
The volume watcher checks whether any allocations that have claims are terminal
so that it knows if it's safe to unpublish the volume. This check was
considering a claim as unpublishable if the allocation was terminal on either
the server or client, rather than the client alone. In many circumstances this
is safe.

But if an allocation takes a while to stop (e.g. it has a `shutdown_delay`), it's
possible for garbage collection to run in the window between when the alloc is
marked server-terminal and when the task is actually stopped. The server
unpublishes the volume which sends a node plugin RPC. The plugin unmounts the
volume while it's in use, and then unmounts it again when the allocation stops
and the CSI postrun hook runs. If the task writes to the volume during the
unmounting process, some providers end up in a broken state and the volume is
not usable unless it's detached and reattached.

Fix this by considering a claim a "past claim" only when the allocation is
client terminal. This way if garbage collection runs while we're waiting for
allocation shutdown, the alloc will only be server-terminal and we won't send
the extra node RPCs.
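
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the volume watcher's actual code: the `alloc` type, its field names, and `isPastClaim` are hypothetical, and treating a missing allocation as a past claim is an assumption.

```go
package main

import "fmt"

// alloc is a hypothetical, pared-down stand-in for an allocation. The field
// names are illustrative and are not taken from the Nomad source.
type alloc struct {
	ID             string
	ServerTerminal bool // the server has asked the alloc to stop
	ClientTerminal bool // the client reports the tasks have actually stopped
}

// isPastClaim reports whether a volume claim held by a is safe to unpublish.
// Only the client-side status counts: a server-terminal alloc may still be
// waiting out its shutdown_delay with the volume mounted.
func isPastClaim(a *alloc) bool {
	if a == nil {
		// Assumption: an alloc that no longer exists cannot be using the volume.
		return true
	}
	return a.ClientTerminal
}

func main() {
	stopping := &alloc{ID: "a1", ServerTerminal: true}
	stopped := &alloc{ID: "a2", ServerTerminal: true, ClientTerminal: true}
	fmt.Println(isPastClaim(stopping), isPastClaim(stopped)) // prints: false true
}
```

With this check, garbage collection that runs during the shutdown window sees the claim as still live and leaves the unpublish to the CSI postrun hook.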

Fixes: https://github.com/hashicorp/nomad/issues/24130
Fixes: https://github.com/hashicorp/nomad/issues/25819
Ref: https://hashicorp.atlassian.net/browse/NMD-1001
2025-09-25 09:24:53 -04:00
James Rasell
c80c60965f node pool: Allow specifying node identity ttl in HCL or JSON spec. (#26825)
The node identity TTL defaults to 24hr but can be altered by
setting the node identity TTL parameter. In order to allow setting
and viewing the value, the field is now plumbed through the CLI
and HTTP API.

In order to parse the HCL, a new helper package has been created
which contains generic parsing and decoding functionality for
dealing with HCL that contains time durations. hclsimple can be
used when this functionality is not needed. In order to parse the
JSON, custom marshal and unmarshal functions have been created as
used in many other places.

The node pool init command has been updated to include this new
parameter, although commented out, for reference. The info command
now includes the TTL in its output too.
2025-09-24 14:20:34 +01:00
Aimee Ukasick
6d4c8b3efe Update CODEOWNERS (#26827)
change web-presence to web-devdot so web engineers not on the devdot team don't get assigned
2025-09-23 09:22:23 -05:00
Daniel Bennett
1d6fddd11f build: ui: setup-node v4.4.0 (#26826)
for actions/cache upgrade, specifically to account for
https://github.com/actions/toolkit/discussions/1890
2025-09-22 15:35:09 -04:00
dependabot[bot]
ccd497b46f chore(deps): bump github.com/shoenig/go-m1cpu from 0.1.6 to 0.1.7 (#26817)
Bumps [github.com/shoenig/go-m1cpu](https://github.com/shoenig/go-m1cpu) from 0.1.6 to 0.1.7.
- [Release notes](https://github.com/shoenig/go-m1cpu/releases)
- [Commits](https://github.com/shoenig/go-m1cpu/compare/v0.1.6...v0.1.7)

---
updated-dependencies:
- dependency-name: github.com/shoenig/go-m1cpu
  dependency-version: 0.1.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-22 09:58:50 +02:00
dependabot[bot]
63e4376d3c chore(deps): bump golang.org/x/mod from 0.27.0 to 0.28.0 (#26814)
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.27.0 to 0.28.0.
- [Commits](https://github.com/golang/mod/compare/v0.27.0...v0.28.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.28.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-22 09:58:28 +02:00
Tim Gross
b5530128df docs: expand on allocation GC details (#26792)
Expand on the documentation of allocation garbage collection:
* Explain that server-side GC of allocations is tied to the GC of the
evaluation that spawned the allocation.
* Explain that server-side GC of allocations will force them to be immediately
GC'd on the client regardless of the client-side configurations.

Ref: https://github.com/hashicorp/nomad/issues/26765

Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2025-09-19 12:17:17 -04:00
Aimee Ukasick
377674f93e Contributing README: Add section for creating an issue (#26805)
* Add section for creating an issue

* incorporate feedback

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update contributing/README.md

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-09-19 10:49:54 -05:00
Piotr Kazmierczak
ceeee1f68c e2e: set longer job submission time for failing system jobs (#26809)
I cannot replicate this locally, but it appears that on CI some of our
system jobs take longer than the default 20s to finish deploying. This
PR is just to make sure this isn't the reason these tests fail.
2025-09-19 17:25:53 +02:00
Tim Gross
0367b60ca9 changelog for Nomad Enterprise 1.9.13+ent and 1.8.17+ent (#26806)
Due to the delayed release of Nomad Enterprise, we didn't have the changelog
entries for these two releases.
2025-09-19 11:22:58 -04:00
Piotr Kazmierczak
f767db5639 e2e: fix TestScaling/TestScaling_System (#26804) 2025-09-19 15:31:32 +02:00
James Rasell
8e553ad95b build: Add tzdata to Docker container final image. (#26794)
Nomad's periodic block includes a "time_zone" parameter which lets
operators set the time zone against which the next launch interval
is checked. For this to work, Nomad needs to use the
"time.LoadLocation" function, which in turn can use multiple TZ data sources.

When using the Docker image to trigger Nomad job registrations, it
currently does not have access to any TZ data, meaning it is only
aware of UTC. Adding the tzdata package contents to the release
image provides the required data for this to work.
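
A minimal sketch of the lookup involved, assuming a hypothetical `resolveTimeZone` helper (not a function in Nomad). On a system with no TZ data available, `time.LoadLocation` still resolves "UTC" but returns an error for other zone names.

```go
package main

import (
	"fmt"
	"time"
)

// resolveTimeZone looks up an IANA zone name the way a periodic job's
// time_zone value could be resolved. The function name is hypothetical.
func resolveTimeZone(name string) (*time.Location, error) {
	loc, err := time.LoadLocation(name)
	if err != nil {
		return nil, fmt.Errorf("time zone %q unavailable: %w", name, err)
	}
	return loc, nil
}

func main() {
	// Whether "America/New_York" resolves depends on the TZ data sources
	// available to the process, e.g. an installed tzdata package.
	for _, name := range []string{"UTC", "America/New_York"} {
		loc, err := resolveTimeZone(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "->", time.Now().In(loc).Format(time.RFC3339))
	}
}
```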

It would have also been possible to set the "-tags timetzdata" build
tag when releasing Nomad, which would embed a copy of the timezone
database in the binary. We decided against using the build tag approach
as it is a subtle way that we could introduce bugs that are very
difficult to track down, and we prefer the approach in this commit.
2025-09-19 08:55:57 +01:00
ethel-hashicorp
6ea57a589d SMRE-733: Updates post-install text to properly reflect the updated IPLA blurb (#26791) 2025-09-19 07:35:58 +01:00
Piotr Kazmierczak
f42239bf6c api: add DefaultUpdateStrategy to system jobs if missing (#26777)
From 1.11, Nomad system jobs will feature deployments, and thus jobspecs missing
an update block should be canonicalized to have one.
2025-09-18 15:21:23 +02:00
Tim Gross
3ef25e5867 ACL: allow workload identities to list/get their own policies (#26772)
In most RPC endpoints we use the resolved ACL object to determine whether a
given auth token or identity has access to the object of interest to the
RPC. In #15870 we adjusted this across most of the RPCs to handle workload identity.

But in the ACL endpoints that read policies, we can't use the resolved ACL
object and have to go back to the original token and look up the policies it has
access to. So we need to resolve any workload-associated policies during that
lookup as well.

Fixes: https://github.com/hashicorp/nomad/issues/26764
Ref: https://hashicorp.atlassian.net/browse/NMD-990
Ref: https://github.com/hashicorp/nomad/pull/15870
2025-09-18 09:10:37 -04:00
James Rasell
a206ff3858 test: Fix test flake in client get registration token (#26796)
The test was incorrectly writing to state that registration had
been finished before writing the node identity token. This is the
opposite of what happens in the client code and caused a timing
issue which meant we read registration as completed before we had
the identity available and therefore returned the secret ID.
2025-09-18 13:56:17 +01:00
Piotr Kazmierczak
46dfd9d992 scheduler: do not create deployments for system job reschedules (#26789)
System jobs that get rescheduled should not get new deployments.
2025-09-18 14:54:54 +02:00
Tim Gross
3432b0a2d6 consul: only add fingerprint link if unique.consul.name is set (#26787)
In Nomad Enterprise we can fingerprint multiple Consul datacenters. If neither
is `"default"` then we end up with warning logs about adding a "link".

The `Link` field on the `Node` struct is a map of attributes that only
contributes to the node's computed hash. The `"consul"` key's value is derived
from the `unique.consul.name` attribute, which only exists if there's a default
Consul cluster.

Update the fingerprint to skip setting the link field if there's no
`unique.consul.name`, and lower the warning log for malformed fields to debug;
this is a minor scheduling optimization largely captured by existing Consul
fields in the node computed class. The only reason not to remove it entirely is
to avoid changing computed classes on existing large clusters.
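
The guard described above can be sketched like this. The `setConsulLink` helper and the plain maps are hypothetical stand-ins for the fingerprinter and the node's attribute and link fields, and the sketch copies the attribute directly rather than reproducing how the real value is derived.

```go
package main

import "fmt"

// setConsulLink sets links["consul"] only when the unique.consul.name
// attribute is present. It reports whether the link was set.
// Hypothetical helper: not the fingerprinter's real signature.
func setConsulLink(attrs, links map[string]string) bool {
	name, ok := attrs["unique.consul.name"]
	if !ok || name == "" {
		// No default Consul cluster was fingerprinted, so skip the link
		// rather than logging a warning about a malformed value.
		return false
	}
	links["consul"] = name
	return true
}

func main() {
	links := map[string]string{}
	fmt.Println(setConsulLink(map[string]string{}, links), links)
	fmt.Println(setConsulLink(map[string]string{"unique.consul.name": "node-1"}, links), links)
}
```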

Fixes: https://github.com/hashicorp/nomad/issues/26781
Ref: https://hashicorp.atlassian.net/browse/NMD-998
2025-09-17 13:23:01 -04:00
Jeff Boruszak
6dce21bc85 Merge pull request #26682 from hashicorp/docs/versioned-redirect-fix
docs: Versioned docs redirect fixes
2025-09-17 08:58:37 -07:00
Tim Gross
4e75e99f1a windows: use/accept platform-specific signal for stopping agent (#26780)
On Windows, the `os.Process.Signal` method returns an error when sending
`os.Interrupt` (SIGINT) because it isn't implemented. This causes test servers
in the `testutil` packages to break on Windows. Use the platform-specific
syscalls to generate the SIGINT instead.

The agent's signal handler also did not correctly handle the Ctrl-C because we
were masking os.Interrupt instead of SIGINT.

Fixes: https://github.com/hashicorp/nomad/issues/26775

Co-authored-by: Chris Roberts <croberts@hashicorp.com>
2025-09-17 11:32:20 -04:00
Aimee Ukasick
fca783c566 Add 1.10.5 release notes (#26782) 2025-09-17 08:59:43 -05:00
James Rasell
ac5a77af56 docs: Add client identity HTTP API detail on api-docs page. (#26774)
Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
2025-09-17 14:05:37 +01:00
Piotr Kazmierczak
4874622ebd e2e: test canary updates for system jobs (#26776) 2025-09-17 10:20:03 +02:00
boruszak
8ab61f37b3 Fix accidental "s 2025-09-16 14:23:59 -07:00
Michael Smithhisler
1a19a16ee9 docs: fix link in multiregion job spec page (#26755) 2025-09-16 13:00:42 -05:00
James Rasell
2abd72d433 http: Fix client identity renew call when node ID is in URI. (#26773)
When calling the client identity renew API, it is possible the
target node ID is provided by either the URI or within the request
body. This change fixes a bug where all calls using a node_id query
parameter would be rejected as it failed to decode the empty request
body.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-09-16 15:15:39 +01:00
Olli Janatuinen
6398ef9475 secrets: Support custom plugins in Windows (#26751)
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2025-09-16 09:14:50 -04:00
Daniel Bennett
f47cb5d10f e2e: adjust flaky timings (#26771)
hopefully fixes:

```
TestOversubscription/testExec:
    oversubscription_test.go:57: submitting job: "./input/exec.hcl"
    oversubscription_test.go:72:
        oversubscription_test.go:72: expected condition to pass within wait context
        ↪ error: wait: timeout exceeded: expect '31457280' in stdout, got: 'stat {...}/cat.stdout.0: no such file or directory'
```

and in separate runs,

```
TestTaskAPI/testTaskAPI_Auth:
     taskapi_test.go:85:
         taskapi_test.go:85: expected string to have suffix
         ↪ suffix: Unauthorized
         ↪ string:
```

```
TestTaskAPI/testTaskAPI_Auth:
     taskapi_test.go:85:
         taskapi_test.go:85: expected string to have suffix
         ↪ suffix: Forbidden
         ↪ string:
```
2025-09-15 15:54:53 -04:00
dependabot[bot]
ababacc9ab chore(deps): bump github.com/shoenig/test from 1.12.1 to 1.12.2 in /api (#26757)
* chore(deps): bump github.com/shoenig/test from 1.12.1 to 1.12.2 in /api

Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 1.12.1 to 1.12.2.
- [Release notes](https://github.com/shoenig/test/releases)
- [Commits](https://github.com/shoenig/test/compare/v1.12.1...v1.12.2)

---
updated-dependencies:
- dependency-name: github.com/shoenig/test
  dependency-version: 1.12.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* root dep needs to be updated too

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-09-15 09:06:41 -04:00
dependabot[bot]
2baeffec92 chore(deps-dev): bump prettier from 3.5.3 to 3.6.2 in /website (#26162)
Bumps [prettier](https://github.com/prettier/prettier) from 3.5.3 to 3.6.2.
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prettier/prettier/compare/3.5.3...3.6.2)

---
updated-dependencies:
- dependency-name: prettier
  dependency-version: 3.6.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 08:51:31 -04:00
dependabot[bot]
be1fdc0d53 chore(deps): bump golang.org/x/crypto from 0.41.0 to 0.42.0 (#26758)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.41.0 to 0.42.0.
- [Commits](https://github.com/golang/crypto/compare/v0.41.0...v0.42.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 08:48:31 -04:00
dependabot[bot]
16533b3d34 chore(deps): bump google.golang.org/grpc from 1.75.0 to 1.75.1 (#26760)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.75.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.75.0...v1.75.1)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 08:48:15 -04:00
dependabot[bot]
24ef9fa928 chore(deps): bump github.com/aws/aws-sdk-go-v2/feature/ec2/imds (#26762)
Bumps [github.com/aws/aws-sdk-go-v2/feature/ec2/imds](https://github.com/aws/aws-sdk-go-v2) from 1.18.6 to 1.18.7.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/config/v1.18.7/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.18.6...config/v1.18.7)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/feature/ec2/imds
  dependency-version: 1.18.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 08:16:49 -04:00
dependabot[bot]
5d0d5d2b22 chore(deps): bump github.com/zclconf/go-cty from 1.16.4 to 1.17.0 (#26761)
Bumps [github.com/zclconf/go-cty](https://github.com/zclconf/go-cty) from 1.16.4 to 1.17.0.
- [Release notes](https://github.com/zclconf/go-cty/releases)
- [Changelog](https://github.com/zclconf/go-cty/blob/main/CHANGELOG.md)
- [Commits](https://github.com/zclconf/go-cty/compare/v1.16.4...v1.17.0)

---
updated-dependencies:
- dependency-name: github.com/zclconf/go-cty
  dependency-version: 1.17.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 08:16:37 -04:00
dependabot[bot]
da9a25d77d chore(deps): bump golang.org/x/time from 0.12.0 to 0.13.0 (#26759)
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.12.0 to 0.13.0.
- [Commits](https://github.com/golang/time/compare/v0.12.0...v0.13.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-version: 0.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 08:16:12 -04:00
James Rasell
a7db1b42b8 acl: Migrate all tests from testify to must. (#26704) 2025-09-15 08:21:49 +01:00
Chris Roberts
10be73c081 ci: fix github to jira issue sync (#26747)
Add local actions for JIRA interactions to replace github actions
that have been archived.
2025-09-12 13:40:11 -07:00
Tim Gross
ac86225e09 metrics: reduce heap usage of eval broker metrics (#26737)
The metrics on the eval broker include labels for the job ID, but under a high
volume of dispatch workloads, this results in excessive heap usage on the
leader. Dispatch workloads should use their parent ID rather than their child ID
for any metrics we collect.

Also, eliminate an extra copy of the labels. And remove the extremely high
cardinality `"eval_id"` label from the `nomad.broker.eval_waiting` metric.
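
To illustrate why the parent ID keeps label cardinality bounded, here is a minimal sketch. The `job` type and `metricJobID` helper are hypothetical and are not the eval broker's actual code.

```go
package main

import "fmt"

// job is a hypothetical, pared-down job. ParentID is set on dispatched
// child jobs and is empty otherwise.
type job struct {
	ID       string
	ParentID string
}

// metricJobID returns the ID to use as a metric label: the parent ID for
// dispatched children, the job's own ID for everything else.
func metricJobID(j job) string {
	if j.ParentID != "" {
		return j.ParentID
	}
	return j.ID
}

func main() {
	// 1000 dispatched children of one parent collapse to a single label value.
	seen := map[string]int{}
	for i := 0; i < 1000; i++ {
		child := job{ID: fmt.Sprintf("batch/dispatch-%d", i), ParentID: "batch"}
		seen[metricJobID(child)]++
	}
	fmt.Println(len(seen)) // prints: 1
}
```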

Fixes: https://github.com/hashicorp/nomad/issues/26657
2025-09-12 08:29:46 -04:00
Michael Smithhisler
c20f854d16 client: set network status on tasks when restoring allocations (#26699)
The allocation network hook was not properly restoring network status from state when the network had previously been set up. This led to missing environment variables and a misconfigured hosts file and resolv.conf when a task was restarted after the Nomad agent had restarted.
---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2025-09-11 13:10:21 -04:00
Chris Roberts
8b51acf259 [artifact] fix path within check on trimmed target (#26748)
When checking if the target path is within the root path, the
target path is trimmed and then file information is fetched. If
the trimmed path does not exist, then the full target path is
not within the root. In the case of receiving a not exist error,
simply return false.
2025-09-11 08:59:18 -07:00
Piotr Kazmierczak
8eb72b2868 Post 1.10.5 release (#26749)
* Generate files for 1.10.5 release

* Prepare for next release

---------

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2025-09-11 14:49:12 +02:00
hc-github-team-nomad-core
4c0e5b286b Prepare for next release 2025-09-11 10:20:15 +02:00
hc-github-team-nomad-core
f9bce13f8c Generate files for 1.10.5 release 2025-09-11 10:20:15 +02:00
Michael Smithhisler
f58e915bd3 scheduler: allow device count to use different vendors/models (#26649)
A small optimization in the scheduler required users to specify specific
models of devices if the required count was higher than the individual
model/vendor on the node. This change removes that optimization to allow
for more intuitive device scheduling when different vendor/model device
types exist on a node.
2025-09-10 07:12:38 -04:00
tehut
68d767654a ci: remove mkdir from action for release runners (#26743) 2025-09-10 09:13:49 +02:00
tehut
bfd64b5f98 build:replicate nomad-enterprise 557e533 (#26741) 2025-09-09 17:02:08 -07:00
Tim Gross
75774711f0 eliminate dead Vault-related code from nomad/structs (#26736)
When we removed the legacy Vault token workflow, we left behind a few bits of
code that only served that workflow. Remove the dead code.
2025-09-09 12:12:57 -04:00
Michael Smithhisler
37da98be1c Merge pull request #26681 from hashicorp/NMD-760-nomad-secrets-block
Secrets Block: merge feature branch to main
2025-09-09 10:46:18 -04:00
Tim Gross
0b69999698 Revert go-getter update (#26731)
The `go-getter` update in https://github.com/hashicorp/nomad/pull/26713 is not passing tests upstream (apparently https://github.com/hashicorp/go-getter/pull/548 is the origin of the problem but that PR did not ever run tests). The issue being fixed isn't a critical vulnerability, so in the interest of preparing us for the next release, revert the `go-getter` change but keep the Go toolchain update.

We'll skip go-getter 1.8.0 and pick up the next patch version once its issues are fixed.
Reverts commit 8a96929870.
2025-09-09 09:28:08 -04:00