Commit Graph

27466 Commits

Author SHA1 Message Date
hc-github-team-nomad-core
4c0e5b286b Prepare for next release 2025-09-11 10:20:15 +02:00
hc-github-team-nomad-core
f9bce13f8c Generate files for 1.10.5 release 2025-09-11 10:20:15 +02:00
Michael Smithhisler
f58e915bd3 scheduler: allow device count to use different vendors/models (#26649)
A small optimization in the scheduler required users to specify particular
device models if the requested count was higher than the number of devices of
any single model/vendor on the node. This change removes that optimization to
allow for more intuitive device scheduling when different vendor/model device
types exist on a node.
2025-09-10 07:12:38 -04:00
tehut
68d767654a ci: remove mkdir from action for release runners (#26743) 2025-09-10 09:13:49 +02:00
tehut
bfd64b5f98 build:replicate nomad-enterprise 557e533 (#26741) 2025-09-09 17:02:08 -07:00
Tim Gross
75774711f0 eliminate dead Vault-related code from nomad/structs (#26736)
When we removed the legacy Vault token workflow, we left behind a few bits of
code that only served that workflow. Remove the dead code.
2025-09-09 12:12:57 -04:00
Michael Smithhisler
37da98be1c Merge pull request #26681 from hashicorp/NMD-760-nomad-secrets-block
Secrets Block: merge feature branch to main
2025-09-09 10:46:18 -04:00
Tim Gross
0b69999698 Revert go-getter update (#26731)
The `go-getter` update in https://github.com/hashicorp/nomad/pull/26713 is not passing tests upstream (apparently https://github.com/hashicorp/go-getter/pull/548 is the origin of the problem but that PR did not ever run tests). The issue being fixed isn't a critical vulnerability, so in the interest of preparing us for the next release, revert the `go-getter` change but keep the Go toolchain update.

We'll skip go-getter 1.8.0 and pick up the next patch version once its issues are fixed.
Reverts commit 8a96929870.
2025-09-09 09:28:08 -04:00
Daniel Bennett
cb3e49f3e4 e2e: shorten restart delay in docker registry task (#26729)
tests that use this local docker registry (docker and podman tests)
occasionally flake, I think due to the timeout being reached,
despite passing after a restart.

> jobs3.go:658: tg 'create-files' task 'create-auth-file' event: Task received by client
> jobs3.go:658: tg 'create-files' task 'create-auth-file' event: Building Task Directory
> jobs3.go:658: tg 'create-files' task 'create-auth-file' event: Task started by client
> jobs3.go:658: tg 'create-files' task 'create-auth-file' event: Exit Code: 1
> jobs3.go:658: tg 'create-files' task 'create-auth-file' event: Task restarting in 16.212149445s
> jobs3.go:658: tg 'create-files' task 'create-auth-file' event: Task started by client
> jobs3.go:658: tg 'create-files' task 'create-auth-file' event: Exit Code: 0

setting the delay lower will (hopefully) keep within the job timeout.

I'm not sure why the `pledge` task apparently flakes like this;
I could find no useful info in the logs.
2025-09-08 15:21:08 -04:00
Tim Gross
db8ecac20d docs: include Consul namespace claim mapping in auth config example (#26730)
When configuring Nomad Enterprise with Consul Enterprise and multiple
namespaces, you need to include the `consul_namespace` mapping in the auth
method configuration. Otherwise you'll see an error like "unknown variable
accessed: value.consul_namespace". There's no example of the updated auth method
configuration you need, which makes this detail unclear when we're showing the
claim being used in the following `consul acl auth-method create` command.
2025-09-08 15:15:47 -04:00
dependabot[bot]
e8d5cfb77d chore(deps): bump github.com/hashicorp/go-plugin from 1.6.3 to 1.7.0 (#26716)
Bumps [github.com/hashicorp/go-plugin](https://github.com/hashicorp/go-plugin) from 1.6.3 to 1.7.0.
- [Release notes](https://github.com/hashicorp/go-plugin/releases)
- [Changelog](https://github.com/hashicorp/go-plugin/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/go-plugin/compare/v1.6.3...v1.7.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-plugin
  dependency-version: 1.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 18:44:51 +02:00
dependabot[bot]
7ccd017bc8 chore(deps): bump github.com/prometheus/common from 0.65.0 to 0.66.1 (#26717)
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.65.0 to 0.66.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/common/compare/v0.65.0...v0.66.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-version: 0.66.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 18:08:30 +02:00
dependabot[bot]
1498ec6c2e chore(deps): bump go.etcd.io/bbolt from 1.4.2 to 1.4.3 (#26720)
Bumps [go.etcd.io/bbolt](https://github.com/etcd-io/bbolt) from 1.4.2 to 1.4.3.
- [Release notes](https://github.com/etcd-io/bbolt/releases)
- [Commits](https://github.com/etcd-io/bbolt/compare/v1.4.2...v1.4.3)

---
updated-dependencies:
- dependency-name: go.etcd.io/bbolt
  dependency-version: 1.4.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 18:07:37 +02:00
Tim Gross
f86a141026 scheduler: don't sort reserved port ranges before adding to bitmap (#26712)
During a large-volume dispatch load test, I discovered that a lot of the total
scheduling time is being spent calling `structs.ParsePortRanges` repeatedly, in
order to parse the reserved ports configuration of the node (ex. converting
`"80,8000-8001"` to `[]int{80, 8000, 8001}`). A close examination of the
profiles shows that the bulk of the time is being spent hashing the keys for the
map of ports we use for de-duplication, and then sorting the resulting slice.

The `(*NetworkIndex) SetNode` method that calls the offending `ParsePortRanges`
merges all the ports into the `UsedPorts` map of bitmaps at scheduling
time, which means the consumer of the slice is already de-duplicating and
doesn't care about the order. The only other caller of `ParsePortRanges` is when
we validate the configuration file, and that throws away the slice entirely.

By skipping de-duplication and not sorting, we can cut down the runtime of this
function by 30x and memory usage by 3x.
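
As a rough illustration of the idea (not the actual Nomad code), the sketch below expands a port spec without de-duplicating or sorting and feeds the result into a bitmap. The names `parsePortRanges` and `portBitmap` are hypothetical stand-ins for `structs.ParsePortRanges` and the `UsedPorts` bitmaps, and the real function may validate input differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePortRanges is a hypothetical stand-in, not Nomad's
// structs.ParsePortRanges. It expands a spec such as "80,8000-8001" into
// individual ports without de-duplicating or sorting them.
func parsePortRanges(spec string) ([]uint64, error) {
	var ports []uint64
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.ParseUint(strings.TrimSpace(lo), 10, 16)
		if err != nil {
			return nil, fmt.Errorf("invalid port %q: %w", part, err)
		}
		end := start
		if isRange {
			end, err = strconv.ParseUint(strings.TrimSpace(hi), 10, 16)
			if err != nil {
				return nil, fmt.Errorf("invalid port range %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid port range %q: end before start", part)
		}
		for p := start; p <= end; p++ {
			ports = append(ports, p)
		}
	}
	return ports, nil
}

// portBitmap marks used ports. Setting a bit twice is harmless, so the
// consumer de-duplicates for free and never depends on input order.
type portBitmap [65536 / 64]uint64

func (b *portBitmap) set(p uint64)        { b[p/64] |= 1 << (p % 64) }
func (b *portBitmap) isSet(p uint64) bool { return b[p/64]&(1<<(p%64)) != 0 }

func main() {
	// The spec is deliberately unsorted and contains a duplicate (8000).
	ports, err := parsePortRanges("8000-8001,80,8000")
	if err != nil {
		panic(err)
	}
	var used portBitmap
	for _, p := range ports {
		used.set(p)
	}
	fmt.Println(ports, used.isSet(80), used.isSet(8001), used.isSet(81))
}
```

The parser skips the map and the sort entirely because the bitmap gives the same answer for any ordering and any number of repeats.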

Ref: https://github.com/hashicorp/nomad/blob/v1.10.4/nomad/structs/network.go#L201
Fixes: https://github.com/hashicorp/nomad/issues/26654
2025-09-08 12:05:21 -04:00
Daniel Bennett
1f7f51ceb4 e2e: update cni plugins (#26724)
> failed to configure network: plugin type="firewall" failed (add):
> incompatible CNI versions; config is "1.0.0", plugin supports ["0.4.0"]
2025-09-08 11:52:23 -04:00
dependabot[bot]
49d451a1a3 chore(deps): bump github.com/docker/docker (#26718)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.3.3+incompatible to 28.4.0+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v28.3.3...v28.4.0)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.4.0+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 17:18:18 +02:00
Deniz Onur Duzgun
8a96929870 bump: go and go-getter versions (#26713)
* bump: go and go-getter versions

* add changelog
2025-09-08 11:10:25 -04:00
dependabot[bot]
00fd92a1d4 chore(deps): bump github.com/hashicorp/cronexpr in /api (#26715)
Bumps [github.com/hashicorp/cronexpr](https://github.com/hashicorp/cronexpr) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/hashicorp/cronexpr/releases)
- [Commits](https://github.com/hashicorp/cronexpr/compare/v1.1.2...v1.1.3)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/cronexpr
  dependency-version: 1.1.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 17:06:16 +02:00
Michael Smithhisler
56b7a8da5c secrets: add changelog for secret block 2025-09-05 16:09:33 -04:00
Michael Smithhisler
10ed46cbd4 secrets: pass key/value config data to plugins as env (#26455)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-09-05 16:08:24 -04:00
Michael Smithhisler
e9e1631b8c test: add task validation when using vault secret provider (#26517) 2025-09-05 16:08:23 -04:00
Michael Smithhisler
1089b8893e secrets: refactor template providers to hold secrets in memory (#26506) 2025-09-05 16:08:23 -04:00
Michael Smithhisler
9950ef515c secrets: validate name and update client config (#26447) 2025-09-05 16:08:23 -04:00
Michael Smithhisler
68167254e8 e2e: add initial tests for secrets block (#26397) 2025-09-05 16:08:23 -04:00
Michael Smithhisler
00ef9cacab secrets: add common secrets plugins impl (#26335)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2025-09-05 16:08:23 -04:00
Michael Smithhisler
c7a6b8b253 adds implied secrets constraint to job hook (#26328) 2025-09-05 16:08:23 -04:00
Michael Smithhisler
ac32b0864d scheduler: adds implicit constraint for secrets plugin node attributes (#26303) 2025-09-05 16:08:23 -04:00
Michael Smithhisler
6dcd155bf8 add input validation and path traversal protections (#26241)
---------

Co-authored-by: Deniz Onur Duzgun <59659739+dduzgun-security@users.noreply.github.com>
2025-09-05 16:08:23 -04:00
Tim Gross
0e9eb5ae43 dispatch: write evaluation atomically with dispatch registration (#26710)
In #8435 (shipped in 0.12.1), we updated the `Job.Register` RPC to atomically
write the eval along with the job. But this didn't get copied to
`Job.Dispatch`. Under excessive load testing we demonstrated this can result in
dispatched jobs without corresponding evals.

Update the dispatch RPC to write the eval in the same Raft log as the job
registration. Note that we don't need to version-check this change for upgrades,
because the register and dispatch RPCs share the same `JobRegisterRequestType`
Raft message, and therefore all supported server versions already look for the
eval in the FSM. If an updated leader includes the eval, older followers will
write the eval. If a non-updated leader writes the eval in a separate Raft
entry, updated followers will write those evals normally.

Fixes: https://github.com/hashicorp/nomad/issues/26655
Ref: https://hashicorp.atlassian.net/browse/NMD-947
Ref: https://github.com/hashicorp/nomad/pull/8435
2025-09-05 14:53:08 -04:00
Piotr Kazmierczak
964cc8b8ca Merge pull request #26708 from hashicorp/f-system-deployments
scheduler: system deployments
2025-09-05 18:23:41 +02:00
Piotr Kazmierczak
3e4d2b731c scheduler: changelog entry for system deployments 2025-09-05 17:52:27 +02:00
Piotr Kazmierczak
8175f275c9 tooling: add 'feature' changelog msg type for make cl (#26709) 2025-09-05 16:42:42 +02:00
Tim Gross
ce614e6b7a scheduler: upgrade block testing for system deployments (#26579)
This changeset adds system scheduler tests of various permutations of the `update`
block. It also fixes a number of bugs discovered in the process.

* Don't create deployment for in-flight rollout. If a system job is in the
  middle of a rollout prior to upgrading to a version of Nomad with system
  deployments, we'll end up creating a system deployment which might never
  complete because previously placed allocs will not be tracked. Check to see if
  we have existing allocs that should belong to the new deployment and prevent a
  deployment from being created in that case.
* Ensure we call `Copy` on `Deployment` to avoid state store corruption.
* Don't limit canary counts by `max_parallel`.
* Never create deployments for `sysbatch` jobs.

Ref: https://hashicorp.atlassian.net/browse/NMD-761
2025-09-05 10:22:42 -04:00
Piotr Kazmierczak
a083495240 system scheduler: correction to Test_computeCanaryNodes (#26707) 2025-09-05 16:20:34 +02:00
Piotr Kazmierczak
276ab8a4c6 system scheduler: keep track of previously used canary nodes (#26697)
In the system scheduler, we need to keep track of which nodes were previously
used as "canary nodes" and not pick them at random, in case of previously failed
canaries or changes to the number of canaries in the jobspec.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-09-05 15:32:08 +02:00
James Rasell
1916a16311 exec: Set LOGNAME env var on exec based drivers. (#26703)
Typically the `LOGNAME` environment variable should be set according
to the values within `/etc/passwd` and represents the name of the
logged in user. This should be set, where possible, alongside the
USER and HOME variables for all drivers that use the shared
executor and do not use a sub-shell.
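
A minimal sketch of the intent, assuming the username and home directory have already been looked up: the helper `identityEnv` below is hypothetical and is not the shared executor's code. It only shows `LOGNAME` being set next to `USER` and `HOME` from the same user record.

```go
package main

import (
	"fmt"
	"os/user"
)

// identityEnv is a hypothetical helper, not Nomad's executor code. It returns
// the identity variables for a task that runs without a login shell, which
// would otherwise be responsible for setting them.
func identityEnv(username, homeDir string) []string {
	return []string{
		"USER=" + username,
		"LOGNAME=" + username,
		"HOME=" + homeDir,
	}
}

func main() {
	// Look up the current user as a stand-in for the task's configured user.
	u, err := user.Current()
	if err != nil {
		fmt.Println("user lookup failed:", err)
		return
	}
	for _, kv := range identityEnv(u.Username, u.HomeDir) {
		fmt.Println(kv)
	}
}
```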
2025-09-05 14:07:27 +01:00
Michael Schurter
c046e83d17 bump cronexpr from v1.1.2 -> v1.1.3 (#26700)
No functional changes. Bumping just to clear up some license
ambiguities.
2025-09-05 07:46:02 +01:00
Michael Smithhisler
85a2875183 task: adds ability to interpret values from secrets hook (#26261) 2025-09-04 15:58:03 -04:00
Michael Smithhisler
2d0ce43c47 secrets: add vault secrets provider (#26198) 2025-09-04 15:58:03 -04:00
Michael Smithhisler
20a855ea13 secrets: add secrets hook with nomad provider (#26143) 2025-09-04 15:58:03 -04:00
Michael Smithhisler
65c7f34f2d secrets: Add secrets block to job spec (#26076) 2025-09-04 15:58:03 -04:00
Daniel Bennett
9682aa2724 consul connect: allow "cni/*" network mode (#26449)
don't require "bridge" network mode when using connect{}

we document this as "at your own risk" because CNI configuration
is so flexible that we can't guarantee a user's network will work,
but Nomad's "bridge" CNI config may be used as a reference.
2025-09-04 12:29:50 -04:00
Juana De La Cuesta
2944a34b58 Reuse token if it exists on client reconnect (#26604)
Currently, every time a client starts, it creates a new Consul token per service or task. This PR changes that behaviour: it persists the Consul ACL token to the client state and looks up an existing token before creating a new one.
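
The lookup-before-create pattern can be sketched as follows. This is an illustration only: `tokenStore`, `tokenFor`, and the `mint` callback are hypothetical names, and the map stands in for the persisted client state rather than reproducing it.

```go
package main

import "fmt"

// tokenStore is a hypothetical stand-in for persisted client state, keyed by
// service or task name.
type tokenStore map[string]string

// tokenFor returns the stored token for key when one exists and only calls
// create (a hypothetical token-minting callback) when the lookup misses.
func tokenFor(store tokenStore, key string, create func() (string, error)) (string, error) {
	if tok, ok := store[key]; ok {
		return tok, nil
	}
	tok, err := create()
	if err != nil {
		return "", err
	}
	store[key] = tok // persist so the next client start reuses it
	return tok, nil
}

func main() {
	store := tokenStore{}
	created := 0
	mint := func() (string, error) {
		created++
		return fmt.Sprintf("token-%d", created), nil
	}
	first, _ := tokenFor(store, "web", mint)
	second, _ := tokenFor(store, "web", mint) // simulates a client restart
	fmt.Println(first, second, created)
}
```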

Fixes: #20184
Fixes: #20185
2025-09-04 15:27:57 +02:00
Daniel Bennett
3ad22ddad5 e2e: ui: fix token form fill (#26692)
look, I know I misspelled "locater" in the code comment, but it's easier to acknowledge that here in this commit message than it is to push a new commit with all the test/approval machinery in github.
2025-09-03 12:11:35 -04:00
dependabot[bot]
d0db16386f chore(deps): bump github.com/stretchr/testify from 1.10.0 to 1.11.1 (#26669) 2025-09-03 15:22:58 +01:00
Piotr Kazmierczak
14e98a2420 scheduler: fix promotions of system job canaries (#26652)
This changeset adjusts the handling of allocation placement when we're
promoting a deployment, and it corrects the behavior of isDeploymentComplete,
which previously would never mark a promoted deployment as complete.
2025-09-03 16:09:36 +02:00
James Rasell
269e05ba33 test: Migrate volumewatcher to must and fix racy test. (#26686)
The TestVolumeWatch_LeadershipTransition test was a little racy
and the fix required adding an eventually wrapper to the end of
the test. While doing this work, it seemed fitting to move the package
to the must library as well.
2025-09-03 14:21:10 +01:00
James Rasell
270ab1011e lint: Enable and fix SA9004 constant type lint errors. (#26678)
When creating constants with a custom type, each definition should
include the type definition. If only the first constant defines
this, it will have a different type to the other constants.

This change fixes occurrences of this and enables SA9004 within CI
linting to catch future problems while the change is in review.
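
For illustration, here is a small sketch of the pattern SA9004 reports and its fix. The `Color` type and its constants are hypothetical and do not come from the Nomad codebase.

```go
package main

import "fmt"

// Color is a hypothetical custom type used only for illustration.
type Color string

// Pattern that SA9004 flags: only the first constant carries the type, so
// the second is an untyped string constant.
const (
	Red  Color = "red"
	Blue       = "blue"
)

// Fixed: each constant in the group names the type explicitly.
const (
	Green  Color = "green"
	Yellow Color = "yellow"
)

// typeName reports the dynamic type a constant takes when stored in an
// interface value.
func typeName(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	fmt.Println(typeName(Red), typeName(Blue), typeName(Green), typeName(Yellow))
}
```

Because `Blue` is untyped, it defaults to plain `string` wherever a `Color` is not explicitly required, which is the kind of silent mismatch the lint is meant to catch.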
2025-09-03 07:45:29 +01:00
Chris Roberts
b856e065f2 Merge pull request #26440 from hashicorp/f-winsvc-service
Add Windows service commands and Event Log support
2025-09-02 17:10:19 -07:00
Chris Roberts
c3dcdb5413 [cli] Add windows service commands (#26442)
Adds a new `windows` command which is available when running on
a Windows host. The command includes two new subcommands:

* `service install`
* `service uninstall`

The `service install` command will install the called binary into
the Windows program files directory, create a new Windows service,
set up configuration and data directories, and register the service
with the Windows eventlog. If the service and/or binary already
exist, the service will be stopped, service and eventlog updated
if needed, binary replaced, and the service started again.

The `service uninstall` command will stop the service, remove the
Windows service, and deregister the service with the eventlog. It
will not remove the configuration/data directory nor will it remove
the installed binary.
2025-09-02 16:40:35 -07:00