26943 Commits

Author SHA1 Message Date
James Rasell
85c30dfd1e test: Remove use of "mitchellh/go-testing-interface" for stdlib. (#25640)
The stdlib testing package now includes this interface, so we can
remove our dependency on the external library.
2025-04-14 07:43:49 +01:00
Aimee Ukasick
d293684d3d Update rel notes, upgrade links to point to correct previous ver (#25652) 2025-04-11 10:22:23 -05:00
Tim Gross
5c89b07f11 CI: run copywrite on PRs, not just after merges (#25658)
* CI: run copywrite on PRs, not just after merges
* fix a missing copyright header
2025-04-10 17:01:34 -04:00
Carlos Galdino
048c5bcba9 Use core ID when selecting cores (#25340)
* Use core ID when selecting cores

If the available cores are not a continuous set, the core selector might
panic when trying to select cores.

For example, consider a scenario where the available cores for the selector are the following:

    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]

This list contains 46 cores, because cores with IDs 0 and 24 are not
included in the list.

Before this patch, if we requested 46 cores, the selector would panic
trying to access the item with index 46 in `cs.topology.Cores`.

This patch changes the selector to use the core ID instead when looking
for a core inside `cs.topology.Cores`. This prevents an out of bounds
access that was causing the panic.

Note: The change in this patch is straightforward. Perhaps a better
long-term solution would be to restructure the `numalib.Topology.Cores`
field to be a `map[ID]Core`, but that is a much larger change that is
more difficult to land. Also, the number of cores in our case is
small (at most 192), so a search won't have any noticeable impact.

* Add changelog entry

* Build list of IDs inline
2025-04-10 13:04:15 -07:00
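
The difference between indexing by position and searching by core ID can be sketched as follows. This is a minimal illustration, not the code from the patch: the `Core` type, `findCore`, and `availableCores` are hypothetical stand-ins, and only the ID list (1 through 47 without 24) comes from the commit message above.

```go
package main

import "fmt"

// Core is a simplified, hypothetical stand-in for a topology core entry.
type Core struct {
	ID uint16
}

// findCore scans the slice for a matching ID instead of using the ID
// (or a requested count) as a slice index.
func findCore(cores []Core, id uint16) (Core, bool) {
	for _, c := range cores {
		if c.ID == id {
			return c, true
		}
	}
	return Core{}, false
}

// availableCores builds the non-contiguous set from the commit message:
// IDs 1 through 47, with 24 missing (and 0 never present).
func availableCores() []Core {
	cores := make([]Core, 0, 46)
	for id := uint16(1); id <= 47; id++ {
		if id == 24 {
			continue
		}
		cores = append(cores, Core{ID: id})
	}
	return cores
}

func main() {
	cores := availableCores()
	fmt.Println("cores:", len(cores))

	// cores[46] would be out of range here, because valid indexes are 0..45.
	// Looking the core up by ID avoids relying on index == ID.
	c, ok := findCore(cores, 47)
	fmt.Println(c.ID, ok)
}
```

A linear scan like this is cheap at the sizes mentioned in the commit message; a map keyed by ID would be the alternative the note describes.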
Michael Schurter
4a147db906 Merge pull request #25654 from hashicorp/cl-1.9.8-1.8.12
Add changelogs for 1.9.8+ent and 1.8.12+ent
2025-04-10 11:54:30 -07:00
Michael Schurter
6a09e7f9cd created changelog for 1.8.12+ent 2025-04-10 11:43:53 -07:00
Michael Schurter
a39743c96d created changelog for 1.9.8+ent 2025-04-10 11:43:34 -07:00
Michael Schurter
c5451cf300 Merge pull request #25635 from hashicorp/post-1.10.0-release
Post 1.10.0 release
2025-04-10 10:32:24 -07:00
Tim Gross
48f304d0ca java: only set nobody user on Unix (#25648)
In #25496 we introduced the ability to have `task.user` set on Windows, so
long as the user ID fits a particular shape. But this uncovered a 7-year-old bug
in the `java` driver introduced in #5143, where we set the `task.user` to the
non-existent Unix user `nobody`, even if we're running on Windows.

Prior to the change in #25496 we always ignored the `task.user`, so this was not
a problem. We don't set the `task.user` in the `raw_exec` driver, and the
otherwise very similar `exec` driver is Linux-only, so we never see the problem
there.

Fix the bug in the `java` driver by gating the change to the `task.user` on not
being Windows. Also add a check to the new code path that the user is non-empty
before parsing it, so that any third party drivers that might be borrowing the
executor code don't hit the same problem on Windows.

Ref: https://github.com/hashicorp/nomad/pull/5143
Ref: https://github.com/hashicorp/nomad/pull/25496
Fixes: https://github.com/hashicorp/nomad/issues/25638
2025-04-10 10:34:34 -04:00
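
The gating described in the commit above might look roughly like the sketch below. `resolveUser` is a hypothetical helper, not the driver's actual code, and the assumption that an explicitly configured user is kept as-is is mine. The operating system is passed in as a parameter so both branches can be exercised on any platform.

```go
package main

import (
	"fmt"
	"runtime"
)

// resolveUser is a hypothetical helper showing the shape of the fix:
// only fall back to the Unix user "nobody" when not running on Windows.
func resolveUser(goos, configured string) string {
	if configured != "" {
		// Assumption: an explicitly configured user is left untouched.
		return configured
	}
	if goos == "windows" {
		// There is no "nobody" user on Windows, so leave the user unset.
		return ""
	}
	return "nobody"
}

func main() {
	fmt.Printf("default user on %s: %q\n", runtime.GOOS, resolveUser(runtime.GOOS, ""))
}
```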
Ranjandas
8b33584fbf Add note to root keyring remove command (#25637)
* Add note to root keyring remove command

This PR updates the documentation for the root keyring remove command to note that the full key ID must be provided for the command to function correctly.

* Move keyID explanation to usage section

---------

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-04-10 08:58:48 -05:00
James Rasell
4c4cb2c6ad agent: Fix misaligned contextual k/v logging arguments. (#25629)
Log lines passed to hclog should always have an even number of
arguments to provide the expected k/v output.
2025-04-10 14:40:21 +01:00
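
To illustrate why the argument count matters, here is a small standard-library sketch of key/value pairing. `formatKV` is a hypothetical helper written for this example. It does not use or reproduce hclog, and hclog's own handling of an odd argument count is not modelled here.

```go
package main

import (
	"fmt"
	"strings"
)

// formatKV is a hypothetical helper that renders alternating key/value
// arguments as "key=value" pairs. An odd number of arguments leaves a key
// or value without a partner, so it is rejected.
func formatKV(args ...any) (string, error) {
	if len(args)%2 != 0 {
		return "", fmt.Errorf("odd number of arguments: %d", len(args))
	}
	var b strings.Builder
	for i := 0; i < len(args); i += 2 {
		if i > 0 {
			b.WriteByte(' ')
		}
		fmt.Fprintf(&b, "%v=%v", args[i], args[i+1])
	}
	return b.String(), nil
}

func main() {
	out, err := formatKV("node_id", "abc", "attempt", 3)
	fmt.Println(out, err)

	// A dangling value with no key is the kind of misalignment the commit fixes.
	_, err = formatKV("node_id", "abc", 3)
	fmt.Println(err)
}
```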
Tim Gross
27caae2b2a api: make attempting to remove peer by address a no-op (#25599)
In Nomad 1.4.0 we removed support for Raft Protocol v2 entirely. But the
`Operator.RemoveRaftPeerByAddress` RPC handler was left in place, along with its
supporting HTTP API and command line flags. Using this API will always result in
the Raft library error "operation not supported with current protocol version".

Unfortunately it's still possible in unit tests to exercise this code path, and
these tests are quite flaky. This changeset turns the RPC handler and HTTP API
into a no-op, removes the associated command line flags, and removes the flaky
tests. I've also cleaned up the test for `RemoveRaftPeerByID` to consolidate
test servers and use `shoenig/test`.

Fixes: https://hashicorp.atlassian.net/browse/NET-12413
Ref: https://github.com/hashicorp/nomad/pull/13467
Ref: https://developer.hashicorp.com/nomad/docs/upgrade/upgrade-specific#raft-protocol-version-2-unsupported
Ref: https://github.com/hashicorp/nomad-enterprise/actions/runs/13201513025/job/36855234398?pr=2302
2025-04-10 09:19:25 -04:00
Michael Schurter
0d9b108498 cleanup 1.10.0 entry and move 1.7 out 2025-04-09 16:09:01 -07:00
hc-github-team-nomad-core
7db0bdf2de Prepare for next release 2025-04-09 16:03:21 -07:00
hc-github-team-nomad-core
71af41b4b1 Generate files for 1.10.0 release 2025-04-09 16:03:21 -07:00
hc-github-team-nomad-core
9f33796156 Prepare for next release 2025-04-09 16:03:21 -07:00
hc-github-team-nomad-core
239c5f11ee Generate files for 1.10.0 release 2025-04-09 16:03:21 -07:00
Aimee Ukasick
87aabc9af2 Docs: 1.10 release notes, some factoring, sentinel apply update (#25433)
* Docs: 1.10 release notes and upgrade factoring

* Update based on code review suggestions

* add CLI for disabling UI URL hints

* fix indentation

* nav: list release notes in reverse order

fix broken link to v1.6.x docs

* Update PKCE section from Daniel's latest PR

* update pkce per daniel's suggestion

* Add dynamic host volumes governance section from blog
2025-04-09 15:43:58 -07:00
Michael Schurter
95c914624e build: Update Go to v1.24.2 (#25623)
* build: Update Go to v1.24.2

Fixes Go CVE https://pkg.go.dev/vuln/GO-2025-3563
2025-04-08 16:07:47 -07:00
James Rasell
311a83d706 e2e: Ensure UI is enabled. (#25620)
The `ui.enabled` parameter is a non-pointer bool which means the
merge function is unable to differentiate between false and not
set. When e2e introduced the `ui.show_cli_hints` configuration
parameter, the way we merge meant the UI became disabled.
2025-04-08 13:57:29 +01:00
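
The merge problem described in the commit above can be sketched with two hypothetical config types. This assumes a merge that copies the overriding value unconditionally for a plain bool. The type and function names are illustrative and are not Nomad's actual config structs.

```go
package main

import "fmt"

// plainConfig uses a non-pointer bool: "not set" and "false" look the same.
type plainConfig struct {
	Enabled bool
}

// pointerConfig uses a pointer: nil means "not set".
type pointerConfig struct {
	Enabled *bool
}

// mergePlain cannot tell whether other.Enabled was left out or set to false,
// so an override that never mentions Enabled still turns it off.
func mergePlain(base, other plainConfig) plainConfig {
	result := base
	result.Enabled = other.Enabled
	return result
}

// mergePointer only overrides when the other config set the field.
func mergePointer(base, other pointerConfig) pointerConfig {
	result := base
	if other.Enabled != nil {
		result.Enabled = other.Enabled
	}
	return result
}

func main() {
	enabled := true

	// The override sets some unrelated field and leaves Enabled untouched.
	fmt.Println(mergePlain(plainConfig{Enabled: true}, plainConfig{}).Enabled)

	merged := mergePointer(pointerConfig{Enabled: &enabled}, pointerConfig{})
	fmt.Println(*merged.Enabled)
}
```

With a non-pointer field, the practical workaround is to set the value explicitly wherever an override block is introduced.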
dependabot[bot]
b8143368d8 chore(deps): bump github.com/hashicorp/raft from 1.7.2 to 1.7.3 (#25602) 2025-04-07 22:30:21 +00:00
dependabot[bot]
d6b429aa6b chore(deps): bump github.com/miekg/dns from 1.1.62 to 1.1.64 (#25603) 2025-04-07 22:29:13 +00:00
dependabot[bot]
f72fea22a7 chore(deps): bump github.com/docker/docker (#25604) 2025-04-07 22:27:56 +00:00
dependabot[bot]
60cc55615e chore(deps): bump google.golang.org/grpc from 1.71.0 to 1.71.1 (#25605) 2025-04-07 22:26:39 +00:00
dependabot[bot]
09ebb390e2 chore(deps): bump github.com/golang/snappy from 0.0.4 to 1.0.0 (#25606) 2025-04-07 22:23:13 +00:00
Daniel Bennett
5c8e436de9 auth: oidc: disable pkce by default (#25600)
our goal of "enable by default, only for new auth methods"
proved to be unwieldy, so instead make it a simple bool,
disabled by default.
2025-04-07 12:36:09 -05:00
James Rasell
6c39285538 e2e: Ensure test resources are cleaned. (#25611)
I couldn't find any reason the exec2 HTTP jobs were not being run
with a generated cleanup function, so I added this.

The deletion of the DHV ACL policy does not seem like it would
have any negative impact.
2025-04-07 14:15:29 +01:00
James Rasell
0316309276 ci: Run the build workflow on pushes to long-lived branches only. (#25597) 2025-04-07 07:16:24 +01:00
Tim Gross
95520ac819 Post-release 1.10.0-rc.1 (#25596) 2025-04-03 16:49:58 -04:00
Tim Gross
c653f52b8d release: update backport versions for 1.10.0 (#25595)
With the release of Nomad 1.10.0-rc.1, we'll start backporting to the 1.10.x
release series. Add this to the supported versions and remove 1.7.x.
2025-04-03 15:29:35 -04:00
hc-github-team-nomad-core
0f29b0c51b Prepare for next release 2025-04-03 18:22:07 +00:00
hc-github-team-nomad-core
a18faebda1 Generate files for 1.10.0-rc.1 release 2025-04-03 18:21:58 +00:00
Tim Gross
fffef3c6b1 Prepare release 1.10.0-rc.1 2025-04-03 14:16:30 -04:00
Daniel Bennett
6383d5f54d auth: oidc client assertion tweaks (#25565)
* allow for newline flexibility in client assertion key/cert

* if client assertion, don't send the client secret,
but do keep the client secret in both places in state
(on the parent Config, and within the OIDCClientAssertion)
mainly so that it shows up as "redacted" instead of empty
when inspecting the auth method config via API.
2025-04-03 11:53:37 -05:00
Daniel Bennett
6a0c4f5a3d auth: oidc: enable pkce only on new auth methods (#25593)
trying not to violate the principle of least astonishment.

we want to only auto-enable PKCE on *new* auth methods,
rather than *new or updated* auth methods, to avoid a
scenario where a Nomad admin updates an auth method
sometime in the future -- something innocent like a new
client secret -- and their OIDC provider doesn't like PKCE.

the main concern is that the provider won't like PKCE
in a totally confusing way. error messages rarely
say PKCE directly, so why the user's auth method
suddenly broke would be a big mystery.

this means that to enable it on existing auth methods,
you would set `OIDCDisablePKCE = false`, and the double-
negative doesn't feel right, so instead, swap the language,
so enabling it on *existing* methods reads sensibly, and to
disable it on *new* methods reads ok-enough:
`OIDCEnablePKCE = false`
2025-04-03 10:56:17 -05:00
Denis Rodin
aca0ff438a raw_exec windows: add support for setting the task user (#25496) 2025-04-03 11:21:13 -04:00
Tim Gross
e4d2fc93cd upgrade testing: temporarily disable CSI workload (#25589)
The CSI workload we're using for upgrade testing seems to be flaky when coming
up. The plugin jobs don't launch in a timely fashion despite several
attempts. In order to not block running the rest of the upgrade testing, let's
disable this workload temporarily. We'll fix this in NET-12430.

Ref: https://hashicorp.atlassian.net/browse/NET-12430
2025-04-03 08:53:20 -04:00
tehut
27b1d470a8 modify rawexec TaskConfig and Config to accept envvar denylist (#25511)
* modify rawexec TaskConfig and Config to accept envvar denylist
* update rawexec driver docs to include deniedEnvars options

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2025-04-02 12:25:28 -07:00
Tim Gross
78cc7ec1eb dynamic host volumes: enforce that namespace exists (#25590)
Testing found that if you create or register a dynamic host volume in a
non-existing namespace, the volume gets created on the client but then we can't
write it to state. Add a check for this in the initial validation.
2025-04-02 15:18:55 -04:00
Nikita Eliseev
76fb3eb9a1 rpc: added configuration for yamux session (#25466)
Fixes: https://github.com/hashicorp/nomad/issues/25380
2025-04-02 10:58:23 -04:00
Tim Gross
1a1ccec8b2 CNI: add warning log for CNI check command failures (#25581)
In #24658 we fixed a bug around client restarts where we would not assert
network namespaces existed and were properly configured when restoring
allocations. We introduced a call to the CNI `Check` method so that the plugins
could report correct config. But when we get an error from this call, we don't
log it unless the error is fatal. This makes it challenging to debug the case
where the initial check fails but we tear down the network and try again (as
described in #25510). Add a noisy log line here.

Ref: https://github.com/hashicorp/nomad/pull/24658
Ref: https://github.com/hashicorp/nomad/issues/25510
2025-04-02 10:43:05 -04:00
Phil Renaud
afa9e65afa Update playwright to 1.51.0 for e2e ui tests (#25585) 2025-04-02 15:12:00 +01:00
Michael Smithhisler
c8cc519f54 e2e: disable cli hints for command parsing (#25584) 2025-04-02 09:12:36 -04:00
Michael Smithhisler
95c9029df0 e2e: update consul task policy and add empty consul block to task groups (#25580) 2025-04-01 16:29:47 -04:00
Deniz Onur Duzgun
80da9cb211 bump: go-discovery to latest commit SHA (#25566)
* bump: go-discovery to latest commit SHA

* go mod tidy
2025-04-01 11:12:06 -04:00
James Rasell
1a60464ca5 volumes: Version gate create/delete host volume RPCs. (#25571)
All Nomad servers should be running v1.10.0 before the DHV feature
can be used. Without this, it is possible for a write to succeed
and cause immediate loss of leadership and a subsequent failure to
re-establish it.
2025-04-01 15:53:37 +01:00
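
A version gate of the kind described above can be sketched as a check that every known server meets a minimum version before a write is accepted. All names here are hypothetical, and the version parsing is deliberately simplified: it drops any pre-release or metadata suffix, so `1.10.0-rc.1` is treated the same as `1.10.0`. This is not the implementation in the commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "major.minor.patch", ignoring any "-" or "+" suffix.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("expected major.minor.patch, got %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version component %q: %w", p, err)
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether v is greater than or equal to minimum.
func atLeast(v, minimum [3]int) bool {
	for i := range v {
		if v[i] != minimum[i] {
			return v[i] > minimum[i]
		}
	}
	return true
}

// allServersAtLeast returns true only if every server version meets the minimum.
func allServersAtLeast(versions []string, minimum string) (bool, error) {
	minV, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for _, s := range versions {
		v, err := parseVersion(s)
		if err != nil {
			return false, err
		}
		if !atLeast(v, minV) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := allServersAtLeast([]string{"1.10.0", "1.9.8+ent"}, "1.10.0")
	fmt.Println(ok, err)
}
```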
Aimee Ukasick
9778fa4912 Docs: Fix broken links in main for 1.10 release (#25540)
* Docs: Fix broken links in main for 1.10 release

* Implement Tim's suggestions

* Remove link to Portworx from ecosystem page

* remove "Portworx" since Portworx 3.2 no longer supports Nomad
2025-04-01 09:09:44 -05:00
James Rasell
3ffe6e5f53 test: Move client server manager tests to use must library. (#25569) 2025-04-01 14:23:08 +01:00
Tim Gross
cdd40cf81b docs: document requirements for Consul tokens in admin partitions (#25529)
When using Nomad with Consul, each Nomad agent is expected to have a Consul
agent running alongside. When using Nomad Enterprise and Consul Enterprise
together, the Consul agent may be in a Consul admin partition. In order for
Nomad's "anti-entropy" sync to work with Consul, the Consul ACL token and ACL
policy for the Nomad client must be in the same admin partition as the Consul
agent. Otherwise, we can register services (via WI) but then won't be able to
deregister them unless they're in the default namespace.

Ref: https://hashicorp.atlassian.net/browse/NET-12361
2025-04-01 08:45:05 -04:00
Michael Smithhisler
7176cf443a docs: add missing podman task config options (#25465)
---------

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-04-01 08:31:58 -04:00