26799 Commits

Author SHA1 Message Date
Juanadelacuesta
fba2efa728 func: add a step to drain a node as part of the upgrade process 2025-03-14 17:43:36 +01:00
Juana De La Cuesta
e3f21166af Merge pull request #25393 from hashicorp/NET-12228-consul
Set the default policy to deny for consul ACLs on e2e cluster
2025-03-14 16:55:34 +01:00
Juanadelacuesta
3af2da7362 fix: add default policy to consul acl configurations for the e2e cluster 2025-03-14 16:46:03 +01:00
James Rasell
3e1f56c1c0 cli: Add volume type to delete error messages when API call fails. (#25392) 2025-03-14 14:59:41 +00:00
Phil Renaud
239ac3e4bd [ui] Case-insensitive jobs list filtering (#25378) 2025-03-13 16:39:19 -04:00
Tim Gross
433f8c9a8b dynamic host volumes: don't wait for fingerprint to reserve node (#25386)
If multiple dynamic host volumes are created in quick succession, it's possible
for the server to attempt placement on a host where another volume has been
placed but not yet fingerprinted as ready. Once a `VolumeCreate` RPC returns a
response, we've already invoked the plugin successfully and written to state, so
we're just waiting on the fingerprint for scheduling purposes. Change the
placement selection so that we skip a node if it has a volume, regardless of
whether that volume is ready yet.
2025-03-13 15:27:01 -04:00
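
The placement change described in 433f8c9a8b can be illustrated with a minimal sketch. The `Node` and `Volume` types and the `pickNode` helper below are hypothetical simplifications, not Nomad's actual scheduler code; the only point shown is that a node holding a volume is skipped whether or not that volume has been fingerprinted as ready.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a client node.
type Node struct {
	ID string
}

// Volume is a hypothetical record of a dynamic host volume in server state.
type Volume struct {
	NodeID string
	Ready  bool // false until the client has fingerprinted the volume
}

// pickNode returns the first candidate node that holds no volume at all.
// It deliberately ignores Volume.Ready, so a volume that was just created
// but not yet fingerprinted still reserves its node.
func pickNode(nodes []Node, vols []Volume) (Node, bool) {
	taken := make(map[string]bool, len(vols))
	for _, v := range vols {
		taken[v.NodeID] = true // ready or not, the node is reserved
	}
	for _, n := range nodes {
		if !taken[n.ID] {
			return n, true
		}
	}
	return Node{}, false
}

func main() {
	nodes := []Node{{ID: "node-a"}, {ID: "node-b"}}
	vols := []Volume{{NodeID: "node-a", Ready: false}}

	n, ok := pickNode(nodes, vols)
	fmt.Println(n.ID, ok) // prints "node-b true"
}
```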
Tim Gross
8cf34bde62 upgrade testing: allow configurable artifactory repo (#25350)
Prerelease builds are in a different Artifactory repository than release
builds. Make this a variable option so we can test prerelease builds in the
nightly/weekly runs.
2025-03-13 10:32:02 -04:00
Juana De La Cuesta
ad7dc7a4eb Merge pull request #25348 from hashicorp/NET-11546-enos-linux
Add instructions to add new workloads to the tests.
2025-03-13 10:38:47 +01:00
dependabot[bot]
dab7e49a3f chore(deps): bump golang.org/x/net from 0.34.0 to 0.36.0 (#25377)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.34.0 to 0.36.0.
- [Commits](https://github.com/golang/net/compare/v0.34.0...v0.36.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-13 10:06:54 +01:00
Daniel Bennett
3322254e5b cli: acl auth-method info: add client assertion (#25370)
and pkce
2025-03-12 12:38:03 -05:00
Daniel Bennett
6a06653032 auth: decrease size of oidc request cache (#25371)
if the auth-url api is getting DOS'd,
then we do not expect it to still function;
we only protect the rest of the system.

users will need to use a break-glass ACL
token if they need Nomad UI/API access
during such a denial of service.
2025-03-12 12:37:47 -05:00
Piotr Kazmierczak
5c2ae00170 docs: increasing the non-interactive desktop heap size (#25357) 2025-03-12 17:19:49 +01:00
Juanadelacuesta
ebeb3047c8 docs: add note about workloads life expectancy 2025-03-12 16:51:03 +01:00
Juana De La Cuesta
667e02730e Merge pull request #25358 from hashicorp/release/1.10.0-beta.1
Release/1.10.0 beta.1
2025-03-12 16:30:41 +01:00
Tim Gross
92013e274c docs: update 1.10-beta changelog with major features (#25367) 2025-03-12 10:58:46 -04:00
Habibi Mustafa
0b1a660b81 docs: fix missing api version on path (#25355) 2025-03-12 09:35:52 -05:00
Habibi Mustafa
715186f7c3 docs: fix missing api version on acl path (#25356)
* docs: fix missing api version on acl auth method path

* docs: fix missing api version on acl binding rules path

* docs: fix missing api version on acl policies path

* docs: fix missing api version on acl roles path

* docs: fix missing api version on acl tokens path
2025-03-12 09:28:21 -05:00
hc-github-team-nomad-core
18a8190a1f Prepare for next release 2025-03-12 10:37:52 +00:00
hc-github-team-nomad-core
e1b9bd8ab0 Generate files for 1.10.0-beta.1 release 2025-03-12 10:37:46 +00:00
Juanadelacuesta
5957472d31 Prepare release 1.10.0-beta.1 2025-03-12 11:32:00 +01:00
Daniel Bennett
d98d414c7f oidc: more tests for client assertions (#25352)
Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
2025-03-11 15:56:26 -05:00
Juana De La Cuesta
3de2a6b1d6 Update README.md 2025-03-11 17:51:56 +01:00
Juana De La Cuesta
b1ea04a4d1 Update README.md 2025-03-11 17:50:26 +01:00
Juana De La Cuesta
859f257d32 Update README.md 2025-03-11 17:48:45 +01:00
Juanadelacuesta
08f386e8e5 docs: Add section of readme to add workloads 2025-03-11 17:48:14 +01:00
Daniel Bennett
04db81951f test: fix go 1.24 test complaints (#25346)
e.g. Error: nomad/leader_test.go:382:12: non-constant format string in call to (*testing.common).Fatalf
2025-03-11 11:01:39 -05:00
Tim Gross
1ffb7ab3fb dynamic host volumes: allow plugins to return an error message (#25341)
Errors from `volume create` or `volume delete` only get logged by the client
agent, which may make it harder for volume authors to debug these tasks if they
are not also the cluster administrator with access to host logs.

Allow plugins to include an optional error message in their response. Because we
can't count on receiving this response (the error could come before the plugin
executes), we parse this message optimistically and include it only if
available.

Ref: https://hashicorp.atlassian.net/browse/NET-12087
2025-03-11 11:06:57 -04:00
James Rasell
33905d3cdc Merge pull request #25342 from hashicorp/post-1.9.7-release
Post 1.9.7 release
2025-03-11 15:40:42 +01:00
James Rasell
721066528d Merge release 1.9.7 files 2025-03-11 14:12:07 +00:00
hc-github-team-nomad-core
a411cccce4 Prepare for next release 2025-03-11 14:09:02 +00:00
hc-github-team-nomad-core
1da56b8e07 Generate files for 1.9.7 release 2025-03-11 14:09:02 +00:00
Daniel Bennett
38f063a341 auth: oidc request lru cache (#25336)
use hashicorp/golang-lru instead of my hand-rolled cache
2025-03-11 08:46:23 -05:00
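
For readers unfamiliar with the data structure named in 38f063a341, here is a minimal standard-library sketch of a least-recently-used cache. It is neither the `hashicorp/golang-lru` API nor the hand-rolled cache the commit replaces; the `LRU` type and its methods are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the payload stored in each list element.
type entry struct {
	key   string
	value string
}

// LRU is a hypothetical fixed-capacity cache that evicts the least
// recently used key once the capacity is exceeded.
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Add inserts or updates a key and marks it as most recently used.
func (c *LRU) Add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns the value for key and marks it as most recently used.
func (c *LRU) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

func main() {
	cache := NewLRU(2)
	cache.Add("req-1", "a")
	cache.Add("req-2", "b")
	cache.Get("req-1")      // req-1 is now the most recently used
	cache.Add("req-3", "c") // evicts req-2

	_, ok := cache.Get("req-2")
	fmt.Println(ok) // prints "false"
}
```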
Tim Gross
61bbff9c24 upgrade testing: Variables, Workload Identity, and Task API (#25229)
Add an upgrade test workload that continuously writes to a Nomad
Variable. In order to run this workload, we'll need to deploy a
Workload-Associated ACL policy. So this extends the `run_workloads` module to
allow for a "pre script" to be run before a given job is deployed. We can use
that as a model for other test workloads.

Ref: https://hashicorp.atlassian.net/browse/NET-12217
2025-03-11 08:48:40 -04:00
Phil Renaud
1976202cd6 Feature: Dynamic Host Volumes in the UI (#25224)
* DHV UI init

* /csi routes to /storage routes and a routeRedirector util (#25163)

* /csi routes to /storage routes and a routeRedirector util

* Tests and routes move csi/ to storage/

* Changelog added

* [ui] Storage UI overhaul + Dynamic Host Volumes UI (#25226)

* Storage index page and DHV model properties

* Naive version of a storage overview page

* Experimental fetch of alloc data dirs

* Fetch ephemeral disks and static host volumes as an ember concurrency task and nice table stylings

* Playing nice with section header labels to make eslint happy even though wcag was already cool with it

* inlined the storage type explainers and reordered things, plus tooltips and keynav

* Bones of a dynamic host volume individual page

* Woooo dynamic host volume model, adapter, and serializer with embedded alloc relationships

* Couple test fixes

* async:false relationship for dhv.hasMany('alloc') to prevent a ton of xhr requests

* DHV request type at index routemodel and better serialization

* Pagination and searching and query params oh my

* Test retrofits for csi volumes

* Really fantastic flake gets fixed

* DHV detail page acceptance test and a bunch of mirage hooks

* Seed so that the actions test has a guaranteed task

* removed ephemeral disk and static host volume manual scanning

* CapacityBytes and capabilities table added to DHV detail page

* Debugging actions flyout test

* was becoming clear that faker.seed editing was causing havoc elsewhere so might as well not boil the ocean and just tell this test to do what I want it to

* Post-create job gets taskCount instead of count

* CSI volumes now get /csi route prefix at detail level

* lazyclick method for unused keynav removed

* keyboard nav and table-watcher for DHV added

* Addressed PR comments, changed up capabilities table and id references, etc.

* Capabilities table for DHV and ID in details header

* Testfixes for pluginID and capabilities table on DHV page
2025-03-10 14:46:02 -04:00
Daniel Bennett
8e56805fea oidc: support PKCE and client assertion / private key JWT (#25231)
PKCE is enabled by default for new/updated auth methods.
 * ref: https://oauth.net/2/pkce/

Client assertions are an optional, more secure replacement for client secrets
 * ref: https://oauth.net/private-key-jwt/

a change to the existing flow, even without these new options,
is that the oidc.Req is retained on the Nomad server (leader)
in between auth-url and complete-auth calls.

and some fields in auth method config are now more strictly required.
2025-03-10 13:32:53 -05:00
Daniel Bennett
dc482bf905 auth: redact auth method client secret (#25328)
OIDC client secrets that users provide in auth method configuration are,
well, secret, so we should hide them from API calls and event streams.
2025-03-10 11:12:02 -05:00
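
The redaction described in dc482bf905 can be sketched as returning a sanitized copy before a value leaves the server. The `AuthMethodConfig` struct, its field names, the `Sanitize` method, and the `"redacted"` placeholder are all assumptions for illustration and may not match Nomad's actual types.

```go
package main

import "fmt"

// redacted is a hypothetical placeholder shown instead of the secret.
const redacted = "redacted"

// AuthMethodConfig is a hypothetical, trimmed-down auth method config.
type AuthMethodConfig struct {
	OIDCClientID     string
	OIDCClientSecret string
}

// Sanitize returns a copy that is safe to expose through API responses
// and event streams. The receiver is passed by value, so the stored
// config keeps the real secret.
func (c AuthMethodConfig) Sanitize() AuthMethodConfig {
	if c.OIDCClientSecret != "" {
		c.OIDCClientSecret = redacted
	}
	return c
}

func main() {
	stored := AuthMethodConfig{OIDCClientID: "nomad", OIDCClientSecret: "s3cr3t"}

	fmt.Println(stored.Sanitize().OIDCClientSecret) // prints "redacted"
	fmt.Println(stored.OIDCClientSecret)            // prints "s3cr3t"
}
```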
Tim Gross
4a1b050eb8 docs: extend code layout in contributing guides (#25330)
Expand this document with some rough signposts to the major packages that developers will need to know about when getting started.
2025-03-10 11:55:38 -04:00
Deniz Onur Duzgun
182c46a746 update: changelog 24683.txt (#25329) 2025-03-10 11:40:06 -04:00
James Rasell
c53ba3e7d1 consul: Remove implicit workload identity when task has a template. (#25298)
When a task included a template block, Nomad was adding a Consul
identity by default which allowed the template to use Consul API
template functions even when they were not needed or desired.

This change removes the implicit addition of Consul identities to
tasks when they include a template block. Job specification
authors will now need to add a Consul identity or Consul block to
their task if they have a template which uses Consul API functions.

This change also removes the default addition of a Consul block to
all task groups registered and processed by the API package.
2025-03-10 13:49:50 +00:00
James Rasell
4bbce4c7a3 deps: Consolidated update of dependabot PRs (#25324)
* chore(deps): bump github.com/container-storage-interface/spec
* chore(deps): bump github.com/hashicorp/go-kms-wrapping/wrappers/azurekeyvault/v2
* chore(deps): bump github.com/moby/sys/mount from 0.3.3 to 0.3.4
* chore(deps): bump golang.org/x/crypto from 0.35.0 to 0.36.0
* chore(deps): bump github.com/aws/aws-sdk-go-v2/config
2025-03-10 09:10:12 +00:00
James Rasell
f94016816d cli: Add node_prefix read policy to Consul setup task policy. (#25310)
When Nomad registers a service within Consul it is regarded as a
node service. In order for Nomad workloads to read these services,
it must have an ACL policy which includes node_prefix read. If it
does not, the service is filtered out from the result.

This change adds the required permission to the Consul setup
command.
2025-03-10 08:06:09 +00:00
Robert Main
57cd92274c Merge pull request #25192 from hashicorp/dependabot/npm_and_yarn/website/prettier-3.5.2
chore(deps-dev): bump prettier from 3.5.1 to 3.5.2 in /website
2025-03-07 16:14:30 -05:00
Tim Gross
5cc1b4e606 upgrade tests: add transparent proxy workload (#25176)
Add an upgrade test workload for Consul service mesh with transparent
proxy. Note this breaks from the "countdash" demo. The dashboard application
can only verify the backend is up by making a websocket connection, which we
can't do as a health check, and the health check it exposes for that purpose
only passes once the websocket connection has been made. So replace the
dashboard with a minimal nginx reverse proxy to the count-api instead.

Ref: https://hashicorp.atlassian.net/browse/NET-12217
2025-03-07 15:25:26 -05:00
Tim Gross
c3e2d4a652 E2E: remove outdated legacy token workflow tests (#25315)
In https://github.com/hashicorp/nomad/pull/25217 we removed the legacy Consul token workflow, and in https://github.com/hashicorp/nomad/pull/25174 we removed the related E2E tests. But we missed the tests in the `e2e/connect` package.

After removing these tests, Consul-related E2E tests in this repo pass.
2025-03-07 15:09:36 -05:00
Phil Renaud
35e1ea4328 [cli] UI URL hints for common CLI commands (#24454)
* Basic implementation for server members and node status

* Commands for alloc status and job status

* -ui flag for most commands

* url hints for variables

* url hints for job dispatch, evals, and deployments

* agent config ui.cli_url_links to disable

* Fix an issue where path prefix was presumed for variables

* driver uncomment and general cleanup

* -ui flag on the generic status endpoint

* Job run command gets namespaces, and no longer gets ui hints for --output flag

* Dispatch command hints get a namespace, and bunch o tests

* Lots of tests depend on specific output, so let's not mess with them

* figured out what flagAddress is all about for testServer, oof

* Parallel outside of test instances

* Browser-opening test, sorta

* Env var for disabling/enabling CLI hints

* Addressing a few PR comments

* CLI docs available flags now all have -ui

* PR comments addressed; switched the env var to be consistent and scrunched monitor-adjacent hints a bit more

* ui.Output -> ui.Warn; moves hints from stdout to stderr

* isTerminal check and parseBool on command option

* terminal.IsTerminal check removed for test-runner-not-being-terminal reasons
2025-03-07 13:23:35 -05:00
Tim Gross
f3d53e3e2b CSI: restart task on failing initial probe, instead of killing it (#25307)
When a CSI plugin is launched, we probe it until the csi_plugin.health_timeout
expires (by default 30s). But if the plugin never becomes healthy, we're not
restarting the task as documented.

Update the plugin supervisor to trigger a restart instead. We still exit the
supervisor loop at that point to avoid having the supervisor send probes to a
task that isn't running yet. This requires reworking the poststart hook to allow
the supervisor loop to be restarted when the task restarts.

In doing so, I identified that we weren't respecting the task kill context from
the post start hook, which would leave the supervisor running in the window
between when a task is killed because it failed and its stop hooks were
triggered. Combine the two contexts to make sure we stop the supervisor
whichever context gets closed first.

Fixes: https://github.com/hashicorp/nomad/issues/25293
Ref: https://hashicorp.atlassian.net/browse/NET-12264
2025-03-07 10:04:59 -05:00
James Rasell
768ba78e2d deps: Consolidated update of dependabot PRs (#25311)
* chore(deps): bump github.com/hashicorp/go-kms-wrapping/v2
* chore(deps): bump github.com/hashicorp/go-connlimit from 0.3.0 to 0.3.1
* chore(deps): bump github.com/aws/aws-sdk-go-v2/config
* chore(deps): bump github.com/hashicorp/cap from 0.7.0 to 0.9.0
* chore(deps): bump go.uber.org/goleak from 1.2.1 to 1.3.0
2025-03-07 14:38:40 +00:00
James Rasell
c0eccda4f7 template: Set any Consul token generated by workload identity. (#25309) 2025-03-07 14:32:02 +00:00
Tim Gross
f528022e3a upgrade testing: add missing dependency during client upgrades (#25306)
The check to read back node metadata depends on a resource that waits for the
Nomad API, but that resource doesn't wait for the metadata to be written in the
first place (and the client subsequently upgraded). Add this dependency so that
we're reading back the node metadata as the last step.

Ref: https://github.com/hashicorp/nomad-e2e/actions/runs/13690355150/job/38282457406
2025-03-07 09:06:04 -05:00
James Rasell
7b156e928a github: Update Vault and Consul versions used in core workflow. (#25287) 2025-03-07 07:20:24 +00:00