When Nomad registers a service within Consul it is regarded as a
node service. In order for Nomad workloads to read these services,
they must have an ACL policy which includes node_prefix read. If
they do not, the service is filtered out of the results.
This change adds the required permission to the Consul setup
command.
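A rough sketch of the shape of that permission, using the Consul API client. This is not the literal policy the setup command writes; the policy name and exact rules below are illustrative:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

// Illustrative rules only: service registration plus the node_prefix read
// rule this change adds. Without node_prefix read, node services registered
// by Nomad are filtered out of service query results.
const workloadRules = `
service_prefix "" { policy = "write" }
node_prefix    "" { policy = "read" }
`

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	_, _, err = client.ACL().PolicyCreate(&api.ACLPolicy{
		Name:        "nomad-workloads", // illustrative name
		Description: "Policy used by Nomad workload identities",
		Rules:       workloadRules,
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
}
```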
Add an upgrade test workload for Consul service mesh with transparent
proxy. Note this breaks from the "countdash" demo. The dashboard application
can only verify the backend is up by making a websocket connection, which we
can't do as a health check, and the health check it exposes for that purpose
only passes once the websocket connection has been made. So we replace the
dashboard with a minimal nginx reverse proxy to the count-api instead.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
* Basic implementation for server members and node status
* Commands for alloc status and job status
* -ui flag for most commands
* url hints for variables
* url hints for job dispatch, evals, and deployments
* agent config ui.cli_url_links to disable
* Fix an issue where path prefix was presumed for variables
* driver uncomment and general cleanup
* -ui flag on the generic status endpoint
* Job run command gets namespaces, and no longer gets ui hints for --output flag
* Dispatch command hints get a namespace, and a bunch of tests
* Lots of tests depend on specific output, so let's not mess with them
* figured out what flagAddress is all about for testServer, oof
* Parallel outside of test instances
* Browser-opening test, sorta
* Env var for disabling/enabling CLI hints
* Addressing a few PR comments
* CLI docs: available flags lists now all include -ui
* PR comments addressed; switched the env var to be consistent and scrunched monitor-adjacent hints a bit more
* ui.Output -> ui.Warn; moves hints from stdout to stderr (see the sketch after this list)
* isTerminal check and parseBool on command option
* terminal.IsTerminal check removed for test-runner-not-being-terminal reasons
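A minimal sketch of the hint behavior described above, assuming a hypothetical helper and env var name rather than the real CLI code:

```go
package command // illustrative placement

import (
	"fmt"
	"io"
	"os"
	"strconv"
)

// maybePrintUIHint is a hypothetical helper: hints are written to stderr
// (ui.Warn-style output) so stdout stays clean for scripts, and they can be
// suppressed either by the agent's ui.cli_url_links config or by an env var
// on the CLI side.
func maybePrintUIHint(stderr io.Writer, serverHintsEnabled bool, url string) {
	if !serverHintsEnabled {
		return
	}
	if v := os.Getenv("NOMAD_CLI_SHOW_HINTS"); v != "" { // hypothetical env var name
		if show, err := strconv.ParseBool(v); err == nil && !show {
			return
		}
	}
	fmt.Fprintf(stderr, "\n==> View this in the Web UI: %s\n", url)
}
```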
When a CSI plugin is launched, we probe it until the csi_plugin.health_timeout
expires (by default 30s). But if the plugin never becomes healthy, we're not
restarting the task as documented.
Update the plugin supervisor to trigger a restart instead. We still exit the
supervisor loop at that point to avoid having the supervisor send probes to a
task that isn't running yet. This requires reworking the poststart hook to allow
the supervisor loop to be restarted when the task restarts.
In doing so, I identified that we weren't respecting the task kill context from
the poststart hook, which would leave the supervisor running in the window
between when a task was killed because it failed and when its stop hooks were
triggered. Combine the two contexts to make sure we stop the supervisor
whichever context gets closed first.
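A minimal sketch of the combined-context idea, assuming Go 1.21+ and an illustrative helper name rather than the actual hook code:

```go
package csi // illustrative placement

import "context"

// combineContexts is a hypothetical helper showing the idea: derive a context
// that closes when either the task kill context or the hook's shutdown context
// is done, so the supervisor stops whichever closes first.
func combineContexts(killCtx, shutdownCtx context.Context) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(killCtx)
	stop := context.AfterFunc(shutdownCtx, cancel) // requires Go 1.21+
	return ctx, func() {
		stop()
		cancel()
	}
}
```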
Fixes: https://github.com/hashicorp/nomad/issues/25293
Ref: https://hashicorp.atlassian.net/browse/NET-12264
The check to read back node metadata depends on a resource that waits for the
Nomad API, but that resource doesn't wait for the metadata to be written in the
first place (and the client subsequently upgraded). Add this dependency so that
we're reading back the node metadata as the last step.
Ref: https://github.com/hashicorp/nomad-e2e/actions/runs/13690355150/job/38282457406
When upgrading from older versions of Nomad, the reschedule policy block may be
nil. There is logic to handle this safely in the `NextRescheduleTimeByTime`
method used for allocs on disconnected clients, but it's missing from the
`NextRescheduleTime` method used by more typical allocations. Return an empty
time object in this case.
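A sketch of the nil-safe behavior, using a trimmed-down stand-in for the real struct:

```go
package structs // illustrative placement

import "time"

// ReschedulePolicy is a trimmed-down stand-in for the real struct.
type ReschedulePolicy struct {
	Delay time.Duration
}

// nextRescheduleTime sketches the nil-safe behavior: jobs upgraded from older
// Nomad versions may carry a nil reschedule policy, so return the zero time
// value instead of dereferencing the nil pointer.
func nextRescheduleTime(failTime time.Time, p *ReschedulePolicy) (time.Time, bool) {
	if p == nil {
		return time.Time{}, false
	}
	return failTime.Add(p.Delay), true
}
```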
Fixes: https://github.com/hashicorp/nomad/issues/24846
The group level fields stop_after_client_disconnect,
max_client_disconnect, and prevent_reschedule_on_lost were deprecated in
Nomad 1.8 and replaced by fields in the disconnect block. This change
removes any logic related to those deprecated fields.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Getting the CSI test to work with AWS EFS or EBS has proven to be awkward
because we're having to deal with external APIs with their own consistency
guarantees, as well as challenges around teardown. Make the CSI test entirely
self-contained by using a userland NFS server and the rocketduck CSI plugin.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
Ref: https://gitlab.com/rocketduck/csi-plugin-nfs
The CSI workload is failing and creating complications for teardown, so I'm
reworking it. But this work is taking a while to finish, so while that's in
progress let's disable the CSI workload so that we're running the upgrade tests
all the way through to the end. I expect to be able to revert this in the next
couple days.
During initial development of upgrade testing, we had a hard-coded prefix to
distinguish between clusters created for this purpose vs. those created by GHA
runners. Update the prefix to be a variable so that developers can add their own
prefix during test workload development.
* fix: fix the docker image parser to account for private repos
* style: change the local regex for docker image identifiers and use the docker package instead (see the sketch after this list)
* func: return early when no repo is found in the image name
* func: return an error if no path is found in the image
* Update drivers/docker/utils.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update coordinator.go
* Update driver.go
* Update network.go
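Sketch of the approach; the import path and helper name are assumptions, not the driver's actual code:

```go
package docker // illustrative placement

import (
	"fmt"

	"github.com/distribution/reference" // import path is an assumption
)

// parseImage sketches leaning on the reference package instead of a local
// regex, so private registry images (e.g. registry.example.com/team/app:v1)
// split cleanly into repository and tag.
func parseImage(image string) (repo, tag string, err error) {
	named, err := reference.ParseNormalizedNamed(image)
	if err != nil {
		return "", "", fmt.Errorf("failed to parse image %q: %w", image, err)
	}
	named = reference.TagNameOnly(named) // default to :latest when no tag is given
	repo = reference.FamiliarName(named)
	if tagged, ok := named.(reference.Tagged); ok {
		tag = tagged.Tag()
	}
	return repo, tag, nil
}
```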
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: add dependencies to avoid race conditions and move the update of each client into the main upgrade scenario
* Update enos/enos-scenario-upgrade.hcl
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update enos/enos-scenario-upgrade.hcl
Co-authored-by: Tim Gross <tgross@hashicorp.com>
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Before the fixes in #20165, the wait feature was disabled by
default. After these changes, it's always enabled, which - at
least on some platforms - leads to a significant increase in
load (5-7x).
This patch allows disabling the wait feature in the client
stanza of the configuration file by setting min and max to 0:
wait {
  min = "0"
  max = "0"
}
Per-template wait blocks in the task description still work like
one would expect.
* docs: Add a note to the alloc stop docs to make sure system allocs are not rescheduled
* Update stop.mdx
* Update website/content/docs/commands/alloc/stop.mdx
Co-authored-by: Tim Gross <tgross@hashicorp.com>
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The paginator was developed before generics were available, so we've had to work
around a lack of compile-time safety by creating configuration objects at
runtime that require a lot of branching and type casts. This results in a lot of
added boilerplate in the RPC handlers.
Refactor the paginator to take advantage of generics.
* Move all decision making around tokenization to compile-time by providing
pre-built generic functions that close over target tokens.
* Remove the `appendFunc` parameter in favor of a `Stub` function parameter that
will accept existing `Stub` functions in most cases (with the addition of an
extra `error` return value).
* Generally remove boilerplate in the RPC handlers as a result, except where a
given handler wants more complex filtering.
This doesn't reduce the boilerplate we need at the top of many blocking queries
to define the iterator we want based on arguments, which we're typically doing
to decide upon which memdb index we want. That's a query optimization problem
and way beyond the scope of this PR.
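For illustration, a rough sketch of the generic shape this refactor moves toward (not Nomad's exact API):

```go
package paginator // illustrative placement

// Paginator sketches the generics-based shape: the tokenizer and stub
// conversion functions are supplied as typed parameters, so no runtime type
// casts are needed in the RPC handlers.
type Paginator[T, S any] struct {
	pageSize int
	tokenFn  func(T) string     // builds the pagination token for an item
	stubFn   func(T) (S, error) // converts a raw object into its API stub
}

// Page returns up to pageSize stubs starting at fromToken, plus the token of
// the next page ("" when there are no more items).
func (p *Paginator[T, S]) Page(items []T, fromToken string) ([]S, string, error) {
	out := make([]S, 0, p.pageSize)
	for _, item := range items {
		tok := p.tokenFn(item)
		if tok < fromToken {
			continue // not yet at the requested page
		}
		if len(out) == p.pageSize {
			return out, tok, nil // this token starts the next page
		}
		stub, err := p.stubFn(item)
		if err != nil {
			return nil, "", err
		}
		out = append(out, stub)
	}
	return out, "", nil
}
```

In most handlers the stub function can simply wrap an existing `Stub` method and add the extra `error` return value.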
When a node is fingerprinted, we calculate a "computed class" from a hash over a
subset of its fields and attributes. In the scheduler, when a given node fails
feasibility checking (before fit checking) we know that no other node of that
same class will be feasible, and we add the hash to a map so we can reject them
early. This hash cannot include any values that are unique to a given node,
otherwise no other node will have the same hash and we'll never save ourselves
the work of feasibility checking those nodes.
In #4390 we introduced the `nomad.advertise.address` attribute and in #19969 we
introduced the `consul.dns.addr` attribute. Both of these are unique per node and
break the hash.
Additionally, when checking whether a node escaped its computed class, we were
not filtering out attributes that start with `unique.`. The test for this,
introduced in #708, had an inverted assertion, which
allowed this to pass unnoticed since the early days of Nomad.
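Illustrative sketch of the intended filtering; the attribute names come from this message, but the helper itself is hypothetical:

```go
package scheduler // illustrative placement

import "strings"

// classAttributes sketches the filtering this fix is about: when computing the
// node class hash, skip anything in the "unique." namespace plus known
// per-node attributes such as nomad.advertise.address and consul.dns.addr, so
// that two nodes of the same class produce the same hash.
func classAttributes(attrs map[string]string) map[string]string {
	perNode := map[string]bool{
		"nomad.advertise.address": true,
		"consul.dns.addr":         true,
	}
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if strings.HasPrefix(k, "unique.") || perNode[k] {
			continue
		}
		out[k] = v
	}
	return out
}
```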
Ref: https://github.com/hashicorp/nomad/pull/708
Ref: https://github.com/hashicorp/nomad/pull/4390
Ref: https://github.com/hashicorp/nomad/pull/19969
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.
The deprecated Vault config options used by the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.
Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Add an upgrade test workload for CSI with the AWS EFS plugin. In order to
validate this workload, we'll need to deploy the plugin job and then register a
volume with it. So this extends the `run_workloads` module to allow for "pre
scripts" and "post scripts" to be run before and after a given job has been
deployed. We can use that as a model for other test workloads.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
Enos buries the Terraform output from provisioning. Add a shell script to load
the environment from provisioning for debugging Nomad during development of
upgrade tests.
* func: Add more workloads
* Update jobs.sh
* Update versions.sh
* style: format
* Update enos/modules/test_cluster_health/scripts/allocs.sh
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* docs: improve outputs descriptions
* func: change docker workloads to be redis boxes and add healthchecks
* func: register the services on consul
* style: format
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
In #25185 we changed the output of `volume status` to include both DHV and CSI
volumes by default. When the E2E test parses the output, it's not expecting the
new section header.
Ref: https://github.com/hashicorp/nomad/pull/25185
* Changes the behaviour of system/batch/sysbatch jobs not to look for a latest stable version, as their versions never go to stable
* Don't show job stability on versions page for system/sysbatch/batch jobs
* Tests that depend on jobs to revert specify that they are Service jobs
* Batch jobs added to detail-restart test loop
* Right, they're not stable, they're just versions
Dependabot can update actions to versions that are not in the TSCCR
allowlist. The TSCCR check doesn't happen in CE, which means we don't learn we
have a problem until after we've spent the effort to backport them. Remove the
automation that updates actions until this issue is resolved on
the security team's side.