On Windows, if the `raw_exec` driver's executor exits, its child processes are
not killed along with it. Create a Windows "job object" (not to be confused with a Nomad
job) and add the executor to it. Child processes of the executor will inherit
the job automatically. When the handle to the job object is freed (on executor
exit), the job itself is destroyed and this causes all processes in that job to
exit.
Fixes: https://github.com/hashicorp/nomad/issues/23668
Ref: https://learn.microsoft.com/en-us/windows/win32/procthread/job-objects
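Below is a minimal sketch of the mechanism using `golang.org/x/sys/windows`; the function name and error handling are simplified and this is not Nomad's exact implementation:

```go
//go:build windows

package executor

import (
	"unsafe"

	"golang.org/x/sys/windows"
)

// attachToJobObject creates an anonymous job object configured to kill all of
// its processes when the last handle to it closes, then assigns the current
// process (the executor) to it. Children inherit the job automatically, so
// when the executor dies and the handle is released, the whole tree exits.
func attachToJobObject() (windows.Handle, error) {
	job, err := windows.CreateJobObject(nil, nil)
	if err != nil {
		return 0, err
	}

	info := windows.JOBOBJECT_EXTENDED_LIMIT_INFORMATION{
		BasicLimitInformation: windows.JOBOBJECT_BASIC_LIMIT_INFORMATION{
			LimitFlags: windows.JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE,
		},
	}
	if _, err := windows.SetInformationJobObject(
		job,
		windows.JobObjectExtendedLimitInformation,
		uintptr(unsafe.Pointer(&info)),
		uint32(unsafe.Sizeof(info)),
	); err != nil {
		windows.CloseHandle(job)
		return 0, err
	}

	// CurrentProcess returns a pseudo-handle to this process.
	if err := windows.AssignProcessToJobObject(job, windows.CurrentProcess()); err != nil {
		windows.CloseHandle(job)
		return 0, err
	}
	return job, nil
}
```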
Installing Vault and Consul from releases.hashicorp.com via `hc-install` has
been failing intermittently. Update the `hc-install` binaries to the current
version and add one retry to downloads in our compatibility tests so that
builds are more reliably green while the underlying issue is being debugged.
* detect IPv6 on the "bridge" network and set
  `service.connect.sidecar_proxy.config.bind_address` for Envoy to "::"
  instead of "0.0.0.0"
* allow users to set `bind_address` in the jobspec (see the sketch below);
  e.g. "" defers to Consul proxy-defaults
* caveat: tproxy still does not work, because the CNI plugin does not
  configure ip6tables
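As an illustration, a hypothetical jobspec fragment showing where the user-facing knob lives (service and port names are invented, and the exact block layout may differ):

```hcl
service {
  name = "web"
  port = "http"

  connect {
    sidecar_service {
      proxy {
        config {
          # "::" binds Envoy on the IPv6 wildcard address; set to "" to
          # defer to Consul proxy-defaults
          bind_address = "::"
        }
      }
    }
  }
}
```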
In #20619 we overhauled how we were gathering stats for Windows
processes. Unlike in Linux where we can ask for processes in a cgroup, on
Windows we have to make a single expensive syscall to get all the processes and
then build the tree ourselves. Our algorithm for doing so is recursive and
quadratic in both time and space in the number of processes on the host. For
busy hosts this hits the stack limit and panics the Nomad client.
We already build a map of parent PID to PID, so modify this to be a map of
parent PID to a slice of children, and then traverse that tree only from the
root we care about (the executor PID). This moves the allocations to the heap
but makes the stats gathering linear in the time and space required.
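A simplified sketch of the reworked traversal (hypothetical types, not the actual executor code):

```go
package procstats

// proc is a minimal stand-in for a process entry from the Windows snapshot.
type proc struct {
	pid, ppid int
}

// descendants returns every process under root (the executor PID). Building
// a parent -> children index and walking it iteratively visits each process
// at most once, so the work and memory are linear in the number of
// processes, and nothing recurses on the call stack.
func descendants(procs []proc, root int) []int {
	children := make(map[int][]int, len(procs))
	for _, p := range procs {
		children[p.ppid] = append(children[p.ppid], p.pid)
	}

	var out []int
	stack := []int{root}
	for len(stack) > 0 {
		pid := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, child := range children[pid] {
			out = append(out, child)
			stack = append(stack, child)
		}
	}
	return out
}
```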
This changeset also moves as much of this code as possible into an area not
conditionally compiled by OS, as the OS-tagged test file was not being run in CI.
Fixes: https://github.com/hashicorp/nomad/issues/23984
* Add language from the CLI help to `job revert` for version|tag
* Add CLI `job tag` subcommand page
* Add API create/delete tag; examples use the same names between the CLI and
  API docs
* Update CLI revert and tag; API jobs
* Add job version content
* Note in CLI/API docs that tag names are unique per job; address Phil's
  feedback. Add a partial explaining why to tag, and include it in the
  CLI/API pages
* Add `diff_version` to the API docs for listing job versions
* Apply suggestions from code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* Remove tutorial links since they are not published yet
---------
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
When the local Consul agent receives a deregister request, it performs a
pre-flight check using the locally cached ACL token. The agent then sends the
request upstream to the Consul servers as part of anti-entropy, using its own
token. This requires that the token we use for deregistration is valid even
though that's not the token used to write to the Consul server.
There are several cases where the service identity token might no longer exist
at the time of deregistration:
* A race condition between the sync and destroying the allocation.
* Misconfiguration of the Consul auth method with a TTL.
* Out-of-band destruction of the token.
Additionally, Nomad's sync with Consul returns early if there are any errors,
which means that a single broken token can prevent any other service on the
Nomad agent from being registered or deregistered.
Update Nomad's sync with Consul to use the Nomad agent's own Consul token for
deregistration, regardless of which token the service was registered
with. Accumulate errors from the sync so that they no longer block
deregistration of other services.
Fixes: https://github.com/hashicorp/nomad/issues/20159
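A sketch of the accumulate-errors shape described above, using `go-multierror` (identifiers are illustrative, not Nomad's actual sync code):

```go
package consulsync

import (
	consulapi "github.com/hashicorp/consul/api"
	multierror "github.com/hashicorp/go-multierror"
)

// deregisterServices removes each service via the local agent's client (and
// therefore the Nomad agent's own token), collecting errors instead of
// returning on the first one so that a single broken token cannot block
// deregistration of the remaining services.
func deregisterServices(agent *consulapi.Agent, ids []string) error {
	var mErr *multierror.Error
	for _, id := range ids {
		if err := agent.ServiceDeregister(id); err != nil {
			mErr = multierror.Append(mErr, err)
		}
	}
	return mErr.ErrorOrNil()
}
```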
* jobspec: add a chown option to artifact block
This PR adds a boolean `chown` field to the artifact block. It indicates
whether the Nomad client should chown the downloaded files and directories so
that they are owned by the task's `user` (see the sketch after these notes).
This is useful for drivers like raw_exec and exec2, which are subject to the
host filesystem's user permission structure. Previously, these drivers might
not be able to use or manage the downloaded artifacts, since on a typical
Nomad client configuration they would be owned by the root user.
* api: no need for pointer of chown field
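A hypothetical jobspec fragment showing the new field in context (the URL and task user are made up):

```hcl
task "app" {
  driver = "raw_exec"
  user   = "svc-app"

  artifact {
    source = "https://example.com/releases/app.tar.gz"
    # chown the downloaded files to the task's user instead of leaving them
    # owned by the Nomad client's user (typically root)
    chown = true
  }
}
```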
As of #24166, Nomad agents will use their own token to deregister services and
checks from Consul. This returns the deregistration path to the pre-Workload
Identity workflow. Expand the documentation to make clear why certain ACL
policies are required for clients.
Additionally, we did not explicitly call out that auth methods should not set an
expiration on Consul tokens. Nomad does not have a facility to refresh these
tokens if they expire. Even if Nomad could, there's no way to re-inject them
into Envoy sidecars for Consul Service Mesh without recreating the task anyway,
which is what happens today. Warn users that they should not set an expiration.
Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix)
Ref: https://hashicorp.atlassian.net/browse/NET-10262
While testing with agents built with the race-detection option enabled, I
encountered a data race while draining a node.
When we upsert a node we copy the `NodeResources` struct and then perform a
fixup for backwards compatibility of the topology struct. This fixup was being
executed on the original struct and not the copy, which means we're uselessly
fixing up the wrong struct and we're corrupting the state store in the
process (albeit harmlessly, I suspect).
Fix the data race by calling the method on the correct pointer.
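A toy model of the bug pattern, with hypothetical types (the real structs are Nomad's `NodeResources` and its topology):

```go
package state

// nodeResources stands in for the struct copied during node upsert.
type nodeResources struct {
	topology string
}

// fixupTopology models the backwards-compatibility fixup.
func (n *nodeResources) fixupTopology() { n.topology = "fixed-up" }

// upsertNode copies the incoming struct before storing it. The fixup must
// run on the copy: running it on orig mutates memory that other goroutines
// (e.g. a concurrent node drain) may be reading, which is the data race.
func upsertNode(orig *nodeResources) *nodeResources {
	cp := *orig
	// orig.fixupTopology() // bug: writes to the shared original
	cp.fixupTopology() // fix: write only to the copy we own
	return &cp
}
```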
When using the Client FS APIs, we check to ensure that reads don't traverse into
the allocation's secret dir and private dir. But this check can be bypassed on
case-insensitive file systems (ex. Windows, macOS, and Linux with obscure ext4
options enabled). This allows a user with `read-fs` permissions but not
`alloc-exec` permissions to read from the secrets dir.
This changeset updates the check so that it's case-insensitive. This risks
falsely detecting an escape (see the linked Go issue), but only if a task
without filesystem isolation deliberately writes into the task working
directory to trigger it, which is a fail-safe failure mode.
Ref: https://github.com/golang/go/issues/18358
Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
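A sketch of a case-insensitive containment check, assuming inputs are already absolute paths; this is illustrative, not Nomad's exact helper:

```go
package escapes

import (
	"path/filepath"
	"strings"
)

// reachesInto reports whether path falls under dir, folding case so that
// "SECRETS" matches "secrets" on case-insensitive filesystems. Lowercasing
// is a simplification; full Unicode case folding (strings.EqualFold on each
// path segment) is stricter.
func reachesInto(dir, path string) bool {
	dir = strings.ToLower(filepath.Clean(dir)) + string(filepath.Separator)
	path = strings.ToLower(filepath.Clean(path)) + string(filepath.Separator)
	return strings.HasPrefix(path, dir)
}
```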
In #23977 we merged a change to how the keyring was stored. Because keyring
initialization takes slightly longer now, this uncovered existing timing bugs
in tests that require the keyring (ex. the plan applier tests) but were
waiting only for the leader, not for keyring initialization. Fix another
example we've seen causing test flakes.
* Modify variable access permissions for UI users with write in only certain namespaces
* Addressing some PR comments
* Variables index namespaces on * and ability checks are now namespaced
* Mistook Delete for Destroy, and update unit tests for multi-return allPaths
We initialize this slice with a nonzero length, giving zero-valued (empty
string) elements that we append after and then have to clean out later.
Initialize with zero length and the correct capacity up front so there are no
empty values.
Ref: https://github.com/hashicorp/nomad/pull/24104
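The gotcha in miniature (hypothetical names):

```go
package main

import "fmt"

func main() {
	items := []string{"a", "b"}

	// bug: make with a length allocates len(items) zero values (empty
	// strings), and appends land after them
	bad := make([]string, len(items))
	for _, it := range items {
		bad = append(bad, it)
	}
	fmt.Println(bad) // [  a b] — two empty strings first

	// fix: zero length with the right capacity, so appends fill from the
	// start and there are no empty values to clean out
	good := make([]string, 0, len(items))
	for _, it := range items {
		good = append(good, it)
	}
	fmt.Println(good) // [a b]
}
```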
In #24007 we merged new HCL files but they were missing copywrite headers
because the scan didn't run on this PR for some reason. I've already backported
this to the Enterprise branches.
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need
to use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call prevents this from
running successfully, failing with an error that the exec has already run.
* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
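A hedged sketch with the Docker Go SDK (the older `types` option structs are shown; exact option types vary across SDK versions):

```go
package docker

import (
	"context"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
	"github.com/docker/docker/pkg/stdcopy"
)

// runScriptCheck creates an exec and attaches to it. ContainerExecAttach
// both starts the exec and returns the hijacked stream, so calling
// ContainerExecStart first makes the later attach fail because the exec
// has already run.
func runScriptCheck(ctx context.Context, cli *client.Client, containerID string, cmd []string) error {
	exec, err := cli.ContainerExecCreate(ctx, containerID, types.ExecConfig{
		Cmd:          cmd,
		AttachStdout: true,
		AttachStderr: true,
	})
	if err != nil {
		return err
	}

	resp, err := cli.ContainerExecAttach(ctx, exec.ID, types.ExecStartCheck{})
	if err != nil {
		return err
	}
	defer resp.Close()

	// demultiplex the combined stream into stdout/stderr
	_, err = stdcopy.StdCopy(os.Stdout, os.Stderr, resp.Reader)
	return err
}
```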
When jobs are submitted with a scaling policy, the scaling policy's target
only includes the job's namespace if the `namespace` field is set in the
jobspec; the namespace from the API request is ignored. Normally jobs are
canonicalized in the RPC handler before being written to Raft, but the scaling
policy targets are instead written during the conversion from `api.Job` to
`structs.Job`. We do populate the `structs.Job` namespace from the request as
well, but only after the conversion has occurred. Swap the order of these
operations so that the conversion always happens with the correct namespace.
Long-term we should not be making mutations during conversion either. But we
can't remove it immediately because API requests may come from any agent across
upgrades. Move the scaling target creation into the `Canonicalize` method and
mark it for future removal in the API conversion code path.
Fixes: https://github.com/hashicorp/nomad/issues/24039
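A toy model of the ordering bug (hypothetical types; the real code converts `api.Job` to `structs.Job`):

```go
package main

import "fmt"

type apiJob struct{ Namespace string }

type structsJob struct {
	Namespace     string
	ScalingTarget string
}

// convert models the api->structs conversion, which snapshots the namespace
// into the scaling policy target as a side effect.
func convert(j *apiJob) *structsJob {
	return &structsJob{
		Namespace:     j.Namespace,
		ScalingTarget: "Namespace:" + j.Namespace,
	}
}

func main() {
	req := &apiJob{} // jobspec omitted `namespace`; the request says "prod"

	// bug: convert first, patch the namespace after; the target keeps ""
	j := convert(req)
	j.Namespace = "prod"
	fmt.Println(j.ScalingTarget) // "Namespace:"

	// fix: set the namespace from the request before converting
	req.Namespace = "prod"
	fmt.Println(convert(req).ScalingTarget) // "Namespace:prod"
}
```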