nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Phil Renaud	ec53cccbc8	Adds an ellipsis and max width to profile nav token name (#24240)	2024-10-17 23:50:24 -04:00
Phil Renaud	3aaf6d8791	Upgrades Percy and Percy CLI (#24170)	2024-10-17 23:49:48 -04:00
Michael Schurter	cbbe6bb389	docs: explain schedule state values (#24160) * docs: explain schedule state values GET /v1/client/allocation/:alloc_id/pause?task=:task_name is a tiny but critical API for observability of tasks with a schedule. This PR explains each of the values which might be returned. * correct docstring * add missing state and expand PUT docs --------- Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2024-10-17 11:42:12 -07:00
Tim Gross	55fe05d353	heartbeat: use leader's ACL token when failing heartbeat (#24241) In #23838 we updated the `Node.Update` RPC handler we use for heartbeats to be more strict about requiring node secrets. But when a node goes down, it's the leader that sends the request to mark the node down via `Node.Update` (to itself), and this request was missing the leader ACL needed to authenticate to itself. Add the leader ACL to the request and update the RPC handler test for disconnected-clients to use ACLs, which would have detected this bug. Also added a note to the `Authenticate` comment about how that authentication path requires the leader ACL. Fixes: https://github.com/hashicorp/nomad/issues/24231 Ref: https://hashicorp.atlassian.net/browse/NET-11384	2024-10-17 13:48:20 -04:00
Michael Schurter	e440e1d1db	cli: update nomad job init full examples (#24232) * cli: trim job init example jobspec * cli: trim job init -connect example jobspec --------- Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2024-10-17 10:32:47 -07:00
Seth Hoenig	b539b54c9e	docker: close hijacked write connection when exec ends (#24244)	2024-10-17 11:41:29 -05:00
Seth Hoenig	b18851617f	docker: close response connection once stdin is exhausted (#24202)	2024-10-17 11:07:23 -05:00
Piotr Kazmierczak	1ac14f4869	docker: always use API version negotiation when initializing clients (#24237) During a refactoring of the docker driver in #23966 we introduced a bug: API version negotiation option was not passed to every new client call.	2024-10-17 15:23:14 +02:00
Tim Gross	d12128c380	docker: use streaming stats collection to correct CPU stats (#24229) In #23966 we switched to the official Docker SDK for the `docker` driver. In the process we refactored code around stats collection to use the "one shot" version of stats. Unfortunately this "one shot" stats collection does not include the `PreCPU` stats, which are the stats from the previous read. This breaks the calculation we use to determine CPU ticks, because now we're subtracting 0 from the current value to get the delta. Switch back to using the streaming stats collection. Add a test that fully exercises the `TaskStats` API. Fixes: https://github.com/hashicorp/nomad/issues/24224 Ref: https://hashicorp.atlassian.net/browse/NET-11348	2024-10-17 08:25:59 -04:00
Piotr Kazmierczak	a22e56390e	e2e: fix failing tests due to docker plugin settings (#24234)	2024-10-17 11:12:59 +02:00
Piotr Kazmierczak	f9cbaaf6c7	docker: fix a bug where auth for private registries wasn't parsed correctly (#24215) In #23966 we introduced an official Docker client and did not notice that in contrast to our previous 3rd party client, the official SDK PullOptions object expects a base64 encoded JSON with username and password, instead of username/ password pair.	2024-10-16 22:04:54 +02:00
Daniel Bennett	a0d7fb6b09	connect: fix ipv6 bind_address test (#24216)	2024-10-16 08:23:44 -05:00
Tim Gross	6b8ddff1fa	windows: set job object for executor and children (#24214) On Windows, if the `raw_exec` driver's executor exits, the child processes are not also killed. Create a Windows "job object" (not to be confused with a Nomad job) and add the executor to it. Child processes of the executor will inherit the job automatically. When the handle to the job object is freed (on executor exit), the job itself is destroyed and this causes all processes in that job to exit. Fixes: https://github.com/hashicorp/nomad/issues/23668 Ref: https://learn.microsoft.com/en-us/windows/win32/procthread/job-objects	2024-10-16 09:20:26 -04:00
James Rasell	0f6561bdfe	docs: Add initial nomad-driver-virt driver plugin documentation. (#24094) Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2024-10-15 17:05:30 +01:00
Tim Gross	d261d58ea2	build: update hc-install to current (#24199) Installing Vault and Consul from releases.hashicorp.com via `hc-install` has been failing intermittently. Update the `hc-install` binaries to be current and add one retry to downloads for our compat tests so that we can get builds more reliably green while the underlying issue is being debugged.	2024-10-15 10:07:58 -04:00
James Rasell	61dd1f3f10	docs: CLI node pool list does not accept arguments. (#24188)	2024-10-15 07:49:37 +01:00
Daniel Bennett	067afcda26	Consul Connect over IPv6 (except tproxy) (#24203) * detect ipv6 on "bridge" network and set service.connect.sidecar_proxy.config.bind_address for envoy to "::" instead of "0.0.0.0" * allow users to set bind_address in jobspec e.g. "" would defer to consul proxy-defaults * caveat: tproxy still does not work, because the CNI plugin does not configure ip6tables	2024-10-14 18:52:02 -05:00
Aimee Ukasick	5beb1ce58e	Docs: Update job version section with tutorial links (#24179) * Update job page with tutorial links * Update section links	2024-10-14 12:29:56 -05:00
Tim Gross	fec91d1dc8	windows: trade heap for stack to build process tree for stats in linear space (#24182) In #20619 we overhauled how we were gathering stats for Windows processes. Unlike in Linux where we can ask for processes in a cgroup, on Windows we have to make a single expensive syscall to get all the processes and then build the tree ourselves. Our algorithm to do so is recursive and quadratic in both steps and space with the number of processes on the host. For busy hosts this hits the stack limit and panics the Nomad client. We already build a map of parent PID to PID, so modify this to be a map of parent PID to slice of children and then traverse that tree only from the root we care about (the executor PID). This moves the allocations to the heap but makes the stats gathering linear in steps and space required. This changeset also moves as much of this code as possible into an area not conditionally-compiled by OS, as the tagged test file was not being run in CI. Fixes: https://github.com/hashicorp/nomad/issues/23984	2024-10-14 11:26:38 -04:00
Aimee Ukasick	8f4a9326be	Docs: Add 1.9 release notes (#24161) * Add 1.9 release notes * Add deprecated items * Update Virt driver docs link to point to repo Update Virt driver docs link to point to repo	2024-10-14 09:57:15 -05:00
James Rasell	a7dad68996	changelog: remove doubled entry for 1.9 release. (#24192)	2024-10-14 14:48:50 +01:00
dependabot[bot]	294ebd1540	chore(deps): bump actions/checkout from 4.2.0 to 4.2.1 (#24183) Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.0 to 4.2.1. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](`d632683dd7...eef61447b9`) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-10-14 08:26:34 -05:00
dependabot[bot]	e439d6e408	chore(deps): bump actions/upload-artifact from 4.4.0 to 4.4.3 (#24184) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.4.0 to 4.4.3. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](`50769540e7...b4b15b8c7c`) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-10-14 08:24:59 -05:00
Michael Smithhisler	436ff75f15	scheduler: fix reconnecting allocations getting rescheduled (#24165) * scheduler: fix reconnecting allocations getting rescheduled	2024-10-14 09:00:58 -04:00
James Rasell	e7154f1d81	Merge pull request #24187 from hashicorp/post-1.9.0-release admin: post 1.9.0 release	2024-10-14 09:15:14 +02:00
James Rasell	67f2f32027	Merge release 1.9.0 files	2024-10-14 07:42:14 +01:00
hc-github-team-nomad-core	da654ead34	Prepare for next release	2024-10-14 07:26:46 +01:00
hc-github-team-nomad-core	f1714162df	Generate files for 1.9.0 release	2024-10-14 07:26:36 +01:00
Aimee Ukasick	c839f38cab	Docs: Golden Versions updates (#24153) * Add language from CLI help to job revert for version|tag * Add CLI job tag subcommand page * Add API create delete tag Examples use same names between CLI and API * Update CLI revert, tag; API jobs * Add job version content * add tag name unique per job to CLI/API; address Phil's feedback Add partial explaining why tag, add to CLI/API * Add diff_version to API jobs list job versions * Apply suggestions from code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * remove tutorial links since not published yet. --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2024-10-11 12:36:32 -05:00
Tim Gross	4de1665942	consul: improve reliability of deregistration (#24166) When the local Consul agent receives a deregister request, it performs a pre-flight check using the locally cached ACL token. The agent then sends the request upstream to the Consul servers as part of anti-entropy, using its own token. This requires that the token we use for deregistration is valid even though that's not the token used to write to the Consul server. There are several cases where the service identity token might no longer exist at the time of deregistration: * A race condition between the sync and destroying the allocation. * Misconfiguration of the Consul auth method with a TTL. * Out-of-band destruction of the token. Additionally, Nomad's sync with Consul returns early if there are any errors, which means that a single broken token can prevent any other service on the Nomad agent from being registered or deregistered. Update Nomad's sync with Consul to use the Nomad agent's own Consul token for deregistration, regardless of which token the service was registered with. Accumulate errors from the sync so that they no longer block deregistration of other services. Fixes: https://github.com/hashicorp/nomad/issues/20159	2024-10-11 12:32:23 -04:00
Tim Gross	5bb6d96773	build: update versions file for backports (#24174)	2024-10-11 12:30:34 -04:00
Seth Hoenig	f1ce127524	jobspec: add a chown option to artifact block (#24157) * jobspec: add a chown option to artifact block This PR adds a boolean 'chown' field to the artifact block. It indicates whether the Nomad client should chown the downloaded files and directories to be owned by the task.user. This is useful for drivers like raw_exec and exec2 which are subject to the host filesystem user permissions structure. Before, these drivers might not be able to use or manage the downloaded artifacts since they would be owned by the root user on a typical Nomad client configuration. * api: no need for pointer of chown field	2024-10-11 11:30:27 -05:00
Tim Gross	7381f8419b	docs: clarify requirements for Consul token policies and TTLs (#24167) As of #24166, Nomad agents will use their own token to deregister services and checks from Consul. This returns the deregistration path to the pre-Workload Identity workflow. Expand the documentation to make clear why certain ACL policies are required for clients. Additionally, we did not explicitly call out that auth methods should not set an expiration on Consul tokens. Nomad does not have a facility to refresh these tokens if they expire. Even if Nomad could, there's no way to re-inject them into Envoy sidecars for Consul Service Mesh without recreating the task anyways, which is what happens today. Warn users that they should not set an expiration. Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix) Ref: https://hashicorp.atlassian.net/browse/NET-10262	2024-10-11 11:59:21 -04:00
Daniel Bennett	373aae7b32	docs: add Resource Quota specification page (#24152) and update some related pages Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-10-10 15:03:10 -05:00
Daniel Bennett	278a2df3af	e2e: ui: update playwright to 1.48.0 (#24158) steps to update: * edit run.sh IMAGE variable manually * run ./run.sh test	2024-10-09 10:34:53 -05:00
Phil Renaud	dc45066ae7	[ui] Separate Diffs and Versions from the /versions endpoint as far as Ember is concerned (#24145) * Separate Diffs and Versions from the /versions endpoint as far as Ember is concerned * Back to async true * Handle undefined-diffs case	2024-10-08 12:13:01 -04:00
the-sun-will-rise-tomorrow	1ba9cc266c	docs: Link directly to podman's --network option (#24149)	2024-10-08 09:05:14 -05:00
Daniel Bennett	4562b9ac8a	Release/1.9.0 beta.2	2024-10-04 14:07:13 -05:00
hc-github-team-nomad-core	7d7a88d7e0	Prepare for next release	2024-10-04 16:18:34 +00:00
hc-github-team-nomad-core	668a827b2b	Generate files for 1.9.0-beta.2 release	2024-10-04 16:18:27 +00:00
Daniel Bennett	3f1bba1643	Prepare release 1.9.0-beta.2	2024-10-04 12:13:01 -04:00
Tim Gross	7531b7a62f	fix data race in node upsert (#24127) While testing with agents built with the race-detection option enabled, I encountered a data race while draining a node. When we upsert a node we copy the `NodeResources` struct and then perform a fixup for backwards compatibility of the topology struct. This fixup was being executed on the original struct and not the copy, which means we're uselessly fixing up the wrong struct and we're corrupting the state store in the process (albeit harmlessly, I suspect). Fix the data race by calling the method on the correct pointer.	2024-10-04 08:41:14 -04:00
Daniel Bennett	1c76dd9c1c	update example device readme (#24124)	2024-10-03 13:24:58 -05:00
Tim Gross	b7595c646d	alloc fs: use case-insensitive check for reads of secret/private dir (#24125) When using the Client FS APIs, we check to ensure that reads don't traverse into the allocation's secret dir and private dir. But this check can be bypassed on case-insensitive file systems (ex. Windows, macOS, and Linux with obscure ext4 options enabled). This allows a user with `read-fs` permissions but not `alloc-exec` permissions to read from the secrets dir. This changeset updates the check so that it's case-insensitive. This risks false positives for escape (see linked Go issue), but only if a task without filesystem isolation deliberately writes into the task working directory to do so, which is a fail-safe failure mode. Ref: https://github.com/golang/go/issues/18358 Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>	2024-10-03 14:20:24 -04:00
Michael Schurter	da75d4ff4b	docs: fix aed -> aead typo (#24123)	2024-10-03 13:31:32 -04:00
Tim Gross	f7d4bd2fd1	test: wait for keyring in plan submission tests (#24122) In #23977 we merged a change to how the keyring was stored. Because keyring initialization takes slightly longer now, this uncovered existing timing bugs in some of our tests where tests that require the keyring (ex. plan applier tests) were waiting for the leader but not the keyring initialization. Fix another example we've seen causing test flakes.	2024-10-03 13:22:41 -04:00
Daniel Bennett	7526c91ccd	scheduler: non-nil err when no devices match (#24118)	2024-10-03 10:29:36 -05:00
Aimee Ukasick	4c131229f4	Add devices to NUMA section of CPU page (#24113)	2024-10-03 09:09:10 -05:00
Aimee Ukasick	e5b18affa1	nvidia driver: add MIG support to overview paragraph (#24099)	2024-10-03 09:08:43 -05:00
James Rasell	1fabbaa179	driver: remove LXC and ECS driver documentation. (#24107) Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2024-10-03 08:55:39 +01:00


26251 Commits