nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-05 01:45:44 +03:00

Author	SHA1	Message	Date
Daniel Bennett	22cbb913db	csi: rename volume Mounter to Manager (#18434 ) to align with its broader purpose, and the volumeManager implementation	2023-09-08 15:33:46 -05:00
Tim Gross	b022346575	fingerprint: backoff on Consul fingerprint after initial success (#18426 ) In the original design of Consul fingerprinting, we would poll every period so that we could change the client's fingerprint if Consul became unavailable. As of 1.4.0 (ref #14673) we no longer update the fingerprint in order to avoid excessive `Node.Register` RPCs when someone's Consul cluster is flapping. This allows us to safely backoff Consul fingerprinting on success, just as we have with Vault.	2023-09-08 08:17:07 -04:00
Tim Gross	a8e68e6479	fingerprint: add support for fingerprinting multiple Consul clusters (#18392 ) fingerprint: add support for fingerprinting multiple Consul clusters Add fingerprinting we'll need to accept multiple Consul clusters in upcoming Nomad Enterprise features. The fingerprinter will create a map of Consul clients by cluster name. In Nomad CE, all but the default cluster will be ignored and there will be no visible behavior change. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-09-07 14:05:35 -04:00
Tim Gross	c145e8b30f	fingerprint: add warning in CE when there are multiple vaults (#18412 ) Nomad CE only supports a single (default) Vault cluster, so log a warning if the user has configured multiple Vaults.	2023-09-07 09:51:48 -04:00
Piotr Kazmierczak	2fffb96604	client: new Consul client (#18370 ) This PR introduces a new Consul client that returns SI tokens based on requests that contain JWTs.	2023-09-05 20:55:36 +02:00
Luiz Aoqui	b614ef3b01	client: fix panic on alloc restore (#18356 ) When restoring an allocation `WIDMgr` was not being set in the alloc runner config, resulting in a nil panic when the task runner attempted to start. Since we will often require the same configuration values when creating or restoring a new allocation, this commit moves the logic to a shared function to ensure that `addAlloc` and `restoreState` configure alloc runners with the same values.	2023-09-01 11:42:00 -03:00
Seth Hoenig	05c3322214	Revert "client: include response body in output for successful HTTP checks (#18345 )" (#18362 ) * Revert "client: include response body in output for successful HTTP checks (#18345)" This reverts commit `d0a93f12d1`. * cr: add comment about dropping ok output Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-08-30 08:22:28 -05:00
Matthew Salsamendi	d0a93f12d1	client: include response body in output for successful HTTP checks (#18345 )	2023-08-28 19:15:06 -04:00
stswidwinski	f25480c9e9	Ensure that the index processed by the client is at least as new as the last one processed. (#18269 ) Ensure that the index processed by the client is at least as new as the last index processed so that stale data does not impact the running allocations.	2023-08-25 15:54:58 -07:00
James Rasell	a9d5beb141	test: use correct parallel test setup func (#18326 )	2023-08-25 13:51:36 +01:00
Seth Hoenig	f5b0da1d55	all: swap exp packages for maps, slices (#18311 )	2023-08-23 15:42:13 -05:00
Luiz Aoqui	14a38bee7b	client: 404 when accessing files for GC'ed alloc (#18232 ) When an allocation is garbage collected from the client, but not from the servers, the API request is routed to the client and the client does attempt to read the file, but the alloc dir has already been deleted, resulting in a 500 error. This happens because the client GC only destroys the alloc runner (deleting the alloc dir), but it keeps a reference to the alloc runner until the alloc is garbage collected from the servers as well. This commit adjusts this logic by checking if the alloc runner (and the alloc files) has been destroyed, returning a 404 if so.	2023-08-21 16:09:24 -04:00
Tim Gross	b51b2a2705	fingerprint: add support for fingerprinting multiple Vault clusters (#18253 ) Add fingerprinting we'll need to accept multiple Vault clusters in upcoming Nomad Enterprise features. The fingerprinter will create a map of Vault clients by cluster name. In Nomad CE, all but the default cluster will be ignored and there will be no visible behavior change.	2023-08-18 15:33:22 -04:00
Tim Gross	a8bad048b6	config: parsing support for multiple Consul clusters in agent config (#18255 ) Add the plumbing we need to accept multiple Consul clusters in Nomad agent configuration, to support upcoming Nomad Enterprise features. The `consul` blocks are differentiated by a new `name` field, and if the `name` is omitted it becomes the "default" Consul configuration. All blocks with the same name are merged together, as with the existing behavior. As with the `vault` block, we're still using HCL1 for parsing configuration and the `Decode` method doesn't parse multiple blocks differentiated only by a field name without a label. So we've had to add an extra parsing pass, similar to what we've done for HCL1 jobspecs. This also revealed a subtle bug in the `vault` block handling of extra keys when there are multiple `vault` blocks, which I've fixed here. For now, all existing consumers will use the "default" Consul configuration, so there's no user-facing behavior change in this changeset other than the contents of the agent self API. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-08-18 15:25:16 -04:00
James Rasell	6108f5c4c3	admin: rename _oss files to _ce (#18209 )	2023-08-18 07:47:24 +01:00
Tim Gross	74b796e6d0	config: parsing support for multiple Vault clusters in agent config (#18224 ) Add the plumbing we need to accept multiple Vault clusters in Nomad agent configuration, to support upcoming Nomad Enterprise features. The `vault` blocks are differentiated by a new `name` field, and if the `name` is omitted it becomes the "default" Vault configuration. All blocks with the same name are merged together, as with the existing behavior. Unfortunately we're still using HCL1 for parsing configuration and the `Decode` method doesn't parse multiple blocks differentiated only by a field name without a label. So we've had to add an extra parsing pass, similar to what we've done for HCL1 jobspecs. For now, all existing consumers will use the "default" Vault configuration, so there's no user-facing behavior change in this changeset other than the contents of the agent self API. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-08-17 14:10:32 -04:00
Piotr Kazmierczak	53ef6391a5	drivers/docker: fix a hostConfigMemorySwappiness panic (#18238 ) cgroupslib.MaybeDisableMemorySwappiness returned an incorrect type, and was incorrectly typecast to int64 causing a panic on non-linux and non-windows hosts.	2023-08-17 14:45:31 +02:00
Seth Hoenig	8833452d44	followup to numa/cgroups refactor (#18214 ) * lang: note that Stack is not concurrency-safe * client: use more descriptive name for wrangler hook in logs * numalib: use correct name for receiver parameter	2023-08-15 14:12:17 -05:00
Tim Gross	f00bff09f1	fix multiple overflow errors in exponential backoff (#18200 ) We use capped exponential backoff in several places in the code when handling failures. The code we've copy-and-pasted all over has a check to see if the backoff is greater than the limit, but this check happens after the bitshift and we always increment the number of attempts. This causes an overflow with a fairly small number of failures (ex. at one place I tested it occurs after only 24 iterations), resulting in a negative backoff which then never recovers. The backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC handler or an external API such as Vault. Note this doesn't occur in places where we cap the number of iterations so the loop breaks (usually to return an error), so long as the number of iterations is reasonable. Introduce a helper with a check on the cap before the bitshift to avoid overflow in all places this can occur. Fixes: #18199 Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>	2023-08-15 14:38:18 -04:00
Seth Hoenig	6747ef8803	drivers/raw_exec: restore ability to run tasks without nomad running as root (#18206 ) Although nomad officially does not support running the client as a non-root user, doing so has been more or less possible with the raw_exec driver as long as you don't expect features to work like networking or running tasks as specific users. In the cgroups refactoring I bulldozed right over the special casing we had in place for raw_exec to continue working if the cgroups were unable to be created. This PR restores that behavior - you can now (as before) run the nomad client as a non-root user and make use of the raw_exec task driver.	2023-08-15 11:22:30 -05:00
Michael Schurter	0e22fc1a0b	identity: add support for multiple identities + audiences (#18123 ) Allows for multiple `identity{}` blocks for tasks along with user-specified audiences. This is a building block to allow workload identities to be used with Consul, Vault and 3rd party JWT based auth methods. Expiration is still unimplemented and is necessary for JWTs to be used securely, so that's up next. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-08-15 09:11:53 -07:00
Seth Hoenig	d9341f0664	update go1.21 (#18184 ) * build: update to go1.21 * go: eliminate helpers in favor of min/max * build: run go mod tidy * build: swap depguard for semgrep * command: fixup broken tls error check on go1.21	2023-08-14 08:43:27 -05:00
hashicorp-copywrite[bot]	2d35e32ec9	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:15 -05:00
Seth Hoenig	a4cc76bd3e	numa: enable numa topology detection (#18146 ) * client: refactor cgroups management in client * client: fingerprint numa topology * client: plumb numa and cgroups changes to drivers * client: cleanup task resource accounting * client: numa client and config plumbing * lib: add a stack implementation * tools: remove ec2info tool * plugins: fixup testing for cgroups / numa changes * build: update makefile and package tests and cl	2023-08-10 17:05:30 -05:00
Tim Gross	8ad663d1de	allocwatcher: don't destroy local allocdir after migration (#18108 ) When ephemeral disks are migrated from an allocation on the same node, allocation logs for the previous allocation are lost. There are two workflows for the best-effort attempt to migrate the allocation data between the old and new allocations. For previous allocations on other clients (the "remote" workflow), we create a local allocdir and download the data from the previous client into it. That data is then moved into the new allocdir and we delete the allocdir of the previous alloc. For "local" previous allocations we don't need to create an extra directory for the previous allocation and instead move the files directly from one to the other. But we still delete the old allocdir _entirely_, which includes all the logs! There doesn't seem to be any reason to destroy the local previous allocdir, as the usual client garbage collection should destroy it later on when needed. By not deleting it, the previous allocation's logs are still available for the user to read. Fixes: #18034	2023-08-02 09:41:46 -04:00
Charlie Voiselle	585b0533c0	[dep] bump golang.org/x/exp (#18102 ) There are some refactorings that have to be made in the getter and state where the api changed in `slices` * Bump golang.org/x/exp * Bump golang.org/x/exp in api * Update job_endpoint_test * [feedback] unexport sort function	2023-08-01 11:50:17 -04:00
Kevin Schoonover	4841791c86	fingerprint: fix 'default' alias not added to interface specified by `network_interface` (#18096 )	2023-08-01 08:35:31 -04:00
Gerard Nguyen	9e98d694a6	feature: Add new field render_templates on restart block (#18054 ) This feature is necessary when user want to explicitly re-render all templates on task restart. E.g. to fetch all new secrets from Vault, even if the lease on the existing secrets has not been expired.	2023-07-28 11:53:32 -07:00
Ville Vesilehto	2c463bb038	chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968 )	2023-07-26 15:28:09 +01:00
stswidwinski	b9a388f5df	Retain task states for post stop tasks at the time of node GC (#18005 ) * Retain task states for post stop tasks at the time of node GC	2023-07-21 10:55:00 -07:00
Luiz Aoqui	ce0f60fb68	metrics: report task memory_max value (#17938 ) Add new `nomad.client.allocs.memory.max_allocated` metric to report the value of the task `memory_max` resource value.	2023-07-19 16:50:12 -04:00
Luiz Aoqui	e664f1439a	nsd: retain query params in HTTP health checks (#17936 ) Apply the same logic as Consul service health checks when building the HTTP URL so that query params in `path` are preserved.	2023-07-19 16:46:22 -04:00
Patric Stout	e190eae395	Use config "cpu_total_compute" (if set) for all CPU statistics (#17628 ) Before this commit, it was only used for fingerprinting, but not for CPU stats on nodes or tasks. This meant that if the auto-detection failed, setting the cpu_total_compute didn't resolved the issue. This issue was most noticeable on ARM64, as there auto-detection always failed.	2023-07-19 13:30:47 -05:00
Michael Schurter	6e9514920d	remove empty file (#17853 )	2023-07-10 16:34:10 -07:00
Tim Gross	0ba7d0036b	CSI: persist previous mounts on client to restore during restart (#17840 ) When claiming a CSI volume, we need to ensure the CSI node plugin is running before we send any CSI RPCs. This extends even to the controller publish RPC because it requires the storage provider's "external node ID" for the client. This primarily impacts client restarts but also is a problem if the node plugin exits (and fingerprints) while the allocation that needs a CSI volume claim is being placed. Unfortunately there's no mapping of volume to plugin ID available in the jobspec, so we don't have enough information to wait on plugins until we either get the volume from the server or retrieve the plugin ID from data we've persisted on the client. If we always require getting the volume from the server before making the claim, a client restart for disconnected clients will cause all the allocations that need CSI volumes to fail. Even while connected, checking in with the server to verify the volume's plugin before trying to make a claim RPC is inherently racy, so we'll leave that case as-is and it will fail the claim if the node plugin needed to support a newly-placed allocation is flapping such that the node fingerprint is changing. This changeset persists a minimum subset of data about the volume and its plugin in the client state DB, and retrieves that data during the CSI hook's prerun to avoid re-claiming and remounting the volume unnecessarily. This changeset also updates the RPC handler to use the external node ID from the claim whenever it is available. Fixes: #13028	2023-07-10 13:20:15 -04:00
Devashish Taneja	b31e891e5f	Include parent job ID as a Docker container label (#17843 ) Fixes: #17751	2023-07-10 11:27:45 -04:00
Seth Hoenig	100c460467	env/aws: updates from ec2info (#17835 )	2023-07-07 10:12:05 -05:00
Yorick Gersie	709f20c04a	cni: ensure to setup CNI addresses in deterministic order (#17766 ) * cni: ensure to setup CNI addresses in deterministic order Currently as commented in the code the go-cni library returns an unordered map of interfaces. In cases where there are multiple CNI interfaces being created this creates a problem with service registration and healthchecking because the first address in the map is being used. The use case we have where this is an issue is that we run CNI with the macvlan plugin to isolate workloads, but they still need to be able to access the host on a static address to be able to perform local resolving and hit host services like the Consul agent API. To make this work there are 2 options, you either add a macvlan interface on the host with an assigned address for each VLAN you have or you create an additional veth bridged interface in the container namespace. We chose the latter option through a custom CNI plugin but the ordering issue leaves us with incorrect service registration. * Updates after feedback * First check for the CNIResult interfaces length, if it's zero we don't need to proceed at all. * Use sorted interfaces list for the address fallback scenario as well. * Remove "found" log message logic, when an address isn't found an error is returned stating the allocation could not be configured as an address was missing from the CNIResult. If we still need a Warn message then we can add it to the condition that returns the error if no address could be found instead of using the "found" bool logic.	2023-07-06 13:25:29 -07:00
Patric Stout	ede662a828	metrics: add "total_ticks_count" for CPU metrics (#17579 ) This counter tells you the total amount of ticks for that CPU entry since the start of Nomad.	2023-07-05 10:28:55 -04:00
Tim Gross	78f4f76520	adjust prioritized client updates (#17541 ) In #17354 we made client updates prioritized to reduce client-to-server traffic. When the client has no previously-acknowledged update we assume that the update is of typical priority; although we don't know that for sure in practice an allocation will never become healthy quickly enough that the first update we send is the update saying the alloc is healthy. But that doesn't account for allocations that quickly fail in an unrecoverable way because of allocrunner hook failures, and it'd be nice to be able to send those failure states to the server more quickly. This changeset does so and adds some extra comments on reasoning behind priority.	2023-06-26 09:14:24 -04:00
grembo	6f04b91912	Add `disable_file` parameter to job's `vault` stanza (#13343 ) This complements the `env` parameter, so that the operator can author tasks that don't share their Vault token with the workload when using `image` filesystem isolation. As a result, more powerful tokens can be used in a job definition, allowing it to use template stanzas to issue all kinds of secrets (database secrets, Vault tokens with very specific policies, etc.), without sharing that issuing power with the task itself. This is accomplished by creating a directory called `private` within the task's working directory, which shares many properties of the `secrets` directory (tmpfs where possible, not accessible by `nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the container. If the `disable_file` parameter is set to `false` (its default), the Vault token is also written to the NOMAD_SECRETS_DIR, so the default behavior is backwards compatible. Even if the operator never changes the default, they will still benefit from the improved behavior of Nomad never reading the token back in from that - potentially altered - location.	2023-06-23 15:15:04 -04:00
James Rasell	ced6c5890e	client: remove unused nsd check allocation result diff func (#17695 )	2023-06-23 15:26:06 +01:00
Tim Gross	deae9bb62e	client: send node secret with every client-to-server RPC (#16799 ) In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the request came thru a client node first. But this fix broke (knowingly) the identification of many client-to-server RPCs. These will be now measured as if they were anonymous. The reason for this is that many client-to-server RPCs do not send the node secret and instead rely on the protection of mTLS. This changeset ensures that the node secret is being sent with every client-to-server RPC request. In a future version of Nomad we can add enforcement on the server side, but this was left out of this changeset to reduce risks to the safe upgrade path. Sending the node secret as an auth token introduces a new problem during initial introduction of a client. Clients send many RPCs concurrently with `Node.Register`, but until the node is registered the node secret is unknown to the server and will be rejected as invalid. This causes permission denied errors. To fix that, this changeset introduces a gate on having successfully made a `Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`, which we need earlier but which also ignores the error because that handler doesn't do an authorization check). This ensures that we only send requests with a node secret already known to the server. This also makes client startup a little easier to reason about because we know `Node.Register` must succeed first, and it should make for a good place to hook in future plans for secure introduction of nodes. The tradeoff is that an existing client that has running allocs will take slightly longer (a second or two) to transition to ready after a restart, because the transition in `Node.UpdateStatus` is gated at the server by first submitting `Node.UpdateAlloc` with client alloc updates.	2023-06-22 11:06:49 -04:00
Seth Hoenig	33ac5ed1df	client: do not disable memory swappiness if kernel does not support it (#17625 ) * client: do not disable memory swappiness if kernel does not support it This PR adds a workaround for very old Linux kernels which do not support the memory swappiness interface file. Normally we write a "0" to the file to explicitly disable swap. In the case the kernel does not support it, give libcontainer a nil value so it does not write anything. Fixes #17448 * client: detect swappiness by writing to the file * fixup changelog Co-authored-by: James Rasell <jrasell@users.noreply.github.com> --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-06-22 09:36:31 -05:00
VishnuJin	102f73274b	fingerprint: added windows os.build attribute to host fingerprint (#17576 )	2023-06-21 10:53:50 -04:00
Patric Stout	a1a5241606	Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 (#17535 ) * Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 This meant that if any allocation was created or removed, all active DevicesSets were removed from all cgroups of all tasks. This was most noticeable with "exec" and "raw_exec", as it meant they no longer had access to /dev files. * e2e: add test for verifying cgroups do not interfere with access to devices --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-06-15 09:39:36 -05:00
Tim Gross	068d0ea9af	node pools: add pool as label on client metrics (#17528 ) This changeset adds the node pool as a label anywhere we're already emitting labels with additional information such as node class or ID about the client.	2023-06-14 15:58:38 -04:00
Luiz Aoqui	30921a1bb4	client: fix panic on alloc stop in non-Linux environments (#17515 ) Provide a no-op implementation of the drivers.DriverNetoworkManager interface to be used by systems that don't support network isolation and prevent panics where a network manager is expected.	2023-06-14 10:22:38 -04:00
Seth Hoenig	89ce092b20	docker: stop network pause container of lost alloc after node restart (#17455 ) This PR fixes a bug where the docker network pause container would not be stopped and removed in the case where a node is restarted, the alloc is moved to another node, the node comes back up. See the issue below for full repro conditions. Basically in the DestroyNetwork PostRun hook we would depend on the NetworkIsolationSpec field not being nil - which is only the case if the Client stays alive all the way from network creation to network teardown. If the node is rebooted we lose that state and previously would not be able to find the pause container to remove. Now, we manually find the pause container by scanning them and looking for the associated allocID. Fixes #17299	2023-06-09 08:46:29 -05:00
Seth Hoenig	225693ad28	client: fix client panic during drain cause by shutdown (#17450 ) During shutdown of a client with drain_on_shutdown there is a race between the Client ending the cgroup and the task's cpuset manager cleaning up the cgroup. During the path traversal, skip anything we cannot read, which avoids the nil DirEntry we try to dereference now.	2023-06-07 15:12:44 -05:00

1 2 3 4 5 ...

4792 Commits