Commit Graph

4763 Commits

Author SHA1 Message Date
stswidwinski
b9a388f5df Retain task states for post stop tasks at the time of node GC (#18005)
2023-07-21 10:55:00 -07:00
Luiz Aoqui
ce0f60fb68 metrics: report task memory_max value (#17938)
Add a new `nomad.client.allocs.memory.max_allocated` metric to report the
value of the task's `memory_max` resource.
2023-07-19 16:50:12 -04:00
Luiz Aoqui
e664f1439a nsd: retain query params in HTTP health checks (#17936)
Apply the same logic as Consul service health checks when building the
HTTP URL so that query params in `path` are preserved.
2023-07-19 16:46:22 -04:00
Patric Stout
e190eae395 Use config "cpu_total_compute" (if set) for all CPU statistics (#17628)
Before this commit, it was only used for fingerprinting, but not
for CPU stats on nodes or tasks. This meant that if the
auto-detection failed, setting `cpu_total_compute` didn't resolve
the issue.

This issue was most noticeable on ARM64, where auto-detection
always failed.
2023-07-19 13:30:47 -05:00
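A minimal sketch of the override-then-detect behavior described above, using hypothetical names (`Config`, `detectTotalCompute`); Nomad's actual stats code differs:

```go
package main

import "fmt"

// Config mirrors the relevant client option; the field corresponds to the
// cpu_total_compute setting mentioned above (simplified here).
type Config struct {
	CPUTotalCompute int // MHz; 0 means "not set"
}

// detectTotalCompute stands in for CPU fingerprinting, which may return 0
// on platforms (e.g. some ARM64 machines) where auto-detection fails.
func detectTotalCompute() int { return 0 }

// totalCompute prefers the operator-supplied override and falls back to
// auto-detection, the behavior the commit extends to node and task stats.
func totalCompute(cfg *Config) int {
	if cfg.CPUTotalCompute > 0 {
		return cfg.CPUTotalCompute
	}
	return detectTotalCompute()
}

func main() {
	fmt.Println(totalCompute(&Config{CPUTotalCompute: 24000})) // 24000
	fmt.Println(totalCompute(&Config{}))                       // 0: detection failed
}
```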
Michael Schurter
6e9514920d remove empty file (#17853) 2023-07-10 16:34:10 -07:00
Tim Gross
0ba7d0036b CSI: persist previous mounts on client to restore during restart (#17840)
When claiming a CSI volume, we need to ensure the CSI node plugin is running
before we send any CSI RPCs. This extends even to the controller publish RPC
because it requires the storage provider's "external node ID" for the
client. This primarily impacts client restarts but also is a problem if the node
plugin exits (and fingerprints) while the allocation that needs a CSI volume
claim is being placed.

Unfortunately there's no mapping of volume to plugin ID available in the
jobspec, so we don't have enough information to wait on plugins until we either
get the volume from the server or retrieve the plugin ID from data we've
persisted on the client.

If we always require getting the volume from the server before making the claim,
a client restart for disconnected clients will cause all the allocations that
need CSI volumes to fail. Even while connected, checking in with the server to
verify the volume's plugin before trying to make a claim RPC is inherently racy,
so we'll leave that case as-is and it will fail the claim if the node plugin
needed to support a newly-placed allocation is flapping such that the node
fingerprint is changing.

This changeset persists a minimum subset of data about the volume and its plugin
in the client state DB, and retrieves that data during the CSI hook's prerun to
avoid re-claiming and remounting the volume unnecessarily.

This changeset also updates the RPC handler to use the external node ID from the
claim whenever it is available.

Fixes: #13028
2023-07-10 13:20:15 -04:00
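A rough sketch of persisting a minimal volume-to-plugin record in the client state store; the types and keys here (`MountInfo`, `persistMount`, `restoreMount`) are hypothetical, with a map standing in for the client's on-disk state DB:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MountInfo is a hypothetical minimal record: just enough to find the node
// plugin and reuse a claim after a client restart, without a server RPC.
type MountInfo struct {
	VolumeID       string `json:"volume_id"`
	PluginID       string `json:"plugin_id"`
	ExternalNodeID string `json:"external_node_id"`
	MountPoint     string `json:"mount_point"`
}

// stateDB stands in for the client's persistent store (BoltDB in Nomad).
var stateDB = map[string][]byte{}

func persistMount(allocID string, m MountInfo) error {
	b, err := json.Marshal(m)
	if err != nil {
		return err
	}
	stateDB["csi/"+allocID+"/"+m.VolumeID] = b
	return nil
}

// restoreMount is what a CSI hook prerun could consult to avoid
// re-claiming and remounting a volume unnecessarily.
func restoreMount(allocID, volID string) (*MountInfo, bool) {
	b, ok := stateDB["csi/"+allocID+"/"+volID]
	if !ok {
		return nil, false
	}
	var m MountInfo
	if err := json.Unmarshal(b, &m); err != nil {
		return nil, false
	}
	return &m, true
}

func main() {
	_ = persistMount("alloc1", MountInfo{VolumeID: "vol0", PluginID: "ebs", MountPoint: "/csi/vol0"})
	if m, ok := restoreMount("alloc1", "vol0"); ok {
		fmt.Println("restored mount for plugin", m.PluginID)
	}
}
```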
Devashish Taneja
b31e891e5f Include parent job ID as a Docker container label (#17843)
Fixes: #17751
2023-07-10 11:27:45 -04:00
Seth Hoenig
100c460467 env/aws: updates from ec2info (#17835) 2023-07-07 10:12:05 -05:00
Yorick Gersie
709f20c04a cni: ensure CNI addresses are set up in deterministic order (#17766)
* cni: ensure CNI addresses are set up in deterministic order

  As noted in the code comments, the go-cni library returns an unordered map
  of interfaces. When multiple CNI interfaces are created, this causes problems
  with service registration and health checking because the first address in
  the map is used.

  The use case where this is an issue for us: we run CNI with the macvlan
  plugin to isolate workloads, but they still need to reach the host on a
  static address to perform local resolving and hit host services like the
  Consul agent API. There are two ways to make this work: either add a macvlan
  interface on the host with an assigned address for each VLAN you have, or
  create an additional veth bridged interface in the container namespace. We
  chose the latter via a custom CNI plugin, but the ordering issue leaves us
  with incorrect service registration.

* Updates after feedback

 * First check the length of the CNIResult interfaces; if it's zero we don't
   need to proceed at all.
 * Use the sorted interfaces list for the address fallback scenario as well.
 * Remove the "found" log message logic: when an address isn't found, an error
   is returned stating the allocation could not be configured because an
   address was missing from the CNIResult. If we still need a Warn message, we
   can add it to the condition that returns the error when no address could be
   found, instead of using the "found" bool logic.
2023-07-06 13:25:29 -07:00
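A minimal sketch of the deterministic-ordering fix: sort the map keys before iterating. The `iface` type is a simplified stand-in for the go-cni result entries:

```go
package main

import (
	"fmt"
	"sort"
)

// iface is a stand-in for the interface entries in a go-cni CNIResult.
type iface struct{ addrs []string }

// firstAddress picks an address deterministically by sorting the map keys
// before iterating, so map iteration order no longer matters.
func firstAddress(interfaces map[string]iface) (string, error) {
	if len(interfaces) == 0 {
		return "", fmt.Errorf("no interfaces in CNI result")
	}
	names := make([]string, 0, len(interfaces))
	for name := range interfaces {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if addrs := interfaces[name].addrs; len(addrs) > 0 {
			return addrs[0], nil
		}
	}
	return "", fmt.Errorf("no address found in CNI result")
}

func main() {
	out, _ := firstAddress(map[string]iface{
		"veth1": {addrs: []string{"10.0.0.2"}},
		"eth0":  {addrs: []string{"172.17.0.3"}},
	})
	fmt.Println(out) // always "172.17.0.3", regardless of map order
}
```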
Patric Stout
ede662a828 metrics: add "total_ticks_count" for CPU metrics (#17579)
This counter reports the total number of ticks for that CPU
entry since the start of Nomad.
2023-07-05 10:28:55 -04:00
Tim Gross
78f4f76520 adjust prioritized client updates (#17541)
In #17354 we made client updates prioritized to reduce client-to-server
traffic. When the client has no previously-acknowledged update, we assume the
update is of typical priority; although we can't know that for sure, in
practice an allocation never becomes healthy quickly enough for the first
update we send to be the one saying the alloc is healthy.

But that doesn't account for allocations that quickly fail in an unrecoverable
way because of allocrunner hook failures, and it'd be nice to be able to send
those failure states to the server more quickly. This changeset does so and adds
some extra comments on reasoning behind priority.
2023-06-26 09:14:24 -04:00
grembo
6f04b91912 Add disable_file parameter to job's vault stanza (#13343)
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using 
`image` filesystem isolation. As a result, more powerful tokens can be used 
in a job definition, allowing it to use template stanzas to issue all kinds of 
secrets (database secrets, Vault tokens with very specific policies, etc.), 
without sharing that issuing power with the task itself.

This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.

If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
2023-06-23 15:15:04 -04:00
James Rasell
ced6c5890e client: remove unused nsd check allocation result diff func (#17695) 2023-06-23 15:26:06 +01:00
Tim Gross
deae9bb62e client: send node secret with every client-to-server RPC (#16799)
In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the
request came through a client node first. But this fix (knowingly) broke the
identification of many client-to-server RPCs. These will now be measured as if
they were anonymous. The reason for this is that many client-to-server RPCs do
not send the node secret and instead rely on the protection of mTLS.

This changeset ensures that the node secret is being sent with every
client-to-server RPC request. In a future version of Nomad we can add
enforcement on the server side, but this was left out of this changeset to
reduce risks to the safe upgrade path.

Sending the node secret as an auth token introduces a new problem during initial
introduction of a client. Clients send many RPCs concurrently with
`Node.Register`, but until the node is registered the node secret is unknown to
the server and will be rejected as invalid. This causes permission denied
errors.

To fix that, this changeset introduces a gate on having successfully made a
`Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`,
which we need earlier but which also ignores the error because that handler
doesn't do an authorization check). This ensures that we only send requests with
a node secret already known to the server. This also makes client startup a
little easier to reason about because we know `Node.Register` must succeed
first, and it should make for a good place to hook in future plans for secure
introduction of nodes. The tradeoff is that an existing client that has running
allocs will take slightly longer (a second or two) to transition to ready after
a restart, because the transition in `Node.UpdateStatus` is gated at the server
by first submitting `Node.UpdateAlloc` with client alloc updates.
2023-06-22 11:06:49 -04:00
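A minimal sketch of such a gate, with hypothetical names; a close-once channel lets every waiting RPC sender proceed the moment registration succeeds (in the scheme above, `Status.Ping` would bypass the gate):

```go
package main

import (
	"fmt"
	"sync"
)

// registrationGate blocks RPC senders until Node.Register has succeeded once.
type registrationGate struct {
	once sync.Once
	ch   chan struct{}
}

func newRegistrationGate() *registrationGate {
	return &registrationGate{ch: make(chan struct{})}
}

// markRegistered opens the gate; safe to call more than once.
func (g *registrationGate) markRegistered() { g.once.Do(func() { close(g.ch) }) }

// wait blocks until the node secret is known to the server.
func (g *registrationGate) wait() { <-g.ch }

func main() {
	g := newRegistrationGate()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		g.wait() // e.g. Node.UpdateAlloc waits here
		fmt.Println("sending RPC with node secret auth token")
	}()
	g.markRegistered() // Node.Register returned successfully
	wg.Wait()
}
```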
Seth Hoenig
33ac5ed1df client: do not disable memory swappiness if kernel does not support it (#17625)
* client: do not disable memory swappiness if kernel does not support it

This PR adds a workaround for very old Linux kernels which do not support
the memory swappiness interface file. Normally we write a "0" to the file
to explicitly disable swap. If the kernel does not support it, we give
libcontainer a nil value so it does not write anything.

Fixes #17448

* client: detect swappiness by writing to the file

* fixup changelog

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

---------

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-06-22 09:36:31 -05:00
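A sketch of probing the interface file before committing to a value; the `swappinessPtr` helper and path handling are hypothetical simplifications of Nomad's cgroup code:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// swappinessPtr returns a pointer to 0 when the kernel exposes
// memory.swappiness, and nil when it doesn't, so the container runtime
// (libcontainer in Nomad's case) writes nothing at all.
func swappinessPtr(cgroupDir string) *uint64 {
	p := filepath.Join(cgroupDir, "memory.swappiness")
	if _, err := os.Stat(p); err != nil {
		return nil // interface file absent: very old kernel
	}
	if err := os.WriteFile(p, []byte("0"), 0o644); err != nil {
		return nil // present but rejects writes: also leave it unset
	}
	zero := uint64(0)
	return &zero
}

func main() {
	if ptr := swappinessPtr("/sys/fs/cgroup/memory/nomad"); ptr == nil {
		fmt.Println("swappiness unsupported; not writing it")
	} else {
		fmt.Println("swappiness explicitly set to", *ptr)
	}
}
```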
VishnuJin
102f73274b fingerprint: added windows os.build attribute to host fingerprint (#17576) 2023-06-21 10:53:50 -04:00
Patric Stout
a1a5241606 Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 (#17535)
* Fix DevicesSets being removed when cpusets are reloaded with cgroup v2

This meant that if any allocation was created or removed, all
active DevicesSets were removed from all cgroups of all tasks.

This was most noticeable with "exec" and "raw_exec", as it meant
they no longer had access to /dev files.

* e2e: add test for verifying cgroups do not interfere with access to devices

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-15 09:39:36 -05:00
Tim Gross
068d0ea9af node pools: add pool as label on client metrics (#17528)
This changeset adds the node pool as a label anywhere we're already emitting
labels with additional information such as node class or ID about the client.
2023-06-14 15:58:38 -04:00
Luiz Aoqui
30921a1bb4 client: fix panic on alloc stop in non-Linux environments (#17515)
Provide a no-op implementation of the drivers.DriverNetworkManager
interface to be used by systems that don't support network isolation and
prevent panics where a network manager is expected.
2023-06-14 10:22:38 -04:00
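A minimal sketch of the no-op pattern against a trimmed-down stand-in for the real interface (the actual drivers.DriverNetworkManager carries more methods):

```go
package main

import "fmt"

// NetworkManager is a simplified stand-in for drivers.DriverNetworkManager.
type NetworkManager interface {
	CreateNetwork(allocID string) error
	DestroyNetwork(allocID string) error
}

// noopNetworkManager satisfies the interface without doing anything, so
// platforms without network isolation get a usable value instead of nil.
type noopNetworkManager struct{}

func (noopNetworkManager) CreateNetwork(string) error  { return nil }
func (noopNetworkManager) DestroyNetwork(string) error { return nil }

func main() {
	var nm NetworkManager = noopNetworkManager{}
	fmt.Println(nm.CreateNetwork("alloc1")) // <nil>: no panic, no work done
}
```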
Seth Hoenig
89ce092b20 docker: stop network pause container of lost alloc after node restart (#17455)
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, and the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
2023-06-09 08:46:29 -05:00
Seth Hoenig
225693ad28 client: fix client panic during drain caused by shutdown (#17450)
During shutdown of a client with drain_on_shutdown there is a race between
the Client ending the cgroup and the task's cpuset manager cleaning up
the cgroup. During the path traversal, skip anything we cannot read, which
avoids the nil DirEntry we try to dereference now.
2023-06-07 15:12:44 -05:00
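A sketch of the skip-what-you-cannot-read traversal using the standard library's `filepath.WalkDir`; the cgroup path here is an assumption, not Nomad's exact one:

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

func main() {
	// Returning nil from the callback when err is set tells WalkDir to
	// carry on, which avoids touching a nil DirEntry for a directory
	// that vanished (or became unreadable) mid-walk.
	_ = filepath.WalkDir("/sys/fs/cgroup", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil // unreadable or already removed: skip it
		}
		if d.IsDir() {
			fmt.Println("visiting", path)
		}
		return nil
	})
}
```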
Jerome Eteve
0d41fb6747 client checks kernel module in /sys/module for WSL2 bridge networking (#17306) 2023-06-06 10:26:50 -04:00
Seth Hoenig
560315a49e test: ensure cpuset cgroup is setup before fingerprinting (#17428)
This PR fixes a racy test where we need to ensure the cpuset cgroup
is set up before trying to fingerprint it.
2023-06-05 14:15:00 -05:00
hashicorp-copywrite[bot]
8597061b52 [COMPLIANCE] Add Copyright and License Headers (#17429)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-05 13:23:59 -04:00
Luiz Aoqui
81f0b359dd node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Tim Gross
893d4a77c8 prioritized client updates (#17354)
The allocrunner sends several updates to the server during the early lifecycle
of an allocation and its tasks. Clients batch up allocation updates every 200ms,
but experiments like the C2M challenge have shown that even with this batching,
servers can be overwhelmed with client updates during high-volume
deployments. Benchmarking done in #9451 has shown that client updates can easily
represent ~70% of all Nomad Raft traffic.

Each allocation sends many updates during its lifetime, but only those that
change the `ClientStatus` field are critical for progressing a deployment or
kicking off a reschedule to recover from failures.

Add a priority to the client allocation sync and update the `syncTicker`
receiver so that we only send an update if there's a high priority update
waiting, or on every 5th tick. This means when there are no high priority
updates, the client will send updates at most every 1s instead of
200ms. Benchmarks have shown this can reduce overall Raft traffic by 10%, as
well as reduce client-to-server RPC traffic.

This changeset also switches from a channel-based collection of updates to a
shared buffer, so as to split batching from sending and prevent backpressure
onto the allocrunner when the RPC is slow. This doesn't have a major performance
benefit in the benchmarks but makes the implementation of the prioritized update
simpler.

Fixes: #9451
2023-05-31 15:34:16 -04:00
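A toy model of the resulting ticker logic, with hypothetical names; assuming a 200ms tick, a flush happens immediately for urgent updates and otherwise at most once per second:

```go
package main

import "fmt"

// shouldSync flushes the batch when a high-priority update (one that
// changes ClientStatus) is pending, or on every 5th 200ms tick.
func shouldSync(tick int, urgentPending bool) bool {
	return urgentPending || tick%5 == 0
}

func main() {
	for tick := 1; tick <= 10; tick++ {
		urgent := tick == 3 // e.g. an alloc just failed unrecoverably
		if shouldSync(tick, urgent) {
			fmt.Printf("tick %d: sending batched updates\n", tick)
		}
	}
	// flushes at ticks 3 (urgent), 5, and 10
}
```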
Luiz Aoqui
4068b68b29 client: fix Consul version fingerprint (#17349)
Consul v1.13.8 was released with a breaking change in the /v1/agent/self
endpoint, where the version was returned with a trailing line break.

This caused the Nomad fingerprint to fail because `NewVersion` errors on
parse.

This commit removes any extra space from the Consul version returned by
the API.
2023-05-30 11:07:57 -04:00
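The fix amounts to trimming whitespace before handing the string to `go-version`; a small illustration using the real `github.com/hashicorp/go-version` package:

```go
package main

import (
	"fmt"
	"strings"

	version "github.com/hashicorp/go-version"
)

func main() {
	raw := "1.13.8\n" // what the changed /v1/agent/self effectively returned
	if _, err := version.NewVersion(raw); err != nil {
		fmt.Println("untrimmed fingerprint fails:", err)
	}
	v, err := version.NewVersion(strings.TrimSpace(raw))
	if err != nil {
		panic(err)
	}
	fmt.Println("consul version:", v) // 1.13.8
}
```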
Seth Hoenig
60e0404bb5 compliance: add headers with fixed copywrite tool (#17353)
Closes #17117
2023-05-30 09:20:32 -05:00
Charlie Voiselle
4510f6d7c1 [core] nil check and error handling for client status in heartbeat responses (#17316)
Add a nil check to constructNodeServerInfoResponse to manage an apparent race between deregister and client heartbeats. Fixes #17310
2023-05-25 16:04:54 -04:00
Lance Haig
7e93f150b5 cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs
for them to work with non-default domain and region names. This changeset
updates the code to support non-default domains and regions in the
certificates.
2023-05-22 09:31:56 -04:00
Seth Hoenig
3cc25949fa client: ignore restart issued to terminal allocations (#17175)
* client: ignore restart issued to terminal allocations

This PR fixes a bug where issuing a restart to a terminal allocation
would cause the allocation to run its hooks anyway. This was particularly
apparent with group_service_hook, which would register services but
never deregister them, as the allocation would effectively be in a
"zombie" state: prepped to run tasks it never will.

* e2e: add e2e test for alloc restart zombies

* cl: tweak text

Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-05-16 10:19:41 -05:00
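A toy guard showing the idea; `Alloc` here is a hypothetical stand-in (in Nomad the terminal check is a method on the allocation struct):

```go
package main

import (
	"errors"
	"fmt"
)

// Alloc is a hypothetical trimmed-down allocation.
type Alloc struct{ ClientStatus string }

// terminal reports whether the alloc has reached a final client state.
func (a *Alloc) terminal() bool {
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// restart refuses terminal allocations so hooks (like the group service
// hook) never run for an allocation that will not run tasks again.
func restart(a *Alloc) error {
	if a.terminal() {
		return errors.New("cannot restart a terminal allocation")
	}
	fmt.Println("restarting tasks")
	return nil
}

func main() {
	fmt.Println(restart(&Alloc{ClientStatus: "complete"}))
}
```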
Tim Gross
bf7b82b52b drivers: make internal DisableLogCollection capability public (#17196)
The `DisableLogCollection` capability was introduced as an experimental
interface for the Docker driver in 0.10.4. The interface has been stable and
allowing third-party task drivers the same capability would be useful for those
drivers that don't need the additional overhead of logmon.

This PR only makes the capability public. It doesn't yet add it to the
configuration options for the other internal drivers.

Fixes: #14636 #15686
2023-05-16 09:16:03 -04:00
Tim Gross
88323bab4a allocrunner: provide factory function so we can build mock ARs (#17161)
Tools like `nomad-nodesim` need a minimal implementation of an allocrunner so
that we can test the client communication without having to lug around the
entire allocrunner/taskrunner code base. The allocrunner was implemented with
an interface specifically for this purpose, but circular imports made it
challenging to use in practice.

Move the AllocRunner interface into an inner package and provide a factory
function type. Provide a minimal test that exercises the new function so that
consumers have some idea of what the minimum implementation required is.
2023-05-12 13:29:44 -04:00
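A sketch of the interface-plus-factory shape the commit describes, with trimmed-down hypothetical types; the real AllocRunner interface carries far more methods:

```go
package main

import "fmt"

// AllocRunner is a heavily trimmed stand-in for the interface moved into
// an inner package.
type AllocRunner interface {
	Run()
	AllocID() string
}

// Factory lets consumers (tests, tools like nomad-nodesim) inject their
// own implementation without importing the full allocrunner package.
type Factory func(allocID string) AllocRunner

// mockAR is the kind of minimal implementation the factory enables.
type mockAR struct{ id string }

func (m *mockAR) Run()            { fmt.Println("mock run for", m.id) }
func (m *mockAR) AllocID() string { return m.id }

func main() {
	var newAR Factory = func(id string) AllocRunner { return &mockAR{id: id} }
	newAR("alloc1").Run()
}
```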
Tim Gross
116f24d768 client: de-duplicate alloc updates and gate during restore (#17074)
When client nodes are restarted, all allocations that have been scheduled on the
node have their modify index updated, including terminal allocations. There are
several contributing factors:

* The `allocSync` method that updates the servers isn't gated on first contact
  with the servers. This means that if a server updates the desired state while
  the client is down, the `allocSync` races with the `Node.ClientGetAlloc`
  RPC. This will typically result in the client updating the server with "running"
  and then immediately thereafter "complete".

* The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even
  if it's possible to assert that the server has definitely seen the client
  state. The allocrunner may queue up updates even if we gate sending them, so
  we end up with a race between the allocrunner updating its internal state
  to overwrite the previous update and `allocSync` sending the bogus or
  duplicate update.

This changeset adds tracking of server-acknowledged state to the
allocrunner. This state gets checked in the `allocSync` before adding the update
to the batch, and updated when `Node.UpdateAlloc` returns successfully. To
implement this we need to be able to equality-check the updates against the last
acknowledged state. We also need to add the last acknowledged state to the
client state DB, otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation
states, including allocs stopped before shutdown, allocs stopped by the server
while the client is down, and allocs that have been completely GC'd on the
server while the client is down. I've also bench tested scenarios where the task
workload is killed while the client is down, resulting in a failed restore.

Fixes #16381
2023-05-11 09:05:24 -04:00
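A toy model of checking a pending update against the last server-acknowledged state before batching it, with hypothetical types:

```go
package main

import "fmt"

// allocUpdate is a trimmed stand-in for the fields the server cares about.
type allocUpdate struct {
	ClientStatus string
	TaskStates   string // simplified: real code compares full task state
}

// acked records the last update the server acknowledged, per alloc; it is
// updated only after Node.UpdateAlloc returns successfully.
var acked = map[string]allocUpdate{}

// shouldSend skips updates the server has definitely already seen.
func shouldSend(allocID string, u allocUpdate) bool {
	last, ok := acked[allocID]
	return !ok || last != u
}

func main() {
	u := allocUpdate{ClientStatus: "running", TaskStates: "web:running"}
	fmt.Println(shouldSend("alloc1", u)) // true: never acknowledged
	acked["alloc1"] = u                  // Node.UpdateAlloc succeeded
	fmt.Println(shouldSend("alloc1", u)) // false: duplicate, drop it
}
```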
Seth Hoenig
2272ca9167 client: unveil /etc/ssh/ssh_known_hosts for artifact downloads (#17122)
This PR fixes a bug where nodes configured with populated
/etc/ssh/ssh_known_hosts files would be unable to read them during
artifact downloading.

Fixes #17086
2023-05-09 09:43:52 -05:00
Daniel Bennett
c2dc1c58dd full task cleanup when alloc prerun hook fails (#17104)
to avoid leaking task resources (e.g. containers,
iptables) if allocRunner prerun fails during
restore on client restart.

now if prerun fails, TaskRunner.MarkFailedKill()
will only emit an event, mark the task as failed,
and cancel the tr's killCtx, so then ar.runTasks()
-> tr.Run() can take care of the actual cleanup.

removed from (formerly) tr.MarkFailedDead(),
now handled by tr.Run():
 * set task state as dead
 * save task runner local state
 * task stop hooks

also done in tr.Run() now that it's not skipped:
 * handleKill() to kill tasks while respecting
   their shutdown delay, and retrying as needed
   * also includes task preKill hooks
 * clearDriverHandle() to destroy the task
   and associated resources
 * task exited hooks
2023-05-08 13:17:10 -05:00
Tim Gross
2aa3c746c4 logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087)
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.

This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (e.g. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.

This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.

Fixes #17076
2023-05-04 16:01:18 -04:00
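A toy illustration of the zero-value reasoning, with a hypothetical LogConfig (not Nomad's actual jobspec type): naming the field `Disabled` makes the Go zero value mean "logs enabled", so jobs loaded from old Raft snapshots, where the field is simply absent, keep collecting logs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogConfig is hypothetical; the point is the polarity of the field.
type LogConfig struct {
	Disabled bool `json:"Disabled"`
}

func main() {
	var lc LogConfig
	// A job from an old snapshot carries no such field at all:
	_ = json.Unmarshal([]byte(`{}`), &lc)
	fmt.Println("collect logs:", !lc.Disabled) // true: the safe default
}
```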
Seth Hoenig
c89f6a538e connect: remove unusable path for fallback envoy image names (#17044)
This PR does some cleanup of an old code path for versions of Consul that
did not support reporting the supported versions of Envoy in its API. Those
versions have been unsupported for years at this point, and the fallback
version of Envoy hasn't been supported by any version of Consul for almost
as long. Remove this code path, which is no longer useful.
2023-05-02 09:48:44 -05:00
Seth Hoenig
7744caed48 connect: use explicit docker.io prefix in default envoy image names (#17045)
This PR modifies references to the envoyproxy/envoy docker image to
explicitly include the docker.io prefix. This does not affect existing
users, but makes things easier for Podman users, who otherwise need to
specify the full name because Podman does not default to docker.io.
2023-05-02 09:27:48 -05:00
Luiz Aoqui
ee5a08dbb2 Revert "hashicorp/go-msgpack v2 (#16810)" (#17047)
This reverts commit 8a98520d56.
2023-05-01 17:18:34 -04:00
Seth Hoenig
0b3bd454ec connect: do not restrict auto envoy version to docker task driver (#17041)
This PR updates the envoy_bootstrap_hook to no longer disable itself if
the task driver in use is not docker. In other words, it now works for
podman and other image-based task drivers. The hook now only checks that

1. the task is a connect sidecar
2. the task.config block contains an "image" field
2023-05-01 15:07:35 -05:00
Seth Hoenig
889c5aa0f7 services: un-mark group services as deregistered if restart hook runs (#16905)
* services: un-mark group services as deregistered if restart hook runs

This PR may fix a bug where group services will never be deregistered if the
group undergoes a task restart.

* e2e: add test case for restart and deregister group service

* cl: add cl

* e2e: add wait for service list call
2023-04-24 14:24:51 -05:00
Tim Gross
30bc456f03 logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
valodzka
b4e6a70fe6 fix host port handling for ipv6 (#16723) 2023-04-20 19:53:20 -07:00
Etienne Bruines
f7730beb64 cni: fix plugin fingerprinting versions (#16776)
CNI plugins v1.2.0 and above output a second line, containing supported protocol versions.
2023-04-20 18:44:39 -07:00
astudentofblake
206236039c fix: added landlock access to /usr/libexec for getter (#16900) 2023-04-20 11:16:04 -05:00
Luiz Aoqui
3f7143f50b allocrunner: prevent panic on network manager (#16921) 2023-04-18 13:39:13 -07:00
Ian Fijolek
8a98520d56 hashicorp/go-msgpack v2 (#16810)
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0

Fixes #16808

* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack

* deps: use go-msgpack v2.0.0

go-msgpack v2.1.0 includes some code changes that we will need to
investigate further to assess their impact on Nomad, so keep this
dependency at v2.0.0 for now since it's a no-op.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-17 17:02:05 -04:00
Seth Hoenig
ed0dfd2ffb users: eliminate nobody user memoization (#16904)
This PR eliminates code specific to looking up and caching the uid/gid/user.User
object associated with the nobody user in an init block. This code existed before
the generic users cache was added and was meant to optimize the one search path we
knew would happen often. Now that we have the cache, it seems reasonable to
eliminate this init block and use the cache instead, as for any other user.

Also fixes a constraint on the podman (and other) drivers, where building without
CGO became problematic on some OSes like Fedora IoT, where the nobody user cannot
be found with the pure-Go standard library.

Fixes github.com/hashicorp/nomad-driver-podman/issues/228
2023-04-17 12:30:30 -05:00
Tim Gross
c3002db815 client: allow drain_on_shutdown configuration (#16827)
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
2023-04-14 15:35:32 -04:00