nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-07 02:45:42 +03:00

Author	SHA1	Message	Date
Tim Gross	cc9227b858	template: fix panic in change_mode=script on client restart (#24057 ) When we introduced change_mode=script to templates, we passed the driver handle down into the template manager so we could call its `Exec` method directly. But the lifecycle of the driver handle is managed by the taskrunner and isn't available when the template manager is first created. This has led to a series of patches trying to fixup the behavior (#15915, #15192, #23663, #23917). Part of the challenge in getting this right is using an interface to avoid the circular import of the driver handle. But the taskrunner already has a way to deal with this problem using a "lazy handle". The other template change modes already use this indirectly through the `Lifecycle` interface. Change the driver handle `Exec` call in the template manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This eliminates the need for the template manager to know anything at all about the handle state. Fixes: https://github.com/hashicorp/nomad/issues/24051	2024-09-25 08:59:01 -04:00
Michael Schurter	8f564182ef	connect: rewrite envoy bootstrap on every restart (#19787 ) Fixes #19781 Do not mark the envoy bootstrap hook as done after successfully running once. Since the bootstrap file is written to /secrets, which is a tmpfs on supported platforms, it is not persisted across reboots. This causes the task and allocation to fail on reboot (see #19781). This fixes it by always rewriting the envoy bootstrap file every time the Nomad agent starts. This does mean we may write a new bootstrap file to an already running Envoy task, but in my testing that doesn't have any impact. This commit doesn't necessarily fix every use of Done by hooks, but hopefully improves the situation. The comment on Done has been expanded to hopefully avoid misuse in the future. Done assertions were removed from tests as they add more noise than value. Alternative 1: Use a regular file An alternative approach would be to write the bootstrap file somewhere other than the tmpfs, but this is unsafe as when Consul ACLs are enabled the file will contain a secret token: https://developer.hashicorp.com/consul/commands/connect/envoy#bootstrap Alternative 2: Detect if file is already written An alternative approach would be to detect if the bootstrap file exists, and only write it if it doesn't. This is just a more complicated form of the current fix. I think in general in the absence of other factors task hooks should be idempotent and therefore able to rerun on any agent startup. This simplifies the code and our ability to reason about task restarts vs agent restarts vs node reboots by making them all take the same code path.	2024-01-24 11:26:31 -08:00
hashicorp-copywrite[bot]	2d35e32ec9	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:15 -05:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Michael Schurter	9bab96ebd3	Task API via Unix Domain Socket (#15864 ) This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled. This PR contains the core feature and tests but followup work is required for the following TODO items: - Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink - Unit tests for auth middleware - Caching for auth middleware - Rate limiting on negative lookups for auth middleware --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-02-06 11:31:22 -08:00
Yorick Gersie	24a575ab80	Allow per_alloc to be used with host volumes (#15780 ) Disallowing per_alloc for host volumes in some cases makes life of a nomad user much harder. When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across restarts we need to make sure allocation placement is consistent. With CSI volumes we can use the `per_alloc` feature but for some reason this is explicitly disabled for host volumes. Ensure host volumes understand the concept of per_alloc	2023-01-26 09:14:47 -05:00
Tim Gross	d3e9b9ac7e	workload identity (#13223 ) In order to support implicit ACL policies for tasks to get their own secrets, each task would need to have its own ACL token. This would add extra raft overhead as well as new garbage collection jobs for cleaning up task-specific ACL tokens. Instead, Nomad will create a workload Identity Claim for each task. An Identity Claim is a JSON Web Token (JWT) signed by the server’s private key and attached to an Allocation at the time a plan is applied. The encoded JWT can be submitted as the X-Nomad-Token header to replace ACL token secret IDs for the RPCs that support identity claims. Whenever a key is is added to a server’s keyring, it will use the key as the seed for a Ed25519 public-private private keypair. That keypair will be used for signing the JWT and for verifying the JWT. This implementation is a ruthlessly minimal approach to support the secure variables feature. When a JWT is verified, the allocation ID will be checked against the Nomad state store, and non-existent or terminal allocation IDs will cause the validation to be rejected. This is sufficient to support the secure variables feature at launch without requiring implementation of a background process to renew soon-to-expire tokens.	2022-07-11 13:34:05 -04:00
Tim Gross	631db25e4a	task runner: fix goroutine leak in prestart hook (#11741 ) The task runner prestart hooks take a `joincontext` so they have the option to exit early if either of two contexts are canceled: from killing the task or client shutdown. Some tasks exit without being shutdown from the server, so neither of the joined contexts ever gets canceled and we leak the `joincontext` (48 bytes) and its internal goroutine. This primarily impacts batch jobs and any task that fails or completes early such as non-sidecar prestart lifecycle tasks. Cancel the `joincontext` after the prestart call exits to fix the leak.	2021-12-23 11:50:51 -05:00
Michael Schurter	43909b1374	Revert "Revert "Use joint context to cancel prestart hooks""	2019-10-08 11:34:09 -07:00
Michael Schurter	680e30457f	Revert "Use joint context to cancel prestart hooks"	2019-10-08 11:27:08 -07:00
Drew Bailey	12be12020e	simplify logic to check for vault read event defer shutdown to cleanup after failed run Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> update comment to include ctx note for shutdown	2019-09-30 11:02:14 -07:00
Michael Schurter	e9e1d788b5	client: cleanup and document context uses Some of the context uses in TR hooks are useless (Killed during Stop never seems meaningful). None of the hooks are interruptable for graceful shutdown which is unfortunate and probably needs fixing.	2019-03-12 15:03:54 -07:00
Michael Schurter	7970b224b0	client: emit event and call exited hooks during cleanup Builds upon earlier commit that cleans up restored handles of terminal allocs by also emitting terminated events and calling exited hooks when appropriate.	2019-03-05 15:12:02 -08:00
Michael Schurter	db9daf6631	client: ensure task is cleaned up when terminal This commit is a significant change. TR.Run is now always executed, even for terminal allocations. This was changed to allow TR.Run to cleanup (run stop hooks) if a handle was recovered. This is intended to handle the case of Nomad receiving a DesiredStatus=Stop allocation update, persisting it, but crashing before stopping AR/TR. The commit also renames task runner hook data as it was very easy to accidently set state on Requests instead of Responses using the old field names.	2019-03-01 14:00:23 -08:00
Preetha Appan	a1cc48f78d	Rename TaskKillRequest/Response to TaskPreKillRequest/Response	2019-01-22 09:54:02 -06:00
Preetha Appan	bf9a2168e7	Rename TaskKillHook to TaskPreKillHook to more closely match usage Also added/fixed comments	2019-01-22 09:41:21 -06:00
Nick Ethier	fbf9a4c772	executor: implement streaming stats API plugins/driver: update driver interface to support streaming stats client/tr: use streaming stats api TODO: * how to handle errors and closed channel during stats streaming * prevent tight loop if Stats(ctx) returns an error drivers: update drivers TaskStats RPC to handle streaming results executor: better error handling in stats rpc docker: better control and error handling of stats rpc driver: allow stats to return a recoverable error	2019-01-12 12:18:22 -05:00
Mahmood Ali	c0162fab35	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Danielle Tomlinson	03db4cf82d	client: Rename drivers/shared/env => client/taskenv	2018-11-30 12:18:39 +01:00
Danielle Tomlinson	6756ffd052	drivers: Move client/drivers/env to drivers/shared/env As part of deprecating legacy drivers, we're moving the env package to a new drivers/shared tree, as it is used by the modern docker and rkt driver packages, and is useful for 3rd party plugins.	2018-11-30 10:46:13 +01:00
Alex Dadgar	429c5bb885	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Michael Schurter	740ca8e6ca	client: fix tr lifecycle logic and shutdown delay ShutdownDelay must be honored whenever the task is killed or restarted. Services were not being deregistered prior to restarting.	2018-11-05 12:32:05 -08:00
Michael Schurter	a44e82f326	tr: implement dispatch payload hook Now passing the TaskDir struct to prestart hooks instead of just the root task dir itself as dispatch needs local/.	2018-10-16 16:56:56 -07:00
Nick Ethier	ea9ed2282e	client: refactor post allocrunnerv2 finalization	2018-10-16 16:56:56 -07:00
Nick Ethier	d335a82859	client: begin driver plugin integration client: fingerprint driver plugins	2018-10-16 16:56:56 -07:00
Alex Dadgar	3a492bb33f	allocrunnerv2 -> allocrunner	2018-10-16 16:56:56 -07:00

26 Commits