When we introduced change_mode=script to templates, we passed the driver handle
down into the template manager so we could call its `Exec` method directly. But
the lifecycle of the driver handle is managed by the taskrunner and isn't
available when the template manager is first created. This has led to a series
of patches trying to fix up the behavior (#15915, #15192, #23663, #23917). Part
of the challenge in getting this right is using an interface to avoid the
circular import of the driver handle.
But the taskrunner already has a way to deal with this problem using a "lazy
handle". The other template change modes already use this indirectly through the
`Lifecycle` interface. Change the driver handle `Exec` call in the template
manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This
eliminates the need for the template manager to know anything at all about the
handle state.
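Roughly, the change has the following shape; this is a sketch, and the interface name, `Exec` signature, and template manager fields here are illustrative rather than Nomad's actual code:

```go
package template

import (
	"fmt"
	"time"
)

// TaskLifecycle is an illustrative stand-in for the taskrunner's lifecycle
// interface: it resolves the driver handle lazily at call time, so callers
// never hold the handle themselves.
type TaskLifecycle interface {
	Exec(timeout time.Duration, cmd string, args []string) ([]byte, int, error)
}

// TemplateManager holds only the lifecycle, not the driver handle.
type TemplateManager struct {
	lifecycle TaskLifecycle
}

// runChangeScript executes a change_mode=script command through the lazy
// handle; the template manager no longer tracks whether the handle exists.
func (tm *TemplateManager) runChangeScript(cmd string, args []string) error {
	out, code, err := tm.lifecycle.Exec(10*time.Second, cmd, args)
	if err != nil {
		return fmt.Errorf("change script failed: %w", err)
	}
	if code != 0 {
		return fmt.Errorf("change script exited %d: %s", code, out)
	}
	return nil
}
```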
Fixes: https://github.com/hashicorp/nomad/issues/24051
For templates with `change_mode = "script"`, we set a driver handle in the
poststart method, so the template runner can execute the script inside the
task. But when the client is restarted and the template contents change during
that window, we trigger a change_mode in the prestart method. In that case, the
hook will not have the handle and so returns an error trying to run the change
mode.
We restore the driver handle before we call any prestart hooks, so we can pass
that handle to the constructor whenever it's available. In the normal task start
case the handle will be empty, but it also won't be called.
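As an illustrative sketch of the idea (the real hook config and handle interface differ), the hook accepts the handle at construction time:

```go
package taskrunner

// DriverHandle is an illustrative stand-in for the interface the template
// runner uses to run the change_mode script inside the task.
type DriverHandle interface {
	Exec(cmd string, args []string) ([]byte, int, error)
}

// templateHookConfig carries the handle (if any) at construction time
// instead of waiting for Poststart to set it.
type templateHookConfig struct {
	// handle is non-nil only when the task was already running, e.g. after
	// a client restart where the handle was restored before prestart hooks.
	handle DriverHandle
}

type templateHook struct {
	driverHandle DriverHandle
}

func newTemplateHook(cfg *templateHookConfig) *templateHook {
	// On a fresh task start this is nil, but nothing calls it before
	// Poststart fills it in; after a client restart it is already usable.
	return &templateHook{driverHandle: cfg.handle}
}
```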
The error messages are also misleading, as there's no capabilities check
happening here. Update the error messages to match.
Fixes: https://github.com/hashicorp/nomad/issues/15851
Ref: https://hashicorp.atlassian.net/browse/NET-9338
The `consul_hook` in the allocrunner gets a separate Consul token for each task,
even if the tasks' identities have the same name, but it used the identity name
as the key to the alloc hook resources map. This meant the last task in the
group overwrote the Consul tokens of all other tasks.
Fix this by adding the task name to the key in the allocrunner's `consul_hook`,
and update the taskrunner's `consul_hook` to expect the task name in the key.
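As a sketch of the keying change (the exact key format here is illustrative, not necessarily what Nomad uses):

```go
package allocrunner

import "fmt"

// consulTokenKey is an illustrative helper: keying the alloc hook resources
// map by task name as well as identity name keeps one token per task, even
// when several tasks share an identity name.
func consulTokenKey(taskName, identityName string) string {
	return fmt.Sprintf("%s/%s", taskName, identityName)
}

// For example, tasks "web" and "sidecar" that both use an identity named
// "consul_default" now map to the distinct keys "web/consul_default" and
// "sidecar/consul_default", so the last task no longer overwrites the
// others' tokens.
```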
Fixes: https://github.com/hashicorp/nomad/issues/20374
Fixes: https://hashicorp.atlassian.net/browse/NOMAD-614
While investigating a report of possible consul-template shutdown issues (the
investigation didn't bear fruit), I found that some of the logic around template
runner shutdown is unintuitive.
* Add some doc strings to the places where someone might think we should be
obviously stopping the runner or returning early.
* Mark context argument for `Poststart`, `Stop`, and `Update` hooks as unused.
No functional code changes.
The Nomad client renders templates in the same privileged process used for most
other client operations. During internal testing, we discovered that a malicious
task can create a symlink that can cause template rendering to read and write to
arbitrary files outside the allocation sandbox. Because the Nomad agent can be
restarted without restarting tasks, we can't simply check that the path is safe
at the time we write without encountering a time-of-check/time-of-use race.
To protect Nomad client hosts from this attack, we'll now read and write
templates in a subprocess:
* On Linux/Unix, this subprocess is sandboxed via chroot to the allocation
directory. This requires that Nomad is running as a privileged process. A
non-root Nomad agent will warn that it cannot sandbox the template renderer.
* On Windows, this process is sandboxed via a Windows AppContainer which has
been granted access only to the allocation directory. This does not require
special privileges on Windows. (Creating symlinks in the first place can be
prevented by running workloads as non-Administrator or
non-ContainerAdministrator users.)
Both sandboxes cause encountered symlinks to be evaluated in the context of the
sandbox, which will result in a "file not found" or "access denied" error,
depending on the platform. This change will also require an update to
Consul-Template to allow callers to inject a custom `ReaderFunc` and
`RenderFunc`.
This design is intended as a workaround to allow us to fix this bug without
creating backwards compatibility issues for running tasks. A future version of
Nomad may introduce a read-only mount specifically for templates and artifacts
so that tasks cannot write into the same location the Nomad agent reads from.
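A heavily simplified sketch of the Linux half of this approach, assuming a root Nomad agent; the real renderer subprocess, its CLI, and its IPC with the client are not shown:

```go
//go:build linux

package renderer

import (
	"fmt"
	"os"
	"syscall"
)

// sandboxToAllocDir confines the renderer subprocess to the allocation
// directory before any template paths are opened, so symlinks are resolved
// inside the sandbox and escape attempts fail with "file not found".
func sandboxToAllocDir(allocDir string) error {
	if os.Geteuid() != 0 {
		// A non-root agent cannot chroot; the caller logs a warning instead.
		return fmt.Errorf("cannot sandbox template renderer: not running as root")
	}
	if err := syscall.Chroot(allocDir); err != nil {
		return fmt.Errorf("chroot to %q failed: %w", allocDir, err)
	}
	// After the chroot, "/" is the allocation directory.
	return os.Chdir("/")
}
```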
Fixes: https://github.com/hashicorp/nomad/issues/19888
Fixes: CVE-2024-1329
It is often expected that a task that needs access to Vault defines a
`vault` block to specify the Vault policy to use to derive a token.
But in some scenarios, like when the Nomad client is connected to a
local Vault agent that is responsible for authn/authz, the task is not
required to define a `vault` block.
In these situations, the `default` Vault cluster should be used to
render the template.
The template hook must use the Consul token for the cluster defined in
the task-level `consul` block or, if `nil`, in the group-level `consul`
block.
The Consul tokens are generated by the allocrunner consul hook, but
during the transition period we must fall back to the Nomad agent token
if workload identities are not being used.
So an empty token returned from `GetConsulTokens()` is not enough to determine
whether we should use the legacy flow (either this is an old task or the
cluster is not configured for Consul WI), or whether there is a
misconfiguration (the task or group `consul` block is using a cluster that
doesn't have an `identity` set).
In order to distinguish between the two scenarios we must iterate over
the task identities looking for one suitable for the Consul cluster
being used.
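A sketch of that check, assuming the conventional `consul_<cluster>` identity naming; the types here are illustrative stand-ins:

```go
package taskrunner

// WorkloadIdentity is an illustrative stand-in for a task identity.
type WorkloadIdentity struct {
	Name string
}

// hasConsulIdentity reports whether the task has an identity for the given
// Consul cluster. An empty token plus a matching identity points at a
// misconfiguration, while an empty token with no matching identity means we
// can fall back to the legacy Nomad agent token flow.
func hasConsulIdentity(identities []WorkloadIdentity, cluster string) bool {
	want := "consul_" + cluster
	for _, wid := range identities {
		if wid.Name == want {
			return true
		}
	}
	return false
}
```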
Add a `Postrun` and `Destroy` hook to the allocrunner's `consul_hook` to ensure
that Consul tokens we've created via WI get revoked via the logout API when
we're done with them. Also add the logout to the `Prerun` hook if we've hit an
error.
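A sketch of the revocation step using the Consul API's logout endpoint; the hook wiring around it is illustrative:

```go
package allocrunner

import consulapi "github.com/hashicorp/consul/api"

// logoutConsulTokens revokes tokens that were created through workload
// identity login. Consul's logout endpoint destroys the token used to
// authenticate the call, so each token is passed via WriteOptions.
func logoutConsulTokens(client *consulapi.Client, tokens []string) error {
	for _, secretID := range tokens {
		if _, err := client.ACL().Logout(&consulapi.WriteOptions{Token: secretID}); err != nil {
			return err
		}
	}
	return nil
}
```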
Remove the now-unused original configuration blocks for Consul and Vault from
the client. When the client needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the client's own use).
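An illustrative sketch of what such a helper can look like; the types and names here are hypothetical, not the actual client config API:

```go
package config

// defaultClusterName mirrors the name of the implicit default cluster.
const defaultClusterName = "default"

// ConsulConfig and Config are simplified stand-ins for the client config.
type ConsulConfig struct {
	Name string
	Addr string
}

type Config struct {
	// ConsulConfigs is keyed by cluster name.
	ConsulConfigs map[string]*ConsulConfig
}

// DefaultConsulConfig returns the configuration for the default cluster,
// for the client's own use rather than for a specific task or service.
func (c *Config) DefaultConsulConfig() *ConsulConfig {
	return c.ConsulConfigs[defaultClusterName]
}
```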
This is two of three changesets for this work. The remainder will implement the
same changes in the `command/agent` package.
As part of this work I discovered and fixed two bugs:
* The gRPC proxy socket that we create for Envoy is only ever created using the
default Consul cluster's configuration. This will prevent Connect from being
used with the non-default cluster.
* The Consul configuration we use for templates always comes from the default
Consul cluster's configuration, but will use the correct Consul token for the
non-default cluster. This will prevent templates from being used with the
non-default cluster.
Ref: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Fixes: https://github.com/hashicorp/nomad/issues/18984
Fixes: https://github.com/hashicorp/nomad/issues/18983
When looking up the Consul or Vault cluster from a client hook, we should always
use an accessor function rather than trying to look up the `Cluster` field, which
may be empty for jobs registered before Nomad 1.7.
Allocations that were created before Nomad 1.7 will not have the `cluster` field
set for their Vault blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.
Also add extra safety on the Consul cluster lookup.
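An illustrative accessor along these lines; field and constant names are a sketch, not necessarily Nomad's:

```go
package structs

// vaultDefaultCluster is the name used when a job predates per-cluster
// configuration.
const vaultDefaultCluster = "default"

// Vault is a simplified stand-in for the task-level vault block.
type Vault struct {
	Cluster string
}

// ClusterName is the accessor hooks should call: allocations created before
// Nomad 1.7 may have an empty Cluster field, and this falls back to the
// default cluster instead of returning "".
func (v *Vault) ClusterName() string {
	if v == nil || v.Cluster == "" {
		return vaultDefaultCluster
	}
	return v.Cluster
}
```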
In Nomad Enterprise, a task may connect to a non-default Vault cluster,
requiring `consul-template` to be configured with a specific client
`vault` block.
This feature is necessary when users want to explicitly re-render all templates
on task restart, e.g. to fetch new secrets from Vault even if the lease on the
existing secrets has not yet expired.
When the template hook Update() method is called it may recreate the template
manager if the Nomad or Vault token has been updated. This caused the new
template manager to be missing a driver handle, because the handle was only
being set in the Poststart hook, which is not called for in-place updates.
This PR protects access to `templateHook.templateManager` with its lock. So
far we have not been able to reproduce the panic - but it seems either Poststart
is running without a Prestart being run first (should be impossible), or the
Update hook is running concurrently with Poststart, nil-ing out the templateManager
in a race with Poststart.
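A sketch of the locking pattern; the field and method names here are illustrative:

```go
package taskrunner

import "sync"

// manager stands in for the template manager wrapping the runner.
type manager struct{}

type templateHook struct {
	// managerLock guards templateManager, which Update may tear down and
	// recreate while Poststart or Stop is using it.
	managerLock     sync.Mutex
	templateManager *manager
}

// withManager is called by every hook method that touches the manager, so an
// in-place update can no longer nil it out in a race with Poststart.
func (h *templateHook) withManager(fn func(*manager)) {
	h.managerLock.Lock()
	defer h.managerLock.Unlock()
	if h.templateManager == nil {
		return
	}
	fn(h.templateManager)
}
```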
Fixes #15189
In order to support implicit ACL policies for tasks to get their own
secrets, each task would need to have its own ACL token. This would
add extra raft overhead as well as new garbage collection jobs for
cleaning up task-specific ACL tokens. Instead, Nomad will create a
workload Identity Claim for each task.
An Identity Claim is a JSON Web Token (JWT) signed by the server’s
private key and attached to an Allocation at the time a plan is
applied. The encoded JWT can be submitted as the X-Nomad-Token header
to replace ACL token secret IDs for the RPCs that support identity
claims.
Whenever a key is added to a server’s keyring, it will use the key as the seed
for an Ed25519 public-private keypair. That keypair will be used for signing
and verifying the JWT.
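A minimal sketch of deriving and using such a keypair with Go's standard library; the actual JWT encoding and keyring plumbing are not shown:

```go
package keyring

import (
	"crypto/ed25519"
	"fmt"
)

// signClaims derives an Ed25519 keypair from a 32-byte keyring key and signs
// the encoded claims; the returned public key is what verification uses.
// Deriving from a seed is deterministic, so the same keyring key always
// yields the same keypair.
func signClaims(keyMaterial, encodedClaims []byte) ([]byte, ed25519.PublicKey, error) {
	if len(keyMaterial) != ed25519.SeedSize {
		return nil, nil, fmt.Errorf("key must be %d bytes, got %d", ed25519.SeedSize, len(keyMaterial))
	}
	priv := ed25519.NewKeyFromSeed(keyMaterial)
	sig := ed25519.Sign(priv, encodedClaims)
	return sig, priv.Public().(ed25519.PublicKey), nil
}
```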
This implementation is a ruthlessly minimal approach to support the
secure variables feature. When a JWT is verified, the allocation ID
will be checked against the Nomad state store, and non-existent or
terminal allocation IDs will cause the validation to be rejected. This
is sufficient to support the secure variables feature at launch
without requiring implementation of a background process to renew
soon-to-expire tokens.
This change modifies the template task runner to utilise the
new consul-template which includes Nomad service lookup template
funcs.
In order to provide security and auth to consul-template, we use
a custom HTTP dialer which is passed to consul-template when
setting up the runner. This approach follows the Vault implementation.
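A simplified sketch of the custom-dialer idea using only the standard library; how consul-template is configured to consume the resulting transport is not shown here:

```go
package template

import (
	"context"
	"net"
	"net/http"
	"time"
)

// nomadAPIDialer forces every connection the template runner makes for Nomad
// service lookups to the local agent's API address, which lets the client
// keep control of transport and auth.
type nomadAPIDialer struct {
	agentAddr string // e.g. "127.0.0.1:4646"
}

func (d *nomadAPIDialer) DialContext(ctx context.Context, network, _ string) (net.Conn, error) {
	dialer := &net.Dialer{Timeout: 10 * time.Second}
	return dialer.DialContext(ctx, network, d.agentAddr)
}

// newTransport builds the http.Transport that is then handed to the template
// runner's client configuration.
func newTransport(agentAddr string) *http.Transport {
	return &http.Transport{
		DialContext: (&nomadAPIDialer{agentAddr: agentAddr}).DialContext,
	}
}
```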
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
* adds OSS components to support the Enterprise multi-Vault namespace feature
* upgrade-specific doc on Vault multi-namespaces
* Vault docs
* update test to reflect new error
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.