nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 00:45:43 +03:00

Author	SHA1	Message	Date
Tim Gross	02d26ceb1a	CSI: set plugin `CSI_ENDPOINT` env var only if unset by user (#12257 ) * Use unix:// prefix for CSI_ENDPOINT variable by default * Some plugins have strict validation over the format of the `CSI_ENDPOINT` variable, and unfortunately not all plugins agree. Allow the user to override the `CSI_ENDPOINT` to work around those cases. * Update all demos and tests with CSI_ENDPOINT	2022-03-21 11:48:47 -04:00
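The override behavior described in #12257 can be sketched as below. This is a minimal illustration rather than Nomad's actual code: the `csiEndpoint` function, the `userEnv` map, and the socket path argument are hypothetical names, and treating an empty value as "unset" is an assumption. Only the `CSI_ENDPOINT` variable name and the `unix://` default prefix come from the commit message.

```go
package main

// csiEndpoint returns the value to export as CSI_ENDPOINT for a plugin task.
// Hypothetical helper: userEnv stands in for the task's user-supplied
// environment and socketPath for the plugin's control socket path.
func csiEndpoint(userEnv map[string]string, socketPath string) string {
	// Respect a user-provided value, since plugins disagree on the format.
	// Assumption: an empty value is treated the same as an unset one.
	if v, ok := userEnv["CSI_ENDPOINT"]; ok && v != "" {
		return v
	}
	// Otherwise fall back to the unix:// prefixed socket path.
	return "unix://" + socketPath
}
```

A plugin that rejects the `unix://` prefix could then be given a bare path through the task's environment, and the default would not overwrite it.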
James Rasell	74c886064f	Merge pull request #12307 from hashicorp/b-groupservices-avoid-double-tg-lookup client: avoid double group lookup within groupservice hook setup.	2022-03-16 17:00:01 +01:00
James Rasell	98e7430086	client: avoid double group lookup within groupservice hook setup.	2022-03-16 09:42:57 +01:00
Seth Hoenig	b242957990	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Tim Gross	7d0f87b910	CSI: allow updates to volumes on re-registration (#12167 ) CSI `CreateVolume` RPC is idempotent given that the topology, capabilities, and parameters are unchanged. CSI volumes have many user-defined fields that are immutable once set, and many fields that are not user-settable. Update the `Register` RPC so that updating a volume via the API merges onto any existing volume without touching Nomad-controlled fields, while validating it with the same strict requirements expected for idempotent `CreateVolume` RPCs. Also, clarify that this state store method is used for everything, not just for the `Register` RPC.	2022-03-07 11:06:59 -05:00
Tim Gross	907c795874	CSI: set plugin socket path on restore (#12149 ) The Prestart hook for task runner hooks doesn't get called when we restore a task, because the task is already running. The Postrun hook for CSI plugin supervisors needs the socket path to have been populated so that the client has a valid path.	2022-03-01 10:22:52 -05:00
Tim Gross	03a8d72dba	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
Tim Gross	649f1e3967	CSI: retry claims from client when max claims are reached (#12113 ) When the alloc runner claims a volume, an allocation for a previous version of the job may still have the volume claimed because it's still shutting down. In this case we'll receive an error from the server. Retry this error until we succeed or until a very long timeout expires, to give operators a chance to recover broken plugins. Make the alloc runner hook tolerant of temporary RPC failures.	2022-02-24 10:39:07 -05:00
Seth Hoenig	a6cc062c14	client: resolve rebase conflict	2022-02-23 14:32:32 -06:00
Seth Hoenig	b2fe196e42	agent: switch to go.etcd.io/bbolt for state store This PR modifies the server and client agents to use `go.etcd.io/bbolt` as the implementation for their state stores.	2022-02-23 14:28:31 -06:00
Tim Gross	7bcf0afd81	CSI: allow for concurrent plugin allocations (#12078 ) The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing. Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.	2022-02-23 15:23:07 -05:00
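The registry change in #12078 can be pictured with the simplified sketch below, in which the newest allocation for a plugin becomes the active instance and older ones stay recorded until they deregister. The `pluginRegistry` type, its methods, and the string keys are all hypothetical; the real registry tracks richer state and handles concurrency, which this sketch does not.

```go
package main

// pluginRegistry is a hypothetical, simplified registry keyed by plugin
// type + ID. Each key maps to allocation IDs ordered oldest to newest.
// It is not safe for concurrent use.
type pluginRegistry struct {
	allocs map[string][]string
}

func newPluginRegistry() *pluginRegistry {
	return &pluginRegistry{allocs: make(map[string][]string)}
}

// Register records allocID for key; the newest registration becomes active.
func (r *pluginRegistry) Register(key, allocID string) {
	r.Deregister(key, allocID) // avoid duplicates on re-registration
	r.allocs[key] = append(r.allocs[key], allocID)
}

// Deregister removes allocID, leaving any other allocations in place.
func (r *pluginRegistry) Deregister(key, allocID string) {
	kept := r.allocs[key][:0]
	for _, id := range r.allocs[key] {
		if id != allocID {
			kept = append(kept, id)
		}
	}
	if len(kept) == 0 {
		delete(r.allocs, key)
		return
	}
	r.allocs[key] = kept
}

// Active returns the most recently registered allocation for key, if any.
func (r *pluginRegistry) Active(key string) (string, bool) {
	ids := r.allocs[key]
	if len(ids) == 0 {
		return "", false
	}
	return ids[len(ids)-1], true
}
```

If the newer allocation stops first during an update, `Active` falls back to the previous allocation, which is the interleaving the commit message is concerned with.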
Tim Gross	89ca3d9d75	csi: don't wait to fire initial unmount RPC (#12102 ) In PR #11892 we updated the `csi_hook` to unmount the volume locally via the CSI node RPCs before releasing the claim from the server. The timer for this hook was initialized with the retry time, forcing us to wait 1s before making the first unmount RPC calls. Use the new helper for timers to ensure we clean up the timer nicely.	2022-02-22 13:43:06 -05:00
Michael Schurter	2411d3afd2	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without incrementing `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either dead code or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices -- not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Michael Schurter	bdeea4b0db	Merge pull request #11975 from hashicorp/f-connect-debugging connect: write envoy bootstrap debugging info	2022-02-18 13:56:22 -08:00
Seth Hoenig	efee15f13f	connect: bootstrap envoy using -proxy-id This PR modifies the Consul CLI arguments used to bootstrap envoy for Connect sidecars to make use of '-proxy-id' instead of '-sidecar-for'. Nomad registers the sidecar service, so we know what ID it has. The '-sidecar-for' was intended for use when you only know the name of the service for which the sidecar is being created. The improvement here is that using '-proxy-id' does not require an underlying request for listing Consul services. This will make the interaction between Nomad and Consul more efficient. Closes #10452	2022-02-18 14:58:23 -06:00
Michael Schurter	d47678074b	connect: write envoy bootstrap debugging info When Consul Connect just works, it's wonderful. When it doesn't work it can be exceedingly difficult to debug: operators have to check task events, Nomad logs, Consul logs, Consul APIs, and even then critical information is missing. Using Consul to generate a bootstrap config for Envoy is notoriously difficult. Nomad doesn't even log stderr, so operators are left trying to piece together what went wrong. This patch attempts to provide maximal context which unfortunately includes secrets. Secrets are always restricted to the secrets/ directory. This makes debugging a little harder, but allows operators to know exactly what operation Nomad was trying to perform. What's added: - stderr is sent to alloc/logs/envoy_bootstrap.stderr.0 - the CLI is written to secrets/.envoy_bootstrap.cmd - the environment is written to secrets/.envoy_bootstrap.env as JSON Accessing this information is unfortunately awkward: ``` nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.env nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.cmd nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0 ``` The above assumes an alloc id that starts with `b36a` and a Connect sidecar proxy for a service named `count-countdash`. If the alloc is unable to start successfully, the debugging files are only accessible from the host filesystem.	2022-02-18 12:02:36 -08:00
Tiernan	1fabefd27e	interpolate network.dns block on client (#12021 )	2022-02-16 08:39:44 -05:00
Tim Gross	b775a73ded	CSI: make gRPC client creation more robust (#12057 ) Nomad communicates with CSI plugin tasks via gRPC. The plugin supervisor hook uses this to ping the plugin for health checks which it emits as task events. After the first successful health check the plugin supervisor registers the plugin in the client's dynamic plugin registry, which in turn creates a CSI plugin manager instance that has its own gRPC client for fingerprinting the plugin and sending mount requests. If the plugin manager instance fails to connect to the plugin on its first attempt, it exits. The plugin supervisor hook is unaware that connection failed so long as its own pings continue to work. A transient failure during plugin startup may mislead the plugin supervisor hook into thinking the plugin is up (so there's no need to restart the allocation) but no fingerprinter is started. * Refactors the gRPC client to connect on first use. This provides the plugin manager instance the ability to retry the gRPC client connection until success. * Add a 30s timeout to the plugin supervisor so that we don't poll forever waiting for a plugin that will never come back up. Minor improvements: * The plugin supervisor hook creates a new gRPC client for every probe and then throws it away. Instead, reuse the client as we do for the plugin manager. * The gRPC client constructor has a 1 second timeout. Clarify that this timeout applies to the connection and not the rest of the client lifetime.	2022-02-15 16:57:29 -05:00
James Rasell	282eb10a40	Merge pull request #12052 from hashicorp/b-taskrunner-track-deregistered-call client: track service deregister call so it's only called once.	2022-02-14 09:01:26 +01:00
Tim Gross	16baefcb45	csi: provide `CSI_ENDPOINT` env var to plugins (#12050 ) The CSI specification says: > The CO SHALL provide the listen-address for the Plugin by way of the `CSI_ENDPOINT` environment variable. Note that plugins without filesystem isolation won't have the plugin dir bind-mounted to their alloc dir, but we can provide a path to the socket anyways. Refactor to use opts struct for plugin supervisor hook config. The parameter list for configuring the plugin supervisor hook has grown enough that it makes sense to use an options struct similar to many of the other task runner hooks (ex. template).	2022-02-11 08:46:21 -05:00
James Rasell	72f411c986	client: track service deregister call so it's only called once. In certain task lifecycles the taskrunner service deregister call could be called three times for a task that is exiting. Whilst each hook caller of deregister has its own purpose, we should try and ensure it is only called once during the shutdown lifecycle of a task. This change therefore tracks when deregister has been called, so that subsequent calls are noop. In the event the task is restarting, the deregister value is reset to ensure proper operation.	2022-02-11 09:29:38 +01:00
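The "call once, reset on restart" tracking described in this commit could be sketched as follows. The `serviceTracker` type and its fields are hypothetical and only illustrate the idea of a guarded flag; they are not the task runner's actual structure.

```go
package main

import "sync"

// serviceTracker is a hypothetical stand-in for the task runner state that
// records whether service deregistration has already happened.
type serviceTracker struct {
	mu           sync.Mutex
	deregistered bool
	deregister   func() // the real deregistration work
}

// Deregister runs the deregistration at most once until Reset is called.
func (t *serviceTracker) Deregister() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.deregistered {
		return // subsequent calls are no-ops
	}
	t.deregister()
	t.deregistered = true
}

// Reset clears the flag, for example when the task restarts.
func (t *serviceTracker) Reset() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.deregistered = false
}
```

A plain `sync.Once` would not fit here because it cannot be reset, which is why the sketch uses a mutex and a boolean instead.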
Luiz Aoqui	bc333c2560	Merge tag 'v1.2.6' into merge-release-1.2.6-branch Version 1.2.6	2022-02-10 14:55:34 -05:00
Seth Hoenig	b3c0e6a7a5	client: check escaping of alloc dir using symlinks This PR adds symlink resolution when doing validation of paths to ensure they do not escape client allocation directories.	2022-02-09 19:50:13 -05:00
Seth Hoenig	6445da9baf	client: fix race condition in use of go-getter go-getter creates a circular dependency between a Client and Getter, which means each is inherently thread-unsafe if you try to re-use one or the other. This PR fixes Nomad to no longer make use of the default Getter objects provided by the go-getter package. Nomad must create a new Client object on every artifact download, as the Client object controls the Src and Dst among other things. When calling Client.Get, the Getter modifies its own Client reference, creating the circular reference and race condition. We can still achieve most of the desired connection caching behavior by re-using a shared HTTP client with transport pooling enabled.	2022-02-09 19:48:28 -05:00
Kevin Schoonover	6633f8d908	fingerprint: remove metadata from digitalocean (#12032 )	2022-02-09 07:31:45 -05:00
Tim Gross	79e8d394b4	fingerprint: digitalocean fingerprint test requires metadata header (#12028 )	2022-02-08 16:35:13 -05:00
Seth Hoenig	652de761bf	env: update aws cpu configs By running the tools/ec2info tool	2022-02-08 12:44:00 -06:00
Kevin Schoonover	5cea36639d	address comments Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>	2022-02-07 09:03:48 -08:00
Kevin Schoonover	7b6f9540db	small fixes	2022-02-05 22:23:43 -08:00
Kevin Schoonover	4d4c839796	add digitalocean fingerprinter	2022-02-05 22:17:36 -08:00
Karthick Ramachandran	16485f4071	improve error message on service length (#12012 )	2022-02-04 19:39:34 -05:00
Seth Hoenig	4f56d81ce2	Merge pull request #11983 from hashicorp/b-select-after cleanup: prevent leaks from time.After	2022-02-03 09:38:06 -06:00
Samantha	37c14b2a30	Fix health checking for ephemeral poststart tasks (#11945 ) Update the logic in the Nomad client's alloc health tracker which erroneously marks existing healthy allocations with dead poststart ephemeral tasks as unhealthy even if they were already successful during a previous deployment.	2022-02-02 16:29:49 -05:00
Seth Hoenig	c1e033c8c6	cleanup: prevent leaks from time.After This PR replaces use of time.After with a safe helper function that creates a time.Timer to use instead. The new function returns both a time.Timer and a Stop function that the caller must handle. Unlike time.NewTimer, the helper function does not panic if the duration set is <= 0.	2022-02-02 14:32:26 -06:00
Seth Hoenig	97176a5654	deps: import libtime the normal way Previously we copied this library by hand to avoid vendor-ing a bunch of files related to minimock. Now that we no longer vendor, just import the library normally. Also we might use more of the library for handling `time.After` uses, for which this library provides a Context-based solution.	2022-01-31 14:49:05 -06:00
Tim Gross	707b4b3e0e	CSI: node unmount from the client before unpublish RPC (#11892 ) When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.	2022-01-28 14:43:58 -05:00
Derek Strickland	143fb90e4c	Update IsEmpty to check for pre-1.2.4 fields (#11930 )	2022-01-28 14:41:49 -05:00
Tim Gross	8364eda1d7	CSI: node unmount from the client before unpublish RPC (#11892 ) When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.	2022-01-28 08:30:31 -05:00
Seth Hoenig	87d54b8c21	client: change test to not poke cgroupv2 edge case This PR tweaks the TestCpusetManager_AddAlloc unit test to not break when being run on a machine using cgroupsv2. The behavior of writing an empty cpuset.cpu changes in cgroupv2, where such a group now inherits the value of its parent group, rather than remaining empty. The test in question was written such that a task would consume all available cores shared on an alloc, causing the empty set to be written to the shared group, which works fine on cgroupsv1 but breaks on cgroupsv2. By adjusting the test to consume only 1 core instead of all cores, it no longer triggers that edge case. The actual fix for the new cgroupsv2 behavior will be in #11933	2022-01-27 08:27:40 -06:00
Derek Strickland	a30c7dd56b	Update IsEmpty to check for pre-1.2.4 fields (#11930 )	2022-01-26 11:31:37 -05:00
Nomad Release bot	9f21b724ac	Generate files for 1.2.4 release	2022-01-18 23:43:00 +00:00
Seth Hoenig	87dbc7162b	deps: upgrade docker and runc This PR upgrades - docker dependency to the latest tagged release (v20.10.12) - runc dependency to the latest tagged release (v1.0.3) Docker does not abide by [semver](https://github.com/moby/moby/issues/39302), so it is marked +incompatible, and transitive dependencies are upgraded manually. Runc made three relevant breaking changes * cgroup manager .Set changed to accept Resources instead of Cgroup `3f65946756` * config.Device moved to devices.Device https://github.com/opencontainers/runc/pull/2679 * mountinfo.Mounted now returns an error if the specified path does not exist https://github.com/moby/sys/blob/mountinfo/v0.5.0/mountinfo/mountinfo.go#L16	2022-01-18 08:35:26 -06:00
James Rasell	eee5d90e8b	Merge pull request #11402 from hashicorp/document-client-initial-vault-renew taskrunner: add clarifying initial vault token renew comment.	2022-01-13 16:21:58 +01:00
Alessandro De Blasis	759397533a	metrics: added `mapped_file` metric (#11500 ) Signed-off-by: Alessandro De Blasis <alex@deblasis.net> Co-authored-by: Nate <37554478+servusdei2018@users.noreply.github.com>	2022-01-10 15:35:19 -05:00
grembo	e9032c10d3	Un-break templates when using vault stanza change_mode noop (#11783 ) Templates in nomad jobs make use of the vault token defined in the vault stanza when issuing credentials like client certificates. When using change_mode "noop" in the vault stanza, consul-template is not informed in case a vault token is re-issued (which can happen from time to time for various reasons, as described in https://www.nomadproject.io/docs/job-specification/vault). As a result, consul-template will keep using the old vault token to renew credentials and - once the token expired - stop renewing credentials. The symptom of this problem is a vault_token file that is newer than the issued credential (e.g., TLS certificate) in a job's /secrets directory. This change corrects this, so that h.updater.updatedVaultToken(token) is called, which will inform stakeholders about the new token and make sure, the new token is used by consul-template. Example job template fragment: vault { policies = ["nomad-job-policy"] change_mode = "noop" } template { data = <<-EOH {{ with secret "pki_int/issue/nomad-job" "common_name=myjob.service.consul" "ttl=90m" "alt_names=localhost" "ip_sans=127.0.0.1"}} {{ .Data.certificate }} {{ .Data.private_key }} {{ .Data.issuing_ca }} {{ end }} EOH destination = "${NOMAD_SECRETS_DIR}/myjob.crt" change_mode = "noop" } This fix does not alter the meaning of the three change modes of vault - "noop" - Take no action - "restart" - Restart the job - "signal" - send a signal to the task as the switch statement following line 232 contains the necessary logic. It is assumed that "take no action" was never meant to mean "don't tell consul-template about the new vault token". Successfully tested in a staging cluster consisting of multiple nomad client nodes.	2022-01-10 14:41:38 -05:00
Conor Evans	31978a0366	replace 'a alloc' with 'an alloc' where appropriate (#11792 )	2022-01-10 11:59:46 -05:00
Derek Strickland	43edd0e709	Expose Consul template configuration parameters (#11606 ) This PR exposes the following existing `consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza. - `wait` It also exposes the following `consul-template` configuration to Nomad operators in the `{client.template}` stanza. - `max_stale` - `block_query_wait` - `consul_retry` - `vault_retry` - `wait` Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure. - `wait_bounds` Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-01-10 10:19:07 -05:00
Tim Gross	d27b1370ae	CSI: tests to exercise csi_hook (#11788 ) Small refactoring of the allocrunner hook for CSI to make it more testable, and a unit test that covers most of its logic.	2022-01-07 15:23:47 -05:00
Arkadiusz	aa21628488	Fix log streaming missing frames (#11721 ) Perform one more read after receiving cancel when streaming file from the allocation API	2022-01-04 14:07:16 -05:00
Tim Gross	631db25e4a	task runner: fix goroutine leak in prestart hook (#11741 ) The task runner prestart hooks take a `joincontext` so they have the option to exit early if either of two contexts are canceled: from killing the task or client shutdown. Some tasks exit without being shutdown from the server, so neither of the joined contexts ever gets canceled and we leak the `joincontext` (48 bytes) and its internal goroutine. This primarily impacts batch jobs and any task that fails or completes early such as non-sidecar prestart lifecycle tasks. Cancel the `joincontext` after the prestart call exits to fix the leak.	2021-12-23 11:50:51 -05:00

4495 Commits