The `killTasks` function kills all of the alloc runner's
task runners. If a task runner's task has already
completed, killing the task runner can cause confusion:
the task event shows that the task was signaled even
though it had already finished. To prevent this, a check
is performed when creating the task event to determine
whether the task has completed. If it has, no task event
is created, so killing the task runner adds no extra task
event.
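A minimal sketch of the completed-task check, using stand-in types rather than the real task runner structs:

```go
package main

import "fmt"

// TaskState is a minimal stand-in for the task runner's view of its task;
// the real Nomad structs differ, this only illustrates the check.
type TaskState struct {
	State  string // e.g. "pending", "running", "dead"
	Failed bool
}

// killEventFor returns a "task killing" event only when the task has not
// already finished; completed tasks get no extra event, so their history
// never shows a spurious kill or signal entry.
func killEventFor(ts TaskState) (string, bool) {
	if ts.State == "dead" {
		return "", false // task already completed on its own, emit nothing
	}
	return "Task Killing", true
}

func main() {
	if _, ok := killEventFor(TaskState{State: "dead"}); !ok {
		fmt.Println("no event for a completed task")
	}
	if ev, ok := killEventFor(TaskState{State: "running"}); ok {
		fmt.Println("event for a running task:", ev)
	}
}
```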
When we renew Vault tokens, we use the lease duration to determine how often to
renew. But we also set an `increment` value which is never updated from the
initial 30s. For periodic tokens this is not a problem because the `increment`
field is ignored on renewal. But for non-periodic tokens this prevents the token
TTL from being properly incremented. This behavior has been in place since the
initial Vault client implementation in #1606, but before the switch to workload
identity most (all?) tokens being created were periodic, so this was never
detected.
Fix this bug by updating the request's `increment` field to the lease duration
on each renewal.
Also switch out a `time.After` call in the derive-token caller's backoff loop
for a safe timer, so that we don't allocate a new timer on every iteration and
have tighter control over when it gets stopped and released.
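A rough sketch of the renewal loop after the fix, with a hypothetical `renewSelf` helper standing in for the real Vault client call and a single reusable timer in place of `time.After`:

```go
package main

import (
	"fmt"
	"time"
)

// renewSelf is a hypothetical stand-in for the Vault token self-renew call;
// it just echoes the requested increment back as the granted lease duration.
func renewSelf(incrementSeconds int) (int, error) {
	return incrementSeconds, nil
}

func main() {
	// After each renewal we feed the returned lease duration back in as the
	// next increment, so non-periodic tokens get their TTL extended properly
	// instead of being pinned to the initial 30s.
	increment := 30

	// A single reusable timer instead of time.After in the loop: time.After
	// allocates a fresh timer per iteration that isn't released until it fires.
	timer := time.NewTimer(0)
	defer timer.Stop()

	for i := 0; i < 3; i++ {
		<-timer.C

		lease, err := renewSelf(increment)
		if err != nil {
			fmt.Println("renewal failed:", err)
			return
		}

		// The fix: carry the lease duration forward as the next increment.
		increment = lease

		// Renew at half the lease duration (scaled to milliseconds for the demo).
		timer.Reset(time.Duration(lease) * time.Millisecond / 2)
		fmt.Printf("renewed, next increment=%ds\n", increment)
	}
}
```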
Ref: https://github.com/hashicorp/nomad/pull/1606
Ref: https://github.com/hashicorp/nomad/issues/25812
Batch job allocations that are drained from a node will be moved
to an eligible node. However, when no eligible nodes are available
to place the draining allocations, the tasks end up marked
complete and are not placed once an eligible node becomes
available. This occurs because the drained allocations are
stopped on the draining node at the same time as we attempt to
place them on an eligible node. Stopping the allocations on the
draining node results in their tasks being killed, but
importantly this kill does not fail the tasks. The result is
tasks reporting as complete because their state is dead but not
failed. As such, when an eligible node becomes available, all
tasks show as complete and no allocations need to be placed.
To prevent the behavior described above, a check is performed
when the alloc runner kills its tasks. If the allocation's job
type is batch and the allocation has a desired transition of
migrate, the task is failed when it is killed. This ensures the
task does not report as complete, and when an eligible node
becomes available the allocations are placed as expected.
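A minimal sketch of the new check, with illustrative field names rather than the real allocation structs:

```go
package main

import "fmt"

// Alloc is a minimal stand-in for the allocation fields the check needs;
// the real Nomad structs and field names differ.
type Alloc struct {
	JobType string // e.g. "service", "batch", "system"
	Migrate bool   // desired transition set by the node drainer
}

// shouldFailOnKill reports whether tasks killed by the alloc runner should
// also be marked failed, so a drained batch allocation is replaced once an
// eligible node appears instead of being left "complete".
func shouldFailOnKill(a Alloc) bool {
	return a.JobType == "batch" && a.Migrate
}

func main() {
	fmt.Println(shouldFailOnKill(Alloc{JobType: "batch", Migrate: true}))   // true
	fmt.Println(shouldFailOnKill(Alloc{JobType: "service", Migrate: true})) // false
}
```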
* sec: add sprig template functions in denylists
* remove explicit set which is no longer needed
* go mod tidy
* add changelog
* better changelog and filtered denylist
* go mod tidy with 1.24.4
* edit changelog and remove htpasswd and derive
* fix tests
* Update client/allocrunner/taskrunner/template/template_test.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* edit changelog
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Collecting metrics from processes is expensive, especially on platforms like
Windows. The executor code has a 5s cache of stats to ensure that we don't
thrash syscalls on nodes running many allocations. But the timestamp used to
calculate TTL of this cache was never being set, so we were always treating it
as expired. This causes excess CPU utilization on client nodes.
Ensure that when we fill the cache, we set the timestamp. In testing on Windows,
this reduces executor CPU overhead by roughly 75%.
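A minimal sketch of the TTL cache and the missing timestamp, using stand-in types; the real executor code in `procstats` differs:

```go
package main

import (
	"fmt"
	"time"
)

// statsCache is a minimal sketch of a 5s TTL cache for process stats; the
// real executor code differs, but the shape of the bug is the same: if
// fetchedAt is never set, the cache always looks expired.
type statsCache struct {
	stats     map[int]float64 // pid -> cpu fraction, illustrative only
	fetchedAt time.Time
	ttl       time.Duration
}

func (c *statsCache) get(collect func() map[int]float64) map[int]float64 {
	if time.Since(c.fetchedAt) < c.ttl {
		return c.stats // still fresh, skip the expensive per-process syscalls
	}
	c.stats = collect()
	c.fetchedAt = time.Now() // the fix: stamp the cache when we fill it
	return c.stats
}

func main() {
	collections := 0
	c := &statsCache{ttl: 5 * time.Second}
	collect := func() map[int]float64 { collections++; return map[int]float64{1: 0.5} }

	c.get(collect)
	c.get(collect)                           // served from the cache
	fmt.Println("collections:", collections) // 1
}
```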
This changeset includes two other related items:
* The `telemetry.publish_allocation_metrics` field correctly prevents a node
from publishing metrics, but the stats hook on the taskrunner still collects
the metrics, which can be expensive. Thread the configuration value into the
stats hook so that we don't collect if `telemetry.publish_allocation_metrics =
false`.
* The `linuxProcStats` type in the executor's `procstats` package is misnamed as
a result of a couple rounds of refactoring. It's used by all task executors,
not just Linux. Rename this and move a comment about how Windows processes are
listed so that the comment is closer to where the logic is implemented.
Fixes: https://github.com/hashicorp/nomad/issues/23323
Fixes: https://hashicorp.atlassian.net/browse/NMD-455
In #24658 we fixed a bug around client restarts where we would not assert
network namespaces existed and were properly configured when restoring
allocations. We introduced a call to the CNI `Check` method so that the plugins
could report correct config. But when we get an error from this call, we don't
log it unless the error is fatal. This makes it challenging to debug the case
where the initial check fails but we tear down the network and try again (as
described in #25510). Add a noisy log line here.
Ref: https://github.com/hashicorp/nomad/pull/24658
Ref: https://github.com/hashicorp/nomad/issues/25510
Some of our allocrunner hooks require a task environment for interpolating values based on the node or allocation. But several of the hooks accept an already-built environment or builder and then keep that in memory. Both of these retain a copy of all the node attributes and allocation metadata, which balloons memory usage until the allocation is GC'd.
While we'd like to look into ways to avoid keeping the allocrunner around entirely (see #25372), for now we can significantly reduce memory usage by creating the task environment on-demand when calling allocrunner methods, rather than persisting it in the allocrunner hooks.
In doing so, we uncover two other bugs:
* The WID manager, the group service hook, and the checks hook have to interpolate services for specific tasks. They mutate a taskenv builder to do so, but each time they mutate the builder, they write to the same environment map. When a group has multiple tasks, it's possible for one task to set an environment variable that is then interpolated into the service definition of another task that does not have that environment variable. Only service definition interpolation is impacted; this does not leak env vars across running tasks, as each taskrunner has its own builder.
To fix this, we move the `UpdateTask` method off the builder and onto the taskenv as the `WithTask` method. This makes a shallow copy of the taskenv with a deep clone of the environment map used for interpolation, and then overwrites the environment from the task (see the sketch after this list).
* The checks hook interpolates Nomad native service checks only on `Prerun` and not on `Update`. This could cause unexpected deregistration and registration of checks during in-place updates. To fix this, we make sure we interpolate in the `Update` method.
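A minimal sketch of the `WithTask` copy-on-write behavior described in the first item above, using stand-in types (the real Nomad `taskenv` package differs):

```go
package main

import "fmt"

// TaskEnv is a minimal stand-in for Nomad's taskenv; the real type differs.
type TaskEnv struct {
	EnvMap map[string]string
}

// WithTask returns a shallow copy of the environment with a deep clone of the
// interpolation map, then layers the task-specific variables on top. Because
// each call gets its own clone, interpolating services for one task can no
// longer leak env vars into another task's service definition.
func (t *TaskEnv) WithTask(taskVars map[string]string) *TaskEnv {
	clone := make(map[string]string, len(t.EnvMap)+len(taskVars))
	for k, v := range t.EnvMap {
		clone[k] = v
	}
	for k, v := range taskVars {
		clone[k] = v
	}
	return &TaskEnv{EnvMap: clone}
}

func main() {
	group := &TaskEnv{EnvMap: map[string]string{"NOMAD_GROUP_NAME": "web"}}

	a := group.WithTask(map[string]string{"PORT": "8080"})
	b := group.WithTask(map[string]string{"PORT": "9090"})

	fmt.Println(a.EnvMap["PORT"], b.EnvMap["PORT"]) // 8080 9090 (no cross-task bleed)
}
```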
I also bumped into an incorrectly implemented interface in the CSI hook. I've pulled that and some better guardrails out to https://github.com/hashicorp/nomad/pull/25472.
Fixes: https://github.com/hashicorp/nomad/issues/25269
Fixes: https://hashicorp.atlassian.net/browse/NET-12310
Ref: https://github.com/hashicorp/nomad/issues/25372
While working on #25373, I noticed that the CSI hook's `Destroy` method doesn't
match the interface, which means it never gets called. Because this method only
cancels any in-flight CSI requests, the only impact of this bug is that any CSI
RPCs that are in-flight when an alloc is GC'd on the client or a dev agent is
shut down won't be interrupted gracefully.
Fix the interface, but also make static assertions for all the allocrunner hooks
in the production code, so that you can make changes to interfaces and have
compile-time assistance in avoiding mistakes.
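A sketch of the compile-time assertion pattern, written against hypothetical hook interfaces; the real interface and hook names in Nomad differ:

```go
package main

// Hypothetical hook interfaces; the real ones live in Nomad's
// client/allocrunner/interfaces package and have different signatures.
type RunnerPrerunHook interface{ Prerun() error }
type RunnerDestroyHook interface{ Destroy() error }

// csiHook stands in for the production hook implementation.
type csiHook struct{}

func (h *csiHook) Prerun() error  { return nil }
func (h *csiHook) Destroy() error { return nil }

// Compile-time assertions: if a method signature drifts so that csiHook no
// longer satisfies an interface, the build fails instead of the hook being
// silently skipped at runtime.
var (
	_ RunnerPrerunHook  = (*csiHook)(nil)
	_ RunnerDestroyHook = (*csiHook)(nil)
)

func main() {}
```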
Ref: https://github.com/hashicorp/nomad/pull/25373
When a CSI plugin is launched, we probe it until the csi_plugin.health_timeout
expires (by default 30s). But if the plugin never becomes healthy, we're not
restarting the task as documented.
Update the plugin supervisor to trigger a restart instead. We still exit the
supervisor loop at that point to avoid having the supervisor send probes to a
task that isn't running yet. This requires reworking the poststart hook to allow
the supervisor loop to be restarted when the task restarts.
In doing so, I identified that we weren't respecting the task kill context from
the post start hook, which would leave the supervisor running in the window
between when a task is killed because it failed and its stop hooks were
triggered. Combine the two contexts to make sure we stop the supervisor
whichever context gets closed first.
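A sketch of combining the two contexts so the supervisor stops when either closes; the helper below is illustrative, not Nomad's actual wiring:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// mergeDone returns a context that is cancelled as soon as either parent is
// done, so the supervisor loop stops whichever of the task-kill or hook
// contexts closes first.
func mergeDone(a, b context.Context) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		select {
		case <-a.Done():
		case <-b.Done():
		}
		cancel()
	}()
	return ctx, cancel
}

func main() {
	killCtx, kill := context.WithCancel(context.Background())
	hookCtx := context.Background()

	ctx, cancel := mergeDone(killCtx, hookCtx)
	defer cancel()

	go func() { time.Sleep(10 * time.Millisecond); kill() }() // the task is killed

	<-ctx.Done()
	fmt.Println("supervisor stopped") // fires once the kill context closes
}
```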
Fixes: https://github.com/hashicorp/nomad/issues/25293
Ref: https://hashicorp.atlassian.net/browse/NET-12264
When a node is fingerprinted, we calculate a "computed class" from a hash over a
subset of its fields and attributes. In the scheduler, when a given node fails
feasibility checking (before fit checking) we know that no other node of that
same class will be feasible, and we add the hash to a map so we can reject them
early. This hash cannot include any values that are unique to a given node,
otherwise no other node will have the same hash and we'll never save ourselves
the work of feasibility checking those nodes.
In #4390 we introduced the `nomad.advertise.address` attribute and in #19969 we
introduced the `consul.dns.addr` attribute. Both of these are unique per node and
break the hash.
Additionally, when checking whether a node escaped its class, we were not
correctly filtering out attributes that start with `unique.`. The test for this
introduced in #708 had an inverted assertion, which allowed the bug to pass
unnoticed since the early days of Nomad.
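A minimal sketch of the attribute filtering applied before hashing the computed class; the specific excluded keys and helper name are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// filterUniqueAttrs drops attributes that are unique per node (anything with
// the "unique." prefix plus specific known keys) before hashing, so nodes of
// the same class produce the same computed-class hash.
func filterUniqueAttrs(attrs map[string]string) map[string]string {
	excluded := map[string]bool{
		"nomad.advertise.address": true,
		"consul.dns.addr":         true,
	}
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if strings.HasPrefix(k, "unique.") || excluded[k] {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	attrs := map[string]string{
		"cpu.arch":                "amd64",
		"unique.hostname":         "node-17",
		"nomad.advertise.address": "10.0.0.17:4646",
	}
	fmt.Println(filterUniqueAttrs(attrs)) // map[cpu.arch:amd64]
}
```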
Ref: https://github.com/hashicorp/nomad/pull/708
Ref: https://github.com/hashicorp/nomad/pull/4390
Ref: https://github.com/hashicorp/nomad/pull/19969
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.
The deprecated Vault config options used by the Nomad agent have
all been removed except for "token", which is still in use by the
Vault Transit keyring implementation.
Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
In #20165 we fixed a bug where a partially configured `client.template` retry
block would set any unset fields to nil instead of their default values. But
this patch introduced a regression in the default values, so we were now
defaulting to unlimited retries if the retry block was unset. Restore the
correct behavior and add better test coverage in both the config parsing and
the template configuration code.
Ref: https://github.com/hashicorp/nomad/pull/20165
Ref: https://github.com/hashicorp/nomad/issues/23305#issuecomment-2643731565
* Upgrade to using hashicorp/go-metrics@v0.5.4
This also requires bumping the dependencies for:
* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)
Unlike some other HashiCorp products, Nomad's root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgraded to hashicorp/go-metrics using its compat packages. This allows those importing the root module to control the metrics module used via build tags.
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.
When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.
If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).
This should fix the gap left by #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.
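A rough sketch of the restore flow, with hypothetical callbacks standing in for the CNI check, teardown, and setup operations:

```go
package main

import (
	"errors"
	"fmt"
)

// restoreNetwork sketches the restore flow: verify the existing namespace,
// and on failure tear it down and rebuild it once before giving up.
func restoreNetwork(check, destroy, create func() error) error {
	err := check()
	if err == nil {
		return nil // existing netns still matches the expected CNI config
	}
	fmt.Println("network check failed, recreating namespace:", err)

	if err := destroy(); err != nil {
		return err
	}
	if err := create(); err != nil {
		return err
	}
	// Second and final check: if this fails too, fail the restore so the
	// allocation is recreated rather than left with broken networking.
	return check()
}

func main() {
	calls := 0
	flakyCheck := func() error {
		calls++
		if calls == 1 {
			return errors.New("interface missing") // stale netns on first check
		}
		return nil
	}
	ok := func() error { return nil }

	fmt.Println(restoreNetwork(flakyCheck, ok, ok)) // <nil> after one rebuild
}
```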
Fixes: https://github.com/hashicorp/nomad/issues/24292
Ref: https://github.com/hashicorp/nomad/pull/24650
When a Nomad host reboots, the network namespace files in the tmpfs in
`/var/run` are wiped out. So when we restore allocations after a host reboot, we
need to be able to restore both the network namespace and the network
configuration. But because the netns is newly created and we need to run the CNI
plugins again, this creates potential conflicts with the IPAM plugin, which has
written state to persistent disk at `/var/lib/cni`. These IPs aren't the ones
advertised to Consul, so there's no particular reason to keep them around after
a host reboot because all virtual interfaces need to be recreated too.
Reconfigure the CNI bridge configuration to use `/var/run/cni` as its state
directory. We already expect this location to be created by CNI because the
netns files are hard-coded to be created there too in `libcni`.
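For illustration, a sketch of the IPAM portion of the rendered bridge conflist with host-local's documented `dataDir` option pointed at `/var/run/cni`; the full conflist Nomad renders has more fields and this layout is only an approximation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Only the IPAM section is shown; "dataDir" is host-local's documented
	// option for where it stores allocated-IP state, now under /var/run/cni.
	ipam := map[string]any{
		"type":    "host-local",
		"dataDir": "/var/run/cni",
		"ranges": [][]map[string]string{
			{{"subnet": "172.26.64.0/20"}},
		},
	}
	out, _ := json.MarshalIndent(ipam, "", "  ")
	fmt.Println(string(out))
}
```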
Note this does not fix the problem described for Docker in #24292 because that
appears to be related to the netns itself being restored unexpectedly from
Docker's state.
Ref: https://github.com/hashicorp/nomad/issues/24292#issuecomment-2537078584
Ref: https://www.cni.dev/plugins/current/ipam/host-local/#files
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.
The new datapoints are:
- nomad.client.alloc_hook.prerun.success (counter)
- nomad.client.alloc_hook.prerun.failed (counter)
- nomad.client.alloc_hook.prerun.elapsed (sample)
- nomad.client.task_hook.prestart.success (counter)
- nomad.client.task_hook.prestart.failed (counter)
- nomad.client.task_hook.prestart.elapsed (sample)
The hook execution time is useful to Nomad engineering and will
help us optimize code where possible and understand how job
specifications impact hook performance.
Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
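A sketch of how the prerun counters and elapsed sample might be emitted around a hook using go-metrics; the metric keys follow the datapoints listed above, while the surrounding hook plumbing is illustrative only:

```go
package main

import (
	"time"

	metrics "github.com/hashicorp/go-metrics"
)

// runPrerunHook wraps a hook and emits a success/failed counter plus an
// elapsed-time sample, roughly matching the new datapoints.
func runPrerunHook(hook func() error) error {
	start := time.Now()
	err := hook()

	if err != nil {
		metrics.IncrCounter([]string{"client", "alloc_hook", "prerun", "failed"}, 1)
	} else {
		metrics.IncrCounter([]string{"client", "alloc_hook", "prerun", "success"}, 1)
	}
	metrics.MeasureSince([]string{"client", "alloc_hook", "prerun", "elapsed"}, start)
	return err
}

func main() {
	_ = runPrerunHook(func() error { return nil })
}
```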
When a task restarts, the Nomad client may need to rewrite the Consul token, but
it's created with permissions that prevent a non-root agent from writing to
it. While Nomad clients should be run as root (currently), it's harmless to
allow whatever user the Nomad agent is running as to be able to write to it, and
that's one less barrier to rootless Nomad.
Ref: https://github.com/hashicorp/nomad/issues/23859#issuecomment-2465757392
When multiple templates with api functions are included in a task, it's
possible for consul-template to re-render templates as it creates
watchers, overwriting render event data. This change uses event fields
that do not get overwritten, and only executes the change mode for
templates that were actually written to disk.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* connect: handle grpc_address as gosockaddr/template string
This PR fixes a bug where the consul.grpc_address could not be set using
a go-sockaddr/template string. This was inconsistent with how we accept
such strings for consul.address values.
* add changelog
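For reference, a small example of resolving a go-sockaddr/template string the way consul.address already does and consul.grpc_address now does; the template value is just an example:

```go
package main

import (
	"fmt"

	sockaddrtmpl "github.com/hashicorp/go-sockaddr/template"
)

func main() {
	// Resolve a go-sockaddr/template string to a concrete address at agent
	// startup; the resulting value is what gets used as the gRPC address.
	addr, err := sockaddrtmpl.Parse(`{{ GetPrivateIP }}:8502`)
	if err != nil {
		fmt.Println("unable to resolve grpc_address:", err)
		return
	}
	fmt.Println("resolved grpc_address:", addr)
}
```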
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.
* api: no need for pointer of chown field
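A minimal sketch of the chown behavior the new flag enables, recursively re-owning the downloaded artifact directory to the task user; the helper name and paths are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"path/filepath"
	"strconv"
)

// chownArtifactDir recursively re-owns a downloaded artifact directory to the
// given user, roughly what the client does when the artifact block sets
// chown = true.
func chownArtifactDir(dir, username string) error {
	u, err := user.Lookup(username)
	if err != nil {
		return err
	}
	uid, err := strconv.Atoi(u.Uid)
	if err != nil {
		return err
	}
	gid, err := strconv.Atoi(u.Gid)
	if err != nil {
		return err
	}

	return filepath.Walk(dir, func(path string, _ os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		return os.Chown(path, uid, gid)
	})
}

func main() {
	if err := chownArtifactDir("/tmp/example-artifact", "nobody"); err != nil {
		fmt.Println("chown failed:", err)
	}
}
```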
When we introduced change_mode=script to templates, we passed the driver handle
down into the template manager so we could call its `Exec` method directly. But
the lifecycle of the driver handle is managed by the taskrunner and isn't
available when the template manager is first created. This has led to a series
of patches trying to fix up the behavior (#15915, #15192, #23663, #23917). Part
of the challenge in getting this right is using an interface to avoid the
circular import of the driver handle.
But the taskrunner already has a way to deal with this problem using a "lazy
handle". The other template change modes already use this indirectly through the
`Lifecycle` interface. Change the driver handle `Exec` call in the template
manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This
eliminates the need for the template manager to know anything at all about the
handle state.
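A sketch of the indirection, with a hypothetical `Lifecycle` interface; the real interface and `Exec` signature in the taskrunner package differ:

```go
package main

import "time"

// Lifecycle is a hypothetical stand-in for the taskrunner interface the
// template manager now depends on.
type Lifecycle interface {
	Exec(timeout time.Duration, cmd string, args []string) ([]byte, int, error)
}

// templateManager holds only the interface, never the driver handle itself,
// so it doesn't need to know whether the handle exists yet; the taskrunner's
// lazy handle resolves that at call time.
type templateManager struct {
	lifecycle Lifecycle
}

func (tm *templateManager) runChangeScript(cmd string, args []string) error {
	_, _, err := tm.lifecycle.Exec(5*time.Second, cmd, args)
	return err
}

func main() {}
```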
Fixes: https://github.com/hashicorp/nomad/issues/24051
When a CNI result includes an IPv6 address, set it on the alloc's
NetworkStatus for reference.
e.g.:
$ nomad alloc status -json 3dca | jq '.NetworkStatus'
{
  "Address": "172.26.64.14",
  "AddressIPv6": "fd00:a110:c8::b",
  "DNS": null,
  "InterfaceName": "eth0"
}
In #23663 we fixed the template hook so that `change_mode="script"` didn't lose
track of the task handle during restores. But this revealed a second bug which
is that access to the handle is not locked while in use, which can allow it to
be removed concurrently.
Fixes: https://github.com/hashicorp/nomad/issues/23875
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that is then atomically renamed over the old file.
This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.
Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.
For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.
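A small sketch of the scheduling arithmetic described above, with illustrative values and field layout:

```go
package main

import "fmt"

// schedulerMemoryMB sketches the accounting: an explicitly set
// resources.secrets size is added to the memory the scheduler reserves, and
// subtracted again just before the task driver is configured so the task's
// own memory limit is unchanged.
func schedulerMemoryMB(taskMemoryMB, secretsMB int, secretsSetExplicitly bool) int {
	if !secretsSetExplicitly {
		return taskMemoryMB // the default 1 MiB tmpfs stays "free"
	}
	return taskMemoryMB + secretsMB
}

func main() {
	fmt.Println(schedulerMemoryMB(256, 1, false)) // 256: legacy default, not counted
	fmt.Println(schedulerMemoryMB(256, 16, true)) // 272: scheduler reserves the tmpfs too
}
```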
Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
For templates with `change_mode = "script"`, we set a driver handle in the
poststart method, so the template runner can execute the script inside the
task. But when the client is restarted and the template contents change during
that window, we trigger a change_mode in the prestart method. In that case, the
hook will not have the handle and so returns an error trying to run the change
mode.
We restore the driver handle before we call any prestart hooks, so we can pass
that handle in the constructor whenever it's available. In the normal task start
case the handle will be empty but also won't be called.
The error messages are also misleading, as there's no capabilities check
happening here. Update the error messages to match.
Fixes: https://github.com/hashicorp/nomad/issues/15851
Ref: https://hashicorp.atlassian.net/browse/NET-9338
Nomad creates Consul ACL tokens and service registrations to support Consul
service mesh workloads, before bootstrapping the Envoy proxy. Nomad always talks
to the local Consul agent and never directly to the Consul servers. But the
local Consul agent talks to the Consul servers in stale consistency mode to
reduce load on the servers. This can result in the Nomad client making the Envoy
bootstrap request with tokens or services that have not yet replicated to the
follower that the local Consul agent is connected to. This request gets a 404 on the
ACL token and that negative entry gets cached, preventing any retries from
succeeding.
To work around this, we'll use a method described by our friends over on
`consul-k8s` where after creating the objects in Consul we try to read them from
the local agent in stale consistency mode (which prevents a failed read from
being cached). This cannot completely eliminate this source of error because
it's possible that Consul cluster replication is unhealthy at the time we need
it, but this should make Envoy bootstrap significantly more robust.
This changeset adds preflight checks for the objects we create in Consul:
* We add a preflight check for ACL tokens after we log in via Workload
Identity and in the function we use to derive tokens in the legacy
workflow. We do this check early because we also want to use this token for
registering group services in the allocrunner hooks.
* We add a preflight check for services right before we bootstrap Envoy in the
taskrunner hook, so that we have time for our service client to batch updates
to the local Consul agent in addition to the local agent sync.
We've made the timeouts configurable via node metadata rather than the usual
static configuration because, in most cases, users should not need to touch
these values or even know they are configurable; the configuration is mostly
available for testing.
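A rough sketch of the ACL-token preflight check from the first item above, polling the local agent with stale consistency; the retry interval, deadline, and surrounding plumbing are illustrative:

```go
package main

import (
	"fmt"
	"time"

	consulapi "github.com/hashicorp/consul/api"
)

// preflightToken polls the local Consul agent until the freshly created ACL
// token is readable before we proceed to Envoy bootstrap.
func preflightToken(client *consulapi.Client, secretID string, deadline time.Duration) error {
	// Stale reads are allowed so a miss on a lagging follower is answered by
	// the local agent's view and never negatively cached.
	opts := &consulapi.QueryOptions{AllowStale: true, Token: secretID}

	end := time.Now().Add(deadline)
	for time.Now().Before(end) {
		if _, _, err := client.ACL().TokenReadSelf(opts); err == nil {
			return nil // token is visible; safe to bootstrap Envoy
		}
		time.Sleep(500 * time.Millisecond)
	}
	return fmt.Errorf("consul ACL token not readable before deadline")
}

func main() {
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		panic(err)
	}
	if err := preflightToken(client, "example-secret-id", 5*time.Second); err != nil {
		fmt.Println(err)
	}
}
```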
Fixes: https://github.com/hashicorp/nomad/issues/9307
Fixes: https://github.com/hashicorp/nomad/issues/10451
Fixes: https://github.com/hashicorp/nomad/issues/20516
Ref: https://github.com/hashicorp/consul-k8s/pull/887
Ref: https://hashicorp.atlassian.net/browse/NET-10051
Ref: https://hashicorp.atlassian.net/browse/NET-9273
Follow-up: https://hashicorp.atlassian.net/browse/NET-10138