nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-07 19:05:42 +03:00

Author	SHA1	Message	Date
Tim Gross	1a1ccec8b2	CNI: add warning log for CNI check command failures (#25581) In #24658 we fixed a bug around client restarts where we would not assert network namespaces existed and were properly configured when restoring allocations. We introduced a call to the CNI `Check` method so that the plugins could report correct config. But when we get an error from this call, we don't log it unless the error is fatal. This makes it challenging to debug the case where the initial check fails but we tear down the network and try again (as described in #25510). Add a noisy log line here. Ref: https://github.com/hashicorp/nomad/pull/24658 Ref: https://github.com/hashicorp/nomad/issues/25510	2025-04-02 10:43:05 -04:00
Tim Gross	e168548341	provide allocrunner hooks with prebuilt taskenv and fix mutation bugs (#25373) Some of our allocrunner hooks require a task environment for interpolating values based on the node or allocation. But several of the hooks accept an already-built environment or builder and then keep that in memory. Both of these retain a copy of all the node attributes and allocation metadata, which balloons memory usage until the allocation is GC'd. While we'd like to look into ways to avoid keeping the allocrunner around entirely (see #25372), for now we can significantly reduce memory usage by creating the task environment on-demand when calling allocrunner methods, rather than persisting it in the allocrunner hooks. In doing so, we uncover two other bugs: * The WID manager, the group service hook, and the checks hook have to interpolate services for specific tasks. They mutated a taskenv builder to do so, but each time they mutate the builder, they write to the same environment map. When a group has multiple tasks, it's possible for one task to set an environment variable that would then be interpolated in the service definition for another task if that task did not have that environment variable. Only the service definition interpolation is impacted. This does not leak env vars across running tasks, as each taskrunner has its own builder. To fix this, we move the `UpdateTask` method off the builder and onto the taskenv as the `WithTask` method. This makes a shallow copy of the taskenv with a deep clone of the environment map used for interpolation, and then overwrites the environment from the task. * The checks hook interpolates Nomad native service checks only on `Prerun` and not on `Update`. This could cause unexpected deregistration and registration of checks during in-place updates. To fix this, we make sure we interpolate in the `Update` method. I also bumped into an incorrectly implemented interface in the CSI hook. I've pulled that and some better guardrails out to https://github.com/hashicorp/nomad/pull/25472. Fixes: https://github.com/hashicorp/nomad/issues/25269 Fixes: https://hashicorp.atlassian.net/browse/NET-12310 Ref: https://github.com/hashicorp/nomad/issues/25372	2025-03-24 12:05:04 -04:00
Tim Gross	c67c4ea182	client: statically assert hook interfaces in build (#25472) While working on #25373, I noticed that the CSI hook's `Destroy` method doesn't match the interface, which means it never gets called. Because this method only cancels any in-flight CSI requests, the only impact of this bug is that any CSI RPCs that are in-flight when an alloc is GC'd on the client or a dev agent is shut down won't be interrupted gracefully. Fix the interface, but also make static assertions for all the allocrunner hooks in the production code, so that you can make changes to interfaces and have compile-time assistance in avoiding mistakes. Ref: https://github.com/hashicorp/nomad/pull/25373	2025-03-21 09:14:13 -04:00
Tim Gross	08a6f870ad	cni: use check command when restoring from restart (#24658) When the Nomad client restarts and restores allocations, the network namespace for an allocation may exist but no longer be correctly configured. For example, if the host is rebooted and the task was a Docker task using a pause container, the network namespace may be recreated by the docker daemon. When we restore an allocation, use the CNI "check" command to verify that any existing network namespace matches the expected configuration. This requires CNI plugins of at least version 1.2.0 to avoid a bug in older plugin versions that would cause the check to fail. If the check fails, destroy the network namespace and try to recreate it from scratch once. If that fails in the second pass, fail the restore so that the allocation can be recreated (rather than silently having networking fail). This should fix the gap left by #24650 for Docker task drivers and any other drivers with the `MustInitiateNetwork` capability. Fixes: https://github.com/hashicorp/nomad/issues/24292 Ref: https://github.com/hashicorp/nomad/pull/24650	2025-01-07 09:38:39 -05:00
hashicorp-copywrite[bot]	2d35e32ec9	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:15 -05:00
Seth Hoenig	89ce092b20	docker: stop network pause container of lost alloc after node restart (#17455) This PR fixes a bug where the docker network pause container would not be stopped and removed in the case where a node is restarted, the alloc is moved to another node, and the node comes back up. See the issue below for full repro conditions. Basically in the DestroyNetwork PostRun hook we would depend on the NetworkIsolationSpec field not being nil - which is only the case if the Client stays alive all the way from network creation to network teardown. If the node is rebooted we lose that state and previously would not be able to find the pause container to remove. Now, we manually find the pause container by scanning them and looking for the associated allocID. Fixes #17299	2023-06-09 08:46:29 -05:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
James Rasell	e34fa583f9	allow configuration of Docker hostnames in bridge mode (#11173) Add a new hostname string parameter to the network block which allows operators to specify the hostname of the network namespace. Changing this causes a destructive update to the allocation and it is omitted if empty from API responses. This parameter also supports interpolation. In order to have a hostname passed as a configuration param when creating an allocation network, the CreateNetwork func of the DriverNetworkManager interface needs to be updated. In order to minimize the disruption of future changes, rather than add another string func arg, the function now accepts a request struct along with the allocID param. The struct has the hostname as a field. The in-tree implementations of DriverNetworkManager.CreateNetwork have been modified to account for the function signature change. In updating for the change, the enhancement of adding hostnames to network namespaces has also been added to the Docker driver, whilst the default Linux manager does not currently implement it.	2021-09-16 08:13:09 +02:00
Tim Gross	2a640f0b2d	docker: generate /etc/hosts file for bridge network mode (#10766) When `network.mode = "bridge"`, we create a pause container in Docker with no networking so that we have a process to hold the network namespace we create in Nomad. The default `/etc/hosts` file of that pause container is then used for all the Docker tasks that share that network namespace. Some applications rely on this file being populated. This changeset generates a `/etc/hosts` file and bind-mounts it to the container when Nomad owns the network, so that the container's hostname has an IP in the file as expected. The hosts file will include the entries added by the Docker driver's `extra_hosts` field. In this changeset, only the Docker task driver will take advantage of this option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts` file and this can't be changed without breaking backwards compatibility. But the fields are available in the task driver protobuf for community task drivers to use if they'd like.	2021-06-16 14:55:22 -04:00
Nick Ethier	756aa11654	client: add NetworkStatus to Allocation (#8657)	2020-10-12 13:43:04 -04:00
Tim Gross	e17901d667	driver/networking: don't recreate existing network namespaces	2019-09-25 14:58:17 -04:00
Nick Ethier	fbe633b9ff	ar: refactor network bridge config to use go-cni lib (#6255) * ar: refactor network bridge config to use go-cni lib * ar: use eth as the iface prefix for bridged network namespaces * vendor: update containerd/go-cni package * ar: update network hook to use TODO contexts when calling configurator * unnecessary conversion	2019-09-04 16:33:25 -04:00
Nick Ethier	dc08ec8783	ar: plumb client config for networking into the network hook	2019-07-31 01:04:06 -04:00
Nick Ethier	e15005bdcb	networking: Add new bridge networking mode implementation	2019-07-31 01:04:06 -04:00
Nick Ethier	9fa47daf5c	ar: fix lint errors	2019-07-31 01:03:19 -04:00
Nick Ethier	56d5fe704a	ar: rearrange network hook to support building on windows	2019-07-31 01:03:19 -04:00
Nick Ethier	16343ae23a	ar: add tests for network hook	2019-07-31 01:03:18 -04:00
Nick Ethier	c39e8dca6e	ar: move linux specific code to its own file and add tests	2019-07-31 01:03:18 -04:00
Nick Ethier	e20fa7ccc1	Add network lifecycle management Adds new Prerun and Postrun hooks to manage setup of network namespaces on Linux. Work still needs to be done to make the code platform agnostic and support Docker-style network initialization.	2019-07-31 01:03:17 -04:00
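
The static interface assertions mentioned in commit c67c4ea182 can be illustrated with a short, self-contained Go sketch. It is not Nomad's code: the `hook` and `destroyer` interfaces and the `goodHook` and `badHook` types are hypothetical names chosen for the example. It assumes hooks are discovered by a runtime type assertion, which is why a method with the wrong signature is silently skipped, and it shows how a `var _ Interface = (*Type)(nil)` line moves that mistake to compile time.

```go
package main

import "fmt"

// hook is a hypothetical base interface for this sketch.
type hook interface{ Name() string }

// destroyer is a hypothetical optional interface a hook may implement.
type destroyer interface {
	hook
	Destroy() error
}

// goodHook implements destroyer correctly.
type goodHook struct{}

func (*goodHook) Name() string   { return "good" }
func (*goodHook) Destroy() error { return nil }

// badHook has a Destroy method with the wrong signature (no error result),
// so it does not satisfy destroyer.
type badHook struct{}

func (*badHook) Name() string { return "bad" }
func (*badHook) Destroy()     {}

// Compile-time assertion: the build breaks if *goodHook stops
// satisfying destroyer.
var _ destroyer = (*goodHook)(nil)

// Uncommenting the next line fails the build, which is the point:
// var _ destroyer = (*badHook)(nil)

// destroyAll calls Destroy on every hook that satisfies destroyer and
// returns the names of the hooks it called.
func destroyAll(hooks []hook) []string {
	var called []string
	for _, h := range hooks {
		if d, ok := h.(destroyer); ok {
			_ = d.Destroy()
			called = append(called, h.Name())
		}
	}
	return called
}

func main() {
	fmt.Println(destroyAll([]hook{&goodHook{}, &badHook{}}))
}
```

Without the assertion line, `badHook` compiles and is skipped at runtime with no error, which matches the kind of mismatch the commit message describes.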

19 Commits
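
The `WithTask` fix described in commit e168548341 (a shallow copy of the task environment with a deep clone of the environment map) can be sketched as follows. This is an illustration under assumptions, not the actual Nomad implementation: the `taskEnv` struct, its fields, and the `WithTask` signature here are hypothetical and reduced to the minimum needed to show why per-task copies stop one task's variables from appearing in another task's interpolation.

```go
package main

import "fmt"

// taskEnv is a hypothetical, simplified task environment.
type taskEnv struct {
	nodeAttrs map[string]string // shared and treated as read-only
	envMap    map[string]string // used for interpolation
}

// WithTask returns a shallow copy of t whose envMap is a fresh clone
// overlaid with the task's variables. The receiver is never mutated.
func (t *taskEnv) WithTask(taskVars map[string]string) *taskEnv {
	c := *t // shallow copy: nodeAttrs is still shared
	c.envMap = make(map[string]string, len(t.envMap)+len(taskVars))
	for k, v := range t.envMap {
		c.envMap[k] = v
	}
	for k, v := range taskVars {
		c.envMap[k] = v
	}
	return &c
}

func main() {
	base := &taskEnv{
		nodeAttrs: map[string]string{"node.class": "compute"},
		envMap:    map[string]string{"NOMAD_GROUP": "web"},
	}
	a := base.WithTask(map[string]string{"FOO": "1"})
	b := base.WithTask(map[string]string{"BAR": "2"})

	_, leaked := b.envMap["FOO"]
	fmt.Println(a.envMap["FOO"], b.envMap["BAR"], leaked, len(base.envMap))
}
```

Because each call clones the map before writing, task `b` never sees `FOO` from task `a`, and the base environment keeps its original single entry.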