nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 01:15:43 +03:00

Author	SHA1	Message	Date
Daniel Bennett	a6e29057d6	networking: refactor some iptables for testability (#23856 )	2024-08-23 10:05:46 -05:00
Seth Hoenig	8b093a6a5d	scheduler: support for device-aware numa scheduling (#1760 ) (#23837 ) (CE backport of ENT 59433d56c7215c0b8bf33764f41b57d9bd30160f (without ent files)) * scheduler: enhance numa aware scheduling with support for devices * cr: add comments	2024-08-20 07:53:04 -05:00
Piotr Kazmierczak	0bc9796d3b	client: log an error message if total detected cpu is zero (#23827 )	2024-08-15 18:31:27 +02:00
Tim Gross	682c8c0c81	cgroupslib: allow initial controller check with delegated cgroups v2 (#23803 ) During Nomad client initialization with cgroups v2, we assert that the required cgroup controllers are available in the root `cgroup.subtree_control` file by idempotently writing to the file. But if Nomad is running with delegated cgroups, this will fail file permissions checks even if the subtree control file already has the controllers we need. Update the initialization to first check if the controllers are missing before attempting to write to them. This allows cgroup delegation so long as the cluster administrator has pre-created a Nomad owned cgroups tree and set the `Delegate` option in a systemd override. If not, initialization fails in the existing way. Although this is one small step along the way to supporting a rootless Nomad client, running Nomad as non-root is still unsupported. I've intentionally not documented setting up cgroup delegation in this PR, as this PR is insufficient by itself to have a secure and properly-working rootless Nomad client. Ref: https://github.com/hashicorp/nomad/issues/18211 Ref: https://github.com/hashicorp/nomad/issues/13669	2024-08-14 16:58:21 -04:00
Martina Santangelo	73ce56ba27	networking: refactor building nomad bridge config (#23772 )	2024-08-14 12:43:31 -05:00
Seth Hoenig	db0642099e	build: update golangci-lint to 1.60.1 (#23807 ) * build: update golangci-lint to 1.60.1 * ci: update golangci-lint to v1.60.1 Helps with go1.23 compatibility. Introduces some breaking changes / newly enforced linter patterns so those are fixed as well.	2024-08-14 10:09:31 -05:00
Tim Gross	920f4702d6	testing: fix skip comment on `RequireWindows` helper (#23776 )	2024-08-09 09:07:25 -04:00
Tim Gross	ef116b12d5	metrics: add `client.tasks` state metrics (#23773 ) Although we have `client.allocations` metrics to track allocation states on a client, having separate metrics for `client.tasks` will allow operators to identify that there are individual tasks in an unexpected state in an otherwise healthy allocation. Fixes: https://github.com/hashicorp/nomad/issues/23770	2024-08-09 09:02:17 -04:00
Tim Gross	9543e740af	docker: fix delimiter for selinux label for read-only volumes (#23750 ) The Docker driver's `volume` field to specify bind-mounts takes a list of strings that consist of three `:`-delimited fields: source, destination, and options. We append the SELinux label from the plugin configuration as the third field. But when the user has already specified the volume is read-only with `:ro`, we're incorrectly appending the SELinux label with another `:` instead of the required `,`. Combine the options into a single field value before appending them to the bind mounts configuration. Updated the tests to split out Windows behavior (which doesn't accept options) and to ensure the test task has the expected environment for bind mounts. Fixes: https://github.com/hashicorp/nomad/issues/23690	2024-08-08 09:08:01 -04:00
Deniz Onur Duzgun	0f7b8698ec	security: fix write symlink escape on the same allocdir path (#23738 ) Resolves symlink escape when unarchiving by removing existing paths within the same allocation directory which can occur by writing a header that points to a symlink that lives outside of the sandbox environment. This exploit requires first compromising the Nomad client agent at the source allocation. Ref: https://hashicorp.atlassian.net/browse/NET-10607 Ref: https://github.com/hashicorp/nomad-enterprise/pull/1725	2024-08-05 16:23:27 -04:00
Tim Gross	b25f1b66ce	resources: allow job authors to configure size of secrets tmpfs (#23696 ) On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks need larger space for downloading large secrets. This is especially the case for tasks using `templates`, which need extra room to write a temporary file to the secrets directory that gets renamed to the old file atomically. This changeset allows increasing the size of the tmpfs in the `resources` block. Because this is a memory resource, we need to include it in the memory we allocate for scheduling purposes. The task is already prevented from using more memory in the tmpfs than the `resources.memory` field allows, but can bypass that limit by writing to the tmpfs via `template` or `artifact` blocks. Therefore, we need to account for the size of the tmpfs in the allocation resources. Simply adding it to the memory needed when we create the allocation allows it to be accounted for in all downstream consumers, and then we'll subtract that amount from the memory resources just before configuring the task driver. For backwards compatibility, the default value of 1MiB is "free" and ignored by the scheduler. Otherwise we'd be increasing the allocated resources for every existing alloc, which could cause problems across upgrades. If a user explicitly sets `resources.secrets = 1` it will no longer be free. Fixes: https://github.com/hashicorp/nomad/issues/2481 Ref: https://hashicorp.atlassian.net/browse/NET-10070	2024-08-05 16:06:58 -04:00
Tim Gross	c280891703	template: allow change_mode script to run after client restart (#23663 ) For templates with `change_mode = "script"`, we set a driver handle in the poststart method, so the template runner can execute the script inside the task. But when the client is restarted and the template contents change during that window, we trigger a change_mode in the prestart method. In that case, the hook will not have the handle and so returns an error trying to run the change mode. We restore the driver handle before we call any prestart hooks, so we can pass that handle in the constructor whenever it's available. In the normal task start case the handle will be empty but also won't be called. The error messages are also misleading, as there's no capabilities check happening here. Update the error messages to match. Fixes: https://github.com/hashicorp/nomad/issues/15851 Ref: https://hashicorp.atlassian.net/browse/NET-9338	2024-07-24 08:29:39 -04:00
Daniel Bennett	32d8ec446f	cni: fix parsing .conf and .json configs (#23629 ) so we can support more than just .conflist format like our docs claim we do	2024-07-18 14:55:13 -05:00
Martina Santangelo	8cbd857ac9	cni: test Setup method (#23609 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-07-17 13:02:23 -04:00
guifran001	1c44521543	client: Add a preferred address family option for network-interface (#23389 ) to prefer ipv4 or ipv6 when deducing IP from network interface Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-07-12 15:30:38 -05:00
Martina Santangelo	661011f5de	cni: allow users to set CNI args in job spec (#23538 )	2024-07-12 11:47:15 -04:00
Piotr Kazmierczak	7772711c89	plugins: fix nomadTopologyToProto panic on systems that don't support NUMA (#23399 ) After changes introduced in #23284 we no longer need to make an if !st.SupportsNUMA() check in the GetNodes() topology method. In fact this check will now cause a panic in the nomadTopologyToProto method on systems that don't support NUMA.	2024-07-09 08:41:52 +02:00
Deniz Onur Duzgun	ef6cdec884	security: add escape to arbitrary file access (#23319 )	2024-07-08 14:00:09 -04:00
Tim Gross	18fdda6242	vault: fix namespace reset for clients with unset namespace (#23491 ) The Vault "logical" API doesn't allow configuring the namespace on a per-request basis. Instead, it's set on the client. Our `vaultclient` wrapper locks access to the API client and sets the namespace (and token, if applicable) for each request, and then resets the namespace and unlocks the API client. The logic for resetting the namespace incorrectly assumed that if the Vault configuration didn't set the namespace that it was canonicalized to the non-empty string `"default"`. This results in the API client's namespace getting "stuck" whenever a job uses a non-default namespace if the configuration value is empty. Update the logic to always go back to the configuration, rather than accepting the "previous" namespace from the caller. This changeset also removes some long-dead code in the Vault client wrapper. Fixes: https://github.com/hashicorp/nomad/issues/22230 Ref: https://hashicorp.atlassian.net/browse/NET-10207	2024-07-03 10:13:20 -04:00
Piotr Kazmierczak	356ea87e00	template: disable sandboxed rendering on Windows (#23432 ) Following #23443, we no longer need to sandbox template rendering on Windows.	2024-06-28 17:16:27 +02:00
Tim Gross	df67e74615	Consul: add preflight checks for Envoy bootstrap (#23381 ) Nomad creates Consul ACL tokens and service registrations to support Consul service mesh workloads, before bootstrapping the Envoy proxy. Nomad always talks to the local Consul agent and never directly to the Consul servers. But the local Consul agent talks to the Consul servers in stale consistency mode to reduce load on the servers. This can result in the Nomad client making the Envoy bootstrap request with tokens or services that have not yet replicated to the follower that the local client is connected to. This request gets a 404 on the ACL token and that negative entry gets cached, preventing any retries from succeeding. To work around this, we'll use a method described by our friends over on `consul-k8s` where after creating the objects in Consul we try to read them from the local agent in stale consistency mode (which prevents a failed read from being cached). This cannot completely eliminate this source of error because it's possible that Consul cluster replication is unhealthy at the time we need it, but this should make Envoy bootstrap significantly more robust. This changeset adds preflight checks for the objects we create in Consul: * We add a preflight check for ACL tokens after we login via Workload Identity and in the function we use to derive tokens in the legacy workflow. We do this check early because we also want to use this token for registering group services in the allocrunner hooks. * We add a preflight check for services right before we bootstrap Envoy in the taskrunner hook, so that we have time for our service client to batch updates to the local Consul agent in addition to the local agent sync. We've added the timeouts to be configurable via node metadata rather than the usual static configuration because for most cases, users should not need to touch or even know these values are configurable; the configuration is mostly available for testing. Fixes: https://github.com/hashicorp/nomad/issues/9307 Fixes: https://github.com/hashicorp/nomad/issues/10451 Fixes: https://github.com/hashicorp/nomad/issues/20516 Ref: https://github.com/hashicorp/consul-k8s/pull/887 Ref: https://hashicorp.atlassian.net/browse/NET-10051 Ref: https://hashicorp.atlassian.net/browse/NET-9273 Follow-up: https://hashicorp.atlassian.net/browse/NET-10138	2024-06-27 10:15:37 -04:00
Tim Gross	7d73065066	numa: fix scheduler panic due to topology serialization bug (#23284 ) The NUMA topology struct field `NodeIDs` is a `idset.Set`, which has no public members. As a result, this field is never serialized via msgpack and persisted in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil field and panics the scheduler worker. Ideally we would fix this by adding a msgpack serialization extension, but because the field already exists and is just always empty, this breaks RPC wire compatibility across upgrades. Instead, create a new field that's populated at the same time we populate the more useful `idset.Set`, and repopulate the set on demand. Fixes: https://hashicorp.atlassian.net/browse/NET-9924	2024-06-11 08:55:00 -04:00
Tim Gross	61608e43cb	test: move NUMA platform scan out of testing global (#23289 ) The `testing.go` test helpers file for the driver manager initializes the NUMA scan as a package-global variable. This causes it to be pulled in even in production builds, so even running commands like `nomad version` will cause the NUMA scan to happen. Move the scan into the test helper setup.	2024-06-11 08:52:51 -04:00
nicoche	ffcb72bfe3	api: Add Notes field to service checks (#22397 ) Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2024-06-10 16:59:49 +02:00
Seth Hoenig	45da80bde2	client: cleanup empty task directory when using unveil filesystem isolation (#23237 ) This PR fixes a bug where Nomad client would leave behind an empty directory created on behalf of tasks making use of the unveil filesystem isolation mode (i.e. using exec2 task driver). Once unmounting is complete, we should remember to also delete the directory. Fixes #22433	2024-06-06 10:47:23 -05:00
Tim Gross	140747240f	consul: include admin partition in JWT login requests (#22226 ) When logging into a JWT auth method, we need to explicitly supply the Consul admin partition if the local Consul agent is in a partition. We can't derive this from agent configuration because the Consul agent's configuration is canonical, so instead we get the partition from the fingerprint (if available). This changeset updates the Consul client constructor so that we close over the partition from the fingerprint. Ref: https://hashicorp.atlassian.net/browse/NET-9451	2024-05-29 16:31:09 -04:00
Daniel Bennett	4415fabe7d	jobspec: time based task execution (#22201 ) this is the CE side of an Enterprise-only feature. a job trying to use this in CE will fail to validate. to enable daily-scheduled execution entirely client-side, a job may now contain: task "name" { schedule { cron { start = "0 12 * * * *" # may not include "," or "/" end = "0 16" # partial cron, with only {minute} {hour} timezone = "EST" # anything in your tzdb } } ... and everything about the allocation will be placed as usual, but if outside the specified schedule, the taskrunner will block on the client, waiting on the schedule start, before proceeding with the task driver execution, etc. this includes a taskrunner hook, which watches for the end of the schedule, at which point it will kill the task. then, restarts-allowing, a new task will start and again block waiting for start, and so on. this also includes all the plumbing required to pipe API calls through from command->api->agent->server->client, so that tasks can be force-run, force-paused, or resume the schedule on demand.	2024-05-22 15:40:25 -05:00
Seth Hoenig	09bd11383c	client: alloc_mounts directory must be sibling of data directory (#22199 ) This PR adjusts the default location of -alloc-mounts-dir path to be a sibling of the -data-dir path rather than a child. This is because on production-hardened systems the data dir is supposed to be chmod 0700 owned by root - preventing the exec2 task driver (and others using unveil file system isolation features) from working properly. For reference the directory structure from -data-dir now looks like this after running an example job. Under the alloc_mounts directory, task specific directories are mode 0710 and owned by the task user (which may be a dynamic user UID/GID). ➜ sudo tree -p -d -u /tmp/mynomad [drwxrwxr-x shoenig ] /tmp/mynomad ├── [drwx--x--x root ] alloc_mounts │ └── [drwx--x--- 80552 ] c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask │ ├── [drwxrwxrwx nobody ] alloc │ │ ├── [drwxrwxrwx nobody ] data │ │ ├── [drwxrwxrwx nobody ] logs │ │ └── [drwxrwxrwx nobody ] tmp │ ├── [drwxrwxrwx nobody ] local │ ├── [drwxr-xr-x root ] private │ ├── [drwx--x--- 80552 ] secrets │ └── [drwxrwxrwt nobody ] tmp └── [drwx------ root ] data ├── [drwx--x--x root ] alloc │ └── [drwxr-xr-x root ] c753b71d-c6a1-3370-1f59-47ab838fd8a6 │ ├── [drwxrwxrwx nobody ] alloc │ │ ├── [drwxrwxrwx nobody ] data │ │ ├── [drwxrwxrwx nobody ] logs │ │ └── [drwxrwxrwx nobody ] tmp │ └── [drwx--x--- 80552 ] mytask │ ├── [drwxrwxrwx nobody ] alloc │ │ ├── [drwxrwxrwx nobody ] data │ │ ├── [drwxrwxrwx nobody ] logs │ │ └── [drwxrwxrwx nobody ] tmp │ ├── [drwxrwxrwx nobody ] local │ ├── [drwxrwxrwx nobody ] private │ ├── [drwx--x--- 80552 ] secrets │ └── [drwxrwxrwt nobody ] tmp ├── [drwx------ root ] client └── [drwxr-xr-x root ] server ├── [drwx------ root ] keystore ├── [drwxr-xr-x root ] raft │ └── [drwxr-xr-x root ] snapshots └── [drwxr-xr-x root ] serf 32 directories	2024-05-22 13:14:34 -05:00
Tim Gross	b1657dd1fa	CSI: track node claim before staging to prevent interleaved unstage (#20550 ) The CSI hook for each allocation that claims a volume runs concurrently. If a call to `MountVolume` happens at the same time as a call to `UnmountVolume` for the same volume, it's possible for the second alloc to detect the volume has already been staged, then for the original alloc to unpublish and unstage it, only for the second alloc to then attempt to publish a volume that's been unstaged. The usage tracker on the volume manager was intended to prevent this behavior but the call to claim the volume was made only after staging and publishing was complete. Move the call to claim the volume for the usage tracker to the top of the `MountVolume` workflow to prevent it from being unstaged until all consuming allocations have called `UnmountVolume`. Fixes: https://github.com/hashicorp/nomad/issues/20424	2024-05-16 09:45:07 -04:00
Tim Gross	953bfcc31e	services: retry failed Nomad service deregistrations from client (#20596 ) When the allocation is stopped, we deregister the service in the alloc runner's `PreKill` hook. This ensures we delete the service registration and wait for the shutdown delay before shutting down the tasks, so that workloads can drain their connections. However, the call to remove the workload only logs errors and never retries them. Add a short retry loop to the `RemoveWorkload` method for Nomad services, so that transient errors give us an extra opportunity to deregister the service before the tasks are stopped, before we need to fall back to the data integrity improvements implemented in #20590. Ref: https://github.com/hashicorp/nomad/issues/16616	2024-05-16 08:59:54 -04:00
Deniz Onur Duzgun	1cc99cc1b4	bug: resolve type conversion alerts (#20553 )	2024-05-15 13:22:10 -04:00
Seth Hoenig	4148ca1769	client: mount shared alloc dir as nobody (#20589 ) In the Unveil filesystem isolation mode we were mounting the shared alloc dir with the UID/GID of the user of the task dir being mounted and 0710 filesystem permissions. This was causing the actual task dir to become inaccessible to other tasks in the allocation (a race where the last mounter wins). Instead mount the shared alloc dir as nobody with 0777 filesystem permissions.	2024-05-15 10:43:30 -05:00
James Rasell	04ba358266	client: expose network namespace CNI config as task env vars. (#11810 ) This change exposes CNI configuration details of a network namespace as environment variables. This allows a task to use these value to configure itself; a potential use case is to run a Raft application binding to IP and Port details configured using the bridge network mode.	2024-05-14 09:02:06 +01:00
Juana De La Cuesta	169818b1bd	[gh-6980] Client: clean up old allocs before running new ones using the `exec` task driver. (#20500 ) Whenever the "exec" task driver is being used, Nomad runs a plugin that in turn runs the task in a container under the hood. If by any circumstance the executor is killed, the task is reparented to the init service and won't be stopped by Nomad when the job is updated or stopped. This commit introduces two mechanisms to avoid this behaviour: * Adds signal catching and handling to the executor, so in case of a SIGTERM, the signal will also be passed on to the task. * Adds a pre start clean up of the processes in the container, ensuring only the ones the executor runs are present at any given time.	2024-05-14 09:51:27 +02:00
Tim Gross	65ae61249c	CSI: include volume namespace in staging path (#20532 ) CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. The per-allocation paths don't need to be namespaced, because an allocation can only mount volumes from its job's own namespace. Rework the CSI hook tests to have more fine-grained control over the mock on-disk state. Add tests covering upgrades from staging paths missing namespaces. Fixes: https://github.com/hashicorp/nomad/issues/18741	2024-05-13 11:24:09 -04:00
Tim Gross	623486b302	deps: vendor containernetworking/plugins functions for net NS utils (#20556 ) We bring in `containernetworking/plugins` for the contents of a single file, which we use in a few places for running a goroutine in a specific network namespace. This code hasn't needed an update in a couple of years, and a good chunk of what we need was previously vendored into `client/lib/nsutil` already. Updating the library via dependabot is causing errors in Docker driver tests because it updates a lot of transitive dependencies, and it's bringing in a pile of new transitive dependencies like opentelemetry. Avoid this problem going forward by vendoring the remaining code we hadn't already. Ref: https://github.com/hashicorp/nomad/pull/20146	2024-05-13 09:10:16 -04:00
James Rasell	7e42ad869a	client: fix unallocated CPU metric when reserved cpu is set. (#20543 )	2024-05-09 10:55:22 +01:00
Seth Hoenig	14a022cbc0	drivers/raw_exec: enable setting cgroup override values (#20481 ) * drivers/raw_exec: enable setting cgroup override values This PR enables configuration of cgroup override values on the `raw_exec` task driver. WARNING: setting cgroup override values eliminates any guarantee Nomad can make about resource availability for any task on the client node. For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`. The path may be either absolute or relative to the cgroup root. config { cgroup_v2_override = "custom.slice/app.scope" } or config { cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope" } For cgroup v1 systems, set a per-controller path for each controller using `cgroup_v1_override`. The path(s) may be either absolute or relative to the controller root. config { cgroup_v1_override = { "pids": "custom/app", "cpuset": "custom/app", } } or config { cgroup_v1_override = { "pids": "/sys/fs/cgroup/pids/custom/app", "cpuset": "/sys/fs/cgroup/cpuset/custom/app", } } * drivers/rawexec: ensure only one of v1/v2 cgroup override is set * drivers/raw_exec: executor should error if setting cgroup does not work * drivers/raw_exec: create cgroups in raw_exec tests * drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root * move custom cgroup func into shared file --------- Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2024-05-07 16:46:27 -07:00
Tim Gross	f41bc468eb	consul: provide `CONSUL_HTTP_TOKEN` env var to tasks (#20519 ) When available, we provide an environment variable `CONSUL_TOKEN` to tasks, but this isn't the environment variable expected by the Consul CLI. Job specifications like deploying an API Gateway become noticeably nicer if we can instead provide the expected env var.	2024-05-03 11:30:33 -04:00
Seth Hoenig	5f64e42d73	client: fixup how alloc mounts directory are setup (#20463 )	2024-04-26 07:29:52 -05:00
Tim Gross	6d58acd897	WI: ensure tasks within same alloc get different Consul tokens (#20411 ) The `consul_hook` in the allocrunner gets a separate Consul token for each task, even if the tasks' identities have the same name, but used the identity name as the key to the alloc hook resources map. This means the last task in the group overwrites the Consul tokens of all other tasks. Fix this by adding the task name to the key in the allocrunner's `consul_hook`. And update the taskrunner's `consul_hook` to expect the task name in the key. Fixes: https://github.com/hashicorp/nomad/issues/20374 Fixes: https://hashicorp.atlassian.net/browse/NOMAD-614	2024-04-17 11:29:58 -04:00
Luiz Aoqui	9d4f7bcb68	mock_driver: fix fingerprint key (#20351 ) The `mock_driver` is an internal task driver used mostly for testing and simulating workloads. During the allocrunner v2 work (#4792) its name changed from `mock_driver` to just `mock` and then back to `mock_driver`, but the fingerprint key was kept as `driver.mock`. This causes tasks configured with `driver = "mock"` to be scheduled (because Nomad thinks the client has a task driver called `mock`), but fail to actually run (because the Nomad client can't find a driver called `mock` in its catalog). Fingerprinting the right name prevents the job from being scheduled in the first place. Also removes mentions of the mock driver from documentation since it's an internal driver and not available in any production release.	2024-04-16 07:16:55 +01:00
Tim Gross	9cb1ef3e3d	CNI: fix bugs in parsing strings to port number integers (#20379 ) Ports are a maximum of uint16, but we have a few places in the recent tproxy code where we were parsing them as 64-bit wide integers and then downcasting them to `int`, which is technically unsafe and triggers code scanning alerts. In practice we've validated the range elsewhere and don't build for 32-bit platforms. This changeset fixes the parsing to make everything a bit more robust and silence the alert. Fixes: https://github.com/hashicorp/nomad-enterprise/security/code-scanning/444	2024-04-12 13:31:25 -04:00
Seth Hoenig	ae6c4c8e3f	deps: purge use of old x/exp packages (#20373 )	2024-04-12 08:29:00 -05:00
astudentofblake	7b7ed12326	func: Allow custom paths to be added to the getter landlock (#20349 ) * func: Allow custom paths to be added to the getter landlock Fixes: 20315 * fix: slices imports fix: more meaningful examples fix: improve documentation fix: quote error output	2024-04-11 15:17:33 -05:00
Tim Gross	d56e8ad1aa	WI: ensure Consul hook and WID manager interpolate services (#20344 ) Services can have some of their string fields interpolated. The new Workload Identity flow doesn't interpolate the services before requesting signed identities or using those identities to get Consul tokens. Add support for interpolation to the WID manager and the Consul tokens hook by providing both with a taskenv builder. Add an "interpolate workload" field to the WI handle to allow passing the original workload name to the server so the server can find the correct service to sign. This changeset also makes two related test improvements: * Remove the mock WID manager, which was only used in the Consul hook tests and isn't necessary so long as we provide the real WID manager with the mock signer and never call `Run` on it. It wasn't feasible to exercise the correct behavior without this refactor, as the mocks were bypassing the new code. * Fixed swapped expect-vs-actual assertions on the `consul_hook` tests. Fixes: https://github.com/hashicorp/nomad/issues/20025	2024-04-11 15:40:28 -04:00
Tim Gross	8298d39e78	Connect transparent proxy support Add support for Consul Connect transparent proxies Fixes: https://github.com/hashicorp/nomad/issues/10628	2024-04-10 11:00:18 -04:00
Tim Gross	4fef82e8e2	tproxy: refactor `getPortMapping` The `getPortMapping` method forces callers to handle two different data structures, but only one caller cares about it. We don't want to return a single map or slice because the `cni.PortMapping` object doesn't include a label field that we need for tproxy. Return a new data structure that closes over both a slice of `cni.PortMapping` and a map of label to index in that slice.	2024-04-10 10:16:13 -04:00
Tim Gross	8eaf176868	client: fix IPv6 parsing for `client.servers` block (#20324 ) When the `client.servers` block is parsed, we split the port from the address. This does not correctly handle IPv6 addresses when they are in URL format (wrapped in brackets), which we require to disambiguate the port and address. Fix the parser to correctly split out the port and handle a missing port value for IPv6. Update the documentation to make the URL format requirement clear. Fixes: https://github.com/hashicorp/nomad/issues/20310	2024-04-08 15:06:27 -04:00
Tim Gross	76009d89af	tproxy: networking hook changes (#20183 ) When `transparent_proxy` block is present and the network mode is `bridge`, use a different CNI configuration that includes the `consul-cni` plugin. Before invoking the CNI plugins, create a Consul SDK `iptables.Config` struct for the allocation. This includes: * Use all the `transparent_proxy` block fields * The reserved ports are added to the inbound exclusion list so the alloc is reachable from outside the mesh * The `expose` blocks and `check` blocks with `expose=true` are added to the inbound exclusion list so health checks work. The `iptables.Config` is then passed as a CNI argument to the `consul-cni` plugin. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-04-04 17:01:07 -04:00
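
The `client.servers` fix in 8eaf176868 concerns splitting a port from an address that may be a bracketed IPv6 literal. The Go sketch below illustrates that kind of parsing under stated assumptions: `splitServerAddr` and its `defaultPort` parameter are hypothetical names chosen for illustration and are not Nomad's actual parser. It relies only on the standard library's `net.SplitHostPort` and `strconv.ParseUint`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// splitServerAddr is a hypothetical helper (not Nomad's real code). It splits
// "host:port" and falls back to defaultPort when no port is given. IPv6
// literals must be wrapped in brackets so the port can be told apart.
func splitServerAddr(addr string, defaultPort int) (string, int, error) {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		// SplitHostPort fails when there is no port; work out which case this is.
		switch {
		case strings.HasPrefix(addr, "[") && strings.HasSuffix(addr, "]"):
			// Bracketed IPv6 without a port, e.g. "[::1]".
			return addr[1 : len(addr)-1], defaultPort, nil
		case strings.Contains(addr, ":"):
			// Unbracketed IPv6 such as "::1" is ambiguous, so reject it.
			return "", 0, fmt.Errorf("address %q: cannot split host and port; wrap IPv6 in brackets", addr)
		default:
			// Plain hostname or IPv4 address without a port.
			return addr, defaultPort, nil
		}
	}
	if portStr == "" {
		// Trailing colon with no port, e.g. "[::1]:".
		return host, defaultPort, nil
	}
	// Ports fit in 16 bits, so parse with that bit size to reject overflow.
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil {
		return "", 0, fmt.Errorf("address %q: invalid port: %w", addr, err)
	}
	return host, int(port), nil
}

func main() {
	for _, a := range []string{"[2001:db8::1]:4647", "[::1]", "10.0.0.1", "::1"} {
		host, port, err := splitServerAddr(a, 4647)
		fmt.Println(host, port, err)
	}
}
```

Parsing the port with a 16-bit size also avoids the wide-integer downcast described in 9cb1ef3e3d; the default of 4647 in `main` is only an example value.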


4970 Commits