Commit Graph

4970 Commits

Author SHA1 Message Date
Daniel Bennett
a6e29057d6 networking: refactor some iptables for testability (#23856) 2024-08-23 10:05:46 -05:00
Seth Hoenig
8b093a6a5d scheduler: support for device - aware numa scheduling (#1760) (#23837)
(CE backport of ENT 59433d56c7215c0b8bf33764f41b57d9bd30160f (without ent files))

* scheduler: enhance numa aware scheduling with support for devices

* cr: add comments
2024-08-20 07:53:04 -05:00
Piotr Kazmierczak
0bc9796d3b client: log an error message if total detected cpu is zero (#23827) 2024-08-15 18:31:27 +02:00
Tim Gross
682c8c0c81 cgroupslib: allow initial controller check with delegated cgroups v2 (#23803)
During Nomad client initialization with cgroups v2, we assert that the required
cgroup controllers are available in the root `cgroup.subtree_control` file by
idempotently writing to the file. But if Nomad is running with delegated
cgroups, this will fail file permissions checks even if the subtree control file
already has the controllers we need.

Update the initialization to first check if the controllers are missing before
attempting to write to them. This allows cgroup delegation so long as the
cluster administrator has pre-created a Nomad owned cgroups tree and set the
`Delegate` option in a systemd override. If not, initialization fails in the
existing way.

Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. I've intentionally not
documented setting up cgroup delegation in this PR, as this PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.

Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
2024-08-14 16:58:21 -04:00
Martina Santangelo
73ce56ba27 networking: refactor building nomad bridge config (#23772) 2024-08-14 12:43:31 -05:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatability. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Tim Gross
920f4702d6 testing: fix skip comment on RequireWindows helper (#23776) 2024-08-09 09:07:25 -04:00
Tim Gross
ef116b12d5 metrics: add client.tasks state metrics (#23773)
Although we have `client.allocations` metrics to track allocation states on a
client, having separate metrics for `client.tasks` will allow operators to
identify that there are individual tasks in an unexpected state in an otherwise
healthy allocation.

Fixes: https://github.com/hashicorp/nomad/issues/23770
2024-08-09 09:02:17 -04:00
Tim Gross
9543e740af docker: fix delimiter for selinux label for read-only volumes (#23750)
The Docker driver's `volume` field to specify bind-mounts takes a list of
strings that consist of three `:`-delimited fields: source, destination, and
options. We append the SELinux label from the plugin configuration as the third
field. But when the user has already specified the volume is read-only with
`:ro`, we're incorrectly appending the SELinux label with another `:` instead of
the required `,`.

Combine the options into a single field value before appending them to the bind
mounts configuration. Updated the tests to split out Windows behavior (which
doesn't accept options) and to ensure the test task has the expected environment
for bind mounts.

Fixes: https://github.com/hashicorp/nomad/issues/23690
2024-08-08 09:08:01 -04:00
Deniz Onur Duzgun
0f7b8698ec security: fix write symlink escape on the same allocdir path (#23738)
Resolves symlink escape when unarchiving by removing existing paths within the same allocation directory which can occur by writing a header that points to a symlink that lives outside of the sandbox environment. This exploit requires first compromising the Nomad client agent at the source allocation.

Ref: https://hashicorp.atlassian.net/browse/NET-10607
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1725
2024-08-05 16:23:27 -04:00
Tim Gross
b25f1b66ce resources: allow job authors to configure size of secrets tmpfs (#23696)
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.

This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.

Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.

For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.

Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
2024-08-05 16:06:58 -04:00
Tim Gross
c280891703 template: allow change_mode script to run after client restart (#23663)
For templates with `change_mode = "script"`, we set a driver handle in the
poststart method, so the template runner can execute the script inside the
task. But when the client is restarted and the template contents change during
that window, we trigger a change_mode in the prestart method. In that case, the
hook will not have the handle and so returns an errror trying to run the change
mode.

We restore the driver handle before we call any prestart hooks, so we can pass
that handle in the constructor whenever it's available. In the normal task start
case the handle will be empty but also won't be called.

The error messages are also misleading, as there's no capabilities check
happening here. Update the error messages to match.

Fixes: https://github.com/hashicorp/nomad/issues/15851
Ref: https://hashicorp.atlassian.net/browse/NET-9338
2024-07-24 08:29:39 -04:00
Daniel Bennett
32d8ec446f cni: fix parsing .conf and .json configs (#23629)
so we can support more than just .conflist format
like our docs claim we do
2024-07-18 14:55:13 -05:00
Martina Santangelo
8cbd857ac9 cni: test Setup method (#23609)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-17 13:02:23 -04:00
guifran001
1c44521543 client: Add a preferred address family option for network-interface (#23389)
to prefer ipv4 or ipv6 when deducing IP from network interface

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-12 15:30:38 -05:00
Martina Santangelo
661011f5de cni: allow users to set CNI args in job spec (#23538) 2024-07-12 11:47:15 -04:00
Piotr Kazmierczak
7772711c89 plugins: fix nomadTopologyToProto panic on systems that don't support NUMA (#23399)
After changes introduced in #23284 we no longer need to make a if
!st.SupportsNUMA() check in the GetNodes() topology method. In fact this check
will now cause panic in nomadTopologyToProto method on systems that don't
support NUMA.
2024-07-09 08:41:52 +02:00
Deniz Onur Duzgun
ef6cdec884 security: add escape to arbitrary file access (#23319) 2024-07-08 14:00:09 -04:00
Tim Gross
18fdda6242 vault: fix namespace reset for clients with unset namespace (#23491)
The Vault "logical" API doesn't allow configuring the namespace on a per-request
basis. Instead, it's set on the client. Our `vaultclient` wrapper locks access
to the API client and sets the namespace (and token, if applicable) for each
request, and then resets the namespace and unlocks the API client.

The logic for resetting the namespace incorrectly assumed that if the Vault
configuration didn't set the namespace that it was canonicalized to the
non-empty string `"default"`. This results in the API client's namespace getting
"stuck" whenever a job uses a non-default namespace if the configuration value
is empty. Update the logic to always go back to the configuration, rather than
accepting the "previous" namespace from the caller.

This changeset also removes some long-dead code in the Vault client wrapper.

Fixes: https://github.com/hashicorp/nomad/issues/22230
Ref: https://hashicorp.atlassian.net/browse/NET-10207
2024-07-03 10:13:20 -04:00
Piotr Kazmierczak
356ea87e00 template: disable sandboxed rendering on Windows (#23432)
Following #23443, we no longer need to sandbox template rendering on Windows.
2024-06-28 17:16:27 +02:00
Tim Gross
df67e74615 Consul: add preflight checks for Envoy bootstrap (#23381)
Nomad creates Consul ACL tokens and service registrations to support Consul
service mesh workloads, before bootstrapping the Envoy proxy. Nomad always talks
to the local Consul agent and never directly to the Consul servers. But the
local Consul agent talks to the Consul servers in stale consistency mode to
reduce load on the servers. This can result in the Nomad client making the Envoy
bootstrap request with a tokens or services that have not yet replicated to the
follower that the local client is connected to. This request gets a 404 on the
ACL token and that negative entry gets cached, preventing any retries from
succeeding.

To workaround this, we'll use a method described by our friends over on
`consul-k8s` where after creating the objects in Consul we try to read them from
the local agent in stale consistency mode (which prevents a failed read from
being cached). This cannot completely eliminate this source of error because
it's possible that Consul cluster replication is unhealthy at the time we need
it, but this should make Envoy bootstrap significantly more robust.

This changset adds preflight checks for the objects we create in Consul:
* We add a preflight check for ACL tokens after we login via via Workload
  Identity and in the function we use to derive tokens in the legacy
  workflow. We do this check early because we also want to use this token for
  registering group services in the allocrunner hooks.
* We add a preflight check for services right before we bootstrap Envoy in the
  taskrunner hook, so that we have time for our service client to batch updates
  to the local Consul agent in addition to the local agent sync.

We've added the timeouts to be configurable via node metadata rather than the
usual static configuration because for most cases, users should not need to
touch or even know these values are configurable; the configuration is mostly
available for testing.


Fixes: https://github.com/hashicorp/nomad/issues/9307
Fixes: https://github.com/hashicorp/nomad/issues/10451
Fixes: https://github.com/hashicorp/nomad/issues/20516

Ref: https://github.com/hashicorp/consul-k8s/pull/887
Ref: https://hashicorp.atlassian.net/browse/NET-10051
Ref: https://hashicorp.atlassian.net/browse/NET-9273
Follow-up: https://hashicorp.atlassian.net/browse/NET-10138
2024-06-27 10:15:37 -04:00
Tim Gross
7d73065066 numa: fix scheduler panic due to topology serialization bug (#23284)
The NUMA topology struct field `NodeIDs` is a `idset.Set`, which has no public
members. As a result, this field is never serialized via msgpack and persisted
in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil
field and panics the scheduler worker.

Ideally we would fix this by adding a msgpack serialization extension, but
because the field already exists and is just always empty, this breaks RPC wire
compatibility across upgrades. Instead, create a new field that's populated at
the same time we populate the more useful `idset.Set`, and repopulate the set on
demand.

Fixes: https://hashicorp.atlassian.net/browse/NET-9924
2024-06-11 08:55:00 -04:00
Tim Gross
61608e43cb test: move NUMA platform scan out of testing global (#23289)
The `testing.go` test helpers file for the driver manager initializes the NUMA
scan as a package-global variable. This causes it to be pulled in even in
production builds, so even running commands like `nomad version` will cause the
NUMA scan to happen. Move the scan into the test helper setup.
2024-06-11 08:52:51 -04:00
nicoche
ffcb72bfe3 api: Add Notes field to service checks (#22397)
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2024-06-10 16:59:49 +02:00
Seth Hoenig
45da80bde2 client: cleanup empty task directory when using unveil filesystem isolation (#23237)
This PR fixes a bug where Nomad client would leave behind an empty directory
created on behalf of tasks making use of the unveil filesystem isolation
mode (i.e. using exec2 task driver). Once unmounting is complete, we should
remember to also delete the directory.

Fixes #22433
2024-06-06 10:47:23 -05:00
Tim Gross
140747240f consul: include admin partition in JWT login requests (#22226)
When logging into a JWT auth method, we need to explicitly supply the Consul
admin partition if the local Consul agent is in a partition. We can't derive
this from agent configuration because the Consul agent's configuration is
canonical, so instead we get the partition from the fingerprint (if
available). This changeset updates the Consul client constructor so that we
close over the partition from the fingerprint.

Ref: https://hashicorp.atlassian.net/browse/NET-9451
2024-05-29 16:31:09 -04:00
Daniel Bennett
4415fabe7d jobspec: time based task execution (#22201)
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.

to enable daily-scheduled execution entirely client-side,
a job may now contain:

task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
...

and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.

this includes a taksrunner hook, which watches for the end of
the schedule, at which point it will kill the task.

then, restarts-allowing, a new task will start and again block
waiting for start, and so on.

this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
2024-05-22 15:40:25 -05:00
Seth Hoenig
09bd11383c client: alloc_mounts directory must be sibling of data directory (#22199)
This PR adjusts the default location of -alloc-mounts-dir path to be a
sibling of the -data-dir path rather than a child. This is because on a
production-hardened systems the data dir is supposed to be chmod 0700
owned by root - preventing the exec2 task driver (and others using
unveil file system isolation features) from working properly.

For reference the directory structure from -data-dir now looks like this
after running an example job. Under the alloc_mounts directory, task
specific directories are mode 0710 and owned by the task user (which
may be a dynamic user UID/GID).

➜ sudo tree -p -d -u /tmp/mynomad
[drwxrwxr-x shoenig ]  /tmp/mynomad
├── [drwx--x--x root    ]  alloc_mounts
│   └── [drwx--x--- 80552   ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask
│       ├── [drwxrwxrwx nobody  ]  alloc
│       │   ├── [drwxrwxrwx nobody  ]  data
│       │   ├── [drwxrwxrwx nobody  ]  logs
│       │   └── [drwxrwxrwx nobody  ]  tmp
│       ├── [drwxrwxrwx nobody  ]  local
│       ├── [drwxr-xr-x root    ]  private
│       ├── [drwx--x--- 80552   ]  secrets
│       └── [drwxrwxrwt nobody  ]  tmp
└── [drwx------ root    ]  data
    ├── [drwx--x--x root    ]  alloc
    │   └── [drwxr-xr-x root    ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6
    │       ├── [drwxrwxrwx nobody  ]  alloc
    │       │   ├── [drwxrwxrwx nobody  ]  data
    │       │   ├── [drwxrwxrwx nobody  ]  logs
    │       │   └── [drwxrwxrwx nobody  ]  tmp
    │       └── [drwx--x--- 80552   ]  mytask
    │           ├── [drwxrwxrwx nobody  ]  alloc
    │           │   ├── [drwxrwxrwx nobody  ]  data
    │           │   ├── [drwxrwxrwx nobody  ]  logs
    │           │   └── [drwxrwxrwx nobody  ]  tmp
    │           ├── [drwxrwxrwx nobody  ]  local
    │           ├── [drwxrwxrwx nobody  ]  private
    │           ├── [drwx--x--- 80552   ]  secrets
    │           └── [drwxrwxrwt nobody  ]  tmp
    ├── [drwx------ root    ]  client
    └── [drwxr-xr-x root    ]  server
        ├── [drwx------ root    ]  keystore
        ├── [drwxr-xr-x root    ]  raft
        │   └── [drwxr-xr-x root    ]  snapshots
        └── [drwxr-xr-x root    ]  serf

32 directories
2024-05-22 13:14:34 -05:00
Tim Gross
b1657dd1fa CSI: track node claim before staging to prevent interleaved unstage (#20550)
The CSI hook for each allocation that claims a volume runs concurrently. If a
call to `MountVolume` happens at the same time as a call to `UnmountVolume` for
the same volume, it's possible for the second alloc to detect the volume has
already been staged, then for the original alloc to unpublish and unstage it,
only for the second alloc to then attempt to publish a volume that's been
unstaged.

The usage tracker on the volume manager was intended to prevent this behavior
but the call to claim the volume was made only after staging and publishing was
complete. Move the call to claim the volume for the usage tracker to the top of
the `MountVolume` workflow to prevent it from being unstaged until all consuming
allocations have called `UnmountVolume`.

Fixes: https://github.com/hashicorp/nomad/issues/20424
2024-05-16 09:45:07 -04:00
Tim Gross
953bfcc31e services: retry failed Nomad service deregistrations from client (#20596)
When the allocation is stopped, we deregister the service in the alloc runner's
`PreKill` hook. This ensures we delete the service registration and wait for the
shutdown delay before shutting down the tasks, so that workloads can drain their
connections. However, the call to remove the workload only logs errors and never
retries them.

Add a short retry loop to the `RemoveWorkload` method for Nomad services, so
that transient errors give us an extra opportunity to deregister the service
before the tasks are stopped, before we need to fall back to the data integrity
improvements implemented in #20590.

Ref: https://github.com/hashicorp/nomad/issues/16616
2024-05-16 08:59:54 -04:00
Deniz Onur Duzgun
1cc99cc1b4 bug: resolve type conversion alerts (#20553) 2024-05-15 13:22:10 -04:00
Seth Hoenig
4148ca1769 client: mount shared alloc dir as nobody (#20589)
In the Unveil filesystem isolation mode we were mounting the shared
alloc dir with the UID/GID of the user of the task dir being mounted
and 0710 filesystem permissions. This was causing the actual task dir
to become inaccessible to other tasks in the allocation (a race where
the last mounter wins). Instead mount the shared alloc dir as nobody
with 0777 filesystem permissions.
2024-05-15 10:43:30 -05:00
James Rasell
04ba358266 client: expose network namespace CNI config as task env vars. (#11810)
This change exposes CNI configuration details of a network
namespace as environment variables. This allows a task to use
these value to configure itself; a potential use case is to run
a Raft application binding to IP and Port details configured using
the bridge network mode.
2024-05-14 09:02:06 +01:00
Juana De La Cuesta
169818b1bd [gh-6980] Client: clean up old allocs before running new ones using the exec task driver. (#20500)
Whenever the "exec" task driver is being used, nomad runs a plug in that in time runs the task on a container under the hood. If by any circumstance the executor is killed, the task is reparented to the init service and wont be stopped by Nomad in case of a job updated or stop.

This commit introduces two mechanisms to avoid this behaviour:

* Adds signal catching and handling to the executor, so in case of a SIGTERM, the signal will also be passed on to the task.
* Adds a pre start clean up of the processes in the container, ensuring only the ones the executor runs are present at any given time.
2024-05-14 09:51:27 +02:00
Tim Gross
65ae61249c CSI: include volume namespace in staging path (#20532)
CSI volumes are namespaced. But the client does not include the namespace in the
staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host. The
per-allocation paths don't need to be namespaced, because an allocation can only
mount volumes from its job's own namespace.

Rework the CSI hook tests to have more fine-grained control over the mock
on-disk state. Add tests covering upgrades from staging paths missing
namespaces.

Fixes: https://github.com/hashicorp/nomad/issues/18741
2024-05-13 11:24:09 -04:00
Tim Gross
623486b302 deps: vendor containernetworking/plugins functions for net NS utils (#20556)
We bring in `containernetworking/plugins` for the contents of a single file,
which we use in a few places for running a goroutine in a specific network
namespace. This code hasn't needed an update in a couple of years, and a good
chunk of what we need was previously vendored into `client/lib/nsutil`
already.

Updating the library via dependabot is causing errors in Docker driver tests
because it updates a lot of transient dependencies, and it's bringing in a pile
of new transient dependencies like opentelemetry. Avoid this problem going
forward by vendoring the remaining code we hadn't already.

Ref: https://github.com/hashicorp/nomad/pull/20146
2024-05-13 09:10:16 -04:00
James Rasell
7e42ad869a client: fix unallocated CPU metric when reserved cpu is set. (#20543) 2024-05-09 10:55:22 +01:00
Seth Hoenig
14a022cbc0 drivers/raw_exec: enable setting cgroup override values (#20481)
* drivers/raw_exec: enable setting cgroup override values

This PR enables configuration of cgroup override values on the `raw_exec`
task driver. WARNING: setting cgroup override values eliminates any
gauruntee Nomad can make about resource availability for *any* task on
the client node.

For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`.
The path may be either absolute or relative to the cgroup root.

config {
  cgroup_v2_override = "custom.slice/app.scope"
}

or

config {
  cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope"
}

For cgroup v1 systems, set a per-controller path for each controller using
`cgroup_v1_override`. The path(s) may be either absolute or relative to
the controller root.

config {
  cgroup_v1_override = {
    "pids": "custom/app",
    "cpuset": "custom/app",
  }
}

or

config {
  cgroup_v1_override = {
    "pids": "/sys/fs/cgroup/pids/custom/app",
    "cpuset": "/sys/fs/cgroup/cpuset/custom/app",
  }
}

* drivers/rawexec: ensure only one of v1/v2 cgroup override is set

* drivers/raw_exec: executor should error if setting cgroup does not work

* drivers/raw_exec: create cgroups in raw_exec tests

* drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root

* move custom cgroup func into shared file

---------

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2024-05-07 16:46:27 -07:00
Tim Gross
f41bc468eb consul: provide CONSUL_HTTP_TOKEN env var to tasks (#20519)
When available, we provide an environment variable `CONSUL_TOKEN` to tasks, but
this isn't the environment variable expected by the Consul CLI. Job
specifications like deploying an API Gateway become noticeably nicer if we can
instead provide the expected env var.
2024-05-03 11:30:33 -04:00
Seth Hoenig
5f64e42d73 client: fixup how alloc mounts directory are setup (#20463) 2024-04-26 07:29:52 -05:00
Tim Gross
6d58acd897 WI: ensure tasks within same alloc get different Consul tokens (#20411)
The `consul_hook` in the allocrunner gets a separate Consul token for each task,
even if the tasks' identities have the same name, but used the identity name as
the key to the alloc hook resources map. This means the last task in the group
overwrites the Consul tokens of all other tasks.

Fix this by adding the task name to the key in the allocrunner's
`consul_hook`. And update the taskrunner's `consul_hook` to expect the task
name in the key.

Fixes: https://github.com/hashicorp/nomad/issues/20374
Fixes: https://hashicorp.atlassian.net/browse/NOMAD-614
2024-04-17 11:29:58 -04:00
Luiz Aoqui
9d4f7bcb68 mock_driver: fix fingreprint key (#20351)
The `mock_driver` is an internal task driver used mostly for testing and
simulating workloads. During the allocrunner v2 work (#4792) its name
changed from `mock_driver` to just `mock` and then back to
`mock_driver`, but the fingreprint key was kept as `driver.mock`.

This results in tasks configured with `driver = "mock"` to be scheduled
(because Nomad thinks the client has a task driver called `mock`), but
fail to actually run (because the Nomad client can't find a driver
called `mock` in its catalog).

Fingerprinting the right name prevents the job from being scheduled in
the first place.

Also removes mentions of the mock driver from documentation since its an
internal driver and not available in any production release.
2024-04-16 07:16:55 +01:00
Tim Gross
9cb1ef3e3d CNI: fix bugs in parsing strings to port number integers (#20379)
Ports are a maximum of uint16, but we have a few places in the recent tproxy
code where we were parsing them as 64-bit wide integers and then downcasting
them to `int`, which is technically unsafe and triggers code scanning alerts. In
practice we've validated the range elsewhere and don't build for 32-bit
platforms. This changeset fixes the parsing to make everything a bit more robust
and silence the alert.

Fixes: https://github.com/hashicorp/nomad-enterprise/security/code-scanning/444
2024-04-12 13:31:25 -04:00
Seth Hoenig
ae6c4c8e3f deps: purge use of old x/exp packages (#20373) 2024-04-12 08:29:00 -05:00
astudentofblake
7b7ed12326 func: Allow custom paths to be added the the getter landlock (#20349)
* func: Allow custom paths to be added the the getter landlock

Fixes: 20315

* fix: slices imports
fix: more meaningful examples
fix: improve documentation
fix: quote error output
2024-04-11 15:17:33 -05:00
Tim Gross
d56e8ad1aa WI: ensure Consul hook and WID manager interpolate services (#20344)
Services can have some of their string fields interpolated. The new Workload
Identity flow doesn't interpolate the services before requesting signed
identities or using those identities to get Consul tokens.

Add support for interpolation to the WID manager and the Consul tokens hook by
providing both with a taskenv builder. Add an "interpolate workload" field to
the WI handle to allow passing the original workload name to the server so the
server can find the correct service to sign.

This changeset also makes two related test improvements:
* Remove the mock WID manager, which was only used in the Consul hook tests and
  isn't necessary so long as we provide the real WID manager with the mock
  signer and never call `Run` on it. It wasn't feasible to exercise the correct
  behavior without this refactor, as the mocks were bypassing the new code.
* Fixed swapped expect-vs-actual assertions on the `consul_hook` tests.

Fixes: https://github.com/hashicorp/nomad/issues/20025
2024-04-11 15:40:28 -04:00
Tim Gross
8298d39e78 Connect transparent proxy support
Add support for Consul Connect transparent proxies

Fixes: https://github.com/hashicorp/nomad/issues/10628
2024-04-10 11:00:18 -04:00
Tim Gross
4fef82e8e2 tproxy: refactor getPortMapping
The `getPortMapping` method forces callers to handle two different data
structures, but only one caller cares about it. We don't want to return a single
map or slice because the `cni.PortMapping` object doesn't include a label field
that we need for tproxy. Return a new datastructure that closes over both a
slice of `cni.PortMapping` and a map of label to index in that slice.
2024-04-10 10:16:13 -04:00
Tim Gross
8eaf176868 client: fix IPv6 parsing for client.servers block (#20324)
When the `client.servers` block is parsed, we split the port from the
address. This does not correctly handle IPv6 addresses when they are in URL
format (wrapped in brackets), which we require to disambiguate the port and
address.

Fix the parser to correctly split out the port and handle a missing port value
for IPv6. Update the documentation to make the URL format requirement clear.

Fixes: https://github.com/hashicorp/nomad/issues/20310
2024-04-08 15:06:27 -04:00
Tim Gross
76009d89af tproxy: networking hook changes (#20183)
When `transparent_proxy` block is present and the network mode is `bridge`, use
a different CNI configuration that includes the `consul-cni` plugin. Before
invoking the CNI plugins, create a Consul SDK `iptables.Config` struct for the
allocation. This includes:

* Use all the `transparent_proxy` block fields
* The reserved ports are added to the inbound exclusion list so the alloc is
  reachable from outside the mesh
* The `expose` blocks and `check` blocks with `expose=true` are added to the
  inbound exclusion list so health checks work.

The `iptables.Config` is then passed as a CNI argument to the `consul-cni`
plugin.

Ref: https://github.com/hashicorp/nomad/issues/10628
2024-04-04 17:01:07 -04:00