Commit Graph

841 Commits

Author SHA1 Message Date
Tim Gross
140747240f consul: include admin partition in JWT login requests (#22226)
When logging into a JWT auth method, we need to explicitly supply the Consul
admin partition if the local Consul agent is in a partition. We can't derive
this from agent configuration because the Consul agent's configuration is
canonical, so instead we get the partition from the fingerprint (if
available). This changeset updates the Consul client constructor so that we
close over the partition from the fingerprint.

Ref: https://hashicorp.atlassian.net/browse/NET-9451
2024-05-29 16:31:09 -04:00
Daniel Bennett
4415fabe7d jobspec: time based task execution (#22201)
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.

to enable daily-scheduled execution entirely client-side,
a job may now contain:

task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
...

and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.

this includes a taksrunner hook, which watches for the end of
the schedule, at which point it will kill the task.

then, restarts-allowing, a new task will start and again block
waiting for start, and so on.

this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
2024-05-22 15:40:25 -05:00
Seth Hoenig
09bd11383c client: alloc_mounts directory must be sibling of data directory (#22199)
This PR adjusts the default location of -alloc-mounts-dir path to be a
sibling of the -data-dir path rather than a child. This is because on a
production-hardened systems the data dir is supposed to be chmod 0700
owned by root - preventing the exec2 task driver (and others using
unveil file system isolation features) from working properly.

For reference the directory structure from -data-dir now looks like this
after running an example job. Under the alloc_mounts directory, task
specific directories are mode 0710 and owned by the task user (which
may be a dynamic user UID/GID).

➜ sudo tree -p -d -u /tmp/mynomad
[drwxrwxr-x shoenig ]  /tmp/mynomad
├── [drwx--x--x root    ]  alloc_mounts
│   └── [drwx--x--- 80552   ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask
│       ├── [drwxrwxrwx nobody  ]  alloc
│       │   ├── [drwxrwxrwx nobody  ]  data
│       │   ├── [drwxrwxrwx nobody  ]  logs
│       │   └── [drwxrwxrwx nobody  ]  tmp
│       ├── [drwxrwxrwx nobody  ]  local
│       ├── [drwxr-xr-x root    ]  private
│       ├── [drwx--x--- 80552   ]  secrets
│       └── [drwxrwxrwt nobody  ]  tmp
└── [drwx------ root    ]  data
    ├── [drwx--x--x root    ]  alloc
    │   └── [drwxr-xr-x root    ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6
    │       ├── [drwxrwxrwx nobody  ]  alloc
    │       │   ├── [drwxrwxrwx nobody  ]  data
    │       │   ├── [drwxrwxrwx nobody  ]  logs
    │       │   └── [drwxrwxrwx nobody  ]  tmp
    │       └── [drwx--x--- 80552   ]  mytask
    │           ├── [drwxrwxrwx nobody  ]  alloc
    │           │   ├── [drwxrwxrwx nobody  ]  data
    │           │   ├── [drwxrwxrwx nobody  ]  logs
    │           │   └── [drwxrwxrwx nobody  ]  tmp
    │           ├── [drwxrwxrwx nobody  ]  local
    │           ├── [drwxrwxrwx nobody  ]  private
    │           ├── [drwx--x--- 80552   ]  secrets
    │           └── [drwxrwxrwt nobody  ]  tmp
    ├── [drwx------ root    ]  client
    └── [drwxr-xr-x root    ]  server
        ├── [drwx------ root    ]  keystore
        ├── [drwxr-xr-x root    ]  raft
        │   └── [drwxr-xr-x root    ]  snapshots
        └── [drwxr-xr-x root    ]  serf

32 directories
2024-05-22 13:14:34 -05:00
James Rasell
04ba358266 client: expose network namespace CNI config as task env vars. (#11810)
This change exposes CNI configuration details of a network
namespace as environment variables. This allows a task to use
these value to configure itself; a potential use case is to run
a Raft application binding to IP and Port details configured using
the bridge network mode.
2024-05-14 09:02:06 +01:00
Tim Gross
65ae61249c CSI: include volume namespace in staging path (#20532)
CSI volumes are namespaced. But the client does not include the namespace in the
staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host. The
per-allocation paths don't need to be namespaced, because an allocation can only
mount volumes from its job's own namespace.

Rework the CSI hook tests to have more fine-grained control over the mock
on-disk state. Add tests covering upgrades from staging paths missing
namespaces.

Fixes: https://github.com/hashicorp/nomad/issues/18741
2024-05-13 11:24:09 -04:00
Seth Hoenig
14a022cbc0 drivers/raw_exec: enable setting cgroup override values (#20481)
* drivers/raw_exec: enable setting cgroup override values

This PR enables configuration of cgroup override values on the `raw_exec`
task driver. WARNING: setting cgroup override values eliminates any
gauruntee Nomad can make about resource availability for *any* task on
the client node.

For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`.
The path may be either absolute or relative to the cgroup root.

config {
  cgroup_v2_override = "custom.slice/app.scope"
}

or

config {
  cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope"
}

For cgroup v1 systems, set a per-controller path for each controller using
`cgroup_v1_override`. The path(s) may be either absolute or relative to
the controller root.

config {
  cgroup_v1_override = {
    "pids": "custom/app",
    "cpuset": "custom/app",
  }
}

or

config {
  cgroup_v1_override = {
    "pids": "/sys/fs/cgroup/pids/custom/app",
    "cpuset": "/sys/fs/cgroup/cpuset/custom/app",
  }
}

* drivers/rawexec: ensure only one of v1/v2 cgroup override is set

* drivers/raw_exec: executor should error if setting cgroup does not work

* drivers/raw_exec: create cgroups in raw_exec tests

* drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root

* move custom cgroup func into shared file

---------

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2024-05-07 16:46:27 -07:00
Tim Gross
f41bc468eb consul: provide CONSUL_HTTP_TOKEN env var to tasks (#20519)
When available, we provide an environment variable `CONSUL_TOKEN` to tasks, but
this isn't the environment variable expected by the Consul CLI. Job
specifications like deploying an API Gateway become noticeably nicer if we can
instead provide the expected env var.
2024-05-03 11:30:33 -04:00
Seth Hoenig
5f64e42d73 client: fixup how alloc mounts directory are setup (#20463) 2024-04-26 07:29:52 -05:00
Tim Gross
6d58acd897 WI: ensure tasks within same alloc get different Consul tokens (#20411)
The `consul_hook` in the allocrunner gets a separate Consul token for each task,
even if the tasks' identities have the same name, but used the identity name as
the key to the alloc hook resources map. This means the last task in the group
overwrites the Consul tokens of all other tasks.

Fix this by adding the task name to the key in the allocrunner's
`consul_hook`. And update the taskrunner's `consul_hook` to expect the task
name in the key.

Fixes: https://github.com/hashicorp/nomad/issues/20374
Fixes: https://hashicorp.atlassian.net/browse/NOMAD-614
2024-04-17 11:29:58 -04:00
Tim Gross
9cb1ef3e3d CNI: fix bugs in parsing strings to port number integers (#20379)
Ports are a maximum of uint16, but we have a few places in the recent tproxy
code where we were parsing them as 64-bit wide integers and then downcasting
them to `int`, which is technically unsafe and triggers code scanning alerts. In
practice we've validated the range elsewhere and don't build for 32-bit
platforms. This changeset fixes the parsing to make everything a bit more robust
and silence the alert.

Fixes: https://github.com/hashicorp/nomad-enterprise/security/code-scanning/444
2024-04-12 13:31:25 -04:00
Seth Hoenig
ae6c4c8e3f deps: purge use of old x/exp packages (#20373) 2024-04-12 08:29:00 -05:00
astudentofblake
7b7ed12326 func: Allow custom paths to be added the the getter landlock (#20349)
* func: Allow custom paths to be added the the getter landlock

Fixes: 20315

* fix: slices imports
fix: more meaningful examples
fix: improve documentation
fix: quote error output
2024-04-11 15:17:33 -05:00
Tim Gross
d56e8ad1aa WI: ensure Consul hook and WID manager interpolate services (#20344)
Services can have some of their string fields interpolated. The new Workload
Identity flow doesn't interpolate the services before requesting signed
identities or using those identities to get Consul tokens.

Add support for interpolation to the WID manager and the Consul tokens hook by
providing both with a taskenv builder. Add an "interpolate workload" field to
the WI handle to allow passing the original workload name to the server so the
server can find the correct service to sign.

This changeset also makes two related test improvements:
* Remove the mock WID manager, which was only used in the Consul hook tests and
  isn't necessary so long as we provide the real WID manager with the mock
  signer and never call `Run` on it. It wasn't feasible to exercise the correct
  behavior without this refactor, as the mocks were bypassing the new code.
* Fixed swapped expect-vs-actual assertions on the `consul_hook` tests.

Fixes: https://github.com/hashicorp/nomad/issues/20025
2024-04-11 15:40:28 -04:00
Tim Gross
4fef82e8e2 tproxy: refactor getPortMapping
The `getPortMapping` method forces callers to handle two different data
structures, but only one caller cares about it. We don't want to return a single
map or slice because the `cni.PortMapping` object doesn't include a label field
that we need for tproxy. Return a new datastructure that closes over both a
slice of `cni.PortMapping` and a map of label to index in that slice.
2024-04-10 10:16:13 -04:00
Tim Gross
76009d89af tproxy: networking hook changes (#20183)
When `transparent_proxy` block is present and the network mode is `bridge`, use
a different CNI configuration that includes the `consul-cni` plugin. Before
invoking the CNI plugins, create a Consul SDK `iptables.Config` struct for the
allocation. This includes:

* Use all the `transparent_proxy` block fields
* The reserved ports are added to the inbound exclusion list so the alloc is
  reachable from outside the mesh
* The `expose` blocks and `check` blocks with `expose=true` are added to the
  inbound exclusion list so health checks work.

The `iptables.Config` is then passed as a CNI argument to the `consul-cni`
plugin.

Ref: https://github.com/hashicorp/nomad/issues/10628
2024-04-04 17:01:07 -04:00
Seth Hoenig
6ad648bec8 networking: Inject implicit constraints on CNI plugins when using bridge mode (#15473)
This PR adds a job mutator which injects constraints on the job taskgroups
that make use of bridge networking. Creating a bridge network makes use of the
CNI plugins: bridge, firewall, host-local, loopback, and portmap. Starting
with Nomad 1.5 these plugins are fingerprinted on each node, and as such we
can ensure jobs are correctly scheduled only on nodes where they are available,
when needed.
2024-03-27 16:11:39 -04:00
Tim Gross
15162917c1 cni: fix regression in falling back to DNS owned by dockerd (#20189)
In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: https://github.com/hashicorp/nomad/issues/20174
2024-03-22 10:54:16 -04:00
Tim Gross
13617eee4b template: improve internal documentation around shutdown (#20134)
While investigating a report around possible consul-template shutdown issues,
which didn't bear fruit, I found that some of the logic around template runner
shutdown is unintuitive.

* Add some doc strings to the places where someone might think we should be
  obviously stopping the runner or returning early.
* Mark context argument for `Poststart`, `Stop`, and `Update` hooks as unused.

No functional code changes.
2024-03-14 15:33:32 -04:00
Amir Abbas
40b8f17717 Support insecure flag on artifact (#20126) 2024-03-14 10:59:20 -05:00
Seth Hoenig
bb54d16e4a exec2: setup RPC plumbing for dynamic workload users (#20129)
And pass the dynamic users pool from the client into the hook.
2024-03-13 14:06:52 -05:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
carrychair
5f5b34db0e remove repetitive words (#20110)
Signed-off-by: carrychair <linghuchong404@gmail.com>
2024-03-11 08:52:08 +00:00
Seth Hoenig
67554b8f91 exec2: implement dynamic workload users taskrunner hook (#20069)
* exec2: implement dynamic workload users taskrunner hook

This PR impelements a TR hook for allocating dynamic workload users from
a pool managed by the Nomad client. This adds a new task driver Capability,
DynamicWorkloadUsers - which a task driver must indicate in order to make
use of this feature.

The client config plumbing is coming in a followup PR - in the RFC we
realized having a client.users block would be nice to have, with some
additional unrelated options being moved from the deprecated client.options
config.

* learn to spell
2024-03-06 09:34:27 -06:00
Tim Gross
45b2c34532 cni: add DNS set by CNI plugins to task configuration (#20007)
CNI plugins may set DNS configuration, but this isn't threaded through to the
task configuration so that we can write it to the `/etc/resolv.conf` file as
needed. Add the `AllocNetworkStatus` to the alloc hook resources so they're
accessible from the taskrunner. Any DNS entries provided by the user will
override these values.

Fixes: https://github.com/hashicorp/nomad/issues/11102
2024-02-20 10:17:27 -05:00
Tim Gross
a54657899c CNI: fix deprecation warnings (#19954)
We updated our `go-cni` dependency in #17582 but this left deprecation warnings
on the `cni.CNIResult` type (now `cni.Result`).
2024-02-12 15:35:43 -05:00
Tim Gross
df86503349 template: sandbox template rendering
The Nomad client renders templates in the same privileged process used for most
other client operations. During internal testing, we discovered that a malicious
task can create a symlink that can cause template rendering to read and write to
arbitrary files outside the allocation sandbox. Because the Nomad agent can be
restarted without restarting tasks, we can't simply check that the path is safe
at the time we write without encountering a time-of-check/time-of-use race.

To protect Nomad client hosts from this attack, we'll now read and write
templates in a subprocess:

* On Linux/Unix, this subprocess is sandboxed via chroot to the allocation
  directory. This requires that Nomad is running as a privileged process. A
  non-root Nomad agent will warn that it cannot sandbox the template renderer.

* On Windows, this process is sandboxed via a Windows AppContainer which has
  been granted access to only to the allocation directory. This does not require
  special privileges on Windows. (Creating symlinks in the first place can be
  prevented by running workloads as non-Administrator or
  non-ContainerAdministrator users.)

Both sandboxes cause encountered symlinks to be evaluated in the context of the
sandbox, which will result in a "file not found" or "access denied" error,
depending on the platform. This change will also require an update to
Consul-Template to allow callers to inject a custom `ReaderFunc` and
`RenderFunc`.

This design is intended as a workaround to allow us to fix this bug without
creating backwards compatibility issues for running tasks. A future version of
Nomad may introduce a read-only mount specifically for templates and artifacts
so that tasks cannot write into the same location that the Nomad agent is.

Fixes: https://github.com/hashicorp/nomad/issues/19888
Fixes: CVE-2024-1329
2024-02-08 10:40:24 -05:00
Juana De La Cuesta
120c3ca3c9 Add granular control of SELinux labels for host mounts (#19839)
Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label

* Update website/content/docs/job-specification/volume_mount.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* fix: typo

* func: make volume mount verification happen even on  mounts with no volume

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-02-05 10:05:33 +01:00
Tim Gross
334c383eb6 template: run template tests on Windows where possible (#19856)
We don't run the whole suite of unit tests on all platforms to keep CI times
reasonable, so the only things we've been running on Windows are
platform-specific.

I'm working on some platform-specific `template` related work and having these
tests run on Windows will reduce the risk of regressions. Our Windows CI box
doesn't have Consul or Vault, so I've skipped those tests for the time being,
and can follow up with that later. There's also a test with assertions looking
for specific paths, and the results are different on Windows. I've skipped those
for the moment as well and will follow up under a separate PR.

Also swap `testify` for `shoenig/test`
2024-02-02 09:22:03 -05:00
Michael Schurter
8f564182ef connect: rewrite envoy bootstrap on every restart (#19787)
Fixes #19781

Do not mark the envoy bootstrap hook as done after successfully running once.
Since the bootstrap file is written to /secrets, which is a tmpfs on supported
platforms, it is not persisted across reboots. This causes the task and
allocation to fail on reboot (see #19781).

This fixes it by *always* rewriting the envoy bootstrap file every time the
Nomad agent starts. This does mean we may write a new bootstrap file to an
already running Envoy task, but in my testing that doesn't have any impact.

This commit doesn't necessarily fix every use of Done by hooks, but hopefully
improves the situation. The comment on Done has been expanded to hopefully
avoid misuse in the future.

Done assertions were removed from tests as they add more noise than value.

*Alternative 1: Use a regular file*

An alternative approach would be to write the bootstrap file somewhere
other than the tmpfs, but this is *unsafe* as when Consul ACLs are
enabled the file will contain a secret token:
https://developer.hashicorp.com/consul/commands/connect/envoy#bootstrap

*Alternative 2: Detect if file is already written*

An alternative approach would be to detect if the bootstrap file exists,
and only write it if it doesn't.

This is just a more complicated form of the current fix. I think in
general in the absence of other factors task hooks should be idempotent
and therefore able to rerun on any agent startup. This simplifies the
code and our ability to reason about task restarts vs agent restarts vs
node reboots by making them all take the same code path.
2024-01-24 11:26:31 -08:00
Seth Hoenig
5b7f4746ce client/allocdir: use an interface in place of AllocDir structs (#19703)
* client/allocdir: use an interface in place of AllocDir structs

This PR replace *allocdir.AllocDir with allocdir.Interface such that we
may eventually have another implementation of alloc directories. This is
in support of the exec2 driver, which will need an implementation of the
alloc directory incompatibile with the current version.

* use rlock
2024-01-12 14:13:29 -06:00
Tim Gross
0935f443dc vault: support allowing tokens to expire without refresh (#19691)
Some users with batch workloads or short-lived prestart tasks want to derive a
Vaul token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.

When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.

Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.

Fixes: https://github.com/hashicorp/nomad/issues/8690
2024-01-10 14:49:02 -05:00
Piotr Kazmierczak
f1fb51422b client: consul hook not called for templates (#19490)
Due to some refactoring mishap, task-level Consul hook was never triggered and
thus never wrote any secrets in task secret dirs.
2023-12-15 17:16:00 +01:00
Luiz Aoqui
0bc822db40 vault: load default config for tasks without vault (#19439)
It is often expected that a task that needs access to Vault defines a
`vault` block to specify the Vault policy to use to derive a token.

But in some scenarios, like when the Nomad client is connected to a
local Vault agent that is responsible for authn/authz, the task is not
required to defined a `vault` block.

In these situations, the `default` Vault cluster should be used to
render the template.
2023-12-12 14:06:55 -05:00
Luiz Aoqui
099ee06a60 Revert "deps: update go-metrics to v0.5.3 (#19190)" (#19374)
* Revert "deps: update go-metrics to v0.5.3 (#19190)"

This reverts commit ddb060d8b3.

* changelog: add entry for #19374
2023-12-08 08:46:55 -05:00
Tim Gross
ae403dcb4b script_check_hook: handle task-level Consul namespace (#19241)
The `script_check_hook` runs at the task level but can create script checks for
both task-level services and group-level services. Now that we allow the Consul
namespace to be set at the task-level `consul.namespace`, we need to have both
possible namespaces handy when creating and updating checks.
2023-11-30 11:13:30 -05:00
Luiz Aoqui
1a2d41d30b consul: refactor allocrunner consul hook (#19229)
Refactor the JWT token derivation logic to only take a single request
since it was only ever called with a map of length one.

The original implementation received multiple requets to match the
legacy flow, but but legacy flow requests were batched from the Nomad
client to the server, which doesn't happen for JWT. Each JWT request
goes directly from the Nomad client to the Consul agent, so there is no
batching involved.
2023-11-30 10:55:03 -05:00
Tim Gross
f77b4baebb service_hook: ensure task-level consul.namespace is respected (#19224)
The task-level service hook is using the group-level method to get the provider
namespace, but this was not designed with task-level `consul` blocks in
mind. This leads to task-level services using the group-level
`consul.namespace`. Fix by creating a method to get the correct namespace and
move this into the service hook itself rather than in the outer `initHooks`
method.
2023-11-29 16:46:27 -05:00
Luiz Aoqui
ddb060d8b3 deps: update go-metrics to v0.5.3 (#19190)
Update `go-metrics` to v0.5.3 to pick
https://github.com/hashicorp/go-metrics/pull/146.
2023-11-28 12:37:57 -05:00
Piotr Kazmierczak
6a98e45c53 client: add metadata to tokens requested by Consul client (#19196)
This way tokens created by Nomad workloads are easier to keep track of.
2023-11-28 16:09:31 +01:00
Piotr Kazmierczak
711da2e653 client: change Consul client interface (#19140)
DeriveSITokenWithJWT is a misleading method name, because it's used to derive
Consul ACL tokens for other purposes too.
2023-11-21 16:01:26 +01:00
Piotr Kazmierczak
e9019d5fc8 client: make sure consul_hook does not perform double requests for tasks (#19137) 2023-11-21 10:24:45 +01:00
Tim Gross
b5af87ebf3 set Vault namespace from task in vault_hook JWT login (#19080)
The JWT login codepath for the `vault_hook` was missing the Vault namespace, so
the login request for non-default namespaces would fail.
2023-11-14 09:54:36 -05:00
Luiz Aoqui
f0acf72ae7 client: fix Consul token retrievel for templates (#19058)
The template hook must use the Consul token for the cluster defined in
the task-level `consul` block or, if `nil, in the group-level `consul`
block.

The Consul tokens are generated by the allocrunner consul hook, but
during the transition period we must fallback to the Nomad agent token
if workload identities are not being used.

So an empty token returned from `GetConsulTokens()` is not enough to
determine if we should use the legacy flow (either this is an old task
or the cluster is not configured for Consul WI), or if there is a
misconfiguration (task or group is `consul` block is using a cluster
that doesn't have an `identity` set).

In order to distinguish between the two scenarios we must iterate over
the task identities looking for one suitable for the Consul cluster
being used.
2023-11-10 13:42:30 -05:00
Tim Gross
5ad715b281 fix taskrunner test after broken signature (#19056)
PRs #19034 and #19040 accidentally conflicted with each other without a merge
conflict when #19034 changes the method signature of `SetConsulTokens`. Because
CI doesn't rebase, both PRs tested fine and only were broken once they landed on
`main`. Fix that.
2023-11-09 15:53:25 -05:00
Luiz Aoqui
6d8417014f client: pass alloc hook resources to template hook (#19040)
The task template hook uses the alloc resource to retrieve Consul
tokens, so it must be passed from the allocation.
2023-11-09 10:55:35 -05:00
Tim Gross
c7c3b3ae33 revoke Consul tokens obtained via WI when alloc stops (#19034)
Add a `Postrun` and `Destroy` hook to the allocrunner's `consul_hook` to ensure
that Consul tokens we've created via WI get revoked via the logout API when
we're done with them. Also add the logout to the `Prerun` hook if we've hit an
error.
2023-11-09 10:08:09 -05:00
Tim Gross
7191c78928 refactor: rename allocrunner's Consul service reg handler (#19019)
The allocrunner has a service registration handler that proxies various API
calls to Consul. With multi-cluster support (for ENT), the service registration
handler is what selects the correct Consul client. The name of this field in the
allocrunner and taskrunner code base looks like it's referring to the actual
Consul API client. This was actually the case before Nomad native service
discovery was implemented, but now the name is misleading.
2023-11-08 15:39:32 -05:00
Tim Gross
50f0ce5412 config: remove old Vault/Consul config blocks from client (#18994)
Remove the now-unused original configuration blocks for Consul and Vault from
the client. When the client needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the client's own use).

This is two of three changesets for this work. The remainder will implement the
same changes in the `command/agent` package.

As part of this work I discovered and fixed two bugs:

* The gRPC proxy socket that we create for Envoy is only ever created using the
  default Consul cluster's configuration. This will prevent Connect from being
  used with the non-default cluster.
* The Consul configuration we use for templates always comes from the default
  Consul cluster's configuration, but will use the correct Consul token for the
  non-default cluster. This will prevent templates from being used with the
  non-default cluster.

Ref: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Fixes: https://github.com/hashicorp/nomad/issues/18984
Fixes: https://github.com/hashicorp/nomad/issues/18983
2023-11-07 09:15:37 -05:00
Tim Gross
483e78615d template: fix test assertion to be compatible between CE/ENT (#18957)
The template hook emits an error when the task has a Consul block that requires
WI but there's no WI. The exact error message we get depends on whether we're
running in CE or ENT. Update the test assertion so that we can tolerate this
difference without building ENT-specific test files.
2023-11-01 13:26:45 -04:00
Tim Gross
dd62e8a319 consul/vault: use accessor method to get cluster name in client (#18955)
When looking up the Consul or Vault cluster from a client hook, we should always
use an accessor function rather than trying to lookup the `Cluster` field, which
may be empty for jobs registered before Nomad 1.7.
2023-11-01 10:59:59 -04:00