Commit Graph

600 Commits

Author SHA1 Message Date
James Rasell
beb4097e81 client: mark the remote_task hook as deprecated. (#24505) 2024-11-20 15:32:50 +00:00
Tim Gross
a420732424 consul: allow non-root Nomad to rewrite token (#24410)
When a task restarts, the Nomad client may need to rewrite the Consul token, but
it's created with permissions that prevent a non-root agent from writing to
it. While Nomad clients should be run as root (currently), it's harmless to
allow whatever user the Nomad agent is running as to be able to write to it, and
that's one less barrier to rootless Nomad.

Ref: https://github.com/hashicorp/nomad/issues/23859#issuecomment-2465757392
2024-11-19 10:21:14 -05:00
Michael Smithhisler
0714353324 fix: handle template re-renders on client restart (#24399)
When multiple templates with api functions are included in a task, it's
possible for consul-template to re-render templates as it creates
watchers, overwriting render event data. This change uses event fields
that do not get overwritten, and only executes the change mode for
templates that were actually written to disk.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-11-08 12:49:38 -05:00
Seth Hoenig
f1ce127524 jobspec: add a chown option to artifact block (#24157)
* jobspec: add a chown option to artifact block

This PR adds a boolean 'chown' field to the artifact block.

It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.

* api: no need for pointer of chown field
2024-10-11 11:30:27 -05:00
Martijn Vegter
3ecf0d21e2 metrics: introduce client config to include alloc metadata as part of the base labels (#23964) 2024-10-02 10:55:44 -04:00
Tim Gross
cc9227b858 template: fix panic in change_mode=script on client restart (#24057)
When we introduced change_mode=script to templates, we passed the driver handle
down into the template manager so we could call its `Exec` method directly. But
the lifecycle of the driver handle is managed by the taskrunner and isn't
available when the template manager is first created. This has led to a series
of patches trying to fixup the behavior (#15915, #15192, #23663, #23917). Part
of the challenge in getting this right is using an interface to avoid the
circular import of the driver handle.

But the taskrunner already has a way to deal with this problem using a "lazy
handle". The other template change modes already use this indirectly through the
`Lifecycle` interface. Change the driver handle `Exec` call in the template
manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This
eliminates the need for the template manager to know anything at all about the
handle state.

Fixes: https://github.com/hashicorp/nomad/issues/24051
2024-09-25 08:59:01 -04:00
Michael Smithhisler
6b6aa7cc26 identity: adds ability to specify custom filepath for saving workload identities (#24038) 2024-09-23 10:27:00 -04:00
Tim Gross
07aca67108 template: lock task handle before trying script check (#23917)
In #23663 we fixed the template hook so that `change_mode="script"` didn't lose
track of the task handle during restores. But this revealed a second bug which
is that access to the handle is not locked while in use, which can allow it to
be removed concurrently.

Fixes: https://github.com/hashicorp/nomad/issues/23875
2024-09-12 08:41:06 -04:00
Tim Gross
b25f1b66ce resources: allow job authors to configure size of secrets tmpfs (#23696)
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.

This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.

Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.

For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.

Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
2024-08-05 16:06:58 -04:00
Tim Gross
c280891703 template: allow change_mode script to run after client restart (#23663)
For templates with `change_mode = "script"`, we set a driver handle in the
poststart method, so the template runner can execute the script inside the
task. But when the client is restarted and the template contents change during
that window, we trigger a change_mode in the prestart method. In that case, the
hook will not have the handle and so returns an errror trying to run the change
mode.

We restore the driver handle before we call any prestart hooks, so we can pass
that handle in the constructor whenever it's available. In the normal task start
case the handle will be empty but also won't be called.

The error messages are also misleading, as there's no capabilities check
happening here. Update the error messages to match.

Fixes: https://github.com/hashicorp/nomad/issues/15851
Ref: https://hashicorp.atlassian.net/browse/NET-9338
2024-07-24 08:29:39 -04:00
Piotr Kazmierczak
356ea87e00 template: disable sandboxed rendering on Windows (#23432)
Following #23443, we no longer need to sandbox template rendering on Windows.
2024-06-28 17:16:27 +02:00
Tim Gross
df67e74615 Consul: add preflight checks for Envoy bootstrap (#23381)
Nomad creates Consul ACL tokens and service registrations to support Consul
service mesh workloads, before bootstrapping the Envoy proxy. Nomad always talks
to the local Consul agent and never directly to the Consul servers. But the
local Consul agent talks to the Consul servers in stale consistency mode to
reduce load on the servers. This can result in the Nomad client making the Envoy
bootstrap request with a tokens or services that have not yet replicated to the
follower that the local client is connected to. This request gets a 404 on the
ACL token and that negative entry gets cached, preventing any retries from
succeeding.

To workaround this, we'll use a method described by our friends over on
`consul-k8s` where after creating the objects in Consul we try to read them from
the local agent in stale consistency mode (which prevents a failed read from
being cached). This cannot completely eliminate this source of error because
it's possible that Consul cluster replication is unhealthy at the time we need
it, but this should make Envoy bootstrap significantly more robust.

This changset adds preflight checks for the objects we create in Consul:
* We add a preflight check for ACL tokens after we login via via Workload
  Identity and in the function we use to derive tokens in the legacy
  workflow. We do this check early because we also want to use this token for
  registering group services in the allocrunner hooks.
* We add a preflight check for services right before we bootstrap Envoy in the
  taskrunner hook, so that we have time for our service client to batch updates
  to the local Consul agent in addition to the local agent sync.

We've added the timeouts to be configurable via node metadata rather than the
usual static configuration because for most cases, users should not need to
touch or even know these values are configurable; the configuration is mostly
available for testing.


Fixes: https://github.com/hashicorp/nomad/issues/9307
Fixes: https://github.com/hashicorp/nomad/issues/10451
Fixes: https://github.com/hashicorp/nomad/issues/20516

Ref: https://github.com/hashicorp/consul-k8s/pull/887
Ref: https://hashicorp.atlassian.net/browse/NET-10051
Ref: https://hashicorp.atlassian.net/browse/NET-9273
Follow-up: https://hashicorp.atlassian.net/browse/NET-10138
2024-06-27 10:15:37 -04:00
Daniel Bennett
4415fabe7d jobspec: time based task execution (#22201)
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.

to enable daily-scheduled execution entirely client-side,
a job may now contain:

task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
...

and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.

this includes a taksrunner hook, which watches for the end of
the schedule, at which point it will kill the task.

then, restarts-allowing, a new task will start and again block
waiting for start, and so on.

this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
2024-05-22 15:40:25 -05:00
James Rasell
04ba358266 client: expose network namespace CNI config as task env vars. (#11810)
This change exposes CNI configuration details of a network
namespace as environment variables. This allows a task to use
these value to configure itself; a potential use case is to run
a Raft application binding to IP and Port details configured using
the bridge network mode.
2024-05-14 09:02:06 +01:00
Seth Hoenig
14a022cbc0 drivers/raw_exec: enable setting cgroup override values (#20481)
* drivers/raw_exec: enable setting cgroup override values

This PR enables configuration of cgroup override values on the `raw_exec`
task driver. WARNING: setting cgroup override values eliminates any
gauruntee Nomad can make about resource availability for *any* task on
the client node.

For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`.
The path may be either absolute or relative to the cgroup root.

config {
  cgroup_v2_override = "custom.slice/app.scope"
}

or

config {
  cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope"
}

For cgroup v1 systems, set a per-controller path for each controller using
`cgroup_v1_override`. The path(s) may be either absolute or relative to
the controller root.

config {
  cgroup_v1_override = {
    "pids": "custom/app",
    "cpuset": "custom/app",
  }
}

or

config {
  cgroup_v1_override = {
    "pids": "/sys/fs/cgroup/pids/custom/app",
    "cpuset": "/sys/fs/cgroup/cpuset/custom/app",
  }
}

* drivers/rawexec: ensure only one of v1/v2 cgroup override is set

* drivers/raw_exec: executor should error if setting cgroup does not work

* drivers/raw_exec: create cgroups in raw_exec tests

* drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root

* move custom cgroup func into shared file

---------

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2024-05-07 16:46:27 -07:00
Tim Gross
f41bc468eb consul: provide CONSUL_HTTP_TOKEN env var to tasks (#20519)
When available, we provide an environment variable `CONSUL_TOKEN` to tasks, but
this isn't the environment variable expected by the Consul CLI. Job
specifications like deploying an API Gateway become noticeably nicer if we can
instead provide the expected env var.
2024-05-03 11:30:33 -04:00
Seth Hoenig
5f64e42d73 client: fixup how alloc mounts directory are setup (#20463) 2024-04-26 07:29:52 -05:00
Tim Gross
6d58acd897 WI: ensure tasks within same alloc get different Consul tokens (#20411)
The `consul_hook` in the allocrunner gets a separate Consul token for each task,
even if the tasks' identities have the same name, but used the identity name as
the key to the alloc hook resources map. This means the last task in the group
overwrites the Consul tokens of all other tasks.

Fix this by adding the task name to the key in the allocrunner's
`consul_hook`. And update the taskrunner's `consul_hook` to expect the task
name in the key.

Fixes: https://github.com/hashicorp/nomad/issues/20374
Fixes: https://hashicorp.atlassian.net/browse/NOMAD-614
2024-04-17 11:29:58 -04:00
Seth Hoenig
ae6c4c8e3f deps: purge use of old x/exp packages (#20373) 2024-04-12 08:29:00 -05:00
astudentofblake
7b7ed12326 func: Allow custom paths to be added the the getter landlock (#20349)
* func: Allow custom paths to be added the the getter landlock

Fixes: 20315

* fix: slices imports
fix: more meaningful examples
fix: improve documentation
fix: quote error output
2024-04-11 15:17:33 -05:00
Tim Gross
d56e8ad1aa WI: ensure Consul hook and WID manager interpolate services (#20344)
Services can have some of their string fields interpolated. The new Workload
Identity flow doesn't interpolate the services before requesting signed
identities or using those identities to get Consul tokens.

Add support for interpolation to the WID manager and the Consul tokens hook by
providing both with a taskenv builder. Add an "interpolate workload" field to
the WI handle to allow passing the original workload name to the server so the
server can find the correct service to sign.

This changeset also makes two related test improvements:
* Remove the mock WID manager, which was only used in the Consul hook tests and
  isn't necessary so long as we provide the real WID manager with the mock
  signer and never call `Run` on it. It wasn't feasible to exercise the correct
  behavior without this refactor, as the mocks were bypassing the new code.
* Fixed swapped expect-vs-actual assertions on the `consul_hook` tests.

Fixes: https://github.com/hashicorp/nomad/issues/20025
2024-04-11 15:40:28 -04:00
Tim Gross
15162917c1 cni: fix regression in falling back to DNS owned by dockerd (#20189)
In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: https://github.com/hashicorp/nomad/issues/20174
2024-03-22 10:54:16 -04:00
Tim Gross
13617eee4b template: improve internal documentation around shutdown (#20134)
While investigating a report around possible consul-template shutdown issues,
which didn't bear fruit, I found that some of the logic around template runner
shutdown is unintuitive.

* Add some doc strings to the places where someone might think we should be
  obviously stopping the runner or returning early.
* Mark context argument for `Poststart`, `Stop`, and `Update` hooks as unused.

No functional code changes.
2024-03-14 15:33:32 -04:00
Amir Abbas
40b8f17717 Support insecure flag on artifact (#20126) 2024-03-14 10:59:20 -05:00
Seth Hoenig
bb54d16e4a exec2: setup RPC plumbing for dynamic workload users (#20129)
And pass the dynamic users pool from the client into the hook.
2024-03-13 14:06:52 -05:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
Seth Hoenig
67554b8f91 exec2: implement dynamic workload users taskrunner hook (#20069)
* exec2: implement dynamic workload users taskrunner hook

This PR impelements a TR hook for allocating dynamic workload users from
a pool managed by the Nomad client. This adds a new task driver Capability,
DynamicWorkloadUsers - which a task driver must indicate in order to make
use of this feature.

The client config plumbing is coming in a followup PR - in the RFC we
realized having a client.users block would be nice to have, with some
additional unrelated options being moved from the deprecated client.options
config.

* learn to spell
2024-03-06 09:34:27 -06:00
Tim Gross
45b2c34532 cni: add DNS set by CNI plugins to task configuration (#20007)
CNI plugins may set DNS configuration, but this isn't threaded through to the
task configuration so that we can write it to the `/etc/resolv.conf` file as
needed. Add the `AllocNetworkStatus` to the alloc hook resources so they're
accessible from the taskrunner. Any DNS entries provided by the user will
override these values.

Fixes: https://github.com/hashicorp/nomad/issues/11102
2024-02-20 10:17:27 -05:00
Tim Gross
df86503349 template: sandbox template rendering
The Nomad client renders templates in the same privileged process used for most
other client operations. During internal testing, we discovered that a malicious
task can create a symlink that can cause template rendering to read and write to
arbitrary files outside the allocation sandbox. Because the Nomad agent can be
restarted without restarting tasks, we can't simply check that the path is safe
at the time we write without encountering a time-of-check/time-of-use race.

To protect Nomad client hosts from this attack, we'll now read and write
templates in a subprocess:

* On Linux/Unix, this subprocess is sandboxed via chroot to the allocation
  directory. This requires that Nomad is running as a privileged process. A
  non-root Nomad agent will warn that it cannot sandbox the template renderer.

* On Windows, this process is sandboxed via a Windows AppContainer which has
  been granted access to only to the allocation directory. This does not require
  special privileges on Windows. (Creating symlinks in the first place can be
  prevented by running workloads as non-Administrator or
  non-ContainerAdministrator users.)

Both sandboxes cause encountered symlinks to be evaluated in the context of the
sandbox, which will result in a "file not found" or "access denied" error,
depending on the platform. This change will also require an update to
Consul-Template to allow callers to inject a custom `ReaderFunc` and
`RenderFunc`.

This design is intended as a workaround to allow us to fix this bug without
creating backwards compatibility issues for running tasks. A future version of
Nomad may introduce a read-only mount specifically for templates and artifacts
so that tasks cannot write into the same location that the Nomad agent is.

Fixes: https://github.com/hashicorp/nomad/issues/19888
Fixes: CVE-2024-1329
2024-02-08 10:40:24 -05:00
Juana De La Cuesta
120c3ca3c9 Add granular control of SELinux labels for host mounts (#19839)
Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label

* Update website/content/docs/job-specification/volume_mount.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* fix: typo

* func: make volume mount verification happen even on  mounts with no volume

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-02-05 10:05:33 +01:00
Tim Gross
334c383eb6 template: run template tests on Windows where possible (#19856)
We don't run the whole suite of unit tests on all platforms to keep CI times
reasonable, so the only things we've been running on Windows are
platform-specific.

I'm working on some platform-specific `template` related work and having these
tests run on Windows will reduce the risk of regressions. Our Windows CI box
doesn't have Consul or Vault, so I've skipped those tests for the time being,
and can follow up with that later. There's also a test with assertions looking
for specific paths, and the results are different on Windows. I've skipped those
for the moment as well and will follow up under a separate PR.

Also swap `testify` for `shoenig/test`
2024-02-02 09:22:03 -05:00
Michael Schurter
8f564182ef connect: rewrite envoy bootstrap on every restart (#19787)
Fixes #19781

Do not mark the envoy bootstrap hook as done after successfully running once.
Since the bootstrap file is written to /secrets, which is a tmpfs on supported
platforms, it is not persisted across reboots. This causes the task and
allocation to fail on reboot (see #19781).

This fixes it by *always* rewriting the envoy bootstrap file every time the
Nomad agent starts. This does mean we may write a new bootstrap file to an
already running Envoy task, but in my testing that doesn't have any impact.

This commit doesn't necessarily fix every use of Done by hooks, but hopefully
improves the situation. The comment on Done has been expanded to hopefully
avoid misuse in the future.

Done assertions were removed from tests as they add more noise than value.

*Alternative 1: Use a regular file*

An alternative approach would be to write the bootstrap file somewhere
other than the tmpfs, but this is *unsafe* as when Consul ACLs are
enabled the file will contain a secret token:
https://developer.hashicorp.com/consul/commands/connect/envoy#bootstrap

*Alternative 2: Detect if file is already written*

An alternative approach would be to detect if the bootstrap file exists,
and only write it if it doesn't.

This is just a more complicated form of the current fix. I think in
general in the absence of other factors task hooks should be idempotent
and therefore able to rerun on any agent startup. This simplifies the
code and our ability to reason about task restarts vs agent restarts vs
node reboots by making them all take the same code path.
2024-01-24 11:26:31 -08:00
Tim Gross
0935f443dc vault: support allowing tokens to expire without refresh (#19691)
Some users with batch workloads or short-lived prestart tasks want to derive a
Vaul token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.

When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.

Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.

Fixes: https://github.com/hashicorp/nomad/issues/8690
2024-01-10 14:49:02 -05:00
Piotr Kazmierczak
f1fb51422b client: consul hook not called for templates (#19490)
Due to some refactoring mishap, task-level Consul hook was never triggered and
thus never wrote any secrets in task secret dirs.
2023-12-15 17:16:00 +01:00
Luiz Aoqui
0bc822db40 vault: load default config for tasks without vault (#19439)
It is often expected that a task that needs access to Vault defines a
`vault` block to specify the Vault policy to use to derive a token.

But in some scenarios, like when the Nomad client is connected to a
local Vault agent that is responsible for authn/authz, the task is not
required to defined a `vault` block.

In these situations, the `default` Vault cluster should be used to
render the template.
2023-12-12 14:06:55 -05:00
Luiz Aoqui
099ee06a60 Revert "deps: update go-metrics to v0.5.3 (#19190)" (#19374)
* Revert "deps: update go-metrics to v0.5.3 (#19190)"

This reverts commit ddb060d8b3.

* changelog: add entry for #19374
2023-12-08 08:46:55 -05:00
Tim Gross
ae403dcb4b script_check_hook: handle task-level Consul namespace (#19241)
The `script_check_hook` runs at the task level but can create script checks for
both task-level services and group-level services. Now that we allow the Consul
namespace to be set at the task-level `consul.namespace`, we need to have both
possible namespaces handy when creating and updating checks.
2023-11-30 11:13:30 -05:00
Tim Gross
f77b4baebb service_hook: ensure task-level consul.namespace is respected (#19224)
The task-level service hook is using the group-level method to get the provider
namespace, but this was not designed with task-level `consul` blocks in
mind. This leads to task-level services using the group-level
`consul.namespace`. Fix by creating a method to get the correct namespace and
move this into the service hook itself rather than in the outer `initHooks`
method.
2023-11-29 16:46:27 -05:00
Luiz Aoqui
ddb060d8b3 deps: update go-metrics to v0.5.3 (#19190)
Update `go-metrics` to v0.5.3 to pick
https://github.com/hashicorp/go-metrics/pull/146.
2023-11-28 12:37:57 -05:00
Tim Gross
b5af87ebf3 set Vault namespace from task in vault_hook JWT login (#19080)
The JWT login codepath for the `vault_hook` was missing the Vault namespace, so
the login request for non-default namespaces would fail.
2023-11-14 09:54:36 -05:00
Luiz Aoqui
f0acf72ae7 client: fix Consul token retrievel for templates (#19058)
The template hook must use the Consul token for the cluster defined in
the task-level `consul` block or, if `nil, in the group-level `consul`
block.

The Consul tokens are generated by the allocrunner consul hook, but
during the transition period we must fallback to the Nomad agent token
if workload identities are not being used.

So an empty token returned from `GetConsulTokens()` is not enough to
determine if we should use the legacy flow (either this is an old task
or the cluster is not configured for Consul WI), or if there is a
misconfiguration (task or group is `consul` block is using a cluster
that doesn't have an `identity` set).

In order to distinguish between the two scenarios we must iterate over
the task identities looking for one suitable for the Consul cluster
being used.
2023-11-10 13:42:30 -05:00
Tim Gross
5ad715b281 fix taskrunner test after broken signature (#19056)
PRs #19034 and #19040 accidentally conflicted with each other without a merge
conflict when #19034 changes the method signature of `SetConsulTokens`. Because
CI doesn't rebase, both PRs tested fine and only were broken once they landed on
`main`. Fix that.
2023-11-09 15:53:25 -05:00
Luiz Aoqui
6d8417014f client: pass alloc hook resources to template hook (#19040)
The task template hook uses the alloc resource to retrieve Consul
tokens, so it must be passed from the allocation.
2023-11-09 10:55:35 -05:00
Tim Gross
c7c3b3ae33 revoke Consul tokens obtained via WI when alloc stops (#19034)
Add a `Postrun` and `Destroy` hook to the allocrunner's `consul_hook` to ensure
that Consul tokens we've created via WI get revoked via the logout API when
we're done with them. Also add the logout to the `Prerun` hook if we've hit an
error.
2023-11-09 10:08:09 -05:00
Tim Gross
7191c78928 refactor: rename allocrunner's Consul service reg handler (#19019)
The allocrunner has a service registration handler that proxies various API
calls to Consul. With multi-cluster support (for ENT), the service registration
handler is what selects the correct Consul client. The name of this field in the
allocrunner and taskrunner code base looks like it's referring to the actual
Consul API client. This was actually the case before Nomad native service
discovery was implemented, but now the name is misleading.
2023-11-08 15:39:32 -05:00
Tim Gross
50f0ce5412 config: remove old Vault/Consul config blocks from client (#18994)
Remove the now-unused original configuration blocks for Consul and Vault from
the client. When the client needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the client's own use).

This is two of three changesets for this work. The remainder will implement the
same changes in the `command/agent` package.

As part of this work I discovered and fixed two bugs:

* The gRPC proxy socket that we create for Envoy is only ever created using the
  default Consul cluster's configuration. This will prevent Connect from being
  used with the non-default cluster.
* The Consul configuration we use for templates always comes from the default
  Consul cluster's configuration, but will use the correct Consul token for the
  non-default cluster. This will prevent templates from being used with the
  non-default cluster.

Ref: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Fixes: https://github.com/hashicorp/nomad/issues/18984
Fixes: https://github.com/hashicorp/nomad/issues/18983
2023-11-07 09:15:37 -05:00
Tim Gross
483e78615d template: fix test assertion to be compatible between CE/ENT (#18957)
The template hook emits an error when the task has a Consul block that requires
WI but there's no WI. The exact error message we get depends on whether we're
running in CE or ENT. Update the test assertion so that we can tolerate this
difference without building ENT-specific test files.
2023-11-01 13:26:45 -04:00
Tim Gross
dd62e8a319 consul/vault: use accessor method to get cluster name in client (#18955)
When looking up the Consul or Vault cluster from a client hook, we should always
use an accessor function rather than trying to lookup the `Cluster` field, which
may be empty for jobs registered before Nomad 1.7.
2023-11-01 10:59:59 -04:00
Michael Schurter
e49ca3c431 identity: Implement change_mode (#18943)
* identity: support change_mode and change_signal

wip - just jobspec portion

* test struct

* cleanup some insignificant boogs

* actually implement change mode

* docs tweaks

* add changelog

* test identity.change_mode operations

* use more words in changelog

* job endpoint tests

* address comments from code review

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-11-01 09:41:11 -05:00
Tim Gross
d62213a135 consul: fix lookups of default cluster across upgrades (#18945)
Allocations that were created before Nomad 1.7 will not have the cluster field
set for their Consul blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.
2023-11-01 10:11:54 -04:00