Commit Graph

715 Commits

Author SHA1 Message Date
James Rasell
fc75e9d117 consul/connect: fixed a bug where restarting proxy tasks failed. (#16815)
The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it will update
and overwrite the task definition within the taskerunner. In doing
so it overwrite the modification performed by the hook. If the
allocation is restarted, the envoy_version hook will be skipped as
it previously marked itself as done, and therefore the sidecar
config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of
the taskrunner.
2023-04-11 15:56:03 +01:00
Michael Schurter
a9f52f540e Merge pull request #16836 from hashicorp/compliance/add-headers
[COMPLIANCE] Add Copyright and License Headers
2023-04-10 16:32:03 -07:00
Daniel Bennett
dbe836595e gracefully recover tasks that use csi node plugins (#16809)
new WaitForPlugin() called during csiHook.Prerun,
so that on startup, clients can recover running
tasks that use CSI volumes, instead of them being
terminated and rescheduled because they need a
node plugin that is "not found" *yet*, only because
the plugin task has not yet been recovered.
2023-04-10 17:15:33 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
James Rasell
f257aec073 client: ensure envoy version hook uses all pointer receiver funcs. (#16813) 2023-04-06 14:47:00 +01:00
the-nando
385e3bab86 Do not set attributes when spawning the getter child (#16791)
* Do not set attributes when spawning the getter child

* Cleanup

* Cleanup

---------

Co-authored-by: the-nando <the-nando@invalid.local>
2023-04-05 11:47:51 -05:00
Tim Gross
f3fc54adcf CSI: set mounts in alloc hook resources atomically (#16722)
The allocrunner has a facility for passing data written by allocrunner hooks to
taskrunner hooks. Currently the only consumers of this facility are the
allocrunner CSI hook (which writes data) and the taskrunner volume hook (which
reads that same data).

The allocrunner hook for CSI volumes doesn't set the alloc hook resources
atomically. Instead, it gets the current resources and then writes a new version
back. Because the CSI hook is currently the only writer and all readers happen
long afterwards, this should be safe but #16623 shows there's some sequence of
events during restore where this breaks down.

Refactor hook resources so that hook data is accessed via setters and getters
that hold the mutex.
2023-04-03 11:03:36 -04:00
Seth Hoenig
ed498f8ddb nsd: always set deregister flag after deregistration of group (#16289)
* services: always set deregister flag after deregistration of group

This PR fixes a bug where the group service hook's deregister flag was
not set in some cases, causing the hook to attempt deregistrations twice
during job updates (alloc replacement).

In the tests ... we used to assert on the wrong behvior (remove twice) which
has now been corrected to assert we remove only once.

This bug was "silent" in the Consul provider world because the error logs for
double deregistration only show up in Consul logs; with the Nomad provider the
error logs are in the Nomad agent logs.

* services: cleanup group service hook tests
2023-03-17 09:44:21 -05:00
Seth Hoenig
995ab41cbc artifact: git needs more files for private repositories (#16508)
* landlock: git needs more files for private repositories

This PR fixes artifact downloading so that git may work when cloning from
private repositories. It needs

- file read on /etc/passwd
- dir read on /root/.ssh
- file write on /root/.ssh/known_hosts

Add these rules to the landlock rules for the artifact sandbox.

* cr: use nonexistent instead of devnull

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>

* cr: use go-homdir for looking up home directory

* pr: pull go-homedir into explicit require

* cr: fixup homedir tests in homeless root cases

* cl: fix root test for real

---------

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-03-16 12:22:25 -05:00
Seth Hoenig
ea727dff9e artifact: do not set process attributes on darwin (#16511)
This PR fixes the non-root macOS use case where artifact downloads
stopped working. It seems setting a Credential on a SysProcAttr
used by the exec package will always cause fork/exec to fail -
even if the credential contains our own UID/GID or nil UID/GID.

Technically we do not need to set this as the child process will
inherit the parent UID/GID anyway... and not setting it makes
things work again ... /shrug
2023-03-16 11:31:18 -05:00
Luiz Aoqui
419c4bfa4e allocrunner: fix health check monitoring for Consul services (#16402)
Services must be interpolated to replace runtime variables before they
can be compared against the values returned by Consul.
2023-03-10 14:43:31 -05:00
Seth Hoenig
95359b8c4c client: disable running artifact downloader as nobody (#16375)
* client: disable running artifact downloader as nobody

This PR reverts a change from Nomad 1.5 where artifact downloads were
executed as the nobody user on Linux systems. This was done as an attempt
to improve the security model of artifact downloading where third party
tools such as git or mercurial would be run as the root user with all
the security implications thereof.

However, doing so conflicts with Nomad's own advice for securing the
Client data directory - which when setup with the recommended directory
permissions structure prevents artifact downloads from working as intended.

Artifact downloads are at least still now executed as a child process of
the Nomad agent, and on modern Linux systems make use of the kernel Landlock
feature for limiting filesystem access of the child process.

* docs: update upgrade guide for 1.5.1 sandboxing

* docs: add cl

* docs: add title to upgrade guide fix
2023-03-08 15:58:43 -06:00
Lance Haig
48e7d70fcd deps: Update ioutil deprecated library references to os and io respectively in the client package (#16318)
* Update ioutil deprecated library references to os and io respectively

* Deal with the errors produced.

Add error handling to filEntry info
Add error handling to info
2023-03-08 13:25:10 -06:00
Luiz Aoqui
b07af57618 client: don't emit task shutdown delay event if not waiting (#16281) 2023-03-03 18:22:06 -05:00
Farbod Ahmadian
fbd0dcbe9b tests: add functionality to skip a test if it's not running in CI and not with root user (#16222) 2023-03-02 13:38:27 -05:00
Tim Gross
f619b0bd7a populate Nomad token for task runner update hooks (#16266)
The `TaskUpdateRequest` struct we send to task runner update hooks was not
populating the Nomad token that we get from the task runner (which we do for the
Vault token). This results in task runner hooks like the template hook
overwriting the Nomad token with the zero value for the token. This causes
in-place updates of a task to break templates (but not other uses that rely on
identity but don't currently bother to update it, like the identity hook).
2023-02-27 10:48:13 -05:00
Seth Hoenig
30bcd51588 services: ensure task group is set on service hook (#16240)
This PR fixes a bug where the task group information was not being set
on the serviceHook.AllocInfo struct, which is needed later on for calculating
the CheckID of a nomad service check. The CheckID is calculated independently
from multiple callsites, and the information being passed in must be consistent,
including the group name.

The workload.AllocInfo.Group was not set at this callsite, due to the bug fixed in this PR.
 https://github.com/hashicorp/nomad/blob/main/client/serviceregistration/nsd/nsd.go#L114
2023-02-22 10:22:48 -06:00
Seth Hoenig
511d0c1e70 artifact: protect against unbounded artifact decompression (1.5.0) (#16151)
* artifact: protect against unbounded artifact decompression

Starting with 1.5.0, set defaut values for artifact decompression limits.

artifact.decompression_size_limit (default "100GB") - the maximum amount of
data that will be decompressed before triggering an error and cancelling
the operation

artifact.decompression_file_count_limit (default 4096) - the maximum number
of files that will be decompressed before triggering an error and
cancelling the operation.

* artifact: assert limits cannot be nil in validation
2023-02-14 09:28:39 -06:00
Charlie Voiselle
bbf9b073fc [chore] Move TestUtil_loadVersionControlGlobalConfigs into build flagged file (#16114) 2023-02-09 14:25:26 -05:00
Seth Hoenig
77ea4e3239 users: eliminate LookupGroupId and its one use case (#16093)
This PR deletes the user.LookupGroupId function as it was only being used
in a single test case, and its value was not important to the test.
2023-02-08 14:57:09 -06:00
Luiz Aoqui
ae26057eec docs: update default Nomad bridge config (#16072) 2023-02-07 09:47:41 -05:00
Michael Schurter
9bab96ebd3 Task API via Unix Domain Socket (#15864)
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.

This PR contains the core feature and tests but followup work is required for the following TODO items:

- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-02-06 11:31:22 -08:00
Daniel Bennett
fad28e4265 docs: how to troubleshoot consul connect envoy (#15908)
* largely a doc-ification of this commit message:
  d47678074b
  this doesn't spell out all the possible failure modes,
  but should be a good starting point for folks.

* connect: add doc link to envoy bootstrap error

* add Unwrap() to RecoverableError
  mainly for easier testing
2023-02-02 14:20:26 -06:00
Charlie Voiselle
3225e5cead Update networking_bridge_linux.go (#16025)
* Removed line from previous implementation
* remove import

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-02-02 14:03:02 -05:00
Charlie Voiselle
fe4ff5be2a Add option to expose workload token to task (#15755)
Add `identity` jobspec block to expose workload identity tokens to tasks.

---------

Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-02-02 10:59:14 -08:00
Charlie Voiselle
55df5af4aa client: Add option to enable hairpinMode on Nomad bridge (#15961)
* Add `bridge_network_hairpin_mode` client config setting
* Add node attribute: `nomad.bridge.hairpin_mode`
* Changed format string to use `%q` to escape user provided data
* Add test to validate template JSON for developer safety

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-02-02 10:12:15 -05:00
Piotr Kazmierczak
949a6f60c7 renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
舍我其谁
69b08bb706 volume: Add the missing option propagation_mode (#15626) 2023-01-30 09:32:07 -05:00
Seth Hoenig
d30e34261e client: always run alloc cleanup hooks on final update (#15855)
* client: run alloc pre-kill hooks on last pass despite no live tasks

This PR fixes a bug where alloc pre-kill hooks were not run in the
edge case where there are no live tasks remaining, but it is also
the final update to process for the (terminal) allocation. We need
to run cleanup hooks here, otherwise they will not run until the
allocation gets garbage collected (i.e. via Destroy()), possibly
at a distant time in the future.

Fixes #15477

* client: do not run ar cleanup hooks if client is shutting down
2023-01-27 09:59:31 -06:00
Luiz Aoqui
b668e74a4a template: restore driver handle on update (#15915)
When the template hook Update() method is called it may recreate the
template manager if the Nomad or Vault token has been updated.

This caused the new template manager did not have a driver handler
because this was only being set on the Poststart hook, which is not
called for inplace updates.
2023-01-27 10:55:59 -05:00
Seth Hoenig
5f4368d83c artifact: enable reading system git/mercurial configuration (#15903)
This PR adjusts the artifact sandbox on Linux to enable reading from known
system-wide git or mercurial configuration, if they exist.

Folks doing something odd like specifying custom paths for global config will
need to use the standard locations, or disable artifact filesystem isolation.
2023-01-26 13:07:40 -06:00
Seth Hoenig
2470bf9128 nsd: block on removal of services (#15862)
* nsd: block on removal of services

This PR uses a WaitGroup to ensure workload removals are complete
before returning from ServiceRegistrationHandler.RemoveWorkload of
the nomad service provider. The de-registration of individual services
still occurs asynchrously, but we must block on the parent removal
call so that we do not race with further operations on the same set
of services - e.g. in the case of a task restart where we de-register
and then re-register the services in quick succession.

Fixes #15032

* nsd: add e2e test for initial failing check and restart
2023-01-26 08:17:57 -06:00
Yorick Gersie
24a575ab80 Allow per_alloc to be used with host volumes (#15780)
Disallowing per_alloc for host volumes in some cases makes life of a nomad user much harder.
When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across
restarts we need to make sure allocation placement is consistent. With CSI volumes we can
use the `per_alloc` feature but for some reason this is explicitly disabled for host volumes.

Ensure host volumes understand the concept of per_alloc
2023-01-26 09:14:47 -05:00
Michael Schurter
43fa28ab67 Add INFO task even log line and make logmon less noisy (#15842)
* client: log task events at INFO level

Fixes #15840

Example INFO level client logs with this enabled:

```
[INFO]  client: node registration complete
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy type=Received msg="Task received by client" failed=false
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy type="Task Setup" msg="Building Task Directory" failed=false
[WARN]  client.alloc_runner.task_runner.task_hook.logmon: plugin configured with a nil SecureConfig: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy
[INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy path=/tmp/NomadClient2414238708/b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51/alloc/logs/.sleepy.stdout.fifo @module=logmon timestamp=2023-01-20T11:19:34.275-0800
[INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy @module=logmon path=/tmp/NomadClient2414238708/b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51/alloc/logs/.sleepy.stderr.fifo timestamp=2023-01-20T11:19:34.275-0800
[INFO]  client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:/bin/bash Args:[-c sleep 1000]}"
[WARN]  client.driver_mgr.raw_exec.executor: plugin configured with a nil SecureConfig: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 driver=raw_exec task_name=sleepy
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy type=Started msg="Task started by client" failed=false
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
[INFO]  client.driver_mgr.raw_exec.executor: plugin process exited: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 driver=raw_exec task_name=sleepy path=/home/schmichael/go/bin/nomad pid=27668
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy type=Terminated msg="Exit Code: 130, Signal: 2" failed=false
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy type=Killed msg="Task successfully killed" failed=false
[INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51 task=sleepy path=/home/schmichael/go/bin/nomad pid=27653
[INFO]  client.gc: marking allocation for GC: alloc_id=b3dab5a9-91fd-da9a-ae89-ef7f1eceaf51
```

So task events will approximately *double* the number of per-task log
lines, but I think they add a lot of value.

* client: drop logmon 'opening' from debug->info

Cannot imagine why users care and removes 2 log lines per task
invocation.

```

[INFO]  client: node registration complete
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=1cafb2dc-302e-2c92-7845-f56618bc8648 task=sleepy type=Received msg="Task received by client" failed=false
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=1cafb2dc-302e-2c92-7845-f56618bc8648 task=sleepy type="Task Setup" msg="Building Task Directory" failed=false
<<< 2 "opening fifo" lines elided here >>>
[WARN]  client.alloc_runner.task_runner.task_hook.logmon: plugin configured with a nil SecureConfig: alloc_id=1cafb2dc-302e-2c92-7845-f56618bc8648 task=sleepy
[INFO]  client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:/bin/bash Args:[-c sleep 1000]}"
[WARN]  client.driver_mgr.raw_exec.executor: plugin configured with a nil SecureConfig: alloc_id=1cafb2dc-302e-2c92-7845-f56618bc8648 driver=raw_exec task_name=sleepy
[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=1cafb2dc-302e-2c92-7845-f56618bc8648 task=sleepy type=Started msg="Task started by client" failed=false
```

* docs: add changelog for #15842
2023-01-20 14:35:00 -08:00
Seth Hoenig
c3017da6af consul: add client configuration for grpc_ca_file (#15701)
* [no ci] first pass at plumbing grpc_ca_file

* consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+

This PR adds client config to Nomad for specifying consul.grpc_ca_file

These changes combined with https://github.com/hashicorp/consul/pull/15913 should
finally enable Nomad users to upgrade to Consul 1.14+ and use tls grpc connections.

* consul: add cl entgry for grpc_ca_file

* docs: mention grpc_tls changes due to Consul 1.14
2023-01-11 09:34:28 -06:00
Seth Hoenig
865ee8d37c artifact: fix sandbox behavior when destination is shared alloc directory (#15712)
This PR fixes the artifact sandbox (new in Nomad 1.5) to allow downloading
artifacts into the shared 'alloc' directory made available to each task in
a common allocation. Previously we assumed the 'alloc' dir would be mounted
under the 'task' dir, but this is only the case in fs isolation: chroot; in
other modes the alloc dir is elsewhere.
2023-01-09 09:46:32 -06:00
Seth Hoenig
493389e861 artifact: enable inheriting environment variables from client (#15514)
* artifact: enable inheriting environment variables from client

This PR adds client configuration for specifying environment variables that
should be inherited by the artifact sandbox process from the Nomad Client agent.

Most users should not need to set these values but the configuration is provided
to ensure backwards compatability. Configuration of go-getter should ideally be
done through the artifact block in a jobspec task.

e.g.

```hcl
client {
  artifact {
    set_environment_variables = "TMPDIR,GIT_SSH_OPTS"
  }
}
```

Closes #15498

* website: update set_environment_variables text to mention PATH
2022-12-09 15:46:07 -06:00
Seth Hoenig
990537e8ba artifact: add client toggle to disable filesystem isolation (#15503)
This PR adds the client config option for turning off filesystem isolation,
applicable on Linux systems where filesystem isolation is possible and
enabled by default.

```hcl
client{
  artifact {
    disable_filesystem_isolation = <bool:false>
  }
}
```

Closes #15496
2022-12-08 12:29:23 -06:00
Seth Hoenig
cfc67c3422 client: sandbox go-getter subprocess with landlock (#15328)
* client: sandbox go-getter subprocess with landlock

This PR re-implements the getter package for artifact downloads as a subprocess.

Key changes include

On all platforms, run getter as a child process of the Nomad agent.
On Linux platforms running as root, run the child process as the nobody user.
On supporting Linux kernels, uses landlock for filesystem isolation (via go-landlock).
On all platforms, restrict environment variables of the child process to a static set.
notably TMP/TEMP now points within the allocation's task directory
kernel.landlock attribute is fingerprinted (version number or unavailable)
These changes make Nomad client more resilient against a faulty go-getter implementation that may panic, and more secure against bad actors attempting to use artifact downloads as a privilege escalation vector.

Adds new e2e/artifact suite for ensuring artifact downloading works.

TODO: Windows git test (need to modify the image, etc... followup PR)

* landlock: fixup items from cr

* cr: fixup tests and go.mod file
2022-12-07 16:02:25 -06:00
Seth Hoenig
fa0afdbdd9 client: manually cleanup leaked iptables rules (#15407)
This PR adds a secondary path for cleaning up iptables created for an allocation
when the normal CNI library fails to do so. This typically happens when the state
of the pause container is unexpected - e.g. deleted out of band from Nomad. Before,
the iptables rules would be leaked which could lead to unexpected nat routing
behavior later on (in addition to leaked resources). With this change, we scan
for the rules created on behalf of the allocation being GC'd and delete them.

Fixes #6385
2022-11-28 11:32:16 -06:00
James Rasell
847c2cc528 client: accommodate Consul 1.14.0 gRPC and agent self changes. (#15309)
* client: accommodate Consul 1.14.0 gRPC and agent self changes.

Consul 1.14.0 changed the way in which gRPC listeners are
configured, particularly when using TLS. Prior to the change, a
single listener was responsible for handling plain-text and
encrypted gRPC requests. In 1.14.0 and beyond, separate listeners
will be used for each, defaulting to 8502 and 8503 for plain-text
and TLS respectively.

The change means that Nomad’s Consul Connect integration would not
work when integrated with Consul clusters using TLS and running
1.14.0 or greater.

The Nomad Consul fingerprinter identifies the gRPC port Consul has
exposed using the "DebugConfig.GRPCPort" value from Consul’s
“/v1/agent/self” endpoint. In Consul 1.14.0 and greater, this only
represents the plain-text gRPC port which is likely to be disbaled
in clusters running TLS. In order to fix this issue, Nomad now
takes into account the Consul version and configured scheme to
optionally use “DebugConfig.GRPCTLSPort” value from Consul’s agent
self return.

The “consul_grcp_socket” allocrunner hook has also been updated so
that the fingerprinted gRPC port attribute is passed in. This
provides a better fallback method, when the operator does not
configure the “consul.grpc_address” option.

* docs: modify Consul Connect entries to detail 1.14.0 changes.

* changelog: add entry for #15309

* fixup: tidy tests and clean version match from review feedback.

* fixup: use strings tolower func.
2022-11-21 09:19:09 -06:00
Charlie Voiselle
9ad90290e2 [bug] Return a spec on reconnect (#15214)
client: fixed a bug where non-`docker` tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running
2022-11-11 13:27:36 -05:00
Seth Hoenig
5f3f52156e client: avoid unconsumed channel in timer construction (#15215)
* client: avoid unconsumed channel in timer construction

This PR fixes a bug introduced in #11983 where a Timer initialized with 0
duration causes an immediate tick, even if Reset is called before reading the
channel. The fix is to avoid doing that, instead creating a Timer with a non-zero
initial wait time, and then immediately calling Stop.

* pr: remove redundant stop
2022-11-11 09:31:34 -06:00
Seth Hoenig
00c8cd37bb template: protect use of template manager with a lock (#15192)
This PR protects access to `templateHook.templateManager` with its lock. So
far we have not been able to reproduce the panic - but it seems either Poststart
is running without a Prestart being run first (should be impossible), or the
Update hook is running concurrently with Poststart, nil-ing out the templateManager
in a race with Poststart.

Fixes #15189
2022-11-10 08:30:27 -06:00
Charlie Voiselle
52a254ba22 template: error on missing key (#15141)
* Support error_on_missing_value for templates
* Update docs for template stanza
2022-11-04 13:23:01 -04:00
Tim Gross
5a5b4b04cb WI: set identity to client secret if missing (#15121)
Allocations created before 1.4.0 will not have a workload identity token. When
the client running these allocs is upgraded to 1.4.x, the identity hook will run
and replace the node secret ID token used previously with an empty string. This
causes service discovery queries to fail.

Fallback to the node's secret ID when the allocation doesn't have a signed
identity. Note that pre-1.4.0 allocations won't have templates that read
Variables, so there's no threat that this new node ID secret will be able to
read data that the allocation shouldn't have access to.
2022-11-03 11:10:11 -04:00
Seth Hoenig
0b69a52a40 e2e: convert flaky exec download in chroot unit test into e2e test (#14949)
Similar to https://github.com/hashicorp/nomad/pull/14710, convert flaky
test into e2e test.
2022-10-19 08:22:32 -05:00
Hemanth Krishna
a1388217aa enhancement: UpdateTask when Task is waiting for ShutdownDelay (#14775)
Signed-off-by: Hemanth Krishna <hkpdev008@gmail.com>
2022-10-06 16:33:28 -04:00
Luiz Aoqui
88b61cb5b4 template: apply splay value on change_mode script (#14749)
Previously, the splay timeout was only applied if a template re-render
caused a restart or a signal action. The `change_mode = "script"` was
running after the `if restart || len(signals) != 0` check, so it was
invoked at all times.

This change refactors the logic so it's easier to notice that new
`change_mode` options should start only after `splay` is applied.
2022-09-30 12:04:22 -04:00
Seth Hoenig
e4e5bc5cef client: protect user lookups with global lock (#14742)
* client: protect user lookups with global lock

This PR updates Nomad client to always do user lookups while holding
a global process lock. This is to prevent concurrency unsafe implementations
of NSS, but still enabling NSS lookups of users (i.e. cannot not use osusergo).

* cl: add cl
2022-09-29 09:30:13 -05:00