For templates with `change_mode = "script"`, we set a driver handle in the
poststart method, so the template runner can execute the script inside the
task. But when the client is restarted and the template contents change during
that window, we trigger a change_mode in the prestart method. In that case, the
hook will not have the handle and so returns an errror trying to run the change
mode.
We restore the driver handle before we call any prestart hooks, so we can pass
that handle in the constructor whenever it's available. In the normal task start
case the handle will be empty but also won't be called.
The error messages are also misleading, as there's no capabilities check
happening here. Update the error messages to match.
Fixes: https://github.com/hashicorp/nomad/issues/15851
Ref: https://hashicorp.atlassian.net/browse/NET-9338
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.
Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.
This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:
* Periodic root key rotation would never happen because the default
`root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
time table. We now compare the `CreateTime` against the wall clock time instead
of the time table. (We expect to remove the time table in future work, ref
https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
identities. We now wait until `root_key_rotation_threshold` +
`root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
the rekey was complete.
Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
The documentation for the `SSL` option for the Docker driver is
misleading inasmuch as it's both deprecated and non-functional in current
versions of Docker. Remove this option from the docs and add a section
explaining how to use insecure registries.
Fixes: https://github.com/hashicorp/nomad/issues/23616
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by a AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.
Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: https://github.com/hashicorp/nomad/issues/14852
and tweak Makefile to generate a custom.tfvars
instead of specifying vars separately via CLI.
hoping this makes it a little more obvious
if there is no consul/nomad license.
Nomad's default serf configuration has a full sync interval of 60s (the WAN
default configuration in the library). If tests need to join nodes and the
leader is not in the join set, the test can hang up to twice that interval
waiting for the new node to be seen by the leader and added to Raft.
This changeset includes the following tweaks to improve test timings:
* Ensure that nodes introduced later in the keyring replication test are joined
to all peers. (Also updates the test to `shoenig/test`.)
* Update the `TestJoin` helper so that all servers passed are joined to the full
set, instead of a set that's offset by 1, and use a single `Join` call for
each server to reduce the number of messages sent.
* Reduce the `PushPullInterval` from 60s to 500ms in our unit test
configuration, to force faster full syncs.
Fixes a bug where variable values in job submissions that contained newlines
weren't encoded correctly, and thus jobs that contained them couldn't be
resumed once stopped via the UI.
Internal ref: https://hashicorp.atlassian.net/browse/NET-9966
* First pass at global token creation and regional awareness at token fetch time
* Reset and refetch token when you switch region but stay in place
* Ugly and functional global token save
* Tests and log cleanup
the ratio of optimized/unoptimized log size in TestPlanNormalize
has been increased several times as people have added to various
structs and coincidentally bumped into the magic limit.
we encountered another such case when adding to NetworkResource,
but here we omitempty on the struct instead of bumping the limit
in the test.
this has the added benefit of reducing the serialized struct size!
which was the original intent behind this test in the first place :P
the actual value of the ratio is now 0.628... but here the
test value is only dropped down to 0.66 to leave some wiggle room.