As of Nomad 1.8.0 LTS we're no longer backporting changes to Nomad CE versions
1.7.x and 1.6.x. We never got around to removing the flags in the file the
backport assistant uses to control automated backports.
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In versions of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of Go 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."
This property has been long-deprecated, and leaving it in place may lead to false
assumptions about how cipher suites are negotiated when connecting to a server,
so we want to remove it in Nomad 1.9.0.
Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
In Nomad 1.6.0 we started sending the node secret with RPCs that previously did
not include it. We planned to deprecate the older auth workflow but didn't set a
release. Removing the legacy support means that nodes running <1.6.0 will fail
to heartbeat.
Ref: https://hashicorp.atlassian.net/browse/NET-10009
Add a section to the docs describing planned upcoming deprecations and
removals. Also add some upgrade guide sections that were missed during the last
release.
The default root:root is used because it provides the permissions needed to
run both server and client agents. The comment details what changes
operators can make if needed.
When running the service file prior to this change, root:root
would be the default.
When updating a `JobScalingEvent`, the state store function did not copy the
existing object before mutating it. This corrupts the state store because it
modifies the leaf node without committing it in a transaction. It can also cause
the Nomad server to crash with a "fatal error: concurrent map read and map
write" if its `ScalingEvents` map is read via the `ScaleStatus` RPC at the same
time as it's being written.
This changeset also removes some mostly-unused public methods on the struct that
dangerously encourage you to mutate it outside of a copy.
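The copy-before-mutate pattern described above can be sketched roughly as follows. The struct here is a simplified, hypothetical stand-in for the real `JobScalingEvents` type (the field types and the `appendEvent` helper are assumptions), and the state store transaction itself is omitted.

```go
package main

// JobScalingEvents is a simplified, hypothetical stand-in for the stored
// object; the real struct has different field types.
type JobScalingEvents struct {
	Namespace     string
	JobID         string
	ScalingEvents map[string][]string // task group -> event messages
}

// Copy returns a deep copy so that callers never mutate the object that is
// still referenced by the state store.
func (j *JobScalingEvents) Copy() *JobScalingEvents {
	if j == nil {
		return nil
	}
	out := &JobScalingEvents{
		Namespace:     j.Namespace,
		JobID:         j.JobID,
		ScalingEvents: make(map[string][]string, len(j.ScalingEvents)),
	}
	for group, events := range j.ScalingEvents {
		out.ScalingEvents[group] = append([]string(nil), events...)
	}
	return out
}

// appendEvent copies the stored object and mutates only the copy. The caller
// is expected to write the returned object back inside a transaction.
func appendEvent(stored *JobScalingEvents, group, event string) *JobScalingEvents {
	updated := stored.Copy()
	if updated == nil {
		updated = &JobScalingEvents{ScalingEvents: map[string][]string{}}
	}
	updated.ScalingEvents[group] = append(updated.ScalingEvents[group], event)
	return updated
}
```

Because concurrent readers only ever see the old object or the newly committed one, the map is never read while it is being written.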
Ref: https://hashicorp.atlassian.net/browse/NET-10529
For templates with `change_mode = "script"`, we set a driver handle in the
poststart method, so the template runner can execute the script inside the
task. But when the client is restarted and the template contents change during
that window, we trigger the change mode in the prestart method. In that case, the
hook does not have the handle and so returns an error when trying to run the
change mode script.
We restore the driver handle before we call any prestart hooks, so we can pass
that handle in the constructor whenever it's available. In the normal task start
case the handle will be empty, but it also won't be called.
The error messages are also misleading, as there's no capabilities check
happening here. Update the error messages to match.
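The shape of the fix can be sketched as below. All names here (`execFunc`, `templateHook`, `newTemplateHook`, `runChangeScript`) are hypothetical simplifications for illustration and do not match the real task runner types.

```go
package main

import "errors"

// execFunc is a hypothetical stand-in for the driver handle's ability to run
// a command inside the task.
type execFunc func(cmd string, args ...string) error

// templateHook is a simplified, hypothetical version of the template hook.
type templateHook struct {
	exec execFunc
}

// newTemplateHook accepts the handle at construction time. After a client
// restart the restored handle is passed here; on a normal task start it is nil.
func newTemplateHook(exec execFunc) *templateHook {
	return &templateHook{exec: exec}
}

// setHandle mirrors what the poststart method does once the task is running.
func (h *templateHook) setHandle(exec execFunc) {
	h.exec = exec
}

// runChangeScript runs the change_mode script, reporting a missing handle
// rather than a capabilities problem.
func (h *templateHook) runChangeScript(cmd string, args ...string) error {
	if h.exec == nil {
		return errors.New("no driver handle available to run change_mode script")
	}
	return h.exec(cmd, args...)
}
```

With the handle supplied up front, a change mode triggered during prestart after a restart can run the script instead of failing.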
Fixes: https://github.com/hashicorp/nomad/issues/15851
Ref: https://hashicorp.atlassian.net/browse/NET-9338
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.
Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.
This changeset also fixes the following bugs in periodic root key rotation,
garbage collection, and rekeying. The rotation and garbage collection fixes
can't be safely made without implementing prepublishing:
* Periodic root key rotation would never happen because the default
`root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
time table. We now compare the `CreateTime` against the wall clock time instead
of the time table. (We expect to remove the time table in future work, ref
https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
identities. We now wait until `root_key_rotation_threshold` +
`root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
the rekey was complete.
Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
The documentation for the `SSL` option for the Docker driver is
misleading inasmuch as it's both deprecated and non-functional in current
versions of Docker. Remove this option from the docs and add a section
explaining how to use insecure registries.
Fixes: https://github.com/hashicorp/nomad/issues/23616
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by an AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.
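To illustrate the idea of wrapping key material, here is a minimal sketch of a wrapper interface with a local AES-GCM implementation. The `KeyWrapper` interface and `aeadWrapper` type are hypothetical; an external KMS or Vault transit backend would implement the same two operations by calling out to the remote service instead of holding the wrapping key locally.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// KeyWrapper is a hypothetical interface for protecting key material at rest.
type KeyWrapper interface {
	Wrap(plaintext []byte) ([]byte, error)
	Unwrap(ciphertext []byte) ([]byte, error)
}

// aeadWrapper wraps keys locally with AES-GCM using a key encryption key.
type aeadWrapper struct {
	aead cipher.AEAD
}

// newAEADWrapper expects a 16, 24, or 32 byte key encryption key.
func newAEADWrapper(kek []byte) (*aeadWrapper, error) {
	block, err := aes.NewCipher(kek)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &aeadWrapper{aead: aead}, nil
}

// Wrap encrypts the key material and prepends the random nonce.
func (w *aeadWrapper) Wrap(plaintext []byte) ([]byte, error) {
	nonce := make([]byte, w.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return w.aead.Seal(nonce, nonce, plaintext, nil), nil
}

// Unwrap splits off the nonce and decrypts the key material.
func (w *aeadWrapper) Unwrap(ciphertext []byte) ([]byte, error) {
	n := w.aead.NonceSize()
	if len(ciphertext) < n {
		return nil, errors.New("ciphertext too short")
	}
	return w.aead.Open(nil, ciphertext[:n], ciphertext[n:], nil)
}
```

The limitation of the purely local approach is that the wrapping key has to live next to the data it protects, which is what an external service avoids.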
Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: https://github.com/hashicorp/nomad/issues/14852
Also tweak the Makefile to generate a custom.tfvars file
instead of specifying vars separately via the CLI,
hoping this makes it a little more obvious
if there is no Consul/Nomad license.
Nomad's default serf configuration has a full sync interval of 60s (the WAN
default configuration in the library). If tests need to join nodes and the
leader is not in the join set, the test can hang up to twice that interval
waiting for the new node to be seen by the leader and added to Raft.
This changeset includes the following tweaks to improve test timings:
* Ensure that nodes introduced later in the keyring replication test are joined
to all peers. (Also updates the test to `shoenig/test`.)
* Update the `TestJoin` helper so that all servers passed are joined to the full
set, instead of a set that's offset by 1, and use a single `Join` call for
each server to reduce the number of messages sent.
* Reduce the `PushPullInterval` from 60s to 500ms in our unit test
configuration, to force faster full syncs.
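The full-mesh join described for the `TestJoin` helper can be sketched as follows. The `joiner` interface, `fakeServer` type, and `joinAll` function are hypothetical stand-ins and not the real test helper.

```go
package main

import "fmt"

// joiner is a hypothetical minimal view of a test server.
type joiner interface {
	Addr() string
	Join(addrs []string) (int, error)
}

// fakeServer records join calls so the sketch can be exercised without serf.
type fakeServer struct {
	addr   string
	calls  int
	joined []string
}

func (f *fakeServer) Addr() string { return f.addr }

func (f *fakeServer) Join(addrs []string) (int, error) {
	f.calls++
	f.joined = append(f.joined, addrs...)
	return len(addrs), nil
}

// joinAll joins every server to the full set of its peers, using a single
// Join call per server rather than one call per peer.
func joinAll(servers []joiner) error {
	for i, s := range servers {
		peers := make([]string, 0, len(servers)-1)
		for j, other := range servers {
			if i != j {
				peers = append(peers, other.Addr())
			}
		}
		if len(peers) == 0 {
			continue
		}
		if _, err := s.Join(peers); err != nil {
			return fmt.Errorf("server %s failed to join peers: %w", s.Addr(), err)
		}
	}
	return nil
}
```

Joining each server to every peer means the leader is always in the join set, so the test does not have to wait for a full sync to discover a new node.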