In #23977 we merged a change to how the keyring was stored. Because keyring
initialization takes slightly longer now, this uncovered existing timing bugs in
some of our tests: tests that require the keyring (e.g. plan applier tests)
were waiting for the leader but not for keyring initialization. Fix some of the
cases we've seen cause test flakes.
* update changelog for 1.8.4 release
* changelog: add 1.8.4 backport changelog notes
I botched the changelog bits of the checklist, so I'm adding the backport notes
to the CE changelog now.
In Nomad 1.4, we implemented a root keyring to support encrypting Variables and
signing Workload Identities. The keyring was originally stored with the
AEAD-wrapped DEKs and the KEK together in a JSON keystore file on disk. We
recently added support for using an external KMS for the KEK to improve the
security model for the keyring. But we've encountered multiple instances of the
keystore files not getting backed up separately from the Raft snapshot,
resulting in failure to restore clusters from backup.
Move Nomad's root keyring into Raft (encrypted with a KMS/Vault where available)
in order to eliminate operational problems with the separate on-disk keystore.
Fixes: https://github.com/hashicorp/nomad/issues/23665
Ref: https://hashicorp.atlassian.net/browse/NET-10523
We're releasing the beta for Nomad 1.9.0 shortly. Bumping the base version now
will make it easier to test out new features that require a version
check. Builds from `main` will show as `1.9.0-dev`.
Resolve scan job runner
Resolve linting alerts
adding EOF on files
adding EOF on gitignore too
add hclfmt and bump action versions
update scan.hcl comments
Co-authored-by: Tim Gross <tgross@hashicorp.com>
fix typo
move scan.hcl file and paths-ignore for scans
change action runner
use org secret to checkout
typo
change runner
use hashicorp/setup-golang@v3
Co-authored-by: Tim Gross <tgross@hashicorp.com>
pin the github action sha
so more than one copy of a program can run
at a time on the same port with SO_REUSEPORT.
requires host network mode.
some task drivers (like docker) may also need
  config {
    network_mode = "host"
  }
but this is not validated prior to placement.
The landlock fingerprint test assumes there's no version of the landlock API
>3. Update the test assertion to allow for the current v4 and any future
versions.
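
A sketch of an open-ended version check, assuming the fingerprinted value is a string of the form `v<N>`. That format, and the name `isLandlockVersion`, are assumptions for illustration rather than the actual test code.

```go
package main

import (
	"fmt"
	"regexp"
)

// landlockVersionRe accepts "v1", "v2", ... with no upper bound, so a newer
// kernel reporting a higher version does not fail the assertion.
var landlockVersionRe = regexp.MustCompile(`^v[1-9][0-9]*$`)

// isLandlockVersion reports whether s looks like a landlock version string.
func isLandlockVersion(s string) bool {
	return landlockVersionRe.MatchString(s)
}

func main() {
	for _, v := range []string{"v1", "v3", "v4", "v12", "v0", "unavailable"} {
		fmt.Printf("%-12s %v\n", v, isLandlockVersion(v))
	}
}
```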
While working on #23655 I found there were a few places in the encrypter/keyring
where we could make modest improvements to performance and reliability of the
existing code.
This changeset allows keyring replication to skip trying to replicate from
itself, switches some of the read-only keyring accesses to use the read lock
instead of a r/w lock, fixes the logging configuration to drop spurious "extra
value" warnings in the logs, drops an unused type, and makes a minor refactoring
to eliminate shadowing of the `keyset` type. Pulling this out to its own PR lets
us backport these changes to the LTS and reduces the size of the PR that
implements #23665.
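
The read-lock change can be sketched as follows, assuming a simplified in-memory store. `keyStore` and its methods are hypothetical and do not mirror the real encrypter.

```go
package main

import (
	"fmt"
	"sync"
)

// keyStore is a hypothetical, simplified keyring guarded by an RWMutex.
type keyStore struct {
	lock sync.RWMutex
	keys map[string][]byte
}

func newKeyStore() *keyStore {
	return &keyStore{keys: map[string][]byte{}}
}

// Add mutates the map, so it takes the exclusive write lock.
func (k *keyStore) Add(id string, key []byte) {
	k.lock.Lock()
	defer k.lock.Unlock()
	k.keys[id] = key
}

// Get only reads, so it takes the shared read lock; concurrent readers
// do not block each other.
func (k *keyStore) Get(id string) ([]byte, bool) {
	k.lock.RLock()
	defer k.lock.RUnlock()
	key, ok := k.keys[id]
	return key, ok
}

func main() {
	ks := newKeyStore()
	ks.Add("key-1", []byte("secret"))

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ks.Get("key-1") // readers proceed in parallel under RLock
		}()
	}
	wg.Wait()

	_, ok := ks.Get("key-1")
	fmt.Println("found:", ok)
}
```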
Ref https://github.com/hashicorp/nomad/issues/23665
when a CNI result includes an IPv6 address,
set it on the alloc's NetworkStatus for reference.
e.g.:
$ nomad alloc status -json 3dca | jq '.NetworkStatus'
{
  "Address": "172.26.64.14",
  "AddressIPv6": "fd00:a110:c8::b",
  "DNS": null,
  "InterfaceName": "eth0"
}
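
A rough sketch of picking one address per family out of a result's IP list, using `net/netip` from the Go standard library. The `networkStatus` struct and `statusFromIPs` are hypothetical simplifications; only the two field names are taken from the JSON output above, and the real selection logic may differ.

```go
package main

import (
	"fmt"
	"net/netip"
)

// networkStatus is a hypothetical, trimmed-down status with one address per family.
type networkStatus struct {
	Address     string
	AddressIPv6 string
}

// statusFromIPs records the first IPv4 and the first IPv6 address it sees.
func statusFromIPs(ips []string) (networkStatus, error) {
	var st networkStatus
	for _, s := range ips {
		addr, err := netip.ParseAddr(s)
		if err != nil {
			return st, fmt.Errorf("parsing %q: %w", s, err)
		}
		switch {
		case addr.Is4() && st.Address == "":
			st.Address = addr.String()
		case addr.Is6() && !addr.Is4In6() && st.AddressIPv6 == "":
			st.AddressIPv6 = addr.String()
		}
	}
	return st, nil
}

func main() {
	st, err := statusFromIPs([]string{"172.26.64.14", "fd00:a110:c8::b"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", st)
}
```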
The documentation refers to a `file` attribute that does not exist on the `vault` block.
This PR changes those references to mention the `disable_file` attribute instead.
For #23665 I'm about to add a lot more code to the state store for the
keyring, so I'd like to pull these out to their own file. Also updates the test
to use `shoenig/test` and changes the name of one method to be a little more
accurate.
In #23663 we fixed the template hook so that `change_mode="script"` didn't lose
track of the task handle during restores. But this revealed a second bug: access
to the handle is not locked while in use, which can allow it to be removed
concurrently.
Fixes: https://github.com/hashicorp/nomad/issues/23875
Quota usage calculation depends on the allocation.Resources field (which will be
deprecated in the future), while device resources are kept in
allocation.AllocatedResources and parsed into a structure (vendor/type/name)
so that ranking in the scheduler can find nodes that satisfy device
requirements. To make device quotas work properly, this has to be temporarily
translated into allocation.Resources.Devices.
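
A rough sketch of such a translation, assuming simplified types. `allocatedDevice`, `requestedDevice`, and `toRequested` are hypothetical and only illustrate collapsing the vendor/type/name structure back into a named device with a count; the real structs and field names may differ.

```go
package main

import "fmt"

// allocatedDevice is a hypothetical view of a device as stored after placement.
type allocatedDevice struct {
	Vendor    string
	Type      string
	Name      string
	DeviceIDs []string
}

// requestedDevice is a hypothetical view of a device as a quota check would count it.
type requestedDevice struct {
	Name  string
	Count uint64
}

// toRequested flattens each allocated device into "vendor/type/name" plus
// the number of device instances assigned.
func toRequested(devs []allocatedDevice) []requestedDevice {
	out := make([]requestedDevice, 0, len(devs))
	for _, d := range devs {
		out = append(out, requestedDevice{
			Name:  d.Vendor + "/" + d.Type + "/" + d.Name,
			Count: uint64(len(d.DeviceIDs)),
		})
	}
	return out
}

func main() {
	devs := []allocatedDevice{
		{Vendor: "nvidia", Type: "gpu", Name: "1080ti", DeviceIDs: []string{"id-1", "id-2"}},
	}
	fmt.Printf("%+v\n", toRequested(devs))
}
```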
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
As of Nomad 1.6.0, Nomad client agents send their secret with all the
RPCs (other than registration). But for backwards compatibility we had to keep
a legacy auth method that didn't require the node secret. We've previously
announced that this legacy auth method would be removed and that nodes older
than 1.6.0 would not be supported with Nomad 1.9.0.
This changeset removes the legacy auth method.
Ref: https://developer.hashicorp.com/nomad/docs/release-notes/nomad/upcoming#nomad-1-9-0