This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
A more comprehensive env.denylist that now includes more token, token file and
license variables.
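A minimal sketch of what an environment denylist does: drop any `KEY=VALUE` entry whose key is listed. The helper name `filterEnv` and the example variable names are illustrative only, not Nomad's implementation or its actual default list, and exact, case-sensitive key matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// filterEnv returns the KEY=VALUE entries from env whose key does not
// appear in denylist. Matching is exact and case-sensitive.
func filterEnv(env []string, denylist []string) []string {
	denied := make(map[string]struct{}, len(denylist))
	for _, k := range denylist {
		denied[k] = struct{}{}
	}
	var out []string
	for _, kv := range env {
		key, _, _ := strings.Cut(kv, "=")
		if _, ok := denied[key]; ok {
			continue // never pass denied variables through
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	env := []string{"PATH=/usr/bin", "VAULT_TOKEN=s.secret", "CONSUL_HTTP_TOKEN_FILE=/etc/token"}
	fmt.Println(filterEnv(env, []string{"VAULT_TOKEN", "CONSUL_HTTP_TOKEN_FILE"}))
}
```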
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
Some plugins emit multiple topology segment entries for the same segment (e.g., newer versions of AWS EBS) to accommodate convention changes in k8s. Check that segments are a superset of the plugin's topology segments instead of requiring them to be exactly equal.
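The superset check can be sketched as follows: following the wording above, `segments` must contain every key/value pair in the plugin's segments, and extra keys are ignored. The function name `isSuperset` and the example keys are illustrative assumptions, not Nomad's code.

```go
package main

import "fmt"

// isSuperset reports whether segments contains every key/value pair in
// pluginSegments. Extra keys in segments are allowed, so the same segment
// reported under more than one key does not cause a mismatch.
func isSuperset(segments, pluginSegments map[string]string) bool {
	for k, v := range pluginSegments {
		if got, ok := segments[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative keys: one plugin-specific, one Kubernetes-style.
	segments := map[string]string{
		"topology.ebs.csi.aws.com/zone": "us-east-1a",
		"topology.kubernetes.io/zone":   "us-east-1a",
	}
	plugin := map[string]string{"topology.ebs.csi.aws.com/zone": "us-east-1a"}
	fmt.Println(isSuperset(segments, plugin)) // true
}
```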
* func: Use URL rules to escape non-alphanumeric values in HCL variables
* docs: add changelog
* func: unescape flags before returning
* use JSON.stringify instead of bespoke value quoting to handle values that span multiple lines
---------
Co-authored-by: Phil Renaud <phil@riotindustries.com>
Fixes a bug in the AllocatedResources.Comparable method, where the scheduler
would only take into account the cpusets of the tasks in the largest lifecycle.
This could result in overlapping cgroup cpusets. Now we make the distinction
between reserved and fungible resources throughout the lifespan of the alloc.
This also adds logging so that future regressions can be diagnosed without
manually inspecting cgroup files.
This PR fixes a bug where System.GarbageCollect endpoint didn't work on objects
that weren't older than their respective GC thresholds. System.GarbageCollect
is used to force garbage collection (the `nomad system gc` command also calls it) and
should ignore any GC threshold settings.
* windows: revert process listing logic to that of v1.6.10
In Nomad 1.7 much of the process management code was refactored, including
a rewrite of how the process tree of an executor was determined on Windows
machines. Unfortunately that rewrite has been cursed with performance issues
and bugs. Instead, revert to the logic used in v1.6.10.
* changelog
When the service client syncs to Consul, we accumulate service sync errors in a
multierror before reading all the local checks. If the API call to the local
checks fails, we either return that error or append it to the multierror and
return the set of errors. But `multierror.Error.Len()` doesn't nil-check, so we
need to do this ourselves.
I've also made a quick pass through the rest of the code base looking for
multierror `Len` method calls to see if we have this pattern elsewhere.
Fixes: https://github.com/hashicorp/nomad/issues/24512
When a task restarts, the Nomad client may need to rewrite the Consul token
file, but the file is created with permissions that prevent a non-root agent
from writing to it. Although Nomad clients should currently be run as root,
it's harmless to let whatever user the Nomad agent runs as write to the file,
and that's one less barrier to rootless Nomad.
Ref: https://github.com/hashicorp/nomad/issues/23859#issuecomment-2465757392
To help users understand multi-region federated deployments, this
change adds two new sections to the website. The first expands the
architecture section with an initial federation page, which can be
given further detail over time. The second adds a federation
operations page that covers failure planning and mitigation.
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Users of the `nsutil` library should be able to write the following and have
it work:
```go
var errno syscall.Errno
if errors.As(err, &errno) {
	if errno == unix.EBUSY {
		// ...
	}
}
```
This commit fixes that issue.
When a Vault lease expires, it's revoked on the server and cannot be renewed, so
this error should be treated as fatal.
The errors we get aren't wrapped by the Vault SDK, so unfortunately we have to
read the error messages and can't easily enumerate non-fatal error
messages (which might be bubbling up from the stdlib). I've audited the errors
currently used and have documented their source.
Ref 52ba156d47/vault/expiration.go (L1327)
Fixes: https://github.com/hashicorp/nomad/issues/23859
* func: remove scaling validation for system jobs and don't canonicalize to 1
* test: update test to validate for 0 and improve error message
* func: remove the canonicalization to 1 from system jobs
* docs: add changelog
* func: add test for scaling system jobs
* temp: add logging to debug test
* fix: clean up after test is done
* fix: scaled down jobs will still have the stop allocation, update test to account for it
* Update the e2e test to account for system jobs having one alloc per node
* fix: filter to only count ready nodes on the node count
* fix: remove the datacenter constraint from the system job definition
* fix: compare alloc IDs to avoid flaky tests when verifying no alloc was stopped
* fix: remove duplicated code
Recently we moved from github.com/syndtr/gocapability to
github.com/moby/sys/capability because the former package is no longer
maintained. The new package's capability API works differently: known and
supported capabilities are now listed separately, and the .ListSupported()
call always returns an empty list on non-Linux systems. This means Nomad
agents won't start on darwin or windows.
Our git pre-push hook already prevents Nomad Enterprise code from getting pushed
anywhere but its own repo. But this hook only works for files on the current
worktree (checkout). Were you to fetch an Enterprise tag into your local
Community Edition repo but not have it checked out, and then `git push --tags`,
you'd push that tag and the associated commit history.
Add tag filtering to the pre-push hook to prevent Enterprise tags (and tags with
the older `+pro` SKU) from getting pushed to the Community Edition repo.
Clusters that have gone through several upgrades have been found
to include keyring material that has an empty RSA block.
In more recent versions of Nomad, an empty RSA block is not
written to disk, so the panic does not occur. Older versions,
however, did not have the struct tag that omits it, so they wrote
an empty JSON block, which the current version does not account for.
github.com/moby/sys/capability is a fork of the (no longer maintained)
github.com/syndtr/gocapability package.
For changes since the fork took place, see
https://github.com/moby/sys/blob/main/capability/CHANGELOG.md
Note that the "workaround for RHEL6" is removed for a number of reasons.
Feel free to choose the one you like the most; any one is sufficient:
1. /proc/sys/kernel/cap_last_cap is available since RHEL 6.7
(kernel 2.6.32-573.el6), released 9 years ago (2015-07-22).
2. It incorrectly returns CAP_BLOCK_SUSPEND (36), which was only added
in kernel v3.5 and was never backported to RHEL6 kernels. The
correct value for RHEL6 would be CAP_MAC_ADMIN (33).
3. As far as upstream kernels go, /proc/sys/kernel/cap_last_cap was
added in kernel v3.2, and a correct value depends on the kernel
version. It could be CAP_WAKE_ALARM (35), added to kernel v3.0, or
CAP_SYSLOG (34), added to kernel v2.6.38, or possibly a lesser value
for even older kernels.
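Without the workaround, the highest supported capability comes straight from `/proc/sys/kernel/cap_last_cap`, which holds a single decimal number. A minimal sketch of reading it, with no RHEL6-style fallback; `parseLastCap` and `lastCap` are hypothetical names, not the library's functions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseLastCap parses the contents of /proc/sys/kernel/cap_last_cap,
// a single decimal number followed by a newline.
func parseLastCap(contents string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(contents))
	if err != nil {
		return 0, fmt.Errorf("parsing cap_last_cap: %w", err)
	}
	return n, nil
}

// lastCap reads the highest capability number supported by the running
// kernel. There is no fallback: if the file is missing, it returns an error.
func lastCap() (int, error) {
	b, err := os.ReadFile("/proc/sys/kernel/cap_last_cap")
	if err != nil {
		return 0, err
	}
	return parseLastCap(string(b))
}

func main() {
	n, err := lastCap()
	if err != nil {
		fmt.Println("cap_last_cap unavailable:", err)
		return
	}
	fmt.Println("last supported capability:", n)
}
```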
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When we removed the time table in #24112 we introduced a bug where if a previous
version of Nomad had written a time table entry, we'd return from the restore
loop early and never load the rest of the FSM. This will result in a mostly or
partially wiped state for that Nomad node, which would then be out of sync with
its peers (which would also have the same problem on upgrade).
The bug only occurs when the FSM is being restored from snapshot, which isn't
the case if you test with a server that's only written Raft logs and not
snapshotted them.
While fixing this bug, we still need to ensure we're reading the time table
entries even if we're throwing them away, so that we move the snapshot reader
along to the next full entry.
Fixes: https://github.com/hashicorp/nomad/issues/24411