* func: Update the scaling policies when deregistering a job
* func: Add tests for updating the policy
* docs: add changelog
* func: set back the old order
* style: rearrange for clarity and to reuse the watchset
* func: set the policies to the last submitted when starting a job
* func: expand tests of the start job command to include job submission
* func: Expand the tests to verify the correct state of the scaling policy after job start
* Update command/job_start.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update nomad/fsm_test.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: add warning when there is no previous job submission
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
TestSingleAffinities never expected a node with affinity score set to 0 in
the set of returned nodes. However, since #25800, this can happen. What the
test should be checking for instead is that the node with the highest normalized
score has the right affinity.
When a disconnected alloc reconnects, the follow-up evaluation is left pending
and the followup eval ID field isn't cleared. If the allocation later fails, the
followup eval ID prevents the server from creating a new eval for that event.
Update the state store so that updates from the client clear the followup eval
ID if the allocation is reconnecting, and mark the eval as canceled. Update the
FSM to remove those evals from the eval broker's delay heap.
Fixes: https://github.com/hashicorp/nomad/issues/12809
Fixes: https://hashicorp.atlassian.net/browse/NMD-302
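A condensed, hypothetical sketch of the update path described above (the types and function names are illustrative, not the actual state store code, and it assumes the client update carries enough information to tell that the allocation has reconnected):

```go
package main

import "fmt"

type alloc struct {
	FollowupEvalID string
	Reconnected    bool
}

type eval struct{ Status string }

// applyClientUpdate mirrors the described behavior: a reconnected allocation
// has its pending follow-up eval cancelled and the link cleared, so a later
// failure can create a fresh evaluation.
func applyClientUpdate(a *alloc, evals map[string]*eval) {
	if !a.Reconnected || a.FollowupEvalID == "" {
		return
	}
	if e, ok := evals[a.FollowupEvalID]; ok {
		e.Status = "canceled" // the FSM also removes it from the eval broker's delay heap
	}
	a.FollowupEvalID = ""
}

func main() {
	evals := map[string]*eval{"e1": {Status: "pending"}}
	a := &alloc{FollowupEvalID: "e1", Reconnected: true}
	applyClientUpdate(a, evals)
	fmt.Println(a.FollowupEvalID, evals["e1"].Status) // "" canceled
}
```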
The `disconnect.stop_on_client_after` feature is implemented as a loop on the
client that's intended to wait on the shortest timeout of all the allocations on
the node and then check whether the interval since the last heartbeat has been
longer than the timeout. It uses a buffered channel of allocations written and
read from the same goroutine to push "stops" from the timeout expiring to the
next pass through the loop. Unfortunately if there are multiple allocations that
need to be stopped in the same timeout event, or even if a previous event has
not yet been dequeued, then sending on the channel will block and the entire
goroutine deadlocks itself.
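A minimal reproduction of that failure mode, using made-up names rather than the actual client code: the buffered channel is written and read only from this goroutine, so the second send in a single pass blocks forever.

```go
package main

func main() {
	stopCh := make(chan string, 1) // pretend each value is an alloc ID to stop

	for {
		// the timeout expires and two allocations need to be stopped in the
		// same pass: the second send blocks because the only reader is this
		// same goroutine, which never reaches the receive below
		stopCh <- "alloc-1"
		stopCh <- "alloc-2" // deadlock

		select {
		case id := <-stopCh:
			_ = id // stop the alloc on the next pass through the loop
		}
	}
}
```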
While fixing this, I also discovered that the `stop_on_client_after` and
heartbeat loops can synchronize in a pathological way that extends the
`stop_on_client_after` window. If a heartbeat fails close to the beginning of
the shortest `stop_on_client_after` window, the loop will end up waiting until
almost 2x the intended wait period.
While fixing both of those issues, I discovered that the existing tests had a
bug such that we were asserting that an allocrunner was being destroyed when it
had already exited.
This commit includes the following:
* Rework the watch loop so that we handle the stops in the same case as the
timer expiration, rather than using a channel in the method scope.
* Remove the alloc intervals map field from the struct and keep it in the
method scope, in order to discourage writing racy tests that read its value.
* Reset the timer whenever we receive a heartbeat, which forces the two
intervals to synchronize correctly.
* Minor refactoring of the disconnect timeout lookup to improve brevity.
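As a rough sketch of what the reworked loop looks like under those changes (illustrative names and structure, not the actual Nomad implementation): stops happen inline in the timer case, and a heartbeat resets the timer so the two intervals can't drift apart.

```go
package main

import (
	"context"
	"time"
)

// illustrative types only; not the client's actual structures
type allocRunner interface{ Shutdown() }

type watcher struct {
	heartbeatCh         chan struct{}
	expiredAllocRunners func() []allocRunner
}

func (w *watcher) watch(ctx context.Context, shortestTimeout time.Duration) {
	timer := time.NewTimer(shortestTimeout)
	defer timer.Stop()

	for {
		select {
		case <-ctx.Done():
			return

		case <-w.heartbeatCh:
			// a successful heartbeat restarts the window, keeping the
			// heartbeat and stop_on_client_after intervals in sync
			timer.Reset(shortestTimeout)

		case <-timer.C:
			// the window expired without a heartbeat: stop every allocation
			// whose timeout has elapsed, inline, rather than pushing the work
			// onto a channel read by this same goroutine
			for _, ar := range w.expiredAllocRunners() {
				ar.Shutdown()
			}
			timer.Reset(shortestTimeout)
		}
	}
}

func main() {
	w := &watcher{
		heartbeatCh:         make(chan struct{}),
		expiredAllocRunners: func() []allocRunner { return nil },
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	w.watch(ctx, 10*time.Millisecond)
}
```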
Fixes: https://github.com/hashicorp/nomad/issues/24679
Ref: https://hashicorp.atlassian.net/browse/NMD-407
During #25547 and #25588 work, incorrect response codes from
/v1/acl/token/self were changed, but we did not make a note about this in the
upgrade guide.
When a node is garbage collected, any dynamic host volumes on the node are
orphaned in the state store. We generally don't want to automatically collect
these volumes and risk data loss, and have provided a CLI flag to `-force`
remove them in #25902. But for clusters running on ephemeral cloud
instances (ex. AWS EC2 in an autoscaling group), manually deleting host volumes may add
excessive friction. Add a configuration knob to the client configuration to
remove host volumes from the state store on node GC.
Ref: https://github.com/hashicorp/nomad/pull/25902
Ref: https://github.com/hashicorp/nomad/issues/25762
Ref: https://hashicorp.atlassian.net/browse/NMD-705
* ui: Handle new token self response object when ACLs are disabled.
The ACL self lookup now returns a spoof token when ACLs are
disabled, rather than an error. The UI needs to be updated to
handle this change so that permissions checks which incorrectly
grey out buttons such as client drain are not performed.
* changelog: add entry for #25881
* Set MaxAllocations in client config
Add NodeAllocationTracker struct to Node struct
Evaluate MaxAllocations in AllocsFit function
Set up cli config parsing
Integrate maxAllocs into AllocatedResources view
Co-authored-by: Tim Gross <tgross@hashicorp.com>
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
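A rough sketch of the kind of check described above for MaxAllocations, with hypothetical names and a made-up signature rather than the real AllocsFit plumbing; zero is assumed to mean "no limit":

```go
package main

import "fmt"

// allocCountFits is illustrative only: when the node opts in to a maximum
// allocation count, proposed allocations beyond that count simply don't fit.
func allocCountFits(maxAllocs, proposed int) (bool, string) {
	if maxAllocs > 0 && proposed > maxAllocs { // 0 means no limit in this sketch
		return false, fmt.Sprintf("max allocations exceeded (%d > %d)", proposed, maxAllocs)
	}
	return true, ""
}

func main() {
	fmt.Println(allocCountFits(0, 500)) // true: no limit configured
	fmt.Println(allocCountFits(50, 51)) // false: max allocations exceeded (51 > 50)
}
```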
We've been gradually migrating from `testify` to `shoenig/test` on a
test-by-test basis. While working on a large refactoring in the state store, I
found this to create a lot of diffs incidental to the refactoring.
In this changeset, I've used a prototype collection of semgrep fix rules to
autofix most of the uses of testify in the `nomad/state` package. Then I went in
manually and fixed any resulting problems, as well as a few minor test bugs that
`shoenig/test` catches and `testify` does not because of its API. I've also
added a semgrep rule for marking a package as "testify clean", so that we don't
accidentally add it back to any package we manage to remove it from going
forward.
While I'm here, I've removed most of the uses of `reflect.DeepEqual` in the
tests as well as cleaned up some older idioms that Go has nicer syntax for now.
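For readers unfamiliar with the two libraries, this is the flavor of mechanical rewrite the fix rules perform (an illustrative example, not one of the actual semgrep rules); shoenig/test's generic API also rejects mismatched argument types at compile time, which testify's interface{}-based API cannot.

```go
package state

import (
	"testing"

	"github.com/shoenig/test/must"
	"github.com/stretchr/testify/require"
)

func TestExample(t *testing.T) {
	got, want := 42, 42

	require.Equal(t, want, got) // before: testify
	must.Eq(t, want, got)       // after: shoenig/test
}
```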
Workflow identities currently support ACL policies being applied
to a job ID within a namespace. With this update an ACL policy
can be applied to a namespace. This results in the ACL policy
being applied to all jobs within the namespace.
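A hypothetical sketch of what this resolution looks like (illustrative types, not the actual ACL code): a policy scoped only to a namespace now matches every job in that namespace, while job-scoped policies behave as before.

```go
package main

import "fmt"

// illustrative only: a policy's scope, where an empty JobID means the policy
// applies to every job in the namespace
type policyScope struct {
	Name      string
	Namespace string
	JobID     string
}

func policiesForWorkload(all []policyScope, ns, jobID string) []string {
	var names []string
	for _, p := range all {
		if p.Namespace != ns {
			continue
		}
		if p.JobID == "" || p.JobID == jobID {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	policies := []policyScope{
		{Name: "job-scoped", Namespace: "prod", JobID: "web"},
		{Name: "namespace-scoped", Namespace: "prod"},
	}
	fmt.Println(policiesForWorkload(policies, "prod", "web"))   // [job-scoped namespace-scoped]
	fmt.Println(policiesForWorkload(policies, "prod", "batch")) // [namespace-scoped]
}
```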
We have several semgrep rules forbidding imports of packages we don't
want. While testing out a new rule I discovered that the rule we have is
completely ineffective. Update the rule to detect imports using the Go language
plugin, including regex matching on some packages where it's forbidden to import
the root but fine to import a subpackage or different version.
The go-set import rule is an example of one where our `go-set/v3` imports fail
the re-written check unless we use the regex syntax. If you replace the pattern
rule with `import "=~/github.com\/hashicorp\/go-set/v3$/"`, it would fail.
The DNS configuration for our E2E cluster uses dnsmasq to pass all DNS through
Consul. But there's a circular reference in systemd configurations that
sometimes causes the Docker service to fail; this causes test flakes during
upgrade testing because we count the number of nodes and expect `system` jobs
using Docker to run on all nodes.
We no longer have any tests that require Consul DNS, so remove the complication
of dnsmasq to break the reference cycle. Also, while I was looking at this I
noticed we still had setup that would configure the ECS remote task driver
plugin, which is archived. Remove this as well.
Ref: https://hashicorp.atlassian.net/browse/NMD-162
If there are no affinities on a job, we don't want to count an affinity score of
zero in the number of scores we divide the normalized score by. This is how we
handle other scoring components like node reschedule penalties on nodes that
weren't running the previous allocation.
But we also exclude counting the affinity in the case where we have an affinity but
the value is zero. In pathological cases, this can result in a node with a low
affinity being picked over a node with no affinity, because the denominator is 1
larger. Include zero-value affinities in the count of scores if the job has
affinities but the value just happens to be zero.
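A worked example with made-up numbers (not taken from the scheduler) showing how the larger denominator flips the ordering: assume both nodes share another negative scoring component of -0.5, node X has a slightly negative affinity score of -0.1, and node Y's affinity score is 0.

```go
package main

import "fmt"

func main() {
	normalize := func(sum float64, n int) float64 { return sum / float64(n) }

	// Old behavior: node Y's zero-value affinity is dropped from the count.
	nodeX := normalize(-0.5+(-0.1), 2) // -0.30, low affinity counted
	nodeY := normalize(-0.5, 1)        // -0.50, zero-value affinity not counted
	fmt.Println(nodeX > nodeY)         // true: the node with the lower affinity wins

	// Fixed behavior: the zero-value affinity still counts toward n.
	nodeYFixed := normalize(-0.5+0, 2) // -0.25
	fmt.Println(nodeYFixed > nodeX)    // true: node Y correctly wins
}
```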
Fixes: https://github.com/hashicorp/nomad/issues/25621
This introduces a new HTTP endpoint (and an associated CLI command) for querying
ACL policies associated with a workload identity. It allows users who want
to learn about the ACL capabilities available from within WI tasks to know
what sort of policies are enabled.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
* action page
* change all page_title fields
* update title
* constraint through migrate pages
* update page title and heading to use sentence case
* fix front matter description
* Apply suggestions from code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
---------
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Collecting metrics from processes is expensive, especially on platforms like
Windows. The executor code has a 5s cache of stats to ensure that we don't
thrash syscalls on nodes running many allocations. But the timestamp used to
calculate TTL of this cache was never being set, so we were always treating it
as expired. This causes excess CPU utilization on client nodes.
Ensure that when we fill the cache, we set the timestamp. In testing on Windows,
this reduces executor CPU overhead by roughly 75%.
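A minimal sketch of the caching pattern at issue, with hypothetical names rather than the real executor types; if the `latestFetch` assignment is missing, `time.Since` of the zero time always exceeds the TTL and every caller pays the collection cost again.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// procUsages stands in for the per-process stats snapshot; the names here are
// hypothetical and not the actual executor code.
type procUsages map[int]float64

type statsCache struct {
	mu          sync.Mutex
	latest      procUsages
	latestFetch time.Time // the timestamp that was never being set
	ttl         time.Duration
}

func (c *statsCache) stats(collect func() procUsages) procUsages {
	c.mu.Lock()
	defer c.mu.Unlock()

	if time.Since(c.latestFetch) < c.ttl {
		return c.latest // fresh enough: reuse the cached snapshot
	}

	c.latest = collect()
	c.latestFetch = time.Now() // without this line the cache always looks expired
	return c.latest
}

func main() {
	calls := 0
	collect := func() procUsages { calls++; return procUsages{1: 0.5} }

	c := &statsCache{ttl: 5 * time.Second}
	c.stats(collect)
	c.stats(collect)
	fmt.Println(calls) // 1: the second call hits the cache
}
```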
This changeset includes two other related items:
* The `telemetry.publish_allocation_metrics` field correctly prevents a node
from publishing metrics, but the stats hook on the taskrunner still collects
the metrics, which can be expensive. Thread the configuration value into the
stats hook so that we don't collect if `telemetry.publish_allocation_metrics =
false`.
* The `linuxProcStats` type in the executor's `procstats` package is misnamed as
a result of a couple rounds of refactoring. It's used by all task executors,
not just Linux. Rename this and move a comment about how Windows processes are
listed so that the comment is closer to where the logic is implemented.
Fixes: https://github.com/hashicorp/nomad/issues/23323
Fixes: https://hashicorp.atlassian.net/browse/NMD-455
* Only error on constraints if no allocs are running
When running `nomad job run <JOB>` multiple times with constraints defined,
there should be no error as a result of filtering out nodes that do not (or
have never) satisfied the constraints.
When running a system job with a constraint, any run after the initial startup
returns exit(2) and a warning about unplaced allocations due to constraints,
an error that is not encountered on the initial run even though the constraint
stays the same.
This is because the node that satisfies the constraint is already running
the allocation, so that placement is ignored. Another placement is
attempted, but the only node(s) left are the ones that do not satisfy
the constraint. Nomad views this case (none of the attempted placements
could be placed successfully) as an error, and reports it as such. In
reality, no allocations should be placed or updated in this case, but it
should not be treated as an error.
This change uses the `ignored` placements from diffSystemAlloc to determine
whether the case encountered is an error (no ignored placements means that
nothing is already running, which is an error) or not (an ignored placement
means that the task is already running somewhere on a node). It does this at
the point where `failedTGAlloc` is populated, so placement functionality isn't
changed, just the field that populates the error.
The existing functionality that (correctly) notifies a user when a submitted
job cannot run on any node, because the constraints filter out all available
nodes, is preserved and should still behave as expected.
* Add changelog entry
* Handle in-place updates for constrained system jobs
* Update .changelog/25850.txt
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
* Remove conditionals
---------
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
During the upgrade test we can trigger a re-render of the Vault secret due to
client restart before the allocrunner has marked the task as running, which
triggers the change mode on the template and restarts the task. This results in
a race where the alloc is still "pending" when we go to check it. We never
change the value of this secret in upgrade testing, so paper over this race
condition by setting a "noop" change mode.
We're required to pin Docker images for Actions to a specific SHA now and this
is tripping scans in the Enterprise repo. Update the actionlint image.
Ref: https://go.hashi.co/memo/sec-032
Nomad Enterprise users operating in air-gapped or otherwise secured environments
don't want to send license reporting metrics directly from their
servers. Implement manual/offline reporting by periodically recording usage
metrics snapshots in the state store, and providing an API and CLI by which
cluster administrators can download the snapshot for review and out-of-band
transmission to HashiCorp.
This is the CE portion of the work required for implementation in the Enterprise
product. Nomad CE does not perform utilization reporting.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673
Ref: https://hashicorp.atlassian.net/browse/NMD-68
Ref: https://go.hashi.co/rfc/nmd-210
This changeset includes several adjustments to the upgrade testing scripts to
reduce flakes and make problems more understandable:
* When a node is drained prior to the 3rd client upgrade, it's entirely
possible the 3rd client to be upgraded is the drained node. This results in
miscounting the expected number of allocations because many of them will be
"complete" (service/batch) or "pending" (system). Leave the system jobs running
during drains and only count the running allocations at that point as the
expected set. Move the inline script that gets this count into a script file for
legibility.
* When the last initial workload is deployed, it's possible for it to be
briefly still in "pending" when we move to the next step. Poll for a short
window for the expected count of jobs.
* Make sure that any scripts that are being run right after a server or client
is coming back up can handle temporary unavailability gracefully.
* Change the debugging output of several scripts to avoid having the debug
output run into the error message (Ex. "some allocs are not running" looked like
the first allocation running was the missing allocation).
* Add some notes to the README about running locally with `-dev` builds and
tagging a cluster with your own name.
Ref: https://hashicorp.atlassian.net/browse/NMD-162