The current autoscaler docs imply that it has minimal or non-working support
for Nomad namespaces. In fact the namespace support works fine; it just
doesn't allow configuring multiple namespaces without using a wildcard (for
now). Make this clearer and fix the reference to the configuration "below",
which is no longer on that same page.
Ref: https://github.com/hashicorp/nomad-autoscaler/issues/65
Refactor the `computeGroup` code in the reconciler to make its mutations easier
to understand. Some of this work makes mutation more consistent, but more
importantly it's intended to make mutation readily _detectable_ while still
being readable; a rough sketch of the target pattern follows the list. Includes:
* In the `computeCanaries` function, we mutate the dstate and the result and
then the return values are used to further mutate the result in the
caller. Move all this mutation into the function.
* In the `computeMigrations` function, we mutate the result and then the return
values are used to further mutate the result in the caller. Move all this
mutation into the function.
* In the `cancelUnneededCanaries` function, we mutate the result and then the
return values are used to further mutate the result in the caller. Move all
this mutation into the function, and annotate which `allocSet`s are mutated by
taking a pointer to the set.
* The `createRescheduleLaterEvals` function currently mutates the results and
returns updates to mutate the results in the caller. Move all this mutation
into the function to help clean up `computeGroup`.
* Extract `computeReconnecting` method from `computeGroup`. There's some tangled
logic in `computeGroup` for determining changes to make for reconnecting
allocations. Pull this out into its own function. Annotate mutability in the
function by passing pointers to `allocSet` where needed, and mutate the result
to update counts. Rename the old `computeReconnecting` method to
`appendReconnectingUpdates` to mirror the naming of the similar logic for
disconnects.
* Extract `computeDisconnecting` method from `computeGroup`. There's some
tangled logic in `computeGroup` for determining changes to make for
disconnected allocations. Pull this out into its own function. Annotate
mutability in the function by passing pointers to `allocSet` where needed, and
mutate the result to update counts.
* The `appendUnknownDisconnectingUpdates` method used to create updates for
disconnected allocations mutates one of its `allocSet` arguments to change the
allocations that the reschedule-now set points to. Pull this mutation out into
the caller.
* A handful of small docstring and helper function fixes
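As a rough illustration of the pattern these changes move toward (the types and
function names here are simplified stand-ins, not the real reconciler code),
the mutation is pulled into the helper and announced by its pointer arguments:

```go
// Simplified stand-ins for the reconciler's real types, showing the shape of
// the change: helpers take pointers to everything they mutate instead of
// returning values the caller uses to mutate the result.
package main

import "fmt"

// allocSet is a placeholder for the real map of allocation IDs to allocations.
type allocSet map[string]string

type reconcileResults struct {
	stop int
}

// Before: the helper returns a count and the caller mutates the result, so
// the mutation is spread across two places.
func computeExampleBefore(canaries allocSet) (stop int) {
	return len(canaries)
}

// After: the helper takes pointers to what it mutates, so a reader of
// computeGroup can see at the call site exactly which values change.
func computeExampleAfter(result *reconcileResults, canaries *allocSet) {
	result.stop += len(*canaries)
}

func main() {
	canaries := allocSet{"alloc-1": "canary"}
	result := &reconcileResults{}

	// old style: mutation happens back in the caller
	result.stop += computeExampleBefore(canaries)

	// new style: the pointer arguments announce the mutation
	computeExampleAfter(result, &canaries)

	fmt.Println(result.stop) // 2
}
```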
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The reconciler contains a large set of methods and functions that operate on
`allocSet` (a map of allocation IDs to their allocs). Update these so that they
are consistently methods that are documented not to consume the `allocSet` (a
sketch of the convention follows the list below). This sets the stage for
further improvements around mutability in the reconciler.
This changeset also includes a few related refactors:
* Use the `allocSet` alias in every location it's relevant in the reconciler,
for consistency and clarity.
* Move the filter functions and related helpers in the `allocs.go` file into the
`filters.go` file.
* Make the method receiver name on `allocSet` consistent everywhere and
generally improve the docstrings on the filter functions.
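A minimal sketch of the convention, with hypothetical types and a hypothetical
filter method rather than the real reconciler code:

```go
// Hypothetical types and a hypothetical filter method illustrating the
// convention: allocSet is a map of allocation IDs to allocations, and filter
// helpers are methods that return a new set without modifying the receiver.
package main

import "fmt"

type Allocation struct {
	ID           string
	ClientStatus string
}

// allocSet maps allocation IDs to allocations.
type allocSet map[string]*Allocation

// filterByClientStatus returns a new allocSet containing only allocations
// with the given client status. It does not consume the receiver.
func (a allocSet) filterByClientStatus(status string) allocSet {
	out := make(allocSet)
	for id, alloc := range a {
		if alloc.ClientStatus == status {
			out[id] = alloc
		}
	}
	return out
}

func main() {
	set := allocSet{
		"a1": {ID: "a1", ClientStatus: "running"},
		"a2": {ID: "a2", ClientStatus: "failed"},
	}
	fmt.Println(len(set.filterByClientStatus("running")), len(set)) // 1 2
}
```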
Ref: https://hashicorp.atlassian.net/browse/NMD-819
When a task group is removed from a jobspec, the reconciler stops all
allocations and immediately returns from `computeGroup`. We can do the same
when the group has been scaled to zero, but doing so runs into an inconsistency
in the way that server-terminal allocations are handled.
Prior to this change, server-terminal allocations fall through `computeGroup`
without being marked as `ignore`, unless they are terminal canaries, in which
case they are marked `stop` (but this is a no-op). This inconsistency causes a
_tiny_ amount of extra `Plan.Submit`/Raft traffic but, more importantly, makes
it more difficult to write test assertions for `stop` vs `ignore` vs
fallthrough. Remove this inconsistency by filtering out server-terminal
allocations early in `computeGroup`.
This brings the cluster reconciler's behavior closer to the node reconciler's
behavior, except that the node reconciler discards _all_ terminal allocations
because it doesn't support rescheduling.
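A hedged sketch of the filtering idea, using simplified stand-in types rather
than the real `computeGroup` code:

```go
// Simplified stand-ins, not the real computeGroup code: split server-terminal
// allocations out of the working set up front, so every later step can assume
// it only sees allocations the servers still consider live.
package main

import "fmt"

type Allocation struct {
	ID            string
	DesiredStatus string // e.g. "run", "stop", "evict"
}

type allocSet map[string]*Allocation

// serverTerminal is a stand-in for the real server-terminal check.
func serverTerminal(a *Allocation) bool {
	return a.DesiredStatus == "stop" || a.DesiredStatus == "evict"
}

// filterServerTerminal splits the set into live and server-terminal subsets.
func filterServerTerminal(all allocSet) (live, terminal allocSet) {
	live, terminal = make(allocSet), make(allocSet)
	for id, alloc := range all {
		if serverTerminal(alloc) {
			terminal[id] = alloc
			continue
		}
		live[id] = alloc
	}
	return live, terminal
}

func main() {
	all := allocSet{
		"a1": {ID: "a1", DesiredStatus: "run"},
		"a2": {ID: "a2", DesiredStatus: "stop"},
	}
	live, terminal := filterServerTerminal(all)
	fmt.Println(len(live), len(terminal)) // 1 1
}
```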
This changeset required adjustments to two tests, but the tests themselves were
a bit of a mess:
* In https://github.com/hashicorp/nomad/pull/25726 we added a test of how
canaries were treated when they are on draining nodes. But the test didn't
correctly configure the job with an `update` block, leading to misleading test
behavior. Fix the test to exercise the intended behavior and refactor it for
clarity.
* While working on reconciler behaviors around stopped allocations, I found it
extremely hard to follow the intent of the disconnected client tests, because
many of the fields in the table-driven test are switches for more complex
behavior or are just tersely named. Attempt to make this a little more legible
by moving some branches directly into fields, renaming some fields, and
flattening out some branching.
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The `DesiredUpdates` struct that we send to the Read Eval API doesn't include
information about disconnect/reconnect and rescheduling. Annotate the
`DesiredUpdates` with this data, and adjust the `eval status` command to display
only those fields that have non-zero values in order to make the output width
manageable.
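A rough sketch of the display approach for `eval status`; the field names here
are illustrative examples rather than the real `DesiredUpdates` fields:

```go
// Example field names only; the real DesiredUpdates fields may differ. The
// point is that the command builds its output from non-zero counters only.
package main

import "fmt"

type desiredUpdates struct {
	Place      uint64
	Stop       uint64
	Ignore     uint64
	Disconnect uint64 // hypothetical example of one of the new counters
}

// nonZeroColumns returns "Name=value" cells for counters that are non-zero,
// keeping the rendered table narrow when most counters are zero.
func nonZeroColumns(u desiredUpdates) []string {
	cols := []struct {
		name  string
		value uint64
	}{
		{"Place", u.Place},
		{"Stop", u.Stop},
		{"Ignore", u.Ignore},
		{"Disconnect", u.Disconnect},
	}
	var out []string
	for _, c := range cols {
		if c.value > 0 {
			out = append(out, fmt.Sprintf("%s=%d", c.name, c.value))
		}
	}
	return out
}

func main() {
	fmt.Println(nonZeroColumns(desiredUpdates{Place: 3, Ignore: 2}))
	// Output: [Place=3 Ignore=2]
}
```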
Ref: https://hashicorp.atlassian.net/browse/NMD-815
While investigating whether the deploymentwatcher would need updates to
implement system deployments, I discovered that some of the tests are racy and
make assertions about called functions without waiting for those calls.
Update these tests to wait where needed, and generally clean them up while we're
in here. In particular, I've removed the heavyweight mocking in favor of
checking the call counts and then asserting the expected state store changes.
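As an example of the general fix, poll for the expected call count instead of
asserting it immediately (this uses testify's `require.Eventually`; the actual
tests may rely on Nomad's own test helpers instead):

```go
// Stand-in test showing the pattern: wait for asynchronous work to settle
// instead of asserting a call count immediately after triggering it.
package deploymentwatcher_test

import (
	"sync/atomic"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func TestWatcher_WaitsForAsyncCalls(t *testing.T) {
	var calls atomic.Int64

	// Stand-in for the asynchronous work the deploymentwatcher kicks off.
	go func() {
		time.Sleep(10 * time.Millisecond)
		calls.Add(1)
	}()

	// The racy version would be: require.Equal(t, int64(1), calls.Load())
	require.Eventually(t, func() bool {
		return calls.Load() == 1
	}, time.Second, 10*time.Millisecond, "expected the call to be observed")
}
```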
Ref: https://hashicorp.atlassian.net/browse/NMD-892
System and sysbatch jobs don't support the `reschedule` block, because we'd
always replace allocations back onto the same node. The job validation for
system jobs asserts that the user hasn't set a `reschedule` block, so that users
don't submit jobs expecting it to be supported. But this validation was missing
for sysbatch jobs.
Validate that sysbatch jobs don't have a reschedule block.
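A simplified sketch of the kind of check involved, with stand-in types rather
than the real jobspec structs:

```go
// Stand-in types, not the real jobspec structs: reject a reschedule policy on
// system and sysbatch jobs at validation time, since rescheduling would only
// place the allocation back onto the same node.
package main

import "fmt"

type ReschedulePolicy struct{ Attempts int }

type TaskGroup struct {
	Name             string
	ReschedulePolicy *ReschedulePolicy
}

type Job struct {
	Type       string // "service", "batch", "system", or "sysbatch"
	TaskGroups []*TaskGroup
}

// validateReschedule returns an error if a system or sysbatch job carries a
// reschedule policy on any of its task groups.
func validateReschedule(j *Job) error {
	if j.Type != "system" && j.Type != "sysbatch" {
		return nil
	}
	for _, tg := range j.TaskGroups {
		if tg.ReschedulePolicy != nil {
			return fmt.Errorf("task group %q: %s jobs should not have a reschedule policy",
				tg.Name, j.Type)
		}
	}
	return nil
}

func main() {
	job := &Job{Type: "sysbatch", TaskGroups: []*TaskGroup{
		{Name: "tg", ReschedulePolicy: &ReschedulePolicy{Attempts: 1}},
	}}
	fmt.Println(validateReschedule(job))
}
```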
A CSI volume cannot be deleted while it is claimed by an allocation or in the
process of being unpublished. This is documented in the CLI but not the
API. Also, the documentation says that the `volume delete` command silently
returns without error if the volume doesn't exist, but that's incorrect.
Fixes: https://github.com/hashicorp/nomad/issues/24756
When we originally implemented CSI, Nomad did not support the `CreateVolume`
workflow, so the volume name field was just a display name. The `CreateVolume`
CSI RPC requires that the volume name be unique. In retrospect, Nomad should
probably have mapped the namespace + ID to the volume name field, but because we
didn't, the name field must be unique per storage provider. In future work we
should try to figure out a way to unwind that decision, but in the meantime
let's make that requirement clear in the documentation.
Ref: https://gitlab.com/rocketduck/csi-plugin-nfs/-/issues/21
Refactor the reconciler property tests to extract functions for the safety
property assertions we'll share between different job types handled by the same
reconciler.
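A hypothetical sketch of the extraction, with illustrative names rather than
the real reconciler test code:

```go
// Illustrative names only, not the real reconciler test code: a safety
// property assertion pulled into a helper so property tests for different job
// types can share it.
package reconciler_test

import "testing"

type results struct {
	place int
}

// assertNoExcessPlacements is the kind of shared safety property: the
// reconciler must never place more allocations than the group's count.
func assertNoExcessPlacements(t *testing.T, r *results, count int) {
	t.Helper()
	if r.place > count {
		t.Fatalf("expected at most %d placements, got %d", count, r.place)
	}
}

func TestServiceJobProperties(t *testing.T) {
	assertNoExcessPlacements(t, &results{place: 3}, 3)
}

func TestBatchJobProperties(t *testing.T) {
	assertNoExcessPlacements(t, &results{place: 2}, 3)
}
```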
When the client handles an update status response from the server, it modifies
its heartbeat stop tracker with a time set once the RPC call returns. It also
optionally emits a log message if the client suspects it has missed a heartbeat.
These times were originally tracked by two different calls to the time function,
executed 2 microseconds apart. There is no reason we cannot use a single time
variable for both uses, which saves us one whole call to `time.Now`.
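A minimal sketch of the change, with illustrative names:

```go
// Illustrative names, not the real client code: capture time.Now() once and
// reuse it for both the heartbeat stop tracker and the missed-heartbeat check.
package main

import (
	"fmt"
	"time"
)

type heartbeatStop struct{ lastOK time.Time }

func (h *heartbeatStop) setLastOk(t time.Time) { h.lastOK = t }

func handleUpdateResponse(h *heartbeatStop, haltTime time.Time) {
	now := time.Now() // single call, shared by both uses below

	h.setLastOk(now)

	if now.After(haltTime) {
		fmt.Println("missed a heartbeat; the servers may consider this client down")
	}
}

func main() {
	handleUpdateResponse(&heartbeatStop{}, time.Now().Add(-time.Second))
}
```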
While working on property testing in #26172 we discovered there are scenarios
where the reconciler will produce more than the expected number of
placements. Testing of those scenarios at the whole-scheduler level shows that
the extra placements get handled correctly downstream of the reconciler, but
they make it harder to reason about reconciler behavior. Cap the number of
placements in the reconciler itself.
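A hedged sketch of the capping idea, simplified well beyond the real reconciler
code:

```go
// Simplified well beyond the real reconciler code: placements are capped at
// the headroom between the group's desired count and what already exists.
package main

import "fmt"

func cappedPlacements(desiredCount, existing, wanted int) int {
	headroom := desiredCount - existing
	if headroom < 0 {
		headroom = 0
	}
	if wanted > headroom {
		return headroom
	}
	return wanted
}

func main() {
	// The group wants 3 allocs, 2 already exist, and a naive computation
	// asked for 2 more placements; only 1 is allowed through.
	fmt.Println(cappedPlacements(3, 2, 2)) // 1
}
```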
Ref: https://github.com/hashicorp/nomad/pull/26172
While working on property testing in #26216, I discovered we had unreachable
code in the node reconciler. The `diffSystemAllocsForNode` function receives a
set of non-terminal allocations, but then has branches where it assumes the
allocations might be terminal. It's trivially provable that these allocs are
always live, as the system scheduler splits the set of known allocs into live
and terminal sets before passing them into the node reconciler.
Eliminate the unreachable code and improve the variable names to make the known
state of the allocs clearer in the reconciler code.
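An illustrative sketch of why those branches were unreachable, using simplified
stand-in types:

```go
// Simplified stand-in types: the system scheduler splits known allocations
// into live and terminal sets before the node reconciler ever sees them, so
// any "is this alloc terminal?" branch inside it can never be taken.
package main

import "fmt"

type Allocation struct {
	ID       string
	Terminal bool
}

func splitLiveTerminal(all []*Allocation) (live, terminal []*Allocation) {
	for _, a := range all {
		if a.Terminal {
			terminal = append(terminal, a)
			continue
		}
		live = append(live, a)
	}
	return live, terminal
}

// diffForNode receives only the live set; a terminal-allocation branch here
// would be unreachable.
func diffForNode(live []*Allocation) {
	for _, a := range live {
		fmt.Println("reconciling live alloc", a.ID)
	}
}

func main() {
	live, _ := splitLiveTerminal([]*Allocation{
		{ID: "a1"}, {ID: "a2", Terminal: true},
	})
	diffForNode(live)
}
```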
Ref: https://github.com/hashicorp/nomad/pull/26216
* Move commands from docs to its own root-level directory
* temporarily use modified dev-portal branch with nomad ia changes
* explicitly clone nomad ia exp branch
* retrigger build, fixed dev-portal broken build
* architecture, concepts and get started individual pages
* fix get started section destinations
* reference section
* update repo comment in website-build.sh to show branch
* docs nav file update capitalization
* update capitalization to force deploy
* remove nomad-vs-kubernetes dir; move content to what is nomad pg
* job section
* Nomad operations category, deploy section
* operations category, govern section
* operations - manage
* operations/scale; concepts scheduling fix
* networking
* monitor
* secure section
* remove auth-methods folder and move up pages to sso; linkcheck
* Fix install2deploy redirects
* fix architecture redirects
* Job section: Add missing section index pages
* Add section index pages so breadcrumbs build correctly
* concepts/index fix front matter indentation
* move task driver plugin config to new deploy section
* Finish adding full URL to tutorials links in nav
* change SSO to Authentication in nav and file system
* Docs NomadIA: Move tutorials into NomadIA branch (#26132)
* Move governance and policy from tutorials to docs
* Move tutorials content to job-declare section
* run jobs section
* stateful workloads
* advanced job scheduling
* deploy section
* manage section
* monitor section
* secure/acl and secure/authorization
* fix example that contains an unseal key in real format
* remove images from sso-vault
* secure/traffic
* secure/workload-identities
* vault-acl change unseal key and root token in command output sample
* remove lines from sample output
* fix front matter
* move nomad pack tutorials to tools
* search/replace /nomad/tutorials links
* update acl overview with content from deleted architecture/acl
* fix spelling mistake
* linkcheck - fix broken links
* fix link to Nomad variables tutorial
* fix link to Prometheus tutorial
* move who uses Nomad to use cases page; move spec/config shortcuts, add dividers
* Move Consul out of Integrations; move namespaces to govern
* move integrations/vault to secure/vault; delete integrations
* move ref arch to docs; rename Deploy Nomad back to Install Nomad
* address feedback
* linkcheck fixes
* Fixed raw_exec redirect
* add info from /nomad/tutorials/manage-jobs/jobs
* update page content with newer tutorial
* link updates for architecture sub-folders
* Add redirects for removed section index pages. Fix links.
* fix broken links from linkcheck
* Revert to use dev-portal main branch instead of nomadIA branch
* build workaround: add intro-nav-data.json with single entry
* fix content-check error
* add intro directory to get around Vercel build error
* workaround for empty directory
* remove mdx from /intro/ to fix content-check and git snafu
* Add intro index.mdx so Vercel build should work
---------
Co-authored-by: Tu Nguyen <im2nguyen@gmail.com>