If you delete a CSI volume, the volume cannot be currently claimed by an
allocation or in the process of being unpublished. This is documented in the CLI
but not the API. Also, the documentation says that the `volume delete` command
silently returns without error if the volume doesn't exist, but that's
incorrect.
Fixes: https://github.com/hashicorp/nomad/issues/24756
When we originally implemented CSI, Nomad did not support the `CreateVolume`
workflow, so the volume name field was just a display name. The `CreateVolume`
CSI RPC requires that the volume name be unique. In retrospect, Nomad should
probably have mapped the namespace + ID to the volume name field, but because we
didn't, the name field must be unique per storage provider. In future work we
should try to figure out a way to unwind that decision but in the meantime let's
make that requirement clear in the documentation.
Ref: https://gitlab.com/rocketduck/csi-plugin-nfs/-/issues/21
Refactor the reconciler property tests to extract functions for safety property
assertions we'll share between different job types for the same reconciler.
When the client handles an update status response from the server,
it modifies its heartbeat stop tracker with a time set once the
RPC call returns. It optionally also emits a log message, if the
client suspects it has missed a heartbeat.
These times were originally tracked by two different calls to the
time function which were executed 2 microseconds apart. There is
no reason we cannot use a single time variable for both uses which
saves us one whole call to time.Now.
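
As a rough illustration of the change described above, the sketch below takes
one timestamp when the RPC returns and uses it both for the missed-heartbeat
check and for updating the tracker. The `heartbeatTracker` type and its method
are hypothetical stand-ins, not the client's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// heartbeatTracker is a hypothetical stand-in for the client's heartbeat
// stop tracker; it only remembers the last successful heartbeat time.
type heartbeatTracker struct {
	lastOK time.Time
}

// handleUpdateStatusResponse uses a single timestamp for both purposes:
// deciding whether a heartbeat was probably missed, and recording the new
// last-successful-heartbeat time.
func (h *heartbeatTracker) handleUpdateStatusResponse(now time.Time, ttl time.Duration) bool {
	missed := !h.lastOK.IsZero() && now.Sub(h.lastOK) > ttl
	h.lastOK = now
	return missed
}

func main() {
	h := &heartbeatTracker{}
	now := time.Now() // taken once, after the RPC call returns
	if h.handleUpdateStatusResponse(now, 10*time.Second) {
		fmt.Println("client may have missed a heartbeat")
	}
	fmt.Println("last heartbeat recorded at", h.lastOK)
}
```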
While working on property testing in #26172 we discovered there are scenarios
where the reconciler will produce more than the expected number of
placements. Testing of those scenarios at the whole-scheduler level shows that
this gets handled correctly downstream of the reconciler, but this makes it
harder to reason about reconciler behavior. Cap the number of placements in the
reconciler.
Ref: https://github.com/hashicorp/nomad/pull/26172
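
One way to picture the cap described above is the following sketch. The
`capPlacements` helper and the rule it applies (never place more than the
desired count minus what already exists) are assumptions made for
illustration; the commit message does not spell out how the reconciler
computes the limit.

```go
package main

import "fmt"

// capPlacements is a hypothetical helper. It trims the proposed placements so
// that existing allocations plus new placements never exceed the desired count.
func capPlacements(placements []string, desired, existing int) []string {
	room := desired - existing
	if room <= 0 {
		return nil
	}
	if len(placements) > room {
		return placements[:room]
	}
	return placements
}

func main() {
	proposed := []string{"web[0]", "web[1]", "web[2]"}
	// Desired count is 3 and one allocation already exists, so at most two
	// of the proposed placements survive.
	fmt.Println(capPlacements(proposed, 3, 1))
}
```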
While working on property testing in #26216, I discovered we had unreachable
code in the node reconciler. The `diffSystemAllocsForNode` function receives a
set of non-terminal allocations, but then has branches where it assumes the
allocations might be terminal. It's trivially provable that these allocs are
always live, as the system scheduler splits the set of known allocs into live
and terminal sets before passing them into the node reconciler.
Eliminate the unreachable code and improve the variable names to make the known
state of the allocs more clear in the reconciler code.
Ref: https://github.com/hashicorp/nomad/pull/26216
* Move commands from docs to its own root-level directory
* temporarily use modified dev-portal branch with nomad ia changes
* explicitly clone nomad ia exp branch
* retrigger build, fixed dev-portal broken build
* architecture, concepts and get started individual pages
* fix get started section destinations
* reference section
* update repo comment in website-build.sh to show branch
* docs nav file update capitalization
* update capitalization to force deploy
* remove nomad-vs-kubernetes dir; move content to what is nomad pg
* job section
* Nomad operations category, deploy section
* operations category, govern section
* operations - manage
* operations/scale; concepts scheduling fix
* networking
* monitor
* secure section
* remove auth-methods folder and move up pages to sso; linkcheck
* Fix install2deploy redirects
* fix architecture redirects
* Job section: Add missing section index pages
* Add section index pages so breadcrumbs build correctly
* concepts/index fix front matter indentation
* move task driver plugin config to new deploy section
* Finish adding full URL to tutorials links in nav
* change SSO to Authentication in nav and file system
* Docs NomadIA: Move tutorials into NomadIA branch (#26132)
* Move governance and policy from tutorials to docs
* Move tutorials content to job-declare section
* run jobs section
* stateful workloads
* advanced job scheduling
* deploy section
* manage section
* monitor section
* secure/acl and secure/authorization
* fix example that contains an unseal key in real format
* remove images from sso-vault
* secure/traffic
* secure/workload-identities
* vault-acl change unseal key and root token in command output sample
* remove lines from sample output
* fix front matter
* move nomad pack tutorials to tools
* search/replace /nomad/tutorials links
* update acl overview with content from deleted architecture/acl
* fix spelling mistake
* linkcheck - fix broken links
* fix link to Nomad variables tutorial
* fix link to Prometheus tutorial
* move who uses Nomad to use cases page; move spec/config shortcuts
add dividers
* Move Consul out of Integrations; move namespaces to govern
* move integrations/vault to secure/vault; delete integrations
* move ref arch to docs; rename Deploy Nomad back to Install Nomad
* address feedback
* linkcheck fixes
* Fixed raw_exec redirect
* add info from /nomad/tutorials/manage-jobs/jobs
* update page content with newer tutorial
* link updates for architecture sub-folders
* Add redirects for removed section index pages. Fix links.
* fix broken links from linkcheck
* Revert to use dev-portal main branch instead of nomadIA branch
* build workaround: add intro-nav-data.json with single entry
* fix content-check error
* add intro directory to get around Vercel build error
* workaround for empty directory
* remove mdx from /intro/ to fix content-check and git snafu
* Add intro index.mdx so Vercel build should work
---------
Co-authored-by: Tu Nguyen <im2nguyen@gmail.com>
* fix: initialize the topology of the processors to avoid nil pointers
* func: initialize topology to avoid nil pointers
* fix: update the new public method for NodeProcessorResources
The RPC handler for deleting dynamic host volumes has a check that any
allocations associated with a volume are client-terminal before deleting the
volume. But the state store delete that happens after we send client RPCs to the
plugin checks that the allocs are non-terminal on both server and client.
This can improperly allow deleting a volume from a client but then not being
able to delete it from the state store because of a time-of-check / time-of-use
bug. If the allocation fails/completes on the client before the server marks its
desired status as terminal, or if the allocation is marked server-terminal
during the client RPC, we can get a volume that passes the first check but not
the second check that happens in the state store and cannot be deleted.
Update the state store delete method to require that any allocation for a volume
is client terminal in order to delete the volume, not just server terminal.
Fixes: https://github.com/hashicorp/nomad/issues/26140
Ref: https://hashicorp.atlassian.net/browse/NMD-883
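
The rule the state store now applies can be sketched as below. The `alloc`
type and `canDeleteVolume` function are simplified, hypothetical stand-ins
rather than the real state store code; the point is only that the decision
depends on the client status, so the RPC handler and the state store agree.

```go
package main

import "fmt"

// alloc is a hypothetical, simplified view of an allocation's two statuses.
type alloc struct {
	serverTerminal bool // the server's desired status is stop/evict
	clientTerminal bool // the client reports complete/failed/lost
}

// canDeleteVolume allows deletion only when every allocation that uses the
// volume is client-terminal, regardless of the server's desired status.
func canDeleteVolume(allocs []alloc) bool {
	for _, a := range allocs {
		if !a.clientTerminal {
			return false
		}
	}
	return true
}

func main() {
	// The alloc finished on the client before the server marked it terminal:
	// the volume is deletable.
	fmt.Println(canDeleteVolume([]alloc{{serverTerminal: false, clientTerminal: true}}))
	// The server asked for a stop but the client is still running the alloc:
	// the volume is not deletable yet.
	fmt.Println(canDeleteVolume([]alloc{{serverTerminal: true, clientTerminal: false}}))
}
```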
The Nomad client will persist its own identity within its state
store so that it survives restarts. The added benefit of using it over
the filesystem is that it supports transactions. This is useful
when considering the identity will be renewed periodically.
Nomad agents emit metrics for Consul service and check operations, but these
were not documented. Update the metrics reference table to include these
metrics. Note that the metrics are prefixed `nomad.client` but are present on
all agents, because the server registers itself in Consul as well.
The output of the reconciler stage of scheduling is only visible via debug-level
logs, typically accessible only to the cluster admin. We can give job authors
better ability to understand what's happening to their jobs if we expose this
information to them in the `eval status` command.
Add the reconciler's desired updates to the evaluation struct so it can be
exposed in the API. This increases the size of evals by roughly 15% in the state
store, or a bit more when there are preemptions (but we expect this will be a
small minority of evals).
Ref: https://hashicorp.atlassian.net/browse/NMD-818
Fixes: https://github.com/hashicorp/nomad/issues/15564
The mkdir plugin creates the directory and then chowns it. In the
event the chown command fails, we should attempt to remove the
directory. Without this, we leave directories on the client in
partial failure situations.
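
A minimal sketch of the cleanup described above, assuming a hypothetical
`createOwnedDir` helper. The chown step is injected as a function so the
failure path can be exercised without special privileges; real code would
pass `os.Chown`. The `demoFailedChown` helper exists only to show the
behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// createOwnedDir creates path and then changes its ownership. If the chown
// step fails, it removes the directory so nothing is left behind.
func createOwnedDir(path string, uid, gid int, chown func(string, int, int) error) error {
	if err := os.Mkdir(path, 0o700); err != nil {
		return fmt.Errorf("mkdir %s: %w", path, err)
	}
	if err := chown(path, uid, gid); err != nil {
		// Best-effort cleanup of the directory we just created.
		if rmErr := os.RemoveAll(path); rmErr != nil {
			return fmt.Errorf("chown %s: %w (cleanup failed: %v)", path, err, rmErr)
		}
		return fmt.Errorf("chown %s: %w", path, err)
	}
	return nil
}

// demoFailedChown forces the chown step to fail and reports whether the
// target directory still exists afterwards.
func demoFailedChown() (leftover bool, err error) {
	parent, mkErr := os.MkdirTemp("", "mkdir-sketch")
	if mkErr != nil {
		return false, mkErr
	}
	defer os.RemoveAll(parent)

	target := filepath.Join(parent, "volume")
	failing := func(string, int, int) error { return errors.New("simulated chown failure") }
	err = createOwnedDir(target, 1000, 1000, failing)
	_, statErr := os.Stat(target)
	return statErr == nil, err
}

func main() {
	leftover, err := demoFailedChown()
	fmt.Println("directory left behind:", leftover)
	fmt.Println("error:", err)
}
```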
The meta client looks for both an environment variable and a CLI
flag when generating a client. The CLI UUID checker needs to do
this also, so we account for users using both env vars and CLI
flag tokens.
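
The lookup described above might look roughly like the sketch below. The
function names are hypothetical, and the precedence shown (the flag value wins
over the `NOMAD_TOKEN` environment variable) is an assumption made for the
example rather than something this change states.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// uuidRe matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
var uuidRe = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// resolveToken returns the token the CLI would use: the flag value when set,
// otherwise the environment value (assumed precedence).
func resolveToken(flagToken, envToken string) string {
	if flagToken != "" {
		return flagToken
	}
	return envToken
}

// tokenLooksLikeUUID runs the UUID check against whichever source supplied
// the token, so env var users and flag users are treated the same way.
func tokenLooksLikeUUID(flagToken, envToken string) bool {
	return uuidRe.MatchString(resolveToken(flagToken, envToken))
}

func main() {
	// No flag given here, so the check falls back to the environment.
	fmt.Println(tokenLooksLikeUUID("", os.Getenv("NOMAD_TOKEN")))
}
```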
When draining nodes, allocs are checked for a healthy state and
marked to be drained, with the value in the max parallel setting
determining how many allocs will be migrated. Depending on the
circumstances, however, the max parallel setting may not be
properly respected.
Given a job with max parallel set to one, a group count greater
than one, and allocs on multiple nodes: Draining a single node
will result in one alloc being marked to drain. If another
node is immediately drained the alloc running on the first
node will be seen as "healthy" and another alloc will be
marked to be drained resulting in two allocs being marked
for migration at the same time. This can lead to issues with
service availability.
To prevent this, allocs can only be marked as healthy when the
alloc has not been marked for migration. This prevents migrating
allocs from being seen as healthy, which results in the max parallel
setting being properly respected.
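
The scenario above can be walked through with a small sketch. The `alloc`
type and the `numToDrain` arithmetic (healthy allocs minus `count -
max_parallel`) are a simplified assumption about how the drainer picks
allocs, used here only to show why excluding migrating allocs from the
healthy set keeps the limit intact.

```go
package main

import "fmt"

// alloc is a hypothetical, simplified allocation for one task group.
type alloc struct {
	id                 string
	deploymentHealthy  bool
	markedForMigration bool
}

// isHealthy treats an alloc that is already marked for migration as not
// healthy, which is the behavior this change introduces.
func isHealthy(a alloc) bool {
	return a.deploymentHealthy && !a.markedForMigration
}

// numToDrain returns how many more allocs may be marked for migration while
// keeping at least count-maxParallel healthy allocs running.
func numToDrain(allocs []alloc, count, maxParallel int) int {
	healthy := 0
	for _, a := range allocs {
		if isHealthy(a) {
			healthy++
		}
	}
	n := healthy - (count - maxParallel)
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	allocs := []alloc{
		{id: "a", deploymentHealthy: true},
		{id: "b", deploymentHealthy: true},
		{id: "c", deploymentHealthy: true},
	}
	// First node drained: one alloc may be marked for migration.
	fmt.Println(numToDrain(allocs, 3, 1))
	allocs[0].markedForMigration = true
	// Second node drained right away: the migrating alloc no longer counts
	// as healthy, so no further alloc is marked.
	fmt.Println(numToDrain(allocs, 3, 1))
}
```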
When performing a graceful shutdown, the client drain configuration
is checked for a deadline, which is added to the timeout. When
running as a server, the client will not be set, so attempting to get
the drain deadline will result in a panic. This checks for the
client being available prior to fetching the deadline value.
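
The guard described above can be sketched as follows. The `agent`, `client`,
and `drainConfig` types are hypothetical simplifications of the agent's real
structures.

```go
package main

import (
	"fmt"
	"time"
)

// drainConfig and client are hypothetical, pared-down types.
type drainConfig struct {
	deadline time.Duration
}

type client struct {
	drain *drainConfig
}

// agent holds a client only when the agent runs in client mode; on a
// server-only agent the field is nil.
type agent struct {
	client *client
}

// shutdownTimeout adds the drain deadline to the base timeout, but only when
// a client (and its drain configuration) is actually present.
func (a *agent) shutdownTimeout(base time.Duration) time.Duration {
	if a.client == nil || a.client.drain == nil {
		return base
	}
	return base + a.client.drain.deadline
}

func main() {
	serverOnly := &agent{}
	fmt.Println(serverOnly.shutdownTimeout(5 * time.Second))

	withClient := &agent{client: &client{drain: &drainConfig{deadline: time.Minute}}}
	fmt.Println(withClient.shutdownTimeout(5 * time.Second))
}
```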
The `killTasks` function will kill all the alloc runner's
task runners. If the task of a task runner has already
completed, the killing of the task runner can cause
confusion due to the task event showing that the task
was signaled even though it is already complete.
To prevent this, a check is done when creating the
task event to determine if the task has completed. If
it has, no task event is created, and when the task
runner is killed, no extra task event is added.
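
A simplified sketch of that check is shown below. The `taskRunner` type, its
`kill` method, and the event names are hypothetical stand-ins; only the
shape of the logic follows the description above.

```go
package main

import "fmt"

// taskRunner is a hypothetical, minimal task runner that records task events
// as plain strings.
type taskRunner struct {
	name      string
	completed bool
	events    []string
}

// kill stops the task runner. An empty event means nothing is recorded.
func (tr *taskRunner) kill(event string) {
	if event != "" {
		tr.events = append(tr.events, event)
	}
	tr.completed = true
}

// killTasks kills every task runner, but only creates a signaling event for
// tasks that have not already completed.
func killTasks(runners []*taskRunner) {
	for _, tr := range runners {
		event := ""
		if !tr.completed {
			event = "Signaling"
		}
		tr.kill(event)
	}
}

func main() {
	done := &taskRunner{name: "init", completed: true, events: []string{"Terminated"}}
	live := &taskRunner{name: "web", events: []string{"Started"}}
	killTasks([]*taskRunner{done, live})
	fmt.Println(done.name, done.events)
	fmt.Println(live.name, live.events)
}
```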
When a Nomad client registers or re-registers, the RPC handler will
generate and return a node identity if required. When an identity
is generated, the signing key ID will be stored within the node
object, to ensure a root key is not deleted while it is still in use.
During normal operation, the client will periodically heartbeat to
the Nomad servers to indicate liveness. The RPC handler that
is used for this action has also been updated to conditionally
perform identity generation. Performing it here means no extra RPC
handlers are required and we inherit the jitter in identity
generation from the heartbeat mechanism.
The identity generation check methods are performed from the RPC
request arguments, so they are scoped to the required behaviour and
can handle the nuance of each RPC. Failure to generate an identity
is considered terminal to the RPC call. The client will include
behaviour to retry this error, which is always caused by the
encrypter not being ready unless the servers' keyring has been
corrupted.
Both the cluster reconciler and node reconciler emit a debug-level log line with
their results, but these are unstructured multi-line logs that are annoying for
operators to parse. Change these to emit structured key-value pairs like we do
everywhere else.
Ref: https://hashicorp.atlassian.net/browse/NMD-818
Ref: https://go.hashi.co/rfc/nmd-212
When debugging an evaluation, you almost always want to know about all the
related evaluations and what allocations were placed by that evaluation (and
where), not just failed placements. We can enrich the command by adding the
`related` query parameter to the API, and having the command query for the
evaluation's allocations automatically. Emit this data as a pair of new tables
and expose fields like quota limits, and previous/next/blocked eval without the
`-verbose` flag.
Update the docs to include the full output and remove references to long-removed
behavior of the `-json` flag.
Ref: https://hashicorp.atlassian.net/browse/NMD-818
Ref: https://go.hashi.co/rfc/nmd-212