It includes the work over the state store, the PRC server, the HTTP server, the go API package and the CLI's command. To read more on the actuall functionality, refer to the RFCs [NMD-178] Locking with Nomad Variables and [NMD-179] Leader election using locking mechanism for the Autoscaler.
This feature will help operator to remove a failed/left node from Serf layer immediately
without waiting for 24 hours for the node to be reaped
* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
The alloc exec and filesystem/logs commands allow passing the `-job` flag to
select a random allocation. If the namespace for the command is set to `*`, the
RPC handler doesn't handle this correctly as it's expecting to query for a
specific job. Most commands handle this ambiguity by first verifying that only a
single object of the type in question exists (ex. a single node or job).
Update these commands so that when the `-job` flag is set we first verify
there's a single job that matches. This also allows us to extend the
functionality to allow for the `-job` flag to support prefix matching.
Fixes: #12097
If you use `nomad node drain -force`, the drain deadline is set to -1ns. If you
have not prevented system and CSI node plugin allocations from being drained
with `-ignore-system`, they will be immediately drained as well. This is
typically not safe for CSI node plugins.
Also fix some broken links.
Fixes: #17696
* jobspec: rename node pool scheduler_configuration
In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.
* np: add memory oversubscription config
* np: make scheduler config ENT
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.
Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.
If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets to `default` at the admission
controller mutator.
In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
This changeset includes some fixes to documentation discovered while working on
node pools, but we didn't want to include in the node pool PRs so they can get
backported easily:
* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires there are no non-terminal jobs in any of the
federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non default domain and region names. This changset updates the
code to support non default domains and regions in the certificates.
The `volume register` command can update a small subset of the volume's fields
in-place, with some restrictions depending on whether the volume is currently in
use. Document these in the `volume register` command docs and the volume
specification docs.
Fixes: #17247
This PR modifies references to the envoyproxy/envoy docker image to
explicitly include the docker.io prefix. This does not affect existing
users, but makes things easier for Podman users, who otherwise need to
specify the full name because Podman does not default to docker.io
The `-deadline` and `-force` flag for the `nomad node drain` command only cause
the draining to ignore the `migrate` block's healthy deadline, max parallel,
etc. These flags don't have anything to do with the `kill_timeout` or
`shutdown_delay` options of the jobspec.
This changeset fixes the skipped E2E tests so that they validate the intended
behavior, and updates the docs for more clarity.
This update changes the behaviour when following logs from an
allocation, so that both stdout and stderr files streamed when the
operator supplies the follow flag. The previous behaviour is held
when all other flags and situations are provided.
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
When cluster administrators restore from Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
Implement the new `nomad job restart` command that allows operators to
restart allocations tasks or reschedule then entire allocation.
Restarts can be batched to target multiple allocations in parallel.
Between each batch the command can stop and hold for a predefined time
or until the user confirms that the process should proceed.
This implements the "Stateless Restarts" alternative from the original
RFC
(https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180).
The original concept is still worth implementing, as it allows this
functionality to be exposed over an API that can be consumed by the
Nomad UI and other clients. But the implementation turned out to be more
complex than we initially expected so we thought it would be better to
release a stateless CLI-based implementation first to gather feedback
and validate the restart behaviour.
Co-authored-by: Shishir Mahajan <smahajan@roblox.com>
* Added and flag to command
* cli[style]: small refactor to avoid confussion with tmpl variable
* Update inspect.mdx
* cli: add changelog entry
* Update .changelog/16478.txt
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update command/quota_inspect.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: Add and flags to server members
* Update website/content/docs/commands/server/members.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update website/content/docs/commands/server/members.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: update the server memebers tests to use must
* cli: add flags addition to changelog
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
nomad login command does not need to know ACL Auth Method's type, since all
method names are unique.
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: Add and flag to namespace status command
* Update command/namespace_status.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: update tests for namespace status command to use must
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
The job evaluate endpoint creates a new evaluation for the job which is
a write operation. This change modifies the necessary capability from
`read-job` to `submit-job` to better reflect this.
* cli: add -json flag to alloc checks for completion
* CLI: Expand test to include testing the json flag for allocation checks
* Documentation: Add the checks command
* Documentation: Add example for alloc check command
* Update website/content/docs/commands/alloc/checks.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* CLI: Add template flag to alloc checks command
* Update website/content/docs/commands/alloc/checks.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* CLI: Extend test to include -t flag for alloc checks
* func: add changelog for added flags to alloc checks
* cli[doc]: Make usage section on alloc checks clearer
* Update website/content/docs/commands/alloc/checks.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Delete modd.conf
* cli[doc]: add -t flag to command description for alloc checks
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: Juanita De La Cuesta Morales <juanita.delacuestamorales@juanita.delacuestamorales-LHQ7X0QG9X>
Most job subcommands allow for job ID prefix match as a convenience
functionality so users don't have to type the full job ID.
But this introduces a hard ACL requirement that the token used to run
these commands have the `list-jobs` permission, even if the token has
enough permission to execute the basic command action and the user
passed an exact job ID.
This change softens this requirement by not failing the prefix match in
case the request results in a permission denied error and instead using
the information passed by the user directly.
* build: add BuildDate to version info
will be used in enterprise to compare to license expiration time
* cli: multi-line version output, add BuildDate
before:
$ nomad version
Nomad v1.4.3 (coolfakecommithashomgoshsuchacoolonewoww)
after:
$ nomad version
Nomad v1.5.0-dev
BuildDate 2023-02-17T19:29:26Z
Revision coolfakecommithashomgoshsuchacoolonewoww
compare consul:
$ consul version
Consul v1.14.4
Revision dae670fe
Build Date 2023-01-26T15:47:10Z
Protocol 2 spoken by default, blah blah blah...
and vault:
$ vault version
Vault v1.12.3 (209b3dd99fe8ca320340d08c70cff5f620261f9b), built 2023-02-02T09:07:27Z
* docs: update version command output
The `nomad fmt -check` command incorrectly writes to file because we didn't
return before writing the file on a diff. Fix this bug and update the command
internals to differentiate between the write-to-file and write-to-stdout code
paths, which are activated by different combinations of options and flags.
The docstring for the `-list` and `-write` flags is also unclear and can be
easily misread to be the opposite of the actual behavior. Clarify this and fix
up the docs to match.
This changeset also refactors the tests quite a bit so as to make the test
outputs clear when something is incorrect.
In #13374 we updated the commented-out `license_path` in the packaged example
configuration file to match the existing documentation. Although this config
value was commented-out, it was reported that changing the value was
confusing. Update the commented-out line to the previous value and update the
documented examples to match that. This matches most of the examples for
Consul/Vault licensing as well. I've double-checked the tutorials and it looks
like it'd been left on the previous value there, so no additional work to be
done.