Commit Graph

224 Commits

Author SHA1 Message Date
Piotr Kazmierczak
4212bfd669 docs: update documentation of namespace delete command (#23536) 2024-07-10 18:31:35 +02:00
Tim Gross
cd3101d624 scale: add -check-index to job scale command (#23457)
The RPC handler for scaling a job passes flags to enforce the job modify index
is unchanged when it makes the write to Raft. But its only checking against the
existing job modify index at the time the RPC handler snapshots the state store,
so it can only enforce consistency for its own validation.

In clusters with automated scaling, it would be useful to expose the enforce
index options to the API, so that cluster admins can enforce that scaling only
happens when the job state is consistent with a state they've previously seen in
other API calls. Add this option to the CLI and API and have the RPC handler
check them if asked.

Fixes: https://github.com/hashicorp/nomad/issues/23444
2024-06-27 16:54:06 -04:00
James Rasell
26d0a9169c docs: fix typo in alloc exec CLI docs page. (#23392) 2024-06-20 07:50:32 +01:00
James Rasell
1c976d126e docs: update snapshot inspect CLI detail to mirror recent changes. (#23276) 2024-06-10 14:30:13 +01:00
Tim Gross
39dee90ad4 docs: clarify node drain behavior for batch workloads (#23170)
Our documentation for the `node drain` command doesn't include a treatment of
batch jobs, which are not migrated. The user is left to piece this behavior
together from the `migrate` documentation and the tutorial. Instead, let's
explicitly list the behaviors per job type.

Fixes: https://github.com/hashicorp/nomad/issues/17563
2024-06-05 08:47:37 -04:00
Michael Schurter
a2fe43030c rap 2024-05-29 15:50:33 -07:00
Michael Schurter
5a0c74d1f9 Apply suggestions from code review
Co-authored-by: David Yu <dyu@hashicorp.com>
2024-05-29 15:50:33 -07:00
Michael Schurter
fe0bda9c34 speling 2024-05-29 15:50:33 -07:00
Michael Schurter
690abefc4a docs: add docs for time based task execution 2024-05-29 15:50:33 -07:00
Tim Gross
0fb22eeab3 docs: fix broken markdown in alloc exec (#20576) 2024-05-13 15:34:37 -04:00
Tim Gross
f9dd120d29 cli: add -jwks-ca-file to Vault/Consul setup commands (#20518)
When setting up auth methods for Consul and Vault in production environments, we
can typically assume that the CA certificate for the JWKS endpoint will be in
the host certificate store (as part of the usual configuration management
cluster admins needs to do). But for quick demos with `-dev` agents, this won't
be the case.

Add a `-jwks-ca-file` parameter to the setup commands so that we can use this
tool to quickly setup WI with `-dev` agents running TLS.
2024-05-03 08:26:29 -04:00
Daniel Bennett
3ac3bc1cfe acl: token global mode can not be changed (#20464)
true up CLI and docs with API reality
2024-04-22 11:58:47 -05:00
Tim Gross
02d98b9357 operator debug: fix pprof interval handling (#20206)
The `nomad operator debug` command saves a CPU profile for each interval, and
names these files based on the interval.

The same functions takes a goroutine profile, heap profile, etc. but is missing
the logic to interpolate the file name with the interval. This results in the
operator debug command making potentially many expensive profile requests, and
then overwriting the data. Update the command to save every profile it scrapes,
and number them similarly to the existing CPU profile.

Additionally, the command flags for `-pprof-interval` and `-pprof-duration` were
validated backwards, which meant that we always coerced the `-pprof-interval` to
be the same as the `-pprof-duration`, which always resulted in a single profile
being taken at the start of the bundle. Correct the check as well as change the
defaults to be more sensible.

Fixes: https://github.com/hashicorp/nomad/issues/20151
2024-03-25 09:01:06 -04:00
Tim Gross
d3ddb0aa49 docs: make it clear that federation features require ACLs (#20196)
Our documentation has a hidden assumption that users know that federation
replication requires ACLs to be enabled and bootstrapped. Add notes at some of
the places users are likely to look for it.

A separate follow-up PR to the federation tutorial should point to the ACL
multi-region tutorial as well.

Fixes: https://github.com/hashicorp/nomad/issues/20128
2024-03-22 15:15:00 -04:00
Tim Gross
c4253470a0 autopilot: add operator autopilot health command (#20156)
Add a command line operation that reports Enterprise autopilot data from the
`/operator/autopilot/health` API. I've pulled this feature out of
@lindleywhite's PR in the Enterprise repo.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394

Co-authored-by: Lindley <lindley@hashicorp.com>
2024-03-18 14:46:18 -04:00
Giovanni Avelar
26a27bb12c cli: add -json option on jobs status command (#18925) 2024-03-08 16:03:52 -05:00
Phil Renaud
41c783aec2 Noting action name restrictions, and correcting those of auth methods and roles (#19905) 2024-02-08 12:01:22 -05:00
Luiz Aoqui
e1e80f383e vault: add new nomad setup vault -check commmand (#19720)
The new `nomad setup vault -check` commmand can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
2024-01-12 15:48:30 -05:00
Mike Nomitch
31f4296826 Adds support for failures before warning to Consul service checks (#19336)
Adds support for failures before warning and failures before critical
to the automatically created Nomad client and server services in Consul
2023-12-14 11:33:31 -08:00
Tim Gross
e551814df5 docs: add warnings about backing up keyring to snapshot commands (#19400)
The `operator snapshot` commands and agent don't back up Nomad's key
material. Add some warnings about this to places where users might be looking
for information on cluster recovery.

Fixes: https://github.com/hashicorp/nomad/issues/19389
2023-12-08 16:05:05 -05:00
Luiz Aoqui
27d2ad1baf cli: add -dev-consul and -dev-vault agent mode (#19327)
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
2023-12-07 11:51:20 -05:00
Piotr Kazmierczak
0a783d0046 wi: change setup cmds -cleanup flag to -destroy (#19295) 2023-12-04 15:28:17 +01:00
Piotr Kazmierczak
0ff190fa38 docs: setup helpers documentation (#19267) 2023-12-04 09:59:07 +01:00
Phil Renaud
d104432cd3 Actions: API, command, and jobspec docs (#19166)
* API command and jobspec docs

* PR comments addressed

* API docs for job/jobid/action socket

* Removing a perhaps incorrect origin of job_id across the jobs api doc

* PR comments addressed
2023-11-30 14:13:37 -05:00
Seth Hoenig
5f3aae7340 website: fix spellcheck path and cleanup some misspellings (#19238) 2023-11-30 09:38:19 -06:00
Luiz Aoqui
d29ac461a7 cli: non-service jobs on job restart -reschedule (#19147)
The `-reschedule` flag stops allocations and assumes the Nomad scheduler
will create new allocations to replace them. But this is only true for
service and batch jobs.

Restarting non-service jobs with the `-reschedule` flag causes the
command to loop forever waiting for the allocations to be replaced,
which never happens.

Allocations for system jobs may be replaced by triggering an evaluation
after each stop to cause the reconciler to run again.

Sysbatch jobs should not be allowed to be rescheduled as they are never
replaced by the scheduler.
2023-11-29 13:01:19 -05:00
Jorge Marey
5f78940911 Allow setting a token name template on auth methods (#19135)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2023-11-28 12:26:21 +00:00
James Rasell
cfbb2e8923 cli: use spaces when outputting ACL auth method token TTL param. (#19159) 2023-11-24 10:39:27 +00:00
James Rasell
ca9e08e6b5 monitor: add log include location option on monitor CLI and API (#18795) 2023-10-20 07:55:22 +01:00
James Rasell
1ffdd576bb agent: add config option to enable file and line log detail. (#18768) 2023-10-16 15:59:16 +01:00
Luiz Aoqui
ef6814388c cli: remove default for ACL token type on update (#18689)
With a default value set to `client`, the `nomad acl token update`
command can silently downgrade a management token to client on update if
the command does not specify `-type=management` on every update.
2023-10-10 15:51:13 -04:00
Charlie Voiselle
8a93ff3d2d [server] Directed leadership transfer CLI and API (#17383)
* Add directed leadership transfer func
* Add leadership transfer RPC endpoint
* Add ACL tests for leadership-transfer endpoint
* Add HTTP API route and implementation
* Add to Go API client
* Implement CLI command
* Add documentation
* Add changelog

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-10-04 12:20:27 -04:00
Daniel Bennett
fab968a748 csi: document volume expansion (#18573)
and show Capacity in `volume status` command.
2023-09-26 14:49:15 -05:00
Juana De La Cuesta
72acaf6623 [17449] Introduces a locking mechanism over variables (#18207)
It includes the work over the state store, the PRC server, the HTTP server, the go API package and the CLI's  command. To read more on the actuall functionality, refer to the RFCs [NMD-178] Locking with Nomad Variables and [NMD-179] Leader election using locking mechanism for the Autoscaler.
2023-09-21 17:56:33 +02:00
Gerard Nguyen
1339599185 cli: Add prune flag for nomad server force-leave command (#18463)
This feature will help operator to remove a failed/left node from Serf layer immediately
without waiting for 24 hours for the node to be reaped

* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
2023-09-15 08:45:11 -04:00
Luiz Aoqui
e21ab7d948 docs: fix job dispatch documentation (#18225) 2023-08-16 17:22:55 -04:00
Tim Gross
4fb5bf9a16 cli: support wildcard namespace in alloc subcommands (#18095)
The alloc exec and filesystem/logs commands allow passing the `-job` flag to
select a random allocation. If the namespace for the command is set to `*`, the
RPC handler doesn't handle this correctly as it's expecting to query for a
specific job. Most commands handle this ambiguity by first verifying that only a
single object of the type in question exists (ex. a single node or job).

Update these commands so that when the `-job` flag is set we first verify
there's a single job that matches. This also allows us to extend the
functionality to allow for the `-job` flag to support prefix matching.

Fixes: #12097
2023-07-31 13:15:15 -04:00
Luiz Aoqui
ee31916c3b cli: add help message for -consul-namespace (#18081)
Add missing help entry for the `-consul-namespace` flag in `nomad job
run`.
2023-07-28 10:22:59 -04:00
Nando
ca26673781 volume-status : show namespace the volume belongs to (#17911)
* volume-status : show namespace the volume belongs to
2023-07-19 16:36:51 -04:00
Lance Haig
1541358ef3 Add the ability to customise the details of the CA (#17309)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-07-11 08:53:09 +01:00
Tim Gross
fc611fc5f4 docs: clarify drain's -force flag behavior with system/CSI jobs (#17703)
If you use `nomad node drain -force`, the drain deadline is set to -1ns. If you
have not prevented system and CSI node plugin allocations from being drained
with `-ignore-system`, they will be immediately drained as well. This is
typically not safe for CSI node plugins.

Also fix some broken links.

Fixes: #17696
2023-06-23 16:38:11 -04:00
Luiz Aoqui
6c64847e1b np: scheduler configuration updates (#17575)
* jobspec: rename node pool scheduler_configuration

In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.

* np: add memory oversubscription config

* np: make scheduler config ENT
2023-06-19 11:41:46 -04:00
Luiz Aoqui
4f7c38b2a7 node pools: namespace integration (#17562)
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.

Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.

If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets to `default` at the admission
controller mutator.

In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
2023-06-16 16:30:22 -04:00
Tim Gross
eee2315d5d docs: clarify node pool apply/delete behavior (#17529) 2023-06-14 15:58:53 -04:00
Tim Gross
0ac85db680 cli: fix missing -quiet flag for var init (#17526)
The `var init` command was intended to have support for a `-quiet` flag but it
was not documented and never parsed.
2023-06-14 14:52:46 -04:00
Tim Gross
6bd1ebed29 docs: note namespace apply/delete behaviors, fix metric (#17527)
This changeset includes some fixes to documentation discovered while working on
node pools, but we didn't want to include in the node pool PRs so they can get
backported easily:

* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires there are no non-terminal jobs in any of the
  federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
2023-06-14 14:52:06 -04:00
Tim Gross
0aeeaf1083 node pools: implement node pool init command (#17479)
Implement a `nomad node pool init` command that generates an example spec file
in either HCL or JSON format.
2023-06-13 14:51:29 -04:00
Piotr Kazmierczak
be8f04e89f docs: corrections and additional information for OIDC-related concepts (#17470) 2023-06-09 16:50:22 +02:00
Luiz Aoqui
354d741c95 node pool: implement nomad node pool nodes CLI (#17444) 2023-06-07 10:37:27 -04:00
Tim Gross
84e7cf39f6 node pools: implement CLI for node pool jobs command (#17432) 2023-06-06 15:02:26 -04:00