3712 Commits

Author SHA1 Message Date
Mike Nomitch
31f4296826 Adds support for failures before warning to Consul service checks (#19336)
Adds support for failures before warning and failures before critical
to the automatically created Nomad client and server services in Consul
2023-12-14 11:33:31 -08:00
hc-github-team-nomad-core
b777013ff9 Generate files for 1.7.2 release 2023-12-14 11:23:55 +01:00
James Rasell
71ea1deda7 cli: Fix bug in var put command using mix of flags and spec. (#19423) 2023-12-12 08:31:22 +00:00
hc-github-team-nomad-core
180fd54918 Generate files for 1.7.1 release 2023-12-08 14:39:09 -05:00
Luiz Aoqui
099ee06a60 Revert "deps: update go-metrics to v0.5.3 (#19190)" (#19374)
* Revert "deps: update go-metrics to v0.5.3 (#19190)"

This reverts commit ddb060d8b3.

* changelog: add entry for #19374
2023-12-08 08:46:55 -05:00
Luiz Aoqui
c624dc2121 config: fix loading Vault token from env var (#19349)
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values the pointer reference
must be updated before mutating the config with environment variable
values.
2023-12-07 11:56:53 -05:00
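Note on c624dc2121: the stale-pointer problem described in this message can be sketched in Go as follows. This is a minimal illustration, not Nomad's code: `VaultConfig`, `Config`, `merge`, and `loadConfig` are hypothetical names, and the merge is assumed to copy the Vault blocks so that pointers into its inputs no longer refer to the result.

```go
package main

import "fmt"

// VaultConfig is a hypothetical, pared-down Vault configuration block.
type VaultConfig struct {
	Name  string
	Token string
}

// Config is a hypothetical agent configuration holding named Vault blocks.
type Config struct {
	Vaults []*VaultConfig
}

// defaultVault returns the block named "default", or nil if there is none.
func (c *Config) defaultVault() *VaultConfig {
	for _, v := range c.Vaults {
		if v.Name == "default" {
			return v
		}
	}
	return nil
}

// merge returns a new Config. The blocks are copied, so the result shares
// no pointers with either input (an assumption made for this sketch).
func merge(base, flags *Config) *Config {
	out := &Config{}
	for _, v := range base.Vaults {
		cp := *v
		out.Vaults = append(out.Vaults, &cp)
	}
	for _, f := range flags.Vaults {
		for _, o := range out.Vaults {
			if o.Name == f.Name && f.Token != "" {
				o.Token = f.Token
			}
		}
	}
	return out
}

// loadConfig applies an environment token through the defaultVault pointer.
// With refreshPointer=false the pointer still targets the CLI flag config,
// so the merged config never sees the token.
func loadConfig(envToken string, refreshPointer bool) *Config {
	flags := &Config{Vaults: []*VaultConfig{{Name: "default"}}}
	defaultVault := flags.defaultVault()

	base := &Config{Vaults: []*VaultConfig{{Name: "default"}}}
	merged := merge(base, flags)
	if refreshPointer {
		defaultVault = merged.defaultVault()
	}
	if envToken != "" {
		defaultVault.Token = envToken
	}
	return merged
}

func main() {
	fmt.Printf("stale pointer: %q\n", loadConfig("s.example", false).defaultVault().Token)
	fmt.Printf("fresh pointer: %q\n", loadConfig("s.example", true).defaultVault().Token)
}
```

With the stale pointer the merged configuration keeps an empty token; re-pointing before the mutation is what makes the environment value land in the configuration that is actually used.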
Luiz Aoqui
27d2ad1baf cli: add -dev-consul and -dev-vault agent mode (#19327)
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
2023-12-07 11:51:20 -05:00
hc-github-team-nomad-core
e799b06f02 Generate files for 1.7.0 release 2023-12-07 16:43:02 +01:00
Tim Gross
3c4e2009f5 connect: deployments should wait for Connect sidecar checks (#19334)
When a Connect service is registered with Consul, Nomad includes the nested
`Connect.SidecarService` field that includes health checks for the Envoy
proxy. Because these are not part of the job spec, the alloc health tracker
created by `health_hook` doesn't know to read the value of these checks.

In many circumstances this won't be noticed, but if the Envoy health check
happens to take longer than the `update.min_healthy_time` (perhaps because it's
been set low), it's possible for a deployment to progress too early such that
there will briefly be no healthy instances of the service available in Consul.

Update the Consul service client to find the nested sidecar service in the
service catalog and attach it to the results provided to the tracker. The
tracker can then check the sidecar health checks.

Fixes: https://github.com/hashicorp/nomad/issues/19269
2023-12-06 16:59:51 -05:00
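Note on 3c4e2009f5: the idea of attaching the nested sidecar service's checks to what the health tracker sees might look roughly like this. The types and function names (`Service`, `Check`, `trackedChecks`, `healthy`) are hypothetical simplifications, not the Consul or Nomad API.

```go
package main

import "fmt"

// Check is a hypothetical, simplified health check result.
type Check struct {
	Name   string
	Status string // "passing", "warning", or "critical"
}

// Service is a hypothetical, simplified service registration. Sidecar
// stands in for the nested Connect sidecar service.
type Service struct {
	Name    string
	Checks  []Check
	Sidecar *Service
}

// trackedChecks returns the service's own checks plus those of its
// sidecar, if one is registered.
func trackedChecks(s Service) []Check {
	checks := append([]Check{}, s.Checks...)
	if s.Sidecar != nil {
		checks = append(checks, s.Sidecar.Checks...)
	}
	return checks
}

// healthy reports whether every tracked check is passing.
func healthy(s Service) bool {
	for _, c := range trackedChecks(s) {
		if c.Status != "passing" {
			return false
		}
	}
	return true
}

func main() {
	svc := Service{
		Name:   "web",
		Checks: []Check{{Name: "http", Status: "passing"}},
		Sidecar: &Service{
			Name:   "web-sidecar-proxy",
			Checks: []Check{{Name: "envoy-ready", Status: "critical"}},
		},
	}
	fmt.Println(healthy(svc)) // false: the Envoy check is not yet passing
}
```

If only `s.Checks` were consulted, the service above would count as healthy while its proxy is still failing, which is the early-progress case the message describes.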
Juana De La Cuesta
cf539c405e Add a new parameter to avoid starting a replacement for lost allocs (#19101)
This commit introduces the parameter preventRescheduleOnLost which indicates that the task group can't afford to have multiple instances running at the same time. In the case of a node going down, its allocations will be registered as unknown but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run.

If max_client_disconnect is also enabled and there is a reschedule policy, an error will be returned.
Implements issue #10366

Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-12-06 12:28:42 +01:00
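Note on cf539c405e: the two rules in this message (no replacement for allocations on a lost node, and an error for the conflicting combination) can be sketched as below. `TaskGroup`, `validate`, and `onNodeDown` are hypothetical names, and the "lost" status with a replacement for the default case is an assumption about the behavior without the new parameter.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskGroup is a hypothetical, pared-down task group.
type TaskGroup struct {
	PreventRescheduleOnLost bool
	MaxClientDisconnect     bool // true when max_client_disconnect is set
	HasReschedulePolicy     bool
}

// validate rejects the combination the commit message calls an error.
func validate(tg TaskGroup) error {
	if tg.PreventRescheduleOnLost && tg.MaxClientDisconnect && tg.HasReschedulePolicy {
		return errors.New("reschedule policy is not allowed with preventRescheduleOnLost and max_client_disconnect")
	}
	return nil
}

// onNodeDown returns the status given to an allocation on a node that went
// down, and whether a replacement should be placed.
func onNodeDown(tg TaskGroup) (status string, replace bool) {
	if tg.PreventRescheduleOnLost {
		return "unknown", false
	}
	return "lost", true // assumed default behavior
}

func main() {
	status, replace := onNodeDown(TaskGroup{PreventRescheduleOnLost: true})
	fmt.Println(status, replace) // unknown false

	err := validate(TaskGroup{
		PreventRescheduleOnLost: true,
		MaxClientDisconnect:     true,
		HasReschedulePolicy:     true,
	})
	fmt.Println(err != nil) // true
}
```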
Piotr Kazmierczak
0a783d0046 wi: change setup cmds -cleanup flag to -destroy (#19295) 2023-12-04 15:28:17 +01:00
Piotr Kazmierczak
9d209d6725 vault: claims for WI workloads should not contain nomad_group (#19296) 2023-12-04 15:25:22 +01:00
Luiz Aoqui
d12dc36c3b cli: add Consul namespace selector (#19251)
Update the `nomad setup consul` command to include a `Selector` for the
`NamespaceRule` so the logic is only applied when the token has a claim
for `consul_namespace`.

Jobs without an explicit `consul.namespace` value receive a JWT without
the `consul_namespace` claim because Nomad is unable to determine which
Consul namespace should be used.

By using `NamespaceRules`, cluster operators are able to set a default
value for these jobs.
2023-12-01 09:29:08 -05:00
Michael Schurter
4cb40433bb Post 1.7.0 rc.1 release (#19252)
* Prepare release 1.7.0-rc.1

* Generate files for 1.7.0-rc.1 release

* Prepare for next release
2023-12-01 08:53:48 -05:00
Phil Renaud
d104432cd3 Actions: API, command, and jobspec docs (#19166)
* API command and jobspec docs

* PR comments addressed

* API docs for job/jobid/action socket

* Removing a perhaps incorrect origin of job_id across the jobs api doc

* PR comments addressed
2023-11-30 14:13:37 -05:00
Piotr Kazmierczak
67bbcc4a4f cli: setup consul proper ns handling (#19237)
In order to correctly handle Consul namespaces, auth methods and binding rules
must always be created in the default namespace only.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-11-30 20:09:19 +01:00
James Rasell
81249ffe65 agent: log using error keyword not err in keyring endpoint (#19243) 2023-11-30 16:40:13 +00:00
Luiz Aoqui
d29ac461a7 cli: non-service jobs on job restart -reschedule (#19147)
The `-reschedule` flag stops allocations and assumes the Nomad scheduler
will create new allocations to replace them. But this is only true for
service and batch jobs.

Restarting non-service jobs with the `-reschedule` flag causes the
command to loop forever waiting for the allocations to be replaced,
which never happens.

Allocations for system jobs may be replaced by triggering an evaluation
after each stop to cause the reconciler to run again.

Sysbatch jobs should not be allowed to be rescheduled as they are never
replaced by the scheduler.
2023-11-29 13:01:19 -05:00
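Note on d29ac461a7: the per-job-type handling described in this message can be summarized as a small dispatch. This is a sketch under the assumption that the four job types are `service`, `batch`, `system`, and `sysbatch`; `rescheduleAction` and the action strings are hypothetical, not the command's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical labels for what the restart command does after stopping an
// allocation with -reschedule.
const (
	actionStop        = "stop and wait for the scheduler's replacement"
	actionStopAndEval = "stop, then trigger an evaluation"
)

// rescheduleAction maps a job type to the behavior described in the
// commit message.
func rescheduleAction(jobType string) (string, error) {
	switch jobType {
	case "service", "batch":
		return actionStop, nil
	case "system":
		return actionStopAndEval, nil
	case "sysbatch":
		return "", errors.New("sysbatch jobs cannot be rescheduled")
	default:
		return "", fmt.Errorf("unknown job type %q", jobType)
	}
}

func main() {
	for _, t := range []string{"service", "batch", "system", "sysbatch"} {
		action, err := rescheduleAction(t)
		if err != nil {
			fmt.Printf("%s: error: %v\n", t, err)
			continue
		}
		fmt.Printf("%s: %s\n", t, action)
	}
}
```

Rejecting `sysbatch` up front avoids the endless wait for a replacement that the scheduler will never create.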
James Rasell
0819aab237 cli: fix help formatting on job stop command. (#19214) 2023-11-29 15:52:37 +00:00
Luiz Aoqui
ddb060d8b3 deps: update go-metrics to v0.5.3 (#19190)
Update `go-metrics` to v0.5.3 to pick up
https://github.com/hashicorp/go-metrics/pull/146.
2023-11-28 12:37:57 -05:00
Jorge Marey
5f78940911 Allow setting a token name template on auth methods (#19135)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2023-11-28 12:26:21 +00:00
Piotr Kazmierczak
248b2ba5cd WI: use single auth method for Consul by default (#19169)
This simplifies the default setup of WI-based authentication of Nomad workloads
with Consul by using a single auth method with two binding rules.

Users can still specify separate auth methods for services and tasks.
2023-11-28 12:22:27 +01:00
Luiz Aoqui
5ff6cce3ab vault: update default JWT auth method path (#19188)
Update default auth method path to be `jwt-nomad` to avoid potential
conflicts when Vault's `jwt` default is already being used for something
else.
2023-11-27 17:48:12 -05:00
Piotr Kazmierczak
742651f2f7 agent: ignore websocket statuses 1000, 1001 and 1005 correctly (#19172)
These are "close" messages and not actual errors.
2023-11-27 09:33:08 +01:00
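Note on 742651f2f7: status codes 1000, 1001, and 1005 are defined in RFC 6455 as normal closure, going away, and no status received. A minimal sketch of treating them as ordinary closes rather than errors follows; the constant and function names are hypothetical and no websocket library is assumed.

```go
package main

import "fmt"

// Close status codes from RFC 6455, section 7.4.1.
const (
	closeNormalClosure    = 1000
	closeGoingAway        = 1001
	closeNoStatusReceived = 1005
)

// isBenignClose reports whether a websocket close status is an ordinary
// "close" message rather than an actual error.
func isBenignClose(code int) bool {
	switch code {
	case closeNormalClosure, closeGoingAway, closeNoStatusReceived:
		return true
	}
	return false
}

func main() {
	for _, code := range []int{1000, 1001, 1005, 1006} {
		fmt.Printf("%d benign=%v\n", code, isBenignClose(code))
	}
}
```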
Phil Renaud
fb14c2b556 [ui] Actions service and flyout (#19084)
* Initial pass at a global actions instance queue

* Action card with a bunch of functionality that needs to be pared back a bit

* Happy little actions button

* runAction performs updated to use actions service

* Stop All and Clear Finished buttons

* Keyboard service now passes element, so we can pseudo-click the actions dropdown

* resizable sidebar code blocks

* Contextual actions within task and job levels

* runAction greatly consolidated

* Pluralize action text

* Peer grouping of flyout action instances

* ShortIDs instead of full alloc IDs

* Testfixes that previously depended on notifications

* Stop and stop all for peered action instances

* Job name in action instance card linkable

* Componentized actions global button

* scss consolidation

* Clear and Stop buttons become mutually exclusive in an action card

* Clean up action card title styles a bit

* todo-bashing

* stopAll and stopPeers separated and fixed up

* Socket handling functions moved to the Actions service

* Error handling on socket message

* Smarter import

* Documentation note: need alloc-exec and alloc-raw-exec for raw_exec jobs

* Tests for flyout and dropdown actions

* Docs link when in empty flyout/queue state and percy snapshot test for it
2023-11-26 23:46:44 -05:00
James Rasell
cfbb2e8923 cli: use spaces when outputting ACL auth method token TTL param. (#19159) 2023-11-24 10:39:27 +00:00
Luiz Aoqui
bdac8d9583 cli: prevent panic on CTRL+C during a question (#19154)
Fix a panic when a question receives an interrupt signal before the
signal handler is initialized.
2023-11-23 14:51:56 -05:00
Luiz Aoqui
d2849b8a76 cli: skip allocs with replacements on job restart (#19155)
The `nomad job restart` command should skip allocations that already
have replacements. Restarting an allocation with a replacement is a
no-op because the allocation status is terminal and the command's
replacement monitor returns immediately.

But by not skipping them, the effective batch size is computed
incorrectly.
2023-11-23 14:51:10 -05:00
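Note on d2849b8a76: the effect on the batch size can be illustrated with a small sketch. It assumes a percentage-based batch size rounded up, and uses a hypothetical `Alloc` type whose `NextAllocation` field holds the ID of a replacement; neither is taken from the command's source.

```go
package main

import "fmt"

// Alloc is a hypothetical, pared-down allocation. NextAllocation is the ID
// of its replacement, or empty if it has none.
type Alloc struct {
	ID             string
	NextAllocation string
}

// withoutReplacements drops allocations that already have a replacement.
func withoutReplacements(allocs []Alloc) []Alloc {
	var out []Alloc
	for _, a := range allocs {
		if a.NextAllocation == "" {
			out = append(out, a)
		}
	}
	return out
}

// batchSize returns percent of n, rounded up (an assumed rounding rule).
func batchSize(n, percent int) int {
	return (n*percent + 99) / 100
}

func main() {
	allocs := []Alloc{
		{ID: "a1"},
		{ID: "a2", NextAllocation: "b2"},
		{ID: "a3"},
		{ID: "a4", NextAllocation: "b4"},
	}
	pending := withoutReplacements(allocs)
	fmt.Println(batchSize(len(allocs), 50))  // 2: includes allocs that are already replaced
	fmt.Println(batchSize(len(pending), 50)) // 1: counts only allocs that will restart
}
```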
James Rasell
532402aa2d actions: use specific RPC request object and tighten naming. (#19149) 2023-11-23 07:42:37 +00:00
Phil Renaud
eb8553c16f Reframe nomad action as a wrapper around nomad job action (#19048)
* Reframe nomad action as a wrapper around nomad job action

* don't conditionally pass flags, just pass flags

* PR comments addressed
2023-11-22 09:23:48 -05:00
James Rasell
0f0b9a1a3c action: add job action name validation (#19145) 2023-11-22 08:02:49 +00:00
hc-github-team-nomad-core
ea3f6cc879 Generate files for 1.7.0-beta.2 release 2023-11-15 22:47:41 +00:00
Adriano Caloiaro
f66eb83fc0 Add go-netaddrs support to retry_join (#18745) 2023-11-15 10:07:18 -05:00
Luiz Aoqui
26746a4093 cli: add zero nodes message to node status (#19082)
Display a message to indicate that there are no nodes registered when
`node status` returns zero values.
2023-11-14 23:00:12 -05:00
Luiz Aoqui
85d923b759 cli: fix Consul env var URL reference (#19041) 2023-11-09 10:58:03 -05:00
Piotr Kazmierczak
128c71b579 cli: simplify conditionals in setup commands (#19011) 2023-11-08 19:41:15 -05:00
Tim Gross
7191c78928 refactor: rename allocrunner's Consul service reg handler (#19019)
The allocrunner has a service registration handler that proxies various API
calls to Consul. With multi-cluster support (for ENT), the service registration
handler is what selects the correct Consul client. The name of this field in the
allocrunner and taskrunner code base looks like it's referring to the actual
Consul API client. This was actually the case before Nomad native service
discovery was implemented, but now the name is misleading.
2023-11-08 15:39:32 -05:00
Luiz Aoqui
6761f1f98c cli: fix setup consul binding rule config (#19033)
When creating the binding rule, `BindName` must match the pattern used
for the role name, otherwise the task will not be able to login to
Consul.

Also update the equality check for the binding rule to ensure this
property is held even if the auth method already has existing binding
rules attached.
2023-11-08 15:13:16 -05:00
Michael Schurter
c4ae91f8be Fix WorkloadIdentity.TTL handling, jobspec2 testing, and hcl1 vs 2 parsing (#19024)
* make the little dots consistent
* don't trim delimiter as that over matches
* test jobspec2 package
* copy api/WorkloadIdentity.TTL -> structs
* test ttl parsing
* fix hcl1 v 2 parsing mismatch
* make jobspec(1) tests match jobspec2 tests
2023-11-08 09:01:16 -08:00
Tim Gross
9d075c44b2 config: remove old Vault/Consul config blocks from parser (#18997)
Remove the now-unused original configuration blocks for Consul and Vault from
the agent configuration parsing. When the agent needs to refer to a Consul or
Vault block it will always be for a specific cluster for the task/service (or
the default cluster for the agent's own use).

This is the third of three changesets for this work.

Fixes: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Ref: https://github.com/hashicorp/nomad/pull/18994
2023-11-08 09:30:08 -05:00
Tim Gross
50f0ce5412 config: remove old Vault/Consul config blocks from client (#18994)
Remove the now-unused original configuration blocks for Consul and Vault from
the client. When the client needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the client's own use).

This is the second of three changesets for this work. The remainder will implement the
same changes in the `command/agent` package.

As part of this work I discovered and fixed two bugs:

* The gRPC proxy socket that we create for Envoy is only ever created using the
  default Consul cluster's configuration. This will prevent Connect from being
  used with the non-default cluster.
* The Consul configuration we use for templates always comes from the default
  Consul cluster's configuration, but will use the correct Consul token for the
  non-default cluster. This will prevent templates from being used with the
  non-default cluster.

Ref: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Fixes: https://github.com/hashicorp/nomad/issues/18984
Fixes: https://github.com/hashicorp/nomad/issues/18983
2023-11-07 09:15:37 -05:00
Tim Gross
1998004483 move deprecation warning for Vault/Consul token to admission hook (#18995)
Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and
intended for removal in Nomad 1.9. We added a deprecation warning to the CLI
when the user passes in the appropriate flag or environment variable. But the
warning also appears for users whose job does not use Vault or Consul but who
happen to have the appropriate environment variable in their environment. While
this is generally a bad practice (because
the token is leaked to Nomad), it's also the existing practice for some users.

Move the warning to the job admission hook. This will allow us to warn only when
appropriate, and that will also help the migration process by producing warnings
only for the relevant jobs.
2023-11-07 08:37:06 -05:00
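Note on 1998004483: warning at admission time makes it possible to look at the job itself rather than at the user's environment. A rough sketch of that decision is below; `Job`, its fields, and `tokenWarnings` are hypothetical stand-ins, not the admission hook's real types.

```go
package main

import "fmt"

// Job is a hypothetical, pared-down job submission.
type Job struct {
	ConsulToken string
	VaultToken  string
	UsesConsul  bool // true if any task or service in the job uses Consul
	UsesVault   bool // true if any task in the job uses Vault
}

// tokenWarnings returns deprecation warnings only for tokens that are
// relevant to the submitted job.
func tokenWarnings(j Job) []string {
	var warnings []string
	if j.UsesConsul && j.ConsulToken != "" {
		warnings = append(warnings, "submitting a Consul token with a job is deprecated")
	}
	if j.UsesVault && j.VaultToken != "" {
		warnings = append(warnings, "submitting a Vault token with a job is deprecated")
	}
	return warnings
}

func main() {
	// A token picked up from the environment for a job that never uses Vault.
	stray := Job{VaultToken: "from-environment"}
	fmt.Println(len(tokenWarnings(stray))) // 0

	vaultJob := Job{VaultToken: "from-environment", UsesVault: true}
	fmt.Println(len(tokenWarnings(vaultJob))) // 1
}
```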
Piotr Kazmierczak
7c6863b479 cli: setup vault command (#18910)
An interactive setup helper for configuring Vault to accept Nomad WI-enabled
workloads.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-11-07 10:42:00 +01:00
Tim Gross
1ef99f0536 config: remove old Vault/Consul config blocks from server (#18991)
Remove the now-unused original configuration blocks for Consul and Vault from
the server. When the server needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the server's own use).

This is the first of three changesets for this work. The remainder will implement the
same changes in the `client` package and on the `command/agent` package.

As part of this work I discovered that the job submission hook for Vault only
checks the enabled flag on the default cluster, rather than the clusters that
are used by the job being submitted. This will return an error on job
registration saying that Vault is disabled. Fix that to check only the
cluster(s) used by the job.

Ref: https://github.com/hashicorp/nomad/issues/18947
Fixes: https://github.com/hashicorp/nomad/issues/18990
2023-11-06 10:26:20 -05:00
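Note on 1ef99f0536: the job submission fix amounts to checking the enabled flag of the clusters a job actually uses instead of the default cluster's flag. A minimal sketch, with a hypothetical `checkVaultEnabled` helper and a plain map standing in for the server's Vault configurations:

```go
package main

import "fmt"

// checkVaultEnabled returns an error if any Vault cluster used by the job
// is not enabled. Clusters the job does not use are ignored, including the
// default one.
func checkVaultEnabled(enabled map[string]bool, jobClusters []string) error {
	for _, name := range jobClusters {
		if !enabled[name] {
			return fmt.Errorf("vault cluster %q is not enabled", name)
		}
	}
	return nil
}

func main() {
	// Hypothetical setup: the default cluster is disabled, "infra" is enabled.
	enabled := map[string]bool{"default": false, "infra": true}

	fmt.Println(checkVaultEnabled(enabled, []string{"infra"}))   // <nil>
	fmt.Println(checkVaultEnabled(enabled, []string{"default"})) // vault cluster "default" is not enabled
}
```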
Tim Gross
b62c5c51d2 cli: extend coverage of operator client-state command (#18996)
The `operator client-state` command is mostly used for developer debugging of
the Nomad client state, but it hasn't been updated with several recent
additions. Add allocation identities, network status, and dynamic volumes to the
objects it outputs.

Also, fix a bug where reading the state for an allocation without task states
will crash the CLI. This can happen if the Nomad client stops after an alloc is
persisted to disk but before the task actually starts.
2023-11-03 15:43:05 -04:00
Michael Schurter
78f0c6b2a9 cli: update acl bootstrap help to match docs (#18961)
See https://developer.hashicorp.com/nomad/docs/commands/acl/bootstrap
2023-11-02 08:52:21 -07:00
Piotr Kazmierczak
d69a1238cd cli: consul setup command (#18820)
An interactive setup helper for configuring Consul to accept Nomad WI-enabled workloads.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-11-02 09:02:07 +01:00
James Rasell
0822af35af cli: remove unused raft tool helper. (#18954) 2023-11-02 07:43:44 +00:00
Seth Hoenig
51b8737ca9 Release/1.7.0 beta.1 (#18962)
* Prepare release 1.7.0-beta.1

* cl: tweak actions cl entry

* Generate files for 1.7.0-beta.1 release

* Prepare for next release

---------

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2023-11-01 14:27:59 -05:00
Michael Schurter
e49ca3c431 identity: Implement change_mode (#18943)
* identity: support change_mode and change_signal

wip - just jobspec portion

* test struct

* cleanup some insignificant boogs

* actually implement change mode

* docs tweaks

* add changelog

* test identity.change_mode operations

* use more words in changelog

* job endpoint tests

* address comments from code review

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-11-01 09:41:11 -05:00