* Jobs index without groups
* Download button only appears if you have content in your template
* No longer need to test for the group count in jobs index
When we added an RSA key for signing Workload Identities, we added it to the
keystore serialization but did not also add it to the `GetKey` RPC. This means
that when a key is rotated, the RSA key will not come along. The Nomad leader
signs all Workload Identities, but external consumers of WI (like Consul or
Vault) will verify the WI against any of the servers. If the request to verify
hits a follower, the follower will not have the RSA private key and cannot use
the existing ed25519 key to verify WIs with the `RS256` algorithm.
Add the RSA key material to the `GetKey` RPC.
Also remove an extraneous write to disk that happens for each key each time we
restart the Nomad server.
Fixes: #19340
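A minimal sketch of the failure mode, not the actual Nomad code: the `rootKey` type, `getKeyResponse`, and the `includeRSA` switch are hypothetical names used only to model a replicated key that does or does not carry RSA material. `RS256` is taken here to be RSASSA-PKCS1-v1_5 with SHA-256.

```go
package main

import (
	"crypto"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
)

// rootKey is a hypothetical stand-in for the key material in a server's keystore.
type rootKey struct {
	KeyID      string
	Ed25519Key ed25519.PrivateKey
	RSAKey     *rsa.PrivateKey // nil when the key was replicated without RSA material
}

var errNoRSAKey = errors.New("no RSA key material for RS256 verification")

// newRootKey generates both key types, as the leader would on rotation.
func newRootKey(id string) (*rootKey, error) {
	_, edKey, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	rsaKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, err
	}
	return &rootKey{KeyID: id, Ed25519Key: edKey, RSAKey: rsaKey}, nil
}

// getKeyResponse models what a follower receives when it replicates a key.
// includeRSA=false models the bug; includeRSA=true models the fix.
func getKeyResponse(k *rootKey, includeRSA bool) *rootKey {
	out := &rootKey{KeyID: k.KeyID, Ed25519Key: k.Ed25519Key}
	if includeRSA {
		out.RSAKey = k.RSAKey
	}
	return out
}

// signRS256 signs the payload the way the leader would for an RS256 identity.
func signRS256(k *rootKey, payload []byte) ([]byte, error) {
	digest := sha256.Sum256(payload)
	return rsa.SignPKCS1v15(rand.Reader, k.RSAKey, crypto.SHA256, digest[:])
}

// verifyRS256 fails with errNoRSAKey when the RSA key did not come along.
func verifyRS256(k *rootKey, payload, sig []byte) error {
	if k.RSAKey == nil {
		return errNoRSAKey
	}
	digest := sha256.Sum256(payload)
	return rsa.VerifyPKCS1v15(&k.RSAKey.PublicKey, crypto.SHA256, digest[:], sig)
}
```

In this model a follower built with `getKeyResponse(leaderKey, false)` cannot verify a signature produced by `signRS256`, while one built with `includeRSA` set to true can.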
If an allocrunner is persisted to the client state but the client stops before
the task runner can start, we end up with an allocation in the database with
allocrunner state but no taskrunner state. This ends up mimicking an old
pre-0.9.5 state where this state was not recorded and that hits a backwards
compatibility shim. This leaves allocations in the client state that can never
be restored, but won't ever be removed either.
Update the backwards compatibility shim so that we fail the restore for the
allocrunner and remove the allocation from the client state. Taskrunners persist
state during graceful shutdown, so it shouldn't be possible to leak tasks that
have actually started. This lets us "start over" with the allocation, if the
server still wants to place it on the client.
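The updated shim behavior could be sketched roughly as follows. This is an illustration under stated assumptions, not the client's real restore path: `allocState`, `restoreAlloc`, and `restoreAll` are hypothetical names, and the client state database is modeled as a plain map.

```go
package main

import (
	"errors"
	"sort"
)

// allocState is a hypothetical view of what the client state DB holds
// for a single allocation.
type allocState struct {
	HasAllocRunnerState bool
	TaskStates          map[string]string // task name -> persisted taskrunner state
}

var errNoTaskState = errors.New("allocrunner state present without taskrunner state")

// restoreAlloc fails the restore when allocrunner state exists but no
// taskrunner state was ever persisted, instead of treating it as legacy data.
func restoreAlloc(s allocState) error {
	if s.HasAllocRunnerState && len(s.TaskStates) == 0 {
		return errNoTaskState
	}
	return nil
}

// restoreAll restores what it can and removes allocations that fail to
// restore, so they do not linger in the client state forever.
// It returns the sorted IDs of the allocations that were restored.
func restoreAll(db map[string]allocState) []string {
	restored := []string{}
	for id, s := range db {
		if err := restoreAlloc(s); err != nil {
			delete(db, id) // deleting during range is safe in Go
			continue
		}
		restored = append(restored, id)
	}
	sort.Strings(restored)
	return restored
}
```

Once the unrestorable entry is gone from the local state, the server is free to place the allocation on the client again.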
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values the pointer reference
must be updated before mutating the config with environment variable
values.
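A small sketch of the stale-pointer problem, assuming the merge step produces new structs rather than mutating the flag config in place. The types and the `loadConfig`, `mergeConfigs`, and `findVault` helpers are hypothetical; only the `defaultVault` name comes from the change itself.

```go
package main

// vaultConfig and agentConfig are simplified, hypothetical config types.
type vaultConfig struct {
	Name  string
	Addr  string
	Token string
}

type agentConfig struct {
	Vaults []*vaultConfig
}

// findVault returns the Vault block with the given name, or nil.
func findVault(c *agentConfig, name string) *vaultConfig {
	for _, v := range c.Vaults {
		if v.Name == name {
			return v
		}
	}
	return nil
}

// mergeConfigs returns a NEW config; it never mutates its inputs. This is
// why pointers into the inputs go stale after a merge.
func mergeConfigs(base, override *agentConfig) *agentConfig {
	merged := &agentConfig{}
	for _, v := range base.Vaults {
		cp := *v
		merged.Vaults = append(merged.Vaults, &cp)
	}
	for _, o := range override.Vaults {
		target := findVault(merged, o.Name)
		if target == nil {
			cp := *o
			merged.Vaults = append(merged.Vaults, &cp)
			continue
		}
		if o.Addr != "" {
			target.Addr = o.Addr
		}
		if o.Token != "" {
			target.Token = o.Token
		}
	}
	return merged
}

// loadConfig applies flags, merges, then applies an environment value.
// With repoint=false the environment value lands on the stale flag config.
func loadConfig(flagAddr, envToken string, repoint bool) *agentConfig {
	flags := &agentConfig{Vaults: []*vaultConfig{{Name: "default"}}}
	defaultVault := findVault(flags, "default")
	defaultVault.Addr = flagAddr // CLI flag values are written through the pointer

	base := &agentConfig{Vaults: []*vaultConfig{{Name: "default", Addr: "https://vault.example:8200"}}}
	merged := mergeConfigs(base, flags)

	if repoint {
		// The fix: point at the merged config before applying env values.
		defaultVault = findVault(merged, "default")
	}
	if envToken != "" {
		defaultVault.Token = envToken
	}
	return merged
}
```

With `repoint` set to false the environment value is written to a struct that is no longer part of the final config, so it is silently lost.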
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
When a Connect service is registered with Consul, Nomad includes the nested
`Connect.SidecarService` field that includes health checks for the Envoy
proxy. Because these are not part of the job spec, the alloc health tracker
created by `health_hook` doesn't know to read the value of these checks.
In many circumstances this won't be noticed, but if the Envoy health check
happens to take longer than the `update.min_healthy_time` (perhaps because it's
been set low), it's possible for a deployment to progress too early such that
there will briefly be no healthy instances of the service available in Consul.
Update the Consul service client to find the nested sidecar service in the
service catalog and attach it to the results provided to the tracker. The
tracker can then check the sidecar health checks.
Fixes: https://github.com/hashicorp/nomad/issues/19269
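The idea can be illustrated with a simplified model. The `service` and `check` types and the `checksToTrack` helper are hypothetical and do not mirror the Consul API structs; the point is only that the sidecar's checks must be part of what the tracker evaluates.

```go
package main

// check and service are simplified, hypothetical stand-ins for the
// registration data returned by the Consul service client.
type check struct {
	Name   string
	Status string // "passing", "warning", or "critical"
}

type service struct {
	ID      string
	Checks  []check
	Sidecar *service // nested sidecar service registered for Connect, if any
}

// checksToTrack returns the service's own checks plus the checks of its
// nested sidecar service, so the health tracker sees both.
func checksToTrack(svc *service) []check {
	checks := append([]check{}, svc.Checks...)
	if svc.Sidecar != nil {
		checks = append(checks, svc.Sidecar.Checks...)
	}
	return checks
}

// allPassing reports whether every check in the list is passing.
func allPassing(checks []check) bool {
	for _, c := range checks {
		if c.Status != "passing" {
			return false
		}
	}
	return true
}
```

A service whose own check passes while its Envoy check is still critical looks healthy under `allPassing(svc.Checks)` but not under `allPassing(checksToTrack(svc))`, which is the gap that let deployments progress too early.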
The EBS snapshot operation can take a long time to complete. Recent runs have
shown we sometimes get up to the 10s timeout on the context we're giving the CLI
command. Extend this so that we're not getting spurious timeouts.
Fixes: https://github.com/hashicorp/nomad/issues/19118
* cleanup consul tokens by accessor id
rather than secret id, which has been failing for some time with:
> 404 (Cannot find token to delete)
* expect subset of consul namespaces
the consul test cluster may have namespaces from other unrelated tests
This commit introduces the `preventRescheduleOnLost` parameter, which indicates that the task group cannot tolerate multiple instances running at the same time. If a node goes down, its allocations are marked as unknown but no replacements are scheduled. If the lost node comes back up, the allocations reconnect and continue to run.
If `max_client_disconnect` is also set and the task group has a reschedule policy, an error is returned.
Implements issue #10366
Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
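A rough sketch of the validation rule described above. The struct layout, the `validate` and `replaceOnLost` functions, and the definition of an "enabled" reschedule policy (any attempts, or unlimited) are assumptions made for illustration, not the real job structs.

```go
package main

import (
	"errors"
	"time"
)

// reschedulePolicy and taskGroup are simplified, hypothetical job structs.
type reschedulePolicy struct {
	Attempts  int
	Unlimited bool
}

// enabled is an assumption about what "there is a reschedule policy" means.
func (p *reschedulePolicy) enabled() bool {
	return p != nil && (p.Attempts > 0 || p.Unlimited)
}

type taskGroup struct {
	Name                    string
	PreventRescheduleOnLost bool
	MaxClientDisconnect     *time.Duration
	ReschedulePolicy        *reschedulePolicy
}

var errConflictingReschedule = errors.New(
	"preventRescheduleOnLost with max_client_disconnect cannot be combined with a reschedule policy")

// minutes is a small helper for building pointer values.
func minutes(n int) *time.Duration {
	d := time.Duration(n) * time.Minute
	return &d
}

// validate rejects the combination the commit message describes.
func validate(tg *taskGroup) error {
	if tg.PreventRescheduleOnLost && tg.MaxClientDisconnect != nil && tg.ReschedulePolicy.enabled() {
		return errConflictingReschedule
	}
	return nil
}

// replaceOnLost reports whether allocations on a lost node should be
// replaced. When false, they stay unknown until the node reconnects.
func replaceOnLost(tg *taskGroup) bool {
	return !tg.PreventRescheduleOnLost
}
```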
Some of our documentation on `tls` configuration could be more clear as to
whether we're referring to mTLS or TLS. Also, when ACLs are enabled it's fine to
have `verify_https_client=false` (the default). Make it clear that this is an
acceptably secure configuration and that it's in fact recommended in order to
avoid the pain of distributing client certs to user browsers.
* An example job with a few interesting actions
* A pretty different example job
* Tests updated with const'd number of default templates
* Removed default jobspec params and formatted
This will dump many of the interesting parts of cluster state, including
available nodes and their status, existing allocations and their status,
and existing evaluations and their status.
Fixes some errors in the documentation for the Consul integration, based on
testing locally without the `nomad setup consul` command, and updates the
docs to match.
* Consul CE doesn't support the `-namespace-rule-bind-namespace` option.
* The binding rule for services should not include the Nomad namespace in the
`bind-name` parameter (the service is registered in the appropriate Consul
namespace).
* The role for tasks should include the suffix "-tasks" in the name to match the
binding rule we create.
* Fix the Consul bound audiences to be a list of strings.
* Fix some quoting issues in the commands.
In #18754 we accidentally fixed a bug that prevented poststop tasks from getting
access to Variables. This was fixed in the 1.6.x branch in #19270, at which
point we discovered the fix had been done in main already as part of the auth
refactor. Add a changelog entry for it.
Clients prior to Nomad 1.7 cannot support the new workload identity-based
authentication to Consul and Vault. Add an implicit Nomad version constraint on
job submission for task groups that use the new workflow.
Includes a constraint test showing same-version prerelease handling.
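One way to picture the implicit constraint is the sketch below. The struct shapes, the `UsesWorkloadIdentity` flag, and the exact constraint values (`${attr.nomad.version}`, the `semver` operand, and `>= 1.7.0`) are assumptions for illustration; the commit message does not spell out the constraint it adds.

```go
package main

// constraint and taskGroup are simplified, hypothetical job structs.
type constraint struct {
	LTarget string
	Operand string
	RTarget string
}

type taskGroup struct {
	Name                 string
	UsesWorkloadIdentity bool // hypothetical flag: group uses the new Consul/Vault workflow
	Constraints          []constraint
}

// wiVersionConstraint is an assumed form of the implicit constraint.
var wiVersionConstraint = constraint{
	LTarget: "${attr.nomad.version}",
	Operand: "semver",
	RTarget: ">= 1.7.0",
}

// hasConstraint reports whether the group already carries the constraint.
func hasConstraint(tg *taskGroup, c constraint) bool {
	for _, existing := range tg.Constraints {
		if existing == c {
			return true
		}
	}
	return false
}

// addImplicitConstraints adds the version constraint to every task group
// that uses the new workflow. It is idempotent, so resubmitting a job does
// not pile up duplicate constraints.
func addImplicitConstraints(groups []*taskGroup) {
	for _, tg := range groups {
		if !tg.UsesWorkloadIdentity || hasConstraint(tg, wiVersionConstraint) {
			continue
		}
		tg.Constraints = append(tg.Constraints, wiVersionConstraint)
	}
}
```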
Some sections of the `consul` configuration are relevant only for clients or
servers. We updated our Vault docs to split these parameters out into their own
sections for clarity. Match that for the Consul docs.
The new Workload Identity workflow for Vault tokens correctly handles poststop
tasks, but the legacy workflow does not. Attempts to get a Vault token are
rejected if the allocation is server-terminal or client-terminal, but we should
be waiting until the allocation is client-terminal (only) so that poststop tasks
get a chance to get Vault tokens too.
Fixes: https://github.com/hashicorp/nomad/issues/16886
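The difference between the two checks can be sketched as below. The `alloc` type and both functions are hypothetical, and the status values used to define server-terminal (`stop`, `evict`) and client-terminal (`complete`, `failed`, `lost`) are assumptions modeled on Nomad's allocation statuses.

```go
package main

// alloc is a simplified, hypothetical allocation with only the two
// status fields that matter for this check.
type alloc struct {
	DesiredStatus string // what the server wants: "run", "stop", "evict"
	ClientStatus  string // what the client reports: "running", "complete", ...
}

// serverTerminal is true once the server has decided the alloc should stop.
func serverTerminal(a alloc) bool {
	return a.DesiredStatus == "stop" || a.DesiredStatus == "evict"
}

// clientTerminal is true once the client reports the alloc has finished.
func clientTerminal(a alloc) bool {
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// canDeriveTokenLegacy models the old check: reject as soon as either side
// is terminal, which locks out poststop tasks of a stopping alloc.
func canDeriveTokenLegacy(a alloc) bool {
	return !serverTerminal(a) && !clientTerminal(a)
}

// canDeriveToken models the fixed check: only reject once the client
// reports the alloc as terminal, so poststop tasks can still get tokens.
func canDeriveToken(a alloc) bool {
	return !clientTerminal(a)
}
```

An allocation with a desired status of `stop` that is still `running` on the client is exactly the window in which poststop tasks execute, and it is the only case where the two functions disagree.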