Commit Graph

63 Commits

Author SHA1 Message Date
Tim Gross
a1ede9765c docs: warn about UID overlap between workload and Envoy tproxy (#24291)
When using transparent proxy mode with the `connect` block, the UID of the
workload cannot be the same as the UID of the Envoy sidecar (currently 101 in
the default Envoy container image).

Fixes: https://github.com/hashicorp/nomad/issues/23508
2024-10-24 08:45:44 -04:00
R.B. Boyer
4e8f596311 docs: update broken consul acl token links (#24287) 2024-10-23 13:34:21 -04:00
Tim Gross
10358cc911 docs: warn about Consul auth method locality (#24275)
* docs: warn about Consul auth method locality

The locality of Consul tokens we mint via Workload Identity is governed by the
Consul auth method configuration. By default tokens are local to the Consul
datacenter, which typically maps 1:1 with a Nomad region. Cluster administrators
who need cross-datacenter tokens can get them by setting the locality to global,
at the risk of placement problems if the primary DC isn't available.

Ref: https://github.com/hashicorp/consul/issues/21863
Fixes: https://github.com/hashicorp/nomad/issues/23505
2024-10-23 11:44:03 -04:00
Tim Gross
7381f8419b docs: clarify requirements for Consul token policies and TTLs (#24167)
As of #24166, Nomad agents will use their own token to deregister services and
checks from Consul. This returns the deregistration path to the pre-Workload
Identity workflow. Expand the documentation to make clear why certain ACL
policies are required for clients.

Additionally, we did not explicitly call out that auth methods should not set an
expiration on Consul tokens. Nomad does not have a facility to refresh these
tokens if they expire. Even if Nomad could, there's no way to re-inject them
into Envoy sidecars for Consul Service Mesh without recreating the task anyways,
which is what happens today. Warn users that they should not set an expiration.

Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix)
Ref: https://hashicorp.atlassian.net/browse/NET-10262
2024-10-11 11:59:21 -04:00
Piotr Kazmierczak
c1362c03df docs: minimal Consul policy for Nomad agents needs node:write (#23800) 2024-08-13 17:53:21 +02:00
Aimee Ukasick
021692eccf docs: refactor CNI plugin content (#23707)
- Pulled common content from multiple pages into new partials
- Refactored install/index to be OS-based so I could add linux-distro-based instructions to install-consul-cni-plugins.mdx partial. The tab groups on the install/index page do match and change focus as expected.
- Moved CNI overview-type content to networking/index
- Refactored networking/cni to include install CNI plugins and configuration content (from install/index).
- Moved CNI plugins explanation in bridge mode configuration section into bullet points. They had been #### headings, which aren't rendered in the R page TOC. I tried to simplify and format the bullet point content to be easier to scan.

Ref: https://hashicorp.atlassian.net/browse/CE-661
Fixes: https://github.com/hashicorp/nomad/issues/23229
Fixes: https://github.com/hashicorp/nomad/issues/23583
2024-08-06 14:47:46 -04:00
Tim Gross
d5ca07a247 docs: notices of upcoming deprecations and backports (#23683)
Add a section to the docs describing planned upcoming deprecations and
removals. Also added some missing upgrade guide sections missed during the last
release.
2024-07-25 10:20:18 -04:00
Tim Gross
2f4353412d keyring: support prepublishing keys (#23577)
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
2024-07-19 13:29:41 -04:00
Tim Gross
df67e74615 Consul: add preflight checks for Envoy bootstrap (#23381)
Nomad creates Consul ACL tokens and service registrations to support Consul
service mesh workloads, before bootstrapping the Envoy proxy. Nomad always talks
to the local Consul agent and never directly to the Consul servers. But the
local Consul agent talks to the Consul servers in stale consistency mode to
reduce load on the servers. This can result in the Nomad client making the Envoy
bootstrap request with a tokens or services that have not yet replicated to the
follower that the local client is connected to. This request gets a 404 on the
ACL token and that negative entry gets cached, preventing any retries from
succeeding.

To workaround this, we'll use a method described by our friends over on
`consul-k8s` where after creating the objects in Consul we try to read them from
the local agent in stale consistency mode (which prevents a failed read from
being cached). This cannot completely eliminate this source of error because
it's possible that Consul cluster replication is unhealthy at the time we need
it, but this should make Envoy bootstrap significantly more robust.

This changset adds preflight checks for the objects we create in Consul:
* We add a preflight check for ACL tokens after we login via via Workload
  Identity and in the function we use to derive tokens in the legacy
  workflow. We do this check early because we also want to use this token for
  registering group services in the allocrunner hooks.
* We add a preflight check for services right before we bootstrap Envoy in the
  taskrunner hook, so that we have time for our service client to batch updates
  to the local Consul agent in addition to the local agent sync.

We've added the timeouts to be configurable via node metadata rather than the
usual static configuration because for most cases, users should not need to
touch or even know these values are configurable; the configuration is mostly
available for testing.


Fixes: https://github.com/hashicorp/nomad/issues/9307
Fixes: https://github.com/hashicorp/nomad/issues/10451
Fixes: https://github.com/hashicorp/nomad/issues/20516

Ref: https://github.com/hashicorp/consul-k8s/pull/887
Ref: https://hashicorp.atlassian.net/browse/NET-10051
Ref: https://hashicorp.atlassian.net/browse/NET-9273
Follow-up: https://hashicorp.atlassian.net/browse/NET-10138
2024-06-27 10:15:37 -04:00
David Yu
92af6280e3 Update service-mesh.mdx 2024-06-13 20:09:53 -07:00
David Yu
94bb91ab80 docs - release notes updates (#23312)
Also updated Consul compatibility matrix
2024-06-13 13:46:42 -04:00
Michael Schurter
a3b1810bdb doc: specify ca cert needs to be shared (#20620)
Specify that the Vault JWT auth method must be configured to trust Nomad's CA certificate when mTLS is enabled.
2024-05-17 14:49:48 -07:00
Tim Gross
1739f94e84 docs: fix a broken link on the Consul index page (#20387) 2024-04-12 15:31:48 -04:00
Tim Gross
43281f6038 docs: provide guidance on using Consul DNS (#20369)
Add a standalone section to the Consul integration docs showing how to configure
both the Consul agent and the workload to take advantage of Consul DNS. Include
a reference to the new transparent proxy feature as well.

Fixes: https://github.com/hashicorp/nomad/issues/18305
2024-04-12 14:38:04 -04:00
Tim Gross
9340c77b12 docs: remove extra indents in tproxy HCL examples 2024-04-10 10:21:32 -04:00
Tim Gross
e2e561da88 tproxy: documentation improvements 2024-04-10 08:55:50 -04:00
Tim Gross
bb062deadc docs: update service mesh integration docs for transparent proxy (#20251)
Update the service mesh integration docs to explain how Consul needs to be
configured for transparent proxy. Update the walkthrough to assume that
`transparent_proxy` mode is the best approach, and move the manually-configured
`upstreams` to a separate section for users who don't want to use Consul DNS.

Ref: https://github.com/hashicorp/nomad/pull/20175
Ref: https://github.com/hashicorp/nomad/pull/20241
2024-04-04 17:01:07 -04:00
Tim Gross
9c2286014f docs: update Consul compatibility matrix (#20242)
Version of Nomad and Consul that were known not to be compatible are no longer
supported in general. Update the compatibility matrix for Consul to match.
2024-03-27 16:11:14 -04:00
Luiz Aoqui
e1e80f383e vault: add new nomad setup vault -check commmand (#19720)
The new `nomad setup vault -check` commmand can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
2024-01-12 15:48:30 -05:00
Luiz Aoqui
b2aa6ffd05 docs: fix Consul ACL requirements (#19721)
Even with the new workload identitiy based flow the Nomad servers still
need the `acl = "write"` permission in order to revoke service identity
tokens.
2024-01-11 15:52:23 -05:00
Tim Gross
0935f443dc vault: support allowing tokens to expire without refresh (#19691)
Some users with batch workloads or short-lived prestart tasks want to derive a
Vaul token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.

When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.

Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.

Fixes: https://github.com/hashicorp/nomad/issues/8690
2024-01-10 14:49:02 -05:00
Luiz Aoqui
5267eec3ad vault: fix token revocation during workflow migration (#19689)
When transitioning from the legacy token-based workflow to the new JWT
workflow for Vault the previous code would instantiate a no-op Vault if
the server configuration had a `default_identity` block.

This no-op client returned an error for some of its operations were
called, such as `LookupToken` and `RevokeTokens`. The original intention
was that, in the new JWT workflow, none of these methods should be
called, so returning an error could help surface potential bugs.

But the `RevokeTokens` and `MarkForRevocation` methods _are_ called even
in the JWT flow. When a leadership transition happens, the new server
looks for unused Vault accessors from state and tries to revoke them.
Similarly, the `RevokeTokens` method is called every time the
`Node.UpdataStatus` and `Node.UpdateAlloc` RPCs are made by clients, as
the Nomad server tries to find unused Vault tokens for the node/alloc.

Since the new JWT flow does not require Nomad servers to contact Vault,
calling `RevokeTokens` and `MarkForRevocation` is not able to complete
without a Vault token, so this commit changes the logic to use the no-op
Vault client when no token is configured. It also updates the client
itself to not error if these methods are called, but to rather just log
so operators can be made aware that there are Vault tokens created by
Nomad that have not been force-expired.

When migrating an existing cluster to the new workload identity based
flow, Nomad operators must first upgrade the Nomad version without
removing any of the existing Vault configuration. Doing so can prevent
Nomad servers from managing and cleaning-up existing Vault tokens during
a leadership transition and node or alloc updates.

Operators must also resubmit all jobs with a `vault` block so they are
updated with an `identity` for Vault. Skipping this step may cause
allocations to fail if their Vault token expires (if, for example, the
Nomad client stops running for TTL/2) or if they are rescheduled, since
the new client will try to follow the legacy flow which will fail if the
Nomad server configuration for Vault has already been updated to remove
the Vault address and token.
2024-01-10 13:28:46 -05:00
Luiz Aoqui
a8d1447550 docs: update Consul and Vault integration (#19424) 2023-12-14 15:14:55 -05:00
Tim Gross
37df614da6 docs: fix recommended binding rules for Consul integration (#19299)
Fixes some errors in the documentation for the Consul integration, based on
tests locally without using the `nomad setup consul` command and updating the
docs to match.

* Consul CE doesn't support the `-namespace-rule-bind-namespace` option.
* The binding rule for services should not including the Nomad namespace in the
  `bind-name` parameter (the service is registered in the appropriate Consul
  namespace).
* The role for tasks should include the suffix "-tasks" in the name to match the
  binding rule we create.
* Fix the Consul bound audiences to be a list of strings
* Fix some quoting issues in the commands.
2023-12-04 11:56:03 -05:00
Piotr Kazmierczak
e57dcdf106 docs: adjust claim mappings for Consul auth method (#19244) 2023-11-30 20:01:18 +01:00
Piotr Kazmierczak
d699b82df6 docs: update consul-integration to include ns changes (#19239)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-11-30 16:37:48 +01:00
Piotr Kazmierczak
26b778bb0c docs: correction to Consul integration TLS note (#19207) 2023-11-28 19:22:02 +01:00
Piotr Kazmierczak
248b2ba5cd WI: use single auth method for Consul by default (#19169)
This simplifies the default setup of Nomad workloads WI-based
authentication for Consul by using a single auth method with 2 binding rules.

Users can still specify separate auth methods for services and tasks.
2023-11-28 12:22:27 +01:00
Piotr Kazmierczak
3b701ee0cf docs: additional note about JWKS endpoints and CA certs (#19144) 2023-11-27 17:34:44 +01:00
Tim Gross
2bff6d2a6a docs: fix token_period in example Vault role for WI (#18939)
Vault tokens requested for WI are "periodic" Vault tokens (ones that get
periodically renewed). The field we should be setting for the renewal window is
`token_period`.
2023-10-31 16:33:03 -04:00
Michael Schurter
9afc70ef5a Fix Vault docs to use HCL instead of JSON (#18938) 2023-10-31 13:25:20 -07:00
Tim Gross
47f2118f40 docs: Vault Workload Identity integration (#18704)
Documentation updates to support the new Vault integration with Nomad Workload
Identity. Included:

* Added a large section to the Vault integration docs to explain how to set up
  auth methods, roles, and policies (by hand, assuming we don't ship a `nomad
  setup-vault` tool for now), and how to safely migrate from the existing workflow
  to the new one.
* Shuffled around some of the existing text so that the legacy authentication
  method text is in its own section.
* Added a compatibility matrix to the Vault integration page.
2023-10-26 10:33:52 -04:00
Piotr Kazmierczak
7f62dec473 consul WI: rename default auth method for services (#18867)
It should be called nomad-services instead of nomad-workloads.
2023-10-26 09:43:33 +02:00
Kerim Satirli
5e1bbf90fc docs: update all URLs to developer.hashicorp.com (#16247) 2023-10-24 11:00:11 -04:00
Tim Gross
8a311255a2 docs: Consul Workload Identity integration (#18685)
Documentation updates to support the new Consul integration with Nomad Workload
Identity. Included:

* Added a large section to the Consul integration docs to explain how to set up
  auth methods and binding rules (by hand, assuming we don't ship a `nomad
  setup-consul` tool for now), and how to safely migrate from the existing
  workflow to the new one.
* Move `consul` block out of `group` and onto its own page now that we have it
  available at the `task` scope, and expanded examples of its use.
* Added the `service_identity` and `task_identity` blocks to the Nomad agent
  configuration, and provided a recommended default.
* Added the `identity` block to the `service` block page.
* Added a rough compatibility matrix to the Consul integration page.
2023-10-23 09:17:22 -04:00
Jose Merchan
20f6ec75ef Update consul-connect.mdx (#18575)
The hyperlink points to a non-existing URL. I suggest change it for this one (https://developer.hashicorp.com/consul/docs/install/ports) which at least listed the port 8503 (grpc tls)
2023-09-26 10:04:54 +01:00
Iwan Aucamp
f122d291d2 docs: fix a sentence in vault-integration.mdx (#18296) 2023-08-23 11:24:23 +01:00
Seth Hoenig
a7a2a3ce56 docs: move CNI reference plugins installation to CNI overview page (#17068)
* docs: move CNI reference plugins installation to CNI overview page

This PR moves the instruction steps for install the CNI reference plugins
from the Consul Mesh integration page to the general Networking CNI page.

These plugins are required for bridge networking, not just Consul Mesh,
so it makes sense to have them on the general CNI page.

Closes #17038

* docs: fix a link to post install steps
2023-05-04 11:32:06 -05:00
Daniel Bennett
fad28e4265 docs: how to troubleshoot consul connect envoy (#15908)
* largely a doc-ification of this commit message:
  d47678074b
  this doesn't spell out all the possible failure modes,
  but should be a good starting point for folks.

* connect: add doc link to envoy bootstrap error

* add Unwrap() to RecoverableError
  mainly for easier testing
2023-02-02 14:20:26 -06:00
Daniel Bennett
9f583f57f5 Change job init default to example.nomad.hcl and recommend in docs (#15997)
recommend .nomad.hcl for job files instead of .nomad (without .hcl)
* nomad job init -> example.nomad.hcl
* update docs
2023-02-02 11:47:47 -06:00
Piotr Kazmierczak
949a6f60c7 renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
Ashlee M Boyer
3444ece549 docs: Migrate link formats (#15779)
* Adding check-legacy-links-format workflow

* Adding test-link-rewrites workflow

* chore: updates link checker workflow hash

* Migrating links to new format

Co-authored-by: Kendall Strautman <kendallstrautman@gmail.com>
2023-01-25 09:31:14 -08:00
Seth Hoenig
c3017da6af consul: add client configuration for grpc_ca_file (#15701)
* [no ci] first pass at plumbing grpc_ca_file

* consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+

This PR adds client config to Nomad for specifying consul.grpc_ca_file

These changes combined with https://github.com/hashicorp/consul/pull/15913 should
finally enable Nomad users to upgrade to Consul 1.14+ and use tls grpc connections.

* consul: add cl entgry for grpc_ca_file

* docs: mention grpc_tls changes due to Consul 1.14
2023-01-11 09:34:28 -06:00
James Rasell
847c2cc528 client: accommodate Consul 1.14.0 gRPC and agent self changes. (#15309)
* client: accommodate Consul 1.14.0 gRPC and agent self changes.

Consul 1.14.0 changed the way in which gRPC listeners are
configured, particularly when using TLS. Prior to the change, a
single listener was responsible for handling plain-text and
encrypted gRPC requests. In 1.14.0 and beyond, separate listeners
will be used for each, defaulting to 8502 and 8503 for plain-text
and TLS respectively.

The change means that Nomad’s Consul Connect integration would not
work when integrated with Consul clusters using TLS and running
1.14.0 or greater.

The Nomad Consul fingerprinter identifies the gRPC port Consul has
exposed using the "DebugConfig.GRPCPort" value from Consul’s
“/v1/agent/self” endpoint. In Consul 1.14.0 and greater, this only
represents the plain-text gRPC port which is likely to be disbaled
in clusters running TLS. In order to fix this issue, Nomad now
takes into account the Consul version and configured scheme to
optionally use “DebugConfig.GRPCTLSPort” value from Consul’s agent
self return.

The “consul_grcp_socket” allocrunner hook has also been updated so
that the fingerprinted gRPC port attribute is passed in. This
provides a better fallback method, when the operator does not
configure the “consul.grpc_address” option.

* docs: modify Consul Connect entries to detail 1.14.0 changes.

* changelog: add entry for #15309

* fixup: tidy tests and clean version match from review feedback.

* fixup: use strings tolower func.
2022-11-21 09:19:09 -06:00
Bryce Kalow
f49b3a95dd website: fixes redirected links (#14918) 2022-10-18 10:31:52 -05:00
Bryce Kalow
67d39725b1 website: content updates for developer (#14473)
Co-authored-by: Geoffrey Grosenbach <26+topfunky@users.noreply.github.com>
Co-authored-by: Anthony <russo555@gmail.com>
Co-authored-by: Ashlee Boyer <ashlee.boyer@hashicorp.com>
Co-authored-by: Ashlee M Boyer <43934258+ashleemboyer@users.noreply.github.com>
Co-authored-by: HashiBot <62622282+hashibot-web@users.noreply.github.com>
Co-authored-by: Kevin Wang <kwangsan@gmail.com>
2022-09-16 10:38:39 -05:00
Michelle Noorali
b9e084a4b7 doc: explain permissions for Vault sys/capabilties-self 2022-07-06 10:01:30 -04:00
Derek Strickland
e78a5908b9 docker: update images to reference hashicorpdev Docker organization (#12903)
docker: update images to reference hashicorpdev dockerhub organization
generate job_init.bindata_assetfs.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-06-08 15:06:00 -04:00
Anthony
5b80907a5d docs: added note about vault -period flag (#13185) 2022-05-31 14:26:03 -07:00
Luiz Aoqui
0abe5a6c79 vault: revert support for entity aliases (#12723)
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading itself as another Vault
entity.
2022-04-22 10:46:34 -04:00