55 Commits

Author SHA1 Message Date
Tim Gross
df67e74615 Consul: add preflight checks for Envoy bootstrap (#23381)
Nomad creates Consul ACL tokens and service registrations to support Consul
service mesh workloads, before bootstrapping the Envoy proxy. Nomad always talks
to the local Consul agent and never directly to the Consul servers. But the
local Consul agent talks to the Consul servers in stale consistency mode to
reduce load on the servers. This can result in the Nomad client making the Envoy
bootstrap request with tokens or services that have not yet replicated to the
follower that the local client is connected to. This request gets a 404 on the
ACL token and that negative entry gets cached, preventing any retries from
succeeding.

To work around this, we'll use a method described by our friends over on
`consul-k8s` where after creating the objects in Consul we try to read them from
the local agent in stale consistency mode (which prevents a failed read from
being cached). This cannot completely eliminate this source of error because
it's possible that Consul cluster replication is unhealthy at the time we need
it, but this should make Envoy bootstrap significantly more robust.

This changeset adds preflight checks for the objects we create in Consul:
* We add a preflight check for ACL tokens after we log in via Workload
  Identity and in the function we use to derive tokens in the legacy
  workflow. We do this check early because we also want to use this token for
  registering group services in the allocrunner hooks.
* We add a preflight check for services right before we bootstrap Envoy in the
  taskrunner hook, so that we have time for our service client to batch updates
  to the local Consul agent in addition to the local agent sync.

We've made the timeouts configurable via node metadata rather than the
usual static configuration because for most cases, users should not need to
touch or even know these values are configurable; the configuration is mostly
available for testing.
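
The preflight idea described above can be sketched roughly as a bounded polling loop. This is a minimal illustration and not Nomad's implementation: `preflight_check`, `read_stale`, and the default timeout and interval values are all hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, Optional


def preflight_check(
    read_stale: Callable[[], Optional[dict]],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the object we just created is visible on the local agent.

    read_stale is a hypothetical callable that performs a stale-consistency
    read against the local Consul agent and returns None when the object is
    not found yet.
    """
    deadline = clock() + timeout
    while True:
        obj = read_stale()
        if obj is not None:
            return obj
        # Give up once another wait would run past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("object not yet visible on the local agent")
        sleep(interval)
```

A caller would run this right after creating the token or service and only proceed to the Envoy bootstrap once it returns. Unhealthy replication still surfaces as a `TimeoutError` rather than being hidden.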


Fixes: https://github.com/hashicorp/nomad/issues/9307
Fixes: https://github.com/hashicorp/nomad/issues/10451
Fixes: https://github.com/hashicorp/nomad/issues/20516

Ref: https://github.com/hashicorp/consul-k8s/pull/887
Ref: https://hashicorp.atlassian.net/browse/NET-10051
Ref: https://hashicorp.atlassian.net/browse/NET-9273
Follow-up: https://hashicorp.atlassian.net/browse/NET-10138
2024-06-27 10:15:37 -04:00
David Yu
92af6280e3 Update service-mesh.mdx 2024-06-13 20:09:53 -07:00
David Yu
94bb91ab80 docs - release notes updates (#23312)
Also updated Consul compatibility matrix
2024-06-13 13:46:42 -04:00
Michael Schurter
a3b1810bdb doc: specify ca cert needs to be shared (#20620)
Specify that the Vault JWT auth method must be configured to trust Nomad's CA certificate when mTLS is enabled.
2024-05-17 14:49:48 -07:00
Tim Gross
1739f94e84 docs: fix a broken link on the Consul index page (#20387) 2024-04-12 15:31:48 -04:00
Tim Gross
43281f6038 docs: provide guidance on using Consul DNS (#20369)
Add a standalone section to the Consul integration docs showing how to configure
both the Consul agent and the workload to take advantage of Consul DNS. Include
a reference to the new transparent proxy feature as well.

Fixes: https://github.com/hashicorp/nomad/issues/18305
2024-04-12 14:38:04 -04:00
Tim Gross
9340c77b12 docs: remove extra indents in tproxy HCL examples 2024-04-10 10:21:32 -04:00
Tim Gross
e2e561da88 tproxy: documentation improvements 2024-04-10 08:55:50 -04:00
Tim Gross
bb062deadc docs: update service mesh integration docs for transparent proxy (#20251)
Update the service mesh integration docs to explain how Consul needs to be
configured for transparent proxy. Update the walkthrough to assume that
`transparent_proxy` mode is the best approach, and move the manually-configured
`upstreams` to a separate section for users who don't want to use Consul DNS.

Ref: https://github.com/hashicorp/nomad/pull/20175
Ref: https://github.com/hashicorp/nomad/pull/20241
2024-04-04 17:01:07 -04:00
Tim Gross
9c2286014f docs: update Consul compatibility matrix (#20242)
Versions of Nomad and Consul that were known not to be compatible are no longer
supported in general. Update the compatibility matrix for Consul to match.
2024-03-27 16:11:14 -04:00
Luiz Aoqui
e1e80f383e vault: add new nomad setup vault -check command (#19720)
The new `nomad setup vault -check` command can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
2024-01-12 15:48:30 -05:00
Luiz Aoqui
b2aa6ffd05 docs: fix Consul ACL requirements (#19721)
Even with the new workload identity based flow the Nomad servers still
need the `acl = "write"` permission in order to revoke service identity
tokens.
2024-01-11 15:52:23 -05:00
Tim Gross
0935f443dc vault: support allowing tokens to expire without refresh (#19691)
Some users with batch workloads or short-lived prestart tasks want to derive a
Vault token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.

When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.

Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.
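
The decision described above, whether the client runs its renewal loop at all, might be expressed along these lines. This is only a sketch under stated assumptions: `VaultToken`, its fields, and `should_renew` are hypothetical names for illustration and do not mirror the actual `vault_hook` code.

```python
from dataclasses import dataclass


@dataclass
class VaultToken:
    # Hypothetical stand-in for the token metadata returned by Vault.
    renewable: bool      # whether Vault's auth configuration allows renewals
    lease_seconds: int   # lifetime of the current lease


def should_renew(allow_token_expiration: bool, token: VaultToken) -> bool:
    """Return True if the client should run its renewal loop for this token."""
    if allow_token_expiration:
        # The job explicitly opted out of renewal.
        return False
    if not token.renewable:
        # Vault will not accept renewals, so looping would only produce errors.
        return False
    return True
```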

Fixes: https://github.com/hashicorp/nomad/issues/8690
2024-01-10 14:49:02 -05:00
Luiz Aoqui
5267eec3ad vault: fix token revocation during workflow migration (#19689)
When transitioning from the legacy token-based workflow to the new JWT
workflow for Vault, the previous code would instantiate a no-op Vault client if
the server configuration had a `default_identity` block.

This no-op client returned an error when some of its operations were
called, such as `LookupToken` and `RevokeTokens`. The original intention
was that, in the new JWT workflow, none of these methods should be
called, so returning an error could help surface potential bugs.

But the `RevokeTokens` and `MarkForRevocation` methods _are_ called even
in the JWT flow. When a leadership transition happens, the new server
looks for unused Vault accessors from state and tries to revoke them.
Similarly, the `RevokeTokens` method is called every time the
`Node.UpdateStatus` and `Node.UpdateAlloc` RPCs are made by clients, as
the Nomad server tries to find unused Vault tokens for the node/alloc.

Since the new JWT flow does not require Nomad servers to contact Vault,
`RevokeTokens` and `MarkForRevocation` cannot complete
without a Vault token, so this commit changes the logic to use the no-op
Vault client when no token is configured. It also updates the client
itself to not error if these methods are called, but to rather just log
so operators can be made aware that there are Vault tokens created by
Nomad that have not been force-expired.
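
The behavior change described above can be pictured with a small sketch: choose the no-op client when no Vault token is configured, and have its revocation methods log instead of returning an error. The class and function names here (`NoopVaultClient`, `TokenVaultClient`, `new_vault_client`) are hypothetical and written in Python for illustration; they are not the Go types in Nomad.

```python
import logging
from typing import List

logger = logging.getLogger("vault.noop")


class NoopVaultClient:
    """Used when the servers have no Vault token to act with."""

    def revoke_tokens(self, accessors: List[str]) -> None:
        # Log rather than raise, so callers such as node and alloc updates
        # keep working while operators can still see what was skipped.
        if accessors:
            logger.warning("%d Vault tokens were not force-expired", len(accessors))

    def mark_for_revocation(self, accessors: List[str]) -> None:
        if accessors:
            logger.warning("%d Vault tokens were not marked for revocation", len(accessors))


class TokenVaultClient:
    """Placeholder for a client that holds a Vault token and can revoke."""

    def __init__(self, token: str) -> None:
        self.token = token


def new_vault_client(token: str):
    # The no-op client is selected by the absence of a token, not by the
    # presence of a default_identity block.
    return TokenVaultClient(token) if token else NoopVaultClient()
```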

When migrating an existing cluster to the new workload identity based
flow, Nomad operators must first upgrade the Nomad version without
removing any of the existing Vault configuration. Removing it can prevent
Nomad servers from managing and cleaning-up existing Vault tokens during
a leadership transition and node or alloc updates.

Operators must also resubmit all jobs with a `vault` block so they are
updated with an `identity` for Vault. Skipping this step may cause
allocations to fail if their Vault token expires (if, for example, the
Nomad client stops running for TTL/2) or if they are rescheduled, since
the new client will try to follow the legacy flow which will fail if the
Nomad server configuration for Vault has already been updated to remove
the Vault address and token.
2024-01-10 13:28:46 -05:00
Luiz Aoqui
a8d1447550 docs: update Consul and Vault integration (#19424) 2023-12-14 15:14:55 -05:00
Tim Gross
37df614da6 docs: fix recommended binding rules for Consul integration (#19299)
Fixes some errors in the documentation for the Consul integration, based on
tests locally without using the `nomad setup consul` command and updating the
docs to match.

* Consul CE doesn't support the `-namespace-rule-bind-namespace` option.
* The binding rule for services should not include the Nomad namespace in the
  `bind-name` parameter (the service is registered in the appropriate Consul
  namespace).
* The role for tasks should include the suffix "-tasks" in the name to match the
  binding rule we create.
* Fix the Consul bound audiences to be a list of strings
* Fix some quoting issues in the commands.
2023-12-04 11:56:03 -05:00
Piotr Kazmierczak
e57dcdf106 docs: adjust claim mappings for Consul auth method (#19244) 2023-11-30 20:01:18 +01:00
Piotr Kazmierczak
d699b82df6 docs: update consul-integration to include ns changes (#19239)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-11-30 16:37:48 +01:00
Piotr Kazmierczak
26b778bb0c docs: correction to Consul integration TLS note (#19207) 2023-11-28 19:22:02 +01:00
Piotr Kazmierczak
248b2ba5cd WI: use single auth method for Consul by default (#19169)
This simplifies the default setup of Nomad workloads WI-based
authentication for Consul by using a single auth method with 2 binding rules.

Users can still specify separate auth methods for services and tasks.
2023-11-28 12:22:27 +01:00
Piotr Kazmierczak
3b701ee0cf docs: additional note about JWKS endpoints and CA certs (#19144) 2023-11-27 17:34:44 +01:00
Tim Gross
2bff6d2a6a docs: fix token_period in example Vault role for WI (#18939)
Vault tokens requested for WI are "periodic" Vault tokens (ones that get
periodically renewed). The field we should be setting for the renewal window is
`token_period`.
2023-10-31 16:33:03 -04:00
Michael Schurter
9afc70ef5a Fix Vault docs to use HCL instead of JSON (#18938) 2023-10-31 13:25:20 -07:00
Tim Gross
47f2118f40 docs: Vault Workload Identity integration (#18704)
Documentation updates to support the new Vault integration with Nomad Workload
Identity. Included:

* Added a large section to the Vault integration docs to explain how to set up
  auth methods, roles, and policies (by hand, assuming we don't ship a `nomad
  setup-vault` tool for now), and how to safely migrate from the existing workflow
  to the new one.
* Shuffled around some of the existing text so that the legacy authentication
  method text is in its own section.
* Added a compatibility matrix to the Vault integration page.
2023-10-26 10:33:52 -04:00
Piotr Kazmierczak
7f62dec473 consul WI: rename default auth method for services (#18867)
It should be called nomad-services instead of nomad-workloads.
2023-10-26 09:43:33 +02:00
Kerim Satirli
5e1bbf90fc docs: update all URLs to developer.hashicorp.com (#16247) 2023-10-24 11:00:11 -04:00
Tim Gross
8a311255a2 docs: Consul Workload Identity integration (#18685)
Documentation updates to support the new Consul integration with Nomad Workload
Identity. Included:

* Added a large section to the Consul integration docs to explain how to set up
  auth methods and binding rules (by hand, assuming we don't ship a `nomad
  setup-consul` tool for now), and how to safely migrate from the existing
  workflow to the new one.
* Move `consul` block out of `group` and onto its own page now that we have it
  available at the `task` scope, and expanded examples of its use.
* Added the `service_identity` and `task_identity` blocks to the Nomad agent
  configuration, and provided a recommended default.
* Added the `identity` block to the `service` block page.
* Added a rough compatibility matrix to the Consul integration page.
2023-10-23 09:17:22 -04:00
Jose Merchan
20f6ec75ef Update consul-connect.mdx (#18575)
The hyperlink points to a non-existent URL. I suggest changing it to this one (https://developer.hashicorp.com/consul/docs/install/ports), which at least lists port 8503 (grpc tls)
2023-09-26 10:04:54 +01:00
Iwan Aucamp
f122d291d2 docs: fix a sentence in vault-integration.mdx (#18296) 2023-08-23 11:24:23 +01:00
Seth Hoenig
a7a2a3ce56 docs: move CNI reference plugins installation to CNI overview page (#17068)
* docs: move CNI reference plugins installation to CNI overview page

This PR moves the instruction steps for installing the CNI reference plugins
from the Consul Mesh integration page to the general Networking CNI page.

These plugins are required for bridge networking, not just Consul Mesh,
so it makes sense to have them on the general CNI page.

Closes #17038

* docs: fix a link to post install steps
2023-05-04 11:32:06 -05:00
Daniel Bennett
fad28e4265 docs: how to troubleshoot consul connect envoy (#15908)
* largely a doc-ification of this commit message:
  d47678074b
  this doesn't spell out all the possible failure modes,
  but should be a good starting point for folks.

* connect: add doc link to envoy bootstrap error

* add Unwrap() to RecoverableError
  mainly for easier testing
2023-02-02 14:20:26 -06:00
Daniel Bennett
9f583f57f5 Change job init default to example.nomad.hcl and recommend in docs (#15997)
recommend .nomad.hcl for job files instead of .nomad (without .hcl)
* nomad job init -> example.nomad.hcl
* update docs
2023-02-02 11:47:47 -06:00
Piotr Kazmierczak
949a6f60c7 renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
Ashlee M Boyer
3444ece549 docs: Migrate link formats (#15779)
* Adding check-legacy-links-format workflow

* Adding test-link-rewrites workflow

* chore: updates link checker workflow hash

* Migrating links to new format

Co-authored-by: Kendall Strautman <kendallstrautman@gmail.com>
2023-01-25 09:31:14 -08:00
Seth Hoenig
c3017da6af consul: add client configuration for grpc_ca_file (#15701)
* [no ci] first pass at plumbing grpc_ca_file

* consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+

This PR adds client config to Nomad for specifying consul.grpc_ca_file

These changes combined with https://github.com/hashicorp/consul/pull/15913 should
finally enable Nomad users to upgrade to Consul 1.14+ and use tls grpc connections.

* consul: add cl entry for grpc_ca_file

* docs: mention grpc_tls changes due to Consul 1.14
2023-01-11 09:34:28 -06:00
James Rasell
847c2cc528 client: accommodate Consul 1.14.0 gRPC and agent self changes. (#15309)
* client: accommodate Consul 1.14.0 gRPC and agent self changes.

Consul 1.14.0 changed the way in which gRPC listeners are
configured, particularly when using TLS. Prior to the change, a
single listener was responsible for handling plain-text and
encrypted gRPC requests. In 1.14.0 and beyond, separate listeners
will be used for each, defaulting to 8502 and 8503 for plain-text
and TLS respectively.

The change means that Nomad’s Consul Connect integration would not
work when integrated with Consul clusters using TLS and running
1.14.0 or greater.

The Nomad Consul fingerprinter identifies the gRPC port Consul has
exposed using the "DebugConfig.GRPCPort" value from Consul’s
“/v1/agent/self” endpoint. In Consul 1.14.0 and greater, this only
represents the plain-text gRPC port which is likely to be disabled
in clusters running TLS. In order to fix this issue, Nomad now
takes into account the Consul version and configured scheme to
optionally use “DebugConfig.GRPCTLSPort” value from Consul’s agent
self return.
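
The port selection described above could look roughly like the following. This is a hedged sketch, not the fingerprinter's actual code: `parse_version` and `grpc_port` are hypothetical helpers, and the exact conditions Nomad checks may differ. Only the `GRPCPort` and `GRPCTLSPort` keys and the 1.14.0 boundary come from the commit message.

```python
from typing import Mapping, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse '1.14.0' or '1.14.0+ent' into a comparable tuple of ints."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def grpc_port(debug_config: Mapping[str, int], version: str, scheme: str) -> int:
    """Pick the gRPC port to advertise from Consul's agent self response."""
    uses_tls = scheme.lower() == "https"
    if uses_tls and parse_version(version) >= (1, 14, 0):
        # Consul 1.14.0+ serves TLS gRPC on a separate listener.
        return debug_config["GRPCTLSPort"]
    return debug_config["GRPCPort"]
```

With the default listener ports mentioned earlier in this message, a TLS-enabled 1.14.0 agent would resolve to 8503 and an older or plain-text agent to 8502.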

The “consul_grpc_socket” allocrunner hook has also been updated so
that the fingerprinted gRPC port attribute is passed in. This
provides a better fallback method, when the operator does not
configure the “consul.grpc_address” option.

* docs: modify Consul Connect entries to detail 1.14.0 changes.

* changelog: add entry for #15309

* fixup: tidy tests and clean version match from review feedback.

* fixup: use strings tolower func.
2022-11-21 09:19:09 -06:00
Bryce Kalow
f49b3a95dd website: fixes redirected links (#14918) 2022-10-18 10:31:52 -05:00
Bryce Kalow
67d39725b1 website: content updates for developer (#14473)
Co-authored-by: Geoffrey Grosenbach <26+topfunky@users.noreply.github.com>
Co-authored-by: Anthony <russo555@gmail.com>
Co-authored-by: Ashlee Boyer <ashlee.boyer@hashicorp.com>
Co-authored-by: Ashlee M Boyer <43934258+ashleemboyer@users.noreply.github.com>
Co-authored-by: HashiBot <62622282+hashibot-web@users.noreply.github.com>
Co-authored-by: Kevin Wang <kwangsan@gmail.com>
2022-09-16 10:38:39 -05:00
Michelle Noorali
b9e084a4b7 doc: explain permissions for Vault sys/capabilities-self 2022-07-06 10:01:30 -04:00
Derek Strickland
e78a5908b9 docker: update images to reference hashicorpdev Docker organization (#12903)
docker: update images to reference hashicorpdev dockerhub organization
generate job_init.bindata_assetfs.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-06-08 15:06:00 -04:00
Anthony
5b80907a5d docs: added note about vault -period flag (#13185) 2022-05-31 14:26:03 -07:00
Luiz Aoqui
0abe5a6c79 vault: revert support for entity aliases (#12723)
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading as another Vault
entity.
2022-04-22 10:46:34 -04:00
Seth Hoenig
b2a2f77d40 docs: update documentation with connect acls changes
This PR updates the changelog, adds notes to the 1.3 upgrade guide, and
updates the connect integration docs with documentation about the new
requirement on Consul ACL policies of Consul agent default anonymous ACL
tokens.
2022-04-18 08:22:33 -05:00
Luiz Aoqui
d412f7b497 Support Vault entity aliases (#12449)
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.

Make Vault job validation its own function so it's easier to expand it.

Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just their policies.

Set `ChangeMode` on `Vault.Canonicalize`.

Add some missing tests.

Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.

An entity alias assigns an identity to a token, allowing better control
and management of Vault clients since all tokens with the same identity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and better control over Vault billing.

Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
2022-04-05 14:18:10 -04:00
Charlie Voiselle
dce23e829f DOCS: Update Consul Connect to Consul service mesh (#11362)
* Update Consul Connect to Consul service mesh
* Apply suggestions from code review
2021-10-26 15:10:21 -04:00
Forest Anderson
d891a716a8 Change dashboard port to http (#11129) 2021-09-03 20:34:40 -04:00
Mahmood Ali
499fcebc42 docs: Consul Connect tweaks (#11040)
Tweaks to the commands in Consul Connect page.

For multi-command scripts, having the leading `$` is a bit annoying, as it makes copying the text harder. Also, the `copy` button would only copy the first command and ignore the rest.

Also, the `echo 1 > ...` commands are required to run as root, unlike the rest! I made them use `| sudo tee` pattern to ease copy & paste as well.

Lastly, update the CNI plugin links to 1.0.0. It's fresh out of the oven - just got released less than an hour ago: https://github.com/containernetworking/plugins/releases/tag/v1.0.0 .
2021-08-11 17:14:26 -04:00
Seth Hoenig
2922839fff docs: apply suggestions from code review
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-04-05 10:03:19 -06:00
Seth Hoenig
a97254fa20 consul: plumbing for specifying consul namespace in job/group
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
2021-04-05 10:03:19 -06:00
Bryce Kalow
ee79587a67 feat(website): migrates to new nav data format (#10264) 2021-03-31 08:43:17 -05:00