166 Commits

Author SHA1 Message Date
James Rasell
7d48aa2667 client: emit optional telemetry from prerun and prestart hooks. (#24556)
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.

The new datapoints are:
  - nomad.client.alloc_hook.prerun.success (counter)
  - nomad.client.alloc_hook.prerun.failed (counter)
  - nomad.client.alloc_hook.prerun.elapsed (sample)

  - nomad.client.task_hook.prestart.success (counter)
  - nomad.client.task_hook.prestart.failed (counter)
  - nomad.client.task_hook.prestart.elapsed (sample)

The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.

Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
2024-12-12 14:43:14 +00:00
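Illustration for 7d48aa2667 (hook telemetry): a minimal sketch of the pattern the message describes, in which a hook call is timed and a success or failed counter is incremented. This is not Nomad's implementation. `Metrics` and `run_hook_with_telemetry` are hypothetical names, and recording the elapsed sample on failure as well as success is an assumption. Only the metric names come from the commit message.

```python
import time
from collections import defaultdict


class Metrics:
    """Hypothetical in-memory sink standing in for a real metrics library."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.samples = defaultdict(list)

    def incr(self, name):
        self.counters[name] += 1

    def sample(self, name, value):
        self.samples[name].append(value)


def run_hook_with_telemetry(metrics, prefix, hook):
    """Run hook(), then emit <prefix>.elapsed plus .success or .failed."""
    start = time.monotonic()
    try:
        hook()
    except Exception:
        metrics.incr(prefix + ".failed")
        raise
    else:
        metrics.incr(prefix + ".success")
    finally:
        # Assumption: the elapsed sample is recorded on both outcomes.
        metrics.sample(prefix + ".elapsed", time.monotonic() - start)
```

Calling `run_hook_with_telemetry(metrics, "nomad.client.alloc_hook.prerun", hook)` would produce the three prerun datapoints listed in the message.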
Piotr Kazmierczak
368241dbf2 security: a more comprehensive env.denylist (#24540)
Make env.denylist more comprehensive: it now includes more token, token file,
and license variables.

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-11-22 18:54:18 +01:00
James Rasell
dc501339da docs: Add federated region concept and operations pages. (#24477)
In order to help users understand multi-region federated
deployments, this change adds two new sections to the website.

The first expands the architecture page, so we can add further
detail over time with an initial federation page. The second adds
a federation operations page which goes into failure planning and
mitigation.

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2024-11-19 12:39:57 +00:00
Eduardo Medeiros
f8c85b036b docs: remove duplicated word. (#24433)
Remove the duplicated word in “Using using”.
2024-11-11 16:10:10 -05:00
Tim Gross
7381f8419b docs: clarify requirements for Consul token policies and TTLs (#24167)
As of #24166, Nomad agents will use their own token to deregister services and
checks from Consul. This returns the deregistration path to the pre-Workload
Identity workflow. Expand the documentation to make clear why certain ACL
policies are required for clients.

Additionally, we did not explicitly call out that auth methods should not set an
expiration on Consul tokens. Nomad does not have a facility to refresh these
tokens if they expire. Even if Nomad could, there's no way to re-inject them
into Envoy sidecars for Consul Service Mesh without recreating the task anyway,
which is what happens today. Warn users that they should not set an expiration.

Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix)
Ref: https://hashicorp.atlassian.net/browse/NET-10262
2024-10-11 11:59:21 -04:00
Martijn Vegter
3ecf0d21e2 metrics: introduce client config to include alloc metadata as part of the base labels (#23964) 2024-10-02 10:55:44 -04:00
Daniel Bennett
2f5cf8efae networking: option to enable ipv6 on bridge network (#23882)
by setting bridge_network_subnet_ipv6 in client config

Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>
2024-09-04 10:17:10 -05:00
Tim Gross
c43e30a387 WI: interpolate parent job ID in vault.default_identity.extra_claims (#23817)
When we interpolate job fields for the `vault.default_identity.extra_claims`
block, we forgot to use the parent job ID when that's available (as we do for
all other claims). This changeset fixes the bug and adds a helper method that'll
hopefully remind us to do this going forward.

Also added a missing changelog entry for #23675 where we implemented the
`extra_claims` block originally, which shipped in Nomad 1.8.3.

Fixes: https://github.com/hashicorp/nomad/issues/23798
2024-09-03 13:56:36 -04:00
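Illustration for c43e30a387 (parent job ID in `extra_claims`): a rough sketch of interpolating claim values so that a dispatched or periodic child job reports its parent's ID. This is not Nomad's code. The dictionary layout, the `parent_id` key, the helper names, and the three allowed fields are assumptions chosen for the example.

```python
import re


def claim_job_id(job):
    """Prefer the parent job ID when the job has one (hypothetical helper)."""
    return job.get("parent_id") or job["id"]


def interpolate_claims(extra_claims, job):
    """Expand ${...} references using a deliberately small set of fields."""
    allowed = {
        "job.region": job["region"],
        "job.namespace": job["namespace"],
        "job.id": claim_job_id(job),
    }

    def replace(match):
        key = match.group(1)
        if key not in allowed:
            raise KeyError("unsupported interpolation: " + key)
        return allowed[key]

    return {
        name: re.sub(r"\$\{([^}]+)\}", replace, value)
        for name, value in extra_claims.items()
    }
```

Routing every lookup of `job.id` through one helper is the point of the sketch: a new claim cannot accidentally use the child ID.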
Martijn Vegter
aded4b3500 docs: remove remaining references to network_speed config (#23792) 2024-08-14 14:14:38 -04:00
Tim Gross
bc50eebebd workload identity: add support for extra claims config for Vault (#23675)
Although we encourage users to use Vault roles, sometimes they're going to want
to assign policies based on entity and pre-create entities and aliases based on
claims. This allows them to use a single default role (or at least a small number
of them) that has a templated policy, but have an escape hatch from that.

When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.

Add support for an `extra_claims` block on the server's `default_identity`
blocks for Vault. This allows a cluster administrator to add a custom claim on
all allocations. The values for these claims are interpolatable with a limited
subset of fields, similar to how we interpolate the task environment.

Fixes: https://github.com/hashicorp/nomad/issues/23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
2024-08-05 15:01:54 -04:00
Aimee Ukasick
cbacdb2041 DOCS: CE-659 chroot limitations for isolated fork/exec driver (#23739) 2024-08-05 14:35:54 -04:00
Tim Gross
9ff7437b06 docs: document client.alloc_mounts_dir configuration (#23733)
In Nomad 1.8.0 we introduced the `alloc_mounts_dir` to support unveil filesystem
isolation, but we didn't document the configuration value.
2024-08-05 11:59:47 -04:00
Tim Gross
9d4686c0df tls: remove deprecated prefer_server_cipher_suites field (#23712)
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In versions of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of Go 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."

This property has been long-deprecated and leaving it in place may lead to false
assumptions about how cipher suites are negotiated in connection to a server. So
we want to remove it in Nomad 1.9.0.

Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
2024-08-01 08:52:05 -04:00
Tim Gross
2ee6043cab tls: support setting min version to TLS1.3 (#23713)
Nomad already supports TLS1.3, but not as a minimum version
configuration. Update our config validation to allow setting `tls_min_version`
to 1.3. Update the documentation to match Vault and warn that the
`tls_cipher_suites` field is ignored when TLS 1.3 is in use.

Fixes: https://github.com/hashicorp/nomad/issues/20131
Ref: https://hashicorp.atlassian.net/browse/NET-10530
2024-08-01 08:46:32 -04:00
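Illustration for 2ee6043cab (TLS 1.3 as a minimum version): a small sketch of the kind of validation the message describes, accepting 1.3 as a minimum and warning that configured cipher suites have no effect there. The value spellings (`tls10` through `tls13`), the function name, and the warning text are assumptions, not Nomad's actual validation code.

```python
# Assumed spellings of the accepted tls_min_version values.
SUPPORTED_MIN_VERSIONS = ("tls10", "tls11", "tls12", "tls13")


def validate_tls(min_version, cipher_suites):
    """Return a list of warnings; raise ValueError on an unknown version."""
    if min_version not in SUPPORTED_MIN_VERSIONS:
        raise ValueError("unsupported tls_min_version: %r" % min_version)
    warnings = []
    if min_version == "tls13" and cipher_suites:
        # TLS 1.3 suites are not configurable, so the list has no effect.
        warnings.append("tls_cipher_suites is ignored with TLS 1.3")
    return warnings
```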
Tim Gross
0f4014b4a9 docs: external KMS configuration (#23600)
In #23580 we're implementing support for encrypting Nomad's key material with
external KMS providers or Vault Transit. This changeset breaks out the
documentation from that PR to keep the review manageable and present it to a
wider set of reviewers.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Ref: https://github.com/hashicorp/nomad/issues/14852
Ref: https://github.com/hashicorp/nomad/pull/23580
2024-07-19 15:08:54 -04:00
Tim Gross
2f4353412d keyring: support prepublishing keys (#23577)
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes the following bugs in root key rotation and garbage
collection. The rotation and garbage collection fixes can't be made safely
without implementing prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
2024-07-19 13:29:41 -04:00
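Illustration for 2f4353412d (prepublishing keys): a sketch of the timeline the message implies for a single root key. It assumes that all offsets are measured from the active key's `CreateTime`, and that a key prepublished at the halfway point is promoted when the full rotation threshold has elapsed. The function names and dictionary keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rotation_schedule(create_time, rotation_threshold, gc_threshold):
    """Return the assumed times of each event relative to a key's creation."""
    return {
        # Successor becomes visible in JWKS at half the rotation window.
        "prepublish_next": create_time + rotation_threshold / 2,
        # Successor is promoted to active once its publish time arrives.
        "promote_next": create_time + rotation_threshold,
        # The old key is kept for rotation threshold + GC threshold.
        "gc_eligible": create_time + rotation_threshold + gc_threshold,
    }


def should_prepublish(create_time, now, rotation_threshold):
    """Compare the key's creation time against wall-clock time."""
    return now - create_time >= rotation_threshold / 2
```

With the 720h default mentioned above, a key created at time T would have its successor prepublished at T + 360h under these assumptions.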
guifran001
1c44521543 client: Add a preferred address family option for network-interface (#23389)
to prefer ipv4 or ipv6 when deducing IP from network interface

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-12 15:30:38 -05:00
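Illustration for 1c44521543 (preferred address family): a minimal sketch of choosing an IP from the addresses found on an interface. The function name, the `"ipv4"`/`"ipv6"` values, and the fallback to the first address when the preferred family is absent are all assumptions for the example, not confirmed Nomad behavior.

```python
import ipaddress


def pick_address(addresses, preferred_family=None):
    """Return the first address of the preferred family, else the first one."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    if preferred_family is not None:
        want = {"ipv4": 4, "ipv6": 6}[preferred_family]
        for ip in parsed:
            if ip.version == want:
                return str(ip)
    # Assumed fallback: no preference, or no address of that family.
    return str(parsed[0]) if parsed else None
```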
Adrian Todorov
3f2729f7f5 remove mentions of old versions of Nomad in various docs (#23567) 2024-07-12 11:01:32 -04:00
Deniz Onur Duzgun
c82dd76a1b security: update tls cipher suites (#23551) 2024-07-11 14:01:45 -04:00
Adrian Todorov
6589d7130b docs: remove mentions of 'new in Nomad X version' where X is an older version (#23552) 2024-07-11 13:43:28 -04:00
liukch
cc7a5ed7e2 docs: Fix parameter type and default value in client reserved configuration. (#23359) 2024-06-21 16:29:59 -04:00
Tim Gross
44078d4786 docs: update configuration docs to include trace-level logging (#23285) 2024-06-11 09:19:52 -04:00
Piotr Kazmierczak
2a09abc477 metrics: quota utilization configuration and documentation (#22912)
Introduces support for (optional) quota utilization metrics

CE part of the hashicorp/nomad-enterprise#1488 change
2024-06-03 21:06:19 +02:00
Seth Hoenig
8ae1a0e356 docs: add docs around dynamic workload users (#20477) 2024-04-23 07:57:40 -05:00
astudentofblake
7b7ed12326 func: Allow custom paths to be added to the getter landlock (#20349)
* func: Allow custom paths to be added to the getter landlock

Fixes: 20315

* fix: slices imports
fix: more meaningful examples
fix: improve documentation
fix: quote error output
2024-04-11 15:17:33 -05:00
Tim Gross
8298d39e78 Connect transparent proxy support
Add support for Consul Connect transparent proxies

Fixes: https://github.com/hashicorp/nomad/issues/10628
2024-04-10 11:00:18 -04:00
Tim Gross
e2e561da88 tproxy: documentation improvements 2024-04-10 08:55:50 -04:00
Tim Gross
8eaf176868 client: fix IPv6 parsing for client.servers block (#20324)
When the `client.servers` block is parsed, we split the port from the
address. This does not correctly handle IPv6 addresses when they are in URL
format (wrapped in brackets), which we require to disambiguate the port and
address.

Fix the parser to correctly split out the port and handle a missing port value
for IPv6. Update the documentation to make the URL format requirement clear.

Fixes: https://github.com/hashicorp/nomad/issues/20310
2024-04-08 15:06:27 -04:00
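Illustration for 8eaf176868 (IPv6 in `client.servers`): a sketch of splitting an address from its port when IPv6 addresses must be wrapped in brackets. It is not the parser from the commit. The function name is hypothetical, and the default port of 4647 and the rejection of bare IPv6 addresses are assumptions made for the example.

```python
DEFAULT_RPC_PORT = 4647  # assumed default, used only for illustration


def split_server_address(addr, default_port=DEFAULT_RPC_PORT):
    """Split 'host', 'host:port', '[v6]' or '[v6]:port' into (host, port)."""
    if addr.startswith("["):
        end = addr.find("]")
        if end == -1:
            raise ValueError("missing closing bracket in %r" % addr)
        host, rest = addr[1:end], addr[end + 1:]
        if rest == "":
            # Bracketed IPv6 address with no port: fall back to the default.
            return host, default_port
        if not rest.startswith(":"):
            raise ValueError("unexpected text after bracket in %r" % addr)
        return host, int(rest[1:])
    if addr.count(":") > 1:
        # A bare IPv6 address is ambiguous: the last group could be a port.
        raise ValueError("IPv6 address %r must be wrapped in brackets" % addr)
    host, sep, port = addr.partition(":")
    return host, int(port) if sep else default_port
```

The bracket requirement is what removes the ambiguity: without it, `2001:db8::1` could be read as an address or as an address followed by port 1.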
James Rasell
fd5a42a6ca docs: clarify data dir default parameters and default creation. (#20268) 2024-04-04 09:20:47 +01:00
James Rasell
facc3e8013 agent: allow configuration of in-memory telemetry sink. (#20166)
This change adds configuration options for setting the in-memory
telemetry sink collection and retention durations. This sink backs
the metrics JSON API and previously had hard-coded default values.

The new options are particularly useful when running development or
debug environments, where metrics collection is desired at a fast
and granular rate.
2024-03-25 15:00:18 +00:00
Tim Gross
d3ddb0aa49 docs: make it clear that federation features require ACLs (#20196)
Our documentation has a hidden assumption that users know that federation
replication requires ACLs to be enabled and bootstrapped. Add notes at some of
the places users are likely to look for it.

A separate follow-up PR to the federation tutorial should point to the ACL
multi-region tutorial as well.

Fixes: https://github.com/hashicorp/nomad/issues/20128
2024-03-22 15:15:00 -04:00
Juana De La Cuesta
56bf253474 Add docs for disconnected block (#20147)
Expand the job settings documentation to include the disconnect block and mark the fields it replaces as deprecated.
2024-03-20 10:08:16 +01:00
Tim Gross
dc39c20e66 docs: make recommendation for collection interval vs scrape interval (#20056)
Metrics tools that "pull" metrics, such as Prometheus, have a configurable
interval for how frequently they scrape metrics. This should be greater than or
equal to the Nomad `telemetry.collection_interval` to avoid re-scraping metrics that
cannot have been updated in that interval.

Fixes: https://github.com/hashicorp/nomad/issues/20055
2024-03-19 08:56:29 -04:00
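Illustration for dc39c20e66 (collection interval vs scrape interval): a small sketch of the comparison the recommendation boils down to. The helper names are hypothetical, and the duration parser handles only single-unit values such as `500ms`, `1s`, or `5m`, which is a simplification.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text):
    """Parse a single-unit duration such as '500ms', '1s', or '5m' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text)
    if match is None:
        raise ValueError("unsupported duration: %r" % text)
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def scrape_interval_is_sensible(scrape_interval, collection_interval):
    """True when scraping happens no more often than metrics are collected."""
    return parse_duration(scrape_interval) >= parse_duration(collection_interval)
```

For example, scraping every `15s` against a `1s` collection interval passes the check, while scraping every `500ms` would only re-read values that cannot have changed.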
Tim Gross
695bb7ffcf docs: improve wording around autoconfiguration via Consul (#20139)
Fixes: https://github.com/hashicorp/nomad/issues/20132
2024-03-15 08:44:58 -04:00
Tim Gross
c1b5850473 docs: add warning not to enable Consul tls.grpc.verify_incoming (#19970)
Consul does not support incoming TLS verification of Envoy. Enabling it results
in hard-to-understand errors like `SSLV3_ALERT_BAD_CERTIFICATE` in the Envoy
allocation logs. Warn users about this.

Closes: https://github.com/hashicorp/nomad/issues/19772
Closes: https://github.com/hashicorp/nomad/issues/16854
Ref: https://github.com/hashicorp/consul/issues/13088
2024-02-14 08:56:35 -05:00
Seth Hoenig
37c497628c docs: describe cloud environments in fingerprint denylist (#19952)
This PR changes the example of the client config option "fingerprint.denylist"
to include all the cloud environment fingerprinters. Each one contains a
2-second HTTP timeout to a metadata endpoint that does not exist if you are not
in that particular cloud. When run in serial on startup, this results in
an 8-second wait where nothing useful is happening.

Closes #16727
2024-02-12 09:57:29 -06:00
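Illustration for 37c497628c (cloud fingerprinters in the denylist): the arithmetic behind the 8-second figure, as a sketch. A 2-second timeout per fingerprinter and an 8-second total imply four fingerprinters run in serial. The four names listed here are assumptions, as is the function name.

```python
# Assumed names of the cloud environment fingerprinters.
CLOUD_FINGERPRINTERS = ["env_aws", "env_azure", "env_digitalocean", "env_gce"]
TIMEOUT_SECONDS = 2  # per-fingerprinter HTTP timeout from the commit message


def worst_case_startup_delay(denylist):
    """Seconds spent waiting on metadata endpoints that do not exist."""
    active = [f for f in CLOUD_FINGERPRINTERS if f not in denylist]
    # Fingerprinters run in serial, so the timeouts add up.
    return TIMEOUT_SECONDS * len(active)
```

Outside any of these clouds, an empty denylist gives 4 x 2 = 8 seconds, and denylisting all four removes the wait entirely.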
Tom Davies
5a11a28cac docs: updates link to Consul WLI migration docs (#19748) 2024-01-17 09:57:02 -05:00
Tim Gross
e7ca2b51ad vault: ignore allow_unauthenticated config if identity is set (#19585)
When the server's `vault` block has a default identity, we don't check the
user's Vault token (and in fact, we warn them on job submit if they've provided
one). But the validation hook still checks for a token if
`allow_unauthenticated` is set to false. This is a misconfiguration but there's
no reason for Nomad not to do the expected thing here.

Fixes: https://github.com/hashicorp/nomad/issues/19565
2024-01-02 16:46:34 -05:00
Mike Nomitch
31f4296826 Adds support for failures before warning to Consul service checks (#19336)
Adds support for failures before warning and failures before critical
to the automatically created Nomad client and server services in Consul.
2023-12-14 11:33:31 -08:00
James Rasell
94b8b7769a docs: add reporting config block documentation. (#19470) 2023-12-14 15:11:29 +00:00
Charlie Voiselle
d2fc7cc0c4 [docs] Note reboot to update bridge_network_hairpin_mode (#19304) 2023-12-12 19:49:15 -05:00
Luiz Aoqui
99d72b7154 docs: fix placement of Consul auth method configs (#19404)
The auth method names are used by Nomad clients, not servers.
2023-12-11 09:16:57 -05:00
Tim Gross
1e51379e56 docs: clarify behavior and recommendations for mTLS vs TLS for HTTP (#19282)
Some of our documentation on `tls` configuration could be more clear as to
whether we're referring to mTLS or TLS. Also, when ACLs are enabled it's fine to
have `verify_https_client=false` (the default). Make it clear that this is an
acceptably secure configuration and that it's in fact recommended in order to
avoid pain of distributing client certs to user browsers.
2023-12-04 15:03:43 -05:00
James Rasell
d041ddc4ee docs: fix up HCL formatting on agent config examples. (#19254) 2023-12-04 08:44:00 +00:00
Luiz Aoqui
125dd4af38 docs: small updates to agent consul (#19285) 2023-12-01 16:40:06 -05:00
Seth Hoenig
b83c1e14c1 docs: fix documentation of client.reserved.cores (#19266) 2023-12-01 13:06:55 -06:00
Tim Gross
2ba459c73a docs: split consul config params into client vs server sections (#19258)
Some sections of the `consul` configuration are relevant only for clients or
servers. We updated our Vault docs to split these parameters out into their own
sections for clarity. Match that for the Consul docs.
2023-12-01 13:37:39 -05:00
Piotr Kazmierczak
248b2ba5cd WI: use single auth method for Consul by default (#19169)
This simplifies the default setup of WI-based authentication for Nomad
workloads in Consul by using a single auth method with two binding rules.

Users can still specify separate auth methods for services and tasks.
2023-11-28 12:22:27 +01:00
Luiz Aoqui
5ff6cce3ab vault: update default JWT auth method path (#19188)
Update default auth method path to be `jwt-nomad` to avoid potential
conflicts when Vault's `jwt` default is already being used for something
else.
2023-11-27 17:48:12 -05:00
Adriano Caloiaro
f66eb83fc0 Add go-netaddrs support to retry_join (#18745) 2023-11-15 10:07:18 -05:00