nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 17:05:43 +03:00

Author	SHA1	Message	Date
James Rasell	7d48aa2667	client: emit optional telemetry from prerun and prestart hooks. (#24556 ) The Nomad client can now optionally emit telemetry data from the prerun and prestart hooks. This allows operators to monitor and alert on failures and time taken to complete. The new datapoints are: - nomad.client.alloc_hook.prerun.success (counter) - nomad.client.alloc_hook.prerun.failed (counter) - nomad.client.alloc_hook.prerun.elapsed (sample) - nomad.client.task_hook.prestart.success (counter) - nomad.client.task_hook.prestart.failed (counter) - nomad.client.task_hook.prestart.elapsed (sample) The hook execution time is useful to Nomad engineering and will help optimize code where possible and understand job specification impacts on hook performance. Currently only the PreRun and PreStart hooks have telemetry enabled, so we limit the number of new metrics being produced.	2024-12-12 14:43:14 +00:00
Piotr Kazmierczak	368241dbf2	security: a more comprehensive env.denylist (#24540 ) A more comprehensive env.denylist that now includes more token, token file and license variables. --------- Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-11-22 18:54:18 +01:00
James Rasell	dc501339da	docs: Add federated region concept and operations pages. (#24477 ) In order to help users understand multi-region federated deployments, this change adds two new sections to the website. The first expands the architecture page, so we can add further detail over time with an initial federation page. The second adds a federation operations page which goes into failure planning and mitigation. Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2024-11-19 12:39:57 +00:00
Eduardo Medeiros	f8c85b036b	docs: remove duplicated word. (#24433 ) remove duplicated word “Using using”	2024-11-11 16:10:10 -05:00
Tim Gross	7381f8419b	docs: clarify requirements for Consul token policies and TTLs (#24167 ) As of #24166, Nomad agents will use their own token to deregister services and checks from Consul. This returns the deregistration path to the pre-Workload Identity workflow. Expand the documentation to make clear why certain ACL policies are required for clients. Additionally, we did not explicitly call out that auth methods should not set an expiration on Consul tokens. Nomad does not have a facility to refresh these tokens if they expire. Even if Nomad could, there's no way to re-inject them into Envoy sidecars for Consul Service Mesh without recreating the task anyways, which is what happens today. Warn users that they should not set an expiration. Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix) Ref: https://hashicorp.atlassian.net/browse/NET-10262	2024-10-11 11:59:21 -04:00
Martijn Vegter	3ecf0d21e2	metrics: introduce client config to include alloc metadata as part of the base labels (#23964 )	2024-10-02 10:55:44 -04:00
Daniel Bennett	2f5cf8efae	networking: option to enable ipv6 on bridge network (#23882 ) by setting bridge_network_subnet_ipv6 in client config Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>	2024-09-04 10:17:10 -05:00
Tim Gross	c43e30a387	WI: interpolate parent job ID in `vault.default_identity.extra_claims` (#23817 ) When we interpolate job fields for the `vault.default_identity.extra_claims` block, we forgot to use the parent job ID when that's available (as we do for all other claims). This changeset fixes the bug and adds a helper method that'll hopefully remind us to do this going forward. Also added a missing changelog entry for #23675 where we implemented the `extra_claims` block originally, which shipped in Nomad 1.8.3. Fixes: https://github.com/hashicorp/nomad/issues/23798	2024-09-03 13:56:36 -04:00
Martijn Vegter	aded4b3500	docs: remove remaining references to network_speed config (#23792 )	2024-08-14 14:14:38 -04:00
Tim Gross	bc50eebebd	workload identity: add support for extra claims config for Vault (#23675 ) Although we encourage users to use Vault roles, sometimes they're going to want to assign policies based on entity and pre-create entities and aliases based on claims. This allows them to use a single default role (or at least a small number of them) that has a templated policy, but have an escape hatch from that. When defining Vault entities the `user_claim` must be unique. When writing Vault binding rules for use with Nomad workload identities the binding rule won't be able to create a 1:1 mapping because the selector language allows accessing only a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a job because of namespaces. It's possible to create a JWT auth role with `bound_claims` to avoid this becoming a security problem, but this doesn't allow for correct accounting of user claims. Add support for an `extra_claims` block on the server's `default_identity` blocks for Vault. This allows a cluster administrator to add a custom claim on all allocations. The values for these claims are interpolatable with a limited subset of fields, similar to how we interpolate the task environment. Fixes: https://github.com/hashicorp/nomad/issues/23510 Ref: https://hashicorp.atlassian.net/browse/NET-10372 Ref: https://hashicorp.atlassian.net/browse/NET-10387	2024-08-05 15:01:54 -04:00
Aimee Ukasick	cbacdb2041	DOCS: CE-659 chroot limitations for isolated fork/exec driver (#23739 )	2024-08-05 14:35:54 -04:00
Tim Gross	9ff7437b06	docs: document `client.alloc_mounts_dir` configuration (#23733 ) In Nomad 1.8.0 we introduced the `alloc_mounts_dir` to support unveil filesystem isolation, but we didn't document the configuration value.	2024-08-05 11:59:47 -04:00
Tim Gross	9d4686c0df	tls: remove deprecated `prefer_server_cipher_suites` field (#23712 ) The TLS configuration object includes a deprecated `prefer_server_cipher_suites` field. In versions of Go prior to 1.17, this property controlled whether a TLS connection would use the cipher suites preferred by the server or by the client. This field is ignored as of 1.17 and, according to the `crypto/tls` docs: "Servers now select the best mutually supported cipher suite based on logic that takes into account inferred client hardware, server hardware, and security." This property has been long-deprecated and leaving it in place may lead to false assumptions about how cipher suites are negotiated in connection to a server. So we want to remove it in Nomad 1.9.0. Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999 Ref: https://hashicorp.atlassian.net/browse/NET-10531	2024-08-01 08:52:05 -04:00
Tim Gross	2ee6043cab	tls: support setting min version to TLS1.3 (#23713 ) Nomad already supports TLS1.3, but not as a minimum version configuration. Update our config validation to allow setting `tls_min_version` to 1.3. Update the documentation to match Vault and warn that the `tls_cipher_suites` field is ignored when TLS is 1.3 Fixes: https://github.com/hashicorp/nomad/issues/20131 Ref: https://hashicorp.atlassian.net/browse/NET-10530	2024-08-01 08:46:32 -04:00
Tim Gross	0f4014b4a9	docs: external KMS configuration (#23600 ) In #23580 we're implementing support for encrypting Nomad's key material with external KMS providers or Vault Transit. This changeset breaks out the documentation from that PR to keep the review manageable and present it to a wider set of reviewers. Ref: https://hashicorp.atlassian.net/browse/NET-10334 Ref: https://github.com/hashicorp/nomad/issues/14852 Ref: https://github.com/hashicorp/nomad/pull/23580	2024-07-19 15:08:54 -04:00
Tim Gross	2f4353412d	keyring: support prepublishing keys (#23577 ) When a root key is rotated, the servers immediately start signing Workload Identities with the new active key. But workloads may be using those WI tokens to sign into external services, which may not have had time to fetch the new public key and which might try to fetch new keys as needed. Add support for prepublishing keys. Prepublished keys will be visible in the JWKS endpoint but will not be used for signing or encryption until their `PublishTime`. Update the periodic key rotation to prepublish keys at half the `root_key_rotation_threshold` window, and promote prepublished keys to active after the `PublishTime`. This changeset also fixes two bugs in periodic root key rotation and garbage collection, both of which can't be safely fixed without implementing prepublishing: * Periodic root key rotation would never happen because the default `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM time table. We now compare the `CreateTime` against the wall clock time instead of the time table. (We expect to remove the time table in future work, ref https://github.com/hashicorp/nomad/issues/16359) * Root key garbage collection could GC keys that were used to sign identities. We now wait until `root_key_rotation_threshold` + `root_key_gc_threshold` before GC'ing a key. * When rekeying a root key, the core job did not mark the key as inactive after the rekey was complete. Ref: https://hashicorp.atlassian.net/browse/NET-10398 Ref: https://hashicorp.atlassian.net/browse/NET-10280 Fixes: https://github.com/hashicorp/nomad/issues/19669 Fixes: https://github.com/hashicorp/nomad/issues/23528 Fixes: https://github.com/hashicorp/nomad/issues/19368	2024-07-19 13:29:41 -04:00
guifran001	1c44521543	client: Add a preferred address family option for network-interface (#23389 ) to prefer ipv4 or ipv6 when deducing IP from network interface Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-07-12 15:30:38 -05:00
Adrian Todorov	3f2729f7f5	remove mentions of old versions of Nomad in various docs (#23567 )	2024-07-12 11:01:32 -04:00
Deniz Onur Duzgun	c82dd76a1b	security: update tls cipher suites (#23551 )	2024-07-11 14:01:45 -04:00
Adrian Todorov	6589d7130b	docs: remove mentions of 'new in Nomad X version' where X is an older version (#23552 )	2024-07-11 13:43:28 -04:00
liukch	cc7a5ed7e2	docs: Fix parameter type and default value in client reserved configuration. (#23359 )	2024-06-21 16:29:59 -04:00
Tim Gross	44078d4786	docs: update configuration docs to include trace-level logging (#23285 )	2024-06-11 09:19:52 -04:00
Piotr Kazmierczak	2a09abc477	metrics: quota utilization configuration and documentation (#22912 ) Introduces support for (optional) quota utilization metrics CE part of the hashicorp/nomad-enterprise#1488 change	2024-06-03 21:06:19 +02:00
Seth Hoenig	8ae1a0e356	docs: add docs around dynamic workload users (#20477 )	2024-04-23 07:57:40 -05:00
astudentofblake	7b7ed12326	func: Allow custom paths to be added to the getter landlock (#20349 ) * func: Allow custom paths to be added to the getter landlock Fixes: 20315 * fix: slices imports fix: more meaningful examples fix: improve documentation fix: quote error output	2024-04-11 15:17:33 -05:00
Tim Gross	8298d39e78	Connect transparent proxy support Add support for Consul Connect transparent proxies Fixes: https://github.com/hashicorp/nomad/issues/10628	2024-04-10 11:00:18 -04:00
Tim Gross	e2e561da88	tproxy: documentation improvements	2024-04-10 08:55:50 -04:00
Tim Gross	8eaf176868	client: fix IPv6 parsing for `client.servers` block (#20324 ) When the `client.servers` block is parsed, we split the port from the address. This does not correctly handle IPv6 addresses when they are in URL format (wrapped in brackets), which we require to disambiguate the port and address. Fix the parser to correctly split out the port and handle a missing port value for IPv6. Update the documentation to make the URL format requirement clear. Fixes: https://github.com/hashicorp/nomad/issues/20310	2024-04-08 15:06:27 -04:00
James Rasell	fd5a42a6ca	docs: clarify data dir default parameters and default creation. (#20268 )	2024-04-04 09:20:47 +01:00
James Rasell	facc3e8013	agent: allow configuration of in-memory telemetry sink. (#20166 ) This change adds configuration options for setting the in-memory telemetry sink collection and retention durations. This sink backs the metrics JSON API and previously had hard-coded default values. The new options are particularly useful when running development or debug environments, where metrics collection is desired at a fast and granular rate.	2024-03-25 15:00:18 +00:00
Tim Gross	d3ddb0aa49	docs: make it clear that federation features require ACLs (#20196 ) Our documentation has a hidden assumption that users know that federation replication requires ACLs to be enabled and bootstrapped. Add notes at some of the places users are likely to look for it. A separate follow-up PR to the federation tutorial should point to the ACL multi-region tutorial as well. Fixes: https://github.com/hashicorp/nomad/issues/20128	2024-03-22 15:15:00 -04:00
Juana De La Cuesta	56bf253474	Add docs for disconnected block (#20147 ) Expand the job settings to include the disconnect block and set as deprecated the fields that will be replaced by it.	2024-03-20 10:08:16 +01:00
Tim Gross	dc39c20e66	docs: make recommendation for collection interval vs scrape interval (#20056 ) Metrics tools that "pull" metrics, such as Prometheus, have a configurable interval for how frequently they scrape metrics. This should be greater than or equal to the Nomad `telemetry.collection_interval` to avoid re-scraping metrics that cannot have been updated in that interval. Fixes: https://github.com/hashicorp/nomad/issues/20055	2024-03-19 08:56:29 -04:00
Tim Gross	695bb7ffcf	docs: improve wording around autoconfiguration via Consul (#20139 ) Fixes: https://github.com/hashicorp/nomad/issues/20132	2024-03-15 08:44:58 -04:00
Tim Gross	c1b5850473	docs: add warning not to enable Consul `tls.grpc.verify_incoming` (#19970 ) Consul does not support incoming TLS verification of Envoy. This failure results in hard-to-understand errors like `SSLV3_ALERT_BAD_CERTIFICATE` in the Envoy allocation logs. Leave a warning about this to users. Closes: https://github.com/hashicorp/nomad/issues/19772 Closes: https://github.com/hashicorp/nomad/issues/16854 Ref: https://github.com/hashicorp/consul/issues/13088	2024-02-14 08:56:35 -05:00
Seth Hoenig	37c497628c	docs: describe cloud environments in fingerprint denylist (#19952 ) This PR changes the example of the client config option "fingerprint.denylist" to include all the cloud environment fingerprinters. Each one contains a 2 second HTTP timeout to a metadata endpoint that does not exist if you are not in that particular cloud. When run in serial on startup, this results in an 8 second wait where nothing useful is happening. Closes #16727	2024-02-12 09:57:29 -06:00
Tom Davies	5a11a28cac	docs: updates link to Consul WLI migration docs (#19748 )	2024-01-17 09:57:02 -05:00
Tim Gross	e7ca2b51ad	vault: ignore `allow_unauthenticated` config if identity is set (#19585 ) When the server's `vault` block has a default identity, we don't check the user's Vault token (and in fact, we warn them on job submit if they've provided one). But the validation hook still checks for a token if `allow_unauthenticated` is set to true. This is a misconfiguration but there's no reason for Nomad not to do the expected thing here. Fixes: https://github.com/hashicorp/nomad/issues/19565	2024-01-02 16:46:34 -05:00
Mike Nomitch	31f4296826	Adds support for failures before warning to Consul service checks (#19336 ) Adds support for failures before warning and failures before critical to the automatically created Nomad client and server services in Consul	2023-12-14 11:33:31 -08:00
James Rasell	94b8b7769a	docs: add reporting config block documentation. (#19470 )	2023-12-14 15:11:29 +00:00
Charlie Voiselle	d2fc7cc0c4	[docs] Note reboot to update `bridge_network_hairpin_mode` (#19304 )	2023-12-12 19:49:15 -05:00
Luiz Aoqui	99d72b7154	docs: fix placement of Consul auth method configs (#19404 ) The auth method names are used by Nomad clients, not servers.	2023-12-11 09:16:57 -05:00
Tim Gross	1e51379e56	docs: clarify behavior and recommendations for mTLS vs TLS for HTTP (#19282 ) Some of our documentation on `tls` configuration could be more clear as to whether we're referring to mTLS or TLS. Also, when ACLs are enabled it's fine to have `verify_https_client=false` (the default). Make it clear that this is an acceptably secure configuration and that it's in fact recommended in order to avoid pain of distributing client certs to user browsers.	2023-12-04 15:03:43 -05:00
James Rasell	d041ddc4ee	docs: fix up HCL formatting on agent config examples. (#19254 )	2023-12-04 08:44:00 +00:00
Luiz Aoqui	125dd4af38	docs: small updates to agent `consul` (#19285 )	2023-12-01 16:40:06 -05:00
Seth Hoenig	b83c1e14c1	docs: fix documentation of client.reserved.cores (#19266 )	2023-12-01 13:06:55 -06:00
Tim Gross	2ba459c73a	docs: split `consul` config params into client vs server sections (#19258 ) Some sections of the `consul` configuration are relevant only for clients or servers. We updated our Vault docs to split these parameters out into their own sections for clarity. Match that for the Consul docs.	2023-12-01 13:37:39 -05:00
Piotr Kazmierczak	248b2ba5cd	WI: use single auth method for Consul by default (#19169 ) This simplifies the default setup of Nomad workloads WI-based authentication for Consul by using a single auth method with 2 binding rules. Users can still specify separate auth methods for services and tasks.	2023-11-28 12:22:27 +01:00
Luiz Aoqui	5ff6cce3ab	vault: update default JWT auth method path (#19188 ) Update default auth method path to be `jwt-nomad` to avoid potential conflicts when Vault's `jwt` default is already being used for something else.	2023-11-27 17:48:12 -05:00
Adriano Caloiaro	f66eb83fc0	Add `go-netaddrs` support to `retry_join` (#18745 )	2023-11-15 10:07:18 -05:00

166 Commits