When a node is garbage collected, any dynamic host volumes on the node are
orphaned in the state store. We generally don't want to automatically collect
these volumes and risk data loss, so #25902 provided a `-force` flag to
remove them via the CLI. But for clusters running on ephemeral cloud
instances (e.g., AWS EC2 in an autoscaling group), manually deleting host
volumes may add excessive friction. Add a knob to the client configuration to
remove host volumes from the state store on node GC.
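A minimal sketch of the new knob, assuming a hypothetical attribute
name:

    client {
      # hypothetical name: delete this node's dynamic host volumes from
      # the state store when the node is garbage collected
      gc_volumes_on_node_gc = true
    }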
Ref: https://github.com/hashicorp/nomad/pull/25902
Ref: https://github.com/hashicorp/nomad/issues/25762
Ref: https://hashicorp.atlassian.net/browse/NMD-705
* Set MaxAllocations in client config
* Add NodeAllocationTracker struct to Node struct
* Evaluate MaxAllocations in AllocsFit function
* Set up CLI config parsing
* Integrate maxAllocs into AllocatedResources view
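A minimal sketch of the client configuration, assuming a hypothetical
attribute name for the MaxAllocations setting:

    client {
      # hypothetical name: cap the number of allocations this node accepts
      node_max_allocs = 50
    }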
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Nomad Enterprise users operating in air-gapped or otherwise secured environments
don't want to send license reporting metrics directly from their
servers. Implement manual/offline reporting by periodically recording usage
metrics snapshots in the state store, and providing an API and CLI by which
cluster administrators can download the snapshot for review and out-of-band
transmission to HashiCorp.
This is the CE portion of the work required for implementation in the Enterprise
product. Nomad CE does not perform utilization reporting.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673
Ref: https://hashicorp.atlassian.net/browse/NMD-68
Ref: https://go.hashi.co/rfc/nmd-210
The server startup could appear to "hang" from an operator's point of
view if a key loaded from the FSM at startup could not be decrypted or
replicated.
To prevent this, the server startup function now uses a timeout when
waiting for the encrypter to become ready. If the timeout is reached,
the error is returned to the caller, which fails the CLI command.
Bubbling up the error also flushes it to the logs, providing additional
operator feedback.
Only keys loaded from the FSM snapshot and trailing logs need to be
processed before the encrypter is classed as ready. So that the
encrypter's ready check is not blocked by keys added outside of the
initial Raft load, we take a snapshot of the pending decryption tasks
as we enter the blocking call and use those as our barrier.
* Docs: Fix broken links in main for 1.10 release
* Implement Tim's suggestions
* Remove link to Portworx from ecosystem page
* remove "Portworx" since Portworx 3.2 no longer supports Nomad
When using Nomad with Consul, each Nomad agent is expected to have a Consul
agent running alongside. When using Nomad Enterprise and Consul Enterprise
together, the Consul agent may be in a Consul admin partition. In order for
Nomad's "anti-entropy" sync to work with Consul, the Consul ACL token and ACL
policy for the Nomad client must be in the same admin partition as the Consul
agent. Otherwise, we can register services (via workload identity) but won't be
able to deregister them unless they're in the default namespace.
Ref: https://hashicorp.atlassian.net/browse/NET-12361
When configuring Consul to use Nomad workload identities, you create the Consul
auth method in the default namespace. If you're using Consul Enterprise
namespaces, there are two available approaches: one is to create the tokens in
the default namespace and give them policies that define cross-namespace access,
and the other is to use binding rules that map the login to a particular
namespace. The latter is what we show in our docs, but this was missing a note
that any roles (and their associated policies) targeted by `-bind-type role`
need to exist in the Consul namespace we're logging into.
Also, in Nomad CE, the `consul.namespace` flag is always treated as having been set to
`"default"`. That is, we ignore it and don't return an error even though it's a
Nomad ENT-only feature. Clarify this in the documentation for the field the same
way we've done for the `cluster` field.
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
* Basic implementation for server members and node status
* Commands for alloc status and job status
* -ui flag for most commands
* url hints for variables
* url hints for job dispatch, evals, and deployments
* agent config ui.cli_url_links to disable
* Fix an issue where path prefix was presumed for variables
* driver uncomment and general cleanup
* -ui flag on the generic status endpoint
* Job run command gets namespaces, and no longer gets ui hints for --output flag
* Dispatch command hints get a namespace, and a bunch of tests
* Lots of tests depend on specific output, so let's not mess with them
* figured out what flagAddress is all about for testServer, oof
* Parallel outside of test instances
* Browser-opening test, sorta
* Env var for disabling/enabling CLI hints
* Addressing a few PR comments
* CLI docs available flags now all have -ui
* PR comments addressed; switched the env var to be consistent and scrunched monitor-adjacent hints a bit more
* ui.Output -> ui.Warn; moves hints from stdout to stderr
* isTerminal check and parseBool on command option
* terminal.IsTerminal check removed for test-runner-not-being-terminal reasons
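A minimal sketch of that agent configuration knob; the exact block
layout is an assumption:

    ui {
      # disable URL hints appended to CLI command output
      cli_url_links = false
    }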
Before the fixes in #20165, the wait feature was disabled by
default. After these changes, it's always enabled, which - at
least on some platforms - leads to a significant increase in
load (5-7x).
This patch allows disabling the wait feature in the client
stanza of the configuration file by setting min and max to 0:
    wait {
      min = "0"
      max = "0"
    }
Per-template wait blocks in the task description still work like
one would expect.
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.
The deprecated Vault config options used by the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.
Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.
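A hedged sketch of the jobspec migration, assuming a hypothetical role
name configured in Vault:

    vault {
      # replaces the removed "policies" parameter; the role must exist
      # in Vault and permit the workload identity to log in
      role = "nomad-workloads"
    }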
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:
* restart: when the tasks of an allocation fail and we try to restart the tasks
in place.
* reschedule: when the `restart` block runs out of attempts (or the allocation
fails before tasks even start), and we need to move
the allocation to another node to try again.
* migrate: when the user has asked to drain a node and we need to move the
allocations. These are not failures, so we don't want to propagate the
reschedule tracker.
* replacement: when a node is lost, we don't count that against the `reschedule`
tracker for the allocations on the node (it's not the allocation's "fault",
after all). We don't want to run the `migrate` machinery here either, as we
can't contact the down node. To the scheduler, this is effectively the same as
if we bumped the `group.count`.
* replacement for `disconnect.replace = true`: this is a replacement, but the
replacement is intended to be temporary, so we propagate the reschedule tracker.
Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.
Fixes: https://github.com/hashicorp/nomad/issues/24918
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.
The new datapoints are:
- nomad.client.alloc_hook.prerun.success (counter)
- nomad.client.alloc_hook.prerun.failed (counter)
- nomad.client.alloc_hook.prerun.elapsed (sample)
- nomad.client.task_hook.prestart.success (counter)
- nomad.client.task_hook.prestart.failed (counter)
- nomad.client.task_hook.prestart.elapsed (sample)
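These can be collected like any other agent metrics; a minimal sketch,
assuming Prometheus-based collection:

    telemetry {
      # expose Prometheus-formatted metrics, including the new hook
      # datapoints, at the agent's /v1/metrics endpoint
      prometheus_metrics = true
    }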
The hook execution time is useful to Nomad engineering and will help
us optimize code where possible and understand how job specifications
impact hook performance.
Currently only the PreRun and PreStart hooks have telemetry enabled,
which limits the number of new metrics being produced.
Make env.denylist more comprehensive: it now includes additional token, token
file, and license variables.
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
In order to help users understand multi-region federated
deployments, this change adds two new sections to the website.
The first expands the architecture page with an initial federation
page, so we can add further detail over time. The second adds
a federation operations page which goes into failure planning and
mitigation.
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
As of #24166, Nomad agents will use their own token to deregister services and
checks from Consul. This returns the deregistration path to the pre-Workload
Identity workflow. Expand the documentation to make clear why certain ACL
policies are required for clients.
Additionally, we did not explicitly call out that auth methods should not set an
expiration on Consul tokens. Nomad does not have a facility to refresh these
tokens if they expire. Even if Nomad could, there's no way to re-inject them
into Envoy sidecars for Consul Service Mesh without recreating the task anyway,
which is what happens today. Warn users that they should not set an expiration.
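As a hedged illustration (not the exact policy from the docs), the
Nomad client's Consul agent token needs rules along these lines so that
deregistration keeps working:

    service_prefix "" {
      policy = "write"  # register and deregister services and checks
    }

    node_prefix "" {
      policy = "write"
    }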
Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix)
Ref: https://hashicorp.atlassian.net/browse/NET-10262
When we interpolate job fields for the `vault.default_identity.extra_claims`
block, we forgot to use the parent job ID when that's available (as we do for
all other claims). This changeset fixes the bug and adds a helper method that'll
hopefully remind us to do this going forward.
Also added a missing changelog entry for #23675 where we implemented the
`extra_claims` block originally, which shipped in Nomad 1.8.3.
Fixes: https://github.com/hashicorp/nomad/issues/23798
Although we encourage users to use Vault roles, sometimes they're going to want
to assign policies based on entity and pre-create entities and aliases based on
claims. This allows them to use a single default role (or at least a small
number of them) that has a templated policy, but have an escape hatch from that.
When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.
Add support for an `extra_claims` block on the server's `default_identity`
blocks for Vault. This allows a cluster administrator to add a custom claim on
all allocations. The values for these claims are interpolatable with a limited
subset of fields, similar to how we interpolate the task environment.
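A minimal sketch of the new block, assuming a hypothetical claim name
and interpolation pattern:

    vault {
      default_identity {
        aud = ["vault.io"]
        ttl = "1h"

        extra_claims {
          # hypothetical claim: uniquely identify a job across namespaces
          nomad_workload_id = "${job.region}:${job.namespace}:${job.id}"
        }
      }
    }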
Fixes: https://github.com/hashicorp/nomad/issues/23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In versions of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."
This property has been long-deprecated, and leaving it in place may lead to false
assumptions about how cipher suites are negotiated when connecting to a server. So
we want to remove it in Nomad 1.9.0.
Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might not re-fetch keys on demand.
Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.
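For reference, a hedged sketch of the server configuration values that
drive this schedule (the values shown are assumptions):

    server {
      # with a 720h rotation threshold, a replacement key is prepublished
      # at roughly the 360h mark and promoted after its PublishTime
      root_key_rotation_threshold = "720h"
      root_key_gc_threshold       = "1h"
    }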
This changeset also fixes bugs in periodic root key rotation and garbage
collection which couldn't be safely fixed without implementing
prepublishing:
* Periodic root key rotation would never happen because the default
`root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
time table. We now compare the `CreateTime` against the wall clock time instead
of the time table. (We expect to remove the time table in future work, ref
https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
identities. We now wait until `root_key_rotation_threshold` +
`root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
the rekey was complete.
Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
When the `client.servers` block is parsed, we split the port from the
address. This does not correctly handle IPv6 addresses when they are in URL
format (wrapped in brackets), which we require in order to disambiguate the
port from the address.
Fix the parser to correctly split out the port and handle a missing port value
for IPv6. Update the documentation to make the URL format requirement clear.
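A minimal sketch of the URL-style bracketed form, assuming the default
RPC port:

    client {
      servers = ["[2001:db8::1]:4647", "10.0.0.2:4647"]
    }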
Fixes: https://github.com/hashicorp/nomad/issues/20310