nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Brendan MacDonell	26485c45a2	Add job_max_count option to keep Nomad server from running out of memory (#26858 ) If a Nomad job is started with a large number of instances (e.g. 4 billion), then the Nomad servers that attempt to schedule it will run out of memory and crash. While it's unlikely that anyone would intentionally schedule a job with 4 billion instances, we have occasionally run into issues with bugs in external automation. For example, an automated deployment system running on a test environment had an off-by-one error, and deployed a job with count = uint32(-1), causing the Nomad servers for that environment to run out of memory and crash. To prevent this, this PR introduces a job_max_count Nomad server configuration parameter. job_max_count limits the number of allocs that may be created from a job. The default value is 50000 - this is low enough that a job with the maximum possible number of allocs will not require much memory on the server, but is still much higher than the number of allocs in the largest Nomad job we have ever run.	2025-10-06 09:35:10 -04:00
Allison Larson	e40164abce	Add preserve-resources flag (#26841 ) * Add preserve-resources flag when registering a job * Add preserve-resources flag to website docs * Add changelog * Update tests, docs * Preserve counts & resources in fsm * Update doc * Update preservation of resources/count to happen in StateStore	2025-10-02 13:56:59 -07:00
Michael Smithhisler	f2b831a430	docs: add job spec and plugin authoring pages for secrets (#26529 ) --------- Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-10-01 10:46:12 -04:00
Chris Roberts	1cf8d35245	docs: fix broken link for plugin guide (#26843 )	2025-09-30 09:21:28 -05:00
James Rasell	e6a04e06d1	acl: Check for duplicate or invalid keys when writing new policies (#26836 ) ACL policies are parsed when creating, updating, or compiling the resulting ACL object when used. This parsing was silently ignoring duplicate singleton keys, or invalid keys which does not grant any additional access, but is a poor UX and can be unexpected. This change parses all new policy writes and updates, so that duplicate or invalid keys return an error to the caller. This is called strict parsing. In order to correctly handle upgrades of clusters which have existing policies that would fall foul of the change, a lenient parsing mode is also available. This allows the policy to continue to be parsed and compiled after an upgrade without the need for an operator to correct the policy document prior to further use. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-09-30 08:16:59 +01:00
James Rasell	61a4a02166	docs: Add node identity concepts page and other missing items. (#26830 ) Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-09-26 07:44:58 +01:00
Tim Gross	b5530128df	docs: expand on allocation GC details (#26792 ) Expand on the documentation of allocation garbage collection: * Explain that server-side GC of allocations is tied to the GC of the evaluation that spawned the allocation. * Explain that server-side GC of allocations will force them to be immediately GC'd on the client regardless of the client-side configurations. Ref: https://github.com/hashicorp/nomad/issues/26765 Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com> Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2025-09-19 12:17:17 -04:00
Aimee Ukasick	fca783c566	Add 1.10.5 release notes (#26782 )	2025-09-17 08:59:43 -05:00
James Rasell	ac5a77af56	docs: Add client identity HTTP API detail on api-docs page. (#26774 ) Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>	2025-09-17 14:05:37 +01:00
Michael Smithhisler	1a19a16ee9	docs: fix link in multiregion job spec page (#26755 )	2025-09-16 13:00:42 -05:00
Tim Gross	ac86225e09	metrics: reduce heap usage of eval broker metrics (#26737 ) The metrics on the eval broker include labels for the job ID, but under a high volume of dispatch workloads, this results in excessive heap usage on the leader. Dispatch workloads should use their parent ID rather than their child ID for any metrics we collect. Also, eliminate an extra copy of the labels. And remove the extremely high cardinality `"eval_id"` label from the `nomad.broker.eval_waiting` metric. Fixes: https://github.com/hashicorp/nomad/issues/26657	2025-09-12 08:29:46 -04:00
Tim Gross	db8ecac20d	docs: include Consul namespace claim mapping in auth config example (#26730 ) When configuring Nomad Enterprise with Consul Enterprise and multiple namespaces, you need to include the `consul_namespace` mapping in the auth method configuration. Otherwise you'll see an error like "unknown variable accessed: value.consul_namespace". There's no example of the updated auth method configuration you need, which makes this detail unclear when we're showing the claim being used in the following `consul acl auth-method create` command.	2025-09-08 15:15:47 -04:00
James Rasell	1916a16311	exec: Set LOGNAME env var on exec based drivers. (#26703 ) Typically the `LOGNAME` environment variable should be set according to the values within `/etc/passwd` and represents the name of the logged in user. This should be set, where possible, alongside the USER and HOME variables for all drivers that use the shared executor and do not use a sub-shell.	2025-09-05 14:07:27 +01:00
Daniel Bennett	9682aa2724	consul connect: allow "cni/*" network mode (#26449 ) don't require "bridge" network mode when using connect{} we document this as "at your own risk" because CNI configuration is so flexible that we can't guarantee a user's network will work, but Nomad's "bridge" CNI config may be used as a reference.	2025-09-04 12:29:50 -04:00
Chris Roberts	c3dcdb5413	[cli] Add windows service commands (#26442 ) Adds a new `windows` command which is available when running on a Windows host. The command includes two new subcommands: * `service install` * `service uninstall` The `service install` command will install the called binary into the Windows program files directory, create a new Windows service, set up configuration and data directories, and register the service with the Windows eventlog. If the service and/or binary already exist, the service will be stopped, service and eventlog updated if needed, binary replaced, and the service started again. The `service uninstall` command will stop the service, remove the Windows service, and deregister the service with the eventlog. It will not remove the configuration/data directory nor will it remove the installed binary.	2025-09-02 16:40:35 -07:00
Chris Roberts	fd1e40537c	[artifact] add artifact inspection after download (#26608 ) This adds artifact inspection after download to detect any issues with the content fetched. Currently this means checking for any symlinks within the artifact that resolve outside the task or allocation directories. On platforms where lockdown is available (some Linux) this inspection is not performed. The inspection can be disabled with the DisableArtifactInspection option. A dedicated option for disabling this behavior allows the DisableFilesystemIsolation option to be enabled but still have artifacts inspected after download.	2025-08-27 10:37:34 -07:00
James Rasell	71e66231f9	docs: Add node identity and introduction CLI, API, and config docs (#26516 ) Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>	2025-08-26 15:26:00 +01:00
Aimee Ukasick	bb7114e518	Docs Chore: Add release notes for 1.10.1-1.10.3 (#26593 ) * add 1.10.3 * add 1.10.2 * Add 1.10.1 release notes; add partials to share * address feedback	2025-08-25 09:38:15 -05:00
Michael Schurter	ee5059a6a7	docs: revert to labels={"foo.bar": "baz"} style (#26535 ) * docs: revert to labels={"foo.bar": "baz"} style Back in #24074 I thought it was necessary to wrap labels in a list to support quoted keys in hcl2. This... doesn't appear to be true at all? The simpler `labels={...}` syntax appears to work just fine. I updated the docs and a test (and modernized it a bit). I also switched some other examples to the `labels = {}` format from the old `labels{}` format. * copywronged * fmtd	2025-08-20 09:26:42 -07:00
Aimee Ukasick	c17b15f8d0	change overview pages usage to use plaintext code block (#26575 )	2025-08-19 09:47:37 -05:00
Tim Gross	b8b95eb918	docs: warn against enabling Prometheus metrics if not in use (#26560 ) The go-metrics library retains Prometheus metrics in memory until expiration, but the expiration logic requires that the metrics are being regularly scraped. If you don't have a Prometheus server scraping, this leads to ever-increasing memory usage. In particular, high volume dispatch workloads emit a large set of label values and if these are not eventually aged out the bulk of Nomad server memory can end up consumed by metrics.	2025-08-19 08:44:16 -04:00
Daniel Bennett	fdd46e6fd3	docs: cni: add tproxy conflist example (#26532 )	2025-08-18 12:04:34 -04:00
Aimee Ukasick	52b8deeb3b	Docs: Add 1.10.4 release notes (#26524 ) * 1.10.4 release notes * update node version in package.json so Vercel builds * revert node version * address feedback; add missing "-" to debug parms	2025-08-18 11:04:06 -05:00
Austin Workman	26f02c25c6	docs: Update virt install.mdx (#26531 ) Fixing plugin name in nomad client plugin config example.	2025-08-18 10:58:15 -05:00
Frédéric Praca	7b9bebd653	[Doc] Fix link for Nomad event stream page (#26522 ) * fix(doc): fix links for task driver plugins host URL was wrong, changed from develoepr to developer * Update stateful-workloads.mdx Fix link for Nomad event stream page	2025-08-14 18:29:44 -05:00
Aimee Ukasick	befc755f98	Docs Nomad Pack: Add CLI command reference (#26508 ) * Add CLI commands to Nomad Pack docs. * organize subcommands into directories * seo updates; style guide clean up	2025-08-14 09:22:42 -05:00
Aimee Ukasick	9bcfe7bd36	Docs: Update SSO with Auth0 guide (#26488 ) * initial * Update for Auth0 changes. * updated to end * fix URL with double forward slashes	2025-08-12 09:34:23 -05:00
Adiel Cristo	d4eb251004	fix(docs): remove incomplete phrase fragment (#26489 )	2025-08-11 07:40:36 -05:00
Juana De La Cuesta	225ac2938a	Add new metric for queue size to the autoscaler (#26453 ) * docs: add a new metric to the autoscaler for the size of the execution queue * Update telemetry.mdx * Update telemetry.mdx	2025-08-11 10:26:57 +02:00
Aimee Ukasick	d305f32017	Docs: Plugin authoring guide (#26395 ) * create plugin author guide; remove concepts/plugins * style guide; update links * update cni redirect * move host-volume plugin to /plugins/. Add arch host volume content. * Apply Jeff's style guide updates Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Create Base plugin API section, link to BasePlugin interface --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2025-08-08 14:55:58 -05:00
Wim	f712d5db90	Add AllocIPv6 option to allow IPv6 address being used for service registration (#25632 ) Fixes #25627 by adding an extra `alloc_advertise_ipv6` option similar to the `AdvertiseIPv6Addr` with the docker driver config. Fixes: https://github.com/hashicorp/nomad/issues/25627	2025-08-08 15:01:46 -04:00
Michael Smithhisler	b6f90d0562	docs: fix indent on vault create_from_role (#26472 )	2025-08-07 16:03:33 -05:00
Daniel Bennett	3c435d2953	docs: cni: add ipv6 bridge example (#26456 )	2025-08-07 16:16:45 -04:00
Tim Gross	5d8e8df7bd	docs: clarify consumers of environment variables for CLI (#26459 ) In https://github.com/hashicorp/nomad/issues/15459 we've had a bit of back-and-forth as a result of applying Nomad environment variables where they typically should not be used. Clarify that the env vars are for the CLI and mostly not for the agent. Also move the `NOMAD_CLI_SHOW_HINTS` description into the correct section.	2025-08-07 15:47:32 -04:00
Tim Gross	9717719502	docs: fix missing entry from template function_denylist (#26458 ) The docs for the `template` block accurately describe the template configuration default function denylist in the body but the default parameters are missing values. The equivalent docs in the `client` configuration are missing `executeTemplate` as well.	2025-08-07 15:47:14 -04:00
Allison Larson	e16a3339ad	Add CSI Volume Sentinel Policy scaffolding (#26438 ) * Add ent policy enforcement stubs to CSI Volume create/register * Wire policy override/warnings through CSI volume register/create * Add new scope to sentinel apply * Sanitize CSISecrets & CSIMountOptions * Add sentinel policy scope to ui * Update docs for new sentinel scope/policy * Create new api funcs for CSI endpoints * fix sentinel csi ui test * Update sentinel-policy docs * Add changelog * Update docs from feedback	2025-08-07 12:03:18 -07:00
Michael Schurter	0f630004b9	docs: Once -> once (#26435 )	2025-08-05 11:10:25 -07:00
tehut	21841d3067	Add historical journald and log export flags to operator debug command (#26410 ) * Add -log-file-export and -log-lookback commands to add historical log to debug capture * use monitor.PrepFile() helper for other historical log tests	2025-08-04 13:55:25 -07:00
tehut	d709accaf5	Add nomad monitor export command (#26178 ) * Add MonitorExport command and handlers * Implement autocomplete * Require nomad in serviceName * Fix race in StreamReader.Read * Add and use framer.Flush() to coordinate function exit * Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer * Parameterize StreamFixed stream size	2025-08-01 10:26:59 -07:00
Aimee Ukasick	5dc7e7fe25	Docs: Chore: Ent labels (#26323 ) * replace outdated tutorial links * update more tutorial links * Add CE/ENT or ENT to left nav * remove ce/ent labels * revert enterprise features	2025-07-30 09:02:28 -05:00
Tim Gross	501608ca68	docs: document handling of unset affinity/constraint values (#26354 ) Affinities and constraints use similar feasibility checking logic to determine if a given node matches (although affinities don't support all the same operators). Most operators don't allow `value` to be unset. Update the docs to reflect this. Fixes: https://github.com/hashicorp/nomad/issues/24983	2025-07-28 14:12:43 -04:00
Tim Gross	b286a8ee9c	docs: update Consul/Vault compatibility matrix (#26368 ) Update our support matrix to show currently-supported versions of Consul, Vault, and Nomad.	2025-07-28 13:48:38 -04:00
Tim Gross	192dec4297	docs: fix self-referencing link for raw_exec driver config (#26353 ) During the big docs rearchitecture, we split up the task driver pages into separate job declaration and driver configuration pages. The link for the `raw_exec` driver to the configuration page is a self-reference.	2025-07-28 13:48:23 -04:00
Tim Gross	513ec02486	docs: explain access modes for CSI and DHV volumes (#26352 ) The documentation for CSI and DHV has a list of the available access modes, but doesn't explain what they mean in terms of what jobs can request, the scheduler behavior, or the CSI plugin behavior. Expand on the information available in the CSI specification and provide a description of DHV's behavior as well. Ref: https://github.com/container-storage-interface/spec/blob/master/spec.md#createvolume	2025-07-28 13:48:01 -04:00
Aimee Ukasick	ccaa3b7325	add table to service.port entry (#26344 )	2025-07-24 14:00:05 -05:00
Tim Gross	b91d1726ce	docs: clarify namespace support in autoscaler (#26337 ) The current autoscaler docs imply that it has minimal or non-working support for Nomad namespaces, whereas in fact the namespace support works fine but just doesn't allow configuring multiple namespaces without using a wildcard (for now). Make this more clear and fix the reference to the configuration "below", which is no longer on that same page. Ref: https://github.com/hashicorp/nomad-autoscaler/issues/65	2025-07-24 12:16:24 -04:00
Aimee Ukasick	55926afe11	Docs: Clarify service.connect examples (#26330 ) * Docs: CE-997 clarify connect examples * fix DSN typos * CE-996 clarify agent config consul.client_auto_join * add (formerly Consul Connect) * remove 'Nomad and Consul are	2025-07-24 10:59:03 -05:00
Aimee Ukasick	e6d63faf58	Fix typo (#26319 )	2025-07-22 09:53:31 -05:00
Michael Smithhisler	36b4aa79df	docs: fix link to nomad schedulers (#26302 )	2025-07-21 08:53:29 -05:00
Aimee Ukasick	0d620607fe	add blog links and video to nomad vs k8s (#26286 )	2025-07-16 12:56:42 -05:00

1482 Commits