5124 Commits

Author SHA1 Message Date
Brendan MacDonell
26485c45a2 Add job_max_count option to keep Nomad server from running out of memory (#26858)
If a Nomad job is started with a large number of instances (e.g. 4 billion),
then the Nomad servers that attempt to schedule it will run out of memory and
crash. While it's unlikely that anyone would intentionally schedule a job with 4
billion instances, we have occasionally run into issues with bugs in external
automation. For example, an automated deployment system running on a test
environment had an off-by-one error, and deployed a job with count = uint32(-1),
causing the Nomad servers for that environment to run out of memory and crash.

To prevent this, this PR introduces a job_max_count Nomad server configuration
parameter. job_max_count limits the number of allocs that may be created from a
job. The default value is 50000 - this is low enough that a job with the maximum
possible number of allocs will not require much memory on the server, but is
still much higher than the number of allocs in the largest Nomad job we have
ever run.
2025-10-06 09:35:10 -04:00
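The job_max_count commit above caps the number of allocs a single job may request. Below is a minimal Go sketch of such a check. It assumes the limit applies to the sum of all task group counts. `TaskGroup`, `Job`, and `validateJobCount` are hypothetical names, not Nomad's actual structs or validation code. The comparison is arranged so the running total cannot overflow, even when a count like `uint32(-1)` arrives from buggy automation.

```go
package main

import "fmt"

// TaskGroup and Job are hypothetical, trimmed-down stand-ins for Nomad's
// structs; only the fields this check needs are included.
type TaskGroup struct {
	Name  string
	Count int64
}

type Job struct {
	ID         string
	TaskGroups []TaskGroup
}

// validateJobCount rejects a job whose total requested allocs exceed maxCount.
// Assumption: the limit applies to the sum of all task group counts.
func validateJobCount(job Job, maxCount int64) error {
	var total int64
	for _, tg := range job.TaskGroups {
		if tg.Count < 0 {
			return fmt.Errorf("task group %q has negative count %d", tg.Name, tg.Count)
		}
		// total <= maxCount holds here, so maxCount-total cannot overflow.
		if tg.Count > maxCount-total {
			return fmt.Errorf("job %q requests more than %d allocs", job.ID, maxCount)
		}
		total += tg.Count
	}
	return nil
}

func main() {
	const jobMaxCount = 50000 // the default described in the commit message

	ok := Job{ID: "web", TaskGroups: []TaskGroup{{Name: "api", Count: 3}, {Name: "cache", Count: 2}}}
	bad := Job{ID: "oops", TaskGroups: []TaskGroup{{Name: "api", Count: 4294967295}}} // uint32(-1)

	fmt.Println("web:", validateJobCount(ok, jobMaxCount))
	fmt.Println("oops:", validateJobCount(bad, jobMaxCount))
}
```

Rejecting the job at submission time means the scheduler never tries to materialize billions of allocations in memory.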
Allison Larson
e40164abce Add preserve-resources flag (#26841)
* Add preserve-resources flag when registering a job

* Add preserve-resources flag to website docs

* Add changelog

* Update tests, docs

* Preserve counts & resources in fsm

* Update doc

* Update preservation of resources/count to happen in StateStore
2025-10-02 13:56:59 -07:00
Michael Smithhisler
f2b831a430 docs: add job spec and plugin authoring pages for secrets (#26529)

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-10-01 10:46:12 -04:00
Chris Roberts
1cf8d35245 docs: fix broken link for plugin guide (#26843) 2025-09-30 09:21:28 -05:00
James Rasell
e6a04e06d1 acl: Check for duplicate or invalid keys when writing new policies (#26836)
ACL policies are parsed when they are created or updated, and when the
resulting ACL object is compiled for use. This parsing silently ignored
duplicate singleton keys and invalid keys. Ignoring them does not grant
any additional access, but it is a poor UX and can be unexpected.

This change parses all new policy writes and updates, so that
duplicate or invalid keys return an error to the caller. This is
called strict parsing. In order to correctly handle upgrades of
clusters which have existing policies that would fall foul of the
change, a lenient parsing mode is also available. This allows
the policy to continue to be parsed and compiled after an upgrade
without the need for an operator to correct the policy document
prior to further use.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-09-30 08:16:59 +01:00
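The strict/lenient split described in the ACL commit above can be illustrated with a toy parser. This is a minimal sketch only. It uses a flattened one-`key = value`-per-line format instead of real HCL policy blocks, and `knownKeys`, `parsePolicy`, `strictMode`, and `lenientMode` are hypothetical names. Which value lenient mode keeps for a duplicated key (here, the first) is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// knownKeys is a hypothetical set of singleton keys for a toy, flattened
// "key = value" policy format; real Nomad policies are HCL blocks.
var knownKeys = map[string]bool{"agent": true, "node": true, "operator": true, "quota": true}

type parseMode int

const (
	strictMode  parseMode = iota // used for new policy writes and updates
	lenientMode                  // used for policies stored before an upgrade
)

// parsePolicy parses one "key = value" pair per line. Strict mode rejects
// duplicate or unknown keys; lenient mode skips them and keeps the first
// value seen for a duplicated key (which value wins is an assumption).
func parsePolicy(doc string, mode parseMode) (map[string]string, error) {
	out := make(map[string]string)
	for i, line := range strings.Split(doc, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found {
			return nil, fmt.Errorf("line %d: expected key = value", i+1)
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		if !knownKeys[key] {
			if mode == strictMode {
				return nil, fmt.Errorf("line %d: invalid key %q", i+1, key)
			}
			continue
		}
		if _, dup := out[key]; dup {
			if mode == strictMode {
				return nil, fmt.Errorf("line %d: duplicate key %q", i+1, key)
			}
			continue
		}
		out[key] = value
	}
	return out, nil
}

func main() {
	doc := "node = read\nnode = write\nnodes = read"
	_, err := parsePolicy(doc, strictMode)
	fmt.Println("strict:", err)
	policy, err := parsePolicy(doc, lenientMode)
	fmt.Println("lenient:", policy, err)
}
```

Keeping both modes lets new writes fail loudly while already-stored policies continue to compile after an upgrade.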
James Rasell
61a4a02166 docs: Add node identity concepts page and other missing items. (#26830)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-09-26 07:44:58 +01:00
Tim Gross
b5530128df docs: expand on allocation GC details (#26792)
Expand on the documentation of allocation garbage collection:
* Explain that server-side GC of allocations is tied to the GC of the
evaluation that spawned the allocation.
* Explain that server-side GC of allocations will force them to be immediately
GC'd on the client regardless of the client-side configurations.

Ref: https://github.com/hashicorp/nomad/issues/26765

Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2025-09-19 12:17:17 -04:00
Jeff Boruszak
6dce21bc85 Merge pull request #26682 from hashicorp/docs/versioned-redirect-fix
docs: Versioned docs redirect fixes
2025-09-17 08:58:37 -07:00
Aimee Ukasick
fca783c566 Add 1.10.5 release notes (#26782) 2025-09-17 08:59:43 -05:00
James Rasell
ac5a77af56 docs: Add client identity HTTP API detail on api-docs page. (#26774)
Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
2025-09-17 14:05:37 +01:00
boruszak
8ab61f37b3 Fix accidental "s 2025-09-16 14:23:59 -07:00
Michael Smithhisler
1a19a16ee9 docs: fix link in multiregion job spec page (#26755) 2025-09-16 13:00:42 -05:00
dependabot[bot]
2baeffec92 chore(deps-dev): bump prettier from 3.5.3 to 3.6.2 in /website (#26162)
Bumps [prettier](https://github.com/prettier/prettier) from 3.5.3 to 3.6.2.
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prettier/prettier/compare/3.5.3...3.6.2)

---
updated-dependencies:
- dependency-name: prettier
  dependency-version: 3.6.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 08:51:31 -04:00
Tim Gross
ac86225e09 metrics: reduce heap usage of eval broker metrics (#26737)
The metrics on the eval broker include labels for the job ID, but under a high
volume of dispatch workloads, this results in excessive heap usage on the
leader. Dispatch workloads should use their parent ID rather than their child ID
for any metrics we collect.

Also, eliminate an extra copy of the labels, and remove the extremely
high-cardinality `"eval_id"` label from the `nomad.broker.eval_waiting` metric.

Fixes: https://github.com/hashicorp/nomad/issues/26657
2025-09-12 08:29:46 -04:00
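The eval broker metrics commit above folds dispatched child jobs onto their parent ID so that label cardinality stays bounded. Below is a minimal Go sketch of that idea. `Job`, `metricJobID`, and `waitingEvalLabels` are hypothetical, and the label names are illustrative rather than Nomad's exact go-metrics labels.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down job record.
type Job struct {
	ID         string
	ParentID   string // set on children of a parameterized (dispatch) parent
	Dispatched bool
}

// metricJobID picks the job ID used as a metric label. Dispatched children are
// folded onto their parent, so thousands of dispatches share one label value.
func metricJobID(j Job) string {
	if j.Dispatched && j.ParentID != "" {
		return j.ParentID
	}
	return j.ID
}

// waitingEvalLabels builds the labels for a waiting-eval metric. There is
// deliberately no per-evaluation ID label, since its cardinality is unbounded.
func waitingEvalLabels(namespace string, j Job) map[string]string {
	return map[string]string{
		"namespace": namespace,
		"job":       metricJobID(j),
	}
}

func main() {
	parent := "batch-processor"
	seen := make(map[string]bool)
	for i := 0; i < 1000; i++ {
		child := Job{
			ID:         fmt.Sprintf("%s/dispatch-%d", parent, i),
			ParentID:   parent,
			Dispatched: true,
		}
		seen[waitingEvalLabels("default", child)["job"]] = true
	}
	fmt.Println("distinct job label values:", len(seen))
}
```

A thousand dispatches produce a single `job` label value here, which is the property that keeps leader heap usage flat under dispatch-heavy load.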
Tim Gross
db8ecac20d docs: include Consul namespace claim mapping in auth config example (#26730)
When configuring Nomad Enterprise with Consul Enterprise and multiple
namespaces, you need to include the `consul_namespace` mapping in the auth
method configuration. Otherwise you'll see an error like "unknown variable
accessed: value.consul_namespace". There's no example of the updated auth method
configuration you need, which makes this detail unclear when we're showing the
claim being used in the following `consul acl auth-method create` command.
2025-09-08 15:15:47 -04:00
James Rasell
1916a16311 exec: Set LOGNAME env var on exec based drivers. (#26703)
Typically the `LOGNAME` environment variable should be set according
to the values within `/etc/passwd` and represents the name of the
logged-in user. This should be set, where possible, alongside the
USER and HOME variables for all drivers that use the shared
executor and do not use a sub-shell.
2025-09-05 14:07:27 +01:00
Daniel Bennett
9682aa2724 consul connect: allow "cni/*" network mode (#26449)
don't require "bridge" network mode when using connect{}

we document this as "at your own risk" because CNI configuration
is so flexible that we can't guarantee a user's network will work,
but Nomad's "bridge" CNI config may be used as a reference.
2025-09-04 12:29:50 -04:00
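The Connect commit above relaxes the network mode requirement from "bridge" only to "bridge" or any "cni/*" mode. Below is a minimal Go sketch of such a check. `validateConnectNetworkMode` is a hypothetical name, and treating a bare `"cni/"` with no network name as invalid is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// validateConnectNetworkMode sketches the relaxed check: a group using
// connect{} may use "bridge" or any "cni/<name>" network mode.
func validateConnectNetworkMode(mode string) error {
	if mode == "bridge" {
		return nil
	}
	const cniPrefix = "cni/"
	if strings.HasPrefix(mode, cniPrefix) && len(mode) > len(cniPrefix) {
		// Accepted "at your own risk": whether Connect works depends on the
		// operator's CNI configuration, which Nomad cannot verify here.
		return nil
	}
	return fmt.Errorf("consul connect requires a \"bridge\" or \"cni/*\" network mode, got %q", mode)
}

func main() {
	for _, mode := range []string{"bridge", "cni/my-mesh", "host", "cni/"} {
		fmt.Printf("%-12s %v\n", mode, validateConnectNetworkMode(mode))
	}
}
```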
Chris Roberts
c3dcdb5413 [cli] Add windows service commands (#26442)
Adds a new `windows` command which is available when running on
a Windows host. The command includes two new subcommands:

* `service install`
* `service uninstall`

The `service install` command will install the called binary into
the Windows program files directory, create a new Windows service,
set up configuration and data directories, and register the service
with the Windows eventlog. If the service and/or binary already
exist, the service will be stopped, service and eventlog updated
if needed, binary replaced, and the service started again.

The `service uninstall` command will stop the service, remove the
Windows service, and deregister the service with the eventlog. It
will not remove the configuration/data directory nor will it remove
the installed binary.
2025-09-02 16:40:35 -07:00
boruszak
10658a9391 Syntax fix 2025-09-02 12:44:48 -07:00
Chris Roberts
fd1e40537c [artifact] add artifact inspection after download (#26608)
This adds artifact inspection after download to detect any issues
with the content fetched. Currently this means checking for any
symlinks within the artifact that resolve outside the task or
allocation directories. On platforms where lockdown is available
(some Linux systems), this inspection is not performed.

The inspection can be disabled with the DisableArtifactInspection
option. A dedicated option for disabling this behavior allows
the DisableFilesystemIsolation option to be enabled but still
have artifacts inspected after download.
2025-08-27 10:37:34 -07:00
James Rasell
71e66231f9 docs: Add node identity and introduction CLI, API, and config docs (#26516)
Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
2025-08-26 15:26:00 +01:00
Leah Bush
36d423ceda Merge pull request #26580 from hashicorp/leah/feat/upgrade-node
feat: upgrade node version to v22
2025-08-25 10:02:30 -05:00
Aimee Ukasick
bb7114e518 Docs Chore: Add release notes for 1.10.1-1.10.3 (#26593)
* add 1.10.3

* add 1.10.2

* Add 1.10.1 release notes; add partials to share

* address feedback
2025-08-25 09:38:15 -05:00
Michael Schurter
ee5059a6a7 docs: revert to labels={"foo.bar": "baz"} style (#26535)
* docs: revert to labels={"foo.bar": "baz"} style

Back in #24074 I thought it was necessary to wrap labels in a list to
support quoted keys in hcl2. This... doesn't appear to be true at all?
The simpler `labels={...}` syntax appears to work just fine.

I updated the docs and a test (and modernized it a bit). I also switched
some other examples to the `labels = {}` format from the old `labels{}`
format.

* copywronged

* fmtd
2025-08-20 09:26:42 -07:00
Leah Bush
07fae8440a feat: upgrade node version to v22 2025-08-19 11:29:56 -05:00
Aimee Ukasick
c17b15f8d0 change overview pages usage to use plaintext code block (#26575) 2025-08-19 09:47:37 -05:00
Tim Gross
b8b95eb918 docs: warn against enabling Prometheus metrics if not in use (#26560)
The go-metrics library retains Prometheus metrics in memory until expiration,
but the expiration logic requires that the metrics are being regularly
scraped. If you don't have a Prometheus server scraping, this leads to
ever-increasing memory usage. In particular, high-volume dispatch workloads emit
a large set of label values, and if these are not eventually aged out, the bulk of
Nomad server memory can end up consumed by metrics.
2025-08-19 08:44:16 -04:00
Daniel Bennett
fdd46e6fd3 docs: cni: add tproxy conflist example (#26532) 2025-08-18 12:04:34 -04:00
Aimee Ukasick
52b8deeb3b Docs: Add 1.10.4 release notes (#26524)
* 1.10.4 release notes

* update node version in package.json so Vercel builds

* revert node version

* address feedback; add missing "-" to debug params
2025-08-18 11:04:06 -05:00
Austin Workman
26f02c25c6 docs: Update virt install.mdx (#26531)
Fixing plugin name in nomad client plugin config example.
2025-08-18 10:58:15 -05:00
Frédéric Praca
7b9bebd653 [Doc] Fix link for Nomad event stream page (#26522)
* fix(doc): fix links for task driver plugins

host URL was wrong, changed from develoepr to developer

* Update stateful-workloads.mdx

Fix link for Nomad event stream page
2025-08-14 18:29:44 -05:00
Aimee Ukasick
befc755f98 Docs Nomad Pack: Add CLI command reference (#26508)
* Add CLI commands to Nomad Pack docs.

* organize subcommands into directories

* seo updates; style guide clean up
2025-08-14 09:22:42 -05:00
Aimee Ukasick
9bcfe7bd36 Docs: Update SSO with Auth0 guide (#26488)
* initial

* Update for Auth0 changes.

* updated to end

* fix URL with double forward slashes
2025-08-12 09:34:23 -05:00
Adiel Cristo
d4eb251004 fix(docs): remove incomplete phrase fragment (#26489) 2025-08-11 07:40:36 -05:00
Juana De La Cuesta
225ac2938a Add new metric for queue size to the autoscaler (#26453)
* docs: add a new metric to the autoscaler for the size of the execution queue

* Update telemetry.mdx

* Update telemetry.mdx
2025-08-11 10:26:57 +02:00
Aimee Ukasick
d305f32017 Docs: Plugin authoring guide (#26395)
* create plugin author guide; remove concepts/plugins

* style guide; update links

* update cni redirect

* move host-volume plugin to /plugins/. Add arch host volume content.

* Apply Jeff's style guide updates

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Create Base plugin API section, link to BasePlugin interface

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2025-08-08 14:55:58 -05:00
Wim
f712d5db90 Add AllocIPv6 option to allow IPv6 address being used for service registration (#25632)
Fixes #25627 by adding an extra `alloc_advertise_ipv6` option, similar to the `AdvertiseIPv6Addr` option in the Docker driver config.

Fixes: https://github.com/hashicorp/nomad/issues/25627
2025-08-08 15:01:46 -04:00
Alexey Kulakov
34025aa6b6 fix(website): node version bump from v18 to v22 (#26479) 2025-08-08 10:54:35 -07:00
Michael Smithhisler
b6f90d0562 docs: fix indent on vault create_from_role (#26472) 2025-08-07 16:03:33 -05:00
Daniel Bennett
3c435d2953 docs: cni: add ipv6 bridge example (#26456) 2025-08-07 16:16:45 -04:00
Tim Gross
5d8e8df7bd docs: clarify consumers of environment variables for CLI (#26459)
In https://github.com/hashicorp/nomad/issues/15459 we've had a bit of
back-and-forth as a result of applying Nomad environment variables where they
typically should not be used. Clarify that the env vars are for the CLI and
mostly not for the agent. Also move the `NOMAD_CLI_SHOW_HINTS` description into
the correct section.
2025-08-07 15:47:32 -04:00
Tim Gross
9717719502 docs: fix missing entry from template function_denylist (#26458)
The docs for the `template` block accurately describe the default template
function denylist in the body text, but the listed default parameter value is
missing entries. The equivalent docs in the `client` configuration are missing
`executeTemplate` as well.
2025-08-07 15:47:14 -04:00
Allison Larson
e16a3339ad Add CSI Volume Sentinel Policy scaffolding (#26438)
* Add ent policy enforcement stubs to CSI Volume create/register

* Wire policy override/warnings through CSI volume register/create

* Add new scope to sentinel apply

* Sanitize CSISecrets & CSIMountOptions

* Add sentinel policy scope to ui

* Update docs for new sentinel scope/policy

* Create new api funcs for CSI endpoints

* fix sentinel csi ui test

* Update sentinel-policy docs

* Add changelog

* Update docs from feedback
2025-08-07 12:03:18 -07:00
Michael Schurter
0f630004b9 docs: Once -> once (#26435) 2025-08-05 11:10:25 -07:00
tehut
21841d3067 Add historical journald and log export flags to operator debug command (#26410)
* Add -log-file-export and -log-lookback flags to add historical logs to
debug capture
* use monitor.PrepFile() helper for other historical log tests
2025-08-04 13:55:25 -07:00
tehut
d709accaf5 Add nomad monitor export command (#26178)
* Add MonitorExport command and handlers
* Implement autocomplete
* Require nomad in serviceName
* Fix race in StreamReader.Read
* Add and use framer.Flush() to coordinate function exit
* Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer
* Parameterize StreamFixed stream size
2025-08-01 10:26:59 -07:00
Aimee Ukasick
5dc7e7fe25 Docs: Chore: Ent labels (#26323)
* replace outdated tutorial links

* update more tutorial links

* Add CE/ENT or ENT to left nav

* remove ce/ent labels

* revert enterprise features
2025-07-30 09:02:28 -05:00
Tim Gross
e062f87b07 docs: fix typo in redirect URL domain (#26384) 2025-07-28 16:28:27 -04:00
Tim Gross
501608ca68 docs: document handling of unset affinity/constraint values (#26354)
Affinities and constraints use similar feasibility checking logic to determine if
a given node matches (although affinities don't support all the same
operators). Most operators don't allow `value` to be unset. Update the docs to
reflect this.

Fixes: https://github.com/hashicorp/nomad/issues/24983
2025-07-28 14:12:43 -04:00
Tim Gross
b286a8ee9c docs: update Consul/Vault compatibility matrix (#26368)
Update our support matrix to show currently-supported versions of Consul, Vault,
and Nomad.
2025-07-28 13:48:38 -04:00