Commit Graph

72 Commits

Author SHA1 Message Date
Brendan MacDonell
26485c45a2 Add job_max_count option to keep Nomad server from running out of memory (#26858)
If a Nomad job is started with a large number of instances (e.g. 4 billion),
then the Nomad servers that attempt to schedule it will run out of memory and
crash. While it's unlikely that anyone would intentionally schedule a job with 4
billion instances, we have occasionally run into issues with bugs in external
automation. For example, an automated deployment system running on a test
environment had an off-by-one error, and deployed a job with count = uint32(-1),
causing the Nomad servers for that environment to run out of memory and crash.

To prevent this, this PR introduces a job_max_count Nomad server configuration
parameter. job_max_count limits the number of allocs that may be created from a
job. The default value is 50000 - this is low enough that a job with the maximum
possible number of allocs will not require much memory on the server, but is
still much higher than the number of allocs in the largest Nomad job we have
ever run.
2025-10-06 09:35:10 -04:00
James Rasell
7466dd71b2 server: Add new server.client_introduction config block. (#26315)
The new configuration block exposes some key options which allow
cluster administrators to control certain client introduction
behaviours.

This change introduces the new block and plumbing, so that it is
exposed in the Nomad server for consumption via internal processes.
2025-07-22 08:50:19 +01:00
Tim Gross
3f59860254 host volumes: add configuration to GC on node GC (#25903)
When a node is garbage collected, any dynamic host volumes on the node are
orphaned in the state store. We generally don't want to automatically collect
these volumes and risk data loss, and have provided a CLI flag to `-force`
remove them in #25902. But for clusters running on ephemeral cloud
instances (ex. AWS EC2 in an autoscaling group), deleting host volumes may add
excessive friction. Add a configuration knob to the client configuration to
remove host volumes from the state store on node GC.

Ref: https://github.com/hashicorp/nomad/pull/25902
Ref: https://github.com/hashicorp/nomad/issues/25762
Ref: https://hashicorp.atlassian.net/browse/NMD-705
2025-05-27 10:22:08 -04:00
tehut
55523ecf8e Add NodeMaxAllocations to client configuration (#25785)
* Set MaxAllocations in client config
Add NodeAllocationTracker struct to Node struct
Evaluate MaxAllocations in AllocsFit function
Set up cli config parsing
Integrate maxAllocs into AllocatedResources view
Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-05-22 12:49:27 -07:00
Daniel Bennett
15c01e5a49 ipv6: normalize addrs per RFC-5942 §4 (#25921)
https://datatracker.ietf.org/doc/html/rfc5952#section-4

* copy NormalizeAddr func from vault
  * PRs hashicorp/vault#29228 & hashicorp/vault#29517
* normalize bind/advertise addrs
* normalize consul/vault addrs
2025-05-22 14:21:30 -04:00
Tim Gross
8a5a057d88 offline license utilization reporting (#25844)
Nomad Enterprise users operating in air-gapped or otherwise secured environments
don't want to send license reporting metrics directly from their
servers. Implement manual/offline reporting by periodically recording usage
metrics snapshots in the state store, and providing an API and CLI by which
cluster administrators can download the snapshot for review and out-of-band
transmission to HashiCorp.

This is the CE portion of the work required for implemention in the Enterprise
product. Nomad CE does not perform utilization reporting.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673
Ref: https://hashicorp.atlassian.net/browse/NMD-68
Ref: https://go.hashi.co/rfc/nmd-210
2025-05-14 09:51:13 -04:00
James Rasell
0b265d2417 encrypter: Track initial tasks for is ready calculation. (#25803)
The server startup could "hang" to the view of an operator if it
had a key that could not be decrypted or replicated loaded from
the FSM at startup.

In order to prevent this happening, the server startup function
will now use a timeout to wait for the encrypter to be ready. If
the timeout is reached, the error is sent back to the caller which
fails the CLI command. This bubbling of error message will also
flush to logs which will provide addition operator feedback.

The server only cares about keys loaded from the FSM snapshot and
trailing logs before the encrypter should be classed as ready. So
that the encrypter ready function does not get blocked by keys
added outside of the initial Raft load, we take a snapshot of the
decryption tasks as we enter the blocking call, and class these as
our barrier.
2025-05-07 15:38:16 +01:00
Nikita Eliseev
76fb3eb9a1 rpc: added configuration for yamux session (#25466)
Fixes: https://github.com/hashicorp/nomad/issues/25380
2025-04-02 10:58:23 -04:00
James Rasell
7268053174 vault: Remove legacy token based authentication workflow. (#25155)
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.

The deprecated Vault config options used byi the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.

Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-28 07:40:02 +00:00
James Rasell
7d48aa2667 client: emit optional telemetry from prerun and prestart hooks. (#24556)
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.

The new datapoints are:
  - nomad.client.alloc_hook.prerun.success (counter)
  - nomad.client.alloc_hook.prerun.failed (counter)
  - nomad.client.alloc_hook.prerun.elapsed (sample)

  - nomad.client.task_hook.prestart.success (counter)
  - nomad.client.task_hook.prestart.failed (counter)
  - nomad.client.task_hook.prestart.elapsed (sample)

The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.

Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
2024-12-12 14:43:14 +00:00
Tim Gross
651d8d6f88 tests: fixup copywrite in test file (#24101)
In #24007 we merged new HCL files but they were missing copywrite headers
because the scan didn't run on this PR for some reason. I've already backported
this to the Enterprise branches.
2024-10-01 16:43:10 -04:00
Juliano Martinez
4a74fda8ce Allow client template config block to be parsed when using json config (#24007)
- Adds tests
- Adds sample test data for parsing hcl and json
- Adds changelog
2024-10-01 15:44:36 -04:00
Daniel Bennett
2f5cf8efae networking: option to enable ipv6 on bridge network (#23882)
by setting bridge_network_subnet_ipv6 in client config

Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>
2024-09-04 10:17:10 -05:00
Tim Gross
9d4686c0df tls: remove deprecated prefer_server_cipher_suites field (#23712)
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In version of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."

This property has been long-deprecated and leaving it in place may lead to false
assumptions about how cipher suites are negotiated in connection to a server. So
we want to remove it in Nomad 1.9.0.

Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
2024-08-01 08:52:05 -04:00
Tim Gross
c8be863bc8 reporting: allow export interval and address to be configurable (#23674)
The go-census library supports configuration to send metrics to a local
development version of the collector. Add "undocumented" configuration options
to the `reporting` block allow developers to debug and verify we're sending the
data we expect with real Nomad servers and not just unit tests.

Ref: https://hashicorp.atlassian.net/browse/NET-10057
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1708
2024-07-24 08:29:59 -04:00
Tim Gross
c970d22164 keyring: support external KMS for key encryption key (KEK) (#23580)
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by a AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: https://github.com/hashicorp/nomad/issues/14852
2024-07-18 09:42:28 -04:00
James Rasell
facc3e8013 agent: allow configuration of in-memory telemetry sink. (#20166)
This change adds configuration options for setting the in-memory
telemetry sink collection and retention durations. This sink backs
the metrics JSON API and previously had hard-coded default values.

The new options are particularly useful when running development or
debug environments, where metrics collection is desired at a fast
and granular rate.
2024-03-25 15:00:18 +00:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
Tim Gross
9d075c44b2 config: remove old Vault/Consul config blocks from parser (#18997)
Remove the now-unused original configuration blocks for Consul and Vault from
the agent configuration parsing. When the agent needs to refer to a Consul or
Vault block it will always be for a specific cluster for the task/service (or
the default cluster for the agent's own use).

This is third of three changesets for this work.

Fixes: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Ref: https://github.com/hashicorp/nomad/pull/18994
2023-11-08 09:30:08 -05:00
Piotr Kazmierczak
7f62dec473 consul WI: rename default auth method for services (#18867)
It should be called nomad-services instead of nomad-workloads.
2023-10-26 09:43:33 +02:00
Tim Gross
d0957eb109 Consul: agent config updates for WI (#18774)
This changeset makes two changes:
* Removes the `consul.use_identity` field from the agent configuration. This behavior is properly covered by the presence of `consul.service_identity` / `consul.task_identity` blocks.
* Adds a `consul.task_auth_method` and `consul.service_auth_method` fields to the agent configuration. This allows the cluster administrator to choose specific Consul Auth Method names for their environment, with a reasonable default.
2023-10-17 14:42:14 -04:00
James Rasell
1ffdd576bb agent: add config option to enable file and line log detail. (#18768) 2023-10-16 15:59:16 +01:00
Luiz Aoqui
a4b29a29cb vault: add jwt_backend_path agent config (#18606)
Add agent configuration to allow cluster operators to define the path
where the JWT auth method backend is mounted.
2023-09-28 18:02:30 -03:00
Luiz Aoqui
fed1992cea vault: remove use_identity agent config (#18592)
The initial intention behind the `vault.use_identity` configuration was
to indicate to Nomad servers that they would need to sign a workload
identities for allocs with a `vault` block.

But in order to support identity renewal, #18262 and #18431 moved the
token signing logic to the alloc runner since a new token needs to be
signed prior to the TTL expiring.

So #18343 implemented `use_identity` as a flag to indicate that the
workload identity JWT flow should be used when deriving Vault tokens for
tasks.

But this configuration value is set on servers so it is not available to
clients at the time of token derivation, making its meaning not clear: a
job may end up using the identity-based flow even when `use_identity` is
`false`.

The only reliable signal available to clients at token derivation time
is the presence of an `identity` block for Vault, and this is already
configured with the `vault.default_identity` configuration block, making
`vault.use_identity` redundant.

This commit removes the `vault.use_identity` configuration and
simplifies the logic on when an implicit Vault identity is injected into
tasks.
2023-09-27 17:44:07 -03:00
Luiz Aoqui
19241964a4 config: fix some issues with workload identity and multi Consul and Vault (#18590)
* config: fix multi consul and vault config parse

Capture the loop variable when parsing multiple Consul and Vault
configuration blocks so the duration parse function uses the correct
field when it's called later on.

* client: build Vault client with right config

When setting up the multiple Vault clients, the code was always loading
the default configuration, resulting in all clients to be configured the
same way.

* config: fix WorkloadIdentityConfig.Copy() method

Ensure `WorkloadIdentityConfig.Copy()` does not return the original
pointer for the `TTL` field.
2023-09-27 14:41:11 -03:00
Juana De La Cuesta
124272c050 server: Add reporting option to agent (#18572)
* func: add reporting option to agent

* func: add test for merge and fix comments

* Update config_ce.go

* Update config_ce.go

* Update config_ce.go

* fix: add reporting config to default configuration and update to use must over require

* Update command/agent/config_parse.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update nomad/structs/config/reporting.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update nomad/structs/config/reporting.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* style: rename license and reporting config

* fix: use default function instead of empty struct

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-09-27 00:11:32 +02:00
Tim Gross
d7bd47d60f config: remove consul.template_identity in lieu of task_identity (#18540)
The original thinking for Workload Identity integration with Consul and Vault
was that we'd allow `template` blocks to specify their own identity. But because
the login to Consul/Vault to get tokens happens at the task level, this would
involve making the `template` block a new WID watcher on its own rather than
using the Consul and Vault hooks we're building at the group/task level.

So it doesn't make sense to have separate identities for individual `template`
blocks rather than at the level of tasks. Update the agent configuration to
rename the `template_identity` to the more accurate `task_identity`, which will
be used for any non-service hooks (just `template` today).

Update the implicit identities job mutation hook to create the identity we'll
need as well.
2023-09-20 15:43:08 -04:00
Luiz Aoqui
3534307d0d vault: add use_identity and default_identity agent configuration and implicit workload identity (#18343) 2023-09-12 13:53:37 -03:00
Luiz Aoqui
82372fecb8 config: add TTL to agent identity config (#18457)
Add support for identity token TTL in agent configuration fields such as
Consul `service_identity` and `template_identity`.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-09-12 11:13:09 -03:00
Piotr Kazmierczak
b430d21a67 agent: add consul.service_identity and consul.template_identity blocks (#18279)
This PR introduces updates to the agent config required for workload identity support.
2023-08-24 17:45:34 +02:00
Tim Gross
a8bad048b6 config: parsing support for multiple Consul clusters in agent config (#18255)
Add the plumbing we need to accept multiple Consul clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `consul` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Consul configuration. All blocks with the same name are
merged together, as with the existing behavior.

As with the `vault` block, we're still using HCL1 for parsing configuration and
the `Decode` method doesn't parse multiple blocks differentiated only by a field
name without a label. So we've had to add an extra parsing pass, similar to what
we've done for HCL1 jobspecs. This also revealed a subtle bug in the `vault`
block handling of extra keys when there are multiple `vault` blocks, which I've
fixed here.

For now, all existing consumers will use the "default" Consul configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-08-18 15:25:16 -04:00
Tim Gross
74b796e6d0 config: parsing support for multiple Vault clusters in agent config (#18224)
Add the plumbing we need to accept multiple Vault clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `vault` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Vault configuration. All blocks with the same name are
merged together, as with the existing behavior.

Unfortunately we're still using HCL1 for parsing configuration and the `Decode`
method doesn't parse multiple blocks differentiated only by a field name without
a label. So we've had to add an extra parsing pass, similar to what we've done
for HCL1 jobspecs.

For now, all existing consumers will use the "default" Vault configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-08-17 14:10:32 -04:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Alessio Perugini
365ccf4377 Allow configurable range of Job priorities (#16084) 2023-02-17 09:23:13 -05:00
visweshs123
7d4ccf11bc csi: add option to configure CSIVolumeClaimGCInterval (#16195) 2023-02-16 10:41:15 -05:00
James Rasell
eaea9164a5 acl: correctly resolve ACL roles within client cache. (#14922)
The client ACL cache was not accounting for tokens which included
ACL role links. This change modifies the behaviour to resolve role
links to policies. It will also now store ACL roles within the
cache for quick lookup. The cache TTL is configurable in the same
manner as policies or tokens.

Another small fix is included that takes into account the ACL
token expiry time. This was not included, which meant tokens with
expiry could be used past the expiry time, until they were GC'd.
2022-10-20 09:37:32 +02:00
James Rasell
892ab8a07a Merge branch 'main' into f-gh-13120-sso-umbrella 2022-08-02 08:30:03 +01:00
Luiz Aoqui
d456cc1e7f Track plan rejection history and automatically mark clients as ineligible (#13421)
Plan rejections occur when the scheduler work and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.

As the final plan reaches the leader plan applier, it may no longer be
valid due to a concurrent scheduling taking up intended resources. In
these situations the plan applier will notify the worker that the plan
was rejected and that they should refresh their state before trying
again.

In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even if they are always
rejected.

While the root cause is still unknown this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.

To avoid marking several nodes as ineligible at one, the operation is rate
limited to 5 nodes every 30min, with an initial burst of 10 operations.
2022-07-12 18:40:20 -04:00
James Rasell
08845cef04 server: add ACL token expiration config parameters. (#13667)
This commit adds configuration parameters to control ACL token
expirations. This includes both limits on the min and max TTL
expiration values, as well as a GC threshold for expired tokens.
2022-07-12 13:43:25 +02:00
James Rasell
d442e1b4c1 agent: test full object when performing test config parse. (#13668) 2022-07-11 16:21:36 +02:00
Michael Schurter
2411d3afd2 core: remove all traces of unused protocol version
Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.

While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:

1. Nomad's RPC subsystem has been able to evolve extensively without
   needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without increment
   `API{Major,Minor}Version`. If we want to version the HTTP API in the
   future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
   parameter is confusing since `server.raft_protocol` *is* an important
   parameter for operators to consider. Even more confusing is that
   there is a distinct Serf protocol version which is included in `nomad
   server members` output under the heading `Protocol`. `raft_protocol`
   is the *only* protocol version relevant to Nomad developers and
   operators. The other protocol versions are either deadcode or have
   never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
   don't think these configuration parameters and variables are the best
   choice. If we come to that point we should choose a versioning scheme
   based on the use case and modern best practices -- not this 6+ year
   old dead code.
2022-02-18 16:12:36 -08:00
Matt Mukerjee
0881b94201 Add FailoverHeartbeatTTL to config (#11127)
FailoverHeartbeatTTL is the amount of time to wait after a server leader failure
before considering reallocating client tasks. This TTL should be fairly long as
the new server leader needs to rebuild the entire heartbeat map for the
cluster. In deployments with a small number of machines, the default TTL (5m)
may be unnecessary long. Let's allow operators to configure this value in their
config files.
2021-10-06 18:48:12 -04:00
Drew Bailey
4484d361ee configuration and oss components for licensing (#10216)
* configuration and oss components for licensing

* vendor sync
2021-03-23 09:08:14 -04:00
Drew Bailey
3347b40d11 remove event durability (#9147)
* remove event durability

temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs

* drop events table schema and state store methods

* fix neweventbuffer invocations
2020-10-22 12:21:03 -04:00
Drew Bailey
3c15f41411 filter on additional filter keys, remove switch statement duplication
properly wire up durable event count

move newline responsibility

moves newline creation from NDJson to the http handler, json stream only encodes and sends now

ignore snapshot restore if broker is disabled

enable dev mode to access event steam without acl

use mapping instead of switch

use pointers for config sizes, remove unused ttl, simplify closed conn logic
2020-10-14 14:14:33 -04:00
Drew Bailey
39ef3263ca Add EvictCallbackFn to handle removing entries from go-memdb when they
are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.
2020-10-14 12:44:42 -04:00
Drew Bailey
8a3130a356 event durability count and cfg 2020-10-14 12:44:40 -04:00
Drew Bailey
e7e6df394f wire up enable_event_publisher 2020-10-14 12:44:38 -04:00
Chris Baker
797543ad4b removed backwards-compatible/untagged metrics deprecated in 0.7 2020-10-13 20:18:39 +00:00