Commit Graph

1072 Commits

Author SHA1 Message Date
Tim Gross
08a6f870ad cni: use check command when restoring from restart (#24658)
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.

If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: https://github.com/hashicorp/nomad/issues/24292
Ref: https://github.com/hashicorp/nomad/pull/24650
2025-01-07 09:38:39 -05:00
Piotr Kazmierczak
0906f788f0 keyring: warn if removing a key that was used for encrypting variables (#24766)
Adds an additional check in the Keyring.Delete RPC to make sure we're not
trying to delete a key that's been used to encrypt a variable. It also adds a
-force flag for the CLI/API to sidestep that check.
2025-01-07 10:15:02 +01:00
Charles Z.
f7b12dc54e add noswap to secretdir tmpfs (#24645) 2025-01-06 09:44:43 -05:00
Aimee Ukasick
1c12fc59a6 Docs: change stop_after to stop_on_client_after (#24727)
* change stop_after to stop_on_client_after

CE-800  GH https://github.com/hashicorp/nomad/issues/24702

* Move disconnect entry to correct alphabetical place in nav
2024-12-19 13:13:57 -06:00
Aimee Ukasick
8dc4a94b35 Add link to published tutorial (#24712)
CE-801
2024-12-19 12:52:05 -06:00
James Rasell
7d48aa2667 client: emit optional telemetry from prerun and prestart hooks. (#24556)
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.

The new datapoints are:
  - nomad.client.alloc_hook.prerun.success (counter)
  - nomad.client.alloc_hook.prerun.failed (counter)
  - nomad.client.alloc_hook.prerun.elapsed (sample)

  - nomad.client.task_hook.prestart.success (counter)
  - nomad.client.task_hook.prestart.failed (counter)
  - nomad.client.task_hook.prestart.elapsed (sample)

The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.

Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
2024-12-12 14:43:14 +00:00
Aimee Ukasick
af5e2a742e Docs Feature: Add clone and edit feature (#24593)
* Docs: Add clone and edit feature

CE-741

* Change clone and edit heading level

* A few work tweaks
2024-12-05 09:21:27 -06:00
CJ
4563165196 Update sentinel.mdx (#24598) 2024-12-03 11:24:06 -05:00
CJ
b603b97d26 Update security.mdx 2024-12-02 11:43:24 -06:00
Piotr Kazmierczak
f7a4ded2c0 security: add CT executeTemplate to default function_denylist (#24541)
This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-11-22 19:33:56 +01:00
Piotr Kazmierczak
368241dbf2 security: a more comprehensive env.denylist (#24540)
A more comprehensive env.denylist that now includes more token, token file and
license variables. 

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-11-22 18:54:18 +01:00
Juana De La Cuesta
25cc492a16 docs: update the job subcommands on the docs (#24506) 2024-11-20 08:37:43 -06:00
Phil Renaud
83b30128a0 Add an image of the rendered UI block for a jobspec (#24481) 2024-11-20 09:33:47 -05:00
James Rasell
11bba3dbcd docs: fix broken link within enterprise Sentinel docs. (#24486) 2024-11-20 07:43:30 +00:00
Florian Apolloner
0a343798b6 Add NOMAD_* variables to CNI args. Fixes #23830 (#24319)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-11-19 12:48:48 -08:00
Aimee Ukasick
4dfedf1aef add top-level heading so the page renders correctly (#24491)
Add opening paragraph; update description
2024-11-19 11:10:10 -06:00
James Rasell
dc501339da docs: Add federated region concept and operations pages. (#24477)
In order to help users understand multi-region federated
deployments, this change adds two new sections to the website.

The first expands the architecture page, so we can add further
detail over time with an initial federation page. The second adds
a federation operations page which goes into failure planning and
mitigation.

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2024-11-19 12:39:57 +00:00
Michael Schurter
8dd570d6ca docs: upgrade docs should point at real version (#24438)
Let users know what happened to 1.9.2 but label the gc change as the
first working release (1.9.3).
2024-11-12 11:05:27 -08:00
Eduardo Medeiros
f8c85b036b docs: remove duplicated word. (#24433)
remove duplicated word “Using using”
2024-11-11 16:10:10 -05:00
Juana De La Cuesta
dfa0066d06 [gh-24311] Expand on documentation about jobs that are both parameterised and periodic (#24384)
* docs: expand on documentation about jobs that are both parameterized and periodic

* fix: typo

* docs: expand on the example

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update website/content/docs/job-specification/periodic.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update website/content/docs/job-specification/periodic.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* style: improve the content with PR suggestions

* periodic.mdx fix link to parameterized

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update parameterized.mdx

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update website/content/docs/job-specification/parameterized.mdx

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>

* Update parameterized.mdx

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-11-08 17:29:46 +01:00
Daniel Bennett
c32d9ed6f5 docs: ipv6: small fixes (#24368)
* escaping newlines is not allowed in go-sockaddr template
* client{} block in client section
* tiny extra clarification that the NOMAD_ADDR is an example
2024-11-05 11:11:36 -06:00
Piotr Kazmierczak
f7847c6e5b state: remove TimeTable and rely on objects' modify times instead (#24112)
Core scheduler relies on a special table in the state store—the TimeTable—to
figure out which objects can be GC'd. The TimeTable correlates Raft indices
with objects insertion time, a solution we used before most of the objects we
store in the state contained timestamps. This introduced a bit of a memory
overhead and complexity, but most importantly meant that any GC threshold users
set greater than timeTableLimit = 72 * time.Hour was ignored. This PR removes
the TimeTable and relies on object timestamps to determine whether they could
be GCd or not.
2024-11-01 19:38:04 +01:00
Michael Smithhisler
658c429d75 Drivers: add work_dir config to exec/raw_exec/java drivers (#24249)
---------

Co-authored-by: wurosh <uros.m.perisic@gmail.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-11-01 11:04:40 -04:00
James Rasell
58ea294f0b docs: add note to reschedule block for update progress deadline. (#24346)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-11-01 14:54:51 +00:00
Juana De La Cuesta
c18418fa61 Merge pull request #20073 from hashicorp/feat/uid-gid-restriction
Adds ability to restrict uid and gids in exec and raw_exec
2024-10-31 15:48:45 +01:00
Juana De La Cuesta
3449056cd6 Update website/content/docs/drivers/raw_exec.mdx
Co-authored-by: Michael Smithhisler <michael.smithhisler@hashicorp.com>
2024-10-31 10:26:26 +01:00
Juana De La Cuesta
3f32557f1e Update website/content/docs/drivers/exec.mdx
Co-authored-by: Michael Smithhisler <michael.smithhisler@hashicorp.com>
2024-10-31 09:43:49 +01:00
Aimee Ukasick
5b1ad83d82 Docs: Add IPv6 support page (#24228)
* initial content from Daniel's doc

* Add IPv6 support doc to operations section.

* daniel obsessively re-refactors his docs

* Style guide edits

* a few more style nits

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-10-29 14:02:04 -05:00
Mike Nomitch
9565dde138 Only parsing id ranges once 2024-10-28 11:15:41 +01:00
Mike Nomitch
9cc3992ca6 Adds ability to restrict uid and gids in exec and raw_exec 2024-10-28 11:15:37 +01:00
Martijn Vegter
6236f354a5 consul: add support for service weight (#24186) 2024-10-25 11:21:38 -04:00
Tim Gross
a1ede9765c docs: warn about UID overlap between workload and Envoy tproxy (#24291)
When using transparent proxy mode with the `connect` block, the UID of the
workload cannot be the same as the UID of the Envoy sidecar (currently 101 in
the default Envoy container image).

Fixes: https://github.com/hashicorp/nomad/issues/23508
2024-10-24 08:45:44 -04:00
R.B. Boyer
4e8f596311 docs: update broken consul acl token links (#24287) 2024-10-23 13:34:21 -04:00
Tim Gross
10358cc911 docs: warn about Consul auth method locality (#24275)
* docs: warn about Consul auth method locality

The locality of Consul tokens we mint via Workload Identity is governed by the
Consul auth method configuration. By default tokens are local to the Consul
datacenter, which typically maps 1:1 with a Nomad region. Cluster administrators
who need cross-datacenter tokens can get them by setting the locality to global,
at the risk of placement problems if the primary DC isn't available.

Ref: https://github.com/hashicorp/consul/issues/21863
Fixes: https://github.com/hashicorp/nomad/issues/23505
2024-10-23 11:44:03 -04:00
Aimee Ukasick
6a2e1e4216 Docs: Update CLI job tag unset (#24273)
* Docs: Update CLI job tag unset

CLI help order was wrong, so updating the docs.

* change usage to [options]. Move general options into expanable.

* change "to see" to "for"
2024-10-23 10:20:45 -05:00
James Rasell
11573fba89 docs: fix workload identity concepts page JSON format. (#24255) 2024-10-18 14:52:42 +01:00
James Rasell
61dd1f3f10 docs: CLI node pool list does not accept arguments. (#24188) 2024-10-15 07:49:37 +01:00
Aimee Ukasick
5beb1ce58e Docs: Update job version section with tutorial links (#24179)
* Update job page with tutorial links

* Update section links
2024-10-14 12:29:56 -05:00
Aimee Ukasick
8f4a9326be Docs: Add 1.9 release notes (#24161)
* Add 1.9 release notes

* Add deprecated items

* Update Virt driver docs link to point to repo

Update Virt driver docs link to point to repo
2024-10-14 09:57:15 -05:00
Aimee Ukasick
c839f38cab Docs: Golden Versions updates (#24153)
* Add language from CLI help to job revert for version|tag

* Add CLI job tag subcommand page

* Add API create delete tag

Examples use same names between CLI and API

* Update CLI revert, tag; API jobs

* Add job version content

* add tag name unique per job to CLI/API; address Phil's feedback

Add partial explaining why tag, add to CLI/API

* Add diff_version to API jobs list job versions

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* remove tutorial links since not published yet.

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2024-10-11 12:36:32 -05:00
Seth Hoenig
f1ce127524 jobspec: add a chown option to artifact block (#24157)
* jobspec: add a chown option to artifact block

This PR adds a boolean 'chown' field to the artifact block.

It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.

* api: no need for pointer of chown field
2024-10-11 11:30:27 -05:00
Tim Gross
7381f8419b docs: clarify requirements for Consul token policies and TTLs (#24167)
As of #24166, Nomad agents will use their own token to deregister services and
checks from Consul. This returns the deregistration path to the pre-Workload
Identity workflow. Expand the documentation to make clear why certain ACL
policies are required for clients.

Additionally, we did not explicitly call out that auth methods should not set an
expiration on Consul tokens. Nomad does not have a facility to refresh these
tokens if they expire. Even if Nomad could, there's no way to re-inject them
into Envoy sidecars for Consul Service Mesh without recreating the task anyways,
which is what happens today. Warn users that they should not set an expiration.

Closes: https://github.com/hashicorp/nomad/issues/20185 (wontfix)
Ref: https://hashicorp.atlassian.net/browse/NET-10262
2024-10-11 11:59:21 -04:00
Daniel Bennett
373aae7b32 docs: add Resource Quota specification page (#24152)
and update some related pages

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-10-10 15:03:10 -05:00
Michael Schurter
da75d4ff4b docs: fix aed -> aead typo (#24123) 2024-10-03 13:31:32 -04:00
Aimee Ukasick
4c131229f4 Add devices to NUMA section of CPU page (#24113) 2024-10-03 09:09:10 -05:00
James Rasell
1fabbaa179 driver: remove LXC and ECS driver documentation. (#24107)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-10-03 08:55:39 +01:00
Tim Gross
64881eefce docs: remove references to serf.io site (#24114)
The serf.io site is being taken down, so change all our links to point to the
repo docs instead.

Ref: https://github.com/hashicorp/serf/pull/743
2024-10-02 14:33:04 -04:00
Martijn Vegter
3ecf0d21e2 metrics: introduce client config to include alloc metadata as part of the base labels (#23964) 2024-10-02 10:55:44 -04:00
Adrian Todorov
2444cc3504 docs: small updates to Nomad as an AWS OIDC Provider docs (#24078)
A few small updates to the recent "Federate access to AWS with Nomad Workload Identity" documentation, most notably that restart isn't needed because AWS SDKs handle OIDC reauth gracefully (unlike any other type of auth - for all others it's cached statically on startup, so nothing but a full restart works in case your credentials expire).
2024-09-30 11:02:09 -04:00
Aimee Ukasick
5f92ccbfb2 Docs: Terraform prereq clarification (#24069)
Clarify Terraform prereq since you don't need to install the Terraform CLI locally.

Fixes: [CE-726](https://hashicorp.atlassian.net/browse/CE-726)

[CE-726]: https://hashicorp.atlassian.net/browse/CE-726?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
2024-09-27 13:47:10 -04:00