Commit Graph

1029 Commits

Author SHA1 Message Date
Michael Schurter
da75d4ff4b docs: fix aed -> aead typo (#24123) 2024-10-03 13:31:32 -04:00
Aimee Ukasick
4c131229f4 Add devices to NUMA section of CPU page (#24113) 2024-10-03 09:09:10 -05:00
James Rasell
1fabbaa179 driver: remove LXC and ECS driver documentation. (#24107)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-10-03 08:55:39 +01:00
Tim Gross
64881eefce docs: remove references to serf.io site (#24114)
The serf.io site is being taken down, so change all our links to point to the
repo docs instead.

Ref: https://github.com/hashicorp/serf/pull/743
2024-10-02 14:33:04 -04:00
Martijn Vegter
3ecf0d21e2 metrics: introduce client config to include alloc metadata as part of the base labels (#23964) 2024-10-02 10:55:44 -04:00
Adrian Todorov
2444cc3504 docs: small updates to Nomad as an AWS OIDC Provider docs (#24078)
A few small updates to the recent "Federate access to AWS with Nomad Workload Identity" documentation, most notably that restart isn't needed because AWS SDKs handle OIDC reauth gracefully (unlike any other type of auth - for all others it's cached statically on startup, so nothing but a full restart works in case your credentials expire).
2024-09-30 11:02:09 -04:00
Aimee Ukasick
5f92ccbfb2 Docs: Terraform prereq clarification (#24069)
Clarify Terraform prereq since you don't need to install the Terraform CLI locally.

Fixes: [CE-726](https://hashicorp.atlassian.net/browse/CE-726)

[CE-726]: https://hashicorp.atlassian.net/browse/CE-726?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
2024-09-27 13:47:10 -04:00
Michael Schurter
34cb05d297 docs: explain how to use dots in docker labels (#24074)
Nomad v1.9.0 (finally!) removes support for HCL1 and the `-hcl1` flag.
See #23912 for details.

One of the uses of HCL1 over HCL2 was that HCL1 allowed quoted keys in
blocks such as env, meta, and Docker's labels:

```hcl
some_block {
  "foo.bar" = "baz"
}
```

This works in HCL1 but is invalid HCL2. In HCL2 you must use a map
instead of a block:

```hcl
some_map = {
  "eggs.spam" = "works!"
}
```

This was such a hassle for users we special cased the `env` and `meta`
blocks to be accepted as blocks or maps in #9936.

However Docker `labels`, being a task config option, is much harder to
special case and commonly needs dots-in-keys for things like DataDog
autodiscovery via Docker container labels:
https://docs.datadoghq.com/containers/docker/integrations/?tab=labels

Luckily `labels` can be specified as a list-of-maps instead:

```hcl
labels = [
  {
    "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
    "com.datadoghq.ad.init_configs" = "[{}]"
  }
]
```

So instead of adding more awkward hcl1/2 backward compat code to Nomad,
I just updated the docs to hopefully help people hit by this.

The only other known workaround is dropping HCL in favor of JSON
jobspecs altogether, but that forces a huge migration and maintenance
burden on users:
https://discuss.hashicorp.com/t/docker-based-autodiscovery-with-datadog-how-can-we-make-it-work/18870
2024-09-27 10:02:50 -07:00
Seth Hoenig
6fb59ca72a docs: add documentation for numa devices block (#24067) 2024-09-26 09:41:33 -05:00
Phil Renaud
e206993d49 Feature: Golden Versions (#24055)
* TaggedVersion information in structs, rather than job_endpoint (#23841)

* TaggedVersion information in structs, rather than job_endpoint

* Test for taggedVersion description length

* Some API plumbing

* Tag and Untag job versions (#23863)

* Tag and Untag at API level on down, but am I unblocking the wrong thing?

* Code and comment cleanup

* Unset methods generally now I stare long into the namespace abyss

* Namespace passes through with QueryOptions removed from a write requesting struct

* Comment and PR review cleanup

* Version back to VersionStr

* Generally consolidate unset logic into apply for version tagging

* Addressed some PR comments

* Auth check and RPC forwarding

* uint64 instead of pointer for job version after api layer and renamed copy

* job tag command split into apply and unset

* latest-version convenience handling moved to CLI command level

* CLI tests for tagging/untagging

* UI parts removed

* Add to job table when unsetting job tag on latest version

* Vestigial no more

* Compare versions by name and version number with the nomad history command (#23889)

* First pass at passing a tagname and/or diff version to plan/versions requests

* versions API now takes compare_to flags

* Job history command output can have tag names and descriptions

* compare_to to diff-tag and diff-version, plus adding flags to history command

* 0th version now shows a diff if a specific diff target is requested

* Addressing some PR comments

* Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI

* Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon

* Version diff tests

* re-implement JobVersionByTagName

* Test mods and simplification

* Documentation for nomad job history additions

* Prevent pruning and reaping of TaggedVersion jobs (#23983)

tagged versions should not count against JobTrackedVersions
i.e. new job versions being inserted should not evict tagged versions

and GC should not delete a job if any of its versions are tagged

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* [ui] Version Tags on the job versions page (#24013)

* Timeline styles and their buttons modernized, and tags added

* styled but not yet functional version blocks

* Rough pass at edit/unedit UX

* Styles consolidated

* better UX around version tag crud, plus adapter and serializers

* Mirage and acceptance tests

* Modify percy to not show time-based things

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* Job revert command and API endpoint can take a string version tag name (#24059)

* Job revert command and API endpoint can take a string version tag name

* RevertOpts as a signature-modified alternative to Revert()

* job revert CLI test

* Version pointers in endpoint tests

* Dont copy over the tag when a job is reverted to a version with a tag

* Convert tag name to version number at CLI level

* Client method for version lookup by tag

* No longer double-declaring client

* [ui] Add tag filter to the job versions page (#24064)

* Rough pass at the UI for version diff dropdown

* Cleanup and diff fetching via adapter method

* TaggedVersion now VersionTag (#24066)

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-09-25 19:59:16 -04:00
Tim Gross
a3a2028837 docs: update key management docs for keyring-in-Raft (#24026)
In #23977 we moved the keyring into Raft. This changeset documents the
operational changes and adds notes to the upgrade guide.
2024-09-25 10:48:14 -04:00
Anthony
46d92a53a5 Usage doc for configuring Nomad OIDC with AWS IAM (#23845) 2024-09-23 14:01:22 -04:00
Michael Smithhisler
6b6aa7cc26 identity: adds ability to specify custom filepath for saving workload identities (#24038) 2024-09-23 10:27:00 -04:00
Tim Gross
a7f2cb879e command line tools for redacting keyring from snapshots (#24023)
In #23977 we moved the keyring into Raft, which can expose key material in Raft
snapshots when using the less-secure AEAD keyring instead of KMS. This changeset
adds tools for redacting this material from snapshots:

* The `operator snapshot state` command gains the ability to display key
  metadata (only), which respects the `-filter` option.
* The `operator snapshot save` command gains a `-redact` option that removes key
  material from the snapshot after it's downloaded.
* A new `operator snapshot redact` command allows removing key material from an
  existing snapshot.
2024-09-20 15:30:14 -04:00
Daniel Bennett
ec81e7c57c networking: add ignore_collision for static port{} (#23956)
so more than one copy of a program can run
at a time on the same port with SO_REUSEPORT.

requires host network mode.

some task drivers (like docker) may also need
config {
  network_mode = "host"
}
but this is not validated prior to placement.
2024-09-17 16:01:48 -05:00
Benjamin Boudreau
cdaf45d990 Stop referring non existent vault.file attribute (#23946)
The documentation is referring to a `file` attribute that does not exist on the `vault` block.

This PR changes those references to mention the `disable_file` attribute instead.
2024-09-12 09:10:41 -07:00
Tim Gross
192d70cee7 docker: update infra_image to new registry (#23927)
The gcr.io container registry is shutting down in March. Update the default
`image_image` for Docker's "pause" containers to point to the new location
hosted by the k8s project.

Fixes: https://github.com/hashicorp/nomad/issues/23911
Ref: https://hashicorp.atlassian.net/browse/NET-10942
2024-09-06 14:34:03 -04:00
Tim Gross
06f5fbc5d6 auth: enforce use of node secret and remove legacy auth (#23838)
As of Nomad 1.6.0, Nomad client agents send their secret with all the
RPCs (other than registration). But for backwards compatibility we had to keep
a legacy auth method that didn't require the node secret. We've previously
announced that this legacy auth method would be removed and that nodes older
than 1.6.0 would not be supported with Nomad 1.9.0.

This changeset removes the legacy auth method.

Ref: https://developer.hashicorp.com/nomad/docs/release-notes/nomad/upcoming#nomad-1-9-0
2024-09-05 14:24:28 -04:00
Tim Gross
a9beef7edd jobspec: remove HCL1 support (#23912)
This changeset removes support for parsing jobspecs via the long-deprecated
HCLv1.

Fixes: https://github.com/hashicorp/nomad/issues/20195
Ref: https://hashicorp.atlassian.net/browse/NET-10220
2024-09-05 09:02:45 -04:00
Daniel Bennett
2f5cf8efae networking: option to enable ipv6 on bridge network (#23882)
by setting bridge_network_subnet_ipv6 in client config

Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>
2024-09-04 10:17:10 -05:00
Austin Culter
ce3e159ee8 docs: update upgrade-specific.mdx (#23906) 2024-09-04 08:42:27 -04:00
Tim Gross
c43e30a387 WI: interpolate parent job ID in vault.default_identity.extra_claims (#23817)
When we interpolate job fields for the `vault.default_identity.extra_claims`
block, we forgot to use the parent job ID when that's available (as we do for
all other claims). This changeset fixes the bug and adds a helper method that'll
hopefully remind us to do this going forward.

Also added a missing changelog entry for #23675 where we implemented the
`extra_claims` block originally, which shipped in Nomad 1.8.3.

Fixes: https://github.com/hashicorp/nomad/issues/23798
2024-09-03 13:56:36 -04:00
Aimee Ukasick
8407a9f442 Docs: CE-674 Add job statuses (#23849)
* Docs: CE-674 Add job status explanation

add new page for jobs to concepts section

* add job types

* Rename jobs; move in site nav; remove types; reformat; add scaled

* change Jobs to Job on the page

* fix typo

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* create UI statuses heading

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2024-08-29 11:22:12 -05:00
Aimee Ukasick
3d06eef65d Docs: CE-705 Highlight that user must backup keyring separately 2024-08-26 11:25:26 -05:00
Sujata Roy
36522ec632 Merge pull request #23850 from hashicorp/Nomad-NET-9394
command/debug: capture more logs by default
2024-08-22 10:43:28 -07:00
Michael Schurter
8b0a88e2f7 docs: update defaults for operator debug 2024-08-22 09:17:03 -07:00
Florian Apolloner
d6be784e2d namespaces: add allowed network modes to capabilities. (#23813) 2024-08-16 09:47:19 -04:00
Piotr Kazmierczak
f8e7905e24 docs: dmidecode manual installation as post-install step (#23823)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-08-15 17:14:16 +02:00
Tim Gross
6aa503f2bb docker: disable cpuset management for non-root clients (#23804)
Nomad clients manage a cpuset cgroup for each task to reserve or share CPU
cores. But Docker owns its own cgroups, and attempting to set a parent cgroup
that Nomad manages runs into conflicts with how runc manages cgroups via
systemd. Therefore Nomad must run as root in order for cpuset management to ever
be compatible with Docker.

However, some users running in unsupported configurations felt that the changes
we made in Nomad 1.7.0 to ensure Nomad was running correctly represented a
regression. This changeset disables cpuset management for non-root Nomad
clients. When running Nomad as non-root, the driver will not longer reconcile
cpusets with Nomad and `resources.cores` will behave incorrectly (but the driver
will still run).

Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. This PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.

Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
Ref: https://hashicorp.atlassian.net/browse/NET-10652
Ref: https://github.com/opencontainers/runc/blob/main/docs/systemd.md
2024-08-14 16:44:13 -04:00
Martijn Vegter
aded4b3500 docs: remove remaining references to network_speed config (#23792) 2024-08-14 14:14:38 -04:00
Piotr Kazmierczak
c1362c03df docs: minimal Consul policy for Nomad agents needs node:write (#23800) 2024-08-13 17:53:21 +02:00
Tim Gross
ef116b12d5 metrics: add client.tasks state metrics (#23773)
Although we have `client.allocations` metrics to track allocation states on a
client, having separate metrics for `client.tasks` will allow operators to
identify that there are individual tasks in an unexpected state in an otherwise
healthy allocation.

Fixes: https://github.com/hashicorp/nomad/issues/23770
2024-08-09 09:02:17 -04:00
VPanteleev-S7
2e5d6192a7 docs: illustrate how to use the obtained token (#19557)
Currently, the page doesn't explain how to do things as the logged in user.
2024-08-08 15:35:26 -04:00
Kartik Prajapati
3a3e63e2e1 cli: add role update functionality to acl token update (#18532) 2024-08-08 15:33:36 -04:00
Aimee Ukasick
20511fa64d docs: Clarify namespace rules matching criteria. (#23752)
Clarify how Nomad evaluates policy rules.

Fixes: #20118
Jira: https://hashicorp.atlassian.net/browse/CE-695

Related tutorial PR: https://github.com/hashicorp/tutorials/pull/2205
2024-08-07 09:28:38 -04:00
Tim Gross
4a5921cb16 acl: disallow leading / on variable paths (#23757)
The path for a Variable never begins with a leading `/`, because it's stripped
off in the API before it ever gets to the state store. The CLI and UI allow the
leading `/` for convenience, but this can be misleading when it comes to writing
ACL policies. An ACL policy with a path starting with a leading `/` will never
match.

Update the ACL policy parser so that we prevent an incorrect variable path in
the policy.

Fixes: https://github.com/hashicorp/nomad/issues/23730
2024-08-07 09:26:18 -04:00
Aimee Ukasick
021692eccf docs: refactor CNI plugin content (#23707)
- Pulled common content from multiple pages into new partials
- Refactored install/index to be OS-based so I could add linux-distro-based instructions to install-consul-cni-plugins.mdx partial. The tab groups on the install/index page do match and change focus as expected.
- Moved CNI overview-type content to networking/index
- Refactored networking/cni to include install CNI plugins and configuration content (from install/index).
- Moved CNI plugins explanation in bridge mode configuration section into bullet points. They had been #### headings, which aren't rendered in the R page TOC. I tried to simplify and format the bullet point content to be easier to scan.

Ref: https://hashicorp.atlassian.net/browse/CE-661
Fixes: https://github.com/hashicorp/nomad/issues/23229
Fixes: https://github.com/hashicorp/nomad/issues/23583
2024-08-06 14:47:46 -04:00
Tim Gross
b25f1b66ce resources: allow job authors to configure size of secrets tmpfs (#23696)
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.

This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.

Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.

For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.

Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
2024-08-05 16:06:58 -04:00
Tim Gross
e684636aed cli: add option to return original HCL in job inspect command (#23699)
In 1.6.0 we shipped the ability to review the original HCL in the web UI, but
didn't follow-up with an equivalent in the command line. Add a `-hcl` flag to
the `job inspect` command.

Closes: https://github.com/hashicorp/nomad/issues/6778
2024-08-05 15:35:18 -04:00
Tim Gross
bc50eebebd workload identity: add support for extra claims config for Vault (#23675)
Although we encourage users to use Vault roles, sometimes they're going to want
to assign policies based on entity and pre-create entities and aliases based on
claims. This allows them to use single default role (or at least small number of
them) that has a templated policy, but have an escape hatch from that.

When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.

Add support for an `extra_claims` block on the server's `default_identity`
blocks for Vault. This allows a cluster administrator to add a custom claim on
all allocations. The values for these claims are interpolatable with a limited
subset of fields, similar to how we interpolate the task environment.

Fixes: https://github.com/hashicorp/nomad/issues/23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
2024-08-05 15:01:54 -04:00
Aimee Ukasick
cbacdb2041 DOCS: CE-659 chroot limitations for isolated fork/exec driver (#23739) 2024-08-05 14:35:54 -04:00
Tim Gross
9ff7437b06 docs: document client.alloc_mounts_dir configuration (#23733)
In Nomad 1.8.0 we introduced the `alloc_mounts_dir` to support unveil filesystem
isolation, but we didn't document the configuration value.
2024-08-05 11:59:47 -04:00
Tim Gross
9d4686c0df tls: remove deprecated prefer_server_cipher_suites field (#23712)
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In version of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."

This property has been long-deprecated and leaving it in place may lead to false
assumptions about how cipher suites are negotiated in connection to a server. So
we want to remove it in Nomad 1.9.0.

Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
2024-08-01 08:52:05 -04:00
Tim Gross
2ee6043cab tls: support setting min version to TLS1.3 (#23713)
Nomad already supports TLS1.3, but not as a minimum version
configuration. Update our config validation to allow setting `tls_min_version`
to 1.3. Update the documentation to match Vault and warn that the
`tls_cipher_suites` field is ignored when TLS is 1.3

Fixes: https://github.com/hashicorp/nomad/issues/20131
Ref: https://hashicorp.atlassian.net/browse/NET-10530
2024-08-01 08:46:32 -04:00
Tim Gross
c06859e5bc docs: add note about removing support for older clients in 1.9 (#23695)
In Nomad 1.6.0 we started sending the node secret with RPCs that previously did
not include it. We planned to deprecate the older auth workflow but didn't set a
release. Removing the legacy support means that nodes running <1.6.0 will fail
to heartbeat.

Ref: https://hashicorp.atlassian.net/browse/NET-10009
2024-07-26 13:26:24 -04:00
Tim Gross
d5ca07a247 docs: notices of upcoming deprecations and backports (#23683)
Add a section to the docs describing planned upcoming deprecations and
removals. Also added some missing upgrade guide sections missed during the last
release.
2024-07-25 10:20:18 -04:00
Tim Gross
0f4014b4a9 docs: external KMS configuration (#23600)
In #23580 we're implementing support for encrypting Nomad's key material with
external KMS providers or Vault Transit. This changeset breaks out the
documentation from that PR to keep the review manageable and present it to a
wider set of reviewers.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Ref: https://github.com/hashicorp/nomad/issues/14852
Ref: https://github.com/hashicorp/nomad/pull/23580
2024-07-19 15:08:54 -04:00
Tim Gross
2f4353412d keyring: support prepublishing keys (#23577)
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
2024-07-19 13:29:41 -04:00
Tim Gross
a8ab2d13b4 docs: explain how to use insecure registries with Docker (#23642)
The documentation for the `SSL` option for the Docker driver is
misleading inasmuch as it's both deprecated and non-functional in current
versions of Docker. Remove this option from the docs and add a section
explaining how to use insecure registries.

Fixes: https://github.com/hashicorp/nomad/issues/23616
2024-07-19 11:18:47 -04:00
James Rasell
a65e5c126a docs: update quota docs and changelog to detail new cores feature. (#23592) 2024-07-15 10:07:34 +01:00