Commit Graph

26108 Commits

Author SHA1 Message Date
Tim Gross
192d70cee7 docker: update infra_image to new registry (#23927)
The gcr.io container registry is shutting down in March. Update the default
`infra_image` for Docker's "pause" containers to point to the new location
hosted by the k8s project.

Fixes: https://github.com/hashicorp/nomad/issues/23911
Ref: https://hashicorp.atlassian.net/browse/NET-10942
2024-09-06 14:34:03 -04:00
Juana De La Cuesta
bd8569e16e Merge pull request #23922 from hashicorp/b-NET-10880
[NET-10880] Keep a register of the usable cores to avoid using more than that
2024-09-06 13:18:56 +02:00
Juana De La Cuesta
9c5f962940 Update client/lib/cgroupslib/partition_linux.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-09-06 10:56:47 +02:00
Juana De La Cuesta
426c225dc2 Update client/lib/cgroupslib/partition_linux.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-09-06 10:56:41 +02:00
Juana De La Cuesta
8e6d85b66f Update client/lib/cgroupslib/partition_linux.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-09-06 10:56:36 +02:00
Tim Gross
06f5fbc5d6 auth: enforce use of node secret and remove legacy auth (#23838)
As of Nomad 1.6.0, Nomad client agents send their secret with all the
RPCs (other than registration). But for backwards compatibility we had to keep
a legacy auth method that didn't require the node secret. We've previously
announced that this legacy auth method would be removed and that nodes older
than 1.6.0 would not be supported with Nomad 1.9.0.

This changeset removes the legacy auth method.

Ref: https://developer.hashicorp.com/nomad/docs/release-notes/nomad/upcoming#nomad-1-9-0
2024-09-05 14:24:28 -04:00
Tim Gross
04ad7165e7 services: reject node secret for Read/List RPC (#23910)
As of Nomad 1.6.0, Nomad clients always use a specific Workload Identity, rather
than the node secret, when making requests to the ServiceRegistrationList/Read
RPCs. Tighten the ACL permissions on these RPCs so that node secrets are no
longer valid tokens.

Ref: https://hashicorp.atlassian.net/browse/NET-10009
Ref: https://developer.hashicorp.com/nomad/docs/release-notes/nomad/upcoming#nomad-1-9-0
2024-09-05 13:52:32 -04:00
Juanadelacuesta
a65d05ff51 fix: keep a register of the usable cores to avoid using more than that 2024-09-05 17:02:54 +02:00
Tim Gross
a9beef7edd jobspec: remove HCL1 support (#23912)
This changeset removes support for parsing jobspecs via the long-deprecated
HCLv1.

Fixes: https://github.com/hashicorp/nomad/issues/20195
Ref: https://hashicorp.atlassian.net/browse/NET-10220
2024-09-05 09:02:45 -04:00
Juana De La Cuesta
4972b7382d Merge pull request #23909 from hashicorp/docs-gh-23878
Remove wrong `VariableFlags` parameter from parse job endpoint
2024-09-04 20:15:21 +02:00
Daniel Bennett
2f5cf8efae networking: option to enable ipv6 on bridge network (#23882)
by setting bridge_network_subnet_ipv6 in client config

Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>
2024-09-04 10:17:10 -05:00
Juanadelacuesta
ed150010c7 docs: remove wrong FlagsVariable parameter 2024-09-04 15:09:38 +02:00
Austin Culter
ce3e159ee8 docs: update upgrade-specific.mdx (#23906) 2024-09-04 08:42:27 -04:00
Tim Gross
60284ad874 WI: use parent job ID for subject claim (#23902)
When we use the job ID in creating the subject claim (`sub`) for workload
identities, we forgot to use the parent job ID when that's available. Child job
IDs have a random component that makes them unsuitable for the subject field.

Ref: https://github.com/hashicorp/nomad/pull/23817#discussion_r1717490323
Ref: https://hashicorp.atlassian.net/browse/NET-10714
2024-09-03 16:33:10 -04:00
Tim Gross
c43e30a387 WI: interpolate parent job ID in vault.default_identity.extra_claims (#23817)
When we interpolate job fields for the `vault.default_identity.extra_claims`
block, we forgot to use the parent job ID when that's available (as we do for
all other claims). This changeset fixes the bug and adds a helper method that'll
hopefully remind us to do this going forward.

Also added a missing changelog entry for #23675 where we implemented the
`extra_claims` block originally, which shipped in Nomad 1.8.3.

Fixes: https://github.com/hashicorp/nomad/issues/23798
2024-09-03 13:56:36 -04:00
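
The parent-job-ID rule described in the two workload identity commits above can be sketched as a small helper. This is a minimal illustration, not the helper added in #23817: the `Job` type, its field names, the `identityJobID` function, and the example IDs are all hypothetical.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down job record used only for this sketch.
type Job struct {
	ID       string // for child jobs, includes a generated component
	ParentID string // empty unless the job was spawned by a parent job
}

// identityJobID returns the job ID to use when building identity claims:
// the parent's ID when one exists, otherwise the job's own ID.
func identityJobID(j Job) string {
	if j.ParentID != "" {
		return j.ParentID
	}
	return j.ID
}

func main() {
	parent := Job{ID: "backup"}
	// The child ID below is made up for illustration.
	child := Job{ID: "backup/dispatch-1725373990-a1b2c3d4", ParentID: "backup"}

	fmt.Println(identityJobID(parent))
	fmt.Println(identityJobID(child))
}
```

Routing every claim through one function like this is what keeps the subject stable across child jobs, since callers never read the raw ID directly.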
Piotr Kazmierczak
6700937303 cli: fix typos in quota_init and spec parsing (#23891) 2024-08-29 18:45:35 +02:00
Aimee Ukasick
8407a9f442 Docs: CE-674 Add job statuses (#23849)
* Docs: CE-674 Add job status explanation

add new page for jobs to concepts section

* add job types

* Rename jobs; move in site nav; remove types; reformat; add scaled

* change Jobs to Job on the page

* fix typo

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* create UI statuses heading

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2024-08-29 11:22:12 -05:00
Piotr Kazmierczak
9265b384b3 quota: parse device block (#23866) 2024-08-28 18:45:12 +02:00
Piotr Kazmierczak
2d7dcba4b7 quota: add device block to the quota init command (#23881) 2024-08-28 16:14:50 +02:00
Piotr Kazmierczak
82f0f00a83 node endpoints: do not create evals for sysbatch jobs (#23858)
node-update triggers should never trigger sysbatch allocations; these should
only ever be created by periodic-job or job-register.

An example scenario is: an allocation spawned by a sysbatch periodic job is
running on a node, the allocation gets stopped, GC runs, the node becomes
ineligible and eligible again, all within the parent sysbatch job period
window. If this happens, node-update will trigger the system scheduler and
prematurely start an allocation. This is not desired behavior; it is in fact a
bug.
2024-08-27 09:51:40 +02:00
Aimee Ukasick
bc90bd7c68 Merge pull request #23870 from hashicorp/ce705
Docs: CE-705 Highlight that user must back up keyring separately
2024-08-26 13:36:50 -05:00
Aimee Ukasick
3d06eef65d Docs: CE-705 Highlight that user must back up keyring separately 2024-08-26 11:25:26 -05:00
Aimee Ukasick
b562aabcee Merge pull request #23861 from hashicorp/aimeeu-patch-1
Website README: Add install HashiCorp package to run content-check locally
2024-08-26 10:42:09 -05:00
Aimee Ukasick
5c3dae9d22 Website README: Update to include installing HashiCorp package to run content-check locally
The "Validating content" section doesn't mention that you need the @hashicorp/platform-content-conformance package installed to run `npm run content-check` locally.
2024-08-23 15:17:51 -05:00
Daniel Bennett
a6e29057d6 networking: refactor some iptables for testability (#23856) 2024-08-23 10:05:46 -05:00
Sujata Roy
36522ec632 Merge pull request #23850 from hashicorp/Nomad-NET-9394
command/debug: capture more logs by default
2024-08-22 10:43:28 -07:00
Michael Schurter
3572fd58cf docs: add cl for #23850 2024-08-22 09:19:05 -07:00
Michael Schurter
8b0a88e2f7 docs: update defaults for operator debug 2024-08-22 09:17:03 -07:00
Seth Hoenig
8b093a6a5d scheduler: support for device-aware numa scheduling (#1760) (#23837)
(CE backport of ENT 59433d56c7215c0b8bf33764f41b57d9bd30160f (without ent files))

* scheduler: enhance numa aware scheduling with support for devices

* cr: add comments
2024-08-20 07:53:04 -05:00
Phil Renaud
5fcec1f8cc [ui] Show "Scaled Down" as a valid job status when task groups' counts are set to zero (#23829)
* Scaled Down as a status

* Scaled Down as a steady-state job panel status as well

* Test for badge status and changelog
2024-08-19 13:45:19 -04:00
Seth Hoenig
4aeb279534 e2e: fix module name of an artifact we download (#23843)
Because this will definitely never change again, for sure, trust me.
2024-08-19 10:25:35 -05:00
Phil Renaud
fbd8d62955 Check for target on click to prevent double-opening cmd+clicked links on jobs index (#23832) 2024-08-19 10:20:24 -04:00
Florian Apolloner
d6be784e2d namespaces: add allowed network modes to capabilities. (#23813) 2024-08-16 09:47:19 -04:00
Piotr Kazmierczak
0bc9796d3b client: log an error message if total detected cpu is zero (#23827) 2024-08-15 18:31:27 +02:00
Piotr Kazmierczak
f8e7905e24 docs: dmidecode manual installation as post-install step (#23823)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-08-15 17:14:16 +02:00
Tim Gross
682c8c0c81 cgroupslib: allow initial controller check with delegated cgroups v2 (#23803)
During Nomad client initialization with cgroups v2, we assert that the required
cgroup controllers are available in the root `cgroup.subtree_control` file by
idempotently writing to the file. But if Nomad is running with delegated
cgroups, this will fail file permissions checks even if the subtree control file
already has the controllers we need.

Update the initialization to first check if the controllers are missing before
attempting to write to them. This allows cgroup delegation so long as the
cluster administrator has pre-created a Nomad-owned cgroups tree and set the
`Delegate` option in a systemd override. If not, initialization fails in the
existing way.

Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. I've intentionally not
documented setting up cgroup delegation in this PR, as this PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.

Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
2024-08-14 16:58:21 -04:00
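
The check-before-write approach described in the cgroupslib commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from #23803: the function names and the list of required controllers are hypothetical, and the sketch works on the file contents as a string rather than touching the filesystem.

```go
package main

import (
	"fmt"
	"strings"
)

// missingControllers returns the required controllers that do not appear in
// the contents of a cgroup.subtree_control file (a space-separated list).
func missingControllers(subtreeControl string, required []string) []string {
	enabled := make(map[string]bool)
	for _, name := range strings.Fields(subtreeControl) {
		enabled[name] = true
	}
	var missing []string
	for _, name := range required {
		if !enabled[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

// enableCommand builds the string that would be written to subtree_control
// to enable the given controllers, for example "+cpuset +pids".
func enableCommand(missing []string) string {
	parts := make([]string, len(missing))
	for i, name := range missing {
		parts[i] = "+" + name
	}
	return strings.Join(parts, " ")
}

func main() {
	// Assumed list of required controllers, for illustration only.
	required := []string{"cpuset", "cpu", "io", "memory", "pids"}
	contents := "cpuset cpu io memory pids\n"

	missing := missingControllers(contents, required)
	if len(missing) == 0 {
		// Nothing to enable, so no write (and no permission check) is needed.
		fmt.Println("all controllers present; skipping write")
		return
	}
	fmt.Println("would write:", enableCommand(missing))
}
```

Because the write is skipped when nothing is missing, a client that lacks write permission on a pre-delegated tree can still pass the check, while a tree that is genuinely missing controllers fails on the write as before.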
Tim Gross
6aa503f2bb docker: disable cpuset management for non-root clients (#23804)
Nomad clients manage a cpuset cgroup for each task to reserve or share CPU
cores. But Docker owns its own cgroups, and attempting to set a parent cgroup
that Nomad manages runs into conflicts with how runc manages cgroups via
systemd. Therefore Nomad must run as root in order for cpuset management to ever
be compatible with Docker.

However, some users running in unsupported configurations felt that the changes
we made in Nomad 1.7.0 to ensure Nomad was running correctly represented a
regression. This changeset disables cpuset management for non-root Nomad
clients. When running Nomad as non-root, the driver will no longer reconcile
cpusets with Nomad and `resources.cores` will behave incorrectly (but the driver
will still run).

Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. This PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.

Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
Ref: https://hashicorp.atlassian.net/browse/NET-10652
Ref: https://github.com/opencontainers/runc/blob/main/docs/systemd.md
2024-08-14 16:44:13 -04:00
Martijn Vegter
aded4b3500 docs: remove remaining references to network_speed config (#23792) 2024-08-14 14:14:38 -04:00
Martina Santangelo
73ce56ba27 networking: refactor building nomad bridge config (#23772) 2024-08-14 12:43:31 -05:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatibility. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Seth Hoenig
f89335e01b build: update to go1.22.6 (#23805) 2024-08-14 09:20:14 -05:00
Seth Hoenig
0bcfd9a266 build: apt update before apt install (#23806) 2024-08-14 08:58:15 -05:00
Piotr Kazmierczak
c1362c03df docs: minimal Consul policy for Nomad agents needs node:write (#23800) 2024-08-13 17:53:21 +02:00
Piotr Kazmierczak
b34a6fe10b Merge pull request #23797 from hashicorp/post-1.8.3-release
Post 1.8.3 release
2024-08-13 14:36:01 +02:00
Piotr Kazmierczak
0ab7e2219a assets rebuild 2024-08-13 12:48:08 +02:00
Piotr Kazmierczak
c021857659 Merge release 1.8.3 files 2024-08-13 12:23:40 +02:00
hc-github-team-nomad-core
7c29e7cb7b Prepare for next release 2024-08-13 12:21:21 +02:00
hc-github-team-nomad-core
8489dadf57 Generate files for 1.8.3 release 2024-08-13 12:21:21 +02:00
Tim Gross
b7419bc940 api: only set url field in config if previously unset (#23785)
In #16872 we added support for configuring the API client with a unix domain
socket. In order to set the host correctly, we parse the address before mutating
the Address field in the configuration. But this prevents the configuration from
being reused across multiple clients, as the next time we parse the address it
will no longer be pointing to the socket. This breaks consumers like the
autoscaler, which reuse the API config between plugins.

Update the `NewClient` constructor to only override the `url` field if it hasn't
already been parsed. Include a test demonstrating safe reuse with a unix domain
socket.

Ref: https://github.com/hashicorp/nomad-autoscaler/issues/944
Ref: https://github.com/hashicorp/nomad-autoscaler/pull/945
2024-08-09 13:28:04 -04:00
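
The config-reuse fix described in the api commit above can be sketched like this. It is a simplified illustration, not the Nomad `api` package: the `Config` and `Client` types, the `parsed` field, and the socket path are hypothetical stand-ins. The constructor caches the parsed URL only when it is unset and leaves the caller's `Address` untouched.

```go
package main

import (
	"fmt"
	"net/url"
)

// Config is a hypothetical stand-in for an API client configuration.
type Config struct {
	Address string   // as supplied by the caller, e.g. a unix:// socket address
	parsed  *url.URL // cached parse result; nil until the first client is built
}

// Client is a hypothetical client that only records the URL it resolved.
type Client struct {
	URL *url.URL
}

// NewClient parses cfg.Address only if no parsed URL has been cached yet,
// and never mutates cfg.Address, so the same Config can be reused safely.
func NewClient(cfg *Config) (*Client, error) {
	if cfg.parsed == nil {
		u, err := url.Parse(cfg.Address)
		if err != nil {
			return nil, fmt.Errorf("invalid address: %w", err)
		}
		cfg.parsed = u
	}
	return &Client{URL: cfg.parsed}, nil
}

func main() {
	// The socket path is made up for illustration.
	cfg := &Config{Address: "unix:///var/run/nomad.sock"}

	first, err := NewClient(cfg)
	if err != nil {
		panic(err)
	}
	second, err := NewClient(cfg)
	if err != nil {
		panic(err)
	}

	// Both clients resolve to the same socket, and Address is unchanged.
	fmt.Println(first.URL.Path == second.URL.Path, cfg.Address)
}
```

A consumer that builds several clients from one shared config, as the autoscaler does for its plugins, relies on exactly this property: the second construction must see the same address as the first.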
Tim Gross
920f4702d6 testing: fix skip comment on RequireWindows helper (#23776) 2024-08-09 09:07:25 -04:00