Commit Graph

26597 Commits

Author SHA1 Message Date
James Rasell
3d6de7fa6b docs: Update CNI install detail to use 1.6.2 (#24976)
CNI had release problems which meant 1.6.1 got pulled and 1.6.2 is
identical.
2025-01-31 07:30:15 +00:00
Juana De La Cuesta
3861c40220 func: add initial enos skeleton (#24787)
* func: add initial enos skeleton

* style: add headers

* func: change the variables input to a map of objects to simplify the workloads creation

* style: formatting

* Add tests for servers and clients

* style: separate the tests in different scripts

* style: add missing headers

* func: add tests for allocs

* style: improve output

* func: add step to copy remote upgrade version

* style: hcl formatting

* fix: remove the terraform nomad provider

* fix: Add clean token to remove extra new line added in provision

* fix: add missing license headers

* style: hcl fmt

* style: rename variables and fix format

* func: remove the template step on the workloads module and chop the nomad token output on the provide module

* fix: correct the jobspec path on the workloads module

* fix: add missing variable definitions on job specs for workloads

* style: formatting

* fix: rename variable in health test
2025-01-30 16:37:55 +01:00
James Rasell
0d57e91282 sec: Suppress yamux OSV alert in CRT. (#24978)
The change also removes an old suppression which has now been
resolved.
2025-01-30 15:27:19 +00:00
James Rasell
bfd5f38761 ui: Remove unrequired node read from task log streaming page. (#24973)
Co-authored-by: Phil Renaud <phil@riotindustries.com>
2025-01-30 07:42:27 +00:00
Michael Smithhisler
47c14ddf28 remove remote task execution code (#24909) 2025-01-29 08:08:34 -05:00
Daniel Bennett
dcf6201d2b dynamic host volumes: CE side of quota tweaks (#24972)
* quota spec:
  if `region_limit.storage.host_volumes` is set,
  do not require that `variables` also be set,
  and vice versa.
* subtract from quota usage on volume delete
* stub CE quota subtraction method
2025-01-28 17:27:25 -06:00
Juana De La Cuesta
1b1ad896ec Add the path to the ssh key to connect to the cluster's instances as an output (#24969)
* fix: add the ssh key pem path to the outputs and fix the message with the correct path

* func: add ssh pem key as output
2025-01-28 18:25:02 +01:00
James Rasell
c8d7e741c8 e2e: Fix TF output SSH key path. (#24965) 2025-01-28 16:29:56 +00:00
Deniz Onur Duzgun
bfcbe83ab5 sec: sanitize identity token from events (#24966)
* bug: sanitize identity token from events

* add changelog
2025-01-28 10:57:06 -05:00
James Rasell
7a450f5499 build: update to go 1.23.5 (#24963) 2025-01-28 15:47:00 +00:00
James Rasell
8859cfa3f5 e2e: Ensure Consul client is running before starting Nomad service. (#24964) 2025-01-28 15:28:12 +00:00
Tim Gross
09eb473189 dynamic host volumes: set status unavailable on failed restore (#24962)
When a client restarts but can't restore a volume (ex. the plugin is now
missing), it's removed from the node fingerprint. So we won't allow future
scheduling of the volume, but we were not updating the volume state field to
report this reasoning to operators. Make debugging easier and the state field
more meaningful by setting the value to "unavailable".

Also, remove the unused "deleted" field. We did not implement soft deletes and
aren't planning on it for Nomad 1.10.0.

Ref: https://hashicorp.atlassian.net/browse/NET-11551
2025-01-27 16:35:53 -05:00
Michael Smithhisler
b7aabb11be changelog: add entry for PR #24739 (#24961) 2025-01-27 13:48:37 -05:00
Gabi
e107d84c78 taskrunner: fix panic when a task that has a dynamic user is recovered (#24739) 2025-01-27 13:05:55 -05:00
Phil Renaud
7106ac1462 Update playwright to 1.50.0 for e2e ui tests (#24956) 2025-01-27 12:03:59 -05:00
Daniel Bennett
49c147bcd7 dynamic host volumes: change env vars, fixup auto-delete (#24943)
* plugin env: DHV_HOST_PATH->DHV_VOLUMES_DIR
* client config: host_volumes_dir
* plugin env: add namespace+nodepool
* only auto-delete after error saving client state
  on *initial* create
2025-01-27 10:36:53 -06:00
Judith Malnick
890daba432 Remove web team from CODEOWNERS for content directories (#24946) 2025-01-27 08:57:58 -05:00
Seth Hoenig
1356880962 fingerprint: convert consul and vault fingerprinters to be reloadable (#24526)
This PR changes the Consul and Vault fingerprint implementations to be
reloadable rather than periodic. Reasons described in the issue.
2025-01-27 09:20:01 +00:00
Tim Gross
7add04eb0f refactor: volume request modes to be generic between DHV/CSI (#24896)
When we implemented CSI, the types of the fields for access mode and attachment
mode on volume requests were defined with a prefix "CSI". This gets confusing
now that we have dynamic host volumes using the same fields. Fortunately the
original was a typedef on string, and the Go API in the `api` package just uses
strings directly, so we can change the name of the type without breaking
backwards compatibility for the msgpack wire format.

Update the names to `VolumeAccessMode` and `VolumeAttachmentMode`. Keep the CSI
and DHV specific value constant names for these fields (they aren't currently
1:1), so that we can easily differentiate in a given bit of code which values
are valid.

Ref: https://github.com/hashicorp/nomad/pull/24881#discussion_r1920702890
2025-01-24 10:37:48 -05:00
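The rename in 7add04eb0f is safe because a Go type defined on `string` serializes as a plain string, so the type's name never reaches the wire. A minimal sketch of that property follows. It uses `encoding/json` purely as a standard-library stand-in for msgpack, and the struct names `oldRequest` and `newRequest` are hypothetical, not Nomad's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CSIVolumeAccessMode stands in for the old, CSI-prefixed type name.
type CSIVolumeAccessMode string

// VolumeAccessMode stands in for the new, generic type name.
type VolumeAccessMode string

// oldRequest and newRequest are hypothetical structs that differ only in
// the named type of their AccessMode field.
type oldRequest struct {
	AccessMode CSIVolumeAccessMode
}

type newRequest struct {
	AccessMode VolumeAccessMode
}

// encode serializes v. JSON is an assumption made for this sketch; the
// commit message refers to the msgpack wire format.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	before := encode(oldRequest{AccessMode: "single-node-writer"})
	after := encode(newRequest{AccessMode: "single-node-writer"})
	// The encoded bytes match because only the field name and the
	// underlying string value are written out.
	fmt.Println(before == after)
}
```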
James Rasell
b4d71f6693 changelog: add entry for #24919 (#24939) 2025-01-24 14:29:41 +00:00
James Rasell
ef32825ede docs: Remove Portworx state workloads link. (#24921)
Portworx website no longer has Nomad related documentation.
2025-01-24 08:45:41 +00:00
James Rasell
739e5ed6ee reporting: Update server to accommodate new enterprise reporting. (#24919)
Nomad Enterprise will utilise new reporting metrics and the
changes here allow this work to be conducted.

The server specific GetClientNodesCount function has been removed
from CE as this is only called within enterprise code. A new
heartbeater function allows us to get the number of active timers,
which can be used by the heartbeater metrics and any other callers
that want this data.
2025-01-24 08:00:07 +00:00
Tim Gross
c1dc9ed75d CSI: don't overwrite context with empty value from request (#24922)
When a volume is updated, we merge the new definition to the old. But the
volume's context comes from the plugin and is likely not present in the user's
volume specification. Which means that if the user re-submits the volume
specification to make an adjustment to the volume, we wipe out the context field
which might be required for subsequent operations by the CSI plugin. This was
discovered to be a problem with the Terraform provider and fixed there, but it's
also a problem for users of the `volume create` and `volume register` commands.

Update the merge so that we only overwrite the value of the context if it's been
explicitly set by the user. We still need to support user-driven updates to
context for the `volume register` workflow.

Ref: https://github.com/hashicorp/terraform-provider-nomad/pull/503
Fixes: https://github.com/democratic-csi/democratic-csi/issues/438
2025-01-23 14:06:32 -05:00
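The merge rule described in c1dc9ed75d can be sketched as below. This is an illustration only: `mergeContext` is a hypothetical helper, and treating an empty or nil map as "not set by the user" is an assumption about how the check is expressed.

```go
package main

import "fmt"

// mergeContext returns the context to keep after a volume update.
// The incoming value only replaces the existing one when the user
// supplied something; an empty request leaves the plugin-provided
// context untouched. (Hypothetical helper, not Nomad's actual code.)
func mergeContext(existing, incoming map[string]string) map[string]string {
	if len(incoming) == 0 {
		return existing
	}
	return incoming
}

func main() {
	fromPlugin := map[string]string{"node_attach_driver": "nfs"}

	// Re-submitting a spec without a context keeps the plugin's value.
	fmt.Println(mergeContext(fromPlugin, nil))

	// An explicit context from the user still wins, which the
	// `volume register` workflow needs.
	fmt.Println(mergeContext(fromPlugin, map[string]string{"k": "v"}))
}
```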
Michael Smithhisler
5befea62b7 event stream: adds ability to authenticate using Workload Identity (#24849) 2025-01-23 11:49:54 -05:00
Michael Smithhisler
d621211108 auth: adds option to enable verbose logging during sso (#24892)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2025-01-23 11:40:01 -05:00
Tim Gross
3e7adba8f0 volume spec: fix access_mode field in examples (#24911)
The `volume init` command creates example volume specifications. But one of the
values for `capability.access_mode` is not a valid value. Correct the example to
match the validation logic.
2025-01-22 09:30:49 -05:00
Juana De La Cuesta
687335639b fix: add a dependency to avoid terraform errors when generating ssh keys (#24912) 2025-01-22 11:36:03 +01:00
Piotr Kazmierczak
3d7e4fd634 client: always initialize node.HostVolumes map (#24910)
The default node configuration in the client should always set an empty
HostVolumes map. Otherwise callers can panic, e.g.,:

goroutine 179 [running]:
github.com/hashicorp/nomad/client/hostvolumemanager.UpdateVolumeMap({0x36042b0, 0xc000c62a80}, 0x0, {0xc000a802a0, 0xd}, 0xc000691940)
	github.com/hashicorp/nomad/client/hostvolumemanager/volume_fingerprint.go:43 +0x1b2
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints.func1({0xc000a802a0, 0xd}, 0xc000691940)
	github.com/hashicorp/nomad/client/node_updater.go:54 +0xd7
github.com/hashicorp/nomad/client.(*batchNodeUpdates).batchHostVolumeUpdates(0xc000912608?, 0xc0009f2f88)
	github.com/hashicorp/nomad/client/node_updater.go:417 +0x152
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints(0xc000c2d188)
	github.com/hashicorp/nomad/client/node_updater.go:53 +0x1c5
created by github.com/hashicorp/nomad/client.NewClient in goroutine 1
	github.com/hashicorp/nomad/client/client.go:557 +0x2069

is a panic of the HVM when restarting a client that doesn't have any static
host volumes, but does have a dynamic host volume.
2025-01-21 20:45:04 +01:00
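The fix in 3d7e4fd634 guards against a common Go pitfall: writing to a map that was never initialized panics at runtime. The sketch below reproduces that behaviour in isolation, assuming the panic in the trace comes from such a write. The `node` type, `newNode`, and `tryAdd` are hypothetical stand-ins, not the client's real structures.

```go
package main

import "fmt"

// node is a hypothetical stand-in for the client's node struct; only the
// HostVolumes field matters for this illustration.
type node struct {
	HostVolumes map[string]string
}

// newNode always initializes HostVolumes so later writes cannot panic.
func newNode() *node {
	return &node{HostVolumes: make(map[string]string)}
}

// tryAdd writes one entry into the map and reports whether the write
// panicked. The deferred recover turns the runtime panic into a result.
func tryAdd(n *node, name, path string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	n.HostVolumes[name] = path
	return false
}

func main() {
	// Zero-value node: HostVolumes is nil, so the write panics.
	fmt.Println(tryAdd(&node{}, "vol", "/tmp/vol"))

	// Node built with an empty map: the write succeeds.
	fmt.Println(tryAdd(newNode(), "vol", "/tmp/vol"))
}
```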
Piotr Kazmierczak
ebffcce378 stateful deployments: remove CSIVolumeIDs (#24908) 2025-01-21 17:00:55 +01:00
Juana De La Cuesta
039da61d8f [F-net-11478] Make keys directory cluster grouped (#24883)
* func: make windows arch dependent

* func: unify keys and make them cluster grouped

* Update README.md

* Update e2e/terraform/provision-infra/provision-nomad/variables.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update .gitignore

* style: add an output with the cluster identifier

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-01-20 10:18:38 +01:00
Piotr Kazmierczak
af4f31fc5f deps: upgrade go-getter to 1.7.8 (#24832) 2025-01-20 09:49:12 +01:00
Tim Gross
44d9c2a3d1 dynamic host volumes: enforce exclusive access in plan apply (#24881)
Some dynamic host volumes are claimed by allocations with the capability we
borrowed from CSI called `single-node-single-writer`, which says only one
allocation can use the volume, and it can use it in read/write mode. We enforce
this in the scheduler, but if evaluations for different jobs were to be
processed concurrently by the scheduler, it's possible to get plans that would
fail to enforce this requirement. Add a check in the plan applier to ensure that
non-terminal allocations have exclusive access when requested.
2025-01-17 15:38:33 -05:00
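The plan-apply check from 44d9c2a3d1 can be pictured as a last-chance scan over the allocations that already claim a volume. The sketch below is a simplified illustration under stated assumptions: the `alloc` struct, its fields, and `exclusiveConflict` are all hypothetical and do not mirror Nomad's data model.

```go
package main

import "fmt"

// alloc is a hypothetical, stripped-down allocation record.
type alloc struct {
	ID        string
	VolumeID  string
	Terminal  bool // true once the allocation has stopped
	Exclusive bool // true when it claims single-node-single-writer access
}

// exclusiveConflict reports whether admitting proposed would break an
// exclusive claim on its volume. Terminal allocations are ignored, and a
// conflict exists if either side of the pair asked for exclusive access.
func exclusiveConflict(existing []alloc, proposed alloc) bool {
	for _, a := range existing {
		if a.Terminal || a.VolumeID != proposed.VolumeID || a.ID == proposed.ID {
			continue
		}
		if a.Exclusive || proposed.Exclusive {
			return true
		}
	}
	return false
}

func main() {
	existing := []alloc{
		{ID: "a1", VolumeID: "vol-1", Exclusive: true},
		{ID: "a2", VolumeID: "vol-2", Terminal: true, Exclusive: true},
	}

	// vol-1 already has a live exclusive writer, so this plan is rejected.
	fmt.Println(exclusiveConflict(existing, alloc{ID: "b1", VolumeID: "vol-1"}))

	// The only claim on vol-2 is terminal, so this plan is accepted.
	fmt.Println(exclusiveConflict(existing, alloc{ID: "b2", VolumeID: "vol-2", Exclusive: true}))
}
```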
Michael Schurter
63dacd2d6e update vault token warning from 1.9->1.10 (#24884)
Fixes #24847
2025-01-17 10:56:06 -08:00
Tim Gross
96e539ee87 dynamic host volumes quotas (#24871)
Allow users to configure a host volumes quota in MB. This will be enforced at
the time of provisioning via create/register RPCs. This changeset is the CE
version of ENT/2114.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2114
Ref: https://hashicorp.atlassian.net/browse/NET-11549
2025-01-17 11:41:56 -05:00
Daniel Bennett
4807e74ea2 dynamic host volumes: serialize ops per volume (#24852)
let only one of any create/register/delete run at a time per volume ID

* plugins can assume that Nomad will not run concurrent operations for the same volume
* we avoid interleaving client RPCs with raft writes
2025-01-17 10:37:09 -06:00
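One common way to get the per-volume serialization described in 4807e74ea2 is a mutex keyed by volume ID. The sketch below shows that general technique, not the commit's actual implementation. `volumeLocks`, `withLock`, and `maxConcurrent` are hypothetical names, and the sketch never removes locks for deleted volumes, which a real implementation would need to handle.

```go
package main

import (
	"fmt"
	"sync"
)

// volumeLocks hands out one mutex per volume ID (hypothetical helper).
type volumeLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{locks: make(map[string]*sync.Mutex)}
}

// withLock runs op while holding the lock for volID, so only one
// create/register/delete runs at a time for that volume. Operations on
// different volume IDs do not block each other.
func (v *volumeLocks) withLock(volID string, op func()) {
	v.mu.Lock()
	l, ok := v.locks[volID]
	if !ok {
		l = &sync.Mutex{}
		v.locks[volID] = l
	}
	v.mu.Unlock()

	l.Lock()
	defer l.Unlock()
	op()
}

// maxConcurrent starts n operations against one volume ID and returns the
// highest number that were ever inside the critical section together.
func maxConcurrent(n int) int {
	locks := newVolumeLocks()
	var wg sync.WaitGroup
	var counter sync.Mutex
	inFlight, peak := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locks.withLock("vol-1", func() {
				counter.Lock()
				inFlight++
				if inFlight > peak {
					peak = inFlight
				}
				counter.Unlock()

				counter.Lock()
				inFlight--
				counter.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	// With the per-volume lock in place the peak never exceeds one.
	fmt.Println(maxConcurrent(50))
}
```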
Tim Gross
33c68dcc58 docs: clarify workload-associated policy parameters (#24882)
Workload-associated ACL policies can only be set on a specific job within a
namespace, not the namespace as a whole. Clarify the documentation for the CLI
and API.

Fixes: https://github.com/hashicorp/terraform-provider-nomad/issues/500
Ref: https://github.com/hashicorp/terraform-provider-nomad/pull/504
2025-01-17 10:51:33 -05:00
James Rasell
63ea13be77 agent: Ensure logger set up method is public. (#24886)
This is needed by a Nomad Enterprise code path.
2025-01-17 13:47:06 +00:00
Tim Gross
1df94b1470 E2E: refactor volume_mounts test (#24876)
The volume_mounts test is flaky due to slow starts from the exec-driver and some
incorrect wait code. Refactor the volume_mounts test to use the `e2e/v3` package
helpers, and use these to give it enough time to start the exec tasks.
2025-01-17 08:31:50 -05:00
James Rasell
753f752cdd agent: remove unused log filter and unrequired library. (#24873)
The Nomad agent used a log filter to ensure logs were written at
the expected level. Since the use of hclog this is not required,
as hclog acts as the gate keeper and filter for logging. All log
writers accept messages from hclog which has already done the
filtering.
2025-01-17 07:51:27 +00:00
Brian McClain
b4cc5d88e7 docs: update install command for Fedora to match install page (#24870) 2025-01-16 13:39:56 -05:00
James Rasell
03cbe7cd71 server: Fix error message format when detailing cluster metadata. (#24874) 2025-01-16 13:54:42 +00:00
James Rasell
1ae9785f9b agent: Fix a bug where all syslog lines are notice when using JSON (#24865)
The agent syslog write handler was unable to handle JSON log lines
correctly, meaning all syslog entries when using JSON log format
showed as NOTICE level.

This change adds a new handler to the Nomad agent which can parse
JSON log lines and correctly understand the expected log level
entry.

The change also removes the use of a filter from the default log
format handler. This is not needed as the logs are fed into the
syslog handler via hclog, which is responsible for level
filtering.
2025-01-16 07:23:08 +00:00
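The JSON handling described in 1ae9785f9b comes down to reading the level out of each structured log line before choosing a syslog priority. A minimal sketch of that extraction step follows. It assumes the level is stored under the `@level` key, which is how hclog's JSON format is understood to name it, and `levelFromJSONLine` is a hypothetical function name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// levelFromJSONLine extracts the log level from one JSON-formatted log
// line. It returns false when the line is not valid JSON or carries no
// level, so the caller can fall back to a default priority.
// The "@level" key is an assumption about the log format.
func levelFromJSONLine(line []byte) (string, bool) {
	var entry struct {
		Level string `json:"@level"`
	}
	if err := json.Unmarshal(line, &entry); err != nil || entry.Level == "" {
		return "", false
	}
	return strings.ToUpper(entry.Level), true
}

func main() {
	level, ok := levelFromJSONLine([]byte(`{"@level":"error","@message":"boom"}`))
	fmt.Println(level, ok)

	// A plain-text line carries no JSON level field.
	_, ok = levelFromJSONLine([]byte(`[ERROR] agent: boom`))
	fmt.Println(ok)
}
```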
Tim Gross
46bd0b1716 dynamic host volume: set default capability (#24857)
We can reduce the amount of volume specification configuration many users will
need by setting a default capability on a dynamic host volume if none is
set. The default capability will allow using the volume in read/write mode on
its node, with no further restrictions except those that might be set in the
jobspec.
2025-01-15 14:07:07 -05:00
Tim Gross
044784b2fb dynamic host volumes: move node pool governance to placement filter (CE) (#24867)
Enterprise governance checks happen after dynamic host volumes are placed, so if
node pool governance is active and you don't set a node pool or node ID for a
volume, it's possible to get a placement that fails node pool governance even
though there might be other nodes in the cluster that would be valid placements.

Move the node pool governance for host volumes into the placement path, so that
we're checking a specific node pool when node pool or node ID are set, but
otherwise filtering out candidate nodes by node pool.

This changeset is the CE version of ENT/2200.

Ref: https://hashicorp.atlassian.net/browse/NET-11549
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2200
2025-01-15 14:04:18 -05:00
Tim Gross
a292ecc621 dynamic host volumes: allow for node pool and plugin ID changes (#24851)
Update dynamic host volume validation and update logic to allow for changes to
the node pool and plugin ID. If the client's node pool changes we'll sync up the
correct node pool for the volumes already placed on that client. We'll also
allow the plugin ID to be changed to allow for new versions of plugins
supporting the same volume over time.
2025-01-15 13:40:42 -05:00
James Rasell
75d0ac657e ui: Fill service check background object for pending checks. (#24818) 2025-01-15 15:27:10 +00:00
James Rasell
8d201a82fd agent: Fixed a bug where syslog error messages marked as notice. (#24820)
The mapping between Nomad log level identifiers and syslog
priorities did not handle the error level string correctly.
2025-01-15 08:02:53 +00:00
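The class of bug fixed in 8d201a82fd is easy to see in a level-to-priority lookup with a default branch: any level string the lookup does not recognise silently becomes notice. The sketch below illustrates such a mapping. The function name `severityFor` and the exact set of level strings are assumptions; the numeric values are the standard syslog severities (3 error, 4 warning, 5 notice, 6 informational, 7 debug).

```go
package main

import (
	"fmt"
	"strings"
)

// severityFor maps a log level name to a syslog severity number.
// Unrecognised names fall back to notice (5), which is why a missing or
// misspelled case for the error level would surface as notice entries.
// (Hypothetical helper; the level names are assumed, not taken from Nomad.)
func severityFor(level string) int {
	switch strings.ToUpper(level) {
	case "TRACE", "DEBUG":
		return 7 // debug
	case "INFO":
		return 6 // informational
	case "WARN":
		return 4 // warning
	case "ERROR":
		return 3 // error
	default:
		return 5 // notice
	}
}

func main() {
	fmt.Println(severityFor("ERROR"))
	fmt.Println(severityFor("something-else"))
}
```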
James Rasell
689f935e0a services: Support TLS Skip Verify within Nomad service checks. (#24781)
Checks within a service using the Nomad provider can now utilise
the `tls_skip_verify` parameter.
2025-01-15 07:39:39 +00:00
Michael Schurter
0438294f69 Merge pull request #24858 from hashicorp/post-1.9.5-release
Post 1.9.5 release
2025-01-14 13:11:58 -08:00
Michael Schurter
925d2dbaed actually update backport changelog 2025-01-14 12:56:37 -08:00