3846 Commits

Author SHA1 Message Date
James Rasell
7d48aa2667 client: emit optional telemetry from prerun and prestart hooks. (#24556)
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.

The new datapoints are:
  - nomad.client.alloc_hook.prerun.success (counter)
  - nomad.client.alloc_hook.prerun.failed (counter)
  - nomad.client.alloc_hook.prerun.elapsed (sample)

  - nomad.client.task_hook.prestart.success (counter)
  - nomad.client.task_hook.prestart.failed (counter)
  - nomad.client.task_hook.prestart.elapsed (sample)

The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.

Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
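
A minimal sketch of how a hook runner could emit the three datapoints per hook, assuming a hypothetical in-memory `recorder` in place of the real metrics sink. The type, the `runHook` helper, and the `errHookFailed` value are illustrative and not taken from the Nomad code base; only the metric names come from the list above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errHookFailed is a sample error used to exercise the failure path.
var errHookFailed = errors.New("hook failed")

// recorder is a hypothetical in-memory stand-in for a metrics sink.
type recorder struct {
	counters map[string]int
	samples  map[string][]time.Duration
}

func newRecorder() *recorder {
	return &recorder{
		counters: map[string]int{},
		samples:  map[string][]time.Duration{},
	}
}

// runHook runs fn, always records <prefix>.elapsed, and increments either
// <prefix>.success or <prefix>.failed depending on the returned error.
func (r *recorder) runHook(prefix string, fn func() error) error {
	start := time.Now()
	err := fn()
	r.samples[prefix+".elapsed"] = append(r.samples[prefix+".elapsed"], time.Since(start))
	if err != nil {
		r.counters[prefix+".failed"]++
		return err
	}
	r.counters[prefix+".success"]++
	return nil
}

func main() {
	r := newRecorder()
	_ = r.runHook("nomad.client.alloc_hook.prerun", func() error { return nil })
	_ = r.runHook("nomad.client.task_hook.prestart", func() error { return errHookFailed })
	fmt.Println(r.counters)
}
```

Recording the elapsed sample before branching on the error means slow failures are visible as well as slow successes.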
2024-12-12 14:43:14 +00:00
James Rasell
86bc7ed224 cli: Ensure JSON flag is respected in autopilot health command. (#24655) 2024-12-12 13:43:32 +00:00
James Rasell
261359fba7 agent: Fix a bug where retry_join was not retrying. (#24561)
The retry_join logic was not allowing for retries to happen and
was exiting after the first failed discovery attempt. This change
fixes that behaviour and adds a test to ensure no further
regressions.
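
The intended behaviour can be sketched as a loop that keeps going after a failed discovery attempt instead of exiting. This is an illustration only: `retryJoin`, its parameters, and `errNotReady` are hypothetical names, and the real agent's retry settings are not modelled here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a sample discovery error for the demonstration.
var errNotReady = errors.New("discovery not ready")

// retryJoin calls discover until it succeeds or maxAttempts (which must be
// at least 1) is exhausted, sleeping for interval between attempts. A failed
// attempt records the error and continues rather than returning.
func retryJoin(discover func() ([]string, error), maxAttempts int, interval time.Duration) ([]string, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		addrs, err := discover()
		if err == nil {
			return addrs, nil
		}
		lastErr = err
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return nil, fmt.Errorf("join failed after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	addrs, err := retryJoin(func() ([]string, error) {
		calls++
		if calls < 3 {
			return nil, errNotReady
		}
		return []string{"10.0.0.1:4648"}, nil
	}, 5, 10*time.Millisecond)
	fmt.Println(addrs, err, calls)
}
```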
2024-11-29 08:29:15 +00:00
Piotr Kazmierczak
f7a4ded2c0 security: add CT executeTemplate to default function_denylist (#24541)
This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-11-22 19:33:56 +01:00
Piotr Kazmierczak
368241dbf2 security: a more comprehensive env.denylist (#24540)
A more comprehensive env.denylist that now includes more token, token file and
license variables. 

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-11-22 18:54:18 +01:00
Juana De La Cuesta
c21dfdb17a [gh-476] Sanitise HCL variables before storing on job submission (#24423)
* func: Use URL rules to escape non-alphanumeric values in HCL variables


* docs: add changelog

* func: unescape flags before returning

* use JSON.stringify instead of bespoke value quoting to handle in-value-multi-line cases

---------

Co-authored-by: Phil Renaud <phil@riotindustries.com>
2024-11-22 11:45:02 +01:00
Tim Gross
6b9dbefb9e consul: handle nil multierror pointer correctly (#24513)
When the service client syncs to Consul, we accumulate service sync errors in a
multierror before reading all the local checks. If the API call to the local
checks fails, we either return that error or append it to the multierror and
return the set of errors. But `multierror.Error.Len()` doesn't nil-check, so we
need to do this ourselves.
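
To illustrate why the nil check is needed, here is a small sketch using a local stand-in type rather than the real multierror package. It assumes `Len` has a value receiver, in which case calling it through a nil pointer dereferences nil and panics; `multiError` and `safeLen` are hypothetical names.

```go
package main

import "fmt"

// multiError is a local stand-in for an accumulated-error type.
type multiError struct {
	Errors []error
}

// Len has a value receiver, so calling it on a nil *multiError forces a
// dereference of the nil pointer and panics at runtime.
func (m multiError) Len() int { return len(m.Errors) }

// safeLen performs the nil check that Len itself does not.
func safeLen(m *multiError) int {
	if m == nil {
		return 0
	}
	return m.Len()
}

func main() {
	var mErr *multiError // no sync errors were accumulated
	fmt.Println(safeLen(mErr))
	// mErr.Len() here would panic with a nil pointer dereference.
}
```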

I've also made a quick pass through the rest of the code base looking for
multierror `Len` method calls to see if we have this pattern elsewhere.

Fixes: https://github.com/hashicorp/nomad/issues/24512
2024-11-20 10:55:52 -05:00
Piotr Kazmierczak
9c5078f151 agent: set content type header explicitly (#24489)
This PR addresses an XSS vulnerability where Nomad agents wouldn't explicitly
set content type headers for error responses.
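
A sketch of the general technique with Go's standard `net/http`, not the agent's actual handler code. When no `Content-Type` header is set, `net/http` sniffs the response body to pick one, so an error message that echoes HTML-like input can end up served as HTML. Setting the header explicitly avoids that. `writeError` and `probe` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeError sends an error response with an explicit plain-text content
// type so the body is never interpreted as HTML by the browser.
func writeError(w http.ResponseWriter, code int, msg string) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.WriteHeader(code)
	fmt.Fprint(w, msg)
}

// probe records what writeError produces for a given status and message.
func probe(code int, msg string) (int, string, string) {
	rec := httptest.NewRecorder()
	writeError(rec, code, msg)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	code, contentType, body := probe(http.StatusBadRequest, "<script>alert(1)</script>")
	fmt.Println(code, contentType, body)
}
```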
2024-11-20 10:18:30 +01:00
Tim Gross
189d648f95 csi: remove redundant namespace field from volume status output (#24432)
The `volume status :id` command outputs the namespace for a CSI volume
twice. Drop the second output.

Ref: https://github.com/hashicorp/nomad/pull/24382#discussion_r1837097250
2024-11-11 16:05:59 -05:00
hc-github-team-nomad-core
9f9e66fa61 Generate files for 1.9.3 release 2024-11-11 19:40:44 +01:00
hc-github-team-nomad-core
1938a7578b Generate files for 1.9.2 release 2024-11-08 15:21:39 +01:00
Daniel Bennett
a036b75aef api: new dispatch endpoint sends body as Payload (#24381)
this opens up dispatching parameterized jobs by systems
that do not allow modifying what http request body they send

e.g. these two things are equal:

POST '{"Payload": "'"$(base64 <<< "hello")"'"}' /v1/job/my-job/dispatch
POST 'hello' /v1/job/my-job/dispatch/payload
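
For comparison, a small sketch of how a client would have to build the JSON body for the existing endpoint, which is the step the new `/payload` endpoint makes unnecessary. Only the `Payload` field shown above is modelled; `dispatchRequest` and `dispatchBody` are hypothetical names. Go's `encoding/json` encodes a `[]byte` field as a base64 string, which matches the shape of the first request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dispatchRequest models only the Payload field from the example above.
type dispatchRequest struct {
	Payload []byte
}

// dispatchBody wraps a raw payload in the JSON body expected by
// /v1/job/<job>/dispatch. encoding/json base64-encodes []byte values.
func dispatchBody(payload []byte) (string, error) {
	b, err := json.Marshal(dispatchRequest{Payload: payload})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := dispatchBody([]byte("hello"))
	fmt.Println(body, err) // prints {"Payload":"aGVsbG8="} <nil>
}
```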
2024-11-07 10:12:29 -06:00
Jamie Finnigan
dec1bf51c0 update ndjson links due to domain expiry/resale (#24306) 2024-10-28 09:06:50 +00:00
Martijn Vegter
6236f354a5 consul: add support for service weight (#24186) 2024-10-25 11:21:38 -04:00
Phil Renaud
cfba3edaab Fixed an error in job tag unset help text (#24272) 2024-10-22 16:02:20 -04:00
Juana De La Cuesta
aaf7936bb2 Merge pull request #24270 from hashicorp/post-1.9.1-release
Post 1.9.1 release
2024-10-22 17:22:18 +02:00
Habibi Mustafa
c5aa77e012 CLI: fix leadership transfer title docs (#24263) 2024-10-21 16:18:59 -04:00
hc-github-team-nomad-core
8117fa011b Generate files for 1.9.1 release 2024-10-21 21:51:05 +02:00
hc-github-team-nomad-core
777776ef37 Generate files for 1.9.1 release 2024-10-21 21:51:04 +02:00
Rajeev
42eacc85e2 #23671 Added synopsis for operator root and operator gossip command. (#23855)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2024-10-18 08:48:12 +01:00
Michael Schurter
e440e1d1db cli: update nomad job init full examples (#24232)
* cli: trim job init example jobspec
* cli: trim job init -connect example jobspec

---------

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-10-17 10:32:47 -07:00
Daniel Bennett
a0d7fb6b09 connect: fix ipv6 bind_address test (#24216) 2024-10-16 08:23:44 -05:00
Daniel Bennett
067afcda26 Consul Connect over IPv6 (except tproxy) (#24203)
* detect ipv6 on "bridge" network and set
  service.connect.sidecar_proxy.config.bind_address
  for envoy to "::" instead of "0.0.0.0"
* allow users to set bind_address in jobspec
  e.g. "" would defer to consul proxy-defaults
* caveat: tproxy still does not work, because
  the CNI plugin does not configure ip6tables
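
The selection rule in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the Nomad implementation: `bindAddress` is a hypothetical function, the bridge addresses are passed in as strings, and a non-nil `override` stands for a `bind_address` set in the jobspec (including the empty string).

```go
package main

import (
	"fmt"
	"net/netip"
)

// bindAddress returns the user's override when one is set (even if empty),
// otherwise "::" when any bridge address is IPv6, otherwise "0.0.0.0".
func bindAddress(bridgeAddrs []string, override *string) string {
	if override != nil {
		return *override
	}
	for _, a := range bridgeAddrs {
		ip, err := netip.ParseAddr(a)
		if err != nil {
			continue // ignore unparsable addresses in this sketch
		}
		if ip.Is6() && !ip.Is4In6() {
			return "::"
		}
	}
	return "0.0.0.0"
}

func main() {
	fmt.Println(bindAddress([]string{"172.26.64.2"}, nil))
	fmt.Println(bindAddress([]string{"172.26.64.2", "fd00::2"}, nil))
}
```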
2024-10-14 18:52:02 -05:00
hc-github-team-nomad-core
f1714162df Generate files for 1.9.0 release 2024-10-14 07:26:36 +01:00
Tim Gross
4de1665942 consul: improve reliability of deregistration (#24166)
When the local Consul agent receives a deregister request, it performs a
pre-flight check using the locally cached ACL token. The agent then sends the
request upstream to the Consul servers as part of anti-entropy, using its own
token. This requires that the token we use for deregistration is valid even
though that's not the token used to write to the Consul server.

There are several cases where the service identity token might no longer exist
at the time of deregistration:
* A race condition between the sync and destroying the allocation.
* Misconfiguration of the Consul auth method with a TTL.
* Out-of-band destruction of the token.

Additionally, Nomad's sync with Consul returns early if there are any errors,
which means that a single broken token can prevent any other service on the
Nomad agent from being registered or deregistered.

Update Nomad's sync with Consul to use the Nomad agent's own Consul token for
deregistration, regardless of which token the service was registered
with. Accumulate errors from the sync so that they no longer block
deregistration of other services.
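
The error-accumulation part of this change can be sketched with the standard library's `errors.Join`. The names `deregisterAll` and `errTokenGone` are hypothetical and the Consul API is replaced by a plain callback, so this shows only the control flow: every service is attempted and failures are collected rather than returned early.

```go
package main

import (
	"errors"
	"fmt"
)

// errTokenGone is a sample error standing in for a missing ACL token.
var errTokenGone = errors.New("ACL token not found")

// deregisterAll attempts to deregister every service ID. Failures are
// collected and returned together, so one broken service cannot prevent
// the others from being processed.
func deregisterAll(ids []string, deregister func(id string) error) error {
	var errs []error
	for _, id := range ids {
		if err := deregister(id); err != nil {
			errs = append(errs, fmt.Errorf("deregister %s: %w", id, err))
		}
	}
	return errors.Join(errs...) // nil when no errors were collected
}

func main() {
	attempted := 0
	err := deregisterAll([]string{"web", "db", "cache"}, func(id string) error {
		attempted++
		if id == "db" {
			return errTokenGone
		}
		return nil
	})
	fmt.Println(attempted, err)
}
```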

Fixes: https://github.com/hashicorp/nomad/issues/20159
2024-10-11 12:32:23 -04:00
Seth Hoenig
f1ce127524 jobspec: add a chown option to artifact block (#24157)
* jobspec: add a chown option to artifact block

This PR adds a boolean 'chown' field to the artifact block.

It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.

* api: no need for pointer of chown field
2024-10-11 11:30:27 -05:00
hc-github-team-nomad-core
668a827b2b Generate files for 1.9.0-beta.2 release 2024-10-04 16:18:27 +00:00
Martijn Vegter
3ecf0d21e2 metrics: introduce client config to include alloc metadata as part of the base labels (#23964) 2024-10-02 10:55:44 -04:00
Tim Gross
6c03e1991d refactor: clean up slice initialization in node status (#24109)
We initialize this slice with a zeroed array and then append to it, which means
we then have to clean out the empty strings later. Initialize to the correct
capacity up front so there are no empty values.
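
The pattern described here is a common Go pitfall and can be shown in isolation. The function names below are illustrative and not taken from the node status code: `make([]string, n)` creates n empty strings that `append` then adds to, whereas `make([]string, 0, n)` reserves capacity without creating any elements.

```go
package main

import "fmt"

// zeroedThenAppend reproduces the problem: the slice starts with
// len(names) empty strings and append adds the real values after them.
func zeroedThenAppend(names []string) []string {
	out := make([]string, len(names))
	for _, n := range names {
		out = append(out, n)
	}
	return out
}

// withCapacity starts at length zero with the right capacity, so the
// result contains only the appended values.
func withCapacity(names []string) []string {
	out := make([]string, 0, len(names))
	for _, n := range names {
		out = append(out, n)
	}
	return out
}

func main() {
	names := []string{"a", "b"}
	fmt.Printf("%q\n", zeroedThenAppend(names)) // prints ["" "" "a" "b"]
	fmt.Printf("%q\n", withCapacity(names))     // prints ["a" "b"]
}
```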

Ref: https://github.com/hashicorp/nomad/pull/24104
2024-10-02 10:40:32 -04:00
Tim Gross
651d8d6f88 tests: fixup copywrite in test file (#24101)
In #24007 we merged new HCL files but they were missing copywrite headers
because the scan didn't run on this PR for some reason. I've already backported
this to the Enterprise branches.
2024-10-01 16:43:10 -04:00
Juliano Martinez
4a74fda8ce Allow client template config block to be parsed when using json config (#24007)
- Adds tests
- Adds sample test data for parsing hcl and json
- Adds changelog
2024-10-01 15:44:36 -04:00
Tim Gross
5e1ad14f1f scaling policy: use request namespace as target if unset in jobspec (#24065)
When jobs are submitted with a scaling policy, the scaling policy's target only
includes the job's namespace if the `namespace` field is set in the jobspec and
not from the request. Normally jobs are canonicalized in the RPC handler before
being written to Raft. But the scaling policy targets are instead written during
the conversion from `api.Job` to `structs.Job`. We populate the `structs.Job`
namespace from the request here as well, but only after the conversion has
occurred. Swap the order of these operations so that the conversion is always
happening with a correct namespace.

Long-term we should not be making mutations during conversion either. But we
can't remove it immediately because API requests may come from any agent across
upgrades. Move the scaling target creation into the `Canonicalize` method and
mark it for future removal in the API conversion code path.

Fixes: https://github.com/hashicorp/nomad/issues/24039
2024-10-01 11:41:40 -04:00
hc-github-team-nomad-core
07dc87eb21 Generate files for 1.9.0-beta.1 release 2024-09-26 17:35:57 +00:00
Phil Renaud
e206993d49 Feature: Golden Versions (#24055)
* TaggedVersion information in structs, rather than job_endpoint (#23841)

* TaggedVersion information in structs, rather than job_endpoint

* Test for taggedVersion description length

* Some API plumbing

* Tag and Untag job versions (#23863)

* Tag and Untag at API level on down, but am I unblocking the wrong thing?

* Code and comment cleanup

* Unset methods generally now I stare long into the namespace abyss

* Namespace passes through with QueryOptions removed from a write requesting struct

* Comment and PR review cleanup

* Version back to VersionStr

* Generally consolidate unset logic into apply for version tagging

* Addressed some PR comments

* Auth check and RPC forwarding

* uint64 instead of pointer for job version after api layer and renamed copy

* job tag command split into apply and unset

* latest-version convenience handling moved to CLI command level

* CLI tests for tagging/untagging

* UI parts removed

* Add to job table when unsetting job tag on latest version

* Vestigial no more

* Compare versions by name and version number with the nomad history command (#23889)

* First pass at passing a tagname and/or diff version to plan/versions requests

* versions API now takes compare_to flags

* Job history command output can have tag names and descriptions

* compare_to to diff-tag and diff-version, plus adding flags to history command

* 0th version now shows a diff if a specific diff target is requested

* Addressing some PR comments

* Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI

* Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon

* Version diff tests

* re-implement JobVersionByTagName

* Test mods and simplification

* Documentation for nomad job history additions

* Prevent pruning and reaping of TaggedVersion jobs (#23983)

tagged versions should not count against JobTrackedVersions
i.e. new job versions being inserted should not evict tagged versions

and GC should not delete a job if any of its versions are tagged
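
The pruning rule above can be sketched as follows, assuming versions are ordered newest first and that `tracked` plays the role of JobTrackedVersions. The `jobVersion` type and `prune` function are hypothetical and much simpler than the real state store logic.

```go
package main

import "fmt"

// jobVersion is a simplified stand-in for a stored job version.
type jobVersion struct {
	Version uint64
	Tagged  bool
}

// prune keeps every tagged version plus the newest `tracked` untagged
// versions. Tagged versions do not count against the tracked limit.
// versions is assumed to be sorted newest first.
func prune(versions []jobVersion, tracked int) []jobVersion {
	kept := make([]jobVersion, 0, len(versions))
	untagged := 0
	for _, v := range versions {
		if v.Tagged {
			kept = append(kept, v)
			continue
		}
		if untagged < tracked {
			kept = append(kept, v)
			untagged++
		}
	}
	return kept
}

func main() {
	versions := []jobVersion{
		{Version: 5}, {Version: 4}, {Version: 3},
		{Version: 2, Tagged: true}, {Version: 1},
	}
	fmt.Println(prune(versions, 2))
}
```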

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* [ui] Version Tags on the job versions page (#24013)

* Timeline styles and their buttons modernized, and tags added

* styled but not yet functional version blocks

* Rough pass at edit/unedit UX

* Styles consolidated

* better UX around version tag crud, plus adapter and serializers

* Mirage and acceptance tests

* Modify percy to not show time-based things

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* Job revert command and API endpoint can take a string version tag name (#24059)

* Job revert command and API endpoint can take a string version tag name

* RevertOpts as a signature-modified alternative to Revert()

* job revert CLI test

* Version pointers in endpoint tests

* Don't copy over the tag when a job is reverted to a version with a tag

* Convert tag name to version number at CLI level

* Client method for version lookup by tag

* No longer double-declaring client

* [ui] Add tag filter to the job versions page (#24064)

* Rough pass at the UI for version diff dropdown

* Cleanup and diff fetching via adapter method

* TaggedVersion now VersionTag (#24066)

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-09-25 19:59:16 -04:00
Tim Gross
65ec00da1d cli: fix -t flag on job status command (#24054)
In #18925 we added a `-json` flag to the `job status` command, but the argument
handling had a bug where it would always set the `-json` flag if either the `-t`
or `-json` flags were set, resulting in a misleading error. Instead, pass the
`-json` flag value into the formatter.
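
A rough sketch of the kind of bug described, using a hypothetical `format` helper rather than the real CLI code. The assumption here is that the formatter rejects a request where both JSON output and a template are set. Passing `asJSON || tmpl != ""` as the JSON argument then fails whenever `-t` is used, while passing the `-json` flag value alone works. The error text is invented for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"text/template"
)

// format renders data as JSON when asJSON is true, otherwise with the
// given Go template. Setting both is rejected.
func format(asJSON bool, tmpl string, data any) (string, error) {
	if asJSON && tmpl != "" {
		return "", errors.New("both JSON and template output were requested")
	}
	if asJSON {
		b, err := json.Marshal(data)
		if err != nil {
			return "", err
		}
		return string(b), nil
	}
	t, err := template.New("out").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := map[string]string{"ID": "example"}
	jsonFlag, tmplFlag := false, "{{.ID}}"

	// Buggy call: the first argument is true whenever either flag is set.
	_, err := format(jsonFlag || tmplFlag != "", tmplFlag, data)
	fmt.Println("buggy:", err)

	// Fixed call: pass the -json flag value through unchanged.
	out, err := format(jsonFlag, tmplFlag, data)
	fmt.Println("fixed:", out, err)
}
```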

Fixes: https://github.com/hashicorp/nomad/issues/24050
2024-09-25 09:12:52 -04:00
Michael Smithhisler
6b6aa7cc26 identity: adds ability to specify custom filepath for saving workload identities (#24038) 2024-09-23 10:27:00 -04:00
Tim Gross
a7f2cb879e command line tools for redacting keyring from snapshots (#24023)
In #23977 we moved the keyring into Raft, which can expose key material in Raft
snapshots when using the less-secure AEAD keyring instead of KMS. This changeset
adds tools for redacting this material from snapshots:

* The `operator snapshot state` command gains the ability to display key
  metadata (only), which respects the `-filter` option.
* The `operator snapshot save` command gains a `-redact` option that removes key
  material from the snapshot after it's downloaded.
* A new `operator snapshot redact` command allows removing key material from an
  existing snapshot.
2024-09-20 15:30:14 -04:00
Daniel Bennett
ec81e7c57c networking: add ignore_collision for static port{} (#23956)
so more than one copy of a program can run
at a time on the same port with SO_REUSEPORT.

requires host network mode.

some task drivers (like docker) may also need
config {
  network_mode = "host"
}
but this is not validated prior to placement.
2024-09-17 16:01:48 -05:00
Seth Hoenig
51215bf102 deps: update to go-set/v3 and refactor to use custom iterators (#23971)
* deps: update to go-set/v3

* deps: use custom set iterators for looping
2024-09-16 13:40:10 -05:00
Piotr Kazmierczak
47f3313ffd cli: quota status extension for devices (#23899)
quota status CLI now displays device limits (if present in the quota spec)
2024-09-12 16:51:53 +02:00
Piotr Kazmierczak
2e6ccf825a quotas: corrections to Resources.Add and quota apply parsing logic (#23894) 2024-09-09 15:27:17 +02:00
Tim Gross
a9beef7edd jobspec: remove HCL1 support (#23912)
This changeset removes support for parsing jobspecs via the long-deprecated
HCLv1.

Fixes: https://github.com/hashicorp/nomad/issues/20195
Ref: https://hashicorp.atlassian.net/browse/NET-10220
2024-09-05 09:02:45 -04:00
Daniel Bennett
2f5cf8efae networking: option to enable ipv6 on bridge network (#23882)
by setting bridge_network_subnet_ipv6 in client config

Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>
2024-09-04 10:17:10 -05:00
Piotr Kazmierczak
6700937303 cli: fix typos in quota_init and spec parsing (#23891) 2024-08-29 18:45:35 +02:00
Piotr Kazmierczak
9265b384b3 quota: parse device block (#23866) 2024-08-28 18:45:12 +02:00
Piotr Kazmierczak
2d7dcba4b7 quota: add device block to the quota init command (#23881) 2024-08-28 16:14:50 +02:00
Sujata Roy
36522ec632 Merge pull request #23850 from hashicorp/Nomad-NET-9394
command/debug: capture more logs by default
2024-08-22 10:43:28 -07:00
Florian Apolloner
d6be784e2d namespaces: add allowed network modes to capabilities. (#23813) 2024-08-16 09:47:19 -04:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatibility. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Piotr Kazmierczak
0ab7e2219a assets rebuild 2024-08-13 12:48:08 +02:00