Commit Graph

3826 Commits

Author SHA1 Message Date
Michael Schurter
e440e1d1db cli: update nomad job init full examples (#24232)
* cli: trim job init example jobspec
* cli: trim job init -connect example jobspec

---------

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-10-17 10:32:47 -07:00
Daniel Bennett
a0d7fb6b09 connect: fix ipv6 bind_address test (#24216) 2024-10-16 08:23:44 -05:00
Daniel Bennett
067afcda26 Consul Connect over IPv6 (except tproxy) (#24203)
* detect ipv6 on "bridge" network and set
  service.connect.sidecar_proxy.config.bind_address
  for envoy to "::" instead of "0.0.0.0"
* allow users to set bind_address in jobspec
  e.g. "" would defer to consul proxy-defaults
* caveat: tproxy still does not work, because
  the CNI plugin does not configure ip6tables
2024-10-14 18:52:02 -05:00
hc-github-team-nomad-core
f1714162df Generate files for 1.9.0 release 2024-10-14 07:26:36 +01:00
Tim Gross
4de1665942 consul: improve reliability of deregistration (#24166)
When the local Consul agent receives a deregister request, it performs a
pre-flight check using the locally cached ACL token. The agent then sends the
request upstream to the Consul servers as part of anti-entropy, using its own
token. This requires that the token we use for deregistration is valid even
though that's not the token used to write to the Consul server.

There are several cases where the service identity token might no longer exist
at the time of deregistration:
* A race condition between the sync and destroying the allocation.
* Misconfiguration of the Consul auth method with a TTL.
* Out-of-band destruction of the token.

Additionally, Nomad's sync with Consul returns early if there are any errors,
which means that a single broken token can prevent any other service on the
Nomad agent from being registered or deregistered.

Update Nomad's sync with Consul to use the Nomad agent's own Consul token for
deregistration, regardless of which token the service was registered
with. Accumulate errors from the sync so that they no longer block
deregistration of other services.

Fixes: https://github.com/hashicorp/nomad/issues/20159
2024-10-11 12:32:23 -04:00
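The error-accumulation part of 4de1665942 can be illustrated with a minimal sketch. It is not Nomad's actual sync code: `deregisterAll`, the `deregister` callback, and `errTokenGone` are hypothetical names, and the sketch only shows how collecting errors with the standard library's `errors.Join` lets one failing service stop blocking the rest.

```go
package main

import (
	"errors"
	"fmt"
)

// errTokenGone is a hypothetical sentinel standing in for a Consul ACL failure.
var errTokenGone = errors.New("ACL token not found")

// deregisterAll attempts every deregistration and accumulates failures
// instead of returning on the first error.
func deregisterAll(ids []string, deregister func(id string) error) error {
	var errs []error
	for _, id := range ids {
		if err := deregister(id); err != nil {
			errs = append(errs, fmt.Errorf("deregister %q: %w", id, err))
		}
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	err := deregisterAll([]string{"web", "db", "cache"}, func(id string) error {
		if id == "db" {
			return errTokenGone
		}
		fmt.Println("deregistered", id)
		return nil
	})
	fmt.Println("sync error:", err)
}
```

With this shape, "web" and "cache" are still deregistered even though "db" fails, and the caller still sees the failure.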
Seth Hoenig
f1ce127524 jobspec: add a chown option to artifact block (#24157)
* jobspec: add a chown option to artifact block

This PR adds a boolean 'chown' field to the artifact block.

It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.

* api: no need for pointer of chown field
2024-10-11 11:30:27 -05:00
hc-github-team-nomad-core
668a827b2b Generate files for 1.9.0-beta.2 release 2024-10-04 16:18:27 +00:00
Martijn Vegter
3ecf0d21e2 metrics: introduce client config to include alloc metadata as part of the base labels (#23964) 2024-10-02 10:55:44 -04:00
Tim Gross
6c03e1991d refactor: clean up slice initialization in node status (#24109)
We initialize this slice with a zeroed array and then append to it, which means
we then have to clean out the empty strings later. Initialize to the correct
capacity up front so there are no empty values.

Ref: https://github.com/hashicorp/nomad/pull/24104
2024-10-02 10:40:32 -04:00
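The slice pitfall described in 6c03e1991d is easy to reproduce in isolation. The sketch below is not the node status code itself; `labelsWithLen` and `labelsWithCap` are hypothetical helpers that contrast allocating a slice with a length against allocating it with only a capacity.

```go
package main

import "fmt"

// labelsWithLen creates the slice with a non-zero length, so append adds
// new values after len(names) empty strings.
func labelsWithLen(names []string) []string {
	out := make([]string, len(names)) // length n: n empty strings already present
	for _, n := range names {
		out = append(out, "node:"+n)
	}
	return out
}

// labelsWithCap reserves capacity only, so the result has no empty values.
func labelsWithCap(names []string) []string {
	out := make([]string, 0, len(names)) // length 0, capacity n
	for _, n := range names {
		out = append(out, "node:"+n)
	}
	return out
}

func main() {
	names := []string{"a", "b"}
	fmt.Printf("%q\n", labelsWithLen(names)) // ["" "" "node:a" "node:b"]
	fmt.Printf("%q\n", labelsWithCap(names)) // ["node:a" "node:b"]
}
```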
Tim Gross
651d8d6f88 tests: fixup copywrite in test file (#24101)
In #24007 we merged new HCL files but they were missing copywrite headers
because the scan didn't run on this PR for some reason. I've already backported
this to the Enterprise branches.
2024-10-01 16:43:10 -04:00
Juliano Martinez
4a74fda8ce Allow client template config block to be parsed when using json config (#24007)
- Adds tests
- Adds sample test data for parsing hcl and json
- Adds changelog
2024-10-01 15:44:36 -04:00
Tim Gross
5e1ad14f1f scaling policy: use request namespace as target if unset in jobspec (#24065)
When jobs are submitted with a scaling policy, the scaling policy's target only
includes the job's namespace if the `namespace` field is set in the jobspec and
not from the request. Normally jobs are canonicalized in the RPC handler before
being written to Raft. But the scaling policy targets are instead written during
the conversion from `api.Job` to `structs.Job`. We populate the `structs.Job`
namespace from the request here as well, but only after the conversion has
occurred. Swap the order of these operations so that the conversion is always
happening with a correct namespace.

Long-term we should not be making mutations during conversion either. But we
can't remove it immediately because API requests may come from any agent across
upgrades. Move the scaling target creation into the `Canonicalize` method and
mark it for future removal in the API conversion code path.

Fixes: https://github.com/hashicorp/nomad/issues/24039
2024-10-01 11:41:40 -04:00
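The ordering problem in 5e1ad14f1f can be shown with a toy model. The types and functions below (`apiJob`, `structsJob`, `convert`, and the two wrappers) are hypothetical simplifications, not Nomad's real `api.Job` or `structs.Job`; the sketch only assumes that conversion copies the job namespace into the scaling target.

```go
package main

import "fmt"

// apiJob and structsJob are simplified stand-ins for the API and internal job types.
type apiJob struct{ Namespace string }

type structsJob struct {
	Namespace       string
	TargetNamespace string // namespace recorded in the scaling policy target
}

// convert writes the scaling target from whatever namespace the job has right now.
func convert(j apiJob) structsJob {
	return structsJob{Namespace: j.Namespace, TargetNamespace: j.Namespace}
}

// convertThenSetNamespace models the old order: the target is already written
// by the time the request namespace is applied.
func convertThenSetNamespace(j apiJob, requestNS string) structsJob {
	out := convert(j)
	if out.Namespace == "" {
		out.Namespace = requestNS
	}
	return out
}

// setNamespaceThenConvert models the swapped order from the commit.
func setNamespaceThenConvert(j apiJob, requestNS string) structsJob {
	if j.Namespace == "" {
		j.Namespace = requestNS
	}
	return convert(j)
}

func main() {
	fmt.Printf("%+v\n", convertThenSetNamespace(apiJob{}, "prod"))
	fmt.Printf("%+v\n", setNamespaceThenConvert(apiJob{}, "prod"))
}
```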
hc-github-team-nomad-core
07dc87eb21 Generate files for 1.9.0-beta.1 release 2024-09-26 17:35:57 +00:00
Phil Renaud
e206993d49 Feature: Golden Versions (#24055)
* TaggedVersion information in structs, rather than job_endpoint (#23841)

* TaggedVersion information in structs, rather than job_endpoint

* Test for taggedVersion description length

* Some API plumbing

* Tag and Untag job versions (#23863)

* Tag and Untag at API level on down, but am I unblocking the wrong thing?

* Code and comment cleanup

* Unset methods generally now I stare long into the namespace abyss

* Namespace passes through with QueryOptions removed from a write requesting struct

* Comment and PR review cleanup

* Version back to VersionStr

* Generally consolidate unset logic into apply for version tagging

* Addressed some PR comments

* Auth check and RPC forwarding

* uint64 instead of pointer for job version after api layer and renamed copy

* job tag command split into apply and unset

* latest-version convenience handling moved to CLI command level

* CLI tests for tagging/untagging

* UI parts removed

* Add to job table when unsetting job tag on latest version

* Vestigial no more

* Compare versions by name and version number with the nomad history command (#23889)

* First pass at passing a tagname and/or diff version to plan/versions requests

* versions API now takes compare_to flags

* Job history command output can have tag names and descriptions

* compare_to to diff-tag and diff-version, plus adding flags to history command

* 0th version now shows a diff if a specific diff target is requested

* Addressing some PR comments

* Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI

* Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon

* Version diff tests

* re-implement JobVersionByTagName

* Test mods and simplification

* Documentation for nomad job history additions

* Prevent pruning and reaping of TaggedVersion jobs (#23983)

tagged versions should not count against JobTrackedVersions
i.e. new job versions being inserted should not evict tagged versions

and GC should not delete a job if any of its versions are tagged

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* [ui] Version Tags on the job versions page (#24013)

* Timeline styles and their buttons modernized, and tags added

* styled but not yet functional version blocks

* Rough pass at edit/unedit UX

* Styles consolidated

* better UX around version tag crud, plus adapter and serializers

* Mirage and acceptance tests

* Modify percy to not show time-based things

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* Job revert command and API endpoint can take a string version tag name (#24059)

* Job revert command and API endpoint can take a string version tag name

* RevertOpts as a signature-modified alternative to Revert()

* job revert CLI test

* Version pointers in endpoint tests

* Dont copy over the tag when a job is reverted to a version with a tag


* Convert tag name to version number at CLI level

* Client method for version lookup by tag

* No longer double-declaring client

* [ui] Add tag filter to the job versions page (#24064)

* Rough pass at the UI for version diff dropdown

* Cleanup and diff fetching via adapter method

* TaggedVersion now VersionTag (#24066)

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-09-25 19:59:16 -04:00
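The pruning and GC rules from #23983 in the commit above (tagged versions do not count against the tracked-version limit, and a job with any tagged version is not garbage collected) can be sketched as follows. This is an illustration under assumptions: `jobVersion`, `pruneVersions`, and `gcAllowed` are hypothetical names, and the input is assumed to be sorted newest first.

```go
package main

import "fmt"

// jobVersion is a simplified stand-in; Tag is empty for untagged versions.
type jobVersion struct {
	Version uint64
	Tag     string
}

// pruneVersions keeps every tagged version plus at most maxTracked untagged
// versions. versions is assumed to be ordered newest first.
func pruneVersions(versions []jobVersion, maxTracked int) []jobVersion {
	kept := make([]jobVersion, 0, len(versions))
	untagged := 0
	for _, v := range versions {
		if v.Tag != "" {
			kept = append(kept, v) // tagged versions never count against the limit
			continue
		}
		if untagged < maxTracked {
			kept = append(kept, v)
			untagged++
		}
	}
	return kept
}

// gcAllowed reports whether a job may be garbage collected: not if any
// of its versions carries a tag.
func gcAllowed(versions []jobVersion) bool {
	for _, v := range versions {
		if v.Tag != "" {
			return false
		}
	}
	return true
}

func main() {
	versions := []jobVersion{{Version: 4}, {Version: 3}, {Version: 2}, {Version: 1, Tag: "golden"}}
	fmt.Println(pruneVersions(versions, 2), gcAllowed(versions))
}
```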
Tim Gross
65ec00da1d cli: fix -t flag on job status command (#24054)
In #18925 we added a `-json` flag to the `job status` command, but the argument
handling had a bug where it would always set the `-json` flag if either the `-t`
or `-json` flags were set, resulting in a misleading error. Instead, pass the
`-json` flag value into the formatter.

Fixes: https://github.com/hashicorp/nomad/issues/24050
2024-09-25 09:12:52 -04:00
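The bug fixed in 65ec00da1d can be reduced to a small sketch. Everything here is hypothetical (`pickFormatter`, `buggyStatus`, `fixedStatus`, and the error text); it only assumes a formatter that rejects requests asking for both JSON and template output.

```go
package main

import (
	"errors"
	"fmt"
)

var errBothFormats = errors.New("both json and template formatting are not allowed")

// pickFormatter chooses an output format and rejects conflicting requests.
func pickFormatter(asJSON bool, tmpl string) (string, error) {
	switch {
	case asJSON && tmpl != "":
		return "", errBothFormats
	case asJSON:
		return "json", nil
	case tmpl != "":
		return "template", nil
	}
	return "default", nil
}

// buggyStatus hard-codes asJSON=true whenever either flag is set, so a bare
// -t always collides with the implied -json.
func buggyStatus(jsonFlag bool, tmpl string) (string, error) {
	if jsonFlag || tmpl != "" {
		return pickFormatter(true, tmpl)
	}
	return pickFormatter(false, "")
}

// fixedStatus passes the real flag value through to the formatter.
func fixedStatus(jsonFlag bool, tmpl string) (string, error) {
	return pickFormatter(jsonFlag, tmpl)
}

func main() {
	fmt.Println(buggyStatus(false, "{{.ID}}"))
	fmt.Println(fixedStatus(false, "{{.ID}}"))
}
```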
Michael Smithhisler
6b6aa7cc26 identity: adds ability to specify custom filepath for saving workload identities (#24038) 2024-09-23 10:27:00 -04:00
Tim Gross
a7f2cb879e command line tools for redacting keyring from snapshots (#24023)
In #23977 we moved the keyring into Raft, which can expose key material in Raft
snapshots when using the less-secure AEAD keyring instead of KMS. This changeset
adds tools for redacting this material from snapshots:

* The `operator snapshot state` command gains the ability to display key
  metadata (only), which respects the `-filter` option.
* The `operator snapshot save` command gains a `-redact` option that removes key
  material from the snapshot after it's downloaded.
* A new `operator snapshot redact` command allows removing key material from an
  existing snapshot.
2024-09-20 15:30:14 -04:00
Daniel Bennett
ec81e7c57c networking: add ignore_collision for static port{} (#23956)
so more than one copy of a program can run
at a time on the same port with SO_REUSEPORT.

requires host network mode.

some task drivers (like docker) may also need
config {
  network_mode = "host"
}
but this is not validated prior to placement.
2024-09-17 16:01:48 -05:00
Seth Hoenig
51215bf102 deps: update to go-set/v3 and refactor to use custom iterators (#23971)
* deps: update to go-set/v3

* deps: use custom set iterators for looping
2024-09-16 13:40:10 -05:00
Piotr Kazmierczak
47f3313ffd cli: quota status extension for devices (#23899)
quota status CLI now displays device limits (if present in the quota spec)
2024-09-12 16:51:53 +02:00
Piotr Kazmierczak
2e6ccf825a quotas: corrections to Resources.Add and quota apply parsing logic (#23894) 2024-09-09 15:27:17 +02:00
Tim Gross
a9beef7edd jobspec: remove HCL1 support (#23912)
This changeset removes support for parsing jobspecs via the long-deprecated
HCLv1.

Fixes: https://github.com/hashicorp/nomad/issues/20195
Ref: https://hashicorp.atlassian.net/browse/NET-10220
2024-09-05 09:02:45 -04:00
Daniel Bennett
2f5cf8efae networking: option to enable ipv6 on bridge network (#23882)
by setting bridge_network_subnet_ipv6 in client config

Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>
2024-09-04 10:17:10 -05:00
Piotr Kazmierczak
6700937303 cli: fix typos in quota_init and spec parsing (#23891) 2024-08-29 18:45:35 +02:00
Piotr Kazmierczak
9265b384b3 quota: parse device block (#23866) 2024-08-28 18:45:12 +02:00
Piotr Kazmierczak
2d7dcba4b7 quota: add device block to the quota init command (#23881) 2024-08-28 16:14:50 +02:00
Sujata Roy
36522ec632 Merge pull request #23850 from hashicorp/Nomad-NET-9394
command/debug: capture more logs by default
2024-08-22 10:43:28 -07:00
Florian Apolloner
d6be784e2d namespaces: add allowed network modes to capabilities. (#23813) 2024-08-16 09:47:19 -04:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatibility. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Piotr Kazmierczak
0ab7e2219a assets rebuild 2024-08-13 12:48:08 +02:00
hc-github-team-nomad-core
8489dadf57 Generate files for 1.8.3 release 2024-08-13 12:21:21 +02:00
Kartik Prajapati
3a3e63e2e1 cli: add role update functionality to acl token update (#18532) 2024-08-08 15:33:36 -04:00
Farbod Ahmadian
bb4c4fbd49 cli: show warning when creating token if policy doesn't exist (#16437) 2024-08-08 11:04:55 -04:00
Tim Gross
b25f1b66ce resources: allow job authors to configure size of secrets tmpfs (#23696)
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.

This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.

Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.

For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.

Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
2024-08-05 16:06:58 -04:00
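The accounting described in b25f1b66ce can be sketched as below. This is an illustration, not the scheduler's code: the `resources` struct, the helper names, and the use of MiB for both fields are assumptions. An unset secrets size is modeled as zero, which keeps the default 1 MiB tmpfs "free", while any explicit value (including 1) is counted.

```go
package main

import "fmt"

const defaultSecretsMB = 1 // default tmpfs size, ignored by the scheduler

// resources is a simplified stand-in; SecretsMB is 0 when the jobspec leaves it unset.
type resources struct {
	MemoryMB  int
	SecretsMB int
}

// tmpfsSizeMB is the size the secrets tmpfs is mounted with.
func tmpfsSizeMB(r resources) int {
	if r.SecretsMB == 0 {
		return defaultSecretsMB
	}
	return r.SecretsMB
}

// schedulableMemoryMB is what the allocation reserves: task memory plus any
// explicitly requested tmpfs space.
func schedulableMemoryMB(r resources) int {
	return r.MemoryMB + r.SecretsMB
}

// driverMemoryMB subtracts the tmpfs share again just before the task driver
// is configured, so the task's own memory limit is unchanged.
func driverMemoryMB(allocMemoryMB int, r resources) int {
	return allocMemoryMB - r.SecretsMB
}

func main() {
	r := resources{MemoryMB: 256, SecretsMB: 64}
	alloc := schedulableMemoryMB(r)
	fmt.Println(alloc, tmpfsSizeMB(r), driverMemoryMB(alloc, r)) // 320 64 256
}
```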
Tim Gross
e684636aed cli: add option to return original HCL in job inspect command (#23699)
In 1.6.0 we shipped the ability to review the original HCL in the web UI, but
didn't follow up with an equivalent in the command line. Add a `-hcl` flag to
didn't follow-up with an equivalent in the command line. Add a `-hcl` flag to
the `job inspect` command.

Closes: https://github.com/hashicorp/nomad/issues/6778
2024-08-05 15:35:18 -04:00
Tim Gross
9d4686c0df tls: remove deprecated prefer_server_cipher_suites field (#23712)
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In versions of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."

This property has been long-deprecated and leaving it in place may lead to false
assumptions about how cipher suites are negotiated in connection to a server. So
we want to remove it in Nomad 1.9.0.

Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
2024-08-01 08:52:05 -04:00
Tim Gross
c8be863bc8 reporting: allow export interval and address to be configurable (#23674)
The go-census library supports configuration to send metrics to a local
development version of the collector. Add "undocumented" configuration options
to the `reporting` block to allow developers to debug and verify we're sending the
data we expect with real Nomad servers and not just unit tests.

Ref: https://hashicorp.atlassian.net/browse/NET-10057
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1708
2024-07-24 08:29:59 -04:00
Tim Gross
2f4353412d keyring: support prepublishing keys (#23577)
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
2024-07-19 13:29:41 -04:00
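The timing rules in 2f4353412d can be expressed as three small predicates. This is one reading of the commit message, not the keyring implementation: `rootKey`, the function names, and the `epoch`/`day` helpers are hypothetical, and all comparisons use wall clock times as the message describes.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// epoch is an arbitrary fixed instant used only to make the example deterministic.
var epoch = time.Date(2024, 7, 1, 0, 0, 0, 0, time.UTC)

// rootKey is a simplified stand-in for a root key's metadata.
type rootKey struct {
	CreateTime  time.Time
	PublishTime time.Time
}

// shouldPrepublish reports whether the active key has reached half of the
// rotation threshold, the point at which a successor is prepublished.
func shouldPrepublish(activeCreated, now time.Time, rotation time.Duration) bool {
	return now.Sub(activeCreated) >= rotation/2
}

// usableForSigning reports whether a prepublished key has reached its
// PublishTime. Before that it is only visible (for example in JWKS).
func usableForSigning(k rootKey, now time.Time) bool {
	return !now.Before(k.PublishTime)
}

// gcEligible waits for rotation threshold + GC threshold since creation, so
// keys that may still back signed identities are not collected early.
func gcEligible(k rootKey, now time.Time, rotation, gc time.Duration) bool {
	return now.Sub(k.CreateTime) > rotation+gc
}

func main() {
	rotation := 30 * day // 720h, the default mentioned in the commit
	next := rootKey{CreateTime: epoch.Add(15 * day), PublishTime: epoch.Add(30 * day)}
	fmt.Println(shouldPrepublish(epoch, epoch.Add(15*day), rotation))
	fmt.Println(usableForSigning(next, epoch.Add(20*day)))
	fmt.Println(gcEligible(next, epoch.Add(40*day), rotation, day))
}
```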
Tim Gross
c970d22164 keyring: support external KMS for key encryption key (KEK) (#23580)
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by an AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: https://github.com/hashicorp/nomad/issues/14852
2024-07-18 09:42:28 -04:00
Juanadelacuesta
656725a615 fix: updated ui assets 2024-07-17 13:59:51 +02:00
hc-github-team-nomad-core
6dc691da07 Generate files for 1.8.2 release 2024-07-17 00:00:36 +02:00
guifran001
1c44521543 client: Add a preferred address family option for network-interface (#23389)
to prefer ipv4 or ipv6 when deducing IP from network interface

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-12 15:30:38 -05:00
Martina Santangelo
661011f5de cni: allow users to set CNI args in job spec (#23538) 2024-07-12 11:47:15 -04:00
Piotr Kazmierczak
fa8ffedd74 api: handle newlines in JobSubmission vars correctly (#23560)
Fixes a bug where variable values in job submissions that contained newlines
weren't encoded correctly, and thus jobs that contained them couldn't be
resumed once stopped via the UI.

Internal ref: https://hashicorp.atlassian.net/browse/NET-9966
2024-07-12 08:04:27 +02:00
James Rasell
f3de47e63d quota: Allow cores to be configured within an enterprise quota. (#23543) 2024-07-11 14:54:25 +01:00
Tim Gross
b09c1146a9 CLI: fix prefix matching across multiple commands (#23502)
Several commands that inspect objects where the names are user-controlled share
a bug where the user cannot inspect the object if it has a name that is an exact
prefix of the name of another object (in the same namespace, where
applicable). For example, the object "test" can't be inspected if there's an
object with the name "testing".

Copy existing logic we have for jobs, node pools, etc. to the impacted commands:

* `plugin status`
* `quota inspect`
* `quota status`
* `scaling policy info`
* `service info`
* `volume deregister`
* `volume detach`
* `volume status`

If we get multiple objects for the prefix query, we check if any of them are an
exact match and use that object instead of returning an error. Where possible
because the prefix query signatures are the same, use a generic function that
can be shared across multiple commands.

Fixes: https://github.com/hashicorp/nomad/issues/13920
Fixes: https://github.com/hashicorp/nomad/issues/17132
Fixes: https://github.com/hashicorp/nomad/issues/23236
Ref: https://hashicorp.atlassian.net/browse/NET-10054
Ref: https://hashicorp.atlassian.net/browse/NET-10055
2024-07-10 09:04:10 -04:00
Sujata Roy
6f34bf3ba7 Nomad Default to 5m duration and trace-level logging 2024-07-09 16:43:02 -07:00
Piotr Kazmierczak
88e8973004 consul: additional unit test for consul config merging (#23495) 2024-07-03 16:09:16 +02:00
Seth Hoenig
3f57c9bcf2 cli: fix bold output of devices headers (#23477) 2024-07-01 12:36:55 -05:00
Tim Gross
cd3101d624 scale: add -check-index to job scale command (#23457)
The RPC handler for scaling a job passes flags to enforce the job modify index
is unchanged when it makes the write to Raft. But it's only checking against the
existing job modify index at the time the RPC handler snapshots the state store,
so it can only enforce consistency for its own validation.

In clusters with automated scaling, it would be useful to expose the enforce
index options to the API, so that cluster admins can enforce that scaling only
happens when the job state is consistent with a state they've previously seen in
other API calls. Add this option to the CLI and API and have the RPC handler
check them if asked.

Fixes: https://github.com/hashicorp/nomad/issues/23444
2024-06-27 16:54:06 -04:00
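The check-index behavior added in cd3101d624 is a compare-and-set on the job modify index. The sketch below is an assumption-laden illustration: `job`, `scaleRequest`, `applyScale`, and their fields are hypothetical, not the real RPC request types.

```go
package main

import (
	"errors"
	"fmt"
)

var errIndexMismatch = errors.New("job modify index does not match")

// job is a simplified stand-in for the stored job.
type job struct {
	Count       int
	ModifyIndex uint64
}

// scaleRequest carries the caller's optional expectation about the job state.
type scaleRequest struct {
	Count        int
	EnforceIndex bool   // only check the index when the caller asks for it
	CheckIndex   uint64 // modify index the caller last observed
}

// applyScale rejects the write when the caller's view of the job is stale.
func applyScale(j *job, req scaleRequest, nextIndex uint64) error {
	if req.EnforceIndex && req.CheckIndex != j.ModifyIndex {
		return fmt.Errorf("%w: job is at %d, request expected %d",
			errIndexMismatch, j.ModifyIndex, req.CheckIndex)
	}
	j.Count = req.Count
	j.ModifyIndex = nextIndex
	return nil
}

func main() {
	j := &job{Count: 2, ModifyIndex: 10}
	fmt.Println(applyScale(j, scaleRequest{Count: 5, EnforceIndex: true, CheckIndex: 9}, 11))
	fmt.Println(applyScale(j, scaleRequest{Count: 5, EnforceIndex: true, CheckIndex: 10}, 11))
	fmt.Println(j.Count, j.ModifyIndex)
}
```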