3800 Commits

Author SHA1 Message Date
Sujata Roy
36522ec632 Merge pull request #23850 from hashicorp/Nomad-NET-9394
command/debug: capture more logs by default
2024-08-22 10:43:28 -07:00
Florian Apolloner
d6be784e2d namespaces: add allowed network modes to capabilities. (#23813) 2024-08-16 09:47:19 -04:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatibility. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Piotr Kazmierczak
0ab7e2219a assets rebuild 2024-08-13 12:48:08 +02:00
hc-github-team-nomad-core
8489dadf57 Generate files for 1.8.3 release 2024-08-13 12:21:21 +02:00
Kartik Prajapati
3a3e63e2e1 cli: add role update functionality to acl token update (#18532) 2024-08-08 15:33:36 -04:00
Farbod Ahmadian
bb4c4fbd49 cli: show warning when creating token if policy doesn't exist (#16437) 2024-08-08 11:04:55 -04:00
Tim Gross
b25f1b66ce resources: allow job authors to configure size of secrets tmpfs (#23696)
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need larger space for downloading large secrets. This is especially the case for
tasks using `templates`, which need extra room to write a temporary file to the
secrets directory that gets renamed to the old file atomically.

This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.

Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.

For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.
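
A minimal sketch of the accounting described above, not Nomad's actual code. The `Resources` type and the `schedulerMemoryMB` and `driverMemoryMB` helpers are hypothetical names, and the sketch assumes a `SecretsMB` of 0 means the field was left unset (so the default 1MiB tmpfs stays "free"):

```go
package main

import "fmt"

// Resources is a hypothetical, trimmed-down stand-in for a task's
// `resources` block. It is not the real Nomad struct.
type Resources struct {
	MemoryMB  int // resources.memory
	SecretsMB int // resources.secrets; 0 means "not set" in this sketch
}

// schedulerMemoryMB returns the memory accounted for at scheduling time:
// the task memory plus any explicitly requested tmpfs size.
func schedulerMemoryMB(r Resources) int {
	return r.MemoryMB + r.SecretsMB
}

// driverMemoryMB removes the tmpfs size again just before the task driver
// is configured, so the driver sees only the task's own memory limit.
func driverMemoryMB(accountedMB int, r Resources) int {
	return accountedMB - r.SecretsMB
}

func main() {
	cases := []Resources{
		{MemoryMB: 256},                // default tmpfs, ignored by the scheduler
		{MemoryMB: 256, SecretsMB: 1},  // explicit 1MiB is no longer free
		{MemoryMB: 256, SecretsMB: 10}, // larger tmpfs
	}
	for _, r := range cases {
		accounted := schedulerMemoryMB(r)
		fmt.Println(accounted, driverMemoryMB(accounted, r))
	}
}
```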

Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
2024-08-05 16:06:58 -04:00
Tim Gross
e684636aed cli: add option to return original HCL in job inspect command (#23699)
In 1.6.0 we shipped the ability to review the original HCL in the web UI, but
didn't follow-up with an equivalent in the command line. Add a `-hcl` flag to
the `job inspect` command.

Closes: https://github.com/hashicorp/nomad/issues/6778
2024-08-05 15:35:18 -04:00
Tim Gross
9d4686c0df tls: remove deprecated prefer_server_cipher_suites field (#23712)
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In versions of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."

This property has been long-deprecated and leaving it in place may lead to false
assumptions about how cipher suites are negotiated in connection to a server. So
we want to remove it in Nomad 1.9.0.

Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
2024-08-01 08:52:05 -04:00
Tim Gross
c8be863bc8 reporting: allow export interval and address to be configurable (#23674)
The go-census library supports configuration to send metrics to a local
development version of the collector. Add "undocumented" configuration options
to the `reporting` block to allow developers to debug and verify we're sending the
data we expect with real Nomad servers and not just unit tests.

Ref: https://hashicorp.atlassian.net/browse/NET-10057
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1708
2024-07-24 08:29:59 -04:00
Tim Gross
2f4353412d keyring: support prepublishing keys (#23577)
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
2024-07-19 13:29:41 -04:00
Tim Gross
c970d22164 keyring: support external KMS for key encryption key (KEK) (#23580)
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by an AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: https://github.com/hashicorp/nomad/issues/14852
2024-07-18 09:42:28 -04:00
Juanadelacuesta
656725a615 fix: updated ui assets 2024-07-17 13:59:51 +02:00
hc-github-team-nomad-core
6dc691da07 Generate files for 1.8.2 release 2024-07-17 00:00:36 +02:00
guifran001
1c44521543 client: Add a preferred address family option for network-interface (#23389)
to prefer ipv4 or ipv6 when deducing IP from network interface

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-12 15:30:38 -05:00
Martina Santangelo
661011f5de cni: allow users to set CNI args in job spec (#23538) 2024-07-12 11:47:15 -04:00
Piotr Kazmierczak
fa8ffedd74 api: handle newlines in JobSubmission vars correctly (#23560)
Fixes a bug where variable values in job submissions that contained newlines
weren't encoded correctly, and thus jobs that contained them couldn't be
resumed once stopped via the UI.

Internal ref: https://hashicorp.atlassian.net/browse/NET-9966
2024-07-12 08:04:27 +02:00
James Rasell
f3de47e63d quota: Allow cores to be configured within an enterprise quota. (#23543) 2024-07-11 14:54:25 +01:00
Tim Gross
b09c1146a9 CLI: fix prefix matching across multiple commands (#23502)
Several commands that inspect objects where the names are user-controlled share
a bug where the user cannot inspect the object if it has a name that is an exact
prefix of the name of another object (in the same namespace, where
applicable). For example, the object "test" can't be inspected if there's an
object with the name "testing".

Copy existing logic we have for jobs, node pools, etc. to the impacted commands:

* `plugin status`
* `quota inspect`
* `quota status`
* `scaling policy info`
* `service info`
* `volume deregister`
* `volume detach`
* `volume status`

If we get multiple objects for the prefix query, we check if any of them are an
exact match and use that object instead of returning an error. Where possible
because the prefix query signatures are the same, use a generic function that
can be shared across multiple commands.

Fixes: https://github.com/hashicorp/nomad/issues/13920
Fixes: https://github.com/hashicorp/nomad/issues/17132
Fixes: https://github.com/hashicorp/nomad/issues/23236
Ref: https://hashicorp.atlassian.net/browse/NET-10054
Ref: https://hashicorp.atlassian.net/browse/NET-10055
2024-07-10 09:04:10 -04:00
Sujata Roy
6f34bf3ba7 Nomad Default to 5m duration and trace-level logging 2024-07-09 16:43:02 -07:00
Piotr Kazmierczak
88e8973004 consul: additional unit test for consul config merging (#23495) 2024-07-03 16:09:16 +02:00
Seth Hoenig
3f57c9bcf2 cli: fix bold output of devices headers (#23477) 2024-07-01 12:36:55 -05:00
Tim Gross
cd3101d624 scale: add -check-index to job scale command (#23457)
The RPC handler for scaling a job passes flags to enforce the job modify index
is unchanged when it makes the write to Raft. But it's only checking against the
existing job modify index at the time the RPC handler snapshots the state store,
so it can only enforce consistency for its own validation.

In clusters with automated scaling, it would be useful to expose the enforce
index options to the API, so that cluster admins can enforce that scaling only
happens when the job state is consistent with a state they've previously seen in
other API calls. Add this option to the CLI and API and have the RPC handler
check them if asked.

Fixes: https://github.com/hashicorp/nomad/issues/23444
2024-06-27 16:54:06 -04:00
James Rasell
d63ad1a6c5 Generate UI assets 2024-06-20 14:13:24 +01:00
hc-github-team-nomad-core
9566174e92 Generate files for 1.8.1 release 2024-06-19 15:24:08 +01:00
nicoche
ffcb72bfe3 api: Add Notes field to service checks (#22397)
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2024-06-10 16:59:49 +02:00
Gerard Nguyen
c3c2240304 Update nomad operator snapshot inspect with more detail (#20062)
Co-authored-by: Michael Schurter <michael.schurter@gmail.com>
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2024-06-06 06:57:10 +01:00
Piotr Kazmierczak
2a09abc477 metrics: quota utilization configuration and documentation (#22912)
Introduces support for (optional) quota utilization metrics

CE part of the hashicorp/nomad-enterprise#1488 change
2024-06-03 21:06:19 +02:00
Phil Renaud
014f5145dc Lockfile and bindata_assetfs recompiled on latest main (#22434) 2024-05-31 13:23:59 -04:00
Phil Renaud
86ee56b8c5 [ui] Jobs index page badge for when a job has a paused task (#22392)
* Adds a badge on the jobs index page if any task within any allocation of a running job is currently paused

* Snapshot and acceptance tests for paused states

* Cleared yarn cache

* Remove MirageScenario from the test dependency chain

* Logging before toString

* Cardinal sin of time-based test execution

* Maybe we've been lucky for years and the clientStatus has always been running for this test by happenstance

* Back away from the time-based and toward the settled() approach
2024-05-30 21:18:35 -04:00
Michael Schurter
690abefc4a docs: add docs for time based task execution 2024-05-29 15:50:33 -07:00
Tim Gross
de38ff4189 consul: set partition for gateway config entries (#22228)
When we write Connect gateway configuration entries from the server, we're not
passing in the intended partition. This means we're using the server's own
partition to submit the configuration entries and this may not match. Note this
requires the Nomad server's token has permission to that partition.

Also, move the config entry write after we check Sentinel policies. This allows
us to return early if we hit a Sentinel error without making Consul RPCs first.
2024-05-29 16:31:02 -04:00
hc-github-team-nomad-core
32d820644a Generate files for 1.8.0 release 2024-05-29 11:48:55 -04:00
hc-github-team-nomad-core
c374bd375b Generate files for 1.8.0-rc.1 release 2024-05-23 16:55:05 -04:00
Daniel Bennett
4415fabe7d jobspec: time based task execution (#22201)
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.

to enable daily-scheduled execution entirely client-side,
a job may now contain:

task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
...

and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.

this includes a taskrunner hook, which watches for the end of
the schedule, at which point it will kill the task.

then, restarts-allowing, a new task will start and again block
waiting for start, and so on.

this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
2024-05-22 15:40:25 -05:00
Phil Renaud
e8b77fcfa0 [ui] Jobspec UI block: Descriptions and Links (#18292)
* Hacky but shows links and desc

* markdown

* Small pre-test cleanup

* Test for UI description and link rendering

* JSON jobspec docs and variable example job get UI block

* Jobspec documentation for UI block

* Description and links moved into the Title component and made into Helios components

* Marked version upgrade

* Allow links without a description and max description to 1000 chars

* Node 18 for setup-js

* markdown sanitization

* Ui to UI and docs change

* Canonicalize, copy and diff for job.ui

* UI block added to testJob for structs testing

* diff test

* Remove redundant reset

* For readability, changing the receiving pointer of copied job variables

* TestUI endpoint conversion tests

* -require +must

* Nil check on Links

* JobUIConfig.Links as pointer

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-05-22 15:00:45 -04:00
Seth Hoenig
09bd11383c client: alloc_mounts directory must be sibling of data directory (#22199)
This PR adjusts the default location of -alloc-mounts-dir path to be a
sibling of the -data-dir path rather than a child. This is because on
production-hardened systems the data dir is supposed to be chmod 0700
owned by root - preventing the exec2 task driver (and others using
unveil file system isolation features) from working properly.

For reference the directory structure from -data-dir now looks like this
after running an example job. Under the alloc_mounts directory, task
specific directories are mode 0710 and owned by the task user (which
may be a dynamic user UID/GID).

➜ sudo tree -p -d -u /tmp/mynomad
[drwxrwxr-x shoenig ]  /tmp/mynomad
├── [drwx--x--x root    ]  alloc_mounts
│   └── [drwx--x--- 80552   ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask
│       ├── [drwxrwxrwx nobody  ]  alloc
│       │   ├── [drwxrwxrwx nobody  ]  data
│       │   ├── [drwxrwxrwx nobody  ]  logs
│       │   └── [drwxrwxrwx nobody  ]  tmp
│       ├── [drwxrwxrwx nobody  ]  local
│       ├── [drwxr-xr-x root    ]  private
│       ├── [drwx--x--- 80552   ]  secrets
│       └── [drwxrwxrwt nobody  ]  tmp
└── [drwx------ root    ]  data
    ├── [drwx--x--x root    ]  alloc
    │   └── [drwxr-xr-x root    ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6
    │       ├── [drwxrwxrwx nobody  ]  alloc
    │       │   ├── [drwxrwxrwx nobody  ]  data
    │       │   ├── [drwxrwxrwx nobody  ]  logs
    │       │   └── [drwxrwxrwx nobody  ]  tmp
    │       └── [drwx--x--- 80552   ]  mytask
    │           ├── [drwxrwxrwx nobody  ]  alloc
    │           │   ├── [drwxrwxrwx nobody  ]  data
    │           │   ├── [drwxrwxrwx nobody  ]  logs
    │           │   └── [drwxrwxrwx nobody  ]  tmp
    │           ├── [drwxrwxrwx nobody  ]  local
    │           ├── [drwxrwxrwx nobody  ]  private
    │           ├── [drwx--x--- 80552   ]  secrets
    │           └── [drwxrwxrwt nobody  ]  tmp
    ├── [drwx------ root    ]  client
    └── [drwxr-xr-x root    ]  server
        ├── [drwx------ root    ]  keystore
        ├── [drwxr-xr-x root    ]  raft
        │   └── [drwxr-xr-x root    ]  snapshots
        └── [drwxr-xr-x root    ]  serf

32 directories
2024-05-22 13:14:34 -05:00
Deniz Onur Duzgun
1cc99cc1b4 bug: resolve type conversion alerts (#20553) 2024-05-15 13:22:10 -04:00
Tim Gross
c9fd93c772 connect: support volume_mount blocks for sidecar task overrides (#20575)
Users can override the default sidecar task for Connect workloads. This sidecar
task might need access to certificate stores on the host. Allow adding the
`volume_mount` block to the sidecar task override.

Also fixes a bug where `volume_mount` blocks would not appear in plan diff
outputs.

Fixes: https://github.com/hashicorp/nomad/issues/19786
2024-05-14 12:49:37 -04:00
hc-github-team-nomad-core
e1a176c120 Generate files for 1.8.0-beta.1 release 2024-05-07 07:06:07 +00:00
Daniel Bennett
cf87a556b3 api: new /v1/jobs/statuses endpoint for /ui/jobs page (#20130)
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.

currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).

this does pagination and filtering and such, and returns jobs sorted by ModifyIndex,
so latest-changed jobs still come first. it pulls allocs and latest deployment
straight out of current state for a more robust, holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (excepting perhaps for job(s) with very-many allocs).

if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.

and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than any change to any of the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
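
as a sketch, the POST body shown above can be built like this. only the JSON shape comes from this message; the `statusesRequest` and `jobRef` type names are hypothetical, and the sketch stops short of sending the request:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobRef and statusesRequest are hypothetical types that mirror the JSON
// body `{"jobs": [{"namespace": "...", "id": "..."}]}`.
type jobRef struct {
	Namespace string `json:"namespace"`
	ID        string `json:"id"`
}

type statusesRequest struct {
	Jobs []jobRef `json:"jobs"`
}

// buildBody encodes the subset of jobs the caller wants statuses for.
func buildBody(jobs ...jobRef) ([]byte, error) {
	return json.Marshal(statusesRequest{Jobs: jobs})
}

func main() {
	body, err := buildBody(jobRef{Namespace: "cool-ns", ID: "cool-job"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// The bytes would be sent as the body of a POST to /v1/jobs/statuses.
	fmt.Println(string(body))
}
```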
2024-05-03 15:01:40 -05:00
Tim Gross
54fc146432 agent: add support for sdnotify protocol (#20528)
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.

We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.

Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in
just a couple of functions.

For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.

Fixes: https://github.com/hashicorp/nomad/issues/3885
2024-05-03 13:42:07 -04:00
Tim Gross
f9dd120d29 cli: add -jwks-ca-file to Vault/Consul setup commands (#20518)
When setting up auth methods for Consul and Vault in production environments, we
can typically assume that the CA certificate for the JWKS endpoint will be in
the host certificate store (as part of the usual configuration management
cluster admins needs to do). But for quick demos with `-dev` agents, this won't
be the case.

Add a `-jwks-ca-file` parameter to the setup commands so that we can use this
tool to quickly set up WI with `-dev` agents running TLS.
2024-05-03 08:26:29 -04:00
Michael Schurter
3aefc010d7 test: remove spurious print statements (#20503) 2024-05-01 09:47:56 -07:00
James Rasell
05a7bb53d3 cli: fix handling of scaling jobs which don't generate evals. (#20479)
In some cases, Nomad job scaling will not generate evaluations
such as parameterized jobs. This change fixes the CLI behaviour
in this case, and copies the job run command for consistency.
2024-04-30 10:32:31 +01:00
Daniel Bennett
3ac3bc1cfe acl: token global mode can not be changed (#20464)
true up CLI and docs with API reality
2024-04-22 11:58:47 -05:00
Juana De La Cuesta
64978662b6 Post 1.7.7 release (#20421)
Generate files for 1.7.7 release, prepare for next release and merge release 1.7.7 files
2024-04-17 10:44:32 +02:00
Seth Hoenig
ae6c4c8e3f deps: purge use of old x/exp packages (#20373) 2024-04-12 08:29:00 -05:00
astudentofblake
7b7ed12326 func: Allow custom paths to be added to the getter landlock (#20349)
* func: Allow custom paths to be added to the getter landlock

Fixes: 20315

* fix: slices imports
fix: more meaningful examples
fix: improve documentation
fix: quote error output
2024-04-11 15:17:33 -05:00