Commit Graph

3769 Commits

Author SHA1 Message Date
Michael Schurter
690abefc4a docs: add docs for time based task execution 2024-05-29 15:50:33 -07:00
Tim Gross
de38ff4189 consul: set partition for gateway config entries (#22228)
When we write Connect gateway configuation entries from the server, we're not
passing in the intended partition. This means we're using the server's own
partition to submit the configuration entries and this may not match. Note this
requires the Nomad server's token has permission to that partition.

Also, move the config entry write after we check Sentinel policies. This allows
us to return early if we hit a Sentinel error without making Consul RPCs first.
2024-05-29 16:31:02 -04:00
hc-github-team-nomad-core
32d820644a Generate files for 1.8.0 release 2024-05-29 11:48:55 -04:00
hc-github-team-nomad-core
c374bd375b Generate files for 1.8.0-rc.1 release 2024-05-23 16:55:05 -04:00
Daniel Bennett
4415fabe7d jobspec: time based task execution (#22201)
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.

to enable daily-scheduled execution entirely client-side,
a job may now contain:

task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
...

and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.

this includes a taksrunner hook, which watches for the end of
the schedule, at which point it will kill the task.

then, restarts-allowing, a new task will start and again block
waiting for start, and so on.

this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
2024-05-22 15:40:25 -05:00
Phil Renaud
e8b77fcfa0 [ui] Jobspec UI block: Descriptions and Links (#18292)
* Hacky but shows links and desc

* markdown

* Small pre-test cleanup

* Test for UI description and link rendering

* JSON jobspec docs and variable example job get UI block

* Jobspec documentation for UI block

* Description and links moved into the Title component and made into Helios components

* Marked version upgrade

* Allow links without a description and max description to 1000 chars

* Node 18 for setup-js

* markdown sanitization

* Ui to UI and docs change

* Canonicalize, copy and diff for job.ui

* UI block added to testJob for structs testing

* diff test

* Remove redundant reset

* For readability, changing the receiving pointer of copied job variables

* TestUI endpiont conversion tests

* -require +must

* Nil check on Links

* JobUIConfig.Links as pointer

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-05-22 15:00:45 -04:00
Seth Hoenig
09bd11383c client: alloc_mounts directory must be sibling of data directory (#22199)
This PR adjusts the default location of -alloc-mounts-dir path to be a
sibling of the -data-dir path rather than a child. This is because on a
production-hardened systems the data dir is supposed to be chmod 0700
owned by root - preventing the exec2 task driver (and others using
unveil file system isolation features) from working properly.

For reference the directory structure from -data-dir now looks like this
after running an example job. Under the alloc_mounts directory, task
specific directories are mode 0710 and owned by the task user (which
may be a dynamic user UID/GID).

➜ sudo tree -p -d -u /tmp/mynomad
[drwxrwxr-x shoenig ]  /tmp/mynomad
├── [drwx--x--x root    ]  alloc_mounts
│   └── [drwx--x--- 80552   ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask
│       ├── [drwxrwxrwx nobody  ]  alloc
│       │   ├── [drwxrwxrwx nobody  ]  data
│       │   ├── [drwxrwxrwx nobody  ]  logs
│       │   └── [drwxrwxrwx nobody  ]  tmp
│       ├── [drwxrwxrwx nobody  ]  local
│       ├── [drwxr-xr-x root    ]  private
│       ├── [drwx--x--- 80552   ]  secrets
│       └── [drwxrwxrwt nobody  ]  tmp
└── [drwx------ root    ]  data
    ├── [drwx--x--x root    ]  alloc
    │   └── [drwxr-xr-x root    ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6
    │       ├── [drwxrwxrwx nobody  ]  alloc
    │       │   ├── [drwxrwxrwx nobody  ]  data
    │       │   ├── [drwxrwxrwx nobody  ]  logs
    │       │   └── [drwxrwxrwx nobody  ]  tmp
    │       └── [drwx--x--- 80552   ]  mytask
    │           ├── [drwxrwxrwx nobody  ]  alloc
    │           │   ├── [drwxrwxrwx nobody  ]  data
    │           │   ├── [drwxrwxrwx nobody  ]  logs
    │           │   └── [drwxrwxrwx nobody  ]  tmp
    │           ├── [drwxrwxrwx nobody  ]  local
    │           ├── [drwxrwxrwx nobody  ]  private
    │           ├── [drwx--x--- 80552   ]  secrets
    │           └── [drwxrwxrwt nobody  ]  tmp
    ├── [drwx------ root    ]  client
    └── [drwxr-xr-x root    ]  server
        ├── [drwx------ root    ]  keystore
        ├── [drwxr-xr-x root    ]  raft
        │   └── [drwxr-xr-x root    ]  snapshots
        └── [drwxr-xr-x root    ]  serf

32 directories
2024-05-22 13:14:34 -05:00
Deniz Onur Duzgun
1cc99cc1b4 bug: resolve type conversion alerts (#20553) 2024-05-15 13:22:10 -04:00
Tim Gross
c9fd93c772 connect: support volume_mount blocks for sidecar task overrides (#20575)
Users can override the default sidecar task for Connect workloads. This sidecar
task might need access to certificate stores on the host. Allow adding the
`volume_mount` block to the sidecar task override.

Also fixes a bug where `volume_mount` blocks would not appear in plan diff
outputs.

Fixes: https://github.com/hashicorp/nomad/issues/19786
2024-05-14 12:49:37 -04:00
hc-github-team-nomad-core
e1a176c120 Generate files for 1.8.0-beta.1 release 2024-05-07 07:06:07 +00:00
Daniel Bennett
cf87a556b3 api: new /v1/jobs/statuses endpoint for /ui/jobs page (#20130)
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.

currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).

this does pagination and filtering and such, and returns jobs sorted by ModifyIndex,
so latest-changed jobs still come first. it pulls allocs and latest deployment
straight out of current state for more a more robust, holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (excepting perhaps for job(s) with very-many allocs).

if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.

and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than any change to any of the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
2024-05-03 15:01:40 -05:00
Tim Gross
54fc146432 agent: add support for sdnotify protocol (#20528)
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.

We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.

Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in a
just a couple of functions.

For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.

Fixes: https://github.com/hashicorp/nomad/issues/3885
2024-05-03 13:42:07 -04:00
Tim Gross
f9dd120d29 cli: add -jwks-ca-file to Vault/Consul setup commands (#20518)
When setting up auth methods for Consul and Vault in production environments, we
can typically assume that the CA certificate for the JWKS endpoint will be in
the host certificate store (as part of the usual configuration management
cluster admins needs to do). But for quick demos with `-dev` agents, this won't
be the case.

Add a `-jwks-ca-file` parameter to the setup commands so that we can use this
tool to quickly setup WI with `-dev` agents running TLS.
2024-05-03 08:26:29 -04:00
Michael Schurter
3aefc010d7 test: remove spurious print statements (#20503) 2024-05-01 09:47:56 -07:00
James Rasell
05a7bb53d3 cli: fix handling of scaling jobs which don't generate evals. (#20479)
In some cases, Nomad job scaling will not generate evaluations
such as parameterized jobs. This change fixes the CLI behaviour
in this case, and copies the job run command for consistency.
2024-04-30 10:32:31 +01:00
Daniel Bennett
3ac3bc1cfe acl: token global mode can not be changed (#20464)
true up CLI and docs with API reality
2024-04-22 11:58:47 -05:00
Juana De La Cuesta
64978662b6 Post 1.7.7 release (#20421)
Generate files for 1.7.7 release, prepare for next release and merge release 1.7.7 files
2024-04-17 10:44:32 +02:00
Seth Hoenig
ae6c4c8e3f deps: purge use of old x/exp packages (#20373) 2024-04-12 08:29:00 -05:00
astudentofblake
7b7ed12326 func: Allow custom paths to be added the the getter landlock (#20349)
* func: Allow custom paths to be added the the getter landlock

Fixes: 20315

* fix: slices imports
fix: more meaningful examples
fix: improve documentation
fix: quote error output
2024-04-11 15:17:33 -05:00
Tim Gross
8298d39e78 Connect transparent proxy support
Add support for Consul Connect transparent proxies

Fixes: https://github.com/hashicorp/nomad/issues/10628
2024-04-10 11:00:18 -04:00
Tim Gross
a0cbc1a26a cli: remove extraneous trailing newline from nomad fmt (#20318)
When `nomad fmt` writes to stdout instead of overwriting a file, the command was
using the `UI` output, which appends an extra newline. This results in extra
trailing newlines when using `nomad fmt` as part of a pipeline or editor plugin.

Update the command to write directly to stdout when in the stdout mode.

Fixes: https://github.com/hashicorp/nomad/issues/20307
2024-04-08 13:29:22 -04:00
Tim Gross
e8d203e7ce transparent proxy: add jobspec support (#20144)
Add a transparent proxy block to the existing Connect sidecar service proxy
block. This changeset is plumbing required to support transparent proxy
configuration on the client.

Ref: https://github.com/hashicorp/nomad/issues/10628
2024-04-04 17:01:07 -04:00
Tim Gross
a50e6267d0 cli: remove redundant allocs profile from operator debug (#20219)
The pprof `allocs` profile is identical to the `heap` profile, just with a
different default view. Collecting only one of the two is sufficient to view all
of `alloc_objects`, `alloc_space`, `inuse_objects`, and `inuse_space`, and
collecting only one means that both views will be of the same profile.

Also improve the docstrings on the goroutine profiles explaining what's in each
so that it's clear why we might want all of debug=0, debug=1, and debug=2.
2024-03-26 08:19:18 -04:00
James Rasell
facc3e8013 agent: allow configuration of in-memory telemetry sink. (#20166)
This change adds configuration options for setting the in-memory
telemetry sink collection and retention durations. This sink backs
the metrics JSON API and previously had hard-coded default values.

The new options are particularly useful when running development or
debug environments, where metrics collection is desired at a fast
and granular rate.
2024-03-25 15:00:18 +00:00
Tim Gross
02d98b9357 operator debug: fix pprof interval handling (#20206)
The `nomad operator debug` command saves a CPU profile for each interval, and
names these files based on the interval.

The same functions takes a goroutine profile, heap profile, etc. but is missing
the logic to interpolate the file name with the interval. This results in the
operator debug command making potentially many expensive profile requests, and
then overwriting the data. Update the command to save every profile it scrapes,
and number them similarly to the existing CPU profile.

Additionally, the command flags for `-pprof-interval` and `-pprof-duration` were
validated backwards, which meant that we always coerced the `-pprof-interval` to
be the same as the `-pprof-duration`, which always resulted in a single profile
being taken at the start of the bundle. Correct the check as well as change the
defaults to be more sensible.

Fixes: https://github.com/hashicorp/nomad/issues/20151
2024-03-25 09:01:06 -04:00
Tim Gross
bdf3ff301e jobspec: add support for destination partition to upstream block (#20167)
Adds support for specifying a destination Consul admin partition in the
`upstream` block.

Fixes: https://github.com/hashicorp/nomad/issues/19785
2024-03-22 16:15:22 -04:00
Tim Gross
10dd738a03 jobspec: update gateway.ingress.service Consul API fields (#20176)
Add support for further configuring `gateway.ingress.service` blocks to bring
this block up-to-date with currently available Consul API fields (except for
namespace and admin partition, which will need be handled under a different
PR). These fields are sent to Consul as part of the job endpoint submission hook
for Connect gateways.

Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
2024-03-22 13:50:48 -04:00
Michael Schurter
23e4b7c9d2 Upgrade go-msgpack to v2 (#20173)
Replaces #18812

Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```

see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details
2024-03-21 11:44:23 -07:00
Tim Gross
7b9bce2d08 config: fix client.template config merging with defaults (#20165)
When loading the client configuration, the user-specified `client.template`
block was not properly merged with the default values. As a result, if the user
set any `client.template` field, all the other field defaulted to their zero
values instead of the documented defaults.

This changeset:
* Adds the missing `Merge` method for the client template config and ensures
  it's called.
* Makes a single source of truth for the default template configuration,
  instead of two different constructors.
* Extends the tests to cover the merge of a partial block better.

Fixes: https://github.com/hashicorp/nomad/issues/20164
2024-03-20 10:18:56 -04:00
Tim Gross
c4253470a0 autopilot: add operator autopilot health command (#20156)
Add a command line operation that reports Enterprise autopilot data from the
`/operator/autopilot/health` API. I've pulled this feature out of
@lindleywhite's PR in the Enterprise repo.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394

Co-authored-by: Lindley <lindley@hashicorp.com>
2024-03-18 14:46:18 -04:00
Tim Gross
5138c1c82f autopilot: add Enterprise health information to API endpoint (#20153)
Add information about autopilot health to the `/operator/autopilot/health` API
in Nomad Enterprise.

I've pulled the CE changes required for this feature out of @lindleywhite's PR
in the Enterprise repo. A separate PR will include a new `operator autopilot
health` command that can present this information at the command line.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394
Co-authored-by: Lindley <lindley@hashicorp.com>
2024-03-18 11:38:17 -04:00
Tim Gross
db195726a5 cli: add options to help string for acl policy info (#20138)
Fixes: https://github.com/hashicorp/nomad/issues/20117
2024-03-15 08:44:50 -04:00
Amir Abbas
40b8f17717 Support insecure flag on artifact (#20126) 2024-03-14 10:59:20 -05:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
hc-github-team-nomad-core
46182c2a83 Generate files for 1.7.6 release 2024-03-12 12:04:04 +01:00
carrychair
5f5b34db0e remove repetitive words (#20110)
Signed-off-by: carrychair <linghuchong404@gmail.com>
2024-03-11 08:52:08 +00:00
Seth Hoenig
286dce7a2a exec2: add a client.users configuration block (#20093)
* exec: add a client.users configuration block

For now just add min/max dynamic user values; soon we can also absorb
the "user.denylist" and "user.checked_drivers" options from the
deprecated client.options map.

* give the no-op pool implementation a better name

* use explicit error types to make referencing them cleaner in tests

* use import alias to not shadow package name
2024-03-08 16:02:32 -06:00
Giovanni Avelar
26a27bb12c cli: add -json option on jobs status command (#18925) 2024-03-08 16:03:52 -05:00
Soren L. Hansen
96acddbc13 Avoid NPE in nomad/command/job_restart.go (#20049)
stopAlloc() checks if an allocation represents a system job like this:
```
  if *alloc.Job.Type == api.JobTypeSystem {
    ...
  }
```

This caused the cli to crash:
```
==> 2024-02-29T08:45:53+01:00: Restarting 2 allocations
    2024-02-29T08:45:54+01:00: Rescheduling allocation "6a9da11a" for group "redacted-group"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x10686affc]

goroutine 36 [running]:
github.com/hashicorp/nomad/command.(*JobRestartCommand).stopAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
        github.com/hashicorp/nomad/command/job_restart.go:968 +0x25c
github.com/hashicorp/nomad/command.(*JobRestartCommand).handleAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
        github.com/hashicorp/nomad/command/job_restart.go:868 +0x34
github.com/hashicorp/nomad/command.(*JobRestartCommand).Run.(*JobRestartCommand).Run.func1.func2()
        github.com/hashicorp/nomad/command/job_restart.go:392 +0x28
github.com/hashicorp/go-multierror.(*Group).Go.func1()
        github.com/hashicorp/go-multierror@v1.1.1/group.go:23 +0x60
created by github.com/hashicorp/go-multierror.(*Group).Go in goroutine 1
        github.com/hashicorp/go-multierror@v1.1.1/group.go:20 +0x84
```

Attaching a debugger revealed that `alloc.Job` was set, but
`alloc.Job.Type` was nil. After guarding the `.Type` check with a
`alloc.Job.Type != nil`, it still crashed. This time, `alloc.Job` was
nil.

I was scrambling to get the job running again, so I didn't have the
opportunity to find out why those values were nil, but this change
ensures the CLI does not crash in these situations.

Fixes #20048
2024-03-01 08:07:28 -06:00
Seth Hoenig
4d83733909 tests: swap testify for test in more places (#20028)
* tests: swap testify for test in plugins/csi/client_test.go

* tests: swap testify for test in testutil/

* tests: swap testify for test in host_test.go

* tests: swap testify for test in plugin_test.go

* tests: swap testify for test in utils_test.go

* tests: swap testify for test in scheduler/

* tests: swap testify for test in parse_test.go

* tests: swap testify for test in attribute_test.go

* tests: swap testify for test in plugins/drivers/

* tests: swap testify for test in command/

* tests: fixup some test usages

* go: run go mod tidy

* windows: cpuset test only on linux
2024-02-29 12:11:35 -06:00
Juana De La Cuesta
20cfbc82d3 Introduces Disconnect block into the TaskGroup configuration (#19886)
This PR is the first on two that will implement the new Disconnect block. In this PR the new block is introduced to be backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.
2024-02-19 16:41:35 +01:00
hc-github-team-nomad-core
6e08d9ffff Generate files for 1.7.5 release 2024-02-13 11:32:59 -05:00
Tim Gross
e986c298ac alloc exec: fix panics after stream close (#19932)
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.
2024-02-12 09:43:34 -05:00
Tim Gross
110d93ab25 windows: remove LazyDLL calls for system modules (#19925)
On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to
load a few system DLL files, which does not prevent DLL hijacking
attacks. Hypothetically a local attacker on the client host that can place an
abusive library in a specific location could use this to escalate privileges to
the Nomad process. Although this attack does not fall within the Nomad security
model, it doesn't hurt to follow good practices here.

We can remove two of these DLL loads by using wrapper functions provided by the
stdlib in `x/sys/windows`

Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
2024-02-09 08:47:48 -05:00
hc-github-team-nomad-core
875e96cccc Generate files for 1.7.4 release 2024-02-08 10:40:24 -05:00
Luiz Aoqui
ce710d49fd cli: fix tls ca create command with -domain (#19892)
The current implementation of the `nomad tls ca create` command
ovierrides the value of the `-domain` flag with `"nomad"` if no
additional customization is provided.

This results in a certificate for the wrong domain or an error if the
`-name-constraint` flag is also used.

THe logic for `IsCustom()` also seemed reversed. If all custom fields
are empty then the certificate is _not_ customized, so `IsCustom()`
should return false.
2024-02-07 16:40:51 -05:00
Luiz Aoqui
50c50a6328 cli: fix return code when job deployment succeeds (#19876)
When a job eval is blocked due to missing capacity, the `nomad job run`
command will monitor the deployment, which may succeed once additional
capacity is made available.

But the current implementation would return `2` even when the deployment
succeeded because it only took the first eval status into account.

This commit updates the eval monitoring logic to reset the scheduling
error state if the deployment eventually succeeds.
2024-02-05 18:32:25 -05:00
Juana De La Cuesta
120c3ca3c9 Add granular control of SELinux labels for host mounts (#19839)
Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label

* Update website/content/docs/job-specification/volume_mount.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* fix: typo

* func: make volume mount verification happen even on  mounts with no volume

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-02-05 10:05:33 +01:00
Piotr Kazmierczak
11ca21ca3c cli: correct typos in setup consul (#19754) 2024-01-17 14:13:07 +01:00
James Rasell
41555b6370 cli: Fix minor help formatting issue in agent command. (#19743) 2024-01-17 12:18:00 +00:00