nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Michael Schurter	690abefc4a	docs: add docs for time based task execution	2024-05-29 15:50:33 -07:00
Tim Gross	de38ff4189	consul: set partition for gateway config entries (#22228 ) When we write Connect gateway configuration entries from the server, we're not passing in the intended partition. This means we're using the server's own partition to submit the configuration entries and this may not match. Note this requires the Nomad server's token to have permission to that partition. Also, move the config entry write after we check Sentinel policies. This allows us to return early if we hit a Sentinel error without making Consul RPCs first.	2024-05-29 16:31:02 -04:00
hc-github-team-nomad-core	32d820644a	Generate files for 1.8.0 release	2024-05-29 11:48:55 -04:00
hc-github-team-nomad-core	c374bd375b	Generate files for 1.8.0-rc.1 release	2024-05-23 16:55:05 -04:00
Daniel Bennett	4415fabe7d	jobspec: time based task execution (#22201 ) this is the CE side of an Enterprise-only feature. a job trying to use this in CE will fail to validate. to enable daily-scheduled execution entirely client-side, a job may now contain:

```
task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
  ...
```

and everything about the allocation will be placed as usual, but if outside the specified schedule, the taskrunner will block on the client, waiting on the schedule start, before proceeding with the task driver execution, etc. this includes a taskrunner hook, which watches for the end of the schedule, at which point it will kill the task. then, restarts-allowing, a new task will start and again block waiting for start, and so on. this also includes all the plumbing required to pipe API calls through from command->api->agent->server->client, so that tasks can be force-run, force-paused, or resume the schedule on demand.	2024-05-22 15:40:25 -05:00
Phil Renaud	e8b77fcfa0	[ui] Jobspec UI block: Descriptions and Links (#18292 ) * Hacky but shows links and desc * markdown * Small pre-test cleanup * Test for UI description and link rendering * JSON jobspec docs and variable example job get UI block * Jobspec documentation for UI block * Description and links moved into the Title component and made into Helios components * Marked version upgrade * Allow links without a description and max description to 1000 chars * Node 18 for setup-js * markdown sanitization * Ui to UI and docs change * Canonicalize, copy and diff for job.ui * UI block added to testJob for structs testing * diff test * Remove redundant reset * For readability, changing the receiving pointer of copied job variables * TestUI endpoint conversion tests * -require +must * Nil check on Links * JobUIConfig.Links as pointer --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-05-22 15:00:45 -04:00
Seth Hoenig	09bd11383c	client: alloc_mounts directory must be sibling of data directory (#22199 ) This PR adjusts the default location of -alloc-mounts-dir path to be a sibling of the -data-dir path rather than a child. This is because on production-hardened systems the data dir is supposed to be chmod 0700 owned by root - preventing the exec2 task driver (and others using unveil file system isolation features) from working properly. For reference the directory structure from -data-dir now looks like this after running an example job. Under the alloc_mounts directory, task specific directories are mode 0710 and owned by the task user (which may be a dynamic user UID/GID).

```
➜ sudo tree -p -d -u /tmp/mynomad
[drwxrwxr-x shoenig ] /tmp/mynomad
├── [drwx--x--x root ] alloc_mounts
│   └── [drwx--x--- 80552 ] c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask
│       ├── [drwxrwxrwx nobody ] alloc
│       │   ├── [drwxrwxrwx nobody ] data
│       │   ├── [drwxrwxrwx nobody ] logs
│       │   └── [drwxrwxrwx nobody ] tmp
│       ├── [drwxrwxrwx nobody ] local
│       ├── [drwxr-xr-x root ] private
│       ├── [drwx--x--- 80552 ] secrets
│       └── [drwxrwxrwt nobody ] tmp
└── [drwx------ root ] data
    ├── [drwx--x--x root ] alloc
    │   └── [drwxr-xr-x root ] c753b71d-c6a1-3370-1f59-47ab838fd8a6
    │       ├── [drwxrwxrwx nobody ] alloc
    │       │   ├── [drwxrwxrwx nobody ] data
    │       │   ├── [drwxrwxrwx nobody ] logs
    │       │   └── [drwxrwxrwx nobody ] tmp
    │       └── [drwx--x--- 80552 ] mytask
    │           ├── [drwxrwxrwx nobody ] alloc
    │           │   ├── [drwxrwxrwx nobody ] data
    │           │   ├── [drwxrwxrwx nobody ] logs
    │           │   └── [drwxrwxrwx nobody ] tmp
    │           ├── [drwxrwxrwx nobody ] local
    │           ├── [drwxrwxrwx nobody ] private
    │           ├── [drwx--x--- 80552 ] secrets
    │           └── [drwxrwxrwt nobody ] tmp
    ├── [drwx------ root ] client
    └── [drwxr-xr-x root ] server
        ├── [drwx------ root ] keystore
        ├── [drwxr-xr-x root ] raft
        │   └── [drwxr-xr-x root ] snapshots
        └── [drwxr-xr-x root ] serf

32 directories
```
	2024-05-22 13:14:34 -05:00
Deniz Onur Duzgun	1cc99cc1b4	bug: resolve type conversion alerts (#20553 )	2024-05-15 13:22:10 -04:00
Tim Gross	c9fd93c772	connect: support `volume_mount` blocks for sidecar task overrides (#20575 ) Users can override the default sidecar task for Connect workloads. This sidecar task might need access to certificate stores on the host. Allow adding the `volume_mount` block to the sidecar task override. Also fixes a bug where `volume_mount` blocks would not appear in plan diff outputs. Fixes: https://github.com/hashicorp/nomad/issues/19786	2024-05-14 12:49:37 -04:00
hc-github-team-nomad-core	e1a176c120	Generate files for 1.8.0-beta.1 release	2024-05-07 07:06:07 +00:00
Daniel Bennett	cf87a556b3	api: new /v1/jobs/statuses endpoint for /ui/jobs page (#20130 ) introduce a new API /v1/jobs/statuses, primarily for use in the UI, which collates info about jobs, their allocations, and latest deployment. currently the UI gets all of /v1/jobs and sorts and paginates them client-side in the browser, and its "summary" column is based on historical summary data (which can be visually misleading, and sometimes scary when a job has failed at some point in the not-yet-garbage-collected past). this does pagination and filtering and such, and returns jobs sorted by ModifyIndex, so latest-changed jobs still come first. it pulls allocs and latest deployment straight out of current state for a more robust, holistic view of the job status. it is less efficient per-job, due to the extra state lookups, but should be more efficient per-page (excepting perhaps for job(s) with very-many allocs). if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`, then the response will be limited to that subset of jobs. the main goal here is to prevent "jostling" the user in the UI when jobs come into and out of existence. and if a blocking query is started with `?index=N`, then the query should only unblock if jobs "on page" change, rather than any change to any of the state tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary HTTP round trips.	2024-05-03 15:01:40 -05:00
Tim Gross	54fc146432	agent: add support for sdnotify protocol (#20528 ) Nomad agents expect to receive `SIGHUP` to reload their configuration. The signal handler for this is installed fairly late in agent startup, after the client or server components are up and running. This means that configuration management tools can potentially reload the configuration before the agent can handle it, causing the agent to crash. We don't want to allow configuration reload during client or server component startup, because it would significantly complicate initialization. Instead, we'll implement the systemd notify protocol. This causes systemd to block sending configuration reload signals until the agent is actually ready. Users can still bypass this by sending signals directly. Note that there are several Go libraries that implement the sdnotify protocol, but most are part of much larger projects which would create a lot of dependabot burden. The bits of the protocol we need are extremely simple to implement in just a couple of functions. For non-Linux or non-systemd Linux systems, this feature is a no-op. In future work we could potentially implement service notification for Windows as well. Fixes: https://github.com/hashicorp/nomad/issues/3885	2024-05-03 13:42:07 -04:00
Tim Gross	f9dd120d29	cli: add `-jwks-ca-file` to Vault/Consul setup commands (#20518 ) When setting up auth methods for Consul and Vault in production environments, we can typically assume that the CA certificate for the JWKS endpoint will be in the host certificate store (as part of the usual configuration management cluster admins need to do). But for quick demos with `-dev` agents, this won't be the case. Add a `-jwks-ca-file` parameter to the setup commands so that we can use this tool to quickly set up WI with `-dev` agents running TLS.	2024-05-03 08:26:29 -04:00
Michael Schurter	3aefc010d7	test: remove spurious print statements (#20503 )	2024-05-01 09:47:56 -07:00
James Rasell	05a7bb53d3	cli: fix handling of scaling jobs which don't generate evals. (#20479 ) In some cases, Nomad job scaling will not generate evaluations such as parameterized jobs. This change fixes the CLI behaviour in this case, and copies the job run command for consistency.	2024-04-30 10:32:31 +01:00
Daniel Bennett	3ac3bc1cfe	acl: token global mode can not be changed (#20464 ) true up CLI and docs with API reality	2024-04-22 11:58:47 -05:00
Juana De La Cuesta	64978662b6	Post 1.7.7 release (#20421 ) Generate files for 1.7.7 release, prepare for next release and merge release 1.7.7 files	2024-04-17 10:44:32 +02:00
Seth Hoenig	ae6c4c8e3f	deps: purge use of old x/exp packages (#20373 )	2024-04-12 08:29:00 -05:00
astudentofblake	7b7ed12326	func: Allow custom paths to be added to the getter landlock (#20349 ) * func: Allow custom paths to be added to the getter landlock Fixes: 20315 * fix: slices imports fix: more meaningful examples fix: improve documentation fix: quote error output	2024-04-11 15:17:33 -05:00
Tim Gross	8298d39e78	Connect transparent proxy support Add support for Consul Connect transparent proxies Fixes: https://github.com/hashicorp/nomad/issues/10628	2024-04-10 11:00:18 -04:00
Tim Gross	a0cbc1a26a	cli: remove extraneous trailing newline from `nomad fmt` (#20318 ) When `nomad fmt` writes to stdout instead of overwriting a file, the command was using the `UI` output, which appends an extra newline. This results in extra trailing newlines when using `nomad fmt` as part of a pipeline or editor plugin. Update the command to write directly to stdout when in the stdout mode. Fixes: https://github.com/hashicorp/nomad/issues/20307	2024-04-08 13:29:22 -04:00
Tim Gross	e8d203e7ce	transparent proxy: add jobspec support (#20144 ) Add a transparent proxy block to the existing Connect sidecar service proxy block. This changeset is plumbing required to support transparent proxy configuration on the client. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-04-04 17:01:07 -04:00
Tim Gross	a50e6267d0	cli: remove redundant `allocs` profile from `operator debug` (#20219 ) The pprof `allocs` profile is identical to the `heap` profile, just with a different default view. Collecting only one of the two is sufficient to view all of `alloc_objects`, `alloc_space`, `inuse_objects`, and `inuse_space`, and collecting only one means that both views will be of the same profile. Also improve the docstrings on the goroutine profiles explaining what's in each so that it's clear why we might want all of debug=0, debug=1, and debug=2.	2024-03-26 08:19:18 -04:00
James Rasell	facc3e8013	agent: allow configuration of in-memory telemetry sink. (#20166 ) This change adds configuration options for setting the in-memory telemetry sink collection and retention durations. This sink backs the metrics JSON API and previously had hard-coded default values. The new options are particularly useful when running development or debug environments, where metrics collection is desired at a fast and granular rate.	2024-03-25 15:00:18 +00:00
Tim Gross	02d98b9357	operator debug: fix pprof interval handling (#20206 ) The `nomad operator debug` command saves a CPU profile for each interval, and names these files based on the interval. The same function takes a goroutine profile, heap profile, etc. but is missing the logic to interpolate the file name with the interval. This results in the operator debug command making potentially many expensive profile requests, and then overwriting the data. Update the command to save every profile it scrapes, and number them similarly to the existing CPU profile. Additionally, the command flags for `-pprof-interval` and `-pprof-duration` were validated backwards, which meant that we always coerced the `-pprof-interval` to be the same as the `-pprof-duration`, which always resulted in a single profile being taken at the start of the bundle. Correct the check as well as change the defaults to be more sensible. Fixes: https://github.com/hashicorp/nomad/issues/20151	2024-03-25 09:01:06 -04:00
Tim Gross	bdf3ff301e	jobspec: add support for destination partition to `upstream` block (#20167 ) Adds support for specifying a destination Consul admin partition in the `upstream` block. Fixes: https://github.com/hashicorp/nomad/issues/19785	2024-03-22 16:15:22 -04:00
Tim Gross	10dd738a03	jobspec: update `gateway.ingress.service` Consul API fields (#20176 ) Add support for further configuring `gateway.ingress.service` blocks to bring this block up-to-date with currently available Consul API fields (except for namespace and admin partition, which will need to be handled under a different PR). These fields are sent to Consul as part of the job endpoint submission hook for Connect gateways. Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>	2024-03-22 13:50:48 -04:00
Michael Schurter	23e4b7c9d2	Upgrade go-msgpack to v2 (#20173 ) Replaces #18812. Upgraded with:

```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```

see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0 for details	2024-03-21 11:44:23 -07:00
Tim Gross	7b9bce2d08	config: fix `client.template` config merging with defaults (#20165 ) When loading the client configuration, the user-specified `client.template` block was not properly merged with the default values. As a result, if the user set any `client.template` field, all the other fields defaulted to their zero values instead of the documented defaults. This changeset: * Adds the missing `Merge` method for the client template config and ensures it's called. * Makes a single source of truth for the default template configuration, instead of two different constructors. * Extends the tests to cover the merge of a partial block better. Fixes: https://github.com/hashicorp/nomad/issues/20164	2024-03-20 10:18:56 -04:00
Tim Gross	c4253470a0	autopilot: add `operator autopilot health` command (#20156 ) Add a command line operation that reports Enterprise autopilot data from the `/operator/autopilot/health` API. I've pulled this feature out of @lindleywhite's PR in the Enterprise repo. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394 Co-authored-by: Lindley <lindley@hashicorp.com>	2024-03-18 14:46:18 -04:00
Tim Gross	5138c1c82f	autopilot: add Enterprise health information to API endpoint (#20153 ) Add information about autopilot health to the `/operator/autopilot/health` API in Nomad Enterprise. I've pulled the CE changes required for this feature out of @lindleywhite's PR in the Enterprise repo. A separate PR will include a new `operator autopilot health` command that can present this information at the command line. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394 Co-authored-by: Lindley <lindley@hashicorp.com>	2024-03-18 11:38:17 -04:00
Tim Gross	db195726a5	cli: add options to help string for `acl policy info` (#20138 ) Fixes: https://github.com/hashicorp/nomad/issues/20117	2024-03-15 08:44:50 -04:00
Amir Abbas	40b8f17717	Support insecure flag on artifact (#20126 )	2024-03-14 10:59:20 -05:00
Seth Hoenig	05937ab75b	exec2: add client support for unveil filesystem isolation mode (#20115 ) * exec2: add client support for unveil filesystem isolation mode This PR adds support for a new filesystem isolation mode, "Unveil". The mode introduces an "alloc_mounts" directory where tasks have user-owned directory structures which are bind mounts into the real alloc directory structure. This enables a task driver to use landlock (and maybe the real unveil on openbsd one day) to isolate a task to the task owned directory structure, providing sandboxing. * actually create alloc-mounts-dir directory * fix doc strings about alloc mount dir paths	2024-03-13 08:24:17 -05:00
hc-github-team-nomad-core	46182c2a83	Generate files for 1.7.6 release	2024-03-12 12:04:04 +01:00
carrychair	5f5b34db0e	remove repetitive words (#20110 ) Signed-off-by: carrychair <linghuchong404@gmail.com>	2024-03-11 08:52:08 +00:00
Seth Hoenig	286dce7a2a	exec2: add a client.users configuration block (#20093 ) * exec: add a client.users configuration block For now just add min/max dynamic user values; soon we can also absorb the "user.denylist" and "user.checked_drivers" options from the deprecated client.options map. * give the no-op pool implementation a better name * use explicit error types to make referencing them cleaner in tests * use import alias to not shadow package name	2024-03-08 16:02:32 -06:00
Giovanni Avelar	26a27bb12c	cli: add -json option on jobs status command (#18925 )	2024-03-08 16:03:52 -05:00
Soren L. Hansen	96acddbc13	Avoid NPE in nomad/command/job_restart.go (#20049 ) stopAlloc() checks if an allocation represents a system job like this:

```
if *alloc.Job.Type == api.JobTypeSystem {
    ...
}
```

This caused the CLI to crash:

```
==> 2024-02-29T08:45:53+01:00: Restarting 2 allocations
2024-02-29T08:45:54+01:00: Rescheduling allocation "6a9da11a" for group "redacted-group"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x10686affc]

goroutine 36 [running]:
github.com/hashicorp/nomad/command.(*JobRestartCommand).stopAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
	github.com/hashicorp/nomad/command/job_restart.go:968 +0x25c
github.com/hashicorp/nomad/command.(*JobRestartCommand).handleAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
	github.com/hashicorp/nomad/command/job_restart.go:868 +0x34
github.com/hashicorp/nomad/command.(*JobRestartCommand).Run.(*JobRestartCommand).Run.func1.func2()
	github.com/hashicorp/nomad/command/job_restart.go:392 +0x28
github.com/hashicorp/go-multierror.(*Group).Go.func1()
	github.com/hashicorp/go-multierror@v1.1.1/group.go:23 +0x60
created by github.com/hashicorp/go-multierror.(*Group).Go in goroutine 1
	github.com/hashicorp/go-multierror@v1.1.1/group.go:20 +0x84
```

Attaching a debugger revealed that `alloc.Job` was set, but `alloc.Job.Type` was nil. After guarding the `.Type` check with an `alloc.Job.Type != nil` check, it still crashed. This time, `alloc.Job` was nil. I was scrambling to get the job running again, so I didn't have the opportunity to find out why those values were nil, but this change ensures the CLI does not crash in these situations. Fixes #20048	2024-03-01 08:07:28 -06:00
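The fix described here comes down to checking every pointer on the path before dereferencing it. The Go sketch below assumes, as the debugger session suggests, that both `Job` and `Job.Type` are pointers. The `Job`, `Allocation`, `jobTypeSystem` and `isSystemAlloc` definitions are hypothetical stand-ins, not the real `api` package types.

```go
package main

import "fmt"

// Hypothetical stand-ins for the API types; only the fields involved in the
// crash are modeled, and both Job and Type are pointers that may be nil.
type Job struct {
	Type *string
}

type Allocation struct {
	ID  string
	Job *Job
}

// jobTypeSystem mirrors the "system" job type constant.
const jobTypeSystem = "system"

// isSystemAlloc checks every pointer on the path before dereferencing, so an
// allocation with a missing Job or Job.Type is treated as "not a system job"
// instead of panicking.
func isSystemAlloc(alloc *Allocation) bool {
	return alloc != nil && alloc.Job != nil && alloc.Job.Type != nil &&
		*alloc.Job.Type == jobTypeSystem
}

func main() {
	system := jobTypeSystem
	fmt.Println(isSystemAlloc(&Allocation{ID: "a", Job: &Job{Type: &system}})) // true
	fmt.Println(isSystemAlloc(&Allocation{ID: "b", Job: &Job{}}))               // false
	fmt.Println(isSystemAlloc(&Allocation{ID: "c"}))                            // false
}
```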
Seth Hoenig	4d83733909	tests: swap testify for test in more places (#20028 ) * tests: swap testify for test in plugins/csi/client_test.go * tests: swap testify for test in testutil/ * tests: swap testify for test in host_test.go * tests: swap testify for test in plugin_test.go * tests: swap testify for test in utils_test.go * tests: swap testify for test in scheduler/ * tests: swap testify for test in parse_test.go * tests: swap testify for test in attribute_test.go * tests: swap testify for test in plugins/drivers/ * tests: swap testify for test in command/ * tests: fixup some test usages * go: run go mod tidy * windows: cpuset test only on linux	2024-02-29 12:11:35 -06:00
Juana De La Cuesta	20cfbc82d3	Introduces `Disconnect` block into the `TaskGroup` configuration (#19886 ) This PR is the first of two that will implement the new Disconnect block. In this PR the new block is introduced to be backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.	2024-02-19 16:41:35 +01:00
hc-github-team-nomad-core	6e08d9ffff	Generate files for 1.7.5 release	2024-02-13 11:32:59 -05:00
Tim Gross	e986c298ac	alloc exec: fix panics after stream close (#19932 ) In #19172 we added a check on websocket errors to see if they were one of several benign "close" messages. This change inadvertently assumed that other messages used for close would not implement `HTTPCodedError`. When errors like the following are received: > msgpack decode error [pos 0]: io: read/write on closed pipe" they are sent from the inner loop as though they were a "real" error, but the channel is already being closed with a "close" message. This allowed many more attempts to pass thru a previously-undiscovered race condition in the two goroutines that stream RPC responses to the websocket. When the input stream returns an error for any reason (for example, the command we're executing has exited), it will unblock the "outer" goroutine and cause a write to the websocket. If we're concurrently writing the "close error" discussed above, this results in a panic from the websocket library. This changeset includes two fixes: * Catch "closed pipe" error correctly so that we're not sending unnecessary error messages. * Move all writes to the websocket into the same response streaming goroutine. The main handler goroutine will block on a results channel, and the response streaming goroutine will send on that channel with the final error when it's done so it can be reported to the user.	2024-02-12 09:43:34 -05:00
Tim Gross	110d93ab25	windows: remove LazyDLL calls for system modules (#19925 ) On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to load a few system DLL files, which does not prevent DLL hijacking attacks. Hypothetically a local attacker on the client host that can place an abusive library in a specific location could use this to escalate privileges to the Nomad process. Although this attack does not fall within the Nomad security model, it doesn't hurt to follow good practices here. We can remove two of these DLL loads by using wrapper functions provided by the stdlib in `x/sys/windows` Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>	2024-02-09 08:47:48 -05:00
hc-github-team-nomad-core	875e96cccc	Generate files for 1.7.4 release	2024-02-08 10:40:24 -05:00
Luiz Aoqui	ce710d49fd	cli: fix `tls ca create` command with `-domain` (#19892 ) The current implementation of the `nomad tls ca create` command overrides the value of the `-domain` flag with `"nomad"` if no additional customization is provided. This results in a certificate for the wrong domain or an error if the `-name-constraint` flag is also used. The logic for `IsCustom()` also seemed reversed. If all custom fields are empty then the certificate is _not_ customized, so `IsCustom()` should return false.	2024-02-07 16:40:51 -05:00
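The Go sketch below shows the corrected `IsCustom()` semantics and one reading of the intended `-domain` handling. It is not the command's actual code. It assumes a hypothetical `caOptions` struct holding a subset of the subject fields, plus a hypothetical `caDomain` helper. Per the commit, `"nomad"` is the value the buggy code substituted for `-domain`. Here it is assumed to be only the fallback when the flag is empty.

```go
package main

import "fmt"

// caOptions is a hypothetical subset of the subject fields that the
// `nomad tls ca create` flags can customize.
type caOptions struct {
	Country      string
	Locality     string
	Organization string
}

// IsCustom reports whether any subject field was set. With every field
// empty the certificate is not customized, so it must return false.
func (o caOptions) IsCustom() bool {
	return o.Country != "" || o.Locality != "" || o.Organization != ""
}

// caDomain keeps the user's -domain value and falls back to "nomad" only
// when the flag was left empty (an assumed default).
func caDomain(flagValue string) string {
	if flagValue == "" {
		return "nomad"
	}
	return flagValue
}

func main() {
	fmt.Println(caOptions{}.IsCustom(), caOptions{Country: "US"}.IsCustom())
	fmt.Println(caDomain(""), caDomain("example.com"))
}
```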
Luiz Aoqui	50c50a6328	cli: fix return code when job deployment succeeds (#19876 ) When a job eval is blocked due to missing capacity, the `nomad job run` command will monitor the deployment, which may succeed once additional capacity is made available. But the current implementation would return `2` even when the deployment succeeded because it only took the first eval status into account. This commit updates the eval monitoring logic to reset the scheduling error state if the deployment eventually succeeds.	2024-02-05 18:32:25 -05:00
Juana De La Cuesta	120c3ca3c9	Add granular control of SELinux labels for host mounts (#19839 ) Add a new configuration option on a task's volume_mounts to give fine-grained control over the SELinux "z" label * Update website/content/docs/job-specification/volume_mount.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * fix: typo * func: make volume mount verification happen even on mounts with no volume --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-05 10:05:33 +01:00
Piotr Kazmierczak	11ca21ca3c	cli: correct typos in setup consul (#19754 )	2024-01-17 14:13:07 +01:00
James Rasell	41555b6370	cli: Fix minor help formatting issue in agent command. (#19743 )	2024-01-17 12:18:00 +00:00

3769 Commits