nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Tim Gross	c8be863bc8	reporting: allow export interval and address to be configurable (#23674 ) The go-census library supports configuration to send metrics to a local development version of the collector. Add "undocumented" configuration options to the `reporting` block allow developers to debug and verify we're sending the data we expect with real Nomad servers and not just unit tests. Ref: https://hashicorp.atlassian.net/browse/NET-10057 Ref: https://github.com/hashicorp/nomad-enterprise/pull/1708	2024-07-24 08:29:59 -04:00
Tim Gross	2f4353412d	keyring: support prepublishing keys (#23577 ) When a root key is rotated, the servers immediately start signing Workload Identities with the new active key. But workloads may be using those WI tokens to sign into external services, which may not have had time to fetch the new public key and which might try to fetch new keys as needed. Add support for prepublishing keys. Prepublished keys will be visible in the JWKS endpoint but will not be used for signing or encryption until their `PublishTime`. Update the periodic key rotation to prepublish keys at half the `root_key_rotation_threshold` window, and promote prepublished keys to active after the `PublishTime`. This changeset also fixes two bugs in periodic root key rotation and garbage collection, both of which can't be safely fixed without implementing prepublishing: * Periodic root key rotation would never happen because the default `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM time table. We now compare the `CreateTime` against the wall clock time instead of the time table. (We expect to remove the time table in future work, ref https://github.com/hashicorp/nomad/issues/16359) * Root key garbage collection could GC keys that were used to sign identities. We now wait until `root_key_rotation_threshold` + `root_key_gc_threshold` before GC'ing a key. * When rekeying a root key, the core job did not mark the key as inactive after the rekey was complete. Ref: https://hashicorp.atlassian.net/browse/NET-10398 Ref: https://hashicorp.atlassian.net/browse/NET-10280 Fixes: https://github.com/hashicorp/nomad/issues/19669 Fixes: https://github.com/hashicorp/nomad/issues/23528 Fixes: https://github.com/hashicorp/nomad/issues/19368	2024-07-19 13:29:41 -04:00
Tim Gross	c970d22164	keyring: support external KMS for key encryption key (KEK) (#23580 ) In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload Identities, but the key material is protected only by an AEAD encrypting the KEK. Add support for Vault transit encryption and external KMS from major cloud providers. The servers call out to the external service to decrypt each key in the on-disk keystore. Ref: https://hashicorp.atlassian.net/browse/NET-10334 Fixes: https://github.com/hashicorp/nomad/issues/14852	2024-07-18 09:42:28 -04:00
Juanadelacuesta	656725a615	fix: updated ui assets	2024-07-17 13:59:51 +02:00
hc-github-team-nomad-core	6dc691da07	Generate files for 1.8.2 release	2024-07-17 00:00:36 +02:00
guifran001	1c44521543	client: Add a preferred address family option for network-interface (#23389 ) to prefer ipv4 or ipv6 when deducing IP from network interface Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-07-12 15:30:38 -05:00
Martina Santangelo	661011f5de	cni: allow users to set CNI args in job spec (#23538 )	2024-07-12 11:47:15 -04:00
Piotr Kazmierczak	fa8ffedd74	api: handle newlines in JobSubmission vars correctly (#23560 ) Fixes a bug where variable values in job submissions that contained newlines weren't encoded correctly, and thus jobs that contained them couldn't be resumed once stopped via the UI. Internal ref: https://hashicorp.atlassian.net/browse/NET-9966	2024-07-12 08:04:27 +02:00
James Rasell	f3de47e63d	quota: Allow cores to be configured within an enterprise quota. (#23543 )	2024-07-11 14:54:25 +01:00
Tim Gross	b09c1146a9	CLI: fix prefix matching across multiple commands (#23502 ) Several commands that inspect objects where the names are user-controlled share a bug where the user cannot inspect the object if it has a name that is an exact prefix of the name of another object (in the same namespace, where applicable). For example, the object "test" can't be inspected if there's an object with the name "testing". Copy existing logic we have for jobs, node pools, etc. to the impacted commands: * `plugin status` * `quota inspect` * `quota status` * `scaling policy info` * `service info` * `volume deregister` * `volume detach` * `volume status` If we get multiple objects for the prefix query, we check if any of them are an exact match and use that object instead of returning an error. Where possible because the prefix query signatures are the same, use a generic function that can be shared across multiple commands. Fixes: https://github.com/hashicorp/nomad/issues/13920 Fixes: https://github.com/hashicorp/nomad/issues/17132 Fixes: https://github.com/hashicorp/nomad/issues/23236 Ref: https://hashicorp.atlassian.net/browse/NET-10054 Ref: https://hashicorp.atlassian.net/browse/NET-10055	2024-07-10 09:04:10 -04:00
Piotr Kazmierczak	88e8973004	consul: additional unit test for consul config merging (#23495 )	2024-07-03 16:09:16 +02:00
Seth Hoenig	3f57c9bcf2	cli: fix bold output of devices headers (#23477 )	2024-07-01 12:36:55 -05:00
Tim Gross	cd3101d624	scale: add `-check-index` to `job scale` command (#23457 ) The RPC handler for scaling a job passes flags to enforce the job modify index is unchanged when it makes the write to Raft. But it's only checking against the existing job modify index at the time the RPC handler snapshots the state store, so it can only enforce consistency for its own validation. In clusters with automated scaling, it would be useful to expose the enforce index options to the API, so that cluster admins can enforce that scaling only happens when the job state is consistent with a state they've previously seen in other API calls. Add this option to the CLI and API and have the RPC handler check them if asked. Fixes: https://github.com/hashicorp/nomad/issues/23444	2024-06-27 16:54:06 -04:00
James Rasell	d63ad1a6c5	Generate UI assets	2024-06-20 14:13:24 +01:00
hc-github-team-nomad-core	9566174e92	Generate files for 1.8.1 release	2024-06-19 15:24:08 +01:00
nicoche	ffcb72bfe3	api: Add Notes field to service checks (#22397 ) Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2024-06-10 16:59:49 +02:00
Gerard Nguyen	c3c2240304	Update nomad operator snapshot inspect with more detail (#20062 ) Co-authored-by: Michael Schurter <michael.schurter@gmail.com> Co-authored-by: James Rasell <jrasell@hashicorp.com>	2024-06-06 06:57:10 +01:00
Piotr Kazmierczak	2a09abc477	metrics: quota utilization configuration and documentation (#22912 ) Introduces support for (optional) quota utilization metrics CE part of the hashicorp/nomad-enterprise#1488 change	2024-06-03 21:06:19 +02:00
Phil Renaud	014f5145dc	Lockfile and bindata_assetfs recompiled on latest main (#22434 )	2024-05-31 13:23:59 -04:00
Phil Renaud	86ee56b8c5	[ui] Jobs index page badge for when a job has a paused task (#22392 ) * Adds a badge on the jobs index page if any task within any allocation of a running job is currently paused * Snapshot and acceptance tests for paused states * Cleared yarn cache * Remove MirageScenario from the test dependency chain * Logging before toString * Cardinal sin of time-based test execution * Maybe weve been lucky for years and the clientStatus has always been running for this test by happenstance * Back away from the time-based and toward the settled() approach	2024-05-30 21:18:35 -04:00
Michael Schurter	690abefc4a	docs: add docs for time based task execution	2024-05-29 15:50:33 -07:00
Tim Gross	de38ff4189	consul: set partition for gateway config entries (#22228 ) When we write Connect gateway configuration entries from the server, we're not passing in the intended partition. This means we're using the server's own partition to submit the configuration entries and this may not match. Note this requires the Nomad server's token has permission to that partition. Also, move the config entry write after we check Sentinel policies. This allows us to return early if we hit a Sentinel error without making Consul RPCs first.	2024-05-29 16:31:02 -04:00
hc-github-team-nomad-core	32d820644a	Generate files for 1.8.0 release	2024-05-29 11:48:55 -04:00
hc-github-team-nomad-core	c374bd375b	Generate files for 1.8.0-rc.1 release	2024-05-23 16:55:05 -04:00
Daniel Bennett	4415fabe7d	jobspec: time based task execution (#22201 ) this is the CE side of an Enterprise-only feature. a job trying to use this in CE will fail to validate. to enable daily-scheduled execution entirely client-side, a job may now contain: task "name" { schedule { cron { start = "0 12 * * * *" # may not include "," or "/" end = "0 16" # partial cron, with only {minute} {hour} timezone = "EST" # anything in your tzdb } } ... and everything about the allocation will be placed as usual, but if outside the specified schedule, the taskrunner will block on the client, waiting on the schedule start, before proceeding with the task driver execution, etc. this includes a taskrunner hook, which watches for the end of the schedule, at which point it will kill the task. then, restarts-allowing, a new task will start and again block waiting for start, and so on. this also includes all the plumbing required to pipe API calls through from command->api->agent->server->client, so that tasks can be force-run, force-paused, or resume the schedule on demand.	2024-05-22 15:40:25 -05:00
Phil Renaud	e8b77fcfa0	[ui] Jobspec UI block: Descriptions and Links (#18292 ) * Hacky but shows links and desc * markdown * Small pre-test cleanup * Test for UI description and link rendering * JSON jobspec docs and variable example job get UI block * Jobspec documentation for UI block * Description and links moved into the Title component and made into Helios components * Marked version upgrade * Allow links without a description and max description to 1000 chars * Node 18 for setup-js * markdown sanitization * Ui to UI and docs change * Canonicalize, copy and diff for job.ui * UI block added to testJob for structs testing * diff test * Remove redundant reset * For readability, changing the receiving pointer of copied job variables * TestUI endpoint conversion tests * -require +must * Nil check on Links * JobUIConfig.Links as pointer --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-05-22 15:00:45 -04:00
Seth Hoenig	09bd11383c	client: alloc_mounts directory must be sibling of data directory (#22199 ) This PR adjusts the default location of -alloc-mounts-dir path to be a sibling of the -data-dir path rather than a child. This is because on production-hardened systems the data dir is supposed to be chmod 0700 owned by root - preventing the exec2 task driver (and others using unveil file system isolation features) from working properly. For reference the directory structure from -data-dir now looks like this after running an example job. Under the alloc_mounts directory, task specific directories are mode 0710 and owned by the task user (which may be a dynamic user UID/GID). ➜ sudo tree -p -d -u /tmp/mynomad [drwxrwxr-x shoenig ] /tmp/mynomad ├── [drwx--x--x root ] alloc_mounts │ └── [drwx--x--- 80552 ] c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask │ ├── [drwxrwxrwx nobody ] alloc │ │ ├── [drwxrwxrwx nobody ] data │ │ ├── [drwxrwxrwx nobody ] logs │ │ └── [drwxrwxrwx nobody ] tmp │ ├── [drwxrwxrwx nobody ] local │ ├── [drwxr-xr-x root ] private │ ├── [drwx--x--- 80552 ] secrets │ └── [drwxrwxrwt nobody ] tmp └── [drwx------ root ] data ├── [drwx--x--x root ] alloc │ └── [drwxr-xr-x root ] c753b71d-c6a1-3370-1f59-47ab838fd8a6 │ ├── [drwxrwxrwx nobody ] alloc │ │ ├── [drwxrwxrwx nobody ] data │ │ ├── [drwxrwxrwx nobody ] logs │ │ └── [drwxrwxrwx nobody ] tmp │ └── [drwx--x--- 80552 ] mytask │ ├── [drwxrwxrwx nobody ] alloc │ │ ├── [drwxrwxrwx nobody ] data │ │ ├── [drwxrwxrwx nobody ] logs │ │ └── [drwxrwxrwx nobody ] tmp │ ├── [drwxrwxrwx nobody ] local │ ├── [drwxrwxrwx nobody ] private │ ├── [drwx--x--- 80552 ] secrets │ └── [drwxrwxrwt nobody ] tmp ├── [drwx------ root ] client └── [drwxr-xr-x root ] server ├── [drwx------ root ] keystore ├── [drwxr-xr-x root ] raft │ └── [drwxr-xr-x root ] snapshots └── [drwxr-xr-x root ] serf 32 directories	2024-05-22 13:14:34 -05:00
Deniz Onur Duzgun	1cc99cc1b4	bug: resolve type conversion alerts (#20553 )	2024-05-15 13:22:10 -04:00
Tim Gross	c9fd93c772	connect: support `volume_mount` blocks for sidecar task overrides (#20575 ) Users can override the default sidecar task for Connect workloads. This sidecar task might need access to certificate stores on the host. Allow adding the `volume_mount` block to the sidecar task override. Also fixes a bug where `volume_mount` blocks would not appear in plan diff outputs. Fixes: https://github.com/hashicorp/nomad/issues/19786	2024-05-14 12:49:37 -04:00
hc-github-team-nomad-core	e1a176c120	Generate files for 1.8.0-beta.1 release	2024-05-07 07:06:07 +00:00
Daniel Bennett	cf87a556b3	api: new /v1/jobs/statuses endpoint for /ui/jobs page (#20130 ) introduce a new API /v1/jobs/statuses, primarily for use in the UI, which collates info about jobs, their allocations, and latest deployment. currently the UI gets all of /v1/jobs and sorts and paginates them client-side in the browser, and its "summary" column is based on historical summary data (which can be visually misleading, and sometimes scary when a job has failed at some point in the not-yet-garbage-collected past). this does pagination and filtering and such, and returns jobs sorted by ModifyIndex, so latest-changed jobs still come first. it pulls allocs and latest deployment straight out of current state for a more robust, holistic view of the job status. it is less efficient per-job, due to the extra state lookups, but should be more efficient per-page (excepting perhaps for job(s) with very-many allocs). if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`, then the response will be limited to that subset of jobs. the main goal here is to prevent "jostling" the user in the UI when jobs come into and out of existence. and if a blocking query is started with `?index=N`, then the query should only unblock if jobs "on page" change, rather than any change to any of the state tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary HTTP round trips.	2024-05-03 15:01:40 -05:00
Tim Gross	54fc146432	agent: add support for sdnotify protocol (#20528 ) Nomad agents expect to receive `SIGHUP` to reload their configuration. The signal handler for this is installed fairly late in agent startup, after the client or server components are up and running. This means that configuration management tools can potentially reload the configuration before the agent can handle it, causing the agent to crash. We don't want to allow configuration reload during client or server component startup, because it would significantly complicate initialization. Instead, we'll implement the systemd notify protocol. This causes systemd to block sending configuration reload signals until the agent is actually ready. Users can still bypass this by sending signals directly. Note that there are several Go libraries that implement the sdnotify protocol, but most are part of much larger projects which would create a lot of dependabot burden. The bits of the protocol we need are extremely simple to implement in just a couple of functions. For non-Linux or non-systemd Linux systems, this feature is a no-op. In future work we could potentially implement service notification for Windows as well. Fixes: https://github.com/hashicorp/nomad/issues/3885	2024-05-03 13:42:07 -04:00
Tim Gross	f9dd120d29	cli: add `-jwks-ca-file` to Vault/Consul setup commands (#20518 ) When setting up auth methods for Consul and Vault in production environments, we can typically assume that the CA certificate for the JWKS endpoint will be in the host certificate store (as part of the usual configuration management cluster admins need to do). But for quick demos with `-dev` agents, this won't be the case. Add a `-jwks-ca-file` parameter to the setup commands so that we can use this tool to quickly set up WI with `-dev` agents running TLS.	2024-05-03 08:26:29 -04:00
Michael Schurter	3aefc010d7	test: remove spurious print statements (#20503 )	2024-05-01 09:47:56 -07:00
James Rasell	05a7bb53d3	cli: fix handling of scaling jobs which don't generate evals. (#20479 ) In some cases, Nomad job scaling will not generate evaluations such as parameterized jobs. This change fixes the CLI behaviour in this case, and copies the job run command for consistency.	2024-04-30 10:32:31 +01:00
Daniel Bennett	3ac3bc1cfe	acl: token global mode can not be changed (#20464 ) true up CLI and docs with API reality	2024-04-22 11:58:47 -05:00
Juana De La Cuesta	64978662b6	Post 1.7.7 release (#20421 ) Generate files for 1.7.7 release, prepare for next release and merge release 1.7.7 files	2024-04-17 10:44:32 +02:00
Seth Hoenig	ae6c4c8e3f	deps: purge use of old x/exp packages (#20373 )	2024-04-12 08:29:00 -05:00
astudentofblake	7b7ed12326	func: Allow custom paths to be added to the getter landlock (#20349 ) * func: Allow custom paths to be added to the getter landlock Fixes: 20315 * fix: slices imports fix: more meaningful examples fix: improve documentation fix: quote error output	2024-04-11 15:17:33 -05:00
Tim Gross	8298d39e78	Connect transparent proxy support Add support for Consul Connect transparent proxies Fixes: https://github.com/hashicorp/nomad/issues/10628	2024-04-10 11:00:18 -04:00
Tim Gross	a0cbc1a26a	cli: remove extraneous trailing newline from `nomad fmt` (#20318 ) When `nomad fmt` writes to stdout instead of overwriting a file, the command was using the `UI` output, which appends an extra newline. This results in extra trailing newlines when using `nomad fmt` as part of a pipeline or editor plugin. Update the command to write directly to stdout when in the stdout mode. Fixes: https://github.com/hashicorp/nomad/issues/20307	2024-04-08 13:29:22 -04:00
Tim Gross	e8d203e7ce	transparent proxy: add jobspec support (#20144 ) Add a transparent proxy block to the existing Connect sidecar service proxy block. This changeset is plumbing required to support transparent proxy configuration on the client. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-04-04 17:01:07 -04:00
Tim Gross	a50e6267d0	cli: remove redundant `allocs` profile from `operator debug` (#20219 ) The pprof `allocs` profile is identical to the `heap` profile, just with a different default view. Collecting only one of the two is sufficient to view all of `alloc_objects`, `alloc_space`, `inuse_objects`, and `inuse_space`, and collecting only one means that both views will be of the same profile. Also improve the docstrings on the goroutine profiles explaining what's in each so that it's clear why we might want all of debug=0, debug=1, and debug=2.	2024-03-26 08:19:18 -04:00
James Rasell	facc3e8013	agent: allow configuration of in-memory telemetry sink. (#20166 ) This change adds configuration options for setting the in-memory telemetry sink collection and retention durations. This sink backs the metrics JSON API and previously had hard-coded default values. The new options are particularly useful when running development or debug environments, where metrics collection is desired at a fast and granular rate.	2024-03-25 15:00:18 +00:00
Tim Gross	02d98b9357	operator debug: fix pprof interval handling (#20206 ) The `nomad operator debug` command saves a CPU profile for each interval, and names these files based on the interval. The same function takes a goroutine profile, heap profile, etc. but is missing the logic to interpolate the file name with the interval. This results in the operator debug command making potentially many expensive profile requests, and then overwriting the data. Update the command to save every profile it scrapes, and number them similarly to the existing CPU profile. Additionally, the command flags for `-pprof-interval` and `-pprof-duration` were validated backwards, which meant that we always coerced the `-pprof-interval` to be the same as the `-pprof-duration`, which always resulted in a single profile being taken at the start of the bundle. Correct the check as well as change the defaults to be more sensible. Fixes: https://github.com/hashicorp/nomad/issues/20151	2024-03-25 09:01:06 -04:00
Tim Gross	bdf3ff301e	jobspec: add support for destination partition to `upstream` block (#20167 ) Adds support for specifying a destination Consul admin partition in the `upstream` block. Fixes: https://github.com/hashicorp/nomad/issues/19785	2024-03-22 16:15:22 -04:00
Tim Gross	10dd738a03	jobspec: update `gateway.ingress.service` Consul API fields (#20176 ) Add support for further configuring `gateway.ingress.service` blocks to bring this block up-to-date with currently available Consul API fields (except for namespace and admin partition, which will need to be handled under a different PR). These fields are sent to Consul as part of the job endpoint submission hook for Connect gateways. Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>	2024-03-22 13:50:48 -04:00
Michael Schurter	23e4b7c9d2	Upgrade go-msgpack to v2 (#20173 ) Replaces #18812 Upgraded with: ``` find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';' find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';' go get go get -v -u github.com/hashicorp/raft-boltdb/v2 go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80 ``` see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0 for details	2024-03-21 11:44:23 -07:00
Tim Gross	7b9bce2d08	config: fix `client.template` config merging with defaults (#20165 ) When loading the client configuration, the user-specified `client.template` block was not properly merged with the default values. As a result, if the user set any `client.template` field, all the other fields defaulted to their zero values instead of the documented defaults. This changeset: * Adds the missing `Merge` method for the client template config and ensures it's called. * Makes a single source of truth for the default template configuration, instead of two different constructors. * Extends the tests to cover the merge of a partial block better. Fixes: https://github.com/hashicorp/nomad/issues/20164	2024-03-20 10:18:56 -04:00
Tim Gross	c4253470a0	autopilot: add `operator autopilot health` command (#20156 ) Add a command line operation that reports Enterprise autopilot data from the `/operator/autopilot/health` API. I've pulled this feature out of @lindleywhite's PR in the Enterprise repo. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394 Co-authored-by: Lindley <lindley@hashicorp.com>	2024-03-18 14:46:18 -04:00

3789 Commits