* detect ipv6 on "bridge" network and set
service.connect.sidecar_proxy.config.bind_address
for envoy to "::" instead of "0.0.0.0"
* allow users to set bind_address in jobspec
e.g. "" would defer to consul proxy-defaults (see the sketch below)
* caveat: tproxy still does not work, because
the CNI plugin does not configure ip6tables
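a rough sketch of where the user override would live in the jobspec
(block path per the standard connect sidecar proxy config block;
the service name and value here are illustrative):

  service {
    name = "api"   # illustrative
    port = "http"
    connect {
      sidecar_service {
        proxy {
          config {
            # "::" binds envoy to the IPv6 any-address;
            # "" defers to consul proxy-defaults as noted above
            bind_address = "::"
          }
        }
      }
    }
  }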
When the local Consul agent receives a deregister request, it performs a
pre-flight check using the locally cached ACL token. The agent then sends the
request upstream to the Consul servers as part of anti-entropy, using its own
token. This requires that the token we use for deregistration is valid even
though that's not the token used to write to the Consul server.
There are several cases where the service identity token might no longer exist
at the time of deregistration:
* A race condition between the sync and destroying the allocation.
* Misconfiguration of the Consul auth method with a TTL.
* Out-of-band destruction of the token.
Additionally, Nomad's sync with Consul returns early if there are any errors,
which means that a single broken token can prevent any other service on the
Nomad agent from being registered or deregistered.
Update Nomad's sync with Consul to use the Nomad agent's own Consul token for
deregistration, regardless of which token the service was registered
with. Accumulate errors from the sync so that they no longer block
deregistration of other services.
Fixes: https://github.com/hashicorp/nomad/issues/20159
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories so that they are owned by the task.user. This is useful for
drivers like raw_exec and exec2, which are subject to the host filesystem's
user permissions. Previously, these drivers might not be able to use or
manage the downloaded artifacts, since on a typical Nomad client
configuration the files would be owned by root (see the sketch after this list).
* api: no need for pointer of chown field
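A minimal jobspec sketch of the new field; the task name, user, command, and
artifact source are illustrative:

  task "app" {
    driver = "raw_exec"
    user   = "app-user"   # illustrative

    config {
      command = "local/app"   # illustrative
    }

    artifact {
      source = "https://example.com/app.tar.gz"
      # new boolean field: chown downloaded files/dirs to task.user
      chown  = true
    }
  }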
In #24007 we merged new HCL files but they were missing copyright headers
because the scan didn't run on this PR for some reason. I've already backported
this to the Enterprise branches.
When jobs are submitted with a scaling policy, the scaling policy's target only
includes the job's namespace if the `namespace` field is set in the jobspec; the
namespace from the request is not used. Normally jobs are canonicalized in the RPC handler before
being written to Raft. But the scaling policy targets are instead written during
the conversion from `api.Job` to `structs.Job`. We populate the `structs.Job`
namespace from the request here as well, but only after the conversion has
occurred. Swap the order of these operations so that the conversion is always
happening with a correct namespace.
Long-term we should not be making mutations during conversion either. But we
can't remove it immediately because API requests may come from any agent across
upgrades. Move the scaling target creation into the `Canonicalize` method and
mark it for future removal in the API conversion code path.
Fixes: https://github.com/hashicorp/nomad/issues/24039
* TaggedVersion information in structs, rather than job_endpoint (#23841)
* TaggedVersion information in structs, rather than job_endpoint
* Test for taggedVersion description length
* Some API plumbing
* Tag and Untag job versions (#23863)
* Tag and Untag at API level on down, but am I unblocking the wrong thing?
* Code and comment cleanup
* Unset methods generally now I stare long into the namespace abyss
* Namespace passes through with QueryOptions removed from a write requesting struct
* Comment and PR review cleanup
* Version back to VersionStr
* Generally consolidate unset logic into apply for version tagging
* Addressed some PR comments
* Auth check and RPC forwarding
* uint64 instead of pointer for job version after api layer and renamed copy
* job tag command split into apply and unset
* latest-version convenience handling moved to CLI command level
* CLI tests for tagging/untagging
* UI parts removed
* Add to job table when unsetting job tag on latest version
* Vestigial no more
* Compare versions by name and version number with the nomad history command (#23889)
* First pass at passing a tagname and/or diff version to plan/versions requests
* versions API now takes compare_to flags
* Job history command output can have tag names and descriptions
* compare_to to diff-tag and diff-version, plus adding flags to history command
* 0th version now shows a diff if a specific diff target is requested
* Addressing some PR comments
* Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI
* Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon
* Version diff tests
* re-implement JobVersionByTagName
* Test mods and simplification
* Documentation for nomad job history additions
* Prevent pruning and reaping of TaggedVersion jobs (#23983)
tagged versions should not count against JobTrackedVersions
i.e. new job versions being inserted should not evict tagged versions
and GC should not delete a job if any of its versions are tagged
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* [ui] Version Tags on the job versions page (#24013)
* Timeline styles and their buttons modernized, and tags added
* styled but not yet functional version blocks
* Rough pass at edit/unedit UX
* Styles consolidated
* better UX around version tag crud, plus adapter and serializers
* Mirage and acceptance tests
* Modify percy to not show time-based things
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* Job revert command and API endpoint can take a string version tag name (#24059)
* Job revert command and API endpoint can take a string version tag name
* RevertOpts as a signature-modified alternative to Revert()
* job revert CLI test
* Version pointers in endpoint tests
* Don't copy over the tag when a job is reverted to a version with a tag
* Convert tag name to version number at CLI level
* Client method for version lookup by tag
* No longer double-declaring client
* [ui] Add tag filter to the job versions page (#24064)
* Rough pass at the UI for version diff dropdown
* Cleanup and diff fetching via adapter method
* TaggedVersion now VersionTag (#24066)
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
so more than one copy of a program can run
at a time on the same port with SO_REUSEPORT.
requires host network mode.
some task drivers (like docker) may also need
  config {
    network_mode = "host"
  }
but this is not validated prior to placement.
* build: update golangci-lint to 1.60.1
* ci: update golangci-lint to v1.60.1
Helps with go1.23 compatibility. Introduces some breaking changes / newly
enforced linter patterns, so those are fixed as well.
On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks
need more space for downloading large secrets. This is especially the case for
tasks using `template` blocks, which need extra room to write a temporary file to the
secrets directory that is then atomically renamed over the existing file.
This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.
Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.
For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.
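A sketch of the new field in the `resources` block; values are illustrative,
and anything above the 1MiB default counts against the allocation's memory as
described above:

  resources {
    cpu     = 500
    memory  = 256
    secrets = 16   # MiB of tmpfs for the secrets directory
  }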
Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
The TLS configuration object includes a deprecated `prefer_server_cipher_suites`
field. In versions of Go prior to 1.17, this property controlled whether a TLS
connection would use the cipher suites preferred by the server or by the
client. This field is ignored as of 1.17 and, according to the `crypto/tls`
docs: "Servers now select the best mutually supported cipher suite based on
logic that takes into account inferred client hardware, server hardware, and
security."
This property has long been deprecated, and leaving it in place may lead to false
assumptions about how cipher suites are negotiated when connecting to a server. So
we want to remove it in Nomad 1.9.0.
Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999
Ref: https://hashicorp.atlassian.net/browse/NET-10531
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.
Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.
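For reference, a hedged sketch of the server config thresholds involved; the
rotation threshold default is the 720h mentioned above, and the GC threshold
value here is illustrative:

  server {
    enabled = true
    root_key_rotation_threshold = "720h" # prepublish at half this window
    root_key_gc_threshold       = "1h"   # illustrative; GC waits rotation + GC threshold
  }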
This changeset also fixes bugs in periodic root key rotation, garbage collection,
and rekeying, which can't be safely fixed without implementing
prepublishing:
* Periodic root key rotation would never happen because the default
`root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
time table. We now compare the `CreateTime` against the wall clock time instead
of the time table. (We expect to remove the time table in future work, ref
https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
identities. We now wait until `root_key_rotation_threshold` +
`root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
the rekey was complete.
Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by an AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.
Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: https://github.com/hashicorp/nomad/issues/14852
Fixes a bug where variable values in job submissions that contained newlines
weren't encoded correctly, and thus jobs that contained them couldn't be
resumed once stopped via the UI.
Internal ref: https://hashicorp.atlassian.net/browse/NET-9966
The RPC handler for scaling a job passes flags to enforce the job modify index
is unchanged when it makes the write to Raft. But it's only checking against the
existing job modify index at the time the RPC handler snapshots the state store,
so it can only enforce consistency for its own validation.
In clusters with automated scaling, it would be useful to expose the enforce
index options to the API, so that cluster admins can enforce that scaling only
happens when the job state is consistent with a state they've previously seen in
other API calls. Add this option to the CLI and API and have the RPC handler
check them if asked.
Fixes: https://github.com/hashicorp/nomad/issues/23444
* Adds a badge on the jobs index page if any task within any allocation of a running job is currently paused
* Snapshot and acceptance tests for paused states
* Cleared yarn cache
* Remove MirageScenario from the test dependency chain
* Logging before toString
* Cardinal sin of time-based test execution
* Maybe we've been lucky for years and the clientStatus has always been running for this test by happenstance
* Back away from the time-based and toward the settled() approach
When we write Connect gateway configuration entries from the server, we're not
passing in the intended partition. This means we're using the server's own
partition to submit the configuration entries, and this may not match. Note this
requires that the Nomad server's token has permission for that partition.
Also, move the config entry write after we check Sentinel policies. This allows
us to return early if we hit a Sentinel error without making Consul RPCs first.
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.
to enable daily-scheduled execution entirely client-side,
a job may now contain:
task "name" {
schedule {
cron {
start = "0 12 * * * *" # may not include "," or "/"
end = "0 16" # partial cron, with only {minute} {hour}
timezone = "EST" # anything in your tzdb
}
}
...
and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.
this includes a taskrunner hook, which watches for the end of
the schedule, at which point it will kill the task.
then, restart policy permitting, a new task will start and again block
waiting for start, and so on.
this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
* Hacky but shows links and desc
* markdown
* Small pre-test cleanup
* Test for UI description and link rendering
* JSON jobspec docs and variable example job get UI block
* Jobspec documentation for UI block (a jobspec sketch follows this list)
* Description and links moved into the Title component and made into Helios components
* Marked version upgrade
* Allow links without a description and max description to 1000 chars
* Node 18 for setup-js
* markdown sanitization
* Ui to UI and docs change
* Canonicalize, copy and diff for job.ui
* UI block added to testJob for structs testing
* diff test
* Remove redundant reset
* For readability, changing the receiving pointer of copied job variables
* TestUI endpoint conversion tests
* -require +must
* Nil check on Links
* JobUIConfig.Links as pointer
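A minimal sketch of the job-level `ui` block added in this series; the
description supports markdown (capped at 1000 chars per the commit above), and
the label and URL are illustrative:

  job "example" {
    ui {
      description = "Batch **analytics** job; page the data team if it fails"
      link {
        label = "Runbook"
        url   = "https://example.com/runbooks/analytics"
      }
    }
    # group/task blocks as usual
  }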
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Users can override the default sidecar task for Connect workloads. This sidecar
task might need access to certificate stores on the host. Allow adding the
`volume_mount` block to the sidecar task override.
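A sketch of the override with the newly-allowed block; the volume name, host
volume, and mount path are illustrative:

  group "api" {
    volume "ca-certs" {
      type   = "host"
      source = "ca-certificates"   # illustrative host volume name
    }

    service {
      name = "api"
      connect {
        sidecar_service {}
        sidecar_task {
          volume_mount {
            volume      = "ca-certs"
            destination = "/etc/ssl/certs"
            read_only   = true
          }
        }
      }
    }
  }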
Also fixes a bug where `volume_mount` blocks would not appear in plan diff
outputs.
Fixes: https://github.com/hashicorp/nomad/issues/19786
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.
currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).
this does pagination and filtering and such, and returns jobs sorted by ModifyIndex,
so latest-changed jobs still come first. it pulls allocs and latest deployment
straight out of current state for a more robust, holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (except perhaps for jobs with very many allocs).
if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.
and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than any change to any of the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.
We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.
Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in
just a couple of functions.
For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.
Fixes: https://github.com/hashicorp/nomad/issues/3885