nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 18:35:44 +03:00

Author	SHA1	Message	Date
Tim Gross	bbf49a9050	dynamic host volumes: node selection via constraints (#24518 ) When making a request to create a dynamic host volumes, users can pass a node pool and constraints instead of a specific node ID. This changeset implements a node scheduling logic by instantiating a filter by node pool and constraint checker borrowed from the scheduler package. Because host volumes with the same name can't land on the same host, we don't need to support `distinct_hosts`/`distinct_property`; this would be challenging anyways without building out a much larger node iteration mechanism to keep track of usage across multiple hosts. Ref: https://github.com/hashicorp/nomad/pull/24479	2024-12-19 09:25:54 -05:00
Daniel Bennett	c2dd97dee7	HostVolumePlugin interface and two implementations (#24497 ) * mkdir: HostVolumePluginMkdir: just creates a directory * example-host-volume: HostVolumePluginExternal: plugin script that does mkfs and mount loopback Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-12-19 09:25:54 -05:00
Tim Gross	10a5f4861f	dynamic host volumes: create/register RPC validation Add several validation steps in the create/register RPCs for dynamic host volumes. We first check that submitted volumes are self-consistent (ex. max capacity is more than min capacity), then that any updates we've made are valid. And we validate against state: preventing claimed volumes from being updated and preventing placement requests for nodes that don't exist. Ref: https://github.com/hashicorp/nomad/issues/15489	2024-12-19 09:25:54 -05:00
Tim Gross	0f92134a7f	dynamic host volumes: fix unblocking bug in state store The `HostVolumeByID` state store method didn't add a watch channel to the watchset, which meant that it would never unblock. The tests missed this because they were racy, so move the updates for unblocking tests into a `time.After` call to ensure the queries are blocked before the update happens.	2024-12-19 09:25:54 -05:00
Tim Gross	c3735127ae	allow FlattenMultierror to accept standard error	2024-12-19 09:25:54 -05:00
Tim Gross	7c85176059	dynamic host volumes: basic CLI CRUD operations (#24382 ) This changeset implements a first pass at the CLI for Dynamic Host Volumes. Ref: https://hashicorp.atlassian.net/browse/NET-11549	2024-12-19 09:25:54 -05:00
Tim Gross	a65358da7b	dynamic host volumes: HTTP API endpoint (#24380 ) This changeset implements the HTTP API endpoints for Dynamic Host Volumes. The `GET /v1/volumes` endpoint is shared between CSI and DHV with a query parameter for the type. In the interest of getting some working handlers available for use in development (and minimizing the size of the diff to review), this changeset doesn't do any sort of refactoring of how the existing List Volumes CSI endpoint works. That will come in a later PR, as will the corresponding `api` package updates we need to support the CLI. Ref: https://hashicorp.atlassian.net/browse/NET-11549	2024-12-19 09:25:54 -05:00
Tim Gross	6a3803c31e	dynamic host volumes: RPC handlers (#24373 ) This changeset implements the RPC handlers for Dynamic Host Volumes, including the plumbing needed to forward requests to clients. The client-side implementation is stubbed and will be done under a separate PR. Ref: https://hashicorp.atlassian.net/browse/NET-11549	2024-12-19 09:25:54 -05:00
Tim Gross	75c142ff40	dynamic host volumes: initial state store implementation (#24353 ) This changeset implements the state store schema for Dynamic Host Volumes, and methods used to query the state for RPCs. Ref: https://hashicorp.atlassian.net/browse/NET-11549	2024-12-19 09:25:53 -05:00
Tim Gross	d8c901570b	dynamic host volumes: ACL policies (#24356 ) This changeset implements the ACLs required for dynamic host volumes RPCs: * `host-volume-write` is a coarse-grained policy that implies all operations. * `host-volume-register` is the highest fine-grained privilege because it potentially bypasses quotas. * `host-volume-create` is implicitly granted by `host-volume-register` * `host-volume-delete` is implicitly granted only by `host-volume-write` * `host-volume-read` is implicitly granted by `policy = "read"`, These are namespaced operations, so the testing here is predominantly around parsing and granting of implicit capabilities rather than the well-tested `AllowNamespaceOperation` method. This changeset does not include any changes to the `host_volumes` policy which we'll need for claiming volumes on job submit. That'll be covered in a later PR. Ref: https://hashicorp.atlassian.net/browse/NET-11549	2024-12-19 09:25:53 -05:00
James Rasell	8bb7c1315d	e2e: fix failing tests due to region name change. (#24713 )	2024-12-19 14:21:17 +00:00
Daniel Peinhopf	04e930b756	Show HCL variable input on job submission. (#24622 ) * Show HCL variable input on job submission. * Add changelog entry (authored by @sevensolutions)	2024-12-18 13:56:47 -05:00
James Rasell	fe821b4c1c	github: Fix lock threads syntax and permissions (#24708 ) The parameters used for the reusable action were incorrect since the 5.0.1 update. The permissions were also incorrect as the workflow needs to write to issues and PRs.	2024-12-18 15:29:58 +00:00
James Rasell	e3ac00f30e	github: notify Slack when CI fails on merge to main/release (#24690 ) This change creates a reusable workflow for notifying Slack on CI failures. The message will include useful links and information about the failure, so product engineers can investigate and fix any problems. The new workflow is used by selected workflows which trigger on merges to main or release/* branches. The notification is only sent on failure and when the event was a push (PR merge) meaning the number of notifications should be minimal. The aim is to help identify and draw attention to failure across our release branches, in particular when automated processes happen.	2024-12-18 08:07:44 +00:00
Tim Gross	30e57c39b0	discovery: correctly handle IPv6 addresses from go-discover (#24649 ) Nomad sets a default port when resolving server addresses that don't have one. When we get a "bare" IPv6 address without a port, we end up with an unexpected error "too many colons in address" when we try to split the address and host, because the standard library function expects IPv6 addresses to be wrapped in brackets as recommended by RFC5952. User-configured addresses avoid this problem by accepting IP address and port as separate configuration values, but go-discover emits "bare" IPv6 addresses without a port in IPv6 environments. Fix this by adding brackets to IPv6 addresses when we get the "too many colons" error from the stdlib. This will still give erroneous results if the address includes the port but is missing brackets, but there's no way to unambiguously parse that address. Ref: https://www.rfc-editor.org/rfc/rfc5952 Fixes: https://github.com/hashicorp/nomad/issues/24608	2024-12-17 15:49:40 -05:00
Deniz Onur Duzgun	a4ac2025f4	sec: suppress osv alert in CRT (#24701 ) * sec: suppress GO-2022-0635 osv alert in CRT * hclfmt	2024-12-17 14:56:52 -05:00
Phil Renaud	7746f290e6	Reset childjobs watcher when kicking it off from a new parent job route (#24668 )	2024-12-17 10:49:37 -05:00
Phil Renaud	932c3ebfb0	[ui] Adds meta k/v tables to Task Group and Task pages (#24594 ) * Experimenting with a generic meta job-part component * Taskstate.task gets me every time * continue-on-error false test * continue-on-error back in, but explicit success check after exam * Testfixes for new meta structure on tasks and groups * Clean up test and dev code	2024-12-17 10:46:03 -05:00
Tim Gross	abeae5c47b	E2E: use a variable for region (#24693 ) In #24644 we set the region to "e2e" but forgot to setup the TLS certificate names appropriately. Swap the region out for a variable instead.	2024-12-17 10:28:22 -05:00
Phil Renaud	e48bfeccd7	Cross-spawn pinned re: GHSA-3xgq-45jj-v275 (#24616 )	2024-12-17 01:01:49 -05:00
Phil Renaud	71e3716435	[ui] Fixes an issue where system jobs' status were set to Scaled Down when their allocs get garbage collected (#24620 ) * Fixes an issue where system jobs' status were set to Scaled Down when their allocs get garbage collected * Added to aggregateAllocStatus acceptance test and changelog	2024-12-17 01:01:19 -05:00
Deniz Onur Duzgun	22b7470ccf	sec: fix alloc workload identity namespace permission (#24683 ) Sanitize the Allocations SignedIdentities to prevent privilege escalation within a namespace through unauthorized impersonation of [workload associated with ACL policies](https://developer.hashicorp.com/nomad/docs/concepts/workload-identity#workload-associated-acl-policies) in any workload within the namespace. Ref: CVE-2024-12678. Ref: https://github.com/hashicorp/nomad-enterprise/pull/2098	2024-12-16 16:35:10 -05:00
Tim Gross	75b0202f7f	api: don't copy previously parsed URL when setting new address (#24644 ) In #16872 we added support for unix domain sockets, but this required mutating the `Config` when parsing the address so as to remove the port number. In #23785 we fixed a bug where if the configuration was used across multiple clients that mutation would happen multiple times and the address would be incorrectly parsed. When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have line-of-sight to the client, we attempt to make a HTTP API call directly to the client node. So we create a new API client from the same configuration and then set the address. But in this case we copy the private `url` field and that causes the URL parsing to be skipped for the new client. This results in the region always being set to the string literal `"global"` (because of mTLS handling code introduced all the way back in `4d3b75d867`), unless the user has set the region specifically. This fails with an error "no path to region" when the cluster isn't non-global and requests are sent to a non-leader. Arguably the "right" way of fixing this would be for `ClientConfig` not to change the API client's region to `"global"` in the first place, but as this is a public API and extremely longstanding behavior, it could potentially be a breaking change for some downstream consumers. Instead, we'll avoid copying the private `url` field so that the new address is re-parsed. Fixes: https://github.com/hashicorp/nomad/issues/24635 Fixes: https://github.com/hashicorp/nomad/issues/24609 Ref: https://github.com/hashicorp/nomad/pull/16872 Ref: https://github.com/hashicorp/nomad/pull/23785 Ref: `4d3b75d867`	2024-12-16 11:05:29 -05:00
Tim Gross	24fa7439df	cni: use tmpfs location for ipam plugin (#24650 ) When a Nomad host reboots, the network namespace files in the tmpfs in `/var/run` are wiped out. So when we restore allocations after a host reboot, we need to be able to restore both the network namespace and the network configuration. But because the netns is newly created and we need to run the CNI plugins again, this create potential conflicts with the IPAM plugin which has written state to persistent disk at `/var/lib/cni`. These IPs aren't the ones advertised to Consul, so there's no particular reason to keep them around after a host reboot because all virtual interfaces need to be recreated too. Reconfigure the CNI bridge configuration to use `/var/run/cni` as its state directory. We already expect this location to be created by CNI because the netns files are hard-coded to be created there too in `libcni`. Note this does not fix the problem described for Docker in #24292 because that appears to be related to the netns itself being restored unexpectedly from Docker's state. Ref: https://github.com/hashicorp/nomad/issues/24292#issuecomment-2537078584 Ref: https://www.cni.dev/plugins/current/ipam/host-local/#files	2024-12-16 09:36:35 -05:00
Judith Malnick	21c684affa	Let education approve PRs to the documentation side nav (#24636 )	2024-12-16 09:22:10 +01:00
Juana De La Cuesta	526c6375ad	Make paths in e2e/terraform/ directory relative to the module (#24664 ) * func: make paths relative * func: make paths relative to the module inside the e2e terraform folder * fix: add license files to gitignore * func: move /etc and update all paths * Uncomment forgotten code * fix: update the path to the tls certificates to be local to the instance	2024-12-13 17:33:59 +01:00
Juana De La Cuesta	a9a0f71213	Remove sockaddr and use native tools (#24665 ) * func: remove sockaddr and use native tools * Update setup.sh	2024-12-13 17:24:53 +01:00
dependabot[bot]	614cb5c17f	chore(deps): bump slackapi/slack-github-action from 1.27.0 to 2.0.0 (#24472 )	2024-12-13 14:00:37 +00:00
Phil Renaud	f452948556	Character restrictions in action names were unduly oppressive (#24642 ) * Character restrictions in action names were unduly oppressive * OK but what about SOME oppression * Test updates for our new action name rules	2024-12-12 16:58:26 -05:00
dependabot[bot]	dcb0259fe3	chore(deps): bump nanoid from 3.3.7 to 3.3.8 in /ui (#24640 ) Bumps [nanoid](https://github.com/ai/nanoid) from 3.3.7 to 3.3.8. - [Release notes](https://github.com/ai/nanoid/releases) - [Changelog](https://github.com/ai/nanoid/blob/main/CHANGELOG.md) - [Commits](https://github.com/ai/nanoid/compare/3.3.7...3.3.8) --- updated-dependencies: - dependency-name: nanoid dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-12 10:35:33 -05:00
James Rasell	7d48aa2667	client: emit optional telemetry from prerun and prestart hooks. (#24556 ) The Nomad client can now optionally emit telemetry data from the prerun and prestart hooks. This allows operators to monitor and alert on failures and time taken to complete. The new datapoints are: - nomad.client.alloc_hook.prerun.success (counter) - nomad.client.alloc_hook.prerun.failed (counter) - nomad.client.alloc_hook.prerun.elapsed (sample) - nomad.client.task_hook.prestart.success (counter) - nomad.client.task_hook.prestart.failed (counter) - nomad.client.task_hook.prestart.elapsed (sample) The hook execution time is useful to Nomad engineering and will help optimize code where possible and understand job specification impacts on hook performance. Currently only the PreRun and PreStart hooks have telemetry enabled, so we limit the number of new metrics being produced.	2024-12-12 14:43:14 +00:00
James Rasell	86bc7ed224	cli: Ensure JSON flag is respected in `autopilot health` command. (#24655 )	2024-12-12 13:43:32 +00:00
dependabot[bot]	63e2c6aaec	chore(deps): bump golang.org/x/crypto from 0.27.0 to 0.31.0 (#24652 ) Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.27.0 to 0.31.0. - [Commits](https://github.com/golang/crypto/compare/v0.27.0...v0.31.0) --- updated-dependencies: - dependency-name: golang.org/x/crypto dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-12 11:05:03 +01:00
Phil Renaud	3bc2c07821	Remove the Ember Test Audit workflow (#24637 )	2024-12-10 11:25:53 -06:00
dependabot[bot]	6a41dc7b2f	chore(deps): bump nanasess/setup-chromedriver from 2.2.2 to 2.3.0 (#24623 ) Bumps [nanasess/setup-chromedriver](https://github.com/nanasess/setup-chromedriver) from 2.2.2 to 2.3.0. - [Release notes](https://github.com/nanasess/setup-chromedriver/releases) - [Commits](`42cc299832...e93e57b843`) --- updated-dependencies: - dependency-name: nanasess/setup-chromedriver dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-09 14:26:24 -06:00
dependabot[bot]	f46097ee54	chore(deps): bump github.com/rs/cors from 1.8.3 to 1.11.1 (#24627 ) Bumps [github.com/rs/cors](https://github.com/rs/cors) from 1.8.3 to 1.11.1. - [Commits](https://github.com/rs/cors/compare/v1.8.3...v1.11.1) --- updated-dependencies: - dependency-name: github.com/rs/cors dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-09 09:56:54 -05:00
dependabot[bot]	fe217518a4	chore(deps): bump github.com/shoenig/test from 1.11.0 to 1.12.0 (#24625 ) Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 1.11.0 to 1.12.0. - [Release notes](https://github.com/shoenig/test/releases) - [Commits](https://github.com/shoenig/test/compare/v1.11.0...v1.12.0) --- updated-dependencies: - dependency-name: github.com/shoenig/test dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-09 09:56:47 -05:00
dependabot[bot]	5a0205083c	chore(deps): bump github.com/miekg/dns from 1.1.56 to 1.1.62 (#24624 ) Bumps [github.com/miekg/dns](https://github.com/miekg/dns) from 1.1.56 to 1.1.62. - [Changelog](https://github.com/miekg/dns/blob/master/Makefile.release) - [Commits](https://github.com/miekg/dns/compare/v1.1.56...v1.1.62) --- updated-dependencies: - dependency-name: github.com/miekg/dns dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-09 09:55:30 -05:00
dependabot[bot]	09f6043937	chore(deps): bump github.com/hashicorp/go-kms-wrapping/wrappers/awskms/v2 (#24626 ) Bumps [github.com/hashicorp/go-kms-wrapping/wrappers/awskms/v2](https://github.com/hashicorp/go-kms-wrapping) from 2.0.9 to 2.0.10. - [Commits](https://github.com/hashicorp/go-kms-wrapping/compare/v2.0.9...v2.0.10) --- updated-dependencies: - dependency-name: github.com/hashicorp/go-kms-wrapping/wrappers/awskms/v2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-09 09:30:21 -05:00
Judith Malnick	76fcc6f2e3	codeowners: let nomad education and web presence merge PRs to website files (#24621 ) * add nomad education and web presence ability to merge PRs to relavent website files * add github-nomad-core * change edu approvers to nomad-docs	2024-12-09 08:44:30 -05:00
Jonathan Weinberg	d1cec4285f	Spelling error (#24619 )	2024-12-06 14:31:39 -05:00
dependabot[bot]	9a1d951f45	chore(deps-dev): bump prettier from 3.2.4 to 3.4.1 in /website (#24573 ) Bumps [prettier](https://github.com/prettier/prettier) from 3.2.4 to 3.4.1. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](https://github.com/prettier/prettier/compare/3.2.4...3.4.1) --- updated-dependencies: - dependency-name: prettier dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-06 09:40:05 -05:00
Aimee Ukasick	af5e2a742e	Docs Feature: Add clone and edit feature (#24593 ) * Docs: Add clone and edit feature CE-741 * Change clone and edit heading level * A few work tweaks	2024-12-05 09:21:27 -06:00
Michael Smithhisler	a43f46e247	ci: only cache from github hosted runners to avoid file perm issues (#24607 )	2024-12-04 12:27:26 -05:00
Tim Gross	4f3de69537	test: fix panic from concurrent writes in periodic dispatch test (#24602 ) Test setup for the `TestPeriodicDispatch_Add_TriggersUpdate` test can panic if the goroutine for the runner is running concurrently with adding the job the second time. Update the test as follows: * Make a copy when mutating the job before adding it. * Add a lock around checking if the dispatcher has a waiting eval. * Update to use `shoenig/test` in lieu of `testify`.	2024-12-04 09:57:38 -05:00
Tim Gross	da786f64c7	helper: sanitize method on ACL token object (#24600 ) There are several places where we want to redact the secret ID of an ACL token, some of which are in the Enterprise code base for Sentinel. Add a new method `Sanitize` that mirrors the one we have on `Node`. Ref: https://github.com/hashicorp/nomad-enterprise/pull/2087	2024-12-03 14:02:30 -05:00
CJ	4563165196	Update sentinel.mdx (#24598 )	2024-12-03 11:24:06 -05:00
Phil Renaud	4b91c17dfa	[ui, ci] retain artifacts from test runs including test timing (#24555 ) * retain artifacts from test runs including test timing * Pinning commit hashes for action helpers * trigger for ui-test run * Trying to isolate down to a simple upload * Once more with mkdir * What if we just wrote our own test reporter tho * Let the partitioned runs handle placement * Filter out common token logs, add a summary at the end, and note failures in logtime * Custom reporter cannot also have an output file, he finds out two days late * Aggregate summary, duration, and removing failure case * Conditional test report generation * Timeouts are errors * Trying with un-partitioned input json file * Remove the commented-out lines for main-only runs * combine-ui-test-results as its own script	2024-12-03 09:56:06 -05:00
Anthony	97d14c91dc	Merge pull request #24588 from hashicorp/security-model-doc-fix-title Fix doc title in security.mdx	2024-12-02 13:05:39 -05:00
CJ	b603b97d26	Update security.mdx	2024-12-02 11:43:24 -06:00

1 2 3 4 5 ...

26467 Commits