The RPC TLS enforcement test creates network connections to a server, and these
occasionally fail in testing with `write: broken pipe` errors. This has
been an ongoing issue where it appears to get fixed, then recurs, and no one
seems to be able to reproduce it outside of CI. The test assertion itself is
reliable, which is why it's been hard to justify spending effort to hunt this down.
The failing test cases are ones that are never supposed to work because they fail
our TLS cert role validation. The error message comes from the TLS handshake
error. The RPC connection handler closes the connection immediately on getting
the error from the TLS handshake. The stdlib's TLS library flushes the
connection's buffer before returning the error. So the theory is that in the
failing case we don't get the error message before the connection is closed, but
do get the error return that allows the client to move on to a write, which
tries to write on the closed pipe.
I've been unable to reproduce this exactly, as the race is effectively between
the OS and the runtime. The equivalent test of the Raft TLS enforcement includes
handling of an EOF instead of the certificate error, so this appears to be
expected (or at least known) behavior. Because the code under test is operating
as expected, this changeset updates the assertion to accept the error.
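As a rough illustration (not the test's actual code), the relaxed assertion accepts
any of the errors the race can surface, since all of them mean the handshake was
rejected; `assertTLSRejected` is a hypothetical helper:

```go
package rpc_test

import (
	"errors"
	"io"
	"strings"
	"testing"
)

// assertTLSRejected passes if the connection failed for any of the reasons
// the handshake race can produce.
func assertTLSRejected(t *testing.T, err error) {
	t.Helper()
	if err == nil {
		t.Fatal("expected the connection to be rejected")
	}
	switch {
	case strings.Contains(err.Error(), "certificate"): // role validation error was flushed in time
	case errors.Is(err, io.EOF): // server closed before the alert reached us
	case strings.Contains(err.Error(), "broken pipe"): // we wrote after the server closed
	default:
		t.Fatalf("unexpected error: %v", err)
	}
}
```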
When the scheduler assigns a device instance, it iterates over the feasible
devices and then picks the first instance with availability. If the jobspec uses
a constraint on device ID, this can lead to buggy/surprising behavior where the
node's device matches the constraint but then the individual device instance
does not.
Add a second filter based on the `${device.ids}` constraint after selecting a
node's device to ensure the device instance ID falls within the constraint as
well.
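A simplified sketch of the extra filtering step, with placeholder types rather than
the scheduler's real structures:

```go
package scheduler

// deviceInstance is a stand-in for the scheduler's per-instance data.
type deviceInstance struct {
	ID      string
	Healthy bool
}

// pickInstance selects the first healthy instance whose ID also satisfies the
// ${device.ids} constraint, instead of blindly taking the first available
// instance on a matching device.
func pickInstance(instances []deviceInstance, idAllowed func(id string) bool) (string, bool) {
	for _, inst := range instances {
		if !inst.Healthy {
			continue
		}
		if idAllowed != nil && !idAllowed(inst.ID) {
			continue // node's device matched, but this instance's ID does not
		}
		return inst.ID, true
	}
	return "", false
}
```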
Fixes: #18112
When ephemeral disks are migrated from an allocation on the same node,
allocation logs for the previous allocation are lost.
There are two workflows for the best-effort attempt to migrate the allocation
data between the old and new allocations. For previous allocations on other
clients (the "remote" workflow), we create a local allocdir and download the
data from the previous client into it. That data is then moved into the new
allocdir and we delete the allocdir of the previous alloc.
For "local" previous allocations we don't need to create an extra directory for
the previous allocation and instead move the files directly from one to the
other. But we still delete the old allocdir _entirely_, which includes all the
logs!
There doesn't seem to be any reason to destroy the local previous allocdir, as
the usual client garbage collection should destroy it later on when needed. By
not deleting it, the previous allocation's logs are still available for the user
to read.
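As a sketch of the intended local workflow (paths and directory layout here are
simplified assumptions, not the client's actual structure):

```go
package allocdir

import (
	"os"
	"path/filepath"
)

// migrateLocalEphemeralDisk moves the previous allocation's ephemeral disk
// data into the new allocation's directory, but leaves the previous allocdir
// itself in place so its logs remain readable until the client's normal
// garbage collection removes it.
func migrateLocalEphemeralDisk(prevAllocDir, newAllocDir, taskName string) error {
	src := filepath.Join(prevAllocDir, taskName, "local")
	dst := filepath.Join(newAllocDir, taskName, "local")

	entries, err := os.ReadDir(src)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		if err := os.Rename(
			filepath.Join(src, entry.Name()),
			filepath.Join(dst, entry.Name()),
		); err != nil {
			return err
		}
	}
	// Note: no os.RemoveAll(prevAllocDir) here -- deleting the whole previous
	// allocdir is what destroyed the logs.
	return nil
}
```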
Fixes: #18034
There are some refactorings that have to be made in the getter and state
packages where the API changed in `slices`.
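One example of the kind of change the bump forces, assuming it picks up the
`slices.SortFunc` switch from a less function to a comparison function (the type
here is a placeholder):

```go
package sorting

import "golang.org/x/exp/slices"

type allocStub struct{ CreateIndex uint64 }

func sortByCreateIndex(allocs []allocStub) {
	// Before the bump, SortFunc took a less function returning bool:
	//   slices.SortFunc(allocs, func(a, b allocStub) bool { return a.CreateIndex < b.CreateIndex })
	// After the bump, it takes a comparison function returning an int.
	slices.SortFunc(allocs, func(a, b allocStub) int {
		switch {
		case a.CreateIndex < b.CreateIndex:
			return -1
		case a.CreateIndex > b.CreateIndex:
			return 1
		default:
			return 0
		}
	})
}
```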
* Bump golang.org/x/exp
* Bump golang.org/x/exp in api
* Update job_endpoint_test
* [feedback] unexport sort function
* Bones of a component that has job variable awareness
* Got vars listed woo
* Variables as its own subnav and some pathLinkedVariable perf fixes
* Automatic Access to Variables alerter
* Helper and component to conditionally render the right link
* A bit of cleanup post-template stuff
* testfix for looping right-arrow keynav because we have a new subnav section
* A very roundabout way of ensuring that, if a job exists when saving a variable with a pathLinkedEntity of that job, it's saved right through to the job itself
* hacky but an async version of pathLinkedVariable
* model-driven and async fetcher driven with cleanup
* Only run the update-job func if jobname is detected in var path
* Test cases begun
* Management token for variables to appear in tests
* It's a management token so it gets to see the clients tab under system jobs
* Pre-review cleanup
* More tests
* Number of requests test and small fix to groups-by-way-of-resource-arrays elsewhere
* Variable intro text tests
* Variable name re-use
* Simplifying our wording a bit
* parse json vs plainId
* Addressed PR feedback, including de-waterfalling
The alloc exec and filesystem/logs commands allow passing the `-job` flag to
select a random allocation. If the namespace for the command is set to `*`, the
RPC handler doesn't handle this correctly as it's expecting to query for a
specific job. Most commands handle this ambiguity by first verifying that only a
single object of the type in question exists (ex. a single node or job).
Update these commands so that when the `-job` flag is set we first verify
there's a single job that matches. This also allows us to extend the
functionality to allow for the `-job` flag to support prefix matching.
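Roughly, the new verification step looks like the following sketch; `jobStub` and
`verifySingleJob` are placeholders, not the actual handler code:

```go
package command

import "fmt"

// jobStub is a stand-in for the job list stub returned by a prefix search
// across namespaces.
type jobStub struct {
	ID        string
	Namespace string
}

// verifySingleJob requires exactly one job to match the (possibly prefix)
// -job argument so the RPC has an unambiguous namespace and job ID to query.
func verifySingleJob(matches []jobStub, prefix string) (jobStub, error) {
	switch len(matches) {
	case 0:
		return jobStub{}, fmt.Errorf("no job matches prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return jobStub{}, fmt.Errorf("prefix %q matched %d jobs; provide a more specific prefix", prefix, len(matches))
	}
}
```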
Fixes: #12097
* Attempt at a varied end-result when sorting and searching
* Consider sort direction as well
* computed property dep update
* prioritizeSearchOrder and test
* Side-effecty but resets sort on search etc
* changelog
In #18054 we introduced a new field `render_templates` in the `restart`
block. Previously, changes to the `restart` block were always non-destructive in
the scheduler, but we now need to check the new field so that we can update the
template runner. The check assumed that the block was always non-nil, which
caused panics in our scheduler tests.
This feature is necessary when users want to explicitly re-render all templates
on task restart, e.g. to fetch new secrets from Vault even if the lease on the
existing secrets has not expired.
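A minimal sketch of the nil-safe check, using a trimmed-down stand-in for the
restart block rather than the real struct:

```go
package structs

// restartBlock is a stand-in for the jobspec restart block.
type restartBlock struct {
	RenderTemplates bool
	// other fields elided
}

// renderTemplatesChanged treats a nil block as the default (false) so that
// comparing task groups without a restart block no longer panics.
func renderTemplatesChanged(prev, next *restartBlock) bool {
	prevVal := prev != nil && prev.RenderTemplates
	nextVal := next != nil && next.RenderTemplates
	return prevVal != nextVal
}
```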
Trusted Supply Chain Component Registry (TSCCR) enforcement starts Monday, and an
internal report shows our semgrep action is pinned to a version that's not
currently permitted. Update all the action versions to whatever's the new
hotness to maximize the time-to-live on these until we have automated pinning
set up.
Also bumps our chromedriver action, which broke upstream today.
Add a JWKS endpoint to the HTTP API for exposing the root public signing keys used to sign workload identity JWTs.
Part 1 of N components as part of making workload identities consumable by third-party services such as Consul and Vault. Identity attenuation (audience) and expiration (+renewal) are necessary to securely use workload identities with third parties and are still to come, so this merge does not yet document this endpoint.
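For illustration only (the endpoint is intentionally undocumented for now, so the
path below is an assumption and may change), a consumer could fetch and decode the
key set like this:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// jwks mirrors the standard JSON Web Key Set shape; only a few fields shown.
type jwks struct {
	Keys []struct {
		KeyID     string `json:"kid"`
		KeyType   string `json:"kty"`
		Algorithm string `json:"alg"`
	} `json:"keys"`
}

func main() {
	// Assumed path on a local agent; check the eventual documentation.
	resp, err := http.Get("http://localhost:4646/.well-known/jwks.json")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var keySet jwks
	if err := json.NewDecoder(resp.Body).Decode(&keySet); err != nil {
		panic(err)
	}
	for _, k := range keySet.Keys {
		fmt.Printf("kid=%s kty=%s alg=%s\n", k.KeyID, k.KeyType, k.Algorithm)
	}
}
```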
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
When accessing a region running a version of Nomad without node pools, an
error was thrown because the request is handled by the nodes endpoint,
which fails because it interprets `pools` as a node ID.
When a request is made to an RPC service that doesn't exist (for
example, a cross-region request from a newer version of Nomad to an
older version that doesn't implement the endpoint), the application
should return an empty list as well.
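A rough sketch of that fallback, with a hypothetical error check standing in for
however the handler actually detects the missing RPC:

```go
package api

import "strings"

// NodePool is a stand-in for the node pool API object.
type NodePool struct{ Name string }

// listNodePools returns an empty list instead of an error when the forwarded
// region predates node pools and rejects the RPC as unknown.
func listNodePools(rpcList func() ([]*NodePool, error)) ([]*NodePool, error) {
	pools, err := rpcList()
	if err != nil {
		// Hypothetical detection of the "unknown service" error an older
		// region returns for an RPC it does not implement.
		if strings.Contains(err.Error(), "can't find service") {
			return []*NodePool{}, nil
		}
		return nil, err
	}
	return pools, nil
}
```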
The upgrade path to Nomad 1.6.0 requires canonicalizing namespaces in order to
set the default scheduler configuration values.
The previous implementation only canonicalized on namespace upsert operations,
which works for recent namespaces because those Raft transactions are reapplied
on upgrade.
But for older namespaces restored from a snapshot, that code path did not
canonicalize them, leaving the scheduler configuration set as `nil`.
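In sketch form (the `Namespace` fields here are placeholders, not the real struct),
the fix is to canonicalize during snapshot restore as well as on upsert:

```go
package state

// Namespace is a trimmed-down stand-in with a hypothetical config field.
type Namespace struct {
	Name            string
	SchedulerConfig *NamespaceSchedulerConfig // nil in snapshots taken before the upgrade
}

type NamespaceSchedulerConfig struct {
	// default values elided
}

// Canonicalize fills in defaults for fields added in newer versions.
func (n *Namespace) Canonicalize() {
	if n.SchedulerConfig == nil {
		n.SchedulerConfig = &NamespaceSchedulerConfig{}
	}
}

// restoreNamespace is called for each namespace read from a snapshot.
// Canonicalizing here covers older namespaces whose upsert transactions are
// not replayed on upgrade.
func restoreNamespace(ns *Namespace) *Namespace {
	ns.Canonicalize()
	return ns
}
```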
The CSI specification says that we "SHOULD" send no more than one in-flight
request per *volume* at a time, with an allowance for losing state
(ex. leadership transitions) which the plugins "SHOULD" handle gracefully. We
mostly successfully serialize node and controller RPCs for the same volume,
except when Nomad clients are lost. (See also
https://github.com/container-storage-interface/spec/issues/512)
These concurrency requirements in the spec fall short because Storage Provider
APIs aren't necessarily safe to call concurrently on the same host even for
_different_ volumes. For example, concurrently attaching AWS EBS volumes to an
EC2 instance results in a race for device names, which results in failure to
attach (because the device name is taken already and the API call fails) and
confused results when releasing claims. So in practice many CSI plugins rely on
k8s-specific sidecars for serializing storage provider API calls globally. As a
result, we have to be much more conservative about concurrency in Nomad than the
spec allows.
This changeset includes four major changes to fix this:
* Add a serializer method to the CSI volume RPC handler. When the RPC handler
makes a destructive CSI Controller RPC, we send the RPC through this serializer
and only one RPC is sent at a time; any other RPCs in flight will block (a
sketch of the idea follows this list).
* Ensure that requests go to the same controller plugin instance whenever
possible by sorting by lowest client ID out of the plugin instances.
* Ensure that requests go to _healthy_ plugin instances only.
* Ensure that requests for controllers can go to a controller on any _live_
node, not just ones eligible for scheduling (which CSI controllers don't care
about)
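A simplified sketch of the serializer idea, shown here keyed by plugin ID (the
actual granularity and handler wiring differ):

```go
package csi

import "sync"

// controllerSerializer hands out one lock per plugin so destructive
// controller RPCs for that plugin run one at a time.
type controllerSerializer struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *controllerSerializer) lockFor(pluginID string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = map[string]*sync.Mutex{}
	}
	l, ok := s.locks[pluginID]
	if !ok {
		l = &sync.Mutex{}
		s.locks[pluginID] = l
	}
	return l
}

// serializedControllerRPC wraps a destructive controller RPC; any other
// callers for the same plugin block until it returns.
func (s *controllerSerializer) serializedControllerRPC(pluginID string, rpc func() error) error {
	l := s.lockFor(pluginID)
	l.Lock()
	defer l.Unlock()
	return rpc()
}
```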
Fixes: #15415