nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
James Rasell	ca9e08e6b5	monitor: add log include location option on monitor CLI and API (#18795 )	2023-10-20 07:55:22 +01:00
Tim Gross	f5c5035fde	testutil: add ACL bootstrapping to test server configuration (#18811 ) Some of our `api` package tests have ACLs enabled, but none of those tests also run clients and the "wait for the clients to be live" code reads from the Node API. The caller can't bootstrap ACLs until `NewTestServer` returns, and this makes for a circular dependency. Allow developers to provide a bootstrap token to the test server config, and if it's available, have the server bootstrap the ACL system with it before checking for live clients.	2023-10-19 16:50:38 -04:00
Seth Hoenig	83720740f5	core: plumbing to support numa aware scheduling (#18681) * core: plumbing to support numa aware scheduling * core: apply node resources compatibility upon fsm restore Handle the case where an upgraded server dequeues an evaluation before a client triggers a new fingerprint - which would be needed to cause the compatibility fix to run. By running the compat fix on restore the server will immediately have the compatible pseudo topology to use. * lint: learn how to spell pseudo	2023-10-19 15:09:30 -05:00
Piotr Kazmierczak	0410b8acea	client: remove unnecessary debugging from consul client mock (#18807 )	2023-10-19 16:23:42 +02:00
Luiz Aoqui	8b9a5fde4e	vault: add multi-cluster support on templates (#18790 ) In Nomad Enterprise, a task may connect to a non-default Vault cluster, requiring `consul-template` to be configured with a specific client `vault` block.	2023-10-18 20:45:01 -04:00
Piotr Kazmierczak	16d71582f6	client: `consul_hook` tests (#18780 ) ref https://github.com/hashicorp/team-nomad/issues/404	2023-10-18 20:02:35 +02:00
Luiz Aoqui	99e54da9a9	ui: fix websocket connections on dev proxy (#18791 ) * ui: fix websocket connections on dev proxy `ember-cli` includes a handler for websocket upgrade requests to start proxying requests. Handling the same upgrade event by also proxying in `api.js` causes some kind of conflict where the connection is closed unexpectedly. The only change necessary on upgrade is to overwrite the Origin header so Nomad accepts the connection. * ui: remove unused variables	2023-10-18 09:50:10 -04:00
Daniel Bennett	b027d8f771	do not embed *Server (#18786 ) these structs embedding Server, then Server _also embedding them_, confused my IDE, isn't necessary, and just feels wrong!	2023-10-17 15:15:00 -05:00
Tim Gross	d0957eb109	Consul: agent config updates for WI (#18774 ) This changeset makes two changes: * Removes the `consul.use_identity` field from the agent configuration. This behavior is properly covered by the presence of `consul.service_identity` / `consul.task_identity` blocks. * Adds a `consul.task_auth_method` and `consul.service_auth_method` fields to the agent configuration. This allows the cluster administrator to choose specific Consul Auth Method names for their environment, with a reasonable default.	2023-10-17 14:42:14 -04:00
Tim Gross	ac56855f07	consul: add multi-cluster support to client constructors (#18624 ) When agents start, they create a shared Consul client that is then wrapped as various interfaces for testability, and used in constructing the Nomad client and server. The interfaces that support workload services (rather than the Nomad agent itself) need to support multiple Consul clusters for Nomad Enterprise. Update these interfaces to be factory functions that return the Consul client for a given cluster name. Update the `ServiceClient` to split workload updates between clusters by creating a wrapper around all the clients that delegates to the cluster-specific `ServiceClient`. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-10-17 13:46:49 -04:00
Daniel Bennett	8234a422f3	only generate default workload identity once (#18776 ) per alloc task. this can save a bit of cpu when running plans for tasks that already exist, and prevents Nomad tokens from changing, which can cause task template{}s to restart unnecessarily.	2023-10-17 12:31:18 -05:00
modrake	51ffe4208e	workaround and fixes for MPL and copywrite bot (#18775 )	2023-10-17 08:02:13 +01:00
Luiz Aoqui	349c032369	vault: update task runner vault hook to support workload identity (#18534 )	2023-10-16 19:37:57 -04:00
James Rasell	1ffdd576bb	agent: add config option to enable file and line log detail. (#18768 )	2023-10-16 15:59:16 +01:00
James Rasell	fe0a06e4bc	demo: clarify CSI host-path single client limitation. (#18770 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-10-16 15:58:26 +01:00
Tim Gross	cbd7248248	auth: use `ACLsDisabledACL` when ACLs are disabled (#18754 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the final patch in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch adds a new virtual ACL policy field for when ACLs are disabled and updates our authentication logic to use it. Included: * Extends auth package tests to demonstrate that nil ACLs are treated as failed auth and disabled ACLs succeed auth. * Adds a new `AllowDebug` ACL check for the weird special casing we have for pprof debugging when ACLs are disabled. * Removes the remaining unexported methods (and repeated tests) from the `nomad/acl.go` file. * Update the semgrep rules to detect improper nil ACL checking and remove the old invalid ACL checks. * Update the contributing guide for RPC authentication. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799 Ref: https://github.com/hashicorp/nomad/pull/18730 Ref: https://github.com/hashicorp/nomad/pull/18744	2023-10-16 09:30:24 -04:00
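The `nil`-sentinel problem described in this patch series can be illustrated with a minimal sketch. Only the name `ACLsDisabledACL` comes from the commit message; the `ACL` struct, its fields, and the `AllowRead` method are hypothetical simplifications for illustration and are not Nomad's actual types.

```go
package main

import "fmt"

// ACL is a hypothetical, heavily simplified stand-in for a resolved ACL object.
type ACL struct {
	aclsDisabled bool // set only on the "virtual" ACL used when ACLs are off
	canRead      bool // assumed single permission, for illustration
}

// ACLsDisabledACL is an explicit value meaning "ACLs are disabled",
// replacing the use of nil as a sentinel.
var ACLsDisabledACL = &ACL{aclsDisabled: true}

// AllowRead is a hypothetical permission check. A nil ACL is treated as
// failed authentication and is denied, never as "ACLs are disabled".
func (a *ACL) AllowRead() bool {
	if a == nil {
		return false
	}
	if a.aclsDisabled {
		return true
	}
	return a.canRead
}

func main() {
	var failed *ACL // e.g. returned alongside an authentication error

	fmt.Println(failed.AllowRead())          // false
	fmt.Println(ACLsDisabledACL.AllowRead()) // true
}
```

With an explicit value for the disabled case, a handler that forgets to check an error and receives `nil` fails closed rather than open.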
Kevin Wang	6dcc402188	chore(docs): update `file` HCL function (#18696 )	2023-10-16 09:03:50 +01:00
Piotr Kazmierczak	299f3bf74b	client: use WI-issued consul tokens in the template_hook (#18752 ) ref https://github.com/hashicorp/team-nomad/issues/404	2023-10-16 09:39:20 +02:00
dependabot[bot]	cb2363f2fb	chore(deps): bump github.com/hashicorp/go-bexpr from 0.1.12 to 0.1.13 (#18758 )	2023-10-16 08:21:57 +01:00
Piotr Kazmierczak	b697de9dda	client: correct consul block validation in the consul_hook (#18751 )	2023-10-13 15:15:04 +02:00
Tim Gross	0931f2ba12	csi: add test for that plugin allocs are filtered by namespace (#18753 ) A CSI plugin can be made up of multiple jobs, which may not be in the same namespace. When querying for a plugin and getting information about the allocations that implement the plugin, we need to filter by the namespaces the user has access to. This test existed in the ENT code base and was never moved over to CE when we made namespaces part of the CE product.	2023-10-13 09:06:36 -04:00
James Rasell	e02dd2a331	vault: use an importable const for Vault header string. (#18740 )	2023-10-13 07:39:06 +01:00
Tim Gross	484f91b893	auth: remove "mixed auth" special casing for Variables endpoint (#18744 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the third in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch involves leveraging the refactored `auth` package to remove the weird "mixed auth" helper functions that only support the Variables read/list RPC handlers. Instead, pass the ACL object and claim together into the `AllowVariableOperations` method in the usual `acl` package. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799 Ref: https://github.com/hashicorp/nomad/pull/18730 Fixes: https://github.com/hashicorp/nomad/issues/15875	2023-10-12 16:43:11 -04:00
Piotr Kazmierczak	91753308b3	WI: set the right identity name for Consul tasks (#18742 ) Consul tasks should only have 1 identity of the form consul/{consul_cluster_name}.	2023-10-12 20:34:15 +02:00
Tim Gross	3633ca0f8c	auth: add client-only ACL (#18730 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the third in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch involves creating a new "virtual" ACL object for checking permissions on client operations and a matching `AuthenticateClientOnly` method for client-only RPCs that can produce that ACL. Unlike the server ACLs PR, this also includes a special case for "legacy" client RPCs where the client was not previously sending the secret as it should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but it'll take a while before we can guarantee they'll be present during upgrades. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799	2023-10-12 12:21:48 -04:00
dependabot[bot]	cecd9b0472	chore(deps): bump golang.org/x/net from 0.14.0 to 0.17.0 (#18734 )	2023-10-12 07:58:59 +01:00
Tim Gross	c7f97722ef	consul hook: get WIs only for own task group (#18732 ) The WID manager will only sign WI tokens for the allocation's task group. We're accidentally looping over all the task groups, which for jobs with multiple task groups results in a failure in the `consul_hook`.	2023-10-11 17:01:28 -04:00
Tim Gross	b39632fa6f	testing: fix configuration for retry tests (#18731 ) The retry tests in the `api` package set up a client but don't use `NewClient`, so the address never gets parsed into a `url.URL` and that's causing some test failures.	2023-10-11 14:06:31 -04:00
Charlie Voiselle	7266d267b0	Add unix domain socket support to API (#16872 ) - Expose internal HTTP client's Do() via Raw - Use URL parser to identify scheme - Align more with curl output - Add changelog - Fix test failure; add tests for socket envvars - Apply review feedback for tests - Consolidate address parsing - Address feedback from code reviews Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-10-11 11:04:12 -04:00
Tim Gross	a92461cdc9	auth: add server-only ACL (#18715 ) * auth: add server-only ACL The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the second in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch involves creating a new "virtual" ACL object for checking permissions on server operations and a matching `AuthenticateServerOnly` method for server-only RPCs that can produce that ACL. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703	2023-10-11 10:59:31 -04:00
Tim Gross	7ca619fe97	deps: remove Vault SDK (#18725 ) Nomad imports the Vault SDK to get testing helpers, but it turns out the only thing actually in use was a single string constant for the Vault namespace header. Remove this dependency and hardcode the constant to reduce dependency churn.	2023-10-11 10:42:09 -04:00
Tim Gross	e22c5b82f3	WID manager: request signed identities for services (#18650 ) Includes changes to WID Manager that make it request signed identities for services, as well as a few improvements to WIHandle introduced in #18672. --------- Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2023-10-11 12:07:16 +02:00
Juana De La Cuesta	70b020e583	server: Rename functions and use iterator function for clarity (#18716 )	2023-10-11 09:47:10 +02:00
Tim Gross	635afee376	build: bump to go 1.21.3 (#18717 ) Go 1.21.3 fixes an important HTTP2 CVE (see CVE-2023-39325 and CVE-2023-44487). Nomad does not use HTTP2 and is not vulnerable. However we should pick up the toolchain bump if for no other reason than we don't have to answer questions about that.	2023-10-10 16:37:24 -04:00
Luiz Aoqui	ef6814388c	cli: remove default for ACL token type on update (#18689 ) With a default value set to `client`, the `nomad acl token update` command can silently downgrade a management token to client on update if the command does not specify `-type=management` on every update.	2023-10-10 15:51:13 -04:00
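The silent downgrade described in this commit can be reproduced with a small sketch built on Go's standard `flag` package. The `token` type and `applyUpdate` function are hypothetical and do not mirror Nomad's CLI code; the point is only that a non-empty flag default is indistinguishable from a value the user typed.

```go
package main

import (
	"flag"
	"fmt"
)

// token is a hypothetical stand-in for an ACL token.
type token struct{ Type string }

// applyUpdate parses update flags and applies them to tok. defaultType is
// the default of the -type flag; an empty value means "leave unchanged".
func applyUpdate(tok *token, defaultType string, args []string) error {
	fs := flag.NewFlagSet("update", flag.ContinueOnError)
	typ := fs.String("type", defaultType, "token type")
	if err := fs.Parse(args); err != nil {
		return err
	}
	if *typ != "" {
		tok.Type = *typ
	}
	return nil
}

func main() {
	// With a "client" default, an update that omits -type downgrades the token.
	a := &token{Type: "management"}
	_ = applyUpdate(a, "client", nil)
	fmt.Println(a.Type) // client

	// With no default, the existing type is preserved.
	b := &token{Type: "management"}
	_ = applyUpdate(b, "", nil)
	fmt.Println(b.Type) // management
}
```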
Tim Gross	9c2ecbf1d3	auth: refactor `Authenticate` into its own package (#18703 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This patchset is the first in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This one is entirely refactoring to reduce the burden of reviewing the final patchsets that have the functional changes: * Move RPC auth into a new `nomad/auth` package, injecting the dependencies required from the server. Expose only those public methods on `nomad/auth` that are intended for use in the RPC handlers. * Keep the existing large authentication test as an integration test. * Add unit tests covering the methods of `nomad/auth` we intend on keeping. The assertions for many of these will change once we have no `nil` sentinels and can make safe assertions about permissions on the resulting `ACL` objects.	2023-10-10 11:01:24 -04:00
James Rasell	9c57ddd838	core: add preempt to desired updates stringer function return. (#18702 )	2023-10-10 09:55:18 +01:00
dependabot[bot]	9a38a9c188	chore(deps): bump github.com/docker/cli (#18565 )	2023-10-10 09:12:32 +01:00
dependabot[bot]	fbf792f895	chore(deps): bump github.com/docker/distribution (#18693 )	2023-10-10 08:20:28 +01:00
Tim Gross	928a82a184	WID manager: save and restore signed WIs from client state DB (#18661 ) When clients are restarted and the identity hook runs when we restore allocations, the running allocations are likely to have already-signed Workload Identities that are unexpired. Save these to the client's local state DB so that we can avoid a thundering herd of RPCs during client restart. When we restore, we'll check if there's at least one expired signed WI before making any initial signing request. Included: * Renames `getIdentities` to `getInitialIdentities` to make the workflow more clear. * Renames the existing `widmgr_test.go` file of integration tests, which is in its own package to avoid circular imports to `widmgr_int_test.go`	2023-10-09 09:16:23 -04:00
dependabot[bot]	5945ed5cfd	chore(deps): bump google.golang.org/protobuf from 1.30.0 to 1.31.0 (#18694 )	2023-10-09 11:39:51 +01:00
Luiz Aoqui	c6ce966d98	build: load time/tzdata on Windows (#18676 ) Nomad uses `time.LoadLocation()` to translate a periodic job time zone string value to a `time.Location`. From godocs: LoadLocation looks for the IANA Time Zone database in the following locations in order: * the directory or uncompressed zip file named by the ZONEINFO environment variable * on a Unix system, the system standard installation location * $GOROOT/lib/time/zoneinfo.zip * the time/tzdata package, if it was imported So non-Unix systems require Go to be installed or `time/tzdata` to be imported, otherwise running periodic jobs with a specific `time_zone` value results in an error: Invalid time zone "America/Toronto": unknown time zone America/Toronto This commit adds the `timetzdata` build tag on Windows to embed the time zone data into the final binary. This results in a slightly bigger binary, but from `time/tzdata` godocs: Importing this package will increase the size of a program by about 450 KB. [..] This package will be automatically imported if you build with -tags timetzdata.	2023-10-06 12:57:42 -04:00
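The lookup behavior quoted in this commit can be tried with a short sketch. It uses a blank import of `time/tzdata` as an alternative way to embed the database (the commit itself uses the `timetzdata` build tag instead); the `resolveZone` helper and its error wording are hypothetical.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embeds the IANA database as a fallback for LoadLocation
)

// resolveZone translates a time zone name into a *time.Location.
func resolveZone(name string) (*time.Location, error) {
	loc, err := time.LoadLocation(name)
	if err != nil {
		return nil, fmt.Errorf("invalid time zone %q: %w", name, err)
	}
	return loc, nil
}

func main() {
	loc, err := resolveZone("America/Toronto")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(loc) // America/Toronto
}
```

Without the blank import or the build tag, the same program fails on a host that has neither a system zoneinfo directory nor a Go installation.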
Piotr Kazmierczak	597d835220	wi: introduce workload identity handler (#18672 ) Any code that tracks workloads and their identities should not rely on string comparisons, especially since we support 2 types of workload identities: those that identify tasks and those that identify services. This means we cannot rely on task.Name for workload-identity pairs. The new type structs.WIHandle solves this problem by providing a uniform way of identifying workloads and their identities.	2023-10-06 18:32:47 +02:00
Luiz Aoqui	0ccf942b26	scheduler: fix host volume feasibility check (#18679)

Host volumes were considered regular feasibility checks. This had two unintended consequences.

The first happened when scheduling an allocation with a host volume on a set of nodes with the same computed class but where only some of them had the desired host volume. If the first node evaluated did not have the host volume, the entire node class was considered ineligible for the task group.

```go
// Run the job feasibility checks.
for _, check := range w.jobCheckers {
	feasible := check.Feasible(option)
	if !feasible {
		// If the job hasn't escaped, set it to be ineligible since it
		// failed a job check.
		if !jobEscaped {
			evalElig.SetJobEligibility(false, option.ComputedClass)
		}
		continue OUTER
	}
}
```

This results in all nodes with the same computed class to be skipped, even if they do have the desired host volume.

```go
switch evalElig.JobStatus(option.ComputedClass) {
case EvalComputedClassIneligible:
	// Fast path the ineligible case
	metrics.FilterNode(option, "computed class ineligible")
	continue
```

The second consequence is somewhat the opposite. When an allocation has a host volume with `per_alloc = true` the node must have a host volume that matches the allocation index, so each allocation is likely to be placed in different nodes. But when the first allocation found a node match, it registered the node class as eligible for the task group.

```go
// Set the task group eligibility if the constraints weren't escaped and
// it hasn't been set before.
if !tgEscaped && tgUnknown {
	evalElig.SetTaskGroupEligibility(true, w.tg, option.ComputedClass)
}
```

This could cause other allocations to be placed on nodes without the expected host volume because of the computed node class fast path. The node feasibility for the volume was never checked.

```go
case EvalComputedClassEligible:
	// Fast path the eligible case
	if w.available(option) {
		return option
	}
	// We match the class but are temporarily unavailable
	continue OUTER
```

These problems did not happen with CSI volumes kind of accidentally. Since the `CSIVolumeChecker` was not placed in the `tgCheckers` list it did not cause the node class to be considered ineligible on failure (avoiding the first problem). And, as illustrated in the code snippet above, the eligible node class fast path checks `tgAvailable` (where `CSIVolumeChecker` is placed) before returning the option (avoiding the second problem).

By also placing `HostVolumeChecker` in the `tgAvailable` list instead of `tgCheckers` we also avoid these problems on host volume feasibility.	2023-10-06 11:00:48 -04:00
Seth Hoenig	e3c8700ded	deps: upgrade to go-set/v2 (#18638 ) No functional changes, just cleaning up deprecated usages that are removed in v2 and replace one call of .Slice with .ForEach to avoid making the intermediate copy.	2023-10-05 11:56:17 -05:00
Phil Renaud	533f293fa8	Wrap the passed path prop as a handlebars tag (#18598 )	2023-10-05 12:47:18 -04:00
Luiz Aoqui	d425c90e0f	client: remove null dynamic metadata keys (#18664 ) Setting a null value to a node metadata is expected to remove it from subsequent reads. This is true both for static node metadata (defined in the agent configuration file) as well as for dynamic node metadata (defined via the Nomad API). Null values for static metadata must be persisted to indicate that the value has been removed, but strictly dynamic metadata null values can be removed from state and client memory.	2023-10-05 11:41:44 -04:00
Luiz Aoqui	ed204e0fd9	client: ensure task only runs with prestart hooks (#18662 ) Since the allocation in the task runner is updated in a separate goroutine, a race condition may happen where the task is started but the prestart hooks are skipped because the allocation became terminal. Checking for a terminal allocation before proceeding with the task start ensures the task only runs if the prestart hooks are also executed. Since `shouldShutdown()` only uses terminal allocation status, it remains `true` after the first transition, so it's safe to check it again after the prestart hooks as it will never revert to `false`.	2023-10-05 10:16:57 -04:00
Juana De La Cuesta	d701925ffa	[f-gh-1106-reporting] Use full cluster metadata for reporting (#18660) * func: add reporting config to server * func: add reporting manager for ce * func: change from clusterID to clusterMetadata and use it to start ent leadership * Update leader.go * style: typo	2023-10-05 09:32:54 +02:00
Piotr Kazmierczak	03cf9ae7ff	vault: eliminate vaultclient test import cycle (#18652 ) Eliminates the vaultclient test import cycle by putting the test file into the client package and making vaultclient objects public. Ref hashicorp/team-nomad#404	2023-10-05 09:17:16 +02:00


25168 Commits