nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Seth Hoenig	de28760928	cl: add changelog for numa (#18847 )	2023-10-25 10:41:17 -05:00
James Rasell	b3e41bec2d	scheduler: remove unused alloc index functions. (#18846 )	2023-10-25 09:09:47 +01:00
Michael Schurter	9b3c38b3ed	docs: deprecate rsadecrypt (#18856 ) `rsadecrypt` uses PKCS #1 v1.5 padding which has multiple known weaknesses. While it is possible to use safely in Nomad, we should not encourage our users to use bad cryptographic primitives. If users want to decrypt secrets in jobspecs we should choose a cryptographic primitive designed for that purpose. `rsadecrypt` was inherited from Terraform which only implemented it to support decrypting Windows passwords on AWS EC2 instances: https://github.com/hashicorp/terraform/pull/16647 This is not something that should ever be done in a jobspec, therefore there's no reason for Nomad to support this HCL2 function.	2023-10-24 15:48:15 -07:00
Tim Gross	6c2d5a0fbb	E2E: Consul compatibility matrix tests (#18799 ) Set up a new test suite that exercises Nomad's compatibility with Consul. This suite installs all currently supported versions of Consul, spins up a Consul agent with appropriate configuration, and a Nomad agent running in dev mode. Then it runs a Connect job against each pair.	2023-10-24 16:03:53 -04:00
Seth Hoenig	8de7af51cb	cl: remove cgroup mountpoint (#18848 ) * cl: remove cgroup mountpoint attribute * cl: add changelog for cgroups attribute changes	2023-10-24 11:38:26 -05:00
Daniel Bennett	b46b41a2e9	scheduler: appropriately unblock evals with quotas (#18838 ) When an eval is blocked due to e.g. cpu exhausted on nodes, but there happens to also be a quota on the job's namespace, the eval would not get auto-unblocked when the node cpu got freed up. This change ensures, when considering quota during BlockedEvals.unblock(), that the block was due to quota in the first place, so unblocking does not get skipped due to the mere existence of a quota on the namespace.	2023-10-24 11:22:24 -05:00
Seth Hoenig	5cf4c6cc06	cl: note breaking change of numcores attribute on apple systems (#18850 ) I goofed the name the first time around, "power" should have been "performance" which is consistent with both Apple and Intel branding.	2023-10-24 10:54:26 -05:00
Seth Hoenig	9ae4b10dc6	cl: minor features are listed as improvements (#18845 ) The Features header is reserved for "tent-pole" features of a Nomad version.	2023-10-24 10:53:40 -05:00
James Rasell	f64ade2304	cli: ensure HCL env vars are added to the job submission object. (#18832 )	2023-10-24 16:48:13 +01:00
Kerim Satirli	5e1bbf90fc	docs: update all URLs to `developer.hashicorp.com` (#16247 )	2023-10-24 11:00:11 -04:00
Seth Hoenig	951cde4e3b	numa: fix cpu topology conversion for non linux systems (#18843 )	2023-10-24 09:12:34 -05:00
Tim Gross	cb3fde3c96	metrics: prevent negative counter from iowait decrease (#18835 ) The iowait metric obtained from `/proc/stat` can under some circumstances decrease. The relevant condition is when an interrupt arrives on a different core than the one that gets woken up for the IO, and a particular counter in the kernel for that core gets interrupted. This is documented in the man page for the `proc(5)` pseudo-filesystem, and considered an unfortunate behavior that can't be changed for the sake of ABI compatibility. In Nomad, we get the current "busy" time (everything except for idle) and compare it to the previous busy time to get the counter increment. If the iowait counter decreases and the idle counter increases more than the increase in the total busy time, we can get a negative total. This previously caused a panic in our metrics collection (see #15861) but that is being prevented by reporting an error message. Fix the bug by putting a zero floor on the values we return from the host CPU stats calculator. Fixes: #15861 Fixes: #18804	2023-10-24 09:58:25 -04:00
Seth Hoenig	043b1a95a7	deps: bump go-set/v2 to alpha.3 (#18844 ) fixes a rather critical bug in .Equals implementation	2023-10-24 08:23:25 -05:00
James Rasell	b55dcb3967	test: use must lib for bitmap tests. (#18834 )	2023-10-24 07:40:02 +01:00
Luiz Aoqui	70b1862026	test: add E2E `vaultcompat` test for JWT auth flow (#18822 ) Test the JWT auth flow using real Nomad and Vault agents.	2023-10-23 20:00:55 -04:00
Tim Gross	1b3920f96b	cli: add prefix ID and wildcard namespace support for `service info` (#18836 ) The `nomad service info` command doesn't support using a wildcard namespace with a prefix match, the way that we do for many other commands. Update the command to do a prefix match list query for the services before making the get query. Fixes: #18831	2023-10-23 13:17:51 -04:00
Tim Gross	8a311255a2	docs: Consul Workload Identity integration (#18685 ) Documentation updates to support the new Consul integration with Nomad Workload Identity. Included: * Added a large section to the Consul integration docs to explain how to set up auth methods and binding rules (by hand, assuming we don't ship a `nomad setup-consul` tool for now), and how to safely migrate from the existing workflow to the new one. * Move `consul` block out of `group` and onto its own page now that we have it available at the `task` scope, and expanded examples of its use. * Added the `service_identity` and `task_identity` blocks to the Nomad agent configuration, and provided a recommended default. * Added the `identity` block to the `service` block page. * Added a rough compatibility matrix to the Consul integration page.	2023-10-23 09:17:22 -04:00
Tim Gross	4d9cc73ed2	sids_hook: fix check for Consul token derived from WI (#18821 ) The `sids_hook` serves the legacy Connect workflow, and we want to bypass it when using workload identities. So the hook checks that there's not already a Consul token in the alloc hook resources derived from the Workload Identity. This check was looking for the wrong key. This would cause the hook to ignore the Consul token we already have and then fail to derive a SI token unless the Nomad agent has its own token with `acl:write` permission. Fix the lookup and add tests covering the bypass behavior.	2023-10-23 08:57:02 -04:00
Michael Schurter	a806363f6d	OpenID Configuration Discovery Endpoint (#18691 ) Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard. I intentionally did not use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a compliant OIDC provider. Nomad is not trying to be compliant initially because compliance to the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where in an attempt to be standards compliant we ship configuration parameters that lock us in to a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need. Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely. This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html). Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-10-20 17:11:41 -07:00
Seth Hoenig	0020139440	core: port common code changes from ENT for numa scheduling (#18818 ) Some additional changes were made in the ENT PR to the common code in support of numa scheduling; this PR copies those changes back to CE.	2023-10-20 13:19:02 -05:00
Luiz Aoqui	6d4b62200b	log: add Consul and Vault cluster name to output (#18817 ) Ensure Consul and Vault loggers have the cluster name as an attribute to help differentiate log source.	2023-10-20 14:03:56 -04:00
Phil Renaud	8902afe651	Nomad Actions (#18794 ) * Scaffolding actions (#18639) * Task-level actions for job submissions and retrieval * FIXME: Temporary workaround to get ember dev server to pass exec through to 4646 * Update api/tasks.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * Update command/agent/job_endpoint.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * Diff and copy implementations * Action structs get their own file, diff updates to behave like our other diffs * Test to observe actions changes in a version update * Tests migrated into structs/diff_test and modified with PR comments in mind * APIActionToSTructsAction now returns a new value * de-comment some plain parts, remove unused action lookup * unused param in action converter --------- Co-authored-by: Tim Gross <tgross@hashicorp.com> * New endpoint: job/:id/actions (#18690) * unused param in action converter * backing out of parse_job level and moved toward new endpoint level * Adds taskName and taskGroupName to actions at job level * Unmodified job mock actions tests * actionless job test * actionless job test * Multi group multi task actions test * HTTP method check for GET, cleaner errors in job_endpoint_test * decomment * Actions aggregated at job model level (#18733) * Removal of temporary fix to proxy to 4646 * Run Action websocket endpoint (#18760) * Working demo for review purposes * removal of cors passthru for websockets * Remove job_endpoint-specific ws handlers and aimed at existing alloc exec handlers instead * PR comments addressed, no need for taskGroup pass, better group and task lookups from alloc * early return in action validate and removed jobid from req args per PR comments * todo removal, we're checking later in the rpc * boolean style change on tty * Action CLI command (#18778) * Action command init and stuck-notes * Conditional reqpath to aim at Job action endpoint * De-logged * General CLI command cleanup, observe namespace, pass action as string, get random alloc w group adherence * tab and varname cleanup * Remove action param from Allocations().Exec calls * changelog * dont nil-check acl --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-10-20 13:05:55 -04:00
Seth Hoenig	3e8ebf85f5	lang: add a helper for iterating a map in order (#18809 ) In some cases it is helpful to iterate a map in the sorted order of the maps keyset - particularly in implementations of some function for which the tests cannot be deterministic without order.	2023-10-20 08:11:35 -05:00
James Rasell	1a0d1efb0d	cli: use single dep func for opening URLs. (#18808 )	2023-10-20 08:24:11 +01:00
James Rasell	ca9e08e6b5	monitor: add log include location option on monitor CLI and API (#18795 )	2023-10-20 07:55:22 +01:00
Tim Gross	f5c5035fde	testutil: add ACL bootstrapping to test server configuration (#18811 ) Some of our `api` package tests have ACLs enabled, but none of those tests also run clients and the "wait for the clients to be live" code reads from the Node API. The caller can't bootstrap ACLs until `NewTestServer` returns, and this makes for a circular dependency. Allow developers to provide a bootstrap token to the test server config, and if it's available, have the server bootstrap the ACL system with it before checking for live clients.	2023-10-19 16:50:38 -04:00
Seth Hoenig	83720740f5	core: plumbing to support numa aware scheduling (#18681 ) * core: plumbing to support numa aware scheduling * core: apply node resources compatibility upon fsm restore Handle the case where an upgraded server dequeues an evaluation before a client triggers a new fingerprint - which would be needed to cause the compatibility fix to run. By running the compat fix on restore the server will immediately have the compatible pseudo topology to use. * lint: learn how to spell pseudo	2023-10-19 15:09:30 -05:00
Piotr Kazmierczak	0410b8acea	client: remove unnecessary debugging from consul client mock (#18807 )	2023-10-19 16:23:42 +02:00
Luiz Aoqui	8b9a5fde4e	vault: add multi-cluster support on templates (#18790 ) In Nomad Enterprise, a task may connect to a non-default Vault cluster, requiring `consul-template` to be configured with a specific client `vault` block.	2023-10-18 20:45:01 -04:00
Piotr Kazmierczak	16d71582f6	client: `consul_hook` tests (#18780 ) ref https://github.com/hashicorp/team-nomad/issues/404	2023-10-18 20:02:35 +02:00
Luiz Aoqui	99e54da9a9	ui: fix websocket connections on dev proxy (#18791 ) * ui: fix websocket connections on dev proxy `ember-cli` includes a handler for websocket upgrade requests to start proxying requests. Handling the same upgrade event by also proxying in `api.js` causes some kind of conflict where the connection is closed unexpectedly. The only change necessary on upgrade is to overwrite the Origin header so Nomad accepts the connection. * ui: remove unused variables	2023-10-18 09:50:10 -04:00
Daniel Bennett	b027d8f771	do not embed *Server (#18786 ) these structs embedding Server, then Server _also embedding them_, confused my IDE, isn't necessary, and just feels wrong!	2023-10-17 15:15:00 -05:00
Tim Gross	d0957eb109	Consul: agent config updates for WI (#18774 ) This changeset makes two changes: * Removes the `consul.use_identity` field from the agent configuration. This behavior is properly covered by the presence of `consul.service_identity` / `consul.task_identity` blocks. * Adds a `consul.task_auth_method` and `consul.service_auth_method` fields to the agent configuration. This allows the cluster administrator to choose specific Consul Auth Method names for their environment, with a reasonable default.	2023-10-17 14:42:14 -04:00
Tim Gross	ac56855f07	consul: add multi-cluster support to client constructors (#18624 ) When agents start, they create a shared Consul client that is then wrapped as various interfaces for testability, and used in constructing the Nomad client and server. The interfaces that support workload services (rather than the Nomad agent itself) need to support multiple Consul clusters for Nomad Enterprise. Update these interfaces to be factory functions that return the Consul client for a given cluster name. Update the `ServiceClient` to split workload updates between clusters by creating a wrapper around all the clients that delegates to the cluster-specific `ServiceClient`. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-10-17 13:46:49 -04:00
Daniel Bennett	8234a422f3	only generate default workload identity once (#18776 ) per alloc task. this can save a bit of cpu when running plans for tasks that already exist, and prevents Nomad tokens from changing, which can cause task template{}s to restart unnecessarily.	2023-10-17 12:31:18 -05:00
modrake	51ffe4208e	workaround and fixes for MPL and copywrite bot (#18775 )	2023-10-17 08:02:13 +01:00
Luiz Aoqui	349c032369	vault: update task runner vault hook to support workload identity (#18534 )	2023-10-16 19:37:57 -04:00
James Rasell	1ffdd576bb	agent: add config option to enable file and line log detail. (#18768 )	2023-10-16 15:59:16 +01:00
James Rasell	fe0a06e4bc	demo: clarify CSI host-path single client limitation. (#18770 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-10-16 15:58:26 +01:00
Tim Gross	cbd7248248	auth: use `ACLsDisabledACL` when ACLs are disabled (#18754 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the final patch in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch adds a new virtual ACL policy field for when ACLs are disabled and updates our authentication logic to use it. Included: * Extends auth package tests to demonstrate that nil ACLs are treated as failed auth and disabled ACLs succeed auth. * Adds a new `AllowDebug` ACL check for the weird special casing we have for pprof debugging when ACLs are disabled. * Removes the remaining unexported methods (and repeated tests) from the `nomad/acl.go` file. * Update the semgrep rules to detect improper nil ACL checking and remove the old invalid ACL checks. * Update the contributing guide for RPC authentication. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799 Ref: https://github.com/hashicorp/nomad/pull/18730 Ref: https://github.com/hashicorp/nomad/pull/18744	2023-10-16 09:30:24 -04:00
Kevin Wang	6dcc402188	chore(docs): update `file` HCL function (#18696 )	2023-10-16 09:03:50 +01:00
Piotr Kazmierczak	299f3bf74b	client: use WI-issued consul tokens in the template_hook (#18752 ) ref https://github.com/hashicorp/team-nomad/issues/404	2023-10-16 09:39:20 +02:00
dependabot[bot]	cb2363f2fb	chore(deps): bump github.com/hashicorp/go-bexpr from 0.1.12 to 0.1.13 (#18758 )	2023-10-16 08:21:57 +01:00
Piotr Kazmierczak	b697de9dda	client: correct consul block validation in the consul_hook (#18751 )	2023-10-13 15:15:04 +02:00
Tim Gross	0931f2ba12	csi: add test that plugin allocs are filtered by namespace (#18753 ) A CSI plugin can be made up of multiple jobs, which may not be in the same namespace. When querying for a plugin and getting information about the allocations that implement the plugin, we need to filter by the namespaces the user has access to. This test existed in the ENT code base and was never moved over to CE when we made namespaces part of the CE product.	2023-10-13 09:06:36 -04:00
James Rasell	e02dd2a331	vault: use an importable const for Vault header string. (#18740 )	2023-10-13 07:39:06 +01:00
Tim Gross	484f91b893	auth: remove "mixed auth" special casing for Variables endpoint (#18744 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the third in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch involves leveraging the refactored `auth` package to remove the weird "mixed auth" helper functions that only support the Variables read/list RPC handlers. Instead, pass the ACL object and claim together into the `AllowVariableOperations` method in the usual `acl` package. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799 Ref: https://github.com/hashicorp/nomad/pull/18730 Fixes: https://github.com/hashicorp/nomad/issues/15875	2023-10-12 16:43:11 -04:00
Piotr Kazmierczak	91753308b3	WI: set the right identity name for Consul tasks (#18742 ) Consul tasks should only have 1 identity of the form consul/{consul_cluster_name}.	2023-10-12 20:34:15 +02:00
Tim Gross	3633ca0f8c	auth: add client-only ACL (#18730 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the third in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch involves creating a new "virtual" ACL object for checking permissions on client operations and a matching `AuthenticateClientOnly` method for client-only RPCs that can produce that ACL. Unlike the server ACLs PR, this also includes a special case for "legacy" client RPCs where the client was not previously sending the secret as it should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but it'll take a while before we can guarantee they'll be present during upgrades. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799	2023-10-12 12:21:48 -04:00
dependabot[bot]	cecd9b0472	chore(deps): bump golang.org/x/net from 0.14.0 to 0.17.0 (#18734 )	2023-10-12 07:58:59 +01:00
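
Commit cb3fde3c96 in the log above describes flooring the host CPU "busy" increment at zero, because iowait from `/proc/stat` can decrease between samples. The following is a minimal sketch of that idea only, not Nomad's actual stats calculator: the `cpuTimes` type, its fields, and `busyDelta` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// cpuTimes is a hypothetical snapshot of cumulative CPU counters,
// loosely modeled on the fields of /proc/stat. It is not a Nomad type.
type cpuTimes struct {
	User, System, IOWait, Idle float64
}

// busy sums everything except idle time.
func (t cpuTimes) busy() float64 {
	return t.User + t.System + t.IOWait
}

// busyDelta returns the increase in busy time between two samples.
// Because iowait may decrease, the raw difference can be negative,
// so the result is floored at zero.
func busyDelta(prev, cur cpuTimes) float64 {
	d := cur.busy() - prev.busy()
	if d < 0 {
		return 0
	}
	return d
}

func main() {
	prev := cpuTimes{User: 100, System: 50, IOWait: 20, Idle: 830}

	// iowait dropped by 5 while user rose by 1: the raw delta is -4.
	shrunk := cpuTimes{User: 101, System: 50, IOWait: 15, Idle: 840}
	fmt.Println(busyDelta(prev, shrunk)) // prints 0

	// An ordinary sample where busy time grows by 8.
	grown := cpuTimes{User: 105, System: 52, IOWait: 21, Idle: 832}
	fmt.Println(busyDelta(prev, grown)) // prints 8
}
```

Flooring at the point where the delta is computed keeps every downstream consumer from having to guard against negative values.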


25192 Commits
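
Commit 3e8ebf85f5 in the log above adds a helper for iterating a map in the sorted order of its keys so that tests can be deterministic. The sketch below shows one way such a helper could look, assuming Go 1.18 or later for generics; `walkSorted` and the `ordered` constraint are hypothetical names and do not reproduce the helper added to Nomad's `lang` package.

```go
package main

import (
	"fmt"
	"sort"
)

// ordered is a hypothetical constraint covering a few key types
// that support the < operator.
type ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// walkSorted calls f for each entry of m in ascending key order.
// Iteration stops early if f returns false.
func walkSorted[K ordered, V any](m map[K]V, f func(K, V) bool) {
	// Collect the keys, then sort them, since Go map iteration
	// order is deliberately unspecified.
	keys := make([]K, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	for _, k := range keys {
		if !f(k, m[k]) {
			return
		}
	}
}

func main() {
	cores := map[string]int{"node-b": 8, "node-a": 4, "node-c": 16}
	walkSorted(cores, func(name string, n int) bool {
		fmt.Println(name, n)
		return true
	})
}
```

Sorting a copy of the keys costs O(n log n) per walk, which is usually acceptable for the small maps where deterministic output matters most.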