In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:
> msgpack decode error [pos 0]: io: read/write on closed pipe
they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.
This exposed many more requests to a previously undiscovered race condition
between the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.
This changeset includes two fixes:
* Catch the "closed pipe" error correctly so that we're not sending unnecessary
error messages.
* Move all writes to the websocket into the same response-streaming goroutine,
as sketched below. The main handler goroutine blocks on a results channel, and
the response-streaming goroutine sends the final error on that channel when
it's done so it can be reported to the user.
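A minimal sketch of the second fix's shape (illustrative names, using the
gorilla/websocket API rather than Nomad's actual handler code):

```go
package main

import "github.com/gorilla/websocket"

// handle sketches the single-writer layout: only the streaming goroutine
// writes to the websocket, and the handler goroutine blocks on a results
// channel until streaming finishes.
func handle(conn *websocket.Conn, frames <-chan []byte) error {
	results := make(chan error, 1)

	go func() {
		var err error
		for frame := range frames {
			// This is the only goroutine that writes to conn, so the
			// concurrent-write panic can no longer happen.
			if werr := conn.WriteMessage(websocket.BinaryMessage, frame); werr != nil {
				err = werr
				break
			}
		}
		results <- err // report the final error (or nil) to the handler
	}()

	return <-results
}
```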
On Windows, Nomad uses the `syscall.NewLazyDLL` and `syscall.LoadDLL` functions
to load a few system DLL files, which does not prevent DLL hijacking
attacks. Hypothetically a local attacker on the client host that can place an
abusive library in a specific location could use this to escalate privileges to
the Nomad process. Although this attack does not fall within the Nomad security
model, it doesn't hurt to follow good practices here.
We can remove two of these DLL loads by using wrapper functions provided by the
stdlib in `x/sys/windows`.
Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
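For illustration, the difference between the hijack-prone and safe loading
patterns might look like this (`kernel32.dll` is just an example, not
necessarily one of the DLLs Nomad loads):

```go
//go:build windows

package main

import (
	"syscall"

	"golang.org/x/sys/windows"
)

func loadDLLs() {
	// Hijack-prone: the default search order can include directories an
	// attacker is able to write to.
	risky := syscall.NewLazyDLL("kernel32.dll")
	_ = risky

	// Safer: NewLazySystemDLL resolves only from the Windows system
	// directory, so a planted DLL elsewhere is never picked up. Better
	// still, x/sys/windows already wraps many syscalls, letting us drop
	// the manual load entirely.
	safe := windows.NewLazySystemDLL("kernel32.dll")
	_ = safe
}
```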
The current implementation of the `nomad tls ca create` command
overrides the value of the `-domain` flag with `"nomad"` if no
additional customization is provided.
This results in a certificate for the wrong domain or an error if the
`-name-constraint` flag is also used.
The logic for `IsCustom()` also seemed reversed. If all custom fields
are empty then the certificate is _not_ customized, so `IsCustom()`
should return false.
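A sketch of the corrected predicate, with an illustrative struct standing in
for the real CA options:

```go
// Illustrative sketch of the corrected predicate; the real options struct
// has more fields, but the shape of the fix is the same.
type caOpts struct {
	Domain         string
	CommonName     string
	NameConstraint bool
}

// IsCustom reports whether the user supplied any customization. With every
// field at its zero value the certificate is not customized, so this must
// return false -- the previous logic had it reversed.
func (o caOpts) IsCustom() bool {
	return o.Domain != "" || o.CommonName != "" || o.NameConstraint
}
```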
When a job eval is blocked due to missing capacity, the `nomad job run`
command will monitor the deployment, which may succeed once additional
capacity is made available.
But the current implementation would return `2` even when the deployment
succeeded because it only took the first eval status into account.
This commit updates the eval monitoring logic to reset the scheduling
error state if the deployment eventually succeeds.
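A sketch of the corrected behavior with stand-in types (not the actual
monitor code):

```go
// Update is an illustrative stand-in for the eval and deployment status
// updates the monitor receives; not Nomad's actual types.
type Update struct {
	EvalBlocked         bool
	DeploymentSucceeded bool
}

// exitCode returns 2 while scheduling has failed, but a later successful
// deployment resets the error state so the command can exit 0.
func exitCode(updates []Update) int {
	code := 0
	for _, u := range updates {
		switch {
		case u.EvalBlocked:
			code = 2 // capacity was missing when the eval ran
		case u.DeploymentSucceeded:
			code = 0 // the deployment eventually succeeded; clear the error
		}
	}
	return code
}
```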
Add a new configuration option to the task's `volume_mount` blocks to give fine-grained control over the SELinux "z" label.
* Update website/content/docs/job-specification/volume_mount.mdx
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* fix: typo
* func: make volume mount verification happen even on mounts with no volume
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
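For illustration, the new option might be shaped like this (simplified local
types, not the actual `api` package structs):

```go
// Illustrative shape of the new option (local types; the exact jobspec
// field name may differ).
type VolumeMount struct {
	Volume       string
	Destination  string
	SELinuxLabel string // "z" = shared relabel, "Z" = private relabel, "" = none
}

func exampleMount() VolumeMount {
	return VolumeMount{
		Volume:       "data",
		Destination:  "/var/lib/app",
		SELinuxLabel: "z", // set per mount rather than for the whole task
	}
}
```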
The new `nomad setup vault -check` command can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
Some users with batch workloads or short-lived prestart tasks want to derive a
Vault token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.
When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now detect if the Vault auth configuration does
not allow renewals and will disable the renewal loop automatically.
Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.
Fixes: https://github.com/hashicorp/nomad/issues/8690
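A sketch of the client-side decision, assuming the Vault API's
`SecretAuth.Renewable` field as the renewability signal (not the actual
`vault_hook` code):

```go
package main

import vault "github.com/hashicorp/vault/api"

// shouldRenew sketches the decision described above: skip the renewal loop
// when the task opts out via allow_token_expiration, or when Vault reports
// the token is not renewable.
func shouldRenew(auth *vault.SecretAuth, allowExpiration bool) bool {
	if allowExpiration {
		return false // task chose to let the token expire
	}
	if auth != nil && !auth.Renewable {
		return false // the Vault auth configuration does not permit renewal
	}
	return true
}
```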
Add support for Consul Enterprise admin partitions. We added fingerprinting in
https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition`
field. The expectation is that most users will create a mapping of Nomad node
pool to Consul admin partition. But we'll also create an implicit constraint for
the fingerprinted value.
Fixes: https://github.com/hashicorp/nomad/issues/13139
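A sketch of what the implicit constraint could look like, using local types
that follow Nomad's constraint shape; the exact attribute name is an
assumption based on the fingerprinting PR:

```go
// Illustrative sketch of the implicit constraint; the attribute name is an
// assumption based on the fingerprinting added in #19485.
type Constraint struct {
	LTarget string // node attribute to read
	RTarget string // value it must match
	Operand string
}

func implicitPartitionConstraint(partition string) Constraint {
	return Constraint{
		LTarget: "${attr.consul.partition}",
		RTarget: partition,
		Operand: "=",
	}
}
```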
Add new optional `OIDCDisableUserInfo` setting for OIDC auth provider which
disables a request to the identity provider to get OIDC UserInfo.
This option is helpful when your identity provider doesn't send any additional
claims from the UserInfo endpoint, such as Microsoft AD FS OIDC Provider:
> The AD FS UserInfo endpoint always returns the subject claim as specified in the
> OpenID standards. AD FS doesn't support additional claims requested via the
> UserInfo endpoint
Fixes #19318
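A sketch of the setting in context, using Nomad's `api` package (the other
values are placeholders for an AD FS-style provider):

```go
package main

import napi "github.com/hashicorp/nomad/api"

// authMethodConfig shows the new setting in context.
func authMethodConfig() *napi.ACLAuthMethodConfig {
	return &napi.ACLAuthMethodConfig{
		OIDCDiscoveryURL:    "https://adfs.example.com/adfs",
		OIDCClientID:        "nomad",
		BoundAudiences:      []string{"nomad"},
		AllowedRedirectURIs: []string{"https://nomad.example.com/oidc/callback"},
		// Skip the UserInfo request entirely; providers like AD FS return
		// no additional claims from it anyway.
		OIDCDisableUserInfo: true,
	}
}
```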
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values the pointer reference
must be updated before mutating the config with environment variable
values.
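A simplified sketch of the aliasing bug and its fix (stand-in types, not the
actual agent config code):

```go
// Simplified sketch of the aliasing bug described above.
type VaultConfig struct{ Name, Addr string }

func findDefault(vaults []*VaultConfig) *VaultConfig {
	for _, v := range vaults {
		if v.Name == "default" {
			return v
		}
	}
	return nil
}

func applyEnv(flagVaults, mergedVaults []*VaultConfig) {
	defaultVault := findDefault(flagVaults) // points into the pre-merge config

	// Merging allocates fresh VaultConfig values, so the old pointer no
	// longer refers to the effective configuration. Re-resolve it before
	// mutating with environment variable values.
	defaultVault = findDefault(mergedVaults)
	if defaultVault != nil {
		defaultVault.Addr = "https://vault.example.com:8200" // e.g. from VAULT_ADDR
	}
}
```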
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
When a Connect service is registered with Consul, Nomad includes the nested
`Connect.SidecarService` field, which contains health checks for the Envoy
proxy. Because these are not part of the job spec, the alloc health tracker
created by `health_hook` doesn't know to read the value of these checks.
In many circumstances this won't be noticed, but if the Envoy health check
happens to take longer than the `update.min_healthy_time` (perhaps because it's
been set low), it's possible for a deployment to progress too early such that
there will briefly be no healthy instances of the service available in Consul.
Update the Consul service client to find the nested sidecar service in the
service catalog and attach it to the results provided to the tracker. The
tracker can then check the sidecar health checks.
Fixes: https://github.com/hashicorp/nomad/issues/19269
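A sketch of the sidecar lookup using the Consul `api` package and its
`<service-id>-sidecar-proxy` naming convention (illustrative, not the actual
service client code):

```go
package main

import capi "github.com/hashicorp/consul/api"

// sidecarChecks finds the health checks Consul registered for the Envoy
// proxy of the given service, so they can be handed to the health tracker
// alongside the parent service's checks.
func sidecarChecks(client *capi.Client, serviceID string) ([]*capi.AgentCheck, error) {
	checks, err := client.Agent().Checks()
	if err != nil {
		return nil, err
	}
	var out []*capi.AgentCheck
	for _, check := range checks {
		if check.ServiceID == serviceID+"-sidecar-proxy" {
			out = append(out, check)
		}
	}
	return out, nil
}
```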
This commit introduces the `preventRescheduleOnLost` parameter, which indicates that the task group can't afford to have multiple instances running at the same time. When a node goes down, its allocations are registered as unknown but no replacements are rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run.
If `max_client_disconnect` is also enabled and the group has a reschedule policy, an error is returned.
Implements issue #10366
Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
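A sketch of the validation rule with simplified fields (not the actual
jobspec validation code):

```go
package main

import "errors"

// Illustrative validation sketch: the two behaviors would race to replace
// an unknown allocation, so the combination is rejected.
type TaskGroup struct {
	PreventRescheduleOnLost bool
	MaxClientDisconnectSet  bool
	HasReschedulePolicy     bool
}

func (tg *TaskGroup) Validate() error {
	if tg.PreventRescheduleOnLost && tg.MaxClientDisconnectSet && tg.HasReschedulePolicy {
		return errors.New("preventRescheduleOnLost cannot be combined with max_client_disconnect and a reschedule policy")
	}
	return nil
}
```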
Update the `nomad setup consul` command to include a `Selector` for the
`NamespaceRule` so the logic is only applied when the token has a claim
for `consul_namespace`.
Jobs without an explicit `consul.namespace` value receive a JWT without
the `consul_namespace` claim because Nomad is unable to determine which
Consul namespace should be used.
By using `NamespaceRules`, cluster operators are able to set a default
value for these jobs.
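A hedged sketch using the Consul API's `ACLAuthMethodNamespaceRule` type; the
selector string here is an assumption, not necessarily what the command
generates:

```go
package main

import capi "github.com/hashicorp/consul/api"

// namespaceRule sketches the rule: it only applies when the workload
// identity carries the claim, so jobs without consul.namespace fall
// through to the method's default namespace.
func namespaceRule() *capi.ACLAuthMethodNamespaceRule {
	return &capi.ACLAuthMethodNamespaceRule{
		Selector:      `"consul_namespace" in value`,
		BindNamespace: "${value.consul_namespace}",
	}
}
```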
* API command and jobspec docs
* PR comments addressed
* API docs for job/jobid/action socket
* Removing a perhaps incorrect origin of job_id across the jobs api doc
* PR comments addressed
In order to correctly handle Consul namespaces, auth methods and binding rules
must always be created in the default namespace only.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
The `-reschedule` flag stops allocations and assumes the Nomad scheduler
will create new allocations to replace them. But this is only true for
service and batch jobs.
Restarting non-service jobs with the `-reschedule` flag causes the
command to loop forever waiting for the allocations to be replaced,
which never happens.
Allocations for system jobs may be replaced by triggering an evaluation
after each stop to cause the reconciler to run again.
Sysbatch jobs should not be allowed to be rescheduled as they are never
replaced by the scheduler.
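A sketch of the per-job-type handling described above (illustrative;
`triggerEvaluation` is a hypothetical stand-in for the eval API call):

```go
package main

import "fmt"

// rescheduleAfterStop sketches how -reschedule behaves for each job type.
func rescheduleAfterStop(jobType string) error {
	switch jobType {
	case "service", "batch":
		// The scheduler replaces stopped allocations on its own; just
		// monitor for the replacement.
		return nil
	case "system":
		// Not replaced automatically: force an evaluation after each stop
		// so the reconciler places a new allocation.
		return triggerEvaluation()
	case "sysbatch":
		// Never replaced by the scheduler, so -reschedule is rejected.
		return fmt.Errorf("sysbatch jobs do not support -reschedule")
	}
	return nil
}

func triggerEvaluation() error { return nil } // hypothetical stand-in
```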
This simplifies the default setup of Nomad's workload identity-based
authentication for Consul by using a single auth method with two binding rules.
Users can still specify separate auth methods for services and tasks.
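For illustration, the two binding rules attached to the single auth method
might look like this (the selectors and bind names are assumptions, not the
exact values the setup command creates):

```go
package main

import capi "github.com/hashicorp/consul/api"

// bindingRules sketches the single-method layout: one rule binds services,
// the other binds tasks to a role.
func bindingRules(authMethod string) []*capi.ACLBindingRule {
	return []*capi.ACLBindingRule{
		{
			AuthMethod:  authMethod,
			Description: "services",
			Selector:    `"nomad_service" in value`,
			BindType:    capi.BindingRuleBindTypeService,
			BindName:    "${value.nomad_service}",
		},
		{
			AuthMethod:  authMethod,
			Description: "tasks",
			Selector:    `"nomad_service" not in value`,
			BindType:    capi.BindingRuleBindTypeRole,
			BindName:    "nomad-tasks",
		},
	}
}
```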
* Initial pass at a global actions instance queue
* Action card with a bunch of functionality that needs to be pared back a bit
* Happy little actions button
* runAction performs updated to use actions service
* Stop All and Clear Finished buttons
* Keyboard service now passes element, so we can pseudo-click the actions dropdown
* resizable sidebar code blocks
* Contextual actions within task and job levels
* runAction greatly consolidated
* Pluralize action text
* Peer grouping of flyout action instances
* ShortIDs instead of full alloc IDs
* Fixes for tests that previously depended on notifications
* Stop and stop all for peered action instances
* Job name in action instance card linkable
* Componentized actions global button
* scss consolidation
* Clear and Stop buttons become mutually exclusive in an action card
* Clean up action card title styles a bit
* todo-bashing
* stopAll and stopPeers separated and fixed up
* Socket handling functions moved to the Actions service
* Error handling on socket message
* Smarter import
* Documentation note: need alloc-exec and alloc-raw-exec for raw_exec jobs
* Tests for flyout and dropdown actions
* Docs link when in empty flyout/queue state and percy snapshot test for it
The `nomad job restart` command should skip allocations that already
have replacements. Restarting an allocation with a replacement is a
no-op because the allocation status is terminal and the command's
replacement monitor returns immediately.
But by not skipping them, the effective batch size is computed
incorrectly.
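A sketch of the fix using the `NextAllocation` field from Nomad's `api`
package (illustrative helper, not the actual command code):

```go
package main

import napi "github.com/hashicorp/nomad/api"

// restartable drops allocations that already have a replacement so the
// batch size only counts allocations that will actually restart.
func restartable(allocs []*napi.Allocation) []*napi.Allocation {
	var out []*napi.Allocation
	for _, alloc := range allocs {
		if alloc.NextAllocation != "" {
			continue // already replaced; restarting would be a no-op
		}
		out = append(out, alloc)
	}
	return out
}
```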