System and sysbatch jobs don't support the reschedule block, because we'd always
replace allocations back onto the same node. The job validation for system jobs
asserts that the user hasn't set a `reschedule` block so that users aren't
submitting jobs expecting it to be supported. But this validation was missing
for sysbatch jobs.
Validate that sysbatch jobs don't have a reschedule block.
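For illustration, a minimal sketch of a jobspec that this validation now rejects (names are placeholders):
job "cleanup" {
  type = "sysbatch"
  group "cleanup" {
    reschedule { # now rejected at validation time, as it already is for system jobs
      attempts = 3
    }
    ...
  }
}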
* func: Update the scaling policies when deregistering a job
* func: Add tests for updating the policy
* docs: add changelog
* func: set back the old order
* style: rearrange for clarity and to reuse the watchset
* func: set the policies to the last submitted version when starting a job
* func: expand tests of the start job command to include job submission
* func: Expand the tests to verify the correct state of the scaling policy after job start
* Update command/job_start.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update nomad/fsm_test.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: add warning when there is no previous job submission
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
When a task included a template block, Nomad was adding a Consul
identity by default which allowed the template to use Consul API
template functions even when they were not needed or desired.
This change removes the implicit addition of Consul identities to
tasks when they include a template block. Job specification
authors will now need to add a Consul identity or Consul block to
their task if they have a template which uses Consul API functions.
This change also removes the default addition of a Consul block to
all task groups registered and processed by the API package.
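A rough sketch of what a job author now needs when a template uses Consul API functions (task and file names are placeholders):
task "web" {
  consul {} # explicitly request a Consul identity; no longer implied by template
  template {
    data        = "{{ key \"config/app\" }}"
    destination = "local/app.conf"
  }
  ...
}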
The group level fields stop_after_client_disconnect,
max_client_disconnect, and prevent_reschedule_on_lost were deprecated in
Nomad 1.8 and replaced by fields in the disconnect block. This change
removes any logic related to those deprecated fields.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.
* api: no need for pointer of chown field
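A short sketch of the new artifact chown field (source URL is a placeholder):
task "app" {
  user = "nobody"
  artifact {
    source = "https://example.com/app.tar.gz"
    chown  = true # downloaded files become owned by task.user instead of root
  }
  ...
}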
This is the CE side of an Enterprise-only feature; a job trying to use it in CE
will fail to validate.
To enable daily-scheduled execution entirely client-side, a job may now contain:
task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
  ...
}
Everything about the allocation will be placed as usual, but if the current time
is outside the specified schedule, the task runner will block on the client,
waiting for the schedule to start, before proceeding with task driver execution.
This includes a taskrunner hook that watches for the end of the schedule, at
which point it kills the task. Then, if restarts are allowed, a new task will
start and again block waiting for the schedule start, and so on.
This also includes all the plumbing required to pipe API calls through from
command->api->agent->server->client, so that tasks can be force-run,
force-paused, or returned to the schedule on demand.
This PR is the first of two that will implement the new Disconnect block. In this PR the new block is introduced in a way that is backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.
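A rough sketch of the new group-level block (field names follow the disconnect block that replaces the deprecated group fields; treat them as illustrative):
group "web" {
  disconnect {
    lost_after = "1h"  # replaces max_client_disconnect
    replace    = false # replace = false covers the old prevent_reschedule_on_lost = true
  }
  ...
}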
Add a new configuration option on the task's volume_mount block to give fine-grained control over the SELinux "z" label.
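A sketch of the shape this takes, assuming the new option is exposed as a selinux_label field on the mount (the field name is an assumption):
task "app" {
  volume_mount {
    volume        = "data"
    destination   = "/srv/data"
    selinux_label = "z" # assumed field name for the new option
  }
  ...
}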
* Update website/content/docs/job-specification/volume_mount.mdx
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* fix: typo
* func: make volume mount verification happen even on mounts with no volume
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Some users with batch workloads or short-lived prestart tasks want to derive a
Vault token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.
When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.
Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.
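For a short-lived prestart task, this looks roughly like:
task "migrate" {
  lifecycle {
    hook = "prestart"
  }
  vault {
    allow_token_expiration = true # client skips the token renewal loop
  }
  ...
}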
Fixes: https://github.com/hashicorp/nomad/issues/8690
This commit introduces the prevent_reschedule_on_lost parameter, which indicates that the task group can't afford to have multiple instances running at the same time. When a node goes down, its allocations will be registered as unknown, but no replacements will be scheduled. If the lost node comes back up, the allocs will reconnect and continue to run.
If max_client_disconnect is also enabled and the group has a reschedule policy, an error will be returned.
Implements issue #10366
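A minimal sketch of the group-level field (group name is a placeholder):
group "stateful" {
  max_client_disconnect      = "1h"
  prevent_reschedule_on_lost = true # allocs on a lost node go unknown; no replacements scheduled
  ...
}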
Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* identity: support change_mode and change_signal
wip - just jobspec portion
* test struct
* cleanup some insignificant bugs
* actually implement change mode
* docs tweaks
* add changelog
* test identity.change_mode operations
* use more words in changelog
* job endpoint tests
* address comments from code review
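For reference, a sketch of the new fields on a task identity block (name, audience, and signal are placeholders):
identity {
  name          = "example"
  aud           = ["example.com"]
  change_mode   = "signal" # or "restart" / "noop"
  change_signal = "SIGHUP"
}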
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Scaffolding actions (#18639)
* Task-level actions for job submissions and retrieval
* FIXME: Temporary workaround to get ember dev server to pass exec through to 4646
* Update api/tasks.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update command/agent/job_endpoint.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Diff and copy implementations
* Action structs get their own file, diff updates to behave like our other diffs
* Test to observe actions changes in a version update
* Tests migrated into structs/diff_test and modified with PR comments in mind
* APIActionToSTructsAction now returns a new value
* de-comment some plain parts, remove unused action lookup
* unused param in action converter
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* New endpoint: job/:id/actions (#18690)
* unused param in action converter
* backing out of parse_job level and moved toward new endpoint level
* Adds taskName and taskGroupName to actions at job level
* Unmodified job mock actions tests
* actionless job test
* actionless job test
* Multi group multi task actions test
* HTTP method check for GET, cleaner errors in job_endpoint_test
* decomment
* Actions aggregated at job model level (#18733)
* Removal of temporary fix to proxy to 4646
* Run Action websocket endpoint (#18760)
* Working demo for review purposes
* removal of cors passthru for websockets
* Remove job_endpoint-specific ws handlers and aimed at existing alloc exec handlers instead
* PR comments addressed, no need for taskGroup pass, better group and task lookups from alloc
* early return in action validate and removed jobid from req args per PR comments
* todo removal, we're checking later in the rpc
* boolean style change on tty
* Action CLI command (#18778)
* Action command init and stuck-notes
* Conditional reqpath to aim at Job action endpoint
* De-logged
* General CLI command cleanup, observe namespace, pass action as string, get random alloc w group adherence
* tab and varname cleanup
* Remove action param from Allocations().Exec calls
* changelog
* dont nil-check acl
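For reference, a minimal sketch of a task-level action block as added in the commits above (names and command are placeholders):
task "redis" {
  action "flush-cache" {
    command = "/usr/local/bin/redis-cli"
    args    = ["FLUSHALL"]
  }
  ...
}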
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
To support Workload Identity with Consul for templates, we want templates to be
able to use the WI created at the task scope (either implicitly or set by the
user). But to allow different tasks within a group to be assigned to different
clusters as we're doing for Vault, we need to be able to set the `consul` block
with its `cluster` field at the task level to override the group.
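A sketch of the task-level override (cluster names are placeholders):
group "cache" {
  consul {
    cluster = "default"
  }
  task "redis" {
    consul {
      cluster = "infra" # task-level override of the group's cluster
    }
    ...
  }
}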
This PR introduces the jobspec updates required to support workload identities for services.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Allows for multiple `identity{}` blocks for tasks along with user-specified audiences. This is a building block to allow workload identities to be used with Consul, Vault and 3rd party JWT based auth methods.
Expiration is still unimplemented and is necessary for JWTs to be used securely, so that's up next.
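A sketch of multiple identity blocks with user-specified audiences (names and audience are placeholders):
task "app" {
  identity {
    env = true # the default workload identity
  }
  identity {
    name = "extra"
    aud  = ["https://auth.example.com"]
    file = true
  }
  ...
}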
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This feature is necessary when users want to explicitly re-render all templates on task restart,
e.g. to fetch new secrets from Vault even if the lease on the existing secrets has not expired.
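This sketch assumes the option landed as a render_templates flag on the group's restart block; treat the field name as an assumption rather than a reference:
group "app" {
  restart {
    attempts         = 2
    render_templates = true # assumed field name: re-render all templates on task restart
  }
  ...
}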
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using
`image` filesystem isolation. As a result, more powerful tokens can be used
in a job definition, allowing it to use template stanzas to issue all kinds of
secrets (database secrets, Vault tokens with very specific policies, etc.),
without sharing that issuing power with the task itself.
This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.
If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that (potentially altered) location.
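A sketch of the new parameter:
task "app" {
  vault {
    disable_file = true # token lives only in the private/ dir, not in NOMAD_SECRETS_DIR
  }
  ...
}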
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.
This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (e.g. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.
This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.
Fixes #17076
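With the fix, the jobspec field reads as (a minimal sketch):
task "app" {
  logs {
    disabled = false # new field; the zero value keeps log collection enabled
  }
  ...
}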
Some Nomad users ship application logs out-of-band via syslog. For these users,
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.
This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.
---
Fixes: #11175
Co-authored-by: Thomas Weber <towe75@googlemail.com>
Implement the new `nomad job restart` command that allows operators to
restart an allocation's tasks or reschedule the entire allocation.
Restarts can be batched to target multiple allocations in parallel.
Between each batch the command can stop and hold for a predefined time
or until the user confirms that the process should proceed.
This implements the "Stateless Restarts" alternative from the original
RFC
(https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180).
The original concept is still worth implementing, as it allows this
functionality to be exposed over an API that can be consumed by the
Nomad UI and other clients. But the implementation turned out to be more
complex than we initially expected so we thought it would be better to
release a stateless CLI-based implementation first to gather feedback
and validate the restart behaviour.
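A typical invocation might look like the following; exact flag names should be checked against the command's help output, so treat this as a sketch:
nomad job restart -batch-size 2 -batch-wait ask -reschedule example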
Co-authored-by: Shishir Mahajan <smahajan@roblox.com>
Add `identity` jobspec block to expose workload identity tokens to tasks.
---------
Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid were being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.
In order to differentiate between an explicit 0 and a template where
these properties were not set, we need to use a pointer.
This PR adds some NSD check status output to the CLI.
1. The 'nomad alloc status' command produces an NSD check summary (if present)
2. The 'nomad alloc checks' sub-command is added to produce complete NSD check output (if present)
UID/GID 0 is usually reserved for the root user/group. While Nomad
clients are expected to run as root it may not always be the case.
Setting these values to -1 if not defined will fall back to the previous
behaviour of not attempting to set file ownership and use whatever
UID/GID the Nomad agent is running as. It will also keep backwards
compatibility, which is especially important for platforms where this
feature is not supported, like Windows.
* Allow specification of CSI staging and publishing directory path
* Add website documentation for stage_publish_dir
* Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir
* Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir
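A sketch of the csi_plugin block using the field name from the commits above (plugin values are placeholders):
task "plugin" {
  csi_plugin {
    id                = "ebs"
    type              = "monolith"
    mount_dir         = "/csi"
    stage_publish_dir = "/local/csi" # field name as written in the commits above
  }
  ...
}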
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading as another Vault
entity.
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.
Make Vault job validation its own function so it's easier to expand it.
Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just the policies.
Set `ChangeMode` on `Vault.Canonicalize`.
Add some missing tests.
Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.
An entity alias assigns an identity to a token, allowing better control
and management of Vault clients, since all tokens with the same identity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and gives better control over Vault billing.
Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
This PR exposes the following existing `consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.
- `wait`
It also exposes the following `consul-template` configuration to Nomad operators in the `{client.template}` stanza.
- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry`
- `wait`
Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.
- `wait_bounds`
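As a sketch, the jobspec side of this looks like (paths are placeholders):
template {
  source      = "local/app.ctmpl"
  destination = "local/app.conf"
  wait {
    min = "5s"
    max = "30s"
  }
}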
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Add a new driver capability: RemoteTasks.
When a task is run by a driver with RemoteTasks set, its TaskHandle will
be propagated to the server in its allocation's TaskState. If the task
is replaced due to a down node or draining, its TaskHandle will be
propagated to its replacement allocation.
This allows tasks to be scheduled in remote systems whose lifecycles are
disconnected from the Nomad node's lifecycle.
See https://github.com/hashicorp/nomad-driver-ecs for an example ECS
remote task driver.