nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 02:15:43 +03:00

Author	SHA1	Message	Date
Derek Strickland	f5de802993	system_scheduler: support disconnected clients (#12555 ) * structs: Add helper method for checking if alloc is configured to disconnect * system_scheduler: Add support for disconnected clients	2022-04-15 09:31:32 -04:00
Tim Gross	6a49a0fb81	set minimum version for disconnected client mode to 1.3.0 (#12530 )	2022-04-08 16:48:37 -04:00
Derek Strickland	8ac3e642e6	reconciler: 2 phase reconnects and tests (#12333 ) * structs: Add alloc.Expired & alloc.Reconnected functions. Add Reconnect eval trigger by. * node_endpoint: Emit new eval for reconnecting unknown allocs. * filterByTainted: handle 2 phase commit filtering rules. * reconciler: Append AllocState on disconnect. Logic updates from testing and 2 phase reconnects. * allocs: Set reconnect timestamp. Destroy if not DesiredStatusRun. Watch for unknown status.	2022-04-05 17:13:10 -04:00
Derek Strickland	bab317300e	Add description for allocs stopped due to reconnect (#12270 )	2022-04-05 17:12:23 -04:00
Derek Strickland	6329f44148	disconnected clients: ensure servers meet minimum required version (#12202 ) * planner: expose ServerMeetsMinimumVersion via Planner interface * filterByTainted: add flag indicating disconnect support * allocReconciler: accept and pass disconnect support flag * tests: update dependent tests	2022-04-05 17:12:23 -04:00
Derek Strickland	35752655b0	disconnected clients: Add reconnect task event (#12133 ) * Add TaskClientReconnectedEvent constant * Add allocRunner.Reconnect function to manage task state manually * Removes server-side push	2022-04-05 17:12:23 -04:00
Derek Strickland	786180601d	reconciler: support disconnected clients (#12058 ) * Add merge helper for string maps * structs: add statuses, MaxClientDisconnect, and helper funcs * taintedNodes: Include disconnected nodes * upsertAllocsImpl: don't use existing ClientStatus when upserting unknown * allocSet: update filterByTainted and add delayByMaxClientDisconnect * allocReconciler: support disconnecting and reconnecting allocs * GenericScheduler: upsert unknown and queue reconnecting Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-05 17:10:37 -04:00
Derek Strickland	cefc58dd7b	reconciler: refactor `computeGroup` (#12033 ) The allocReconciler's computeGroup function contained a significant amount of inline logic that was difficult to understand the intent of. This commit extracts inline logic into the following intention revealing subroutines. It also includes updates to the function internals also aimed at improving maintainability and renames some existing functions for the same purpose. New or renamed functions include. Renamed functions - handleGroupCanaries -> cancelUnneededCanaries - handleDelayedLost -> createLostLaterEvals - handleDelayedReschedules -> createRescheduleLaterEvals New functions - filterAndStopAll - initializeDeploymentState - requiresCanaries - computeCanaries - computeUnderProvisionedBy - computeReplacements - computeDestructiveUpdates - computeMigrations - createDeployment - isDeploymentComplete	2022-02-10 16:24:51 -05:00
Tim Gross	f811169267	scheduler: recover from panic (#12009 ) If processing a specific evaluation causes the scheduler (and therefore the entire server) to panic, that evaluation will never get a chance to be nack'd and cleared from the state store. It will get dequeued by another scheduler, causing that server to panic, and so forth until all servers are in a panic loop. This prevents the operator from intervening to remove the evaluation or update the state. Recover the goroutine from the top-level `Process` methods for each scheduler so that this condition can be detected without panicking the server process. This will lead to a loop of recovering the scheduler goroutine until the eval can be removed or nack'd, but that's much better than taking a downtime.	2022-02-07 11:47:53 -05:00
Luiz Aoqui	8a427a470a	scheduler: detect and log unexpected scheduling collisions (#11793 )	2022-01-14 20:09:14 -05:00
James Rasell	80dcae7216	core: allow setting and propagation of eval priority on job de/registration (#11532 ) This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and overrides the use of the job priority to set the eval priority. In order to ensure all evaluations as a result of the request use the same eval priority, the priority is shared to the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user supplied value is validated within the agent HTTP handler removing the need to pass invalid requests to the server. The register and deregister opts functions now allow for setting the eval priority on requests. The change includes a small change to the DeregisterOpts function which handles nil opts. This brings the function in line with the RegisterOpts.	2021-11-23 09:23:31 +01:00
Mahmood Ali	56a7cc61d0	scheduler: stop allocs in unrelated nodes (#11391 ) The system scheduler should leave allocs on draining nodes as-is, but stop allocs on nodes that are no longer part of the job datacenters. Previously, the scheduler did not make the distinction and left system job allocs intact if they are already running. I've added a failing test first, which you can see in https://app.circleci.com/jobs/github/hashicorp/nomad/179661 . Fixes https://github.com/hashicorp/nomad/issues/11373	2021-10-27 07:04:13 -07:00
Seth Hoenig	61ee443ee6	core: implement system batch scheduler This PR implements a new "System Batch" scheduler type. Jobs can make use of this new scheduler by setting their type to 'sysbatch'. Like the name implies, sysbatch can be thought of as a hybrid between system and batch jobs - it is for running short lived jobs intended to run on every compatible node in the cluster. As with batch jobs, sysbatch jobs can also be periodic and/or parameterized dispatch jobs. A sysbatch job is considered complete when it has been run on all compatible nodes until reaching a terminal state (success or failed on retries). Feasibility and preemption are governed the same as with system jobs. In this PR, the update stanza is not yet supported. The update stanza is still limited in functionality for the underlying system scheduler, and is not useful yet for sysbatch jobs. Further work in #4740 will improve support for the update stanza and deployments. Closes #2527	2021-08-03 10:30:47 -04:00
Michael Schurter	2e6eb84a57	Merge pull request #10248 from hashicorp/f-remotetask-2021 core: propagate remote task handles	2021-04-30 08:57:26 -07:00
Michael Schurter	d3d6c60e63	clarify docs from pr comments	2021-04-30 08:31:31 -07:00
Luiz Aoqui	c7114921fa	Add metrics for blocked eval resources (#10454 ) * add metrics for blocked eval resources * docs: add new blocked_evals metrics * fix to call `pruneStats` instead of `stats.prune` directly	2021-04-29 15:03:45 -04:00
Michael Schurter	d50fb2a00e	core: propagate remote task handles Add a new driver capability: RemoteTasks. When a task is run by a driver with RemoteTasks set, its TaskHandle will be propagated to the server in its allocation's TaskState. If the task is replaced due to a down node or draining, its TaskHandle will be propagated to its replacement allocation. This allows tasks to be scheduled in remote systems whose lifecycles are disconnected from the Nomad node's lifecycle. See https://github.com/hashicorp/nomad-driver-ecs for an example ECS remote task driver.	2021-04-27 15:07:03 -07:00
Tim Gross	7c7569674c	CSI: unique volume per allocation Add a `PerAlloc` field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex. `[0]`), rather than the exact source ID. Read the `PerAlloc` field when making the volume claim at the client to determine if the allocation index suffix (ex. `[0]`) should be added to the volume source ID.	2021-03-18 15:35:11 -04:00
Kris Hicks	85ed8ddd4f	Add gosimple linter (#9590 )	2020-12-09 11:05:18 -08:00
Kris Hicks	be6e5e9e6d	scheduler: Fix always-false sort func (#9547 ) Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2020-12-08 09:57:47 -08:00
Michael Schurter	a55f46e9ba	api: add field filters to /v1/{allocations,nodes} Fixes #9017 The ?resources=true query parameter includes resources in the object stub listings. Specifically: - For `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` field are included. - For `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Mahmood Ali	6f6a93b262	Handle migration of non-deployment jobs This handles the case where a job went from no-deployment to deployment with canaries. Consider a case where a `max_parallel=0` job is submitted as version 0, then an update is submitted with `max_parallel=1, canary=1` as version 1. In this case, we will have 1 canary alloc, and all remaining allocs will be version 0. Until the deployment is promoted, we ought to replace the canaries with version 0 job (which isn't associated with a deployment).	2020-08-26 10:36:34 -04:00
Mahmood Ali	92bb3728c9	tweak stack job manipulation To address review comments	2020-08-25 17:37:19 -04:00
Mahmood Ali	cb038b1a8c	Have Plan.AppendAlloc accept the job	2020-08-25 17:22:09 -04:00
Mahmood Ali	5720266c91	Respect alloc job version for lost/failed allocs This change fixes a bug where lost/failed allocations are replaced by allocations with the latest versions, even if the version hasn't been promoted yet. Now, when generating a plan for lost/failed allocations, the scheduler first checks if the current deployment is in Canary stage, and if so, it ensures that any lost/failed allocation is replaced with one from the latest promoted version instead.	2020-08-19 09:52:48 -04:00
Mahmood Ali	b00d226c40	this is OSS	2020-06-22 10:28:45 -04:00
Nick Ethier	ad8ced3873	multi-interface network support	2020-06-19 09:42:10 -04:00
Lang Martin	9ccec0afbb	scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105 ) (#8138 ) * scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect * scheduler/reconcile: thread follupEvalIDs through to results.stop * scheduler/reconcile: comment typo * nomad/_test: correct arguments for plan.AppendStoppedAlloc * scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost\|Reschedules)	2020-06-09 17:13:53 -04:00
Lang Martin	422493f38d	Delayed evaluations for `stop_after_client_disconnect` can cause unwanted extra followup evaluations around job garbage collection (#8099 ) * client/heartbeatstop: reversed time condition for startup grace * scheduler/generic_sched: use `delayInstead` to avoid a loop Without protecting the loop that creates followUpEvals, a delayed eval is allowed to create an immediate subsequent delayed eval. For both `stop_after_client_disconnect` and the `reschedule` block, a delayed eval should always produce some immediate result (running or blocked) and then only after the outcome of that eval produce a second delayed eval. * scheduler/reconcile: lostLater are different than delayedReschedules Just slightly. `lostLater` allocs should be used to create batched evaluations, but `handleDelayedReschedules` assumes that the allocations are in the untainted set. When it creates the in-place updates to those allocations at the end, it causes the allocation to be treated as running over in the planner, which causes the initial `stop_after_client_disconnect` evaluation to be retried by the worker.	2020-06-03 09:48:38 -04:00
Mahmood Ali	9f11857ad1	Open source Preemption code Nomad 0.12 OSS is to include preemption feature. This commit moves the private code for managing preemption to OSS repository.	2020-05-27 15:02:01 -04:00
Lang Martin	cd6d34425f	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Chris Baker	1c9bac9087	wip: added job.scale rpc endpoint, needs explicit test (tested via http now)	2020-03-24 13:57:09 +00:00
Mahmood Ali	7897104b72	update scheduler to account for hooks	2020-03-21 17:52:45 -04:00
Drew Bailey	895e563461	nomad state store must be modified through raft, rm local state change	2020-02-03 13:57:34 -05:00
Tim Gross	3716a67b30	scheduler: fix job update placement on prev node penalized (#6781 ) Fixes #5856 When the scheduler looks for a placement for an allocation that's replacing another allocation, it's supposed to penalize the previous node if the allocation had been rescheduled or failed. But we're currently always penalizing the node, which leads to unnecessary migrations on job update. This commit leaves in place the existing behavior where if the previous alloc was itself rescheduled, its previous nodes are also penalized. This is conservative but the right behavior especially on larger clusters where a group of hosts might be having correlated trouble (like an AZ failure). Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-12-03 06:14:49 -08:00
Preetha Appan	87e998d043	Fix inplace updates bug with group level networks During inplace updates, we should be using network information from the previous allocation being updated.	2019-09-05 18:37:24 -05:00
Nick Ethier	4cb99a1112	scheduler: fix disk constraints	2019-07-31 01:04:08 -04:00
Nick Ethier	e910fdbb32	fix failing tests	2019-07-31 01:04:07 -04:00
Nick Ethier	e15005bdcb	networking: Add new bridge networking mode implementation	2019-07-31 01:04:06 -04:00
Lang Martin	2d8bfb8d11	system_sched submits failed evals as blocked	2019-07-18 10:32:12 -04:00
Mahmood Ali	c62c246ad9	Stop allocs to be rescheduled Currently, when an alloc fails and is rescheduled, the alloc desired state remains as "run" and the nomad client may not free the resources. Here, we ensure that an alloc is marked as stopped when it's rescheduled. Notice the Desired Status and Description before and after this change: Before: ``` mars-2:nomad notnoop$ nomad alloc status 02aba49e ID = 02aba49e Eval ID = bb9ed1d2 Name = example-reschedule.nodes[0] Node ID = 5853d547 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = run Desired Description = <none> Created = 10s ago Modified = 5s ago Replacement Alloc ID = d6bf872b Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 0/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:12:45Z Finished At = 2019-06-06T21:12:50Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:12:50-04:00 Not Restarting Policy allows no restarts 2019-06-06T17:12:50-04:00 Terminated Exit Code: 1 2019-06-06T17:12:45-04:00 Started Task started by client 2019-06-06T17:12:45-04:00 Task Setup Building Task Directory 2019-06-06T17:12:45-04:00 Received Task received by client ``` After: ``` ID = 5001ccd1 Eval ID = 53507a02 Name = example-reschedule.nodes[0] Node ID = a3b04364 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = stop Desired Description = alloc was rescheduled because it failed Created = 13s ago Modified = 3s ago Replacement Alloc ID = 7ba7ac20 Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 21/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:22:50Z Finished At = 2019-06-06T21:22:55Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:22:55-04:00 Not Restarting Policy allows no restarts 2019-06-06T17:22:55-04:00 Terminated Exit Code: 1 2019-06-06T17:22:50-04:00 Started Task started by client 2019-06-06T17:22:50-04:00 Task Setup Building Task Directory 2019-06-06T17:22:50-04:00 Received Task received by client ```	2019-06-06 17:27:12 -04:00
Arshneet Singh	97686e371f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	4eedab18a7	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	9a4fe5e98f	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	bb142af5d6	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Preetha Appan	a134c16c22	remove stray new line	2019-04-12 10:32:48 -05:00
Preetha Appan	4743561396	Refactor scheduler package to enable preemption for batch/service jobs	2019-04-10 20:24:01 -05:00
James Rasell	ee92bf86d8	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Closes #1180	2019-04-10 10:34:10 -05:00
Nick Ethier	80a04052b6	scheduler: fix NPE when deployment is nil, but placement is a canary	2019-01-28 20:22:59 -06:00
Alex Dadgar	95297c608c	goimports	2019-01-22 15:44:31 -08:00

201 Commits