Commit Graph

596 Commits

Mahmood Ali
2899991ccd Merge pull request #5790 from hashicorp/b-reschedule-desired-state
Mark rescheduled allocs as stopped.
2019-06-13 17:28:59 -04:00
Mahmood Ali
25b44b18db Test behavior no reschedule for service/batch jobs 2019-06-13 16:41:19 -04:00
Mahmood Ali
34a66835db Don't stop rescheduleLater allocations
When an alloc is due to be rescheduleLater, it goes through the
reconciler twice: once to be ignored while a follow-up eval is created,
and once again when the follow-up eval is processed, where it appears
as rescheduleNow.

Here, we ignore the alloc in the first run and mark it as stopped in
the second iteration, rather than stopping it twice.
2019-06-13 09:44:41 -04:00
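
A minimal Go sketch of the two-pass flow described above; the names
(`rescheduleLater`, `rescheduleNow`, `reconcile`) are illustrative
stand-ins, not the reconciler's real API:

```go
package main

import "fmt"

// Illustrative stand-in for a failed allocation in the reconciler; not
// Nomad's actual types.
type alloc struct {
	id              string
	rescheduleLater bool // reschedule delay has not elapsed yet
	rescheduleNow   bool // set while processing the follow-up eval
}

func reconcile(allocs []alloc) {
	for _, a := range allocs {
		switch {
		case a.rescheduleLater:
			// First pass: ignore the alloc and let the follow-up
			// eval revisit it; do NOT stop it here.
			fmt.Println(a.id, "ignored; follow-up eval created")
		case a.rescheduleNow:
			// Second pass: the follow-up eval is being processed,
			// so this is the single place the alloc is stopped.
			fmt.Println(a.id, "desired status -> stop")
		}
	}
}

func main() {
	reconcile([]alloc{
		{id: "a1", rescheduleLater: true},
		{id: "a2", rescheduleNow: true},
	})
}
```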
Mahmood Ali
d342a24ba0 Only preempt for network when there is a network
When examining preemption for networks, only consider allocs that have
networks.

Fixes https://github.com/hashicorp/nomad/issues/5793
2019-06-07 18:55:55 -04:00
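
A hedged sketch of the guard this implies, with made-up types standing
in for the scheduler's allocation and network structs:

```go
package main

import "fmt"

// Illustrative types; not the scheduler's real structs.
type networkResource struct{ MBits int }
type allocation struct {
	ID       string
	Networks []networkResource
}

// filterNetworkCandidates keeps only allocs that actually hold network
// resources, so network preemption never considers network-less allocs.
func filterNetworkCandidates(allocs []allocation) []allocation {
	var out []allocation
	for _, a := range allocs {
		if len(a.Networks) > 0 {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	allocs := []allocation{
		{ID: "with-net", Networks: []networkResource{{MBits: 100}}},
		{ID: "no-net"},
	}
	fmt.Println(filterNetworkCandidates(allocs)) // only "with-net" remains
}
```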
Mahmood Ali
2808674fac test: add tests for network devices and preemption 2019-06-07 18:55:02 -04:00
Mahmood Ali
c62c246ad9 Stop allocs to be rescheduled
Currently, when an alloc fails and is rescheduled, its desired state
remains "run" and the Nomad client may not free its resources.

Here, we ensure that the alloc is marked as stopped when it is
rescheduled.

Notice the Desired Status and Description before and after this change:

Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID                   = 02aba49e
Eval ID              = bb9ed1d2
Name                 = example-reschedule.nodes[0]
Node ID              = 5853d547
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = run
Desired Description  = <none>
Created              = 10s ago
Modified             = 5s ago
Replacement Alloc ID = d6bf872b

Task "payload" is "dead"
Task Resources
CPU        Memory          Disk     Addresses
0/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:12:45Z
Finished At    = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:12:50-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:12:50-04:00  Terminated      Exit Code: 1
2019-06-06T17:12:45-04:00  Started         Task started by client
2019-06-06T17:12:45-04:00  Task Setup      Building Task Directory
2019-06-06T17:12:45-04:00  Received        Task received by client

```

After:

```
ID                   = 5001ccd1
Eval ID              = 53507a02
Name                 = example-reschedule.nodes[0]
Node ID              = a3b04364
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 13s ago
Modified             = 3s ago
Replacement Alloc ID = 7ba7ac20

Task "payload" is "dead"
Task Resources
CPU         Memory          Disk     Addresses
21/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:22:50Z
Finished At    = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:22:55-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:22:55-04:00  Terminated      Exit Code: 1
2019-06-06T17:22:50-04:00  Started         Task started by client
2019-06-06T17:22:50-04:00  Task Setup      Building Task Directory
2019-06-06T17:22:50-04:00  Received        Task received by client
```
2019-06-06 17:27:12 -04:00
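
A sketch of the underlying state change, using simplified stand-ins
for the allocation fields and status constants (the real ones live in
Nomad's structs package):

```go
package main

import "fmt"

// Simplified stand-ins for Nomad's allocation fields and status values.
const (
	desiredStatusRun  = "run"
	desiredStatusStop = "stop"
)

type allocation struct {
	ID                 string
	DesiredStatus      string
	DesiredDescription string
}

// markRescheduled records that the alloc was replaced, so the client
// knows it may free the alloc's resources.
func markRescheduled(a *allocation) {
	a.DesiredStatus = desiredStatusStop
	a.DesiredDescription = "alloc was rescheduled because it failed"
}

func main() {
	a := &allocation{ID: "02aba49e", DesiredStatus: desiredStatusRun}
	markRescheduled(a)
	fmt.Printf("%s: %s (%s)\n", a.ID, a.DesiredStatus, a.DesiredDescription)
}
```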
Mahmood Ali
5574b2f3d0 tests: Migrated allocs aren't lost
Fix `TestServiceSched_NodeDown` to check that migrated allocs are
actually marked to be stopped.

The boolean logic in the test made it skip checking the client status
as long as the desired status was stop.

Here, we mark some jobs for migration while leaving others running,
and we check that the lost flag is only set for non-migrated allocs.
2019-06-06 16:05:07 -04:00
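
A hedged sketch of the corrected assertion shape; illustrative only,
not the actual test code:

```go
package main

import "fmt"

type alloc struct {
	ID            string
	DesiredStatus string
	ClientStatus  string
}

// checkAlloc mirrors the corrected test logic: migrated allocs must be
// marked stopped, while only non-migrated allocs may be flagged lost.
func checkAlloc(a alloc, migrated bool) error {
	if migrated {
		if a.DesiredStatus != "stop" {
			return fmt.Errorf("%s: migrated alloc not stopped", a.ID)
		}
		return nil
	}
	if a.ClientStatus != "lost" {
		return fmt.Errorf("%s: non-migrated alloc not lost", a.ID)
	}
	return nil
}

func main() {
	fmt.Println(checkAlloc(alloc{ID: "a1", DesiredStatus: "stop"}, true))
	fmt.Println(checkAlloc(alloc{ID: "a2", ClientStatus: "lost"}, false))
}
```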
Lang Martin
21dccdf8dd describe a pending deployment with auto_promote accurately 2019-05-22 12:32:08 -04:00
Lang Martin
2165f8be94 sched reconcile copy AutoPromote to DeploymentState 2019-05-22 12:32:08 -04:00
Preetha Appan
566dd71486 Fix comment and assert score in test case 2019-05-15 12:35:57 -05:00
Nick Ethier
5709bf7b54 fix missing brace 2019-05-15 13:02:04 -04:00
Nick Ethier
ea843a507a scheduler: add check to prohibit returning inf during spread boost calculation 2019-05-15 13:00:24 -04:00
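A hedged sketch of this kind of check: if the desired count for a
spread target is zero, a naive ratio divides by zero and yields ±Inf,
so the boost is clamped instead (names and formula are illustrative,
not Nomad's actual scoring code):

```go
package main

import (
	"fmt"
	"math"
)

// spreadBoost returns a normalized boost for placing on a target that
// currently holds `count` allocs out of `desired`. Guarding desired == 0
// avoids returning ±Inf from the division.
func spreadBoost(count, desired int) float64 {
	if desired == 0 {
		return -1 // worst score instead of Inf
	}
	boost := float64(desired-count) / float64(desired)
	if math.IsInf(boost, 0) {
		return -1 // defensive: never propagate Inf into scoring
	}
	return boost
}

func main() {
	fmt.Println(spreadBoost(1, 4)) // 0.75
	fmt.Println(spreadBoost(1, 0)) // -1, not -Inf
}
```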
Lang Martin
b1228536e8 system_sched & test cleanup comments 2019-05-01 12:25:26 -04:00
Lang Martin
d1308420b2 system_sched_test extend the test to check ineligible nodes 2019-05-01 12:25:26 -04:00
Lang Martin
8cc8fc6200 system_sched when a node is filtered, don't mark failure 2019-05-01 12:25:26 -04:00
Lang Martin
178092591b system_sched_test create partially constrained job 2019-05-01 12:25:26 -04:00
Arshneet Singh
ee268a58db Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh
97686e371f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh
02b832c3ff Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh
f75c6b4bdb Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh
4eedab18a7 Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle
9a4fe5e98f Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire
bb142af5d6 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force-migrate an allocation to a different node.

This is built on top of AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition, exposing it
under the alloc-lifecycle ACL.

The API returns the follow-up eval, which can be used for monitoring
in the CLI or parsed and used by an external tool.
2019-04-23 12:50:23 +02:00
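
Typical usage looks like the following; the `-detach` behavior shown is
an assumption based on similar Nomad commands (skip eval monitoring and
print the follow-up eval ID):

```
# Stop and force-migrate an allocation, monitoring its follow-up eval
$ nomad alloc stop 02aba49e

# Return immediately; the printed follow-up eval ID can be consumed by
# an external tool
$ nomad alloc stop -detach 02aba49e
```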
Preetha Appan
a134c16c22 remove stray new line 2019-04-12 10:32:48 -05:00
Preetha Appan
4743561396 Refactor scheduler package to enable preemption for batch/service jobs 2019-04-10 20:24:01 -05:00
James Rasell
ee92bf86d8 Add NodeName to the alloc/job status outputs.
Currently, when operators need to log onto a machine where an alloc is
running, they must make an alloc/job status call and then a second
call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name, making operator use easier.

Closes #2359
Closes #1180
2019-04-10 10:34:10 -05:00
Preetha Appan
1323b4d5cc Fix bug where scoring metadata would be overridden during an inplace upgrade. 2019-03-12 23:36:46 -05:00
Alex Dadgar
bc42873e07 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Nick Ethier
80a04052b6 scheduler: fix NPE when deployment is nil, but placement is a canary 2019-01-28 20:22:59 -06:00
Alex Dadgar
8264f50c52 convert driver to device for device constraint/attributes 2019-01-23 10:58:45 -08:00
Alex Dadgar
95297c608c goimports 2019-01-22 15:44:31 -08:00
Preetha Appan
fc9c87c032 Remove unnecessary usage of alloc.Resource 2019-01-10 16:36:47 -06:00
Mahmood Ali
d19245fa7b appease linter 2019-01-08 10:58:49 -05:00
Alex Dadgar
19e67a0916 Test recovery 2019-01-07 14:49:41 -08:00
Preetha
44cc76c4a3 Merge pull request #4881 from hashicorp/f-device-preemption
Device preemption
2018-12-11 18:34:19 -06:00
Preetha Appan
e7162e8bd8 Early continue after meeting needed count
Also adds an optimization that filters out unneeded allocations in a
final filtering step.
2018-12-11 10:12:18 -06:00
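
A hedged sketch of both optimizations together (illustrative types and
logic, not the scheduler's real preemption code):

```go
package main

import "fmt"

type candidate struct {
	id    string
	count int // instances this alloc would free up
}

// pickPreemptees takes candidates until `needed` is satisfied, then a
// final pass filters out any candidate that is no longer required.
func pickPreemptees(cands []candidate, needed int) []candidate {
	var chosen []candidate
	got := 0
	for _, c := range cands {
		if got >= needed {
			break // early exit once the needed count is met
		}
		chosen = append(chosen, c)
		got += c.count
	}
	// Final filtering step: drop allocs whose removal is unnecessary.
	for i := len(chosen) - 1; i >= 0; i-- {
		if got-chosen[i].count >= needed {
			got -= chosen[i].count
			chosen = append(chosen[:i], chosen[i+1:]...)
		}
	}
	return chosen
}

func main() {
	cands := []candidate{{"a", 2}, {"b", 1}, {"c", 3}}
	fmt.Println(pickPreemptees(cands, 2)) // [{a 2}]
}
```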
Preetha Appan
3921793030 Score combinations of allocs from multiple devices for preemption 2018-12-07 18:35:47 -06:00
Alex Dadgar
0953d913ed Deprecate IOPS
IOPS has been modeled as a resource since Nomad 0.1 but has never
actually been detected, and there is no short-term plan to add
detection. This is because IOPS is too simplistic a unit to capture
the performance requirements of the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec
parsing level and in the api/ resources, since these are the two
public uses of the field. These should be considered deprecated and
exist only to allow users to stop using them during the Nomad 0.9.x
release. In the future, there should be no expectation that the field
will exist.
2018-12-06 15:09:26 -08:00
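
For reference, a sketch of where the deprecated field sits in a
jobspec (values illustrative):

```hcl
task "example" {
  resources {
    cpu    = 500
    memory = 256

    # Deprecated: still parsed for compatibility during the 0.9.x
    # release, but no longer modeled by the scheduler.
    iops = 500
  }
}
```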
Preetha Appan
4afd512f45 use structured logging everywhere consistently 2018-12-03 08:31:41 -06:00
Preetha Appan
e023b367fa addresses some code clarity review comments 2018-11-27 11:02:06 -06:00
Mahmood Ali
e7257fe5be Simplify map count update logic
Co-Authored-By: preetapan <preetha@hashicorp.com>
2018-11-27 10:03:11 -06:00
Mahmood Ali
026f976761 code review suggestion
Co-Authored-By: preetapan <preetha@hashicorp.com>
2018-11-27 09:59:57 -06:00
Preetha Appan
e8088e404b Fix formatting 2018-11-16 20:45:52 -06:00
Preetha Appan
7f2826097d Fix preemption logic bug: allocations need to be grouped by device first.
This ensures that the set of allocations chosen for preemption all
share the same device, where the ID is <vendor/type/device>.
2018-11-16 20:32:10 -06:00
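
A hedged sketch of the grouping step, keying candidates by a
`<vendor/type/device>` ID (illustrative types):

```go
package main

import "fmt"

// Illustrative alloc holding the device it uses, identified by a
// "<vendor/type/device>" string such as "nvidia/gpu/1080ti".
type devAlloc struct {
	ID       string
	DeviceID string
}

// groupByDevice buckets candidate allocs by device ID so preemption
// combinations are only scored within a single device.
func groupByDevice(allocs []devAlloc) map[string][]devAlloc {
	groups := make(map[string][]devAlloc)
	for _, a := range allocs {
		groups[a.DeviceID] = append(groups[a.DeviceID], a)
	}
	return groups
}

func main() {
	allocs := []devAlloc{
		{"a1", "nvidia/gpu/1080ti"},
		{"a2", "nvidia/gpu/1080ti"},
		{"a3", "intel/fpga/f100"},
	}
	for dev, group := range groupByDevice(allocs) {
		fmt.Println(dev, "->", len(group), "candidates")
	}
}
```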
Danielle Tomlinson
3d0a45f6e5 scheduler: Add is_set/is_not_set constraints
This adds constraints for asserting that a given attribute or value
exists or does not exist. These act as companions to the = and !=
operators, e.g.:

```hcl
constraint {
        attribute = "${attrs.type}"
        operator  = "!="
        value     = "database"
}

constraint {
        attribute = "${attrs.type}"
        operator  = "is_set"
}
```
2018-11-15 11:00:32 -08:00
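
The companion operator reads the same way; a hedged example asserting
that an attribute is absent:

```hcl
constraint {
        attribute = "${attr.type}"
        operator  = "is_not_set"
}
```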
Preetha Appan
813143fee5 fix linting 2018-11-15 12:27:32 -06:00
Preetha Appan
9dfc27915c Initial implementation of device preemption 2018-11-15 11:09:26 -06:00
Danielle Tomlinson
0925bfe618 scheduler: Allow comparisons of nil values
This commit allows the ConstraintChecker to test values that do not
exist. This is useful when you want to _exclude_ given nodes from
running a job: for example, if you give canary nodes an attribute and
do not want critical services to run on them, you can specify
something like the below without having to tag all other nodes with
the inverse.

```hcl
constraint {
  attribute = "${node.attr.canary}
  operator = "!="
  value = "1"
}
```

This also requires all constraint checkers to allow for nil target
values, as they will no longer be short-circuited by resolving a
target.
2018-11-13 13:36:51 -08:00
Alex Dadgar
895fdb79f1 Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Preetha Appan
3a2d5f0178 blank line 2018-11-12 15:50:14 -06:00