Commit Graph

124 Commits

Author SHA1 Message Date
Mahmood Ali
cb038b1a8c Have Plan.AppendAlloc accept the job 2020-08-25 17:22:09 -04:00
Mahmood Ali
5720266c91 Respect alloc job version for lost/failed allocs
This change fixes a bug where lost/failed allocations are replaced by
allocations with the latest versions, even if the version hasn't been
promoted yet.

Now, when generating a plan for lost/failed allocations, the scheduler
first checks if the current deployment is in Canary stage, and if so, it
ensures that any lost/failed allocations is replaced one with the latest
promoted version instead.
2020-08-19 09:52:48 -04:00
Nick Ethier
e9ff8a8daa Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Lang Martin
9ccec0afbb scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105) (#8138)
* scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect

* scheduler/reconcile: thread follupEvalIDs through to results.stop

* scheduler/reconcile: comment typo

* nomad/_test: correct arguments for plan.AppendStoppedAlloc

* scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)
2020-06-09 17:13:53 -04:00
Lang Martin
cd6d34425f server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Mahmood Ali
04000c9ba9 Ensure that alloc updates preserve device offers
When an alloc is updated in-place, ensure that the allocated device are
preserved and carried over to new alloc.
2020-04-21 08:57:15 -04:00
Mahmood Ali
08e5a9087f Merge pull request #7414 from hashicorp/b-network-mode-change
Detect network mode change
2020-03-24 09:46:40 -04:00
Mahmood Ali
7897104b72 update scheduler to account for hooks 2020-03-21 17:52:45 -04:00
Mahmood Ali
82cb13aac9 Detect network mode change
Mark job as updated if network mode changed.
2020-03-21 16:51:10 -04:00
Drew Bailey
a880d75b16 comment for filtering reason 2020-02-03 09:02:09 -05:00
Drew Bailey
cd00d6ded5 make diffSystemAllocsForNode aware of eligibility
diffSystemAllocs -> diffSystemAllocsForNode, this function is only used
for diffing system allocations, but lacked awareness of eligible
nodes and the node ID that the allocation was going to be placed.

This change now ignores a change if its existing allocation is on an
ineligible node. For a new allocation, it also checks tainted and
ineligible nodes in the same function instead of nil-ing out the diff
after computation in diffSystemAllocs
2020-02-03 09:02:08 -05:00
Drew Bailey
580baea231 ignore computed diffs if node is ineligible
test flakey, add temp sleeps for debugging

fix computed class
2020-02-03 09:02:08 -05:00
Drew Bailey
81a24098f0 Update Evicted allocations to lost when lost
If an alloc is being preempted and marked as evict, but the underlying
node is lost before the migration takes place, the allocation currently
stays as desired evict, status running forever, or until the node comes
back online.

This commit updates updateNonTerminalAllocsToLost to check for a
destired status of Evict as well as Stop when updating allocations on
tainted nodes.

switch to table test for lost node cases
2020-01-07 13:34:18 -05:00
Drew Bailey
89964c989a Removes checking constraints for inplace update 2019-11-19 13:34:41 -05:00
Drew Bailey
c87c6415eb DOCS: Spread stanza does not exist on task
Fixes documentation inaccuracy for spread stanza placement. Spreads can
only exist on the top level job struct or within a group.

comment about nil assumption
2019-11-19 08:26:36 -05:00
Drew Bailey
1607a203a8 Check for changes to affinity and constraints
Adds checks for affinity and constraint changes when determining if we
should update inplace.

refactor to check all levels at once

check for spread changes when checking inplace update
2019-11-19 08:26:34 -05:00
Chris Baker
bcd1243471 the scheduler checks whether task changes require a restart, this needed
to be updated to consider devices
2019-11-07 17:51:15 +00:00
Michael Schurter
08a17854ce core: fix panic when AllocatedResources is nil
Fix for #6540
2019-10-28 14:38:21 -07:00
Preetha Appan
654c72a7b4 update comment 2019-09-05 18:43:30 -05:00
Preetha Appan
87e998d043 Fix inplace updates bug with group level networks
During inplace updates, we should be using network information
from the previous allocation being updated.
2019-09-05 18:37:24 -05:00
Nick Ethier
4cb99a1112 scheduler: fix disk constraints 2019-07-31 01:04:08 -04:00
Nick Ethier
e910fdbb32 fix failing tests 2019-07-31 01:04:07 -04:00
Nick Ethier
e15005bdcb networking: Add new bridge networking mode implementation 2019-07-31 01:04:06 -04:00
Nick Ethier
c742f8b580 ar: cleanup lint errors 2019-07-31 01:03:18 -04:00
Nick Ethier
e20fa7ccc1 Add network lifecycle management
Adds a new Prerun and Postrun hooks to manage set up of network namespaces
on linux. Work still needs to be done to make the code platform agnostic and
support Docker style network initalization.
2019-07-31 01:03:17 -04:00
Arshneet Singh
4eedab18a7 Add code for plan normalization 2019-04-23 09:18:01 -07:00
Preetha Appan
1323b4d5cc Fix bug where scoring metadata would be overridden during an inplace upgrade. 2019-03-12 23:36:46 -05:00
Alex Dadgar
0953d913ed Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar
5e67b37aad use int64 2018-10-16 15:34:32 -07:00
Preetha Appan
3ca71ae935 Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar
0f2f4797cb fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar
49c2d4f775 Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar
260b566c91 server 2018-09-15 16:23:13 -07:00
Alex Dadgar
05d183bbab Fix lost handling of not actually down nodes 2018-03-30 14:17:41 -07:00
Alex Dadgar
3099ef05e2 Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test 2018-03-28 17:25:58 -07:00
Alex Dadgar
95c3c637ba Correct status desc on draining system allocs 2018-03-26 17:54:46 -07:00
Michael Schurter
832b1d5694 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar
48d637dad1 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter
95b3b6eb02 drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Josh Soref
692913095b spelling: feasibility 2018-03-11 18:07:09 +00:00
Preetha Appan
cc54e11802 Fix some comments and lint warnings, remove unused method 2018-01-31 09:56:53 -06:00
Preetha Appan
5ecb7895bb Reschedule previous allocs and track their reschedule attempts 2018-01-31 09:56:53 -06:00
Alex Dadgar
a9e3a41407 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar
83fb59969d Stop before trying to place 2017-07-17 17:18:12 -07:00
Alex Dadgar
989aa56304 Remove canary 2017-07-07 12:10:04 -07:00
Alex Dadgar
690fc78091 Plan apply handles canaries and success is set via update 2017-07-07 12:10:04 -07:00
Alex Dadgar
29e31af007 Attach eval id 2017-07-07 12:10:04 -07:00
Alex Dadgar
0ec6d74338 update description of the alloc update factory function 2017-07-07 12:03:11 -07:00
Alex Dadgar
f32a9a5539 Non-Canary/Deployment Tests 2017-07-07 12:03:11 -07:00
Alex Dadgar
d72b270a51 Pull out in-place updating into a passed in function; reduce inputs to reconciler 2017-07-07 12:03:11 -07:00