Commit Graph

25 Commits

Author SHA1 Message Date
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
Tim Gross
feb34c738d scheduler: count implicit spread targets as a single target (#17195)
When calculating the score in the `SpreadIterator`, the score boost is
proportional to the difference between the current and desired count. But when
there are implicit spread targets, the current count is the sum of the possible
implicit targets, which results in incorrect scoring unless there's only one
implicit target.

This changeset updates the `propertySet` struct to accept a set of explicit
target values so it can detect when a property value falls into the implicit set
and should be combined with other implicit values.

Fixes: #11823
2023-05-17 10:25:00 -04:00
Tim Gross
c80c8ab43d scheduler: prevent -Inf in spread scoring (#17198)
When spread targets have a percent value of zero it's possible for them to
return -Inf scoring because of a float divide by zero. This is very hard for
operators to debug because the string "-Inf" is returned in the API and that
breaks the presentation of debugging data.

Most scoring iterators are bracketed to -1/+1, but spread iterators do not so
that they can handle greatly unbalanced scoring so we can't simply return a -1
score without generating a score that might be greater than the negative scores
set by other spread targets. Instead, track the lowest-seen spread boost and use
that as the spread boost for any cases where we'd divide by zero.

Fixes: #8863
2023-05-16 16:01:32 -04:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Piotr Kazmierczak
949a6f60c7 renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
Tim Gross
c49359ad58 scheduler: prevent panic in spread iterator during alloc stop
The spread iterator can panic when processing an evaluation, resulting
in an unrecoverable state in the cluster. Whenever a panicked server
restarts and quorum is restored, the next server to dequeue the
evaluation will panic.

To trigger this state:
* The job must have `max_parallel = 0` and a `canary >= 1`.
* The job must not have a `spread` block.
* The job must have a previous version.
* The previous version must have a `spread` block and at least one
  failed allocation.

In this scenario, the desired changes include `(place 1+) (stop
1+), (ignore n) (canary 1)`. Before the scheduler can place the canary
allocation, it tries to find out which allocations can be
stopped. This passes back through the stack so that we can determine
previous-node penalties, etc. We call `SetJob` on the stack with the
previous version of the job, which will include assessing the `spread`
block (even though the results are unused). The task group spread info
state from that pass through the spread iterator is not reset when we
call `SetJob` again. When the new job version iterates over the
`groupPropertySets`, it will get an empty `spreadAttributeMap`,
resulting in an unexpected nil pointer dereference.

This changeset resets the spread iterator internal state when setting
the job, logging with a bypass around the bug in case we hit similar
cases, and a test that panics the scheduler without the patch.
2022-02-09 19:53:06 -05:00
Preetha Appan
be897cadc3 More error->debug for logging in the bin packing iterator 2019-12-12 15:50:16 -06:00
Preetha Appan
566dd71486 Fix comment and assert score in test case 2019-05-15 12:35:57 -05:00
Nick Ethier
5709bf7b54 fix missing brace 2019-05-15 13:02:04 -04:00
Nick Ethier
ea843a507a scheduler: add check to prohibit returning inf during spread boost calculation 2019-05-15 13:00:24 -04:00
Alex Dadgar
bc42873e07 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Alex Dadgar
260b566c91 server 2018-09-15 16:23:13 -07:00
Preetha Appan
f6cbfbfef6 Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan
72570e0698 fix linting error 2018-09-04 16:10:11 -05:00
Preetha Appan
31b2102055 Fix scoring logic for uneven spread to incorporate current alloc count
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan
1ac696da56 more cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan
fc48be3656 added some unit tests for -1 spread score 2018-09-04 16:10:11 -05:00
Preetha Appan
f881c4f266 comment and formatting cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan
2dfdd4874f fix scoring algorithm when min count == current count 2018-09-04 16:10:11 -05:00
Preetha Appan
35bda8c975 Remove hardcoded boosts for even spread.
instead, calculate them based on delta between current and minimum value
2018-09-04 16:10:11 -05:00
Preetha Appan
7a5791f39e Implement support for even spread across datacenters, with unit test 2018-09-04 16:10:11 -05:00
Preetha Appan
56de0d0a11 Support implicit spread target to account for remaining desired counts 2018-09-04 16:10:11 -05:00
Preetha Appan
5f1d40e4c3 fix comments 2018-09-04 16:10:11 -05:00
Preetha Appan
bf84a5985a Include spreads configured at job level when precomputing weights/desired counts. 2018-09-04 16:10:11 -05:00
Preetha Appan
fd697272a7 Implement spread iterator that scores according to percentage of desired count in each target.
Added this as a new step in the stack and some unit tests
2018-09-04 16:10:11 -05:00