Commit Graph

73 Commits

Author SHA1 Message Date
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
James Rasell
80dcae7216 core: allow setting and propagation of eval priority on job de/registration (#11532)
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set which includes eval priority. This
param is optional and override the use of the job priority to set
the eval priority.

In order to ensure all evaluations as a result of the request use
the same eval priority, the priority is shared to the
allocReconciler and deploymentWatcher. This creates a new
distinction between eval priority and job priority.

The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.

Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.

The register and deregister opts functions now all for setting
the eval priority on requests.

The change includes a small change to the DeregisterOpts function
which handles nil opts. This brings the function inline with the
RegisterOpts.
2021-11-23 09:23:31 +01:00
Mahmood Ali
6c414cd5f9 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Tim Gross
e6b3123758 scheduler: test for reconciler's in-place rollback behavior
The reconciler has some complicated behavior when there are already running
allocations from a previous version of the job that we want to keep, as
happens during a rollback. Document this behavior with a test.
2021-06-03 10:02:19 -04:00
Chris Baker
93d5187e7b removed deprecated fields from Drain structs and API
node drain: use msgtype on txn so that events are emitted
wip: encoding extension to add Node.Drain field back to API responses

new approach for hiding Node.SecretID in the API, using `json` tag
documented this approach in the contributing guide
refactored the JSON handlers with extensions
modified event stream encoding to use the go-msgpack encoders with the extensions
2021-03-21 15:30:11 +00:00
Jasmine Dahilig
c346a47b5b add default update stanza and max_parallel=0 disables deployments (#6191) 2019-09-02 10:30:09 -07:00
Mahmood Ali
25b44b18db Test behavior no reschedule for service/batch jobs 2019-06-13 16:41:19 -04:00
Mahmood Ali
34a66835db Don't stop rescheduleLater allocations
When an alloc is due to be rescheduleLater, it goes through the
reconciler twice: once to be ignored with a follow up evals, and once
again when processing the follow up eval where they appear as
rescheduleNow.

Here, we ignore them in the first run and mark them as stopped in second
iteration; rather than stop them twice.
2019-06-13 09:44:41 -04:00
Mahmood Ali
c62c246ad9 Stop allocs to be rescheduled
Currently, when an alloc fails and is rescheduled, the alloc desired
state remains as "run" and the nomad client may not free the resources.

Here, we ensure that an alloc is marked as stopped when it's
rescheduled.

Notice the Desired Status and Description before and after this change:

Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID                   = 02aba49e
Eval ID              = bb9ed1d2
Name                 = example-reschedule.nodes[0]
Node ID              = 5853d547
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = run
Desired Description  = <none>
Created              = 10s ago
Modified             = 5s ago
Replacement Alloc ID = d6bf872b

Task "payload" is "dead"
Task Resources
CPU        Memory          Disk     Addresses
0/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:12:45Z
Finished At    = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:12:50-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:12:50-04:00  Terminated      Exit Code: 1
2019-06-06T17:12:45-04:00  Started         Task started by client
2019-06-06T17:12:45-04:00  Task Setup      Building Task Directory
2019-06-06T17:12:45-04:00  Received        Task received by client

```

After:

```
ID                   = 5001ccd1
Eval ID              = 53507a02
Name                 = example-reschedule.nodes[0]
Node ID              = a3b04364
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 13s ago
Modified             = 3s ago
Replacement Alloc ID = 7ba7ac20

Task "payload" is "dead"
Task Resources
CPU         Memory          Disk     Addresses
21/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:22:50Z
Finished At    = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:22:55-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:22:55-04:00  Terminated      Exit Code: 1
2019-06-06T17:22:50-04:00  Started         Task started by client
2019-06-06T17:22:50-04:00  Task Setup      Building Task Directory
2019-06-06T17:22:50-04:00  Received        Task received by client
```
2019-06-06 17:27:12 -04:00
Alex Dadgar
5e67b37aad use int64 2018-10-16 15:34:32 -07:00
Preetha Appan
3ca71ae935 Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Preetha Appan
dfc76b3e1f Fix bug in reconciler where terminal allocs on a job already stopped were unnecessarily updated 2018-10-08 21:03:49 -05:00
Alex Dadgar
0f2f4797cb fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar
260b566c91 server 2018-09-15 16:23:13 -07:00
Alex Dadgar
14b65c3378 Failed/paused deployments do not block migrations
This PR changes behavior of the scheduler such that a task group with a
deployment that is failed or paused will not cause the scheduler to skip
migrations.

The reason for this change is that it causes a bad UX when draining
nodes with allocations that are part of a failed/paused deployment.
These operations should not be coupled in any way and this remedies
that.

Prior behavior was still correct, but required either jobs to
transistion to a healthy state or for the node to hit its drain
deadline.
2018-09-10 15:28:45 -07:00
Alex Dadgar
98c7abe541 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Preetha Appan
8e5909f073 make test create index clearer 2018-06-05 17:29:59 -05:00
Preetha Appan
65c08b76d3 Fix reconciler bug with deployment not being created if job create index is different
This fixes an issue where if a job is purged and resubmitted Nomad does not create
a new deployment. Adds unit test that failed before this fix
2018-06-05 13:58:53 -05:00
Preetha Appan
4f9d92cad3 fix test comment 2018-05-09 16:01:34 -05:00
Preetha Appan
268a99e71a Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00
Alex Dadgar
fc099e59ea Add test 2018-05-07 14:55:01 -05:00
Alex Dadgar
0e1fb91189 Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Alex Dadgar
686cff26d6 canary reschedule test 2018-05-07 14:50:01 -05:00
Alex Dadgar
588bf68d45 Test for rescheduling when there are canaries 2018-05-07 14:50:01 -05:00
Alex Dadgar
ff7b1bebcc Allow canary count greater than desired 2018-05-07 14:50:01 -05:00
Preetha Appan
32557a1a99 Only use DesiredTransition.Reschedule in reconciler when its an active deployment 2018-05-07 14:50:01 -05:00
Alex Dadgar
b8aa63a780 Add test where deployment is marked as complete when done even with failed allocs 2018-05-07 14:50:01 -05:00
Alex Dadgar
be3e3eadf6 fix reconcile tests 2018-05-07 14:50:01 -05:00
Preetha Appan
7ebf6032dc remove unnecessary check and other fixes from code review 2018-04-04 07:35:20 -05:00
Preetha Appan
aa4a0cff50 Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Preetha Appan
fc50ab930f Refactored for readability, pair programmed with @dadgar 2018-03-29 13:28:37 -05:00
Preetha Appan
fefbdd3178 Filter out allocs with DesiredState = stop, and unit tests 2018-03-29 09:28:52 -05:00
Preetha Appan
e2226ca2b7 s/linear/constant/g 2018-03-26 14:45:09 -05:00
Preetha
ce34e73864 Merge pull request #4037 from hashicorp/b-fix-terminal-filtering-service-allocs
Fix edge case in reconciler
2018-03-26 13:14:51 -05:00
Preetha Appan
e50ef09421 one field per line in struct definition 2018-03-26 13:13:21 -05:00
Alex Dadgar
f9e9b791c7 name and test 2018-03-26 11:06:21 -07:00
Alex Dadgar
7ce09fff59 Don't create unnecessary deployments 2018-03-23 16:55:21 -07:00
Preetha Appan
f401044600 Fix edge case in reconciler where service jobs with ClientstatusComplete were not replaced 2018-03-23 18:41:00 -05:00
Alex Dadgar
9c8203a5a4 Do not mark an allocation as an inplace update if specification hasn't changed 2018-03-23 14:36:05 -07:00
Alex Dadgar
45e7e88558 Fix deadline handling 2018-03-21 16:51:44 -07:00
Michael Schurter
832b1d5694 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar
48d637dad1 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Preetha Appan
d4056c4489 Rename DelayCeiling to MaxDelay 2018-03-14 16:10:32 -05:00
Preetha Appan
9628454d7a Scheduler and Reconciler changes to support delayed rescheduling 2018-03-14 16:10:32 -05:00
Preetha Appan
24c04d67d5 Fixes bug in reconciler where previously rescheduled allocs are rescheduled again. Simplified logic and added test case to catch this. 2018-02-20 12:07:56 -06:00
Preetha Appan
87d0523d55 Reconciler should consider failed allocs when marking deployment as failed. 2018-02-02 19:40:25 -06:00
Preetha Appan
aa1af00fbd Beef up unit test for rescheduling batch jobs 2018-01-31 09:56:53 -06:00
Preetha Appan
a49ad471f9 Address more code review feedback 2018-01-31 09:56:53 -06:00
Preetha Appan
c5f81b426f Make sure that reschedule trackers are not added for node drain replacements 2018-01-31 09:56:53 -06:00
Preetha Appan
0b6846873b Improve reconciler unit tests 2018-01-31 09:56:53 -06:00