8715 Commits

Author SHA1 Message Date
Michael Schurter
a508bb9709 Fold SetFailure into SetRestartTriggered 2017-09-14 16:48:39 -07:00
Michael Schurter
6f72270d13 Test check watch updates 2017-09-14 16:48:39 -07:00
Michael Schurter
3c0a42ba8e Rename unhealthy var and fix test indeterminism 2017-09-14 16:48:39 -07:00
Michael Schurter
10dc1c7900 DRY up restart handling a bit.
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter
5cd1d57218 Watched -> TriggersRestart
Watched was a silly name
2017-09-14 16:48:39 -07:00
Michael Schurter
40ed2625f0 Handle multiple failing checks on a single task
Before this commit if a task had 2 checks cause restarts at the same
time, both would trigger restarts of the task! This change removes all
checks for a task whenever one of them is restarted.
2017-09-14 16:48:39 -07:00
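The behaviour described in 40ed2625f0 can be sketched as follows. This is a minimal illustration, not the code from the commit: the `watcher` and `checkWatch` types, the `taskKey` field, and the `handleUnhealthy` and `removeTask` names are all hypothetical. It shows why removing every check for a task on the first failure leaves a second, simultaneous failure with nothing to act on.

```go
package main

import "fmt"

// checkWatch is a hypothetical record tying a check to the task it guards.
type checkWatch struct {
	checkID string
	taskKey string
}

// watcher is a hypothetical, simplified stand-in for a check watcher.
type watcher struct {
	checks map[string]checkWatch // keyed by check ID
}

// removeTask drops every watched check that belongs to taskKey and
// returns how many were removed.
func (w *watcher) removeTask(taskKey string) int {
	removed := 0
	for id, c := range w.checks {
		if c.taskKey == taskKey {
			delete(w.checks, id) // deleting while ranging over a map is safe in Go
			removed++
		}
	}
	return removed
}

// handleUnhealthy restarts the task that owns checkID and stops watching
// all of that task's checks. It reports whether a restart was triggered.
func (w *watcher) handleUnhealthy(checkID string, restart func(taskKey string)) bool {
	c, ok := w.checks[checkID]
	if !ok {
		// The check was already removed, e.g. by a sibling check's restart.
		return false
	}
	w.removeTask(c.taskKey)
	restart(c.taskKey)
	return true
}

func main() {
	w := &watcher{checks: map[string]checkWatch{
		"c1": {"c1", "web"},
		"c2": {"c2", "web"},
		"c3": {"c3", "db"},
	}}
	restarts := 0
	restart := func(string) { restarts++ }

	// Two checks on the same task fail at the same time.
	w.handleUnhealthy("c1", restart)
	w.handleUnhealthy("c2", restart)

	fmt.Println(restarts, len(w.checks)) // prints "1 1"
}
```

In this sketch the checks would be watched again only when something re-adds them, which matches the note in f8e872c855 that checks are re-registered when the task restarts.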
Michael Schurter
f8e872c855 RestartDelay isn't needed as checks are re-added on restarts
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter
237c096661 Simplify from 2 select loops to one 2017-09-14 16:48:39 -07:00
Michael Schurter
8b8c164622 Wrap check watch updates in a struct
Reusing checkRestart for both adds/removes and the main check restarting
logic was confusing.
2017-09-14 16:48:39 -07:00
Michael Schurter
092057a32b Canonicalize and Merge CheckRestart in api 2017-09-14 16:48:39 -07:00
Michael Schurter
9fb28656c9 Fix whitespace 2017-09-14 16:47:41 -07:00
Michael Schurter
568b963270 Remove unused lastStart field 2017-09-14 16:47:41 -07:00
Michael Schurter
526528c7c9 Removed partially implemented allocLock 2017-09-14 16:47:41 -07:00
Michael Schurter
3db835cb8f Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter
850d991286 Add changelog entry for #3105 2017-09-14 16:47:41 -07:00
Michael Schurter
7e103f69cb Document new check_restart stanza 2017-09-14 16:46:54 -07:00
Michael Schurter
78c72f8725 Default grace period to 1s 2017-09-14 16:46:54 -07:00
Michael Schurter
c2d895d47a Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter
555d1e24dc on_warning=false -> ignore_warnings=false
Treat warnings as unhealthy by default
2017-09-14 16:46:54 -07:00
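A rough sketch of the semantics in 555d1e24dc, assuming Consul's three check statuses ("passing", "warning", "critical"). The `unhealthy` function and its parameter names are hypothetical, and treating an unrecognised status as unhealthy is an assumption made for this example only.

```go
package main

import "fmt"

// Consul check statuses.
const (
	statusPassing  = "passing"
	statusWarning  = "warning"
	statusCritical = "critical"
)

// unhealthy reports whether a check status should count towards a restart.
// With ignoreWarnings=false (the default named in the commit), warnings are
// treated as unhealthy.
func unhealthy(status string, ignoreWarnings bool) bool {
	switch status {
	case statusPassing:
		return false
	case statusWarning:
		return !ignoreWarnings
	case statusCritical:
		return true
	default:
		// Assumption for this sketch: unknown statuses count as unhealthy.
		return true
	}
}

func main() {
	fmt.Println(unhealthy(statusWarning, false)) // prints "true"
	fmt.Println(unhealthy(statusWarning, true))  // prints "false"
}
```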
Michael Schurter
ebbf87f979 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter
1608e59415 Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Michael Schurter
bd1a342a92 Nest restart fields in CheckRestart 2017-09-14 16:46:54 -07:00
Michael Schurter
a720bb5a91 Add restart fields 2017-09-14 16:46:54 -07:00
Chelsea Komlo
0ec93d7b2b Merge pull request #3213 from hashicorp/f-acl-job-summary
Add job endpoint ACL
2017-09-14 18:21:19 -04:00
Alex Dadgar
15be1566b8 Merge pull request #3217 from hashicorp/b-batch-filter
Fix batch handling of complete allocs/node drains
2017-09-14 15:11:40 -07:00
Alex Dadgar
9b594b4705 changelog 2017-09-14 15:11:26 -07:00
Alex Dadgar
04d86ffd10 Fix batch handling of complete allocs/node drains
This PR fixes:
* An issue in which a node-drain that contains a complete batch alloc
  would cause a replacement
* An issue in which allocations with the same name during a scale
  down/stop event wouldn't be properly stopped.
* An issue in which batch allocations from previous job versions may not
  have been stopped properly.

Fixes https://github.com/hashicorp/nomad/issues/3210
2017-09-14 15:08:57 -07:00
Alex Dadgar
ca07d16510 Changelog 2017-09-14 14:35:53 -07:00
Alex Dadgar
c2d2da1df7 Update CHANGELOG.md 2017-09-14 14:34:02 -07:00
Alex Dadgar
04a04bec52 Merge pull request #3206 from hashicorp/b-eval-index
Worker waits til max ModifyIndex across EvalsByJob
2017-09-14 14:29:32 -07:00
Alex Dadgar
eae982d9f2 Changelog 2017-09-14 14:29:02 -07:00
Alex Dadgar
c3cca843b5 Address feedback 2017-09-14 14:28:43 -07:00
Alex Dadgar
4b4e376d7e Worker waits til max ModifyIndex across EvalsByJob
This PR fixes a scheduling race condition in which the plan results from
one invocation of the scheduler were not being considered by the next
since the Worker was not waiting for the correct index.

Fixes https://github.com/hashicorp/nomad/issues/3198
2017-09-14 14:28:43 -07:00
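The idea in 4b4e376d7e's title, waiting for the largest ModifyIndex across all of a job's evaluations rather than for a single evaluation's index, might look roughly like this. The `eval` type and `maxModifyIndex` helper are hypothetical and stand in for whatever the Worker actually uses.

```go
package main

import "fmt"

// eval is a hypothetical, stripped-down evaluation record.
type eval struct {
	ID          string
	ModifyIndex uint64
}

// maxModifyIndex returns the largest ModifyIndex among evals, or 0 if
// evals is empty. A worker would wait for its local state to reach at
// least this index before invoking the scheduler.
func maxModifyIndex(evals []eval) uint64 {
	var max uint64
	for _, e := range evals {
		if e.ModifyIndex > max {
			max = e.ModifyIndex
		}
	}
	return max
}

func main() {
	// Evaluations belonging to the same job, in arbitrary order.
	evals := []eval{
		{ID: "e1", ModifyIndex: 100},
		{ID: "e2", ModifyIndex: 140},
		{ID: "e3", ModifyIndex: 120},
	}
	fmt.Println(maxModifyIndex(evals)) // prints "140"
}
```

Waiting only for e1's index (100) in this example would let the scheduler run against state that does not yet include the writes made at index 140, which is the kind of race the commit message describes.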
Alex Dadgar
a81ab20f06 Merge pull request #3214 from hashicorp/f-agent-servers
Sort /v1/agent/servers output
2017-09-14 14:22:00 -07:00
Alex Dadgar
083aca50f6 changelog 2017-09-14 14:21:41 -07:00
Alex Dadgar
d0a9389a27 use assert 2017-09-14 14:20:22 -07:00
Alex Dadgar
e25dff5a28 Sort /v1/agent/servers output
This PR sorts the output of the endpoint since its results are used as
part of Consul checks to avoid the value changing unnecessarily.

Fixes https://github.com/hashicorp/nomad/issues/3211
2017-09-14 14:20:22 -07:00
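A small sketch of why sorting matters here: if a consumer compares the endpoint's output between polls, two responses listing the same servers in a different order look like a change. The `sortedServers` helper and the addresses below are illustrative only and are not taken from the commit.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedServers returns a sorted copy of servers so that the same set of
// addresses always serialises identically. The input slice is not modified.
func sortedServers(servers []string) []string {
	out := make([]string, len(servers))
	copy(out, servers)
	sort.Strings(out)
	return out
}

func main() {
	// The same set of servers, observed in two different orders.
	first := []string{"10.0.0.2:4647", "10.0.0.1:4647", "10.0.0.3:4647"}
	second := []string{"10.0.0.3:4647", "10.0.0.2:4647", "10.0.0.1:4647"}

	a := strings.Join(sortedServers(first), ",")
	b := strings.Join(sortedServers(second), ",")
	fmt.Println(a == b) // prints "true"
}
```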
Alex Dadgar
f9a914b7e2 Merge pull request #3195 from hashicorp/b-node-locking
Non-locked accessors to common Node fields
2017-09-14 14:09:35 -07:00
Alex Dadgar
98c47c72d0 changelog and feedback 2017-09-14 14:08:58 -07:00
Alex Dadgar
f23ac5f083 Non-locked accessors to common Node fields
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.

Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo
957e2607cf fixups from code review 2017-09-14 20:14:38 +00:00
Chelsea Holland Komlo
5ba6f0c2a8 use separate response object 2017-09-14 19:17:05 +00:00
Chelsea Holland Komlo
b3e9eb23d3 update to use ACL test helpers 2017-09-14 19:08:25 +00:00
Chelsea Holland Komlo
0ae023f3d4 add job endpoint ACL 2017-09-14 18:17:35 +00:00
Alex Dadgar
2a3b65bb66 Merge pull request #3209 from dezmodue/patch-1
Adding missing <
2017-09-14 10:53:26 -07:00
Alex Dadgar
58f3a29d07 Merge pull request #3205 from hashicorp/f-deployment-acl
Deployment.GetDeployment ACL enforcement
2017-09-14 10:50:17 -07:00
Alex Dadgar
5fc7e11bd5 review feedback 2017-09-14 10:50:04 -07:00
Alex Dadgar
a88ce5439d fix multierror merge 2017-09-13 21:48:52 -07:00
Alex Dadgar
3777592079 changelog 2017-09-13 15:46:41 -07:00
Alex Dadgar
74d30d4412 Merge pull request #3203 from hashicorp/b-search-hyphens
Fix UUID search with hyphens
2017-09-13 15:45:22 -07:00