Commit Graph

2987 Commits

Author SHA1 Message Date
Chelsea Holland Komlo
a40750e596 update comment for when the fingerprinter sets health status 2018-04-10 16:53:00 -04:00
Chelsea Holland Komlo
46ec4633fe fingerprinter should set health check status if health check is not periodic 2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
9eaa1e7c9e add setters for access to the fingerprint manager's node
refactor extracting driver info
2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
49e12b1ad2 guard against overwriting health status 2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
d3f0d05ac8 immediately set healthy to false when driver moves to undetected 2018-04-10 15:29:51 -04:00
Chelsea Komlo
4444a3309e Merge pull request #4109 from hashicorp/f-shorten-docker-health-timeout
Shorten docker health timeout
2018-04-09 15:38:39 -04:00
Chelsea Holland Komlo
c6cd78db59 only initialize docker clients if they are nil 2018-04-09 14:13:07 -04:00
Chelsea Holland Komlo
4c1c88a91c refactoring simplification from code review 2018-04-09 10:34:17 -04:00
Chelsea Holland Komlo
af8fc4f62c only run health check if driver moves from undetected to detected 2018-04-09 10:10:43 -04:00
Alex Dadgar
98a403a5a6 Start rebalance after discovering new servers 2018-04-05 15:41:59 -07:00
Alex Dadgar
b2ae8b73ef Merge pull request #4106 from hashicorp/b-servers
Improved Client handling of failed RPCs
2018-04-05 13:48:50 -07:00
Alex Dadgar
9ce59c5828 more jitter 2018-04-05 13:48:33 -07:00
Chelsea Holland Komlo
d251199432 group similar functions; update comments
health check timeout should be 1 minute
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
dee4fc4555 remove do once block when creating a new docker client
only set cached connections upon no error
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
45d09d1ef9 use client with shorter timeouts for health checks 2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
9092439107 refactor docker clients method to be able to extend to creating new clients 2018-04-05 16:19:02 -04:00
Alex Dadgar
c86ad8fa32 Handle no leader and faster retries near limit
Handle the ErrNoLeader case and apply slower retries. Also, when we have
missed the heartbeat, retry aggressively, backing off after we have
missed for more than 30 seconds.
2018-04-05 11:22:47 -07:00
Alex Dadgar
12a8655dbd Scale heartbeat retrying based on remaining heartbeat time 2018-04-05 10:58:13 -07:00
Alex Dadgar
80c380b456 Fire retry only when consul discovers new servers 2018-04-05 10:40:17 -07:00
Preetha
ff006877de Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
Fixes edge cases around timing/task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan
e81886d588 remove outdated commented out test code 2018-04-04 15:03:24 -05:00
Preetha Appan
8b6143f272 Remove old comment 2018-04-04 15:01:48 -05:00
Preetha Appan
7fa7655ebe Moves setting finishedAt to the right place and adds two unit tests. 2018-04-04 14:38:15 -05:00
Alex Dadgar
46e6d70435 Spelling error 2018-04-03 18:30:01 -07:00
Alex Dadgar
12ec2e3b60 RPC Retry Watcher 2018-04-03 18:05:28 -07:00
Preetha Appan
d8e975510a Add comment 2018-04-03 19:49:03 -05:00
Alex Dadgar
ca3b13e4c0 randomize servers 2018-04-03 17:46:13 -07:00
Preetha Appan
aa4a0cff50 Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar
16ec4481e3 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Alex Dadgar
1a66631eff remove generated files 2018-03-30 16:52:49 -07:00
Alex Dadgar
702a3be41e Generated files 2018-03-30 16:14:40 -07:00
Michael Schurter
2ee0426985 test: don't rely on alloc runner update count
We were incorrectly relying on the count of alloc updates in a number of
tests. Since alloc updates are async, their number is non-deterministic
and largely meaningless.

This should fix quite a few flaky tests in Travis and prevent future
mistaken assumptions in tests.
2018-03-30 09:34:33 -07:00
Michael Schurter
53a504c69c Merge pull request #4069 from hashicorp/f-hashealth
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar
357a10bcf4 Always capture the finish time 2018-03-29 11:27:22 -07:00
Michael Schurter
d09b0b62ba add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consolidate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo
00b358553d Merge pull request #4065 from hashicorp/emit-node-event-on-first-health-change
Emit first node event after initialization on health status change
2018-03-29 11:23:25 -04:00
Chelsea Holland Komlo
aeb744d930 add clarifying comment 2018-03-29 10:58:39 -04:00
Michael Schurter
35f42b1fca Merge pull request #4059 from hashicorp/b-drain-health-svc-only
only service allocs should have health watched
2018-03-28 16:49:22 -07:00
Michael Schurter
12dd17affe only service allocs should have health watched 2018-03-28 16:20:11 -07:00
Chelsea Holland Komlo
dff03f6a91 emit first node event 2018-03-28 17:26:53 -04:00
Chelsea Komlo
b26031b90d Merge pull request #4057 from hashicorp/specify-docker-msg
Specify docker name in driver health messages
2018-03-28 13:32:36 -04:00
Preetha
6f870b8bd7 Merge pull request #4052 from hashicorp/f-specify-total-memory
Allow to specify total memory on agent configuration
2018-03-28 12:28:41 -05:00
Chelsea Holland Komlo
cdfeac13a1 specify driver health messages 2018-03-28 11:35:21 -04:00
Preetha Appan
30d104251b Code review feedback and unit test 2018-03-28 10:07:15 -05:00
Charlie Voiselle
33e57bf5a3 rkt: logging enhancements (#4044)
* Added extra debug logging; extended timeout; added jitter.

* small log changes

* increase timeout

* remove unnecessary uuid
2018-03-27 17:30:06 -07:00
Michael Schurter
079f425c32 client: always mark exited sys/svc allocs as failed
When restarts.attempts=0 was set in a jobspec a system or service alloc
that exited with 0 status would be marked as `completed` instead of
`failed`. Since system and service jobs are intended to run until
stopped or updated, they should always be marked as failed when they
exit even in cases where the exit code is 0.
2018-03-27 14:30:19 -07:00
Mildred Ki'Lya
d31105c69e Allow to specify total memory on agent configuration
Allow to set the total memory of an agent in its configuration file. This
can be used in case the automatic detection doesn't work or in specific
environments when memory overcommit (using swap for example) can be
desirable.
2018-03-27 15:46:18 -05:00
Chelsea Holland Komlo
041786360e use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Alex Dadgar
d10e155e0f Fix alloc watcher snapshot streaming 2018-03-27 11:14:53 -07:00
Alex Dadgar
31b317b6ee drop stats fetching log 2018-03-23 12:01:50 -07:00