Commit Graph

2537 Commits

Author SHA1 Message Date
Michael Schurter
c1cf162d83 Merge pull request #3105 from hashicorp/f-876-restart-unhealthy
Restart unhealthy tasks
2017-09-17 19:38:32 -07:00
epipho
56d591b119 Fix incorrect docker stats 2017-09-16 00:43:03 -04:00
Michael Schurter
fa836d8ca5 Name const after what it represents 2017-09-15 14:57:18 -07:00
Michael Schurter
cde908e35d Cleanup and test restart failure code 2017-09-15 14:54:37 -07:00
Michael Schurter
80147622c5 Add comments 2017-09-15 14:34:36 -07:00
Michael Schurter
a508bb9709 Fold SetFailure into SetRestartTriggered 2017-09-14 16:48:39 -07:00
Michael Schurter
10dc1c7900 DRY up restart handling a bit.
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter
f8e872c855 RestartDelay isn't needed as checks are re-added on restarts
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter
568b963270 Remove unused lastStart field 2017-09-14 16:47:41 -07:00
Michael Schurter
526528c7c9 Removed partially implemented allocLock 2017-09-14 16:47:41 -07:00
Michael Schurter
3db835cb8f Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter
c2d895d47a Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter
ebbf87f979 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter
1608e59415 Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Alex Dadgar
98c47c72d0 changelog and feedback 2017-09-14 14:08:58 -07:00
Alex Dadgar
f23ac5f083 Non-locked accessors to common Node fields
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.

Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Komlo
0fff99488f Merge pull request #3191 from hashicorp/b-tagged-metrics-panic
Fix panic in emitting tagged allocation metrics
2017-09-11 14:28:50 -04:00
Armon Dadgar
c038c410b3 Merge pull request #3185 from hashicorp/f-acl-reset
Add ability to reset ACL bootstrap process
2017-09-11 10:47:17 -07:00
Armon Dadgar
3fc9dce13b Address @dadgar feedback 2017-09-11 10:30:59 -07:00
Alex Dadgar
1bed4b41d9 Merge pull request #3187 from hashicorp/b-windows-docker
Fix MemorySwappiness on Windows Docker
2017-09-11 09:56:49 -07:00
Alex Dadgar
49c4189758 Merge pull request #3184 from hashicorp/b-docker-logging
Fix docker user specified syslogging
2017-09-11 09:31:33 -07:00
Chelsea Holland Komlo
1ecfb687bf fix panic in emitting tagged metrics 2017-09-11 15:32:37 +00:00
Alex Dadgar
7148b65306 Fix MemorySwappiness on Windows Docker
Fixes https://github.com/hashicorp/nomad/issues/3181
2017-09-10 17:46:45 -07:00
Alex Dadgar
db261cd0c7 Fix invalid CPU stats on Windows
This PR fixes an issue introduced in Nomad 0.6.0 due to
https://github.com/shirou/gopsutil/issues/420. The issue arised from the
fact that the Windows stats from gopsutil reports CPUs in
percentages where we expected ticks.
2017-09-10 15:30:48 -07:00
Alex Dadgar
9206105e69 Fix docker user specified syslogging 2017-09-10 14:57:48 -07:00
James Nugent
3a5082022d client: Guard against "NaN" values from floats
This commit protects against finding `0.NaN` tokens in JSON streams
because of infinity representation on serialization.
2017-09-08 16:21:07 -05:00
Alex Dadgar
d8d4fb877b Merge pull request #3148 from clinta/purge-stopped
Always purge stopped containers
2017-09-05 17:18:05 -07:00
Alex Dadgar
4cab1781f6 Fix repo name passed to docker credential helpers
This PR fixes the server url passed to docker credential helpers and
fixes stderr capture.

Fixes https://github.com/hashicorp/nomad/issues/2957
2017-09-05 16:43:21 -07:00
Alex Dadgar
b9f51ce61c Parse Docker mounts correctly (#3163)
* Parse Docker mounts correctly

This PR fixes the parsing of Docker mounts and adds testing to ensure no
regressions.

Fixes https://github.com/hashicorp/nomad/issues/3156

* Review feedback
2017-09-05 14:02:57 -07:00
Chelsea Holland Komlo
68686cd69a final code review fixups 2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo
ef4aef0223 fix up travis test failure via race condition 2017-09-05 15:04:59 +00:00
Chelsea Holland Komlo
681a3f337a fixups from code review 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
3c0710074c labels depend on full setup of client beforehand 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
fce72a1bc9 refactor to use baseLabels 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
a6eeede7e2 pass in commonly used values 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
50ab667799 create base labels to be used in every metric 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
7a96f92290 emit metrics using labels, add option for backwards compatibility 2017-09-05 14:12:57 +00:00
Chelsea Holland Komlo
5efb4fb214 add metrics options to client config 2017-09-05 14:12:57 +00:00
Armon Dadgar
e9790c63b4 ACL RPCs allow stale reads for scalability 2017-09-04 13:07:44 -07:00
Armon Dadgar
33f640dc38 client: fixing policy resolution after ACL endpoint enforcement 2017-09-04 13:05:53 -07:00
Armon Dadgar
0fcf618dfc Add ErrPermissionDenied, rename TokenNotFound 2017-09-04 13:05:53 -07:00
Armon Dadgar
bda7b36da3 Address @dadgar feedback 2017-09-04 13:05:53 -07:00
Armon Dadgar
5b43ea4bff client: adding token resolution logic 2017-09-04 13:05:36 -07:00
Armon Dadgar
fb118b2dfb client: adding token cache for ACL resolution 2017-09-04 13:05:36 -07:00
Armon Dadgar
1da443f29a client: create ACL and Policy cache 2017-09-04 13:05:35 -07:00
Armon Dadgar
0dc6a1a9c7 agent: thread ACL config to client 2017-09-04 13:04:45 -07:00
Clint Armstrong
786c09f7e4 Always purge stopped containers 2017-08-31 14:28:48 -04:00
Clint Armstrong
cfffce07a0 fix logging re-init 2017-08-30 12:36:31 -04:00
Michael Schurter
7a84cbe02a Squelch logspam when unable to get disk usage stats
To reproduce logspam:

```
$ docker plugin install --grant-all-permissions vieux/sshfs
$ nomad agent -dev
...
2017/08/25 17:09:03.282868 [WARN] client: error fetching host disk usage stats for /var/lib/docker/plugins/a8b4a69b07e5180f828d19e1e9e102ccc0e26f9c9939eaef85357260c30b20a7/rootfs/mnt/volumes: permission denied
... repeats every collection period ...
```
2017-08-28 12:04:32 -07:00
Alex Dadgar
ba1eecbf7f Merge pull request #3073 from clinta/docker-500
Allow retry of 500 API errors to be handled by restart policies
2017-08-24 16:57:36 -07:00