2394 Commits

Author SHA1 Message Date
Michael Schurter
9150135b50 Use broadcast send retry logic everywhere 2017-07-18 14:36:32 -07:00
Alex Dadgar
4f376d08ed Merge pull request #2853 from hashicorp/b-watcher
Improve alloc health watcher
2017-07-18 14:12:28 -07:00
Alex Dadgar
459ddf63ec Save deployment status 2017-07-18 12:37:52 -07:00
Alex Dadgar
386557da73 Small fixes 2017-07-18 12:19:57 -07:00
Michael Schurter
8c4b760803 Fix deadlock caused by syncing during destroy
When replacing an alloc the new alloc is blocked until the old alloc is
destroyed. This could cause a deadlock:

1. Destroying the old alloc includes a final sync of its status
2. Syncing status causes a GC
3. A GC looks for terminal allocs to clean up
4. The GC waits for an alloc to stop completely before GC'ing

If the GC chooses the currently-being-destroyed alloc to GC, the GC
deadlocks. If `client.max_parallel` deadlocks happen, the GC is wedged
until the Nomad process is restarted.

Performing the final sync asynchronously is an ugly hack, but it prevents
the deadlock by allowing the final sync to occur after the alloc runner
has shut down and been destroyed.
2017-07-18 11:12:56 -07:00
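
The deadlock and its fix described in the commit above can be sketched roughly as follows. This is a minimal illustration, not Nomad's actual code: the `runner` type, its `destroyed` channel, and the `syncFn` callback are hypothetical names invented for the example. The "GC" here is simply a sync callback that waits for the runner to be fully destroyed.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for an alloc runner (not Nomad's type).
type runner struct {
	destroyed chan struct{} // closed once the runner has fully stopped
	syncFn    func()        // hypothetical status sync that may trigger a GC
	wg        sync.WaitGroup
}

func newRunner() *runner {
	return &runner{destroyed: make(chan struct{})}
}

// destroy runs the final sync in its own goroutine. If syncFn were called
// inline instead, a GC that waits on r.destroyed would block forever,
// because destroyed is only closed after the sync returns.
func (r *runner) destroy() {
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		r.syncFn()
	}()
	close(r.destroyed)
}

// demo reports whether the sync-triggered "GC" completed.
func demo() bool {
	gcDone := false
	r := newRunner()
	r.syncFn = func() {
		<-r.destroyed // the GC waits for the alloc to stop completely
		gcDone = true
	}
	r.destroy()
	r.wg.Wait() // wait for the asynchronous final sync to finish
	return gcDone
}

func main() {
	fmt.Println(demo())
}
```

Moving the sync off the destroy path means the runner no longer waits on work that is itself waiting for the runner to finish.
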
Michael Schurter
b0eae2f002 Test AllocDir.Copy 2017-07-17 15:46:54 -07:00
Michael Schurter
dc5ea4acb9 Add AllocRunner.allocID for ease-of-use
Since the AllocRunner.alloc struct can be mutated, most of AllocRunner
needs to acquire a lock to get the alloc's ID. Log lines always need to
include the alloc ID, so we often skipped acquiring a lock just to grab
the ID and accepted the race.

Let's make the race detector a little happier by storing the ID in a
single-assignment field.
2017-07-17 15:46:54 -07:00
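
The single-assignment field idea from the commit above might look something like this. The types below are simplified, hypothetical stand-ins rather than Nomad's real `AllocRunner`: the point is only that a field written once in the constructor can be read without taking the lock that guards the mutable alloc.

```go
package main

import (
	"fmt"
	"sync"
)

// alloc is a hypothetical, simplified allocation record.
type alloc struct {
	ID     string
	Status string
}

// allocRunner is a hypothetical sketch, not Nomad's AllocRunner.
type allocRunner struct {
	// allocID is set once in newAllocRunner and never written again,
	// so it is safe to read from any goroutine without locking.
	allocID string

	mu    sync.Mutex
	alloc *alloc // mutable; guarded by mu
}

func newAllocRunner(a *alloc) *allocRunner {
	return &allocRunner{allocID: a.ID, alloc: a}
}

// update replaces the alloc under the lock.
func (r *allocRunner) update(a *alloc) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.alloc = a
}

// logLine formats a log message using only the immutable ID.
func (r *allocRunner) logLine(msg string) string {
	return fmt.Sprintf("alloc %q: %s", r.allocID, msg)
}

func main() {
	r := newAllocRunner(&alloc{ID: "abc"})
	r.update(&alloc{ID: "abc", Status: "running"})
	fmt.Println(r.logLine("started"))
}
```
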
Michael Schurter
802a99749c Fix log level 2017-07-17 15:46:54 -07:00
Michael Schurter
427a0ae1db Don't fail if task dirs don't exist on creation
Task dir metadata is created in AllocRunner.Run which may not run before
an alloc is sync'd and Nomad exits. There's no reason not to just create
task dir metadata on restore if it doesn't exist.
2017-07-17 15:46:54 -07:00
Michael Schurter
12d9e91f65 Ensure allocDir is never nil and persisted safely
Fixes #2834
2017-07-17 15:46:54 -07:00
Alex Dadgar
2b917380eb Fix alloc broadcaster panic on double close 2017-07-17 14:09:05 -07:00
Michael Schurter
56f697580b Fix nil panic in Docker error condition
Fixes #2835

Yet another bug caused by overwriting container and then trying to
reference container.ID in the err handling block. Did a quick audit of
docker.go and it seems to be the last offender. See #2804 for previous
bug.
2017-07-14 10:48:19 -07:00
Alex Dadgar
d06fa455b7 Small fixes 2017-07-07 17:34:50 -07:00
Michael Schurter
0ce0973d3a Merge pull request #2793 from hashicorp/b-2776-ct-vault-servername
Propagate vault.tls_server_name to consul-template
2017-07-07 16:44:19 -07:00
Michael Schurter
ef98449b07 Merge pull request #2787 from hashicorp/f-docker-test-mac
Test #2652 - Docker MAC Address option
2017-07-07 16:22:10 -07:00
Michael Schurter
5ab252fe43 Merge pull request #2797 from hashicorp/f-2785-docker-bridge-ip
Add driver.docker.bridge_ip node attribute
2017-07-07 16:20:20 -07:00
Michael Schurter
024d5a8edc Remove debug logging 2017-07-07 16:19:42 -07:00
Michael Schurter
faa4da7c29 Merge pull request #2804 from hashicorp/b-2802-docker-panic
Don't panic in container list/remove/inspect race
2017-07-07 15:35:51 -07:00
Michael Schurter
c47860f928 Don't panic in container list/remove/inspect race
Fixes #2802

While it's hard to reproduce, the theoretical race is:

1. This goroutine calls ListContainers()
2. Another goroutine removes a container X
3. This goroutine attempts to InspectContainer(X)

However, this bug could be hit in the much simpler case of
InspectContainer() timing out.

In those cases an error is returned and the old code attempted to wrap
the error with the now-nil container.ID. Storing the container ID fixes
that panic.
2017-07-07 15:10:59 -07:00
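
The pattern behind this panic, and the fix of storing the container ID first, can be illustrated with a small sketch. The `container` type and `inspect` function are hypothetical and do not reflect the Docker client API that Nomad uses; they only mimic a call that returns a nil pointer alongside an error.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in for a Docker container record.
type container struct {
	ID string
}

// inspect is a hypothetical lookup that returns (nil, err) on failure,
// for example when the container was removed or the call timed out.
func inspect(id string) (*container, error) {
	if id == "gone" {
		return nil, errors.New("no such container")
	}
	return &container{ID: id}, nil
}

// refresh re-inspects a container. The ID is saved before c is
// overwritten, so the error path never dereferences a nil pointer.
func refresh(c *container) (*container, error) {
	id := c.ID
	var err error
	c, err = inspect(id)
	if err != nil {
		// c is nil here; using c.ID instead of id would panic.
		return nil, fmt.Errorf("failed to inspect container %s: %v", id, err)
	}
	return c, nil
}

func main() {
	_, err := refresh(&container{ID: "gone"})
	fmt.Println(err)
}
```
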
Alex Dadgar
3beaafca9a Vet and small improvement on watcher failure detection 2017-07-07 14:53:01 -07:00
Alex Dadgar
beb01f1754 test fixes 2017-07-07 14:11:27 -07:00
Alex Dadgar
e1c631064a @jippi Changed my mind! Good suggestion 2017-07-07 12:12:48 -07:00
Alex Dadgar
b8ba29bf93 Warn log 2017-07-07 12:10:04 -07:00
Alex Dadgar
f72bbaa370 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar
d165f65013 watcher per alloc 2017-07-07 12:07:08 -07:00
Alex Dadgar
8e58ddcceb Update index 2017-07-07 12:07:08 -07:00
Alex Dadgar
da82a6e814 initial watcher 2017-07-07 12:07:08 -07:00
Alex Dadgar
85e0d6fccd assign names 2017-07-07 12:03:11 -07:00
Michael Schurter
38bf16d2ef Add driver.docker.bridge_ip node attribute
Fixes #2785
2017-07-07 10:14:10 -07:00
Michael Schurter
3bf148bc05 Propagate vault.tls_server_name to consul-template
Fixes #2776
2017-07-06 16:56:50 -07:00
Michael Schurter
4f150e1650 Merge pull request #2786 from hashicorp/f-docker-auth-soft-fail
Default to auth hard fail but optionally soft fail
2017-07-06 13:25:56 -07:00
Michael Schurter
eaab2b2eee Test #2652
Also clean up docker config opts docs
2017-07-06 12:46:25 -07:00
Michael Schurter
58186bfb88 Merge branch 'master' into master 2017-07-06 12:09:36 -07:00
Michael Schurter
3aae173432 Default to auth hard fail but optionally soft fail 2017-07-06 11:35:34 -07:00
Michael Schurter
d4c90fea1e Merge pull request #2781 from hashicorp/f-2678-getter-mode
Add support for go-getter modes
2017-07-06 11:06:40 -07:00
Michael Schurter
c5e9c4b7b0 Merge pull request #2744 from aep/master
Do not fail when no docker registry auth is available
2017-07-06 11:04:11 -07:00
Michael Schurter
450e347708 Add support for go-getter modes
Fixes #2678
2017-07-06 10:45:44 -07:00
Michael Schurter
2b97f61ac0 Consistently quote alloc ids in client logs 2017-07-06 10:24:52 -07:00
Michael Schurter
4794de99fd Tiny client race condition fix
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter
9eb1a87c47 rkt: use %s instead of %q when interpolating env
Fixes #2686
2017-07-05 09:36:17 -07:00
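
The difference between the two verbs is easy to see in isolation. The sketch below only demonstrates Go's `fmt` behaviour; the helper names and the `KEY=VALUE` shape are assumptions for illustration, not the rkt driver's actual code. When arguments are passed to a process directly rather than through a shell, quotes added by `%q` become part of the value.

```go
package main

import "fmt"

// quotedEnv formats the value with %q, which adds Go-style quotes.
func quotedEnv(k, v string) string {
	return fmt.Sprintf("%s=%q", k, v)
}

// plainEnv formats the value with %s, leaving it unchanged.
func plainEnv(k, v string) string {
	return fmt.Sprintf("%s=%s", k, v)
}

func main() {
	fmt.Println(quotedEnv("GREETING", "hello world")) // GREETING="hello world"
	fmt.Println(plainEnv("GREETING", "hello world"))  // GREETING=hello world
}
```
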
Michael Schurter
ca38020521 0 compute == error 2017-07-03 14:51:02 -07:00
Michael Schurter
c10f530964 Fix cpu_total_compute override 2017-07-03 14:51:02 -07:00
Michael Schurter
89e5971bc7 Merge pull request #2732 from hashicorp/b-persist-alloc-updates
Persist Alloc when EvalID changes
2017-07-03 14:46:43 -07:00
Michael Schurter
02691c988b Merge pull request #2763 from hashicorp/f-bad-state-help
Add more logging to restore state errors
2017-07-03 14:45:03 -07:00
Michael Schurter
dcf30f984a Merge pull request #2753 from hashicorp/b-leader-dies-first
Destroy task group leader first
2017-07-03 14:38:04 -07:00
Michael Schurter
6b3ae9acd8 Merge pull request #2709 from hashicorp/f-advertise-docker-ips
Advertise driver-specific addresses
2017-07-03 14:04:12 -07:00
Michael Schurter
11863660a0 Destroy task group leader first
Before this commit all tasks in a task group were destroyed
concurrently. This meant logging sidecars might be stopped before the
leader task whose logs still need to be shipped.

This commit blocks on the leader shutting down before signalling
followers to shut down.
2017-07-03 13:56:56 -07:00
Michael Schurter
e71673e24b Suggest wiping out alloc dir too 2017-07-03 12:29:21 -07:00
Michael Schurter
6b9af8fcc3 Add more logging to restore state errors 2017-07-03 11:58:41 -07:00
Arvid E. Picciani
1699761874 Do not fail when no docker registry auth is available
This amends the behaviour introduced with #2651
and allows pulling public images when docker.auth.helper is set.
2017-06-27 11:11:18 +02:00