2427 Commits

Author SHA1 Message Date
Alex Dadgar
8c9234e319 Make test Vault pick random ports 2017-07-25 17:40:59 -07:00
Michael Schurter
fd1d8a9e1d Don't attempt to restore tasks that never sync'd 2017-07-24 15:58:46 -07:00
Alex Dadgar
80c4b03f07 fix vet 2017-07-22 22:43:33 -07:00
Alex Dadgar
fdd77dcaa0 travis check fixes 2017-07-22 21:01:22 -07:00
Alex Dadgar
70040f2574 fingerprinters 2017-07-22 20:38:03 -07:00
Alex Dadgar
e048068f92 fix slow resolve on mac 2017-07-22 19:58:30 -07:00
Alex Dadgar
4c89212f0a drop rkt deadline 2017-07-22 19:54:06 -07:00
Alex Dadgar
4c7c3c45e2 Merge branch 'master' of github.com:hashicorp/nomad 2017-07-22 19:48:54 -07:00
Alex Dadgar
22c5999c09 darwin test fixes 2017-07-22 19:48:47 -07:00
Alex Dadgar
08c2ba9bc6 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependent

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar
5d4b0ab016 typo 2017-07-22 12:55:30 -07:00
Alex Dadgar
f30e5a5984 typo 2017-07-22 12:33:07 -07:00
Alex Dadgar
3cb16aa9a8 small fixes 2017-07-22 12:25:02 -07:00
Alex Dadgar
b6451f2d07 Merge pull request #2888 from hashicorp/b-fix-allocrunner-test
Fix TestAllocRunner_TaskLeader_StopTG and unrelated races
2017-07-22 11:44:04 -07:00
Alex Dadgar
82dd0fad5a faster vaultclient 2017-07-21 19:38:37 -07:00
Michael Schurter
a96fc052dd Fix tr race by not sharing alloc/task
prestart only needs the original alloc/task so pass their pointers in.
Task updates may concurrently replace the pointer on tr.
2017-07-21 16:17:42 -07:00
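
The fix described in a96fc052dd can be illustrated with a minimal sketch. It is not the Nomad code: `taskRunner`, `task`, `update`, `prestart`, and `run` are hypothetical stand-ins. The assumption is that the race came from a prestart goroutine reading a pointer field that updates may replace; capturing the original pointer once and passing it in removes the shared access.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical stand-in for a task definition.
type task struct{ Name string }

// taskRunner is a hypothetical stand-in for a task runner whose task
// pointer may be replaced by concurrent updates.
type taskRunner struct {
	mu   sync.Mutex
	task *task // guarded by mu
}

// update replaces the task pointer under the lock.
func (tr *taskRunner) update(t *task) {
	tr.mu.Lock()
	tr.task = t
	tr.mu.Unlock()
}

// prestart works only on the pointer it was handed and never reads tr.task,
// so it cannot race with update.
func (tr *taskRunner) prestart(orig *task, done chan<- string) {
	done <- orig.Name
}

// run captures the original task once, starts prestart with it, and then
// applies an update while prestart may still be running.
func (tr *taskRunner) run() string {
	tr.mu.Lock()
	orig := tr.task
	tr.mu.Unlock()

	done := make(chan string, 1)
	go tr.prestart(orig, done)
	tr.update(&task{Name: "updated"})
	return <-done
}

func main() {
	tr := &taskRunner{task: &task{Name: "web"}}
	fmt.Println("prestart saw:", tr.run())
}
```

Running a sketch like this under `go run -race` is one way to confirm that the goroutine no longer touches the shared field.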
Michael Schurter
96baafebd3 Minor test race fix 2017-07-21 16:17:23 -07:00
Michael Schurter
2569c58cb7 Fix race by not accessing tr.task from ar 2017-07-21 16:16:53 -07:00
Michael Schurter
cf62d02378 Remove unneeded saveTaskRunnerState method
Collapse it into the one place it's called
2017-07-21 16:16:02 -07:00
Michael Schurter
a59d3a80ba Fix test race by locking around ar.tasks access 2017-07-21 14:25:51 -07:00
Michael Schurter
96127527a1 Fix handle race 2017-07-21 14:00:32 -07:00
Michael Schurter
a04f5016a5 Fix more test races 2017-07-21 14:00:21 -07:00
Michael Schurter
8fa599c4a5 Fixup a few more even rarer test races 2017-07-21 13:43:32 -07:00
Michael Schurter
55713e2a61 Always interpolate task before calling with Consul
Also switch to returning a copy of the task to avoid races between
altering the Task and persistence.
2017-07-21 13:37:16 -07:00
Michael Schurter
3974dfa98c Fix TestAllocRunner_TaskLeader_StopTG
Also make alloc runner tests less racy. Basically every alloc runner
test used to have races with `upd.{Count,Allocs}`
2017-07-21 13:37:16 -07:00
Alex Dadgar
d4e35815a1 executor and logging pkg 2017-07-21 12:14:54 -07:00
Alex Dadgar
789a5072a9 Parallel 2017-07-21 12:06:39 -07:00
Alex Dadgar
700147c90e Speed up client startup 2017-07-20 22:34:24 -07:00
Michael Schurter
a7a830a980 Merge pull request #2878 from hashicorp/b-save-state
Fix state handling on restart
2017-07-20 17:16:46 -07:00
Karel Malec
bafe9276ec Pass task group name as NOMAD_GROUP_NAME environment variable 2017-07-21 01:22:54 +02:00
Alex Dadgar
bb958ba745 Destroy tasks that are part of terminal alloc 2017-07-20 12:02:04 -07:00
Michael Schurter
738321efa3 Don't save task runner state if it is destroyed 2017-07-20 10:17:41 -07:00
Alex Dadgar
ae2ac8ab58 Should not persist state after alloc_runner is garbage collected 2017-07-19 17:31:30 -07:00
Michael Schurter
9150135b50 Use broadcast send retry logic everywhere 2017-07-18 14:36:32 -07:00
Alex Dadgar
4f376d08ed Merge pull request #2853 from hashicorp/b-watcher
Improve alloc health watcher
2017-07-18 14:12:28 -07:00
Alex Dadgar
459ddf63ec Save deployment status 2017-07-18 12:37:52 -07:00
Alex Dadgar
386557da73 Small fixes 2017-07-18 12:19:57 -07:00
Michael Schurter
8c4b760803 Fix deadlock caused by syncing during destroy
When replacing an alloc the new alloc is blocked until the old alloc is
destroyed. This could cause a deadlock:

1. Destroying the old alloc includes a final sync of its status
2. Syncing status causes a GC
3. A GC looks for terminal allocs to clean up
4. The GC waits for an alloc to stop completely before GC'ing

If the GC chooses the currently-being-destroyed alloc to GC, the GC
deadlocks. If `client.max_parallel` deadlocks happen, the GC is wedged
until the Nomad process is restarted.

Performing the final sync asynchronously is an ugly hack but prevents
the deadlock by allowing the final sync to occur after the alloc runner
has shut down and been destroyed.
2017-07-18 11:12:56 -07:00
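
The deadlock in 8c4b760803 and its asynchronous workaround can be sketched roughly as follows. This is an illustration under stated assumptions, not the Nomad implementation: `allocRunner`, `client`, `syncStatus`, and `destroy` are hypothetical names, and the GC is reduced to a wait on a channel that is closed once the runner has stopped.

```go
package main

import (
	"fmt"
	"sync"
)

// allocRunner is a hypothetical stand-in for an alloc runner.
type allocRunner struct {
	id     string
	doneCh chan struct{} // closed once the runner has stopped completely
}

// client is a hypothetical stand-in for the component that syncs alloc
// status and garbage collects terminal allocs.
type client struct {
	mu     sync.Mutex
	synced []string
	wg     sync.WaitGroup
}

// syncStatus models steps 2 to 4: syncing triggers a GC, and the GC waits
// for the alloc to stop completely before collecting it.
func (c *client) syncStatus(r *allocRunner) {
	<-r.doneCh
	c.mu.Lock()
	c.synced = append(c.synced, r.id)
	c.mu.Unlock()
}

// destroy performs the final sync in its own goroutine. Calling
// c.syncStatus(r) inline here would block forever, because doneCh is
// closed only after that call would have returned.
func (c *client) destroy(r *allocRunner) {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		c.syncStatus(r)
	}()
	close(r.doneCh)
}

func main() {
	c := &client{}
	r := &allocRunner{id: "alloc-1", doneCh: make(chan struct{})}
	c.destroy(r)
	c.wg.Wait() // the final sync completes after the runner has stopped
	fmt.Println("synced:", c.synced)
}
```

The trade-off noted in the commit message still applies to the sketch: the final sync is no longer ordered before the end of `destroy`, so anything that needs the synced status has to wait for it separately.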
Michael Schurter
b0eae2f002 Test AllocDir.Copy 2017-07-17 15:46:54 -07:00
Michael Schurter
dc5ea4acb9 Add AllocRunner.allocID for ease-of-use
Since the AllocRunner.alloc struct can be mutated, most of AllocRunner
needs to acquire a lock to get the alloc's ID. Log lines always need to
include the alloc ID, so we often skipped acquiring a lock just to grab
the ID and accepted the race.

Let's make the race detector a little happier by storing the ID in a
single assignment field.
2017-07-17 15:46:54 -07:00
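
The single-assignment field from dc5ea4acb9 might look something like the sketch below. The types and method names (`runner`, `alloc`, `newRunner`, `update`, `logLine`) are hypothetical; only the idea of copying the ID into a field that is written once at construction comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// alloc is a hypothetical stand-in for an allocation struct.
type alloc struct {
	ID     string
	Status string
}

// runner is a hypothetical stand-in for AllocRunner.
type runner struct {
	// allocID is assigned once in newRunner and never written again, so it
	// can be read from any goroutine without holding mu.
	allocID string

	mu    sync.Mutex
	alloc *alloc // guarded by mu; may be replaced by updates
}

func newRunner(a *alloc) *runner {
	return &runner{allocID: a.ID, alloc: a}
}

// update swaps in a new alloc under the lock.
func (r *runner) update(a *alloc) {
	r.mu.Lock()
	r.alloc = a
	r.mu.Unlock()
}

// logLine formats a log message with the alloc ID without taking the lock.
func (r *runner) logLine(msg string) string {
	return fmt.Sprintf("alloc %q: %s", r.allocID, msg)
}

func main() {
	r := newRunner(&alloc{ID: "abc123"})
	r.update(&alloc{ID: "abc123", Status: "running"})
	fmt.Println(r.logLine("task started"))
}
```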
Michael Schurter
802a99749c Fix log level 2017-07-17 15:46:54 -07:00
Michael Schurter
427a0ae1db Don't fail if task dirs don't exist on creation
Task dir metadata is created in AllocRunner.Run which may not run before
an alloc is sync'd and Nomad exits. There's no reason not to just create
task dir metadata on restore if it doesn't exist.
2017-07-17 15:46:54 -07:00
Michael Schurter
12d9e91f65 Ensure allocDir is never nil and persisted safely
Fixes #2834
2017-07-17 15:46:54 -07:00
Alex Dadgar
2b917380eb Fix alloc broadcaster panic on double close 2017-07-17 14:09:05 -07:00
Michael Schurter
56f697580b Fix nil panic in Docker error condition
Fixes #2835

Yet another bug caused by overwriting container and then trying to
reference container.ID in the err handling block. Did a quick audit of
docker.go and it seems to be the last offender. See #2804 for previous
bug.
2017-07-14 10:48:19 -07:00
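
The class of bug described in 56f697580b can be reproduced in miniature. The sketch below is not taken from docker.go: `container`, `inspect`, and `refresh` are hypothetical, and `inspect` is assumed to return a nil pointer alongside its error. Keeping the result in a separate variable leaves the original `c` valid inside the error-handling block.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in for a Docker container handle.
type container struct{ ID string }

// inspect is a hypothetical lookup that returns (nil, err) on failure.
func inspect(id string, fail bool) (*container, error) {
	if fail {
		return nil, errors.New("inspect failed")
	}
	return &container{ID: id}, nil
}

// refresh re-inspects a container. Writing `c, err = inspect(...)` here
// would set c to nil on failure, and c.ID in the error path would panic.
func refresh(c *container, fail bool) (*container, error) {
	inspected, err := inspect(c.ID, fail)
	if err != nil {
		return nil, fmt.Errorf("failed to inspect container %s: %v", c.ID, err)
	}
	return inspected, nil
}

func main() {
	_, err := refresh(&container{ID: "c1"}, true)
	fmt.Println(err)
}
```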
Alex Dadgar
d06fa455b7 Small fixes 2017-07-07 17:34:50 -07:00
Michael Schurter
0ce0973d3a Merge pull request #2793 from hashicorp/b-2776-ct-vault-servername
Propagate vault.tls_server_name to consul-template
2017-07-07 16:44:19 -07:00
Michael Schurter
ef98449b07 Merge pull request #2787 from hashicorp/f-docker-test-mac
Test #2652 - Docker MAC Address option
2017-07-07 16:22:10 -07:00
Michael Schurter
5ab252fe43 Merge pull request #2797 from hashicorp/f-2785-docker-bridge-ip
Add driver.docker.bridge_ip node attribute
2017-07-07 16:20:20 -07:00
Michael Schurter
024d5a8edc Remove debug logging 2017-07-07 16:19:42 -07:00