107 Commits

Author SHA1 Message Date
Michael Schurter
55dcadc132 tests: fix race in alloc_runner_test.go
I could not reproduce the failure locally, even with `stress -cpu ...`
eating all the CPU it could on my machine.

But I think the race was in one of two places:
* The task could restart, which could create new events
* I think there could be a race between the updater's version of events
  and the alloc runner's, since updates are async

I fixed both. Here's hoping that fixes this flaky test.
2018-04-17 17:14:59 -07:00
Preetha Appan
e81886d588 remove outdated commented out test code 2018-04-04 15:03:24 -05:00
Preetha Appan
7fa7655ebe Moves setting finishedAt to the right place and adds two unit tests. 2018-04-04 14:38:15 -05:00
Michael Schurter
2ee0426985 test: don't rely on alloc runner update count
We were incorrectly relying on the count of alloc updates in a number of
tests. Since alloc updates are async, their number is non-deterministic
and largely meaningless.

This should fix quite a few flaky tests in Travis and prevent future
mistaken assumptions in tests.
2018-03-30 09:34:33 -07:00
Michael Schurter
d09b0b62ba add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple of different ways, so
hopefully this helper will consolidate them and make it clearer what
the code is doing.
2018-03-29 09:29:19 -07:00
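A helper like this can be written as a nil-safe method on a pointer receiver. The sketch below uses a simplified, hypothetical `AllocDeploymentStatus` whose `Healthy` field is a `*bool` that stays nil until a verdict exists. The real struct has more fields.

```go
package main

// AllocDeploymentStatus is a simplified stand-in; Healthy stays nil
// until the client has decided whether the allocation is healthy.
type AllocDeploymentStatus struct {
	Healthy *bool
}

// HasHealth reports whether a health verdict has been recorded. The
// pointer receiver makes it safe to call on a nil *AllocDeploymentStatus,
// so callers no longer repeat their own nil checks.
func (a *AllocDeploymentStatus) HasHealth() bool {
	return a != nil && a.Healthy != nil
}

// IsHealthy (hypothetical) builds on HasHealth for the common follow-up.
func (a *AllocDeploymentStatus) IsHealthy() bool {
	return a.HasHealth() && *a.Healthy
}
```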
Michael Schurter
12dd17affe only service allocs should have health watched 2018-03-28 16:20:11 -07:00
Michael Schurter
50a94d73c9 test: try to prevent flakiness on travis 2018-03-21 16:51:45 -07:00
Josh Soref
ed82e98880 spelling: artifact 2018-03-11 17:41:02 +00:00
Alex Dadgar
a6baf7133a Remove testing 2018-02-15 13:59:01 -08:00
Michael Schurter
0de0e1d342 Handle leader task being dead in RestoreState
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932

While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchronous. Nomad can shut down before all
follower tasks have updated their state to dead, thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*

The actual fix is a simple nil check to not assume that non-terminal
allocs' leaders have a TaskRunner.
2017-11-15 10:36:13 -08:00
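The shape of such a nil check can be sketched as follows. `TaskRunner`, `leaderState`, and the choice to report a missing runner as `"dead"` are all hypothetical and are not taken from the commit. An unconditional `runners[leader].State` would panic with a nil-pointer dereference when the leader has no runner.

```go
package main

// TaskRunner is a hypothetical placeholder for a restored task runner.
type TaskRunner struct {
	State string
}

// leaderState looks up the leader's runner after a restore. A dead
// leader may have no runner even though the alloc is non-terminal, so
// check for nil instead of dereferencing the map lookup directly.
// Treating a missing runner as "dead" is an assumption of this sketch.
func leaderState(runners map[string]*TaskRunner, leader string) string {
	tr := runners[leader]
	if tr == nil {
		return "dead"
	}
	return tr.State
}
```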
Preetha Appan
b3631f3d32 Remove event GenericSource, and address other code review comments. Also added deprecation info in comments. 2017-11-03 10:10:06 -05:00
Chelsea Holland Komlo
fba1653057 Add functionality for authenticated volumes 2017-10-11 17:09:20 -07:00
Michael Schurter
04b8f8e7fc Remove structs import from api
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
a9e3a41407 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar
7b7ef952f5 Fix tests 2017-08-16 16:26:52 -07:00
Alex Dadgar
9032c0d579 Merge pull request #3025 from hashicorp/f-health-events
Emit task events explaining alloc health
2017-08-15 12:23:46 -07:00
Alex Dadgar
0f2a05714b test 2017-08-12 14:42:53 -07:00
Michael Schurter
8253439de2 Update tests for new blocking/migrating code 2017-08-11 16:21:57 -07:00
Alex Dadgar
3b300925a2 Fix alloc health with checks using interpolation
Fixes an issue in which the allocation health watcher was checking
allocation health based on un-interpolated services and checks. Changes
the interface for retrieving check information from Consul to retrieving
all registered services and checks by allocation. In the future this
will allow us to output nicer messages.

Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Alex Dadgar
08c2ba9bc6 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependent

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the OS's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Michael Schurter
96baafebd3 Minor test race fix 2017-07-21 16:17:23 -07:00
Michael Schurter
a59d3a80ba Fix test race by locking around ar.tasks access 2017-07-21 14:25:51 -07:00
Michael Schurter
a04f5016a5 Fix more test races 2017-07-21 14:00:21 -07:00
Michael Schurter
8fa599c4a5 Fixup a few more even rarer test races 2017-07-21 13:43:32 -07:00
Michael Schurter
3974dfa98c Fix TestAllocRunner_TaskLeader_StopTG
Also make alloc runner tests less racy. Basically every alloc runner
test used to have races with `upd.{Count,Allocs}`
2017-07-21 13:37:16 -07:00
Alex Dadgar
f72bbaa370 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the Consul checks passing.
2017-07-07 12:10:04 -07:00
Michael Schurter
11863660a0 Destroy task group leader first
Before this commit all tasks in a task group were destroyed
concurrently. This meant logging sidecars might be stopped before the
leader task whose logs still need to be shipped.

This commit blocks on the leader shutting down before signalling
followers to shut down.
2017-07-03 13:56:56 -07:00
Michael Schurter
2aaf149e10 Cleanup lots of leaked alloc runners in tests 2017-05-31 11:39:50 -07:00
Michael Schurter
6571fa548d Switch tests to mock_driver 2017-05-25 09:28:10 -07:00
Michael Schurter
118892453b Shrink chroot to avoid timing test failure 2017-05-23 16:11:24 -07:00
Michael Schurter
72a24aecb0 Add env.Builder.UpdateTask for alloc updates 2017-05-23 16:00:57 -07:00
Michael Schurter
1295f88d03 Handle Driver.Prestart returning nil, nil 2017-05-23 13:53:34 -07:00
Alex Dadgar
997390b04c Fix test 2017-05-09 11:35:48 -07:00
Alex Dadgar
e47be9f771 Merge branch 'master' into f-bolt-db 2017-05-09 11:11:55 -07:00
Michael Schurter
499ada5a64 Merge pull request #2585 from hashicorp/b-2554-container-exec
Execute exec/java script checks in containers
2017-05-05 10:31:18 -07:00
Alex Dadgar
3642434293 Fix tests 2017-05-03 15:14:19 -07:00
Alex Dadgar
5952dae3e0 Fix tests 2017-05-03 11:15:30 -07:00
Michael Schurter
2cc492c95e Test pre-0.6 script check upgrade path 2017-04-25 11:41:03 -07:00
Michael Schurter
10cb924b2c Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers (other than qemu) now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
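The interface-based testing approach can be sketched as follows. `ServiceRegistrar`, `mockRegistrar`, and `taskRunner` are hypothetical and much simpler than the real ServiceClient API. The task code depends only on the interface, so tests can pass an in-memory mock and never contact a Consul agent.

```go
package main

// ServiceRegistrar is a hypothetical, simplified interface standing in
// for the Consul operations a task runner needs.
type ServiceRegistrar interface {
	RegisterTask(taskID string, services []string) error
	RemoveTask(taskID string)
}

// mockRegistrar records registrations in memory; no Consul agent needed.
type mockRegistrar struct {
	services map[string][]string
}

// Compile-time check that the mock satisfies the interface.
var _ ServiceRegistrar = (*mockRegistrar)(nil)

func newMockRegistrar() *mockRegistrar {
	return &mockRegistrar{services: make(map[string][]string)}
}

func (m *mockRegistrar) RegisterTask(taskID string, services []string) error {
	m.services[taskID] = services
	return nil
}

func (m *mockRegistrar) RemoveTask(taskID string) {
	delete(m.services, taskID)
}

// taskRunner only sees the interface, never a concrete Consul client.
type taskRunner struct {
	id     string
	consul ServiceRegistrar
}

func (tr *taskRunner) start(services []string) error {
	return tr.consul.RegisterTask(tr.id, services)
}

func (tr *taskRunner) stop() {
	tr.consul.RemoveTask(tr.id)
}
```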
Alex Dadgar
c934063713 FinishedAt only records when the task has actually started 2017-03-31 17:06:05 -07:00
Alex Dadgar
d212f6fe94 Track task start/finish time & improve logs errors
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
fbfeecb486 Fix TestAllocRunner_SaveRestoreState 2017-03-02 20:45:46 -08:00
Alex Dadgar
07f7e19578 Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and
`nomad node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
edbc84087c Add Leader support to client 2017-02-10 17:55:19 -08:00
Alex Dadgar
7181c5ac48 fix flaky test 2017-01-23 14:12:38 -08:00
Michael Schurter
8e82326c2f Prevent race between alloc runners
Block ar1's periodic syncing which could recreate the state file ar2 was
destroying.
2017-01-17 13:10:20 -08:00
Michael Schurter
3011df1c89 Fix upgrade path for #2132
AllocRunner's state dropped the Context struct which needs to be
converted to the new AllocDir+TaskDir structs in RestoreState.

TaskRunner added a TaskDirBuilt flag, but it's safe to just let that
default to `false` and rebuild all task dirs once on upgrade.
2017-01-05 16:31:55 -08:00
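When Go decodes input that lacks a field, the field keeps its zero value, so state saved before the upgrade comes back with `TaskDirBuilt == false`. That is why it is safe to let the flag default and rebuild every task dir once. The sketch below assumes JSON-encoded state and uses a hypothetical, trimmed-down state struct.

```go
package main

import "encoding/json"

// taskRunnerState is a hypothetical, trimmed-down persisted state.
type taskRunnerState struct {
	Version      string
	TaskDirBuilt bool // absent from pre-upgrade state, so it decodes as false
}

// needsTaskDirBuild reports whether the task dir must be (re)built.
func needsTaskDirBuild(raw []byte) (bool, error) {
	var s taskRunnerState
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	return !s.TaskDirBuilt, nil
}
```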
Michael Schurter
de7351b959 Move chroot building into TaskRunner
* Refactor AllocDir to have a TaskDir struct per task.
* Drivers expose filesystem isolation preference
* Fix lxc mounting of `secrets/`
2017-01-05 16:31:49 -08:00
Michael Schurter
b7083f4842 Bump timeout on test 2016-11-29 16:19:40 -08:00
Diptanu Choudhury
e32a855ca6 Fixed alloc dir move tests 2016-10-26 15:17:57 -07:00