Commit Graph

98 Commits

Author SHA1 Message Date
Michael Schurter
0de0e1d342 Handle leader task being dead in RestoreState
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932

While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchrnous. Nomad can shutdown before all
follower tasks have updated their state to dead thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*

The actual fix is a simple nil check to not assume non-terminal allocs
leader's have a TaskRunner.
2017-11-15 10:36:13 -08:00
Preetha Appan
b3631f3d32 Remove event GenericSource, and address other code review comments. Also added deprecation info in comments. 2017-11-03 10:10:06 -05:00
Chelsea Holland Komlo
fba1653057 Add functionality for authenticated volumes 2017-10-11 17:09:20 -07:00
Michael Schurter
04b8f8e7fc Remove structs import from api
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
a9e3a41407 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar
7b7ef952f5 Fix tests 2017-08-16 16:26:52 -07:00
Alex Dadgar
9032c0d579 Merge pull request #3025 from hashicorp/f-health-events
Emit task events explaining alloc health
2017-08-15 12:23:46 -07:00
Alex Dadgar
0f2a05714b test 2017-08-12 14:42:53 -07:00
Michael Schurter
8253439de2 Update tests for new blocking/migrating code 2017-08-11 16:21:57 -07:00
Alex Dadgar
3b300925a2 Fix alloc health with checks using interpolation
Fixes an issue in which the allocation health watcher was checking for
allocations health based on un-interpolated services and checks. Change
the interface for retrieving check information from Consul to retrieving
all registered services and checks by allocation. In the future this
will allow us to output nicer messages.

Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Alex Dadgar
08c2ba9bc6 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependant

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Michael Schurter
96baafebd3 Minor test race fix 2017-07-21 16:17:23 -07:00
Michael Schurter
a59d3a80ba Fix test race by locking around ar.tasks access 2017-07-21 14:25:51 -07:00
Michael Schurter
a04f5016a5 Fix more test races 2017-07-21 14:00:21 -07:00
Michael Schurter
8fa599c4a5 Fixup a few more even rarer test races 2017-07-21 13:43:32 -07:00
Michael Schurter
3974dfa98c Fix TestAllocRunner_TaskLeader_StopTG
Also make alloc runner tests less racy. Basically every alloc runner
test used to have races with `upd.{Count,Allocs}`
2017-07-21 13:37:16 -07:00
Alex Dadgar
f72bbaa370 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Michael Schurter
11863660a0 Destroy task group leader first
Before this commit all tasks in a task group were destroyed
concurrently. This meant logging sidecars might be stopped before the
leader task whose logs still need to be shipped.

This commit blocks on the leader shutting down before signalling to
followers to shutdown.
2017-07-03 13:56:56 -07:00
Michael Schurter
2aaf149e10 Cleanup lots of leaked alloc runners in tests 2017-05-31 11:39:50 -07:00
Michael Schurter
6571fa548d Switch tests to mock_driver 2017-05-25 09:28:10 -07:00
Michael Schurter
118892453b Shrink chroot to avoid timing test failure 2017-05-23 16:11:24 -07:00
Michael Schurter
72a24aecb0 Add env.Builder.UpdateTask for alloc updates 2017-05-23 16:00:57 -07:00
Michael Schurter
1295f88d03 Handle Driver.Prestart returning nil, nil 2017-05-23 13:53:34 -07:00
Alex Dadgar
997390b04c Fix test 2017-05-09 11:35:48 -07:00
Alex Dadgar
e47be9f771 Merge branch 'master' into f-bolt-db 2017-05-09 11:11:55 -07:00
Michael Schurter
499ada5a64 Merge pull request #2585 from hashicorp/b-2554-container-exec
Execute exec/java script checks in containers
2017-05-05 10:31:18 -07:00
Alex Dadgar
3642434293 Fix tests 2017-05-03 15:14:19 -07:00
Alex Dadgar
5952dae3e0 Fix tests 2017-05-03 11:15:30 -07:00
Michael Schurter
2cc492c95e Test pre-0.6 script check upgrade path 2017-04-25 11:41:03 -07:00
Michael Schurter
10cb924b2c Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar
c934063713 FinishedAt only records when the task has actually started 2017-03-31 17:06:05 -07:00
Alex Dadgar
d212f6fe94 Track task start/finish time & improve logs errors
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
fbfeecb486 Fix TestAllocRunner_SaveRestoreState 2017-03-02 20:45:46 -08:00
Alex Dadgar
07f7e19578 Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
edbc84087c Add Leader support to client 2017-02-10 17:55:19 -08:00
Alex Dadgar
7181c5ac48 fix flaky test 2017-01-23 14:12:38 -08:00
Michael Schurter
8e82326c2f Prevent race between alloc runners
Block ar1's periodic syncing which could recreate the state file ar2 was
destroying.
2017-01-17 13:10:20 -08:00
Michael Schurter
3011df1c89 Fix upgrade path for #2132
AllocRunner's state dropped the Context struct which needs to be
converted to the new AllocDir+TaskDir structs in RestoreState.

TaskRunner added a TaskDirBuilt flag, but it's safe to just let that
default to `false` and rebuild all task dirs once on upgrade.
2017-01-05 16:31:55 -08:00
Michael Schurter
de7351b959 Move chroot building into TaskRunner
* Refactor AllocDir to have a TaskDir struct per task.
* Drivers expose filesystem isolation preference
* Fix lxc mounting of `secrets/`
2017-01-05 16:31:49 -08:00
Michael Schurter
b7083f4842 Bump timeout on test 2016-11-29 16:19:40 -08:00
Diptanu Choudhury
e32a855ca6 Fixed alloc dir move tests 2016-10-26 15:17:57 -07:00
Alex Dadgar
51831f92dd Merge pull request #1840 from hashicorp/f-kill-fail
Change how we mark tasks as failed and allow consul-template to fail tasks
2016-10-24 13:40:52 -07:00
Michael Schurter
4d3187bc7e Remove disk usage enforcement
Many thanks to @iverberk for the original PR (#1609), but we ended up
not wanting to ship this implementation with 0.5.

We'll come back to it after 0.5 and hopefully find a way to leverage
filesystem accounting and quotas, so we can skip the expensive polling.
2016-10-21 13:55:51 -07:00
Alex Dadgar
f130c8a894 Change how we mark tasks as failed and allow consul-template to fail tasks 2016-10-20 17:27:16 -07:00
Alex Dadgar
e34902ae8a Large refactor of task runner and Vault token rehandling 2016-10-18 11:24:20 -07:00
Ben Barnard
ce94317d00 Replace "the the" with "the" in documentation and comments 2016-10-11 15:31:40 -04:00
Diptanu Choudhury
c29861b418 Getting snapshot of allocation from remote node (#1741)
* Added the alloc dir move

* Moving allocdirs when starting allocations

* Added the migrate flag to ephemeral disk

* Stopping migration if the allocation doesn't need migration any more

* Added the GetAllocDir method

* refactored code

* Added a test for alloc runner

* Incorporated review comments
2016-10-03 09:59:57 -07:00
Alex Dadgar
d49dda45e3 Merge pull request #1713 from hashicorp/f-alloc-runner-vault
Vault integration in client
2016-09-20 16:15:55 -07:00
Alex Dadgar
cd8784894d Alloc runner tests 2016-09-15 17:24:09 -07:00
Alex Dadgar
038991b5e3 Handle recovery failure 2016-09-15 12:50:44 -07:00