Commit Graph

57 Commits

Author SHA1 Message Date
Chris Baker
03500494b2 cleanup test 2019-06-18 14:15:25 +00:00
Chris Baker
7050e14eb5 formatting and clarity 2019-06-18 14:00:57 +00:00
Chris Baker
240d68765c metrics: add namespace label to allocation metrics 2019-06-17 20:50:26 +00:00
Danielle Lancashire
1aa6bc80d8 trt: Fix test 2019-06-12 17:06:11 +02:00
Danielle Lancashire
5933528404 trhooks: Add TaskStopHook interface to services
We currently only run cleanup Service Hooks when a task is either
Killed, or Exited. However, due to the implementation of a task runner,
tasks are only Exited if they every correctly started running, which is
not true when you recieve an error early in the task start flow, such as
not being able to pull secrets from Vault.

This updates the service hook to also call consul deregistration
routines during a task Stop lifecycle event, to ensure that any
registered checks and services are cleared in such cases.

fixes #5770
2019-06-12 16:00:21 +02:00
Michael Schurter
796c05b9b8 client: register before restoring
Registration and restoring allocs don't share state or depend on each
other in any way (syncing allocs with servers is done outside of
registration).

Since restoring is synchronous, start the registration goroutine first.

For nodes with lots of allocs to restore or close to their heartbeat
deadline, this could be the difference between becoming "lost" or not.
2019-05-14 10:53:27 -07:00
Michael Schurter
6a2792ad90 client: do not restart dead tasks until server is contacted (try 2)
Refactoring of 104067bc2b2002a4e45ae7b667a476b89addc162

Switch the MarkLive method for a chan that is closed by the client.
Thanks to @notnoop for the idea!

The old approach called a method on most existing ARs and TRs on every
runAllocs call. The new approach does a once.Do call in runAllocs to
accomplish the same thing with less work. Able to remove the gate
abstraction that did much more than was needed.
2019-05-14 10:53:27 -07:00
Michael Schurter
e7042b674b client: do not restart dead tasks until server is contacted
Fixes #1795

Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.

This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
2019-05-14 10:53:27 -07:00
Michael Schurter
21e895e2e7 Revert "executor/linux: add defensive checks to binary path"
This reverts commit cb36f4537e.
2019-04-02 11:17:12 -07:00
Michael Schurter
cb36f4537e executor/linux: add defensive checks to binary path 2019-04-02 09:40:53 -07:00
Michael Schurter
254901a51e executor/linux: make chroot binary paths absolute
Avoid libcontainer.Process trying to lookup the binary via $PATH as the
executor has already found where the binary is located.
2019-04-01 15:45:31 -07:00
Michael Schurter
2dbc06de61 tests: port pre-0.9 task env tests
I chose to make them more of integration tests since there's a lot more
plumbing involved. The internal implementation details of how we craft
task envs can now change and these tests will still properly assert the
task runtime environment is setup properly.
2019-03-25 09:46:53 -07:00
Michael Schurter
8d409a6e39 client: test logmon cleanup
The test is sadly quite complicated and peeks into things (logmon's
reattach config) AR doesn't normally have access to.

However, I couldn't find another way of asserting logmon got cleaned up
without resorting to smaller unit tests. Smaller unit tests risk
re-implementing dependencies in an unrealistic way, so I opted for an
ugly integration test.
2019-03-04 13:15:15 -08:00
Michael Schurter
db9daf6631 client: ensure task is cleaned up when terminal
This commit is a significant change. TR.Run is now always executed, even
for terminal allocations. This was changed to allow TR.Run to cleanup
(run stop hooks) if a handle was recovered.

This is intended to handle the case of Nomad receiving a
DesiredStatus=Stop allocation update, persisting it, but crashing before
stopping AR/TR.

The commit also renames task runner hook data as it was very easy to
accidently set state on Requests instead of Responses using the old
field names.
2019-03-01 14:00:23 -08:00
Mahmood Ali
e1e5053936 emit TaskRestartSignal event on vault restart
When Vault token expires and task is restarted, emit `TaskRestartSignal`
similar to v0.8.7
2019-02-22 15:56:14 -05:00
Mahmood Ali
8b7f66499f address review comments 2019-02-22 15:56:14 -05:00
Mahmood Ali
d80774fde0 tests: port TestTaskRunner_VaultManager_Signal
From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1427
2019-02-22 15:53:04 -05:00
Mahmood Ali
b8c74ff6ca tests: port TestTaskRunner_VaultManager_Restart
From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1352
2019-02-22 15:53:04 -05:00
Mahmood Ali
ce04bb7440 tests: port TestTaskRunner_UnregisterConsul_Retries
From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L620
2019-02-22 15:53:04 -05:00
Mahmood Ali
d6a5a1c5a5 tests: port TestTaskRunner_Template_NewVaultToken
From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1275
2019-02-22 15:53:04 -05:00
Mahmood Ali
90ca1ab5a3 tests: port TestTaskRunner_Template_Artifact
From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1195
2019-02-22 15:52:59 -05:00
Michael Schurter
cf66e25e57 client: restart on recoverable StartTask errors
Fixes restarting on recoverable errors from StartTask.

Ports TestTaskRunner_Run_RecoverableStartError from 0.8 which discovered
the bug.
2019-02-21 15:30:49 -08:00
Michael Schurter
414532adab test: port TestTaskRunner_RestartSignalTask_NotRunning from 0.8 2019-02-21 15:30:49 -08:00
Michael Schurter
d4a17ae71f test: port TestTaskRunner_DriverNetwork from 0.8 2019-02-21 15:30:49 -08:00
Michael Schurter
c51a54cfee client: artifact errors are retry-able
0.9.0beta2 contains a regression where artifact download errors would
not cause a task restart and instead immediately fail the task.

This restores the pre-0.9 behavior of retrying all artifact errors and
adds missing tests.
2019-02-20 07:21:27 -08:00
Michael Schurter
83979252cd tests: add new task runner test helper
Adds a new helper and removes a duplicated test.
2019-02-20 07:21:27 -08:00
Mahmood Ali
eb8b19ec82 test: improve readability of duration
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-02-14 08:12:06 -08:00
Mahmood Ali
a96cc97389 test: improve failure message
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-02-14 08:11:37 -08:00
Michael Schurter
fa9537f6e9 tests: port TestTaskRunner_Download_List from 0.8 2019-02-12 15:48:04 -08:00
Michael Schurter
cfbe7520e8 consul: fix task deregistration hook
Broke ShutdownDelay but the test was timing dependent so it just
appeared flaky. Made the test slower so that it should never incorrectly
pass.
2019-02-12 15:36:02 -08:00
Michael Schurter
f2506e4d29 tests: port TaskRunner_DeriveToken tests from 0.8 2019-02-12 15:36:02 -08:00
Michael Schurter
b41308f16a tests: port TestTaskRunner_BlockForVault from 0.8
Also fix race conditions in the mock vault client.
2019-02-12 13:46:09 -08:00
Michael Schurter
06119e2505 test: port TestTaskRunner_CheckWatcher_Restart
Added ability to adjust the number of events the TaskRunner keeps as
there's no way to observe all events otherwise.

Task events differ slightly from 0.8 because 0.9 emits Terminated every
time a task exits instead of only when it exits on its own (not due to
restart or kill).

0.9 does not emit Killing/Killed for restarts like 0.8 which seems fine
as `Restart Signaled/Terminated/Restarting` is more descriptive.

Original v0.8 events emitted:
```
	expected := []string{
		"Received",
		"Task Setup",
		"Started",
		"Restart Signaled",
		"Killing",
		"Killed",
		"Restarting",
		"Started",
		"Restart Signaled",
		"Killing",
		"Killed",
		"Restarting",
		"Started",
		"Restart Signaled",
		"Killing",
		"Killed",
		"Not Restarting",
	}
```
2019-01-22 09:46:46 -08:00
Michael Schurter
81334cd3a1 test: port RestartTask from 0.8 2019-01-22 08:08:08 -08:00
Michael Schurter
418d360d19 test: port SignalFailure test from 0.8
Also fix signal error handling in mock_driver.
2019-01-22 08:08:08 -08:00
Chris Baker
d2f3dd7530 documenting test for task runner failure to set TaskGroupName 2019-01-18 20:00:49 +00:00
Michael Schurter
152ba08207 test: porting TestTaskRunner_SimpleRun_Dispatch
Porting test from 0.8 to 0.9.
2019-01-15 15:22:13 -08:00
Michael Schurter
99a5586aed test: assert shutdown delay deregs first
Restore a pre-0.9 test that asserts Consul services are deregistered
before a task's shutdown delay.
2019-01-14 09:56:53 -08:00
Alex Dadgar
19e67a0916 Test recovery 2019-01-07 14:49:41 -08:00
Nick Ethier
2f010a2f25 client/drivermananger: fixup issues from rebase and address PR comments 2018-12-18 22:55:38 -05:00
Nick Ethier
39ca1b00dd client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Michael Schurter
a13607f2d9 client: properly support hook env vars
The old approach was incomplete. Hook env vars are now:

 * persisted and restored between agent restarts
 * deterministic (LWW if 2 hooks set the same key)
2018-11-27 17:25:33 -08:00
Alex Dadgar
429c5bb885 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Michael Schurter
5d6d4bf290 Merge pull request #4883 from hashicorp/f-graceful-shutdown
Support graceful shutdowns in agent
2018-11-27 15:55:15 -06:00
Michael Schurter
31f113ba4d client: support graceful shutdowns
Client.Shutdown now blocks until all AllocRunners and TaskRunners have
exited their Run loops. Tasks are left running.
2018-11-19 16:39:30 -08:00
Michael Schurter
9189173649 tests: fix tests post-rebase 2018-11-15 17:40:56 -08:00
Michael Schurter
b878bbf3f7 client: add new nested variables to task's hcl ctx
The error messages are really bad, but it's extremely difficult to
produce good error messages without the original HCL.
2018-11-15 16:26:25 -08:00
Michael Schurter
43b359914b client: interpolate driver configurations
Also add missing SetDriverNetwork calls.
2018-11-15 16:25:57 -08:00
Mahmood Ali
5c906aa085 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Michael Schurter
d5c8e5bd26 client: fix ar and tr tests 2018-11-05 12:32:05 -08:00