45 Commits

Author SHA1 Message Date
Jasmine Dahilig
81cad55d40 task lifecycle poststart: code review fixes 2020-08-31 13:22:41 -07:00
Michael Schurter
599b56e054 test: add allocrunner test for poststart hooks 2020-08-12 09:54:14 -07:00
Jasmine Dahilig
9cf4429518 lifecycle: add allocrunner and task hook coordinator unit tests 2020-07-29 12:39:42 -07:00
Mahmood Ali
73f19eb3b8 allocrunner: terminate sidecars in the end
This fixes a bug where a batch allocation fails to complete if it has
sidecars.

If the only remaining running tasks in an allocation are sidecars, we
must kill them and mark the allocation as complete.
2020-06-29 15:12:15 -04:00
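The rule described in 73f19eb3b8 can be illustrated with a minimal Go sketch. This is not the code from the commit: `taskStatus` and `sidecarsToKill` are hypothetical names, and the sketch assumes each task is either running or not, ignoring pending tasks and restart policy.

```go
package main

import "fmt"

// taskStatus is a hypothetical, reduced view of a task inside an allocation.
type taskStatus struct {
	Name    string
	Sidecar bool // true for sidecar tasks
	Running bool
}

// sidecarsToKill returns the names of the running sidecars that should be
// killed because no non-sidecar task is still running. It returns nil while
// any non-sidecar task runs, or when nothing is running at all.
func sidecarsToKill(tasks []taskStatus) []string {
	var sidecars []string
	for _, t := range tasks {
		if !t.Running {
			continue
		}
		if !t.Sidecar {
			return nil // a main task is still running; leave sidecars alone
		}
		sidecars = append(sidecars, t.Name)
	}
	return sidecars
}

func main() {
	tasks := []taskStatus{
		{Name: "batch-work", Sidecar: false, Running: false},
		{Name: "log-shipper", Sidecar: true, Running: true},
	}
	fmt.Println(sidecarsToKill(tasks)) // prints [log-shipper]
}
```

Once the returned sidecars have been killed, a caller would be free to mark the allocation as complete, which is the behavior the commit message describes for batch allocations.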
Mahmood Ali
55db937f16 tests: update AR task restart policy 2020-03-24 17:00:42 -04:00
Jasmine Dahilig
db7e8614f3 remove debugging test code from TestAllocRunner_TaskLeader_StopRestoredTG 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
60671f880d fix bug in lifecycle restore tests after refactor 2020-03-21 17:52:54 -04:00
Jasmine Dahilig
88d3e232a2 refactor task hook coordinator helper method and tests 2020-03-21 17:52:53 -04:00
Jasmine Dahilig
0031b6777f clean up restore test 2020-03-21 17:52:52 -04:00
Jasmine Dahilig
aced15ea27 partial test for restore functionality 2020-03-21 17:52:52 -04:00
Drew Bailey
3b033b2ef5 allow only positive shutdown delay
more explicit test case, remove select statement
2019-12-16 11:38:30 -05:00
Drew Bailey
672b76056b shutdown delay for task groups
copy struct values

ensure groupserviceHook implements RunnerPreKillhook

run deregister first

test that shutdown times are delayed

move magic number into variable
2019-12-16 11:38:16 -05:00
Nick Ethier
387b016ac4 client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Mahmood Ali
a80643e46d Don't persist allocs of destroyed alloc runners
This fixes a bug where allocs that have been GCed get re-run after the client
is restarted. A heavily used client may launch thousands of allocs on startup
and get killed.

The bug is that an alloc runner that gets destroyed due to GC remains in
the client's alloc runner set. Periodically, these runners get persisted until
the alloc is GCed by the server. During that time, the client DB will contain
the alloc but not its individual task statuses or completed state. On client
restart, the client assumes that the alloc is in the pending state and re-runs it.

Here, we fix it by ensuring that destroyed alloc runners don't persist any alloc
to the state DB.

This is a short-term fix, as we should consider revamping client state
management. Storing alloc and task information non-transactionally and
non-atomically, while the alloc runner is concurrently running and potentially
changing state, is a recipe for bugs.

Fixes https://github.com/hashicorp/nomad/issues/5984
Related to https://github.com/hashicorp/nomad/pull/5890
2019-08-25 11:21:28 -04:00
Preetha Appan
7de4018656 code review feedback 2019-07-10 10:41:06 -05:00
Preetha Appan
26652d7a6b Populate task event struct with kill timeout
This makes for a nicer task event message
2019-07-09 09:37:09 -05:00
Preetha Appan
b4ecb448b3 Update deployment health on failed allocations only if health is unset
This fixes a confusing UX where a previously successful deployment's
healthy/unhealthy count would get updated if any allocations failed after
the deployment was already marked as successful.
2019-05-02 22:59:56 -05:00
Michael Schurter
8d409a6e39 client: test logmon cleanup
The test is sadly quite complicated and peeks into things (logmon's
reattach config) AR doesn't normally have access to.

However, I couldn't find another way of asserting logmon got cleaned up
without resorting to smaller unit tests. Smaller unit tests risk
re-implementing dependencies in an unrealistic way, so I opted for an
ugly integration test.
2019-03-04 13:15:15 -08:00
Preetha Appan
ad58ba3e18 More alloc runner tests ported from 0.8.7 2019-02-22 17:58:06 -06:00
Mahmood Ali
8b7f66499f address review comments 2019-02-22 15:56:14 -05:00
Mahmood Ali
4c30b03879 tests: port TestAllocRunner_RetryArtifact
Port TestAllocRunner_RetryArtifact from https://github.com/hashicorp/nomad/blob/v0.8.7/client/alloc_runner_test.go#L610-L672

I changed the test name because it doesn't actually test that the artifact
hook is retried
2019-02-22 15:50:39 -05:00
Mahmood Ali
69906bade4 tests: port TestAllocRunner_MoveAllocDir test 2019-02-22 15:50:39 -05:00
Michael Schurter
159266ccec tests: port TestAllocRunner_Destroy from 0.8
Also add destroy(ar) helper to fix a bunch of shutdown races in AR
tests.
2019-02-20 12:35:09 -08:00
Michael Schurter
7b8ec414a3 client: fix setting alloc unhealthy at deadline
During the 0.9 client refactor the code to fail a deployment when the
deadline was reached was broken. This restores and tests that behavior.
2019-02-19 07:44:14 -08:00
Michael Schurter
7445e418ca test: port some pre-0.9 DeploymentHealth tests
Skipping a failing one as I need to move to some other work and don't
want to leave this work orphaned on my machine.
2019-01-14 09:56:53 -08:00
Alex Dadgar
296141bb58 Merge pull request #5002 from hashicorp/b-task-config-resources
Convert driver resource to AllocatedTaskResource
2018-12-18 16:46:34 -08:00
Alex Dadgar
517bf1c35f Fix unit tests + upgrade pathing resources 2018-12-18 15:50:44 -08:00
Danielle Tomlinson
502f36335e allocrunner: Drop and log updates after closing waitCh 2018-12-18 23:38:34 +01:00
Danielle Tomlinson
69fc73767a allocrunner: Handle updates asynchronously
This creates a new buffered channel and goroutine on the allocrunner for
serializing updates to allocations. This allows us to take updates off
the routine that is used for processing updates from the server,
without having complicated machinery for tracking update lifetimes, or
other external synchronization.

This results in a nice performance improvement and significantly better
throughput on batch changes such as preempting a large number of jobs
for a larger placement.
2018-12-18 23:38:33 +01:00
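The pattern named in 69fc73767a, a buffered channel drained by a single goroutine so that updates are applied serially, can be sketched in Go as follows. This is an illustration under stated assumptions, not the allocrunner code: `allocUpdate`, `updateQueue`, `Update`, and `Shutdown` are hypothetical names, and dropping updates that arrive after shutdown is a simplification chosen for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// allocUpdate is a hypothetical stand-in for an allocation update from the server.
type allocUpdate struct {
	ModifyIndex int
}

// updateQueue is a hypothetical sketch of serializing updates through a
// buffered channel and one worker goroutine.
type updateQueue struct {
	mu      sync.Mutex       // guards closed, dropped, and sends on ch
	closed  bool             // set once Shutdown has been called
	dropped int              // updates rejected after shutdown
	ch      chan allocUpdate // buffered: callers return once the update is queued
	wg      sync.WaitGroup   // tracks the worker goroutine
	applied []int            // written only by the worker; read after Shutdown
}

func newUpdateQueue(buffer int) *updateQueue {
	q := &updateQueue{ch: make(chan allocUpdate, buffer)}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		// A single consumer applies updates one at a time, in arrival order.
		for u := range q.ch {
			q.applied = append(q.applied, u.ModifyIndex)
		}
	}()
	return q
}

// Update queues u and reports whether it was accepted. It blocks only when
// the buffer is full.
func (q *updateQueue) Update(u allocUpdate) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		q.dropped++ // a real implementation might also log the drop
		return false
	}
	q.ch <- u
	return true
}

// Shutdown stops accepting updates and waits until queued ones are applied.
func (q *updateQueue) Shutdown() {
	q.mu.Lock()
	if !q.closed {
		q.closed = true
		close(q.ch)
	}
	q.mu.Unlock()
	q.wg.Wait()
}

func main() {
	q := newUpdateQueue(8)
	for i := 1; i <= 3; i++ {
		q.Update(allocUpdate{ModifyIndex: i})
	}
	q.Shutdown()
	late := q.Update(allocUpdate{ModifyIndex: 4})
	fmt.Println(q.applied, late, q.dropped) // prints [1 2 3] false 1
}
```

The caller that receives server updates only pays for a channel send, while ordering is preserved because exactly one goroutine consumes the channel. No per-update lifetime tracking is needed.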
Danielle Tomlinson
800bd57333 allocrunner: Async shutdown and destroy
This commit reduces the locking required to shut down or destroy
allocrunners, and allows parallel shutdown and destroy of allocrunners during
shutdown.
2018-12-18 23:38:33 +01:00
Danielle Tomlinson
62ac40ab09 allocrunner: Basic test alloc runner 2018-12-06 12:28:23 +01:00
Alex Dadgar
429c5bb885 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Michael Schurter
31f113ba4d client: support graceful shutdowns
Client.Shutdown now blocks until all AllocRunners and TaskRunners have
exited their Run loops. Tasks are left running.
2018-11-19 16:39:30 -08:00
Mahmood Ali
5c906aa085 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Michael Schurter
d5c8e5bd26 client: fix ar and tr tests 2018-11-05 12:32:05 -08:00
Michael Schurter
fdbe446ea6 client: first pass at implementing task restoring
Task restoring works but dead tasks may be restarted
2018-11-05 12:32:05 -08:00
Michael Schurter
d71e7666bd ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactored and consolidated task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in-memory statedb for testing
2018-10-19 09:45:45 -07:00
Michael Schurter
e029980b25 tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
Michael Schurter
2417ec5621 ar: fix task leader, update, and stop handling 2018-10-17 10:06:59 -07:00
Nick Ethier
4f9522dd54 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00
Alex Dadgar
3a492bb33f allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Alex Dadgar
2e535aefcc move files around 2018-10-16 16:56:55 -07:00
Alex Dadgar
0f2f4797cb fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar
3d3490fdf1 test fixes 2018-06-12 17:45:39 -07:00
Alex Dadgar
a62e412b88 Refactor - wip 2018-06-12 10:23:45 -07:00