Commit Graph

13593 Commits

Author SHA1 Message Date
Alex Dadgar
c6bb070bdf Merge pull request #5015 from hashicorp/f-plugin-versions
Add plugin API versioning to plugin loader and plugins
2018-12-18 16:49:02 -08:00
Alex Dadgar
52202c68fc fix docker launching plugins 2018-12-18 16:48:01 -08:00
Alex Dadgar
07a7555acd lint 2018-12-18 16:48:00 -08:00
Alex Dadgar
ed4f8eac6e Add plugin API versioning to plugin loader and plugins 2018-12-18 16:48:00 -08:00
Alex Dadgar
0cdf6634a5 base fixes 2018-12-18 16:48:00 -08:00
Alex Dadgar
d4fd73d536 protos 2018-12-18 16:48:00 -08:00
Alex Dadgar
296141bb58 Merge pull request #5002 from hashicorp/b-task-config-resources
Convert driver resource to AllocatedTaskResource
2018-12-18 16:46:34 -08:00
Preetha
e6aa2168be Merge pull request #5024 from hashicorp/f-affinities-e2e
Affinities e2e tests
2018-12-18 18:41:16 -06:00
Preetha Appan
d10b9b79a2 Affinities e2e tests 2018-12-18 18:39:45 -06:00
Danielle Tomlinson
5d3e27e691 Merge pull request #5007 from hashicorp/dani/f-allocrunner-async
allocrunner: Async api for shutdown/destroy/update
2018-12-19 01:26:41 +01:00
Alex Dadgar
517bf1c35f Fix unit tests + upgrade pathing resources 2018-12-18 15:50:44 -08:00
Alex Dadgar
d5512c39f0 Lint 2018-12-18 15:50:44 -08:00
Alex Dadgar
7a0b73341a LinuxResources doesn't use task.Resources 2018-12-18 15:50:44 -08:00
Alex Dadgar
cd6879409c Drivers 2018-12-18 15:50:11 -08:00
Alex Dadgar
da6925bfc1 utilities 2018-12-18 15:48:52 -08:00
Alex Dadgar
e1cf3ac69e protos 2018-12-18 15:48:52 -08:00
Danielle Tomlinson
0984bf1812 Merge pull request #5021 from hashicorp/dani/rand-tasks
taskrunner: Use a random suffix for Task Config
2018-12-19 00:39:55 +01:00
Danielle Tomlinson
b92bc1178d taskrunner: Use a random suffix for Task Config
The RestartCount is not really suitable for use as a source of
uniqueness within task invocations as it is not monotonic, and interacts
with the restart stanza in a users config, so conflates restarts due to
task failures, with restarts due to enviromental changes, such as consul
template or vault secrets changing.

Here we instead use a substring from a uuid, which is more random than
we strictly need, but is nicer than rolling our own random string
generator here.
2018-12-19 00:38:54 +01:00
Danielle Tomlinson
f619db297f client: Update tests for async destroy 2018-12-18 23:38:34 +01:00
Danielle Tomlinson
502f36335e allocrunner: Drop and log updates after closing waitCh 2018-12-18 23:38:34 +01:00
Danielle Tomlinson
5464a9565a allocrunner: Documentation for ShutdownCh/DestroyCh 2018-12-18 23:38:34 +01:00
Danielle Tomlinson
9f1b53f2a8 fixup: Log when we detect out of order updates 2018-12-18 23:38:33 +01:00
Danielle Tomlinson
69fc73767a allocrunner: Handle updates asynchronously
This creates a new buffered channel and goroutine on the allocrunner for
serializing updates to allocations. This allows us to take updates off
the routine that is used from processing updates from the server,
without having complicated machinery for tracking update lifetimes, or
other external synchronization.

This results in a nice performance improvement and signficantly better
throughput on batch changes such as preempting a large number of jobs
for a larger placement.
2018-12-18 23:38:33 +01:00
Danielle Tomlinson
6f636ea15a gc: Wait for allocrunners to be destroyed 2018-12-18 23:38:33 +01:00
Danielle Tomlinson
934d2e6bf6 client: Async API for shutdown/destroy allocrunners 2018-12-18 23:38:33 +01:00
Danielle Tomlinson
800bd57333 allocrunner: Async shutdown and destroy
This commit reduces the locking required to shutdown or destroy
allocrunners, and allows parallel shutdown and destroy of allocrunners during
shutdown.
2018-12-18 23:38:33 +01:00
Omar Khawaja
a6e15202ce Commenting out dead link to demo.nomadproject.io (#5017)
* removing dead link

* comment out header and footer to UI demo

* adding reference to link back and commenting it out
2018-12-18 17:26:51 -05:00
Preetha
08ffb0b15f Merge pull request #5012 from hashicorp/f-e2e-provisioning
Terraform configs for e2e tests
2018-12-18 13:45:58 -06:00
Preetha Appan
75294a781a added readme 2018-12-18 13:37:03 -06:00
Michael Lange
5084eda866 Merge pull request #4981 from hashicorp/b-ui-hide-stats-graphs-for-non-running-resources
UI: Hide stats graphs for non running resources
2018-12-18 11:15:39 -08:00
Danielle Tomlinson
be9763dc9b Merge pull request #5016 from hashicorp/dani/b-docker-delete-task-on-destroy
docker: Delete Task on Destroy
2018-12-18 18:22:36 +01:00
Danielle Tomlinson
ad4bac8d77 docker: Delete Task on Destroy
Currently the docker driver does not remove tasks from its state map
when destroying the task, which leads to issues when restarting tasks in
place, and leaks expired handles over time.
2018-12-18 15:53:31 +01:00
Michael Lange
fc7455c104 Merge pull request #4994 from hashicorp/b-ui-dots-in-tasks
UI: Bugs around dots in task/task-group/driver names
2018-12-17 15:50:31 -08:00
Preetha Appan
82f95b2e0c suggestions from code review 2018-12-17 15:06:22 -06:00
Jack Pearkes
dca95c2e57 Terraform configs for e2e tests 2018-12-17 11:40:09 -06:00
Danielle Tomlinson
bba8b4ef4f Merge pull request #4989 from hashicorp/dani/b-client-update-race-condition
client: Give a copy of clientconfig to allocrunner
2018-12-17 10:49:46 +01:00
Danielle Tomlinson
a282cf69c9 Merge pull request #5004 from hashicorp/dani/f-hook-errors
client: Emit TaskEvents when task hooks fail
2018-12-17 10:42:57 +01:00
Danielle Tomlinson
61a17621e3 taskrunner: Use hook errors for artifacts 2018-12-17 10:39:38 +01:00
Mahmood Ali
c526ddb068 Remove implicit check
I intended to remove this line in 29ef7ecf23 - see my notes there for details.
2018-12-16 09:14:26 -05:00
Mahmood Ali
ee652be312 tests: fix rkt command environment (#5011)
The environment variables needed for envoking `rkt` command line
should include host PATH (to access `iptables`).

Given that the command runs outside the VM, untrusted task environment
variables should NOT be honored here.

We do this already with `rkt`, but the change is quite subtle to miss
when refactoring.
2018-12-15 20:25:36 -05:00
Mahmood Ali
4a51769250 Merge pull request #5008 from hashicorp/b-docker-test-20181214
Fix flakiness in docker tests
2018-12-15 16:03:00 -05:00
Mahmood Ali
29ef7ecf23 tests: avoid implicitly asserting clean shutdown
The assertion here is causing many spurious failures that aren't
actually relevant to the test itself.

We are tracking the cause for this failure independently, and it would
make more sense to have a dedicated test for clean shutdown.
2018-12-15 15:30:09 -05:00
Mahmood Ali
119aabe77b testes: remove TestDockerDriver_Kill
We already have two other Kill tests (e.g.
TestDockerDriver_Start_Kill_Wait and
TestDockerDriver_Start_KillTimeout), so don't need yet another flaky
test.
2018-12-15 15:03:56 -05:00
Mahmood Ali
f248fefdbf driver/docker: stopping a dead container not error 2018-12-15 15:03:56 -05:00
Mahmood Ali
2502ffe589 tests: assert docker containers start 2018-12-15 15:03:56 -05:00
Mahmood Ali
29fc3f77c8 tests: try deflake TestDockerDriver_OOMKilled
Noticed an issue in Docker daemon failing to handle the OOM test case
failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 ,
and I suspect it's related to the process dying so quickly, and
potentially the way we are starting the task, so added a start up delay
and made it more consistent with other tests that don't seem as flaky.

The following is the log line showing Docker returning 500 error condition; while we can probably handle it gracefully without retrying, the retry is very cheap in this case and it's more of an optimization that we can handle in follow up PR.

```
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
    testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
    testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
```
2018-12-15 15:03:56 -05:00
Mahmood Ali
e3cee53230 tests: pin busybox image to a specific point tag
Using `:latest` tag is typically a cause of pain, as underlying image
changes behavior.  Here, I'm switching to using a point release, and
re-updating the stored tarballs with it.

Sadly, when saving/loading images, the repo digeset is not supported:
https://github.com/moby/moby/issues/22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

Looks like older busybox versions (e.g. `busybox:1.24` do honor `-p 0`
as the test expect, but I would rather update busybox to fix.
2018-12-15 15:03:56 -05:00
Nick Ethier
3fb53e87de Merge pull request #4961 from hashicorp/f-grpc-executor
GRPC Executor
2018-12-15 00:34:36 -05:00
Nick Ethier
81ba18d74a executor: encode mounts and devices correctly when using grpc 2018-12-15 00:08:23 -05:00
Nick Ethier
d0efb72846 rawexec: fix misleading log 2018-12-14 23:40:37 -05:00