Commit Graph

13573 Commits

Author SHA1 Message Date
Danielle Tomlinson
502f36335e allocrunner: Drop and log updates after closing waitCh 2018-12-18 23:38:34 +01:00
Danielle Tomlinson
5464a9565a allocrunner: Documentation for ShutdownCh/DestroyCh 2018-12-18 23:38:34 +01:00
Danielle Tomlinson
9f1b53f2a8 fixup: Log when we detect out of order updates 2018-12-18 23:38:33 +01:00
Danielle Tomlinson
69fc73767a allocrunner: Handle updates asynchronously
This creates a new buffered channel and goroutine on the allocrunner for
serializing updates to allocations. This allows us to take updates off
the routine that is used for processing updates from the server,
without needing complicated machinery for tracking update lifetimes or
other external synchronization.

This results in a nice performance improvement and significantly better
throughput on batch changes, such as preempting a large number of jobs
for a larger placement.
2018-12-18 23:38:33 +01:00
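A minimal Go sketch of the pattern this commit describes: a buffered channel plus a single goroutine that applies updates in order, so the caller that receives server updates never waits for one to be applied. The names `runner`, `Alloc`, `Update`, `handleUpdates` and `Close` are hypothetical and for illustration only; this is not Nomad's actual allocrunner API. A later commit in this log (502f36335e) drops and logs updates that arrive after `waitCh` is closed; this sketch omits that and simply panics if `Update` is called after `Close`.

```go
package main

import "fmt"

// Alloc is a hypothetical stand-in for an allocation update from the server.
type Alloc struct {
	ID          string
	ModifyIndex uint64
}

// runner is a hypothetical, simplified alloc runner. Updates are queued on a
// buffered channel and applied one at a time by a single goroutine.
type runner struct {
	updateCh chan *Alloc
	doneCh   chan struct{}
	applied  []uint64 // only touched by handleUpdates until Close returns
}

func newRunner(buffer int) *runner {
	r := &runner{
		updateCh: make(chan *Alloc, buffer),
		doneCh:   make(chan struct{}),
	}
	go r.handleUpdates()
	return r
}

// Update enqueues an update and returns without waiting for it to be applied.
// It only blocks if the buffer is full. Calling Update after Close panics in
// this sketch (send on a closed channel).
func (r *runner) Update(a *Alloc) {
	r.updateCh <- a
}

// handleUpdates is the only goroutine that applies updates, so they are
// applied in the order in which they were enqueued.
func (r *runner) handleUpdates() {
	defer close(r.doneCh)
	for a := range r.updateCh {
		r.applied = append(r.applied, a.ModifyIndex)
	}
}

// Close stops accepting updates and waits for queued ones to drain.
func (r *runner) Close() {
	close(r.updateCh)
	<-r.doneCh
}

func main() {
	r := newRunner(8)
	for i := uint64(1); i <= 3; i++ {
		r.Update(&Alloc{ID: "example", ModifyIndex: i})
	}
	r.Close()
	fmt.Println(r.applied) // [1 2 3]
}
```

Because only one goroutine ever touches the applied state, no extra locking is needed to keep updates ordered; the buffer size only controls how far the producer can run ahead.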
Danielle Tomlinson
6f636ea15a gc: Wait for allocrunners to be destroyed 2018-12-18 23:38:33 +01:00
Danielle Tomlinson
934d2e6bf6 client: Async API for shutdown/destroy allocrunners 2018-12-18 23:38:33 +01:00
Danielle Tomlinson
800bd57333 allocrunner: Async shutdown and destroy
This commit reduces the locking required to shut down or destroy
allocrunners, and allows allocrunners to be shut down and destroyed in
parallel during shutdown.
2018-12-18 23:38:33 +01:00
Preetha
08ffb0b15f Merge pull request #5012 from hashicorp/f-e2e-provisioning
Terraform configs for e2e tests
2018-12-18 13:45:58 -06:00
Preetha Appan
75294a781a added readme 2018-12-18 13:37:03 -06:00
Michael Lange
5084eda866 Merge pull request #4981 from hashicorp/b-ui-hide-stats-graphs-for-non-running-resources
UI: Hide stats graphs for non running resources
2018-12-18 11:15:39 -08:00
Danielle Tomlinson
be9763dc9b Merge pull request #5016 from hashicorp/dani/b-docker-delete-task-on-destroy
docker: Delete Task on Destroy
2018-12-18 18:22:36 +01:00
Danielle Tomlinson
ad4bac8d77 docker: Delete Task on Destroy
Currently the docker driver does not remove tasks from its state map
when destroying the task, which leads to issues when restarting tasks in
place, and leaks expired handles over time.
2018-12-18 15:53:31 +01:00
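A minimal Go sketch of the bug this commit fixes, assuming a driver that keeps task handles in a mutex-protected map: if `DestroyTask` does not delete the entry, the handle leaks and restarting a task in place under the same ID fails. The `taskStore`, `taskHandle` and `driver` types and their methods are hypothetical and for illustration only; they are not the docker driver's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// taskHandle is a hypothetical placeholder for per-task driver state.
type taskHandle struct{ containerID string }

// taskStore is a hypothetical mutex-protected map from task ID to handle.
type taskStore struct {
	mu    sync.RWMutex
	store map[string]*taskHandle
}

func newTaskStore() *taskStore { return &taskStore{store: map[string]*taskHandle{}} }

func (ts *taskStore) Set(id string, h *taskHandle) {
	ts.mu.Lock()
	defer ts.mu.Unlock()
	ts.store[id] = h
}

func (ts *taskStore) Get(id string) (*taskHandle, bool) {
	ts.mu.RLock()
	defer ts.mu.RUnlock()
	h, ok := ts.store[id]
	return h, ok
}

func (ts *taskStore) Delete(id string) {
	ts.mu.Lock()
	defer ts.mu.Unlock()
	delete(ts.store, id)
}

// driver is a hypothetical driver holding its task state map.
type driver struct{ tasks *taskStore }

func (d *driver) StartTask(id, containerID string) error {
	if _, ok := d.tasks.Get(id); ok {
		return fmt.Errorf("task %q already started", id)
	}
	d.tasks.Set(id, &taskHandle{containerID: containerID})
	return nil
}

func (d *driver) DestroyTask(id string) error {
	if _, ok := d.tasks.Get(id); !ok {
		return fmt.Errorf("task %q not found", id)
	}
	// Stopping and removing the container would happen here (elided).
	// Deleting the entry is the fix: without it the handle leaks and a
	// later StartTask with the same ID is rejected.
	d.tasks.Delete(id)
	return nil
}

func main() {
	d := &driver{tasks: newTaskStore()}
	fmt.Println(d.StartTask("web", "c1")) // <nil>
	fmt.Println(d.DestroyTask("web"))     // <nil>
	fmt.Println(d.StartTask("web", "c2")) // <nil>
}
```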
Michael Lange
fc7455c104 Merge pull request #4994 from hashicorp/b-ui-dots-in-tasks
UI: Bugs around dots in task/task-group/driver names
2018-12-17 15:50:31 -08:00
Preetha Appan
82f95b2e0c suggestions from code review 2018-12-17 15:06:22 -06:00
Jack Pearkes
dca95c2e57 Terraform configs for e2e tests 2018-12-17 11:40:09 -06:00
Danielle Tomlinson
bba8b4ef4f Merge pull request #4989 from hashicorp/dani/b-client-update-race-condition
client: Give a copy of clientconfig to allocrunner
2018-12-17 10:49:46 +01:00
Danielle Tomlinson
a282cf69c9 Merge pull request #5004 from hashicorp/dani/f-hook-errors
client: Emit TaskEvents when task hooks fail
2018-12-17 10:42:57 +01:00
Danielle Tomlinson
61a17621e3 taskrunner: Use hook errors for artifacts 2018-12-17 10:39:38 +01:00
Mahmood Ali
c526ddb068 Remove implicit check
I intended to remove this line in 29ef7ecf23 - see my notes there for details.
2018-12-16 09:14:26 -05:00
Mahmood Ali
ee652be312 tests: fix rkt command environment (#5011)
The environment variables needed for invoking the `rkt` command line
should include the host PATH (to access `iptables`).

Given that the command runs outside the VM, untrusted task environment
variables should NOT be honored here.

We do this already with `rkt`, but the change is subtle and easy to miss
when refactoring.
2018-12-15 20:25:36 -05:00
Mahmood Ali
4a51769250 Merge pull request #5008 from hashicorp/b-docker-test-20181214
Fix flakiness in docker tests
2018-12-15 16:03:00 -05:00
Mahmood Ali
29ef7ecf23 tests: avoid implicitly asserting clean shutdown
The assertion here is causing many spurious failures that aren't
actually relevant to the test itself.

We are tracking the cause for this failure independently, and it would
make more sense to have a dedicated test for clean shutdown.
2018-12-15 15:30:09 -05:00
Mahmood Ali
119aabe77b tests: remove TestDockerDriver_Kill
We already have two other Kill tests (namely
TestDockerDriver_Start_Kill_Wait and
TestDockerDriver_Start_KillTimeout), so we don't need yet another flaky
test.
2018-12-15 15:03:56 -05:00
Mahmood Ali
f248fefdbf driver/docker: stopping a dead container is not an error 2018-12-15 15:03:56 -05:00
Mahmood Ali
2502ffe589 tests: assert docker containers start 2018-12-15 15:03:56 -05:00
Mahmood Ali
29fc3f77c8 tests: try deflake TestDockerDriver_OOMKilled
Noticed an issue with the Docker daemon failing to handle the OOM test case
in build https://travis-ci.org/hashicorp/nomad/jobs/468027848. I suspect
it's related to the process dying so quickly, and potentially to the way
we are starting the task, so I added a startup delay and made the test
more consistent with other tests that don't seem as flaky.

The following log lines show Docker returning a 500 error condition. While we can probably handle it gracefully without retrying, the retry is very cheap in this case, and handling it is more of an optimization that we can do in a follow-up PR.

```
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
    testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
    testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
```
2018-12-15 15:03:56 -05:00
Mahmood Ali
e3cee53230 tests: pin busybox image to a specific point tag
Using the `:latest` tag is typically a cause of pain, as the underlying
image changes behavior. Here, I'm switching to a point release and
re-updating the stored tarballs with it.

Sadly, the repo digest is not supported when saving/loading images:
https://github.com/moby/moby/issues/22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

It looks like older busybox versions (e.g. `busybox:1.24`) do honor `-p 0`
as the tests expect, but I would rather update busybox to fix this.
2018-12-15 15:03:56 -05:00
Nick Ethier
3fb53e87de Merge pull request #4961 from hashicorp/f-grpc-executor
GRPC Executor
2018-12-15 00:34:36 -05:00
Nick Ethier
81ba18d74a executor: encode mounts and devices correctly when using grpc 2018-12-15 00:08:23 -05:00
Nick Ethier
d0efb72846 rawexec: fix misleading log 2018-12-14 23:40:37 -05:00
Nick Ethier
c8a3c0e96e executor: use int when encoding signal in RPC 2018-12-14 22:20:01 -05:00
Mahmood Ali
c784d59ab3 Merge pull request #5005 from hashicorp/dev-update-golang-1.11.3
Upgrade to Golang 1.11.3
2018-12-14 11:11:55 -05:00
Mahmood Ali
bb248e86cc dev: expand ... in go get
Work around a regression in Go 1.11.3.

> We are aware of a functionality regression in "go get" when executed in GOPATH mode on an import path pattern containing "..." (e.g., "go get github.com/golang/pkg/..."), when downloading packages not already present in the GOPATH workspace. This is issue golang.org/issue/29241. It will be resolved in the next minor patch releases, Go 1.11.4 and Go 1.10.7, which we plan to release soon. We apologize for any disruption.
2018-12-14 09:42:23 -05:00
Mahmood Ali
edf5674796 dev: upgrade go to 1.11.3 2018-12-14 09:42:23 -05:00
Mahmood Ali
3ee0e0f2d6 Update changelog (#4993) 2018-12-14 09:20:17 -05:00
Nick Ethier
8a344412e8 Merge branch 'master' into f-grpc-executor
* master: (71 commits)
  Fix output of 'nomad deployment fail' with no arg
  Always create a running allocation when testing task state
  tests: ensure exec tests pass valid task resources (#4992)
  some changes for more idiomatic code
  fix iops related tests
  fixed bug in loop delay
  gofmt
  improved code for readability
  client: updateAlloc release lock after read
  fixup! device attributes in `nomad node status -verbose`
  drivers/exec: support device binds and mounts
  fix iops bug and increase test matrix coverage
  tests: tag image explicitly
  changelog
  ci: install lxc-templates explicitly
  tests: skip checking rdma cgroup
  ci: use Ubuntu 16.04 (Xenial) in TravisCI
  client: update driver info on new fingerprint
  drivers/docker: enforce volumes.enabled (#4983)
  client: Style: use fluent style for building loggers
  ...
2018-12-13 14:41:09 -05:00
Preetha
0826042a7d Merge pull request #4999 from hashicorp/blalor-patch-1
Fix output of 'nomad deployment fail' with no arg
2018-12-13 12:30:43 -06:00
Brian Lalor
40697ab6c1 Fix output of 'nomad deployment fail' with no arg 2018-12-13 13:22:17 -05:00
Danielle Tomlinson
4d4201331c taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Michael Lange
8909046f15 Don't use Ember.get in conjunction with dynamic strings in the job-plan serializer 2018-12-13 07:53:37 -08:00
Michael Lange
f2b4fbcbe8 Don't use Ember.get in conjunction with dynamic strings in the allocation serializer 2018-12-13 07:53:37 -08:00
Michael Lange
2a554e3954 Don't use Ember.get in conjunction with dynamic strings in the node serializer 2018-12-13 07:53:37 -08:00
Michael Lange
c3fdeb3fa6 Don't use Ember.get in conjunction with dynamic strings in the deployment serializer 2018-12-13 07:53:37 -08:00
Michael Lange
3a07f68d4f Don't use Ember.get in conjunction with dynamic strings in the job-summary serializer 2018-12-13 07:53:37 -08:00
Michael Lange
7143b3048b Don't use Ember.get in conjunction with dynamic strings in the evaluation serializer 2018-12-13 07:53:37 -08:00
Michael Lange
bf9ff6ed8b Test coverage for resource graph empty states 2018-12-13 07:53:17 -08:00
Michael Lange
453f11d0c6 Test coverage for allocation rows 2018-12-13 07:53:17 -08:00
Michael Lange
e9ba939bd6 Conditionally show the utilization graphs on the allocation and task detail pages 2018-12-13 07:53:17 -08:00
Michael Lange
25a29cb4ee Conditionally show utilization metrics on alloc and task rows 2018-12-13 07:53:17 -08:00
Michael Lange
c1c40af236 Task isRunning is based on both the task state and the allocation state 2018-12-13 07:53:17 -08:00