Commit Graph

13566 Commits

Author SHA1 Message Date
Preetha
08ffb0b15f Merge pull request #5012 from hashicorp/f-e2e-provisioning
Terraform configs for e2e tests
2018-12-18 13:45:58 -06:00
Preetha Appan
75294a781a added readme 2018-12-18 13:37:03 -06:00
Michael Lange
5084eda866 Merge pull request #4981 from hashicorp/b-ui-hide-stats-graphs-for-non-running-resources
UI: Hide stats graphs for non running resources
2018-12-18 11:15:39 -08:00
Danielle Tomlinson
be9763dc9b Merge pull request #5016 from hashicorp/dani/b-docker-delete-task-on-destroy
docker: Delete Task on Destroy
2018-12-18 18:22:36 +01:00
Danielle Tomlinson
ad4bac8d77 docker: Delete Task on Destroy
Currently the docker driver does not remove tasks from its state map
when destroying the task, which leads to issues when restarting tasks in
place, and leaks expired handles over time.
2018-12-18 15:53:31 +01:00
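The leak described in ad4bac8d77 can be illustrated with a minimal sketch. The names below (`taskStore`, `taskHandle`, `destroyTask`) are hypothetical and are not taken from the Nomad source; the sketch only assumes a mutex-guarded map keyed by task ID, where destroying a task must also delete its entry.

```go
package main

import "sync"

// taskHandle is a hypothetical stand-in for a driver's per-task handle.
type taskHandle struct {
	containerID string
}

// taskStore is a hypothetical mutex-guarded map from task ID to handle.
type taskStore struct {
	mu    sync.RWMutex
	tasks map[string]*taskHandle
}

func newTaskStore() *taskStore {
	return &taskStore{tasks: make(map[string]*taskHandle)}
}

// Set registers (or replaces) the handle for a task ID.
func (s *taskStore) Set(id string, h *taskHandle) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tasks[id] = h
}

// Get returns the handle for a task ID, if one is registered.
func (s *taskStore) Get(id string) (*taskHandle, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	h, ok := s.tasks[id]
	return h, ok
}

// Delete removes the handle for a task ID; it is a no-op if absent.
func (s *taskStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.tasks, id)
}

// Len reports how many handles are currently tracked.
func (s *taskStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.tasks)
}

// destroyTask sketches a destroy path: after the (omitted) container
// cleanup, the handle is removed from the store so that it does not
// linger as an expired entry or block an in-place restart.
func destroyTask(s *taskStore, id string) bool {
	if _, ok := s.Get(id); !ok {
		return false
	}
	// Container removal would happen here in a real driver.
	s.Delete(id)
	return true
}
```

Without the `s.Delete(id)` call, every destroyed task would leave its handle in the map, which is the kind of leak the commit message describes.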
Michael Lange
fc7455c104 Merge pull request #4994 from hashicorp/b-ui-dots-in-tasks
UI: Bugs around dots in task/task-group/driver names
2018-12-17 15:50:31 -08:00
Preetha Appan
82f95b2e0c suggestions from code review 2018-12-17 15:06:22 -06:00
Jack Pearkes
dca95c2e57 Terraform configs for e2e tests 2018-12-17 11:40:09 -06:00
Danielle Tomlinson
bba8b4ef4f Merge pull request #4989 from hashicorp/dani/b-client-update-race-condition
client: Give a copy of clientconfig to allocrunner
2018-12-17 10:49:46 +01:00
Danielle Tomlinson
a282cf69c9 Merge pull request #5004 from hashicorp/dani/f-hook-errors
client: Emit TaskEvents when task hooks fail
2018-12-17 10:42:57 +01:00
Danielle Tomlinson
61a17621e3 taskrunner: Use hook errors for artifacts 2018-12-17 10:39:38 +01:00
Mahmood Ali
c526ddb068 Remove implicit check
I intended to remove this line in 29ef7ecf23 - see my notes there for details.
2018-12-16 09:14:26 -05:00
Mahmood Ali
ee652be312 tests: fix rkt command environment (#5011)
The environment variables needed for invoking the `rkt` command line
should include the host PATH (to access `iptables`).

Given that the command runs outside the VM, untrusted task environment
variables should NOT be honored here.

We do this already with `rkt`, but the change is subtle and easy to miss
when refactoring.
2018-12-15 20:25:36 -05:00
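The rule stated in ee652be312 (pass the host PATH, ignore the task's own environment) could look roughly like the sketch below. The helper names `hostOnlyEnv` and `newHostCommand` are hypothetical and do not come from the Nomad source; only `os.Getenv` and `exec.Command` are real standard-library calls.

```go
package main

import (
	"os"
	"os/exec"
)

// hostOnlyEnv builds the environment for a command that runs on the
// host rather than inside the VM. It carries over only the host PATH
// and deliberately does not merge in any task-supplied variables,
// since those are untrusted.
func hostOnlyEnv() []string {
	return []string{"PATH=" + os.Getenv("PATH")}
}

// newHostCommand prepares a host-side command with the restricted
// environment. Setting cmd.Env explicitly prevents the child from
// inheriting the full environment of the current process.
func newHostCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Env = hostOnlyEnv()
	return cmd
}
```

Building the environment in one small helper keeps the restriction in a single place, which makes it harder to lose during a refactor.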
Mahmood Ali
4a51769250 Merge pull request #5008 from hashicorp/b-docker-test-20181214
Fix flakiness in docker tests
2018-12-15 16:03:00 -05:00
Mahmood Ali
29ef7ecf23 tests: avoid implicitly asserting clean shutdown
The assertion here is causing many spurious failures that aren't
actually relevant to the test itself.

We are tracking the cause for this failure independently, and it would
make more sense to have a dedicated test for clean shutdown.
2018-12-15 15:30:09 -05:00
Mahmood Ali
119aabe77b tests: remove TestDockerDriver_Kill
We already have two other Kill tests (e.g.
TestDockerDriver_Start_Kill_Wait and
TestDockerDriver_Start_KillTimeout), so we don't need yet another flaky
test.
2018-12-15 15:03:56 -05:00
Mahmood Ali
f248fefdbf driver/docker: stopping a dead container is not an error 2018-12-15 15:03:56 -05:00
Mahmood Ali
2502ffe589 tests: assert docker containers start 2018-12-15 15:03:56 -05:00
Mahmood Ali
29fc3f77c8 tests: try deflake TestDockerDriver_OOMKilled
Noticed an issue in Docker daemon failing to handle the OOM test case
failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 ,
and I suspect it's related to the process dying so quickly, and
potentially the way we are starting the task, so added a start up delay
and made it more consistent with other tests that don't seem as flaky.

The following log lines show Docker returning a 500 error. While we can probably handle it gracefully without retrying, the retry is very cheap in this case, and handling it is more of an optimization that we can address in a follow-up PR.

```
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
    testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
    testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
```
2018-12-15 15:03:56 -05:00
Mahmood Ali
e3cee53230 tests: pin busybox image to a specific point tag
Using the `:latest` tag is typically a cause of pain, as the underlying image
changes behavior.  Here, I'm switching to using a point release, and
re-updating the stored tarballs with it.

Sadly, when saving/loading images, the repo digest is not supported:
https://github.com/moby/moby/issues/22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

Looks like older busybox versions (e.g. `busybox:1.24`) do honor `-p 0`
as the tests expect, but I would rather update busybox to fix.
2018-12-15 15:03:56 -05:00
Nick Ethier
3fb53e87de Merge pull request #4961 from hashicorp/f-grpc-executor
GRPC Executor
2018-12-15 00:34:36 -05:00
Nick Ethier
81ba18d74a executor: encode mounts and devices correctly when using grpc 2018-12-15 00:08:23 -05:00
Nick Ethier
d0efb72846 rawexec: fix misleading log 2018-12-14 23:40:37 -05:00
Nick Ethier
c8a3c0e96e executor: use int when encoding signal in RPC 2018-12-14 22:20:01 -05:00
Mahmood Ali
c784d59ab3 Merge pull request #5005 from hashicorp/dev-update-golang-1.11.3
Upgrade to Golang 1.11.3
2018-12-14 11:11:55 -05:00
Mahmood Ali
bb248e86cc dev: expand ... in go get
Work around a regression in Go 1.11.3:

> We are aware of a functionality regression in "go get" when executed in GOPATH mode on an import path pattern containing "..." (e.g., "go get github.com/golang/pkg/..."), when downloading packages not already present in the GOPATH workspace. This is issue golang.org/issue/29241. It will be resolved in the next minor patch releases, Go 1.11.4 and Go 1.10.7, which we plan to release soon. We apologize for any disruption.
2018-12-14 09:42:23 -05:00
Mahmood Ali
edf5674796 dev: upgrade go to 1.11.3 2018-12-14 09:42:23 -05:00
Mahmood Ali
3ee0e0f2d6 Update changelog (#4993) 2018-12-14 09:20:17 -05:00
Nick Ethier
8a344412e8 Merge branch 'master' into f-grpc-executor
* master: (71 commits)
  Fix output of 'nomad deployment fail' with no arg
  Always create a running allocation when testing task state
  tests: ensure exec tests pass valid task resources (#4992)
  some changes for more idiomatic code
  fix iops related tests
  fixed bug in loop delay
  gofmt
  improved code for readability
  client: updateAlloc release lock after read
  fixup! device attributes in `nomad node status -verbose`
  drivers/exec: support device binds and mounts
  fix iops bug and increase test matrix coverage
  tests: tag image explicitly
  changelog
  ci: install lxc-templates explicitly
  tests: skip checking rdma cgroup
  ci: use Ubuntu 16.04 (Xenial) in TravisCI
  client: update driver info on new fingerprint
  drivers/docker: enforce volumes.enabled (#4983)
  client: Style: use fluent style for building loggers
  ...
2018-12-13 14:41:09 -05:00
Preetha
0826042a7d Merge pull request #4999 from hashicorp/blalor-patch-1
Fix output of 'nomad deployment fail' with no arg
2018-12-13 12:30:43 -06:00
Brian Lalor
40697ab6c1 Fix output of 'nomad deployment fail' with no arg 2018-12-13 13:22:17 -05:00
Danielle Tomlinson
4d4201331c taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Michael Lange
8909046f15 Don't use Ember.get in conjunction with dynamic strings in the job-plan serializer 2018-12-13 07:53:37 -08:00
Michael Lange
f2b4fbcbe8 Don't use Ember.get in conjunction with dynamic strings in the allocation serializer 2018-12-13 07:53:37 -08:00
Michael Lange
2a554e3954 Don't use Ember.get in conjunction with dynamic strings in the node serializer 2018-12-13 07:53:37 -08:00
Michael Lange
c3fdeb3fa6 Don't use Ember.get in conjunction with dynamic strings in the deployment serializer 2018-12-13 07:53:37 -08:00
Michael Lange
3a07f68d4f Don't use Ember.get in conjunction with dynamic strings in the job-summary serializer 2018-12-13 07:53:37 -08:00
Michael Lange
7143b3048b Don't use Ember.get in conjunction with dynamic strings in the evaluation serializer 2018-12-13 07:53:37 -08:00
Michael Lange
bf9ff6ed8b Test coverage for resource graph empty states 2018-12-13 07:53:17 -08:00
Michael Lange
453f11d0c6 Test coverage for allocation rows 2018-12-13 07:53:17 -08:00
Michael Lange
e9ba939bd6 Conditionally show the utilization graphs on the allocation and task detail pages 2018-12-13 07:53:17 -08:00
Michael Lange
25a29cb4ee Conditionally show utilization metrics on alloc and task rows 2018-12-13 07:53:17 -08:00
Michael Lange
c1c40af236 Task isRunning is based on both the task state and the allocation state 2018-12-13 07:53:17 -08:00
Michael Lange
605d7a245f Model isRunning based on the client status of the allocation 2018-12-13 07:53:17 -08:00
Michael Lange
6e39a3ea59 Merge pull request #4998 from hashicorp/b-ui-test-failure
UI: Fix intermittent test failure "cannot read property name of undefined"
2018-12-13 07:52:30 -08:00
Michael Lange
63cd9c972b Always create a running allocation when testing task state 2018-12-13 07:39:16 -08:00
Danielle Tomlinson
98dc399d5c Merge pull request #4990 from hashicorp/dani/b-alloc-lock
client: updateAlloc release lock after read
2018-12-13 12:43:59 +01:00
Danielle Tomlinson
30bed980f1 client: Give a copy of clientconfig to allocrunner
Currently, there is a race condition between creating a taskrunner, and
updating node attributes via fingerprinting.

This is because the taskenv builder will try to iterate over the
clientconfig.Node.Attributes map, which can be concurrently updated by
the fingerprinting process, thus causing a panic.

This fixes that by providing a copy of the clientconfig to the
allocrunner inside the Read lock during config creation.
2018-12-13 12:42:15 +01:00
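The fix described in 30bed980f1 follows a common Go pattern: take a deep copy of shared state while holding the read lock, then hand the copy to the consumer. The sketch below is illustrative only; the types `client`, `config`, and `node` and their methods are hypothetical simplifications, not the actual Nomad structures.

```go
package main

import "sync"

// node and config are hypothetical, simplified stand-ins for the
// client configuration and its node attributes.
type node struct {
	Attributes map[string]string
}

type config struct {
	Node *node
}

// Copy returns a deep copy so the caller can iterate over the
// attributes without racing against later updates.
func (c *config) Copy() *config {
	attrs := make(map[string]string, len(c.Node.Attributes))
	for k, v := range c.Node.Attributes {
		attrs[k] = v
	}
	return &config{Node: &node{Attributes: attrs}}
}

// client guards its config with a read/write lock (an assumption of
// this sketch).
type client struct {
	mu     sync.RWMutex
	config *config
}

func newClient(attrs map[string]string) *client {
	c := &config{Node: &node{Attributes: attrs}}
	return &client{config: c.Copy()}
}

// updateAttribute models a fingerprinter writing a node attribute.
func (c *client) updateAttribute(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.config.Node.Attributes[key] = value
}

// configForAllocRunner copies the config while holding the read lock,
// so the returned value is never mutated by concurrent updates.
func (c *client) configForAllocRunner() *config {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.config.Copy()
}
```

Iterating over a Go map while another goroutine writes to it causes a fatal runtime error, so sharing the map pointer is unsafe even when each individual write is locked. Handing out a copy means the consumer needs no lock at all.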
Mahmood Ali
5ef81ed673 tests: ensure exec tests pass valid task resources (#4992)
Prior to 97f33bb153, executor cgroup validation errors were
silently ignored.  Enforcing them reveals test cases that missed them.

This doesn't change the customer-facing contract, as the resource struct
is either configured or we default to 100 (much higher than 2).
2018-12-12 20:40:38 -05:00
Chris Baker
5bcb24b1d5 Merge pull request #4974 from hashicorp/b-1173-log-spam
rpc accept loop: added backoff on logging
2018-12-12 16:54:42 -08:00