Commit Graph

194 Commits

Author SHA1 Message Date
Mahmood Ali
ee652be312 tests: fix rkt command environment (#5011)
The environment variables needed for envoking `rkt` command line
should include host PATH (to access `iptables`).

Given that the command runs outside the VM, untrusted task environment
variables should NOT be honored here.

We do this already with `rkt`, but the change is quite subtle to miss
when refactoring.
2018-12-15 20:25:36 -05:00
Mahmood Ali
4a51769250 Merge pull request #5008 from hashicorp/b-docker-test-20181214
Fix flakiness in docker tests
2018-12-15 16:03:00 -05:00
Mahmood Ali
119aabe77b testes: remove TestDockerDriver_Kill
We already have two other Kill tests (e.g.
TestDockerDriver_Start_Kill_Wait and
TestDockerDriver_Start_KillTimeout), so don't need yet another flaky
test.
2018-12-15 15:03:56 -05:00
Mahmood Ali
f248fefdbf driver/docker: stopping a dead container not error 2018-12-15 15:03:56 -05:00
Mahmood Ali
2502ffe589 tests: assert docker containers start 2018-12-15 15:03:56 -05:00
Mahmood Ali
29fc3f77c8 tests: try deflake TestDockerDriver_OOMKilled
Noticed an issue in Docker daemon failing to handle the OOM test case
failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 ,
and I suspect it's related to the process dying so quickly, and
potentially the way we are starting the task, so added a start up delay
and made it more consistent with other tests that don't seem as flaky.

The following is the log line showing Docker returning 500 error condition; while we can probably handle it gracefully without retrying, the retry is very cheap in this case and it's more of an optimization that we can handle in follow up PR.

```
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
    testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
    testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
```
2018-12-15 15:03:56 -05:00
Mahmood Ali
e3cee53230 tests: pin busybox image to a specific point tag
Using `:latest` tag is typically a cause of pain, as underlying image
changes behavior.  Here, I'm switching to using a point release, and
re-updating the stored tarballs with it.

Sadly, when saving/loading images, the repo digeset is not supported:
https://github.com/moby/moby/issues/22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

Looks like older busybox versions (e.g. `busybox:1.24` do honor `-p 0`
as the test expect, but I would rather update busybox to fix.
2018-12-15 15:03:56 -05:00
Nick Ethier
81ba18d74a executor: encode mounts and devices correctly when using grpc 2018-12-15 00:08:23 -05:00
Nick Ethier
d0efb72846 rawexec: fix misleading log 2018-12-14 23:40:37 -05:00
Nick Ethier
c8a3c0e96e executor: use int when encoding signal in RPC 2018-12-14 22:20:01 -05:00
Nick Ethier
8a344412e8 Merge branch 'master' into f-grpc-executor
* master: (71 commits)
  Fix output of 'nomad deployment fail' with no arg
  Always create a running allocation when testing task state
  tests: ensure exec tests pass valid task resources (#4992)
  some changes for more idiomatic code
  fix iops related tests
  fixed bug in loop delay
  gofmt
  improved code for readability
  client: updateAlloc release lock after read
  fixup! device attributes in `nomad node status -verbose`
  drivers/exec: support device binds and mounts
  fix iops bug and increase test matrix coverage
  tests: tag image explicitly
  changelog
  ci: install lxc-templates explicitly
  tests: skip checking rdma cgroup
  ci: use Ubuntu 16.04 (Xenial) in TravisCI
  client: update driver info on new fingerprint
  drivers/docker: enforce volumes.enabled (#4983)
  client: Style: use fluent style for building loggers
  ...
2018-12-13 14:41:09 -05:00
Mahmood Ali
5ef81ed673 tests: ensure exec tests pass valid task resources (#4992)
Prior to 97f33bb153, executor cgroup validation errors were
silently ignored.  Enforcing them reveals test cases that missed them.

This doesn't change customer facing contract, as resource struct is
is either configured or we default to 100 (much higher than 2).
2018-12-12 20:40:38 -05:00
Mahmood Ali
97f33bb153 drivers/exec: support device binds and mounts 2018-12-11 18:35:21 -05:00
Mahmood Ali
5f11000714 Merge pull request #4985 from hashicorp/test-with-xenial
ci: Test with Ubuntu 16.04 in TravisCI
2018-12-11 18:00:39 -05:00
Mahmood Ali
51707199a6 Merge pull request #4975 from hashicorp/fix-master-20181209
Some test fixes and remedies
2018-12-11 18:00:21 -05:00
Mahmood Ali
e716c451a9 tests: tag image explicitly 2018-12-11 17:59:45 -05:00
Alex Dadgar
f42c060d35 Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Mahmood Ali
8f2454029a tests: skip checking rdma cgroup
rdma was added in most recent kernels and libcontainer/docker
don't isolate them by default.
2018-12-11 15:49:11 -05:00
Mahmood Ali
1678a8499b drivers/docker: enforce volumes.enabled (#4983)
When volumes.enable flag is off in Docker driver, disable all mounts of
paths outside alloc dir.
2018-12-11 14:22:50 -05:00
Mahmood Ali
c02dbc7f67 add a note about busybox license 2018-12-11 09:35:26 -05:00
Mahmood Ali
06a4b4add2 tests: prevent indefinite blocking in some tests
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.

I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
2018-12-11 09:35:26 -05:00
Mahmood Ali
d31e52b1f6 tests: update stop/kill tests with new pattern
Update rawexec and rkt stop/kill tests with the patterns introduced in
7a49e9b68e.  This implementation should be
more resilient to discrepancy between task stopping and task being marked as exited.
2018-12-11 09:35:26 -05:00
Mahmood Ali
d6e708fe2d tests: setup libcontainer rootfs
Using statically linked busybox binary to setup a basic rootfs for
testing, by symlinking it to provide the basic commands used in tests.

I considered using a proper rootfs tarball, but the overhead of managing
tarfile and expanding it seems significant enough that I went with this
implementation.
2018-12-11 09:35:26 -05:00
Mahmood Ali
070ed8b654 fix dtestutil.NewDriverHarness ref 2018-12-08 09:58:23 -05:00
Mahmood Ali
2fb5e35012 Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill
executor: kill all container processes
2018-12-08 09:52:42 -05:00
Nick Ethier
4243b7d5f3 executor: misspell 2018-12-08 01:52:06 -05:00
Nick Ethier
a6cb63a964 executor: don't drop errors when configuring libcontainer cfg, add nil check on resources 2018-12-07 14:03:42 -05:00
Mahmood Ali
c74e2d0243 Merge pull request #4933 from hashicorp/f-mount-device
Mount Devices in container based drivers
2018-12-07 10:32:03 -05:00
Nick Ethier
6b39bb33b6 Merge branch 'master' into f-grpc-executor 2018-12-06 21:42:38 -05:00
Nick Ethier
9e3c2492e5 executor: fix tests 2018-12-06 21:39:53 -05:00
Nick Ethier
ff1990064b executor: fix broken non-linux build 2018-12-06 21:33:20 -05:00
Nick Ethier
224c68860d executor: use drivers.Resources as resource model 2018-12-06 21:22:02 -05:00
Nick Ethier
13b582fcfb executor: merge plugin shim with executor package 2018-12-06 21:13:45 -05:00
Nick Ethier
0087a51a7a executor: remove structs package 2018-12-06 20:54:14 -05:00
Alex Dadgar
0953d913ed Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Mahmood Ali
aef1c9dc96 Merge pull request #4955 from hashicorp/fix-docker-tests-20181203
Fix docker driver tests
2018-12-06 16:41:33 -05:00
Mahmood Ali
8c5e2e39a4 driver/rkt: mount plugin devices 2018-12-06 15:46:35 -05:00
Mahmood Ali
bfa2854c8b driver/lxc: mount plugin devices
Also, LXC requires target paths to be relative.  Container paths in LXC
binds should never be absolute paths, so we strip any preceeding `/`,
even if a user sets one.
2018-12-06 15:46:35 -05:00
Mahmood Ali
a0a5847315 fixup: add missed docker utils test 2018-12-06 15:46:35 -05:00
Mahmood Ali
5fd8d3fe68 tests: ensure image is loaded as test setup 2018-12-06 15:36:43 -05:00
Nick Ethier
8690d0ae75 executor: update test references 2018-12-05 11:07:48 -05:00
Nick Ethier
2d33d48980 executor: update driver references 2018-12-05 11:04:18 -05:00
Nick Ethier
467930f650 executor: use grpc instead of netrpc as plugin protocol
* Added protobuf spec for executor
 * Seperated executor structs into their own package
2018-12-05 11:03:56 -05:00
Mahmood Ali
04af097008 driver/docker: honor plugin devices 2018-12-04 21:31:28 -05:00
Mahmood Ali
65e9b3b8be refactor device manipulation 2018-12-04 20:55:59 -05:00
Mahmood Ali
7a49e9b68e drivers/exec: refactor stop/kill tests
Simplify the tests to do all assertions within the main goroutine and
account for status propagation delay.
2018-12-04 20:34:43 -05:00
Mahmood Ali
e9dc31c68f executor: Keep 0.8.6 exit code for wait() failures
0.8.6 uses exit code 1 when `proc.Wait()` fails: https://github.com/hashicorp/nomad/blob/v0.8.6/client/driver/executor/executor.go#L442
2018-12-04 19:38:25 -05:00
Mahmood Ali
92b57036a8 driver/rkt: use rkt environment
The rkt command itself needs an environment with PATH set to find iptables.
2018-12-04 14:00:45 -05:00
Mahmood Ali
2c9b6cd7c4 drivers/rkt: use image isolation for rkt 2018-12-04 11:40:10 -05:00
Mahmood Ali
8f7b78af0c tests: don't assert in WaitForResult
WaitForResult expects body to fail and retries few times before giving
up.  Assertions inside the testfn body causes it to terminate abruptly
without retrying.
2018-12-04 11:40:10 -05:00