3684 Commits

Author SHA1 Message Date
Mahmood Ali
9dcebcd8a3 client: avoid registering node twice right away
I noticed that `watchNodeUpdates()` calls `retryRegisterNode()` almost
immediately after `registerAndHeartbeat()` (well, after 5 seconds).

This call is unnecessary and made debugging a bit harder.  So here, we
ensure that we only re-register the node for new node events, not for
initial registration.
2019-04-19 09:12:50 -04:00
Mahmood Ali
7a68d76160 client: wait for batched driver updated
Here we retain the 0.8.7 behavior of waiting for driver fingerprints
before registering a node, with some timeout.  This is needed for system
jobs, as system job scheduling for a node occurs at node registration,
and the race might mean that a system job does not get placed on the
node because of missing drivers.

The timeout isn't strictly necessary, but this raises it to 1 minute, as
that is closer to blocking indefinitely than 1 second is.  We need to
keep the value high enough to capture as many drivers/devices as
possible, but low enough that it doesn't risk blocking too long due to a
misbehaving plugin.

Fixes https://github.com/hashicorp/nomad/issues/5579
2019-04-19 09:00:24 -04:00
Michael Schurter
b135d28450 vault: fix data races 2019-04-16 11:22:44 -07:00
Michael Schurter
0e6da17a8f vault: fix renewal time
Renewal time was being calculated as 10s+Intn(lease-10s), so the renewal
time could be very rapid or within 1s of the deadline: [10s, lease)

This commit fixes the renewal time by calculating it as:

	(lease/2) +/- 10s

For a lease of 60s this means the renewal will occur in [20s, 40s).
2019-04-16 11:22:44 -07:00
Michael Schurter
eeb282ca2f Merge pull request #5518 from hashicorp/f-simplify-kill
client: simplify kill logic
2019-04-15 14:11:58 -07:00
Chris Baker
377c1d694b vault namespaces: inject VAULT_NAMESPACE alongside VAULT_TOKEN + documentation 2019-04-12 15:06:34 +00:00
Lang Martin
c45652ab8c Revert accidental merge of pr #5482
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f.

Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993.

Revert "structs missed applying a style change from the review"
This reverts commit 658916e327.

Revert "client, structs comments"
This reverts commit be2838d6ba.

Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430.

Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9.

Revert "client Networks Equals is set equality"
This reverts commit f8d432345b.

Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411ca.

Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665e.

Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab1.

Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6.

Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e.

Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 6897825240.

Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77b.

Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb.

Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace52.

Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6.

Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23.
2019-04-11 10:29:40 -04:00
Lang Martin
7857ac5993 client tests assert the independent handling of interface and speed 2019-04-11 09:56:22 -04:00
Lang Martin
be2838d6ba client, structs comments 2019-04-11 09:56:22 -04:00
Lang Martin
fc309cb430 client fingerprint updateNetworks preserves the network configuration 2019-04-11 09:56:22 -04:00
Lang Martin
bc0bf4efb9 client_test cleanup comments from review 2019-04-11 09:56:22 -04:00
Lang Martin
e89dbb2ab1 fix client-test, avoid hardwired platform dependecy on lo0 2019-04-11 09:56:22 -04:00
Lang Martin
a7fed726c6 refactor error in client fingerprint to include the offending data 2019-04-11 09:56:22 -04:00
Lang Martin
84bd433c7e add client updateNodeResources to merge but preserve manual config 2019-04-11 09:56:22 -04:00
Lang Martin
8ee02ddd23 test that fingerprint resources are updated, net not clobbered 2019-04-11 09:56:21 -04:00
Danielle Lancashire
419d70c5f9 allocs: Add nomad alloc restart
This adds a `nomad alloc restart` command and API that allows a job operator
with the alloc-lifecycle ACL to perform an in-place restart of a Nomad
allocation, or a given subtask.
2019-04-11 14:25:49 +02:00
Chris Baker
2022db72b6 vault client test: minor formatting
vendor: using upstream circonus-gometrics
2019-04-10 10:34:10 -05:00
Chris Baker
312721427d vault e2e: pass vault version into setup instead of having to infer it from test name 2019-04-10 10:34:10 -05:00
Chris Baker
401c9fdd16 taskrunner: removed some unnecessary config from a test 2019-04-10 10:34:10 -05:00
Chris Baker
20a3884559 docs: -vault-namespace, VAULT_NAMESPACE, and config
agent: added VAULT_NAMESPACE env-based configuration
2019-04-10 10:34:10 -05:00
Chris Baker
e09badbe8b client: gofmt 2019-04-10 10:34:10 -05:00
Chris Baker
3a28763455 taskrunner: pass configured Vault namespace into TaskTemplateConfig 2019-04-10 10:34:10 -05:00
Chris Baker
1349497152 config/docs: added namespace to vault config
server/client: process `namespace` config, setting on the instantiated vault client
2019-04-10 10:34:10 -05:00
Michael Schurter
8caa1c5b0d Bump to 0.9.1-dev 2019-04-09 09:01:48 -07:00
Nomad Release bot
18dd59056e Generate files for 0.9.0 release 2019-04-09 01:56:00 +00:00
Michael Schurter
3bad050bf1 client: simplify kill logic
Remove runLaunched tracking as Run is *always* called for killable
TaskRunners. TaskRunners which fail before Run can be called (during
NewTaskRunner or Restore) are not killable as they're never added to the
client's alloc map.
2019-04-04 15:18:33 -07:00
Michael Schurter
b51e9e09fc Remove 0.9.0-rc2 generated files 2019-04-03 07:41:09 -07:00
Nomad Release bot
6a838b8c3b Generate files for 0.9.0-rc2 release 2019-04-03 01:54:29 +00:00
Michael Schurter
800bd848c1 Merge pull request #5504 from hashicorp/b-exec-path
executor/linux: make chroot binary paths absolute
2019-04-02 14:09:50 -07:00
Michael Schurter
21e895e2e7 Revert "executor/linux: add defensive checks to binary path"
This reverts commit cb36f4537e.
2019-04-02 11:17:12 -07:00
Michael Schurter
cb36f4537e executor/linux: add defensive checks to binary path 2019-04-02 09:40:53 -07:00
Michael Schurter
254901a51e executor/linux: make chroot binary paths absolute
Avoid libcontainer.Process trying to look up the binary via $PATH as the
executor has already found where the binary is located.
2019-04-01 15:45:31 -07:00
Mahmood Ali
714c41185c rename fifo methods for clarity 2019-04-01 16:52:58 -04:00
Mahmood Ali
7661857c6e clarify closeDone blocking and field name 2019-04-01 16:10:34 -04:00
Mahmood Ali
3c68c946c4 no requires in a test goroutine 2019-04-01 15:38:39 -04:00
Mahmood Ali
9f1ee37687 log when fifo fails to open 2019-04-01 13:18:03 -04:00
Mahmood Ali
5ca9b6eb37 fifo: Use plain fifo file in Unix
This PR switches to using plain fifo files instead of golang structs
managed by containerd/fifo library.

The library's main benefit is managing the opening of fifo files.  In
Linux, a reader `open()` request would block until a writer opens the
file (and vice-versa).  The library uses goroutines so that it's the
first IO operation that blocks.

This benefit isn't really useful for us: Given that logmon simply
streams output in a separate process, blocking of opening or first read
is effectively the same.

The library additionally complicates managing state and tracking
read/write permissions, which seems like overhead for our use compared
to using a file directly.

While here, I made the following incidental changes:
* document that we handle fifo files that already exist, as we
rely on that behavior for logmon restarts
* use the type system to lock read vs write: currently, the fifo library
returns `io.ReadWriteCloser` even if the fifo is opened for writing only!
2019-04-01 13:18:03 -04:00
Michael Schurter
6674b270ff Merge pull request #5456 from hashicorp/test-taskenv
tests: port pre-0.9 task env tests
2019-03-25 10:41:38 -07:00
Michael Schurter
2dbc06de61 tests: port pre-0.9 task env tests
I chose to make them more like integration tests since there's a lot more
plumbing involved. The internal implementation details of how we craft
task envs can now change and these tests will still properly assert that
the task runtime environment is set up properly.
2019-03-25 09:46:53 -07:00
Michael Schurter
a77f769f1e Bump to dev post-0.9.0-rc1 release 2019-03-22 08:26:30 -07:00
Nomad Release bot
7c00ab4f3f Generate files for 0.9.0-rc1 release 2019-03-21 19:06:13 +00:00
Mahmood Ali
7299e0015a Merge pull request #5428 from hashicorp/b-dropped-logs-on-task-restart
client/logmon: restart log collection correctly when a task is restarted
2019-03-21 14:02:08 -04:00
Mahmood Ali
df9b877ef4 fix TestLogmon_Start_restart 2019-03-21 13:36:46 -04:00
Nick Ethier
31cdf54214 logmon: fix test assertion 2019-03-20 21:37:17 -04:00
Nick Ethier
b4faaa89bb logmon: remove sleeps from tests 2019-03-20 10:45:09 -04:00
Nick Ethier
76c9decfe6 logmon: add tests for rotation and open/closing of fifos 2019-03-19 14:41:23 -04:00
Nick Ethier
c62f9a0f58 logmon: make Start rpc idempotent and simplify hook 2019-03-19 14:02:36 -04:00
Nick Ethier
a28a67d263 logmon: add static check for logmon exited hook 2019-03-18 15:59:43 -04:00
Nick Ethier
2b1e977639 client/logmon: restart log collection correctly when a task is restarted 2019-03-15 23:59:18 -04:00
Mahmood Ali
eb5ab38ae5 Regenerate Proto files (#5421)
Noticed that the protobuf files are out of sync with the ones generated by the 1.2.0 protoc Go plugin.

The cause seems to be related to release processes, e.g. [0.9.0-beta1 preparation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preparation](b849d84f2f).

This restores the files to the output of the pinned protoc version and fails the build if protobuf files are out of sync.  A sample failing Travis job is that of the first commit of this change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
2019-03-14 10:56:27 -04:00