Commit Graph

3811 Commits

Author SHA1 Message Date
Nick Ethier
9fa47daf5c ar: fix lint errors 2019-07-31 01:03:19 -04:00
Nick Ethier
56d5fe704a ar: rearrange network hook to support building on windows 2019-07-31 01:03:19 -04:00
Nick Ethier
508ff3cf2f ar: fix test that failed due to error renaming 2019-07-31 01:03:19 -04:00
Nick Ethier
da3978b377 plugins/driver: make DriverNetworkManager interface optional 2019-07-31 01:03:19 -04:00
Nick Ethier
35de444e9b ar: plumb error handling into alloc runner hook initialization 2019-07-31 01:03:18 -04:00
Nick Ethier
16343ae23a ar: add tests for network hook 2019-07-31 01:03:18 -04:00
Nick Ethier
c742f8b580 ar: cleanup lint errors 2019-07-31 01:03:18 -04:00
Nick Ethier
c39e8dca6e ar: move linux specific code to it's own file and add tests 2019-07-31 01:03:18 -04:00
Nick Ethier
4a8a96fa1a ar: initial driver based network management 2019-07-31 01:03:17 -04:00
Nick Ethier
e20fa7ccc1 Add network lifecycle management
Adds a new Prerun and Postrun hooks to manage set up of network namespaces
on linux. Work still needs to be done to make the code platform agnostic and
support Docker style network initalization.
2019-07-31 01:03:17 -04:00
Preetha Appan
ebd7002683 remove generated code and change version to 0.10.0 2019-07-30 15:56:05 -05:00
Nomad Release bot
a81aa846a4 Generate files for 0.9.4 release 2019-07-30 19:05:18 +00:00
Preetha Appan
31d674bf39 remove generated code 2019-07-23 12:07:49 -05:00
Nomad Release bot
4999923574 Generate files for 0.9.4-rc1 release 2019-07-22 21:42:36 +00:00
Michael Schurter
1a90b2d082 logmon: fix comment formattinglogmon: fix comment formattinglogmon: fix
comment formattinglogmon: fix comment formattinglogmon: fix comment
formatting
2019-07-22 13:05:01 -07:00
Michael Schurter
12c6ceb27b logmon: ensure errors are still handled properly
...and add a comment to switch back to the old error handling once we
switch to Go 1.12.
2019-07-22 12:49:48 -07:00
Danielle Lancashire
e0126123ab logmon: Workaround golang/go#29119
There's a bug in go1.11 that causes some io operations on windows to
return incorrect errors for some cases when Stat-ing files. To avoid
upgrading to go1.12 in a point release, here we loosen up the cases
where we will attempt to create fifos, and add some logging of
underlying stat errors to help with debugging.
2019-07-22 18:28:12 +02:00
Jasmine Dahilig
e31db578e0 add formatting for hcl parsing error messages (#5972) 2019-07-19 10:04:39 -07:00
Mahmood Ali
80b1344653 Update consul-template dependency to latest
To pick up the fix in
https://github.com/hashicorp/consul-template/pull/1231 .
2019-07-18 07:32:03 +07:00
Mahmood Ali
66bef39dd5 log unrecoverable errors 2019-07-17 11:01:59 +07:00
Mahmood Ali
bbf8f90ecb client/taskrunner: fix stats stats retry logic
Previously, if a channel is closed, we retry the Stats call.  But, if that call
fails, we go in a backoff loop without calling Stats ever again.

Here, we use a utility function for calling driverHandle.Stats call that retries
as one expects.

I aimed to preserve the logging formats but made small improvements as I saw fit.
2019-07-11 13:58:07 +08:00
Preetha Appan
16648b3e70 Test file for detect content type that satisfies linter and encoding 2019-07-10 11:42:04 -05:00
Preetha Appan
7de4018656 code review feedback 2019-07-10 10:41:06 -05:00
Preetha Appan
26652d7a6b Populate task event struct with kill timeout
This makes for a nicer task event message
2019-07-09 09:37:09 -05:00
Preetha Appan
0eae387a96 fix linting failure in test case file 2019-07-08 11:29:12 -05:00
Michael Lange
37f7ecafa2 Use consistent casing in the JSON representation of the AllocFileInfo struct 2019-07-02 17:27:31 -07:00
Preetha Appan
6c52f843e0 Added additional test cases and fixed go test case 2019-07-02 13:25:29 -05:00
Mahmood Ali
72f67c57cb Merge pull request #5905 from hashicorp/b-ar-failed-prestart
Fail alloc if alloc runner prestart hooks fail
2019-07-02 20:25:53 +08:00
Danielle
7968f799a5 Merge pull request #5864 from hashicorp/dani/win-pipe-cleaner
windows: Fix restarts using the raw_exec driver
2019-07-02 13:58:56 +02:00
Danielle Lancashire
c712fdcbd9 fifo: Safer access to Conn 2019-07-02 13:12:54 +02:00
Mahmood Ali
99802390c1 run post-run/post-stop task runner hooks
Handle when prestart failed while restoring a task, to prevent
accidentally leaking consul/logmon processes.
2019-07-02 18:38:32 +08:00
Mahmood Ali
380262613d Fail alloc if alloc runner prestart hooks fail
When an alloc runner prestart hook fails, the task runners aren't invoked
and they remain in a pending state.

This leads to terrible results, some of which are:
* Lockup in GC process as reported in https://github.com/hashicorp/nomad/pull/5861
* Lockup in shutdown process as TR.Shutdown() waits for WaitCh to be closed
* Alloc not being restarted/rescheduled to another node (as it's still in
  pending state)
* Unexpected restart of alloc on a client restart, potentially days/weeks after
  alloc expected start time!

Here, we treat all tasks to have failed if alloc runner prestart hook fails.
This fixes the lockups, and permits the alloc to be rescheduled on another node.

While it's desirable to retry alloc runner in such failures, I opted to treat it
out of scope.  I'm afraid of some subtles about alloc and task runners and their
idempotency that's better handled in a follow up PR.

This might be one of the root causes for
https://github.com/hashicorp/nomad/issues/5840 .
2019-07-02 18:35:47 +08:00
Mahmood Ali
bd7d60ef93 Merge pull request #5890 from hashicorp/b-dont-start-completed-allocs-2
task runner to avoid running task if terminal
2019-07-02 15:31:17 +08:00
Mahmood Ali
009f186eb7 address review comments 2019-07-02 14:53:50 +08:00
Mahmood Ali
22960f821e Merge pull request #5906 from hashicorp/b-alloc-stale-updates
client: defensive against getting stale alloc updates
2019-07-02 12:40:17 +08:00
Preetha Appan
de8ae8bcd2 Improve test cases for detecting content type 2019-07-01 16:24:48 -05:00
Danielle Lancashire
8148466da6 fifo: Close connections and cleanup lock handling 2019-07-01 14:14:29 +02:00
Danielle Lancashire
2e5aba9188 logmon: Add windows compatibility test 2019-07-01 14:14:06 +02:00
Mahmood Ali
2e1978eb1f client: defensive against getting stale alloc updates
When fetching node alloc assignments, be defensive against a stale read before
killing local nodes allocs.

The bug is when both client and servers are restarting and the client requests
the node allocation for the node, it might get stale data as server hasn't
finished applying all the restored raft transaction to store.

Consequently, client would kill and destroy the alloc locally, just to fetch it
again moments later when server store is up to date.

The bug can be reproduced quite reliably with single node setup (configured with
persistence).  I suspect it's too edge-casey to occur in production cluster with
multiple servers, but we may need to examine leader failover scenarios more closely.

In this commit, we only remove and destroy allocs if the removal index is more
recent than the alloc index. This seems like a cheap resiliency fix we already
use for detecting alloc updates.

A more proper fix would be to ensure that a nomad server only serves
RPC calls when state store is fully restored or up to date in leadership
transition cases.
2019-06-29 04:17:35 -05:00
Preetha Appan
f7f41c42e6 Infer content type in alloc fs stat endpoint 2019-06-28 20:31:28 -05:00
Danielle Lancashire
aff554deec appveyor: Run logmon tests 2019-06-28 16:01:41 +02:00
Danielle Lancashire
e6daf3b5bd fifo: Require that fifos do not exist for create
Although this operation is safe on linux, it is not safe on Windows when
using the named pipe interface. To provide a ~reasonable common api
abstraction, here we switch to returning File exists errors on the unix
api.
2019-06-28 13:47:18 +02:00
Danielle Lancashire
76f72fe4bd vendor: Use dani fork of go-winio 2019-06-28 13:47:18 +02:00
Danielle Lancashire
efda81cbbb logmon: Refactor fifo access for windows safety
On unix platforms, it is safe to re-open fifo's for reading after the
first creation if the file is already a fifo, however this is not
possible on windows where this triggers a permissions error on the
socket path, as you cannot recreate it.

We can't transparently handle this in the CreateAndRead handle, because
the Access Is Denied error is too generic to reliably be an IO error.
Instead, we add an explict API for opening a reader to an existing FIFO,
and check to see if the fifo already exists inside the calling package
(e.g logmon)
2019-06-28 13:41:54 +02:00
Mahmood Ali
f3c944aaf5 task runner to avoid running task if terminal
This change fixes a bug where nomad would avoid running alloc tasks if
the alloc is client terminal but the server copy on the client isn't
marked as running.

Here, we fix the case by having task runner uses the
allocRunner.shouldRun() instead of only checking the server updated
alloc.

Here, we preserve much of the invariants such that `tr.Run()` is always
run, and don't change the overall alloc runner and task runner
lifecycles.

Fixes https://github.com/hashicorp/nomad/issues/5883
2019-06-27 11:27:34 +08:00
Danielle Lancashire
079cfb437d tr: Fetch Wait channel before killTask in restart
Currently, if killTask results in the termination of a process before
calling WaitTask, Restart() will incorrectly return a TaskNotFound
error when using the raw_exec driver on Windows.
2019-06-26 15:20:57 +02:00
Mahmood Ali
6708a0ccc9 Merge pull request #5726 from hashicorp/b-plugins-via-init
Use init() to handle plugin invocation
2019-06-18 21:09:03 -04:00
Mahmood Ali
8fb9d25041 comment on use of init() for plugin handlers 2019-06-18 20:54:55 -04:00
Chris Baker
03500494b2 cleanup test 2019-06-18 14:15:25 +00:00
Chris Baker
7050e14eb5 formatting and clarity 2019-06-18 14:00:57 +00:00