Commit Graph

17016 Commits

Author SHA1 Message Date
Michael Schurter
b48a21cc77 test: wait longer than timeout
The 1s timeout raced with the 1s deadline it was trying to detect.
2020-02-07 15:50:53 -08:00
Michael Schurter
19d77d9c04 test: fix flaky health test
The test set Agent.client=nil, which prevented the client from being
shut down. This leaked goroutines and could cause panics due to the
leaked client goroutines logging after their parent test had finished.

Removed ACLs from the server test because I couldn't get it to work with
the test agent, and it tested very little.
2020-02-07 15:50:53 -08:00
Michael Schurter
46d5d3e583 test: fix race around reused default rpc addr
The default RPC addr was a global which is fine for normal runtime use
when it only has a single user.

However, many tests modify it and cause races. Follow our convention of
returning defaults from funcs instead of using globals.
2020-02-07 15:50:53 -08:00
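The convention mentioned in the commit above (return defaults from funcs instead of sharing a global) might look like the sketch below. `DefaultRPCAddr` and its address values are assumptions made for illustration and are not claimed to match the actual Nomad code.

```go
package main

import (
	"fmt"
	"net"
)

// DefaultRPCAddr returns a fresh value on every call, so a caller that
// mutates the result cannot affect any other caller.
// (Illustrative name and values, assumed for this sketch.)
func DefaultRPCAddr() *net.TCPAddr {
	return &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 4647}
}

func main() {
	a := DefaultRPCAddr()
	a.Port = 9999 // a test overriding the port only touches its own copy

	b := DefaultRPCAddr()
	fmt.Println(a.Port, b.Port) // prints "9999 4647"
}
```

With a package-level variable, the same override would be visible to every test running in the process, which is the kind of race the commit describes.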
Michael Schurter
b1f443500d client: fix race accessing Node.status
 * Call Node.Canonicalize once when Node is created.
 * Lock when accessing fields mutated by node update goroutine.
2020-02-07 15:50:47 -08:00
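A minimal sketch of the locking pattern named in the commit above, assuming a simplified `client` type that guards its node with a `sync.RWMutex`. All type, field, and method names here are hypothetical and much smaller than the real client.

```go
package main

import (
	"fmt"
	"sync"
)

// node and client are simplified stand-ins for illustration only.
type node struct{ Status string }

type client struct {
	mu   sync.RWMutex // guards node
	node *node
}

// setStatus is what an update goroutine would call.
func (c *client) setStatus(s string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.node.Status = s
}

// status takes the read lock so it never observes a concurrent write.
func (c *client) status() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.node.Status
}

func main() {
	c := &client{node: &node{Status: "initializing"}}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		c.setStatus("ready")
	}()

	_ = c.status() // concurrent read, safe because of the lock
	wg.Wait()
	fmt.Println(c.status()) // prints "ready"
}
```

Running a sketch like this with `go run -race` is one way to check that unguarded accesses have not crept back in.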
Seth Hoenig
729e0c20a5 Merge pull request #7071 from hashicorp/b-e2e-cacls-wait-longer
e2e: wait 2m rather than 10s after disabling consul acls
2020-02-04 14:05:10 -06:00
Michael Schurter
a74917e07b Merge pull request #7074 from hashicorp/docs-changelog-6065
docs: #6065 shipped in v0.10.0, not v0.9.6
2020-02-04 11:57:46 -08:00
Michael Schurter
d55c549a31 docs: #6065 shipped in v0.10.0, not v0.9.6
PR #6065 was intended to be backported to v0.9.6 to fix issue #6223.
However it appears to have not been backported:

 * https://github.com/hashicorp/nomad/blob/v0.9.6/client/allocrunner/taskrunner/task_runner.go#L1349-L1351
 * https://github.com/hashicorp/nomad/blob/v0.9.7/client/allocrunner/taskrunner/task_runner.go#L1349-L1351

The fix was included in v0.10.0:

 * https://github.com/hashicorp/nomad/blob/v0.10.0/client/allocrunner/taskrunner/task_runner.go#L1363-L1370
2020-02-04 11:19:47 -08:00
Drew Bailey
f944959cdd Merge pull request #7072 from hashicorp/system-sched-e2e
System sched e2e
2020-02-04 14:16:21 -05:00
Drew Bailey
84cc906968 simplify job, better error 2020-02-04 13:59:39 -05:00
Drew Bailey
8bf5016880 fix check 2020-02-04 12:16:20 -05:00
Drew Bailey
39c9c20e88 rm unused field 2020-02-04 12:02:01 -05:00
Drew Bailey
3609e3adc1 clean up 2020-02-04 11:59:28 -05:00
Drew Bailey
5c2075e463 get test passing, new util func to wait for not pending 2020-02-04 11:56:37 -05:00
Drew Bailey
756f5c7d79 add e2e test for system sched ineligible nodes 2020-02-04 11:56:33 -05:00
Seth Hoenig
0f2d9ea915 e2e: wait 2m rather than 10s after disabling consul acls
Pretty sure Consul / Nomad clients are often not ready yet after
the ConsulACLs test disables ACLs, by the time the next test starts
running.

Running locally things tend to work, but in TeamCity this seems to
be a recurring problem. However, even when running locally I sometimes
see in the "show status" step after disabling ACLs that some nodes are
still initializing, suggesting we're right on the border of not waiting
long enough:

    nomad node status
    ID        DC   Name              Class   Drain  Eligibility  Status
    0e4dfce2  dc1  EC2AMAZ-JB3NF9P   <none>  false  eligible     ready
    6b90aa06  dc2  ip-172-31-16-225  <none>  false  eligible     ready
    7068558a  dc2  ip-172-31-20-143  <none>  false  eligible     ready
    e0ae3c5c  dc1  ip-172-31-25-165  <none>  false  eligible     ready
    15b59ed6  dc1  ip-172-31-23-199  <none>  false  eligible     initializing

Going to try waiting a full 2 minutes after disabling ACLs, hopefully that
will help things Just Work. In the future, we should probably be parsing the
output of the status checks and actually confirming all nodes are ready.

Even better, maybe that's something shipyard will have built-in.
2020-02-04 10:51:03 -06:00
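The follow-up idea in the commit above (parse the status output and confirm all nodes are ready) could be sketched as below. This assumes the column layout shown in the commit message, with the status in the last column; `notReady` and `sample` are hypothetical names, and real CLI output may differ between versions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample reproduces the table quoted in the commit message.
const sample = `ID        DC   Name              Class   Drain  Eligibility  Status
0e4dfce2  dc1  EC2AMAZ-JB3NF9P   <none>  false  eligible     ready
6b90aa06  dc2  ip-172-31-16-225  <none>  false  eligible     ready
7068558a  dc2  ip-172-31-20-143  <none>  false  eligible     ready
e0ae3c5c  dc1  ip-172-31-25-165  <none>  false  eligible     ready
15b59ed6  dc1  ip-172-31-23-199  <none>  false  eligible     initializing
`

// notReady returns the IDs of nodes whose last column is not "ready".
// It assumes whitespace-separated columns and a header row starting with "ID".
func notReady(out string) []string {
	var ids []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] == "ID" {
			continue // skip blank lines and the header row
		}
		if fields[len(fields)-1] != "ready" {
			ids = append(ids, fields[0])
		}
	}
	return ids
}

func main() {
	fmt.Println(notReady(sample)) // prints "[15b59ed6]"
}
```

A test could poll until `notReady` returns an empty slice instead of sleeping for a fixed two minutes.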
Tim Gross
ed41d7b590 e2e: rename linux runner to avoid implicit build tag (#7070)
Go implicitly treats files ending with `_linux.go` as build tagged for
Linux only. This broke the e2e provisioning framework on macOS once we
tried importing it into the `e2e/consulacls` module.
2020-02-04 10:55:38 -05:00
Tim Gross
15a2acc741 e2e: improve provisioning defaults and documentation (#7062)
This changeset improves the ergonomics of running the Nomad e2e test
provisioning process by defaulting to a blank `nomad_sha` in the
Terraform configuration. By default, a user will now need to pass in
one of the Nomad version flags. But they won't have to manually edit
the `provisioning.json` file for the common case of deploying a
released version of Nomad, and won't need to put dummy values for
`nomad_sha`.

Includes general documentation improvements.
2020-02-04 10:37:00 -05:00
Seth Hoenig
18bccb2348 Merge pull request #7060 from hashicorp/f-e2e-more-missed-debug
e2e: turn no-ACLs connect tests back on
2020-02-04 08:47:10 -06:00
Seth Hoenig
a2ee80402d e2e: turn no-ACLs connect tests back on
Also clean up more missed debugging things >.>
2020-02-03 20:46:36 -06:00
Michael Schurter
61d4a44b1f docs: fix typo, ordering, & style in changelog 2020-02-03 13:59:57 -08:00
Mahmood Ali
c7eb60bbac Merge pull request #7055 from hashicorp/r-dev-tweaks-20200203
Grab bag of dev tweaks
2020-02-03 14:25:06 -05:00
Drew Bailey
c038ee0d86 Merge pull request #6975 from hashicorp/b-update-placed-canaries
keep placed canaries aligned in raft store
2020-02-03 14:24:32 -05:00
Michael Schurter
5c43f8c3b1 docs: add link & reorg #6690 in changelog 2020-02-03 11:03:45 -08:00
Drew Bailey
f7fb6219a9 add state store test to ensure PlacedCanaries is updated 2020-02-03 13:58:01 -05:00
Drew Bailey
895e563461 nomad state store must be modified through raft, rm local state change 2020-02-03 13:57:34 -05:00
Drew Bailey
f788316385 keep placed canaries aligned with alloc status 2020-02-03 13:57:33 -05:00
Drew Bailey
d28898be2d Merge pull request #7053 from hashicorp/b-client-monitor-acl-panic
Fix panic when monitoring a local client node
2020-02-03 13:45:46 -05:00
Michael Schurter
a1fee694fd docs: fix misspelling 2020-02-03 10:32:22 -08:00
Michael Lange
bbdfd69ff2 Merge pull request #6979 from hashicorp/f/codeowners
Add the digital marketing team as the code owners for the website dir
2020-02-03 10:28:12 -08:00
Drew Bailey
0eb358632c Merge pull request #6996 from hashicorp/system-sched-ineligible-updates
System sched ignore ineligible updates
2020-02-03 13:22:30 -05:00
Drew Bailey
173ad8315f update changelog 2020-02-03 13:20:07 -05:00
Drew Bailey
4e9dc03d7d agent Profile req nil check s.agent.Server()
clean up logic and tests
2020-02-03 13:20:05 -05:00
Drew Bailey
39a6c6374a Fix panic when monitoring a local client node
Fixes a panic when accessing a.agent.Server() when the agent is a client
instead. This PR removes a redundant ACL check since ACLs are validated
at the RPC layer. It also nil-checks the agent server and uses Client()
when appropriate.
2020-02-03 13:20:04 -05:00
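The nil check described in the commit above can be illustrated with the following sketch. The `Agent`, `Server`, and `Client` types and the `monitorTarget` helper are simplified assumptions for this example and do not reproduce the real agent code.

```go
package main

import (
	"errors"
	"fmt"
)

// Server, Client and Agent are hypothetical, stripped-down types.
type Server struct{}
type Client struct{}

type Agent struct {
	server *Server // nil when the agent runs as a client only
	client *Client // nil when the agent runs as a server only
}

func (a *Agent) Server() *Server { return a.server }
func (a *Agent) Client() *Client { return a.client }

// monitorTarget picks which side of the agent should handle a request,
// checking for nil before using either one.
func monitorTarget(a *Agent) (string, error) {
	if srv := a.Server(); srv != nil {
		return "server", nil
	}
	if c := a.Client(); c != nil {
		return "client", nil
	}
	return "", errors.New("agent has neither a server nor a client")
}

func main() {
	target, err := monitorTarget(&Agent{client: &Client{}})
	fmt.Println(target, err) // prints "client <nil>"
}
```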
Mahmood Ali
e8136c0c66 Merge pull request #7045 from hashicorp/b-rpc-fixes
Some fixes to connection pooling
2020-02-03 13:10:15 -05:00
Seth Hoenig
7f43161717 Merge pull request #7054 from hashicorp/f-remove-leftover-debug-line
e2e: remove leftover e2e debug println
2020-02-03 12:02:19 -06:00
Mahmood Ali
3bfc7d125d pool: Clear connection before releasing
This is to be consistent with the other connection cleanup handling, as well as with Consul's: https://github.com/hashicorp/consul/blob/v1.6.3/agent/pool/pool.go#L468-L479
2020-02-03 12:41:11 -05:00
Mahmood Ali
41d5a690cf make: emit explanation for /api isolation
Emit a slightly helpful message when /api depends on nomad internal
packages.
2020-02-03 12:22:10 -05:00
Mahmood Ali
2e0f98c97a run "make hclfmt" 2020-02-03 12:15:53 -05:00
Seth Hoenig
9ccaa92ba1 e2e: remove leftover debug println statement 2020-02-03 11:15:38 -06:00
Mahmood Ali
d13fe75693 dev: Tweaks to cluster dev scripts
Consolidate all nomad data dir in a single root
`/tmp/nomad-dev-cluster`.  Eases clean up.

Allow running script from any path - don't require devs to cd into
`dev/cluster` directory first.

Also, block while nomad processes are running and propagate
SIGTERM/SIGINT to the nomad processes to shut them down.
2020-02-03 11:50:43 -05:00
Mahmood Ali
896ddf5629 prehook: fix enterprise repo remote value 2020-02-03 11:29:25 -05:00
Mahmood Ali
c0f42dca76 vagrant: disable audio interference
Avoid Vagrant/VirtualBox interfering with host audio when the VM boots.
2020-02-03 11:26:41 -05:00
Mahmood Ali
445d0199b7 Merge pull request #7051 from hashicorp/b-copy-jobs-oss
sentinel: copy jobs to prevent mutation
2020-02-03 10:49:44 -05:00
Drew Bailey
e86988cc8d update changelog 2020-02-03 09:04:09 -05:00
Drew Bailey
a880d75b16 comment for filtering reason 2020-02-03 09:02:09 -05:00
Drew Bailey
92f0a343cb add test for node eligibility 2020-02-03 09:02:09 -05:00
Drew Bailey
cd00d6ded5 make diffSystemAllocsForNode aware of eligibility
diffSystemAllocs -> diffSystemAllocsForNode: this function is only used
for diffing system allocations, but lacked awareness of eligible
nodes and of the node ID on which the allocation was going to be placed.

This change now ignores a change if its existing allocation is on an
ineligible node. For a new allocation, it also checks tainted and
ineligible nodes in the same function instead of nil-ing out the diff
after computation in diffSystemAllocs.
2020-02-03 09:02:08 -05:00
Drew Bailey
580baea231 ignore computed diffs if node is ineligible
test flaky, add temp sleeps for debugging

fix computed class
2020-02-03 09:02:08 -05:00
Michael Schurter
5dbccce4de sentinel: copy jobs to prevent mutation
It's unclear whether Sentinel code can mutate values passed to the eval,
so ensure it cannot by copying the job.
2020-02-03 08:48:51 -05:00
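The defensive copy described in the commit above can be sketched as follows. The `Job` type, its `Copy` method, and `untrustedEval` are hypothetical simplifications for illustration; the real job struct is far larger.

```go
package main

import "fmt"

// Job is a hypothetical, reduced job type.
type Job struct {
	ID   string
	Meta map[string]string
}

// Copy returns a deep copy, including the map, so the copy shares no
// mutable state with the original.
func (j *Job) Copy() *Job {
	if j == nil {
		return nil
	}
	c := &Job{ID: j.ID}
	if j.Meta != nil {
		c.Meta = make(map[string]string, len(j.Meta))
		for k, v := range j.Meta {
			c.Meta[k] = v
		}
	}
	return c
}

// untrustedEval stands in for policy code that may or may not mutate
// its input.
func untrustedEval(j *Job) {
	j.ID = "tampered"
	if j.Meta != nil {
		j.Meta["injected"] = "true"
	}
}

func main() {
	job := &Job{ID: "web", Meta: map[string]string{"team": "platform"}}
	untrustedEval(job.Copy()) // the evaluator only ever sees the copy
	fmt.Println(job.ID, len(job.Meta)) // prints "web 1"
}
```

Copying the map matters here: a shallow struct copy would still share `Meta` with the original.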
Michael Lange
ce3d58191d Merge pull request #7047 from hashicorp/f-ui/node-drain-icons
UI: Node drain status light icons
2020-01-31 22:45:04 -08:00