Commit Graph

3946 Commits

Author SHA1 Message Date
Chris Dickson
bbb6b2af09 client: expose allocated CPU per task (#6784) 2019-12-09 15:40:22 -05:00
Seth Hoenig
94c60b4cfa tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Mahmood Ali
16aef03331 Merge pull request #6788 from hashicorp/b-timeout-logmon-stop
logmon: add timeout to RPC operations
2019-12-06 19:12:06 -05:00
Danielle Lancashire
5efa8994b9 spellcheck: Fix spelling of retrieve 2019-12-05 18:59:47 -06:00
Mahmood Ali
10c97bfb29 Merge pull request #6779 from hashicorp/r-aws-fingerprint-via-library
Use AWS SDK to access EC2 Metadata
2019-12-02 13:30:51 -05:00
Mahmood Ali
3428223004 logmon: add timeout to RPC operations
Add an RPC timeout for logmon.  In
https://github.com/hashicorp/nomad/issues/6461#issuecomment-559747758 ,
`logmonClient.Stop` locked up and indefinitely blocked the task runner
destroy operation.

This is an incremental improvement.  We still need to follow up to
understand how we got to that state, and the full impact of locked-up
Stop and its link to pending allocations on restart.
2019-12-02 10:33:05 -05:00
Mahmood Ali
b55bc6443e fingerprint code refactor
Some code cleanup:

* Use a field for setting EC2 metadata instead of env-vars in testing;
but keep environment variables for backward compatibility reasons

* Update tests to use testify
2019-11-26 10:51:28 -05:00
Mahmood Ali
f24dd5bec9 fingerprint: avoid api query if config overrides it 2019-11-26 10:51:28 -05:00
Mahmood Ali
34b9ee5433 fingerprint: use ec2metadata package 2019-11-26 10:51:27 -05:00
Lars Lehtonen
1c9a66950c client: fix use of T.Fatal inside TestFS_logsImpl_NoFollow() goroutine. 2019-11-25 23:51:28 -08:00
Mahmood Ali
c661d37ca2 fixup! tests: don't assume eth0 network is available 2019-11-21 08:28:20 -05:00
Mahmood Ali
bb81fce18e tests: don't assume eth0 network is available
TestClient_UpdateNodeFromFingerprintKeepsConfig checks a test node
network interface, which is hardcoded to `eth0` and is updated
asynchronously.  This causes flakiness when eth0 isn't available.

Here, we hardcode the value to an arbitrary network interface.
2019-11-20 20:37:30 -05:00
Mahmood Ali
b886e15487 tests: run TestClient_WatchAllocs in non-linux environments 2019-11-20 20:37:29 -05:00
Mahmood Ali
77a0064fcd testS: fix TestClient_RestoreError
When spinning a second client, ensure that it uses new driver
instances, rather than reuse the already shutdown unhealthy drivers from
first instance.

This speeds up tests significantly, but cutting ~50 seconds or so, the
timeout in NewClient until drivers fingerprints.  They never do because
drivers were shutdown already.
2019-11-20 20:37:28 -05:00
Mahmood Ali
47bc949a8f tests: remove TestClient_RestoreError test
TestClient_RestoreError is very slow, taking ~81 seconds.

It has few problematic patterns.  It's unclear what it tests, it
simulates a failure condition where all state db lookup fails and
asserts that alloc fails.  Though starting from
https://github.com/hashicorp/nomad/pull/6216 , we don't fail allocs in
that condition but rather restart them.

Also, the drivers used in second client `c2` are the same singleton
instances used in `c1` and already shutdown.  We ought to start healthy
new driver instances.
2019-11-20 20:37:27 -05:00
Preetha
d4f801d188 Merge pull request #6349 from hashicorp/b-host-stats
client: Return empty values when host stats fail
2019-11-20 10:13:02 -06:00
Lang Martin
c47d52e865 getter: allow the gcs download scheme (#6692) 2019-11-19 09:10:56 -05:00
Nick Ethier
387b016ac4 client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Drew Bailey
fb49f3c35b add server-id to monitor specific server 2019-11-14 09:53:41 -05:00
Drew Bailey
79411c5e0e coordinate closing of doneCh, use interface to simplify callers
comments
2019-11-05 11:44:26 -05:00
Drew Bailey
33ba36acbd log-json -> json
fix typo command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

Update command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

address feedback, lock to prevent send on closed channel

fix lock/unlock for dropped messages
2019-11-05 09:51:59 -05:00
Drew Bailey
bb2a7f4338 address feedback, fix gauge metric name 2019-11-05 09:51:57 -05:00
Drew Bailey
dc977dc8e6 move forwarded monitor request into helper 2019-11-05 09:51:56 -05:00
Drew Bailey
f74bd99b2a monitor command takes no args
rm extra new line

fix lint errors

return after close

fix, simplify test
2019-11-05 09:51:55 -05:00
Drew Bailey
f8eaf1f5af lock in sub select
rm redundant lock

wip to use framing

wip switch to stream frames
2019-11-05 09:51:54 -05:00
Drew Bailey
6bf8617d02 rename function, initialize log level better
underscores instead of dashes for query params
2019-11-05 09:51:53 -05:00
Drew Bailey
1176fc0227 address feedback, use agent_endpoint instead of monitor 2019-11-05 09:51:53 -05:00
Drew Bailey
92d6a30f86 agent:read acl policy for monitor 2019-11-05 09:51:52 -05:00
Drew Bailey
cd60628b31 rpc acl tests for both monitor endpoints 2019-11-05 09:51:51 -05:00
Drew Bailey
e7589301ea enable json formatting, use queryoptions 2019-11-05 09:51:49 -05:00
Drew Bailey
8095b4868a New monitor pkg for shared monitor functionality
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger

clean up old code

small comment about write

rm old comment about minsize

rename to Monitor

Removes connection logic from monitor command

Keep connection logic in endpoints, use a channel to send results from
monitoring

use new multisink logger and interfaces

small test for dropped messages

update go-hclogger and update sink/intercept logger interfaces
2019-11-05 09:51:49 -05:00
Drew Bailey
890b8a43fb get local rpc endpoint working 2019-11-05 09:51:48 -05:00
Drew Bailey
12819975ee remove log_writer
prefix output with proper spacing

update gzip handler, adjust first byte flow to allow gzip handler bypass

wip, first stab at wiring up rpc endpoint
2019-11-05 09:51:48 -05:00
Michael Schurter
0fcb0d4016 client: fix panic from 0.8 -> 0.10 upgrade
makeAllocTaskServices did not do a nil check on AllocatedResources
which causes a panic when upgrading directly from 0.8 to 0.10. While
skipping 0.9 is not supported we intend to fix serious crashers caused
by such upgrades to prevent cluster outages.

I did a quick audit of the client package and everywhere else that
accesses AllocatedResources appears to be properly guarded by a nil
check.
2019-11-01 07:47:03 -07:00
Lars Lehtonen
d0d18b2a85 client/allocwatcher: fix dropped test error (#6592) 2019-10-31 08:29:25 -04:00
Michael Schurter
523586a6e6 vault: remove dead lease code 2019-10-25 15:08:35 -07:00
Michael Schurter
d42ac815b4 Merge branch 'master' into release-0100 2019-10-22 08:17:57 -07:00
Michael Schurter
d5a95f7810 cleanup post 0.10.0 release 2019-10-22 07:48:09 -07:00
Nomad Release bot
25ee121d95 Generate files for 0.10.0 release 2019-10-22 12:34:56 +00:00
Mahmood Ali
816f8fbb20 Revert "lint: ignore generated windows syscall wrappers"
This reverts commit 482862e6ab.
2019-10-22 08:23:44 -04:00
Michael Schurter
f989769b06 client: expose group network ports in env vars
Fixes #6375

Intentionally omitted IPs prior to 0.10.0 release to minimize changes
and risk.
2019-10-21 13:28:35 -07:00
Michael Schurter
5a6818510e client: expose group network ports in env vars
Fixes #6375

Intentionally omitted IPs prior to 0.10.0 release to minimize changes
and risk.
2019-10-21 12:31:13 -07:00
Michael Schurter
1c46510482 connect: upgrade to envoy 1.11.2 and add sha
Append the Docker image sha to the Envoy image to ensure users default
to using the version that Nomad was tested against.
2019-10-18 10:16:58 -07:00
Michael Schurter
ca57cd2775 connect: upgrade to envoy 1.11.2 and add sha
Append the Docker image sha to the Envoy image to ensure users default
to using the version that Nomad was tested against.
2019-10-18 07:46:53 -07:00
Mahmood Ali
9942dec211 Merge pull request #6290 from hashicorp/r-generated-code-refactor
dev: avoid codecgen code in downstream projects
2019-10-15 08:22:31 -04:00
Michael Schurter
aed2691ca7 Remove 0.10.0-rc1 generated files 2019-10-10 13:31:42 -07:00
Nomad Release bot
c49bf41779 Generate files for 0.10.0-rc1 release 2019-10-10 19:08:23 +00:00
Michael Schurter
43909b1374 Revert "Revert "Use joint context to cancel prestart hooks"" 2019-10-08 11:34:09 -07:00
Michael Schurter
680e30457f Revert "Use joint context to cancel prestart hooks" 2019-10-08 11:27:08 -07:00
Mahmood Ali
7a38784244 acl: check ACL against object namespace
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to.  The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.

Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.

Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace.  Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release.  As such, we maintain current behavior that doesn't require
passing the proper namespace in request.  A future major release may
start enforcing checking declared namespace.
2019-10-08 12:59:22 -04:00