Commit Graph

1485 Commits

Author SHA1 Message Date
Seth Hoenig
d24d470775 comments: cleanup some leftover debug comments and such 2020-01-31 19:04:35 -06:00
Seth Hoenig
ead935d12c agent: re-enable the server in dev mode 2020-01-31 19:04:19 -06:00
Seth Hoenig
9f48d83378 nomad: handle SI token revocations concurrently
Be able to revoke SI token accessors concurrently, and also
ratelimit the requests being made to Consul for the various
ACL API uses.
2020-01-31 19:04:14 -06:00
Seth Hoenig
d85cccc8d0 nomad: fixup token policy validation 2020-01-31 19:04:08 -06:00
Seth Hoenig
674ccaa122 nomad: proxy requests for Service Identity tokens between Clients and Consul
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
2020-01-31 19:03:53 -06:00
Seth Hoenig
0040c75e8e command, docs: create and document consul token configuration for connect acls (gh-6716)
This change provides an initial pass at setting up the configuration necessary to
enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul
Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert`
commands (similar to Vault tokens).

These values are not actually used yet in this changeset.
2020-01-31 19:02:53 -06:00
Michael Schurter
e3e1f5cb53 core: add limits to unauthorized connections
Introduce limits to prevent unauthorized users from exhausting all
ephemeral ports on agents:

 * `{https,rpc}_handshake_timeout`
 * `{http,rpc}_max_conns_per_client`

The handshake timeout closes connections that have not completed the TLS
handshake by the deadline (5s by default). For RPC connections this
timeout also separately applies to first byte being read so RPC
connections with TLS enabled have `rpc_handshake_time * 2` as their
deadline.

The connection limit per client prevents a single remote TCP peer from
exhausting all ephemeral ports. The default is 100, but can be lowered
to a minimum of 26. Since streaming RPC connections create a new TCP
connection (until MultiplexV2 is used), 20 connections are reserved for
Raft and non-streaming RPCs to prevent connection exhaustion due to
streaming RPCs.

All limits are configurable and may be disabled by setting them to `0`.

This also includes a fix that closes connections that attempt to create
TLS RPC connections recursively. While only users with valid mTLS
certificates could perform such an operation, it was added as a
safeguard to prevent programming errors before they could cause resource
exhaustion.
2020-01-30 10:38:25 -08:00
Mahmood Ali
eb0acc3301 Merge pull request #6935 from hashicorp/b-default-preemption-flag
scheduler: allow configuring default preemption for system scheduler
2020-01-28 15:11:06 -05:00
Mahmood Ali
31025d6cac Support customizing full scheduler config 2020-01-28 14:51:42 -05:00
Nick Ethier
d6dd9b61ef consul: fix var name from rebase 2020-01-27 14:00:19 -05:00
Nick Ethier
330d24cb80 consul: fix var name from rebase 2020-01-27 12:55:52 -05:00
Nick Ethier
64f4e9e691 consul: add support for canary meta 2020-01-27 09:53:30 -05:00
Mahmood Ali
744c9a485d scheduler: allow configuring default preemption for system scheduler
Some operators want a greater control over when preemption is enabled,
especially during an upgrade to limit potential side-effects.
2020-01-13 08:30:49 -05:00
Drew Bailey
a58b8a5e9c refactor api profile methods
comment why we ignore errors parsing params
2020-01-09 15:15:12 -05:00
Drew Bailey
ad86438fc0 adds qc param, address pr feedback 2020-01-09 15:15:11 -05:00
Drew Bailey
28265086fc condense table test 2020-01-09 15:15:10 -05:00
Drew Bailey
549045fcbb Rename profile package to pprof
Address pr feedback, rename profile package to pprof to more accurately
describe its purpose. Adds gc param for heap lookup profiles.
2020-01-09 15:15:10 -05:00
Drew Bailey
1776458956 address pr feedback 2020-01-09 15:15:09 -05:00
Drew Bailey
a3f73b3e06 leave acl checking to rpc endpoints
fix test expectation

test wrapNonJSON
2020-01-09 15:15:08 -05:00
Drew Bailey
db382d3195 provide helpful error, cleanup logic 2020-01-09 15:15:08 -05:00
Drew Bailey
11563dca1c prevent doubly wrapping with rpc error 2020-01-09 15:15:07 -05:00
Drew Bailey
d77b5add6c RPC server EnableDebug option
Passes in agent enable_debug config to nomad server and client configs.
This allows for rpc endpoints to have more granular control if they
should be enabled or not in combination with ACLs.

enable debug on client test
2020-01-09 15:15:07 -05:00
Drew Bailey
328075591f region forwarding; prevent recursive forwards for impossible requests
prevent region forwarding loop, backfill tests

fix failing test
2020-01-09 15:15:06 -05:00
Drew Bailey
390e22e421 move shared structs out of client and into nomad 2020-01-09 15:15:05 -05:00
Drew Bailey
57dc0c6a46 test pprof headers and profile methods
tidy up, add comments

clean up seconds param assignment
2020-01-09 15:15:04 -05:00
Drew Bailey
c28e5ad036 warn when enabled debug is on when registering
m -> a receiver name

return codederrors, fix query
2020-01-09 15:15:04 -05:00
Drew Bailey
d077cfe763 acl and debug test table
rename implementation method
2020-01-09 15:15:03 -05:00
Drew Bailey
fb1b4cdc26 Server request forwarding for Agent.Profile
Return rpc errors for profile requests, set up remote forwarding to
target leader or server id for profile requests.

server forwarding, endpoint tests
2020-01-09 15:15:03 -05:00
Drew Bailey
3575f1777f test for known pprof endpoints 2020-01-09 15:15:02 -05:00
Drew Bailey
240c0ee0ec agent pprof endpoints
wip, agent endpoint and client endpoint for pprof profiles

agent endpoint test
2020-01-09 15:15:02 -05:00
Drew Bailey
8c1f9b4128 docs for shutdown delay
update docs, address pr comments

ensure pointer is not nil

use pointer for diff tests, set vs unset
2019-12-16 11:38:35 -05:00
Drew Bailey
672b76056b shutdown delay for task groups
copy struct values

ensure groupserviceHook implements RunnerPreKillhook

run deregister first

test that shutdown times are delayed

move magic number into variable
2019-12-16 11:38:16 -05:00
Seth Hoenig
94c60b4cfa tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Michael Schurter
aaffe61863 Merge branch 'master' into release-0102 2019-12-04 14:13:34 -08:00
Mahmood Ali
2afc939588 tests: deflake TestHTTP_FreshClientAllocMetrics
The test asserts that alloc counts get reported accurately in metrics by
inspecting the metrics endpoint directly.  Sadly, the metrics as
collected by `armon/go-metrics` seem to be stateful and may contain info
from other tests.

This means that the test can fail depending on the order of returned
metrics.

Inspecting the metrics output of one failing run, you can see the
duplicate guage entries but for different node_ids:

```
    {
      "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
      "Value": 10,
      "Labels": {
        "datacenter": "dc1",
        "node_class": "none",
        "node_id": "67402bf4-00f3-bd8d-9fa8-f4d1924a892a"
      }
    },
    {
      "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
      "Value": 0,
      "Labels": {
        "datacenter": "dc1",
        "node_class": "none",
        "node_id": "a2945b48-7e66-68e2-c922-49b20dd4e20c"
      }
    },
```
2019-11-22 18:41:21 -05:00
Nomad Release bot
33d411d6a5 Generate files for 0.10.2-rc1 release 2019-11-22 18:42:49 +00:00
Mahmood Ali
bbc7b6b486 Merge pull request #6669 from hashicorp/b-cors-allow-credentials
Allow UI to query client directly for task logs/state
2019-11-20 15:14:01 -05:00
Preetha
6cab7872a8 Merge pull request #6421 from hashicorp/b-acl-bootstrap-codes
api: acl bootstrap errors aren't 500
2019-11-20 10:36:08 -06:00
Preetha
d4f801d188 Merge pull request #6349 from hashicorp/b-host-stats
client: Return empty values when host stats fail
2019-11-20 10:13:02 -06:00
Mahmood Ali
fd66b9c9ab api: acl bootstrap errors aren't 500
Noticed that ACL endpoints return 500 status code for user errors.  This
is confusing and can lead to false monitoring alerts.

Here, I introduce a concept of RPCCoded errors to be returned by RPC
that signal a code in addition to error message.  Codes for now match
HTTP codes to ease reasoning.

```
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9))

$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9))
```
2019-11-19 15:51:57 -05:00
Nick Ethier
387b016ac4 client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Drew Bailey
e46c41553d serverID to target remote leader or server
handle the case where we request a server-id which is this current server

update docs, error on node and server id params

more accurate names for tests

use shared no leader err, formatting

rm bad comment

remove redundant variable
2019-11-14 10:07:35 -05:00
Drew Bailey
fb49f3c35b add server-id to monitor specific server 2019-11-14 09:53:41 -05:00
Drew Bailey
c24a631def Merge pull request #6670 from hashicorp/api/fallthrough-test
test rootfallthrough handler
2019-11-13 10:51:31 -05:00
Lars Lehtonen
75476e350d command/agent: Prune Dead Code (#6682)
* remove unused MockPeriodicJob() from tests
* remove unused getIndex() from tests
* remove unused checkIndex() from tests
* remove unused assertIndex() from tests
* remove unused Agent.findLoopbackDevice()
2019-11-13 08:20:01 -05:00
Drew Bailey
e187a7f3f9 fix so assertions are test case driven 2019-11-12 14:28:21 -05:00
Charlie Voiselle
1ec6388145 Added service wrapper code (#6220)
This is the basic code to add the Windows Service Manager hooks to Nomad.

Includes vendoring golang.org/x/sys/windows/svc and added Docs:
* guide for installing as a windows service.
* configuration for logging to file from PR #6429
2019-11-11 15:16:07 -05:00
Drew Bailey
e84eed84d4 test /ui/ path 2019-11-11 12:12:42 -05:00
Drew Bailey
6ad44f7ea6 test rootfallthrough handler 2019-11-11 12:08:44 -05:00
Mahmood Ali
5a0826fdf9 Allow UI to query client directly
Nomad web UI currently fails when querying client nodes for allocation
state end endpoints, due to CORS policy.

The issue is that CORS requests that are marked `withCredentials` need
the http server to include a `Access-Control-Allow-Credentials` [1].

But Nomad Task Logs and filesystem requests include authenticating
information and thus marked with `credentials=true`[2][3].

It's worth noting that the browser currently sends credentials and
authentication token to servers anyway; it's just that the response is
not made available to caller nomad ui javascript.  For task logs
specifically, nomad ui retries again by querying the web ui address
(typically pointing to a nomad server) which will forward the request
to the nomad client agent appropriately.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials
[2] 101d0373ee/ui/app/components/task-log.js (L50)
[3] 101d0373ee/ui/app/services/token.js (L25-L39)
2019-11-11 15:13:30 +00:00