Commit Graph

2491 Commits

Author SHA1 Message Date
Seth Hoenig
401193bcfa Merge pull request #7192 from hashicorp/b-connect-stanza-ignore
consul/connect: in-place update sidecar service registrations on changes
2020-02-21 09:24:53 -06:00
Seth Hoenig
1225afbf08 consul/connect: in-place update sidecar service registrations on changes
Fix a bug where consul service definitions would not be updated if changes
were made to the service in the Nomad job. Currently this only fixes the
bug for cases where the fix is a matter of updating consul agent's service
registration. There is related bug where destructive changes are required
(see #6877) which will be fixed in another PR.

The enable_tag_override configuration setting for the parent service is
applied to the sidecar service.

Fixes #6459
2020-02-19 13:07:04 -06:00
Mahmood Ali
90381d82ca Merge pull request #7171 from hashicorp/update-autopilot-20200214
Update consul vendor and add MinQuorum flag
2020-02-19 10:45:20 -06:00
Drew Bailey
4b6f44d0dc inlude pro in http_oss.go 2020-02-18 10:29:28 -05:00
Mahmood Ali
a3b0b25acb update rest of consul packages 2020-02-16 16:25:04 -06:00
Mahmood Ali
1d9ffa640b implement MinQuorum 2020-02-16 16:04:59 -06:00
Mahmood Ali
2ccade3364 tests: Avoid StartAsLeader raft config flag
It's being deprecated
2020-02-13 18:56:53 -05:00
Seth Hoenig
1ced8ba47d Merge pull request #7106 from hashicorp/f-ctag-override
client: enable configuring enable_tag_override for services
2020-02-13 12:34:48 -06:00
Michael Schurter
3a01ad4892 Merge pull request #7102 from hashicorp/test-limits
Fix some race conditions and flaky tests
2020-02-13 10:19:11 -08:00
Seth Hoenig
5ccc9a634a command: use consistent CONSUL_HTTP_TOKEN name
Consul CLI uses CONSUL_HTTP_TOKEN, so Nomad should use the same.
Note that consul-template uses CONSUL_TOKEN, which Nomad also uses,
so be careful to preserve any reference to that in the consul-template
context.
2020-02-12 10:42:33 -06:00
Seth Hoenig
6bfd86b1f8 client: enable configuring enable_tag_override for services
Consul provides a feature of Service Definitions where the tags
associated with a service can be modified through the Catalog API,
overriding the value(s) configured in the agent's service configuration.

To enable this feature, the flag enable_tag_override must be configured
in the service definition.

Previously, Nomad did not allow configuring this flag, and thus the default
value of false was used. Now, it is configurable.

Because Nomad itself acts as a state machine around the the service definitions
of the tasks it manages, it's worth describing what happens when this feature
is enabled and why.

Consider the basic case where there is no Nomad, and your service is provided
to consul as a boring JSON file. The ultimate source of truth for the definition
of that service is the file, and is stored in the agent. Later, Consul performs
"anti-entropy" which synchronizes the Catalog (stored only the leaders). Then
with enable_tag_override=true, the tags field is available for "external"
modification through the Catalog API (rather than directly configuring the
service definition file, or using the Agent API). The important observation
is that if the service definition ever changes (i.e. the file is changed &
config reloaded OR the Agent API is used to modify the service), those
"external" tag values are thrown away, and the new service definition is
once again the source of truth.

In the Nomad case, Nomad itself is the source of truth over the Agent in
the same way the JSON file was the source of truth in the example above.
That means any time Nomad sets a new service definition, any externally
configured tags are going to be replaced. When does this happen? Only on
major lifecycle events, for example when a task is modified because of an
updated job spec from the 'nomad job run <existing>' command. Otherwise,
Nomad's periodic re-sync's with Consul will now no longer try to restore
the externally modified tag values (as long as enable_tag_override=true).

Fixes #2057
2020-02-10 08:00:55 -06:00
Michael Schurter
040472224b test: fix flaky TestHTTP_FreshClientAllocMetrics 2020-02-07 15:50:53 -08:00
Michael Schurter
71304a306f test: fix missing agent shutdowns 2020-02-07 15:50:53 -08:00
Michael Schurter
6198c604ea testagent: fix case where agent would retry forever 2020-02-07 15:50:53 -08:00
Michael Schurter
f3cf1064d1 test: improve error messages when failing 2020-02-07 15:50:53 -08:00
Michael Schurter
683d34fc18 test: allow goroutine to exit even if test blocks 2020-02-07 15:50:53 -08:00
Michael Schurter
4ed435da05 test: workaround limits race 2020-02-07 15:50:53 -08:00
Michael Schurter
b48a21cc77 test: wait longer than timeout
The 1s timeout raced with the 1s deadline it was trying to detect.
2020-02-07 15:50:53 -08:00
Michael Schurter
19d77d9c04 test: fix flaky health test
Test set Agent.client=nil which prevented the client from being
shutdown. This leaked goroutines and could cause panics due to the
leaked client goroutines logging after their parent test had finished.

Removed ACLs from the server test because I couldn't get it to work with
the test agent, and it tested very little.
2020-02-07 15:50:53 -08:00
Michael Schurter
b1f443500d client: fix race accessing Node.status
* Call Node.Canonicalize once when Node is created.
 * Lock when accessing fields mutated by node update goroutine
2020-02-07 15:50:47 -08:00
Drew Bailey
4e9dc03d7d agent Profile req nil check s.agent.Server()
clean up logic and tests
2020-02-03 13:20:05 -05:00
Drew Bailey
39a6c6374a Fix panic when monitoring a local client node
Fixes a panic when accessing a.agent.Server() when agent is a client
instead. This pr removes a redundant ACL check since ACLs are validated
at the RPC layer. It also nil checks the agent server and uses Client()
when appropriate.
2020-02-03 13:20:04 -05:00
Seth Hoenig
d24d470775 comments: cleanup some leftover debug comments and such 2020-01-31 19:04:35 -06:00
Seth Hoenig
ead935d12c agent: re-enable the server in dev mode 2020-01-31 19:04:19 -06:00
Seth Hoenig
9f48d83378 nomad: handle SI token revocations concurrently
Be able to revoke SI token accessors concurrently, and also
ratelimit the requests being made to Consul for the various
ACL API uses.
2020-01-31 19:04:14 -06:00
Seth Hoenig
d85cccc8d0 nomad: fixup token policy validation 2020-01-31 19:04:08 -06:00
Seth Hoenig
674ccaa122 nomad: proxy requests for Service Identity tokens between Clients and Consul
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
2020-01-31 19:03:53 -06:00
Seth Hoenig
0040c75e8e command, docs: create and document consul token configuration for connect acls (gh-6716)
This change provides an initial pass at setting up the configuration necessary to
enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul
Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert`
commands (similar to Vault tokens).

These values are not actually used yet in this changeset.
2020-01-31 19:02:53 -06:00
Michael Schurter
e3e1f5cb53 core: add limits to unauthorized connections
Introduce limits to prevent unauthorized users from exhausting all
ephemeral ports on agents:

 * `{https,rpc}_handshake_timeout`
 * `{http,rpc}_max_conns_per_client`

The handshake timeout closes connections that have not completed the TLS
handshake by the deadline (5s by default). For RPC connections this
timeout also separately applies to first byte being read so RPC
connections with TLS enabled have `rpc_handshake_time * 2` as their
deadline.

The connection limit per client prevents a single remote TCP peer from
exhausting all ephemeral ports. The default is 100, but can be lowered
to a minimum of 26. Since streaming RPC connections create a new TCP
connection (until MultiplexV2 is used), 20 connections are reserved for
Raft and non-streaming RPCs to prevent connection exhaustion due to
streaming RPCs.

All limits are configurable and may be disabled by setting them to `0`.

This also includes a fix that closes connections that attempt to create
TLS RPC connections recursively. While only users with valid mTLS
certificates could perform such an operation, it was added as a
safeguard to prevent programming errors before they could cause resource
exhaustion.
2020-01-30 10:38:25 -08:00
Mahmood Ali
b789b507d1 Merge pull request #6922 from hashicorp/b-alloc-canoncalize
Handle Upgrades and Alloc.TaskResources modification
2020-01-28 15:12:41 -05:00
Mahmood Ali
eb0acc3301 Merge pull request #6935 from hashicorp/b-default-preemption-flag
scheduler: allow configuring default preemption for system scheduler
2020-01-28 15:11:06 -05:00
Mahmood Ali
31025d6cac Support customizing full scheduler config 2020-01-28 14:51:42 -05:00
Nick Ethier
d6dd9b61ef consul: fix var name from rebase 2020-01-27 14:00:19 -05:00
Nick Ethier
330d24cb80 consul: fix var name from rebase 2020-01-27 12:55:52 -05:00
Nick Ethier
64f4e9e691 consul: add support for canary meta 2020-01-27 09:53:30 -05:00
Danielle
2121d57281 cli: add system command and subcmds to interact with system API. (#6924)
cli: add system command and subcmds to interact with system API.
2020-01-13 16:16:08 +01:00
Mahmood Ali
744c9a485d scheduler: allow configuring default preemption for system scheduler
Some operators want a greater control over when preemption is enabled,
especially during an upgrade to limit potential side-effects.
2020-01-13 08:30:49 -05:00
James Rasell
0ea8fdfa81 cli: add system command and subcmds to interact with system API.
The system command includes gc and reconcile-summaries subcommands
which covers all currently available system API calls. The help
information is largely pulled from the current Nomad website API
documentation.
2020-01-13 11:34:46 +01:00
Drew Bailey
a58b8a5e9c refactor api profile methods
comment why we ignore errors parsing params
2020-01-09 15:15:12 -05:00
Drew Bailey
ad86438fc0 adds qc param, address pr feedback 2020-01-09 15:15:11 -05:00
Drew Bailey
28265086fc condense table test 2020-01-09 15:15:10 -05:00
Drew Bailey
549045fcbb Rename profile package to pprof
Address pr feedback, rename profile package to pprof to more accurately
describe its purpose. Adds gc param for heap lookup profiles.
2020-01-09 15:15:10 -05:00
Drew Bailey
1776458956 address pr feedback 2020-01-09 15:15:09 -05:00
Drew Bailey
a3f73b3e06 leave acl checking to rpc endpoints
fix test expectation

test wrapNonJSON
2020-01-09 15:15:08 -05:00
Drew Bailey
db382d3195 provide helpful error, cleanup logic 2020-01-09 15:15:08 -05:00
Drew Bailey
11563dca1c prevent doubly wrapping with rpc error 2020-01-09 15:15:07 -05:00
Drew Bailey
d77b5add6c RPC server EnableDebug option
Passes in agent enable_debug config to nomad server and client configs.
This allows for rpc endpoints to have more granular control if they
should be enabled or not in combination with ACLs.

enable debug on client test
2020-01-09 15:15:07 -05:00
Drew Bailey
328075591f region forwarding; prevent recursive forwards for impossible requests
prevent region forwarding loop, backfill tests

fix failing test
2020-01-09 15:15:06 -05:00
Drew Bailey
390e22e421 move shared structs out of client and into nomad 2020-01-09 15:15:05 -05:00
Drew Bailey
57dc0c6a46 test pprof headers and profile methods
tidy up, add comments

clean up seconds param assignment
2020-01-09 15:15:04 -05:00