Commit Graph

4229 Commits

Author SHA1 Message Date
Pete Woods
f40e6eed65 Add node "status", "scheduling eligibility" to all client metrics (#8925)
- We previously added these to the client host metrics, but it's useful to have them on all client metrics.
- e.g. so you can exclude draining nodes from charts showing your fleet size.
2020-09-22 13:53:50 -04:00
Pierre Cauchois
ca1b85b36d RPC Timeout/Retries account for blocking requests (#8921)
The current implementation measures RPC request timeout only against
config.RPCHoldTimeout, which is fine for non-blocking requests but will
almost surely be exceeded by long-poll requests that block for minutes
at a time.

This adds an HasTimedOut method on the RPCInfo interface that takes into
account whether the request is blocking, its maximum wait time, and the
RPCHoldTimeout.
2020-09-18 08:58:41 -04:00
Joel May
767e8909d7 fingerprinting: add AWS MAC and public-ipv6 (#8887) 2020-09-17 09:03:01 -04:00
Lars Lehtonen
79d983f982 client/allocrunner/taskrunner: client.Close after err check (#8825) 2020-09-04 08:12:08 -04:00
Tim Gross
ccdfd6170c fix params for Agent.Host client RPC (#8795)
The parameters for the receiving side of the Agent.Host client RPC did not
take the arguments serialized at the server side. This results in a panic.
2020-08-31 17:14:26 -04:00
Jasmine Dahilig
8faece3bd7 Merge pull request #8390 from hashicorp/lifecycle-poststart-hook
task lifecycle poststart hook
2020-08-31 13:53:24 -07:00
Jasmine Dahilig
81cad55d40 task lifecycle poststart: code review fixes 2020-08-31 13:22:41 -07:00
Seth Hoenig
a46ba7f4ac Merge branch 'master' into f-cc-ingress 2020-08-26 15:31:05 -05:00
Seth Hoenig
abd38b3a86 consul/connect: fixup some comments and context timeout 2020-08-26 13:17:16 -05:00
Mahmood Ali
3dcf60a61e close file when done reading 2020-08-24 20:22:42 -04:00
Mahmood Ali
b84bd95b6a don't lock if ref is nil
Ensure that d.mu is only dereferenced if d is not-nil, to avoid a null
dereference panic.
2020-08-24 20:19:40 -04:00
Seth Hoenig
db8020f4eb consul/connect: fixup tests to use new consul sdk 2020-08-24 12:02:41 -05:00
Seth Hoenig
30ac347677 Merge branch 'master' into consul-v1.7.7 2020-08-24 10:43:00 -05:00
Yoan Blanc
e1ee6a45b1 fixup! vendor: consul/api, consul/sdk v1.6.0
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-08-24 08:59:03 +02:00
Mark Lee
893d4d102f refactor lookup code 2020-08-24 12:24:16 +09:00
Mark Lee
08dfc80724 lookup kernel builtin modules too 2020-08-24 11:09:13 +09:00
Seth Hoenig
9ffdeed904 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Shengjing Zhu
274bf2ee1c Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Michael Schurter
599b56e054 test: add allocrunner test for poststart hooks 2020-08-12 09:54:14 -07:00
Nick Ethier
c11dbcd001 docker: support group allocated ports and host_networks (#8623)
* docker: support group allocated ports

* docker: add new ports driver config to specify which group ports are mapped

* docker: update port mapping docs
2020-08-11 18:30:22 -04:00
Lang Martin
c0bf46da1e CSI RPC Token (#8626)
* client/allocrunner/csi_hook: use the Node SecretID
* client/allocrunner/csi_hook: include the namespace for Claim
2020-08-11 13:08:39 -04:00
Michael Schurter
0d504ade1b client: remove shortcircuit preventing poststart hooks from running 2020-08-11 09:48:24 -07:00
Michael Schurter
5e9a8bbd52 client: don't restart poststart sidecars on success 2020-08-11 09:47:18 -07:00
Tim Gross
079f60cd63 csi: client RPCs should return wrapped errors for checking (#8605)
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (ex. mount points) already having been cleaned up in previous calls.
2020-08-07 11:01:36 -04:00
Tim Gross
9384b1f77e csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
Jasmine Dahilig
9cf4429518 lifecycle: add allocrunner and task hook coordinator unit tests 2020-07-29 12:39:42 -07:00
Seth Hoenig
1918c2d026 consul/connect: fixup some spelling, comments, consts 2020-07-29 09:26:01 -05:00
Seth Hoenig
b54fe19723 consul/connect: organize lock & fields in http/grpc socket hooks 2020-07-29 09:26:01 -05:00
Seth Hoenig
6c6f3d9fae consul/connect: optimze grpc socket hook check for bridge network first 2020-07-29 09:26:01 -05:00
Seth Hoenig
8ec3aa1716 consul/connect: add support for bridge networks with connect native tasks
Before, Connect Native Tasks needed one of these to work:

- To be run in host networking mode
- To have the Consul agent configured to listen to a unix socket
- To have the Consul agent configured to listen to a public interface

None of these are a great experience, though running in host networking is
still the best solution for non-Linux hosts. This PR establishes a connection
proxy between the Consul HTTP listener and a unix socket inside the alloc fs,
bypassing the network namespace for any Connect Native task. Similar to and
re-uses a bunch of code from the gRPC listener version for envoy sidecar proxies.

Proxy is established only if the alloc is configured for bridge networking and
there is at least one Connect Native task in the Task Group.

Fixes #8290
2020-07-29 09:26:01 -05:00
Drew Bailey
940817aeb5 Merge pull request #8453 from hashicorp/oss-multi-vault-ns
oss compoments for multi-vault namespaces
2020-07-27 08:45:22 -04:00
Mahmood Ali
b290e67dc5 Merge pull request #6517 from hashicorp/b-fingerprint-shutdown-race
client: don't retry fingerprinting on shutdown
2020-07-24 11:56:32 -04:00
Drew Bailey
19810365f6 oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
Michael Schurter
f6512783f3 Merge pull request #8521 from hashicorp/docs-hearbeat
docs: s/hearbeat/heartbeat and fix link
2020-07-23 14:07:24 -07:00
Tim Gross
1cb9e75ec2 csi: NodePublish should not create target_path, only its parent dir (#8505)
The NodePublish workflow currently creates the target path and its parent
directory. However, the CSI specification says that the CO shall ensure the
parent directory of the target path exists, and that the SP shall place the
block device or mounted directory at the target path. Much of our testing has
been with CSI plugins that are more forgiving, but our behavior breaks
spec-compliant CSI plugins.

This changeset ensures we only create the parent directory.
2020-07-23 15:52:22 -04:00
Michael Schurter
b2dea4ca5b docs: s/hearbeat/heartbeat and fix link
Also fixed the same typo in a test. Fixing the typo fixes the link, but
the link was still broken when running the website locally due to the
trailing slash. It would have worked in prod thanks to redirects, but
using the canonical URL seems ideal.
2020-07-23 11:33:34 -07:00
Mahmood Ali
6bf248c790 honor config.NetworkInterface in NodeNetworks 2020-07-21 15:43:45 -04:00
Mahmood Ali
60dd7aecc9 nvidia: support disabling the nvidia plugin (#8353) 2020-07-21 10:11:16 -04:00
Jasmine Dahilig
c0640146a4 fix panic, but poststart is still stalled 2020-07-10 09:03:10 -07:00
Mahmood Ali
467bd62f26 Merge pull request #8333 from hashicorp/b-test-tweak-20200701
tests: avoid os.Exit in a test
2020-07-10 11:18:28 -04:00
Jasmine Dahilig
e1edb29675 add poststart hook to task hook coordinator & structs 2020-07-08 11:01:35 -07:00
Nick Ethier
e94690decb ar: support opting into binding host ports to default network IP (#8321)
* ar: support opting into binding host ports to default network IP

* fix config plumbing

* plumb node address into network resource

* struct: only handle network resource upgrade path once
2020-07-06 18:51:46 -04:00
Lang Martin
bde973e366 api: nomad debug new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Mahmood Ali
13a4c56c5a tests: avoid os.Exit in a test 2020-07-01 15:25:13 -04:00
Mahmood Ali
73f19eb3b8 allocrunner: terminate sidecars in the end
This fixes a bug where a batch allocation fails to complete if it has
sidecars.

If the only remaining running tasks in an allocations are sidecars - we
must kill them and mark the allocation as complete.
2020-06-29 15:12:15 -04:00
Seth Hoenig
ef52328eb4 connect/native: doc and comment tweaks from PR 2020-06-24 10:13:22 -05:00
Seth Hoenig
cd3ec2f783 connect/native: check for pre-existing consul token 2020-06-24 09:16:28 -05:00
Seth Hoenig
193e355ec3 connect/native: update connect native hook tests 2020-06-23 12:07:35 -05:00
Seth Hoenig
a87130c0e8 connect/native: give tls files an extension 2020-06-23 12:06:28 -05:00
Seth Hoenig
7e8d5c2392 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00