---
layout: docs
page_title: Integrate Consul service mesh
description: |-
  Learn how to use Nomad with Consul service mesh to enable secure service-to-service communication. Review an example that enables secure communication with Consul TLS, Consul access control lists (ACLs), and a transparent proxy.
---

# Consul service mesh

[Consul service mesh](/consul/docs/connect) provides
service-to-service connection authorization and encryption using mutual
Transport Layer Security (TLS). Applications can use sidecar proxies in a
service mesh configuration to automatically establish TLS connections for
inbound and outbound connections without being aware of the service mesh at all.

~> **Note:** Nomad's service mesh integration requires Linux network namespaces.
Consul service mesh will not run on Windows or macOS.

## Nomad with Consul service mesh integration

Nomad integrates with Consul to provide secure service-to-service communication
between Nomad jobs and task groups. To support Consul service mesh, Nomad adds
a new networking mode for jobs that enables tasks in the same task group to
share their networking stack. With a few changes to the job specification, job
authors can opt into service mesh integration. When service mesh is enabled,
Nomad launches a proxy alongside the application in the job file. The proxy
(Envoy) provides secure communication with other applications in the cluster.

Nomad job specification authors can use Nomad's Consul service mesh integration
to implement [service segmentation](https://www.consul.io/use-cases/multi-platform-service-mesh)
in a microservice architecture running in public clouds without having to
directly manage TLS certificates. This is transparent to job specification
authors, as the security features of the service mesh continue to work even as
the application scales up or down or gets rescheduled by Nomad.

To use the Consul service mesh integration with Consul ACLs enabled, see the
[Secure Nomad Jobs with Consul Service Mesh](/nomad/tutorials/integrate-consul/consul-service-mesh)
guide.

## Nomad Consul service mesh example

The following section walks through an example that enables secure
communication between a web dashboard and a backend counting service. The web
dashboard and the counting service are managed by Nomad. Nomad additionally
configures Envoy proxies to run alongside these applications. The dashboard is
configured to connect to the counting service via localhost on port 9001. The
proxy is managed by Nomad and handles mTLS communication to the counting
service.

## Prerequisites
### Consul

The Consul service mesh integration with Nomad requires [Consul 1.6 or
later](https://releases.hashicorp.com/consul/1.6.0/). The Consul agent can be
run in dev mode with the following command:

~> **Note:** Nomad's Consul service mesh integration requires the `consul`
binary to be present in your `$PATH`.

```shell-session
$ consul agent -dev
```

To use service mesh on a non-dev Consul agent, you will minimally need to
enable the gRPC port and set `connect` to enabled in your Consul client
configuration. Consul agents running TLS and a version greater than
[1.14.0](https://releases.hashicorp.com/consul/1.14.0) should set the
`grpc_tls` configuration parameter instead of `grpc`. See the Consul
[port documentation](https://developer.hashicorp.com/consul/docs/install/ports)
for further reference material.

For HCL configurations:

```hcl
# ...

ports {
  grpc = 8502
}

connect {
  enabled = true
}
```

For JSON configurations:

```javascript
{
  // ...
  "ports": {
    "grpc": 8502
  },
  "connect": {
    "enabled": true
  }
}
```
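
To confirm a running agent has picked up these settings, one option is to
inspect the agent's own configuration over the HTTP API. This sketch assumes a
local agent on the default HTTP port `8500` and `jq` installed; the output
shown is illustrative:

```shell-session
$ curl -s localhost:8500/v1/agent/self | jq '.DebugConfig | {ConnectEnabled, GRPCPort}'
{
  "ConnectEnabled": true,
  "GRPCPort": 8502
}
```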

#### Consul TLS

~> **Note:** Consul 1.14+ made a [backwards-incompatible change][consul_grpc_tls]
in how TLS-enabled gRPC listeners work. When using Consul 1.14 with TLS
enabled, users will need to specify additional Nomad agent configuration to
work with Connect. The `consul.grpc_ca_file` value must now be configured
(introduced in Nomad 1.4.4), and `consul.grpc_address` will most likely need
to be set to use the new standard `grpc_tls` port of `8503`.

```hcl
consul {
  grpc_ca_file = "/etc/tls/consul-agent-ca.pem"
  grpc_address = "127.0.0.1:8503"
  ca_file      = "/etc/tls/consul-agent-ca.pem"
  cert_file    = "/etc/tls/dc1-client-consul-0.pem"
  key_file     = "/etc/tls/dc1-client-consul-0-key.pem"
  ssl          = true
  address      = "127.0.0.1:8501"
}
```

#### Consul access control lists

~> **Note:** Starting in Nomad v1.3.0, Consul Service Identity ACL tokens
automatically generated by Nomad on behalf of Connect-enabled services are
created in [`Local`] rather than Global scope, and are no longer replicated
globally.

To facilitate cross-datacenter requests to Connect services registered by
Nomad, Consul agents will need to be configured with a [default anonymous][anon_token]
ACL token whose ACL policy has sufficient permissions to read service and node
metadata pertaining to those requests. This mechanism is described in Consul
[#7414][consul_acl]. A typical Consul agent anonymous token may contain an ACL
policy such as:

```hcl
service_prefix "" { policy = "read" }
node_prefix "" { policy = "read" }
```
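
As a sketch of applying such a policy with the Consul CLI: the policy name
`anonymous-read` and the rules file name are illustrative, the commands
require a token with ACL management permissions, and
`00000000-0000-0000-0000-000000000002` is the well-known accessor ID of the
anonymous token:

```shell-session
$ consul acl policy create -name "anonymous-read" -rules @anonymous-read.hcl
$ consul acl token update \
    -id 00000000-0000-0000-0000-000000000002 \
    -policy-name "anonymous-read"
```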

#### Transparent proxy

Using Nomad's support for [transparent proxy][] configures the task group's
network namespace so that traffic flows through the Envoy proxy. When the
[`transparent_proxy`][] block is enabled:

* Nomad will invoke the [`consul-cni`][] CNI plugin to configure `iptables`
  rules in the network namespace to force outbound traffic from an allocation
  to flow through the proxy.
* If the local Consul agent is serving DNS, Nomad will set the IP address of
  the Consul agent as the nameserver in the task's `/etc/resolv.conf`.
* Consul will provide a [virtual IP][] for any upstream service the workload
  has access to, based on the service intentions.

Using transparent proxy has several important requirements:

* You must have the [`consul-cni`][] CNI plugin installed on the client host
  along with the usual [required CNI plugins][cni_plugins].
* To use Consul DNS and virtual IPs, you will need to configure Consul's DNS
  listener to be exposed to the workload network namespace. You can do this
  without exposing the Consul agent on a public IP by setting the Consul
  `bind_addr` to bind on a private IP address (the default is to use the
  `client_addr`).
* The Consul agent must be configured with [`recursors`][] if you want
  allocations to make DNS queries for applications outside the service mesh.
* Your workload's task cannot use the same [Unix user ID (UID)][uid] as the
  Envoy sidecar proxy.
* You cannot set a [`network.dns`][] block on the allocation (unless you set
  [`no_dns`][tproxy_no_dns], see below).

For example, an HCL configuration with a [go-sockaddr/template][] binding to
the subnet `10.37.105.0/20`, with recursive DNS set to OpenDNS nameservers:

```hcl
bind_addr = "{{ GetPrivateInterfaces | include \"network\" \"10.37.105.0/20\" | limit 1 | attr \"address\" }}"

recursors = ["208.67.222.222", "208.67.220.220"]
```

### Nomad

Nomad must schedule onto a routable interface in order for the proxies to
connect to each other. The following command starts a Nomad dev agent
configured for Consul service mesh:

```shell-session
$ sudo nomad agent -dev-connect
```

### Container Network Interface (CNI) plugins

Nomad uses CNI reference plugins to configure the network namespace used to
secure the Consul service mesh sidecar proxy. All Nomad client nodes using
network namespaces must have these CNI plugins [installed][cni_install].

To use [`transparent_proxy`][] mode, Nomad client nodes will also need the
[`consul-cni`][] plugin installed. See the Linux post-installation
[steps](/nomad/docs/install#linux-post-installation-steps) for more detail on
how to install CNI plugins.
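
As a sketch, installing the reference plugins on a Linux client host might
look like the following. The release version `v1.0.0` and the install path
`/opt/cni/bin` are assumptions here; check the linked post-installation steps
for the currently recommended version:

```shell-session
$ export ARCH_CNI=$( [ $(uname -m) = aarch64 ] && echo arm64 || echo amd64)
$ curl -L -o cni-plugins.tgz \
    "https://github.com/containernetworking/plugins/releases/download/v1.0.0/cni-plugins-linux-${ARCH_CNI}-v1.0.0.tgz"
$ sudo mkdir -p /opt/cni/bin
$ sudo tar -C /opt/cni/bin -xzf cni-plugins.tgz
```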

## Run the service mesh-enabled services

Once Nomad and Consul are running, with Consul DNS enabled for transparent
proxy mode as described above, submit the following service mesh-enabled
services to Nomad by copying the HCL into a file named `servicemesh.nomad.hcl`
and running `nomad job run servicemesh.nomad.hcl`.

```hcl
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}
          }
        }
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "hashicorpdev/counter-api:v3"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "http"

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        COUNTING_SERVICE_URL = "http://count-api.virtual.consul"
      }

      config {
        image = "hashicorpdev/counter-dashboard:v3"
      }
    }
  }
}
```

The job contains two task groups: an API service and a web frontend.

### API service

The API service is defined as a task group with a bridge network:

```hcl
group "api" {
  network {
    mode = "bridge"
  }

  # ...
}
```

Since the API service is only accessible via Consul service mesh, it does not
define any ports in its network. The `connect` block enables the service mesh
and the `transparent_proxy` block ensures that the service will be reachable
via a virtual IP address when used with Consul DNS.

```hcl
group "api" {

  # ...

  service {
    name = "count-api"
    port = "9001"

    connect {
      sidecar_service {
        proxy {
          transparent_proxy {}
        }
      }
    }
  }

  # ...

}
```

The `port` in the service block is the port the API service listens on. The
Envoy proxy will automatically route traffic to that port inside the network
namespace. Note that currently this cannot be a named port; it must be a
hard-coded port value. See [GH-9907].

### Web frontend

The web frontend is defined as a task group with a bridge network and a static
forwarded port:

```hcl
group "dashboard" {
  network {
    mode = "bridge"

    port "http" {
      static = 9002
      to     = 9002
    }
  }

  # ...

}
```

The `static = 9002` parameter requests that the Nomad scheduler reserve port
9002 on a host network interface. The `to = 9002` parameter forwards that host
port to port 9002 inside the network namespace.

This allows you to connect to the web frontend in a browser by visiting
`http://<host_ip>:9002`, as shown below:

[![Count Dashboard][count-dashboard]][count-dashboard]

The web frontend connects to the API service via Consul service mesh.

```hcl
service {
  name = "count-dashboard"
  port = "http"

  connect {
    sidecar_service {
      proxy {
        transparent_proxy {}
      }
    }
  }
}
```

The `connect` block with `transparent_proxy` configures the web frontend's
network namespace to route all access to the `count-api` service through the
Envoy proxy.

The web frontend is configured to communicate with the API service with an
environment variable `$COUNTING_SERVICE_URL`:

```hcl
env {
  COUNTING_SERVICE_URL = "http://count-api.virtual.consul"
}
```

The `transparent_proxy` block ensures that DNS queries are made to Consul so
that the `count-api.virtual.consul` name resolves to a virtual IP address.
Note that you don't need to specify a port number because the virtual IP will
only be directed to the correct service port.
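
To check that resolution yourself, you can query the Consul agent's DNS
listener directly. This sketch assumes the default Consul DNS port `8600`, and
the address returned here is illustrative (by default Consul allocates virtual
IPs from the `240.0.0.0/4` range):

```shell-session
$ dig +short @127.0.0.1 -p 8600 count-api.virtual.consul
240.0.0.3
```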

### Manually configured upstreams

You can also use Connect without Consul DNS and `transparent_proxy` mode. This
approach is not recommended because it requires duplicating service intention
information in an `upstreams` block in the Nomad job specification. However,
because Consul DNS is not protected by ACLs, you might want to do this if you
don't want to expose Consul DNS to untrusted workloads.

In that case, you can add `upstream` blocks to the job spec. You don't need
the `transparent_proxy` block for the `count-api` service:

```hcl
group "api" {

  # ...

  service {
    name = "count-api"
    port = "9001"

    connect {
      sidecar_service {}
    }
  }

  # ...

}
```

But you'll need to add an `upstreams` block to the `count-dashboard` service:

```hcl
service {
  name = "count-dashboard"
  port = "http"

  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "count-api"
          local_bind_port  = 8080
        }
      }
    }
  }
}
```

The `upstreams` block defines the remote service to access (`count-api`) and
what port to expose that service on inside the network namespace (`8080`).

The web frontend will also need to use an environment variable to communicate
with the API service:

```hcl
env {
  COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
}
```

This environment variable value gets interpolated with the upstream's address.
Note that dashes (`-`) are converted to underscores (`_`) in environment
variables, so `count-api` becomes `count_api`.
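
As a quick sketch of that conversion (a plain shell illustration, not how
Nomad implements it):

```shell
#!/usr/bin/env bash
# Illustrative only: build the environment variable name Nomad would use
# for an upstream named "count-api" by replacing dashes with underscores.
service="count-api"
var_name="NOMAD_UPSTREAM_ADDR_${service//-/_}"
echo "${var_name}"   # prints NOMAD_UPSTREAM_ADDR_count_api
```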

## Limitations

- The minimum Consul version to use Connect with Nomad is Consul v1.8.0.
- The `consul` binary must be present in Nomad's `$PATH` to run the Envoy
  proxy sidecar on client nodes.
- Consul service mesh using network namespaces is only supported on Linux.
- Prior to Consul 1.9, the Envoy sidecar proxy will drop and stop accepting
  connections while the Nomad agent is restarting.

## Troubleshooting

If the sidecar service is not running correctly, you can investigate
potential `envoy` failures in the following ways:

* Task logs in the associated `connect-*` task
* Task secrets (may contain sensitive information):
  * Envoy CLI command: `secrets/.envoy_bootstrap.cmd`
  * Environment variables: `secrets/.envoy_bootstrap.env`
* An extra allocation log file: `alloc/logs/envoy_bootstrap.stderr.0`

For example, with an allocation ID starting with `b36a`:

```shell-session
$ nomad alloc status -short b36a # to get the connect-* task name
$ nomad alloc logs -task connect-proxy-count-api -stderr b36a
$ nomad alloc exec -task connect-proxy-count-api b36a cat secrets/.envoy_bootstrap.cmd
$ nomad alloc exec -task connect-proxy-count-api b36a cat secrets/.envoy_bootstrap.env
$ nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0
```

Note: If the alloc is unable to start successfully, debugging files may only
be accessible from the host filesystem. However, the sidecar task's secrets
directory may not be available on systems where it is mounted in a temporary
filesystem.

Bootstrapping the Envoy proxy requires that the Consul ACL token and service
registration have successfully replicated to whichever Consul server the local
Consul agent is connected to. Nomad clients poll for this value with
exponential backoff and a timeout. You can adjust the timeouts on a given node
by setting node metadata values via the command line or in the
[`client.meta`][] agent configuration block. The default values are shown
below:

```shell-session
$ nomad node meta apply -node-id $nodeID \
    consul.token_preflight_check.timeout=10s \
    consul.token_preflight_check.base=500ms \
    consul.service_preflight_check.timeout=60s \
    consul.service_preflight_check.base=1s
```

[count-dashboard]: /img/count-dashboard.png
[consul_acl]: https://github.com/hashicorp/consul/issues/7414
[gh-9907]: https://github.com/hashicorp/nomad/issues/9907
[`Local`]: /consul/docs/security/acl/tokens#token-attributes
[anon_token]: /consul/docs/security/acl/tokens#special-purpose-tokens
[consul_ports]: /consul/docs/agent/config/config-files#ports
[consul_grpc_tls]: /consul/docs/upgrading/upgrade-specific#changes-to-grpc-tls-configuration
[cni_install]: /nomad/docs/install#linux-post-installation-steps
[transparent proxy]: /consul/docs/k8s/connect/transparent-proxy
[go-sockaddr/template]: https://pkg.go.dev/github.com/hashicorp/go-sockaddr/template
[`recursors`]: /consul/docs/agent/config/config-files#recursors
[`transparent_proxy`]: /nomad/docs/job-specification/transparent_proxy
[tproxy_no_dns]: /nomad/docs/job-specification/transparent_proxy#no_dns
[`consul-cni`]: https://releases.hashicorp.com/consul-cni
[virtual IP]: /consul/docs/services/discovery/dns-static-lookups#service-virtual-ip-lookups
[cni_plugins]: /nomad/docs/networking/cni#install-cni-reference-plugins
[consul_dns_port]: /consul/docs/agent/config/config-files#dns_port
[`network.dns`]: /nomad/docs/job-specification/network#dns-parameters
[`client.meta`]: /nomad/docs/configuration/client#meta
[uid]: /nomad/docs/job-specification/transparent_proxy#uid