mirror of
https://github.com/kemko/nomad.git
synced 2026-01-03 08:55:43 +03:00
* Update for search engine optimization * Update descriptions and add intro body summary paragraph * Apply suggestions from code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
343 lines
18 KiB
Plaintext
343 lines
18 KiB
Plaintext
---
|
|
layout: docs
|
|
page_title: Security Model
|
|
description: >-
|
|
Nomad's security model provides mechanisms such as mTLS, namespace, access control list (ACL), Sentinel policies (Enterprise), and resource quotas (Enterprise) to enable fine-grained access within and between clusters. Learn how to secure your Nomad clusters with threat models for internal threats and external threats. Review the network ports used by the security model.
|
|
---
|
|
|
|
# Security Model
|
|
|
|
This page provides conceptual information on Nomad's security model, which
|
|
provides mechanisms such as mTLS, namespace, access control list (ACL), Sentinel
|
|
policies, and resource quotas to enable fine-grained access within and between
|
|
clusters. Learn how to secure your Nomad clusters, threat models, internal
|
|
threats, and external threats. Review the network ports used by the security
|
|
model.
|
|
|
|
Nomad utilizes a lightweight gossip and RPC system, [similar to
|
|
Consul](/consul/docs/security/security-models/core), which provides
|
|
various essential features. Both of these systems provide security mechanisms
|
|
which should be utilized to help provide [confidentiality, integrity and
|
|
authentication](https://en.wikipedia.org/wiki/Information_security).
|
|
|
|
Using defense in depth is crucial for cluster security, and deployment
|
|
requirements may differ drastically depending on your use case. Further security
|
|
features for multi-tenant deployments are offered exclusively in the enterprise
|
|
version. This documentation may need to be adapted to your deployment situation,
|
|
but the general mechanisms for a secure Nomad deployment revolve around:
|
|
|
|
- **[mTLS](/nomad/tutorials/transport-security/security-enable-tls)** Mutual
|
|
authentication of both the TLS server and client x509 certificates prevents
|
|
internal abuse by preventing unauthenticated access to network components
|
|
within the cluster.
|
|
|
|
- **[ACLs](/nomad/tutorials/access-control)** Enables authorization for
|
|
authenticated connections by granting capabilities to ACL tokens.
|
|
|
|
- **[Namespaces](/nomad/tutorials/manage-clusters/namespaces)** Access to read
|
|
and write to a namespace can be controlled to allow for granular access to job
|
|
information managed within a multi-tenant cluster.
|
|
|
|
- **[Sentinel Policies]**<EnterpriseAlert inline /> Sentinel
|
|
policies allow for granular control over components such as task drivers
|
|
within a cluster.
|
|
|
|
## Personas
|
|
|
|
When thinking about Nomad, it helps to consider the following types of base
|
|
personas when managing the security requirements for the cluster deployment. The
|
|
granularity may change depending on your team's use case where rigorous roles
|
|
can be accurately defined and managed using the [Nomad backend secret engine for
|
|
Vault](/vault/docs/secrets/nomad). This is
|
|
described further with getting started steps using a development server
|
|
[here](/nomad/tutorials/access-control).
|
|
|
|
It's important to note that there's no traditional concept of a user
|
|
within Nomad itself.
|
|
|
|
- **System Administrator** - This is someone who has access to the underlying
|
|
infrastructure to a Nomad cluster. Often she has access to SSH or RDP
|
|
directly into a server within a cluster through a bastion host. Ultimately
|
|
they have read, write and execute permissions for the actual Nomad binary.
|
|
This binary is the same for server and client nodes using different
|
|
configuration files. These users potentially have something like sudo,
|
|
administrative, or some other super-user access to the underlying compute
|
|
resource. Users like these are essentially totally trusted by Nomad as they
|
|
have administrative rights to the system and can start or stop the agent.
|
|
|
|
- **Nomad Administrator** - This is someone (probably the same **System
|
|
Administrator**) who has access to define the Nomad agent configurations
|
|
for servers and clients, and/or have a Nomad management ACL token. They also
|
|
have total rights to all of the parts in the Nomad system including the
|
|
ability to start and stop all jobs within a cluster.
|
|
|
|
- **Nomad Operator** - This is someone who likely has selective access with
|
|
restricted capabilities to manage jobs applicable to their namespace within
|
|
a cluster.
|
|
|
|
- **User** - This is someone who is a user of an application being run on the
|
|
system. In some cases applications may be public facing and exposed to the
|
|
internet such as a web server. This is someone who shouldn't have any
|
|
network access to the Nomad server API.
|
|
|
|
## Secure Configuration
|
|
|
|
Nomad's security model is applicable only if all parts of the system are running
|
|
with a secure configuration; **Nomad is not secure-by-default.** Without the following
|
|
mechanisms enabled in Nomad's configuration, it may be possible to abuse access
|
|
to a cluster. Like all security considerations, one must appropriately determine
|
|
what concerns they have for their environment and adapt to these security
|
|
recommendations accordingly.
|
|
|
|
### Requirements
|
|
|
|
- [mTLS enabled](/nomad/tutorials/transport-security/security-enable-tls)
|
|
Mutual TLS (mTLS) enables [mutual
|
|
authentication](https://en.wikipedia.org/wiki/Mutual_authentication) with
|
|
security properties to prevent the following problems:
|
|
|
|
* Unauthorized access because both server and clients must provide valid TLS
|
|
[X.509](https://en.wikipedia.org/wiki/X.509) certificates signed by the same
|
|
valid [CA](https://en.wikipedia.org/wiki/Certificate_authority) in order to
|
|
communicate within the cluster.
|
|
|
|
* Observing or tampering communication between nodes is thwarted due to the
|
|
traffic being encrypted using the well known network security protocol
|
|
[TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) version 1.2,
|
|
with a [configurable minimal
|
|
version](/nomad/docs/configuration/tls#tls_min_version).
|
|
Both server and client agents must be configured to validate each other's
|
|
certificates to ensure mTLS is actually enabled. This requires appropriate
|
|
certificates to be distributed to servers, clients, machines, or operators
|
|
for things like CLI usage. It is recommended to use
|
|
[Vault](/nomad/tutorials/integrate-vault/vault-pki-nomad)
|
|
to securely manage the certificate creation and rotation for nodes.
|
|
|
|
* Agent role misconfiguration is prevented using the X.509
|
|
[SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) extension.
|
|
This is essentially a domain name that is used to identify and verify a
|
|
node's region and role name are configured as expected (e.g.
|
|
`client.us-east.nomad`).
|
|
|
|
* Using the previously mentioned role name prevents maliciously masquerading
|
|
as a server or client node, and allows other services to be signed easily by
|
|
the same CA. This also avoids any potential pitfalls with certificates using
|
|
the IP or Hostname of nodes within a cluster.
|
|
|
|
* Using [`tls.verify_https_client=false`][verify_https_client] downgrades the
|
|
Nomad agent HTTP server to use TLS instead of mTLS. This is safe if ACLs are
|
|
enabled and is recommended if you are using the Nomad web UI to avoid the
|
|
difficulty of distributing client certificates to browsers.
|
|
|
|
Some API endpoints, such as [`/v1/metrics`][api_metrics] and
|
|
[`/v1/status/peers`][api_status_peers], can be accessed without an ACL
|
|
token and may be exposed to anyone with access to the Nomad HTTP address
|
|
when using `tls.verify_https_client=false`. You can use a reverse proxy or
|
|
other external means to restrict access to them.
|
|
|
|
- [ACLs enabled](/nomad/tutorials/access-control) The
|
|
access control list (ACL) system provides a capability-based control
|
|
mechanism for Nomad administrators allowing for custom roles (typically
|
|
within Vault) to be tied to an individual human or machine operator
|
|
identity. This allows for access to capabilities within the cluster to be
|
|
restricted to specific users.
|
|
|
|
- [Namespaces](/nomad/tutorials/manage-clusters/namespaces) This feature
|
|
allows for a cluster to be shared by multiple teams within a company. Using
|
|
this logical separation is important for multi-tenant clusters to prevent
|
|
users without access to that namespace from conflicting with each other.
|
|
This requires ACLs to be enabled in order to be enforced.
|
|
|
|
- [Sentinel Policies]<EnterpriseAlert inline />
|
|
[Sentinel](https://www.hashicorp.com/sentinel/) is a feature which enables
|
|
[policy-as-code](https://docs.hashicorp.com/sentinel/concepts/policy-as-code)
|
|
to enforce further restrictions on operators. This is used to augment the
|
|
built-in ACL system for fine-grained control over jobs.
|
|
|
|
- [Resource Quotas]<EnterpriseAlert inline /> Can limit a
|
|
namespace's access to the underlying compute resources in the cluster by
|
|
setting upper-limits for operators. Access to these resource quotas can be
|
|
managed via ACLs to ensure read-only access for operators so they can't just
|
|
change their quotas.
|
|
|
|
### Recommendations
|
|
|
|
The following are security recommendations that can help significantly improve
|
|
the security of your cluster depending on your use case. We recommend always
|
|
practicing defense in depth when architecting the security mechanisms for your
|
|
environment.
|
|
|
|
- **Rotate credentials** - Using short-lived credentials or rotating them
|
|
frequently is highly recommended to reduce damage of accidentally leaked
|
|
credentials.
|
|
|
|
- Use [Vault](/nomad/docs/integrations/vault-integration) to create and manage
|
|
dynamic, rotated credentials prevent secrets from being easily exposed
|
|
within the [job specification](/nomad/docs/job-specification) itself
|
|
which may be leaked into version control or otherwise be accidentally stored
|
|
on disk on an operator's local machine.
|
|
|
|
- Rotate credentials used by the Nomad agent; e.g. [integrate with Vault's
|
|
PKI secret engine](/nomad/tutorials/integrate-vault/vault-pki-nomad) to
|
|
automatically generate and renew dynamic, unique X.509 certificates for each
|
|
Nomad node with a short [TTL](https://en.wikipedia.org/wiki/Time_to_live).
|
|
|
|
- **[Running without Root](https://groups.google.com/forum/#!topic/nomad-tool/pSyMwC_FSFA)** -
|
|
Nomad servers can be run as unprivileged users that only require access to
|
|
the data directory.
|
|
|
|
- **Containers with Sandbox Runtimes** - In some situations, such as running
|
|
untrusted code as a service, it may be worth considering using different
|
|
container runtimes such as [gVisor](https://gvisor.dev/) or [Kata
|
|
Containers](https://katacontainers.io/). These types of runtimes provide
|
|
sandboxing features which help prevent raw access to the underlying shared
|
|
kernel for other containers and the Nomad client agent itself. Docker driver
|
|
allows [customizing runtimes](/nomad/docs/drivers/docker#runtime).
|
|
|
|
- **[Disable Unused Drivers](/nomad/docs/configuration/client#driver-denylist)** -
|
|
Each driver provides different degrees of isolation, and bugs may allow
|
|
unintended privilege escalation. If a task driver is not needed, you can
|
|
disable it to reduce risk.
|
|
|
|
- **Linux Security Modules** - Use of security modules that can be directly
|
|
integrated into operating systems such as AppArmor, SElinux, and Seccomp on
|
|
both the Nomad hosts and applied to containers for an extra layer of
|
|
security. Seccomp profiles are able to be passed directly to containers
|
|
using the
|
|
**[`security_opt`](/nomad/docs/drivers/docker#security_opt)**
|
|
parameter available in the default [Docker
|
|
driver](/nomad/docs/drivers/docker).
|
|
|
|
- **[Service Mesh](https://www.hashicorp.com/resources/service-mesh-microservices-networking)** -
|
|
Integrating service mesh technologies such as
|
|
**[Consul](https://www.consul.io/)** can be extremely useful for limiting
|
|
and efficiently load balancing network connectivity within a cluster.
|
|
|
|
- **[TLS Settings](/nomad/docs/configuration/tls)** -
|
|
TLS settings, such as the available [cipher suites](/nomad/docs/configuration/tls#tls_cipher_suites), should be tuned to fit the needs of your environment.
|
|
|
|
- **[HTTP Headers](/nomad/docs/configuration#http_api_response_headers)** -
|
|
Additional security [headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers), such as [`X-XSS-Protection`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-XSS-Protection), can be [configured](/nomad/docs/configuration#http_api_response_headers) for HTTP API responses.
|
|
|
|
## Threat Model
|
|
|
|
The following are parts of the Nomad threat model:
|
|
|
|
- **Nomad agent-to-agent communication** - Transport encryption for
|
|
agent-to-agent communication is required to prevent eavesdropping. TCP and
|
|
UDP based protocols within Nomad provide different mechanisms for enabling
|
|
encryption including symmetric (shared gossip encryption keys) and
|
|
asymmetric keys (TLS).
|
|
|
|
- **Tampering of data in transit** - Any tampering should be detectable via mTLS
|
|
and cause Nomad to avoid processing the request.
|
|
|
|
- **Access to data without authentication or authorization** - Requests to the
|
|
server should be authenticated and authorized using mTLS and ACLs
|
|
respectively.
|
|
|
|
- **State modification or corruption due to malicious messages** - Improperly
|
|
formatted messages are discarded while properly formatted messages require
|
|
authentication and authorization.
|
|
|
|
- **Non-server members accessing raw data** - All servers that join the cluster
|
|
require proper authentication and authorization in order to begin
|
|
participating in Raft. All data in Raft should be encrypted with TLS.
|
|
|
|
- **Denial of Service against a node** - DoS attacks against a single node
|
|
should not compromise the security posture of Nomad.
|
|
|
|
The following are not part of the threat model for server agents:
|
|
|
|
- **Access (read or write) to the Nomad data directory** - Information about the
|
|
jobs managed by Nomad is persisted to a server's data directory.
|
|
|
|
- **Access (read or write) to the Nomad configuration directory** - Access to
|
|
Nomad's configuration file(s) directory can enable and disable features for
|
|
a cluster.
|
|
|
|
- **Memory access to a running Nomad server agent** - Direct access to the
|
|
memory of the Nomad server agent process (usually requiring a shell on the
|
|
system through various means) results in almost all aspects of the agent
|
|
being compromised including access to certificates and other secrets.
|
|
|
|
- **Existence of [Variables] metadata** - Access to Variables List APIs is
|
|
controlled by ACL policies, but the existence of specific paths or metadata is
|
|
not considered sensitive.
|
|
|
|
The following are not part of the threat model for client agents:
|
|
|
|
- **Access (read or write) to the Nomad data directory** - Information about the
|
|
allocations scheduled to a Nomad client is persisted to its data directory.
|
|
This would include any secrets in any of the allocation's file systems.
|
|
|
|
- **Access (read or write) to the Nomad configuration directory** - Access to a
|
|
client's configuration file can enable and disable features for a client
|
|
including insecure drivers such as
|
|
[`raw_exec`](/nomad/docs/drivers/raw_exec).
|
|
|
|
- **Memory access to a running Nomad client agent** - Direct access to the
|
|
memory of the Nomad client agent process allows an attack to extract secrets
|
|
from clients such as Vault tokens.
|
|
|
|
- **Lax Client Driver Sandbox** - Drivers may allow some privileged operations,
|
|
e.g. filesystem access to configuration directories, or raw accesses to host
|
|
devices. Such privileges can be used to facilitate compromise other workloads,
|
|
or cause denial-of-service attacks.
|
|
|
|
### Internal Threats
|
|
|
|
- **Job Operator** - Someone with a valid mTLS certificate and ACL token may still be a
|
|
threat to your cluster in certain situations, especially in multi-team
|
|
cluster deployments. They may accidentally or intentionally use a malicious
|
|
job to harm a cluster which can help be protected against using
|
|
Quotas, Namespace, and Sentinel policies.
|
|
|
|
- **Workload** - Workloads may have host network access within a cluster which
|
|
can lead to SSRF due to application security issues outside of the scope of
|
|
Nomad which may lead to internal access within the cluster. Using mTLS, ACLs
|
|
and Sentinel policies together can add layers of protection against
|
|
malicious workloads.
|
|
|
|
- **RPC / API Access** - RPC and HTTP API endpoints without mTLS can expose
|
|
clusters to abuse within the cluster from malicious workloads.
|
|
|
|
- **Client driver** - Drivers implement various workload types for a cluster,
|
|
and the backend configuration of these drivers should be considered to
|
|
implement defense in depth. For example, a custom Docker driver that limits
|
|
the ability to mount the host file system may be subverted by network access
|
|
to an exposed Docker daemon API through other means such as the [`raw_exec`](/nomad/docs/drivers/raw_exec)
|
|
driver.
|
|
|
|
### External Threats
|
|
|
|
There are two main components to consider to for external threats in a Nomad cluster:
|
|
|
|
- **Server agent** - Internal cluster leader elections and replication is
|
|
managed via Raft between server agents encrypted in transit. However,
|
|
information about the server is stored unencrypted at rest in the agent's
|
|
data directory. This may contain sensitive information such as ACL tokens
|
|
and TLS certificates.
|
|
|
|
- **Client agent** - Client-to-server communication within a cluster is
|
|
encrypted and authenticated using mTLS. Information about the allocations on
|
|
a client node is unencrypted in the agent's data and configuration
|
|
directory.
|
|
|
|
## Network Ports
|
|
|
|
| **Port / Protocol** | Agents | Description |
|
|
|----------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|
| **4646** / TCP | All | [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) to provide [UI](/nomad/tutorials/web-ui/web-ui-access) and [API](/nomad/api-docs) access to agents. |
|
|
| **4647** / TCP | All | [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call) protocol used by agents. |
|
|
| **4648** / TCP + UDP | Servers | [gossip](/nomad/docs/concepts/gossip) protocol to manage server membership using [Serf][serf]. |
|
|
|
|
|
|
[api_metrics]: /nomad/api-docs/metrics
|
|
[api_status_peers]: /nomad/api-docs/status#list-peers
|
|
[Variables]: /nomad/docs/concepts/variables
|
|
[verify_https_client]: /nomad/docs/configuration/tls#verify_https_client
|
|
[serf]: https://github.com/hashicorp/serf
|
|
[Sentinel Policies]: /nomad/tutorials/governance-and-policy/sentinel
|
|
[Resource Quotas]: /nomad/tutorials/governance-and-policy/quotas
|