From 2e8dc2135cec994b8316bf6bd6fc58ef1aef696e Mon Sep 17 00:00:00 2001
From: Seth Vargo
Date: Sat, 8 Oct 2016 18:57:51 +0800
Subject: [PATCH] Separate cluster formation into separate documentation pages

---
 website/source/docs/cluster/automatic.html.md  | 116 +++++
 .../source/docs/cluster/bootstrapping.html.md  | 272 +----------
 website/source/docs/cluster/federation.md      |  28 ++
 website/source/docs/cluster/manual.html.md     |  65 +++
 .../source/docs/cluster/requirements.html.md   |  57 +++
 website/source/layouts/docs.erb                | 428 +++++++++---------
 6 files changed, 498 insertions(+), 468 deletions(-)
 create mode 100644 website/source/docs/cluster/automatic.html.md
 create mode 100644 website/source/docs/cluster/federation.md
 create mode 100644 website/source/docs/cluster/manual.html.md
 create mode 100644 website/source/docs/cluster/requirements.html.md

diff --git a/website/source/docs/cluster/automatic.html.md b/website/source/docs/cluster/automatic.html.md
new file mode 100644
index 000000000..cf596976f
--- /dev/null
+++ b/website/source/docs/cluster/automatic.html.md
@@ -0,0 +1,116 @@
---
layout: "docs"
page_title: "Automatically Bootstrapping a Nomad Cluster"
sidebar_current: "docs-cluster-automatic"
description: |-
  Learn how to automatically bootstrap a Nomad cluster using Consul. By having
  a Consul agent installed on each host, Nomad can automatically discover other
  clients and servers to bootstrap the cluster without operator involvement.
---

# Automatic Bootstrapping

To automatically bootstrap a Nomad cluster, we can leverage another HashiCorp
open source tool, [Consul](https://www.consul.io/). Bootstrapping Nomad is
easiest against an existing Consul cluster: once a Consul agent is installed
and configured on each host, the Nomad servers and clients discover each other
automatically. As an added benefit, integrating Consul with Nomad provides
service and health check registration for applications that later run under
Nomad.

Consul models infrastructure as datacenters, and multiple Consul datacenters
can be connected over the WAN so that clients can discover nodes in other
datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
running a Consul cluster in every Nomad datacenter and connecting them over the
WAN. Please refer to the Consul guides on
[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single
datacenter and [connecting multiple Consul clusters over the
WAN](https://www.consul.io/docs/guides/datacenters.html).

If a Consul agent is installed on the host before Nomad starts, the Nomad agent
will register with Consul and discover other nodes.
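For reference, the Consul agent on each host might be launched with something
like the following. This is a minimal, illustrative sketch rather than a
production setup; the data directory and join address are placeholders:

```shell
# Minimal, illustrative Consul agent invocation. A production agent would also
# configure a datacenter name, gossip encryption, and more than one join target.
$ consul agent -data-dir=/opt/consul -retry-join=10.0.0.5
```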
For servers, we must additionally inform the cluster how many servers we expect
to have. This is required to form the initial quorum, since Nomad is otherwise
unaware of how many peers to expect. For example, to form a region with three
Nomad servers, you would use the following Nomad configuration file:

```hcl
# /etc/nomad.d/server.hcl

server {
  enabled = true
  bootstrap_expect = 3
}
```

After saving this configuration to disk, start the Nomad agent with it:

```shell
$ nomad agent -config=/etc/nomad.d/server.hcl
```

A similar configuration is used for Nomad clients:

```hcl
# /etc/nomad.d/client.hcl

datacenter = "dc1"

client {
  enabled = true
}
```

The agent is started in the same manner:

```shell
$ nomad agent -config=/etc/nomad.d/client.hcl
```

Notice that the above configurations contain no IP or DNS addresses for the
clients and servers to join. This is because Nomad detects the presence of
Consul and uses its service discovery to form the cluster.

## Internals

~> This section discusses the internals of the Consul and Nomad integration at
a very high level. Reading it is only recommended for those curious about the
implementation.

Nomad merges multiple configuration files together on startup, so the `-config`
flag may be specified more than once:

```shell
$ nomad agent -config=base.hcl -config=server.hcl
```

In addition to merging configuration from the command line, Nomad also
maintains its own internal configuration (its "default config"), which includes
sane base defaults. One of those defaults is a `consul` block that specifies
how to connect to and integrate with Consul. In essence, this internal
configuration resembles the following:

```hcl
# You do not need to add this to your configuration file. This is an example
# that is part of Nomad's internal default configuration for Consul integration.
consul {
  # The address of the Consul agent.
  address = "127.0.0.1:8500"

  # The service names to register the server and client with Consul.
  server_service_name = "nomad"
  client_service_name = "nomad-client"

  # Enables automatically registering the services.
  auto_advertise = true

  # Enables the server and client to bootstrap using Consul.
  server_auto_join = true
  client_auto_join = true
}
```
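Because this internal default configuration is merged with user-supplied
configuration files, overriding a single field is sufficient. As a hypothetical
sketch, if the local Consul agent listened on a non-default port, the override
might look like:

```hcl
# Hypothetical override; the port is illustrative. Any field not set here
# retains the internal default shown above.
consul {
  address = "127.0.0.1:7500"
}
```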
Please refer to the [Consul
documentation](/docs/agent/config.html#consul_options) for the complete set of
configuration options.

diff --git a/website/source/docs/cluster/bootstrapping.html.md b/website/source/docs/cluster/bootstrapping.html.md
index 6af5514b7..04ee3d544 100644
--- a/website/source/docs/cluster/bootstrapping.html.md
+++ b/website/source/docs/cluster/bootstrapping.html.md
@@ -1,270 +1,24 @@
 ---
 layout: "docs"
-page_title: "Creating a Nomad Cluster"
+page_title: "Bootstrapping a Nomad Cluster"
 sidebar_current: "docs-cluster-bootstrap"
 description: |-
   Learn how to bootstrap a Nomad cluster.
 ---

-# Creating a cluster
+# Bootstrapping a Nomad Cluster

-Nomad models infrastructure as regions and datacenters. Regions may contain
-multiple datacenters. Servers are assigned to regions and manage all state for
-the region and make scheduling decisions within that region. Clients are
-registered to a single datacenter and region.
+Nomad models infrastructure into regions and datacenters. Servers reside at the
+regional layer and manage all state and scheduling decisions for that region.
+Regions contain multiple datacenters, and clients are registered to a single
+datacenter (and thus the region that contains that datacenter). For more
+details on the architecture of Nomad and how it models infrastructure, see the
+[architecture page](/docs/internals/architecture.html).

-[![Regional Architecture](/assets/images/nomad-architecture-region.png)](/assets/images/nomad-architecture-region.png)
+There are two strategies for bootstrapping a Nomad cluster:

-This page will explain how to bootstrap a production grade Nomad region, both
-with and without Consul, and how to federate multiple regions together.
+1. Automatic bootstrapping
+1. Manual bootstrapping

-[![Global Architecture](/assets/images/nomad-architecture-global.png)](/assets/images/nomad-architecture-global.png)
-
-Bootstrapping Nomad is made significantly easier when there already exists a
-Consul cluster in place. Since Nomad's topology is slightly richer than Consul's
-since it supports not only datacenters but also regions lets start with how
-Consul should be deployed in relation to Nomad.
-
-For more details on the architecture of Nomad and how it models infrastructure
-see the [Architecture page](/docs/internals/architecture.html).
-
-## Deploying Consul Clusters
-
-A Nomad cluster gains the ability to bootstrap itself as well as provide service
-and health check registration to applications when Consul is deployed along side
-Nomad.
-
-Consul models infrastructures as datacenters and multiple Consul datacenters can
-be connected over the WAN so that clients can discover nodes in other
-datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
-running a Consul cluster in every Nomad datacenter and connecting them over the
-WAN. Please refer to the Consul guide for both
-[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single datacenter and
-[connecting multiple Consul clusters over the
-WAN](https://www.consul.io/docs/guides/datacenters.html).
-
-
-## Bootstrapping a Nomad cluster
-
-Nomad supports merging multiple configuration files together on startup. This is
-done to enable generating a base configuration that can be shared by Nomad
-servers and clients. A suggested base configuration is:
-
-```
-# Name the region, if omitted, the default "global" region will be used.
-region = "europe"
-
-# Persist data to a location that will survive a machine reboot.
-data_dir = "/opt/nomad/"
-
-# Bind to all addresses so that the Nomad agent is available both on loopback
-# and externally.
-bind_addr = "0.0.0.0"
-
-# Advertise an accessible IP address so the server is reachable by other servers
-# and clients. The IPs can be materialized by Terraform or be replaced by an
-# init script.
-advertise {
-  http = "${self.ipv4_address}:4646"
-  rpc = "${self.ipv4_address}:4647"
-  serf = "${self.ipv4_address}:4648"
-}
-
-# Ship metrics to monitor the health of the cluster and to see task resource
-# usage.
-telemetry {
-  statsite_address = "${var.statsite}"
-  disable_hostname = true
-}
-
-# Enable debug endpoints.
-enable_debug = true
-```
-
-### With Consul
-
-If a local Consul cluster is bootstrapped before Nomad, on startup Nomad
-server's will register with Consul and discover other server's. With their set
-of peers, they will automatically form quorum, respecting the `bootstrap_expect`
-field. Thus to form a 3 server region, the below configuration can be used in
-conjunction with the base config:
-
-```
-server {
-  enabled = true
-  bootstrap_expect = 3
-}
-```
-
-And an equally simple configuration can be used for clients:
-
-```
-# Replace with the relevant datacenter.
-datacenter = "dc1" - -client { - enabled = true -} -``` - -As you can see, the above configurations have no mention of the other server's to -join or any Consul configuration. That is because by default, the following is -merged with the configuration file: - -``` -consul { - # The address to the Consul agent. - address = "127.0.0.1:8500" - - # The service name to register the server and client with Consul. - server_service_name = "nomad" - client_service_name = "nomad-client" - - # Enables automatically registering the services. - auto_advertise = true - - # Enabling the server and client to bootstrap using Consul. - server_auto_join = true - client_auto_join = true -} -``` - -Since the `consul` block is merged by default, bootstrapping a cluster becomes -as easy as running the following on each of the three servers: - -``` -$ nomad agent -config base.hcl -config server.hcl -``` - -And on every client in the cluster, the following should be run: - -``` -$ nomad agent -config base.hcl -config client.hcl -``` - -With the above configurations and commands the Nomad agents will automatically -register themselves with Consul and discover other Nomad servers. If the agent -is a server, it will join the quorum and if it is a client, it will register -itself and join the cluster. - -Please refer to the [Consul documentation](/docs/agent/config.html#consul_options) -for the complete set of configuration options. - -### Without Consul - -When bootstrapping without Consul, Nomad servers and clients must be started -knowing the address of at least one Nomad server. - -To join the Nomad server's we can either encode the address in the server -configs as such: - -``` -server { - enabled = true - bootstrap_expect = 3 - retry_join = [""] -} -``` - -Alternatively, the address can be supplied after the servers have all been started by -running the [`server-join` command](/docs/commands/server-join.html) on the servers -individual to cluster the servers. All servers can join just one other server, -and then rely on the gossip protocol to discover the rest. - -``` -nomad server-join -``` - -On the client side, the addresses of the servers are expected to be specified -via the client configuration. - -``` -client { - enabled = true - servers = ["10.10.11.2:4647", "10.10.11.3:4647", "10.10.11.4:4647"] -} -``` - -If servers are added or removed from the cluster, the information will be -pushed to the client. This means, that only one server must be specified because -after initial contact, the full set of servers in the client's region will be -pushed to the client. - -The port corresponds to the RPC port. If no port is specified with the IP address, -the default RCP port of `4647` is assumed. - -The same commmands can be used to start the servers and clients as shown in the -bootstrapping with Consul section. - -### Federating a cluster - -Nomad clusters across multiple regions can be federated allowing users to submit -jobs or interact with the HTTP API targeting any region, from any server. - -Federating multiple Nomad clusters is as simple as joining servers. From any -server in one region, simply issue a join command to a server in the remote -region: - -``` -nomad server-join 10.10.11.8:4648 -``` - -Servers across regions discover other servers in the cluster via the gossip -protocol and hence it's enough to join one known server. - -If the Consul clusters in the different Nomad regions are federated, and Consul -`server_auto_join` is enabled, then federation occurs automatically. 
-
-## Network Topology
-
-### Nomad Servers
-
-Nomad servers are expected to have sub 10 millisecond network latencies between
-each other to ensure liveness and high throughput scheduling. Nomad servers
-can be spread across multiple datacenters if they have low latency
-connections between them to achieve high availability.
-
-For example, on AWS every region comprises of multiple zones which have very low
-latency links between them, so every zone can be modeled as a Nomad datacenter
-and every Zone can have a single Nomad server which could be connected to form a
-quorum and a region.
-
-Nomad servers uses Raft for state replication and Raft being highly consistent
-needs a quorum of servers to function, therefore we recommend running an odd
-number of Nomad servers in a region. Usually running 3-5 servers in a region is
-recommended. The cluster can withstand a failure of one server in a cluster of
-three servers and two failures in a cluster of five servers. Adding more servers
-to the quorum adds more time to replicate state and hence throughput decreases
-so we don't recommend having more than seven servers in a region.
-
-### Nomad Clients
-
-Nomad clients do not have the same latency requirements as servers since they
-are not participating in Raft. Thus clients can have 100+ millisecond latency to
-their servers. This allows having a set of Nomad servers that service clients
-that can be spread geographically over a continent or even the world in the case
-of having a single "global" region and many datacenter.
-
-## Production Considerations
-
-### Nomad Servers
-
-Depending on the number of jobs the cluster will be managing and the rate at
-which jobs are submitted, the Nomad servers may need to be run on large machine
-instances. We suggest having 8+ cores, 32 GB+ of memory, 80 GB+ of disk and
-significant network bandwith. The core count and network recommendations are to
-ensure high throughput as Nomad heavily relies on network communication and as
-the Servers are managing all the nodes in the region and performing scheduling.
-The memory and disk requirements are due to the fact that Nomad stores all state
-in memory and will store two snapshots of this data onto disk. Thus disk should
-be at least 2 times the memory available to the server when deploying a high
-load cluster.
-
-### Nomad Clients
-
-Nomad clients support reserving resources on the node that should not be used by
-Nomad. This should be used to target a specific resource utilization per node
-and to reserve resources for applications running outside of Nomad's supervision
-such as Consul and the operating system itself.
-
-Please see the [`reservation` config](/docs/agent/config.html#reserved) for more detail.
+Please refer to the specific documentation links above or in the sidebar for
+more detailed information about each strategy.

diff --git a/website/source/docs/cluster/federation.md b/website/source/docs/cluster/federation.md
new file mode 100644
index 000000000..06435f77a
--- /dev/null
+++ b/website/source/docs/cluster/federation.md
@@ -0,0 +1,28 @@
---
layout: "docs"
page_title: "Federating a Nomad Cluster"
sidebar_current: "docs-cluster-federation"
description: |-
  Learn how to join Nomad servers across multiple regions so users can submit
  jobs to any server in any region using global federation.
---

# Federating a Cluster

Because Nomad operates at a regional level, federation is part of Nomad core.
Federation enables users to submit jobs or interact with the HTTP API targeting
any region, from any server, even if that server resides in a different region.

Federating multiple Nomad clusters is as simple as joining servers. From any
server in one region, issue a join command to a server in a remote region:

```shell
$ nomad server-join 1.2.3.4:4648
```

Note that only one join command is required per region. Servers across regions
discover other servers in the cluster via the gossip protocol, so it is enough
to join just one known server.

If bootstrapped via Consul and the Consul clusters in the Nomad regions are
federated, then federation occurs automatically.
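Once federated, any server can service requests for any region. As an
illustrative sketch (the region name below is a placeholder for one of your
own), the CLI's global `-region` flag forwards a command to the named region:

```shell
# Ask the servers in the "europe" region for job status, regardless of which
# region the contacted server belongs to. The region name is illustrative.
$ nomad status -region=europe
```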
diff --git a/website/source/docs/cluster/manual.html.md b/website/source/docs/cluster/manual.html.md
new file mode 100644
index 000000000..0b32c3ee1
--- /dev/null
+++ b/website/source/docs/cluster/manual.html.md
@@ -0,0 +1,65 @@
---
layout: "docs"
page_title: "Manually Bootstrapping a Nomad Cluster"
sidebar_current: "docs-cluster-manual"
description: |-
  Learn how to manually bootstrap a Nomad cluster using the server-join
  command, and how to point Nomad clients at a known set of servers.
---

# Manual Bootstrapping

Manually bootstrapping a Nomad cluster does not rely on additional tooling, but
it does require operator participation in the cluster formation process. When
bootstrapping, Nomad servers and clients must be started with the address of at
least one Nomad server.

This creates a chicken-and-egg problem: one server must first be fully
bootstrapped and configured before the remaining servers and clients can join
the cluster. This requirement can add provisioning time as well as ordered
dependencies during provisioning.

First, we bootstrap a single Nomad server and capture its IP address. Once we
have that node's IP address, we place it in the configuration.

For Nomad servers, this configuration may look something like this:

```hcl
server {
  enabled = true
  bootstrap_expect = 3

  # This is the IP address of the first server we provisioned.
  retry_join = ["<known-server-address>:4648"]
}
```

Alternatively, the address can be supplied after the servers have all been
started by running the [`server-join` command](/docs/commands/server-join.html)
on each server individually to cluster the servers. All servers can join just
one other server and then rely on the gossip protocol to discover the rest:

```shell
$ nomad server-join <known-server-address>
```

For Nomad clients, the configuration may look something like this:

```hcl
client {
  enabled = true
  servers = ["<known-server-address>:4647"]
}
```

At this time, there is no equivalent of the `server-join` command for Nomad
clients.

The port corresponds to the RPC port. If no port is specified with the IP
address, the default RPC port of `4647` is assumed.

As servers are added to or removed from the cluster, this information is pushed
to the client. This means only one server must be specified, because after
initial contact the full set of servers in the client's region is shared with
the client.

diff --git a/website/source/docs/cluster/requirements.html.md b/website/source/docs/cluster/requirements.html.md
new file mode 100644
index 000000000..268b2764f
--- /dev/null
+++ b/website/source/docs/cluster/requirements.html.md
@@ -0,0 +1,57 @@
---
layout: "docs"
page_title: "Nomad Client and Server Requirements"
sidebar_current: "docs-cluster-requirements"
description: |-
  Learn about the resource and network requirements of Nomad clients and
  servers, including recommended instance sizes and latency limits.
---

# Cluster Requirements

## Resources (RAM, CPU, etc.)

**Nomad servers** may need to be run on large machine instances, depending on
the number of jobs the cluster manages and the rate at which jobs are
submitted. We suggest having 8+ cores, 32 GB+ of memory, 80 GB+ of disk, and
significant network bandwidth. The core count and network recommendations
ensure high throughput, as Nomad relies heavily on network communication and
the servers manage all the nodes in the region while performing scheduling. The
memory and disk requirements stem from the fact that Nomad stores all state in
memory and writes two snapshots of this data to disk. Disk should therefore be
at least two times the memory available to the server when deploying a
high-load cluster.

**Nomad clients** support reserving resources on the node that should not be
used by Nomad. This can be used to target a specific resource utilization per
node and to reserve resources for applications running outside of Nomad's
supervision, such as Consul and the operating system itself.
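As a sketch, a client reservation might look like the following; the values are
illustrative rather than recommendations:

```hcl
# Illustrative reservation: CPU is in MHz, memory and disk are in MB, and
# reserved_ports lists ports Nomad should not allocate to tasks.
client {
  enabled = true

  reserved {
    cpu            = 500
    memory         = 512
    disk           = 1024
    reserved_ports = "22,8500-8600"
  }
}
```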
Please see the [reservation configuration](/docs/agent/config.html#reserved)
for more detail.

## Network Topology

**Nomad servers** are expected to have sub 10 millisecond network latencies
between each other to ensure liveness and high throughput scheduling. Nomad
servers can be spread across multiple datacenters to achieve high availability
if those datacenters have low latency connections between them.

For example, on AWS every region comprises multiple zones which have very low
latency links between them, so every zone can be modeled as a Nomad datacenter,
and every zone can have a single Nomad server which could be connected to form
a quorum and a region.

Nomad servers use Raft for state replication, and because Raft is strongly
consistent, it needs a quorum of servers to function. We therefore recommend
running an odd number of Nomad servers in a region, usually three to five. A
cluster of three servers can withstand the failure of one server, and a cluster
of five can withstand two. Adding more servers to the quorum adds more time to
replicate state and hence decreases throughput, so we do not recommend having
more than seven servers in a region.

**Nomad clients** do not have the same latency requirements as servers, since
they do not participate in Raft. Clients can therefore have 100+ millisecond
latency to their servers. This allows a single set of Nomad servers to service
clients spread geographically over a continent, or even the world in the case
of a single "global" region with many datacenters.

diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb
index 5fc3c29a5..3ad4dba79 100644
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -1,232 +1,242 @@
<% wrap_layout :inner do %> - <% content_for :sidebar do %> - + <% end %> + + <%= yield %> <% end %>