mirror of
https://github.com/kemko/nomad.git
synced 2026-01-05 09:55:44 +03:00
Add autopilot docs
@@ -123,3 +123,232 @@ $ curl \
    --request DELETE \
    https://nomad.rocks/v1/operator/raft/peer?address=1.2.3.4
```

## Read Autopilot Configuration

This endpoint retrieves the latest Autopilot configuration.

| Method | Path                                | Produces           |
| ------ | ----------------------------------- | ------------------ |
| `GET`  | `/operator/autopilot/configuration` | `application/json` |

The table below shows this endpoint's support for
[blocking queries](/api/index.html#blocking-queries),
[consistency modes](/api/index.html#consistency-modes), and
[required ACLs](/api/index.html#acls).

| Blocking Queries | Consistency Modes | ACL Required    |
| ---------------- | ----------------- | --------------- |
| `NO`             | `none`            | `operator:read` |

### Parameters

- `dc` `(string: "")` - Specifies the datacenter to query. This will default to
  the datacenter of the agent being queried. This is specified as part of the
  URL as a query string.

- `stale` `(bool: false)` - If the cluster does not currently have a leader, an
  error will be returned. You can use the `?stale` query parameter to read the
  Raft configuration from any of the Nomad servers.

### Sample Request

```text
$ curl \
    https://nomad.rocks/v1/operator/autopilot/configuration
```

### Sample Response

```json
{
  "CleanupDeadServers": true,
  "LastContactThreshold": "200ms",
  "MaxTrailingLogs": 250,
  "ServerStabilizationTime": "10s",
  "RedundancyZoneTag": "",
  "DisableUpgradeMigration": false,
  "UpgradeVersionTag": "",
  "CreateIndex": 4,
  "ModifyIndex": 4
}
```

For more information about the Autopilot configuration options, see the
[agent configuration section](/docs/agent/options.html#autopilot).
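
As a rough illustration of how the `dc` and `stale` parameters end up in the request, the following Python sketch builds the read URL. The helper name `autopilot_config_url` is hypothetical and not part of any Nomad client; the bare `stale` flag follows the `?stale` form mentioned above.

```python
from urllib.parse import quote


def autopilot_config_url(base_url, dc="", stale=False):
    """Build the URL for reading the Autopilot configuration."""
    url = base_url.rstrip("/") + "/v1/operator/autopilot/configuration"
    query = []
    if dc:
        # `dc` travels as a query-string parameter, as described above.
        query.append("dc=" + quote(dc, safe=""))
    if stale:
        # A bare `stale` flag, matching the `?stale` form used in the text.
        query.append("stale")
    return url + ("?" + "&".join(query) if query else "")


print(autopilot_config_url("https://nomad.rocks", dc="dc1", stale=True))
```

The resulting string can be passed to `curl` or any HTTP client.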

## Update Autopilot Configuration

This endpoint updates the Autopilot configuration of the cluster.

| Method | Path                                | Produces           |
| ------ | ----------------------------------- | ------------------ |
| `PUT`  | `/operator/autopilot/configuration` | `application/json` |

The table below shows this endpoint's support for
[blocking queries](/api/index.html#blocking-queries),
[consistency modes](/api/index.html#consistency-modes), and
[required ACLs](/api/index.html#acls).

| Blocking Queries | Consistency Modes | ACL Required     |
| ---------------- | ----------------- | ---------------- |
| `NO`             | `none`            | `operator:write` |

### Parameters

- `dc` `(string: "")` - Specifies the datacenter to query. This will default to
  the datacenter of the agent being queried. This is specified as part of the
  URL as a query string.

- `cas` `(int: 0)` - Specifies to use a Check-And-Set operation. The update will
  only happen if the given index matches the `ModifyIndex` of the configuration
  at the time of writing.

- `CleanupDeadServers` `(bool: true)` - Specifies automatic removal of dead
  server nodes periodically and whenever a new server is added to the cluster.

- `LastContactThreshold` `(string: "200ms")` - Specifies the maximum amount of
  time a server can go without contact from the leader before being considered
  unhealthy. Must be a duration value such as `10s`.

- `MaxTrailingLogs` `(int: 250)` - Specifies the maximum number of log entries
  that a server can trail the leader by before being considered unhealthy.

- `ServerStabilizationTime` `(string: "10s")` - Specifies the minimum amount of
  time a server must be stable in the 'healthy' state before being added to the
  cluster. Only takes effect if all servers are running Raft protocol version 3
  or higher. Must be a duration value such as `30s`.

- `RedundancyZoneTag` `(string: "")` - Controls the node-meta key to use when
  Autopilot is separating servers into zones for redundancy. Only one server in
  each zone can be a voting member at one time. If left blank, this feature will
  be disabled.

- `DisableUpgradeMigration` `(bool: false)` - Disables Autopilot's upgrade
  migration strategy in Nomad Enterprise of waiting until enough
  newer-versioned servers have been added to the cluster before promoting any of
  them to voters.

- `UpgradeVersionTag` `(string: "")` - Controls the node-meta key to use for
  version info when performing upgrade migrations. If left blank, the Nomad
  version will be used.

### Sample Payload

```json
{
  "CleanupDeadServers": true,
  "LastContactThreshold": "200ms",
  "MaxTrailingLogs": 250,
  "ServerStabilizationTime": "10s",
  "RedundancyZoneTag": "",
  "DisableUpgradeMigration": false,
  "UpgradeVersionTag": "",
  "CreateIndex": 4,
  "ModifyIndex": 4
}
```
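
The `cas` parameter is easiest to follow as a read-modify-write cycle. The sketch below assumes the configuration was first read from the endpoint above and decoded into a dictionary; the function `build_cas_update` is a hypothetical helper, and it only prepares the request path and body rather than sending anything.

```python
import json


def build_cas_update(current, changes):
    """Return (path, body) for a check-and-set update.

    `current` is the decoded JSON from a previous read. Its ModifyIndex
    becomes the `cas` value, so the write should only apply if the
    configuration has not been modified in between.
    """
    payload = dict(current)   # copy, so the caller's dict is left untouched
    payload.update(changes)
    path = "/v1/operator/autopilot/configuration?cas=%d" % current["ModifyIndex"]
    return path, json.dumps(payload, sort_keys=True)


current = {"CleanupDeadServers": True, "MaxTrailingLogs": 250,
           "CreateIndex": 4, "ModifyIndex": 4}
path, body = build_cas_update(current, {"CleanupDeadServers": False})
print(path)
print(body)
```

If another writer changes the configuration first, the index no longer matches and the update is expected not to happen, at which point the client can re-read and retry.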

## Read Health

This endpoint queries the health of the cluster as tracked by Autopilot.

| Method | Path                         | Produces           |
| ------ | ---------------------------- | ------------------ |
| `GET`  | `/operator/autopilot/health` | `application/json` |

The table below shows this endpoint's support for
[blocking queries](/api/index.html#blocking-queries),
[consistency modes](/api/index.html#consistency-modes), and
[required ACLs](/api/index.html#acls).

| Blocking Queries | Consistency Modes | ACL Required    |
| ---------------- | ----------------- | --------------- |
| `NO`             | `none`            | `operator:read` |

### Parameters

- `dc` `(string: "")` - Specifies the datacenter to query. This will default to
  the datacenter of the agent being queried. This is specified as part of the
  URL as a query string.

### Sample Request

```text
$ curl \
    https://nomad.rocks/v1/operator/autopilot/health
```

### Sample Response

```json
{
  "Healthy": true,
  "FailureTolerance": 0,
  "Servers": [
    {
      "ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
      "Name": "node1",
      "Address": "127.0.0.1:8300",
      "SerfStatus": "alive",
      "Version": "0.8.0",
      "Leader": true,
      "LastContact": "0s",
      "LastTerm": 2,
      "LastIndex": 46,
      "Healthy": true,
      "Voter": true,
      "StableSince": "2017-03-06T22:07:51Z"
    },
    {
      "ID": "e36ee410-cc3c-0a0c-c724-63817ab30303",
      "Name": "node2",
      "Address": "127.0.0.1:8205",
      "SerfStatus": "alive",
      "Version": "0.8.0",
      "Leader": false,
      "LastContact": "27.291304ms",
      "LastTerm": 2,
      "LastIndex": 46,
      "Healthy": true,
      "Voter": false,
      "StableSince": "2017-03-06T22:18:26Z"
    }
  ]
}
```

- `Healthy` is whether all the servers are currently healthy.

- `FailureTolerance` is the number of redundant healthy servers that could
  fail without causing an outage (this would be 2 in a healthy cluster of 5
  servers).

- `Servers` holds detailed health information on each server:

  - `ID` is the Raft ID of the server.

  - `Name` is the node name of the server.

  - `Address` is the address of the server.

  - `SerfStatus` is the SerfHealth check status for the server.

  - `Version` is the Nomad version of the server.

  - `Leader` is whether this server is currently the leader.

  - `LastContact` is the time elapsed since this server's last contact with the leader.

  - `LastTerm` is the server's last known Raft leader term.

  - `LastIndex` is the index of the server's last committed Raft log entry.

  - `Healthy` is whether the server is healthy according to the current Autopilot configuration.

  - `Voter` is whether the server is a voting member of the Raft cluster.

  - `StableSince` is the time at which this server entered its current `Healthy` state.
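
The `FailureTolerance` figure can be checked with a little arithmetic. The sketch below assumes that an outage means losing a simple majority of the voting servers, which is how Raft quorums normally work; how Autopilot computes the field internally is not stated here, and `failure_tolerance` is a hypothetical helper.

```python
def failure_tolerance(healthy_voters):
    """Number of voters that can fail while a majority still remains."""
    quorum = healthy_voters // 2 + 1      # simple majority of the voters
    return max(0, healthy_voters - quorum)


for voters in (1, 3, 5):
    print(voters, failure_tolerance(voters))
```

Under this assumption a cluster of 5 healthy voters tolerates 2 failures, which agrees with the example in the list above, and the sample response (one voter, one non-voter) yields 0.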

The HTTP status code will indicate the health of the cluster. If `Healthy` is true, then a
status of 200 will be returned. If `Healthy` is false, then a status of 429 will be returned.
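
A monitoring script can therefore decide on the status code alone and use the body only for detail. The following is a minimal sketch under that reading; `summarize_health` is a hypothetical helper that takes an already-received status code and response body, so no particular HTTP library is implied.

```python
import json


def summarize_health(status_code, body):
    """Return (cluster_healthy, names_of_unhealthy_servers)."""
    if status_code not in (200, 429):
        # Anything else is treated as an error rather than a health result.
        raise ValueError("unexpected status code: %d" % status_code)
    report = json.loads(body)
    unhealthy = [s["Name"] for s in report["Servers"] if not s["Healthy"]]
    return status_code == 200, unhealthy


body = json.dumps({
    "Healthy": False,
    "FailureTolerance": 0,
    "Servers": [
        {"Name": "node1", "Healthy": True},
        {"Name": "node2", "Healthy": False},
    ],
})
print(summarize_health(429, body))
```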

64
website/source/docs/agent/configuration/autopilot.html.md
Normal file
@@ -0,0 +1,64 @@
---
layout: "docs"
page_title: "autopilot Stanza - Agent Configuration"
sidebar_current: "docs-agent-configuration-autopilot"
description: |-
  The "autopilot" stanza configures Autopilot behavior for the Nomad agent.
---

# `autopilot` Stanza

<table class="table table-bordered table-striped">
  <tr>
    <th width="120">Placement</th>
    <td>
      <code>**autopilot**</code>
    </td>
  </tr>
</table>

The `autopilot` stanza configures Autopilot behavior for the Nomad agent.

```hcl
autopilot {
  cleanup_dead_servers      = true
  last_contact_threshold    = "200ms"
  max_trailing_logs         = 250
  server_stabilization_time = "10s"
  redundancy_zone_tag       = ""
  disable_upgrade_migration = true
  upgrade_version_tag       = ""
}
```

## `autopilot` Parameters

- `cleanup_dead_servers` `(bool: true)` - Specifies automatic removal of dead
  server nodes periodically and whenever a new server is added to the cluster.

- `last_contact_threshold` `(string: "200ms")` - Specifies the maximum amount of
  time a server can go without contact from the leader before being considered
  unhealthy. Must be a duration value such as `10s`.

- `max_trailing_logs` `(int: 250)` - Specifies the maximum number of log entries
  that a server can trail the leader by before being considered unhealthy.

- `server_stabilization_time` `(string: "10s")` - Specifies the minimum amount of
  time a server must be stable in the 'healthy' state before being added to the
  cluster. Only takes effect if all servers are running Raft protocol version 3
  or higher. Must be a duration value such as `30s`.

- `redundancy_zone_tag` `(string: "")` - Controls the node-meta key to use when
  Autopilot is separating servers into zones for redundancy. Only one server in
  each zone can be a voting member at one time. If left blank, this feature will
  be disabled.

- `disable_upgrade_migration` `(bool: false)` - Disables Autopilot's upgrade
  migration strategy in Nomad Enterprise of waiting until enough
  newer-versioned servers have been added to the cluster before promoting any of
  them to voters.

- `upgrade_version_tag` `(string: "")` - Controls the node-meta key to use for
  version info when performing upgrade migrations. If left blank, the Nomad
  version will be used.
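
Two of the parameters above take duration strings such as `200ms` or `10s`. The sketch below shows one way a tool might validate such values before writing them into a configuration file. It assumes the unit suffixes of Go's duration syntax and, to stay short, accepts whole numbers only; `parse_duration_ns` is a hypothetical helper, not something Nomad ships.

```python
import re

# Nanoseconds per unit, assuming Go-style duration suffixes.
_UNIT_NS = {"ns": 1, "us": 10**3, "ms": 10**6,
            "s": 10**9, "m": 60 * 10**9, "h": 3600 * 10**9}
_PART = re.compile(r"(\d+)(ns|us|ms|s|m|h)")


def parse_duration_ns(text):
    """Convert a duration such as "200ms" or "1m30s" to nanoseconds."""
    pos, total = 0, 0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError("invalid duration: %r" % text)
        total += int(match.group(1)) * _UNIT_NS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


print(parse_duration_ns("200ms"), parse_duration_ns("10s"))
```

Working in integer nanoseconds avoids floating-point rounding when values are compared against thresholds.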

@@ -102,6 +102,9 @@ server {
  second is a tradeoff as it lowers failure detection time of nodes at the
  tradeoff of false positives and increased load on the leader.

- `non_voting_server` `(bool: false)` - Specifies whether this server will act as
  a non-voting member of the cluster to help provide read scalability. (Enterprise-only)

- `num_schedulers` `(int: [num-cores])` - Specifies the number of parallel
  scheduler threads to run. This can be as many as one per core, or `0` to
  disallow this server from making any scheduling decisions. This defaults to

@@ -28,8 +28,12 @@ Usage: `nomad operator <subcommand> <subcommand> [options]`
Run `nomad operator <subcommand>` with no arguments for help on that subcommand.
The following subcommands are available:

* [`autopilot get-config`][get-config] - Display the current Autopilot configuration
* [`autopilot set-config`][set-config] - Modify the current Autopilot configuration
* [`raft list-peers`][list] - Display the current Raft peer configuration
* [`raft remove-peer`][remove] - Remove a Nomad server from the Raft configuration

[get-config]: /docs/commands/operator/autopilot-get-config.html "Autopilot Get Config command"
[set-config]: /docs/commands/operator/autopilot-set-config.html "Autopilot Set Config command"
[list]: /docs/commands/operator/raft-list-peers.html "Raft List Peers command"
[remove]: /docs/commands/operator/raft-remove-peer.html "Raft Remove Peer command"

@@ -0,0 +1,63 @@
---
layout: "docs"
page_title: "Commands: operator autopilot get-config"
sidebar_current: "docs-commands-operator-autopilot-get-config"
description: >
  Display the current Autopilot configuration.
---

# Command: `operator autopilot get-config`

The Autopilot operator command is used to view the current Autopilot configuration. See the
[Autopilot Guide](/guides/cluster/autopilot.html) for more information about Autopilot.

## Usage

```
nomad operator autopilot get-config [options]
```

## General Options

<%= partial "docs/commands/_general_options" %>

The output looks like this:

```
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
RedundancyZoneTag = ""
DisableUpgradeMigration = false
UpgradeVersionTag = ""
```

- `CleanupDeadServers` - Specifies automatic removal of dead
  server nodes periodically and whenever a new server is added to the cluster.

- `LastContactThreshold` - Specifies the maximum amount of
  time a server can go without contact from the leader before being considered
  unhealthy. Must be a duration value such as `10s`.

- `MaxTrailingLogs` - Specifies the maximum number of log entries
  that a server can trail the leader by before being considered unhealthy.

- `ServerStabilizationTime` - Specifies the minimum amount of
  time a server must be stable in the 'healthy' state before being added to the
  cluster. Only takes effect if all servers are running Raft protocol version 3
  or higher. Must be a duration value such as `30s`.

- `RedundancyZoneTag` - Controls the node-meta key to use when
  Autopilot is separating servers into zones for redundancy. Only one server in
  each zone can be a voting member at one time. If left blank, this feature will
  be disabled.

- `DisableUpgradeMigration` - Disables Autopilot's upgrade
  migration strategy in Nomad Enterprise of waiting until enough
  newer-versioned servers have been added to the cluster before promoting any of
  them to voters.

- `UpgradeVersionTag` - Controls the node-meta key to use for
  version info when performing upgrade migrations. If left blank, the Nomad
  version will be used.
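
For scripting, the `Key = value` output shown above is simple to turn into a dictionary. The sketch below assumes the output has exactly the shape of the sample; `parse_get_config` is a hypothetical helper and values are kept as strings.

```python
def parse_get_config(output):
    """Parse `Key = value` lines into a dict of strings."""
    config = {}
    for line in output.splitlines():
        if "=" not in line:
            continue                      # skip blank or unrelated lines
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"')
    return config


sample = '''CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
RedundancyZoneTag = ""
'''
print(parse_get_config(sample))
```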

@@ -0,0 +1,55 @@
---
layout: "docs"
page_title: "Commands: operator autopilot set-config"
sidebar_current: "docs-commands-operator-autopilot-set-config"
description: >
  Modify the current Autopilot configuration.
---

# Command: `operator autopilot set-config`

The Autopilot operator command is used to set the current Autopilot configuration. See the
[Autopilot Guide](/guides/cluster/autopilot.html) for more information about Autopilot.

## Usage

```
nomad operator autopilot set-config [options]
```

## General Options

<%= partial "docs/commands/_general_options" %>

## Set Config Options

* `-cleanup-dead-servers` - Specifies whether to enable automatic removal of dead servers
  upon the successful joining of new servers to the cluster. Must be one of `[true|false]`.

* `-last-contact-threshold` - Controls the maximum amount of time a server can go without contact
  from the leader before being considered unhealthy. Must be a duration value such as `200ms`.

* `-max-trailing-logs` - Controls the maximum number of log entries that a server can trail
  the leader by before being considered unhealthy.

* `-server-stabilization-time` - Controls the minimum amount of time a server must be stable in
  the 'healthy' state before being added to the cluster. Only takes effect if all servers are
  running Raft protocol version 3 or higher. Must be a duration value such as `10s`.

* `-disable-upgrade-migration` - (Enterprise-only) Controls whether Nomad will avoid promoting
  new servers until it can perform a migration. Must be one of `[true|false]`.

* `-redundancy-zone-tag` - (Enterprise-only) Controls the [`-node-meta`](/docs/agent/options.html#_node_meta)
  key name used for separating servers into different redundancy zones.

* `-upgrade-version-tag` - (Enterprise-only) Controls the [`-node-meta`](/docs/agent/options.html#_node_meta)
  tag to use for version info when performing upgrade migrations. If left blank, the Nomad version will be used.

The output looks like this:

```
Configuration updated!
```

The return code will indicate success or failure.
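
When the command is driven from automation, it can help to assemble the flags from a dictionary of settings. The sketch below only builds the argument list from the flags documented above; the helper `set_config_argv` is hypothetical.

```python
# Flag names taken from the option list above.
FLAGS = {
    "cleanup-dead-servers", "last-contact-threshold", "max-trailing-logs",
    "server-stabilization-time", "disable-upgrade-migration",
    "redundancy-zone-tag", "upgrade-version-tag",
}


def set_config_argv(**options):
    """Build the argv for `nomad operator autopilot set-config`."""
    argv = ["nomad", "operator", "autopilot", "set-config"]
    for name, value in sorted(options.items()):
        flag = name.replace("_", "-")
        if flag not in FLAGS:
            raise ValueError("unknown option: %s" % name)
        if isinstance(value, bool):
            value = "true" if value else "false"   # flags expect true|false
        argv.append("-%s=%s" % (flag, value))
    return argv


print(set_config_argv(cleanup_dead_servers=False, max_trailing_logs=500))
```

The list can be handed to `subprocess.run`, and the process return code then indicates success or failure as described above.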

@@ -15,6 +15,42 @@ details provided for their upgrades as a result of new features or changed
behavior. This page is used to document those details separately from the
standard upgrade flow.

## Nomad 0.8.0

### Raft Protocol Version Compatibility

When upgrading to Nomad 0.8.0 from a version lower than 0.7.0, users will need to
set the [`-raft-protocol`](/docs/agent/options.html#_raft_protocol) option to 1 in
order to maintain backwards compatibility with the old servers during the upgrade.
After the servers have been migrated to version 0.8.0, `-raft-protocol` can be moved
up to 2 and the servers restarted to match the default.

The Raft protocol must be stepped up in this way; only adjacent version numbers are
compatible (for example, version 1 cannot talk to version 3). Here is a table of the
Raft Protocol versions supported by each Nomad version:

<table class="table table-bordered table-striped">
  <tr>
    <th>Version</th>
    <th>Supported Raft Protocols</th>
  </tr>
  <tr>
    <td>0.6 and earlier</td>
    <td>0</td>
  </tr>
  <tr>
    <td>0.7</td>
    <td>1</td>
  </tr>
  <tr>
    <td>0.8</td>
    <td>1, 2, 3</td>
  </tr>
</table>

In order to enable all [Autopilot](/guides/cluster/autopilot.html) features, all servers
in a Nomad cluster must be running with Raft protocol version 3 or later.
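
The stepping rule can be written down as a small check. The sketch below copies the support table above into a dictionary and treats two protocol versions as compatible only when they are equal or adjacent; the names `SUPPORTED`, `compatible`, and `upgrade_steps` are illustrative and not part of Nomad.

```python
# Supported Raft protocol versions per Nomad release, copied from the table above.
SUPPORTED = {"0.6 and earlier": {0}, "0.7": {1}, "0.8": {1, 2, 3}}


def compatible(a, b):
    """Only equal or adjacent protocol versions can talk to each other."""
    return abs(a - b) <= 1


def upgrade_steps(current, target):
    """Protocol versions to configure, in order, when stepping up."""
    if target < current:
        raise ValueError("downgrades are not covered by this sketch")
    return list(range(current + 1, target + 1))


print(compatible(1, 3), upgrade_steps(1, 3))
```

Going from protocol 1 to 3 therefore takes two rounds of reconfiguration and restarts, first to 2 and then to 3.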

## Nomad 0.6.0

### Default `advertise` address changes

219
website/source/guides/cluster/autopilot.html.md
Normal file
@@ -0,0 +1,219 @@
---
layout: "guides"
page_title: "Autopilot"
sidebar_current: "guides-cluster-autopilot"
description: |-
  This guide covers how to configure and use Autopilot features.
---

# Autopilot

Autopilot is a set of new features added in Nomad 0.8 to allow for automatic,
operator-friendly management of Nomad servers. It includes cleanup of dead
servers, monitoring the state of the Raft cluster, and stable server introduction.

To enable Autopilot features (with the exception of dead server cleanup),
the [`raft_protocol`](/docs/agent/configuration/server.html#raft_protocol) setting in
the Agent configuration must be set to 3 or higher on all servers. In Nomad
0.8 this setting defaults to 2; in Nomad 0.9 it will default to 3. For more
information, see the
[Version Upgrade section](/docs/upgrade/upgrade-specific.html#raft-protocol-version-compatibility)
on Raft Protocol versions.

## Configuration

The configuration of Autopilot is loaded by the leader from the agent's
[Autopilot settings](/docs/agent/options.html#autopilot) when initially
bootstrapping the cluster:

```
{
  "cleanup_dead_servers": true,
  "last_contact_threshold": "200ms",
  "max_trailing_logs": 250,
  "server_stabilization_time": "10s",
  "redundancy_zone_tag": "az",
  "disable_upgrade_migration": false,
  "upgrade_version_tag": ""
}
```

After bootstrapping, the configuration can be viewed or modified either via the
[`operator autopilot`](/docs/commands/operator/autopilot.html) subcommand or the
[`/v1/operator/autopilot/configuration`](/api/operator.html#autopilot-configuration)
HTTP endpoint:

```
$ nomad operator autopilot get-config
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
RedundancyZoneTag = ""
DisableUpgradeMigration = false
UpgradeVersionTag = ""

$ nomad operator autopilot set-config -cleanup-dead-servers=false
Configuration updated!

$ nomad operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
RedundancyZoneTag = ""
DisableUpgradeMigration = false
UpgradeVersionTag = ""
```

## Dead Server Cleanup

Dead servers will periodically be cleaned up and removed from the Raft peer
set, to prevent them from interfering with the quorum size and leader elections.
This cleanup will also happen whenever a new server is successfully added to the
cluster.

Prior to Autopilot, it would take 72 hours for dead servers to be automatically reaped,
or operators had to script a `nomad server-force-leave`. If another server failure occurred,
it could jeopardize the quorum, even if the failed Nomad server had been automatically
replaced. Autopilot helps prevent these kinds of outages by quickly removing failed
servers as soon as a replacement Nomad server comes online. When servers are removed
by the cleanup process they will enter the "left" state.

This option can be disabled by running `nomad operator autopilot set-config`
with the `-cleanup-dead-servers=false` option.

## Server Health Checking

An internal health check runs on the leader to track the stability of servers.
A server is considered healthy if all of the following conditions are true:

- It has a SerfHealth status of 'Alive'
- The time since its last contact with the current leader is below
  `LastContactThreshold`
- Its latest Raft term matches the leader's term
- The number of Raft log entries it trails the leader by does not exceed
  `MaxTrailingLogs`

The status of these health checks can be viewed through the
[`/v1/operator/autopilot/health`](/api/operator.html#read-health) HTTP endpoint, with a top level
`Healthy` field indicating the overall status of the cluster:

```
$ curl localhost:4646/v1/operator/autopilot/health
{
  "Healthy": true,
  "FailureTolerance": 0,
  "Servers": [
    {
      "ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
      "Name": "node1",
      "Address": "127.0.0.1:4647",
      "SerfStatus": "alive",
      "Version": "0.8.0",
      "Leader": true,
      "LastContact": "0s",
      "LastTerm": 2,
      "LastIndex": 10,
      "Healthy": true,
      "Voter": true,
      "StableSince": "2017-03-28T18:28:52Z"
    },
    {
      "ID": "e35bde83-4e9c-434f-a6ef-453f44ee21ea",
      "Name": "node2",
      "Address": "127.0.0.1:4747",
      "SerfStatus": "alive",
      "Version": "0.8.0",
      "Leader": false,
      "LastContact": "35.371007ms",
      "LastTerm": 2,
      "LastIndex": 10,
      "Healthy": true,
      "Voter": false,
      "StableSince": "2017-03-28T18:29:10Z"
    }
  ]
}
```
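
The four health conditions listed above can be sketched as a single predicate. This is an illustration of the rules as written, not Nomad's implementation; the function name and its parameters are hypothetical, and times are given in seconds for simplicity.

```python
def is_healthy(serf_status, last_contact_s, last_term, last_index,
               leader_term, leader_index,
               last_contact_threshold_s=0.2, max_trailing_logs=250):
    """Apply the four documented conditions to one server."""
    return (
        serf_status == "alive"                                # SerfHealth is alive
        and last_contact_s < last_contact_threshold_s         # below the threshold
        and last_term == leader_term                          # same Raft term
        and leader_index - last_index <= max_trailing_logs    # not too far behind
    )


# node2 from the sample response: 35.371007ms since last contact, term 2, index 10.
print(is_healthy("alive", 0.035371007, 2, 10, leader_term=2, leader_index=10))
```

The defaults mirror the `LastContactThreshold` of `200ms` and `MaxTrailingLogs` of `250` shown earlier in this guide.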

## Stable Server Introduction

When a new server is added to the cluster, there is a waiting period where it
must be healthy and stable for a certain amount of time before being promoted
to a full, voting member. This can be configured via the `ServerStabilizationTime`
setting.
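
The waiting period amounts to comparing how long a server has been stable against `ServerStabilizationTime`. A minimal sketch, assuming the `StableSince` timestamp from the health endpoint has already been parsed into a `datetime` (the function name is hypothetical):

```python
from datetime import datetime, timedelta


def ready_for_promotion(healthy, stable_since, now,
                        stabilization=timedelta(seconds=10)):
    """True once a healthy server has been stable for the required time."""
    return healthy and (now - stable_since) >= stabilization


stable_since = datetime(2017, 3, 28, 18, 29, 10)
print(ready_for_promotion(True, stable_since, stable_since + timedelta(seconds=12)))
```

The default of 10 seconds matches the documented default for `ServerStabilizationTime`.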

---

~> The following Autopilot features are available only in
[Nomad Enterprise](https://www.hashicorp.com/products/nomad/) version 0.8.0 and later.

## Server Read Scaling

With the [`non_voting_server`](/docs/agent/configuration/server.html#non_voting_server) option, a
server can be explicitly marked as a non-voter and will never be promoted to a voting
member. This can be useful when more read scaling is needed; being a non-voter means
that the server will still have data replicated to it, but it will not be part of the
quorum that the leader must wait for before committing log entries.

## Redundancy Zones

Prior to Autopilot, it was difficult to deploy servers in a way that took advantage of
isolated failure domains such as AWS Availability Zones; users would be forced to either
have an overly-large quorum (2-3 nodes per AZ) or give up redundancy within an AZ by
deploying just one server in each.

If the `RedundancyZoneTag` setting is set, Nomad will use its value to look for a
zone in each server's specified [`-meta`](/docs/agent/configuration/client.html#meta)
tag. For example, if `RedundancyZoneTag` is set to `zone`, and `-meta zone=east1a`
is used when starting a server, that server's redundancy zone will be `east1a`.

Here's an example showing how to configure this:

```
$ nomad operator autopilot set-config -redundancy-zone-tag=zone
Configuration updated!
```

Nomad will then use these values to partition the servers by redundancy zone, and will
aim to keep one voting server per zone. Extra servers in each zone will stay as non-voters
on standby to be promoted if the active voter leaves or dies.
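
The partitioning described above can be pictured with a short sketch. The server records, the `Meta` field name, and the rule of picking the first server in each zone as the voter are all assumptions made for illustration; the source does not say how Autopilot chooses among servers in the same zone.

```python
def partition_by_zone(servers, tag):
    """Group server names by the value of their `tag` meta key."""
    zones = {}
    for server in servers:
        zone = server["Meta"].get(tag)    # `Meta` is a hypothetical field name
        if zone:
            zones.setdefault(zone, []).append(server["Name"])
    return zones


def pick_voters(zones):
    """One voter per zone; the remaining servers stand by as non-voters."""
    return {zone: names[0] for zone, names in zones.items()}


servers = [
    {"Name": "node1", "Meta": {"zone": "east1a"}},
    {"Name": "node2", "Meta": {"zone": "east1a"}},
    {"Name": "node3", "Meta": {"zone": "east1b"}},
]
zones = partition_by_zone(servers, "zone")
print(zones, pick_voters(zones))
```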

## Upgrade Migrations

Autopilot in Nomad Enterprise supports upgrade migrations by default. To disable this
functionality, set `DisableUpgradeMigration` to true.

When a new server is added and Autopilot detects that its Nomad version is newer than
that of the existing servers, Autopilot will avoid promoting the new server until enough
newer-versioned servers have been added to the cluster. When the count of new servers
equals or exceeds that of the old servers, Autopilot will begin promoting the new servers
to voters and demoting the old servers. After this is finished, the old servers can be
safely removed from the cluster.
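
The promotion condition is a simple count comparison. The sketch below assumes every server reports a dotted numeric version and treats the highest version seen as "new"; `migration_ready` is a hypothetical helper, not Autopilot's own logic.

```python
def version_key(version):
    """Turn "0.8.0" into (0, 8, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def migration_ready(versions):
    """True when newer-versioned servers equal or outnumber the old ones."""
    newest = max(versions, key=version_key)
    new_count = sum(1 for v in versions if v == newest)
    old_count = len(versions) - new_count
    return old_count > 0 and new_count >= old_count


print(migration_ready(["0.7.5", "0.7.5", "0.7.5", "0.8.0"]))
```

With three servers on 0.7.5 and one on 0.8.0 the condition is not yet met; it becomes true once three 0.8.0 servers have joined.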

To check the Nomad version of the servers, either the
[autopilot health](/api/operator.html#read-health) endpoint or the `nomad server-members`
command can be used:

```
$ nomad server-members
Node   Address         Status  Type    Build  Protocol  DC
node1  127.0.0.1:8301  alive   server  0.7.5  2         dc1
node2  127.0.0.1:8703  alive   server  0.7.5  2         dc1
node3  127.0.0.1:8803  alive   server  0.7.5  2         dc1
node4  127.0.0.1:8203  alive   server  0.8.0  2         dc1
```

### Migrations Without a Nomad Version Change

The `UpgradeVersionTag` can be used to override the version information used during
a migration, so that the migration logic can be used for updating the cluster when
changing configuration.

If the `UpgradeVersionTag` setting is set, Nomad will use its value to look for a
version in each server's specified [`-meta`](/docs/agent/configuration/client.html#meta)
tag. For example, if `UpgradeVersionTag` is set to `build`, and `-meta build:0.0.2`
is used when starting a server, that server's version will be `0.0.2` when considered in
a migration. The upgrade logic will follow semantic versioning and the version string
must be in the form of either `X`, `X.Y`, or `X.Y.Z`.
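
A tag value can be checked against the `X`, `X.Y`, or `X.Y.Z` forms before it is rolled out. The sketch below pads missing components with zeros so that values compare in numeric order; the padding and the helper name `parse_version` are assumptions for illustration.

```python
import re

_VERSION = re.compile(r"\d+(\.\d+){0,2}")


def parse_version(text):
    """Parse "X", "X.Y" or "X.Y.Z" into a 3-tuple of integers."""
    if _VERSION.fullmatch(text) is None:
        raise ValueError("version must look like X, X.Y or X.Y.Z: %r" % text)
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))   # pad missing components


print(parse_version("0.0.2"), parse_version("1"))
```

Comparing the resulting tuples orders `1.10` after `1.2`, which a plain string comparison would get wrong.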

@@ -313,6 +313,12 @@
<li<%= sidebar_current("docs-commands-operator") %>>
  <a href="/docs/commands/operator.html">operator</a>
  <ul class="nav">
    <li<%= sidebar_current("docs-commands-operator-autopilot-get-config") %>>
      <a href="/docs/commands/operator/autopilot-get-config.html">autopilot get-config</a>
    </li>
    <li<%= sidebar_current("docs-commands-operator-autopilot-set-config") %>>
      <a href="/docs/commands/operator/autopilot-set-config.html">autopilot set-config</a>
    </li>
    <li<%= sidebar_current("docs-commands-operator-raft-list-peers") %>>
      <a href="/docs/commands/operator/raft-list-peers.html">raft list-peers</a>
    </li>
@@ -404,6 +410,9 @@
<li <%= sidebar_current("docs-agent-configuration-acl") %>>
  <a href="/docs/agent/configuration/acl.html">acl</a>
</li>
<li <%= sidebar_current("docs-agent-configuration-autopilot") %>>
  <a href="/docs/agent/configuration/autopilot.html">autopilot</a>
</li>
<li <%= sidebar_current("docs-agent-configuration-client") %>>
  <a href="/docs/agent/configuration/client.html">client</a>
</li>

@@ -42,6 +42,9 @@
<li<%= sidebar_current("guides-cluster-automatic") %>>
  <a href="/guides/cluster/automatic.html">Automatic</a>
</li>
<li<%= sidebar_current("guides-cluster-autopilot") %>>
  <a href="/guides/cluster/autopilot.html">Autopilot</a>
</li>
<li<%= sidebar_current("guides-cluster-manual") %>>
  <a href="/guides/cluster/manual.html">Manual</a>
</li>
