Add section on peers.json new format and upgrading to raft protocol 3

Preetha Appan
2018-05-30 13:47:56 -05:00
parent eea101c7cf
commit 024354f99a
2 changed files with 48 additions and 1 deletions

@@ -29,7 +29,7 @@ to match the default.
The Raft protocol must be stepped up in this way; only adjacent version numbers are
compatible (for example, version 1 cannot talk to version 3). Here is a table of the
Raft Protocol versions supported by each Consul version:
Raft Protocol versions supported by each Nomad version:
<table class="table table-bordered table-striped">
<tr>
@@ -53,6 +53,17 @@ Raft Protocol versions supported by each Consul version:
In order to enable all [Autopilot](/guides/cluster/autopilot.html) features, all servers
in a Nomad cluster must be running with Raft protocol version 3 or later.
#### Upgrading to Raft Protocol 3
This section provides details on upgrading to Raft protocol version 3 in Nomad 0.8 and higher. Raft protocol version 3 works only when every server in the cluster runs Nomad 0.8.0 or newer. See [Raft Protocol Version Compatibility](/docs/upgrade/upgrade-specific.html#raft-protocol-version-compatibility) for more details. In addition, the format of `peers.json` used for outage recovery is different when running with Raft protocol version 3 or later. See [Manual Recovery Using peers.json](/guides/outage.html#manual-recovery-using-peers-json) for a description of the required format.
Note that the Raft protocol version is different from Nomad's internal protocol version, which is the one shown by commands like `nomad server members`. To see the Raft protocol version in use on each server, use the `nomad operator raft list-peers` command.
The easiest way to upgrade servers is to have each server leave the cluster, raise the `raft_protocol` version in its `server` stanza, and then add it back. Make sure the upgraded server rejoins successfully and that the cluster is stable before rolling the upgrade forward to the next server. It is also possible to stand up a new set of servers, and then slowly stand down each of the older servers in a similar fashion.
When using Raft protocol version 3, servers are identified by their node ID instead of their IP address when Nomad makes changes to its internal Raft quorum configuration. This means that once every server in a cluster is running Raft protocol version 3, the cluster will no longer allow servers running any older Raft protocol version to be added. If you run a single Nomad server, restarting it in place will leave that server unable to elect itself leader. To avoid this, either set the Raft protocol back to 2, or use [Manual Recovery Using peers.json](/guides/outage.html#manual-recovery-using-peers-json) to map the server to its node ID in the Raft quorum configuration.
### Node Draining Improvements
Node draining via the [`node drain`][drain-cli] command or the [drain

@@ -186,3 +186,39 @@ nomad-server01.global 10.10.11.5:4647 10.10.11.5:4647 follower true
nomad-server02.global 10.10.11.6:4647 10.10.11.6:4647 leader true
nomad-server03.global 10.10.11.7:4647 10.10.11.7:4647 follower true
```
## Peers.json Format Changes in Raft Protocol 3
For Raft protocol version 3 and later, peers.json should be formatted as a JSON
array containing the node ID, address:port, and suffrage information of each
Nomad server in the cluster, like this:
```json
[
  {
    "id": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
    "address": "10.1.0.1:4647",
    "non_voter": false
  },
  {
    "id": "8b6dda82-3103-11e7-93ae-92361f002671",
    "address": "10.1.0.2:4647",
    "non_voter": false
  },
  {
    "id": "97e17742-3103-11e7-93ae-92361f002671",
    "address": "10.1.0.3:4647",
    "non_voter": false
  }
]
```
- `id` `(string: <required>)` - Specifies the node ID of the server. This can
  be found in the logs when the server starts up, and it can also be found
  inside the `node-id` file in the server's data directory.
- `address` `(string: <required>)` - Specifies the IP and port of the server.
  The port is the server's RPC port used for cluster communications (4647 by
  default).
- `non_voter` `(bool: false)` - Controls whether the server is a non-voter,
  which is used in some advanced [Autopilot](/guides/cluster/autopilot.html)
  configurations. If omitted, it defaults to false, which is typical for most
  clusters.
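
Because a malformed `peers.json` is easy to produce by hand during an outage, it can help to generate the file from a short script. The following Python sketch is illustrative only and is not part of Nomad: the `build_peers_json` helper and its tuple input are hypothetical, and the check that node IDs are lowercase UUID-style strings is an assumption based on the example above rather than a documented guarantee.

```python
import json
import re

# Assumption: node IDs use the 8-4-4-4-12 lowercase hex layout seen in the
# example above. Relax this pattern if your node-id files differ.
_NODE_ID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def build_peers_json(servers):
    """Return peers.json text for Raft protocol version 3.

    `servers` is an iterable of (node_id, address, non_voter) tuples.
    This input shape is hypothetical and chosen only for illustration.
    """
    entries = []
    seen_ids = set()
    for node_id, address, non_voter in servers:
        if not _NODE_ID_RE.match(node_id):
            raise ValueError(f"unexpected node ID format: {node_id!r}")
        if node_id in seen_ids:
            raise ValueError(f"duplicate node ID: {node_id!r}")
        # The address must be "host:port"; the port is the server RPC port.
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"address must be host:port, got {address!r}")
        seen_ids.add(node_id)
        entries.append(
            {"id": node_id, "address": address, "non_voter": bool(non_voter)}
        )
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    # Example values only; substitute each server's own node ID and address.
    print(
        build_peers_json(
            [
                ("adf4238a-882b-9ddc-4a9d-5b6758e4159e", "10.1.0.1:4647", False),
                ("8b6dda82-3103-11e7-93ae-92361f002671", "10.1.0.2:4647", False),
            ]
        )
    )
```

The helper rejects duplicate or oddly formatted IDs and addresses without a port before anything is written, so mistakes surface before the file is used for recovery.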