---
layout: docs
page_title: 'Upgrade Nomad'
sidebar_current: 'guides-upgrade'
description: |-
  Upgrade HashiCorp Nomad without causing an outage. Review the upgrade
  process for a new version, for moving from Nomad community edition to Nomad
  Enterprise, and for Raft protocol version three.
---

# Upgrade Nomad

This page provides guidance on upgrading HashiCorp Nomad without causing an
outage. Review the upgrade process for a new version, for moving from Nomad
Community to Nomad Enterprise, and for Raft protocol version three.

## Introduction

Nomad is designed to be flexible and resilient when upgrading from one Nomad
version to the next. Upgrades should cause neither a Nomad nor a service
outage. However, there are some restrictions to be aware of before upgrading:

- Nomad strives to be backward compatible for at least two point releases, so
  Nomad v1.7.x works with v1.5.x.

- Nomad does _not_ support downgrading at this time. Downgrading clients
  requires draining allocations and removing the [data directory][data_dir].
  Downgrading servers safely requires re-provisioning the cluster.

- New features are unlikely to work correctly until all nodes have been
  upgraded.

- Check the [version upgrade details page][upgrade-specific] for important
  changes and backward incompatibilities.

- When upgrading a Nomad client, if it takes longer than the
  [`heartbeat_grace`][heartbeat_grace] period (10s by default) to restart,
  all allocations on that node may be rescheduled (see the configuration
  sketch after this list).
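
For reference, a minimal sketch of where `heartbeat_grace` lives in the
server configuration; the file path is an assumption and the value shown is
the documented default:

```shell-session
$ cat /etc/nomad.d/server.hcl
server {
  enabled = true

  # How long a client may go without heartbeating before the servers
  # consider it down and may reschedule its allocations (default "10s").
  heartbeat_grace = "10s"
}
```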

Nomad supports upgrading in place or by rolling in new servers:

- In Place: The Nomad binary can be updated on existing hosts. Running
  allocations will continue running uninterrupted.

- Rolling: New hosts containing the new Nomad version may be added, followed
  by the removal of old hosts. The old nodes must be drained to migrate
  running allocations to the new nodes.

This guide describes both approaches.

## Upgrade Process

Once you have checked the [upgrade details for the new
version][upgrade-specific], the upgrade process is as simple as updating the
binary on each host and restarting the Nomad service.

At a high level, we complete the following steps to upgrade Nomad:

- **Add the new version**
- **Check cluster health**
- **Remove the old version**
- **Check cluster health**
- **Upgrade clients**

### 1. Add the new version to the existing cluster

While it is possible to upgrade Nomad client nodes before servers, this guide
recommends upgrading servers first, as many new client features will not work
until servers are upgraded.

In a [federated cluster](/nomad/docs/deploy/clusters/federate-regions), new
features are not guaranteed to work until all agents in a region and the
server nodes in the authoritative region are upgraded.

Whether you are replacing Nomad in place on existing systems or bringing up
new servers, you should make changes incrementally, verifying cluster health
at each step of the upgrade.

On a single server, install the new version of Nomad. You can do this by
joining a new server to the cluster or by replacing or upgrading the binary
locally and restarting the Nomad service.
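
For example, an in-place binary swap on a systemd-managed host might look
like the following. This is a minimal sketch: the release URL, version,
paths, and unit name are assumptions, and your packaging or service manager
may differ.

```shell-session
# Stop the Nomad service before replacing the binary.
$ sudo systemctl stop nomad

# Install the new binary (version and paths are illustrative).
$ curl -sLo nomad.zip \
    https://releases.hashicorp.com/nomad/1.7.5/nomad_1.7.5_linux_amd64.zip
$ sudo unzip -o nomad.zip -d /usr/local/bin/
$ nomad version

# Restart the service so the server rejoins the cluster on the new version.
$ sudo systemctl start nomad
```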

Note that if you have [`leave_on_terminate`][] or [`leave_on_interrupt`][]
set, you should ensure you're using the expected signal for your upgrade
process. For example, if you have `leave_on_terminate` set and you intend to
update a server in place, send `SIGINT` and not `SIGTERM` when shutting down
the server before restarting it.
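
If you manage Nomad with systemd, one way to send a specific signal is
`systemctl kill`; a minimal sketch, assuming a unit named `nomad`:

```shell-session
# Send SIGINT rather than the default SIGTERM so a server with
# leave_on_terminate set does not gracefully leave the cluster during an
# in-place restart.
$ sudo systemctl kill -s SIGINT nomad
```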
### 2. Check cluster health

[Monitor the Nomad logs][monitor] on the remaining servers to check that the
new server has joined the cluster correctly.

Run `nomad agent-info` on the new servers and check that the
`last_log_index` is of a similar value to the other servers. This step
ensures that changes have been replicated to the new server.

```shell-session
ubuntu@nomad-server-10-1-1-4:~$ nomad agent-info
nomad
  bootstrap = false
  known_regions = 1
  leader = false
  server = true
raft
  applied_index = 53460
  commit_index = 53460
  fsm_pending = 0
  last_contact = 54.512216ms
  last_log_index = 53460
  last_log_term = 1
  last_snapshot_index = 49511
  last_snapshot_term = 1
  num_peers = 2
...
```

Continue with the upgrades across the servers, making sure to do a single
Nomad server at a time. You can check the state of the servers with [`nomad
server members`][server-members], and the state of the client nodes with
[`nomad node status`][node-status].
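
For example, `nomad server members` output during a rolling upgrade might
look like the following; the names, addresses, and build versions are
illustrative:

```shell-session
$ nomad server members
Name                   Address   Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad-server-1.global  10.1.1.4  4648  alive   true    2         1.7.5  dc1         global
nomad-server-2.global  10.1.1.5  4648  alive   false   2         1.7.5  dc1         global
nomad-server-3.global  10.1.1.6  4648  alive   false   2         1.6.3  dc1         global
```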
### 3. Remove the old versions from servers

If you are doing an in-place upgrade on existing servers, this step is not
necessary because the version was changed in place.

If you are doing an upgrade by adding new servers and removing old servers
from the fleet, you need to ensure that each old server has left the fleet
safely.

1. Stop the service on the existing host.
2. On another server, run `nomad server members` and check the status. If
   the server is now in a left state, you are safe to continue.
3. If the server is not in a left state, issue a `nomad server force-leave
   <server id>` to remove the server from the cluster, as shown below.
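
A minimal sketch of that check-and-remove flow; the server name is
hypothetical:

```shell-session
# Check whether the stopped server has left the cluster cleanly.
$ nomad server members

# If it is still listed as failed rather than left, force it out.
$ nomad server force-leave nomad-server-3.global
```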

Monitor the logs of the other hosts in the Nomad cluster over this period.

### 4. Check cluster health

Use the same actions in step #2 above to confirm cluster health.

### 5. Upgrade clients

Following the successful upgrade of the servers, you can now update your
clients using a process similar to the one used for the servers. You may
either upgrade clients in place or start new nodes on the new version. See
the [Workload Migration Guide](/nomad/docs/manage/migrate-workloads) for
instructions on how to migrate running allocations from the old nodes to the
new nodes with the [`nomad node drain`](/nomad/commands/node/drain) command.
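
For example, draining an old client before decommissioning it; the node ID
is hypothetical:

```shell-session
# Migrate running allocations off the old client before shutting it down.
# Find the node ID with `nomad node status`.
$ nomad node drain -enable -yes 46f1c6c4
```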
## Done

You are now running the latest Nomad version. You can verify that all
clients have joined by running `nomad node status` and checking that all the
clients are in a `ready` state.

## Upgrading to Nomad Enterprise

Before upgrading servers to Nomad Enterprise versions 1.6.0 and later, you
should validate your enterprise license with the [`nomad license inspect`
command](/nomad/commands/license/inspect) using the binary that you are
upgrading to. See the [licensing FAQ](/nomad/docs/enterprise/license/faq)
for more information.
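
For example, validating a license file against the new binary before rolling
it out; the binary and license paths are hypothetical:

```shell-session
# Run the new binary's license check before deploying it.
$ /tmp/nomad-1.7.5/nomad license inspect /etc/nomad.d/license.hclic
```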

After that, the process of upgrading to a Nomad Enterprise version is
identical to upgrading between versions of open source Nomad. Follow the
same guidance above and, as always, check the [specific version
details](/nomad/docs/upgrade/upgrade-specific) page before starting the
upgrade, as some version differences may require specific steps.

[data_dir]: /nomad/docs/configuration#data_dir
[heartbeat_grace]: /nomad/docs/configuration/server#heartbeat_grace
[monitor]: /nomad/commands/monitor
[node-status]: /nomad/commands/node/status
[server-members]: /nomad/commands/server/members
[upgrade-specific]: /nomad/docs/upgrade/upgrade-specific

## Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft Protocol 3. Raft protocol
version 3 requires all servers to run Nomad 0.8.0 or newer. Raft protocol
version 2 will be removed in Nomad 1.4.0.

To see the version of the Raft protocol in use on each server, use the
`nomad operator raft list-peers` command.
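
Illustrative output; the node names, IDs, and addresses are placeholders
(servers on Raft protocol 2 and below report their address as their ID):

```shell-session
$ nomad operator raft list-peers
Node                   ID                                    Address        State     Voter  RaftProtocol
nomad-server-1.global  c9cd4d18-8b33-22e4-b1c5-5e52d5b7e7c3  10.1.1.4:4647  leader    true   3
nomad-server-2.global  10.1.1.5:4647                         10.1.1.5:4647  follower  true   2
```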

Note that the format of `peers.json` used for outage recovery is different
when running with the latest Raft protocol. See [Manual Recovery Using
peers.json](/nomad/docs/manage/outage-recovery#manual-recovery-using-peers-json)
for a description of the required format.

When using Raft protocol version 3, servers are identified by their
`node-id` instead of their IP address when Nomad makes changes to its
internal Raft quorum configuration. This means that once a cluster has been
upgraded with servers all running Raft protocol version 3, it will no longer
allow servers running any older Raft protocol versions to be added.

### Upgrading a Production Cluster to Raft Version 3

For production Raft clusters with 3 or more members, the easiest way to
upgrade servers is to have each server leave the cluster, upgrade its
[`raft_protocol`] version in the `server` block (if upgrading to a version
lower than v1.3.0), and then add it back. Make sure the new server joins
successfully and that the cluster is stable before rolling the upgrade
forward to the next server. It's also possible to stand up a new set of
servers, and then slowly stand down each of the older servers in a similar
fashion.

For in-place Raft protocol upgrades, perform the following for each server,
leaving the leader until last to reduce the chance of leader elections that
will slow down the process:

- Stop the server.
- Run `nomad server force-leave $server_name`.
- If the upgrade is for a Nomad version lower than v1.3.0, update the
  [`raft_protocol`] in the server's configuration file to `3` (see the
  configuration sketch after this list).
- Restart the server.
- Run `nomad operator raft list-peers` to verify that the `RaftProtocol`
  for the server is now `3`.
- On the server, run `nomad agent-info` and check that the `last_log_index`
  is of a similar value to the other servers. This step ensures that Raft is
  healthy and changes are replicating to the new server.
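
A minimal sketch of that configuration change; the file path is an
assumption and other server settings are elided:

```shell-session
$ cat /etc/nomad.d/server.hcl
server {
  enabled = true

  # Only needed when upgrading to a Nomad version lower than v1.3.0;
  # later versions default to Raft protocol 3.
  raft_protocol = 3
}
```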
### Upgrading a Single Server Cluster to Raft Version 3

If you are running a single Nomad server, restarting it in place will result
in that server not being able to elect itself as a leader. To avoid this,
create a new [`peers.json`][peers-json] file before restarting the server
with the new configuration. If you have `jq` installed, you can run the
following script on the server's host to write the correct `peers.json`
file:

```bash
#!/usr/bin/env bash

# Look up the Nomad data directory and this server's address and node ID.
NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")

# Write a single-entry peers.json so the server can elect itself leader
# after restarting with the new configuration.
cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
[
  {
    "id": "$NODE_ID",
    "address": "$NOMAD_ADDR",
    "non_voter": false
  }
]
EOF
```

After running this script, if the upgrade is for a Nomad version lower than
v1.3.0, update the [`raft_protocol`] in the server's configuration to `3`
and restart the server.

[peers-json]: /nomad/docs/manage/outage-recovery#manual-recovery-using-peers-json
[`raft_protocol`]: /nomad/docs/configuration/server#raft_protocol
[`leave_on_interrupt`]: /nomad/docs/configuration#leave_on_interrupt
[`leave_on_terminate`]: /nomad/docs/configuration#leave_on_terminate