nomad/website/content/docs/deploy/clusters/federation-considerations.mdx
Aimee Ukasick 53b083b8c5 Docs: Nomad IA (#26063)
Co-authored-by: Tu Nguyen <im2nguyen@gmail.com>
2025-07-08 19:24:52 -05:00

---
layout: docs
page_title: Multi-region federation operational considerations
description: |-
  Review operational considerations for running Nomad multi-region federated clusters as well as instructions for migrating the authoritative region to a federated region.
---

# Multi-region federation operational considerations

This page lists operational considerations for running multi-region federated
clusters as well as instructions for migrating the authoritative region to a
federated region.

## Operational considerations

When operating multi-region federated Nomad clusters, consider the following:

* **Regular snapshots**: You can back up Nomad server state using the
  [`nomad operator snapshot save`][] and [`nomad operator snapshot agent`][]
  commands. Regular backups expedite disaster recovery. The right cadence
  depends on your clusters' rate of change and your internal SLAs. Test your
  snapshots regularly with the [`nomad operator snapshot restore`][] command
  to confirm that they restore correctly.
* **Local ACL management tokens**: You need local management tokens to
  administer federated clusters when the authoritative region is down. Make
  sure break-glass tokens already exist in each region before an outage.
* **Known paths to create local ACL tokens**: If the authoritative region
  fails, requests to create global ACL tokens fail. Being able to create local
  ACL tokens lets you continue to interact with each available federated
  region.
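
The following shell sketch shows one way to prepare for these scenarios. It saves a snapshot for each region and reads it back with `nomad operator snapshot inspect`. It also creates a region-local break-glass management token. The sketch makes these assumptions:

* Bash or another POSIX-like shell is available.
* The `NOMAD_ADDR` and `NOMAD_TOKEN` environment variables point at a server that can forward `-region` requests to every region. If it cannot, point `NOMAD_ADDR` at a server in each region in turn.
* The token is a management token that every target region accepts.

The function names, the `break-glass-<region>` token name, and the snapshot file naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for federation break-glass preparation.
# Assumes NOMAD_ADDR/NOMAD_TOKEN are set and the token is a management
# token that each target region accepts.

# snapshot_regions DIR REGION...
# Saves one snapshot per region into DIR, then reads each file back with
# `nomad operator snapshot inspect` as a quick sanity check. Prints the path
# of every snapshot written. Restoring into a test cluster with
# `nomad operator snapshot restore` remains the real test.
snapshot_regions() {
  local dir="$1"
  shift
  local stamp region file
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dir" || return 1
  for region in "$@"; do
    file="${dir}/${region}-${stamp}.snap"
    nomad operator snapshot save -region="$region" "$file" >/dev/null || return 1
    nomad operator snapshot inspect "$file" >/dev/null || return 1
    echo "$file"
  done
}

# create_break_glass_token REGION
# Creates a management token that is local to REGION. Omitting -global
# keeps the token in REGION only, so it does not depend on the
# authoritative region being reachable.
create_break_glass_token() {
  local region="$1"
  nomad acl token create -region="$region" -type=management \
    -name="break-glass-${region}"
}
```

For example, after you source these functions, `snapshot_regions ./snapshots east west` writes `east-<timestamp>.snap` and `west-<timestamp>.snap`. The region names here are placeholders. Store the Secret ID that `create_break_glass_token` prints somewhere you can reach during an outage.
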

## Authoritative and federated regions

* **Can non-authoritative regions continue to operate if the authoritative
  region is unreachable?** Yes. Federation failures never interrupt running
  workloads. They also never interrupt scheduling of new workloads or
  rescheduling of failed workloads. Refer to [Failure Scenarios][failure_scenarios]
  for details.
* **Can the authoritative region be deployed with servers only?** Yes.
  Deploying the Nomad authoritative region with servers only, without
  clients, works as expected. This servers-only approach can expedite
  disaster recovery of the region. Restoration then does not include objects
  such as nodes, jobs, or allocations, which are large and require
  compute-intensive reconciliation after restoration.
* **Can I migrate the authoritative region to a currently federated region?**
  Yes, by following these steps:
  1. Update the [`authoritative_region`][] configuration parameter on the
     servers in the region that will become authoritative.
  1. Restart the server processes in the new authoritative region and verify
     that all data is present in state as expected. If the network was
     partitioned as part of the failure of the original authoritative region,
     writes to replicated objects may not have replicated successfully to the
     federated regions.
  1. Update the [`authoritative_region`][] configuration parameter on the
     servers in the remaining federated regions and restart their processes.
* **Can federated regions be bootstrapped while the authoritative region is
  down?** No, they cannot.

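
For step 2 of the migration, a spot check can help confirm that replicated objects survived the failure. The following shell sketch compares the ACL policy and namespace names that two regions report. The sketch makes these assumptions:

* The `nomad` CLI can reach both regions through `-region`.
* `nomad acl policy list` and `nomad namespace list` accept the `-t` Go template flag.
* Your token can read both object types.

The function names are hypothetical. The check covers only two kinds of replicated objects, so a clean result is a necessary condition rather than proof that state is complete.

```shell
#!/usr/bin/env bash
# Hypothetical spot check for step 2 of the authoritative region migration.

# replicated_names REGION
# Prints the sorted ACL policy and namespace names that REGION reports.
replicated_names() {
  local region="$1"
  {
    nomad acl policy list -region="$region" \
      -t '{{range .}}policy {{.Name}}{{"\n"}}{{end}}'
    nomad namespace list -region="$region" \
      -t '{{range .}}namespace {{.Name}}{{"\n"}}{{end}}'
  } | sort
}

# compare_regions NEW_AUTHORITATIVE OTHER_REGION
# Prints a diff and returns non-zero when the two regions disagree.
compare_regions() {
  local a b rc
  a="$(mktemp)" && b="$(mktemp)" || return 2
  replicated_names "$1" >"$a"
  replicated_names "$2" >"$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

For example, `compare_regions west east` prints nothing and returns zero when both regions list the same policies and namespaces. A line such as `< policy ops` marks an object that only the first region has. The region and policy names here are placeholders.
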
[`nomad operator snapshot save`]: /nomad/commands/operator/snapshot/save
[`nomad operator snapshot agent`]: /nomad/commands/operator/snapshot/agent
[`nomad operator snapshot restore`]: /nomad/commands/operator/snapshot/restore
[failure_scenarios]: /nomad/docs/deploy/clusters/federation-failure-scenarios
[`authoritative_region`]: /nomad/docs/configuration/server#authoritative_region