---
layout: docs
page_title: Multi-region federation operational considerations
description: |-
  Review operational considerations for running Nomad multi-region federated clusters as well as instructions for migrating the authoritative region to a federated region.
---

# Multi-region federation operational considerations

This page lists operational considerations for running multi-region federated
clusters as well as instructions for migrating the authoritative region to a
federated region.

## Operational considerations

When operating multi-region federated Nomad clusters, consider the following:

* **Regular snapshots**: You can back up Nomad server state using the
  [`nomad operator snapshot save`][] and [`nomad operator snapshot agent`][] commands. Regular
  backups expedite disaster recovery. How often you take them depends on how quickly your
  cluster state changes and on your internal SLAs. Test your snapshots regularly with the
  [`nomad operator snapshot restore`][] command to confirm that they work.

* **Local ACL management tokens**: You need local management tokens to administer federated
  clusters when the authoritative region is down. Make sure break-glass tokens already exist
  for each region.

* **Known paths for creating local ACL tokens**: If the authoritative region fails, you cannot
  create global ACL tokens. If this happens, being able to create local ACL tokens lets you
  continue to interact with each available federated region.

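You might schedule snapshots yourself, for example from cron, rather than with `nomad operator snapshot agent`. In that case, a minimal Python sketch like the following can take a timestamped snapshot per region and pick old snapshots to prune. Several details are assumptions chosen for illustration, not Nomad defaults: the one-directory-per-region layout, the retention count, and reaching each region through the CLI's `-address` flag.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List

# Hypothetical values: choose a location and retention that match your SLAs.
SNAPSHOT_ROOT = Path("/var/backups/nomad")
KEEP_LAST = 24


def snapshot_path(region: str, now: datetime, root: Path = SNAPSHOT_ROOT) -> Path:
    """One directory per region; fixed-width UTC timestamps sort chronologically."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return root / region / f"{stamp}.snap"


def save_snapshot(address: str, path: Path) -> None:
    """Run `nomad operator snapshot save` against a server address in one region."""
    path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["nomad", "operator", "snapshot", "save", f"-address={address}", str(path)],
        check=True,
    )


def snapshots_to_prune(paths: List[Path], keep: int = KEEP_LAST) -> List[Path]:
    """Return every snapshot of one region except the newest `keep`."""
    ordered = sorted(paths, key=lambda p: p.name)
    return ordered[: max(len(ordered) - keep, 0)]
```

Call `save_snapshot` once per region with an address that reaches that region's servers, then delete the paths that `snapshots_to_prune` returns for that region's directory. Sorting by file name works only because the timestamp format is fixed-width UTC.
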
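One way to keep a known path to local tokens is to script it before you need it. The following Python sketch builds a `nomad acl token create` invocation that leaves out `-global`, so the token stays local to the region that serves the request. It runs that invocation with an existing break-glass token supplied through `NOMAD_TOKEN`. The helper names and the per-region address are hypothetical. Check the flags against the `nomad acl token create` reference for your Nomad version.

```python
import os
import subprocess
from typing import List


def local_token_command(name: str, address: str, token_type: str = "management") -> List[str]:
    """Build a `nomad acl token create` call for a region-local token.

    The command leaves out `-global`, which defaults to false, so the new token is
    local to the region that serves the request and is not replicated.
    """
    if token_type not in ("client", "management"):
        raise ValueError(f"unsupported token type: {token_type!r}")
    return [
        "nomad", "acl", "token", "create",
        f"-address={address}",
        f"-name={name}",
        f"-type={token_type}",
    ]


def create_local_token(name: str, address: str, break_glass_secret: str) -> str:
    """Create the token, authenticating with an existing break-glass token."""
    env = dict(os.environ, NOMAD_TOKEN=break_glass_secret)
    result = subprocess.run(
        local_token_command(name, address),
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    # The CLI prints the new token's accessor and secret IDs; store them securely.
    return result.stdout
```

Passing the secret through the environment rather than a command-line flag keeps it out of the process list on the machine that runs the script.
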
## Authoritative and federated regions

* **Can non-authoritative regions continue to operate if the authoritative region is unreachable?**
  Yes. Federation failures never interrupt running workloads, the scheduling of new workloads,
  or the rescheduling of failed workloads. See [Failure Scenarios][failure_scenarios] for
  details.

* **Can the authoritative region be deployed with servers only?** Yes. Deploying the Nomad
  authoritative region with servers only, without clients, works as expected. This servers-only
  approach can expedite disaster recovery of the region, because restoration does not include
  objects such as nodes, jobs, or allocations, which are large and require compute-intensive
  reconciliation after restoration.

* **Can I migrate the authoritative region to a currently federated region?** Yes, by
  following these steps:

    1. Update the [`authoritative_region`][] configuration parameter on the servers in the
       region that will become authoritative.
    1. Restart the server processes in the new authoritative region and ensure that all data is
       present in state as expected. If the network was partitioned as part of the failure of the
       original authoritative region, writes to replicated objects might not have reached the
       federated regions.
    1. Update the [`authoritative_region`][] configuration parameter on the federated region
       servers and restart their processes.

* **Can federated regions be bootstrapped while the authoritative region is down?** No, they
  cannot.

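During a federation failure, it helps to know which regions still report a leader. This Python sketch lists regions with the `/v1/regions` HTTP API endpoint and asks each one for its leader through `/v1/status/leader?region=...`. A region that cannot answer is reported as `None`. The `NOMAD_ADDR` default, the optional `NOMAD_TOKEN` header, and the five-second timeout are assumptions. Requests are forwarded through whichever agent `NOMAD_ADDR` points at.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

# Assumption: the agent's HTTP API is reachable at NOMAD_ADDR.
NOMAD_ADDR = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")


def http_get_json(path: str) -> object:
    """GET a Nomad HTTP API path and decode the JSON response body."""
    request = urllib.request.Request(NOMAD_ADDR + path)
    token = os.environ.get("NOMAD_TOKEN")
    if token:
        request.add_header("X-Nomad-Token", token)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def region_leaders(fetch: Callable[[str], object] = http_get_json) -> Dict[str, Optional[str]]:
    """Map each region the agent knows about to its leader address, or None if unreachable."""
    leaders: Dict[str, Optional[str]] = {}
    for region in fetch("/v1/regions"):
        query = urllib.parse.quote(region)
        try:
            leaders[region] = fetch(f"/v1/status/leader?region={query}")
        except OSError:
            # urllib's URLError and HTTPError both subclass OSError.
            leaders[region] = None
    return leaders
```

Run `region_leaders()` with `NOMAD_ADDR` pointing at a server in a region you expect to be healthy. The `fetch` parameter exists so you can swap in another HTTP client or a stub.
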
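The migration steps above have an ordering constraint. First update and restart the servers in the new authoritative region and check their state. Only then update the remaining federated servers. The following Python sketch expresses that ordering as two waves. The `Server` record is hypothetical, and so is the source of its `authoritative_region` value (for example, a configuration management inventory). The sketch only plans the order. It does not edit configuration or restart anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    """Hypothetical inventory record for one Nomad server."""
    name: str
    region: str
    authoritative_region: str  # value currently set in this server's configuration


def migration_plan(servers: List[Server], new_authoritative: str) -> List[List[Server]]:
    """Order the servers that still need the new authoritative_region value.

    Wave 1: servers in the new authoritative region. Update, restart, and verify
    their state before starting wave 2.
    Wave 2: servers in every other region, including the old authoritative one.
    Empty waves are omitted.
    """
    pending = [s for s in servers if s.authoritative_region != new_authoritative]
    first = [s for s in pending if s.region == new_authoritative]
    rest = [s for s in pending if s.region != new_authoritative]
    return [wave for wave in (first, rest) if wave]
```

Servers that already carry the new value drop out of the plan. Rerunning the helper after each wave therefore shows only the work that remains.
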
[`nomad operator snapshot save`]: /nomad/commands/operator/snapshot/save
[`nomad operator snapshot agent`]: /nomad/commands/operator/snapshot/agent
[`nomad operator snapshot restore`]: /nomad/commands/operator/snapshot/restore
[failure_scenarios]: /nomad/docs/deploy/clusters/federation-failure-scenarios
[`authoritative_region`]: /nomad/docs/configuration/server#authoritative_region