Files
nomad/website/content/docs/architecture/index.mdx
Aimee Ukasick 53b083b8c5 Docs: Nomad IA (#26063)
* Move commands from docs to its own root-level directory

* temporarily use modified dev-portal branch with nomad ia changes

* explicitly clone nomad ia exp branch

* retrigger build, fixed dev-portal broken build

* architecture, concepts and get started individual pages

* fix get started section destinations

* reference section

* update repo comment in website-build.sh to show branch

* docs nav file update capitalization

* update capitalization to force deploy

* remove nomad-vs-kubernetes dir; move content to what is nomad pg

* job section

* Nomad operations category, deploy section

* operations category, govern section

* operations - manage

* operations/scale; concepts scheduling fix

* networking

* monitor

* secure section

* remote auth-methods folder and move up pages to sso; linkcheck

* Fix install2deploy redirects

* fix architecture redirects

* Job section: Add missing section index pages

* Add section index pages so breadcrumbs build correctly

* concepts/index fix front matter indentation

* move task driver plugin config to new deploy section

* Finish adding full URL to tutorials links in nav

* change SSO to Authentication in nav and file system

* Docs NomadIA: Move tutorials into NomadIA branch (#26132)

* Move governance and policy from tutorials to docs

* Move tutorials content to job-declare section

* run jobs section

* stateful workloads

* advanced job scheduling

* deploy section

* manage section

* monitor section

* secure/acl and secure/authorization

* fix example that contains an unseal key in real format

* remove images from sso-vault

* secure/traffic

* secure/workload-identities

* vault-acl change unseal key and root token in command output sample

* remove lines from sample output

* fix front matter

* move nomad pack tutorials to tools

* search/replace /nomad/tutorials links

* update acl overview with content from deleted architecture/acl

* fix spelling mistake

* linkcheck - fix broken links

* fix link to Nomad variables tutorial

* fix link to Prometheus tutorial

* move who uses Nomad to use cases page; move spec/config shortcuts

add dividers

* Move Consul out of Integrations; move namespaces to govern

* move integrations/vault to secure/vault; delete integrations

* move ref arch to docs; rename Deploy Nomad back to Install Nomad

* address feedback

* linkcheck fixes

* Fixed raw_exec redirect

* add info from /nomad/tutorials/manage-jobs/jobs

* update page content with newer tutorial

* link updates for architecture sub-folders

* Add redirects for removed section index pages. Fix links.

* fix broken links from linkcheck

* Revert to use dev-portal main branch instead of nomadIA branch

* build workaround: add intro-nav-data.json with single entry

* fix content-check error

* add intro directory to get around Vercel build error

* workound for emtpry directory

* remove mdx from /intro/ to fix content-check and git snafu

* Add intro index.mdx so Vercel build should work

---------

Co-authored-by: Tu Nguyen <im2nguyen@gmail.com>
2025-07-08 19:24:52 -05:00

91 lines
4.9 KiB
Plaintext

---
layout: docs
page_title: Architecture
description: |-
Nomad's system architecture supports a "run any workload anywhere" approach to scheduling and orchestration. Learn how regions, servers, and clients interact, and how Nomad clusters replicate data and run workloads.
---
# Architecture
This page provides conceptual information on Nomad architecture. Learn how regions, servers, and clients interact, and how Nomad clusters replicate data and run workloads.
Nomad is a complex system that has many different pieces. To help both users and
developers of Nomad build a mental model of how it works, this page documents
the system architecture.
Refer to the [glossary][] for more details on some of the terms discussed here.
~> **Advanced Topic!** This page covers technical details
of Nomad. You do not need to understand these details to
effectively use Nomad. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.
## High-Level Overview
Looking at only a single region, at a high level Nomad looks like this:
[![Regional Architecture](/img/nomad-architecture-region.png)](/img/nomad-architecture-region.png)
Within each region, we have both clients and servers. Servers are responsible for
accepting jobs from users, managing clients, and [computing task placements](/nomad/docs/concepts/scheduling/how-scheduling-works).
Each region may have clients from multiple datacenters, allowing a small number of servers
to handle very large clusters.
In some cases, for either availability or scalability, you may need to run multiple
regions. Nomad supports federating multiple regions together into a single cluster.
At a high level, this setup looks like this:
[![Global Architecture](/img/nomad-architecture-global.png)](/img/nomad-architecture-global.png)
Regions are fully independent from each other, and do not share jobs, clients, or
state. They are loosely-coupled using a gossip protocol, which allows users to
submit jobs to any region or query the state of any region transparently. Requests
are forwarded to the appropriate server to be processed and the results returned.
Data is _not_ replicated between regions.
The servers in each region are all part of a single consensus group. This means
that they work together to elect a single leader which has extra duties. The leader
is responsible for processing all queries and transactions. Nomad is optimistically
concurrent, meaning all servers participate in making scheduling decisions in parallel.
The leader provides the additional coordination necessary to do this safely and
to ensure clients are not oversubscribed.
Each region is expected to have either three or five servers. This strikes a balance
between availability in the case of failure and performance, as consensus gets
progressively slower as more servers are added. However, there is no limit to the number
of clients per region.
Clients are configured to communicate with their regional servers and communicate
using remote procedure calls (RPC) to register themselves, send heartbeats for liveness,
wait for new allocations, and update the status of allocations. A client registers
with the servers to provide the resources available, attributes, and installed drivers.
Servers use this information for scheduling decisions and create allocations to assign
work to clients.
Users make use of the Nomad CLI or API to submit jobs to the servers. A job represents
a desired state and provides the set of tasks that should be run. The servers are
responsible for scheduling the tasks, which is done by finding an optimal placement for
each task such that resource utilization is maximized while satisfying all constraints
specified by the job. Resource utilization is maximized by bin packing, in which
the scheduling tries to make use of all the resources of a machine without
exhausting any dimension. Job constraints can be used to ensure an application is
running in an appropriate environment. Constraints can be technical requirements based
on hardware features such as architecture and availability of GPUs, or software features
like operating system and kernel version, or they can be business constraints like
ensuring PCI compliant workloads run on appropriate servers.
## Getting in Depth
This has been a brief high-level overview of the architecture of Nomad. There
are more details available for each of the sub-systems. The [consensus protocol](/nomad/docs/architecture/cluster/consensus),
[gossip protocol](/nomad/docs/architecture/security/gossip), and [scheduler design](/nomad/docs/concepts/scheduling/how-scheduling-works)
are all documented in more detail.
For other details, either consult the code, [open an issue on
GitHub][gh_issue], or ask a question in the [community forum][forum].
[gh_issue]: https://github.com/hashicorp/nomad/issues/new/choose
[forum]: https://discuss.hashicorp.com/c/nomad
[glossary]: /nomad/docs/glossary