mirror of
https://github.com/kemko/nomad.git
synced 2026-01-06 10:25:42 +03:00
Update website to remove a lot of copy-paste with Vault + improve images
@@ -12,7 +12,7 @@ Nomad is a complex system that has many different pieces. To help both users and
 build a mental model of how it works, this page documents the system architecture.

 ~> **Advanced Topic!** This page covers technical details
-of Nomad. You don't need to understand these details to
+of Nomad. You do not need to understand these details to
 effectively use Nomad. The details are documented here for
 those who wish to learn about them without having to go
 spelunking through the source code.
@@ -74,7 +74,7 @@ clarify what is being discussed:

 Looking at only a single region, at a high level Nomad looks like:

-
+[](/assets/images/nomad-architecture-region.png)

 Within each region, we have both clients and servers. Servers are responsible for
 accepting jobs from users, managing clients, and [computing task placements](/docs/internals/scheduling.html).
@@ -85,7 +85,7 @@ In some cases, for either availability or scalability, you may need to run multi
 regions. Nomad supports federating multiple regions together into a single cluster.
 At a high level, this looks like:

-
+[](/assets/images/nomad-architecture-global.png)

 Regions are fully independent from each other, and do not share jobs, clients or
 state. They are loosely-coupled using a gossip protocol, which allows users to
@@ -3,54 +3,57 @@ layout: "docs"
 page_title: "Consensus Protocol"
 sidebar_current: "docs-internals-consensus"
 description: |-
-Nomad uses a consensus protocol to provide Consistency as defined by CAP. The consensus protocol is based on Raft: In search of an Understandable Consensus Algorithm. For a visual explanation of Raft, see The Secret Lives of Data.
+Nomad uses a consensus protocol to provide Consistency as defined by CAP.
+The consensus protocol is based on Raft: In search of an Understandable
+Consensus Algorithm. For a visual explanation of Raft, see The Secret Lives of
+Data.
 ---

 # Consensus Protocol

-Nomad uses a [consensus protocol](http://en.wikipedia.org/wiki/Consensus_(computer_science))
-to provide [Consistency (as defined by CAP)](http://en.wikipedia.org/wiki/CAP_theorem).
+Nomad uses a [consensus protocol](https://en.wikipedia.org/wiki/Consensus_(computer_science))
+to provide [Consistency (as defined by CAP)](https://en.wikipedia.org/wiki/CAP_theorem).
 The consensus protocol is based on
 ["Raft: In search of an Understandable Consensus Algorithm"](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf).
 For a visual explanation of Raft, see [The Secret Lives of Data](http://thesecretlivesofdata.com/raft).

 ~> **Advanced Topic!** This page covers technical details of
-the internals of Nomad. You don't need to know these details to effectively
+the internals of Nomad. You do not need to know these details to effectively
 operate and use Nomad. These details are documented here for those who wish
 to learn about them without having to go spelunking through the source code.

 ## Raft Protocol Overview

 Raft is a consensus algorithm that is based on
-[Paxos](http://en.wikipedia.org/wiki/Paxos_%28computer_science%29). Compared
+[Paxos](https://en.wikipedia.org/wiki/Paxos_%28computer_science%29). Compared
 to Paxos, Raft is designed to have fewer states and a simpler, more
 understandable algorithm.

 There are a few key terms to know when discussing Raft:

-* Log - The primary unit of work in a Raft system is a log entry. The problem
+* **Log** - The primary unit of work in a Raft system is a log entry. The problem
 of consistency can be decomposed into a *replicated log*. A log is an ordered
 sequence of entries. We consider the log consistent if all members agree on
 the entries and their order.

-* FSM - [Finite State Machine](http://en.wikipedia.org/wiki/Finite-state_machine).
+* **FSM** - [Finite State Machine](https://en.wikipedia.org/wiki/Finite-state_machine).
 An FSM is a collection of finite states with transitions between them. As new logs
 are applied, the FSM is allowed to transition between states. Application of the
 same sequence of logs must result in the same state, meaning behavior must be deterministic.

-* Peer set - The peer set is the set of all members participating in log replication.
+* **Peer set** - The peer set is the set of all members participating in log replication.
 For Nomad's purposes, all server nodes are in the peer set of the local region.

-* Quorum - A quorum is a majority of members from a peer set: for a set of size `n`,
+* **Quorum** - A quorum is a majority of members from a peer set: for a set of size `n`,
 quorum requires at least `(n/2)+1` members.
 For example, if there are 5 members in the peer set, we would need 3 nodes
 to form a quorum. If a quorum of nodes is unavailable for any reason, the
 cluster becomes *unavailable* and no new logs can be committed.

-* Committed Entry - An entry is considered *committed* when it is durably stored
+* **Committed Entry** - An entry is considered *committed* when it is durably stored
 on a quorum of nodes. Once an entry is committed it can be applied.

-* Leader - At any given time, the peer set elects a single node to be the leader.
+* **Leader** - At any given time, the peer set elects a single node to be the leader.
 The leader is responsible for ingesting new log entries, replicating to followers,
 and managing when an entry is considered committed.
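
As a rough illustration of the quorum arithmetic described in the list above, the following Python sketch computes the quorum size for a peer set and how many members can fail before the cluster becomes unavailable. The function names `quorum_size` and `failure_tolerance` are illustrative assumptions and are not part of Nomad or Raft; the `(n/2)+1` formula is read here as integer division.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of a peer set with n members: floor(n/2) + 1."""
    if n < 1:
        raise ValueError("a peer set needs at least one member")
    return n // 2 + 1


def failure_tolerance(n: int) -> int:
    """Number of members that can be lost while a quorum is still reachable."""
    return n - quorum_size(n)


if __name__ == "__main__":
    # Print quorum size and tolerated failures for a few peer set sizes.
    for n in (1, 3, 4, 5, 7):
        print(f"{n} members: quorum {quorum_size(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

Under this reading, a peer set of 4 tolerates no more failures than a peer set of 3, which is one reason odd-sized peer sets are a common choice.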
@@ -3,19 +3,20 @@ layout: "docs"
 page_title: "Gossip Protocol"
 sidebar_current: "docs-internals-gossip"
 description: |-
-Nomad uses a gossip protocol to manage membership. All of this is provided through the use of the Serf library.
+Nomad uses a gossip protocol to manage membership. All of this is provided
+through the use of the Serf library.
 ---

 # Gossip Protocol

-Nomad uses a [gossip protocol](http://en.wikipedia.org/wiki/Gossip_protocol)
+Nomad uses a [gossip protocol](https://en.wikipedia.org/wiki/Gossip_protocol)
 to manage membership. This is provided through the use of the [Serf library](https://www.serfdom.io/).
 The gossip protocol used by Serf is based on
-["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
+["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
 with a few minor adaptations. There are more details about [Serf's protocol here](https://www.serfdom.io/docs/internals/gossip.html).

 ~> **Advanced Topic!** This page covers technical details of
-the internals of Nomad. You don't need to know these details to effectively
+the internals of Nomad. You do not need to know these details to effectively
 operate and use Nomad. These details are documented here for those who wish
 to learn about them without having to go spelunking through the source code.

@@ -13,5 +13,5 @@ details of how Nomad functions, its architecture and sub-systems.

 -> **Note:** Knowledge of Nomad internals is not
 required to use Nomad. If you aren't interested in the internals
-of Nomad, you may safely skip this section. If you're operating Nomad,
+of Nomad, you may safely skip this section. If you are operating Nomad,
 we recommend understanding the internals.
@@ -13,18 +13,18 @@ from jobs to client machines. This process must respect the constraints as decla
 in the job, and optimize for resource utilization. This page documents the details
 of how scheduling works in Nomad to help both users and developers
 build a mental model. The design is heavily inspired by Google's
-work on both [Omega: flexible, scalable schedulers for large compute clusters](http://research.google.com/pubs/pub41684.html)
-and [Large-scale cluster management at Google with Borg](http://research.google.com/pubs/pub43438.html).
+work on both [Omega: flexible, scalable schedulers for large compute clusters](https://research.google.com/pubs/pub41684.html)
+and [Large-scale cluster management at Google with Borg](https://research.google.com/pubs/pub43438.html).

 ~> **Advanced Topic!** This page covers technical details
-of Nomad. You don't need to understand these details to
+of Nomad. You do not need to understand these details to
 effectively use Nomad. The details are documented here for
 those who wish to learn about them without having to go
 spelunking through the source code.

 # Scheduling in Nomad

-
+[](/assets/images/nomad-data-model.png)

 There are four primary "nouns" in Nomad, these are jobs, nodes, allocations, and evaluations.
 Jobs are submitted by users and represent a _desired state_. A job is a declarative description
@@ -43,7 +43,7 @@ it with the desired state.

 This diagram shows the flow of an evaluation through Nomad:

-
+[](/assets/images/nomad-evaluation-flow.png)

 The lifecycle of an evaluation beings with an event causing the evaluation to be
 created. Evaluations are created in the `pending` state and are enqueued into the
@@ -19,7 +19,7 @@ it will dump the current telemetry information to the agent's `stderr`.
 This telemetry information can be used for debugging or otherwise
 getting a better view of what Nomad is doing.

-Telemetry information can be streamed to both [statsite](http://github.com/armon/statsite)
+Telemetry information can be streamed to both [statsite](https://github.com/armon/statsite)
 as well as statsd based on providing the appropriate configuration options.

 Below is sample output of a telemetry dump: