Update website to remove a lot of copy-paste with Vault + improve images

Seth Vargo
2015-09-20 16:37:22 -04:00
parent 6a3623ebe0
commit d9aed3af4d
65 changed files with 116 additions and 3676 deletions

View File

@@ -12,7 +12,7 @@ Nomad is a complex system that has many different pieces. To help both users and
build a mental model of how it works, this page documents the system architecture.
~> **Advanced Topic!** This page covers technical details
-of Nomad. You don't need to understand these details to
+of Nomad. You do not need to understand these details to
effectively use Nomad. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.
@@ -74,7 +74,7 @@ clarify what is being discussed:
Looking at only a single region, at a high level Nomad looks like:
-![Regional Architecture](/assets/images/region-arch.png)
+[![Regional Architecture](/assets/images/nomad-architecture-region.png)](/assets/images/nomad-architecture-region.png)
Within each region, we have both clients and servers. Servers are responsible for
accepting jobs from users, managing clients, and [computing task placements](/docs/internals/scheduling.html).
@@ -85,7 +85,7 @@ In some cases, for either availability or scalability, you may need to run multi
regions. Nomad supports federating multiple regions together into a single cluster.
At a high level, this looks like:
-![Global Architecture](/assets/images/global-arch.png)
+[![Global Architecture](/assets/images/nomad-architecture-global.png)](/assets/images/nomad-architecture-global.png)
Regions are fully independent from each other, and do not share jobs, clients or
state. They are loosely-coupled using a gossip protocol, which allows users to

View File

@@ -3,54 +3,57 @@ layout: "docs"
page_title: "Consensus Protocol"
sidebar_current: "docs-internals-consensus"
description: |-
-  Nomad uses a consensus protocol to provide Consistency as defined by CAP. The consensus protocol is based on Raft: In search of an Understandable Consensus Algorithm. For a visual explanation of Raft, see The Secret Lives of Data.
+  Nomad uses a consensus protocol to provide Consistency as defined by CAP.
+  The consensus protocol is based on Raft: In search of an Understandable
+  Consensus Algorithm. For a visual explanation of Raft, see The Secret Lives of
+  Data.
---
# Consensus Protocol
-Nomad uses a [consensus protocol](http://en.wikipedia.org/wiki/Consensus_(computer_science))
-to provide [Consistency (as defined by CAP)](http://en.wikipedia.org/wiki/CAP_theorem).
+Nomad uses a [consensus protocol](https://en.wikipedia.org/wiki/Consensus_(computer_science))
+to provide [Consistency (as defined by CAP)](https://en.wikipedia.org/wiki/CAP_theorem).
The consensus protocol is based on
["Raft: In search of an Understandable Consensus Algorithm"](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf).
For a visual explanation of Raft, see [The Secret Lives of Data](http://thesecretlivesofdata.com/raft).
~> **Advanced Topic!** This page covers technical details of
-the internals of Nomad. You don't need to know these details to effectively
+the internals of Nomad. You do not need to know these details to effectively
operate and use Nomad. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
## Raft Protocol Overview
Raft is a consensus algorithm that is based on
-[Paxos](http://en.wikipedia.org/wiki/Paxos_%28computer_science%29). Compared
+[Paxos](https://en.wikipedia.org/wiki/Paxos_%28computer_science%29). Compared
to Paxos, Raft is designed to have fewer states and a simpler, more
understandable algorithm.
There are a few key terms to know when discussing Raft:
-* Log - The primary unit of work in a Raft system is a log entry. The problem
+* **Log** - The primary unit of work in a Raft system is a log entry. The problem
of consistency can be decomposed into a *replicated log*. A log is an ordered
sequence of entries. We consider the log consistent if all members agree on
the entries and their order.
-* FSM - [Finite State Machine](http://en.wikipedia.org/wiki/Finite-state_machine).
+* **FSM** - [Finite State Machine](https://en.wikipedia.org/wiki/Finite-state_machine).
An FSM is a collection of finite states with transitions between them. As new logs
are applied, the FSM is allowed to transition between states. Application of the
same sequence of logs must result in the same state, meaning behavior must be deterministic.
-* Peer set - The peer set is the set of all members participating in log replication.
+* **Peer set** - The peer set is the set of all members participating in log replication.
For Nomad's purposes, all server nodes are in the peer set of the local region.
-* Quorum - A quorum is a majority of members from a peer set: for a set of size `n`,
+* **Quorum** - A quorum is a majority of members from a peer set: for a set of size `n`,
quorum requires at least `(n/2)+1` members.
For example, if there are 5 members in the peer set, we would need 3 nodes
to form a quorum (see the sketch after this list). If a quorum of nodes is
unavailable for any reason, the cluster becomes *unavailable* and no new
logs can be committed.
-* Committed Entry - An entry is considered *committed* when it is durably stored
+* **Committed Entry** - An entry is considered *committed* when it is durably stored
on a quorum of nodes. Once an entry is committed it can be applied.
-* Leader - At any given time, the peer set elects a single node to be the leader.
+* **Leader** - At any given time, the peer set elects a single node to be the leader.
The leader is responsible for ingesting new log entries, replicating to followers,
and managing when an entry is considered committed.
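
The quorum arithmetic above is easy to check mechanically. Here is a minimal
sketch in Go (illustrative only, not Nomad's actual code; the helper name
`quorumSize` is made up) showing that integer division makes `(n/2)+1` a
strict majority for both odd and even peer sets:

```go
package main

import "fmt"

// quorumSize returns the minimum number of members that must agree before a
// log entry is committed: a strict majority of the peer set. Go's integer
// division truncates, so peers/2 + 1 works for both odd and even sizes.
func quorumSize(peers int) int {
	return peers/2 + 1
}

func main() {
	for _, n := range []int{1, 2, 3, 4, 5, 7} {
		fmt.Printf("peer set of %d -> quorum of %d\n", n, quorumSize(n))
	}
}
```

Running this prints `peer set of 5 -> quorum of 3`, matching the example
above. It also shows why even-sized peer sets are rarely useful: 4 servers
tolerate the loss of only 1, the same as 3 servers.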

View File

@@ -3,19 +3,20 @@ layout: "docs"
page_title: "Gossip Protocol"
sidebar_current: "docs-internals-gossip"
description: |-
-  Nomad uses a gossip protocol to manage membership. All of this is provided through the use of the Serf library.
+  Nomad uses a gossip protocol to manage membership. All of this is provided
+  through the use of the Serf library.
---
# Gossip Protocol
-Nomad uses a [gossip protocol](http://en.wikipedia.org/wiki/Gossip_protocol)
+Nomad uses a [gossip protocol](https://en.wikipedia.org/wiki/Gossip_protocol)
to manage membership. This is provided through the use of the [Serf library](https://www.serfdom.io/).
The gossip protocol used by Serf is based on
["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
with a few minor adaptations. There are more details about [Serf's protocol here](https://www.serfdom.io/docs/internals/gossip.html).
~> **Advanced Topic!** This page covers technical details of
-the internals of Nomad. You don't need to know these details to effectively
+the internals of Nomad. You do not need to know these details to effectively
operate and use Nomad. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
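
For a concrete feel of what the Serf library provides, here is a minimal,
hypothetical sketch in Go of creating a Serf instance and joining a gossip
pool. The node name and join address are invented, and Nomad's real agent
wires Serf up with its own configuration and event handlers:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/serf/serf"
)

func main() {
	// Each member runs its own Serf instance.
	conf := serf.DefaultConfig()
	conf.NodeName = "server-1" // hypothetical name

	s, err := serf.Create(conf)
	if err != nil {
		log.Fatal(err)
	}
	defer s.Shutdown()

	// Joining any one existing member is enough: the gossip protocol
	// propagates the full member list to everyone. Address is hypothetical.
	if _, err := s.Join([]string{"10.0.0.5:4648"}, false); err != nil {
		log.Printf("join failed: %v", err)
	}

	for _, m := range s.Members() {
		fmt.Printf("%s %s:%d %s\n", m.Name, m.Addr, m.Port, m.Status)
	}
}
```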

View File

@@ -13,5 +13,5 @@ details of how Nomad functions, its architecture and sub-systems.
-> **Note:** Knowledge of Nomad internals is not
required to use Nomad. If you aren't interested in the internals
-of Nomad, you may safely skip this section. If you're operating Nomad,
+of Nomad, you may safely skip this section. If you are operating Nomad,
we recommend understanding the internals.

View File

@@ -13,18 +13,18 @@ from jobs to client machines. This process must respect the constraints as decla
in the job, and optimize for resource utilization. This page documents the details
of how scheduling works in Nomad to help both users and developers
build a mental model. The design is heavily inspired by Google's
-work on both [Omega: flexible, scalable schedulers for large compute clusters](http://research.google.com/pubs/pub41684.html)
-and [Large-scale cluster management at Google with Borg](http://research.google.com/pubs/pub43438.html).
+work on both [Omega: flexible, scalable schedulers for large compute clusters](https://research.google.com/pubs/pub41684.html)
+and [Large-scale cluster management at Google with Borg](https://research.google.com/pubs/pub43438.html).
~> **Advanced Topic!** This page covers technical details
-of Nomad. You don't need to understand these details to
+of Nomad. You do not need to understand these details to
effectively use Nomad. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.
# Scheduling in Nomad
-![Data Model](/assets/images/nomad-nouns.png)
+[![Nomad Data Model](/assets/images/nomad-data-model.png)](/assets/images/nomad-data-model.png)
There are four primary "nouns" in Nomad: jobs, nodes, allocations, and evaluations.
Jobs are submitted by users and represent a _desired state_. A job is a declarative description
@@ -43,7 +43,7 @@ it with the desired state.
This diagram shows the flow of an evaluation through Nomad:
-![Evaluation Flow](/assets/images/eval-flow.png)
+[![Nomad Evaluation Flow](/assets/images/nomad-evaluation-flow.png)](/assets/images/nomad-evaluation-flow.png)
The lifecycle of an evaluation begins with an event causing the evaluation to be
created. Evaluations are created in the `pending` state and are enqueued into the

View File

@@ -19,7 +19,7 @@ it will dump the current telemetry information to the agent's `stderr`.
This telemetry information can be used for debugging or otherwise
getting a better view of what Nomad is doing.
-Telemetry information can be streamed to both [statsite](http://github.com/armon/statsite)
+Telemetry information can be streamed to both [statsite](https://github.com/armon/statsite)
and statsd, depending on the configuration options provided.
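
As a rough sketch of how that streaming can be wired up, the example below
uses the armon/go-metrics Go library, on the assumption that Nomad's
telemetry is built on it (the sink address and metric key are hypothetical):

```go
package main

import (
	"log"
	"time"

	metrics "github.com/armon/go-metrics"
)

func main() {
	// Stream metrics to a statsite daemon; metrics.NewStatsdSink works the
	// same way for statsd. The address below is hypothetical.
	sink, err := metrics.NewStatsiteSink("127.0.0.1:8125")
	if err != nil {
		log.Fatal(err)
	}

	// Register the sink globally under a "nomad" service name.
	if _, err := metrics.NewGlobal(metrics.DefaultConfig("nomad"), sink); err != nil {
		log.Fatal(err)
	}

	// Record a hypothetical timing metric.
	start := time.Now()
	time.Sleep(10 * time.Millisecond)
	metrics.MeasureSince([]string{"demo", "work"}, start)
}
```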
Below is sample output of a telemetry dump: