The drainer component is fairly complex. As part of upcoming work to fix some of the drainer's rough edges, document the drainer's architecture from a Nomad developer perspective.
8.6 KiB
Architecture: Drainer
The drainer is a component that runs on the leader that services requests from
the nomad node drain command and related workflows in the web UI. For a
play-by-play from the user's perspective, see node drain tutorial. This
document describes the internals of the drainer for Nomad developers.
The high-level workflow is that:
- The user sets the drain state of the Client ("Node") in the state store.
- Allocations are migrated according to their
migrateblock. - The drainer creates a watcher for the Node to fire an event when the work is done or the drain's deadline is reached.
- The drainer creates watchers for each job's allocs on the Node, to fire progress events.
Effectively the drainer marks allocations for migration and emits an eval, and then lets the scheduler take it from there.
Components
There are four major components of the drainer:
-
NodeDrainer: The entrypoint struct for thenomad/drainerpackage. This struct runs a top-level event loop that's enabled only on the leader. It's configured with the three "watcher" interfaces described below. -
DrainingNodeWatcher: A watcher interface implemented by thenodeDrainWatcherstruct inwatch_nodes.go. Runs a loop that watches for changes to Nodes. If a Node change transitions a Node to draining, theDrainingNodeWatcheradds the Node to its tracker. It queries the state store to get the jobs that have allocations running on that Node, and registers those jobs with theDrainingJobWatcher. -
DrainingJobWatcher: A watcher interface implemented by thedrainingJobWatcherstruct inwatch_jobs.go. Runs a loop that watches jobs registered by theDrainingNodeWatcherand allocations in the state store. The job watcher is where the job'smigrateblock is handled so that only the correct number of allocations are being drained at a time. The job watcher exposes two methods that return channels:-
Drain()returns a channel that produces aDrainRequestfor all the allocs on these jobs that need to be drained. TheNodeDrainerturns this request into Raft writes viaAllocUpdateDesiredTransition. -
Migrated()returns a channel that produces slices of allocations that have completed migration. "Completed" should mean exactly what the end user would expect; the replacement allocations have been placed,ephemeral_diskhas been migrated (if possible), and the old allocation is fully stopped.
-
-
DrainDeadlineNotifier: A watcher interface implemented by thedeadlineHeapstruct indrain_heap.go. Runs a loop that tracks the Nodes being drained against their deadline timers. TheNodeDrainercan watch the channel returned by theNextBatchmethod to get slices of Nodes that have failed to complete their migrations by the deadline.
There is also a collection of other minor components important to understanding the workflow:
-
Raft shims: Because the
nomad/drainerpackage is not in the same package as the server code, the server configures theNodeDrainerwith shim functions that close over the small set of Raft apply functions the drainer needs. For this reason they are located innomad/drainer_shims.gorather than thenomad/drainerpackage.-
AllocUpdateDesiredTransitionincludes allocation desired status changes and the evaluations that will need to be processed. -
NodesDrainCompleteincludes updates for the drained Node.
-
-
drainingNode: This struct represents the state of a single Node whose drain is being tracked. Created by theDrainingNodeWatcherwhenever a Node is marked for draining in the state store. -
DrainRequest: This struct represents a set of allocations that should be marked for drain. Created byDrainingJobWatcherwhenever it receives a job to drain.
A note on code style: the drainer is implemented with an unusual amount of
dependency injection via factory functions that return interfaces because it has
to handle state and raft writes without being in the top-level nomad package
itself. It also can't import the top-level nomad package because the drainer
is instantiated by the server, and that would create a circular
import. Generally speaking we don't want to emulate this style elsewhere in
Nomad because it makes implementation harder to follow, but it makes sense in
this limited case.
Events
The components combine into three high-level flow of events. The first is the
flow of a newly draining Node. The NodeWatcher gets the Node from a blocking
query. It registers the job with the JobWatcher. The JobWatcher determines
which allocations need draining. These are polled from the Drain() channel by
NodeDrainer and written to raft via the AllocUpdateDesiredTransition
shim. Then the scheduler and clients picks up the changes.
flowchart TD
%% entities
clients
scheduler
user(user)
NodeDrainer([NodeDrainer])
NodeWatcher([DrainingNodeWatcher])
JobWatcher([DrainingJobWatcher])
StateStore([state store])
%% style classes
classDef component fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class user,clients,scheduler,StateStore other;
class NodeDrainer,NodeWatcher,JobWatcher component;
user -. "1. enable drain\nfor node" .-> StateStore
StateStore -- "2. blocking query for\nnewly draining nodes" --> NodeWatcher
NodeWatcher -- "3. RegisterJobs(jobs)" --> JobWatcher
JobWatcher -- "4. Drain(): allocs for job that need draining" --> NodeDrainer
NodeDrainer -- "5. AllocUpdateDesiredTransition\n(raft shim)" --> StateStore
StateStore -. 6. EvalDequeue .-> scheduler
StateStore -. 7. GetAllocs .-> clients
The second is when allocation migrations are complete. The clients update the
state of the migrated allocs. The JobWatcher has a blocking query that detects
these changes. Allocs that are done migrating get sent on the Migrated()
channel that's polled by the NodeDrainer. The NodeDrainer determines whether
the Node is done being drained, and writes an update via the
NodesDrainComplete raft shim.
flowchart TD
%% entities
clients
StateStore([state store])
NodeDrainer([NodeDrainer])
JobWatcher([DrainingJobWatcher])
%% style classes
classDef component fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class clients,StateStore other;
class NodeDrainer,JobWatcher component;
clients -. "1. UpdateAlloc" .-> StateStore
StateStore -- "2. blocking query\nfor allocs" --> JobWatcher
JobWatcher -- "3. Migrated(): allocs that are done" --> NodeDrainer
NodeDrainer -- "4. NodesDrainComplete\n(raft shim)" --> StateStore
And the third is when Nodes pass their deadline. The NodeWatcher is
responsible for adding and removing the watch in the DeadlineNotifier. The
DeadlineNotifier is responsible for watching the timer. If the Node isn't
removed before the deadline, the DeadlineNotifier tells the NodeDrainer and
the NodeDrainer updates the state via the NodesDrainComplete shim. At this
point the remaining allocs will be forced to shutdown immediately.
flowchart TD
%% entities
NodeDrainer([NodeDrainer])
NodeWatcher([DrainingNodeWatcher])
DeadlineNotifier([DrainDeadlineNotifier])
StateStore([state store])
%% style classes
classDef component fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class StateStore other;
class NodeDrainer,NodeWatcher,DeadlineNotifier component;
NodeWatcher -- "1. Watch()" --> DeadlineNotifier
NodeWatcher -- "2a. Remove()" --> DeadlineNotifier
DeadlineNotifier -- "2b. watch the clock" --> DeadlineNotifier
DeadlineNotifier -- "3. node has passed deadline" --> NodeDrainer
NodeDrainer -- "4. NodesDrainComplete\n(raft shim)" --> StateStore