mirror of
https://github.com/kemko/nomad.git
synced 2026-01-01 16:05:42 +03:00
docs: expand Autoscaling documentation (#14937)
Rename `Internals` section to `Concepts` to match core docs structure and expand on how policies are evaluated. Also include missing documentation for check grouping and fix examples to use the new feature.
52
website/content/tools/autoscaling/concepts/index.mdx
Normal file
@@ -0,0 +1,52 @@
---
layout: docs
page_title: Autoscaling Concepts
description: >
  This section covers concepts of the Nomad Autoscaler and explains
  technical details of its operation.
---

# Nomad Autoscaler Concepts

This section covers concepts of the Nomad Autoscaler and explains the technical
details of how it functions, its architecture, and sub-systems.

The Nomad Autoscaler is modeled around the concept of a closed-loop control
system. These types of systems are often at the core of self-regulating
mechanisms because they are able to adjust some value based on the current
state of the system and some user-provided configuration. An example of a
closed-loop control system is a thermostat, where you set the desired
temperature and the appliance regulates the output of cold and hot air to
make sure the room stays at the value set.

In closed-loop systems there are a few key components:

* **Setpoint** is the desired output as defined by the user.
* **Comparator** computes the difference between the setpoint and current
  state of the system.
* **Controller** connects all the components together and defines what
  needs to be done to bring the system closer to the desired state.
* **Actuator** applies the changes defined by the controller.
* **System** is the entity being controlled.
* **Output** is the current value of the system.
* **Sensor** reads the system output and translates it to a value that can be
  used by the controller.

[](/img/autoscaling/control-loop.png)
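
As a rough illustration of the components listed above, the following Python sketch runs a few iterations of the thermostat example. It is not Nomad Autoscaler code: the function names, the `deadband` tolerance, and the one-degree `step` are assumptions made only for this example.

```python
def comparator(setpoint, measured):
    """Comparator: difference between the setpoint and the measured value."""
    return setpoint - measured


def controller(error, deadband=0.5):
    """Controller: decide what to do to move closer to the setpoint.

    The deadband is a hypothetical tolerance that avoids reacting to tiny errors.
    """
    if error > deadband:
        return "heat"
    if error < -deadband:
        return "cool"
    return "idle"


def actuator(temperature, command, step=1.0):
    """Actuator: apply the controller's decision to the system (the room)."""
    change = {"heat": step, "cool": -step, "idle": 0.0}[command]
    return temperature + change


def run_loop(setpoint, temperature, iterations):
    """Run the closed loop and record (command, resulting temperature) pairs."""
    history = []
    for _ in range(iterations):
        measured = temperature  # Sensor: here it reads the output unchanged.
        command = controller(comparator(setpoint, measured))
        temperature = actuator(temperature, command)
        history.append((command, temperature))
    return history


if __name__ == "__main__":
    for command, temperature in run_loop(setpoint=21.0, temperature=18.0, iterations=5):
        print(command, temperature)
```

Starting at 18 degrees with a setpoint of 21, the loop heats for three iterations and then stays idle, which is the self-regulating behavior the section describes.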

The Nomad Autoscaler follows this same base architecture and offloads some of
the components to [different types of plugins](/tools/autoscaling/concepts/plugins).

* The autoscaling **policy** is how users define their desired outcome and
  control the Nomad Autoscaler.
* **Target** is what users want to scale. It can be a job group, where the
  number of allocations is scaled, or a set of Nomad clients, where the number
  of nodes is what changes.
* **Strategy plugins** receive the current status of the scaling target (such
  as the number of allocations of a group) and metrics of the system to compute
  what actions need to be taken.
* **Target plugins** communicate with targets both to read their status and to
  apply changes defined by the Autoscaler.
* **APM plugins** read application performance metrics from external sources.

[](/img/autoscaling/autoscaler-arch.png)
@@ -0,0 +1,104 @@
---
layout: docs
page_title: Checks
description: Learn about how the Autoscaler deals with policy checks.
---

# Scaling Policy Checks

A scaling policy can include several [checks][policy_check] all of which
produce a scaling suggestion. Each check can specify its own source of metrics
data and apply different strategies based on the desired outcome.

```hcl
policy {
  # ...
  check "cpu_allocated_percentage" {
    source = "prometheus"
    query = "..."

    strategy "target-value" {
      target = 70
    }
  }

  check "high-memory-usage" {
    source = "prometheus"
    query = "..."
    group = "memory-usage"

    strategy "threshold" {
      upper_bound = 100
      lower_bound = 70
      delta = 1
    }
  }

  check "low-memory-usage" {
    source = "prometheus"
    query = "..."
    group = "memory-usage"

    strategy "threshold" {
      upper_bound = 30
      lower_bound = 0
      delta = -1
    }
  }
}
```

## Resolving Conflicts

The checks are all executed at the same time during a policy evaluation and
they can generate conflicting scaling actions. In a scenario like this, the
Autoscaler iterates over the results and chooses the safest option, which is
defined as the action that results in retaining the most capacity of the
resource.

In a scenario where two checks return different desired scaling directions, the
following logic is applied.

- `ScaleOut and ScaleIn => ScaleOut`
- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleNone`

In situations where the same actions are suggested, but with different counts,
the following logic is applied, where the count is the final desired value.

- `ScaleOut(10) and ScaleOut(9) => ScaleOut(10)`
- `ScaleIn(3) and ScaleIn(4) => ScaleIn(4)`
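
The rules above can be sketched in a few lines of Python. This is only an illustration of the "retain the most capacity" rule, not the Autoscaler's implementation: the `Direction` and `Action` types and the `safest` function are hypothetical names introduced for this example.

```python
from enum import IntEnum
from typing import NamedTuple


class Direction(IntEnum):
    # Ordered so that a larger value retains more capacity.
    SCALE_IN = -1
    SCALE_NONE = 0
    SCALE_OUT = 1


class Action(NamedTuple):
    direction: Direction
    count: int  # final desired count, as in ScaleOut(10)


def safest(a: Action, b: Action) -> Action:
    """Pick the action that retains the most capacity."""
    if a.direction != b.direction:
        # Different directions: ScaleOut beats ScaleNone, which beats ScaleIn.
        return a if a.direction > b.direction else b
    # Same direction: the higher final count keeps more capacity.
    return a if a.count >= b.count else b


if __name__ == "__main__":
    print(safest(Action(Direction.SCALE_OUT, 10), Action(Direction.SCALE_IN, 3)))
    print(safest(Action(Direction.SCALE_IN, 3), Action(Direction.SCALE_IN, 4)))
```

Encoding the directions as ordered integers keeps both rule tables in one comparison: direction first, then count as a tie-breaker.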

## Check Grouping

The above logic for resolving conflicts only works when the checks are
independent from each other. If you use the same `query` in multiple `check`
blocks, or if the underlying data being queried is somehow correlated, only
one check will result in a scaling action.

In the example above, the `high-memory-usage` and `low-memory-usage` checks use
the same query to retrieve memory usage information. We expect that memory
usage is either low or high (or neither), but never both at the same time.

Without grouping, the target is never able to reduce its count, since the
possible resulting actions and the final scaling outcome can only be one of the
following:

- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleNone`
- `ScaleNone and ScaleNone => ScaleNone`

To fix this problem, the correlated checks need to be set to the same `group`.
The Nomad Autoscaler then computes a single scaling action for the entire group
by applying slightly different logic:

- `ScaleOut and ScaleIn => ScaleOut`
- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleIn`
- `ScaleNone and ScaleNone => ScaleNone`

`ScaleNone` results are ignored unless all checks in the group return it, so
a group is able to `ScaleIn` a target even when all other checks result in no
action.
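
To make the difference between grouped and ungrouped checks concrete, here is a small Python sketch of the two rule tables. It is an illustration under assumptions, not the Autoscaler's code: `Direction`, `resolve_ungrouped`, and `resolve_group` are hypothetical names, and only scaling directions (not counts) are modeled.

```python
from enum import IntEnum


class Direction(IntEnum):
    # Ordered so that a larger value retains more capacity.
    SCALE_IN = -1
    SCALE_NONE = 0
    SCALE_OUT = 1


def resolve_ungrouped(directions):
    """Independent checks: the direction retaining the most capacity wins."""
    return max(directions, default=Direction.SCALE_NONE)


def resolve_group(directions):
    """Checks in the same group: ScaleNone is ignored unless every check returns it."""
    active = [d for d in directions if d != Direction.SCALE_NONE]
    if not active:
        return Direction.SCALE_NONE
    return max(active)


if __name__ == "__main__":
    results = [Direction.SCALE_IN, Direction.SCALE_NONE]
    print("ungrouped:", resolve_ungrouped(results).name)
    print("grouped:  ", resolve_group(results).name)
```

With the `high-memory-usage` and `low-memory-usage` checks from the example, low memory usage yields one `ScaleIn` and one `ScaleNone` result. The ungrouped rule resolves that pair to `ScaleNone`, while the grouped rule lets the `ScaleIn` through.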

[policy_check]: /tools/autoscaling/policy#check-options
@@ -0,0 +1,35 @@
---
layout: docs
page_title: Autoscaling Policy Evaluation
description: >
  This section covers how scaling policies are evaluated to generate scaling
  actions.
---

# Policy Evaluation

When the Nomad Autoscaler [agent] starts, it loads all the policies defined in
the configured [sources][agent_source] and monitors them for changes. Each
policy is assigned a handler that periodically sends the policy to a broker,
where it is evaluated by a worker. How often the policy is enqueued is set
by its [`evaluation_interval`][policy_eval_interval].

The worker executes a series of steps by calling the different plugins used in
the policy to determine if a scaling action is needed and then to apply the
necessary actions. The worker then loops back to evaluate the next policy.

If a scaling action is performed and the policy defines a
[`cooldown`][policy_cooldown] value, the policy handler waits for the specified
duration before enqueuing the policy again.

If the policy targets Nomad clients, the target plugin will usually execute
more steps, such as [selecting nodes to be removed][concepts_node_selector] and
draining them.

[](/img/autoscaling/policy-eval.png)
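
The interaction between `evaluation_interval` and `cooldown` can be pictured with a tiny timeline calculation. The Python sketch below is a simplification and not how the handler is implemented: `enqueue_times` is a hypothetical helper, times are plain seconds, evaluation itself is treated as instantaneous, and it assumes the cooldown simply replaces the usual interval wait after an evaluation that scaled the target.

```python
def enqueue_times(evaluation_interval, cooldown, scaled):
    """Return the times (in seconds) at which a policy is enqueued.

    `scaled` holds one boolean per evaluation: True when that evaluation
    resulted in a scaling action. Assumption for this sketch: after a
    scaling action the handler waits `cooldown` instead of the interval.
    """
    times = []
    now = 0
    for did_scale in scaled:
        times.append(now)
        now += cooldown if did_scale else evaluation_interval
    return times


if __name__ == "__main__":
    # Four evaluations, the second one scales the target.
    print(enqueue_times(evaluation_interval=10, cooldown=60,
                        scaled=[False, True, False, False]))
```

With a 10 second interval and a 60 second cooldown, the evaluation that follows a scaling action is pushed out by the cooldown, after which the regular interval resumes.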

[agent]: /tools/autoscaling/agent
[agent_source]: /tools/autoscaling/agent/source
[concepts_node_selector]: /tools/autoscaling/concepts/policy-eval/node-selector-strategy
[policy_cooldown]: /tools/autoscaling/policy#cooldown
[policy_eval_interval]: /tools/autoscaling/policy#evaluation_interval
@@ -1,26 +0,0 @@
----
-layout: docs
-page_title: Checks
-description: Learn about how the Autoscaler deals with policy checks.
----
-
-# Nomad Autoscaler Check Calculations
-
-A scaling policy can include several checks all of which produce a scaling
-suggesting. The checks are executed at the same time during a policy evaluation
-and the results can conflict with each other. In a scenario like this, the
-autoscaler iterates the results the chooses the safest result which results in
-retaining the most capacity of the resource.
-
-In a scenario where two checks return different desired directions, the following
-logic is applied.
-
-- `ScaleOut and ScaleIn => ScaleOut`
-- `ScaleOut and ScaleNone => ScaleOut`
-- `ScaleIn and ScaleNone => ScaleNone`
-
-In situations where the two same actions are suggested, but with different counts the
-following logic is applied, where the count is the absolute desired value.
-
-- `ScaleOut(10) and ScaleOut(9) => ScaleOut(10)`
-- `ScaleIn(3) and ScaleIn(4) => ScaleIn(4)`
@@ -1,15 +0,0 @@
----
-layout: docs
-page_title: Internals
-description: >
-  This section covers the internals of the Nomad Autoscaler and explains
-  technical details of its operation.
----
-
-# Nomad Autoscaler Internals
-
-This section covers the internals of the Nomad Autoscaler and explains the
-technical details of how it functions, its architecture, and sub-systems.
-
-- [Autoscaler plugins](/tools/autoscaling/internals/plugins)
-- [Check calculations](/tools/autoscaling/internals/checks)
@@ -14,6 +14,10 @@ Multiple tiers can be defined by declaring more than one `check` in the
 same scaling policy. If there is any overlap between the bounds, the [safest
 `check`][internals_check] will be used.

+~> **Note:** When using the `threshold` strategy with multiple checks make sure
+they all have the same [`group`][policy_group] value, otherwise your target
+may not be able to scale down.
+
 ## Agent Configuration Options

 ```hcl
@@ -29,6 +33,8 @@ policy {
   # ...
   check "high-memory-usage" {
     # ...
+    group = "memory-usage"
+
     strategy "threshold" {
       upper_bound = 100
       lower_bound = 70
@@ -36,8 +42,10 @@ policy {
     }
   }

-  check "low-memory-traffic" {
+  check "low-memory-usage" {
     # ...
+    group = "memory-usage"
+
     strategy "threshold" {
       upper_bound = 30
       lower_bound = 0
@@ -66,7 +74,8 @@ policy {
   as the new target count. Conflicts with `delta` and `percentage`.

 - `within_bounds_trigger` `(int: 5)` - The number of data points in the query
-  result time series that must be within the bound valus to trigger the action.
+  result time series that must be within the bound values to trigger the
+  action.

 At least one of `lower_bound` or `upper_bound` must be defined. If
 `lower_bound` is not defined, any value below `upper_bound` is considered
@@ -76,3 +85,4 @@ within bounds. Similarly, if `upper_bound` is not defined, any value above
 One, and only one, of `delta`, `percentage`, or `value` must be defined.

 [internals_check]: /tools/autoscaling/internals/checks
+[policy_group]: /tools/autoscaling/policy#group
@@ -67,6 +67,10 @@ horizontal application scaling or horizontal cluster scaling.
 - `query_window` - Defines how far back to query the APM for metrics. It should
   be provided as a duration (e.g.: `"5s"`, `"1m"`). Defaults to `1m`.

+- `group` - Specifies which checks should be treated as correlated when the policy
+  is evaluated. Refer to [Check Grouping][concepts_grouping] for more
+  information.
+
 - `on_error` - Defines how to handle errors during the `check` evaluation.
   Possible values are `"fail"` or `"ignore"`. If set to `"fail"` the policy
   evaluation will stop in case an error occurs and no scaling action will take
@@ -236,6 +240,7 @@ scaling "mem" {
 }
 ```

+[concepts_grouping]: /tools/autoscaling/concepts/policy-eval/checks#check-grouping
 [das]: /tools/autoscaling#dynamic-application-sizing
 [policy_default_cooldown_agent]: /tools/autoscaling/agent#default_cooldown
 [eval_interval_agent]: /tools/autoscaling/agent#default_evaluation_interval
@@ -1,4 +1,8 @@
 [
+  {
+    "title": "Overview",
+    "path": "index"
+  },
   {
     "title": "Autoscaling",
     "routes": [
@@ -6,6 +10,57 @@
         "title": "Overview",
         "path": "autoscaling"
       },
+      {
+        "title": "Concepts",
+        "routes": [
+          {
+            "title": "Overview",
+            "path": "autoscaling/concepts"
+          },
+          {
+            "title": "Policy Evaluation",
+            "routes": [
+              {
+                "title": "Overview",
+                "path": "autoscaling/concepts/policy-eval"
+              },
+              {
+                "title": "Checks",
+                "path": "autoscaling/concepts/policy-eval/checks"
+              },
+              {
+                "title": "Node Selector Strategy",
+                "path": "autoscaling/concepts/policy-eval/node-selector-strategy"
+              }
+            ]
+          },
+          {
+            "title": "Plugins",
+            "routes": [
+              {
+                "title": "Overview",
+                "path": "autoscaling/concepts/plugins"
+              },
+              {
+                "title": "Base",
+                "path": "autoscaling/concepts/plugins/base"
+              },
+              {
+                "title": "APM",
+                "path": "autoscaling/concepts/plugins/apm"
+              },
+              {
+                "title": "Strategy",
+                "path": "autoscaling/concepts/plugins/strategy"
+              },
+              {
+                "title": "Target",
+                "path": "autoscaling/concepts/plugins/target"
+              }
+            ]
+          }
+        ]
+      },
       {
         "title": "Agent",
         "routes": [
@@ -166,48 +221,6 @@
             "path": "autoscaling/plugins/external"
           }
         ]
-      },
-      {
-        "title": "Internals",
-        "routes": [
-          {
-            "title": "Overview",
-            "path": "autoscaling/internals"
-          },
-          {
-            "title": "Checks",
-            "path": "autoscaling/internals/checks"
-          },
-          {
-            "title": "Node Selector Strategy",
-            "path": "autoscaling/internals/node-selector-strategy"
-          },
-          {
-            "title": "Plugins",
-            "routes": [
-              {
-                "title": "Overview",
-                "path": "autoscaling/internals/plugins"
-              },
-              {
-                "title": "Base",
-                "path": "autoscaling/internals/plugins/base"
-              },
-              {
-                "title": "APM",
-                "path": "autoscaling/internals/plugins/apm"
-              },
-              {
-                "title": "Strategy",
-                "path": "autoscaling/internals/plugins/strategy"
-              },
-              {
-                "title": "Target",
-                "path": "autoscaling/internals/plugins/target"
-              }
-            ]
-          }
-        ]
       }
     ]
   }
BIN
website/public/img/autoscaling/autoscaler-arch.png
Normal file
Binary file not shown. Size: 105 KiB
BIN
website/public/img/autoscaling/control-loop.png
Normal file
Binary file not shown. Size: 77 KiB
BIN
website/public/img/autoscaling/policy-eval.png
Normal file
Binary file not shown. Size: 224 KiB
@@ -1 +1,18 @@
-module.exports = []
+module.exports = [
+  // Rename and re-arrange Autoscaling Internals section
+  {
+    source: '/nomad/tools/autoscaling/internals/:path*',
+    destination: '/nomad/tools/autoscaling/concepts/:path*',
+    permanent: true,
+  },
+  {
+    source: '/nomad/tools/autoscaling/concepts/checks',
+    destination: '/nomad/tools/autoscaling/concepts/policy-eval/checks',
+    permanent: true,
+  },
+  {
+    source: '/nomad/tools/autoscaling/concepts/node-selector-strategy',
+    destination: '/nomad/tools/autoscaling/concepts/policy-eval/node-selector-strategy',
+    permanent: true,
+  },
+]
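
The redirect rules chain: an old `internals` URL is first rewritten to `concepts`, and the `checks` and `node-selector-strategy` pages are then moved under `policy-eval` by a second redirect. The Python sketch below traces that chain. It is a simplified stand-in for the site's real redirect handling, assuming that `:path*` matches zero or more trailing path segments and that a client keeps following redirects. `apply_rule` and `resolve` are hypothetical helpers written for this example.

```python
# Rules copied from the diff above as (source, destination) pairs.
RULES = [
    ("/nomad/tools/autoscaling/internals/:path*",
     "/nomad/tools/autoscaling/concepts/:path*"),
    ("/nomad/tools/autoscaling/concepts/checks",
     "/nomad/tools/autoscaling/concepts/policy-eval/checks"),
    ("/nomad/tools/autoscaling/concepts/node-selector-strategy",
     "/nomad/tools/autoscaling/concepts/policy-eval/node-selector-strategy"),
]

WILDCARD = "/:path*"


def apply_rule(source, destination, path):
    """Return the redirected path, or None when the rule does not match.

    Assumption: a wildcard source always pairs with a wildcard destination.
    """
    if source.endswith(WILDCARD):
        prefix = source[: -len(WILDCARD)]
        if path == prefix or path.startswith(prefix + "/"):
            rest = path[len(prefix):]  # "" or "/a/b"
            return destination[: -len(WILDCARD)] + rest
        return None
    return destination if path == source else None


def resolve(path, max_hops=5):
    """Follow redirects (first matching rule per hop) until none applies."""
    for _ in range(max_hops):
        for source, destination in RULES:
            target = apply_rule(source, destination, path)
            if target is not None:
                path = target
                break
        else:
            return path  # no rule matched, this is the final URL
    return path


if __name__ == "__main__":
    print(resolve("/nomad/tools/autoscaling/internals/checks"))
    print(resolve("/nomad/tools/autoscaling/internals/plugins/apm"))
```

Under these assumptions, an old bookmark for the `internals/checks` page takes two hops to reach `concepts/policy-eval/checks`, while the plugin pages need only the first rule.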