docs: expand Autoscaling documentation (#14937)

Rename `Internals` section to `Concepts` to match core docs structure
and expand on how policies are evaluated.

Also include missing documentation for check grouping and fix examples
to use the new feature.
Luiz Aoqui
2022-10-19 17:57:08 -04:00
committed by GitHub
parent aa5b83bf17
commit 56816f2f93
18 changed files with 281 additions and 86 deletions

@@ -0,0 +1,52 @@
---
layout: docs
page_title: Autoscaling Concepts
description: >
  This section covers concepts of the Nomad Autoscaler and explains
  technical details of its operation.
---
# Nomad Autoscaler Concepts

This section covers concepts of the Nomad Autoscaler and explains the technical
details of how it functions, its architecture, and sub-systems.

The Nomad Autoscaler is modeled around the concept of a closed-loop control
system. These types of systems are often at the core of self-regulating
mechanisms because they are able to adjust some value based on the current
state of the system and some user provided configuration. An example of a
closed-loop control system is a thermostat, where you set the desired
temperature and the appliance will regulate the output of cold and hot air to
make sure the room stays at the value set.

In closed-loop systems there are a few key components:

* **Setpoint** is the desired output as defined by the user.
* **Comparator** computes the difference between the setpoint and current
state of the system.
* **Controller** connects all the components together and defines what
needs to be done to bring the system closer to the desired state.
* **Actuator** applies the changes defined by the controller.
* **System** is the entity being controlled.
* **Output** is the current value of the system.
* **Sensor** reads the system output and translates it to a value that can be
used by the controller.
[![Closed-loop controller](/img/autoscaling/control-loop.png)](/img/autoscaling/control-loop.png)
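
To make the roles of these components concrete, the thermostat example can be sketched in a few lines of Python. This is only an illustration: the function names, the 0.5 degree tolerance, and the 1 degree step are assumptions made for the sketch and are not part of the Nomad Autoscaler.

```python
def comparator(setpoint, measured):
    """Comparator: difference between the setpoint and the current state."""
    return setpoint - measured


def controller(error, tolerance=0.5):
    """Controller: decide what to do to move closer to the setpoint.

    The tolerance is an assumed value for this sketch.
    """
    if error > tolerance:
        return "heat"
    if error < -tolerance:
        return "cool"
    return "idle"


def actuator(room_temp, action, step=1.0):
    """Actuator: apply the controller's decision to the system (the room)."""
    if action == "heat":
        return room_temp + step
    if action == "cool":
        return room_temp - step
    return room_temp


def run_loop(setpoint, room_temp, iterations):
    """Run the closed loop a fixed number of times and return the output."""
    for _ in range(iterations):
        measured = room_temp  # Sensor: read the system output
        error = comparator(setpoint, measured)
        action = controller(error)
        room_temp = actuator(room_temp, action)
    return room_temp


# Starting from a cold room, the loop converges on the setpoint and idles.
print(run_loop(setpoint=21.0, room_temp=18.0, iterations=10))
```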
The Nomad Autoscaler follows this same base architecture and offloads some of
the components to [different types of plugins](/tools/autoscaling/concepts/plugins).
* The autoscaling **policy** is how users define their desired outcome and
control the Nomad Autoscaler.
* **Target** is what users want to scale. It can be a job group, where the
number of allocations is scaled, or a set of Nomad clients, where the number
of nodes is what changes.
* **Strategy plugins** receive the current status of the scaling target (such
as the number of allocations of a group) and metrics of the system to compute
what actions need to be taken.
* **Target plugins** communicate with targets to both read their status and
apply changes defined by the Autoscaler.
* **APM plugins** read application performance metrics from external sources.
[![Nomad Autoscaler architecture](/img/autoscaling/autoscaler-arch.png)](/img/autoscaling/autoscaler-arch.png)

@@ -0,0 +1,104 @@
---
layout: docs
page_title: Checks
description: Learn about how the Autoscaler deals with policy checks.
---
# Scaling Policy Checks
A scaling policy can include several [checks][policy_check], all of which
produce a scaling suggestion. Each check can specify its own source of metrics
data and apply different strategies based on the desired outcome.
```hcl
policy {
  # ...
  check "cpu_allocated_percentage" {
    source = "prometheus"
    query  = "..."

    strategy "target-value" {
      target = 70
    }
  }

  check "high-memory-usage" {
    source = "prometheus"
    query  = "..."
    group  = "memory-usage"

    strategy "threshold" {
      upper_bound = 100
      lower_bound = 70
      delta       = 1
    }
  }

  check "low-memory-usage" {
    source = "prometheus"
    query  = "..."
    group  = "memory-usage"

    strategy "threshold" {
      upper_bound = 30
      lower_bound = 0
      delta       = -1
    }
  }
}
```
## Resolving Conflicts

The checks are all executed at the same time during a policy evaluation and
they can generate conflicting scaling actions. In a scenario like this, the
Autoscaler iterates over the results and chooses the safest option, which is
defined as the action that results in retaining the most capacity of the
resource.

In a scenario where two checks return different desired scaling directions, the
following logic is applied:

- `ScaleOut and ScaleIn => ScaleOut`
- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleNone`

In situations where the same action is suggested but with different counts,
the following logic is applied, where the count is the final desired value:

- `ScaleOut(10) and ScaleOut(9) => ScaleOut(10)`
- `ScaleIn(3) and ScaleIn(4) => ScaleIn(4)`
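
The rules above can be expressed as a single comparison that prefers whichever action retains more capacity. The following Python sketch is illustrative only: the `Direction`, `Action`, `safest`, and `resolve` names are hypothetical and do not correspond to the Autoscaler's source code.

```python
from dataclasses import dataclass
from enum import IntEnum
from functools import reduce


class Direction(IntEnum):
    # Ordered so that a higher value retains more capacity.
    SCALE_IN = 0
    SCALE_NONE = 1
    SCALE_OUT = 2


@dataclass(frozen=True)
class Action:
    direction: Direction
    count: int = 0  # final desired count; not meaningful for SCALE_NONE


def safest(a: Action, b: Action) -> Action:
    """Pick the action that retains the most capacity."""
    if a.direction != b.direction:
        # Different directions: ScaleOut beats ScaleNone beats ScaleIn.
        return a if a.direction > b.direction else b
    # Same direction: the higher final count retains more capacity.
    return a if a.count >= b.count else b


def resolve(actions):
    """Reduce the results of all checks to one action (needs at least one)."""
    return reduce(safest, actions)


print(resolve([Action(Direction.SCALE_IN, 3), Action(Direction.SCALE_IN, 4)]))
```

Ordering the directions by how much capacity they retain keeps the comparison to two cases: a direction mismatch, and a count mismatch within the same direction.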
## Check Grouping

The above logic for resolving conflicts only works when the checks are
independent from each other. If you use the same `query` in multiple `check`
blocks, or if the underlying data being queried is somehow correlated, at most
one of those checks produces a scaling action at any given time.

In the example above, the `high-memory-usage` and `low-memory-usage` checks use
the same query to retrieve memory usage information. We expect that memory
usage is either low or high (or neither), but never both at the same time.
Without grouping, the target is never able to reduce its count, since the
possible resulting actions and the final scaling outcome can only be one of the
following:

- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleNone`
- `ScaleNone and ScaleNone => ScaleNone`

To fix this problem, the correlated checks need to be set to the same `group`.
The Nomad Autoscaler then computes a single scaling action for the entire group
by applying a slightly different logic:

- `ScaleOut and ScaleIn => ScaleOut`
- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleIn`
- `ScaleNone and ScaleNone => ScaleNone`

`ScaleNone` results are ignored unless all checks in the group return it, so
a group is able to `ScaleIn` a target even when all other checks result in no
action.

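
The difference between the two sets of rules comes down to where `ScaleNone` sits in the priority order. The Python sketch below contrasts them using only the direction of each result and ignoring counts. The `UNGROUPED_PRIORITY`, `GROUPED_PRIORITY`, and `resolve` names are hypothetical and chosen for illustration; this is not the Autoscaler's implementation.

```python
# Higher number wins. These tables are assumptions that encode the rules
# listed in this document, not values taken from the Autoscaler source.
UNGROUPED_PRIORITY = {"ScaleIn": 0, "ScaleNone": 1, "ScaleOut": 2}
GROUPED_PRIORITY = {"ScaleNone": 0, "ScaleIn": 1, "ScaleOut": 2}


def resolve(directions, priority):
    """Return the winning direction among the check results."""
    return max(directions, key=priority.__getitem__)


# Memory usage is low: the high-memory check returns ScaleNone and the
# low-memory check returns ScaleIn.
results = ["ScaleNone", "ScaleIn"]

# Without grouping, ScaleNone is treated as safer, so the target never
# scales in.
print(resolve(results, UNGROUPED_PRIORITY))

# Within a group, ScaleNone is ignored unless every check returns it.
print(resolve(results, GROUPED_PRIORITY))
```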
[policy_check]: /tools/autoscaling/policy#check-options

@@ -0,0 +1,35 @@
---
layout: docs
page_title: Autoscaling Policy Evaluation
description: >
  This section covers how scaling policies are evaluated to generate scaling
  actions.
---
# Policy Evaluation

When the Nomad Autoscaler [agent] starts, it loads all the policies defined in
the configured [sources][agent_source] and monitors them for changes. Each
policy is assigned a handler that periodically sends the policy to a broker,
where it is evaluated by a worker. How often the policy is enqueued is set
by its [`evaluation_interval`][policy_eval_interval].

The worker executes a series of steps by calling the different plugins used in
the policy to determine if a scaling action is needed and then to apply the
necessary actions. The worker then loops back to evaluate the next policy.

If a scaling action is performed and the policy defines a
[`cooldown`][policy_cooldown] value, the policy handler waits for the specified
duration before enqueuing the policy again.

If the policy target is a set of Nomad clients, the target plugin will usually
execute more steps, such as
[selecting nodes to be removed][concepts_node_selector] and draining them.

[![Scaling policy evaluation pipeline](/img/autoscaling/policy-eval.png)](/img/autoscaling/policy-eval.png)
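
The interaction between `evaluation_interval` and `cooldown` can be illustrated with a small simulation over a fake clock. This sketch is not the Autoscaler's implementation: the `simulate_handler` function, its parameters, and the assumption that the handler waits for the longer of the two durations after a scaling action are hypothetical simplifications.

```python
def simulate_handler(evaluation_interval, cooldown, scaling_times, horizon):
    """Return the simulated times (in seconds) at which a policy is enqueued.

    scaling_times is the set of enqueue times whose evaluation is assumed to
    end in a scaling action. No real waiting happens; time is just a number.
    """
    enqueued = []
    now = evaluation_interval
    while now <= horizon:
        enqueued.append(now)
        if now in scaling_times:
            # Assumption: after scaling, wait for the cooldown, but never
            # less than one evaluation interval.
            now += max(cooldown, evaluation_interval)
        else:
            now += evaluation_interval
    return enqueued


# A 10s interval with a 30s cooldown, where the evaluation at t=20 scales.
print(simulate_handler(10, 30, {20}, 70))
```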
[agent]: /tools/autoscaling/agent
[agent_source]: /tools/autoscaling/agent/source
[concepts_node_selector]: /tools/autoscaling/concepts/policy-eval/node-selector-strategy
[policy_cooldown]: /tools/autoscaling/policy#cooldown
[policy_eval_interval]: /tools/autoscaling/policy#evaluation_interval

@@ -1,26 +0,0 @@
---
layout: docs
page_title: Checks
description: Learn about how the Autoscaler deals with policy checks.
---
# Nomad Autoscaler Check Calculations
A scaling policy can include several checks all of which produce a scaling
suggesting. The checks are executed at the same time during a policy evaluation
and the results can conflict with each other. In a scenario like this, the
autoscaler iterates the results the chooses the safest result which results in
retaining the most capacity of the resource.
In a scenario where two checks return different desired directions, the following
logic is applied.
- `ScaleOut and ScaleIn => ScaleOut`
- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleNone`
In situations where the two same actions are suggested, but with different counts the
following logic is applied, where the count is the absolute desired value.
- `ScaleOut(10) and ScaleOut(9) => ScaleOut(10)`
- `ScaleIn(3) and ScaleIn(4) => ScaleIn(4)`

@@ -1,15 +0,0 @@
---
layout: docs
page_title: Internals
description: >
This section covers the internals of the Nomad Autoscaler and explains
technical details of its operation.
---
# Nomad Autoscaler Internals
This section covers the internals of the Nomad Autoscaler and explains the
technical details of how it functions, its architecture, and sub-systems.
- [Autoscaler plugins](/tools/autoscaling/internals/plugins)
- [Check calculations](/tools/autoscaling/internals/checks)

@@ -14,6 +14,10 @@ Multiple tiers can be defined by declaring more than one `check` in the
same scaling policy. If there is any overlap between the bounds, the [safest
`check`][internals_check] will be used.
~> **Note:** When using the `threshold` strategy with multiple checks, make sure
they all have the same [`group`][policy_group] value, otherwise your target
may not be able to scale down.
## Agent Configuration Options
```hcl
@@ -29,6 +33,8 @@ policy {
# ...
check "high-memory-usage" {
# ...
group = "memory-usage"
strategy "threshold" {
upper_bound = 100
lower_bound = 70
@@ -36,8 +42,10 @@ policy {
}
}
check "low-memory-traffic" {
check "low-memory-usage" {
# ...
group = "memory-usage"
strategy "threshold" {
upper_bound = 30
lower_bound = 0
@@ -66,7 +74,8 @@ policy {
as the new target count. Conflicts with `delta` and `percentage`.
- `within_bounds_trigger` `(int: 5)` - The number of data points in the query
result time series that must be within the bound valus to trigger the action.
result time series that must be within the bound values to trigger the
action.
At least one of `lower_bound` or `upper_bound` must be defined. If
`lower_bound` is not defined, any value below `upper_bound` is considered
@@ -76,3 +85,4 @@ within bounds. Similarly, if `upper_bound` is not defined, any value above
One, and only one, of `delta`, `percentage`, or `value` must be defined.
[internals_check]: /tools/autoscaling/internals/checks
[policy_group]: /tools/autoscaling/policy#group

@@ -67,6 +67,10 @@ horizontal application scaling or horizontal cluster scaling.
- `query_window` - Defines how far back to query the APM for metrics. It should
be provided as a duration (e.g.: `"5s"`, `"1m"`). Defaults to `1m`.
- `group` - Specifies which checks should be treated as correlated when the
policy is evaluated. Refer to [Check Grouping][concepts_grouping] for more
information.
- `on_error` - Defines how to handle errors during the `check` evaluation.
Possible values are `"fail"` or `"ignore"`. If set to `"fail"` the policy
evaluation will stop in case an error occurs and no scaling action will take
@@ -236,6 +240,7 @@ scaling "mem" {
}
```
[concepts_grouping]: /tools/autoscaling/concepts/policy-eval/checks#check-grouping
[das]: /tools/autoscaling#dynamic-application-sizing
[policy_default_cooldown_agent]: /tools/autoscaling/agent#default_cooldown
[eval_interval_agent]: /tools/autoscaling/agent#default_evaluation_interval

@@ -1,4 +1,8 @@
[
{
"title": "Overview",
"path": "index"
},
{
"title": "Autoscaling",
"routes": [
@@ -6,6 +10,57 @@
"title": "Overview",
"path": "autoscaling"
},
{
"title": "Concepts",
"routes": [
{
"title": "Overview",
"path": "autoscaling/concepts"
},
{
"title": "Policy Evaluation",
"routes": [
{
"title": "Overview",
"path": "autoscaling/concepts/policy-eval"
},
{
"title": "Checks",
"path": "autoscaling/concepts/policy-eval/checks"
},
{
"title": "Node Selector Strategy",
"path": "autoscaling/concepts/policy-eval/node-selector-strategy"
}
]
},
{
"title": "Plugins",
"routes": [
{
"title": "Overview",
"path": "autoscaling/concepts/plugins"
},
{
"title": "Base",
"path": "autoscaling/concepts/plugins/base"
},
{
"title": "APM",
"path": "autoscaling/concepts/plugins/apm"
},
{
"title": "Strategy",
"path": "autoscaling/concepts/plugins/strategy"
},
{
"title": "Target",
"path": "autoscaling/concepts/plugins/target"
}
]
}
]
},
{
"title": "Agent",
"routes": [
@@ -166,48 +221,6 @@
"path": "autoscaling/plugins/external"
}
]
},
{
"title": "Internals",
"routes": [
{
"title": "Overview",
"path": "autoscaling/internals"
},
{
"title": "Checks",
"path": "autoscaling/internals/checks"
},
{
"title": "Node Selector Strategy",
"path": "autoscaling/internals/node-selector-strategy"
},
{
"title": "Plugins",
"routes": [
{
"title": "Overview",
"path": "autoscaling/internals/plugins"
},
{
"title": "Base",
"path": "autoscaling/internals/plugins/base"
},
{
"title": "APM",
"path": "autoscaling/internals/plugins/apm"
},
{
"title": "Strategy",
"path": "autoscaling/internals/plugins/strategy"
},
{
"title": "Target",
"path": "autoscaling/internals/plugins/target"
}
]
}
]
}
]
}

Binary file not shown (image, 105 KiB).

Binary file not shown (image, 77 KiB).

Binary file not shown (image, 224 KiB).

@@ -1 +1,18 @@
module.exports = []
module.exports = [
// Rename and re-arrange Autoscaling Internals section
{
source: '/nomad/tools/autoscaling/internals/:path*',
destination: '/nomad/tools/autoscaling/concepts/:path*',
permanent: true,
},
{
source: '/nomad/tools/autoscaling/concepts/checks',
destination: '/nomad/tools/autoscaling/concepts/policy-eval/checks',
permanent: true,
},
{
source: '/nomad/tools/autoscaling/concepts/node-selector-strategy',
destination: '/nomad/tools/autoscaling/concepts/policy-eval/node-selector-strategy',
permanent: true,
},
]