docs: expand Autoscaling documentation (#14937)

Rename `Internals` section to `Concepts` to match core docs structure and expand on how policies are evaluated. Also include missing documentation for check grouping and fix examples to use the new feature.
2026-01-01 16:05:42 +03:00 · 2022-10-19 17:57:08 -04:00
parent aa5b83bf17
commit 56816f2f93
18 changed files with 281 additions and 86 deletions
--- a/website/content/tools/autoscaling/concepts/index.mdx
+++ b/website/content/tools/autoscaling/concepts/index.mdx
@@ -0,0 +1,52 @@
+---
+layout: docs
+page_title: Autoscaling Concepts
+description: >
+  This section covers concepts of the Nomad Autoscaler and explains
+  technical details of its operation.
+---
+
+# Nomad Autoscaler Concepts
+
+This section covers concepts of the Nomad Autoscaler and explains the technical
+details of how it functions, its architecture, and sub-systems.
+
+The Nomad Autoscaler is modeled around the concept of a closed-loop control
+system. These types of systems are often at the core of self-regulating
+mechanisms because they are able to adjust some value based on the current
+state of the system and some user-provided configuration. An example of a
+closed-loop control system is a thermostat, where you set the desired
+temperature and the appliance will regulate the output of cold and hot air to
+make sure the room stays at the value set.
+
+In closed-loop systems there are a few key components:
+
+* **Setpoint** is the desired output as defined by the user.
+* **Comparator** computes the difference between the setpoint and current
+  state of the system.
+* **Controller** connects all the components together and defines what
+  needs to be done to bring the system closer to the desired state.
+* **Actuator** applies the changes defined by the controller.
+* **System** is the entity being controlled.
+* **Output** is the current value of the system.
+* **Sensor** reads the system output and translates it to a value that can be
+  used by the controller.
+
+[![Closed-loop controller](/img/autoscaling/control-loop.png)](/img/autoscaling/control-loop.png)
+
+The Nomad Autoscaler follows this same base architecture and offloads some of
+the components to [different types of plugins](/tools/autoscaling/concepts/plugins).
+
+* The autoscaling **policy** is how users define their desired outcome and
+  control the Nomad Autoscaler.
+* **Target** is what users want to scale. It can be a job group, where the
+  number of allocations is scaled, or a set of Nomad clients, where the number
+  of nodes is what changes.
+* **Strategy plugins** receive the current status of the scaling target (such
+  as the number of allocations of a group) and metrics of the system to compute
+  what actions need to be taken.
+* **Target plugins** communicate with targets both to read their status and to
+  apply changes defined by the Autoscaler.
+* **APM plugins** read application performance metrics from external sources.
+
+[![Nomad Autoscaler architecture](/img/autoscaling/autoscaler-arch.png)](/img/autoscaling/autoscaler-arch.png)
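
As a rough illustration of the closed-loop components listed in `concepts/index.mdx` above, the sketch below runs a tiny control loop in Python. Everything in it is hypothetical: the proportional rule, the function names, and the fixed total load are assumptions made for the example, not the Autoscaler's implementation.

```python
import math


def control_step(setpoint: float, measured: float, current_count: int) -> int:
    """One pass of the loop: comparator plus controller (hypothetical rule)."""
    # Comparator: how far the measured output is from the setpoint, as a ratio.
    ratio = measured / setpoint
    # Controller: scale the count proportionally, never going below one instance.
    return max(1, math.ceil(current_count * ratio))


def run_loop(setpoint: float, total_load: float, count: int, steps: int) -> int:
    """Simulate the system: a fixed load spread evenly across `count` instances."""
    for _ in range(steps):
        measured = total_load / count                      # Sensor reads the output.
        count = control_step(setpoint, measured, count)    # Actuator applies the new count.
    return count


final_count = run_loop(setpoint=70, total_load=420, count=3, steps=3)
```

With a setpoint of 70 and a total load of 420, the loop grows the count from 3 to 6 and then holds steady, because each instance then measures exactly the setpoint.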
--- a/website/content/tools/autoscaling/internals/plugins/apm.mdx
+++ b/website/content/tools/autoscaling/internals/plugins/apm.mdx
--- a/website/content/tools/autoscaling/internals/plugins/base.mdx
+++ b/website/content/tools/autoscaling/internals/plugins/base.mdx
--- a/website/content/tools/autoscaling/internals/plugins/index.mdx
+++ b/website/content/tools/autoscaling/internals/plugins/index.mdx
--- a/website/content/tools/autoscaling/internals/plugins/strategy.mdx
+++ b/website/content/tools/autoscaling/internals/plugins/strategy.mdx
--- a/website/content/tools/autoscaling/internals/plugins/target.mdx
+++ b/website/content/tools/autoscaling/internals/plugins/target.mdx
--- a/website/content/tools/autoscaling/concepts/policy-eval/checks.mdx
+++ b/website/content/tools/autoscaling/concepts/policy-eval/checks.mdx
@@ -0,0 +1,104 @@
+---
+layout: docs
+page_title: Checks
+description: Learn about how the Autoscaler deals with policy checks.
+---
+
+# Scaling Policy Checks
+
+A scaling policy can include several [checks][policy_check], all of which
+produce a scaling suggestion. Each check can specify its own source of metrics
+data and apply different strategies based on the desired outcome.
+
+```hcl
+policy {
+  # ...
+  check "cpu_allocated_percentage" {
+    source = "prometheus"
+    query  = "..."
+
+    strategy "target-value" {
+      target = 70
+    }
+  }
+
+  check "high-memory-usage" {
+    source = "prometheus"
+    query  = "..."
+    group  = "memory-usage"
+
+    strategy "threshold" {
+      upper_bound = 100
+      lower_bound = 70
+      delta       = 1
+    }
+  }
+
+  check "low-memory-usage" {
+    source = "prometheus"
+    query  = "..."
+    group  = "memory-usage"
+
+    strategy "threshold" {
+      upper_bound = 30
+      lower_bound = 0
+      delta       = -1
+    }
+  }
+}
+```
+
+## Resolving Conflicts
+
+The checks are all executed at the same time during a policy evaluation and
+they can generate conflicting scaling actions. In a scenario like this, the
+Autoscaler iterates over the results and chooses the safest option, which is
+defined as the action that results in retaining the most capacity of the
+resource.
+
+In a scenario where two checks return different desired scaling directions, the
+following logic is applied.
+
+- `ScaleOut and ScaleIn => ScaleOut`
+- `ScaleOut and ScaleNone => ScaleOut`
+- `ScaleIn and ScaleNone => ScaleNone`
+
+In situations where the same action is suggested but with different counts,
+the following logic is applied, where the count is the final desired value.
+
+- `ScaleOut(10) and ScaleOut(9) => ScaleOut(10)`
+- `ScaleIn(3) and ScaleIn(4) => ScaleIn(4)`
+
+## Check Grouping
+
+The above logic for resolving conflicts only works when the checks are
+independent from each other. If you use the same `query` in multiple `check`
+blocks, or if the underlying data being queried is somehow correlated, only
+one check will result in a scaling action.
+
+In the example above, the `high-memory-usage` and `low-memory-usage` checks use
+the same query to retrieve memory usage information. We expect that memory
+usage is either low or high (or neither), but never both at the same time.
+
+Without grouping, the target is never able to reduce its count, since the
+possible resulting actions and the final scaling outcome can only be one of the
+following:
+
+- `ScaleOut and ScaleNone => ScaleOut`
+- `ScaleIn and ScaleNone => ScaleNone`
+- `ScaleNone and ScaleNone => ScaleNone`
+
+To fix this problem, the correlated checks need to be set to the same `group`.
+The Nomad Autoscaler then computes a single scaling action for the entire group
+by applying a slightly different logic:
+
+- `ScaleOut and ScaleIn => ScaleOut`
+- `ScaleOut and ScaleNone => ScaleOut`
+- `ScaleIn and ScaleNone => ScaleIn`
+- `ScaleNone and ScaleNone => ScaleNone`
+
+`ScaleNone` results are ignored unless all checks in the group return it, so a
+group is able to `ScaleIn` a target even when all other checks result in no
+action.
+
+[policy_check]: /tools/autoscaling/policy#check-options
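
The conflict-resolution and grouping rules described in `checks.mdx` above can be sketched in a few lines of Python. This is only an illustration of the documented truth tables: the `Direction`, `Action`, `safest`, and `resolve_group` names are hypothetical and do not come from the Autoscaler code base, and both functions assume at least one action is passed in.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class Direction(IntEnum):
    # Hypothetical ordering: a higher value retains more capacity.
    SCALE_IN = 0
    SCALE_NONE = 1
    SCALE_OUT = 2


@dataclass(frozen=True)
class Action:
    direction: Direction
    count: int = 0  # final desired count; not meaningful for SCALE_NONE


def safest(actions: Iterable[Action]) -> Action:
    """Ungrouped checks: pick the action that retains the most capacity."""
    return max(actions, key=lambda a: (a.direction, a.count))


def resolve_group(actions: Iterable[Action]) -> Action:
    """Checks sharing a `group`: ScaleNone is ignored unless it is unanimous."""
    actions = list(actions)
    active = [a for a in actions if a.direction != Direction.SCALE_NONE]
    return safest(active or actions)
```

For example, `safest([Action(Direction.SCALE_IN, 3), Action(Direction.SCALE_NONE)])` returns the `SCALE_NONE` action, while `resolve_group` on the same input returns `Action(Direction.SCALE_IN, 3)`, which is the difference grouping is meant to make.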
--- a/website/content/tools/autoscaling/concepts/policy-eval/index.mdx
+++ b/website/content/tools/autoscaling/concepts/policy-eval/index.mdx
@@ -0,0 +1,35 @@
+---
+layout: docs
+page_title: Autoscaling Policy Evaluation
+description: >
+  This section covers how scaling policies are evaluated to generate scaling
+  actions.
+---
+
+# Policy Evaluation
+
+When the Nomad Autoscaler [agent] starts, it loads all the policies defined in
+the configured [sources][agent_source] and monitors them for changes. Each
+policy is assigned a handler that periodically sends the policy to a broker
+where it is evaluated by a worker. The frequency at which the policy is
+enqueued is set by its [`evaluation_interval`][policy_eval_interval].
+
+The worker executes a series of steps by calling the different plugins used in
+the policy to determine if a scaling action is needed and then to apply the
+necessary actions. The worker then loops back to evaluate the next policy.
+
+If a scaling action is performed and the policy defines a
+[`cooldown`][policy_cooldown] value, the policy handler waits for the specified
+duration before enqueuing the policy again.
+
+If the policy target is a set of Nomad clients, the target plugin usually
+executes more steps, such as [selecting nodes to be removed][concepts_node_selector]
+and draining them.
+
+[![Scaling policy evaluation pipeline](/img/autoscaling/policy-eval.png)](/img/autoscaling/policy-eval.png)
+
+[agent]: /tools/autoscaling/agent
+[agent_source]: /tools/autoscaling/agent/source
+[concepts_node_selector]: /tools/autoscaling/concepts/policy-eval/node-selector-strategy
+[policy_cooldown]: /tools/autoscaling/policy#cooldown
+[policy_eval_interval]: /tools/autoscaling/policy#evaluation_interval
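
To make the interplay between `evaluation_interval` and `cooldown` in `policy-eval/index.mdx` above more concrete, here is a small simulated-time sketch in Python. It assumes, as a simplification, that after an evaluation that scaled the target the handler waits the cooldown instead of the regular interval; the function name, the integer-second clock, and the `scaled_at` input are all hypothetical.

```python
from typing import List, Set


def enqueue_times(
    evaluation_interval: int,
    cooldown: int,
    scaled_at: Set[int],
    horizon: int,
) -> List[int]:
    """Return the simulated seconds at which a handler enqueues its policy.

    `scaled_at` holds the enqueue times whose evaluation resulted in a
    scaling action (an input to the simulation, not something it computes).
    """
    times = []
    now = 0
    while now <= horizon:
        times.append(now)
        # Assumption: a scaling action with a cooldown defined delays the next
        # enqueue by the cooldown; otherwise the regular interval applies.
        if now in scaled_at and cooldown > 0:
            now += cooldown
        else:
            now += evaluation_interval
    return times


schedule = enqueue_times(evaluation_interval=10, cooldown=60, scaled_at={20}, horizon=100)
```

In this run the policy is enqueued at 0, 10, and 20 seconds, the evaluation at 20 seconds scales the target, and the next enqueue is held back until 80 seconds.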
--- a/website/content/tools/autoscaling/concepts/policy-eval/node-selector-strategy.mdx
+++ b/website/content/tools/autoscaling/concepts/policy-eval/node-selector-strategy.mdx
--- a/website/content/tools/autoscaling/internals/checks.mdx
+++ b/website/content/tools/autoscaling/internals/checks.mdx
@@ -1,26 +0,0 @@
---
-layout: docs
-page_title: Checks
-description: Learn about how the Autoscaler deals with policy checks.
---
-
-# Nomad Autoscaler Check Calculations
-
-A scaling policy can include several checks all of which produce a scaling
-suggesting. The checks are executed at the same time during a policy evaluation
-and the results can conflict with each other. In a scenario like this, the
-autoscaler iterates the results the chooses the safest result which results in
-retaining the most capacity of the resource.
-
-In a scenario where two checks return different desired directions, the following
-logic is applied.
-
- `ScaleOut and ScaleIn => ScaleOut`
- `ScaleOut and ScaleNone => ScaleOut`
- `ScaleIn and ScaleNone => ScaleNone`
-
-In situations where the two same actions are suggested, but with different counts the
-following logic is applied, where the count is the absolute desired value.
-
- `ScaleOut(10) and ScaleOut(9) => ScaleOut(10)`
- `ScaleIn(3) and ScaleIn(4) => ScaleIn(4)`
--- a/website/content/tools/autoscaling/internals/index.mdx
+++ b/website/content/tools/autoscaling/internals/index.mdx
@@ -1,15 +0,0 @@
---
-layout: docs
-page_title: Internals
-description: >
-  This section covers the internals of the Nomad Autoscaler and explains
-  technical details of its operation.
---
-
-# Nomad Autoscaler Internals
-
-This section covers the internals of the Nomad Autoscaler and explains the
-technical details of how it functions, its architecture, and sub-systems.
-
- [Autoscaler plugins](/tools/autoscaling/internals/plugins)
- [Check calculations](/tools/autoscaling/internals/checks)
--- a/website/content/tools/autoscaling/plugins/strategy/threshold.mdx
+++ b/website/content/tools/autoscaling/plugins/strategy/threshold.mdx
@@ -14,6 +14,10 @@ Multiple tiers can be defined by declaring more than one `check` in the
 same scaling policy. If there is any overlap between the bounds, the [safest
 `check`][internals_check] will be used.

+~> **Note:** When using the `threshold` strategy with multiple checks, make sure
+  they all have the same [`group`][policy_group] value, otherwise your target
+  may not be able to scale down.
+
 ## Agent Configuration Options

 ```hcl
@@ -29,6 +33,8 @@ policy {
  # ...
  check "high-memory-usage" {
    # ...
+    group = "memory-usage"
+
    strategy "threshold" {
      upper_bound = 100
      lower_bound = 70
@@ -36,8 +42,10 @@ policy {
    }
  }

-  check "low-memory-traffic" {
+  check "low-memory-usage" {
    # ...
+    group = "memory-usage"
+
    strategy "threshold" {
      upper_bound = 30
      lower_bound = 0
@@ -66,7 +74,8 @@ policy {
  as the new target count. Conflicts with `delta` and `percentage`.

 - `within_bounds_trigger` `(int: 5)` - The number of data points in the query
-  result time series that must be within the bound valus to trigger the action.
+  result time series that must be within the bound values to trigger the
+  action.

 At least one of `lower_bound` or `upper_bound` must be defined. If
 `lower_bound` is not defined, any value below `upper_bound` is considered
@@ -76,3 +85,4 @@ within bounds. Similarly, if `upper_bound` is not defined, any value above
 One, and only one, of `delta`, `percentage`, or `value` must be defined.

 [internals_check]: /tools/autoscaling/internals/checks
+[policy_group]: /tools/autoscaling/policy#group
--- a/website/content/tools/autoscaling/policy.mdx
+++ b/website/content/tools/autoscaling/policy.mdx
@@ -67,6 +67,10 @@ horizontal application scaling or horizontal cluster scaling.
 - `query_window` - Defines how far back to query the APM for metrics. It should
  be provided as a duration (e.g.: `"5s"`, `"1m"`). Defaults to `1m`.

+- `group` - Specifies which checks should be treated as correlated when the policy
+  is evaluated. Refer to [Check Grouping][concepts_grouping] for more
+  information.
+
 - `on_error` - Defines how to handle errors during the `check` evaluation.
  Possible values are `"fail"` or `"ignore"`. If set to `"fail"` the policy
  evaluation will stop in case an error occurs and not scaling action will take
@@ -236,6 +240,7 @@ scaling "mem" {
 }
 ```

+[concepts_grouping]: /tools/autoscaling/concepts/policy-eval/checks#check-grouping
 [das]: /tools/autoscaling#dynamic-application-sizing
 [policy_default_cooldown_agent]: /tools/autoscaling/agent#default_cooldown
 [eval_interval_agent]: /tools/autoscaling/agent#default_evaluation_interval
--- a/website/data/tools-nav-data.json
+++ b/website/data/tools-nav-data.json
@@ -1,4 +1,8 @@
 [
+  {
+    "title": "Overview",
+    "path": "index"
+  },
  {
    "title": "Autoscaling",
    "routes": [
@@ -6,6 +10,57 @@
        "title": "Overview",
        "path": "autoscaling"
      },
+      {
+        "title": "Concepts",
+        "routes": [
+          {
+            "title": "Overview",
+            "path": "autoscaling/concepts"
+          },
+          {
+            "title": "Policy Evaluation",
+            "routes": [
+              {
+                "title": "Overview",
+                "path": "autoscaling/concepts/policy-eval"
+              },
+              {
+                "title": "Checks",
+                "path": "autoscaling/concepts/policy-eval/checks"
+              },
+              {
+                "title": "Node Selector Strategy",
+                "path": "autoscaling/concepts/policy-eval/node-selector-strategy"
+              }
+            ]
+          },
+          {
+            "title": "Plugins",
+            "routes": [
+              {
+                "title": "Overview",
+                "path": "autoscaling/concepts/plugins"
+              },
+              {
+                "title": "Base",
+                "path": "autoscaling/concepts/plugins/base"
+              },
+              {
+                "title": "APM",
+                "path": "autoscaling/concepts/plugins/apm"
+              },
+              {
+                "title": "Strategy",
+                "path": "autoscaling/concepts/plugins/strategy"
+              },
+              {
+                "title": "Target",
+                "path": "autoscaling/concepts/plugins/target"
+              }
+            ]
+          }
+        ]
+      },
      {
        "title": "Agent",
        "routes": [
@@ -166,48 +221,6 @@
            "path": "autoscaling/plugins/external"
          }
        ]
-      },
-      {
-        "title": "Internals",
-        "routes": [
-          {
-            "title": "Overview",
-            "path": "autoscaling/internals"
-          },
-          {
-            "title": "Checks",
-            "path": "autoscaling/internals/checks"
-          },
-          {
-            "title": "Node Selector Strategy",
-            "path": "autoscaling/internals/node-selector-strategy"
-          },
-          {
-            "title": "Plugins",
-            "routes": [
-              {
-                "title": "Overview",
-                "path": "autoscaling/internals/plugins"
-              },
-              {
-                "title": "Base",
-                "path": "autoscaling/internals/plugins/base"
-              },
-              {
-                "title": "APM",
-                "path": "autoscaling/internals/plugins/apm"
-              },
-              {
-                "title": "Strategy",
-                "path": "autoscaling/internals/plugins/strategy"
-              },
-              {
-                "title": "Target",
-                "path": "autoscaling/internals/plugins/target"
-              }
-            ]
-          }
-        ]
      }
    ]
  }
--- a/website/public/img/autoscaling/autoscaler-arch.png
+++ b/website/public/img/autoscaling/autoscaler-arch.png
--- a/website/public/img/autoscaling/control-loop.png
+++ b/website/public/img/autoscaling/control-loop.png
--- a/website/public/img/autoscaling/policy-eval.png
+++ b/website/public/img/autoscaling/policy-eval.png
--- a/website/redirects.js
+++ b/website/redirects.js
@@ -1 +1,18 @@
-module.exports = []
+module.exports = [
+  // Rename and re-arrange Autoscaling Internals section
+  {
+    source: '/nomad/tools/autoscaling/internals/:path*',
+    destination: '/nomad/tools/autoscaling/concepts/:path*',
+    permanent: true,
+  },
+  {
+    source: '/nomad/tools/autoscaling/concepts/checks',
+    destination: '/nomad/tools/autoscaling/concepts/policy-eval/checks',
+    permanent: true,
+  },
+  {
+    source: '/nomad/tools/autoscaling/concepts/node-selector-strategy',
+    destination: '/nomad/tools/autoscaling/concepts/policy-eval/node-selector-strategy',
+    permanent: true,
+  },
+]