Move check_restart to its own section.

2026-01-04 17:35:43 +03:00 · 2017-09-14 16:44:27 -07:00
parent 5141c957f7
commit 1564e1c4b3
3 changed files with 157 additions and 66 deletions
--- a/website/source/docs/job-specification/check_restart.html.md
+++ b/website/source/docs/job-specification/check_restart.html.md
@@ -0,0 +1,151 @@
+---
+layout: "docs"
+page_title: "check_restart Stanza - Job Specification"
+sidebar_current: "docs-job-specification-check_restart"
+description: |-
+  The "check_restart" stanza instructs Nomad when to restart tasks with
+  unhealthy service checks.
+---
+
+# `check_restart` Stanza
+
+<table class="table table-bordered table-striped">
+  <tr>
+    <th width="120">Placement</th>
+    <td>
+      <code>job -> group -> task -> service -> **check_restart**</code>
+    </td>
+  </tr>
+  <tr>
+    <th width="120">Placement</th>
+    <td>
+      <code>job -> group -> task -> service -> check -> **check_restart**</code>
+    </td>
+  </tr>
+</table>
+
+As of Nomad 0.7 the `check_restart` stanza instructs Nomad when to restart
+tasks with unhealthy service checks. When a health check in Consul has been
+unhealthy for the `limit` specified in a `check_restart` stanza, the task is
+restarted according to the task group's [`restart` policy][restart_stanza]. The
+`check_restart` settings apply to [`check`s][check_stanza], but may also be
+placed on [`service`s][service_stanza] to apply to all checks on a service.
+
+```hcl
+job "mysql" {
+  group "mysqld" {
+
+    restart {
+      attempts = 3
+      delay    = "10s"
+      interval = "10m"
+      mode     = "fail"
+    }
+
+    task "server" {
+      service {
+        tags = ["leader", "mysql"]
+
+        port = "db"
+
+        check {
+          type     = "tcp"
+          port     = "db"
+          interval = "10s"
+          timeout  = "2s"
+        }
+
+        check {
+          type     = "script"
+          name     = "check_table"
+          command  = "/usr/local/bin/check_mysql_table_status"
+          args     = ["--verbose"]
+          interval = "60s"
+          timeout  = "5s"
+
+          check_restart {
+            limit = 3
+            grace = "90s"
+
+            ignore_warnings = false
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+- `limit` `(int: 0)` - Restart the task when a health check has failed `limit`
+  times. For example, `1` causes a restart on the first failure. The default,
+  `0`, disables health check based restarts. Failures must be consecutive. A
+  single passing check will reset the count, so flapping services may not be
+  restarted.
+
+- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
+  before checking its health.
+
+- `ignore_warnings` `(bool: false)` - By default checks with both `critical`
+  and `warning` statuses are considered unhealthy. Setting `ignore_warnings =
+  true` treats a `warning` status like `passing` and will not trigger a restart.
+
+## Example Behavior
+
+The `mysql` example above would have the following behavior:
+
+```hcl
+check_restart {
+  # ...
+  grace = "90s"
+  # ...
+}
+```
+
+When the `server` task first starts and is registered in Consul, its health
+will not be checked for 90 seconds. This gives the server time to start up.
+
+```hcl
+check_restart {
+  limit = 3
+  # ...
+}
+```
+
+After the grace period, if the script check fails, it has 180 seconds (`60s
+interval * 3 limit`) to pass before a restart is triggered. Once a restart is
+triggered, the task group's [`restart` policy][restart_stanza] takes control:
+
+```hcl
+restart {
+  # ...
+  delay    = "10s"
+  # ...
+}
+```
+
+The [`restart` stanza][restart_stanza] controls the restart behavior of the
+task. In this case it will wait 10 seconds before restarting. Note that even if
+the check passes in this time the restart will still occur.
+
+Once the task restarts Nomad waits the `grace` period again before starting to
+check the task's health.
+
+
+```hcl
+restart {
+  attempts = 3
+  # ...
+  interval = "10m"
+  mode     = "fail"
+}
+```
+
+If the check continues to fail, the task will be restarted up to `attempts`
+times within an `interval`. If the `restart` attempts are reached within the
+`interval` then the `mode` controls the behavior. In this case the task would
+fail and not be restarted again. See the [`restart` stanza][restart_stanza] for
+details.
+
+[check_stanza]:  /docs/job-specification/service.html#check-parameters "check stanza"
+[restart_stanza]: /docs/job-specification/restart.html "restart stanza"
+[service_stanza]: /docs/job-specification/service.html "service stanza"
--- a/website/source/docs/job-specification/service.html.md
+++ b/website/source/docs/job-specification/service.html.md
@@ -117,6 +117,8 @@ scripts.
 - `args` `(array<string>: [])` - Specifies additional arguments to the
  `command`. This only applies to script-based health checks.

+- `check_restart` - See [`check_restart` stanza][check_restart_stanza].
+
 - `command` `(string: <varies>)` - Specifies the command to run for performing
  the health check. The script must exit: 0 for passing, 1 for warning, or any
  other value for a failing health check. This is required for script-based
@@ -168,72 +170,6 @@ scripts.
 - `tls_skip_verify` `(bool: false)` - Skip verifying TLS certificates for HTTPS
  checks. Requires Consul >= 0.7.2.

-#### `check_restart` Stanza
-
-As of Nomad 0.7 `check` stanzas may include a `check_restart` stanza to restart
-tasks with unhealthy checks. Restarts use the parameters from the
-[`restart`][restart_stanza] stanza, so if a task group has the default `15s`
-delay, tasks won't be restarted for an extra 15 seconds after the
-`check_restart` block considers it failed. `check_restart` stanzas have the
-follow parameters:
-
- `limit` `(int: 0)` - Restart task after `limit` failing health checks. For
-  example 1 causes a restart on the first failure. The default, `0`, disables
-  healtcheck based restarts. Failures must be consecutive. A single passing
-  check will reset the count, so flapping services may not be restarted.
-
- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
-  before checking its health. On restarts the `delay` and max jitter is added
-  to the grace period to prevent checking a task's health before it has
-  restarted.
-
- `ignore_warnings` `(bool: false)` - By default checks with both `critical`
-  and `warning` statuses are considered unhealthy. Setting `ignore_warnings =
-  true` treats a `warning` status like `passing` and will not trigger a restart.
-
-For example:
-
-```hcl
-restart {
-  delay = "8s"
-}
-
-task "mysqld" {
-  service {
-    # ...
-    check {
-      type     = "script"
-      name     = "check_table"
-      command  = "/usr/local/bin/check_mysql_table_status"
-      args     = ["--verbose"]
-      interval = "20s"
-      timeout  = "5s"
-  
-      check_restart {
-        # Restart the task after 3 consecutive failed checks (180s)
-        limit = 3
-  
-        # Ignore failed checks for 90s after a service starts or restarts
-        grace = "90s"
-  
-        # Treat warnings as unhealthy (the default)
-        ignore_warnings = false
-      }
-    }
-  }
-}
-```
-
-In this example the `mysqld` task has `90s` from startup to begin passing
-healthchecks. After the grace period if `mysqld` would remain unhealthy for
-`60s` (as determined by `limit * interval`) it would be restarted after `8s`
-(as determined by the `restart.delay`). Nomad would then wait `100s` (as
-determined by `grace + delay + (delay * 0.25)`) before checking `mysqld`'s
-health again.
-
-~> `check_restart` stanzas may also be placed in `service` stanzas to apply the
-   same restart logic to multiple checks.
-
 #### `header` Stanza

 HTTP checks may include a `header` stanza to set HTTP headers. The `header`
@@ -388,6 +324,7 @@ service {
 [qemu driver][qemu] since the Nomad client does not have access to the file
 system of a task for that driver.</small>

+[check_restart_stanza]: /docs/job-specification/check_restart.html "check_restart stanza"
 [service-discovery]: /docs/service-discovery/index.html "Nomad Service Discovery"
 [interpolation]: /docs/runtime/interpolation.html "Nomad Runtime Interpolation"
 [network]: /docs/job-specification/network.html "Nomad network Job Specification"
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -26,6 +26,9 @@
          <li<%= sidebar_current("docs-job-specification-artifact")%>>
            <a href="/docs/job-specification/artifact.html">artifact</a>
          </li>
+          <li<%= sidebar_current("docs-job-specification-check_restart")%>>
+            <a href="/docs/job-specification/check_restart.html">check_restart</a>
+          </li>
          <li<%= sidebar_current("docs-job-specification-constraint")%>>
            <a href="/docs/job-specification/constraint.html">constraint</a>
          </li>
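
Note on the new page: it says `check_restart` may be placed on a `service` to apply to all checks on that service, but the example only shows the check-level form. A minimal sketch of the service-level placement, reusing the two checks from the `mysql` example, might look like the following. The `limit` and `grace` values are carried over from that example purely for illustration and are not recommended settings.

```hcl
service {
  tags = ["leader", "mysql"]
  port = "db"

  # Service-level placement: per the page text, these settings apply to
  # all checks defined on this service.
  check_restart {
    limit           = 3
    grace           = "90s"
    ignore_warnings = false
  }

  check {
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }

  check {
    type     = "script"
    name     = "check_table"
    command  = "/usr/local/bin/check_mysql_table_status"
    args     = ["--verbose"]
    interval = "60s"
    timeout  = "5s"
  }
}
```

With this placement the `tcp` check would also count toward restarts. Going by the `interval * limit` arithmetic used on the page, three consecutive failures at a `10s` interval would trigger a restart after roughly 30 seconds, compared with 180 seconds for the script check.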