Update operating a job, upgrade guide (#2913)
* Update operating a job, upgrade guide

  This PR updates the guide for updating a job to reflect the changes in Nomad 0.6

* Feedback changes
* Feedback
* Feedback
@@ -3,9 +3,8 @@ layout: "docs"
page_title: "Blue/Green & Canary Deployments - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-blue-green-deployments"
description: |-
  Nomad has built-in support for doing blue/green and canary deployments to more
  safely update existing applications and services.
---

# Blue/Green & Canary Deployments

@@ -17,136 +16,438 @@ organizations prefer to put a "canary" build into production or utilize a
technique known as a "blue/green" deployment to ensure a safe application
rollout to production while minimizing downtime.

## Blue/Green Deployments

Blue/Green deployments have several other names including Red/Black or A/B, but
the concept is generally the same. In a blue/green deployment, there are two
application versions. Only one application version is active at a time, except
during the transition phase from one version to the next. The term "active"
tends to mean "receiving traffic" or "in service".

Imagine a hypothetical API server which has five instances deployed to
production at version 1.3, and we want to safely upgrade to version 1.4. We want
to create five new instances at version 1.4 and, if they are operating
correctly, promote them and take down the five instances running 1.3. In the
event of failure, we can quickly roll back to 1.3.

To start, we examine our job which is running in production:

```hcl
job "docs" {
  datacenters = ["dc1"]

  # ...

  group "api" {
    count = 5

    update {
      max_parallel     = 1
      canary           = 5
      min_healthy_time = "30s"
      healthy_deadline = "10m"
      auto_revert      = true
    }

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }
}
```

We see that the job has an `update` stanza with `canary` equal to the desired
count. This is what allows us to easily model blue/green deployments. When we
change the job to run the "api-server:1.4" image, Nomad will create 5 new
allocations without touching the original "api-server:1.3" allocations. Below we
can see how this works by changing the image to run the new version:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api" {
     task "api-server" {
       config {
-        image = "api-server:1.3"
+        image = "api-server:1.4"
```

Next we plan and run these changes:

```text
$ nomad plan docs.nomad
+/- Job: "docs"
+/- Task Group: "api" (5 canary, 5 ignore)
  +/- Task: "api-server" (forces create/destroy update)
    +/- Config {
      +/- image: "api-server:1.3" => "api-server:1.4"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Assuming the plan output looks okay, we are ready to run these changes:

```shell
$ nomad run docs.nomad
# ...
```

We can see from the plan output that Nomad is going to create 5 canaries that
are running the "api-server:1.4" image and ignore all the allocations running
the older image. Now if we examine the status of the job we can see that both
the blue ("api-server:1.3") and green ("api-server:1.4") sets are running.

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         10       0       0         0

Latest Deployment
ID          = 32a080c1
Status      = running
Description = Deployment is running but requires promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         false     5        5         5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
6d8eec42  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
7051480e  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
36c6610f  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
410ba474  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
```

Now that we have the new set in production, we can route traffic to it and
validate that the new job version is working properly. Based on whether the new
version is functioning properly or improperly, we will either want to promote or
fail the deployment.

### Promoting the Deployment

After deploying the new image alongside the old version, we have determined it
is functioning properly and we want to transition fully to the new version.
Doing so is as simple as promoting the deployment:

```text
$ nomad deployment promote 32a080c1
==> Monitoring evaluation "61ac2be5"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "32a080c1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "61ac2be5" finished with status "complete"
```

If we look at the job's status, we see that after promotion, Nomad stopped the
older allocations and is only running the new ones. This completes our
blue/green deployment.

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       5         0

Latest Deployment
ID          = 32a080c1
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         true      5        5         5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
6d8eec42  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
7051480e  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
36c6610f  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
410ba474  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
85662a7a  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
```

### Failing the Deployment

After deploying the new image alongside the old version, we have determined it
is not functioning properly and we want to roll back to the old version. Doing
so is as simple as failing the deployment:

```text
$ nomad deployment fail 32a080c1
Deployment "32a080c1-de5a-a4e7-0218-521d8344c328" failed. Auto-reverted to job version 0.

==> Monitoring evaluation "6840f512"
    Evaluation triggered by job "example"
    Evaluation within deployment: "32a080c1"
    Allocation "0ccb732f" modified: node "36e7a123", group "cache"
    Allocation "64d4f282" modified: node "36e7a123", group "cache"
    Allocation "664e33c7" modified: node "36e7a123", group "cache"
    Allocation "a4cb6a4b" modified: node "36e7a123", group "cache"
    Allocation "fdd73bdd" modified: node "36e7a123", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6840f512" finished with status "complete"
```

If we now look at the job's status, we can see that after failing the
deployment, Nomad stopped the new allocations, is only running the old ones, and
has reverted the working copy of the job back to the original specification
running "api-server:1.3".

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       5         0

Latest Deployment
ID          = 6f3f84b3
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Auto Revert  Desired  Placed  Healthy  Unhealthy
cache       true         5        5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
27dc2a42  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
5b7d34bb  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
983b487d  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
d1cbf45a  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
d6b46def  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
0ccb732f  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
64d4f282  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
664e33c7  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
a4cb6a4b  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
fdd73bdd  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC

$ nomad job deployments docs
ID        Job ID   Job Version  Status      Description
6f3f84b3  example  2            successful  Deployment completed successfully
32a080c1  example  1            failed      Deployment marked as failed - rolling back to job version 0
c4c16494  example  0            successful  Deployment completed successfully
```

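The automatic rollback above ("Auto-reverted to job version 0") comes from the
`auto_revert` flag we set in the `update` stanza of the job file at the top of
this guide. The relevant fragment, repeated here for emphasis:

```hcl
update {
  # ...

  # When the deployment is failed, also revert the job to the last stable
  # version, which in this example is version 0.
  auto_revert = true
}
```
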
## Canary Deployments

Canary updates are a useful way to test a new version of a job before beginning
a rolling upgrade. The `update` stanza supports setting the number of canaries
the job operator would like Nomad to create when the job changes via the
`canary` parameter. When the job specification is updated, Nomad creates the
canaries without stopping any allocations from the previous job.

This pattern allows operators to achieve higher confidence in the new job
version because they can route traffic, examine logs, etc., to determine whether
the new application is performing properly.

```hcl
job "docs" {
  datacenters = ["dc1"]

  # ...

  group "api" {
    count = 5

    update {
      max_parallel     = 1
      canary           = 1
      min_healthy_time = "30s"
      healthy_deadline = "10m"
      auto_revert      = true
    }

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }
}
```

In the example above, the `update` stanza tells Nomad to create a single canary
when the job specification is changed. Below we can see how this works by
changing the image to run the new version:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api" {
     task "api-server" {
       config {
-        image = "api-server:1.3"
+        image = "api-server:1.4"
```

Next we plan and run these changes:

```text
$ nomad plan docs.nomad
+/- Job: "docs"
+/- Task Group: "api" (1 canary, 5 ignore)
  +/- Task: "api-server" (forces create/destroy update)
    +/- Config {
      +/- image: "api-server:1.3" => "api-server:1.4"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

$ nomad run docs.nomad
# ...
```

We can see from the plan output that Nomad is going to create 1 canary that
will run the "api-server:1.4" image and ignore all the allocations running
the older image. If we inspect the status we see that the canary is running
alongside the older version of the job:

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         6        0       0         0

Latest Deployment
ID          = 32a080c1
Status      = running
Description = Deployment is running but requires promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         false     5        1         1       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
```

Now if we promote the canary, this will trigger a rolling update to replace the
remaining allocations running the older image. The rolling update will happen at
a rate of `max_parallel`, so in this case one allocation at a time:

```text
$ nomad deployment promote 37033151
==> Monitoring evaluation "37033151"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "ed28f6c2"
    Allocation "f5057465" created: node "f6646949", group "cache"
    Allocation "f5057465" status changed: "pending" -> "running"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "37033151" finished with status "complete"

$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 20:28:59 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       2         0

Latest Deployment
ID          = ed28f6c2
Status      = running
Description = Deployment is running

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         true      5        1         2       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
f5057465  f6646949  api         1        run      running   07/26/17 20:29:23 UTC
b1c88d20  f6646949  api         1        run      running   07/26/17 20:28:59 UTC
1140bacf  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
1958a34a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
4bda385a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
62d96f06  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
f58abbb2  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
```

Alternatively, if the canary was not performing properly, we could abandon the
change using the `nomad deployment fail` command, similar to the blue/green
example.

@@ -12,10 +12,11 @@ description: |-

Most applications are long-lived and require updates over time. Whether you are
deploying a new version of your web application or upgrading to a new version of
Redis, Nomad has built-in support for rolling, blue/green, and canary updates.
When a job specifies a rolling update, Nomad uses task state and health check
information in order to detect allocation health and minimize or eliminate
downtime. This section and subsections will explore how to do so safely with
Nomad.

Please see one of the guides below or use the navigation on the left:

@@ -4,35 +4,71 @@ page_title: "Rolling Upgrades - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-rolling-upgrades"
description: |-
  In order to update a service while reducing downtime, Nomad provides a
  built-in mechanism for rolling upgrades. Rolling upgrades incrementally
  transition jobs between versions and use health check information to
  reduce downtime.
---

# Rolling Upgrades

Nomad supports rolling updates as a first-class feature. To enable rolling
updates, a job or task group is annotated with a high-level description of the
update strategy using the [`update` stanza][update]. Under the hood, Nomad
handles limiting parallelism, interfacing with Consul to determine service
health, and even automatically reverting to an older, healthy job when a
deployment fails.

## Enabling Rolling Updates

Rolling updates are enabled by adding the [`update` stanza][update] to the job
specification. The `update` stanza may be placed at the job level or in an
individual task group. When placed at the job level, the update strategy is
inherited by all task groups in the job. When placed at both the job and group
level, the `update` stanzas are merged, with group stanzas taking precedence
over job-level stanzas. See the [`update` stanza
documentation](/docs/job-specification/update.html#upgrade-stanza-inheritance)
for an example.

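As a quick, hypothetical sketch of that inheritance (the group name and values
below are made up for illustration):

```hcl
job "docs" {
  # Job-level defaults inherited by every group in this job.
  update {
    max_parallel     = 1
    healthy_deadline = "5m"
  }

  group "api" {
    # Merged with the job-level stanza; group values take precedence, so this
    # group uses max_parallel = 2 while keeping the 5 minute healthy_deadline.
    update {
      max_parallel     = 2
      min_healthy_time = "30s"
    }

    # ...
  }
}
```

The example below, which the rest of this guide builds on, uses a single
group-level `update` stanza:
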
```hcl
job "geo-api-server" {
  # ...

  group "api-server" {
    count = 6

    # Add an update stanza to enable rolling updates of the service
    update {
      max_parallel     = 2
      min_healthy_time = "30s"
      healthy_deadline = "10m"
    }

    task "server" {
      driver = "docker"

      config {
        image = "geo-api-server:0.1"
      }

      # ...
    }
  }
}
```

In this example, by adding the simple `update` stanza to the "api-server" task
group, we inform Nomad that updates to the group should be handled with a
rolling update strategy.

Thus when a change is made to the job file that requires new allocations to be
made, Nomad will deploy 2 allocations at a time and require that the allocations
be running in a healthy state for 30 seconds before deploying more instances of
the new version.

By default Nomad determines allocation health by ensuring that all tasks in the
group are running and that any [service
checks](/docs/job-specification/service.html#check-parameters) the tasks
register are passing.

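For example, a task might register a service with an HTTP check along the lines
of the following sketch (the service name, port label, and path are illustrative
and assume the task exposes a network port labeled "http"):

```hcl
task "server" {
  driver = "docker"

  # ...

  service {
    name = "geo-api-server"
    port = "http"

    # The allocation only counts as healthy once this check is passing.
    check {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }
}
```
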
## Planning Changes
@@ -40,37 +76,36 @@ Suppose we make a change to a file to upgrade the version of a Docker container
that is configured with the same rolling update strategy from above.

```diff
@@ -2,6 +2,8 @@ job "geo-api-server" {
   group "api-server" {
     task "server" {
       driver = "docker"

       config {
-        image = "geo-api-server:0.1"
+        image = "geo-api-server:0.2"
```

The [`nomad plan` command](/docs/commands/plan.html) allows us to visualize the
series of steps the scheduler would perform. We can analyze this output to
confirm it is correct:

```text
$ nomad plan geo-api-server.nomad
```

Here is some sample output:

```text
+/- Job: "geo-api-server"
+/- Task Group: "api-server" (2 create/destroy update, 4 ignore)
  +/- Task: "server" (forces create/destroy update)
    +/- Config {
      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will begin a rolling update by creating and
destroying 2 allocations first, and, for the time being, ignoring 4 of the old
allocations, matching our configured `max_parallel`.

## Inspecting a Deployment

After running the plan we can submit the updated job by simply running `nomad
run`. Once run, Nomad will begin the rolling upgrade of our service by placing 2
allocations of the new version at a time and taking 2 of the old allocations
down.

We can inspect the current state of a rolling deployment using `nomad status`:

```text
$ nomad status geo-api-server
ID            = geo-api-server
Name          = geo-api-server
Submit Date   = 07/26/17 18:08:56 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api-server  0       0         6        0       4         0

Latest Deployment
ID          = c5b34665
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
api-server  6        4       2        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
9fc96fcc  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
2521c47a  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
```

Here we can see that Nomad has created a deployment to conduct the rolling
upgrade from job version 0 to 1, has placed 4 instances of the new job, and has
stopped 4 of the old instances. If we look at the deployed allocations, we also
can see that Nomad has placed 4 instances of job version 1 but only considers 2
of them healthy. This is because the 2 newest placed allocations haven't been
healthy for the required 30 seconds yet.

If we wait for the deployment to complete and re-issue the command, we get the
following:

```text
$ nomad status geo-api-server
ID            = geo-api-server
Name          = geo-api-server
Submit Date   = 07/26/17 18:08:56 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         6        0       6         0

Latest Deployment
ID          = c5b34665
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
cache       6        6       6        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
d42a1656  f7b1ee08  api-server  1        run      running   07/26/17 18:10:10 UTC
401daaf9  f7b1ee08  api-server  1        run      running   07/26/17 18:10:00 UTC
14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
9fc96fcc  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
2521c47a  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
```

Nomad has successfully transitioned the group to running the updated canary and
did so with no downtime to our service by ensuring only two allocations were
changed at a time and that the newly placed allocations ran successfully. Had
any of the newly placed allocations failed their health check, Nomad would have
aborted the deployment and stopped placing new allocations. If configured, Nomad
can automatically revert back to the old job definition when the deployment
fails.

## Auto Reverting on Failed Deployments

In the case we do a deployment in which the new allocations are unhealthy, Nomad
will fail the deployment and stop placing new instances of the job. It
optionally supports automatically reverting back to the last stable job version
on deployment failure. Nomad keeps a history of submitted jobs and whether the
job version was stable. A job is considered stable if all its allocations are
healthy.

To enable this we simply add the `auto_revert` parameter to the `update` stanza:

```hcl
update {
  max_parallel     = 2
  min_healthy_time = "30s"
  healthy_deadline = "10m"

  # Enable automatically reverting to the last stable job on a failed
  # deployment.
  auto_revert = true
}
```

Now imagine we want to update our image to "geo-api-server:0.3" but we instead
update it to the below and run the job:

```diff
@@ -2,6 +2,8 @@ job "geo-api-server" {
   group "api-server" {
     task "server" {
       driver = "docker"

       config {
-        image = "geo-api-server:0.2"
+        image = "geo-api-server:0.33"
```

If we run `nomad job deployments` we can see that the deployment fails and Nomad
auto-reverts to the last stable job:

```text
$ nomad job deployments geo-api-server
ID        Job ID          Job Version  Status      Description
0c6f87a5  geo-api-server  3            successful  Deployment completed successfully
b1712b7f  geo-api-server  2            failed      Failed due to unhealthy allocations - rolling back to job version 1
3eee83ce  geo-api-server  1            successful  Deployment completed successfully
72813fcf  geo-api-server  0            successful  Deployment completed successfully
```

Nomad job versions increment monotonically, so even though Nomad reverted to the
job specification at version 1, it creates a new job version. We can see the
differences between a job's versions and how Nomad auto-reverted the job using
the `job history` command:

```text
$ nomad job history -p geo-api-server
Version     = 3
Stable      = true
Submit Date = 07/26/17 18:44:18 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.33" => "geo-api-server:0.2"
        }

Version     = 2
Stable      = false
Submit Date = 07/26/17 18:45:21 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.2" => "geo-api-server:0.33"
        }

Version     = 1
Stable      = true
Submit Date = 07/26/17 18:44:18 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
        }

Version     = 0
Stable      = true
Submit Date = 07/26/17 18:43:43 UTC
```

We can see that Nomad considered the job versions running "geo-api-server:0.1"
and "geo-api-server:0.2" stable, but job version 2, which submitted the
incorrect image, is marked as unstable. This is because the placed allocations
failed to start. Nomad detected that the deployment failed and, as such, created
job version 3, which reverted back to the last healthy job.

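From here the fix is simply to correct the typo and resubmit the job with the
image we originally intended; a sketch of the corrected fragment:

```hcl
config {
  # The tag we meant to deploy, instead of the mistyped "geo-api-server:0.33".
  image = "geo-api-server:0.3"
}
```

Planning and running the job again would then start a new deployment from the
reverted, stable version.
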
[update]: /docs/job-specification/update.html "Nomad update Stanza"