diff --git a/website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md b/website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md
index 0df8446f2..351e8416f 100644
--- a/website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md
+++ b/website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md
@@ -3,9 +3,8 @@ layout: "docs"
 page_title: "Blue/Green & Canary Deployments - Operating a Job"
 sidebar_current: "docs-operating-a-job-updating-blue-green-deployments"
 description: |-
-  Nomad supports blue/green and canary deployments through the declarative job
-  file syntax. By specifying multiple task groups, Nomad allows for easy
-  configuration and rollout of blue/green and canary deployments.
+  Nomad has built-in support for doing blue/green and canary deployments to more
+  safely update existing applications and services.
 ---

 # Blue/Green & Canary Deployments
@@ -17,136 +16,438 @@ organizations prefer to put a "canary" build into production or utilize a
 technique known as a "blue/green" deployment to ensure a safe application
 rollout to production while minimizing downtime.

+## Blue/Green Deployments
+
 Blue/Green deployments have several other names including Red/Black or A/B, but
 the concept is generally the same. In a blue/green deployment, there are two
 application versions. Only one application version is active at a time, except
 during the transition phase from one version to the next. The term "active"
 tends to mean "receiving traffic" or "in service".

-Imagine a hypothetical API server which has ten instances deployed to production
-at version 1.3, and we want to safely upgrade to version 1.4. After the new
-version has been approved to production, we may want to do a small rollout. In
-the event of failure, we can quickly rollback to 1.3.
+Imagine a hypothetical API server which has five instances deployed to
+production at version 1.3, and we want to safely upgrade to version 1.4. We want
+to create five new instances running version 1.4 and, once they are operating
+correctly, promote them and take down the five instances running version 1.3.
+In the event of failure, we can quickly roll back to 1.3.

-To start, version 1.3 is considered the active set and version 1.4 is the
-desired set. Here is a sample job file which models the transition from version
-1.3 to version 1.4 using a blue/green deployment.
+To start, we examine our job which is running in production:

 ```hcl
 job "docs" {
-  datacenters = ["dc1"]
+  # ...

-  group "api-green" {
-    count = 10
+  group "api" {
+    count = 5
+
+    update {
+      max_parallel = 1
+      canary = 5
+      min_healthy_time = "30s"
+      healthy_deadline = "10m"
+      auto_revert = true
+    }

     task "api-server" {
       driver = "docker"

       config {
         image = "api-server:1.3"
       }
     }
   }
-
-  group "api-blue" {
-    count = 0
-
-    task "api-server" {
-      driver = "docker"
-
-      config {
-        image = "api-server:1.4"
-      }
-    }
-  }
 }
 ```

-It is clear that the active group is "api-green" since it has a non-zero count.
-To transition to v1.4 (api-blue), we increase the count of api-blue to match
-that of api-green.
+We see that it has an `update` stanza that sets `canary` equal to the desired
+count. This is what allows us to easily model blue/green deployments. When we
+change the job to run the "api-server:1.4" image, Nomad will create 5 new
+allocations without touching the original "api-server:1.3" allocations.
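+
+Each parameter in this `update` stanza shapes how the deployment proceeds. The
+stanza above is repeated below with a brief note on each field:
+
+```hcl
+update {
+  # Number of allocations that may be updated at the same time once the
+  # canaries are promoted.
+  max_parallel = 1
+
+  # Number of canary allocations to create when the job changes. Setting this
+  # equal to the group's count (5) creates a full parallel "green" set.
+  canary = 5
+
+  # How long an allocation must be healthy before it counts as healthy for
+  # the deployment.
+  min_healthy_time = "30s"
+
+  # How long Nomad waits for an allocation to become healthy before marking
+  # it unhealthy and failing the deployment.
+  healthy_deadline = "10m"
+
+  # Automatically roll back to the last stable job version if the deployment
+  # fails.
+  auto_revert = true
+}
+```
+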
+Below we can see how this works by changing the image to run the new version:

 ```diff
 @@ -2,6 +2,8 @@ job "docs" {
-  group "api-blue" {
--    count = 0
-+    count = 10
-
-  task "api-server" {
-    driver = "docker"
+  group "api" {
+    task "api-server" {
+      config {
+-        image = "api-server:1.3"
++        image = "api-server:1.4"
 ```

 Next we plan and run these changes:

-```shell
+```text
 $ nomad plan docs.nomad
-```
++/- Job: "docs"
++/- Task Group: "api" (5 canary, 5 ignore)
+  +/- Task: "api-server" (forces create/destroy update)
+    +/- Config {
+      +/- image: "api-server:1.3" => "api-server:1.4"
+    }
-Assuming the plan output looks okay, we are ready to run these changes.
+Scheduler dry-run:
+- All tasks successfully allocated.
+
+Job Modify Index: 7
+To submit the job with version verification run:
+
+nomad run -check-index 7 example.nomad
+
+When running the job with the check-index flag, the job will only be run if the
+server side version matches the job modify index returned. If the index has
+changed, another user has modified the job and the plan's results are
+potentially invalid.
+
-```shell
 $ nomad run docs.nomad
+# ...
 ```

-Our deployment is not yet finished. We are currently running at double capacity,
-so approximately half of our traffic is going to the blue and half is going to
-green. Usually we inspect our monitoring and reporting system. If we are
-experiencing errors, we reduce the count of "api-blue" back to 0. If we are
-running successfully, we change the count of "api-green" to 0.
+We can see from the plan output that Nomad is going to create 5 canaries that
+will run the "api-server:1.4" image and ignore all the allocations running
+the older image. Now if we examine the status of the job we can see that both
+the blue ("api-server:1.3") and green ("api-server:1.4") sets are running.

-```diff
-@@ -2,6 +2,8 @@ job "docs" {
-  group "api-green" {
--    count = 10
-+    count = 0
-
-  task "api-server" {
-    driver = "docker"
+```text
+$ nomad status docs
+ID            = docs
+Name          = docs
+Submit Date   = 07/26/17 19:57:47 UTC
+Type          = service
+Priority      = 50
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+api         0       0         10       0       0         0
+
+Latest Deployment
+ID          = 32a080c1
+Status      = running
+Description = Deployment is running but requires promotion
+
+Deployed
+Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
+api         true         false     5        5         5       5        0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status   Created At
+6d8eec42  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
+7051480e  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
+36c6610f  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
+410ba474  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
+85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
+3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
 ```

-The next time we want to do a deployment, the "green" group becomes our
-transition group, since the "blue" group is currently active.
+Now that we have the new set in production, we can route traffic to it and
+validate the new job version is working properly. Based on whether the new
+version is functioning properly or improperly we will either want to promote or
+fail the deployment.
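+
+One way to route traffic to the green set and validate it before promotion is
+to register the task as a Consul service with a health check and point a load
+balancer at the new instances. A minimal sketch of such a task follows; the
+service name, port label, and `/health` path are illustrative and not part of
+the job above:
+
+```hcl
+task "api-server" {
+  driver = "docker"
+
+  config {
+    image = "api-server:1.4"
+  }
+
+  resources {
+    network {
+      mbits = 10
+      port "http" {}
+    }
+  }
+
+  service {
+    name = "api-server"
+    port = "http"
+
+    # Nomad also consults this check when deciding whether the new
+    # allocations are healthy.
+    check {
+      type     = "http"
+      path     = "/health"
+      interval = "10s"
+      timeout  = "2s"
+    }
+  }
+}
+```
+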
+
+### Promoting the Deployment
+
+After deploying the new image alongside the old version we have determined it
+is functioning properly and we want to transition fully to the new version.
+Doing so is as simple as promoting the deployment:
+
+```text
+$ nomad deployment promote 32a080c1
+==> Monitoring evaluation "61ac2be5"
+    Evaluation triggered by job "docs"
+    Evaluation within deployment: "32a080c1"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "61ac2be5" finished with status "complete"
+```
+
+If we look at the job's status we see that after promotion, Nomad stopped the
+older allocations and is only running the new ones. This now completes our
+blue/green deployment.
+
+```text
+$ nomad status docs
+ID            = docs
+Name          = docs
+Submit Date   = 07/26/17 19:57:47 UTC
+Type          = service
+Priority      = 50
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+api         0       0         5        0       5         0
+
+Latest Deployment
+ID          = 32a080c1
+Status      = successful
+Description = Deployment completed successfully
+
+Deployed
+Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
+api         true         true      5        5         5       5        0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status    Created At
+6d8eec42  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
+7051480e  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
+36c6610f  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
+410ba474  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
+85662a7a  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
+3ac3fe05  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
+4bd51979  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
+2998387b  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
+35b813ee  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
+b53b4289  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
+```
+
+### Failing the Deployment
+
+After deploying the new image alongside the old version we have determined it
+is not functioning properly and we want to roll back to the old version. Doing
+so is as simple as failing the deployment:
+
+```text
+$ nomad deployment fail 32a080c1
+Deployment "32a080c1-de5a-a4e7-0218-521d8344c328" failed. Auto-reverted to job version 0.
+
+==> Monitoring evaluation "6840f512"
+    Evaluation triggered by job "docs"
+    Evaluation within deployment: "32a080c1"
+    Allocation "0ccb732f" modified: node "36e7a123", group "api"
+    Allocation "64d4f282" modified: node "36e7a123", group "api"
+    Allocation "664e33c7" modified: node "36e7a123", group "api"
+    Allocation "a4cb6a4b" modified: node "36e7a123", group "api"
+    Allocation "fdd73bdd" modified: node "36e7a123", group "api"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "6840f512" finished with status "complete"
+```
+
+If we now look at the job's status we can see that after failing the deployment,
+Nomad stopped the new allocations and is only running the old ones, and that it
+reverted the working copy of the job back to the original specification running
+"api-server:1.3".
+
+```text
+$ nomad status docs
+ID            = docs
+Name          = docs
+Submit Date   = 07/26/17 19:57:47 UTC
+Type          = service
+Priority      = 50
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+api         0       0         5        0       5         0
+
+Latest Deployment
+ID          = 6f3f84b3
+Status      = successful
+Description = Deployment completed successfully
+
+Deployed
+Task Group  Auto Revert  Desired  Placed  Healthy  Unhealthy
+api         true         5        5       5        0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status    Created At
+27dc2a42  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
+5b7d34bb  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
+983b487d  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
+d1cbf45a  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
+d6b46def  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
+0ccb732f  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
+64d4f282  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
+664e33c7  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
+a4cb6a4b  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
+fdd73bdd  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
+
+$ nomad job deployments docs
+ID        Job ID  Job Version  Status      Description
+6f3f84b3  docs    2            successful  Deployment completed successfully
+32a080c1  docs    1            failed      Deployment marked as failed - rolling back to job version 0
+c4c16494  docs    0            successful  Deployment completed successfully
+```

 ## Canary Deployments

-A canary deployment is a special type of blue/green deployment in which a subset
-of nodes continues to run in production for an extended period of time.
-Sometimes this is done for logging/analytics or as an extended blue/green
-deployment. Whatever the reason, Nomad supports canary deployments. Using the
-same strategy as defined above, simply keep the "blue" at a lower number, for
-example:
+Canary updates are a useful way to test a new version of a job before beginning
+a rolling upgrade. The `update` stanza supports setting the number of canaries
+the job operator would like Nomad to create when the job changes via the
+`canary` parameter. When the job specification is updated, Nomad creates the
+canaries without stopping any allocations from the previous job.
+
+This pattern allows operators to achieve higher confidence in the new job
+version because they can route traffic, examine logs, etc., to determine that
+the new application is performing properly.

 ```hcl
 job "docs" {
-  datacenters = ["dc1"]
+  # ...

   group "api" {
-    count = 10
+    count = 5
+
+    update {
+      max_parallel = 1
+      canary = 1
+      min_healthy_time = "30s"
+      healthy_deadline = "10m"
+      auto_revert = true
+    }

     task "api-server" {
       driver = "docker"

       config {
         image = "api-server:1.3"
       }
     }
   }
-
-  group "api-canary" {
-    count = 1
-
-    task "api-server" {
-      driver = "docker"
-
-      config {
-        image = "api-server:1.4"
-      }
-    }
-  }
 }
 ```

-Here you can see there is exactly one canary version of our application (v1.4)
-and ten regular versions. Typically canary versions are also tagged
-appropriately in the [service discovery](/docs/service-discovery/index.html)
-layer to prevent unnecessary routing.
+In the example above, the `update` stanza tells Nomad to create a single canary
+when the job specification is changed.
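+
+The only difference from the blue/green configuration is the size of the canary
+set relative to `count`; a short sketch of the two settings, with values taken
+from the examples in this guide:
+
+```hcl
+group "api" {
+  count = 5
+
+  update {
+    # Canary deployment: create a single test allocation of the new version
+    # alongside the five existing allocations.
+    canary = 1
+
+    # Blue/green deployment: setting canary equal to count would instead
+    # create a full parallel set of five new allocations.
+    # canary = 5
+  }
+}
+```
+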
+Below we can see how this works by changing the image to run the new version:
+
+```diff
+@@ -2,6 +2,8 @@ job "docs" {
+  group "api" {
+    task "api-server" {
+      config {
+-        image = "api-server:1.3"
++        image = "api-server:1.4"
+```
+
+Next we plan and run these changes:
+
+```text
+$ nomad plan docs.nomad
++/- Job: "docs"
++/- Task Group: "api" (1 canary, 5 ignore)
+  +/- Task: "api-server" (forces create/destroy update)
+    +/- Config {
+      +/- image: "api-server:1.3" => "api-server:1.4"
+    }
+
+Scheduler dry-run:
+- All tasks successfully allocated.
+
+Job Modify Index: 7
+To submit the job with version verification run:
+
+nomad run -check-index 7 example.nomad
+
+When running the job with the check-index flag, the job will only be run if the
+server side version matches the job modify index returned. If the index has
+changed, another user has modified the job and the plan's results are
+potentially invalid.
+
+$ nomad run docs.nomad
+# ...
+```
+
+We can see from the plan output that Nomad is going to create 1 canary that
+will run the "api-server:1.4" image and ignore all the allocations running
+the older image. If we inspect the status we see that the canary is running
+alongside the older version of the job:
+
+```text
+$ nomad status docs
+ID            = docs
+Name          = docs
+Submit Date   = 07/26/17 19:57:47 UTC
+Type          = service
+Priority      = 50
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+api         0       0         6        0       0         0
+
+Latest Deployment
+ID          = 32a080c1
+Status      = running
+Description = Deployment is running but requires promotion
+
+Deployed
+Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
+api         true         false     5        1         1       1        0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status   Created At
+85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
+3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
+```
+
+Now if we promote the canary, this will trigger a rolling update to replace
+the remaining allocations running the older image.
+The rolling update will happen at a rate of `max_parallel`, so in this case
+one allocation at a time:
+
+```text
+$ nomad deployment promote ed28f6c2
+==> Monitoring evaluation "37033151"
+    Evaluation triggered by job "docs"
+    Evaluation within deployment: "ed28f6c2"
+    Allocation "f5057465" created: node "f6646949", group "api"
+    Allocation "f5057465" status changed: "pending" -> "running"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "37033151" finished with status "complete"
+
+$ nomad status docs
+ID            = docs
+Name          = docs
+Submit Date   = 07/26/17 20:28:59 UTC
+Type          = service
+Priority      = 50
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+api         0       0         5        0       2         0
+
+Latest Deployment
+ID          = ed28f6c2
+Status      = running
+Description = Deployment is running
+
+Deployed
+Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
+api         true         true      5        1         2       1        0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status    Created At
+f5057465  f6646949  api         1        run      running   07/26/17 20:29:23 UTC
+b1c88d20  f6646949  api         1        run      running   07/26/17 20:28:59 UTC
+1140bacf  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
+1958a34a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
+4bda385a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
+62d96f06  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
+f58abbb2  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
+```
+
+Alternatively, if the canary was not performing properly, we could abandon the
+change using the `nomad deployment fail` command, similar to the blue/green
+example.
diff --git a/website/source/docs/operating-a-job/update-strategies/index.html.md b/website/source/docs/operating-a-job/update-strategies/index.html.md
index b86f8193a..2dccee380 100644
--- a/website/source/docs/operating-a-job/update-strategies/index.html.md
+++ b/website/source/docs/operating-a-job/update-strategies/index.html.md
@@ -12,10 +12,11 @@ description: |-
 Most applications are long-lived and require updates over time. Whether you are
 deploying a new version of your web application or upgrading to a new version of
-redis, Nomad has built-in support for rolling updates. When a job specifies a
-rolling update, Nomad can take some configurable strategies to minimize or
-eliminate down time, stagger deployments, and more. This section and subsections
-will explore how to do so safely with Nomad.
+Redis, Nomad has built-in support for rolling, blue/green, and canary updates.
+When a job specifies a rolling update, Nomad uses task state and health check
+information in order to detect allocation health and minimize or eliminate
+downtime. This section and subsections will explore how to do so safely with
+Nomad.

 Please see one of the guides below or use the navigation on the left:
diff --git a/website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md b/website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md
index 1076f66c6..39626772d 100644
--- a/website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md
+++ b/website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md
@@ -4,35 +4,71 @@ page_title: "Rolling Upgrades - Operating a Job"
 sidebar_current: "docs-operating-a-job-updating-rolling-upgrades"
 description: |-
   In order to update a service while reducing downtime, Nomad provides a
-  built-in mechanism for rolling upgrades.
-  Rolling upgrades allow for a subset
-  of applications to be updated at a time, with a waiting period between to
+  built-in mechanism for rolling upgrades. Rolling upgrades incrementally
+  transition jobs between versions, using health check information to
   reduce downtime.
 ---

 # Rolling Upgrades

-In order to update a service while reducing downtime, Nomad provides a built-in
-mechanism for rolling upgrades. Jobs specify their "update strategy" using the
-`update` block in the job specification as shown here:
+Nomad supports rolling updates as a first class feature. To enable rolling
+updates a job or task group is annotated with a high-level description of the
+update strategy using the [`update` stanza][update]. Under the hood, Nomad
+handles limiting parallelism, interfacing with Consul to determine service
+health and even automatically reverting to an older, healthy job when a
+deployment fails.
+
+## Enabling Rolling Updates
+
+Rolling updates are enabled by adding the [`update` stanza][update] to the job
+specification. The `update` stanza may be placed at the job level or in an
+individual task group. When placed at the job level, the update strategy is
+inherited by all task groups in the job. When placed at both the job and group
+level, the `update` stanzas are merged, with group stanzas taking precedence
+over job-level stanzas. See the [`update` stanza
+documentation](/docs/job-specification/update.html#upgrade-stanza-inheritance)
+for an example.

 ```hcl
-job "docs" {
-  update {
-    stagger      = "30s"
-    max_parallel = 3
-  }
+job "geo-api-server" {
+  # ...
+
+  group "api-server" {
+    count = 6
+
+    # Add an update stanza to enable rolling updates of the service
+    update {
+      max_parallel = 2
+      min_healthy_time = "30s"
+      healthy_deadline = "10m"
+    }

-  group "example" {
     task "server" {
+      driver = "docker"
+
+      config {
+        image = "geo-api-server:0.1"
+      }
+
       # ...
     }
   }
 }
 ```

-In this example, Nomad will only update 3 task groups at a time (`max_parallel =
-3`) and will wait 30 seconds (`stagger = "30s"`) before moving on to the next
-set of task groups.
+In this example, by adding the simple `update` stanza to the "api-server" task
+group, we inform Nomad that updates to the group should be handled with a
+rolling update strategy.
+
+Thus when a change is made to the job file that requires new allocations to be
+made, Nomad will deploy 2 allocations at a time and require that the allocations
+be running in a healthy state for 30 seconds before deploying more allocations
+of the new version.
+
+By default Nomad determines allocation health by ensuring that all tasks in the
+group are running and that any [service
+check](/docs/job-specification/service.html#check-parameters) the tasks register
+are passing.

 ## Planning Changes
@@ -40,37 +76,36 @@ Suppose we make a change to a file to upgrade the version of a Docker container
 that is configured with the same rolling update strategy from above.

 ```diff
-@@ -2,6 +2,8 @@ job "docs" {
-  group "example" {
+@@ -2,6 +2,8 @@ job "geo-api-server" {
+  group "api-server" {
     task "server" {
       driver = "docker"

       config {
--        image = "nginx:1.10"
-+        image = "nginx:1.11"
+-        image = "geo-api-server:0.1"
++        image = "geo-api-server:0.2"
 ```

 The [`nomad plan` command](/docs/commands/plan.html) allows us to visualize
 the series of steps the scheduler would perform.
 We can analyze this output to confirm it is correct:

-```shell
-$ nomad plan docs.nomad
+```text
+$ nomad plan geo-api-server.nomad
 ```

 Here is some sample output:

 ```text
-+/- Job: "my-web"
-+/- Task Group: "web" (3 create/destroy update)
-  +/- Task: "web" (forces create/destroy update)
++/- Job: "geo-api-server"
++/- Task Group: "api-server" (2 create/destroy update, 4 ignore)
+  +/- Task: "server" (forces create/destroy update)
     +/- Config {
-      +/- image: "nginx:1.10" => "nginx:1.11"
+      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
       }

 Scheduler dry-run:
 - All tasks successfully allocated.
-- Rolling update, next evaluation will be in 30s.

 Job Modify Index: 7
 To submit the job with version verification run:
@@ -83,8 +118,213 @@ changed, another user has modified the job and the plan's results are
 potentially invalid.
 ```

-Here we can see that Nomad will destroy the 3 existing task groups and create 3
-replacements but it will occur with a rolling update with a stagger of `30s`.
+Here we can see that Nomad will begin a rolling update by creating and
+destroying 2 allocations first and for the time being ignoring 4 of the old
+allocations, matching our configured `max_parallel`.

-For more details on the `update` block, see the
-[job specification documentation](/docs/job-specification/update.html).
+## Inspecting a Deployment
+
+After running the plan we can submit the updated job by simply running `nomad
+run`. Once run, Nomad will begin the rolling upgrade of our service by placing
+2 allocations of the new job version at a time and stopping 2 of the old
+allocations.
+
+We can inspect the current state of a rolling deployment using `nomad status`:
+
+```text
+$ nomad status geo-api-server
+ID            = geo-api-server
+Name          = geo-api-server
+Submit Date   = 07/26/17 18:08:56 UTC
+Type          = service
+Priority      = 50
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+api-server  0       0         6        0       4         0
+
+Latest Deployment
+ID          = c5b34665
+Status      = running
+Description = Deployment is running
+
+Deployed
+Task Group  Desired  Placed  Healthy  Unhealthy
+api-server  6        4       2        0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status    Created At
+14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
+a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
+a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
+496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
+9fc96fcc  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
+2521c47a  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
+6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+```
+
+Here we can see that Nomad has created a deployment to conduct the rolling
+upgrade from job version 0 to 1 and has placed 4 instances of the new job and
+has stopped 4 of the old instances. If we look at the deployed allocations, we
+also can see that Nomad has placed 4 instances of job version 1 but only
+considers 2 of them healthy. This is because the 2 newest placed allocations
+haven't been healthy for the required 30 seconds yet.
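+
+The health signal Nomad waits on here comes from task state plus any service
+checks registered inside the task. A minimal sketch of such a check for the
+"server" task (the service name, port label, and `/health` path are
+illustrative):
+
+```hcl
+service {
+  name = "geo-api-server"
+  port = "http"
+
+  # An allocation only counts as healthy once its tasks are running and checks
+  # like this one are passing, and it must stay that way for min_healthy_time
+  # (30 seconds above) before the next batch is updated.
+  check {
+    type     = "http"
+    path     = "/health"
+    interval = "10s"
+    timeout  = "2s"
+  }
+}
+```
+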
+
+If we wait for the deployment to complete and re-issue the command, we get the
+following:
+
+```text
+$ nomad status geo-api-server
+ID            = geo-api-server
+Name          = geo-api-server
+Submit Date   = 07/26/17 18:08:56 UTC
+Type          = service
+Priority      = 50
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+api-server  0       0         6        0       6         0
+
+Latest Deployment
+ID          = c5b34665
+Status      = successful
+Description = Deployment completed successfully
+
+Deployed
+Task Group  Desired  Placed  Healthy  Unhealthy
+api-server  6        6       6        0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status    Created At
+d42a1656  f7b1ee08  api-server  1        run      running   07/26/17 18:10:10 UTC
+401daaf9  f7b1ee08  api-server  1        run      running   07/26/17 18:10:00 UTC
+14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
+a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
+a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
+496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
+9fc96fcc  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+2521c47a  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
+```
+
+Nomad has successfully transitioned the group to running the new job version and
+did so with no downtime to our service by ensuring only two allocations were
+changed at a time and the newly placed allocations ran successfully. Had any of
+the newly placed allocations failed their health check, Nomad would have aborted
+the deployment and stopped placing new allocations. If configured, Nomad can
+automatically revert back to the old job definition when the deployment fails.
+
+## Auto Reverting on Failed Deployments
+
+In the case we do a deployment in which the new allocations are unhealthy, Nomad
+will fail the deployment and stop placing new instances of the job. It
+optionally supports automatically reverting back to the last stable job version
+on deployment failure. Nomad keeps a history of submitted jobs and whether the
+job version was stable. A job is considered stable if all its allocations are
+healthy.
+
+To enable this we simply add the `auto_revert` parameter to the `update` stanza:
+
+```
+update {
+  max_parallel = 2
+  min_healthy_time = "30s"
+  healthy_deadline = "10m"
+
+  # Enable automatically reverting to the last stable job on a failed
+  # deployment.
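+  # Without auto_revert, a failed deployment is only marked as failed and no
+  # further allocations are placed; the job is not rolled back automatically.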
+  auto_revert = true
+}
+```
+
+Now imagine we want to update our image to "geo-api-server:0.3" but we instead
+update it to the incorrect image below and run the job:
+
+```diff
+@@ -2,6 +2,8 @@ job "geo-api-server" {
+  group "api-server" {
+    task "server" {
+      driver = "docker"
+
+      config {
+-        image = "geo-api-server:0.2"
++        image = "geo-api-server:0.33"
+```
+
+If we run `nomad job deployments` we can see that the deployment fails and Nomad
+auto-reverts to the last stable job:
+
+```text
+$ nomad job deployments geo-api-server
+ID        Job ID          Job Version  Status      Description
+0c6f87a5  geo-api-server  3            successful  Deployment completed successfully
+b1712b7f  geo-api-server  2            failed      Failed due to unhealthy allocations - rolling back to job version 1
+3eee83ce  geo-api-server  1            successful  Deployment completed successfully
+72813fcf  geo-api-server  0            successful  Deployment completed successfully
+```
+
+Nomad job versions increment monotonically, so even though Nomad reverted to the
+job specification at version 1, it creates a new job version. We can see the
+differences between a job's versions and how Nomad auto-reverted the job using
+the `job history` command:
+
+```text
+$ nomad job history -p geo-api-server
+Version     = 3
+Stable      = true
+Submit Date = 07/26/17 18:44:18 UTC
+Diff        =
++/- Job: "geo-api-server"
++/- Task Group: "api-server"
+  +/- Task: "server"
+    +/- Config {
+      +/- image: "geo-api-server:0.33" => "geo-api-server:0.2"
+      }
+
+Version     = 2
+Stable      = false
+Submit Date = 07/26/17 18:45:21 UTC
+Diff        =
++/- Job: "geo-api-server"
++/- Task Group: "api-server"
+  +/- Task: "server"
+    +/- Config {
+      +/- image: "geo-api-server:0.2" => "geo-api-server:0.33"
+      }
+
+Version     = 1
+Stable      = true
+Submit Date = 07/26/17 18:44:18 UTC
+Diff        =
++/- Job: "geo-api-server"
++/- Task Group: "api-server"
+  +/- Task: "server"
+    +/- Config {
+      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
+      }
+
+Version     = 0
+Stable      = true
+Submit Date = 07/26/17 18:43:43 UTC
+```
+
+We can see that Nomad considered the job running "geo-api-server:0.1" and
+"geo-api-server:0.2" as stable, but job version 2, which submitted the incorrect
+image, is marked as unstable. This is because the placed allocations failed to
+start. Nomad detected the deployment failed and as such, created job version 3
+that reverted back to the last healthy job.
+
+[update]: /docs/job-specification/update.html "Nomad update Stanza"