Alex Dadgar
2016-06-30 11:49:59 -07:00
parent 4718046bec
commit fe490b45b6
2 changed files with 172 additions and 1 deletions

View File

@@ -7,3 +7,168 @@ description: |-
---
# Updating a Job
When operating a service, updating the version of the job will be a common task.
Under a cluster scheduler, the same best practices apply for reliably deploying
new versions: rolling updates, blue-green deploys, and canaries, which are a
special case of blue-green deploys. This section explores how to do each of
these safely with Nomad.
## Rolling Updates
To update a service without introducing downtime, Nomad has built-in support
for rolling updates. When a job specifies a rolling update with the syntax
below, Nomad will update at most `max_parallel` task groups at a time and will
wait the `stagger` duration before updating the next set.
```
job "rolling" {
...
update {
stagger = "30s"
max_parallel = 1
}
...
}
```
We can use the [`nomad plan` command](/docs/commands/plan.html) while updating
jobs to ensure the scheduler will behave as we expect. In this example, we have
three web server instances whose version we want to update. After modifying the
job file, we can run `plan`:
```
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
  +/- Task: "web" (forces create/destroy update)
    +/- Config {
          +/- image:             "nginx:1.10" => "nginx:1.11"
              port_map[0][http]: "80"
        }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7
To submit the job with version verification run:
nomad run -check-index 7 my-web.nomad
When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```
Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so as a rolling update with a stagger of `10s`.
For more details on the update block, see
[here](/docs/jobspec/index.html#update).
## Blue-green and Canaries
Blue-green deploys go by several names, such as Red/Black, A/B, or Blue/Green,
but the concept is the same: run two sets of the application, with only one of
them live at a given time, except while transitioning from one set to the
other. The "live" set is the one receiving traffic.
So imagine we have an API server that has 10 instances deployed to production
at version 1 and we want to upgrade to version 2. Hopefully the new version has
been tested in a QA environment and is now ready to start accepting production
traffic.
In this case, version 1 is the live set and we want to transition to version 2.
We can model this workflow with the job below:
```
job "my-api" {
...
group "api-green" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:v1"
}
}
}
group "api-blue" {
count = 0
task "api-server" {
driver = "docker"
config {
image = "api-server:v2"
}
}
}
}
```
Here we can see that the live group is "api-green" since it has a non-zero
count. To transition to v2, we increase the count of "api-blue" and decrease
the count of "api-green". This also shows why a canary is a special case of
blue-green: if we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
there are still 10 instances in total, but only one of them runs the new
version, essentially canarying it.
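Expressed as a job file, that canary step looks like the sketch below, reusing
the same groups and image tags as the example above:
```
job "my-api" {
  ...
  # Keep nine instances of the live v1 set serving traffic.
  group "api-green" {
    count = 9
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v1"
      }
    }
  }

  # Run a single v2 instance as the canary.
  group "api-blue" {
    count = 1
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v2"
      }
    }
  }
}
```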
If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all we have to do is drop the count of the new group to 0
and restore the original group's count to 10. This fine-grained control lets
job operators be confident that deployments will not cause downtime. If the
deploy is successful and we fully transition from v1 to v2, the job file will
look like this:
```
job "my-api" {
...
group "api-green" {
count = 0
task "api-server" {
driver = "docker"
config {
image = "api-server:v1"
}
}
}
group "api-blue" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:v2"
}
}
}
}
```
Now "api-blue" is the live group and when we are ready to update the api to v3,
we would modify "api-green" and repeat this process. The rate at which the count
of groups are incremented and decremented is totally up to the user. It is
usually good practice to start by transistion one at a time until a certain
confidence threshold is met based on application specific logs and metrics.
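As a rough sketch only, the first step of that next cycle might look like this,
reusing the idle "api-green" group for a hypothetical `api-server:v3` image and
moving a single instance to start:
```
job "my-api" {
  ...
  # "api-green" now carries the new version and is scaled up over time.
  group "api-green" {
    count = 1
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v3"
      }
    }
  }

  # "api-blue" stays live on v2 and is scaled down as confidence in v3 grows.
  group "api-blue" {
    count = 9
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v2"
      }
    }
  }
}
```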
## Handling Drain Signals
On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and perform any other necessary cleanup. Since some applications
take longer to drain than others, Nomad lets the job file specify how long to
wait between signaling the application to exit and forcefully killing it. This
is configurable via `kill_timeout`. More details can be seen
[here](/docs/jobspec/index.html#kill_timeout).
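As a sketch, a task that needs extra time to drain could set `kill_timeout`
like this; the group name and the `45s` value are illustrative, with the task
reused from the earlier example:
```
group "api" {
  ...
  task "api-server" {
    driver = "docker"
    config {
      image = "api-server:v2"
    }
    # Wait up to 45s between signaling the task and forcefully killing it so
    # in-flight connections can drain. Pick a value that matches how long the
    # application needs to shut down cleanly.
    kill_timeout = "45s"
  }
}
```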

View File

@@ -150,6 +150,8 @@ The `job` object supports the following keys:
and defaults to `service`. To learn more about each scheduler type visit
[here](/docs/jobspec/schedulers.html)
<a id="update"></a>
* `update` - Specifies the task's update strategy. When omitted, rolling
updates are disabled. The `update` block supports the following keys:
@@ -266,9 +268,13 @@ The `task` object supports the following keys:
* `meta` - Annotates the task group with opaque metadata.
<a id="kill_timeout"></a>
* `kill_timeout` - `kill_timeout` is a time duration that can be specified using
the `s`, `m`, and `h` suffixes, such as `30s`. It can be used to configure the
time between signaling a task that it will be killed and actually killing it.
Nomad sends an `os.Interrupt`, which on Unix systems is defined as `SIGINT`.
After the timeout, a kill signal is sent (`SIGKILL` on Unix).
* `logs` - Logs allows configuring log rotation for the `stdout` and `stderr`
buffers of a Task. See the log rotation reference below for more details.