Alex Dadgar
2016-06-30 11:49:59 -07:00
parent 4718046bec
commit fe490b45b6
2 changed files with 172 additions and 1 deletions

View File

@@ -7,3 +7,168 @@ description: |-
---
# Updating a Job
When operating a service, updating the version of the job will be a common task.
Under a cluster scheduler, the same best practices apply for reliably deploying
new versions: rolling updates, blue-green deploys, and canaries, which are a
special case of blue-green deploys. This section explores how to do each of
these safely with Nomad.
## Rolling Updates
To update a service without introducing downtime, Nomad has built-in support
for rolling updates. When a job specifies a rolling update with the syntax
below, Nomad will update at most `max_parallel` task groups at a time and will
wait the `stagger` duration before updating the next set.
```
job "rolling" {
...
update {
stagger = "30s"
max_parallel = 1
}
...
}
```
We can use the [`nomad plan` command](/docs/commands/plan.html) while updating
jobs to ensure the scheduler will behave as we expect. In this example, we have
three web server instances whose version we want to update. After modifying the
job file, we can run `plan`:
```
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
  +/- Task: "web" (forces create/destroy update)
    +/- Config {
          +/- image:             "nginx:1.10" => "nginx:1.11"
              port_map[0][http]: "80"
        }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7
To submit the job with version verification run:
nomad run -check-index 7 my-web.nomad
When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```
Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so as a rolling update with a stagger of `10s`.
For more details on the update block, see
[here](/docs/jobspec/index.html#update).
## Blue-green and Canaries
Blue-green deploys go by several names, such as Red/Black, A/B, or Blue/Green,
but the concept is the same: run two sets of the application, with only one of
them live at a given time, except while transitioning from one set to the
other. The "live" set is the one receiving traffic.
So imagine we have an API server that has 10 instances deployed to production
at version 1 and we want to upgrade to version 2. Hopefully the new version has
been tested in a QA environment and is now ready to start accepting production
traffic.
In this case, version 1 is the live set and we want to transition to version 2.
We can model this workflow with the job below:
```
job "my-api" {
...
group "api-green" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:v1"
}
}
}
group "api-blue" {
count = 0
task "api-server" {
driver = "docker"
config {
image = "api-server:v2"
}
}
}
}
```
Here we can see that the live group is "api-green" since it has a non-zero
count. To transition to v2, we increase the count of "api-blue" and decrease
the count of "api-green". This also shows why a canary is a special case of
blue-green: if we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
there are still 10 instances in total, but only one of them runs the new
version, essentially canarying it.
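Expressed as a job file, that canary step looks like the sketch below, reusing
the same groups and image tags as the example above:
```
job "my-api" {
  ...
  # Keep nine instances of the live v1 set serving traffic.
  group "api-green" {
    count = 9
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v1"
      }
    }
  }

  # Run a single v2 instance as the canary.
  group "api-blue" {
    count = 1
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v2"
      }
    }
  }
}
```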
If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all we have to do is drop the count of the new group to 0
and restore the original group's count to 10. This fine-grained control lets
job operators be confident that deployments will not cause downtime. If the
deploy is successful and we fully transition from v1 to v2, the job file will
look like this:
```
job "my-api" {
...
group "api-green" {
count = 0
task "api-server" {
driver = "docker"
config {
image = "api-server:v1"
}
}
}
group "api-blue" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:v2"
}
}
}
}
```
Now "api-blue" is the live group and when we are ready to update the api to v3,
we would modify "api-green" and repeat this process. The rate at which the count
of groups are incremented and decremented is totally up to the user. It is
usually good practice to start by transistion one at a time until a certain
confidence threshold is met based on application specific logs and metrics.
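As a rough sketch only, the first step of that next cycle might look like this,
reusing the idle "api-green" group for a hypothetical `api-server:v3` image and
moving a single instance to start:
```
job "my-api" {
  ...
  # "api-green" now carries the new version and is scaled up over time.
  group "api-green" {
    count = 1
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v3"
      }
    }
  }

  # "api-blue" stays live on v2 and is scaled down as confidence in v3 grows.
  group "api-blue" {
    count = 9
    task "api-server" {
      driver = "docker"
      config {
        image = "api-server:v2"
      }
    }
  }
}
```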
## Handling Drain Signals
On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and perform any other necessary cleanup. Since some applications
take longer to drain than others, Nomad lets the job file specify how long to
wait between signaling the application to exit and forcefully killing it. This
is configurable via `kill_timeout`. More details can be seen
[here](/docs/jobspec/index.html#kill_timeout).
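As a sketch, a task that needs extra time to drain could set `kill_timeout`
like this; the group name and the `45s` value are illustrative, with the task
reused from the earlier example:
```
group "api" {
  ...
  task "api-server" {
    driver = "docker"
    config {
      image = "api-server:v2"
    }
    # Wait up to 45s between signaling the task and forcefully killing it so
    # in-flight connections can drain. Pick a value that matches how long the
    # application needs to shut down cleanly.
    kill_timeout = "45s"
  }
}
```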

View File

@@ -150,6 +150,8 @@ The `job` object supports the following keys:
and defaults to `service`. To learn more about each scheduler type visit
[here](/docs/jobspec/schedulers.html)
<a id="update"></a>
* `update` - Specifies the task's update strategy. When omitted, rolling
updates are disabled. The `update` block supports the following keys:
@@ -266,9 +268,13 @@ The `task` object supports the following keys:
* `meta` - Annotates the task group with opaque metadata.
<a id="kill_timeout"></a>
* `kill_timeout` - `kill_timeout` is a time duration that can be specified using
the `s`, `m`, and `h` suffixes, such as `30s`. It can be used to configure the
time between signaling a task that it will be killed and actually killing it.
Nomad sends an `os.Interrupt`, which on Unix systems is defined as `SIGINT`.
After the timeout, a kill signal is sent (`SIGKILL` on Unix).
* `logs` - Logs allows configuring log rotation for the `stdout` and `stderr`
buffers of a Task. See the log rotation reference below for more details.