mirror of https://github.com/kemko/nomad.git
synced 2026-01-04 09:25:46 +03:00

Commit: Updating
@@ -7,3 +7,168 @@ description: |-
---

# Updating a Job

When operating a service, updating the version of the job will be a common
task. Under a cluster scheduler, the same best practices apply for reliably
deploying new versions, including rolling updates, blue-green deploys, and
canaries, which are a special case of blue-green deploys. This section
explores how to do each of these safely with Nomad.

## Rolling Updates

In order to update a service without introducing downtime, Nomad has built-in
support for rolling updates. When a job specifies a rolling update with the
below syntax, Nomad will only update `max_parallel` task groups at a
time and will wait `stagger` duration before updating the next set.

```
job "rolling" {
  ...

  update {
    stagger      = "30s"
    max_parallel = 1
  }

  ...
}
```

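As a rough, back-of-the-envelope illustration (not an exact guarantee, since evaluation timing varies), these two settings bound how long a full rollout takes:

```
update {
  # Update 1 task group per round and wait 30s between rounds.
  # For a group with count = 3, a complete rollout takes on the
  # order of (count / max_parallel) * stagger, i.e. roughly 90s.
  stagger      = "30s"
  max_parallel = 1
}
```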
We can use the [`nomad plan` command](/docs/commands/plan.html) while updating
jobs to ensure the scheduler will do as we expect. In this example, we have 3
web server instances whose version we want to update. After modifying the job
file, we can run `plan`:

```
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
  +/- Task: "web" (forces create/destroy update)
    +/- Config {
      +/- image:             "nginx:1.10" => "nginx:1.11"
          port_map[0][http]: "80"
        }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 my-web.nomad
```

When running the job with the `-check-index` flag, the job will only be run if
the server-side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so as a rolling update using the configured
stagger. For more details on the `update` block, see
[here](/docs/jobspec/index.html#update).

## Blue-green and Canaries

Blue-green deploys go by several names, such as Red/Black and A/B, but the
concept is the same: run two sets of applications with only one of them live
at a given time, except while transitioning from one set to the other. "Live"
simply means the set of applications currently receiving traffic.

Imagine we have an API server with 10 instances deployed to production at
version 1, and we want to upgrade to version 2. Hopefully the new version has
been tested in a QA environment and is now ready to start accepting production
traffic.

In this case, we would consider version 1 to be the live set, and we want to
transition to version 2. We can model this workflow with the below job:

```
job "my-api" {
  ...

  group "api-green" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```

Here we can see that the live group is "api-green" since it has a non-zero
count. To transition to v2, we increase the count of "api-blue" and decrease
the count of "api-green". We can now see how the canary process is a special
case of blue-green: if we set "api-blue" to `count = 1` and "api-green" to
`count = 9`, there will still be 10 instances in total, but only one of them
will be running the new version, essentially canarying it.

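As a sketch, the canary state described above only changes the two group counts; the rest of the job stays exactly as before:

```
group "api-green" {
  # Keep 9 instances of the current version (v1) live.
  count = 9
  ...
}

group "api-blue" {
  # Run a single canary instance of the new version (v2).
  count = 1
  ...
}
```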
If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all we have to do is drop the count of the new group to 0
and restore the original group's count to 10. This fine-grained control lets
job operators be confident that deployments will not cause downtime. If the
deploy is successful and we fully transition from v1 to v2, the job file will
look like this:

```
job "my-api" {
  ...

  group "api-green" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```

Now "api-blue" is the live group, and when we are ready to update the API to
v3, we would modify "api-green" and repeat this process. The rate at which the
group counts are incremented and decremented is entirely up to the operator.
It is usually good practice to start by transitioning one instance at a time
until a certain confidence threshold is met, based on application-specific
logs and metrics.

## Handling Drain Signals

On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and perform any other necessary cleanup. Certain applications take
longer to drain than others, and as such Nomad lets the job file specify how
long to wait between signaling the application to exit and forcefully killing
it. This is configurable via `kill_timeout`. More details can be seen
[here](/docs/jobspec/index.html#kill_timeout).

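For example, an application that needs up to a minute to drain in-flight connections might set `kill_timeout` in its task stanza like this (a sketch; the right value depends on the application):

```
task "api-server" {
  driver = "docker"

  # Wait up to 60 seconds after the exit signal before
  # force-killing, giving the server time to drain connections.
  kill_timeout = "60s"

  config {
    image = "api-server:v2"
  }
}
```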
@@ -150,6 +150,8 @@ The `job` object supports the following keys:

  and defaults to `service`. To learn more about each scheduler type, visit
  [here](/docs/jobspec/schedulers.html).

<a id="update"></a>

* `update` - Specifies the task's update strategy. When omitted, rolling
  updates are disabled. The `update` block supports the following keys:

@@ -266,9 +268,13 @@ The `task` object supports the following keys:

* `meta` - Annotates the task group with opaque metadata.

<a id="kill_timeout"></a>

* `kill_timeout` - `kill_timeout` is a time duration that can be specified
  using the `s`, `m`, and `h` suffixes, such as `30s`. It can be used to
  configure the time between signaling a task that it will be killed and
  actually killing it. Nomad sends an `os.Interrupt`, which on Unix systems
  is defined as `SIGINT`. After the timeout, a kill signal is sent (on Unix,
  `SIGKILL`).

* `logs` - Logs allows configuring log rotation for the `stdout` and `stderr`
  buffers of a Task. See the log rotation reference below for more details.