---
layout: docs
page_title: disconnect Block - Job Specification
description: |-
  The "disconnect" block describes the behavior of both the Nomad server and
  client in case of a network partition, as well as how to reconcile the workloads
  in case of a reconnection.
---

# `disconnect` Block

<Placement groups={['job', 'group', 'disconnect']} />

The `disconnect` block describes the system's behavior in case of a network
partition. By default, without a `disconnect` block, if an allocation is on a
node that misses heartbeats, the allocation will be marked `lost` and will be
replaced.

Replacement happens when a node is lost. When a node is drained, Nomad
[migrates][] the allocations instead, and Nomad ignores the `disconnect`
block. When a Nomad agent fails to set up the allocation or the tasks of an
allocation fail more than their [`restart`][] block allows, Nomad
[reschedules][] the allocations and ignores the `disconnect`.


```hcl
job "docs" {
  group "example" {
    disconnect {
      lost_after = "6h"
      replace = false
      reconcile = "keep_original"
    }
  }

  group "example2" {
    disconnect {
      stop_on_client_after = "12h"
      replace = false
      reconcile = "keep_original"
    }
  }
}
```

~> Note that you cannot use both `lost_after` and `stop_on_client_after` in the
same `disconnect` block.
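
The distinction drawn earlier between replacement, migration, and rescheduling
can be sketched by placing the related blocks side by side. This is an
illustrative fragment only: the values in the `migrate`, `restart`, and
`reschedule` blocks are assumptions chosen for the example, and the tasks are
omitted.

```hcl
job "docs" {
  group "example" {
    # Applies only when the node is lost, that is, it misses heartbeats.
    disconnect {
      lost_after = "6h"
      replace    = false
      reconcile  = "keep_original"
    }

    # Applies when the node is drained. Nomad ignores `disconnect` here.
    migrate {
      max_parallel = 1
    }

    # Applies to task failures while the client is still connected.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    # Applies when allocation setup fails or tasks fail more often than
    # `restart` allows. Nomad ignores `disconnect` here as well.
    reschedule {
      attempts  = 3
      interval  = "1h"
      delay     = "30s"
      unlimited = false
    }
  }
}
```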

## `disconnect` Parameters

- `lost_after` `(string: "")` - Specifies a duration during which a Nomad client
  will attempt to reconnect allocations after it fails to heartbeat in the
  [`heartbeat_grace`][] window. It defaults to `""`, which is equivalent to
  not setting a `disconnect` block.

  You cannot use `lost_after` and `stop_on_client_after` in the same
  `disconnect` block.

  Refer to [the Lost After section][lost-after] for more details.

- `replace` `(bool: false)` - Specifies if Nomad should replace the disconnected
  allocation with a new one rescheduled on a different node. Nomad considers the
  replacement allocation a reschedule and obeys the job's [`reschedule`][]
  block. If false and the node the allocation is running on disconnects
  or goes down, Nomad does not replace this allocation and reports `unknown`
  until the node reconnects, or until you manually stop the allocation.

  ```plaintext
  nomad alloc stop <alloc ID>
  ```

  If true, a new alloc will be placed immediately upon the node becoming
  disconnected.

- `stop_on_client_after` `(string: "")` - Specifies a duration after which a
  disconnected Nomad client will stop its allocations. Setting
  `stop_on_client_after` shorter than `lost_after` and `replace = false` at the
  same time is not permitted and will cause a validation error, because this
  would lead to a state where no allocations can be scheduled.

  The Nomad client process must be running for this to occur.

  You cannot use `stop_on_client_after` and `lost_after` in the same
  `disconnect` block.

  Refer to [the Stop After section][stop-after] for more details.

- `reconcile` `(string: "best_score")` - Specifies which allocation to keep once
  the previously disconnected node regains connectivity.
  It has four possible values which are described below:

    - `keep_original`: Always keep the original allocation. Bear in mind
    when choosing this option that the original allocation may have crashed
    while the client was disconnected.
    - `keep_replacement`: Always keep the allocation that was placed
    to replace the disconnected one.
    - `best_score`: Keep the allocation running on the node with the best
    score.
    - `longest_running`: Keep the allocation that has been up and running
    continuously for the longest time.
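
As a minimal sketch of how these parameters combine, the fragment below opts in
to immediate replacement and then lets uptime decide which allocation survives
a reconnection. The group name and the one hour duration are assumptions chosen
for illustration.

```hcl
group "example" {
  disconnect {
    # Treat allocations as "unknown" for up to one hour after the node
    # stops heartbeating.
    lost_after = "1h"

    # Place a replacement allocation as soon as the node disconnects.
    replace = true

    # On reconnection, keep whichever allocation has been running
    # continuously for the longest time.
    reconcile = "longest_running"
  }
}
```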


## `disconnect` Examples

The following examples only show the `disconnect` blocks. Remember that the
`disconnect` block is only valid in the placements listed previously.

### Stop After

This example shows how `stop_on_client_after` interacts with
other blocks. For the `first` group, the server replaces the allocation
after the default 10 second [`heartbeat_grace`] window expires and 90 more
seconds pass. The client waits 90 seconds before sending a stop signal
(`SIGTERM`) to the `first-task` task. After 15 more seconds, because of the
task's `kill_timeout`, the client sends `SIGKILL`. The `second` group does
not have `stop_on_client_after`, so the server replaces the allocation after
the 10 second [`heartbeat_grace`] expires. The client does not stop it,
regardless of how long the client is disconnected.

Note that if the servers' clocks are not closely synchronized with
each other, the server may replace the group before the client has
stopped the allocation. Operators should ensure that clock drift
between servers is as small as possible.

Note also that a group using this feature will be stopped on the
client if the Nomad server cluster fails, since the client will be
unable to contact any server in that case. Groups opting in to this
feature are therefore exposed to an additional runtime dependency and
potential point of failure.

```hcl
group "first" {
  disconnect {
    stop_on_client_after = "90s"
  }

  task "first-task" {
    kill_timeout = "15s"
  }
}

group "second" {

  task "second-task" {
    kill_timeout = "5s"
  }
}
```
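
The timings in this example depend on the server's [`heartbeat_grace`]. As a
sketch, the server configuration below sets it explicitly to the 10 second
default that the example relies on, with the resulting sequence summarized in
comments. The sequence is approximate, because the server and the client each
measure time on their own clock.

```hcl
# server_config.hcl

server {
  enabled         = true
  heartbeat_grace = "10s"
}

# Approximate sequence for the "first" group once the client loses contact:
#   server: heartbeat_grace (10s) expires, then 90s more -> allocation replaced
#   client: 90s  -> SIGTERM sent to "first-task"
#   client: +15s -> SIGKILL sent (kill_timeout)
#
# For the "second" group:
#   server: heartbeat_grace (10s) expires -> allocation replaced
#   client: the allocation is never stopped by the client
```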

### Lost After

By default, allocations running on a client that fails to heartbeat will be
marked "lost". When a client reconnects, its allocations, which may still be
healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.

Instead, an operator may desire that these allocations reconnect without a
restart. When `lost_after` is specified, the Nomad server will mark
clients that fail to heartbeat as "disconnected" rather than "down", and will
mark allocations on a disconnected client as "unknown" rather than "lost".
These allocations may continue to run on the disconnected client. Replacement
allocations will be scheduled according to the allocations' `replace` settings
until the disconnected client reconnects. Once a disconnected client reconnects,
Nomad will compare the "unknown" allocations with their replacements and will
decide which ones to keep according to the `reconcile` setting.
If the `lost_after` duration expires before the client reconnects,
the allocations will be marked "lost". Clients that contain "unknown"
allocations will transition to "disconnected" rather than "down" until the last
`lost_after` duration has expired.

In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, the client would be marked
as "disconnected" and both groups' allocations as "unknown" at two minutes
because of the server's `heartbeat_grace` value of "2m". If the network outage
continued for eight hours, and the client continued to fail to heartbeat, the
client would remain in a "disconnected" state, as both groups' `lost_after`
is twelve hours. Once all groups' `lost_after` durations are
exceeded, in this case after twelve hours, the client node will be marked as
"down" and the allocations will be marked as "lost". If the client had
reconnected before twelve hours had passed, the allocations would gracefully
reconnect using the strategy defined by [`reconcile`].

Lost After is useful for edge deployments, or scenarios where
operators want zero on-client downtime due to node connectivity issues. This
setting cannot be used with `stop_on_client_after`.

```hcl
# server_config.hcl

server {
  enabled         = true
  heartbeat_grace = "2m"
}
```

```hcl
# jobspec.nomad

group "first" {
  disconnect {
    lost_after = "12h"
    reconcile = "best_score"
  }

  task "first-task" {
    ...
  }
}

group "second" {
  disconnect {
    lost_after = "12h"
    reconcile = "keep_original"
  }

  task "second-task" {
    ...
  }
}
```
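
To illustrate how the client state follows the last `lost_after` duration to
expire, the variation below gives the `second` group a shorter window. The six
hour value is a hypothetical chosen for this sketch, and the tasks are omitted.

```hcl
# jobspec.nomad (variation)

group "first" {
  disconnect {
    lost_after = "12h"
    reconcile  = "best_score"
  }
}

group "second" {
  disconnect {
    # Hypothetical shorter window, for illustration only.
    lost_after = "6h"
    reconcile  = "keep_original"
  }
}

# If both groups run on the same client and it stays unreachable:
#   after 6h:  the "second" group's allocation is marked "lost"
#   until 12h: the client remains "disconnected", because the "first"
#              group's allocation is still "unknown"
#   after 12h: the client is marked "down" and the "first" group's
#              allocation is marked "lost"
```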

[`heartbeat_grace`]: /nomad/docs/configuration/server#heartbeat_grace
[stop-after]: /nomad/docs/job-specification/disconnect#stop-after
[lost-after]: /nomad/docs/job-specification/disconnect#lost-after
[`reconcile`]: /nomad/docs/job-specification/disconnect#reconcile
[migrates]: /nomad/docs/job-specification/migrate
[`restart`]: /nomad/docs/job-specification/restart
[reschedules]: /nomad/docs/job-specification/reschedule
[`reschedule`]: /nomad/docs/job-specification/reschedule