Files
nomad/website/content/docs/monitor/inspect-workloads.mdx
Aimee Ukasick 5dc7e7fe25 Docs: Chore: Ent labels (#26323)
* replace outdated tutorial links

* update more tutorial links

* Add CE/ENT or ENT to left nav

* remove ce/ent labels

* revert enterprise features
2025-07-30 09:02:28 -05:00

222 lines
9.3 KiB
Plaintext

---
layout: docs
page_title: Inspect job workloads
description: |-
Inspect jobs, allocations, and tasks in the Nomad web UI and interact
with these objects to perform operations like restart an allocation or
stop a job.
---
# Inspect job workloads
The Web UI can be a powerful companion when monitoring and debugging jobs
running in Nomad. The Web UI will list all jobs, link jobs to allocations,
allocations to client nodes, client nodes to driver health, and much more.
## List jobs
The first page you will arrive at in the Web UI is the Jobs List page. Here you
will find every job for a namespace in a region. The table of jobs is
searchable, sortable, and filterable. Each job row in the table shows basic
information, such as job name, status, type, and priority, as well as richer
information such as a visual representation of all allocation statuses.
This view will also live-update as jobs get submitted, get purged, and change
status.
[![Jobs List][img-jobs-list]][img-jobs-list]
## Filter job view
If your Nomad cluster has many jobs, it can be useful to filter the list of all
jobs down to only those matching certain facets. The Web UI has four facets you
can filter by:
1. **Type:** The type of job, including Batch, Parameterized, Periodic, Service,
and System.
1. **Status:** The status of the job, including Pending, Running, and Dead.
1. **Datacenter:** The datacenter the job is running in, including a dynamically
generated list based on the jobs in the namespace.
1. **Prefix:** The possible common naming prefix for a job, including a
dynamically generated list based on job names up to the first occurrence of
`-`, `.`, and `_`. Only prefixes that match multiple jobs are included.
[![Job Filters][img-job-filters]][img-job-filters]
## Inspect an allocation
In Nomad, allocations are the schedulable units of work. This is where runtime
metrics begin to surface. An allocation is composed of one or more tasks, and
the utilization metrics for tasks are aggregated so they can be observed at the
allocation level.
### Monitor resource utilization
Nomad has APIs for reading point-in-time resource utilization metrics for tasks
and allocations. The Web UI uses these metrics to create time-series graphs for
the current session.
When viewing an allocation, resource utilization will automatically start
logging.
[![Allocation Resource Utilization][img-alloc-resource-utilization]][img-alloc-resource-utilization]
### Review task events
When Nomad places, prepares, and starts a task, a series of task events are
emitted to help debug issues in the event that the task fails to start.
Task events are listed on the Task Detail page and live-update as Nomad handles
managing the task.
[![Task Events][img-task-events]][img-task-events]
### View rescheduled allocations
Allocations will be placed on any client node that satisfies the constraints of
the job definition. There are events, however, that will cause Nomad to
reschedule allocations, (e.g., node failures).
Allocations can be configured [in the job definition to reschedule] to a
different client node if the allocation ends in a failed status. This will
happen after the task has exhausted its [local restart attempts].
The end result of this automatic procedure is a failed allocation and that
failed allocation's rescheduled successor. Since Nomad handles all of this
automatically, the Web UI makes sure to explain the state of allocations through
icons and linking previous and next allocations in a reschedule chain.
[![Allocation Reschedule Icon][img-alloc-reschedule-icon]][img-alloc-reschedule-icon]
[![Allocation Reschedule Details][img-alloc-reschedule-details]][img-alloc-reschedule-details]
### Unhealthy driver
Given the nature of long-lived processes, it's possible for the state of the
client node an allocation is scheduled on to change during the lifespan of the
allocation. Nomad attempts to monitor pertinent conditions including driver
health.
The Web UI denotes when a driver an allocation depends on is unhealthy on the
client node the allocation is running on.
[![Allocation Unhealthy Driver][img-alloc-unhealthy-driver]][img-alloc-unhealthy-driver]
### Preempted allocations
Much like how Nomad will automatically reschedule allocations, Nomad will
automatically preempt allocations when necessary. When monitoring allocations in
Nomad, it's useful to know what allocations were preempted and what job caused
the preemption.
The Web UI makes sure to tell this full story by showing which allocation caused
an allocation to be preempted as well as the opposite: what allocations an
allocation preempted. This makes it possible to traverse down from a job to a
preempted allocation, to the allocation that caused the preemption, to the job
that the preempting allocation is for.
[![Allocation Preempter][img-alloc-preempter]][img-alloc-preempter]
[![Allocation Preempted][img-alloc-preempted]][img-alloc-preempted]
## Review task logs
A task will typically emit log information to `stdout` and `stderr`. Nomad
captures these logs and exposes them through an API. The Web UI uses these APIs
to offer `head`, `tail`, and streaming logs from the browser.
The Web UI will first attempt to directly connect to the client node the task is
running on. Typically, client nodes are not accessible from the public internet.
If this is the case, the Web UI will fall back and proxy to the client node from
the server node with no loss of functionality.
[![Task Logs][img-task-logs]][img-task-logs]
~> Not all browsers support streaming HTTP requests. In the event that streaming
is not supported, logs will still be followed using interval polling.
## Restart or stop an allocation or task
Nomad allows for restarting and stopping individual allocations and tasks. When
a task is restarted, Nomad will perform a local restart of the task. When an
allocation is stopped, Nomad will mark the allocation as complete and perform a
reschedule onto a different client node.
Both of these features are also available in the Web UI.
[![Allocation Stop and Restart][img-alloc-stop-restart]][img-alloc-stop-restart]
## Force a periodic instance
Periodic jobs are configured like a cron job. Sometimes, you might want to start
the job before the its next scheduled run time. Nomad calls this a [periodic
force] and it can be done from the Web UI on the Job Overview page for a
periodic job.
[![Periodic Force][img-periodic-force]][img-periodic-force]
## Submit a new version of a job
From the Job Definition page, a job can be edited. After clicking the Edit
button in the top-right corner of the code window, the job definition JSON
becomes editable. The edits can then be planned and scheduled.
[![Job Definition Edit][img-job-definition-edit]][img-job-definition-edit]
~> Since each job within a namespace must have a unique name, it is possible to
submit a new version of a job from the Run Job screen. Always review the plan
output.
## Monitor a deployment
When a system or service job includes the [`update` stanza], a deployment is
created upon job submission. Job deployments can be monitored in realtime from
the Web UI.
The Web UI will show as new allocations become placed, tallying towards the
expected total, and tally allocations as they become healthy or unhealthy.
Optionally, a job may use canary deployments to allow for additional health
checks or manual testing before a full roll out. If a job uses canaries and is
not configured to automatically promote the canary, the canary promotion
operation can be done from the Job Overview page in the Web UI.
[![Job Deployment with Canary Promotion][img-job-deployment-canary]][img-job-deployment-canary]
## Stop a job
Jobs can be stopped from the Job Overview page. Stopping a job will gracefully
stop all allocations, marking them as complete, and freeing up resources in the
cluster.
[![Job Stop][img-job-stop]][img-job-stop]
## Continue your exploration
Now that you have explored operations that can be performed with jobs through
the Nomad UI, learn how to inspect the state of your cluster using the Nomad UI.
[`update` stanza]: /nomad/docs/job-specification/update
[img-alloc-preempted]: /img/monitor/guide-ui-img-alloc-preempted.png
[img-alloc-preempter]: /img/monitor/guide-ui-img-alloc-preempter.png
[img-alloc-reschedule-details]: /img/monitor/guide-ui-img-alloc-reschedule-details.png
[img-alloc-reschedule-icon]: /img/monitor/guide-ui-img-alloc-reschedule-icon.png
[img-alloc-resource-utilization]: /img/monitor/guide-ui-img-alloc-resource-utilization.png
[img-alloc-stop-restart]: /img/monitor/guide-ui-img-alloc-stop-restart.png
[img-alloc-unhealthy-driver]: /img/monitor/guide-ui-img-alloc-unhealthy-driver.png
[img-job-definition-edit]: /img/monitor/guide-ui-img-job-definition-edit.png
[img-job-deployment-canary]: /img/monitor/guide-ui-img-job-deployment-canary.png
[img-job-filters]: /img/monitor/guide-ui-img-job-filters.png
[img-job-stop]: /img/monitor/guide-ui-img-job-stop.png
[img-jobs-list]: /img/monitor/guide-ui-jobs-list.png
[img-periodic-force]: /img/monitor/guide-ui-img-periodic-force.png
[img-task-events]: /img/monitor/guide-ui-img-task-events.png
[img-task-logs]: /img/monitor/guide-ui-img-task-logs.png
[in the job definition to reschedule]: /nomad/docs/job-specification/reschedule
[local restart attempts]: /nomad/docs/job-specification/restart
[periodic force]: /nomad/commands/job/periodic-force
[securing the web ui with acls]: /nomad/docs/secure/acl