Commit Graph

101 Commits

Author SHA1 Message Date
Piotr Kazmierczak
611452e1af stateful deployments: use TaskGroupVolumeClaim table to associate volume requests with volume IDs (#24993)
We introduce an alternative solution to the one presented in #24960 which is
based on the state store and not previous-next allocation tracking in the
reconciler. This new solution reduces cognitive complexity of the scheduler
code at the cost of slightly more boilerplate code, but also opens up new
possibilities in the future, e.g., allowing users to explicitly "un-stick"
volumes with workloads still running.

The diagram below illustrates the new logic:

     SetVolumes()                                               upsertAllocsImpl()          
     sets ns, job                             +-----------------checks if alloc requests    
     tg in the scheduler                      v                 sticky vols and consults    
            |                  +-----------------------+        state. If there is no claim,
            |                  | TaskGroupVolumeClaim: |        it creates one.             
            |                  | - namespace           |                                    
            |                  | - jobID               |                                    
            |                  | - tg name             |                                    
            |                  | - vol ID              |                                    
            v                  | uniquely identify vol |                                    
     hasVolumes()              +----+------------------+                                    
     consults the state             |           ^                                           
     and returns true               |           |               DeleteJobTxn()              
     if there's a match <-----------+           +---------------removes the claim from      
     or if there is no                                          the state                   
     previous claim                                                                         
|                             | |                                                      |    
+-----------------------------+ +------------------------------------------------------+    
                                                                                            
           scheduler                                  state store
2025-02-07 17:41:01 +01:00
Piotr Kazmierczak
ebffcce378 stateful deployments: remove CSIVolumeIDs (#24908) 2025-01-21 17:00:55 +01:00
Piotr Kazmierczak
8cbb74786c stateful deployments: find feasible node for sticky host volumes (#24558)
This changeset implements node feasibility checks for sticky host volumes.
2024-12-19 09:25:55 -05:00
Michael Schurter
cbbe6bb389 docs: explain schedule state values (#24160)
* docs: explain schedule state values

GET /v1/client/allocation/:alloc_id/pause?task=:task_name is a tiny but
critical API for observability of tasks with a schedule. This PR
explains each of the values which might be returned.

* correct docstring

* add missing state and expand PUT docs

---------

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2024-10-17 11:42:12 -07:00
Daniel Bennett
5e1fae2856 networking: set alloc NetworkStatus.AddressIPv6 (#23959)
when a CNI result includes an IPv6 address,
set it on the alloc's NetworkStatus for reference.

e.g.:

$ nomad alloc status -json 3dca | jq '.NetworkStatus'
{
  "Address": "172.26.64.14",
  "AddressIPv6": "fd00:a110:c8::b",
  "DNS": null,
  "InterfaceName": "eth0"
}
2024-09-16 10:21:52 -05:00
Daniel Bennett
cfeedd05e8 api: use the task in Allocations.GetPauseState (#23377) 2024-06-18 12:31:12 -05:00
Tim Gross
fa70267787 scheduler: RescheduleTracker dropped if follow-up fails placements (#12319)
When an allocation fails it triggers an evaluation. The evaluation is processed
and the scheduler sees it needs to reschedule, which triggers a follow-up
eval. The follow-up eval creates a plan to `(stop 1) (place 1)`. The replacement
alloc has a `RescheduleTracker` (or gets its `RescheduleTracker` updated).

But in the case where the follow-up eval can't place all allocs (there aren't
enough resources), it can create a partial plan to `(stop 1) (place 0)`. It then
creates a blocked eval. The plan applier stops the failed alloc. Then when the
blocked eval is processed, the job is missing an allocation, so the scheduler
creates a new allocation. This allocation is _not_ a replacement from the
perspective of the scheduler, so it's not handed off a `RescheduleTracker`.

This changeset fixes this by annotating the reschedule tracker whenever the
scheduler can't place a replacement allocation. We check this annotation for
allocations that have the `stop` desired status when filtering out allocations
to pass to the reschedule tracker. I've also included tests that cover this case
and expands coverage of the relevant area of the code.

Fixes: https://github.com/hashicorp/nomad/issues/12147
Fixes: https://github.com/hashicorp/nomad/issues/17072
2024-06-10 11:15:40 -04:00
Daniel Bennett
4415fabe7d jobspec: time based task execution (#22201)
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.

to enable daily-scheduled execution entirely client-side,
a job may now contain:

task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
...

and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.

this includes a taksrunner hook, which watches for the end of
the schedule, at which point it will kill the task.

then, restarts-allowing, a new task will start and again block
waiting for start, and so on.

this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
2024-05-22 15:40:25 -05:00
codenoid
557b4942d0 api: fix panic in Allocation.Stub() when Job is nil (#19115) 2023-11-17 08:55:46 -05:00
deverton-godaddy
e75ae1de96 [api] Add NetworkStatus to allocation response (#17280)
Service discovery or mesh network systems consuming the Nomad event stream or API need to know the CNI assigned IP for the allocation. This data is returned by the underlying Nomad API but isn't mapped in the response struct.
2023-07-04 19:35:38 -04:00
sejalapeno
05c84d64d2 Update allocations.go (#17726)
* Update allocations.go

updated missing client status "unknown" #17688

* changelog

* Update .changelog/17726.txt

adding relevant desc.

Co-authored-by: Seth Hoenig <shoenig@duck.com>

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-26 13:33:29 -05:00
Tim Gross
9a6078a2ae node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Phil Renaud
d9027a7238 Fixes to scheduling-filtering-in-ui (#17244) 2023-05-18 17:38:34 -04:00
Phil Renaud
4335f9f460 [ui] Adds a "Scheduling" filter to the job.allocations page (#17227)
* Basic filter concept

* Make sure NextAllocation gets sent up with allocation stub
2023-05-18 16:24:41 -04:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Michael Schurter
fb085186b7 client/metadata: fix crasher caused by AllowStale = false (#16549)
Fixes #16517

Given a 3 Server cluster with at least 1 Client connected to Follower 1:

If a NodeMeta.{Apply,Read} for the Client request is received by
Follower 1 with `AllowStale = false` the Follower will forward the
request to the Leader.

The Leader, not being connected to the target Client, will forward the
RPC to Follower 1.

Follower 1, seeing AllowStale=false, will forward the request to the
Leader.

The Leader, not being connected to... well hoppefully you get the
picture: an infinite loop occurs.
2023-03-20 16:32:32 -07:00
Luiz Aoqui
b24dddce2a api: set last index and request time on alloc stop (#16319)
Some of the methods in `Allocations()` incorrectly use the `putQuery`
in API calls where `put` is more appropriate since they are not reading
information back. These methods are also not returning request metadata
such as `LastIndex` back to callers, which can be useful to have in some
scenarios.

They also provide poor developer experience as they take an
`*api.Allocation` struct when only the allocation ID is necessary. This
can lead consumers to make unnecessary API calls to fetch the full
allocation.

Fixing these problems require updating the methods' signatures so they
take `*WriteOptions` instead of `*QueryOptions` and return `*WriteMeta`,
but this is a breaking change that requires advanced notice to consumers.

This commit adds a future breaking change notice and also fixes the
`Stop` method so it properly returns request metadata in a backwards
compatible way.
2023-03-03 15:52:41 -05:00
Luiz Aoqui
f74f50804a Task lifecycle restart (#14127)
* allocrunner: handle lifecycle when all tasks die

When all tasks die the Coordinator must transition to its terminal
state, coordinatorStatePoststop, to unblock poststop tasks. Since this
could happen at any time (for example, a prestart task dies), all states
must be able to transition to this terminal state.

* allocrunner: implement different alloc restarts

Add a new alloc restart mode where all tasks are restarted, even if they
have already exited. Also unifies the alloc restart logic to use the
implementation that restarts tasks concurrently and ignores
ErrTaskNotRunning errors since those are expected when restarting the
allocation.

* allocrunner: allow tasks to run again

Prevent the task runner Run() method from exiting to allow a dead task
to run again. When the task runner is signaled to restart, the function
will jump back to the MAIN loop and run it again.

The task runner determines if a task needs to run again based on two new
task events that were added to differentiate between a request to
restart a specific task, the tasks that are currently running, or all
tasks that have already run.

* api/cli: add support for all tasks alloc restart

Implement the new -all-tasks alloc restart CLI flag and its API
counterpar, AllTasks. The client endpoint calls the appropriate restart
method from the allocrunner depending on the restart parameters used.

* test: fix tasklifecycle Coordinator test

* allocrunner: kill taskrunners if all tasks are dead

When all non-poststop tasks are dead we need to kill the taskrunners so
we don't leak their goroutines, which are blocked in the alloc restart
loop. This also ensures the allocrunner exits on its own.

* taskrunner: fix tests that waited on WaitCh

Now that "dead" tasks may run again, the taskrunner Run() method will
not return when the task finishes running, so tests must wait for the
task state to be "dead" instead of using the WaitCh, since it won't be
closed until the taskrunner is killed.

* tests: add tests for all tasks alloc restart

* changelog: add entry for #14127

* taskrunner: fix restore logic.

The first implementation of the task runner restore process relied on
server data (`tr.Alloc().TerminalStatus()`) which may not be available
to the client at the time of restore.

It also had the incorrect code path. When restoring a dead task the
driver handle always needs to be clear cleanly using `clearDriverHandle`
otherwise, after exiting the MAIN loop, the task may be killed by
`tr.handleKill`.

The fix is to store the state of the Run() loop in the task runner local
client state: if the task runner ever exits this loop cleanly (not with
a shutdown) it will never be able to run again. So if the Run() loops
starts with this local state flag set, it must exit early.

This local state flag is also being checked on task restart requests. If
the task is "dead" and its Run() loop is not active it will never be
able to run again.

* address code review requests

* apply more code review changes

* taskrunner: add different Restart modes

Using the task event to differentiate between the allocrunner restart
methods proved to be confusing for developers to understand how it all
worked.

So instead of relying on the event type, this commit separated the logic
of restarting an taskRunner into two methods:
- `Restart` will retain the current behaviour and only will only restart
  the task if it's currently running.
- `ForceRestart` is the new method where a `dead` task is allowed to
  restart if its `Run()` method is still active. Callers will need to
  restart the allocRunner taskCoordinator to make sure it will allow the
  task to run again.

* minor fixes
2022-08-24 17:43:07 -04:00
Seth Hoenig
5694999c61 cli: display nomad service check status output in CLI commands
This PR adds some NSD check status output to the CLI.

1. The 'nomad alloc status' command produces nsd check summary output (if present)
2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)
2022-08-19 09:18:29 -05:00
Tim Gross
fccc49a5eb api: document warnings for setting api.ClientConnTimeout (#14122)
HTTP API consumers that have network line-of-sight to client nodes can connect
directly for a small number of APIs. But in environments where the consumer
doesn't have line-of-sight, there's a long pause waiting for the
`api.ClientConnTimeout` to expire. Warn about this in the API docs so that
authors can avoid the extra timeout.
2022-08-15 16:06:02 -04:00
James Rasell
c67fd40084 api: use errors.New not fmt.Errorf when error doesn't have format. (#14027)
* api: use errors.New not fmt.Errorf when error doesn't have format.

* semgrep: add rule to catch fmt.Errorf use without formatting.
2022-08-05 17:05:47 +02:00
James Rasell
581390bed1 cli: do not import structs, use API package only. (#13938) 2022-08-02 16:33:08 +02:00
James Rasell
16cb776f9c api: add service registration HTTP API wrapper. 2022-03-03 12:14:00 +01:00
Mahmood Ali
a15a61759e exec: api: handle closing errors differently
refactor the api handling of `nomad exec`, and ensure that we process
all received events before handling websocket closing.

The exit code should be the last message received, and we ought to
ignore any websocket close error we receive afterwards.

Previously, we used two channels: one for websocket frames and another
for handling errors. This raised the possibility that we processed the
error before processing the frames, resulting into an "unexpected EOF"
error.
2021-05-25 11:19:42 -04:00
Luiz Aoqui
c7114921fa Add metrics for blocked eval resources (#10454)
* add metrics for blocked eval resources

* docs: add new blocked_evals metrics

* fix to call `pruneStats` instead of `stats.prune` directly
2021-04-29 15:03:45 -04:00
Mahmood Ali
7df90da9c8 oversubscription: adds CLI and API support
This commit updates the API to pass the MemoryMaxMB field, and the CLI to show
the max set for the task.

Also, start parsing the MemoryMaxMB in hcl2, as it's set by tags.

A sample CLI output; note the additional `Max: ` for "task":

```
$ nomad alloc status 96fbeb0b
ID                  = 96fbeb0b-a0b3-aa95-62bf-b8a39492fd5c
[...]

Task "cgroup-fetcher" is "running"
Task Resources
CPU        Memory         Disk     Addresses
0/500 MHz  32 MiB/20 MiB  300 MiB

Task Events:
[...]

Task "task" is "running"
Task Resources
CPU        Memory          Disk     Addresses
0/500 MHz  176 KiB/20 MiB  300 MiB
           Max: 30 MiB

Task Events:
[...]
```
2021-03-30 16:55:58 -04:00
James Rasell
788f9f23a0 api: add Allocation client and server terminal status funcs. 2021-03-25 08:52:59 +01:00
Michael Dwan
e14be0b4f2 Add devices to AllocatedTaskResources 2021-02-22 12:47:36 -07:00
Michael Schurter
a55f46e9ba api: add field filters to /v1/{allocations,nodes}
Fixes #9017

The ?resources=true query parameter includes resources in the object
stub listings. Specifically:

- For `/v1/nodes?resources=true` both the `NodeResources` and
  `ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
  included.

The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
2020-10-14 10:35:22 -07:00
Mahmood Ali
679fee900c api: target servers for allocation requests (#8897)
Allocation requests should target servers, which then can forward the
request to the appropriate clients.

Contacting clients directly is fragile and prune to failures: e.g.
clients maybe firewalled and not accessible from the API client, or have
some internal certificates not trusted by the API client.

FWIW, in contexts where we anticipate lots of traffic (e.g. logs, or
exec), the api package attempts contacting the client directly but then
fallsback to using the server. This approach seems excessive in these
simple GET/PUT requests.

Fixes #8894
2020-09-16 09:34:17 -04:00
Nick Ethier
7c4bff9549 command: correctly show host IP in ports output /w multi-host networks (#8289) 2020-06-25 15:16:01 -04:00
Lang Martin
13e37865b7 csi: volumes listed in nomad node status (#7318)
* api/allocations: GetTaskGroup finds the taskgroup struct

* command/node_status: display CSI volume names

* nomad/state/state_store: new CSIVolumesByNodeID

* nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator

* nomad/csi_endpoint: deal with a slice of volumes

* nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator

* nomad/structs/csi: CSIVolumeListRequest takes a NodeID

* nomad/csi_endpoint: use the return iterator

* command/agent/csi_endpoint: parse query params for CSIVolumes.List

* api/nodes: new CSIVolumes to list volumes by node

* command/node_status: use the new list endpoint to print volumes

* nomad/state/state_store: error messages consider the operator

* command/node_status: include the Provider
2020-03-23 13:58:30 -04:00
Lang Martin
aea212d34d csi: CLI for volume status, registration/deregistration and plugin status (#7193)
* command/csi: csi, csi_plugin, csi_volume

* helper/funcs: move ExtraKeys from parse_config to UnusedKeys

* command/agent/config_parse: use helper.UnusedKeys

* api/csi: annotate CSIVolumes with hcl fields

* command/csi_plugin: add Synopsis

* command/csi_volume_register: use hcl.Decode style parsing

* command/csi_volume_list

* command/csi_volume_status: list format, cleanup

* command/csi_plugin_list

* command/csi_plugin_status

* command/csi_volume_deregister

* command/csi_volume: add Synopsis

* api/contexts/contexts: add csi search contexts to the constants

* command/commands: register csi commands

* api/csi: fix struct tag for linter

* command/csi_plugin_list: unused struct vars

* command/csi_plugin_status: unused struct vars

* command/csi_volume_list: unused struct vars

* api/csi: add allocs to CSIPlugin

* command/csi_plugin_status: format the allocs

* api/allocations: copy Allocation.Stub in from structs

* nomad/client_rpc: add some error context with Errorf

* api/csi: collapse read & write alloc maps to a stub list

* command/csi_volume_status: cleanup allocation display

* command/csi_volume_list: use Schedulable instead of Healthy

* command/csi_volume_status: use Schedulable instead of Healthy

* command/csi_volume_list: sprintf string

* command/csi: delete csi.go, csi_plugin.go

* command/plugin: refactor csi components to sub-command plugin status

* command/plugin: remove csi

* command/plugin_status: remove csi

* command/volume: remove csi

* command/volume_status: split out csi specific

* helper/funcs: add RemoveEqualFold

* command/agent/config_parse: use helper.RemoveEqualFold

* api/csi: do ,unusedKeys right

* command/volume: refactor csi components to `nomad volume`

* command/volume_register: split out csi specific

* command/commands: use the new top level commands

* command/volume_deregister: hardwired type csi for now

* command/volume_status: csiFormatVolumes rescued from volume_list

* command/plugin_status: avoid a panic on no args

* command/volume_status: avoid a panic on no args

* command/plugin_status: predictVolumeType

* command/volume_status: predictVolumeType

* nomad/csi_endpoint_test: move CreateTestPlugin to testing

* command/plugin_status_test: use CreateTestCSIPlugin

* nomad/structs/structs: add CSIPlugins and CSIVolumes search consts

* nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix

* nomad/search_endpoint: add CSIPlugins and CSIVolumes

* command/plugin_status: move the header to the csi specific

* command/volume_status: move the header to the csi specific

* nomad/state/state_store: CSIPluginByID prefix

* command/status: rename the search context to just Plugins/Volumes

* command/plugin,volume_status: test return ids now

* command/status: rename the search context to just Plugins/Volumes

* command/plugin_status: support -json and -t

* command/volume_status: support -json and -t

* command/plugin_status_csi: comments

* command/*_status: clean up text

* api/csi: fix stale comments

* command/volume: make deregister sound less fearsome

* command/plugin_status: set the id length

* command/plugin_status_csi: more compact plugin health

* command/volume: better error message, comment
2020-03-23 13:58:30 -04:00
Mahmood Ali
c50f295629 api: alloc exec recovers from bad client connection
If alloc exec fails to connect to the nomad client associated with the
alloc, fail over to using a server.

The code attempted to special case `net.Error` for failover to rule out
other permanent non-networking errors, by reusing a pattern in the
logging handling.

But this pattern does not apply here.  `net/http.Http` wraps all errors
as `*url.Error` that is net.Error.  The websocket doesn't, and instead
returns the raw error.  If the raw error isn't a `net.Error`, like in
the case of TLS handshake errors, the api package would fail immediately
rather than failover.
2020-03-04 17:43:00 -05:00
Mahmood Ali
298c528839 cli: recover from client ACL lookup failures
This fixes a bug in the CLI handling of node lookup failures when
querying allocation and FS endpoints.

Allocation and FS endpoint are handled by the client; one can query the
relevant client directly, or query a server to have it forwarded
transparently to relevant client.  Querying the client directly is
benefecial to avoid loading servers with IO.

As an optimization, the CLI attempts to query the client directly, but
then falls back to using server forwarding path if it encounters network
or connection errors (e.g. clients are locked down or in a separate
inaccessible network).

Here, we fix a bug where if the CLI fails to find to lookup the client
details because it lacks ACL capability or other unexpected reasons, the
CLI will not go through fallback path.
2019-10-04 11:23:59 -04:00
Michael Schurter
1ba097bbd5 api: add missing Networks field to alloc resources 2019-07-31 01:04:06 -04:00
Chris Baker
06e30c4589 api: removed unused AllocID from AllocSignalRequest 2019-06-21 21:44:38 +00:00
Mahmood Ali
6168cc59be add api support for nomad exec
Adds nomad exec support in our API, by hitting the websocket endpoint.

We introduce API structs that correspond to the drivers streaming exec structs.

For creating the websocket connection, we reuse the transport setting from api
http client.
2019-05-09 16:49:08 -04:00
Mahmood Ali
1a54a0b839 divest /api from nomad/structs
The API package needs to be independent from rest of nomad packages, to
avoid leaking internal packages and dependencies (e.g. raft, ugorji,
etc)
2019-04-28 13:32:26 -04:00
Danielle Lancashire
023d0dff31 allocs: Add nomad alloc signal command
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.

Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
2019-04-25 12:43:32 +02:00
Danielle
9a4fe5e98f Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire
bb142af5d6 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Preetha Appan
ad77c18c87 Add preemption related fields to AllocationListStub 2019-04-18 10:36:44 -05:00
Danielle Lancashire
419d70c5f9 allocs: Add nomad alloc restart
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
2019-04-11 14:25:49 +02:00
James Rasell
ee92bf86d8 Add NodeName to the alloc/job status outputs.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.

Closes #2359
Cloess #1180
2019-04-10 10:34:10 -05:00
Mahmood Ali
5e185386e0 api: avoid codegen for syncing
Given that the values will rarely change, specially considering that any
changes would be backward incompatible change.  As such, it's simpler to
keep syncing manually in the rare occasion and avoid the syncing code
overhead.
2019-01-18 18:52:31 -05:00
Preetha Appan
6bb8e5aa58 Show preemption output in plan CLI 2018-11-08 09:48:43 -06:00
Preetha Appan
5f27e0010d structs and API changes to plan and alloc structs needed for preemption 2018-10-30 11:06:32 -05:00
Alex Dadgar
5e67b37aad use int64 2018-10-16 15:34:32 -07:00
Preetha Appan
3ca71ae935 Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00