22154 Commits

Author SHA1 Message Date
Tim Gross
2d4e5b8fe9 scheduler: fix quadratic performance with spread blocks (#11712)
When the scheduler picks a node for each evaluation, the
`LimitIterator` provides at most 2 eligible nodes for the
`MaxScoreIterator` to choose from. This keeps scheduling fast while
producing acceptable results because the results are binpacked.

Jobs with a `spread` block (or node affinity) remove this limit in
order to produce correct spread scoring. This means that every
allocation within a job with a `spread` block is evaluated against
_all_ eligible nodes. Operators of large clusters have reported that
jobs with `spread` blocks that are eligible on a large number of nodes
can take longer than the nack timeout to evaluate (60s). Typical
evaluations are processed in milliseconds.

In practice, it's not necessary to evaluate every eligible node for
every allocation on large clusters, because the `RandomIterator` at
the base of the scheduler stack produces enough variation in each pass
that the likelihood of an uneven spread is negligible. Note that
feasibility is checked before the limit, so this only impacts the
number of _eligible_ nodes available for scoring, not the total number
of nodes.

This changeset sets the iterator limit for "large" `spread` block and
node affinity jobs to be equal to the number of desired
allocations. This brings an example problematic job evaluation down
from ~3min to ~10s. The included tests ensure that we have acceptable
spread results across a variety of large cluster topologies.
2021-12-21 10:10:01 -05:00
Andy Assareh
20bbdba041 Mesh Gateway doc enhancements (#11354)
* Mesh Gateway doc enhancements

1. I believe this line should be corrected to add mesh as one of the choices
2. I found that we are not setting this meta, and it is a required element for wan federation. I believe it would be helpful and potentially time saving to note that right here.
2021-12-20 17:10:44 -05:00
Guilherme
649f1ab6df Fix 'check calculations' link (#11420) 2021-12-20 17:09:15 -05:00
Luiz Aoqui
84ef826d1c changelog: add entries for #11555, #11557, and #11687 (#11706) 2021-12-20 13:45:20 -05:00
Tim Gross
3740c24d9e api: respect wildcard in evaluations list API (#11710) 2021-12-20 12:23:50 -05:00
Jai
0ec5db432f Merge pull request #11578 from hashicorp/f-ui/clickable-links-allocs
clickable links in allocations chart
2021-12-20 10:08:01 -05:00
James Rasell
ab9ba35e6a chore: fixup inconsistent method receiver names. (#11704) 2021-12-20 11:44:21 +01:00
Jai
ca8af7314a Merge pull request #11545 from hashicorp/f-ui/add-alloc-filters-on-table
Add Allocation Filters in Client View
2021-12-18 09:39:53 -05:00
Jai
296a29f0dd Merge pull request #11544 from hashicorp/f-ui/add-filters-to-allocs
Add filters to Allocations
2021-12-18 09:38:28 -05:00
Luiz Aoqui
a8c9676c99 ui: fix action call to set filter query param 2021-12-17 20:41:53 -05:00
Luiz Aoqui
efd05eaa54 ui: fix volume serializer tests 2021-12-17 20:23:28 -05:00
Luiz Aoqui
e6ee0619c0 ui: fix allocation serializer tests 2021-12-17 20:02:59 -05:00
Luiz Aoqui
ad80c84aff ui: fix job allocation filter by status, remove version filter, and add tests 2021-12-17 19:50:43 -05:00
Luiz Aoqui
f8709ff55a ui: fix file formatting 2021-12-17 19:47:25 -05:00
Luiz Aoqui
3f363938b7 changelog: fix entry for #11544 2021-12-17 18:57:54 -05:00
Luiz Aoqui
770bb0534a ui: fix linting 2021-12-17 18:55:41 -05:00
Luiz Aoqui
ba1151198e changelog: add entry for #11545 2021-12-17 18:49:56 -05:00
Luiz Aoqui
1d773d0d9e ui: fix task group alloc filter and add tests 2021-12-17 18:49:47 -05:00
Luiz Aoqui
648b71c96a ui: display empty message in the client details page if there are no allocations to show 2021-12-17 18:49:47 -05:00
Luiz Aoqui
6112620590 ui: fix client details page alloc status filter and replace task group with namespace and job 2021-12-17 18:49:42 -05:00
Jai
c0add56610 fix: more descriptive parameters in sort function
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2021-12-17 09:46:29 -05:00
Jai
a8854bc3a8 fix: remove eslint disable indent 2021-12-17 09:36:42 -05:00
Michael Schurter
3ca534acf6 Merge pull request #11697 from hashicorp/f-raft-state-err
cli: return error from raft commands if db is open
2021-12-16 14:18:15 -08:00
Michael Schurter
fa3de735cf cli: return error from raft commands if db is open
Before this change, trying to run `nomad operator raft {info,logs}` on an
in-use raft.db would cause the command to block until the agent using
raft.db is closed.

After this change the command will block for 1s before returning a
(hopefully) helpful error message.

This change also sets the ReadOnly mode on the underlying BoltDb to
ensure diagnostics make no changes to the underlying store. We have no
evidence this has ever occurred, but it seems like a useful safety
measure.

No changelog added since this is a minor tweak in a "new" feature (it
was hidden in previous releases).
2021-12-16 11:41:01 -08:00
Luiz Aoqui
55018bdfe6 docs: add v1.2.0 upgrade guide about Nomad UI ACL change for job details page (#11689) 2021-12-16 14:32:20 -05:00
Luiz Aoqui
15db86a6af docs: add more references and examples to the template block (#11691) 2021-12-16 14:14:01 -05:00
Noel Quiles
495a46ee79 website: Disable alert banner (#11688) 2021-12-16 13:43:47 -05:00
Tim Gross
bd18a452ab cli: stream raft logs to operator raft logs subcommand (#11684)
The `nomad operator raft logs` command uses a raft helper that reads
in the logs from raft and serializes them to JSON. The previous
implementation returned the slice of all logs and then serialized the
entire object. Update the helper to stream the log entries and then
serialize them as newline-delimited JSON.
2021-12-16 13:38:58 -05:00
Jai Bhagat
094c1912f9 feat: add sliceClick to job-page/summary 2021-12-16 11:24:03 -05:00
Jai Bhagat
c6dd71322a chore: prettify job-page/summary 2021-12-16 11:23:05 -05:00
Tim Gross
03ea7d1c17 cli: unhide advanced operator raft debugging commands (#11682)
The `nomad operator raft` and `nomad operator snapshot state`
subcommands for inspecting on-disk raft state were hidden and
undocumented. Expose and document these so that advanced operators
have support for these tools.
2021-12-16 10:32:11 -05:00
Tim Gross
97621ec3c5 nomad eval list command (#11675)
Use the new filtering and pagination capabilities of the `Eval.List`
RPC to provide filtering and pagination at the command line.

Also includes a note that `nomad eval status -json` is deprecated and
will be replaced with a single evaluation view in a future version of
Nomad.
2021-12-15 11:58:38 -05:00
Tim Gross
072d3b6b74 cli: ensure -stale flag is respected by nomad operator debug (#11678)
When a cluster doesn't have a leader, the `nomad operator debug`
command can safely use stale queries to gracefully degrade the
consistency of almost all its queries. The query parameter for these
API calls was not being set by the command.

Some `api` package queries do not include `QueryOptions` because
they target a specific agent, but they can potentially be forwarded to
other agents. If there is no leader, these forwarded queries will
fail. Provide methods to call these APIs with `QueryOptions`.
2021-12-15 10:44:03 -05:00
Luiz Aoqui
86e2dd718c api: return error when LicenseGet status is not 200 (#11644) 2021-12-14 19:47:09 -05:00
Noel Quiles
608cdfc71d website: Copy updates (#11677) 2021-12-14 16:35:21 -05:00
Noel Quiles
ed91a53475 website: Update website Docker image (#11667) 2021-12-13 16:40:46 -05:00
Kevin Wang
be08f0ae3f feat: versioned docs (#11407) 2021-12-13 16:21:57 -05:00
Tim Gross
35c22bcb6c provide -no-shutdown-delay flag for job/alloc stop (#11596)
Some operators use very long group/task `shutdown_delay` settings to
safely drain network connections to their workloads after service
deregistration. But during incident response, they may want to cause
that drain to be skipped so they can quickly shed load.

Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and
`nomad job stop` commands that bypasses the delay. This sets a new
desired transition state on the affected allocations that the
allocation/task runner will identify during pre-kill on the client.

Note (as documented here) that using this flag will almost always
result in failed inbound network connections for workloads as the
tasks will exit before clients receive updated service discovery
information and won't be gracefully drained.
2021-12-13 14:54:53 -05:00
Tim Gross
43b3e1628f Merge pull request #11665 from hashicorp/merge-release-1.2.3-branch
Merge release 1.2.3 branch
2021-12-13 10:42:15 -05:00
Tim Gross
a6f6abbbee prepare for next release 2021-12-13 10:14:22 -05:00
Tim Gross
5757613077 Merge tag 'v1.2.3' into merge-release-1.2.3-branch
Version 1.2.3
2021-12-13 10:12:07 -05:00
Tim Gross
4d6658409a trigger Vercel pipeline 2021-12-13 09:56:34 -05:00
Tim Gross
63329878c0 update download to Nomad v1.2.3 (#11664) 2021-12-13 09:56:12 -05:00
Nomad Release Bot
55e5c49b99 Release v1.2.3 2021-12-10 20:10:08 +00:00
Nomad Release bot
a79efc8422 Generate files for 1.2.3 release 2021-12-10 19:30:22 +00:00
Tim Gross
45a5b22b65 docs: add 1.2.3 to changelog 2021-12-10 14:06:14 -05:00
Tim Gross
9439d7a823 golang security update 1.17.5 2021-12-10 13:50:22 -05:00
Tim Gross
972708aaed evaluations list pagination and filtering (#11648)
API queries can request pagination using the `NextToken` and `PerPage`
fields of `QueryOptions`, when supported by the underlying API.

Add a `NextToken` field to the `structs.QueryMeta` so that we have a
common field across RPCs to tell the caller where to resume paging
from on their next API call. Include this field on the `api.QueryMeta`
as well so that it's available for future versions of List HTTP APIs
that wrap the response with `QueryMeta` rather than returning a simple
list of structs. In the meantime callers can get the `X-Nomad-NextToken` header.

Add pagination to the `Eval.List` RPC by checking for a pagination token
and page size in `QueryOptions`. This will allow resuming from the
last ID seen so long as the query parameters and the state store
itself are unchanged between requests.

Add filtering by job ID or evaluation status over the results we get
out of the state store.

Parse the query parameters of the `Eval.List` API into the arguments
expected for filtering in the RPC call.
2021-12-10 13:43:03 -05:00
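
The pagination and filtering behavior described in commit 972708aaed can be sketched over an in-memory list as follows. Everything here is illustrative: the `eval` type and `listPage` function are hypothetical, and this sketch assumes the next token is the ID of the first item of the following page, which may differ from how the RPC encodes its token. As the commit message notes, resuming is only reliable if the filters and the underlying data are unchanged between requests.

```go
package main

import (
	"fmt"
	"sort"
)

// eval is a hypothetical, minimal stand-in for an evaluation record.
type eval struct {
	ID     string
	JobID  string
	Status string
}

// listPage returns up to perPage evaluations, in ID order, starting at
// nextToken and matching the optional jobID and status filters. It also
// returns the token to resume from, or "" when there are no more results.
// A perPage of 0 means no page size limit.
func listPage(evals []eval, jobID, status, nextToken string, perPage int) ([]eval, string) {
	// Sort a copy so results have a stable order across requests.
	sorted := append([]eval(nil), evals...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	var page []eval
	for _, e := range sorted {
		if e.ID < nextToken {
			continue // already returned on an earlier page
		}
		if jobID != "" && e.JobID != jobID {
			continue
		}
		if status != "" && e.Status != status {
			continue
		}
		if perPage > 0 && len(page) == perPage {
			return page, e.ID // first match beyond this page
		}
		page = append(page, e)
	}
	return page, ""
}

func main() {
	evals := []eval{
		{ID: "d", JobID: "j1", Status: "pending"},
		{ID: "a", JobID: "j1", Status: "pending"},
		{ID: "c", JobID: "j2", Status: "pending"},
		{ID: "b", JobID: "j1", Status: "complete"},
	}
	token := ""
	for {
		page, next := listPage(evals, "", "pending", token, 2)
		fmt.Println(page, next)
		if next == "" {
			break
		}
		token = next
	}
}
```

Applying the filters before counting toward the page size means each page holds up to `perPage` matching results rather than `perPage` scanned records.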
Kevin Wang
ddca508b0d feat(website): extract /plugins /tools docs (#11584)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>
2021-12-09 14:25:18 -05:00
Brandon Romano
1fdf1b9122 Update the banner (#11656) 2021-12-09 12:08:58 -05:00