79 Commits

Author SHA1 Message Date
James Rasell
e3fea745eb docs: Remove long removed client iops metrics from monitoring page. (#25926) 2025-05-23 16:14:16 +01:00
Aimee Ukasick
c12ad24de0 Docs: SEO updates to operations, other specs sections (#25518)
* seo operation section

* other specifications section

* Update website/content/docs/other-specifications/variables.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2025-05-22 07:47:05 -05:00
Aimee Ukasick
4075b0b8ba Docs: Add garbage collection page (#25715)
* add garbage collection page

* finish client; add resources section

* finish server section; task driver section

* add front matter description

* fix typos

* Address Tim's feedback
2025-04-28 08:37:23 -05:00
Aimee Ukasick
9778fa4912 Docs: Fix broken links in main for 1.10 release (#25540)
* Docs: Fix broken links in main for 1.10 release

* Implement Tim's suggestions

* Remove link to Portworx from ecosystem page

* remove "Portworx" since Portworx 3.2 no longer supports Nomad
2025-04-01 09:09:44 -05:00
Aimee Ukasick
95ee9261a5 Docs: fix broken links in 1.10 beta docs (#25469)
* Docs: fix 1.10 broken link in operations/stateful-workloads

* updated the link in other pages
2025-03-20 13:17:09 -05:00
Daniel Bennett
8c609ad762 docs: oidc client assertions and pkce (#25375) 2025-03-20 09:14:17 -05:00
Shantanu Gadgil
b641d25730 website: fix URL for periodic jobs (#25436) 2025-03-19 07:32:51 +00:00
Aimee Ukasick
5bceb3956e DHV Front matter description updates for devdot search (#25022)
* front matter description updates for devdot search; CE-812

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2025-02-06 09:34:54 -06:00
Daniel Bennett
b3ecb69b5a docs: add quota storage metrics (#24998)
and reformat the whole darn table
2025-01-31 16:03:43 -06:00
Tim Gross
614e9067ab docs: considerations for stateful workloads updates for DHV (#24930)
We have a document describing the various approaches to storage that surveys the
landscape and makes recommendations based on the user's environment. Add dynamic
host volumes to this document.

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-01-28 16:33:22 -05:00
James Rasell
ef32825ede docs: Remove Portworx state workloads link. (#24921)
Portworx website no longer has Nomad related documentation.
2025-01-24 08:45:41 +00:00
James Rasell
7d48aa2667 client: emit optional telemetry from prerun and prestart hooks. (#24556)
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.

The new datapoints are:
  - nomad.client.alloc_hook.prerun.success (counter)
  - nomad.client.alloc_hook.prerun.failed (counter)
  - nomad.client.alloc_hook.prerun.elapsed (sample)

  - nomad.client.task_hook.prestart.success (counter)
  - nomad.client.task_hook.prestart.failed (counter)
  - nomad.client.task_hook.prestart.elapsed (sample)

The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.

Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
2024-12-12 14:43:14 +00:00
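The hook telemetry described in the commit above can be sketched as follows. This is a minimal illustration in Python, not Nomad's implementation (which is written in Go): the `HookMetrics` class and `run_hook` method are hypothetical names, and recording the elapsed sample on failure as well as success is an assumption. Only the metric name prefixes come from the commit message.

```python
import time
from collections import defaultdict


class HookMetrics:
    """In-memory stand-in for a metrics sink (hypothetical, for illustration)."""

    def __init__(self):
        self.counters = defaultdict(int)   # metric name -> count
        self.samples = defaultdict(list)   # metric name -> elapsed seconds

    def run_hook(self, prefix, hook):
        """Run hook() and record <prefix>.success/.failed/.elapsed."""
        start = time.monotonic()
        try:
            hook()
        except Exception:
            self.counters[f"{prefix}.failed"] += 1
            raise
        else:
            self.counters[f"{prefix}.success"] += 1
        finally:
            # Assumption: elapsed time is sampled whether or not the hook failed.
            self.samples[f"{prefix}.elapsed"].append(time.monotonic() - start)
```

An operator-facing alert would then watch the `.failed` counter and the distribution of `.elapsed` samples for each hook prefix.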
Aimee Ukasick
4dfedf1aef add top-level heading so the page renders correctly (#24491)
Add opening paragraph; update description
2024-11-19 11:10:10 -06:00
James Rasell
dc501339da docs: Add federated region concept and operations pages. (#24477)
In order to help users understand multi-region federated
deployments, this change adds two new sections to the website.

The first expands the architecture page, so we can add further
detail over time with an initial federation page. The second adds
a federation operations page which goes into failure planning and
mitigation.

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2024-11-19 12:39:57 +00:00
Daniel Bennett
c32d9ed6f5 docs: ipv6: small fixes (#24368)
* escaping newlines is not allowed in go-sockaddr template
* client{} block in client section
* tiny extra clarification that the NOMAD_ADDR is an example
2024-11-05 11:11:36 -06:00
Aimee Ukasick
5b1ad83d82 Docs: Add IPv6 support page (#24228)
* initial content from Daniel's doc

* Add IPv6 support doc to operations section.

* daniel obsessively re-refactors his docs

* Style guide edits

* a few more style nits

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-10-29 14:02:04 -05:00
Michael Schurter
da75d4ff4b docs: fix aed -> aead typo (#24123) 2024-10-03 13:31:32 -04:00
Adrian Todorov
2444cc3504 docs: small updates to Nomad as an AWS OIDC Provider docs (#24078)
A few small updates to the recent "Federate access to AWS with Nomad Workload Identity" documentation. Most notably, a restart isn't needed because AWS SDKs handle OIDC reauth gracefully. For every other type of auth, credentials are cached statically on startup, so nothing but a full restart works if your credentials expire.
2024-09-30 11:02:09 -04:00
Aimee Ukasick
5f92ccbfb2 Docs: Terraform prereq clarification (#24069)
Clarify Terraform prereq since you don't need to install the Terraform CLI locally.

Fixes: [CE-726](https://hashicorp.atlassian.net/browse/CE-726)
2024-09-27 13:47:10 -04:00
Phil Renaud
e206993d49 Feature: Golden Versions (#24055)
* TaggedVersion information in structs, rather than job_endpoint (#23841)

* TaggedVersion information in structs, rather than job_endpoint

* Test for taggedVersion description length

* Some API plumbing

* Tag and Untag job versions (#23863)

* Tag and Untag at API level on down, but am I unblocking the wrong thing?

* Code and comment cleanup

* Unset methods generally now I stare long into the namespace abyss

* Namespace passes through with QueryOptions removed from a write requesting struct

* Comment and PR review cleanup

* Version back to VersionStr

* Generally consolidate unset logic into apply for version tagging

* Addressed some PR comments

* Auth check and RPC forwarding

* uint64 instead of pointer for job version after api layer and renamed copy

* job tag command split into apply and unset

* latest-version convenience handling moved to CLI command level

* CLI tests for tagging/untagging

* UI parts removed

* Add to job table when unsetting job tag on latest version

* Vestigial no more

* Compare versions by name and version number with the nomad history command (#23889)

* First pass at passing a tagname and/or diff version to plan/versions requests

* versions API now takes compare_to flags

* Job history command output can have tag names and descriptions

* compare_to to diff-tag and diff-version, plus adding flags to history command

* 0th version now shows a diff if a specific diff target is requested

* Addressing some PR comments

* Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI

* Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon

* Version diff tests

* re-implement JobVersionByTagName

* Test mods and simplification

* Documentation for nomad job history additions

* Prevent pruning and reaping of TaggedVersion jobs (#23983)

tagged versions should not count against JobTrackedVersions
i.e. new job versions being inserted should not evict tagged versions

and GC should not delete a job if any of its versions are tagged

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* [ui] Version Tags on the job versions page (#24013)

* Timeline styles and their buttons modernized, and tags added

* styled but not yet functional version blocks

* Rough pass at edit/unedit UX

* Styles consolidated

* better UX around version tag crud, plus adapter and serializers

* Mirage and acceptance tests

* Modify percy to not show time-based things

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* Job revert command and API endpoint can take a string version tag name (#24059)

* Job revert command and API endpoint can take a string version tag name

* RevertOpts as a signature-modified alternative to Revert()

* job revert CLI test

* Version pointers in endpoint tests

* Don't copy over the tag when a job is reverted to a version with a tag

* Convert tag name to version number at CLI level

* Client method for version lookup by tag

* No longer double-declaring client

* [ui] Add tag filter to the job versions page (#24064)

* Rough pass at the UI for version diff dropdown

* Cleanup and diff fetching via adapter method

* TaggedVersion now VersionTag (#24066)

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-09-25 19:59:16 -04:00
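The pruning and GC rules for tagged versions stated in the commit above (#23983) can be sketched roughly as below. This is an illustrative Python model, not Nomad's code: `prune_versions`, `can_gc`, the dict layout, and the `max_tracked` parameter (standing in for the JobTrackedVersions limit) are all hypothetical.

```python
def prune_versions(versions, max_tracked):
    """Return the versions kept after pruning, newest first.

    versions: list of {"version": int, "tag": str or None}
    Tagged versions are always kept and do not count against max_tracked.
    """
    ordered = sorted(versions, key=lambda v: v["version"], reverse=True)
    kept, untagged_kept = [], 0
    for v in ordered:
        if v["tag"] is not None:
            kept.append(v)              # never evicted by newer versions
        elif untagged_kept < max_tracked:
            kept.append(v)
            untagged_kept += 1
    return kept


def can_gc(versions):
    """A job is eligible for GC only if none of its versions is tagged."""
    return not any(v["tag"] is not None for v in versions)
```

With eight versions, a tag on version 1, and `max_tracked=3`, this keeps versions 7, 6, 5, and the tagged version 1.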
Tim Gross
a3a2028837 docs: update key management docs for keyring-in-Raft (#24026)
In #23977 we moved the keyring into Raft. This changeset documents the
operational changes and adds notes to the upgrade guide.
2024-09-25 10:48:14 -04:00
Anthony
46d92a53a5 Usage doc for configuring Nomad OIDC with AWS IAM (#23845) 2024-09-23 14:01:22 -04:00
Tim Gross
ef116b12d5 metrics: add client.tasks state metrics (#23773)
Although we have `client.allocations` metrics to track allocation states on a
client, having separate metrics for `client.tasks` will allow operators to
identify that there are individual tasks in an unexpected state in an otherwise
healthy allocation.

Fixes: https://github.com/hashicorp/nomad/issues/23770
2024-08-09 09:02:17 -04:00
Tim Gross
0f4014b4a9 docs: external KMS configuration (#23600)
In #23580 we're implementing support for encrypting Nomad's key material with
external KMS providers or Vault Transit. This changeset breaks out the
documentation from that PR to keep the review manageable and present it to a
wider set of reviewers.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Ref: https://github.com/hashicorp/nomad/issues/14852
Ref: https://github.com/hashicorp/nomad/pull/23580
2024-07-19 15:08:54 -04:00
Piotr Kazmierczak
abc6fe325d docs: fix typo in nomad quota utilization metrics (#23185) 2024-06-05 16:20:44 +02:00
Piotr Kazmierczak
2a09abc477 metrics: quota utilization configuration and documentation (#22912)
Introduces support for (optional) quota utilization metrics

CE part of the hashicorp/nomad-enterprise#1488 change
2024-06-03 21:06:19 +02:00
James Rasell
6cb9bed236 docs: add operations benchmarking page with nomad-bench link. (#22393) 2024-05-30 07:34:10 +01:00
Piotr Kazmierczak
048f4511e2 docs: correct nanoseconds to milliseconds for MeasureSince metrics (#20446) 2024-04-18 18:16:58 +02:00
Piotr Kazmierczak
0d14dd96ca eval_broker: track enqueue and dequeue times (#20329)
Adds new metrics to the eval broker that track times of evaluations enqueueing
and dequeueing.
2024-04-15 16:16:50 +02:00
Luiz Aoqui
b5573b7470 docs: fix invoke_scheduler metrics (#20172) 2024-03-21 10:57:30 -04:00
CJ
c9cd8480fa docs: considerations for Stateful Workloads (#19077)
Co-authored-by: Adrian Todorov <adrian.todorov@hashicorp.com>
2024-01-10 16:06:45 -05:00
Seth Hoenig
5f3aae7340 website: fix spellcheck path and cleanup some misspellings (#19238) 2023-11-30 09:38:19 -06:00
James Rasell
e2487698e6 docs: add alloc metrics note about possible cgroup variations. (#19195) 2023-11-28 14:32:08 +00:00
Kerim Satirli
5e1bbf90fc docs: update all URLs to developer.hashicorp.com (#16247) 2023-10-24 11:00:11 -04:00
Karuppiah Natarajan
2fd508d4f1 docs: fix link for stopping an agent (#18130) 2023-08-02 11:51:45 -04:00
Luiz Aoqui
ce0f60fb68 metrics: report task memory_max value (#17938)
Add new `nomad.client.allocs.memory.max_allocated` metric to report the
value of the task `memory_max` resource value.
2023-07-19 16:50:12 -04:00
Patric Stout
ede662a828 metrics: add "total_ticks_count" for CPU metrics (#17579)
This counter tells you the total amount of ticks for that CPU
entry since the start of Nomad.
2023-07-05 10:28:55 -04:00
Tim Gross
288ff2f0c4 docs: add missing client.allocs metrics (#17540)
The docs were missing counter metrics emitted by the task runner around task
state changes.
2023-06-15 09:18:11 -04:00
Tim Gross
068d0ea9af node pools: add pool as label on client metrics (#17528)
This changeset adds the node pool as a label anywhere we're already emitting
labels with additional information such as node class or ID about the client.
2023-06-14 15:58:38 -04:00
Tim Gross
6bd1ebed29 docs: note namespace apply/delete behaviors, fix metric (#17527)
This changeset includes some fixes to documentation discovered while working on
node pools, but we didn't want to include in the node pool PRs so they can get
backported easily:

* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires that there are no non-terminal jobs in any of
  the federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
2023-06-14 14:52:06 -04:00
Luiz Aoqui
f0f4cbb848 node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Tim Gross
bb0140803e node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
Luiz Aoqui
970e998b00 node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Tim Gross
c3002db815 client: allow drain_on_shutdown configuration (#16827)
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
2023-04-14 15:35:32 -04:00
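The shutdown flow described in the commit above can be modeled as a small polling loop. This is a sketch under stated assumptions, not the client's actual Go code: `wait_for_drain`, `ServerUnreachable`, and the two callables are hypothetical, and staying on the local check once the server becomes unreachable is an assumption about the fallback.

```python
import time


class ServerUnreachable(Exception):
    """Raised by the (hypothetical) server check when the connection is lost."""


def wait_for_drain(server_drain_done, local_allocs_stopped, deadline_s, poll_s=0.01):
    """Poll until the drain completes or the deadline expires.

    server_drain_done:    callable returning True once the server reports the drain complete
    local_allocs_stopped: callable returning True once local allocations have stopped
    """
    end = time.monotonic() + deadline_s
    use_local = False
    while time.monotonic() < end:
        if not use_local:
            try:
                if server_drain_done():
                    return "drained"
            except ServerUnreachable:
                # Assumption: once the server is lost, keep monitoring locally.
                use_local = True
        if use_local and local_allocs_stopped():
            return "stopped-locally"
        time.sleep(poll_s)
    return "deadline-expired"
```

The three return values correspond to the three exits in the commit message: drain complete as seen by the server, allocations stopped as seen locally, or the deadline expiring first.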
Tim Gross
504fdf0e43 docs: document signal handling (#16835)
Expand documentation about Nomad's signal handling behaviors, including removing
incorrect information about graceful client shutdowns.
2023-04-11 16:26:39 -04:00
Tim Gross
cef87f054a docs: add notes about keyring to snapshot restore (#16663)
When cluster administrators restore from Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
2023-03-28 08:31:01 -04:00
Piotr Kazmierczak
949a6f60c7 renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
Tim Gross
92effde870 docs: add more warnings about running agent as root on Linux (#15926) 2023-01-27 15:22:18 -05:00
Ashlee M Boyer
3444ece549 docs: Migrate link formats (#15779)
* Adding check-legacy-links-format workflow

* Adding test-link-rewrites workflow

* chore: updates link checker workflow hash

* Migrating links to new format

Co-authored-by: Kendall Strautman <kendallstrautman@gmail.com>
2023-01-25 09:31:14 -08:00
Tim Gross
9bdb6a5b7d Rename nomad.broker.total_blocked metric (#15835)
This changeset fixes a long-standing point of confusion in metrics emitted by
the eval broker. The eval broker has a queue of "blocked" evals that are waiting
for an in-flight ("unacked") eval of the same job to be completed. But this
"blocked" state is not the same as the `blocked` status that we write to raft
and expose in the Nomad API to end users. There's a second metric
`nomad.blocked_eval.total_blocked` that refers to evaluations in that
state. This has caused ongoing confusion in major customer incidents and even in
our own documentation! (Fixed in this PR.)

There's little functional change in this PR aside from the name of the metric
emitted, but there's a bit of refactoring to clean up the names in `eval_broker.go`
so that there aren't name collisions and multiple names for the same
state. Changes included are:
* Everything that was previously called "pending" referred to entities that were
  associated with the "ready" metric. These are all now called "ready" to match
  the metric.
* Everything named "blocked" in `eval_broker.go` is now named "pending", except
  for a couple of comments that actually refer to blocked RPCs.
* Added a note to the upgrade guide docs for 1.5.0.
* Fixed the scheduling performance metrics docs because the description for
  `nomad.broker.total_blocked` was actually the description for
  `nomad.blocked_eval.total_blocked`.
2023-01-20 14:23:56 -05:00
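The distinction between the broker's "ready", "unacked", and "pending" states in the commit above can be illustrated with a toy model. This is a simplified Python sketch, not the logic in `eval_broker.go`: `TinyBroker` and its methods are hypothetical, and it assumes at most one eval per job may be ready or in flight at a time.

```python
from collections import defaultdict, deque


class TinyBroker:
    """Toy eval broker: one ready or in-flight eval per job, the rest wait as pending."""

    def __init__(self):
        self.ready = deque()                 # (eval_id, job) waiting to be dequeued
        self.unacked = {}                    # job -> in-flight eval_id
        self.pending = defaultdict(deque)    # job -> evals waiting behind another eval
        self._queued_jobs = set()            # jobs that currently have a ready eval

    def enqueue(self, eval_id, job):
        if job in self.unacked or job in self._queued_jobs:
            self.pending[job].append(eval_id)
        else:
            self._queued_jobs.add(job)
            self.ready.append((eval_id, job))

    def dequeue(self):
        eval_id, job = self.ready.popleft()
        self._queued_jobs.discard(job)
        self.unacked[job] = eval_id
        return eval_id

    def ack(self, job):
        del self.unacked[job]
        if self.pending[job]:
            # Promote the next waiting eval for this job to ready.
            self.enqueue(self.pending[job].popleft(), job)

    def stats(self):
        return {
            "ready": len(self.ready),
            "unacked": len(self.unacked),
            "pending": sum(len(q) for q in self.pending.values()),
        }
```

In this model, "pending" counts evals held back inside the broker, which is a different thing from the `blocked` evaluation status written to Raft and tracked by `nomad.blocked_eval.total_blocked`.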