nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 17:35:43 +03:00

Author	SHA1	Message	Date
James Rasell	e3fea745eb	docs: Remove long removed client iops metrics from monitoring page. (#25926 )	2025-05-23 16:14:16 +01:00
Aimee Ukasick	c12ad24de0	Docs: SEO updates to operations, other specs sections (#25518 ) * seo operation section * other specifications section * Update website/content/docs/other-specifications/variables.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2025-05-22 07:47:05 -05:00
Aimee Ukasick	4075b0b8ba	Docs: Add garbage collection page (#25715 ) * add garbage collection page * finish client; add resources section * finish server section; task driver section * add front matter description * fix typos * Address Tim's feedback	2025-04-28 08:37:23 -05:00
Aimee Ukasick	9778fa4912	Docs: Fix broken links in main for 1.10 release (#25540 ) * Docs: Fix broken links in main for 1.10 release * Implement Tim's suggestions * Remove link to Portworx from ecosystem page * remove "Portworx" since Portworx 3.2 no longer supports Nomad	2025-04-01 09:09:44 -05:00
Aimee Ukasick	95ee9261a5	Docs: fix broken links in 1.10 beta docs (#25469 ) * Docs: fix 1.10 broken link in operations/stateful-workloads * updated the link in other pages	2025-03-20 13:17:09 -05:00
Daniel Bennett	8c609ad762	docs: oidc client assertions and pkce (#25375 )	2025-03-20 09:14:17 -05:00
Shantanu Gadgil	b641d25730	website: fix URL for periodic jobs (#25436 )	2025-03-19 07:32:51 +00:00
Aimee Ukasick	5bceb3956e	DHV Front matter description updates for devdot search (#25022 ) * front matter description updates for devdot search; CE-812 * Apply suggestions from code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2025-02-06 09:34:54 -06:00
Daniel Bennett	b3ecb69b5a	docs: add quota storage metrics (#24998 ) and reformat the whole darn table	2025-01-31 16:03:43 -06:00
Tim Gross	614e9067ab	docs: considerations for stateful workloads updates for DHV (#24930 ) We have a document describing the various approaches to storage that surveys the landscape and makes recommendations based on the user's environment. Add dynamic host volumes to this document. Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-01-28 16:33:22 -05:00
James Rasell	ef32825ede	docs: Remove Portworx state workloads link. (#24921 ) Portworx website no longer has Nomad related documentation.	2025-01-24 08:45:41 +00:00
James Rasell	7d48aa2667	client: emit optional telemetry from prerun and prestart hooks. (#24556 ) The Nomad client can now optionally emit telemetry data from the prerun and prestart hooks. This allows operators to monitor and alert on failures and time taken to complete. The new datapoints are: - nomad.client.alloc_hook.prerun.success (counter) - nomad.client.alloc_hook.prerun.failed (counter) - nomad.client.alloc_hook.prerun.elapsed (sample) - nomad.client.task_hook.prestart.success (counter) - nomad.client.task_hook.prestart.failed (counter) - nomad.client.task_hook.prestart.elapsed (sample) The hook execution time is useful to Nomad engineering and will help optimize code where possible and understand job specification impacts on hook performance. Currently only the PreRun and PreStart hooks have telemetry enabled, so we limit the number of new metrics being produced.	2024-12-12 14:43:14 +00:00
Aimee Ukasick	4dfedf1aef	add top-level heading so the page renders correctly (#24491 ) Add opening paragraph; update description	2024-11-19 11:10:10 -06:00
James Rasell	dc501339da	docs: Add federated region concept and operations pages. (#24477 ) In order to help users understand multi-region federated deployments, this change adds two new sections to the website. The first expands the architecture page, so we can add further detail over time with an initial federation page. The second adds a federation operations page which goes into failure planning and mitigation. Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2024-11-19 12:39:57 +00:00
Daniel Bennett	c32d9ed6f5	docs: ipv6: small fixes (#24368 ) * escaping newlines is not allowed in go-sockaddr template * client{} block in client section * tiny extra clarification that the NOMAD_ADDR is an example	2024-11-05 11:11:36 -06:00
Aimee Ukasick	5b1ad83d82	Docs: Add IPv6 support page (#24228 ) * initial content from Daniel's doc * Add IPv6 support doc to operations section. * daniel obsessively re-refactors his docs * Style guide edits * a few more style nits --------- Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-10-29 14:02:04 -05:00
Michael Schurter	da75d4ff4b	docs: fix aed -> aead typo (#24123 )	2024-10-03 13:31:32 -04:00
Adrian Todorov	2444cc3504	docs: small updates to Nomad as an AWS OIDC Provider docs (#24078 ) A few small updates to the recent "Federate access to AWS with Nomad Workload Identity" documentation, most notably that restart isn't needed because AWS SDKs handle OIDC reauth gracefully (unlike any other type of auth - for all others it's cached statically on startup, so nothing but a full restart works in case your credentials expire).	2024-09-30 11:02:09 -04:00
Aimee Ukasick	5f92ccbfb2	Docs: Terraform prereq clarification (#24069 ) Clarify Terraform prereq since you don't need to install the Terraform CLI locally. Fixes: [CE-726](https://hashicorp.atlassian.net/browse/CE-726)	2024-09-27 13:47:10 -04:00
Phil Renaud	e206993d49	Feature: Golden Versions (#24055 ) * TaggedVersion information in structs, rather than job_endpoint (#23841) * TaggedVersion information in structs, rather than job_endpoint * Test for taggedVersion description length * Some API plumbing * Tag and Untag job versions (#23863) * Tag and Untag at API level on down, but am I unblocking the wrong thing? * Code and comment cleanup * Unset methods generally now I stare long into the namespace abyss * Namespace passes through with QueryOptions removed from a write requesting struct * Comment and PR review cleanup * Version back to VersionStr * Generally consolidate unset logic into apply for version tagging * Addressed some PR comments * Auth check and RPC forwarding * uint64 instead of pointer for job version after api layer and renamed copy * job tag command split into apply and unset * latest-version convenience handling moved to CLI command level * CLI tests for tagging/untagging * UI parts removed * Add to job table when unsetting job tag on latest version * Vestigial no more * Compare versions by name and version number with the nomad history command (#23889) * First pass at passing a tagname and/or diff version to plan/versions requests * versions API now takes compare_to flags * Job history command output can have tag names and descriptions * compare_to to diff-tag and diff-version, plus adding flags to history command * 0th version now shows a diff if a specific diff target is requested * Addressing some PR comments * Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI * Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon * Version diff tests * re-implement JobVersionByTagName * Test mods and simplification * Documentation for nomad job history additions * Prevent pruning and reaping of TaggedVersion jobs (#23983) tagged versions should not count against JobTrackedVersions i.e. 
new job versions being inserted should not evict tagged versions and GC should not delete a job if any of its versions are tagged Co-authored-by: Daniel Bennett <dbennett@hashicorp.com> --------- Co-authored-by: Daniel Bennett <dbennett@hashicorp.com> * [ui] Version Tags on the job versions page (#24013) * Timeline styles and their buttons modernized, and tags added * styled but not yet functional version blocks * Rough pass at edit/unedit UX * Styles consolidated * better UX around version tag crud, plus adapter and serializers * Mirage and acceptance tests * Modify percy to not show time-based things --------- Co-authored-by: Daniel Bennett <dbennett@hashicorp.com> * Job revert command and API endpoint can take a string version tag name (#24059) * Job revert command and API endpoint can take a string version tag name * RevertOpts as a signature-modified alternative to Revert() * job revert CLI test * Version pointers in endpoint tests * Dont copy over the tag when a job is reverted to a version with a tag * Convert tag name to version number at CLI level * Client method for version lookup by tag * No longer double-declaring client * [ui] Add tag filter to the job versions page (#24064) * Rough pass at the UI for version diff dropdown * Cleanup and diff fetching via adapter method * TaggedVersion now VersionTag (#24066) --------- Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-09-25 19:59:16 -04:00
Tim Gross	a3a2028837	docs: update key management docs for keyring-in-Raft (#24026 ) In #23977 we moved the keyring into Raft. This changeset documents the operational changes and adds notes to the upgrade guide.	2024-09-25 10:48:14 -04:00
Anthony	46d92a53a5	Usage doc for configuring Nomad OIDC with AWS IAM (#23845 )	2024-09-23 14:01:22 -04:00
Tim Gross	ef116b12d5	metrics: add `client.tasks` state metrics (#23773 ) Although we have `client.allocations` metrics to track allocation states on a client, having separate metrics for `client.tasks` will allow operators to identify that there are individual tasks in an unexpected state in an otherwise healthy allocation. Fixes: https://github.com/hashicorp/nomad/issues/23770	2024-08-09 09:02:17 -04:00
Tim Gross	0f4014b4a9	docs: external KMS configuration (#23600 ) In #23580 we're implementing support for encrypting Nomad's key material with external KMS providers or Vault Transit. This changeset breaks out the documentation from that PR to keep the review manageable and present it to a wider set of reviewers. Ref: https://hashicorp.atlassian.net/browse/NET-10334 Ref: https://github.com/hashicorp/nomad/issues/14852 Ref: https://github.com/hashicorp/nomad/pull/23580	2024-07-19 15:08:54 -04:00
Piotr Kazmierczak	abc6fe325d	docs: fix typo in nomad quota utilization metrics (#23185 )	2024-06-05 16:20:44 +02:00
Piotr Kazmierczak	2a09abc477	metrics: quota utilization configuration and documentation (#22912 ) Introduces support for (optional) quota utilization metrics CE part of the hashicorp/nomad-enterprise#1488 change	2024-06-03 21:06:19 +02:00
James Rasell	6cb9bed236	docs: add operations benchmarking page with nomad-bench link. (#22393 )	2024-05-30 07:34:10 +01:00
Piotr Kazmierczak	048f4511e2	docs: correct nanoseconds to milliseconds for MeasureSince metrics (#20446 )	2024-04-18 18:16:58 +02:00
Piotr Kazmierczak	0d14dd96ca	eval_broker: track enqueue and dequeue times (#20329 ) Adds new metrics to the eval broker that track times of evaluations enqueueing and dequeueing.	2024-04-15 16:16:50 +02:00
Luiz Aoqui	b5573b7470	docs: fix `invoke_scheduler` metrics (#20172 )	2024-03-21 10:57:30 -04:00
CJ	c9cd8480fa	docs: considerations for Stateful Workloads (#19077 ) Co-authored-by: Adrian Todorov <adrian.todorov@hashicorp.com>	2024-01-10 16:06:45 -05:00
Seth Hoenig	5f3aae7340	website: fix spellcheck path and cleanup some misspellings (#19238 )	2023-11-30 09:38:19 -06:00
James Rasell	e2487698e6	docs: add alloc metrics note about possible cgroup variations. (#19195 )	2023-11-28 14:32:08 +00:00
Kerim Satirli	5e1bbf90fc	docs: update all URLs to `developer.hashicorp.com` (#16247 )	2023-10-24 11:00:11 -04:00
Karuppiah Natarajan	2fd508d4f1	docs: fix link for stopping an agent (#18130 )	2023-08-02 11:51:45 -04:00
Luiz Aoqui	ce0f60fb68	metrics: report task memory_max value (#17938 ) Add new `nomad.client.allocs.memory.max_allocated` metric to report the value of the task `memory_max` resource value.	2023-07-19 16:50:12 -04:00
Patric Stout	ede662a828	metrics: add "total_ticks_count" for CPU metrics (#17579 ) This counter tells you the total amount of ticks for that CPU entry since the start of Nomad.	2023-07-05 10:28:55 -04:00
Tim Gross	288ff2f0c4	docs: add missing `client.allocs` metrics (#17540 ) The docs were missing counter metrics emitted by the task runner around task state changes.	2023-06-15 09:18:11 -04:00
Tim Gross	068d0ea9af	node pools: add pool as label on client metrics (#17528 ) This changeset adds the node pool as a label anywhere we're already emitting labels with additional information such as node class or ID about the client.	2023-06-14 15:58:38 -04:00
Tim Gross	6bd1ebed29	docs: note namespace apply/delete behaviors, fix metric (#17527 ) This changeset includes some fixes to documentation discovered while working on node pools, but we didn't want to include in the node pool PRs so they can get backported easily: * namespace apply/delete commands are forwarded to the authoritative region * deleting a namespace requires there are no non-terminal jobs in any of the federated regions * fixed a typo in the name of the `nomad.client.allocated.disk` metric	2023-06-14 14:52:06 -04:00
Luiz Aoqui	f0f4cbb848	node pools: list nodes in pool (#17413 )	2023-06-06 10:43:43 -04:00
Tim Gross	bb0140803e	node pools: implement RPC to list jobs in a given node pool (#17396 ) Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on the existing `Job.List` RPC.	2023-06-05 15:36:52 -04:00
Luiz Aoqui	970e998b00	node pools: add CRUD API (#17384 )	2023-06-01 15:55:49 -04:00
Tim Gross	c3002db815	client: allow `drain_on_shutdown` configuration (#16827 ) Adds a new configuration to clients to optionally allow them to drain their workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting itself and then monitors the drain state as seen by the server until the drain is complete or the deadline expires. If it loses connection with the server, it will monitor local client status instead to ensure allocations are stopped before exiting.	2023-04-14 15:35:32 -04:00
Tim Gross	504fdf0e43	docs: document signal handling (#16835 ) Expand documentation about Nomad's signal handling behaviors, including removing incorrect information about graceful client shutdowns.	2023-04-11 16:26:39 -04:00
Tim Gross	cef87f054a	docs: add notes about keyring to snapshot restore (#16663 ) When cluster administrators restore from Raft snapshot, they also need to ensure the keyring is in place. For on-prem users doing in-place upgrades this is less of a concern but for typical cloud workflows where the whole host is replaced, it's an important warning (at least until #14852 has been implemented).	2023-03-28 08:31:01 -04:00
Piotr Kazmierczak	949a6f60c7	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
Tim Gross	92effde870	docs: add more warnings about running agent as root on Linux (#15926 )	2023-01-27 15:22:18 -05:00
Ashlee M Boyer	3444ece549	docs: Migrate link formats (#15779 ) * Adding check-legacy-links-format workflow * Adding test-link-rewrites workflow * chore: updates link checker workflow hash * Migrating links to new format Co-authored-by: Kendall Strautman <kendallstrautman@gmail.com>	2023-01-25 09:31:14 -08:00
Tim Gross	9bdb6a5b7d	Rename `nomad.broker.total_blocked` metric (#15835 ) This changeset fixes a long-standing point of confusion in metrics emitted by the eval broker. The eval broker has a queue of "blocked" evals that are waiting for an in-flight ("unacked") eval of the same job to be completed. But this "blocked" state is not the same as the `blocked` status that we write to raft and expose in the Nomad API to end users. There's a second metric `nomad.blocked_eval.total_blocked` that refers to evaluations in that state. This has caused ongoing confusion in major customer incidents and even in our own documentation! (Fixed in this PR.) There's little functional change in this PR aside from the name of the metric emitted, but there's a bit of refactoring to clean up the names in `eval_broker.go` so that there aren't name collisions and multiple names for the same state. Changes included are: * Everything that was previously called "pending" referred to entities that were associated with the "ready" metric. These are all now called "ready" to match the metric. * Everything named "blocked" in `eval_broker.go` is now named "pending", except for a couple of comments that actually refer to blocked RPCs. * Added a note to the upgrade guide docs for 1.5.0. * Fixed the scheduling performance metrics docs because the description for `nomad.broker.total_blocked` was actually the description for `nomad.blocked_eval.total_blocked`.	2023-01-20 14:23:56 -05:00

79 Commits