From 53b083b8c5e09c8f12ae6a9db9fe6f88e75c2d12 Mon Sep 17 00:00:00 2001
From: Aimee Ukasick
Date: Tue, 8 Jul 2025 19:24:52 -0500
Subject: [PATCH] Docs: Nomad IA (#26063)

* Move commands from docs to its own root-level directory
* temporarily use modified dev-portal branch with nomad ia changes
* explicitly clone nomad ia exp branch
* retrigger build, fixed dev-portal broken build
* architecture, concepts and get started individual pages
* fix get started section destinations
* reference section
* update repo comment in website-build.sh to show branch
* docs nav file update capitalization
* update capitalization to force deploy
* remove nomad-vs-kubernetes dir; move content to what is nomad pg
* job section
* Nomad operations category, deploy section
* operations category, govern section
* operations - manage
* operations/scale; concepts scheduling fix
* networking
* monitor
* secure section
* remove auth-methods folder and move up pages to sso; linkcheck
* Fix install2deploy redirects
* fix architecture redirects
* Job section: Add missing section index pages
* Add section index pages so breadcrumbs build correctly
* concepts/index fix front matter indentation
* move task driver plugin config to new deploy section
* Finish adding full URL to tutorials links in nav
* change SSO to Authentication in nav and file system
* Docs NomadIA: Move tutorials into NomadIA branch (#26132)
* Move governance and policy from tutorials to docs
* Move tutorials content to job-declare section
* run jobs section
* stateful workloads
* advanced job scheduling
* deploy section
* manage section
* monitor section
* secure/acl and secure/authorization
* fix example that contains an unseal key in real format
* remove images from sso-vault
* secure/traffic
* secure/workload-identities
* vault-acl change unseal key and root token in command output sample
* remove lines from sample output
* fix front matter
* move nomad pack tutorials to tools
* search/replace /nomad/tutorials links
* update acl overview with content from deleted architecture/acl
* fix spelling mistake
* linkcheck - fix broken links
* fix link to Nomad variables tutorial
* fix link to Prometheus tutorial
* move who uses Nomad to use cases page; move spec/config shortcuts, add dividers
* Move Consul out of Integrations; move namespaces to govern
* move integrations/vault to secure/vault; delete integrations
* move ref arch to docs; rename Deploy Nomad back to Install Nomad
* address feedback
* linkcheck fixes
* Fixed raw_exec redirect
* add info from /nomad/tutorials/manage-jobs/jobs
* update page content with newer tutorial
* link updates for architecture sub-folders
* Add redirects for removed section index pages. Fix links.
* fix broken links from linkcheck
* Revert to use dev-portal main branch instead of nomadIA branch
* build workaround: add intro-nav-data.json with single entry
* fix content-check error
* add intro directory to get around Vercel build error
* workaround for empty directory
* remove mdx from /intro/ to fix content-check and git snafu
* Add intro index.mdx so Vercel build should work
---------
Co-authored-by: Tu Nguyen
---
website/content/api-docs/acl/auth-methods.mdx | 2 +- website/content/api-docs/acl/tokens.mdx | 2 +- website/content/api-docs/evaluations.mdx | 2 +- website/content/api-docs/index.mdx | 4 +- website/content/api-docs/json-jobs.mdx | 24 +- website/content/api-docs/nodes.mdx | 2 +- website/content/api-docs/operator/index.mdx | 4 +- website/content/api-docs/operator/keyring.mdx | 4 +- website/content/api-docs/operator/raft.mdx | 2 +- .../content/api-docs/operator/snapshot.mdx | 2 +- .../api-docs/operator/upgrade-check.mdx | 2 +- .../content/api-docs/operator/utilization.mdx | 4 +- .../content/api-docs/sentinel-policies.mdx | 2 +- website/content/api-docs/task-api.mdx | 10 +- website/content/api-docs/variables/index.mdx | 4 +- website/content/api-docs/variables/locks.mdx | 26 +- website/content/api-docs/volumes.mdx | 2 +- .../commands/acl/auth-method/create.mdx | 0 .../commands/acl/auth-method/delete.mdx | 0 .../commands/acl/auth-method/info.mdx | 0 .../commands/acl/auth-method/list.mdx | 0 .../commands/acl/auth-method/update.mdx | 0 .../commands/acl/binding-rule/create.mdx | 0 .../commands/acl/binding-rule/delete.mdx | 0 .../commands/acl/binding-rule/info.mdx | 0 .../commands/acl/binding-rule/list.mdx | 0 .../commands/acl/binding-rule/update.mdx | 0 .../{docs => }/commands/acl/bootstrap.mdx | 0 .../content/{docs => }/commands/acl/index.mdx | 56 +- .../{docs => }/commands/acl/policy/apply.mdx | 0 .../{docs => }/commands/acl/policy/delete.mdx | 0 .../{docs => }/commands/acl/policy/info.mdx | 0 .../{docs => }/commands/acl/policy/list.mdx | 0 .../{docs => }/commands/acl/policy/self.mdx | 0 .../{docs => }/commands/acl/role/create.mdx | 0 .../{docs => }/commands/acl/role/delete.mdx | 0 .../{docs => }/commands/acl/role/info.mdx | 0 .../{docs => }/commands/acl/role/list.mdx | 0 .../{docs => }/commands/acl/role/update.mdx | 0 .../{docs => }/commands/acl/token/create.mdx | 0 .../{docs => }/commands/acl/token/delete.mdx | 0 .../{docs => }/commands/acl/token/info.mdx | 0 .../{docs => }/commands/acl/token/list.mdx | 0 .../{docs => }/commands/acl/token/self.mdx | 0 .../{docs => }/commands/acl/token/update.mdx | 0 .../{docs => }/commands/agent-info.mdx | 0 website/content/{docs => }/commands/agent.mdx | 4 +- .../{docs => }/commands/alloc/checks.mdx | 0 .../{docs => }/commands/alloc/exec.mdx | 0 .../content/{docs => }/commands/alloc/fs.mdx | 2 +- .../{docs => }/commands/alloc/index.mdx | 16 +- .../{docs => }/commands/alloc/logs.mdx | 0 .../{docs => }/commands/alloc/pause.mdx | 0 .../{docs => }/commands/alloc/restart.mdx | 0 .../{docs => }/commands/alloc/signal.mdx | 0 .../{docs => }/commands/alloc/status.mdx | 0 .../{docs => }/commands/alloc/stop.mdx | 2 +- .../{docs => }/commands/config/index.mdx | 2 +- .../{docs => }/commands/config/validate.mdx | 0 .../{docs => }/commands/deployment/fail.mdx | 2 +- .../{docs => }/commands/deployment/index.mdx | 12 +- .../{docs => }/commands/deployment/list.mdx | 0 .../{docs => }/commands/deployment/pause.mdx | 0 .../commands/deployment/promote.mdx | 4 +- .../{docs => }/commands/deployment/resume.mdx | 2 +- .../{docs => }/commands/deployment/status.mdx | 0
.../commands/deployment/unblock.mdx | 4 +- .../{docs => }/commands/eval/delete.mdx | 4 +- .../{docs => }/commands/eval/index.mdx | 6 +- .../content/{docs => }/commands/eval/list.mdx | 0 .../{docs => }/commands/eval/status.mdx | 0 website/content/{docs => }/commands/fmt.mdx | 0 website/content/{docs => }/commands/index.mdx | 0 .../{docs => }/commands/job/action.mdx | 0 .../{docs => }/commands/job/allocs.mdx | 2 +- .../{docs => }/commands/job/deployments.mdx | 0 .../{docs => }/commands/job/dispatch.mdx | 4 +- .../content/{docs => }/commands/job/eval.mdx | 2 +- .../{docs => }/commands/job/history.mdx | 0 .../content/{docs => }/commands/job/index.mdx | 40 +- .../content/{docs => }/commands/job/init.mdx | 2 +- .../{docs => }/commands/job/inspect.mdx | 0 .../commands/job/periodic-force.mdx | 2 +- .../content/{docs => }/commands/job/plan.mdx | 2 +- .../{docs => }/commands/job/promote.mdx | 4 +- .../{docs => }/commands/job/restart.mdx | 0 .../{docs => }/commands/job/revert.mdx | 6 +- .../content/{docs => }/commands/job/run.mdx | 8 +- .../content/{docs => }/commands/job/scale.mdx | 2 +- .../commands/job/scaling-events.mdx | 0 .../content/{docs => }/commands/job/start.mdx | 4 +- .../{docs => }/commands/job/status.mdx | 0 .../content/{docs => }/commands/job/stop.mdx | 2 +- .../{docs => }/commands/job/tag/apply.mdx | 0 .../{docs => }/commands/job/tag/index.mdx | 4 +- .../{docs => }/commands/job/tag/unset.mdx | 0 .../{docs => }/commands/job/validate.mdx | 0 .../{docs => }/commands/license/get.mdx | 0 .../{docs => }/commands/license/index.mdx | 4 +- .../{docs => }/commands/license/inspect.mdx | 0 website/content/{docs => }/commands/login.mdx | 0 .../content/{docs => }/commands/monitor.mdx | 0 .../{docs => }/commands/namespace/apply.mdx | 2 +- .../{docs => }/commands/namespace/delete.mdx | 0 .../{docs => }/commands/namespace/index.mdx | 14 +- .../{docs => }/commands/namespace/inspect.mdx | 0 .../{docs => }/commands/namespace/list.mdx | 0 .../{docs => }/commands/namespace/status.mdx | 0 .../{docs => }/commands/node-pool/apply.mdx | 0 .../{docs => }/commands/node-pool/delete.mdx | 0 .../{docs => }/commands/node-pool/index.mdx | 14 +- .../{docs => }/commands/node-pool/info.mdx | 0 .../{docs => }/commands/node-pool/init.mdx | 0 .../{docs => }/commands/node-pool/jobs.mdx | 0 .../{docs => }/commands/node-pool/list.mdx | 0 .../{docs => }/commands/node-pool/nodes.mdx | 0 .../{docs => }/commands/node/config.mdx | 0 .../{docs => }/commands/node/drain.mdx | 8 +- .../{docs => }/commands/node/eligibility.mdx | 2 +- .../{docs => }/commands/node/index.mdx | 10 +- .../{docs => }/commands/node/meta/apply.mdx | 0 .../{docs => }/commands/node/meta/index.mdx | 6 +- .../{docs => }/commands/node/meta/read.mdx | 0 .../{docs => }/commands/node/status.mdx | 0 .../{docs => }/commands/operator/api.mdx | 0 .../operator/autopilot/get-config.mdx | 2 +- .../commands/operator/autopilot/health.mdx | 2 +- .../operator/autopilot/set-config.mdx | 2 +- .../commands/operator/client-state.mdx | 0 .../{docs => }/commands/operator/debug.mdx | 0 .../operator/gossip/keyring-generate.mdx | 0 .../operator/gossip/keyring-install.mdx | 0 .../commands/operator/gossip/keyring-list.mdx | 0 .../operator/gossip/keyring-remove.mdx | 0 .../commands/operator/gossip/keyring-use.mdx | 0 .../{docs => }/commands/operator/index.mdx | 40 +- .../{docs => }/commands/operator/metrics.mdx | 0 .../commands/operator/raft/info.mdx | 0 .../commands/operator/raft/list-peers.mdx | 2 +- .../commands/operator/raft/logs.mdx | 0 .../commands/operator/raft/remove-peer.mdx | 6 +- 
.../commands/operator/raft/state.mdx | 0 .../operator/raft/transfer-leadership.mdx | 2 +- .../commands/operator/root/keyring-list.mdx | 0 .../commands/operator/root/keyring-remove.mdx | 2 +- .../commands/operator/root/keyring-rotate.mdx | 0 .../operator/scheduler/get-config.mdx | 0 .../operator/scheduler/set-config.mdx | 0 .../commands/operator/snapshot/agent.mdx | 0 .../commands/operator/snapshot/inspect.mdx | 2 +- .../commands/operator/snapshot/redact.mdx | 0 .../commands/operator/snapshot/restore.mdx | 2 +- .../commands/operator/snapshot/save.mdx | 2 +- .../commands/operator/snapshot/state.mdx | 0 .../commands/operator/utilization.mdx | 4 +- .../{docs => }/commands/plugin/index.mdx | 2 +- .../{docs => }/commands/plugin/status.mdx | 0 .../{docs => }/commands/quota/apply.mdx | 0 .../{docs => }/commands/quota/delete.mdx | 0 .../{docs => }/commands/quota/index.mdx | 12 +- .../{docs => }/commands/quota/init.mdx | 0 .../{docs => }/commands/quota/inspect.mdx | 0 .../{docs => }/commands/quota/list.mdx | 0 .../{docs => }/commands/quota/status.mdx | 0 .../commands/recommendation/apply.mdx | 0 .../commands/recommendation/dismiss.mdx | 0 .../commands/recommendation/index.mdx | 8 +- .../commands/recommendation/info.mdx | 0 .../commands/recommendation/list.mdx | 0 .../{docs => }/commands/scaling/index.mdx | 4 +- .../commands/scaling/policy-info.mdx | 0 .../commands/scaling/policy-list.mdx | 0 .../{docs => }/commands/sentinel/apply.mdx | 4 +- .../{docs => }/commands/sentinel/delete.mdx | 0 .../{docs => }/commands/sentinel/index.mdx | 8 +- .../{docs => }/commands/sentinel/list.mdx | 0 .../{docs => }/commands/sentinel/read.mdx | 0 .../commands/server/force-leave.mdx | 0 .../{docs => }/commands/server/index.mdx | 6 +- .../{docs => }/commands/server/join.mdx | 4 +- .../{docs => }/commands/server/members.mdx | 0 .../{docs => }/commands/service/delete.mdx | 0 .../{docs => }/commands/service/index.mdx | 6 +- .../{docs => }/commands/service/info.mdx | 0 .../{docs => }/commands/service/list.mdx | 0 .../{docs => }/commands/setup/consul.mdx | 0 .../{docs => }/commands/setup/index.mdx | 4 +- .../{docs => }/commands/setup/vault.mdx | 0 .../content/{docs => }/commands/status.mdx | 0 .../content/{docs => }/commands/system/gc.mdx | 0 .../{docs => }/commands/system/index.mdx | 4 +- .../commands/system/reconcile-summaries.mdx | 0 .../{docs => }/commands/tls/ca-create.mdx | 0 .../{docs => }/commands/tls/ca-info.mdx | 0 .../{docs => }/commands/tls/cert-create.mdx | 2 +- .../{docs => }/commands/tls/cert-info.mdx | 0 .../content/{docs => }/commands/tls/index.mdx | 8 +- website/content/{docs => }/commands/ui.mdx | 0 .../content/{docs => }/commands/var/get.mdx | 0 .../content/{docs => }/commands/var/index.mdx | 12 +- .../content/{docs => }/commands/var/init.mdx | 0 .../content/{docs => }/commands/var/list.mdx | 0 .../content/{docs => }/commands/var/lock.mdx | 0 .../content/{docs => }/commands/var/purge.mdx | 0 .../content/{docs => }/commands/var/put.mdx | 0 .../content/{docs => }/commands/version.mdx | 0 .../commands/volume/claim-delete.mdx | 0 .../{docs => }/commands/volume/claim-list.mdx | 0 .../{docs => }/commands/volume/create.mdx | 6 +- .../{docs => }/commands/volume/delete.mdx | 6 +- .../{docs => }/commands/volume/deregister.mdx | 2 +- .../{docs => }/commands/volume/detach.mdx | 0 .../{docs => }/commands/volume/index.mdx | 24 +- .../{docs => }/commands/volume/init.mdx | 0 .../{docs => }/commands/volume/register.mdx | 4 +- .../commands/volume/snapshot-create.mdx | 4 +- .../commands/volume/snapshot-delete.mdx | 4 +- 
.../commands/volume/snapshot-list.mdx | 4 +- .../{docs => }/commands/volume/status.mdx | 2 +- .../cluster}/consensus.mdx | 0 .../cluster}/federation.mdx | 14 +- .../docs/architecture/cluster/node-pools.mdx | 11 + .../docs/{concepts => architecture}/cpu.mdx | 0 .../{concepts => }/architecture/index.mdx | 6 +- .../security}/gossip.mdx | 2 +- .../security/index.mdx} | 26 +- .../plugins => architecture}/storage/csi.mdx | 0 .../storage/host-volumes.mdx | 6 +- .../storage/index.mdx | 4 +- .../storage}/stateful-workloads.mdx | 18 +- website/content/docs/concepts/acl/index.mdx | 122 - website/content/docs/concepts/filesystem.mdx | 4 +- website/content/docs/concepts/index.mdx | 12 - website/content/docs/concepts/job.mdx | 10 +- ...cheduling.mdx => how-scheduling-works.mdx} | 4 +- .../docs/concepts/scheduling/index.mdx | 8 +- .../docs/concepts/scheduling/placement.mdx | 8 +- .../{ => concepts/scheduling}/schedulers.mdx | 4 +- .../docs/concepts/stateful-deployments.mdx | 10 +- website/content/docs/concepts/variables.mdx | 4 +- .../docs/concepts/workload-identity.mdx | 6 +- website/content/docs/configuration/acl.mdx | 2 +- .../content/docs/configuration/autopilot.mdx | 2 +- website/content/docs/configuration/client.mdx | 20 +- website/content/docs/configuration/consul.mdx | 2 +- .../docs/configuration/keyring/index.mdx | 8 +- .../content/docs/configuration/reporting.mdx | 2 +- .../content/docs/configuration/sentinel.mdx | 6 +- website/content/docs/configuration/server.mdx | 18 +- .../content/docs/configuration/telemetry.mdx | 2 +- website/content/docs/configuration/tls.mdx | 2 +- website/content/docs/configuration/vault.mdx | 2 +- .../docs/deploy/clusters/connect-nodes.mdx | 226 ++ .../docs/deploy/clusters/federate-regions.mdx | 148 + .../clusters/federation-considerations.mdx} | 12 +- .../federation-failure-scenarios.mdx} | 0 .../content/docs/deploy/clusters/index.mdx | 15 + .../docs/deploy/clusters/reverse-proxy-ui.mdx | 392 ++ .../docs/{install => deploy}/index.mdx | 2 +- .../{operations => deploy}/nomad-agent.mdx | 24 +- .../{install => deploy}/production/index.mdx | 16 +- .../production/reference-architecture.mdx | 252 ++ .../production/requirements.mdx | 8 +- .../production}/windows-service.mdx | 0 .../docs/deploy/task-driver/docker.mdx | 491 +++ .../{drivers => deploy/task-driver}/exec.mdx | 149 +- .../content/docs/deploy/task-driver/index.mdx | 62 + .../{drivers => deploy/task-driver}/java.mdx | 149 +- .../content/docs/deploy/task-driver/qemu.mdx | 103 + .../task-driver}/raw_exec.mdx | 99 +- website/content/docs/drivers/index.mdx | 11 - website/content/docs/ecosystem.mdx | 8 +- website/content/docs/enterprise/index.mdx | 16 +- .../content/docs/enterprise/license/faq.mdx | 4 +- .../content/docs/enterprise/license/index.mdx | 6 +- .../license/utilization-reporting.mdx | 2 +- website/content/docs/faq.mdx | 4 +- website/content/docs/glossary.mdx | 2 +- website/content/docs/govern/index.mdx | 113 + website/content/docs/govern/namespaces.mdx | 144 + .../content/docs/govern/resource-quotas.mdx | 276 ++ website/content/docs/govern/sentinel.mdx | 206 ++ .../content/docs/govern/use-node-pools.mdx | 10 + website/content/docs/integrations/index.mdx | 12 - .../docs/job-declare/configure-tasks.mdx | 212 ++ .../consul-service-mesh.mdx} | 11 +- .../content/docs/job-declare/create-job.mdx | 193 + .../content/docs/job-declare/exit-signals.mdx | 45 + .../job-declare/failure/check-restart.mdx | 80 + .../docs/job-declare/failure/index.mdx | 32 + .../docs/job-declare/failure/reschedule.mdx | 94 + 
.../docs/job-declare/failure/restart.mdx | 99 + website/content/docs/job-declare/index.mdx | 50 + .../content/docs/job-declare/multiregion.mdx | 459 +++ .../docs/job-declare/nomad-actions.mdx | 550 +++ .../docs/job-declare/nomad-variables.mdx | 482 +++ .../strategy/blue-green-canary.mdx | 474 +++ .../docs/job-declare/strategy/index.mdx | 16 + .../docs/job-declare/strategy/rolling.mdx | 323 ++ .../docs/job-declare/task-dependencies.mdx | 385 ++ .../task-driver}/docker.mdx | 484 +-- .../docs/job-declare/task-driver/exec.mdx | 148 + .../docs/job-declare/task-driver/index.mdx | 42 + .../docs/job-declare/task-driver/java.mdx | 154 + .../task-driver}/qemu.mdx | 86 +- .../docs/job-declare/task-driver/raw_exec.mdx | 109 + website/content/docs/job-declare/vault.mdx | 49 + website/content/docs/job-networking/cni.mdx | 46 + website/content/docs/job-networking/index.mdx | 12 + .../docs/job-networking/service-discovery.mdx | 10 + website/content/docs/job-run/index.mdx | 46 + website/content/docs/job-run/inspect.mdx | 212 ++ website/content/docs/job-run/logs.mdx | 128 + .../docs/job-run/utilization-metrics.mdx | 91 + website/content/docs/job-run/versions.mdx | 552 +++ .../content/docs/job-scheduling/affinity.mdx | 250 ++ website/content/docs/job-scheduling/index.mdx | 33 + .../docs/job-scheduling/preemption.mdx | 502 +++ .../content/docs/job-scheduling/spread.mdx | 277 ++ .../docs/job-specification/affinity.mdx | 8 +- .../docs/job-specification/artifact.mdx | 4 +- .../docs/job-specification/connect.mdx | 4 +- .../docs/job-specification/constraint.mdx | 8 +- .../content/docs/job-specification/consul.mdx | 8 +- .../docs/job-specification/csi_plugin.mdx | 4 +- .../job-specification/dispatch_payload.mdx | 4 +- .../content/docs/job-specification/env.mdx | 4 +- .../docs/job-specification/gateway.mdx | 4 +- .../content/docs/job-specification/group.mdx | 4 +- .../docs/job-specification/identity.mdx | 6 +- .../content/docs/job-specification/job.mdx | 6 +- .../content/docs/job-specification/logs.mdx | 4 +- .../content/docs/job-specification/meta.mdx | 4 +- .../docs/job-specification/migrate.mdx | 6 +- .../docs/job-specification/multiregion.mdx | 8 +- .../docs/job-specification/network.mdx | 12 +- .../docs/job-specification/parameterized.mdx | 4 +- .../content/docs/job-specification/proxy.mdx | 4 +- .../docs/job-specification/resources.mdx | 8 +- .../docs/job-specification/service.mdx | 10 +- .../job-specification/sidecar_service.mdx | 4 +- .../docs/job-specification/sidecar_task.mdx | 4 +- .../content/docs/job-specification/spread.mdx | 12 +- .../content/docs/job-specification/task.mdx | 16 +- .../docs/job-specification/template.mdx | 8 +- .../job-specification/transparent_proxy.mdx | 6 +- .../content/docs/job-specification/update.mdx | 4 +- .../docs/job-specification/upstreams.mdx | 4 +- .../content/docs/job-specification/vault.mdx | 6 +- .../content/docs/job-specification/volume.mdx | 12 +- .../docs/job-specification/volume_mount.mdx | 2 +- website/content/docs/manage/autopilot.mdx | 244 ++ .../content/docs/manage/format-cli-output.mdx | 415 +++ .../garbage-collection.mdx | 4 +- website/content/docs/manage/index.mdx | 10 + .../{operations => manage}/key-management.mdx | 6 +- .../content/docs/manage/migrate-workloads.mdx | 361 ++ .../content/docs/manage/outage-recovery.mdx | 252 ++ .../content/docs/monitor/cluster-topology.mdx | 123 + website/content/docs/monitor/event-stream.mdx | 99 + .../index.mdx} | 14 +- .../content/docs/monitor/inspect-cluster.mdx | 172 + .../docs/monitor/inspect-workloads.mdx | 221 ++ 
website/content/docs/networking/cni.mdx | 21 +- .../consul/index.mdx | 51 +- .../consul/service-mesh.mdx | 77 +- website/content/docs/networking/index.mdx | 22 +- .../ipv6-support.mdx => networking/ipv6.mdx} | 8 +- .../docs/networking/service-discovery.mdx | 142 +- .../docs/nomad-vs-kubernetes/alternative.mdx | 41 - .../docs/nomad-vs-kubernetes/index.mdx | 41 - .../docs/nomad-vs-kubernetes/supplement.mdx | 41 - .../content/docs/operations/benchmarking.mdx | 19 - website/content/docs/operations/index.mdx | 19 - .../docs/other-specifications/acl-policy.mdx | 2 +- .../docs/other-specifications/namespace.mdx | 8 +- .../docs/other-specifications/node-pool.mdx | 6 +- .../docs/other-specifications/quota.mdx | 8 +- .../docs/other-specifications/variables.mdx | 6 +- .../docs/other-specifications/volume/csi.mdx | 6 +- .../docs/other-specifications/volume/host.mdx | 8 +- .../other-specifications/volume/index.mdx | 4 +- website/content/docs/partnerships.mdx | 14 +- .../content/docs/{install => }/quickstart.mdx | 6 +- .../docs/reference/go-template-syntax.mdx | 402 ++ .../hcl2/expressions.mdx | 8 +- .../hcl2/functions/collection/chunklist.mdx | 0 .../hcl2/functions/collection/coalesce.mdx | 2 +- .../functions/collection/coalescelist.mdx | 2 +- .../hcl2/functions/collection/compact.mdx | 0 .../hcl2/functions/collection/concat.mdx | 0 .../hcl2/functions/collection/contains.mdx | 0 .../hcl2/functions/collection/distinct.mdx | 0 .../hcl2/functions/collection/element.mdx | 4 +- .../hcl2/functions/collection/flatten.mdx | 4 +- .../hcl2/functions/collection/index-fn.mdx | 2 +- .../hcl2/functions/collection/keys.mdx | 2 +- .../hcl2/functions/collection/length.mdx | 0 .../hcl2/functions/collection/lookup.mdx | 2 +- .../hcl2/functions/collection/merge.mdx | 0 .../hcl2/functions/collection/range.mdx | 0 .../hcl2/functions/collection/reverse.mdx | 2 +- .../functions/collection/setintersection.mdx | 6 +- .../hcl2/functions/collection/setproduct.mdx | 10 +- .../hcl2/functions/collection/setunion.mdx | 6 +- .../hcl2/functions/collection/slice.mdx | 2 +- .../hcl2/functions/collection/sort.mdx | 0 .../hcl2/functions/collection/values.mdx | 4 +- .../hcl2/functions/collection/zipmap.mdx | 0 .../hcl2/functions/conversion/can.mdx | 6 +- .../hcl2/functions/conversion/convert.mdx | 0 .../hcl2/functions/conversion/try.mdx | 2 +- .../hcl2/functions/crypto/bcrypt.mdx | 0 .../hcl2/functions/crypto/md5.mdx | 0 .../hcl2/functions/crypto/rsadecrypt.mdx | 2 +- .../hcl2/functions/crypto/sha1.mdx | 0 .../hcl2/functions/crypto/sha256.mdx | 0 .../hcl2/functions/crypto/sha512.mdx | 0 .../hcl2/functions/datetime/formatdate.mdx | 2 +- .../hcl2/functions/datetime/timeadd.mdx | 0 .../hcl2/functions/encoding/base64decode.mdx | 2 +- .../hcl2/functions/encoding/base64encode.mdx | 2 +- .../hcl2/functions/encoding/csvdecode.mdx | 0 .../hcl2/functions/encoding/jsondecode.mdx | 4 +- .../hcl2/functions/encoding/jsonencode.mdx | 4 +- .../hcl2/functions/encoding/urlencode.mdx | 0 .../hcl2/functions/encoding/yamldecode.mdx | 6 +- .../hcl2/functions/encoding/yamlencode.mdx | 10 +- .../hcl2/functions/file/abspath.mdx | 2 +- .../hcl2/functions/file/basename.mdx | 4 +- .../hcl2/functions/file/dirname.mdx | 4 +- .../hcl2/functions/file/file.mdx | 2 +- .../hcl2/functions/file/filebase64.mdx | 4 +- .../hcl2/functions/file/fileexists.mdx | 2 +- .../hcl2/functions/file/fileset.mdx | 0 .../hcl2/functions/file/pathexpand.mdx | 0 .../hcl2/functions/index.mdx | 0 .../hcl2/functions/ipnet/cidrhost.mdx | 4 +- .../hcl2/functions/ipnet/cidrnetmask.mdx | 0 
.../hcl2/functions/ipnet/cidrsubnet.mdx | 12 +- .../hcl2/functions/ipnet/cidrsubnets.mdx | 10 +- .../hcl2/functions/numeric/abs.mdx | 0 .../hcl2/functions/numeric/ceil.mdx | 2 +- .../hcl2/functions/numeric/floor.mdx | 2 +- .../hcl2/functions/numeric/log.mdx | 0 .../hcl2/functions/numeric/max.mdx | 2 +- .../hcl2/functions/numeric/min.mdx | 2 +- .../hcl2/functions/numeric/parseint.mdx | 2 +- .../hcl2/functions/numeric/pow.mdx | 0 .../hcl2/functions/numeric/signum.mdx | 0 .../hcl2/functions/string/chomp.mdx | 2 +- .../hcl2/functions/string/format.mdx | 4 +- .../hcl2/functions/string/formatlist.mdx | 4 +- .../hcl2/functions/string/indent.mdx | 0 .../hcl2/functions/string/join.mdx | 2 +- .../hcl2/functions/string/lower.mdx | 4 +- .../hcl2/functions/string/regex_replace.mdx | 2 +- .../hcl2/functions/string/replace.mdx | 2 +- .../hcl2/functions/string/split.mdx | 2 +- .../hcl2/functions/string/strlen.mdx | 0 .../hcl2/functions/string/strrev.mdx | 2 +- .../hcl2/functions/string/substr.mdx | 0 .../hcl2/functions/string/title.mdx | 4 +- .../hcl2/functions/string/trim.mdx | 6 +- .../hcl2/functions/string/trimprefix.mdx | 6 +- .../hcl2/functions/string/trimspace.mdx | 2 +- .../hcl2/functions/string/trimsuffix.mdx | 6 +- .../hcl2/functions/string/upper.mdx | 4 +- .../hcl2/functions/uuid/uuidv4.mdx | 2 +- .../hcl2/functions/uuid/uuidv5.mdx | 4 +- .../hcl2/index.mdx | 12 +- .../hcl2/locals.mdx | 2 +- .../hcl2/syntax.mdx | 2 +- .../hcl2/variables.mdx | 10 +- .../metrics.mdx} | 2 +- .../runtime-environment-settings.mdx} | 2 +- .../runtime-variable-interpolation.mdx} | 8 +- .../sentinel-policy.mdx} | 6 +- .../docs/release-notes/nomad/upcoming.mdx | 4 +- .../docs/release-notes/nomad/v1-10-x.mdx | 18 +- .../docs/release-notes/nomad/v1_8_x.mdx | 4 +- website/content/docs/runtime/index.mdx | 18 - website/content/docs/scale/benchmarking.mdx | 20 + website/content/docs/scale/index.mdx | 16 + website/content/docs/secure/acl/bootstrap.mdx | 317 ++ .../consul/acl.mdx => secure/acl/consul.mdx} | 17 +- website/content/docs/secure/acl/index.mdx | 193 + .../secure/acl/policies/create-policy.mdx | 444 +++ .../docs/secure/acl/policies/index.mdx | 404 +++ .../content/docs/secure/acl/tokens/index.mdx | 325 ++ .../content/docs/secure/acl/tokens/vault.mdx | 176 + .../authentication}/jwt.mdx | 4 +- .../authentication}/oidc.mdx | 16 +- .../docs/secure/authentication/sso-auth0.mdx | 329 ++ .../secure/authentication/sso-pkce-jwt.mdx | 1032 ++++++ .../docs/secure/authentication/sso-vault.mdx | 584 +++ website/content/docs/secure/index.mdx | 10 + .../docs/secure/traffic/gossip-encryption.mdx | 88 + website/content/docs/secure/traffic/index.mdx | 33 + website/content/docs/secure/traffic/tls.mdx | 479 +++ .../{integrations => secure}/vault/acl.mdx | 45 +- .../{integrations => secure}/vault/index.mdx | 2 +- .../workload-identity}/aws-oidc-provider.mdx | 0 .../docs/secure/workload-identity/index.mdx | 11 + .../docs/secure/workload-identity/vault.mdx | 1343 +++++++ .../docs/stateful-workloads/csi-volumes.mdx | 661 ++++ .../dynamic-host-volumes.mdx | 250 ++ .../content/docs/stateful-workloads/index.mdx | 66 + .../static-host-volumes.mdx | 389 ++ website/content/docs/upgrade/index.mdx | 18 +- .../content/docs/upgrade/upgrade-specific.mdx | 4 +- .../{who-uses-nomad.mdx => use-cases.mdx} | 81 +- website/content/docs/what-is-nomad.mdx | 181 + website/content/intro/README.md | 1 + website/content/intro/index.mdx | 9 +- website/content/intro/use-cases.mdx | 70 - website/content/intro/vs/ecs.mdx | 30 - website/content/intro/vs/index.mdx | 35 - 
website/content/intro/vs/mesos.mdx | 35 - website/content/intro/vs/terraform.mdx | 31 - .../content/partials/consul-namespaces.mdx | 44 + website/content/partials/envvars.mdx | 8 +- .../partials/install/cgroup-controllers.mdx | 2 +- .../partials/jwt_claim_mapping_details.mdx | 2 +- .../concepts => partials}/node-pools.mdx | 20 +- .../v1-10/breaking-sentinel-apply.mdx | 2 +- .../content/partials/service-discovery.mdx | 140 + .../content/partials/task-driver-intro.mdx | 61 +- website/content/plugins/devices/nvidia.mdx | 8 +- website/content/plugins/drivers/podman.mdx | 6 +- .../content/plugins/drivers/virt/index.mdx | 2 +- website/content/tools/autoscaling/agent.mdx | 2 +- .../policy-eval/node-selector-strategy.mdx | 2 +- .../tools/nomad-pack/advanced-usage.mdx | 193 + .../content/tools/nomad-pack/create-packs.mdx | 456 +++ website/content/tools/nomad-pack/index.mdx | 268 ++ website/data/commands-nav-data.json | 1020 ++++++ website/data/docs-nav-data.json | 3225 ++++++----------- website/data/intro-nav-data.json | 29 +- website/data/tools-nav-data.json | 19 +- website/public/img/clusters/active-alert.png | Bin 0 -> 29387 bytes .../img/clusters/alertmanager-webui.png | Bin 0 -> 24999 bytes website/public/img/clusters/alerts.png | Bin 0 -> 20148 bytes .../public/img/clusters/cannot-fetch-logs.png | Bin 0 -> 98523 bytes .../img/clusters/cannot-remote-exec.png | Bin 0 -> 237318 bytes .../public/img/clusters/chrome-pending.png | Bin 0 -> 478633 bytes .../public/img/clusters/chrome-timeout.png | Bin 0 -> 413854 bytes .../public/img/clusters/firefox-pending.png | Bin 0 -> 417237 bytes .../public/img/clusters/firefox-timeout.png | Bin 0 -> 422859 bytes website/public/img/clusters/new-targets.png | Bin 0 -> 68349 bytes .../img/clusters/nomad-multi-region.png | Bin 0 -> 21902 bytes .../img/clusters/prometheus-targets.png | Bin 0 -> 43575 bytes website/public/img/clusters/running-jobs.png | Bin 0 -> 29230 bytes .../public/img/clusters/safari-pending.png | Bin 0 -> 654522 bytes .../public/img/clusters/safart-timeout.png | Bin 0 -> 661156 bytes .../img/deploy/nomad_fault_tolerance.png | Bin 0 -> 40010 bytes .../public/img/deploy/nomad_network_arch.png | Bin 0 -> 38074 bytes .../img/deploy/nomad_network_arch_0-1x.png | Bin 0 -> 55629 bytes .../img/deploy/nomad_network_arch_0-1y.png | Bin 0 -> 163656 bytes .../img/deploy/nomad_reference_diagram.png | Bin 0 -> 33672 bytes .../govern/nomad-ui-namespace-dropdown.png | Bin 0 -> 30655 bytes website/public/img/govern/sentinel.jpg | Bin 0 -> 92691 bytes .../monitor/guide-ui-img-alloc-preempted.png | Bin 0 -> 162699 bytes .../monitor/guide-ui-img-alloc-preempter.png | Bin 0 -> 69602 bytes .../guide-ui-img-alloc-reschedule-details.png | Bin 0 -> 68957 bytes .../guide-ui-img-alloc-reschedule-icon.png | Bin 0 -> 62969 bytes ...uide-ui-img-alloc-resource-utilization.png | Bin 0 -> 188082 bytes .../guide-ui-img-alloc-stop-restart.png | Bin 0 -> 65381 bytes .../guide-ui-img-alloc-unhealthy-driver.png | Bin 0 -> 94935 bytes .../guide-ui-img-client-allocations.png | Bin 0 -> 85818 bytes .../guide-ui-img-client-attributes.png | Bin 0 -> 169958 bytes .../monitor/guide-ui-img-client-detail.png | Bin 0 -> 531565 bytes .../img/monitor/guide-ui-img-client-drain.png | Bin 0 -> 25873 bytes .../guide-ui-img-client-driver-status.png | Bin 0 -> 63605 bytes .../monitor/guide-ui-img-client-events.png | Bin 0 -> 42185 bytes ...ide-ui-img-client-resource-utilization.png | Bin 0 -> 201104 bytes .../monitor/guide-ui-img-clients-filters.png | Bin 0 -> 22242 bytes 
.../img/monitor/guide-ui-img-clients-list.png | Bin 0 -> 159073 bytes .../guide-ui-img-job-definition-edit.png | Bin 0 -> 247158 bytes .../guide-ui-img-job-deployment-canary.png | Bin 0 -> 174730 bytes .../img/monitor/guide-ui-img-job-filters.png | Bin 0 -> 28648 bytes .../img/monitor/guide-ui-img-job-stop.png | Bin 0 -> 43508 bytes .../monitor/guide-ui-img-periodic-force.png | Bin 0 -> 123400 bytes .../monitor/guide-ui-img-server-detail.png | Bin 0 -> 99914 bytes .../img/monitor/guide-ui-img-servers-list.png | Bin 0 -> 91406 bytes .../img/monitor/guide-ui-img-task-events.png | Bin 0 -> 206655 bytes .../img/monitor/guide-ui-img-task-logs.png | Bin 0 -> 180296 bytes .../public/img/monitor/guide-ui-jobs-list.png | Bin 0 -> 108240 bytes .../public/img/monitor/inspect-cluster.mdx | 172 + .../public/img/monitor/inspect-workloads.mdx | 221 ++ .../alloc-associations-across-dcs.png | Bin 0 -> 409511 bytes .../topo-viz/allocation-associations.png | Bin 0 -> 804862 bytes .../img/monitor/topo-viz/allocation-panel.png | Bin 0 -> 109402 bytes .../monitor/topo-viz/allocation-tooltip.png | Bin 0 -> 31106 bytes .../img/monitor/topo-viz/client-panel.png | Bin 0 -> 72400 bytes .../topo-viz/client-with-many-allocs.png | Bin 0 -> 15245 bytes .../img/monitor/topo-viz/cluster-panel.png | Bin 0 -> 50828 bytes .../img/monitor/topo-viz/cluster-view.png | Bin 0 -> 781441 bytes .../img/monitor/topo-viz/empty-clients.png | Bin 0 -> 102167 bytes .../img/monitor/topo-viz/topo-viz-link.png | Bin 0 -> 368719 bytes website/public/img/secure/acl.jpg | Bin 0 -> 51650 bytes .../auth0-configure-callback-action.png | Bin 0 -> 101049 bytes .../secure/auth0-configure-callback-flow.png | Bin 0 -> 85378 bytes .../secure/auth0-configure-callback-rule.png | Bin 0 -> 109959 bytes .../secure/auth0-configure-callback-urls.png | Bin 0 -> 161845 bytes .../secure/auth0-configure-user-metadata.png | Bin 0 -> 77841 bytes .../public/img/secure/auth0-create-user.png | Bin 0 -> 38049 bytes .../secure/auth0-get-application-params.png | Bin 0 -> 95611 bytes .../img/secure/nomad-ui-jobs-signed-in.png | Bin 0 -> 35227 bytes .../img/secure/nomad-ui-oidc-login-button.png | Bin 0 -> 20549 bytes .../img/secure/nomad-ui-oidc-login-form.png | Bin 0 -> 23746 bytes .../img/secure/nomad-ui-oidc-signed-in.png | Bin 0 -> 29941 bytes website/redirects.js | 261 +- website/scripts/website-build.sh | 2 +- 614 files changed, 23285 insertions(+), 4609 deletions(-) rename website/content/{docs => }/commands/acl/auth-method/create.mdx (100%) rename website/content/{docs => }/commands/acl/auth-method/delete.mdx (100%) rename website/content/{docs => }/commands/acl/auth-method/info.mdx (100%) rename website/content/{docs => }/commands/acl/auth-method/list.mdx (100%) rename website/content/{docs => }/commands/acl/auth-method/update.mdx (100%) rename website/content/{docs => }/commands/acl/binding-rule/create.mdx (100%) rename website/content/{docs => }/commands/acl/binding-rule/delete.mdx (100%) rename website/content/{docs => }/commands/acl/binding-rule/info.mdx (100%) rename website/content/{docs => }/commands/acl/binding-rule/list.mdx (100%) rename website/content/{docs => }/commands/acl/binding-rule/update.mdx (100%) rename website/content/{docs => }/commands/acl/bootstrap.mdx (100%) rename website/content/{docs => }/commands/acl/index.mdx (64%) rename website/content/{docs => }/commands/acl/policy/apply.mdx (100%) rename website/content/{docs => }/commands/acl/policy/delete.mdx (100%) rename website/content/{docs => }/commands/acl/policy/info.mdx (100%) rename 
website/content/{docs => }/commands/acl/policy/list.mdx (100%) rename website/content/{docs => }/commands/acl/policy/self.mdx (100%) rename website/content/{docs => }/commands/acl/role/create.mdx (100%) rename website/content/{docs => }/commands/acl/role/delete.mdx (100%) rename website/content/{docs => }/commands/acl/role/info.mdx (100%) rename website/content/{docs => }/commands/acl/role/list.mdx (100%) rename website/content/{docs => }/commands/acl/role/update.mdx (100%) rename website/content/{docs => }/commands/acl/token/create.mdx (100%) rename website/content/{docs => }/commands/acl/token/delete.mdx (100%) rename website/content/{docs => }/commands/acl/token/info.mdx (100%) rename website/content/{docs => }/commands/acl/token/list.mdx (100%) rename website/content/{docs => }/commands/acl/token/self.mdx (100%) rename website/content/{docs => }/commands/acl/token/update.mdx (100%) rename website/content/{docs => }/commands/agent-info.mdx (100%) rename website/content/{docs => }/commands/agent.mdx (98%) rename website/content/{docs => }/commands/alloc/checks.mdx (100%) rename website/content/{docs => }/commands/alloc/exec.mdx (100%) rename website/content/{docs => }/commands/alloc/fs.mdx (96%) rename website/content/{docs => }/commands/alloc/index.mdx (63%) rename website/content/{docs => }/commands/alloc/logs.mdx (100%) rename website/content/{docs => }/commands/alloc/pause.mdx (100%) rename website/content/{docs => }/commands/alloc/restart.mdx (100%) rename website/content/{docs => }/commands/alloc/signal.mdx (100%) rename website/content/{docs => }/commands/alloc/status.mdx (100%) rename website/content/{docs => }/commands/alloc/stop.mdx (98%) rename website/content/{docs => }/commands/config/index.mdx (86%) rename website/content/{docs => }/commands/config/validate.mdx (100%) rename website/content/{docs => }/commands/deployment/fail.mdx (97%) rename website/content/{docs => }/commands/deployment/index.mdx (65%) rename website/content/{docs => }/commands/deployment/list.mdx (100%) rename website/content/{docs => }/commands/deployment/pause.mdx (100%) rename website/content/{docs => }/commands/deployment/promote.mdx (98%) rename website/content/{docs => }/commands/deployment/resume.mdx (96%) rename website/content/{docs => }/commands/deployment/status.mdx (100%) rename website/content/{docs => }/commands/deployment/unblock.mdx (95%) rename website/content/{docs => }/commands/eval/delete.mdx (93%) rename website/content/{docs => }/commands/eval/index.mdx (75%) rename website/content/{docs => }/commands/eval/list.mdx (100%) rename website/content/{docs => }/commands/eval/status.mdx (100%) rename website/content/{docs => }/commands/fmt.mdx (100%) rename website/content/{docs => }/commands/index.mdx (100%) rename website/content/{docs => }/commands/job/action.mdx (100%) rename website/content/{docs => }/commands/job/allocs.mdx (97%) rename website/content/{docs => }/commands/job/deployments.mdx (100%) rename website/content/{docs => }/commands/job/dispatch.mdx (98%) rename website/content/{docs => }/commands/job/eval.mdx (98%) rename website/content/{docs => }/commands/job/history.mdx (100%) rename website/content/{docs => }/commands/job/index.mdx (54%) rename website/content/{docs => }/commands/job/init.mdx (94%) rename website/content/{docs => }/commands/job/inspect.mdx (100%) rename website/content/{docs => }/commands/job/periodic-force.mdx (98%) rename website/content/{docs => }/commands/job/plan.mdx (99%) rename website/content/{docs => }/commands/job/promote.mdx (98%) rename 
website/content/{docs => }/commands/job/restart.mdx (100%) rename website/content/{docs => }/commands/job/revert.mdx (95%) rename website/content/{docs => }/commands/job/run.mdx (97%) rename website/content/{docs => }/commands/job/scale.mdx (98%) rename website/content/{docs => }/commands/job/scaling-events.mdx (100%) rename website/content/{docs => }/commands/job/start.mdx (95%) rename website/content/{docs => }/commands/job/status.mdx (100%) rename website/content/{docs => }/commands/job/stop.mdx (98%) rename website/content/{docs => }/commands/job/tag/apply.mdx (100%) rename website/content/{docs => }/commands/job/tag/index.mdx (74%) rename website/content/{docs => }/commands/job/tag/unset.mdx (100%) rename website/content/{docs => }/commands/job/validate.mdx (100%) rename website/content/{docs => }/commands/license/get.mdx (100%) rename website/content/{docs => }/commands/license/index.mdx (79%) rename website/content/{docs => }/commands/license/inspect.mdx (100%) rename website/content/{docs => }/commands/login.mdx (100%) rename website/content/{docs => }/commands/monitor.mdx (100%) rename website/content/{docs => }/commands/namespace/apply.mdx (95%) rename website/content/{docs => }/commands/namespace/delete.mdx (100%) rename website/content/{docs => }/commands/namespace/index.mdx (67%) rename website/content/{docs => }/commands/namespace/inspect.mdx (100%) rename website/content/{docs => }/commands/namespace/list.mdx (100%) rename website/content/{docs => }/commands/namespace/status.mdx (100%) rename website/content/{docs => }/commands/node-pool/apply.mdx (100%) rename website/content/{docs => }/commands/node-pool/delete.mdx (100%) rename website/content/{docs => }/commands/node-pool/index.mdx (77%) rename website/content/{docs => }/commands/node-pool/info.mdx (100%) rename website/content/{docs => }/commands/node-pool/init.mdx (100%) rename website/content/{docs => }/commands/node-pool/jobs.mdx (100%) rename website/content/{docs => }/commands/node-pool/list.mdx (100%) rename website/content/{docs => }/commands/node-pool/nodes.mdx (100%) rename website/content/{docs => }/commands/node/config.mdx (100%) rename website/content/{docs => }/commands/node/drain.mdx (96%) rename website/content/{docs => }/commands/node/eligibility.mdx (98%) rename website/content/{docs => }/commands/node/index.mdx (67%) rename website/content/{docs => }/commands/node/meta/apply.mdx (100%) rename website/content/{docs => }/commands/node/meta/index.mdx (83%) rename website/content/{docs => }/commands/node/meta/read.mdx (100%) rename website/content/{docs => }/commands/node/status.mdx (100%) rename website/content/{docs => }/commands/operator/api.mdx (100%) rename website/content/{docs => }/commands/operator/autopilot/get-config.mdx (95%) rename website/content/{docs => }/commands/operator/autopilot/health.mdx (95%) rename website/content/{docs => }/commands/operator/autopilot/set-config.mdx (97%) rename website/content/{docs => }/commands/operator/client-state.mdx (100%) rename website/content/{docs => }/commands/operator/debug.mdx (100%) rename website/content/{docs => }/commands/operator/gossip/keyring-generate.mdx (100%) rename website/content/{docs => }/commands/operator/gossip/keyring-install.mdx (100%) rename website/content/{docs => }/commands/operator/gossip/keyring-list.mdx (100%) rename website/content/{docs => }/commands/operator/gossip/keyring-remove.mdx (100%) rename website/content/{docs => }/commands/operator/gossip/keyring-use.mdx (100%) rename website/content/{docs => 
}/commands/operator/index.mdx (57%) rename website/content/{docs => }/commands/operator/metrics.mdx (100%) rename website/content/{docs => }/commands/operator/raft/info.mdx (100%) rename website/content/{docs => }/commands/operator/raft/list-peers.mdx (96%) rename website/content/{docs => }/commands/operator/raft/logs.mdx (100%) rename website/content/{docs => }/commands/operator/raft/remove-peer.mdx (83%) rename website/content/{docs => }/commands/operator/raft/state.mdx (100%) rename website/content/{docs => }/commands/operator/raft/transfer-leadership.mdx (93%) rename website/content/{docs => }/commands/operator/root/keyring-list.mdx (100%) rename website/content/{docs => }/commands/operator/root/keyring-remove.mdx (94%) rename website/content/{docs => }/commands/operator/root/keyring-rotate.mdx (100%) rename website/content/{docs => }/commands/operator/scheduler/get-config.mdx (100%) rename website/content/{docs => }/commands/operator/scheduler/set-config.mdx (100%) rename website/content/{docs => }/commands/operator/snapshot/agent.mdx (100%) rename website/content/{docs => }/commands/operator/snapshot/inspect.mdx (95%) rename website/content/{docs => }/commands/operator/snapshot/redact.mdx (100%) rename website/content/{docs => }/commands/operator/snapshot/restore.mdx (93%) rename website/content/{docs => }/commands/operator/snapshot/save.mdx (96%) rename website/content/{docs => }/commands/operator/snapshot/state.mdx (100%) rename website/content/{docs => }/commands/operator/utilization.mdx (94%) rename website/content/{docs => }/commands/plugin/index.mdx (89%) rename website/content/{docs => }/commands/plugin/status.mdx (100%) rename website/content/{docs => }/commands/quota/apply.mdx (100%) rename website/content/{docs => }/commands/quota/delete.mdx (100%) rename website/content/{docs => }/commands/quota/index.mdx (78%) rename website/content/{docs => }/commands/quota/init.mdx (100%) rename website/content/{docs => }/commands/quota/inspect.mdx (100%) rename website/content/{docs => }/commands/quota/list.mdx (100%) rename website/content/{docs => }/commands/quota/status.mdx (100%) rename website/content/{docs => }/commands/recommendation/apply.mdx (100%) rename website/content/{docs => }/commands/recommendation/dismiss.mdx (100%) rename website/content/{docs => }/commands/recommendation/index.mdx (78%) rename website/content/{docs => }/commands/recommendation/info.mdx (100%) rename website/content/{docs => }/commands/recommendation/list.mdx (100%) rename website/content/{docs => }/commands/scaling/index.mdx (82%) rename website/content/{docs => }/commands/scaling/policy-info.mdx (100%) rename website/content/{docs => }/commands/scaling/policy-list.mdx (100%) rename website/content/{docs => }/commands/sentinel/apply.mdx (90%) rename website/content/{docs => }/commands/sentinel/delete.mdx (100%) rename website/content/{docs => }/commands/sentinel/index.mdx (82%) rename website/content/{docs => }/commands/sentinel/list.mdx (100%) rename website/content/{docs => }/commands/sentinel/read.mdx (100%) rename website/content/{docs => }/commands/server/force-leave.mdx (100%) rename website/content/{docs => }/commands/server/index.mdx (73%) rename website/content/{docs => }/commands/server/join.mdx (87%) rename website/content/{docs => }/commands/server/members.mdx (100%) rename website/content/{docs => }/commands/service/delete.mdx (100%) rename website/content/{docs => }/commands/service/index.mdx (82%) rename website/content/{docs => }/commands/service/info.mdx (100%) rename 
website/content/{docs => }/commands/service/list.mdx (100%) rename website/content/{docs => }/commands/setup/consul.mdx (100%) rename website/content/{docs => }/commands/setup/index.mdx (89%) rename website/content/{docs => }/commands/setup/vault.mdx (100%) rename website/content/{docs => }/commands/status.mdx (100%) rename website/content/{docs => }/commands/system/gc.mdx (100%) rename website/content/{docs => }/commands/system/index.mdx (77%) rename website/content/{docs => }/commands/system/reconcile-summaries.mdx (100%) rename website/content/{docs => }/commands/tls/ca-create.mdx (100%) rename website/content/{docs => }/commands/tls/ca-info.mdx (100%) rename website/content/{docs => }/commands/tls/cert-create.mdx (97%) rename website/content/{docs => }/commands/tls/cert-info.mdx (100%) rename website/content/{docs => }/commands/tls/index.mdx (70%) rename website/content/{docs => }/commands/ui.mdx (100%) rename website/content/{docs => }/commands/var/get.mdx (100%) rename website/content/{docs => }/commands/var/index.mdx (88%) rename website/content/{docs => }/commands/var/init.mdx (100%) rename website/content/{docs => }/commands/var/list.mdx (100%) rename website/content/{docs => }/commands/var/lock.mdx (100%) rename website/content/{docs => }/commands/var/purge.mdx (100%) rename website/content/{docs => }/commands/var/put.mdx (100%) rename website/content/{docs => }/commands/version.mdx (100%) rename website/content/{docs => }/commands/volume/claim-delete.mdx (100%) rename website/content/{docs => }/commands/volume/claim-list.mdx (100%) rename website/content/{docs => }/commands/volume/create.mdx (95%) rename website/content/{docs => }/commands/volume/delete.mdx (91%) rename website/content/{docs => }/commands/volume/deregister.mdx (96%) rename website/content/{docs => }/commands/volume/detach.mdx (100%) rename website/content/{docs => }/commands/volume/index.mdx (63%) rename website/content/{docs => }/commands/volume/init.mdx (100%) rename website/content/{docs => }/commands/volume/register.mdx (97%) rename website/content/{docs => }/commands/volume/snapshot-create.mdx (94%) rename website/content/{docs => }/commands/volume/snapshot-delete.mdx (92%) rename website/content/{docs => }/commands/volume/snapshot-list.mdx (95%) rename website/content/{docs => }/commands/volume/status.mdx (98%) rename website/content/docs/{concepts => architecture/cluster}/consensus.mdx (100%) rename website/content/docs/{concepts/architecture => architecture/cluster}/federation.mdx (88%) create mode 100644 website/content/docs/architecture/cluster/node-pools.mdx rename website/content/docs/{concepts => architecture}/cpu.mdx (100%) rename website/content/docs/{concepts => }/architecture/index.mdx (94%) rename website/content/docs/{concepts => architecture/security}/gossip.mdx (97%) rename website/content/docs/{concepts/security.mdx => architecture/security/index.mdx} (94%) rename website/content/docs/{concepts/plugins => architecture}/storage/csi.mdx (100%) rename website/content/docs/{concepts/plugins => architecture}/storage/host-volumes.mdx (98%) rename website/content/docs/{concepts/plugins => architecture}/storage/index.mdx (93%) rename website/content/docs/{operations => architecture/storage}/stateful-workloads.mdx (96%) delete mode 100644 website/content/docs/concepts/acl/index.mdx delete mode 100644 website/content/docs/concepts/index.mdx rename website/content/docs/concepts/scheduling/{scheduling.mdx => how-scheduling-works.mdx} (98%) rename website/content/docs/{ => 
concepts/scheduling}/schedulers.mdx (98%) create mode 100644 website/content/docs/deploy/clusters/connect-nodes.mdx create mode 100644 website/content/docs/deploy/clusters/federate-regions.mdx rename website/content/docs/{operations/federation/index.mdx => deploy/clusters/federation-considerations.mdx} (87%) rename website/content/docs/{operations/federation/failure.mdx => deploy/clusters/federation-failure-scenarios.mdx} (100%) create mode 100644 website/content/docs/deploy/clusters/index.mdx create mode 100644 website/content/docs/deploy/clusters/reverse-proxy-ui.mdx rename website/content/docs/{install => deploy}/index.mdx (99%) rename website/content/docs/{operations => deploy}/nomad-agent.mdx (90%) rename website/content/docs/{install => deploy}/production/index.mdx (66%) create mode 100644 website/content/docs/deploy/production/reference-architecture.mdx rename website/content/docs/{install => deploy}/production/requirements.mdx (98%) rename website/content/docs/{install => deploy/production}/windows-service.mdx (100%) create mode 100644 website/content/docs/deploy/task-driver/docker.mdx rename website/content/docs/{drivers => deploy/task-driver}/exec.mdx (58%) create mode 100644 website/content/docs/deploy/task-driver/index.mdx rename website/content/docs/{drivers => deploy/task-driver}/java.mdx (51%) create mode 100644 website/content/docs/deploy/task-driver/qemu.mdx rename website/content/docs/{drivers => deploy/task-driver}/raw_exec.mdx (55%) delete mode 100644 website/content/docs/drivers/index.mdx create mode 100644 website/content/docs/govern/index.mdx create mode 100644 website/content/docs/govern/namespaces.mdx create mode 100644 website/content/docs/govern/resource-quotas.mdx create mode 100644 website/content/docs/govern/sentinel.mdx create mode 100644 website/content/docs/govern/use-node-pools.mdx delete mode 100644 website/content/docs/integrations/index.mdx create mode 100644 website/content/docs/job-declare/configure-tasks.mdx rename website/content/docs/{networking/service-mesh.mdx => job-declare/consul-service-mesh.mdx} (94%) create mode 100644 website/content/docs/job-declare/create-job.mdx create mode 100644 website/content/docs/job-declare/exit-signals.mdx create mode 100644 website/content/docs/job-declare/failure/check-restart.mdx create mode 100644 website/content/docs/job-declare/failure/index.mdx create mode 100644 website/content/docs/job-declare/failure/reschedule.mdx create mode 100644 website/content/docs/job-declare/failure/restart.mdx create mode 100644 website/content/docs/job-declare/index.mdx create mode 100644 website/content/docs/job-declare/multiregion.mdx create mode 100644 website/content/docs/job-declare/nomad-actions.mdx create mode 100644 website/content/docs/job-declare/nomad-variables.mdx create mode 100644 website/content/docs/job-declare/strategy/blue-green-canary.mdx create mode 100644 website/content/docs/job-declare/strategy/index.mdx create mode 100644 website/content/docs/job-declare/strategy/rolling.mdx create mode 100644 website/content/docs/job-declare/task-dependencies.mdx rename website/content/docs/{drivers => job-declare/task-driver}/docker.mdx (57%) create mode 100644 website/content/docs/job-declare/task-driver/exec.mdx create mode 100644 website/content/docs/job-declare/task-driver/index.mdx create mode 100644 website/content/docs/job-declare/task-driver/java.mdx rename website/content/docs/{drivers => job-declare/task-driver}/qemu.mdx (56%) create mode 100644 website/content/docs/job-declare/task-driver/raw_exec.mdx create 
mode 100644 website/content/docs/job-declare/vault.mdx create mode 100644 website/content/docs/job-networking/cni.mdx create mode 100644 website/content/docs/job-networking/index.mdx create mode 100644 website/content/docs/job-networking/service-discovery.mdx create mode 100644 website/content/docs/job-run/index.mdx create mode 100644 website/content/docs/job-run/inspect.mdx create mode 100644 website/content/docs/job-run/logs.mdx create mode 100644 website/content/docs/job-run/utilization-metrics.mdx create mode 100644 website/content/docs/job-run/versions.mdx create mode 100644 website/content/docs/job-scheduling/affinity.mdx create mode 100644 website/content/docs/job-scheduling/index.mdx create mode 100644 website/content/docs/job-scheduling/preemption.mdx create mode 100644 website/content/docs/job-scheduling/spread.mdx create mode 100644 website/content/docs/manage/autopilot.mdx create mode 100644 website/content/docs/manage/format-cli-output.mdx rename website/content/docs/{operations => manage}/garbage-collection.mdx (98%) create mode 100644 website/content/docs/manage/index.mdx rename website/content/docs/{operations => manage}/key-management.mdx (94%) create mode 100644 website/content/docs/manage/migrate-workloads.mdx create mode 100644 website/content/docs/manage/outage-recovery.mdx create mode 100644 website/content/docs/monitor/cluster-topology.mdx create mode 100644 website/content/docs/monitor/event-stream.mdx rename website/content/docs/{operations/monitoring-nomad.mdx => monitor/index.mdx} (96%) create mode 100644 website/content/docs/monitor/inspect-cluster.mdx create mode 100644 website/content/docs/monitor/inspect-workloads.mdx rename website/content/docs/{integrations => networking}/consul/index.mdx (75%) rename website/content/docs/{integrations => networking}/consul/service-mesh.mdx (86%) rename website/content/docs/{operations/ipv6-support.mdx => networking/ipv6.mdx} (96%) delete mode 100644 website/content/docs/nomad-vs-kubernetes/alternative.mdx delete mode 100644 website/content/docs/nomad-vs-kubernetes/index.mdx delete mode 100644 website/content/docs/nomad-vs-kubernetes/supplement.mdx delete mode 100644 website/content/docs/operations/benchmarking.mdx delete mode 100644 website/content/docs/operations/index.mdx rename website/content/docs/{install => }/quickstart.mdx (96%) create mode 100644 website/content/docs/reference/go-template-syntax.mdx rename website/content/docs/{job-specification => reference}/hcl2/expressions.mdx (98%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/chunklist.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/coalesce.mdx (84%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/coalescelist.mdx (85%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/compact.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/concat.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/contains.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/distinct.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/element.mdx (75%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/flatten.mdx (93%) rename website/content/docs/{job-specification => 
reference}/hcl2/functions/collection/index-fn.mdx (80%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/keys.mdx (80%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/length.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/lookup.mdx (80%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/merge.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/range.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/reverse.mdx (81%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/setintersection.mdx (71%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/setproduct.mdx (90%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/setunion.mdx (68%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/slice.mdx (83%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/sort.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/values.mdx (72%) rename website/content/docs/{job-specification => reference}/hcl2/functions/collection/zipmap.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/conversion/can.mdx (83%) rename website/content/docs/{job-specification => reference}/hcl2/functions/conversion/convert.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/conversion/try.mdx (96%) rename website/content/docs/{job-specification => reference}/hcl2/functions/crypto/bcrypt.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/crypto/md5.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/crypto/rsadecrypt.mdx (94%) rename website/content/docs/{job-specification => reference}/hcl2/functions/crypto/sha1.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/crypto/sha256.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/crypto/sha512.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/datetime/formatdate.mdx (97%) rename website/content/docs/{job-specification => reference}/hcl2/functions/datetime/timeadd.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/base64decode.mdx (87%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/base64encode.mdx (88%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/csvdecode.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/jsondecode.mdx (86%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/jsonencode.mdx (83%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/urlencode.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/yamldecode.mdx (90%) rename website/content/docs/{job-specification => reference}/hcl2/functions/encoding/yamlencode.mdx (81%) rename website/content/docs/{job-specification => reference}/hcl2/functions/file/abspath.mdx (88%) rename website/content/docs/{job-specification => 
reference}/hcl2/functions/file/basename.mdx (87%) rename website/content/docs/{job-specification => reference}/hcl2/functions/file/dirname.mdx (88%) rename website/content/docs/{job-specification => reference}/hcl2/functions/file/file.mdx (92%) rename website/content/docs/{job-specification => reference}/hcl2/functions/file/filebase64.mdx (82%) rename website/content/docs/{job-specification => reference}/hcl2/functions/file/fileexists.mdx (87%) rename website/content/docs/{job-specification => reference}/hcl2/functions/file/fileset.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/file/pathexpand.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/index.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/ipnet/cidrhost.mdx (88%) rename website/content/docs/{job-specification => reference}/hcl2/functions/ipnet/cidrnetmask.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/ipnet/cidrsubnet.mdx (89%) rename website/content/docs/{job-specification => reference}/hcl2/functions/ipnet/cidrsubnets.mdx (81%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/abs.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/ceil.mdx (76%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/floor.mdx (76%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/log.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/max.mdx (80%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/min.mdx (80%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/parseint.mdx (90%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/pow.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/numeric/signum.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/chomp.mdx (81%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/format.mdx (95%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/formatlist.mdx (84%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/indent.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/join.mdx (82%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/lower.mdx (64%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/regex_replace.mdx (91%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/replace.mdx (82%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/split.mdx (82%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/strlen.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/strrev.mdx (84%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/substr.mdx (100%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/title.mdx (66%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/trim.mdx (53%) rename website/content/docs/{job-specification => 
reference}/hcl2/functions/string/trimprefix.mdx (54%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/trimspace.mdx (81%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/trimsuffix.mdx (54%) rename website/content/docs/{job-specification => reference}/hcl2/functions/string/upper.mdx (64%) rename website/content/docs/{job-specification => reference}/hcl2/functions/uuid/uuidv4.mdx (87%) rename website/content/docs/{job-specification => reference}/hcl2/functions/uuid/uuidv5.mdx (95%) rename website/content/docs/{job-specification => reference}/hcl2/index.mdx (90%) rename website/content/docs/{job-specification => reference}/hcl2/locals.mdx (95%) rename website/content/docs/{job-specification => reference}/hcl2/syntax.mdx (97%) rename website/content/docs/{job-specification => reference}/hcl2/variables.mdx (96%) rename website/content/docs/{operations/metrics-reference.mdx => reference/metrics.mdx} (99%) rename website/content/docs/{runtime/environment.mdx => reference/runtime-environment-settings.mdx} (99%) rename website/content/docs/{runtime/interpolation.mdx => reference/runtime-variable-interpolation.mdx} (96%) rename website/content/docs/{enterprise/sentinel.mdx => reference/sentinel-policy.mdx} (98%) delete mode 100644 website/content/docs/runtime/index.mdx create mode 100644 website/content/docs/scale/benchmarking.mdx create mode 100644 website/content/docs/scale/index.mdx create mode 100644 website/content/docs/secure/acl/bootstrap.mdx rename website/content/docs/{integrations/consul/acl.mdx => secure/acl/consul.mdx} (96%) create mode 100644 website/content/docs/secure/acl/index.mdx create mode 100644 website/content/docs/secure/acl/policies/create-policy.mdx create mode 100644 website/content/docs/secure/acl/policies/index.mdx create mode 100644 website/content/docs/secure/acl/tokens/index.mdx create mode 100644 website/content/docs/secure/acl/tokens/vault.mdx rename website/content/docs/{concepts/acl/auth-methods => secure/authentication}/jwt.mdx (95%) rename website/content/docs/{concepts/acl/auth-methods => secure/authentication}/oidc.mdx (95%) create mode 100644 website/content/docs/secure/authentication/sso-auth0.mdx create mode 100644 website/content/docs/secure/authentication/sso-pkce-jwt.mdx create mode 100644 website/content/docs/secure/authentication/sso-vault.mdx create mode 100644 website/content/docs/secure/index.mdx create mode 100644 website/content/docs/secure/traffic/gossip-encryption.mdx create mode 100644 website/content/docs/secure/traffic/index.mdx create mode 100644 website/content/docs/secure/traffic/tls.mdx rename website/content/docs/{integrations => secure}/vault/acl.mdx (93%) rename website/content/docs/{integrations => secure}/vault/index.mdx (98%) rename website/content/docs/{operations => secure/workload-identity}/aws-oidc-provider.mdx (100%) create mode 100644 website/content/docs/secure/workload-identity/index.mdx create mode 100644 website/content/docs/secure/workload-identity/vault.mdx create mode 100644 website/content/docs/stateful-workloads/csi-volumes.mdx create mode 100644 website/content/docs/stateful-workloads/dynamic-host-volumes.mdx create mode 100644 website/content/docs/stateful-workloads/index.mdx create mode 100644 website/content/docs/stateful-workloads/static-host-volumes.mdx rename website/content/docs/{who-uses-nomad.mdx => use-cases.mdx} (51%) create mode 100644 website/content/docs/what-is-nomad.mdx create mode 100644 website/content/intro/README.md delete mode 
100644 website/content/intro/use-cases.mdx delete mode 100644 website/content/intro/vs/ecs.mdx delete mode 100644 website/content/intro/vs/index.mdx delete mode 100644 website/content/intro/vs/mesos.mdx delete mode 100644 website/content/intro/vs/terraform.mdx create mode 100644 website/content/partials/consul-namespaces.mdx rename website/content/{docs/concepts => partials}/node-pools.mdx (92%) create mode 100644 website/content/partials/service-discovery.mdx create mode 100644 website/content/tools/nomad-pack/advanced-usage.mdx create mode 100644 website/content/tools/nomad-pack/create-packs.mdx create mode 100644 website/content/tools/nomad-pack/index.mdx create mode 100644 website/data/commands-nav-data.json create mode 100644 website/public/img/clusters/active-alert.png create mode 100644 website/public/img/clusters/alertmanager-webui.png create mode 100644 website/public/img/clusters/alerts.png create mode 100644 website/public/img/clusters/cannot-fetch-logs.png create mode 100644 website/public/img/clusters/cannot-remote-exec.png create mode 100644 website/public/img/clusters/chrome-pending.png create mode 100644 website/public/img/clusters/chrome-timeout.png create mode 100644 website/public/img/clusters/firefox-pending.png create mode 100644 website/public/img/clusters/firefox-timeout.png create mode 100644 website/public/img/clusters/new-targets.png create mode 100644 website/public/img/clusters/nomad-multi-region.png create mode 100644 website/public/img/clusters/prometheus-targets.png create mode 100644 website/public/img/clusters/running-jobs.png create mode 100644 website/public/img/clusters/safari-pending.png create mode 100644 website/public/img/clusters/safart-timeout.png create mode 100644 website/public/img/deploy/nomad_fault_tolerance.png create mode 100644 website/public/img/deploy/nomad_network_arch.png create mode 100644 website/public/img/deploy/nomad_network_arch_0-1x.png create mode 100644 website/public/img/deploy/nomad_network_arch_0-1y.png create mode 100644 website/public/img/deploy/nomad_reference_diagram.png create mode 100644 website/public/img/govern/nomad-ui-namespace-dropdown.png create mode 100644 website/public/img/govern/sentinel.jpg create mode 100644 website/public/img/monitor/guide-ui-img-alloc-preempted.png create mode 100644 website/public/img/monitor/guide-ui-img-alloc-preempter.png create mode 100644 website/public/img/monitor/guide-ui-img-alloc-reschedule-details.png create mode 100644 website/public/img/monitor/guide-ui-img-alloc-reschedule-icon.png create mode 100644 website/public/img/monitor/guide-ui-img-alloc-resource-utilization.png create mode 100644 website/public/img/monitor/guide-ui-img-alloc-stop-restart.png create mode 100644 website/public/img/monitor/guide-ui-img-alloc-unhealthy-driver.png create mode 100644 website/public/img/monitor/guide-ui-img-client-allocations.png create mode 100644 website/public/img/monitor/guide-ui-img-client-attributes.png create mode 100644 website/public/img/monitor/guide-ui-img-client-detail.png create mode 100644 website/public/img/monitor/guide-ui-img-client-drain.png create mode 100644 website/public/img/monitor/guide-ui-img-client-driver-status.png create mode 100644 website/public/img/monitor/guide-ui-img-client-events.png create mode 100644 website/public/img/monitor/guide-ui-img-client-resource-utilization.png create mode 100644 website/public/img/monitor/guide-ui-img-clients-filters.png create mode 100644 website/public/img/monitor/guide-ui-img-clients-list.png create mode 100644 
website/public/img/monitor/guide-ui-img-job-definition-edit.png create mode 100644 website/public/img/monitor/guide-ui-img-job-deployment-canary.png create mode 100644 website/public/img/monitor/guide-ui-img-job-filters.png create mode 100644 website/public/img/monitor/guide-ui-img-job-stop.png create mode 100644 website/public/img/monitor/guide-ui-img-periodic-force.png create mode 100644 website/public/img/monitor/guide-ui-img-server-detail.png create mode 100644 website/public/img/monitor/guide-ui-img-servers-list.png create mode 100644 website/public/img/monitor/guide-ui-img-task-events.png create mode 100644 website/public/img/monitor/guide-ui-img-task-logs.png create mode 100644 website/public/img/monitor/guide-ui-jobs-list.png create mode 100644 website/public/img/monitor/inspect-cluster.mdx create mode 100644 website/public/img/monitor/inspect-workloads.mdx create mode 100644 website/public/img/monitor/topo-viz/alloc-associations-across-dcs.png create mode 100644 website/public/img/monitor/topo-viz/allocation-associations.png create mode 100644 website/public/img/monitor/topo-viz/allocation-panel.png create mode 100644 website/public/img/monitor/topo-viz/allocation-tooltip.png create mode 100644 website/public/img/monitor/topo-viz/client-panel.png create mode 100644 website/public/img/monitor/topo-viz/client-with-many-allocs.png create mode 100644 website/public/img/monitor/topo-viz/cluster-panel.png create mode 100644 website/public/img/monitor/topo-viz/cluster-view.png create mode 100644 website/public/img/monitor/topo-viz/empty-clients.png create mode 100644 website/public/img/monitor/topo-viz/topo-viz-link.png create mode 100644 website/public/img/secure/acl.jpg create mode 100644 website/public/img/secure/auth0-configure-callback-action.png create mode 100644 website/public/img/secure/auth0-configure-callback-flow.png create mode 100644 website/public/img/secure/auth0-configure-callback-rule.png create mode 100644 website/public/img/secure/auth0-configure-callback-urls.png create mode 100644 website/public/img/secure/auth0-configure-user-metadata.png create mode 100644 website/public/img/secure/auth0-create-user.png create mode 100644 website/public/img/secure/auth0-get-application-params.png create mode 100644 website/public/img/secure/nomad-ui-jobs-signed-in.png create mode 100644 website/public/img/secure/nomad-ui-oidc-login-button.png create mode 100644 website/public/img/secure/nomad-ui-oidc-login-form.png create mode 100644 website/public/img/secure/nomad-ui-oidc-signed-in.png diff --git a/website/content/api-docs/acl/auth-methods.mdx b/website/content/api-docs/acl/auth-methods.mdx index 9584c443e..37b871461 100644 --- a/website/content/api-docs/acl/auth-methods.mdx +++ b/website/content/api-docs/acl/auth-methods.mdx @@ -359,7 +359,7 @@ $ curl \ ``` [private key jwt]: https://oauth.net/private-key-jwt/ -[concepts-assertions]: /nomad/docs/concepts/acl/auth-methods/oidc#client-assertions +[concepts-assertions]: /nomad/docs/secure/authentication/oidc#client-assertions [x5t]: https://datatracker.ietf.org/doc/html/rfc7515#section-4.1.7 [x5t#S256]: https://datatracker.ietf.org/doc/html/rfc7515#section-4.1.8 [pkce]: https://oauth.net/2/pkce/ diff --git a/website/content/api-docs/acl/tokens.mdx b/website/content/api-docs/acl/tokens.mdx index 625c04192..983ec3003 100644 --- a/website/content/api-docs/acl/tokens.mdx +++ b/website/content/api-docs/acl/tokens.mdx @@ -16,7 +16,7 @@ An operator created token can be provided in the body of the request to bootstra if required. 
If no header is provided the cluster will return a generated management token. The provided token should be presented in a UUID format. This request is always forwarded to the authoritative region. It can only be invoked once -until a [bootstrap reset](/nomad/tutorials/access-control/access-control-bootstrap#re-bootstrap-acl-system) is performed. +until a [bootstrap reset](/nomad/docs/secure/acl/bootstrap#re-bootstrap-acl-system) is performed. | Method | Path | Produces | | ------ | ------------------- | ------------------ | diff --git a/website/content/api-docs/evaluations.mdx b/website/content/api-docs/evaluations.mdx index 2984cdaf1..0876c1752 100644 --- a/website/content/api-docs/evaluations.mdx +++ b/website/content/api-docs/evaluations.mdx @@ -444,4 +444,4 @@ $ curl \ ``` [update_scheduler_configuration]: /nomad/api-docs/operator/scheduler#update-scheduler-configuration -[metrics reference]: /nomad/docs/operations/metrics-reference +[metrics reference]: /nomad/docs/reference/metrics diff --git a/website/content/api-docs/index.mdx b/website/content/api-docs/index.mdx index 1f4ca6c92..7e2f53fa2 100644 --- a/website/content/api-docs/index.mdx +++ b/website/content/api-docs/index.mdx @@ -727,5 +727,5 @@ specific response codes are returned but all clients should handle the following - 404 indicates an unknown resource. - 5xx means that the client should not expect the request to succeed if retried. -[cli_operator_api]: /nomad/docs/commands/operator/api -[cli_operator_api_filter]: /nomad/docs/commands/operator/api#filter +[cli_operator_api]: /nomad/commands/operator/api +[cli_operator_api_filter]: /nomad/commands/operator/api#filter diff --git a/website/content/api-docs/json-jobs.mdx b/website/content/api-docs/json-jobs.mdx index 7a822722c..365822b55 100644 --- a/website/content/api-docs/json-jobs.mdx +++ b/website/content/api-docs/json-jobs.mdx @@ -33,10 +33,10 @@ The [`nomad job run -json`][job-run-json] flag submits a JSON formatted job: $ nomad job run -json example.json ``` -[job-inspect]: /nomad/docs/commands/job/inspect -[job-output]: /nomad/docs/commands/job/run#output +[job-inspect]: /nomad/commands/job/inspect +[job-output]: /nomad/commands/job/run#output [job-parse]: /nomad/api-docs/jobs#parse-job -[job-run-json]: /nomad/docs/commands/job/run#json +[job-run-json]: /nomad/commands/job/run#json ## Syntax @@ -307,7 +307,7 @@ The `Job` object supports the following keys: - `Type` - Specifies the job type and switches which scheduler is used. Nomad provides the `service`, `system` and `batch` schedulers, and defaults to `service`. To learn more about each scheduler type visit - [here](/nomad/docs/schedulers) + [here](/nomad/docs/concepts/scheduling/schedulers) - `Update` - Specifies an update strategy to be applied to all task groups within the job. When specified both at the job level and the task group level, @@ -482,13 +482,13 @@ The `Task` object supports the following keys: to. The file is written relative to the task's local directory. - `Driver` - Specifies the task driver that should be used to run the - task. See the [driver documentation](/nomad/docs/drivers) for what + task. See the [driver documentation](/nomad/docs/job-declare/task-driver) for what is available. Examples include `docker`, `qemu`, `java`, and `exec`. - `Env` - A map of key-value representing environment variables that will be passed along to the running process. Nomad variables are interpreted when set in the environment variable values. 
See the table of - interpreted variables [here](/nomad/docs/runtime/interpolation). + interpreted variables [here](/nomad/docs/reference/runtime-variable-interpolation). For example the below environment map will be reinterpreted: @@ -540,7 +540,7 @@ The `Task` object supports the following keys: Consul for service discovery. A `Service` object represents a routable and discoverable service on the network. Nomad automatically registers when a task is started and de-registers it when the task transitions to the dead state. - [Click here](/nomad/docs/integrations/consul-integration#service-discovery) to learn more about + [Click here](/nomad/docs/networking/service-discovery) to learn more about services. Below is the fields in the `Service` object: - `Name`: An explicit name for the Service. Nomad will replace `${JOB}`, @@ -887,7 +887,7 @@ An example `Update` block: The `Constraint` object supports the following keys: - `LTarget` - Specifies the attribute to examine for the - constraint. See the table of attributes [here](/nomad/docs/runtime/interpolation#interpreted_node_vars). + constraint. See the table of attributes [here](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars). - `RTarget` - Specifies the value to compare the attribute against. This can be a literal value, another attribute or a regular expression if @@ -937,7 +937,7 @@ are described in [affinities](/nomad/docs/job-specification/affinity) The `Affinity` object supports the following keys: - `LTarget` - Specifies the attribute to examine for the - affinity. See the table of attributes [here](/nomad/docs/runtime/interpolation#interpreted_node_vars). + affinity. See the table of attributes [here](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars). - `RTarget` - Specifies the value to compare the attribute against. This can be a literal value, another attribute or a regular expression if @@ -1205,7 +1205,7 @@ in [spread](/nomad/docs/job-specification/spread). The `Spread` object supports the following keys: - `Attribute` - Specifies the attribute to examine for the - spread. See the [table of attributes](/nomad/docs/runtime/interpolation#interpreted_node_vars) for examples. + spread. See the [table of attributes](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars) for examples. - `SpreadTarget` - Specifies a list of attribute values and percentages. This is an optional field, when left empty Nomad will evenly spread allocations across values of the attribute. @@ -1242,6 +1242,6 @@ The `Scaling` object supports the following keys: autoscaler (e.g., [nomad-autoscaler](https://github.com/hashicorp/nomad-autoscaler)). [ct]: https://github.com/hashicorp/consul-template 'Consul Template by HashiCorp' -[drain]: /nomad/docs/commands/node/drain -[env]: /nomad/docs/runtime/environment 'Nomad Runtime Environment' +[drain]: /nomad/commands/node/drain +[env]: /nomad/docs/reference/runtime-environment-settings 'Nomad Runtime Environment' [Workload Identity]: /nomad/docs/concepts/workload-identity 'Nomad Workload Identity' diff --git a/website/content/api-docs/nodes.mdx b/website/content/api-docs/nodes.mdx index ca0f1c38d..9095f9008 100644 --- a/website/content/api-docs/nodes.mdx +++ b/website/content/api-docs/nodes.mdx @@ -948,7 +948,7 @@ $ curl \ This endpoint toggles the drain mode of the node. When draining is enabled, no further allocations will be assigned to this node, and existing allocations will be migrated to new nodes. 
See the [Workload Migration -Guide](/nomad/tutorials/manage-clusters/node-drain) for suggested usage. +Guide](/nomad/docs/manage/migrate-workloads) for suggested usage. | Method | Path | Produces | | ------ | ------------------------- | ------------------ | diff --git a/website/content/api-docs/operator/index.mdx b/website/content/api-docs/operator/index.mdx index de998d540..10670871c 100644 --- a/website/content/api-docs/operator/index.mdx +++ b/website/content/api-docs/operator/index.mdx @@ -14,9 +14,9 @@ as interacting with the Raft subsystem, licensing, snapshots, autopilot and sche ~> Use this interface with extreme caution, as improper use could lead to a Nomad outage and even loss of data. -See the [Outage Recovery](/nomad/tutorials/manage-clusters/outage-recovery) guide for some examples of how +See the [Outage Recovery](/nomad/docs/manage/outage-recovery) guide for some examples of how these capabilities are used. For a CLI to perform these operations manually, please see the documentation for the -[`nomad operator`](/nomad/docs/commands/operator) command. +[`nomad operator`](/nomad/commands/operator) command. Please choose a sub-section in the navigation for more information diff --git a/website/content/api-docs/operator/keyring.mdx b/website/content/api-docs/operator/keyring.mdx index feeb0b237..e0d27df52 100644 --- a/website/content/api-docs/operator/keyring.mdx +++ b/website/content/api-docs/operator/keyring.mdx @@ -256,8 +256,8 @@ $ curl \ ``` -[Key Management]: /nomad/docs/operations/key-management -[`nomad operator root keyring`]: /nomad/docs/commands/operator/root/keyring-rotate +[Key Management]: /nomad/docs/manage/key-management +[`nomad operator root keyring`]: /nomad/commands/operator/root/keyring-rotate [blocking queries]: /nomad/api-docs#blocking-queries [oidc-disco]: https://openid.net/specs/openid-connect-discovery-1_0.html [oidc_issuer]: /nomad/docs/configuration/server#oidc_issuer diff --git a/website/content/api-docs/operator/raft.mdx b/website/content/api-docs/operator/raft.mdx index 7b06c6786..ccc1feee3 100644 --- a/website/content/api-docs/operator/raft.mdx +++ b/website/content/api-docs/operator/raft.mdx @@ -216,4 +216,4 @@ $ curl --request PUT \ -[consensus protocol guide]: /nomad/docs/concepts/consensus +[consensus protocol guide]: /nomad/docs/architecture/cluster/consensus diff --git a/website/content/api-docs/operator/snapshot.mdx b/website/content/api-docs/operator/snapshot.mdx index d67516f2c..d75360009 100644 --- a/website/content/api-docs/operator/snapshot.mdx +++ b/website/content/api-docs/operator/snapshot.mdx @@ -11,7 +11,7 @@ description: |- This endpoint generates and returns an atomic, point-in-time snapshot of the Nomad server state for disaster recovery. Snapshots include all state managed by Nomad's -Raft [consensus protocol](/nomad/docs/concepts/consensus). +Raft [consensus protocol](/nomad/docs/architecture/cluster/consensus). 
Snapshots are exposed as gzipped tar archives which internally contain the Raft metadata required to restore, as well as a binary serialized version of the diff --git a/website/content/api-docs/operator/upgrade-check.mdx b/website/content/api-docs/operator/upgrade-check.mdx index cdae7f9f3..f41e348fe 100644 --- a/website/content/api-docs/operator/upgrade-check.mdx +++ b/website/content/api-docs/operator/upgrade-check.mdx @@ -181,5 +181,5 @@ $ nomad operator api \ [`identity`]: /nomad/docs/job-specification/identity [`vault`]: /nomad/docs/job-specification/vault -[nomad_acl_vault_wid]: /nomad/docs/integrations/vault/acl#nomad-workload-identities +[nomad_acl_vault_wid]: /nomad/docs/secure/vault/acl#nomad-workload-identities diff --git a/website/content/api-docs/operator/utilization.mdx b/website/content/api-docs/operator/utilization.mdx index 19dfe10e0..bc000d5e3 100644 --- a/website/content/api-docs/operator/utilization.mdx +++ b/website/content/api-docs/operator/utilization.mdx @@ -12,7 +12,7 @@ reporting bundles for Nomad Enterprise. -## Generate Nomad Enterprise Utilization Report Buindle +## Generate Nomad Enterprise Utilization Report Bundle This endpoint generates a utilization report. If Nomad did not record a utilization snapshot in the previous 24 hours, Nomad records a utilization @@ -56,4 +56,4 @@ API, decodes this to a human-readable file in the current working directory. [blocking queries]: /nomad/api-docs#blocking-queries [required ACLs]: /nomad/api-docs#acls -[`nomad operator utilization`]: /nomad/docs/command/operator/utilization +[`nomad operator utilization`]: /nomad/commands/operator/utilization diff --git a/website/content/api-docs/sentinel-policies.mdx b/website/content/api-docs/sentinel-policies.mdx index 44446a81a..efe2015f9 100644 --- a/website/content/api-docs/sentinel-policies.mdx +++ b/website/content/api-docs/sentinel-policies.mdx @@ -9,7 +9,7 @@ description: >- # Sentinel Policies HTTP API The `/sentinel/policies` and `/sentinel/policy/` endpoints are used to manage Sentinel policies. -For more details about Sentinel policies, please see the [Sentinel Policy Guide](/nomad/tutorials/governance-and-policy/sentinel). +For more details about Sentinel policies, please see the [Sentinel Policy Guide](/nomad/docs/govern/sentinel). Sentinel endpoints are only available when ACLs are enabled. For more details about ACLs, please see the [ACL Guide](/nomad/tutorials/access-control). diff --git a/website/content/api-docs/task-api.mdx b/website/content/api-docs/task-api.mdx index 5e7cee54a..5f644154f 100644 --- a/website/content/api-docs/task-api.mdx +++ b/website/content/api-docs/task-api.mdx @@ -97,12 +97,12 @@ $ nomad node status -filter 'Meta.example == "Hello World!"' - Using the Task API Unix Domain Socket on Windows [requires][windows] Windows build 17063 or later. 
-[acl]: /nomad/docs/concepts/acl/ -[acl-tokens]: /nomad/docs/concepts/acl/#token -[alloc-exec]: /nomad/docs/commands/alloc/exec -[anon]: /nomad/tutorials/access-control/access-control#acl-policies +[acl]: /nomad/docs/secure/acl/ +[acl-tokens]: /nomad/docs/secure/acl/#tokens +[alloc-exec]: /nomad/commands/alloc/exec +[anon]: /nomad/docs/secure/acl#policies [bind_addr]: /nomad/docs/configuration -[mTLS]: /nomad/tutorials/transport-security/security-enable-tls +[mTLS]: /nomad/docs/secure/traffic/tls [task-user]: /nomad/docs/job-specification/task#user [workload-id]: /nomad/docs/concepts/workload-identity [windows]: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/ diff --git a/website/content/api-docs/variables/index.mdx b/website/content/api-docs/variables/index.mdx index 8f1e8bcb2..0e29e6045 100644 --- a/website/content/api-docs/variables/index.mdx +++ b/website/content/api-docs/variables/index.mdx @@ -17,5 +17,5 @@ documentation for the [`nomad var`][] commands. Please choose a sub-section in the navigation for more information -[`nomad var`]: /nomad/docs/commands/var -[Variables]: /nomad/docs/concepts/variables \ No newline at end of file +[`nomad var`]: /nomad/commands/var +[Variables]: /nomad/docs/concepts/variables diff --git a/website/content/api-docs/variables/locks.mdx b/website/content/api-docs/variables/locks.mdx index bbb79d758..8b27be181 100644 --- a/website/content/api-docs/variables/locks.mdx +++ b/website/content/api-docs/variables/locks.mdx @@ -21,17 +21,17 @@ it through the use of a parameter defining the operation to be performed. The lock operation parameter can be: -- `lock-acquire`: When used, the call will introduce a lock over the variable if -it exists, or create a new one if it doesn't. The lock ID will be returned in the -response and it must be provided to perform any other operation over the lock. +- `lock-acquire`: When used, the call will introduce a lock over the variable if +it exists, or create a new one if it doesn't. The lock ID will be returned in the +response and it must be provided to perform any other operation over the lock. The variable items can be updated at any time using the lock ID, but the lock parameters are unmmutable, attempting to modify them while a lock is present will -generate an error. +generate an error. In the case of attempting to acquire a variable that is already locked, a conflict -response will be returned. +response will be returned. -The lock-acquire operation will override the variable items if new values are +The lock-acquire operation will override the variable items if new values are present. @@ -62,7 +62,7 @@ $ curl \ #### Sample Response -The response body returns the created or updated variable including the lock +The response body returns the created or updated variable including the lock parameters and ID, along with metadata created by the server: ```json @@ -88,7 +88,7 @@ parameters and ID, along with metadata created by the server: - `lock-renew`: A valid call to lock renew needs to be placed before the lock's TTL is up in order to mantain the variable locked. A valid call must include the lock ID as part of the request body. If the lock TTL is up without a renewal or -release calls, the variable will remain unlockable for at least the lock delay. +release calls, the variable will remain unlockable for at least the lock delay. 
#### Sample Request @@ -132,8 +132,8 @@ parameters: ``` - `lock-release`: A call to the endpoint with the `lock-release` operation will -immediately remove the lock over the variable, making it modifiable without -restrictions again. +immediately remove the lock over the variable, making it modifiable without +restrictions again. The lock-release operation will not override the variable items, if the request body contains any item, it will generate a bad request response. @@ -197,15 +197,15 @@ will include only metadata and not the `Items` field: ## Restrictions -When creating a new variable using the lock-acquire operation, all the known -[restrictions][] regarding the path and size of the content apply, but unlike +When creating a new variable using the lock-acquire operation, all the known +[restrictions][] regarding the path and size of the content apply, but unlike regular variables, locked variables can be created with or without any items. The lock TTL and Delay must be values between 10 seconds and 24 hours. [Variables]: /nomad/docs/concepts/variables [restrictions]: /nomad/api-docs/variables/variables#restrictions -[`nomad var`]: /nomad/docs/commands/var +[`nomad var`]: /nomad/commands/var [blocking queries]: /nomad/api-docs#blocking-queries [required ACLs]: /nomad/api-docs#acls [RFC3986]: https://www.rfc-editor.org/rfc/rfc3986#section-2 diff --git a/website/content/api-docs/volumes.mdx b/website/content/api-docs/volumes.mdx index 81e5353af..a9f32b596 100644 --- a/website/content/api-docs/volumes.mdx +++ b/website/content/api-docs/volumes.mdx @@ -1219,6 +1219,6 @@ $ curl \ [required ACLs]: /nomad/api-docs#acls [csi]: https://github.com/container-storage-interface/spec [csi_plugin]: /nomad/docs/job-specification/csi_plugin -[csi_plugins_internals]: /nomad/docs/concepts/plugins/storage/csi +[csi_plugins_internals]: /nomad/docs/architecture/storage/csi [Create CSI Volume]: #create-csi-volume [Volume Expansion]: /nomad/docs/other-specifications/volume/csi#volume-expansion diff --git a/website/content/docs/commands/acl/auth-method/create.mdx b/website/content/commands/acl/auth-method/create.mdx similarity index 100% rename from website/content/docs/commands/acl/auth-method/create.mdx rename to website/content/commands/acl/auth-method/create.mdx diff --git a/website/content/docs/commands/acl/auth-method/delete.mdx b/website/content/commands/acl/auth-method/delete.mdx similarity index 100% rename from website/content/docs/commands/acl/auth-method/delete.mdx rename to website/content/commands/acl/auth-method/delete.mdx diff --git a/website/content/docs/commands/acl/auth-method/info.mdx b/website/content/commands/acl/auth-method/info.mdx similarity index 100% rename from website/content/docs/commands/acl/auth-method/info.mdx rename to website/content/commands/acl/auth-method/info.mdx diff --git a/website/content/docs/commands/acl/auth-method/list.mdx b/website/content/commands/acl/auth-method/list.mdx similarity index 100% rename from website/content/docs/commands/acl/auth-method/list.mdx rename to website/content/commands/acl/auth-method/list.mdx diff --git a/website/content/docs/commands/acl/auth-method/update.mdx b/website/content/commands/acl/auth-method/update.mdx similarity index 100% rename from website/content/docs/commands/acl/auth-method/update.mdx rename to website/content/commands/acl/auth-method/update.mdx diff --git a/website/content/docs/commands/acl/binding-rule/create.mdx b/website/content/commands/acl/binding-rule/create.mdx similarity index 100% rename from 
website/content/docs/commands/acl/binding-rule/create.mdx rename to website/content/commands/acl/binding-rule/create.mdx diff --git a/website/content/docs/commands/acl/binding-rule/delete.mdx b/website/content/commands/acl/binding-rule/delete.mdx similarity index 100% rename from website/content/docs/commands/acl/binding-rule/delete.mdx rename to website/content/commands/acl/binding-rule/delete.mdx diff --git a/website/content/docs/commands/acl/binding-rule/info.mdx b/website/content/commands/acl/binding-rule/info.mdx similarity index 100% rename from website/content/docs/commands/acl/binding-rule/info.mdx rename to website/content/commands/acl/binding-rule/info.mdx diff --git a/website/content/docs/commands/acl/binding-rule/list.mdx b/website/content/commands/acl/binding-rule/list.mdx similarity index 100% rename from website/content/docs/commands/acl/binding-rule/list.mdx rename to website/content/commands/acl/binding-rule/list.mdx diff --git a/website/content/docs/commands/acl/binding-rule/update.mdx b/website/content/commands/acl/binding-rule/update.mdx similarity index 100% rename from website/content/docs/commands/acl/binding-rule/update.mdx rename to website/content/commands/acl/binding-rule/update.mdx diff --git a/website/content/docs/commands/acl/bootstrap.mdx b/website/content/commands/acl/bootstrap.mdx similarity index 100% rename from website/content/docs/commands/acl/bootstrap.mdx rename to website/content/commands/acl/bootstrap.mdx diff --git a/website/content/docs/commands/acl/index.mdx b/website/content/commands/acl/index.mdx similarity index 64% rename from website/content/docs/commands/acl/index.mdx rename to website/content/commands/acl/index.mdx index 480eaf10e..b313baace 100644 --- a/website/content/docs/commands/acl/index.mdx +++ b/website/content/commands/acl/index.mdx @@ -50,33 +50,33 @@ subcommands are available: - [`acl token self`][tokenself] - Get info on self ACL token - [`acl token update`][tokenupdate] - Update existing ACL token -[bootstrap]: /nomad/docs/commands/acl/bootstrap -[authmethodcreate]: /nomad/docs/commands/acl/auth-method/create -[authmethoddelete]: /nomad/docs/commands/acl/auth-method/delete -[authmethodinfo]: /nomad/docs/commands/acl/auth-method/info -[authmethodlist]: /nomad/docs/commands/acl/auth-method/list -[authmethodupdate]: /nomad/docs/commands/acl/auth-method/update -[bindingrulecreate]: /nomad/docs/commands/acl/binding-rule/create -[bindingruledelete]: /nomad/docs/commands/acl/binding-rule/delete -[bindingruleinfo]: /nomad/docs/commands/acl/binding-rule/info -[bindingrulelist]: /nomad/docs/commands/acl/binding-rule/list -[bindingruleupdate]: /nomad/docs/commands/acl/binding-rule/update -[policyapply]: /nomad/docs/commands/acl/policy/apply -[policydelete]: /nomad/docs/commands/acl/policy/delete -[policyinfo]: /nomad/docs/commands/acl/policy/info -[policylist]: /nomad/docs/commands/acl/policy/list -[policyself]: /nomad/docs/commands/acl/policy/self -[tokencreate]: /nomad/docs/commands/acl/token/create -[tokenupdate]: /nomad/docs/commands/acl/token/update -[tokendelete]: /nomad/docs/commands/acl/token/delete -[tokeninfo]: /nomad/docs/commands/acl/token/info -[tokenlist]: /nomad/docs/commands/acl/token/list -[tokenself]: /nomad/docs/commands/acl/token/self -[rolecreate]: /nomad/docs/commands/acl/role/create -[roleupdate]: /nomad/docs/commands/acl/role/update -[roledelete]: /nomad/docs/commands/acl/role/delete -[roleinfo]: /nomad/docs/commands/acl/role/info -[rolelist]: /nomad/docs/commands/acl/role/list +[bootstrap]: 
/nomad/commands/acl/bootstrap +[authmethodcreate]: /nomad/commands/acl/auth-method/create +[authmethoddelete]: /nomad/commands/acl/auth-method/delete +[authmethodinfo]: /nomad/commands/acl/auth-method/info +[authmethodlist]: /nomad/commands/acl/auth-method/list +[authmethodupdate]: /nomad/commands/acl/auth-method/update +[bindingrulecreate]: /nomad/commands/acl/binding-rule/create +[bindingruledelete]: /nomad/commands/acl/binding-rule/delete +[bindingruleinfo]: /nomad/commands/acl/binding-rule/info +[bindingrulelist]: /nomad/commands/acl/binding-rule/list +[bindingruleupdate]: /nomad/commands/acl/binding-rule/update +[policyapply]: /nomad/commands/acl/policy/apply +[policydelete]: /nomad/commands/acl/policy/delete +[policyinfo]: /nomad/commands/acl/policy/info +[policylist]: /nomad/commands/acl/policy/list +[policyself]: /nomad/commands/acl/policy/self +[tokencreate]: /nomad/commands/acl/token/create +[tokenupdate]: /nomad/commands/acl/token/update +[tokendelete]: /nomad/commands/acl/token/delete +[tokeninfo]: /nomad/commands/acl/token/info +[tokenlist]: /nomad/commands/acl/token/list +[tokenself]: /nomad/commands/acl/token/self +[rolecreate]: /nomad/commands/acl/role/create +[roleupdate]: /nomad/commands/acl/role/update +[roledelete]: /nomad/commands/acl/role/delete +[roleinfo]: /nomad/commands/acl/role/info +[rolelist]: /nomad/commands/acl/role/list [secure-guide]: /nomad/tutorials/access-control -[federated]: /nomad/tutorials/manage-clusters/federation +[federated]: /nomad/docs/deploy/clusters/federate-regions [`authoritative_region`]: /nomad/docs/configuration/server#authoritative_region diff --git a/website/content/docs/commands/acl/policy/apply.mdx b/website/content/commands/acl/policy/apply.mdx similarity index 100% rename from website/content/docs/commands/acl/policy/apply.mdx rename to website/content/commands/acl/policy/apply.mdx diff --git a/website/content/docs/commands/acl/policy/delete.mdx b/website/content/commands/acl/policy/delete.mdx similarity index 100% rename from website/content/docs/commands/acl/policy/delete.mdx rename to website/content/commands/acl/policy/delete.mdx diff --git a/website/content/docs/commands/acl/policy/info.mdx b/website/content/commands/acl/policy/info.mdx similarity index 100% rename from website/content/docs/commands/acl/policy/info.mdx rename to website/content/commands/acl/policy/info.mdx diff --git a/website/content/docs/commands/acl/policy/list.mdx b/website/content/commands/acl/policy/list.mdx similarity index 100% rename from website/content/docs/commands/acl/policy/list.mdx rename to website/content/commands/acl/policy/list.mdx diff --git a/website/content/docs/commands/acl/policy/self.mdx b/website/content/commands/acl/policy/self.mdx similarity index 100% rename from website/content/docs/commands/acl/policy/self.mdx rename to website/content/commands/acl/policy/self.mdx diff --git a/website/content/docs/commands/acl/role/create.mdx b/website/content/commands/acl/role/create.mdx similarity index 100% rename from website/content/docs/commands/acl/role/create.mdx rename to website/content/commands/acl/role/create.mdx diff --git a/website/content/docs/commands/acl/role/delete.mdx b/website/content/commands/acl/role/delete.mdx similarity index 100% rename from website/content/docs/commands/acl/role/delete.mdx rename to website/content/commands/acl/role/delete.mdx diff --git a/website/content/docs/commands/acl/role/info.mdx b/website/content/commands/acl/role/info.mdx similarity index 100% rename from
website/content/docs/commands/acl/role/info.mdx rename to website/content/commands/acl/role/info.mdx diff --git a/website/content/docs/commands/acl/role/list.mdx b/website/content/commands/acl/role/list.mdx similarity index 100% rename from website/content/docs/commands/acl/role/list.mdx rename to website/content/commands/acl/role/list.mdx diff --git a/website/content/docs/commands/acl/role/update.mdx b/website/content/commands/acl/role/update.mdx similarity index 100% rename from website/content/docs/commands/acl/role/update.mdx rename to website/content/commands/acl/role/update.mdx diff --git a/website/content/docs/commands/acl/token/create.mdx b/website/content/commands/acl/token/create.mdx similarity index 100% rename from website/content/docs/commands/acl/token/create.mdx rename to website/content/commands/acl/token/create.mdx diff --git a/website/content/docs/commands/acl/token/delete.mdx b/website/content/commands/acl/token/delete.mdx similarity index 100% rename from website/content/docs/commands/acl/token/delete.mdx rename to website/content/commands/acl/token/delete.mdx diff --git a/website/content/docs/commands/acl/token/info.mdx b/website/content/commands/acl/token/info.mdx similarity index 100% rename from website/content/docs/commands/acl/token/info.mdx rename to website/content/commands/acl/token/info.mdx diff --git a/website/content/docs/commands/acl/token/list.mdx b/website/content/commands/acl/token/list.mdx similarity index 100% rename from website/content/docs/commands/acl/token/list.mdx rename to website/content/commands/acl/token/list.mdx diff --git a/website/content/docs/commands/acl/token/self.mdx b/website/content/commands/acl/token/self.mdx similarity index 100% rename from website/content/docs/commands/acl/token/self.mdx rename to website/content/commands/acl/token/self.mdx diff --git a/website/content/docs/commands/acl/token/update.mdx b/website/content/commands/acl/token/update.mdx similarity index 100% rename from website/content/docs/commands/acl/token/update.mdx rename to website/content/commands/acl/token/update.mdx diff --git a/website/content/docs/commands/agent-info.mdx b/website/content/commands/agent-info.mdx similarity index 100% rename from website/content/docs/commands/agent-info.mdx rename to website/content/commands/agent-info.mdx diff --git a/website/content/docs/commands/agent.mdx b/website/content/commands/agent.mdx similarity index 98% rename from website/content/docs/commands/agent.mdx rename to website/content/commands/agent.mdx index 9440d47cf..1db14e179 100644 --- a/website/content/docs/commands/agent.mdx +++ b/website/content/commands/agent.mdx @@ -230,7 +230,7 @@ You may, however, may pass the following configuration options as CLI arguments: [data_dir]: /nomad/docs/configuration#data_dir [datacenter]: /nomad/docs/configuration#datacenter [enabled]: /nomad/docs/configuration/acl#enabled -[encryption overview]: /nomad/tutorials/transport-security/security-gossip-encryption +[encryption overview]: /nomad/docs/secure/traffic/gossip-encryption [key_file]: /nomad/docs/configuration/consul#key_file [log_include_location]: /nomad/docs/configuration#log_include_location [log_json]: /nomad/docs/configuration#log_json @@ -240,7 +240,7 @@ You may, however, may pass the following configuration options as CLI arguments: [network_interface]: /nomad/docs/configuration/client#network_interface [node_class]: /nomad/docs/configuration/client#node_class [node_pool]: /nomad/docs/configuration/client#node_pool -[Operating Nomad agents]: 
/nomad/docs/operations/nomad-agent +[Operating Nomad agents]: /nomad/docs/deploy/nomad-agent [Nomad agent configuration]: /nomad/docs/configuration [plugin_dir]: /nomad/docs/configuration#plugin_dir [region]: /nomad/docs/configuration#region diff --git a/website/content/docs/commands/alloc/checks.mdx b/website/content/commands/alloc/checks.mdx similarity index 100% rename from website/content/docs/commands/alloc/checks.mdx rename to website/content/commands/alloc/checks.mdx diff --git a/website/content/docs/commands/alloc/exec.mdx b/website/content/commands/alloc/exec.mdx similarity index 100% rename from website/content/docs/commands/alloc/exec.mdx rename to website/content/commands/alloc/exec.mdx diff --git a/website/content/docs/commands/alloc/fs.mdx b/website/content/commands/alloc/fs.mdx similarity index 96% rename from website/content/docs/commands/alloc/fs.mdx rename to website/content/commands/alloc/fs.mdx index db2bfd992..689170f24 100644 --- a/website/content/docs/commands/alloc/fs.mdx +++ b/website/content/commands/alloc/fs.mdx @@ -114,4 +114,4 @@ bam @include 'general_options.mdx' -[allocation working directory]: /nomad/docs/runtime/environment#task-directories 'Task Directories' +[allocation working directory]: /nomad/docs/reference/runtime-environment-settings#task-directories 'Task Directories' diff --git a/website/content/docs/commands/alloc/index.mdx b/website/content/commands/alloc/index.mdx similarity index 63% rename from website/content/docs/commands/alloc/index.mdx rename to website/content/commands/alloc/index.mdx index a95746aad..d3fa653c8 100644 --- a/website/content/docs/commands/alloc/index.mdx +++ b/website/content/commands/alloc/index.mdx @@ -25,11 +25,11 @@ subcommands are available: - [`alloc status`][status] - Display allocation status information and metadata - [`alloc stop`][stop] - Stop and reschedule a running allocation -[checks]: /nomad/docs/commands/alloc/checks 'Outputs service health check status information' -[exec]: /nomad/docs/commands/alloc/exec 'Run a command in a running allocation' -[fs]: /nomad/docs/commands/alloc/fs 'Inspect the contents of an allocation directory' -[logs]: /nomad/docs/commands/alloc/logs 'Streams the logs of a task' -[restart]: /nomad/docs/commands/alloc/restart 'Restart a running allocation or task' -[signal]: /nomad/docs/commands/alloc/signal 'Signal a running allocation' -[status]: /nomad/docs/commands/alloc/status 'Display allocation status information and metadata' -[stop]: /nomad/docs/commands/alloc/stop 'Stop and reschedule a running allocation' +[checks]: /nomad/commands/alloc/checks 'Outputs service health check status information' +[exec]: /nomad/commands/alloc/exec 'Run a command in a running allocation' +[fs]: /nomad/commands/alloc/fs 'Inspect the contents of an allocation directory' +[logs]: /nomad/commands/alloc/logs 'Streams the logs of a task' +[restart]: /nomad/commands/alloc/restart 'Restart a running allocation or task' +[signal]: /nomad/commands/alloc/signal 'Signal a running allocation' +[status]: /nomad/commands/alloc/status 'Display allocation status information and metadata' +[stop]: /nomad/commands/alloc/stop 'Stop and reschedule a running allocation' diff --git a/website/content/docs/commands/alloc/logs.mdx b/website/content/commands/alloc/logs.mdx similarity index 100% rename from website/content/docs/commands/alloc/logs.mdx rename to website/content/commands/alloc/logs.mdx diff --git a/website/content/docs/commands/alloc/pause.mdx b/website/content/commands/alloc/pause.mdx similarity index 100% 
rename from website/content/docs/commands/alloc/pause.mdx rename to website/content/commands/alloc/pause.mdx diff --git a/website/content/docs/commands/alloc/restart.mdx b/website/content/commands/alloc/restart.mdx similarity index 100% rename from website/content/docs/commands/alloc/restart.mdx rename to website/content/commands/alloc/restart.mdx diff --git a/website/content/docs/commands/alloc/signal.mdx b/website/content/commands/alloc/signal.mdx similarity index 100% rename from website/content/docs/commands/alloc/signal.mdx rename to website/content/commands/alloc/signal.mdx diff --git a/website/content/docs/commands/alloc/status.mdx b/website/content/commands/alloc/status.mdx similarity index 100% rename from website/content/docs/commands/alloc/status.mdx rename to website/content/commands/alloc/status.mdx diff --git a/website/content/docs/commands/alloc/stop.mdx b/website/content/commands/alloc/stop.mdx similarity index 98% rename from website/content/docs/commands/alloc/stop.mdx rename to website/content/commands/alloc/stop.mdx index 4f3f73062..1d41a1afc 100644 --- a/website/content/docs/commands/alloc/stop.mdx +++ b/website/content/commands/alloc/stop.mdx @@ -65,6 +65,6 @@ $ nomad alloc stop -detach eb17e557 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status [`shutdown_delay`]: /nomad/docs/job-specification/group#shutdown_delay [system allocs will not]: /nomad/docs/job-specification/reschedule diff --git a/website/content/docs/commands/config/index.mdx b/website/content/commands/config/index.mdx similarity index 86% rename from website/content/docs/commands/config/index.mdx rename to website/content/commands/config/index.mdx index f71d297b7..bbf2890d9 100644 --- a/website/content/docs/commands/config/index.mdx +++ b/website/content/commands/config/index.mdx @@ -18,4 +18,4 @@ following subcommands are available: - [`config validate`][validate] - Validate configuration files -[validate]: /nomad/docs/commands/config/validate 'Validate configuration files' +[validate]: /nomad/commands/config/validate 'Validate configuration files' diff --git a/website/content/docs/commands/config/validate.mdx b/website/content/commands/config/validate.mdx similarity index 100% rename from website/content/docs/commands/config/validate.mdx rename to website/content/commands/config/validate.mdx diff --git a/website/content/docs/commands/deployment/fail.mdx b/website/content/commands/deployment/fail.mdx similarity index 97% rename from website/content/docs/commands/deployment/fail.mdx rename to website/content/commands/deployment/fail.mdx index 11bd88464..7b07211ab 100644 --- a/website/content/docs/commands/deployment/fail.mdx +++ b/website/content/commands/deployment/fail.mdx @@ -62,4 +62,4 @@ cache 3 2 1 0 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status diff --git a/website/content/docs/commands/deployment/index.mdx b/website/content/commands/deployment/index.mdx similarity index 65% rename from website/content/docs/commands/deployment/index.mdx rename to website/content/commands/deployment/index.mdx index 8b935e06c..b9b922f03 100644 --- a/website/content/docs/commands/deployment/index.mdx +++ b/website/content/commands/deployment/index.mdx @@ -23,9 +23,9 @@ subcommands are available: - [`deployment resume`][resume] - Resume a paused deployment - [`deployment status`][status] - Display the status of a deployment -[fail]: /nomad/docs/commands/deployment/fail 
'Manually fail a deployment' -[list]: /nomad/docs/commands/deployment/list 'List all deployments' -[pause]: /nomad/docs/commands/deployment/pause 'Pause a deployment' -[promote]: /nomad/docs/commands/deployment/promote 'Promote canaries in a deployment' -[resume]: /nomad/docs/commands/deployment/resume 'Resume a paused deployment' -[status]: /nomad/docs/commands/deployment/status 'Display the status of a deployment' +[fail]: /nomad/commands/deployment/fail 'Manually fail a deployment' +[list]: /nomad/commands/deployment/list 'List all deployments' +[pause]: /nomad/commands/deployment/pause 'Pause a deployment' +[promote]: /nomad/commands/deployment/promote 'Promote canaries in a deployment' +[resume]: /nomad/commands/deployment/resume 'Resume a paused deployment' +[status]: /nomad/commands/deployment/status 'Display the status of a deployment' diff --git a/website/content/docs/commands/deployment/list.mdx b/website/content/commands/deployment/list.mdx similarity index 100% rename from website/content/docs/commands/deployment/list.mdx rename to website/content/commands/deployment/list.mdx diff --git a/website/content/docs/commands/deployment/pause.mdx b/website/content/commands/deployment/pause.mdx similarity index 100% rename from website/content/docs/commands/deployment/pause.mdx rename to website/content/commands/deployment/pause.mdx diff --git a/website/content/docs/commands/deployment/promote.mdx b/website/content/commands/deployment/promote.mdx similarity index 98% rename from website/content/docs/commands/deployment/promote.mdx rename to website/content/commands/deployment/promote.mdx index 517365a93..538a8ef49 100644 --- a/website/content/docs/commands/deployment/promote.mdx +++ b/website/content/commands/deployment/promote.mdx @@ -222,5 +222,5 @@ ee8f972e 6240eed6 web 0 run running 07/25/17 18:37:08 UT @include 'general_options.mdx' -[`job revert`]: /nomad/docs/commands/job/revert -[eval status]: /nomad/docs/commands/eval/status +[`job revert`]: /nomad/commands/job/revert +[eval status]: /nomad/commands/eval/status diff --git a/website/content/docs/commands/deployment/resume.mdx b/website/content/commands/deployment/resume.mdx similarity index 96% rename from website/content/docs/commands/deployment/resume.mdx rename to website/content/commands/deployment/resume.mdx index f09043c12..f329148a0 100644 --- a/website/content/docs/commands/deployment/resume.mdx +++ b/website/content/commands/deployment/resume.mdx @@ -53,4 +53,4 @@ Deployment "c848972e-dcd3-7354-e0d2-39d86642cdb1" resumed @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status diff --git a/website/content/docs/commands/deployment/status.mdx b/website/content/commands/deployment/status.mdx similarity index 100% rename from website/content/docs/commands/deployment/status.mdx rename to website/content/commands/deployment/status.mdx diff --git a/website/content/docs/commands/deployment/unblock.mdx b/website/content/commands/deployment/unblock.mdx similarity index 95% rename from website/content/docs/commands/deployment/unblock.mdx rename to website/content/commands/deployment/unblock.mdx index 5ac8c35cd..fab3def1a 100644 --- a/website/content/docs/commands/deployment/unblock.mdx +++ b/website/content/commands/deployment/unblock.mdx @@ -71,5 +71,5 @@ cache 3 2 1 0 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status -[federated regions]: /nomad/tutorials/manage-clusters/federation +[eval status]: /nomad/commands/eval/status 
+[federated regions]: /nomad/docs/deploy/clusters/federate-regions diff --git a/website/content/docs/commands/eval/delete.mdx b/website/content/commands/eval/delete.mdx similarity index 93% rename from website/content/docs/commands/eval/delete.mdx rename to website/content/commands/eval/delete.mdx index 4155d2eaa..a504fc284 100644 --- a/website/content/docs/commands/eval/delete.mdx +++ b/website/content/commands/eval/delete.mdx @@ -69,5 +69,5 @@ Successfully deleted 23 evaluations @include 'general_options.mdx' -[scheduler_get_config]: /nomad/docs/commands/operator/scheduler/get-config -[scheduler_set_config]: /nomad/docs/commands/operator/scheduler/set-config +[scheduler_get_config]: /nomad/commands/operator/scheduler/get-config +[scheduler_set_config]: /nomad/commands/operator/scheduler/set-config diff --git a/website/content/docs/commands/eval/index.mdx b/website/content/commands/eval/index.mdx similarity index 75% rename from website/content/docs/commands/eval/index.mdx rename to website/content/commands/eval/index.mdx index eea008188..4e9bf66c0 100644 --- a/website/content/docs/commands/eval/index.mdx +++ b/website/content/commands/eval/index.mdx @@ -19,6 +19,6 @@ subcommands are available: - [`eval list`][list] - List all evals - [`eval status`][status] - Display the status of a eval -[delete]: /nomad/docs/commands/eval/delete 'Delete evals' -[list]: /nomad/docs/commands/eval/list 'List all evals' -[status]: /nomad/docs/commands/eval/status 'Display the status of a eval' +[delete]: /nomad/commands/eval/delete 'Delete evals' +[list]: /nomad/commands/eval/list 'List all evals' +[status]: /nomad/commands/eval/status 'Display the status of a eval' diff --git a/website/content/docs/commands/eval/list.mdx b/website/content/commands/eval/list.mdx similarity index 100% rename from website/content/docs/commands/eval/list.mdx rename to website/content/commands/eval/list.mdx diff --git a/website/content/docs/commands/eval/status.mdx b/website/content/commands/eval/status.mdx similarity index 100% rename from website/content/docs/commands/eval/status.mdx rename to website/content/commands/eval/status.mdx diff --git a/website/content/docs/commands/fmt.mdx b/website/content/commands/fmt.mdx similarity index 100% rename from website/content/docs/commands/fmt.mdx rename to website/content/commands/fmt.mdx diff --git a/website/content/docs/commands/index.mdx b/website/content/commands/index.mdx similarity index 100% rename from website/content/docs/commands/index.mdx rename to website/content/commands/index.mdx diff --git a/website/content/docs/commands/job/action.mdx b/website/content/commands/job/action.mdx similarity index 100% rename from website/content/docs/commands/job/action.mdx rename to website/content/commands/job/action.mdx diff --git a/website/content/docs/commands/job/allocs.mdx b/website/content/commands/job/allocs.mdx similarity index 97% rename from website/content/docs/commands/job/allocs.mdx rename to website/content/commands/job/allocs.mdx index fd9aa6db2..8dcdf2ee2 100644 --- a/website/content/docs/commands/job/allocs.mdx +++ b/website/content/commands/job/allocs.mdx @@ -68,7 +68,7 @@ c2b4606d-1b02-0d8d-5fdd-031167cd4c91 ``` Refer to the [Format Nomad Command Output With -Templates](/nomad/tutorials/templates/format-output-with-templates) tutorial for +Templates](/nomad/docs/manage/format-cli-output) tutorial for more examples of using Go templates to format Nomad CLI output.
## General options diff --git a/website/content/docs/commands/job/deployments.mdx b/website/content/commands/job/deployments.mdx similarity index 100% rename from website/content/docs/commands/job/deployments.mdx rename to website/content/commands/job/deployments.mdx diff --git a/website/content/docs/commands/job/dispatch.mdx b/website/content/commands/job/dispatch.mdx similarity index 98% rename from website/content/docs/commands/job/dispatch.mdx rename to website/content/commands/job/dispatch.mdx index ca7588ae2..58ec328aa 100644 --- a/website/content/docs/commands/job/dispatch.mdx +++ b/website/content/commands/job/dispatch.mdx @@ -167,8 +167,8 @@ Evaluation ID = 31199841 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status [parameterized job]: /nomad/docs/job-specification/parameterized 'Nomad parameterized Job Specification' [multiregion]: /nomad/docs/job-specification/multiregion#parameterized-dispatch [`job_max_priority`]: /nomad/docs/configuration/server#job_max_priority -[job parameters]: /nomad/docs/job-specification/job#job-parameters +[job parameters]: /nomad/docs/job-specification/job#parameters diff --git a/website/content/docs/commands/job/eval.mdx b/website/content/commands/job/eval.mdx similarity index 98% rename from website/content/docs/commands/job/eval.mdx rename to website/content/commands/job/eval.mdx index 0b20f23e8..87d414b12 100644 --- a/website/content/docs/commands/job/eval.mdx +++ b/website/content/commands/job/eval.mdx @@ -75,4 +75,4 @@ $ nomad job eval -force-reschedule job1 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status diff --git a/website/content/docs/commands/job/history.mdx b/website/content/commands/job/history.mdx similarity index 100% rename from website/content/docs/commands/job/history.mdx rename to website/content/commands/job/history.mdx diff --git a/website/content/docs/commands/job/index.mdx b/website/content/commands/job/index.mdx similarity index 54% rename from website/content/docs/commands/job/index.mdx rename to website/content/commands/job/index.mdx index bed9c1c4c..9ad611f45 100644 --- a/website/content/docs/commands/job/index.mdx +++ b/website/content/commands/job/index.mdx @@ -38,23 +38,23 @@ subcommands are available: - [`job validate`][validate] - Check a job specification for syntax errors -[action]: /nomad/docs/commands/job/action 'Execute predefined actions' -[allocs]: /nomad/docs/commands/job/allocs 'List allocations for a job' -[deployments]: /nomad/docs/commands/job/deployments 'List deployments for a job' -[dispatch]: /nomad/docs/commands/job/dispatch 'Dispatch an instance of a parameterized job' -[eval]: /nomad/docs/commands/job/eval 'Force an evaluation for a job' -[history]: /nomad/docs/commands/job/history 'Display all tracked versions of a job' -[init]: /nomad/docs/commands/job/init 'Create an example job specification' -[inspect]: /nomad/docs/commands/job/inspect 'Inspect the contents of a submitted job' -[periodic force]: /nomad/docs/commands/job/periodic-force 'Force the evaluation of a periodic job' -[plan]: /nomad/docs/commands/job/plan 'Schedule a dry run for a job' -[restart]: /nomad/docs/commands/job/restart 'Restart or reschedule allocations for a job' -[revert]: /nomad/docs/commands/job/revert 'Revert to a prior version of the job' -[run]: /nomad/docs/commands/job/run 'Submit a new job' -[status]: /nomad/docs/commands/job/status 'Display status information about a 
job' -[scale]: /nomad/docs/commands/job/scale 'Update the number of allocations for a task group in a job' -[scaling-events]: /nomad/docs/commands/job/scaling-events 'List the recent scaling events for a job' -[stop]: /nomad/docs/commands/job/stop 'Stop a running job and cancel its allocations' -[tag]: /nomad/docs/commands/job/tag 'Tag a job with a version' -[validate]: /nomad/docs/commands/job/validate 'Check a job specification for syntax errors' -[promote]: /nomad/docs/commands/job/promote +[action]: /nomad/commands/job/action 'Execute predefined actions' +[allocs]: /nomad/commands/job/allocs 'List allocations for a job' +[deployments]: /nomad/commands/job/deployments 'List deployments for a job' +[dispatch]: /nomad/commands/job/dispatch 'Dispatch an instance of a parameterized job' +[eval]: /nomad/commands/job/eval 'Force an evaluation for a job' +[history]: /nomad/commands/job/history 'Display all tracked versions of a job' +[init]: /nomad/commands/job/init 'Create an example job specification' +[inspect]: /nomad/commands/job/inspect 'Inspect the contents of a submitted job' +[periodic force]: /nomad/commands/job/periodic-force 'Force the evaluation of a periodic job' +[plan]: /nomad/commands/job/plan 'Schedule a dry run for a job' +[restart]: /nomad/commands/job/restart 'Restart or reschedule allocations for a job' +[revert]: /nomad/commands/job/revert 'Revert to a prior version of the job' +[run]: /nomad/commands/job/run 'Submit a new job' +[status]: /nomad/commands/job/status 'Display status information about a job' +[scale]: /nomad/commands/job/scale 'Update the number of allocations for a task group in a job' +[scaling-events]: /nomad/commands/job/scaling-events 'List the recent scaling events for a job' +[stop]: /nomad/commands/job/stop 'Stop a running job and cancel its allocations' +[tag]: /nomad/commands/job/tag 'Tag a job with a version' +[validate]: /nomad/commands/job/validate 'Check a job specification for syntax errors' +[promote]: /nomad/commands/job/promote diff --git a/website/content/docs/commands/job/init.mdx b/website/content/commands/job/init.mdx similarity index 94% rename from website/content/docs/commands/job/init.mdx rename to website/content/commands/job/init.mdx index 205e05ac2..ae0902b2e 100644 --- a/website/content/docs/commands/job/init.mdx +++ b/website/content/commands/job/init.mdx @@ -42,4 +42,4 @@ Example job file written to example.nomad.hcl ``` [jobspec]: /nomad/docs/job-specification 'Nomad Job Specification' -[drivers]: /nomad/docs/drivers 'Nomad Task Drivers documentation' +[drivers]: /nomad/docs/job-declare/task-driver 'Nomad Task Drivers documentation' diff --git a/website/content/docs/commands/job/inspect.mdx b/website/content/commands/job/inspect.mdx similarity index 100% rename from website/content/docs/commands/job/inspect.mdx rename to website/content/commands/job/inspect.mdx diff --git a/website/content/docs/commands/job/periodic-force.mdx b/website/content/commands/job/periodic-force.mdx similarity index 98% rename from website/content/docs/commands/job/periodic-force.mdx rename to website/content/commands/job/periodic-force.mdx index 0cb107880..a32c962b0 100644 --- a/website/content/docs/commands/job/periodic-force.mdx +++ b/website/content/commands/job/periodic-force.mdx @@ -66,6 +66,6 @@ Evaluation ID: 0865fbf3-30de-5f53-0811-821e73e63178 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status [force the evaluation]: /nomad/api-docs/jobs#force-new-periodic-instance 
[periodic job]: /nomad/docs/job-specification/periodic diff --git a/website/content/docs/commands/job/plan.mdx b/website/content/commands/job/plan.mdx similarity index 99% rename from website/content/docs/commands/job/plan.mdx rename to website/content/commands/job/plan.mdx index 1814b4181..b57ffb2de 100644 --- a/website/content/docs/commands/job/plan.mdx +++ b/website/content/commands/job/plan.mdx @@ -239,5 +239,5 @@ if a change is detected. [job specification]: /nomad/docs/job-specification [hcl job specification]: /nomad/docs/job-specification [`go-getter`]: https://github.com/hashicorp/go-getter -[`nomad job run -check-index`]: /nomad/docs/commands/job/run#check-index +[`nomad job run -check-index`]: /nomad/commands/job/run#check-index [`tee`]: https://man7.org/linux/man-pages/man1/tee.1.html diff --git a/website/content/docs/commands/job/promote.mdx b/website/content/commands/job/promote.mdx similarity index 98% rename from website/content/docs/commands/job/promote.mdx rename to website/content/commands/job/promote.mdx index 8c8d87c64..de8e2b536 100644 --- a/website/content/docs/commands/job/promote.mdx +++ b/website/content/commands/job/promote.mdx @@ -224,5 +224,5 @@ ee8f972e 6240eed6 web 0 run running 07/25/17 18:37:08 UT @include 'general_options.mdx' -[job revert]: /nomad/docs/commands/job/revert -[eval status]: /nomad/docs/commands/eval/status +[job revert]: /nomad/commands/job/revert +[eval status]: /nomad/commands/eval/status diff --git a/website/content/docs/commands/job/restart.mdx b/website/content/commands/job/restart.mdx similarity index 100% rename from website/content/docs/commands/job/restart.mdx rename to website/content/commands/job/restart.mdx diff --git a/website/content/docs/commands/job/revert.mdx b/website/content/commands/job/revert.mdx similarity index 95% rename from website/content/docs/commands/job/revert.mdx rename to website/content/commands/job/revert.mdx index 9f23ae48b..94235dbe5 100644 --- a/website/content/docs/commands/job/revert.mdx +++ b/website/content/commands/job/revert.mdx @@ -101,6 +101,6 @@ Submit Date = 07/25/17 21:27:18 UTC @include 'general_options.mdx' -[`job history`]: /nomad/docs/commands/job/history -[eval status]: /nomad/docs/commands/eval/status -[run]: /nomad/docs/commands/job/run +[`job history`]: /nomad/commands/job/history +[eval status]: /nomad/commands/eval/status +[run]: /nomad/commands/job/run diff --git a/website/content/docs/commands/job/run.mdx b/website/content/commands/job/run.mdx similarity index 97% rename from website/content/docs/commands/job/run.mdx rename to website/content/commands/job/run.mdx index e53c16dff..d9654cb3b 100644 --- a/website/content/docs/commands/job/run.mdx +++ b/website/content/commands/job/run.mdx @@ -209,10 +209,10 @@ $ nomad job run example.nomad.hcl @include 'general_options.mdx' -[`batch`]: /nomad/docs/schedulers#batch -[eval status]: /nomad/docs/commands/eval/status +[`batch`]: /nomad/docs/concepts/scheduling/schedulers#batch +[eval status]: /nomad/commands/eval/status [`go-getter`]: https://github.com/hashicorp/go-getter -[`job plan` command]: /nomad/docs/commands/job/plan +[`job plan` command]: /nomad/commands/job/plan [job specification]: /nomad/docs/job-specification [JSON jobs]: /nomad/api-docs/json-jobs -[`system`]: /nomad/docs/schedulers#system +[`system`]: /nomad/docs/concepts/scheduling/schedulers#system diff --git a/website/content/docs/commands/job/scale.mdx b/website/content/commands/job/scale.mdx similarity index 98% rename from website/content/docs/commands/job/scale.mdx 
rename to website/content/commands/job/scale.mdx index 5a5a2dd1d..c346cc29a 100644 --- a/website/content/docs/commands/job/scale.mdx +++ b/website/content/commands/job/scale.mdx @@ -101,4 +101,4 @@ $ nomad job scale job1 group1 8 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status diff --git a/website/content/docs/commands/job/scaling-events.mdx b/website/content/commands/job/scaling-events.mdx similarity index 100% rename from website/content/docs/commands/job/scaling-events.mdx rename to website/content/commands/job/scaling-events.mdx diff --git a/website/content/docs/commands/job/start.mdx b/website/content/commands/job/start.mdx similarity index 95% rename from website/content/docs/commands/job/start.mdx rename to website/content/commands/job/start.mdx index b1a64e833..b3d9e307f 100644 --- a/website/content/docs/commands/job/start.mdx +++ b/website/content/commands/job/start.mdx @@ -56,6 +56,6 @@ $ nomad job start example @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status -[run]: /nomad/docs/commands/job/run +[eval status]: /nomad/commands/eval/status +[run]: /nomad/commands/job/run [Job statuses]: /nomad/docs/concepts/job#job-statuses diff --git a/website/content/docs/commands/job/status.mdx b/website/content/commands/job/status.mdx similarity index 100% rename from website/content/docs/commands/job/status.mdx rename to website/content/commands/job/status.mdx diff --git a/website/content/docs/commands/job/stop.mdx b/website/content/commands/job/stop.mdx similarity index 98% rename from website/content/docs/commands/job/stop.mdx rename to website/content/commands/job/stop.mdx index ac20a02ac..a3bd91234 100644 --- a/website/content/docs/commands/job/stop.mdx +++ b/website/content/commands/job/stop.mdx @@ -125,6 +125,6 @@ $ nomad job stop -detach job1 @include 'general_options.mdx' -[eval status]: /nomad/docs/commands/eval/status +[eval status]: /nomad/commands/eval/status [multi-region]: /nomad/docs/job-specification/multiregion [`shutdown_delay`]: /nomad/docs/job-specification/group#shutdown_delay diff --git a/website/content/docs/commands/job/tag/apply.mdx b/website/content/commands/job/tag/apply.mdx similarity index 100% rename from website/content/docs/commands/job/tag/apply.mdx rename to website/content/commands/job/tag/apply.mdx diff --git a/website/content/docs/commands/job/tag/index.mdx b/website/content/commands/job/tag/index.mdx similarity index 74% rename from website/content/docs/commands/job/tag/index.mdx rename to website/content/commands/job/tag/index.mdx index f29b1896b..b6f0f44a9 100644 --- a/website/content/docs/commands/job/tag/index.mdx +++ b/website/content/commands/job/tag/index.mdx @@ -17,5 +17,5 @@ Usage: `nomad job tag [options] [args]` `job tag` has the following subcommands: -- [`job tag apply`](/nomad/docs/commands/job/tag/apply): Save a job version tag. -- [`job tag unset`](/nomad/docs/commands/job/tag/apply): Remove a tag from a job version. +- [`job tag apply`](/nomad/commands/job/tag/apply): Save a job version tag. +- [`job tag unset`](/nomad/commands/job/tag/unset): Remove a tag from a job version.
diff --git a/website/content/docs/commands/job/tag/unset.mdx b/website/content/commands/job/tag/unset.mdx similarity index 100% rename from website/content/docs/commands/job/tag/unset.mdx rename to website/content/commands/job/tag/unset.mdx diff --git a/website/content/docs/commands/job/validate.mdx b/website/content/commands/job/validate.mdx similarity index 100% rename from website/content/docs/commands/job/validate.mdx rename to website/content/commands/job/validate.mdx diff --git a/website/content/docs/commands/license/get.mdx b/website/content/commands/license/get.mdx similarity index 100% rename from website/content/docs/commands/license/get.mdx rename to website/content/commands/license/get.mdx diff --git a/website/content/docs/commands/license/index.mdx b/website/content/commands/license/index.mdx similarity index 79% rename from website/content/docs/commands/license/index.mdx rename to website/content/commands/license/index.mdx index 8fa6c73b2..da7416928 100644 --- a/website/content/docs/commands/license/index.mdx +++ b/website/content/commands/license/index.mdx @@ -22,5 +22,5 @@ subcommands are available: - [`license get`][get] - Get the current license from a server - [`license inspect`][inspect] - Inspect and validate a license -[get]: /nomad/docs/commands/license/get 'Get the current license from a server' -[inspect]: /nomad/docs/commands/license/inspect 'Inspect and validate a license' +[get]: /nomad/commands/license/get 'Get the current license from a server' +[inspect]: /nomad/commands/license/inspect 'Inspect and validate a license' diff --git a/website/content/docs/commands/license/inspect.mdx b/website/content/commands/license/inspect.mdx similarity index 100% rename from website/content/docs/commands/license/inspect.mdx rename to website/content/commands/license/inspect.mdx diff --git a/website/content/docs/commands/login.mdx b/website/content/commands/login.mdx similarity index 100% rename from website/content/docs/commands/login.mdx rename to website/content/commands/login.mdx diff --git a/website/content/docs/commands/monitor.mdx b/website/content/commands/monitor.mdx similarity index 100% rename from website/content/docs/commands/monitor.mdx rename to website/content/commands/monitor.mdx diff --git a/website/content/docs/commands/namespace/apply.mdx b/website/content/commands/namespace/apply.mdx similarity index 95% rename from website/content/docs/commands/namespace/apply.mdx rename to website/content/commands/namespace/apply.mdx index e451b02c1..905b77199 100644 --- a/website/content/docs/commands/namespace/apply.mdx +++ b/website/content/commands/namespace/apply.mdx @@ -10,7 +10,7 @@ description: | The `namespace apply` command is used create or update a namespace. - Visit the + Visit the Nomad Namespaces tutorial for more information. diff --git a/website/content/docs/commands/namespace/delete.mdx b/website/content/commands/namespace/delete.mdx similarity index 100% rename from website/content/docs/commands/namespace/delete.mdx rename to website/content/commands/namespace/delete.mdx diff --git a/website/content/docs/commands/namespace/index.mdx b/website/content/commands/namespace/index.mdx similarity index 67% rename from website/content/docs/commands/namespace/index.mdx rename to website/content/commands/namespace/index.mdx index 18ea7f84f..8773566de 100644 --- a/website/content/docs/commands/namespace/index.mdx +++ b/website/content/commands/namespace/index.mdx @@ -9,7 +9,7 @@ description: | The `namespace` command is used to interact with namespaces. 
-Visit [Create and use namespaces](/nomad/tutorials/manage-clusters/namespaces) for more information. +Visit [Create and use namespaces](/nomad/docs/govern/namespaces) for more information. ## Usage @@ -28,10 +28,10 @@ In [federated][] clusters, all namespace updates are forwarded to the [`authoritative_region`][] and replicated to non-authoritative regions. This requires that ACLs have been bootstrapped in the authoritative region. -[apply]: /nomad/docs/commands/namespace/apply 'Create or update a namespace' -[delete]: /nomad/docs/commands/namespace/delete 'Delete a namespace' -[inspect]: /nomad/docs/commands/namespace/inspect 'Inspect a namespace' -[list]: /nomad/docs/commands/namespace/list 'List available namespaces' -[status]: /nomad/docs/commands/namespace/status "Display a namespace's status" -[federated]: /nomad/tutorials/manage-clusters/federation +[apply]: /nomad/commands/namespace/apply 'Create or update a namespace' +[delete]: /nomad/commands/namespace/delete 'Delete a namespace' +[inspect]: /nomad/commands/namespace/inspect 'Inspect a namespace' +[list]: /nomad/commands/namespace/list 'List available namespaces' +[status]: /nomad/commands/namespace/status "Display a namespace's status" +[federated]: /nomad/docs/deploy/clusters/federate-regions [`authoritative_region`]: /nomad/docs/configuration/server#authoritative_region diff --git a/website/content/docs/commands/namespace/inspect.mdx b/website/content/commands/namespace/inspect.mdx similarity index 100% rename from website/content/docs/commands/namespace/inspect.mdx rename to website/content/commands/namespace/inspect.mdx diff --git a/website/content/docs/commands/namespace/list.mdx b/website/content/commands/namespace/list.mdx similarity index 100% rename from website/content/docs/commands/namespace/list.mdx rename to website/content/commands/namespace/list.mdx diff --git a/website/content/docs/commands/namespace/status.mdx b/website/content/commands/namespace/status.mdx similarity index 100% rename from website/content/docs/commands/namespace/status.mdx rename to website/content/commands/namespace/status.mdx diff --git a/website/content/docs/commands/node-pool/apply.mdx b/website/content/commands/node-pool/apply.mdx similarity index 100% rename from website/content/docs/commands/node-pool/apply.mdx rename to website/content/commands/node-pool/apply.mdx diff --git a/website/content/docs/commands/node-pool/delete.mdx b/website/content/commands/node-pool/delete.mdx similarity index 100% rename from website/content/docs/commands/node-pool/delete.mdx rename to website/content/commands/node-pool/delete.mdx diff --git a/website/content/docs/commands/node-pool/index.mdx b/website/content/commands/node-pool/index.mdx similarity index 77% rename from website/content/docs/commands/node-pool/index.mdx rename to website/content/commands/node-pool/index.mdx index 8a75dc989..385069b20 100644 --- a/website/content/docs/commands/node-pool/index.mdx +++ b/website/content/commands/node-pool/index.mdx @@ -30,10 +30,10 @@ following subcommands are available: - [`node pool nodes`][nodes] - Retrieve a list of nodes in a node pool.
-[apply]: /nomad/docs/commands/node-pool/apply -[delete]: /nomad/docs/commands/node-pool/delete -[info]: /nomad/docs/commands/node-pool/info -[init]: /nomad/docs/commands/node-pool/init -[jobs]: /nomad/docs/commands/node-pool/jobs -[list]: /nomad/docs/commands/node-pool/list -[nodes]: /nomad/docs/commands/node-pool/nodes +[apply]: /nomad/commands/node-pool/apply +[delete]: /nomad/commands/node-pool/delete +[info]: /nomad/commands/node-pool/info +[init]: /nomad/commands/node-pool/init +[jobs]: /nomad/commands/node-pool/jobs +[list]: /nomad/commands/node-pool/list +[nodes]: /nomad/commands/node-pool/nodes diff --git a/website/content/docs/commands/node-pool/info.mdx b/website/content/commands/node-pool/info.mdx similarity index 100% rename from website/content/docs/commands/node-pool/info.mdx rename to website/content/commands/node-pool/info.mdx diff --git a/website/content/docs/commands/node-pool/init.mdx b/website/content/commands/node-pool/init.mdx similarity index 100% rename from website/content/docs/commands/node-pool/init.mdx rename to website/content/commands/node-pool/init.mdx diff --git a/website/content/docs/commands/node-pool/jobs.mdx b/website/content/commands/node-pool/jobs.mdx similarity index 100% rename from website/content/docs/commands/node-pool/jobs.mdx rename to website/content/commands/node-pool/jobs.mdx diff --git a/website/content/docs/commands/node-pool/list.mdx b/website/content/commands/node-pool/list.mdx similarity index 100% rename from website/content/docs/commands/node-pool/list.mdx rename to website/content/commands/node-pool/list.mdx diff --git a/website/content/docs/commands/node-pool/nodes.mdx b/website/content/commands/node-pool/nodes.mdx similarity index 100% rename from website/content/docs/commands/node-pool/nodes.mdx rename to website/content/commands/node-pool/nodes.mdx diff --git a/website/content/docs/commands/node/config.mdx b/website/content/commands/node/config.mdx similarity index 100% rename from website/content/docs/commands/node/config.mdx rename to website/content/commands/node/config.mdx diff --git a/website/content/docs/commands/node/drain.mdx b/website/content/commands/node/drain.mdx similarity index 96% rename from website/content/docs/commands/node/drain.mdx rename to website/content/commands/node/drain.mdx index 37feefe93..41cd16565 100644 --- a/website/content/docs/commands/node/drain.mdx +++ b/website/content/commands/node/drain.mdx @@ -157,9 +157,9 @@ $ nomad node drain -self -monitor @include 'general_options_no_namespace.mdx' -[eligibility]: /nomad/docs/commands/node/eligibility +[eligibility]: /nomad/commands/node/eligibility [`migrate`]: /nomad/docs/job-specification/migrate [`reschedule`]: /nomad/docs/job-specification/reschedule -[node status]: /nomad/docs/commands/node/status -[workload migration guide]: /nomad/tutorials/manage-clusters/node-drain -[internals-csi]: /nomad/docs/concepts/plugins/storage/csi +[node status]: /nomad/commands/node/status +[workload migration guide]: /nomad/docs/manage/migrate-workloads +[internals-csi]: /nomad/docs/architecture/storage/csi diff --git a/website/content/docs/commands/node/eligibility.mdx b/website/content/commands/node/eligibility.mdx similarity index 98% rename from website/content/docs/commands/node/eligibility.mdx rename to website/content/commands/node/eligibility.mdx index 45fd3c907..aa00a4b35 100644 --- a/website/content/docs/commands/node/eligibility.mdx +++ b/website/content/commands/node/eligibility.mdx @@ -68,4 +68,4 @@ Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling 
eligibility set: ineligib @include 'general_options_no_namespace.mdx' -[drain]: /nomad/docs/commands/node/drain +[drain]: /nomad/commands/node/drain diff --git a/website/content/docs/commands/node/index.mdx b/website/content/commands/node/index.mdx similarity index 67% rename from website/content/docs/commands/node/index.mdx rename to website/content/commands/node/index.mdx index 67994b21d..d79a69519 100644 --- a/website/content/docs/commands/node/index.mdx +++ b/website/content/commands/node/index.mdx @@ -27,8 +27,8 @@ subcommands are available: - [`node status`][status] - Display status information about nodes -[config]: /nomad/docs/commands/node/config 'View or modify client configuration details' -[drain]: /nomad/docs/commands/node/drain 'Set drain mode on a given node' -[eligibility]: /nomad/docs/commands/node/eligibility 'Toggle scheduling eligibility on a given node' -[meta]: /nomad/docs/commands/node/meta 'Interact with node metadata' -[status]: /nomad/docs/commands/node/status 'Display status information about nodes' +[config]: /nomad/commands/node/config 'View or modify client configuration details' +[drain]: /nomad/commands/node/drain 'Set drain mode on a given node' +[eligibility]: /nomad/commands/node/eligibility 'Toggle scheduling eligibility on a given node' +[meta]: /nomad/commands/node/meta 'Interact with node metadata' +[status]: /nomad/commands/node/status 'Display status information about nodes' diff --git a/website/content/docs/commands/node/meta/apply.mdx b/website/content/commands/node/meta/apply.mdx similarity index 100% rename from website/content/docs/commands/node/meta/apply.mdx rename to website/content/commands/node/meta/apply.mdx diff --git a/website/content/docs/commands/node/meta/index.mdx b/website/content/commands/node/meta/index.mdx similarity index 83% rename from website/content/docs/commands/node/meta/index.mdx rename to website/content/commands/node/meta/index.mdx index d0d3d3512..e0e18500e 100644 --- a/website/content/docs/commands/node/meta/index.mdx +++ b/website/content/commands/node/meta/index.mdx @@ -24,6 +24,6 @@ Please see the individual subcommand help for detailed usage information: - [`apply`][apply] - Modify node metadata - [`read`][read] - Read node metadata -[interp]: /nomad/docs/runtime/interpolation#node-attributes -[apply]: /nomad/docs/commands/node/meta/apply -[read]: /nomad/docs/commands/node/meta/read +[interp]: /nomad/docs/reference/runtime-variable-interpolation#node-attributes +[apply]: /nomad/commands/node/meta/apply +[read]: /nomad/commands/node/meta/read diff --git a/website/content/docs/commands/node/meta/read.mdx b/website/content/commands/node/meta/read.mdx similarity index 100% rename from website/content/docs/commands/node/meta/read.mdx rename to website/content/commands/node/meta/read.mdx diff --git a/website/content/docs/commands/node/status.mdx b/website/content/commands/node/status.mdx similarity index 100% rename from website/content/docs/commands/node/status.mdx rename to website/content/commands/node/status.mdx diff --git a/website/content/docs/commands/operator/api.mdx b/website/content/commands/operator/api.mdx similarity index 100% rename from website/content/docs/commands/operator/api.mdx rename to website/content/commands/operator/api.mdx diff --git a/website/content/docs/commands/operator/autopilot/get-config.mdx b/website/content/commands/operator/autopilot/get-config.mdx similarity index 95% rename from website/content/docs/commands/operator/autopilot/get-config.mdx rename to 
website/content/commands/operator/autopilot/get-config.mdx index 464223db5..4625255f4 100644 --- a/website/content/docs/commands/operator/autopilot/get-config.mdx +++ b/website/content/commands/operator/autopilot/get-config.mdx @@ -46,5 +46,5 @@ returned configuration settings. @include 'general_options_no_namespace.mdx' -[autopilot-guide]: /nomad/tutorials/manage-clusters/autopilot +[autopilot-guide]: /nomad/docs/manage/autopilot [autopilot-config]: /nomad/docs/configuration/autopilot diff --git a/website/content/docs/commands/operator/autopilot/health.mdx b/website/content/commands/operator/autopilot/health.mdx similarity index 95% rename from website/content/docs/commands/operator/autopilot/health.mdx rename to website/content/commands/operator/autopilot/health.mdx index 4702c5b79..ab837b870 100644 --- a/website/content/docs/commands/operator/autopilot/health.mdx +++ b/website/content/commands/operator/autopilot/health.mdx @@ -42,5 +42,5 @@ e349749b-3303-3ddf-959c-b5885a0e1f6e node1 127.0.0.1:4647 alive 1.7. @include 'general_options_no_namespace.mdx' -[autopilot-guide]: /nomad/tutorials/manage-clusters/autopilot +[autopilot-guide]: /nomad/docs/manage/autopilot [api-docs]: /nomad/api-docs/operator/autopilot#read-autopilot-configuration diff --git a/website/content/docs/commands/operator/autopilot/set-config.mdx b/website/content/commands/operator/autopilot/set-config.mdx similarity index 97% rename from website/content/docs/commands/operator/autopilot/set-config.mdx rename to website/content/commands/operator/autopilot/set-config.mdx index b45f63737..79ad24385 100644 --- a/website/content/docs/commands/operator/autopilot/set-config.mdx +++ b/website/content/commands/operator/autopilot/set-config.mdx @@ -71,4 +71,4 @@ Configuration updated! [`redundancy_zone`]: /nomad/docs/configuration/server#redundancy_zone [`upgrade_version`]: /nomad/docs/configuration/server#upgrade_version -[autopilot-guide]: /nomad/tutorials/manage-clusters/autopilot +[autopilot-guide]: /nomad/docs/manage/autopilot diff --git a/website/content/docs/commands/operator/client-state.mdx b/website/content/commands/operator/client-state.mdx similarity index 100% rename from website/content/docs/commands/operator/client-state.mdx rename to website/content/commands/operator/client-state.mdx diff --git a/website/content/docs/commands/operator/debug.mdx b/website/content/commands/operator/debug.mdx similarity index 100% rename from website/content/docs/commands/operator/debug.mdx rename to website/content/commands/operator/debug.mdx diff --git a/website/content/docs/commands/operator/gossip/keyring-generate.mdx b/website/content/commands/operator/gossip/keyring-generate.mdx similarity index 100% rename from website/content/docs/commands/operator/gossip/keyring-generate.mdx rename to website/content/commands/operator/gossip/keyring-generate.mdx diff --git a/website/content/docs/commands/operator/gossip/keyring-install.mdx b/website/content/commands/operator/gossip/keyring-install.mdx similarity index 100% rename from website/content/docs/commands/operator/gossip/keyring-install.mdx rename to website/content/commands/operator/gossip/keyring-install.mdx diff --git a/website/content/docs/commands/operator/gossip/keyring-list.mdx b/website/content/commands/operator/gossip/keyring-list.mdx similarity index 100% rename from website/content/docs/commands/operator/gossip/keyring-list.mdx rename to website/content/commands/operator/gossip/keyring-list.mdx diff --git a/website/content/docs/commands/operator/gossip/keyring-remove.mdx 
b/website/content/commands/operator/gossip/keyring-remove.mdx similarity index 100% rename from website/content/docs/commands/operator/gossip/keyring-remove.mdx rename to website/content/commands/operator/gossip/keyring-remove.mdx diff --git a/website/content/docs/commands/operator/gossip/keyring-use.mdx b/website/content/commands/operator/gossip/keyring-use.mdx similarity index 100% rename from website/content/docs/commands/operator/gossip/keyring-use.mdx rename to website/content/commands/operator/gossip/keyring-use.mdx diff --git a/website/content/docs/commands/operator/index.mdx b/website/content/commands/operator/index.mdx similarity index 57% rename from website/content/docs/commands/operator/index.mdx rename to website/content/commands/operator/index.mdx index 2ac15f39d..4b9b5a526 100644 --- a/website/content/docs/commands/operator/index.mdx +++ b/website/content/commands/operator/index.mdx @@ -68,24 +68,24 @@ The following subcommands are available: - [`operator snapshot inspect`][snapshot-inspect] - Inspects a snapshot of the Nomad server state -[debug]: /nomad/docs/commands/operator/debug 'Builds an archive of configuration and state' -[get-config]: /nomad/docs/commands/operator/autopilot/get-config 'Autopilot Get Config command' -[gossip_keyring_generate]: /nomad/docs/commands/operator/gossip/keyring-generate 'Generates a gossip encryption key' -[gossip_keyring_install]: /nomad/docs/commands/operator/gossip/keyring-install 'Install a gossip encryption key' -[gossip_keyring_list]: /nomad/docs/commands/operator/gossip/keyring-list 'List available gossip encryption keys' -[gossip_keyring_remove]: /nomad/docs/commands/operator/gossip/keyring-remove 'Deletes a gossip encryption key' -[gossip_keyring_use]: /nomad/docs/commands/operator/gossip/keyring-use 'Sets a gossip encryption key as the active key' -[list]: /nomad/docs/commands/operator/raft/list-peers 'Raft List Peers command' +[debug]: /nomad/commands/operator/debug 'Builds an archive of configuration and state' +[get-config]: /nomad/commands/operator/autopilot/get-config 'Autopilot Get Config command' +[gossip_keyring_generate]: /nomad/commands/operator/gossip/keyring-generate 'Generates a gossip encryption key' +[gossip_keyring_install]: /nomad/commands/operator/gossip/keyring-install 'Install a gossip encryption key' +[gossip_keyring_list]: /nomad/commands/operator/gossip/keyring-list 'List available gossip encryption keys' +[gossip_keyring_remove]: /nomad/commands/operator/gossip/keyring-remove 'Deletes a gossip encryption key' +[gossip_keyring_use]: /nomad/commands/operator/gossip/keyring-use 'Sets a gossip encryption key as the active key' +[list]: /nomad/commands/operator/raft/list-peers 'Raft List Peers command' [operator]: /nomad/api-docs/operator 'Operator API documentation' -[outage recovery guide]: /nomad/tutorials/manage-clusters/outage-recovery -[remove]: /nomad/docs/commands/operator/raft/remove-peer 'Raft Remove Peer command' -[root_keyring_list]: /nomad/docs/commands/operator/root/keyring-list 'List available root encryption keys' -[root_keyring_remove]: /nomad/docs/commands/operator/root/keyring-remove 'Deletes a root encryption key' -[root_keyring_rotate]: /nomad/docs/commands/operator/root/keyring-rotate 'Rotates the root encryption key' -[set-config]: /nomad/docs/commands/operator/autopilot/set-config 'Autopilot Set Config command' -[snapshot-save]: /nomad/docs/commands/operator/snapshot/save 'Snapshot Save command' -[snapshot-restore]: /nomad/docs/commands/operator/snapshot/restore 'Snapshot Restore 
command' -[snapshot-inspect]: /nomad/docs/commands/operator/snapshot/inspect 'Snapshot Inspect command' -[snapshot-agent]: /nomad/docs/commands/operator/snapshot/agent 'Snapshot Agent command' -[scheduler-get-config]: /nomad/docs/commands/operator/scheduler/get-config 'Scheduler Get Config command' -[scheduler-set-config]: /nomad/docs/commands/operator/scheduler/set-config 'Scheduler Set Config command' +[outage recovery guide]: /nomad/docs/manage/outage-recovery +[remove]: /nomad/commands/operator/raft/remove-peer 'Raft Remove Peer command' +[root_keyring_list]: /nomad/commands/operator/root/keyring-list 'List available root encryption keys' +[root_keyring_remove]: /nomad/commands/operator/root/keyring-remove 'Deletes a root encryption key' +[root_keyring_rotate]: /nomad/commands/operator/root/keyring-rotate 'Rotates the root encryption key' +[set-config]: /nomad/commands/operator/autopilot/set-config 'Autopilot Set Config command' +[snapshot-save]: /nomad/commands/operator/snapshot/save 'Snapshot Save command' +[snapshot-restore]: /nomad/commands/operator/snapshot/restore 'Snapshot Restore command' +[snapshot-inspect]: /nomad/commands/operator/snapshot/inspect 'Snapshot Inspect command' +[snapshot-agent]: /nomad/commands/operator/snapshot/agent 'Snapshot Agent command' +[scheduler-get-config]: /nomad/commands/operator/scheduler/get-config 'Scheduler Get Config command' +[scheduler-set-config]: /nomad/commands/operator/scheduler/set-config 'Scheduler Set Config command' diff --git a/website/content/docs/commands/operator/metrics.mdx b/website/content/commands/operator/metrics.mdx similarity index 100% rename from website/content/docs/commands/operator/metrics.mdx rename to website/content/commands/operator/metrics.mdx diff --git a/website/content/docs/commands/operator/raft/info.mdx b/website/content/commands/operator/raft/info.mdx similarity index 100% rename from website/content/docs/commands/operator/raft/info.mdx rename to website/content/commands/operator/raft/info.mdx diff --git a/website/content/docs/commands/operator/raft/list-peers.mdx b/website/content/commands/operator/raft/list-peers.mdx similarity index 96% rename from website/content/docs/commands/operator/raft/list-peers.mdx rename to website/content/commands/operator/raft/list-peers.mdx index da8bb868f..3a3b9b3e8 100644 --- a/website/content/docs/commands/operator/raft/list-peers.mdx +++ b/website/content/commands/operator/raft/list-peers.mdx @@ -61,4 +61,4 @@ nomad-server03.global 10.10.11.7:4647 10.10.11.7:4647 follower true @include 'general_options_no_namespace.mdx' [operator]: /nomad/api-docs/operator -[outage recovery]: /nomad/tutorials/manage-clusters/outage-recovery +[outage recovery]: /nomad/docs/manage/outage-recovery diff --git a/website/content/docs/commands/operator/raft/logs.mdx b/website/content/commands/operator/raft/logs.mdx similarity index 100% rename from website/content/docs/commands/operator/raft/logs.mdx rename to website/content/commands/operator/raft/logs.mdx diff --git a/website/content/docs/commands/operator/raft/remove-peer.mdx b/website/content/commands/operator/raft/remove-peer.mdx similarity index 83% rename from website/content/docs/commands/operator/raft/remove-peer.mdx rename to website/content/commands/operator/raft/remove-peer.mdx index b5d237d80..144148bc4 100644 --- a/website/content/docs/commands/operator/raft/remove-peer.mdx +++ b/website/content/commands/operator/raft/remove-peer.mdx @@ -37,7 +37,7 @@ If ACLs are enabled, this command requires a management token. 
@include 'general_options_no_namespace.mdx' -[`nomad server force-leave`]: /nomad/docs/commands/server/force-leave 'Nomad server force-leave command' -[`nomad server members`]: /nomad/docs/commands/server/members 'Nomad server members command' +[`nomad server force-leave`]: /nomad/commands/server/force-leave 'Nomad server force-leave command' +[`nomad server members`]: /nomad/commands/server/members 'Nomad server members command' [operator]: /nomad/api-docs/operator 'Nomad Operator API' -[outage recovery]: /nomad/tutorials/manage-clusters/outage-recovery +[outage recovery]: /nomad/docs/manage/outage-recovery diff --git a/website/content/docs/commands/operator/raft/state.mdx b/website/content/commands/operator/raft/state.mdx similarity index 100% rename from website/content/docs/commands/operator/raft/state.mdx rename to website/content/commands/operator/raft/state.mdx diff --git a/website/content/docs/commands/operator/raft/transfer-leadership.mdx b/website/content/commands/operator/raft/transfer-leadership.mdx similarity index 93% rename from website/content/docs/commands/operator/raft/transfer-leadership.mdx rename to website/content/commands/operator/raft/transfer-leadership.mdx index 697b97faf..57052fcd9 100644 --- a/website/content/docs/commands/operator/raft/transfer-leadership.mdx +++ b/website/content/commands/operator/raft/transfer-leadership.mdx @@ -50,7 +50,7 @@ Provide either `-peer-address` or `-peer-id`, but not both. @include 'general_options_no_namespace.mdx' -[`nomad operator raft list-peers`]: /nomad/docs/commands/operator/raft/list-peers 'Nomad operator raft list-peers command' +[`nomad operator raft list-peers`]: /nomad/commands/operator/raft/list-peers 'Nomad operator raft list-peers command' [operator]: /nomad/api-docs/operator 'Nomad Operator API' [rolling upgrade]: /nomad/docs/upgrade#upgrade-process [Read Raft Configuration]: /nomad/api-docs/operator/raft#read-raft-configuration diff --git a/website/content/docs/commands/operator/root/keyring-list.mdx b/website/content/commands/operator/root/keyring-list.mdx similarity index 100% rename from website/content/docs/commands/operator/root/keyring-list.mdx rename to website/content/commands/operator/root/keyring-list.mdx diff --git a/website/content/docs/commands/operator/root/keyring-remove.mdx b/website/content/commands/operator/root/keyring-remove.mdx similarity index 94% rename from website/content/docs/commands/operator/root/keyring-remove.mdx rename to website/content/commands/operator/root/keyring-remove.mdx index 3091f4927..8b5fc9b71 100644 --- a/website/content/docs/commands/operator/root/keyring-remove.mdx +++ b/website/content/commands/operator/root/keyring-remove.mdx @@ -21,7 +21,7 @@ nomad operator root keyring remove [options] The `key ID` must be the UUID. Use the `-verbose` option with the [`nomad operator root keyring list` -command](/nomad/docs/commands/operator/root/keyring-list) to fetch the key +command](/nomad/commands/operator/root/keyring-list) to fetch the key UUID. 
## Options diff --git a/website/content/docs/commands/operator/root/keyring-rotate.mdx b/website/content/commands/operator/root/keyring-rotate.mdx similarity index 100% rename from website/content/docs/commands/operator/root/keyring-rotate.mdx rename to website/content/commands/operator/root/keyring-rotate.mdx diff --git a/website/content/docs/commands/operator/scheduler/get-config.mdx b/website/content/commands/operator/scheduler/get-config.mdx similarity index 100% rename from website/content/docs/commands/operator/scheduler/get-config.mdx rename to website/content/commands/operator/scheduler/get-config.mdx diff --git a/website/content/docs/commands/operator/scheduler/set-config.mdx b/website/content/commands/operator/scheduler/set-config.mdx similarity index 100% rename from website/content/docs/commands/operator/scheduler/set-config.mdx rename to website/content/commands/operator/scheduler/set-config.mdx diff --git a/website/content/docs/commands/operator/snapshot/agent.mdx b/website/content/commands/operator/snapshot/agent.mdx similarity index 100% rename from website/content/docs/commands/operator/snapshot/agent.mdx rename to website/content/commands/operator/snapshot/agent.mdx diff --git a/website/content/docs/commands/operator/snapshot/inspect.mdx b/website/content/commands/operator/snapshot/inspect.mdx similarity index 95% rename from website/content/docs/commands/operator/snapshot/inspect.mdx rename to website/content/commands/operator/snapshot/inspect.mdx index 02e473e90..0b11a2b91 100644 --- a/website/content/docs/commands/operator/snapshot/inspect.mdx +++ b/website/content/commands/operator/snapshot/inspect.mdx @@ -57,4 +57,4 @@ ClusterMetadata 1 71 B Total 90 158 KiB ``` -[outage recovery]: /nomad/tutorials/manage-clusters/outage-recovery +[outage recovery]: /nomad/docs/manage/outage-recovery diff --git a/website/content/docs/commands/operator/snapshot/redact.mdx b/website/content/commands/operator/snapshot/redact.mdx similarity index 100% rename from website/content/docs/commands/operator/snapshot/redact.mdx rename to website/content/commands/operator/snapshot/redact.mdx diff --git a/website/content/docs/commands/operator/snapshot/restore.mdx b/website/content/commands/operator/snapshot/restore.mdx similarity index 93% rename from website/content/docs/commands/operator/snapshot/restore.mdx rename to website/content/commands/operator/snapshot/restore.mdx index 9ee91ee1b..e09970553 100644 --- a/website/content/docs/commands/operator/snapshot/restore.mdx +++ b/website/content/commands/operator/snapshot/restore.mdx @@ -35,5 +35,5 @@ nomad operator snapshot restore [options] @include 'general_options_no_namespace.mdx' -[outage recovery]: /nomad/tutorials/manage-clusters/outage-recovery +[outage recovery]: /nomad/docs/manage/outage-recovery diff --git a/website/content/docs/commands/operator/snapshot/save.mdx b/website/content/commands/operator/snapshot/save.mdx similarity index 96% rename from website/content/docs/commands/operator/snapshot/save.mdx rename to website/content/commands/operator/snapshot/save.mdx index eea932b65..30b6d178e 100644 --- a/website/content/docs/commands/operator/snapshot/save.mdx +++ b/website/content/commands/operator/snapshot/save.mdx @@ -61,5 +61,5 @@ nomad operator snapshot save [options] @include 'general_options_no_namespace.mdx' -[outage recovery]: /nomad/tutorials/manage-clusters/outage-recovery +[outage recovery]: /nomad/docs/manage/outage-recovery [KMS provider]: /nomad/docs/configuration/keyring diff --git 
a/website/content/docs/commands/operator/snapshot/state.mdx b/website/content/commands/operator/snapshot/state.mdx similarity index 100% rename from website/content/docs/commands/operator/snapshot/state.mdx rename to website/content/commands/operator/snapshot/state.mdx diff --git a/website/content/docs/commands/operator/utilization.mdx b/website/content/commands/operator/utilization.mdx similarity index 94% rename from website/content/docs/commands/operator/utilization.mdx rename to website/content/commands/operator/utilization.mdx index e3510fb6e..0a0c8c1f9 100644 --- a/website/content/docs/commands/operator/utilization.mdx +++ b/website/content/commands/operator/utilization.mdx @@ -19,8 +19,8 @@ capability. Refer to the [manual license utilization -reporting](/nomad/docs/enterprise/license/manual-reporting) page to learn more -about reporting your Nomad Enterprise license utilization. +reporting](/nomad/docs/enterprise/license/utilization-reporting) page to learn +more about reporting your Nomad Enterprise license utilization. diff --git a/website/content/docs/commands/plugin/index.mdx b/website/content/commands/plugin/index.mdx similarity index 89% rename from website/content/docs/commands/plugin/index.mdx rename to website/content/commands/plugin/index.mdx index a218cdcfb..7fb656064 100644 --- a/website/content/docs/commands/plugin/index.mdx +++ b/website/content/commands/plugin/index.mdx @@ -21,4 +21,4 @@ subcommands are available: - [`plugin status`][status] - Display status information about a plugin [csi]: https://github.com/container-storage-interface/spec -[status]: /nomad/docs/commands/plugin/status 'Display status information about a plugin' +[status]: /nomad/commands/plugin/status 'Display status information about a plugin' diff --git a/website/content/docs/commands/plugin/status.mdx b/website/content/commands/plugin/status.mdx similarity index 100% rename from website/content/docs/commands/plugin/status.mdx rename to website/content/commands/plugin/status.mdx diff --git a/website/content/docs/commands/quota/apply.mdx b/website/content/commands/quota/apply.mdx similarity index 100% rename from website/content/docs/commands/quota/apply.mdx rename to website/content/commands/quota/apply.mdx diff --git a/website/content/docs/commands/quota/delete.mdx b/website/content/commands/quota/delete.mdx similarity index 100% rename from website/content/docs/commands/quota/delete.mdx rename to website/content/commands/quota/delete.mdx diff --git a/website/content/docs/commands/quota/index.mdx b/website/content/commands/quota/index.mdx similarity index 78% rename from website/content/docs/commands/quota/index.mdx rename to website/content/commands/quota/index.mdx index 00fdac9ff..ee2fb683d 100644 --- a/website/content/docs/commands/quota/index.mdx +++ b/website/content/commands/quota/index.mdx @@ -25,9 +25,9 @@ subcommands are available: - [`quota list`][quotalist] - List quota specifications - [`quota status`][quotastatus] - Display a quota's status and current usage -[quotaapply]: /nomad/docs/commands/quota/apply -[quotadelete]: /nomad/docs/commands/quota/delete -[quotainit]: /nomad/docs/commands/quota/init -[quotainspect]: /nomad/docs/commands/quota/inspect -[quotalist]: /nomad/docs/commands/quota/list -[quotastatus]: /nomad/docs/commands/quota/status +[quotaapply]: /nomad/commands/quota/apply +[quotadelete]: /nomad/commands/quota/delete +[quotainit]: /nomad/commands/quota/init +[quotainspect]: /nomad/commands/quota/inspect +[quotalist]: /nomad/commands/quota/list +[quotastatus]: 
/nomad/commands/quota/status diff --git a/website/content/docs/commands/quota/init.mdx b/website/content/commands/quota/init.mdx similarity index 100% rename from website/content/docs/commands/quota/init.mdx rename to website/content/commands/quota/init.mdx diff --git a/website/content/docs/commands/quota/inspect.mdx b/website/content/commands/quota/inspect.mdx similarity index 100% rename from website/content/docs/commands/quota/inspect.mdx rename to website/content/commands/quota/inspect.mdx diff --git a/website/content/docs/commands/quota/list.mdx b/website/content/commands/quota/list.mdx similarity index 100% rename from website/content/docs/commands/quota/list.mdx rename to website/content/commands/quota/list.mdx diff --git a/website/content/docs/commands/quota/status.mdx b/website/content/commands/quota/status.mdx similarity index 100% rename from website/content/docs/commands/quota/status.mdx rename to website/content/commands/quota/status.mdx diff --git a/website/content/docs/commands/recommendation/apply.mdx b/website/content/commands/recommendation/apply.mdx similarity index 100% rename from website/content/docs/commands/recommendation/apply.mdx rename to website/content/commands/recommendation/apply.mdx diff --git a/website/content/docs/commands/recommendation/dismiss.mdx b/website/content/commands/recommendation/dismiss.mdx similarity index 100% rename from website/content/docs/commands/recommendation/dismiss.mdx rename to website/content/commands/recommendation/dismiss.mdx diff --git a/website/content/docs/commands/recommendation/index.mdx b/website/content/commands/recommendation/index.mdx similarity index 78% rename from website/content/docs/commands/recommendation/index.mdx rename to website/content/commands/recommendation/index.mdx index a4c26a836..c0cf6f933 100644 --- a/website/content/docs/commands/recommendation/index.mdx +++ b/website/content/commands/recommendation/index.mdx @@ -23,7 +23,7 @@ subcommands are available: - [`recommendation info`][recommendationinfo] - Display an individual Nomad recommendation - [`recommendation list`][recommendationlist] - Display all Nomad recommendations -[recommendationapply]: /nomad/docs/commands/recommendation/apply -[recommendationdismiss]: /nomad/docs/commands/recommendation/dismiss -[recommendationinfo]: /nomad/docs/commands/recommendation/info -[recommendationlist]: /nomad/docs/commands/recommendation/list +[recommendationapply]: /nomad/commands/recommendation/apply +[recommendationdismiss]: /nomad/commands/recommendation/dismiss +[recommendationinfo]: /nomad/commands/recommendation/info +[recommendationlist]: /nomad/commands/recommendation/list diff --git a/website/content/docs/commands/recommendation/info.mdx b/website/content/commands/recommendation/info.mdx similarity index 100% rename from website/content/docs/commands/recommendation/info.mdx rename to website/content/commands/recommendation/info.mdx diff --git a/website/content/docs/commands/recommendation/list.mdx b/website/content/commands/recommendation/list.mdx similarity index 100% rename from website/content/docs/commands/recommendation/list.mdx rename to website/content/commands/recommendation/list.mdx diff --git a/website/content/docs/commands/scaling/index.mdx b/website/content/commands/scaling/index.mdx similarity index 82% rename from website/content/docs/commands/scaling/index.mdx rename to website/content/commands/scaling/index.mdx index d66b06fb7..3ef8b4c46 100644 --- a/website/content/docs/commands/scaling/index.mdx +++ 
b/website/content/commands/scaling/index.mdx @@ -19,5 +19,5 @@ subcommands are available: - [`policy info`][scalingpolicyinfo] - Display an individual Nomad scaling policy - [`policy list`][scalingpolicylist] - List all Nomad scaling policies -[scalingpolicyinfo]: /nomad/docs/commands/scaling/policy-info -[scalingpolicylist]: /nomad/docs/commands/scaling/policy-list +[scalingpolicyinfo]: /nomad/commands/scaling/policy-info +[scalingpolicylist]: /nomad/commands/scaling/policy-list diff --git a/website/content/docs/commands/scaling/policy-info.mdx b/website/content/commands/scaling/policy-info.mdx similarity index 100% rename from website/content/docs/commands/scaling/policy-info.mdx rename to website/content/commands/scaling/policy-info.mdx diff --git a/website/content/docs/commands/scaling/policy-list.mdx b/website/content/commands/scaling/policy-list.mdx similarity index 100% rename from website/content/docs/commands/scaling/policy-list.mdx rename to website/content/commands/scaling/policy-list.mdx diff --git a/website/content/docs/commands/sentinel/apply.mdx b/website/content/commands/sentinel/apply.mdx similarity index 90% rename from website/content/docs/commands/sentinel/apply.mdx rename to website/content/commands/sentinel/apply.mdx index 04f36ada5..f763ee8a7 100644 --- a/website/content/docs/commands/sentinel/apply.mdx +++ b/website/content/commands/sentinel/apply.mdx @@ -23,7 +23,7 @@ policy file. The policy file can be read from stdin by specifying "-" as the file name. Additionally, you must specify the `-scope` option. Refer to the [`-scope` field -description](/nomad/docs/commands/sentinel/apply#scope) for more information. +description](/nomad/commands/sentinel/apply#scope) for more information. Sentinel commands are only available when ACLs are enabled. This command requires a management token. @@ -39,7 +39,7 @@ requires a management token. - The `submit-host-volume` scope for creating or updating dynamic host volumes. - Refer to the [Sentinel guide](/nomad/docs/enterprise/sentinel) for scope details. + Refer to the [Sentinel guide](/nomad/docs/reference/sentinel-policy) for scope details. - `-level` : (default: advisory) Sets the enforcement level of the policy. Must be one of advisory, soft-mandatory, hard-mandatory. 
diff --git a/website/content/docs/commands/sentinel/delete.mdx b/website/content/commands/sentinel/delete.mdx similarity index 100% rename from website/content/docs/commands/sentinel/delete.mdx rename to website/content/commands/sentinel/delete.mdx diff --git a/website/content/docs/commands/sentinel/index.mdx b/website/content/commands/sentinel/index.mdx similarity index 82% rename from website/content/docs/commands/sentinel/index.mdx rename to website/content/commands/sentinel/index.mdx index ddab180e3..01c9f3c23 100644 --- a/website/content/docs/commands/sentinel/index.mdx +++ b/website/content/commands/sentinel/index.mdx @@ -23,7 +23,7 @@ subcommands are available: - [`sentinel list`][list] - Display all Sentinel policies - [`sentinel read`][read] - Inspects an existing Sentinel policies -[delete]: /nomad/docs/commands/sentinel/delete -[list]: /nomad/docs/commands/sentinel/list -[read]: /nomad/docs/commands/sentinel/read -[apply]: /nomad/docs/commands/sentinel/apply +[delete]: /nomad/commands/sentinel/delete +[list]: /nomad/commands/sentinel/list +[read]: /nomad/commands/sentinel/read +[apply]: /nomad/commands/sentinel/apply diff --git a/website/content/docs/commands/sentinel/list.mdx b/website/content/commands/sentinel/list.mdx similarity index 100% rename from website/content/docs/commands/sentinel/list.mdx rename to website/content/commands/sentinel/list.mdx diff --git a/website/content/docs/commands/sentinel/read.mdx b/website/content/commands/sentinel/read.mdx similarity index 100% rename from website/content/docs/commands/sentinel/read.mdx rename to website/content/commands/sentinel/read.mdx diff --git a/website/content/docs/commands/server/force-leave.mdx b/website/content/commands/server/force-leave.mdx similarity index 100% rename from website/content/docs/commands/server/force-leave.mdx rename to website/content/commands/server/force-leave.mdx diff --git a/website/content/docs/commands/server/index.mdx b/website/content/commands/server/index.mdx similarity index 73% rename from website/content/docs/commands/server/index.mdx rename to website/content/commands/server/index.mdx index 54db6b9ee..7081f1af9 100644 --- a/website/content/docs/commands/server/index.mdx +++ b/website/content/commands/server/index.mdx @@ -22,6 +22,6 @@ subcommands are available: - [`server join`][join] - Join server nodes together - [`server members`][members] - Display a list of known servers and their status -[force-leave]: /nomad/docs/commands/server/force-leave "Force a server into the 'left' state" -[join]: /nomad/docs/commands/server/join 'Join server nodes together' -[members]: /nomad/docs/commands/server/members 'Display a list of known servers and their status' +[force-leave]: /nomad/commands/server/force-leave "Force a server into the 'left' state" +[join]: /nomad/commands/server/join 'Join server nodes together' +[members]: /nomad/commands/server/members 'Display a list of known servers and their status' diff --git a/website/content/docs/commands/server/join.mdx b/website/content/commands/server/join.mdx similarity index 87% rename from website/content/docs/commands/server/join.mdx rename to website/content/commands/server/join.mdx index 353d895ba..2a329ade4 100644 --- a/website/content/docs/commands/server/join.mdx +++ b/website/content/commands/server/join.mdx @@ -37,8 +37,8 @@ $ nomad server join 10.0.0.8:4648 Joined 1 servers successfully ``` -[federate]: /nomad/tutorials/manage-clusters/federation -[Configure for multiple regions]: 
/nomad/tutorials/access-control/access-control-bootstrap#configure-for-multiple-regions +[federate]: /nomad/docs/deploy/clusters/federate-regions +[Configure for multiple regions]: /nomad/docs/secure/acl/bootstrap#configure-for-multiple-regions ## General options diff --git a/website/content/docs/commands/server/members.mdx b/website/content/commands/server/members.mdx similarity index 100% rename from website/content/docs/commands/server/members.mdx rename to website/content/commands/server/members.mdx diff --git a/website/content/docs/commands/service/delete.mdx b/website/content/commands/service/delete.mdx similarity index 100% rename from website/content/docs/commands/service/delete.mdx rename to website/content/commands/service/delete.mdx diff --git a/website/content/docs/commands/service/index.mdx b/website/content/commands/service/index.mdx similarity index 82% rename from website/content/docs/commands/service/index.mdx rename to website/content/commands/service/index.mdx index 6c73bff2b..3e045addd 100644 --- a/website/content/docs/commands/service/index.mdx +++ b/website/content/commands/service/index.mdx @@ -20,6 +20,6 @@ subcommands are available: - [`service info`][serviceinfo] - Display an individual Nomad service registration - [`service list`][servicelist] - Display all registered Nomad services -[servicedelete]: /nomad/docs/commands/service/delete -[serviceinfo]: /nomad/docs/commands/service/info -[servicelist]: /nomad/docs/commands/service/list +[servicedelete]: /nomad/commands/service/delete +[serviceinfo]: /nomad/commands/service/info +[servicelist]: /nomad/commands/service/list diff --git a/website/content/docs/commands/service/info.mdx b/website/content/commands/service/info.mdx similarity index 100% rename from website/content/docs/commands/service/info.mdx rename to website/content/commands/service/info.mdx diff --git a/website/content/docs/commands/service/list.mdx b/website/content/commands/service/list.mdx similarity index 100% rename from website/content/docs/commands/service/list.mdx rename to website/content/commands/service/list.mdx diff --git a/website/content/docs/commands/setup/consul.mdx b/website/content/commands/setup/consul.mdx similarity index 100% rename from website/content/docs/commands/setup/consul.mdx rename to website/content/commands/setup/consul.mdx diff --git a/website/content/docs/commands/setup/index.mdx b/website/content/commands/setup/index.mdx similarity index 89% rename from website/content/docs/commands/setup/index.mdx rename to website/content/commands/setup/index.mdx index f4d683b94..6b5fb7d3f 100644 --- a/website/content/docs/commands/setup/index.mdx +++ b/website/content/commands/setup/index.mdx @@ -27,5 +27,5 @@ subcommands are available: - [`setup consul`][consul] - Setup a Consul cluster for Nomad integration. - [`setup vault`][vault] - Setup a Vault cluster for Nomad integration.
-[consul]: /nomad/docs/commands/setup/consul -[vault]: /nomad/docs/commands/setup/vault +[consul]: /nomad/commands/setup/consul +[vault]: /nomad/commands/setup/vault diff --git a/website/content/docs/commands/setup/vault.mdx b/website/content/commands/setup/vault.mdx similarity index 100% rename from website/content/docs/commands/setup/vault.mdx rename to website/content/commands/setup/vault.mdx diff --git a/website/content/docs/commands/status.mdx b/website/content/commands/status.mdx similarity index 100% rename from website/content/docs/commands/status.mdx rename to website/content/commands/status.mdx diff --git a/website/content/docs/commands/system/gc.mdx b/website/content/commands/system/gc.mdx similarity index 100% rename from website/content/docs/commands/system/gc.mdx rename to website/content/commands/system/gc.mdx diff --git a/website/content/docs/commands/system/index.mdx b/website/content/commands/system/index.mdx similarity index 77% rename from website/content/docs/commands/system/index.mdx rename to website/content/commands/system/index.mdx index d9f195d50..7e3193b81 100644 --- a/website/content/docs/commands/system/index.mdx +++ b/website/content/commands/system/index.mdx @@ -20,5 +20,5 @@ subcommands are available: - [`system gc`][gc] - Run the system garbage collection process - [`system reconcile summaries`][reconcile-summaries] - Reconciles the summaries of all registered jobs -[gc]: /nomad/docs/commands/system/gc 'Run the system garbage collection process' -[reconcile-summaries]: /nomad/docs/commands/system/reconcile-summaries 'Reconciles the summaries of all registered jobs' +[gc]: /nomad/commands/system/gc 'Run the system garbage collection process' +[reconcile-summaries]: /nomad/commands/system/reconcile-summaries 'Reconciles the summaries of all registered jobs' diff --git a/website/content/docs/commands/system/reconcile-summaries.mdx b/website/content/commands/system/reconcile-summaries.mdx similarity index 100% rename from website/content/docs/commands/system/reconcile-summaries.mdx rename to website/content/commands/system/reconcile-summaries.mdx diff --git a/website/content/docs/commands/tls/ca-create.mdx b/website/content/commands/tls/ca-create.mdx similarity index 100% rename from website/content/docs/commands/tls/ca-create.mdx rename to website/content/commands/tls/ca-create.mdx diff --git a/website/content/docs/commands/tls/ca-info.mdx b/website/content/commands/tls/ca-info.mdx similarity index 100% rename from website/content/docs/commands/tls/ca-info.mdx rename to website/content/commands/tls/ca-info.mdx diff --git a/website/content/docs/commands/tls/cert-create.mdx b/website/content/commands/tls/cert-create.mdx similarity index 97% rename from website/content/docs/commands/tls/cert-create.mdx rename to website/content/commands/tls/cert-create.mdx index bff5f8571..67d7a92eb 100644 --- a/website/content/docs/commands/tls/cert-create.mdx +++ b/website/content/commands/tls/cert-create.mdx @@ -78,4 +78,4 @@ $ nomad tls cert create -cli ==> Cli Certificate key saved to global-cli-nomad-key.pem ``` -[TLS encryption]: /nomad/tutorials/transport-security/security-enable-tls +[TLS encryption]: /nomad/docs/secure/traffic/tls diff --git a/website/content/docs/commands/tls/cert-info.mdx b/website/content/commands/tls/cert-info.mdx similarity index 100% rename from website/content/docs/commands/tls/cert-info.mdx rename to website/content/commands/tls/cert-info.mdx diff --git a/website/content/docs/commands/tls/index.mdx b/website/content/commands/tls/index.mdx 
similarity index 70% rename from website/content/docs/commands/tls/index.mdx rename to website/content/commands/tls/index.mdx index 55667194d..ba70b9be1 100644 --- a/website/content/docs/commands/tls/index.mdx +++ b/website/content/commands/tls/index.mdx @@ -21,7 +21,7 @@ subcommands are available: - [`cert create`][certcreate] - Create self signed certificates - [`cert info`][certinfo] - Display information from a certificate -[cacreate]: /nomad/docs/commands/tls/ca-create 'Create Certificate Authority' -[cainfo]: /nomad/docs/commands/tls/ca-info 'Display information from a CA certificate' -[certcreate]: /nomad/docs/commands/tls/cert-create 'Create self signed certificates' -[certinfo]: /nomad/docs/commands/tls/cert-info 'Display information from a certificate' +[cacreate]: /nomad/commands/tls/ca-create 'Create Certificate Authority' +[cainfo]: /nomad/commands/tls/ca-info 'Display information from a CA certificate' +[certcreate]: /nomad/commands/tls/cert-create 'Create self signed certificates' +[certinfo]: /nomad/commands/tls/cert-info 'Display information from a certificate' diff --git a/website/content/docs/commands/ui.mdx b/website/content/commands/ui.mdx similarity index 100% rename from website/content/docs/commands/ui.mdx rename to website/content/commands/ui.mdx diff --git a/website/content/docs/commands/var/get.mdx b/website/content/commands/var/get.mdx similarity index 100% rename from website/content/docs/commands/var/get.mdx rename to website/content/commands/var/get.mdx diff --git a/website/content/docs/commands/var/index.mdx b/website/content/commands/var/index.mdx similarity index 88% rename from website/content/docs/commands/var/index.mdx rename to website/content/commands/var/index.mdx index c835f74dd..d092a2c28 100644 --- a/website/content/docs/commands/var/index.mdx +++ b/website/content/commands/var/index.mdx @@ -58,9 +58,9 @@ user = dba ``` [variables]: /nomad/docs/concepts/variables -[init]: /nomad/docs/commands/var/init -[get]: /nomad/docs/commands/var/get -[list]: /nomad/docs/commands/var/list -[put]: /nomad/docs/commands/var/put -[purge]: /nomad/docs/commands/var/purge -[lock]: /nomad/docs/commands/var/lock +[init]: /nomad/commands/var/init +[get]: /nomad/commands/var/get +[list]: /nomad/commands/var/list +[put]: /nomad/commands/var/put +[purge]: /nomad/commands/var/purge +[lock]: /nomad/commands/var/lock diff --git a/website/content/docs/commands/var/init.mdx b/website/content/commands/var/init.mdx similarity index 100% rename from website/content/docs/commands/var/init.mdx rename to website/content/commands/var/init.mdx diff --git a/website/content/docs/commands/var/list.mdx b/website/content/commands/var/list.mdx similarity index 100% rename from website/content/docs/commands/var/list.mdx rename to website/content/commands/var/list.mdx diff --git a/website/content/docs/commands/var/lock.mdx b/website/content/commands/var/lock.mdx similarity index 100% rename from website/content/docs/commands/var/lock.mdx rename to website/content/commands/var/lock.mdx diff --git a/website/content/docs/commands/var/purge.mdx b/website/content/commands/var/purge.mdx similarity index 100% rename from website/content/docs/commands/var/purge.mdx rename to website/content/commands/var/purge.mdx diff --git a/website/content/docs/commands/var/put.mdx b/website/content/commands/var/put.mdx similarity index 100% rename from website/content/docs/commands/var/put.mdx rename to website/content/commands/var/put.mdx diff --git a/website/content/docs/commands/version.mdx 
b/website/content/commands/version.mdx similarity index 100% rename from website/content/docs/commands/version.mdx rename to website/content/commands/version.mdx diff --git a/website/content/docs/commands/volume/claim-delete.mdx b/website/content/commands/volume/claim-delete.mdx similarity index 100% rename from website/content/docs/commands/volume/claim-delete.mdx rename to website/content/commands/volume/claim-delete.mdx diff --git a/website/content/docs/commands/volume/claim-list.mdx b/website/content/commands/volume/claim-list.mdx similarity index 100% rename from website/content/docs/commands/volume/claim-list.mdx rename to website/content/commands/volume/claim-list.mdx diff --git a/website/content/docs/commands/volume/create.mdx b/website/content/commands/volume/create.mdx similarity index 95% rename from website/content/docs/commands/volume/create.mdx rename to website/content/commands/volume/create.mdx index ba4455c31..f99fb2aa8 100644 --- a/website/content/docs/commands/volume/create.mdx +++ b/website/content/commands/volume/create.mdx @@ -89,9 +89,9 @@ the exact section. [csi]: https://github.com/container-storage-interface/spec -[csi_plugins_internals]: /nomad/docs/concepts/plugins/storage/csi -[registers]: /nomad/docs/commands/volume/register -[registered]: /nomad/docs/commands/volume/register +[csi_plugins_internals]: /nomad/docs/architecture/storage/csi +[registers]: /nomad/commands/volume/register +[registered]: /nomad/commands/volume/register [volume_specification]: /nomad/docs/other-specifications/volume [csi_vol_spec]: /nomad/docs/other-specifications/volume/csi [host_vol_spec]: /nomad/docs/other-specifications/volume/host diff --git a/website/content/docs/commands/volume/delete.mdx b/website/content/commands/volume/delete.mdx similarity index 91% rename from website/content/docs/commands/volume/delete.mdx rename to website/content/commands/volume/delete.mdx index 55544ad26..b4ce6cad0 100644 --- a/website/content/docs/commands/volume/delete.mdx +++ b/website/content/commands/volume/delete.mdx @@ -51,6 +51,6 @@ volumes or `host-volume-delete` for dynamic host volumes. 
@include 'general_options.mdx' [csi]: https://github.com/container-storage-interface/spec -[csi_plugins_internals]: /nomad/docs/concepts/plugins/storage/csi -[deregistered]: /nomad/docs/commands/volume/deregister -[registered]: /nomad/docs/commands/volume/register +[csi_plugins_internals]: /nomad/docs/architecture/storage/csi +[deregistered]: /nomad/commands/volume/deregister +[registered]: /nomad/commands/volume/register diff --git a/website/content/docs/commands/volume/deregister.mdx b/website/content/commands/volume/deregister.mdx similarity index 96% rename from website/content/docs/commands/volume/deregister.mdx rename to website/content/commands/volume/deregister.mdx index 43f889c9c..938bc4ba5 100644 --- a/website/content/docs/commands/volume/deregister.mdx +++ b/website/content/commands/volume/deregister.mdx @@ -41,4 +41,4 @@ When ACLs are enabled, this command requires a token with the @include 'general_options.mdx' [csi]: https://github.com/container-storage-interface/spec -[`volume delete`]: /nomad/docs/commands/volume/delete +[`volume delete`]: /nomad/commands/volume/delete diff --git a/website/content/docs/commands/volume/detach.mdx b/website/content/commands/volume/detach.mdx similarity index 100% rename from website/content/docs/commands/volume/detach.mdx rename to website/content/commands/volume/detach.mdx diff --git a/website/content/docs/commands/volume/index.mdx b/website/content/commands/volume/index.mdx similarity index 63% rename from website/content/docs/commands/volume/index.mdx rename to website/content/commands/volume/index.mdx index ea24c6f0c..b1ef30aae 100644 --- a/website/content/docs/commands/volume/index.mdx +++ b/website/content/commands/volume/index.mdx @@ -29,15 +29,15 @@ subcommands are available: - [`volume snapshot list`][snapshot-list] - List all volume snapshots. - [`volume status`][status] - Display status information about a volume. 
-[create]: /nomad/docs/commands/volume/create -[claim-delete]: /nomad/docs/commands/volume/claim-delete -[claim-list]: /nomad/docs/commands/volume/claim-list -[delete]: /nomad/docs/commands/volume/delete -[deregister]: /nomad/docs/commands/volume/deregister 'Deregister a volume' -[detach]: /nomad/docs/commands/volume/detach 'Detach a volume' -[init]: /nomad/docs/commands/volume/init 'Create an example volume specification file' -[register]: /nomad/docs/commands/volume/register 'Register a volume' -[snapshot-create]: /nomad/docs/commands/volume/snapshot-create -[snapshot-delete]: /nomad/docs/commands/volume/snapshot-delete -[snapshot-list]: /nomad/docs/commands/volume/snapshot-list -[status]: /nomad/docs/commands/volume/status 'Display status information about a volume' +[create]: /nomad/commands/volume/create +[claim-delete]: /nomad/commands/volume/claim-delete +[claim-list]: /nomad/commands/volume/claim-list +[delete]: /nomad/commands/volume/delete +[deregister]: /nomad/commands/volume/deregister 'Deregister a volume' +[detach]: /nomad/commands/volume/detach 'Detach a volume' +[init]: /nomad/commands/volume/init 'Create an example volume specification file' +[register]: /nomad/commands/volume/register 'Register a volume' +[snapshot-create]: /nomad/commands/volume/snapshot-create +[snapshot-delete]: /nomad/commands/volume/snapshot-delete +[snapshot-list]: /nomad/commands/volume/snapshot-list +[status]: /nomad/commands/volume/status 'Display status information about a volume' diff --git a/website/content/docs/commands/volume/init.mdx b/website/content/commands/volume/init.mdx similarity index 100% rename from website/content/docs/commands/volume/init.mdx rename to website/content/commands/volume/init.mdx diff --git a/website/content/docs/commands/volume/register.mdx b/website/content/commands/volume/register.mdx similarity index 97% rename from website/content/docs/commands/volume/register.mdx rename to website/content/commands/volume/register.mdx index 84c56f9e9..1c33a3a4e 100644 --- a/website/content/docs/commands/volume/register.mdx +++ b/website/content/commands/volume/register.mdx @@ -110,6 +110,6 @@ the exact section. [csi]: https://github.com/container-storage-interface/spec -[csi_plugins_internals]: /nomad/docs/concepts/plugins/storage/csi +[csi_plugins_internals]: /nomad/docs/architecture/storage/csi [volume_specification]: /nomad/docs/other-specifications/volume -[`volume create`]: /nomad/docs/commands/volume/create +[`volume create`]: /nomad/commands/volume/create diff --git a/website/content/docs/commands/volume/snapshot-create.mdx b/website/content/commands/volume/snapshot-create.mdx similarity index 94% rename from website/content/docs/commands/volume/snapshot-create.mdx rename to website/content/commands/volume/snapshot-create.mdx index 49a4b49b1..b2cb9b6e7 100644 --- a/website/content/docs/commands/volume/snapshot-create.mdx +++ b/website/content/commands/volume/snapshot-create.mdx @@ -64,5 +64,5 @@ Completed snapshot of volume ebs_prod_db1 with snapshot ID snap-12345. 
[csi]: https://github.com/container-storage-interface/spec [csi_plugin]: /nomad/docs/job-specification/csi_plugin -[registered]: /nomad/docs/commands/volume/register -[csi_plugins_internals]: /nomad/docs/concepts/plugins/storage/csi +[registered]: /nomad/commands/volume/register +[csi_plugins_internals]: /nomad/docs/architecture/storage/csi diff --git a/website/content/docs/commands/volume/snapshot-delete.mdx b/website/content/commands/volume/snapshot-delete.mdx similarity index 92% rename from website/content/docs/commands/volume/snapshot-delete.mdx rename to website/content/commands/volume/snapshot-delete.mdx index 6478dc1c7..1b7fe956b 100644 --- a/website/content/docs/commands/volume/snapshot-delete.mdx +++ b/website/content/commands/volume/snapshot-delete.mdx @@ -45,5 +45,5 @@ Deleted snapshot snap-12345. [csi]: https://github.com/container-storage-interface/spec [csi_plugin]: /nomad/docs/job-specification/csi_plugin -[registered]: /nomad/docs/commands/volume/register -[csi_plugins_internals]: /nomad/docs/concepts/plugins/storage/csi +[registered]: /nomad/commands/volume/register +[csi_plugins_internals]: /nomad/docs/architecture/storage/csi diff --git a/website/content/docs/commands/volume/snapshot-list.mdx b/website/content/commands/volume/snapshot-list.mdx similarity index 95% rename from website/content/docs/commands/volume/snapshot-list.mdx rename to website/content/commands/volume/snapshot-list.mdx index a79ef60c2..48702249e 100644 --- a/website/content/docs/commands/volume/snapshot-list.mdx +++ b/website/content/commands/volume/snapshot-list.mdx @@ -63,5 +63,5 @@ snap-12345 vol-abcdef 50GiB 2021-01-03T12:15:02Z true [csi]: https://github.com/container-storage-interface/spec [csi_plugin]: /nomad/docs/job-specification/csi_plugin -[registered]: /nomad/docs/commands/volume/register -[csi_plugins_internals]: /nomad/docs/concepts/plugins/storage/csi +[registered]: /nomad/commands/volume/register +[csi_plugins_internals]: /nomad/docs/architecture/storage/csi diff --git a/website/content/docs/commands/volume/status.mdx b/website/content/commands/volume/status.mdx similarity index 98% rename from website/content/docs/commands/volume/status.mdx rename to website/content/commands/volume/status.mdx index 73b32a6da..bd7d7ad3f 100644 --- a/website/content/docs/commands/volume/status.mdx +++ b/website/content/commands/volume/status.mdx @@ -133,5 +133,5 @@ b00fa322 28be17d5 write csi 0 run [csi]: https://github.com/container-storage-interface/spec [csi_plugin]: /nomad/docs/job-specification/csi_plugin -[`volume create`]: /nomad/docs/commands/volume/create +[`volume create`]: /nomad/commands/volume/create [dhv]: /nomad/docs/other-specifications/volume/host diff --git a/website/content/docs/concepts/consensus.mdx b/website/content/docs/architecture/cluster/consensus.mdx similarity index 100% rename from website/content/docs/concepts/consensus.mdx rename to website/content/docs/architecture/cluster/consensus.mdx diff --git a/website/content/docs/concepts/architecture/federation.mdx b/website/content/docs/architecture/cluster/federation.mdx similarity index 88% rename from website/content/docs/concepts/architecture/federation.mdx rename to website/content/docs/architecture/cluster/federation.mdx index 196087d46..aa00cbd3d 100644 --- a/website/content/docs/concepts/architecture/federation.mdx +++ b/website/content/docs/architecture/cluster/federation.mdx @@ -57,13 +57,13 @@ the job on all the specified regions, removing the need for multiple job specification copies and registration on each region. 
Multiregion jobs do not provide regional failover in the event of failure. -[acl_policy]: /nomad/docs/concepts/acl/#policy -[acl_role]: /nomad/docs/concepts/acl/#role -[acl_auth_method]: /nomad/docs/concepts/acl/#auth-method -[acl_binding_rule]: /nomad/docs/concepts/acl/#binding-rule -[acl_token]: /nomad/docs/concepts/acl/#token -[node_pool]: /nomad/docs/concepts/node-pools +[acl_policy]: /nomad/docs/secure/acl/#policies +[acl_role]: /nomad/docs/secure/acl/#roles +[acl_auth_method]: /nomad/docs/secure/acl/#authentication-methods +[acl_binding_rule]: /nomad/docs/secure/acl/#binding-rules +[acl_token]: /nomad/docs/secure/acl/#tokens +[node_pool]: /nomad/docs/architecture/cluster/node-pools [namespace]: /nomad/docs/other-specifications/namespace [quota]: /nomad/docs/other-specifications/quota -[sentinel_policies]: /nomad/docs/enterprise/sentinel#sentinel-policies +[sentinel_policies]: /nomad/docs/reference/sentinel-policy [`multiregion`]: /nomad/docs/job-specification/multiregion diff --git a/website/content/docs/architecture/cluster/node-pools.mdx b/website/content/docs/architecture/cluster/node-pools.mdx new file mode 100644 index 000000000..d9993bb05 --- /dev/null +++ b/website/content/docs/architecture/cluster/node-pools.mdx @@ -0,0 +1,11 @@ +--- +layout: docs +page_title: Node pools +description: Nomad's node pools feature groups clients and segments infrastructure into logical units so that jobs have control over client allocation placement. Review node pool replication in multi-region clusters, built-in node pools, node pool patterns, and enterprise features such as scheduler configuration, node pool governance, and multi-region jobs. +--- + +# Node pools + +This page contains conceptual information about Nomad's node pools feature. + +@include 'node-pools.mdx' diff --git a/website/content/docs/concepts/cpu.mdx b/website/content/docs/architecture/cpu.mdx similarity index 100% rename from website/content/docs/concepts/cpu.mdx rename to website/content/docs/architecture/cpu.mdx diff --git a/website/content/docs/concepts/architecture/index.mdx b/website/content/docs/architecture/index.mdx similarity index 94% rename from website/content/docs/concepts/architecture/index.mdx rename to website/content/docs/architecture/index.mdx index 8087ff61a..a06acf316 100644 --- a/website/content/docs/concepts/architecture/index.mdx +++ b/website/content/docs/architecture/index.mdx @@ -28,7 +28,7 @@ Looking at only a single region, at a high level Nomad looks like this: [![Regional Architecture](/img/nomad-architecture-region.png)](/img/nomad-architecture-region.png) Within each region, we have both clients and servers. Servers are responsible for -accepting jobs from users, managing clients, and [computing task placements](/nomad/docs/concepts/scheduling/scheduling). +accepting jobs from users, managing clients, and [computing task placements](/nomad/docs/concepts/scheduling/how-scheduling-works). Each region may have clients from multiple datacenters, allowing a small number of servers to handle very large clusters. @@ -78,8 +78,8 @@ ensuring PCI compliant workloads run on appropriate servers. ## Getting in Depth This has been a brief high-level overview of the architecture of Nomad. There -are more details available for each of the sub-systems. The [consensus protocol](/nomad/docs/concepts/consensus), -[gossip protocol](/nomad/docs/concepts/gossip), and [scheduler design](/nomad/docs/concepts/scheduling/scheduling) +are more details available for each of the sub-systems. 
The [consensus protocol](/nomad/docs/architecture/cluster/consensus), +[gossip protocol](/nomad/docs/architecture/security/gossip), and [scheduler design](/nomad/docs/concepts/scheduling/how-scheduling-works) are all documented in more detail. For other details, either consult the code, [open an issue on diff --git a/website/content/docs/concepts/gossip.mdx b/website/content/docs/architecture/security/gossip.mdx similarity index 97% rename from website/content/docs/concepts/gossip.mdx rename to website/content/docs/architecture/security/gossip.mdx index efe8d15a5..2d1b698af 100644 --- a/website/content/docs/concepts/gossip.mdx +++ b/website/content/docs/architecture/security/gossip.mdx @@ -30,7 +30,7 @@ cross region requests. The integrated failure detection allows Nomad to gracefully handle an entire region losing connectivity, or just a single server in a remote region. Nomad also uses the gossip protocol to detect servers in the same region to perform automatic clustering via the [consensus -protocol](/nomad/docs/concepts/consensus). +protocol](/nomad/docs/architecture/cluster/consensus). To provide all these features, Nomad uses [Serf][serf] as an embedded library. From a user perspective, this is not important, since Nomad masks the diff --git a/website/content/docs/concepts/security.mdx b/website/content/docs/architecture/security/index.mdx similarity index 94% rename from website/content/docs/concepts/security.mdx rename to website/content/docs/architecture/security/index.mdx index 25e9ba848..6097b2812 100644 --- a/website/content/docs/concepts/security.mdx +++ b/website/content/docs/architecture/security/index.mdx @@ -26,7 +26,7 @@ features for multi-tenant deployments are offered exclusively in the enterprise version. This documentation may need to be adapted to your deployment situation, but the general mechanisms for a secure Nomad deployment revolve around: -- **[mTLS](/nomad/tutorials/transport-security/security-enable-tls)** Mutual +- **[mTLS](/nomad/docs/secure/traffic/tls)** Mutual authentication of both the TLS server and client x509 certificates prevents internal abuse by preventing unauthenticated access to network components within the cluster. @@ -34,7 +34,7 @@ but the general mechanisms for a secure Nomad deployment revolve around: - **[ACLs](/nomad/tutorials/access-control)** Enables authorization for authenticated connections by granting capabilities to ACL tokens. -- **[Namespaces](/nomad/tutorials/manage-clusters/namespaces)** Access to read +- **[Namespaces](/nomad/docs/govern/namespaces)** Access to read and write to a namespace can be controlled to allow for granular access to job information managed within a multi-tenant cluster. @@ -91,7 +91,7 @@ recommendations accordingly. ### Requirements -- [mTLS enabled](/nomad/tutorials/transport-security/security-enable-tls) +- [mTLS enabled](/nomad/docs/secure/traffic/tls) Mutual TLS (mTLS) enables [mutual authentication](https://en.wikipedia.org/wiki/Mutual_authentication) with security properties to prevent the following problems: @@ -142,7 +142,7 @@ recommendations accordingly. identity. This allows for access to capabilities within the cluster to be restricted to specific users. -- [Namespaces](/nomad/tutorials/manage-clusters/namespaces) This feature +- [Namespaces](/nomad/docs/govern/namespaces) This feature allows for a cluster to be shared by multiple teams within a company. 
Using this logical separation is important for multi-tenant clusters to prevent users without access to that namespace from conflicting with each other. @@ -171,7 +171,7 @@ environment. frequently is highly recommended to reduce damage of accidentally leaked credentials. - - Use [Vault](/nomad/docs/integrations/vault-integration) to create and manage + - Use [Vault](/nomad/docs/secure/vault) to create and manage dynamic, rotated credentials prevent secrets from being easily exposed within the [job specification](/nomad/docs/job-specification) itself which may be leaked into version control or otherwise be accidentally stored @@ -192,7 +192,7 @@ environment. Containers](https://katacontainers.io/). These types of runtimes provide sandboxing features which help prevent raw access to the underlying shared kernel for other containers and the Nomad client agent itself. Docker driver - allows [customizing runtimes](/nomad/docs/drivers/docker#runtime). + allows [customizing runtimes](/nomad/docs/job-declare/task-driver/docker#runtime). - **[Disable Unused Drivers](/nomad/docs/configuration/client#driver-denylist)** - Each driver provides different degrees of isolation, and bugs may allow @@ -204,9 +204,9 @@ environment. both the Nomad hosts and applied to containers for an extra layer of security. Seccomp profiles are able to be passed directly to containers using the - **[`security_opt`](/nomad/docs/drivers/docker#security_opt)** + **[`security_opt`](/nomad/docs/job-declare/task-driver/docker#security_opt)** parameter available in the default [Docker - driver](/nomad/docs/drivers/docker). + driver](/nomad/docs/job-declare/task-driver/docker). - **[Service Mesh](https://www.hashicorp.com/resources/service-mesh-microservices-networking)** - Integrating service mesh technologies such as @@ -274,7 +274,7 @@ The following are not part of the threat model for client agents: - **Access (read or write) to the Nomad configuration directory** - Access to a client's configuration file can enable and disable features for a client including insecure drivers such as - [`raw_exec`](/nomad/docs/drivers/raw_exec). + [`raw_exec`](/nomad/docs/job-declare/task-driver/raw_exec). - **Memory access to a running Nomad client agent** - Direct access to the memory of the Nomad client agent process allows an attack to extract secrets @@ -306,7 +306,7 @@ The following are not part of the threat model for client agents: and the backend configuration of these drivers should be considered to implement defense in depth. For example, a custom Docker driver that limits the ability to mount the host file system may be subverted by network access - to an exposed Docker daemon API through other means such as the [`raw_exec`](/nomad/docs/drivers/raw_exec) + to an exposed Docker daemon API through other means such as the [`raw_exec`](/nomad/docs/job-declare/task-driver/raw_exec) driver. ### External Threats @@ -330,7 +330,7 @@ There are two main components to consider to for external threats in a Nomad clu |----------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------| | **4646** / TCP | All | [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) to provide [UI](/nomad/tutorials/web-ui/web-ui-access) and [API](/nomad/api-docs) access to agents. | | **4647** / TCP | All | [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call) protocol used by agents. 
| -| **4648** / TCP + UDP | Servers | [gossip](/nomad/docs/concepts/gossip) protocol to manage server membership using [Serf][serf]. | +| **4648** / TCP + UDP | Servers | [gossip](/nomad/docs/architecture/security/gossip) protocol to manage server membership using [Serf][serf]. | [api_metrics]: /nomad/api-docs/metrics @@ -338,5 +338,5 @@ There are two main components to consider to for external threats in a Nomad clu [Variables]: /nomad/docs/concepts/variables [verify_https_client]: /nomad/docs/configuration/tls#verify_https_client [serf]: https://github.com/hashicorp/serf -[Sentinel Policies]: /nomad/tutorials/governance-and-policy/sentinel -[Resource Quotas]: /nomad/tutorials/governance-and-policy/quotas +[Sentinel Policies]: /nomad/docs/govern/sentinel +[Resource Quotas]: /nomad/docs/govern/resource-quotas diff --git a/website/content/docs/concepts/plugins/storage/csi.mdx b/website/content/docs/architecture/storage/csi.mdx similarity index 100% rename from website/content/docs/concepts/plugins/storage/csi.mdx rename to website/content/docs/architecture/storage/csi.mdx diff --git a/website/content/docs/concepts/plugins/storage/host-volumes.mdx b/website/content/docs/architecture/storage/host-volumes.mdx similarity index 98% rename from website/content/docs/concepts/plugins/storage/host-volumes.mdx rename to website/content/docs/architecture/storage/host-volumes.mdx index 2fcb75fe2..f6f82f147 100644 --- a/website/content/docs/concepts/plugins/storage/host-volumes.mdx +++ b/website/content/docs/architecture/storage/host-volumes.mdx @@ -417,13 +417,13 @@ Plugin authors should consider these details when writing plugins. the host filesystem, or some external data store of your choosing, perhaps even Nomad variables. -[stateful-workloads]: /nomad/docs/operations/stateful-workloads#host-volumes +[stateful-workloads]: /nomad/docs/architecture/storage/stateful-workloads#host-volumes [plugin_dir]: /nomad/docs/configuration/client#host_volume_plugin_dir [volume specification]: /nomad/docs/other-specifications/volume/host [go-version]: https://pkg.go.dev/github.com/hashicorp/go-version#pkg-constants -[cli-create]: /nomad/docs/commands/volume/create +[cli-create]: /nomad/commands/volume/create [api-create]: /nomad/api-docs/volumes#create-dynamic-host-volume -[cli-delete]: /nomad/docs/commands/volume/delete +[cli-delete]: /nomad/commands/volume/delete [api-delete]: /nomad/api-docs/volumes#delete-dynamic-host-volume [mkdir_plugin]: /nomad/docs/other-specifications/volume/host#mkdir-plugin [host_volumes_dir]: /nomad/docs/configuration/client#host_volumes_dir diff --git a/website/content/docs/concepts/plugins/storage/index.mdx b/website/content/docs/architecture/storage/index.mdx similarity index 93% rename from website/content/docs/concepts/plugins/storage/index.mdx rename to website/content/docs/architecture/storage/index.mdx index 4c41ff77f..03da12190 100644 --- a/website/content/docs/concepts/plugins/storage/index.mdx +++ b/website/content/docs/architecture/storage/index.mdx @@ -28,5 +28,5 @@ Choose between two types of storage plugins: which clients have instances of a given plugin, so it can call on the plugin to mount the volume. 
-[dhv]: /nomad/docs/concepts/plugins/storage/host-volumes -[csi]: /nomad/docs/concepts/plugins/storage/csi +[dhv]: /nomad/docs/architecture/storage/host-volumes +[csi]: /nomad/docs/architecture/storage/csi diff --git a/website/content/docs/operations/stateful-workloads.mdx b/website/content/docs/architecture/storage/stateful-workloads.mdx similarity index 96% rename from website/content/docs/operations/stateful-workloads.mdx rename to website/content/docs/architecture/storage/stateful-workloads.mdx index f47f0f29c..b709296f0 100644 --- a/website/content/docs/operations/stateful-workloads.mdx +++ b/website/content/docs/architecture/storage/stateful-workloads.mdx @@ -124,7 +124,7 @@ security profile (due to needing to run as `privileged` containers in order to be able to mount volumes). The [Stateful Workloads with CSI -tutorial](/nomad/tutorials/stateful-workloads/stateful-workloads-csi-volumes) +tutorial](/nomad/docs/stateful-workloads/csi-volumes) and the [Nomad CSI demo repository](https://github.com/hashicorp/nomad/tree/main/demo/csi) offer guidance and examples on how to use CSI plugins with Nomad and include job files @@ -143,11 +143,11 @@ you can use host volumes for both local somewhat persistent storage and for highly persistent networked storage. Host volumes may be dynamic or static. Provision dynamic host volumes -with the [`volume create`](/nomad/docs/commands/volume/create) command or +with the [`volume create`](/nomad/commands/volume/create) command or API. [ACL policies](/nomad/docs/other-specifications/acl-policy#namespace-rules) allow delegation of control for storage within a namespace to Nomad Operators. The dynamic host volume [plugin -specification](/nomad/docs/concepts/plugins/storage/host-volumes) allows you to +specification](/nomad/docs/architecture/storage/host-volumes) allows you to develop plugins specific to your local storage environment. For example, in an on-prem cluster you could write a plugin to perform LVM thin-provisioning. @@ -155,7 +155,7 @@ You declare static host volumes in the Nomad agent's configuration file, and you must restart the Nomad client to reconfigure them. This makes static host volumes impractical if you frequently change your storage configuration. Furthermore, it might require coordination between different -[personas](/nomad/docs/concepts/security#personas) to configure and consume host +[personas](/nomad/docs/architecture/security#personas) to configure and consume host volumes. For example, a Nomad Administrator must modify Nomad's configuration file to add, update, and remove host volumes to make them available for consumption by Nomad Operators. 
Or, with networked host volumes, a Storage Administrator @@ -229,7 +229,7 @@ following resources: ### Allocations - Monitoring your allocations and their storage with [Nomad's event - stream](/nomad/tutorials/integrate-nomad/event-stream) + stream](/nomad/docs/monitor/event-stream) - [Best practices for cluster setup](/well-architected-framework/nomad/production-reference-architecture-vm-with-consul) @@ -248,8 +248,8 @@ following resources: ### Dynamic Host Volumes -- [Dynamic host volume plugins](/nomad/docs/concepts/plugins/storage/host-volumes) -- [Dynamic host volume tutorial](/nomad/tutorials/stateful-workloads/stateful-workloads-dynamic-host-volumes) +- [Dynamic host volume plugins](/nomad/docs/architecture/storage/host-volumes) +- [Dynamic host volume tutorial](/nomad/docs/stateful-workloads/dynamic-host-volumes) -[csi-concepts]: /nomad/docs/concepts/plugins/storage/csi -[csi-tutorial]: /nomad/tutorials/stateful-workloads/stateful-workloads-csi-volumes +[csi-concepts]: /nomad/docs/architecture/storage/csi +[csi-tutorial]: /nomad/docs/stateful-workloads/csi-volumes diff --git a/website/content/docs/concepts/acl/index.mdx b/website/content/docs/concepts/acl/index.mdx deleted file mode 100644 index 2da93992b..000000000 --- a/website/content/docs/concepts/acl/index.mdx +++ /dev/null @@ -1,122 +0,0 @@ ---- -layout: docs -page_title: Access Control List (ACL) -description: Learn how Nomad's Access Control List (ACL) security system uses tokens, policies, roles, and capabilities to control access to data and resources. --- - -# Access Control List (ACL) - -This page provides conceptual information about Nomad's Access Control List -(ACL) security system. At the highest level, Nomad's ACL system has tokens, -policies, roles, and capabilities. Additionally, Nomad's Single Sign-On (SSO) -ACL capabilities use auth methods and binding rules to restrict access. - -The Nomad [access control tutorials][] provide detailed information and -guidance on Nomad ACL system. - -## Policy - -Policies consist of a set of rules defining the capabilities or actions to be -granted. For example, a `readonly` policy might only grant the ability to list -and inspect running jobs, but not to submit new ones. No permissions are -granted by default, making Nomad a default-deny system. - -Each policy comprises one or more rules. The rules define the capabilities of a -Nomad ACL token for accessing the objects in a Nomad cluster, objects like -namespaces, node, agent, operator, quota, etc. For more information on writing -policies, see the [ACL policy reference doc][]. - -## Role - -Roles group one or more ACL policies into a container which can then be used to -generate ACL tokens for authorisation. This abstraction allows easier control -and updating of ACL permissions, particularly in larger, more diverse clusters. - -## Token - -Requests to Nomad are authenticated using a bearer token. Each ACL token has a -public Accessor ID which is used to name a token and a Secret ID which is used -to make requests to Nomad. The Secret ID is provided using a request header -(`X-Nomad-Token`) and is used to authenticate the caller. Tokens are either -management or client types. The `management` tokens are effectively "root" in -the system and can perform any operation. The `client` tokens are associated -with one or more ACL policies or roles which grant specific capabilities. - -When ACL tokens are created, they can be optionally marked as `Global`.
This -causes them to be created in the authoritative region and replicated to all -other regions. Otherwise, tokens are created locally in the region the request -was made and not replicated. `Local` tokens cannot be used for cross-region -requests since they are not replicated between regions. - -## Workload Identity - -Nomad allocations can receive workload identities in the form of a -[JSON Web Token (JWT)][jwt]. The -[Workload Identity concept page][workload identity] has more information on -this topic. - -## Auth Method - -Authentication methods dictate how Nomad should talk to SSO providers when a -user requests to authenticate using one. Currently, Nomad supports the [OpenID -Connect (OIDC)][oidc] SSO workflow which allows users to log in to Nomad via -applications such as [Auth0][auth0], [Okta][okta], and [Vault][vault], and -non-interactive login via externally-issued [JSON Web Tokens (JWT)][jwt]. - -Since both the `oidc` and `jwt` auth methods ultimately operate on JWTs as -bearer tokens, use the following to determine which method fits your use case: - -- **JWT** - - - Ideal for machine-oriented, headless login where an operator may have already - arranged for a valid JWT to be dropped on a VM or provided to a container. - - User or application performing the Nomad login must have a valid JWT - to begin login. - - Does not require browser interaction. - -- **OIDC** - - - Ideal for human-oriented, interactive login where an operator or administrator - may have deployed SSO widely and doesn't want to distribute Nomad ACL tokens - to every authorized user. - - User performing the Nomad login does not need a JWT. - - Requires browser interaction. - -## Binding Rule - -Binding rules provide a mapping between a Nomad user's SSO authorisation claims -and internal Nomad objects such as ACL Roles and ACL Policies. A binding rule -is directly related to a single auth method, and therefore only evaluated by -login attempts using that method. All binding rules mapped to an auth method -are evaluated during each login attempt. - - - Binding rules are evaluated in no specific order, and should there be an - overlap in their selectors or scope, a "sum" of all the binding rules will be - applied, thus the least granular binding rules will always override the more - granular ones, as long as they apply to the same auth method and identity. - - -A successful selector match between an SSO provider claim and a binding rule -will result in the generated ACL token having the identified ACL role or policy -assigned to it. If the `BindType` parameter is `management`, the ACL token -generated will be a `management` token, rather than a client token. This -matcher supersedes role or policy assignments, and therefore should be used -with caution. - -## Replication - -Multi-region federated clusters run replication process to replicate ACL -objects from the [authoritative region][]. The replication processes run on -each federated leader and replicate ACL policies, roles, auth methods, binding -rules, and token marked as `Global`. 
- -[access control tutorials]: /nomad/tutorials/access-control -[ACL policy reference doc]: /nomad/docs/other-specifications/acl-policy -[authoritative region]: /nomad/docs/configuration/server#authoritative_region -[jwt]: https://datatracker.ietf.org/doc/html/rfc7519 -[workload identity]: /nomad/docs/concepts/workload-identity -[oidc]: https://openid.net/connect/ -[auth0]: https://auth0.com/ -[okta]: https://www.okta.com/ -[vault]: https://www.vaultproject.io/ diff --git a/website/content/docs/concepts/filesystem.mdx b/website/content/docs/concepts/filesystem.mdx index 3129c588d..8e89adfa3 100644 --- a/website/content/docs/concepts/filesystem.mdx +++ b/website/content/docs/concepts/filesystem.mdx @@ -481,13 +481,13 @@ the volume, it is not possible to have `artifact`, `template`, or `dispatch_payload` blocks write to a volume. [artifacts]: /nomad/docs/job-specification/artifact -[csi volumes]: /nomad/docs/concepts/plugins/storage/csi +[csi volumes]: /nomad/docs/architecture/storage/csi [dispatch payloads]: /nomad/docs/job-specification/dispatch_payload [templates]: /nomad/docs/job-specification/template [`data_dir`]: /nomad/docs/configuration#data_dir [`ephemeral_disk`]: /nomad/docs/job-specification/ephemeral_disk [artifact]: /nomad/docs/job-specification/artifact -[chroot contents]: /nomad/docs/drivers/exec#chroot +[chroot contents]: /nomad/docs/deploy/task-driver/exec#chroot [filesystem isolation capability]: /nomad/docs/concepts/plugins/task-drivers#capabilities-capabilities-error [filesystem isolation mode]: #task-drivers-and-filesystem-isolation-modes [migrated]: /nomad/docs/job-specification/ephemeral_disk#migrate diff --git a/website/content/docs/concepts/index.mdx b/website/content/docs/concepts/index.mdx deleted file mode 100644 index f13be5647..000000000 --- a/website/content/docs/concepts/index.mdx +++ /dev/null @@ -1,12 +0,0 @@ ---- -layout: docs -page_title: Concepts -description: >- - Learn about Nomad's architecture, core concepts, and behavior. ---- - -# Nomad Concepts - -This section covers the core concepts of Nomad and explains the -technical details of how Nomad functions, its architecture, and -sub-systems. 
diff --git a/website/content/docs/concepts/job.mdx b/website/content/docs/concepts/job.mdx index 94fe05048..d73bb8cd0 100644 --- a/website/content/docs/concepts/job.mdx +++ b/website/content/docs/concepts/job.mdx @@ -176,10 +176,10 @@ configure them: [job-spec]: /nomad/docs/job-specification [job-spec-tutorial]: /nomad/tutorials/job-specifications [quickstart]: /nomad/tutorials/get-started/gs-deploy-job -[Schedulers]: /nomad/docs/schedulers +[Schedulers]: /nomad/docs/concepts/scheduling/schedulers [task-groups]: /nomad/docs/glossary#task-group [tasks]: /nomad/docs/glossary#task -[job-versions-guide]: /nomad/tutorials/manage-jobs/jobs-version -[compare-versions-section]: /nomad/tutorials/manage-jobs/jobs-version#compare-versions -[revert-version-section]: /nomad/tutorials/manage-jobs/jobs-version#revert-to-a-version -[clone-version-section]: /nomad/tutorials/manage-jobs/jobs-version#clone-a-version +[job-versions-guide]: /nomad/docs/job-run/versions +[compare-versions-section]: /nomad/docs/job-run/versions#compare-versions +[revert-version-section]: /nomad/docs/job-run/versions#revert-to-a-version +[clone-version-section]: /nomad/docs/job-run/versions#clone-a-version diff --git a/website/content/docs/concepts/scheduling/scheduling.mdx b/website/content/docs/concepts/scheduling/how-scheduling-works.mdx similarity index 98% rename from website/content/docs/concepts/scheduling/scheduling.mdx rename to website/content/docs/concepts/scheduling/how-scheduling-works.mdx index db1d24cf6..4dad4a986 100644 --- a/website/content/docs/concepts/scheduling/scheduling.mdx +++ b/website/content/docs/concepts/scheduling/how-scheduling-works.mdx @@ -1,10 +1,10 @@ --- layout: docs -page_title: Scheduling in Nomad +page_title: How Nomad job scheduling works description: Nomad implements job scheduling using jobs, nodes, allocations, and evaluations. Learn about job lifecycle and how the job scheduler generates the allocation plan that the server implements using a service, batch, system, sysbatch, or core scheduler. --- -# Scheduling in Nomad +# How Nomad job scheduling works This page provides conceptual information on how Nomad implements job scheduling using jobs, nodes, allocations, and evaluations. Learn about job lifecycle and how the job scheduler generates the allocation plan that the server implements using a service, batch, system, sysbatch, or core scheduler. diff --git a/website/content/docs/concepts/scheduling/index.mdx b/website/content/docs/concepts/scheduling/index.mdx index 7a538ecd7..5e47ea8d4 100644 --- a/website/content/docs/concepts/scheduling/index.mdx +++ b/website/content/docs/concepts/scheduling/index.mdx @@ -6,15 +6,17 @@ description: Nomad's scheduling component assigns jobs to client machines. Explo # Scheduling workloads -Scheduling workloads is the process of assigning tasks from jobs to client machines. +Scheduling workloads is the process of assigning tasks from jobs to client machines. It is one of Nomad's core functions. The design is heavily inspired by Google's work on both [Omega: flexible, scalable schedulers for large compute clusters][omega] and [Large-scale cluster management at Google with Borg][borg]. Refer to the links below for implementation details on scheduling in Nomad. -- [Scheduling Internals](/nomad/docs/concepts/scheduling/scheduling) - An overview of how the scheduler works. +- [How scheduling works](/nomad/docs/concepts/scheduling/how-scheduling-works) - An overview of how the scheduler works. 
- [Placement](/nomad/docs/concepts/scheduling/placement) - Explains how placements are computed and how they can be adjusted. -- [Preemption](/nomad/docs/concepts/scheduling/preemption) - Details of preemption, an advanced scheduler feature. +- [Preemption](/nomad/docs/concepts/scheduling/preemption) - Details of + preemption, an advanced scheduler feature. +- [Nomad job schedulers](/nomad/docs/concepts/scheduling/schedulers) - Explains the different types of job schedulers. [omega]: https://research.google.com/pubs/pub41684.html [borg]: https://research.google.com/pubs/pub43438.html diff --git a/website/content/docs/concepts/scheduling/placement.mdx b/website/content/docs/concepts/scheduling/placement.mdx index 633aef5f3..59cedd36c 100644 --- a/website/content/docs/concepts/scheduling/placement.mdx +++ b/website/content/docs/concepts/scheduling/placement.mdx @@ -112,7 +112,7 @@ Refer to the [Node Pools][concept_np] concept page for more information. ## Understanding Evaluation Status When a job cannot be immediately placed, Nomad creates a chain of evaluations -to manage placement. First, your initial evaluation may be marked complete even if +to manage placement. First, your initial evaluation may be marked complete even if placement fails. Then Nomad creates a blocked evaluation to retry placement when resources become available. Your job remains pending until Nomad places all allocations. @@ -120,12 +120,12 @@ For example, if your job has specific constraints that are not available, you get an initial completed evaluation and a blocked evaluation. Nomad tracks ongoing placement attempts until a node that fits your constraints is available. -To troubleshoot placement issues, use `nomad eval status `. Check the output +To troubleshoot placement issues, use `nomad eval status `. Check the output for placement failures and linked evaluations. [api_client_metadata]: /nomad/api-docs/client#update-dynamic-node-metadata [cli_node_meta]: /nomad/docs/commands/node/meta -[concept_np]: /nomad/docs/concepts/node-pools +[concept_np]: /nomad/docs/architecture/cluster/node-pools [config_client_meta]: /nomad/docs/configuration/client#meta [config_client_node_class]: /nomad/docs/configuration/client#node_class [config_client_node_pool]: /nomad/docs/configuration/client#node_pool @@ -133,6 +133,6 @@ for placement failures and linked evaluations. [job_affinity]: /nomad/docs/job-specification/affinity [job_constraint]: /nomad/docs/job-specification/constraint [job_spread]: /nomad/docs/job-specification/spread -[node attributes]: /nomad/docs/runtime/interpolation#node-attributes +[node attributes]: /nomad/docs/reference/runtime-variable-interpolation#node-attributes [spec_node_pool]: /nomad/docs/other-specifications/node-pool [spec_node_pool_sched_config]: /nomad/docs/other-specifications/node-pool#scheduler_config-parameters diff --git a/website/content/docs/schedulers.mdx b/website/content/docs/concepts/scheduling/schedulers.mdx similarity index 98% rename from website/content/docs/schedulers.mdx rename to website/content/docs/concepts/scheduling/schedulers.mdx index e3cda48c8..95373cd97 100644 --- a/website/content/docs/schedulers.mdx +++ b/website/content/docs/concepts/scheduling/schedulers.mdx @@ -1,10 +1,10 @@ --- layout: docs -page_title: Schedulers +page_title: Nomad job schedulers description: Learn how Nomad's service, batch, system, and system batch job schedulers enable flexible workloads.
--- -# Schedulers +# Nomad job schedulers This page provides conceptual information about Nomad service, batch, system, and system batch job schedulers. diff --git a/website/content/docs/concepts/stateful-deployments.mdx b/website/content/docs/concepts/stateful-deployments.mdx index 1dccfe153..21d6d35bf 100644 --- a/website/content/docs/concepts/stateful-deployments.mdx +++ b/website/content/docs/concepts/stateful-deployments.mdx @@ -2,7 +2,7 @@ layout: docs page_title: Stateful deployments description: |- - Learn how Nomad handles stateful deployments. Use a dynamic host volume for your stateful workload and bind an allocation to a volume ID with the sticky parameter. Learn how Nomad scales jobs with sticky volumes. + Learn how Nomad handles stateful deployments. Use a dynamic host volume for your stateful workload and bind an allocation to a volume ID with the sticky parameter. Learn how Nomad scales jobs with sticky volumes. --- # Stateful deployments @@ -10,7 +10,7 @@ description: |- Stateful deployments support only dynamic host volumes. For CSI volumes, use the -[`per_alloc` property](/nomad/docs/job-specification/volume#per_alloc), +[`per_alloc` property](/nomad/docs/job-specification/volume#per_alloc), which serves a similar purpose. @@ -73,11 +73,11 @@ to stop the job. Refer to the following Nomad pages for more information about stateful workloads and volumes: -- [Considerations for Stateful Workloads](/nomad/docs/operations/stateful-workloads) explores the options for persistent storage of workloads running in Nomad. -- The [Nomad volume specification][volumes] defines the schema for creating and registering volumes. +- [Considerations for Stateful Workloads](/nomad/docs/architecture/storage/stateful-workloads) explores the options for persistent storage of workloads running in Nomad. +- The [Nomad volume specification][volumes] defines the schema for creating and registering volumes. - The [job specification `volume` block](/nomad/docs/job-specification/volume) lets you configure a group that requires a specific volume from the cluster. - The [Stateful Workloads](/nomad/tutorials/stateful-workloads) tutorials explore techniques to run jobs that require access to persistent storage. - + [allocation]: /nomad/docs/glossary#allocation [delete]: /nomad/api-docs/volumes#delete-task-group-host-volume-claims [list]: /nomad/api-docs/volumes#list-task-group-host-volume-claims diff --git a/website/content/docs/concepts/variables.mdx b/website/content/docs/concepts/variables.mdx index 8c2842f75..414651485 100644 --- a/website/content/docs/concepts/variables.mdx +++ b/website/content/docs/concepts/variables.mdx @@ -241,14 +241,14 @@ implementation or the [Nomad Autoscaler][] High Availability implementation. 
[HashiCorp Consul]: https://www.consul.io/ [HashiCorp Vault]: https://www.vaultproject.io/ -[Key Management]: /nomad/docs/operations/key-management +[Key Management]: /nomad/docs/manage/key-management [ACL policy specification]: /nomad/docs/other-specifications/acl-policy [`template`]: /nomad/docs/job-specification/template#nomad-variables [workload identity]: /nomad/docs/concepts/workload-identity [Workload Associated ACL Policies]: /nomad/docs/concepts/workload-identity#workload-associated-acl-policies [ACL policy namespace rules]: /nomad/docs/other-specifications/acl-policy#namespace-rules [The Chubby Lock Service for Loosely-Coupled Distributed Systems]: https://research.google/pubs/pub27897/ -[`nomad var lock`]: /nomad/docs/commands/var +[`nomad var lock`]: /nomad/commands/var [Go Package]: https://pkg.go.dev/github.com/hashicorp/nomad/api [implementation]: https://github.com/hashicorp/nomad/blob/release/1.7.0/command/var_lock.go#L240 [Nomad Autoscaler]: https://github.com/hashicorp/nomad-autoscaler/blob/v0.4.0/command/agent.go#L392 diff --git a/website/content/docs/concepts/workload-identity.mdx b/website/content/docs/concepts/workload-identity.mdx index 38834ab21..63e2ab355 100644 --- a/website/content/docs/concepts/workload-identity.mdx +++ b/website/content/docs/concepts/workload-identity.mdx @@ -190,12 +190,12 @@ integration pages for more information. [jobspec_vault_ns]: /nomad/docs/job-specification/vault#namespace [jobspec_vault_role]: /nomad/docs/job-specification/vault#role [vault_role_agent_config]: /nomad/docs/configuration/vault#create_from_role -[plan applier]: /nomad/docs/concepts/scheduling/scheduling +[plan applier]: /nomad/docs/concepts/scheduling/how-scheduling-works [JSON Web Token (JWT)]: https://datatracker.ietf.org/doc/html/rfc7519 [Task Access to Variables]: /nomad/docs/concepts/variables#task-access-to-variables [List Services API]: /nomad/api-docs/services#list-services [Read Service API]: /nomad/api-docs/services#read-service [windows]: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/ [taskapi]: /nomad/api-docs/task-api -[consul_int]: /nomad/docs/integrations/consul-integration -[vault_int]: /nomad/docs/integrations/vault-integration +[consul_int]: /nomad/docs/secure/acl/consul +[vault_int]: /nomad/docs/secure/vault diff --git a/website/content/docs/configuration/acl.mdx b/website/content/docs/configuration/acl.mdx index 8dcce0d4d..6161d1d20 100644 --- a/website/content/docs/configuration/acl.mdx +++ b/website/content/docs/configuration/acl.mdx @@ -68,4 +68,4 @@ acl { [secure-guide]: /nomad/tutorials/access-control [authoritative-region]: /nomad/docs/configuration/server#authoritative_region -[Configure for multiple regions]: /nomad/tutorials/access-control/access-control-bootstrap#configure-for-multiple-regions +[Configure for multiple regions]: /nomad/docs/secure/acl/bootstrap#configure-for-multiple-regions diff --git a/website/content/docs/configuration/autopilot.mdx b/website/content/docs/configuration/autopilot.mdx index 350d75ef0..284e67c9e 100644 --- a/website/content/docs/configuration/autopilot.mdx +++ b/website/content/docs/configuration/autopilot.mdx @@ -14,7 +14,7 @@ Autopilot features in the `autopilot` block of a Nomad agent configuration. Enable dead server cleanup, redundancy zones, and custom upgrades. Disable upgrade migration. Tune Raft settings for stable server introduction. 
-Refer to the [Autopilot Guide](/nomad/tutorials/manage-clusters/autopilot) for +Refer to the [Autopilot Guide](/nomad/docs/manage/autopilot) for how to configure and use Autopilot. ```hcl diff --git a/website/content/docs/configuration/client.mdx b/website/content/docs/configuration/client.mdx index 73784f7a0..388af74aa 100644 --- a/website/content/docs/configuration/client.mdx +++ b/website/content/docs/configuration/client.mdx @@ -15,8 +15,8 @@ allocation directories, artifact and template behavior, networking, node pools, servers to join, garbage collection, workload behavior, client resources, chroot, host volumes, host network, and driver-specific behavior. -Refer to the [Set Server and Client Nodes](/nomad/docs/operations/nomad-agent) -and [Nomad Agent](/nomad/docs/commands/agent) pages to learn about the Nomad +Refer to the [Set Server and Client Nodes](/nomad/docs/deploy/nomad-agent) +and [Nomad Agent](/nomad/commands/agent) pages to learn about the Nomad agent process and how to configure the server and client nodes in your cluster. ```hcl @@ -248,7 +248,7 @@ client { ### `chroot_env` Parameters -On Linux, drivers based on [isolated fork/exec](/nomad/docs/drivers/exec) implement file system isolation using chroot. The `chroot_env` map lets you configure the chroot environment using source paths on the host operating system. +On Linux, drivers based on [isolated fork/exec](/nomad/docs/job-declare/task-driver/exec) implement file system isolation using chroot. The `chroot_env` map lets you configure the chroot environment using source paths on the host operating system. The mapping format is: @@ -276,7 +276,7 @@ client { @include 'chroot-limitations.mdx' When `chroot_env` is unspecified, the `exec` driver uses a default chroot -environment with the most commonly used parts of the operating system. Refer to the [Nomad `exec` driver documentation](/nomad/docs/drivers/exec#chroot) for +environment with the most commonly used parts of the operating system. Refer to the [Nomad `exec` driver documentation](/nomad/docs/deploy/task-driver/exec#chroot) for the full list. Nomad never attempts to embed the `alloc_dir` in the chroot as doing so would cause infinite recursion. @@ -288,7 +288,7 @@ Refer to the [plugin block][plugin-block] documentation for more information. The following is not an exhaustive list of options for only the Nomad client. To find the options supported by each individual Nomad driver, -refer to the [drivers documentation](/nomad/docs/drivers). +refer to the [drivers documentation](/nomad/docs/job-declare/task-driver). - `"driver.allowlist"` `(string: "")` - Specifies a comma-separated list of allowlisted drivers. 
If specified, drivers not in the allowlist will be @@ -830,17 +830,17 @@ client { [plugin-block]: /nomad/docs/configuration/plugin [server-join]: /nomad/docs/configuration/server_join 'Server Join' [metadata_constraint]: /nomad/docs/job-specification/constraint#user-specified-metadata 'Nomad User-Specified Metadata Constraint Example' -[runtime_var_interpolation]: /nomad/docs/runtime/interpolation -[task working directory]: /nomad/docs/runtime/environment#task-directories 'Task directories' +[runtime_var_interpolation]: /nomad/docs/reference/runtime-variable-interpolation +[task working directory]: /nomad/docs/reference/runtime-environment-settings#task-directories 'Task directories' [go-sockaddr/template]: https://pkg.go.dev/github.com/hashicorp/go-sockaddr/template [landlock]: https://docs.kernel.org/userspace-api/landlock.html [`leave_on_interrupt`]: /nomad/docs/configuration#leave_on_interrupt [`leave_on_terminate`]: /nomad/docs/configuration#leave_on_terminate [migrate]: /nomad/docs/job-specification/migrate -[`nomad node drain -self -no-deadline`]: /nomad/docs/commands/node/drain +[`nomad node drain -self -no-deadline`]: /nomad/commands/node/drain [`TimeoutStopSec`]: https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStopSec= [top_level_data_dir]: /nomad/docs/configuration#data_dir [unveil]: /nomad/docs/concepts/plugins/task-drivers#fsisolation-unveil [dynamic host volumes]: /nomad/docs/other-specifications/volume/host -[`volume create`]: /nomad/docs/commands/volume/create -[`volume register`]: /nomad/docs/commands/volume/register +[`volume create`]: /nomad/commands/volume/create +[`volume register`]: /nomad/commands/volume/register diff --git a/website/content/docs/configuration/consul.mdx b/website/content/docs/configuration/consul.mdx index 3f5b27fd1..970c86aad 100644 --- a/website/content/docs/configuration/consul.mdx +++ b/website/content/docs/configuration/consul.mdx @@ -429,7 +429,7 @@ for Nomad clients in the same partition. 
[consul]: https://www.consul.io/ 'Consul by HashiCorp' -[bootstrap]: /nomad/tutorials/manage-clusters/clustering 'Automatic Bootstrapping' +[bootstrap]: /nomad/docs/deploy/clusters/connect-nodes 'Automatic Bootstrapping' [go-sockaddr/template]: https://pkg.go.dev/github.com/hashicorp/go-sockaddr/template [grpc_port]: /consul/docs/agent/config/config-files#grpc_port [grpctls_port]: /consul/docs/agent/config/config-files#grpc_tls_port diff --git a/website/content/docs/configuration/keyring/index.mdx b/website/content/docs/configuration/keyring/index.mdx index b2cdef9bd..a708548cd 100644 --- a/website/content/docs/configuration/keyring/index.mdx +++ b/website/content/docs/configuration/keyring/index.mdx @@ -104,7 +104,7 @@ keyring "awskms" { [variables]: /nomad/docs/concepts/variables [workload identities]: /nomad/docs/concepts/workload-identity -[Key Management]: /nomad/docs/operations/key-management -[key rotation]: /nomad/docs/operations/key-management#key-rotation -[keyring_rotate_cmd]: /nomad/docs/commands/operator/root/keyring-rotate -[keyring_list_cmd]: /nomad/docs/commands/operator/root/keyring-list +[Key Management]: /nomad/docs/manage/key-management +[key rotation]: /nomad/docs/manage/key-management#key-rotation +[keyring_rotate_cmd]: /nomad/commands/operator/root/keyring-rotate +[keyring_list_cmd]: /nomad/commands/operator/root/keyring-list diff --git a/website/content/docs/configuration/reporting.mdx b/website/content/docs/configuration/reporting.mdx index 1ef8e8cd6..0a2a2bf60 100644 --- a/website/content/docs/configuration/reporting.mdx +++ b/website/content/docs/configuration/reporting.mdx @@ -47,4 +47,4 @@ reporting { [server_mode_enabled]: /nomad/docs/configuration/server#enabled [automated_license_utilization_reporting]: /nomad/docs/enterprise/license/utilization-reporting -[`nomad operator utilization`]: /nomad/docs/commands/operator/utilization +[`nomad operator utilization`]: /nomad/commands/operator/utilization diff --git a/website/content/docs/configuration/sentinel.mdx b/website/content/docs/configuration/sentinel.mdx index 227d57fad..8789e0867 100644 --- a/website/content/docs/configuration/sentinel.mdx +++ b/website/content/docs/configuration/sentinel.mdx @@ -42,10 +42,10 @@ Refer to these resources for details on using Sentinel policies with Nomad: - [Governance and policy on - Nomad](/nomad/tutorials/governance-and-policy/governance-and-policy) -- [Sentinel policies](/nomad/tutorials/governance-and-policy/sentinel) + Nomad](/nomad/docs/govern) +- [Sentinel policies](/nomad/docs/govern/sentinel) - [Sentinel policy - reference](https://developer.hashicorp.com/nomad/docs/enterprise/sentinel) + reference](https://developer.hashicorp.com/nomad/docs/reference/sentinel-policy) diff --git a/website/content/docs/configuration/server.mdx b/website/content/docs/configuration/server.mdx index c27150a43..7cc6526dc 100644 --- a/website/content/docs/configuration/server.mdx +++ b/website/content/docs/configuration/server.mdx @@ -208,7 +208,7 @@ server { when raising this value is that during network partitions or other events (server crash) where a leader is lost, Nomad will not elect a new leader for a longer period of time than the default. The [`nomad.nomad.leader.barrier` and - `nomad.raft.leader.lastContact` metrics](/nomad/docs/operations/metrics-reference) are a good + `nomad.raft.leader.lastContact` metrics](/nomad/docs/reference/metrics) are a good indicator of how often leader elections occur and Raft latency. 
- `raft_snapshot_threshold` `(int: "8192")` - Specifies the minimum number of @@ -232,7 +232,7 @@ server { - `redundancy_zone` `(string: "")` - (Enterprise-only) Specifies the redundancy zone that this server will be a part of for Autopilot management. For more - information, refer to the [Autopilot Guide](/nomad/tutorials/manage-clusters/autopilot). + information, refer to the [Autopilot Guide](/nomad/docs/manage/autopilot). - `rejoin_after_leave` `(bool: false)` - Specifies if Nomad will ignore a previous leave and attempt to rejoin the cluster when starting. By default, @@ -267,7 +267,7 @@ server { - `upgrade_version` `(string: "")` - A custom version of the format X.Y.Z to use in place of the Nomad version when custom upgrades are enabled in Autopilot. - For more information, refer to the [Autopilot Guide](/nomad/tutorials/manage-clusters/autopilot). + For more information, refer to the [Autopilot Guide](/nomad/docs/manage/autopilot). - `search` ([search][search]: nil) - Specifies configuration parameters for the Nomad search API. @@ -383,7 +383,7 @@ server { The Nomad servers can automatically bootstrap if Consul is configured. For a more detailed explanation, refer to the -[automatic Nomad bootstrapping documentation](/nomad/tutorials/manage-clusters/clustering). +[automatic Nomad bootstrapping documentation](/nomad/docs/deploy/clusters/connect-nodes). ### Restricting Schedulers @@ -519,18 +519,18 @@ election was due to a datacenter-wide failure affecting Clients, it will be 30 minutes before Nomad recognizes that they are `down` and replaces their work. -[encryption]: /nomad/tutorials/transport-security/security-gossip-encryption 'Nomad Encryption Overview' +[encryption]: /nomad/docs/secure/traffic/gossip-encryption 'Nomad Encryption Overview' [server-join]: /nomad/docs/configuration/server_join 'Server Join' [update-scheduler-config]: /nomad/api-docs/operator/scheduler#update-scheduler-configuration 'Scheduler Config' [bootstrapping a cluster]: /nomad/docs/faq#bootstrapping [rfc4648]: https://tools.ietf.org/html/rfc4648#section-5 -[monitoring_nomad_progress]: /nomad/docs/operations/monitoring-nomad#progress -[`nomad operator gossip keyring generate`]: /nomad/docs/commands/operator/gossip/keyring-generate +[monitoring_nomad_progress]:/nomad/docs/monitor#progress +[`nomad operator gossip keyring generate`]: /nomad/commands/operator/gossip/keyring-generate [search]: /nomad/docs/configuration/search -[encryption key]: /nomad/docs/operations/key-management +[encryption key]: /nomad/docs/manage/key-management [disconnect.lost_after]: /nomad/docs/job-specification/disconnect#lost_after [herd]: https://en.wikipedia.org/wiki/Thundering_herd_problem [wi]: /nomad/docs/concepts/workload-identity -[Configure for multiple regions]: /nomad/tutorials/access-control/access-control-bootstrap#configure-for-multiple-regions +[Configure for multiple regions]: /nomad/docs/secure/acl/bootstrap#configure-for-multiple-regions [top_level_data_dir]: /nomad/docs/configuration#data_dir [JWKS URL]: /nomad/api-docs/operator/keyring#list-active-public-keys diff --git a/website/content/docs/configuration/telemetry.mdx b/website/content/docs/configuration/telemetry.mdx index 62ea073c9..9f165bc66 100644 --- a/website/content/docs/configuration/telemetry.mdx +++ b/website/content/docs/configuration/telemetry.mdx @@ -23,7 +23,7 @@ telemetry { This section of the documentation only covers the configuration options for the `telemetry` block. 
To understand the architecture and metrics themselves, -refer to the [Telemetry guide](/nomad/docs/operations/monitoring-nomad). +refer to the [Telemetry guide](/nomad/docs/monitor). ## `telemetry` Parameters diff --git a/website/content/docs/configuration/tls.mdx b/website/content/docs/configuration/tls.mdx index 2ecd4257b..4bdaaf231 100644 --- a/website/content/docs/configuration/tls.mdx +++ b/website/content/docs/configuration/tls.mdx @@ -29,7 +29,7 @@ tls { start the Nomad agent. This section of the documentation only covers the configuration options for the `tls` block. To understand how to setup the certificates themselves, refer to -the [Enable TLS Encryption for Nomad Tutorial](/nomad/tutorials/transport-security/security-enable-tls). +the [Enable TLS Encryption for Nomad Tutorial](/nomad/docs/secure/traffic/tls). ## `tls` Parameters diff --git a/website/content/docs/configuration/vault.mdx b/website/content/docs/configuration/vault.mdx index b3b7a71ba..d0d9a7500 100644 --- a/website/content/docs/configuration/vault.mdx +++ b/website/content/docs/configuration/vault.mdx @@ -231,7 +231,7 @@ can be accomplished by sending the process a `SIGHUP` signal. [`vault.cluster`]: /nomad/docs/job-specification/vault#cluster [jobspec_vault_role]: /nomad/docs/job-specification/vault#role [jobspec_identity]: /nomad/docs/job-specification/identity -[nomad-vault]: /nomad/docs/integrations/vault-integration 'Nomad Vault Integration' +[nomad-vault]: /nomad/docs/secure/vault 'Nomad Vault Integration' [taskuser]: /nomad/docs/job-specification/task#user "Nomad task Block" [vault]: https://www.vaultproject.io/ 'Vault by HashiCorp' [vault_bound_aud]: /vault/api-docs/auth/jwt#bound_audiences diff --git a/website/content/docs/deploy/clusters/connect-nodes.mdx b/website/content/docs/deploy/clusters/connect-nodes.mdx new file mode 100644 index 000000000..5d1f406ae --- /dev/null +++ b/website/content/docs/deploy/clusters/connect-nodes.mdx @@ -0,0 +1,226 @@ +--- +layout: docs +page_title: Connect nodes into a cluster +description: |- + Connect nodes together to create a Nomad cluster manually + or automatically with cloud auto-join on AWS, Azure, and GCP. +--- + +# Connect nodes into a cluster + +To create a Nomad cluster out of individual nodes, you need to introduce them +to one another. There are several ways to do this: + +- Manually +- Cloud Auto-Join +- Consul + +This page describes each method and provides configuration snippets, which +you can use as starting points for your own configuration. + +## Manual clustering + +Manually bootstrapping a Nomad cluster does not rely on additional tooling, but +does require operator participation in the cluster formation process. When +bootstrapping, Nomad servers and clients must be started and provided with the +address of at least one Nomad server. + +This creates a chicken-and-egg problem where one server must +first be fully bootstrapped and configured before the remaining servers and +clients can join the cluster. This requirement can add additional provisioning +time as well as ordered dependencies during provisioning. + +First, bootstrap a single Nomad server and capture its IP address. Once you +have that node's IP address, place it in the configuration of the remaining servers and clients.
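+The first server you bootstrap does not need a `server_join` block, because the
+other nodes join it. The following is a minimal sketch of that initial server's
+configuration, assuming a three-server cluster; the file name is only an
+illustration:
+
+```hcl
+# first-server.hcl (hypothetical file name)
+server {
+  enabled = true
+
+  # Do not elect a leader until three servers have joined the region.
+  bootstrap_expect = 3
+}
+```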
+ +For Nomad servers, this configuration may look something like this: + +```hcl +server { + enabled = true + bootstrap_expect = 3 + + # This is the IP address of the first server provisioned + server_join { + retry_join = ["<known-address>:4648"] + } +} +``` + +Alternatively, you can supply a server's address after the servers have all been +started by running the [`server join` command] on the servers individually to +cluster the servers. All servers can join one other server, and then rely on the +gossip protocol to discover the rest. + +```shell-session +$ nomad server join <known-address> +``` + +For Nomad clients, the configuration may look something like: + +```hcl +client { + enabled = true + server_join { + retry_join = ["<known-address>:4647"] + } +} +``` + +The client node's server list can be updated at run time using the +[`node config` command]. + +```shell-session +$ nomad node config -update-servers <IP>:4647 +``` + +The port corresponds to the RPC port. If no port is specified with the IP +address, the default RPC port of `4647` is assumed. + +As servers are added or removed from the cluster, this information is pushed to +the client. This means only one server must be specified because, after initial +contact, the full set of servers in the client's region are shared with the +client. + +## Join nodes using cloud auto-join + +As of Nomad 0.8.4, [`retry_join`] accepts a unified interface using the +[go-discover] library for doing automatic cluster joining using cloud metadata. +To use retry-join with a supported cloud provider, specify the configuration on +the command line or configuration file as a `key=value key=value ...` string. +Values are taken literally and must not be URL encoded. If the values contain +spaces, backslashes, or double quotes, they need to be double quoted and the usual +escaping rules apply. + +```json +{ + "retry_join": ["provider=my-cloud config=val config2=\"some other val\" ..."] +} +``` + +Consult the [cloud provider-specific configurations] in the cloud-autojoin +documentation. This can be combined with static IP or DNS addresses or even +multiple configurations for different providers. In order to use discovery +behind a proxy, you will need to set `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` +environment variables per [Golang `net/http` library]. + +## Use Consul to automatically cluster nodes + +To automatically bootstrap a Nomad cluster, Nomad can leverage another HashiCorp +open source tool, [Consul]. Bootstrapping Nomad is easiest against an existing +Consul cluster. The Nomad servers and clients will become informed of each +other's existence when the Consul agent is installed and configured on each +host. As an added benefit, integrating Consul with Nomad provides service and +health check registration for applications which later run under Nomad. + +Consul models infrastructure as datacenters and multiple Consul datacenters can +be connected over the WAN so that clients can discover nodes in other +datacenters. Since Nomad regions can encapsulate many datacenters, you should be +running a Consul cluster in every Nomad region and connecting them over the +WAN. Refer to the Consul tutorial for both [bootstrapping] a single +datacenter and [connecting multiple Consul clusters over the WAN]. + +If a Consul agent is installed on the host prior to Nomad starting, the Nomad +agent will register with Consul and discover other nodes. + +For servers, you must inform the cluster how many servers you expect to have.
+This is required to form the initial quorum, since Nomad is unaware of how many +peers to expect. For example, to form a region with three Nomad servers, you +would use the following Nomad configuration file: + +```hcl +# /etc/nomad.d/server.hcl + +# data_dir tends to be environment specific. +data_dir = "/opt/nomad/data" + +server { + enabled = true + bootstrap_expect = 3 +} +``` + +This configuration would be saved to disk and then run: + +```shell-session +$ nomad agent -config=/etc/nomad.d/server.hcl +``` + +A similar configuration is available for Nomad clients: + +```hcl +# /etc/nomad.d/client.hcl + +datacenter = "dc1" + +# data_dir tends to be environment specific. +data_dir = "/opt/nomad/data" + +client { + enabled = true +} +``` + +The agent is started in a similar manner: + +```shell-session +$ sudo nomad agent -config=/etc/nomad.d/client.hcl +``` + +Nomad clients should always run as root (or with `sudo`). The above +configurations include no IP or DNS addresses between the clients and +servers. This is because Nomad detected the existence of Consul and utilized +service discovery to form the cluster. + +### Consul auto-join internals + +~> This section discusses the internals of the Consul and Nomad integration at a +very high level. Reading is only recommended for those curious to the +implementation. + +As discussed in the previous section, Nomad merges multiple configuration files +together, so the `-config` may be specified more than once: + +```shell-session +$ nomad agent -config=base.hcl -config=server.hcl +``` + +In addition to merging configuration on the command line, Nomad also maintains +its own internal configurations (called "default configs") which include reasonable +base defaults. One of those default configurations includes a "consul" block, +which specifies reasonable defaults for connecting to and integrating with Consul. In +essence, this configuration file resembles the following: + +```hcl +# You do not need to add this to your configuration file. This is an example +# that is part of Nomad's internal default configuration for Consul integration. +consul { + # The address to the Consul agent. + address = "127.0.0.1:8500" + + # The service name to register the server and client with Consul. + server_service_name = "nomad" + client_service_name = "nomad-client" + + # Enables automatically registering the services. + auto_advertise = true + + # Enabling the server and client to bootstrap using Consul. + server_auto_join = true + client_auto_join = true +} +``` + +Refer to the [`consul` stanza] documentation for the complete set of configuration +options. 
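+If your Consul agent does not listen on the default address, you only need to
+override the values that differ from these defaults. The following is a minimal
+sketch; the file name and the non-default port are assumptions for illustration:
+
+```hcl
+# /etc/nomad.d/consul.hcl (example file name)
+# Nomad merges this block with its internal defaults for the consul block,
+# so only the overridden setting needs to be declared.
+consul {
+  address = "127.0.0.1:8501"
+}
+```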
+ +[`consul` stanza]: /nomad/docs/configuration/consul +[`node config` command]: /nomad/commands/node/config +[`retry_join`]: /nomad/docs/configuration/server_join#retry_join +[`server join` command]: /nomad/commands/server/join +[bootstrapping]: /consul/docs/deploy/server/vm/bootstrap +[cloud provider-specific configurations]: /nomad/docs/configuration/server_join#cloud-auto-join +[connecting multiple consul clusters over the wan]: /consul/docs/east-west/wan-federation +[consul]: /consul/ +[go-discover]: https://github.com/hashicorp/go-discover +[golang `net/http` library]: https://golang.org/pkg/net/http/#ProxyFromEnvironment diff --git a/website/content/docs/deploy/clusters/federate-regions.mdx b/website/content/docs/deploy/clusters/federate-regions.mdx new file mode 100644 index 000000000..2c23a003d --- /dev/null +++ b/website/content/docs/deploy/clusters/federate-regions.mdx @@ -0,0 +1,148 @@ +--- +layout: docs +page_title: Federate multi-region clusters +description: |- + Set up multi-region federation to allow job submissions and API + interactions from any server in any region. +--- + +# Federate multi-region clusters + +Nomad operates at a regional level and provides first-class support for +federation. Federation enables users to submit jobs or interact with the HTTP +API targeting any region, from any server, even if that server resides in a +different region. + +Federating multiple Nomad clusters requires network connectivity between the +clusters. Servers in each cluster must be able to communicate over [RPC and +Serf][ports-used]. Federated clusters are expected to communicate over WANs, so +they do not need the same low latency as servers within a region. + +Once Nomad servers are able to connect over the network, you can issue the +[nomad server join][server-join] command from any server in one region to a +server in a remote region to federate the clusters. + +[![Multi-Region][multi-region-pic]][multi-region-pic] + +## Prerequisites + +To perform the tasks described in this guide, you need to have two Nomad +environments with ports 4646, 4647, and 4648 exposed. You can use this +[Terraform environment][nomad-tf] to provision the sandbox environments. This +guide assumes two clusters with one server node and two client nodes in each +cluster. While the Terraform code already opens port 4646, you will also need to +expose ports 4647 and 4648 on the server you wish to run [nomad server +join][server-join] against (consult the [Nomad Port Requirements][ports-used] +documentation for more information). + + + + This tutorial is for demo purposes and only assumes a single server +node in each cluster. Consult the [reference architecture][reference-arch] for +production configuration. + + + +## Verify current regions + +Currently, each of your clusters is in the default `global` +[region][region-config]. You can verify this by running [nomad server +members][nomad-server-members] on any node in each of your clusters: + +```shell-session +$ nomad server members +Name Address Port Status Leader Protocol Build Datacenter Region +ip-172-31-29-34.global 172.31.29.34 4648 alive true 2 0.10.1 dc1 global +``` + +## Change the regions + +Respectively change the region of your individual clusters into `west` and +`east` by adding the [region][region-config] parameter into the agent +configuration on the servers and clients (if you are using the provided sandbox +environment, this configuration is located at `/etc/nomad.d/nomad.hcl`). 
+ +Below is a snippet of the configuration file showing the required change on a +node for one of the clusters (remember to change this value to `east` on the +servers and clients in your other cluster): + +```hcl +data_dir = "/opt/nomad/data" +bind_addr = "0.0.0.0" +region = "west" +# ... +``` + +Once you have made the necessary changes for each cluster, restart the nomad +service on each node: + +```shell-session +$ sudo systemctl restart nomad +``` + +Re-run the `nomad server members` command on any node in the cluster to verify +that your server is configured to be in the correct region. The output below +is from running the command in the `west` region (be sure to run this command +in your other cluster to verify it is in the `east` region): + +```shell-session +$ nomad server members +Name Address Port Status Leader Protocol Build Datacenter Region +ip-172-31-29-34.west 172.31.29.34 4648 alive true 2 0.10.1 dc1 west +``` + +## Federate the regions + +Run the [`nomad server join`][server-join] command from a server in one cluster +and supply it the IP address of the server in your other cluster while +specifying port 4648. + +Below is an example of running the `nomad server join` command from the server +in the `west` region while targeting the server in the `east` region: + +```shell-session +$ nomad server join 172.31.26.138:4648 +Joined 1 servers successfully +``` + +## Verify the clusters have been federated + +After you have federated your clusters, the output from the `nomad server members` command will show the servers from both regions: + +```shell-session +$ nomad server members +Name Address Port Status Leader Protocol Build Datacenter Region +ip-172-31-26-138.east 172.31.26.138 4648 alive true 2 0.10.1 dc1 east +ip-172-31-29-34.west 172.31.29.34 4648 alive true 2 0.10.1 dc1 west +``` + +## Check job status in remote cluster + +From the Nomad cluster in the `west` region, try to run the [`nomad status`][nomad-status] command to check the status of jobs in the `east` region: + +```shell-session +$ nomad status -region="east" +No running jobs +``` + +If your regions were not federated properly, you will receive the following +output: + +```shell-session +$ nomad status -region="east" +Error querying jobs: Unexpected response code: 500 (No path to region) +``` + +## Learn more about federation + +- [Deployment Topology across Multiple Regions][multi-region] + +[multi-region]: /nomad/tutorials/enterprise/production-reference-architecture-vm-with-consul#deployment-topology-across-multiple-regions +[multi-region-pic]: /img/clusters/nomad-multi-region.png +[nomad-server-members]: /nomad/commands/server/members +[nomad-status]: /nomad/commands/status +[nomad-tf]: https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud +[ports-used]: /nomad/docs/deploy/production/requirements#ports-used +[reference-arch]: /nomad/tutorials/enterprise/production-reference-architecture-vm-with-consul +[region-config]: /nomad/docs/configuration#region +[server-join]: /nomad/commands/server/join diff --git a/website/content/docs/operations/federation/index.mdx b/website/content/docs/deploy/clusters/federation-considerations.mdx similarity index 87% rename from website/content/docs/operations/federation/index.mdx rename to website/content/docs/deploy/clusters/federation-considerations.mdx index b320df66b..a3ead6788 100644 --- a/website/content/docs/operations/federation/index.mdx +++ b/website/content/docs/deploy/clusters/federation-considerations.mdx @@
-1,11 +1,11 @@ --- layout: docs -page_title: Federated cluster operations +page_title: Multi-region federation operational considerations description: |- Review operational considerations for running Nomad multi-region federated clusters as well as instructions for migrating the authoritative region to a federated region. --- -# Federated cluster operations +# Multi-region federation operational considerations This page lists operational considerations for running multi-region federated clusters as well as instructions for migrating the authoritative region to a @@ -57,8 +57,8 @@ When operating multi-region federated Nomad clusters, consider the following: * **Can federated regions be bootstrapped while the authoritative region is down?** No they cannot. -[`nomad operator snapshot save`]: /nomad/docs/commands/operator/snapshot/save -[`nomad operator snapshot agent`]: /nomad/docs/commands/operator/snapshot/agent -[`nomad operator snapshot restore`]: /nomad/docs/commands/operator/snapshot/restore -[failure_scenarios]: /nomad/docs/operations/federation/failure +[`nomad operator snapshot save`]: /nomad/commands/operator/snapshot/save +[`nomad operator snapshot agent`]: /nomad/commands/operator/snapshot/agent +[`nomad operator snapshot restore`]: /nomad/commands/operator/snapshot/restore +[failure_scenarios]: /nomad/docs/deploy/clusters/federation-failure-scenarios [`authoritative_region`]: /nomad/docs/configuration/server#authoritative_region diff --git a/website/content/docs/operations/federation/failure.mdx b/website/content/docs/deploy/clusters/federation-failure-scenarios.mdx similarity index 100% rename from website/content/docs/operations/federation/failure.mdx rename to website/content/docs/deploy/clusters/federation-failure-scenarios.mdx diff --git a/website/content/docs/deploy/clusters/index.mdx b/website/content/docs/deploy/clusters/index.mdx new file mode 100644 index 000000000..fc7eef29e --- /dev/null +++ b/website/content/docs/deploy/clusters/index.mdx @@ -0,0 +1,15 @@ +--- +layout: docs +page_title: Nomad clusters +description: |- + This section contains content for creating and federating Nomad clusters. Connect nodes into a cluster, federate multi-region clusters, and configure a web UI reverse proxy. Review multi-region federation considerations and failure scenarios. Implement load balancing using Fabio, HAProxy, NGINX, or Traefik. Manage external traffic with application load balancing. + +--- + +# Nomad clusters + +This section contains content for creating and federating Nomad clusters. +Connect nodes into a cluster, federate multi-region clusters, and configure a +web UI reverse proxy. Review multi-region federation considerations and +failure scenarios. Implement load balancing using Fabio, HAProxy, NGINX, or +Traefik. Manage external traffic with application load balancing. diff --git a/website/content/docs/deploy/clusters/reverse-proxy-ui.mdx b/website/content/docs/deploy/clusters/reverse-proxy-ui.mdx new file mode 100644 index 000000000..efd7f73e2 --- /dev/null +++ b/website/content/docs/deploy/clusters/reverse-proxy-ui.mdx @@ -0,0 +1,392 @@ +--- +layout: docs +page_title: Configure a web UI reverse proxy +description: |- + Run and configure NGINX as a reverse proxy for the Nomad web UI to + create a secure way for users to access detailed cluster information. +--- + +# Configure a web UI reverse proxy + +NGINX can be used to reverse proxy web services and balance load across multiple instances of the same service.
A reverse proxy has the added benefits of enabling multiple web services to share a single, memorable domain and authentication to view internal systems. + +To ensure every feature in the Nomad UI remains fully functional, you must properly configure your reverse proxy to meet Nomad's specific networking requirements. + +This guide will explore common configuration changes necessary when reverse proxying Nomad's Web UI. Issues common to default proxy configurations will be discussed and demonstrated. As you learn about each issue, you will deploy NGINX configuration changes that will address it. + +## Prerequisites + +This guide assumes basic familiarity with Nomad and NGINX. + +Here is what you will need for this guide: + +- Nomad 0.11.0 installed locally +- Docker + +## Start Nomad + +Because of best practices around least access to nodes, it is typical for Nomad +UI users to not have direct access to the Nomad client nodes. You can simulate +that for the purposes of this guide by advertising an incorrect `http` address. + +Create a file named `nomad.hcl` with the following configuration snippet. + +```hcl +# Advertise a bogus HTTP address to force the UI +# to fallback to streaming logs through the proxy. +advertise { + http = "internal-ip:4646" +} +``` + +Start Nomad as a dev agent with this custom configuration file. + +```shell-session +$ sudo nomad agent -dev -config=nomad.hcl +``` + +Next, create a service job file that will frequently write logs to `stdout`. This sample job file below can be used if you don't have your own. + +```hcl +# fs-example.nomad.hcl + +job "fs-example" { + datacenters = ["dc1"] + + task "fs-example" { + driver = "docker" + + config { + image = "dingoeatingfuzz/fs-example:0.3.0" + } + + resources { + cpu = 500 + memory = 512 + } + } +} +``` + +Run this service job using the Nomad CLI or UI. + +```shell-session +$ nomad run fs-example.nomad.hcl +``` + +At this point, you have a Nomad cluster running locally with one job in it. You can visit the Web UI at `http://localhost:4646`. + +## Configure NGINX to reverse proxy the web UI + +As mentioned earlier, the overarching goal is to configure a proxy from Nomad UI users to the Nomad UI running on the Nomad cluster. To do that, you will configure a NGINX instance as your reverse proxy. + +Create a basic NGINX configuration file to reverse proxy the Web UI. It is important to name the NGINX configuration file `nginx.conf` otherwise the file will not bind correctly. + +```nginx +# nginx.conf +events {} + +http { + server { + location / { + proxy_pass http://host.docker.internal:4646; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + } + } +} +``` + + + + If you are not using Docker for Mac or Docker for Windows, the `host.docker.internal` DNS record may not be available. + + + +This basic NGINX configuration does two things. The first is forward all traffic into NGINX to the proxy address at `http://host.docker.internal:4646`. Since NGINX will be running in Docker and Nomad is running locally, this address is equivalent to `http://localhost:4646` which is where the Nomad API and Web UI are served. The second thing this configuration does is attach the `X-Forwarded-For` header which allows HTTP requests to be traced back to their origin. + +Next in a new terminal session, start NGINX in Docker using this configuration file. 
+ +```shell-session +$ docker run --publish=8080:80 \ + --mount type=bind,source=$PWD/nginx.conf,target=/etc/nginx/nginx.conf \ + nginx:latest +``` + +NGINX will be started as soon as Docker has finished pulling layers. At that point you can visit `http://localhost:8080` to visit the Nomad Web UI through the NGINX reverse proxy. + +## Extend connection timeout + +The Nomad Web UI uses long-lived connections for its live-update feature. If the proxy closes the connection early because of a connection timeout, it could prevent the Web UI from continuing to live-reload data. + +The Nomad Web UI will live-reload all data to make sure views are always fresh as Nomad's server state changes. This is achieved using the [blocking queries][blocking-queries] to the Nomad API. Blocking queries are an implementation of long-polling which works by keeping HTTP connections open until server-side state has changed. This is advantageous over traditional polling which results in more requests that often return no new information. It is also faster since a connection will close as soon as new information is available rather than having to wait for the next iteration of a polling loop. A consequence of this design is that HTTP requests aren't always expected to be short-lived. NGINX has a [default proxy timeout of 60 seconds](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout) while Nomad's blocking query system will leave connections open for five minutes by default. + +To observe the proxy time out a connection, visit the Nomad jobs list through the proxy at `http://localhost:8080/ui/jobs` with your Browser's developer tools open. + + + + + +With the Nomad UI page open, press the `F12` key to open the Developer tools. If it is not already selected, go to the Developer tools pane and select the **Network** tab. Leaving the tools pane open. Refresh the UI page. + +The blocking query connection for jobs will remain in "(pending)" status. + +[![Chrome developer tools window showing pending connection.][img-chrome-pending]][img-chrome-pending] + +In approximately 60 seconds it will transition to a "504 Gateway Time-out" status. + +[![Chrome developer tools window showing connection timeout.][img-chrome-timeout]][img-chrome-timeout] + + + + + +With the Nomad UI page open, go to the **Tools** menu, **Web Developer** flyout, and **Network** option. + +The blocking query connection for jobs will not show a status while it is still active. + +[![Firefox developer tools window showing pending connection.][img-firefox-pending]][img-firefox-pending] + +In approximately 60 seconds it will transition to a "504 Gateway Time-out" status. + +[![Firefox developer tools window showing connection timeout.][img-firefox-pending]][img-firefox-pending] + + + + + +With the Nomad UI page open, go to the **Tools** menu, **Web Developer** flyout, and **Network** option. + +The blocking query connection for jobs will have a spinning icon next to the Name while it is still active. + +[![Safari developer tools window showing pending connection.][img-safari-pending]][img-safari-pending] + +In approximately 60 seconds it will transition to a red error state. Clicking on the red error will show that you received a "504 Gateway Time-out" error. + +[![Safari developer tools window showing connection timeout.][img-safari-pending]][img-safari-pending] + + + + + +With the Nomad UI page open, press the `F12` key to open the Developer tools. 
If it is not already selected, go to the Developer tools pane and select the **Network** tab. Leave the tools pane open and refresh the UI page. + +The blocking query connection for jobs will remain in "(pending)" status. + +[![Edge developer tools window showing pending connection.][img-chrome-pending]][img-chrome-pending] + +In approximately 60 seconds it will transition to a "504 Gateway Time-out" status. + +[![Edge developer tools window showing connection timeout.][img-chrome-timeout]][img-chrome-timeout] + + + + + +Open your browser's developer tools. Locate the network information and wait approximately 60 seconds. The request to `/v1/jobs` will time out with the following error message. + +```plaintext +Failed to load resource: the server responded with a status of +504 (Gateway Time-out) +``` + +
+ + +To prevent these timeouts, update the NGINX configuration's `location` block to extend the +`proxy_read_timeout` setting. The Nomad API documentation's [Blocking +Queries][blocking-queries] section explains that Nomad adds the result of (`wait` / 16) to the declared wait +time. You should set the `proxy_read_timeout` to slightly exceed Nomad's calculated wait time. + +This guide uses the default blocking query `wait` of 300 seconds. Nomad adds +18.75 seconds to that wait time, so the `proxy_read_timeout` should be greater than 318.75 seconds. + +Set the`proxy_read_timeout` to `319s`. + +```nginx +# ... +proxy_pass http://host.docker.internal:4646; +proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + +# Nomad blocking queries will remain open for a default of 5 minutes. +# Increase the proxy timeout to accommodate this timeout with an +# additional grace period. +proxy_read_timeout 319s; +# ... +``` + +Restart the NGINX docker container to load these configuration changes. + +## Disable proxy buffering + +When possible the Web UI will use a streaming HTTP request to stream logs on the task logs page. NGINX by default will buffer proxy responses in an attempt to free up connections to the backend server being proxied as soon as possible. + +Proxy buffering causes logs events to not stream because they will be temporarily captured within NGINX's proxy buffer until either the connection is closed or the proxy buffer size is hit and the data is finally flushed to the client. + +Older browsers may not support this technology, in which case logs are streamed using a simple polling mechanism. + +To observe this issue, visit the task logs page of your sample job by first visiting the sample job at `http://localhost:8080/jobs/fs-example` then clicking into the most recent allocation, then clicking into the `fs-example` task, then clicking the logs tab. + +Logs will not load and eventually the following error will appear in the UI. + +[![Error in the UI. Cannot fetch logs. The logs for this task are inaccessible][img-cannot-fetch-logs]][img-cannot-fetch-logs] + +There will also be this additional error in the browser developer tools console. + +```plaintext +GET http://internal-ip:4646/v1/client/fs/logs/131f60f7-ef46-9fc0-d80d-29e673f01bd6?follow=true&offset=50000&origin=end&task=ansi&type=stdout net::ERR_NAME_NOT_RESOLVED +``` + +This `ERR_NAME_NOT_RESOLVED` error can be safely ignored. To prevent streaming logs through Nomad server nodes when unnecessary, the Web UI optimistically attempts to connect directly to the client node the task is running on. Since the Nomad configuration file used in this guide specifically advertises an address that can't be reached, the UI automatically falls back to requesting logs through the proxy. + +To allow log streaming through NGINX, the NGINX configuration needs to be updated to disable proxy buffering. Add the following to the `location` block of the existing NGINX configuration file. + +```nginx +# ... +proxy_read_timeout 319s; + +# Nomad log streaming uses streaming HTTP requests. In order to +# synchronously stream logs from Nomad to NGINX to the browser +# proxy buffering needs to be turned off. +proxy_buffering off; +# ... +``` + +Restart the NGINX docker container to load these configuration changes. + +## Enable WebSocket connections + +As of Nomad 0.11.0, the Web UI has supported [interactive exec sessions with any running task in the cluster](https://www.hashicorp.com/blog/hashicorp-nomad-remote-exec-web-ui). 
This is achieved using the exec API, which is implemented using WebSockets. + +WebSockets are necessary for the exec API because they allow bidirectional data transfer. This is used to receive changes to the remote output as well as send commands and signals from the browser-based terminal. + +The way a WebSocket connection is established is through a handshake request. The handshake is an HTTP request with special `Connection` and `Upgrade` headers. + +WebSockets also do not support CORS headers. The server-side of a WebSocket connection needs to verify trusted origins on its own. Nomad does this verification by checking if the `Origin` header of the handshake request is equal to the address of the Nomad API. + +By default, NGINX will not fulfill the handshake or perform the origin verification. This results in exec sessions immediately terminating. You can experience this in the Web UI by going to `http://localhost:8080/jobs/fs-example`, clicking the Exec button, choosing the task, and attempting to run the command `/bin/sh`. + +[![Error in the UI when running /bin/sh. The connection has closed.][img-cannot-remote-exec]][img-cannot-remote-exec] + +To fulfill the handshake, NGINX will need to forward the `Connection` and `Upgrade` headers. To meet the origin verification required by the Nomad API, NGINX will have to override the existing `Origin` header to match the host address. Add the following to the `location` block of the existing NGINX configuration file. + +```nginx +# ... +proxy_buffering off; + +# The Upgrade and Connection headers are used to establish +# a WebSockets connection. +proxy_set_header Upgrade $http_upgrade; +proxy_set_header Connection "upgrade"; + +# The default Origin header will be the proxy address, which +# will be rejected by Nomad. It must be rewritten to be the +# host address instead. +proxy_set_header Origin "${scheme}://${proxy_host}"; +# ... +``` + +Restart the NGINX docker container to load these configuration changes. + +WebSocket connections are also stateful. If you are planning on using NGINX to balance load across all Nomad server nodes, it is important to ensure that WebSocket connections get routed to a consistent host. + +This can be done by specifying an upstream in NGINX and using it as the proxy pass. Add the following after the server block in the existing NGINX configuration file. + +```nginx +# ... +# Since WebSockets are stateful connections but Nomad has multiple +# server nodes, an upstream with ip_hash declared is required to ensure +# that connections are always proxied to the same server node when possible. +upstream nomad-ws { + ip_hash; + server host.docker.internal:4646; +} +# ... +``` + +Traffic must also pass through the upstream. To do this, change the `proxy_pass` in the NGINX configuration file. + +```nginx +# ... +location / { + proxy_pass http://nomad-ws; +# ... +``` + +Since a dev environment only has one node, this change has no observable effect. + +## Review the complete NGINX configuration + +At this point all Web UI features are now working through the NGINX proxy. Here is the completed NGINX configuration file. + +```nginx +events {} + +http { + server { + location / { + proxy_pass http://nomad-ws; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + + # Nomad blocking queries will remain open for a default of 5 minutes. + # Increase the proxy timeout to accommodate this timeout with an + # additional grace period. + proxy_read_timeout 319s; + + # Nomad log streaming uses streaming HTTP requests.
In order to + # synchronously stream logs from Nomad to NGINX to the browser + # proxy buffering needs to be turned off. + proxy_buffering off; + + # The Upgrade and Connection headers are used to establish + # a WebSockets connection. + proxy_set_header Upgrade $http_upgrade; + proxy_set_header Connection "upgrade"; + + # The default Origin header will be the proxy address, which + # will be rejected by Nomad. It must be rewritten to be the + # host address instead. + proxy_set_header Origin "${scheme}://${proxy_host}"; + } + } + + # Since WebSockets are stateful connections but Nomad has multiple + # server nodes, an upstream with ip_hash declared is required to ensure + # that connections are always proxied to the same server node when possible. + upstream nomad-ws { + ip_hash; + server host.docker.internal:4646; + } +} +``` + +## Next steps + +In this guide, you set up a reverse NGINX proxy configured for the Nomad UI. +You also explored common configuration properties necessary to allow the Nomad +UI to work properly through a proxy—connection timeouts, proxy buffering, +WebSocket connections, and Origin header rewriting. + +You can use these building blocks to configure your preferred proxy server +software to work with the Nomad UI. For further information about the NGINX +specific configuration highlighted in this guide, consult: + +- [connection timeout][nginx-proxy-read-timeout] +- [proxy buffering][nginx-proxy-buffering] +- [WebSocket proxying][nginx-websocket-proxying] +- [session persistence][nginx-session-persistence] + +[blocking-queries]: /nomad/api-docs#blocking-queries +[nginx-proxy-buffering]: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering +[nginx-proxy-read-timeout]: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout +[nginx-session-persistence]: https://nginx.org/en/docs/http/load_balancing.html#nginx_load_balancing_with_ip_hash +[nginx-websocket-proxying]: https://nginx.org/en/docs/http/websocket.html +[img-cannot-fetch-logs]: /img/clusters/cannot-fetch-logs.png +[img-cannot-remote-exec]: /img/clusters/cannot-remote-exec.png +[img-chrome-pending]: /img/clusters/chrome-pending.png +[img-chrome-timeout]: /img/clusters/chrome-timeout.png +[img-firefox-pending]: /img/clusters/firefox-pending.png +[img-firefox-timeout]: /img/clusters/firefox-timeout.png +[img-safari-pending]: /img/clusters/safari-pending.png +[img-safari-timeout]: /img/clusters/safari-timeout.png diff --git a/website/content/docs/install/index.mdx b/website/content/docs/deploy/index.mdx similarity index 99% rename from website/content/docs/install/index.mdx rename to website/content/docs/deploy/index.mdx index bbeba80ac..236977daa 100644 --- a/website/content/docs/install/index.mdx +++ b/website/content/docs/deploy/index.mdx @@ -11,7 +11,7 @@ Nomad is available as a pre-compiled binary or as a package for several operating systems. You can also [build Nomad from source](#from-source). -> If you are interested in trialing Nomad without installing it locally, see -the [Quickstart](/nomad/docs/install/quickstart) for options to get started with +the [Quickstart](/nomad/docs/quickstart) for options to get started with Nomad. 
diff --git a/website/content/docs/operations/nomad-agent.mdx b/website/content/docs/deploy/nomad-agent.mdx similarity index 90% rename from website/content/docs/operations/nomad-agent.mdx rename to website/content/docs/deploy/nomad-agent.mdx index df3862cbb..408151bce 100644 --- a/website/content/docs/operations/nomad-agent.mdx +++ b/website/content/docs/deploy/nomad-agent.mdx @@ -1,19 +1,19 @@ --- layout: docs -page_title: Nomad agents +page_title: Operate a Nomad agent description: |- The Nomad agent is a long running process which can be used either in a client or server mode. --- -# Nomad agents +# Operate a Nomad agent A Nomad agent is a long running process that runs on every machine in your Nomad cluster. The behavior of the agent depends on if it is running in client or server mode. Clients run tasks, while servers manage the cluster. -Server agents are part of the [consensus protocol](/nomad/docs/concepts/consensus) and -[gossip protocol](/nomad/docs/concepts/gossip). The consensus protocol, powered +Server agents are part of the [consensus protocol](/nomad/docs/architecture/cluster/consensus) and +[gossip protocol](/nomad/docs/architecture/security/gossip). The consensus protocol, powered by Raft, lets the servers perform leader election and state replication. The gossip protocol allows for server clustering and multi-region federation. The higher burden on the server nodes means that you should run them on @@ -21,7 +21,7 @@ dedicated instances because the servers are more resource intensive than a client node. Client agents use fingerprinting to determine the capabilities and resources of -the host machine, as well as what [drivers](/nomad/docs/drivers) are available. +the host machine, as well as what [drivers](/nomad/docs/job-declare/task-driver) are available. Clients register with servers to provide node information and a heartbeat. Clients run tasks that the server assigns to them. Client nodes make up the majority of the cluster and are very lightweight. They interface with the server @@ -30,7 +30,7 @@ nodes and maintain very little state of their own. Each cluster has usually 3 or ## Run an agent -Start the agent with the [`nomad agent` command](/nomad/docs/commands/agent). +Start the agent with the [`nomad agent` command](/nomad/commands/agent). This command blocks, running forever or until told to quit. The `nomad agent` command takes a variety of configuration options, but most have sane defaults. @@ -136,7 +136,7 @@ this lifecycle is useful for building a mental model of an agent's interactions with a cluster and how the cluster treats a node. When a client agent starts, it fingerprints the host machine to identify its -attributes, capabilities, and [task drivers](/nomad/docs/drivers). The client +attributes, capabilities, and [task drivers](/nomad/docs/job-declare/task-driver). The client then reports this information to the servers during an initial registration. You provide the addresses of known servers to the agent via configuration, potentially using DNS for resolution. Use [Consul](https://www.consul.io/) @@ -155,10 +155,10 @@ garbage collection of nodes. By default, if a node is in a failed or 'down' state for over 24 hours, Nomad garbage collects that node. Servers are slightly more complex since they perform additional functions. 
They -participate in a [gossip protocol](/nomad/docs/concepts/gossip) both to cluster +participate in a [gossip protocol](/nomad/docs/architecture/security/gossip) both to cluster within a region and to support multi-region configurations. When a server starts, it does not know the address of other servers in the cluster. To discover its peers, it must join the cluster. You do this with the -[`server join` command](/nomad/docs/commands/server/join) or by providing the +[`server join` command](/nomad/commands/server/join) or by providing the proper configuration on start. Once a node joins, this information is gossiped to the entire cluster, meaning all nodes will eventually be aware of each other. @@ -185,8 +185,8 @@ owned by `root` with filesystem permissions set to `0700`. [`leave_on_interrupt`]: /nomad/docs/configuration#leave_on_interrupt [`leave_on_terminate`]: /nomad/docs/configuration#leave_on_terminate -[`server force-leave` command]: /nomad/docs/commands/server/force-leave -[consensus]: /nomad/docs/concepts/consensus +[`server force-leave` command]: /nomad/commands/server/force-leave +[consensus]: /nomad/docs/architecture/cluster/consensus [`drain_on_shutdown`]: /nomad/docs/configuration/client#drain_on_shutdown [reload its configuration]: /nomad/docs/configuration#configuration-reload -[metrics]: /nomad/docs/operations/metrics-reference +[metrics]: /nomad/docs/reference/metrics diff --git a/website/content/docs/install/production/index.mdx b/website/content/docs/deploy/production/index.mdx similarity index 66% rename from website/content/docs/install/production/index.mdx rename to website/content/docs/deploy/production/index.mdx index 1df72e142..8c9d99554 100644 --- a/website/content/docs/install/production/index.mdx +++ b/website/content/docs/deploy/production/index.mdx @@ -1,11 +1,11 @@ --- layout: docs -page_title: Install Nomad in a production environment +page_title: Deploy Nomad in a production environment description: |- - This page contains a high-level overview of Nomad production installation and links review hardware requirements, reference architecture, a deployment guide, and configuration documentation. + This page contains a high-level overview of Nomad production deployment and links review hardware requirements, reference architecture, a deployment guide, and configuration documentation. --- -# Install Nomad in a production environment +# Deploy Nomad in a production environment While HashiCorp Nomad provides a low-friction practitioner experience out of the box, there are a few critical steps to take for a successful production @@ -28,20 +28,20 @@ deploying HashiCorp Nomad in production. ## Verify hardware requirements Review the recommended machine resources (instances), port requirements, and -network topology for Nomad in the [Hardware Requirements](/nomad/docs/install/production/requirements). +network topology for Nomad in the [Hardware Requirements](/nomad/docs/deploy/production/requirements). ## Install Nomad -Visit the [Install Nomad](/nomad/docs/install) page to learn the options +Visit the [Install Nomad](/nomad/docs/deploy) page to learn the options available for installing Nomad and how to verify a successful installation. 
## Configure your Nomad servers and clients -Refer to the [Set Server & Client Nodes](/nomad/docs/operations/nomad-agent) -and [Nomad Agent documentation](/nomad/docs/commands/agent) pages to learn about the +Refer to the [Set Server & Client Nodes](/nomad/docs/deploy/nomad-agent) +and [Nomad Agent documentation](/nomad/commands/agent) pages to learn about the Nomad agent process and how to configure the server and client nodes in your cluster. -[nomad reference architecture]: /nomad/tutorials/enterprise/production-reference-architecture-vm-with-consul +[nomad reference architecture]: /nomad/docs/deploy/production/reference-architecture [nomad deployment guide]: /nomad/tutorials/enterprise/production-deployment-guide-vm-with-consul diff --git a/website/content/docs/deploy/production/reference-architecture.mdx b/website/content/docs/deploy/production/reference-architecture.mdx new file mode 100644 index 000000000..b4116bb36 --- /dev/null +++ b/website/content/docs/deploy/production/reference-architecture.mdx @@ -0,0 +1,252 @@ +--- +layout: docs +page_title: Production reference architecture +description: |- + Review the recommended compute and networking resources for provisioning a Nomad Enterprise cluster in a production environment. +--- + +# Production reference architecture + +This document provides recommended practices and a reference architecture for +Nomad production deployments. This reference architecture conveys a general +architecture that should be adapted to accommodate the specific needs of each +implementation. + +The following topics are addressed: + +- [Reference Architecture](#ra) +- [Deployment Topology within a Single Region](#one-region) +- [Deployment Topology across Multiple Regions](#multi-region) +- [Network Connectivity Details](#net) +- [Deployment System Requirements](#system-reqs) +- [High Availability](#high-availability) +- [Failure Scenarios](#failure-scenarios) + + + +This document describes deploying a Nomad cluster in combination with, or with +access to, a [Consul cluster][]. We recommend the use of Consul with Nomad to +provide automatic clustering, service discovery, health checking and dynamic +configuration. + + + +## Reference architecture + +A Nomad cluster typically comprises three or five servers (but no more than +seven) and a number of client agents. Nomad differs slightly from Consul in +that it divides infrastructure into [regions][glossary-regions] which are +served by one Nomad server cluster, but can manage multiple +[datacenters][glossary-dc] or availability zones. For example, a _US Region_ +can include datacenters _us-east-1_ and _us-west-2_. + +In a Nomad multi-region architecture, communication happens via [WAN gossip][]. +Additionally, Nomad can integrate easily with Consul to provide features such as +automatic clustering, service discovery, and dynamic configurations. Thus we +recommend you use Consul in your Nomad deployment to simplify the deployment. + + + +In cloud environments, a single cluster may be deployed across multiple +availability zones. For example, in AWS each Nomad server can be deployed to an +associated EC2 instance, and those EC2 instances distributed across multiple +AZs. Similarly, Nomad server clusters can be deployed to multiple cloud regions +to allow for region level HA scenarios. + +For more information on Nomad server cluster design, see the [cluster +requirements documentation][requirements]. 
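+As a concrete illustration of the region and datacenter split described earlier
+in this section, the following agent configuration sketch places a node in the
+_us-east-1_ datacenter of a _US_ region; the names are placeholders for
+illustration only:
+
+```hcl
+# Agent configuration fragment; region and datacenter values are
+# placeholders that match the example region layout above.
+region     = "us"
+datacenter = "us-east-1"
+```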
+ +The design shared in this document is the recommended architecture for +production environments, as it provides flexibility and resilience. Nomad +utilizes an existing Consul server cluster; however, the deployment design of +the Consul server cluster is outside the scope of this document. + +Nomad to Consul connectivity is over HTTP and should be secured with TLS as well as a Consul +token to provide encryption of all traffic. This is done using Nomad's +[Automatic Clustering with Consul][consul-clustering]. + + + +### Deployment topology within a single region + +A single Nomad cluster is recommended for applications deployed in the same region. + +Each cluster is expected to have either three or five servers. +This strikes a balance between availability in the case of failure and +performance, as [Raft][] consensus gets progressively +slower as more servers are added. + +The time taken by a new server to join an existing large cluster may increase as +the size of the cluster increases. + +#### Reference diagram + +[![Reference diagram][img-reference-diagram]][img-reference-diagram] + + + +### Deployment topology across multiple regions + +By deploying Nomad server clusters in multiple regions, the user is able to +interact with the Nomad servers by targeting any region from any Nomad server +even if that server resides in a separate region. However, most data is not +replicated between regions as they are fully independent clusters. The +exceptions which _are_ replicated between regions are: + +- [ACL policies and global tokens][acl] +- [Sentinel policies in Nomad Enterprise][sentinel] + +Nomad server clusters in different datacenters can be federated using WAN links. +The server clusters can be joined to communicate over the WAN on port `4648`. +This same port is used for single datacenter deployments over LAN as well. + +Additional documentation is available to learn more about Nomad cluster +[federation][]. + + + +## Network connectivity details + +[![Nomad network diagram][img-nomad-net]][img-nomad-net] + +Nomad servers are expected to be able to communicate in high bandwidth, low +latency network environments and have below 10 millisecond latencies between +cluster members. Nomad servers can be spread across cloud regions or datacenters +if they satisfy these latency requirements. + +Nomad client clusters require the ability to receive traffic as noted in +the Network Connectivity Details; however, clients can be separated into any +type of infrastructure (multi-cloud, on-prem, virtual, bare metal, etc.) as long +as they are reachable and can receive job requests from the Nomad servers. + +Additional documentation is available to learn more about [Nomad networking][]. + + + +## Deployment system requirements + +Nomad server agents are responsible for maintaining the cluster state, +responding to RPC queries (read operations), and for processing all write +operations. Given that Nomad server agents do most of the heavy lifting, server +sizing is critical for the overall performance efficiency and health of the +Nomad cluster. 
+ +### Nomad servers + + + +| Type | CPU | Memory | Disk | Typical Cloud Instance Types | +| ----- | --------- | ----------------- | ----------- | ------------------------------------------ | +| Small | 2-4 core | 8-16 GB RAM | 50 GB | **AWS**: m5.large, m5.xlarge | +| | | | | **Azure**: Standard_D2_v3, Standard_D4_v3 | +| | | | | **GCP**: n2-standard-2, n2-standard-4 | +| Large | 8-16 core | 32-64 GB RAM | 100 GB | **AWS**: m5.2xlarge, m5.4xlarge | +| | | | | **Azure**: Standard_D8_v3, Standard_D16_v3 | +| | | | | **GCP**: n2-standard-8, n2-standard-16 | + + + +#### Hardware sizing considerations + +- The small size would be appropriate for most initial production + deployments, or for development/testing environments. + +- The large size is for production environments where there is a + consistently high workload. + + + + For large workloads, ensure that the disks support a high number of +IOPS to keep up with the rapid Raft log update rate. + + + +Nomad clients can be setup with specialized workloads as well. For example, if +workloads require GPU processing, a Nomad datacenter can be created to serve +those GPU specific jobs and joined to a Nomad server cluster. For more +information on specialized workloads, see the documentation on [job +constraints][] to target specific client nodes. + +## High availability + +A Nomad server cluster is the highly available unit of deployment within a +single datacenter. A recommended approach is to deploy a three or five node +Nomad server cluster. With this configuration, during a Nomad server outage, +failover is handled immediately without human intervention. + +When setting up high availability across regions, multiple Nomad server clusters +are deployed and connected via WAN gossip. Nomad clusters in regions are fully +independent from each other and do not share jobs, clients, or state. Data +residing in a single region-specific cluster is not replicated to other clusters +in other regions. + +## Failure scenarios + +Typical distribution in a cloud environment is to spread Nomad server nodes into +separate Availability Zones (AZs) within a high bandwidth, low latency network, +such as an AWS Region. The diagram below shows Nomad servers deployed in +multiple AZs promoting a single voting member per AZ and providing both AZ-level +and node-level failure protection. + +[![Nomad fault tolerance][img-fault-tolerance]][img-fault-tolerance] + +Additional documentation is available to learn more about [cluster sizing and +failure tolerances][sizing] as well as [outage recovery][]. + +### Availability zone failure + +In the event of a single AZ failure, only a single Nomad server is affected +which would not impact job scheduling as long as there is still a Raft quorum +(that is, 2 available servers in a 3 server cluster, 3 available servers in a 5 +server cluster, more generally: + +
quorum = floor( count(members) / 2) + 1
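For example, a three-server cluster has a quorum of floor(3/2) + 1 = 2 and tolerates the loss of one server, while a five-server cluster has a quorum of floor(5/2) + 1 = 3 and tolerates the loss of two servers.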
+
+There are two scenarios that could occur should an AZ fail in a multiple AZ
+setup: leader loss or follower loss.
+
+#### Leader server loss
+
+If the AZ containing the Nomad leader server fails, the remaining quorum members
+would elect a new leader. The new leader then begins to accept new log entries and
+replicates these entries to the remaining followers.
+
+#### Follower server loss
+
+If the AZ containing a Nomad follower server fails, there is no immediate impact
+to the Nomad leader server or cluster operations. However, there still must be a
+Raft quorum in order to properly manage a future failure of the Nomad leader
+server.
+
+### Region failure
+
+In the event of a region-level failure (which would contain an entire Nomad
+server cluster), users are still able to submit jobs to another region
+that is properly federated. However, data loss is likely as Nomad
+server clusters do not replicate their data to other region clusters. See
+[Multi-region Federation][federation] for more setup information.
+
+## Next steps
+
+Read the [Deployment Guide][deployment-guide] to learn the steps required to
+install and configure a single HashiCorp Nomad cluster that uses Consul.
+
+[acl]: /nomad/docs/secure/acl/bootstrap
+[consul cluster]: /nomad/docs/networking/consul
+[deployment-guide]: /nomad/tutorials/enterprise/production-deployment-guide-vm-with-consul
+[img-fault-tolerance]: /img/deploy/nomad_fault_tolerance.png
+[img-nomad-net]: /img/deploy/nomad_network_arch_0-1x.png
+[img-reference-diagram]: /img/deploy/nomad_reference_diagram.png
+[job constraints]: /nomad/docs/job-specification/constraint
+[federation]: /nomad/docs/deploy/clusters/federate-regions
+[nomad networking]: /nomad/docs/deploy/production/requirements#network-topology
+[nomad server federation]: /nomad/docs/deploy/clusters/federate-regions
+[outage recovery]: /nomad/docs/manage/outage-recovery
+[raft]: https://raft.github.io/
+[requirements]: /nomad/docs/deploy/production/requirements
+[sentinel]: /nomad/docs/govern/sentinel
+[sizing]: /nomad/docs/architecture/cluster/consensus#deployment_table
+[wan gossip]: /nomad/docs/architecture/security/gossip
+[consul-clustering]: /nomad/docs/deploy/clusters/connect-nodes
+[glossary-regions]: /nomad/docs/glossary#regions
+[glossary-dc]: /nomad/docs/glossary#datacenters
diff --git a/website/content/docs/install/production/requirements.mdx b/website/content/docs/deploy/production/requirements.mdx
similarity index 98%
rename from website/content/docs/install/production/requirements.mdx
rename to website/content/docs/deploy/production/requirements.mdx
index be6973647..66e561ff1 100644
--- a/website/content/docs/install/production/requirements.mdx
+++ b/website/content/docs/deploy/production/requirements.mdx
@@ -209,7 +209,7 @@ in automated pipelines for [CLI operations][docs_cli], such as
 
 ~> **Note:** The Nomad Docker image is not tested when running as an agent.
-[Security Model]: /nomad/docs/concepts/security +[Security Model]: /nomad/docs/architecture/security [production deployment guide]: /nomad/tutorials/enterprise/production-deployment-guide-vm-with-consul#configure-systemd [linux capabilities]: #linux-capabilities [`capabilities(7)`]: https://man7.org/linux/man-pages/man7/capabilities.7.html @@ -218,7 +218,7 @@ in automated pipelines for [CLI operations][docs_cli], such as [Rootless Nomad Clients]: #rootless-nomad-clients [nomad_docker_hub]: https://hub.docker.com/r/hashicorp/nomad [docs_cli]: /nomad/docs/commands -[`nomad job plan`]: /nomad/docs/commands/job/plan -[`nomad fmt`]: /nomad/docs/commands/fmt -[mTLS]: /nomad/tutorials/transport-security/security-enable-tls +[`nomad job plan`]: /nomad/commands/job/plan +[`nomad fmt`]: /nomad/commands/fmt +[mTLS]: /nomad/docs/secure/traffic/tls [ephemeral disk migration]: /nomad/docs/job-specification/ephemeral_disk#migrate diff --git a/website/content/docs/install/windows-service.mdx b/website/content/docs/deploy/production/windows-service.mdx similarity index 100% rename from website/content/docs/install/windows-service.mdx rename to website/content/docs/deploy/production/windows-service.mdx diff --git a/website/content/docs/deploy/task-driver/docker.mdx b/website/content/docs/deploy/task-driver/docker.mdx new file mode 100644 index 000000000..9bb3b35b7 --- /dev/null +++ b/website/content/docs/deploy/task-driver/docker.mdx @@ -0,0 +1,491 @@ +--- +layout: docs +page_title: Configure the Docker task driver +description: Nomad's Docker task driver lets you run Docker-based tasks in your jobs. Modify the Docker task driver plugin configuration. Learn about CPU, memory, filesystem IO, and security resource isolation as well as how Nomad handles dangling containers. +--- + +# Configure the Docker task driver + +Name: `docker` + +The `docker` driver provides a first-class Docker workflow on Nomad. The Docker +driver handles downloading containers, mapping ports, and starting, watching, +and cleaning up after containers. + +**Note:** If you are using Docker Desktop for Windows or MacOS, check +[the FAQ][faq-win-mac]. + +## Capabilities + +The `docker` driver implements the following capabilities: + +| Feature | Implementation | +| -------------------- | ----------------- | +| `nomad alloc signal` | true | +| `nomad alloc exec` | true | +| filesystem isolation | image | +| network isolation | host, group, task | +| volume mounting | all | + +## Client Requirements + +Nomad requires Docker to be installed and running on the host alongside the +Nomad agent. + +By default Nomad communicates with the Docker daemon using the daemon's Unix +socket. Nomad will need to be able to read/write to this socket. If you do not +run Nomad as root, make sure you add the Nomad user to the Docker group so +Nomad can communicate with the Docker daemon. + +For example, on Ubuntu you can use the `usermod` command to add the `nomad` +user to the `docker` group so you can run Nomad without root: + +```shell-session +$ sudo usermod -G docker -a nomad +``` + +Nomad clients manage a cpuset cgroup for each task to reserve or share CPU +[cores][]. In order for Nomad to be compatible with Docker's own cgroups +management, it must write to cgroups owned by Docker, which requires running as +root. If Nomad is not running as root, CPU isolation and NUMA-aware scheduling +will not function correctly for workloads with `resources.cores`, including +workloads using task drivers other than `docker` on the same host. 
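Returning to the socket permissions above: an optional way to confirm that the Nomad user can reach the Docker daemon after the group change is to run a Docker command as that user, for example:

```shell-session
$ sudo -u nomad docker info
```

If this fails with a permission error on the Docker socket, the group membership change has not yet taken effect for the Nomad user.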
+ +For the best performance and security features you should use recent versions +of the Linux Kernel and Docker daemon. + +If you would like to change any of the options related to the `docker` driver +on a Nomad client, you can modify them with the [plugin block][plugin-block] +syntax. Below is an example of a configuration (many of the values are the +default). See the next section for more information on the options. + +```hcl +plugin "docker" { + config { + endpoint = "unix:///var/run/docker.sock" + + auth { + config = "/etc/docker-auth.json" + helper = "ecr-login" + } + + tls { + cert = "/etc/nomad/nomad.pub" + key = "/etc/nomad/nomad.pem" + ca = "/etc/nomad/nomad.cert" + } + + extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"] + + gc { + image = true + image_delay = "3m" + container = true + + dangling_containers { + enabled = true + dry_run = false + period = "5m" + creation_grace = "5m" + } + } + + volumes { + enabled = true + selinuxlabel = "z" + } + + allow_privileged = false + allow_caps = ["chown", "net_raw"] + } +} +``` + +## Plugin Options + +- `endpoint` - If using a non-standard socket, HTTP or another location, or if + TLS is being used, docker.endpoint must be set. If unset, Nomad will attempt + to instantiate a Docker client using the `DOCKER_HOST` environment variable and + then fall back to the default listen address for the given operating system. + Defaults to `unix:///var/run/docker.sock` on Unix platforms and + `npipe:////./pipe/docker_engine` for Windows. + +- `allow_privileged` - Defaults to `false`. Changing this to true will allow + containers to use privileged mode, which gives the containers full access to + the host's devices. Note that you must set a similar setting on the Docker + daemon for this to work. + +- `pull_activity_timeout` - Defaults to `2m`. If Nomad receives no communication + from the Docker engine during an image pull within this timeframe, Nomad will + time out the request that initiated the pull command. (Minimum of `1m`) + +- `pids_limit` - Defaults to unlimited (`0`). An integer value that specifies + the pid limit for all the Docker containers running on that Nomad client. You + can override this limit by setting [`pids_limit`] in your task config. If + this value is greater than `0`, your task `pids_limit` must be less than or + equal to the value defined here. + +- `allow_caps` - A list of allowed Linux capabilities. Defaults to + +```hcl +["audit_write", "chown", "dac_override", "fowner", "fsetid", "kill", "mknod", + "net_bind_service", "setfcap", "setgid", "setpcap", "setuid", "sys_chroot"] +``` + + which is the same list of capabilities allowed by [docker by + default][docker_caps] (without [`NET_RAW`][no_net_raw]). Allows the operator + to control which capabilities can be obtained by tasks using + [`cap_add`][cap_add] and [`cap_drop`][cap_drop] options. Supports the value + `"all"` as a shortcut for allow-listing all capabilities supported by the + operating system. Note that due to a limitation in Docker, tasks running as + non-root users cannot expand the capabilities set beyond the default. They can + only have their capabilities reduced. + +!> **Warning:** Allowing more capabilities beyond the default may lead to +undesirable consequences, including untrusted tasks being able to compromise the +host system. + +- `allow_runtimes` - defaults to `["runc", "nvidia"]` - A list of the allowed + docker runtimes a task may use. 
+ +- `auth` block: + + - `config` - Allows an operator to specify a + JSON file which is in the dockercfg format containing authentication + information for a private registry, from either (in order) `auths`, + `credsStore` or `credHelpers`. + + - `helper` - Allows an operator to specify a + [credsStore](https://docs.docker.com/engine/reference/commandline/login/#credential-helper-protocol) + like script on `$PATH` to lookup authentication information from external + sources. The script's name must begin with `docker-credential-` and this + option should include only the basename of the script, not the path. + + If you set an auth helper, it will be tried for all images, including + public images. If you mix private and public images, you will need to + include [`auth_soft_fail=true`] in every job using a public image. + +- `tls` block: + + - `cert` - Path to the server's certificate file (`.pem`). Specify this + along with `key` and `ca` to use a TLS client to connect to the docker + daemon. `endpoint` must also be specified or this setting will be ignored. + + - `key` - Path to the client's private key (`.pem`). Specify this along with + `cert` and `ca` to use a TLS client to connect to the docker daemon. + `endpoint` must also be specified or this setting will be ignored. + + - `ca` - Path to the server's CA file (`.pem`). Specify this along with + `cert` and `key` to use a TLS client to connect to the docker daemon. + `endpoint` must also be specified or this setting will be ignored. + +- `disable_log_collection` - Defaults to `false`. Setting this to true will + disable Nomad logs collection of Docker tasks. If you don't rely on nomad log + capabilities and exclusively use host based log aggregation, you may consider + this option to disable nomad log collection overhead. + +- `extra_labels` - Extra labels to add to Docker containers. + Available options are `job_name`, `job_id`, `task_group_name`, `task_name`, + `namespace`, `node_name`, `node_id`. Globs are supported (e.g. `task*`) + +- `logging` block: + + - `type` - Defaults to `"json-file"`. Specifies the logging driver docker + should use for all containers Nomad starts. Note that for older versions + of Docker, only `json-file` file or `journald` will allow Nomad to read + the driver's logs via the Docker API, and this will prevent commands such + as `nomad alloc logs` from functioning. + + - `config` - Defaults to `{ max-file = "2", max-size = "2m" }`. This option + can also be used to pass further + [configuration](https://docs.docker.com/config/containers/logging/configure/) + to the logging driver. + +- `gc` block: + + - `image` - Defaults to `true`. Changing this to `false` will prevent Nomad + from removing images from stopped tasks. + + - `image_delay` - A time duration, as [defined + here](https://golang.org/pkg/time/#ParseDuration), that defaults to `3m`. + The delay controls how long Nomad will wait between an image being unused + and deleting it. If a task is received that uses the same image within + the delay, the image will be reused. If an image is referenced by more than + one tag, `image_delay` may not work correctly. + + - `container` - Defaults to `true`. This option can be used to disable Nomad + from removing a container when the task exits. Under a name conflict, + Nomad may still remove the dead container. + + - `dangling_containers` block for controlling dangling container detection + and cleanup: + + - `enabled` - Defaults to `true`. Enables dangling container handling. 
+ + - `dry_run` - Defaults to `false`. Only log dangling containers without + cleaning them up. + + - `period` - Defaults to `"5m"`. A time duration that controls interval + between Nomad scans for dangling containers. + + - `creation_grace` - Defaults to `"5m"`. Grace period after a container is + created during which the GC ignores it. Only used to prevent the GC from + removing newly created containers before they are registered with the + GC. Should not need adjusting higher but may be adjusted lower to GC + more aggressively. + +- `volumes` block: + + - `enabled` - Defaults to `false`. Allows tasks to bind host paths + (`volumes`) inside their container and use volume drivers + (`volume_driver`). Binding relative paths is always allowed and will be + resolved relative to the allocation's directory. + + - `selinuxlabel` - Allows the operator to set a SELinux label to the + allocation and task local bind-mounts to containers. If used with + `docker.volumes.enabled` set to false, the labels will still be applied to + the standard binds in the container. + +- `infra_image` - This is the Docker image to use when creating the parent + container necessary when sharing network namespaces between tasks. Defaults to + `registry.k8s.io/pause-:3.3`. The image will only be pulled from the + container registry if its tag is `latest` or the image doesn't yet exist + locally. + +- `infra_image_pull_timeout` - A time duration that controls how long Nomad will + wait before cancelling an in-progress pull of the Docker image as specified in + `infra_image`. Defaults to `"5m"`. + +- `image_pull_timeout` - (Optional) A default time duration that controls how long Nomad + waits before cancelling an in-progress pull of the Docker image as specified + in `image` across all tasks. Defaults to `"5m"`. + +- `windows_allow_insecure_container_admin` - Indicates that on windows, docker + checks the `task.user` field or, if unset, the container image manifest after + pulling the container, to see if it's running as `ContainerAdmin`. If so, exits + with an error unless the task config has `privileged=true`. Defaults to `false`. + +## Client Configuration + +~> Note: client configuration options will soon be deprecated. Please use +[plugin options][plugin-options] instead. See the [plugin block][plugin-block] +documentation for more information. + +The `docker` driver has the following [client configuration +options](/nomad/docs/configuration/client#options): + +- `docker.endpoint` - If using a non-standard socket, HTTP or another location, + or if TLS is being used, `docker.endpoint` must be set. If unset, Nomad will + attempt to instantiate a Docker client using the `DOCKER_HOST` environment + variable and then fall back to the default listen address for the given + operating system. Defaults to `unix:///var/run/docker.sock` on Unix platforms + and `npipe:////./pipe/docker_engine` for Windows. + +- `docker.auth.config` - Allows an operator to specify a + JSON file which is in the dockercfg format containing authentication + information for a private registry, from either (in order) `auths`, + `credsStore` or `credHelpers`. + +- `docker.auth.helper` - Allows an operator to specify a + [credsStore](https://docs.docker.com/engine/reference/commandline/login/#credential-helper-protocol) + -like script on \$PATH to lookup authentication information from external + sources. The script's name must begin with `docker-credential-` and this + option should include only the basename of the script, not the path. 
+
+- `docker.tls.cert` - Path to the server's certificate file (`.pem`). Specify
+  this along with `docker.tls.key` and `docker.tls.ca` to use a TLS client to
+  connect to the docker daemon. `docker.endpoint` must also be specified or this
+  setting will be ignored.
+
+- `docker.tls.key` - Path to the client's private key (`.pem`). Specify this
+  along with `docker.tls.cert` and `docker.tls.ca` to use a TLS client to
+  connect to the docker daemon. `docker.endpoint` must also be specified or this
+  setting will be ignored.
+
+- `docker.tls.ca` - Path to the server's CA file (`.pem`). Specify this along
+  with `docker.tls.cert` and `docker.tls.key` to use a TLS client to connect to
+  the docker daemon. `docker.endpoint` must also be specified or this setting
+  will be ignored.
+
+- `docker.cleanup.image` Defaults to `true`. Changing this to `false` will
+  prevent Nomad from removing images from stopped tasks.
+
+- `docker.cleanup.image.delay` A time duration, as [defined
+  here](https://golang.org/pkg/time/#ParseDuration), that defaults to `3m`. The
+  delay controls how long Nomad will wait between an image being unused and
+  deleting it. If a task is received that uses the same image within the delay,
+  the image will be reused.
+
+- `docker.volumes.enabled`: Defaults to `false`. Allows tasks to bind host paths
+  (`volumes`) inside their container and use volume drivers (`volume_driver`).
+  Binding relative paths is always allowed and will be resolved relative to the
+  allocation's directory.
+
+- `docker.volumes.selinuxlabel`: Allows the operator to set a SELinux label to
+  the allocation and task local bind-mounts to containers. If used with
+  `docker.volumes.enabled` set to false, the labels will still be applied to the
+  standard binds in the container.
+
+- `docker.privileged.enabled` Defaults to `false`. Changing this to `true` will
+  allow containers to use `privileged` mode, which gives the containers full
+  access to the host's devices. Note that you must set a similar setting on the
+  Docker daemon for this to work.
+
+- `docker.caps.allowlist`: A list of allowed Linux capabilities. Defaults to
+  `"CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP, SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE"`, which is the list of
+  capabilities allowed by docker by default, as [defined
+  here](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities).
+  Allows the operator to control which capabilities can be obtained by tasks
+  using `cap_add` and `cap_drop` options. Supports the value `"ALL"` as a
+  shortcut for allowlisting all capabilities.
+
+- `docker.cleanup.container`: Defaults to `true`. This option can be used to
+  disable Nomad from removing a container when the task exits. Under a name
+  conflict, Nomad may still remove the dead container.
+
+- `docker.nvidia_runtime`: Defaults to `nvidia`. This option allows operators to select the runtime that should be used in order to expose Nvidia GPUs to the container.
+
+Note: When testing or using the `-dev` flag you can use `DOCKER_HOST`,
+`DOCKER_TLS_VERIFY`, and `DOCKER_CERT_PATH` to customize Nomad's behavior. If
+`docker.endpoint` is set Nomad will **only** read client configuration from the
+config file.
+ +An example is given below: + +```hcl +client { + options { + "docker.cleanup.image" = "false" + } +} +``` + +## Client Attributes + +The `docker` driver will set the following client attributes: + +- `driver.docker` - This will be set to "1", indicating the driver is + available. + +- `driver.docker.bridge_ip` - The IP of the Docker bridge network if one + exists. + +- `driver.docker.version` - This will be set to version of the docker server. + +Here is an example of using these properties in a job file: + +```hcl +job "docs" { + # Require docker version higher than 1.2. + constraint { + attribute = "${attr.driver.docker.version}" + operator = ">" + version = "1.2" + } +} +``` + +## Resource Isolation + +### CPU + +Nomad limits containers' CPU based on CPU shares. CPU shares allow containers +to burst past their CPU limits. CPU limits will only be imposed when there is +contention for resources. When the host is under load your process may be +throttled to stabilize QoS depending on how many shares it has. You can see how +many CPU shares are available to your process by reading [`NOMAD_CPU_LIMIT`][runtime_env]. +1000 shares are approximately equal to 1 GHz. + +Please keep the implications of CPU shares in mind when you load test workloads +on Nomad. + +If resources [`cores`][cores] is set, the task is given an isolated reserved set of +CPU cores to make use of. The total set of cores the task may run on is the private +set combined with the variable set of unreserved cores. The private set of CPU cores +is available to your process by reading [`NOMAD_CPU_CORES`][runtime_env]. + +### Memory + +Nomad limits containers' memory usage based on total virtual memory. This means +that containers scheduled by Nomad cannot use swap. This is to ensure that a +swappy process does not degrade performance for other workloads on the same +host. + +Since memory is not an elastic resource, you will need to make sure your +container does not exceed the amount of memory allocated to it, or it will be +terminated or crash when it tries to malloc. A process can inspect its memory +limit by reading [`NOMAD_MEMORY_LIMIT`][runtime_env], but will need to track its own memory +usage. Memory limit is expressed in megabytes so 1024 = 1 GB. + +### IO + +Nomad's Docker integration does not currently provide QoS around network or +filesystem IO. These will be added in a later release. + +### Security + +Docker provides resource isolation by way of +[cgroups and namespaces](https://docs.docker.com/introduction/understanding-docker/#the-underlying-technology). +Containers essentially have a virtual file system all to themselves. If you +need a higher degree of isolation between processes for security or other +reasons, it is recommended to use full virtualization like +[QEMU](/nomad/docs/job-declare/task-driver/qemu). + +## Caveats + +### Dangling Containers + +Nomad has a detector and a reaper for dangling Docker containers, +containers that Nomad starts yet does not manage or track. Though rare, they +lead to unexpectedly running services, potentially with stale versions. + +When Docker daemon becomes unavailable as Nomad starts a task, it is possible +for Docker to successfully start the container but return a 500 error code from +the API call. In such cases, Nomad retries and eventually aims to kill such +containers. However, if the Docker Engine remains unhealthy, subsequent retries +and stop attempts may still fail, and the started container becomes a dangling +container that Nomad no longer manages. 
+ +The newly added reaper periodically scans for such containers. It only targets +containers with a `com.hashicorp.nomad.allocation_id` label, or match Nomad's +conventions for naming and bind-mounts (i.e. `/alloc`, `/secrets`, `local`). +Containers that don't match Nomad container patterns are left untouched. + +Operators can run the reaper in a dry-run mode, where it only logs dangling +container ids without killing them, or disable it by setting the +`gc.dangling_containers` config block. + +### Docker for Windows + +Docker for Windows only supports running Windows containers. Because Docker for +Windows is relatively new and rapidly evolving you may want to consult the +[list of relevant issues on GitHub][winissues]. + +## Next steps + +[Use the Docker task driver in a job](/nomad/docs/job-declare/task-driver/docker). + +[faq-win-mac]: /nomad/docs/faq#q-how-to-connect-to-my-host-network-when-using-docker-desktop-windows-and-macos +[winissues]: https://github.com/hashicorp/nomad/issues?q=is%3Aopen+is%3Aissue+label%3Atheme%2Fdriver%2Fdocker+label%3Atheme%2Fplatform-windows +[plugin-options]: #plugin-options +[plugin-block]: /nomad/docs/configuration/plugin +[allocation working directory]: /nomad/docs/reference/runtime-environment-settings#task-directories 'Task Directories' +[`auth_soft_fail=true`]: /nomad/docs/job-declare/task-driver/docker#auth_soft_fail +[cap_add]: /nomad/docs/job-declare/task-driver/docker#cap_add +[cap_drop]: /nomad/docs/job-declare/task-driver/docker#cap_drop +[no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12 +[upgrade_guide_extra_hosts]: /nomad/docs/upgrade/upgrade-specific#docker-driver +[tini]: https://github.com/krallin/tini +[docker_caps]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities +[allow_caps]: /nomad/docs/job-declare/task-driver/docker#allow_caps +[Connect]: /nomad/docs/job-specification/connect +[`bridge`]: /nomad/docs/job-specification/network#bridge +[network block]: /nomad/docs/job-specification/network#bridge-mode +[`network.mode`]: /nomad/docs/job-specification/network#mode +[`pids_limit`]: /nomad/docs/job-declare/task-driver/docker#pids_limit +[Windows isolation]: https://learn.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container +[cores]: /nomad/docs/job-specification/resources#cores +[runtime_env]: /nomad/docs/reference/runtime-environment-settings#job-related-variables +[`--cap-add`]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities +[`--cap-drop`]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities +[cores]: /nomad/docs/job-specification/resources#cores diff --git a/website/content/docs/drivers/exec.mdx b/website/content/docs/deploy/task-driver/exec.mdx similarity index 58% rename from website/content/docs/drivers/exec.mdx rename to website/content/docs/deploy/task-driver/exec.mdx index 05e465313..af8bc49c6 100644 --- a/website/content/docs/drivers/exec.mdx +++ b/website/content/docs/deploy/task-driver/exec.mdx @@ -1,132 +1,19 @@ --- layout: docs -page_title: Isolated Fork/Exec task driver -description: Nomad's Isolated Fork/Exec task driver lets you run binaries using OS isolation primitives. Learn how to use the Isolated Fork/Exec task driver in your jobs. Configure the command to execute with command arguments, namespace isolation, and Linux capabilities. 
Review the Isolated Fork/Exec task driver capabilities, plugin options, client requirements, and client attributes. Learn how the Isolated Fork/Exec task driver affects resource isolation, chroot, and CPU resources. +page_title: Configure the Isolated Fork/Exec task driver +description: Nomad's Isolated Fork/Exec task driver lets you run binaries using OS isolation primitives. Review the Isolated Fork/Exec task driver capabilities, plugin options, client requirements, and client attributes. Learn how the Isolated Fork/Exec task driver affects resource isolation, chroot, and CPU resources. --- -# Isolated Fork/Exec task driver +# Configure the Isolated Fork/Exec task driver Name: `exec` -The `exec` driver is used to execute a particular command for a task. -However, unlike [`raw_exec`](/nomad/docs/drivers/raw_exec) it uses the underlying isolation -primitives of the operating system to limit the task's access to resources. While -simple, since the `exec` driver can invoke any command, it can be used to call -scripts or other wrappers which provide higher level features. - -## Task Configuration - -```hcl -task "webservice" { - driver = "exec" - - config { - command = "my-binary" - args = ["-flag", "1"] - } -} -``` - -The `exec` driver supports the following configuration in the job spec: - -- `command` - The command to execute. Must be provided. If executing a binary - that exists on the host, the path must be absolute and within the task's - [chroot](#chroot) or in a [host volume][] mounted with a - [`volume_mount`][volume_mount] block. The driver will make the binary - executable and will search, in order: - - - The `local` directory with the task directory. - - The task directory. - - Any mounts, in the order listed in the job specification. - - The `usr/local/bin`, `usr/bin` and `bin` directories inside the task - directory. - - If executing a binary that is downloaded - from an [`artifact`](/nomad/docs/job-specification/artifact), the path can be - relative from the allocation's root directory. - -- `args` - (Optional) A list of arguments to the `command`. References - to environment variables or any [interpretable Nomad - variables](/nomad/docs/runtime/interpolation) will be interpreted before - launching the task. - -- `pid_mode` - (Optional) Set to `"private"` to enable PID namespace isolation for - this task, or `"host"` to disable isolation. If left unset, the behavior is - determined from the [`default_pid_mode`][default_pid_mode] in plugin configuration. - -!> **Warning:** If set to `"host"`, other processes running as the same user will -be able to access sensitive process information like environment variables. - -- `ipc_mode` - (Optional) Set to `"private"` to enable IPC namespace isolation for - this task, or `"host"` to disable isolation. If left unset, the behavior is - determined from the [`default_ipc_mode`][default_ipc_mode] in plugin configuration. - -!> **Warning:** If set to `"host"`, other processes running as the same user will be -able to make use of IPC features, like sending unexpected POSIX signals. - -- `cap_add` - (Optional) A list of Linux capabilities to enable for the task. - Effective capabilities (computed from `cap_add` and `cap_drop`) must be a - subset of the allowed capabilities configured with [`allow_caps`][allow_caps]. - Note that `"all"` is not permitted here if the `allow_caps` field in the - driver configuration doesn't also allow all capabilities. 
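Because this page focuses on client-side configuration of the driver, a minimal sketch of the `exec` plugin block may be useful. The option names match the `default_pid_mode`, `default_ipc_mode`, and `allow_caps` options referenced elsewhere on this page; the values shown are illustrative assumptions:

```hcl
plugin "exec" {
  config {
    # Default namespace isolation for tasks that do not set pid_mode or ipc_mode.
    default_pid_mode = "private"
    default_ipc_mode = "private"

    # Linux capabilities that tasks may request with cap_add.
    allow_caps = ["chown", "net_bind_service"]
  }
}
```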
- -```hcl -config { - cap_add = ["net_raw", "sys_time"] -} -``` - -- `cap_drop` - (Optional) A list of Linux capabilities to disable for the task. - Effective capabilities (computed from `cap_add` and `cap_drop`) must be a subset - of the allowed capabilities configured with [`allow_caps`][allow_caps]. - -```hcl -config { - cap_drop = ["all"] - cap_add = ["chown", "sys_chroot", "mknod"] -} -``` - -- `work_dir` - (Optional) Sets a custom working directory for the task. This path must be - absolute and within the task's [chroot](#chroot) or in a [host volume][] mounted - with a [`volume_mount`][volume_mount] block. This will also change the working - directory when using `nomad alloc exec`. - -## Examples - -To run a binary present on the Node: - -```hcl -task "example" { - driver = "exec" - - config { - # When running a binary that exists on the host, the path must be absolute. - command = "/bin/sleep" - args = ["1"] - } -} -``` - -To execute a binary downloaded from an -[`artifact`](/nomad/docs/job-specification/artifact): - -```hcl -task "example" { - driver = "exec" - - config { - command = "name-of-my-binary" - } - - artifact { - source = "https://internal.file.server/name-of-my-binary" - options { - checksum = "sha256:abd123445ds4555555555" - } - } -} -``` +The `exec` driver is used to execute a particular command for a task. However, +unlike [`raw_exec`](/nomad/docs/job-declare/task-driver/raw_exec) it uses the +underlying isolation primitives of the operating system to limit the task's +access to resources. While simple, since the `exec` driver can invoke any +command, it can be used to call scripts or other wrappers which provide higher +level features. ## Capabilities @@ -298,15 +185,19 @@ CPU cores to make use of. The total set of cores the task may run on is the priv set combined with the variable set of unreserved cores. The private set of CPU cores is available to your process by reading [`NOMAD_CPU_CORES`][runtime_env]. -[default_pid_mode]: /nomad/docs/drivers/exec#default_pid_mode -[default_ipc_mode]: /nomad/docs/drivers/exec#default_ipc_mode -[cap_add]: /nomad/docs/drivers/exec#cap_add -[cap_drop]: /nomad/docs/drivers/exec#cap_drop +## Next steps + +[Use the Isolated Fork/Exec task driver in a job](/nomad/docs/job-declare/task-driver/exec). 
+ +[default_pid_mode]: /nomad/docs/job-declare/task-driver/exec#default_pid_mode +[default_ipc_mode]: /nomad/docs/job-declare/task-driver/exec#default_ipc_mode +[cap_add]: /nomad/docs/job-declare/task-driver/exec#cap_add +[cap_drop]: /nomad/docs/job-declare/task-driver/exec#cap_drop [no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12 -[allow_caps]: /nomad/docs/drivers/exec#allow_caps +[allow_caps]: /nomad/docs/job-declare/task-driver/exec#allow_caps [docker_caps]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities [host volume]: /nomad/docs/configuration/client#host_volume-block [volume_mount]: /nomad/docs/job-specification/volume_mount [cores]: /nomad/docs/job-specification/resources#cores -[runtime_env]: /nomad/docs/runtime/environment#job-related-variables -[cgroup controller requirements]: /nomad/docs/install/production/requirements#hardening-nomad +[runtime_env]: /nomad/docs/reference/runtime-environment-settings#job-related-variables +[cgroup controller requirements]: /nomad/docs/deploy/production/requirements#hardening-nomad diff --git a/website/content/docs/deploy/task-driver/index.mdx b/website/content/docs/deploy/task-driver/index.mdx new file mode 100644 index 000000000..e2cf63299 --- /dev/null +++ b/website/content/docs/deploy/task-driver/index.mdx @@ -0,0 +1,62 @@ +--- +layout: docs +page_title: Configure Nomad task drivers +description: Nomad's bundled task drivers integrate with the host OS to run job tasks in isolation. Review conceptual, configuration, and reference information for the Docker, Isolated Fork/Exec, Java, QEMU, and Raw Fork/Exec task drivers. +--- + +# Configure Nomad task drivers + +Nomad's bundled task drivers integrate with the host OS to run job tasks in +isolation. Review driver capabilities, client configuration, and reference +information for the Docker, Isolated Fork/Exec, Java, QEMU, and Raw Fork/Exec +task drivers. + +@include 'task-driver-intro.mdx' + +## Configuration + +Refer to the [plugin block documentation][plugin] for examples on how to use the +plugin block in Nomad's client configuration. Review the [Docker driver's client +requirements section][docker_plugin] for a detailed example. + +## Nomad task drivers + +The Nomad binary contains several bundled task drivers. We also support +additional task driver plugins that you may install separately. + +| Bundled with Nomad | Plugins | +|----------------------|-----------------------| +| [Docker] | [Exec2] | +| [Isolated Fork/Exec] | [Podman] | +| [Java] | [Virt] | +| [QEMU] | | +| [Raw Fork/Exec] | | + +## Community task drivers + +You may also use [community-supported task driver +plugins](/nomad/plugins/drivers/community/). + +## Use task drivers in jobs + +Refer to [Use Nomad task drivers in jobs](/nomad/docs/job-declare/task-driver) +for usage information. + +## Create task drivers + +Nomad's task driver architecture is pluggable, which gives you the flexibility +to create your own drivers without having to recompile Nomad. Refer to the +[plugin authoring guide][plugin_guide] for details. 
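As a complement to the configuration section above, the following sketch shows how a client might load an externally distributed driver plugin such as Podman; the directory path is an assumption, not a default:

```hcl
# Client agent configuration (sketch).
plugin_dir = "/opt/nomad/plugins"

plugin "nomad-driver-podman" {
  config {
    # Driver-specific options go here.
  }
}
```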
+ + +[plugin]: /nomad/docs/configuration/plugin +[docker_plugin]: /nomad/docs/deploy/task-driver/docker#client-requirements +[plugin_guide]: /nomad/docs/concepts/plugins/task-drivers +[Docker]: /nomad/docs/deploy/task-driver/docker +[Exec2]: /nomad/plugins/drivers/exec2 +[Isolated Fork/Exec]: /nomad/docs/deploy/task-driver/exec +[Podman]: /nomad/plugins/drivers/podman +[Java]: /nomad/docs/deploy/task-driver/java +[Virt]: /nomad/plugins/drivers/virt +[QEMU]: /nomad/docs/deploy/task-driver/qemu +[Raw Fork/Exec]: /nomad/docs/deploy/task-driver/raw_exec diff --git a/website/content/docs/drivers/java.mdx b/website/content/docs/deploy/task-driver/java.mdx similarity index 51% rename from website/content/docs/drivers/java.mdx rename to website/content/docs/deploy/task-driver/java.mdx index c815bd67e..bae42c006 100644 --- a/website/content/docs/drivers/java.mdx +++ b/website/content/docs/deploy/task-driver/java.mdx @@ -1,10 +1,10 @@ --- layout: docs -page_title: Java task driver -description: Nomad's Java task driver lets you run JAR files in your workloads. Learn how to configure a job task that uses the Java task driver. Configure paths, JAR args, JVM options, namespace isolation, work directory, and Linux capabilities. Review the Java task driver capabilities, plugin options, client requirements, and client attributes such as Java version and virtual machine. Learn how the Java task driver affects resource isolation and chroot. +page_title: Configure the Java task driver +description: Nomad's Java task driver lets you run JAR files in your workloads. Review the Java task driver capabilities, plugin options, client requirements, and client attributes such as Java version and virtual machine. Learn how the Java task driver affects resource isolation and chroot. --- -# Java task driver +# Configure the Java task driver Name: `java` @@ -12,133 +12,6 @@ The `java` driver is used to execute Java applications packaged into a Java Jar file. The driver requires the Jar file to be accessible from the Nomad client via the [`artifact` downloader](/nomad/docs/job-specification/artifact). -## Task Configuration - -```hcl -task "webservice" { - driver = "java" - - config { - jar_path = "local/example.jar" - jvm_options = ["-Xmx2048m", "-Xms256m"] - } -} -``` - -The `java` driver supports the following configuration in the job spec: - -- `class` - (Optional) The name of the class to run. If `jar_path` is specified - and the manifest specifies a main class, this is optional. If shipping classes - rather than a Jar, please specify the class to run and the `class_path`. - -- `class_path` - (Optional) The `class_path` specifies the class path used by - Java to lookup classes and Jars. - -- `jar_path` - (Optional) The path to the downloaded Jar. In most cases this will just be - the name of the Jar. However, if the supplied artifact is an archive that - contains the Jar in a subfolder, the path will need to be the relative path - (`subdir/from_archive/my.jar`). - -- `args` - (Optional) A list of arguments to the Jar's main method. References - to environment variables or any [interpretable Nomad - variables](/nomad/docs/runtime/interpolation) will be interpreted before - launching the task. - -- `jvm_options` - (Optional) A list of JVM options to be passed while invoking - java. These options are passed without being validated in any way by Nomad. - -- `pid_mode` - (Optional) Set to `"private"` to enable PID namespace isolation for - this task, or `"host"` to disable isolation. 
If left unset, the behavior is - determined from the [`default_pid_mode`][default_pid_mode] in plugin configuration. - -!> **Warning:** If set to `"host"`, other processes running as the same user will -be able to access sensitive process information like environment variables. - -- `ipc_mode` - (Optional) Set to `"private"` to enable IPC namespace isolation for - this task, or `"host"` to disable isolation. If left unset, the behavior is - determined from the [`default_ipc_mode`][default_ipc_mode] in plugin configuration. - -!> **Warning:** If set to `"host"`, other processes running as the same user will be -able to make use of IPC features, like sending unexpected POSIX signals. - -- `cap_add` - (Optional) A list of Linux capabilities to enable for the task. - Effective capabilities (computed from `cap_add` and `cap_drop`) must be a - subset of the allowed capabilities configured with [`allow_caps`][allow_caps]. - Note that `"all"` is not permitted here if the `allow_caps` field in the - driver configuration doesn't also allow all capabilities. - - -```hcl -config { - cap_add = ["net_raw", "sys_time"] -} -``` - -- `cap_drop` - (Optional) A list of Linux capabilities to disable for the task. - Effective capabilities (computed from `cap_add` and `cap_drop`) must be a subset - of the allowed capabilities configured with [`allow_caps`][allow_caps]. - -```hcl -config { - cap_drop = ["all"] - cap_add = ["chown", "sys_chroot", "mknod"] -} -``` - -- `work_dir` - (Optional) Sets a custom working directory for the task. This path must be - absolute and within the task's [chroot](#chroot) or in a [host volume][] mounted - with a [`volume_mount`][volume_mount] block. This will also change the working - directory when using `nomad alloc exec`. - -## Examples - -A simple config block to run a Java Jar: - -```hcl -task "web" { - driver = "java" - - config { - jar_path = "local/hello.jar" - jvm_options = ["-Xmx2048m", "-Xms256m"] - } - - # Specifying an artifact is required with the "java" driver. This is the - # mechanism to ship the Jar to be run. - artifact { - source = "https://internal.file.server/hello.jar" - - options { - checksum = "md5:123445555555555" - } - } -} -``` - -A simple config block to run a Java class: - -```hcl -task "web" { - driver = "java" - - config { - class = "Hello" - class_path = "${NOMAD_TASK_DIR}" - jvm_options = ["-Xmx2048m", "-Xms256m"] - } - - # Specifying an artifact is required with the "java" driver. This is the - # mechanism to ship the Jar to be run. - artifact { - source = "https://internal.file.server/Hello.class" - - options { - checksum = "md5:123445555555555" - } - } -} -``` - ## Capabilities The `java` driver implements the following [capabilities](/nomad/docs/concepts/plugins/task-drivers#capabilities-capabilities-error). @@ -257,13 +130,17 @@ create. This list is configurable through the agent client [configuration file](/nomad/docs/configuration/client#chroot_env). -[default_pid_mode]: /nomad/docs/drivers/java#default_pid_mode -[default_ipc_mode]: /nomad/docs/drivers/java#default_ipc_mode -[cap_add]: /nomad/docs/drivers/java#cap_add -[cap_drop]: /nomad/docs/drivers/java#cap_drop +## Next steps + +[Use the Java task driver in a job](/nomad/docs/job-declare/task-driver/java). 
+ +[default_pid_mode]: /nomad/docs/job-declare/task-driver/java#default_pid_mode +[default_ipc_mode]: /nomad/docs/job-declare/task-driver/java#default_ipc_mode +[cap_add]: /nomad/docs/job-declare/task-driver/java#cap_add +[cap_drop]: /nomad/docs/job-declare/task-driver/java#cap_drop [no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12 -[allow_caps]: /nomad/docs/drivers/java#allow_caps +[allow_caps]: /nomad/docs/job-declare/task-driver/java#allow_caps [docker_caps]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities -[cgroup controller requirements]: /nomad/docs/install/production/requirements#hardening-nomad +[cgroup controller requirements]: /nomad/docs/deploy/production/requirements#hardening-nomad [volume_mount]: /nomad/docs/job-specification/volume_mount [host volume]: /nomad/docs/configuration/client#host_volume-block diff --git a/website/content/docs/deploy/task-driver/qemu.mdx b/website/content/docs/deploy/task-driver/qemu.mdx new file mode 100644 index 000000000..b08466d37 --- /dev/null +++ b/website/content/docs/deploy/task-driver/qemu.mdx @@ -0,0 +1,103 @@ +--- +layout: docs +page_title: Configure the QEMU task driver +description: Nomad's QEMU task driver provides a generic virtual machine runner that can execute any regular QEMU image. Review the QEMU task driver capabilities, plugin options, client requirements, and client attributes such as QEMU version. Learn how the QEMU task driver provides the highest level of workload isolation. +--- + +# Configure the QEMU task driver + +Name: `qemu` + +The `qemu` driver provides a generic virtual machine runner. QEMU can utilize +the KVM kernel module to utilize hardware virtualization features and provide +great performance. Currently the `qemu` driver can map a set of ports from the +host machine to the guest virtual machine, and provides configuration for +resource allocation. + +The `qemu` driver can execute any regular `qemu` image (e.g. `qcow`, `img`, +`iso`), and is currently invoked with `qemu-system-x86_64`. + +The driver requires the image to be accessible from the Nomad client via the +[`artifact` downloader](/nomad/docs/job-specification/artifact). + +## Capabilities + +The `qemu` driver implements the following [capabilities](/nomad/docs/concepts/plugins/task-drivers#capabilities-capabilities-error). + +| Feature | Implementation | +| -------------------- | -------------- | +| `nomad alloc signal` | false | +| `nomad alloc exec` | false | +| filesystem isolation | image | +| network isolation | none | +| volume mounting | none | + +## Client Requirements + +The `qemu` driver requires QEMU to be installed and in your system's `$PATH`. +The task must also specify at least one artifact to download, as this is the only +way to retrieve the image being run. + +## Client Attributes + +The `qemu` driver will set the following client attributes: + +- `driver.qemu` - Set to `1` if QEMU is found on the host node. Nomad determines + this by executing `qemu-system-x86_64 -version` on the host and parsing the output +- `driver.qemu.version` - Version of `qemu-system-x86_64`, ex: `2.4.0` + +Here is an example of using these properties in a job file: + +```hcl +job "docs" { + # Only run this job where the qemu version is higher than 1.2.3. 
+  constraint {
+    attribute = "${driver.qemu.version}"
+    operator  = ">"
+    value     = "1.2.3"
+  }
+}
+```
+
+## Plugin Options
+
+```hcl
+plugin "qemu" {
+  config {
+    image_paths    = ["/mnt/image/paths"]
+    args_allowlist = ["-drive", "-usbdevice"]
+  }
+}
+```
+
+- `image_paths` (`[]string`: `[]`) - Specifies the host paths the QEMU
+  driver is allowed to load images from.
+- `args_allowlist` (`[]string`: `[]`) - Specifies the command line
+  flags that the [`args`] option is permitted to pass to QEMU. If
+  unset, a job submitter can pass any command line flag into QEMU,
+  including flags that provide the VM with access to host devices such
+  as USB drives. Refer to the [QEMU documentation] for the available
+  flags.
+
+## Resource Isolation
+
+Nomad uses QEMU to provide full software virtualization for virtual machine
+workloads. Nomad can use QEMU KVM's hardware-assisted virtualization to deliver
+better performance.
+
+Virtualization provides the highest level of isolation for workloads that
+require additional security, and resource use is constrained by the QEMU
+hypervisor rather than the host kernel. VM network traffic still flows through
+the host's interface(s).
+
+Note that the strong isolation provided by virtualization only applies
+to the workload once the VM is started. Operators should use the
+`args_allowlist` option to prevent job submitters from accessing
+devices and resources they are not allowed to access.
+
+## Next steps
+
+[Use the QEMU task driver in a job](/nomad/docs/job-declare/task-driver/qemu).
+
+[`args`]: /nomad/docs/job-declare/task-driver/qemu#args
+[QEMU documentation]: https://www.qemu.org/docs/master/system/invocation.html
diff --git a/website/content/docs/drivers/raw_exec.mdx b/website/content/docs/deploy/task-driver/raw_exec.mdx
similarity index 55%
rename from website/content/docs/drivers/raw_exec.mdx
rename to website/content/docs/deploy/task-driver/raw_exec.mdx
index a4edc359f..892d371e5 100644
--- a/website/content/docs/drivers/raw_exec.mdx
+++ b/website/content/docs/deploy/task-driver/raw_exec.mdx
@@ -1,10 +1,10 @@
 ---
 layout: docs
 page_title: Raw Fork/Exec task driver
-description: Nomad's Raw Exec task driver lets you execute commands with no resource isolation. Learn how to use the Raw Fork/Exec task driver in your jobs. Configure the command to execute, command arguments, cgroup overrides, work directory, and out-of-memory (OOM) behavior. Review the Isolated Fork/Exec task driver capabilities, plugin options, client requirements, and client attributes.
+description: Nomad's Raw Exec task driver lets you execute commands with no resource isolation. Review the Raw Fork/Exec task driver capabilities, plugin options, client requirements, and client attributes.
 ---
 
-# Raw Fork/Exec task driver
+# Configure the Raw Fork/Exec task driver
 
 Name: `raw_exec`
 
@@ -12,93 +12,6 @@ The `raw_exec` driver is used to execute a command for a task without any
 isolation. Further, the task is started as the same user as the Nomad process.
 As such, it should be used with extreme care and is disabled by default.
 
-## Task Configuration
-
-```hcl
-task "webservice" {
-  driver = "raw_exec"
-
-  config {
-    command = "my-binary"
-    args    = ["-flag", "1"]
-  }
-}
-```
-
-The `raw_exec` driver supports the following configuration in the job spec:
-
-- `command` - The command to execute. Must be provided. If executing a binary
-  that exists on the host, the path must be absolute.
If executing a binary that - is downloaded from an [`artifact`](/nomad/docs/job-specification/artifact), the - path can be relative from the allocation's root directory. - -- `args` - (Optional) A list of arguments to the `command`. References - to environment variables or any [interpretable Nomad - variables](/nomad/docs/runtime/interpolation) will be interpreted before - launching the task. - -- `cgroup_v1_override` - (Optional) A map of controller names to paths. The - task will be added to these cgroups. The task will fail if these cgroups do - not exist. **WARNING:** May conflict with other Nomad driver's cgroups and - have unintended side effects. - -- `cgroup_v2_override` - (Optional) Adds the task to a unified cgroup path. - Paths may be relative to the cgroupfs root or absolute. **WARNING:** May - conflict with other Nomad driver's cgroups and have unintended side - effects. - -~> On Linux, you cannot set the `task.user` field on a task using the `raw_exec` -driver if you have hardened the Nomad client according to the -[production][hardening] guide. On Windows, when Nomad is running as a [system -service][service], you may specify a less-privileged service user. For example, -`NT AUTHORITY\LocalService`, `NT AUTHORITY\NetworkService`. - -- `oom_score_adj` - (Optional) A positive integer to indicate the likelihood of - the task being OOM killed (valid only for Linux). Defaults to 0. - -- `work_dir` - (Optional) Sets a custom working directory for the task. This - must be an absolute path. This will also change the working directory when - using `nomad alloc exec`. - -- `denied_envvars` - (Optional) Passes a list of environment variables that - the driver should scrub from the task environment. Supports globbing, with "*" - wildcard accepted as prefix and/or suffix. - -## Examples - -To run a binary present on the Node: - -``` -task "example" { - driver = "raw_exec" - - config { - # When running a binary that exists on the host, the path must be absolute/ - command = "/bin/sleep" - args = ["1"] - } -} -``` - -To execute a binary downloaded from an [`artifact`](/nomad/docs/job-specification/artifact): - -``` -task "example" { - driver = "raw_exec" - - config { - command = "name-of-my-binary" - } - - artifact { - source = "https://internal.file.server/name-of-my-binary" - options { - checksum = "sha256:abd123445ds4555555555" - } - } -} -``` - ## Capabilities The `raw_exec` driver implements the following [capabilities](/nomad/docs/concepts/plugins/task-drivers#capabilities-capabilities-error). @@ -210,7 +123,11 @@ resources { } ``` -[hardening]: /nomad/docs/install/production/requirements#user-permissions -[service]: /nomad/docs/install/windows-service +## Next steps + +[Use the `raw_exec` driver in a job](/nomad/docs/job-declare/task-driver/raw_exec). + +[hardening]: /nomad/docs/deploy/production/requirements#user-permissions +[service]: /nomad/docs/deploy/production/windows-service [plugin-options]: #plugin-options [plugin-block]: /nomad/docs/configuration/plugin diff --git a/website/content/docs/drivers/index.mdx b/website/content/docs/drivers/index.mdx deleted file mode 100644 index dc300a85f..000000000 --- a/website/content/docs/drivers/index.mdx +++ /dev/null @@ -1,11 +0,0 @@ ---- -layout: docs -page_title: Nomad task drivers -description: Nomad's bundled task drivers integrate with the host OS to run job tasks in isolation. Review conceptual, installation, usage, and reference information for the Docker, Isolated Fork/Exec, Java, QEMU, and Raw Fork/Exec task drivers. 
---- - -# Nomad task drivers - -Nomad's bundled task drivers integrate with the host OS to run job tasks in isolation. Review conceptual, installation, usage, and reference information for the Docker, Isolated Fork/Exec, Java, QEMU, and Raw Fork/Exec task drivers. - -@include 'task-driver-intro.mdx' diff --git a/website/content/docs/ecosystem.mdx b/website/content/docs/ecosystem.mdx index a87688371..4bde196b5 100644 --- a/website/content/docs/ecosystem.mdx +++ b/website/content/docs/ecosystem.mdx @@ -49,11 +49,11 @@ description: Learn about the Nomad ecosystem, which includes CI/CD, task drivers ## Secret Management -- [Vault](/nomad/docs/integrations/vault) +- [Vault](/nomad/docs/secure/vault) ## Service Mesh -- [Consul](/nomad/docs/integrations/consul/service-mesh) +- [Consul](/nomad/docs/networking/consul/service-mesh) ## Provisioning @@ -67,13 +67,13 @@ description: Learn about the Nomad ecosystem, which includes CI/CD, task drivers ## Service Proxy -- [Envoy](/nomad/docs/integrations/consul-connect) +- [Envoy](/nomad/docs/networking/consul) - [NGINX](/nomad/tutorials/load-balancing/load-balancing-nginx) - [Traefik](/nomad/tutorials/load-balancing/load-balancing-traefik) ## Storage -- [CSI](/nomad/docs/concepts/plugins/storage/csi) +- [CSI](/nomad/docs/architecture/storage/csi) ## GPUs diff --git a/website/content/docs/enterprise/index.mdx b/website/content/docs/enterprise/index.mdx index b60bf9705..174bcde16 100644 --- a/website/content/docs/enterprise/index.mdx +++ b/website/content/docs/enterprise/index.mdx @@ -61,7 +61,7 @@ the numbers match, Nomad will begin to promote new servers and demote old ones. See the [Autopilot - Upgrade -Migrations](/nomad/tutorials/manage-clusters/autopilot#upgrade-migrations) +Migrations](/nomad/docs/manage/autopilot#upgrade-migrations) documentation for a thorough overview. ### Automated Backups @@ -75,7 +75,7 @@ This capability provides an enterprise solution for backup and restoring the state of Nomad servers within an environment in an automated manner. These snapshots are atomic and point-in-time. -See the [Operator Snapshot agent](/nomad/docs/commands/operator/snapshot/agent) +See the [Operator Snapshot agent](/nomad/commands/operator/snapshot/agent) documentation for a thorough overview. ### Enhanced Read Scalability @@ -87,7 +87,7 @@ committed). Adding explicit non-voters will scale reads and scheduling without impacting write latency. See the [Autopilot - Read -Scalability](/nomad/tutorials/manage-clusters/autopilot#server-read-and-scheduling-scaling) +Scalability](/nomad/docs/manage/autopilot#server-read-and-scheduling-scaling) documentation for a thorough overview. ### Redundancy Zones @@ -102,7 +102,7 @@ will promote the non-voter to a voter automatically, putting the hot standby server into service quickly. See the [Autopilot - Redundancy -Zones](/nomad/tutorials/manage-clusters/autopilot#redundancy-zones) +Zones](/nomad/docs/manage/autopilot#redundancy-zones) documentation for a thorough overview. ### Multiple Vault Namespaces @@ -113,7 +113,7 @@ consolidation when running Nomad and Vault together. Nomad will automatically retrieve a Vault token based on a job's defined Vault Namespace and make it available for the specified Nomad task at hand. -Refer to the [Vault Integration documentation](/nomad/docs/integrations/vault/acl#vault-namespaces)) for more information. +Refer to the [Vault Integration documentation](/nomad/docs/secure/vault/acl#vault-namespaces)) for more information. 
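+
+For example, a task can select the Vault namespace for its token in the job
+specification's `vault` block. The following minimal sketch uses an
+illustrative namespace name:
+
+```hcl
+task "app" {
+  vault {
+    # Illustrative Vault namespace; Nomad requests the task's token from it
+    namespace = "engineering"
+  }
+}
+```
+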
### Multiple Vault and Consul Clusters @@ -162,7 +162,7 @@ This allows operators to partition a shared cluster and ensure that no single actor can consume the whole resources of the cluster. See the [Resource Quotas -Guide](/nomad/tutorials/governance-and-policy/quotas) for a thorough +Guide](/nomad/docs/govern/resource-quotas) for a thorough overview. ### Sentinel Policies @@ -208,5 +208,5 @@ request a trial of Nomad Enterprise. [multiregion deployments]: /nomad/docs/job-specification/multiregion [autoscaling capabilities]: /nomad/tools/autoscaling [scaling policies]: /nomad/tools/autoscaling/policy -[Sentinel Policies Guide]: /nomad/tutorials/governance-and-policy/sentinel -[Nomad Sentinel policy reference]: /nomad/docs/enterprise/sentinel +[Sentinel Policies Guide]: /nomad/docs/govern/sentinel +[Nomad Sentinel policy reference]: /nomad/docs/reference/sentinel-policy diff --git a/website/content/docs/enterprise/license/faq.mdx b/website/content/docs/enterprise/license/faq.mdx index db8295481..6a9ae7cc1 100644 --- a/website/content/docs/enterprise/license/faq.mdx +++ b/website/content/docs/enterprise/license/faq.mdx @@ -21,7 +21,7 @@ when the license expires, and a server can not start with an expired license. For new "non-terminating" contract licenses, instead of expiration being compared to the current time, expiration time is compared to the build date of the Nomad binary, which you can find in the output of the -[`nomad version`](/nomad/docs/commands/version) command. +[`nomad version`](/nomad/commands/version) command. The practical result is that newer contract licenses will work in perpetuity for any v1.6.0+ version of Nomad built prior to the expiration time of the license. No features will stop working, @@ -97,4 +97,4 @@ not having forward compatibility, and may result in a crash loop. ## Q: Is there a tutorial available for the license configuration steps? Please visit the [Enterprise License Tutorial](/nomad/tutorials/enterprise/hashicorp-enterprise-license). -[license inspect]: /nomad/docs/commands/license/inspect +[license inspect]: /nomad/commands/license/inspect diff --git a/website/content/docs/enterprise/license/index.mdx b/website/content/docs/enterprise/license/index.mdx index c61c0869e..d335265b5 100644 --- a/website/content/docs/enterprise/license/index.mdx +++ b/website/content/docs/enterprise/license/index.mdx @@ -34,10 +34,10 @@ relating to Enterprise features not being parseable by open source Nomad. Nomad Enterprise licenses have an expiration time. You can read and validate the license on a running server, on disk, or in your environment with the -[`nomad license` commands](/nomad/docs/commands/license). +[`nomad license` commands](/nomad/commands/license). Before upgrading Nomad or replacing your license with a new one, you should always run -[`nomad license inspect`](/nomad/docs/commands/license/inspect) +[`nomad license inspect`](/nomad/commands/license/inspect) to ensure the license is valid with your server binary. As a Nomad Enterprise license approaches its expiration time, Nomad servers @@ -75,7 +75,7 @@ NOMAD_LICENSE=misconfigured nomad agent -dev ==> Error starting agent: server setup failed: failed to initialize enterprise licensing: a file license was configured but the license is invalid: error decoding version: expected integer ``` -See the [License commands](/nomad/docs/commands/license) for more information on +See the [License commands](/nomad/commands/license) for more information on interacting with the Enterprise License. 
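+
+As a sketch, a license stored on disk can be referenced from the server
+configuration with the `license_path` option (the path shown is illustrative):
+
+```hcl
+server {
+  enabled      = true
+  license_path = "/etc/nomad.d/license.hclic"
+}
+```
+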
[releases site]: https://releases.hashicorp.com/nomad diff --git a/website/content/docs/enterprise/license/utilization-reporting.mdx b/website/content/docs/enterprise/license/utilization-reporting.mdx index 57d67127c..f950f0111 100644 --- a/website/content/docs/enterprise/license/utilization-reporting.mdx +++ b/website/content/docs/enterprise/license/utilization-reporting.mdx @@ -121,7 +121,7 @@ Set the following environment variable. $ export OPTOUT_LICENSE_REPORTING=true ``` -Now restart your system by following [these instructions](/nomad/docs/operations/nomad-agent). +Now restart your system by following [these instructions](/nomad/docs/deploy/nomad-agent). Check your product logs roughly 24 hours after opting out to make sure that the system isn’t trying to send reports. diff --git a/website/content/docs/faq.mdx b/website/content/docs/faq.mdx index 6f2c45478..0b45423b3 100644 --- a/website/content/docs/faq.mdx +++ b/website/content/docs/faq.mdx @@ -19,8 +19,8 @@ and [`disable_update_check`](/nomad/docs/configuration#disable_update_check). ## Q: Is Nomad eventually or strongly consistent? -Nomad makes use of both a [consensus protocol](/nomad/docs/concepts/consensus) and -a [gossip protocol](/nomad/docs/concepts/gossip). The consensus protocol is strongly +Nomad makes use of both a [consensus protocol](/nomad/docs/architecture/cluster/consensus) and +a [gossip protocol](/nomad/docs/architecture/security/gossip). The consensus protocol is strongly consistent, and is used for all state replication and scheduling. The gossip protocol is used to manage the addresses of servers for automatic clustering and multi-region federation. This means all data that is managed by Nomad is strongly consistent. diff --git a/website/content/docs/glossary.mdx b/website/content/docs/glossary.mdx index bdf71eed3..a86a5517e 100644 --- a/website/content/docs/glossary.mdx +++ b/website/content/docs/glossary.mdx @@ -108,7 +108,7 @@ region and they manage all jobs and clients, run evaluations, and create task allocations. The servers replicate data between each other and perform leader election to ensure high availability. More information about latency requirements for servers can be found in [Network -Topology](/nomad/docs/install/production/requirements#network-topology). +Topology](/nomad/docs/deploy/production/requirements#network-topology). ## Task diff --git a/website/content/docs/govern/index.mdx b/website/content/docs/govern/index.mdx new file mode 100644 index 000000000..78f9c1b32 --- /dev/null +++ b/website/content/docs/govern/index.mdx @@ -0,0 +1,113 @@ +--- +layout: docs +page_title: Governance and policy on Nomad +description: |- + This section provides best practices and guidance for operating Nomad securely in a multi-team setting through features such as resource quotas, node pools, and Sentinel policies. +--- + +# Governance and policy on Nomad + +Nomad Enterprise is aimed at teams and organizations and addresses the organizational complexity of multi-team and multi-cluster deployments with collaboration and governance features. + +This section provides best practices and guidance for operating Nomad +securely in a multi-team setting through features such as resource quotas, node +pools, and Sentinel policies. + +## Resource quotas + + + +When many teams or users +are sharing Nomad clusters, there is the concern that a single user could use +more than their fair share of resources. Resource quotas provide a mechanism for +cluster administrators to restrict the resources within a namespace. 
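+
+As a sketch, a quota specification names the resource ceilings that Nomad
+enforces for a region; the name and limits below are illustrative:
+
+```hcl
+name        = "web-team"
+description = "Limit the web team namespaces"
+
+limit {
+  region = "global"
+  region_limit {
+    cpu    = 2500
+    memory = 1000
+  }
+}
+```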
+ +Quota specifications are first class objects in Nomad. A quota specification +has a unique name, an optional human readable description, and a set of quota +limits. The quota limits define the allowed resource usage within a region. + +Quota objects are shareable among namespaces. This allows an operator to define +higher level quota specifications, such as a `prod-api` quota, and multiple +namespaces can apply the `prod-api` quota specification. + +## Sentinel + + + +[![Sentinel Overview][img_sentinel_overview]][img_sentinel_overview] + +- **Sentinel Policies** - Policies are able to introspect on request arguments + and use complex logic to determine if the request meets policy requirements. + For example, a Sentinel policy may restrict Nomad jobs to only using the + "docker" driver or prevent jobs from being modified outside of business + hours. + +- **Policy Scope** - Sentinel policies declare a "scope", which determines when + the policies apply. Currently the only supported scope is "submit-job", which + applies to any new jobs being submitted, or existing jobs being updated. + +- **Enforcement Level** - Sentinel policies support multiple enforcement levels. + The `advisory` level emits a warning when the policy fails, while + `soft-mandatory` and `hard-mandatory` will prevent the operation. A + `soft-mandatory` policy can be overridden if the user has necessary + permissions. + +### Sentinel policies + +Each Sentinel policy has a unique name, an optional description, applicable +scope, enforcement level, and a Sentinel rule definition. If multiple policies +are installed for the same scope, all of them are enforced and must pass. + +Sentinel policies _cannot_ be used unless the ACL system is enabled. + +### Policy scope + +Sentinel policies specify an applicable scope, which limits when the policy is +enforced. This allows policies to govern various aspects of the system. + +The following table summarizes the scopes that are available for Sentinel +policies: + +| Scope | Description | +| ---------- | ----------------------------------------------------- | +| submit-job | Applies to any jobs (new or updated) being registered | + +### Enforcement level + +Sentinel policies specify an enforcement level which changes how a policy is +enforced. This allows for more flexibility in policy enforcement. + +The following table summarizes the enforcement levels that are available: + +| Enforcement Level | Description | +| ----------------- | ---------------------------------------------------------------------- | +| advisory | Issues a warning when a policy fails | +| soft-mandatory | Prevents operation when a policy fails, issues a warning if overridden | +| hard-mandatory | Prevents operation when a policy fails | + +The [`sentinel-override` capability] is required to override a `soft-mandatory` +policy. This allows a restricted set of users to have override capability when +necessary. + +### Multi-region configuration + +Nomad supports multi-datacenter and multi-region configurations. A single region +is able to service multiple datacenters, and all servers in a region replicate +their state between each other. In a multi-region configuration, there is a set +of servers per region. Each region operates independently and is loosely coupled +to allow jobs to be scheduled in any region and requests to flow transparently +to the correct region. 
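+
+For example, each agent declares its region and datacenter in its
+configuration. The following fragment is an illustrative sketch:
+
+```hcl
+# Agent configuration fragment for one region (names are illustrative)
+region     = "west"
+datacenter = "west-1"
+
+server {
+  enabled          = true
+  bootstrap_expect = 3
+}
+```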
+ +When ACLs are enabled, Nomad depends on an "authoritative region" to act as a +single source of truth for ACL policies, global ACL tokens, and Sentinel +policies. The authoritative region is configured in the [`server` stanza] of +agents, and all regions must share a single authoritative source. Any Sentinel +policies are created in the authoritative region first. All other regions +replicate Sentinel policies, ACL policies, and global ACL tokens to act as local +mirrors. This allows policies to be administered centrally, and for enforcement +to be local to each region for low latency. + + +[img_sentinel_overview]: /img/govern/sentinel.jpg +[`sentinel-override` capability]: /nomad/tutorials/access-control#sentinel-override +[`server` stanza]: /nomad/docs/configuration/server diff --git a/website/content/docs/govern/namespaces.mdx b/website/content/docs/govern/namespaces.mdx new file mode 100644 index 000000000..a800e2145 --- /dev/null +++ b/website/content/docs/govern/namespaces.mdx @@ -0,0 +1,144 @@ +--- +layout: docs +page_title: Create and use namespaces +description: |- + Segment jobs and their associated objects from the jobs of other users of + the cluster using Nomad namespaces. +--- + +# Create and use namespaces + +Nomad has support for namespaces, which allow jobs and their +associated objects to be segmented from each other and other users of the +cluster. + +Nomad places all jobs and their derived objects into namespaces. These include +jobs, allocations, deployments, and evaluations. + +Nomad does not namespace objects that are shared across multiple namespaces. +This includes nodes, [ACL policies][acls], [Sentinel policies], and +[quota specifications][quotas]. + +In this guide, you'll create and manage a namespace with the CLI. After creating +the namespace, you then learn how to deploy and manage a job within that +namespace. Finally, you practice securing the namespace. + +## Create and view a namespace + +You can manage namespaces with the `nomad namespace` subcommand. + +Create the namespace of a cluster. + +```shell-session +$ nomad namespace apply -description "QA instances of webservers" web-qa +Successfully applied namespace "web-qa"! +``` + +List the namespaces of a cluster. + +```shell-session +$ nomad namespace list +Name Description +default Default shared namespace +api-prod Production instances of backend API servers +api-qa QA instances of backend API servers +web-prod Production instances of webservers +web-qa QA instances of webservers +``` + +## Run a job in a namespace + +To run a job in a specific namespace, annotate the job with the `namespace` +parameter. If omitted, the job will be run in the `default` namespace. Below is +an example of running the job in the newly created `web-qa` namespace: + +```hcl +job "rails-www" { + + ## Run in the QA environments + namespace = "web-qa" + + ## Only run in one datacenter when QAing + datacenters = ["us-west1"] + # ... +} +``` + +## Use namespaces in the CLI and UI + +### Nomad CLI + +When using commands that operate on objects that are namespaced, the namespace +can be specified either with the flag `-namespace` or read from the +`NOMAD_NAMESPACE` environment variable. + +Request job status using the `-namespace` flag. + +```shell-session +$ nomad job status -namespace=web-qa +ID Type Priority Status Submit Date +rails-www service 50 running 09/17/17 19:17:46 UTC +``` + +Export the `NOMAD_NAMESPACE` environment variable. 
+ +```shell-session +$ export NOMAD_NAMESPACE=web-qa +``` + +Use the exported environment variable to request job status. + +```shell-session +$ nomad job status +ID Type Priority Status Submit Date +rails-www service 50 running 09/17/17 19:17:46 UTC +``` + +### Nomad UI + +The Nomad UI provides a drop-down menu to allow operators to select the +namespace that they would like to control. The drop-down will appear once there +are namespaces defined. It is located in the top section of the left-hand column +of the interface under the "WORKLOAD" label. + +[![An image of the Nomad UI showing the location of the namespace drop-down. +The drop-down is open showing the "Default Namespace" option and an option for a +"web-qa" namespace.][img_ui_ns_dropdown]][img_ui_ns_dropdown] + +## Secure a namespace + +Access to namespaces can be restricted using [ACLs]. As an example, you could +create an ACL policy that allows full access to the QA environment for the web +namespaces but restrict the production access by creating the following policy: + +```hcl +# Allow read only access to the production namespace +namespace "web-prod" { + policy = "read" +} + +# Allow writing to the QA namespace +namespace "web-qa" { + policy = "write" +} +``` + +## Consul namespaces + +@include 'consul-namespaces.mdx' + +Refer to the [Consul networking integration +guide](/nomad/docs/networking/consul) for Consul integration instructions. + +## Resources + +For specific details about working with namespaces, consult the [namespace +commands] and [HTTP API] documentation. + + +[acls]: /nomad/tutorials/access-control +[http api]: /nomad/api-docs/namespaces +[img_ui_ns_dropdown]: /img/govern/nomad-ui-namespace-dropdown.png +[namespace commands]: /nomad/commands/namespace +[quotas]: /nomad/docs/govern/resource-quotas +[sentinel policies]: /nomad/docs/govern/sentinel diff --git a/website/content/docs/govern/resource-quotas.mdx b/website/content/docs/govern/resource-quotas.mdx new file mode 100644 index 000000000..741140d73 --- /dev/null +++ b/website/content/docs/govern/resource-quotas.mdx @@ -0,0 +1,276 @@ +--- +layout: docs +page_title: Create and use resource quotas +description: |- + Create quotas that you attach to namespaces, and then secure them with ACLs so that you can restrict aggregate resource usage for namespaces. +--- + +# Create and use resource quotas + +[Nomad Enterprise] provides support for resource quotas, which allow operators +to restrict the aggregate resource usage of [namespaces]. Once a quota +specification is attached to a namespace, the Nomad cluster will count all +resource usage by jobs in that namespace toward the quota limits. If the +resource is exhausted, allocations within the namespaces will be queued until +resources become available—by other jobs finishing or the quota being expanded. + +In this guide, you'll create a quota that limits resources in the global region. +You will then assign the quota to a namespace where you will deploy a job. +Finally, you'll learn how to secure the quota with ACLs. + + + +## Define and create a quota + +You can manage resource quotas by using the `nomad quota` subcommand. To +get started with creating a quota specification, run `nomad quota init` which +produces an example quota specification. + +```shell-session +$ nomad quota init +Example quota specification written to spec.hcl +``` + +The file `spec.hcl` defines a limit for the global region. Additional limits may +be specified in order to limit other regions. 
+ +```hcl +# spec.hcl +name = "default-quota" +description = "Limit the shared default namespace" + +# Create a limit for the global region. Additional limits may +# be specified in-order to limit other regions. +limit { + region = "global" + region_limit { + cores = 0 + cpu = 2500 + memory = 1000 + memory_max = 1000 + device "nvidia/gpu/1080ti" { + count = 1 + } + } + variables_limit = 1000 +} +``` + +### Resource limits + +When specifying resource limits the following enforcement behaviors are defined: + +- `limit < 0`: A limit less than zero disallows any access to the resource. + +- `limit == 0`: A limit of zero allows unlimited access to the resource. + +- `limit > 0`: A limit greater than zero enforces that the consumption is less + than or equal to the given limit. + +-> **Note:** Limits that specify the count of devices work differently. A `count > 0` +enforces that the consumption is less than or equal to the given limit, and setting +it to `0` disallows any usage of the given device. Unlike other limits, device count +cannot be set to a negative value. To allow unlimited access to a device +resource, remove it from the region limit. + +A quota specification is composed of one or more resource limits. Each limit +applies to a particular Nomad region. Within the limit object, operators can +specify the allowed CPU and memory usage. + +### Create the quota + +To create the quota, run `nomad quota apply` with the filename of your quota specification. + +```shell-session +$ nomad quota apply spec.hcl +Successfully applied quota specification "default-quota"! +``` + +Check for success with `nomad quota list`. + +```shell-session +$ nomad quota list +Name Description +default-quota Limit the shared default namespace +``` + +## Attach a quota to a namespace + +In order for a quota to be enforced, you have to attach the quota specification +to a namespace. This can be done using the `nomad namespace apply` command. Add +the new quota specification to the `default` namespace as follows: + +```shell-session +$ nomad namespace apply -quota default-quota default +Successfully applied namespace "default"! +``` + +## View quota information + +Now that you have attached a quota to a namespace, you can run a job in the default namespace. + +Initialize the job. + +```shell-session +$ nomad job init +Example job file written to example.nomad.hcl +``` + +Run the job in the default namespace. + +```shell-session +$ nomad job run -detach example.nomad.hcl +Job registration successful +Evaluation ID: 985a1df8-0221-b891-5dc1-4d31ad4e2dc3 +``` + +Check the status. + +```shell-session +$ nomad quota status default-quota +Name = default-quota +Description = Limit the shared default namespace +Limits = 1 + +Quota Limits +Region CPU Usage Core Usage Memory Usage Memory Max Usage Variables Usage +global 0 / 2500 0 / inf 0 / 1000 0 / 1000 0 / 1000 + +Quota Device Limits +Region Device Name Device Usage +global nvidia/gpu/1080ti 1 / 1 +``` + +Notice that the newly created job is accounted against the quota specification +since it is being run in the namespace attached to the "default-quota" quota. + +```shell-session +$ nomad job run -detach example.nomad.hcl +Job registration successful +Evaluation ID: ce8e1941-0189-b866-3dc4-7cd92dc38a69 +``` + +Now, add more instances of the job by changing `count = 1` to `count = 4` and +check to see if additional allocations were added. 
+ +```shell-session +$ nomad status example +ID = example +Name = example +Submit Date = 10/16/17 10:51:32 PDT +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +cache 1 0 3 0 0 0 + +Placement Failure +Task Group "cache": + * Quota limit hit "memory exhausted (1024 needed > 1000 limit)" + +Latest Deployment +ID = 7cd98a69 +Status = running +Description = Deployment is running + +Deployed +Task Group Desired Placed Healthy Unhealthy +cache 4 3 0 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +6d735236 81f72d90 cache 1 run running 10/16/17 10:51:32 PDT +ce8e1941 81f72d90 cache 1 run running 10/16/17 10:51:32 PDT +9b8e185e 81f72d90 cache 1 run running 10/16/17 10:51:24 PDT +``` + +The output indicates that Nomad created two more allocations but did not place +the fourth allocation, which would have caused the quota to be oversubscribed on +memory. + +## Secure quota with ACLs + +Access to quotas can be restricted using [ACLs]. As an example, you could create +an ACL policy that allows read-only access to quotas. + +```hcl +# Allow read only access to quotas. +quota { + policy = "read" +} +``` + +Proper ACLs are necessary to prevent users from bypassing quota enforcement by +increasing or removing the quota specification. + +## Design for federated clusters + +Nomad makes working with quotas in a federated cluster simple by replicating +quota specifications from the [authoritative Nomad region]. This allows +operators to interact with a single cluster but create quota specifications that +apply to all Nomad clusters. + +As an example, you can create a quota specification that applies to two regions: + +```hcl +name = "federated-example" +description = "A single quota spec affecting multiple regions" + +# Create a limits for two regions +limit { + region = "europe" + region_limit { + cpu = 20000 + memory = 10000 + } +} + +limit { + region = "asia" + region_limit { + cpu = 10000 + memory = 5000 + } +} +``` + +Apply the specification. + +```shell-session +$ nomad quota apply spec.hcl +Successfully applied quota specification "federated-example"! +``` + +Now that the specification is applied and attached to a namespace with jobs in each region, +use the `nomad quota status` command to observe how the enforcement applies +across federated clusters. + +```shell-session +$ nomad quota status federated-example +Name = federated-example +Description = A single quota spec affecting multiple regions +Limits = 2 + +Quota Limits +Region CPU Usage Memory Usage +asia 2500 / 10000 1000 / 5000 +europe 8800 / 20000 6000 / 10000 +``` + +## Learn more about quotas + +For specific details about working with quotas, consult the [quota commands] and +[HTTP API] documentation. 
+ +[acls]: /nomad/docs/secure/acl +[authoritative nomad region]: /nomad/docs/configuration/server#authoritative_region +[http api]: /nomad/api-docs/quotas +[namespaces]: /nomad/docs/govern/namespaces +[nomad enterprise]: https://www.hashicorp.com/products/nomad +[quota commands]: /nomad/commands/quota +[quotas]: /nomad/docs/govern/resource-quotas diff --git a/website/content/docs/govern/sentinel.mdx b/website/content/docs/govern/sentinel.mdx new file mode 100644 index 000000000..72fdf6620 --- /dev/null +++ b/website/content/docs/govern/sentinel.mdx @@ -0,0 +1,206 @@ +--- +layout: docs +page_title: Create and manage Sentinel policies +description: |- + Create, install, test, and update Sentinel policies to express your policies + as code so that Nomad automatically enforces them. +--- + +# Create and manage Sentinel policies + + +[Nomad Enterprise] integrates with [HashiCorp Sentinel][sentinel] for +fine-grained policy enforcement. Sentinel allows operators to express their +policies as code and have their policies automatically enforced. This allows +operators to define a "sandbox" and restrict actions to only those compliant +with policy. + +The Sentinel integration builds on the [ACL System][acls]. The integration +provides the ability to create fine grained policy enforcements. Users must have +the appropriate permissions to perform an action and are subject to any +applicable Sentinel policies. + +In this guide, you will create a policy and then practice applying it to a job +at different enforcement levels. Finally, you'll learn more about Sentinel +language specifics. + + + +## Prerequisites + +The following example demonstrates how to install a Sentinel policy. It assumes +that ACLs have already been bootstrapped (refer to the [ACL guide][acls]), and +that a `NOMAD_TOKEN` environment variable is set to a management token. + +## Create, install, and test a policy + +First, create a Sentinel policy, named `test.sentinel`: + +```sentinel +## Test policy always fails for demonstration purposes +main = rule { false } +``` + +Then, install this as an "advisory" policy which issues a warning on failure: + +```shell-session +$ nomad sentinel apply -level=advisory test-policy test.sentinel +Successfully wrote "test-policy" Sentinel policy! +``` + +Use `nomad job init` to create a job file. + +```shell-session +$ nomad job init +Example job file written to example.nomad.hcl +``` + +Attempt to submit that job file with `nomad job run`. + +```shell-session +$ nomad job run example.nomad.hcl +Job Warnings: +1 warning(s): + +* test-policy : Result: false (allowed failure based on level) + +FALSE - test-policy:2:1 - Rule "main" + + +==> Monitoring evaluation "f43ac28d" + Evaluation triggered by job "example" + Evaluation within deployment: "11e01124" + Allocation "2618f3b4" created: node "add8ce93", group "cache" + Allocation "5c2674f2" created: node "add8ce93", group "cache" + Allocation "9937811f" created: node "add8ce93", group "cache" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "f43ac28d" finished with status "complete" +``` + +The output indicates that the policy failed, but the job was accepted because of +an "advisory" enforcement level. 
+ +## Update and test the policy + +Next, change `test.sentinel` to only allow "exec" based drivers: + +```sentinel +# Test policy only allows exec based tasks +main = rule { all_drivers_exec } + +# all_drivers_exec checks that all the drivers in use are exec +all_drivers_exec = rule { + all job.task_groups as tg { + all tg.tasks as task { + task.driver is "exec" + } + } +} +``` + +Then updated the policy at a soft mandatory level: + +```shell-session +$ nomad sentinel apply -level=soft-mandatory test-policy test.sentinel +Successfully wrote "test-policy" Sentinel policy! +``` + +With the new policy, attempt to submit the same job, which uses the "docker" +driver: + +```shell-session +$ nomad run example.nomad.hcl +Error submitting job: Unexpected response code: 500 (1 error(s) occurred: + +* test-policy : Result: false + +FALSE - test-policy:2:1 - Rule "main" + FALSE - test-policy:6:5 - all job.task_groups as tg { + all tg.tasks as task { + task.driver is "exec" + } +} + +FALSE - test-policy:5:1 - Rule "all_drivers_exec" +) +``` + +The output indicates that the policy and job have failed. + +## Override the policy + +Because the policy is failing, the job was rejected. Since the policy level is +"soft-mandatory", you can override it using the `-policy-override` flag. + +Submit the job again with the `-policy-override` flag set: + +```shell-session +$ nomad job run -policy-override example.nomad.hcl +Job Warnings: +1 warning(s): + +* test-policy : Result: false (allowed failure based on level) + +FALSE - test-policy:2:1 - Rule "main" + FALSE - test-policy:6:5 - all job.task_groups as tg { + all tg.tasks as task { + task.driver is "exec" + } +} + +FALSE - test-policy:5:1 - Rule "all_drivers_exec" + + +==> Monitoring evaluation "16195b50" + Evaluation triggered by job "example" + Evaluation within deployment: "11e01124" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "16195b50" finished with status "complete" +``` + +This time, the job was accepted but with a warning that our policy is failing +but was overridden. + +## Extend your knowledge: policy specification + +Sentinel policies are specified in the [Sentinel Language][sentinel]. The +language is designed to be understandable for people who are reading and writing +policies, while remaining fast to evaluate. There is no limitation on how +complex policies can be, but they are in the execution path so care should be +taken to avoid adversely impacting performance. + +In each scope, there are different objects made available for introspection, +such a job being submitted. Policies can inspect these objects to apply +fine-grained policies. + +### Sentinel job objects + +The `job` object is made available to policies in the `submit-job` scope +automatically, without an explicit import. The object maps to the +[JSON job specification], but fields differ slightly for better readability. + +Sentinel convention for identifiers is lower case and separated by underscores. +All fields on the job are accessed by the same name, converted to lower case and +separating camel case to underscores. 
Here are some examples: + +| Job Field | Sentinel Accessor | +| ---------------------------------------- | ------------------------------------------- | +| `job.ID` | `job.id` | +| `job.AllAtOnce` | `job.all_at_once` | +| `job.ParentID` | `job.parent_id` | +| `job.TaskGroups` | `job.task_groups` | +| `job.TaskGroups[0].EphemeralDisk.SizeMB` | `job.task_groups[0].ephemeral_disk.size_mb` | + +## Learn more about Sentinel + +For specific details about working with Sentinel, consult the [`nomad sentinel` sub-commands] +and [HTTP API] documentation. + +[`nomad sentinel` sub-commands]: /nomad/commands/sentinel +[`sentinel-override` capability]: /nomad/tutorials/access-control#sentinel-override +[`server` stanza]: /nomad/docs/configuration/server +[acls]: /nomad/tutorials/access-control +[http api]: /nomad/api-docs/sentinel-policies +[json job specification]: /nomad/api-docs/json-jobs +[nomad enterprise]: https://www.hashicorp.com/products/nomad/ +[sentinel]: https://docs.hashicorp.com/sentinel diff --git a/website/content/docs/govern/use-node-pools.mdx b/website/content/docs/govern/use-node-pools.mdx new file mode 100644 index 000000000..9edb26025 --- /dev/null +++ b/website/content/docs/govern/use-node-pools.mdx @@ -0,0 +1,10 @@ +--- +layout: docs +page_title: Create and use node pools +description: |- + Create node pools. Review node pool replication in multi-region clusters, built-in node pools, node pool patterns, and enterprise features such as scheduler configuration, node pool governance, and multi-region jobs. +--- + +# Create and use node pools + +@include 'node-pools.mdx' diff --git a/website/content/docs/integrations/index.mdx b/website/content/docs/integrations/index.mdx deleted file mode 100644 index ce56bc2a6..000000000 --- a/website/content/docs/integrations/index.mdx +++ /dev/null @@ -1,12 +0,0 @@ ---- -layout: docs -page_title: HashiCorp integrations -description: |- - This section features Nomad's integrations with Consul and Vault. Learn how to integrate access control lists (ACLs) and Consul service mesh. ---- - -# HashiCorp integrations - -Nomad integrates seamlessly with Consul and Vault for service discovery and secrets management. - -Please navigate the appropriate sub-sections for more information. diff --git a/website/content/docs/job-declare/configure-tasks.mdx b/website/content/docs/job-declare/configure-tasks.mdx new file mode 100644 index 000000000..931117af7 --- /dev/null +++ b/website/content/docs/job-declare/configure-tasks.mdx @@ -0,0 +1,212 @@ +--- +layout: docs +page_title: Configure job tasks +description: |- + Provide values to a Nomad workload through job specification configuration files, command line arguments, and environment variables. +--- + +# Configure job tasks + +Most applications require some kind of local configuration. While command line +arguments are the simplest method, many applications require more complex +configurations provided via environment variables or configuration files. This +section explores how to configure Nomad jobs to support many common +configuration use cases. + +## Define application arguments + +Many tasks accept configuration via command-line arguments. For example, +consider the [http-echo](https://github.com/hashicorp/http-echo) server which +is a small go binary that renders the provided text as a webpage. 
The binary +accepts two parameters: + +- The `-listen` flag contains the `address:port` to listen on +- `-text` - the text to render as the HTML page + +Outside of Nomad, the server is started like this: + +```shell-session +$ http-echo -listen=":5678" -text="hello world" +``` + +The Nomad equivalent job file might look something like this: + +```hcl +job "docs" { + datacenters = ["dc1"] + + + group "example" { + network { + port "http" { + static = "5678" + } + } + + task "server" { + driver = "exec" + + config { + command = "/bin/http-echo" + + args = [ + "-listen", + ":5678", + "-text", + "hello world", + ] + } + } + } +} +``` + + + + For this job specification, you must install the `http-echo` in +the `/bin` folder on each of your clients. Nomad can +also optionally fetch the binary using the `artifact` resource. + + + +Nomad has many [drivers], and most support passing arguments to their tasks via +the `args` parameter. This parameter also supports Nomad variable +[interpolation]. For example, if you wanted Nomad to dynamically allocate a high +port to bind the service on instead of relying on a static port for the previous +job: + +```hcl +job "docs" { + datacenters = ["dc1"] + + group "example" { + network { + port "http" { + static = "5678" + } + } + + task "server" { + driver = "exec" + + config { + command = "/bin/http-echo" + + args = [ + "-listen", + ":${NOMAD_PORT_http}", + "-text", + "hello world", + ] + } + } + } +} +``` + +## Set environment variables + +Some applications can be configured via environment variables. The +[Twelve-Factor App](https://12factor.net/config) document suggests configuring +applications through environment variables. Nomad supports custom environment +variables in two ways: + +- Interpolation in an `env` stanza +- Templated in the a `template` stanza + +### `env` stanza + +Each task may have an `env` stanza which specifies environment variables: + +```hcl +task "server" { + env { + my_key = "my-value" + } +} +``` + +The `env` stanza also supports [interpolation]: + +```hcl +task "server" { + env { + LISTEN_PORT = "${NOMAD_PORT_http}" + } +} +``` + +Consult the [`env` stanza] documentation for details. + +### Build environment variables with templates + +Nomad's [`template` stanza] can be used to generate environment variables. +Environment variables may be templated with [Node attributes and +metadata][nodevars], the contents of files on disk, Consul keys, or secrets from +Vault: + +```hcl +template { + data = < Monitoring evaluation "0d159869" + Evaluation triggered by job "docs" + Allocation "5cbf23a1" created: node "1e1aa1e0", group "example" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "0d159869" finished with status "complete" +``` + +Now that the job is scheduled, it may or may not be running. You need to inspect +the allocation status and logs to make sure the job started correctly. The next +section on [inspecting state](/nomad/docs/job-run/inspect) +details ways to examine this job's state. + +## Update and plan the job + +When making updates to the job, it is best to always run the plan command and +then the run command. 
For example: + +```diff +@@ -2,6 +2,8 @@ job "docs" { + datacenters = ["dc1"] + + group "example" { ++ count = "2" ++ + task "server" { + driver = "docker" +``` + +After saving these changes to disk, run the `nomad job plan` command: + +```shell-session +$ nomad job plan docs.nomad.hcl ++/- Job: "docs" ++/- Task Group: "example" (1 create, 1 in-place update) + +/- Count: "1" => "2" (forces create) + Task: "server" + +Scheduler dry-run: +- All tasks successfully allocated. + +Job Modify Index: 131 +To submit the job with version verification run: + +nomad job run -check-index 131 docs.nomad.hcl + +When running the job with the check-index flag, the job will only be run if the +job modify index given matches the server-side version. If the index has +changed, another user has modified the job and the plan's results are +potentially invalid. +``` + +#### Reserved port collisions + +Because this job uses a static port, it is possible for some instances to not be +placeable depending on the number of clients you have in your Nomad cluster. If +your plan output contains: + +```plaintext hideClipboard +Dimension "network: reserved port collision" exhausted on x nodes +``` + +This indicates that every feasible client in your cluster has or will have +something placed at the requested port, leaving no place for some of these +allocations to run. To resolve this, you need to reduce the requested count, +add additional clients, or migrate from static ports to dynamic ports in your +job specification. + +## Run the job + +Now, assuming the output is okay, execute the `nomad job run` command. Including +the `check-index` parameter ensures that the job +was not changed between the plan and run phases. + +```shell-session +$ nomad job run -check-index 131 docs.nomad.hcl +==> Monitoring evaluation "42d788a3" + Evaluation triggered by job "docs" + Allocation "e7b8d4f5" created: node "012ea79b", group "example" + Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "42d788a3" finished with status "complete" +``` + +For more details on advanced job updating strategies such as canary builds and +build-green deployments, consult the documentation on [job update strategies]. + +[job specification]: /nomad/docs/job-specification +[job update strategies]:/nomad/docs/job-declare/strategy +[inspecting state]: /nomad/docs/job-run/inspect + diff --git a/website/content/docs/job-declare/exit-signals.mdx b/website/content/docs/job-declare/exit-signals.mdx new file mode 100644 index 000000000..2d4705a6f --- /dev/null +++ b/website/content/docs/job-declare/exit-signals.mdx @@ -0,0 +1,45 @@ +--- +layout: docs +page_title: Configure exit signals +description: |- + Configure the exit signal that Nomad sends to an application to allow it + to gracefully terminate before Nomad kills the application. +--- + +# Configure exit signals + +On operating systems that support signals, Nomad will send the application a +configurable signal before killing it. This gives the application time to +gracefully drain connections and conduct other cleanup before shutting down. +Certain applications take longer to drain than others, and thus Nomad allows +specifying the amount of time to wait for the application to exit before +force-killing it. + +Before Nomad terminates an application, it will send the `SIGINT` signal to the +process. Processes running under Nomad should respond to this signal to +gracefully drain connections. 
After a configurable timeout, the application +will be force-terminated. + +The signal sent may be configured with the [`kill_signal`][kill_signal] task +parameter, and the timeout before the task is force-terminated may be +configured via [`kill_timeout`][kill_timeout]. + +```hcl +job "docs" { + group "example" { + task "server" { + # ... + kill_timeout = "45s" + } + } +} +``` + +The behavior is slightly different for Docker-based tasks. Nomad will run the +`docker stop` command with the specified `kill_timeout`. The signal that `docker stop` sends to your container entrypoint is configurable using the +[`STOPSIGNAL` configuration directive]; however, please note that the default is +`SIGTERM`. + +[kill_signal]: /nomad/docs/job-specification/task#kill_signal +[kill_timeout]: /nomad/docs/job-specification/task#kill_timeout +[`stopsignal` configuration directive]: https://docs.docker.com/engine/reference/builder/#stopsignal diff --git a/website/content/docs/job-declare/failure/check-restart.mdx b/website/content/docs/job-declare/failure/check-restart.mdx new file mode 100644 index 000000000..548b90448 --- /dev/null +++ b/website/content/docs/job-declare/failure/check-restart.mdx @@ -0,0 +1,80 @@ +--- +layout: docs +page_title: Configure health check restart +description: |- + Configure your Nomad job to restart workloads when health checks + fail. +--- + +# Configure health check restart + +The [`check_restart` stanza][check restart] instructs Nomad when to restart +tasks with unhealthy service checks. When a health check in Consul has been +unhealthy for the limit specified in a check_restart stanza, it is restarted +according to the task group's restart policy. Restarts are local to the node +running the task based on the tasks `restart` policy. + +The `limit` field is used to specify the number of times a failing health check +is seen before local restarts are attempted. Operators can also specify a +`grace` duration to wait after a task restarts before checking its health. + +You should configure the check restart on services when its likely that a +restart would resolve the failure. An example of this might be restarting to +correct a transient connection issue on the service. + +The following `check_restart` stanza waits for two consecutive health check +failures with a grace period and considers both `critical` and `warning` +statuses as failures. + +```hcl +check_restart { + limit = 2 + grace = "10s" + ignore_warnings = false +} +``` + +The following CLI example output shows health check failures triggering restarts +until its restart limit is reached. + +```shell-session +$ nomad alloc status e1b43128-2a0a-6aa3-c375-c7e8a7c48690 +ID = e1b43128 +Eval ID = 249cbfe9 +Name = demo.demo[0] +Node ID = 221e998e +Job ID = demo +Job Version = 0 +Client Status = failed +Client Description = +Desired Status = run +Desired Description = +Created = 2m59s ago +Modified = 39s ago + +Task "test" is "dead" +Task Resources +CPU Memory Disk Addresses +100 MHz 300 MiB 300 MiB p1: 127.0.0.1:28422 + +Task Events: +Started At = 2018-04-12T22:50:32Z +Finished At = 2018-04-12T22:50:54Z +Total Restarts = 3 +Last Restart = 2018-04-12T17:50:15-05:00 + +Recent Events: +Time Type Description +2018-04-12T17:50:54-05:00 Not Restarting Exceeded allowed attempts 3 in interval 30m0s and mode is "fail" +2018-04-12T17:50:54-05:00 Killed Task successfully killed +2018-04-12T17:50:54-05:00 Killing Sent interrupt. 
Waiting 5s before force killing +2018-04-12T17:50:54-05:00 Restart Signaled health check: check "service: \"demo-service-test\" check" unhealthy +2018-04-12T17:50:32-05:00 Started Task started by client +2018-04-12T17:50:15-05:00 Restarting Task restarting in 16.887291122s +2018-04-12T17:50:15-05:00 Killed Task successfully killed +2018-04-12T17:50:15-05:00 Killing Sent interrupt. Waiting 5s before force killing +2018-04-12T17:50:15-05:00 Restart Signaled health check: check "service: \"demo-service-test\" check" unhealthy +2018-04-12T17:49:53-05:00 Started Task started by client +``` + +[check restart]: /nomad/docs/job-specification/check_restart diff --git a/website/content/docs/job-declare/failure/index.mdx b/website/content/docs/job-declare/failure/index.mdx new file mode 100644 index 000000000..a5d9058de --- /dev/null +++ b/website/content/docs/job-declare/failure/index.mdx @@ -0,0 +1,32 @@ +--- +layout: docs +page_title: Failure recovery strategies +description: |- + Discover the available job failure recovery strategies in Nomad so that you + can restart or reschedule jobs automatically if they fail. +--- + +# Failure recovery strategies + +Most applications deployed in Nomad are either long running services or one time +batch jobs. They can fail for various reasons like: + +- A temporary error in the service that resolves when its restarted. + +- An upstream dependency might not be available, leading to a health check + failure. + +- Disk, Memory or CPU contention on the node that the application is running on. + +- The application uses Docker and the Docker daemon on that node is + unresponsive. + +Nomad provides configurable options to enable recovering failed tasks to avoid +downtime. Nomad will try to restart a failed task on the node it is running on, +and also try to reschedule it on another node. Please start with one of the +guides below or use the navigation on the left for details on each option: + +- [Local restarts](/nomad/docs/job-declare/failure/restart) +- [Health check restarts](/nomad/docs/job-declare/failure/check-restart) +- [Reschedule](/nomad/docs/job-declare/failure/reschedule) + diff --git a/website/content/docs/job-declare/failure/reschedule.mdx b/website/content/docs/job-declare/failure/reschedule.mdx new file mode 100644 index 000000000..0676b3b49 --- /dev/null +++ b/website/content/docs/job-declare/failure/reschedule.mdx @@ -0,0 +1,94 @@ +--- +layout: docs +page_title: Configure reschedule +description: |- + Discover how to control the rescheduling behaviors of jobs to allow them + to be scheduled on different nodes if needed, such as in the event of a failure. +--- + +# Configure reschedule + +Tasks can sometimes fail due to network, CPU or memory issues on the node +running the task. In such situations, Nomad can reschedule the task on another +node. The [`reschedule` stanza] can be used to configure how Nomad +should try placing failed tasks on another node in the cluster. Reschedule +attempts have a delay between each attempt, and the delay can be configured to +increase between each rescheduling attempt according to a configurable +`delay_function`. Consult the [`reschedule` stanza] documentation for more +information. + +Service jobs are configured by default to have unlimited reschedule attempts. +You should use the reschedule stanza to ensure that failed tasks are +automatically reattempted on another node without needing operator intervention. + +The following CLI example shows job and allocation statuses for a task being +rescheduled by Nomad. 
The CLI shows the number of previous attempts if there is +a limit on the number of reschedule attempts. The CLI also shows when the next +reschedule will be attempted. + +```shell-session +$ nomad job status demo +ID = demo +Name = demo +Submit Date = 2018-04-12T15:48:37-05:00 +Type = service +Priority = 50 +Datacenters = dc1 +Status = pending +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +demo 0 0 0 2 0 0 + +Future Rescheduling Attempts +Task Group Eval ID Eval Time +demo ee3de93f 5s from now + +Allocations +ID Node ID Task Group Version Desired Status Created Modified +39d7823d f2c2eaa6 demo 0 run failed 5s ago 5s ago +fafb011b f2c2eaa6 demo 0 run failed 11s ago 10s ago + +``` + +```shell-session +$ nomad alloc status 3d0b +ID = 3d0bbdb1 +Eval ID = 79b846a9 +Name = demo.demo[0] +Node ID = 8a184f31 +Job ID = demo +Job Version = 0 +Client Status = failed +Client Description = +Desired Status = run +Desired Description = +Created = 15s ago +Modified = 15s ago +Reschedule Attempts = 3/5 +Reschedule Eligibility = 25s from now + +Task "demo" is "dead" +Task Resources +CPU Memory Disk Addresses +100 MHz 300 MiB 300 MiB p1: 127.0.0.1:27646 + +Task Events: +Started At = 2018-04-12T20:44:25Z +Finished At = 2018-04-12T20:44:25Z +Total Restarts = 0 +Last Restart = N/A + +Recent Events: +Time Type Description +2018-04-12T15:44:25-05:00 Not Restarting Policy allows no restarts +2018-04-12T15:44:25-05:00 Terminated Exit Code: 127 +2018-04-12T15:44:25-05:00 Started Task started by client +2018-04-12T15:44:25-05:00 Task Setup Building Task Directory +2018-04-12T15:44:25-05:00 Received Task received by client + +``` + +[`reschedule` stanza]: /nomad/docs/job-specification/reschedule 'Nomad reschedule Stanza' diff --git a/website/content/docs/job-declare/failure/restart.mdx b/website/content/docs/job-declare/failure/restart.mdx new file mode 100644 index 000000000..beba1eae5 --- /dev/null +++ b/website/content/docs/job-declare/failure/restart.mdx @@ -0,0 +1,99 @@ +--- +layout: docs +page_title: Configure restart +description: |- + Discover how to control the restart behaviors of jobs so that Nomad schedules + them on different nodes if needed, such as in the event of a failure. +--- + +# Configure restart + +Nomad will [by default][defaults] attempt to restart a job locally on the node +that it is running or scheduled to be running on. These defaults vary by the +scheduler type in use for the job: system, service, or batch. + +To customize this behavior, the task group can be annotated with configurable +options using the [`restart` stanza][restart]. Nomad will restart the failed +task up to `attempts` times within a provided `interval`. Operators can also +choose whether to keep attempting restarts on the same node, or to fail the task +so that it can be rescheduled on another node, via the `mode` parameter. + +Setting mode to `fail` in the restart stanza allows rescheduling to occur +potentially moving the task to another node and is best practice. + +The following CLI example shows job status and allocation status for a failed +task that is being restarted by Nomad. Allocations are in the `pending` state +while restarts are attempted. The `Recent Events` section in the CLI shows +ongoing restart attempts. 
+ +```shell-session +$ nomad job status demo +ID = demo +Name = demo +Submit Date = 2018-04-12T14:37:18-05:00 +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +demo 0 3 0 0 0 0 + +Allocations +ID Node ID Task Group Version Desired Status Created Modified +ce5bf1d1 8a184f31 demo 0 run pending 27s ago 5s ago +d5dee7c8 8a184f31 demo 0 run pending 27s ago 5s ago +ed815997 8a184f31 demo 0 run pending 27s ago 5s ago +``` + +In the following example, the allocation `ce5bf1d1` is restarted by Nomad +approximately every ten seconds, with a small random jitter. It eventually +reaches its limit of three attempts and transitions into a `failed` state, after +which it becomes eligible for [rescheduling][rescheduling]. + +```shell-session +$ nomad alloc status ce5bf1d1 +ID = ce5bf1d1 +Eval ID = 64e45d11 +Name = demo.demo[1] +Node ID = a0ccdd8b +Job ID = demo +Job Version = 0 +Client Status = failed +Client Description = +Desired Status = run +Desired Description = +Created = 56s ago +Modified = 22s ago + +Task "demo" is "dead" +Task Resources +CPU Memory Disk Addresses +100 MHz 300 MiB 300 MiB + +Task Events: +Started At = 2018-04-12T22:29:08Z +Finished At = 2018-04-12T22:29:08Z +Total Restarts = 3 +Last Restart = 2018-04-12T17:28:57-05:00 + +Recent Events: +Time Type Description +2018-04-12T17:29:08-05:00 Not Restarting Exceeded allowed attempts 3 in interval 5m0s and mode is "fail" +2018-04-12T17:29:08-05:00 Terminated Exit Code: 127 +2018-04-12T17:29:08-05:00 Started Task started by client +2018-04-12T17:28:57-05:00 Restarting Task restarting in 10.364602876s +2018-04-12T17:28:57-05:00 Terminated Exit Code: 127 +2018-04-12T17:28:57-05:00 Started Task started by client +2018-04-12T17:28:47-05:00 Restarting Task restarting in 10.666963769s +2018-04-12T17:28:47-05:00 Terminated Exit Code: 127 +2018-04-12T17:28:47-05:00 Started Task started by client +2018-04-12T17:28:35-05:00 Restarting Task restarting in 11.777324721s +``` + +[restart]: /nomad/docs/job-specification/restart 'Nomad restart Stanza' +[defaults]: /nomad/docs/job-specification/restart#restart-parameter-defaults +[rescheduling]: /nomad/docs/job-specification/reschedule 'Nomad restart Stanza' diff --git a/website/content/docs/job-declare/index.mdx b/website/content/docs/job-declare/index.mdx new file mode 100644 index 000000000..bbb6e0c2f --- /dev/null +++ b/website/content/docs/job-declare/index.mdx @@ -0,0 +1,50 @@ +--- +layout: docs +page_title: Declare a Nomad job +description: |- + This section provides guidance for declaring Nomad job with a job specification. +--- + +# Declare a Nomad job + +Developers deploy and manage their applications in Nomad via jobs. This section provides guidance for declaring a Nomad job with a job specification. + +In Nomad, a job is a user-specified state for a workload. The user expresses the job that should be running, but not where it should run. Nomad allocates resources and ensures that the actual state matches the user's desired state. A job consists of one or more tasks that you can organize into task groups. + +Declare the desired state of your job in a [job specification][job-spec], or jobspec, that describes the tasks and resources necessary for the job to run. You can also include job constraints to control which clients Nomad runs the job on. + +When you submit your job specification, Nomad automatically allocates resources +to run it. 
Nomad also makes sure that the actual job state matches your desired +state. + +## Deploy a new Nomad job + +The general flow for operating a job in Nomad is: + +1. Author the job file according to the [job specification][job-spec]. +1. Plan and review the changes with a Nomad server. +1. Submit the job file to a Nomad server. +1. (Optional) Review job status and logs. + +## Update a running job + +When updating a job, there are a number of built-in update strategies which may +be defined in the job file. The general flow for updating an existing job in +Nomad is: + +1. Modify the existing job file with the desired changes. +1. Plan and review the changes with a Nomad server. +1. Submit the job file to a Nomad server. +1. (Optional) Review job status and logs. + +Because the job file defines the update strategy (blue-green, rolling updates, +etc.), the workflow remains the same regardless of whether this is an initial +deployment or a long-running job. + +## Resources + +- Refer to the [Job concept page](/nomad/docs/concepts/job) for more information + on Nomad jobs. +- [Nomad job specification][job-spec] + +[job-spec]: /nomad/docs/job-specification diff --git a/website/content/docs/job-declare/multiregion.mdx b/website/content/docs/job-declare/multiregion.mdx new file mode 100644 index 000000000..84fd52a30 --- /dev/null +++ b/website/content/docs/job-declare/multiregion.mdx @@ -0,0 +1,459 @@ +--- +layout: docs +page_title: Configure multi-region deployments +description: |- + Deploy applications to multiple federated Nomad clusters with configurable + rollout and rollback strategies. +--- + +# Configure multi-region deployments + +Federated Nomad clusters enable users to submit jobs targeting any region +from any server even if that server resides in a different region. As of Nomad 0.12 +Enterprise, you can also submit jobs that are deployed to multiple +regions. This tutorial demonstrates multi-region deployments, including +configurable rollout and rollback strategies. + +You can create a multi-region deployment job by adding a [`multiregion`] +stanza to the job as shown below. + +```hcl +multiregion { + + strategy { + max_parallel = 1 + on_failure = "fail_all" + } + + region "west" { + count = 2 + datacenters = ["west-1"] + } + + region "east" { + count = 1 + datacenters = ["east-1", "east-2"] + } + +} +``` + + + +The functionality described here is available only in [Nomad +Enterprise](https://www.hashicorp.com/products/nomad/pricing/) with the +Multi-Cluster & Efficiency module. To explore Nomad Enterprise features, you can +sign up for a free 30-day trial from +[here](https://www.hashicorp.com/products/nomad/trial). + + + +## Prerequisites + +To perform the tasks described in this guide, you need to have two Nomad +environments running Nomad 0.12 or greater with ports 4646, 4647, and 4648 exposed. You can use this +[Terraform environment][nomad-tf] to provision the sandbox environments. This +guide assumes two clusters with one server node and two client nodes in each +cluster. While the Terraform code already opens port 4646, you will also need to +expose ports 4647 and 4648 on the server you wish to run [nomad server +join][server-join] against. Consult the [Nomad Port Requirements][ports-used] +documentation for more information. + +Next, you'll need to federate these two regions as described in the [federation guide]. + + + + This tutorial is for demo purposes and only assumes a single server +node in each cluster. 
Consult the [reference architecture][reference-arch] for +production configuration. + + + +Run the [`nomad server members`][nomad-server-members] command. + +```shell-session +$ nomad server members +``` + +After you have federated your clusters, the output should include the servers from both regions. + +```plaintext +Name Address Port Status Leader Protocol Build Datacenter Region +ip-172-31-26-138.east 172.31.26.138 4648 alive true 2 0.12.0+ent east-1 east +ip-172-31-29-34.west 172.31.29.34 4648 alive true 2 0.12.0+ent west-1 west +``` + +If you are using [ACLs][acls-track], you'll need to make sure your token has `submit-job` +permissions with a `global` scope. + +You may wish to review the [update strategies guides][updates-track] before +starting this guide. + +## Multi-region concepts + +Federated Nomad clusters are members of the same gossip cluster but not the +same raft/consensus cluster; they don't share their data stores. Each region in a +multi-region deployment gets an independent copy of the job, parameterized with +the values of the `region` stanza. Nomad regions coordinate to rollout each +region's deployment using rules determined by the `strategy` stanza. + +A single region deployment using one of the various [update strategies][updates-track] +begins in the `running` state and ends in either the `successful` state if it succeeds, +the `canceled` state if another deployment supersedes it before it's +complete, or the `failed` state if it fails for any other reason. A failed single +region deployment may automatically revert to the previous version of the job if +its [`update` stanza] has the [`auto_revert`][update-auto-revert] setting. + +In a multi-region deployment, regions begin in the `pending` state. This allows +Nomad to determine that all regions have accepted the job before +continuing. At this point, up to `max_parallel` regions will enter `running` at +a time. When each region completes its local deployment, it enters a `blocked` +state where it waits until the last region has completed the deployment. The +final region will unblock the regions to mark them as `successful`. + +## Create a multi-region job + +The job below will deploy to both regions. The `max_parallel` field of the +`strategy` block restricts Nomad to deploy to the regions one at a time. If +either of the region deployments fail, both regions will be marked as +failed. The `count` field for each region is interpolated for each region, +replacing the `count = 0` in the task group count. The job's `update` block +uses the default ["task states"] value to determine if the job is healthy; if +you configured a Consul [`service`][consul-service] with health checks you +could use that instead. + +```hcl +job "example" { + + multiregion { + + strategy { + max_parallel = 1 + on_failure = "fail_all" + } + + region "west" { + count = 2 + datacenters = ["west-1"] + } + + region "east" { + count = 1 + datacenters = ["east-1", "east-2"] + } + + } + + update { + max_parallel = 1 + min_healthy_time = "10s" + healthy_deadline = "2m" + progress_deadline = "3m" + auto_revert = true + auto_promote = true + canary = 1 + stagger = "30s" + } + + + group "cache" { + + count = 0 + + network { + port "db" { + to = 6379 + } + } + + task "redis" { + driver = "docker" + + config { + image = "redis:6.0" + + ports = ["db"] + } + + resources { + cpu = 256 + memory = 128 + } + } + } +} +``` + +## Run the multi-region job + +You can run the job from either region. 
+ +```shell-session +$ nomad job run ./multi.nomad +``` + +If successful, you should receive output similar to the following. + +```plaintext +Job registration successful +Evaluation ID: f71cf273-a29e-65e3-bc5b-9710a3c5bc8f +``` + +Check the job status from the east region. + +```shell-session +$ nomad job status -region east example +``` + +Note that there are no running allocations in the east region, +and that the status is "pending" because the east region is waiting +for the west region to complete. + +```plaintext +... +Latest Deployment +ID = d74a086b +Status = pending +Description = Deployment is pending, waiting for peer region + +Multiregion Deployment +Region ID Status +east d74a086b pending +west 48fccef3 running + +Deployed +Task Group Auto Revert Desired Placed Healthy Unhealthy Progress Deadline +cache true 1 0 0 0 N/A + +Allocations +No allocations placed +``` + +Check the job status from the west region. + +```shell-session +$ nomad job status -region west example +``` + +You should observe running allocations. + +```plaintext +... +Latest Deployment +ID = 48fccef3 +Status = running +Description = Deployment is running + +Multiregion Deployment +Region ID Status +east d74a086b pending +west 48fccef3 running + +Deployed +Task Group Auto Revert Desired Placed Healthy Unhealthy Progress Deadline +cache true 2 2 0 0 2020-06-17T13:35:49Z + +Allocations +ID Node ID Task Group Version Desired Status Created Modified +44b3988a 4786abea cache 0 run running 14s ago 13s ago +7c8a2b80 4786abea cache 0 run running 13s ago 12s ago +``` + +The west region should be healthy 10s after the task state for all tasks +switches to "running". To observe, run the following status check. + +```shell-session +$ nomad job status -region west example +``` + +At this point, the status for the west region will +transition to "blocked" and the east region's deployment will become +"running". + +```plaintext +... +Latest Deployment +ID = 48fccef3 +Status = blocked +Description = Deployment is complete but waiting for peer region + +Multiregion Deployment +Region ID Status +east d74a086b running +west 48fccef3 blocked +``` + +Once the east region's deployment has completed, check the status again. + +```shell-session +$ nomad job status -region east example +``` + +Both regions should transition to "successful". + +```plaintext +... +Latest Deployment +ID = d74a086b +Status = successful +Description = Deployment completed successfully + +Multiregion Deployment +Region ID Status +east d74a086b successful +west 48fccef3 successful +``` + +## Failed deployments + +Next, you'll simulate a failed deployment. First, add a new task group that will +succeed in the west region but fail in the east region. 
+ +```hcl +group "sidecar" { + + # set the reschedule stanza so that we don't have to wait too long + # for the deployment to be marked failed + reschedule { + attempts = 1 + interval = "24h" + unlimited = false + delay = "5s" + delay_function = "constant" + } + + task "sidecar" { + driver = "docker" + + config { + image = "busybox:1" + command = "/bin/sh" + args = ["local/script.sh"] + } + + # this script will always fail in the east region + template { + destination = "local/script.sh" + data = < + +```hcl +job "redis-actions" { + + group "cache" { + network { + port "db" {} + } + + task "redis" { + driver = "docker" + + config { + image = "redis:7" + ports = ["db"] + command = "/bin/sh" + args = ["-c", "redis-server --port ${NOMAD_PORT_db} & /local/db_log.sh"] + } + + template { + data = < + +This job creates a single Redis instance, with a port called "db" that Nomad dynamically assigns and a Nomad service health check in place. The task's [config][jobspec-task-config-block] and [template][jobspec-task-template-block] blocks start the redis server and report on current database size every 3 seconds. + +## Write your first Action + +If you were a user with management or `alloc-exec` privileges, you could add data to your Redis instance by ssh-ing into the running instance. However, this has several drawbacks: +- You might have to ssh into the instance several times to add data at different points, requiring you to remember how to do it. Or worse: another operator less familiar with the process may have to do so. +- There is no auditable record of manual additions. If you need to repeat or scale the workflow, you would have to do so manually. +- Your Nomad task may have access to Redis using managed secrets or environment variables, but you as a user may not. Passing credentials manually, either to access Redis or ssh into your Nomad instance, opens up a repeated access hole in the security of your workflow. + +Instead of `ssh`ing into the box and executing the `redis-cli SET` command over and over again, you will commit it as an action to the jobspec's task. Add the following to the `service` block of the task: + + + +```hcl +# Adds a specific key-value pair ('hello'/'world') to the Redis database +action "add-key" { + command = "/bin/sh" + args = ["-c", "redis-cli -p ${NOMAD_PORT_db} SET hello world; echo 'Key \"hello\" added with value \"world\"'"] +} +``` + + +This action uses the `redis-cli` command to set a key-value pair and then outputs a confirmation message. + +Now, submit your job: + +```shell-session +$ nomad job run redis-actions.nomad.hcl +``` + +The job will update with a new action available. Use the Nomad CLI to execute it, supplying the job, group, task, and action name: + +```shell-session +$ nomad action \ + -job=redis-actions \ + -group=cache \ + -task=redis \ +add-key +``` + +You should see the output described in our action to indicate the key was added: + +```shell-session +OK +Key "hello" added with value "world" +``` + +You've just executed a command defined in the jobspec of your running Nomad job. + +## Simulate a repeatable workflow + +An action that applies a constant state can be useful (an action that manually clears a cache, or that puts a site into maintenance mode, for example). However, for this example, simulate an action that someone might want to take many times. Instead of a constant key/value, modify the action to randomly generate strings. 
You can think of this action as a proxy for a real-world scenario where a persistent artifact is saved upon user sign-up, or another public-facing action. + + + +```hcl +# Adds a random key/value to the Redis database +action "add-random-key" { + command = "/bin/sh" + args = ["-c", "key=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 13); value=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 13); redis-cli -p ${NOMAD_PORT_db} SET $key $value; echo Key $key added with value $value"] +} +``` + + +This will add a random key/value to the database and report back. We can add a second action that differs only in that it prepends "temp_" to the key, to help illustrate further functionality: + + + +```hcl +# Adds a random key/value with a "temp_" prefix to the Redis database +action "add-random-temporary-key" { + command = "/bin/sh" + args = ["-c", "key=temp_$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 13); value=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 13); redis-cli -p ${NOMAD_PORT_db} SET $key $value; echo Key $key added with value $value"] +} +``` + + +Like our `add-random-key` action, this new action might be thought of as a simulation of an application generating persistent artifacts. In the case of these temp keys, a real-world scenario might be keys to indicate user sign-up email verification. Soon, we will create a further Action that treats these random keys differently depending on their prefix. + +These two actions will populate the database with random data of two different sorts. Now, create an action to view the keys we've added by appending the following code block: + + + +```hcl +# Lists all keys currently stored in the Redis database. +action "list-keys" { + command = "/bin/sh" + args = ["-c", "redis-cli -p ${NOMAD_PORT_db} KEYS '*'"] +} +``` + + +Now, update your job: + +```shell-session +$ nomad job run redis-actions.nomad.hcl +``` + +If you have the [Nomad Web UI][web-ui] running, accessing your Job page should show an Actions drop-down: + +[![Nomad Job page with Actions dropdown open][actions-dropdown]][actions-dropdown] + +Selecting one of those actions will open a fly-out, complete with output from your selected action: + +[![Nomad Job page with Actions flyout open][actions-flyout]][actions-flyout] + +Next, append a new action block that creates a "safety valve" action to clear the temporary keys from our database. This uses our earlier `add-random-key` and `add-random-temporary-key` actions by differentiating between the artifacts they generated. + + + +```hcl +# Deletes all keys with a 'temp_' prefix +action "flush-temp-keys" { + command = "/bin/sh" + args = ["-c", < + +In a real-world scenario, for example, an action like this might filter and clear automatically-added entries, and report back on remaining keys or time taken to delete them. + +You can run this job from the command line in two ways: + +1: When your task is running on a single allocation, or you want to perform the action on a random allocation running your task: + +```shell-session +$ nomad action \ + -group=cache \ + -task=redis \ + -job=redis-actions \ +flush-temp-keys +``` + +2: When you want to perform the action on a specific, known allocation, first get its allocation ID: + +```shell-session +$ nomad job status redis-actions +``` + +Nomad CLI should show information about the jobspec, including the following: + +``` +ID = redis-actions +... 
+Allocations +ID Node ID Task Group Version Desired Status Created Modified +d841c716 03a56d12 cache 0 run running 5m4s ago 4m48s ago +``` + +Copy the ID from the Allocation displayed and run the following to perform the `flush-temp-keys` action upon it: + +```shell-session +$ nomad action \ + -alloc=d841c716 \ + -job=redis-actions \ +flush-temp-keys +``` + +If the action being run is not allocation dependent, use the first method. If your job hosted multiple instances of Redis and you need to clear the cache of a specific one, use the second method. In the real world, the method you chose will depend on your goal. + +## Actions can impact the running task + +Some actions might not affect the current state of the application. For example, processing logs, reporting and sending server statistics, revoking tokens, etc. But, in this example, the action does impact the active state of the task. The Redis task has been writing its `DBSIZE` to `db_log.sh` and logging it every few seconds. Inspect the running job and get its allocation ID. Then, run the following: + +```shell-session +nomad alloc logs +``` + +Nomad outputs the logs for the allocation ID of the job: + +```shell-session +Tue Nov 28 01:23:46 UTC 2023: Current DB Size: 1 +Tue Nov 28 01:23:49 UTC 2023: Current DB Size: 1 +Tue Nov 28 01:23:52 UTC 2023: Current DB Size: 2 +Tue Nov 28 01:23:55 UTC 2023: Current DB Size: 3 +Tue Nov 28 01:23:58 UTC 2023: Current DB Size: 3 +Tue Nov 28 01:24:01 UTC 2023: Current DB Size: 4 +Tue Nov 28 01:24:04 UTC 2023: Current DB Size: 5 +Tue Nov 28 01:24:07 UTC 2023: Current DB Size: 8 +``` + +Now add another action to impact a more low-level configuration option for our application. Redis lets us turn its persistence to disk on and off. Write an Action to check this and then flip it. Append the following action block: + + + +```hcl +# Toggles saving to disk (RDB persistence). When enabled, allocation logs will indicate a save every 60 seconds. +action "toggle-save-to-disk" { + command = "/bin/sh" + args = ["-c", < + +Enabling RDB snapshotting with the above will modify the output in your application logs, too. + +```shell-session +$ nomad action \ + -group=cache \ + -task=redis \ + -job=redis-actions \ +toggle-save-to-disk +``` + +Nomad returns a confirmation that the action run and that saving to disk is enabled. + +```shell-session +OK +Saving to disk enabled: 60 seconds interval if at least 1 key changed +``` + +Access your server logs and find the lines that show Redis saving the snapshot: + +```shell-session +Tue Nov 28 01:31:14 UTC 2023: Current DB Size: 12 +28 Nov 01:31:17.800 * 2 changes in 60 seconds. Saving... +28 Nov 01:31:17.800 * Background saving started by pid 36652 +28 Nov 01:31:17.810 * DB saved on disk +28 Nov 01:31:17.810 * RDB: 0 MB of memory used by copy-on-write +28 Nov 01:31:17.902 * Background saving terminated with success +Tue Nov 28 01:31:17 UTC 2023: Current DB Size: 12 +``` + +Nomad Actions can impact the state and behaviour of the very task on which they're running. Keeping this in mind can help developers and platform teams separate business and operational logic in their applications. + +## Indefinite and self-terminating actions + +All of the actions so far have been self-terminating: they execute a command that completes and signals its completion. However, Actions will wait for completion of the desired task, and the Nomad API and Web UI use websockets to facilitate this. 
+ +Add an action that runs until you manually stop it with a signal interruption, like `ctrl + c` to observe the latency of the Redis instance: + + + +```hcl +# Performs a latency check of the Redis server. +# This action is a non-terminating action, meaning it will run indefinitely until it is stopped. +# Pass a signal interruption (Ctrl-C) to stop the action. +action "health-check" { + command = "/bin/sh" + args = ["-c", "redis-cli -p ${NOMAD_PORT_db} --latency"] +} +``` + + +Submit the job and run the action with the following: + +```shell-session +$ nomad action \ + -group=cache \ + -task=redis \ + -job=redis-actions \ + -t=true \ +health-check +``` + +The output should indicate the minimum, maximum, and average latency in ms for our Redis instance. Interrupting the signal or closing the websocket will end the action's execution. + +[![Non-terminating Nomad Action run via the UI][actions-health-check]][actions-health-check] + +## Wrap-up + +Find the complete Redis job with actions (with a few extras thrown in) below: + + + +```hcl +job "redis-actions" { + + group "cache" { + network { + port "db" {} + } + + task "redis" { + driver = "docker" + + config { + image = "redis:7" + ports = ["db"] + command = "/bin/sh" + args = ["-c", "redis-server --port ${NOMAD_PORT_db} & /local/db_log.sh"] + } + + template { + data = < + +Experiment with duplicating and modifying these actions to explore the potential of an actions-based workflow in Nomad. + +To further explore how Actions can be used in your workflows, consider the following: + +- The examples above are mostly self-contained in that they run in isolation on a single allocation within a job with only one task group and task. Try creating a job with multiple groups and tasks whose actions can talk to one another by way of service discovery. +- Try using the [GET job actions endpoint][api-list-job-actions] to see a list of actions available to a job and its groups and tasks +- Try writing an action that takes advantage of Nomad's environment variables: for example, the following actions are illustrative of how an operator might add shortcuts to their Nomad jobs to get a sense of system state: + +```hcl +action "get-alloc-info" { + command = "/bin/sh" + args = ["-c", + < + + You should always protect access to variables with Access Control +Lists (ACLs). Writing ACL policies for variables is covered in the [Nomad +Variables Access Control][] tutorial + + + +For complete documentation on the Nomad Variables feature and related concepts, +see the [Variables reference documentation][], the [Key Management +documentation][], and the [Workload Identity documentation][] + +## Automatic access + +The [workload identity][] for each task grants it automatic read and list access +to variables found at Nomad-owned paths with the prefix `nomad/jobs/`, followed +by the job ID, task group name, and task name. + +If you've completed the [Nomad Variables Access Control][] tutorial, you will +have a "prod" namespace and a token associated with the "prod-ops" policy. If +not, you can use a management token for this section and create the "prod" +namespace. + +```shell-session +$ nomad namespace apply -description "production environment" prod +Successfully applied namespace "prod"! +``` + +In this tutorial you'll be working in the "prod" namespace. Set the +`NOMAD_NAMESPACE` variable so that the command line writes all variables to that +namespace. 
+ +```shell-session +export NOMAD_NAMESPACE=prod +``` + +Create the following variables to see how different jobs, groups, and tasks can +access them. + +```shell-session +nomad var put nomad/jobs password=passw0rd1 +nomad var put nomad/jobs/example person_to_greet=alice +nomad var put nomad/jobs/example/web foo=1 bar=2 baz=3 +nomad var put nomad/jobs/example/web/httpd port=8001 +nomad var put nomad/jobs/example/web/sidecar password=passw0rd2 +``` + +Create the following job specification. This job `example` has one group `web` +with two tasks, `httpd` and `sidecar`. It includes templates that access all the +variables you wrote earlier. + +```hcl +job "example" { + datacenters = ["dc1"] + + group "web" { + + network { + port "www" { + to = 8001 + } + } + + task "httpd" { + driver = "docker" + + config { + image = "busybox:1" + command = "httpd" + args = ["-v", "-f", "-p", "0.0.0.0:${PORT}", "-h", "${NOMAD_ALLOC_DIR}/data"] + ports = ["www"] + } + + template { + destination = "${NOMAD_SECRETS_DIR}/env.txt" + env = true + data = < + + Hello Variables - Index + +

Hello, {{ with nomadVar "nomad/jobs/example" }}{{ .person_to_greet }}{{ end }}!

+

Here is the group variable:

+
    + {{- with nomadVar "nomad/jobs/example/web" -}} + {{- range $k, $v := . }} +
  • {{ $k }}={{ $v }}
  • + {{- end }} + {{- end }} +
+

View the output from the sidecar task.

+ + +EOT + } + } + + task "sidecar" { + driver = "docker" + + config { + image = "busybox:1" + command = "sleep" + args = ["300"] + } + + template { + destination = "${NOMAD_ALLOC_DIR}/data/sidecar.html" + change_mode = "noop" + data = < + + Hello Variables - Sidecar + +

The task has access to the following variables:

+
    + {{- range nomadVarList "nomad" }} +
  • {{ .Path }}
  • + {{- end }} +
+

View the index page.

+ + +EOT + } + } + } +} +``` + +Run this job and wait for the deployment to complete and note the allocation +short ID. In this example, the allocation short ID is `ec6dc2e4`. + +```shell-session +$ nomad job run ./example.nomad.hcl +==> 2022-09-19T11:42:20-04:00: Monitoring evaluation "0d8a7587" + 2022-09-19T11:42:20-04:00: Evaluation triggered by job "example" + 2022-09-19T11:42:20-04:00: Evaluation within deployment: "b58da4d8" + 2022-09-19T11:42:20-04:00: Allocation "ec6dc2e4" created: node "9063a25f", group "web" + 2022-09-19T11:42:20-04:00: Evaluation status changed: "pending" -> "complete" +==> 2022-09-19T11:42:20-04:00: Evaluation "0d8a7587" finished with status "complete" +==> 2022-09-19T11:42:20-04:00: Monitoring deployment "b58da4d8" + ✓ Deployment "b58da4d8" successful + + 2022-09-19T11:42:32-04:00 + ID = b58da4d8 + Job ID = example + Job Version = 0 + Status = successful + Description = Deployment completed successfully + + Deployed + Task Group Desired Placed Healthy Unhealthy Progress Deadline + web 1 1 1 0 2022-09-19T15:52:31Z +``` + +First, use `nomad alloc exec` to enter the `httpd` task and show the command +line arguments for the processes running in the container. + +```shell-session +$ nomad alloc exec -task httpd ec6dc2e4 ps -ef +PID USER TIME COMMAND + 1 root 0:00 httpd -v -f -p 0.0.0.0:8001 -h /alloc/data + 8 root 0:00 ps -ef +``` + +Note that the port number has been interpolated with environment variable that +you rendered in the following template by using the `env` field: + +```hcl + template { + destination = "${NOMAD_SECRETS_DIR}/env.txt" + env = true + data = < 8001 +``` + +You can also use `curl`: + +```shell-session + +$ curl 127.0.0.1:21976 + + + Hello Variables - Index + +

Hello, alice!

+

Here is the group variable:

+
    +
  • bar=2
  • +
  • baz=3
  • +
  • foo=1
  • +
+

View the output from the sidecar task.

+ + +``` + +This corresponds to this template block that reads the variable accessible to +the job "example" at `nomad/jobs/example` and the variable accessible to the +group "web" within the job "example" at `nomad/jobs/example/web`. + +```hcl + template { + destination = "${NOMAD_ALLOC_DIR}/data/index.html" + change_mode = "noop" + data = < + + Hello Variables - Index + +

Hello, {{ with nomadVar "nomad/jobs/example" }}{{ .person_to_greet }}{{ end }}!

+

Here is the group variable:

+
    + {{- with nomadVar "nomad/jobs/example/web" -}} + {{- range $k, $v := . }} +
  • {{ $k }}={{ $v }}
  • + {{- end }} + {{- end }} +
+

View the output from the sidecar task.

+ + +EOT +``` + +Visit the webpage rendered by the sidecar task: + +```shell-session +curl -s http://127.0.0.1:21976/sidecar.html + + + Hello Variables - Sidecar + +

The task has access to the following variables:

+
    +
  • nomad/jobs
  • +
  • nomad/jobs/example
  • +
  • nomad/jobs/example/web
  • +
  • nomad/jobs/example/web/sidecar
  • +
+

View the index page.

+ + +``` + +This corresponds to the following template block, which lists all the variables +this task has access to in its own namespace: + +``` + template { + destination = "${NOMAD_ALLOC_DIR}/data/sidecar.html" + change_mode = "noop" + data = < + + Hello Variables - Sidecar + +

The task has access to the following variables:

+
    + {{- range nomadVarList "nomad" }} +
  • {{ .Path }}
  • + {{- end }} +
+

View the index page.

+ + +EOT + } +``` + +Note that `nomad/jobs/example/httpd` does not appear in the list. If you added a +variable to `nomad/jobs/another-example` it would also not appear in the +list. If you added `nomad/jobs/example/sidecar` to a different namespace, it +would not appear in the list. + +## Workload associated ACL policies + +You may need to give tasks access to variables that are on paths shared by many +jobs. For example, all jobs in your cluster may need a shared API key for a +third-party monitoring vendor. You can provide access to these variables secrets +by creating policies associated with the task's [workload identity][]. See +[Workload Associated ACL Policies][] for full documentation. + +Create a new namespace named `shared`. + +```shell-session +$ nomad namespace apply shared +Successfully applied namespace "shared"! +``` + +Create a variable named `vendor/foo/bar` in the `shared` namespace. + +```shell-session +nomad var put -namespace shared vendor/foo/bar user=me password=passw0rd1 +``` + +To give the task you wrote earlier access to all secrets in the `shared` +namespace, you can create the following policy file `shared-policy.hcl`. + +```hcl +namespace "shared" { + variables { + path "*" { + capabilities = ["read"] + } + } +} +``` + +Now, create the policy and associate it with the `httpd` task in the web group +of the example job, specifying the appropriate flags on the `nomad acl policy +apply` command. + +```shell-session +nomad acl policy apply \ + -namespace prod -job example -group web -task httpd \ + shared-policy ./shared-policy.hcl +``` + +You can view the policy to see that it's associated with the workload. + +```shell-session +$ nomad acl policy info shared-policy +Name = shared-policy +Description = +CreateIndex = 390 +ModifyIndex = 390 + +Associated Workload +Namespace = prod +JobID = example +Group = web +Task = httpd + +Rules + +namespace "shared" { + variables { + path "*" { + capabilities = ["read"] + } + } +} +``` + +Change the template for the `httpd` task. + +```hcl + template { + destination = "alloc/index.html" + data = < + + Hello Variables - Index + +

Hello, {{ with nomadVar "nomad/jobs/example" }}{{ .person_to_greet }}{{ end }}!

+

Here is the shared variable:

+
    + {{- with nomadVar "vendor/foo/bar@shared" }} + {{- range $k, $v := . }} +
  • {{ $k }}={{ $v }}
  • + {{- end }} + {{- end }} +
+ + +EOT +``` + +Update the job and wait for the deployment to complete. + +```shell-session +nomad job run ./example.nomad.hcl +``` + +Visit the webpage served by the `httpd` task. + +```shell-session +curl -s http://127.0.0.1:8001/index.html + + + Hello Variables - Index + +

Hello, alice!

+

Here is the shared variable:

+
    +
  • password=passw0rd1
  • +
  • user=me
  • +
+ + +``` + +## Updating task variables + +You can update the value of a variable and it will be updated in the templates +that read that value. + +Update the shared variable so that the "password" field changes. + +```shell-session +nomad var put -namespace shared -force vendor/foo/bar user=me password=passw0rd2 +``` + +After a few moments, the value will be updated on the template. + +```shell-session +curl -s http://127.0.0.1:8001/index.html + + + Hello Variables - Index + +

Hello, alice!

+

Here is the shared variable:

+
    +
  • password=passw0rd2
  • +
  • user=me
  • +
+ + +``` + +You can use the template +[`change_mode`](/nomad/docs/job-specification/template#change_mode) +to specify Nomad's behavior when a value changes. + +## Next steps + +Because Nomad Variables use functions in the template block to emit data to +Nomad jobs, consider learning more about templates in Nomad with the [Templates +collection](/nomad/tutorials/templates). + +[Nomad Variables]: /nomad/docs/concepts/variables +[Nomad Variables Access Control]: /nomad/tutorials/variables/variables-acls +[Variables reference documentation]: /nomad/docs/concepts/variables +[Key Management documentation]: /nomad/docs/manage/key-management +[Workload Identity documentation]: /nomad/docs/concepts/workload-identity +[workload identity]: /nomad/docs/concepts/workload-identity +[`template`]: /nomad/docs/job-specification/template +[Workload Associated ACL Policies]: /nomad/docs/concepts/workload-identity#workload-associated-acl-policies + diff --git a/website/content/docs/job-declare/strategy/blue-green-canary.mdx b/website/content/docs/job-declare/strategy/blue-green-canary.mdx new file mode 100644 index 000000000..b79401694 --- /dev/null +++ b/website/content/docs/job-declare/strategy/blue-green-canary.mdx @@ -0,0 +1,474 @@ +--- +layout: docs +page_title: Configure blue-green and canary deployments +description: |- + Set up and configure Nomad jobs to deploy using the blue-green and + canary deployment strategies. +--- + +# Configure blue-green and canary deployments + +Sometimes [rolling updates] do not offer the required flexibility for updating +an application in production. Often organizations prefer to put a "canary" build +into production or utilize a technique known as a "blue/green" deployment to +ensure a safe application roll-out to production while minimizing downtime. + +## Blue/Green deployments + +Blue/Green deployments have several other names including Red/Black or A/B, but +the concept is generally the same. In a blue/green deployment, there are two +application versions. Only one application version is active at a time, except +during the transition phase from one version to the next. The term "active" +tends to mean "receiving traffic" or "in service". + +Imagine a hypothetical API server which has five instances deployed to +production at version 1.3, and you want to safely update to version 1.4. You +want to create five new instances at version 1.4 and in the case that they are +operating correctly you want to promote them and take down the five versions +running 1.3. In the event of failure, you can quickly rollback to 1.3. + +To start, you examine your job which is running in production: + +```hcl +job "docs" { + # ... + + group "api" { + count = 5 + + update { + max_parallel = 1 + canary = 5 + min_healthy_time = "30s" + healthy_deadline = "10m" + auto_revert = true + auto_promote = false + } + + task "api-server" { + driver = "docker" + + config { + image = "api-server:1.3" + } + } + } +} +``` + +Notice that the job has an `update` stanza with the `canary` count equal to the +desired count. This allows a Nomad job to model blue/green deployments. When you +change the job to run the "api-server:1.4" image, Nomad will create five new +allocations while leaving the original "api-server:1.3" allocations running. + +Observe how this works by changing the image to run the new version: + +```diff +@@ -2,6 +2,8 @@ job "docs" { + group "api" { + task "api-server" { + config { +- image = "api-server:1.3" ++ image = "api-server:1.4" +``` + +Next, plan these changes. 
Save the modified jobspec with the new version of `api-server` to a file name `docs.nomad.hcl`. + +```shell-session +$ nomad job plan docs.nomad.hcl ++/- Job: "docs" ++/- Task Group: "api" (5 canary, 5 ignore) + +/- Task: "api-server" (forces create/destroy update) + +/- Config { + +/- image: "api-server:1.3" => "api-server:1.4" + } + +Scheduler dry-run: +- All tasks successfully allocated. + +Job Modify Index: 7 +To submit the job with version verification run: + +nomad job run -check-index 7 docs.nomad.hcl + +When running the job with the check-index flag, the job will only be run if the +job modify index given matches the server-side version. If the index has +changed, another user has modified the job and the plan's results are +potentially invalid. +``` + +Run the changes. + +```shell-session +$ nomad job run docs.nomad.hcl +## ... +``` + +The plan output states that Nomad is going to create five canaries running the +"api-server:1.4" image and ignore all the allocations running the older image. +Now, if you examine the status of the job you will note that both the blue +("api-server:1.3") and green ("api-server:1.4") set are running. + +```shell-session +$ nomad status docs +ID = docs +Name = docs +Submit Date = 07/26/17 19:57:47 UTC +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +api 0 0 10 0 0 0 + +Latest Deployment +ID = 32a080c1 +Status = running +Description = Deployment is running but requires manual promotion + +Deployed +Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy +api true false 5 5 5 5 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +6d8eec42 087852e2 api 1 run running 07/26/17 19:57:47 UTC +7051480e 087852e2 api 1 run running 07/26/17 19:57:47 UTC +36c6610f 087852e2 api 1 run running 07/26/17 19:57:47 UTC +410ba474 087852e2 api 1 run running 07/26/17 19:57:47 UTC +85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC +3ac3fe05 087852e2 api 0 run running 07/26/17 19:53:56 UTC +4bd51979 087852e2 api 0 run running 07/26/17 19:53:56 UTC +2998387b 087852e2 api 0 run running 07/26/17 19:53:56 UTC +35b813ee 087852e2 api 0 run running 07/26/17 19:53:56 UTC +b53b4289 087852e2 api 0 run running 07/26/17 19:53:56 UTC +``` + +Now that the new version is running in production, you can route traffic to it +and validate that it is working properly. If so, you would promote the +deployment and Nomad would stop allocations running the older version. If not, +you would either troubleshoot one of the running containers or destroy the new +containers by failing the deployment. + +### Promote the deployment + +After deploying the new image along side the old version you have determined it +is functioning properly and you want to transition fully to the new version. +Doing so is as simple as promoting the deployment: + +```shell-session +$ nomad deployment promote 32a080c1 +==> Monitoring evaluation "61ac2be5" + Evaluation triggered by job "docs" + Evaluation within deployment: "32a080c1" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "61ac2be5" finished with status "complete" +``` + +If you inspect the job's status, you can observe that after promotion, Nomad +stopped the older allocations and is only running the new one. This now +completes the blue/green deployment. 
+ +```shell-session +$ nomad status docs +ID = docs +Name = docs +Submit Date = 07/26/17 19:57:47 UTC +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +api 0 0 5 0 5 0 + +Latest Deployment +ID = 32a080c1 +Status = successful +Description = Deployment completed successfully + +Deployed +Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy +api true true 5 5 5 5 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +6d8eec42 087852e2 api 1 run running 07/26/17 19:57:47 UTC +7051480e 087852e2 api 1 run running 07/26/17 19:57:47 UTC +36c6610f 087852e2 api 1 run running 07/26/17 19:57:47 UTC +410ba474 087852e2 api 1 run running 07/26/17 19:57:47 UTC +85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC +3ac3fe05 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC +4bd51979 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC +2998387b 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC +35b813ee 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC +b53b4289 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC +``` + +### Fail a deployment + +After deploying the new image alongside the old version you have determined it +is not functioning properly and you want to roll back to the old version. Doing +so is as simple as failing the deployment: + +```shell-session +$ nomad deployment fail 32a080c1 +Deployment "32a080c1-de5a-a4e7-0218-521d8344c328" failed. Auto-reverted to job version 0. + +==> Monitoring evaluation "6840f512" + Evaluation triggered by job "example" + Evaluation within deployment: "32a080c1" + Allocation "0ccb732f" modified: node "36e7a123", group "cache" + Allocation "64d4f282" modified: node "36e7a123", group "cache" + Allocation "664e33c7" modified: node "36e7a123", group "cache" + Allocation "a4cb6a4b" modified: node "36e7a123", group "cache" + Allocation "fdd73bdd" modified: node "36e7a123", group "cache" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "6840f512" finished with status "complete" +``` + +After failing the deployment, check the job's status. Confirm that Nomad has +stopped the new allocations and is only running the old ones, and that the working +copy of the job has reverted back to the original specification running "api-server:1.3". 
+ +```shell-session +$ nomad status docs +ID = docs +Name = docs +Submit Date = 07/26/17 19:57:47 UTC +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +api 0 0 5 0 5 0 + +Latest Deployment +ID = 6f3f84b3 +Status = successful +Description = Deployment completed successfully + +Deployed +Task Group Auto Revert Desired Placed Healthy Unhealthy +cache true 5 5 5 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +27dc2a42 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC +5b7d34bb 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC +983b487d 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC +d1cbf45a 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC +d6b46def 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC +0ccb732f 36e7a123 api 2 run running 07/26/17 20:06:29 UTC +64d4f282 36e7a123 api 2 run running 07/26/17 20:06:29 UTC +664e33c7 36e7a123 api 2 run running 07/26/17 20:06:29 UTC +a4cb6a4b 36e7a123 api 2 run running 07/26/17 20:06:29 UTC +fdd73bdd 36e7a123 api 2 run running 07/26/17 20:06:29 UTC +``` + +```shell-session +$ nomad job deployments docs +ID Job ID Job Version Status Description +6f3f84b3 example 2 successful Deployment completed successfully +32a080c1 example 1 failed Deployment marked as failed - rolling back to job version 0 +c4c16494 example 0 successful Deployment completed successfully +``` + +## Deploy with canaries + +Canary updates are a useful way to test a new version of a job before beginning +a rolling update. The `update` stanza supports setting the number of canaries +the job operator would like Nomad to create when the job changes via the +`canary` parameter. When the job specification is updated, Nomad creates the +canaries without stopping any allocations from the previous job. + +This pattern allows operators to achieve higher confidence in the new job +version because they can route traffic, examine logs, etc, to determine the new +application is performing properly. + +```hcl +job "docs" { + # ... + + group "api" { + count = 5 + + update { + max_parallel = 1 + canary = 1 + min_healthy_time = "30s" + healthy_deadline = "10m" + auto_revert = true + auto_promote = false + } + + task "api-server" { + driver = "docker" + + config { + image = "api-server:1.3" + } + } + } +} +``` + +In the example above, the `update` stanza tells Nomad to create a single canary +when the job specification is changed. + +You can experience how this behaves by changing the image to run the new +version: + +```diff +@@ -2,6 +2,8 @@ job "docs" { + group "api" { + task "api-server" { + config { +- image = "api-server:1.3" ++ image = "api-server:1.4" +``` + +Next, plan these changes. + +```shell-session +$ nomad job plan docs.nomad.hcl ++/- Job: "docs" ++/- Task Group: "api" (1 canary, 5 ignore) + +/- Task: "api-server" (forces create/destroy update) + +/- Config { + +/- image: "api-server:1.3" => "api-server:1.4" + } + +Scheduler dry-run: +- All tasks successfully allocated. + +Job Modify Index: 7 +To submit the job with version verification run: + +nomad job run -check-index 7 docs.nomad.hcl + +When running the job with the check-index flag, the job will only be run if the +job modify index given matches the server-side version. If the index has +changed, another user has modified the job and the plan's results are +potentially invalid. + +$ nomad job run docs.nomad.hcl +# ... +``` + +Run the changes. 
+ +```shell-session +$ nomad job run docs.nomad.hcl +## ... +``` + +Note from the plan output, Nomad is going to create one canary that will run the +"api-server:1.4" image and ignore all the allocations running the older image. +After running the job, The `nomad status` command output shows that the canary +is running along side the older version of the job: + +```shell-session +$ nomad status docs +ID = docs +Name = docs +Submit Date = 07/26/17 19:57:47 UTC +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +api 0 0 6 0 0 0 + +Latest Deployment +ID = 32a080c1 +Status = running +Description = Deployment is running but requires manual promotion + +Deployed +Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy +api true false 5 1 1 1 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC +3ac3fe05 087852e2 api 0 run running 07/26/17 19:53:56 UTC +4bd51979 087852e2 api 0 run running 07/26/17 19:53:56 UTC +2998387b 087852e2 api 0 run running 07/26/17 19:53:56 UTC +35b813ee 087852e2 api 0 run running 07/26/17 19:53:56 UTC +b53b4289 087852e2 api 0 run running 07/26/17 19:53:56 UTC +``` + +Now if you promote the canary, this will trigger a rolling update to replace the +remaining allocations running the older image. The rolling update will happen at +a rate of `max_parallel`, so in this case, one allocation at a time. + +```shell-session +$ nomad deployment promote 37033151 +==> Monitoring evaluation "37033151" + Evaluation triggered by job "docs" + Evaluation within deployment: "ed28f6c2" + Allocation "f5057465" created: node "f6646949", group "cache" + Allocation "f5057465" status changed: "pending" -> "running" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "37033151" finished with status "complete" +``` + +Check the status. + +```shell-session +$ nomad status docs +ID = docs +Name = docs +Submit Date = 07/26/17 20:28:59 UTC +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +api 0 0 5 0 2 0 + +Latest Deployment +ID = ed28f6c2 +Status = running +Description = Deployment is running + +Deployed +Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy +api true true 5 1 2 1 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +f5057465 f6646949 api 1 run running 07/26/17 20:29:23 UTC +b1c88d20 f6646949 api 1 run running 07/26/17 20:28:59 UTC +1140bacf f6646949 api 0 run running 07/26/17 20:28:37 UTC +1958a34a f6646949 api 0 run running 07/26/17 20:28:37 UTC +4bda385a f6646949 api 0 run running 07/26/17 20:28:37 UTC +62d96f06 f6646949 api 0 stop complete 07/26/17 20:28:37 UTC +f58abbb2 f6646949 api 0 stop complete 07/26/17 20:28:37 UTC +``` + +Alternatively, if the canary was not performing properly, you could abandon the +change using the `nomad deployment fail` command, similar to the blue/green +example. 
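+
+Promotion can also be automated. The sketch below (illustrative values, not a
+change this guide makes) sets `auto_promote = true` in the `update` stanza;
+together with a nonzero `canary` count, this tells Nomad to promote the
+deployment on its own once every canary is healthy, removing the manual
+`nomad deployment promote` step.
+
+```hcl
+update {
+  max_parallel     = 1
+  canary           = 1
+  min_healthy_time = "30s"
+  healthy_deadline = "10m"
+  auto_revert      = true
+
+  # Promote automatically once all canaries are healthy. Set this to false
+  # (the default) to keep the manual promotion workflow shown above.
+  auto_promote = true
+}
+```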
+ +[rolling updates]: /nomad/docs/job-declare/strategy/rolling diff --git a/website/content/docs/job-declare/strategy/index.mdx b/website/content/docs/job-declare/strategy/index.mdx new file mode 100644 index 000000000..fc78f08b6 --- /dev/null +++ b/website/content/docs/job-declare/strategy/index.mdx @@ -0,0 +1,16 @@ +--- +layout: docs +page_title: Job deployment strategies +description: |- + Discover common patterns to update running jobs in Nomad, including rolling updates, blue-green deployments, and canary deployments. Nomad provides built-in support for each strategy. +--- + +# Job deployment strategies + +Most applications are long-lived and require updates over time. Whether you are +deploying a new version of your web application or upgrading to a new version of +Redis, Nomad has built-in support for rolling, blue/green, and canary updates. +When a job specifies a rolling update, Nomad uses task state and health check +information in order to detect allocation health and minimize or eliminate +downtime. This section and subsections will explore how to do so safely with +Nomad. diff --git a/website/content/docs/job-declare/strategy/rolling.mdx b/website/content/docs/job-declare/strategy/rolling.mdx new file mode 100644 index 000000000..667f9e5be --- /dev/null +++ b/website/content/docs/job-declare/strategy/rolling.mdx @@ -0,0 +1,323 @@ +--- +layout: docs +page_title: Configure rolling updates +description: |- + Enable rolling updates for a Nomad job, inspect the deployment, and set up + Nomad to automatically revert failed deployments to a previous working + version. +--- + +# Configure rolling updates + +Nomad supports rolling updates as a first class feature. To enable rolling +updates a job or task group is annotated with a high-level description of the +update strategy using the [`update` stanza]. Under the hood, Nomad handles +limiting parallelism, interfacing with Consul to determine service health and +even automatically reverting to an older, healthy job when a deployment fails. + +## Enable rolling updates for job + +Rolling updates are enabled by adding the [`update` stanza] to the job +specification. The `update` stanza may be placed at the job level or in an +individual task group. When placed at the job level, the update strategy is +inherited by all task groups in the job. When placed at both the job and group +level, the `update` stanzas are merged, with group stanzas taking precedence +over job level stanzas. There is more information about +[inheritance][update-stanza-inheritance] in the `update` stanza documentation. + +```hcl +job "geo-api-server" { + # ... + + group "api-server" { + count = 6 + + # Add an update stanza to enable rolling updates of the service + update { + max_parallel = 2 + min_healthy_time = "30s" + healthy_deadline = "10m" + } + + task "server" { + driver = "docker" + + config { + image = "geo-api-server:0.1" + } + + # ... + } + } +} +``` + +In this example, by adding the simple `update` stanza to the "api-server" task +group, you inform Nomad that updates to the group should be handled with a +rolling update strategy. + +Thus when a change is made to the job file that requires new allocations to be +made, Nomad will deploy 2 allocations at a time and require that the allocations +be running in a healthy state for 30 seconds before deploying more versions of the +new group. + +By default Nomad determines allocation health by ensuring that all tasks in the +group are running and that any [service check] the tasks register are passing. 
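+
+For example, a service registration with a check along these lines (the service
+name, port label, and `/health` path are illustrative and not part of the job
+above) makes deployment health depend on the check passing as well as on the
+task running:
+
+```hcl
+# Illustrative sketch: placed inside the "server" task (or at the group level),
+# this registers a check that Nomad consults before counting the allocation
+# as healthy during a rolling update.
+service {
+  name = "geo-api-server"
+  port = "http"
+
+  check {
+    type     = "http"
+    path     = "/health"
+    interval = "10s"
+    timeout  = "2s"
+  }
+}
+```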
+ +## Check the planned changes + +Suppose you make a change to a file to update the version of a Docker container +that is configured with the same rolling update strategy from above. + +```diff +@@ -2,6 +2,8 @@ job "geo-api-server" { + group "api-server" { + task "server" { + driver = "docker" + + config { +- image = "geo-api-server:0.1" ++ image = "geo-api-server:0.2" +``` + +The [`nomad job plan` command] allows you to visualize the series of steps the +scheduler would perform. You can analyze this output to confirm it is correct: + +```shell-session +$ nomad job plan geo-api-server.nomad.hcl ++/- Job: "geo-api-server" ++/- Task Group: "api-server" (2 create/destroy update, 4 ignore) + +/- Task: "server" (forces create/destroy update) + +/- Config { + +/- image: "geo-api-server:0.1" => "geo-api-server:0.2" + } + +Scheduler dry-run: +- All tasks successfully allocated. + +Job Modify Index: 7 +To submit the job with version verification run: + +nomad job run -check-index 7 geo-api-server.nomad.hcl + +When running the job with the check-index flag, the job will only be run if the +job modify index given matches the server-side version. If the index has +changed, another user has modified the job and the plan's results are +potentially invalid. +``` + +Here you can observe that Nomad will begin a rolling update by creating and +destroying two allocations first; for the time being ignoring four of the old +allocations, consistent with the configured `max_parallel` count. + +## Inspect a deployment + +After running the plan you can submit the updated job by running `nomad run`. +Once run, Nomad will begin the rolling update of the service by placing two +allocations at a time of the new job and taking two of the old jobs down. + +You can inspect the current state of a rolling deployment using `nomad status`: + +```shell-session +$ nomad status geo-api-server +ID = geo-api-server +Name = geo-api-server +Submit Date = 07/26/17 18:08:56 UTC +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +api-server 0 0 6 0 4 0 + +Latest Deployment +ID = c5b34665 +Status = running +Description = Deployment is running + +Deployed +Task Group Desired Placed Healthy Unhealthy +api-server 6 4 2 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +14d288e8 f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC +a134f73c f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC +a2574bb6 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC +496e7aa2 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC +9fc96fcc f7b1ee08 api-server 0 run running 07/26/17 18:04:30 UTC +2521c47a f7b1ee08 api-server 0 run running 07/26/17 18:04:30 UTC +6b794fcb f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +9bc11bd7 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +691eea24 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +af115865 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +``` + +The output indicates that Nomad has created a deployment to conduct the rolling +update from job version 0 to 1. It has placed four instances of the new job and +has stopped four of the old instances. Consult the list of deployed allocations, +and note that Nomad has placed four instances of job version 1 but only +considers 2 of them healthy. This is because the two newest placed allocations +haven't been healthy for the required 30 seconds yet. 
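+
+As a rough rule of thumb for this example: with six allocations,
+`max_parallel = 2`, and `min_healthy_time = "30s"`, the rollout proceeds in
+three batches of two, so even if every new allocation starts and passes its
+checks immediately, the deployment needs at least 90 seconds to complete, plus
+scheduling and startup time.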
+ +Wait for the deployment to complete, and then re-issue the command. You will +receive output similar to the following: + +```shell-session +$ nomad status geo-api-server +ID = geo-api-server +Name = geo-api-server +Submit Date = 07/26/17 18:08:56 UTC +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +cache 0 0 6 0 6 0 + +Latest Deployment +ID = c5b34665 +Status = successful +Description = Deployment completed successfully + +Deployed +Task Group Desired Placed Healthy Unhealthy +cache 6 6 6 0 + +Allocations +ID Node ID Task Group Version Desired Status Created At +d42a1656 f7b1ee08 api-server 1 run running 07/26/17 18:10:10 UTC +401daaf9 f7b1ee08 api-server 1 run running 07/26/17 18:10:00 UTC +14d288e8 f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC +a134f73c f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC +a2574bb6 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC +496e7aa2 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC +9fc96fcc f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +2521c47a f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +6b794fcb f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +9bc11bd7 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +691eea24 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +af115865 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC +``` + +Nomad has successfully transitioned the group to running the updated canary and +did so with no downtime to your service by ensuring only two allocations were +changed at a time and the newly placed allocations ran successfully. Had any of +the newly placed allocations failed their health check, Nomad would have aborted +the deployment and stopped placing new allocations. If configured, Nomad can +automatically revert back to the old job definition when the deployment fails. + +## Use auto-revert on failed deployments + +In the case you do a deployment in which the new allocations are unhealthy, +Nomad will fail the deployment and stop placing new instances of the job. It +optionally supports automatically reverting back to the last stable job version +on deployment failure. Nomad keeps a history of submitted jobs and whether the +job version was stable. A job is considered stable if all its allocations are +healthy. + +To enable this, add the `auto_revert` parameter to the `update` stanza: + +```hcl +update { + max_parallel = 2 + min_healthy_time = "30s" + healthy_deadline = "10m" + + # Enable automatically reverting to the last stable job on a failed + # deployment. 
+ auto_revert = true +} +``` + +Now imagine you want to update your image to "geo-api-server:0.3" but you instead +update it to the below and run the job: + +```diff +@@ -2,6 +2,8 @@ job "geo-api-server" { + group "api-server" { + task "server" { + driver = "docker" + + config { +- image = "geo-api-server:0.2" ++ image = "geo-api-server:0.33" +``` + +Running `nomad job deployments` will show that the deployment fails, and Nomad +auto-reverted to the last stable job: + +```shell-session +$ nomad job deployments geo-api-server +ID Job ID Job Version Status Description +0c6f87a5 geo-api-server 3 successful Deployment completed successfully +b1712b7f geo-api-server 2 failed Failed due to unhealthy allocations - rolling back to job version 1 +3eee83ce geo-api-server 1 successful Deployment completed successfully +72813fcf geo-api-server 0 successful Deployment completed successfully +``` + +Nomad job versions increment monotonically. Even though Nomad reverted to the +job specification at version 1, it creates a new job version. You can observe the +differences between a job's versions and how Nomad auto-reverted the job using +the `job history` command: + +```shell-session +$ nomad job history -p geo-api-server +Version = 3 +Stable = true +Submit Date = 07/26/17 18:44:18 UTC +Diff = ++/- Job: "geo-api-server" ++/- Task Group: "api-server" + +/- Task: "server" + +/- Config { + +/- image: "geo-api-server:0.33" => "geo-api-server:0.2" + } + +Version = 2 +Stable = false +Submit Date = 07/26/17 18:45:21 UTC +Diff = ++/- Job: "geo-api-server" ++/- Task Group: "api-server" + +/- Task: "server" + +/- Config { + +/- image: "geo-api-server:0.2" => "geo-api-server:0.33" + } + +Version = 1 +Stable = true +Submit Date = 07/26/17 18:44:18 UTC +Diff = ++/- Job: "geo-api-server" ++/- Task Group: "api-server" + +/- Task: "server" + +/- Config { + +/- image: "geo-api-server:0.1" => "geo-api-server:0.2" + } + +Version = 0 +Stable = true +Submit Date = 07/26/17 18:43:43 UTC +``` + +This output describes the process of a reverted deployment. Starting at the end +of the output and working backwards, Nomad shows that version 0 was submitted. +Next, version 1 was an image change from 0.1 to 0.2 of geo-api-server and was +flagged as stable. Version 2 of the job attempted to update geo-api-server from +0.2 to 0.33; however, the deployment failed and never became stable. Finally, +version 3 of the job is created when Nomad automatically reverts the failed +deployment and redeploys the last healthy version--geo-api-server version 0.2. + +[`nomad job plan` command]: /nomad/commands/job/plan +[`update` stanza]: /nomad/docs/job-specification/update +[service check]: /nomad/docs/job-specification/check +[update-stanza-inheritance]: /nomad/docs/job-specification/update diff --git a/website/content/docs/job-declare/task-dependencies.mdx b/website/content/docs/job-declare/task-dependencies.mdx new file mode 100644 index 000000000..9a7cfe068 --- /dev/null +++ b/website/content/docs/job-declare/task-dependencies.mdx @@ -0,0 +1,385 @@ +--- +layout: docs +page_title: Configure task dependencies +description: |- + Create, configure, and run two jobs to use init and sidecar tasks. Create a dependency between the jobs and discover how to model complex workload dependency trees. +--- + +# Configure task dependencies + + +Nomad task dependencies provide the ability to define prestart tasks. + +Prestart tasks have two patterns: init tasks and sidecar tasks. 
Init
+tasks are tasks that must run to completion before the main workload is started.
+They are commonly used to download assets or to create necessary tables for an
+extract-transform-load (ETL) job. Sidecar tasks are started before the main
+workload starts and run for the lifetime of the main workload. Typical sidecar
+tasks are log forwarders, proxies, and platform abstractions. This tutorial
+demonstrates an init task.
+
+You can create an init task by adding a [`lifecycle` stanza] with `hook` set to
+`prestart` and `sidecar` set to `false`, as below.
+
+```hcl
+  lifecycle {
+    hook    = "prestart"
+    sidecar = false
+  }
+```
+
+You can model complex job dependency trees by using one or more init tasks to
+delay the job's main tasks from running until a condition is met. In this
+tutorial, the condition is that a service is available and advertised in Consul.
+
+In this tutorial you will work with several Nomad objects:
+
+- **mock-app** - a job file that contains two tasks
+
+  - **await-mock-service** - an init task that will wait infinitely for a
+    service named "mock-service" to be advertised over the Consul DNS API. Once
+    found, it will exit successfully.
+
+  - **mock-app-container** - the main workload task that is dependent on the
+    "mock-service" service.
+
+- **mock-service** - a job that contains one task which advertises a service
+  named "mock-service". This is provided as a Nomad job as a convenience, but
+  could be replaced by any means of registering a service named "mock-service"
+  in Consul, like the CLI or API.
+
+In this guide, you will complete the following actions:
+
+- Deploy the "mock-app" job.
+
+- Verify that the "mock-app-container" task remains pending and does not start.
+
+- Start the "mock-service" job.
+
+- Verify that the "await-mock-service" container completes successfully and that
+  the "mock-app-container" task starts.
+
+## Prerequisites
+
+To complete this tutorial you will need:
+
+- A Nomad cluster and at least one Consul server.
+- Nomad v0.11.0 or greater.
+
+If you do not have an existing Nomad cluster, you can learn how to deploy one
+using the [Install Nomad] guide. Similarly, if you do not have an existing
+Consul datacenter, you can learn how to deploy Consul with the [Install Consul]
+guide.
+
+## Create the mock-app job file
+
+This example uses a looping script in the `config` stanza to mock service
+payloads.
+
+Create an HCL file named `mock-app.nomad.hcl` with the following content.
+
+```hcl
+job "mock-app" {
+  datacenters = ["dc1"]
+  type        = "service"
+
+  group "mock-app" {
+    # disable deployments
+    update {
+      max_parallel = 0
+    }
+
+    task "await-mock-service" {
+      driver = "docker"
+
+      config {
+        image        = "busybox:1.28"
+        command      = "sh"
+        args         = ["-c", "echo -n 'Waiting for service'; until nslookup mock-service.service.consul 2>&1 >/dev/null; do echo '.'; sleep 2; done"]
+        network_mode = "host"
+      }
+
+      resources {
+        cpu    = 200
+        memory = 128
+      }
+
+      lifecycle {
+        hook    = "prestart"
+        sidecar = false
+      }
+    }
+
+    task "mock-app-container" {
+      driver = "docker"
+
+      config {
+        image   = "busybox"
+        command = "sh"
+        args    = ["-c", "echo The app is running! && sleep 3600"]
+      }
+
+      resources {
+        cpu    = 200
+        memory = 128
+      }
+    }
+  }
+}
+```
+
+The job contains two tasks: "await-mock-service" and "mock-app-container". The
+"await-mock-service" task is configured to busy-wait until the "mock-service"
+service is advertised in Consul. For this guide, this will not happen until you
+run the `mock-service.nomad.hcl` job.
+In a more real-world case, this could be any service dependency that advertises
+itself in Consul.
+
+You can use this pattern to model more complicated chains of service
+dependencies by including more await-style workloads.
+
+### Ensure that name resolution works properly
+
+Since the "await-mock-service" task uses `nslookup` inside a Docker container,
+you will need to ensure that your container can perform lookups against your
+Consul DNS API endpoint. This tutorial uses `network_mode = "host"` to allow the
+container to use the Nomad client node's DNS resolution pathway.
+
+The `nslookup` application will only perform queries on the standard DNS port
+(53). You might need to use an application to forward requests from port 53 to
+the Consul DNS API port, which is port 8600 by default. You can learn several
+ways to accomplish this forwarding in the [Forward DNS] tutorial.
+
+You could also add a `dns_servers` value to the `config` stanza of the
+"await-mock-service" task in the `mock-app.nomad.hcl` file to direct the query
+directly to a DNS server that meets the above criteria.
+
+## Run the mock-app job
+
+Run `nomad run mock-app.nomad.hcl`.
+
+```shell-session
+$ nomad run mock-app.nomad.hcl
+```
+
+The job will launch and provide you with an allocation ID in the output.
+
+```shell-session
+$ nomad run mock-app.nomad.hcl
+==> Monitoring evaluation "01c73d5a"
+    Evaluation triggered by job "mock-app"
+    Allocation "3044dda0" created: node "f26809e6", group "mock-app"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "01c73d5a" finished with status "complete"
+```
+
+## Verify mock-app-container is pending
+
+Run the `nomad alloc status` command with the provided allocation ID.
+
+```shell-session
+$ nomad alloc status 3044dda0
+```
+
+The `nomad alloc status` command provides you with useful information about the
+allocation. For this guide, focus on the status of each task. Each task's status
+is output in lines that look like `Task "await-mock-service" is "running"`.
+
+```shell-session
+$ nomad alloc status 3044dda0
+ID = 3044dda0-8dc1-1bac-86ea-66a3557c67d3
+Eval ID = 01c73d5a
+Name = mock-app.mock-app[0]
+Node ID = f26809e6
+Node Name = nomad-client-2.node.consul
+Job ID = mock-app
+Job Version = 0
+Client Status = running
+Client Description = Tasks are running
+Desired Status = run
+Desired Description =
+Created = 43s ago
+Modified = 42s ago
+
+Task "await-mock-service" (prestart) is "running"
+Task Resources
+CPU Memory Disk Addresses
+3/200 MHz 80 KiB/128 MiB 300 MiB
+
+Task Events:
+Started At = 2020-03-18T17:07:26Z
+Finished At = N/A
+Total Restarts = 0
+Last Restart = N/A
+
+Recent Events:
+Time Type Description
+2020-03-18T13:07:26-04:00 Started Task started by client
+2020-03-18T13:07:26-04:00 Task Setup Building Task Directory
+2020-03-18T13:07:26-04:00 Received Task received by client
+
+Task "mock-app-container" is "pending"
+Task Resources
+CPU Memory Disk Addresses
+200 MHz 128 MiB 300 MiB
+
+Task Events:
+Started At = N/A
+Finished At = N/A
+Total Restarts = 0
+Last Restart = N/A
+
+Recent Events:
+Time Type Description
+2020-03-18T13:07:26-04:00 Received Task received by client
+```
+
+Notice that the "await-mock-service" task is running and that the
+"mock-app-container" task is pending. The "mock-app-container" task will remain
+pending until the "await-mock-service" task completes successfully.
+
+## Create the mock-service job file
+
+Create a file named `mock-service.nomad.hcl` with the following content.
+ +```hcl +job "mock-service" { + datacenters = ["dc1"] + type = "service" + + group "mock-service" { + task "mock-service" { + driver = "docker" + + config { + image = "busybox" + command = "sh" + args = ["-c", "echo The service is running! && while true; do sleep 2; done"] + } + + resources { + cpu = 200 + memory = 128 + } + + service { + name = "mock-service" + } + } + } +} + +``` + +This job advertises the "mock-service" service in Consul. When run, this +will allow the await-mock-service task to complete successfully and let +the "mock-app-container" task start up. + +## Start mock-service job + +Run `nomad run mock-service.nomad.hcl`. + +```shell-session +$ nomad run mock-service.nomad.hcl +``` + +Nomad will start the job and return information about the scheduling information. + +```shell-session +$ nomad run mock-service.nomad +==> Monitoring evaluation "f31f8eb1" + Evaluation triggered by job "mock-service" + Allocation "d7767adf" created: node "f26809e6", group "mock-service" + Evaluation within deployment: "3d86e09a" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "f31f8eb1" finished with status "complete" +``` + +## Verify mock-app-container is running + +Finally, check the output of the `nomad alloc status` command again to check the +task statuses. Use the allocation ID from when you ran the "mock-app" job. + +```shell-session +$ nomad alloc status 3044dda0 +``` + +Again, focus on the task status lines for "await-mock-service" and +"mock-app-container". + +```plaintext +ID = 3044dda0-8dc1-1bac-86ea-66a3557c67d3 +Eval ID = 01c73d5a +Name = mock-app.mock-app[0] +Node ID = f26809e6 +Node Name = nomad-client-2.node.consul +Job ID = mock-app +Job Version = 0 +Client Status = running +Client Description = Tasks are running +Desired Status = run +Desired Description = +Created = 21m38s ago +Modified = 7m27s ago + +Task "await-mock-service" (prestart) is "dead" +Task Resources +CPU Memory Disk Addresses +0/200 MHz 80 KiB/128 MiB 300 MiB + +Task Events: +Started At = 2020-03-18T17:07:26Z +Finished At = 2020-03-18T17:21:35Z +Total Restarts = 0 +Last Restart = N/A + +Recent Events: +Time Type Description +2020-03-18T13:21:35-04:00 Terminated Exit Code: 0 +2020-03-18T13:07:26-04:00 Started Task started by client +2020-03-18T13:07:26-04:00 Task Setup Building Task Directory +2020-03-18T13:07:26-04:00 Received Task received by client + +Task "mock-app-container" is "running" +Task Resources +CPU Memory Disk Addresses +0/200 MHz 32 KiB/128 MiB 300 MiB + +Task Events: +Started At = 2020-03-18T17:21:37Z +Finished At = N/A +Total Restarts = 0 +Last Restart = N/A + +Recent Events: +Time Type Description +2020-03-18T13:21:37-04:00 Started Task started by client +2020-03-18T13:21:35-04:00 Driver Downloading image +2020-03-18T13:21:35-04:00 Task Setup Building Task Directory +2020-03-18T13:07:26-04:00 Received Task received by client +``` + +Notice, the "await-mock-service" task is dead and based on the "Recent Events" +table terminated with "Exit Code: 0". This indicates that it completed +successfully. The "mock-app-container" task has now transitioned to the +"running" status and the container is running. + +## Next steps + +Now that you have completed this guide, you have experimented with using Nomad +task dependencies to model inter-job dependencies. 
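+
+This guide used the init pattern (`sidecar = false`). If you instead need a
+helper that starts before the main task and keeps running alongside it, such as
+a log forwarder or a local proxy, the same `lifecycle` stanza applies with
+`sidecar` set to `true`. The following is a minimal sketch; the task name,
+image, and command are illustrative only:
+
+```hcl
+task "log-forwarder" {
+  driver = "docker"
+
+  # Start before the main task and keep running alongside it.
+  lifecycle {
+    hook    = "prestart"
+    sidecar = true
+  }
+
+  config {
+    image   = "busybox:1.28"
+    command = "sh"
+    args    = ["-c", "tail -F /alloc/logs/*.stdout.0"]
+  }
+}
+```
+
+Because `sidecar` is `true`, Nomad treats the task as a long-running companion
+and restarts it if it stops while the main tasks are still running.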
+ +### Further reading + +- [`lifecycle` stanza] +- [`service` stanza] +- Consul [Service Definition] +- Consul [DNS Interface] + +[forward dns]: /consul/tutorials/networking/dns-forwarding +[install consul]: /consul/tutorials/production-deploy/deployment-guide +[install nomad]: /nomad/tutorials/enterprise/production-deployment-guide-vm-with-consul +[`lifecycle` stanza]: /nomad/docs/job-specification/lifecycle +[`service` stanza]: /nomad/docs/job-specification/service +[service definition]: /consul/docs/services/usage/define-services +[dns interface]: /consul/docs/services/discovery/dns-overview +[dns interface]: /consul/docs/services/discovery/dns-overview + diff --git a/website/content/docs/drivers/docker.mdx b/website/content/docs/job-declare/task-driver/docker.mdx similarity index 57% rename from website/content/docs/drivers/docker.mdx rename to website/content/docs/job-declare/task-driver/docker.mdx index 146c335eb..5b8511761 100644 --- a/website/content/docs/drivers/docker.mdx +++ b/website/content/docs/job-declare/task-driver/docker.mdx @@ -1,10 +1,10 @@ --- layout: docs -page_title: Docker task driver -description: Nomad's Docker task driver lets you run Docker-based tasks in your jobs. Learn how to configure job tasks, authenticate against a private repository, use insecure registries, and configure Docker networking. Modify the Docker task driver plugin configuration. Learn about CPU, memory, filesystem IO, and security resource isolation as well as how Nomad handles dangling containers. +page_title: Use the Docker task driver in a job +description: Nomad's Docker task driver lets you run Docker-based tasks in your jobs. Learn how to configure job tasks, authenticate against a private repository, use insecure registries, and configure Docker networking. --- -# Docker task driver +# Use the Docker task driver in a job Name: `docker` @@ -12,8 +12,12 @@ The `docker` driver provides a first-class Docker workflow on Nomad. The Docker driver handles downloading containers, mapping ports, and starting, watching, and cleaning up after containers. --> **Note:** If you are using Docker Desktop for Windows or MacOS, please check -[our FAQ][faq-win-mac]. +**Note:** If you are using Docker Desktop for Windows or MacOS, check +[the FAQ][faq-win-mac]. + +Refer to [Configure the Docker task +driver](/nomad/docs/deploy/task-driver/docker) for capabilities, client +requirements, and plugin configuration. ## Task Configuration @@ -37,8 +41,7 @@ The `docker` driver supports the following configuration in the job spec. Only and should include `https://` if required. By default it will be fetched from Docker Hub. If the tag is omitted or equal to `latest` the driver will always try to pull the image. If the image to be pulled exists in a registry that - requires authentication credentials must be provided to Nomad. Please see the - [Authentication section](#authentication). + requires authentication credentials must be provided to Nomad. ```hcl config { @@ -53,7 +56,7 @@ The `docker` driver supports the following configuration in the job spec. Only - `args` - (Optional) A list of arguments to the optional `command`. If no `command` is specified, the arguments are passed directly to the container. References to environment variables or any [interpretable Nomad - variables](/nomad/docs/runtime/interpolation) will be interpreted before + variables](/nomad/docs/reference/runtime-variable-interpolation) will be interpreted before launching the task. 
For example: ```hcl @@ -71,7 +74,7 @@ The `docker` driver supports the following configuration in the job spec. Only - `auth_soft_fail` `(bool: false)` - Don't fail the task on an auth failure. Attempt to continue without auth. If the Nomad client configuration has an - [`auth.helper`](#plugin_auth_helper) block, the helper will be tried for + [`auth.helper`](/nomad/docs/deploy/task-driver/docker#helper) block, the helper will be tried for all images, including public images. If you mix private and public images, you will need to include `auth_soft_fail=true` in every job using a public image. @@ -378,7 +381,7 @@ The `docker` driver supports the following configuration in the job spec. Only that exist inside the allocation working directory. You can allow mounting host paths outside of the [allocation working directory] on individual clients by setting the `docker.volumes.enabled` option to `true` in the - [client's configuration](#client-requirements). We recommend using + [client's configuration](/nomad/docs/deploy/task-driver/docker#client-requirements). We recommend using [`mount`](#mount) if you wish to have more control over volume definitions. ```hcl @@ -608,10 +611,10 @@ you will need to specify credentials in your job via: - the `auth` option in the task config. - by storing explicit repository credentials or by specifying Docker - `credHelpers` in a file and setting the auth [config](#plugin_auth_file) + `credHelpers` in a file and setting the auth [config](/nomad/docs/deploy/task-driver/docker#config) value on the client in the plugin options. -- by specifying an auth [helper](#plugin_auth_helper) on the client in the +- by specifying an auth [helper](/nomad/docs/deploy/task-driver/docker#helper) on the client in the plugin options. The `auth` object supports the following keys: @@ -817,473 +820,28 @@ Some networking modes like `container` or `none` will require coordination outside of Nomad. First-class support for these options may be improved later through Nomad plugins or dynamic job configuration. -## Capabilities - -The `docker` driver implements the following [capabilities](/nomad/docs/concepts/plugins/task-drivers#capabilities-capabilities-error). - -| Feature | Implementation | -| -------------------- | ----------------- | -| `nomad alloc signal` | true | -| `nomad alloc exec` | true | -| filesystem isolation | image | -| network isolation | host, group, task | -| volume mounting | all | - -## Client Requirements - -Nomad requires Docker to be installed and running on the host alongside the -Nomad agent. - -By default Nomad communicates with the Docker daemon using the daemon's Unix -socket. Nomad will need to be able to read/write to this socket. If you do not -run Nomad as root, make sure you add the Nomad user to the Docker group so -Nomad can communicate with the Docker daemon. - -For example, on Ubuntu you can use the `usermod` command to add the `nomad` -user to the `docker` group so you can run Nomad without root: - -```shell-session -$ sudo usermod -G docker -a nomad -``` - -Nomad clients manage a cpuset cgroup for each task to reserve or share CPU -[cores][]. In order for Nomad to be compatible with Docker's own cgroups -management, it must write to cgroups owned by Docker, which requires running as -root. If Nomad is not running as root, CPU isolation and NUMA-aware scheduling -will not function correctly for workloads with `resources.cores`, including -workloads using task drivers other than `docker` on the same host. 
- -For the best performance and security features you should use recent versions -of the Linux Kernel and Docker daemon. - -If you would like to change any of the options related to the `docker` driver -on a Nomad client, you can modify them with the [plugin block][plugin-block] -syntax. Below is an example of a configuration (many of the values are the -default). See the next section for more information on the options. - -```hcl -plugin "docker" { - config { - endpoint = "unix:///var/run/docker.sock" - - auth { - config = "/etc/docker-auth.json" - helper = "ecr-login" - } - - tls { - cert = "/etc/nomad/nomad.pub" - key = "/etc/nomad/nomad.pem" - ca = "/etc/nomad/nomad.cert" - } - - extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"] - - gc { - image = true - image_delay = "3m" - container = true - - dangling_containers { - enabled = true - dry_run = false - period = "5m" - creation_grace = "5m" - } - } - - volumes { - enabled = true - selinuxlabel = "z" - } - - allow_privileged = false - allow_caps = ["chown", "net_raw"] - } -} -``` - -## Plugin Options - -- `endpoint` - If using a non-standard socket, HTTP or another location, or if - TLS is being used, docker.endpoint must be set. If unset, Nomad will attempt - to instantiate a Docker client using the `DOCKER_HOST` environment variable and - then fall back to the default listen address for the given operating system. - Defaults to `unix:///var/run/docker.sock` on Unix platforms and - `npipe:////./pipe/docker_engine` for Windows. - -- `allow_privileged` - Defaults to `false`. Changing this to true will allow - containers to use privileged mode, which gives the containers full access to - the host's devices. Note that you must set a similar setting on the Docker - daemon for this to work. - -- `pull_activity_timeout` - Defaults to `2m`. If Nomad receives no communication - from the Docker engine during an image pull within this timeframe, Nomad will - time out the request that initiated the pull command. (Minimum of `1m`) - -- `pids_limit` - Defaults to unlimited (`0`). An integer value that specifies - the pid limit for all the Docker containers running on that Nomad client. You - can override this limit by setting [`pids_limit`] in your task config. If - this value is greater than `0`, your task `pids_limit` must be less than or - equal to the value defined here. - -- `allow_caps` - A list of allowed Linux capabilities. Defaults to - -```hcl -["audit_write", "chown", "dac_override", "fowner", "fsetid", "kill", "mknod", - "net_bind_service", "setfcap", "setgid", "setpcap", "setuid", "sys_chroot"] -``` - - which is the same list of capabilities allowed by [docker by - default][docker_caps] (without [`NET_RAW`][no_net_raw]). Allows the operator - to control which capabilities can be obtained by tasks using - [`cap_add`][cap_add] and [`cap_drop`][cap_drop] options. Supports the value - `"all"` as a shortcut for allow-listing all capabilities supported by the - operating system. Note that due to a limitation in Docker, tasks running as - non-root users cannot expand the capabilities set beyond the default. They can - only have their capabilities reduced. - -!> **Warning:** Allowing more capabilities beyond the default may lead to -undesirable consequences, including untrusted tasks being able to compromise the -host system. - -- `allow_runtimes` - defaults to `["runc", "nvidia"]` - A list of the allowed - docker runtimes a task may use. 
- -- `auth` block: - - - `config` - Allows an operator to specify a - JSON file which is in the dockercfg format containing authentication - information for a private registry, from either (in order) `auths`, - `credsStore` or `credHelpers`. - - - `helper` - Allows an operator to specify a - [credsStore](https://docs.docker.com/engine/reference/commandline/login/#credential-helper-protocol) - like script on `$PATH` to lookup authentication information from external - sources. The script's name must begin with `docker-credential-` and this - option should include only the basename of the script, not the path. - - If you set an auth helper, it will be tried for all images, including - public images. If you mix private and public images, you will need to - include [`auth_soft_fail=true`] in every job using a public image. - -- `tls` block: - - - `cert` - Path to the server's certificate file (`.pem`). Specify this - along with `key` and `ca` to use a TLS client to connect to the docker - daemon. `endpoint` must also be specified or this setting will be ignored. - - - `key` - Path to the client's private key (`.pem`). Specify this along with - `cert` and `ca` to use a TLS client to connect to the docker daemon. - `endpoint` must also be specified or this setting will be ignored. - - - `ca` - Path to the server's CA file (`.pem`). Specify this along with - `cert` and `key` to use a TLS client to connect to the docker daemon. - `endpoint` must also be specified or this setting will be ignored. - -- `disable_log_collection` - Defaults to `false`. Setting this to true will - disable Nomad logs collection of Docker tasks. If you don't rely on nomad log - capabilities and exclusively use host based log aggregation, you may consider - this option to disable nomad log collection overhead. - -- `extra_labels` - Extra labels to add to Docker containers. - Available options are `job_name`, `job_id`, `task_group_name`, `task_name`, - `namespace`, `node_name`, `node_id`. Globs are supported (e.g. `task*`) - -- `logging` block: - - - `type` - Defaults to `"json-file"`. Specifies the logging driver docker - should use for all containers Nomad starts. Note that for older versions - of Docker, only `json-file` file or `journald` will allow Nomad to read - the driver's logs via the Docker API, and this will prevent commands such - as `nomad alloc logs` from functioning. - - - `config` - Defaults to `{ max-file = "2", max-size = "2m" }`. This option - can also be used to pass further - [configuration](https://docs.docker.com/config/containers/logging/configure/) - to the logging driver. - -- `gc` block: - - - `image` - Defaults to `true`. Changing this to `false` will prevent Nomad - from removing images from stopped tasks. - - - `image_delay` - A time duration, as [defined - here](https://golang.org/pkg/time/#ParseDuration), that defaults to `3m`. - The delay controls how long Nomad will wait between an image being unused - and deleting it. If a task is received that uses the same image within - the delay, the image will be reused. If an image is referenced by more than - one tag, `image_delay` may not work correctly. - - - `container` - Defaults to `true`. This option can be used to disable Nomad - from removing a container when the task exits. Under a name conflict, - Nomad may still remove the dead container. - - - `dangling_containers` block for controlling dangling container detection - and cleanup: - - - `enabled` - Defaults to `true`. Enables dangling container handling. 
- - - `dry_run` - Defaults to `false`. Only log dangling containers without - cleaning them up. - - - `period` - Defaults to `"5m"`. A time duration that controls interval - between Nomad scans for dangling containers. - - - `creation_grace` - Defaults to `"5m"`. Grace period after a container is - created during which the GC ignores it. Only used to prevent the GC from - removing newly created containers before they are registered with the - GC. Should not need adjusting higher but may be adjusted lower to GC - more aggressively. - -- `volumes` block: - - - `enabled` - Defaults to `false`. Allows tasks to bind host paths - (`volumes`) inside their container and use volume drivers - (`volume_driver`). Binding relative paths is always allowed and will be - resolved relative to the allocation's directory. - - - `selinuxlabel` - Allows the operator to set a SELinux label to the - allocation and task local bind-mounts to containers. If used with - `docker.volumes.enabled` set to false, the labels will still be applied to - the standard binds in the container. - -- `infra_image` - This is the Docker image to use when creating the parent - container necessary when sharing network namespaces between tasks. Defaults to - `registry.k8s.io/pause-:3.3`. The image will only be pulled from the - container registry if its tag is `latest` or the image doesn't yet exist - locally. - -- `infra_image_pull_timeout` - A time duration that controls how long Nomad will - wait before cancelling an in-progress pull of the Docker image as specified in - `infra_image`. Defaults to `"5m"`. - -- `image_pull_timeout` - (Optional) A default time duration that controls how long Nomad - waits before cancelling an in-progress pull of the Docker image as specified - in `image` across all tasks. Defaults to `"5m"`. - -- `windows_allow_insecure_container_admin` - Indicates that on windows, docker - checks the `task.user` field or, if unset, the container image manifest after - pulling the container, to see if it's running as `ContainerAdmin`. If so, exits - with an error unless the task config has `privileged=true`. Defaults to `false`. - -## Client Configuration - -~> Note: client configuration options will soon be deprecated. Please use -[plugin options][plugin-options] instead. See the [plugin block][plugin-block] -documentation for more information. - -The `docker` driver has the following [client configuration -options](/nomad/docs/configuration/client#options): - -- `docker.endpoint` - If using a non-standard socket, HTTP or another location, - or if TLS is being used, `docker.endpoint` must be set. If unset, Nomad will - attempt to instantiate a Docker client using the `DOCKER_HOST` environment - variable and then fall back to the default listen address for the given - operating system. Defaults to `unix:///var/run/docker.sock` on Unix platforms - and `npipe:////./pipe/docker_engine` for Windows. - -- `docker.auth.config` - Allows an operator to specify a - JSON file which is in the dockercfg format containing authentication - information for a private registry, from either (in order) `auths`, - `credsStore` or `credHelpers`. - -- `docker.auth.helper` - Allows an operator to specify a - [credsStore](https://docs.docker.com/engine/reference/commandline/login/#credential-helper-protocol) - -like script on \$PATH to lookup authentication information from external - sources. The script's name must begin with `docker-credential-` and this - option should include only the basename of the script, not the path. 
- -- `docker.tls.cert` - Path to the server's certificate file (`.pem`). Specify - this along with `docker.tls.key` and `docker.tls.ca` to use a TLS client to - connect to the docker daemon. `docker.endpoint` must also be specified or this - setting will be ignored. - -- `docker.tls.key` - Path to the client's private key (`.pem`). Specify this - along with `docker.tls.cert` and `docker.tls.ca` to use a TLS client to - connect to the docker daemon. `docker.endpoint` must also be specified or this - setting will be ignored. - -- `docker.tls.ca` - Path to the server's CA file (`.pem`). Specify this along - with `docker.tls.cert` and `docker.tls.key` to use a TLS client to connect to - the docker daemon. `docker.endpoint` must also be specified or this setting - will be ignored. - -- `docker.cleanup.image` Defaults to `true`. Changing this to `false` will - prevent Nomad from removing images from stopped tasks. - -- `docker.cleanup.image.delay` A time duration, as [defined - here](https://golang.org/pkg/time/#ParseDuration), that defaults to `3m`. The - delay controls how long Nomad will wait between an image being unused and - deleting it. If a tasks is received that uses the same image within the delay, - the image will be reused. - -- `docker.volumes.enabled`: Defaults to `false`. Allows tasks to bind host paths - (`volumes`) inside their container and use volume drivers (`volume_driver`). - Binding relative paths is always allowed and will be resolved relative to the - allocation's directory. - -- `docker.volumes.selinuxlabel`: Allows the operator to set a SELinux label to - the allocation and task local bind-mounts to containers. If used with - `docker.volumes.enabled` set to false, the labels will still be applied to the - standard binds in the container. - -- `docker.privileged.enabled` Defaults to `false`. Changing this to `true` will - allow containers to use `privileged` mode, which gives the containers full - access to the host's devices. Note that you must set a similar setting on the - Docker daemon for this to work. - -- `docker.caps.allowlist`: A list of allowed Linux capabilities. Defaults to - `"CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP, SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE"`, which is the list of - capabilities allowed by docker by default, as [defined - here](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities). - Allows the operator to control which capabilities can be obtained by tasks - using `cap_add` and `cap_drop` options. Supports the value `"ALL"` as a - shortcut for allowlisting all capabilities. - -- `docker.cleanup.container`: Defaults to `true`. This option can be used to - disable Nomad from removing a container when the task exits. Under a name - conflict, Nomad may still remove the dead container. - -- `docker.nvidia_runtime`: Defaults to `nvidia`. This option allows operators to select the runtime that should be used in order to expose Nvidia GPUs to the container. - -Note: When testing or using the `-dev` flag you can use `DOCKER_HOST`, -`DOCKER_TLS_VERIFY`, and `DOCKER_CERT_PATH` to customize Nomad's behavior. If -`docker.endpoint` is set Nomad will **only** read client configuration from the -config file. 
- -An example is given below: - -```hcl -client { - options { - "docker.cleanup.image" = "false" - } -} -``` - -## Client Attributes - -The `docker` driver will set the following client attributes: - -- `driver.docker` - This will be set to "1", indicating the driver is - available. - -- `driver.docker.bridge_ip` - The IP of the Docker bridge network if one - exists. - -- `driver.docker.version` - This will be set to version of the docker server. - -Here is an example of using these properties in a job file: - -```hcl -job "docs" { - # Require docker version higher than 1.2. - constraint { - attribute = "${attr.driver.docker.version}" - operator = ">" - version = "1.2" - } -} -``` - -## Resource Isolation - -### CPU - -Nomad limits containers' CPU based on CPU shares. CPU shares allow containers -to burst past their CPU limits. CPU limits will only be imposed when there is -contention for resources. When the host is under load your process may be -throttled to stabilize QoS depending on how many shares it has. You can see how -many CPU shares are available to your process by reading [`NOMAD_CPU_LIMIT`][runtime_env]. -1000 shares are approximately equal to 1 GHz. - -Please keep the implications of CPU shares in mind when you load test workloads -on Nomad. - -If resources [`cores`][cores] is set, the task is given an isolated reserved set of -CPU cores to make use of. The total set of cores the task may run on is the private -set combined with the variable set of unreserved cores. The private set of CPU cores -is available to your process by reading [`NOMAD_CPU_CORES`][runtime_env]. - -### Memory - -Nomad limits containers' memory usage based on total virtual memory. This means -that containers scheduled by Nomad cannot use swap. This is to ensure that a -swappy process does not degrade performance for other workloads on the same -host. - -Since memory is not an elastic resource, you will need to make sure your -container does not exceed the amount of memory allocated to it, or it will be -terminated or crash when it tries to malloc. A process can inspect its memory -limit by reading [`NOMAD_MEMORY_LIMIT`][runtime_env], but will need to track its own memory -usage. Memory limit is expressed in megabytes so 1024 = 1 GB. - -### IO - -Nomad's Docker integration does not currently provide QoS around network or -filesystem IO. These will be added in a later release. - -### Security - -Docker provides resource isolation by way of -[cgroups and namespaces](https://docs.docker.com/introduction/understanding-docker/#the-underlying-technology). -Containers essentially have a virtual file system all to themselves. If you -need a higher degree of isolation between processes for security or other -reasons, it is recommended to use full virtualization like -[QEMU](/nomad/docs/drivers/qemu). - -## Caveats - -### Dangling Containers - -Nomad has a detector and a reaper for dangling Docker containers, -containers that Nomad starts yet does not manage or track. Though rare, they -lead to unexpectedly running services, potentially with stale versions. - -When Docker daemon becomes unavailable as Nomad starts a task, it is possible -for Docker to successfully start the container but return a 500 error code from -the API call. In such cases, Nomad retries and eventually aims to kill such -containers. However, if the Docker Engine remains unhealthy, subsequent retries -and stop attempts may still fail, and the started container becomes a dangling -container that Nomad no longer manages. 
- -The newly added reaper periodically scans for such containers. It only targets -containers with a `com.hashicorp.nomad.allocation_id` label, or match Nomad's -conventions for naming and bind-mounts (i.e. `/alloc`, `/secrets`, `local`). -Containers that don't match Nomad container patterns are left untouched. - -Operators can run the reaper in a dry-run mode, where it only logs dangling -container ids without killing them, or disable it by setting the -`gc.dangling_containers` config block. - -### Docker for Windows - -Docker for Windows only supports running Windows containers. Because Docker for -Windows is relatively new and rapidly evolving you may want to consult the -[list of relevant issues on GitHub][winissues]. [faq-win-mac]: /nomad/docs/faq#q-how-to-connect-to-my-host-network-when-using-docker-desktop-windows-and-macos [winissues]: https://github.com/hashicorp/nomad/issues?q=is%3Aopen+is%3Aissue+label%3Atheme%2Fdriver%2Fdocker+label%3Atheme%2Fplatform-windows [plugin-options]: #plugin-options [plugin-block]: /nomad/docs/configuration/plugin -[allocation working directory]: /nomad/docs/runtime/environment#task-directories 'Task Directories' +[allocation working directory]: /nomad/docs/reference/runtime-environment-settings#task-directories 'Task Directories' [`auth_soft_fail=true`]: #auth_soft_fail -[cap_add]: /nomad/docs/drivers/docker#cap_add -[cap_drop]: /nomad/docs/drivers/docker#cap_drop +[cap_add]: /nomad/docs/deploy/task-driver/docker#cap_add +[cap_drop]: /nomad/docs/deploy/task-driver/docker#cap_drop [no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12 [upgrade_guide_extra_hosts]: /nomad/docs/upgrade/upgrade-specific#docker-driver [tini]: https://github.com/krallin/tini [docker_caps]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities -[allow_caps]: /nomad/docs/drivers/docker#allow_caps +[allow_caps]: /nomad/docs/deploy/task-driver/docker#allow_caps [Connect]: /nomad/docs/job-specification/connect [`bridge`]: /nomad/docs/job-specification/network#bridge [network block]: /nomad/docs/job-specification/network#bridge-mode [`network.mode`]: /nomad/docs/job-specification/network#mode -[`pids_limit`]: /nomad/docs/drivers/docker#pids_limit +[`pids_limit`]: /nomad/docs/deploy/task-driver/docker#pids_limit [Windows isolation]: https://learn.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container [cores]: /nomad/docs/job-specification/resources#cores -[runtime_env]: /nomad/docs/runtime/environment#job-related-variables +[runtime_env]: /nomad/docs/reference/runtime-environment-settings#job-related-variables [`--cap-add`]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities [`--cap-drop`]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities [cores]: /nomad/docs/job-specification/resources#cores diff --git a/website/content/docs/job-declare/task-driver/exec.mdx b/website/content/docs/job-declare/task-driver/exec.mdx new file mode 100644 index 000000000..94760d656 --- /dev/null +++ b/website/content/docs/job-declare/task-driver/exec.mdx @@ -0,0 +1,148 @@ +--- +layout: docs +page_title: Use the Isolated Fork/Exec task driver in a job +description: Nomad's Isolated Fork/Exec task driver lets you run binaries using OS isolation primitives. Learn how to use the Isolated Fork/Exec task driver in your jobs. Configure the command to execute with command arguments, namespace isolation, and Linux capabilities. 
+--- + +# Use the Isolated Fork/Exec task driver in a job + +Name: `exec` + +The `exec` driver is used to execute a particular command for a task. However, +unlike [`raw_exec`](/nomad/docs/job-declare/task-driver/raw_exec) it uses the +underlying isolation primitives of the operating system to limit the task's +access to resources. While simple, since the `exec` driver can invoke any +command, it can be used to call scripts or other wrappers which provide higher +level features. + +Refer to [Configure the Isolated Fork/Exec task +driver](/nomad/docs/deploy/task-driver/exec) for capabilities, client +requirements, and plugin configuration. + +## Task Configuration + +```hcl +task "webservice" { + driver = "exec" + + config { + command = "my-binary" + args = ["-flag", "1"] + } +} +``` + +The `exec` driver supports the following configuration in the job spec: + +- `command` - The command to execute. Must be provided. If executing a binary + that exists on the host, the path must be absolute and within the task's + [chroot](/nomad/docs/deploy/task-driver/exec#chroot) or in a [host volume][] mounted with a + [`volume_mount`][volume_mount] block. The driver will make the binary + executable and will search, in order: + + - The `local` directory with the task directory. + - The task directory. + - Any mounts, in the order listed in the job specification. + - The `usr/local/bin`, `usr/bin` and `bin` directories inside the task + directory. + + If executing a binary that is downloaded + from an [`artifact`](/nomad/docs/job-specification/artifact), the path can be + relative from the allocation's root directory. + +- `args` - (Optional) A list of arguments to the `command`. References + to environment variables or any [interpretable Nomad + variables](/nomad/docs/reference/runtime-variable-interpolation) will be interpreted before + launching the task. + +- `pid_mode` - (Optional) Set to `"private"` to enable PID namespace isolation for + this task, or `"host"` to disable isolation. If left unset, the behavior is + determined from the [`default_pid_mode`][default_pid_mode] in plugin configuration. + +!> **Warning:** If set to `"host"`, other processes running as the same user will +be able to access sensitive process information like environment variables. + +- `ipc_mode` - (Optional) Set to `"private"` to enable IPC namespace isolation for + this task, or `"host"` to disable isolation. If left unset, the behavior is + determined from the [`default_ipc_mode`][default_ipc_mode] in plugin configuration. + +!> **Warning:** If set to `"host"`, other processes running as the same user will be +able to make use of IPC features, like sending unexpected POSIX signals. + +- `cap_add` - (Optional) A list of Linux capabilities to enable for the task. + Effective capabilities (computed from `cap_add` and `cap_drop`) must be a + subset of the allowed capabilities configured with [`allow_caps`][allow_caps]. + Note that `"all"` is not permitted here if the `allow_caps` field in the + driver configuration doesn't also allow all capabilities. + +```hcl +config { + cap_add = ["net_raw", "sys_time"] +} +``` + +- `cap_drop` - (Optional) A list of Linux capabilities to disable for the task. + Effective capabilities (computed from `cap_add` and `cap_drop`) must be a subset + of the allowed capabilities configured with [`allow_caps`][allow_caps]. + +```hcl +config { + cap_drop = ["all"] + cap_add = ["chown", "sys_chroot", "mknod"] +} +``` + +- `work_dir` - (Optional) Sets a custom working directory for the task. 
This path must be + absolute and within the task's [chroot](/nomad/docs/deploy/task-driver/exec#chroot) or in a [host volume][] mounted + with a [`volume_mount`][volume_mount] block. This will also change the working + directory when using `nomad alloc exec`. + +## Examples + +To run a binary present on the Node: + +```hcl +task "example" { + driver = "exec" + + config { + # When running a binary that exists on the host, the path must be absolute. + command = "/bin/sleep" + args = ["1"] + } +} +``` + +To execute a binary downloaded from an +[`artifact`](/nomad/docs/job-specification/artifact): + +```hcl +task "example" { + driver = "exec" + + config { + command = "name-of-my-binary" + } + + artifact { + source = "https://internal.file.server/name-of-my-binary" + options { + checksum = "sha256:abd123445ds4555555555" + } + } +} +``` + + +[default_pid_mode]: /nomad/docs/deploy/task-driver/exec#default_pid_mode +[default_ipc_mode]: /nomad/docs/deploy/task-driver/exec#default_ipc_mode +[cap_add]: /nomad/docs/deploy/task-driver/exec#cap_add +[cap_drop]: /nomad/docs/deploy/task-driver/exec#cap_drop +[no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12 +[allow_caps]: /nomad/docs/deploy/task-driver/exec#allow_caps +[docker_caps]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities +[host volume]: /nomad/docs/configuration/client#host_volume-block +[volume_mount]: /nomad/docs/job-specification/volume_mount +[cores]: /nomad/docs/job-specification/resources#cores +[runtime_env]: /nomad/docs/reference/runtime-environment-settings#job-related-variables +[cgroup controller requirements]: /nomad/docs/deploy/production/requirements#hardening-nomad diff --git a/website/content/docs/job-declare/task-driver/index.mdx b/website/content/docs/job-declare/task-driver/index.mdx new file mode 100644 index 000000000..9bceb69a9 --- /dev/null +++ b/website/content/docs/job-declare/task-driver/index.mdx @@ -0,0 +1,42 @@ +--- +layout: docs +page_title: Use Nomad task drivers in jobs +description: Nomad's bundled task drivers integrate with the host OS to run job tasks in isolation. Review conceptual, installation, usage, and reference information for the Docker, Isolated Fork/Exec, Java, QEMU, and Raw Fork/Exec task drivers. +--- + +# Use Nomad task drivers in jobs + +Nomad's bundled task drivers integrate with the host OS to run job tasks in +isolation. Review job usage for the Docker, Isolated +Fork/Exec, Java, QEMU, and Raw Fork/Exec task drivers. + +@include 'task-driver-intro.mdx' + +## Nomad task drivers + +The Nomad binary contains several bundled task drivers. We also support +additional task driver plugins that you may install separately. + +| Bundled with Nomad | Plugins | +|----------------------|-----------------------| +| [Docker] | [Exec2] | +| [Isolated Fork/Exec] | [Podman] | +| [Java] | [Virt] | +| [QEMU] | | +| [Raw Fork/Exec] | | + +Refer to each task driver's page for detailed usage. + +## Configure task driver plugins + +Refer to [Configure Nomad task drivers](/nomad/docs/deploy/task-driver) for task +driver plugin configuration details. 
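+
+Regardless of which driver you choose, the job specification selects it the
+same way: each task sets the `driver` field and supplies that driver's options
+in its `config` block. The following is a minimal sketch using the Docker
+driver; the job, group, task, and image names are illustrative:
+
+```hcl
+job "example" {
+  group "web" {
+    task "server" {
+      # The driver field selects which task driver runs this task.
+      driver = "docker"
+
+      # Each driver defines its own config options; refer to the driver's page.
+      config {
+        image = "nginx:1.27"
+      }
+    }
+  }
+}
+```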
+ +[Docker]: /nomad/docs/job-declare/task-driver/docker +[Exec2]: /nomad/plugins/drivers/exec2 +[Isolated Fork/Exec]: /nomad/docs/job-declare/task-driver/exec +[Podman]: /nomad/plugins/drivers/podman +[Java]: /nomad/docs/job-declare/task-driver/java +[Virt]: /nomad/plugins/drivers/virt +[QEMU]: /nomad/docs/job-declare/task-driver/qemu +[Raw Fork/Exec]: /nomad/docs/job-declare/task-driver/raw_exec diff --git a/website/content/docs/job-declare/task-driver/java.mdx b/website/content/docs/job-declare/task-driver/java.mdx new file mode 100644 index 000000000..79b2fd2f6 --- /dev/null +++ b/website/content/docs/job-declare/task-driver/java.mdx @@ -0,0 +1,154 @@ +--- +layout: docs +page_title: Use the Java task driver in a job +description: Nomad's Java task driver lets you run JAR files in your workloads. Learn how to configure a job task that uses the Java task driver. Configure paths, JAR args, JVM options, namespace isolation, work directory, and Linux capabilities. +--- + +# Use the Java task driver in a job + +Name: `java` + +The `java` driver is used to execute Java applications packaged into a Java Jar +file. The driver requires the Jar file to be accessible from the Nomad +client via the [`artifact` downloader](/nomad/docs/job-specification/artifact). + +Refer to [Configure the Java task driver](/nomad/docs/deploy/task-driver/java) +for capabilities, client requirements, and plugin configuration. + +## Task Configuration + +```hcl +task "webservice" { + driver = "java" + + config { + jar_path = "local/example.jar" + jvm_options = ["-Xmx2048m", "-Xms256m"] + } +} +``` + +The `java` driver supports the following configuration in the job spec: + +- `class` - (Optional) The name of the class to run. If `jar_path` is specified + and the manifest specifies a main class, this is optional. If shipping classes + rather than a Jar, please specify the class to run and the `class_path`. + +- `class_path` - (Optional) The `class_path` specifies the class path used by + Java to lookup classes and Jars. + +- `jar_path` - (Optional) The path to the downloaded Jar. In most cases this will just be + the name of the Jar. However, if the supplied artifact is an archive that + contains the Jar in a subfolder, the path will need to be the relative path + (`subdir/from_archive/my.jar`). + +- `args` - (Optional) A list of arguments to the Jar's main method. References + to environment variables or any [interpretable Nomad + variables](/nomad/docs/reference/runtime-variable-interpolation) will be interpreted before + launching the task. + +- `jvm_options` - (Optional) A list of JVM options to be passed while invoking + java. These options are passed without being validated in any way by Nomad. + +- `pid_mode` - (Optional) Set to `"private"` to enable PID namespace isolation for + this task, or `"host"` to disable isolation. If left unset, the behavior is + determined from the [`default_pid_mode`][default_pid_mode] in plugin configuration. + +!> **Warning:** If set to `"host"`, other processes running as the same user will +be able to access sensitive process information like environment variables. + +- `ipc_mode` - (Optional) Set to `"private"` to enable IPC namespace isolation for + this task, or `"host"` to disable isolation. If left unset, the behavior is + determined from the [`default_ipc_mode`][default_ipc_mode] in plugin configuration. + +!> **Warning:** If set to `"host"`, other processes running as the same user will be +able to make use of IPC features, like sending unexpected POSIX signals. 
+ +- `cap_add` - (Optional) A list of Linux capabilities to enable for the task. + Effective capabilities (computed from `cap_add` and `cap_drop`) must be a + subset of the allowed capabilities configured with [`allow_caps`][allow_caps]. + Note that `"all"` is not permitted here if the `allow_caps` field in the + driver configuration doesn't also allow all capabilities. + + +```hcl +config { + cap_add = ["net_raw", "sys_time"] +} +``` + +- `cap_drop` - (Optional) A list of Linux capabilities to disable for the task. + Effective capabilities (computed from `cap_add` and `cap_drop`) must be a subset + of the allowed capabilities configured with [`allow_caps`][allow_caps]. + +```hcl +config { + cap_drop = ["all"] + cap_add = ["chown", "sys_chroot", "mknod"] +} +``` + +- `work_dir` - (Optional) Sets a custom working directory for the task. This path must be + absolute and within the task's [chroot](/nomad/docs/deploy/task-driver/java#chroot) or in a [host volume][] mounted + with a [`volume_mount`][volume_mount] block. This will also change the working + directory when using `nomad alloc exec`. + +## Examples + +A simple config block to run a Java Jar: + +```hcl +task "web" { + driver = "java" + + config { + jar_path = "local/hello.jar" + jvm_options = ["-Xmx2048m", "-Xms256m"] + } + + # Specifying an artifact is required with the "java" driver. This is the + # mechanism to ship the Jar to be run. + artifact { + source = "https://internal.file.server/hello.jar" + + options { + checksum = "md5:123445555555555" + } + } +} +``` + +A simple config block to run a Java class: + +```hcl +task "web" { + driver = "java" + + config { + class = "Hello" + class_path = "${NOMAD_TASK_DIR}" + jvm_options = ["-Xmx2048m", "-Xms256m"] + } + + # Specifying an artifact is required with the "java" driver. This is the + # mechanism to ship the Jar to be run. + artifact { + source = "https://internal.file.server/Hello.class" + + options { + checksum = "md5:123445555555555" + } + } +} +``` + +[default_pid_mode]: /nomad/docs/deploy/task-driver/java#default_pid_mode +[default_ipc_mode]: /nomad/docs/deploy/task-driver/java#default_ipc_mode +[cap_add]: /nomad/docs/deploy/task-driver/java#cap_add +[cap_drop]: /nomad/docs/deploy/task-driver/java#cap_drop +[no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12 +[allow_caps]: /nomad/docs/deploy/task-driver/java#allow_caps +[docker_caps]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities +[cgroup controller requirements]: /nomad/docs/deploy/production/requirements#hardening-nomad +[volume_mount]: /nomad/docs/job-specification/volume_mount +[host volume]: /nomad/docs/configuration/client#host_volume-block diff --git a/website/content/docs/drivers/qemu.mdx b/website/content/docs/job-declare/task-driver/qemu.mdx similarity index 56% rename from website/content/docs/drivers/qemu.mdx rename to website/content/docs/job-declare/task-driver/qemu.mdx index ad68dbd91..2eee6c201 100644 --- a/website/content/docs/drivers/qemu.mdx +++ b/website/content/docs/job-declare/task-driver/qemu.mdx @@ -1,10 +1,10 @@ --- layout: docs -page_title: QEMU task driver -description: Nomad's QEMU task driver provides a generic virtual machine runner that can execute any regular QEMU image. Learn how to use the QEMU task driver in your jobs. Configure image path, driver interface, accelerator, graceful shutdown, guest agent, and port map. 
Review the QEMU task driver capabilities, plugin options, client requirements, and client attributes such as QEMU version. Learn how the QEMU task driver provides the highest level of workload isolation. +page_title: Use the QEMU task driver in a job +description: Nomad's QEMU task driver provides a generic virtual machine runner that can execute any regular QEMU image. Learn how to use the QEMU task driver in your jobs. Configure image path, driver interface, accelerator, graceful shutdown, guest agent, and port map. --- -# QEMU task driver +# Use the QEMU task driver in a job Name: `qemu` @@ -20,6 +20,9 @@ The `qemu` driver can execute any regular `qemu` image (e.g. `qcow`, `img`, The driver requires the image to be accessible from the Nomad client via the [`artifact` downloader](/nomad/docs/job-specification/artifact). +Refer to [Configure the QEMU task driver](/nomad/docs/deploy/task-driver/qemu) +for capabilities, client requirements, and plugin configuration. + ## Task Configuration ```hcl @@ -105,80 +108,5 @@ task "virtual" { } ``` -## Capabilities - -The `qemu` driver implements the following [capabilities](/nomad/docs/concepts/plugins/task-drivers#capabilities-capabilities-error). - -| Feature | Implementation | -| -------------------- | -------------- | -| `nomad alloc signal` | false | -| `nomad alloc exec` | false | -| filesystem isolation | image | -| network isolation | none | -| volume mounting | none | - -## Client Requirements - -The `qemu` driver requires QEMU to be installed and in your system's `$PATH`. -The task must also specify at least one artifact to download, as this is the only -way to retrieve the image being run. - -## Client Attributes - -The `qemu` driver will set the following client attributes: - -- `driver.qemu` - Set to `1` if QEMU is found on the host node. Nomad determines - this by executing `qemu-system-x86_64 -version` on the host and parsing the output -- `driver.qemu.version` - Version of `qemu-system-x86_64`, ex: `2.4.0` - -Here is an example of using these properties in a job file: - -```hcl -job "docs" { - # Only run this job where the qemu version is higher than 1.2.3. - constraint { - attribute = "${driver.qemu.version}" - operator = ">" - value = "1.2.3" - } -} -``` - -## Plugin Options - -```hcl -plugin "qemu" { - config { - image_paths = ["/mnt/image/paths"] - args_allowlist = ["-drive", "-usbdevice"] - } -} -``` - -- `image_paths` (`[]string`: `[]`) - Specifies the host paths the QEMU - driver is allowed to load images from. -- `args_allowlist` (`[]string`: `[]`) - Specifies the command line - flags that the [`args`] option is permitted to pass to QEMU. If - unset, a job submitter can pass any command line flag into QEMU, - including flags that provide the VM with access to host devices such - as USB drives. Refer to the [QEMU documentation] for the available - flags. - -## Resource Isolation - -Nomad uses QEMU to provide full software virtualization for virtual machine -workloads. Nomad can use QEMU KVM's hardware-assisted virtualization to deliver -better performance. - -Virtualization provides the highest level of isolation for workloads that -require additional security, and resource use is constrained by the QEMU -hypervisor rather than the host kernel. VM network traffic still flows through -the host's interface(s). - -Note that the strong isolation provided by virtualization only applies -to the workload once the VM is started. 
Operators should use the -`args_allowlist` option to prevent job submitters from accessing -devices and resources they are not allowed to access. - -[`args`]: /nomad/docs/drivers/qemu#args +[`args`]: /nomad/docs/job-declare/task-driver/qemu#args [QEMU documentation]: https://www.qemu.org/docs/master/system/invocation.html diff --git a/website/content/docs/job-declare/task-driver/raw_exec.mdx b/website/content/docs/job-declare/task-driver/raw_exec.mdx new file mode 100644 index 000000000..690b60b10 --- /dev/null +++ b/website/content/docs/job-declare/task-driver/raw_exec.mdx @@ -0,0 +1,109 @@ +--- +layout: docs +page_title: Use the Raw Fork/Exec task driver in a job +description: Nomad's Raw Exec task driver lets you execute commands with no resource isolation. Learn how to use the Raw Fork/Exec task driver in your jobs. +--- + +# Use the Raw Fork/Exec task driver in a job + +Name: `raw_exec` + +The `raw_exec` driver is used to execute a command for a task without any +isolation. Further, the task is started as the same user as the Nomad process. +As such, it should be used with extreme care and is disabled by default. + +Refer to [Configure the Raw Fork/Exec task +driver](/nomad/docs/job-declare/task-driver/raw_exec) for capabilities, client +requirements, and plugin configuration. + +## Task configuration + +```hcl +task "webservice" { + driver = "raw_exec" + + config { + command = "my-binary" + args = ["-flag", "1"] + } +} +``` + +The `raw_exec` driver supports the following configuration in the job spec: + +- `command` - The command to execute. Must be provided. If executing a binary + that exists on the host, the path must be absolute. If executing a binary that + is downloaded from an [`artifact`](/nomad/docs/job-specification/artifact), the + path can be relative from the allocation's root directory. + +- `args` - (Optional) A list of arguments to the `command`. References + to environment variables or any [interpretable Nomad + variables](/nomad/docs/reference/runtime-variable-interpolation) will be interpreted before + launching the task. + +- `cgroup_v1_override` - (Optional) A map of controller names to paths. The + task will be added to these cgroups. The task will fail if these cgroups do + not exist. **WARNING:** May conflict with other Nomad driver's cgroups and + have unintended side effects. + +- `cgroup_v2_override` - (Optional) Adds the task to a unified cgroup path. + Paths may be relative to the cgroupfs root or absolute. **WARNING:** May + conflict with other Nomad driver's cgroups and have unintended side + effects. + +~> On Linux, you cannot set the `task.user` field on a task using the `raw_exec` +driver if you have hardened the Nomad client according to the +[production][hardening] guide. On Windows, when Nomad is running as a [system +service][service], you may specify a less-privileged service user. For example, +`NT AUTHORITY\LocalService`, `NT AUTHORITY\NetworkService`. + +- `oom_score_adj` - (Optional) A positive integer to indicate the likelihood of + the task being OOM killed (valid only for Linux). Defaults to 0. + +- `work_dir` - (Optional) Sets a custom working directory for the task. This + must be an absolute path. This will also change the working directory when + using `nomad alloc exec`. + +- `denied_envvars` - (Optional) Passes a list of environment variables that + the driver should scrub from the task environment. Supports globbing, with "*" + wildcard accepted as prefix and/or suffix. 
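+
+The following minimal sketch shows how several of these optional settings might
+be combined in a single task. The binary name, paths, and environment variable
+patterns are illustrative only; substitute values that match your own workload.
+
+```hcl
+task "batch-tool" {
+  driver = "raw_exec"
+
+  config {
+    command = "/usr/local/bin/batch-tool"
+    args    = ["-once"]
+
+    # Run the command from an absolute working directory.
+    work_dir = "/opt/batch-tool"
+
+    # Make this task a slightly more likely OOM-kill candidate (Linux only).
+    oom_score_adj = 100
+
+    # Scrub matching variables from the task environment.
+    denied_envvars = ["AWS_*", "VAULT_TOKEN"]
+  }
+}
+```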
+ +## Examples + +To run a binary present on the Node: + +``` +task "example" { + driver = "raw_exec" + + config { + # When running a binary that exists on the host, the path must be absolute/ + command = "/bin/sleep" + args = ["1"] + } +} +``` + +To execute a binary downloaded from an [`artifact`](/nomad/docs/job-specification/artifact): + +``` +task "example" { + driver = "raw_exec" + + config { + command = "name-of-my-binary" + } + + artifact { + source = "https://internal.file.server/name-of-my-binary" + options { + checksum = "sha256:abd123445ds4555555555" + } + } +} +``` + +[hardening]: /nomad/docs/deploy/production/requirements#user-permissions +[service]: /nomad/docs/deploy/production/windows-service +[plugin-options]: #plugin-options +[plugin-block]: /nomad/docs/configuration/plugin diff --git a/website/content/docs/job-declare/vault.mdx b/website/content/docs/job-declare/vault.mdx new file mode 100644 index 000000000..87ba7fe28 --- /dev/null +++ b/website/content/docs/job-declare/vault.mdx @@ -0,0 +1,49 @@ +--- +layout: docs +page_title: Use Vault in a job +description: |- + Use Vault in your Nomad job. +--- + +# Use Vault in a job + +## Use a Vault namespace + +This example job specifies to use the `engineering` namespace in +Vault. The job authenticates to Vault using its workload identity with the +`nomad-workloads` Vault role, reads the value at secret/foo, and fetches the +value for key `bar`. + +```hcl +job "vault" { + + group "demo" { + task "task" { + vault { + namespace = "engineering" + role = "nomad-workloads" + } + + driver = "raw_exec" + config { + command = "/usr/bin/cat" + args = ["secrets/config.txt"] + } + + template { + data = <`. + +For example, to use the configuration named `mynet`, you should set the task +group's network mode to `cni/mynet`. + +```hcl +job "docs" { + group "example" { + network { + mode = "cni/mynet" + } + } +} +``` + +Nodes that have a network configuration defining a network named `mynet` in +their `cni_config_dir` are eligible to run the workload. Nomad then schedules +the workload on client nodes that have fingerprinted a CNI configuration with +the given name. + +Nomad additionally supplies the following arguments via `CNI_ARGS` to the CNI +network: `NOMAD_REGION`, `NOMAD_NAMESPACE`, `NOMAD_JOB_ID`, `NOMAD_GROUP_NAME`, +and `NOMAD_ALLOC_ID`. + +Since the `CNI_ARGS` do not allow values to contain a semicolon, Nomad will not +set keys where the value contains a semicolon (this could happen with the job +ID). CNI plugins utilizing `NOMAD_*` CNI arguments are advised to apply a +defensive policy or simply error out. diff --git a/website/content/docs/job-networking/index.mdx b/website/content/docs/job-networking/index.mdx new file mode 100644 index 000000000..ce2cb120b --- /dev/null +++ b/website/content/docs/job-networking/index.mdx @@ -0,0 +1,12 @@ +--- +layout: docs +page_title: Workload networking +description: |- + This section provides guidance on networking your Nomad workloads. Topics include using a CNI network and Consul service discovery. +--- + +# Workload networking + +This section provides guidance on networking your Nomad workloads. Topics +include using a CNI network and Consul service discovery. Refer to the [Nomad +networking page](/nomad/docs/networking) to learn more about Nomad's networking capabilities. 
diff --git a/website/content/docs/job-networking/service-discovery.mdx b/website/content/docs/job-networking/service-discovery.mdx
new file mode 100644
index 000000000..5ffe2df0b
--- /dev/null
+++ b/website/content/docs/job-networking/service-discovery.mdx
@@ -0,0 +1,10 @@
+---
+layout: docs
+page_title: Service Discovery
+description: |-
+  Nomad service discovery helps you automatically connect workloads. Compare Nomad's built-in service discovery feature to Consul service discovery, which adds a DNS interface and service mesh. Learn about health checks, configuring tags in job specification service blocks, and how to specify tags for canary and blue/green allocations.
+---
+
+# Service discovery
+
+@include 'service-discovery.mdx'
diff --git a/website/content/docs/job-run/index.mdx b/website/content/docs/job-run/index.mdx
new file mode 100644
index 000000000..9c5590e03
--- /dev/null
+++ b/website/content/docs/job-run/index.mdx
@@ -0,0 +1,46 @@
+---
+layout: docs
+page_title: Run Nomad jobs
+description: |-
+  This section provides guidance for running Nomad jobs. Inspect jobs, access logs for troubleshooting, collect resource utilization metrics, and create job versions.
+---
+
+# Run Nomad jobs
+
+This section provides guidance for running Nomad jobs. Inspect jobs, access logs
+for troubleshooting, collect resource utilization metrics, and create job
+versions.
+
+The following list is the general flow for operating a job in Nomad:
+
+1. Declare the job according to the [job specification](/nomad/docs/job-specification).
+1. Plan and review the changes with a Nomad server.
+1. Submit the job file to a Nomad server.
+1. (Optional) Review job status and logs.
+
+Refer to [Create and submit a job](/nomad/docs/job-declare/create-job) to learn
+how to create a job specification, review the job plan, and submit the job.
+
+## Run a job
+
+Execute the [`nomad job run` command](/nomad/commands/job/run) to run your job.
+
+Now that the job is scheduled, it may or may not be running. You need to inspect
+the allocation status and logs to make sure the job started correctly. Refer to
+the section on [inspecting state](/nomad/docs/job-run/inspect) for ways to
+examine this job's state.
+
+## Update a running job
+
+When updating a job, there are a number of [built-in update
+strategies](/nomad/docs/job-declare/strategy) which may be defined in the job
+file. The general flow for updating an existing job in Nomad is:
+
+1. Modify the existing job file with the desired changes.
+1. Plan and review the changes with a Nomad server.
+1. Submit the job file to a Nomad server.
+1. (Optional) Review job status and logs.
+
+Because the job specification defines the deployment strategy, the workflow
+remains the same regardless of whether this is an initial deployment or a
+long-running job.
diff --git a/website/content/docs/job-run/inspect.mdx b/website/content/docs/job-run/inspect.mdx
new file mode 100644
index 000000000..ca969e369
--- /dev/null
+++ b/website/content/docs/job-run/inspect.mdx
@@ -0,0 +1,212 @@
+---
+layout: docs
+page_title: Inspect running jobs
+description: |-
+  Inspect the status of a running job, the associated evaluation, and allocations to troubleshoot for errors with the Nomad CLI.
+---
+
+# Inspect running jobs
+
+A successful job submission is not an indication of a successfully-running job.
+This is the nature of a highly-optimistic scheduler. A successful job submission
+means the server was able to issue the proper scheduling commands.
It does not +indicate the job is actually running. To verify the job is running and healthy, +you might need to inspect its state. + +This section will utilize the job named "docs", but +these operations and command largely apply to all jobs in Nomad. + +## Query the job status + +After a job is submitted, you can query the status of that job using the job +status command: + +```shell-session +$ nomad job status +ID Type Priority Status +docs service 50 running +``` + +At a high level, you can observe that the job is currently running, but what +does "running" actually mean. By supplying the name of a job to the job status +command, you can ask Nomad for more detailed job information: + +```shell-session +$ nomad job status docs +ID = docs +Name = docs +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +example 0 0 3 0 0 0 + +Allocations +ID Eval ID Node ID Task Group Desired Status Created At +04d9627d 42d788a3 a1f934c9 example run running +e7b8d4f5 42d788a3 012ea79b example run running +5cbf23a1 42d788a3 1e1aa1e0 example run running +``` + +This output shows that there are three instances of this task running, each with +its own allocation. For more information on the `status` command, please consult +the [`nomad job status` command] documentation. + +## Fetch an evaluation's status + +You can think of an evaluation as a submission to the scheduler. An example +below shows status output for a job where some allocations were placed +successfully, but did not have enough resources to place all of the desired +allocations. + +If you issue the status command with the `-evals` flag, the output will show +that there is an outstanding evaluation for this hypothetical job: + +```shell-session +$ nomad job status -evals docs +ID = docs +Name = docs +Type = service +Priority = 50 +Datacenters = dc1 +Status = running +Periodic = false + +Evaluations +ID Priority Triggered By Status Placement Failures +5744eb15 50 job-register blocked N/A - In Progress +8e38e6cf 50 job-register complete true + +Placement Failure +Task Group "example": + * Resources exhausted on 1 nodes + * Dimension "cpu" exhausted on 1 nodes + +Allocations +ID Eval ID Node ID Task Group Desired Status Created At +12681940 8e38e6cf 4beef22f example run running +395c5882 8e38e6cf 4beef22f example run running +4d7c6f84 8e38e6cf 4beef22f example run running +843b07b8 8e38e6cf 4beef22f example run running +a8bc6d3e 8e38e6cf 4beef22f example run running +b0beb907 8e38e6cf 4beef22f example run running +da21c1fd 8e38e6cf 4beef22f example run running +``` + +The output states that the job has a "blocked" evaluation that is in progress. +When Nomad can not place all the desired allocations, it creates a blocked +evaluation that waits for more resources to become available. + +The `eval status` command enables examination of any evaluation in more detail. +For the most part this should never be necessary. However, it can be useful to +understand what triggered a specific evaluation and it's current status. 
Running +it on the "complete" evaluation provides output similar to the following: + +```shell-session +$ nomad eval status 8e38e6cf +ID = 8e38e6cf +Status = complete +Status Description = complete +Type = service +TriggeredBy = job-register +Job ID = docs +Priority = 50 +Placement Failures = true + +Failed Placements +Task Group "example" (failed to place 3 allocations): + * Resources exhausted on 1 nodes + * Dimension "cpu" exhausted on 1 nodes + +Evaluation "5744eb15" waiting for additional capacity to place remainder +``` + +This output indicates that the evaluation was created by a "job-register" event +and that it had placement failures. The evaluation also has the information on +why placements failed. Also output is the evaluation of any follow-up +evaluations created. + +If you would like to learn more about this output, consult the documentation for +[`nomad eval status` command]. + +## Retrieve an allocation's status + +You can think of an allocation as an instruction to schedule. Like an +application or service, an allocation has logs and state. The `alloc status` +command gives the most recent events that occurred for a task, its resource +usage, port allocations and more: + +```shell-session +$ nomad alloc status 04d9627d +ID = 04d9627d +Eval ID = 42d788a3 +Name = docs.example[2] +Node ID = a1f934c9 +Job ID = docs +Client Status = running + +Task "server" is "running" +Task Resources +CPU Memory Disk Addresses +0/100 MHz 728 KiB/10 MiB 300 MiB http: 10.1.1.196:5678 + +Recent Events: +Time Type Description +10/09/16 00:36:06 UTC Started Task started by client +10/09/16 00:36:05 UTC Received Task received by client +``` + +The [`nomad alloc status` command] is a good starting to point for debugging an +application that did not start. Hypothetically assume a user meant to start a +Docker container named "redis:2.8", but accidentally put a comma instead of a +period, typing "redis:2,8". + +When the job is executed, it produces a failed allocation. The `nomad alloc status` command will give the reason why. + +```shell-session +$ nomad alloc status 04d9627d +ID = 04d9627d +... + +Recent Events: +Time Type Description +06/28/16 15:50:22 UTC Not Restarting Error was unrecoverable +06/28/16 15:50:22 UTC Driver Failure failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format +06/28/16 15:50:22 UTC Received Task received by client +``` + +Unfortunately not all failures are as visible in the allocation status output. +If the `alloc status` command shows many restarts, there is likely an +application-level issue during start up. For example: + +```shell-session +$ nomad alloc status 04d9627d +ID = 04d9627d +... + +Recent Events: +Time Type Description +06/28/16 15:56:16 UTC Restarting Task restarting in 5.178426031s +06/28/16 15:56:16 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1" +06/28/16 15:56:16 UTC Started Task started by client +06/28/16 15:56:00 UTC Restarting Task restarting in 5.00123931s +06/28/16 15:56:00 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1" +06/28/16 15:55:59 UTC Started Task started by client +06/28/16 15:55:48 UTC Received Task received by client +``` + +To debug these failures, you can use the `nomad alloc logs` command, which is +discussed in the [accessing logs] section of this documentation. + +For more information on the `alloc status` command, please consult the +documentation for the [`nomad alloc status` command]. 
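+
+As a quick check before moving on, you can pull the stderr stream for the
+failing task directly. This uses the same allocation ID and task name as the
+examples above; substitute your own values.
+
+```shell-session
+$ nomad alloc logs -stderr 04d9627d server
+```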
+ +[accessing logs]: /nomad/docs/job-run/logs +[`nomad alloc status` command]: /nomad/commands/alloc/status +[`nomad eval status` command]: /nomad/commands/eval/status +[`nomad job status` command]: /nomad/commands/job/status + diff --git a/website/content/docs/job-run/logs.mdx b/website/content/docs/job-run/logs.mdx new file mode 100644 index 000000000..e0c845e06 --- /dev/null +++ b/website/content/docs/job-run/logs.mdx @@ -0,0 +1,128 @@ +--- +layout: docs +page_title: Access job logs for troubleshooting +description: |- + Access logs of applications running in Nomad with the Nomad CLI or API. +--- + +# Access job logs for troubleshooting + +Viewing application logs is critical for debugging issues, examining performance +problems, or even verifying the application started correctly. To make this +as simple as possible, Nomad provides: + +- Job specification for [log rotation](/nomad/docs/job-specification/logs) +- CLI command for [log viewing](/nomad/commands/alloc/logs) +- API for programmatic [log access](/nomad/api-docs/client#stream-logs) + +This section will use the job named "docs", but +these operations and command largely apply to all jobs in Nomad. + +As a reminder, here is the output of the run command from the previous example: + +```shell-session +$ nomad job run docs.nomad.hcl +==> Monitoring evaluation "42d788a3" + Evaluation triggered by job "docs" + Allocation "04d9627d" created: node "a1f934c9", group "example" + Allocation "e7b8d4f5" created: node "012ea79b", group "example" + Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "42d788a3" finished with status "complete" +``` + +The provided allocation ID (which is also available via the `nomad status` +command) is required to access the application's logs. To access the logs of our +application, issue the following command: + +```shell-session +$ nomad alloc logs 04d9627d +``` + +The output will look something like this: + +```plaintext + 10.1.1.196:5678 10.1.1.196:33407 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 21.809µs + 10.1.1.196:5678 10.1.1.196:33408 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 20.241µs + 10.1.1.196:5678 10.1.1.196:33409 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 13.629µs +``` + +By default, this will return the logs of the task. If more than one task is +defined in the job file, the name of the task is a required argument: + +```shell-session +$ nomad alloc logs 04d9627d server +``` + +The logs command supports both displaying the logs as well as following logs, +blocking for more output, similar to `tail -f`. To follow the logs, use the +appropriately named `-f` flag: + +```shell-session +$ nomad alloc logs -f 04d9627d +``` + +This will stream logs to your console. + +If you are only interested in the "tail" of the log, use the `-tail` and `-n` +flags: + +```shell-session +$ nomad alloc logs -tail -n 25 04d9627d +``` + +This will show the last 25 lines. If you omit the `-n` flag, `-tail` will +default to 10 lines. + +By default, only the logs on stdout are displayed. To show the log output from +stderr, use the `-stderr` flag: + +```shell-session +$ nomad alloc logs -stderr 04d9627d +``` + +## Consider the "log shipper" pattern + +While the logs command works well for quickly accessing application logs, it +generally does not scale to large systems or systems that produce a lot of log +output, especially for the long-term storage of logs. 
Nomad's retention of log +files is best effort, so chatty applications should use a better log retention +strategy. + +Since applications log to the `alloc/` directory, all tasks within the same task +group have access to each others logs. Thus it is possible to have a task group +as follows: + +```hcl +group "my-group" { + task "server" { + # ... + + # Setting the server task as the leader of the task group allows Nomad to + # signal the log shipper task to gracefully shutdown when the server exits. + leader = true + } + + task "log-shipper" { + # ... + } +} +``` + +In the above example, the `server` task is the application that should be run +and will be producing the logs. The `log-shipper` reads those logs from the +`alloc/logs/` directory and sends them to a longer-term storage solution such as +Amazon S3 or an internal log aggregation system. + +When using the log shipper pattern, especially for batch jobs, the main task +should be marked as the [leader task]. By marking the main task as a leader, +when the task completes all other tasks within the group will be gracefully +shutdown. This allows the log shipper to finish sending any logs and then +exiting itself. The log shipper should set a high enough [`kill_timeout`] such +that it can ship any remaining logs before exiting. + +[log rotation]: /nomad/docs/job-specification/logs +[log viewing]: /nomad/commands/alloc/logs +[log access]: /nomad/api-docs/client#stream-logs +[leader task]: /nomad/docs/job-specification/task#leader +[`kill_timeout`]: /nomad/docs/job-specification/task#kill_timeout diff --git a/website/content/docs/job-run/utilization-metrics.mdx b/website/content/docs/job-run/utilization-metrics.mdx new file mode 100644 index 000000000..76c243c7e --- /dev/null +++ b/website/content/docs/job-run/utilization-metrics.mdx @@ -0,0 +1,91 @@ +--- +layout: docs +page_title: Collect resource utilization metrics +description: |- + Inspect the resource consumption and utilization information of a job with + the task drivers in Nomad. +--- + +# Collect resource utilization metrics + +Understanding the resource utilization of an application is important, and Nomad +supports reporting detailed statistics in many of its drivers. The main +interface for outputting resource utilization is the `alloc status` command with +the `-stats` flag. + +This section will use the job named "docs", but +these operations and command largely apply to all jobs in Nomad. + +Here is the output of the run command for the "docs" job. + +```shell-session +$ nomad job run docs.nomad.hcl +==> Monitoring evaluation "42d788a3" + Evaluation triggered by job "docs" + Allocation "04d9627d" created: node "a1f934c9", group "example" + Allocation "e7b8d4f5" created: node "012ea79b", group "example" + Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "42d788a3" finished with status "complete" +``` + +To fetch the detailed usage statistics, issue the following command. 
Your +allocation id will be different; replace `04d9627d` with the allocation id from +your running "docs" job: + +```shell-session +$ nomad alloc status -stats 04d9627d +ID = 04d9627d +Eval ID = 42d788a3 +Name = docs.example[2] +Node ID = a1f934c9 +Job ID = docs +Client Status = running + +Task "server" is "running" +Task Resources +CPU Memory Disk Addresses +75/100 MHz 784 KiB/10 MiB 300 MiB http: 10.1.1.196:5678 + +Memory Stats +Cache Max Usage RSS Swap +56 KiB 1.3 MiB 784 KiB 0 B + +CPU Stats +Percent Throttled Periods Throttled Time +0.00% 0 0 + +Recent Events: +Time Type Description + Started Task started by client + Received Task received by client +``` + +The output indicates that the job is running near the limit of configured CPU +but has plenty of memory headroom. You can use this information to alter the +job's resources to better reflect its actual needs: + +```hcl +resources { + cpu = 200 + memory = 10 +} +``` + +Adjusting resources is very important for a variety of reasons: + +- Ensuring your application does not get OOM killed if it hits its memory limit. +- Ensuring the application performs well by ensuring it has some CPU allowance. +- Optimizing cluster density by reserving what you need and not over-allocating. + +While single point in time resource usage measurements are useful, it is often +more useful to graph resource usage over time to better understand and estimate +resource usage. Nomad supports outputting resource data to statsite and statsd +and is the recommended way of monitoring resources. For more information about +outputting telemetry, consult the [Telemetry Guide]. + +For more advanced use cases, the resource usage data is also accessible via the +client's HTTP API. Learn more about it in the [`allocation` API] documentation. + +[telemetry guide]:/nomad/docs/monitor +[`allocation` api]: /nomad/api-docs/client diff --git a/website/content/docs/job-run/versions.mdx b/website/content/docs/job-run/versions.mdx new file mode 100644 index 000000000..592a687f0 --- /dev/null +++ b/website/content/docs/job-run/versions.mdx @@ -0,0 +1,552 @@ +--- +layout: docs +page_title: Create and modify job versions +description: |- + Create, modify, delete, compare, and revert job versions with the Nomad CLI, API, or UI. +--- + +# Create and modify job versions + +Nomad creates a new version for your job each time you run your job. A job can +have an unlimited number of versions, and version history is stored in state. +Over time, Nomad garbage collects dead versions that do not have a version tag. Saving a tag to a version prevents Nomad from garbage collecting that version. + +This guide demonstrates the following job version features: + +- Create, modify, and delete job version tags. +- Compare versions. +- Revert a running job to an older version, no matter how much time has passed. +- Clone a version. + +## Prerequisites + +- This feature requires Nomad v1.9.0 and later. +- You are familiar with [job versions and tags][job-concept]. + +## Create the `hello-world` job + +The examples use a job named `hello-world`, which is one of Nomad's job +templates. + +1. On the **Jobs** page, click **Run Job**. +1. Click **Choose from template**. +1. Select **Hello world**. +1. Click **Apply**. +1. Click **Plan**. +1. Review the **Job Plan** output. +1. Click **Run** to run the `hello-world` job. 
+ +## Create a version tag + +When you create a version tag, you should provide Nomad with these attributes: + +- A tag name +- A job name +- A version number + +The following example creates a tag named `golden-version` for version zero of `hello-world`. It includes a description of the tag. + + + + +Use the `nomad job tag apply [options] ` command to create a tag. + +```shell-session +$ nomad job tag apply -version 0 -name "golden-version" \ + -description "The version we can roll back to." \ + hello-world + +Job version 0 tagged with name "golden-version" +``` + +Note that Nomad tags the latest version if you omit the version number. + +Refer to the [`job tag apply`][job-tag-apply-cli] command reference for details on +including general options such as `namespace`. + + + + +| Method | Path | Produces | +| ------ | ----------------- | ------------------ | +| `POST` | `/v1/job/:job_id/versions/:tag_name/tag` | `application/json` | + +This example assumes the Nomad API is accessible on `localhost:4646`. + +```shell-session +$ curl -X POST \ + localhost:4646/v1/job/hello-world/versions/golden-version/tag \ + -H "Content-Type: application/json" -d \ + '{"Version": 0, "Description": "The version we can roll back to."}' +``` + +The JSON response is similar to the following example. + +```json +{ + "Name":"golden-version", + "Description":"The version we can roll back to.", + "TaggedTime":1728325495829793000, + "Index":361, + "LastContact":0, + "KnownLeader":false, + "NextToken":""} +``` + +Refer to the Jobs HTTP API [Create Job Version Tag][job-tag-apply-api] reference for +details on path and payload parameters. + + + + +1. From the **Jobs** screen, click the **hello-world** job to display job details. +1. Click **Versions**. +1. Find **Version #0** in the list. +1. Click **Tag this version**. +1. Enter `golden version` in the **Tag Name** field and `The version we can roll + back to.` in the **Tag Description** field. +1. Click **Save**. + +Version zero now has a `golden-version` tag. + +![Version tag](/img/nomad/job-version-tag/nomad-ui-version-tag.png) + + + + +Using the CLI, you can run a new version of your job and create a tag for that +new version. The following example runs a new version of the hello-world job and +immediately tags that version. + +```shell-session +$ nomad job run hello-world.nomad.hcl && \ + nomad job tag apply -name "high-traffic-version" hello-world + +==> 2024-10-08T14:42:30-05:00: Monitoring evaluation "90714134" + 2024-10-08T14:42:30-05:00: Evaluation triggered by job "hello-world" + 2024-10-08T14:42:31-05:00: Evaluation within deployment: "192ecea1" + 2024-10-08T14:42:31-05:00: Allocation "ec85c1bd" created: node "d6ee954e", group "servers" + 2024-10-08T14:42:31-05:00: Evaluation status changed: "pending" -> "complete" +==> 2024-10-08T14:42:31-05:00: Evaluation "90714134" finished with status "complete" +==> 2024-10-08T14:42:31-05:00: Monitoring deployment "192ecea1" + ✓ Deployment "192ecea1" successful + + 2024-10-08T14:42:48-05:00 + ID = 192ecea1 + Job ID = hello-world + Job Version = 4 + Status = successful + Description = Deployment completed successfully + + Deployed + Task Group Desired Placed Healthy Unhealthy Progress Deadline + servers 1 1 1 0 2024-10-08T14:52:46-05:00 +Job version 1 tagged with name "high-traffic-version" +``` + +## Modify a version tag + +The following example updates both the name and description of the `golden-version` tag for a hello-world job. 
+ + + + +Use the `nomad job tag apply [options] ` command to modify a tag's attributes. + +```shell-session +$ nomad job tag apply -version 0 -name "golden-version-0" \ + -description "Low traffic version." \ + hello-world + +Job version 0 tagged with name "golden-version-0" +``` + +Refer to the [`job tag apply`][job-tag-apply-cli] command reference for details on +including general options such as `namespace`. + + + + +| Method | Path | Produces | +| ------ | ----------------- | ------------------ | +| `POST` | `/v1/job/:job_id/versions/:tag_name/tag` | `application/json` | + +This example assumes the Nomad API is accessible on `localhost:4646`. + +```shell-session +$ curl -X POST \ + localhost:4646/v1/job/hello-world/versions/golden-version-0/tag \ + -H "Content-Type: application/json" -d \ + '{"Version": 0, "Description": "Low traffic version."}' +``` + +The response is similar to the following. + +```json +{ + "Name":"golden-version-0", + "Description":"Low traffic version.", + "TaggedTime":1728407951089465000, + "Index":3460, + "LastContact":0, + "KnownLeader":false, + "NextToken":""} +``` + +See the Jobs HTTP API [Create Job Version Tag][job-tag-apply-api] reference for +details on path and payload parameters. + + + + +1. From the **Jobs** screen, click the **hello-world** job to display job details. +1. Click **Versions**. +1. Find **Version #0** in the list. +1. Click **golden-version**. +1. Edit the tag name and description. +1. Click **Save**. + + + + +## Delete a version tag + +The following example deletes the `golden-version` tag attached to the `hello-world` job. + + + + +Use `nomad job tag unset -name "" ` to delete a tag from a version. This command requires a tag name and job ID. + +```shell-session +$ nomad job tag unset -name "golden-version" hello-world + +removed from job "hello-world" +``` + +Refer to the [`job tag unset`][job-tag-unset-cli] command reference for details on +including general options such as `namespace`. + + + + +| Method | Path | Produces | +| ------ | ----------------- | ------------------ | +| `DELETE` | `/v1/job/:job_id/versions/:tag_name/tag` | `application/json` | + +This example assumes the Nomad API is accessible on `localhost:4646`. + +```shell-session +$ curl -X DELETE \ + localhost:4646/v1/job/hello-world/versions/golden-version/tag \ + -H "Content-Type: application/json" +``` + +The response is similar to the following. + +```json +{ + "Name":"", + "Description":"", + "TaggedTime":0, + "Index":5135, + "LastContact":0, + "KnownLeader":false, + "NextToken":"" +} +``` + +Refer to the Jobs HTTP API [Delete Job Version Tag][job-tag-unset-api] reference for +details on path and payload parameters. + + + + +1. From the **Jobs** screen, click the **hello-world** job to display job details. +1. Click **Versions**. +1. Find **Version #0** in the list. +1. Click **golden-version**. +1. Click **Delete** to remove the tag. + + + + +## Compare versions + + + + +Use the [`nomad job history -p` command][nomad-job-history-cli] to compare +different job versions. The `-p` option displays the differences between each +version and the most recent version. You also have these options: + +- `-diff-version`: Specifies the version number of the job to compare against. + Mutually exclusive with the `-diff-tag` flag. +- `-diff-tag`: Specifies the version of the job to compare against, referenced + by tag name. Defaults to the latest version. Mutually exclusive with `-diff-version`. 
+ +### Show diff based on a version + +The `nomad job history -p -diff-version ` command compares all +versions against the specified `diff-version`. + +The following example compares all job versions to version 4. + +```shell-session +$ nomad job history -p -diff-version=4 hello-world +``` + +You can also perform a diff between two specific versions. This example compares + version 4 of the hello-world job with version 1 of the job. + +```shell-session +$ nomad job history -p -version=4 -diff-version=1 hello-world +``` + +### Show diff based on a tag + +The `nomad job history -p -diff-tag ` command compares all +versions against the specified `diff-tag`. + +The following example compares all job versions to the version tagged with the name `golden-version`. + +```shell-session +$ nomad job history -p -diff-tag="golden-version" hello-world +``` + +You can also perform a diff between a tag and a version number. The following +example compares the current version, `-version=4`, with the version tagged +`golden-version`. + +```shell-session +$ nomad job history -p -version=4 -diff-tag="golden-version" hello-world + +Version = 4 +Stable = true +Submit Date = 2024-10-08T14:42:30-05:00 +Tag Name = high-traffic-version +Diff = ++/- Job: "hello-world" ++/- Task Group: "servers" + + Network { + Hostname: "" + Mode: "" + + Dynamic Port { + + HostNetwork: "default" + + Label: "www" + + To: "8002" + } + } + - Network { + Hostname: "" + Mode: "" + - Dynamic Port { + - HostNetwork: "default" + - Label: "www" + - To: "8001" + } + } +``` + + + + +You can get a version list with the `Diffs` field populated. To compare all +versions to a specific version, use the `diff_version` query parameter. + +This example compares all versions to version one. + +```shell-session +$ curl -X GET \ + localhost:4646/v1/job/hello-world/versions?diffs=true&diff_version=1 +``` + +Refer to the Jobs HTTP API [List Job Versions][job-list-diff-api] reference for +details and complete examples. + + + + +The job detail's **Versions** tab shows the list of versions. + +![Version diff features](/img/nomad/job-version-tag/nomad-ui-version-diff.png) + +The two important elements are the **Diff against** dropdown, labeled "1", and +the changes show or hide toggle, labeled "2". + +The **Diff against** dropdown contains versions or tags that change how the UI +compares the versions against each other. + +![Version diff dropdown items](/img/nomad/job-version-tag/nomad-ui-diff-dd.png) + +The **Diff against previous version** option means that each version displays +the difference with the previous version in the list. The **See Change** +toggle displays the number of changes. Click the **See Change** arrow +to review the actual difference. + +![Version diff previous](/img/nomad/job-version-tag/nomad-ui-diffs-expanded.png) + +When you select a version or tag, the UI +automatically displays the differences each version has with the selected +version. + +![Each version's difference with version seven](/img/nomad/job-version-tag/nomad-ui-diff-changes.png) + + + + +## Revert to a version + +Use job tags to revert the current running job to a prior version. + +The following examples revert versions of the `hello-world` job to specific version number or tag names. + + + + +Use the `nomad job revert [options] ` command to revert +the current job to a prior version. + +This example reverts the job to version three. + +```shell-session +$ nomad job revert hello-world 3 +``` + +This example reverts the job to the version with the `golden-version` tag. 
+ +```shell-session +$ nomad job revert hello-world "golden-version" +``` + +Refer to the [`job revert`][job-revert-cli] command reference for more examples +as well as details on including general options such as namespace. + + + + +| Method | Path | Produces | +| ------ | ------------------------ | ------------------ | +| `POST` | `/v1/job/:job_id/revert` | `application/json` | + +You can revert a job to a previous version by specifying version number or the +tag name. + +This example reverts the current job to version six. + +```shell-session +$ curl -X POST \ + localhost:4646/v1/job/hello-world/revert \ + -H "Content-Type: application/json" -d \ + '{"JobID": "hello-world", "JobVersion": 6}' +``` + +This example reverts the current job to the version tagged `golden-version`. + +```shell-session +$ curl -X POST \ + localhost:4646/v1/job/hello-world/revert \ + -H "Content-Type: application/json" -d \ + '{"JobID": "hello-world", "TaggedVersion": "golden-version"}' +``` + +The JSON response for both examples is similar to the following. + +```json +{ + "EvalID":"c3b8b0b1-85b5-34f9-de70-80d859c6466a", + "EvalCreateIndex":6442, + "JobModifyIndex":6442, + "Warnings":"", + "Index":6442, + "LastContact":0, + "KnownLeader":false, + "NextToken":"" +} +``` + +Refer to the Jobs HTTP API [Revert to older Job Version][job-revert-api] +reference for details on path and payload parameters. + + + + +In this example, you revert the current job to the version with the +`golden-version` tag. + +![Revert job](/img/nomad/job-version-tag/nomad-ui-revert-job.png) + +1. From the **Jobs** screen, click the **hello-world** job to display job details. +1. Click **Versions**. +1. Find the version with the `golden-version` tag. +1. Click **Revert Version**. The UI asks you to confirm. +1. Click **Yes, Revert Version** to complete the reversion process. + +The UI then displays the **Overview** tab, where you can review the new version deployment. + + + + +## Clone a version + +Use the web UI to clone a job version. + +You can use a cloned version for a new version of the same job or to create a new job. + +### Clone as new version + +In this example, you clone the `hello-world` job's `golden-version`, edit the +job spec, plan, and then run the new version. + +1. From the **Jobs** screen, click the **hello-world** job to display job details. +1. Click the **Versions** tab. +1. Under the version with the `golden-version` tag, click **Clone and Edit**. +1. Click **Clone as New Version of hello-world**. +1. Edit the job definition. + + Since this job spec was created using HCL, the UI displays the definition in the **Job Spec** tab. + + Change the network port to `8080`. + + + If you choose to edit the JSON in the Full Definition tab, + the JSON definition replaces the HCL definition, so you lose the HCL + job spec. + We recommending using HCL for job specs. + + +1. Click **Plan**. +1. Review the plan output. +1. Click **Run** to run the new version. + +The **Versions** tab displays the new version. + +### Clone as new job + +In this example, you clone the `hello-world` job's `golden-version`, edit the +job name and network port, plan, and then run the new job. + +1. From the **Jobs** screen, click the **hello-world** job to display job details. +1. Click the **Versions** tab. +1. Under the version with the `golden-version` tag, click **Clone and Edit**. +1. Click **Clone as New Job**. +1. Edit the job spec. + + Change the job name to `hello-earth` and the network port to `9080`. You must change the job name. 
Otherwise, Nomad creates a new version of the original job.
+
+1. Click **Plan**.
+1. Review the plan output.
+1. Click **Run** to run the new job.
+
+Nomad loads the **Overview** of the `hello-earth` job so you can review the deployment.
+
+[job-concept]: /nomad/docs/concepts/job#job-versions
+[job-tag-apply-cli]: /nomad/commands/job/tag/apply
+[job-tag-apply-api]: /nomad/api-docs/jobs#create-job-version-tag
+[job-tag-unset-cli]: /nomad/commands/job/tag/unset
+[job-tag-unset-api]: /nomad/api-docs/jobs#delete-job-version-tag
+[nomad-job-history-cli]: /nomad/commands/job/history
+[job-list-diff-api]: /nomad/api-docs/jobs#list-job-versions
+[job-revert-cli]: /nomad/commands/job/revert
+[job-revert-api]: /nomad/api-docs/jobs#revert-to-older-job-version
diff --git a/website/content/docs/job-scheduling/affinity.mdx b/website/content/docs/job-scheduling/affinity.mdx
new file mode 100644
index 000000000..df43fef4e
--- /dev/null
+++ b/website/content/docs/job-scheduling/affinity.mdx
@@ -0,0 +1,250 @@
+---
+layout: docs
+page_title: Job placements with affinities
+description: >-
+  Configure affinities to express placement preferences for your jobs. Create a
+  job with an affinity, submit it to Nomad, and monitor it after placement.
+---
+
+# Job placements with affinities
+
+The [affinity][affinity-stanza] stanza allows operators to express placement
+preferences for their jobs on particular types of nodes. Note that there is a
+key difference between the [constraint][constraint] stanza and the affinity
+stanza. The constraint stanza strictly filters where jobs are run based on
+[attributes][attributes] and [client metadata][client-metadata]. If no nodes are
+found to match, the placement does not succeed. The affinity stanza acts like a
+"soft constraint." Nomad will attempt to match the desired affinity, but
+placement will succeed even if no nodes match the desired criteria. This is done
+in conjunction with scoring based on the Nomad scheduler's bin packing algorithm
+which you can read more about [here][scheduling].
+
+In this guide, you will encounter a sample application. Your application can run
+in datacenters `dc1` and `dc2`; however, you have a strong preference to run it in
+`dc2`. You will learn how to configure the job to inform the scheduler of your
+preference, while still allowing it to place your workload in `dc1` if the
+desired resources aren't available in `dc2`.
+
+By specifying an affinity with the proper [weight], the Nomad scheduler can find
+the best nodes on which to place your job. The affinity weight will be included
+when scoring nodes for placement along with other factors like the bin-packing
+algorithm.
+
+### Prerequisites
+
+To perform the tasks described in this guide, you need to have a Nomad
+environment with Consul installed. You can use this [repository] to provision a
+sandbox environment. This guide will assume a cluster with one server node and
+three client nodes.
+
+
+
+ This guide is for demo purposes and is only using a single
+server node. In a production cluster, 3 or 5 server nodes are recommended.
+
+
+
+## Place one of the client nodes in a different datacenter
+
+You are going to express your job placement preference based on the datacenter
+your nodes are located in. Choose one of your client nodes and edit
+`/etc/nomad.d/nomad.hcl` to change its location to `dc2`. A snippet of an
+example configuration file with the required change is shown below.
+ +```hcl +data_dir = "/opt/nomad/data" +bind_addr = "0.0.0.0" +datacenter = "dc2" + +# Enable the client +client { + enabled = true +# ... +} +``` + +After making the change on your chosen client node, restart the Nomad service + +```shell-session +$ sudo systemctl restart nomad +``` + +If everything worked correctly, one of your nodes will now show datacenter `dc2` +when you run the [`nomad node status`][node-status] command. + +```shell-session +$ nomad node status +ID DC Name Class Drain Eligibility Status +3592943e dc1 ip-172-31-27-159 false eligible ready +3dea0188 dc1 ip-172-31-16-175 false eligible ready +6b6e9518 dc2 ip-172-31-27-25 false eligible ready +``` + +## Create a job with an affinity + +Create a file with the name `redis.nomad.hcl` and place the following content in it: + +```hcl +job "redis" { + datacenters = ["dc1", "dc2"] + type = "service" + + affinity { + attribute = "${node.datacenter}" + value = "dc2" + weight = 100 + } + + group "cache1" { + count = 4 + + network { + port "db" { + to = 6379 + } + } + + service { + name = "redis-cache" + port = "db" + + check { + name = "alive" + type = "tcp" + interval = "10s" + timeout = "2s" + } + } + + + task "redis" { + driver = "docker" + + config { + image = "redis:latest" + ports = ["db"] + } + } + } +} +``` + +Observe that the job uses the `affinity` stanza and specifies `dc2` as the value +for the `${node.datacenter}` [attribute][attributes]. It also uses the value +`100` for the [weight][weight] which will cause the Nomad scheduler to rank +nodes in datacenter `dc2` with a higher score. Keep in mind that weights can +range from -100 to 100, inclusive. Negative weights serve as anti-affinities +which cause Nomad to avoid placing allocations on nodes that match the criteria. + +## Register the redis Nomad job + +Run the Nomad job with the following command: + +```shell-session +$ nomad run redis.nomad.hcl +==> Monitoring evaluation "11388ef2" + Evaluation triggered by job "redis" + Allocation "0dfcf0ba" created: node "6b6e9518", group "cache1" + Allocation "89a9aae9" created: node "3592943e", group "cache1" + Allocation "9a00f742" created: node "6b6e9518", group "cache1" + Allocation "fc0f21bc" created: node "3dea0188", group "cache1" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "11388ef2" finished with status "complete" +``` + +Note that two of the allocations in this example have been placed on node +`6b6e9518`. This is the node configured to be in datacenter `dc2`. The Nomad +scheduler selected this node because of the affinity specified. All of the +allocations have not been placed on this node because the Nomad scheduler +considers other factors in the scoring such as bin-packing. This helps avoid +placing too many instances of the same job on a node and prevents reduced +capacity during a node level failure. You will take a detailed look at the +scoring in the next few steps. + +## Check the status of the job + +At this point, Check the status of the job and verify where the allocations +have been placed. Run the following command: + +```shell-session +$ nomad status redis +``` + +There should be four instances of the job running in the `Summary` section of +the output as shown below: + +```plaintext +... 
+Summary +Task Group Queued Starting Running Failed Complete Lost +cache1 0 0 4 0 0 0 + +Allocations +ID Node ID Task Group Version Desired Status Created Modified +0dfcf0ba 6b6e9518 cache1 0 run running 1h44m ago 1h44m ago +89a9aae9 3592943e cache1 0 run running 1h44m ago 1h44m ago +9a00f742 6b6e9518 cache1 0 run running 1h44m ago 1h44m ago +fc0f21bc 3dea0188 cache1 0 run running 1h44m ago 1h44m ago +``` + +You can cross-check this output with the results of the `nomad node status` +command to verify that the majority of your workload has been placed on the node +in `dc2`. In the case of the above output, that node is `6b6e9518`. + +## Obtain detailed scoring information on job placement + +The Nomad scheduler will not always place all of your workload on nodes you have +specified in the `affinity` stanza even if the resources are available. This is +because affinity scoring is combined with other metrics as well before making a +scheduling decision. In this step, you will take a look at some of those other +factors. + +Using the output from the previous step, find an allocation that has been placed +on a node in `dc2` and use the [`nomad alloc status`][alloc status] command with +the [`-verbose`][verbose] option to obtain detailed scoring information on it. +In this example, the allocation ID to be inspected is `0dfcf0ba` (your +allocation IDs will be different). + +```shell-session +$ nomad alloc status -verbose 0dfcf0ba +``` + +The resulting output will show the `Placement Metrics` section at the bottom. + +```plaintext +... +Placement Metrics +Node binpack job-anti-affinity node-reschedule-penalty node-affinity final score +6b6e9518-d2a4-82c8-af3b-6805c8cdc29c 0.33 0 0 1 0.665 +3dea0188-ae06-ad98-64dd-a761ab2b1bf3 0.33 0 0 0 0.33 +3592943e-67e4-461f-d888-d5842372a4d4 0.33 0 0 0 0.33 +``` + +Note that the results from the `binpack`, `job-anti-affinity`, +`node-reschedule-penalty`, and `node-affinity` columns are combined to produce +the numbers listed in the `final score` column for each node. The Nomad +scheduler uses the final score for each node in deciding where to make +placements. + +## Next steps + +Experiment with the weight provided in the `affinity` stanza (the value can be +from -100 through 100) and observe how the final score given to each node +changes (use the `nomad alloc status` command as shown in the previous step). + +### Reference material + +- The [affinity][affinity-stanza] stanza documentation +- [Scheduling][scheduling] with Nomad + +[affinity-stanza]: /nomad/docs/job-specification/affinity +[alloc status]: /nomad/commands/alloc/status +[attributes]: /nomad/docs/reference/runtime-variable-interpolation#node-attributes +[constraint]: /nomad/docs/job-specification/constraint +[client-metadata]: /nomad/docs/configuration/client#meta +[node-status]: /nomad/commands/node/status +[scheduling]: /nomad/docs/concepts/scheduling/how-scheduling-works +[verbose]: /nomad/commands/alloc/status#verbose +[weight]: /nomad/docs/job-specification/affinity#weight +[repository]: https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud diff --git a/website/content/docs/job-scheduling/index.mdx b/website/content/docs/job-scheduling/index.mdx new file mode 100644 index 000000000..922967e6b --- /dev/null +++ b/website/content/docs/job-scheduling/index.mdx @@ -0,0 +1,33 @@ +--- +layout: docs +page_title: Advanced job scheduling +description: >- + Discover the advanced scheduling features available to Nomad jobs including affinity and spread. 
+--- + +# Advanced job scheduling + +The Nomad [scheduler][scheduling] uses a bin-packing algorithm to optimize the +resource utilization and density of applications in your Nomad cluster. Nomad +0.9 adds new features to allow operators more fine-grained control over +allocation placement. This enables use cases similar to the following: + +- Expressing preference for a certain class of nodes for a specific application + via the [affinity stanza][affinity-stanza]. + +- Spreading allocations across a datacenter, rack or any other node attribute or + metadata with the [spread stanza][spread-stanza]. + +Please refer to the tutorials below for using affinity and spread in Nomad 0.9. + +- [Preemption][preemption-guide] +- [Affinity][affinity-guide] +- [Spread][spread-guide] + +[affinity-guide]: /nomad/docs/job-scheduling/affinity +[affinity-stanza]: /nomad/docs/job-specification/affinity +[preemption-guide]: /nomad/docs/job-scheduling/preemption +[preemption-stanza]: /nomad/docs/job-specification/spread +[scheduling]: /nomad/docs/concepts/scheduling/how-scheduling-works +[spread-guide]: /nomad/docs/job-scheduling/spread +[spread-stanza]: /nomad/docs/job-specification/spread diff --git a/website/content/docs/job-scheduling/preemption.mdx b/website/content/docs/job-scheduling/preemption.mdx new file mode 100644 index 000000000..24dffc0f4 --- /dev/null +++ b/website/content/docs/job-scheduling/preemption.mdx @@ -0,0 +1,502 @@ +--- +layout: docs +page_title: Use preemption for job priority +description: >- + Deploy a low priority job and a high priority job. Then use preemption + to run the higher priority job even when other jobs are running. +--- + +# Use preemption for job priority + +Preemption allows Nomad to evict running allocations to place allocations of a +higher priority. Allocations of a job that are blocked temporarily go into +"pending" status until the cluster has additional capacity to run them. This is +useful when operators need to run relatively higher priority tasks sooner even +under resource contention across the cluster. + +Nomad v0.9.0 added Preemption for [system][system-job] jobs. Nomad v0.9.3 +[Enterprise][enterprise] added preemption for [service][service-job] and +[batch][batch-job] jobs. Nomad v0.12.0 made preemption an open source feature +for all three job types. + +Preemption is enabled by default for system jobs. It can be enabled for service +and batch jobs by sending a [payload][payload-preemption-config] with the +appropriate options specified to the [scheduler configuration][update-scheduler] +API endpoint. + +### Prerequisites + +To perform the tasks described in this guide, you need to have a Nomad +environment with Consul installed. You can use this [repository] to provision a +sandbox environment; however, you need to use Nomad v0.12.0 or higher or Nomad +Enterprise v0.9.3 or higher. + +You need a cluster with one server node and three client nodes. To simulate +resource contention, the nodes in this environment each have 1 GB RAM (For AWS, +you can choose the [t2.micro][t2-micro] instance type). + + + + This tutorial is for demo purposes and is only using a +single server node. Three or five server nodes are recommended for a +production cluster. + + + +## Create a job with low priority + +Start by creating a job with relatively lower priority into your Nomad cluster. +One of the allocations from this job will be preempted in a subsequent +deployment when there is a resource contention in the cluster. 
Copy the +following job into a file and name it `webserver.nomad.hcl`. + +```hcl +job "webserver" { + datacenters = ["dc1"] + type = "service" + priority = 40 + + group "webserver" { + count = 3 + network { + port "http" { + to = 80 + } + } + + service { + name = "apache-webserver" + port = "http" + + check { + name = "alive" + type = "http" + path = "/" + interval = "10s" + timeout = "2s" + } + } + + task "apache" { + driver = "docker" + + config { + image = "httpd:latest" + ports = ["http"] + } + + resources { + memory = 600 + } + } + } +} +``` + +Note that the [count][count] is 3 and that each allocation is specifying 600 MB +of [memory][memory]. Remember that each node only has 1 GB of RAM. + +## Run the low priority job + +Use the [`nomad job run` command][] to start the `webserver.nomad.hcl` job. + +```shell-session +$ nomad job run webserver.nomad.hcl +==> Monitoring evaluation "35159f00" + Evaluation triggered by job "webserver" + Evaluation within deployment: "278b2e10" + Allocation "0850e103" created: node "cf8487e2", group "webserver" + Allocation "551a7283" created: node "ad10ba3b", group "webserver" + Allocation "8a3d7e1e" created: node "18997de9", group "webserver" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "35159f00" finished with status "complete" +``` + +Check the status of the `webserver` job using the [`nomad job status` command][] +at this point and verify that an allocation has been placed on each client node +in the cluster. + +```shell-session +$ nomad job status webserver +ID = webserver +Name = webserver +Submit Date = 2021-02-11T19:18:29-05:00 +Type = service +Priority = 40 +... +Allocations +ID Node ID Task Group Version Desired Status Created Modified +0850e103 cf8487e2 webserver 0 run running 8s ago 2s ago +551a7283 ad10ba3b webserver 0 run running 8s ago 2s ago +8a3d7e1e 18997de9 webserver 0 run running 8s ago 1s ago +``` + +## Create a job with high priority + +Create another job with a [priority] greater than the "webserver" job. Copy the +following into a file named `redis.nomad.hcl`. + +```hcl +job "redis" { + datacenters = ["dc1"] + type = "service" + priority = 80 + + group "cache1" { + count = 1 + + network { + port "db" { + to = 6379 + } + } + + service { + name = "redis-cache" + port = "db" + + check { + name = "alive" + type = "tcp" + interval = "10s" + timeout = "2s" + } + } + + task "redis" { + driver = "docker" + + config { + image = "redis:latest" + ports = ["db"] + } + + resources { + memory = 700 + } + } + } +} +``` + +Note that this job has a priority of 80 (greater than the priority of the +`webserver` job from earlier) and requires 700 MB of memory. This allocation +will create a resource contention in the cluster since each node only has 1 GB +of memory with a 600 MB allocation already placed on it. + +## Observe a run before and after enabling preemption + +### Try to run `redis.nomad.hcl` + +Remember that preemption for service and batch jobs is [not enabled by +default][preemption-config]. This means that the `redis` job will be queued due +to resource contention in the cluster. You can verify the resource contention +before actually registering your job by running the [`nomad job plan` command][]. + +```shell-session +$ nomad job plan redis.nomad.hcl ++ Job: "redis" ++ Task Group: "cache1" (1 create) + + Task: "redis" (forces create) + +Scheduler dry-run: +- WARNING: Failed to place all allocations. 
+ Task Group "cache1" (failed to place 1 allocation): + * Resources exhausted on 3 nodes + * Dimension "memory" exhausted on 3 nodes +... +``` + +Run the `redis.nomad.hcl` job with the [`nomad job run` command][]. Observe that the +allocation was queued. + +```shell-session +$ nomad job run redis.nomad.hcl +==> Monitoring evaluation "3c6593b4" + Evaluation triggered by job "redis" + Evaluation within deployment: "ae55a4aa" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "3c6593b4" finished with status "complete" but failed to place all allocations: + Task Group "cache1" (failed to place 1 allocation): + * Resources exhausted on 3 nodes + * Dimension "memory" exhausted on 3 nodes + Evaluation "249fd21b" waiting for additional capacity to place remainder +``` + +You can also verify that the allocation has been queued by fetching the status +of the job using the [`nomad job status` command][]. + +```shell-session +$ nomad job status redis +ID = redis +Name = redis +Submit Date = 2021-02-11T19:22:55-05:00 +Type = service +Priority = 80 +... +Placement Failure +Task Group "cache1": + * Resources exhausted on 3 nodes + * Dimension "memory" exhausted on 3 nodes +... +Allocations +No allocations placed +``` + +Stop the `redis` job for now. In the next steps, you will enable service job +preemption and re-deploy. Use the [`nomad job stop` command][] with the `-purge` +flag set. + +```shell-session +$ nomad job stop -purge redis +==> Monitoring evaluation "a9c9945d" + Evaluation triggered by job "redis" + Evaluation within deployment: "ae55a4aa" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "a9c9945d" finished with status "complete" +``` + +### Enable service job preemption + +Get the current [scheduler configuration][scheduler-configuration] using the +Nomad API. Setting an environment variable with your cluster address makes the +`curl` commands more reusable. Substitute in the proper address for your Nomad +cluster. + +```shell-session +$ export NOMAD_ADDR=http://127.0.0.1:4646 +``` + +If you are enabling preemption in an ACL-enabled Nomad cluster, you will also +need to [authenticate to the API][api-auth] with a Nomad token via the +`X-Nomad-Token` header. In this case, you can use an environment variable to add +the header option and your token value to the command. If you don't use tokens, +skip this step. The `curl` commands will run correctly when the variable is +unset. + +```shell-session +$ export NOMAD_AUTH='--header "X-Nomad-Token: «replace with your token»"' +``` + +Now, fetch the configuration with the following `curl` command. + +```shell-session +$ curl --silent ${NOMAD_AUTH} \ + ${NOMAD_ADDR}/v1/operator/scheduler/configuration?pretty +``` + +```json +{ + "SchedulerConfig": { + "SchedulerAlgorithm": "binpack", + "PreemptionConfig": { + "SystemSchedulerEnabled": true, + "BatchSchedulerEnabled": false, + "ServiceSchedulerEnabled": false + }, + "CreateIndex": 5, + "ModifyIndex": 5 + }, + "Index": 5, + "LastContact": 0, + "KnownLeader": true +} +``` + +Note that [BatchSchedulerEnabled][batch-enabled] and +[ServiceSchedulerEnabled][service-enabled] are both set to `false` by default. +Since you are preempting service jobs in this guide, you need to set +`ServiceSchedulerEnabled` to `true`. Do this by directly interacting +with the [API][update-scheduler].
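+
+Depending on your Nomad version, you may also be able to make this change from
+the command line with the `nomad operator scheduler set-config` command instead
+of calling the API directly. The flag shown below is an illustration, so check
+`nomad operator scheduler set-config -h` for the exact flag names available in
+your version; the rest of this guide uses the HTTP API.
+
+```shell-session
+$ nomad operator scheduler set-config -preempt-service-scheduler=true
+```
+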
+ +Create the following JSON payload and place it in a file named `scheduler.json`: + +```json +{ + "PreemptionConfig": { + "SystemSchedulerEnabled": true, + "BatchSchedulerEnabled": false, + "ServiceSchedulerEnabled": true + } +} +``` + +Note that [ServiceSchedulerEnabled][service-enabled] has been set to `true`. + +Run the following command to update the scheduler configuration: + +```shell-session +$ curl --silent ${NOMAD_AUTH} \ + --request POST --data @scheduler.json \ + ${NOMAD_ADDR}/v1/operator/scheduler/configuration +``` + +You should now be able to inspect the scheduler configuration again and verify +that preemption has been enabled for service jobs (output below is abbreviated): + +```shell-session +$ curl --silent ${NOMAD_AUTH} \ + ${NOMAD_ADDR}/v1/operator/scheduler/configuration?pretty +``` + +```plaintext +... + "PreemptionConfig": { + "SystemSchedulerEnabled": true, + "BatchSchedulerEnabled": false, + "ServiceSchedulerEnabled": true + }, +... +``` + +### Try running the redis job again + +Now that you have enabled preemption on service jobs, deploying your `redis` job +should evict one of the lower priority `webserver` allocations and place it into +a queue. You can run `nomad plan` to output a preview of what will happen: + +```shell-session +$ nomad job plan redis.nomad.hcl ++ Job: "redis" ++ Task Group: "cache1" (1 create) + + Task: "redis" (forces create) + +Scheduler dry-run: +- All tasks successfully allocated. + +Preemptions: + +Alloc ID Job ID Task Group +8a3d7e1e-40ee-f731-5135-247d8b7c2901 webserver webserver +... +``` + +The preceding plan output shows that one of the `webserver` allocations will be +evicted in order to place the requested `redis` instance. + +Now use the [`nomad job run` command][] to run the `redis.nomad.hcl` job file. + +```shell-session +$ nomad job run redis.nomad.hcl +==> Monitoring evaluation "fef3654f" + Evaluation triggered by job "redis" + Evaluation within deployment: "37b37a63" + Allocation "6ecc4bbe" created: node "cf8487e2", group "cache1" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "fef3654f" finished with status "complete" +``` + +Run the [`nomad job status` command][] on the `webserver` job to verify one of +the allocations has been evicted. + +```shell-session +$ nomad job status webserver +ID = webserver +Name = webserver +Submit Date = 2021-02-11T19:18:29-05:00 +Type = service +Priority = 40 +... +Summary +Task Group Queued Starting Running Failed Complete Lost +webserver 1 0 2 0 1 0 + +Placement Failure +Task Group "webserver": + * Resources exhausted on 3 nodes + * Dimension "memory" exhausted on 3 nodes +... + +Allocations +ID Node ID Task Group Version Desired Status Created Modified +0850e103 cf8487e2 webserver 0 evict complete 21m17s ago 18s ago +551a7283 ad10ba3b webserver 0 run running 21m17s ago 20m55s ago +8a3d7e1e 18997de9 webserver 0 run running 21m17s ago 20m57s ago +``` + +### Stop the job + +Use the [`nomad job stop` command][] on the `redis` job. This will provide the +capacity necessary to unblock the third `webserver` allocation. + +```shell-session +$ nomad job stop redis +==> Monitoring evaluation "df368cb1" + Evaluation triggered by job "redis" + Evaluation within deployment: "37b37a63" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "df368cb1" finished with status "complete" +``` + +Run the [`nomad job status` command][] on the `webserver` job. The output should +now indicate that a new third allocation was created to replace the one that was +preempted. 
+ +```shell-session +$ nomad job status webserver +ID = webserver +Name = webserver +Submit Date = 2021-02-11T19:18:29-05:00 +Type = service +Priority = 40 +Datacenters = dc1 +Namespace = default +Status = running +Periodic = false +Parameterized = false + +Summary +Task Group Queued Starting Running Failed Complete Lost +webserver 0 0 3 0 1 0 + +Latest Deployment +ID = 278b2e10 +Status = successful +Description = Deployment completed successfully + +Deployed +Task Group Desired Placed Healthy Unhealthy Progress Deadline +webserver 3 3 3 0 2021-02-12T00:28:51Z + +Allocations +ID Node ID Task Group Version Desired Status Created Modified +4e212aec cf8487e2 webserver 0 run running 21s ago 3s ago +0850e103 cf8487e2 webserver 0 evict complete 22m48s ago 1m49s ago +551a7283 ad10ba3b webserver 0 run running 22m48s ago 22m26s ago +8a3d7e1e 18997de9 webserver 0 run running 22m48s ago 22m28s ago +``` + +## Next steps + +The process you learned in this tutorial can also be applied to +[batch][batch-enabled] jobs. Read more about preemption in the +[Nomad documentation][preemption]. + +### Reference material + +- [Preemption][preemption] + +[batch-enabled]: /nomad/api-docs/operator/scheduler#batchschedulerenabled-1 +[batch-job]: /nomad/docs/concepts/scheduling/schedulers#batch +[count]: /nomad/docs/job-specification/group#count +[enterprise]: /nomad/docs/enterprise +[memory]: /nomad/docs/job-specification/resources#memory +[payload-preemption-config]: /nomad/api-docs/operator/scheduler#preemptionconfig-1 +[preemption-config]: /nomad/api-docs/operator/scheduler#preemptionconfig-1 +[preemption]: /nomad/docs/concepts/scheduling/preemption +[priority]: /nomad/docs/job-specification/job#priority +[repository]: https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud +[scheduler-configuration]: /nomad/api-docs/operator/scheduler#read-scheduler-configuration +[service-enabled]: /nomad/api-docs/operator/scheduler#serviceschedulerenabled-1 +[service-job]: /nomad/docs/concepts/scheduling/schedulers#service +[step-1]: #create-a-job-with-low-priority +[system-job]: /nomad/docs/concepts/scheduling/schedulers#system +[t2-micro]: https://aws.amazon.com/ec2/instance-types/ +[update-scheduler]: /nomad/api-docs/operator/scheduler#update-scheduler-configuration +[`nomad job plan` command]: /nomad/commands/job/plan +[`nomad job run` command]: /nomad/commands/job/run +[`nomad job status` command]: /nomad/commands/job/status +[`nomad job stop` command]: /nomad/commands/job/stop +[api-auth]: /nomad/api-docs/#authentication diff --git a/website/content/docs/job-scheduling/spread.mdx b/website/content/docs/job-scheduling/spread.mdx new file mode 100644 index 000000000..6ca3c5091 --- /dev/null +++ b/website/content/docs/job-scheduling/spread.mdx @@ -0,0 +1,277 @@ +--- +layout: docs +page_title: Use spread to increase failure tolerance +description: >- + Create a job with the spread stanza to prevent application downtime + as a result of a physical domain failure in a datacenter or rack. +--- + +# Use spread to increase failure tolerance + +The Nomad scheduler uses a bin-packing algorithm when making job placements on +nodes to optimize resource utilization and density of applications. Although bin +packing ensures optimal resource utilization, it can lead to some nodes carrying +a majority of allocations for a given job. This can cause cascading failures +where the failure of a single node or a single datacenter can lead to +application unavailability.
+ +The [spread stanza][spread-stanza] solves this problem by allowing operators to +distribute their workloads in a customized way based on [attributes] and/or +[client metadata][client-metadata]. By using spread criteria in their job +specification, Nomad job operators can ensure that failures across a domain such +as datacenter or rack don't affect application availability. + +Consider a Nomad application that needs to be deployed to multiple datacenters +within a region. Datacenter `dc1` has four nodes while `dc2` has one node. This +application has ten instances, and seven of them must be deployed to `dc1` since +it receives more user traffic and you need to make sure the application doesn't +suffer downtime from having too few running instances to process requests. The +remaining three allocations can be deployed to `dc2`. + +Use the `spread` stanza in the Nomad [job specification][job-specification] to +ensure that 70% of the workload is placed in datacenter `dc1` and 30% is +placed in `dc2`. The Nomad operator can use the [percent] option with a +[target] to customize the spread. + +### Prerequisites + +To perform the tasks described in this guide, you need to have a Nomad +environment with Consul installed. You can use this [repository] to provision a +sandbox environment. This guide assumes a cluster with one server node and +five client nodes. + + + + This guide is for demo purposes and is only using a single +server node. In a production cluster, three or five server nodes are recommended. + + + +## Place one of the client nodes in a different datacenter + +In this guide, you are going to customize the spread for job placement +between the datacenters your nodes are located in. Choose one of your client +nodes and edit `/etc/nomad.d/nomad.hcl` to change its location to `dc2`. A +snippet of an example configuration file with the required change +is shown below. + +```hcl +data_dir = "/opt/nomad/data" +bind_addr = "0.0.0.0" +datacenter = "dc2" + +# Enable the client +client { + enabled = true +# ... +} +``` + +After making the change on your chosen client node, restart the Nomad service. + +```shell-session +$ sudo systemctl restart nomad +``` + +If everything is configured correctly, you should be able to run the [`nomad node status`][node-status] command and confirm that one of your nodes is now in +datacenter `dc2`.
+ +```shell-session +$ nomad node status +ID DC Name Class Drain Eligibility Status +5d16d949 dc2 ip-172-31-62-240 false eligible ready +7b381152 dc1 ip-172-31-59-115 false eligible ready +10cc48cc dc1 ip-172-31-58-46 false eligible ready +93f1e628 dc1 ip-172-31-58-113 false eligible ready +12894b80 dc1 ip-172-31-62-90 false eligible ready +``` + +## Create a job with the spread stanza + +Create a file with the name `redis.nomad.hcl` and place the following content in it: + +```hcl +job "redis" { + datacenters = ["dc1", "dc2"] + type = "service" + + spread { + attribute = "${node.datacenter}" + weight = 100 + + target "dc1" { + percent = 70 + } + + target "dc2" { + percent = 30 + } + } + + group "cache1" { + count = 10 + + network { + port "db" { + to = 6379 + } + } + + task "redis" { + driver = "docker" + + config { + image = "redis:latest" + + ports = ["db"] + } + + service { + name = "redis-cache" + port = "db" + + check { + name = "alive" + type = "tcp" + interval = "10s" + timeout = "2s" + } + } + } + } +} +``` + +Note that the job specifies the `spread` stanza and uses the +[datacenter][attributes] attribute while targeting `dc1` and `dc2` with the +percent options. This tells the Nomad scheduler to attempt to +distribute 70% of the workload on `dc1` and 30% of the workload on `dc2`. + +## Register the redis.nomad.hcl job + +Run the Nomad job with the following command: + +```shell-session +$ nomad run redis.nomad.hcl +==> Monitoring evaluation "c3dc5ebd" + Evaluation triggered by job "redis" + Allocation "7a374183" created: node "5d16d949", group "cache1" + Allocation "f4361df1" created: node "7b381152", group "cache1" + Allocation "f7af42dc" created: node "5d16d949", group "cache1" + Allocation "0638edf2" created: node "10cc48cc", group "cache1" + Allocation "49bc6038" created: node "12894b80", group "cache1" + Allocation "c7e5679a" created: node "5d16d949", group "cache1" + Allocation "cf91bf65" created: node "7b381152", group "cache1" + Allocation "d16b606c" created: node "12894b80", group "cache1" + Allocation "27866df0" created: node "93f1e628", group "cache1" + Allocation "8531a6fc" created: node "7b381152", group "cache1" + Evaluation status changed: "pending" -> "complete" +``` + +Note that three of the ten allocations have been placed on node `5d16d949`. This +is the node configured to be in datacenter `dc2`. The Nomad scheduler has +distributed 30% of the workload to `dc2` as specified in the `spread` stanza. + +Keep in mind that the Nomad scheduler still factors other components into the +overall scoring of nodes when making placements, so you should not expect the +spread stanza to strictly implement your distribution preferences like a +[constraint][constraint-stanza]. Now, take a detailed look at the scoring in +the next few steps. + +## Check the status of the job + +Check the status of the job and verify where allocations have been placed. Run +the following command: + +```shell-session +$ nomad status redis +``` + +The output should list ten running instances of your job in the `Summary` +section as shown below: + +```plaintext +...
+Summary +Task Group Queued Starting Running Failed Complete Lost +cache1 0 0 10 0 0 0 + +Allocations +ID Node ID Task Group Version Desired Status Created Modified +0638edf2 10cc48cc cache1 0 run running 2m20s ago 2m ago +27866df0 93f1e628 cache1 0 run running 2m20s ago 1m57s ago +49bc6038 12894b80 cache1 0 run running 2m20s ago 1m58s ago +7a374183 5d16d949 cache1 0 run running 2m20s ago 2m1s ago +8531a6fc 7b381152 cache1 0 run running 2m20s ago 2m2s ago +c7e5679a 5d16d949 cache1 0 run running 2m20s ago 1m55s ago +cf91bf65 7b381152 cache1 0 run running 2m20s ago 1m57s ago +d16b606c 12894b80 cache1 0 run running 2m20s ago 2m1s ago +f4361df1 7b381152 cache1 0 run running 2m20s ago 2m3s ago +f7af42dc 5d16d949 cache1 0 run running 2m20s ago 1m54s ago +``` + +You can cross-check this output with the results of the `nomad node status` +command to verify that 30% of your workload has been placed on the node in `dc2` +(in our case, that node is `5d16d949`). + +## Obtain detailed scoring information on job placement + +The Nomad scheduler will not always spread your workload in the way you have +specified in the `spread` stanza even if the resources are available. This is +because spread scoring is factored in with other metrics as well before making a +scheduling decision. In this step, you will take a look at some of those other +factors. + +Using the output from the previous step, take any allocation that has been +placed on a node and use the nomad [alloc status][alloc status] command with the +[verbose][verbose] option to obtain detailed scoring information on it. In this +example, the guide refers to allocation ID `0638edf2`—your allocation IDs will +be different. + +```shell-session +$ nomad alloc status -verbose 0638edf2 +``` + +The resulting output will show the `Placement Metrics` section at the bottom. + +```plaintext +... +Placement Metrics +Node node-affinity allocation-spread binpack job-anti-affinity node-reschedule-penalty final score +10cc48cc-2913-af54-74d5-d7559f373ff2 0 0.429 0.33 0 0 0.379 +93f1e628-e509-b1ab-05b7-0944056f781d 0 0.429 0.515 -0.2 0 0.248 +12894b80-4943-4d5c-5716-c626c6b99be3 0 0.429 0.515 -0.2 0 0.248 +7b381152-3802-258b-4155-6d7dfb344dd4 0 0.429 0.515 -0.2 0 0.248 +5d16d949-85aa-3fd3-b5f4-51094cbeb77a 0 0.333 0.515 -0.2 0 0.216 +``` + +Note that the results from the `allocation-spread`, `binpack`, +`job-anti-affinity`, `node-reschedule-penalty`, and `node-affinity` columns are +combined to produce the numbers listed in the `final score` column for each +node. The Nomad scheduler uses the final score for each node in deciding where +to make placements. + +## Next steps + +Change the values of the `percent` options on your targets in the `spread` +stanza and observe how the placement behavior along with the final score given +to each node changes (use the `nomad alloc status` command as shown in the +previous step). 
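+
+For example, assuming the same two datacenters used in this guide, an even
+split is one variation you could try; on the next deployment it should shift
+roughly half of the allocations toward `dc2`:
+
+```hcl
+spread {
+  attribute = "${node.datacenter}"
+  weight    = 100
+
+  # Even split instead of the 70/30 split used earlier in this guide.
+  target "dc1" {
+    percent = 50
+  }
+
+  target "dc2" {
+    percent = 50
+  }
+}
+```
+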
+ +### Reference material + +- The [spread][spread-stanza] stanza documentation +- [Scheduling][scheduling] with Nomad + +[alloc status]: /nomad/commands/alloc/status +[attributes]: /nomad/docs/reference/runtime-variable-interpolation +[client-metadata]: /nomad/docs/configuration/client#meta +[constraint-stanza]: /nomad/docs/job-specification/constraint +[job-specification]: /nomad/docs/job-specification +[node-status]: /nomad/commands/node/status +[percent]: /nomad/docs/job-specification/spread#percent +[repository]: https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud +[scheduling]: /nomad/docs/concepts/scheduling/how-scheduling-works +[spread-stanza]: /nomad/docs/job-specification/spread +[target]: /nomad/docs/job-specification/spread#target +[verbose]: /nomad/commands/alloc/status#verbose diff --git a/website/content/docs/job-specification/affinity.mdx b/website/content/docs/job-specification/affinity.mdx index 2d0a921b5..b51f2fc31 100644 --- a/website/content/docs/job-specification/affinity.mdx +++ b/website/content/docs/job-specification/affinity.mdx @@ -68,7 +68,7 @@ allocations. - `attribute` `(string: "")` - Specifies the name or reference of the attribute to examine for the affinity. This can be any of the [Nomad interpolated - values](/nomad/docs/runtime/interpolation#interpreted_node_vars). + values](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars). - `operator` `(string: "=")` - Specifies the comparison operator. The ordering is compared lexically. Possible values include: @@ -92,7 +92,7 @@ allocations. - `value` `(string: "")` - Specifies the value to compare the attribute against using the specified operation. This can be a literal value, another attribute, or any [Nomad interpolated - values](/nomad/docs/runtime/interpolation#interpreted_node_vars). + values](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars). - `weight` `(integer: 50)` - Specifies a weight for the affinity. The weight is used during scoring and must be an integer between -100 to 100. Negative weights act as @@ -274,6 +274,6 @@ The placement score is affected by the following factors: [group]: /nomad/docs/job-specification/group [client-meta]: /nomad/docs/configuration/client#meta [task]: /nomad/docs/job-specification/task -[interpolation]: /nomad/docs/runtime/interpolation -[node-variables]: /nomad/docs/runtime/interpolation#node-variables +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation +[node-variables]: /nomad/docs/reference/runtime-variable-interpolation#node-variables [constraint]: /nomad/docs/job-specification/constraint diff --git a/website/content/docs/job-specification/artifact.mdx b/website/content/docs/job-specification/artifact.mdx index 4ab868e67..211e600ca 100644 --- a/website/content/docs/job-specification/artifact.mdx +++ b/website/content/docs/job-specification/artifact.mdx @@ -136,7 +136,7 @@ artifact { To download from a private repo, sshkey needs to be set. The key must be base64-encoded string. On Linux, you can run `base64 -w0 ` to encode the -file. Or use [HCL2](/nomad/docs/job-specification/hcl2) +file. 
Or use [HCL2](/nomad/docs/reference/hcl2) expressions to read and encode the key from a file on your machine: ```hcl @@ -282,7 +282,7 @@ artifact { [s3-bucket-addr]: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html#access-bucket-intro 'Amazon S3 Bucket Addressing' [s3-region-endpoints]: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region 'Amazon S3 Region Endpoints' [iam-instance-profiles]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html 'EC2 IAM instance profiles' -[task's working directory]: /nomad/docs/runtime/environment#task-directories 'Task Directories' +[task's working directory]: /nomad/docs/reference/runtime-environment-settings#task-directories 'Task Directories' [task_user]: /nomad/docs/job-specification/task#user [filesystem internals]: /nomad/docs/concepts/filesystem#templates-artifacts-and-dispatch-payloads [do_spaces]: https://www.digitalocean.com/products/spaces diff --git a/website/content/docs/job-specification/connect.mdx b/website/content/docs/job-specification/connect.mdx index 31cbb14e0..df7c21195 100644 --- a/website/content/docs/job-specification/connect.mdx +++ b/website/content/docs/job-specification/connect.mdx @@ -10,7 +10,7 @@ description: |- The `connect` block allows configuring various options for -[Consul Connect](/nomad/docs/integrations/consul-connect). It is +[Consul Connect](/nomad/docs/networking/consul). It is valid only within the context of a service definition at the task group level. For using `connect` when Consul ACLs are enabled, be sure to read through the [Secure Nomad Jobs with Consul Connect](/nomad/tutorials/integrate-consul/consul-service-mesh) @@ -246,7 +246,7 @@ job "ingress-demo" { [group]: /nomad/docs/job-specification/group "Nomad group Job Specification" -[interpolation]: /nomad/docs/runtime/interpolation "Nomad interpolation" +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation "Nomad interpolation" [job]: /nomad/docs/job-specification/job "Nomad job Job Specification" diff --git a/website/content/docs/job-specification/constraint.mdx b/website/content/docs/job-specification/constraint.mdx index 3622c9f1a..548c1b58c 100644 --- a/website/content/docs/job-specification/constraint.mdx +++ b/website/content/docs/job-specification/constraint.mdx @@ -69,7 +69,7 @@ allocations. - `attribute` `(string: "")` - Specifies the name or reference of the attribute to examine for the constraint. This can be any of the [Nomad interpolated - values](/nomad/docs/runtime/interpolation#interpreted_node_vars). + values](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars). - `operator` `(string: "=")` - Specifies the comparison operator. If the operator is one of `>, >=, <, <=`, the ordering is compared numerically if the operands @@ -99,7 +99,7 @@ allocations. - `value` `(string: "")` - Specifies the value to compare the attribute against using the specified operation. This can be a literal value, another attribute, or any [Nomad interpolated - values](/nomad/docs/runtime/interpolation#interpreted_node_vars). + values](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars). 
### `operator` values @@ -319,7 +319,7 @@ constraint { [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' [client-meta]: /nomad/docs/configuration/client#meta 'Nomad meta Job Specification' [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' -[node-variables]: /nomad/docs/runtime/interpolation#node-variables- 'Nomad interpolation-Node variables' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' +[node-variables]: /nomad/docs/reference/runtime-variable-interpolation#node-variables- 'Nomad interpolation-Node variables' [client-meta]: /nomad/docs/configuration/client#custom-metadata-and-node-class 'Nomad Custom Metadata and Node Class' [semver2]: https://semver.org/spec/v2.0.0.html 'Semantic Versioning 2.0' diff --git a/website/content/docs/job-specification/consul.mdx b/website/content/docs/job-specification/consul.mdx index cdd835067..2279e0bac 100644 --- a/website/content/docs/job-specification/consul.mdx +++ b/website/content/docs/job-specification/consul.mdx @@ -281,16 +281,16 @@ job "docs" { [`identity`]: /nomad/docs/job-specification/identity [`consul.service_identity`]: /nomad/docs/configuration/consul#service_identity [`consul.token`]: /nomad/docs/configuration/consul#token -[Configuring Consul Authentication]: /nomad/docs/integrations/consul/acl#configuring-consul-authentication -[Migrating to Using Workload Identity with Consul]: /nomad/docs/integrations/consul/acl#migrating-to-using-workload-identity-with-consul +[Configuring Consul Authentication]: /nomad/docs/secure/acl/consul#configuring-consul-authentication +[Migrating to Using Workload Identity with Consul]: /nomad/docs/secure/acl/consul#migrating-to-using-workload-identity-with-consul [config_consul_namespace]: /nomad/docs/configuration/consul#namespace [Consul Namespaces]: /consul/docs/enterprise/namespaces [Consul Admin Partitions]: /consul/docs/enterprise/admin-partitions [template]: /nomad/docs/job-specification/template "Nomad template Job Specification" [`consul.name`]: /nomad/docs/configuration/consul#name -[flag_consul_namespace]: /nomad/docs/commands/job/run#consul-namespace +[flag_consul_namespace]: /nomad/commands/job/run#consul-namespace [Connect]: /nomad/docs/job-specification/connect [admin partition]: /consul/docs/enterprise/admin-partitions [agent configuration reference]: /consul/docs/agent/config/config-files#partition-1 [Consul Enterprise]: /consul/docs/enterprise -[Nomad Workload Identities]: /nomad/docs/integrations/consul/acl#nomad-workload-identities +[Nomad Workload Identities]: /nomad/docs/secure/acl/consul#nomad-workload-identities diff --git a/website/content/docs/job-specification/csi_plugin.mdx b/website/content/docs/job-specification/csi_plugin.mdx index e6f173fcc..3c0440549 100644 --- a/website/content/docs/job-specification/csi_plugin.mdx +++ b/website/content/docs/job-specification/csi_plugin.mdx @@ -137,5 +137,5 @@ job "plugin-efs" { [csi]: https://github.com/container-storage-interface/spec [csi_volumes]: /nomad/docs/job-specification/volume -[system]: /nomad/docs/schedulers#system -[`topology_request`]: /nomad/docs/commands/volume/create#topology_request +[system]: /nomad/docs/concepts/scheduling/schedulers#system +[`topology_request`]: /nomad/commands/volume/create#topology_request diff --git a/website/content/docs/job-specification/dispatch_payload.mdx b/website/content/docs/job-specification/dispatch_payload.mdx index 
1ae70337a..240a89804 100644 --- a/website/content/docs/job-specification/dispatch_payload.mdx +++ b/website/content/docs/job-specification/dispatch_payload.mdx @@ -44,5 +44,5 @@ dispatch_payload { } ``` -[localdir]: /nomad/docs/runtime/environment#local 'Task Local Directory' -[parameterized]: /nomad/docs/job-specification/parameterized 'Nomad parameterized Job Specification' +[localdir]: /nomad/docs/reference/runtime-environment-settings#local +[parameterized]: /nomad/docs/job-specification/parameterized diff --git a/website/content/docs/job-specification/env.mdx b/website/content/docs/job-specification/env.mdx index 72fc4909f..d60a9874f 100644 --- a/website/content/docs/job-specification/env.mdx +++ b/website/content/docs/job-specification/env.mdx @@ -2,7 +2,7 @@ layout: docs page_title: env block in the job specification description: |- - Configure environment variables in the `env` block of the Nomad job specification. + Configure environment variables in the `env` block of the Nomad job specification. --- # `env` block in the job specification @@ -80,5 +80,5 @@ Nomad also supports populating dynamic environment variables from data stored in HashiCorp Consul and Vault. To use this feature please see the documentation on the [`template` block][template-env]. -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' [template-env]: /nomad/docs/job-specification/template#environment-variables 'Nomad template Block' diff --git a/website/content/docs/job-specification/gateway.mdx b/website/content/docs/job-specification/gateway.mdx index 5f13d6931..911143247 100644 --- a/website/content/docs/job-specification/gateway.mdx +++ b/website/content/docs/job-specification/gateway.mdx @@ -2,7 +2,7 @@ layout: docs page_title: gateway block in the job specification description: |- - Configure a Consul mesh, ingress, or terminating gateway in the `gateway` block of the Nomad job specification. + Configure a Consul mesh, ingress, or terminating gateway in the `gateway` block of the Nomad job specification. 
--- # `gateway` block in the job specification @@ -729,7 +729,7 @@ job "countdash-mesh-two" { [proxy]: /nomad/docs/job-specification/gateway#proxy-parameters [linked-service]: /nomad/docs/job-specification/gateway#linked-service-parameters [listener]: /nomad/docs/job-specification/gateway#listener-parameters -[interpolation]: /nomad/docs/runtime/interpolation +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation [listener-service]: /nomad/docs/job-specification/gateway#listener-service-parameters [service-default]: /consul/docs/connect/config-entries/service-defaults [sidecar_task]: /nomad/docs/job-specification/sidecar_task diff --git a/website/content/docs/job-specification/group.mdx b/website/content/docs/job-specification/group.mdx index 6970191b6..77879a0ba 100644 --- a/website/content/docs/job-specification/group.mdx +++ b/website/content/docs/job-specification/group.mdx @@ -202,7 +202,7 @@ group "example" { [job]: /nomad/docs/job-specification/job 'Nomad job Job Specification' [constraint]: /nomad/docs/job-specification/constraint 'Nomad constraint Job Specification' [consul]: /nomad/docs/job-specification/consul -[consul_namespace]: /nomad/docs/commands/job/run#consul-namespace +[consul_namespace]: /nomad/commands/job/run#consul-namespace [spread]: /nomad/docs/job-specification/spread 'Nomad spread Job Specification' [affinity]: /nomad/docs/job-specification/affinity 'Nomad affinity Job Specification' [ephemeraldisk]: /nomad/docs/job-specification/ephemeral_disk 'Nomad ephemeral_disk Job Specification' @@ -215,7 +215,7 @@ group "example" { [disconnect]: /nomad/docs/job-specification/disconnect 'Nomad disconnect Job Specification' [restart]: /nomad/docs/job-specification/restart 'Nomad restart Job Specification' [service]: /nomad/docs/job-specification/service 'Nomad service Job Specification' -[service_discovery]: /nomad/docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery' +[service_discovery]: /nomad/docs/networking/service-discovery 'Nomad Service Discovery' [update]: /nomad/docs/job-specification/update 'Nomad update Job Specification' [vault]: /nomad/docs/job-specification/vault 'Nomad vault Job Specification' [volume]: /nomad/docs/job-specification/volume 'Nomad volume Job Specification' diff --git a/website/content/docs/job-specification/identity.mdx b/website/content/docs/job-specification/identity.mdx index bf8fe6d44..c339f647a 100644 --- a/website/content/docs/job-specification/identity.mdx +++ b/website/content/docs/job-specification/identity.mdx @@ -360,9 +360,9 @@ EOF [`template`]: /nomad/docs/job-specification/template [`vault.default_identity`]: /nomad/docs/configuration/vault#default_identity [`vault`]: /nomad/docs/job-specification/vault -[int_consul_wid]: /nomad/docs/integrations/consul/acl -[int_vault_wid]: /nomad/docs/integrations/vault/acl +[int_consul_wid]: /nomad/docs/secure/acl/consul +[int_vault_wid]: /nomad/docs/secure/vault/acl [taskapi]: /nomad/api-docs/task-api [taskuser]: /nomad/docs/job-specification/task#user "Nomad task Block" [windows]: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/ -[task working directory]: /nomad/docs/runtime/environment#task-directories 'Task Directories' +[task working directory]: /nomad/docs/reference/runtime-environment-settings#task-directories 'Task Directories' diff --git a/website/content/docs/job-specification/job.mdx b/website/content/docs/job-specification/job.mdx index ba208e2d8..5bf563da6 100644 --- a/website/content/docs/job-specification/job.mdx +++ 
b/website/content/docs/job-specification/job.mdx @@ -245,12 +245,12 @@ $ VAULT_TOKEN="..." nomad job run example.nomad.hcl [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' [meta]: /nomad/docs/job-specification/meta 'Nomad meta Job Specification' [migrate]: /nomad/docs/job-specification/migrate 'Nomad migrate Job Specification' -[namespace]: /nomad/tutorials/manage-clusters/namespaces +[namespace]: /nomad/docs/govern/namespaces [parameterized]: /nomad/docs/job-specification/parameterized 'Nomad parameterized Job Specification' [periodic]: /nomad/docs/job-specification/periodic 'Nomad periodic Job Specification' -[region]: /nomad/tutorials/manage-clusters/federation +[region]: //nomad/docs/deploy/clusters/federate-regions [reschedule]: /nomad/docs/job-specification/reschedule 'Nomad reschedule Job Specification' -[scheduler]: /nomad/docs/schedulers 'Nomad Scheduler Types' +[scheduler]: /nomad/docs/concepts/scheduling/schedulers 'Nomad Scheduler Types' [spread]: /nomad/docs/job-specification/spread 'Nomad spread Job Specification' [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification' [update]: /nomad/docs/job-specification/update 'Nomad update Job Specification' diff --git a/website/content/docs/job-specification/logs.mdx b/website/content/docs/job-specification/logs.mdx index ec41b9b67..8640b3862 100644 --- a/website/content/docs/job-specification/logs.mdx +++ b/website/content/docs/job-specification/logs.mdx @@ -93,6 +93,6 @@ logs { } ``` -[logs-command]: /nomad/docs/commands/alloc/logs 'Nomad logs command' -[`disable_log_collection`]: /nomad/docs/drivers/docker#disable_log_collection +[logs-command]: /nomad/commands/alloc/logs 'Nomad logs command' +[`disable_log_collection`]: /nomad/docs/deploy/task-driver/docker#disable_log_collection [ephemeral disk documentation]: /nomad/docs/job-specification/ephemeral_disk 'Nomad ephemeral disk Job Specification' diff --git a/website/content/docs/job-specification/meta.mdx b/website/content/docs/job-specification/meta.mdx index dd57b174f..159bbccb5 100644 --- a/website/content/docs/job-specification/meta.mdx +++ b/website/content/docs/job-specification/meta.mdx @@ -110,5 +110,5 @@ EOH [job]: /nomad/docs/job-specification/job 'Nomad job Job Specification' [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' -[env_meta]: /nomad/docs/runtime/environment#meta-block-for-arbitrary-configuration +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' +[env_meta]: /nomad/docs/reference/runtime-environment-settings#meta-block-for-arbitrary-configuration diff --git a/website/content/docs/job-specification/migrate.mdx b/website/content/docs/job-specification/migrate.mdx index ddf15a112..2e9574936 100644 --- a/website/content/docs/job-specification/migrate.mdx +++ b/website/content/docs/job-specification/migrate.mdx @@ -49,7 +49,7 @@ block for allocations on that node. The `migrate` block is for job authors to define how their services should be migrated, while the node drain deadline is for system operators to put hard limits on how long a drain may take. -See the [Workload Migration Guide](/nomad/tutorials/manage-clusters/node-drain) for details +See the [Workload Migration Guide](/nomad/docs/manage/migrate-workloads) for details on node draining. ## Parameters @@ -82,8 +82,8 @@ on node draining. 
[checks]: /nomad/docs/job-specification/service#check [count]: /nomad/docs/job-specification/group#count -[drain]: /nomad/docs/commands/node/drain -[deadline]: /nomad/docs/commands/node/drain#deadline +[drain]: /nomad/commands/node/drain +[deadline]: /nomad/commands/node/drain#deadline [replaces]: /nomad/docs/job-specification/disconnect#replace [`restart`]: /nomad/docs/job-specification/restart [reschedules]: /nomad/docs/job-specification/reschedule diff --git a/website/content/docs/job-specification/multiregion.mdx b/website/content/docs/job-specification/multiregion.mdx index 3269d121f..7c8255b23 100644 --- a/website/content/docs/job-specification/multiregion.mdx +++ b/website/content/docs/job-specification/multiregion.mdx @@ -277,13 +277,13 @@ group "worker" { } ``` -[federated regions]: /nomad/tutorials/manage-clusters/federation +[federated regions]: //nomad/docs/deploy/clusters/federate-regions [`update` block]: /nomad/docs/job-specification/update [update-auto-revert]: /nomad/docs/job-specification/update#auto_revert [examples]: #examples -[upgrade strategies]: /nomad/tutorials/job-updates -[`nomad deployment unblock`]: /nomad/docs/commands/deployment/unblock +[upgrade strategies]: /nomad/docs/job-declare/strategy/ +[`nomad deployment unblock`]: /nomad/commands/deployment/unblock [parameterized job]: /nomad/docs/job-specification/parameterized -[`job dispatch`]: /nomad/docs/commands/job/dispatch +[`job dispatch`]: /nomad/commands/job/dispatch [HTTP API]: /nomad/api-docs/jobs#dispatch-job [time zone]: /nomad/docs/job-specification/periodic#time_zone diff --git a/website/content/docs/job-specification/network.mdx b/website/content/docs/job-specification/network.mdx index cb50d7c4f..bee4e5041 100644 --- a/website/content/docs/job-specification/network.mdx +++ b/website/content/docs/job-specification/network.mdx @@ -39,7 +39,7 @@ job "docs" { When the `network` block is defined with `bridge` as the networking mode, all tasks in the task group share the same network namespace. This is a prerequisite for -[Consul Connect](/nomad/docs/integrations/consul-connect). Tasks running within a +[Consul Connect](/nomad/docs/networking/consul). Tasks running within a network namespace are not visible to applications outside the namespace on the same host. This allows [Connect][]-enabled applications to bind only to localhost within the shared network stack, and use the proxy for ingress and egress traffic. @@ -81,7 +81,7 @@ All other operating systems use the `host` networking mode. - `hostname` `(string: "")` - The hostname assigned to the network namespace. This is currently only supported using the [Docker driver][docker-driver] and when the [mode](#mode) is set to [`bridge`](#bridge). This parameter supports - [interpolation](/nomad/docs/runtime/interpolation). + [interpolation](/nomad/docs/reference/runtime-variable-interpolation). - `dns` ([DNSConfig](#dns-parameters): nil) - Sets the DNS configuration for the allocations. By default all task drivers will inherit @@ -133,7 +133,7 @@ The label of the port is just text - it has no special meaning to Nomad. - `searches` `(array: nil)` - Sets the search list for hostname lookup - `options` `(array: nil)` - Sets internal resolver variables. -These parameters support [interpolation](/nomad/docs/runtime/interpolation). +These parameters support [interpolation](/nomad/docs/reference/runtime-variable-interpolation). ## `cni` parameters @@ -141,7 +141,7 @@ These parameters support [interpolation](/nomad/docs/runtime/interpolation). 
These get turned into `CNI_ARGS` per the [CNI spec](https://www.cni.dev/docs/spec/#parameters). -These parameters support [interpolation](/nomad/docs/runtime/interpolation). +These parameters support [interpolation](/nomad/docs/reference/runtime-variable-interpolation). ## Examples @@ -364,7 +364,7 @@ network { variables are set for group network ports. [docs_networking_bridge]: /nomad/docs/networking#bridge-networking -[docker-driver]: /nomad/docs/drivers/docker 'Nomad Docker Driver' -[qemu-driver]: /nomad/docs/drivers/qemu 'Nomad QEMU Driver' +[docker-driver]: /nomad/docs/job-declare/task-driver/docker 'Nomad Docker Driver' +[qemu-driver]: /nomad/docs/job-declare/task-driver/qemu 'Nomad QEMU Driver' [connect]: /nomad/docs/job-specification/connect 'Nomad Consul Connect Integration' [`cni_path`]: /nomad/docs/configuration/client#cni_path diff --git a/website/content/docs/job-specification/parameterized.mdx b/website/content/docs/job-specification/parameterized.mdx index 41adc948f..6523cd647 100644 --- a/website/content/docs/job-specification/parameterized.mdx +++ b/website/content/docs/job-specification/parameterized.mdx @@ -213,9 +213,9 @@ $ nomad job periodic force sync/dispatch-1730972650-247c6e97 ``` [batch-type]: /nomad/docs/job-specification/job#type 'Batch scheduler type' -[dispatch command]: /nomad/docs/commands/job/dispatch 'Nomad Job Dispatch Command' +[dispatch command]: /nomad/commands/job/dispatch 'Nomad Job Dispatch Command' [resources]: /nomad/docs/job-specification/resources 'Nomad resources Job Specification' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad Runtime Interpolation' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad Runtime Interpolation' [dispatch_payload]: /nomad/docs/job-specification/dispatch_payload 'Nomad dispatch_payload Job Specification' [multiregion]: /nomad/docs/job-specification/multiregion#parameterized-dispatch [periodic]: /nomad/docs/job-specification/periodic diff --git a/website/content/docs/job-specification/proxy.mdx b/website/content/docs/job-specification/proxy.mdx index c1320fb75..4319dc850 100644 --- a/website/content/docs/job-specification/proxy.mdx +++ b/website/content/docs/job-specification/proxy.mdx @@ -97,11 +97,11 @@ sidecar_service { } ``` -[Consul Connect]: /nomad/docs/integrations/consul-connect +[Consul Connect]: /nomad/docs/networking/consul [job]: /nomad/docs/job-specification/job 'Nomad job Job Specification' [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification' -[runtime variable interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' +[runtime variable interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' [sidecar_service]: /nomad/docs/job-specification/sidecar_service 'Nomad sidecar service Specification' [upstreams]: /nomad/docs/job-specification/upstreams 'Nomad upstream config Specification' [expose]: /nomad/docs/job-specification/expose 'Nomad proxy expose configuration' diff --git a/website/content/docs/job-specification/resources.mdx b/website/content/docs/job-specification/resources.mdx index 104a3c5ae..396c4d0bc 100644 --- a/website/content/docs/job-specification/resources.mdx +++ b/website/content/docs/job-specification/resources.mdx @@ -160,10 +160,10 @@ resource utilization and considering the following suggestions: [api_sched_config]: /nomad/api-docs/operator/scheduler#update-scheduler-configuration [device]: 
/nomad/docs/job-specification/device 'Nomad device Job Specification' -[docker_cpu]: /nomad/docs/drivers/docker#cpu -[exec_cpu]: /nomad/docs/drivers/exec#cpu +[docker_cpu]: /nomad/docs/deploy/task-driver/docker#cpu +[exec_cpu]: /nomad/docs/deploy/task-driver/exec#cpu [np_sched_config]: /nomad/docs/other-specifications/node-pool#memory_oversubscription_enabled [quota_spec]: /nomad/docs/other-specifications/quota [numa]: /nomad/docs/job-specification/numa 'Nomad NUMA Job Specification' -[`secrets/`]: /nomad/docs/runtime/environment#secrets -[concepts-cpu]: /nomad/docs/concepts/cpu +[`secrets/`]: /nomad/docs/reference/runtime-environment-settings#secrets +[concepts-cpu]: /nomad/docs/architecture/cpu diff --git a/website/content/docs/job-specification/service.mdx b/website/content/docs/job-specification/service.mdx index 0a9df0bf9..0c06814a7 100644 --- a/website/content/docs/job-specification/service.mdx +++ b/website/content/docs/job-specification/service.mdx @@ -318,7 +318,7 @@ network { ### Using driver address mode -The [Docker](/nomad/docs/drivers/docker#network_mode) driver supports the `driver` +The [Docker](/nomad/docs/job-declare/task-driver/docker#network_mode) driver supports the `driver` setting for the `address_mode` parameter in both `service` and `check` blocks. The driver address mode allows advertising and health checking the IP and port assigned to a task by the driver. This way, if you're using a network plugin like @@ -430,7 +430,7 @@ directly since Nomad isn't managing any port assignments. ### IPv6 Docker containers -The [Docker](/nomad/docs/drivers/docker#advertise_ipv6_address) driver supports the +The [Docker](/nomad/docs/job-declare/task-driver/docker#advertise_ipv6_address) driver supports the `advertise_ipv6_address` parameter in its configuration. Services will automatically advertise the IPv6 address when `advertise_ipv6_address` @@ -536,10 +536,10 @@ advertise and check directly since Nomad isn't managing any port assignments. [check_restart_block]: /nomad/docs/job-specification/check_restart [consul_grpc]: /consul/api-docs/agent/check#grpc [consul_passfail]: /consul/docs/discovery/checks#success-failures-before-passing-critical -[service-discovery]: /nomad/docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad Runtime Interpolation' +[service-discovery]: /nomad/docs/networking/service-discovery 'Nomad Service Discovery' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad Runtime Interpolation' [network]: /nomad/docs/job-specification/network 'Nomad network Job Specification' -[qemu]: /nomad/docs/drivers/qemu 'Nomad QEMU Driver' +[qemu]: /nomad/docs/job-declare/task-driver/qemu 'Nomad QEMU Driver' [restart_block]: /nomad/docs/job-specification/restart 'restart block' [connect]: /nomad/docs/job-specification/connect 'Nomad Consul Connect Integration' [kind]: /consul/api-docs/agent/service#kind diff --git a/website/content/docs/job-specification/sidecar_service.mdx b/website/content/docs/job-specification/sidecar_service.mdx index 16c94979e..9c622ce6b 100644 --- a/website/content/docs/job-specification/sidecar_service.mdx +++ b/website/content/docs/job-specification/sidecar_service.mdx @@ -12,7 +12,7 @@ description: |- The `sidecar_service` block allows configuring various options for the sidecar proxy managed by Nomad for [Consul -Connect](/nomad/docs/integrations/consul-connect) integration. It is +Connect](/nomad/docs/networking/consul) integration. 
It is valid only within the context of a connect block. ```hcl @@ -89,6 +89,6 @@ The following example includes specifying upstreams and meta. [job]: /nomad/docs/job-specification/job 'Nomad job Job Specification' [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' [sidecar_service]: /nomad/docs/job-specification/sidecar_service 'Nomad sidecar service Specification' [proxy]: /nomad/docs/job-specification/proxy 'Nomad sidecar proxy config Specification' diff --git a/website/content/docs/job-specification/sidecar_task.mdx b/website/content/docs/job-specification/sidecar_task.mdx index 3cefcac05..b2ea0256f 100644 --- a/website/content/docs/job-specification/sidecar_task.mdx +++ b/website/content/docs/job-specification/sidecar_task.mdx @@ -12,7 +12,7 @@ description: |- The `sidecar_task` block allows configuring various options for the proxy sidecar or Connect gateway managed by Nomad for the [Consul -Connect](/nomad/docs/integrations/consul-connect) integration such as +Connect](/nomad/docs/networking/consul) integration such as resource requirements, kill timeouts and more as defined below. It is valid only within the context of a [`connect`][connect] block. @@ -179,7 +179,7 @@ The following example configures resources for the sidecar task and other config [connect]: /nomad/docs/job-specification/connect 'Nomad connect Job Specification' [gateway]: /nomad/docs/job-specification/gateway [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' [job]: /nomad/docs/job-specification/job 'Nomad job Job Specification' [logs]: /nomad/docs/job-specification/logs 'Nomad logs Job Specification' [resources]: /nomad/docs/job-specification/resources 'Nomad resources Job Specification' diff --git a/website/content/docs/job-specification/spread.mdx b/website/content/docs/job-specification/spread.mdx index ca34e3ae3..79d04f00b 100644 --- a/website/content/docs/job-specification/spread.mdx +++ b/website/content/docs/job-specification/spread.mdx @@ -78,7 +78,7 @@ allocations. - `attribute` `(string: "")` - Specifies the name or reference of the attribute to use. This can be any of the [Nomad interpolated - values](/nomad/docs/runtime/interpolation#interpreted_node_vars). + values](/nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars). - `target` ([target](#target-parameters): <required>) - Specifies one or more target percentages for each value of the `attribute` in the spread block. If this is omitted, @@ -88,7 +88,7 @@ allocations. during scoring and must be an integer between 0 to 100. Weights can be used when there is more than one spread or affinity block to express relative preference across them. -## Parameters +### Target parameters - `value` `(string:"")` - Specifies a target value of the attribute from a `spread` block. 
@@ -202,8 +202,8 @@ spread { [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' [client-meta]: /nomad/docs/configuration/client#meta 'Nomad meta Job Specification' [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' -[node-variables]: /nomad/docs/runtime/interpolation#node-variables- 'Nomad interpolation-Node variables' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' +[node-variables]: /nomad/docs/reference/runtime-variable-interpolation#node-variables- 'Nomad interpolation-Node variables' [constraint]: /nomad/docs/job-specification/constraint 'Nomad Constraint job Specification' -[Key Metrics]: /nomad/docs/operations/metrics-reference#key-metrics -[scheduler algorithm]: /nomad/docs/commands/operator/scheduler/set-config#scheduler-algorithm +[Key Metrics]: /nomad/docs/reference/metrics#key-metrics +[scheduler algorithm]: /nomad/commands/operator/scheduler/set-config#scheduler-algorithm diff --git a/website/content/docs/job-specification/task.mdx b/website/content/docs/job-specification/task.mdx index 3f681b913..3fe1b4da3 100644 --- a/website/content/docs/job-specification/task.mdx +++ b/website/content/docs/job-specification/task.mdx @@ -48,7 +48,7 @@ job "docs" { task to have access to dispatch payloads. - `driver` - Specifies the task driver that should be used to run the - task. See the [driver documentation](/nomad/docs/drivers) for what + task. See the [driver documentation](/nomad/docs/job-declare/task-driver) for what is available. Examples include `docker`, `qemu`, `java` and `exec`. - `env` ([Env][]: nil) - Specifies environment variables that will @@ -139,7 +139,7 @@ The following examples only show the `task` blocks. Remember that the This example defines a task that starts a Docker container as a service. Docker is just one of many drivers supported by Nomad. Read more about drivers in the -[Nomad drivers documentation](/nomad/docs/drivers). +[Nomad drivers documentation](/nomad/docs/job-declare/task-driver). 
```hcl task "server" { @@ -225,16 +225,16 @@ task "server" { [service]: /nomad/docs/job-specification/service 'Nomad service Job Specification' [vault]: /nomad/docs/job-specification/vault 'Nomad vault Job Specification' [volumemount]: /nomad/docs/job-specification/volume_mount 'Nomad volume_mount Job Specification' -[exec]: /nomad/docs/drivers/exec 'Nomad exec Driver' -[raw_exec]: /nomad/docs/drivers/raw_exec 'Nomad raw_exec Driver' -[java]: /nomad/docs/drivers/java 'Nomad Java Driver' -[docker]: /nomad/docs/drivers/docker 'Nomad Docker Driver' +[exec]: /nomad/docs/job-declare/task-driver/exec 'Nomad exec Driver' +[raw_exec]: /nomad/docs/job-declare/task-driver/raw_exec 'Nomad raw_exec Driver' +[java]: /nomad/docs/job-declare/task-driver/java 'Nomad Java Driver' +[docker]: /nomad/docs/job-declare/task-driver/docker 'Nomad Docker Driver' [rkt]: /nomad/plugins/drivers/community/rkt 'Nomad rkt Driver' -[service_discovery]: /nomad/docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery' +[service_discovery]: /nomad/docs/networking/service-discovery 'Nomad Service Discovery' [template]: /nomad/docs/job-specification/template 'Nomad template Job Specification' [user_drivers]: /nomad/docs/configuration/client#user-checked_drivers [user_denylist]: /nomad/docs/configuration/client#user-denylist [max_kill]: /nomad/docs/configuration/client#max_kill_timeout [kill_signal]: /nomad/docs/job-specification/task#kill_signal [Workload Identity]: /nomad/docs/concepts/workload-identity 'Nomad Workload Identity' -[service]: /nomad/docs/install/windows-service +[service]: /nomad/docs/deploy/production/windows-service diff --git a/website/content/docs/job-specification/template.mdx b/website/content/docs/job-specification/template.mdx index 61a8ce83f..3759dbfb8 100644 --- a/website/content/docs/job-specification/template.mdx +++ b/website/content/docs/job-specification/template.mdx @@ -818,12 +818,12 @@ options](/nomad/docs/configuration/client#options): [nvars]: /nomad/docs/concepts/variables 'Nomad Variables' [ct_api_tree]: https://github.com/hashicorp/consul-template/blob/master/docs/templating-language.md#tree 'Consul Template API by HashiCorp - tree' [gt]: https://pkg.go.dev/text/template 'Go template package' -[gt_learn]: /nomad/tutorials/templates/go-template-syntax +[gt_learn]: /nomad/docs/reference/go-template-syntax [artifact]: /nomad/docs/job-specification/artifact 'Nomad artifact Job Specification' -[env]: /nomad/docs/runtime/environment 'Nomad Runtime Environment' -[nodevars]: /nomad/docs/runtime/interpolation#interpreted_node_vars 'Nomad Node Variables' +[env]: /nomad/docs/reference/runtime-environment-settings 'Nomad Runtime Environment' +[nodevars]: /nomad/docs/reference/runtime-variable-interpolation#interpreted_node_vars 'Nomad Node Variables' [go-envparse]: https://github.com/hashicorp/go-envparse#readme 'The go-envparse Readme' -[task working directory]: /nomad/docs/runtime/environment#task-directories 'Task Directories' +[task working directory]: /nomad/docs/reference/runtime-environment-settings#task-directories 'Task Directories' [filesystem internals]: /nomad/docs/concepts/filesystem#templates-artifacts-and-dispatch-payloads [`client.template.wait_bounds`]: /nomad/docs/configuration/client#wait_bounds [rhash]: https://en.wikipedia.org/wiki/Rendezvous_hashing diff --git a/website/content/docs/job-specification/transparent_proxy.mdx b/website/content/docs/job-specification/transparent_proxy.mdx index 5745a9b83..29f254469 100644 --- 
a/website/content/docs/job-specification/transparent_proxy.mdx +++ b/website/content/docs/job-specification/transparent_proxy.mdx @@ -162,7 +162,7 @@ sidecar_service { [service intentions]: /consul/docs/connect/config-entries/service-intentions [virtual IP]: /consul/docs/services/discovery/dns-static-lookups#service-virtual-ip-lookups [`consul-cni`]: https://releases.hashicorp.com/consul-cni -[cni_plugins]: /nomad/docs/networking/cni#cni-reference-plugins +[cni_plugins]: /nomad/docs/job-networking/cni#cni-reference-plugins [consul_dns_port]: /consul/docs/agent/config/config-files#dns_port [`recursors`]: /consul/docs/agent/config/config-files#recursors [port_labels]: /nomad/docs/job-specification/network#port-parameters @@ -172,6 +172,6 @@ sidecar_service { [`static`]: /nomad/docs/job-specification/network#static [`outbound_listener_port`]: /consul/docs/connect/proxies/proxy-config-reference#outbound_listener_port [`template`]: /nomad/docs/job-specification/template#consul-integration -[`nomad node meta apply`]: /nomad/docs/commands/node/meta/apply +[`nomad node meta apply`]: /nomad/commands/node/meta/apply [`network.dns`]: /nomad/docs/job-specification/network#dns-parameters -[Transparent Proxy]: /nomad/docs/integrations/consul/service-mesh#transparent-proxy +[Transparent Proxy]: /nomad/docs/networking/consul/service-mesh#transparent-proxy diff --git a/website/content/docs/job-specification/update.mdx b/website/content/docs/job-specification/update.mdx index ef7b82c98..4e2b7fbf4 100644 --- a/website/content/docs/job-specification/update.mdx +++ b/website/content/docs/job-specification/update.mdx @@ -269,7 +269,7 @@ group "two" { } ``` -[canary]: /nomad/tutorials/job-updates/job-blue-green-and-canary-deployments 'Nomad Canary Deployments' +[canary]: /nomad/docs/job-declare/strategy/blue-green-canary 'Nomad Canary Deployments' [checks]: /nomad/docs/job-specification/service#check -[rolling]: /nomad/tutorials/job-updates/job-rolling-update 'Nomad Rolling Upgrades' +[rolling]: /nomad/docs/job-declare/strategy/rolling 'Nomad Rolling Upgrades' [strategies]: /nomad/tutorials/job-updates 'Nomad Update Strategies' diff --git a/website/content/docs/job-specification/upstreams.mdx b/website/content/docs/job-specification/upstreams.mdx index 91ec2aa11..80225985e 100644 --- a/website/content/docs/job-specification/upstreams.mdx +++ b/website/content/docs/job-specification/upstreams.mdx @@ -135,13 +135,13 @@ and a local bind port. 
} ``` -[Consul Connect]: /nomad/docs/integrations/consul-connect +[Consul Connect]: /nomad/docs/networking/consul [Consul Connect Guide]: /consul/tutorials/get-started-vms/virtual-machine-gs-service-discovery [`transparent_proxy`]: /nomad/docs/job-specification/transparent_proxy [job]: /nomad/docs/job-specification/job 'Nomad job Job Specification' [group]: /nomad/docs/job-specification/group 'Nomad group Job Specification' [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification' -[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation' +[interpolation]: /nomad/docs/reference/runtime-variable-interpolation 'Nomad interpolation' [sidecar_service]: /nomad/docs/job-specification/sidecar_service 'Nomad sidecar service Specification' [upstreams]: /nomad/docs/job-specification/upstreams 'Nomad upstream config Specification' [service_defaults_mode]: /consul/docs/connect/config-entries/service-defaults#meshgateway diff --git a/website/content/docs/job-specification/vault.mdx b/website/content/docs/job-specification/vault.mdx index 1666d3c07..449608f0b 100644 --- a/website/content/docs/job-specification/vault.mdx +++ b/website/content/docs/job-specification/vault.mdx @@ -213,11 +213,11 @@ vault { ``` [`create_from_role`]: /nomad/docs/configuration/vault#create_from_role -[docker]: /nomad/docs/drivers/docker "Docker Driver" +[docker]: /nomad/docs/job-declare/task-driver/docker "Docker Driver" [restart]: /nomad/docs/job-specification/restart "Nomad restart Job Specification" [template]: /nomad/docs/job-specification/template "Nomad template Job Specification" [vault]: https://www.vaultproject.io/ "Vault by HashiCorp" [`vault.name`]: /nomad/docs/configuration/vault#name [`vault_retry`]: /nomad/docs/configuration/client#vault_retry -[Workload Identity with Vault]: /nomad/docs/integrations/vault/acl#nomad-workload-identities -[legacy Vault authentication workflow]: /nomad/docs/v1.8.x/integrations/vault/acl#authentication-without-workload-identity-legacy +[Workload Identity with Vault]: /nomad/docs/secure/vault/acl#nomad-workload-identities +[legacy Vault authentication workflow]: /nomad/docs/v1.8.x/integrations/vault/acl diff --git a/website/content/docs/job-specification/volume.mdx b/website/content/docs/job-specification/volume.mdx index 26f8adf33..1af5d9801 100644 --- a/website/content/docs/job-specification/volume.mdx +++ b/website/content/docs/job-specification/volume.mdx @@ -238,12 +238,12 @@ ID Node ID Task Group Version Desired Status Created Modified [volume_mount]: /nomad/docs/job-specification/volume_mount 'Nomad volume_mount Job Specification' [host_volume]: /nomad/docs/configuration/client#host_volume-block -[csi_volume]: /nomad/docs/commands/volume/register +[csi_volume]: /nomad/commands/volume/register [csi_plugin]: /nomad/docs/job-specification/csi_plugin -[csi_volume]: /nomad/docs/commands/volume/register -[attachment mode]: /nomad/docs/commands/volume/register#attachment_mode -[volume registration]: /nomad/docs/commands/volume/register#mount_options +[csi_volume]: /nomad/commands/volume/register +[attachment mode]: /nomad/commands/volume/register#attachment_mode +[volume registration]: /nomad/commands/volume/register#mount_options [dynamic host volumes]: /nomad/docs/other-specifications/volume/host [stateful deployments]: /nomad/docs/concepts/stateful-deployments -[`volume create`]: /nomad/docs/commands/volume/create -[`volume register`]: /nomad/docs/commands/volume/register +[`volume create`]: /nomad/commands/volume/create +[`volume register`]: 
/nomad/commands/volume/register diff --git a/website/content/docs/job-specification/volume_mount.mdx b/website/content/docs/job-specification/volume_mount.mdx index 9944f8022..007bd3d43 100644 --- a/website/content/docs/job-specification/volume_mount.mdx +++ b/website/content/docs/job-specification/volume_mount.mdx @@ -77,4 +77,4 @@ volumes, see [Volume Interpolation]. [volume]: /nomad/docs/job-specification/volume 'Nomad volume Job Specification' [volume interpolation]: /nomad/docs/job-specification/volume#volume-interpolation -[hcl2]: /nomad/docs/job-specification/hcl2 +[hcl2]: /nomad/docs/reference/hcl2 diff --git a/website/content/docs/manage/autopilot.mdx b/website/content/docs/manage/autopilot.mdx new file mode 100644 index 000000000..a9e781b41 --- /dev/null +++ b/website/content/docs/manage/autopilot.mdx @@ -0,0 +1,244 @@ +--- +layout: docs +page_title: Autopilot +description: |- + Configure and use Autopilot features to help maintain your cluster. Monitor + server health, upgrade and clean up servers, and use redundancy zones. +--- + +# Autopilot + +Autopilot is a set of new features added in Nomad 0.8 to allow for automatic +operator-friendly management of Nomad servers. It includes cleanup of dead +servers, monitoring the state of the Raft cluster, and stable server +introduction. + +To enable Autopilot features (with the exception of dead server cleanup), the +`raft_protocol` setting in the [server stanza] must be set to 3 on all servers. + +## Configuration + +The configuration of Autopilot is loaded by the leader from the agent's +[Autopilot settings] when initially bootstrapping the cluster: + +```hcl +autopilot { + cleanup_dead_servers = true + last_contact_threshold = "200ms" + max_trailing_logs = 250 + server_stabilization_time = "10s" + enable_redundancy_zones = false + disable_upgrade_migration = false + enable_custom_upgrades = false +} + +``` + +After bootstrapping, the configuration can be viewed or modified either via the +[`operator autopilot`] subcommand or the +[`/v1/operator/autopilot/configuration`] HTTP endpoint: + +View the configuration. + +```shell-session +$ nomad operator autopilot get-config +CleanupDeadServers = true +LastContactThreshold = 200ms +MaxTrailingLogs = 250 +ServerStabilizationTime = 10s +EnableRedundancyZones = false +DisableUpgradeMigration = false +EnableCustomUpgrades = false +``` + +Update the configuration. + +```shell-session +$ nomad operator autopilot set-config -cleanup-dead-servers=false +Configuration updated! +``` + +View the configuration to confirm your changes. + +```shell-session +$ nomad operator autopilot get-config +CleanupDeadServers = false +LastContactThreshold = 200ms +MaxTrailingLogs = 250 +ServerStabilizationTime = 10s +EnableRedundancyZones = false +DisableUpgradeMigration = false +EnableCustomUpgrades = false +``` + +## Dead server cleanup + +Dead servers will periodically be cleaned up and removed from the Raft peer set, +to prevent them from interfering with the quorum size and leader elections. This +cleanup will also happen whenever a new server is successfully added to the +cluster. + +Prior to Autopilot, it would take 72 hours for dead servers to be automatically +reaped, or operators had to script a `nomad force-leave`. If another server +failure occurred, it could jeopardize the quorum, even if the failed Nomad +server had been automatically replaced. Autopilot helps prevent these kinds of +outages by quickly removing failed servers as soon as a replacement Nomad server +comes online. 
When servers are removed by the cleanup process they will enter +the "left" state. + +This option can be disabled by running `nomad operator autopilot set-config` +with the `-cleanup-dead-servers=false` option. + +## Server health checking + +An internal health check runs on the leader to monitor the stability of servers. +A server is considered healthy if all of the following conditions are true: + +- Its status according to Serf is 'Alive' + +- The time since its last contact with the current leader is below + `LastContactThreshold` + +- Its latest Raft term matches the leader's term + +- The number of Raft log entries it trails the leader by does not exceed + `MaxTrailingLogs` + +The status of these health checks can be viewed through the +[`/v1/operator/autopilot/health`] HTTP endpoint, with a top level `Healthy` +field indicating the overall status of the cluster: + +```shell-session +$ curl localhost:4646/v1/operator/autopilot/health +{ + "Healthy": true, + "FailureTolerance": 0, + "Servers": [ + { + "ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e", + "Name": "node1", + "Address": "127.0.0.1:4647", + "SerfStatus": "alive", + "Version": "0.8.0", + "Leader": true, + "LastContact": "0s", + "LastTerm": 2, + "LastIndex": 10, + "Healthy": true, + "Voter": true, + "StableSince": "2017-03-28T18:28:52Z" + }, + { + "ID": "e35bde83-4e9c-434f-a6ef-453f44ee21ea", + "Name": "node2", + "Address": "127.0.0.1:4747", + "SerfStatus": "alive", + "Version": "0.8.0", + "Leader": false, + "LastContact": "35.371007ms", + "LastTerm": 2, + "LastIndex": 10, + "Healthy": true, + "Voter": false, + "StableSince": "2017-03-28T18:29:10Z" + } + ] +} + +``` + +## Stable server introduction + +When a new server is added to the cluster, there is a waiting period where it +must be healthy and stable for a certain amount of time before being promoted to +a full, voting member. This can be configured via the `ServerStabilizationTime` +setting. + +--- + +~> The following Autopilot features are available only in [Nomad Enterprise] +version 0.8.0 and later. + +## Server read and scheduling scaling + +With the [`non_voting_server`] option, a server can be explicitly marked as a +non-voter and will never be promoted to a voting member. This can be useful when +more read scaling is needed; being a non-voter means that the server will still +have data replicated to it, but it will not be part of the quorum that the +leader must wait for before committing log entries. Non voting servers can also +act as scheduling workers to increase scheduling throughput in large clusters. + +## Redundancy zones + +Prior to Autopilot, it was difficult to deploy servers in a way that took +advantage of isolated failure domains such as AWS Availability Zones; users +would be forced to either have an overly-large quorum (2-3 nodes per AZ) or give +up redundancy within an AZ by deploying only one server in each. + +If the `EnableRedundancyZones` setting is set, Nomad will use its value to look +for a zone in each server's specified [`redundancy_zone`] field. + +Here's an example showing how to configure this: + +```hcl +/* config.hcl */ +server { + redundancy_zone = "west-1" +} +``` + +```shell-session +$ nomad operator autopilot set-config -enable-redundancy-zones=true +Configuration updated! +``` + +Nomad will then use these values to partition the servers by redundancy zone, +and will aim to keep one voting server per zone. Extra servers in each zone will +stay as non-voters on standby to be promoted if the active voter leaves or dies. 
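+
+For instance, a server placed in a second availability zone would carry a
+different value in its own configuration. The following is a minimal sketch;
+the zone name is illustrative:
+
+```hcl
+/* config.hcl on a server in another zone */
+server {
+  redundancy_zone = "west-2"
+}
+```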
+
+## Upgrade migrations
+
+Autopilot in Nomad Enterprise supports upgrade migrations by default. To disable
+this functionality, set `DisableUpgradeMigration` to true.
+
+When a new server is added and Autopilot detects that its Nomad version is newer
+than that of the existing servers, Autopilot will avoid promoting the new server
+until enough newer-versioned servers have been added to the cluster. When the
+count of new servers equals or exceeds that of the old servers, Autopilot will
+begin promoting the new servers to voters and demoting the old servers. After
+this is finished, the old servers can be safely removed from the cluster.
+
+To check the Nomad version of the servers, use either the [autopilot health]
+endpoint or the `nomad server members` command:
+
+```shell-session
+$ nomad server members
+Name   Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
+node1  127.0.0.1  4648  alive   true    3         0.7.1  dc1         global
+node2  127.0.0.1  4748  alive   false   3         0.7.1  dc1         global
+node3  127.0.0.1  4848  alive   false   3         0.7.1  dc1         global
+node4  127.0.0.1  4948  alive   false   3         0.8.0  dc1         global
+```
+
+### Migrations without a Nomad version change
+
+The `EnableCustomUpgrades` field can be used to override the version information
+used during a migration, so that the same migration logic can also be applied
+when rolling out a configuration change.
+
+If the `EnableCustomUpgrades` setting is set to `true`, Nomad will use its value
+to look for a version in each server's specified [`upgrade_version`] tag. The
+upgrade logic will follow semantic versioning and the `upgrade_version` must be
+in the form of either `X`, `X.Y`, or `X.Y.Z`.
+
+[`/v1/operator/autopilot/configuration`]: /nomad/api-docs/operator/autopilot
+[`/v1/operator/autopilot/health`]: /nomad/api-docs/operator/autopilot#read-health
+[`non_voting_server`]: /nomad/docs/configuration/server#non_voting_server
+[`operator autopilot`]: /nomad/commands/operator
+[`redundancy_zone`]: /nomad/docs/configuration/server#redundancy_zone
+[`upgrade_version`]: /nomad/docs/configuration/server#upgrade_version
+[autopilot health]: /nomad/api-docs/operator/autopilot#read-health
+[autopilot settings]: /nomad/docs/configuration/autopilot
+[nomad enterprise]: https://www.hashicorp.com/products/nomad
+[server stanza]: /nomad/docs/configuration/server
+[version upgrade section]: /nomad/docs/upgrade/upgrade-specific#raft-protocol-version-compatibility
diff --git a/website/content/docs/manage/format-cli-output.mdx b/website/content/docs/manage/format-cli-output.mdx
new file mode 100644
index 000000000..0b99a8271
--- /dev/null
+++ b/website/content/docs/manage/format-cli-output.mdx
@@ -0,0 +1,415 @@
+---
+layout: docs
+page_title: Format CLI output with templates
+description: |-
+  Customize the output of Nomad CLI commands with Go template syntax. Practice
+  interacting with unfamiliar template contexts.
+---
+
+# Format CLI output with templates
+
+When using Nomad at an intermediate to advanced level, you'll need to interface
+with other systems or customize output generated by Nomad. The `-t` flag is a
+powerful way to pass a template in Go's text/template format to several of the
+Nomad commands that generate output based on the API. This allows you to filter
+and customize the output to meet your specific needs.
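+
+As a quick sketch of the pattern, which the rest of this guide explores in
+detail, the following prints only the name of each client node:
+
+```shell-session
+$ nomad node status -t '{{range .}}{{println .Name}}{{end}}'
+```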
+ +The commands that allow for the -t flag are: + +- `nomad acl policy list` +- `nomad acl token list` +- `nomad alloc status` +- `nomad deployment list` +- `nomad deployment status` +- `nomad eval status` +- `nomad job deployments` +- `nomad job history` +- `nomad job inspect` +- `nomad namespace list` +- `nomad node status` +- `nomad plugin status` +- `nomad quota list` +- `nomad volume status` + +This guide will teach you how to explore the objects that are returned to +the template engine and how to use template syntax to format the output into +a custom form. + +## Prerequisites + +This guide assumes the following: + +- Familiarity with Go's text/template syntax. You can learn more about it in the + [Go template syntax][go-template-syntax] reference. + +- That you are running these commands against a Nomad cluster with an active + workload. You can create a minimal environment using a dev agent, started with + `sudo nomad agent -dev`, then running at least one Nomad job. You can use + `nomad init -short` to create a sample Docker job or provide your own Nomad + job. + +## Note the shell-specific syntax + +When using the -t flag, you need to correctly handle string literals based on +your shell environment. In a POSIX shell, you can run the following with a +single quote: + +```shell-session +$ nomad node status -t '{{printf "%#+v" .}}' +``` + +In a Windows shell (for example, PowerShell), use single +quotes but escape the double quotes inside the parameter as follows: + +```powershell +PS> nomad node status -t '{{printf \"%#+v\" .}}' +``` + +In this guide, you can select examples with the proper escaping using the +tabs above the snippets. + +## Start discovering objects + +The `printf` function and the `"%#+v"` format string are critical tools for you +in exploring an unfamiliar template context. + +Run the following command to output the context being passed to the template +in Go object format. + + + + +```shell-session +$ nomad node status -t '{{printf "%#+v" .}}' +``` + + + + +```powershell +PS> nomad node status -t '{{printf \"%#+v\" .}}' +``` + + + + +```plaintext +[]*api.NodeListStub{(*api.NodeListStub)(0xc0003fa160), (*api.NodeListStub)(0xc0003fa0b0), (*api.NodeListStub)(0xc0003fa000)} +``` + +The output indicates that the context consists of a list (`[]`) of pointers +(`*`) to `api.NodeListStub` objects. The list will also show one NodeListStub +object per client node in your cluster's server state. + +You can explore these api.NodeListStub object by using the `range` control over +the list. + + + + +```shell-session +$ nomad node status -t '{{range .}}{{printf "%#+v" .}}{{end}}' +``` + + + + +```powershell +PS> nomad node status -t '{{range .}}{{printf \"%#+v\" .}}{{end}}' +``` + + + + +```plaintext +&api.NodeListStub{Address:"10.0.2.52", ID:"4f60bc83-71a2-7790-b120-4e55d0e6ed34", Datacenter:"dc1", Name:"nomad-client-2.node.consul", NodeClass:"", Version:"0.12.0", Drain:false, SchedulingEligibility:"eligible", Status:"ready", ... +``` + +If you have a lot of client nodes in your cluster state, this output will be +unwieldy. In that case, you can use `with` and the index function to get the +first list item. + + + + +```shell-session +$ nomad node status -t '{{with index . 0}}{{printf "%#+v" .}}{{end}}' +``` + + + + +```powershell +PS> nomad node status -t '{{with index . 
0}}{{printf \"%#+v\" .}}{{end}}' +&api.NodeListStub{Address:"10.0.2.52", ID:"4f60bc83-71a2-7790-b120-4e55d0e6ed34", Datacenter:"dc1", Name:"nomad-client-2.node.consul", NodeClass:"", Version:"0.12.0", Drain:false, SchedulingEligibility:"eligible", Status:"ready", ... +``` + + + + +```plaintext +&api.NodeListStub{Address:"10.0.2.52", ID:"4f60bc83-71a2-7790-b120-4e55d0e6ed34", Datacenter:"dc1", Name:"nomad-client-2.node.consul", NodeClass:"", Version:"0.12.0", Drain:false, SchedulingEligibility:"eligible", Status:"ready", ... +``` + +Finally, output `Name` and `Version` for each client in the cluster. + + + + +```shell-session +$ nomad node status -t '{{range .}}{{printf "%s: %s\n" .Name .Version}}{{end}}' +``` + + + + +```powershell +PS> nomad node status -t '{{range .}}{{printf \"%s: %s\n\" .Name .Version}}{{end}}' +``` + + + + +```plaintext +nomad-client-2.node.consul: 0.12.0 +nomad-client-3.node.consul: 0.12.0 +nomad-client-1.node.consul: 0.12.0 +``` + +## Make quiet output + +Suppose you want to create a reduced version of the `nomad job status` output +to show just the running job IDs in your cluster and nothing else. + + + + +```shell-session +$ nomad job inspect -t '{{range .}}{{if eq .Status "running"}}{{ println .Name}}{{end}}{{end}}' +``` + + + + +```powershell +PS> nomad job inspect -t '{{range .}}{{if eq .Status \"running\"}}{{ println .Name}}{{end}}{{end}}' +``` + + + + +Nomad will output the job IDs for every running job in your cluster. For example: + +```plaintext +fabio +sockshop-carts +sockshop-catalogue +sockshop-frontend +sockshop-infra +sockshop-orders +sockshop-payment +sockshop-shipping +sockshop-user +``` + +### Challenge yourself + +Allocations have a slightly different shape. How might you create similar output +from the `nomad alloc status` command? Make sure that your Nomad cluster has at +least one allocation running and then use the printf technique from earlier to +explore the values sent into the template. + + + + +Print the context that you are passed from the command using the printf command. + + + + +```shell-session +$ nomad alloc status -t '{{printf "%#+v" . }}' +``` + + + + +```powershell +PS> nomad alloc status -t '{{printf \"%#+v\" . }}' +``` + + + + +```plaintext +[]*api.AllocationListStub ... +``` + +Note that the first thing that you receive is a list (`[]`) of pointers (`*`) to +`AllocationListStub` objects. + +Use `range` to traverse each item in the list. + + + + +```shell-session +$ nomad alloc status -t '{{range .}}{{printf "%#+v" . }}{{end}}' +``` + + + + +```powershell +PS> nomad alloc status -t '{{range .}}{{printf \"%#+v\" . }}{{end}}' +``` + + + + +```plaintext +&api.AllocationListStub{ID:"30663b68-4d8a-aada-4ad2-011b1acae3a1", EvalID:"c5eda90b-f675-048e-b2f7-9ced30e4916b", Name:"sockshop-user.userdb[0]", Namespace:"default", NodeID:"3be35c12-70aa-8816-195e-a4630a457727", NodeName:"nomad-client-3.node.consul", JobID:"sockshop-user", JobType:"service", JobVersion:0x0, ... +``` + +If you have a lot of allocations running, this could get unwieldy. In that case, +you can use `with` and the index function to get the first list item. + + + + +```shell-session +$ nomad alloc status -t '{{with index . 0}}{{printf "%#+v" . }}{{end}}' +``` + + + + +```powershell +PS> nomad alloc status -t '{{with index . 0}}{{printf \"%#+v\" . 
}}{{end}}' +``` + + + + +```plaintext +&api.AllocationListStub{ID:"30663b68-4d8a-aada-4ad2-011b1acae3a1", EvalID:"c5eda90b-f675-048e-b2f7-9ced30e4916b", Name:"sockshop-user.userdb[0]", Namespace:"default", NodeID:"3be35c12-70aa-8816-195e-a4630a457727", NodeName:"nomad-client-3.node.consul", JobID:"sockshop-user", JobType:"service", JobVersion:0x0, ... +``` + +The fields on the AllocationListStub object that give insight into the running +state of an allocation are `DesiredStatus` and `ClientStatus`. + +-> **Did you know?** The definition of an [AllocationListStub][] object and +valid values for the DesiredStatus and ClientStatus are located in Nomad's +[api package][]. Take a moment to look at it and see what other information you +might be interested in displaying with templates. + +Update your template to show items with a DesiredStatus of "run" and a client +status of "running" or "pending." + + + + +```shell-session +$ nomad alloc status -t '{{range .}}{{if and (eq .DesiredStatus "run") (or (eq .ClientStatus "running") (eq .ClientStatus "pending"))}}{{println .ID}}{{end}}{{end}}' +``` + + + + +```powershell +PS> nomad alloc status -t '{{range .}}{{if and (eq .DesiredStatus \"run\") (or (eq .ClientStatus \"running\") (eq .ClientStatus \"pending\"))}}{{println .ID}}{{end}}{{end}}' +``` + + + + +```plaintext +30663b68-4d8a-aada-4ad2-011b1acae3a1 +11b916da-d679-1718-26f3-f6cd499bfdb8 +68bcb157-359f-9293-d091-5a8ef71475ad +... +``` + +You now have a list of the IDs for all of the allocations running in your Nomad +cluster. + + + + +## Retrieve a template from file + +Using the command line to write templates becomes challenging +as the template becomes more complex. + +By writing a template in its own file, you can use comments, span multiple lines, and indent conditionals in order to make them more readable to you and to other operators. + +Consider using some of these techniques +to include the template data into the command. + + + + + +Create a file named running_jobs.tmpl with the following content. + +```plaintext +{{- /* + Get Running Jobs + Run with `nomad job inspect -t "$(cat running_jobs.tmpl)"` +*/ -}} +{{- range . -}} + {{- if eq .Status "running" -}} + {{- println .Name -}} + {{- end -}} +{{- end -}} +``` + +Now, use a subshell to read the file into a variable + +```shell-session +$ nomad job inspect -t "$(cat running_jobs.tmpl)" +``` + + + + + +Create a file named running_jobs.tmpl with the following content. + +```plaintext +{{- /* + Get Running Jobs + Run with: + $content=Get-Content running_jobs.tmpl -Raw; nomad job inspect -t $content +*/ -}} +{{- range . -}} + {{- if eq .Status \"running\" -}} + {{- println .Name -}} + {{- end -}} +{{- end -}} +``` + +Now, use a subshell to read the file into a variable + +```powershell +PS> $content=Get-Content running_jobs.tmpl -Raw; nomad job inspect -t $content +``` + + + + + +## Learn more + +In this guide, you learned how to: + +- Customize the output of several Nomad commands using Go's text/template + syntax. + +- Use the `printf` function to discover what is available in the template's + context. + +- Use a template definition contained in a file as part of the command. 
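+
+As one more sketch that combines these techniques, a template file can reshape
+`nomad node status` output into CSV using the POSIX shell form shown earlier.
+The file name and field choices here are illustrative:
+
+```plaintext
+{{- /*
+  node_csv.tmpl: print client nodes as CSV
+  Run with `nomad node status -t "$(cat node_csv.tmpl)"`
+*/ -}}
+{{- range . -}}
+  {{- printf "%s,%s,%s\n" .Name .Address .Status -}}
+{{- end -}}
+```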
+ + +[go-template-syntax]: /nomad/docs/reference/go-template-syntax +[allocationliststub]: https://godoc.org/github.com/hashicorp/nomad/api#AllocationListStub +[api package]: https://godoc.org/github.com/hashicorp/nomad/api diff --git a/website/content/docs/operations/garbage-collection.mdx b/website/content/docs/manage/garbage-collection.mdx similarity index 98% rename from website/content/docs/operations/garbage-collection.mdx rename to website/content/docs/manage/garbage-collection.mdx index 3fc2a5b84..f6fb5307c 100644 --- a/website/content/docs/operations/garbage-collection.mdx +++ b/website/content/docs/manage/garbage-collection.mdx @@ -186,7 +186,7 @@ allocation is terminal, the client garbage collection process communicates with the task driver to ensure the task's resources have been cleaned up. Note that the Docker task driver periodically cleans up its own resources. Refer to the [Docker task driver plugin -options](https://developer.hashicorp.com/nomad/docs/drivers/docker#gc) for +options](https://developer.hashicorp.com/nomad/docs/deploy/task-driver/docker#gc) for details. When a task has configured restart attempts and the task fails, the Nomad client @@ -210,5 +210,5 @@ disk space might not be fully reclaimed until fixed. - [client Block in Agent Configuration](/nomad/docs/configuration/client) - [server Block in Agent Configuration](/nomad/docs/configuration/server) -- [the `nomad system gc` command reference](/nomad/docs/commands/system/gc) +- [the `nomad system gc` command reference](/nomad/commands/system/gc) - [System HTTP API Force GC](/nomad/api-docs/system#force-gc) diff --git a/website/content/docs/manage/index.mdx b/website/content/docs/manage/index.mdx new file mode 100644 index 000000000..ea2ddaf4c --- /dev/null +++ b/website/content/docs/manage/index.mdx @@ -0,0 +1,10 @@ +--- +layout: docs +page_title: Manage Nomad +description: |- + This section provides guidance on the operational management of Nomad. Topics include garbage collection, key management, workload migration, outage recovery, enabling Autopilot, formatting command output, and using namespaces. +--- + +# Manage Nomad + +This section provides guidance on the operational management of Nomad. Topics include garbage collection, key management, workload migration, outage recovery, enabling Autopilot, formatting command output, and using namespaces. diff --git a/website/content/docs/operations/key-management.mdx b/website/content/docs/manage/key-management.mdx similarity index 94% rename from website/content/docs/operations/key-management.mdx rename to website/content/docs/manage/key-management.mdx index c0b5b6e8f..25704c513 100644 --- a/website/content/docs/operations/key-management.mdx +++ b/website/content/docs/manage/key-management.mdx @@ -91,9 +91,9 @@ keyring rotate`][] once the servers have joined. 
[variables]: /nomad/docs/concepts/variables [workload identities]: /nomad/docs/concepts/workload-identity -[client assertion JWTs]: /nomad/docs/concepts/acl/auth-methods/oidc#client-assertions +[client assertion JWTs]: /nomad/docs/secure/authentication/oidc#client-assertions [data directory]: /nomad/docs/configuration#data_dir [`keyring`]: /nomad/docs/configuration/keyring -[`nomad operator root keyring rotate -full`]: /nomad/docs/commands/operator/root/keyring-rotate -[`nomad operator root keyring rotate`]: /nomad/docs/commands/operator/root/keyring-rotate +[`nomad operator root keyring rotate -full`]: /nomad/commands/operator/root/keyring-rotate +[`nomad operator root keyring rotate`]: /nomad/commands/operator/root/keyring-rotate [`root_key_gc_interval`]: /nomad/docs/configuration/server#root_key_gc_interval diff --git a/website/content/docs/manage/migrate-workloads.mdx b/website/content/docs/manage/migrate-workloads.mdx new file mode 100644 index 000000000..041595a6e --- /dev/null +++ b/website/content/docs/manage/migrate-workloads.mdx @@ -0,0 +1,361 @@ +--- +layout: docs +page_title: Migrate workloads +description: |- + Configure jobs to allow Nomad to migrate workloads during events like node + drains. Discover how to customize migration configuration for your + applications. +--- + +# Migrate workloads + +Migrating workloads and decommissioning nodes are a normal part of cluster +operations for a variety of reasons: server maintenance, operating system +upgrades, etc. Nomad offers a number of parameters for controlling how running +jobs are migrated off of draining nodes. + +## Define how your job is migrated + +In Nomad 0.8, a [`migrate`][migrate] stanza was added to jobs to allow control +over how allocations for a job are migrated off of a draining node. Below is an +example job that runs a web service and has a Consul health check: + +```hcl +job "webapp" { + datacenters = ["dc1"] + + migrate { + max_parallel = 2 + health_check = "checks" + min_healthy_time = "15s" + healthy_deadline = "5m" + } + + group "webapp" { + count = 9 + + network { + port "http" { + to = 5678 + } + } + + task "webapp" { + driver = "docker" + config { + image = "hashicorp/http-echo:0.2.3" + args = ["-text", "ok"] + ports = ["http"] + } + + service { + name = "webapp" + port = "http" + check { + name = "http-ok" + type = "http" + path = "/" + interval = "10s" + timeout = "2s" + } + } + } + } +} +``` + +The above `migrate` stanza ensures only 2 allocations are stopped at a time to +migrate during node drains. Even if multiple nodes running allocations for this +job were draining at the same time, only 2 allocations would be migrated at a +time. + +When the job is run it may be placed on multiple nodes. 
In the following example the 9 `webapp` allocations are spread across 2 nodes:
+
+```shell-session
+$ nomad run webapp.nomad.hcl
+==> Monitoring evaluation "5129bc74"
+    Evaluation triggered by job "webapp"
+    Allocation "5b4d6db5" created: node "46f1c6c4", group "webapp"
+    Allocation "670a715f" created: node "f7476465", group "webapp"
+    Allocation "78b6b393" created: node "46f1c6c4", group "webapp"
+    Allocation "85743ff5" created: node "f7476465", group "webapp"
+    Allocation "edf71a5d" created: node "f7476465", group "webapp"
+    Allocation "56f770c0" created: node "46f1c6c4", group "webapp"
+    Allocation "9a51a484" created: node "46f1c6c4", group "webapp"
+    Allocation "f6f6e64c" created: node "f7476465", group "webapp"
+    Allocation "fefe81d0" created: node "f7476465", group "webapp"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "5129bc74" finished with status "complete"
+```
+
+If one of those nodes needed to be decommissioned, perhaps because of a hardware
+issue, then an operator would issue a node drain to migrate the allocations off:
+
+```shell-session
+$ nomad node drain -enable -yes 46f1
+2018-04-11T23:41:56Z: Ctrl-C to stop monitoring: will not cancel the node drain
+2018-04-11T23:41:56Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set
+2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
+2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
+2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" draining
+2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" draining
+2018-04-11T23:42:03Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" status running -> complete
+2018-04-11T23:42:03Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" status running -> complete
+2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
+2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" draining
+2018-04-11T23:42:27Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" status running -> complete
+2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" marked for migration
+2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" draining
+2018-04-11T23:42:29Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" has marked all allocations for migration
+2018-04-11T23:42:34Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" status running -> complete
+2018-04-11T23:42:34Z: All allocations on node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" have stopped.
+```
+
+There are a couple of important events to notice in the output. First, only two
+allocations are migrated initially:
+
+```plaintext
+2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
+2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
+```
+
+This is because `max_parallel = 2` in the job specification. The next
+allocation on the draining node waits to be migrated:
+
+```plaintext
+2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
+```
+
+Note that this occurs 25 seconds after the initial migrations. The 25 second
+delay is because a replacement allocation took 10 seconds to become healthy and
+then the `min_healthy_time = "15s"` meant node draining waited an additional 15
+seconds. If the replacement allocation had failed within that time the node
+drain would not have continued until a replacement could be successfully made.
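+
+If faster drains are acceptable for a workload, these parameters can be tuned.
+The following is a sketch with illustrative values that migrates more
+allocations at once and shortens the healthy window:
+
+```hcl
+migrate {
+  # allow four allocations to migrate at the same time
+  max_parallel     = 4
+  health_check     = "checks"
+  # consider a replacement healthy after 5 seconds of passing checks
+  min_healthy_time = "5s"
+  healthy_deadline = "2m"
+}
+```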
+
+### Verify drained node's scheduling eligibility
+
+Now that the example drain has finished, you can inspect the state of the
+drained node:
+
+```shell-session
+$ nomad node status
+ID        DC   Name     Class  Drain  Eligibility  Status
+f7476465  dc1  nomad-1         false  eligible     ready
+96b52ad8  dc1  nomad-2         false  eligible     ready
+46f1c6c4  dc1  nomad-3         false  ineligible   ready
+```
+
+While node `46f1c6c4` has `Drain = false`, notice that its `Eligibility =
+ineligible`. Node scheduling eligibility is a new field in Nomad 0.8. When a
+node is ineligible for scheduling the scheduler will not consider it for new
+placements.
+
+While draining, a node will always be ineligible for scheduling. Once draining
+completes it will remain ineligible to prevent refilling a newly drained node.
+
+However, by default canceling a drain with the `-disable` option will reset a
+node to be eligible for scheduling. To cancel a drain and preserve the node's
+ineligible status, use the `-keep-ineligible` option.
+
+Scheduling eligibility can be toggled independently of node drains by using the
+[`nomad node eligibility`][eligibility] command:
+
+```shell-session
+$ nomad node eligibility -disable 46f1
+Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
+```
+
+### Use a drain deadline to force completion
+
+Sometimes a drain is unable to proceed and complete normally. This could be
+caused by not enough capacity existing in the cluster to replace the drained
+allocations or by replacement allocations failing to start successfully in a
+timely fashion.
+
+Operators may specify a deadline when enabling a node drain to prevent drains
+from never finishing. Once the deadline is reached, all remaining allocations on
+the node are stopped regardless of `migrate` stanza parameters.
+
+The default deadline is 1 hour and may be changed with the
+[`-deadline`][deadline] command line option. The [`-force`][force] option is an
+instant deadline: all allocations are immediately stopped. The
+[`-no-deadline`][no-deadline] option disables the deadline so a drain may
+continue indefinitely.
+
+Like all other drain parameters, a drain's deadline can be updated by making
+subsequent `nomad node drain ...` calls with updated values.
+
+## Plan a drain strategy for batch and system jobs
+
+So far you have only seen how draining works with service jobs. Both batch and
+system jobs have different behaviors during node drains.
+
+### Drain batch jobs
+
+Node drains only migrate batch jobs once the drain's deadline has been reached.
+For node drains without a deadline the drain will not complete until all batch
+jobs on the node have completed (or failed).
+
+The goal of this behavior is to avoid losing the progress a batch job has made
+by forcing it to exit early.
+
+### Keep system jobs running
+
+Node drains only stop system jobs once all other allocations have exited. This
+way if a node is running a log shipping daemon or metrics collector as a system
+job, it will continue to run as long as there are other allocations running.
+
+The [`-ignore-system`][ignore-system] option leaves system jobs running even
+after all other allocations have exited. This is useful when system jobs are
+used to monitor Nomad or the node itself.
+
+## Drain multiple nodes
+
+A common operation is to decommission an entire class of nodes at once. Prior
+to Nomad 0.7 this was a problematic operation as the first node to begin
+draining may migrate all of its allocations to the next node about to be
+drained.
In pathological cases this could repeat on each node to be drained and +cause allocations to be rescheduled repeatedly. + +As of Nomad 0.8 an operator can avoid this churn by marking nodes ineligible +for scheduling before draining them using the [`nomad node eligibility`][eligibility] command. + +Mark a node as ineligible for scheduling with the `-disable` flag. + +```shell-session +$ nomad node eligibility -disable 46f1 +Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling +``` + +```shell-session +$ nomad node eligibility -disable 96b5 +Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling +``` + +Check node status to confirm eligibility. + +```shell-session +$ nomad node status +ID DC Name Class Drain Eligibility Status +f7476465 dc1 nomad-1 false eligible ready +46f1c6c4 dc1 nomad-2 false ineligible ready +96b52ad8 dc1 nomad-3 false ineligible ready +``` + +Now that both `nomad-2` and `nomad-3` are ineligible for scheduling, they can +be drained without risking placing allocations on an _about-to-be-drained_ +node. + +Toggling scheduling eligibility can be done totally independently of draining. +For example when an operator wants to inspect the allocations currently running +on a node without risking new allocations being scheduled and changing the +node's state. + +Make current node ineligible for scheduling. + +```shell-session +$ nomad node eligibility -self -disable +Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling +``` + +Make current node eligible for scheduling again with the `-enable` flag. + +```shell-session +$ nomad node eligibility -self -enable +Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: eligible for scheduling +``` + +### Example: migrating datacenters + +A more complete example of draining multiple nodes would be when migrating from +an old datacenter (`dc1`) to a new datacenter (`dc2`): + +```shell-session +$ nomad node status -allocs +ID DC Name Class Drain Eligibility Status Running Allocs +f7476465 dc1 nomad-1 false eligible ready 4 +46f1c6c4 dc1 nomad-2 false eligible ready 1 +96b52ad8 dc1 nomad-3 false eligible ready 4 +168bdd03 dc2 nomad-4 false eligible ready 0 +9ccb3306 dc2 nomad-5 false eligible ready 0 +7a7f9a37 dc2 nomad-6 false eligible ready 0 +``` + +Before migrating ensure that all jobs in `dc1` have `datacenters = ["dc1", "dc2"]`. Then before draining, mark all nodes in `dc1` as ineligible for +scheduling. + +Shell scripting can help automate manipulating multiple nodes at once. + +```shell-session +$ nomad node status | awk '{ print $2 " " $1 }' | grep ^dc1 | awk '{ system("nomad node eligibility -disable "$2) }' +Node "f7476465-4d6e-c0de-26d0-e383c49be941" scheduling eligibility set: ineligible for scheduling +Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling +Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling +``` + +Check status to confirm ineligibility. + +```shell-session +$ nomad node status +ID DC Name Class Drain Eligibility Status +f7476465 dc1 nomad-1 false ineligible ready +46f1c6c4 dc1 nomad-2 false ineligible ready +96b52ad8 dc1 nomad-3 false ineligible ready +168bdd03 dc2 nomad-4 false eligible ready +9ccb3306 dc2 nomad-5 false eligible ready +7a7f9a37 dc2 nomad-6 false eligible ready +``` + +Then drain each node in `dc1`. 
+ +```shell-session +$ nomad node drain -enable -yes -detach f7476465 +Node "f7476465-4d6e-c0de-26d0-e383c49be941" drain strategy set +``` + +Pass the ID for each node with the flags ``-enable -yes -detach` to initiate the drain. + +```shell-session +$ nomad node drain -enable -yes -detach 46f1c6c4 +Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set +``` + +For this example, only monitor the final node that is draining. Watching `nomad node status -allocs` +is also a good way to monitor the status of drains. + +```shell-session +$ nomad node drain -enable -yes 9ccb3306 +2018-04-12T22:08:00Z: Ctrl-C to stop monitoring: will not cancel the node drain +2018-04-12T22:08:00Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" drain strategy set +2018-04-12T22:08:15Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" marked for migration +2018-04-12T22:08:16Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" draining +2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" marked for migration +2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" draining +2018-04-12T22:08:21Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" status running -> complete +2018-04-12T22:08:22Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" status running -> complete +2018-04-12T22:09:08Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" marked for migration +2018-04-12T22:09:09Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" draining +2018-04-12T22:09:14Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" status running -> complete +2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" marked for migration +2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" draining +2018-04-12T22:09:33Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" has marked all allocations for migration +2018-04-12T22:09:39Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" status running -> complete +2018-04-12T22:09:39Z: All allocations on node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" have stopped. +``` + +Note that there was a 15 second delay between node `96b52ad8` starting to drain +and having its first allocation migrated. The delay was due to 2 other +allocations for the same job already being migrated from the other nodes. Once +at least 8 out of the 9 allocations are running for the job, another allocation +could begin draining. + +The final node drain command did not exit until 6 seconds after the `drain complete` message because the command line tool blocks until all allocations on +the node have stopped. This allows operators to script shutting down a node +once a drain command exits and know all services have already exited. + +[deadline]: /nomad/commands/node/drain#deadline +[eligibility]: /nomad/commands/node/eligibility +[force]: /nomad/commands/node/drain#force +[ignore-system]: /nomad/commands/node/drain#ignore-system +[migrate]: /nomad/docs/job-specification/migrate +[no-deadline]: /nomad/commands/node/drain#no-deadline diff --git a/website/content/docs/manage/outage-recovery.mdx b/website/content/docs/manage/outage-recovery.mdx new file mode 100644 index 000000000..8f537382b --- /dev/null +++ b/website/content/docs/manage/outage-recovery.mdx @@ -0,0 +1,252 @@ +--- +layout: docs +page_title: Recover from an outage +description: |- + Discover techniques and steps to recover from a single node failure, + multi-node failure, or a complete loss of quorum. +--- + +# Recover from an outage + +Don't panic. This is a critical first step. 
+ +Depending on your [deployment configuration], it may take only a single server +failure for cluster unavailability. Recovery requires an operator to intervene, +but the process is straightforward. + +~> This guide is for recovery from a Nomad outage due to a majority of server +nodes in a datacenter being lost. If you are looking to add or remove servers, +consult the [bootstrapping guide]. + +## Failure of a single server cluster + +If you had only a single server and it has failed, try to restore operation by +restarting it. A single server configuration requires the +[`-bootstrap-expect=1`] flag. If the server cannot be recovered, you need to +bring up a new server. Consult the [bootstrapping guide] for more detail. + +In the case of an unrecoverable server failure in a single server cluster, data +loss is inevitable since data was not replicated to any other servers. This is +why a single server deploy is **never** recommended. + +## Failure of a server in a multi-server cluster + +If you think the failed server is recoverable, the easiest option is to bring it +back online and have it rejoin the cluster with the same IP address. This will +return the cluster to a fully healthy state. Similarly, even if you need to +rebuild a new Nomad server to replace the failed node, you may wish to do that +immediately. Keep in mind that the rebuilt server needs to have the same IP +address as the failed server. Again, once this server is online and has +rejoined, the cluster will return to a fully healthy state. + +Both of these strategies involve a potentially lengthy time to reboot or rebuild +a failed server. If this is impractical or if building a new server with the +same IP isn't an option, you need to remove the failed server. Usually, you can +issue a [`nomad server force-leave`] command to remove the failed server if it +is still a member of the cluster. + + + + Your raft cluster will need to have a quorum of nodes available to +perform any online modifications to the Raft peer information. Membership +changes are written to the Raft log, and all Raft log writes require quorum. +If this is impossible, continue to **Failure of Multiple Servers in a +Multi-Server Cluster** + + + +If, for some reason, the Raft configuration continues to show any stale members, +you can use the [`nomad operator raft remove-peer`] command to remove the stale +peer server on the fly with no downtime. + +Once you have made the membership changes necessary, you should verify the +current Raft state with the [`nomad operator raft list-peers`] command: + +```shell-session +$ nomad operator raft list-peers +Node ID Address State Voter +nomad-server01.global 10.10.11.5:4647 10.10.11.5:4647 follower true +nomad-server02.global 10.10.11.6:4647 10.10.11.6:4647 leader true +nomad-server03.global 10.10.11.7:4647 10.10.11.7:4647 follower true +``` + +## Failure of multiple servers in a multi-server cluster + +In the event that multiple servers are lost, causing a loss of quorum and a +complete outage, partial recovery is possible using data on the remaining +servers in the cluster. There may be data loss in this situation because +multiple servers were lost, so information about what's committed could be +incomplete. The recovery process implicitly commits all outstanding Raft log +entries, so it's also possible to commit data that was uncommitted before the +failure. + +The [section below][] contains the details of the recovery procedure. You will +include the remaining servers in the `raft/peers.json` recovery file. 
The +cluster should be able to elect a leader once the remaining servers are all +restarted with an identical `raft/peers.json` configuration. + +Any new servers you introduce later can be fresh with totally clean data +directories and joined using Nomad's `server join` command. + +In extreme cases, it should be possible to recover with only a single remaining +server by starting that single server with itself as the only peer in the +`raft/peers.json` recovery file. + +The `raft/peers.json` recovery file is final, and a snapshot is taken after it +is ingested, so you are guaranteed to start with your recovered configuration. +This does implicitly commit all Raft log entries, so should only be used to +recover from an outage, but it should allow recovery from any situation where +there's some cluster data available. + +## Manual recovery using peers.json + +To begin, stop all remaining servers. You can attempt a graceful leave, but it +will not work in most cases. Do not worry if the leave exits with an error. The +cluster is in an unhealthy state, so this is expected. + +The `peers.json` file is not present by default and is only used when performing +recovery. This file will be deleted after Nomad starts and ingests this file. + +Nomad automatically creates a `raft/peers.info` file on startup to mark that it +is on the current version of Raft. Do not remove the `raft/peers.info` +file at any time. + +Using `raft/peers.json` for recovery can cause uncommitted Raft log entries to +be implicitly committed, so this should only be used after an outage where no +other option is available to recover a lost server. Make sure you don't have any +automated processes that will put the peers file in place on a periodic basis. + +The next step is to go to the [`-data-dir`][] of each Nomad server. Inside that +directory, there will be a `raft/` sub-directory. Create a `raft/peers.json` +file. Its contents will depend on the raft protocol version of your cluster. + +### Raft protocol 3 peers.json specification + +```json +[ + { + "id": "adf4238a-882b-9ddc-4a9d-5b6758e4159e", + "address": "10.1.0.1:4647", + "non_voter": false + }, + { + "id": "8b6dda82-3103-11e7-93ae-92361f002671", + "address": "10.1.0.2:4647", + "non_voter": false + }, + { + "id": "97e17742-3103-11e7-93ae-92361f002671", + "address": "10.1.0.3:4647", + "non_voter": false + } +] +``` + +- `id` `(string: **required**)` - Specifies the `node ID` of the server. This + can be found in the logs when the server starts up, and it can also be found + inside the `node-id` file in the server's data directory. + +- `address` `(string: **required**)` - Specifies the IP and port of the server + in `ip:port` format. The port is the server's RPC port used for cluster + communications, typically 4647. + +- `non_voter` `(bool: _false_)` - This controls whether the server is a + non-voter, which is used in some advanced [Autopilot] configurations. If + omitted, it will default to false, which is typical for most clusters. + +You can use this `jq` filter to create a `peers.json` file with the list of `alive` servers. Check the generated output and make any necessary changes. + +```shell +$ nomad server members -json | jq '[ .[] | select(.Status == "alive") | {id: .Tags.id, address: "\(.Tags.rpc_addr):\(.Tags.port)", non_voter: false} ]' +``` + +### Raft protocol 2 peers.json specification + +```json +["10.0.1.8:4647", "10.0.1.6:4647", "10.0.1.7:4647"] +``` + +Raft protocol version 2 peers.json files contain a list of IP:Port addresses for +each server. 
Note that the port should refer to the RPC port and not the HTTP +API port. + +### Deploy peers.json to all server nodes + +Create entries for all remaining servers. You must confirm that servers you do +not include here have indeed failed and will not later rejoin the cluster. + +Deploy this file is the same across all remaining server nodes. + +### Verify keyring on server nodes + + + +Prior to Nomad 1.9.0, [key material][Key material] was never stored in Raft. This meant that +the `nomad agent snapshot save` command and snapshot agent did not save Nomad's +keyring. If you are using versions prior to Nomad 1.9.0, you should make sure you have backed up the keyring of at least one +server. + + + +Go to the [`-data-dir`][] of each Nomad server. Inside that directory, there +will be a `keystore/` sub-directory with `.nks.json` files. Ensure that these +files exist on at least one server before continuing. + +### Restart cluster nodes + +At this point, you can restart all the remaining servers. Log lines will be +emitted as the servers ingest the recovery file: + +```plaintext +... +2016/08/16 14:39:20 [INFO] nomad: found peers.json file, recovering Raft configuration... +2016/08/16 14:39:20 [INFO] nomad.fsm: snapshot created in 12.484µs +2016/08/16 14:39:20 [INFO] snapshot: Creating new snapshot at /tmp/peers/raft/snapshots/2-5-1471383560779.tmp +2016/08/16 14:39:20 [INFO] nomad: deleted peers.json file after successful recovery +2016/08/16 14:39:20 [INFO] raft: Restored from snapshot 2-5-1471383560779 +2016/08/16 14:39:20 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:10.212.15.121:4647 Address:10.212.15.121:4647}] +... +``` + +If any servers managed to perform a graceful leave, you may need to have them +rejoin the cluster using the [`server join`] command: + +```shell-session +$ nomad server join +Successfully joined cluster by contacting 1 nodes. +``` + +It should be noted that any existing member can be used to rejoin the cluster as +the gossip protocol will take care of discovering the server nodes. + +At this point, the cluster should be in an operable state again. 
One of the +nodes should claim leadership and emit a log like: + +```plaintext +[INFO] nomad: cluster leadership acquired +``` + +You can use the [`nomad operator raft list-peers`] command to inspect the Raft +configuration: + +```shell-session +$ nomad operator raft list-peers +Node ID Address State Voter +nomad-server01.global 10.10.11.5:4647 10.10.11.5:4647 follower true +nomad-server02.global 10.10.11.6:4647 10.10.11.6:4647 leader true +nomad-server03.global 10.10.11.7:4647 10.10.11.7:4647 follower true +``` + +[`-bootstrap-expect=1`]: /nomad/docs/configuration/server#bootstrap_expect +[`-data-dir`]: /nomad/docs/configuration#data_dir +[`nomad operator raft list-peers`]: /nomad/commands/operator/raft/list-peers +[`nomad operator raft remove-peer`]: /nomad/commands/operator/raft/remove-peer +[`nomad server force-leave`]: /nomad/commands/server/force-leave +[`nomad server force-leave`]: /nomad/commands/server/force-leave +[`server join`]: /nomad/commands/server/join +[autopilot]: /nomad/docs/manage/autopilot +[bootstrapping guide]: /nomad/docs/deploy/clusters/connect-nodes +[deployment configuration]: /nomad/docs/architecture/cluster/consensus#deployment_table +[section below]: #manual-recovery-using-peers-json +[Key material]: /nomad/docs/manage/key-management +[restore the keyring]: /nomad/docs/manage/key-management#restoring-the-keyring-from-backup diff --git a/website/content/docs/monitor/cluster-topology.mdx b/website/content/docs/monitor/cluster-topology.mdx new file mode 100644 index 000000000..4aedceea0 --- /dev/null +++ b/website/content/docs/monitor/cluster-topology.mdx @@ -0,0 +1,123 @@ +--- +layout: docs +page_title: Cluster state topology +description: |- + Discover and use the topology visualization of the Nomad web UI to observe + clients and active workloads on your cluster. +--- + +# Cluster state topology + +As a Nomad cluster grows, the exact state of allocations and clients can become a mystery. For the most part this is a good thing: it means Nomad is quietly scheduling workloads without any need for intervention. However, as an operator, it is reasonable to still want to know what is going on within your cluster. + +The topology visualization is a single view of an entire cluster. It helps you perform preventative maintenance and it can help you understand your cluster's particular behaviors. + +## Prerequisites + +This tutorial assumes basic familiarity with Nomad. You must have access to an existing cluster that is running one or more jobs. + +Here is what you will need for this guide: + +- An active Nomad >=1.0.0 cluster +- Access to the cluster's web UI +- Read permissions to one or more namespaces + +## Navigate to the topology visualization + +The global left-hand navigation of the web UI has a Topology link under the cluster section. This link will take you to the topology visualization. + +[![The left-hand Nomad UI navigation with the topology link highlighted][topo-viz-link]][topo-viz-link] + +## The elements of the visualization + +The left-hand information panel contains aggregate cluster statistics. This includes the sum total of memory and CPU (in MHz) available to Nomad. The percentages of memory and CPU tell how much of each resource has been reserved—not how much is currently utilized. For instance, if a Docker container is currently utilizing 30MiB of memory but the task declared the need for 500MiB in the job spec, then the topology visualization will count this allocation as 500MiB. 
+
+These aggregate metrics are meant to give a rough sense of scale and can answer immediate forecasting questions. If your cluster is currently at 80% capacity of a total 200GiB of memory and you know your services will only grow in the next year, then you can conclude that the capacity of the cluster will also have to grow.
+
+Be careful not to assume there is room for your allocation just because the aggregate remaining resources exceed what your job requires. Since resources are aggregated but allocations must be placed on a single client, it's possible that no client has room for your allocation. If there is 1500MHz of CPU available across a cluster but only 500MHz available per client, then a task that needs 600MHz of CPU cannot be placed.
+
+[![The info panel of the topology visualization with aggregate cluster statistics included][cluster-panel]][cluster-panel]
+
+The main visualization organizes all of your datacenters and clients and brings focus to your allocations.
+
+The outermost section represents a datacenter (1). Each section is labeled by the datacenter name, capacity, and status. Clients are rendered within their respective datacenter. Clients will also be labeled by their name, capacity, and status (2). This also includes icon indicators of scheduling eligibility and drain state (3).
+
+[![The primary cluster view of the topology visualization. Annotation 1: the datacenter section. Annotation 2: the client name and details. Annotation 3: a lock icon indicator for scheduling ineligibility. Annotation 4: two rows of rectangles representing allocations.][cluster-view]][cluster-view]
+
+Clients across your entire fleet are sized vertically based on their capacity. Clients with more total capacity are taller. This makes scanning a cluster easier.
+
+Within each client container are two rows: one for each primary scheduling unit (CPU and memory). Each row will include each allocation on the client scaled proportionally to the amount of resources reserved for it (4). An allocation for a task group that requires 8GiB of memory on a client that has 32GiB total will occupy 25% of a client row.
+
+## Interact with the visualization
+
+The topology visualization is designed to present as much information as possible in a single view. More information can be expanded by hovering or clicking on specific elements.
+
+Hovering over allocations will open a tooltip that gives quick details including the specific allocation reservation requirements and the job the allocation belongs to.
+
+[![The allocation tooltip showing allocation information for the allocation under the cursor][allocation-tooltip]][allocation-tooltip]
+
+Clicking an allocation will select it and swap out the cluster aggregate statistics in the information panel with allocation information. This includes links to the allocation, the job the allocation belongs to, the client the allocation is running on, and the current resource utilization over time.
+
+[![The info panel of the topology visualization with allocation information included][alloc-info-panel]][alloc-info-panel]
+
+In addition to the information shown in the panel, when an allocation is selected, associations among all allocations for the same task group and job will be drawn. This helps convey the distribution of a single task group across a cluster.
+ +[![Lines drawn between allocations to show that they all belong to the same job and task group][alloc-associations]][alloc-associations] + +For large clusters, the topology visualization will hide the client labels. When this is the case, clicking a client will expand client details in the information panel. + +[![The info panel of the topology visualization with client information included][client-panel]][client-panel] + +## Effective use of the topology visualization + +The topology visualization is intended to be an open-ended exploration tool. Here are a few example explorations that the visualization is particularly well-suited for. + +### Identify excessive client provisioning + +Sometimes clients are provisioned separately from application sizing and placement. This can lead to a drift between the expected client requirements and the actual requirements. Clients with no allocations still cost money. + +This can be quickly detected with the topology visualization. Empty clients are highlighted in red and labeled as empty. + +[![An empty client in the cluster view emphasized in red][empty-clients]][empty-clients] + +~> Is this a problem you have? Consider using [horizontal cluster autoscaling](https://github.com/hashicorp/nomad-autoscaler). + +### Spot potentially hazardous or flaky allocation distributions + +Nomad will automatically place allocations based on the requirements and constraints declared in the job spec. It is not uncommon for jobs to have missing constraints due to human error or unknown caveats. This class of error will often make itself known when the task starts, but sometimes the error is invisible and only surfaces through a peculiar error rate. + +For instance, imagine a service Service A that has five allocations. Four are in datacenter West and one is in datacenter East. Service A must talk to Service B, but due to networking rules, they cannot communicate across the datacenter boundary. If Service B is in datacenter West, then 80% of Service A traffic (assuming a uniform distribution) will work as intended while the remaining 20% will error. Or worse than error: silently behave incorrectly. + +This is easily remedied with a datacenter constraint in the job spec, but the problem must first be identified. Since the topology visualization will associate all allocations for Service A, this can be quickly spotted. + +[![Allocations associated across datacenters][alloc-associations-across-dcs]][alloc-associations-across-dcs] + +### Find noisy neighbor risks + +By default, tasks in Nomad have soft CPU limits. This lets tasks occasionally spike over their allotted CPU while still allowing for efficient bin-packing of allocations on a single client. + +It is possible for many allocations on a single client to exceed their CPU soft-limit—or for one allocation to greatly exceed it—starving other allocations of CPU. This can cause degraded performance and anomalous errors to arise from untested race conditions or timeouts. In this case, the problematic allocation is only problematic due to the external circumstances of the client it was scheduled on. + +The topology visualization makes it very clear when an important allocation is scheduled alongside many other allocations on a densely packed client. This alone doesn't mean there is a noisy neighbor problem, but it might be enough to defensively modify the job spec. Adding more CPU headroom or constraints can help stabilize the service. 
+
+[![A single client in the topology visualization with many allocations][client-with-many-allocs]][client-with-many-allocs]
+
+## Next steps
+
+The topology visualization is a useful tool for learning to use Nomad and for understanding your cluster at a moment in time. It does _not_ show historical allocation reservation information.
+
+To get deeper utilization and historical data, you will need to set up a monitoring stack using Nomad's telemetry data. The topology visualization may inform your own custom dashboards as you invest in setting up operations tooling for your specific needs.
+
+1. [Use Prometheus to monitor Nomad metrics](/nomad/tutorials/manage-clusters/prometheus-metrics)
+2. [Review the full set of metrics Nomad exports](/nomad/docs/reference/metrics)
+
+[topo-viz-link]: /img/monitor/topo-viz/topo-viz-link.png
+[cluster-panel]: /img/monitor/topo-viz/cluster-panel.png
+[cluster-view]: /img/monitor/topo-viz/cluster-view.png
+[allocation-tooltip]: /img/monitor/topo-viz/allocation-tooltip.png
+[alloc-info-panel]: /img/monitor/topo-viz/allocation-panel.png
+[alloc-associations]: /img/monitor/topo-viz/allocation-associations.png
+[client-panel]: /img/monitor/topo-viz/client-panel.png
+[empty-clients]: /img/monitor/topo-viz/empty-clients.png
+[alloc-associations-across-dcs]: /img/monitor/topo-viz/alloc-associations-across-dcs.png
+[client-with-many-allocs]: /img/monitor/topo-viz/client-with-many-allocs.png
diff --git a/website/content/docs/monitor/event-stream.mdx b/website/content/docs/monitor/event-stream.mdx
new file mode 100644
index 000000000..28933c387
--- /dev/null
+++ b/website/content/docs/monitor/event-stream.mdx
@@ -0,0 +1,99 @@
+---
+layout: docs
+page_title: Monitor Nomad's event stream
+description: |-
+  Subscribe to Nomad's event stream to observe activities in your Nomad cluster.
+---
+
+# Monitor Nomad's event stream
+
+The event stream provides a way to subscribe to Job, Allocation, Evaluation,
+Deployment, and Node changes in near real time. Whenever a state change occurs in
+Nomad via Nomad's Finite State Machine (FSM), a set of events is created for
+each updated object.
+
+Currently, Nomad users must take a number of steps to build a mental model of
+what recent actions have occurred in their cluster. Individual log statements
+often provide a small fragment of information related to a unit of work within
+Nomad. Logs can also be interlaced with other unrelated logs, complicating the
+process of understanding and building context around the issue a user is trying
+to identify. Log statements may also be too specific to a small piece of work
+that took place, so the larger context around the log or action is missed or
+hard to infer.
+
+Another pain point is that the value of log statements depends on how they are
+managed by an organization. Larger teams with external log aggregators get
+more value out of standard logging than smaller teams that manually scan
+through files to debug issues. Today, an operator might combine the information
+returned by `/v1/nodes`, `/v1/evaluations`, `/v1/allocations`, and then
+`/v1/nodes` again using the Nomad API to try to figure out what exactly
+happened.
+
+This complex workflow has led to a lack of prescriptive guidance when working
+with customers and users on how to best monitor and debug their Nomad clusters.
+Third party tools such as `nomad-firehose` have also emerged to try to solve
+the issue by continuously scraping endpoints to react to changes within Nomad.
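+
+As a sketch of that manual workflow, an operator might poll several read
+endpoints and correlate the results by hand. This assumes a local agent at the
+default address; which fields you compare depends on what you are debugging:
+
+```shell-session
+# Poll the list endpoints repeatedly and diff the results between runs
+$ curl -s http://127.0.0.1:4646/v1/nodes
+$ curl -s http://127.0.0.1:4646/v1/evaluations
+$ curl -s http://127.0.0.1:4646/v1/allocations
+```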
+
+The event stream provides a first-class way to receive a stream of events from
+Nomad and gives users a much better understanding of how their cluster is
+operating out of the box.
+
+## Access the event stream
+
+The event stream is currently available only through the API. Access the event
+stream endpoint with the following command.
+
+```shell-session
+$ curl -s -v -N http://127.0.0.1:4646/v1/event/stream
+```
+
+## Subscribe to all or certain events
+
+Filter on certain topics by specifying one or more `topic` query parameters.
+This example listens only for Node events related to a specific node ID. It
+also listens to all Deployment events and to Job events related to a job named
+web.
+
+```shell-session
+$ curl -s -v -N \
+    --data-urlencode "topic=Node:ccc4ce56-7f0a-4124-b8b1-a4015aa82c40" \
+    --data-urlencode "topic=Deployment" \
+    --data-urlencode "topic=Job:web" \
+    http://127.0.0.1:4646/v1/event/stream
+```
+
+## What is in an event?
+
+Each event contains a `Topic`, `Type`, `Key`, `Namespace`, `FilterKeys`, `Index`,
+and `Payload`. The contents of the Payload depend on the event Topic. An event
+for the Node Topic contains a Node object. For example:
+
+```json
+{
+  "Topic": "Node",
+  "Type": "NodeRegistration",
+  "Key": "afb9c810-d701-875a-b2a6-f2631d7c2f60",
+  "Namespace": "",
+  "FilterKeys": null,
+  "Index": 7,
+  "Payload": {
+    "Node": {
+      "//": "...entire node object"
+    }
+  }
+}
+```
+
+## Develop using the event stream
+
+Here are some patterns of how you might use Nomad's event stream.
+
+- Subscribe to all or a subset of cluster events.
+
+- Add an additional tool to your regular debugging and monitoring workflow as an
+  SRE to gauge the qualitative state and health of your cluster.
+
+- Trace through a specific job deployment as it progresses from an evaluation to
+  a deployment and uncover any blockers in the scheduler's path.
+
+- Build a Slack bot integration to send deploy notifications.
  - https://github.com/drewbailey/nomad-deploy-notifier
diff --git a/website/content/docs/operations/monitoring-nomad.mdx b/website/content/docs/monitor/index.mdx
similarity index 96%
rename from website/content/docs/operations/monitoring-nomad.mdx
rename to website/content/docs/monitor/index.mdx
index 41a8f01bf..5e2352103 100644
--- a/website/content/docs/operations/monitoring-nomad.mdx
+++ b/website/content/docs/monitor/index.mdx
@@ -291,14 +291,14 @@ latency and packet loss for the [Serf] address.
[alerting-rules]: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/ [alertmanager]: https://prometheus.io/docs/alerting/alertmanager/ -[allocation-metrics]: /nomad/docs/operations/metrics-reference#allocation-metrics +[allocation-metrics]: /nomad/docs/reference/metrics#allocation-metrics [circonus-telem]: /nomad/docs/configuration/telemetry#circonus-apica [collection-interval]: /nomad/docs/configuration/telemetry#collection_interval [datadog-alerting]: https://www.datadoghq.com/blog/monitoring-101-alerting/ [datadog-telem]: /nomad/docs/configuration/telemetry#datadog -[draining]: /nomad/tutorials/manage-clusters/node-drain +[draining]: /nomad/docs/manage/migrate-workloads [gh-11830]: https://github.com/hashicorp/nomad/pull/11830 -[metric-types]: /nomad/docs/operations/metrics-reference#metric-types +[metric-types]: /nomad/docs/reference/metrics#metric-types [metrics-api-endpoint]: /nomad/api-docs/metrics [prometheus-telem]: /nomad/docs/configuration/telemetry#prometheus [`plan_rejection_tracker`]: /nomad/docs/configuration/server#plan_rejection_tracker @@ -306,8 +306,8 @@ latency and packet loss for the [Serf] address. [statsd-exporter]: https://github.com/prometheus/statsd_exporter [statsd-telem]: /nomad/docs/configuration/telemetry#statsd [statsite-telem]: /nomad/docs/configuration/telemetry#statsite -[tagged-metrics]: /nomad/docs/operations/metrics-reference#tagged-metrics +[tagged-metrics]: /nomad/docs/reference/metrics#tagged-metrics [telemetry-block]: /nomad/docs/configuration/telemetry -[Consensus Protocol (Raft)]: /nomad/docs/operations/monitoring-nomad#consensus-protocol-raft -[Job Summary Metrics]: /nomad/docs/operations/metrics-reference#job-summary-metrics -[Scheduling]: /nomad/docs/concepts/scheduling/scheduling +[Consensus Protocol (Raft)]: /nomad/docs/monitor#raft-consensus-protocol +[Job Summary Metrics]: /nomad/docs/reference/metrics#job-summary-metrics +[Scheduling]: /nomad/docs/concepts/scheduling/how-scheduling-works diff --git a/website/content/docs/monitor/inspect-cluster.mdx b/website/content/docs/monitor/inspect-cluster.mdx new file mode 100644 index 000000000..61d307c51 --- /dev/null +++ b/website/content/docs/monitor/inspect-cluster.mdx @@ -0,0 +1,172 @@ +--- +layout: docs +page_title: Inspect the cluster +description: |- + View the server and client nodes, inspect an individual server or client node, + list allocations, and review client events from the Nomad web UI. +--- + +# Inspect the cluster + +The Web UI can be a powerful tool for monitoring the state of the Nomad cluster +from an operator's perspective. This includes showing all client nodes, showing +driver health for client nodes, driver status, resource utilization, allocations +by client node, and more. + +## View cluster clients + +From any page, the Clients List page can be accessed from the left-hand +navigation bar. On narrow screens this navigation bar may need to be opened from +the top-right menu button. The table lists every client in the cluster and is +searchable, sortable, and filterable. Each client row in the table shows basic +information, such as the Node ID, name, state, address, datacenter, and how many +allocations are running in it. + +This view will also live-update as the states of client nodes change. + +[![Clients List][img-clients-list]][img-clients-list] + +## Filter the client view + +If your Nomad cluster has many client nodes, it can be useful to filter the list +of all client nodes down to only those matching certain facets. 
The Web UI has
+three facets you can filter by:
+
+1. **Class:** The node class of the client, including a dynamically generated
+   list based on the node class of each client node in the cluster.
+
+1. **State:** The state of the client node, including Initializing, Ready, Down,
+   Ineligible, and Draining.
+
+1. **Datacenter:** The datacenter the client node is in, including a dynamically
+   generated list based on all the datacenters in the cluster.
+
+[![Clients filters][img-clients-filters]][img-clients-filters]
+
+## Inspect an individual client
+
+From the Clients List page, clicking a client node in the table will direct you
+to the Client Detail page for the client node. This page includes all
+information about the client node and is live-updated to always present
+up-to-date information.
+
+[![Client Detail][img-client-detail]][img-client-detail]
+
+## Monitor resource utilization
+
+Nomad has APIs for reading point-in-time resource utilization metrics for client
+nodes. The Web UI uses these metrics to create time-series graphs for the
+current session.
+
+When viewing a client node, resource utilization will automatically start
+logging.
+
+[![Client Resource Utilization][img-client-resource-utilization]][img-client-resource-utilization]
+
+## List client allocations
+
+Allocations belong to jobs and are placed on client nodes. The Client Detail
+page will list all allocations for a client node, including completed, failed,
+and lost allocations, until they are garbage-collected.
+
+This is presented in a searchable table which can additionally be filtered to
+show only preempted allocations.
+
+[![Client Allocations][img-client-allocations]][img-client-allocations]
+
+## Review client events
+
+Client nodes will also emit events on meaningful state changes, such as when the
+node becomes ready for scheduling or when a driver becomes unhealthy.
+
+[![Client Events][img-client-events]][img-client-events]
+
+## Check client driver status
+
+Task drivers are additional services running on a client node. Nomad will
+fingerprint and communicate with the task driver to determine if the driver is
+available and healthy. This information is reported through the Web UI on the
+Client Detail page.
+
+[![Client Driver Status][img-client-driver-status]][img-client-driver-status]
+
+## View client attributes
+
+In order to allow job authors to constrain the placement of their jobs, Nomad
+fingerprints the hardware of the node the client agent is running on. This is a
+deeply nested document of properties that the Web UI presents in a scannable
+way.
+
+In addition to the hardware attributes, Nomad operators can annotate a client
+node with [metadata] as part of the client configuration. This metadata is also
+presented on the Client Detail page.
+
+[![Client Attributes][img-client-attributes]][img-client-attributes]
+
+## Monitor a node drain
+
+A routine part of maintaining a Nomad cluster is draining nodes of allocations.
+This can be in preparation for performing operating system upgrades or
+decommissioning an old node in favor of a new VM.
+
+Drains are [performed from the CLI], but the status of a drain can be monitored
+from the Web UI. A client node will state if it is actively draining or
+ineligible for scheduling.
+
+Since drains can be configured in a variety of ways, the Client Detail page will
+also present the details of how the drain is performed.
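+
+As a minimal sketch of the CLI side of this workflow, you might start a drain
+with a deadline and then watch its progress from the Client Detail page. The
+node ID below is a placeholder:
+
+```shell-session
+# Mark the node as draining and give allocations up to 30 minutes to migrate
+$ nomad node drain -enable -deadline 30m <node-id>
+```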
+ +[![Client Drain][img-client-drain]][img-client-drain] + +## View cluster servers + +Whereas client nodes are used to run your jobs, server nodes are used to run +Nomad and maintain availability. From any page, the Servers List page can be +accessed from the left-hand navigation bar. + +The table lists every server node in your cluster. This will be a small list— +[typically three or five]. + +[![Servers List][img-servers-list]][img-servers-list] + +## Inspect an individual server + +Clicking a server node on the Servers List will expand the tags table for the +server node. + +[![Server Detail][img-server-detail]][img-server-detail] + +## Secure the UI + +Depending on the size of your team and the details of you Nomad deployment, you +may wish to control which features different internal users have access to. This +includes limiting who has access to list and manage client nodes and list and +manage server nodes. You can enforce this with Nomad's access control list +system. + +By default, all features—read and write—are available to all users of the Web +UI. Check out the [Securing the Web UI with ACLs] tutorial to learn how to prevent +anonymous users from having write permissions as well as how to continue to use +Web UI write features as a privileged user. + +## Continue your exploration + +Now that you have explored how to inspect the state of your cluster through the +Nomad UI, you will next learn some considerations you should keep in mind when +using the Nomad UI. + +[img-client-allocations]: /img/monitor/guide-ui-img-client-allocations.png +[img-client-attributes]: /img/monitor/guide-ui-img-client-attributes.png +[img-client-detail]: /img/monitor/guide-ui-img-client-detail.png +[img-client-drain]: /img/monitor/guide-ui-img-client-drain.png +[img-client-driver-status]: /img/monitor/guide-ui-img-client-driver-status.png +[img-client-events]: /img/monitor/guide-ui-img-client-events.png +[img-client-resource-utilization]: /img/monitor/guide-ui-img-client-resource-utilization.png +[img-clients-filters]: /img/monitor/guide-ui-img-clients-filters.png +[img-clients-list]: /img/monitor/guide-ui-img-clients-list.png +[img-server-detail]: /img/monitor/guide-ui-img-server-detail.png +[img-servers-list]: /img/monitor/guide-ui-img-servers-list.png +[metadata]: /nomad/docs/configuration/client#meta +[performed from the cli]: /nomad/docs/manage/migrate-workloads +[securing the web ui with acls]: /nomad/tutorials/access-control +[typically three or five]: /nomad/docs/architecture/cluster/consensus#deployment-table diff --git a/website/content/docs/monitor/inspect-workloads.mdx b/website/content/docs/monitor/inspect-workloads.mdx new file mode 100644 index 000000000..81408120a --- /dev/null +++ b/website/content/docs/monitor/inspect-workloads.mdx @@ -0,0 +1,221 @@ +--- +layout: docs +page_title: Inspect job workloads +description: |- + Inspect jobs, allocations, and tasks in the Nomad web UI and interact + with these objects to perform operations like restart an allocation or + stop a job. +--- + +# Inspect job workloads + +The Web UI can be a powerful companion when monitoring and debugging jobs +running in Nomad. The Web UI will list all jobs, link jobs to allocations, +allocations to client nodes, client nodes to driver health, and much more. + +## List jobs + +The first page you will arrive at in the Web UI is the Jobs List page. Here you +will find every job for a namespace in a region. The table of jobs is +searchable, sortable, and filterable. 
Each job row in the table shows basic +information, such as job name, status, type, and priority, as well as richer +information such as a visual representation of all allocation statuses. + +This view will also live-update as jobs get submitted, get purged, and change +status. + +[![Jobs List][img-jobs-list]][img-jobs-list] + +## Filter job view + +If your Nomad cluster has many jobs, it can be useful to filter the list of all +jobs down to only those matching certain facets. The Web UI has four facets you +can filter by: + +1. **Type:** The type of job, including Batch, Parameterized, Periodic, Service, + and System. + +1. **Status:** The status of the job, including Pending, Running, and Dead. + +1. **Datacenter:** The datacenter the job is running in, including a dynamically + generated list based on the jobs in the namespace. + +1. **Prefix:** The possible common naming prefix for a job, including a + dynamically generated list based on job names up to the first occurrence of + `-`, `.`, and `_`. Only prefixes that match multiple jobs are included. + +[![Job Filters][img-job-filters]][img-job-filters] + +## Inspect an allocation + +In Nomad, allocations are the schedulable units of work. This is where runtime +metrics begin to surface. An allocation is composed of one or more tasks, and +the utilization metrics for tasks are aggregated so they can be observed at the +allocation level. + +### Monitor resource utilization + +Nomad has APIs for reading point-in-time resource utilization metrics for tasks +and allocations. The Web UI uses these metrics to create time-series graphs for +the current session. + +When viewing an allocation, resource utilization will automatically start +logging. + +[![Allocation Resource Utilization][img-alloc-resource-utilization]][img-alloc-resource-utilization] + +### Review task events + +When Nomad places, prepares, and starts a task, a series of task events are +emitted to help debug issues in the event that the task fails to start. + +Task events are listed on the Task Detail page and live-update as Nomad handles +managing the task. + +[![Task Events][img-task-events]][img-task-events] + +### View rescheduled allocations + +Allocations will be placed on any client node that satisfies the constraints of +the job definition. There are events, however, that will cause Nomad to +reschedule allocations, (e.g., node failures). + +Allocations can be configured [in the job definition to reschedule] to a +different client node if the allocation ends in a failed status. This will +happen after the task has exhausted its [local restart attempts]. + +The end result of this automatic procedure is a failed allocation and that +failed allocation's rescheduled successor. Since Nomad handles all of this +automatically, the Web UI makes sure to explain the state of allocations through +icons and linking previous and next allocations in a reschedule chain. + +[![Allocation Reschedule Icon][img-alloc-reschedule-icon]][img-alloc-reschedule-icon] + +[![Allocation Reschedule Details][img-alloc-reschedule-details]][img-alloc-reschedule-details] + +### Unhealthy driver + +Given the nature of long-lived processes, it's possible for the state of the +client node an allocation is scheduled on to change during the lifespan of the +allocation. Nomad attempts to monitor pertinent conditions including driver +health. + +The Web UI denotes when a driver an allocation depends on is unhealthy on the +client node the allocation is running on. 
+
+[![Allocation Unhealthy Driver][img-alloc-unhealthy-driver]][img-alloc-unhealthy-driver]
+
+### Preempted allocations
+
+Much like how Nomad will automatically reschedule allocations, Nomad will
+automatically preempt allocations when necessary. When monitoring allocations in
+Nomad, it's useful to know what allocations were preempted and what job caused
+the preemption.
+
+The Web UI makes sure to tell this full story by showing which allocation caused
+an allocation to be preempted as well as the opposite: what allocations an
+allocation preempted. This makes it possible to traverse down from a job to a
+preempted allocation, to the allocation that caused the preemption, to the job
+that the preempting allocation is for.
+
+[![Allocation Preempter][img-alloc-preempter]][img-alloc-preempter]
+
+[![Allocation Preempted][img-alloc-preempted]][img-alloc-preempted]
+
+## Review task logs
+
+A task will typically emit log information to `stdout` and `stderr`. Nomad
+captures these logs and exposes them through an API. The Web UI uses these APIs
+to offer `head`, `tail`, and streaming logs from the browser.
+
+The Web UI will first attempt to directly connect to the client node the task is
+running on. Typically, client nodes are not accessible from the public internet.
+If this is the case, the Web UI will fall back and proxy to the client node from
+the server node with no loss of functionality.
+
+[![Task Logs][img-task-logs]][img-task-logs]
+
+~> Not all browsers support streaming HTTP requests. In the event that streaming
+is not supported, logs will still be followed using interval polling.
+
+## Restart or stop an allocation or task
+
+Nomad allows for restarting and stopping individual allocations and tasks. When
+a task is restarted, Nomad will perform a local restart of the task. When an
+allocation is stopped, Nomad will mark the allocation as complete and perform a
+reschedule onto a different client node.
+
+Both of these features are also available in the Web UI.
+
+[![Allocation Stop and Restart][img-alloc-stop-restart]][img-alloc-stop-restart]
+
+## Force a periodic instance
+
+Periodic jobs are configured like cron jobs. Sometimes, you might want to start
+the job before its next scheduled run time. Nomad calls this a [periodic
+force] and it can be done from the Web UI on the Job Overview page for a
+periodic job.
+
+[![Periodic Force][img-periodic-force]][img-periodic-force]
+
+## Submit a new version of a job
+
+From the Job Definition page, a job can be edited. After clicking the Edit
+button in the top-right corner of the code window, the job definition JSON
+becomes editable. The edits can then be planned and scheduled.
+
+[![Job Definition Edit][img-job-definition-edit]][img-job-definition-edit]
+
+~> Since each job within a namespace must have a unique name, it is possible to
+submit a new version of a job from the Run Job screen. Always review the plan
+output.
+
+## Monitor a deployment
+
+When a system or service job includes the [`update` stanza], a deployment is
+created upon job submission. Job deployments can be monitored in real time from
+the Web UI.
+
+The Web UI will show new allocations as they are placed, tallying towards the
+expected total, and will tally allocations as they become healthy or unhealthy.
+
+Optionally, a job may use canary deployments to allow for additional health
+checks or manual testing before a full rollout.
If a job uses canaries and is +not configured to automatically promote the canary, the canary promotion +operation can be done from the Job Overview page in the Web UI. + +[![Job Deployment with Canary Promotion][img-job-deployment-canary]][img-job-deployment-canary] + +## Stop a job + +Jobs can be stopped from the Job Overview page. Stopping a job will gracefully +stop all allocations, marking them as complete, and freeing up resources in the +cluster. + +[![Job Stop][img-job-stop]][img-job-stop] + +## Continue your exploration + +Now that you have explored operations that can be performed with jobs through +the Nomad UI, learn how to inspect the state of your cluster using the Nomad UI. + +[`update` stanza]: /nomad/docs/job-specification/update +[img-alloc-preempted]: /img/monitor/guide-ui-img-alloc-preempted.png +[img-alloc-preempter]: /img/monitor/guide-ui-img-alloc-preempter.png +[img-alloc-reschedule-details]: /img/monitor/guide-ui-img-alloc-reschedule-details.png +[img-alloc-reschedule-icon]: /img/monitor/guide-ui-img-alloc-reschedule-icon.png +[img-alloc-resource-utilization]: /img/monitor/guide-ui-img-alloc-resource-utilization.png +[img-alloc-stop-restart]: /img/monitor/guide-ui-img-alloc-stop-restart.png +[img-alloc-unhealthy-driver]: /img/monitor/guide-ui-img-alloc-unhealthy-driver.png +[img-job-definition-edit]: /img/monitor/guide-ui-img-job-definition-edit.png +[img-job-deployment-canary]: /img/monitor/guide-ui-img-job-deployment-canary.png +[img-job-filters]: /img/monitor/guide-ui-img-job-filters.png +[img-job-stop]: /img/monitor/guide-ui-img-job-stop.png +[img-jobs-list]: /img/monitor/guide-ui-jobs-list.png +[img-periodic-force]: /img/monitor/guide-ui-img-periodic-force.png +[img-task-events]: /img/monitor/guide-ui-img-task-events.png +[img-task-logs]: /img/monitor/guide-ui-img-task-logs.png +[in the job definition to reschedule]: /nomad/docs/job-specification/reschedule +[local restart attempts]: /nomad/docs/job-specification/restart +[periodic force]: /nomad/commands/job/periodic-force +[securing the web ui with acls]: /nomad/tutorials/access-control diff --git a/website/content/docs/networking/cni.mdx b/website/content/docs/networking/cni.mdx index 19376bddb..d81072b3c 100644 --- a/website/content/docs/networking/cni.mdx +++ b/website/content/docs/networking/cni.mdx @@ -39,8 +39,8 @@ Perform the following on each Nomad client: 1. [Create a bridge mode configuration](#create-a-cni-bridge-mode-configuration). 1. [Configure Nomad clients](#configure-nomad-clients). -After you configure and restart your Nomad clients, [use a CNI network with a -job](#use-a-cni-network-with-a-job). +After you configure and restart your Nomad clients, refer to [Use a CNI network +with a job][] for job configuration. ### Install CNI reference plugins @@ -242,24 +242,13 @@ client {
-## Use a CNI network with a job +## Next steps -To specify that a job should use a CNI network, set the task group's network -[`mode`](/nomad/docs/job-specification/network#mode) attribute to the value -`cni/`. Nomad then schedules the workload on client nodes -that have fingerprinted a CNI configuration with the given name. For example, to -use the configuration named `mynet`, you should set the task group's network -mode to `cni/mynet`. Nodes that have a network configuration defining a network -named `mynet` in their `cni_config_dir` are eligible to run the workload. Nomad -additionally supplies the following arguments via `CNI_ARGS` to the CNI network: -`NOMAD_REGION`, `NOMAD_NAMESPACE`, `NOMAD_JOB_ID`, `NOMAD_GROUP_NAME`, and -`NOMAD_ALLOC_ID`. Since the `CNI_ARGS` do not allow values to contain a semicolon -Nomad will not set keys where the value contains a semicolon (this could happen -with the job ID). CNI plugins utilizing `NOMAD_*` CNI arguments are advised to -apply a defensive policy or simply error out. + Refer to [Use a CNI network with a job][] for job configuration details. [cni-spec]: https://www.cni.dev/docs/spec/ [cni-plugin-docs]: https://www.cni.dev/plugins/current/ [bridge]: https://www.cni.dev/plugins/current/main/bridge/ [firewall]: https://www.cni.dev/plugins/current/meta/firewall/ [portmap]: https://www.cni.dev/plugins/current/meta/portmap/ +[Use a CNI network with a job]: /nomad/docs/job-networking/cni diff --git a/website/content/docs/integrations/consul/index.mdx b/website/content/docs/networking/consul/index.mdx similarity index 75% rename from website/content/docs/integrations/consul/index.mdx rename to website/content/docs/networking/consul/index.mdx index 1b5a3aa6d..4be4c72d7 100644 --- a/website/content/docs/integrations/consul/index.mdx +++ b/website/content/docs/networking/consul/index.mdx @@ -92,43 +92,7 @@ information. ## Consul namespaces -Nomad provides integration with [Consul Namespaces][consul_namespaces] for -service registrations specified in `service` blocks and Consul KV reads in -`template` blocks. - -By default, Nomad will not specify a Consul namespace on service registrations -or KV store reads, which Consul then implicitly resolves to the `"default"` -namespace. This default namespace behavior can be modified by setting the -[`namespace`][consul_agent_namespace] field in the Nomad agent Consul -configuration block. - -For more control over Consul namespaces, Nomad Enterprise supports configuring -the Consul [namespace][consul_jobspec_namespace] at the group or task level in -the Nomad job spec as well as the [`-consul-namespace`][consul_run_namespace] -command line argument for `job run`. - -The Consul namespace used for a set of group or task service registrations -within a group, as well as `template` KV store access is determined from the -following hierarchy from highest to lowest precedence: - -* group and task configuration: Consul - [namespace field][consul_jobspec_namespace] defined in the job at the task or - group level. - -* job run command option: Consul namespace defined in the - [`-consul-namespace`][consul_run_namespace] command line option on job - submission. - -* job run command environment various: Consul namespace defined as the - [`CONSUL_NAMESPACE`][consul_env_namespace] environment variable on job - submission. - -* agent configuration: Consul namespace defined in the - [`namespace`][consul_agent_namespace] Nomad agent Consul configuration - parameter. 
- -* Consul default: If no Consul namespace options are configured, Consul will - automatically make use of the `"default"` namespace. +@include 'consul-namespaces.mdx' ## Multiple Consul clusters @@ -178,7 +142,7 @@ Consul, with some exceptions. | Nomad 1.7.0+ | ✅ | ✅ | ✅ | | Nomad 1.6.0+ | ✅ | ✅ | ✅ | -[Automatic Clustering with Consul]: /nomad/tutorials/manage-clusters/clustering +[Automatic Clustering with Consul]: /nomad/docs/deploy/clusters/connect-nodes [CDP]: /consul/docs/connect/dataplane [Consul Template]: https://github.com/hashicorp/consul-template [Consul]: https://www.consul.io/ "Consul by HashiCorp" @@ -186,15 +150,10 @@ Consul, with some exceptions. [`consul.cluster`]: /nomad/docs/job-specification/consul#cluster [`template` job specification documentation]: /nomad/docs/job-specification/template#consul-integration [`template`]: /nomad/docs/job-specification/template -[consul_agent_namespace]: /nomad/docs/configuration/consul#namespace -[consul_jobspec_namespace]: /nomad/docs/job-specification/consul#namespace -[consul_namespaces]: /consul/docs/enterprise/namespaces -[consul_run_namespace]: /nomad/docs/commands/job/run#consul-namespace -[consul_env_namespace]: /consul/commands#consul_namespace -[int_consul_acl]: /nomad/docs/integrations/consul/acl -[int_consul_mesh]: /nomad/docs/integrations/consul/service-mesh +[int_consul_acl]: /nomad/docs/secure/acl/consul +[int_consul_mesh]: /nomad/docs/networking/consul/service-mesh [nomad_config_consul]: /nomad/docs/configuration/consul -[service]: /nomad/docs/job-specification/service "Nomad service Job Specification" +[service]: /nomad/docs/job-specification/service [bridge or CNI networking mode]: /nomad/docs/job-specification/network#mode [nameserver]: /nomad/docs/job-specification/network#servers [transparent proxy]: /nomad/docs/job-specification/transparent_proxy diff --git a/website/content/docs/integrations/consul/service-mesh.mdx b/website/content/docs/networking/consul/service-mesh.mdx similarity index 86% rename from website/content/docs/integrations/consul/service-mesh.mdx rename to website/content/docs/networking/consul/service-mesh.mdx index 2be9d9928..d2e969944 100644 --- a/website/content/docs/integrations/consul/service-mesh.mdx +++ b/website/content/docs/networking/consul/service-mesh.mdx @@ -7,6 +7,24 @@ description: |- # Consul service mesh +## Introduction + +Service mesh is a networking pattern that deploys and configures +infrastructure to directly connect workloads. One of the most common pieces of +infrastructure deployed are sidecar proxies. These proxies usually run +alongside the main workload in an isolated network namespace such that all +network traffic flows through the proxy. + +The proxies are often referred to as the **data plane** since they are +responsible for _moving data_ while the components that configure them are part +of the **control plane** because they are responsible for controlling the _flow +of data_. + +By funneling traffic through a common layer of infrastructure the control plane +is able to centralize and automatically apply configuration to all proxies to +enable features such as automated traffic encryption, fine-grained routing, and +service-based access control permissions throughout the entire mesh. + [Consul service mesh](/consul/docs/connect) provides service-to-service connection authorization and encryption using mutual Transport Layer Security (TLS). 
Applications can use sidecar proxies in a @@ -16,7 +34,7 @@ inbound and outbound connections without being aware of the service mesh at all. ~> **Note:** Nomad's service mesh integration requires Linux network namespaces. Consul service mesh will not run on Windows or macOS. -# Nomad with Consul service mesh integration +## Nomad with Consul service mesh integration Nomad integrates with Consul to provide secure service-to-service communication between Nomad jobs and task groups. To support Consul service mesh, Nomad @@ -37,7 +55,7 @@ For using the Consul service mesh integration with Consul ACLs enabled, see the [Secure Nomad Jobs with Consul Service Mesh](/nomad/tutorials/integrate-consul/consul-service-mesh) guide. -# Nomad Consul service mesh example +## Nomad Consul service mesh example The following section walks through an example to enable secure communication between a web dashboard and a backend counting service. The web dashboard and @@ -46,9 +64,9 @@ proxies to run along side these applications. The dashboard is configured to connect to the counting service via localhost on port 9001. The proxy is managed by Nomad, and handles mTLS communication to the counting service. -## Prerequisites +### Prerequisites -### Consul +#### Consul The Consul service mesh integration with Nomad requires [Consul 1.6 or later.](https://releases.hashicorp.com/consul/1.6.0/) The Consul agent can be @@ -95,7 +113,7 @@ For JSON configurations: } ``` -#### Consul TLS +##### Consul TLS ~> **Note:** Consul 1.14+ made a [backwards incompatible change][consul_grpc_tls] in how TLS enabled grpc listeners work. When using Consul 1.14 with TLS enabled users @@ -116,7 +134,7 @@ consul { } ``` -#### Consul access control lists +##### Consul access control lists ~> **Note:** Starting in Nomad v1.3.0, Consul Service Identity ACL tokens automatically generated by Nomad on behalf of Connect enabled services are now created in [`Local`] @@ -133,7 +151,7 @@ service_prefix "" { policy = "read" } node_prefix "" { policy = "read" } ``` -#### Transparent proxy +##### Transparent proxy Using Nomad's support for [transparent proxy][] configures the task group's network namespace so that traffic flows through the Envoy proxy. When the @@ -172,7 +190,7 @@ bind_addr = "{{ GetPrivateInterfaces | include \"network\" \"10.37.105.0/20\" recursors = ["208.67.222.222", "208.67.220.220"] ``` -### Nomad +#### Nomad Nomad must schedule onto a routable interface in order for the proxies to connect to each other. The following steps show how to start a Nomad dev agent @@ -182,16 +200,16 @@ configured for Consul service mesh. $ sudo nomad agent -dev-connect ``` -### Container Network Interface (CNI) plugins +#### Container Network Interface (CNI) plugins Nomad uses CNI reference plugins to configure the network namespace used to secure the Consul service mesh sidecar proxy. All Nomad client nodes using network namespaces must have these CNI plugins [installed][cni_install]. To use [`transparent_proxy`][] mode, Nomad client nodes will also need the -[`consul-cni`][] plugin installed. See the Linux post-installation [steps](/nomad/docs/install#linux-post-installation-steps) for more detail on how to install CNI plugins. +[`consul-cni`][] plugin installed. See the Linux post-installation [steps](/nomad/docs/deploy#linux-post-installation-steps) for more detail on how to install CNI plugins. 
-## Run the service mesh-enabled services
+### Run the service mesh-enabled services
 
 Once Nomad and Consul are running, with Consul DNS enabled for transparent
 proxy mode as described above, submit the following service mesh-enabled services to
@@ -269,7 +287,7 @@ job "countdash" {
 
 The job contains two task groups: an API service and a web frontend.
 
-### API service
+#### API service
 
 The API service is defined as a task group with a bridge network:
 
@@ -316,7 +334,7 @@ Envoy proxy will automatically route traffic to that port inside the network
 namespace. Note that currently this cannot be a named port; it must be a
 hard-coded port value. See [GH-9907].
 
-### Web Frontend
+#### Web Frontend
 
 The web frontend is defined as a task group with a bridge network and a static
 forwarded port:
 
@@ -381,7 +399,7 @@ that the `count-api.virtual.consul` name resolves to a virtual IP address. Note
 that you don't need to specify a port number because the virtual IP will only
 be directed to the correct service port.
 
-### Manually configured upstreams
+#### Manually configured upstreams
 
 You can also use Connect without Consul DNS and `transparent_proxy` mode. This
 approach is not recommended because it requires duplicating service intention
@@ -447,6 +465,35 @@ This environment variable value gets interpolated with the upstream's address.
 Note that dashes (`-`) are converted to underscores (`_`) in environment
 variables so `count-api` becomes `count_api`.
 
+### Envoy proxy
+
+Consul Service Mesh uses [Envoy][] as its proxy. Nomad calls Consul's [`consul
+connect envoy -bootstrap`][consul_cli_envoy] CLI command to generate the
+initial proxy configuration.
+
+Nomad injects a prestart sidecar Docker task to run the Envoy proxy. This task
+can be customized using the [`sidecar_task`][] block.
+
+### Gateways
+
+Since the mesh defines a closed boundary that only selected services can
+participate in, there are specialized proxies called gateways that can be used
+for mesh-wide connectivity. Nomad can deploy these gateways using the
+[`gateway`][] block. Nomad injects an Envoy proxy task into any `group` with a
+`gateway` service.
+
+The types of gateways provided by Consul Service Mesh are:
+
+- **Mesh gateways** allow communication between different service meshes and
+  are deployed using the [`mesh`][] parameter.
+
+- **Ingress gateways** allow services outside the mesh to connect to services
+  inside the mesh and are deployed using the [`ingress`][] parameter.
+
+- **Egress gateways** allow services inside the mesh to communicate with
+  services outside the mesh and are deployed using the [`terminating`][]
+  parameter.
+
 ## Limitations
 
 - The minimum Consul version to use Connect with Nomad is Consul v1.8.0.
@@ -504,7 +551,7 @@ nomad node meta apply -node-id $nodeID \ [anon_token]: /consul/docs/security/acl/tokens#special-purpose-tokens [consul_ports]: /consul/docs/agent/config/config-files#ports [consul_grpc_tls]: /consul/docs/upgrading/upgrade-specific#changes-to-grpc-tls-configuration -[cni_install]: /nomad/docs/install#linux-post-installation-steps +[cni_install]: /nomad/docs/deploy#linux-post-installation-steps [transparent proxy]: /consul/docs/k8s/connect/transparent-proxy [go-sockaddr/template]: https://pkg.go.dev/github.com/hashicorp/go-sockaddr/template [`recursors`]: /consul/docs/agent/config/config-files#recursors diff --git a/website/content/docs/networking/index.mdx b/website/content/docs/networking/index.mdx index 8ee25dfd6..c8247b9b5 100644 --- a/website/content/docs/networking/index.mdx +++ b/website/content/docs/networking/index.mdx @@ -56,13 +56,13 @@ interface to the allocation. Mapping](/img/networking/port_mapping.png)](/img/networking/port_mapping.png) Tasks can access the selected port number using the -[`NOMAD_PORT_