Commit Graph

4183 Commits

Author SHA1 Message Date
Tim Gross
e4e445ed1d docs: fix an anchor link in secure vars docs (#14231) 2022-08-23 10:46:24 -04:00
Seth Hoenig
e49bb89093 Merge pull request #14215 from hashicorp/docs-update-checks-for-nsd
docs: update check documentation with NSD specifics
2022-08-23 09:23:53 -05:00
Seth Hoenig
701c926b45 docs: fix checks doc typo
Co-authored-by: Piotr Kazmierczak <phk@mm.st>
2022-08-23 09:23:36 -05:00
Tim Gross
2eaf3d7270 allow ACL policies to be associated with workload identity (#14140)
The original design for workload identities and ACLs allows for operators to
extend the automatic capabilities of a workload by using a specially-named
policy. This has shown to be potentially unsafe because of naming collisions, so
instead we'll allow operators to explicitly attach a policy to a workload
identity.

This changeset adds workload identity fields to ACL policy objects and threads
that all the way down to the command line. It also a new secondary index to the
ACL policy table on namespace and job so that claim resolution can efficiently
query for related policies.
2022-08-22 16:41:21 -04:00
Luiz Aoqui
934bafb922 template: use pointer values for gid and uid (#14203)
When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid was being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.

In order to differentiate between an explicit 0 and a template where
these properties were not set we need to use a pointer.
2022-08-22 16:25:49 -04:00
Seth Hoenig
c7ab7a8313 docs: update check documentation with NSD specifics
This PR updates the checks documentation to mention support for checks
when using the Nomad service provider. There are limitations of NSD
compared to Consul, and those configuration options are now noted as
being Consul-only.
2022-08-22 10:50:26 -05:00
Phil Renaud
3bf5633446 [ui] general keyboard navigation: 1.3.4 release (#14138)
* Initialized keyboard service

Neat but funky: dynamic subnav traversal

👻

generalized traverseSubnav method

Shift as special modifier key

Nice little demo panel

Keyboard shortcuts keycard

Some animation styles on keyboard shortcuts

Handle situations where a link is deeply nested from its parent menu item

Keyboard service cleanup

helper-based initializer and teardown for new contextual commands

Keyboard shortcuts modal component added and demo-ghost removed

Removed j and k from subnav traversal

Register and unregister methods for subnav plus new subnavs for volumes and volume

register main nav method

Generalizing the register nav method

12762 table keynav (#12975)

* Experimental feature: shortcut visual hints

* Long way around to a custom modifier for keyboard shortcuts

* dynamic table and list iterative shortcuts

* Progress with regular old tether

* Delogging

* Table Keynav tether fix, server and client navs, and fix to shiftless on modified arrow keys

Go to Optimize keyboard link and storage key changed to g r

parameterized jobs keyboard nav

Dynamic numeric keynav for multiple tables (#13482)

* Multiple tables init

* URL-bind enumerable keyboard commands and add to more taskRow and allocationRows

* Type safety and lint fixes

* Consolidated push to keyCommands

* Default value when removing keyCommands

* Remove the URL-based removal method and perform a recompute on any add

Get tests passing in Keynav: remove math helpers and a few other defensive moves (#13761)

* Remove ember math helpers

* Test fixes for jobparts/body

* Kill an unneeded integration helper test

* delog

* Trying if disabling percy lets this finish

* Okay so its not percy; try parallelism in circle

* Percyless yet again

* Trying a different angle to not have percy

* Upgrade percy to 1.6.1

[ui] Keyboard nav: "u" key to go up a level (#13754)

* U to go up a level

* Mislabelled my conditional

* Custom lint ignore rule

* Custom lint ignore rule, this time with commas

* Since we're getting rid of ember math helpers elsewhere, do the math ourselves here

Replace ArrowLeft etc. with an ascii arrow (#13776)

* Replace ArrowLeft etc. with an ascii arrow

* non-mutative helper cleanup

Keyboard Nav: let users rebind their shortcuts (#13781)

* click-outside and shortcuts enabled/disabled toggle

* Trap focus when modal open

* Enabled/disabled saved to localStorage

* Autofocus edit button on variable index

* Modal overflow styles

* Functional rebind

* Saving rebinds to localStorage for all majors

* Started on defaultCommandBindings

* Modal header style and cancel rebind on escape

* keyboardable keybindings w buttons instead of spans

* recording and defaultvalues

* Enter short-circuits rebind

* Only some commands are rebindable, and dont show dupes

* No unused get import

* More visually distinct header on modal

* Disallowed keys for rebind, showing buffer as you type, and moving dedupe to modal logic

willDestroy hook to prevent tests from doubling/tripling up addEventListener on kb events

remove unused tests

Keyboard Navigation acceptance tests (#13893)

* Acceptance tests for keyboard modal

* a11y audit fix and localStorage clear

* Bind/rebind/localStorage tests

* Keyboard tests for dynamic nav and tables

* Rebinder and assert expectation

* Second percy snapshot showing hints no longer relevant

Weird issue where linktos with query props specifically from the task-groups page would fail to route / hit undefined.shouldSuperCede errors

Adds the concept of exclusivity to a keycommand, removing peers that also share its label

Lintfix

Changelog and PR feedback

Changelog and PR feedback

Fix to rebinding in firefox by blurring the now-disabled button on rebind (#14053)

* Secure Variables shortcuts removed

* Variable index route autofocus removed

* Updated changelog entry

* Updated changelog entry

* Keynav docs (#14148)

* Section added to the API Docs UI page

* Added a note about disabling

* Prev and Next order

* Remove dev log and unneeded comments
2022-08-17 12:59:33 -04:00
Kerim Satirli
c1abf47e16 adds link for Nomad-Pack GitHub action (#14118) 2022-08-16 08:34:26 +02:00
Tim Gross
ed55b97f30 secure vars: filter by path in List RPCs (#14036)
The List RPCs only checked the ACL for the Prefix argument of the request. Add
an ACL filter to the paginator for the List RPC.

Extend test coverage of ACLs in the List RPC and in the `acl` package, and add a
"deny" capability so that operators can deny specific paths or prefixes below an
allowed path.
2022-08-15 11:38:20 -04:00
Mike Nomitch
24bd3378fe Add notes about DAS being prometheus only (#14040) 2022-08-15 10:17:31 +02:00
Seth Hoenig
ea570b293b Merge pull request #14086 from Morantron/patch-1
Fix typo in /tools/autoscaling
2022-08-12 09:07:12 -05:00
Seth Hoenig
d3222e24c3 Merge pull request #14088 from hashicorp/b-plan-vault-token
cli: support vault token in plan command
2022-08-12 09:05:34 -05:00
Seth Hoenig
9504a4537b Merge pull request #14089 from hashicorp/f-docker-disable-healthchecks
docker: configuration for disable docker healthcheck
2022-08-12 09:00:31 -05:00
James Rasell
3f5bc76df1 docs: correctly state RPC port is used by servers and clients. (#14091) 2022-08-12 10:14:14 +02:00
Seth Hoenig
0bd42a501c docker: create a docker task config setting for disable built-in healthcheck
This PR adds a docker driver task configuration setting for turning off
built-in HEALTHCHECK of a container.

References)
https://docs.docker.com/engine/reference/builder/#healthcheck
https://github.com/docker/engine-api/blob/master/types/container/config.go#L16

Closes #5310
Closes #14068
2022-08-11 10:33:48 -05:00
Seth Hoenig
e96d52d87f cli: respect vault token in plan command
This PR fixes a regression where the 'job plan' command would not respect
a Vault token if set via --vault-token or $VAULT_TOKEN.

Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062

Fixes https://github.com/hashicorp/nomad/issues/13939
2022-08-11 08:54:08 -05:00
Morantron
0eb2194924 Update index.mdx 2022-08-11 09:03:52 +02:00
Seth Hoenig
a2d3c29005 Merge pull request #14065 from hashicorp/b-fwd-vtoken-validation
cli: forward request for job validation to nomad leader
2022-08-10 15:14:01 -05:00
Seth Hoenig
7b4652440a cli: forward request for job validation to nomad leader
This PR changes the behavior of 'nomad job validate' to forward the
request to the nomad leader, rather than responding from any server.

This is because we need the leader when validating Vault tokens, since
the leader is the only server with an active vault client.
2022-08-10 14:34:04 -05:00
dgotlieb
1d9e6f94f8 doc typo fix
docker and podman don't suck 🤣
2022-08-10 15:04:07 +03:00
Charlie Voiselle
279631eeb5 Sweep of docs for repeated words; minor edits (#14032) 2022-08-05 16:45:30 -04:00
Luiz Aoqui
e1ae7bf7d1 qemu: reduce monitor socket path (#13971)
The QEMU driver can take an optional `graceful_shutdown` configuration
which will create a Unix socket to send ACPI shutdown signal to the VM.

Unix sockets have a hard length limit and the driver implementation
assumed that QEMU versions 2.10.1 were able to handle longer paths. This
is not correct, the linked QEMU fix only changed the behaviour from
silently truncating longer socket paths to throwing an error.

By validating the socket path before starting the QEMU machine we can
provide users a more actionable and meaningful error message, and by
using a shorter socket file name we leave a bit more room for
user-defined values in the path, such as the task name.

The maximum length allowed is also platform-dependant, so validation
needs to be different for each OS.
2022-08-04 12:10:35 -04:00
Luiz Aoqui
a37ef39b8f template: set default UID/GID to -1 (#13998)
UID/GID 0 is usually reserved for the root user/group. While Nomad
clients are expected to run as root it may not always be the case.

Setting these values as -1 if not defined will fallback to the pervious
behaviour of not attempting to set file ownership and use whatever
UID/GID the Nomad agent is running as. It will also keep backwards
compatibility, which is specially important for platforms where this
feature is not supported, like Windows.
2022-08-04 11:26:08 -04:00
Luiz Aoqui
c4b322687d docs: remove link to HCL2 timestamp function (#13999)
The `timestamp` HCL2 function was never part of the set of supported
functions.
2022-08-04 10:07:51 -04:00
Derek Strickland
696deb9600 Add Nomad RetryConfig to agent template config (#13907)
* add Nomad RetryConfig to agent template config
2022-08-03 16:56:30 -04:00
Piotr Kazmierczak
2e0b875b14 client: enable specifying user/group permissions in the template stanza (#13755)
* Adds Uid/Gid parameters to template.

* Updated diff_test

* fixed order

* update jobspec and api

* removed obsolete code

* helper functions for jobspec parse test

* updated documentation

* adjusted API jobs test.

* propagate uid/gid setting to job_endpoint

* adjusted job_endpoint tests

* making uid/gid into pointers

* refactor

* updated documentation

* updated documentation

* Update client/allocrunner/taskrunner/template/template_test.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update website/content/api-docs/json-jobs.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* propagating documentation change from Luiz

* formatting

* changelog entry

* changed changelog entry

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-08-02 22:15:38 +02:00
Tim Gross
9384ba19ad docs: concepts for secure variables and workload identity (#13764)
Includes concept docs for secure variables, concept docs for workload
identity, and an operations docs for keyring management.
2022-08-02 10:06:26 -04:00
Eric Weber
07bbf1f91e Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919)
* Allow specification of CSI staging and publishing directory path
* Add website documentation for stage_publish_dir
* Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir
* Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir
2022-08-02 09:42:44 -04:00
Tim Gross
82861ae8d7 secure vars: enforce ENT quotas (OSS work) (#13951)
Move the secure variables quota enforcement calls into the state store to ensure
quota checks are atomic with quota updates (in the same transaction).

Switch to a machine-size int instead of a uint64 for quota tracking. The
ENT-side quota spec is described as int, and negative values have a meaning as
"not permitted at all". Using the same type for tracking will make it easier to
the math around checks, and uint64 is infeasibly large anyways.

Add secure vars to quota HTTP API and CLI outputs and API docs.
2022-08-02 09:32:09 -04:00
Tim Gross
aa0c3a416a docs: fix path for quota/usage API (#13952) 2022-08-02 08:46:45 -04:00
Seth Hoenig
0693bdd875 website: enable setting custom tool for launching website dev container
When working in a podman environment, it's nice to just run the website
development container using podman.
2022-07-26 09:15:03 -05:00
asymmetric
1d3ee6a70d Update filesystem.mdx (#13738)
fix alloc working directory path
2022-07-25 10:25:48 -04:00
Scott Holodak
7c1c965918 docs: fix placement for scaling and csi_plugin (#13892) 2022-07-25 10:06:59 -04:00
Charlie Voiselle
7f9ff2430c Fix link (#13881) 2022-07-22 12:27:45 -04:00
Michael Schurter
5c6352e1ba docs: clarify submit-job allows stopping (#13871) 2022-07-21 10:18:57 -07:00
Tim Gross
587360543b docs: keyring commands (#13690)
Document the secure variables keyring commands, document the aliased
gossip keyring commands, and note that the old gossip keyring commands
are deprecated.
2022-07-20 14:14:10 -04:00
Tim Gross
bf6116f5dd docs: document secure variables server config options (#13695) 2022-07-20 14:13:39 -04:00
Will Jordan
662a12a41e Return 429 response on HTTP max connection limit (#13621)
Return 429 response on HTTP max connection limit. Instead of silently closing
the connection, return a `429 Too Many Requests` HTTP response with a helpful
error message to aid debugging when the connection limit is unintentionally
reached.

Set a 10-millisecond write timeout and rate limiter for connection-limit 429
response to prevent writing the HTTP response from consuming too many server
resources.

Add `nomad.agent.http.exceeded metric` counting the number of HTTP connections
exceeding concurrency limit.
2022-07-20 14:12:21 -04:00
Luiz Aoqui
208b682211 docs: update Autoscaler AWS plugin with new ws_credential_provider config (#13779) 2022-07-19 10:27:55 -04:00
Niklas Hambüchen
d18df07ccb docs: job-specification: Explain that priority has no effect on run order (#13835)
Makes the issues from #9845 and #12792 less surprising to the user.
2022-07-19 08:55:29 -04:00
Andy Assareh
7e29f64aec word typo digestible (#13772) 2022-07-19 09:00:52 +02:00
Seth Hoenig
bd462ebc5f Merge pull request #13813 from hashicorp/docs-move-checks
docs: move checks into own page
2022-07-18 12:27:43 -05:00
Seth Hoenig
e12a0e763e docs: move checks into own page
This PR creates a top-level 'check' page for job-specification docs.

The content for checks is about half the content of the service page, and
is about to increase in size when we add docs about Nomad service checks.
Seemed like a good idea to just split the checks section out into its own
thing (e.g. check_restart is already a topic).

Doing the move first lets us backport this change without adding Nomad service
check stuff yet.

Mostly just a lift-and-shift but with some tweaked examples to de-emphasize
the use of script checks.
2022-07-18 09:34:55 -05:00
Tim Gross
5c0ef26299 docs: ACL policy spec reference (#13787)
The "Secure Nomad with Access Control" guide provides a tutorial for
bootstrapping Nomad ACLs, writing policies, and creating tokens. Add a reference
guide just for the ACL policy specification.
2022-07-18 09:35:28 -04:00
Luiz Aoqui
cd047cdc03 docs: update Podman docs to v0.4.0 (#13783) 2022-07-15 18:01:35 -04:00
Michael Schurter
875cf8db51 Improve metrics reference documentation (#13769)
* docs: tighten up parameterized job metrics docs

* docs: improve alloc status descriptions

Remove `nomad.client.allocations.start` as it doesn't exist.
2022-07-15 14:22:39 -07:00
Michael Schurter
717e92d3aa docs: clarify blocked_evals metrics (#13751)
Related to #13740

- blocked_evals.total_blocked is the number of evals blocked for *any*
  reason
- blocked_evals.total_quota_limit is the number of evals blocked by
  quota limits, but critically: their resources are *not* counted in the
  cpu/memory
2022-07-14 11:32:33 -07:00
Seth Hoenig
22d7075457 Merge pull request #13716 from hashicorp/docs-update-consul-warning
docs: remove consul 1.12.0 warning
2022-07-14 08:45:56 -05:00
Luiz Aoqui
d456cc1e7f Track plan rejection history and automatically mark clients as ineligible (#13421)
Plan rejections occur when the scheduler work and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.

As the final plan reaches the leader plan applier, it may no longer be
valid due to a concurrent scheduling taking up intended resources. In
these situations the plan applier will notify the worker that the plan
was rejected and that they should refresh their state before trying
again.

In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even if they are always
rejected.

While the root cause is still unknown this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.

To avoid marking several nodes as ineligible at one, the operation is rate
limited to 5 nodes every 30min, with an initial burst of 10 operations.
2022-07-12 18:40:20 -04:00
Michael Schurter
f998a2b77b core: merge reserved_ports into host_networks (#13651)
Fixes #13505

This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports).

As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility:

Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly.
Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserverd_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me.
So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.
2022-07-12 14:40:25 -07:00