Commit Graph

130 Commits

Author SHA1 Message Date
Tim Gross
30bc456f03 logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
Luiz Aoqui
e169847a64 docs: add missing field Capabilities to Namespace API (#16931) 2023-04-19 08:14:36 -07:00
Luiz Aoqui
ba364fc400 docs: add missing API field JobACL and fix workload identity headers (#16930) 2023-04-19 08:12:58 -07:00
Seth Hoenig
2c44cbb001 api: enable support for setting original job source (#16763)
* api: enable support for setting original source alongside job

This PR adds support for setting job source material along with
the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.

The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.

The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.

* api: avoid writing var content to disk for parsing

* api: move submission validation into RPC layer

* api: return an error if updating a job submission without namespace or job id

* api: be exact about the job index we associate a submission with (modify)

* api: reword api docs scheduling

* api: prune all but the last 6 job submissions

* api: protect against nil job submission in job validation

* api: set max job source size in test server

* api: fixups from pr
2023-04-11 08:45:08 -05:00
Piotr Kazmierczak
3da1756dd0 acl: JWT changelog entry and typo fix 2023-03-30 09:40:11 +02:00
Piotr Kazmierczak
ed068c0831 acl: JWT auth method 2023-03-30 09:39:56 +02:00
James Rasell
96740b5392 docs: remove Java and Scala SDKs from supported list. (#16555) 2023-03-20 15:35:02 +01:00
Michael Schurter
46ae1025bb docs: dispatch_payload and jobs api docs had some weirdness (#16514)
* docs: dispatch_payload docs had some weirdness

Docs said "Examples" when there was only 1 example. Not sure what the
floating "to" in the description was for.

* docs: missing a heading level on jobs api docs
2023-03-16 09:42:46 -07:00
Luiz Aoqui
f2bfbfaf03 acl: update job eval requirement to submit-job (#16463)
The job evaluate endpoint creates a new evaluation for the job which is
a write operation. This change modifies the necessary capability from
`read-job` to `submit-job` to better reflect this.
2023-03-13 17:13:54 -04:00
Luiz Aoqui
158d6a91c2 docs: fix alloc stop no_shutdown_delay (#16282) 2023-03-03 14:44:49 -05:00
Aofei Sheng
64d27c6a2f docs: fix typos in task-api.mdx and workload-identity.mdx (#16309) 2023-03-03 08:37:59 -05:00
Dao Thanh Tung
41e510abe4 Fix missing query parameter in job doc (#16233)
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
2023-02-22 10:28:32 -06:00
Seth Hoenig
ed4ad3e7b7 docs: slight tidy up of var create example payload (#16212) 2023-02-17 13:12:39 -06:00
James Rasell
004ddb2b63 acl: add validation to binding rule selector on upsert. (#16210)
* acl: add validation to binding rule selector on upsert.

* docs: add more information on binding rule selector escaping.
2023-02-17 15:38:55 +01:00
Michael Schurter
037823e864 Minor post-1.5-beta1 API, code, and docs cleanups (#16193)
* api: return error on parse failure

* docs: clarify anonymous policy with task api
2023-02-16 10:32:21 -08:00
Michael Schurter
eabb47e2d0 Workload Identity, Task API, and Dynamic Node Metadata Docs (#16102)
* docs: add dynamic node metadata api docs

Also update all paths in the client API docs to explicitly state the
`/v1/` prefix. We're inconsistent about that, but I think it's better to
display the full path than to only show the fragment. If we ever do a
`/v2/` whether or not we explicitly state `/v1/` in our docs won't be
our greatest concern.

* docs: add task-api docs
2023-02-09 16:03:43 -08:00
Charlie Voiselle
fe4ff5be2a Add option to expose workload token to task (#15755)
Add `identity` jobspec block to expose workload identity tokens to tasks.

---------

Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-02-02 10:59:14 -08:00
Daniel Bennett
9f583f57f5 Change job init default to example.nomad.hcl and recommend in docs (#15997)
recommend .nomad.hcl for job files instead of .nomad (without .hcl)
* nomad job init -> example.nomad.hcl
* update docs
2023-02-02 11:47:47 -06:00
James Rasell
e4e4dc18e6 acl: fix a bug in token creation when parsing expiration TTLs. (#15999)
The ACL token decoding was not correctly handling time duration
syntax such as "1h" which forced people to use the nanosecond
representation via the HTTP API.

The change adds an unmarshal function which allows this syntax to
be used, along with other styles correctly.
2023-02-01 17:43:41 +01:00
Mike Nomitch
c2491e9146 Increases max variable size to 64KiB from 16KiB (#15983) 2023-01-31 13:32:36 -05:00
Piotr Kazmierczak
949a6f60c7 renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
James Rasell
14fb036473 sso: allow binding rules to create management ACL tokens. (#15860)
* sso: allow binding rules to create management ACL tokens.

* docs: update binding rule docs to detail management type addition.
2023-01-26 09:57:44 +01:00
Ashlee M Boyer
3444ece549 docs: Migrate link formats (#15779)
* Adding check-legacy-links-format workflow

* Adding test-link-rewrites workflow

* chore: updates link checker workflow hash

* Migrating links to new format

Co-authored-by: Kendall Strautman <kendallstrautman@gmail.com>
2023-01-25 09:31:14 -08:00
James Rasell
e1f88f8334 docs: add OIDC login API and CLI docs. (#15818) 2023-01-20 10:07:26 +01:00
huazhihao
99ab6d439c docs: fix system sample request (#15650) 2023-01-03 10:58:21 -05:00
Piotr Kazmierczak
4ed7ef76b4 acl: binding rules API documentation (#15581) 2022-12-20 11:22:51 +01:00
Piotr Kazmierczak
605597ffd0 acl: SSO auth methods API documentation (#15475)
This PR provides documentation for the ACL Auth Methods API endpoints.

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2022-12-09 09:47:31 +01:00
Luiz Aoqui
de4aba19bb docs: improve job parse API documentation (#15387) 2022-11-25 12:46:53 -05:00
Ayrat Badykov
322c6b3dce fix create snapshot request docs (#15242) 2022-11-17 08:43:40 +01:00
Nikita Beletskii
b55ab6318e Fix variable create API example in docs (#15248) 2022-11-15 16:04:11 +01:00
Tim Gross
ce0e0768ff API for Eval.Count (#15147)
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.

The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
2022-11-07 08:53:19 -05:00
Phil Renaud
85f472189a Accidentally trailed off on a docs paragraph (#15118) 2022-11-02 23:33:41 -04:00
Phil Renaud
1a29e72f7f [ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833)
* Adds meta to job list stub and displays a pack logo on the jobs index

* Changelog

* Modifying struct for optional meta param

* Explicitly ask for meta anytime I look up a job from index or job page

* Test case for the endpoint

* adding meta field to API struct and ommitting from response if empty

* passthru method added to api/jobs.list

* Meta param listed in docs for jobs list

* Update api/jobs.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-11-02 16:58:24 -04:00
James Rasell
1c9b4e398d acl: add ACL roles to event stream topic and resolve policies. (#14923)
This changes adds ACL role creation and deletion to the event
stream. It is exposed as a single topic with two types; the filter
is primarily the role ID but also includes the role name.

While conducting this work it was also discovered that the events
stream has its own ACL resolution logic. This did not account for
ACL tokens which included role links, or tokens with expiry times.
ACL role links are now resolved to their policies and tokens are
checked for expiry correctly.
2022-10-20 09:43:35 +02:00
Kevin Wang
57dc7c2ab1 fix: website broken links (#14904)
* fix: website broken links

* fix up keyring-rotate link

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-10-17 11:32:10 -04:00
Tim Gross
e7e9713d2e variables: document restrictions on path and size (#14687) 2022-09-26 11:40:53 -04:00
Tim Gross
9ea8ca4aa5 docs: variables HTTP API documentation (#14516) 2022-09-13 10:18:26 -04:00
Tim Gross
0437bf384b docs: keyring HTTP API documentation (#14513) 2022-09-13 09:46:54 -04:00
Tim Gross
f2186be02c CSI: failed allocation should not block its own controller unpublish (#14484)
A Nomad user reported problems with CSI volumes associated with failed
allocations, where the Nomad server did not send a controller unpublish RPC.

The controller unpublish is skipped if other non-terminal allocations on the
same node claim the volume. The check has a bug where the allocation belonging
to the claim being freed was included in the check incorrectly. During a normal
allocation stop for job stop or a new version of the job, the allocation is
terminal. But allocations that fail are not yet marked terminal at the point in
time when the client sends the unpublish RPC to the server.

For CSI plugins that support controller attach/detach, this means that the
controller will not be able to detach the volume from the allocation's host and
the replacement claim will fail until a GC is run. This changeset fixes the
conditional so that the claim's own allocation is not included, and makes the
logic easier to read. Include a test case covering this path.

Also includes two minor extra bugfixes:

* Entities we get from the state store should always be copied before
altering. Ensure that we copy the volume in the top-level unpublish workflow
before handing off to the steps.

* The list stub object for volumes in `nomad/structs` did not match the stub
object in `api`. The `api` package also did not include the current
readers/writers fields that are expected by the UI. True up the two objects and
add the previously undocumented fields to the docs.
2022-09-08 13:30:05 -04:00
James Rasell
20f71cf3e4 docs: add documentation for ACL token expiration and ACL roles. (#14332)
The ACL command docs are now found within a sub-dir like the
operator command docs. Updates to the ACL token commands to
accommodate token expiry have also been added.

The ACL API docs are now found within a sub-dir like the operator
API docs. The ACL docs now include the ACL roles endpoint as well
as updated ACL token endpoints for token expiration.

The configuration section is also updated to accommodate the new
ACL and server parameters for the new ACL features.
2022-08-31 16:13:47 +02:00
Tim Gross
8dfa6b06e7 docs: fixing a few more places we missed "secure" during rename (#14395) 2022-08-30 10:08:50 -04:00
Luiz Aoqui
f74f50804a Task lifecycle restart (#14127)
* allocrunner: handle lifecycle when all tasks die

When all tasks die the Coordinator must transition to its terminal
state, coordinatorStatePoststop, to unblock poststop tasks. Since this
could happen at any time (for example, a prestart task dies), all states
must be able to transition to this terminal state.

* allocrunner: implement different alloc restarts

Add a new alloc restart mode where all tasks are restarted, even if they
have already exited. Also unifies the alloc restart logic to use the
implementation that restarts tasks concurrently and ignores
ErrTaskNotRunning errors since those are expected when restarting the
allocation.

* allocrunner: allow tasks to run again

Prevent the task runner Run() method from exiting to allow a dead task
to run again. When the task runner is signaled to restart, the function
will jump back to the MAIN loop and run it again.

The task runner determines if a task needs to run again based on two new
task events that were added to differentiate between a request to
restart a specific task, the tasks that are currently running, or all
tasks that have already run.

* api/cli: add support for all tasks alloc restart

Implement the new -all-tasks alloc restart CLI flag and its API
counterpar, AllTasks. The client endpoint calls the appropriate restart
method from the allocrunner depending on the restart parameters used.

* test: fix tasklifecycle Coordinator test

* allocrunner: kill taskrunners if all tasks are dead

When all non-poststop tasks are dead we need to kill the taskrunners so
we don't leak their goroutines, which are blocked in the alloc restart
loop. This also ensures the allocrunner exits on its own.

* taskrunner: fix tests that waited on WaitCh

Now that "dead" tasks may run again, the taskrunner Run() method will
not return when the task finishes running, so tests must wait for the
task state to be "dead" instead of using the WaitCh, since it won't be
closed until the taskrunner is killed.

* tests: add tests for all tasks alloc restart

* changelog: add entry for #14127

* taskrunner: fix restore logic.

The first implementation of the task runner restore process relied on
server data (`tr.Alloc().TerminalStatus()`) which may not be available
to the client at the time of restore.

It also had the incorrect code path. When restoring a dead task the
driver handle always needs to be clear cleanly using `clearDriverHandle`
otherwise, after exiting the MAIN loop, the task may be killed by
`tr.handleKill`.

The fix is to store the state of the Run() loop in the task runner local
client state: if the task runner ever exits this loop cleanly (not with
a shutdown) it will never be able to run again. So if the Run() loops
starts with this local state flag set, it must exit early.

This local state flag is also being checked on task restart requests. If
the task is "dead" and its Run() loop is not active it will never be
able to run again.

* address code review requests

* apply more code review changes

* taskrunner: add different Restart modes

Using the task event to differentiate between the allocrunner restart
methods proved to be confusing for developers to understand how it all
worked.

So instead of relying on the event type, this commit separated the logic
of restarting an taskRunner into two methods:
- `Restart` will retain the current behaviour and only will only restart
  the task if it's currently running.
- `ForceRestart` is the new method where a `dead` task is allowed to
  restart if its `Run()` method is still active. Callers will need to
  restart the allocRunner taskCoordinator to make sure it will allow the
  task to run again.

* minor fixes
2022-08-24 17:43:07 -04:00
Piotr Kazmierczak
34e4b080f6 template: custom change_mode scripts (#13972)
This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707
2022-08-24 17:43:01 +02:00
Phil Renaud
3bf5633446 [ui] general keyboard navigation: 1.3.4 release (#14138)
* Initialized keyboard service

Neat but funky: dynamic subnav traversal

👻

generalized traverseSubnav method

Shift as special modifier key

Nice little demo panel

Keyboard shortcuts keycard

Some animation styles on keyboard shortcuts

Handle situations where a link is deeply nested from its parent menu item

Keyboard service cleanup

helper-based initializer and teardown for new contextual commands

Keyboard shortcuts modal component added and demo-ghost removed

Removed j and k from subnav traversal

Register and unregister methods for subnav plus new subnavs for volumes and volume

register main nav method

Generalizing the register nav method

12762 table keynav (#12975)

* Experimental feature: shortcut visual hints

* Long way around to a custom modifier for keyboard shortcuts

* dynamic table and list iterative shortcuts

* Progress with regular old tether

* Delogging

* Table Keynav tether fix, server and client navs, and fix to shiftless on modified arrow keys

Go to Optimize keyboard link and storage key changed to g r

parameterized jobs keyboard nav

Dynamic numeric keynav for multiple tables (#13482)

* Multiple tables init

* URL-bind enumerable keyboard commands and add to more taskRow and allocationRows

* Type safety and lint fixes

* Consolidated push to keyCommands

* Default value when removing keyCommands

* Remove the URL-based removal method and perform a recompute on any add

Get tests passing in Keynav: remove math helpers and a few other defensive moves (#13761)

* Remove ember math helpers

* Test fixes for jobparts/body

* Kill an unneeded integration helper test

* delog

* Trying if disabling percy lets this finish

* Okay so its not percy; try parallelism in circle

* Percyless yet again

* Trying a different angle to not have percy

* Upgrade percy to 1.6.1

[ui] Keyboard nav: "u" key to go up a level (#13754)

* U to go up a level

* Mislabelled my conditional

* Custom lint ignore rule

* Custom lint ignore rule, this time with commas

* Since we're getting rid of ember math helpers elsewhere, do the math ourselves here

Replace ArrowLeft etc. with an ascii arrow (#13776)

* Replace ArrowLeft etc. with an ascii arrow

* non-mutative helper cleanup

Keyboard Nav: let users rebind their shortcuts (#13781)

* click-outside and shortcuts enabled/disabled toggle

* Trap focus when modal open

* Enabled/disabled saved to localStorage

* Autofocus edit button on variable index

* Modal overflow styles

* Functional rebind

* Saving rebinds to localStorage for all majors

* Started on defaultCommandBindings

* Modal header style and cancel rebind on escape

* keyboardable keybindings w buttons instead of spans

* recording and defaultvalues

* Enter short-circuits rebind

* Only some commands are rebindable, and dont show dupes

* No unused get import

* More visually distinct header on modal

* Disallowed keys for rebind, showing buffer as you type, and moving dedupe to modal logic

willDestroy hook to prevent tests from doubling/tripling up addEventListener on kb events

remove unused tests

Keyboard Navigation acceptance tests (#13893)

* Acceptance tests for keyboard modal

* a11y audit fix and localStorage clear

* Bind/rebind/localStorage tests

* Keyboard tests for dynamic nav and tables

* Rebinder and assert expectation

* Second percy snapshot showing hints no longer relevant

Weird issue where linktos with query props specifically from the task-groups page would fail to route / hit undefined.shouldSuperCede errors

Adds the concept of exclusivity to a keycommand, removing peers that also share its label

Lintfix

Changelog and PR feedback

Changelog and PR feedback

Fix to rebinding in firefox by blurring the now-disabled button on rebind (#14053)

* Secure Variables shortcuts removed

* Variable index route autofocus removed

* Updated changelog entry

* Updated changelog entry

* Keynav docs (#14148)

* Section added to the API Docs UI page

* Added a note about disabling

* Prev and Next order

* Remove dev log and unneeded comments
2022-08-17 12:59:33 -04:00
Seth Hoenig
7b4652440a cli: forward request for job validation to nomad leader
This PR changes the behavior of 'nomad job validate' to forward the
request to the nomad leader, rather than responding from any server.

This is because we need the leader when validating Vault tokens, since
the leader is the only server with an active vault client.
2022-08-10 14:34:04 -05:00
Charlie Voiselle
279631eeb5 Sweep of docs for repeated words; minor edits (#14032) 2022-08-05 16:45:30 -04:00
Piotr Kazmierczak
2e0b875b14 client: enable specifying user/group permissions in the template stanza (#13755)
* Adds Uid/Gid parameters to template.

* Updated diff_test

* fixed order

* update jobspec and api

* removed obsolete code

* helper functions for jobspec parse test

* updated documentation

* adjusted API jobs test.

* propagate uid/gid setting to job_endpoint

* adjusted job_endpoint tests

* making uid/gid into pointers

* refactor

* updated documentation

* updated documentation

* Update client/allocrunner/taskrunner/template/template_test.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update website/content/api-docs/json-jobs.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* propagating documentation change from Luiz

* formatting

* changelog entry

* changed changelog entry

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-08-02 22:15:38 +02:00
Tim Gross
82861ae8d7 secure vars: enforce ENT quotas (OSS work) (#13951)
Move the secure variables quota enforcement calls into the state store to ensure
quota checks are atomic with quota updates (in the same transaction).

Switch to a machine-size int instead of a uint64 for quota tracking. The
ENT-side quota spec is described as int, and negative values have a meaning as
"not permitted at all". Using the same type for tracking will make it easier to
the math around checks, and uint64 is infeasibly large anyways.

Add secure vars to quota HTTP API and CLI outputs and API docs.
2022-08-02 09:32:09 -04:00
Tim Gross
aa0c3a416a docs: fix path for quota/usage API (#13952) 2022-08-02 08:46:45 -04:00
Charlie Voiselle
7f9ff2430c Fix link (#13881) 2022-07-22 12:27:45 -04:00