* Scaffolding actions (#18639)
* Task-level actions for job submissions and retrieval
* FIXME: Temporary workaround to get ember dev server to pass exec through to 4646
* Update api/tasks.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update command/agent/job_endpoint.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Diff and copy implementations
* Action structs get their own file, diff updates to behave like our other diffs
* Test to observe actions changes in a version update
* Tests migrated into structs/diff_test and modified with PR comments in mind
* APIActionToSTructsAction now returns a new value
* de-comment some plain parts, remove unused action lookup
* unused param in action converter
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* New endpoint: job/:id/actions (#18690)
* unused param in action converter
* backing out of parse_job level and moved toward new endpoint level
* Adds taskName and taskGroupName to actions at job level
* Unmodified job mock actions tests
* actionless job test
* Multi group multi task actions test
* HTTP method check for GET, cleaner errors in job_endpoint_test
* decomment
* Actions aggregated at job model level (#18733)
* Removal of temporary fix to proxy to 4646
* Run Action websocket endpoint (#18760)
* Working demo for review purposes
* removal of cors passthru for websockets
* Remove job_endpoint-specific ws handlers and aimed at existing alloc exec handlers instead
* PR comments addressed, no need for taskGroup pass, better group and task lookups from alloc
* early return in action validate and removed jobid from req args per PR comments
* todo removal, we're checking later in the rpc
* boolean style change on tty
* Action CLI command (#18778)
* Action command init and stuck-notes
* Conditional reqpath to aim at Job action endpoint
* De-logged
* General CLI command cleanup, observe namespace, pass action as string, get random alloc w group adherence
* tab and varname cleanup
* Remove action param from Allocations().Exec calls
* changelog
* dont nil-check acl
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* core: plumbing to support numa aware scheduling
* core: apply node resources compatibility upon fsm restore
Handle the case where an upgraded server dequeues an evaluation before
a client triggers a new fingerprint - which would be needed to cause
the compatibility fix to run. By running the compat fix on restore the
server will immediately have the compatible pseudo topology to use.
* lint: learn how to spell pseudo
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.
This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:
* Extends auth package tests to demonstrate that nil ACLs are treated as failed
auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
`nomad/acl.go` file.
* Updates the semgrep rules to detect improper nil ACL checking and removes the
old invalid ACL checks.
* Updates the contributing guide for RPC authentication.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
Ref: https://github.com/hashicorp/nomad/pull/18730
Ref: https://github.com/hashicorp/nomad/pull/18744
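As a rough illustration of the idea above (not the actual Nomad implementation), the sketch below shows how a dedicated "ACLs disabled" object can replace `nil` as the sentinel, so that `nil` always means failed authentication. The `ACL` type, the `aclsDisabledACL` variable, and the `resolve` helper are simplified stand-ins invented for this example.

```go
package main

import "fmt"

// ACL is a hypothetical, heavily simplified stand-in for a resolved ACL object.
type ACL struct {
	aclsDisabled bool // true only for the "ACLs are disabled" sentinel
	management   bool // true for management tokens
}

// aclsDisabledACL is an illustrative sentinel returned when ACLs are turned off.
var aclsDisabledACL = &ACL{aclsDisabled: true}

// IsManagement treats a nil ACL as failed authentication instead of
// "ACLs are disabled", so a forgotten nil check denies rather than allows.
func (a *ACL) IsManagement() bool {
	if a == nil {
		return false
	}
	return a.aclsDisabled || a.management
}

// resolve is a hypothetical auth helper: it returns the sentinel when ACLs
// are disabled and nil when the token cannot be resolved.
func resolve(aclsEnabled bool, token string) *ACL {
	if !aclsEnabled {
		return aclsDisabledACL
	}
	if token == "root" {
		return &ACL{management: true}
	}
	return nil // unknown token: authentication failed
}

func main() {
	fmt.Println(resolve(false, "").IsManagement())     // ACLs disabled
	fmt.Println(resolve(true, "root").IsManagement())  // valid management token
	fmt.Println(resolve(true, "bogus").IsManagement()) // failed auth (nil ACL)
}
```

With this shape, a caller that receives `nil` can never accidentally grant access, because the permissive path requires the explicit sentinel.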
The retry tests in the `api` package set up a client but don't use `NewClient`,
so the address never gets parsed into a `url.URL` and that's causing some test
failures.
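For context, a minimal sketch of the kind of address parsing that a constructor would normally perform is shown below. The `parseAddress` helper is hypothetical and only illustrates turning an address string into a `url.URL` with the standard library; it is not the `api` package's own code.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseAddress is a hypothetical helper that parses an agent address into a
// url.URL and rejects addresses without a scheme.
func parseAddress(addr string) (*url.URL, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return nil, fmt.Errorf("invalid address %q: %w", addr, err)
	}
	if u.Scheme == "" {
		return nil, fmt.Errorf("invalid address %q: missing scheme", addr)
	}
	return u, nil
}

func main() {
	u, err := parseAddress("http://127.0.0.1:4646")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u.Scheme, u.Host)
}
```

A test that builds a client by hand would need to run an equivalent step itself, since nothing else populates the parsed URL.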
- Expose internal HTTP client's Do() via Raw
- Use URL parser to identify scheme
- Align more with curl output
- Add changelog
- Fix test failure; add tests for socket envvars
- Apply review feedback for tests
- Consolidate address parsing
- Address feedback from code reviews
Co-authored-by: Tim Gross <tgross@hashicorp.com>
To support Workload Identity with Consul for templates, we want templates to be
able to use the WI created at the task scope (either implicitly or set by the
user). But to allow different tasks within a group to be assigned to different
clusters as we're doing for Vault, we need to be able to set the `consul` block
with its `cluster` field at the task level to override the group.
It includes the work on the state store, the RPC server, the HTTP server, the Go API package, and the CLI command. To read more about the actual functionality, refer to the RFCs [NMD-178] Locking with Nomad Variables and [NMD-179] Leader election using locking mechanism for the Autoscaler.
This feature helps operators remove a failed/left node from the Serf layer immediately,
without waiting 24 hours for the node to be reaped.
* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
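As a small sketch of what a request with the new query string parameter might look like, the helper below builds the request path with the standard library. The `forceLeavePath` function is hypothetical, and the `node` parameter name is an assumption that is not stated in this message.

```go
package main

import (
	"fmt"
	"net/url"
)

// forceLeavePath is a hypothetical helper that builds the request path for
// /v1/agent/force-leave. The "node" parameter name is an assumption; only
// the "prune" parameter is described in this change.
func forceLeavePath(node string, prune bool) string {
	q := url.Values{}
	q.Set("node", node)
	if prune {
		q.Set("prune", "true")
	}
	// Encode sorts keys, so the output is deterministic.
	return "/v1/agent/force-leave?" + q.Encode()
}

func main() {
	fmt.Println(forceLeavePath("client-1", true))
	fmt.Println(forceLeavePath("client-1", false))
}
```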
In Nomad Enterprise when multiple Vault/Consul clusters are configured, cluster admins can control access to clusters for jobs via namespace ACLs, similar to how we've done so for node pools. This changeset updates the ACL configuration structs, but doesn't wire them up.
Rename the agent configuration for workload identity to
`WorkloadIdentityConfig` to make its use more explicit and remove the
`ServiceName` field since it is never expected to be defined in a
configuration file.
Also update the job mutation to inject a service identity following
these rules:
1. Don't inject identity if `consul.use_identity` is false.
2. Don't inject identity if `consul.service_identity` is not specified.
3. Don't inject identity if service provider is not `consul`.
4. Set name and service name if the service specifies an identity.
5. Inject `consul.service_identity` if service does not specify an
identity.
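The rules above can be sketched roughly as follows. All types, field names, and the `consul-service/` name prefix are hypothetical simplifications made for this example and do not mirror the real structs; setting the names on an injected identity is likewise an assumption.

```go
package main

import "fmt"

// Identity, Service, and ConsulConfig are hypothetical, simplified types.
type Identity struct {
	Name        string
	ServiceName string
	Audience    []string
}

type Service struct {
	Name     string
	Provider string
	Identity *Identity
}

type ConsulConfig struct {
	UseIdentity     bool
	ServiceIdentity *Identity
}

// injectServiceIdentity applies the five rules in order.
func injectServiceIdentity(cfg ConsulConfig, svc *Service) {
	if !cfg.UseIdentity { // rule 1
		return
	}
	if cfg.ServiceIdentity == nil { // rule 2
		return
	}
	if svc.Provider != "consul" { // rule 3
		return
	}
	name := "consul-service/" + svc.Name // hypothetical naming scheme
	if svc.Identity != nil {             // rule 4
		svc.Identity.Name = name
		svc.Identity.ServiceName = svc.Name
		return
	}
	// Rule 5: inject a copy of the agent-level identity. Setting the names
	// here as well is an assumption of this sketch.
	id := *cfg.ServiceIdentity
	id.Audience = append([]string(nil), cfg.ServiceIdentity.Audience...)
	id.Name = name
	id.ServiceName = svc.Name
	svc.Identity = &id
}

func main() {
	cfg := ConsulConfig{
		UseIdentity:     true,
		ServiceIdentity: &Identity{Audience: []string{"consul.io"}},
	}
	svc := &Service{Name: "web", Provider: "consul"}
	injectServiceIdentity(cfg, svc)
	fmt.Printf("%+v\n", *svc.Identity)
}
```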
This PR introduces updates to the jobspec required for workload identity support for services.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Allows for multiple `identity{}` blocks for tasks along with user-specified audiences. This is a building block to allow workload identities to be used with Consul, Vault and 3rd party JWT based auth methods.
Expiration is still unimplemented and is necessary for JWTs to be used securely, so that's up next.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
Some refactoring is needed in the getter and state code, where the
`slices` API changed.
* Bump golang.org/x/exp
* Bump golang.org/x/exp in api
* Update job_endpoint_test
* [feedback] unexport sort function
This feature is necessary when a user wants to explicitly re-render all templates on task restart,
e.g. to fetch all new secrets from Vault, even if the lease on the existing secrets has not expired.
Service discovery or mesh network systems consuming the Nomad event stream or API need to know the CNI assigned IP for the allocation. This data is returned by the underlying Nomad API but isn't mapped in the response struct.
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using
`image` filesystem isolation. As a result, more powerful tokens can be used
in a job definition, allowing it to use template stanzas to issue all kinds of
secrets (database secrets, Vault tokens with very specific policies, etc.),
without sharing that issuing power with the task itself.
This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.
If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
* jobspec: rename node pool scheduler_configuration
In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.
* np: add memory oversubscription config
* np: make scheduler config ENT
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.
Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.
If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets it to `default` in the admission
controller mutator.
In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
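A minimal sketch of this flow, under simplified assumptions, is shown below. The `Job` type and the `mutateNodePool` and `validateNodePool` functions are hypothetical names used only to illustrate the ordering: the mutator fills in the empty value, and the state store check rejects jobs whose pool is unset or unknown.

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a hypothetical, simplified job with only the fields needed here.
type Job struct {
	ID       string
	NodePool string
}

const defaultNodePool = "default"

// mutateNodePool mimics an admission mutator: it reports whether the job set
// a pool explicitly and fills in the default when it did not.
func mutateNodePool(j *Job) (explicit bool) {
	explicit = j.NodePool != ""
	if !explicit {
		j.NodePool = defaultNodePool
	}
	return explicit
}

// validateNodePool mimics the state store check before insertion.
func validateNodePool(j *Job, pools map[string]struct{}) error {
	if j.NodePool == "" {
		return errors.New("job node pool is not set")
	}
	if _, ok := pools[j.NodePool]; !ok {
		return fmt.Errorf("node pool %q does not exist", j.NodePool)
	}
	return nil
}

func main() {
	pools := map[string]struct{}{"default": {}, "gpu": {}}
	job := &Job{ID: "example"}
	fmt.Println(validateNodePool(job, pools)) // rejected: pool not set yet
	fmt.Println(mutateNodePool(job), job.NodePool)
	fmt.Println(validateNodePool(job, pools))
}
```

Because the empty value survives until the mutator runs, the mutator is the one place that can still tell an explicit `default` apart from an unspecified pool.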
Implement scheduler support for node pool:
* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool filter to the `Node.Register` RPC so that we don't
generate spurious evals for nodes in the wrong pool.
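The extended filter can be pictured with the sketch below. The `Node` type and the `readyNodesInPool` function are hypothetical; datacenter matching is simplified to exact string comparison, and counting `nodesInPool` before the datacenter check is an assumption of this example rather than a statement about the scheduler's metric.

```go
package main

import "fmt"

// Node is a hypothetical, simplified node.
type Node struct {
	ID         string
	Datacenter string
	NodePool   string
	Ready      bool
}

// readyNodesInPool returns the ready nodes that are in the given pool and in
// one of the allowed datacenters, plus a count of ready nodes in the pool.
func readyNodesInPool(nodes []Node, dcs []string, pool string) (eligible []Node, nodesInPool int) {
	allowed := make(map[string]struct{}, len(dcs))
	for _, dc := range dcs {
		allowed[dc] = struct{}{}
	}
	for _, n := range nodes {
		if !n.Ready || n.NodePool != pool {
			continue
		}
		nodesInPool++
		if _, ok := allowed[n.Datacenter]; !ok {
			continue
		}
		eligible = append(eligible, n)
	}
	return eligible, nodesInPool
}

func main() {
	nodes := []Node{
		{ID: "n1", Datacenter: "dc1", NodePool: "gpu", Ready: true},
		{ID: "n2", Datacenter: "dc1", NodePool: "default", Ready: true},
		{ID: "n3", Datacenter: "dc2", NodePool: "gpu", Ready: true},
		{ID: "n4", Datacenter: "dc1", NodePool: "gpu", Ready: false},
	}
	eligible, inPool := readyNodesInPool(nodes, []string{"dc1"}, "gpu")
	fmt.Println(len(eligible), inPool)
}
```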
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.
Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
* Add UnexpectedResultError to nomad/api
This allows users to perform additional status-based behavior by rehydrating the error using `errors.As` inside of consumers.
* build(deps): bump github.com/shoenig/test from 0.6.4 to 0.6.5 in /api
* deps: update shoenig/test to v0.6.5
* deps: update again to v0.6.6
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>