The `mock_driver` is an internal task driver used mostly for testing and
simulating workloads. During the allocrunner v2 work (#4792) its name
changed from `mock_driver` to just `mock` and then back to
`mock_driver`, but the fingreprint key was kept as `driver.mock`.
This results in tasks configured with `driver = "mock"` to be scheduled
(because Nomad thinks the client has a task driver called `mock`), but
fail to actually run (because the Nomad client can't find a driver
called `mock` in its catalog).
Fingerprinting the right name prevents the job from being scheduled in
the first place.
Also removes mentions of the mock driver from documentation since its an
internal driver and not available in any production release.
Add information about autopilot health to the `/operator/autopilot/health` API
in Nomad Enterprise.
I've pulled the CE changes required for this feature out of @lindleywhite's PR
in the Enterprise repo. A separate PR will include a new `operator autopilot
health` command that can present this information at the command line.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394
Co-authored-by: Lindley <lindley@hashicorp.com>
When filtering list results, the filter expression is applied to the
full object, not the stub. This is useful because it allows users to
filter the list on fields not present in the object stub. But it can
also be confusing because some fields have different names, or only
exist in the stub, so the filter expression needs to reference fields
not present in returned data.
Filtering on the stub would reduce the confusion, but it would also
restrict users to only be able to filter on the fields in the stub,
which, by definition, are just a subset of the original fields.
Documenting this behaviour can help users understand unexpected errors
and results.
The new `nomad setup vault -check` commmand can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
Add new optional `OIDCDisableUserInfo` setting for OIDC auth provider which
disables a request to the identity provider to get OIDC UserInfo.
This option is helpful when your identity provider doesn't send any additional
claims from the UserInfo endpoint, such as Microsoft AD FS OIDC Provider:
> The AD FS UserInfo endpoint always returns the subject claim as specified in the
> OpenID standards. AD FS doesn't support additional claims requested via the
> UserInfo endpoint
Fixes#19318
An audit of Nomad's ACLs resulted in some confusion around whether the
`NamespaceValidator` method is conjunctive ("add", as implied by the docs) or
disjunctive ("or", as it is by design). Clarify the ACL documentation as
follows:
* Call out where fine-grained capabilities imply grants to other
capabilities (for example, that `csi-read-volume` grants `csi-list-volume`).
* Fix an incorrectly documented ACL requirement for the CSI List External
Volumes API.
* Clarify how ACLs are expected to work for the two search API endpoints, such
that you need list/read access to the objects in the search context.
Lower cased the title and headings in line with our company-wide style since this is being linked in an upcoming blog I was editing. I also lowercased words such as "Auth Method" and other primitives/components when mentioned in prose - this is in line with our style guide as well where we don't capitalize auth method and we only capitalize components that are SKU/product-like in their separateness/importance.
https://docs.google.com/document/d/1MRvGd6tS5JkIwl_GssbyExkMJqOXKeUE00kSEtFi8m8/edit
Adam Trujilo should be in agreement with changes like this based on our past discussions, but feel free to bring in stake holders if you're not sure about accepting and we can discuss.
* API command and jobspec docs
* PR comments addressed
* API docs for job/jobid/action socket
* Removing a perhaps incorrect origin of job_id across the jobs api doc
* PR comments addressed
Clarify the difference between the `/client` and `/node` endpoints and
link from one to the other to help users discover the endpoint they are
looking for.
Also update the `/client` page description and dynamic nod metadata
section headers to help the page be more discoverable by search engines.
* Rename pages to include roles
* Models and adapters
* [ui] Any policy checks in the UI now check for roles' policies as well as token policies (#18346)
* combinedPolicies as a concept
* Classic decorator on role adapter
* We added a new request for roles, so the test based on a specific order of requests got fickle fast
* Mirage roles cluster scaffolded
* Acceptance test for roles and policies on the login page
* Update mirage mock for nodes fetch to account for role policies / empty token.policies
* Roles-derived policies checks
* [ui] Access Control with Roles and Tokens (#18413)
* top level policies routes moved into access control
* A few more routes and name cleanup
* Delog and test fixes to account for new url prefix and document titles
* Overview page
* Tokens and Roles routes
* Tokens helios table
* Add a role
* Hacky role page and deletion
* New policy keyboard shortcut and roles breadcrumb nav
* If you leave New Role but havent made any changes, remove the newly-created record from store
* Roles index list and general role route crud
* Roles index actually links to roles now
* Helios button styles for new roles and policies
* Handle when you try to create a new role without having any policies
* Token editing generally
* Create Token functionality
* Cant delete self-token but management token editing and deleting is fine
* Upgrading helios caused codemirror to explode, shimmed
* Policies table fix
* without bang-element condition, modifier would refire over and over
* Token TTL or Time setting
* time will take you on
* Mirage hooks for create and list roles
* Ensure policy names only use allow characters in mirage mocks
* Mirage mocked roles and policies in the default cluster
* log and lintfix
* chromedriver to 2.1.2
* unused unit tests removed
* Nice profile dropdown
* With the HDS accordion, rename our internal component scss ref
* design revisions after discussion
* Tooltip on deleted-policy tokens
* Two-step button peripheral isDeleting gcode removed
* Never to null on token save
* copywrite headers added and empty routefiles removed
* acceptance test fixes for policies endpoint
* Route for updating a token
* Policies testfixes
* Ember on-click-outside modifier upgraded with general ember-modifier upgrade
* Test adjustments to account for new profile header dropdown
* Test adjustments for tokens via policy pages
* Removed an unused route
* Access Control index page tests
* a11y tests
* Tokens index acceptance tests generally
* Lintfix
* Token edit page tests
* Token editing tests
* New token expiration tests
* Roles Index tests
* Role editing policies tests
* A complete set of Access Control Roles tests
* Policies test
* Be more specific about which row to check for expiration time
* Nil check on expirationTime equality
* Management tokens shouldnt show No Roles/Policies, give them their own designation
* Route guard on selftoken, conditional columns, and afterModel at parent to prevent orphaned policies on tokens/roles from stopping a new save
* Policy unloading on delete and other todos plus autofocus conditionally re-enabled
* Invalid policies non-links now a concept for Roles index
* HDS style links to make job.variables.alert links look like links again
* Mirage finding looks weird so making model async in hash even though redundant
* Drop rsvp
* RSVP wasnt the problem, cached lookups were
* remove old todo comments
* de-log
It includes the work over the state store, the PRC server, the HTTP server, the go API package and the CLI's command. To read more on the actuall functionality, refer to the RFCs [NMD-178] Locking with Nomad Variables and [NMD-179] Leader election using locking mechanism for the Autoscaler.
This feature will help operator to remove a failed/left node from Serf layer immediately
without waiting for 24 hours for the node to be reaped
* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.
Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.
If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets to `default` at the admission
controller mutator.
In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
Implement scheduler support for node pool:
* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
generate spurious evals for nodes in the wrong pool.
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.
Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.
This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.
This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.
Fixes#17076
This PR modifies references to the envoyproxy/envoy docker image to
explicitly include the docker.io prefix. This does not affect existing
users, but makes things easier for Podman users, who otherwise need to
specify the full name because Podman does not default to docker.io