Core scheduler relies on a special table in the state store—the TimeTable—to
figure out which objects can be GC'd. The TimeTable correlates Raft indices
with objects' insertion times, a solution we used before most of the objects
we store in the state contained timestamps. This introduced some memory
overhead and complexity, but most importantly meant that any GC threshold
users set greater than timeTableLimit = 72 * time.Hour was ignored. This PR
removes the TimeTable and relies on object timestamps to determine whether
objects can be GC'd or not.
* docs: explain schedule state values
GET /v1/client/allocation/:alloc_id/pause?task=:task_name is a tiny but
critical API for observability of tasks with a schedule. This PR
explains each of the values which might be returned.
* correct docstring
* add missing state and expand PUT docs
---------
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.
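As a sketch, the new field in a jobspec might look like this (the source URL
and destination are illustrative, not from this PR):

  artifact {
    source      = "https://example.com/app.tar.gz"
    destination = "local/app"
    chown       = true # chown downloaded files/dirs to task.user
  }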
* api: no need for pointer of chown field
* TaggedVersion information in structs, rather than job_endpoint (#23841)
* TaggedVersion information in structs, rather than job_endpoint
* Test for taggedVersion description length
* Some API plumbing
* Tag and Untag job versions (#23863)
* Tag and Untag at API level on down
* Code and comment cleanup
* Generalize Unset methods and namespace handling
* Namespace passes through with QueryOptions removed from a write requesting struct
* Comment and PR review cleanup
* Version back to VersionStr
* Generally consolidate unset logic into apply for version tagging
* Addressed some PR comments
* Auth check and RPC forwarding
* uint64 instead of pointer for job version after api layer and renamed copy
* job tag command split into apply and unset
* latest-version convenience handling moved to CLI command level
* CLI tests for tagging/untagging
* UI parts removed
* Add to job table when unsetting job tag on latest version
* Vestigial no more
* Compare versions by name and version number with the nomad history command (#23889)
* First pass at passing a tagname and/or diff version to plan/versions requests
* versions API now takes compare_to flags
* Job history command output can have tag names and descriptions
* compare_to to diff-tag and diff-version, plus adding flags to history command
* 0th version now shows a diff if a specific diff target is requested
* Addressing some PR comments
* Simplify the diff-appending part of jobVersions and hide None-type diffs from CLI
* Remove the diff-tag and diff-version parts of nomad job plan, with an eye toward making them a new top-level CLI command soon
* Version diff tests
* re-implement JobVersionByTagName
* Test mods and simplification
* Documentation for nomad job history additions
* Prevent pruning and reaping of TaggedVersion jobs (#23983)
Tagged versions should not count against JobTrackedVersions; i.e., new job
versions being inserted should not evict tagged versions, and GC should not
delete a job if any of its versions are tagged.
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* [ui] Version Tags on the job versions page (#24013)
* Timeline styles and their buttons modernized, and tags added
* styled but not yet functional version blocks
* Rough pass at edit/unedit UX
* Styles consolidated
* better UX around version tag crud, plus adapter and serializers
* Mirage and acceptance tests
* Modify percy to not show time-based things
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* Job revert command and API endpoint can take a string version tag name (#24059)
* Job revert command and API endpoint can take a string version tag name
* RevertOpts as a signature-modified alternative to Revert()
* job revert CLI test
* Version pointers in endpoint tests
* Don't copy over the tag when a job is reverted to a version with a tag
* Convert tag name to version number at CLI level
* Client method for version lookup by tag
* No longer double-declaring client
* [ui] Add tag filter to the job versions page (#24064)
* Rough pass at the UI for version diff dropdown
* Cleanup and diff fetching via adapter method
* TaggedVersion now VersionTag (#24066)
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
Allow more than one copy of a program to run at a time on the same port with
SO_REUSEPORT. This requires host network mode. Some task drivers (like docker)
may also need

  config {
    network_mode = "host"
  }

but this is not validated prior to placement.
When a CNI result includes an IPv6 address, set it on the alloc's
NetworkStatus for reference, e.g.:

  $ nomad alloc status -json 3dca | jq '.NetworkStatus'
  {
    "Address": "172.26.64.14",
    "AddressIPv6": "fd00:a110:c8::b",
    "DNS": null,
    "InterfaceName": "eth0"
  }

(CE backport of ENT 59433d56c7215c0b8bf33764f41b57d9bd30160f (without ent files))
* scheduler: enhance numa aware scheduling with support for devices
* cr: add comments
In #16872 we added support for configuring the API client with a unix domain
socket. In order to set the host correctly, we parse the address before mutating
the Address field in the configuration. But this prevents the configuration from
being reused across multiple clients, as the next time we parse the address it
will no longer be pointing to the socket. This breaks consumers like the
autoscaler, which reuse the API config between plugins.
Update the `NewClient` constructor to only override the `url` field if it hasn't
already been parsed. Include a test demonstrating safe reuse with a unix domain
socket.
Ref: https://github.com/hashicorp/nomad-autoscaler/issues/944
Ref: https://github.com/hashicorp/nomad-autoscaler/pull/945
On supported platforms, the secrets directory is a 1MiB tmpfs, but some tasks
need more space for downloading large secrets. This is especially the case for
tasks using `template` blocks, which need extra room to write a temporary file
to the secrets directory that then gets renamed over the old file atomically.
This changeset allows increasing the size of the tmpfs in the `resources`
block. Because this is a memory resource, we need to include it in the memory we
allocate for scheduling purposes. The task is already prevented from using more
memory in the tmpfs than the `resources.memory` field allows, but can bypass
that limit by writing to the tmpfs via `template` or `artifact` blocks.
Therefore, we need to account for the size of the tmpfs in the allocation
resources. Simply adding it to the memory needed when we create the allocation
allows it to be accounted for in all downstream consumers, and then we'll
subtract that amount from the memory resources just before configuring the task
driver.
For backwards compatibility, the default value of 1MiB is "free" and ignored by
the scheduler. Otherwise we'd be increasing the allocated resources for every
existing alloc, which could cause problems across upgrades. If a user explicitly
sets `resources.secrets = 1` it will no longer be free.
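A hedged jobspec sketch of the new field (values illustrative; only the
`secrets` attribute is new here):

  resources {
    cpu     = 500
    memory  = 256
    secrets = 16 # MiB of tmpfs for the secrets dir; counted against scheduled memory when set
  }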
Fixes: https://github.com/hashicorp/nomad/issues/2481
Ref: https://hashicorp.atlassian.net/browse/NET-10070
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.
Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.
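For reference, a sketch of the server agent config knobs named here (attribute
names from this changeset; values illustrative, with 720h being the default
rotation threshold mentioned below):

  server {
    root_key_rotation_threshold = "720h" # new keys prepublished at half this window
    root_key_gc_threshold       = "1h"   # keys GC'd only after rotation + GC thresholds
  }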
This changeset also fixes two bugs in periodic root key rotation and garbage
collection, neither of which could be safely fixed without implementing
prepublishing:
* Periodic root key rotation would never happen because the default
`root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
time table. We now compare the `CreateTime` against the wall clock time instead
of the time table. (We expect to remove the time table in future work, ref
https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
identities. We now wait until `root_key_rotation_threshold` +
`root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
the rekey was complete.
Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
Fixes a bug where variable values in job submissions that contained newlines
weren't encoded correctly, and thus jobs that contained them couldn't be
resumed once stopped via the UI.
Internal ref: https://hashicorp.atlassian.net/browse/NET-9966
The RPC handler for scaling a job passes flags to enforce the job modify index
is unchanged when it makes the write to Raft. But it's only checking against the
existing job modify index at the time the RPC handler snapshots the state store,
so it can only enforce consistency for its own validation.
In clusters with automated scaling, it would be useful to expose the enforce
index options to the API, so that cluster admins can enforce that scaling only
happens when the job state is consistent with a state they've previously seen in
other API calls. Add these options to the CLI and API and have the RPC handler
check them if asked.
Fixes: https://github.com/hashicorp/nomad/issues/23444
When an allocation fails it triggers an evaluation. The evaluation is processed
and the scheduler sees it needs to reschedule, which triggers a follow-up
eval. The follow-up eval creates a plan to `(stop 1) (place 1)`. The replacement
alloc has a `RescheduleTracker` (or gets its `RescheduleTracker` updated).
But in the case where the follow-up eval can't place all allocs (there aren't
enough resources), it can create a partial plan to `(stop 1) (place 0)`. It then
creates a blocked eval. The plan applier stops the failed alloc. Then when the
blocked eval is processed, the job is missing an allocation, so the scheduler
creates a new allocation. This allocation is _not_ a replacement from the
perspective of the scheduler, so it's not handed off a `RescheduleTracker`.
This changeset fixes this by annotating the reschedule tracker whenever the
scheduler can't place a replacement allocation. We check this annotation for
allocations that have the `stop` desired status when filtering out allocations
to pass to the reschedule tracker. I've also included tests that cover this
case and expand coverage of the relevant area of the code.
Fixes: https://github.com/hashicorp/nomad/issues/12147
Fixes: https://github.com/hashicorp/nomad/issues/17072
This is the CE side of an Enterprise-only feature; a job trying to use it in
CE will fail to validate.
To enable daily-scheduled execution entirely client-side, a job may now
contain:

  task "name" {
    schedule {
      cron {
        start    = "0 12 * * * *" # may not include "," or "/"
        end      = "0 16"         # partial cron, with only {minute} {hour}
        timezone = "EST"          # anything in your tzdb
      }
    }
    ...

Everything about the allocation will be placed as usual, but if outside the
specified schedule, the taskrunner will block on the client, waiting on the
schedule start, before proceeding with the task driver execution, etc.
This includes a taskrunner hook, which watches for the end of the schedule,
at which point it will kill the task. Then, restarts allowing, a new task will
start and again block waiting for start, and so on.
This also includes all the plumbing required to pipe API calls through from
command->api->agent->server->client, so that tasks can be force-run,
force-paused, or resume the schedule on demand.
* Hacky but shows links and desc
* markdown
* Small pre-test cleanup
* Test for UI description and link rendering
* JSON jobspec docs and variable example job get UI block
* Jobspec documentation for UI block
* Description and links moved into the Title component and made into Helios components
* Marked version upgrade
* Allow links without a description and max description to 1000 chars
* Node 18 for setup-js
* markdown sanitization
* Ui to UI and docs change
* Canonicalize, copy and diff for job.ui
* UI block added to testJob for structs testing
* diff test
* Remove redundant reset
* For readability, changing the receiving pointer of copied job variables
* TestUI endpoint conversion tests
* -require +must
* Nil check on Links
* JobUIConfig.Links as pointer
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
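A minimal sketch of the job-level `ui` block added by the commits above
(attribute names inferred from these commits; values illustrative):

  ui {
    description = "Frontend service. Supports **markdown**, up to 1000 chars."
    link {
      label = "Grafana dashboard"
      url   = "https://grafana.example.com/d/frontend"
    }
  }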
Users can override the default sidecar task for Connect workloads. This sidecar
task might need access to certificate stores on the host. Allow adding the
`volume_mount` block to the sidecar task override.
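As a sketch, an override mounting a host certificate store into the sidecar
(volume name and paths illustrative):

  sidecar_task {
    volume_mount {
      volume      = "ca-certs"
      destination = "/etc/ssl/certs"
      read_only   = true
    }
  }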
Also fixes a bug where `volume_mount` blocks would not appear in plan diff
outputs.
Fixes: https://github.com/hashicorp/nomad/issues/19786
Introduce a new API, /v1/jobs/statuses, primarily for use in the UI, which
collates info about jobs, their allocations, and latest deployment.
Currently the UI gets *all* of /v1/jobs and sorts and paginates them
client-side in the browser, and its "summary" column is based on historical
summary data (which can be visually misleading, and sometimes scary when a job
has failed at some point in the not-yet-garbage-collected past).
The new endpoint does pagination and filtering and such, and returns jobs
sorted by ModifyIndex, so latest-changed jobs still come first. It pulls
allocs and the latest deployment straight out of current state for a more
robust, holistic view of the job status. It is less efficient per-job, due to
the extra state lookups, but should be more efficient per-page (except perhaps
for jobs with very many allocs).
If a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. The main goal here
is to prevent "jostling" the user in the UI when jobs come into and out of
existence.
And if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than on any change to any of the
state tables being queried ("jobs", "allocs", and "deployment"), to save
unnecessary HTTP round trips.
Add a transparent proxy block to the existing Connect sidecar service proxy
block. This changeset is plumbing required to support transparent proxy
configuration on the client.
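As a sketch, the new block nests inside the existing proxy block (shown empty,
assuming defaults; its attributes are follow-up work to this plumbing change):

  connect {
    sidecar_service {
      proxy {
        transparent_proxy {}
      }
    }
  }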
Ref: https://github.com/hashicorp/nomad/issues/10628
The JSON response for the Read Stats client API includes an `AllocDirStats`
field. This field is missing in the `api` package, so consumers of the Go API
can't use it to read the values we're getting back from the HTTP server.
Fixes: https://github.com/hashicorp/nomad/issues/20246
Add support for further configuring `gateway.ingress.service` blocks to bring
this block up-to-date with currently available Consul API fields (except for
namespace and admin partition, which will need to be handled in a different
PR). These fields are sent to Consul as part of the job endpoint submission hook
for Connect gateways.
Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
Add information about autopilot health to the `/operator/autopilot/health` API
in Nomad Enterprise.
I've pulled the CE changes required for this feature out of @lindleywhite's PR
in the Enterprise repo. A separate PR will include a new `operator autopilot
health` command that can present this information at the command line.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394
Co-authored-by: Lindley <lindley@hashicorp.com>
* tests: swap testify for test in plugins/csi/client_test.go
* tests: swap testify for test in testutil/
* tests: swap testify for test in host_test.go
* tests: swap testify for test in plugin_test.go
* tests: swap testify for test in utils_test.go
* tests: swap testify for test in scheduler/
* tests: swap testify for test in parse_test.go
* tests: swap testify for test in attribute_test.go
* tests: swap testify for test in plugins/drivers/
* tests: swap testify for test in command/
* tests: fixup some test usages
* go: run go mod tidy
* windows: cpuset test only on linux
This PR is the first of two that will implement the new Disconnect block. In
this PR the new block is introduced to be backwards compatible with the fields
it will replace. For more information refer to this RFC and this ticket.
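A hedged sketch of the new group-level block, assuming the attribute names
that map to the fields being replaced (not all attributes shown):

  group "example" {
    disconnect {
      lost_after           = "6h"  # replaces max_client_disconnect
      stop_on_client_after = "30m" # replaces stop_after_client_disconnect
      replace              = false # replaces prevent_reschedule_on_lost
    }
  }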
Add a new configuration option on tasks' volume_mounts to give fine-grained
control over the SELinux "z" label.
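A hedged sketch, assuming the option is named `selinux_label` (the name is not
stated in this message; volume and path are illustrative):

  volume_mount {
    volume        = "data"
    destination   = "/var/lib/app"
    selinux_label = "z" # shared relabeling, like docker's :z volume flag
  }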
* Update website/content/docs/job-specification/volume_mount.mdx
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* fix: typo
* func: make volume mount verification happen even on mounts with no volume
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The new `nomad setup vault -check` command can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
Some users with batch workloads or short-lived prestart tasks want to derive a
Vault token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.
When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.
Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.
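As a sketch of the short-lived prestart case described above (task layout
illustrative; `allow_token_expiration` is the only new field):

  task "migrate" {
    lifecycle {
      hook = "prestart"
    }
    vault {
      allow_token_expiration = true # skip renewal; token lapses with its lease
    }
  }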
Fixes: https://github.com/hashicorp/nomad/issues/8690
Add support for Consul Enterprise admin partitions. We added fingerprinting in
https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition`
field. The expectation is that most users will create a mapping of Nomad node
pool to Consul admin partition. But we'll also create an implicit constraint for
the fingerprinted value.
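A minimal sketch of the new field (partition name illustrative):

  consul {
    partition = "prod-apps" # matched against the fingerprinted Consul admin partition
  }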
Fixes: https://github.com/hashicorp/nomad/issues/13139
Add a new optional `OIDCDisableUserInfo` setting for the OIDC auth provider,
which disables the request to the identity provider for OIDC UserInfo.
This option is helpful when your identity provider doesn't send any additional
claims from the UserInfo endpoint, such as the Microsoft AD FS OIDC provider:
> The AD FS UserInfo endpoint always returns the subject claim as specified in the
> OpenID standards. AD FS doesn't support additional claims requested via the
> UserInfo endpoint
Fixes #19318
Nomad CI checks for copywrite headers using multiple config files
for specific exemption paths. This means the top-level config file
does not take effect when running the copywrite script within
these sub-folders. Exempt files therefore need to be added to the
sub-config files, along with the top level.
This commit introduces the parameter preventRescheduleOnLost, which indicates
that the task group can't afford to have multiple instances running at the
same time. If a node goes down, its allocations will be registered as unknown,
but no replacements will be rescheduled. If the lost node comes back up, the
allocs will reconnect and continue to run.
If max_client_disconnect is also enabled and there is a reschedule policy, an
error will be returned.
Implements issue #10366
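A hedged sketch, assuming the jobspec attribute is the snake_case form of the
parameter named above:

  group "singleton" {
    prevent_reschedule_on_lost = true # allocs on a lost node go "unknown"; no replacement placed
  }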
Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>