* Gallery allows picking stuff
* Small fixes
* added sentinel templates
* Can set enforcement level on policies
* Working on the interactive sentinel dev mode
* Very rough development flow on FE
* Changed position in gutter menu
* More sentinel stuff
* PR cleanup: removed testmode, removed unneeded mixins and deps
* Heliosification
* Index-level sentinel policy deletion and page title fixes
* Makes the Canaries sentinel policy real and then comments out the unfinished ones
* rename Access Control to Administration in prep for moving Sentinel Policies and Node Pool admin there
* Sentinel policies moved within the Administration section
* Mirage fixture for sentinel policy endpoints
* Description length check and 500 prevention
* Sync review PR feedback addressed, implied buttons on radio cards
* Cull un-used sentinel policies
---------
Co-authored-by: Mike Nomitch <mail@mikenomitch.com>
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.
to enable daily-scheduled execution entirely client-side,
a job may now contain:
task "name" {
schedule {
cron {
start = "0 12 * * * *" # may not include "," or "/"
end = "0 16" # partial cron, with only {minute} {hour}
timezone = "EST" # anything in your tzdb
}
}
...
and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.
this includes a taskrunner hook, which watches for the end of
the schedule, at which point it will kill the task.
then, if restarts allow, a new task will start and again block
waiting for the schedule start, and so on.
this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
* Hacky but shows links and desc
* markdown
* Small pre-test cleanup
* Test for UI description and link rendering
* JSON jobspec docs and variable example job get UI block
* Jobspec documentation for UI block
* Description and links moved into the Title component and made into Helios components
* Marked version upgrade
* Allow links without a description and max description to 1000 chars
* Node 18 for setup-js
* markdown sanitization
* Ui to UI and docs change
* Canonicalize, copy and diff for job.ui
* UI block added to testJob for structs testing
* diff test
* Remove redundant reset
* For readability, changing the receiving pointer of copied job variables
* TestUI endpoint conversion tests
* -require +must
* Nil check on Links
* JobUIConfig.Links as pointer
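Roughly, the commits above map the new `ui` jobspec block onto a job-level config struct along these lines (a sketch only; field names and types are approximations rather than the exact Nomad structs):

package structs

// JobUILink is one link rendered for the job in the web UI.
type JobUILink struct {
	Label string
	URL   string
}

// JobUIConfig carries the job's `ui` block: a markdown description (capped
// at 1000 characters) plus zero or more links. Links is nil-checked during
// canonicalize, copy, and diff.
type JobUIConfig struct {
	Description string
	Links       []*JobUILink
}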
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
While working on #20462 and #12319, I found that some of our scheduler tests around
down nodes or disconnected clients were enforcing invariants that were
unclear. This changeset pulls out some minor refactorings so that the bug fix PR
is easier to review. This includes:
* Migrating a few tests from `testify` to `shoenig/test` that I'm going to touch
in #12319 anyways.
* Adding test names to the node down test
* Update the disconnected client test so that we always re-process the
pending/blocked eval it creates; this eliminates 2 redundant sub-tests.
* Update the disconnected client test assertions so that they're explicit in the
test setup rather than implied by whether we re-process the pending/blocked
eval.
Ref: https://github.com/hashicorp/nomad/issues/20462
Ref: https://github.com/hashicorp/nomad/pull/12319
The new transparent proxy feature already has an implicit constraint on the
presence of the CNI plugin. But if the CNI plugin is installed on a client running
an older version of Nomad, this isn't sufficient to protect against placing tproxy
workloads on clients that can't support it. Add a Nomad version constraint as
well.
Fixes: https://github.com/hashicorp/nomad/issues/20614
* Permit Consul Connect Gateways to be used with podman
Enable use of Consul Connect Gateways (ingress/terminating/mesh)
with the podman task driver.
An earlier PR selected the task driver for Connect-enabled sidecar tasks:
it used podman if any other task in the same task group was using podman,
or fell back to docker otherwise.
That PR did not consider Consul Connect gateways, which remained
hardcoded to always use the docker task driver.
This change applies the same heuristic to gateway tasks, enabling use of
podman (see the sketch after this commit list).
Limitations: The heuristic only works where the task group containing
the gateway also contains a podman task. Therefore it does not work
for the ingress example in the docs
(https://developer.hashicorp.com/nomad/docs/job-specification/gateway#ingress-gateway)
which uses Connect Native and requires the gateway to be in a separate task.
* cl: add cl for connect gateway podman autodetect
* connect: add test ensuring we guess podman for gateway when possible
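A rough sketch of the heuristic in Go (simplified; not the actual Nomad function or its real signature):

// pickGatewayDriver mirrors the heuristic described above: use podman for
// the injected gateway task when any other task in the group already uses
// it, otherwise fall back to docker.
func pickGatewayDriver(groupTaskDrivers []string) string {
	for _, driver := range groupTaskDrivers {
		if driver == "podman" {
			return "podman"
		}
	}
	return "docker"
}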
---------
Co-authored-by: Seth Hoenig <shoenig@duck.com>
Configuration changes to use backport assistant with LTS support. These include:
* adding a manifest file for active releases
* adding configuration to send backports to the ENT repo
While working on #20462, I discovered that some of the scheduler tests for
disconnected clients were making long blocking queries. The tests used
`testutil.WaitForResult` to wait for an evaluation to be written to the state
store. The evaluation was never written, but the tests were not correctly
returning an error for an empty query. This resulted in the tests blocking for
5s and then continuing anyways.
In practice, the evaluation is never written to the state store as part of the
test harness `Process` method, so this test assertion was meaningless. Remove
the broken assertion from the two top-level tests that used it, and upgrade
these tests to use `shoenig/test` in the process. This will save us ~50s per
test run.
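A minimal sketch of the broken pattern, with a placeholder lookup standing in for the real state store query (names here are illustrative, not the actual test code):

package scheduler

import (
	"testing"

	"github.com/hashicorp/nomad/testutil"
	"github.com/stretchr/testify/require"
)

func TestBrokenWaitPattern(t *testing.T) {
	// lookup stands in for the state store query the real tests perform.
	lookup := func() ([]string, error) { return nil, nil } // always empty, never errors

	testutil.WaitForResult(func() (bool, error) {
		out, err := lookup()
		if err != nil {
			return false, err
		}
		// Returning (false, nil) for an empty result means WaitForResult
		// retries until its ~5s default timeout and then calls the error
		// handler below with a nil error.
		return len(out) > 0, nil
	}, func(err error) {
		require.NoError(t, err) // err is nil here, so the test "passes" anyway
	})
}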
When a job that implements a plugin is updated to have a new plugin ID, the old
version of the plugin is never deleted. We want to delay deleting plugins until
garbage collection to avoid race conditions between a plugin being registered
and its allocations being marked healthy.
Add logic to the state store's `DeleteCSIPlugin` method (used only by GC) to
check whether any of the jobs associated with the plugin have no allocations and
either have been purged or have been updated to no longer implement that plugin
ID.
This changeset also updates the CSI plugin lifecycle tests in the state store to
use `shoenig/test` over `testify`, and removes a spurious error log that was
happening on every periodic plugin GC attempt.
Fixes: https://github.com/hashicorp/nomad/issues/20225
The CSI hook for each allocation that claims a volume runs concurrently. If a
call to `MountVolume` happens at the same time as a call to `UnmountVolume` for
the same volume, it's possible for the second alloc to detect the volume has
already been staged, then for the original alloc to unpublish and unstage it,
only for the second alloc to then attempt to publish a volume that's been
unstaged.
The usage tracker on the volume manager was intended to prevent this behavior
but the call to claim the volume was made only after staging and publishing was
complete. Move the call to claim the volume for the usage tracker to the top of
the `MountVolume` workflow to prevent it from being unstaged until all consuming
allocations have called `UnmountVolume`.
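A simplified model of the claim ordering, with made-up names rather than the actual volume manager API: the claim is recorded before any staging or publishing work, and the volume is only unstaged once the last claim is released.

package main

import "sync"

// usageTracker is a stand-in for the volume manager's usage tracker.
type usageTracker struct {
	mu     sync.Mutex
	claims map[string]map[string]struct{} // volume ID -> alloc IDs holding a claim
}

func (u *usageTracker) claim(vol, alloc string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if u.claims == nil {
		u.claims = map[string]map[string]struct{}{}
	}
	if u.claims[vol] == nil {
		u.claims[vol] = map[string]struct{}{}
	}
	u.claims[vol][alloc] = struct{}{}
}

// free drops a claim and reports how many claims remain for the volume.
func (u *usageTracker) free(vol, alloc string) int {
	u.mu.Lock()
	defer u.mu.Unlock()
	delete(u.claims[vol], alloc)
	return len(u.claims[vol])
}

func main() {
	var usage usageTracker

	// MountVolume: claim before staging/publishing, so a concurrent
	// UnmountVolume for another alloc can't unstage the volume in between.
	usage.claim("volA", "alloc-1")
	usage.claim("volA", "alloc-2")
	// ... stage (if needed) and publish volA for each alloc here ...

	// UnmountVolume: unpublish for this alloc, then unstage only when the
	// last claim has been released.
	if usage.free("volA", "alloc-1") == 0 {
		// not reached: alloc-2 still holds a claim, so volA stays staged
	}
	if usage.free("volA", "alloc-2") == 0 {
		// ... unstage volA here ...
	}
}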
Fixes: https://github.com/hashicorp/nomad/issues/20424
When the allocation is stopped, we deregister the service in the alloc runner's
`PreKill` hook. This ensures we delete the service registration and wait for the
shutdown delay before shutting down the tasks, so that workloads can drain their
connections. However, the call to remove the workload only logs errors and never
retries them.
Add a short retry loop to the `RemoveWorkload` method for Nomad services, so
that transient errors give us an extra opportunity to deregister the service
before the tasks are stopped and before we need to fall back to the data integrity
improvements implemented in #20590.
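A hedged sketch of the retry shape; `deregister` is a placeholder for the actual removal call, and the attempt count and backoff are illustrative rather than the values used in the change:

package services

import "time"

// retryDeregister retries a transiently failing deregistration a few times
// with a short backoff before giving up.
func retryDeregister(deregister func() error) error {
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if err = deregister(); err == nil {
			return nil
		}
		time.Sleep(200 * time.Millisecond) // brief pause between attempts
	}
	return err
}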
Ref: https://github.com/hashicorp/nomad/issues/16616
This changeset fixes three potential data integrity issues between allocations
and their Nomad native service registrations.
* When a node is marked down because it missed heartbeats, we remove Vault and
Consul tokens (for the pre-Workload Identity workflows) after we've written
the node update to Raft. This is unavoidably non-transactional because the
Consul and Vault servers aren't in the same Raft cluster as Nomad itself. But
we've unnecessarily mirrored this same behavior to deregister Nomad
services. This makes it possible for the leader to successfully write the node
update to Raft without removing services.
To address this, move the delete into the same Raft transaction. One minor
caveat with this approach is the upgrade path: if the leader is upgraded first
and a node is marked down during this window, older followers will have stale
information until they are also upgraded. This is unavoidable without
requiring the leader to unconditionally make an extra Raft write for every
down node until 2 LTS versions after Nomad 1.8.0. This temporary reduction in
data integrity for stale reads seems like a reasonable tradeoff.
* When an allocation is marked client-terminal from the client in
`UpdateAllocsFromClient`, we have an opportunity to ensure data integrity by
deregistering services for that allocation.
* When an allocation is deleted during eval garbage collection, we have an
opportunity to ensure data integrity by deregistering services for that
allocation. This is a cheap no-op if the allocation has been previously marked
client-terminal.
This changeset does not address client-side retries for the originally reported
issue, which will be done in a separate PR.
Ref: https://github.com/hashicorp/nomad/issues/16616
In the Unveil filesystem isolation mode we were mounting the shared
alloc dir with the UID/GID of the user of the task dir being mounted
and 0710 filesystem permissions. This was causing the actual task dir
to become inaccessible to other tasks in the allocation (a race where
the last mounter wins). Instead mount the shared alloc dir as nobody
with 0777 filesystem permissions.
Users can override the default sidecar task for Connect workloads. This sidecar
task might need access to certificate stores on the host. Allow adding the
`volume_mount` block to the sidecar task override.
Also fixes a bug where `volume_mount` blocks would not appear in plan diff
outputs.
Fixes: https://github.com/hashicorp/nomad/issues/19786
This change exposes CNI configuration details of a network
namespace as environment variables. This allows a task to use
these values to configure itself; a potential use case is to run
a Raft application binding to the IP and port details configured using
the bridge network mode.
Whenever the "exec" task driver is being used, nomad runs a plug in that in time runs the task on a container under the hood. If by any circumstance the executor is killed, the task is reparented to the init service and wont be stopped by Nomad in case of a job updated or stop.
This commit introduces two mechanisms to avoid this behaviour:
* Adds signal catching and handling to the executor, so in case of a SIGTERM, the signal will also be passed on to the task.
* Adds a pre start clean up of the processes in the container, ensuring only the ones the executor runs are present at any given time.
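A minimal sketch of the signal-forwarding part, assuming the executor tracks the task's process as an exec.Cmd (names and structure are illustrative, not the actual executor code):

package executor

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// forwardSignals relays SIGTERM received by the executor to the task's
// process, so that killing the executor no longer leaves the task running
// reparented under init. Close the returned channel to stop forwarding.
func forwardSignals(cmd *exec.Cmd) chan struct{} {
	sigs := make(chan os.Signal, 1)
	stop := make(chan struct{})
	signal.Notify(sigs, syscall.SIGTERM)
	go func() {
		for {
			select {
			case sig := <-sigs:
				if cmd.Process != nil {
					_ = cmd.Process.Signal(sig) // pass the signal on to the task
				}
			case <-stop:
				signal.Stop(sigs)
				return
			}
		}
	}()
	return stop
}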
The `nomad plugin status :plugin_id` command lists allocations that implement
the plugin being queried. This list is filtered by the `-namespace` flag as
usual. Cluster admins will likely deploy plugins to a single namespace, but for
convenience they may want to have the wildcard namespace set in their command
environment.
Add support for handling the wildcard namespace to the CSI plugin RPC handler.
Fixes: https://github.com/hashicorp/nomad/issues/20537
CSI volumes are namespaced. But the client does not include the namespace in the
staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host. The
per-allocation paths don't need to be namespaced, because an allocation can only
mount volumes from its job's own namespace.
Rework the CSI hook tests to have more fine-grained control over the mock
on-disk state. Add tests covering upgrades from staging paths missing
namespaces.
Fixes: https://github.com/hashicorp/nomad/issues/18741
We bring in `containernetworking/plugins` for the contents of a single file,
which we use in a few places for running a goroutine in a specific network
namespace. This code hasn't needed an update in a couple of years, and a good
chunk of what we need was previously vendored into `client/lib/nsutil`
already.
Updating the library via dependabot is causing errors in Docker driver tests
because it updates a lot of transitive dependencies, and it's bringing in a pile
of new transitive dependencies like opentelemetry. Avoid this problem going
forward by vendoring the remaining code we hadn't already.
Ref: https://github.com/hashicorp/nomad/pull/20146
The ACL docs have a section explaining that some parts of the UI need slightly
wider read permissions than expected. These docs should include that you need
`plugin:read` to look at CSI volume pages in the UI.
Fixes: https://github.com/hashicorp/nomad/issues/18527
* drivers/raw_exec: enable setting cgroup override values
This PR enables configuration of cgroup override values on the `raw_exec`
task driver. WARNING: setting cgroup override values eliminates any
guarantee Nomad can make about resource availability for *any* task on
the client node.
For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`.
The path may be either absolute or relative to the cgroup root.
config {
  cgroup_v2_override = "custom.slice/app.scope"
}
or
config {
  cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope"
}
For cgroup v1 systems, set a per-controller path for each controller using
`cgroup_v1_override`. The path(s) may be either absolute or relative to
the controller root.
config {
  cgroup_v1_override = {
    "pids": "custom/app",
    "cpuset": "custom/app",
  }
}
or
config {
  cgroup_v1_override = {
    "pids": "/sys/fs/cgroup/pids/custom/app",
    "cpuset": "/sys/fs/cgroup/cpuset/custom/app",
  }
}
* drivers/rawexec: ensure only one of v1/v2 cgroup override is set
* drivers/raw_exec: executor should error if setting cgroup does not work
* drivers/raw_exec: create cgroups in raw_exec tests
* drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root
* move custom cgroup func into shared file
---------
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
The batch deregister RPC endpoint is only used by the internal
garbage collection process; it is not exposed via the HTTP API or
used anywhere else.
The GC process ensures that a job can only be removed from state
if all related evaluations and allocations are in a state that
means they can also be removed from state. This means that we do
not need to create evaluations when jobs are being deregistered
via this endpoint.
* Hook and latch on the initial index
* Serialization and restart of controller and table
* de-log
* allocBlocks reimplemented at job model level
* totalAllocs doesn't mean on jobmodel what it did in steady.js
* Hamburgers to sausages
* Hacky way to bring new jobs back around and parent job handling in list view
* Getting closer to hook/latch
* Latch from update on hook from initialize, but fickle
* Note on multiple-watch problem
* Sensible monday morning comment removal
* use of abortController to handle transition and reset events
* Next token will now update when there's an on-page shift
* Very rough anti-jostle technique
* Demoable, now to move things out of route and into controller
* Into the controller, generally
* Smarter cancellations
* Reset abortController on index models run, and system/sysbatch jobs now have an improved groupCountSum computed property
* Prev Page reverse querying
* n+1th jobs existing will trigger nextToken/pagination display
* Start of a GET/POST statuses return
* Namespace fix
* Unblock tests
* Realizing to my small horror that this skipURLModification flag may be too heavy handed
* Lintfix
* Default liveupdates localStorage setting to true
* Pagination and index rethink
* Big uncoupling of watchable and url-append stuff
* Testfixes for region, search, and keyboard
* Job row class for test purposes
* Allocations in test now contain events
* Starting on the jobs list tests in earnest
* Forbidden state de-bubbling cleanup
* Job list page size fixes
* Facet/Search/Filter jobs list tests skipped
* Maybe it's the automatic mirage logging
* Unbreak task unit test
* Pre-sort sort
* styling for jobs list pagination and general PR cleanup
* moving from Job.ActiveDeploymentID to Job.LatestDeployment.ID
* modifyIndex-based pagination (#20350)
* modifyIndex-based pagination
* modifyIndex gets its own column and pagination compacted with icons
* A generic withPagination handler for mirage
* Some live-PR changes
* Pagination and button disabled tests
* Job update handling tests for jobs index
* assertion timeout in case of long setTimeouts
* assert.timeouts down to 500ms
* de-to-do
* Clarifying comment and test descriptions
* Bugfix: resizing your browser on the new jobs index page would make the viz grow forever (#20458)
* [ui] Searching and filtering options (#20459)
* Beginnings of a search box for filter expressions
* jobSearchBox integration test
* jobs list updateFilter initial test
* Basic jobs list filtering tests
* First attempt at side-by-side facets and search with a computed filter
* Weirdly close to an iterative approach but checked isn't tracked properly
* Big rework to make filter composition and decomposition work nicely with the url
* Namespace facet dropdown added
* NodePool facet dropdown added
* hdsFacet for future testing and basic namespace filtering test
* Namespace filter existence test
* Status filtering
* Node pool/dynamic facet test
* Test patchups
* Attempt at optimize test fix
* Allocation re-load on optimize page explainer
* The Big Un-Skip
* Post-PR-review cleanup
* todo-squashing
* [ui] Handle parent/child jobs with the paginated Jobs Index route (#20493)
* First pass at a non-watchQuery version
* Parameterized jobs get child fetching and jobs index status style for parent jobs
* Completed allocs vs Running allocs in a child-job context, and fix an issue where moving from parent to parent would not reset index
* Testfix and better handling empty-child-statuses-list
* Parent/child test case
* Don't show empty allocation-status bars for parent jobs with no children
* Splits Settings into 2 sections, sign-in/profile and user settings (#20535)
* Changelog
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.
currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).
this does pagination and filtering and such, and returns jobs sorted by ModifyIndex,
so latest-changed jobs still come first. it pulls allocs and latest deployment
straight out of current state for a more robust, holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (excepting perhaps for job(s) with very-many allocs).
if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.
and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than any change to any of the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
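A hedged example of exercising the endpoint with plain net/http; the path, POST body shape, and `?index=` parameter come from the description above, while the agent address, namespace, and job ID are placeholders:

package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// restrict the response to a known set of jobs and block until the
	// on-page state passes index 42
	body := strings.NewReader(`{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`)
	req, err := http.NewRequest(http.MethodPost,
		"http://127.0.0.1:4646/v1/jobs/statuses?index=42", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Header.Get("X-Nomad-Index"), string(out))
}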
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.
We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.
Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in
just a couple of functions.
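For illustration, the READY notification is only a few lines (a sketch of the standard notify protocol, not the actual Nomad code; RELOADING/STOPPING states are handled similarly):

package agent

import (
	"net"
	"os"
)

// notifyReady sends "READY=1" to the socket named by NOTIFY_SOCKET, telling
// systemd the agent is ready to handle signals such as SIGHUP. It is a no-op
// when not running under a systemd Type=notify unit.
func notifyReady() error {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return nil // not running under systemd's notify protocol
	}
	conn, err := net.Dial("unixgram", path)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte("READY=1"))
	return err
}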
For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.
Fixes: https://github.com/hashicorp/nomad/issues/3885
When available, we provide an environment variable `CONSUL_TOKEN` to tasks, but
this isn't the environment variable expected by the Consul CLI. Job
specifications like deploying an API Gateway become noticeably nicer if we can
instead provide the expected env var.
When setting up auth methods for Consul and Vault in production environments, we
can typically assume that the CA certificate for the JWKS endpoint will be in
the host certificate store (as part of the usual configuration management
cluster admins need to do). But for quick demos with `-dev` agents, this won't
be the case.
Add a `-jwks-ca-file` parameter to the setup commands so that we can use this
tool to quickly set up WI with `-dev` agents running TLS.