The batch deregister RPC endpoint is only used by the internal
garbage collection process; it is not exposed via the HTTP API or
used anywhere else.
The GC process ensures that a job can only be removed from state
if all of its related evaluations and allocations can also be
removed. This means that we do not need to create evaluations when
jobs are being deregistered via this endpoint.
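For illustration, a minimal sketch of that invariant (the types and names below are hypothetical stand-ins, not Nomad's actual implementation):

```go
package gc

// Hypothetical stand-ins for Nomad's evaluation and allocation structs.
type evaluation struct{ terminal bool }
type allocation struct{ terminal bool }

// jobGCEligible sketches the invariant: a job may be batch-deregistered only
// if every related evaluation and allocation is itself removable from state.
func jobGCEligible(evals []evaluation, allocs []allocation) bool {
	for _, e := range evals {
		if !e.terminal {
			return false
		}
	}
	for _, a := range allocs {
		if !a.terminal {
			return false
		}
	}
	return true
}
```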
* Hook and latch on the initial index
* Serialization and restart of controller and table
* de-log
* allocBlocks reimplemented at job model level
* totalAllocs doesn't mean on jobmodel what it did in steady.js
* Hamburgers to sausages
* Hacky way to bring new jobs back around and parent job handling in list view
* Getting closer to hook/latch
* Latch from update on hook from initialize, but fickle
* Note on multiple-watch problem
* Sensible monday morning comment removal
* use of abortController to handle transition and reset events
* Next token will now update when there's an on-page shift
* Very rough anti-jostle technique
* Demoable, now to move things out of route and into controller
* Into the controller, generally
* Smarter cancellations
* Reset abortController on index models run, and system/sysbatch jobs now have an improved groupCountSum computed property
* Prev Page reverse querying
* n+1th jobs existing will trigger nextToken/pagination display
* Start of a GET/POST statuses return
* Namespace fix
* Unblock tests
* Realizing to my small horror that this skipURLModification flag may be too heavy-handed
* Lintfix
* Default liveupdates localStorage setting to true
* Pagination and index rethink
* Big uncoupling of watchable and url-append stuff
* Testfixes for region, search, and keyboard
* Job row class for test purposes
* Allocations in test now contain events
* Starting on the jobs list tests in earnest
* Forbidden state de-bubbling cleanup
* Job list page size fixes
* Facet/Search/Filter jobs list tests skipped
* Maybe it's the automatic mirage logging
* Unbreak task unit test
* Pre-sort sort
* styling for jobs list pagination and general PR cleanup
* moving from Job.ActiveDeploymentID to Job.LatestDeployment.ID
* modifyIndex-based pagination (#20350)
* modifyIndex-based pagination
* modifyIndex gets its own column and pagination compacted with icons
* A generic withPagination handler for mirage
* Some live-PR changes
* Pagination and button disabled tests
* Job update handling tests for jobs index
* assertion timeout in case of long setTimeouts
* assert.timeouts down to 500ms
* de-to-do
* Clarifying comment and test descriptions
* Bugfix: resizing your browser on the new jobs index page would make the viz grow forever (#20458)
* [ui] Searching and filtering options (#20459)
* Beginnings of a search box for filter expressions
* jobSearchBox integration test
* jobs list updateFilter initial test
* Basic jobs list filtering tests
* First attempt at side-by-side facets and search with a computed filter
* Weirdly close to an iterative approach but checked isn't tracked properly
* Big rework to make filter composition and decomposition work nicely with the url
* Namespace facet dropdown added
* NodePool facet dropdown added
* hdsFacet for future testing and basic namespace filtering test
* Namespace filter existence test
* Status filtering
* Node pool/dynamic facet test
* Test patchups
* Attempt at optimize test fix
* Allocation re-load on optimize page explainer
* The Big Un-Skip
* Post-PR-review cleanup
* todo-squashing
* [ui] Handle parent/child jobs with the paginated Jobs Index route (#20493)
* First pass at a non-watchQuery version
* Parameterized jobs get child fetching and jobs index status style for parent jobs
* Completed allocs vs Running allocs in a child-job context, and fix an issue where moving from parent to parent would not reset index
* Testfix and better handling empty-child-statuses-list
* Parent/child test case
* Don't show empty allocation-status bars for parent jobs with no children
* Splits Settings into 2 sections, sign-in/profile and user settings (#20535)
* Changelog
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.
currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).
this endpoint handles pagination and filtering server-side, and returns jobs
sorted by ModifyIndex, so latest-changed jobs still come first. it pulls allocs
and the latest deployment straight out of current state for a more robust,
holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (excepting perhaps for jobs with very many allocs).
if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.
and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than on any change to the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
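As a rough illustration, here is a minimal Go client for the endpoint as described above (the address and error handling are assumptions, not part of this change):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Limit the response to a fixed set of jobs, per the POST body described
	// above, to avoid jostling the UI as jobs come and go.
	body := bytes.NewBufferString(`{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`)

	// ?index=N makes this a blocking query that should only unblock when the
	// jobs "on page" change, not on every write to the underlying tables.
	url := "http://127.0.0.1:4646/v1/jobs/statuses?index=42"

	resp, err := http.Post(url, "application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```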
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially send the reload signal before the agent is
able to handle it, causing the agent to crash.
We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.
Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in
just a couple of functions.
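As a sketch of how small that is (simplified: a real implementation also handles systemd's abstract sockets, whose names begin with `@`):

```go
package main

import (
	"net"
	"os"
)

// sdNotify sends a state string to the unix datagram socket systemd names in
// NOTIFY_SOCKET. It is a no-op when the variable is unset, which covers
// non-systemd Linux and non-Linux systems.
func sdNotify(state string) error {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return nil // not running under a Type=notify systemd unit
	}
	conn, err := net.Dial("unixgram", path)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(state))
	return err
}

func main() {
	// Called once the client/server components are running and the SIGHUP
	// handler is installed, so systemd holds reloads until we're ready.
	_ = sdNotify("READY=1")
}
```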
For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.
Fixes: https://github.com/hashicorp/nomad/issues/3885
When available, we provide an environment variable `CONSUL_TOKEN` to tasks, but
this isn't the environment variable expected by the Consul CLI. Job
specifications for workloads like an API Gateway become noticeably nicer if we
can instead provide the expected env var.
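A minimal sketch of the idea (the helper is hypothetical; `CONSUL_HTTP_TOKEN` is the variable the Consul CLI actually reads):

```go
package taskenv

// consulEnv publishes the task's Consul token under both names.
func consulEnv(token string) map[string]string {
	return map[string]string{
		"CONSUL_TOKEN":      token, // variable Nomad already provides
		"CONSUL_HTTP_TOKEN": token, // variable the Consul CLI expects
	}
}
```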
When setting up auth methods for Consul and Vault in production environments, we
can typically assume that the CA certificate for the JWKS endpoint will be in
the host certificate store (as part of the usual configuration management
cluster admins need to do). But for quick demos with `-dev` agents, this won't
be the case.
Add a `-jwks-ca-file` parameter to the setup commands so that we can use this
tool to quickly set up WI with `-dev` agents running TLS.
When a job is purged, we delete all its allocations, and the client detects the
absence of the allocations to clean up its resources locally. But the client
won't be able to send an allocation status update in this case, and that update
is what normally frees the quota being used by the allocation. Instead, we need
to free the quota usage
inside the state store immediately. To do so, we check if the allocation is
already client-terminal before copying it and passing it into the Enterprise
code for cleanup.
This commit also refactors the job delete to make it clear there's a single
caller of this alloc deletion path. This refactoring eliminates some wasteful
logic that queries the "allocs" table, allocates a slice of strings for their
IDs, and then queries the "allocs" table again one-by-one to delete each of
them.
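A hedged sketch of the core idea (the type and status names only approximate Nomad's `structs` package):

```go
package state

// Allocation stands in for Nomad's allocation struct.
type Allocation struct {
	ClientStatus string
}

// clientTerminal approximates the client-terminal check described above.
func (a *Allocation) clientTerminal() bool {
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// prepareForQuotaCleanup forces a purged job's allocation into a
// client-terminal state before the Enterprise quota cleanup sees it, since no
// client status update will ever arrive for a deleted allocation. The copy
// matters: objects in the state store must never be mutated in place.
func prepareForQuotaCleanup(a *Allocation) *Allocation {
	if a.clientTerminal() {
		return a
	}
	c := *a
	c.ClientStatus = "complete"
	return &c
}
```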
Tests for this code can be found in the linked ENT repo PR.
Fixes: https://github.com/hashicorp/nomad-enterprise/issues/1422
Ref: https://hashicorp.atlassian.net/browse/NOMAD-620
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1432
In some cases, such as parameterized jobs, Nomad job scaling will not
generate an evaluation. This change fixes the CLI behaviour in this case
and mirrors the job run command for consistency.
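A sketch of the resulting CLI behaviour (names hypothetical):

```go
package cli

// scaleExitCode captures the fix: when scaling produces no evaluation, as
// with parameterized jobs, there is nothing to monitor and the command
// should simply succeed, as job run already does.
func scaleExitCode(evalID string, monitor func(evalID string) int) int {
	if evalID == "" {
		return 0 // scaling succeeded; no evaluation to monitor
	}
	return monitor(evalID)
}
```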
This reverts commit 45b36371a12ffae5b5bfaaeadb08f801fb6bc98d. Now that Vault
1.16.2 has shipped, the E2E test will pick up only a working version.
Closes: https://github.com/hashicorp/nomad/issues/20298
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by ResolveACL if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.
We fixed a bunch of these cases in https://github.com/hashicorp/nomad/pull/20150
but I didn't update the semgrep rule, which meant we missed a few more. Update
the semgrep rule and fix the remaining cases.
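For reference, a self-contained sketch of the pattern the rule enforces (the types here are stand-ins for Nomad's `acl` package):

```go
package rpc

import "errors"

// aclObject stands in for acl.ACL; with ACLs disabled, ResolveACL returns a
// permissive "ACLs disabled" object, never nil.
type aclObject struct{ allowAll bool }

func (a *aclObject) AllowNamespaceOperation(ns, op string) bool {
	return a.allowAll // the real object consults attached policies
}

var errPermissionDenied = errors.New("Permission denied")

// checkReadJob shows the enforced pattern: handle the error, then use the
// ACL object directly, with no nil check in between.
func checkReadJob(resolve func() (*aclObject, error), ns string) error {
	aclObj, err := resolve()
	if err != nil {
		return err
	}
	// No `aclObj == nil` check: that is exactly what the semgrep rule flags,
	// since it invites future bugs that silently bypass ACLs.
	if !aclObj.AllowNamespaceOperation(ns, "read-job") {
		return errPermissionDenied
	}
	return nil
}
```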
Although Nomad does not use HTTP2, vulnerability scans detect our version of
`golang.org/x/net` as having an HPACK DoS vuln (GHSA-4v7x-pqxf-cx7m). Upgrade
the library so as to quiet the alerts.
Fixes: https://github.com/hashicorp/nomad-enterprise/issues/1423
The `consul_hook` in the allocrunner gets a separate Consul token for each
task, even if the tasks' identities have the same name, but it used the
identity name as the key in the alloc hook resources map, so the last task in
the group overwrote the Consul tokens of all the other tasks.
Fix this by adding the task name to the key in the allocrunner's
`consul_hook`, and update the taskrunner's `consul_hook` to expect the task
name in the key.
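A minimal sketch of the key change (function name hypothetical):

```go
package consulhook

// tokenKey includes the task name in the hook-resources key so tasks that
// share an identity name keep distinct Consul tokens. Previously the key was
// the identity name alone.
func tokenKey(taskName, identityName string) string {
	return taskName + "/" + identityName
}
```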
Fixes: https://github.com/hashicorp/nomad/issues/20374
Fixes: https://hashicorp.atlassian.net/browse/NOMAD-614
* Fix a UI bug where promotion would be offered with no new canaries
* Because we now make use of your allocations, our test cases should more accurately reflect the state of a promotable workflow
The "Provision a Nomad cluster in the cloud" works in AWS with these updates:
- use an available ubuntu version
- uses hashicorp packages where possible
- updates Nvidia installation
- installs CNI plugins
The `mock_driver` is an internal task driver used mostly for testing and
simulating workloads. During the allocrunner v2 work (#4792) its name
changed from `mock_driver` to just `mock` and then back to
`mock_driver`, but the fingerprint key was kept as `driver.mock`.
This results in tasks configured with `driver = "mock"` being scheduled
(because Nomad thinks the client has a task driver called `mock`), but
fail to actually run (because the Nomad client can't find a driver
called `mock` in its catalog).
Fingerprinting the right name prevents the job from being scheduled in
the first place.
Also removes mentions of the mock driver from documentation since it's an
internal driver and not available in any production release.
Add a standalone section to the Consul integration docs showing how to configure
both the Consul agent and the workload to take advantage of Consul DNS. Include
a reference to the new transparent proxy feature as well.
Fixes: https://github.com/hashicorp/nomad/issues/18305
Ports fit in a uint16 at most, but we have a few places in the recent tproxy
code where we were parsing them as 64-bit wide integers and then downcasting
them to `int`, which is technically unsafe and triggers code scanning alerts. In
practice we've validated the range elsewhere and don't build for 32-bit
platforms. This changeset fixes the parsing to make everything a bit more robust
and silence the alert.
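A minimal example of the safer parse (standard library only; not the exact code in this changeset):

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort bounds the width at parse time instead of downcasting a 64-bit
// value after the fact.
func parsePort(s string) (uint16, error) {
	p, err := strconv.ParseUint(s, 10, 16) // rejects anything above 65535
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return uint16(p), nil
}

func main() {
	fmt.Println(parsePort("8080"))  // 8080 <nil>
	fmt.Println(parsePort("70000")) // 0 and an out-of-range error
}
```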
Fixes: https://github.com/hashicorp/nomad-enterprise/security/code-scanning/444
* add LICENSE(.txt) to zip that goes on releases.hashicorp.com
* add LICENSE(.txt) to linux packages and docker image
* add some more docker labels (including license)
In #20296 we added a Go toolchain to the AMI we use for E2E tests, so that we
can build `consul-cni` for tproxy testing. This is intended to be temporary
until `consul-k8s` 1.4.2 is officially released. But the Go cache from building
`consul-k8s` uses up roughly 1.5GiB of space and the test machines have fairly
small disks. This causes the Nomad clients to aggressively GC client allocations
that stop, which breaks tests that run batch workloads and then read their logs.
The docs for ephemeral disk migration use the term "best effort" without
outlining the requirements or the cases under which the migration can
fail. Update the docs to make it obvious that ephemeral disk migration is
subject to data loss.
Fixes: https://github.com/hashicorp/nomad/issues/20355