nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 08:55:43 +03:00

Author	SHA1	Message	Date
Michael Smithhisler	6b6aa7cc26	identity: adds ability to specify custom filepath for saving workload identities (#24038 )	2024-09-23 10:27:00 -04:00
Daniel Bennett	ec81e7c57c	networking: add ignore_collision for static port{} (#23956 ) so more than one copy of a program can run at a time on the same port with SO_REUSEPORT. requires host network mode. some task drivers (like docker) may also need config { network_mode = "host" } but this is not validated prior to placement.	2024-09-17 16:01:48 -05:00
Daniel Bennett	5e1fae2856	networking: set alloc NetworkStatus.AddressIPv6 (#23959 ) when a CNI result includes an IPv6 address, set it on the alloc's NetworkStatus for reference. e.g.: $ nomad alloc status -json 3dca \| jq '.NetworkStatus' { "Address": "172.26.64.14", "AddressIPv6": "fd00:a110:c8::b", "DNS": null, "InterfaceName": "eth0" }	2024-09-16 10:21:52 -05:00
Tim Gross	a9beef7edd	jobspec: remove HCL1 support (#23912 ) This changeset removes support for parsing jobspecs via the long-deprecated HCLv1. Fixes: https://github.com/hashicorp/nomad/issues/20195 Ref: https://hashicorp.atlassian.net/browse/NET-10220	2024-09-05 09:02:45 -04:00
Piotr Kazmierczak	9265b384b3	quota: parse device block (#23866 )	2024-08-28 18:45:12 +02:00
Seth Hoenig	8b093a6a5d	scheduler: support for device - aware numa scheduling (#1760 ) (#23837 ) (CE backport of ENT 59433d56c7215c0b8bf33764f41b57d9bd30160f (without ent files)) * scheduler: enhance numa aware scheduling with support for devices * cr: add comments	2024-08-20 07:53:04 -05:00
Florian Apolloner	d6be784e2d	namespaces: add allowed network modes to capabilities. (#23813 )	2024-08-16 09:47:19 -04:00
Tim Gross	b7419bc940	api: only set url field in config if previously unset (#23785 ) In #16872 we added support for configuring the API client with a unix domain socket. In order to set the host correctly, we parse the address before mutating the Address field in the configuration. But this prevents the configuration from being reused across multiple clients, as the next time we parse the address it will no longer be pointing to the socket. This breaks consumers like the autoscaler, which reuse the API config between plugins. Update the `NewClient` constructor to only override the `url` field if it hasn't already been parsed. Include a test demonstrating safe reuse with a unix domain socket. Ref: https://github.com/hashicorp/nomad-autoscaler/issues/944 Ref: https://github.com/hashicorp/nomad-autoscaler/pull/945	2024-08-09 13:28:04 -04:00
Tim Gross	b25f1b66ce	resources: allow job authors to configure size of secrets tmpfs (#23696 ) On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks need larger space for downloading large secrets. This is especially the case for tasks using `templates`, which need extra room to write a temporary file to the secrets directory that gets renamed to the old file atomically. This changeset allows increasing the size of the tmpfs in the `resources` block. Because this is a memory resource, we need to include it in the memory we allocate for scheduling purposes. The task is already prevented from using more memory in the tmpfs than the `resources.memory` field allows, but can bypass that limit by writing to the tmpfs via `template` or `artifact` blocks. Therefore, we need to account for the size of the tmpfs in the allocation resources. Simply adding it to the memory needed when we create the allocation allows it to be accounted for in all downstream consumers, and then we'll subtract that amount from the memory resources just before configuring the task driver. For backwards compatibility, the default value of 1MiB is "free" and ignored by the scheduler. Otherwise we'd be increasing the allocated resources for every existing alloc, which could cause problems across upgrades. If a user explicitly sets `resources.secrets = 1` it will no longer be free. Fixes: https://github.com/hashicorp/nomad/issues/2481 Ref: https://hashicorp.atlassian.net/browse/NET-10070	2024-08-05 16:06:58 -04:00
Tim Gross	2f4353412d	keyring: support prepublishing keys (#23577 ) When a root key is rotated, the servers immediately start signing Workload Identities with the new active key. But workloads may be using those WI tokens to sign into external services, which may not have had time to fetch the new public key and which might try to fetch new keys as needed. Add support for prepublishing keys. Prepublished keys will be visible in the JWKS endpoint but will not be used for signing or encryption until their `PublishTime`. Update the periodic key rotation to prepublish keys at half the `root_key_rotation_threshold` window, and promote prepublished keys to active after the `PublishTime`. This changeset also fixes two bugs in periodic root key rotation and garbage collection, both of which can't be safely fixed without implementing prepublishing: * Periodic root key rotation would never happen because the default `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM time table. We now compare the `CreateTime` against the wall clock time instead of the time table. (We expect to remove the time table in future work, ref https://github.com/hashicorp/nomad/issues/16359) * Root key garbage collection could GC keys that were used to sign identities. We now wait until `root_key_rotation_threshold` + `root_key_gc_threshold` before GC'ing a key. * When rekeying a root key, the core job did not mark the key as inactive after the rekey was complete. Ref: https://hashicorp.atlassian.net/browse/NET-10398 Ref: https://hashicorp.atlassian.net/browse/NET-10280 Fixes: https://github.com/hashicorp/nomad/issues/19669 Fixes: https://github.com/hashicorp/nomad/issues/23528 Fixes: https://github.com/hashicorp/nomad/issues/19368	2024-07-19 13:29:41 -04:00
Martina Santangelo	661011f5de	cni: allow users to set CNI args in job spec (#23538 )	2024-07-12 11:47:15 -04:00
Piotr Kazmierczak	fa8ffedd74	api: handle newlines in JobSubmission vars correctly (#23560 ) Fixes a bug where variable values in job submissions that contained newlines weren't encoded correctly, and thus jobs that contained them couldn't be resumed once stopped via the UI. Internal ref: https://hashicorp.atlassian.net/browse/NET-9966	2024-07-12 08:04:27 +02:00
Tim Gross	cd3101d624	scale: add `-check-index` to `job scale` command (#23457 ) The RPC handler for scaling a job passes flags to enforce the job modify index is unchanged when it makes the write to Raft. But its only checking against the existing job modify index at the time the RPC handler snapshots the state store, so it can only enforce consistency for its own validation. In clusters with automated scaling, it would be useful to expose the enforce index options to the API, so that cluster admins can enforce that scaling only happens when the job state is consistent with a state they've previously seen in other API calls. Add this option to the CLI and API and have the RPC handler check them if asked. Fixes: https://github.com/hashicorp/nomad/issues/23444	2024-06-27 16:54:06 -04:00
Daniel Bennett	cfeedd05e8	api: use the task in Allocations.GetPauseState (#23377 )	2024-06-18 12:31:12 -05:00
Tim Gross	fa70267787	scheduler: `RescheduleTracker` dropped if follow-up fails placements (#12319 ) When an allocation fails it triggers an evaluation. The evaluation is processed and the scheduler sees it needs to reschedule, which triggers a follow-up eval. The follow-up eval creates a plan to `(stop 1) (place 1)`. The replacement alloc has a `RescheduleTracker` (or gets its `RescheduleTracker` updated). But in the case where the follow-up eval can't place all allocs (there aren't enough resources), it can create a partial plan to `(stop 1) (place 0)`. It then creates a blocked eval. The plan applier stops the failed alloc. Then when the blocked eval is processed, the job is missing an allocation, so the scheduler creates a new allocation. This allocation is _not_ a replacement from the perspective of the scheduler, so it's not handed off a `RescheduleTracker`. This changeset fixes this by annotating the reschedule tracker whenever the scheduler can't place a replacement allocation. We check this annotation for allocations that have the `stop` desired status when filtering out allocations to pass to the reschedule tracker. I've also included tests that cover this case and expands coverage of the relevant area of the code. Fixes: https://github.com/hashicorp/nomad/issues/12147 Fixes: https://github.com/hashicorp/nomad/issues/17072	2024-06-10 11:15:40 -04:00
nicoche	ffcb72bfe3	api: Add Notes field to service checks (#22397 ) Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2024-06-10 16:59:49 +02:00
Daniel Bennett	4415fabe7d	jobspec: time based task execution (#22201 ) this is the CE side of an Enterprise-only feature. a job trying to use this in CE will fail to validate. to enable daily-scheduled execution entirely client-side, a job may now contain: task "name" { schedule { cron { start = "0 12 * * * *" # may not include "," or "/" end = "0 16" # partial cron, with only {minute} {hour} timezone = "EST" # anything in your tzdb } } ... and everything about the allocation will be placed as usual, but if outside the specified schedule, the taskrunner will block on the client, waiting on the schedule start, before proceeding with the task driver execution, etc. this includes a taksrunner hook, which watches for the end of the schedule, at which point it will kill the task. then, restarts-allowing, a new task will start and again block waiting for start, and so on. this also includes all the plumbing required to pipe API calls through from command->api->agent->server->client, so that tasks can be force-run, force-paused, or resume the schedule on demand.	2024-05-22 15:40:25 -05:00
Phil Renaud	e8b77fcfa0	[ui] Jobspec UI block: Descriptions and Links (#18292 ) * Hacky but shows links and desc * markdown * Small pre-test cleanup * Test for UI description and link rendering * JSON jobspec docs and variable example job get UI block * Jobspec documentation for UI block * Description and links moved into the Title component and made into Helios components * Marked version upgrade * Allow links without a description and max description to 1000 chars * Node 18 for setup-js * markdown sanitization * Ui to UI and docs change * Canonicalize, copy and diff for job.ui * UI block added to testJob for structs testing * diff test * Remove redundant reset * For readability, changing the receiving pointer of copied job variables * TestUI endpiont conversion tests * -require +must * Nil check on Links * JobUIConfig.Links as pointer --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-05-22 15:00:45 -04:00
Deniz Onur Duzgun	1cc99cc1b4	bug: resolve type conversion alerts (#20553 )	2024-05-15 13:22:10 -04:00
Tim Gross	c9fd93c772	connect: support `volume_mount` blocks for sidecar task overrides (#20575 ) Users can override the default sidecar task for Connect workloads. This sidecar task might need access to certificate stores on the host. Allow adding the `volume_mount` block to the sidecar task override. Also fixes a bug where `volume_mount` blocks would not appear in plan diff outputs. Fixes: https://github.com/hashicorp/nomad/issues/19786	2024-05-14 12:49:37 -04:00
Daniel Bennett	cf87a556b3	api: new /v1/jobs/statuses endpoint for /ui/jobs page (#20130 ) introduce a new API /v1/jobs/statuses, primarily for use in the UI, which collates info about jobs, their allocations, and latest deployment. currently the UI gets all of /v1/jobs and sorts and paginates them client-side in the browser, and its "summary" column is based on historical summary data (which can be visually misleading, and sometimes scary when a job has failed at some point in the not-yet-garbage-collected past). this does pagination and filtering and such, and returns jobs sorted by ModifyIndex, so latest-changed jobs still come first. it pulls allocs and latest deployment straight out of current state for more a more robust, holistic view of the job status. it is less efficient per-job, due to the extra state lookups, but should be more efficient per-page (excepting perhaps for job(s) with very-many allocs). if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`, then the response will be limited to that subset of jobs. the main goal here is to prevent "jostling" the user in the UI when jobs come into and out of existence. and if a blocking query is started with `?index=N`, then the query should only unblock if jobs "on page" change, rather than any change to any of the state tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary HTTP round trips.	2024-05-03 15:01:40 -05:00
Seth Hoenig	ae6c4c8e3f	deps: purge use of old x/exp packages (#20373 )	2024-04-12 08:29:00 -05:00
Tim Gross	e8d203e7ce	transparent proxy: add jobspec support (#20144 ) Add a transparent proxy block to the existing Connect sidecar service proxy block. This changeset is plumbing required to support transparent proxy configuration on the client. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-04-04 17:01:07 -04:00
Tim Gross	78f9f17867	api: add missing `AllocDirStats` field in Go API (#20261 ) The JSON response for the Read Stats client API includes an `AllocDirStats` field. This field is missing in the `api` package, so consumers of the Go API can't use it to read the values we're getting back from the HTTP server. Fixes: https://github.com/hashicorp/nomad/issues/20246	2024-04-03 08:54:05 -04:00
Tim Gross	bdf3ff301e	jobspec: add support for destination partition to `upstream` block (#20167 ) Adds support for specifying a destination Consul admin partition in the `upstream` block. Fixes: https://github.com/hashicorp/nomad/issues/19785	2024-03-22 16:15:22 -04:00
Tim Gross	10dd738a03	jobspec: update `gateway.ingress.service` Consul API fields (#20176 ) Add support for further configuring `gateway.ingress.service` blocks to bring this block up-to-date with currently available Consul API fields (except for namespace and admin partition, which will need be handled under a different PR). These fields are sent to Consul as part of the job endpoint submission hook for Connect gateways. Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>	2024-03-22 13:50:48 -04:00
Tim Gross	5138c1c82f	autopilot: add Enterprise health information to API endpoint (#20153 ) Add information about autopilot health to the `/operator/autopilot/health` API in Nomad Enterprise. I've pulled the CE changes required for this feature out of @lindleywhite's PR in the Enterprise repo. A separate PR will include a new `operator autopilot health` command that can present this information at the command line. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394 Co-authored-by: Lindley <lindley@hashicorp.com>	2024-03-18 11:38:17 -04:00
Amir Abbas	40b8f17717	Support insecure flag on artifact (#20126 )	2024-03-14 10:59:20 -05:00
carrychair	5f5b34db0e	remove repetitive words (#20110 ) Signed-off-by: carrychair <linghuchong404@gmail.com>	2024-03-11 08:52:08 +00:00
Seth Hoenig	4d83733909	tests: swap testify for test in more places (#20028 ) * tests: swap testify for test in plugins/csi/client_test.go * tests: swap testify for test in testutil/ * tests: swap testify for test in host_test.go * tests: swap testify for test in plugin_test.go * tests: swap testify for test in utils_test.go * tests: swap testify for test in scheduler/ * tests: swap testify for test in parse_test.go * tests: swap testify for test in attribute_test.go * tests: swap testify for test in plugins/drivers/ * tests: swap testify for test in command/ * tests: fixup some test usages * go: run go mod tidy * windows: cpuset test only on linux	2024-02-29 12:11:35 -06:00
Juana De La Cuesta	20cfbc82d3	Introduces `Disconnect` block into the `TaskGroup` configuration (#19886 ) This PR is the first on two that will implement the new Disconnect block. In this PR the new block is introduced to be backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.	2024-02-19 16:41:35 +01:00
Juana De La Cuesta	120c3ca3c9	Add granular control of SELinux labels for host mounts (#19839 ) Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label * Update website/content/docs/job-specification/volume_mount.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * fix: typo * func: make volume mount verification happen even on mounts with no volume --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-05 10:05:33 +01:00
Luiz Aoqui	e1e80f383e	vault: add new `nomad setup vault -check` commmand (#19720 ) The new `nomad setup vault -check` commmand can be used to retrieve information about the changes required before a cluster is migrated from the deprecated legacy authentication flow with Vault to use only workload identities.	2024-01-12 15:48:30 -05:00
Tim Gross	0935f443dc	vault: support allowing tokens to expire without refresh (#19691 ) Some users with batch workloads or short-lived prestart tasks want to derive a Vaul token, use it, and then allow it to expire without requiring a constant refresh. Add the `vault.allow_token_expiration` field, which works only with the Workload Identity workflow and not the legacy workflow. When set to true, this disables the client's renewal loop in the `vault_hook`. When Vault revokes the token lease, the token will no longer be valid. The client will also now automatically detect if the Vault auth configuration does not allow renewals and will disable the renewal loop automatically. Note this should only be used when a secret is requested from Vault once at the start of a task or in a short-lived prestart task. Long-running tasks should never set `allow_token_expiration=true` if they obtain Vault secrets via `template` blocks, as the Vault token will expire and the template runner will continue to make failing requests to Vault until the `vault_retry` attempts are exhausted. Fixes: https://github.com/hashicorp/nomad/issues/8690	2024-01-10 14:49:02 -05:00
Tim Gross	d3e5cae1eb	consul: support admin partitions (#19665 ) Add support for Consul Enterprise admin partitions. We added fingerprinting in https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition` field. The expectation is that most users will create a mapping of Nomad node pool to Consul admin partition. But we'll also create an implicit constraint for the fingerprinted value. Fixes: https://github.com/hashicorp/nomad/issues/13139	2024-01-10 10:41:29 -05:00
Egor Mikhailov	18f49e015f	auth: add new optional `OIDCDisableUserInfo` setting for OIDC auth provider (#19566 ) Add new optional `OIDCDisableUserInfo` setting for OIDC auth provider which disables a request to the identity provider to get OIDC UserInfo. This option is helpful when your identity provider doesn't send any additional claims from the UserInfo endpoint, such as Microsoft AD FS OIDC Provider: > The AD FS UserInfo endpoint always returns the subject claim as specified in the > OpenID standards. AD FS doesn't support additional claims requested via the > UserInfo endpoint Fixes #19318	2024-01-09 13:41:46 -05:00
James Rasell	5a00440b06	api: Fix operator snapshot API streaming. (#19608 )	2024-01-05 14:33:39 +00:00
James Rasell	91cba75f5c	copywrite: fix and add copywrite config enterprise comments. (#19590 ) Nomad CI checks for copywrite headers using multiple config files for specific exemption paths. This means the top-level config file does not take effect when running the copywrite script within these sub-folders. Exempt files therefore need to be added to the sub-config files, along with the top level.	2024-01-03 08:58:53 +00:00
hc-github-team-es-release-engineering	a4ecc2fbc8	Merge pull request #19283 from hashicorp/RELENG-960-EOY-license-fixes [DO NOT MERGE UNTIL EOY] update year in LICENSE and copywrite files	2024-01-02 09:38:54 -08:00
Luiz Aoqui	e4e70b086a	ci: run linter in `./api` package (#19513 )	2023-12-19 15:59:47 -05:00
Luiz Aoqui	859606a54a	consul: fix parsing of `service.cluster` field (#19510 )	2023-12-19 09:55:41 -05:00
Mike Nomitch	31f4296826	Adds support for failures before warning to Consul service checks (#19336 ) Adds support for failures before warning and failures before critical to the automatically created Nomad client and server services in Consul	2023-12-14 11:33:31 -08:00
Juana De La Cuesta	cf539c405e	Add a new parameter to avoid starting a replacement for lost allocs (#19101 ) This commit introduces the parameter preventRescheduleOnLost which indicates that the task group can't afford to have multiple instances running at the same time. In the case of a node going down, its allocations will be registered as unknown but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run. In case of max_client_disconnect also being enabled, if there is a reschedule policy, an error will be returned. Implements issue #10366 Co-authored-by: Dom Lavery <dom@circleci.com> Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-12-06 12:28:42 +01:00
Morgan Drake	c5b36b500b	move license to 2024	2023-12-01 12:26:27 -08:00
Jorge Marey	5f78940911	Allow setting a token name template on auth methods (#19135 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2023-11-28 12:26:21 +00:00
codenoid	557b4942d0	api: fix panic in Allocation.Stub() when Job is nil (#19115 )	2023-11-17 08:55:46 -05:00
Seth Hoenig	3ba364e42f	deps: update some dependencies (#19002 ) * deps: update shoenig/test to 1.7.0 * deps: update go-set/v2 to v2.1.0 * deps: update shoenig/go-landlock to v1.2.0	2023-11-07 07:34:40 -06:00
Michael Schurter	e49ca3c431	identity: Implement `change_mode` (#18943 ) * identity: support change_mode and change_signal wip - just jobspec portion * test struct * cleanup some insignificant boogs * actually implement change mode * docs tweaks * add changelog * test identity.change_mode operations * use more words in changelog * job endpoint tests * address comments from code review --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-11-01 09:41:11 -05:00
Luiz Aoqui	d7edbd44b7	api: handle redirect during websocket upgrade (#18903 ) When attempting a WebSocket connection upgrade the client may receive a redirect request from the server, in which case the request should be reattempted using the new address present in the `Location` header.	2023-10-31 17:12:11 -04:00
Luiz Aoqui	3ddf1ecf1d	actions: minor bug fixes and improvements (#18904 )	2023-10-31 17:06:02 -04:00

1 2 3 4 5 ...

1260 Commits