nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 17:35:43 +03:00

Author	SHA1	Message	Date
Michael Schurter	2def3bb2b9	Prepare release 1.7.0-beta.2	2023-11-15 14:42:22 -08:00
Adriano Caloiaro	f66eb83fc0	Add `go-netaddrs` support to `retry_join` (#18745 )	2023-11-15 10:07:18 -05:00
Phil Renaud	bb6c86d2a4	Shows the client/node name alongside alloc short ID if the job is sys/sysbatch (#19051 )	2023-11-15 10:05:12 -05:00
Luiz Aoqui	26746a4093	cli: add zero nodes message to `node status` (#19082 ) Display a message to indicate that there are no nodes registered when `node status` returns zero values.	2023-11-14 23:00:12 -05:00
Tim Gross	98e9fb4698	docs: clarify when "all" is not permitted for `cap_add` (#19091 ) Linux capabilities configurable by the task must be a subset of those configured in the plugin configuration. Clarify this implies that `"all"` is not permitted if the plugin is not also configured to allow all capabilities. Fixes: https://github.com/hashicorp/nomad/issues/19059	2023-11-14 16:33:55 -05:00
Tim Gross	0236bd0907	qemu: fix panic from missing resources block (#19089 ) The `qemu` driver uses our universal executor to run the qemu command line tool. Because qemu owns the resource isolation, we don't pass in the resource block that the universal executor uses to configure cgroups and core pinning. This resulted in a panic. Fix the panic by returning early in the cgroup configuration in the universal executor. This fixes `qemu` but also any third-party drivers that might exist and are using our executor code without passing in the resource block. In future work, we should ensure that the `resources` block is being translated into qemu equivalents, so that we have support for things like NUMA-aware scheduling for that driver. Fixes: https://github.com/hashicorp/nomad/issues/19078	2023-11-14 16:26:44 -05:00
dependabot[bot]	9bc4a8df59	chore(deps): bump debug from 4.1.1 to 4.3.4 in /scripts/screenshots/src (#18636 ) Bumps [debug](https://github.com/debug-js/debug) from 4.1.1 to 4.3.4. - [Release notes](https://github.com/debug-js/debug/releases) - [Commits](https://github.com/debug-js/debug/compare/4.1.1...4.3.4) --- updated-dependencies: - dependency-name: debug dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-14 14:36:02 -05:00
Phil Renaud	12e43aa07f	Re-add wildcard for test-ui path restrictions (#19085 )	2023-11-14 11:28:53 -05:00
Tim Gross	42f0540f9a	docs: fix link to dynamic node metadata API (#19086 )	2023-11-14 11:16:12 -05:00
Tim Gross	8fac70c92c	E2E: refactor `vaultcompat` to allow for ENT tests (#19081 ) We want to run the Vault compatibility E2E test with Vault Enterprise binaries and use Vault namespaces. Refactor the `vaultcompat` test so as to parameterize most of the test setup logic with the namespace, and add the appropriate build tag for the CE version of the test.	2023-11-14 09:54:47 -05:00
Tim Gross	b5af87ebf3	set Vault namespace from task in `vault_hook` JWT login (#19080 ) The JWT login codepath for the `vault_hook` was missing the Vault namespace, so the login request for non-default namespaces would fail.	2023-11-14 09:54:36 -05:00
Juana De La Cuesta	bae82b14b4	docs: Add section for disable restart (#19083 ) * docs: add section for disable restart that mirrors what is on disable reschedule * Update restart.mdx	2023-11-14 14:53:43 +01:00
Tim Gross	1c9c75cc83	E2E: refactor `consulcompat` to allow for ENT tests (#19068 ) We want to run the Consul compatibility E2E test with Consul Enterprise binaries and use Consul namespaces. Refactor the `consulcompat` test so as to parameterize most of the test setup logic with the namespace, and add the appropriate build tag for the CE version of the test. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1305	2023-11-10 15:05:51 -05:00
Seth Hoenig	5987ba434f	e2ev3: wait for logs to become ready (#19067 ) Just because an alloc is running does not mean nomad is ready to serve task logs. In a test case where you immediately read logs after starting a task, it could be that nomad responds with "no logs found" when you try to read logs, in which case you just need to wait longer. Do so in the v3 TaskLogs helper function.	2023-11-10 12:43:16 -06:00
Luiz Aoqui	f0acf72ae7	client: fix Consul token retrieval for templates (#19058 ) The template hook must use the Consul token for the cluster defined in the task-level `consul` block or, if `nil`, in the group-level `consul` block. The Consul tokens are generated by the allocrunner consul hook, but during the transition period we must fallback to the Nomad agent token if workload identities are not being used. So an empty token returned from `GetConsulTokens()` is not enough to determine if we should use the legacy flow (either this is an old task or the cluster is not configured for Consul WI), or if there is a misconfiguration (the task or group `consul` block is using a cluster that doesn't have an `identity` set). In order to distinguish between the two scenarios we must iterate over the task identities looking for one suitable for the Consul cluster being used.	2023-11-10 13:42:30 -05:00
Phil Renaud	62007e3b18	[ui] Small fix to let UI actions passing use job.name instead of job.id, since namespace is passed as an explicit param afterward (#19061 )	2023-11-10 10:55:00 -05:00
Seth Hoenig	c17333d74a	e2e refactor oversubscription (#19060 ) * e2e: remove old oversubscription test * e2e: fixup and cleanup oversubscription test suite Fix and cleanup this old oversubscription test. * use t.Cleanup instead of defer in tests	2023-11-10 09:25:32 -06:00
Tim Gross	5d0008a9b4	tools: bump version of `hc-install` (#19063 ) The version we have of `hc-install` doesn't allow installing Enterprise binaries. Upgrade so that this is available to the development team and to our E2E tests in the Enterprise repo.	2023-11-10 09:57:29 -05:00
Tim Gross	4e38b41d9d	E2E: add template block to `consulcompat` test (#19055 ) The Consul compatibility test focuses on Connect, but it'd be a good idea to ensure we can successfully get template data out of Consul as well. Also tightens up the test's Consul ACL policy for the Nomad agent.	2023-11-10 09:25:37 -05:00
Seth Hoenig	1f957947b4	e2e: refactor nomadexec test suite (#19054 )	2023-11-10 07:09:24 -06:00
Seth Hoenig	2f8d94ae3e	e2e: more cpu and memory for java tasks and some scripts (#19057 )	2023-11-10 07:08:14 -06:00
Tim Gross	5ad715b281	fix taskrunner test after broken signature (#19056 ) PRs #19034 and #19040 accidentally conflicted with each other without a merge conflict when #19034 changed the method signature of `SetConsulTokens`. Because CI doesn't rebase, both PRs tested fine and were only broken once they landed on `main`. Fix that.	2023-11-09 15:53:25 -05:00
Seth Hoenig	f211a0ab7c	e2e: update terraform lock file for 1.6.3 (#19049 ) Using the latest version of terraform, the lock file is not the same as when it was generated. Seems like the http module is not needed? versioned? present? anymore.	2023-11-09 10:49:49 -06:00
Luiz Aoqui	b61a31c38f	chore: remove comment about WI change mode (#19047 ) Identity change mode was implemented in #18943 and handles the update at the task level, so workload identity manager receives the update as expected.	2023-11-09 11:06:03 -05:00
Luiz Aoqui	85d923b759	cli: fix Consul env var URL reference (#19041 )	2023-11-09 10:58:03 -05:00
Luiz Aoqui	6d8417014f	client: pass alloc hook resources to template hook (#19040 ) The task template hook uses the alloc resource to retrieve Consul tokens, so it must be passed from the allocation.	2023-11-09 10:55:35 -05:00
Seth Hoenig	402540f7fb	e2e: bump packer build instances because faster (#19046 )	2023-11-09 09:33:30 -06:00
Tim Gross	c7c3b3ae33	revoke Consul tokens obtained via WI when alloc stops (#19034 ) Add a `Postrun` and `Destroy` hook to the allocrunner's `consul_hook` to ensure that Consul tokens we've created via WI get revoked via the logout API when we're done with them. Also add the logout to the `Prerun` hook if we've hit an error.	2023-11-09 10:08:09 -05:00
Luke Kysow	36c9aee3f0	Bump consul-template to 0.35.0 (#19032 ) * Bump consul-template to 0.35.0 * run go mod tidy	2023-11-09 09:48:33 -05:00
Seth Hoenig	a28e5b6965	e2e: refactor metrics test to use NSD and WI (#19022 ) * e2e: remove old metrics suite * e2e: install stress on e2e jammy image * e2e: overhaul metrics test to use nomad service discovery, workload identity * e2e: format metrics hcl files and copywrite * e2e: undo tf lock file * e2e: undo reg auth file perms * e2e: format cpustress.hcl	2023-11-09 08:21:16 -06:00
Phil Renaud	f322bb7efb	Nicer comment styles in example jobs (#19037 )	2023-11-08 20:13:34 -05:00
Phil Renaud	6cd706f460	Only run test-ui, and percy, in the event that a push/pr touches the ui directory (#19038 )	2023-11-08 20:12:54 -05:00
Piotr Kazmierczak	128c71b579	cli: simplify conditionals in setup commands (#19011 )	2023-11-08 19:41:15 -05:00
Tim Gross	7191c78928	refactor: rename allocrunner's Consul service reg handler (#19019 ) The allocrunner has a service registration handler that proxies various API calls to Consul. With multi-cluster support (for ENT), the service registration handler is what selects the correct Consul client. The name of this field in the allocrunner and taskrunner code base looks like it's referring to the actual Consul API client. This was actually the case before Nomad native service discovery was implemented, but now the name is misleading.	2023-11-08 15:39:32 -05:00
Luiz Aoqui	6761f1f98c	cli: fix `setup consul` binding rule config (#19033 ) When creating the binding rule, `BindName` must match the pattern used for the role name, otherwise the task will not be able to login to Consul. Also update the equality check for the binding rule to ensure this property is held even if the auth method already has existing binding rules attached.	2023-11-08 15:13:16 -05:00
Michael Schurter	c4ae91f8be	Fix WorkloadIdentity.TTL handling, jobspec2 testing, and hcl1 vs 2 parsing (#19024 ) * make the little dots consistent * don't trim delimiter as that over matches * test jobspec2 package * copy api/WorkloadIdentity.TTL -> structs * test ttl parsing * fix hcl1 v 2 parsing mismatch * make jobspec(1) tests match jobspec2 tests	2023-11-08 09:01:16 -08:00
Tim Gross	9d075c44b2	config: remove old Vault/Consul config blocks from parser (#18997 ) Remove the now-unused original configuration blocks for Consul and Vault from the agent configuration parsing. When the agent needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service (or the default cluster for the agent's own use). This is the third of three changesets for this work. Fixes: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Ref: https://github.com/hashicorp/nomad/pull/18994	2023-11-08 09:30:08 -05:00
Seth Hoenig	63da22063b	e2e: update pledge driver to 0.3.0 (#19020 )	2023-11-08 06:58:59 -06:00
hc-github-team-es-release-engineering	57d3019879	REPLAT-962 Update LICENSE text (#19023 )	2023-11-08 11:54:54 +00:00
Luiz Aoqui	ab36cf031c	vault: avoid continual renewal of invalid token (#18985 ) A series of errors may happen when a token is invalidated while the Vault client is waiting to renew it. The token may have been invalidated for several reasons, such as the alloc finished running and it's now terminal or the token may have been changed directly on Vault out-of-band. Most of the errors are caused by retries that will never succeed until Vault fully removes the token from its state. This commit prevents the retries by making the error `invalid lease ID` a fatal error. In earlier versions of Vault, this case was covered by the error `lease not found or lease is not renewable`, which is already considered to be a fatal error by Nomad: `2d0cde4ccc/vault/expiration.go (L636-L639)` But https://github.com/hashicorp/vault/pull/5346 introduced an earlier `nil` check that generates a different error message: `750ab337ea/vault/expiration.go (L1362-L1364)` Both errors happen for the same reason (`le == nil`) and so should be considered fatal on renewal.	2023-11-07 19:50:19 -05:00
Luiz Aoqui	7054fe1a8c	vault: always renew tokens using the renewal loop (#18998 ) Previously, a Vault token could be renewed either periodically via the renewal loop or immediately by calling `RenewToken()`. But a race condition in the renewal loop could cause an attempt to renew an expired token. If both `updateCh` and `renewalCh` are active (such as when a task stops at the same time its token is waiting for renewal), the following `select` picks a `case` at random. `78f0c6b2a9/client/vaultclient/vaultclient.go (L557-L564)` If `case <-renewalCh` is picked, the token is incorrectly re-added to the heap, causing unnecessary renewals of a token that is already expired. `1604dba508/client/vaultclient/vaultclient.go (L505-L510)` To prevent this situation, the `renew()` function should only renew tokens that are currently in the heap, so `RenewToken()` must first push the token to the heap and wait for the renewal to happen instead of calling `renew()` directly since this could cause another race condition where the token is renewed twice: once by `RenewToken()` calling `renew()` directly and a second time if the renewal happens to pick the token as soon as `RenewToken()` adds it to the heap.	2023-11-07 19:49:33 -05:00
Phil Renaud	783572de7d	[ui] Actions implementation in the web UI (#18793 ) * runAction model and adapter funcs * Hacky but functional action running from job index * remove proxy hack * runAction added to taskSubRow * Added tty and ws_handshake to job action endpoint call * delog * Bunch of streaming work * action started, running, and finished notification titles, neutral color, and ansi escape * Handle random alloc selection in the web ui * Run on All implementation in web ui * [ui] Helios two-step button and uniform title bar for Actions (#18912) * Initial pass at title bar button uniformity * Vertical align on actions dropdown toggle and small edits to prevent keynav overflow issue * We represent loading state w text and disable now * Pageheader component to align buttons * Buttons standardized * Actions dropdown reveal for multi-alloc job * Notification code styles * An action-having single alloc job * Mirageed * Actions-laden jobs in mirage * Separating allocCount and taskCount in mirage mocks * Unbreak stop job tests * Permissions for actions dropdown * tests for running actions from the job index page * running from a task row actions tests * some todocleanup * PR feedback addressed, including page helper for actions	2023-11-07 15:29:43 -05:00
Seth Hoenig	cf2f48efd4	build: update to Go 1.21.4 (#19013 )	2023-11-07 13:18:07 -06:00
Seth Hoenig	a2f7ab2645	e2e disable windows (#19012 ) * e2e: disable windows client * e2e: disable windows artifact test	2023-11-07 09:34:18 -06:00
Tim Gross	50f0ce5412	config: remove old Vault/Consul config blocks from client (#18994 ) Remove the now-unused original configuration blocks for Consul and Vault from the client. When the client needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service. Add a helper for accessing the default clusters (for the client's own use). This is two of three changesets for this work. The remainder will implement the same changes in the `command/agent` package. As part of this work I discovered and fixed two bugs: * The gRPC proxy socket that we create for Envoy is only ever created using the default Consul cluster's configuration. This will prevent Connect from being used with the non-default cluster. * The Consul configuration we use for templates always comes from the default Consul cluster's configuration, but will use the correct Consul token for the non-default cluster. This will prevent templates from being used with the non-default cluster. Ref: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Fixes: https://github.com/hashicorp/nomad/issues/18984 Fixes: https://github.com/hashicorp/nomad/issues/18983	2023-11-07 09:15:37 -05:00
Tim Gross	1998004483	move deprecation warning for Vault/Consul token to admission hook (#18995 ) Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and intended for removal in Nomad 1.9. We added a deprecation warning to the CLI when the user passes in the appropriate flag or environment variable, but that warning also fires for users whose job does not use Vault or Consul but who happen to have the appropriate environment variable in their environment. While this is generally a bad practice (because the token is leaked to Nomad), it's also the existing practice for some users. Move the warning to the job admission hook. This will allow us to warn only when appropriate, and that will also help the migration process by producing warnings only for the relevant jobs.	2023-11-07 08:37:06 -05:00
Seth Hoenig	3ba364e42f	deps: update some dependencies (#19002 ) * deps: update shoenig/test to 1.7.0 * deps: update go-set/v2 to v2.1.0 * deps: update shoenig/go-landlock to v1.2.0	2023-11-07 07:34:40 -06:00
Piotr Kazmierczak	7c6863b479	cli: setup vault command (#18910 ) An interactive setup helper for configuring Vault to accept Nomad WI-enabled workloads. --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-11-07 10:42:00 +01:00
Dave May	e4f98a8d1d	docs: fix broken links in docker.mdx (#19003 )	2023-11-07 07:34:47 +00:00
Tim Gross	1ef99f0536	config: remove old Vault/Consul config blocks from server (#18991 ) Remove the now-unused original configuration blocks for Consul and Vault from the server. When the server needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service. Add a helper for accessing the default clusters (for the server's own use). This is one of three changesets for this work. The remainder will implement the same changes in the `client` package and on the `command/agent` package. As part of this work I discovered that the job submission hook for Vault only checks the enabled flag on the default cluster, rather than the clusters that are used by the job being submitted. This will return an error on job registration saying that Vault is disabled. Fix that to check only the cluster(s) used by the job. Ref: https://github.com/hashicorp/nomad/issues/18947 Fixes: https://github.com/hashicorp/nomad/issues/18990	2023-11-06 10:26:20 -05:00
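
The race described in commit 7054fe1a8c can be illustrated with a small Go sketch. This is not Nomad's code: the `renewer` type, its `tracked` map (standing in for the renewal heap), and the `step` method are hypothetical names chosen for illustration. It shows that when both channels are ready Go's `select` picks a case at random, and that a `renew()` which only acts on tokens still being tracked keeps the outcome safe in either order.

```go
package main

import "fmt"

// renewer is a hypothetical stand-in for the client's Vault renewal state.
// tracked plays the role of the heap of tokens awaiting renewal.
type renewer struct {
	tracked  map[string]bool
	renewals int
}

// newRenewer starts tracking the given tokens.
func newRenewer(tokens ...string) *renewer {
	r := &renewer{tracked: make(map[string]bool)}
	for _, t := range tokens {
		r.tracked[t] = true
	}
	return r
}

// renew only renews tokens that are still tracked and never re-adds one,
// so a token whose task has stopped cannot come back.
func (r *renewer) renew(token string) bool {
	if !r.tracked[token] {
		return false
	}
	r.renewals++
	return true
}

// step handles exactly one event. If both channels have a value ready,
// select chooses between the two cases pseudo-randomly.
func (r *renewer) step(updateCh, renewalCh <-chan string) {
	select {
	case tok := <-updateCh:
		// The task stopped: stop tracking its token.
		delete(r.tracked, tok)
	case tok := <-renewalCh:
		r.renew(tok)
	}
}

func main() {
	updateCh := make(chan string, 1)
	renewalCh := make(chan string, 1)
	r := newRenewer("token-a")

	// Both events arrive at the same time, as in the commit message.
	updateCh <- "token-a"
	renewalCh <- "token-a"
	r.step(updateCh, renewalCh)
	r.step(updateCh, renewalCh)

	fmt.Println("still tracked:", r.tracked["token-a"], "renewals:", r.renewals)
}
```

Whichever case `select` picks first, the token ends up untracked and is renewed at most once; the number of renewals printed can differ between runs.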

25304 Commits
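
Commit ab36cf031c above treats two Vault error messages as fatal so that renewal is not retried. A minimal Go sketch of that kind of classification follows, assuming the check is a substring match on the error text. The function names `isFatalMessage` and `isFatalRenewalError` are hypothetical and are not taken from the Nomad source; only the two message strings come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// fatalRenewalMessages lists the two messages quoted in the commit message.
// Matching by substring is an assumption made for this sketch.
var fatalRenewalMessages = []string{
	"lease not found or lease is not renewable",
	"invalid lease ID",
}

// isFatalMessage reports whether msg contains one of the fatal messages.
func isFatalMessage(msg string) bool {
	for _, fatal := range fatalRenewalMessages {
		if strings.Contains(msg, fatal) {
			return true
		}
	}
	return false
}

// isFatalRenewalError reports whether err should stop further renewal
// attempts. A nil error is never fatal.
func isFatalRenewalError(err error) bool {
	if err == nil {
		return false
	}
	return isFatalMessage(err.Error())
}

func main() {
	errs := []error{
		errors.New("Error making API request: invalid lease ID"),
		errors.New("lease not found or lease is not renewable"),
		errors.New("connection refused"),
	}
	for _, err := range errs {
		fmt.Printf("%q fatal=%v\n", err.Error(), isFatalRenewalError(err))
	}
}
```

A caller would stop retrying as soon as `isFatalRenewalError` returns true and keep retrying with backoff otherwise.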