* exec: add a client.users configuration block
For now just add min/max dynamic user values; soon we can also absorb
the "user.denylist" and "user.checked_drivers" options from the
deprecated client.options map.
* give the no-op pool implementation a better name
* use explicit error types to make referencing them cleaner in tests
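As a generic illustration (not the Nomad source), explicit error values and types let tests assert with `errors.Is`/`errors.As` instead of matching error strings:
```go
package users

import (
	"errors"
	"fmt"
)

// sentinel error value a test can match with errors.Is
var ErrPoolExhausted = errors.New("users: no dynamic users available")

// structured error type a test can match with errors.As
type InvalidUIDError struct{ UID int }

func (e *InvalidUIDError) Error() string {
	return fmt.Sprintf("users: uid %d out of range", e.UID)
}
```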
* use import alias to not shadow package name
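The pattern, shown with a standard library package rather than the actual commit's code:
```go
import neturl "net/url"

// the alias keeps the local variable "url" from shadowing the package
func parse(raw string) (*neturl.URL, error) {
	url, err := neturl.Parse(raw)
	return url, err
}
```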
Although Nomad is not vulnerable to CVE-2024-24786 because it's configured to
discard unknown messages during unmarshaling, we should upgrade so that
third-party vulnerability scanners don't detect the vulnerable version and
complain.
Also update go1.22.1 changelog entry to include CVEs
Nomad has always placed an extremely high priority on backward
compatibility. We have always aimed to support N-2 major releases and
usually gone above and beyond that.
The new https://www.hashicorp.com/long-term-support policy also mentions
that N-2 is what we have always supported, so it's probably time for our
docs to reflect that reality.
* build: upgrade to go1.22
* add changelog entry
* build: use codecgen from go-msgpack v1.1.5+base32 and stringer 0.18.0
for compatibility with go1.22
* ci: update golangci-lint to 1.56.2
* build: update hclogvet for go1.22
* build: bump to go1.22.1
* exec2: implement dynamic workload users taskrunner hook
This PR implements a TR hook for allocating dynamic workload users from
a pool managed by the Nomad client. It adds a new task driver capability,
DynamicWorkloadUsers, which a task driver must advertise in order to make
use of this feature.
The client config plumbing is coming in a followup PR; while writing the
RFC we realized a client.users block would be nice to have, with some
additional unrelated options moved over from the deprecated client.options
config.
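A hedged sketch of how a driver might advertise the capability; the `DynamicWorkloadUsers` field name comes from the description above, while the other fields and the `Driver` type are illustrative:
```go
package exec2

import "github.com/hashicorp/nomad/plugins/drivers"

type Driver struct{}

// Capabilities advertises driver features; without DynamicWorkloadUsers
// set, the new taskrunner hook will not allocate a dynamic user.
func (d *Driver) Capabilities() (*drivers.Capabilities, error) {
	return &drivers.Capabilities{
		SendSignals:          true,
		Exec:                 true,
		DynamicWorkloadUsers: true,
	}, nil
}
```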
* learn to spell
When creating a job ACL, you must supply a job ID if you supply a
namespace. But if you give a namespace without a job ID, the error
message reads "JobACL.JobID without Namespace", which has the fields
backwards.
* exec2: implement a dynamic users pool
This PR adds an implementation of a Pool from which dynamic users can
be allocated on behalf of tasks, in support of an upcoming Nomad client
feature (dynamic workload users).
A task hook, client plumbing, etc. will come in follow-up PRs.
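A minimal sketch of such a pool, assuming a [min, max] UID range with reuse on release; the names here are illustrative, not the actual API:
```go
package users

import (
	"errors"
	"sync"
)

// Pool hands out UIDs from a configured [min, max] range and recycles
// them when tasks release them.
type Pool struct {
	mu       sync.Mutex
	min, max int
	used     map[int]struct{}
}

func New(min, max int) *Pool {
	return &Pool{min: min, max: max, used: make(map[int]struct{})}
}

// Acquire reserves the lowest free UID in the range.
func (p *Pool) Acquire() (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for uid := p.min; uid <= p.max; uid++ {
		if _, taken := p.used[uid]; !taken {
			p.used[uid] = struct{}{}
			return uid, nil
		}
	}
	return 0, errors.New("users: no dynamic users available")
}

// Release returns a UID to the pool for reuse.
func (p *Pool) Release(uid int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.used, uid)
}
```
A task hook would presumably Acquire during prestart and Release when the task stops.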
* no need for randomness assertion
stopAlloc() checks if an allocation represents a system job like this:
```
if *alloc.Job.Type == api.JobTypeSystem {
    ...
}
```
This caused the CLI to crash:
```
==> 2024-02-29T08:45:53+01:00: Restarting 2 allocations
2024-02-29T08:45:54+01:00: Rescheduling allocation "6a9da11a" for group "redacted-group"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x10686affc]
goroutine 36 [running]:
github.com/hashicorp/nomad/command.(*JobRestartCommand).stopAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
github.com/hashicorp/nomad/command/job_restart.go:968 +0x25c
github.com/hashicorp/nomad/command.(*JobRestartCommand).handleAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
github.com/hashicorp/nomad/command/job_restart.go:868 +0x34
github.com/hashicorp/nomad/command.(*JobRestartCommand).Run.(*JobRestartCommand).Run.func1.func2()
github.com/hashicorp/nomad/command/job_restart.go:392 +0x28
github.com/hashicorp/go-multierror.(*Group).Go.func1()
github.com/hashicorp/go-multierror@v1.1.1/group.go:23 +0x60
created by github.com/hashicorp/go-multierror.(*Group).Go in goroutine 1
github.com/hashicorp/go-multierror@v1.1.1/group.go:20 +0x84
```
Attaching a debugger revealed that `alloc.Job` was set, but
`alloc.Job.Type` was nil. After guarding the `.Type` check with an
`alloc.Job.Type != nil` condition, it still crashed; this time,
`alloc.Job` itself was nil.
I was scrambling to get the job running again, so I didn't have the
opportunity to find out why those values were nil, but this change
ensures the CLI does not crash in these situations.
Fixes #20048
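The guarded form of the check described above looks roughly like this:
```go
// verify both pointers before dereferencing so a partial API object
// cannot panic the CLI
isSystemJob := alloc.Job != nil &&
	alloc.Job.Type != nil &&
	*alloc.Job.Type == api.JobTypeSystem
```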
* tests: swap testify for test in plugins/csi/client_test.go
* tests: swap testify for test in testutil/
* tests: swap testify for test in host_test.go
* tests: swap testify for test in plugin_test.go
* tests: swap testify for test in utils_test.go
* tests: swap testify for test in scheduler/
* tests: swap testify for test in parse_test.go
* tests: swap testify for test in attribute_test.go
* tests: swap testify for test in plugins/drivers/
* tests: swap testify for test in command/
* tests: fixup some test usages
* go: run go mod tidy
* windows: cpuset test only on linux
CNI plugins may set DNS configuration, but this isn't threaded through to the
task configuration so that we can write it to the `/etc/resolv.conf` file as
needed. Add the `AllocNetworkStatus` to the alloc hook resources so they're
accessible from the taskrunner. Any DNS entries provided by the user will
override these values.
Fixes: https://github.com/hashicorp/nomad/issues/11102
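A minimal sketch of that precedence, with `DNSConfig` standing in for Nomad's network DNS struct:
```go
// user-provided DNS entries win; otherwise fall back to what the
// CNI plugin reported in the allocation's network status
type DNSConfig struct {
	Servers  []string
	Searches []string
	Options  []string
}

func effectiveDNS(user, cni *DNSConfig) *DNSConfig {
	if user != nil {
		return user
	}
	return cni
}
```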
This PR is the first of two that will implement the new Disconnect block. In this PR the new block is introduced so that it is backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.
* Warning and better error handling for invalidly named variables and job templates
* Tests for variable pathname warnings
* Only show the bad-name warning if the variable is being created and the path is editable
While working on #10628 I discovered that the `expose` block was missing an
implementation of Diff, which means it doesn't show up correctly in `job plan`
output.
Also, fix field comparison in `ServiceCheck.Equal`.
This is a bug in the method but it doesn't look like it impacts production code.
In order to provide a DNS address and port to Connect tasks configured for
transparent proxy, we need to fingerprint the Consul DNS address and port. The
client will pass this address/port to the iptables configuration provided to the
`consul-cni` plugin.
Ref: https://github.com/hashicorp/nomad/issues/10628
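Sketched as a helper (the attribute keys are assumptions, not necessarily Nomad's actual names):
```go
import "strconv"

// expose the Consul DNS listener so the client can hand it to the
// iptables config used by the consul-cni plugin
func consulDNSAttributes(addr string, port int) map[string]string {
	return map[string]string{
		"consul.dns.addr": addr,
		"consul.dns.port": strconv.Itoa(port),
	}
}
```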
* Revert "vault: always renew tokens using the renewal loop (#18998)"
This reverts commit 7054fe1a8c.
* test: add case for concurrent Vault token renewal
This PR changes the example of the client config option "fingerprint.denylist"
to include all the cloud environment fingerprinters. Each one contains a
2 second HTTP timeout to a metadata endpoint that does not exist if you are not
in that particular cloud. When run in serial on startup, this results in
an 8 second wait where nothing useful is happening.
Closes #16727
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:
> msgpack decode error [pos 0]: io: read/write on closed pipe
they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.
This allowed many more attempts to pass through a previously undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.
This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
error messages.
* Move all writes to the websocket into the same response streaming
goroutine. The main handler goroutine will block on a results channel, and the
response streaming goroutine will send on that channel with the final error when
it's done so it can be reported to the user.
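A condensed sketch of that single-writer arrangement, assuming a gorilla/websocket-style connection:
```go
import "github.com/gorilla/websocket"

// streamToWebsocket is the only goroutine that writes to conn; the
// handler blocks on the returned channel for the final error, so no
// concurrent write can race a close message.
func streamToWebsocket(conn *websocket.Conn, stream <-chan []byte) <-chan error {
	errCh := make(chan error, 1)
	go func() {
		var final error
		for msg := range stream {
			if err := conn.WriteMessage(websocket.BinaryMessage, msg); err != nil {
				final = err
				break
			}
		}
		errCh <- final // reported exactly once, when streaming is done
	}()
	return errCh
}
```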
The state store's `UpsertPlanResults` method canonicalizes allocations in order
to upgrade them to a new version. But the method does not copy the allocation
before doing so, which can potentially corrupt the state store. This hasn't been
implicated in any known user-facing bugs, but was detected when running Nomad
under a build with the Go toolchain's data race detector enabled.
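The corrected pattern, sketched with the allocation's existing `Copy` and `Canonicalize` methods (the slice name is illustrative):
```go
// copy each allocation before canonicalizing so objects shared with
// the state store are never mutated in place
for i, alloc := range planResults.Alloc {
	alloc = alloc.Copy()
	alloc.Canonicalize()
	planResults.Alloc[i] = alloc
}
```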
When jobs are deregistered with the `purge` flag they are immediately
deleted from the state store instead of just updated to be marked as
stopped.
Without tracking job deletions the event stream would not receive a
`JobDeregistered` event when `purge` was set.
When an allocation can't be placed because of a port collision the
resulting blocked eval is expected to have a metric reporting the port
that caused the conflict, but this metric was not being emitted when
preemption was enabled.
The value for the executor cgroup CPU weight must be within the limits
imposed by the Linux kernel.
Nomad used the task's `resources.cpu`, an unbounded value, directly as
the cgroup CPU weight, potentially pushing it outside the imposed
limits.
This commit clamps the CPU shares value to stay within the allowed
limits.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
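A sketch of the clamp, using the cpu.weight range the cgroups v2 interface accepts (1 to 10000):
```go
const (
	cpuWeightMin = 1
	cpuWeightMax = 10000
)

// clampCPUWeight bounds a task-derived value to what the kernel accepts
func clampCPUWeight(w uint64) uint64 {
	if w < cpuWeightMin {
		return cpuWeightMin
	}
	if w > cpuWeightMax {
		return cpuWeightMax
	}
	return w
}
```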
The Nomad client expects certain cgroups paths to exist in order to
manage tasks. These paths are created when the agent first starts, but
if the process fails, the agent would just log the error and proceed with its
initialization, despite not being able to run tasks.
This commit surfaces the errors back to the client initialization so the
process can stop early and make clear to operators that something went
wrong.
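The shape of the fix, sketched with a hypothetical `initCgroupPaths` helper:
```go
// surface cgroup setup failures instead of logging and continuing,
// so the agent stops early rather than running without cgroups
if err := initCgroupPaths(); err != nil {
	return fmt.Errorf("failed to initialize cgroup paths: %w", err)
}
```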
On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to
load a few system DLL files, which does not prevent DLL hijacking
attacks. Hypothetically a local attacker on the client host that can place an
abusive library in a specific location could use this to escalate privileges to
the Nomad process. Although this attack does not fall within the Nomad security
model, it doesn't hurt to follow good practices here.
We can remove two of these DLL loads by using wrapper functions provided
by the `golang.org/x/sys/windows` package.
Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
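The safer pattern looks like this; `NewLazySystemDLL` resolves only from the Windows system directory (the proc shown is just an example):
```go
import "golang.org/x/sys/windows"

// a DLL planted in the working directory or on PATH is never loaded,
// because lookup is restricted to the system directory
var (
	kernel32             = windows.NewLazySystemDLL("kernel32.dll")
	procGetDiskFreeSpace = kernel32.NewProc("GetDiskFreeSpaceExW")
)
```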