nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 17:05:43 +03:00

Author	SHA1	Message	Date
Piotr Kazmierczak	f7a4ded2c0	security: add CT executeTemplate to default function_denylist (#24541 ) This PR adds Consul Template's executeTemplate function to the denylist by default, in order to prevent accidental or malicious infinitely recursive execution. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-11-22 19:33:56 +01:00
Martijn Vegter	997da25cdb	scheduler: take all assigned cpu cores into account instead of only those part of the largest lifecycle (#24304 ) Fixes a bug in the AllocatedResources.Comparable method, where the scheduler would only take into account the cpusets of the tasks in the largest lifecycle. This could result in overlapping cgroup cpusets. Now we make the distinction between reserved and fungible resources throughout the lifespan of the alloc. In addition, added logging in case of future regressions thus not requiring manual inspection of cgroup files.	2024-11-21 13:21:48 -05:00
Martijn Vegter	bfb714144e	client: fixed a bug where AMD CPUs were not correctly fingerprinting base speed (#24415 ) Relates to: #19468	2024-11-21 09:08:47 -06:00
James Rasell	beb4097e81	client: mark the remote_task hook as deprecated. (#24505 )	2024-11-20 15:32:50 +00:00
Florian Apolloner	0a343798b6	Add NOMAD_* variables to CNI args. Fixes #23830 (#24319 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-11-19 12:48:48 -08:00
Tim Gross	a420732424	consul: allow non-root Nomad to rewrite token (#24410 ) When a task restarts, the Nomad client may need to rewrite the Consul token, but it's created with permissions that prevent a non-root agent from writing to it. While Nomad clients should be run as root (currently), it's harmless to allow whatever user the Nomad agent is running as to be able to write to it, and that's one less barrier to rootless Nomad. Ref: https://github.com/hashicorp/nomad/issues/23859#issuecomment-2465757392	2024-11-19 10:21:14 -05:00
Gabi	89c3d69d79	nsutil: wrap error that comes from the syscall so caller can do errors.As (#24480 ) Users of the `nsutil` library should be able to do the following and have it work: `var errno syscall.Errno; if errors.As(err, &errno) { if errno == unix.EBUSY { ... } }` This commit fixes that issue.	2024-11-19 10:24:49 +01:00
Tim Gross	6be9a50626	vault: catch expired lease as fatal error (#24409 ) When a Vault lease expires, it's revoked on the server and cannot be removed, so this error should be treated as fatal. The errors we get aren't wrapped by the Vault SDK, so unfortunately we have to read the error messages and can't easily enumerate non-fatal error messages (which might be bubbling up from the stdlib). I've audited the errors currently used and have documented their source. Ref `52ba156d47/vault/expiration.go (L1327)` Fixes: https://github.com/hashicorp/nomad/issues/23859	2024-11-18 09:12:35 -05:00
Michael Smithhisler	0714353324	fix: handle template re-renders on client restart (#24399 ) When multiple templates with api functions are included in a task, it's possible for consul-template to re-render templates as it creates watchers, overwriting render event data. This change uses event fields that do not get overwritten, and only executes the change mode for templates that were actually written to disk. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-11-08 12:49:38 -05:00
Seth Hoenig	4ef4bebd1f	connect: handle grpc_address as gosockaddr/template string (#24280 ) * connect: handle grpc_address as gosockaddr/template string This PR fixes a bug where the consul.grpc_address could not be set using a go-sockaddr/template string. This was inconsistent with how we do accept such strings for consul.address values. * add changelog	2024-11-07 09:04:58 -06:00
James Rasell	c44f933aeb	test: ensure RPC only test client sets enterprise specific config. (#24376 )	2024-11-06 13:43:25 +00:00
Tim Gross	a8b84a6eed	testing: RPC-only test client helper (#24371 ) In #10193 we introduced a testing helper that spins up a client RPC server without the rest of the client operations so that we can make server-side client RPC tests lighter. But this wasn't actually ever wired up to the intended target. While working on Dynamic Host Volumes I noticed that this would be useful for RPC tests. This changeset fixes some bugs in the helper that arose from client code drift, and makes it used by the client RPC tests for CSI. This will also get used for the DHV RPC tests. Ref: https://github.com/hashicorp/nomad/pull/10193	2024-11-05 14:59:53 -05:00
Juanadelacuesta	d0b015ec01	func: move the user and group type declarations	2024-10-31 10:34:26 +01:00
Juanadelacuesta	0cd1b5ff13	func: move the validation to a dependency and use id sets	2024-10-28 18:59:51 +01:00
Rodrigo Lourenço	cdebf96b0e	fingerprint gce: collect preemptibility	2024-10-23 15:19:20 +02:00
Seth Hoenig	f1ce127524	jobspec: add a chown option to artifact block (#24157 ) * jobspec: add a chown option to artifact block This PR adds a boolean 'chown' field to the artifact block. It indicates whether the Nomad client should chown the downloaded files and directories to be owned by the task.user. This is useful for drivers like raw_exec and exec2 which are subject to the host filesystem user permissions structure. Before, these drivers might not be able to use or manage the downloaded artifacts since they would be owned by the root user on a typical Nomad client configuration. * api: no need for pointer of chown field	2024-10-11 11:30:27 -05:00
Tim Gross	b7595c646d	alloc fs: use case-insensitive check for reads of secret/private dir (#24125 ) When using the Client FS APIs, we check to ensure that reads don't traverse into the allocation's secret dir and private dir. But this check can be bypassed on case-insensitive file systems (ex. Windows, macOS, and Linux with obscure ext4 options enabled). This allows a user with `read-fs` permissions but not `alloc-exec` permissions to read from the secrets dir. This changeset updates the check so that it's case-insensitive. This risks false positives for escape (see linked Go issue), but only if a task without filesystem isolation deliberately writes into the task working directory to do so, which is a fail-safe failure mode. Ref: https://github.com/golang/go/issues/18358 Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>	2024-10-03 14:20:24 -04:00
Martijn Vegter	3ecf0d21e2	metrics: introduce client config to include alloc metadata as part of the base labels (#23964 )	2024-10-02 10:55:44 -04:00
Juliano Martinez	4a74fda8ce	Allow client template config block to be parsed when using json config (#24007 ) - Adds tests - Adds sample test data for parsing hcl and json - Adds changelog	2024-10-01 15:44:36 -04:00
Piotr Kazmierczak	981ca36049	docker: use official client instead of fsouza/go-dockerclient (#23966 ) This PR replaces fsouza/go-dockerclient 3rd party docker client library with docker's official SDK. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>	2024-09-26 18:41:44 +02:00
Tim Gross	cc9227b858	template: fix panic in change_mode=script on client restart (#24057 ) When we introduced change_mode=script to templates, we passed the driver handle down into the template manager so we could call its `Exec` method directly. But the lifecycle of the driver handle is managed by the taskrunner and isn't available when the template manager is first created. This has led to a series of patches trying to fixup the behavior (#15915, #15192, #23663, #23917). Part of the challenge in getting this right is using an interface to avoid the circular import of the driver handle. But the taskrunner already has a way to deal with this problem using a "lazy handle". The other template change modes already use this indirectly through the `Lifecycle` interface. Change the driver handle `Exec` call in the template manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This eliminates the need for the template manager to know anything at all about the handle state. Fixes: https://github.com/hashicorp/nomad/issues/24051	2024-09-25 08:59:01 -04:00
Michael Smithhisler	338487c159	fix: add node pool attribute to interpretable values in task env (#24052 )	2024-09-24 13:23:16 -04:00
Michael Smithhisler	6b6aa7cc26	identity: adds ability to specify custom filepath for saving workload identities (#24038 )	2024-09-23 10:27:00 -04:00
Tim Gross	b7f1800657	fingerprint: update landlock test to accept v4+ APIs (#23979 ) The landlock fingerprint test assumes there's no version of the landlock API >3. Update the test assertion to allow for the current v4 and any future versions.	2024-09-17 15:07:44 -04:00
Seth Hoenig	51215bf102	deps: update to go-set/v3 and refactor to use custom iterators (#23971 ) * deps: update to go-set/v3 * deps: use custom set iterators for looping	2024-09-16 13:40:10 -05:00
Daniel Bennett	5e1fae2856	networking: set alloc NetworkStatus.AddressIPv6 (#23959 ) when a CNI result includes an IPv6 address, set it on the alloc's NetworkStatus for reference. e.g.: $ nomad alloc status -json 3dca | jq '.NetworkStatus' { "Address": "172.26.64.14", "AddressIPv6": "fd00:a110:c8::b", "DNS": null, "InterfaceName": "eth0" }	2024-09-16 10:21:52 -05:00
Tim Gross	07aca67108	template: lock task handle before trying script check (#23917 ) In #23663 we fixed the template hook so that `change_mode="script"` didn't lose track of the task handle during restores. But this revealed a second bug which is that access to the handle is not locked while in use, which can allow it to be removed concurrently. Fixes: https://github.com/hashicorp/nomad/issues/23875	2024-09-12 08:41:06 -04:00
Juana De La Cuesta	9c5f962940	Update client/lib/cgroupslib/partition_linux.go Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-09-06 10:56:47 +02:00
Juana De La Cuesta	426c225dc2	Update client/lib/cgroupslib/partition_linux.go Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-09-06 10:56:41 +02:00
Juana De La Cuesta	8e6d85b66f	Update client/lib/cgroupslib/partition_linux.go Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-09-06 10:56:36 +02:00
Juanadelacuesta	a65d05ff51	fix: keep a register of the usable cores to avoid using more than that	2024-09-05 17:02:54 +02:00
Daniel Bennett	2f5cf8efae	networking: option to enable ipv6 on bridge network (#23882 ) by setting bridge_network_subnet_ipv6 in client config Co-authored-by: Martina Santangelo <martina.santangelo@hashicorp.com>	2024-09-04 10:17:10 -05:00
Daniel Bennett	a6e29057d6	networking: refactor some iptables for testability (#23856 )	2024-08-23 10:05:46 -05:00
Seth Hoenig	8b093a6a5d	scheduler: support for device - aware numa scheduling (#1760 ) (#23837 ) (CE backport of ENT 59433d56c7215c0b8bf33764f41b57d9bd30160f (without ent files)) * scheduler: enhance numa aware scheduling with support for devices * cr: add comments	2024-08-20 07:53:04 -05:00
Piotr Kazmierczak	0bc9796d3b	client: log an error message if total detected cpu is zero (#23827 )	2024-08-15 18:31:27 +02:00
Tim Gross	682c8c0c81	cgroupslib: allow initial controller check with delegated cgroups v2 (#23803 ) During Nomad client initialization with cgroups v2, we assert that the required cgroup controllers are available in the root `cgroup.subtree_control` file by idempotently writing to the file. But if Nomad is running with delegated cgroups, this will fail file permissions checks even if the subtree control file already has the controllers we need. Update the initialization to first check if the controllers are missing before attempting to write to them. This allows cgroup delegation so long as the cluster administrator has pre-created a Nomad owned cgroups tree and set the `Delegate` option in a systemd override. If not, initialization fails in the existing way. Although this is one small step along the way to supporting a rootless Nomad client, running Nomad as non-root is still unsupported. I've intentionally not documented setting up cgroup delegation in this PR, as this PR is insufficient by itself to have a secure and properly-working rootless Nomad client. Ref: https://github.com/hashicorp/nomad/issues/18211 Ref: https://github.com/hashicorp/nomad/issues/13669	2024-08-14 16:58:21 -04:00
Martina Santangelo	73ce56ba27	networking: refactor building nomad bridge config (#23772 )	2024-08-14 12:43:31 -05:00
Seth Hoenig	db0642099e	build: update golangci-lint to 1.60.1 (#23807 ) * build: update golangci-lint to 1.60.1 * ci: update golangci-lint to v1.60.1 Helps with go1.23 compatibility. Introduces some breaking changes / newly enforced linter patterns so those are fixed as well.	2024-08-14 10:09:31 -05:00
Tim Gross	920f4702d6	testing: fix skip comment on `RequireWindows` helper (#23776 )	2024-08-09 09:07:25 -04:00
Tim Gross	ef116b12d5	metrics: add `client.tasks` state metrics (#23773 ) Although we have `client.allocations` metrics to track allocation states on a client, having separate metrics for `client.tasks` will allow operators to identify that there are individual tasks in an unexpected state in an otherwise healthy allocation. Fixes: https://github.com/hashicorp/nomad/issues/23770	2024-08-09 09:02:17 -04:00
Tim Gross	9543e740af	docker: fix delimiter for selinux label for read-only volumes (#23750 ) The Docker driver's `volume` field to specify bind-mounts takes a list of strings that consist of three `:`-delimited fields: source, destination, and options. We append the SELinux label from the plugin configuration as the third field. But when the user has already specified the volume is read-only with `:ro`, we're incorrectly appending the SELinux label with another `:` instead of the required `,`. Combine the options into a single field value before appending them to the bind mounts configuration. Updated the tests to split out Windows behavior (which doesn't accept options) and to ensure the test task has the expected environment for bind mounts. Fixes: https://github.com/hashicorp/nomad/issues/23690	2024-08-08 09:08:01 -04:00
Deniz Onur Duzgun	0f7b8698ec	security: fix write symlink escape on the same allocdir path (#23738 ) Resolves symlink escape when unarchiving by removing existing paths within the same allocation directory which can occur by writing a header that points to a symlink that lives outside of the sandbox environment. This exploit requires first compromising the Nomad client agent at the source allocation. Ref: https://hashicorp.atlassian.net/browse/NET-10607 Ref: https://github.com/hashicorp/nomad-enterprise/pull/1725	2024-08-05 16:23:27 -04:00
Tim Gross	b25f1b66ce	resources: allow job authors to configure size of secrets tmpfs (#23696 ) On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks need larger space for downloading large secrets. This is especially the case for tasks using `templates`, which need extra room to write a temporary file to the secrets directory that gets renamed to the old file atomically. This changeset allows increasing the size of the tmpfs in the `resources` block. Because this is a memory resource, we need to include it in the memory we allocate for scheduling purposes. The task is already prevented from using more memory in the tmpfs than the `resources.memory` field allows, but can bypass that limit by writing to the tmpfs via `template` or `artifact` blocks. Therefore, we need to account for the size of the tmpfs in the allocation resources. Simply adding it to the memory needed when we create the allocation allows it to be accounted for in all downstream consumers, and then we'll subtract that amount from the memory resources just before configuring the task driver. For backwards compatibility, the default value of 1MiB is "free" and ignored by the scheduler. Otherwise we'd be increasing the allocated resources for every existing alloc, which could cause problems across upgrades. If a user explicitly sets `resources.secrets = 1` it will no longer be free. Fixes: https://github.com/hashicorp/nomad/issues/2481 Ref: https://hashicorp.atlassian.net/browse/NET-10070	2024-08-05 16:06:58 -04:00
Tim Gross	c280891703	template: allow change_mode script to run after client restart (#23663 ) For templates with `change_mode = "script"`, we set a driver handle in the poststart method, so the template runner can execute the script inside the task. But when the client is restarted and the template contents change during that window, we trigger a change_mode in the prestart method. In that case, the hook will not have the handle and so returns an error trying to run the change mode. We restore the driver handle before we call any prestart hooks, so we can pass that handle in the constructor whenever it's available. In the normal task start case the handle will be empty but also won't be called. The error messages are also misleading, as there's no capabilities check happening here. Update the error messages to match. Fixes: https://github.com/hashicorp/nomad/issues/15851 Ref: https://hashicorp.atlassian.net/browse/NET-9338	2024-07-24 08:29:39 -04:00
Daniel Bennett	32d8ec446f	cni: fix parsing .conf and .json configs (#23629 ) so we can support more than just .conflist format like our docs claim we do	2024-07-18 14:55:13 -05:00
Martina Santangelo	8cbd857ac9	cni: test Setup method (#23609 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-07-17 13:02:23 -04:00
guifran001	1c44521543	client: Add a preferred address family option for network-interface (#23389 ) to prefer ipv4 or ipv6 when deducing IP from network interface Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-07-12 15:30:38 -05:00
Martina Santangelo	661011f5de	cni: allow users to set CNI args in job spec (#23538 )	2024-07-12 11:47:15 -04:00
Piotr Kazmierczak	7772711c89	plugins: fix nomadTopologyToProto panic on systems that don't support NUMA (#23399 ) After changes introduced in #23284 we no longer need to make an if !st.SupportsNUMA() check in the GetNodes() topology method. In fact this check will now cause a panic in the nomadTopologyToProto method on systems that don't support NUMA.	2024-07-09 08:41:52 +02:00
Deniz Onur Duzgun	ef6cdec884	security: add escape to arbitrary file access (#23319 )	2024-07-08 14:00:09 -04:00

5002 Commits
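
Commit 9543e740af in the log above describes Docker bind-mount strings as three `:`-delimited fields (source, destination, options), with multiple options joined by `,` rather than another `:`. A rough sketch of that combining step is shown below. `bindSpec` is a hypothetical helper written for this illustration and is not the Docker driver's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// bindSpec builds a "source:destination[:options]" bind-mount string.
// All non-empty options are combined into a single comma-separated
// third field, so "ro" plus an SELinux label "z" becomes "ro,z".
func bindSpec(source, dest string, opts ...string) string {
	var kept []string
	for _, o := range opts {
		if o != "" {
			kept = append(kept, o)
		}
	}
	spec := source + ":" + dest
	if len(kept) > 0 {
		spec += ":" + strings.Join(kept, ",")
	}
	return spec
}

func main() {
	fmt.Println(bindSpec("/host/data", "/data", "ro", "z")) // /host/data:/data:ro,z
	fmt.Println(bindSpec("/host/data", "/data", "", "z"))   // /host/data:/data:z
}
```

Collecting the options first and joining them once avoids emitting a fourth `:`-delimited field, which is the failure the commit message describes.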