nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
dependabot[bot]	398b5000c1	chore(deps): bump github.com/hashicorp/go-plugin from 1.4.10 to 1.6.0 (#19646 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2024-01-08 08:26:34 +00:00
James Rasell	91cba75f5c	copywrite: fix and add copywrite config enterprise comments. (#19590 ) Nomad CI checks for copywrite headers using multiple config files for specific exemption paths. This means the top-level config file does not take effect when running the copywrite script within these sub-folders. Exempt files therefore need to be added to the sub-config files, along with the top level.	2024-01-03 08:58:53 +00:00
Morgan Drake	c5b36b500b	move license to 2024	2023-12-01 12:26:27 -08:00
modrake	51ffe4208e	workaround and fixes for MPL and copywrite bot (#18775 )	2023-10-17 08:02:13 +01:00
Daniel Bennett	4895d708b4	csi: implement NodeExpandVolume (#18522 ) following ControllerExpandVolume in `c6dbba7cde`, which expands the disk at e.g. a cloud vendor, the controller plugin may say that we also need to issue NodeExpandVolume for the node plugin to make the new disk space available to task(s) that have claims on the volume by e.g. expanding the filesystem on the node. csi spec: https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume	2023-09-18 10:30:15 -05:00
Seth Hoenig	591394fb62	drivers: plumb hardware topology via grpc into drivers (#18504 ) * drivers: plumb hardware topology via grpc into drivers This PR swaps out the temporary use of detecting system hardware manually in each driver for using the Client's detected topology by plumbing the data over gRPC. This ensures that Client configuration is taken into account consistently in all references to system topology. * cr: use enum instead of bool for core grade * cr: fix test slit tables to be possible	2023-09-18 08:58:07 -05:00
Daniel Bennett	c6dbba7cde	csi: implement ControllerExpandVolume (#18359 ) the first half of volume expansion, this allows a user to update requested capacity ("capacity_min" and "capacity_max") in a volume specification file, and re-issue either Register or Create volume commands (or api calls). the requested capacity will now be "reconciled" with the current real capacity of the volume, issuing a ControllerExpandVolume RPC call to a running controller plugin, if requested "capacity_min" is higher than the current capacity on the volume in state. csi spec: https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume note: this does not yet cover NodeExpandVolume	2023-09-14 14:13:04 -05:00
Seth Hoenig	2e1974a574	client: refactor cpuset partitioning (#18371 ) * client: refactor cpuset partitioning This PR updates the way Nomad client manages the split between tasks that make use of resources.cpus vs. resources.cores. Previously, each task was explicitly assigned which CPU cores they were able to run on. Every time a task was started or destroyed, all other tasks' cpusets would need to be updated. This was inefficient and would crush the Linux kernel when a client would try to run ~400 or so tasks. Now, we make use of cgroup hierarchy and cpuset inheritance to efficiently manage cpusets. * cr: tweaks for feedback	2023-09-12 09:11:11 -05:00
hashicorp-copywrite[bot]	89e24d7405	Adding explicit MPL license for sub-package This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.	2023-08-10 17:27:01 -05:00
Seth Hoenig	a4cc76bd3e	numa: enable numa topology detection (#18146 ) * client: refactor cgroups management in client * client: fingerprint numa topology * client: plumb numa and cgroups changes to drivers * client: cleanup task resource accounting * client: numa client and config plumbing * lib: add a stack implementation * tools: remove ec2info tool * plugins: fixup testing for cgroups / numa changes * build: update makefile and package tests and cl	2023-08-10 17:05:30 -05:00
James Rasell	81c14dee3c	test: enable exec test previously disabled due to CircleCI (#17975 )	2023-07-19 15:15:11 +01:00
Devashish Taneja	b31e891e5f	Include parent job ID as a Docker container label (#17843 ) Fixes: #17751	2023-07-10 11:27:45 -04:00
Seth Hoenig	89ce092b20	docker: stop network pause container of lost alloc after node restart (#17455 ) This PR fixes a bug where the docker network pause container would not be stopped and removed in the case where a node is restarted, the alloc is moved to another node, the node comes back up. See the issue below for full repro conditions. Basically in the DestroyNetwork PostRun hook we would depend on the NetworkIsolationSpec field not being nil - which is only the case if the Client stays alive all the way from network creation to network teardown. If the node is rebooted we lose that state and previously would not be able to find the pause container to remove. Now, we manually find the pause container by scanning them and looking for the associated allocID. Fixes #17299	2023-06-09 08:46:29 -05:00
Tim Gross	bf7b82b52b	drivers: make internal `DisableLogCollection` capability public (#17196 ) The `DisableLogCollection` capability was introduced as an experimental interface for the Docker driver in 0.10.4. The interface has been stable and allowing third-party task drivers the same capability would be useful for those drivers that don't need the additional overhead of logmon. This PR only makes the capability public. It doesn't yet add it to the configuration options for the other internal drivers. Fixes: #14636 #15686	2023-05-16 09:16:03 -04:00
Tim Gross	88323bab4a	allocrunner: provide factory function so we can build mock ARs (#17161 ) Tools like `nomad-nodesim` are unable to implement a minimal implementation of an allocrunner so that we can test the client communication without having to lug around the entire allocrunner/taskrunner code base. The allocrunner was implemented with an interface specifically for this purpose, but there were circular imports that made it challenging to use in practice. Move the AllocRunner interface into an inner package and provide a factory function type. Provide a minimal test that exercises the new function so that consumers have some idea of what the minimum implementation required is.	2023-05-12 13:29:44 -04:00
Luiz Aoqui	ee5a08dbb2	Revert "hashicorp/go-msgpack v2 (#16810 )" (#17047 ) This reverts commit `8a98520d56`.	2023-05-01 17:18:34 -04:00
Ian Fijolek	8a98520d56	hashicorp/go-msgpack v2 (#16810 ) * Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0 Fixes #16808 * Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack * deps: use go-msgpack v2.0.0 go-msgpack v2.1.0 includes some code changes that we will need to investigate further to assess its impact on Nomad, so keeping this dependency on v2.0.0 for now since it's no-op. --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-17 17:02:05 -04:00
Seth Hoenig	74b16da272	deps: update docker to 23.0.3 (#16862 ) * [no ci] deps: update docker to 23.0.3 This PR brings our docker/docker dependency (which is hosted at github.com/moby/moby) up to 23.0.3 (forward about 2 years). Refactored our use of docker/libnetwork to reference the package in its new home, which is docker/docker/libnetwork (it is no longer an independent repository). Some minor nearby test case cleanup as well. * add cl	2023-04-12 14:13:36 -05:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Luiz Aoqui	b2c873274b	plugin: add missing fields to `TaskConfig` (#16434 )	2023-03-13 15:58:16 -04:00
Lance Haig	0e74431b01	Update ioutil library references to os and io respectively for API and Plugins package (#16330 ) No user facing changes so I assume no change log is required	2023-03-08 10:25:09 -06:00
Piotr Kazmierczak	949a6f60c7	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
stswidwinski	5ce42fe8f2	Add mount propagation to protobuf definition of mounts (#15096 ) * Add mount propagation to protobuf definition of mounts * Fix formatting * Add mount propagation to the simple roundtrip test. * changelog: add entry for #15096 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-11-17 18:14:59 -05:00
stswidwinski	d16a2c9467	Fix goroutine leakage (#15180 ) * Fix goroutine leakage * cl: add cl entry Co-authored-by: Seth Hoenig <shoenig@duck.com>	2022-11-17 09:47:11 -06:00
Seth Hoenig	ff1a30fe8d	cleanup more helper updates (#14638 ) * cleanup: refactor MapStringStringSliceValueSet to be cleaner * cleanup: replace SliceStringToSet with actual set * cleanup: replace SliceStringSubset with real set * cleanup: replace SliceStringContains with slices.Contains * cleanup: remove unused function SliceStringHasPrefix * cleanup: fixup StringHasPrefixInSlice doc string * cleanup: refactor SliceSetDisjoint to use real set * cleanup: replace CompareSliceSetString with SliceSetEq * cleanup: replace CompareMapStringString with maps.Equal * cleanup: replace CopyMapStringString with CopyMap * cleanup: replace CopyMapStringInterface with CopyMap * cleanup: fixup more CopyMapStringString and CopyMapStringInt * cleanup: replace CopySliceString with slices.Clone * cleanup: remove unused CopySliceInt * cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice * cleanup: replace CopyMap with maps.Clone * cleanup: run go mod tidy	2022-09-21 14:53:25 -05:00
Tim Gross	263a0497dd	testing: skip exec stream child process test (#14601 ) This test is broken in CircleCI only. It works on GHA in both 20.04 and 22.04 and has been verified to work on real Nomad; temporarily commenting-out so that we don't block unrelated CI runs. WIP to fix in https://github.com/hashicorp/nomad/pull/14600	2022-09-15 11:53:12 -04:00
Piotr Kazmierczak	c4be2c6078	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Seth Hoenig	0c62f445c3	build: run gofmt on all go source files Go 1.19 will forcefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Tim Gross	ce14fc6e6b	deps: remove deprecated net/context (#13932 ) The `golang.org/x/net/context` package was merged into the stdlib as of go 1.7. Update the imports to use the identical stdlib version. Clean up import blocks for the impacted files to remove unnecessary package aliasing.	2022-07-28 14:46:56 -04:00
Seth Hoenig	a6f345c8f5	ci: fixup task runner chroot test This PR is 2 fixes for the flaky TestTaskRunner_TaskEnv_Chroot test. And also the TestTaskRunner_Download_ChrootExec test. - Use TinyChroot to stop copying gigabytes of junk, which causes GHA to fail to create the environment in time. - Pre-create cgroups on V2 systems. Normally the cgroup directory is managed by the cpuset manager, but that is not active in taskrunner tests, so create it by hand in the test framework.	2022-04-19 10:37:46 -05:00
fyn	b6ec83b59b	fix(plugins): should return when ctx.Done	2022-04-09 01:04:29 +08:00
Seth Hoenig	be7ec8de3e	raw_exec: make raw exec driver work with cgroups v2 This PR adds support for the raw_exec driver on systems with only cgroups v2. The raw exec driver is able to use cgroups to manage processes. This happens only on Linux, when exec_driver is enabled, and the no_cgroups option is not set. The driver uses the freezer controller to freeze processes of a task, issue a sigkill, then unfreeze. Previously the implementation assumed cgroups v1, and now it also supports cgroups v2. There is a bit of refactoring in this PR, but the fundamental design remains the same. Closes #12351 #12348	2022-04-04 16:11:38 -05:00
James Rasell	245dd801dd	Merge pull request #12304 from th0m/tlefebvre/fix-wrong-drivernetworkmanager-interface fix: update incorrect DriverNetworkManager interface implementation	2022-04-04 11:29:22 +02:00
Seth Hoenig	5da1a31e94	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup hierarchy inheritance via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client config), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
Tim Gross	1e55a35848	csi: set gRPC authority header for unix domain socket (#12359 ) The go-grpc library used by most CSI plugins doesn't require the authority header to be set, which violates the HTTP2 spec but doesn't impact Nomad because both sides of the connection are using the same library. But plugins written in other languages (`democratic-csi` for example) may have more strictly conforming gRPC server libraries and we need to set the authority header manually.	2022-03-23 12:01:08 -04:00
Tim Gross	fe65d80501	CSI: fix timestamp from volume snapshot responses (#12352 ) Listing snapshots was incorrectly returning nanoseconds instead of seconds, and formatting of timestamps both list and create snapshot was treating the timestamp as though it were nanoseconds instead of seconds. This resulted in create timestamps always being displayed as zero values. Fix the unit conversion error in the command line and the incorrect extraction in the CSI plugin client code. Beef up the unit tests to make sure this code is actually exercised.	2022-03-23 10:39:28 -04:00
James Rasell	80b4eeaba5	Merge branch 'main' into tlefebvre/fix-wrong-drivernetworkmanager-interface	2022-03-17 09:38:13 +01:00
Thomas Lefebvre	4c9f476d32	fix: update incorrect DriverNetworkManager interface implementation in plugins/drivers/client.go and drivers/mock/driver.go And add assertions to catch drifts at compilation time.	2022-03-15 11:51:01 -07:00
Seth Hoenig	b242957990	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Seth Hoenig	3888a59c47	testing: use a smaller chroot when running exec driver tests The default chroot copies all of /bin, /usr, etc. which can amount to gigabytes of stuff not actually needed for running our tests. Use a smaller chroot in test cases so that CI infra with poor disk IO has a chance.	2022-03-09 16:24:07 -06:00
Tim Gross	03a8d72dba	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
Tim Gross	b775a73ded	CSI: make gRPC client creation more robust (#12057 ) Nomad communicates with CSI plugin tasks via gRPC. The plugin supervisor hook uses this to ping the plugin for health checks which it emits as task events. After the first successful health check the plugin supervisor registers the plugin in the client's dynamic plugin registry, which in turn creates a CSI plugin manager instance that has its own gRPC client for fingerprinting the plugin and sending mount requests. If the plugin manager instance fails to connect to the plugin on its first attempt, it exits. The plugin supervisor hook is unaware that connection failed so long as its own pings continue to work. A transient failure during plugin startup may mislead the plugin supervisor hook into thinking the plugin is up (so there's no need to restart the allocation) but no fingerprinter is started. * Refactors the gRPC client to connect on first use. This provides the plugin manager instance the ability to retry the gRPC client connection until success. * Add a 30s timeout to the plugin supervisor so that we don't poll forever waiting for a plugin that will never come back up. Minor improvements: * The plugin supervisor hook creates a new gRPC client for every probe and then throws it away. Instead, reuse the client as we do for the plugin manager. * The gRPC client constructor has a 1 second timeout. Clarify that this timeout applies to the connection and not the rest of the client lifetime.	2022-02-15 16:57:29 -05:00
Tim Gross	358a46819b	fix integer bounds checks (#11815 ) * driver: fix integer conversion error The shared executor incorrectly parsed the user's group into int32 and then cast to uint32 without bounds checking. This is harmless because an out-of-bounds gid will throw an error later, but it triggers security and code quality scans. Parse directly to uint32 so that we get correct error handling. * helper: fix integer conversion error The autopilot flags helper incorrectly parses a uint64 to a uint which is machine specific size. Although we don't have 32-bit builds, this sets off security and code quality scans. Parse to the machine sized uint. * driver: restrict bounds of port map The plugin server doesn't constrain the maximum integer for port maps. This could result in a user-visible misconfiguration, but it also triggers security and code quality scans. Restrict the bounds before casting to int32 and return an error. * cpuset: restrict upper bounds of cpuset values Our cpuset configuration expects values in the range of uint16 to match the expectations set by the kernel, but we don't constrain the values before downcasting. An underflow could lead to allocations failing on the client rather than being caught earlier. This also makes security and code quality scanners happy. * http: fix integer downcast for per_page parameter The parser for the `per_page` query parameter downcasts to int32 without bounds checking. This could result in underflow and nonsensical paging, but there are no server-side consequences for this. Fixing this will silence some security and code quality scanners though.	2022-01-25 11:16:48 -05:00
James Rasell	ab9ba35e6a	chore: fixup inconsistent method receiver names. (#11704 )	2021-12-20 11:44:21 +01:00
Michael Schurter	eeb1da8a2e	test: update tests to properly use AllocDir Also use t.TempDir when possible.	2021-10-19 10:49:07 -07:00
Michael Schurter	37f053ff89	client: never embed alloc_dir in chroot Fixes #2522 Skip embedding client.alloc_dir when building chroot. If a user configures a Nomad client agent so that the chroot_env will embed the client.alloc_dir, Nomad will happily infinitely recurse while building the chroot until something horrible happens. The best case scenario is the filesystem's path length limit is hit. The worst case scenario is disk space is exhausted. A bad agent configuration will look something like this: ```hcl data_dir = "/tmp/nomad-badagent" client { enabled = true chroot_env { # Note that the source matches the data_dir "/tmp/nomad-badagent" = "/ohno" # ... } } ``` Note that `/ohno/client` (the state_dir) will still be created but not `/ohno/alloc` (the alloc_dir). While I cannot think of a good reason why someone would want to embed Nomad's client (and possibly server) directories in chroots, there should be no cause for harm. chroots are only built when Nomad runs as root, and Nomad disables running exec jobs as root by default. Therefore even if client state is copied into chroots, it will be inaccessible to tasks. Skipping the `data_dir` and `{client,server}.state_dir` is possible, but this PR attempts to implement the minimum viable solution to reduce risk of unintended side effects or bugs. When running tests as root in a vm without the fix, the following error occurs: ``` === RUN TestAllocDir_SkipAllocDir alloc_dir_test.go:520: Error Trace: alloc_dir_test.go:520 Error: Received unexpected error: Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long Test: TestAllocDir_SkipAllocDir --- FAIL: TestAllocDir_SkipAllocDir (22.76s) ``` Also removed unused Copy methods on AllocDir and TaskDir structs. 
Thanks to @eveld for not letting me forget about this!	2021-10-18 09:22:01 -07:00
Mahmood Ali	6c414cd5f9	gofmt all the files mostly to handle build directives in 1.17.	2021-10-01 10:14:28 -04:00
James Rasell	e34fa583f9	allow configuration of Docker hostnames in bridge mode (#11173 ) Add a new hostname string parameter to the network block which allows operators to specify the hostname of the network namespace. Changing this causes a destructive update to the allocation and it is omitted if empty from API responses. This parameter also supports interpolation. In order to have a hostname passed as a configuration param when creating an allocation network, the CreateNetwork func of the DriverNetworkManager interface needs to be updated. In order to minimize the disruption of future changes, rather than add another string func arg, the function now accepts a request struct along with the allocID param. The struct has the hostname as a field. The in-tree implementations of DriverNetworkManager.CreateNetwork have been modified to account for the function signature change. In updating for the change, the enhancement of adding hostnames to network namespaces has also been added to the Docker driver, whilst the default Linux manager does not currently implement it.	2021-09-16 08:13:09 +02:00
James Rasell	3bffe443ac	chore: fix incorrect docstring formatting.	2021-08-30 11:08:12 +02:00
Grant Griffiths	cba476eae6	CSI ListSnapshots secrets implementation Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2021-07-28 11:30:29 -07:00
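
The integer bounds fixes described in commit `358a46819b` (parse directly to the target width, and restrict bounds before downcasting) can be illustrated with a minimal Go sketch. This is not the code from that commit: `parseGID` and `toInt32` are hypothetical helper names chosen for illustration, and only standard library calls are used.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseGID is a hypothetical helper: it parses a numeric group ID directly
// into the uint32 range, so an out-of-range value is reported by the parser
// instead of being silently wrapped by a later cast.
func parseGID(s string) (uint32, error) {
	v, err := strconv.ParseUint(s, 10, 32) // bitSize 32 enforces the bound
	if err != nil {
		return 0, fmt.Errorf("invalid gid %q: %w", s, err)
	}
	return uint32(v), nil
}

// toInt32 is a hypothetical helper: it checks bounds before downcasting
// and returns an error rather than an overflowed value.
func toInt32(v int64) (int32, error) {
	if v < math.MinInt32 || v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d out of int32 range", v)
	}
	return int32(v), nil
}

func main() {
	if _, err := parseGID("4294967296"); err != nil {
		fmt.Println("rejected:", err)
	}
	if _, err := toInt32(1 << 40); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Passing the bit size to `strconv.ParseUint` keeps the range check and the error in one place, which is the property the commit message asks for when it says "parse directly to uint32".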


380 Commits
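
Commit `4c9f476d32` mentions adding assertions that catch interface drift at compilation time, and commit `e34fa583f9` describes passing a request struct to `CreateNetwork` instead of adding more string arguments. A minimal Go sketch of both ideas follows. The types `NetworkManager`, `CreateRequest`, and `mockManager` are hypothetical stand-ins, not Nomad's actual definitions, and the return values are simplified for illustration.

```go
package main

import "fmt"

// CreateRequest is a hypothetical request struct. New optional fields can be
// added here later without changing the method signature again.
type CreateRequest struct {
	Hostname string
}

// NetworkManager is a hypothetical interface standing in for a plugin contract.
type NetworkManager interface {
	CreateNetwork(allocID string, req *CreateRequest) (string, error)
}

// mockManager is a hypothetical implementation used for illustration.
type mockManager struct{}

func (m *mockManager) CreateNetwork(allocID string, req *CreateRequest) (string, error) {
	if req == nil {
		return "", fmt.Errorf("nil request for alloc %s", allocID)
	}
	return allocID + "/" + req.Hostname, nil
}

// Compile-time assertion: the build fails if *mockManager ever stops
// satisfying NetworkManager, for example after a signature change.
var _ NetworkManager = (*mockManager)(nil)

func main() {
	var nm NetworkManager = &mockManager{}
	id, err := nm.CreateNetwork("alloc-1", &CreateRequest{Hostname: "web"})
	fmt.Println(id, err)
}
```

The blank-identifier assignment costs nothing at run time. Its only purpose is to make the compiler verify the implementation against the interface in the file where the implementation lives.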