nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 18:35:44 +03:00

Author	SHA1	Message	Date
Tim Gross	a74775814c	fingerprint: add DNS address and port to Consul fingerprint (#19969 ) In order to provide a DNS address and port to Connect tasks configured for transparent proxy, we need to fingerprint the Consul DNS address and port. The client will pass this address/port to the iptables configuration provided to the `consul-cni` plugin. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-02-14 12:15:58 -05:00
Cedric Le Roux	994a2b1036	client: fixed a bug where corrupt client state could panic the client (#19972 )	2024-02-14 11:14:11 -05:00
Tim Gross	c1b5850473	docs: add warning not to enable Consul `tls.grpc.verify_incoming` (#19970 ) Consul does not support incoming TLS verification of Envoy. This failure results in hard-to-understand errors like `SSLV3_ALERT_BAD_CERTIFICATE` in the Envoy allocation logs. Leave a warning about this to users. Closes: https://github.com/hashicorp/nomad/issues/19772 Closes: https://github.com/hashicorp/nomad/issues/16854 Ref: https://github.com/hashicorp/consul/issues/13088	2024-02-14 08:56:35 -05:00
Tim Gross	c364cb5729	Merge pull request #19968 from hashicorp/post-1.7.5-release Post 1.7.5 release	2024-02-13 11:42:30 -05:00
Tim Gross	3978f96898	Merge release 1.7.5 files	2024-02-13 11:34:25 -05:00
hc-github-team-nomad-core	64c2e2b868	Prepare for next release	2024-02-13 11:32:59 -05:00
hc-github-team-nomad-core	6e08d9ffff	Generate files for 1.7.5 release	2024-02-13 11:32:59 -05:00
Luiz Aoqui	62b7d6ffe9	vault: revert #18998 to fix potential deadlock (#19963 ) * Revert "vault: always renew tokens using the renewal loop (#18998)" This reverts commit `7054fe1a8c`. * test: add case for concurrent Vault token renewal	2024-02-13 09:50:46 -05:00
Julien Castets	61941d8204	docs: autoscaler doc for max_scale_up and max_scale_down of target-value strategy (#19945 ) See https://github.com/hashicorp/nomad-autoscaler/pull/848	2024-02-13 07:38:39 +00:00
Phil Renaud	1bde7a8fb4	[ui] Upgrades to build storybook on node v20 (#19953 ) * Attempting to build storybook on node v20 * babel-plugin-dynamic-import-node added * build without babel-plugin-dynamic-import-node explicitly declared	2024-02-12 16:51:47 -05:00
Tim Gross	a54657899c	CNI: fix deprecation warnings (#19954 ) We updated our `go-cni` dependency in #17582 but this left deprecation warnings on the `cni.CNIResult` type (now `cni.Result`).	2024-02-12 15:35:43 -05:00
Seth Hoenig	37c497628c	docs: describe cloud environments in fingerprint denylist (#19952 ) This PR changes the example of the client config option "fingerprint.denylist" to include all the cloud environment fingerprinters. Each one contains a 2 second HTTP timeout to a metadata endpoint that does not exist if you are not in that particular cloud. When run in serial on startup, this results in an 8 second wait where nothing useful is happening. Closes #16727	2024-02-12 09:57:29 -06:00
Tim Gross	a74e2b1c0e	alloc exec: fix panics after stream close (#19932 ) In #19172 we added a check on websocket errors to see if they were one of several benign "close" messages. This change inadvertently assumed that other messages used for close would not implement `HTTPCodedError`. When errors like the following are received: > msgpack decode error [pos 0]: io: read/write on closed pipe they are sent from the inner loop as though they were a "real" error, but the channel is already being closed with a "close" message. This allowed many more attempts to pass thru a previously-undiscovered race condition in the two goroutines that stream RPC responses to the websocket. When the input stream returns an error for any reason (for example, the command we're executing has exited), it will unblock the "outer" goroutine and cause a write to the websocket. If we're concurrently writing the "close error" discussed above, this results in a panic from the websocket library. This changeset includes two fixes: * Catch "closed pipe" error correctly so that we're not sending unnecessary error messages. * Move all writes to the websocket into the same response streaming goroutine. The main handler goroutine will block on a results channel, and the response streaming goroutine will send on that channel with the final error when it's done so it can be reported to the user.	2024-02-12 09:43:34 -05:00
Tim Gross	0985f96f8d	state: fix state store corruption in plan apply (#19937 ) The state store's `UpsertPlanResults` method canonicalizes allocations in order to upgrade them to a new version. But the method does not copy the allocation before doing so, which can potentially corrupt the state store. This hasn't been implicated in any known user-facing bugs, but was detected when running Nomad under a build with the Go toolchain's data race detection enabled.	2024-02-12 08:59:04 -05:00
Luiz Aoqui	e2bfdf0c10	events: emit event when job is deleted (#19903 ) When jobs are deregistered with the `purge` flag they are immediately deleted from the state store instead of just updated to be marked as stopped. Without tracking job deletions the event stream would not receive a `JobDeregistered` event when `purge` was set.	2024-02-09 18:19:33 -05:00
Luiz Aoqui	4a8b01430b	scheduler: retain eval metrics on port collision (#19933 ) When an allocation can't be placed because of a port collision the resulting blocked eval is expected to have a metric reporting the port that caused the conflict, but this metric was not being emitted when preemption was enabled.	2024-02-09 18:18:48 -05:00
Luiz Aoqui	b52a44717e	executor: limit the value of CPU shares (#19935 ) The value for the executor cgroup CPU weight must be within the limits imposed by the Linux kernel. Nomad used the task `resource.cpu`, an unbounded value, directly as the cgroup CPU weight, causing it to potentially go outside the imposed values. This commit clamps the CPU shares values to be within the limits allowed. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-09 16:29:14 -05:00
Luiz Aoqui	db5ffde2b7	client: prevent start on cgroups init error (#19915 ) The Nomad client expects certain cgroups paths to exist in order to manage tasks. These paths are created when the agent first starts, but if this process fails the agent would just log the error and proceed with its initialization, despite not being able to run tasks. This commit surfaces the errors back to the client initialization so the process can stop early and make clear to operators that something went wrong.	2024-02-09 13:45:29 -05:00
Tim Gross	110d93ab25	windows: remove LazyDLL calls for system modules (#19925 ) On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to load a few system DLL files, which does not prevent DLL hijacking attacks. Hypothetically a local attacker on the client host that can place an abusive library in a specific location could use this to escalate privileges to the Nomad process. Although this attack does not fall within the Nomad security model, it doesn't hurt to follow good practices here. We can remove two of these DLL loads by using wrapper functions provided by the stdlib in `x/sys/windows` Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>	2024-02-09 08:47:48 -05:00
Tim Gross	62c57d208b	fingerprint: eliminate spurious warning logs with Consul CE (#19923 ) Support for fingerprinting the Consul admin partition was added in #19485. But when the client fingerprints Consul CE, it gets a valid fingerprint and working Consul but with a warn-level log. Return "ok" from the partition extractor, but also ensure that we only add the Consul attribute if it actually has a value. Fixes: https://github.com/hashicorp/nomad/issues/19756	2024-02-09 08:19:00 -05:00
Phil Renaud	81f868631f	Fix vercel deployments EBADENGINE errors (#19914 )	2024-02-08 14:14:57 -05:00
Phil Renaud	41c783aec2	Noting action name restrictions, and correcting those of auth methods and roles (#19905 )	2024-02-08 12:01:22 -05:00
Tim Gross	fc26e0cb22	Post 1.7.4 release (#19918 )	2024-02-08 10:58:50 -05:00
Luiz Aoqui	2a348ba714	docs: expand impact of `verify_https_client=false` (#19916 ) When Nomad is configured with `verify_https_client=false` endpoints that do not require an ACL token can be accessed without any other type of authentication. Expand the docs to mention this effect.	2024-02-08 10:55:40 -05:00
Tim Gross	2970690355	Merge release 1.7.4 files	2024-02-08 10:41:11 -05:00
hc-github-team-nomad-core	33f0a5b268	Prepare for next release	2024-02-08 10:40:24 -05:00
hc-github-team-nomad-core	875e96cccc	Generate files for 1.7.4 release	2024-02-08 10:40:24 -05:00
Tim Gross	df86503349	template: sandbox template rendering The Nomad client renders templates in the same privileged process used for most other client operations. During internal testing, we discovered that a malicious task can create a symlink that can cause template rendering to read and write to arbitrary files outside the allocation sandbox. Because the Nomad agent can be restarted without restarting tasks, we can't simply check that the path is safe at the time we write without encountering a time-of-check/time-of-use race. To protect Nomad client hosts from this attack, we'll now read and write templates in a subprocess: * On Linux/Unix, this subprocess is sandboxed via chroot to the allocation directory. This requires that Nomad is running as a privileged process. A non-root Nomad agent will warn that it cannot sandbox the template renderer. * On Windows, this process is sandboxed via a Windows AppContainer which has been granted access only to the allocation directory. This does not require special privileges on Windows. (Creating symlinks in the first place can be prevented by running workloads as non-Administrator or non-ContainerAdministrator users.) Both sandboxes cause encountered symlinks to be evaluated in the context of the sandbox, which will result in a "file not found" or "access denied" error, depending on the platform. This change will also require an update to Consul-Template to allow callers to inject a custom `ReaderFunc` and `RenderFunc`. This design is intended as a workaround to allow us to fix this bug without creating backwards compatibility issues for running tasks. A future version of Nomad may introduce a read-only mount specifically for templates and artifacts so that tasks cannot write into the same location that the Nomad agent is. Fixes: https://github.com/hashicorp/nomad/issues/19888 Fixes: CVE-2024-1329	2024-02-08 10:40:24 -05:00
Tim Gross	0d3cd1427f	migration: check symlink sources during archive unpack During allocation directory migration, the client was not checking that any symlinks in the archive aren't pointing to somewhere outside the allocation directory. While task driver sandboxing will protect against processes inside the task from reading/writing thru the symlink, this doesn't protect against the client itself from performing unintended operations outside the sandbox. This changeset includes two changes: * Update the archive unpacking to check the source of symlinks and require that they fall within the sandbox. * Fix a bug in the symlink check where it was using `filepath.Rel` which doesn't work for paths in the sibling directories of the sandbox directory. This bug doesn't appear to be exploitable but caused errors in testing. Fixes: https://github.com/hashicorp/nomad/issues/19887	2024-02-08 10:40:24 -05:00
hc-github-team-nomad-core	c03c735c99	Backport of deps: update dependencies indirectly bringing in older runc into release/1.7.x #19866 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-08 10:40:24 -05:00
hc-github-team-nomad-core	af7cf79df7	Backport of chore(deps): bump github.com/opencontainers/runc from 1.1.10 to 1.1.12 into release/1.7.x #19862 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-02-08 10:40:24 -05:00
Luiz Aoqui	7391a59695	docs: add note about stub list filtering (#19902 ) When filtering list results, the filter expression is applied to the full object, not the stub. This is useful because it allows users to filter the list on fields not present in the object stub. But it can also be confusing because some fields have different names, or only exist in the stub, so the filter expression needs to reference fields not present in returned data. Filtering on the stub would reduce the confusion, but it would also restrict users to only be able to filter on the fields in the stub, which, by definition, are just a subset of the original fields. Documenting this behaviour can help users understand unexpected errors and results.	2024-02-07 16:41:07 -05:00
Luiz Aoqui	ce710d49fd	cli: fix `tls ca create` command with `-domain` (#19892 ) The current implementation of the `nomad tls ca create` command overrides the value of the `-domain` flag with `"nomad"` if no additional customization is provided. This results in a certificate for the wrong domain or an error if the `-name-constraint` flag is also used. The logic for `IsCustom()` also seemed reversed. If all custom fields are empty then the certificate is _not_ customized, so `IsCustom()` should return false.	2024-02-07 16:40:51 -05:00
Phil Renaud	15b06e8505	[ui] HashiCorp Design System upgraded to 3.6.0 (#19872 ) * HashiCorp Design System upgraded to 3.6.0 * Fresh yarn * Responses out of range are brought back within * General pass at a11y fixes with updated components and node * Further tooltip updates * 3 more partitions worth of toggle and tooltip updates * scale-events-accordion and topo-viz node fixes	2024-02-07 16:08:41 -05:00
Kiara Grouwstra	1e04fc4613	Libraries & SDKs: add nix-nomad (#19808 )	2024-02-06 20:47:23 -05:00
Luiz Aoqui	7daa854491	docs: remove duplicate entry for `upstreams.config` (#19877 )	2024-02-06 20:44:02 -05:00
Luiz Aoqui	5825cefe51	docs: remove Docker `cpuset_cpus` config (#19882 ) Nomad 1.7 refactored how CPU cores are assigned to tasks, making the Docker-specific `cpuset_cpus` configuration no longer used.	2024-02-06 10:51:16 -05:00
Phil Renaud	c927377700	Random exec assignment depends on taskGroup name if provided (#19878 )	2024-02-05 23:23:01 -05:00
Luiz Aoqui	50c50a6328	cli: fix return code when job deployment succeeds (#19876 ) When a job eval is blocked due to missing capacity, the `nomad job run` command will monitor the deployment, which may succeed once additional capacity is made available. But the current implementation would return `2` even when the deployment succeeded because it only took the first eval status into account. This commit updates the eval monitoring logic to reset the scheduling error state if the deployment eventually succeeds.	2024-02-05 18:32:25 -05:00
Juana De La Cuesta	120c3ca3c9	Add granular control of SELinux labels for host mounts (#19839 ) Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label * Update website/content/docs/job-specification/volume_mount.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * fix: typo * func: make volume mount verification happen even on mounts with no volume --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-05 10:05:33 +01:00
Tim Gross	f1637bdd5f	deps: update dependencies indirectly bringing in older runc (#19863 ) Although Nomad itself is not vulnerable to CVE-2024-21626, we want to update dependencies that bring in the vulnerable packages so as not to trip vulnerability scanners. Update `containerd` and `go-dockerclient` as well as the various transitive dependencies these bring in.	2024-02-02 16:08:22 -05:00
dependabot[bot]	b94a193c8a	chore(deps): bump github.com/opencontainers/runc from 1.1.10 to 1.1.12 (#19851 ) * chore(deps): bump github.com/opencontainers/runc from 1.1.10 to 1.1.12 Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.1.10 to 1.1.12. - [Release notes](https://github.com/opencontainers/runc/releases) - [Changelog](https://github.com/opencontainers/runc/blob/v1.1.12/CHANGELOG.md) - [Commits](https://github.com/opencontainers/runc/compare/v1.1.10...v1.1.12) --- updated-dependencies: - dependency-name: github.com/opencontainers/runc dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * add changelog entry --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-02 10:18:53 -05:00
Tim Gross	334c383eb6	template: run template tests on Windows where possible (#19856 ) We don't run the whole suite of unit tests on all platforms to keep CI times reasonable, so the only things we've been running on Windows are platform-specific. I'm working on some platform-specific `template` related work and having these tests run on Windows will reduce the risk of regressions. Our Windows CI box doesn't have Consul or Vault, so I've skipped those tests for the time being, and can follow up with that later. There's also a test with assertions looking for specific paths, and the results are different on Windows. I've skipped those for the moment as well and will follow up under a separate PR. Also swap `testify` for `shoenig/test`	2024-02-02 09:22:03 -05:00
Heat Hamilton	556d44cd7a	Merge pull request #19848 from hashicorp/heat/chore/update-website-dependencies website: update dependencies	2024-01-30 15:07:11 -05:00
Heat Hamilton	0b29a7d727	Update dependencies to match Next v14 in Dev Portal; updated husky workflow to v9; updated nvmrc to v18	2024-01-30 13:43:36 -05:00
Daniel Bennett	e059adef98	e2e: PreCleanup and other jobs3 helpers (#19844 )	2024-01-29 17:54:54 -06:00
Seth Hoenig	b50b81e488	users: refactor method for getting UID from username (#19840 ) This PR refactors a helper function for getting the UID associated with a given username to also return the GID and home directory. Also adds unit tests on the known values of root and nobody user on Ubuntu Linux.	2024-01-29 13:56:30 -06:00
Luiz Aoqui	41277f823f	license: fix some imports of BUSL-1.1 in MPL-2.0 (#19832 ) Some packages licensed under MPL-2.0 were incorrectly importing code from packages licensed under BUSL-1.1. Not all imports are fixed here as they will require additional work to untangle them. To help track progress this commit adds a Semgrep rule that detects incorrect BUSL-1.1 imports in MPL-2.0 packages.	2024-01-29 12:04:12 -05:00
James Rasell	10324566ae	driver/rawexec: populate OOM killed exit result. (#19829 )	2024-01-29 08:54:52 +00:00
James Rasell	8d6067e987	driver/qemu: populate OOM killed exit result. (#19830 )	2024-01-29 07:34:27 +00:00
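
The clamping described in commit b52a44717e (executor: limit the value of CPU shares) can be illustrated with a minimal Go sketch. This is not the code from that commit. The helper name `clampCPUShares` and the constant names are hypothetical, and the sketch assumes the cgroup v1 `cpu.shares` range of 2 to 262144 enforced by the Linux kernel (cgroup v2 `cpu.weight` uses a different range, 1 to 10000).

```go
package main

import "fmt"

// Assumed limits for the cgroup v1 cpu.shares interface in the Linux kernel.
// The names are illustrative and do not come from the Nomad source.
const (
	minCPUShares int64 = 2
	maxCPUShares int64 = 262144
)

// clampCPUShares is a hypothetical helper that bounds an unbounded task CPU
// value so it can be used safely as a cgroup CPU shares value.
func clampCPUShares(requested int64) int64 {
	if requested < minCPUShares {
		return minCPUShares
	}
	if requested > maxCPUShares {
		return maxCPUShares
	}
	return requested
}

func main() {
	// Values below, inside, and above the assumed kernel limits.
	for _, cpu := range []int64{0, 500, 1000000} {
		fmt.Printf("requested=%d clamped=%d\n", cpu, clampCPUShares(cpu))
	}
}
```

Clamping rather than rejecting keeps an oversized request schedulable while guaranteeing that the value written to the cgroup stays within the range the kernel accepts.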

25612 Commits