nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 10:25:42 +03:00

Author	SHA1	Message	Date
Alex Munda	b5d21a9191	Always allow idempotency key meta. Tests for idempotent dispatch	2021-06-29 10:30:04 -05:00
James Rasell	d1a930213f	Merge pull request #10822 from hashicorp/b-gh-10820 cli: fixed system commands so they correctly use passed flags.	2021-06-29 10:11:00 +02:00
James Rasell	d5bbb9b235	changelog: add entry for #10822	2021-06-29 10:10:32 +02:00
Tim Gross	7edf0cc108	docs: unset port `to` field maps to dynamic port (#10828 )	2021-06-28 15:55:24 -04:00
Tim Gross	166a056531	docs: add missing backwards compat warning about port_map (#10827 ) The `docker` driver's `port_map` field was deprecated in 0.12 and this is documented in the task driver's docs, but we never explicitly flagged it for backwards compatibility.	2021-06-28 15:49:41 -04:00
Kendall Strautman	1b68a1d067	chore: upgrade react-head and deps (#10811 )	2021-06-28 08:55:38 -07:00
Seth Hoenig	37729bb027	consul/connect: automatically set consul tls sni name for connect native tasks This PR makes it so that Nomad will automatically set the CONSUL_TLS_SERVER_NAME environment variable for Connect native tasks running in bridge networking mode where Consul has TLS enabled. Because of the use of a unix domain socket for communicating with Consul when in bridge networking mode, the server name is a file name instead of something compatible with the mTLS certificate Consul will authenticate against. "localhost" is by default a compatible name, so Nomad will set the environment variable to that. Fixes #10804	2021-06-28 08:36:53 -05:00
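A minimal Go sketch of the rule this commit describes. It assumes a hypothetical helper `connectNativeEnv` that receives the task's network mode and whether Consul TLS is enabled. It models only the decision, not how Nomad actually injects the variable into the task environment.

```go
package main

import "fmt"

// connectNativeEnv (hypothetical helper) models the rule from the commit:
// in bridge networking mode Consul is reached over a unix domain socket, so
// when Consul TLS is enabled the TLS server name is pinned to "localhost",
// a name the Consul certificate accepts by default.
func connectNativeEnv(networkMode string, consulTLS bool) map[string]string {
	env := map[string]string{}
	if networkMode == "bridge" && consulTLS {
		env["CONSUL_TLS_SERVER_NAME"] = "localhost"
	}
	return env
}

func main() {
	fmt.Println(connectNativeEnv("bridge", true)) // map[CONSUL_TLS_SERVER_NAME:localhost]
	fmt.Println(connectNativeEnv("host", true))   // map[]
}
```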
Boris Shomodjvarac	2cd8f3ae93	docs: update csi_plugin example (#10821 ) The current EFS driver does not support being told whether it is a `node` or a `controller`; if you tell it anyway, it prints no error and simply ignores all the other parameters :( As a result, the endpoint ends up being `/tmp/csi.sock` instead of `/csi/csi.sock`, which in turn breaks the Nomad/CSI integration. I also changed the `latest` image tag to v1.3.2 so that anybody copy-pasting this example can be sure it will work. Tested on Nomad 1.1.2	2021-06-28 08:28:03 -04:00
James Rasell	2e5f30bd8d	cli: fixed system commands so they correctly use passed flags.	2021-06-28 10:57:50 +02:00
Tim Gross	9b35750489	csi: fix CLI panic when formatting volume status with -verbose flag (#10818 ) When the `-verbose` flag is passed to the `nomad volume status` command, we hit a code path where the rows of text to be formatted were not initialized correctly, resulting in a panic in the CLI.	2021-06-25 16:17:37 -04:00
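The commit message does not show the faulty code. As a generic Go illustration (not Nomad's code) of how "rows not initialized correctly" commonly turns into a panic: a slice created with capacity but zero length has no valid indexes, so assigning by index panics. Both function names below are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// formatRows builds a header row plus one row per item. Because the slice is
// created with length len(items)+1, every index assigned below already exists.
func formatRows(header string, items []string) []string {
	rows := make([]string, len(items)+1)
	rows[0] = header
	for i, item := range items {
		rows[i+1] = item
	}
	return rows
}

// panicsOnIndexAssign shows the failure mode: a slice made with length 0 (only
// capacity n) has no index 0, so assigning rows[0] panics with "index out of
// range". The panic is recovered here purely for demonstration.
func panicsOnIndexAssign(n int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	rows := make([]string, 0, n)
	rows[0] = "ID|Name"
	return false
}

func main() {
	fmt.Println(strings.Join(formatRows("ID|Name", []string{"vol1|ebs", "vol2|efs"}), "\n"))
	fmt.Println(panicsOnIndexAssign(3))
}
```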
Alex Munda	c4be87ad1c	Enforce idempotency of dispatched jobs using special meta key	2021-06-23 17:10:31 -05:00
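A minimal Go sketch of idempotent dispatch keyed on a meta value, as described in the two idempotency commits. The meta key name `example_idempotency_key`, the `dispatcher` type, and the in-memory child list are all hypothetical; this is not Nomad's key name or its dispatch implementation.

```go
package main

import "fmt"

// idempotencyMetaKey is a hypothetical meta key name used only for this sketch.
const idempotencyMetaKey = "example_idempotency_key"

// childJob is a simplified stand-in for a dispatched child job.
type childJob struct {
	ID   string
	Meta map[string]string
}

// dispatcher (hypothetical) tracks dispatched children of one parameterized parent job.
type dispatcher struct {
	parentID string
	children []childJob
	seq      int
}

// Dispatch creates a new child job unless an existing child already carries
// the same idempotency key, in which case that child is returned and the
// boolean reports that nothing new was created.
func (d *dispatcher) Dispatch(meta map[string]string) (childJob, bool) {
	if key, ok := meta[idempotencyMetaKey]; ok && key != "" {
		for _, c := range d.children {
			if c.Meta[idempotencyMetaKey] == key {
				return c, false // duplicate request: reuse the existing child
			}
		}
	}
	d.seq++
	child := childJob{
		ID:   fmt.Sprintf("%s/dispatch-%d", d.parentID, d.seq),
		Meta: meta,
	}
	d.children = append(d.children, child)
	return child, true
}

func main() {
	d := &dispatcher{parentID: "batch-report"}
	first, created1 := d.Dispatch(map[string]string{idempotencyMetaKey: "abc"})
	second, created2 := d.Dispatch(map[string]string{idempotencyMetaKey: "abc"})
	fmt.Println(first.ID, created1)  // batch-report/dispatch-1 true
	fmt.Println(second.ID, created2) // batch-report/dispatch-1 false
}
```

Requests without the key always create a new child, so existing callers keep their old behavior.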
Zachary Shilton	09bd33640b	website: upgrade code-block (#10792 ) * website: upgrade code-block * website: bump to latest pre-releases * website: bump to stable releases	2021-06-22 11:55:58 -04:00
Mahmood Ali	6d2b704dda	Merge pull request #10801 from hashicorp/merge-release-1.1.2 Prepare for 1.1.3 development	2021-06-22 10:46:41 -04:00
Mahmood Ali	1bf9e7c266	prepare for 1.1.3 development	2021-06-22 10:41:44 -04:00
Mahmood Ali	41319b022f	update website to 1.1.2 (#10800 )	2021-06-22 10:40:46 -04:00
Tim Gross	3fdbbeefe0	docs: improve CSI deployment recommendations (#10798 ) * add some more context to the recommendations * add recommendations around per-AZ `plugin_id`	2021-06-22 10:23:09 -04:00
Nomad Release Bot	4615b9602b	remove generated files	2021-06-22 14:13:49 +00:00
Nomad Release Bot	e978d371fa	Release v1.1.2	2021-06-22 14:12:36 +00:00
Nomad Release bot	60638a086e	Generate files for 1.1.2 release	2021-06-22 00:45:27 +00:00
Mahmood Ali	90987f272e	prepare changelog for 1.1.2	2021-06-21 20:36:39 -04:00
Dave May	b430bafe90	Add remaining pprof profiles to nomad operator debug (#10748 ) * Add remaining pprof profiles to debug dump * Refactor pprof profile capture * Add WaitForFilesUntil and WaitForResultUntil utility functions * Add CHANGELOG entry	2021-06-21 14:22:49 -04:00
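For context on what "all pprof profiles" covers in a Go program, here is a small in-process sketch using only the standard library's `runtime/pprof`. It is an illustration of enumerating and capturing every registered profile, not the mechanism `nomad operator debug` uses; `captureProfiles` is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// captureProfiles (hypothetical helper) writes every profile registered with
// the Go runtime (goroutine, heap, allocs, threadcreate, block, mutex, plus
// any custom ones) into an in-memory buffer keyed by profile name. debug=0
// selects the compressed protobuf format that `go tool pprof` reads.
func captureProfiles() (map[string][]byte, error) {
	out := make(map[string][]byte)
	for _, p := range pprof.Profiles() {
		var buf bytes.Buffer
		if err := p.WriteTo(&buf, 0); err != nil {
			return nil, fmt.Errorf("profile %s: %w", p.Name(), err)
		}
		out[p.Name()] = buf.Bytes()
	}
	return out, nil
}

func main() {
	profiles, err := captureProfiles()
	if err != nil {
		panic(err)
	}
	for name, data := range profiles {
		fmt.Printf("%s: %d bytes\n", name, len(data))
	}
}
```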
Seth Hoenig	fa1be62204	Merge pull request #10795 from hashicorp/docs-update-cl docs: update cl with missing entries	2021-06-21 09:24:59 -05:00
Seth Hoenig	15160ded65	docs: update cl with missing entries	2021-06-21 09:22:48 -05:00
Huan Wang	df74bc0cbb	update-gopsutil (#10790 )	2021-06-21 10:19:39 -04:00
Seth Hoenig	d61bf59105	Merge pull request #10789 from hashicorp/b-cns-mups consul/connect: Validate uniqueness of Connect upstreams within task group	2021-06-21 09:01:02 -05:00
Seth Hoenig	adcbcc129b	consul/connect: Validate uniqueness of Connect upstreams within task group This PR adds validation during job submission that Connect proxy upstreams within a task group are using different listener addresses. Otherwise, a duplicate envoy listener will be created and not be able to bind. Closes #7833	2021-06-18 16:50:53 -05:00
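A minimal Go sketch of the validation described above: reject a task group when two upstreams would make Envoy bind the same local listener. The `upstream` struct is a simplified, hypothetical model loosely based on the jobspec's local bind settings, not Nomad's `structs` types, and the sketch compares address and port exactly as given (no defaulting).

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// upstream is a simplified, hypothetical stand-in for a Connect upstream;
// only the fields needed to identify its local listener are modeled.
type upstream struct {
	DestinationName  string
	LocalBindAddress string
	LocalBindPort    int
}

// validateUniqueUpstreams returns an error if two upstreams in the same task
// group would ask Envoy to bind the same local address and port.
func validateUniqueUpstreams(ups []upstream) error {
	seen := make(map[string]string) // listener address -> destination name
	for _, u := range ups {
		addr := net.JoinHostPort(u.LocalBindAddress, strconv.Itoa(u.LocalBindPort))
		if prev, ok := seen[addr]; ok {
			return fmt.Errorf("upstreams %q and %q both use listener %s", prev, u.DestinationName, addr)
		}
		seen[addr] = u.DestinationName
	}
	return nil
}

func main() {
	err := validateUniqueUpstreams([]upstream{
		{"api", "127.0.0.1", 8080},
		{"db", "127.0.0.1", 8080},
	})
	fmt.Println(err)
}
```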
Russell Rollins	c56251b9dd	Adds error handling for client error in getRandomJobAlloc. (#10787 )	2021-06-18 16:26:43 -04:00
Seth Hoenig	6dcada4346	Merge pull request #10784 from hashicorp/b-dlskf e2e: fix a couple recent e2e bugs	2021-06-18 13:17:20 -05:00
Seth Hoenig	15d39f0dee	e2e: use -detach mode when registering jobs with cli This PR changes the e2e helper thingy to set the -detach option when registering a job with the CLI instead of the API. This is necessary for jobs that never become healthy: the deployment never finishes for failing jobs, so the command never returns, causing the test to time out after 10 minutes.	2021-06-18 12:18:40 -05:00
Seth Hoenig	57fdb81433	consul: set task name only for group service checks This PR fixes a bug introduced in a refactoring https://github.com/hashicorp/nomad/pull/10764/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ec where task-level service checks would inherit the task name field when they shouldn't. Fixes #10781	2021-06-18 12:16:27 -05:00
Tim Gross	2520d83e85	tests: allocrunner CNI tests are Linux-only (#10783 ) The `client/allocrunner` tests fail to compile on macOS because the CNI test file depends on the CNI network configurator, which is in a Linux-only file.	2021-06-18 11:34:31 -04:00
Tim Gross	77f6ecbbbf	deps: bump go-getter to 1.5.4 (#10778 )	2021-06-17 16:30:00 -04:00
Seth Hoenig	2d8fc6b344	Merge pull request #10776 from hashicorp/b-cns-sysjob-ups consul/connect: in-place update service definition when connect upstreams are modified	2021-06-17 10:13:56 -05:00
Tim Gross	ad3070a1c2	docs: host_network does support Docker task port mapping (#10774 )	2021-06-17 09:11:10 -04:00
Tim Gross	b0922e90a7	changelog entry for #10756	2021-06-16 22:02:10 -04:00
Seth Hoenig	7ba60b4e33	consul/connect: in-place update service definition when connect upstreams are modified This PR fixes a bug where modifying the upstreams of a Connect sidecar proxy would not result in Consul applying the changes, unless an additional change to the job triggered a task replacement (thus replacing the service definition). The fix is to check whether upstreams have been modified between Nomad's view of the sidecar service definition and the service definition for the sidecar that is actually registered in Consul. Fixes #8754	2021-06-16 16:48:26 -05:00
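A minimal Go sketch of the comparison this fix relies on: detect that the upstreams Nomad wants differ from those registered in Consul, so the sidecar service can be updated in place. The `upstreamSpec` type and `upstreamsDiffer` function are hypothetical, and treating upstream order as insignificant is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// upstreamSpec is a hypothetical, simplified view of one sidecar upstream.
type upstreamSpec struct {
	DestinationName string
	LocalBindPort   int
}

// upstreamsDiffer reports whether the upstreams Nomad wants differ from the
// upstreams currently registered in Consul. Order is ignored (an assumption
// of this sketch). When they differ, the sidecar service definition should be
// re-registered even if nothing else about the task changed.
func upstreamsDiffer(wanted, registered []upstreamSpec) bool {
	if len(wanted) != len(registered) {
		return true
	}
	key := func(u upstreamSpec) string {
		return fmt.Sprintf("%s:%d", u.DestinationName, u.LocalBindPort)
	}
	a := make([]string, len(wanted))
	b := make([]string, len(registered))
	for i := range wanted {
		a[i] = key(wanted[i])
		b[i] = key(registered[i])
	}
	sort.Strings(a)
	sort.Strings(b)
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	registered := []upstreamSpec{{"api", 8080}}
	wanted := []upstreamSpec{{"api", 8080}, {"db", 5432}}
	fmt.Println(upstreamsDiffer(wanted, registered))
}
```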
Tim Gross	2a640f0b2d	docker: generate /etc/hosts file for bridge network mode (#10766 ) When `network.mode = "bridge"`, we create a pause container in Docker with no networking so that we have a process to hold the network namespace we create in Nomad. The default `/etc/hosts` file of that pause container is then used for all the Docker tasks that share that network namespace. Some applications rely on this file being populated. This changeset generates a `/etc/hosts` file and bind-mounts it to the container when Nomad owns the network, so that the container's hostname has an IP in the file as expected. The hosts file will include the entries added by the Docker driver's `extra_hosts` field. In this changeset, only the Docker task driver will take advantage of this option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts` file and this can't be changed without breaking backwards compatibility. But the fields are available in the task driver protobuf for community task drivers to use if they'd like.	2021-06-16 14:55:22 -04:00
dependabot[bot]	3b5bca63b6	build(deps): bump postcss from 7.0.35 to 7.0.36 in /website (#10772 ) Bumps [postcss](https://github.com/postcss/postcss) from 7.0.35 to 7.0.36. - [Release notes](https://github.com/postcss/postcss/releases) - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md) - [Commits](https://github.com/postcss/postcss/compare/7.0.35...7.0.36) --- updated-dependencies: - dependency-name: postcss dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-06-16 12:18:43 -04:00
dependabot[bot]	0983b073e3	build(deps): bump ws from 7.3.1 to 7.4.6 in /scripts/screenshots/src (#10671 ) Bumps [ws](https://github.com/websockets/ws) from 7.3.1 to 7.4.6. - [Release notes](https://github.com/websockets/ws/releases) - [Commits](https://github.com/websockets/ws/compare/7.3.1...7.4.6) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-06-16 11:09:34 -04:00
Seth Hoenig	e16a35167b	Merge pull request #10765 from hashicorp/b-java-fp-version client/fingerprint/java: improve java version string regex matching	2021-06-15 17:14:13 -05:00
Seth Hoenig	674183c35d	client/fingerprint/java: improve java version string regex matching This PR improves the regular expression used for matching the java version string, which varies a lot depending on the java vendor and version. These are the example strings we now test for: `java version "1.7.0_80"`, `openjdk version "11.0.1" 2018-10-16`, `openjdk version "11.0.1" 2018-10-16`, `java version "1.6.0_36"`, `openjdk version "1.8.0_192"`, and `openjdk 11.0.11 2021-04-20 LTS`. The last one is a new test added on behalf of #6081, which is still broken on today's CentOS 7 default JDK package: `openjdk 11.0.11 2021-04-20 LTS OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS) OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)` `==> Evaluation "21c6caf7" finished with status "complete" but failed to place all allocations: Task Group "example" (failed to place 1 allocation): * Constraint "${driver.java.version} >= 11.0.0": 1 nodes excluded by filter Evaluation "2b737d48" waiting for additional capacity to place remainder` Fixes #6081	2021-06-15 14:15:01 -05:00
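An illustrative Go pattern (not the regex Nomad ships) that extracts a version from every example string listed above, including the unquoted `openjdk 11.0.11 2021-04-20 LTS` form from #6081. `javaVersionRe` and `parseJavaVersion` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// javaVersionRe is an illustrative pattern, not the one Nomad ships. It
// accepts `java version "1.8.0_192"` style output as well as the unquoted
// `openjdk 11.0.11 2021-04-20 LTS` style, capturing the version in group 1.
var javaVersionRe = regexp.MustCompile(`(?:java|openjdk)(?: version)? "?([0-9][0-9._]*)"?`)

// parseJavaVersion extracts the version from the first matching part of the
// version output (the leftmost match wins).
func parseJavaVersion(output string) (string, bool) {
	m := javaVersionRe.FindStringSubmatch(output)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{
		`java version "1.7.0_80"`,
		`openjdk version "11.0.1" 2018-10-16`,
		`openjdk 11.0.11 2021-04-20 LTS`,
	} {
		v, _ := parseJavaVersion(s)
		fmt.Println(v)
	}
}
```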
Seth Hoenig	52bf197790	Merge pull request #10764 from hashicorp/b-passfail-lost consul: make failures_before_critical and success_before_passing work with group services	2021-06-15 12:41:04 -05:00
Seth Hoenig	0ef0b2ef2b	docs: add bugfix note to 1.0.8	2021-06-15 12:40:44 -05:00
Seth Hoenig	b4a631c1c5	consul: make failures_before_critical and success_before_passing work with group services This PR fixes some job submission plumbing to make sure the Consul check parameters `failures_before_critical` and `success_before_passing` work with group-level services. They already work with task-level services.	2021-06-15 11:20:40 -05:00
Seth Hoenig	ab9b589b33	Merge pull request #10762 from hashicorp/docs-update-cl-2 docs: update changelog	2021-06-15 09:25:51 -05:00
Seth Hoenig	d7530f04ae	docs: update changelog	2021-06-15 09:17:06 -05:00
James Rasell	c3b15b8733	Merge pull request #10758 from hashicorp/b-fix-test-datarace-plugins plugins: fix test data race.	2021-06-15 14:33:53 +02:00
James Rasell	ff4cd338d9	plugins: fix test data race.	2021-06-15 09:31:08 +02:00
Isabel Suchanek	ca010f9f87	cli: check deployment exists before monitoring (#10757 ) System and batch jobs don't create deployments, which means Nomad tries to monitor a non-existent deployment when it runs such a job, and outputs an error message. This adds a check to make sure a deployment exists before monitoring. Also fixes some formatting. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-06-14 16:42:38 -07:00
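A minimal Go sketch of the guard this commit adds, assuming (hypothetically) that "no deployment" shows up as an empty deployment ID on the evaluation result. The `evalResult` type and `nextStep` function are illustrative only, not the CLI's real code.

```go
package main

import "fmt"

// evalResult is a hypothetical summary of what a job registration returns.
type evalResult struct {
	EvalID       string
	DeploymentID string // assumed empty when the scheduler created no deployment
}

// nextStep decides what the CLI should do once the evaluation completes.
// System and batch jobs don't create deployments, so an empty DeploymentID
// means there is nothing to hand to the deployment monitor.
func nextStep(r evalResult) string {
	if r.DeploymentID == "" {
		return "done"
	}
	return "monitor deployment " + r.DeploymentID
}

func main() {
	fmt.Println(nextStep(evalResult{EvalID: "e1"}))                      // done
	fmt.Println(nextStep(evalResult{EvalID: "e2", DeploymentID: "d-42"})) // monitor deployment d-42
}
```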
Mahmood Ali	8052ae1d11	deployment watcher: Reuse allocsCh if allocIndex remains the same (#10756 ) Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. `deploymentWatcher.getAllocsCh` creates a new goroutine that makes a blocking query to fetch updates of deployment allocs. ## Background When operators submit a new or updated service job, Nomad creates a new deployment by default. The deployment object controls how fast allocations are placed through [`max_parallel`](https://www.nomadproject.io/docs/job-specification/update#max_parallel) and the health check configuration. The `scheduler` and `deploymentwatcher` packages collaborate to implement deployment logic: the scheduler only places the canaries and `max_parallel` allocations for a new deployment; the `deploymentwatcher` monitors alloc progress and enqueues a new evaluation whenever the scheduler should reprocess a job and place the next `max_parallel` round of allocations. The `deploymentwatcher` package makes blocking queries against the state store to fetch all deployments and the relevant allocs for each running deployment. If `deploymentwatcher` fails or is hindered from fetching the state, the deployments fail to make progress. `deploymentwatcher` logic only runs on the leader. ## Why unnecessary deployment watchers can halt cluster progress Previously, `getAllocsCh` was called on every iteration of the for loop in the `deploymentWatcher.watch()` function. However, the for loop may iterate many times before the allocs get updated. In fact, whenever a new deployment is created/updated/deleted, all `deploymentWatcher`s get notified through `w.deploymentUpdateCh`. The `getAllocsCh` goroutines and blocking queries spike significantly and grow quadratically with respect to the number of running deployments. The growth leads to two adverse outcomes: 1. it spikes CPU/memory usage, potentially leading to OOM or very slow processing; 2. it activates the [query rate limiter](`abaa9c5c5b/nomad/deploymentwatcher/deployment_watcher.go (L896-L898)`), so later the watcher fails to get updates and consequently fails to make progress towards placing new allocations for the deployment! So the cluster fails to catch up and fails to make progress in almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and frees up goroutines and blocking queries; the new leader recreates the watchers without the quadratic growth and remains under the rate limiter. Well, until a spike of deployments is created, triggering the condition again. ### Relevant Code References Path for deployment monitoring: * [`Watcher.watchDeployments`](`abaa9c5c5b/nomad/deploymentwatcher/deployments_watcher.go (L164-L192)`) loops waiting for deployment updates. * On every deployment update, [`w.getDeploys`](`abaa9c5c5b/nomad/deploymentwatcher/deployments_watcher.go (L194-L229)`) returns all deployments in the system. * `watchDeployments` calls `w.add(d)` on every active deployment, * which, in turn, [updates the existing watcher if one is found](`abaa9c5c5b/nomad/deploymentwatcher/deployments_watcher.go (L251-L255)`). * The deployment watcher [updates its local deployment field and triggers the `deploymentUpdateCh` channel](`abaa9c5c5b/nomad/deploymentwatcher/deployment_watcher.go (L136-L147)`). * The [deployment watcher `deploymentUpdateCh` selector is activated](`abaa9c5c5b/nomad/deploymentwatcher/deployment_watcher.go (L455-L489)`). Most of the time the selector clause is a no-op, because the flow was triggered by an update to another deployment. * The `watch` for loop iterates again, and in the previous code we created yet another goroutine and blocking call that risked being rate limited. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-06-14 16:01:01 -04:00
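A simplified Go model of the fix: start a new blocking query (modeled as a goroutine feeding a channel) only when the alloc index being waited on has changed, and otherwise reuse the existing channel. The names echo the commit title, but the `watcher` type, `startQuery`, and the counter are hypothetical; this is not Nomad's `deploymentWatcher`.

```go
package main

import "fmt"

// allocUpdate is a hypothetical result of a blocking alloc query.
type allocUpdate struct {
	Index uint64
}

// watcher is a simplified, hypothetical model of a deployment watcher that
// caches its pending alloc query together with the index it was started for.
type watcher struct {
	allocsCh       <-chan allocUpdate
	allocsIndex    uint64
	queriesStarted int
}

// startQuery stands in for a goroutine making a blocking state-store query
// for allocs newer than index; here it just reports index+1 immediately.
// The buffered channel keeps the goroutine from blocking if nobody receives.
func (w *watcher) startQuery(index uint64) <-chan allocUpdate {
	w.queriesStarted++
	ch := make(chan allocUpdate, 1)
	go func() { ch <- allocUpdate{Index: index + 1} }()
	return ch
}

// getAllocsCh reuses the pending channel when the index is unchanged, so
// repeated loop iterations (e.g. from other deployments' updates) do not
// spawn new goroutines and blocking queries.
func (w *watcher) getAllocsCh(index uint64) <-chan allocUpdate {
	if w.allocsCh != nil && w.allocsIndex == index {
		return w.allocsCh
	}
	w.allocsCh = w.startQuery(index)
	w.allocsIndex = index
	return w.allocsCh
}

func main() {
	w := &watcher{}
	index := uint64(10)
	// Five deployment-update notifications arrive while allocs are unchanged.
	for i := 0; i < 5; i++ {
		_ = w.getAllocsCh(index)
	}
	fmt.Println("queries after 5 notifications:", w.queriesStarted) // 1

	// An alloc update arrives; the next wait uses the new index.
	update := <-w.getAllocsCh(index)
	index = update.Index
	_ = w.getAllocsCh(index)
	fmt.Println("queries after alloc update:", w.queriesStarted) // 2
}
```

Keying the cache on the index is what removes the quadratic growth: the number of queries tracks alloc changes, not the number of deployment notifications.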


21557 Commits