nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 00:45:43 +03:00

Author	SHA1	Message	Date
Luiz Aoqui	d456cc1e7f	Track plan rejection history and automatically mark clients as ineligible (#13421 ) Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to a concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at once, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Michael Schurter	f998a2b77b	core: merge reserved_ports into host_networks (#13651 ) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Tim Gross	f295396ef8	docs: rename Internals to Concepts (#13696 )	2022-07-11 16:55:33 -04:00
Tim Gross	b209fc47da	docs: move operator subcommands under their own trees (#13677 ) The sidebar navigation tree for the `operator` sub-sub commands is getting cluttered and we have a new set of commands coming to support secure variables keyring as well. Move these all under their own subtrees.	2022-07-11 14:00:24 -04:00
Seth Hoenig	64f35f9cf3	docs: move upgrade docs for max_client_timeout Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-07-07 16:46:26 -05:00
Seth Hoenig	cbcceb0625	docs: upgrade guide for client max_kill_timeout	2022-07-07 15:27:40 -05:00
Luiz Aoqui	52389ff726	cli: improve output of eval commands (#13581 ) Use the same output format when listing multiple evals in the `eval list` command and when `eval status <prefix>` matches more than one eval. Include the eval namespace in all output formats and always include the job ID in `eval status` since even `node-update` evals are related to a job. Add Node ID to the evals table output to help differentiate `node-update` evals. Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 13:13:34 -04:00
Ted Behling	295021caad	driver/docker: Don't pull InfraImage if it exists (#13265 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 17:44:06 +02:00
Seth Hoenig	142918ac9f	docs: fixup from cr comments	2022-07-07 08:37:10 -05:00
Seth Hoenig	39fd91fe2e	docs: add docs for simple load balancing nomad services This PR adds a section to template docs for simple load balancing with nomad services.	2022-07-06 17:34:30 -05:00
James Rasell	11cb4c6d82	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	24220d0a02	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
Michelle Noorali	b9e084a4b7	doc: explain permissions for Vault sys/capabilities-self	2022-07-06 10:01:30 -04:00
Yann Coleu	154bb23d23	docs: typo on command word (#13582 )	2022-07-05 16:24:25 -04:00
Derek Strickland	bbd11fd9b5	docs: update task leader to explain shutdown sequence. (#13498 ) * docs: update task leader to explain shutdown sequence.	2022-06-29 05:13:45 -04:00
James Rasell	c635ae0f89	docs: fixup HCL2 index collection function documentation. (#13511 )	2022-06-28 18:27:38 +02:00
Andrew	37e5accf09	Fix typo in Docker docs (#13497 )	2022-06-28 11:05:50 +02:00
Seth Hoenig	f1cafd0789	core: remove support for raft protocol version 2 This PR checks server config for raft_protocol, which must now be set to 3 or unset (0). When unset, version 3 is used as the default.	2022-06-23 14:37:50 +00:00
Michael Schurter	c52741ae1b	docs: clarify total_escaped is just an optimization (#13460 )	2022-06-22 11:39:56 -07:00
Elijah Voigt	009a4d9a85	Lob.com uses Nomad too! (#13295 ) Lob.com has been ramping up our use of Nomad for ~6 months. Now that we've started blogging about it we'd love to be on the _official_ list.	2022-06-21 09:10:08 -04:00
Nick Wales	37ee50010e	Merge pull request #13401 from nickwales/tls_typo Updates TLS documentation	2022-06-16 12:34:59 -05:00
Arthur Leclerc	7518f42d1c	docs: Fix typo (#13389 )	2022-06-16 13:24:18 -04:00
Nick Wales	a8dca34a3a	Updates TLS documentation	2022-06-16 12:15:40 -05:00
Luiz Aoqui	3737fb3c7d	docs: create volume spec page (#13353 ) In addition to jobs, there are other objects in Nomad that have a specific format and can be provided to commands and API endpoints. This commit creates a new menu section to hold the specification for volumes and update the command pages to point to the new centralized definition. Redirecting the previous entries is not possible with `redirect.js` because they are done server-side and URL fragments are not accessible to detect a match. So we provide hidden anchors with a link to the new page to guide users towards the new documentation. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-06-14 14:08:25 -04:00
Grant Griffiths	2986f1f18a	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Michael Schurter	34959b26df	docs: explain behavior of system gc command (#13342 )	2022-06-13 09:54:23 +02:00
Derek Strickland	dd71afb891	template: improve default language for max_stale and wait (#13334 ) * template: improve default language for max_stale and wait Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-06-10 14:34:25 -04:00
Daniel Rossbach	9bb9aab714	qemu driver: Add option to configure drive_interface (#11864 )	2022-06-10 10:03:51 -04:00
Raffaele Di Fazio	0b9fc17ae4	Update supplement.mdx with the right GitHub spelling (#13326 )	2022-06-10 11:46:19 +02:00
phreakocious	f8774369d2	Add `guest_agent` config option for QEMU driver (#12800 ) Add boolean 'guest_agent' config option for QEMU driver, which will create the socket file for the QEMU Guest Agent in the task dir when enabled.	2022-06-09 09:21:38 -04:00
Derek Strickland	e78a5908b9	docker: update images to reference hashicorpdev Docker organization (#12903 ) docker: update images to reference hashicorpdev dockerhub organization generate job_init.bindata_assetfs.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-06-08 15:06:00 -04:00
Derek Strickland	7899fd3fac	consul-template: Add fault tolerant defaults (#13041 ) consul-template: Add fault tolerant defaults Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-06-08 14:08:25 -04:00
Shantanu Gadgil	b1a84bb77e	`heartbeat_grace` is a `server` parameter (#13288 ) `heartbeat_grace` is a `server` parameter, not a `client` parameter.	2022-06-08 10:49:23 -04:00
Conor Evans	2a01807d20	add filebase64 function (#11791 ) Signed-off-by: Conor Evans <coevans@tcd.ie>	2022-06-06 11:58:17 -04:00
dgotlieb	99b9408c91	docs: update warning for gateway listener docs for non-tcp protos	2022-06-06 10:53:01 -04:00
Radek Simko	0246944d68	docs/job-spec: Fix formatting in network page (#13228 )	2022-06-06 10:14:12 -04:00
Radek Simko	cbde2ba94b	docs/docker: fix broken link to bridge mode (#13221 )	2022-06-06 09:59:36 -04:00
Radek Simko	ff87354665	docs: link to client reqs section for added clarity (#13215 )	2022-06-06 09:56:29 -04:00
Lance Haig	eafc93902b	Allow Operator Generated bootstrap token (#12520 )	2022-06-03 07:37:24 -04:00
Huan Wang	b6e07487c2	adding support for customized ingress tls (#13184 )	2022-06-02 18:43:58 -04:00
Shantanu Gadgil	f0bc4cedca	fingerprint kernel architecture name (#13182 )	2022-06-02 15:51:00 -04:00
Seth Hoenig	9d03cd4c70	Merge pull request #12951 from jorgemarey/f-srv-tagged-addresses Allow setting tagged addresses on services	2022-06-01 10:51:49 -05:00
Anthony	5b80907a5d	docs: added note about vault -period flag (#13185 )	2022-05-31 14:26:03 -07:00
Seth Hoenig	69bbaa44f9	docs: add docs and tests for tagged_addresses	2022-05-31 13:02:48 -05:00
Toyam Cox	a145ffc6dc	docs: make the example for 'load' work (#13102 )	2022-05-27 08:48:58 -04:00
Seth Hoenig	865b43c049	Merge pull request #13125 from hashicorp/b-connect-upstream-namespace connect: enable setting connect upstream destination namespace	2022-05-26 10:29:11 -05:00
Seth Hoenig	616988c6fb	connect: enable setting connect upstream destination namespace	2022-05-26 09:39:36 -05:00
Amier Chery	07043893c1	Merge pull request #13083 from josegonzalez/patch-1 Update service.check.task definition to match code	2022-05-26 10:38:49 -04:00
Michael Schurter	3968509886	artifact: fix numerous go-getter security issues Fix numerous go-getter security issues: - Add timeouts to http, git, and hg operations to prevent DoS - Add size limit to http to prevent resource exhaustion - Disable following symlinks in both artifacts and `job run` - Stop performing initial HEAD request to avoid file corruption on retries and DoS opportunities. Approach Since Nomad has no ability to differentiate a DoS-via-large-artifact vs a legitimate workload, all of the new limits are configurable at the client agent level. The max size of HTTP downloads is also exposed as a node attribute so that if some workloads have large artifacts they can specify a high limit in their jobspecs. In the future all of this plumbing could be extended to enable/disable specific getters or artifact downloading entirely on a per-node basis.	2022-05-24 16:29:39 -04:00
PinkLolicorn	b181919ce6	docs: `mount_flags` takes a slice of strings (#13087 ) The description of `mount_flags` provides incorrect example of the accepted value format. This fixes the issue by changing the example from a string `ro,noatime` to a slice of strings `["ro", "noatime"]`.	2022-05-20 09:16:17 -04:00

460 Commits
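
The mitigation described in commit d456cc1e7f (per-node rejection history in a time window, plus a rate limit of 5 nodes every 30 minutes with an initial burst of 10) can be sketched roughly as below. This is an illustrative Python sketch only, not Nomad's Go implementation: the names `TokenBucket` and `RejectionTracker`, the threshold and window values in the usage example, and the choice to treat "crosses the threshold" as "reaches or exceeds it" are all assumptions made for the example.

```python
from collections import defaultdict, deque


class TokenBucket:
    """Hypothetical token bucket: starts with `burst` tokens, refills at `rate` tokens/second."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill based on elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RejectionTracker:
    """Hypothetical tracker of plan rejections per node within a sliding time window."""

    def __init__(self, threshold, window, limiter):
        self.threshold = threshold  # rejections needed before a node is flagged (assumed >=)
        self.window = window        # window size in seconds
        self.limiter = limiter
        self.history = defaultdict(deque)

    def add(self, node_id, now):
        """Record a rejection; return True if the node should be marked ineligible."""
        times = self.history[node_id]
        times.append(now)
        # Drop rejections that have fallen out of the time window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) < self.threshold:
            return False
        # Rate limit how many nodes may be flagged.
        if not self.limiter.allow(now):
            return False
        del self.history[node_id]
        return True


if __name__ == "__main__":
    # 5 nodes every 30 minutes with a burst of 10, as stated in the commit message.
    limiter = TokenBucket(rate=5 / (30 * 60), burst=10)
    # Threshold and window are placeholder values chosen for the example.
    tracker = RejectionTracker(threshold=3, window=300.0, limiter=limiter)
    for t in (0.0, 10.0, 20.0):
        print(t, tracker.add("node-1", t))
```

Passing the current time explicitly keeps the sketch deterministic; a real implementation would read a clock and would need locking if used from several workers.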