nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 09:25:46 +03:00

Author	SHA1	Message	Date
Michael Schurter	796c05b9b8	client: register before restoring Registration and restoring allocs don't share state or depend on each other in any way (syncing allocs with servers is done outside of registration). Since restoring is synchronous, start the registration goroutine first. For nodes with lots of allocs to restore or close to their heartbeat deadline, this could be the difference between becoming "lost" or not.	2019-05-14 10:53:27 -07:00
Michael Schurter	6a2792ad90	client: do not restart dead tasks until server is contacted (try 2) Refactoring of 104067bc2b2002a4e45ae7b667a476b89addc162 Switch the MarkLive method for a chan that is closed by the client. Thanks to @notnoop for the idea! The old approach called a method on most existing ARs and TRs on every runAllocs call. The new approach does a once.Do call in runAllocs to accomplish the same thing with less work. Able to remove the gate abstraction that did much more than was needed.	2019-05-14 10:53:27 -07:00
Michael Schurter	e7042b674b	client: do not restart dead tasks until server is contacted Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.	2019-05-14 10:53:27 -07:00
Lang Martin	a732cd1f06	Merge pull request #5642 from hashicorp/b-network-fingerprinting-ipv4 network fingerprinting multiple IPs on the configured network device	2019-05-13 11:46:53 -04:00
Lang Martin	c7071a12e3	client improve a comment in updateNetworks	2019-05-10 11:25:04 -04:00
Mahmood Ali	5abbee5d39	Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base nomad exec part 1: plumbing and docker driver	2019-05-09 18:09:27 -04:00
Mahmood Ali	979a6a1778	implement client endpoint of nomad exec Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking the relevant task handler for execution.	2019-05-09 16:49:08 -04:00
Preetha	eb7a3bc616	Merge pull request #5654 from hashicorp/b-hearbeat-lockfix Remove unnecessary locking and serverlist syncing in heartbeats	2019-05-08 13:36:39 -05:00
Preetha Appan	12e1804733	code review feedback	2019-05-07 16:23:32 -05:00
Chris Baker	4b54e27841	stale allocation data leads to incorrect (and even negative) metrics (#5637) * client: was not using up-to-date client state in determining which allocs count towards allocated resources * Update client/client.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-05-07 15:54:36 -04:00
Preetha Appan	5f88d0f408	Remove unnecessary locking and serverlist syncing in heartbeats This removes an unnecessary shared lock between discovery and heartbeating which was causing heartbeats to be missed upon retries when a single server fails. Also made a drive by fix to call the periodic server shuffler goroutine.	2019-05-06 14:44:55 -05:00
Lang Martin	22568599e8	client fingerprinting can keep multi ips on a device	2019-05-02 18:11:28 -04:00
Lang Martin	33f550fb52	Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config client fingerprinter doesn't overwrite manual configuration	2019-04-26 12:55:34 -04:00
Danielle Lancashire	023d0dff31	allocs: Add nomad alloc signal command This command will be used to send a signal to either a single task within an allocation, or all of the tasks if <task-name> is omitted. If the sent signal terminates the allocation, it will be treated as if the allocation has crashed, rather than as if it was operator-terminated. Signal validation is currently handled by the driver itself and nomad does not attempt to restrict or validate them.	2019-04-25 12:43:32 +02:00
Chris Baker	7d8fa4c045	client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy	2019-04-22 18:31:45 +00:00
Lang Martin	583ae3722c	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit `c45652ab8c`.	2019-04-19 15:23:48 -04:00
Mahmood Ali	8041b0cbe2	clarify cryptic log line	2019-04-19 09:31:43 -04:00
Mahmood Ali	9dcebcd8a3	client: avoid registering node twice right away I noticed that `watchNodeUpdates()` almost immediately after `registerAndHeartbeat()` calls `retryRegisterNode()` (well, after 5 seconds). This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register the node for new node events, not for initial registration.	2019-04-19 09:12:50 -04:00
Mahmood Ali	7a68d76160	client: wait for batched driver updates Here we retain 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for a node occurs at node registration, and the race might mean that a system job may not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but raising it to 1 minute as it's closer to indefinitely blocked than 1 second. We need to keep the value high enough to capture as many drivers/devices as possible, but low enough that it doesn't risk blocking too long due to a misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579	2019-04-19 09:00:24 -04:00
Lang Martin	c45652ab8c	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit `596f16fb5f`. Revert "client tests assert the independent handling of interface and speed" This reverts commit `7857ac5993`. Revert "structs missed applying a style change from the review" This reverts commit `658916e327`. Revert "client, structs comments" This reverts commit `be2838d6ba`. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit `fc309cb430`. Revert "client_test cleanup comments from review" This reverts commit `bc0bf4efb9`. Revert "client Networks Equals is set equality" This reverts commit `f8d432345b`. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit `f4746411ca`. Revert "struct Equals checks for identity before value checking" This reverts commit `0767a4665e`. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit `e89dbb2ab1`. Revert "refactor error in client fingerprint to include the offending data" This reverts commit `a7fed726c6`. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit `84bd433c7e`. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit `6897825240`. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit `49e2e6c77b`. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit `4ede9226bb`. Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit `49fbaace52`. Revert "add structs.Resources Equals" This reverts commit `8528a2a2a6`. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit `8ee02ddd23`.	2019-04-11 10:29:40 -04:00
Lang Martin	be2838d6ba	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	fc309cb430	client fingerprint updateNetworks preserves the network configuration	2019-04-11 09:56:22 -04:00
Lang Martin	84bd433c7e	add client updateNodeResources to merge but preserve manual config	2019-04-11 09:56:22 -04:00
Danielle Lancashire	419d70c5f9	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Michael Schurter	645c8c41ea	client: log when allocs have been processed Will hopefully help us catch deadlocks/livelocks/slowdowns in the add/remove allocs pipeline which should be fast.	2019-02-04 11:07:57 -08:00
Preetha Appan	032ec425c2	Only set deployment health if not already set	2019-01-12 10:38:20 -06:00
Michael Schurter	a20ae598c7	Apply suggestions from code review Co-Authored-By: preetapan <preetha@hashicorp.com>	2019-01-12 10:38:20 -06:00
Preetha Appan	72dead7448	Refactor statedb factory config to set it directly in client config	2019-01-12 10:38:20 -06:00
Preetha Appan	5d7472fe82	Remove invalid allocs	2019-01-12 10:38:20 -06:00
Preetha Appan	80919bf713	Modified destroy failure handling to rely on allocrunner's destroy method Added a unit test with custom statedb implementation that errors, to use to verify destroy errors	2019-01-12 10:37:12 -06:00
Preetha Appan	29894883a2	Add back code to mark alloc as failed when restore fails Also modify restore such that any handled errors don't propagate back to the client	2019-01-12 10:37:12 -06:00
Preetha Appan	80d92481ca	Revert code that made an alloc update when restore fails Restore currently shuts down the client so the alloc update can't always make it to the server	2019-01-12 10:37:12 -06:00
Preetha Appan	cf9c398296	Handle client initialization errors when adding allocs or restoring allocs We mark the alloc as failed and track failed allocs so that we don't send updates after the first time	2019-01-12 10:37:12 -06:00
Danielle Tomlinson	dccf2a0de9	client: Cleanup allocrunner access	2019-01-11 18:39:18 +01:00
Alex Dadgar	437f03d877	recover	2019-01-07 14:49:40 -08:00
Nick Ethier	145827d8b7	fix tests that fail as a result of async client startup	2018-12-20 00:53:44 -05:00
Michael Schurter	784706a1e5	client/state: support upgrading from 0.8->0.9 Also persist and load DeploymentStatus to avoid rechecking health after client restarts.	2018-12-19 10:39:27 -08:00
Nick Ethier	12528cadda	drivermanager: attempt to reattach and shutdown driver plugin if blocked by allow/block lists	2018-12-18 23:01:57 -05:00
Nick Ethier	6951ca487d	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Nick Ethier	331793e283	client: batch initial fingerprinting in plugin managers drivermanager: fix pr comments/feedback	2018-12-18 22:56:19 -05:00
Nick Ethier	2f010a2f25	client/drivermanager: fixup issues from rebase and address PR comments	2018-12-18 22:55:38 -05:00
Nick Ethier	39ca1b00dd	client/drivermanager: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incoming fingerprint and task events from each driver. The manager exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint manager and has been removed.	2018-12-18 22:55:18 -05:00
Danielle Tomlinson	934d2e6bf6	client: Async API for shutdown/destroy allocrunners	2018-12-18 23:38:33 +01:00
Danielle Tomlinson	bba8b4ef4f	Merge pull request #4989 from hashicorp/dani/b-client-update-race-condition client: Give a copy of clientconfig to allocrunner	2018-12-17 10:49:46 +01:00
Danielle Tomlinson	98dc399d5c	Merge pull request #4990 from hashicorp/dani/b-alloc-lock client: updateAlloc release lock after read	2018-12-13 12:43:59 +01:00
Danielle Tomlinson	30bed980f1	client: Give a copy of clientconfig to allocrunner Currently, there is a race condition between creating a taskrunner, and updating node attributes via fingerprinting. This is because the taskenv builder will try to iterate over the clientconfig.Node.Attributes map, which can be concurrently updated by the fingerprinting process, thus causing a panic. This fixes that by providing a copy of the clientconfig to the allocrunner inside the Read lock during config creation.	2018-12-13 12:42:15 +01:00
Danielle Tomlinson	875dd737cb	client: updateAlloc release lock after read The allocLock is used to synchronize access to the alloc runner map, not to ensure internal consistency of the alloc runners themselves. This updates the updateAlloc process to avoid hanging on to an exclusive lock of the map while applying changes to allocrunners themselves, as they should be internally consistent. This fixes a bug where any client allocation api will block during the shutdown or updating of an allocrunner and its child taskrunners.	2018-12-12 16:30:01 +01:00
Mahmood Ali	926428fe0f	Merge pull request #4984 from hashicorp/b-client-update-driver client: update driver info on new driver fingerprint	2018-12-11 18:01:03 -05:00
Alex Dadgar	f42c060d35	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Mahmood Ali	cae36e49a6	client: update driver info on new fingerprint Fixes a bug where a driver's health and attributes are never updated from their initial status. If a driver started unhealthy, it may never go into a healthy status.	2018-12-11 14:25:10 -05:00


582 Commits
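
Commit 875dd737cb in the list above describes holding allocLock only while reading the alloc runner map, then releasing it before applying changes to the runner itself. The sketch below illustrates that locking pattern under simplifying assumptions. The client, runner, and updateAlloc definitions here are hypothetical stand-ins, not the Nomad implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for an alloc runner. It guards its
// own state, so callers do not need the map lock to update it.
type runner struct {
	mu      sync.Mutex
	version int
}

func (r *runner) Update(v int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.version = v
}

func (r *runner) Version() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.version
}

// client holds a map of runners. allocLock protects only the map.
type client struct {
	allocLock sync.RWMutex
	allocs    map[string]*runner
}

func newClient() *client {
	return &client{allocs: make(map[string]*runner)}
}

// updateAlloc looks up the runner under a read lock, releases the lock,
// and only then applies the (possibly slow) update.
func (c *client) updateAlloc(id string, v int) bool {
	c.allocLock.RLock()
	r, ok := c.allocs[id]
	c.allocLock.RUnlock()
	if !ok {
		return false
	}
	r.Update(v) // map lock is no longer held here
	return true
}

func main() {
	c := newClient()
	c.allocs["a1"] = &runner{}
	fmt.Println(c.updateAlloc("a1", 2), c.allocs["a1"].Version())
	fmt.Println(c.updateAlloc("missing", 1))
}
```

Because the map lock is released before Update runs, other readers of the map are not blocked while a runner is being updated or shut down.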