Commit Graph

655 Commits

Author SHA1 Message Date
Lang Martin
5b010fab10 csi: use node MaxVolumes during scheduling (#7565)
* nomad/state/state_store: CSIVolumesByNodeID ignores namespace

* scheduler/scheduler: add CSIVolumesByNodeID to the state interface

* scheduler/feasible: check node MaxVolumes

* nomad/csi_endpoint: no namespace in CSIVolumesByNodeID anymore

* nomad/state/state_store: avoid DenormalizeAllocationSlice

* nomad/state/iterator: clean up SliceIterator Next

* scheduler/feasible_test: block with MaxVolumes

* nomad/state/state_store_test: fix args to CSIVolumesByNodeID
2020-03-31 17:16:47 -04:00
Chris Baker
1c9bac9087 wip: added job.scale rpc endpoint, needs explicit test (tested via http now) 2020-03-24 13:57:09 +00:00
Mahmood Ali
08e5a9087f Merge pull request #7414 from hashicorp/b-network-mode-change
Detect network mode change
2020-03-24 09:46:40 -04:00
Lang Martin
ce9dbe619f csi: the scheduler allows a job with a volume write claim to be updated (#7438)
* nomad/structs/csi: split CanWrite into health, in use

* scheduler/scheduler: expose AllocByID in the state interface

* nomad/state/state_store_test

* scheduler/stack: SetJobID on the matcher

* scheduler/feasible: when a volume writer is in use, check if it's us

* scheduler/feasible: remove SetJob

* nomad/state/state_store: denormalize allocs before Claim

* nomad/structs/csi: return errors on claim, with context

* nomad/csi_endpoint_test: new alloc doesn't look like an update

* nomad/state/state_store_test: change test reference to CanWrite
2020-03-23 21:21:04 -04:00
Tim Gross
2cebb3e66f csi: improve error messages from scheduler (#7426) 2020-03-23 13:59:25 -04:00
Lang Martin
9c9a0c5eb5 csi: volume ids are only unique per namespace (#7358)
* nomad/state/schema: use the namespace compound index

* scheduler/scheduler: CSIVolumeByID interface signature namespace

* scheduler/stack: SetJob on CSIVolumeChecker to capture namespace

* scheduler/feasible: pass the captured namespace to CSIVolumeByID

* nomad/state/state_store: use namespace in csi_volume index

* nomad/fsm: pass namespace to CSIVolumeDeregister & Claim

* nomad/core_sched: pass the namespace in volumeClaimReap

* nomad/node_endpoint_test: namespaces in Claim testing

* nomad/csi_endpoint: pass RequestNamespace to state.*

* nomad/csi_endpoint_test: appropriately failed test

* command/alloc_status_test: appropriately failed test

* node_endpoint_test: avoid notTheNamespace for the job

* scheduler/feasible_test: call SetJob to capture the namespace

* nomad/csi_endpoint: ACL check the req namespace, query by namespace

* nomad/state/state_store: remove deregister namespace check

* nomad/state/state_store: remove unused CSIVolumes

* scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace

* nomad/csi_endpoint: ACL check

* nomad/state/state_store_test: remove call to state.CSIVolumes

* nomad/core_sched_test: job namespace match so claim gc works
2020-03-23 13:59:25 -04:00
Danielle Lancashire
777d98bc1a sched/feasible: Return more detailed CSI Failure messages 2020-03-23 13:58:30 -04:00
Danielle Lancashire
0aca822f8d sched/feasible: Validate CSIVolumes correctly
Previously we were looking up plugins based on the Alias Name for a CSI
Volume within the context of its task group.

Here we first look up a volume based on its identifier and then validate
the existence of the plugin based on its `PluginID`.
2020-03-23 13:58:30 -04:00
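
A minimal Go sketch of the validation order this commit describes, using hypothetical stand-in types and in-memory maps rather than Nomad's actual state store API: resolve the volume by its identifier first, then validate the plugin via the volume's `PluginID`.

```
package main

import (
	"errors"
	"fmt"
)

// Hypothetical simplified stand-ins for Nomad's state objects.
type CSIVolume struct {
	ID       string
	PluginID string
}

type CSIPlugin struct {
	ID      string
	Healthy bool
}

var (
	volumes = map[string]*CSIVolume{"vol-1": {ID: "vol-1", PluginID: "ebs"}}
	plugins = map[string]*CSIPlugin{"ebs": {ID: "ebs", Healthy: true}}
)

// checkVolume looks up the volume by its identifier, then checks that the
// plugin referenced by the volume's PluginID exists and is healthy.
func checkVolume(volID string) error {
	vol, ok := volumes[volID]
	if !ok {
		return fmt.Errorf("volume %q not found", volID)
	}
	plug, ok := plugins[vol.PluginID]
	if !ok || !plug.Healthy {
		return errors.New("missing or unhealthy CSI plugin " + vol.PluginID)
	}
	return nil
}

func main() {
	fmt.Println(checkVolume("vol-1")) // <nil>
}
```
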
Danielle Lancashire
356c77ef84 sched/feasible: CSI - Filter applicable volumes
This commit filters the jobs volumes when setting them on the
feasibility checker. This ensures that the rest of the checker does not
have to worry about non-csi volumes.
2020-03-23 13:58:30 -04:00
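
A small sketch of the filtering step described in this commit, assuming a simplified VolumeRequest type; the real checker operates on Nomad's task group structs.

```
package main

import "fmt"

// VolumeRequest is a simplified stand-in for a task group's volume request.
type VolumeRequest struct {
	Type   string // e.g. "csi" or "host"
	Source string
}

// csiVolumes keeps only the CSI-typed requests so the rest of the checker
// can assume every entry it sees is a CSI volume.
func csiVolumes(requests map[string]*VolumeRequest) map[string]*VolumeRequest {
	out := map[string]*VolumeRequest{}
	for alias, req := range requests {
		if req.Type == "csi" {
			out[alias] = req
		}
	}
	return out
}

func main() {
	reqs := map[string]*VolumeRequest{
		"data":  {Type: "csi", Source: "vol-1"},
		"certs": {Type: "host", Source: "certs"},
	}
	fmt.Println(len(csiVolumes(reqs))) // 1
}
```
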
Lang Martin
9f850016bd csi: fix index maintenance for CSIVolume and CSIPlugin tables (#7049)
* state_store: csi volumes/plugins store the index in the txn

* nomad: csi_endpoint_test require index checks need uint64()

* nomad: other tests using int 0 not uint64(0)

* structs: pass index into New, but not other struct methods

* state_store: csi plugin indexes, use new struct interface

* nomad: csi_endpoint_test check index/query meta (on explicit 0)

* structs: NewCSIVolume takes an index arg now

* scheduler/test: NewCSIVolume takes an index arg now
2020-03-23 13:58:29 -04:00
Lang Martin
f370e25843 CSI: Scheduler knows about CSI constraints and availability (#6995)
* structs: piggyback csi volumes on host volumes for job specs

* state_store: CSIVolumeByID always includes plugins, matches usecase

* scheduler/feasible: csi volume checker

* scheduler/stack: add csi volumes

* contributing: update rpc checklist

* scheduler: add volumes to State interface

* scheduler/feasible: introduce new checker collection tgAvailable

* scheduler/stack: taskGroupCSIVolumes checker is transient

* state_store CSIVolumeDenormalizePlugins comment clarity

* structs: remove TODO comment in TaskGroup Validate

* scheduler/feasible: CSIVolumeChecker hasPlugins improve comment

* scheduler/feasible_test: set t.Parallel

* Update nomad/state/state_store.go

Co-Authored-By: Danielle <dani@hashicorp.com>

* Update scheduler/feasible.go

Co-Authored-By: Danielle <dani@hashicorp.com>

* structs: lift ControllerRequired to each volume

* state_store: store plug.ControllerRequired, use it for volume health

* feasible: csi match fast path remove stale host volume copied logic

* scheduler/feasible: improve comments

Co-authored-by: Danielle <dani@builds.terrible.systems>
2020-03-23 13:58:29 -04:00
Jasmine Dahilig
5bf43b50d4 fix bug in lifecycle scheduler test mocks 2020-03-21 17:52:51 -04:00
Jasmine Dahilig
0dea3a39b0 add test cases for scheduler alloc placement with lifecycle resources 2020-03-21 17:52:47 -04:00
Jasmine Dahilig
ae38e5284b add allocfit test for lifecycles 2020-03-21 17:52:46 -04:00
Mahmood Ali
7897104b72 update scheduler to account for hooks 2020-03-21 17:52:45 -04:00
Mahmood Ali
82cb13aac9 Detect network mode change
Mark job as updated if network mode changed.
2020-03-21 16:51:10 -04:00
Drew Bailey
7955f2b3a6 include pro tag in several oss.go files 2020-02-10 15:56:14 -05:00
Drew Bailey
f7fb6219a9 add state store test to ensure PlacedCanaries is updated 2020-02-03 13:58:01 -05:00
Drew Bailey
895e563461 nomad state store must be modified through raft, rm local state change 2020-02-03 13:57:34 -05:00
Drew Bailey
a880d75b16 comment for filtering reason 2020-02-03 09:02:09 -05:00
Drew Bailey
92f0a343cb add test for node eligibility 2020-02-03 09:02:09 -05:00
Drew Bailey
cd00d6ded5 make diffSystemAllocsForNode aware of eligibility
diffSystemAllocs -> diffSystemAllocsForNode: this function is only used
for diffing system allocations, but it lacked awareness of eligible
nodes and of the node ID that the allocation was going to be placed on.

This change now ignores a change if the existing allocation is on an
ineligible node. For a new allocation, it also checks tainted and
ineligible nodes in the same function instead of nil-ing out the diff
after computation in diffSystemAllocs.
2020-02-03 09:02:08 -05:00
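
A rough sketch of the eligibility-aware diff described above, with hypothetical minimal types; the real diffSystemAllocsForNode works on Nomad's structs and returns a full diff result.

```
package main

import "fmt"

// Alloc is a simplified stand-in for a system job's allocation.
type Alloc struct {
	ID     string
	NodeID string
}

// diffForNode decides, for one candidate node, whether an existing system
// alloc should be considered for update and whether a new placement is
// allowed there. Changes are ignored when the existing alloc sits on an
// ineligible node; new placements only go to eligible, untainted nodes.
func diffForNode(nodeID string, eligible, tainted map[string]bool, existing *Alloc) (update, place bool) {
	if existing != nil {
		return eligible[existing.NodeID], false
	}
	return false, eligible[nodeID] && !tainted[nodeID]
}

func main() {
	eligible := map[string]bool{"node-a": true}
	tainted := map[string]bool{}

	// Existing alloc on an ineligible node: its diff is ignored.
	update, _ := diffForNode("node-b", eligible, tainted, &Alloc{ID: "a1", NodeID: "node-b"})
	fmt.Println(update) // false

	// No existing alloc: placement allowed only on eligible, untainted nodes.
	_, place := diffForNode("node-a", eligible, tainted, nil)
	fmt.Println(place) // true
}
```
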
Drew Bailey
580baea231 ignore computed diffs if node is ineligible
test flakey, add temp sleeps for debugging

fix computed class
2020-02-03 09:02:08 -05:00
Drew Bailey
264932dae4 Return FailedTGAlloc metric instead of no node err
If an existing system allocation is running and the node it's running on
is marked as ineligible, subsequent plan/applies return an RPC error
instead of a more helpful plan result.

This change logs the error, and appends a failedTGAlloc for the
placement.
2020-01-22 10:07:15 -05:00
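
A hedged sketch of the behavior described above; the type and field names here are simplified stand-ins for the scheduler's plan result and allocation metrics, not Nomad's exact API.

```
package main

import (
	"fmt"
	"log"
)

// AllocMetric is a simplified stand-in for Nomad's allocation metrics.
type AllocMetric struct {
	NodesEvaluated int
	NodesFiltered  int
}

// Plan stands in for the scheduler's in-flight plan result.
type Plan struct {
	FailedTGAllocs map[string]*AllocMetric
}

// placeSystemAlloc records a failed task-group allocation instead of
// returning an error when no feasible node is found, so the caller gets a
// plan result rather than an RPC failure.
func placeSystemAlloc(plan *Plan, tg, node string, metric *AllocMetric) {
	if node == "" {
		log.Printf("no feasible node for task group %q", tg)
		if plan.FailedTGAllocs == nil {
			plan.FailedTGAllocs = map[string]*AllocMetric{}
		}
		plan.FailedTGAllocs[tg] = metric
		return
	}
	// ... normal placement path ...
}

func main() {
	plan := &Plan{}
	placeSystemAlloc(plan, "web", "", &AllocMetric{NodesEvaluated: 3, NodesFiltered: 3})
	fmt.Println(len(plan.FailedTGAllocs)) // 1
}
```
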
Drew Bailey
81a24098f0 Update Evicted allocations to lost when lost
If an alloc is being preempted and marked as evict, but the underlying
node is lost before the migration takes place, the allocation currently
stays at desired status evict and client status running forever, or
until the node comes back online.

This commit updates updateNonTerminalAllocsToLost to check for a
desired status of Evict as well as Stop when updating allocations on
tainted nodes.

switch to table test for lost node cases
2020-01-07 13:34:18 -05:00
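
A minimal illustration of the change described above, using simplified types and local copies of the status constants; the real updateNonTerminalAllocsToLost works on Nomad's allocation structs.

```
package main

import "fmt"

const (
	AllocDesiredStatusStop  = "stop"
	AllocDesiredStatusEvict = "evict"
	AllocClientStatusLost   = "lost"
)

// Alloc is a simplified stand-in for a Nomad allocation.
type Alloc struct {
	DesiredStatus string
	ClientStatus  string
}

// markLostOnTaintedNode mirrors the described change: allocations whose
// desired status is either stop OR evict are moved to client status lost
// when their node is lost.
func markLostOnTaintedNode(allocs []*Alloc) {
	for _, a := range allocs {
		if a.DesiredStatus == AllocDesiredStatusStop || a.DesiredStatus == AllocDesiredStatusEvict {
			a.ClientStatus = AllocClientStatusLost
		}
	}
}

func main() {
	a := &Alloc{DesiredStatus: AllocDesiredStatusEvict, ClientStatus: "running"}
	markLostOnTaintedNode([]*Alloc{a})
	fmt.Println(a.ClientStatus) // lost
}
```
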
Preetha Appan
be897cadc3 More error->debug for logging in the bin packing iterator 2019-12-12 15:50:16 -06:00
Preetha Appan
ed1f30e799 Use debug logging for scheduler internals
We currently log an error if preemption is unable to find a suitable set of
allocations to preempt. This commit changes that to debug level since not finding
preemptable allocations is not an error condition.
2019-12-12 12:05:29 -06:00
Michael Schurter
b2dd21a19e Merge pull request #6792 from hashicorp/b-propose-panic
scheduler: fix panic when preempting and evicting allocs
2019-12-03 10:40:19 -08:00
Tim Gross
3716a67b30 scheduler: fix job update placement on prev node penalized (#6781)
Fixes #5856

When the scheduler looks for a placement for an allocation that's
replacing another allocation, it's supposed to penalize the previous
node if the allocation had been rescheduled or failed. But we're
currently always penalizing the node, which leads to unnecessary
migrations on job update.

This commit leaves in place the existing behavior where if the
previous alloc was itself rescheduled, its previous nodes are also
penalized. This is conservative but the right behavior especially on
larger clusters where a group of hosts might be having correlated
trouble (like an AZ failure).

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-12-03 06:14:49 -08:00
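
A sketch of the penalty rule this commit describes, with hypothetical types and field names; the real logic lives in the scheduler's node-penalty handling.

```
package main

import "fmt"

// prevAlloc is a hypothetical stand-in for the allocation being replaced.
type prevAlloc struct {
	NodeID         string
	Failed         bool
	WasRescheduled bool
	PenalizedNodes []string // nodes already penalized earlier in the reschedule chain
}

// penalizedNodes returns the nodes to penalize when placing the replacement:
// the previous node only if the alloc failed or was rescheduled, plus any
// nodes penalized for earlier reschedules of the same alloc.
func penalizedNodes(prev *prevAlloc) []string {
	var out []string
	if prev.Failed || prev.WasRescheduled {
		out = append(out, prev.NodeID)
	}
	out = append(out, prev.PenalizedNodes...)
	return out
}

func main() {
	// A plain job update: the previous alloc neither failed nor was
	// rescheduled, so its node is not penalized and no migration is forced.
	fmt.Println(penalizedNodes(&prevAlloc{NodeID: "node-a"})) // []
}
```
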
Michael Schurter
f12bfdb193 scheduler: update tests with modern error helper 2019-12-02 20:25:52 -08:00
Michael Schurter
6112ad9f92 scheduler: fix panic when preempting and evicting
Fixes #6787

In ProposedAllocs the proposed alloc slice was being copied while its
contents were not. Since RemoveAllocs nils elements of the proposed
alloc slice and is called twice, it could panic on the second call when
erroneously accessing a nil'd alloc.

The fix is to not copy the proposed alloc slice and pass the slice
returned by the 1st RemoveAllocs call to the 2nd call, thus maintaining
the trimmed length.
2019-12-02 20:22:22 -08:00
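
A self-contained illustration of the slice-aliasing bug and fix described above; this is not Nomad's actual RemoveAllocs, but the same removal pattern (swap a removed entry to the tail, nil the tail slot, return a trimmed slice).

```
package main

import "fmt"

type Alloc struct{ ID string }

// removeAlloc nils out removed entries in the shared backing array and
// returns a slice trimmed to the survivors. The nil stays just past the
// trimmed length.
func removeAlloc(allocs []*Alloc, removeID string) []*Alloc {
	n := len(allocs)
	for i := 0; i < n; i++ {
		if allocs[i].ID == removeID {
			allocs[i], allocs[n-1] = allocs[n-1], nil
			i--
			n--
		}
	}
	return allocs[:n]
}

func main() {
	proposed := []*Alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}

	// Buggy pattern: reuse the original full-length slice for both calls.
	// The first call leaves a nil in the backing array's tail, so a second
	// full-length pass dereferences it and panics.
	_ = removeAlloc(proposed, "b")
	// _ = removeAlloc(proposed, "c") // panics: nil pointer dereference

	// Fixed pattern: thread the trimmed result into the second call.
	trimmed := removeAlloc([]*Alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}, "b")
	trimmed = removeAlloc(trimmed, "c")
	fmt.Println(len(trimmed)) // 1
}
```
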
Michael Schurter
62751321bf Merge pull request #6699 from hashicorp/f-semver-constraints
Add new "semver" constraint
2019-11-19 12:18:43 -08:00
Drew Bailey
89964c989a Removes checking constraints for inplace update 2019-11-19 13:34:41 -05:00
Michael Schurter
75d6d4ec5e core: add semver constraint
The existing version constraint uses logic optimized for package
managers, not schedulers, when checking prereleases:

- 1.3.0-beta1 will *not* satisfy ">= 0.6.1"
- 1.7.0-rc1 will *not* satisfy ">= 1.6.0-beta1"

This is due to package managers wishing to favor final releases over
prereleases.

In a scheduler, versions more often represent the earliest release in which
all required features/APIs are available in a system. Whether the constraint
or the version being evaluated is a prerelease has no impact on
ordering.

This commit adds a new constraint - `semver` - which will use Semver
v2.0 ordering when evaluating constraints. Given the above examples:

- 1.3.0-beta1 satisfies ">= 0.6.1" using `semver`
- 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver`

Since existing jobspecs may rely on the old behavior, a new constraint
was added and the implicit Consul Connect and Vault constraints were
updated to use it.
2019-11-19 08:40:19 -08:00
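
A small sketch of the package-manager ordering described above, using the hashicorp/go-version constraint checker; under the new `semver` constraint, Semver 2.0 ordering applies and the same comparison passes.

```
package main

import (
	"fmt"

	version "github.com/hashicorp/go-version"
)

func main() {
	// Package-manager ordering: a prerelease does not satisfy a constraint
	// with no prerelease part.
	c, _ := version.NewConstraint(">= 0.6.1")
	v, _ := version.NewVersion("1.3.0-beta1")
	fmt.Println(c.Check(v)) // false: 1.3.0-beta1 does not satisfy ">= 0.6.1"

	// With the new "semver" constraint, Semver 2.0 ordering is used instead,
	// so 1.3.0-beta1 >= 0.6.1 holds and the equivalent check passes.
}
```
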
Drew Bailey
c87c6415eb DOCS: Spread stanza does not exist on task
Fixes documentation inaccuracy for spread stanza placement. Spreads can
only exist on the top level job struct or within a group.

comment about nil assumption
2019-11-19 08:26:36 -05:00
Drew Bailey
1607a203a8 Check for changes to affinity and constraints
Adds checks for affinity and constraint changes when determining if we
should update inplace.

refactor to check all levels at once

check for spread changes when checking inplace update
2019-11-19 08:26:34 -05:00
Chris Baker
bb964defa4 changed all tests to require from t.Fatalf 2019-11-07 22:39:47 +00:00
Chris Baker
bcd1243471 the scheduler checks whether task changes require a restart; this needed
to be updated to consider devices
2019-11-07 17:51:15 +00:00
Michael Schurter
08a17854ce core: fix panic when AllocatedResources is nil
Fix for #6540
2019-10-28 14:38:21 -07:00
Danielle Lancashire
ab5ba7aa9b config: Hoist volume.config.source into volume
Currently, using a Volume in a job uses the following configuration:

```
volume "alias-name" {
  type = "volume-type"
  read_only = true

  config {
    source = "host_volume_name"
  }
}
```

This commit migrates to the following:

```
volume "alias-name" {
  type = "volume-type"
  source = "host_volume_name"
  read_only = true
}
```

The original design was chosen because we were uncertain about the future of storage
plugins, and to allow maximum flexibility.

However, this causes a few issues, namely:
- We frequently need to parse this configuration during submission,
scheduling, and mounting
- It complicates the configuration from an end user's perspective
- It complicates the ability to do validation

As we understand the problem space of CSI a little more, it has become
clear that we won't need the `source` to be in config, as it will be
used in the majority of cases:

- Host Volumes: Always need a source
- Preallocated CSI Volumes: Always needs a source from a volume or claim name
- Dynamic Persistent CSI Volumes*: Always needs a source to attach the volumes
                                   to for managing upgrades and to avoid dangling.
- Dynamic Ephemeral CSI Volumes*: Less thought out, but `source` will probably point
                                  to the plugin name, and a `config` block will
                                  allow you to pass meta to the plugin. Or will
                                  point to a pre-configured ephemeral config.
*If implemented

The new design simplifies this by merging the source into the volume
stanza to solve the above issues with usability, performance, and error
handling.
2019-09-13 04:37:59 +02:00
Preetha Appan
654c72a7b4 update comment 2019-09-05 18:43:30 -05:00
Preetha Appan
87e998d043 Fix inplace updates bug with group level networks
During inplace updates, we should be using network information
from the previous allocation being updated.
2019-09-05 18:37:24 -05:00
Jasmine Dahilig
c346a47b5b add default update stanza and max_parallel=0 disables deployments (#6191) 2019-09-02 10:30:09 -07:00
Mahmood Ali
8a0647c9cf schedulers: check all drivers on node
When checking driver feasibility for an alloc with multiple drivers, we
must check that all drivers are detected and healthy.

Nomad 0.9 and 0.8 have a bug where we may check a single driver only,
and which driver gets checked depends on map traversal order, which is
unspecified by the Go spec.
2019-08-29 09:03:31 -04:00
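
A minimal sketch of the fix's intent, using a hypothetical DriverInfo type: every driver a task group needs must be detected and healthy, rather than whichever single driver map iteration happens to visit.

```
package main

import "fmt"

// DriverInfo is a simplified stand-in for a node's fingerprinted driver state.
type DriverInfo struct {
	Detected bool
	Healthy  bool
}

// driversAreFeasible checks every driver the task group needs, not just one:
// all of them must be detected and healthy on the node.
func driversAreFeasible(needed []string, node map[string]*DriverInfo) bool {
	for _, d := range needed {
		info, ok := node[d]
		if !ok || !info.Detected || !info.Healthy {
			return false
		}
	}
	return true
}

func main() {
	node := map[string]*DriverInfo{
		"docker": {Detected: true, Healthy: true},
		"exec":   {Detected: true, Healthy: false},
	}
	// Relying on map iteration order might check only "docker" and pass;
	// checking every required driver correctly fails on "exec".
	fmt.Println(driversAreFeasible([]string{"docker", "exec"}, node)) // false
}
```
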
Mahmood Ali
542d17e745 scheduler: tests for multiple drivers in TG 2019-08-29 09:03:31 -04:00
Danielle Lancashire
41292055de scheduler: Implicit constraint on readonly hostvol
When a Client declares a volume as ReadOnly, we should only schedule it
for read-only volume requests. This change means that if a host
exposes a read-only volume, we validate that the group-level
requests for that volume are all read-only on that host.
2019-08-21 20:57:05 +02:00
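
A short sketch of the implicit constraint described above, with simplified stand-in types; the actual check lives in the host-volume feasibility checker.

```
package main

import "fmt"

// Simplified stand-ins for a client host volume and a group volume request.
type HostVolume struct {
	ReadOnly bool
}

type VolumeRequest struct {
	Source   string
	ReadOnly bool
}

// hostVolumeIsFeasible mirrors the rule described above: a read-only host
// volume can only satisfy requests that are themselves read-only.
func hostVolumeIsFeasible(vol *HostVolume, requests []*VolumeRequest) bool {
	for _, req := range requests {
		if vol.ReadOnly && !req.ReadOnly {
			return false
		}
	}
	return true
}

func main() {
	vol := &HostVolume{ReadOnly: true}
	reqs := []*VolumeRequest{{Source: "data", ReadOnly: false}}
	fmt.Println(hostVolumeIsFeasible(vol, reqs)) // false: a writer wants a read-only host volume
}
```
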
Danielle Lancashire
af5d42c058 structs: Unify Volume and VolumeRequest 2019-08-12 15:39:08 +02:00
Danielle
0f5cf5fa91 Update scheduler/feasible.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-08-12 15:39:08 +02:00
Danielle Lancashire
709abbc675 scheduler: Add a feasibility checker for Host Vols 2019-08-12 15:39:08 +02:00
Preetha Appan
5a1dd79179 Code review feedback 2019-07-31 01:04:08 -04:00