nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 10:25:42 +03:00

Author	SHA1	Message	Date
Aimee Ukasick	a30cb2f137	Update UI, code comment, and README links to docs, tutorials (#26429 ) * Update UI, code comment, and README links to docs, tutorials * fix typo in ephemeral disks learn more link url * feedback on typo Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-08-06 09:40:23 -05:00
Tim Gross	4ce937884d	scheduler: move result mutation into `computeStop` (#26351 ) The `computeStop` method returns two values that only get used to mutate the result and the untainted set. Move the mutation into the method to match the work done in #26325. Ref: https://github.com/hashicorp/nomad/pull/26325 Ref: https://hashicorp.atlassian.net/browse/NMD-819	2025-07-29 08:23:06 -04:00
Tim Gross	26554e544e	scheduler: move result mutation into `computeUpdates` (#26336 ) The `computeUpdate` method returns 4 different values, some of which are just different shapes of the same data and only ever get used to be applied to the result in the caller. Move the mutation of the result into `computeUpdates` to match the work done in #26325. Clean up the return signature so that only slices we need downstream are returned, and fix the incorrect docstring. Also fix a silent bug where the `inplace` set includes the original alloc and not the updated version. This has no functional change because all existing callers only ever look at the length of this slice, but it will prevent future bugs if that ever changes. Ref: https://github.com/hashicorp/nomad/pull/26325 Ref: https://hashicorp.atlassian.net/browse/NMD-819	2025-07-25 08:21:37 -04:00
James Rasell	5989d5862a	ci: Update golangci-lint to v2 and fix highlighted issues. (#26334 )	2025-07-25 10:44:08 +01:00
Tim Gross	2c4be7fc2e	Reconciler mutation improvements (#26325 ) Refactors of the `computeGroup` code in the reconciler to make understanding its mutations more manageable. Some of this work makes mutation more consistent but more importantly it's intended to make it readily _detectable_ while still being readable. Includes: * In the `computeCanaries` function, we mutate the dstate and the result and then the return values are used to further mutate the result in the caller. Move all this mutation into the function. * In the `computeMigrations` function, we mutate the result and then the return values are used to further mutate the result in the caller. Move all this mutation into the function. * In the `cancelUnneededCanaries` function, we mutate the result and then the return values are used to further mutate the result in the caller. Move all this mutation into the function, and annotate which `allocSet`s are mutated by taking a pointer to the set. * The `createRescheduleLaterEvals` function currently mutates the results and returns updates to mutate the results in the caller. Move all this mutation into the function to help cleanup `computeGroup`. * Extract `computeReconnecting` method from `computeGroup`. There's some tangled logic in `computeGroup` for determining changes to make for reconnecting allocations. Pull this out into its own function. Annotate mutability in the function by passing pointers to `allocSet` where needed, and mutate the result to update counts. Rename the old `computeReconnecting` method to `appendReconnectingUpdates` to mirror the naming of the similar logic for disconnects. * Extract `computeDisconnecting` method from `computeGroup`. There's some tangled logic in `computeGroup` for determining changes to make for disconnected allocations. Pull this out into its own function. Annotate mutability in the function by passing pointers to `allocSet` where needed, and mutate the result to update counts. 
* The `appendUnknownDisconnectingUpdates` method used to create updates for disconnected allocations mutates one of its `allocSet` arguments to change the allocations that the reschedule now set points to. Pull this update out into the caller. * A handful of small docstring and helper function fixes Ref: https://hashicorp.atlassian.net/browse/NMD-819	2025-07-24 08:33:49 -04:00
Tim Gross	e675491eb6	refactor uses of `allocSet` in reconciler (#26324 ) The reconciler contains a large set of methods and functions that operate on `allocSet` (a map of allocation IDs to their allocs). Update these so that they are consistently methods that are documented to not consume the `allocSet`. This sets the stage for further improvements around mutability in the reconciler. This changeset also includes a few related refactors: * Use the `allocSet` alias in every location it's relevant in the reconciler, for consistency and clarity. * Move the filter functions and related helpers in the `allocs.go` file into the `filters.go` file. * Update the method receiver on `allocSet` to match everywhere and generally improve the docstrings on the filter functions. Ref: https://hashicorp.atlassian.net/browse/NMD-819	2025-07-23 08:57:41 -04:00
Piotr Kazmierczak	973a554808	scheduler: remove unnecessary reconnecting and ignore allocset assignment (#26298 ) These values aren't used anywhere, and the code is confusing as is.	2025-07-21 09:06:52 +02:00
Tim Gross	333dd94362	scheduler: exit early on count=0 and filter out server-terminal (#26292 ) When a task group is removed from a jobspec, the reconciler stops all allocations and immediately returns from `computeGroup`. We can do the same for when the group has been scaled-to-zero, but doing so runs into an inconsistency in the way that server-terminal allocations are handled. Prior to this change server-terminal allocations fall through `computeGroup` without being marked as `ignore`, unless they are terminal canaries, in which case they are marked `stop` (but this is a no-op). This inconsistency causes a _tiny_ amount of extra `Plan.Submit`/Raft traffic, but more importantly makes it more difficult to make test assertions for `stop` vs `ignore` vs fallthrough. Remove this inconsistency by filtering out server-terminal allocations early in `computeGroup`. This brings the cluster reconciler's behavior closer to the node reconciler's behavior, except that the node reconciler discards _all_ terminal allocations because it doesn't support rescheduling. This changeset required adjustments to two tests, but the tests themselves were a bit of a mess: * In https://github.com/hashicorp/nomad/pull/25726 we added a test of how canaries were treated when on draining nodes. But the test didn't correctly configure the job with an update block, leading to misleading test behavior. Fix the test to exercise the intended behavior and refactor for clarity. * While working on reconciler behaviors around stopped allocations, I found it extremely hard to follow the intent of the disconnected client tests because many of the fields in the table-driven test are switches for more complex behavior or just tersely named. Attempt to make this a little more legible by moving some branches directly into fields, renaming some fields, and flattening out some branching. Ref: https://hashicorp.atlassian.net/browse/NMD-819	2025-07-18 08:51:52 -04:00
Tim Gross	35f3f6ce41	scheduler: add disconnect and reschedule info to reconciler output (#26255 ) The `DesiredUpdates` struct that we send to the Read Eval API doesn't include information about disconnect/reconnect and rescheduling. Annotate the `DesiredUpdates` with this data, and adjust the `eval status` command to display only those fields that have non-zero values in order to make the output width manageable. Ref: https://hashicorp.atlassian.net/browse/NMD-815	2025-07-16 08:46:38 -04:00
Allison Larson	3ca518e89c	Add node_pool to blockedEval metric (#26215 ) Adds the node_pool to the blockedEval metrics that get emitted for resource/cpu, along with the dc and node class.	2025-07-15 09:48:04 -07:00
Piotr Kazmierczak	08b3db104d	docs: update reconciler diagram to reflect recent refactors (#26260 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-07-11 15:34:07 +02:00
Tim Gross	26302ab25d	reconciler: share assertions in property tests (#26259 ) Refactor the reconciler property tests to extract functions for safety property assertions we'll share between different job types for the same reconciler.	2025-07-11 09:27:22 -04:00
Tim Gross	74f7a8f037	scheduler: basic node reconciler safety properties for system jobs (#26216 ) Property test assertions for the core safety properties of the node reconciler, for system jobs. Ref: https://hashicorp.atlassian.net/browse/NMD-814 Ref: https://github.com/hashicorp/nomad/pull/26167	2025-07-09 14:44:05 -04:00
Tim Gross	94e03f894a	scheduler: basic cluster reconciler safety properties for batch jobs (#26172 ) Property test assertions for the core safety properties of the cluster reconciler, for batch jobs. The changeset includes fixes for any bugs found during work-in-progress, which will get pulled out to their own PRs. Ref: https://hashicorp.atlassian.net/browse/NMD-814 Ref: https://github.com/hashicorp/nomad/pull/26167	2025-07-09 14:43:55 -04:00
Piotr Kazmierczak	e50db4d1b8	scheduler: property testing of cancelUnneededCanaries (#26204 ) In the spirit of #26180 Internal ref: https://hashicorp.atlassian.net/browse/NMD-814	2025-07-09 13:46:13 -04:00
Tim Gross	7c6c1ed0d3	scheduler: reconciler should constrain placements to count (#26239 ) While working on property testing in #26172 we discovered there are scenarios where the reconciler will produce more than the expected number of placements. Testing of those scenarios at the whole-scheduler level shows that this gets handled correctly downstream of the reconciler, but this makes it harder to reason about reconciler behavior. Cap the number of placements in the reconciler. Ref: https://github.com/hashicorp/nomad/pull/26172	2025-07-09 11:51:01 -04:00
Tim Gross	eb47d1ca11	scheduler: eliminate dead code in node reconciler (#26236 ) While working on property testing in #26216, I discovered we had unreachable code in the node reconciler. The `diffSystemAllocsForNode` function receives a set of non-terminal allocations, but then has branches where it assumes the allocations might be terminal. It's trivially provable that these allocs are always live, as the system scheduler splits the set of known allocs into live and terminal sets before passing them into the node reconciler. Eliminate the unreachable code and improve the variable names to make the known state of the allocs more clear in the reconciler code. Ref: https://github.com/hashicorp/nomad/pull/26216	2025-07-09 11:31:04 -04:00
Piotr Kazmierczak	8bc6abcd2e	scheduler: basic cluster reconciler safety properties for service jobs (#26167 )	2025-07-09 17:30:37 +02:00
Tim Gross	c043d1c850	scheduler: property testing of reconcile reconnecting (#26180 ) To help break down the larger property tests we're doing in #26167 and #26172 into more manageable chunks, pull out a property test for just the `reconcileReconnecting` method. This method helpfully already defines its important properties, so we can implement those as test assertions. Ref: https://hashicorp.atlassian.net/browse/NMD-814 Ref: https://github.com/hashicorp/nomad/pull/26167 Ref: https://github.com/hashicorp/nomad/pull/26172	2025-07-07 09:40:49 -04:00
Tim Gross	5c909213ce	scheduler: add reconciler annotations to completed evals (#26188 ) The output of the reconciler stage of scheduling is only visible via debug-level logs, typically accessible only to the cluster admin. We can give job authors better ability to understand what's happening to their jobs if we expose this information to them in the `eval status` command. Add the reconciler's desired updates to the evaluation struct so it can be exposed in the API. This increases the size of evals by roughly 15% in the state store, or a bit more when there are preemptions (but we expect this will be a small minority of evals). Ref: https://hashicorp.atlassian.net/browse/NMD-818 Fixes: https://github.com/hashicorp/nomad/issues/15564	2025-07-07 09:40:21 -04:00
Tim Gross	9a29df2292	scheduler: emit structured logs from reconciliation (#26169 ) Both the cluster reconciler and node reconciler emit a debug-level log line with their results, but these are unstructured multi-line logs that are annoying for operators to parse. Change these to emit structured key-value pairs like we do everywhere else. Ref: https://hashicorp.atlassian.net/browse/NMD-818 Ref: https://go.hashi.co/rfc/nmd-212	2025-07-01 10:37:44 -04:00
Piotr Kazmierczak	36e7148247	scheduler: doc.go files for new packages (#26177 )	2025-07-01 16:28:33 +02:00
Tim Gross	ec8250ed30	property test generation for reconciler (#26142 ) As part of ongoing work to make the scheduler more legible and more robustly tested, we're implementing property testing of at least the reconciler. This changeset provides some infrastructure we'll need for generating the test cases using `pgregory.net/rapid`, without building out any of the property assertions yet (that'll be in upcoming PRs over the next couple weeks). The alloc reconciler generator produces a job, a previous version of the job, a set of tainted nodes, and a set of existing allocations. The node reconciler generator produces a job, a set of nodes, and allocations on those nodes. Reconnecting allocs are not yet well-covered by these generators, and with ~40 dimensions covered so far we may need to pull those out to their own tests in order to get good coverage. Note the scenarios only randomize fields of interest; fields like the job name that don't impact the reconciler would use up available shrink cycles on failed tests without actually reducing the scope of the scenario. Ref: https://hashicorp.atlassian.net/browse/NMD-814 Ref: https://github.com/flyingmutant/rapid	2025-06-26 11:09:53 -04:00
Piotr Kazmierczak	27da75044e	scheduler: move tests that depend on calling schedulers into `integration` package (#26037 )	2025-06-24 09:31:10 +02:00
Piotr Kazmierczak	12ddb6db94	scheduler: capture reconciler state in ReconcilerState object (#26088 ) This changeset separates reconciler fields into their own sub-struct to make testing easier and the code more explicit about what fields relate to which state.	2025-06-23 15:36:39 +02:00
Piotr Kazmierczak	1030760d3f	scheduler: adjust method comments and names to reflect recent refactoring (#26085 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-20 17:23:31 +02:00
Piotr Kazmierczak	b82fd2e159	scheduler: refactor cluster reconciler to avoid hidden state mutation (#26042 ) Cluster reconciler code is notoriously hard to follow because most of its methods continuously mutate the fields of the allocReconciler object. Even for top-level methods this makes the code hard to follow, but it gets really gnarly with lower-level methods (of which there are many). This changeset proposes a refactoring that makes the vast majority of said methods return explicit values and avoid mutating object fields.	2025-06-20 07:37:16 +02:00
Piotr Kazmierczak	0ddbc548a3	scheduler: rename reconciliation package to `reconciler` (#26038 ) nouns are better than verbs for package names	2025-06-12 14:36:09 +02:00
Piotr Kazmierczak	199d12865f	scheduler: isolate `feasibility` (#26031 ) This change isolates all the code that deals with node selection in the scheduler into its own package called feasible. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-11 20:11:04 +02:00
Piotr Kazmierczak	76e3c2961a	scheduler: isolate reconciliation code (#26002 ) This moves all the code of service/batch and system/sysbatch reconciliation into a new reconcile package.	2025-06-10 15:46:39 +02:00
Piotr Kazmierczak	ce054aae96	scheduler: add a readme and start documenting low level implementation details (#25986 ) In an effort to improve the readability and maintainability of nomad/scheduler package, we begin with a README file that describes its operation in more detail than the official documentation does. This PR will be followed by a few small ones that move the code around that package, improve variable naming and also keep that readme up to date. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-05 15:36:17 +02:00
Piotr Kazmierczak	648bacda77	testing: migrate nomad/scheduler off of testify (#25968 ) In the spirit of #25909, this PR removes testify dependencies from the scheduler package, along with reflect.DeepEqual removal. This is again a combination of semgrep and hx editing magic. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-04 09:29:28 +02:00
tehut	55523ecf8e	Add NodeMaxAllocations to client configuration (#25785 ) * Set MaxAllocations in client config Add NodeAllocationTracker struct to Node struct Evaluate MaxAllocations in AllocsFit function Set up cli config parsing Integrate maxAllocs into AllocatedResources view Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-05-22 12:49:27 -07:00
Tim Gross	456d95a19e	scheduler: account for affinity value of zero in score normalization (#25800 ) If there are no affinities on a job, we don't want to count an affinity score of zero in the number of scores we divide the normalized score by. This is how we handle other scoring components like node reschedule penalties on nodes that weren't running the previous allocation. But we also exclude counting the affinity in the case where we have affinity but the value is zero. In pathological cases, this can result in a node with a low affinity being picked over a node with no affinity, because the denominator is 1 larger. Include zero-value affinities in the count of scores if the job has affinities but the value just happens to be zero. Fixes: https://github.com/hashicorp/nomad/issues/25621	2025-05-19 14:10:00 -04:00
Allison Larson	fd16f80b5a	Only error on constraints if no allocs are running (#25850 ) * Only error on constraints if no allocs are running When running `nomad job run <JOB>` multiple times with constraints defined, there should be no error as a result of filtering out nodes that do not/have not ever satisfied the constraints. When running a system job with a constraint, any run after an initial startup returns an exit(2) and a warning about unplaced allocations due to constraints, an error that is not encountered on the initial run even though the constraint stays the same. This is because the node that satisfies the condition is already running the allocation, and the placement is ignored. Another placement is attempted, but the only node(s) left are the ones that do not satisfy the constraint. Nomad views this case (no allocations that were attempted to be placed could be placed successfully) as an error, and reports it as such. In reality, no allocations should be placed or updated in this case, but it should not be treated as an error. This change uses the `ignored` placements from diffSystemAlloc to attempt to determine if the case encountered is an error (no ignored placements means that nothing is already running, and is an error), or is not one (an ignored placement means that the task is already running somewhere on a node). It does this at the point where `failedTGAlloc` is populated, so placement functionality isn't changed, just the field that populates error. There is functionality that should be preserved which (correctly) notifies a user if a job is attempted that cannot be run on any node due to the constraints filtering out all available nodes. This should still behave as expected. 
* Add changelog entry * Handle in-place updates for constrained system jobs * Update .changelog/25850.txt Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com> * Remove conditionals --------- Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2025-05-15 15:14:03 -07:00
Tim Gross	5208ad4c2c	scheduler: allow canaries to be migrated on node drain (#25726 ) When a node is drained that has canaries that are not yet healthy, the canaries may not be properly migrated and the deployment will halt. This happens only if there are more than `migrate.max_parallel` canaries on the node and the canaries are not yet healthy (ex. they have a long `update.min_healthy_time`). In this circumstance, the first batch of canaries are marked for migration by the drainer correctly. But then the reconciler counts these migrated canaries against the total number of expected canaries and no longer progresses the deployment. Because an insufficient number of allocations have reported they're healthy, the deployment cannot be promoted. When the reconciler looks for canaries to cancel, it leaves in the list any canaries that are already terminal (because there shouldn't be any work to do). But this ends up skipping the creation of a new canary to replace terminal canaries that have been marked for migration. Add a conditional for this case to cause the canary to be removed from the list of active canaries so we can replace it. Ref: https://hashicorp.atlassian.net/browse/NMD-560 Fixes: https://github.com/hashicorp/nomad/issues/17842	2025-04-24 09:24:28 -04:00
Allison Larson	50513a87b7	Preserve core resources during inplace service alloc updates (#25705 ) * Preserve core resources during inplace service alloc updates When an alloc is running with the core resources specified, and the alloc is able to be updated in place, the cores it is running on should be preserved. This fixes a bug where the allocation's task's core resources (CPU.ReservedCores) would be recomputed each time the reconciler checked that the allocation could continue to run on the given node. Under circumstances where a different core on the node became available before this check was made, the selection process could compute this new core as the core to run on, regardless of the core the allocation was already running on. The check takes into account other allocations running on the node with reserved cores, but cannot check itself. When this would happen for multiple allocations being evaluated in a single plan, the selection process would see the other cores being previously reserved but be unaware of the one it ran on, resulting in the same core being chosen over and over for each allocation that was being checked, and updated in the state store (but not on the node). Once those cores were chosen and committed for multiple allocs, the node appears to be exhausted on the cores dimension, and it would prevent any additional allocations from being started on the node. The reconciler check/computation for allocations that are being updated in place and have resources.cores defined is effectively a check that the node has the available cores to run on, not a computation that should be changed. The fix still performs the check, but once it is successful any existing ReservedCores are preserved. Because any change to this resource is considered a "destructive change", this can be confidently preserved during the inplace update. * Adjust reservedCores scheduler test * Add changelog entry	2025-04-23 10:38:47 -07:00
Tim Gross	c205688857	scheduler: fix state corruption from rescheduler tracker updates (#25698 ) In #12319 we fixed a bug where updates to the reschedule tracker would be dropped if the follow-up allocation failed to be placed by the scheduler in the later evaluation. We did this by mutating the previous allocation's reschedule tracker. But we did this without copying the previous allocation first and then making sure the updated copy was in the plan. This is unfortunately unsafe and corrupts the state store on the server where the scheduler ran; it may cause a race condition in RPC handlers and it causes the server to be out of sync with the other servers. This was discovered while trying to make all our tests race-free, but likely impacts production users. Copy the previous allocation before updating the reschedule tracker, and swap out the updated allocation in the plan. This also requires that we include the reschedule tracker in the "normalized" (stripped-down) allocations we send to the leader as part of a plan. Ref: https://github.com/hashicorp/nomad/pull/12319 Fixes: https://hashicorp.atlassian.net/browse/NET-12357	2025-04-18 08:42:54 -04:00
Tim Gross	5c89b07f11	CI: run copywrite on PRs, not just after merges (#25658 ) * CI: run copywrite on PRs, not just after merges * fix a missing copyright header	2025-04-10 17:01:34 -04:00
Carlos Galdino	048c5bcba9	Use core ID when selecting cores (#25340 ) * Use core ID when selecting cores If the available cores are not a contiguous set, the core selector might panic when trying to select cores. For example, consider a scenario where the available cores for the selector are the following: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47] This list contains 46 cores, because cores with IDs 0 and 24 are not included in the list. Before this patch, if we requested 46 cores, the selector would panic trying to access the item with index 46 in `cs.topology.Cores`. This patch changes the selector to use the core ID instead when looking for a core inside `cs.topology.Cores`. This prevents an out of bounds access that was causing the panic. Note: The patch is a straightforward change. Perhaps a better long-term solution would be to restructure the `numalib.Topology.Cores` field to be a `map[ID]Core`, but that is a much larger change that is more difficult to land. Also, the number of cores in our case is small, at most 192, so a search won't have any noticeable impact. * Add changelog entry * Build list of IDs inline	2025-04-10 13:04:15 -07:00
James Rasell	4c4cb2c6ad	agent: Fix misaligned contextual k/v logging arguments. (#25629 ) Arguments passed to hclog log lines should always have an even number to provide the expected k/v output.	2025-04-10 14:40:21 +01:00
Michael Smithhisler	f2b761f17c	disconnected: removes deprecated disconnect fields (#25284 ) The group level fields stop_after_client_disconnect, max_client_disconnect, and prevent_reschedule_on_lost were deprecated in Nomad 1.8 and replaced by fields in the disconnect block. This change removes any logic related to those deprecated fields. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-03-05 14:46:02 -05:00
Tim Gross	1788bfb42e	remove addresses from node class hash (#24942 ) When a node is fingerprinted, we calculate a "computed class" from a hash over a subset of its fields and attributes. In the scheduler, when a given node fails feasibility checking (before fit checking) we know that no other node of that same class will be feasible, and we add the hash to a map so we can reject them early. This hash cannot include any values that are unique to a given node, otherwise no other node will have the same hash and we'll never save ourselves the work of feasibility checking those nodes. In #4390 we introduced the `nomad.advertise.address` attribute and in #19969 we introduced the `consul.dns.addr` attribute. Both of these are unique per node and break the hash. Additionally, we were not correctly filtering attributes out when checking if a node escaped the class, because we did not filter for attributes that start with `unique.`. The test for this introduced in #708 had an inverted assertion, which allowed this to pass unnoticed since the early days of Nomad. Ref: https://github.com/hashicorp/nomad/pull/708 Ref: https://github.com/hashicorp/nomad/pull/4390 Ref: https://github.com/hashicorp/nomad/pull/19969	2025-03-03 09:28:32 -05:00
James Rasell	7268053174	vault: Remove legacy token based authentication workflow. (#25155 ) The legacy workflow for Vault whereby servers were configured using a token to provide authentication to the Vault API has now been removed. This change also removes the workflow where servers were responsible for deriving Vault tokens for Nomad clients. The deprecated Vault config options used by the Nomad agent have all been removed except for "token" which is still in use by the Vault Transit keyring implementation. Job specification authors can no longer use the "vault.policies" parameter and should instead use "vault.role" when not using the default workload identity. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-02-28 07:40:02 +00:00
Piotr Kazmierczak	58c6387323	stateful deployments: task group host volume claims API (#25114 ) This PR introduces API endpoints /v1/volumes/claims/ and /v1/volumes/claim/:id for listing and deleting task group host volume claims, respectively.	2025-02-25 15:51:59 +01:00
Tim Gross	dc58f247ed	docs: clarify reschedule, migrate, and replacement terminology (#24929 ) Our vocabulary around scheduler behaviors outside of the `reschedule` and `migrate` blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover: * restart: when the tasks of an allocation fail and we try to restart the tasks in place. * reschedule: when the `restart` block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again. * migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker. * replacement: when a node is lost, we don't count that against the `reschedule` tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the `migrate` machinery here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the `group.count`. * replacement for `disconnect.replace = true`: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker. Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa. Fixes: https://github.com/hashicorp/nomad/issues/24918 Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-02-18 09:31:03 -05:00
Paweł Bęza	43885f6854	Allow for in-place update when affinity or spread was changed (#25109 ) Similarly to #6732, this removes the affinity and spread checks for in-place updates. Both affinity and spread should be treated as soft preferences for the Nomad scheduler rather than strict constraints. Therefore modifying them should not trigger job reallocation. Fixes #25070 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-02-14 14:33:18 -05:00
Piotr Kazmierczak	5468829260	stateful deployments: fix return in the `hasVolumes` feasibility check (#25084 ) A return statement was missing in the sticky volume check—when we weren't able to find a suitable volume, we did not return false. This was caught by e2e test. This PR fixes the issue, and corrects and expands the unit test.	2025-02-11 18:57:48 +01:00
Piotr Kazmierczak	611452e1af	stateful deployments: use `TaskGroupVolumeClaim` table to associate volume requests with volume IDs (#24993 ) We introduce an alternative solution to the one presented in #24960 which is based on the state store and not previous-next allocation tracking in the reconciler. This new solution reduces cognitive complexity of the scheduler code at the cost of slightly more boilerplate code, but also opens up new possibilities in the future, e.g., allowing users to explicitly "un-stick" volumes with workloads still running. The original diagram shows the following flow between the scheduler and the state store: `SetVolumes()` sets ns, job, and tg in the scheduler; `upsertAllocsImpl()` checks if the alloc requests sticky vols and consults state, and if there is no claim, it creates one; a `TaskGroupVolumeClaim` (namespace, jobID, tg name, vol ID) uniquely identifies a vol; `hasVolumes()` consults the state and returns true if there's a match or if there is no previous claim; `DeleteJobTxn()` removes the claim from the state.	2025-02-07 17:41:01 +01:00
Matt Keeler	833e240597	Upgrade to using hashicorp/go-metrics@v0.5.4 (#24856 ) * Upgrade to using hashicorp/go-metrics@v0.5.4 This also requires bumping the dependencies for: * memberlist * serf * raft * raft-boltdb * (and indirectly hashicorp/mdns due to the memberlist or serf update) Unlike some other HashiCorp products, Nomad's root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.	2025-01-31 15:22:00 -05:00
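
The core-selection fix described in commit 048c5bcba9 (#25340) can be illustrated with a minimal Go sketch. This is not the Nomad implementation: the `Core` type and the `coreByID` and `availableCores` helpers are hypothetical names introduced only for this example. It assumes a slice of 46 cores whose IDs skip 0 and 24, as in the commit message, and shows why looking a core up by ID is safe where treating the ID as a slice index would go out of bounds.

```go
package main

import "fmt"

// Core is a hypothetical stand-in for a topology core entry;
// it is not the real numalib type.
type Core struct {
	ID int
}

// coreByID searches the slice for a core with the given ID instead of
// treating the ID as a slice index.
func coreByID(cores []Core, id int) (Core, bool) {
	for _, c := range cores {
		if c.ID == id {
			return c, true
		}
	}
	return Core{}, false
}

// availableCores builds the example set from the commit message:
// IDs 1 through 47 with 24 missing (and no core 0), 46 cores in total.
func availableCores() []Core {
	cores := make([]Core, 0, 46)
	for id := 1; id <= 47; id++ {
		if id == 24 {
			continue
		}
		cores = append(cores, Core{ID: id})
	}
	return cores
}

func main() {
	cores := availableCores()
	fmt.Println("number of cores:", len(cores))

	// Valid indices are 0..45, so an expression like cores[46] would
	// panic with an out-of-range error. A lookup by ID has no such problem.
	if c, ok := coreByID(cores, 47); ok {
		fmt.Println("found core with ID", c.ID)
	}

	// IDs that are absent from the set are reported as missing.
	_, ok := coreByID(cores, 24)
	fmt.Println("core 24 present:", ok)
}
```

The linear search mirrors the trade-off noted in the commit message: with a small number of cores, a scan is cheap, and a map keyed by ID would be the larger restructuring that the message mentions as a possible long-term alternative.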


949 Commits