Both the cluster reconciler and node reconciler emit a debug-level log line with
their results, but these are unstructured multi-line logs that are annoying for
operators to parse. Change these to emit structured key-value pairs like we do
everywhere else.
Ref: https://hashicorp.atlassian.net/browse/NMD-818
Ref: https://go.hashi.co/rfc/nmd-212
As part of ongoing work to make the scheduler more legible and more robustly
tested, we're implementing property testing of at least the reconciler. This
changeset provides some infrastructure we'll need for generating the test cases
using `pgregory.net/rapid`, without building out any of the property assertions
yet (that'll be in upcoming PRs over the next couple weeks).
The alloc reconciler generator produces a job, a previous version of the job, a
set of tainted nodes, and a set of existing allocations. The node reconciler
generator produces a job, a set of nodes, and allocations on those
nodes. Reconnecting allocs are not yet well-covered by these generators, and
with ~40 dimensions covered so far we may need to pull those out to their own
tests in order to get good coverage.
Note the scenarios only randomize fields of interest; fields like the job name
that don't impact the reconciler would use up available shrink cycles on failed
tests without actually reducing the scope of the scenario.
Ref: https://hashicorp.atlassian.net/browse/NMD-814
Ref: https://github.com/flyingmutant/rapid
This changeset separates reconciler fields into their own sub-struct to make
testing easier and the code more explicit about what fields relate to which
state.
Cluster reconciler code is notoriously hard to follow because most of its
method continuously mutate the fields of the allocReconciler object. Even
for top-level methods it makes the code hard to follow, but gets really gnarly
with lower-level methods (of which there are many). This changeset proposes a
refactoring that makes the vast majority of said methods return explicit values,
and avoid mutating object fields.