I noticed that `watchNodeUpdates()` calls `retryRegisterNode()` almost
immediately after `registerAndHeartbeat()` runs (well, after 5
seconds).
This call is unnecessary and made debugging a bit harder. So here, we
ensure that we only re-register the node for new node events, not for
the initial registration.
Here we retain the 0.8.7 behavior of waiting for driver fingerprints
before registering a node, with some timeout. This is needed for system
jobs, as system job scheduling for a node occurs at node registration,
and the race might mean that a system job does not get placed on the
node because of missing drivers.
The timeout isn't strictly necessary, but this raises it to 1 minute, as
that is closer to blocking indefinitely than 1 second is. We need to
keep the value high enough to capture as many drivers/devices as
possible, but low enough that it doesn't risk blocking too long due to a
misbehaving plugin.
Fixes https://github.com/hashicorp/nomad/issues/5579
Renewal time was being calculated as 10s+Intn(lease-10s), so the renewal
time could be very rapid or within 1s of the deadline: [10s, lease)
This commit fixes the renewal time by calculating it as:
(lease/2) +/- 10s
For a lease of 60s this means the renewal will occur in [20s, 40s).
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f.
Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993.
Revert "structs missed applying a style change from the review"
This reverts commit 658916e327.
Revert "client, structs comments"
This reverts commit be2838d6ba.
Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430.
Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9.
Revert "client Networks Equals is set equality"
This reverts commit f8d432345b.
Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411ca.
Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665e.
Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab1.
Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6.
Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e.
Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 6897825240.
Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77b.
Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb.
Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace52.
Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6.
Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23.
This adds a `nomad alloc restart` command and API that allow a job operator
with the alloc-lifecycle ACL to perform an in-place restart of a Nomad
allocation, or a given subtask.
Remove runLaunched tracking as Run is *always* called for killable
TaskRunners. TaskRunners which fail before Run can be called (during
NewTaskRunner or Restore) are not killable as they're never added to the
client's alloc map.
This PR switches to using plain fifo files instead of Go structs
managed by the containerd/fifo library.
The library's main benefit is managing the opening of fifo files. On
Linux, a reader's `open()` request blocks until a writer opens the file
(and vice versa). The library uses goroutines so that it's the first IO
operation that blocks instead.
This benefit isn't really useful for us: given that logmon simply
streams output in a separate process, blocking on open and blocking on
the first read are effectively the same.
The library also adds complications around managing state and tracking
read/write permissions that seem like overhead for our use, compared to
using a file directly.
While here, I made the following incidental changes:
* document that we handle fifo files that already exist, as we
rely on that behavior for logmon restarts
* use the type system to lock read vs write: currently, the fifo library
returns `io.ReadWriteCloser` even if the fifo is opened for writing only!
I chose to make them more like integration tests since there's a lot more
plumbing involved. The internal implementation details of how we craft
task envs can now change and these tests will still properly assert that
the task runtime environment is set up properly.
Noticed that the protobuf files are out of sync with the ones generated by the 1.2.0 protoc go plugin.
The cause seems to be related to release processes, e.g. [0.9.0-beta1 preparation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preparation](b849d84f2f).
This restores the files to what the pinned protoc version generates and fails the build if protobuf files are out of sync. A sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085