In #8435 (shipped in 0.12.1), we updated the `Job.Register` RPC to atomically
write the eval along with the job. But this change wasn't carried over to
`Job.Dispatch`. Under heavy load testing, we demonstrated that this can result
in dispatched jobs without corresponding evals.
Update the dispatch RPC to write the eval in the same Raft log entry as the job
registration. Note that we don't need to version-check this change for upgrades,
because the register and dispatch RPCs share the same `JobRegisterRequestType`
Raft message, and therefore all supported server versions already look for the
eval in the FSM. If an updated leader includes the eval, older followers will
write the eval. If a non-updated leader writes the eval in a separate Raft
entry, updated followers will write those evals normally.
Fixes: https://github.com/hashicorp/nomad/issues/26655
Ref: https://hashicorp.atlassian.net/browse/NMD-947
Ref: https://github.com/hashicorp/nomad/pull/8435
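A toy model of why the shared Raft message keeps mixed-version clusters safe: the
FSM applies the optional eval in the same apply call as the job, and still accepts
a separate eval-only entry from an older leader. All types and function names here
(`fsmState`, `applyJobRegister`, `applyEvalUpdate`) are hypothetical simplifications,
not Nomad's real structs or FSM code.

```go
package main

import "fmt"

// Job and Evaluation are hypothetical, heavily simplified stand-ins.
type Job struct{ ID string }

type Evaluation struct{ ID, JobID string }

// JobRegisterRequest models the payload of one JobRegisterRequestType Raft
// entry. Eval is optional: a leader without the fix leaves it nil and submits
// the eval in a separate Raft entry instead.
type JobRegisterRequest struct {
	Job  *Job
	Eval *Evaluation
}

// fsmState is a toy stand-in for the FSM's state store.
type fsmState struct {
	jobs  map[string]*Job
	evals map[string]*Evaluation
}

func newFSMState() *fsmState {
	return &fsmState{jobs: map[string]*Job{}, evals: map[string]*Evaluation{}}
}

// applyJobRegister applies a single Raft entry. The job and the optional eval
// are written by the same call, so when the leader includes both, no follower
// can apply the job without also applying its eval.
func (s *fsmState) applyJobRegister(req JobRegisterRequest) {
	s.jobs[req.Job.ID] = req.Job
	if req.Eval != nil {
		s.evals[req.Eval.ID] = req.Eval
	}
}

// applyEvalUpdate models the separate, eval-only Raft entry that an older
// leader submits after the job entry.
func (s *fsmState) applyEvalUpdate(eval *Evaluation) {
	s.evals[eval.ID] = eval
}

func main() {
	s := newFSMState()

	// Updated leader: job and eval travel in one entry.
	s.applyJobRegister(JobRegisterRequest{
		Job:  &Job{ID: "dispatch-1"},
		Eval: &Evaluation{ID: "eval-1", JobID: "dispatch-1"},
	})

	// Older leader: job entry first, eval entry second. Both paths converge.
	s.applyJobRegister(JobRegisterRequest{Job: &Job{ID: "dispatch-2"}})
	s.applyEvalUpdate(&Evaluation{ID: "eval-2", JobID: "dispatch-2"})

	fmt.Println(len(s.jobs), len(s.evals)) // 2 2
}
```

The gap the fix closes is the window between the two entries in the older path,
where a crash or leadership change could leave the job committed without its eval.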
This changeset adds system scheduler tests of various permutations of the `update`
block. It also fixes a number of bugs discovered in the process.
* Don't create a deployment for an in-flight rollout. If a system job is in the
middle of a rollout prior to upgrading to a version of Nomad with system
deployments, we'll end up creating a system deployment which might never
complete because previously placed allocs will not be tracked. Check to see if
we have existing allocs that should belong to the new deployment and prevent a
deployment from being created in that case.
* Ensure we call `Copy` on `Deployment` to avoid state store corruption.
* Don't limit canary counts by `max_parallel`.
* Never create deployments for `sysbatch` jobs.
Ref: https://hashicorp.atlassian.net/browse/NMD-761
In the system scheduler, we need to keep track of which nodes were previously used
as "canary nodes" and not pick them at random, in case of previously failed
canaries or changes to the number of canaries in the jobspec.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Typically the `LOGNAME` environment variable should be set according
to the values within `/etc/passwd` and represents the name of the
logged-in user. This should be set, where possible, alongside the
`USER` and `HOME` variables for all drivers that use the shared
executor and do not use a sub-shell.
don't require "bridge" network mode when using connect{}
we document this as "at your own risk" because CNI configuration
is so flexible that we can't guarantee a user's network will work,
but Nomad's "bridge" CNI config may be used as a reference.
Currently, every time a client starts, it creates a new Consul token per service or task. This PR changes that behaviour: it persists the Consul ACL token to the client state and looks up an existing token before creating a new one.
Fixes: #20184
Fixes: #20185
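A minimal sketch of the lookup-before-create flow, assuming two small
interfaces: one over the client's persisted state and one over the Consul ACL
API. `TokenStore`, `ACLClient`, and `getOrCreateToken` are hypothetical names,
not Nomad's or Consul's real APIs; the in-memory fakes exist only so the sketch
runs.

```go
package main

import "fmt"

// TokenStore is a hypothetical view of the client's persistent state.
type TokenStore interface {
	Get(key string) (secretID string, ok bool)
	Put(key, secretID string)
}

// ACLClient is a hypothetical view of the Consul ACL API.
type ACLClient interface {
	TokenExists(secretID string) (bool, error)
	CreateToken(key string) (string, error)
}

// getOrCreateToken reuses a persisted token if Consul still knows about it,
// and only creates (and persists) a new token otherwise.
func getOrCreateToken(store TokenStore, acl ACLClient, key string) (string, error) {
	if secret, ok := store.Get(key); ok {
		exists, err := acl.TokenExists(secret)
		if err != nil {
			return "", fmt.Errorf("looking up token: %w", err)
		}
		if exists {
			return secret, nil
		}
	}
	secret, err := acl.CreateToken(key)
	if err != nil {
		return "", fmt.Errorf("creating token: %w", err)
	}
	store.Put(key, secret)
	return secret, nil
}

// memStore and fakeACL are in-memory fakes for demonstration.
type memStore map[string]string

func (m memStore) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

func (m memStore) Put(key, secretID string) { m[key] = secretID }

type fakeACL struct {
	live    map[string]bool
	created int
}

func (f *fakeACL) TokenExists(secretID string) (bool, error) { return f.live[secretID], nil }

func (f *fakeACL) CreateToken(key string) (string, error) {
	f.created++
	secret := fmt.Sprintf("%s-token-%d", key, f.created)
	f.live[secret] = true
	return secret, nil
}

func main() {
	state := memStore{}
	acl := &fakeACL{live: map[string]bool{}}
	first, _ := getOrCreateToken(state, acl, "web")
	// Simulated client restart: the state survives, so no new token is created.
	second, _ := getOrCreateToken(state, acl, "web")
	fmt.Println(first == second, acl.created) // true 1
}
```

Checking that the persisted token still exists covers the case where it was
revoked or deleted in Consul while the client was down.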
Look, I know I misspelled "locater" in the code comment, but it's easier to acknowledge that here in this commit message than it is to push a new commit with all the test/approval machinery in GitHub.
This changeset adjusts the handling of allocation placement when we're
promoting a deployment, and it corrects the behavior of `isDeploymentComplete`,
which previously would never mark a promoted deployment as complete.
The TestVolumeWatch_LeadershipTransition test was a little racy,
and the fix required adding an eventually wrapper to the end of
the test. While doing this work, it seemed fitting to migrate the
package's assertions to the `must` library as well.
When creating constants with a custom type, each definition should
include the type. If only the first constant declares it, the
other constants are untyped and do not share its type.
This change fixes occurrences of this and enables SA9004 within CI
linting to catch future problems while the change is in review.
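The pitfall in a small example. `Mode` and `Status` are illustrative types, not
ones from the Nomad codebase: in the first block only `ModeA` has type `Mode`,
while `ModeB` and `ModeC` are untyped string constants, which is the pattern
staticcheck's SA9004 reports. The second block is the fixed form.

```go
package main

import "fmt"

type Mode string

// Only ModeA is typed; ModeB and ModeC are untyped string constants.
const (
	ModeA Mode = "a"
	ModeB      = "b"
	ModeC      = "c"
)

type Status string

// Fixed form: every constant names its type.
const (
	StatusPending Status = "pending"
	StatusRunning Status = "running"
	StatusDone    Status = "done"
)

func main() {
	fmt.Printf("%T %T\n", ModeA, ModeB)              // main.Mode string
	fmt.Printf("%T %T\n", StatusPending, StatusDone) // main.Status main.Status
}
```

An untyped `ModeB` still compiles wherever a `Mode` is expected, which is why the
mistake is easy to miss until the value ends up in an `interface{}` or a type
switch.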
Adds a new `windows` command which is available when running on
a Windows host. The command includes two new subcommands:
* `service install`
* `service uninstall`
The `service install` command will install the called binary into
the Windows Program Files directory, create a new Windows service,
set up configuration and data directories, and register the service
with the Windows Eventlog. If the service and/or binary already
exist, the service will be stopped, service and eventlog updated
if needed, binary replaced, and the service started again.
The `service uninstall` command will stop the service, remove the
Windows service, and deregister the service with the eventlog. It
will not remove the configuration/data directory nor will it remove
the installed binary.
Defines a `winsvc.Event` type which can be sent using the `winsvc.SendEvent`
function. If Nomad is running on Windows and can send to the Windows
Eventlog, the event will be sent. Initial event types are defined for
starting, ready, stopped, and log message.
The `winsvc.EventLogger` provides an `io.WriteCloser` that can be included
in the logger's writers collection. It will extract the log level from
log lines and write them appropriately to the eventlog. The eventlog
only supports error, warning, and info levels so messages with other
levels will be ignored.
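A minimal sketch of the level-extraction idea, assuming hclog-style lines where
the first bracketed token is the level (`[INFO]`, `[WARN]`, and so on). The
`levelWriter` type, its `sink` callback, and `parseLevel` are hypothetical
stand-ins, not the real `winsvc.EventLogger` or the Windows Eventlog API.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Severity mirrors the three levels the Windows Eventlog accepts.
type Severity int

const (
	SevInfo Severity = iota
	SevWarning
	SevError
)

// levelWriter forwards recognised lines to sink, a stand-in for the Eventlog call.
type levelWriter struct {
	sink func(sev Severity, msg string)
}

var _ io.WriteCloser = (*levelWriter)(nil)

// Write handles one or more newline-terminated log lines. Unsupported levels
// are dropped, but the full length is reported so an io.MultiWriter fanning
// out to several writers does not see a short write.
func (w *levelWriter) Write(p []byte) (int, error) {
	for _, line := range strings.Split(strings.TrimRight(string(p), "\n"), "\n") {
		if sev, ok := parseLevel(line); ok {
			w.sink(sev, line)
		}
	}
	return len(p), nil
}

func (w *levelWriter) Close() error { return nil }

// parseLevel reads the first "[LEVEL]" tag in the line.
func parseLevel(line string) (Severity, bool) {
	start := strings.Index(line, "[")
	if start < 0 {
		return 0, false
	}
	end := strings.Index(line[start:], "]")
	if end < 0 {
		return 0, false
	}
	switch line[start+1 : start+end] {
	case "ERROR":
		return SevError, true
	case "WARN":
		return SevWarning, true
	case "INFO":
		return SevInfo, true
	default: // TRACE, DEBUG, or anything unrecognised
		return 0, false
	}
}

func main() {
	var got []Severity
	w := &levelWriter{sink: func(sev Severity, _ string) { got = append(got, sev) }}
	fmt.Fprint(w, "2024-01-01T00:00:00Z [INFO]  agent: started\n"+
		"2024-01-01T00:00:01Z [DEBUG] agent: noisy detail\n"+
		"2024-01-01T00:00:02Z [ERROR] agent: failed\n")
	fmt.Println(got) // [0 2]
}
```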
A new configuration block is included for enabling logging to the
eventlog. Logging must be enabled with the `log_level` option and
the `eventlog.level` value can then be of the same or higher severity.
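A minimal sketch of the severity rule, assuming the usual agent levels ordered
TRACE < DEBUG < INFO < WARN < ERROR. `validateEventlogLevel` is a hypothetical
check, not Nomad's config validation; our assumed rationale is that lines filtered
out by `log_level` never reach the eventlog writer, so a less severe
`eventlog.level` could never take effect.

```go
package main

import (
	"fmt"
	"strings"
)

// levelRank orders the agent log levels from least to most severe.
var levelRank = map[string]int{"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}

// validateEventlogLevel requires eventlog.level to be at least as severe as log_level.
func validateEventlogLevel(logLevel, eventlogLevel string) error {
	agent, ok := levelRank[strings.ToUpper(logLevel)]
	if !ok {
		return fmt.Errorf("unknown log_level %q", logLevel)
	}
	event, ok := levelRank[strings.ToUpper(eventlogLevel)]
	if !ok {
		return fmt.Errorf("unknown eventlog.level %q", eventlogLevel)
	}
	if event < agent {
		return fmt.Errorf("eventlog.level %q must be the same as or more severe than log_level %q",
			eventlogLevel, logLevel)
	}
	return nil
}

func main() {
	fmt.Println(validateEventlogLevel("info", "warn"))        // <nil>
	fmt.Println(validateEventlogLevel("warn", "info") != nil) // true
}
```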
Provides interfaces to the Windows service manager and Windows
services. These interfaces support creating new Windows services,
deleting Windows services, configuring Windows services, and
registering/deregistering services with Windows Eventlog.
A path helper is included to support expansion of paths using a
subset of known folder IDs.
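A minimal sketch of subset-restricted path expansion, assuming a `${Name}`
placeholder syntax and a caller-supplied table of folder paths. Both the syntax and
`expandKnownFolders` are hypothetical; on Windows the table would presumably be
filled from the known folder APIs, while here it is a plain map so the sketch runs
anywhere. `os.Expand` is the standard library function that does the substitution.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expandKnownFolders replaces ${Name} references using folders and rejects
// any name outside that supported subset.
func expandKnownFolders(path string, folders map[string]string) (string, error) {
	var unknown []string
	out := os.Expand(path, func(name string) string {
		dir, ok := folders[name]
		if !ok {
			unknown = append(unknown, name)
		}
		return dir
	})
	if len(unknown) > 0 {
		return "", fmt.Errorf("unsupported folder ID(s): %s", strings.Join(unknown, ", "))
	}
	return out, nil
}

func main() {
	folders := map[string]string{
		"ProgramFiles": `C:\Program Files`,
		"ProgramData":  `C:\ProgramData`,
	}
	fmt.Println(expandKnownFolders(`${ProgramFiles}\Nomad\nomad.exe`, folders))
	fmt.Println(expandKnownFolders(`${Desktop}\nomad`, folders))
}
```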
A privileged helper is included to check that the process is
currently being executed with elevated privileges, which are
required for managing Windows services and modifying the registry.
The call to IMDSv1 has been failing since we switched to v2, which
meant the UI e2e script attempted to use the service IP address
for its tests. The service IP address is the Nomad client's
private address, which is not routable from the e2e test runner,
so the test fails.
This change updates the IP discovery to use IMDSv2, so the
address is correctly populated and routable. The change also performs
this discovery via a job action within the proxy job. This
exercises that feature and uses it in the way it was
designed to be used.
Ensuring the keyring is ready before starting the Nomad client in
the client intro e2e test speeds up execution. This is because the
client does not have to wait to retry failed registrations due to
the keyring not being ready.
The new client intro test mimics the Consul and Vault compat tests
and uses local agents to perform the required setup. This method
gives us the flexibility to test strict enforcement mode
in the future.
The test suite will now be triggered from the test-e2e CI run
and can also be called by a make target.
Because the Enterprise code has a set of copywrite exclusion entries below the
one listed here in CE, we need to make sure that the last CE line in the
configuration file ends in a comma.
This adds artifact inspection after download to detect any issues
with the content fetched. Currently this means checking for any
symlinks within the artifact that resolve outside the task or
allocation directories. On platforms where lockdown is available
(some Linux systems), this inspection is not performed.
The inspection can be disabled with the `DisableArtifactInspection`
option. A dedicated option for disabling this behavior allows
the `DisableFilesystemIsolation` option to be enabled but still
have artifacts inspected after download.