If a Nomad job is started with a large number of instances (e.g. 4 billion),
then the Nomad servers that attempt to schedule it will run out of memory and
crash. While it's unlikely that anyone would intentionally schedule a job with 4
billion instances, we have occasionally run into issues with bugs in external
automation. For example, an automated deployment system running on a test
environment had an off-by-one error, and deployed a job with count = uint32(-1),
causing the Nomad servers for that environment to run out of memory and crash.
To guard against this, this PR introduces a `job_max_count` Nomad server
configuration parameter, which limits the number of allocs that may be created
from a single job. The default value is 50000; this is low enough that a job
with the maximum possible number of allocs will not require much memory on the
server, but still much higher than the number of allocs in the largest Nomad
job we have ever run.
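A minimal sketch of the guard this implies at registration time, with illustrative types and names rather than Nomad's actual code:

```go
import "fmt"

type TaskGroup struct{ Count int }
type Job struct{ TaskGroups []*TaskGroup }

// validateJobCount is a hypothetical helper: it sums the group counts and
// rejects the job if the total exceeds the configured job_max_count.
func validateJobCount(job *Job, jobMaxCount int) error {
	total := 0
	for _, tg := range job.TaskGroups {
		total += tg.Count
	}
	if total > jobMaxCount {
		return fmt.Errorf("job requires %d allocs, exceeding job_max_count (%d)",
			total, jobMaxCount)
	}
	return nil
}
```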
* Add preserve-resources flag when registering a job
* Add preserve-resources flag to website docs
* Add changelog
* Update tests, docs
* Preserve counts & resources in fsm
* Update doc
* Update preservation of resources/count to happen in StateStore
On Windows, the `os.Process.Signal` method returns an error when sending
`os.Interrupt` (SIGINT) because it isn't implemented. This causes test servers
in the `testutil` packages to break on Windows. Use the platform specific
syscalls to generate the SIGINT instead.
The agent's signal handler also did not correctly handle Ctrl-C because we
were masking os.Interrupt instead of SIGINT.
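A hedged sketch of the platform-specific call, assuming `golang.org/x/sys/windows`; the helper name is illustrative:

```go
//go:build windows

import "golang.org/x/sys/windows"

// sendInterrupt replaces os.Process.Signal(os.Interrupt), which is
// unimplemented on Windows, with a console ctrl event. CTRL_BREAK_EVENT
// can be delivered to a specific process group; CTRL_C_EVENT cannot.
func sendInterrupt(pid int) error {
	return windows.GenerateConsoleCtrlEvent(windows.CTRL_BREAK_EVENT, uint32(pid))
}
```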
Fixes: https://github.com/hashicorp/nomad/issues/26775
Co-authored-by: Chris Roberts <croberts@hashicorp.com>
When calling the client identity renew API, the target node ID may be
provided either in the URI or within the request body. This change fixes a
bug where all calls using a node_id query parameter would be rejected
because the handler failed to decode the empty request body.
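The shape of the fix, sketched with illustrative names (not Nomad's exact handler code): an empty body must be tolerated when the node ID already arrived via the query string.

```go
import (
	"encoding/json"
	"errors"
	"io"
	"net/http"
)

// decodeBody tolerates an empty request body: json.Decoder returns io.EOF
// when there is nothing to read, which is fine if node_id came from the URI.
func decodeBody(req *http.Request, out any) error {
	if err := json.NewDecoder(req.Body).Decode(out); err != nil && !errors.Is(err, io.EOF) {
		return err // a genuinely malformed body is still rejected
	}
	return nil
}
```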
Co-authored-by: Tim Gross <tgross@hashicorp.com>
don't require "bridge" network mode when using connect{}
we document this as "at your own risk" because CNI configuration
is so flexible that we can't guarantee a user's network will work,
but Nomad's "bridge" CNI config may be used as a reference.
When creating constants with a custom type, each constant in the group should
repeat the type. If only the first constant specifies it, the remaining
constants take the default type of their values rather than the custom type.
This change fixes occurrences of this and enables SA9004 within CI
linting to catch future problems while the change is in review.
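An illustrative example of the pattern SA9004 catches (not code from the Nomad tree):

```go
type TaskState string

const (
	// Only the first constant names the type, so TaskRunning below is an
	// untyped string constant rather than a TaskState -- exactly what
	// staticcheck's SA9004 flags.
	TaskPending TaskState = "pending"
	TaskRunning           = "running"
)

const (
	// Correct: every constant in the group repeats the type.
	TaskDead    TaskState = "dead"
	TaskStopped TaskState = "stopped"
)
```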
Defines a `winsvc.Event` type which can be sent using the `winsvc.SendEvent`
function. If Nomad is running on Windows and can send to the Windows
Eventlog, the event will be sent. Initial event types are defined for
starting, ready, stopped, and log message.
The `winsvc.EventLogger` provides an `io.WriteCloser` that can be included
in the logger's writers collection. It will extract the log level from
log lines and write them appropriately to the eventlog. The eventlog
only supports error, warning, and info levels so messages with other
levels will be ignored.
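A minimal sketch of the level extraction, assuming hclog-style `[LEVEL]` markers in the log lines; these names are illustrative rather than the actual winsvc implementation:

```go
import "bytes"

// levelOf maps a log line to the eventlog severity it should be written
// with; lines at other levels (debug, trace) return "" and are dropped.
func levelOf(line []byte) string {
	switch {
	case bytes.Contains(line, []byte("[ERROR]")):
		return "error"
	case bytes.Contains(line, []byte("[WARN]")):
		return "warning"
	case bytes.Contains(line, []byte("[INFO]")):
		return "info"
	default:
		return ""
	}
}
```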
A new configuration block is included for enabling logging to the
eventlog. Logging must first be enabled with the `log_level` option; the
`eventlog.level` value can then be set to the same or a higher severity.
The HTTP request body contains the node ID to which the request should
be routed; without decoding it, we cannot route to anything other than
local nodes.
The `RetryJoin` function checks for an error and logs it before
retrying. The error variables were shadowed which resulted in
the errors never being logged. This predefines the variables
to prevent them from being shadowed.
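The bug in miniature, with illustrative names rather than the actual RetryJoin code:

```go
// Broken: ':=' declares a fresh err inside the loop, shadowing the outer
// variable, so the post-loop logging never fires.
func retryJoinBroken(discover func() (string, error), logf func(string, ...any)) {
	var err error
	for i := 0; i < 3; i++ {
		addr, err := discover() // shadows the outer err
		_ = addr
		if err == nil {
			return
		}
	}
	if err != nil {
		logf("join failed: %v", err) // unreachable: outer err is always nil
	}
}

// Fixed: predeclare the variables so the loop assigns rather than declares.
func retryJoinFixed(discover func() (string, error), logf func(string, ...any)) {
	var (
		addr string
		err  error
	)
	for i := 0; i < 3; i++ {
		addr, err = discover()
		if err == nil {
			_ = addr // use the discovered address
			return
		}
		logf("join attempt failed: %v", err) // now actually logged
	}
}
```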
The testlog package was also updated to support providing a custom
writer which allows logging output to be easily caught and inspected.
The Nomad clients store their Nomad identity in memory and within
their state store. While active, it is not possible to dump the
state to view the stored identity token, so having a way to view
the current claims while running aids debugging and operations.
This change adds a client identity workflow, allowing operators
to view the current claims of the node's identity. It does not
return any of the signing key material.
This change implements the client -> server workflow for Nomad
node introduction. A Nomad node can optionally be started with an
introduction token, which is a signed JWT containing claims for
the node registration. The server handles this according to the
enforcement configuration.
The introduction token can be provided by env var, cli flag, or
by placing it within a default filesystem location. The latter
option does not override the CLI or env var.
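A hedged sketch of the lookup order this describes, with illustrative names rather than Nomad's actual identifiers:

```go
import (
	"errors"
	"os"
	"strings"
)

// resolveIntroToken prefers the CLI flag, then the environment variable,
// and only then the default file location; a missing file simply means no
// token was provided.
func resolveIntroToken(flagValue, envVar, defaultPath string) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	if v := os.Getenv(envVar); v != "" {
		return v, nil
	}
	b, err := os.ReadFile(defaultPath)
	if errors.Is(err, os.ErrNotExist) {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}
```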
The region claim has been removed from the initial claims set of
the intro identity. This boundary is guarded by mTLS and aligns
with the node identity.
* Add -log-file-export and -log-lookback commands to add historical log to
debug capture
* use monitor.PrepFile() helper for other historical log tests
* Add MonitorExport command and handlers
* Implement autocomplete
* Require nomad in serviceName
* Fix race in StreamReader.Read
* Add and use framer.Flush() to coordinate function exit
* Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer
* Parameterize StreamFixed stream size
The node introduction workflow will utilise JWTs that can be used
as authentication tokens on initial client registration. This
change implements the basic builder for this JWT claim type and
the RPC and HTTP handler functionality that will expose this to
the operator.
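For shape only, a hedged sketch of such a builder using the generic golang-jwt library; Nomad's actual claim fields, JWT library, and signing setup will differ:

```go
import (
	"crypto/ed25519"
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// buildIntroToken signs a short-lived JWT carrying hypothetical node
// introduction claims; the claim names here are illustrative only.
func buildIntroToken(key ed25519.PrivateKey, nodeName, nodePool string) (string, error) {
	claims := jwt.MapClaims{
		"iss":       "nomad",                          // hypothetical issuer
		"node_name": nodeName,                         // hypothetical claim
		"node_pool": nodePool,                         // hypothetical claim
		"exp":       time.Now().Add(time.Hour).Unix(), // short-lived
	}
	return jwt.NewWithClaims(jwt.SigningMethodEdDSA, claims).SignedString(key)
}
```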
The new configuration block exposes some key options which allow
cluster administrators to control certain client introduction
behaviours.
This change introduces the new block and plumbing, so that it is
exposed in the Nomad server for consumption via internal processes.
The Nomad client will have its identity renewed according to the
TTL which defaults to 24h. In certain situations such as root
keyring rotation, operators may want to force clients to renew
their identities before the TTL threshold is met. This change
introduces a client HTTP and RPC endpoint which will instruct the
node to request a new identity at its next heartbeat. This can be
used via the API or a new command.
While this is a manual intervention step on top of any keyring
rotation, it dramatically reduces the initial feature complexity
as it provides an asynchronous and efficient method of renewal that
utilises existing functionality.
When performing a graceful shutdown, the client drain configuration
is checked for a deadline which is appended to the timeout. When the
agent is running as a server only, the client will not be set, and
attempting to get the drain deadline will result in a panic. This
change checks that the client is available prior to fetching the
deadline value.
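The guard, sketched with stub types standing in for the real agent structures:

```go
import "time"

type client struct{ deadline time.Duration }

func (c *client) drainDeadline() time.Duration { return c.deadline }

type agent struct{ client *client }

// shutdownTimeout only consults the drain deadline when a client exists;
// server-only agents leave a.client nil.
func (a *agent) shutdownTimeout(base time.Duration) time.Duration {
	if a.client == nil {
		return base
	}
	return base + a.client.drainDeadline()
}
```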
When a Nomad client registers or re-registers, the RPC handler will
generate and return a node identity if required. When an identity
is generated, the signing key ID will be stored within the node
object, to ensure a root key is not deleted while it is still in use.
During normal client operation it will periodically heartbeat to
the Nomad servers to indicate aliveness. The RPC handler that
is used for this action has also been updated to conditionally
perform identity generation. Performing it here means no extra RPC
handlers are required and we inherit the jitter in identity
generation from the heartbeat mechanism.
The identity generation check methods are performed from the RPC
request arguments, so they are scoped to the required behaviour and
can handle the nuance of each RPC. Failure to generate an identity
is considered terminal to the RPC call. The client will include
behaviour to retry this error, which is always caused by the
encrypter not being ready unless the server's keyring has been
corrupted.
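A sketch of that retry behaviour under an assumed sentinel error; none of these names are Nomad's actual API:

```go
import (
	"errors"
	"time"
)

var errEncrypterNotReady = errors.New("encrypter not ready")

// registerWithRetry retries only the not-ready error; anything else is
// terminal, matching the behaviour described above.
func registerWithRetry(register func() error) error {
	for {
		err := register()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errEncrypterNotReady) {
			return err
		}
		time.Sleep(time.Second) // key material should become available shortly
	}
}
```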
When a test starts an agent with the client enabled, we can wait
until the client reaches the ready state within the setup method.
This mimics what we already do with leadership and the root
keyring, and should reduce flaky tests that assume the client
is ready as soon as the setup function returns, which is not
guaranteed.
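A hedged sketch of the readiness gate inside the test setup, using Nomad's `testutil.WaitForResult` helper; the `Ready()` accessor is illustrative:

```go
testutil.WaitForResult(func() (bool, error) {
	if !agent.Client().Ready() { // hypothetical readiness accessor
		return false, errors.New("client not ready")
	}
	return true, nil
}, func(err error) {
	t.Fatalf("timed out waiting for client readiness: %v", err)
})
```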
The change exposed a couple of TLS reload tests which were not
using the test agent correctly. They were setting up a client even
though it would never be able to join the cluster due to TLS
configuration issues. These have been fixed.
When performing a graceful shutdown, a channel is used to wait for
the agent to leave. The channel is closed when the agent leaves
successfully, but it is also closed within a deferral. If the
agent successfully leaves and closes the channel, a panic will
occur when the channel is closed a second time within the
deferral. To prevent this from occurring, the channel close
is wrapped within a `OnceFunc` so the channel is only closed
once.
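A minimal sketch of the fix using `sync.OnceFunc` (Go 1.21+):

```go
import "sync"

func gracefulLeave(leave func() error) {
	left := make(chan struct{})
	closeLeft := sync.OnceFunc(func() { close(left) })
	defer closeLeft() // the second call is now a no-op, not a panic

	go func() {
		_ = leave()
		closeLeft() // success path also closes the channel
	}()

	<-left
}
```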
While waiting for the agent to leave during a graceful shutdown,
the wait can be interrupted immediately if another signal is
received. It is common that while waiting a `SIGPIPE` is received
from journald, causing the wait to end early. This results in the
agent not finishing the leave process and reporting an error when
the process has stopped. Instead of allowing any signal to interrupt
the wait, the received signal is checked, and if it is a `SIGPIPE`
the agent continues waiting.
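Sketched as a wait loop with illustrative names:

```go
import (
	"os"
	"syscall"
)

// waitForLeave keeps waiting through SIGPIPE (commonly sent by journald)
// so the leave process is not cut short.
func waitForLeave(left <-chan struct{}, signals <-chan os.Signal) {
	for {
		select {
		case <-left:
			return // leave completed
		case sig := <-signals:
			if sig == syscall.SIGPIPE {
				continue // ignore and keep waiting
			}
			return // any other signal interrupts the wait
		}
	}
}
```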
When a node is garbage collected, any dynamic host volumes on the node are
orphaned in the state store. We generally don't want to automatically collect
these volumes and risk data loss, and have provided a CLI flag to `-force`
remove them in #25902. But for clusters running on ephemeral cloud
instances (ex. AWS EC2 in an autoscaling group), deleting host volumes may add
excessive friction. Add a configuration knob to the client configuration to
remove host volumes from the state store on node GC.
Ref: https://github.com/hashicorp/nomad/pull/25902
Ref: https://github.com/hashicorp/nomad/issues/25762
Ref: https://hashicorp.atlassian.net/browse/NMD-705
* Set MaxAllocations in client config
* Add NodeAllocationTracker struct to Node struct
* Evaluate MaxAllocations in AllocsFit function (sketched below)
* Set up CLI config parsing
* Integrate maxAllocs into AllocatedResources view
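A hedged sketch of the count check; Nomad's real AllocsFit also evaluates CPU, memory, disk, and network feasibility, and these struct fields are illustrative:

```go
type Node struct{ MaxAllocations int }
type Allocation struct{}

// allocsFit rejects a proposal outright when it exceeds the node's
// configured allocation ceiling; zero means unlimited.
func allocsFit(node *Node, proposed []*Allocation) (bool, string) {
	if node.MaxAllocations > 0 && len(proposed) > node.MaxAllocations {
		return false, "max allocations exceeded"
	}
	// ... resource feasibility checks continue here ...
	return true, ""
}
```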
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This introduces a new HTTP endpoint (and an associated CLI command) for querying
the ACL policies associated with a workload identity. It allows users who want
to understand their ACL capabilities from within WI tasks to see what sort of
policies are enabled.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Nomad Enterprise users operating in air-gapped or otherwise secured environments
don't want to send license reporting metrics directly from their
servers. Implement manual/offline reporting by periodically recording usage
metrics snapshots in the state store, and providing an API and CLI by which
cluster administrators can download the snapshot for review and out-of-band
transmission to HashiCorp.
This is the CE portion of the work required for the implementation in the
Enterprise product. Nomad CE does not perform utilization reporting.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673
Ref: https://hashicorp.atlassian.net/browse/NMD-68
Ref: https://go.hashi.co/rfc/nmd-210
From an operator's point of view, server startup could appear to
"hang" if a key loaded from the FSM at startup could not be
decrypted or replicated.
To prevent this from happening, the server startup function
will now use a timeout when waiting for the encrypter to be ready. If
the timeout is reached, the error is sent back to the caller, which
fails the CLI command. Bubbling the error up also flushes it to the
logs, providing additional operator feedback.
Only keys loaded from the FSM snapshot and trailing logs matter
before the encrypter can be classed as ready. So that the encrypter's
ready function does not get blocked by keys added outside of the
initial Raft load, we take a snapshot of the in-flight decryption
tasks as we enter the blocking call, and class these as our barrier.
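A sketch of the bounded wait, assuming an illustrative `WaitReady` method on the encrypter:

```go
import (
	"context"
	"fmt"
	"time"
)

// waitForEncrypter bounds the startup wait so an undecryptable key fails
// the CLI command with a visible error instead of hanging forever.
func waitForEncrypter(enc interface{ WaitReady(context.Context) error }) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	if err := enc.WaitReady(ctx); err != nil {
		return fmt.Errorf("timed out waiting for keyring to be ready: %w", err)
	}
	return nil
}
```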