* func: add initial enos skeleton
* style: add headers
* func: change the variables input to a map of objects to simplify the workloads creation
* style: formatting
* Add tests for servers and clients
* style: separate the tests in different scripts
* style: add missing headers
* func: add tests for allocs
* style: improve output
* func: add step to copy remote upgrade version
* style: hcl formatting
* fix: remove the terraform nomad provider
* fix: Add clean token to remove extra new line added in provision
* fix: add missing license headers
* style: hcl fmt
* style: rename variables and fix format
* func: remove the template step on the workloads module and chop the nomad token output on the provide module
* fix: correct the jobspec path on the workloads module
* fix: add missing variable definitions on job specs for workloads
* style: formatting
* fix: Add clean token to remove extra new line added in provision
* func: add module to upgrade servers
* style: missing headers
* func: add upgrade module
* func: add install for windows as well
* func: add an intermediate module that runs the upgrade server for each server
* fix: add missing license headers
* fix: remove extra input variables and connect upgrade servers to the scenario
* fix: rename missing env variables for cluster health scripts
* func: move the cluster health test outside of the modules and into the upgrade scenario
* fix: fix the regex to ignore snap files on the gitignore file
* fix: Add clean token to remove extra new line added in provision
* fix: remove extra input variables and connect upgrade servers to the scenario
* style: formatting
* fix: move taking and restoring snapshots out of the upgrade_single_server to avoid possible race conditions
* fix: rename variable in health test
* fix: Add clean token to remove extra new line added in provision
* func: add an intermediate module that runs the upgrade server for each server
* fix: Add clean token to remove extra new line added in provision
* func: fix the last_log_index check and add a versions check
* func: don't use for_each when upgrading the servers, hardcode each one to ensure they are upgraded one by one
* Update enos/modules/upgrade_instance/variables.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update enos/modules/upgrade_instance/variables.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update enos/modules/upgrade_instance/variables.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: make snapshot by calling every server and allowing stale data
* style: formatting
* fix: make the source for the upgrade binary unknown until apply
* func: use enos bundle to install remote upgrade version, enos_files is not meant for dynamic files
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Internally, sizes are always in binary units; this documentation is misleading and implies that they work in decimal units.
Without going through and replacing _every_ "MB" with "MiB", this is the best way to hint to developers that binary sizes are used.
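To make the difference concrete, here is a minimal sketch of the arithmetic. The helper names `mibToBytes` and `decimalMBToBytes` are hypothetical and are not taken from the Nomad code base; they only show how far apart the two readings of "256 MB" are.

```go
package main

import "fmt"

// MiB is one mebibyte: 2^20 bytes (binary unit).
const MiB = 1 << 20

// mibToBytes treats the value as a binary size, which is how a
// documented "MB" value is interpreted internally.
// Hypothetical helper, for illustration only.
func mibToBytes(n int64) int64 {
	return n * MiB
}

// decimalMBToBytes shows what the same number would mean if "MB"
// were read as a decimal (SI) unit: 10^6 bytes.
func decimalMBToBytes(n int64) int64 {
	return n * 1000 * 1000
}

func main() {
	fmt.Println(mibToBytes(256))       // 268435456
	fmt.Println(decimalMBToBytes(256)) // 256000000
}
```

The two readings differ by roughly 4.9% at this scale, which is why the hint matters.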
* Adds Actions to job status command output
* Status documentation updated to show actions and formatJobActions no longer cares about pipe delineation
* Multi-condition start/revert/edit buttons when a job isn't running
* mirage-mocked revertable jobs and acceptance tests
* Remove version-watching from job index route
The `volume delete` command doesn't allow using a prefix for the volume ID for
either CSI or dynamic host volumes. Use a prefix search and wildcard namespace
as we do for other CLI commands.
Ref: https://hashicorp.atlassian.net/browse/NET-12057
If you create a volume via `volume create/register` and want to update it later,
you need to change the volume spec to add the ID that was returned. This isn't a
very nice UX, so let's add an `-id` argument that allows you to update existing
volumes that have that ID.
Ref: https://hashicorp.atlassian.net/browse/NET-12083
* Upgrade to using hashicorp/go-metrics@v0.5.4
This also requires bumping the dependencies for:
* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)
Unlike some other HashiCorp products, Nomad's root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.
* func: add initial enos skeleton
* style: add headers
* func: change the variables input to a map of objects to simplify the workloads creation
* style: formatting
* Add tests for servers and clients
* style: separate the tests in different scripts
* style: add missing headers
* func: add tests for allocs
* style: improve output
* func: add step to copy remote upgrade version
* style: hcl formatting
* fix: remove the terraform nomad provider
* fix: Add clean token to remove extra new line added in provision
* fix: add missing license headers
* style: hcl fmt
* style: rename variables and fix format
* func: remove the template step on the workloads module and chop the nomad token output on the provide module
* fix: correct the jobspec path on the workloads module
* fix: add missing variable definitions on job specs for workloads
* style: formatting
* fix: rename variable in health test
* quota spec:
if `region_limit.storage.host_volumes` is set,
do not require that `variables` also be set,
and vice versa.
* subtract from quota usage on volume delete
* stub CE quota subtraction method
When a client restarts but can't restore a volume (e.g., the plugin is now
missing), it's removed from the node fingerprint. So we won't allow future
scheduling of the volume, but we were not updating the volume state field to
report this reasoning to operators. Make debugging easier and the state field
more meaningful by setting the value to "unavailable".
Also, remove the unused "deleted" field. We did not implement soft deletes and
aren't planning on it for Nomad 1.10.0.
Ref: https://hashicorp.atlassian.net/browse/NET-11551
When we implemented CSI, the types of the fields for access mode and attachment
mode on volume requests were defined with a prefix "CSI". This gets confusing
now that we have dynamic host volumes using the same fields. Fortunately the
original was a typedef on string, and the Go API in the `api` package just uses
strings directly, so we can change the name of the type without breaking
backwards compatibility for the msgpack wire format.
Update the names to `VolumeAccessMode` and `VolumeAttachmentMode`. Keep the CSI
and DHV specific value constant names for these fields (they aren't currently
1:1), so that we can easily differentiate in a given bit of code which values
are valid.
Ref: https://github.com/hashicorp/nomad/pull/24881#discussion_r1920702890
Nomad Enterprise will utilise new reporting metrics and the
changes here allow this work to be conducted.
The server-specific GetClientNodesCount function has been removed
from CE as this is only called within enterprise code. A new
heartbeater function allows us to get the number of active timers,
which can be used by the heartbeater metrics and any other callers
that want this data.
When a volume is updated, we merge the new definition into the old. But the
volume's context comes from the plugin and is likely not present in the user's
volume specification. This means that if the user re-submits the volume
specification to make an adjustment to the volume, we wipe out the context field,
which might be required for subsequent operations by the CSI plugin. This was
discovered to be a problem with the Terraform provider and fixed there, but it's
also a problem for users of the `volume create` and `volume register` commands.
Update the merge so that we only overwrite the value of the context if it's been
explicitly set by the user. We still need to support user-driven updates to
context for the `volume register` workflow.
Ref: https://github.com/hashicorp/terraform-provider-nomad/pull/503
Fixes: https://github.com/democratic-csi/democratic-csi/issues/438
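The merge rule can be sketched roughly as follows. The `volume` struct and `mergeVolume` function are hypothetical stand-ins, not the real types, and treating "explicitly set" as "non-empty map" is an assumption made for this example.

```go
package main

import "fmt"

// volume is a hypothetical, trimmed-down stand-in for a volume definition.
type volume struct {
	Name    string
	Context map[string]string // supplied by the plugin, or by the user on register
}

// mergeVolume applies the user's update on top of the existing volume.
// The context is only overwritten when the update carries one
// (assumption: a non-empty map means the user set it explicitly).
func mergeVolume(existing, update volume) volume {
	merged := existing
	if update.Name != "" {
		merged.Name = update.Name
	}
	if len(update.Context) > 0 {
		merged.Context = update.Context
	}
	return merged
}

func main() {
	existing := volume{
		Name:    "data",
		Context: map[string]string{"plugin_key": "plugin_value"},
	}

	// Re-submitting the spec without a context keeps the plugin's context.
	kept := mergeVolume(existing, volume{Name: "data-renamed"})
	fmt.Println(kept.Name, kept.Context)

	// A user-driven context update still takes effect.
	replaced := mergeVolume(existing, volume{Context: map[string]string{"user_key": "user_value"}})
	fmt.Println(replaced.Context)
}
```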
The `volume init` command creates example volume specifications. But one of the
values for `capability.access_mode` is not a valid value. Correct the example to
match the validation logic.
The default node configuration in the client should always set an empty
HostVolumes map. Otherwise callers can panic, e.g.:
goroutine 179 [running]:
github.com/hashicorp/nomad/client/hostvolumemanager.UpdateVolumeMap({0x36042b0, 0xc000c62a80}, 0x0, {0xc000a802a0, 0xd}, 0xc000691940)
github.com/hashicorp/nomad/client/hostvolumemanager/volume_fingerprint.go:43 +0x1b2
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints.func1({0xc000a802a0, 0xd}, 0xc000691940)
github.com/hashicorp/nomad/client/node_updater.go:54 +0xd7
github.com/hashicorp/nomad/client.(*batchNodeUpdates).batchHostVolumeUpdates(0xc000912608?, 0xc0009f2f88)
github.com/hashicorp/nomad/client/node_updater.go:417 +0x152
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints(0xc000c2d188)
github.com/hashicorp/nomad/client/node_updater.go:53 +0x1c5
created by github.com/hashicorp/nomad/client.NewClient in goroutine 1
github.com/hashicorp/nomad/client/client.go:557 +0x2069
This is a panic of the host volume manager (HVM) when restarting a client that
doesn't have any static host volumes, but does have a dynamic host volume.
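The failure mode can be reproduced in isolation with the sketch below, assuming the panic is the usual Go "assignment to entry in nil map" (the trace shows a zero value passed where the map is expected, but the panic message itself is not included above). The `nodeConfig` and `hostVolume` types and the helper functions are hypothetical simplifications.

```go
package main

import "fmt"

// hostVolume and nodeConfig are hypothetical, simplified stand-ins.
type hostVolume struct {
	Path string
}

type nodeConfig struct {
	HostVolumes map[string]*hostVolume
}

// defaultNodeConfig always returns an initialized (empty) map so that
// callers can write to it without a nil check.
func defaultNodeConfig() *nodeConfig {
	return &nodeConfig{HostVolumes: map[string]*hostVolume{}}
}

// addVolume writes into the map; this panics if the map is nil.
func addVolume(cfg *nodeConfig, name string, vol *hostVolume) {
	cfg.HostVolumes[name] = vol
}

// panics reports whether fn panicked.
func panics(fn func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	// A zero-value config has a nil map: the write panics.
	fmt.Println(panics(func() {
		addVolume(&nodeConfig{}, "dhv", &hostVolume{Path: "/tmp/dhv"})
	}))

	// The default config has an empty, non-nil map: the write succeeds.
	fmt.Println(panics(func() {
		addVolume(defaultNodeConfig(), "dhv", &hostVolume{Path: "/tmp/dhv"})
	}))
}
```

Reading from a nil map is safe in Go, which is why the bug only surfaces on the write path taken when a dynamic host volume is restored.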