This PR replaces fsouza/go-dockerclient 3rd party docker client library with
docker's official SDK.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
The main point of this dependency upgrade is to pull in the fixes in
hashicorp/yamux#127 which prevents leaking deadlocked goroutines. It has
been observed to improve the issue in hashicorp/nomad#23305 but does not
seem sufficient to fix it entirely.
Since touching yamux is a rare and scary event, I do **not** intend to
backport this. If we discover the improvements are stable and
significant enough, or if further fixes land in yamux, backporting can
be done at that time.
Fixes#20335
The major change between Raft v1.6 -> v1.7 was the introduction of the
Prevote feature. Before Prevote, when a partitioned node rejoins a
cluster it may cause an election even if the cluster was stable. Prevote
can avoid this useless election so reintroducing partitioned servers to
an otherwise stable cluster becomes seamless.
Full details: https://github.com/hashicorp/raft/pull/530
In #20335 we discussed whether or not to add a configuration option to
disable prevote in case bugs were discovered. While bugs have been found
(hence the v1.7.1 version as opposed to v1.7.0), I'm choosing to follow
Vault's lead of straightfordwardly bumping the raft dependency:
hashicorp/vault#27605 and hashicorp/vault#28218
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by a AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.
Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: https://github.com/hashicorp/nomad/issues/14852
* Upgrade consul-template to 0.39.0 to allow template queries of admin
partitions and sameness groups.
* Upgrade our Consul API to 1.29.1 because it's required for CT, and to remove
the replacement pinned version we were using to pick up some newer Consul API
features we needed in 1.7.0.
Ref: https://hashicorp.atlassian.net/browse/NET-10153
Update `runc` to 1.1.13 to pick up build support for Go 1.22.4+, in order to
ensure we've resolved errors cloning processes into Linux namespaces for
libcontainer (`exec` driver) with new versions of Go and older but still
supported versions of glibc.
This changeset has two minor quirks:
* Testing shows that the reported issues is already resolved on `main` by
upgrading to Go 1.22.4 without this dependency bump, at least for glibc 2.31.
Upgrading the dependency should make sure there isn't another glibc version
where the problem will still appear.
* This version of `runc` refers to fields in `cilium/ebpf` which are not present
in more recent versions of that library. So in order to build, we have to
downgrade `cilium/ebpf`. Fortunately, `runc` is the only consumer of that
transitive dependency.
Closes: https://github.com/hashicorp/nomad/issues/20212
Ref: https://hashicorp.atlassian.net/browse/NET-10078
We bring in `containernetworking/plugins` for the contents of a single file,
which we use in a few places for running a goroutine in a specific network
namespace. This code hasn't needed an update in a couple of years, and a good
chunk of what we need was previously vendored into `client/lib/nsutil`
already.
Updating the library via dependabot is causing errors in Docker driver tests
because it updates a lot of transient dependencies, and it's bringing in a pile
of new transient dependencies like opentelemetry. Avoid this problem going
forward by vendoring the remaining code we hadn't already.
Ref: https://github.com/hashicorp/nomad/pull/20146
Although Nomad does not use HTTP2, vulnerability scans detect our version of
`golang.org/x/net` as having an HPACK DoS vuln (GHSA-4v7x-pqxf-cx7m). Upgrade
the library so as to quiet the alerts.
Fixes: https://github.com/hashicorp/nomad-enterprise/issues/1423
A Nomad user reported an issue where template runner `View.poll` goroutines were
being leaked when using templates with many dependencies. This resource leak was
fixed in consul-template 0.37.4.
Fixes: https://github.com/hashicorp/nomad/issues/20163
Replaces #18812
Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```
see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details
Although Nomad is not vulnerable to CVE-2024-24786 because it's configured to
discard unknown messages during unmarshaling, we should upgrade so that
third-party vulnerability scanners don't detect the vulnerable version and
complain.
Also update go1.22.1 changelog entry to include CVEs
* build: upgrade to go1.22
* add cl
* build: use codecgen from go-msgpack v1.1.5+base32 and stringer 0.18.0
for compatability with go1.22
* ci: update golangci-lint to 1.56.2
* build: update hclogvet for go1.22
* build: bump to go1.22.1
* tests: swap testify for test in plugins/csi/client_test.go
* tests: swap testify for test in testutil/
* tests: swap testify for test in host_test.go
* tests: swap testify for test in plugin_test.go
* tests: swap testify for test in utils_test.go
* tests: swap testify for test in scheduler/
* tests: swap testify for test in parse_test.go
* tests: swap testify for test in attribute_test.go
* tests: swap testify for test in plugins/drivers/
* tests: swap testify for test in command/
* tests: fixup some test usages
* go: run go mod tidy
* windows: cpuset test only on linux
The Nomad client renders templates in the same privileged process used for most
other client operations. During internal testing, we discovered that a malicious
task can create a symlink that can cause template rendering to read and write to
arbitrary files outside the allocation sandbox. Because the Nomad agent can be
restarted without restarting tasks, we can't simply check that the path is safe
at the time we write without encountering a time-of-check/time-of-use race.
To protect Nomad client hosts from this attack, we'll now read and write
templates in a subprocess:
* On Linux/Unix, this subprocess is sandboxed via chroot to the allocation
directory. This requires that Nomad is running as a privileged process. A
non-root Nomad agent will warn that it cannot sandbox the template renderer.
* On Windows, this process is sandboxed via a Windows AppContainer which has
been granted access to only to the allocation directory. This does not require
special privileges on Windows. (Creating symlinks in the first place can be
prevented by running workloads as non-Administrator or
non-ContainerAdministrator users.)
Both sandboxes cause encountered symlinks to be evaluated in the context of the
sandbox, which will result in a "file not found" or "access denied" error,
depending on the platform. This change will also require an update to
Consul-Template to allow callers to inject a custom `ReaderFunc` and
`RenderFunc`.
This design is intended as a workaround to allow us to fix this bug without
creating backwards compatibility issues for running tasks. A future version of
Nomad may introduce a read-only mount specifically for templates and artifacts
so that tasks cannot write into the same location that the Nomad agent is.
Fixes: https://github.com/hashicorp/nomad/issues/19888
Fixes: CVE-2024-1329
Although Nomad itself is not vulnerable to CVE-2024-21626, we want to update
dependencies that bring in the vulnerable packages so as not to trip
vulnerability scanners. Update `containerd` and `go-dockerclient` as well as the
various transitive dependencies these bring in.