Nomad v1.9.0 (finally!) removes support for HCL1 and the `-hcl1` flag.
See #23912 for details.
One of the uses of HCL1 over HCL2 was that HCL1 allowed quoted keys in
blocks such as env, meta, and Docker's labels:
```hcl
some_block {
"foo.bar" = "baz"
}
```
This works in HCL1 but is invalid HCL2. In HCL2 you must use a map
instead of a block:
```hcl
some_map = {
"eggs.spam" = "works!"
}
```
This was such a hassle for users we special cased the `env` and `meta`
blocks to be accepted as blocks or maps in #9936.
However Docker `labels`, being a task config option, is much harder to
special case and commonly needs dots-in-keys for things like DataDog
autodiscovery via Docker container labels:
https://docs.datadoghq.com/containers/docker/integrations/?tab=labels
Luckily `labels` can be specified as a list-of-maps instead:
```hcl
labels = [
{
"com.datadoghq.ad.check_names" = "[\"openmetrics\"]"
"com.datadoghq.ad.init_configs" = "[{}]"
}
]
```
So instead of adding more awkward hcl1/2 backward compat code to Nomad,
I just updated the docs to hopefully help people hit by this.
The only other known workaround is dropping HCL in favor of JSON
jobspecs altogether, but that forces a huge migration and maintenance
burden on users:
https://discuss.hashicorp.com/t/docker-based-autodiscovery-with-datadog-how-can-we-make-it-work/18870
Nomad clients manage a cpuset cgroup for each task to reserve or share CPU
cores. But Docker owns its own cgroups, and attempting to set a parent cgroup
that Nomad manages runs into conflicts with how runc manages cgroups via
systemd. Therefore Nomad must run as root in order for cpuset management to ever
be compatible with Docker.
However, some users running in unsupported configurations felt that the changes
we made in Nomad 1.7.0 to ensure Nomad was running correctly represented a
regression. This changeset disables cpuset management for non-root Nomad
clients. When running Nomad as non-root, the driver will not longer reconcile
cpusets with Nomad and `resources.cores` will behave incorrectly (but the driver
will still run).
Although this is one small step along the way to supporting a rootless Nomad
client, running Nomad as non-root is still unsupported. This PR is insufficient
by itself to have a secure and properly-working rootless Nomad client.
Ref: https://github.com/hashicorp/nomad/issues/18211
Ref: https://github.com/hashicorp/nomad/issues/13669
Ref: https://hashicorp.atlassian.net/browse/NET-10652
Ref: https://github.com/opencontainers/runc/blob/main/docs/systemd.md
The documentation for the `SSL` option for the Docker driver is
misleading inasmuch as it's both deprecated and non-functional in current
versions of Docker. Remove this option from the docs and add a section
explaining how to use insecure registries.
Fixes: https://github.com/hashicorp/nomad/issues/23616
This enables checks for ContainerAdmin user on docker images on Windows. It's
only checked if users run docker with process isolation and not hyper-v,
because hyper-v provides its own, proper sandboxing.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This allows users to set a custom value of attempts that will be made to purge
an existing (not running) container if one is found during task creation.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* drivers/raw_exec: enable setting cgroup override values
This PR enables configuration of cgroup override values on the `raw_exec`
task driver. WARNING: setting cgroup override values eliminates any
gauruntee Nomad can make about resource availability for *any* task on
the client node.
For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`.
The path may be either absolute or relative to the cgroup root.
config {
cgroup_v2_override = "custom.slice/app.scope"
}
or
config {
cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope"
}
For cgroup v1 systems, set a per-controller path for each controller using
`cgroup_v1_override`. The path(s) may be either absolute or relative to
the controller root.
config {
cgroup_v1_override = {
"pids": "custom/app",
"cpuset": "custom/app",
}
}
or
config {
cgroup_v1_override = {
"pids": "/sys/fs/cgroup/pids/custom/app",
"cpuset": "/sys/fs/cgroup/cpuset/custom/app",
}
}
* drivers/rawexec: ensure only one of v1/v2 cgroup override is set
* drivers/raw_exec: executor should error if setting cgroup does not work
* drivers/raw_exec: create cgroups in raw_exec tests
* drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root
* move custom cgroup func into shared file
---------
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
* drivers/raw_exec: enable configuring raw_exec task to have no memory limit
This PR makes it possible to configure a raw_exec task to not have an
upper memory limit, which is how the driver would behave pre-1.7.
This is done by setting memory_max = -1. The cluster (or node pool) must
have memory oversubscription enabled.
* cl: add cl
Nomad can only use cgroups to control resource requirements if all the cgroups
controllers are actually enabled. Add this to our requirements documentation as
well as the impacted `exec` and `java` task drivers.
Linux capabilities configurable by the task must be a subset of those configured
in the plugin configuration. Clarify this implies that `"all"` is not permitted
if the plugin is not also configured to allow all capabilities.
Fixes: https://github.com/hashicorp/nomad/issues/19059
The `docker` driver cannot expand capabilities beyond the default set when the
task is a non-root user. Clarify this in the documentation of `allow_caps` and
update the `cap_add` and `cap_drop` to match the `exec` driver, which has more
clear language overall.
* docker: set force=true on remove image to handle images referenced by multiple tags
This PR changes our call of docker client RemoveImage() to RemoveImageExtended with
the Force=true option set. This fixes a bug where an image referenced by more than
one tag could never be garbage collected by Nomad. The Force option only applies to
stopped containers; it does not affect running workloads.
* docker: add note about image_delay and multiple tags
The exec driver and other drivers derived from the shared executor check the
path of the command before handing off to libcontainer to ensure that the
command doesn't escape the sandbox. But we don't check any host volume mounts,
which should be safe to use as a source for executables if we're letting the
user mount them to the container in the first place.
Check the mount config to verify the executable lives in the mount's host path,
but then return an absolute path within the mount's task path so that we can hand
that off to libcontainer to run.
Includes a good bit of refactoring here because the anchoring of the final task
path has different code paths for inside the task dir vs inside a mount. But
I've fleshed out the test coverage of this a good bit to ensure we haven't
created any regressions in the process.
This reverts PR #12416 and commit 6668ce022a.
While the driver options are well and truly deprecated, this documentation also
covers features like `fingerprint.denylist` that are not available any other
way. Let's revert this until #12420 is ready.
The QEMU driver can take an optional `graceful_shutdown` configuration
which will create a Unix socket to send ACPI shutdown signal to the VM.
Unix sockets have a hard length limit and the driver implementation
assumed that QEMU versions 2.10.1 were able to handle longer paths. This
is not correct, the linked QEMU fix only changed the behaviour from
silently truncating longer socket paths to throwing an error.
By validating the socket path before starting the QEMU machine we can
provide users a more actionable and meaningful error message, and by
using a shorter socket file name we leave a bit more room for
user-defined values in the path, such as the task name.
The maximum length allowed is also platform-dependant, so validation
needs to be different for each OS.
Closes#12927Closes#12958
This PR updates the version of redis used in our examples from 3.2 to 7.
The old version is very not supported anymore, and we should be setting
a good example by using a supported version.
The long-form example job is now fixed so that the service stanza uses
nomad as the service discovery provider, and so now the job runs without
a requirement of having Consul running and configured.