Commit Graph

55 Commits

Author SHA1 Message Date
Tim Gross
f295396ef8 docs: rename Internals to Concepts (#13696) 2022-07-11 16:55:33 -04:00
Seth Hoenig
64f35f9cf3 docs: move upgrade docs for max_client_timeout
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-07-07 16:46:26 -05:00
Seth Hoenig
cbcceb0625 docs: upgrade guide for client max_kill_timeout 2022-07-07 15:27:40 -05:00
Seth Hoenig
f1cafd0789 core: remove support for raft protocol version 2
This PR checks server config for raft_protocol, which must now
be set to 3 or unset (0). When unset, version 3 is used as the
default.
2022-06-23 14:37:50 +00:00
Arthur Leclerc
7518f42d1c docs: Fix typo (#13389) 2022-06-16 13:24:18 -04:00
Michael Schurter
3968509886 artifact: fix numerous go-getter security issues
Fix numerous go-getter security issues:

- Add timeouts to http, git, and hg operations to prevent DoS
- Add size limit to http to prevent resource exhaustion
- Disable following symlinks in both artifacts and `job run`
- Stop performing initial HEAD request to avoid file corruption on
  retries and DoS opportunities.

**Approach**

Since Nomad has no ability to differentiate a DoS-via-large-artifact vs
a legitimate workload, all of the new limits are configurable at the
client agent level.

The max size of HTTP downloads is also exposed as a node attribute so
that if some workloads have large artifacts they can specify a high
limit in their jobspecs.

In the future all of this plumbing could be extended to enable/disable
specific getters or artifact downloading entirely on a per-node basis.
2022-05-24 16:29:39 -04:00
Luiz Aoqui
59ce4f8caf docs: add Consul 1.12.0 upgrade notice 2022-05-16 18:44:26 -04:00
Tim Gross
3671ea6a8f remove pre-0.9 driver code and related E2E test (#12791)
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time and in any
case the test has been marked to skip because the downloader doesn't
work.
2022-04-27 09:53:37 -04:00
James Rasell
89b74632d4 docs: add upgrade note for Consul implicit constraint. (#12749) 2022-04-22 15:53:27 +02:00
Seth Hoenig
b2a2f77d40 docs: update documentation with connect acls changes
This PR updates the changelog, adds notes the 1.3 upgrade guide, and
updates the connect integration docs with documentation about the new
requirement on Consul ACL policies of Consul agent default anonymous ACL
tokens.
2022-04-18 08:22:33 -05:00
Seth Hoenig
f2ea1fab5a connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs
This PR expands on the work done in #12543 to
- prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags
- merge into pre-existing envoy_stats_tags fields
- update the upgrade guide docs
- update changelog
2022-04-14 12:52:52 -05:00
Seth Hoenig
be80a63584 docs: fixup title formatting in upgrade guide 2022-04-08 11:50:54 -05:00
Luiz Aoqui
c3e36bb367 docs: fix upgrade specific broken link and conflict tag (#12521) 2022-04-08 12:36:47 -04:00
Michael Schurter
f87ec7e64e template: disallow writeToFile by default
Resolves #12095 by WONTFIXing it.

This approach disables `writeToFile` as it allows arbitrary host
filesystem writes and is only a small quality of life improvement over
multiple `template` stanzas.

This approach has the significant downside of leaving people who have
altered their `template.function_denylist` *still vulnerable!* I added
an upgrade note, but we should have implemented the denylist as a
`map[string]bool` so that new funcs could be denied without overriding
custom configurations.

This PR also includes a bug fix that broke enabling all consul-template
funcs. We repeatedly failed to differentiate between a nil (unset)
denylist and an empty (allow all) one.
2022-03-28 17:05:42 -07:00
Luiz Aoqui
7edf1885c4 docs: fix link and add note about Nomad v1.3.0 on raft v3 upgrade (#12378) 2022-03-25 10:11:46 -04:00
Seth Hoenig
c27af79add client: cgroups v2 code review followup 2022-03-24 13:40:42 -05:00
Seth Hoenig
5da1a31e94 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Luiz Aoqui
eca4ac67f3 cli: display Raft version in server members (#12317)
The previous output of the `nomad server members` command would output a
column named `Protocol` that displayed the Serf protocol being currently
used by servers.

This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.

This commit replaces the previous `Protocol` column with the new `Raft
Version`. It also updates the `-detailed` flag to be called `-verbose`
so it matches other commands. The detailed output now also outputs the
same information as the standard output with the addition of the
previous `Protocol` column and `Tags`.
2022-03-17 14:15:10 -04:00
Luiz Aoqui
dfe520a9d8 server: transfer leadership in case of error (#12293)
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
2022-03-17 11:10:57 -04:00
Seth Hoenig
96a6f2c956 docs: emphasize snapshot before upgrading 2022-02-24 08:22:41 -06:00
Seth Hoenig
16efcf4e71 core: switch to go.etc.io/bbolt
This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etc.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivelent PR: https://github.com/hashicorp/consul/pull/11720
2022-02-23 14:26:41 -06:00
Tim Gross
7bcf0afd81 CSI: allow for concurrent plugin allocations (#12078)
The dynamic plugin registry assumes that plugins are singletons, which
matches the behavior of other Nomad plugins. But because dynamic
plugins like CSI are implemented by allocations, we need to handle the
possibility of multiple allocations for a given plugin type + ID, as
well as behaviors around interleaved allocation starts and stops.

Update the data structure for the dynamic registry so that more recent
allocations take over as the instance manager singleton, but we still
preserve the previous running allocations so that restores work
without racing.

Multiple allocations can run on a client for the same plugin, even if
only during updates. Provide each plugin task a unique path for the
control socket so that the tasks don't interfere with each other.
2022-02-23 15:23:07 -05:00
Luiz Aoqui
6a3368a08f docs: add upgrade note and ACL requirements for the job submit endpoint (#12046) 2022-02-10 15:35:16 -05:00
Tim Gross
e3009f1c6d raft: default to protocol v3 (#11572)
Many of Nomad's Autopilot features require raft protocol version
3. Set the default raft protocol to 3, and improve the upgrade
documentation.
2022-02-03 15:03:12 -05:00
Luiz Aoqui
ac18d719fe docs: update 1.2.0 upgrade note now that the UI ACL is fixed (#11840) 2022-01-17 11:09:08 -05:00
James Rasell
b34e652b7d docs: add 1.2.0 HCLv2 strict parsing upgrade note. 2022-01-03 15:41:18 +00:00
Luiz Aoqui
55018bdfe6 docs: add v1.2.0 upgrade guide about Nomad UI ACL change for job details page (#11689) 2021-12-16 14:32:20 -05:00
Tim Gross
97621ec3c5 nomad eval list command (#11675)
Use the new filtering and pagination capabilities of the `Eval.List`
RPC to provide filtering and pagination at the command line.

Also includes note that `nomad eval status -json` is deprecated and
will be replaced with a single evaluation view in a future version of
Nomad.
2021-12-15 11:58:38 -05:00
Kevin Wang
ddca508b0d feat(website): extract /plugins /tools docs (#11584)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>
2021-12-09 14:25:18 -05:00
Luiz Aoqui
9a0fee0a17 docs: add upgrade guide notes for Nomad 1.2.2 (#11567) 2021-11-24 14:24:20 -05:00
Luiz Aoqui
681eeca515 docs: update Nvidia device plugin as external (#11313) 2021-10-14 12:22:31 -04:00
Michael Schurter
5e17917738 docs: add upgrade guide entry for audit log naming 2021-09-16 16:19:52 -07:00
Luiz Aoqui
cb0b2f5387 Document Docker extra_hosts behaviour post v1.1.3 (#11079)
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2021-09-01 12:41:06 -04:00
Mahmood Ali
1635c4b080 release 1.1.4 (#11088) 2021-08-30 11:43:05 -04:00
Luiz Aoqui
d74ab11d25 Don't timestamp active log file (#11070)
* don't timestamp active log file

* website: update log_file default value

* changelog: add entry for #11070

* website: add upgrade instructions for log_file in v1.14 and v1.2.0
2021-08-23 11:27:34 -04:00
Michael Schurter
2ffb7e1397 docs: add backward incompat note about #10875
Fixes #11002
2021-08-05 15:08:55 -07:00
Mike Nomitch
9235c790f0 [docs] Adds federation caveat to upgrade guide (#10847) 2021-07-09 09:42:17 -04:00
Tim Gross
a66034bb3a docker: move host path for hosts file mount to alloc dir (#10823)
In Nomad 1.1.1 we generate a hosts file based on the Nomad-owned network
namespace, rather than using the default hosts file from the pause
container. This hosts file should be shared between tasks in the same
allocation so that tasks can update the file and have the results propagated
between tasks.
2021-06-30 11:10:04 -04:00
Tim Gross
166a056531 docs: add missing backwards compat warning about port_map (#10827)
The `docker` driver's `port_map` field was deprecated in 0.12 and this is
documented in the task driver's docs, but we never explicitly flagged it for
backwards compatibility.
2021-06-28 15:49:41 -04:00
Seth Hoenig
845a3d3cdc docs: minor wording tweaks + cl 2021-05-17 12:52:52 -06:00
Seth Hoenig
7245ac3fc5 docs: update docs for linux capabilities in exec/java/docker drivers
Update docs for allow_caps, cap_add, cap_drop in exec/java/docker driver
pages. Also update upgrade guide with guidance on new default linux
capabilities for exec and java drivers.
2021-05-17 12:37:40 -06:00
Seth Hoenig
003d68fe6d drivers/docker+exec+java: disable net_raw capability by default
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.

A future version of nomad will enable similar configurability for the
exec and java task drivers.
2021-05-12 13:22:09 -07:00
Mike Nomitch
ee1163ed94 docs: add detail to 1.1 upgrade guide for licensing 2021-05-10 12:28:05 -04:00
Mike Nomitch
1df61f9c7e Moving licensing to the top of the upgrade guide and clarifying wording 2021-05-07 08:17:17 -04:00
Mike Nomitch
d5276c63ff website: adding trial links 2021-05-07 08:17:17 -04:00
Tim Gross
638a0daa1b docs: Enterprise licensing updates 2021-04-28 14:46:06 -04:00
Tim Gross
ee9bb3cc4f docs: changelog and upgrade note for iptables improvement 2021-04-15 10:19:37 -04:00
Tim Gross
45f0a3a532 CSI: capability block is required for volume registration 2021-04-08 13:02:24 -04:00
Bryce Kalow
ee79587a67 feat(website): migrates to new nav data format (#10264) 2021-03-31 08:43:17 -05:00
Florian Apolloner
f21ab14690 Automatically populate CONSUL_HTTP_ADDR for connect native tasks in host networking mode. Fixes #10239 2021-03-28 14:34:31 +02:00