
276 Commits

Author SHA1 Message Date
Michael Schurter
29af9891f8 test: test the buffered pipe used by nsd (#12563)
Nomad Service Discovery uses an in-memory buffered pipe implementation
to connect consul-template to the Nomad API.

This adds a basic test for that helper functionality.
2022-04-14 08:38:25 -07:00
Yoan Blanc
bda7b1ece0 feat: remove dependency on consul/lib
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2022-04-09 13:22:44 +02:00
James Rasell
9dc0b88cb5 client: add Nomad template service functionality to runner. (#12458)
This change modifies the template task runner to utilise the
new consul-template which includes Nomad service lookup template
funcs.

In order to provide security and auth to consul-template, we use
a custom HTTP dialer which is passed to consul-template when
setting up the runner. This method follows the Vault implementation.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-04-06 19:17:05 +02:00
Derek Strickland
786180601d reconciler: support disconnected clients (#12058)
* Add merge helper for string maps
* structs: add statuses, MaxClientDisconnect, and helper funcs
* taintedNodes: Include disconnected nodes
* upsertAllocsImpl: don't use existing ClientStatus when upserting unknown
* allocSet: update filterByTainted and add delayByMaxClientDisconnect
* allocReconciler: support disconnecting and reconnecting allocs
* GenericScheduler: upsert unknown and queue reconnecting
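The first bullet mentions a merge helper for string maps. Here is a sketch of such a helper, assuming values from the second map override the first and that neither input is modified. `mergeStringMaps` is a hypothetical name.

```go
package main

import "fmt"

// mergeStringMaps returns a new map containing every entry of a and b.
// When both maps contain a key, the value from b wins. Inputs are not modified.
func mergeStringMaps(a, b map[string]string) map[string]string {
	out := make(map[string]string, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] = v
	}
	return out
}

func main() {
	m := mergeStringMaps(map[string]string{"a": "1", "b": "2"}, map[string]string{"b": "3"})
	fmt.Println(m) // map[a:1 b:3]
}
```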

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-05 17:10:37 -04:00
Luiz Aoqui
d412f7b497 Support Vault entity aliases (#12449)
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.

Make Vault job validation its own function so it's easier to expand it.

Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just its policies.

Set `ChangeMode` on `Vault.Canonicalize`.

Add some missing tests.

Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.

An entity alias assigns an identity to a token, allowing better control
and management of Vault clients since all tokens with the same identity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and gives better control over Vault billing.

Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
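The precedence rule in the last paragraph amounts to "task alias first, server default otherwise". A minimal sketch follows. `vaultBlock` and `effectiveEntityAlias` are hypothetical names, not Nomad's structs or configuration keys.

```go
package main

import "fmt"

// vaultBlock is a hypothetical, trimmed-down stand-in for a task's vault block.
type vaultBlock struct {
	Policies    []string
	EntityAlias string
}

// effectiveEntityAlias returns the alias to use when deriving a task token:
// the task's own alias if set, otherwise the server-wide default (which may
// itself be empty, meaning no alias is used).
func effectiveEntityAlias(task *vaultBlock, serverDefault string) string {
	if task != nil && task.EntityAlias != "" {
		return task.EntityAlias
	}
	return serverDefault
}

func main() {
	const serverDefault = "nomad-default" // hypothetical server-wide default
	fmt.Println(effectiveEntityAlias(&vaultBlock{EntityAlias: "web-team"}, serverDefault))     // web-team
	fmt.Println(effectiveEntityAlias(&vaultBlock{Policies: []string{"read"}}, serverDefault)) // nomad-default
}
```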
2022-04-05 14:18:10 -04:00
James Rasell
d49cf2388a Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
Seth Hoenig
a44c55ae84 ci: limit gotestsum to circle ci
Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255

This PR makes it so gotestsum is invoked only in CircleCI. Also the
HCLogger(t) is plumbed more correctly in TestServer and TestAgent so
that they respect NOMAD_TEST_LOG_LEVEL.

The reason for these changes is that we want to disable logging in GHA,
where spamming the disk with logs really drags down performance.
2022-03-18 09:15:01 -05:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Luiz Aoqui
154264fcd9 Add pagination, filtering and sort to more API endpoints (#12186) 2022-03-08 20:54:17 -05:00
James Rasell
13da88bc74 helper: add ipaddr pkg to check for any IP addresses. 2022-03-03 11:24:50 +01:00
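The subject line only says the package checks for "any" IP addresses. Assuming that means the unspecified addresses `0.0.0.0` and `::`, a standard-library sketch could use `net.IP.IsUnspecified`. `isAnyIP` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
)

// isAnyIP reports whether s parses as an IPv4 or IPv6 "any" address
// (0.0.0.0 or ::). Unparseable input returns false.
func isAnyIP(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && ip.IsUnspecified()
}

func main() {
	for _, s := range []string{"0.0.0.0", "::", "127.0.0.1", "not-an-ip"} {
		fmt.Println(s, isAnyIP(s))
	}
}
```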
James Rasell
12265ee9d1 events: add state objects and logic for service registrations. 2022-02-28 10:44:58 +01:00
Seth Hoenig
42c6d5a5c5 command: switch from raft-boltdb to raft-boltdb/v2 2022-02-23 14:43:59 -06:00
Seth Hoenig
b2fe196e42 agent: switch to go.etcd.io/bbolt for state store
This PR modifies the server and client agents to use `go.etcd.io/bbolt` as the
implementation for their state stores.
2022-02-23 14:28:31 -06:00
Michael Schurter
2411d3afd2 core: remove all traces of unused protocol version
Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.

While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:

1. Nomad's RPC subsystem has been able to evolve extensively without
   needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without incrementing
   `API{Major,Minor}Version`. If we want to version the HTTP API in the
   future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
   parameter is confusing since `server.raft_protocol` *is* an important
   parameter for operators to consider. Even more confusing is that
   there is a distinct Serf protocol version which is included in `nomad
   server members` output under the heading `Protocol`. `raft_protocol`
   is the *only* protocol version relevant to Nomad developers and
   operators. The other protocol versions are either dead code or have
   never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
   don't think these configuration parameters and variables are the best
   choice. If we come to that point we should choose a versioning scheme
   based on the use case and modern best practices -- not this 6+ year
   old dead code.
2022-02-18 16:12:36 -08:00
Seth Hoenig
b432f377cf api: return sorted results in certain list endpoints
These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List
2022-02-15 13:48:28 -06:00
Luiz Aoqui
bc333c2560 Merge tag 'v1.2.6' into merge-release-1.2.6-branch
Version 1.2.6
2022-02-10 14:55:34 -05:00
Seth Hoenig
b3c0e6a7a5 client: check escaping of alloc dir using symlinks
This PR adds symlink resolution when doing validation of paths
to ensure they do not escape client allocation directories.
2022-02-09 19:50:13 -05:00
Seth Hoenig
4f56d81ce2 Merge pull request #11983 from hashicorp/b-select-after
cleanup: prevent leaks from time.After
2022-02-03 09:38:06 -06:00
Glen Yu
5a3278368d added Int32ToPtr helper function (#11985) 2022-02-02 17:12:54 -05:00
Seth Hoenig
c1e033c8c6 cleanup: prevent leaks from time.After
This PR replaces use of time.After with a safe helper function
that creates a time.Timer to use instead. The new function returns
both a time.Timer and a Stop function that the caller must handle.

Unlike time.NewTimer, the helper function does not panic if the duration
set is <= 0.
2022-02-02 14:32:26 -06:00
Seth Hoenig
97176a5654 deps: import libtime the normal way
Previously we copied this library by hand to avoid vendoring a bunch of
files related to minimock. Now that we no longer vendor, just import the
library normally.

Also we might use more of the library for handling `time.After` uses,
for which this library provides a Context-based solution.
2022-01-31 14:49:05 -06:00
Tim Gross
358a46819b fix integer bounds checks (#11815)
* driver: fix integer conversion error

The shared executor incorrectly parsed the user's group into int32 and
then cast to uint32 without bounds checking. This is harmless because
an out-of-bounds gid will throw an error later, but it triggers
security and code quality scans. Parse directly to uint32 so that we
get correct error handling.

* helper: fix integer conversion error

The autopilot flags helper incorrectly parses a uint64 to a uint, which
has a machine-specific size. Although we don't have 32-bit builds, this
sets off security and code quality scans. Parse to the machine-sized
uint.

* driver: restrict bounds of port map

The plugin server doesn't constrain the maximum integer for port
maps. This could result in a user-visible misconfiguration, but it
also triggers security and code quality scans. Restrict the bounds
before casting to int32 and return an error.

* cpuset: restrict upper bounds of cpuset values

Our cpuset configuration expects values in the range of uint16 to
match the expectations set by the kernel, but we don't constrain the
values before downcasting. An underflow could lead to allocations
failing on the client rather than being caught earlier. This also makes
security and code quality scanners happy.

* http: fix integer downcast for per_page parameter

The parser for the `per_page` query parameter downcasts to int32
without bounds checking. This could result in underflow and
nonsensical paging, but there are no server-side consequences for
this. Fixing this will silence some security and code quality scanners
though.
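These fixes share one idea: parse directly into the target width, or check bounds before narrowing. The sketch below uses hypothetical helper names. For the gid and `per_page` cases, the `bitSize` argument of `strconv.ParseUint`/`ParseInt` performs the range check. The 0 to 65535 port range is an assumption, since the commit only says the bounds are restricted.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseGID parses a group ID straight into a uint32. ParseUint with bitSize 32
// returns a range error for values that do not fit, instead of wrapping.
func parseGID(s string) (uint32, error) {
	v, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return 0, err
	}
	return uint32(v), nil
}

// toPort32 validates an int (assumed range: a TCP/UDP port) before narrowing to int32.
func toPort32(p int) (int32, error) {
	if p < 0 || p > math.MaxUint16 {
		return 0, fmt.Errorf("port %d out of range [0, %d]", p, math.MaxUint16)
	}
	return int32(p), nil
}

// parsePerPage parses a per_page query value, rejecting anything outside int32.
func parsePerPage(s string) (int32, error) {
	v, err := strconv.ParseInt(s, 10, 32)
	if err != nil {
		return 0, err
	}
	return int32(v), nil
}

func main() {
	fmt.Println(parseGID("1000"))
	fmt.Println(parseGID("4294967296")) // one past MaxUint32: range error
	fmt.Println(toPort32(70000))
	fmt.Println(parsePerPage("50"))
}
```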
2022-01-25 11:16:48 -05:00
Seth Hoenig
87dbc7162b deps: upgrade docker and runc
This PR upgrades
 - docker dependency to the latest tagged release (v20.10.12)
 - runc dependency to the latest tagged release (v1.0.3)

Docker does not abide by [semver](https://github.com/moby/moby/issues/39302), so it is marked +incompatible,
and transitive dependencies are upgraded manually.

Runc made three relevant breaking changes:

 * cgroup manager .Set changed to accept Resources instead of Cgroup
   3f65946756

 * config.Device moved to devices.Device
   https://github.com/opencontainers/runc/pull/2679

 * mountinfo.Mounted now returns an error if the specified path does not exist
   https://github.com/moby/sys/blob/mountinfo/v0.5.0/mountinfo/mountinfo.go#L16
2022-01-18 08:35:26 -06:00
Michael Schurter
fa3de735cf cli: return error from raft commands if db is open
Before this change, trying to run `nomad operator raft {info,logs}` on an
in-use raft.db would cause the command to block until the agent using
raft.db is closed.

After this change the command will block for 1s before returning a
(hopefully) helpful error message.

This change also sets the ReadOnly mode on the underlying BoltDB to
ensure diagnostics make no changes to the underlying store. We have no
evidence this has ever occurred, but it seems like a useful safety
measure.

No changelog added since this is a minor tweak in a "new" feature (it
was hidden in previous releases).
2021-12-16 11:41:01 -08:00
Tim Gross
bd18a452ab cli: stream raft logs to operator raft logs subcommand (#11684)
The `nomad operator raft logs` command uses a raft helper that reads
in the logs from raft and serializes them to JSON. The previous
implementation returned the slice of all logs and then serializes the
entire object. Update the helper to stream the log entries and then
serialize them as newline-delimited JSON.
2021-12-16 13:38:58 -05:00
Mahmood Ali
68bae12fd4 Raft Debugging Improvements (#11414) 2021-11-04 10:16:12 -04:00
Dave May
1bd132f09d debug: Improve namespace and region support (#11269)
* Include region and namespace in CLI output
* Add region and prefix matching for server members
* Add namespace and region API outputs to cluster metadata folder
* Add region awareness to WaitForClient helper function
* Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice
* Refactor test client agent generation
* Add tests for region
* Add changelog
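The two prefix helpers named in the list are not shown. The sketch below infers their semantics from the names, so treat it as an assumption: one checks whether any slice element has a prefix, and the other checks whether a string starts with any prefix from a slice. The lowercase names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceStringHasPrefix reports whether any element of s starts with prefix.
func sliceStringHasPrefix(s []string, prefix string) bool {
	for _, e := range s {
		if strings.HasPrefix(e, prefix) {
			return true
		}
	}
	return false
}

// stringHasPrefixInSlice reports whether s starts with any of the prefixes.
func stringHasPrefixInSlice(s string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

func main() {
	members := []string{"server-1.global", "server-2.eu-west"}
	fmt.Println(sliceStringHasPrefix(members, "server-2"))                              // true
	fmt.Println(stringHasPrefixInSlice("server-1.global", []string{"client", "server"})) // true
}
```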
2021-10-12 16:58:41 -04:00
Mahmood Ali
88ff7f40de Merge pull request #11089 from hashicorp/b-cve-2021-37218
Apply authZ for nomad Raft RPC layer
2021-10-05 08:49:21 -04:00
Mahmood Ali
6c414cd5f9 gofmt all the files
Mostly to handle the new //go:build directives in Go 1.17.
2021-10-01 10:14:28 -04:00
Tim Gross
420bce0af0 devices: externalize nvidia device driver 2021-09-29 13:43:37 -07:00
James Rasell
e26f1c4591 lint: mark false positive or fix gocritic append lint errors. 2021-09-06 10:49:44 +02:00
James Rasell
3bffe443ac chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
Mahmood Ali
39627df49f Apply authZ for nomad Raft RPC layer
When mTLS is enabled, only nomad servers of the region should access the
Raft RPC layer. Clients and servers in other regions should only use the
Nomad RPC endpoints.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
2021-08-26 15:10:07 -04:00
James Rasell
530c0f8448 tlsutil: update testing certificates close to expiry. 2021-08-13 11:09:40 +02:00
Mahmood Ali
3f7a5c1474 pool: track usage of incoming streams (#10710)
Track usage of incoming streams on a connection. Connections without
reference counts get marked as unused and reaped in a periodic job.

This fixes a bug where `alloc exec` and `alloc fs` sessions get terminated
unexpectedly. Previously, when a client's heartbeats switched between
servers, the pool connection reaper eventually identified the connection
as unused and closed it even if it had active exec/fs sessions.

Fixes #10579
2021-06-07 10:22:37 -04:00
Seth Hoenig
312161c5fc consul/connect: add support for connect mesh gateways
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.

Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway

The following group level service block can be used to establish
a Connect mesh gateway.

service {
  connect {
    gateway {
      mesh {
        // no configuration
      }
    }
  }
}

Services can make use of a mesh gateway by configuring one in their
upstream blocks, e.g.

service {
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "<service>"
          local_bind_port  = <port>
          datacenter       = "<datacenter>"
          mesh_gateway {
            mode = "<mode>"
          }
        }
      }
    }
  }
}

Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.

client {
  host_network "public" {
    interface = "eth1"
  }
}

Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.

network {
  mode = "bridge"
  port "mesh_wan" {
    host_network = "public"
  }
}

Use this port label for the service.port of the mesh gateway, e.g.

service {
  name = "mesh-gateway"
  port = "mesh_wan"
  connect {
    gateway {
      mesh {}
    }
  }
}

Currently Envoy is the only supported gateway implementation in Consul.
By default, the Nomad client will run the latest official Envoy Docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.

Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.

Closes #9446
2021-06-04 08:24:49 -05:00
Seth Hoenig
6884455750 connect: use exp backoff when waiting on consul envoy bootstrap
This PR wraps the use of the consul envoy bootstrap command in
an exponential backoff closure, configured to time out after 60
seconds. This is an increase over the current behavior of making
3 attempts over 6 seconds.

Should help with #10451
2021-04-27 09:21:50 -06:00
Chris Baker
cb3d6ece21 json handles were moved to a new package in #10202
This was unnecessary after refactoring, so this moves them back to their
original location in package structs
2021-04-02 13:31:10 +00:00
Chris Baker
80066ac798 Merge branch 'main' into f-node-drain-api 2021-04-01 15:22:57 -05:00
Tim Gross
14568b3e00 deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
Chris Baker
a52f32dedc restored Node.Sanitize() for RPC endpoints
multiple other updates from code review
2021-03-26 17:03:15 +00:00
Chris Baker
0cd707e3a9 moved JSON handlers and extension code around a bit for proper order of
initialization
2021-03-22 14:12:42 +00:00
Charlie Voiselle
d914990e5f Fixup uses of sanity (#10187)
* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go
2021-03-16 18:05:08 -04:00
Tim Gross
a12f44705a RPC endpoints to support 'nomad ui -login'
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
2021-03-10 08:17:56 -05:00
Kris Hicks
2cd7136bc7 Fix some errcheck errors (#9811)
* Throw away result of multierror.Append

When given a *multierror.Error, it is mutated, therefore the return
value is not needed.

* Simplify MergeMultierrorWarnings, use strings.Builder

* Hash.Write() never returns an error

* Remove error that was always nil

* Remove error from Resources.Add signature

When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.

* Throw away results of io.Copy during Bridge

* Handle errors when computing node class in test
2021-01-14 12:46:35 -08:00
Kris Hicks
c52e0bbf41 gatedwriter: Fix race condition (#9791)
If one thread calls `Flush()` on a gatedwriter while another thread attempts to
`Write()` new data to it, strange things will happen.

The test I wrote shows that at the very least you can write _while_ flushing,
and the call to `Write()` will happen during the internal writes of the
buffered data, which is maybe not what is expected. (i.e. the `Write()`'d data
will be inserted somewhere in the middle of the data being `Flush()`'d)

It's also the case that, because `Write()` only has a read lock, if you had
multiple threads trying to write ("read") at the same time you might have data
loss because the `w.buf` that was read would not necessarily be up-to-date by
the time `p2` was appended to it and it was re-assigned to `w.buf`. You can see
this if you run the new gatedwriter tests with `-race` against the old implementation:

```
WARNING: DATA RACE
Read at 0x00c0000c0420 by goroutine 11:
  runtime.growslice()
      /usr/lib/go/src/runtime/slice.go:125 +0x0
  github.com/hashicorp/nomad/helper/gated-writer.(*Writer).Write()
      /home/hicks/workspace/nomad/helper/gated-writer/writer.go:41 +0x2b6
  github.com/hashicorp/nomad/helper/gated-writer.TestWriter_WithMultipleWriters.func1()
      /home/hicks/workspace/nomad/helper/gated-writer/writer_test.go:90 +0xea
```

This race condition is fixed in this change.
2021-01-14 12:43:14 -08:00
Seth Hoenig
33527b1547 e2e: add e2e test for service registration 2021-01-05 08:48:12 -06:00
Mahmood Ali
8879645ab9 docker: introduce a new hcl2-friendly mount syntax (#9635)
Introduce a new, more block-friendly syntax for specifying mounts: a new `mount` block type with the target as label:

```hcl
config {
  image = "..."

  mount {
    type = "..."
    target = "target-path"
    volume_options { ... }
  }
}
```

The main benefit here is that because `mount` is a block, it can nest blocks and avoid the compatibility problems noted in https://github.com/hashicorp/nomad/pull/9634/files#diff-2161d829655a3a36ba2d916023e4eec125b9bd22873493c1c2e5e3f7ba92c691R128-R155 .

We intend to promote the `mount` block and quietly deprecate the `mounts` type, while still honoring `mounts` to preserve compatibility as much as we can.

This addresses the issue in https://github.com/hashicorp/nomad/issues/9604 .
2020-12-15 14:13:50 -05:00
Seth Hoenig
14aca2fe3e Merge pull request #9624 from hashicorp/b-connect-meta-regression
consul/connect: fix regression where client connect images ignored
2020-12-14 11:03:09 -06:00
Seth Hoenig
d5e6c5e22e command: give flag-helpers a better name 2020-12-14 10:07:27 -06:00