Commit Graph

1584 Commits

Author SHA1 Message Date
Mahmood Ali
41bec868a8 http: adjust log level for request failure
Failed requests due to API client errors are to be marked as DEBUG.

The Error log level should be reserved to signal problems with the
cluster and are actionable for nomad system operators.  Logs due to
misbehaving API clients don't represent a system level problem and seem
spurius to nomad maintainers at best.  These log messages can also be
attack vectors for deniel of service attacks by filling servers disk
space with spurious log messages.
2020-04-22 16:19:59 -04:00
Mahmood Ali
5abc59284f Merge pull request #7704 from hashicorp/b-agent-shutdown-order
agent: shutdown agent http server last
2020-04-20 10:37:26 -04:00
Mahmood Ali
360e0a1669 agent: route http logs through hclog
Pipe http server log to hclog, so that it uses the same logging format
as rest of nomad logs.  Also, supports emitting them as json logs, when
json formatting is set.

The http server logs are emitted as Trace level, as they are typically
repsent HTTP client errors (e.g. failed tls handshakes, invalid headers,
etc).

Though, Panic logs represent server errors and are relayed as Error
level.
2020-04-20 10:33:40 -04:00
Mahmood Ali
d89687d014 agent: shutdown agent http server last
Shutdown http server last, after nomad client/server components
terminate.

Before this change, if the agent is taking an unexpectedly long time to
shutdown, the operator cannot query the http server directly: they
cannot access agent specific http endpoints and need to query another
agent about the troublesome agent.

Unexpectedly long shutdown can happen in normal cases, e.g. a client
might hung is if one of the allocs it is running has a long
shutdown_delay.

Here, we switch to ensuring that the http server is shutdown last.

I believe this doesn't require extra care in agent shutting down logic
while operators may be able to submit write http requests.  We already
need to cope with operators submiting these http requests to another
agent or by servers updating the client allocations.
2020-04-13 10:50:07 -04:00
Mahmood Ali
85a1bb49f3 tests: deflake some SetServer related tests
Some tests assert on numbers on numbers of servers, e.g.
TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL . Though, in dev and
test modes, the agent starts with servers having duplicate entries for
advertised and normalized RPC values, then settles with one unique value after
Raft/Serf re-sets servers with one single unique value.

This leads to flakiness, as the test will fail if assertion runs before Serf
update takes effect.

Here, we update the inital dev handling so it only adds a unique value if the
advertised and normalized values are the same.

Sample log lines illustrating the problem:

```
=== CONT  TestHTTP_AgentSetServers
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]"
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO]  nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[]
...
    TestHTTP_AgentSetServers: agent_endpoint_test.go:759:
                Error Trace:    agent_endpoint_test.go:759
                                                        http_test.go:1089
                                                        agent_endpoint_test.go:705
                Error:          "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2
                Test:           TestHTTP_AgentSetServers
```
2020-04-07 09:27:48 -04:00
Mahmood Ali
5562abd7bf fixup! backend: support WS authentication handshake in alloc/exec 2020-04-03 14:20:31 -04:00
Mahmood Ali
67880310a1 backend: support WS authentication handshake in alloc/exec
The javascript Websocket API doesn't support setting custom headers
(e.g. `X-Nomad-Token`).  This change adds support for having an
authentication handshake message: clients can set `ws_handshake` URL
query parameter to true and send a single handshake message with auth
token first before any other mssage.

This is a backward compatible change: it does not affect nomad CLI path, as it
doesn't set `ws_handshake` parameter.
2020-04-03 11:18:54 -04:00
Mahmood Ali
179fefc8b7 agent config parsing tests for scheduler config 2020-04-03 07:54:32 -04:00
Chris Baker
d3e7288334 Merge pull request #7572 from hashicorp/f-7422-scaling-events
finalizing scaling API work
2020-04-01 13:49:22 -05:00
Seth Hoenig
98db449208 connect: fix bug where absent connect.proxy stanza needs default config
In some refactoring, a bug was introduced where if the connect.proxy
stanza in a submitted job was nil, the default proxy configuration
would not be initialized with default values, effectively breaking
Connect.

      connect {
        sidecar_service {} # should work
      }

In contrast, by setting an empty proxy stanza, the config values would
be inserted correctly.

      connect {
        sidecar_service {
	  proxy {} # workaround
	}
      }

This commit restores the original behavior, where having a proxy
stanza present is not required.

The unit test for this case has also been corrected.
2020-04-01 11:19:32 -06:00
Chris Baker
f23695e07a adding raft and state_store support to track job scaling events
updated ScalingEvent API to record "message string,error bool" instead
of confusing "reason,error *string"
2020-04-01 16:15:14 +00:00
Seth Hoenig
e63f13a0da connect: enable automatic expose paths for individual group service checks
Part of #6120

Building on the support for enabling connect proxy paths in #7323, this change
adds the ability to configure the 'service.check.expose' flag on group-level
service check definitions for services that are connect-enabled. This is a slight
deviation from the "magic" that Consul provides. With Consul, the 'expose' flag
exists on the connect.proxy stanza, which will then auto-generate expose paths
for every HTTP and gRPC service check associated with that connect-enabled
service.

A first attempt at providing similar magic for Nomad's Consul Connect integration
followed that pattern exactly, as seen in #7396. However, on reviewing the PR
we realized having the `expose` flag on the proxy stanza inseperably ties together
the automatic path generation with every HTTP/gRPC defined on the service. This
makes sense in Consul's context, because a service definition is reasonably
associated with a single "task". With Nomad's group level service definitions
however, there is a reasonable expectation that a service definition is more
abstractly representative of multiple services within the task group. In this
case, one would want to define checks of that service which concretely make HTTP
or gRPC requests to different underlying tasks. Such a model is not possible
with the course `proxy.expose` flag.

Instead, we now have the flag made available within the check definitions themselves.
By making the expose feature resolute to each check, it is possible to have
some HTTP/gRPC checks which make use of the envoy exposed paths, as well as
some HTTP/gRPC checks which make use of some orthongonal port-mapping to do
checks on some other task (or even some other bound port of the same task)
within the task group.

Given this example,

group "server-group" {
  network {
    mode = "bridge"
    port "forchecks" {
      to = -1
    }
  }

  service {
    name = "myserver"
    port = 2000

    connect {
      sidecar_service {
      }
    }

    check {
      name     = "mycheck-myserver"
      type     = "http"
      port     = "forchecks"
      interval = "3s"
      timeout  = "2s"
      method   = "GET"
      path     = "/classic/responder/health"
      expose   = true
    }
  }
}

Nomad will automatically inject (via job endpoint mutator) the
extrapolated expose path configuration, i.e.

expose {
  path {
    path            = "/classic/responder/health"
    protocol        = "http"
    local_path_port = 2000
    listener_port   = "forchecks"
  }
}

Documentation is coming in #7440 (needs updating, doing next)

Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6
which will make the examples in the documentation actually runnable.

Will add some e2e tests based on the above when it becomes available.
2020-03-31 17:15:50 -06:00
Seth Hoenig
ee3b43e6c0 jobspec: parse multi expose.path instead of explicit slice 2020-03-31 17:15:27 -06:00
Seth Hoenig
2a9749c41c connect: enable proxy.passthrough configuration
Enable configuration of HTTP and gRPC endpoints which should be exposed by
the Connect sidecar proxy. This changeset is the first "non-magical" pass
that lays the groundwork for enabling Consul service checks for tasks
running in a network namespace because they are Connect-enabled. The changes
here provide for full configuration of the

  connect {
    sidecar_service {
      proxy {
        expose {
          paths = [{
		path = <exposed endpoint>
                protocol = <http or grpc>
                local_path_port = <local endpoint port>
                listener_port = <inbound mesh port>
	  }, ... ]
       }
    }
  }

stanza. Everything from `expose` and below is new, and partially implements
the precedent set by Consul:
  https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference

Combined with a task-group level network port-mapping in the form:

  port "exposeExample" { to = -1 }

it is now possible to "punch a hole" through the network namespace
to a specific HTTP or gRPC path, with the anticipated use case of creating
Consul checks on Connect enabled services.

A future PR may introduce more automagic behavior, where we can do things like

1) auto-fill the 'expose.path.local_path_port' with the default value of the
   'service.port' value for task-group level connect-enabled services.

2) automatically generate a port-mapping

3) enable an 'expose.checks' flag which automatically creates exposed endpoints
   for every compatible consul service check (http/grpc checks on connect
   enabled services).
2020-03-31 17:15:27 -06:00
Seth Hoenig
69f19cc0c0 client: use consistent name for struct receiver parameter
This helps reduce the number of squiggly lines in Goland.
2020-03-31 17:15:27 -06:00
Yoan Blanc
c3928fe360 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc
887f23a351 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Seth Hoenig
a86e575670 Merge pull request #7524 from hashicorp/docs-consul-acl-minimums
consul: annotate Consul interfaces with ACLs
2020-03-30 13:27:27 -06:00
Mahmood Ali
5855b62d70 Merge pull request #7534 from hashicorp/b-windows-dev-network
windows: support -dev mode
2020-03-30 14:35:28 -04:00
Seth Hoenig
7a7701a4eb consul: annotate Consul interfaces with ACLs 2020-03-30 10:17:28 -06:00
Drew Bailey
207791951b update audit examples to an endpoint that is audited 2020-03-30 10:03:11 -04:00
Mahmood Ali
d5db765005 tests: remove TestHTTP_NodeDrain_Compat
Nomad 0.11 servers no longer support having pre-0.8 clients.
2020-03-30 07:06:52 -04:00
Mahmood Ali
ed6a8d80c0 tests: deflake TestHTTP_NodeDrain
A node may be recognized as not running any allocs and have its drain
flag reset before the test queries it.
2020-03-30 07:06:52 -04:00
Mahmood Ali
0e5664eb28 tests: deflake TestConsul_PeriodicSync 2020-03-30 07:06:47 -04:00
Mahmood Ali
b398f288b9 windows: support -dev mode
Support running `nomad agent -dev` in Windows, by setting proper network
interface.

Prior to this change, `nomad` uses `lo` interface but Windows uses
"Loopback Pseudo-Interface 1" to refer to loopback device interface:
https://github.com/golang/go/blob/go1.14.1/src/net/net_windows_test.go#L304-L318
.
2020-03-28 12:01:51 -04:00
Drew Bailey
e000fc8932 remove auditing for /ui/ 2020-03-27 10:12:42 -04:00
Drew Bailey
94a74717c4 wrap http.Handlers
better comments
2020-03-27 09:35:10 -04:00
Drew Bailey
d945b26612 sync changes made to oss files from ent 2020-03-25 10:57:44 -04:00
Drew Bailey
5751ba6d16 add in change missed from ent 2020-03-25 10:53:38 -04:00
Drew Bailey
dc7e0bae77 add auditor 2020-03-25 10:48:23 -04:00
Drew Bailey
80965716b6 allow all build contexts to use noOpAuditor 2020-03-25 10:38:40 -04:00
Mahmood Ali
8edf55600b Merge pull request #7487 from hashicorp/b-xss-oss
agent: prevent XSS by controlling Content-Type
2020-03-25 09:56:11 -04:00
Michael Schurter
79e55b4d76 remove double negative from comment
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-03-25 09:45:43 -04:00
Michael Schurter
a7dce71fbf test: assert monitor endpoint sets proper headers 2020-03-25 09:45:43 -04:00
Michael Schurter
43e4706b35 test: assert fs endpoints are xss safe 2020-03-25 09:45:43 -04:00
Michael Schurter
b2ff9bcb1f agent: prevent XSS by controlling Content-Type 2020-03-25 09:45:43 -04:00
Mahmood Ali
4bbde0ea33 tests: test agent to use a noop auditor 2020-03-25 08:45:44 -04:00
Mahmood Ali
c55f3ed084 per-task restart policy 2020-03-24 17:00:41 -04:00
Lang Martin
5485cff30b csi: return an empty result list from plugins & volumes without type, not an error (#7471) 2020-03-24 14:28:28 -04:00
Chris Baker
4330fe8497 bad conversion between api.ScalingPolicy and structs.ScalingPolicy meant
that we were throwing away .Min if provided
2020-03-24 14:39:06 +00:00
Chris Baker
d4f967cdd4 made count optional during job scaling actions
added ACL protection in Job.Scale
in Job.Scale, only perform a Job.Register if the Count was non-nil
2020-03-24 14:39:05 +00:00
Chris Baker
9292e88f2b changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults.
- tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent)
- Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max

modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and
2020-03-24 13:57:17 +00:00
Chris Baker
2d88d57e52 fixed http endpoints for job.register and job.scalestatus 2020-03-24 13:57:16 +00:00
Chris Baker
03eb96aba2 wip: scaling status return, almost done 2020-03-24 13:57:15 +00:00
Chris Baker
66bf8dd48d wip: some tests still failing
updating job scaling endpoints to match RFC, cleaning up the API object as well
2020-03-24 13:57:14 +00:00
Chris Baker
3b4a1aecd9 finished refactoring state store, schema, etc 2020-03-24 13:57:14 +00:00
Chris Baker
6b9c0043e4 wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request 2020-03-24 13:57:12 +00:00
Chris Baker
94381c0da8 wip: added tests for client methods around group scaling 2020-03-24 13:57:11 +00:00
Chris Baker
4406668b53 wip: add GET endpoint for job group scaling target 2020-03-24 13:57:10 +00:00
Chris Baker
1c9bac9087 wip: added job.scale rpc endpoint, needs explicit test (tested via http now) 2020-03-24 13:57:09 +00:00