Commit Graph

21461 Commits

Author SHA1 Message Date
Jai Bhagat
bcb3e64237 add name property back to agent schema in ember mirage
The name property had to be added back to the agent schema
in the Agent Factory because the /agent/monitor endpoint in
the config finds agents by their names and since member is not
a proper entity in our Mirage Config we can't just findBy name
of the member. So although we're following the correct schema
we're set-up to rely on this.
2021-07-11 16:09:43 -04:00
Jai Bhagat
a9da541412 refactor tests that depend on Agent fixutres in Mirage 2021-07-08 08:01:15 -04:00
Jai Bhagat
36c5074ebf add agent serializer
This commit adds a serializer for the Agent. When the factory was originally
designed to have the Member properities directly on the Agent class which was
not set-up properly technically, but since we didn't really make use of the
Agent endpoint.
2021-07-06 16:42:07 -04:00
Jai Bhagat
4c020c7229 edit the computed agent version property
This PR edits the computed agent version that is returned upon hitting
the agent self request endpoint. The reason is because we believe that
the Agent Member Tag property sometimes returns null because we may have
cases where there are only clients and no servers and only servers are
included in the Serf Gossip Protocol. There may be other cases where we
do in fact have servers but the node is erased for some reason. We are
unsure how to replicate that issue, however.

edit mirage config

This commit updates the Mirage Config because our acceptance tests
depend on the Mirage Config, while we rely on Mirage Factories to
populate fixture data for us to use when to run the Nomad UI locally

Revert "update the open-button disability functionality depending on a job's state"

This reverts commit 5190b308a51d55a7b0617854164c155d36d7e513.
2021-06-14 13:22:36 -04:00
Seth Hoenig
3a5cbc4713 Merge pull request #10754 from hashicorp/b-client-connect-constraint
consul/connect: remove unnecessary connect constraint on clients
2021-06-14 09:41:25 -05:00
James Rasell
7019bc2b9c Merge pull request #10752 from hashicorp/b-fix-test-datarace-volumewatcher
volumewatcher: fix test data race.
2021-06-14 16:30:34 +02:00
Tim Gross
2b63a093ac quotas: evaluate quota feasibility last in scheduler (#10753)
The `QuotaIterator` is used as the source of nodes passed into feasibility
checking for constraints. Every node that passes the quota check counts the
allocation resources agains the quota, and as a result we count nodes which
will be later filtered out by constraints. Therefore for jobs with
constraints, nodes that are feasibility checked but fail have been counted
against quotas. This failure mode is order dependent; if all the unfiltered
nodes happen to be quota checked first, everything works as expected.

This changeset moves the `QuotaIterator` to happen last among all feasibility
checkers (but before ranking). The `QuotaIterator` will never receive filtered
nodes so it will calculate quotas correctly.
2021-06-14 10:11:40 -04:00
Seth Hoenig
0d13ef0c75 consul/connect: remove unnecessary connect constraint on clients
PR https://github.com/hashicorp/nomad/pull/10702 added 2 new constraints
for connect jobs - one for Consul gRPC listener, and one for Connect being
enabled on Clients. Connect does not need to be enabled on clients, only
on Consul servers. Remove the extra constraint.

Discuss:
https://discuss.hashicorp.com/t/nomad-1-1-1-and-consul-connect-enabled-on-consul-clients/25295
2021-06-14 08:01:45 -05:00
James Rasell
b6505c2350 volumewatcher: fix test data race. 2021-06-14 12:11:35 +02:00
Brandon Romano
ff2e2c113b Merge pull request #10750 from hashicorp/br.quote-image
Fix headshot image 404
2021-06-11 15:38:09 -07:00
Brandon Romano
399dd84acc Fix headshot image 404 2021-06-11 15:31:05 -07:00
Luiz Aoqui
5cfc104a38 fix agent-info help message formatting (#10747) 2021-06-11 15:39:28 -04:00
James Rasell
88e456d962 Merge pull request #10745 from hashicorp/b-fix-test-datarace-deploymentwatcher
deploymentwatcher: fix test data race.
2021-06-11 17:23:03 +02:00
James Rasell
a7d055a584 Merge pull request #10744 from hashicorp/b-remove-duplicate-imports
chore: remove duplicate import statements
2021-06-11 16:42:34 +02:00
Mahmood Ali
c467de331e Merge pull request #10742 from hashicorp/deflake-tests-20210608
Deflaking Test 2021 June edition
2021-06-11 09:14:40 -04:00
James Rasell
9c926d7a64 Merge pull request #10739 from hashicorp/f-remove-unused-types-pkg
core: remove unused types pkg and PeriodicCallback type.
2021-06-11 13:27:22 +02:00
James Rasell
9a25e89b00 deploymentwatcher: fix test data race. 2021-06-11 11:55:21 +02:00
James Rasell
a4156c3e94 tests: remove duplicate import statements. 2021-06-11 09:39:22 +02:00
James Rasell
d1db141472 jobspec2: remove duplicate imports statements. 2021-06-11 09:38:47 +02:00
James Rasell
b3fe60a7f3 drivers: remove duplicate import statements. 2021-06-11 09:38:09 +02:00
James Rasell
55f0611aec e2e: remove duplicate import statements. 2021-06-11 09:37:23 +02:00
Mahmood Ali
7254fb34d1 deflake TestNomad_BootstrapExpect and other leader tests
The test fails reliably locally on my machine. The test uses non-dev mode
where Raft actions get committed to disk, causing operations to exceed
the 50ms tight Raft deadlines.

So, here we ensure that non-dev servers use default Raft config
files with longer timeouts.

Also, noticed that the test queries a server, that may a follower with a
stale state.

I've updated the test to ensure we query the leader for its state. The
Barrier call ensures that the leader is a "stable" leader with committed
entries. Protects against a window where a new leader reports the
previous term before it commits a raft log entry.
2021-06-10 22:04:10 -04:00
Mahmood Ali
14994d666d tests: deflake TestAgentProfile_RemoteClient
TestAgentProfile_RemoteClient test must wait for the client node to be
registered in raft state store, and not merely that the server has a
network connection from the client.

In https://app.circleci.com/pipelines/github/hashicorp/nomad/15539/workflows/8dcbc3f3-946b-4da0-b089-9093788bc0c9/jobs/147919, notice how `node registration complete` log line occured after the test already have failed.

This is another case of flakiness due to not waiting for client
registration.
2021-06-10 22:00:15 -04:00
Mahmood Ali
a1800abb7f tests: deflake TestMonitor_Monitor_RemoteServer and cross-region tests
Ensure that all servers are joined to each other before test proceed,
instead of just joining them to the first server and relying on
background serf propagation.

Relying on backgorund serf propagation is a cause of flakiness,
specially for tests with multiple regions. The server receiving the RPC
may not be aware of the region and fail to forward RPC accordingly.

For example, consider `TestMonitor_Monitor_RemoteServer` failure in https://app.circleci.com/pipelines/github/hashicorp/nomad/16402/workflows/7f327235-7d0c-40ba-9757-600522afca51/jobs/158045  you can observe:
* `nomad-117` is joined to `nomad-118` and `nomad-119`
* `nomad-119` is the foreign region
* `nomad-117` gains leadership in the default region, `nomad-118` is the non-leader
*  search logs for `nomad: adding server` and notice that `nomad-118`
   only added `nomad-118` and `nomad-118`, but not `nomad-119`!
* so the query to the non-leader in the test fails to be forwarded to
  the appopriate region.
2021-06-10 21:27:55 -04:00
Mahmood Ali
7badf0fda2 tests: deflake CSI forwarding tests
This updates `client.Ready()` so it returns once the client node got
registered at the servers. Previously, it returns when the
fingerprinters first batch completes, wtihout ensuring that the node is
stored in the Raft data. The tests may fail later when it with unknown
node errors later.

`client.Reedy()` seem to be only called in CSI and some client stats
now.

This class of bug, assuming client is registered without checking, is a
source of flakiness elsewhere. Other tests use other mechanisms for
checking node readiness, though not consistently.
2021-06-10 21:26:34 -04:00
Isabel Suchanek
398d81ccf1 Merge pull request #10740 from hashicorp/docs-deploy-monitor
docs: add deployment monitor to docs, changelog
2021-06-10 13:53:03 -07:00
Isabel Suchanek
34aecd0e82 docs: add deployment monitor to docs, changelog
Updates the deployment status and job run docs
2021-06-10 10:51:33 -07:00
James Rasell
348288c03d core: remove unused types pkg and PeriodicCallback type. 2021-06-10 15:57:13 +02:00
Mahmood Ali
91c6193bf1 update release to 1.1.1 (#10735) 2021-06-10 08:57:30 -04:00
Mahmood Ali
f69c9390ea prepare for 1.1.2 dev cycle 2021-06-10 08:04:25 -04:00
Nomad Release Bot
9caf5e2881 remove generated files 2021-06-10 08:04:25 -04:00
Nomad Release bot
21465a592d Generate files for 1.1.1 release 2021-06-10 08:04:25 -04:00
Mahmood Ali
d9024c3a8d prepare changelog for 1.1.1/1.0.7 release 2021-06-10 08:04:25 -04:00
Shishir Mahajan
b9668e87dc Update containerd task driver options.
- hostname
- auth

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-06-10 08:03:49 -04:00
Mahmood Ali
122a4cb844 tests: use standard library testing.TB
Glint pulled in an updated version of mitchellh/go-testing-interface
which broke some existing tests because the update added a Parallel()
method to testing.T. This switches to the standard library testing.TB
which doesn't have a Parallel() method.
2021-06-09 16:18:45 -07:00
Isabel Suchanek
0edda116ad cli: add monitor flag to deployment status
Adding '-verbose' will print out the allocation information for the
deployment. This also changes the job run command so that it now blocks
until deployment is complete and adds timestamps to the output so that
it's more in line with the output of node drain.

This uses glint to print in place in running in a tty. Because glint
doesn't yet support cmd/powershell, Windows workflows use a different
library to print in place, which results in slightly different
formatting: 1) different margins, and 2) no spinner indicating
deployment in progress.
2021-06-09 16:18:45 -07:00
Mahmood Ali
66d590c64b golang: update to 1.16.5 (#10733)
1.16.5 contains some security fixes for zip that are notable to get: https://github.com/golang/go/issues?q=milestone%3AGo1.16.5+label%3ACherryPickApproved
2021-06-09 11:51:41 -04:00
Mike Wickett
a4714e1ff4 website: update alert banner (#10728) 2021-06-09 11:02:10 -04:00
Mahmood Ali
322617dfbe driver/docker: ignore cpuset errors for short-lived tasks follow up (#10730)
minor refactor and changelog
2021-06-09 11:00:39 -04:00
Seth Hoenig
0849b03f27 Merge pull request #10732 from hashicorp/docs-update-cl
docs: update cl
2021-06-09 09:57:20 -05:00
Seth Hoenig
0e7c20f0ca docs: update cl 2021-06-09 09:50:29 -05:00
Seth Hoenig
529b722d8d Merge pull request #10729 from hashicorp/f-cns-acl-check_cp-ent
consul: move consul acl tests into ent files
2021-06-09 09:45:13 -05:00
Mahmood Ali
646e8112bc Merge pull request #10416 from hashicorp/b-cores-docker
driver/docker: ignore error if container exists before cgroup can be written
2021-06-09 10:34:02 -04:00
Seth Hoenig
9883b95ae4 consul: move consul acl tests into ent files
(cherry-pick ent back to oss)

This PR moves a lot of Consul ACL token validation tests into ent files,
so that we can verify correct behavior difference between OSS and ENT
Nomad versions.
2021-06-09 08:38:42 -05:00
Michael Schurter
594ceb7022 docs: improve wait_for_index metrics description (#10717)
Old description of `{plan,worker}.wait_for_index` described the metric
in terms of waiting for a snapshot which has two problems:

1. "Snapshot" is an overloaded term in Nomad and operators can't be
   expected to know which use we're referring to here.
2. The most important thing about the metric is what we're waiting *on*
   before taking a snapshot: the raft index of the object to be
   processed (plan or eval).

The new description tries to cram all of that context into the tiny
space provided.

See #5791 for details about the `wait_for_index` mechanism in general.
2021-06-09 08:53:06 -04:00
Seth Hoenig
402b19c3b0 Merge pull request #10720 from hashicorp/f-cns-acl-check
consul: correctly check consul acl token namespace when using consul oss
2021-06-08 15:43:42 -05:00
Seth Hoenig
09c9a17a7f consul: correctly check consul acl token namespace when using consul oss
This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship
check when using Consul OSS (or Consul ENT without namespace support).

Nomad v1.1.0 introduced a regression where Nomad would fail the validation
when submitting Connect jobs and allow_unauthenticated set to true, with
Consul OSS - because it would do the namespace check against the Consul ACL
token assuming the "default" namespace, which does not work because Consul OSS
does not have namespaces.

Instead of making the bad assumption, expand the namespace check to handle
each special case explicitly.

Fixes #10718
2021-06-08 13:55:57 -05:00
Michael Schurter
f180cf0fa0 Merge pull request #10721 from hashicorp/b-icanthazip
e2e: use api.ipify.org
2021-06-08 09:45:04 -07:00
Tim Gross
f5c71526c1 docs: warn not to set network_mode for Connect-enabled Docker task 2021-06-08 10:14:15 -04:00
James Rasell
007b265674 Merge pull request #10723 from hashicorp/f-changelog-10712
chanagelog: add entry for #10712
2021-06-08 15:05:00 +02:00