
22267 Commits

Author SHA1 Message Date
Tim Gross
cdbb2bcf48 CSI: resolve invalid claim states (#11890)
* csi: resolve invalid claim states on read

It's currently possible for CSI volumes to be claimed by allocations
that no longer exist. This changeset asserts a reasonable state at
the state store level by registering these nil allocations as "past
claims" on any read. This will cause any pass through the periodic GC
or volumewatcher to trigger the unpublishing workflow for those claims.

* csi: make feasibility check errors more understandable

When the feasibility checker finds we have no free write claims, it
checks to see if any of those claims are for the job we're currently
scheduling (so that earlier versions of a job can't block claims for
new versions) and reports a conflict if the volume can't be scheduled
so that the user can fix their claims. But when the checker hits a
claim that has a garbage-collected allocation, the state is recoverable by the
server once claim reaping completes and no user intervention is
required; the blocked eval should complete. Differentiate the
scheduler error produced by these two conditions.
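
The first change above (treating claims held by missing allocations as "past claims" whenever a volume is read) can be sketched roughly as follows. This is a minimal illustration, not Nomad's state store code: the `Volume` type, its fields, and `withPastClaims` are hypothetical names, and claims are simplified to a map from allocation ID to node ID.

```go
package main

import "fmt"

// Volume is a hypothetical, trimmed-down stand-in for a CSI volume record.
type Volume struct {
	ID         string
	Claims     map[string]string // allocation ID -> node ID
	PastClaims map[string]string // claims waiting for the unpublish workflow
}

// withPastClaims returns a copy of vol in which every claim whose allocation
// is not in liveAllocs is also recorded as a past claim. The volume passed in
// is left unmodified, mirroring a correction applied on read.
func withPastClaims(vol Volume, liveAllocs map[string]bool) Volume {
	out := Volume{
		ID:         vol.ID,
		Claims:     make(map[string]string, len(vol.Claims)),
		PastClaims: make(map[string]string, len(vol.PastClaims)),
	}
	for allocID, node := range vol.PastClaims {
		out.PastClaims[allocID] = node
	}
	for allocID, node := range vol.Claims {
		out.Claims[allocID] = node
		if !liveAllocs[allocID] {
			// The allocation no longer exists, so flag the claim for cleanup.
			out.PastClaims[allocID] = node
		}
	}
	return out
}

func main() {
	vol := Volume{
		ID:     "vol-1",
		Claims: map[string]string{"alloc-a": "node-1", "alloc-b": "node-2"},
	}
	read := withPastClaims(vol, map[string]bool{"alloc-a": true})
	fmt.Println(len(read.PastClaims), len(vol.PastClaims))
}
```

Working on a copy means the correction shows up for every reader (such as a periodic GC pass) without requiring a separate write at read time.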
2022-01-28 14:43:35 -05:00
Tim Gross
debffe2436 csi: update leader's ACL in volumewatcher (#11891)
The volumewatcher that runs on the leader needs to make RPC calls
rather than writing to raft (as we do in the deploymentwatcher)
because the unpublish workflow needs to make RPC calls to the
clients. This requires that the volumewatcher has access to the
leader's ACL token.

But when leadership transitions, the new leader creates a new leader
ACL token. This ACL token needs to be passed into the volumewatcher
when we enable it, otherwise the volumewatcher can find itself with a
stale token.
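
A minimal sketch of the idea, assuming a watcher that is toggled on leadership changes: the token is handed over every time the watcher is enabled, so it cannot outlive a leadership transition. The `Watcher` type and its methods are hypothetical and do not reproduce Nomad's volumewatcher API.

```go
package main

import (
	"fmt"
	"sync"
)

// Watcher is a hypothetical stand-in for a leader-only background watcher.
type Watcher struct {
	mu      sync.Mutex
	enabled bool
	token   string
}

// SetEnabled turns the watcher on or off. The caller passes the current
// leader token on every enable, so a token from a previous term is replaced.
func (w *Watcher) SetEnabled(enabled bool, leaderToken string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.enabled = enabled
	if enabled {
		w.token = leaderToken
	} else {
		w.token = ""
	}
}

// Token returns the token the watcher would use for its RPC calls.
func (w *Watcher) Token() string {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.token
}

func main() {
	w := &Watcher{}
	w.SetEnabled(true, "token-term-1")
	w.SetEnabled(false, "") // leadership lost
	w.SetEnabled(true, "token-term-2")
	fmt.Println(w.Token())
}
```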
2022-01-28 14:43:27 -05:00
Derek Strickland
143fb90e4c Update IsEmpty to check for pre-1.2.4 fields (#11930) 2022-01-28 14:41:49 -05:00
Nomad Release Bot
2f9accbde7 Release v1.2.4 2022-01-19 00:22:47 +00:00
Nomad Release bot
9f21b724ac Generate files for 1.2.4 release 2022-01-18 23:43:00 +00:00
Luiz Aoqui
c5fd90a7dd docs: add 1.2.4 to changelog 2022-01-18 18:31:34 -05:00
Luiz Aoqui
a0c0b808af docs: add nomad.plan.node_rejected metric (#11860) 2022-01-18 13:47:20 -05:00
Luiz Aoqui
61340142fa ui: fix test (#11870) 2022-01-18 10:36:10 -05:00
Dave May
8d28bfe415 cli: Add event stream capture to nomad operator debug (#11865) 2022-01-17 21:35:51 -05:00
Michael Schurter
dc81f2650a cli: improve debug error messages (#11507)
Improves `nomad debug` error messages when contacting agents that do not
have /v1/agent/host endpoints (the endpoint was added in v0.12.0)

Part of #9568 and manually tested against Nomad v0.8.7.

Hopefully isRedirectError can be reused for more cases listed in #9568
2022-01-17 11:15:17 -05:00
Luiz Aoqui
ac18d719fe docs: update 1.2.0 upgrade note now that the UI ACL is fixed (#11840) 2022-01-17 11:09:08 -05:00
Luiz Aoqui
d1c3c22064 docs: add HashiBox to the list of community tools (#11861) 2022-01-17 11:08:41 -05:00
Luiz Aoqui
4c5dd858d8 changelog: add entry for #11793 (#11862) 2022-01-17 11:08:29 -05:00
James Rasell
868ab230e7 Merge pull request #11849 from hashicorp/b-changelog-11848
changelog: add entry for #11848
2022-01-17 09:35:10 +01:00
Luiz Aoqui
8a427a470a scheduler: detect and log unexpected scheduling collisions (#11793) 2022-01-14 20:09:14 -05:00
Jai
fcd86e49a1 Merge pull request #11820 from hashicorp/f-ui/alloc-legend
feat: add links to legend items in `allocation-summary`
2022-01-14 14:02:54 -05:00
Tim Gross
307bcada7f csi: volume deregistration should require exact ID (#11852)
The command line client sends a specific volume ID, but this isn't
enforced at the API level and we were incorrectly using a prefix match
for volume deregistration, resulting in cases where a volume with a
shorter ID that's a prefix of another volume would be deregistered
instead of the intended volume.
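
The difference between the two lookups can be illustrated with a small sketch. The helpers `matchByPrefix` and `matchExact` are hypothetical and operate on a plain slice of IDs rather than Nomad's state store; the point is only that when one ID is a prefix of another, a prefix lookup yields more than one candidate, while an exact lookup cannot.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchByPrefix returns every ID that starts with prefix, in sorted order.
func matchByPrefix(ids []string, prefix string) []string {
	var out []string
	for _, id := range ids {
		if strings.HasPrefix(id, prefix) {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

// matchExact reports whether id is present verbatim in ids.
func matchExact(ids []string, id string) bool {
	for _, candidate := range ids {
		if candidate == id {
			return true
		}
	}
	return false
}

func main() {
	ids := []string{"vol-10", "vol-1"}
	fmt.Println(matchByPrefix(ids, "vol-1")) // both IDs share this prefix
	fmt.Println(matchExact(ids, "vol-1"))    // only the full ID matches
	fmt.Println(matchExact(ids, "vol"))      // a bare prefix is rejected
}
```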
2022-01-14 12:26:03 -05:00
Tim Gross
e14c10e884 csi: when warning for multiple prefix matches, use full ID (#11853)
When the `volume deregister` or `volume detach` commands get an ID
prefix that matches multiple volumes, show the full length of the
volume IDs in the list of volumes shown so that the user can select
the correct one.
2022-01-14 12:25:48 -05:00
Tim Gross
6b7ecb2a65 freebsd: build fix for ARM7 32-bit (#11854)
The size of `stat_t` fields is architecture dependent, which was
reportedly causing a build failure on FreeBSD ARM7 32-bit
systems. This changeset matches the behavior we have on Linux.
2022-01-14 12:25:32 -05:00
Tim Gross
77287b0bfc drivers: set world-readable permissions on copied resolv.conf (#11856)
When we copy the system DNS to a task's `resolv.conf`, we should set
the permissions as world-readable so that unprivileged users within
the task can read it.
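
As a rough sketch of the technique (not the driver code itself), a copy helper can set mode 0644 explicitly after writing, since the mode passed at creation time is filtered by the process umask and is ignored when the destination already exists. `copyWorldReadable` and `demo` are hypothetical names, and the file contents are placeholders.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// copyWorldReadable copies src to dst and forces mode 0644 on dst.
func copyWorldReadable(src, dst string) error {
	data, err := os.ReadFile(src)
	if err != nil {
		return err
	}
	if err := os.WriteFile(dst, data, 0o644); err != nil {
		return err
	}
	// Set the mode explicitly so the result does not depend on the umask
	// or on the permissions of a pre-existing destination file.
	return os.Chmod(dst, 0o644)
}

// demo copies a private (0600) file and returns the permissions of the copy.
func demo() (os.FileMode, error) {
	dir, err := os.MkdirTemp("", "resolv-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "resolv.conf.src")
	dst := filepath.Join(dir, "resolv.conf")
	if err := os.WriteFile(src, []byte("nameserver 192.0.2.53\n"), 0o600); err != nil {
		return 0, err
	}
	if err := copyWorldReadable(src, dst); err != nil {
		return 0, err
	}
	info, err := os.Stat(dst)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mode)
}
```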
2022-01-14 12:25:23 -05:00
Jai Bhagat
9731dea75d chore: add changelog 2022-01-14 10:23:09 -05:00
Jai Bhagat
08a5e867e2 test: add test stories for clicking allocation summary 2022-01-14 10:23:09 -05:00
Jai Bhagat
a3d6240895 refact: add data-test-selectors and correct css selectors in summary 2022-01-14 10:23:06 -05:00
Jai Bhagat
2e73e425ed styling: remove clickable link text decoration override to match new mocks 2022-01-14 10:20:36 -05:00
Jai Bhagat
ecaf46c6c9 refact: allocation and child summaries into ember-cli-page-object components 2022-01-14 10:20:33 -05:00
Jai Bhagat
c1bef174ee fix: typo in data-test-selector 2022-01-14 10:19:01 -05:00
Jai Bhagat
1eebec0a03 styling: update styling to match new figma mocks 2022-01-14 10:14:44 -05:00
Jai Bhagat
205a07c237 feat: add clicking functionality to alloc status legend 2022-01-14 10:14:44 -05:00
James Rasell
54cbfe0c5b changelog: add entry for #11848 2022-01-14 13:40:50 +01:00
James Rasell
8f01d74f70 Merge pull request #11842 from hashicorp/b-name-oss-files-consistently
chore: ensure consistent file naming for non-enterprise files.
2022-01-14 08:13:49 +01:00
James Rasell
c08a036655 Merge pull request #11403 from hashicorp/f-gh-11059
agent/docs: add better clarification when top-level data dir needs setting
2022-01-13 16:41:35 +01:00
James Rasell
eee5d90e8b Merge pull request #11402 from hashicorp/document-client-initial-vault-renew
taskrunner: add clarifying initial vault token renew comment.
2022-01-13 16:21:58 +01:00
Luiz Aoqui
3bf78fde47 Fix log level parsing from lines that include a timestamp (#11838) 2022-01-13 09:56:35 -05:00
Seth Hoenig
d87c41d2b7 Merge pull request #11831 from hashicorp/mods-explain-pinned
mods: explain replace statements
2022-01-13 08:53:17 -06:00
James Rasell
e961751d94 chore: ensure consistent file naming for non-enterprise files. 2022-01-13 11:32:16 +01:00
Luiz Aoqui
6b488bdad3 Fix ACL requirements for job details UI (#11672) 2022-01-12 21:26:02 -05:00
Luiz Aoqui
1344906f10 docs: fix autoscaling Datadog site configuration (#11824) 2022-01-12 21:06:30 -05:00
Michael Schurter
07fd6d9a66 Merge pull request #11830 from hashicorp/b-validate-reserved-ports
agent: validate reserved_ports are valid
2022-01-12 17:12:30 -08:00
Michael Schurter
a490e4d423 Merge pull request #11833 from hashicorp/deps-go-getter-v1.5.11
deps: update go-getter to v1.5.11
2022-01-12 16:42:55 -08:00
Michael Schurter
12d21aa0fe doc: add changelog for #11830 2022-01-12 14:21:47 -08:00
Michael Schurter
b4d3a610db agent: validate reserved_ports are valid
The goal is to fix at least one of the causes that can make a node
ineligible to receive work:
https://github.com/hashicorp/nomad/issues/9506#issuecomment-1002880600
2022-01-12 14:21:47 -08:00
Michael Schurter
d65c6fcad8 deps: update go-getter to v1.5.11
Pulls in https://github.com/hashicorp/go-getter/pull/348

Fixes a case where an SSH key could be logged if a specific error
condition is hit.
2022-01-12 14:11:16 -08:00
Seth Hoenig
742ac92886 mods: explain replace statements 2022-01-12 15:14:46 -06:00
Seth Hoenig
b446352f15 Merge pull request #11827 from hashicorp/cleanup-response-recorder
cleanup: stop referencing deprecated HeaderMap field
2022-01-12 11:10:51 -06:00
Seth Hoenig
4b20581dc5 cleanup: stop referencing deprecated HeaderMap field
Remove reference to the deprecated ResponseRecorder.HeaderMap field,
instead calling .Response.Header() to get the same data.
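
For context, in Go's standard library the documented replacement for the deprecated `httptest.ResponseRecorder.HeaderMap` field is the `Header` field of the `*http.Response` returned by `Result()`. The sketch below shows that pattern in isolation; the exact call used in the Nomad change is not shown here, and `contentType` and `sampleHandler` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sampleHandler returns a handler that sets a Content-Type header.
func sampleHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
	})
}

// contentType runs h against a recorded request and reads the header from
// the recorded response instead of the deprecated HeaderMap field.
func contentType(h http.Handler) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	h.ServeHTTP(rec, req)
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	fmt.Println(contentType(sampleHandler()))
}
```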

closes #10520
2022-01-12 10:32:54 -06:00
sara-gawlinski
e644aa7c0a Update alert-banner (#11817)
Updating banner for edge survey
2022-01-12 11:28:17 -05:00
Tim Gross
1b719eef68 docs: improve changelog for PR #11783 (#11818) 2022-01-11 11:54:12 -05:00
Tim Gross
ef93ab2d56 docs: changelog for PR #11783 (#11812) 2022-01-10 16:39:21 -05:00
Alessandro De Blasis
759397533a metrics: added mapped_file metric (#11500)
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
Co-authored-by: Nate <37554478+servusdei2018@users.noreply.github.com>
2022-01-10 15:35:19 -05:00
grembo
e9032c10d3 Un-break templates when using vault stanza change_mode noop (#11783)
Templates in nomad jobs make use of the vault token defined in
the vault stanza when issuing credentials like client certificates.

When using change_mode "noop" in the vault stanza, consul-template
is not informed in case a vault token is re-issued (which can
happen from time to time for various reasons, as described
in https://www.nomadproject.io/docs/job-specification/vault).

As a result, consul-template will keep using the old vault token
to renew credentials and, once the token has expired, stop renewing
credentials. The symptom of this problem is a vault_token
file that is newer than the issued credential (e.g., TLS certificate)
in a job's /secrets directory.

This change corrects that by calling h.updater.updatedVaultToken(token),
which informs stakeholders about the new token and makes sure
consul-template uses it.

Example job template fragment:

    vault {
        policies = ["nomad-job-policy"]
        change_mode = "noop"
    }

    template {
      data = <<-EOH
        {{ with secret "pki_int/issue/nomad-job"
        "common_name=myjob.service.consul" "ttl=90m"
        "alt_names=localhost" "ip_sans=127.0.0.1"}}
        {{ .Data.certificate }}
        {{ .Data.private_key }}
        {{ .Data.issuing_ca }}
        {{ end }}
      EOH
      destination = "${NOMAD_SECRETS_DIR}/myjob.crt"
      change_mode = "noop"
    }

This fix does not alter the meaning of the three change modes of vault

- "noop" - Take no action
- "restart" - Restart the job
- "signal" - Send a signal to the task

as the switch statement following line 232 contains the necessary
logic.
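
The intended control flow can be sketched as follows: the new token is always passed on, and the change mode only decides what happens to the task afterwards. This is an illustration under assumptions, not the code from the hook; `handleNewToken`, the `notify` callback, and the returned action strings are hypothetical.

```go
package main

import "fmt"

// handleNewToken always forwards the new token to interested parties and
// then returns the follow-up action selected by changeMode.
func handleNewToken(changeMode, token string, notify func(string)) string {
	// Forward the token regardless of the change mode.
	notify(token)

	switch changeMode {
	case "restart":
		return "restart"
	case "signal":
		return "signal"
	default: // "noop"
		return "none"
	}
}

func main() {
	var seen string
	action := handleNewToken("noop", "new-token", func(t string) { seen = t })
	fmt.Println(seen, action)
}
```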

It is assumed that "take no action" was never meant to mean "don't tell
consul-template about the new vault token".

Successfully tested in a staging cluster consisting of multiple
nomad client nodes.
2022-01-10 14:41:38 -05:00