demo: CSI Ceph

This changeset expands on the existing demonstration we had for Ceph by
showing volume creation. It includes a demo setup for Ceph on Vagrant so that
you don't need a whole Ceph cluster to try it out.
This commit is contained in:
Tim Gross
2021-04-29 15:25:24 -04:00
committed by Tim Gross
parent 336809f81b
commit b67fda839b
10 changed files with 593 additions and 190 deletions


@@ -1,66 +1,132 @@
# Openstack Ceph-CSI Plugin
# Ceph CSI Plugin
The configuration here is for the Ceph RBD driver, migrated from the k8s config [documentation](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md). It can be easily modified for the CephFS Driver, as used [here](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-cephfs.md).
## Requirements
The example plugin job creates a file at `local/cloud.conf` using a [`template`](https://www.nomadproject.io/docs/job-specification/template) stanza which pulls the necessary credentials from a [Vault kv-v2](https://www.vaultproject.io/docs/secrets/kv/kv-v2) secrets store.
### Docker Privileged Mode
The Ceph CSI Node task requires that [`privileged = true`](https://www.nomadproject.io/docs/drivers/docker#privileged) be set. This is not needed for the Controller task.
## Container Arguments
Refer to the official plugin [guide](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md).
- `--type=rbd`
- Driver type `rbd` (or alternatively `cephfs`)
- `--endpoint=unix:///csi/csi.sock`
- This option must match the `mount_dir` specified in the `csi_plugin` stanza for the task.
- `--nodeid=${node.unique.name}`
- A unique ID for the node the task is running on. Recommend using `${node.unique.name}`
- `--cluster=${NOMAD_DC}`
- The cluster the Controller/Node is a part of. Recommend using `${NOMAD_DC}`
- `--instanceid=${attr.unique.platform.aws.instance-id}`
- Unique ID distinguishing this instance of Ceph CSI among other instances, when sharing Ceph clusters across CSI instances for provisioning. Used for topology-aware deployments.
The configuration here is for the Ceph RBD driver, migrated from the k8s
config
[documentation](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md). It
can be modified for the CephFS Driver, as used
[here](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-cephfs.md).
## Deployment
### Plugin
The Ceph CSI Node task requires that [`privileged =
true`](https://www.nomadproject.io/docs/drivers/docker#privileged) be
set. This is not needed for the Controller task.
```bash
export NOMAD_ADDR=https://nomad.example.com:4646
export NOMAD_TOKEN=34534-3sdf3-szfdsafsdf3423-zxdfsd3
nomad job run ceph-csi-plugin.hcl
```
### Plugin Arguments
Refer to the official plugin
[guide](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md).
* `--type=rbd`: driver type `rbd` (or alternatively `cephfs`)
* `--endpoint=unix:///csi/csi.sock`: this option must match the `mount_dir`
specified in the `csi_plugin` stanza for the task.
* `--nodeid=${node.unique.id}`: a unique ID for the node the task is running
on.
* `--instanceid=${NOMAD_ALLOC_ID}`: a unique ID distinguishing this instance
of Ceph CSI among other instances, when sharing Ceph clusters across CSI
instances for provisioning. Used for topology-aware deployments.
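As a sketch of how these arguments fit into a job specification (the real job
files in this directory pass the node and instance IDs through a `template`
block instead), note that the socket in `--endpoint` has to live under the
`mount_dir` declared in the task's `csi_plugin` stanza:

```hcl
# sketch only; see plugin-cephrbd-node.nomad for the real job
job "plugin-cephrbd-node-sketch" {
  datacenters = ["dc1"]
  type        = "system"

  group "cephrbd" {
    task "plugin" {
      driver = "docker"

      config {
        image      = "quay.io/cephcsi/cephcsi:canary"
        privileged = true # required for the node plugin only

        args = [
          "--type=rbd",
          "--drivername=rbd.csi.ceph.com",
          "--nodeserver=true",
          "--nodeid=${node.unique.id}",
          # the socket path sits under the csi_plugin mount_dir below
          "--endpoint=unix:///csi/csi.sock",
        ]
      }

      csi_plugin {
        id        = "cephrbd"
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}
```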
### Run the Plugins
Run both the controller and node plugins:
```
$ nomad job run -var-file=nomad.vars ./plugin-cephrbd-controller.nomad
==> Monitoring evaluation "c8e65575"
Evaluation triggered by job "plugin-cephrbd-controller"
==> Monitoring evaluation "c8e65575"
Evaluation within deployment: "b15b6b2b"
Allocation "1955d2ab" created: node "8dda4d46", group "cephrbd"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "c8e65575" finished with status "complete"
$ nomad job run -var-file=nomad.vars ./plugin-cephrbd-node.nomad
==> Monitoring evaluation "5e92c5dc"
Evaluation triggered by job "plugin-cephrbd-node"
==> Monitoring evaluation "5e92c5dc"
Allocation "5bb9e57a" created: node "8dda4d46", group "cephrbd"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "5e92c5dc" finished with status "complete"
$ nomad plugin status cephrbd
ID = cephrbd
Provider = rbd.csi.ceph.com
Version = canary
Controllers Healthy = 1
Controllers Expected = 1
Nodes Healthy = 1
Nodes Expected = 1
Allocations
ID Node ID Task Group Version Desired Status Created Modified
1955d2ab 8dda4d46 cephrbd 0 run running 3m47s ago 3m37s ago
5bb9e57a 8dda4d46 cephrbd 0 run running 3m44s ago 3m43s ago
```
### Volume Registration
### Create a Volume
The `external_id` value for the volume must be strictly formatted, see `ceph_csi.tf`. Based on [Ceph-CSI ID Format](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid.go#L27), see [examples](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid_test.go#L33).
The `secrets` block for the volume must be populated with the `userID` and
`userKey` values pulled from `/etc/ceph/ceph.client.<user>.keyring`, e.g.
```
userid = "<user>"
userkey = "AWBg/BtfJInSFBATOrrnCh6UGE3QB3nYakdF+g=="
```

Create the volume from the volume specification file:

```
$ nomad volume create ./volume.hcl
Created external volume 0001-0024-e9ba69fa-67ff-5920-b374-84d5801edd19-0000000000000002-3603408d-a9ca-11eb-8ace-080027c5bc64 with ID testvolume
```
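Nothing in this changeset mounts the volume, but once it exists a job can
claim it with a group-level `volume` block and a task-level `volume_mount`. A
hypothetical consumer job (not included here), assuming the `testvolume` ID
from the output above and Nomad 1.1+ syntax:

```hcl
job "volume-consumer" {
  datacenters = ["dc1"]

  group "app" {
    volume "ceph" {
      type            = "csi"
      source          = "testvolume"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "app" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "sleep"
        args    = ["3600"]
      }

      volume_mount {
        volume      = "ceph"
        destination = "/srv/data"
      }
    }
  }
}
```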
```bash
export NOMAD_ADDR=https://nomad.example.com:4646
export NOMAD_TOKEN=34534-3sdf3-szfdsafsdf3423-zxdfsd3
nomad volume register example_volume.hcl
```
### Register a Volume
You can register a volume that already exists in Ceph. In this case, you'll
need to provide the `external_id` field. The `ceph-csi-id.tf` Terraform file
in this directory can be used to generate the correctly-formatted ID. This is
based on [Ceph-CSI ID
Format](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid.go#L27)
(see
[examples](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid_test.go#L33)).
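A hypothetical registration spec, assuming an RBD image that already exists
in the demo cluster's `rbd` pool (the `external_id` prefix matches the one in
the `nomad volume create` output above; the trailing segment is a
placeholder):

```hcl
id        = "existingvolume"
name      = "existingvolume"
type      = "csi"
plugin_id = "cephrbd"

# generate this with ceph-csi-id.tf
external_id = "0001-0024-e9ba69fa-67ff-5920-b374-84d5801edd19-0000000000000002-<image-uuid>"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

secrets {
  userID  = "admin"
  userKey = "<key from /etc/ceph/ceph.client.admin.keyring>"
}

context {
  # as in the older registration example removed by this changeset, these
  # are ceph-csi "parameters" passed through to the plugin as context
  clusterID = "e9ba69fa-67ff-5920-b374-84d5801edd19"
  pool      = "rbd"
}
```

Register it with `nomad volume register <file>.hcl`.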
## Running Ceph in Vagrant
For demonstration purposes only, you can run Ceph as a single-container Nomad
job on the Vagrant VM managed by the `Vagrantfile` at the top level of this
repo.
The `./run-ceph.sh` script in this directory will deploy the demo container
and wait for it to be ready. The data served by this container is entirely
ephemeral and will be destroyed once it stops; you should not use this as an
example of how to run production Ceph workloads!
```sh
$ ./run-ceph.sh
nomad job run -var-file=nomad.vars ./ceph.nomad
==> Monitoring evaluation "68dde586"
Evaluation triggered by job "ceph"
==> Monitoring evaluation "68dde586"
Evaluation within deployment: "79e23968"
Allocation "77fd50fb" created: node "ca3ee034", group "ceph"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "68dde586" finished with status "complete"
waiting for Ceph to be ready..............................
ready!
```
The setup script in the Ceph container configures a key, which you'll need for
creating volumes. You can extract the key from the keyring via `nomad alloc
exec`:
```
$ nomad alloc exec 77f cat /etc/ceph/ceph.client.admin.keyring | awk '/key/{print $3}'
AQDsIoxgHqpeBBAAtmd9Ndu4m1xspTbvwZdIzA==
```
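That key, together with the `admin` user, is what populates the `secrets`
block of the volume specification (see `volume.hcl` in this changeset):

```hcl
secrets {
  userID  = "admin"
  userKey = "AQDsIoxgHqpeBBAAtmd9Ndu4m1xspTbvwZdIzA=="
}
```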
To run the Controller plugin against this Ceph deployment, use the plugin job
in the file `plugin-cephrbd-controller-vagrant.nomad` so that it can reach
the correct ports.
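Putting the Vagrant demo together, the full sequence looks roughly like this
(a sketch using the files in this directory; it assumes you've pasted the key
extracted above into `volume.hcl`):

```sh
# start the single-container Ceph demo and wait for its health check
./run-ceph.sh

# deploy the plugins, using the Vagrant-specific controller job
nomad job run -var-file=nomad.vars ./plugin-cephrbd-controller-vagrant.nomad
nomad job run -var-file=nomad.vars ./plugin-cephrbd-node.nomad

# verify the plugin reports healthy controller and node instances
nomad plugin status cephrbd

# create the test volume
nomad volume create ./volume.hcl
```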
## Ceph CSI Driver Source
- https://github.com/ceph/ceph-csi


@@ -1,119 +0,0 @@
job "ceph-csi-plugin" {
datacenters = ["dc1"]
type = "system"
group "nodes" {
task "ceph-node" {
driver = "docker"
template {
data = <<EOF
[{
"clusterID": "<clusterid>",
"monitors": [
{{range $index, $service := service "mon.ceph"}}{{if gt $index 0}}, {{end}}"{{.Address}}"{{end}}
]
}]
EOF
destination = "local/config.json"
change_mode = "restart"
}
config {
image = "quay.io/cephcsi/cephcsi:v2.1.2-amd64"
volumes = [
"./local/config.json:/etc/ceph-csi-config/config.json"
]
mounts = [
{
type = "tmpfs"
target = "/tmp/csi/keys"
readonly = false
tmpfs_options {
size = 1000000 # size in bytes
}
}
]
args = [
"--type=rbd",
# Name of the driver
"--drivername=rbd.csi.ceph.com",
"--logtostderr",
"--nodeserver=true",
"--endpoint=unix://csi/csi.sock",
"--instanceid=${attr.unique.platform.aws.instance-id}",
"--nodeid=${attr.unique.consul.name}",
# TCP port for liveness metrics requests (/metrics)
"--metricsport=${NOMAD_PORT_prometheus}",
]
privileged = true
resources {
cpu = 200
memory = 500
network {
mbits = 1
// prometheus metrics port
port "prometheus" {}
}
}
}
service {
name = "prometheus"
port = "prometheus"
tags = ["ceph-csi"]
}
csi_plugin {
id = "ceph-csi"
type = "node"
mount_dir = "/csi"
}
}
task "ceph-controller" {
template {
data = <<EOF
[{
"clusterID": "<clusterid>",
"monitors": [
{{range $index, $service := service "mon.ceph"}}{{if gt $index 0}}, {{end}}"{{.Address}}"{{end}}
]
}]
EOF
destination = "local/config.json"
change_mode = "restart"
}
driver = "docker"
config {
image = "quay.io/cephcsi/cephcsi:v2.1.2-amd64"
volumes = [
"./local/config.json:/etc/ceph-csi-config/config.json"
]
resources {
cpu = 200
memory = 500
network {
mbits = 1
// prometheus metrics port
port "prometheus" {}
}
}
args = [
"--type=rbd",
"--controllerserver=true",
"--drivername=rbd.csi.ceph.com",
"--logtostderr",
"--endpoint=unix://csi/csi.sock",
"--metricsport=$${NOMAD_PORT_prometheus}",
"--nodeid=$${attr.unique.platform.aws.hostname}"
]
}
service {
name = "prometheus"
port = "prometheus"
tags = ["ceph-csi"]
}
csi_plugin {
id = "ceph-csi"
type = "controller"
mount_dir = "/csi"
}
}
}
}


@@ -0,0 +1,123 @@
# This job deploys Ceph as a Docker container in "demo mode"; it runs all its
# processes in a single task and will not persist data after a restart
variable "cluster_id" {
type = string
# generated from uuid5(dns) with ceph.example.com as the seed
default = "e9ba69fa-67ff-5920-b374-84d5801edd19"
description = "cluster ID for the Ceph monitor"
}
variable "hostname" {
type = string
default = "linux" # hostname of the Nomad repo's Vagrant box
description = "hostname of the demo host"
}
job "ceph" {
datacenters = ["dc1"]
group "ceph" {
network {
# we can't configure networking in a way that will both satisfy the Ceph
# monitor's requirement to know its own IP address *and* be routable
# between containers, without either CNI or fixing
# https://github.com/hashicorp/nomad/issues/9781
#
# So for now we'll use host networking to keep this demo understandable.
# That also means the controller plugin will need to use host addresses.
mode = "host"
}
service {
name = "ceph-mon"
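# 3300 is the Ceph monitor's msgr2 port; the ceph.conf template below advertises a v2 address on it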
port = 3300
}
service {
name = "ceph-dashboard"
port = 5000
check {
type = "http"
interval = "5s"
timeout = "1s"
path = "/"
initial_status = "warning"
}
}
task "ceph" {
driver = "docker"
config {
image = "ceph/daemon:latest-octopus"
args = ["demo"]
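# demo mode runs the full set of Ceph daemons (mon, mgr, osd, rgw, dashboard) in this one container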
network_mode = "host"
privileged = true
mount {
type = "bind"
source = "local/ceph"
target = "/etc/ceph"
}
}
resources {
memory = 512
cpu = 256
}
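# renders the IPv4 address of the host's default interface into MON_IP so
# that the monitor advertises an address the CSI plugins can reach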
template {
data = <<EOT
MON_IP={{ sockaddr "with $ifAddrs := GetDefaultInterfaces | include \"type\" \"IPv4\" | limit 1 -}}{{- range $ifAddrs -}}{{ attr \"address\" . }}{{ end }}{{ end " }}
CEPH_PUBLIC_NETWORK=0.0.0.0/0
CEPH_DEMO_UID=demo
CEPH_DEMO_BUCKET=foobar
EOT
destination = "${NOMAD_TASK_DIR}/env"
env = true
}
template {
data = <<EOT
[global]
fsid = ${var.cluster_id}
mon initial members = ${var.hostname}
mon host = v2:{{ sockaddr "with $ifAddrs := GetDefaultInterfaces | include \"type\" \"IPv4\" | limit 1 -}}{{- range $ifAddrs -}}{{ attr \"address\" . }}{{ end }}{{ end " }}:3300/0
osd crush chooseleaf type = 0
osd journal size = 100
public network = 0.0.0.0/0
cluster network = 0.0.0.0/0
osd pool default size = 1
mon warn on pool no redundancy = false
osd_memory_target = 939524096
osd_memory_base = 251947008
osd_memory_cache_min = 351706112
osd objectstore = bluestore
[osd.0]
osd data = /var/lib/ceph/osd/ceph-0
[client.rgw.linux]
rgw dns name = ${var.hostname}
rgw enable usage log = true
rgw usage log tick interval = 1
rgw usage log flush threshold = 1
rgw usage max shards = 32
rgw usage max user shards = 1
log file = /var/log/ceph/client.rgw.linux.log
rgw frontends = beast endpoint=0.0.0.0:8080
EOT
destination = "${NOMAD_TASK_DIR}/ceph/ceph.conf"
}
}
}
}


@@ -1,22 +0,0 @@
type = "csi"
id = "testvol"
name = "test_volume"
# this must be strictly formatted, see README
external_id = "ffff-0024-01616094-9d93-4178-bf45-c7eac19e8b15-000000000000ffff-00000000-1111-2222-bbbb-cacacacacaca"
access_mode = "single-node-writer"
attachment_mode = "block-device"
plugin_id = "ceph-csi"
mount_options {
fs_type = "ext4"
}
parameters {}
secrets {
userID = "<userid>"
userKey = "<userkey>"
}
context {
# note: although these are 'parameters' in the ceph-csi spec
# they are passed through to the provider as 'context'
clusterID = "<clusterid>"
pool = "my_pool"
}


@@ -0,0 +1,5 @@
# generated from uuid5(dns) with ceph.example.com as the seed
cluster_id = "e9ba69fa-67ff-5920-b374-84d5801edd19"
# hostname for the Vagrant host where Ceph is running
hostname = "linux"


@@ -0,0 +1,115 @@
variable "cluster_id" {
type = string
# generated from uuid5(dns) with ceph.example.com as the seed
default = "e9ba69fa-67ff-5920-b374-84d5801edd19"
description = "cluster ID for the Ceph monitor"
}
job "plugin-cephrbd-controller" {
datacenters = ["dc1", "dc2"]
constraint {
attribute = "${attr.kernel.name}"
value = "linux"
}
type = "service"
group "cephrbd" {
network {
# we can't configure networking in a way that will both satisfy the Ceph
# monitor's requirement to know its own IP address *and* be routable
# between containers, without either CNI or fixing
# https://github.com/hashicorp/nomad/issues/9781
#
# So for now we'll use host networking to keep this demo understandable.
# That also means the controller plugin will need to use host addresses.
mode = "host"
port "prometheus" {}
}
service {
name = "prometheus"
port = "prometheus"
tags = ["ceph-csi"]
}
task "plugin" {
driver = "docker"
config {
image = "quay.io/cephcsi/cephcsi:canary"
args = [
"--drivername=rbd.csi.ceph.com",
"--v=5",
"--type=rbd",
"--controllerserver=true",
"--nodeid=${NODE_ID}",
"--instanceid=${POD_ID}",
"--endpoint=${CSI_ENDPOINT}",
"--metricsport=${NOMAD_PORT_prometheus}",
]
network_mode = "host"
ports = ["prometheus"]
# we need to be able to write key material to disk in this location
mount {
type = "bind"
source = "secrets"
target = "/tmp/csi/keys"
readonly = false
}
mount {
type = "bind"
source = "ceph-csi-config/config.json"
target = "/etc/ceph-csi-config/config.json"
readonly = false
}
}
template {
data = <<-EOT
POD_ID=${NOMAD_ALLOC_ID}
NODE_ID=${node.unique.id}
CSI_ENDPOINT=unix://csi/csi.sock
EOT
destination = "${NOMAD_TASK_DIR}/env"
env = true
}
# ceph configuration file
template {
data = <<-EOT
[{
"clusterID": "${var.cluster_id}",
"monitors": [
"{{ sockaddr "with $ifAddrs := GetDefaultInterfaces | include \"type\" \"IPv4\" | limit 1 -}}{{- range $ifAddrs -}}{{ attr \"address\" . }}{{ end }}{{ end " }}:3300"
]
}]
EOT
destination = "ceph-csi-config/config.json"
}
csi_plugin {
id = "cephrbd"
type = "controller"
mount_dir = "/csi"
}
# note: there's no upstream guidance on resource usage so
# this is a best guess until we profile it in heavy use
resources {
cpu = 256
memory = 256
}
}
}
}


@@ -0,0 +1,106 @@
variable "cluster_id" {
type = string
# generated from uuid5(dns) with ceph.example.com as the seed
default = "e9ba69fa-67ff-5920-b374-84d5801edd19"
description = "cluster ID for the Ceph monitor"
}
job "plugin-cephrbd-controller" {
datacenters = ["dc1", "dc2"]
constraint {
attribute = "${attr.kernel.name}"
value = "linux"
}
type = "service"
group "cephrbd" {
network {
port "prometheus" {}
}
service {
name = "prometheus"
port = "prometheus"
tags = ["ceph-csi"]
}
task "plugin" {
driver = "docker"
config {
image = "quay.io/cephcsi/cephcsi:canary"
args = [
"--drivername=rbd.csi.ceph.com",
"--v=5",
"--type=rbd",
"--controllerserver=true",
"--nodeid=${NODE_ID}",
"--instanceid=${POD_ID}",
"--endpoint=${CSI_ENDPOINT}",
"--metricsport=${NOMAD_PORT_prometheus}",
]
ports = ["prometheus"]
# we need to be able to write key material to disk in this location
mount {
type = "bind"
source = "secrets"
target = "/tmp/csi/keys"
readonly = false
}
mount {
type = "bind"
source = "ceph-csi-config/config.json"
target = "/etc/ceph-csi-config/config.json"
readonly = false
}
}
template {
data = <<-EOT
POD_ID=${NOMAD_ALLOC_ID}
NODE_ID=${node.unique.id}
CSI_ENDPOINT=unix://csi/csi.sock
EOT
destination = "${NOMAD_TASK_DIR}/env"
env = true
}
# ceph configuration file
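# monitor addresses are discovered from the ceph-mon service registered in Consul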
template {
data = <<EOF
[{
"clusterID": "${var.cluster_id}",
"monitors": [
{{range $index, $service := service "ceph-mon"}}{{if gt $index 0}}, {{end}}"{{.Address}}"{{end}}
]
}]
EOF
destination = "ceph-csi-config/config.json"
}
csi_plugin {
id = "cephrbd"
type = "controller"
mount_dir = "/csi"
}
# note: there's no upstream guidance on resource usage so
# this is a best guess until we profile it in heavy use
resources {
cpu = 256
memory = 256
}
}
}
}


@@ -0,0 +1,69 @@
job "plugin-cephrbd-node" {
datacenters = ["dc1", "dc2"]
constraint {
attribute = "${attr.kernel.name}"
value = "linux"
}
type = "system"
group "cephrbd" {
network {
port "prometheus" {}
}
service {
name = "prometheus"
port = "prometheus"
tags = ["ceph-csi"]
}
task "plugin" {
driver = "docker"
config {
image = "quay.io/cephcsi/cephcsi:canary"
args = [
"--drivername=rbd.csi.ceph.com",
"--v=5",
"--type=rbd",
"--nodeserver=true",
"--nodeid=${NODE_ID}",
"--instanceid=${POD_ID}",
"--endpoint=${CSI_ENDPOINT}",
"--metricsport=${NOMAD_PORT_prometheus}",
]
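# the node plugin maps and mounts RBD devices on the host, so it must run
# privileged (the controller plugin does not need this)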
privileged = true
ports = ["prometheus"]
}
template {
data = <<-EOT
POD_ID=${NOMAD_ALLOC_ID}
NODE_ID=${node.unique.id}
CSI_ENDPOINT=unix://csi/csi.sock
EOT
destination = "${NOMAD_TASK_DIR}/env"
env = true
}
csi_plugin {
id = "cephrbd"
type = "node"
mount_dir = "/csi"
}
# note: there's no upstream guidance on resource usage so
# this is a best guess until we profile it in heavy use
resources {
cpu = 256
memory = 256
}
}
}
}


@@ -0,0 +1,19 @@
#!/usr/bin/env bash
CONSUL_HTTP_ADDR=${CONSUL_HTTP_ADDR:-http://localhost:8500}
echo
echo "nomad job run -var-file=nomad.vars ./ceph.nomad"
nomad job run -var-file=nomad.vars ./ceph.nomad
echo
echo -n "waiting for Ceph to be ready..."
while :
do
STATUS=$(curl -s "$CONSUL_HTTP_ADDR/v1/health/checks/ceph-dashboard" | jq -r '.[0].Status')
if [[ "$STATUS" == "passing" ]]; then echo; break; fi
echo -n "."
sleep 1
done
echo "ready!"


@@ -0,0 +1,41 @@
id = "testvolume"
name = "test1"
type = "csi"
plugin_id = "cephrbd"
capacity_min = "100MB"
capacity_max = "1GB"
capability {
access_mode = "single-node-writer"
attachment_mode = "file-system"
}
capability {
access_mode = "single-node-writer"
attachment_mode = "block-device"
}
# mount_options {
# fs_type = "ext4"
# mount_flags = ["ro"]
# }
# creds should be coming from:
# /var/lib/ceph/mds/ceph-demo/keyring
# but instead we're getting them from:
# /etc/ceph/ceph.client.admin.keyring
secrets {
userID = "admin"
userKey = "AQDsIoxgHqpeBBAAtmd9Ndu4m1xspTbvwZdIzA=="
}
parameters {
# seeded from uuid5(ceph.example.com)
clusterID = "e9ba69fa-67ff-5920-b374-84d5801edd19"
pool = "rbd"
imageFeatures = "layering"
}