diff --git a/demo/csi/ceph-csi-plugin/README.md b/demo/csi/ceph-csi-plugin/README.md
index 267a4cd58..0eaa1aa63 100644
--- a/demo/csi/ceph-csi-plugin/README.md
+++ b/demo/csi/ceph-csi-plugin/README.md
@@ -1,66 +1,132 @@
-# Openstack Ceph-CSI Plugin
+# Ceph CSI Plugin
 
-The configuration here is for the Ceph RBD driver, migrated from the k8s config [documentation](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md). It can be easily modified for the CephFS Driver, as used [here](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-cephfs.md).
-
-## Requirements
-
-The example plugin job creates a file at `local/cloud.conf` using a [`template`](https://www.nomadproject.io/docs/job-specification/template) stanza which pulls the necessary credentials from a [Vault kv-v2](https://www.vaultproject.io/docs/secrets/kv/kv-v2) secrets store.
-
-
-### Docker Privileged Mode
-
-The Ceph CSI Node task requires that [`privileged = true`](https://www.nomadproject.io/docs/drivers/docker#privileged) be set. This is not needed for the Controller task.
-
-## Container Arguments
-
-Refer to the official plugin [guide](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md).
-
-- `--type=rbd`
-
-  - Driver type `rbd` (or alternately `cephfs`)
-
-- `--endpoint=unix:///csi/csi.sock`
-
-  - This option must match the `mount_dir` specified in the `csi_plugin` stanza for the task.
-
-- `--nodeid=${node.unique.name}`
-
-  - A unique ID for the node the task is running on. Recommend using `${node.unique.name}`
-
-- `--cluster=${NOMAD_DC}`
-
-  - The cluster the Controller/Node is a part of. Recommend using `${NOMAD_DC}`
-
-- `--instanceid=${attr.unique.platform.aws.instance-id}`
-
-  - Unique ID distinguishing this instance of Ceph CSI among other instances, when sharing Ceph clusters across CSI instances for provisioning. Used for topology-aware deployments.
+The configuration here is for the Ceph RBD driver, migrated from the k8s
+config
+[documentation](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md). It
+can be modified for the CephFS Driver, as used
+[here](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-cephfs.md).
 
 ## Deployment
 
-### Plugin
+The Ceph CSI Node task requires that [`privileged =
+true`](https://www.nomadproject.io/docs/drivers/docker#privileged) be
+set. This is not needed for the Controller task.
 
-```bash
-export NOMAD_ADDR=https://nomad.example.com:4646
-export NOMAD_TOKEN=34534-3sdf3-szfdsafsdf3423-zxdfsd3
-nomad job run ceph-csi-plugin.hcl
+### Plugin Arguments
+
+Refer to the official plugin
+[guide](https://github.com/ceph/ceph-csi/blob/master/docs/deploy-rbd.md).
+
+* `--type=rbd`: driver type `rbd` (or alternately `cephfs`)
+
+* `--endpoint=unix:///csi/csi.sock`: this option must match the `mount_dir`
+  specified in the `csi_plugin` stanza for the task.
+
+* `--nodeid=${node.unique.id}`: a unique ID for the node the task is running
+  on.
+
+* `--instanceid=${NOMAD_ALLOC_ID}`: a unique ID distinguishing this instance
+  of Ceph CSI among other instances, when sharing Ceph clusters across CSI
+  instances for provisioning. Used for topology-aware deployments.
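+
+For reference, here is a condensed sketch of how these arguments fit into a
+plugin task. It is illustrative only: the image tag and task layout are
+assumptions, not copied from the job files in this directory.
+
+```
+task "ceph-controller" {
+  driver = "docker"
+
+  config {
+    image = "quay.io/cephcsi/cephcsi:v3.3.1" # tag is an assumption
+
+    args = [
+      "--type=rbd",
+      "--controllerserver=true",
+      "--drivername=rbd.csi.ceph.com",
+      # /csi inside the container is the csi_plugin mount_dir below
+      "--endpoint=unix:///csi/csi.sock",
+      "--nodeid=${node.unique.id}",
+      "--instanceid=${NOMAD_ALLOC_ID}",
+    ]
+  }
+
+  csi_plugin {
+    id        = "cephrbd"
+    type      = "controller"
+    mount_dir = "/csi" # must match --endpoint above
+  }
+}
+```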
+
+### Run the Plugins
+
+Run the plugins:
+
+```
+$ nomad job run -var-file=nomad.vars ./plugin-cephrbd-controller.nomad
+==> Monitoring evaluation "c8e65575"
+    Evaluation triggered by job "plugin-cephrbd-controller"
+==> Monitoring evaluation "c8e65575"
+    Evaluation within deployment: "b15b6b2b"
+    Allocation "1955d2ab" created: node "8dda4d46", group "cephrbd"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "c8e65575" finished with status "complete"
+
+$ nomad job run -var-file=nomad.vars ./plugin-cephrbd-node.nomad
+==> Monitoring evaluation "5e92c5dc"
+    Evaluation triggered by job "plugin-cephrbd-node"
+==> Monitoring evaluation "5e92c5dc"
+    Allocation "5bb9e57a" created: node "8dda4d46", group "cephrbd"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "5e92c5dc" finished with status "complete"
+
+$ nomad plugin status cephrbd
+ID                   = cephrbd
+Provider             = rbd.csi.ceph.com
+Version              = canary
+Controllers Healthy  = 1
+Controllers Expected = 1
+Nodes Healthy        = 1
+Nodes Expected       = 1
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
+1955d2ab  8dda4d46  cephrbd     0        run      running  3m47s ago  3m37s ago
+5bb9e57a  8dda4d46  cephrbd     0        run      running  3m44s ago  3m43s ago
 ```
 
-### Volume Registration
+### Create a Volume
 
-The `external_id` value for the volume must be strictly formatted, see `ceph_csi.tf`. Based on [Ceph-CSI ID Format](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid.go#L27), see [examples](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid_test.go#L33).
+The `secrets` block for the volume must be populated with the `userID` and
+`userKey` values pulled from `/etc/ceph/ceph.client..keyring`.
 
-The `secrets` block will be populated with values pulled from `/etc/ceph/ceph.client..keyring`, e.g.
 ```
-userid = ""
-userkey = "AWBg/BtfJInSFBATOrrnCh6UGE3QB3nYakdF+g=="
+$ nomad volume create ./volume.hcl
+Created external volume 0001-0024-e9ba69fa-67ff-5920-b374-84d5801edd19-0000000000000002-3603408d-a9ca-11eb-8ace-080027c5bc64 with ID testvolume
 ```
 
-```bash
-export NOMAD_ADDR=https://nomad.example.com:4646
-export NOMAD_TOKEN=34534-3sdf3-szfdsafsdf3423-zxdfsd3
-nomad volume register example_volume.hcl
+### Register a Volume
+
+You can register a volume that already exists in Ceph. In this case, you'll
+need to provide the `external_id` field. The `ceph-csi-id.tf` Terraform file
+in this directory can be used to generate the correctly-formatted ID. This is
+based on [Ceph-CSI ID
+Format](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid.go#L27)
+(see
+[examples](https://github.com/ceph/ceph-csi/blob/71ddf51544be498eee03734573b765eb04480bb9/internal/util/volid_test.go#L33)).
+
+
+## Running Ceph in Vagrant
+
+For demonstration purposes only, you can run Ceph as a single-container Nomad
+job on the Vagrant VM managed by the `Vagrantfile` at the top-level of this
+repo.
+
+The `./run-ceph.sh` script in this directory will deploy the demo container
+and wait for it to be ready. The data served by this container is entirely
+ephemeral and will be destroyed once it stops; you should not use this as an
+example of how to run production Ceph workloads!
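+
+In rough terms, the script does something like the following sketch. This is
+an assumption for illustration, based on the `ceph-dashboard` service check
+defined in `ceph.nomad`; see the actual script in this directory for the
+authoritative version.
+
+```sh
+#!/usr/bin/env bash
+set -e
+
+# deploy the single-container Ceph demo job
+nomad job run -var-file=nomad.vars ./ceph.nomad
+
+# poll the demo container's dashboard until it responds on its host port
+echo -n "waiting for Ceph to be ready"
+until curl -sf -o /dev/null http://localhost:5000/; do
+    echo -n "."
+    sleep 2
+done
+echo
+echo "ready!"
+```
+
+Running the script looks like this: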
+
+```sh
+$ ./run-ceph.sh
+
+nomad job run -var-file=nomad.vars ./ceph.nomad
+==> Monitoring evaluation "68dde586"
+    Evaluation triggered by job "ceph"
+==> Monitoring evaluation "68dde586"
+    Evaluation within deployment: "79e23968"
+    Allocation "77fd50fb" created: node "ca3ee034", group "ceph"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "68dde586" finished with status "complete"
+
+waiting for Ceph to be ready..............................
+ready!
 ```
 
+The setup script in the Ceph container configures a key, which you'll need for
+creating volumes. You can extract the key from the keyring via `nomad alloc
+exec`:
+
+```
+$ nomad alloc exec 77f cat /etc/ceph/ceph.client.admin.keyring | awk '/key/{print $3}'
+AQDsIoxgHqpeBBAAtmd9Ndu4m1xspTbvwZdIzA==
+```
+
+To run the Controller plugin against this Ceph, you'll need to use the plugin
+job in the file `plugin-cephrbd-controller-vagrant.nomad` so that it can reach
+the correct ports.
+
 ## Ceph CSI Driver Source
 
 - https://github.com/ceph/ceph-csi
diff --git a/demo/csi/ceph-csi-plugin/ceph-csi-plugin.hcl b/demo/csi/ceph-csi-plugin/ceph-csi-plugin.hcl
deleted file mode 100644
index eb7454608..000000000
--- a/demo/csi/ceph-csi-plugin/ceph-csi-plugin.hcl
+++ /dev/null
@@ -1,119 +0,0 @@
-job "ceph-csi-plugin" {
-  datacenters = ["dc1"]
-  type        = "system"
-  group "nodes" {
-    task "ceph-node" {
-      driver = "docker"
-      template {
-        data = <<EOF
-[{
-    "clusterID": "<clusterID>",
-    "monitors": [
-        {{range $index, $service := service "mon.ceph"}}{{if gt $index 0}}, {{end}}"{{.Address}}"{{end}}
-    ]
-}]
-EOF
-        destination = "local/config.json"
-        change_mode = "restart"
-      }
-      config {
-        image = "quay.io/cephcsi/cephcsi:v2.1.2-amd64"
-        volumes = [
-          "./local/config.json:/etc/ceph-csi-config/config.json"
-        ]
-        mounts = [
-          {
-            type     = "tmpfs"
-            target   = "/tmp/csi/keys"
-            readonly = false
-            tmpfs_options {
-              size = 1000000 # size in bytes
-            }
-          }
-        ]
-        args = [
-          "--type=rbd",
-          # Name of the driver
-          "--drivername=rbd.csi.ceph.com",
-          "--logtostderr",
-          "--nodeserver=true",
-          "--endpoint=unix://csi/csi.sock",
-          "--instanceid=${attr.unique.platform.aws.instance-id}",
-          "--nodeid=${attr.unique.consul.name}",
-          # TCP port for liveness metrics requests (/metrics)
-          "--metricsport=${NOMAD_PORT_prometheus}",
-        ]
-        privileged = true
-        resources {
-          cpu    = 200
-          memory = 500
-          network {
-            mbits = 1
-            // prometheus metrics port
-            port "prometheus" {}
-          }
-        }
-      }
-      service {
-        name = "prometheus"
-        port = "prometheus"
-        tags = ["ceph-csi"]
-      }
-      csi_plugin {
-        id        = "ceph-csi"
-        type      = "node"
-        mount_dir = "/csi"
-      }
-    }
-    task "ceph-controller" {
-
-      template {
-        data = <<EOF
-[{
-    "clusterID": "<clusterID>",
-    "monitors": [
-        {{range $index, $service := service "mon.ceph"}}{{if gt $index 0}}, {{end}}"{{.Address}}"{{end}}
-    ]
-}]
-EOF
-        destination = "local/config.json"
-        change_mode = "restart"
-      }
-      driver = "docker"
-      config {
-        image = "quay.io/cephcsi/cephcsi:v2.1.2-amd64"
-        volumes = [
-          "./local/config.json:/etc/ceph-csi-config/config.json"
-        ]
-        resources {
-          cpu    = 200
-          memory = 500
-          network {
-            mbits = 1
-            // prometheus metrics port
-            port "prometheus" {}
-          }
-        }
-        args = [
-          "--type=rbd",
-          "--controllerserver=true",
-          "--drivername=rbd.csi.ceph.com",
-          "--logtostderr",
-          "--endpoint=unix://csi/csi.sock",
-          "--metricsport=$${NOMAD_PORT_prometheus}",
-          "--nodeid=$${attr.unique.platform.aws.hostname}"
-        ]
-      }
-      service {
-        name = "prometheus"
-        port = "prometheus"
-        tags = ["ceph-csi"]
-      }
-      csi_plugin {
-        id        = "ceph-csi"
-        type      = "controller"
-        mount_dir = "/csi"
-      }
-    }
-  }
-}
\ No newline at end of file
diff --git a/demo/csi/ceph-csi-plugin/ceph.nomad b/demo/csi/ceph-csi-plugin/ceph.nomad
new file mode 100644
index 000000000..91f0a8a4a
--- /dev/null
+++ b/demo/csi/ceph-csi-plugin/ceph.nomad
@@ -0,0 +1,123 @@
+# This job deploys Ceph as a Docker container in "demo mode"; it runs all its
+# processes in a single task and will not persist data after a restart
+
+variable "cluster_id" {
+  type = string
+  # generated from uuid5(dns) with ceph.example.com as the seed
+  default     = "e9ba69fa-67ff-5920-b374-84d5801edd19"
+  description = "cluster ID for the Ceph monitor"
+}
+
+variable "hostname" {
+  type        = string
+  default     = "linux" # hostname of the Nomad repo's Vagrant box
+  description = "hostname of the demo host"
+}
+
+job "ceph" {
+  datacenters = ["dc1"]
+
+  group "ceph" {
+
+    network {
+      # we can't configure networking in a way that will both satisfy the Ceph
+      # monitor's requirement to know its own IP address *and* be routable
+      # between containers, without either CNI or fixing
+      # https://github.com/hashicorp/nomad/issues/9781
+      #
+      # So for now we'll use host networking to keep this demo understandable.
+      # That also means the controller plugin will need to use host addresses.
+      mode = "host"
+    }
+
+    service {
+      name = "ceph-mon"
+      port = 3300
+    }
+
+    service {
+      name = "ceph-dashboard"
+      port = 5000
+
+      check {
+        type           = "http"
+        interval       = "5s"
+        timeout        = "1s"
+        path           = "/"
+        initial_status = "warning"
+      }
+    }
+
+    task "ceph" {
+      driver = "docker"
+
+      config {
+        image        = "ceph/daemon:latest-octopus"
+        args         = ["demo"]
+        network_mode = "host"
+        privileged   = true
+
+        mount {
+          type   = "bind"
+          source = "local/ceph"
+          target = "/etc/ceph"
+        }
+      }
+
+      resources {
+        memory = 512
+        cpu    = 256
+      }
+
+      template {
+
+        data = <