README formatting

Rob Genova
2017-06-25 10:45:30 -07:00
parent 6a6f9ca5da
commit f94f182a77
3 changed files with 108 additions and 25 deletions

View File

@@ -1,10 +1,18 @@
# Provision a Nomad cluster on AWS with Packer & Terraform
Use this to easily provision a Nomad sandbox environment on AWS with
[Packer](https://packer.io) and [Terraform](https://terraform.io).
[Consul](https://www.consul.io/intro/index.html) and
[Vault](https://www.vaultproject.io/intro/index.html) are also installed
(colocated for convenience). The intention is to allow easy exploration of
Nomad and its integrations with the HashiCorp stack. This is not meant to be a
production-ready environment. A demonstration of [Nomad's Apache Spark
integration](examples/spark/README.md) is included.
## Setup
Clone this repo and (optionally) use [Vagrant](https://www.vagrantup.com/intro/index.html)
to bootstrap a local staging environment:
```bash
$ git clone git@github.com:hashicorp/nomad.git
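# Optional Vagrant staging steps (illustrative; the Vagrantfile location is an
# assumption about the repo layout, and you can skip these if not using Vagrant):
$ cd nomad/terraform
$ vagrant up
$ vagrant ssh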
@@ -48,14 +56,20 @@ server_count = "3"
client_count = "4"
```
Note that a pre-provisioned, publicly available AMI is used by default
(for the `us-east-1` region). To provision your own customized AMI with
[Packer](https://www.packer.io/intro/index.html), follow the instructions
[here](aws/packer/README.md). You will need to replace the AMI ID in
`terraform.tfvars` with your own. You can also modify the `region`,
`instance_type`, `server_count`, and `client_count`. At least one client and one
server are required.
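For reference, a filled-in `terraform.tfvars` might look like the sketch below.
The `ami` variable name, the placeholder AMI ID, and the `t2.medium` instance
type are assumptions; check the template's variables for the exact names.

```bash
# Hedged sketch of a complete terraform.tfvars (variable names assumed from the
# settings described above; replace ami-xxxxxxxx with your own Packer-built AMI).
$ cat > terraform.tfvars <<EOF
region        = "us-east-1"
ami           = "ami-xxxxxxxx"
instance_type = "t2.medium"
server_count  = "3"
client_count  = "4"
EOF
```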
Provision the cluster:
```bash
$ terraform get
$ terraform plan
$ terraform apply
```
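The public IPs you will need in the next section are printed when
`terraform apply` completes. If you need them again later, `terraform output`
re-prints the template's output values:

```bash
$ terraform output
```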
## Access the cluster
@@ -66,9 +80,11 @@ SSH to one of the servers using its public IP:
$ ssh -i /path/to/key ubuntu@PUBLIC_IP
```
Note that the AWS security group is configured by default to allow all traffic
over port 22. This is not recommended for production deployments.
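If you keep a sandbox running for any length of time, one option is to narrow
the SSH rule to your own address. Below is a rough sketch using the AWS CLI;
the security group ID is a placeholder, and adjusting the Terraform templates
themselves is the cleaner long-term fix.

```bash
# Replace the open SSH rule with one scoped to your IP (placeholder values).
$ aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr YOUR_IP/32
```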
Run a few basic commands to verify that Consul and Nomad are up and running
properly:
```bash
$ consul members
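# Additional checks (an assumption; command names per the Nomad CLI of this era):
$ nomad server-members
$ nomad node-status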
@@ -84,7 +100,14 @@ $ vault unseal
$ export VAULT_TOKEN=[INITIAL_ROOT_TOKEN]
```
The `vault init` command above creates a single
[Vault unseal key](https://www.vaultproject.io/docs/concepts/seal.html) for
convenience. For a production environment, it is recommended that you create at
least five unseal key shares and securely distribute them to independent
operators. The `vault init` command defaults to five key shares and a key
threshold of three. If you provisioned more than one server, the others will
become standby nodes (but should still be unsealed). You can query the active
and standby nodes independently:
```bash
$ dig active.vault.service.consul
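# Assumed companion query; Vault registers standby instances in Consul under
# the "standby" tag:
$ dig standby.vault.service.consul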
@@ -103,4 +126,9 @@ See:
## Apache Spark integration
Nomad is well-suited for analytical workloads, given its performance
characteristics and first-class support for batch scheduling. Apache Spark is a
popular data processing engine/framework that has been architected to use
third-party schedulers. The Nomad ecosystem includes a fork that natively
integrates Nomad with Spark. A detailed walkthrough of the integration is
included [here](examples/spark/README.md).

View File

@@ -1,5 +1,7 @@
# Examples
The examples included here are designed to introduce specific features and
provide a basic learning experience. The examples subdirectory is automatically
provisioned into the home directory of the VMs in your cloud environment.
- [Spark Integration](spark/README.md)

View File

@@ -1,18 +1,27 @@
# Nomad / Spark integration
The Nomad ecosystem includes a fork of Apache Spark that natively supports using
a Nomad cluster to run Spark applications. When running on Nomad, the Spark
executors that run Spark tasks for your application, and optionally the
application driver itself, run as Nomad tasks in a Nomad job. See the
[usage guide](./RunningSparkOnNomad.pdf) for more details.
Clusters provisioned with Nomad's Terraform templates are automatically
configured to run the Spark integration. The sample job files found here are
also provisioned onto every client and server.
## Setup
To give the Spark integration a test drive, provision a cluster and SSH to any
one of the clients or servers (the public IPs are displayed when the Terraform
provisioning process completes):
```bash
$ ssh -i /path/to/key ubuntu@PUBLIC_IP
```
The Spark history server and several of the sample Spark jobs below require
HDFS. Using the included job file, deploy an HDFS cluster on Nomad:
```bash
$ cd $HOME/examples/spark
@@ -20,13 +29,15 @@ $ nomad run hdfs.nomad
$ nomad status hdfs
```
When the allocations are all in the `running` state (as shown by
`nomad status hdfs`), query Consul to verify that the HDFS service has been
registered:
```bash
$ dig hdfs.service.consul
```
Next, create directories and files in HDFS for use by the history server and the
sample Spark jobs:
```bash
$ hdfs dfs -mkdir /foo
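# Hedged sketch of the remaining setup, inferred from the jobs below: the
# word-count example reads /foo/history.log and event logs are written to
# /spark-events. The exact source file here is an assumption.
$ hdfs dfs -put /var/log/apt/history.log /foo
$ hdfs dfs -mkdir /spark-events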
@@ -47,11 +58,15 @@ You can get the private IP for the history server with a Consul DNS lookup:
$ dig spark-history.service.consul
```
Cross-reference the private IP with the `terraform apply` output to get the
corresponding public IP. You can access the history server at
`http://PUBLIC_IP:18080`.
## Sample Spark jobs
The sample `spark-submit` commands listed below demonstrate several of the
official Spark examples. Features like `spark-sql`, `spark-shell`, and `pyspark`
are included. The commands can be executed from any client or server.
You can monitor the status of a Spark job in a second terminal session with:
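```bash
# Sketch: JOB_NAME is whatever job name spark-submit registers with Nomad.
$ nomad status
$ nomad status JOB_NAME
```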
@@ -67,19 +82,48 @@ To view the output of the job, run `nomad logs` for the driver's Allocation ID.
### SparkPi (Java)
```bash
spark-submit \
--class org.apache.spark.examples.JavaSparkPi \
--master nomad \
--deploy-mode cluster \
--conf spark.executor.instances=4 \
--conf spark.nomad.cluster.monitorUntil=complete \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz \
https://s3.amazonaws.com/rcgenova-nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```
### Word count (Java)
```bash
spark-submit \
--class org.apache.spark.examples.JavaWordCount \
--master nomad \
--deploy-mode cluster \
--conf spark.executor.instances=4 \
--conf spark.nomad.cluster.monitorUntil=complete \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz \
https://s3.amazonaws.com/rcgenova-nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar \
hdfs://hdfs.service.consul/foo/history.log
```
### DFSReadWriteTest (Scala)
```bash
spark-submit \
--class org.apache.spark.examples.DFSReadWriteTest \
--master nomad \
--deploy-mode cluster \
--conf spark.executor.instances=4 \
--conf spark.nomad.cluster.monitorUntil=complete \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz \
https://s3.amazonaws.com/rcgenova-nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar \
/home/ubuntu/.bashrc hdfs://hdfs.service.consul/foo
```
### spark-shell
@@ -87,7 +131,10 @@ spark-submit --class org.apache.spark.examples.DFSReadWriteTest --master nomad -
Start the shell:
```bash
spark-shell \
--master nomad \
--conf spark.executor.instances=4 \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz
```
Run a few commands:
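For example (a sketch based on the standard Spark sample session; `sc` is the
SparkContext provided by the shell and the data values are arbitrary):

```bash
$ val data = 1 to 10000
$ val distData = sc.parallelize(data)
$ distData.filter(_ < 10).collect()
```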
@@ -105,7 +152,10 @@ $ distData.filter(_ < 10).collect()
Start the shell:
```bash
spark-sql \
--master nomad \
--conf spark.executor.instances=4 \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz \
jars/spark-sql_2.11-2.1.0-SNAPSHOT.jar
```
Run a few commands:
@@ -125,7 +175,10 @@ $ SELECT * FROM usersTable;
Start the shell:
```bash
pyspark \
--master nomad \
--conf spark.executor.instances=4 \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz
```
Run a few commands:
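For example (a sketch mirroring the Scala session above; `sc` is the
SparkContext provided by `pyspark`):

```bash
$ data = [1, 2, 3, 4, 5]
$ distData = sc.parallelize(data)
$ distData.filter(lambda x: x < 3).collect()
```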