terraform README updates

This commit is contained in:
Rob Genova
2017-05-15 21:17:48 -07:00
parent 9f1933e2c4
commit 0f61923fc4
5 changed files with 110 additions and 35 deletions

View File

@@ -1,6 +1,9 @@
# Provision a Nomad cluster with Terraform
Provision a fully functional Nomad cluster in the cloud with [Packer](https://packer.io) and [Terraform](https://terraform.io). The goal is to allow easy exploration of Nomad, including its integrations with Consul and Vault. To get started, use one of the provider-specific links below:
- [AWS](aws/README.md)
- Google Cloud (coming soon)
- Microsoft Azure (coming soon)
A number of [examples](examples/README.md) and guides are also included.

View File

@@ -20,15 +20,15 @@ You will need the following:
- [API access keys](http://aws.amazon.com/developers/access-keys/)
- [SSH key pair](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)
If you provisioned a Vagrant environment using the included Vagrantfile, you will need to copy your private key to it. If not, you will need to [install Terraform](https://www.terraform.io/intro/getting-started/install.html).
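To copy the key, one approach (a minimal sketch, assuming the key lives at `~/.ssh/KEY.pem` on your host and Vagrant's default `/vagrant` synced folder is available) is:
```bash
# On the host, from the directory containing the Vagrantfile:
$ cp ~/.ssh/KEY.pem .
$ vagrant ssh

# Inside the VM:
$ cp /vagrant/KEY.pem ~/.ssh/
$ chmod 600 ~/.ssh/KEY.pem
```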
Set environment variables for your AWS credentials:
```bash
$ export AWS_ACCESS_KEY_ID=[ACCESS_KEY_ID]
$ export AWS_SECRET_ACCESS_KEY=[SECRET_ACCESS_KEY]
```
### Provision
`cd` to one of the environment subdirectories:
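For example, to use the US East environment referenced throughout this guide (assuming you are in the `aws` directory):
```bash
$ cd env/us-east
```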
@@ -41,23 +41,23 @@ Update terraform.tfvars with your SSH key name:
```bash
region = "us-east-1"
ami = "ami-62a60374"
instance_type = "t2.small"
ami = "ami-28a1dd3e"
instance_type = "t2.medium"
key_name = "KEY"
key_file = "/home/vagrant/.ssh/KEY.pem"
server_count = "3"
client_count = "4"
```
For example:
```bash
region = "us-east-1"
ami = "ami-62a60374"
ami = "ami-28a1dd3e"
instance_type = "t2.medium"
key_name = "hashi-us-east-1"
key_file = "/home/vagrant/.ssh/hashi-us-east-1.pem"
server_count = "3"
client_count = "4"
```
Provision:
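The standard Terraform workflow applies here; a minimal sketch, assuming Terraform is installed and you are still in the environment subdirectory:
```bash
$ terraform plan    # preview the servers and clients that will be created
$ terraform apply   # provision the cluster
```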

View File

@@ -0,0 +1,18 @@
# Build an Amazon Machine Image
See the prerequisites listed [here](../aws/README.md). If you did not provision a Vagrant environment using the included Vagrantfile, you will need to [install Packer](https://www.packer.io/intro/getting-started/install.html).
Set environment variables for your AWS credentials:
```bash
$ export AWS_ACCESS_KEY_ID=[ACCESS_KEY_ID]
$ export AWS_SECRET_ACCESS_KEY=[SECRET_ACCESS_KEY]
```
Build your AMI:
```bash
packer build packer.json
```
Don't forget to copy the AMI ID to your [terraform.tfvars file](../env/us-east/terraform.tfvars).
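For example, if the Packer run ends with an artifact line like `us-east-1: ami-28a1dd3e`, you could update the variable in place (the AMI ID here is illustrative, and this assumes GNU sed):
```bash
# Point the environment at the freshly built image
$ sed -i 's/^ami = .*/ami = "ami-28a1dd3e"/' ../env/us-east/terraform.tfvars
```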

View File

@@ -1,6 +1,5 @@
## Examples
The examples included here are designed to introduce specific features and provide a basic learning experience. The examples subdirectory is automatically provisioned into the home directory of the VMs in your cloud environment.
- Nomad
- [Spark Integration](spark/README.md)

View File

@@ -1,8 +1,12 @@
# Nomad / Spark integration
Spark supports using a Nomad cluster to run Spark applications. When running on Nomad, the Spark executors that run Spark tasks for your application, and optionally the application driver itself, run as Nomad tasks in a Nomad job. See the [usage guide](./RunningSparkOnNomad.pdf) for more details.
To give the Spark integration a test drive, `cd` to `examples/spark/spark` on one of the servers (the `examples/spark/spark` subdirectory is created when the cluster is provisioned).
A number of sample Spark commands are listed below. They run several of the official Spark examples and demonstrate features such as `spark-sql`, `spark-shell`, and DataFrames.
You can monitor Nomad status simultaneously with:
```bash
$ nomad status
@@ -10,45 +14,69 @@ $ nomad status [JOB_ID]
$ nomad alloc-status [ALLOC_ID]
```
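All of the sample commands below share the same general shape: point `--master` at Nomad and tell Spark where to fetch the Nomad-enabled distribution. A sketch of the common form, with bracketed placeholders standing in for the application-specific pieces (omit `--deploy-mode cluster` for client mode, and `--class` for Python applications):
```bash
$ ./bin/spark-submit \
    --class [MAIN_CLASS] \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.executor.instances=4 \
    --conf spark.nomad.sparkDistribution=[SPARK_DISTRIBUTION_URL] \
    [APPLICATION_JAR_OR_SCRIPT] [ARGS]
```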
## Sample Spark commands
### SparkPi
Java (client mode)
```bash
$ ./bin/spark-submit --class org.apache.spark.examples.JavaSparkPi --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz examples/jars/spark-examples*.jar 100
```
Java (cluster mode)
```bash
$ ./bin/spark-submit --class org.apache.spark.examples.JavaSparkPi --master nomad --deploy-mode cluster --conf spark.executor.instances=4 --conf spark.nomad.cluster.monitorUntil=complete --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz https://s3.amazonaws.com/rcgenova-nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```
Python (client mode)
```bash
$ ./bin/spark-submit --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz examples/src/main/python/pi.py 100
```
Python (cluster mode)
```bash
$ ./bin/spark-submit --master nomad --deploy-mode cluster --conf spark.executor.instances=4 --conf spark.nomad.cluster.monitorUntil=complete --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz examples/src/main/python/pi.py 100
```
Scala (client mode)
```bash
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz examples/jars/spark-examples*.jar 100
```
### Machine Learning
Python (client mode)
```bash
$ ./bin/spark-submit --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz examples/src/main/python/ml/logistic_regression_with_elastic_net.py
```
Scala (client mode)
```bash
$ ./bin/spark-submit --class org.apache.spark.examples.SparkLR --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz examples/jars/spark-examples*.jar
```
### Streaming
Run the following two commands simultaneously; the page view generator feeds the stream processor:
```bash
bin/spark-submit --class org.apache.spark.examples.streaming.clickstream.PageViewGenerator --master nomad --deploy-mode cluster --conf spark.executor.instances=4 --conf spark.nomad.cluster.monitorUntil=complete --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz https://s3.amazonaws.com/rcgenova-nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar 44444 10
```
```bash
bin/spark-submit --class org.apache.spark.examples.streaming.clickstream.PageViewStream --master nomad --deploy-mode cluster --conf spark.executor.instances=4 --conf spark.nomad.cluster.monitorUntil=complete --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz https://s3.amazonaws.com/rcgenova-nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar errorRatePerZipCode localhost 44444
```
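Because both jobs run in cluster mode, the driver output goes to the Nomad allocation logs rather than to your terminal. Assuming you have grabbed an allocation ID from `nomad status`, you can follow the output with something like:
```bash
# Add the task name as a final argument if the allocation runs more than one task
$ nomad logs -f [ALLOC_ID]
```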
### pyspark
```bash
$ ./bin/pyspark --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz
```
```python
df = spark.read.json("examples/src/main/resources/people.json")
df.show()
df.printSchema()
@@ -57,11 +85,15 @@ sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
```
### spark-shell
```bash
$ ./bin/spark-shell --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz
```
From spark-shell:
```scala
:type spark
spark.version
@@ -70,10 +102,33 @@ val distData = sc.parallelize(data)
distData.filter(_ < 10).collect()
```
### spark-sql
```bash
bin/spark-sql --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz jars/spark-sql_2.11-2.1.0-SNAPSHOT.jar
```
From spark-sql:
```sql
CREATE TEMPORARY VIEW usersTable
USING org.apache.spark.sql.parquet
OPTIONS (
path "examples/src/main/resources/users.parquet"
);
SELECT * FROM usersTable;
```
### DataFrames
```bash
bin/spark-shell --master nomad --conf spark.executor.instances=8 --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-bin-nomad-preview-6.tgz
```
From spark-shell:
```scala
val usersDF = spark.read.load("examples/src/main/resources/users.parquet")
usersDF.select("name", "favorite_color").write.save("/tmp/namesAndFavColors.parquet")
```