diff --git a/terraform/README.md b/terraform/README.md
index 8b408bcfc..71bff1050 100644
--- a/terraform/README.md
+++ b/terraform/README.md
@@ -29,9 +29,9 @@
 $ export AWS_ACCESS_KEY_ID=[ACCESS_KEY_ID]
 $ export AWS_SECRET_ACCESS_KEY=[SECRET_ACCESS_KEY]
 ```
 
-## Provision
+## Provision a cluster
 
-`cd` to one of the environment subdirectories:
+`cd` to an environment subdirectory:
 
 ```bash
 $ cd env/us-east
@@ -41,9 +41,9 @@
 Update terraform.tfvars with your SSH key name:
 
 ```bash
 region = "us-east-1"
-ami = "ami-28a1dd3e"
+ami = "ami-feac99e8"
 instance_type = "t2.medium"
-key_name = "KEY"
+key_name = "KEY_NAME"
 server_count = "3"
 client_count = "4"
 ```
@@ -51,7 +51,7 @@
 For example:
 
 ```bash
 region = "us-east-1"
-ami = "ami-28a1dd3e"
+ami = "ami-feac99e8"
 instance_type = "t2.medium"
 key_name = "hashi-us-east-1"
 server_count = "3"
@@ -70,10 +70,10 @@
 terraform apply
 ```
 
 ## Access the cluster
 
-SSH to a server using its public IP. For example:
+SSH to any client or server using its public IP. For example:
 
 ```bash
-$ ssh -i /home/vagrant/.ssh/KEY.pem ubuntu@SERVER_PUBLIC_IP
+$ ssh -i /home/vagrant/.ssh/KEY.pem ubuntu@PUBLIC_IP
 ```
 
 The AWS security group is configured by default to allow all traffic over port 22. This is not recommended for production deployments.
@@ -107,10 +107,4 @@
 See:
 
 ## Apache Spark integration
 
-Nomad is well-suited for analytical workloads, given its performance characteristics and first-class support for batch scheduling. Apache Spark is a popular data processing engine/framework that has been architected to use third-party schedulers. We maintain a fork that natively integrates Nomad with Spark. Sample job files and documentation are included [here](examples/spark/README.md) and also provisioned into the cluster itself under the `HOME` directory.
-
-
-
-
-
-
+Nomad is well-suited for analytical workloads, given its performance characteristics and first-class support for batch scheduling. Apache Spark is a popular data processing engine/framework that has been architected to use third-party schedulers. The Nomad ecosystem includes a fork that natively integrates Nomad with Spark. A detailed walkthrough of the integration is included [here](examples/spark/README.md).
diff --git a/terraform/examples/spark/README.md b/terraform/examples/spark/README.md
index 7cb47ae4d..db26536ed 100644
--- a/terraform/examples/spark/README.md
+++ b/terraform/examples/spark/README.md
@@ -38,21 +38,20 @@
 $ hdfs dfs -ls /
 ```
 
 Finally, deploy the Spark history server:
 
 ```bash
-$ cd $HOME/examples/spark
 $ nomad run spark-history-server-hdfs.nomad
 ```
 
-You can find the private IP for the service with a Consul DNS lookup:
+You can get the private IP for the history server with a Consul DNS lookup:
 
 ```bash
 $ dig spark-history.service.consul
 ```
 
-Cross-reference the private IP with the `terraforom apply` output to get the corresponding public IP. You can access the history server at http://PUBLIC_IP:18080
+Cross-reference the private IP with the `terraform apply` output to get the corresponding public IP. You can access the history server at `http://PUBLIC_IP:18080`.
 
 ## Sample Spark jobs
 
-A number of sample spark-submit commands are listed below that demonstrate several of the official Spark examples. Features like `spark-sql`, `spark-shell` and pyspark are included as well. The commands can be executed from any client or server.
+The sample `spark-submit` commands listed below demonstrate several of the official Spark examples. Features like `spark-sql`, `spark-shell` and `pyspark` are included. The commands can be executed from any client or server.
 
 ### SparkPi (Java)
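The "### SparkPi (Java)" section that the second hunk ends on could open with a command along these lines. This is a sketch only: the `--master nomad` URL, the executor count, and the examples jar path are assumptions based on how the forked Spark integration is typically invoked, not verbatim content from this repository.

```bash
# Hypothetical SparkPi submission via the Nomad-integrated spark-submit.
# The jar path and executor count are placeholders; adjust for your cluster.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.executor.instances=4 \
  /path/to/spark-examples.jar 100
```

As the README notes, such a command can be run from any client or server in the cluster.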