Wednesday 9 December 2015

Spark and AWS Sample Code

Load CSV data from Amazon S3 into a Spark DataFrame

***** Using Python *******

pyspark --jars /mnt/resource/lokeshtest/guava-12.0.1.jar,/mnt/resource/lokeshtest/hadoop-aws-2.6.0.jar,/mnt/resource/lokeshtest/aws-java-sdk-1.7.3.jar --packages com.databricks:spark-csv_2.10:1.2.0

df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load('s3n://YOUR_KEY:YOUR_SECRET@BUCKET_NAME/resources/spark-csv/sparksamplecsv.csv')

df.show()  # show() prints the first 20 rows itself and returns None, so no print is needed
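The one-liners above put the access key and secret directly into the s3n:// URL. A secret key that contains a "/" is known to cause problems in that form of URL. Below is a minimal sketch of an alternative. It assumes the same pyspark shell session started with the --jars/--packages flags above, where `sc` and `sqlContext` are predefined. It sets the standard Hadoop properties `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` and keeps the URL free of credentials. The helper names `set_s3n_credentials`, `s3n_url` and `read_s3_csv` are hypothetical. Also note that `sc._jsc` is an internal PySpark handle, not a public API.

```python
# Sketch, not the original post's method: configure S3 credentials on the
# Hadoop configuration instead of embedding them in the s3n:// URL.
# Assumes the pyspark shell from above, which predefines `sc` and `sqlContext`.
# All helper names below are hypothetical.

# Hadoop property names read by the s3n (NativeS3FileSystem) connector.
S3N_ACCESS_KEY_PROPERTY = "fs.s3n.awsAccessKeyId"
S3N_SECRET_KEY_PROPERTY = "fs.s3n.awsSecretAccessKey"


def set_s3n_credentials(spark_context, access_key, secret_key):
    """Store S3 credentials in the SparkContext's Hadoop configuration.

    `_jsc` is PySpark's internal handle to the JavaSparkContext; it is not a
    public API, but it is the usual way to reach hadoopConfiguration() from Python.
    """
    hadoop_conf = spark_context._jsc.hadoopConfiguration()
    hadoop_conf.set(S3N_ACCESS_KEY_PROPERTY, access_key)
    hadoop_conf.set(S3N_SECRET_KEY_PROPERTY, secret_key)


def s3n_url(bucket, key_path):
    """Build an s3n:// URL that carries no credentials."""
    return "s3n://{0}/{1}".format(bucket, key_path.lstrip("/"))


def read_s3_csv(sql_context, bucket, key_path):
    """Read a CSV file from S3 with spark-csv, with a header row and schema inference."""
    return (sql_context.read
            .format("com.databricks.spark.csv")
            .options(header="true", inferSchema="true")
            .load(s3n_url(bucket, key_path)))


# Example usage inside the pyspark shell:
# set_s3n_credentials(sc, "YOUR_KEY", "YOUR_SECRET")
# df = read_s3_csv(sqlContext, "BUCKET_NAME", "resources/spark-csv/sparksamplecsv.csv")
# df.show()
```

Because the keys live on the SparkContext's Hadoop configuration, they apply to s3n:// paths read through that context. In the Scala shell the same properties can be set with `sc.hadoopConfiguration.set(...)`.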



****** Using Scala *******

sudo -u root spark-shell --jars /mnt/resource/lokeshtest/guava-12.0.1.jar,/mnt/resource/lokeshtest/hadoop-aws-2.6.0.jar,/mnt/resource/lokeshtest/aws-java-sdk-1.7.3.jar --packages com.databricks:spark-csv_2.10:1.2.0

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("s3n://YOUR_KEY:YOUR_SECRET@BUCKET_NAME/resources/spark-csv/sparksamplecsv.csv")

df.show()


**********************************************************************************************************************************************************
