Wednesday 9 December 2015

Load data from MySQL and dump to S3 using Spark


RDBMS IMPORT



***** Using Python *******

pyspark --jars /mnt/resource/lokeshtest/guava-12.0.1.jar,/mnt/resource/lokeshtest/hadoop-aws-2.6.0.jar,/mnt/resource/lokeshtest/aws-java-sdk-1.7.3.jar,/mnt/resource/lokeshtest/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar --packages com.databricks:spark-csv_2.10:1.2.0

from pyspark.sql import SQLContext

sqlcontext = SQLContext(sc)

dataframe_mysql = sqlcontext.read.format("jdbc").options(url="jdbc:mysql://YOUR_PUBLIC_IP:3306/DB_NAME", driver="com.mysql.jdbc.Driver", dbtable="TBL_NAME", user="sqluser", password="sqluser").load()

dataframe_mysql.show()



****** Using Scala *******

sudo -u root spark-shell --jars /mnt/resource/lokeshtest/guava-12.0.1.jar,/mnt/resource/lokeshtest/hadoop-aws-2.6.0.jar,/mnt/resource/lokeshtest/aws-java-sdk-1.7.3.jar,/mnt/resource/lokeshtest/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar --packages com.databricks:spark-csv_2.10:1.2.0

import org.apache.spark.sql.SQLContext

val sqlcontext = new org.apache.spark.sql.SQLContext(sc)

val dataframe_mysql = sqlcontext.read.format("jdbc").option("url", "jdbc:mysql://YOUR_PUBLIC_IP:3306/DB_NAME").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "TBL_NAME").option("user", "sqluser").option("password", "sqluser").load()

dataframe_mysql.show()
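
If the source table is large, the JDBC read can also be split across several partitions on a numeric column so the executors pull data in parallel. The sketch below assumes a hypothetical integer primary-key column named id whose values range from 1 to 1000000; adjust the column name and bounds to your table:

// each of the 4 partitions reads its own slice of the id range in parallel
val dataframe_mysql_par = sqlcontext.read.format("jdbc").option("url", "jdbc:mysql://YOUR_PUBLIC_IP:3306/DB_NAME").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "TBL_NAME").option("user", "sqluser").option("password", "sqluser").option("partitionColumn", "id").option("lowerBound", "1").option("upperBound", "1000000").option("numPartitions", "4").load()

dataframe_mysql_par.rdd.partitions.length   // should report 4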


**********************************************************************************************************************************************************

****** Using Scala *******

Persist the DataFrame in memory:

dataframe_mysql.cache
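
cache is lazy: nothing is actually pinned in memory until the first action runs against the DataFrame, for example:

dataframe_mysql.count()   // the first action materialises the cached data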

Perform a transformation or filter on the DataFrame (filter, select, map, etc.):

val filter_gta = dataframe_mysql.filter(dataframe_mysql("date") === "20151129")
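
Other column-level transformations work the same way. A minimal sketch, assuming a hypothetical numeric column named amount in TBL_NAME:

val transformed = filter_gta.select(filter_gta("date"), filter_gta("amount") * 2)

transformed.show()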

Optional: repartition the data. Note that repartition returns a new DataFrame rather than modifying filter_gta in place, so capture the result; a single partition produces a single output part file:

val filter_gta_single = filter_gta.repartition(1)

Save to S3 as CSV (use filter_gta_single here if you repartitioned above):

filter_gta.write.format("com.databricks.spark.csv").option("header","true").save("s3n://YOUR_KEY:YOUR_SECRET@BUCKET_NAME/resources/spark-csv/mysqlimport1.csv")
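
Embedding the key and secret in the s3n:// URL can cause path-parsing problems, especially when the secret contains a "/" character. An alternative sketch, assuming the same hadoop-aws jars and s3n filesystem used above, is to set the credentials on the SparkContext's Hadoop configuration and keep the bucket path clean:

// set the AWS credentials once on the Hadoop configuration
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_KEY")

sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET")

filter_gta.write.format("com.databricks.spark.csv").option("header", "true").save("s3n://BUCKET_NAME/resources/spark-csv/mysqlimport1.csv")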


************************************************************************************************************************************************************
