Wednesday, 10 August 2016

Submit Spark job on yarn cluster

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Solution:


If you are using Spark with HDP/HDInsight, do the following:
  1. Add these entries to your $SPARK_HOME/conf/spark-defaults.conf:
    spark.driver.extraJavaOptions -Dhdp.version=2.2.9.1-19 (your installed HDP version)
    spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.9.1-19 (your installed HDP version)
  2. Create a java-opts file in $SPARK_HOME/conf and add the installed HDP version to that file, like:
-Dhdp.version=2.2.9.1-19 (your installed HDP version)
To find the HDP version, run the command hdp-select status hadoop-client on the cluster.
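The two steps above can be scripted as a minimal sketch. The version string and conf paths here are assumptions for illustration; on a real cluster, take the version from the output of hdp-select status hadoop-client instead of hard-coding it.

```shell
# Sketch: wire the HDP version into Spark's conf files.
# HDP_VERSION is an assumed example value; replace it with the version
# reported by `hdp-select status hadoop-client` on your cluster.
HDP_VERSION="2.2.9.1-19"
SPARK_CONF_DIR="${SPARK_HOME:-/tmp/spark}/conf"
mkdir -p "$SPARK_CONF_DIR"

# Step 1: append the driver/AM JVM options to spark-defaults.conf
cat >> "$SPARK_CONF_DIR/spark-defaults.conf" <<EOF
spark.driver.extraJavaOptions -Dhdp.version=$HDP_VERSION
spark.yarn.am.extraJavaOptions -Dhdp.version=$HDP_VERSION
EOF

# Step 2: create the java-opts file carrying the same property
echo "-Dhdp.version=$HDP_VERSION" > "$SPARK_CONF_DIR/java-opts"

cat "$SPARK_CONF_DIR/java-opts"
```

After this, spark-submit picks up both files from $SPARK_HOME/conf automatically; no extra command-line flags are needed.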

Example command:
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 1g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue default \
    /usr/hdp/current/spark/lib/spark-examples*.jar \
    10

Error: jar changed on src filesystem (Spark on YARN, cluster mode)

If you are getting this error, it usually means you are uploading assembly jars yourself.
Solution: In yarn-cluster mode, spark-submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to manually copy the assembly jar to all nodes (or pass it through --jars).
The error typically indicates that two versions of the same jar exist in your HDFS. Remove all old jars from your .sparkStaging directory and try again; it should work.
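The cleanup can be sketched as below. The staging path is an assumption (by default Spark stages files under /user/&lt;submitting-user&gt;/.sparkStaging on HDFS); adjust it if your cluster is configured differently.

```shell
# Sketch: clear stale application directories from the YARN staging dir.
# Assumed default staging path for the submitting user.
STAGING_DIR="/user/$(whoami)/.sparkStaging"

if command -v hdfs >/dev/null 2>&1; then
  # List leftover directories from previous submissions
  hdfs dfs -ls "$STAGING_DIR"
  # Remove them all (bypassing trash), then resubmit the job
  hdfs dfs -rm -r -skipTrash "$STAGING_DIR"/*
else
  echo "hdfs not found on PATH; run this on a cluster node"
fi
```

Only remove directories for applications that are no longer running; active jobs read their jars from this location.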
