Thursday 18 August 2016

Change Timezone to IST in CentOS

sudo mv /etc/localtime /etc/localtime.bak
sudo ln -s /usr/share/zoneinfo/Asia/Kolkata /etc/localtime

date
Thu Aug 18 12:58:33 IST 2016
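
On CentOS 7 (systemd), the same change can be done in one step with timedatectl; a minimal alternative sketch:

sudo timedatectl set-timezone Asia/Kolkata
timedatectl | grep "Time zone"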

Friday 12 August 2016

HBase Table Backup/Restore

These are the steps for backup and restore:


1. Export the table to an HDFS directory:


hbase org.apache.hadoop.hbase.mapreduce.Export \
   <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
   
Example:

hbase org.apache.hadoop.hbase.mapreduce.Export test1 /hbase_backup/test1
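
The optional arguments scope the export; for example, a hypothetical export of up to 3 versions of each cell written within a start/end window (timestamps are epoch milliseconds, and the values below are purely illustrative):

hbase org.apache.hadoop.hbase.mapreduce.Export test1 /hbase_backup/test1_range 3 1470000000000 1471000000000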

Note: To import into a different cluster, or if the table no longer exists in HBase, the table must be created in that cluster before running the import command; a minimal example follows.
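
For example, creating the table from the HBase shell (the column family name 'cf' is only illustrative; recreate the table with the same column families as the original):

hbase shell
create 'test1', 'cf'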



2. Import/restore data into the HBase cluster from HDFS

Once the table has been created in the HBase cluster, we can import (restore) the table.

Command:

hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>

Example:

hbase org.apache.hadoop.hbase.mapreduce.Import test1 /hbase_backup/test1
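
To sanity-check the restore, you can count the rows from the HBase shell (a quick check, not a full data comparison):

hbase shell
count 'test1'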


Wednesday 10 August 2016

Submit a Spark job on a YARN cluster

 Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
Solution:

If you are using Spark with HDP/HDInsight, you need to do the following:
  1. Add these entries to your $SPARK_HOME/conf/spark-defaults.conf:
    spark.driver.extraJavaOptions -Dhdp.version=2.2.9.1-19 (your installed HDP version)
    spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.9.1-19 (your installed HDP version)
  2. Create a java-opts file in $SPARK_HOME/conf and put the installed HDP version in that file, like:
-Dhdp.version=2.2.9.1-19 (your installed HDP version)
To find the installed HDP version, run the command hdp-select status hadoop-client on the cluster; see the sketch after this list.
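
A minimal shell sketch of step 2 (the version string below is illustrative; substitute whatever hdp-select reports on your cluster):

# Discover the installed HDP version
hdp-select status hadoop-client

# Write that version into $SPARK_HOME/conf/java-opts (value shown is illustrative)
echo "-Dhdp.version=2.2.9.1-19" > $SPARK_HOME/conf/java-opts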

Example command:
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 1g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue default \
    /usr/hdp/current/spark/lib/spark-examples*.jar \
    10

Error: jar changed on src filesystem (Spark on YARN, cluster mode)

If you are getting this error, it means you are uploading the assembly jar manually.
Solution: In yarn-cluster mode, spark-submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to manually copy the assembly jar to all nodes (or pass it through --jars).
The error usually means there are two versions of the same jar in your HDFS. Remove all old jars from your .sparkStaging directory and try again; it should work.
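
A hedged cleanup sketch (.sparkStaging normally sits under the submitting user's HDFS home directory; adjust the path for your cluster, and only remove directories of applications that are no longer running):

# Inspect staging directories left by earlier submissions
hdfs dfs -ls /user/$USER/.sparkStaging

# Remove a stale staging directory (the application ID here is illustrative)
hdfs dfs -rm -r /user/$USER/.sparkStaging/application_1470000000000_0001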