Thursday 18 August 2016

Change Timezone to IST in CentOS

sudo mv /etc/localtime /etc/localtime.bak
sudo ln -s /usr/share/zoneinfo/Asia/Kolkata /etc/localtime

date
Thu Aug 18 12:58:33 IST 2016
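
On CentOS 7 (systemd), the same change can be done in one step with timedatectl; a minimal alternative sketch:

sudo timedatectl set-timezone Asia/Kolkata
timedatectl | grep "Time zone"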

Friday 12 August 2016

HBase Table Backup/Restore

These are the steps for backup and restore:


1. Export the table to an HDFS directory:


hbase org.apache.hadoop.hbase.mapreduce.Export \
   <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
   
Example:

hbase org.apache.hadoop.hbase.mapreduce.Export test1 /hbase_backup/test1
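
The optional arguments scope the export; for example, a hypothetical export of up to 3 versions of each cell written within a start/end window (timestamps are epoch milliseconds, and the values below are purely illustrative):

hbase org.apache.hadoop.hbase.mapreduce.Export test1 /hbase_backup/test1_range 3 1470000000000 1471000000000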

Note: To import into a different cluster, or if the table no longer exists in HBase, the table must be created in that cluster before running the import command; a minimal example follows.
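
For example, creating the table from the HBase shell (the column family name 'cf' is only illustrative; recreate the table with the same column families as the original):

hbase shell
create 'test1', 'cf'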



2. Import/restore data into the HBase cluster from HDFS

Once the table has been created in the HBase cluster, we can import (restore) the table.

Command:

hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>

Example:

hbase org.apache.hadoop.hbase.mapreduce.Import test1 /hbase_backup/test1
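
To sanity-check the restore, you can count the rows from the HBase shell (a quick check, not a full data comparison):

hbase shell
count 'test1'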


Wednesday 10 August 2016

Submit a Spark job on a YARN cluster

 Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
Solution:

If you are using Spark with HDP/HDInsight, you need to do the following:
  1. Add these entries to your $SPARK_HOME/conf/spark-defaults.conf:
    spark.driver.extraJavaOptions -Dhdp.version=2.2.9.1-19 (your installed HDP version)
    spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.9.1-19 (your installed HDP version)
  2. Create a java-opts file in $SPARK_HOME/conf and put the installed HDP version in that file, like:
-Dhdp.version=2.2.9.1-19 (your installed HDP version)
To find the installed HDP version, run the command hdp-select status hadoop-client on the cluster; see the sketch after this list.
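
A minimal shell sketch of step 2 (the version string below is illustrative; substitute whatever hdp-select reports on your cluster):

# Discover the installed HDP version
hdp-select status hadoop-client

# Write that version into $SPARK_HOME/conf/java-opts (value shown is illustrative)
echo "-Dhdp.version=2.2.9.1-19" > $SPARK_HOME/conf/java-opts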

Example command:
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 1g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue default \
    /usr/hdp/current/spark/lib/spark-examples*.jar \
    10

Error: jar changed on src filesystem (Spark on YARN, cluster mode)

If you are getting this error, it means you are uploading the assembly jar manually.
Solution: In yarn-cluster mode, spark-submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to manually copy the assembly jar to all nodes (or pass it through --jars).
The error usually means there are two versions of the same jar in your HDFS. Remove all old jars from your .sparkStaging directory and try again; it should work.
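
A hedged cleanup sketch (.sparkStaging normally sits under the submitting user's HDFS home directory; adjust the path for your cluster, and only remove directories of applications that are no longer running):

# Inspect staging directories left by earlier submissions
hdfs dfs -ls /user/$USER/.sparkStaging

# Remove a stale staging directory (the application ID here is illustrative)
hdfs dfs -rm -r /user/$USER/.sparkStaging/application_1470000000000_0001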