Thursday 21 July 2016

Spark Memory Consumption Optimization

Lowering spark.executor.memory and spark.executor.instances brings down the memory that Spark consumes on the cluster.

By default on this cluster, spark.executor.memory is 4608m and spark.executor.instances is 2.




When I run spark-shell after connecting over SSH with these defaults, below is the memory consumption footprint.






Change spark.executor.memory to 1608m and run spark-shell again. Below is the memory consumption footprint.


Now change spark.executor.instances to 1 and run spark-shell again. Below is the memory consumption footprint.
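
As a rough check of what these settings ask for, the executor heap requested is the number of instances times the memory per executor. The sketch below works that out for the three configurations in this post. The figures cover executor heap only and leave out the driver and any per-container overhead. The commented spark-shell line is an assumption: the post does not say how the values were changed, and passing them with --conf is just one way to do it. executor_heap_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Executor heap requested by Spark, in MB: instances * memory per executor.
# Driver memory and per-container overhead are NOT included.
executor_heap_mb() {
    instances="$1"
    memory_mb="$2"
    echo $(( instances * memory_mb ))
}

echo "default        (2 x 4608m): $(executor_heap_mb 2 4608) MB"
echo "reduced memory (2 x 1608m): $(executor_heap_mb 2 1608) MB"
echo "one executor   (1 x 1608m): $(executor_heap_mb 1 1608) MB"

# Assumed command-line equivalent of the last configuration:
# spark-shell --conf spark.executor.memory=1608m --conf spark.executor.instances=1
```

Going from the defaults to one 1608m executor cuts the requested executor heap from 9216 MB to 1608 MB.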


Install Python Libraries to Use in PySpark (HDInsight Spark Cluster)

Install the libraries using the commands below:
cd /usr/bin/anaconda/bin/

export PATH=/usr/bin/anaconda/bin:$PATH

conda update matplotlib

conda install Theano

pip install scikit-neuralnetwork

pip install vaderSentiment
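
To confirm that the installs landed in the Anaconda environment rather than the system Python, one option is to try importing each package with the Anaconda interpreter. This is a minimal sketch. It assumes the interpreter is /usr/bin/anaconda/bin/python (the directory used above) and that the packages import as matplotlib, theano, sknn and vaderSentiment. check_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Try to import each named module with the given Python interpreter.
# Prints one status line per module; returns 1 if any import fails.
check_modules() {
    py="$1"
    shift
    missing=0
    for mod in "$@"; do
        if "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed usage on the cluster head node:
# check_modules /usr/bin/anaconda/bin/python matplotlib theano sknn vaderSentiment
```

A MISSING line usually means pip or conda resolved to a different Python than the one PySpark uses, so it is worth re-checking the PATH export above.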

Tuesday 19 July 2016

Get List of Hosts in HDInsight Cluster

sudo apt-get install jq

curl -u admin:PASSWORD -G "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/hosts" | jq '.items[].Hosts.host_name'

OUTPUT:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2090  100  2090    0     0   2557      0 --:--:-- --:--:-- --:--:--  2555
"hn0-linuxh.bpa0iysj5klejpk0fzlmcicoke.ix.internal.cloudapp.net"
"hn1-linuxh.bpa0iysj5klejpk0fzlmcicoke.ix.internal.cloudapp.net"
"wn1-linuxh.bpa0iysj5klejpk0fzlmcicoke.ix.internal.cloudapp.net"
"zk0-linuxh.bpa0iysj5klejpk0fzlmcicoke.ix.internal.cloudapp.net"
"zk1-linuxh.bpa0iysj5klejpk0fzlmcicoke.ix.internal.cloudapp.net"
"zk6-linuxh.bpa0iysj5klejpk0fzlmcicoke.ix.internal.cloudapp.net"


Parse the output and dump the list to a text file.
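
One way to do that last step is to strip the double quotes that jq prints around each host name and redirect the result to a file. The sketch below assumes the output file is called hosts.txt. save_hosts is a hypothetical helper name, and CLUSTERNAME and PASSWORD remain placeholders as in the command above.

```shell
#!/bin/sh
# Read jq's quoted host names on stdin, strip the double quotes,
# drop empty lines, and write one host per line to the given file.
save_hosts() {
    outfile="$1"
    tr -d '"' | sed '/^$/d' > "$outfile"
}

# Assumed usage; -s silences curl's progress meter:
# curl -s -u admin:PASSWORD -G "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/hosts" \
#   | jq '.items[].Hosts.host_name' | save_hosts hosts.txt
```

Alternatively, jq -r prints raw strings without quotes, so its output can be redirected to the file directly.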