If you want to get better performance out of your Hadoop cluster, you might want to look into optimizing the memory settings of your NodeManagers. If you installed your cluster with Ambari and did not change the memory settings during setup, you might be underutilizing the memory available on your machines. Therefore, Hortonworks provides a nifty little tool that calculates the memory settings based on their best practices.
You can find the corresponding Python script on GitHub.
To calculate the recommended settings, clone the repository and run the script:

git clone https://github.com/hortonworks/hdp-configuration-utils.git
cd hdp-configuration-utils
./2.1/hdp-configuration-utils.py -c 16 -m 64 -d 4 -k True
Adjust the parameters to fit the sizing of your machines:
-c = cores (number of cores)
-m = memory (amount of RAM in GB)
-d = disks (number of disks)
-k = HBase enabled (True/False)
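If you run the tool on the NodeManager hosts themselves, you can also derive the parameters from the machine instead of typing them by hand. A minimal sketch, assuming a Linux host with the standard `nproc`, `free`, and `lsblk` utilities (the echoed command mirrors the example above; verify the detected values before using them):

```shell
# Derive the tool's parameters from the local machine (illustrative sketch)
CORES=$(nproc)                                          # -c: number of cores
MEM_GB=$(free -g | awk '/^Mem:/{print $2}')             # -m: RAM in GB
DISKS=$(lsblk -dn -o TYPE 2>/dev/null | grep -c disk)   # -d: number of disks

# Print the resulting invocation rather than running it blindly
echo "./2.1/hdp-configuration-utils.py -c $CORES -m $MEM_GB -d $DISKS -k True"
```

Note that `-d` should reflect the number of data disks used by HDFS/YARN, which may differ from the raw disk count `lsblk` reports.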
The result in this case would be
Using cores=16 memory=64GB disks=4 hbase=True
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
***** mapred-site.xml *****
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4096m
mapreduce.task.io.sort.mb=1792
***** yarn-site.xml *****
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4096m
***** tez-site.xml *****
tez.am.resource.memory.mb=6144
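The numbers in the report follow from simple arithmetic: the tool reserves part of the RAM for the operating system and, here, the HBase RegionServer, hands the rest to YARN, and divides it evenly across the containers. A small sketch reproducing the key figures from the run above (illustrative arithmetic only, not the tool's actual code):

```shell
# Reproduce the key numbers from the report above
TOTAL_GB=64        # physical RAM of the machine (-m 64)
RESERVED_GB=16     # reserved for the OS and the HBase RegionServer
CONTAINERS=8       # container count chosen by the tool

USABLE_MB=$(( (TOTAL_GB - RESERVED_GB) * 1024 ))  # -> yarn.nodemanager.resource.memory-mb
CONTAINER_MB=$(( USABLE_MB / CONTAINERS ))        # -> mapreduce.map.memory.mb etc.

echo "usableMem=${USABLE_MB}MB containerRam=${CONTAINER_MB}MB"
# prints: usableMem=49152MB containerRam=6144MB
```

The JVM heap sizes (`-Xmx4096m`) are deliberately set below the 6144 MB container size, leaving headroom for off-heap memory so the container is not killed for exceeding its allocation.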
You can now change the corresponding properties in your *.xml files and restart the affected components/daemons. Afterwards you should have more YARN memory available in your cluster and can run more and bigger containers on the NodeManagers.
Tweaking those settings further is also possible, but be careful not to overcommit your memory too much: this can have severe performance implications, since swapping memory back to disk is extremely slow.
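A quick way to check whether a node has started swapping after you raise the YARN memory settings is to look at its swap usage under load (standard Linux tools, nothing HDP-specific):

```shell
# Sanity-check the node after raising the YARN memory settings:
free -m                                       # the swap "used" column should stay near zero
grep -E 'SwapTotal|SwapFree' /proc/meminfo    # SwapFree close to SwapTotal means no swapping
```

If swap usage grows while containers are running, reduce `yarn.nodemanager.resource.memory-mb` or increase the reserved memory rather than letting the node thrash.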