I have set up a multinode Hadoop cluster with 3 DataNodes and 1 NameNode using VirtualBox on Ubuntu. My host system serves as the NameNode (and also a DataNode), and two VMs serve as DataNodes. My systems are:
- 192.168.1.5: NameNode (also datanode)
- 192.168.1.10: DataNode2
- 192.168.1.11: DataNode3
I am able to SSH into every system from every other system. The hadoop/etc/hadoop/slaves file on all systems has these entries:
192.168.1.5
192.168.1.10
192.168.1.11
The hadoop/etc/hadoop/master file on all systems contains: 192.168.1.5
The core-site.xml, yarn-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh files are identical on all machines, except that the dfs.namenode.name.dir entry in hdfs-site.xml is missing on both DataNodes.
When I execute start-yarn.sh and start-dfs.sh from the NameNode, everything works fine, and through jps I can see all the required services running on all machines.
Jps on NameNode:
5840 NameNode
5996 DataNode
7065 Jps
6564 NodeManager
6189 SecondaryNameNode
6354 ResourceManager
Jps on DataNodes:
3070 DataNode
3213 NodeManager
3349 Jps
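Besides jps, I also cross-checked which DataNodes have actually registered with the NameNode, since a node can run a DataNode process without ever registering (a quick diagnostic, assuming the hadoop bin directory is on the PATH and the cluster is running):

```shell
# Ask the NameNode which DataNodes have registered with it.
# "Datanodes available" gives the live/dead totals; "Name:" lists each node.
hdfs dfsadmin -report | grep -E 'Datanodes available|Name:|Hostname:'
```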
However, when I check namenode:50070/dfshealth.html#tab-datanode and namenode:50070/dfshealth.html#tab-overview, both indicate only 2 DataNodes. The Datanodes tab shows the NameNode and DataNode2 as active DataNodes; DataNode3 is not displayed at all.
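When a DataNode process is running but never appears on the NameNode UI, its own log usually says why (registration refused, wrong NameNode address, etc.), so checking the log on DataNode3 seems like a reasonable next step (a sketch, assuming the default log location under $HADOOP_HOME/logs):

```shell
# On DataNode3: look for registration errors or retries in the DataNode log.
# The exact file name includes the user and hostname, hence the glob.
grep -iE 'ERROR|WARN|register' "$HADOOP_HOME"/logs/hadoop-*-datanode-*.log | tail -n 20
```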
I checked all the configuration files (the XML files, hadoop-env.sh, and the slaves/master files mentioned above) multiple times to make sure nothing differs between the two DataNodes.
The /etc/hosts file also contains entries for all nodes on all systems:
127.0.0.1 localhost
#127.0.1.1 smishra-VM2
192.168.1.11 DataNode3
192.168.1.10 DataNode2
192.168.1.5 NameNode
One thing I'd like to mention is that I configured one VM first and then made a clone of it, so both VMs have identical configuration. That makes it even more confusing that one DataNode is shown but not the other.
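Because the second VM is a raw clone of the first, one thing I suspect (an assumption on my part, not something I've confirmed) is that both clones might carry the same DataNode storage identity. Each DataNode records a unique ID in a VERSION file under its data directory, and if two nodes report the same ID, the NameNode would treat them as one node, which would match the symptom of only one clone showing up. The file looks roughly like this (illustrative contents; the actual path depends on dfs.datanode.data.dir in hdfs-site.xml):

```
# <dfs.datanode.data.dir>/current/VERSION (illustrative)
storageID=DS-...        # unique per DataNode storage directory
clusterID=CID-...       # must match the NameNode's clusterID
datanodeUuid=...        # unique per DataNode; a raw clone duplicates it
storageType=DATA_NODE
```

If the datanodeUuid is identical on both VMs, that would explain why only one of them is listed.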