
I have a Hadoop cluster with two machines: one as the master and the other as a slave. My input data is present on the master's local disk, and I have also copied the input files into HDFS. My question is: if I run a MapReduce job on this cluster, the whole input file is present on only one machine [which I think goes against MapReduce's basic principle of "data locality"]. Is there any mechanism to distribute/partition the initial files so that the input is spread across the different nodes of the cluster?


1 Answer


Suppose your cluster consists of node 1 and node 2. If node 1 is the master, then (assuming the usual setup) no DataNode runs on that node. So you have only one DataNode, on node 2, and I'm not sure what you mean by "so that the input files can be distributed on the different nodes of the cluster", because with your current setup you have only one node that can store data.

But if you consider a general n-node cluster, then when you copy data into HDFS, Hadoop itself splits the data into blocks and distributes them across the different nodes of the cluster, so you don't have to worry about this.
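If you want to see that for yourself, here is a minimal sketch using the standard Hadoop FileSystem API (the local and HDFS paths are hypothetical placeholders). It copies a local file into HDFS and then asks the NameNode which hosts hold each block:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy the local input file into HDFS; HDFS splits it into
        // blocks and spreads the blocks over the available datanodes.
        Path local = new Path("/tmp/input.txt");                 // hypothetical local path
        Path remote = new Path("/user/hadoop/input/input.txt");  // hypothetical HDFS path
        fs.copyFromLocalFile(local, remote);

        // Ask the namenode which datanodes host each block of the file.
        FileStatus status = fs.getFileStatus(remote);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (int i = 0; i < blocks.length; i++) {
            System.out.println("block " + i + " -> "
                    + java.util.Arrays.toString(blocks[i].getHosts()));
        }
        fs.close();
    }
}
```

In your two-node setup, every block will list the same single DataNode host; on a larger cluster you would see the blocks spread out. You can get the same information from the command line with `hadoop fsck <path> -files -blocks -locations`.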

answered 2013-06-28T18:07:37.207