
I have a Hadoop cluster with two machines: one as the master and the other as a slave. My input data is present on the master's local disk, and I have also copied the input files into HDFS. My question is: if I run a MapReduce job on this cluster, the whole input file is present on only one machine [which I think goes against MapReduce's basic principle of "data locality"]. Is there any mechanism to distribute/partition the initial files so that the input is spread across the different nodes of the cluster?


1 Answer


Suppose your cluster consists of node 1 and node 2. If node 1 is the master, then (assuming the usual setup) no DataNode runs on that node. So you have only one DataNode, on node 2, and I'm not sure what you mean by "so that the input files can be distributed on the different nodes of the cluster", because with your current setup you have only one node that can store data.

But if you consider a general n-node cluster, then when you copy data into HDFS, Hadoop itself splits the data into blocks and distributes them across the different nodes of the cluster, so you don't have to worry about this.
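If you want to see that for yourself, here is a minimal sketch using the standard Hadoop FileSystem API (the local and HDFS paths are hypothetical placeholders). It copies a local file into HDFS and then asks the NameNode which hosts hold each block:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy the local input file into HDFS; HDFS splits it into
        // blocks and spreads the blocks over the available datanodes.
        Path local = new Path("/tmp/input.txt");                 // hypothetical local path
        Path remote = new Path("/user/hadoop/input/input.txt");  // hypothetical HDFS path
        fs.copyFromLocalFile(local, remote);

        // Ask the namenode which datanodes host each block of the file.
        FileStatus status = fs.getFileStatus(remote);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (int i = 0; i < blocks.length; i++) {
            System.out.println("block " + i + " -> "
                    + java.util.Arrays.toString(blocks[i].getHosts()));
        }
        fs.close();
    }
}
```

In your two-node setup, every block will list the same single DataNode host; on a larger cluster you would see the blocks spread out. You can get the same information from the command line with `hadoop fsck <path> -files -blocks -locations`.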

answered 2013-06-28T18:07:37.207