
I have a 456 KB file which is read from HDFS and given as input to the mapper function. Every line contains an integer, for which I am downloading some files and storing them on the local system. I have Hadoop set up on a two-node cluster, and the split size is changed from the program to open 8 mappers:

    Configuration configuration = new Configuration();

    // ~60 KB splits force the 456 KB input into 8 map tasks.
    configuration.setLong("mapred.max.split.size", 60000L);
    configuration.setLong("mapred.min.split.size", 60000L);

8 mappers are created, but the same data is downloaded on both servers. I think this is happening because the block size is still set to the default 256 MB and the input file is processed twice. So my question is: can we process a small file with MapReduce?


1 Answer


If downloading the files takes time, you may be running into what is called Hadoop speculative execution, which is enabled by default. It is only a guess, though, based on your saying that the same files are downloaded more than once.

With speculative execution turned on, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform.

You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively.
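For reference, a minimal sketch of what that could look like in the driver code, using the same old-style (mapred) property names as in your snippet (the surrounding job setup is assumed):

    Configuration configuration = new Configuration();

    // Run each map and reduce task attempt only once; no redundant
    // speculative attempts are scheduled on otherwise idle nodes.
    configuration.setBoolean("mapred.map.tasks.speculative.execution", false);
    configuration.setBoolean("mapred.reduce.tasks.speculative.execution", false);

With both options set to false, each input split should be handed to exactly one task attempt, so the files behind a given integer should be downloaded only once.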

answered 2013-10-08T08:52:13.273