hadoop - Is the number of input splits equal to the number of mappers?

Question

I am processing a single 1 GB file with MapReduce, and my default HDFS block size is 64 MB. In this case, how many input splits are there, and how many mappers?

score 0 · Accepted Answer

Number of splits = number of mappers.

So if your file is 1 GB and the block size is 64 MB (1024 / 64), you will run 16 mappers.

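An input split is not the same thing as a block. A block is the physical representation that holds the actual data, whereas an input split is only a logical representation that holds the split length and the split location.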
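To make the arithmetic and the "logical split" idea concrete, here is a minimal plain-Java sketch that does not use Hadoop itself. It is modeled on my understanding of the default file-based split logic: the split size is the block size clamped between a minimum and a maximum, and the last split may run up to 10% over. The class name `SplitSketch`, the `Split` type and the helper `splitsFor` are hypothetical names for illustration only, and the 1.1 slack factor should be checked against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** A logical split: only an offset and a length, no data. */
    static final class Split {
        final long start;
        final long length;
        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Clamp the block size between the configured minimum and maximum split size. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Cut a file of the given length into logical splits of roughly splitSize bytes. */
    static List<Split> splitsFor(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        if (remaining != 0) {
            // Whatever is left (at most 1.1 * splitSize) becomes the final split.
            splits.add(new Split(fileLength - remaining, remaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(64 * mb, 1L, Long.MAX_VALUE);
        List<Split> splits = splitsFor(1024 * mb, splitSize);
        System.out.println("splits (= map tasks): " + splits.size()); // prints 16
    }
}
```

Under these assumptions a 1 GB file with 64 MB splits gives 16 splits, each described only by `start` and `length`. A 65 MB file would give a single split rather than two, because the remainder falls inside the slack.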

However, the number of mappers also depends on several other factors.

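If your file is compressed in a format that is not splittable, you will end up with a single mapper processing the whole file.
If isSplitable() in the InputFormat class returns false, your file is not splittable, and you will again run a single mapper.
Reducers must be set explicitly in the driver code; job.setNumReduceTasks() does this. If it is not set, the number of reducers defaults to 1.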
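The three points above can be summarized in a small plain-Java model. This is only a sketch of the decision logic, not Hadoop's API: `MapperCountSketch`, `estimateMappers` and `effectiveReducers` are hypothetical names, the mapper estimate uses simple ceiling division (it ignores any slack allowed on the last split), and empty files are deliberately not covered.

```java
public class MapperCountSketch {
    /**
     * Hypothetical helper: estimated number of map tasks for one input file.
     * A non-splittable file is always handled by one mapper, whatever its size.
     */
    static long estimateMappers(long fileLength, long splitSize, boolean splittable) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("sketch only covers positive sizes");
        }
        if (!splittable) {
            return 1;
        }
        // Ceiling division: one mapper per (possibly partial) split.
        return (fileLength + splitSize - 1) / splitSize;
    }

    /** Hypothetical helper: reducers actually used, assuming a default of 1 when unset. */
    static int effectiveReducers(Integer requested) {
        return requested == null ? 1 : requested;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(estimateMappers(1024 * mb, 64 * mb, true));  // prints 16
        System.out.println(estimateMappers(1024 * mb, 64 * mb, false)); // prints 1
        System.out.println(effectiveReducers(null));                    // prints 1
        System.out.println(effectiveReducers(4));                       // prints 4
    }
}
```

In a real job, the `splittable` flag corresponds to what the input format reports for the file, and the `requested` value corresponds to whatever the driver passes to job.setNumReduceTasks().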

I think the number of input splits depends on the size of the input file.

score -1 · Accepted Answer

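Number of blocks = number of mappers. If there is a single 1 GB file and the block size is 64 MB, then the number of blocks (chunks) is 1024 MB / 64 MB = 16, so the number of mappers is 16. By default we get only one reducer; if we want to run more reducers, we can set job.setNumReduceTasks().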
