image - Overlap between Hadoop InputSplits?

Question

I'm writing a Hadoop job that performs convolutions on one or more potentially very large PGM files. Each mapper processes some number of rows from one of the files, and the reducers put the files back together again. However, each mapper also needs a few rows above and below the rows it is convolving. Usually this is not a problem, since I have written a RecordReader that provides this redundancy, but it is an issue for the first and last lines of an InputSplit, because I cannot access rows from the adjacent splits.

Is there any way to make InputSplits overlap so that the last few lines of the first are the first few lines of the second?

score 0 · Accepted Answer

You can write your own custom splitter. See this article by Steven Lewis.
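
The core of such a splitter is deciding which rows each split owns and which extra rows it has to read. The following is a minimal sketch of that arithmetic in plain Java, not the method from the linked article. It assumes fixed-width rows (for example a binary P5 PGM with a maxval below 256, so one byte per pixel); `OverlappingSplits`, `Range`, `plan`, and the `halo` parameter are hypothetical names introduced for illustration. In a real job, a subclass of `FileInputFormat` could override `getSplits` and build one `FileSplit` per range from the computed byte offset and length.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlappingSplits {

    /** Rows a mapper owns [coreStart, coreEnd) and rows it must read [readStart, readEnd). */
    static final class Range {
        final long coreStart, coreEnd, readStart, readEnd;

        Range(long coreStart, long coreEnd, long readStart, long readEnd) {
            this.coreStart = coreStart;
            this.coreEnd = coreEnd;
            this.readStart = readStart;
            this.readEnd = readEnd;
        }

        /** Byte offset of the first row to read, assuming a header followed by fixed-width rows. */
        long byteOffset(long headerBytes, long rowBytes) {
            return headerBytes + readStart * rowBytes;
        }

        /** Number of bytes to read, including the overlap rows. */
        long byteLength(long rowBytes) {
            return (readEnd - readStart) * rowBytes;
        }
    }

    /**
     * Cuts totalRows into chunks of rowsPerSplit and widens each chunk by
     * halo rows on both sides, clamped to the image boundaries.
     */
    static List<Range> plan(long totalRows, long rowsPerSplit, long halo) {
        if (totalRows < 0 || rowsPerSplit <= 0 || halo < 0) {
            throw new IllegalArgumentException("invalid split parameters");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalRows; start += rowsPerSplit) {
            long end = Math.min(start + rowsPerSplit, totalRows);
            long readStart = Math.max(0, start - halo);
            long readEnd = Math.min(totalRows, end + halo);
            ranges.add(new Range(start, end, readStart, readEnd));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 rows, 4 rows per split, 1 overlap row (enough for a 3x3 kernel).
        for (Range r : plan(10, 4, 1)) {
            System.out.println("owns [" + r.coreStart + ", " + r.coreEnd
                    + ") reads [" + r.readStart + ", " + r.readEnd + ")");
        }
    }
}
```

Each mapper would then convolve only the rows it owns and use the extra rows as context, so no output row is produced twice. One trade-off to keep in mind: the overlap rows may sit in a different HDFS block than the rest of the split, so a few rows per split may be read remotely.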

Answered 2013-04-19T19:18:16.440
