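With Apache MRUnit, I can unit test my MapReduce program locally before running it on a cluster.
My program needs to read from the DistributedCache, so I wrapped DistributedCache.getLocalCacheFiles in a class that I mock in my unit test. I set up a stub so that the real method is not called and a local path is returned instead. But it turns out the method is still called, and a FileNotFoundException is thrown.
This is what my MapReduce program looks like: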
    public class TopicByTime implements Tool {
        private static Map<String, String> topicList = null;

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new TopicByTime(), args));
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = new Job();
            /* Job setup */
            DistributedCache.addCacheFile(new URI(/* path on hdfs */), conf);
            job.waitForCompletion(true);
            return 0;
        }

        protected static class TimeMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            public void setup(Context context) throws IOException, InterruptedException {
                DistributedCacheClass cache = new DistributedCacheClass();
                Path[] localPaths = cache.getLocalCacheFiles(context.getConfiguration());
                if (null == localPaths || 0 == localPaths.length) {
                    throw new FileNotFoundException("Distributed cached file not found");
                }
                topicList = Utils.loadTopics(localPaths[0].toString());
            }

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                /* do map */
            }
        }

        /* Reducer and overriding methods */
    }
And my test program:
    public class TestTopicByTime {
        @Before
        public void setUp() throws IOException {
            Path[] localPaths = { new Path("resource/test/topic_by_time.txt") };
            Configuration conf = Mockito.mock(Configuration.class);
            DistributedCacheClass cache = Mockito.mock(DistributedCacheClass.class);
            when(cache.getLocalCacheFiles(conf)).thenReturn(localPaths);
        }

        @Test
        public void testMapper() {
        }

        @Test
        public void testReducer() {
        }

        @Test
        public void testMapReduce() {
        }
    }
DistributedCacheClass is a simple wrapper:
    public class DistributedCacheClass {
        public Path[] getLocalCacheFiles(Configuration conf) throws IOException {
            return DistributedCache.getLocalCacheFiles(conf);
        }
    }
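I could add a flag to the Mapper's setup method so that it reads a local path when testing, but I really want to keep the test code separate from my MapReduce program.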
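One possible way to avoid such a flag, offered as a sketch under assumptions: in the code above, setup() creates its own DistributedCacheClass with new, so the mock built in the test's setUp() is a different object and its stub is probably never consulted. If the mapper instead receives the wrapper from outside, a test can hand it a stub. The example below shows only that pattern with plain Java stand-ins. CacheLocator, MapperSketch, and InjectionSketch are hypothetical names, no Hadoop or Mockito types are used, and the no-argument constructor merely stands in for a production default that would delegate to DistributedCache.getLocalCacheFiles.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical stand-in for DistributedCacheClass: resolves local cache file paths.
interface CacheLocator {
    String[] getLocalCacheFiles() throws IOException;
}

// Hypothetical stand-in for TimeMapper, stripped of all Hadoop types.
class MapperSketch {
    private final CacheLocator cache;
    private String topicFile;

    // Stand-in for the production default. Here it simulates an empty cache;
    // the real version is assumed to wrap DistributedCache.getLocalCacheFiles.
    MapperSketch() {
        this(() -> null);
    }

    // Test-friendly constructor: the caller decides which locator is used.
    MapperSketch(CacheLocator cache) {
        this.cache = cache;
    }

    // Same check as in the question, but against the injected locator.
    void setup() throws IOException {
        String[] localPaths = cache.getLocalCacheFiles();
        if (localPaths == null || localPaths.length == 0) {
            throw new FileNotFoundException("Distributed cached file not found");
        }
        topicFile = localPaths[0];
    }

    String getTopicFile() {
        return topicFile;
    }
}

class InjectionSketch {
    public static void main(String[] args) throws IOException {
        // The stub plays the role of the Mockito mock from the test class.
        CacheLocator stub = () -> new String[] { "resource/test/topic_by_time.txt" };
        MapperSketch mapper = new MapperSketch(stub);
        mapper.setup();
        System.out.println(mapper.getTopicFile()); // prints resource/test/topic_by_time.txt
    }
}
```

In a real test, the same idea would presumably mean constructing TimeMapper with the mocked DistributedCacheClass and passing that mapper instance to MRUnit's MapDriver, while keeping a no-argument constructor because Hadoop instantiates mappers reflectively on the cluster.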
I am new to mock testing and MRUnit, so there may be newbie mistakes in my program. Please point out the errors; I will fix them and post my updates below.