I just ran into this problem. I found that you can fix it by modifying the method SequenceFilesFromMailArchivesTest.testSequential as follows:
@Test
public void testSequential() throws Exception {
  File outputDir = this.getTestTempDir("mail-archives-out");

  String[] args = {
    "--input", inputDir.getAbsolutePath(),
    "--output", outputDir.getAbsolutePath(),
    "--charset", "UTF-8",
    "--keyPrefix", "TEST",
    "--method", "sequential",
    "--body", "--subject", "--separator", ""
  };

  // run the application's main method
  SequenceFilesFromMailArchives.main(args);

  // app should create a single SequenceFile named "chunk-0" in the output dir
  File expectedChunkFile = new File(outputDir, "chunk-0");
  String expectedChunkPath = expectedChunkFile.getAbsolutePath();
  Assert.assertTrue("Expected chunk file " + expectedChunkPath + " not found!", expectedChunkFile.isFile());

  Configuration conf = new Configuration();
  SequenceFileIterator<Text, Text> iterator = new SequenceFileIterator<Text, Text>(new Path(expectedChunkPath), true, conf);
  Assert.assertTrue("First key/value pair not found!", iterator.hasNext());

  // Modified by ZhouShuang: the original test checked subdir/mail-messages.gz first.
  // The records from subdir/subsubdir/mail-messages-2.gz are checked first here,
  // and the mail-messages.gz assertions have been moved after them.
  Pair<Text, Text> record = iterator.next();
  File parentFileSubSubDir = new File(new File(new File(new File("TEST"), "subdir"), "subsubdir"), "mail-messages-2.gz");
  Assert.assertEquals(new File(parentFileSubSubDir, testVars[0][0]).toString(), record.getFirst().toString());
  Assert.assertEquals(testVars[0][1] + testVars[0][2], record.getSecond().toString());

  Assert.assertTrue("Second key/value pair not found!", iterator.hasNext());
  record = iterator.next();
  Assert.assertEquals(new File(parentFileSubSubDir, testVars[1][0]).toString(), record.getFirst().toString());
  Assert.assertEquals(testVars[1][1] + testVars[1][2], record.getSecond().toString());

  // The records from subdir/mail-messages.gz follow.
  Assert.assertTrue("Third key/value pair not found!", iterator.hasNext());
  record = iterator.next();
  File parentFile = new File(new File(new File("TEST"), "subdir"), "mail-messages.gz");
  Assert.assertEquals(new File(parentFile, testVars[0][0]).toString(), record.getFirst().toString());
  Assert.assertEquals(testVars[0][1] + testVars[0][2], record.getSecond().toString());

  Assert.assertTrue("Fourth key/value pair not found!", iterator.hasNext());
  record = iterator.next();
  Assert.assertEquals(new File(parentFile, testVars[1][0]).toString(), record.getFirst().toString());
  Assert.assertEquals(testVars[1][1] + testVars[1][2], record.getSecond().toString());

  Assert.assertFalse("Only four key/value pairs expected!", iterator.hasNext());
}
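The problem occurs only because the files in the File[] returned by listFiles() come back in no guaranteed order. I wrote a small test program to check this, and the result was:

/home/alain/mytests/subsubdir
/home/alain/mytests/mail-messages.gz

According to the accept() method in the PrefixAdditionFilter class, the files in a directory are put into the sequence file recursively, in the order they are listed. So when we use iterator.next() to read the key/value pairs from the SequenceFile, we get subsubdir/mail-messages-2.gz first and mail-messages.gz second. The original testSequential(), however, checks mail-messages.gz first and subsubdir/mail-messages-2.gz second, so the order is reversed. After swapping the order of the checks, the test passes.

Note that there are two SequenceFilesFromMailArchivesTest.java files, one in the distribution package and the other in the integration package. The one to modify is the latter. I made that mistake :)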
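For anyone who wants to reproduce the listFiles() ordering behaviour, here is a minimal sketch. It is not Mahout code: the class name ListFilesOrder and the helpers createSampleDir and sortedListing are made up for illustration, and the temp directory only imitates the layout from my test. The order printed for the raw listing depends on the platform and file system, which is why the test above could pass on one machine and fail on another. Sorting the array is one possible way to get a stable order.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ListFilesOrder {

    // Hypothetical helper: creates a temp directory containing one
    // subdirectory and one file, similar to the layout described in the post.
    static File createSampleDir() {
        try {
            Path dir = Files.createTempDirectory("mytests");
            Files.createDirectory(dir.resolve("subsubdir"));
            Files.createFile(dir.resolve("mail-messages.gz"));
            return dir.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: lists a directory and sorts the entries by pathname.
    // File implements Comparable<File>, so Arrays.sort gives a repeatable order.
    static File[] sortedListing(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles() returns null if dir is not a directory or on I/O error
            return new File[0];
        }
        Arrays.sort(entries);
        return entries;
    }

    public static void main(String[] args) {
        File dir = createSampleDir();

        // Raw order: whatever the file system hands back, no guarantee.
        File[] raw = dir.listFiles();
        if (raw != null) {
            for (File f : raw) {
                System.out.println("as returned: " + f.getName());
            }
        }

        // Sorted order: the same on every run.
        for (File f : sortedListing(dir)) {
            System.out.println("sorted: " + f.getName());
        }
    }
}
```

With the sorted listing, mail-messages.gz comes before subsubdir, which matches the order the original testSequential() expected. Whether sorting inside the application is preferable to reordering the assertions in the test is a separate design decision; the sketch only shows why the raw order cannot be relied on.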