
How do you unit test a mongo-hadoop job?

My attempt so far:

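import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
// Assumes WordMapper extends the new-API org.apache.hadoop.mapreduce.Mapper;
// for the old mapred API, use org.apache.hadoop.mrunit.MapDriver instead.
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;
import org.junit.Before;
import org.junit.Test;

public class MapperTest {

    MapDriver<Object, BSONObject, Text, IntWritable> d;

    @Before
    public void setUp() throws IOException {
        WordMapper mapper = new WordMapper();
        d = MapDriver.newMapDriver(mapper);
    }

    @Test
    public void testMapper() throws IOException {

        BSONObject doc = new BasicBSONObject("sentence", "Two words");
        // MRUnit copies the input pair here, which is where the exception below is thrown
        d.withInput(new Text("anykey"), doc);

        d.withOutput(new Text("Two"), new IntWritable(1));
        d.withOutput(new Text("words"), new IntWritable(1));

        d.runTest();
    }
}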

This produces the following output:

No applicable class implementing Serialization in conf at io.serializations for class org.bson.BasicBSONObject

java.lang.IllegalStateException
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:67)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:91)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:104)
    at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:608)
    at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:612)
    at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:118)
    at org.apache.hadoop.mrunit.MapDriverBase.withInput(MapDriverBase.java:207)
    ...


1 Answer


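You need to tell MRUnit which serializers to use through the io.serializations property, and you need to do it before calling withInput, because MRUnit copies the input at that point. For example:

mapDriver.getConfiguration().setStrings("io.serializations", "org.apache.hadoop.io.serializer.WritableSerialization", MongoSerDe.class.getName());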

MongoSerDe source: https://gist.github.com/lfrancke/01d1819a94f14da171e3
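The error above comes from a lookup over the classes listed in io.serializations. Each one is asked whether it accepts the object's class, and the first match is used. WritableSerialization only accepts Writable types, so org.bson.BasicBSONObject finds no match. Below is a simplified, stdlib-only Java model of that first-match lookup. All class names in it (SerializationModel, WritableLike, BsonLike, FakeText, ...) are hypothetical stand-ins, not Hadoop or MRUnit APIs. It is only meant to show why adding a BSON-aware serializer to the list makes the copy succeed.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, stdlib-only model of how a serialization factory picks a serializer.
// All names here are hypothetical stand-ins, NOT Hadoop or MRUnit classes.
class SerializationModel {

    /** A pluggable serializer that says which classes it can handle. */
    interface Serialization {
        boolean accept(Class<?> c);
    }

    /** Marker types standing in for Writable and BSONObject. */
    interface FakeWritable {}
    interface FakeBsonObject {}

    /** Stand-ins for Text and BasicBSONObject. */
    static final class FakeText implements FakeWritable {}
    static final class FakeBasicBsonObject implements FakeBsonObject {}

    /** Stand-in for WritableSerialization: accepts only Writable-like classes. */
    static final class WritableLike implements Serialization {
        public boolean accept(Class<?> c) {
            return FakeWritable.class.isAssignableFrom(c);
        }
    }

    /** Stand-in for a BSON-aware serializer such as the MongoSerDe from the gist. */
    static final class BsonLike implements Serialization {
        public boolean accept(Class<?> c) {
            return FakeBsonObject.class.isAssignableFrom(c);
        }
    }

    /** Returns the first configured serialization that accepts c; throws if none does. */
    static Serialization find(List<Serialization> configured, Class<?> c) {
        for (Serialization s : configured) {
            if (s.accept(c)) {
                return s;
            }
        }
        throw new IllegalStateException(
            "No applicable class implementing Serialization in conf at io.serializations for " + c);
    }

    public static void main(String[] args) {
        List<Serialization> conf = new ArrayList<>();
        conf.add(new WritableLike()); // only the Writable-style serializer is configured
        try {
            find(conf, FakeBasicBsonObject.class);
        } catch (IllegalStateException e) {
            System.out.println("before: " + e.getMessage());
        }
        conf.add(new BsonLike()); // register a BSON-aware serializer, as the answer does
        System.out.println("after: " + find(conf, FakeBasicBsonObject.class).getClass().getSimpleName());
        System.out.println("text uses: " + find(conf, FakeText.class).getClass().getSimpleName());
    }
}
```

Note that Configuration.setStrings overwrites the whole io.serializations value, which is why the answer keeps WritableSerialization in the list: Text and IntWritable still need it, just as FakeText does in the model.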

However, when I use this (MongoSerDe), I get the error "org.bson.io.BasicOutputBuffer.pipe(Ljava/io/DataOutput;)I".

answered 2016-01-31T14:18:22.807