hadoop - MRUnit: correctly creating an HBase Result

Question

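I have a MapReduce job whose mapper reads from several HBase tables. It runs fine on my cluster. I am now retroactively writing some unit tests with MRUnit. I'm trying to assemble a Result object from a list of manually instantiated KeyValue objects to use as input to the map() method. When I then try to read several columns inside map(), only the first KeyValue in the list seems to survive in the Result object; the other columns come back null. In the code below I use a single column family named "0".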

private MapDriver<ImmutableBytesWritable, Result, Text, Text> mapDriver;
private HopperHbaseMapper hopperHbaseMapper;

@Before
public void setUp() {    
  hopperHbaseMapper = new HopperHbaseMapper();
  mapDriver = MapDriver.newMapDriver(hopperHbaseMapper);    
}

@Test
public void testMapHbase() throws Exception {    
  String testKey = "123";
  ImmutableBytesWritable key = new ImmutableBytesWritable(testKey.getBytes());    
  List<KeyValue> keyValues = new ArrayList<KeyValue>();
  KeyValue keyValue1 = new KeyValue(testKey.getBytes(), "0".getBytes(), "first_name".getBytes(), "Joe".getBytes());
  KeyValue keyValue2 = new KeyValue(testKey.getBytes(), "0".getBytes(), "last_name".getBytes(), "Blow".getBytes());
  keyValues.add(keyValue1);
  keyValues.add(keyValue2);
  Result result = new Result(keyValues);
  mapDriver.withInput(key, result);
  mapDriver.withOutput(new Text(testKey), new Text(testKey + "\tJoe\tBlow"));
  mapDriver.runTest();
}

Am I creating the Result object incorrectly? As mentioned, the mapper works fine on real HBase data on my cluster, so I believe the problem is in my test setup.

score 3 · Accepted Answer

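As with row keys, HBase stores the columns of a row in lexicographic order, and Result expects its cells in that same order (its column lookups, such as getValue, use a binary search over the cells). So collect the KeyValues in a TreeSet<KeyValue> set = new TreeSet<KeyValue>(KeyValue.COMPARATOR); and then pass the sorted cells to the Result constructor, for example as an array via new Result(kvs):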
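Why the order matters can be illustrated with a simplified model. This is a plain-Java sketch, not HBase code: SortedLookupDemo, its Cell class, lookup() and sortedCopy() are hypothetical names, and cells are ordered by qualifier only, whereas KeyValue.COMPARATOR also compares row, family, timestamp and type. It mimics a lookup that binary-searches the cell array and shows that, on an unsorted array, some columns are simply not found.

```java
import java.util.Arrays;
import java.util.Comparator;

// Simplified model, NOT HBase code: cells are looked up by binary search,
// which only works if the array is sorted with the same comparator.
public class SortedLookupDemo {

    // Hypothetical stand-in for a KeyValue: only qualifier and value are modelled.
    static final class Cell {
        final String qualifier;
        final String value;

        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Orders cells by qualifier; String.compareTo is lexicographic for these ASCII names.
    static final Comparator<Cell> BY_QUALIFIER = Comparator.comparing((Cell c) -> c.qualifier);

    // Returns the value stored under the qualifier, or null when the binary search misses it.
    static String lookup(Cell[] cells, String qualifier) {
        int pos = Arrays.binarySearch(cells, new Cell(qualifier, null), BY_QUALIFIER);
        return pos >= 0 ? cells[pos].value : null;
    }

    // Returns a sorted copy, analogous to collecting the cells in a TreeSet first.
    static Cell[] sortedCopy(Cell[] cells) {
        Cell[] copy = cells.clone();
        Arrays.sort(copy, BY_QUALIFIER);
        return copy;
    }

    public static void main(String[] args) {
        Cell[] unsorted = {
            new Cell("ctwo", "2"), new Cell("cthree", "3"), new Cell("cone", "1")
        };
        Cell[] sorted = sortedCopy(unsorted);
        for (String q : new String[] {"cone", "cthree", "ctwo"}) {
            System.out.println(q + ": unsorted=" + lookup(unsorted, q)
                    + " sorted=" + lookup(sorted, q));
        }
        // cone: unsorted=null sorted=1
        // cthree: unsorted=3 sorted=3
        // ctwo: unsorted=null sorted=2
    }
}
```

On the unsorted array only the cell that happens to sit at the first probe position is found; sorting with the comparator that the search uses makes every lookup succeed.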

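// Imports used by this snippet (at the top of the test class)
import java.util.TreeSet;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// A TreeSet ordered by KeyValue.COMPARATOR keeps the cells in the order Result expects
TreeSet<KeyValue> set = new TreeSet<KeyValue>(KeyValue.COMPARATOR);

byte[] row = Bytes.toBytes("row01");
byte[] cf = Bytes.toBytes("cf");
// The insertion order does not matter; the set sorts the cells by qualifier
set.add(new KeyValue(row, cf, Bytes.toBytes("cone"), Bytes.toBytes("row01_cone_one")));
set.add(new KeyValue(row, cf, Bytes.toBytes("ctwo"), Bytes.toBytes("row01_ctwo_two")));
set.add(new KeyValue(row, cf, Bytes.toBytes("cthree"), Bytes.toBytes("row01_cthree_three")));
set.add(new KeyValue(row, cf, Bytes.toBytes("cfour"), Bytes.toBytes("row01_cfour_four")));
set.add(new KeyValue(row, cf, Bytes.toBytes("cfive"), Bytes.toBytes("row01_cfive_five")));
set.add(new KeyValue(row, cf, Bytes.toBytes("csix"), Bytes.toBytes("row01_csix_six")));

// Copy the sorted cells into an array for the Result constructor
KeyValue[] kvs = set.toArray(new KeyValue[set.size()]);

// key and mapDriver are set up as in the question's test
Result result = new Result(kvs);
mapDriver.withInput(key, result);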
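The six qualifiers above are added in arbitrary order and the TreeSet re-sorts them. The following plain-Java sketch (no HBase dependency; QualifierOrderDemo, compareUnsigned and sortedQualifiers are hypothetical names) models just the qualifier part of that ordering: unsigned, byte-by-byte lexicographic comparison, as HBase's Bytes.compareTo does. Row, family, timestamp and type, which KeyValue.COMPARATOR also compares, are left out because they are identical for these cells.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java sketch (no HBase): orders qualifiers by unsigned lexicographic
// byte comparison, the way HBase compares qualifier bytes within one row and family.
public class QualifierOrderDemo {

    // Compares byte by byte as unsigned values; when one array is a prefix of the other, the shorter sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Collects the qualifiers into a comparator-backed TreeSet and returns them in set order.
    static List<String> sortedQualifiers(String... qualifiers) {
        TreeSet<String> set = new TreeSet<String>((x, y) -> compareUnsigned(
                x.getBytes(StandardCharsets.UTF_8), y.getBytes(StandardCharsets.UTF_8)));
        set.addAll(Arrays.asList(qualifiers));
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // Same qualifiers as the answer's example, in their insertion order
        System.out.println(sortedQualifiers("cone", "ctwo", "cthree", "cfour", "cfive", "csix"));
        // prints [cfive, cfour, cone, csix, cthree, ctwo]

        // Lexicographic, not numeric: "10" sorts before "2"
        System.out.println(sortedQualifiers("myQualifier2", "myQualifier10"));
        // prints [myQualifier10, myQualifier2]
    }
}
```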

I have also posted my answer here.

score 1 · Accepted Answer

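In recent HBase libraries the Result constructors are deprecated, so the Result.create factory method should be used instead. While writing my own solution I ran into the same problem as the question's author, and found the fix in Sakthivel's comment. Here is Sakthivel's solution implemented in Scala.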

import org.apache.hadoop.hbase.{Cell, CellUtil, KeyValue}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes.toBytes
import scala.collection.immutable.TreeSet

// KeyValue.COMPARATOR is a java.util.Comparator[Cell]; Scala derives an implicit
// Ordering[Cell] from it, so the TreeSet keeps the cells in HBase order
implicit val ordering: java.util.Comparator[Cell] = KeyValue.COMPARATOR

val row = toBytes("myRowKey")
val family = toBytes("myColumnFamily")
val timestamp = 1000L

// One Put cell per qualifier; the TreeSet sorts them regardless of insertion order
val cells = TreeSet(
  CellUtil.createCell(row, family, toBytes("myQualifier1"), timestamp, KeyValue.Type.Put.getCode, toBytes("myValue1")),
  CellUtil.createCell(row, family, toBytes("myQualifier2"), timestamp, KeyValue.Type.Put.getCode, toBytes("myValue2")),
  CellUtil.createCell(row, family, toBytes("myQualifier3"), timestamp, KeyValue.Type.Put.getCode, toBytes("myValue3")),
  CellUtil.createCell(row, family, toBytes("myQualifier4"), timestamp, KeyValue.Type.Put.getCode, toBytes("myValue4")),
  CellUtil.createCell(row, family, toBytes("myQualifier5"), timestamp, KeyValue.Type.Put.getCode, toBytes("myValue5"))
)

// Result.create replaces the deprecated Result constructors
val result = Result.create(cells.toArray)

I hope this helps anyone writing unit tests for HBase functionality.
