serialization - Storing and retrieving string arrays in HBase

Question

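I have read the answers about storing string arrays in HBase (How to store complex objects into hadoop Hbase?).

They suggest using the ArrayWritable class to serialize the array. With WritableUtils.toByteArray(Writable ... writable) I get a byte[] that I can store in HBase.

When I now try to retrieve the row again, I get a byte[] that I somehow have to convert back into an ArrayWritable, but I cannot find a way to do this. Maybe you know the answer, or am I going about serializing my String[] the wrong way?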

score 5 · Accepted Answer

You can use the following method to get the ArrayWritable back (taken from an earlier answer of mine, see here).

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Deserializes bytes into a new instance of the given Writable class.
// The class must expose an accessible no-argument constructor.
public static <T extends Writable> T asWritable(byte[] bytes, Class<T> clazz)
        throws IOException {
    try (DataInputStream dataIn = new DataInputStream(new ByteArrayInputStream(bytes))) {
        T result = clazz.getDeclaredConstructor().newInstance();
        result.readFields(dataIn);
        return result;
    } catch (ReflectiveOperationException e) {
        // Fail loudly rather than returning null when instantiation is impossible.
        throw new IOException("Cannot instantiate " + clazz.getName(), e);
    }
}

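This method simply deserializes the byte array into the correct object type, based on the class type token you pass in.
For example, suppose you have a custom ArrayWritable: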

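public class TextArrayWritable extends ArrayWritable {
    // No-arg constructor so the class can be instantiated reflectively;
    // it fixes the element type to Text.
    public TextArrayWritable() {
        super(Text.class);
    }
}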
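Why the subclass matters can be illustrated without Hadoop on the classpath. The sketch below is JDK-only and every class name in it (`TypedArrayBase`, `StringArrayHolder`, `ReflectiveNew`) is a hypothetical stand-in, not a Hadoop type. It assumes, as the `super(Text.class)` call above suggests, that the base class needs its element type passed to the constructor, so only a subclass with a no-argument constructor can be created from a bare class token.

```java
import java.lang.reflect.Constructor;

// Hypothetical base class: it needs the element type up front,
// so it has no no-argument constructor.
class TypedArrayBase {
    final Class<?> elementType;

    TypedArrayBase(Class<?> elementType) {
        this.elementType = elementType;
    }
}

// Hypothetical subclass: pins the element type and exposes a no-arg constructor.
class StringArrayHolder extends TypedArrayBase {
    public StringArrayHolder() {
        super(String.class);
    }
}

final class ReflectiveNew {
    // Returns a new instance, or null when the class has no no-arg constructor.
    static <T> T tryCreate(Class<T> clazz) throws ReflectiveOperationException {
        Constructor<T> ctor;
        try {
            ctor = clazz.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            return null;
        }
        return ctor.newInstance();
    }
}
```

With these stand-ins, `ReflectiveNew.tryCreate(TypedArrayBase.class)` finds no usable constructor, while `ReflectiveNew.tryCreate(StringArrayHolder.class)` yields an instance whose element type is already set.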

Now you issue an HBase get:

...
Get get = new Get(row);
Result result = htable.get(get);
byte[] value = result.getValue(family, qualifier);
TextArrayWritable tawReturned = asWritable(value, TextArrayWritable.class);
Text[] texts = (Text[]) tawReturned.toArray();
for (Text t : texts) {
  System.out.print(t + " ");
}
...
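To see the write side and the read side together without a cluster, the round trip can be sketched with plain JDK streams. This is only an illustration under stated assumptions: `MiniWritable`, `StringArrayBox` and `RoundTrip` are hypothetical names that mirror the write/readFields contract, and the byte layout (an int count followed by `writeUTF` strings) is chosen for the sketch, not taken from ArrayWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the Writable contract (write / readFields).
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical String[] holder; plays the role of TextArrayWritable.
final class StringArrayBox implements MiniWritable {
    private String[] values = new String[0];

    public StringArrayBox() { }                 // needed for reflective creation

    public StringArrayBox(String[] values) {
        this.values = values.clone();
    }

    public String[] get() {
        return values.clone();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);            // element count first
        for (String v : values) {
            out.writeUTF(v);                    // then each element
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        values = new String[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readUTF();
        }
    }
}

final class RoundTrip {
    // Write side: produces the bytes that would be stored in the cell.
    static byte[] toBytes(MiniWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Read side: creates an empty instance from the class token, then fills it.
    static <T extends MiniWritable> T fromBytes(byte[] data, Class<T> clazz)
            throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            T result = clazz.getDeclaredConstructor().newInstance();
            result.readFields(in);
            return result;
        } catch (ReflectiveOperationException e) {
            throw new IOException("Cannot instantiate " + clazz.getName(), e);
        }
    }
}
```

In this sketch, `RoundTrip.toBytes(new StringArrayBox(strings))` corresponds to what gets written to the cell, and `RoundTrip.fromBytes(value, StringArrayBox.class).get()` recovers the original String[] from the bytes returned by the get.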

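Note:
You may already have found the readCompressedStringArray() and writeCompressedStringArray() methods in WritableUtils, which look suitable if you have your own String-array-backed Writable class. Before you use them, be warned that they can cause a serious performance hit because of the overhead of gzip compression and decompression.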
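A rough feel for that trade-off can be had with a JDK-only sketch that encodes the same string array twice, once as plain length-prefixed data and once pushed through a gzip stream. This is not the WritableUtils byte format or implementation; `StringArrayEncodings` and its methods are hypothetical names, and the layout is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class StringArrayEncodings {
    // Plain encoding: element count, then each string via writeUTF.
    static byte[] encodePlain(String[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (String v : values) {
                out.writeUTF(v);
            }
        }
        return bytes.toByteArray();
    }

    static String[] decodePlain(byte[] data) throws IOException {
        return readFrom(new ByteArrayInputStream(data));
    }

    // Compressed encoding: the plain bytes pushed through a gzip stream.
    static byte[] encodeGzip(String[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(encodePlain(values));
        }
        return bytes.toByteArray();
    }

    static String[] decodeGzip(byte[] data) throws IOException {
        return readFrom(new GZIPInputStream(new ByteArrayInputStream(data)));
    }

    // Shared reader for both encodings; closes the stream when done.
    private static String[] readFrom(InputStream source) throws IOException {
        try (DataInputStream in = new DataInputStream(source)) {
            String[] values = new String[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readUTF();
            }
            return values;
        }
    }
}
```

For short arrays the gzip variant is not even smaller, since the gzip header and trailer alone add 18 bytes, and every read and write pays for a compressor pass. Timing both variants on representative data is the only reliable way to decide whether compression is worth it.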
