c# - HBase shell is nearly 100 times faster than the RESTful endpoint for a prefix filter

Question

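If I run a scan with a prefix filter in the HBase shell, I get a response in under 1 second no matter which prefix I use. (Using 0 vs. 9, or "a" vs. "z", makes no difference to the response time.)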

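However, when I make the same query through the Microsoft HBase library (in C#), it can take up to 90 seconds to get an answer. Interestingly, a prefix close to 0 is faster, and the further I move from 0, the longer it takes ("a" as a prefix filter is faster than "f").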

I don't know how to find out what the shell is doing differently from the RESTful query, or how to make the RESTful query more efficient.

Some details:

There are more than 20,000,000 records in this table.
The row key is designed as [guid]_[inverse timestamp], e.g. a6fc9620-5ff0-41c0-9ed9-660bc3fbb65c_9223370501253811889
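The example key is consistent with the inverse timestamp being `long.MaxValue` minus the Unix time in milliseconds (9223372036854775807 - 1535600963918 = 9223370501253811889). Assuming that (the question does not say so explicitly), a minimal sketch of building such a key could look like this; `BeaconRowKey` is a hypothetical name.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for the [guid]_[inverse timestamp] row key layout.
// Assumption: inverse timestamp = long.MaxValue - Unix epoch milliseconds,
// which reproduces the example key in the question.
public static class BeaconRowKey
{
    public static long InverseTimestamp(long epochMillis)
    {
        if (epochMillis < 0) throw new ArgumentOutOfRangeException(nameof(epochMillis));
        // Newer timestamps give smaller values, so newer rows sort first.
        return long.MaxValue - epochMillis;
    }

    public static string Build(Guid id, long epochMillis) =>
        // "D" format: lowercase hex with hyphens, e.g. a6fc9620-5ff0-...
        id.ToString("D") + "_" +
        InverseTimestamp(epochMillis).ToString(CultureInfo.InvariantCulture);
}
```

Because every row for one GUID shares the `[guid]_` prefix and the inverted timestamp makes newer rows sort first, a scan restricted to that prefix returns one GUID's rows newest first.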

Any ideas on what I should look for, or try, to improve the REST API requests?

score 0 · Accepted Answer

It turns out this was a non-issue: I wasn't running the same commands in the shell and against the REST API, as I had thought.

On the REST API, I was passing two filters: a page filter and a prefix filter.

In the HBase shell I was running:

scan 'beacon', {STARTROW => 'ff', FILTER => "PageFilter(25)"}

STARTROW isn't the same as a prefix filter. It sets the row key at which the scan begins, so the scan seeks straight to that row instead of traversing the whole table, which is why it is fast.
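To see why a prefix filter alone gets slower the further the prefix is from 0, while STARTROW does not, here is a toy in-memory model. It is not HBase code, and `ScanModel` and its methods are hypothetical. It assumes rows are stored in sorted order and that, without a start row, the scan begins at the first row of the table and reads every row until the page fills or it has passed the prefix.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a scan over rows kept in sorted (ordinal) order.
// It only counts how many rows each strategy has to read.
public static class ScanModel
{
    // Prefix filter + page filter, no start row: reading starts at the first row.
    public static int RowsReadWithPrefixFilterOnly(IReadOnlyList<string> sortedRows, string prefix, int pageSize)
    {
        int read = 0, matched = 0;
        foreach (string row in sortedRows)
        {
            read++;
            if (row.StartsWith(prefix, StringComparison.Ordinal))
            {
                matched++;
                if (matched == pageSize) break; // page is full
            }
            else if (string.CompareOrdinal(row, prefix) > 0)
            {
                break; // past the prefix range: no later row can match
            }
        }
        return read;
    }

    // STARTROW + page filter: seek to the first row >= startRow
    // (a binary search stands in for the store's seek), then read one page.
    public static int RowsReadWithStartRow(IReadOnlyList<string> sortedRows, string startRow, int pageSize)
    {
        int first = LowerBound(sortedRows, startRow);
        return Math.Min(pageSize, sortedRows.Count - first);
    }

    // Index of the first row that is >= key.
    private static int LowerBound(IReadOnlyList<string> rows, string key)
    {
        int lo = 0, hi = rows.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(rows[mid], key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

With 1,024 rows keyed `00_0` through `ff_3` and a page size of 25, the prefix-only scan for `ff` reads all 1,024 rows, while the STARTROW scan from `ff` reads 4. That mirrors the "further from 0 is slower" behaviour in the question.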

As it turns out, this is what I should have been doing in the REST API call too. When I set a start row and an end row in addition to the prefix filter and page filter, the query returns quickly and as expected.
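A common way to derive the end row from a prefix is to increment the last byte that is not 0xFF and drop any trailing 0xFF bytes; this is what HBase's Java `Scan.setRowPrefixFilter` does. A minimal C# sketch follows; `PrefixRange` is a hypothetical helper. The resulting arrays would still need to be assigned to the start and end row of the scanner settings exposed by the Microsoft HBase client, so check that library for the exact property names.

```csharp
using System;
using System.Text;

// Hypothetical helper: computes a [startRow, stopRow) range that covers
// exactly the rows beginning with the given prefix.
public static class PrefixRange
{
    public static (byte[] StartRow, byte[] StopRow) For(byte[] prefix)
    {
        if (prefix == null) throw new ArgumentNullException(nameof(prefix));
        byte[] start = (byte[])prefix.Clone();

        // Find the last byte that can be incremented without overflowing.
        for (int i = prefix.Length - 1; i >= 0; i--)
        {
            if (prefix[i] != 0xFF)
            {
                byte[] stop = new byte[i + 1];
                Array.Copy(prefix, stop, i + 1);
                stop[i]++; // e.g. "ff" -> "fg"
                return (start, stop);
            }
        }

        // Empty or all-0xFF prefix: there is no finite upper bound.
        // The empty array stands for "no end row" (scan to the end of the table).
        return (start, Array.Empty<byte>());
    }

    public static (byte[] StartRow, byte[] StopRow) For(string prefix) =>
        For(Encoding.UTF8.GetBytes(prefix));
}
```

For the prefix `ff` this yields the range [`ff`, `fg`), so the scan only touches rows that begin with `ff`. Keeping the prefix filter as well, as the answer does, is harmless.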

https://community.hortonworks.com/articles/55204/recommended-way-to-do-hbase-prefix-scan-through-hb.html

Should I use prefixfilter or rowkey range scan in HBase
