regex - How to filter a scan on Accumulo using RegEx

Question

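I have run scans over data stored in Accumulo before and gotten the whole result set back (for whatever Range I specified). The problem is that I would like to filter those results server-side in Accumulo, before the client receives them. I'm hoping someone has a simple code example of how this is done.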

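As I understand it, Filter provides some (all?) of this functionality, but how is the API used in practice? I have seen an example of using a Filter from the shell client in the Accumulo documentation here: http://accumulo.apache.org/user_manual_1.3-incubating/examples/filter.html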

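I can't find any code examples online that show a simple way to filter a scan based on a regular expression over any of the data, although I think this should be relatively easy to do.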

score 9 · Accepted Answer

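The Filter class lays the framework for the functionality you want. To create a custom filter, you need to extend Filter and implement the accept(Key k, Value v) method. If you only want to filter based on regular expressions, you can avoid writing your own filter by using RegExFilter.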

Using a RegExFilter is straightforward. Here is an example:

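import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.iterators.user.RegExFilter;

//first connect to Accumulo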
ZooKeeperInstance inst = new ZooKeeperInstance(instanceName, zooServers);
Connector connect = inst.getConnector(user, password);

//initialize a scanner
Scanner scan = connect.createScanner(myTableName, myAuthorizations);

//to use a filter, which is an iterator, you must create an IteratorSetting
//specifying which iterator class you are using
IteratorSetting iter = new IteratorSetting(15, "myFilter", RegExFilter.class);
//next set the regular expressions to match. Here, I want all key/value pairs in
//which the column family begins with "J"
String rowRegex = null;
String colfRegex = "J.*";
String colqRegex = null;
String valueRegex = null;
boolean orFields = false;
RegExFilter.setRegexs(iter, rowRegex, colfRegex, colqRegex, valueRegex, orFields);
//now add the iterator to the scanner, and you're all set
scan.addScanIterator(iter);

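In this case, the first two parameters of the IteratorSetting constructor (priority and name) are not relevant. Once you have added the above code, iterating through the scanner will return only the key/value pairs that match the regex parameters.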
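To make the meaning of the null regexes and the orFields flag more concrete, here is a small plain-Java model of the matching rule as I understand it. It does not use Accumulo at all: RegexFilterModel, fieldMatches and accept are hypothetical names made up for this illustration, and the whole-string matching and the handling of unset fields are assumptions about RegExFilter's behavior, so check them against the version you run.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone model of the matching rule; not Accumulo code.
public class RegexFilterModel {

    // A null regex means "no constraint on this field" (assumption):
    // it passes in AND mode and cannot satisfy the filter in OR mode.
    static boolean fieldMatches(String regex, String field, boolean orFields) {
        if (regex == null) {
            return !orFields;
        }
        // Whole-string match (assumption): "J.*" accepts "Jones" but not "AJones".
        return Pattern.matches(regex, field);
    }

    // Combines the four per-field results with OR or AND, depending on orFields.
    static boolean accept(String row, String colf, String colq, String value,
                          String rowRegex, String colfRegex, String colqRegex,
                          String valueRegex, boolean orFields) {
        boolean r = fieldMatches(rowRegex, row, orFields);
        boolean f = fieldMatches(colfRegex, colf, orFields);
        boolean q = fieldMatches(colqRegex, colq, orFields);
        boolean v = fieldMatches(valueRegex, value, orFields);
        return orFields ? (r || f || q || v) : (r && f && q && v);
    }

    public static void main(String[] args) {
        // Same settings as the answer: only the column family is constrained.
        System.out.println(accept("row1", "Jones", "age", "42",
                null, "J.*", null, null, false)); // prints true
        System.out.println(accept("row1", "Smith", "age", "42",
                null, "J.*", null, null, false)); // prints false
    }
}
```

With orFields set to false, every regex that is set has to match, so adding a row regex next to the column family regex narrows the result. With orFields set to true, a single matching field is enough.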
