hbase - Paginating Titan Vertex reads from HBase

Question

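I'm writing Java code that reads Titan vertices directly from the Hadoop HBase backend. I know the Blueprints API provides a getVertices() method on every TransactionalGraph, but I'm still trying to implement my own. For plain vertex reads I already have working code that scans the whole HBase backend and fetches every vertex from the Titan graph, but I'm having trouble implementing pagination.

My code so far: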

    Scan scan = new Scan();
    Filter pageFilter = new ColumnPaginationFilter(DEFAULT_PAGE_SIZE, currentOffSet);
    scan.setFilter(pageFilter);
    scan.addFamily(Backend.EDGESTORE_NAME.getBytes());
    scan.setMaxVersions(10);
    List<Vertex> vertexList = new ArrayList<>(DEFAULT_PAGE_SIZE);
    HTablePool pool = new HTablePool(config, DEFAULT_PAGE_SIZE);
    ResultScanner scanner = pool.getTable(attributeMap.get("storage.tablename")).getScanner(scan);

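But the ResultScanner returns the entire graph.

currentOffSet is an int variable that identifies the current page number.

I also tried ResultScanner#next(int rowCount). It works fine, but with that approach I have no way to go back to the previous page.

Can anyone help me?

Thanks in advance.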

score 0 · Accepted Answer

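I've solved it. The logic is simple. You have to call setStartRow on the Scan instance. It isn't needed for the first page, because the scan should start from the first row. Then we fetch *(PAGE_SIZE + 1)* rows. The last row returned by the ResultScanner is used as the start row of the next page.

To go back to the previous page, we need a buffer or stack that stores the start rows of all previously visited pages.

Here is my code snippet: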

    // Scan only the edge store family and ask for one row more than a page holds.
    Scan scan = new Scan().addFamily(Backend.EDGESTORE_NAME.getBytes());
    scan.setFilter(new PageFilter(DEFAULT_PAGE_SIZE + 1));
    if (currentPageStartRowForHBase != null) {
        scan.setStartRow(currentPageStartRowForHBase);
    }
    List<Vertex> vertexList = new ArrayList<>(DEFAULT_PAGE_SIZE);
    nextPageStartRowForHBase = null; // stays null when this is the last page
    HTablePool pool = new HTablePool(config, DEFAULT_PAGE_SIZE + 1);
    ResultScanner scanner = null;
    try {
        scanner = pool.getTable(attributeMap.get("storage.tablename")).getScanner(scan);
        for (Result result : scanner) {
            if (vertexList.size() < DEFAULT_PAGE_SIZE) {
                ByteBuffer byteBuffer = ByteBuffer.wrap(result.getRow());
                vertexList.add(this.getVertex(IDHandler.getKeyID(byteBuffer)));
            } else {
                // The extra row is the first row of the next page. PageFilter is
                // applied per region server, so stop here and ignore surplus rows.
                nextPageStartRowForHBase = result.getRow();
                break;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (scanner != null) {
            scanner.close(); // release the server-side scanner
        }
    }

nextPageStartRowForHBase and currentPageStartRowForHBase are byte[].
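The snippet above only moves forward. As a rough sketch of the back-navigation idea (a stack holding the start rows of the pages already visited), here is a self-contained example. It does not touch HBase at all: a TreeSet of strings stands in for the table's sorted row keys, the empty string plays the role of an empty start row, and RowPager with all of its method names is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical pager; a TreeSet of row keys stands in for the sorted HBase table. */
class RowPager {
    private final TreeSet<String> rows;   // assumption: simulates sorted row keys
    private final int pageSize;
    // Start rows of the pages we navigated away from, most recent on top.
    private final Deque<String> previousStarts = new ArrayDeque<>();
    private String currentStart = "";     // "" sorts before every key: scan from the first row
    private String nextStart = null;      // null: no further page known

    RowPager(TreeSet<String> rows, int pageSize) {
        this.rows = rows;
        this.pageSize = pageSize;
    }

    /** Reads up to pageSize rows from currentStart, plus one look-ahead row that is not returned. */
    List<String> current() {
        List<String> page = new ArrayList<>(pageSize);
        nextStart = null;
        for (String row : rows.tailSet(currentStart, true)) {
            if (page.size() < pageSize) {
                page.add(row);
            } else {
                nextStart = row; // the extra row marks where the next page begins
                break;
            }
        }
        return page;
    }

    /** Valid after current(), next() or previous() has been called. */
    boolean hasNext() {
        return nextStart != null;
    }

    boolean hasPrevious() {
        return !previousStarts.isEmpty();
    }

    /** Remembers the current start row on the stack, then reads the following page. */
    List<String> next() {
        if (nextStart == null) {
            throw new IllegalStateException("no next page");
        }
        previousStarts.push(currentStart);
        currentStart = nextStart;
        return current();
    }

    /** Pops the most recently visited start row and re-reads that page. */
    List<String> previous() {
        if (previousStarts.isEmpty()) {
            throw new IllegalStateException("no previous page");
        }
        currentStart = previousStarts.pop();
        return current();
    }
}
```

With real HBase the start rows would be byte[] values passed to setStartRow, and the stack would hold those arrays instead of strings. Only the start rows are kept, so going back re-runs the scan for that page rather than caching its contents.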

This meets my requirements, but if anyone has a better solution, please share it.
