
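I'm looking for useful documentation or examples for the Apache Arrow API. Can anyone point me to some good resources? All I could find were a few blog posts and the Java docs (which don't say much).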

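From what I've read, it is a standard in-memory columnar data format for fast analytics. Is it possible to load data into Arrow memory and manipulate it there?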


2 Answers


You should use Arrow as a middleman between two applications that need to communicate by passing objects to each other.

Arrow isn’t a standalone piece of software but rather a component used to accelerate analytics within a particular system and to allow Arrow-enabled systems to exchange data with low overhead.

For example, Arrow improves the performance of data movement within a cluster.
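To make the middleman idea concrete, here is a minimal sketch in which a "producer" loads values into an Arrow `IntVector`, serializes the batch with the Arrow IPC streaming format, and a "consumer" reads it back and operates on the column. It assumes the `arrow-vector` artifact plus a memory implementation (`arrow-memory-netty` or `arrow-memory-unsafe`) on the classpath; the class name `ArrowExchangeSketch` and the column name `values` are hypothetical, and the in-memory byte array stands in for whatever socket, pipe, or file the two applications would really share.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;

public class ArrowExchangeSketch {

    // Producer side: load three ints into Arrow memory and serialize them
    // as a single record batch in the Arrow IPC streaming format.
    static byte[] produce(BufferAllocator allocator) throws Exception {
        IntVector values = new IntVector("values", allocator);
        // The root takes ownership of the vector and releases it on close.
        try (VectorSchemaRoot root = VectorSchemaRoot.of(values);
             ByteArrayOutputStream out = new ByteArrayOutputStream();
             ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
            values.allocateNew(3);
            values.set(0, 10);
            values.set(1, 20);
            values.set(2, 30);
            root.setRowCount(3); // also sets the vector's value count

            writer.start();      // writes the schema message
            writer.writeBatch(); // writes the current contents of the root
            writer.end();        // writes the end-of-stream marker
            return out.toByteArray();
        }
    }

    // Consumer side: read every batch from the stream and sum the non-null values.
    static long consume(byte[] bytes, BufferAllocator allocator) throws Exception {
        long sum = 0;
        try (ArrowStreamReader reader =
                 new ArrowStreamReader(new ByteArrayInputStream(bytes), allocator)) {
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            while (reader.loadNextBatch()) {
                IntVector values = (IntVector) root.getVector("values");
                for (int i = 0; i < root.getRowCount(); i++) {
                    if (!values.isNull(i)) {
                        sum += values.get(i);
                    }
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // Closing the allocator fails loudly if any Arrow buffer was leaked.
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
            byte[] bytes = produce(allocator);
            System.out.println("sum = " + consume(bytes, allocator));
        }
    }
}
```

Run as-is, this should print `sum = 60`. The two sides only agree on the Arrow format, not on each other's Java classes, which is what lets Arrow-enabled systems in different languages exchange data without a custom serialization layer.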

See Arrow's own tests for examples, such as this one:

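  // From the Arrow Java test suite for the FileRoundtrip tool. testFolder is a JUnit
  // TemporaryFolder rule; writeInput and validateOutput are test helpers not shown here.
  @Test
  public void test() throws Exception {
    // Close the allocator when done so that leaked buffers are reported.
    try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE)) {
      File testInFile = testFolder.newFile("testIn.arrow");
      File testOutFile = testFolder.newFile("testOut.arrow");

      // Write a sample Arrow file for the tool to read.
      writeInput(testInFile, allocator);

      // Run FileRoundtrip: read testIn.arrow and write its contents to testOut.arrow.
      String[] args = {"-i", testInFile.getAbsolutePath(), "-o", testOutFile.getAbsolutePath()};
      int result = new FileRoundtrip(System.out, System.err).run(args);
      assertEquals(0, result); // expect a zero (success) exit code

      // Check that the output file holds the same data that was written.
      validateOutput(testOutFile, allocator);
    }
  }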
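The test relies on helpers (writeInput, validateOutput) that live elsewhere in Arrow's test suite. A self-contained sketch of a similar round trip through an Arrow IPC file (the random-access `.arrow` format) follows, this time including a null to show that validity is preserved. It assumes the same `arrow-vector` and `arrow-memory-*` dependencies; the class name `ArrowFileRoundTripSketch`, the column name `values`, and the temp-file name are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowFileReader;
import org.apache.arrow.vector.ipc.ArrowFileWriter;

public class ArrowFileRoundTripSketch {

    // Writes one batch [1, null, 3] to an Arrow IPC file.
    static void write(File file, BufferAllocator allocator) throws Exception {
        IntVector values = new IntVector("values", allocator);
        try (VectorSchemaRoot root = VectorSchemaRoot.of(values);
             FileOutputStream out = new FileOutputStream(file);
             ArrowFileWriter writer = new ArrowFileWriter(root, null, out.getChannel())) {
            values.allocateNew(3);
            values.set(0, 1);
            values.setNull(1); // clears the validity bit for index 1
            values.set(2, 3);
            root.setRowCount(3);

            writer.start();
            writer.writeBatch();
            writer.end(); // writes the footer that makes the file random-access
        }
    }

    // Reads every batch back and returns {sum of non-null values, null count}.
    static long[] read(File file, BufferAllocator allocator) throws Exception {
        long sum = 0;
        long nulls = 0;
        try (FileInputStream in = new FileInputStream(file);
             ArrowFileReader reader = new ArrowFileReader(in.getChannel(), allocator)) {
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            while (reader.loadNextBatch()) {
                IntVector values = (IntVector) root.getVector("values");
                for (int i = 0; i < root.getRowCount(); i++) {
                    if (values.isNull(i)) {
                        nulls++;
                    } else {
                        sum += values.get(i);
                    }
                }
            }
        }
        return new long[] {sum, nulls};
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("roundtrip", ".arrow");
        file.deleteOnExit();
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
            write(file, allocator);
            long[] result = read(file, allocator);
            System.out.println("sum = " + result[0] + ", nulls = " + result[1]);
        }
    }
}
```

This should print `sum = 4, nulls = 1`. The file format is a better fit than the stream format when a reader needs to jump to specific record batches; the stream format suits pipes and sockets where data is consumed in order.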

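Apache Parquet also uses it, and its tests include examples of converting schemas to and from Arrow:

// converter is a SchemaConverter from Parquet's parquet-arrow module;
// allTypesArrowSchema and supportedTypesParquetSchema are schemas defined in Parquet's tests.
SchemaConverter converter = new SchemaConverter();

// Arrow schema -> Parquet schema
MessageType parquet = converter.fromArrow(allTypesArrowSchema).getParquetSchema();

// Parquet schema -> Arrow schema
Schema arrow = converter.fromParquet(supportedTypesParquetSchema).getArrowSchema();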
answered 2017-07-09T06:59:40.283

They now have some basic documentation on how to use Apache Arrow on their website, although it could use some filling out.

answered 2020-11-20T15:10:58.800