
I have parquet records created with Hudi from a Spark Kinesis stream and stored in S3.
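
For context, a stripped-down sketch of the kind of streaming write involved (the table name, field names, and S3 paths below are placeholders, and the Kinesis read options are omitted):

from pyspark.sql import DataFrame

# Sketch of a Hudi write from a streaming job; all names below are placeholders.
# `stream_df` stands in for the Kinesis-sourced streaming DataFrame (the
# connector-specific read options are not shown here).
def write_hudi(batch_df: DataFrame, batch_id: int) -> None:
    (batch_df.write
        .format('org.apache.hudi')
        .option('hoodie.table.name', 'my_table')
        .option('hoodie.datasource.write.recordkey.field', 'id')
        .option('hoodie.datasource.write.precombine.field', 'ts')
        .option('hoodie.datasource.write.partitionpath.field', 'dt')
        .mode('append')
        .save('s3://my-bucket/hudi/my_table/'))

query = (stream_df.writeStream
         .foreachBatch(write_hudi)
         .option('checkpointLocation', 's3://my-bucket/checkpoints/my_table/')
         .start())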

An AWS Glue table is generated from these records. I updated the table's input format type to org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat, following the instructions at https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi.
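
For reference, that input format change amounts to updating the table's StorageDescriptor in Glue. A rough boto3 sketch of the step (database and table names below are placeholders):

import boto3

glue = boto3.client('glue')
DATABASE = 'my-schema'   # placeholder
TABLE = 'my_table'       # placeholder

# Fetch the current definition and swap the input format class.
table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)['Table']
table['StorageDescriptor']['InputFormat'] = (
    'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat')

# update_table expects the TableInput shape, so drop the read-only fields
# that get_table returns before sending the definition back.
for key in ('DatabaseName', 'CreateTime', 'UpdateTime', 'CreatedBy',
            'IsRegisteredWithLakeFormation', 'CatalogId', 'VersionId'):
    table.pop(key, None)

glue.update_table(DatabaseName=DATABASE, TableInput=table)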

From presto-cli I run:

presto-cli --catalog hive --schema my-schema --server my-server:8889
presto:my-schema> select * from table

This returns:

Query 20200211_185222_00050_hej8h, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200211_185222_00050_hej8h failed: No value present

But when I run

select id from table

it returns:

    id    
----------
 34551832 
(1 row)

Query 20200211_185250_00051_hej8h, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 93B] [2 rows/s, 213B/s]

Is this the expected behaviour, or is there an underlying problem in the setup between Hudi, AWS Glue, and Presto?

Update 2020-02-12

Stack trace with the --debug option:

presto:schema> select * from table;

Query 20200212_092259_00006_hej8h, FAILED, 1 node
http://xx-xxx-xxx-xxx.xx-xxxxx-xxx.compute.amazonaws.com:8889/ui/query.html?20200212_092259_00006_hej8h
Splits: 17 total, 0 done (0.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 23% active
Per Node: 0.1 parallelism,     0 rows/s,     0B/s
Parallelism: 0.1
Peak Memory: 0B
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200212_092259_00006_hej8h failed: No value present
java.util.NoSuchElementException: No value present
    at java.util.Optional.get(Optional.java:135)
    at com.facebook.presto.parquet.reader.ParquetReader.readArray(ParquetReader.java:156)
    at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:282)
    at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
    at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
    at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
    at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
    at com.facebook.presto.parquet.reader.ParquetReader.readBlock(ParquetReader.java:268)
    at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:247)
    at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:225)
    at com.facebook.presto.spi.block.LazyBlock.assureLoaded(LazyBlock.java:283)
    at com.facebook.presto.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:274)
    at com.facebook.presto.spi.Page.getLoadedPage(Page.java:261)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:254)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
    at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
    at com.facebook.presto.$gen.Presto_0_227____20200211_134743_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
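
To try to narrow down which column triggers this (the readArray/readStruct frames point at a nested column), one option is to probe the columns one at a time. A rough sketch, assuming the presto-python-client package (host, user, schema, and table names are placeholders):

import prestodb

# Placeholders - adjust to the actual cluster, schema, and table.
conn = prestodb.dbapi.connect(
    host='my-server', port=8889, user='hadoop',
    catalog='hive', schema='my-schema')

cur = conn.cursor()
cur.execute('SHOW COLUMNS FROM my_table')
columns = [row[0] for row in cur.fetchall()]

# Select each column on its own so the failing one stands out.
for col in columns:
    cur = conn.cursor()
    try:
        cur.execute(f'SELECT "{col}" FROM my_table LIMIT 1')
        cur.fetchall()
        print(f'{col}: OK')
    except Exception as exc:
        print(f'{col}: FAILED ({exc})')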


1 Answer


It looks like the problem may lie elsewhere; here is the issue raised with the Hudi team --> https://github.com/apache/incubator-hudi/issues/1325

Answered 2020-02-17T17:51:13.627