I have a Parquet record created with Hudi from a Spark Kinesis stream and stored in S3.
An AWS Glue table is generated from this record. Following the instructions at https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
I updated the table's InputFormat to org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.
From presto-cli I run
presto-cli --catalog hive --schema my-schema --server my-server:8889
presto:my-schema> select * from table
This returns
Query 20200211_185222_00050_hej8h, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20200211_185222_00050_hej8h failed: No value present
But when I run
select id from table
it returns
id
----------
34551832
(1 row)
Query 20200211_185250_00051_hej8h, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 93B] [2 rows/s, 213B/s]
Is this expected behavior? Or is there an underlying problem with the setup between Hudi, AWS Glue, and Presto?
Update, 12 February 2020
Stack trace with the --debug option
presto:schema> select * from table;
Query 20200212_092259_00006_hej8h, FAILED, 1 node
http://xx-xxx-xxx-xxx.xx-xxxxx-xxx.compute.amazonaws.com:8889/ui/query.html?20200212_092259_00006_hej8h
Splits: 17 total, 0 done (0.00%)
CPU Time: 0.0s total, 0 rows/s, 0B/s, 23% active
Per Node: 0.1 parallelism, 0 rows/s, 0B/s
Parallelism: 0.1
Peak Memory: 0B
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20200212_092259_00006_hej8h failed: No value present
java.util.NoSuchElementException: No value present
at java.util.Optional.get(Optional.java:135)
at com.facebook.presto.parquet.reader.ParquetReader.readArray(ParquetReader.java:156)
at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:282)
at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
at com.facebook.presto.parquet.reader.ParquetReader.readBlock(ParquetReader.java:268)
at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:247)
at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:225)
at com.facebook.presto.spi.block.LazyBlock.assureLoaded(LazyBlock.java:283)
at com.facebook.presto.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:274)
at com.facebook.presto.spi.Page.getLoadedPage(Page.java:261)
at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:254)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
at com.facebook.presto.$gen.Presto_0_227____20200211_134743_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
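The trace above fails in ParquetReader.readArray, reached through two nested readStruct calls, while selecting only id succeeds. One way to narrow down which column triggers the error might be to query the columns one at a time. The following is a minimal sketch under stated assumptions, not a confirmed diagnosis: it only builds the presto-cli command lines (using the same --catalog, --schema, and --server values as above plus the --execute option), and the column name "payload" is hypothetical and stands in for whatever columns the Glue table actually has.

```python
import shlex


def per_column_commands(columns, table, catalog="hive",
                        schema="my-schema", server="my-server:8889"):
    """Build one presto-cli invocation per column.

    Each command selects a single column with LIMIT 1, so a failure
    can be attributed to that column alone.
    """
    commands = []
    for column in columns:
        # Quote identifiers so names such as "table" are accepted.
        query = f'SELECT "{column}" FROM "{table}" LIMIT 1'
        commands.append([
            "presto-cli",
            "--catalog", catalog,
            "--schema", schema,
            "--server", server,
            "--execute", query,
        ])
    return commands


if __name__ == "__main__":
    # "payload" is a hypothetical column name used for illustration.
    for argv in per_column_commands(["id", "payload"], "table"):
        print(shlex.join(argv))
```

Running each printed command and noting which ones fail with "No value present" would point at the column (or columns) involved; the commands could also be passed to subprocess.run instead of being printed.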