Environment information
- Presto version: 0.248
- Hive version: 2.3.8
- Hadoop version: 2.10.1
- Hudi version: 0.9
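Problem description
I created some external tables in Hive. When I query them through the Hive client, all of the tables look normal. But when I query them from the Presto client with a statement such as "select * from tablex limit 10", some tables return results and others fail. I compared the tables that succeed with the ones that fail, and apart from their schemas and data they do not seem to differ. This problem has been bothering me for several days. Thank you very much for any answers.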
- The Hive connector configuration is as follows:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://x.x.x.x:9083
hive.config.resources=/xxx/hdfs-site.xml
hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=1s
hive.parquet.use-column-names=true
- An example statement used to create the external tables:
CREATE EXTERNAL TABLE `tablex`(
`_hoodie_commit_time` string,
`_hoodie_commit_seqno` string,
`_hoodie_record_key` string,
`_hoodie_partition_path` string,
`_hoodie_file_name` string,
`bid` bigint,
`create_date_time` bigint,
...
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'path'='xxxx')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://ns/xxxx'
TBLPROPERTIES (
'last_commit_time_sync'='20210820033837',
'spark.sql.sources.provider'='hudi',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"bid","type":"long","nullable":true,"metadata":{}},{"name":"create_date_time","type":"timestamp","nullable":true,"metadata":{}},...]}',
'transient_lastDdlTime'='1629401918')
- Stack trace
2021-08-21T17:20:43.986+0800 DEBUG hive-hive-51 org.apache.hadoop.ipc.ProtobufRpcEngine Call: getListing took 0ms
2021-08-21T17:20:43.987+0800 INFO hive-hive-51 org.apache.hudi.common.table.view.HoodieTableFileSystemView Adding file-groups for partition :, #FileGroups=3
2021-08-21T17:20:43.987+0800 INFO hive-hive-51 org.apache.hudi.common.table.view.AbstractTableFileSystemView addFilesToView: NumFiles=5, FileGroupsCreationTime=0, StoreTimeTaken=0
2021-08-21T17:20:43.987+0800 INFO hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter Based on hoodie metadata from base path: hdfs://ns/xxxx, caching 3 files under hdfs://ns/xxxx
2021-08-21T17:20:43.988+0800 DEBUG hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter hdfs://ns/xxxx/.hoodie_partition_metadata checked after cache population, accept => false
2021-08-21T17:20:43.988+0800 DEBUG hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter Checking acceptance for path hdfs://ns/xxxx/44028b6c-e34a-4fe6-a8bf-bd141e80f84b-0_0-347-624_20210820050305.parquet
2021-08-21T17:20:43.988+0800 DEBUG hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter hdfs://ns/xxxx/44028b6c-e34a-4fe6-a8bf-bd141e80f84b-0_0-347-624_20210820050305.parquet Hoodie path checked against cache, accept => true
2021-08-21T17:20:43.989+0800 DEBUG hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter Checking acceptance for path hdfs://ns/xxxx/ab5c1b29-d764-4f84-96e9-9b95c9f24563-0_2-347-626_20210820050305.parquet
2021-08-21T17:20:43.989+0800 DEBUG hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter hdfs://ns/xxxx/ab5c1b29-d764-4f84-96e9-9b95c9f24563-0_2-347-626_20210820050305.parquet Hoodie path checked against cache, accept => true
2021-08-21T17:20:43.990+0800 DEBUG hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter Checking acceptance for path hdfs://ns/xxxx/adc51a4f-745f-4aa6-8f9a-d3037edee18c-0_1-347-625_20210820050305.parquet
2021-08-21T17:20:43.990+0800 DEBUG hive-hive-51 org.apache.hudi.hadoop.HoodieROTablePathFilter hdfs://ns/xxxx/adc51a4f-745f-4aa6-8f9a-d3037edee18c-0_1-347-625_20210820050305.parquet Hoodie path checked against cache, accept => true
2021-08-21T17:20:44.017+0800 ERROR remote-task-callback-190 com.facebook.presto.execution.StageExecutionStateMachine Stage execution 20210821_092043_00005_uzj3a.1.0 failed
com.facebook.presto.spi.PrestoException: Error opening Hive split hdfs://ns/xxxx/adc51a4f-745f-4aa6-8f9a-d3037edee18c-0_1-347-625_20210820050305.parquet (offset=0, length=6145852): null
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:328)
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:172)
at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:394)
at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:184)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:248)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
at com.facebook.presto.$gen.Presto_0_248_0b0ce2f____20210820_050243_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3441)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1161)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1086)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1439)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1402)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:78)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:116)
at com.facebook.presto.parquet.cache.MetadataReader.readFooter(MetadataReader.java:97)
at com.facebook.presto.parquet.cache.MetadataReader.getParquetMetadata(MetadataReader.java:318)
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:223)
... 17 more
2021-08-21T17:20:44.037+0800 INFO dispatcher-query-104 com.facebook.presto.event.QueryMonitor TIMELINE: Query 20210821_092043_00005_uzj3a :: Transaction:[46dd708d-1e43-4acd-b4b8-7d84bde8cbe8] :: elapsed 160ms :: planning 22ms :: scheduling 134ms :: running 42874ms :: finishing 0ms :: begin 2021-08-21T17:20:43.857+08:00 :: end 2021-08-21T17:20:44.017+08:00
2021-08-21T17:20:53.987+0800 DEBUG IPC Client (62922123) connection to node02/x.x.x.x:9000 from zhongtai org.apache.hadoop.ipc.Client IPC Client (62922123) connection to node02/x.x.x.x:9000from zhongtai: closed
2021-08-21T17:20:53.987+0800 DEBUG IPC Client (62922123) connection to node02/x.x.x.x:9000 from zhongtai org.apache.hadoop.ipc.Client IPC Client (62922123) connection to node02/x.x.x.x:9000from zhongtai: stopped, remaining connections 0
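The root cause in the trace is java.nio.channels.UnresolvedAddressException, thrown from sun.nio.ch.Net.checkAddress while DFSClient.newConnectedPeer opens a connection to read the Parquet footer. The JDK throws this exception when SocketChannel.connect is given an address whose hostname could not be resolved, so one plausible reading is that the Presto worker cannot resolve the hostname of the DataNode holding this file's blocks. That would also fit the symptom that only some tables fail. The following is a minimal, hypothetical diagnostic sketch, not part of Presto or Hadoop: the class name and the placeholder hostname datanode-x.invalid are made up, and the real DataNode hostnames should be passed as arguments.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical diagnostic class; not part of Presto or Hadoop.
public class DataNodeResolveCheck {

    // True if this JVM can resolve the hostname (the constructor attempts a lookup).
    static boolean isResolvable(String host) {
        return !new InetSocketAddress(host, 0).isUnresolved();
    }

    // True if connecting to host:port fails with the same exception as in the trace.
    static boolean connectFailsUnresolved(String host, int port) {
        // An unknown host leaves the address unresolved instead of throwing here.
        InetSocketAddress addr = new InetSocketAddress(host, port);
        try (SocketChannel ch = SocketChannel.open()) {
            // Non-blocking mode so a resolvable but unreachable host does not stall the check.
            ch.configureBlocking(false);
            ch.connect(addr);
            return false;
        } catch (UnresolvedAddressException e) {
            // Thrown before any network I/O when the address is unresolved.
            return true;
        } catch (IOException e) {
            // Resolved, but the connection attempt failed for another reason.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real DataNode hostnames as arguments.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost", "datanode-x.invalid"};
        for (String host : hosts) {
            System.out.println(host + " -> " + (isResolvable(host) ? "resolved" : "UNRESOLVED"));
        }
    }
}
```

Running it on each Presto worker with the DataNode hostnames listed by "hdfs dfsadmin -report" should show whether any DataNode is unresolvable from that worker. A hostname that resolves on the Hive client machine but not on a Presto worker would be consistent with the behavior described above.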
Note: I have replaced some sensitive information, such as IP addresses and paths, with placeholders like xx.
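The Hive catalog configuration in this report is an ordinary Java properties file, and the Presto Hive connector documentation describes hive.config.resources as a comma-separated list of Hadoop configuration files that must exist on every machine running Presto. A minimal sanity-check sketch follows; the class name is hypothetical and the default catalog path etc/catalog/hive.properties is an assumption. It loads the catalog file and lists any configured resource files that are missing on the node where it runs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper; not part of Presto.
public class CatalogResourceCheck {

    // Returns the hive.config.resources entries that are not regular files on this node.
    static List<String> missingResources(Properties catalog) {
        List<String> missing = new ArrayList<>();
        String value = catalog.getProperty("hive.config.resources", "");
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            // Skip empty entries (e.g. when the property is absent).
            if (!trimmed.isEmpty() && !Files.isRegularFile(Paths.get(trimmed))) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location of the catalog file; pass the real path as an argument.
        Path catalogFile = Paths.get(args.length > 0 ? args[0] : "etc/catalog/hive.properties");
        Properties catalog = new Properties();
        try (InputStream in = Files.newInputStream(catalogFile)) {
            catalog.load(in);
        }
        List<String> missing = missingResources(catalog);
        System.out.println(missing.isEmpty() ? "all config resources present" : "missing: " + missing);
    }
}
```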