I have a snappy.parquet file with the following schema:
{
"type": "struct",
"fields": [{
"name": "MyTinyInt",
"type": "byte",
"nullable": true,
"metadata": {}
}
...
]
}
Update: parquet-tools reveals this:
############ Column(MyTinyInt) ############
name: MyTinyInt
path: MyTinyInt
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Int(bitWidth=8, isSigned=true)
converted_type (legacy): INT_8
When I try to run a stored procedure in Azure Data Studio to load it into an external staging table using PolyBase, I get this error:
11:16:21 Started executing query at Line 113
Msg 106000, Level 16, State 1, Line 1
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class java.lang.Integer cannot be cast to class parquet.io.api.Binary (java.lang.Integer is in module java.base of loader 'bootstrap'; parquet.io.api.Binary is in unnamed module of loader 'app')
Loading into the external table using only varchars works fine:
CREATE EXTERNAL TABLE [domain].[TempTable]
(
...
MyTinyInt tinyint NULL,
...
)
WITH
(
LOCATION = ''' + @Location + ''',
DATA_SOURCE = datalake,
FILE_FORMAT = parquet_snappy
)
The data will eventually be merged into a Synapse data warehouse table, where the column must be of type tinyint.
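For context, that downstream step might look like the following sketch (non-authoritative; the target table name [dw].[TargetTable] and column list are hypothetical), with the cast to tinyint made explicit:

```sql
-- Hedged sketch of the merge described above: read from the external
-- staging table and cast explicitly into the tinyint target column.
INSERT INTO [dw].[TargetTable] (MyTinyInt /* , ... other columns ... */)
SELECT CAST(MyTinyInt AS tinyint) /* , ... other columns ... */
FROM [domain].[TempTable];
```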