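I am trying to merge multiple Parquet files into one. Their schemas are identical as far as the fields go, but my ParquetWriter complains that they are not.
After some investigation, I found that the pandas metadata stored in the schemas differs between files, and that difference causes the error.
Is it possible to ignore, merge, or delete the pandas metadata? Do I even need the pandas metadata?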
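    import pyarrow.parquet as pq

    writer = None  # created lazily from the first table's schema
    pq_tables = []
    for file_ in files:
        pq_table = pq.read_table(f'{MESS_DIR}/{file_}')
        pq_tables.append(pq_table)
        if writer is None:
            writer = pq.ParquetWriter(COMPRESSED_FILE, schema=pq_table.schema, use_deprecated_int96_timestamps=True)
        # Raises ValueError when a later table's schema differs from the first one
        writer.write_table(table=pq_table)
    if writer is not None:
        writer.close()  # finalizes the Parquet footer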
The exact error:
    Traceback (most recent call last):
      File "{PATH_TO}/main.py", line 68, in lambda_handler
        writer.write_table(table=pq_table)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/parquet.py", line 335, in write_table
        raise ValueError(msg)
    ValueError: Table schema does not match schema used to create file: