I installed the latest version of parquet-tools from apache-mr, version parquet-tools-1.8.2.jar.

Here is a reproducible example:

>>> import boto3
>>> client = GET_CLIENT() # redacted
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3]], columns=["a","b","c"])
>>> df
   a  b  c
0  1  2  3
>>> from io import BytesIO
>>> filebuf = BytesIO()
>>> df.to_parquet(filebuf, compression="zstd") # Change this to gzip and it works!
>>> client.put_object(Bucket="foo", Key="bar/example.zstd.parquet", Body=filebuf.getvalue())

I fetched the parquet file with aws s3 cp and tried to run parquet-tools head on it, but got:

$ parquet-tools head example.zstd.parquet
Could not read footer: java.lang.NullPointerException

However, running the same command on a gzip-compressed file gives me:

$ parquet-tools head example.gzip.parquet
a = 1
b = 2
c = 3
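
To rule out a truncated or corrupted download before blaming the codec, the file framing can be checked by hand. The following is a minimal sketch, assuming a plain (unencrypted) Parquet file, which starts and ends with the 4-byte magic PAR1 and stores the footer length as a 4-byte little-endian integer just before the trailing magic. The helper name parquet_footer_info is hypothetical and not part of any library. It does not decode the footer metadata, so it cannot say whether a given parquet-tools build understands the compression codec.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_footer_info(data: bytes) -> dict:
    """Inspect the outer framing of a Parquet file held in memory.

    Expected layout: 4-byte magic, column data, footer metadata,
    4-byte little-endian footer length, 4-byte magic.
    """
    # Smallest possible framing: magic + footer length + magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")

    # The footer length sits in the 4 bytes before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])

    return {
        "head_magic_ok": data[:4] == PARQUET_MAGIC,
        "tail_magic_ok": data[-4:] == PARQUET_MAGIC,
        "footer_length": footer_len,
        # The footer must fit between the two magics and the length field.
        "footer_fits": footer_len <= len(data) - 12,
    }
```

Calling it on the bytes of example.zstd.parquet (for instance filebuf.getvalue() from the example above, or the contents of the downloaded file) should report both magics as valid if the file survived the round trip through S3 intact.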

Is this a bug in zstd compression or in parquet-tools? Or did I miss some fine print somewhere?

Note: I have parquet-tools aliased to java -jar .../parquet-tools-1.8.2.jar
