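I am trying to read a Parquet file from AWS S3.
The same code works on my Windows machine.
A Google search turned up nothing.
Pandas should use fastparquet to build the DataFrame. fastparquet is installed.
Code: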
import boto3
import pandas as pd


def get_parquet_from_s3(bucket_name, file_name):
    """
    :param bucket_name:
    :param file_name:
    :return:
    """
    df = pd.read_parquet('s3://{}/{}'.format(bucket_name, file_name))
    print(df.head())


get_parquet_from_s3('my_bucket_name','my_file_name')
I get the following exception:
/home/ubuntu/.local/lib/python3.6/site-packages/numba/errors.py:131: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
Traceback (most recent call last):
  File "test_pd_read_parq.py", line 15, in <module>
    get_parquet_from_s3('my_bucket_name','my_file_name')
  File "test_pd_read_parq.py", line 12, in get_parquet_from_s3
    df = pd.read_parquet('s3://{}/{}'.format(bucket_name, file_name))
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 294, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 192, in read
    parquet_file = self.api.ParquetFile(path, open_with=s3.s3.open)
AttributeError: 'S3File' object has no attribute 's3'
Software and OS versions
python : 3.6
pandas : 0.25.0
s3fs : 0.3.1
ubuntu : 18.04
fastparquet : 0.3.1
boto3 : 1.9.198
botocore : 1.12.198
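
Since the same code works on one machine and fails on another, it can help to dump the installed versions on both and compare them. The following is a minimal sketch, not part of the original script: `collect_versions` is a hypothetical helper name, and it assumes each package exposes a `__version__` attribute (it reports 'unknown' when one does not).

```python
import importlib


def collect_versions(module_names):
    """Return {module name: version string}, with None for modules that cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Not installed (or not importable) in this environment.
            versions[name] = None
        else:
            # Assumption: the package exposes __version__; fall back to 'unknown'.
            versions[name] = getattr(module, '__version__', 'unknown')
    return versions
```

Calling `collect_versions(['pandas', 's3fs', 'fastparquet', 'boto3', 'botocore'])` on the Windows and Ubuntu machines and diffing the two results should show which package differs. pandas also provides `pd.show_versions()`, which prints a similar report for its own dependencies.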
Workaround
import s3fs
from fastparquet import ParquetFile


def get_parquet_from_s3(bucket_name, file_name):
    # Open the object through s3fs directly instead of going through pd.read_parquet
    s3 = s3fs.S3FileSystem()
    pf = ParquetFile('{}/{}'.format(bucket_name, file_name), open_with=s3.open)
    df = pf.to_pandas()
    return df
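
If the same script has to run on both machines, one option is to try `pd.read_parquet` first and fall back to the fastparquet route only when the `AttributeError` from the traceback occurs. This is a rough sketch under assumptions, not a confirmed fix: the helper names `read_via_pandas`, `read_via_fastparquet`, and `get_parquet_with_fallback` are made up for illustration, and AWS credentials are assumed to be configured for s3fs.

```python
def read_via_pandas(path):
    # Imported lazily so this module loads even where pandas is missing.
    import pandas as pd
    return pd.read_parquet('s3://' + path, engine='fastparquet')


def read_via_fastparquet(path):
    # Same approach as the workaround: let s3fs open the file for fastparquet.
    import s3fs
    from fastparquet import ParquetFile
    s3 = s3fs.S3FileSystem()
    return ParquetFile(path, open_with=s3.open).to_pandas()


def get_parquet_with_fallback(bucket_name, file_name,
                              primary=read_via_pandas,
                              fallback=read_via_fastparquet):
    """Read 'bucket/key' with `primary`; on AttributeError retry with `fallback`."""
    path = '{}/{}'.format(bucket_name, file_name)
    try:
        return primary(path)
    except AttributeError:
        # The failure reported above: 'S3File' object has no attribute 's3'.
        return fallback(path)
```

Usage would be `df = get_parquet_with_fallback('my_bucket_name', 'my_file_name')`. Catching `AttributeError` this broadly can hide unrelated bugs, so it is better treated as a temporary shim until the installed versions are aligned.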