python - Converting a Pandas DataFrame to and from in-memory Feather

Question

Using the IO tools in pandas, it is possible to convert a DataFrame to an in-memory Feather buffer:

import pandas as pd  
from io import BytesIO 

df = pd.DataFrame({'a': [1,2], 'b': [3.0,4.0]})  

buf = BytesIO()

df.to_feather(buf)

However, using the same buffer to convert back to a DataFrame

pd.read_feather(buf)

results in an error:

ArrowInvalid: Not a feather file

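How can a DataFrame be converted to an in-memory Feather representation and, correspondingly, back to a DataFrame?

Thank you in advance for your consideration and response.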

score 6 · Accepted Answer

With pandas==0.25.2 this can be done as follows:

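import pandas
import io

df = pandas.DataFrame(data={'a': [1, 2], 'b': [3.0, 4.0]})
buf = io.BytesIO()
df.to_feather(buf)  # write the Feather bytes into the in-memory buffer
# Rewinding is harmless here and keeps the read independent of the write position.
buf.seek(0)
output = pandas.read_feather(buf)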

Calling output.head(2) then returns:

    a    b
 0  1  3.0
 1  2  4.0

If your DataFrame has multiple index levels (a MultiIndex), you may see an error like

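ValueError: feather does not support serializing <class 'pandas.core.indexes.multi.MultiIndex'> for the index; you can .reset_index() to make the index into column(s)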

In that case, you need to call .reset_index() before to_feather, and call .set_index([...]) after read_feather.

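One last thing I would like to add: if you are going to do something else with the BytesIO, you need to seek back to 0 after writing the Feather bytes. For example: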

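buffer = io.BytesIO()
df.reset_index(drop=False).to_feather(buffer)
# Rewind so the upload starts reading from the first byte.
buffer.seek(0)
# s3_client is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
s3_client.put_object(Body=buffer, Bucket='bucket', Key='file')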
