python - Loading a large CSV file with pandas

Question

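I am trying to load a CSV file (about 250 MB) as a DataFrame with pandas. On my first attempt I used the usual read_csv command, but I got a memory error. I then tried the chunked approach mentioned in "Large, persistent DataFrame in pandas":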

import pandas as pd

x = pd.read_csv('myfile.csv', iterator=True, chunksize=1000)
xx = pd.concat([chunk for chunk in x], ignore_index=True)

But when I try to concatenate, I get the following error: Exception: "All objects passed are None". In fact, I cannot access the chunks at all.

I am using 32-bit winpython 3.3.2.1 with pandas 0.11.0.

score 2 · Accepted Answer

I suggest you install the 64-bit version of winpython. You should then be able to load a 250 MB file without any problems.
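To confirm which build is actually running before and after switching, something like the following may help. It is only a small sketch using the standard library; the function name interpreter_bits is made up for illustration.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    # "P" is the struct format code for a C pointer (void *);
    # its size in bytes times 8 gives the interpreter's bitness.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"{interpreter_bits()}-bit Python, sys.maxsize = {sys.maxsize}")
```

A 32-bit process has a much smaller usable address space, so a memory error can appear there even when the machine itself has plenty of RAM.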

score 0 · Accepted Answer

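I'm late to this, but the actual problem with the posted code is that pd.concat([chunk for chunk in x]) effectively cancels any benefit of chunking, because it joins all of the chunks back into one big DataFrame.
It may even need twice the memory temporarily, since the list of chunks and the concatenated result exist at the same time.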
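If the whole table is not needed in memory at once, one option is to reduce each chunk as it arrives and keep only the result. The sketch below assumes the goal is a simple column sum; the function name column_sum_by_chunks and the column name "value" are hypothetical, and the in-memory demo data stands in for the real file.

```python
import io

import pandas as pd


def column_sum_by_chunks(source, column, chunksize=1000):
    """Sum one column of a CSV while holding only one chunk in memory at a time."""
    total = 0.0
    rows = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of a single DataFrame.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += chunk[column].sum()
        rows += len(chunk)
    return total, rows


if __name__ == "__main__":
    # Small in-memory stand-in for the real file (hypothetical data).
    demo = io.StringIO("value\n1\n2\n3\n4\n5\n")
    print(column_sum_by_chunks(demo, "value", chunksize=2))
```

For the real file, source would be the path, for example 'myfile.csv'. The same loop shape works for filtering: keep only the rows of interest from each chunk and concatenate those much smaller pieces at the end.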
