I am trying to read and filter a CSV file in chunks and then put the results into a DataFrame.

This is what I use to read and filter the CSV:

csv_chunks = pandas.read_csv(filepath, sep=DELIMITER, skiprows=2, chunksize=1000, converters={"A": str, "B": str})
for chunk in csv_chunks:
    chunk = chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]

When I then try to concatenate the chunks

df = pandas.concat(chunk for chunk in csv_chunks)

I get an error message:

  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\tools\merge.py", line 872, in concat
    verify_integrity=verify_integrity)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\tools\merge.py", line 913, in __init__
    raise Exception('All objects passed were None')
Exception: All objects passed were None

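A few of the chunks are empty, but there are non-empty ones too, so I am not sure which objects are being treated as None. Any ideas are welcome!

Thanks, Anne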

1 Answer

Try:

csv_chunks = [chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]
              for chunk in csv_chunks]
df = pandas.concat(csv_chunks)

The code

for chunk in csv_chunks:
    chunk = chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]

probably does not do what you intended. On each iteration of the for-loop, for chunk in csv_chunks assigns one item from csv_chunks to chunk. Then,

chunk = chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]

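immediately reassigns a new value to chunk. That is fine, but it does not change csv_chunks. You are only changing the value held by a separate variable, chunk.

To modify the values in csv_chunks, you can use a list comprehension to build a new list and then reassign it to the variable csv_chunks: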
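
The rebinding behaviour can be seen without pandas at all. The following is only a minimal sketch: items and scaled are hypothetical names, and a plain list stands in for the CSV chunks.

```python
# Hypothetical stand-in data: a plain list instead of the CSV chunks.
items = [1, 2, 3]

for item in items:
    # This rebinds the loop variable only; the list still holds the old values.
    item = item * 10

# A list comprehension builds a new list that actually keeps the new values.
scaled = [item * 10 for item in items]

print(items)
print(scaled)
```

After the loop, items is unchanged, while scaled holds the transformed values.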

csv_chunks = [chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]
              for chunk in csv_chunks]
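
For a version that can be run end to end, here is a rough sketch of the same approach. The in-memory CSV text, the comma delimiter, the chunk size, and the contents of acids are all made-up stand-ins for the values in the question, and the skiprows argument is left out because the sample text has no extra header lines. It may also help to know that with chunksize set, read_csv returns an iterator that can be consumed only once, so a for-loop over it before the concat leaves nothing for concat to read. That is likely what produced the error in the question.

```python
import io

import pandas

# Hypothetical stand-ins for the question's file contents and `acids`.
csv_text = "A,B\n1,x\n2,y\n3,x\n4,z\n"
acids = pandas.Series(["x1", "x3"])
wanted = acids.tolist()  # build the lookup list once, not once per chunk

# With chunksize set, read_csv returns a single-pass iterator of DataFrames.
reader = pandas.read_csv(
    io.StringIO(csv_text),
    sep=",",
    chunksize=2,
    converters={"A": str, "B": str},
)

# Filter each chunk as it is read and keep the filtered pieces in a list.
filtered = [chunk[(chunk["B"] + chunk["A"]).isin(wanted)] for chunk in reader]

# ignore_index=True is an optional choice here; it renumbers the rows 0..n-1.
df = pandas.concat(filtered, ignore_index=True)
print(df)
```

Chunks in which no row matches simply become empty DataFrames, and concat accepts those alongside the non-empty ones.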

Answered 2013-07-12T16:57:21.023