python - Equivalent of R's read.table in Python

Question

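I'm trying to move some of my processing work from R to Python. In R, I use read.table() to read really messy CSV files, and it automatically splits the records in the correct format. For example: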

391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>

<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>
","windows-7 printer hp"

is correctly split into 4 columns. A single record can span multiple lines, and there are commas everywhere. In R I simply do the following:

read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)

Is there something in Python that can do this equally well?

Thanks!

score 4 · Accepted Answer

You can use the csv module.

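from csv import reader

# newline="" lets the csv module handle line breaks inside quoted fields itself
with open("C:/text.txt", "r", newline="") as f:
    csv_reader = reader(f, quotechar="\"")
    for row in csv_reader:
        print(row)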

['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']

Length of the output row = 4
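The question also reads the file in chunks (nrows=chunksize). One way to get similar behaviour with the csv module is to pull a fixed number of parsed records at a time with itertools.islice. This is only a sketch: the helper name read_chunks and the SAMPLE data are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data: the first record's third field spans several lines.
SAMPLE = (
    '1,"first title","<p>line one</p>\n\n<p>line two, with a comma</p>\n","tag-a tag-b"\n'
    '2,"second title","<p>short</p>","tag-c"\n'
    '3,"third title","<p>also short</p>","tag-d"\n'
)

def read_chunks(handle, chunksize):
    """Yield lists of at most `chunksize` parsed records from an open file."""
    rows = csv.reader(handle, quotechar='"')
    while True:
        # islice counts parsed records, not physical lines,
        # so a multi-line record is never cut in half.
        chunk = list(islice(rows, chunksize))
        if not chunk:
            return
        yield chunk

# io.StringIO(..., newline="") mimics a file opened with newline="".
chunks = list(read_chunks(io.StringIO(SAMPLE, newline=""), chunksize=2))
for chunk in chunks:
    print(len(chunk), [len(row) for row in chunk])
```

With a real file, the same helper would be called on a handle opened with newline="", just like the reader above.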

score 3 · Accepted Answer

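The pandas module also provides many R-like functions and data structures, including read_csv. The advantage here is that the data is read in as a pandas DataFrame, which is easier to manipulate than standard Python lists or dicts (especially if you are used to R). Here is an example: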

>>> from pandas import read_csv
>>> ugly = read_csv("ugly.csv",header=None)
>>> ugly
        0                                              1  \
0  391788  HP Deskjet 3050 scanner always seems to break   

                                                   2                     3  
0  <p>I'm running a Windows 7 64 blah blah blah.....  windows-7 printer hp
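For the nrows=chunksize part of the question, read_csv accepts both an nrows and a chunksize argument. The sketch below assumes pandas is installed and uses made-up SAMPLE data via io.StringIO instead of a real file, so the contents are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample data: the first record's third field spans several lines.
SAMPLE = (
    '1,"first title","<p>line one</p>\n\n<p>line two, with a comma</p>\n","tag-a tag-b"\n'
    '2,"second title","<p>short</p>","tag-c"\n'
    '3,"third title","<p>also short</p>","tag-d"\n'
)

# nrows caps how many records are parsed, much like nrows= in read.table.
first_two = pd.read_csv(io.StringIO(SAMPLE), header=None, nrows=2)

# chunksize makes read_csv return an iterator of DataFrames
# instead of loading everything into one frame.
chunk_shapes = [
    chunk.shape
    for chunk in pd.read_csv(io.StringIO(SAMPLE), header=None, chunksize=2)
]

print(first_two.shape)
print(chunk_shapes)
```

Iterating with chunksize is usually the closer match when the whole file does not fit in memory, since each chunk can be processed and then discarded.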
