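I'm trying to move some of my processing work from R to Python. In R, I use read.table() to read very messy CSV files, and it automatically splits the records into the correct format. For example: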
391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>
<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>
","windows-7 printer hp"
is correctly split into 4 columns. A single record can span multiple lines, and there are commas everywhere. In R, I just do the following:
read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)
Is there anything in Python that can do this equally well?
Thanks!
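
For reference, here is a minimal sketch of the behavior I'm after, using Python's standard csv module. It assumes the quoted fields use plain double quotes, as in the record above. The sample string is a shortened stand-in for my data, and read_chunks is a hypothetical helper name that mimics nrows=chunksize; neither comes from my real files.

```python
import csv
import io
from itertools import islice

# Shortened stand-in for the record above: four fields, with commas and
# newlines inside the quoted third field, followed by a second record.
sample = (
    '391788,"HP Deskjet 3050 scanner always seems to break",'
    '"<p>First paragraph, with a comma.</p>\n'
    '<p>Update: second paragraph.</p>\n'
    '","windows-7 printer hp"\n'
    '391789,"Another title","<p>Body</p>","tag-a tag-b"\n'
)


def read_chunks(handle, chunksize):
    """Yield lists of up to `chunksize` parsed records (hypothetical helper)."""
    # Default dialect: comma delimiter, double-quote quoting, so newlines
    # inside quoted fields stay part of the same record.
    reader = csv.reader(handle)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        yield chunk


chunks = list(read_chunks(io.StringIO(sample), chunksize=1))
first_record = chunks[0][0]
print(len(first_record))  # number of columns in the first record
```

With a real file, the handle would presumably come from open(infile, newline=""), which is what the csv documentation recommends so that embedded newlines are passed through untouched.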