python - CSV reader incorrectly parses tabs after a quotation mark

Question

I am using the CSV reader in Python to read a TSV file. The code is:

import csv

f = csv.reader(open('sample.csv'), delimiter='\t')
for chunk in f:
   print(chunk)

One row of the tab-separated CSV file looks like this (the CSV is hosted here):

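doc	unit1_toks	unit2_toks	unit1_txt1	unit2_txt2	s1_toks	s2_toks	unit1_sent	unit2_sent	dir
GUM_bio_galois	156-160	161-170	" We zouden dan voorstellen	dat de auteur al zijn werk zou moeten publiceren	107-182	107-182	Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument is not sufficient . " [ 16 ]	Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument would then suggest that the author should publish the opinion . " [ 16 ]	1>2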

I get the following output (the CSV reader misses some of the tab separators):

['GUM_bio_galois', 
'156-160', 
'161-170', 
' We zouden dan voorstellen\tdat de auteur al zijn werk zou moeten publiceren\t107-182\t107-182\tPoisson declared Galois \' work  incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]', 
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]', 
'1>2']

I would like it to look like this:

['GUM_bio_galois', 
'156-160', 
'161-170', 
'" We zouden dan voorstellen',
'dat de auteur al zijn werk zou moeten publiceren',
'107-182',
'107-182',
'Poisson declared Galois \' work  incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]', 
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]', 
'1>2']

How can I get the CSV reader to handle unbalanced quotes and keep them in my output?

score 1 · Accepted Answer

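import csv

# QUOTE_NONE makes the reader treat " as an ordinary character,
# so only the tab delimiter splits fields.
with open('sample.csv', newline='') as f:
   rdr = csv.reader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
   header = next(rdr)  # consume the header row
   for line in rdr:
      print(line)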

Or use csv.DictReader:

import csv

# DictReader reads the header row itself and yields one dict per data row.
with open('sample.csv', newline='') as f:
   rdr = csv.DictReader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
   for line in rdr:
      print(line)
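To see why quoting=csv.QUOTE_NONE helps, here is a minimal sketch that parses one line with and without it. The row is a shortened, made-up stand-in for the sample in the question (not the real file), and parse is a hypothetical helper introduced only for this illustration. With the default settings, a field that begins with " is treated as quoted, so the tabs that follow are swallowed until the next " appears.

```python
import csv
import io

# Shortened, made-up stand-in for the row in the question: the third
# field opens a quote that is only "closed" inside a later field.
row = '\t'.join([
    'GUM_bio_galois', '156-160', '" We zouden', 'dat de auteur',
    '107-182', 'Galois \' work " incomprehensible " [ 16 ]', '1>2',
])

def parse(text, **kwargs):
    """Parse one tab-separated line and return its list of fields."""
    return next(csv.reader(io.StringIO(text), delimiter='\t', **kwargs))

default_fields = parse(row)                      # " starts a quoted field
raw_fields = parse(row, quoting=csv.QUOTE_NONE)  # " is ordinary text

print(len(default_fields))  # 4: several columns were merged into one
print(len(raw_fields))      # 7: one field per tab-separated column
print(raw_fields[2])        # " We zouden
```

The trade-off is that QUOTE_NONE disables quoting entirely, so a file that relies on quotes to protect tabs or newlines inside a field would be split at those characters.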
