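I am trying to write dictionaries containing UTF-8 strings to a CSV file. I am following the instructions here. However, despite carefully encoding and decoding these UTF-8 strings, I get a UnicodeEncodeError involving the 'ascii' codec.
I have a list of dictionaries whose values are strings and integers, each describing a change to a Wikipedia article. For example, the list below corresponds to two such changes: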
edgelist = [{'articleName': 'Barack Obama', 'editorName': 'Schonbrunn', 'revID': '121844749', 'bytesAdded': '183'},
{'articleName': 'Barack Obama', 'editorName': 'Eep\xc2\xb2', 'revID': '121862749', 'bytesAdded': '107'}]
The problem is edgelist[1]['editorName']. It has type 'str', and edgelist[1]['editorName'].decode('utf-8') is u'Eep\xb2'.
The code I am trying is:
import codecs
import csv

_ENCODING = 'utf-8'

def dictToCSV(edgelist,output_file):
    with codecs.open(output_file,'wb',encoding=_ENCODING) as f:
        w = csv.DictWriter(f,sorted(edgelist[0].keys()))
        w.writeheader()
        for d in edgelist:
            for k,v in d.items():
                if type(v) == int:
                    d[k]=str(v).encode(_ENCODING)
            w.writerow({k:v.decode(_ENCODING) for k,v in d.items()})
This returns:
    dictToCSV(edgelist,'test2.csv')
  File "csv_to_charts.py", line 129, in dictToCSV
    w.writerow({k:v.decode(_ENCODING,'ignore') for k,v in d.items()})
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 148, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 3: ordinal not in range(128)
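The message in the traceback can be reproduced without csv at all, which suggests the failure is an implicit conversion to ASCII rather than anything in my own decode call. A minimal sketch, assuming the csv writer ends up encoding the decoded value with the default 'ascii' codec (the names raw, text, message, and position are only for illustration):

```python
# UTF-8 bytes for "Eep" followed by a superscript two, as stored in the dict.
raw = b'Eep\xc2\xb2'

# Decoding gives text: 4 characters instead of 5 bytes.
text = raw.decode('utf-8')
assert text == u'Eep\xb2'

# Assumption: this explicit ASCII encode stands in for the implicit one
# that seems to happen inside csv when it receives decoded text.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    message = str(exc)      # same wording as the traceback
    position = exc.start    # index of the offending character
```

The failing position is 3, the index of the superscript two, which matches the position reported in the traceback.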
Other permutations, such as swapping decode for encode, or applying nothing at all in the last problematic line, also return errors:
w.writerow({k:v.encode(_ENCODING) for k,v in d.items()})
returns UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 56: ordinal not in range(128)
w.writerow({k:v for k,v in d.items()})
returns UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 56: ordinal not in range(128)
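The UnicodeDecodeError looks like the mirror image of the first error: somewhere the raw bytes are being decoded as ASCII. A minimal sketch of that, assuming an implicit ASCII decode happens when the byte string reaches the codecs-wrapped file (here the decode is made explicit, and the names raw, decode_message, and bad_position are only for illustration):

```python
# The same UTF-8 bytes that are stored under 'editorName'.
raw = b'Eep\xc2\xb2'

# Assumption: this explicit ASCII decode stands in for an implicit one
# applied to byte strings before they are re-encoded on write.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    decode_message = str(exc)   # same wording as the error, except the position
    bad_position = exc.start    # index of the first non-ASCII byte
```

Here the position is 3 because only the single value is decoded; the position 56 in my error presumably counts from the start of a longer string.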
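After this, I changed
with codecs.open(output_file,'wb',encoding=_ENCODING) as f:
to
with open(output_file,'wb') as f:
and still received the same error.
If I exclude the list element, or the key, that contains this problematic string, the script otherwise works fine.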