python - Trying to write a list of dictionaries to csv in Python, running into encoding issues

Question

So I'm running into an encoding problem when writing dictionaries to a csv file in Python.

Here is some sample code:

import csv

some_list = ['jalape\xc3\xb1o']

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        output_file.writerow([item])

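This works perfectly fine and gives me a csv file with "jalapeño" written in it.

However, when I create a list of dictionaries whose values contain such UTF-8 characters...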

import csv

some_list = [{'main': ['4 dried ancho chile peppers, stems, veins '
                       'and seeds removed']},
             {'main': ['2 jalape\xc3\xb1o peppers, seeded and chopped',
                       '1 dash salt']}]

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        output_file.writerow([item])

I just get a csv file with 2 rows containing the following entries:

{'main': ['4 dried ancho chile peppers, stems, veins and seeds removed']}
{'main': ['2 jalape\xc3\xb1o peppers, seeded and chopped', '1 dash salt']}

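I know my data is written in the correct encoding, but because the items aren't strings, csv.writer writes them out as-is. It's frustrating. I searched for similar questions here, and people have mentioned using csv.DictWriter, but that wouldn't really work well for me because my list of dictionaries doesn't have just the one key 'main'. Some also have other keys, such as 'toppings', 'crust', etc. Not only that, I'm still doing more work on them, and the eventual output will have the ingredients formatted as amount, unit, ingredient, so I'll end up with a list of dictionaries like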

[{'main': {'amount': ['4'], 'unit': [''], 
'ingredient': ['dried ancho chile peppers']}},
{'topping': {'amount': ['1'], 'unit': ['pump'], 
'ingredient': ['cool whip']}, 'filling': 
{'amount': ['2'], 'unit': ['cups'], 
'ingredient': ['strawberry jam']}}]

Seriously, any help would be greatly appreciated; otherwise I'll have to use find and replace in LibreOffice to fix all of those \x** UTF-8 encodings.

Thank you!

score 2 · Accepted Answer

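You are writing dictionaries to the CSV file, while .writerow() expects a list of single values, each of which is converted to a string when written.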

Don't write dictionaries; as you discovered, they are turned into their string representations.
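To see the effect in isolation, here is a minimal sketch. It assumes Python 3 and an in-memory buffer instead of a real file (the code in the question is Python 2), so treat it as an illustration rather than a drop-in replacement. csv.writer calls str() on any value that is not already a string, so the whole dictionary lands in a single cell:

```python
import csv
import io

item = {'main': ['1 dash salt']}

buf = io.StringIO()  # in-memory stand-in for the output file (assumption)
csv.writer(buf).writerow([item])

# The single cell holds str(item), i.e. the dictionary's repr.
written = buf.getvalue()
```

In Python 2, the repr of a byte string escapes every non-ASCII byte, which is where the \xc3\xb1 sequences in the question's output come from.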

You need to decide how to turn the keys and/or values of each dictionary into columns, where each column is a single primitive value.
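One possible way to make that decision for the nested amount/unit/ingredient structure from the question is sketched below. It assumes Python 3; the column layout in FIELDS and the helper name flatten are hypothetical choices made for illustration, not something the question prescribes.

```python
import csv
import io

# Sample data in the nested shape described in the question.
recipes = [
    {'main': {'amount': ['4'], 'unit': [''],
              'ingredient': ['dried ancho chile peppers']}},
    {'topping': {'amount': ['1'], 'unit': ['pump'],
                 'ingredient': ['cool whip']},
     'filling': {'amount': ['2'], 'unit': ['cups'],
                 'ingredient': ['strawberry jam']}},
]

FIELDS = ['section', 'amount', 'unit', 'ingredient']  # hypothetical column layout


def flatten(recipes):
    """Yield one flat dict per ingredient, pairing up the parallel lists."""
    for recipe in recipes:
        for section, parts in recipe.items():
            for amount, unit, ingredient in zip(
                    parts['amount'], parts['unit'], parts['ingredient']):
                yield {'section': section, 'amount': amount,
                       'unit': unit, 'ingredient': ingredient}


buf = io.StringIO()  # stand-in for open(path, 'w', newline='', encoding='utf-8')
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(flatten(recipes))
lines = buf.getvalue().splitlines()
```

Putting the dictionary key into its own 'section' column lets every row share the same four columns, regardless of which keys a given recipe happens to have.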

If, for example, you only want to write the 'main' key (if present), then do so:

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        if 'main' in item:
            output_file.writerow(item['main'])

This assumes that the value associated with the 'main' key is always a list of values.
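If other keys such as 'toppings' or 'crust' should be written too, the same idea could be extended roughly as follows. This is a sketch assuming Python 3; the 'crust' and 'toppings' values are made-up sample data, and putting the key in the first cell of each row is just one possible layout.

```python
import csv
import io

some_list = [
    {'main': ['2 jalapeño peppers, seeded and chopped', '1 dash salt']},
    {'crust': ['1 cup flour'], 'toppings': '1 pump cool whip'},  # made-up sample values
]

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf)
for item in some_list:
    for key, values in item.items():
        if isinstance(values, str):  # tolerate a bare string instead of a list
            values = [values]
        writer.writerow([key] + list(values))

# Read the text back to check that each value occupies its own cell.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
```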

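If you wanted to save dictionaries with Unicode values, then you are using the wrong tool. CSV is a flat data format, with just rows and primitive columns. Use a tool that can preserve the right amount of information instead.

For dictionaries with string keys, lists, numbers and unicode text, you can use JSON, or pickle if more complex and custom data types are involved. When using JSON, you do want to either decode byte strings to Python Unicode values, or always use UTF-8-encoded byte strings, or tell the json library how to handle string encoding for you with the encoding keyword: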

import json

with open('data.json', 'w') as jsonfile:
    json.dump(some_list, jsonfile, encoding='utf8')

This matters because JSON strings are always unicode values. The default value for encoding is utf8, but I added it here for clarity.

Loading the data again:

with open('data.json', 'r') as jsonfile:
    some_list = json.load(jsonfile)

Note that this will return unicode strings, not strings encoded to UTF8.
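For readers on Python 3, the same round trip can be sketched as below. There the encoding is chosen when the file is opened rather than passed to json.dump, and ensure_ascii=False keeps characters such as "ñ" readable in the file instead of writing \u escapes. The temporary directory is only an assumption to keep the example self-contained.

```python
import json
import os
import tempfile

some_list = [{'main': ['2 jalape\u00f1o peppers, seeded and chopped', '1 dash salt']}]

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'data.json')

# Python 3: the encoding belongs to open(), not to json.dump().
with open(path, 'w', encoding='utf-8') as jsonfile:
    json.dump(some_list, jsonfile, ensure_ascii=False)

with open(path, 'r', encoding='utf-8') as jsonfile:
    loaded = json.load(jsonfile)
```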

The pickle module works much the same way, but the data format is not human-readable:

import pickle

# store
with open('data.pickle', 'wb') as pfile:
    pickle.dump(some_list, pfile)

# load
with open('data.pickle', 'rb') as pfile:
    some_list = pickle.load(pfile)

pickle will return your data exactly the way you stored it. Byte strings remain byte strings, and unicode values are restored as unicode.
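A small in-memory check of that behaviour, assuming Python 3 (where byte strings are bytes and unicode values are str), might look like this:

```python
import pickle

data = {
    'raw': b'jalape\xc3\xb1o',   # UTF-8 encoded byte string
    'text': 'jalape\u00f1o',     # unicode text
}

# Round-trip through pickle without touching the file system.
restored = pickle.loads(pickle.dumps(data))
```

Both values come back with their original types, so any decoding of the byte string still has to be done explicitly.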

score 0 · Accepted Answer

As you can see in your output, you wrote a dictionary, so if you want its strings to be processed you have to write this instead:

import csv

some_list = [{'main': ['4 dried ancho chile peppers, stems, veins', '\xc2\xa0\xc2\xa0\xc2\xa0 and seeds removed']}, {'main': ['2 jalape\xc3\xb1o peppers, seeded and chopped', '1 dash salt']}]

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        output_file.writerow(item['main'])  # so instead of [item], we use item['main']

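I know this may not be the code you want, since it limits you to looking up the key 'main' for every item, but at least the values get processed now.

You may want to formulate what you want to do a bit better, because right now it isn't really clear (at least to me). For example, do you want a csv file that gives you main in the first cell and then 4 dried...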
