python - Converting a unicode string to its original format

Question

Possible duplicate:
Converting a Latin string to unicode in Python

After storing it in a file, I have a list in the following format:

list_example = [
         u"\u00cdndia, Tail\u00e2ndia &amp; Cingapura",
         u"Lines through the days 1 (Arabic) \u0633\u0637\u0648\u0631 \u0639\u0628\u0631 \u0627\u0644\u0623\u064a\u0627\u0645 1",
]

But the actual format of the strings in the list is:

actual_format = [
         "Índia, Tailândia & Cingapura ",
         "Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ "
]

How do I convert the strings in list_example into the strings shown in actual_format?

score 2 · Accepted Answer

Your question is a bit unclear to me. In any case, the following guidelines should help you solve your problem.

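If you define these strings in Python source code, you should:

know which character encoding your editor uses to save the source file (e.g. utf-8)
declare that encoding on the first line of the source file, e.g. # -*- coding: utf-8 -*-
define these strings as unicode objects: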

strings = [u"Índia, Tailândia & Cingapura ", u"Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ "]

(Note: in Python 3, string literals are unicode objects by default, so the u prefix is not needed. In Python 2, unicode strings have type unicode; in Python 3, unicode strings have type str.)
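To make the note above concrete, here is a minimal sketch, assuming Python 3 (the rest of this answer is written for Python 2). It shows that a \u escape in a string literal is only source-code notation for the same character, and that text (str) and encoded data (bytes) are separate types. The variable names are made up for illustration.

```python
# Python 3 sketch: an escape sequence and the literal character are the same string.
escaped = "\u00cdndia, Tail\u00e2ndia"
literal = "Índia, Tailândia"

same_text = escaped == literal        # the escapes are only notation in the source
text_type = type(literal).__name__    # text is `str` in Python 3

encoded = literal.encode("utf-8")     # encoding text produces `bytes`
byte_type = type(encoded).__name__

print(same_text, text_type, byte_type)
```

If the two forms compare equal, nothing needs converting inside the program. The difference only shows up in how the string is displayed (repr versus print) or how it was written to the file.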

Then, when you want to save these strings to a file, you should specify the character encoding explicitly:

with open('filename', 'wb') as f:  # binary mode, because we write encoded bytes
    s = u'\n'.join(strings)
    f.write(s.encode('utf-8'))

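Then, when you want to read these strings back from that file, you must again specify the character encoding explicitly so that the file contents are decoded correctly: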

with open('filename') as f:
    # decode each line and drop its trailing newline
    strings = [line.decode('utf-8').rstrip('\n') for line in f]
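For comparison, the same write-then-read round trip can be sketched in Python 3, where open() accepts an encoding argument and does the encoding and decoding itself. This is only an illustration under that assumption: the temporary path and the shortened sample strings are hypothetical, not taken from the question.

```python
import os
import tempfile

# Shortened sample data, for illustration only.
strings = ["Índia, Tailândia & Cingapura", "سطور عبر الأيام 1"]

# Hypothetical temporary file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "strings.txt")

# Writing: open() encodes the text to UTF-8 on the way out.
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(strings))

# Reading: open() decodes UTF-8 on the way in; strip the line terminator.
with open(path, encoding="utf-8") as f:
    restored = [line.rstrip("\n") for line in f]

round_trip_ok = restored == strings
print(round_trip_ok)
```

Naming the encoding on both sides is the point: relying on the platform default is how text tends to come back garbled.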

score 1 · Accepted Answer


actual_format = [x.decode('unicode-escape') for x in list_example]

answered 2012-05-25T11:10:18.353
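The one-liner above relies on Python 2 byte strings having a decode method. A rough Python 3 equivalent is sketched below, assuming the file really contains the escapes as literal text (a backslash, "u" and four hex digits) and the HTML entity &amp;. The helper name restore and the shortened sample data are hypothetical.

```python
import html

# Assumption: the stored text holds literal backslash-u escapes and HTML entities.
raw_items = [
    "\\u00cdndia, Tail\\u00e2ndia &amp; Cingapura",
    "(Arabic) \\u0633\\u0637\\u0648\\u0631",
]

def restore(text):
    # The input must be pure ASCII; encode() raises UnicodeEncodeError otherwise,
    # which avoids silently mangling text that is already decoded.
    decoded = text.encode("ascii").decode("unicode_escape")
    # html.unescape turns entities such as &amp; back into "&".
    return html.unescape(decoded)

restored_items = [restore(item) for item in raw_items]
print(restored_items)
```

If the strings are already proper unicode objects, as they are when defined with u"\u00cd..." literals in source code, no decoding is needed, and this step applies only to raw text read from the file.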
