python - Converting latin1 to UTF8 in Python

Question

In Python 2.7, how do I convert a latin1 string to UTF-8?

For example, I'm trying to convert é to utf-8.

>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
Ã©

The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9).
UTF-8 byte encoding: c3a9
Latin-1 byte encoding: e9

How do I get the UTF-8 encoded version of a latin1 string? Could someone give an example of how to convert é?

score 10 · Accepted Answer

To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:

>>> '\xe9'.decode('latin1')
u'\xe9'

Python uses \xab escapes for Unicode code points below \u00ff.

>>> '\xe9'.decode('latin1') == u'\u00e9'
True

The Latin-1 character above can be encoded to UTF-8 as:

>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'
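
The decode-then-encode chain above can be wrapped in a small function. This is a minimal sketch, not part of the original answer: the name latin1_to_utf8 is hypothetical, and the b'' literal is used so the same code runs on Python 2.7 and Python 3.

```python
def latin1_to_utf8(raw_bytes):
    """Re-encode Latin-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Step 1: bytes -> Unicode text, reading each byte as a Latin-1 character.
    text = raw_bytes.decode('latin1')
    # Step 2: Unicode text -> bytes, this time written out as UTF-8.
    return text.encode('utf-8')


# The single Latin-1 byte e9 becomes the two UTF-8 bytes c3 a9.
result = latin1_to_utf8(b'\xe9')
```

Going through Unicode in the middle is the key design choice: the bytes are never converted directly from one encoding to the other.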

score 3 · Accepted Answer

>>> u"é".encode('utf-8')
'\xc3\xa9'

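You have a UTF-8 encoded byte sequence. Do not try to print encoded bytes directly. To print them, you need to decode the encoded bytes back into a Unicode string.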

>>> u"é".encode('utf-8').decode('utf-8')
u'\xe9'
>>> print u"é".encode('utf-8').decode('utf-8')
é

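Note that encoding and decoding are opposite operations that effectively cancel out. You end up with the original u"é" string, although Python prints it as the equivalent u'\xe9'.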

>>> u"é" == u'\xe9'
True
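
One likely explanation for the Ã© in the question, assuming the asker's terminal interprets output bytes as Latin-1 (the thread does not confirm this), is that the two UTF-8 bytes are each displayed as a separate Latin-1 character. The sketch below reproduces that effect; the variable names are illustrative only.

```python
# UTF-8 encoding of é is two bytes: c3 a9.
utf8_bytes = u'\xe9'.encode('utf-8')

# Decoding those bytes with the wrong codec gives two characters,
# U+00C3 (Ã) and U+00A9 (©), which is the mojibake seen in the question.
mojibake = utf8_bytes.decode('latin1')

# Decoding with the matching codec restores the single character é.
restored = utf8_bytes.decode('utf-8')
```

The bytes are identical in both cases; only the codec used to interpret them differs.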

score 0 · Accepted Answer

concept = concept.encode('ascii', 'ignore')
concept = MySQLdb.escape_string(concept.decode('latin1').encode('utf8').rstrip())

I do it this way. I'm not sure whether it is a good approach, but it works every time!
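
One thing worth checking with this approach: encode('ascii', 'ignore') silently discards every non-ASCII character, so the later latin1-to-utf8 step has nothing left to convert. The sketch below assumes concept starts out as a Unicode string (the answer does not say) and leaves out the MySQLdb call; the sample value is made up.

```python
# Hypothetical input standing in for `concept`: "café" as a Unicode string.
text = u'caf\xe9'

# 'ignore' drops characters that ASCII cannot represent, so é disappears.
ascii_only = text.encode('ascii', 'ignore')

# Pure ASCII bytes are identical in Latin-1 and UTF-8, so this step
# leaves the data unchanged.
reencoded = ascii_only.decode('latin1').encode('utf8')
```

If the accented characters need to survive, the first encode step would have to be dropped.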
