python - Comparing strings and unicode in Python 2.7.5

Question

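I'd like to know why, when I create:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

gives me False?

This is really giving me a headache. It almost seems as if someone did this on purpose just to annoy people...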

score 15 · Accepted Answer

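Why is this?

In Python 2.x, you cannot directly compare a unicode object with a str that contains non-ASCII characters. Python 2 tries to decode the str with the default ASCII codec; that decode fails, so instead of raising an error it emits this warning and treats the two values as unequal: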

Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

This does not happen in Python 3.x, because all strings are unicode objects.
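A minimal sketch of that statement, to be run under Python 3 rather than 2.7: the membership tests from the question now agree. The closest remaining analogue of the original trap is mixing bytes with str, which simply never compare equal, and Python 3 does not attempt any implicit ASCII decoding.

```python
# Python 3: every plain string literal is Unicode text (type str).
a = ['k', 'ę', 'ą']

# The tests from the question now behave consistently.
print('k' in a)   # True
print('ę' in a)   # True

# The Python 3 counterpart of a Python 2 byte string is bytes.
# bytes and str never compare equal, even for pure ASCII content.
print(b'k' in a)                 # False
print('ę'.encode('utf-8') in a)  # False
```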

Solution?

You can make the string a unicode literal:

>>> u'ę' in a
True

Now you are comparing two unicode objects instead of a unicode object with a str.
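The same idea carries over to Python 3, where the byte-string type is bytes: decode it to text first, then compare text with text. This is a sketch that assumes the bytes are UTF-8; to_text is a hypothetical helper, not part of any library.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding bytes with the given encoding.

    Hypothetical helper: the encoding must match the one that actually
    produced the bytes, otherwise decoding fails or yields wrong text.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


a = ['k', 'ę', 'ą']
raw = 'ę'.encode('utf-8')   # b'\xc4\x99'

print(raw in a)             # False: bytes compared with str
print(to_text(raw) in a)    # True: str compared with str
```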

Alternatively, convert both to the same encoding, for example UTF-8, before comparing:

>>> c = u"ç"
>>> u'ç'.encode('utf-8') == c.encode('utf-8')
True
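A Python 3 sketch of this encode-then-compare approach. The helper same_text_utf8 is hypothetical and assumes that any bytes it receives are already UTF-8; bytes in another encoding (ISO-8859-2 in the second call) compare unequal even though they represent the same character.

```python
def same_text_utf8(x, y):
    """Compare two values after bringing both to UTF-8 bytes.

    Hypothetical helper: str arguments are encoded to UTF-8, while bytes
    arguments are assumed to be UTF-8 already and are used unchanged.
    """
    def as_utf8(v):
        return v.encode('utf-8') if isinstance(v, str) else v
    return as_utf8(x) == as_utf8(y)


c = 'ę'
print(same_text_utf8(c, 'ę'.encode('utf-8')))       # True
print(same_text_utf8(c, 'ę'.encode('iso-8859-2')))  # False: different encoding
```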

Also, to use non-ASCII characters in a Python 2 source file, you must declare the encoding at the top of the file:

# -*- coding: utf-8 -*-

#the whole program
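The declaration matters because it tells the interpreter how to decode the file's bytes into text. Python 2 assumes ASCII when there is no declaration and rejects non-ASCII bytes with a SyntaxError; Python 3 assumes UTF-8. The Python 3 sketch below compiles the same UTF-8 bytes once under a correct utf-8 declaration and once under a wrong cp1252 one. BODY and load_word are hypothetical names used only for this illustration.

```python
# UTF-8 bytes of a one-line module that defines a string literal.
BODY = "word = 'ę'\n".encode('utf-8')


def load_word(cookie):
    """Compile BODY under the given coding declaration and return `word`.

    Hypothetical helper: compile() decodes bytes source according to the
    coding declaration on its first line (PEP 263).
    """
    header = ('# -*- coding: %s -*-\n' % cookie).encode('ascii')
    namespace = {}
    exec(compile(header + BODY, '<demo>', 'exec'), namespace)
    return namespace['word']


print(load_word('utf-8') == 'ę')    # True: declaration matches the bytes
print(load_word('cp1252') == 'Ä™')  # True: same bytes, wrong declaration
```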

Hope this helps!

score 4 · Accepted Answer

You need to make the string explicitly unicode. Here is an example, including the warning you get when you don't:

>>> a = [u'k',u'ę',u'ą']
>>> 'k' in a
True
>>> 'ę' in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> u'ę' in a
True

score 1 · Accepted Answer

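u'ę' is a unicode object, while 'ę' is a str whose bytes depend on your current locale. Depending on the locale, the two are sometimes the same and sometimes not.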
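To make the locale dependence concrete: the bytes stored in a Python 2 str literal are whatever encoding your terminal or source file used. This Python 3 sketch encodes the question's character, U+0119, under three encodings; UTF-8, ISO-8859-2 and cp1250 are assumed examples, chosen because they are common for Polish text.

```python
# The character from the question: U+0119 LATIN SMALL LETTER E WITH OGONEK.
ch = '\u0119'

# Its byte representation depends entirely on the encoding in use.
for encoding in ('utf-8', 'iso-8859-2', 'cp1250'):
    print(encoding, ch.encode(encoding))
# utf-8 b'\xc4\x99'
# iso-8859-2 b'\xea'
# cp1250 b'\xea'
```

By contrast, 'k' encodes to the single byte b'k' in all three encodings, so Python 2's implicit ASCII decode succeeds for it and the comparison works.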

One of the advantages of Python 3 is that all text is unicode, so this particular problem goes away.

score 0 · Accepted Answer

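Make sure you declare the source encoding and put a u in front of unicode literals.

This works in both Python 3 (3.3 and later, which accept the u prefix again) and Python 2: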

#!/usr/bin/python
# -*- coding: utf-8 -*-

a = [u'k',u'ę',u'ą']

print(u'ę' in a)
# True
