python - Comparing strings not working

Question

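I have a list of article titles that is stored in a text file and loaded into a list. I'm trying to compare the current title with all the titles in that list, like so: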

def duplicate(entry):
    for line in posted_titles:
        print 'Comparing'
        print entry.title
        print line
        if line.lower() == entry.title.lower():
            print 'found duplicate'
            return True
    return False

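My problem is that this never returns True. Even when it prints out identical strings for entry.title and line, it doesn't flag them as equal. Is there a string comparison method or something else I should be using?

Edit: after looking at the representation of the strings with repr(line), the strings being compared look like this: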

u"Some Article Title About Things And Stuff - Publisher Name"
'Some Article Title About Things And Stuff - Publisher Name'

score 1 · Accepted Answer

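It would be more helpful if you could provide an actual example.

Anyway, your problem is the different string encodings in Python 2. entry.title is apparently a unicode string (indicated by the u before the quotes), while line is a normal str (or vice versa).

For all characters that are represented identically in both formats (ASCII characters and possibly more), the equality comparison will succeed. For other characters, it won't: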

>>> 'Ä' == u'Ä'
False

When comparing in the opposite order, IDLE actually gives a warning here:

>>> u'Ä' == 'Ä'
Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
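
The warning hints at what happens underneath: before comparing, Python 2 tries to convert the byte string to unicode with its default ASCII codec, and that conversion fails for non-ASCII bytes. Here is a minimal sketch of that idea. The helper name decodes_as_ascii is made up for illustration, and the code is written so that it also runs on Python 3:

```python
# -*- coding: utf-8 -*-
text = u"\xc4"  # the character Ä as a unicode string

# The same character is stored as different bytes depending on the encoding.
latin1_bytes = text.encode("latin1")  # one byte: 0xC4
utf8_bytes = text.encode("utf-8")     # two bytes: 0xC3 0x84


def decodes_as_ascii(raw):
    """Return True if the byte string raw can be decoded with the ASCII codec."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


ascii_ok = decodes_as_ascii(u"Title".encode("ascii"))  # plain ASCII bytes decode fine
latin1_ok = decodes_as_ascii(latin1_bytes)             # non-ASCII byte: decoding fails
utf8_ok = decodes_as_ascii(utf8_bytes)                 # non-ASCII bytes: decoding fails
```

Pure-ASCII titles decode without trouble, which is why only titles containing other characters are affected by this.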

You can get a unicode string from a normal string by using str.decode and supplying the original encoding. For example, latin1 in my IDLE:

>>> 'Ä'.decode('latin1')
u'\xc4'
>>> 'Ä'.decode('latin1') == u'Ä'
True

If you know it is utf-8, you can specify that as well. For example, the following file, saved as utf-8, will also print True:

# -*- coding: utf-8 -*-
print('Ä'.decode('utf-8') == u'Ä')
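
Applied to the function from the question, one way to do this is to convert both sides to unicode before comparing. This is only a sketch: the names to_text and is_duplicate are hypothetical, and the utf-8 default is an assumption about how the titles file was saved, so substitute the real encoding if it differs. The code runs on both Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
text_type = type(u"")


def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding byte strings with encoding."""
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return text_type(value)


def is_duplicate(title, posted_titles, encoding="utf-8"):
    """Case-insensitively check whether title already appears in posted_titles."""
    wanted = to_text(title, encoding).lower()
    return any(to_text(line, encoding).lower() == wanted
               for line in posted_titles)
```

In the question's code, this would be called as is_duplicate(entry.title, posted_titles). Decoding once when the file is loaded, rather than on every comparison, would work equally well.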

score 0 · Accepted Answer

== is fine for string comparison. Make sure you are dealing with strings:

if str(line).lower() == str(entry.title).lower():

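Another possible syntax is the boolean expression str1 is str2, but note that is tests whether both names refer to the same object rather than whether the text is equal, so it is not a reliable substitute for ==.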
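
A word of caution on the is form: it checks object identity, not content, so two strings with the same text can still be different objects. A small illustration with made-up variable names:

```python
first = "some article title"
# Same text, but assembled at runtime, so it is usually a separate object.
second = "".join(["some ", "article ", "title"])

same_value = first == second   # compares the contents of the two strings
same_object = first is second  # compares object identity only

print(same_value, same_object)
```

Whether same_object comes out True depends on interpreter details such as string interning, which is why == is the safer choice for comparing titles.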
