python - Urdu strings look identical but compare as unequal in Python 3

Question

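In my application I have a text file containing a list of (Urdu) words (currently just a single word).

I also have another text file containing Urdu strings (currently also a single word, and exactly the same one).

Now I need to find out whether any string in the strings file contains any of the words in the words file. To do this, I read both files into lists like this: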

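import codecs

# Assumed values; the original post does not show how these were set.
mode = 'r'
encoding = 'utf-8'

# Read the text file of strings...
fileToRead = codecs.open('string.txt', mode, encoding=encoding)
fileData = fileToRead.read()
lstFileData = fileData.split('\n')

# Read the text file of words...
wordListToRead = codecs.open('words.txt', mode, encoding=encoding)
wordData = wordListToRead.read()
lstWords = wordData.split('\n')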

Then I simply iterate over the list like this:

for string in lstFileData:
    if string in lstWords:
        pass  # do further work

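It doesn't work and I don't know why, even though the string is 'فلسفے' and lstWords contains that string. Do I need to add some encoding? Any help would be greatly appreciated.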

score 1 · Accepted Answer

I just tried this in Python 3 and it seems to work for me:

lstWords = ['a', 'فلسفے', 'b']
string = 'فلسفے'
if string in lstWords:
    print("yes")

Edit: I tested your updated code again, this time with file I/O, and it works fine (I did not specify an encoding). Here is a working link: https://trinket.io/python3/3890d8b261
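
If literals compare equal but strings read from a file do not, one way to look for the difference is to print the code points of each string. This is only a minimal sketch: it assumes, for illustration, that the first line of the file picked up an invisible U+FEFF character, and the helper name `describe` is made up here.

```python
def describe(s):
    """Return the code points of s so invisible characters become visible."""
    return [f"U+{ord(ch):04X}" for ch in s]

clean = 'فلسفے'
# Assumed scenario: a BOM-prefixed first line decoded as plain utf-8.
with_bom = '\ufeff' + clean

print(clean == with_bom)      # False, although both look the same on screen
print(describe(with_bom)[0])  # U+FEFF
print(ascii(with_bom))        # escapes every non-ASCII character
```

Running `describe` (or the built-in `ascii`) on an entry from each list should show whether something like a BOM or a trailing '\r' is hiding in one of them.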

score 0 · Accepted Answer

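This might help someone like me.

Funny as it sounds, the problem was the file encoding type. I had opened the file in plain Notepad to make some changes and saved it. That changed my file from utf-8 to utf-8 BOM, and my code stopped working. Once I created a new file in Notepad++ as utf-8, the same code started working. (The problem was not in the code but in the file encoding.)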
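
For anyone who cannot re-save the file, a possible workaround is to let Python drop the BOM while reading. The sketch below is not the code from the question: it writes a throwaway file to a temporary directory (the path is an assumption for the demo) and compares the standard `utf-8` and `utf-8-sig` codecs.

```python
import codecs
import os
import tempfile

word = 'فلسفے'

# Write a file the way a "UTF-8 with BOM" save would: BOM bytes first.
path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with open(path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + word.encode('utf-8') + b'\n')

# Plain utf-8 keeps the BOM as the character U+FEFF in the first entry.
with open(path, encoding='utf-8') as f:
    plain = f.read().splitlines()

# utf-8-sig removes a leading BOM if one is present.
with open(path, encoding='utf-8-sig') as f:
    stripped = f.read().splitlines()

print(word in plain)     # False: the first entry is '\ufeff' + word
print(word in stripped)  # True
```

`utf-8-sig` also reads files without a BOM, so it can be used for both kinds of file. Using `splitlines()` instead of `split('\n')` additionally avoids an empty last entry and stray '\r' characters from Windows line endings.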
