python - Iterating over a Python list

Question

I have a UTF-8 encoded Unicode text file (non-English), shown below.

[Image: the Unicode text file]

So I declared the encoding as UTF-8 in my Python script and read the file into Python.

# -*- coding: utf-8 -*-
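Note that this coding line only declares the encoding of the script's own source code; it does not control how a data file is decoded when it is read. A minimal sketch of decoding the file explicitly follows. The helper name `read_text` is hypothetical and not part of the original script, and the choice of `utf-8-sig` assumes the file might start with a byte order mark.

```python
import io


def read_text(path):
    # Hypothetical helper: io.open decodes bytes to Unicode text
    # on both Python 2 and Python 3.
    # "utf-8-sig" reads UTF-8 and drops a leading byte order mark if present.
    with io.open(path, encoding="utf-8-sig") as handle:
        return handle.read()
```

Decoding at read time means every later comparison works on Unicode strings rather than raw bytes.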

I split the text into sentences on “。” and got a list of sentences.

[Image: the list of sentences]

Now I need to compare these against another list of Unicode words and find out whether each sentence contains any of those words.

Here is my code, but it only shows the first match.

for sentence in sentences:
    for word in sentence.split(" "):
        if word in pronouns:
            print sentence

Edit:

In the end I noticed that the source text file contained invalid Unicode characters. This is described in "Tokenizing unicode using nltk".
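The question does not say which characters were invalid. Assuming they were invisible format characters, such as a byte order mark or a zero-width space attached to a word, a minimal sketch of cleaning each token before comparison could look like this. The function name `clean_token` is hypothetical, and stripping every format character is an assumption: some scripts use zero-width joiners meaningfully, so the same cleaning should be applied to both the sentences and the word list.

```python
import unicodedata


def clean_token(token):
    # Normalize to NFC so equivalent character sequences compare equal.
    token = unicodedata.normalize("NFC", token)
    # Drop format characters (Unicode category "Cf"), which include
    # U+FEFF (byte order mark) and U+200B (zero-width space).
    return u"".join(ch for ch in token if unicodedata.category(ch) != "Cf")
```

With this in place, a word that looks identical on screen but carries a hidden character will match its entry in the word list.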

score 2 · Accepted Answer

I tried to reproduce your problem, but I got the expected result, so the issue may lie in the encoding or in your pronouns list.

pronouns = ['aa','bb','cc']

sentences = ['aa dkdje asdf aesr','bb asersada','cc ase aser sa sa c ','aa saef sf se s', 'aa','bb']

for sentence in sentences:
    for word in sentence.split(" "):
        if word in pronouns:
            print (sentence)

The output of the code is:

aa dkdje asdf aesr
bb asersada
cc ase aser sa sa c 
aa saef sf se s
aa
bb
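One thing to watch with the nested loop: it prints a sentence once for every matching word, so a sentence containing two pronouns appears twice. A minimal sketch that reports each sentence at most once is below. The helper name `sentences_with_any` and the sample data are hypothetical, chosen only for illustration.

```python
def sentences_with_any(sentences, words):
    # A set makes each membership check fast and removes duplicate words.
    wanted = set(words)
    matches = []
    for sentence in sentences:
        # split() with no argument splits on any run of whitespace
        # and never yields empty strings.
        if wanted.intersection(sentence.split()):
            matches.append(sentence)
    return matches


pronouns = ['aa', 'bb', 'cc']
sentences = ['aa dkdje asdf aesr', 'xx yy', 'bb aa']
for sentence in sentences_with_any(sentences, pronouns):
    print(sentence)
```

Here 'bb aa' contains two of the words but is reported only once, and 'xx yy' is skipped.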

Hope this helps.
