python - Python IF statement with nltk.wordnet.synsets

Question

import nltk
from nltk import *
from nltk.corpus import wordnet as wn

output=[]
wordlist=[]

entries = nltk.corpus.cmudict.entries()

for entry in entries[:200]: # build a list of words without the pronunciations, since pos_tag only works on a list
    wordlist.append(entry[0])

for word in nltk.pos_tag(wordlist): #create a list of nouns
    if(word[1]=='NN'):
        output.append(word[0])

for word in output:
    x = wn.synsets(word) # remove all words that do not have synsets (this is the problem)
    if len(x)<1:
        output.remove(word)

for word in output[:200]:
    print (word," ",len(wn.synsets(word)))

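I am trying to remove all words that have no synsets, but for some reason it isn't working. After running the program, I find that even when a word has len(wn.synsets(word)) = 0, it is not removed from my list. Can someone tell me what is going wrong?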

score 5 · Accepted Answer

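You can't safely iterate over a list and remove the current item from it at the same time. Here is a toy example that demonstrates the problem: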

In [73]: output = list(range(10))

In [74]: for item in output:
   ....:     output.remove(item)

You might expect this to remove every item from output. But half of them are still there:

In [75]: output
Out[75]: [1, 3, 5, 7, 9]

Why you can't loop over a list and remove from it at the same time:

Imagine that Python uses an internal counter to remember the index of the current item as it works through the for-loop.

When the counter equals 0 (the first pass through the loop), Python executes

output.remove(item)

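Fine. There is now one fewer item in output. But Python then increments the counter to 1, so the next value of the loop variable is output[1], which is the third item in the original list.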

0  <-- first item removed
1  <-- the new output[0] ** THIS ONE GETS SKIPPED **
2  <-- the new output[1] -- gets removed on the next iteration
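
The counter model described above can be sketched with an explicit index. This is only an illustration of the mental model, not how CPython is actually implemented; the function name simulate_remove_while_iterating is made up for this example.

```python
def simulate_remove_while_iterating(items):
    """Mimic a for-loop that removes the current item, using an explicit index."""
    items = list(items)   # work on a private copy of the input
    visited = []          # items the loop body actually saw
    index = 0             # the "internal counter" from the explanation
    while index < len(items):
        current = items[index]
        visited.append(current)
        items.remove(current)  # the list shrinks, shifting later items left
        index += 1             # the counter still advances, so one item is skipped
    return visited, items


visited, remaining = simulate_remove_while_iterating(range(10))
print(visited)    # [0, 2, 4, 6, 8]
print(remaining)  # [1, 3, 5, 7, 9]
```

Every second item is never visited, which is why the odd numbers survive in the toy example.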

(Workaround) Solution:

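Instead, either iterate over a copy of output, or build a new list. In this case, I think building a new list is more efficient, since every output.remove(word) call has to search the list again: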

new_output = []
for word in output:
    x = wn.synsets(word) 
    if len(x)>=1:
        new_output.append(word)
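
For comparison, here is a rough sketch of the other option (iterating over a copy) next to a list-comprehension form of the new-list approach. To keep it runnable without the WordNet corpus, it assumes a hypothetical has_synsets helper and a made-up KNOWN_WORDS set standing in for len(wn.synsets(word)) >= 1; neither is part of NLTK.

```python
# Hypothetical stand-in for `len(wn.synsets(word)) >= 1`,
# so this sketch runs without downloading the WordNet corpus.
KNOWN_WORDS = {"abbey", "abdomen"}


def has_synsets(word):
    return word in KNOWN_WORDS


# Option 1: iterate over a copy (output[:]) and remove from the original.
output = ["aaron", "abbey", "abbott", "abdomen"]
for word in output[:]:
    if not has_synsets(word):
        output.remove(word)

# Option 2: build a new list with a list comprehension.
words = ["aaron", "abbey", "abbott", "abdomen"]
filtered = [word for word in words if has_synsets(word)]

print(output)    # ['abbey', 'abdomen']
print(filtered)  # ['abbey', 'abdomen']
```

With the real data, the condition in either option would be len(wn.synsets(word)) >= 1, as in the loop above.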
