python-2.7 - Why does this code raise an IndexError in Python when synsets for the word exist?

Question

I don't understand why I'm getting this error. Please help.

>>> mylist = []
>>> file1 = open("medDict.txt", "r")
>>> for line in file1:
    from nltk.corpus import wordnet
    print line
    wordFromList2 = wordnet.synsets(line)[0]
    mylist.append(wordFromList2)


abnormal


Traceback (most recent call last):
  File "<pyshell#10>", line 4, in <module>
    wordFromList2 = wordnet.synsets(line)[0]
IndexError: list index out of range

medDict.txt contains the following words:

abnormal
acne
ache
diarrhea
fever

score 1 · Accepted Answer

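@Blender is right about wordnet.synsets(). If you need to look up a synset whose name contains a natural-language space, WordNet uses an underscore (_) in place of the space. For example, to find something like kick the bucket through the NLTK WordNet interface, call wn.synsets("kick_the_bucket"):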

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('kick the bucket')
[]
>>> wn.synsets('kick_the_bucket')
[Synset('die.v.01')]

However, note that WordNet sometimes encodes a synset with a dash rather than an underscore. For example, 9-11 can be looked up but 9_11 cannot:

>>> wn.synsets('9-11')
[Synset('9/11.n.01')]
>>> wn.synsets('9_11')
[]
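Since the separator is not consistent, one option is to generate both spellings and try them in turn. This is only a rough sketch: lookup_forms is a name I made up for illustration, it is not part of NLTK, and it does nothing except build strings.

```python
def lookup_forms(phrase):
    """Return candidate lookup strings for a phrase (hypothetical helper, not an NLTK function)."""
    # split() with no argument trims the ends and collapses runs of whitespace,
    # so the trailing "\n" from a file line disappears here as well.
    underscored = "_".join(phrase.split())
    forms = [underscored] if underscored else []
    # Some entries use a dash where others use an underscore, so offer that variant too.
    dashed = underscored.replace("_", "-")
    if dashed != underscored:
        forms.append(dashed)
    return forms
```

Each candidate could then be passed to wn.synsets() in order, stopping at the first call that returns a non-empty list.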

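Now, on to the problems in your code.

1. When you read a file line by line, you also read the invisible but present \n at the end of each line. So you need to change this: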

>>> mylist = []
>>> file1 = open("medDict.txt", "r")

to this:

>>> words_from_file = [i.strip() for i in open("medDict.txt", "r")]
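If the file may contain blank lines, or you want the file handle closed explicitly, a slightly longer variant might look like this. The function name read_words is my own, and I am assuming one word per line as in your medDict.txt.

```python
def read_words(path):
    """Return the stripped, non-empty lines of the text file at `path`."""
    # The with-block closes the file even if reading fails part-way through.
    with open(path, "r") as handle:
        stripped = [line.strip() for line in handle]
    # Drop lines that were empty or contained only whitespace.
    return [word for word in stripped if word]
```

With that, words_from_file = read_words("medDict.txt") should give the same list as the one-liner, minus any empty strings.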

2. I'm not sure you really want wordnet.synsets(word)[0]: that takes only the first sense, and note that it may not be the Most Frequent Sense (MFS). So instead of doing this:

>>> wordFromList2 = wordnet.synsets(line)[0]
>>> mylist.append(wordFromList2)

I think a more appropriate approach is to use a set and update it:

>>> list_of_synsets = set()
>>> for i in words_from_file:
...     list_of_synsets.update(wordnet.synsets(i))
...
>>> print list_of_synsets
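It can also help to keep track of which words returned nothing, since those are exactly the ones that would have raised the IndexError. A minimal sketch, assuming you pass the lookup function in yourself (collect_synsets and its lookup parameter are illustrative names, not NLTK API):

```python
def collect_synsets(words, lookup):
    """Gather every result of lookup(word) into a set and list the words that had none.

    `lookup` is any callable returning a list for a word; in this thread it
    would be wordnet.synsets.
    """
    found = set()
    missing = []
    for word in words:
        results = lookup(word)
        if results:
            found.update(results)
        else:
            # An empty list is what makes [0] fail, so record the word instead.
            missing.append(word)
    return found, missing
```

You would call it as collect_synsets(words_from_file, wordnet.synsets) and then inspect the second return value to see which entries of the file WordNet does not know.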

score 0 · Accepted Answer

wordnet.synsets() is whitespace-sensitive:

>>> wordnet.synsets('abnormal')
    [Synset('abnormal.a.01'), Synset('abnormal.a.02'), Synset('abnormal.s.03')]
>>> wordnet.synsets(' abnormal')
    []

.strip() the whitespace from your line before passing it in:

wordFromList2 = wordnet.synsets(line.strip())[0]
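Even after stripping, a word that WordNet does not contain still gives back an empty list, and [][0] raises the same IndexError. If you want to guard against that, something along these lines might do; first_synset is a hypothetical helper and lookup stands in for wordnet.synsets:

```python
def first_synset(word, lookup):
    """Return the first result of lookup() for the stripped word, or None if there is none."""
    results = lookup(word.strip())
    # Indexing an empty list is what raises IndexError, so check before taking [0].
    return results[0] if results else None
```

In the loop from the question that would read wordFromList2 = first_synset(line, wordnet.synsets), followed by a check for None before appending.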
