
I don't understand why I'm getting this error. Please help.

>>> mylist = []
>>> file1 = open("medDict.txt", "r")
>>> for line in file1:
    from nltk.corpus import wordnet
    print line
    wordFromList2 = wordnet.synsets(line)[0]
    mylist.append(wordFromList2)


abnormal


Traceback (most recent call last):
  File "<pyshell#10>", line 4, in <module>
    wordFromList2 = wordnet.synsets(line)[0]
IndexError: list index out of range

medDict.txt contains the following words:

abnormal
acne
ache
diarrhea
fever

2 Answers


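Following up on @Blender's point about wordnet.synsets(): if you need to look up a synset whose name contains a natural-language space, WordNet uses an underscore _ in place of the space. For example, to find something like kick the bucket through the NLTK WordNet interface, use wn.synsets("kick_the_bucket"):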

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('kick the bucket')
[]
>>> wn.synsets('kick_the_bucket')
[Synset('die.v.01')]

However, note that WordNet sometimes encodes a synset with a dash instead of an underscore. For example, 9-11 can be looked up, but 9_11 cannot:

>>> wn.synsets('9-11')
[Synset('9/11.n.01')]
>>> wn.synsets('9_11')
[]
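Since a multi-word entry may be stored with either separator, one option is to generate both spellings and try each in turn. The following is only a minimal sketch: candidate_lemmas is a hypothetical helper name, not part of NLTK, and it only builds the strings; each candidate would still need to be passed to wn.synsets().

```python
def candidate_lemmas(phrase):
    """Return underscore- and dash-joined spellings of a phrase.

    Hypothetical helper (not an NLTK function): it only builds lookup
    strings; it does not query WordNet itself.
    """
    words = phrase.split()  # split() also discards leading/trailing whitespace
    candidates = []
    for separator in ("_", "-"):
        joined = separator.join(words)
        if joined not in candidates:  # single words yield one candidate
            candidates.append(joined)
    return candidates


print(candidate_lemmas("kick the bucket"))
print(candidate_lemmas("abnormal\n"))
```

With NLTK installed, the candidates could be tried one after another until wn.synsets(candidate) returns a non-empty list.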

Now, on to the problem in your code.

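1. When you read a file line by line, each line also carries the invisible trailing newline \n. So wordnet.synsets() is asked for 'abnormal\n', finds nothing, and indexing the empty list with [0] raises the IndexError. You need to change this: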

>>> mylist = []
>>> file1 = open("medDict.txt", "r")

to this:

>>> words_from_file = [i.strip() for i in open("medDict.txt", "r")]

2. I'm not sure you really want wordnet.synsets(word)[0]: that takes only the first sense, and note that it may not be the Most Frequent Sense (MFS). So instead of this:

>>> wordFromList2 = wordnet.synsets(line)[0]
>>> mylist.append(wordFromList2)

I think a more appropriate approach is to use a set and update it with every synset of each word:

>>> list_of_synsets = set()
>>> for i in words_from_file:
...     list_of_synsets.update(wordnet.synsets(i))
...
>>> print(list_of_synsets)
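The two points above can be checked without NLTK or the real file. In this sketch, io.StringIO stands in for open("medDict.txt"), and a small dictionary stands in for wordnet.synsets; both stand-ins, the name collect_synsets, and the synset names in the dictionary are assumptions made for illustration.

```python
import io


def collect_synsets(lines, lookup):
    """Union of lookup(word) over every non-blank, stripped line."""
    collected = set()
    for line in lines:
        word = line.strip()  # drop the trailing "\n" and stray spaces
        if word:
            collected.update(lookup(word))
    return collected


# Stand-in for open("medDict.txt"): iterating yields lines that still end in "\n".
fake_file = io.StringIO("abnormal\nacne\n")
raw_lines = list(fake_file)
print(repr(raw_lines[0]))  # 'abnormal\n'

# Hypothetical stub in place of wordnet.synsets so the sketch runs on its own.
stub = {"abnormal": ["abnormal.a.01", "abnormal.a.02"], "acne": ["acne.n.01"]}
result = collect_synsets(raw_lines, lambda word: stub.get(word, []))
print(sorted(result))
```

With NLTK available, the call would presumably become collect_synsets(open("medDict.txt"), wordnet.synsets).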
answered 2013-03-31T14:23:46.353

wordnet.synsets() is whitespace-sensitive:

>>> wordnet.synsets('abnormal')
    [Synset('abnormal.a.01'), Synset('abnormal.a.02'), Synset('abnormal.s.03')]
>>> wordnet.synsets(' abnormal')
    []

.strip() the whitespace from your line before passing it in:

wordFromList2 = wordnet.synsets(line.strip())[0]
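Stripping fixes the lookup for words WordNet knows, but a word with no entry still returns an empty list, and [0] on it would raise the same IndexError. A minimal sketch of a guard follows; first_synsets is a hypothetical helper, and lookup is meant to be wordnet.synsets (a stub is used in the example call so the block runs without NLTK).

```python
def first_synsets(lines, lookup):
    """Return (found, missing): the first synset of each word, plus words with none.

    Hypothetical helper; `lookup` is expected to behave like wordnet.synsets,
    i.e. to return a possibly empty list for a given word.
    """
    found, missing = [], []
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        synsets = lookup(word)
        if synsets:
            found.append(synsets[0])
        else:
            missing.append(word)  # no entry: record it instead of indexing []
    return found, missing


# Stub standing in for wordnet.synsets (an assumption for this example).
stub = {"abnormal": ["abnormal.a.01", "abnormal.a.02"], "fever": ["fever.n.01"]}
found, missing = first_synsets(
    ["abnormal\n", "\n", "zzz\n", " fever \n"], lambda word: stub.get(word, [])
)
print(found, missing)
```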
answered 2013-03-31T08:45:46.683