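I am working on word sense disambiguation using WordNet and NLTK. I am interested in all words that have to do with sound. I have a list of such words, and "roll" is one of them. I check whether any of my sentences contains one of these words (I also check it against the POS). If it does, I want to keep only the sentences in which the word is used in its sound-related sense. In the example below, that would be the second sentence. My current idea is to keep only the senses whose definition contains the word "sound", as in "the sound of a drum (especially a snare drum) beaten rapidly and continuously". But I suspect there is a more elegant way. Any ideas would be greatly appreciated!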
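The definition-keyword heuristic described above can be sketched as a whole-word match on the WordNet gloss. This is only a minimal illustration: the helper name `definition_mentions` is hypothetical, and it would be called as `definition_mentions(word_syn.definition())` on the synset returned by `lesk`. It also shows the weakness of the idea: a gloss that says "noise", "sounds" or "audible" instead of "sound" is missed.

```python
import re


def definition_mentions(definition, keyword="sound"):
    """Return True if `keyword` occurs as a whole word in a WordNet gloss.

    Hypothetical helper illustrating the keyword heuristic: word boundaries
    stop "soundproof" from matching, but inflected forms such as "sounds"
    are not matched either.
    """
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None
```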
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

# (sentence, WordNet POS used to restrict lesk's candidate senses)
samples = [('The van rolled along the highway.', 'n'),
           ('The thunder rolled and the lightning striked.', 'n')]
word = 'roll'

for sentence, pos_tag in samples:
    # Pick the sense of `word` whose gloss best overlaps the sentence tokens
    word_syn = lesk(word_tokenize(sentence.lower()), word, pos_tag)
    print('Sentence:', sentence)
    print('Word synset:', word_syn)
    print('Corresponding definition:', word_syn.definition())
Output:
Sentence: The van rolled along the highway.
Word synset: Synset('scroll.n.02')
Corresponding definition: a document that can be rolled up (as for storage)
Sentence: The thunder rolled and the lightning striked.
Word synset: Synset('paradiddle.n.01')
Corresponding definition: the sound of a drum (especially a snare drum) beaten rapidly and continuously
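A sketch of a less brittle alternative, assuming WordNet's hypernym hierarchy is a better signal than gloss wording: instead of searching the definition, walk up from the sense chosen by `lesk` and keep the sentence only if that sense is, or sits below, one of a chosen set of "sound" synsets. The helper names (`hypernym_closure`, `is_sound_sense`, `sound_sentences`) are hypothetical, and which `sound` synset(s) to use as roots is an assumption you would have to settle by inspecting WordNet.

```python
def hypernym_closure(synset):
    """Collect every transitive hypernym (including instance hypernyms) of `synset`."""
    seen = set()
    stack = [synset]
    while stack:
        current = stack.pop()
        for parent in current.hypernyms() + current.instance_hypernyms():
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_sound_sense(synset, sound_roots):
    """True if `synset` is one of `sound_roots` or lies below one of them."""
    if synset is None:  # lesk() returns None when it finds no candidate sense
        return False
    return synset in sound_roots or bool(hypernym_closure(synset) & sound_roots)


def sound_sentences(samples, word, sound_roots):
    """Yield (sentence, synset) pairs whose lesk sense of `word` is sound-related."""
    # Imported lazily so the helpers above work without NLTK installed.
    from nltk.tokenize import word_tokenize
    from nltk.wsd import lesk

    for sentence, pos in samples:
        syn = lesk(word_tokenize(sentence.lower()), word, pos)
        if is_sound_sense(syn, sound_roots):
            yield sentence, syn
```

To choose the roots, one could print `word_syn.hypernym_paths()` for the second sentence to see which `sound` synset `paradiddle.n.01` actually descends from, put that synset (obtained with `wn.synset(...)`) into a set such as `roots`, and then iterate over `sound_sentences(samples, 'roll', roots)`. Noun and verb hierarchies are separate in WordNet, so if `'v'` were passed to `lesk` instead of `'n'`, the roots would have to be verb synsets as well.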