
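I am using WordNet and NLTK for word sense disambiguation. I am interested in all words related to sound. I have a list of such words, and "roll" is one of them. I then check whether any of my sentences contains the word (I also check its POS). If one does, I want to keep only the sentences where the word is used in a sound-related sense. In the example below, that would be the second sentence. My current idea is to select senses whose definition contains the word "sound", as in "the sound of a drum (especially a snare drum) beaten rapidly and continuously". But I suspect there is a more elegant way. Any ideas would be greatly appreciated!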

from nltk import word_tokenize
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn

samples = [('The van rolled along the highway.', 'n'),
           ('The thunder rolled and the lightning striked.', 'n')]

word = 'roll'
for sentence, pos_tag in samples:
    # Simplified Lesk: pick the synset of `word` (restricted to pos_tag) whose gloss overlaps most with the sentence
    word_syn = lesk(word_tokenize(sentence.lower()), word, pos_tag)
    print('Sentence:', sentence)
    print('Word synset:', word_syn)
    print('Corresponding definition:', word_syn.definition())

Output:

Sentence: The van rolled along the highway.
Word synset: Synset('scroll.n.02')
Corresponding definition: a document that can be rolled up (as for storage)
Sentence: The thunder rolled and the lightning striked.
Word synset: Synset('paradiddle.n.01')
Corresponding definition: the sound of a drum (especially a snare drum) beaten rapidly and continuously
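A minimal sketch of the keyword heuristic described in the question: test whether "sound" occurs as a whole word in a sense's gloss. The function name `gloss_mentions` is hypothetical, and the two glosses are copied from the output above rather than loaded from WordNet; with NLTK you would pass `word_syn.definition()` instead.

```python
import re

def gloss_mentions(definition: str, keyword: str = "sound") -> bool:
    """Return True if `keyword` occurs as a whole word in a WordNet gloss."""
    # Split into lower-case alphabetic tokens so "sound" does not match "soundtrack"
    tokens = re.findall(r"[a-z]+", definition.lower())
    return keyword.lower() in tokens

# Glosses copied from the output above (stand-ins for word_syn.definition())
glosses = {
    "scroll.n.02": "a document that can be rolled up (as for storage)",
    "paradiddle.n.01": "the sound of a drum (especially a snare drum) beaten rapidly and continuously",
}
for name, gloss in glosses.items():
    print(name, gloss_mentions(gloss))
```

This only catches senses whose gloss literally uses the keyword; a sound-related sense worded differently would be missed.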

1 Answer


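You can use WordNet hypernyms (synsets with a more general meaning). My first idea would be to walk upward from the current synset (using synset.hypernyms()) and keep checking whether I reach the "sound" synset. I would stop when I hit the root, i.e. a synset with no hypernyms, where synset.hypernyms() returns an empty list.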
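A sketch of that upward walk, assuming a caller-supplied `hypernyms` function so it runs without the WordNet data; `find_ancestor` and the `PARENTS` table are hypothetical, with the table copied from the two chains listed in this answer. A synset can have more than one hypernym, so the sketch walks breadth-first and stops only once every path has reached a root.

```python
from collections import deque

# Toy hypernym table copied from the chains in this answer (not loaded from WordNet)
PARENTS = {
    "scroll.n.02": ["manuscript.n.02"],
    "manuscript.n.02": ["autograph.n.01"],
    "autograph.n.01": ["writing.n.02"],
    "writing.n.02": ["written_communication.n.01"],
    "written_communication.n.01": ["communication.n.02"],
    "communication.n.02": ["abstraction.n.06"],
    "paradiddle.n.01": ["sound.n.04"],
    "sound.n.04": ["happening.n.01"],
    "happening.n.01": ["event.n.01"],
    "event.n.01": ["psychological_feature.n.01"],
    "psychological_feature.n.01": ["abstraction.n.06"],
    "abstraction.n.06": ["entity.n.01"],
    "entity.n.01": [],  # root: no hypernyms
}

def parents_of(name):
    """Dict-based stand-in for synset.hypernyms()."""
    return PARENTS.get(name, [])

def find_ancestor(start, targets, hypernyms):
    """Walk up hypernym links breadth-first from `start`.

    Returns the first synset in `targets` that is reached (including `start`
    itself), or None once every path has ended at a root.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in targets:
            return node
        for parent in hypernyms(node):  # empty at a root, so that path ends
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None

print(find_ancestor("paradiddle.n.01", {"sound.n.04"}, parents_of))  # sound.n.04
print(find_ancestor("scroll.n.02", {"sound.n.04"}, parents_of))      # None
```

With NLTK, the same function should work on real synsets, e.g. `find_ancestor(word_syn, {wn.synset('sound.n.04')}, lambda s: s.hypernyms())`.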

For your two examples, this produces the following sequences of synsets:

Sentence:The van rolled along the highway .
Word synset:Synset('scroll.n.02')
[Synset('manuscript.n.02')]
[Synset('autograph.n.01')]
[Synset('writing.n.02')]
[Synset('written_communication.n.01')]
[Synset('communication.n.02')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]

Sentence:The thunder rolled and the lightning striked .
Word synset:Synset('paradiddle.n.01')
[Synset('sound.n.04')]
[Synset('happening.n.01')]
[Synset('event.n.01')]
[Synset('psychological_feature.n.01')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]

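So one of the synsets you are probably looking for is sound.n.04. There may be others, though; I would try more examples and build up a list of target synsets.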
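A sketch of using such a list to filter sentences: keep a sentence only if its chosen sense has some synset from a target set among its ancestors. `SOUND_TARGETS`, `ancestors`, and the hypernym table are hypothetical, and the synset chosen for each sentence is hard-coded from the lesk output in the question rather than computed.

```python
# Toy hypernym table copied from the chains above (not loaded from WordNet)
PARENTS = {
    "scroll.n.02": ["manuscript.n.02"],
    "manuscript.n.02": ["autograph.n.01"],
    "autograph.n.01": ["writing.n.02"],
    "writing.n.02": ["written_communication.n.01"],
    "written_communication.n.01": ["communication.n.02"],
    "communication.n.02": ["abstraction.n.06"],
    "paradiddle.n.01": ["sound.n.04"],
    "sound.n.04": ["happening.n.01"],
    "happening.n.01": ["event.n.01"],
    "event.n.01": ["psychological_feature.n.01"],
    "psychological_feature.n.01": ["abstraction.n.06"],
    "abstraction.n.06": ["entity.n.01"],
}

def ancestors(name, parents):
    """Return `name` plus every synset reachable from it via hypernym links."""
    found, stack = set(), [name]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, []))
    return found

# Synsets chosen by lesk() for each sentence, as reported in the question's output
chosen = [
    ("The van rolled along the highway.", "scroll.n.02"),
    ("The thunder rolled and the lightning striked.", "paradiddle.n.01"),
]

# Start with sound.n.04 and extend the set as more examples are checked
SOUND_TARGETS = {"sound.n.04"}

sound_sentences = [s for s, syn in chosen if ancestors(syn, PARENTS) & SOUND_TARGETS]
print(sound_sentences)  # ['The thunder rolled and the lightning striked.']
```

With NLTK, `ancestors(syn, PARENTS)` could be replaced by `{syn} | set(syn.closure(lambda s: s.hypernyms()))`, with `SOUND_TARGETS` holding `wn.synset(...)` objects instead of names.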

answered 2017-04-27T13:09:05.917