
If I have a list of words in Python, for example:

words = ["blue", "red", "ball"]

Is there a way to use WordNet to programmatically generate hypernyms for this set of words?


First, see https://stackoverflow.com/a/29478711/610569 for the difference between a "sense" (synset/concept) and a "word" (in the context of WordNet, a lemma).

Given two synsets (NOT words), you can find their lowest common hypernym(s):

>>> from nltk.corpus import wordnet as wn

# A word can have multiple meanings (aka synsets)
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

>>> wn.synsets('cat')
[Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset("cat-o'-nine-tails.n.01"), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01')]

>>> wn.synsets('dog')[0].definition()
'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'

>>> wn.synsets('cat')[0].definition()
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'

>>> dog = wn.synsets('dog')[0]
>>> cat = wn.synsets('cat')[0]


>>> cat.lowest_common_hypernyms(dog)
[Synset('carnivore.n.01')]
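What `lowest_common_hypernyms` returns can be pictured like this: take each synset together with all of its transitive hypernyms, intersect the two sets, and keep the member(s) that sit deepest in the hierarchy. The sketch below assumes this roughly matches NLTK's default behavior (ranking by `max_depth()`), and it runs on a hypothetical, hand-simplified `PARENTS` graph instead of real WordNet data, so the names are only labels.

```python
from functools import lru_cache

# Hypothetical, hand-simplified hypernym graph: node -> list of direct hypernyms.
# The names echo WordNet synsets, but this is NOT the real WordNet hierarchy.
PARENTS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": ["placental"],
    "placental": ["mammal"],
    "mammal": ["animal"],
    "domestic_animal": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def all_hypernyms(node):
    """Return `node` itself plus every transitive hypernym (iterative DFS)."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


@lru_cache(maxsize=None)
def max_depth(node):
    """Length of the longest hypernym path from `node` up to a root (root = 0)."""
    parents = PARENTS.get(node, [])
    return (1 + max(max_depth(p) for p in parents)) if parents else 0


def lowest_common_hypernyms(a, b):
    """Shared hypernyms of a and b (each counts as its own hypernym), deepest only."""
    common = all_hypernyms(a) & all_hypernyms(b)
    if not common:
        return []
    deepest = max(max_depth(n) for n in common)
    return sorted(n for n in common if max_depth(n) == deepest)


print(lowest_common_hypernyms("dog", "cat"))              # ['carnivore']
print(lowest_common_hypernyms("placental", "carnivore"))  # ['placental']
```

Because a synset counts as its own hypernym, comparing `placental` with one of its descendants returns `placental` itself, which is the same pattern as the last call in the pairwise example below.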

See also the NLTK how-to on lowest common hypernyms: http://www.nltk.org/howto/wordnet_lch.html

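Is the lowest common hypernym reliable?

WordNet is a hand-crafted resource, so its reliability depends on why and how the synsets were created throughout the WordNet ontology.

Can I use this information for my NLP task?

Maybe... but most likely, it won't be useful.

Can it compare more than two synsets?

Not directly. You have to do multiple pairwise searches, e.g.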

>>> mouse = wn.synsets('mouse')[0]
>>> cat = wn.synsets('cat')[0]
>>> dog = wn.synsets('dog')[0]

>>> dog.lowest_common_hypernyms(cat)
[Synset('carnivore.n.01')]
>>> cat.lowest_common_hypernyms(mouse)
[Synset('placental.n.01')]
>>> dog.lowest_common_hypernyms(mouse)
[Synset('placental.n.01')]

>>> placental = dog.lowest_common_hypernyms(mouse)[0]
>>> carnivore = dog.lowest_common_hypernyms(cat)[0]
>>> placental.lowest_common_hypernyms(carnivore)
[Synset('placental.n.01')]

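But you can see how inefficient that is. It would be easier to write your own code that traverses the WordNet ontology and finds the lowest common hypernym of a given set of N synsets in one pass, rather than pairwise.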
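A minimal sketch of that single-pass idea, assuming "lowest" means deepest by longest path to the root: collect each node's ancestor set once, intersect all N sets, and keep the deepest survivors. `lowest_common_hypernyms_n`, `get_parents` and `TOY_PARENTS` are hypothetical names, and the toy graph is a simplified stand-in, not the real WordNet hierarchy.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set


def lowest_common_hypernyms_n(
    nodes: Iterable[Hashable],
    get_parents: Callable[[Hashable], List[Hashable]],
) -> List[Hashable]:
    """Deepest hypernyms shared by *all* nodes (each node counts as its own hypernym)."""
    depth_cache: Dict[Hashable, int] = {}

    def ancestors(node: Hashable) -> Set[Hashable]:
        # The node itself plus every transitive hypernym (iterative DFS).
        seen = {node}
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in get_parents(current):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def max_depth(node: Hashable) -> int:
        # Longest hypernym path from node up to a root (roots have depth 0), memoized.
        if node not in depth_cache:
            parents = get_parents(node)
            depth_cache[node] = (1 + max(max_depth(p) for p in parents)) if parents else 0
        return depth_cache[node]

    node_list = list(nodes)
    if not node_list:
        return []
    # One ancestor walk per node, then a single N-way intersection.
    common = set.intersection(*(ancestors(n) for n in node_list))
    if not common:
        return []
    deepest = max(max_depth(n) for n in common)
    return sorted((n for n in common if max_depth(n) == deepest), key=str)


# Hypothetical toy hierarchy (NOT real WordNet data).
TOY_PARENTS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "mouse": ["rodent"],
    "rodent": ["placental"],
    "carnivore": ["placental"],
    "placental": ["mammal"],
    "mammal": ["animal"],
    "domestic_animal": ["animal"],
    "animal": ["entity"],
}

toy = lambda n: TOY_PARENTS.get(n, [])
print(lowest_common_hypernyms_n(["dog", "cat", "mouse"], toy))  # ['placental']
print(lowest_common_hypernyms_n(["dog", "cat"], toy))           # ['carnivore']
```

Passing the parent lookup as a function keeps the traversal independent of the data source. To try it on WordNet itself, `get_parents` could be `lambda s: s.hypernyms() + s.instance_hypernyms()` with NLTK synsets as the nodes; that substitution is an untested assumption, so compare a few results against the pairwise `lowest_common_hypernyms` calls above before relying on it.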

answered 2017-07-24T23:07:41.257