python - Wordnet selectional restrictions in NLTK

Question

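Is there a way to capture WordNet selectional restrictions (such as +animate, +human, etc.) from synsets through NLTK? Or is there some other way to get semantic information about a synset? The closest I could get is the hypernym relation.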

score 5 · Accepted Answer

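It depends on what your "selectional restrictions" are, or what I would call semantic features. In classic semantics there is a world of concepts, and to compare concepts we have to find:

(i) discriminating features (i.e. features of the concepts that are used to distinguish them from each other), and
(ii) similarity features (i.e. features the concepts have in common, which highlight the need to differentiate them)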

For example:

Man is [+HUMAN], [+MALE], [+ADULT]
Woman is [+HUMAN], [-MALE], [+ADULT]

[+HUMAN] and [+ADULT] = similarity features
[+-MALE] = discriminating feature
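The Man/Woman comparison above can be sketched in plain Python by treating each concept as a mapping from feature name to a +/- value: features with the same value in both concepts are similarity features, and features with opposite values are discriminating features. The dict-of-booleans representation and the helper name `compare_features` are hypothetical, chosen only for illustration.

```python
# Hypothetical representation: feature name -> True ([+F]) or False ([-F])
MAN = {"HUMAN": True, "MALE": True, "ADULT": True}
WOMAN = {"HUMAN": True, "MALE": False, "ADULT": True}


def compare_features(a, b):
    """Split the features two concepts share into similarity and discriminating features."""
    shared = a.keys() & b.keys()
    # Same value in both concepts -> similarity feature
    similar = sorted(f for f in shared if a[f] == b[f])
    # Opposite values -> discriminating feature
    discriminating = sorted(f for f in shared if a[f] != b[f])
    return similar, discriminating


similar, discriminating = compare_features(MAN, WOMAN)
print("similarity features:", similar)
print("discriminating features:", discriminating)
```

This only mirrors the hand-written feature lists; it does not answer the open question of which features should be on the list in the first place.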

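The common problem in traditional semantics, and in applying this theory to computational semantics, is a pair of questions:

"Is there a specific list of features that we can use to compare any concepts?"

"If so, what are the features on this list?"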

(For more details, see www.acl.ldc.upenn.edu/E/E91/E91-1034.pdf)

Back to WordNet: I can suggest 2 ways to approach "selectional restrictions".

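First, check whether the discriminating features appear among a concept's hypernyms. This requires deciding beforehand what the discriminating features are. To distinguish animals from humans, we take the discriminating features to be [+-human] and [+-animal].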

from nltk.corpus import wordnet as wn

# Concepts to compare
dog_sense = wn.synsets('dog')[0]  # It's http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0]  # It's http://goo.gl/CQQIG9

# hypernym_paths() returns a list of lists: one path from the root down to the
# synset for each chain of hypernyms (a synset can have several hypernyms).
# [0] takes only the first path.
dog_hypernyms = dog_sense.hypernym_paths()[0]
jb_hypernyms = jb_sense.hypernym_paths()[0]

# Discriminating features in terms of concepts in WordNet
human = wn.synset('person.n.01')  # i.e. [+human]
animal = wn.synset('animal.n.01')  # i.e. [+animal]

# [+human, -animal]
if human in jb_hypernyms and animal not in jb_hypernyms:
    print("James Baldwin is human")
else:
    print("James Baldwin is not human")

# [+animal, -human]
if animal in dog_hypernyms and human not in dog_hypernyms:
    print("Dog is an animal")
else:
    print("Dog is not an animal")

Second, check a similarity measure, as @Jacob suggested.

from nltk.corpus import wordnet as wn

dog_sense = wn.synsets('dog')[0]  # It's http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0]  # It's http://goo.gl/CQQIG9

# Features to check whether the 'dubious' concept is a human or an animal
human = wn.synset('person.n.01')  # i.e. [+human]
animal = wn.synset('animal.n.01')  # i.e. [+animal]

# Wu-Palmer similarity of the concept to each feature, computed once
animal_sim = dog_sense.wup_similarity(animal)
human_sim = dog_sense.wup_similarity(human)

if animal_sim > human_sim:
    print("Dog is more of an animal than human")
elif animal_sim < human_sim:
    print("Dog is more of a human than animal")

score 0 · Accepted Answer

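You could try using some of the similarity functions with handpicked synsets and filter on the result. But that is essentially the same as following the hypernym tree; afaik all the WordNet similarity functions use hypernym distance in their calculations. There are also many optional attributes of synsets that may be worth exploring, but their presence can be very inconsistent.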
