25

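I am new to NLTK Python and I am looking for a sample application that can do word sense disambiguation. My searches turned up plenty of algorithms but no sample application. I just want to pass in a sentence and find out the sense of each word by referring to the WordNet library. Thanks.

I found a similar module in Perl (http://marimba.d.umn.edu/allwords/allwords.html). Is there such a module in NLTK Python?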


6 Answers

17

Recently, part of the pywsd code was ported into the bleeding-edge version of NLTK, in the wsd.py module. Try:

>>> from nltk.wsd import lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> lesk(sent, ambiguous)
Synset('bank.v.04')
>>> lesk(sent, ambiguous).definition()
u'act as the banker in a game or in gambling'

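For better WSD performance, use the pywsd library instead of the NLTK module. In general, simple_lesk() from pywsd does better than lesk from NLTK. I will try to update the NLTK module as much as possible when I am free.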


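In response to Chris Spencer's comment, please note the limitations of Lesk algorithms. I am simply giving an accurate implementation of the algorithm. It is not a silver bullet: http://en.wikipedia.org/wiki/Lesk_algorithm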
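To make the limitation concrete, here is a minimal sketch of the simplified Lesk idea: pick the sense whose gloss shares the most words with the context. This is not the NLTK or pywsd code. The sense labels, glosses, stopword list, and the names toy_lesk and tokenize are all made up for illustration.

```python
# Toy illustration of simplified Lesk: choose the sense whose gloss shares
# the most words with the context sentence.
# ASSUMPTION: the glosses and stopwords below are hand-written stand-ins,
# not WordNet data.

STOPWORDS = {"a", "an", "the", "to", "of", "in", "and", "that", "my", "i", "was", "or"}

TOY_SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts money deposits and lends money",
        "bank.river": "sloping land beside a river or other body of water",
    }
}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def toy_lesk(sentence, word, senses=TOY_SENSES):
    """Return the sense label whose gloss overlaps most with the sentence."""
    context = tokenize(sentence) - {word}
    best_label, best_overlap = None, -1
    for label, gloss in senses[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_label, best_overlap = label, overlap
    return best_label


print(toy_lesk("I went to the bank to deposit my money", "bank"))  # bank.financial
print(toy_lesk("The river bank was full of dead fishes", "bank"))  # bank.river
```

Note that "deposit" in the sentence does not match "deposits" in the gloss, so the first result rests on the single shared word "money". Plain word overlap is brittle in exactly this way, which is why stemming and extended glosses help.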

Also note that, although:

lesk("My cat likes to eat mice.", "cat", "n")

does not give you the right answer, you can use the pywsd implementation of max_similarity():

>>> from pywsd.similarity import max_similarity
>>> max_similarity('my cat likes to eat mice', 'cat', 'wup', pos='n').definition()
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'
>>> max_similarity('my cat likes to eat mice', 'cat', 'lin', pos='n').definition()
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'
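For intuition, the maximum-similarity idea can be sketched roughly as follows: score each candidate sense by how similar it is to the best-matching sense of each context word, then keep the top scorer. This is a toy under stated assumptions, not pywsd's code. The taxonomy, the sense labels, and the names wup and toy_max_similarity are hypothetical, and the similarity is a hand-rolled Wu-Palmer measure over that toy tree.

```python
# ASSUMPTION: a tiny hand-made taxonomy (child -> parent), not WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "mammal": "animal",
    "feline": "mammal",
    "rodent": "mammal",
    "cat.feline": "feline",
    "mouse.rodent": "rodent",
    "artifact": "entity",
    "machine": "artifact",
    "device": "artifact",
    "cat.tractor": "machine",
    "mouse.device": "device",
}

# ASSUMPTION: candidate senses keyed by surface form (no lemmatization here).
SENSES = {
    "cat": ["cat.tractor", "cat.feline"],
    "mice": ["mouse.device", "mouse.rodent"],
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as 1."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # Walking upward from a, the first shared node is the deepest common ancestor.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def toy_max_similarity(sentence, target):
    """Pick the target sense with the highest summed best-match similarity."""
    context = [w for w in sentence.lower().split() if w in SENSES and w != target]

    def score(sense):
        return sum(max(wup(sense, other) for other in SENSES[w]) for w in context)

    return max(SENSES[target], key=score)


print(toy_max_similarity("my cat likes to eat mice", "cat"))  # cat.feline
```

In this toy tree the feline sense wins because it shares the deeper ancestor "mammal" with the rodent sense of "mice" (similarity 0.6), whereas the tractor sense only reaches 0.5 through "artifact".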

@Chris, if you want a python setup.py, just ask politely and I will write it...

answered 2014-06-22T22:23:28.980
8

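Yes. In fact, the NLTK team wrote a book that has multiple chapters on classification and explicitly covers how to use WordNet. You can also buy a physical copy of the book from Safari.

FYI: NLTK was written by natural language programming academics for use in their introductory programming courses.

answered 2011-12-21T18:49:57.287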
4

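As a practical answer to the OP's request, here is a Python implementation of several WSD methods that returns senses in the form of NLTK synsets: https://github.com/alvations/pywsd

It includes

  • Lesk algorithms (original Lesk, adapted Lesk, and simple Lesk)
  • Baseline algorithms (random sense, first sense, most frequent sense)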
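As a rough illustration of what the three baselines mean, here is a sketch over a made-up sense inventory. It is not pywsd's implementation. The sense labels, the frequency counts, and the toy_* function names are assumptions chosen only to show how the strategies differ.

```python
import random

# ASSUMPTION: senses are listed in dictionary order, each paired with an
# invented corpus frequency count (illustrative numbers only).
TOY_INVENTORY = {
    "bank": [("bank.river", 25), ("bank.financial", 40), ("bank.row", 2)],
}


def toy_first_sense(word):
    """Baseline 1: always return the first listed sense."""
    return TOY_INVENTORY[word][0][0]


def toy_most_frequent_sense(word):
    """Baseline 2: return the sense with the highest frequency count."""
    return max(TOY_INVENTORY[word], key=lambda pair: pair[1])[0]


def toy_random_sense(word, rng=random):
    """Baseline 3: return a uniformly random sense."""
    return rng.choice(TOY_INVENTORY[word])[0]


print(toy_first_sense("bank"))          # bank.river
print(toy_most_frequent_sense("bank"))  # bank.financial
print(toy_random_sense("bank"))         # varies from run to run
```

None of these baselines looks at the context sentence, which is what makes them useful as a floor when evaluating Lesk-style methods.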

pywsd can be used like this:

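#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Note: Python 2 print syntax; the bare imports below appear to assume the
# script is run from inside the pywsd source directory.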

bank_sents = ['I went to the bank to deposit my money',
'The river bank was full of dead fishes']

plant_sents = ['The workers at the industrial plant were overworked',
'The plant was no longer bearing flowers']

print "======== TESTING simple_lesk ===========\n"
from lesk import simple_lesk
print "#TESTING simple_lesk() ..."
print "Context:", bank_sents[0]
answer = simple_lesk(bank_sents[0],'bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS ..."
print "Context:", bank_sents[1]
answer = simple_lesk(bank_sents[1],'bank','n')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS and stems ..."
print "Context:", plant_sents[0]
answer = simple_lesk(plant_sents[0],'plant','n', True)
print "Sense:", answer
print "Definition:",answer.definition
print

print "======== TESTING baseline ===========\n"
from baseline import random_sense, first_sense
from baseline import max_lemma_count as most_frequent_sense

print "#TESTING random_sense() ..."
print "Context:", bank_sents[0]
answer = random_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING first_sense() ..."
print "Context:", bank_sents[0]
answer = first_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING most_frequent_sense() ..."
print "Context:", bank_sents[0]
answer = most_frequent_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

[out]:

======== TESTING simple_lesk ===========

#TESTING simple_lesk() ...
Context: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition: a financial institution that accepts deposits and channels the money into lending activities

#TESTING simple_lesk() with POS ...
Context: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING simple_lesk() with POS and stems ...
Context: The workers at the industrial plant were overworked
Sense: Synset('plant.n.01')
Definition: buildings for carrying on industrial labor

======== TESTING baseline ===========
#TESTING random_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('deposit.v.02')
Definition: put into a bank account

#TESTING first_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING most_frequent_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)
answered 2014-01-03T10:21:10.377
0

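NLTK has APIs to access WordNet. WordNet organizes words into synsets (sets of synonyms). Looking a word up gives you information such as its hypernyms, hyponyms, and root word.

"Python Text Processing with NLTK 2.0 Cookbook" is a good book to get you started with the various features of NLTK. It is easy to read, understand, and apply.

You can also look at other papers (outside the realm of NLTK) that discuss using Wikipedia for word sense disambiguation.

answered 2011-01-02T16:10:12.233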
-1

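Yes, it is possible with the wordnet module in NLTK. The similarity measures used by the tool mentioned in your post are also available in the NLTK wordnet module.

answered 2010-09-18T17:58:59.377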