python - Trying to get acronyms from nltk

Question

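I'm fairly new and still learning Python. I'm trying to write an application that takes a word supplied by the user and offers some alternative suggestions for it. It looks like nltk has most of what I need. I've been looking at some examples and got it working like this: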

from nltk.corpus import wordnet as wn

# NLTK 3 API: lemmas() is a method (in NLTK 2.x it was the attribute .lemmas)
for lemma in wn.synset('car.n.01').lemmas():
    print(lemma, lemma.count())

This works fine. The problem is that if the user misspells the word or makes it plural, it crashes:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc1-py2.6.egg/nltk/corpus/reader/wordnet.py", line 1035, in synset
    raise WordNetError(message % (lemma, pos))
nltk.corpus.reader.wordnet.WordNetError: no lemma 'cars' with part of speech 'n'

From this error, it seems it can't find 'cars' as a noun. Is there a way to search and check whether the word is found, or a better way to implement this?

score 1 · Accepted Answer

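I don't think you're calling WordNet the right way. wn.synset() expects an exact synset identifier such as 'car.n.01', whereas wn.synsets() takes a plain word form and reduces inflected forms (like the plural 'cars') to their base lemma before looking them up: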

>>> wn.synsets('cars')
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'),
Synset('car.n.04'), Synset('cable_car.n.01')]
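This also answers the "is there a way to check whether the word is found" part: wn.synsets() returns an empty list for a word WordNet doesn't know, instead of raising WordNetError. A minimal sketch of a lookup helper built on that, assuming the NLTK 3 API; the name find_synsets and the choice to pass the lookup function in (so it can be tried without loading the corpus) are hypothetical:

```python
def find_synsets(word, synsets_fn):
    """Return the synsets for a user-typed word, or [] if WordNet has none.

    `synsets_fn` is expected to be nltk.corpus.wordnet.synsets; it is passed
    in (a hypothetical design choice) so the lookup is easy to test.
    """
    # Normalise the raw user input before looking it up.
    cleaned = word.strip().lower()
    if not cleaned:
        return []
    # synsets() maps inflected forms such as 'cars' to the base lemma 'car'
    # and returns an empty list (no exception) for unknown words.
    return list(synsets_fn(cleaned))
```

With `from nltk.corpus import wordnet as wn`, `find_synsets('cars', wn.synsets)` should give the synsets listed above, and an empty result is the cue to fall back to spelling suggestions. If you only need the base form, `wn.morphy('cars')` returns `'car'` (or `None` when nothing matches).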

Then, to list the lemmas of each synset:

>>> for synset in wn.synsets('cars'):
...     print(synset.lemmas())  # NLTK 3: lemmas() is a method
...
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'),
Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'),
Lemma('car.n.01.motorcar')]...
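Putting the loop above together with the lemma.count() call from the question gives a simple suggestion list: collect every distinct lemma name across all synsets of the word and rank them by count. A minimal sketch, assuming the NLTK 3 API (lemmas(), name() and count() are methods); the function name, the ranking rule and the limit parameter are hypothetical choices:

```python
def suggest_alternatives(word, synsets_fn, limit=10):
    """Return distinct lemma names for `word`, most frequent first.

    `synsets_fn` is expected to be nltk.corpus.wordnet.synsets.
    """
    word = word.strip().lower()
    counts = {}
    for synset in synsets_fn(word):
        for lemma in synset.lemmas():
            # WordNet joins multi-word lemmas with underscores ('cable_car').
            name = lemma.name().replace('_', ' ')
            # A name can occur in several synsets; keep its highest count.
            counts[name] = max(counts.get(name, 0), lemma.count())
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(counts, key=lambda n: (-counts[n], n))
    # Drop the query word itself; everything else is a candidate.
    return [n for n in ranked if n != word][:limit]
```

For example, `suggest_alternatives('cars', wn.synsets)` would return names such as 'car', 'auto' and 'automobile'; the exact order depends on the counts in your WordNet data.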

As for misspellings, I don't think NLTK has anything built in for that. You could:

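- Use a library such as pyenchant, which gives you access to some good C spell-checking libraries (Myspell, Hunspell). IMO the main problem is that you don't get many different suggestions for a misspelled word.
- Check the user-submitted words yourself and propose alternative spellings. It isn't a big deal. You can start by studying what this program does (or just use it directly); it gives a good example of how to build an n-gram index over a word list.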
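The program referred to above isn't reproduced here, so the following is only a rough, hypothetical sketch of the n-gram-index idea, not that program. It indexes each word by its character bigrams (padded with '$' so the first and last letters count), gathers candidate words that share at least one bigram with the input, and ranks them by Jaccard similarity. The bigram size and the 0.3 cutoff are arbitrary assumptions:

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Character n-grams of `word`, padded with '$' so that the first and
    last letters get their own grams ('car' -> '$c', 'ca', 'ar', 'r$')."""
    padded = f"${word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class NgramIndex:
    """Map each character n-gram to the set of words containing it."""

    def __init__(self, words, n=2):
        self.n = n
        self.grams = {}                 # word -> its n-gram set
        self.index = defaultdict(set)   # n-gram -> words containing it
        for w in words:
            g = ngrams(w, n)
            self.grams[w] = g
            for gram in g:
                self.index[gram].add(w)

    def suggest(self, word, limit=5, min_score=0.3):
        query = ngrams(word, self.n)
        # Candidates: every indexed word sharing at least one n-gram.
        candidates = set()
        for gram in query:
            candidates |= self.index.get(gram, set())
        scored = []
        for cand in candidates:
            g = self.grams[cand]
            score = len(query & g) / len(query | g)  # Jaccard similarity
            if score >= min_score:
                scored.append((score, cand))
        # Best score first; ties broken alphabetically.
        scored.sort(key=lambda sc: (-sc[0], sc[1]))
        return [cand for _, cand in scored[:limit]]
```

For a real word list you could feed it something like `wn.all_lemma_names()` from NLTK 3 (a large list, so build the index once). The standard library's `difflib.get_close_matches` is a simpler, slower alternative that needs no index.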

To get information about a lemma:

>>> # get one of the lemmas (NLTK 3: lemmas() and name() are methods)
>>> lemma = wn.synsets('cars')[0].lemmas()[0]
>>> lemma
Lemma('car.n.01.car')
>>> dir(lemma)
[...'antonyms', 'attributes', 'causes', 'count',
'derivationally_related_forms', 'entailments', 'frame_ids'... 'name'...]
>>> lemma.name()
'car'
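A few of the methods listed by dir() can be combined into a compact summary of a lemma. A minimal sketch, assuming the NLTK 3 Lemma API (name(), count() and antonyms() are methods); describe_lemma and the dict layout are hypothetical:

```python
def describe_lemma(lemma):
    """Summarise a WordNet Lemma as a plain dict.

    Assumes the NLTK 3 Lemma API, where name(), count() and antonyms()
    are methods (in NLTK 2.x name was a plain attribute).
    """
    return {
        "name": lemma.name(),
        # count(): the frequency count WordNet provides for this lemma
        "count": lemma.count(),
        # antonyms() returns Lemma objects; keep just their names
        "antonyms": [a.name() for a in lemma.antonyms()],
    }
```

Usage: `describe_lemma(wn.synsets('cars')[0].lemmas()[0])`. Many noun lemmas have no antonyms, so an empty list there is normal.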

Use dir() on each object to inspect its attributes, and experiment :)
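Because NLTK 3 turned several attributes into methods (lemma.name became lemma.name()), it can help to see which names from dir() need to be called. A small generic sketch; the helper name public_attributes is hypothetical:

```python
def public_attributes(obj):
    """Split the public names reported by dir(obj) into methods and data."""
    methods, data = [], []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder names
        value = getattr(obj, name, None)
        # Callable names (methods) must be invoked with (); others are values.
        (methods if callable(value) else data).append(name)
    return methods, data
```

For example, `methods, data = public_attributes(lemma)` on the lemma above lists name, count, antonyms and the other lookups under methods.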
