
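I need an input text file containing a single word. I then need to use WordNet to find the lemma names, definition, and examples of that word's synsets. I have read the books "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" to help me in this direction. Although I understand how to do this from the terminal, I can't do the same using a text editor.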

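For example, if the input text contains the word "flabbergasted", the output needs to look like this:

flabbergasted (verb) flabbergast, boggle, bowl over - overcome with amazement; "This boggles the mind!" (adjective) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"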

The synonyms, definitions, and example sentences are obtained directly from WordNet!

I have the following code:


from __future__ import division
import nltk
from nltk.corpus import wordnet as wn


tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()

#to tokenize input text into sentences

print '\n-----\n'.join(tokenizer.tokenize(data))# splits text into sentences

#to tokenize the tokenized sentences into words

tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]  
print words     #to print the tokens

for a in words:
    print a

syns = wn.synsets(a)
print "synsets:", syns

for s in syns:
    for l in s.lemmas:
        print l.name
    print s.definition
    print s.examples

I get the following output:


flabbergasted

['flabbergasted']
flabbergasted
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')]
flabbergast
boggle
bowl_over
overcome with amazement
['This boggles the mind!']
dumbfounded
dumfounded
flabbergasted
stupefied
thunderstruck
dumbstruck
dumbstricken
as if struck dumb with astonishment and surprise
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion']

Is there a way to retrieve the part of speech along with the set of lemma names?

4 Answers

def synset(word):
    wn.synsets(word)

does not return anything, so you get None by default.

You should write:

def synset(word):
    return wn.synsets(word)
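
To see the difference without needing NLTK installed, here is a minimal sketch that swaps wn.synsets for a plain dictionary lookup. FAKE_SYNSETS, synset_broken, and synset_fixed are hypothetical names, and the dictionary entries are made up for illustration rather than taken from WordNet.

```python
# Stand-in for wn.synsets: a hypothetical lookup table, not real WordNet data.
FAKE_SYNSETS = {"car": ["car.n.01", "car.n.02"]}


def synset_broken(word):
    # The lookup result is computed and then discarded: there is no return.
    FAKE_SYNSETS.get(word, [])


def synset_fixed(word):
    # Returning the result hands it back to the caller.
    return FAKE_SYNSETS.get(word, [])


print(synset_broken("car"))  # prints None
print(synset_fixed("car"))   # prints ['car.n.01', 'car.n.02']
```

A function body that ends without a return statement always evaluates to None, whatever it computed along the way.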

To extract the lemma names:

>>> # NLTK 2.x attribute style; in NLTK 3, call s.lemmas() and l.name()
>>> from nltk.corpus import wordnet
>>> syns = wordnet.synsets('car')
>>> syns[0].lemmas[0].name
'car'
>>> [s.lemmas[0].name for s in syns]
['car', 'car', 'car', 'car', 'cable_car']
>>> [l.name for s in syns for l in s.lemmas]
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
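
As for the part of speech asked about in the question, each synset also carries a POS code, so something like the sketch below might work. It assumes the NLTK 3 method-style API (pos(), lemma_names(), definition(), examples()); in NLTK 2.x these were attributes. POS_LABELS, describe, and describe_word are hypothetical names chosen for illustration, and the NLTK import is deferred so the formatting helper can be tried without the WordNet data installed.

```python
# Map WordNet POS codes to readable labels; 's' marks a satellite adjective.
POS_LABELS = {"n": "noun", "v": "verb", "a": "adjective",
              "s": "adjective", "r": "adverb"}


def describe(synset):
    """Format one synset as '(pos) lemma, lemma - definition; "example"'.

    Assumes the NLTK 3 method-style API: pos(), lemma_names(),
    definition() and examples().
    """
    label = POS_LABELS.get(synset.pos(), synset.pos())
    lemmas = ", ".join(name.replace("_", " ") for name in synset.lemma_names())
    examples = "; ".join('"%s"' % ex for ex in synset.examples())
    text = "(%s) %s - %s" % (label, lemmas, synset.definition())
    return text + ("; " + examples if examples else "")


def describe_word(word):
    """Describe every synset of `word`; needs NLTK and the WordNet corpus."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose
    return " ".join([word] + [describe(s) for s in wn.synsets(word)])
```

With NLTK and the WordNet corpus available, print(describe_word("flabbergasted")) should give a single line in roughly the shape requested in the question.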
answered 2011-04-04T06:04:27.553

Here I have created a module that is easy to use (import): pass it a string and it returns all the lemma names for that string.

Module:

#!/usr/bin/python2.7
'''Pass a string to lemmalist() (e.g. 'car') and it returns a list of the
lemma names of every WordNet synset for that word.'''
from nltk.corpus import wordnet as wn

def lemmalist(word):
    '''Collect the lemma names of all synsets of `word`.'''
    names = []
    for synset in wn.synsets(word):
        # NLTK 2.x: lemma_names is an attribute; in NLTK 3 call synset.lemma_names()
        for item in synset.lemma_names:
            names.append(item)
    return names

Usage:

Note: the module file is named lemma.py, hence "from lemma import lemmalist".

>>> from lemma import lemmalist
>>> lemmalist('car')
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']

Cheers!

answered 2013-09-27T08:41:31.047
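from nltk.corpus import wordnet

# Collect the lemma names of every synset of "car" (NLTK 3 method-style API)
synonyms = []
for syn in wordnet.synsets("car"):
    for l in syn.lemmas():
        synonyms.append(l.name())
print(synonyms)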
answered 2016-10-24T07:36:01.383

In NLTK 3.0, lemma_names changed from an attribute to a method. So if you get the error:

TypeError: 'method' object is not iterable

You can fix it with:

>>> from nltk.corpus import wordnet as wn
>>> [item for synset in wn.synsets('car') for item in synset.lemma_names()]

This outputs:

['car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
 'car', 'elevator_car', 'cable_car', 'car']
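
If the same code has to run under both NLTK 2.x and NLTK 3, one option is to call lemma_names only when it is callable. The sketch below is one way that might look; lemma_names_of and all_lemma_names are hypothetical helper names, not part of NLTK, and the NLTK import is deferred so the first helper can be tried on its own.

```python
def lemma_names_of(synset):
    """Return a synset's lemma names under either API style.

    NLTK 2.x exposed lemma_names as a list attribute; NLTK 3 made it a
    method. Calling it only when it is callable covers both cases.
    """
    names = synset.lemma_names
    return list(names() if callable(names) else names)


def all_lemma_names(word):
    """Collect lemma names for every synset of `word` (needs WordNet data)."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose
    return [name for s in wn.synsets(word) for name in lemma_names_of(s)]
```

With NLTK and the WordNet corpus installed, all_lemma_names('car') should yield the same list as the comprehension above.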
answered 2018-02-01T03:31:02.490