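I need to read an input text file containing a single word, and then use WordNet to find the lemma names, definition, and examples for each of that word's synsets. I have gone through the books "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" to help me in this direction. Although I understand how to do this interactively in the terminal, I am not able to do the same from a script written in a text editor.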
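For example, if the input text contains the word "flabbergasted", the output needs to look like this:
flabbergasted (verb) flabbergast, boggle, bowl over - overcome with amazement; "This boggles the mind!" (adjective) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"
The synonyms, definitions, and example sentences are obtained directly from WordNet!
I have the following code: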
from __future__ import division
import nltk
from nltk.corpus import wordnet as wn
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()
# Tokenize the input text into sentences and print them
print '\n-----\n'.join(tokenizer.tokenize(data))
# Tokenize the input text into words
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print words  # print the tokens
for a in words:
    print a
    syns = wn.synsets(a)
    print "synsets:", syns
    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples
I get the following output:
flabbergasted
['flabbergasted']
flabbergasted
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')]
flabbergast
boggle
bowl_over
overcome with amazement
['This boggles the mind!']
dumbfounded
dumfounded
flabbergasted
stupefied
thunderstruck
dumbstruck
dumbstricken
as if struck dumb with astonishment and surprise
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion']
Is there a way to retrieve the part of speech along with the set of lemma names?
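For reference, here is a minimal sketch of the kind of thing I am after. It is not a confirmed solution. It assumes NLTK 3.x, where `pos()`, `lemma_names()`, `definition()` and `examples()` are methods on a synset (in NLTK 2.0, as in my code above, they are plain attributes such as `s.pos` and `s.lemma_names`). The names `POS_LABELS`, `format_entry` and `describe_file` are hypothetical helpers made up for illustration, and showing the adjective-satellite tag `s` as "adjective" is my own choice.

```python
# Map WordNet's one-letter POS tags to readable labels.
# 's' marks an adjective satellite; labelling it "adjective" is an assumption.
POS_LABELS = {"n": "noun", "v": "verb", "a": "adjective",
              "s": "adjective", "r": "adverb"}


def format_entry(word, synsets):
    """Build 'word (pos) lemma, lemma - definition; "example"; ...' text.

    `synsets` is any iterable of objects exposing pos(), lemma_names(),
    definition() and examples() as methods (the NLTK 3.x Synset interface).
    """
    parts = [word]
    for s in synsets:
        # Fall back to the raw tag if it is not in the table.
        label = POS_LABELS.get(s.pos(), s.pos())
        # WordNet joins multi-word lemmas with underscores, e.g. bowl_over.
        lemmas = ", ".join(name.replace("_", " ") for name in s.lemma_names())
        pieces = ["%s - %s" % (lemmas, s.definition())]
        pieces.extend('"%s"' % ex for ex in s.examples())
        parts.append("(%s) %s" % (label, "; ".join(pieces)))
    return " ".join(parts)


def describe_file(path):
    """Look up every whitespace-separated word in the file at `path`."""
    # Imported here so format_entry can be used without NLTK installed.
    from nltk.corpus import wordnet as wn
    with open(path) as fp:
        words = fp.read().lower().split()
    return [format_entry(w, wn.synsets(w)) for w in words]
```

With the WordNet corpus installed, something like `print("\n".join(describe_file("inpsyn.txt")))` should print one formatted line per input word.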