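I am trying to get the base English word for an English word that has been modified from its base form. This question has been asked here before, but I didn't see a proper answer, so I am putting it this way. I tried two stemmers and one lemmatizer from the NLTK package: the Porter stemmer, the Snowball stemmer, and the WordNet lemmatizer.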

I tried this code:

from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer

words = ['arrival','conclusion','ate']

for word in words:
    print("\n\nOriginal Word =>", word)
    print("porter stemmer=>", PorterStemmer().stem(word))
    snowball_stemmer = SnowballStemmer("english")
    print("snowball stemmer=>", snowball_stemmer.stem(word))
    print("WordNet Lemmatizer=>", WordNetLemmatizer().lemmatize(word))

This is the output I get:

Original Word => arrival
porter stemmer=> arriv
snowball stemmer=> arriv
WordNet Lemmatizer=> arrival


Original Word => conclusion
porter stemmer=> conclus
snowball stemmer=> conclus
WordNet Lemmatizer=> conclusion


Original Word => ate
porter stemmer=> ate
snowball stemmer=> ate
WordNet Lemmatizer=> ate

But I want this output:

    Input : arrival
    Output: arrive

    Input : conclusion
    Output: conclude

    Input : ate
    Output: eat 

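How can I achieve this? Are there any tools available for it? I know this is called morphological analysis, but there must be some tools that already implement it. Help is appreciated :)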

First Edit

I tried this code:

import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet as wn

query = "The Indian economy is the worlds tenth largest by nominal GDP and third largest by purchasing power parity"

def is_noun(tag):
    return tag in ['NN', 'NNS', 'NNP', 'NNPS']

def is_verb(tag):
    return tag in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']

def is_adverb(tag):
    return tag in ['RB', 'RBR', 'RBS']

def is_adjective(tag):
    return tag in ['JJ', 'JJR', 'JJS']

def penn_to_wn(tag):
    if is_adjective(tag):
        return wn.ADJ
    elif is_noun(tag):
        return wn.NOUN
    elif is_adverb(tag):
        return wn.ADV
    elif is_verb(tag):
        return wn.VERB
    return wn.NOUN

tags = nltk.pos_tag(word_tokenize(query))
for tag in tags:
    wn_tag = penn_to_wn(tag[1])
    print(tag[0] + "---> " + WordNetLemmatizer().lemmatize(tag[0], wn_tag))

Here, I tried to use the WordNet lemmatizer by providing the proper part-of-speech tags. This is the output:

The---> The
Indian---> Indian
economy---> economy
is---> be
the---> the
worlds---> world
tenth---> tenth
largest---> large
by---> by
nominal---> nominal
GDP---> GDP
and---> and
third---> third
largest---> large
by---> by
purchasing---> purchase
power---> power
parity---> parity

Still, this approach won't handle words like "arrival" and "conclusion". Is there any solution for this?

2 Answers

Okay, so... for the word "ate", I think you are looking for NodeBox::Linguistics:

print en.verb.present("gave")
>>> give

Also, I don't quite understand why you want the verb for "arrival" but not for "conclusion".

answered 2014-11-15T15:17:29.343

Try the word_stemmer package: clone it from here and run pip install -e word_forms.

from word_forms.word_forms import get_word_forms
get_word_forms('conclusion')

# gives:
{'a': {'conclusive'},
 'n': {'conclusion', 'conclusions', 'conclusivenesses', 'conclusiveness'},
 'r': {'conclusively'},
 'v': {'concludes', 'concluded', 'concluding', 'conclude'}}

In your case, you want to get the verb form from the noun form.
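To go from that dictionary to a single verb, something like the sketch below might work. It does not call the package; it reuses the output shown above as plain data. The helper name pick_base_verb and the "shortest verb wins" rule are my own assumptions, not part of word_forms.

```python
# Forms copied from the get_word_forms('conclusion') output shown above.
forms = {
    'a': {'conclusive'},
    'n': {'conclusion', 'conclusions', 'conclusivenesses', 'conclusiveness'},
    'r': {'conclusively'},
    'v': {'concludes', 'concluded', 'concluding', 'conclude'},
}


def pick_base_verb(word_forms):
    """Hypothetical helper: guess the base verb from a forms dict.

    Assumption: the base form is the shortest verb, with ties broken
    alphabetically. This is a heuristic, not a feature of word_forms.
    """
    verbs = word_forms.get('v', set())
    if not verbs:
        return None  # no verb forms are known for this word
    return min(verbs, key=lambda w: (len(w), w))


print(pick_base_verb(forms))  # conclude
```

The length heuristic is likely to fail for irregular verbs, where an inflected form can be as short as the base form ("ate" and "eat" are both three letters). In that case, running the candidates through a lemmatizer with the verb tag, as in the question's first edit, is probably the safer choice.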

answered 2018-05-22T12:03:38.007
