12

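I tried using nltk pos_tag to determine whether a word is singular or plural, but the results are not accurate.

So I need a way to tell whether a word is in singular or plural form. I also need to do it without using any Python package.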

4

2 Answers

13

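For English, every word should have a root lemma, and the default number of that lemma is singular.

Assuming your list contains only nouns, you can try this: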

from nltk.stem import WordNetLemmatizer

# Requires the WordNet corpus, e.g. via nltk.download('wordnet').
wnl = WordNetLemmatizer()

def isplural(word):
    # Lemmatize as a noun; a plural form differs from its lemma.
    lemma = wnl.lemmatize(word, 'n')
    # Compare by value (!=), not by identity (is not).
    plural = word != lemma
    return plural, lemma

nounls = ['geese', 'mice', 'bars', 'foos', 'foo',
          'families', 'family', 'dog', 'dogs']

for nn in nounls:
    isp, lemma = isplural(nn)
    print(nn, lemma, isp)

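You will run into problems when a word is not in WordNet. In that case you will have to use something more sophisticated outside of NLTK, such as a classifier or a finite-state machine.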
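For words outside WordNet, and since the question asks for something that avoids packages, a rough rule-based fallback can be sketched in plain Python. The names `IRREGULAR_PLURALS`, `SINGULAR_ENDINGS`, `guess_singular` and `is_plural` are made up for this illustration, and the suffix rules are only a heuristic under the assumption that the input is an English noun. It is not a complete treatment of English morphology and will get some words wrong (for example "buses" or "series").

```python
# Hypothetical names throughout; a heuristic sketch, not full English morphology.

# Small hand-written table of irregular plurals (extend as needed).
IRREGULAR_PLURALS = {
    "geese": "goose",
    "mice": "mouse",
    "men": "man",
    "women": "woman",
    "children": "child",
    "feet": "foot",
    "teeth": "tooth",
}

# Endings that look plural but usually are not ("glass", "bus", "analysis").
SINGULAR_ENDINGS = ("ss", "us", "is")


def guess_singular(word):
    """Return a guessed singular form of an English noun."""
    w = word.lower()
    if w in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[w]
    # No trailing "s", or a typically singular ending: treat as singular.
    if not w.endswith("s") or w.endswith(SINGULAR_ENDINGS):
        return w
    # "families" -> "family" (the length check keeps "ties" -> "tie").
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    # "boxes" -> "box", "glasses" -> "glass", "churches" -> "church".
    if w.endswith(("ches", "shes", "sses", "xes", "zes")):
        return w[:-2]
    # Default rule: strip the final "s" ("dogs" -> "dog", "foos" -> "foo").
    return w[:-1]


def is_plural(word):
    """Return (is_plural, guessed_singular) for a noun."""
    singular = guess_singular(word)
    return singular != word.lower(), singular


for noun in ["geese", "foos", "foo", "families", "family", "dogs"]:
    print(noun, *is_plural(noun))
```

Because it relies on suffixes rather than a dictionary, this treats made-up words such as "foos" as plural, which may or may not be what you want.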

answered 2013-09-20T10:04:06.537
7

Assuming you want an English solution, you can do something similar to 2er0's solution a bit more directly with pattern-en:

from pattern.en import singularize

def isplural(pluralForm):
    # singularize() returns the input unchanged if it is already singular.
    singularForm = singularize(pluralForm)
    # Compare by value (!=), not by identity (is not).
    plural = pluralForm != singularForm
    return plural, singularForm

nounls = ['geese', 'mice', 'bars', 'foos', 'foo',
          'families', 'family', 'dog', 'dogs']

for pluralForm in nounls:
    isp, singularForm = isplural(pluralForm)
    print(pluralForm, singularForm, isp)

This outputs:

geese goose True
mice mouse True
bars bar True
foos foo True
foo foo False
families family True
family family False
dog dog False
dogs dog True

The only difference in output between 2er0's solution and this one is

foos foo True

His solution outputs False for this word because, as he pointed out, foos is not in WordNet (and is not an English word at all).

answered 2013-09-20T14:46:19.983