
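I am reading this article and experimenting with my own data, and I found that neither of the two examples given in the article, nor one of my own words, behaves as described. You can refer to the article for more information, although everything needed is in this question.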

# stemmed root words: Books, Braveri, Harri, Transpar
from nltk.stem.wordnet import WordNetLemmatizer

# the article shared the same lemmatizer initialization.
lem = WordNetLemmatizer()

# returned 'harry' in the example without pos tag
In [269]: lem.lemmatize('harri', pos='n')
Out[269]: 'harri'

In [270]: lem.lemmatize("Books", pos='n')
Out[270]: 'Books'

# returned 'book' in the example with pos tag
In [271]: lem.lemmatize("Books", pos='v')
Out[271]: 'Books'

# my example root word, didn't change at all
In [278]: lem.lemmatize("Transpar", pos="a")
Out[278]: 'Transpar'
In [281]: lem.lemmatize("Transpar", pos="n")
Out[281]: 'Transpar'

# returned 'bravery' in the example without pos tag
In [280]: lem.lemmatize("Braveri", pos="n")
Out[280]: 'Braveri'

The default pos tag of this lemmatizer is just wordnet.NOUN, so supplying a pos tag should not otherwise make a difference. For reference, transpar was originally transparent.
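To make my suspicion concrete, here is a toy sketch of a lemmatizer that is nothing more than a dictionary lookup. It is not NLTK's implementation: `TOY_LEMMAS` and `toy_lemmatize` are hypothetical names, and the tiny dictionary is made up for illustration. It assumes only that a lookup-based lemmatizer returns its input unchanged when the word is not in its dictionary, which would match the outputs above if stems such as 'harri' and 'Transpar' are simply not dictionary words.

```python
# Toy model only: a hypothetical mini-dictionary standing in for a real lexicon.
# Keys are (surface form, pos); values are the lemma.
TOY_LEMMAS = {
    ("books", "n"): "book",
    ("harry", "n"): "harry",
    ("bravery", "n"): "bravery",
    ("transparent", "a"): "transparent",
}


def toy_lemmatize(word, pos="n"):
    """Return the lemma for (word, pos) if known, otherwise the word unchanged."""
    # The lookup is exact and case-sensitive in this toy, so 'Books' misses.
    return TOY_LEMMAS.get((word, pos), word)


# Stems and capitalized forms are not in the dictionary, so they pass through.
for token, pos in [("harri", "n"), ("Books", "n"), ("books", "n"), ("Transpar", "a")]:
    print(token, pos, "->", toy_lemmatize(token, pos))
```

If this picture is roughly right, feeding the lemmatizer lowercased, unstemmed words (for example 'books' rather than 'Books', 'transparent' rather than 'Transpar') is what I should be testing instead.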

The only difference is that the author used the NLTK stemmer to stem the words, whereas I used texthero.stem.

Am I doing something wrong, or has NLTK changed?
