
I'm new to Python and working through examples from a book.
Can anyone explain why nothing changes when I try to stem some examples with this code?

>>> from nltk.stem import PorterStemmer
>>> stemmer=PorterStemmer()
>>> stemmer.stem('numpang wifi stop gadget shopping')
'numpang wifi stop gadget shopping'

But it works when I do this:

>>> stemmer.stem('shopping')
'shop'

3 Answers


Try this:

res = ",".join([stemmer.stem(kw) for kw in 'numpang wifi stop gadget shopping'.split(" ")])

The problem is that this stemmer most likely only works on single words. Your string as a whole has no "root" word, while the individual word "shopping" has the root "shop". So you have to stem each word separately.

EDIT:

From their source code ->

Stemming algorithms attempt to automatically remove suffixes (and in some
cases prefixes) in order to find the "root word" or stem of a given word. This
is useful in various natural language processing scenarios, such as search.

So I think you really are forced to split the string yourself.
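As a sketch of that approach (assuming NLTK is installed), you can split on whitespace, stem each token, and rejoin with spaces instead of commas to keep a readable sentence:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
sentence = "numpang wifi stop gadget shopping"

# Stem each whitespace-separated token individually,
# then rejoin with spaces to rebuild the sentence.
stemmed = " ".join(stemmer.stem(word) for word in sentence.split())
print(stemmed)  # only 'shopping' carries a suffix the Porter rules strip
```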

Answered 2012-10-19T12:27:40.417

Stemming is the process of reducing an inflected word to its base or root form, but here you are trying to stem an entire sentence.

Follow these steps instead:

from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

# word_tokenize may require the 'punkt' tokenizer models:
# import nltk; nltk.download('punkt')
sentence = "numpang wifi stop gadget shopping"
tokens = word_tokenize(sentence)
stemmer = PorterStemmer()

output = [stemmer.stem(word) for word in tokens]
Answered 2016-07-10T10:15:17.087

Try this:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stemmer = PorterStemmer()

some_text = "numpang wifi stop gadget shopping"

words = word_tokenize(some_text)

for word in words:
    print(stemmer.stem(word))
Answered 2018-06-23T16:40:52.777