
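I have a bunch of sentences in a list and I want to stem them using the nltk library. I can stem one sentence at a time, but I am having trouble stemming the sentences in the list and joining them back together. Am I missing a step? I am quite new to the nltk library. Thanks!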

import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # needed for word_tokenize below

ps = PorterStemmer()

# Success: one sentence at a time
data = 'the gamers playing games'
words = word_tokenize(data)
for w in words:
    print(ps.stem(w))


# Fails: 

data_list = ['the gamers playing games',
            'higher scores',
            'sports']
words = word_tokenize(data_list)
for w in words:
    print(ps.stem(w))

# Error: TypeError: expected string or bytes-like object
# result should be: 
['the gamer play game',
 'higher score',
 'sport']

3 Answers


You are passing a list to word_tokenize, which only accepts a single string.

The solution is to wrap your logic in another for-loop:

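from nltk.tokenize import word_tokenize

data_list = ['the gamers playing games', 'higher scores', 'sports']
for sentence in data_list:
    # word_tokenize expects one string, so tokenize one sentence at a time
    for w in word_tokenize(sentence):
        print(ps.stem(w))

Output:

the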
gamer
play
game
higher
score
sport
answered 2018-07-13T17:33:28.287

To stem the sentences and collect the results back into a list, I would go with:

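from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()
data_list_s = []
for sentence in data_list:
    # Stem each token, then join with single spaces (avoids a trailing space)
    stemmed = [ps.stem(w) for w in word_tokenize(sentence)]
    data_list_s.append(' '.join(stemmed))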

This puts the stemmed version of each element of data_list into a new list called data_list_s.
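
The same tokenize, stem, and rejoin pattern can also be written as a small reusable function. This is only a sketch: stem_sentences and toy_stem are hypothetical names, not part of NLTK, and toy_stem is a deliberately crude stand-in for PorterStemmer().stem so the example runs without NLTK or its tokenizer data.

```python
from typing import Callable, Iterable, List


def stem_sentences(
    sentences: Iterable[str],
    stem: Callable[[str], str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Tokenize each sentence, stem every token, and rejoin with spaces."""
    return [" ".join(stem(tok) for tok in tokenize(s)) for s in sentences]


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for PorterStemmer().stem (assumption for this
    # sketch): it only strips a trailing "s" from words longer than 3 letters.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


data_list = ["the gamers playing games", "higher scores", "sports"]
result = stem_sentences(data_list, toy_stem)
print(result)
```

With NLTK installed, the call would presumably become stem_sentences(data_list, PorterStemmer().stem, word_tokenize), which should also reduce "playing" to "play".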

answered 2022-01-27T04:01:21.447
import nltk
from nltk.tokenize import sent_tokenize
from nltk.stem import PorterStemmer

sentence = """At eight o'clock on Thursday morning, Arthur didn't feel very good. So i take him to hospital."""

sentence = sentence.lower()

word_tokens = nltk.word_tokenize(sentence)
sent_tokens = sent_tokenize(sentence)

stemmer = PorterStemmer()
stemmed_word = []
stemmed_sent = []
for token in word_tokens:
    stemmed_word.append(stemmer.stem(token))
    
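# stem() works on a single word, so stem each sentence word by word and rejoin
for sent_token in sent_tokens:
    words = nltk.word_tokenize(sent_token)
    stemmed_sent.append(' '.join(stemmer.stem(w) for w in words))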
    
print(stemmed_word)
print(stemmed_sent)
answered 2021-07-02T20:04:48.677