tweets - Loop over tweet text and, if a word is not a stop word, correct its spelling, lemmatize it, and stem it

Translated from: https://stackoverflow.com/questions/59094923 2019-11-28T19:00:55.343

Viewed 38 times

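The code below should iterate over the text column of a tweets dataset and, for every word that is not in the stop word list, correct its spelling, lemmatize it, and then stem it. It does not work properly. Can you help me fix it? Please see the error in the attached image.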

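# df is assumed to be a pandas DataFrame with a 'text' column
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from spellchecker import SpellChecker

pstem = PorterStemmer()
lem = WordNetLemmatizer()
spell = SpellChecker()
stop_words = set(stopwords.words('english'))  # set for fast membership tests

# Iterate over the actual index labels, so a non-default index does not raise KeyError
for i in df.index:
    text = df.at[i, 'text']
    tokens = nltk.word_tokenize(text)
    # Compare in lower case, because the NLTK stop word list is lower case
    tokens = [word for word in tokens if word.lower() not in stop_words]
    for j in range(len(tokens)):
        # correction() may return None when it finds no candidate; keep the original word then
        tokens[j] = spell.correction(tokens[j]) or tokens[j]
        tokens[j] = lem.lemmatize(tokens[j])
        tokens[j] = pstem.stem(tokens[j])
    tokens_sent = ' '.join(tokens)
    df.at[i, "text"] = tokens_sent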
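
The attached image is not available here, so the exact error is unknown. As a minimal sketch, the per-word steps of the loop above can be pulled into one function that receives the corrector, lemmatizer, and stemmer as plain callables. The function name normalize_tokens and the toy_* stand-ins are hypothetical and exist only so the example runs without NLTK or pyspellchecker installed; the sketch also assumes that a corrector may return None for a word it cannot fix, which some versions of pyspellchecker appear to do.

```python
from typing import Callable, Iterable, List, Optional


def normalize_tokens(
    tokens: Iterable[str],
    stop_words: Iterable[str],
    correct: Callable[[str], Optional[str]],
    lemmatize: Callable[[str], str],
    stem: Callable[[str], str],
) -> List[str]:
    """Drop stop words, then spell-correct, lemmatize, and stem each remaining token."""
    stop_set = {word.lower() for word in stop_words}
    result = []
    for token in tokens:
        # Case-insensitive stop word check
        if token.lower() in stop_set:
            continue
        # Fall back to the original token if the corrector returns None
        fixed = correct(token) or token
        result.append(stem(lemmatize(fixed)))
    return result


# Hypothetical stand-ins for the real corrector, lemmatizer, and stemmer
toy_corrections = {"runing": "running"}


def toy_correct(word: str) -> Optional[str]:
    # Returns None for words it does not know, to exercise the fallback
    return toy_corrections.get(word)


def toy_lemmatize(word: str) -> str:
    return word


def toy_stem(word: str) -> str:
    # Crude suffix stripping, for illustration only
    return word[:-3] if word.endswith("ing") else word


cleaned = normalize_tokens(
    ["The", "dog", "is", "runing"],
    ["the", "is"],
    toy_correct,
    toy_lemmatize,
    toy_stem,
)
print(cleaned)
```

With the real libraries, the same function could presumably be called with spell.correction, lem.lemmatize, and pstem.stem as the three callables, and applied to each row with df["text"].apply(...), which avoids indexing rows by position.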


0 answers
