python-3.x - I want to lemmatize a dask dataframe, but I'm stuck

Asked 2019-03-04T08:18:24.710

Viewed 102 times

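I'm new to dask and was wondering if anyone could help me. I have a large text dataset (>20 GB) and need to lemmatize one of its columns. My current function, which works when used directly with pandas, is: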

from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

def lemmatizing(sentence):
    # Lemmatize each whitespace-separated word and rejoin with single spaces.
    return " ".join(wnl.lemmatize(word) for word in sentence.split())

Normally I would do the following:

df['news_content'] = df['news_content'].apply(lemmatizing)

I've been looking at `delayed`, but I'm confused about how to implement it.

Any help is much appreciated.
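One way the pandas call above might carry over to dask is sketched below. Note that it uses `map_partitions` on the column rather than `delayed`, since each dask partition is already an ordinary pandas Series. Several things here are assumptions and not part of the original setup: the data is assumed to live in CSV files, `path_pattern` and `out_dir` are hypothetical placeholders, the helper names are made up for illustration, and `news_content` is assumed to contain no missing values.

```python
def lemmatize_sentence(sentence, lemmatize_word):
    """Lemmatize each whitespace-separated word and rejoin with single spaces."""
    return " ".join(lemmatize_word(word) for word in sentence.split())


def lemmatize_partition(series):
    """Called once per partition; `series` is a plain pandas Series."""
    # Imported and built here so each worker creates its own lemmatizer
    # instead of receiving a pickled one from the client.
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    # Extra keyword arguments to Series.apply are forwarded to the function.
    return series.apply(lemmatize_sentence, lemmatize_word=wnl.lemmatize)


def lemmatize_news(path_pattern, out_dir):
    """Lazily lemmatize `news_content` and write the result back out as CSV."""
    import dask.dataframe as dd

    # Assumption: the dataset is a set of CSV files matching `path_pattern`.
    ddf = dd.read_csv(path_pattern)
    ddf["news_content"] = ddf["news_content"].map_partitions(
        lemmatize_partition,
        meta=("news_content", "object"),  # name and dtype of the output column
    )
    # Nothing has been computed so far; writing triggers the work,
    # one partition at a time, so the full 20 GB need not fit in memory.
    ddf.to_csv(out_dir + "/part-*.csv", index=False)
```

A call such as `lemmatize_news("news/*.csv", "news_lemmatized")` (both paths hypothetical) would then process the files partition by partition. Keeping the per-sentence logic in a small function that takes the lemmatizer as an argument also makes it easy to check on a few strings before running the whole dataset.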


0 answers
