python - nltk language model TypeError: ngrams() got an unexpected keyword argument 'pad_symbol'

Question

I am running the following code:

from nltk.corpus import brown
from nltk.model import NgramModel
lm = NgramModel(2, brown.words(categories='news'), estimator=None)

But I get the following error:

TypeError: ngrams() got an unexpected keyword argument 'pad_symbol'

I really don't understand why this happens. Is it a bug in the NLTK code? Does anyone know what I'm doing wrong?

Thanks in advance.

score 1 · Accepted Answer

Since this answer has not been updated in over 3 years, here is an example of n-gram language model code for NLTK v3.5:

from nltk.corpus import brown
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

# create a bigram model using Kneser-Ney smoothing
lm = KneserNeyInterpolated(2) # could also be MLE(2)
# use the Brown Corpus to train the language model
# padding adds <s> tags before a sentence, and </s> tags after a sentence
train, vocab = padded_everygram_pipeline(order=2, text=brown.sents())
# optionally, choose a category of the Brown Corpus to train a language model
# train, vocab = padded_everygram_pipeline(order=2, text=brown.sents(categories='news'))
lm.fit(train, vocab)  # train the model on the padded n-grams and the vocabulary
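To make the padding comment concrete, here is a minimal, dependency-free sketch of what a bigram pipeline of this kind produces and how an unsmoothed (MLE-style) bigram probability is read off the counts. It is only an illustration under stated assumptions: the helpers `pad`, `everygrams`, and `mle` are hypothetical stand-ins written for this example, not NLTK's implementation, and they use plain maximum-likelihood estimates rather than Kneser-Ney smoothing. With the real fitted model you would instead query something like `lm.score("cat", ["the"])`.

```python
from collections import Counter

# Hypothetical helpers for illustration only; they are not NLTK functions.

def pad(sentence, order=2, left="<s>", right="</s>"):
    """Add (order - 1) start tags before and end tags after a tokenized sentence."""
    return [left] * (order - 1) + list(sentence) + [right] * (order - 1)

def everygrams(tokens, max_len):
    """Return every n-gram of length 1..max_len as a tuple."""
    return [tuple(tokens[i:i + n])
            for i in range(len(tokens))
            for n in range(1, max_len + 1)
            if i + n <= len(tokens)]

# Toy training data standing in for brown.sents().
sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]
padded = [pad(s) for s in sents]
counts = Counter(g for s in padded for g in everygrams(s, 2))

def mle(word, context):
    """Unsmoothed bigram estimate: c(context, word) / c(context, *)."""
    total = sum(c for g, c in counts.items() if len(g) == 2 and g[0] == context)
    return counts[(context, word)] / total if total else 0.0

print(mle("cat", "the"))  # "the" is followed by "cat" once and "dog" once
```

Kneser-Ney interpolation replaces the plain ratio in `mle` with discounted counts mixed with a lower-order distribution, which is why `KneserNeyInterpolated` is usually preferred over `MLE` for unseen bigrams.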

score 0 · Accepted Answer

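From what I can tell, nltk.model still has some bugs (see NgramModel bugs), which is why it is not in nltk-master. Since the model branch is still under development, I downloaded the latest version, but I still get the same error as in your post.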

If you need this module and are willing to downgrade, I found a version in which it works: NLTK 2.0.4.
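One plausible reading of the error itself, assuming the version mismatch described above: the old `NgramModel` code passes a `pad_symbol` keyword to `ngrams()`, while the installed `ngrams()` only accepts differently named padding keywords. The sketch below reproduces that failure mode with hypothetical stand-in functions (`ngrams` and `old_model_code` here are not NLTK's real code, and the `left_pad_symbol`/`right_pad_symbol` names are an assumption about the newer signature).

```python
# Hypothetical stand-ins: NOT NLTK's real functions. They only show how a
# renamed keyword argument produces "got an unexpected keyword argument".

def ngrams(sequence, n, pad_left=False, pad_right=False,
           left_pad_symbol=None, right_pad_symbol=None):
    """Newer-style helper: padding uses separate left/right symbols."""
    seq = list(sequence)
    if pad_left:
        seq = [left_pad_symbol] * (n - 1) + seq
    if pad_right:
        seq = seq + [right_pad_symbol] * (n - 1)
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def old_model_code(words):
    """Older caller that still uses the removed 'pad_symbol' keyword."""
    return ngrams(words, 2, True, True, pad_symbol="")

try:
    old_model_code(["a", "b"])
    message = ""
except TypeError as exc:
    message = str(exc)

print(message)  # contains: ngrams() got an unexpected keyword argument 'pad_symbol'
print(ngrams(["a", "b"], 2, True, True, "<s>", "</s>"))
```

Because the caller and the callee come from different code generations, the only fixes are to make them match again: either install a release where both agree (such as the downgrade above) or move to the newer `nltk.lm` API.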
