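I have to generate random sentences with nltk. However, text.generate() seems to produce only sentences based on trigrams. Is there a way to extend it so that it can also use unigrams and bigrams?
My current code is: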
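import string
import nltk
from nltk.text import Text

# ln holds one line of input text; strip punctuation before tokenizing
exclude = set(string.punctuation)
ln = ''.join(ch for ch in ln if ch not in exclude)
words = nltk.word_tokenize(ln)
my_bigrams = nltk.bigrams(words)
my_trigrams = nltk.trigrams(words)
tText = Text(words)
tText1 = Text(my_bigrams)
tText2 = Text(my_trigrams)
# generate() prints its output itself, so it is called without print
tText.generate()
tText1.generate()
tText2.generate()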
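For reference, here is a minimal sketch of what the bigram and trigram sequences contain, written in plain Python so that it does not depend on a particular NLTK version. The helper `ngrams_of` is a hypothetical name, not an NLTK function. Each n-gram is a tuple, so as far as I can tell a Text built from them would treat every tuple as a single token rather than change the order of the model used by generate().

```python
def ngrams_of(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    tokens = list(tokens)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = "the cat sat on the mat".split()
unigrams = ngrams_of(words, 1)   # one-element tuples
bigrams = ngrams_of(words, 2)    # pairs of adjacent tokens
trigrams = ngrams_of(words, 3)   # triples of adjacent tokens

print(bigrams[:2])
print(trigrams[:2])
```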
Changes to the generate() function:
def generate(self, length=100, c=3):
    """
    Print random text, generated using an n-gram language model of order c.
    :param length: The length of text to generate (default=100)
    :type length: int
    :param c: The n-gram order of the model (default=3, i.e. trigrams)
    :type c: int
    :seealso: NgramModel
    """
    # The model is cached on first use, so a later call with a different c reuses it
    if '_trigram_model' not in self.__dict__:
        print("Building ngram index...")
        estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
        self._trigram_model = NgramModel(c, self, estimator=estimator)
    text = self._trigram_model.generate(length)
    print(tokenwrap(text))
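To make the intended behaviour concrete, below is a rough, library-independent sketch of generating text from a model of arbitrary order n (1 for unigrams, 2 for bigrams, 3 for trigrams). It is not how NLTK's NgramModel works internally: there is no smoothing or backoff, and the names `build_model` and `generate_text` are made up for this illustration. Each next token is simply drawn at random from the tokens that followed the current (n-1)-token context in the training text.

```python
import random
from collections import defaultdict


def build_model(tokens, n):
    """Map each (n-1)-token context to the list of tokens that followed it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model


def generate_text(tokens, n=3, length=20, seed=None):
    """Generate up to `length` tokens from an order-n model (n >= 1).

    Hypothetical helper for illustration; no smoothing or backoff is applied.
    """
    rng = random.Random(seed)
    model = build_model(tokens, n)
    if not model:
        # The training text is shorter than n tokens, so nothing can be generated.
        return []
    # Start from a randomly chosen context that occurs in the training text.
    output = list(rng.choice(sorted(model)))
    while len(output) < length:
        # For n == 1 the context is always empty (pure unigram sampling).
        context = tuple(output[-(n - 1):]) if n > 1 else ()
        followers = model.get(context)
        if not followers:
            # Dead end: this context only appeared at the very end of the text.
            break
        output.append(rng.choice(followers))
    return output[:length]


sample = "the cat sat on the mat and the cat ran".split()
for order in (1, 2, 3):
    print(order, " ".join(generate_text(sample, n=order, length=12, seed=0)))
```

With n=1 the words are drawn independently of each other, while larger n keeps longer runs of the original text intact, which is the trade-off behind choosing the order.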