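The code is taken from Natural Language Processing with Python, page 119: frequency of modals in different sections of the Brown Corpus. My problem is that it does not tabulate the output the way the book shows, and I can't work out why. I'm running Python 3.7.9 (64-bit), and all the packages installed without problems.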
Frequency of modals in different sections of the Brown Corpus
def tabulate(cfdist, words, categories):
    print('%-16s' % 'Category')
    for word in words: # column headings
        print('%6s' % word,)
    print()
    for category in categories:
        print('%-16s' % category,) # row headings
        for word in words: # for each word
            print('%6d' % cfdist[category][word]) # print table cell
        print() # end the row
import nltk
from nltk.corpus import brown

cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre))
genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
modals = ['can', 'could', 'may', 'might', 'must', 'will']
tabulate(cfd, modals, genres)
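A likely cause, for what it's worth: the trailing commas are the Python 2 way of suppressing the newline after a print statement. In Python 3, print is a function, so a comma inside the parentheses is just a trailing comma in the argument list and every call still ends the line. The usual Python 3 equivalent is the end=' ' keyword argument. Below is a minimal sketch of the same function written that way. So that it runs without downloading the corpus, it uses a small dict of Counter objects (toy_cfd, a made-up stand-in with invented counts, not real Brown Corpus figures) in place of cfd.

```python
from collections import Counter


def tabulate(cfdist, words, categories):
    """Print a category-by-word table of counts on aligned rows."""
    print('%-16s' % 'Category', end=' ')       # stay on the header line
    for word in words:                         # column headings
        print('%6s' % word, end=' ')
    print()                                    # end the header row
    for category in categories:
        print('%-16s' % category, end=' ')     # row heading
        for word in words:                     # one cell per word
            print('%6d' % cfdist[category][word], end=' ')
        print()                                # end the row


# Toy counts for illustration only, NOT real Brown Corpus figures.
# A Counter returns 0 for missing keys, much like a frequency distribution.
toy_cfd = {
    'news': Counter({'can': 3, 'will': 5}),
    'humor': Counter({'could': 2}),
}

tabulate(toy_cfd, ['can', 'could', 'will'], ['news', 'humor'])
```

With the real data, the call tabulate(cfd, modals, genres) should work unchanged with this version. NLTK's ConditionalFreqDist also has its own tabulate method, so cfd.tabulate(conditions=genres, samples=modals) is another way to get the table.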