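I know how to get bigram and trigram collocations using NLTK, and I have applied them to my own corpus. The code is below.
However, I am not sure about two things: (1) how do I get the collocations for a specific word, and (2) does NLTK have a collocation measure based on the log-likelihood ratio?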
from nltk.collocations import TrigramAssocMeasures, TrigramCollocationFinder
from nltk.tokenize import word_tokenize

text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence"

# Build a trigram finder from the tokenized text.
trigram_measures = TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(word_tokenize(text))

# Print every trigram with its PMI score, highest score first.
for i in finder.score_ngrams(trigram_measures.pmi):
    print(i)
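
For reference, here is a minimal sketch of one way both points might be approached. It is not necessarily the only or the best way. It assumes bigrams rather than trigrams, uses a plain `split()` instead of `word_tokenize` so that no tokenizer data is needed, and uses a hypothetical `target` word chosen only for illustration. The finder's `apply_ngram_filter` drops the n-grams for which the given function returns true, and the association-measure classes provide a `likelihood_ratio` scoring function.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")
tokens = text.split()  # assumption: whitespace split instead of word_tokenize

target = "sheep"  # hypothetical word of interest, for illustration only

bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)

# Remove every bigram that does not contain the target word.
finder.apply_ngram_filter(lambda *words: target not in words)

# Score the remaining bigrams with the log-likelihood ratio.
scored = finder.score_ngrams(bigram_measures.likelihood_ratio)
for ngram, score in scored:
    print(ngram, score)
```

The filter only removes bigrams from the finder's n-gram counts, so the scores are still computed against the word frequencies of the full text. The same pattern should carry over to trigrams with `TrigramCollocationFinder` and `TrigramAssocMeasures().likelihood_ratio`.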