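My bigram language model works fine when it is given one word of input, but when I give my trigram model two words, it behaves strangely and predicts "unknown" as the next word. My code: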
def get_unigram_probability(word):
    if word not in unigram:
        return 0
    return unigram[word] / total_words

def get_bigram_probability(words):
    if words not in bigram:
        return 0
    return bigram[words] / unigram[words[0]]

V = len(vocabulary)

def get_trigram_probability(words):
    if words not in trigram:
        return 0
    return trigram[words] + 1 / bigram[words[:2]] + V
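One thing worth checking in get_trigram_probability is operator precedence. As written, Python evaluates `1 / bigram[words[:2]]` first, so the return value is `trigram[words] + (1 / bigram[words[:2]]) + V`, which is not the add-one (Laplace) estimate (count + 1) / (context count + V). A minimal sketch of what that estimate might look like, assuming the counts are dicts keyed by tuples; the toy counts and the name `laplace_trigram_probability` are made up for illustration:

```python
# Toy counts, made up for illustration; the real ones come from the corpus.
bigram = {("i", "am"): 2}
trigram = {("i", "am", "here"): 1}
V = 4  # assumed vocabulary size


def laplace_trigram_probability(words):
    """Add-one estimate: (count(w1, w2, w3) + 1) / (count(w1, w2) + V)."""
    tri_count = trigram.get(words, 0)    # an unseen trigram counts as 0
    bi_count = bigram.get(words[:2], 0)  # an unseen context counts as 0
    return (tri_count + 1) / (bi_count + V)


print(laplace_trigram_probability(("i", "am", "here")))   # (1 + 1) / (2 + 4)
print(laplace_trigram_probability(("i", "am", "there")))  # (0 + 1) / (2 + 4)
```

With smoothing in place, an unseen trigram gets a small non-zero probability, so the early `return 0` for unseen trigrams may no longer be needed.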
For bigram next-word prediction:

def find_next_word_bigram(words):
    candidate_list = []
    # Compute the probability of each vocabulary word following the last input word
    for word in vocabulary:
        p2 = get_bigram_probability((words[-1], word))
        candidate_list.append((word, p2))
    # Sort so that the most probable words come first
    candidate_list.sort(key=lambda x: x[1], reverse=True)
    # print(candidate_list)
    return candidate_list[0]
For trigram next-word prediction:

def find_next_word_trigram(words):
    candidate_list = []
    # Compute the probability of each vocabulary word following the last two input words
    for word in vocabulary:
        p3 = get_trigram_probability((words[-2], words[-1], word)) if len(words) >= 3 else 0
        candidate_list.append((word, p3))
    # Sort so that the most probable words come first
    candidate_list.sort(key=lambda x: x[1], reverse=True)
    # print(candidate_list)
    return candidate_list[0]
I just want to know where in my code I should make a change so that the trigram model can predict the next word when given an input of two words.
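A likely place to look is the `len(words) >= 3` guard in find_next_word_trigram. With two input words, `len(words)` is 2, so every candidate gets probability 0, and since the sort is stable the function returns whichever word happens to come first in `vocabulary`. A trigram only needs two words of context, so the check could be `len(words) >= 2`. A minimal sketch under that assumption, with toy data made up for illustration and an add-one estimate written with explicit parentheses:

```python
# Toy data, made up for illustration; the real counts come from the corpus.
vocabulary = ["unknown", "am", "here", "there"]
bigram = {("i", "am"): 2}
trigram = {("i", "am", "here"): 2}
V = len(vocabulary)


def get_trigram_probability(words):
    # Add-one smoothing: (count(w1, w2, w3) + 1) / (count(w1, w2) + V)
    return (trigram.get(words, 0) + 1) / (bigram.get(words[:2], 0) + V)


def find_next_word_trigram(words):
    # A trigram needs two words of context, not three.
    if len(words) < 2:
        raise ValueError("need at least two context words")
    context = (words[-2], words[-1])
    candidates = [(w, get_trigram_probability(context + (w,))) for w in vocabulary]
    # max() returns the first candidate with the highest probability
    return max(candidates, key=lambda c: c[1])


print(find_next_word_trigram(["i", "am"]))
```

Here the two-word input `["i", "am"]` is enough to score every vocabulary word, and the candidate with the highest smoothed probability is returned as a `(word, probability)` pair.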