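I am trying to apply sentence_bleu to a column in Pandas to evaluate the quality of some machine translations, but the scores it outputs are incorrect. Can anyone see my mistake?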
import pandas as pd
from nltk.translate.bleu_score import sentence_bleu
translations = {
'reference': [['this', 'is', 'a', 'test'],['this', 'is', 'a', 'test'],['this', 'is', 'a', 'test']],
'candidate': [['this', 'is', 'a', 'test'],['this', 'is', 'not','a', 'quiz'],['I', 'like', 'kitties', '.']]
}
df = pd.DataFrame(translations)
df['BLEU'] = df.apply(lambda row: sentence_bleu(row['reference'],row['candidate']), axis=1)
df
It outputs this:
Index  reference            candidate                  BLEU
0      [this, is, a, test]  [this, is, a, test]        1.288230e-231
1      [this, is, a, test]  [this, is, not, a, quiz]   1.218332e-231
2      [this, is, a, test]  [I, like, kitties, .]      0.000000e+00
Row 0 should equal 1.0, and row 1 should be less than 1.0, probably around 0.9. What exactly am I doing wrong?
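One likely cause, offered as a sketch rather than a confirmed diagnosis: sentence_bleu takes a list of references as its first argument, where each reference is itself a list of tokens. Passing a single token list makes each word get treated as a separate reference and compared character by character. The version below assumes that is the issue and wraps each reference in a list; everything else is the same data as above.

```python
import pandas as pd
from nltk.translate.bleu_score import sentence_bleu

translations = {
    'reference': [['this', 'is', 'a', 'test'],
                  ['this', 'is', 'a', 'test'],
                  ['this', 'is', 'a', 'test']],
    'candidate': [['this', 'is', 'a', 'test'],
                  ['this', 'is', 'not', 'a', 'quiz'],
                  ['I', 'like', 'kitties', '.']],
}
df = pd.DataFrame(translations)

# Assumption: the wrong scores come from passing one token list where
# sentence_bleu expects a list of reference token lists.
# Wrapping the single reference in a list gives it the expected shape.
df['BLEU'] = df.apply(
    lambda row: sentence_bleu([row['reference']], row['candidate']),
    axis=1,
)
print(df)
```

With this change row 0 scores 1.0. Row 1 shares no 3-gram or 4-gram with the reference, so the default 4-gram BLEU without smoothing still comes out close to zero rather than around 0.9 (NLTK emits a warning about this). Passing a smoothing_function from nltk's SmoothingFunction class, or shorter n-gram weights, may give a score closer to what you expect.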