I am following the original BLEU score example, shown below:
from nltk.translate.bleu_score import sentence_bleu
reference = [['this', 'is', 'a', 'test'], ['this', 'is', 'test']]
candidate = ['this', 'is', 'a', 'test']
score = sentence_bleu(reference, candidate)
print(score)
This code works fine. However, I am now trying to build reference and candidate from CSV files instead, using the following code:
import nltk
import csv
import itertools
from nltk.translate.bleu_score import sentence_bleu

print("Opening references file...")
with open('bleu-ref.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    sentences = []
    for row in spamreader:
        # print(', '.join(row))
        sentences.append(' '.join(row))
sent = [[i] for i in sentences]
reference = []
for i in range(len(sent)):
    sent[i]
    chink = []
    for j in sent[i]:
        chink = chink + nltk.word_tokenize(j)
    reference.append(chink)

print("Opening candidates file...")
with open('bleu-can.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    sentences_can = []
    for row in spamreader:
        # print(', '.join(row))
        sentences_can.append(' '.join(row))
sent_can = [[i] for i in sentences_can]
candidate = []
for i in range(len(sent_can)):
    sent_can[i]
    chink_can = []
    for j in sent_can[i]:
        chink_can = chink_can + nltk.word_tokenize(j)
    candidate.append(chink_can)

score = sentence_bleu(reference, candidate)
But it fails with an error:
Traceback (most recent call last):
  File "nltk-bleu-score.py", line 56, in <module>
    score = sentence_bleu(reference, candidate)
  File "C:\Users\Fachri\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py", line 89, in sentence_bleu
    emulate_multibleu)
  File "C:\Users\Fachri\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py", line 162, in corpus_bleu
    p_i = modified_precision(references, hypothesis, i)
  File "C:\Users\Fachri\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py", line 292, in modified_precision
    counts = Counter(ngrams(hypothesis, n)) if len(hypothesis) >= n else Counter()
  File "C:\Users\Fachri\Anaconda3\lib\collections\__init__.py", line 535, in __init__
    self.update(*args, **kwds)
  File "C:\Users\Fachri\Anaconda3\lib\collections\__init__.py", line 622, in update
    _count_elements(self, iterable)
TypeError: unhashable type: 'list'
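For context on the last frames of the traceback: Counter stores each n-gram as a dictionary key, so every n-gram has to be hashable. The sketch below reproduces the same kind of failure using only the standard library. Note that simple_ngrams is a hypothetical stand-in written for illustration, not NLTK's ngrams, and the two sample inputs are assumptions chosen to contrast a flat token list with a list of token lists.

```python
from collections import Counter


def simple_ngrams(tokens, n):
    """Hypothetical stand-in for an n-gram helper: sliding tuples of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumed sample inputs: a flat list of tokens and a list wrapping one token list.
flat = ['this', 'is', 'a', 'test']
nested = [['this', 'is', 'a', 'test']]

# Tuples of strings are hashable, so counting them works.
flat_counts = Counter(simple_ngrams(flat, 1))
print(flat_counts)

# Here each "token" is itself a list, so every n-gram is a tuple holding a list,
# which cannot be hashed and therefore cannot be counted.
error_message = None
try:
    Counter(simple_ngrams(nested, 1))
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

If this is what happens inside modified_precision, it would suggest that the hypothesis passed in contains lists where plain string tokens are expected.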
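Then I checked the type of reference and candidate in both the original and the modified code, and both return the same type: list.
I am confused about what makes these lists different.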
The reference and candidate lists look like this:
Opening references file...
[['two', 'airplanes', 'are', 'waiting', 'on', 'the', 'tarmac'], ['Two', 'airplanes', 'parked', 'at', 'the', 'airport', '.']]
Opening candidates file...
[['An', 'airplane', 'sitting', 'on', 'the', 'tarmac', 'at', 'an', 'airport', 'with', 'another', 'plane', 'in', 'the', 'background', '.']]
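Since type() only reports the outermost container, one way to compare the two setups is to count how many list levels wrap the tokens. This is a minimal sketch: nesting_depth is a hypothetical helper name introduced here, and the sample values are copied from the working example and (abbreviated) from the printed candidate above.

```python
def nesting_depth(obj):
    """Count how many list levels wrap the first innermost item (0 for a non-list)."""
    depth = 0
    while isinstance(obj, list):
        depth += 1
        if not obj:  # an empty list has nothing further to inspect
            break
        obj = obj[0]
    return depth


# Values from the working example.
working_reference = [['this', 'is', 'a', 'test'], ['this', 'is', 'test']]
working_candidate = ['this', 'is', 'a', 'test']

# Abbreviated copy of the candidate printed by the CSV-based script.
csv_candidate = [['An', 'airplane', 'sitting', 'on', 'the', 'tarmac']]

# type() looks identical for both candidates.
print(type(working_candidate), type(csv_candidate))

# The nesting depth does not.
print("working reference depth:", nesting_depth(working_reference))
print("working candidate depth:", nesting_depth(working_candidate))
print("CSV candidate depth:", nesting_depth(csv_candidate))
```

In this sketch the working candidate has depth 1 while the CSV-built one has depth 2, which may be the difference worth checking against what sentence_bleu expects for its second argument.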