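I am working on isolated-word recognition. I have a training corpus and a test corpus, with one audio recording per word. Using Praat, I extracted the MFCCs for each word. For example: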
File type = "ooTextFile"
Object class = "MFCC 1"
xmin = 0
xmax = 1.244920634920635
nx = 121
dx = 0.01
x1 = 0.02246031746031751
fmin = 0
fmax = 3900
maximumNumberOfCoefficients = 37
frame []:
frame [1]:
numberOfCoefficients = 6
c0 = 739.4967520397784
c []:
c [1] = -17.706854515821973
c [2] = 68.74208039764812
c [3] = 19.417181363049988
c [4] = 18.23020375368816
c [5] = 26.51577939943669
c [6] = 4.862083512334764
I then compare the words pairwise and compute the DTW distance (also with Praat):
91.69273804739957 (= distance at (0.1, 0.1))
140.1880158235773 (weighted distance)
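However, I don't know how to "read" these results. How do I work out the recognition rate? And to decide whether a word has been recognized, which of these values should I use?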
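To make the question concrete, here is a minimal sketch of what I understand by "recognition rate", assuming a nearest-neighbour scheme: each test word is given the label of the training word with the smallest DTW distance, and the rate is the fraction of test words labelled correctly. The function names, the labels and the distance values are hypothetical. I also assume that the same Praat value (for example the weighted distance) is used for every pair.

```python
def recognize(distances, train_labels):
    """Return the label of the training word with the smallest DTW distance."""
    best = min(range(len(distances)), key=lambda i: distances[i])
    return train_labels[best]


def recognition_rate(distance_matrix, train_labels, test_labels):
    """Fraction of test words whose nearest training word has the true label."""
    correct = 0
    for row, truth in zip(distance_matrix, test_labels):
        if recognize(row, train_labels) == truth:
            correct += 1
    return correct / len(test_labels)


# Hypothetical data: one row per test word, one column per training word.
train_labels = ["yes", "no", "stop"]
test_labels = ["no", "yes", "stop", "yes"]
distance_matrix = [
    [140.2, 95.1, 160.7],   # nearest is "no"   (correct)
    [88.4, 131.0, 150.3],   # nearest is "yes"  (correct)
    [120.5, 118.9, 119.4],  # nearest is "no"   (wrong, truth is "stop")
    [101.3, 140.2, 133.8],  # nearest is "yes"  (correct)
]

rate = recognition_rate(distance_matrix, train_labels, test_labels)
print(f"Recognition rate: {rate:.0%}")  # Recognition rate: 75%
```

With this scheme a single distance such as 140.19 means little on its own. Only its rank against the distances to the other training words matters.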