I have recently been working on a sound detection project using HTK (the Hidden Markov Model Toolkit). After testing, I get the following result file:

#!MLF!#
"../data/test/keyboard_04.rec"
0 47000000 keyboard -83909.929688
.

According to the official documentation, time stamps are in units of 100 ns, so this result says there is a "keyboard" sound from 0 s to 4.7 s. The odd thing is that the test sound file is only 1.9 s long. Here is the detailed information:

>>  audioinfo('keyboard_04.wav')
ans = 
         Filename: [1x50 char]
CompressionMethod: 'Uncompressed'
      NumChannels: 2
       SampleRate: 44100
     TotalSamples: 83712
         Duration: 1.8982
            Title: []
          Comment: []
           Artist: []
    BitsPerSample: 24
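
As a sanity check on the arithmetic, here is a minimal Python sketch that converts the label end time to seconds and compares it with the duration implied by the `audioinfo` output. The function name `htk_time_to_seconds` is hypothetical and not part of HTK; the numbers are copied from the result file and the output above.

```python
# Minimal sketch: compare the HTK label end time with the real audio duration.
# HTK label times are integers in units of 100 ns, i.e. 10,000,000 per second.
HTK_UNITS_PER_SECOND = 10_000_000


def htk_time_to_seconds(htk_time):
    """Convert an HTK time stamp (100 ns units) to seconds. Hypothetical helper."""
    return htk_time / HTK_UNITS_PER_SECOND


label_end = 47000000   # end time from the .rec entry
total_samples = 83712  # TotalSamples from audioinfo
sample_rate = 44100    # SampleRate from audioinfo

label_seconds = htk_time_to_seconds(label_end)
audio_seconds = total_samples / sample_rate

print(f"label end: {label_seconds:.4f} s")
print(f"audio:     {audio_seconds:.4f} s")
```

The two values differ by more than a factor of two, so the mismatch is not a rounding issue.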

Moreover, when I run HVite, I get a warning:

WARNING [-7032]  OWarn: change HMM Set vecSize

Could this be related to my problem?

Does anybody know why the time stamp is so large? Thanks in advance!

1 Answer

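Ah, I see why there is such a time difference. The time stamps in the HTK result are a "total frame time" that adds up the full length of every frame, even though the frames overlap. In my example, the window size is 25 ms, the window step is 10 ms, and there are 188 frames in total.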

As HTK counts it, 188 * 0.025 = 4.7 s. This figure does not take the overlap into account.

Taking the overlap into account, 0.025 + 187 * 0.01 = 1.895 s, which matches the audio duration.
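
The two calculations can be sketched in Python as follows. This only reproduces the arithmetic described here, using the 25 ms window, 10 ms step, and 188 frames from my setup; the function names are hypothetical, and the sketch does not call HTK or claim to mirror its internals.

```python
# Minimal sketch of the two ways of turning a frame count into a duration.
# Assumed values: 25 ms window, 10 ms step, 188 frames.


def summed_frame_duration(n_frames, window_s):
    """Duration if every frame's full window length is simply added up."""
    return n_frames * window_s


def overlapped_duration(n_frames, window_s, step_s):
    """Duration actually covered when consecutive frames overlap."""
    if n_frames <= 0:
        return 0.0
    # The first frame covers one full window; each later frame adds one step.
    return window_s + (n_frames - 1) * step_s


n_frames = 188
window_s = 0.025
step_s = 0.010

print(f"summed:     {summed_frame_duration(n_frames, window_s):.3f} s")
print(f"overlapped: {overlapped_duration(n_frames, window_s, step_s):.3f} s")
```

The first value corresponds to the 4.7 s end time in the result file, and the second is close to the 1.8982 s reported by `audioinfo`.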

What an odd convention in HTK, haha.

answered 2015-05-25T11:43:14.973