
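I am following an article on how to generate a cepstrum for detecting speech formants, and I am implementing it with the Accelerate framework on iPhone. However, my results do not match what the article predicts. For the unvoiced segment (Figure 3 in the article), the article shows small values in the first few bins. When my code runs, though, the unvoiced segment has large values (close to 1.0) and looks more like the voiced segment.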

Here is my code:

// copy buffer data into a separate array and apply hamming window
// don't use leadlength because we copied to beginning of buffer
int offset = (int)(s * stepSize);
float *hamBuffer = malloc(n*sizeof(float));
for (int i=0; i < n; i++)
    hamBuffer[i] = hpBuffer[offset+i] * ((1.0f-0.46f) - 0.46f*cos(TWOPI*i/((float)n-1.0f)));

// configure float array into acceptable input array format (interleaved)
vDSP_ctoz((COMPLEX*)hamBuffer, 2, &complexArray, 1, halfN);

// free ham buffer
free(hamBuffer);

// run FFT
vDSP_fft_zrip(setupReal, &complexArray, stride, log2n, FFT_FORWARD);

// Absolute square (equivalent to mag^2)
complexArray.imagp[0] = 0.0f;
vDSP_zvmags(&complexArray, 1, complexArray.realp, 1, halfN);
bzero(complexArray.imagp, (halfN) * sizeof(float));

// scale
float scale = 1.0f / (2.0f*(float)n);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, halfN);

// get log of absolute values for passing to inverse FFT for cepstrum
float *logmag = malloc(sizeof(float)*halfN);
for (int i=0; i < halfN; i++)
    logmag[i] = log10f(fabsf(complexArray.realp[i]));

// configure float array into acceptable input array format (interleaved)
vDSP_ctoz((COMPLEX*)logmag, 2, &complexArray, 1, halfN/2);

// create cepstrum
vDSP_fft_zrip(setupReal, &complexArray, stride, log2n-1, FFT_INVERSE);

// scale again
scale = (float) 1.0 / (2 * n);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, halfN);
vDSP_vsmul(complexArray.imagp, 1, &scale, complexArray.imagp, 1, halfN);

//convert interleaved to real
float *displayData = malloc(sizeof(float)*n);
vDSP_ztoc(&complexArray, 1, (COMPLEX*)displayData, 2, halfN);    

// print cepstrum to debug window
for (int i=0; i < halfN; i++)
    printf("%f\r\n", displayData[i]);  

Here are the results for the first few bins:

-1.036735
0.807992
-0.030310
0.201064
-0.048442
0.071084
-0.050529
0.108412
-0.037282
0.080372
-0.003775
0.102596
-0.027706
0.044470
0.010319
0.041597
-0.050533
0.012725
-0.003895
-0.016887
-0.010547

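They do "settle" toward zero, but the first few numbers are much larger than I expected for an unvoiced segment. Does my code look incorrect? I believe I followed the article very closely. Why am I getting such large values in the first few bins for the unvoiced segment?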
