mfcc - Sound classification using MFCC and dynamic time warping (DTW)

Question

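My goal is to classify non-speech signals in Java using MFCC and DTW, but I am stuck midway and would appreciate any help. I have computed 13 MFCC values for each frame, but some of the values are negative, and I am not sure whether the process I am following is right or wrong. I am currently using the code provided by JAudio. I have also tried other code, and it gives me negative values too.

Second, I get 13 coefficients per frame, and since a sample of a certain length has 157 frames, I end up with 157 sets of 13 MFCCs. I am struggling with how to use all the coefficients in DTW, because DTW only gives the closest distance between two time signals. I do have DTW code that compares two time signals, but I am not sure how to use all of a signal's MFCC values as features.

Am I missing some key classification step? Please help.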

score 0 · Accepted Answer

Take a look at http://code.google.com/p/aquila/ and specifically at http://code.google.com/p/aquila/source/browse/trunk/examples/dtw_distance/main.cpp, which contains example code for computing a DTW distance.

score 0 · Accepted Answer

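Suppose you have N1 sets of 13 MFCCs for the first signal and N2 sets of MFCCs for the second signal. You should compute the distance between every set in the first signal and every set in the second signal (you can use the Euclidean distance between the two 13-element arrays).

This leaves you with an N1xN2 two-dimensional array, to which you should now apply DTW.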
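
As a rough illustration of that step, here is a minimal NumPy sketch that builds the N1xN2 distance matrix. The function name `pairwise_euclidean` and the random arrays standing in for real MFCC frames are assumptions made for the example, not something from the original post.

```python
import numpy as np

def pairwise_euclidean(mfcc_a, mfcc_b):
    """Return the (N1, N2) matrix of Euclidean distances between frames."""
    a = np.asarray(mfcc_a, dtype=float)   # shape (N1, 13)
    b = np.asarray(mfcc_b, dtype=float)   # shape (N2, 13)
    # Broadcasting pairs every frame of a with every frame of b.
    diff = a[:, None, :] - b[None, :, :]  # shape (N1, N2, 13)
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical data: random numbers in place of real MFCC frames.
rng = np.random.default_rng(0)
signal_a = rng.normal(size=(157, 13))  # 157 frames, 13 coefficients each
signal_b = rng.normal(size=(120, 13))  # a second signal of different length
dist = pairwise_euclidean(signal_a, signal_b)
print(dist.shape)  # (157, 120)
```

Entry `dist[i, j]` is the local cost of aligning frame i of the first signal with frame j of the second, so each frame's 13 coefficients are used together as one feature vector.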

score 0 · Accepted Answer

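Using DTW assumes that, in your case, you are verifying two audio sequences. So you will have an M1xN matrix for the sequence to be verified and an M2xN matrix for the query. This means your cost matrix will be M1xM2.

To build the cost matrix, you have to apply a distance/cost measure between the sequences, as in cost(i,j) = your_chosen_multidimension_metric(M1[i,:],M2[j,:])

The resulting cost matrix will be 2D, and you can easily apply DTW to it.

I wrote similar code for DTW based on MFCCs. Below is a Python implementation that returns the DTW score; x and y are the MFCC matrices of the speech sequences, with dimensions M1xN and M2xN: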

import numpy as np
from scipy.spatial.distance import cdist

def my_dtw(x, y):
    # Frame-to-frame standardized Euclidean distances, shape (M1, M2).
    cost_matrix = cdist(x, y, metric='seuclidean')
    m, n = np.shape(cost_matrix)
    # Accumulate in place: each cell becomes the cheapest path cost from (0, 0).
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue  # the start cell keeps its own local cost

            elif i == 0:
                cost_matrix[i, j] += cost_matrix[i, j-1]

            elif j == 0:
                cost_matrix[i, j] += cost_matrix[i-1, j]

            else:
                # Cheapest of the three allowed predecessor cells.
                min_local_dist = min(cost_matrix[i-1, j],
                                     cost_matrix[i, j-1],
                                     cost_matrix[i-1, j-1])
                cost_matrix[i, j] += min_local_dist
    return cost_matrix[m-1, n-1]
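
To connect a DTW score back to the classification goal in the question, one common option is nearest-template matching: compute the DTW score between the query and one labelled reference recording per class, then pick the label with the smallest score. The sketch below is only an illustration under that assumption. The names `dtw_distance` and `classify`, the labels, and the synthetic data are hypothetical, and it uses plain Euclidean frame distances rather than the standardized metric above.

```python
import numpy as np

def dtw_distance(x, y):
    """Accumulated DTW cost between MFCC matrices x (M1, N) and y (M2, N)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Frame-to-frame Euclidean distances, shape (M1, M2).
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    m, n = cost.shape
    # acc[i, j] holds the cheapest path cost ending at cost[i-1, j-1];
    # the extra border of infinities handles the first row and column.
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[m, n]

def classify(query, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Hypothetical data: synthetic matrices in place of real MFCC features.
rng = np.random.default_rng(0)
templates = {
    "door_slam": rng.normal(loc=0.0, size=(40, 13)),
    "glass_break": rng.normal(loc=5.0, size=(55, 13)),
}
# A shorter, slightly noisy version of one template serves as the query.
query = templates["glass_break"][::2] + rng.normal(scale=0.1, size=(28, 13))
print(classify(query, templates))
```

The accumulated cost grows with sequence length, so when reference recordings differ a lot in duration it may be worth dividing the score by something like M1 + M2 before comparing; whether that helps depends on the data.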
