machine-learning - How to train a machine learning algorithm using MFCC coefficient vectors?

Question

For my final year project I am trying to identify dog/bark/bird sounds in real time (by recording sound clips). I am using MFCCs as the audio features. Initially I extracted 12 MFCC vectors in total from a sound clip using the jAudio library. Now I'm trying to train a machine learning algorithm (I have not decided on the algorithm yet, but it will most probably be an SVM). Each sound clip is around 3 seconds long. I need to clarify some points about this process:

Do I have to train this algorithm using frame-based MFCCs (12 per frame) or overall clip-based MFCCs (12 per sound clip)?
To train the algorithm, do I have to treat the 12 MFCCs as 12 different attributes, or as a single attribute?

These are the overall MFCCs for the clip:

-9.598802712290967 -21.644963856237265 -7.405551798816725 -11.638107212413201 -19.441831623156144 -2.780967392843105 -0.5792847321137902 -13.14237288849559 -4.920408873192934 -2.7111507999281925 -7.336670942457227 2.4687330348335212

Any help with these problems would be really appreciated. I couldn't find good help on Google. :)

score 5 · Accepted Answer

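You should calculate MFCCs per frame. Since your signal varies over time, computing them over the whole clip wouldn't make sense. Worse, you might end up with dog and bird sounds having similar representations. I'd experiment with several frame lengths; in general, they will be on the order of milliseconds.
All of them should be separate features. Let the machine learning algorithm decide which are the best predictors.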
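
As a rough illustration of both points, the sketch below splits a clip into short overlapping frames and turns each frame's 12 MFCCs into one training row with 12 separate attributes. The 25 ms frame, 10 ms hop and 16 kHz sample rate are assumed starting values, not something prescribed here, and `frame_bounds` and `build_rows` are hypothetical helper names. The MFCC values themselves are assumed to come from whatever extractor is in use (jAudio in the question).

```python
def frame_bounds(num_samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return (start, end) sample indices of each full frame in a clip."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")
    bounds = []
    start = 0
    # Keep only frames that fit entirely inside the clip.
    while start + frame_len <= num_samples:
        bounds.append((start, start + frame_len))
        start += hop_len
    return bounds


def build_rows(clips):
    """Flatten labelled clips into per-frame training rows.

    clips: list of (label, frames), where frames is a list of
    12-element MFCC vectors, one per frame (assumed precomputed).
    Returns (X, y): one row per frame, one column per coefficient.
    """
    X, y = [], []
    for label, frames in clips:
        for mfcc in frames:
            if len(mfcc) != 12:
                raise ValueError("expected 12 MFCCs per frame")
            X.append(list(mfcc))  # 12 separate attributes
            y.append(label)       # every frame inherits the clip's label
    return X, y


# A 3-second clip at an assumed 16 kHz sample rate.
bounds = frame_bounds(num_samples=3 * 16000, sample_rate=16000)
```

With these assumed settings a 3-second clip yields 298 frames, so a single recording contributes 298 labelled rows instead of one.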

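Mind that MFCCs are sensitive to noise, so check how your samples sound first. The Yaafe library, for example, offers a much richer choice of audio features for extraction, many of which will serve you better in this case. Which specifically? Here is what I found most useful in classifying bird calls: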

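Spectral flatness
Perceptual spread
Spectral rolloff
Spectral decrease
Spectral shape statistics
Spectral slope
Linear Predictive Coding (LPC)
Line Spectral Pairs (LSP)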
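
To give a feel for what two of these features measure, here is a minimal pure-Python sketch of spectral flatness and spectral rolloff for a single frame. It uses common textbook definitions (geometric over arithmetic mean of the power spectrum, and the frequency below which 85% of the energy lies); Yaafe's own definitions and defaults may differ in detail, and the function names are hypothetical. The naive DFT is only meant for short demo frames.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for a short demo frame)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Close to 1 for noise-like frames, close to 0 for tonal frames.
    """
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power)
    return math.exp(log_mean) / (arith_mean + eps)


def spectral_rolloff(power, sample_rate, n_fft, fraction=0.85):
    """Frequency in Hz below which `fraction` of the spectral energy lies."""
    threshold = fraction * sum(power)
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= threshold:
            return k * sample_rate / n_fft
    return (len(power) - 1) * sample_rate / n_fft


# Demo: a pure 1000 Hz tone sampled at an assumed 8 kHz, 64 samples long.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
tone_power = power_spectrum(tone)
tone_flatness = spectral_flatness(tone_power)
tone_rolloff = spectral_rolloff(tone_power, sample_rate=8000, n_fft=64)
```

For the pure tone, flatness comes out close to zero and the rolloff sits at the tone's frequency, while a broadband frame pushes flatness towards one. These values would be appended to each frame's row as additional separate attributes.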

You might find it interesting to check out this project, especially the part where I interface with Yaafe.

Back in the day I used SVMs, exactly as you are planning. Today I would definitely go with gradient boosting.
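
Whichever classifier is chosen, training on per-frame vectors produces one prediction per frame, so a clip-level decision still needs an aggregation step. A minimal sketch, assuming a simple majority vote (one option among several, not something specified above), with `classify_clip` as a hypothetical helper name:

```python
from collections import Counter


def classify_clip(frame_predictions):
    """Return the most common per-frame label and its share of the frames."""
    if not frame_predictions:
        raise ValueError("need at least one frame prediction")
    label, count = Counter(frame_predictions).most_common(1)[0]
    return label, count / len(frame_predictions)


# Per-frame labels as they might come back from a trained model (made-up values).
label, share = classify_clip(["dog", "bird", "dog", "dog"])
```

The returned share can serve as a crude confidence value, for instance to reject clips where no class clearly dominates.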
