For my final-year project I am trying to identify dog/bark/bird sounds in real time (by recording sound clips). I am using MFCCs as the audio features. So far I have extracted 12 MFCCs from a sound clip using the jAudio library. Now I'm trying to train a machine learning algorithm (I have not decided on the algorithm yet, but it will most probably be an SVM). Each sound clip is about 3 seconds long. I have two questions about this process:

  1. Should I train the algorithm on frame-based MFCCs (12 per frame) or on overall clip-based MFCCs (12 per sound clip)?

  2. When training the algorithm, should I treat the 12 MFCCs as 12 different attributes, or as a single attribute?

These are the overall MFCCs for the clip:

-9.598802712290967 -21.644963856237265 -7.405551798816725 -11.638107212413201 -19.441831623156144 -2.780967392843105 -0.5792847321137902 -13.14237288849559 -4.920408873192934 -2.7111507999281925 -7.336670942457227 2.4687330348335212

Any help overcoming these problems would be really appreciated. I couldn't find any good help on Google. :)

1 Answer

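  1. You should compute MFCCs per frame. Your signal changes over time, so computing them over the whole clip doesn't make sense. Worse, you might end up with dogs and birds that have similar representations. I would try several frame lengths. Typically, they are measured in milliseconds.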

  2. All of them should be separate features. Let the machine learning algorithm decide which ones are the best predictors.
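To make both points concrete, here is a minimal Python sketch using only the standard library. It rests on several assumptions:

  • the signal is mono and given as a plain list of samples;
  • frames are 25 ms long with a 10 ms hop (common choices picked for illustration; the answer only says to try several frame lengths);
  • a per-frame extractor such as jAudio or Yaafe supplies the 12 MFCCs for each frame;
  • `frame_signal` and `frames_to_rows` are hypothetical helper names.

Each frame becomes one training row with 12 separate numeric columns plus the clip's label.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped (no padding).
    frame_ms and hop_ms are illustrative defaults, not values from the answer.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must each span at least one sample")
    if len(samples) < frame_len:
        return []
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


def frames_to_rows(frame_mfccs, label, n_coeffs=12):
    """Build one training row per frame.

    Each of the n_coeffs MFCCs is its own attribute (column); the clip's
    class label is appended as the last column of every row.
    """
    rows = []
    for coeffs in frame_mfccs:
        if len(coeffs) != n_coeffs:
            raise ValueError(f"expected {n_coeffs} MFCCs per frame, got {len(coeffs)}")
        rows.append(list(coeffs) + [label])
    return rows
```

For a 3-second clip sampled at 16 kHz (48,000 samples), these defaults give 298 frames of 400 samples each. That clip therefore contributes 298 labelled rows instead of a single clip-level row.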

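Note that MFCCs are sensitive to noise, so check what your samples sound like first. The Yaafe library offers a much richer choice of audio features, and many of them would do better in your case. Which ones exactly? Here is what I found most useful in bird-call classification: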

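  • Spectral flatness
  • Perceptual spread
  • Spectral rolloff
  • Spectral decrease
  • Spectral shape statistics
  • Spectral slope
  • Linear predictive coding (LPC)
  • Line spectral pairs (LSP)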
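Here is a hand-written Python sketch of two of the features listed above:

  • spectral flatness: the geometric mean of the power spectrum divided by its arithmetic mean;
  • spectral rolloff: the lowest frequency below which a given fraction of the spectral energy lies.

These are generic textbook-style definitions, not Yaafe's implementation, and Yaafe's exact formulas and defaults may differ. The 0.85 rolloff fraction and the `eps` guard are assumptions. The input is assumed to be the power spectrum of one frame, with `bin_hz` equal to the sample rate divided by the FFT size.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean / arithmetic mean of a frame's power spectrum.

    Near 1.0 for a flat, noise-like spectrum; near 0.0 when a few bins
    dominate (tonal sound). eps avoids log(0) and division by zero.
    """
    if not power_spectrum:
        raise ValueError("empty spectrum")
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    geometric = math.exp(log_mean)
    arithmetic = sum(power_spectrum) / n + eps
    return geometric / arithmetic


def spectral_rolloff(power_spectrum, bin_hz, fraction=0.85):
    """Lowest bin frequency at which the cumulative energy reaches `fraction` of the total.

    fraction=0.85 is an assumed default; other tools use other thresholds.
    """
    if not power_spectrum:
        raise ValueError("empty spectrum")
    total = sum(power_spectrum)
    cumulative = 0.0
    for k, p in enumerate(power_spectrum):
        cumulative += p
        if cumulative >= fraction * total:
            return k * bin_hz
    return (len(power_spectrum) - 1) * bin_hz
```

A flat spectrum such as `[1.0] * 8` gives a flatness of about 1.0. A spectrum with a single dominant bin, such as a pure tone, gives a value close to 0.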

You might also find it interesting to take a look at this project, especially the part where I interface with Yaafe.

Back then I used an SVM, exactly as you are planning to. Today I would definitely go with gradient boosting.

answered 2016-02-07T14:27:21.927