
I have a small number of similar types of sounds (I shall refer to these as DB_sounds) to which I need to match recordings (Rec_sounds). Each Rec_sound is short and unique and needs to be matched to its corresponding DB_sound. How do I go about matching them?

To illustrate my problem, consider the following:
Bob, with a deep voice, in room A (with some background noise) says Ma.
Alice, with a high voice, in room B says Eh.
A baby is learning to speak; his first word is Eh.

Ma and Eh are two different types of DB_sounds, so I have to return two different results. I have several DB_sound samples of different people saying Ma and Eh to compare the Rec_sounds against.

The sounds that I am dealing with are voice recordings of single syllables like la, ba, ne, eh, ma, etc.

How should I tackle this?
I don't think audio fingerprinting will work (see the spectrograms below), and existing voice recognition software, like this Google API integration in Python, doesn't work either, since I am not trying to recognize human language, just sounds.

I don't mind building something from the ground up; just point me in a direction you think will work, and please add plenty of justification for why you think so.

[Image: Spectrograms of 8 samples of a baby saying EH]

[Image: Time domain graphs of 8 samples of a baby saying EH]


1 Answer


If you just want to recognize the sounds, I would start with a simple pipeline:

  1. Trim the silence from each sound sample (a simple energy threshold).
  2. Compute audio features (e.g. MFCCs) for each database sample (a sketch of steps 1 and 2 follows this list).
  3. Run a cross-validated classification procedure that maps the audio features to the sound classes you want to recognize (see the second sketch below).
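A minimal sketch of steps 1 and 2, assuming 16-bit mono (or stereo) wav files. The function names, the frame length, and the energy threshold are illustrative choices, and librosa is used here as a stand-in for essentia's MFCC extractor:

```python
import numpy as np
from scipy.io import wavfile   # wav reading, as suggested below
import librosa                 # stand-in for essentia's MFCC extractor

def trim_silence(signal, frame_len=512, threshold=0.01):
    """Drop leading/trailing frames whose RMS energy is below `threshold`."""
    # normalise to [-1, 1] so the threshold is independent of bit depth
    signal = signal.astype(np.float32) / np.max(np.abs(signal))
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    keep = np.where(energies > threshold)[0]
    if len(keep) == 0:
        return signal
    return signal[keep[0] * frame_len:(keep[-1] + 1) * frame_len]

def mfcc_features(path, n_mfcc=13):
    """Read a wav file, trim silence, and return one fixed-length feature vector."""
    sr, signal = wavfile.read(path)
    if signal.ndim > 1:                      # mix stereo down to mono
        signal = signal.mean(axis=1)
    trimmed = trim_silence(signal)
    mfcc = librosa.feature.mfcc(y=trimmed, sr=sr, n_mfcc=n_mfcc)
    # mean/std over time gives one vector per sample, regardless of duration
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```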

Useful Python libraries: scipy for reading wav files, essentia for audio feature extraction, and scikit-learn for classification and other machine learning.
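And a sketch of step 3 with scikit-learn, assuming the `mfcc_features` helper above; the file paths and labels are made-up placeholders for your labelled DB_sound samples:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# hypothetical training data: paths to DB_sound samples and their classes
db_files  = ["db/ma_01.wav", "db/ma_02.wav", "db/eh_01.wav", "db/eh_02.wav"]
db_labels = ["ma", "ma", "eh", "eh"]

X = np.array([mfcc_features(f) for f in db_files])
y = np.array(db_labels)

# scale the features, then classify; an SVM (or k-NN) is a reasonable first choice
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# cross-validation tells you how well the features separate your classes
scores = cross_val_score(clf, X, y, cv=2)   # use a larger cv with more samples
print("accuracy per fold:", scores)

# once validated, fit on all DB_sounds and classify a new Rec_sound
clf.fit(X, y)
print(clf.predict([mfcc_features("rec/unknown.wav")]))
```

With only a handful of samples per class, a small k-NN or linear SVM on the pooled MFCC vectors is usually enough; the cross-validation scores will tell you whether the features separate your syllables before you invest in anything heavier.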

answered 2015-11-11 at 16:05