java - MFCC data format for training an HMM

Question

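I am trying to develop an audio classification system in Java using MFCC features and hidden Markov models. I am following this research paper: http://acccn.net/cr569/Rstuff/keys/bathSoundMonitoring.pdf.

The algorithm is described as follows:

Each sound file, corresponding to a sample of a sound event, was processed in frames, pre-emphasized and windowed by a Hamming window (25 ms) with an overlap of 50%. A feature vector consisting of 13-order MFCCs characterized each frame. We modeled each sound using a left-to-right six-state continuous-density HMM without state skipping. Each HMM state was composed of two Gaussian mixture components. After a model initialization stage was done, all the HMM models were trained in three iterative cycles.

I have finished the first part, extracting features from the sample sounds. As a result I have a two-dimensional array in which each row has 13 columns (each row represents one sound frame). My question now is how to train an HMM with this data.

I am using the jahmm library. So far I have written some sample code to get a rough idea of how the library works.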

/**Some sample data to act as the MFCC data. Here each line, terminated by a newline,
     * is one observation. I don't know whether each line should be one row of the MFCC values
     * (representing one frame) or whether each line should hold the set of features from one audio file.
     */
    String realSequences = "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n";


    /**
     * This is the reader class that reads the data and puts them into a suitable collection format
     * 
     */
    Reader reader = new StringReader(realSequences);
    List<? extends List<ObservationReal>> sequences =
            ObservationSequencesReader.readSequences(new ObservationRealReader(), reader);
    reader.close();


    /**
     * The description states that each state is composed of two Gaussian mixture components.
     */
    OpdfGaussianMixtureFactory gMixtureFactory = new OpdfGaussianMixtureFactory(2);

    /**
     * The manual for jahmm says that KMeans learner is a good way to initialize the hmm. It has 6 states
     * and uses the two gaussian mixture models created above.
     */
    KMeansLearner<ObservationReal> kml = new KMeansLearner<ObservationReal>(6, gMixtureFactory, sequences);
    Hmm<ObservationReal> initHmm = kml.iterate();


    /*
     * As the paper states, the HMM is trained in 3 iterative cycles.
     * Each cycle starts from the model produced by the previous one.
     */
    BaumWelchLearner bwl = new BaumWelchLearner();
    Hmm<ObservationReal> learntHmm = initHmm;
    for (int i = 0; i < 3; i++) {
        learntHmm = bwl.iterate(learntHmm, sequences);
    }

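My questions are:

Q1: In what format should the MFCC data be passed to train the HMM? (See the comment on the realSequences line.)

Q2: In speech recognition we are sometimes asked to train the system by repeating the same word, say, 10 times. Does that mean it trains one HMM with those 10 samples? If so, how do you train one HMM with different samples of the same sound? Or is it 10 separately trained HMMs, all labeled with that word?

Q3: How do I compare two HMM models for sound recognition? Would it be better to use Viterbi or the Kullback-Leibler distance?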

score 2 · Accepted Answer

Q1: In what format should the MFCC data be passed to train the HMM? (See the comment on the realSequences line.)

The MFCC data must be represented as:

List<? extends List<ObservationVector>> sequences

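This is a list of data sequences. Each sequence corresponds to one sample of a word and is a list of vectors; each vector represents one frame and contains the 13 MFCC values.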
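
To make that shape concrete, here is a minimal sketch that uses only the Java standard library, so it runs without jahmm. The class and method names (MfccSequences, toSequence, toTrainingSet) are made up for illustration. It assumes each recording has already been turned into a frames-by-13 array, as in the question. With jahmm, each frame would additionally be wrapped in an ObservationVector, as noted in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jahmm: shows the list-of-sequences layout only.
class MfccSequences {
    static final int NUM_COEFFS = 13;

    /** Turns one recording's MFCC matrix (frames x 13) into one sequence of frame vectors. */
    static List<double[]> toSequence(double[][] mfcc) {
        List<double[]> sequence = new ArrayList<>();
        for (double[] frame : mfcc) {
            if (frame.length != NUM_COEFFS) {
                throw new IllegalArgumentException(
                        "expected " + NUM_COEFFS + " coefficients, got " + frame.length);
            }
            // With jahmm, the frame would be wrapped here, e.g. new ObservationVector(frame).
            sequence.add(frame.clone());
        }
        return sequence;
    }

    /** Collects all recordings of ONE sound class into the training set for that class. */
    static List<List<double[]>> toTrainingSet(List<double[][]> recordings) {
        List<List<double[]>> sequences = new ArrayList<>();
        for (double[][] recording : recordings) {
            sequences.add(toSequence(recording));
        }
        return sequences;
    }

    public static void main(String[] args) {
        // Two dummy recordings of the same sound with different lengths (values are zeros).
        double[][] first = new double[40][NUM_COEFFS];
        double[][] second = new double[55][NUM_COEFFS];
        List<List<double[]>> trainingSet = toTrainingSet(Arrays.asList(first, second));
        System.out.println("sequences: " + trainingSet.size());
        System.out.println("frames in first sequence: " + trainingSet.get(0).size());
    }
}
```

So one row of the MFCC array is one observation, one audio file is one sequence, and the outer list holds all files of the same sound. Sequences may have different lengths.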

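Q2: In speech recognition we are sometimes asked to train the system by repeating the same word, say, 10 times. Does that mean it trains one HMM with those 10 samples?

The input data is a list of sequences for each word. That list is processed all together.

If so, how do you train one HMM with different samples of the same sound? Or is it 10 separately trained HMMs, all labeled with that word?

It is one HMM. The HMM training algorithm works with multiple samples per word. In fact it needs quite a lot of samples, more than 10.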

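Q3: How do I compare two HMM models for sound recognition? Would it be better to use Viterbi or the Kullback-Leibler distance?

It is not quite clear what you mean by "compare" here. Do you expect one HMM to have fewer states than the other, or something else? Which properties do you want to compare? The answer depends on that.

Also, it is important to note that HMM training for speech recognition has some specifics (how to choose the number of states, which features to use, how to initialize the HMM). For that reason, to get the best performance it is better to use a specialized toolkit such as CMUSphinx ( http://cmusphinx.sourceforge.net ) rather than a generic one.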
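
As one illustration of what such initialization specifics can look like, here is a rough sketch of the topology the quoted paper describes: six states, left to right, no state skipping. It is plain Java with no library dependency. The class name LeftToRightTopology and the stay probability of 0.5 are assumptions chosen for the example, not values from the paper. With jahmm, the resulting numbers could be copied into a model through Hmm.setAij and Hmm.setPi before running Baum-Welch.

```java
// Hypothetical helper: builds the initial parameters of a left-to-right HMM without skips.
class LeftToRightTopology {

    /**
     * Builds an n x n transition matrix in which state i can only stay in i
     * or move to i + 1. The last state can only stay where it is.
     */
    static double[][] transitions(int nbStates, double stayProb) {
        double[][] a = new double[nbStates][nbStates];
        for (int i = 0; i < nbStates; i++) {
            if (i == nbStates - 1) {
                a[i][i] = 1.0;                 // final state: no later state to move to
            } else {
                a[i][i] = stayProb;            // self loop
                a[i][i + 1] = 1.0 - stayProb;  // step to the next state, never further
            }
        }
        return a;
    }

    /** Initial state distribution: the model always starts in state 0. */
    static double[] initial(int nbStates) {
        double[] pi = new double[nbStates];
        pi[0] = 1.0;
        return pi;
    }

    public static void main(String[] args) {
        double[][] a = transitions(6, 0.5); // 0.5 is an arbitrary starting value
        for (double[] row : a) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Transitions that start at exactly zero stay at zero under Baum-Welch re-estimation, so a model initialized this way keeps its left-to-right structure during training.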
