python - TensorFlow sound recognition

Question

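I'm building my own version of the TensorFlow audio recognition example so that it recognizes a few sound effects instead of speech. When I train my sound recognition model, I get the following error: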

2019-09-11 19:16:38.221677: E tensorflow/core/kernels/mfcc_mel_filterbank.cc:153] Missing 5 bands starting at 0 in mel-frequency design. Perhaps too many channels or not enough frequency resolution in spectrum. (input_length: 257 input_sample_rate: 44100 output_channel_count: 40 lower_frequency_limit: 20 upper_frequency_limit: 4000)

Can you explain what this means and how I can fix it? My audio clips are about 1 second long, 44.1 kHz, and stereo.

Thanks a lot!

score 0 · Accepted Answer

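The problem was that the example expects audio files with a sample rate of 16000 Hz (16 kHz), but I was feeding it 44100 Hz (44.1 kHz) files.

I fixed it by adding the following flag: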

--sample_rate=44100
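
For anyone wondering why the mismatch produces that particular message, here is a rough back-of-the-envelope sketch rather than the actual TensorFlow code. It compares the spacing between spectrum bins with the width of the lowest mel band, using the numbers from the log (257 bins, 40 channels, 20 to 4000 Hz). The helper names are made up for illustration. The mel formula is the common 1127 * ln(1 + f / 700) variant, which I assume matches the kernel. The 30 ms window padded to the next power of two is my reading of the example's defaults, not something stated in this thread.

```python
import math


def hz_to_mel(hz):
    # Common mel-scale formula; assumed (not confirmed here) to match the kernel.
    return 1127.0 * math.log1p(hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * math.expm1(mel / 1127.0)


def bin_spacing_hz(sample_rate, input_length):
    # input_length spectrum bins cover 0 Hz up to sample_rate / 2.
    return 0.5 * sample_rate / (input_length - 1)


def lowest_band_width_hz(lower_hz, upper_hz, channels):
    # Bands are spaced evenly on the mel scale, so the lowest one is the
    # narrowest when measured in Hz.
    mel_lo, mel_hi = hz_to_mel(lower_hz), hz_to_mel(upper_hz)
    step = (mel_hi - mel_lo) / (channels + 1)
    return mel_to_hz(mel_lo + step) - lower_hz


def spectrum_bins(sample_rate, window_ms=30.0):
    # ASSUMPTION: the window is window_ms long and the FFT length is the
    # window length rounded up to the next power of two.
    window_samples = int(sample_rate * window_ms / 1000)
    fft_length = 1 << (window_samples - 1).bit_length()
    return fft_length // 2 + 1


band = lowest_band_width_hz(20, 4000, 40)
print("lowest mel band width (Hz):", round(band, 2))

# Window sized for 16 kHz, but the audio is really 44.1 kHz (the failing case).
print("bin spacing, 257 bins @ 44100 Hz:", bin_spacing_hz(44100, 257))

# Window sized for 16 kHz and audio at 16 kHz (what the example expects).
print("bin spacing, 257 bins @ 16000 Hz:", bin_spacing_hz(16000, 257))

# Window sized for 44.1 kHz, as I assume --sample_rate=44100 arranges.
bins_441 = spectrum_bins(44100)
print("bin spacing,", bins_441, "bins @ 44100 Hz:", bin_spacing_hz(44100, bins_441))
```

If this reasoning holds, the spectrum bins in the failing case are spaced wider apart than the lowest mel bands, so those bands end up with no bins to draw from. Once the window is sized for the real sample rate, the bins are fine enough again.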
