speech-to-text - What does the Google Speech-to-Text config look like for a .opus audio file?

Question

I'm passing a .opus audio file to Google's Speech-to-Text API for transcription. I'm using the following config:

encoding = enums.RecognitionConfig.AudioEncoding.OGG_OPUS
language_code = "en-US"
sample_rate_hertz = 16000

I get the following error:

google.api_core.exceptions.GoogleAPICallError: None Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.

I tried other encodings, such as FLAC and LINEAR16, and got None as the output.

Do .opus audio files need additional config fields, and what should the config look like?

score 1 · Accepted Answer

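After going through Google's documentation and a few attempts, I found a fix for the error. The OGG_OPUS encoding needs audio_channel_count to be set explicitly in the config. In my case the audio had 2 channels, and I had to state that explicitly. In addition, for multi-channel audio, enable_separate_recognition_per_channel needs to be set to True.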
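Because the fix depends on knowing how many channels the file really has, here is a minimal sketch that reads the channel count from the file itself instead of guessing. It assumes a standard Ogg Opus file, whose first packet is the OpusHead identification header described in RFC 7845 (section 5.1). The function name read_opus_head is hypothetical, and it takes the shortcut of searching for the OpusHead magic in the leading bytes rather than fully parsing Ogg pages.

```python
import struct


def read_opus_head(data):
    """Parse the OpusHead identification header of an Ogg Opus stream.

    data should hold the first few KB of the file. Shortcut (assumption):
    instead of demultiplexing Ogg pages, look for the 8-byte "OpusHead"
    magic, which starts the first packet of a normal Ogg Opus file.
    """
    idx = data.find(b"OpusHead", 0, 4096)
    if idx < 0:
        raise ValueError("OpusHead header not found; is this an Ogg Opus file?")

    # Fields after the magic, little-endian: version (u8), channel count (u8),
    # pre-skip (u16), input sample rate (u32), output gain (s16),
    # channel mapping family (u8). 11 bytes in total.
    version, channels, pre_skip, input_rate, gain, mapping = struct.unpack_from(
        "<BBHIhB", data, idx + 8
    )
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        # Rate of the original input before encoding; informational only.
        "input_sample_rate": input_rate,
        "output_gain": gain,
        "mapping_family": mapping,
    }
```

For example, reading the first 4 KB of a hypothetical audio.opus in binary mode and passing them to read_opus_head gives, under "channels", the value to use for audio_channel_count.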

The config that worked for me is:

from google.cloud.speech_v1 import enums  # enums module as in google-cloud-speech 1.x

encoding = enums.RecognitionConfig.AudioEncoding.OGG_OPUS
config = {
    "audio_channel_count": 2,  # the file in this case had 2 channels
    "enable_separate_recognition_per_channel": True,  # set for multi-channel audio
    "language_code": "en-US",
    "sample_rate_hertz": 16000,
    "encoding": encoding,
}

It is important to use the correct value for each parameter in the config.
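Following up on that point, a small hypothetical helper can catch the mismatches behind this kind of error before the request is sent. The allowed sample rates below are the ones Google's RecognitionConfig documentation lists for OGG_OPUS (8000, 12000, 16000, 24000 and 48000 Hz); the multi-channel check encodes this answer's experience rather than a documented rule, and the name check_opus_config is made up for illustration.

```python
# Sample rates that Google's RecognitionConfig docs accept for OGG_OPUS.
OGG_OPUS_RATES = (8000, 12000, 16000, 24000, 48000)


def check_opus_config(config, file_channels):
    """Hypothetical pre-flight check for an OGG_OPUS config dict.

    config uses the same keys as the dict in the answer; file_channels is
    the channel count read from the file (e.g. from its OpusHead header).
    Returns a list of problem descriptions; an empty list means none found.
    """
    problems = []

    rate = config.get("sample_rate_hertz")
    if rate not in OGG_OPUS_RATES:
        problems.append(f"sample_rate_hertz={rate} is not one of {OGG_OPUS_RATES}")

    count = config.get("audio_channel_count")
    if count != file_channels:
        problems.append(
            f"audio_channel_count={count}, but the file has {file_channels} channel(s)"
        )

    # Per this answer's experience, multi-channel audio also needed this flag.
    if file_channels > 1 and not config.get("enable_separate_recognition_per_channel"):
        problems.append(
            "multi-channel audio: set enable_separate_recognition_per_channel to True"
        )

    return problems
```

With the question's original config (no audio_channel_count) and a 2-channel file, it reports both the missing channel count and the missing per-channel flag.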
