google-cloud-speech - How to use the Google Speech API for audio with 2 channels

Question

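We have a recording of two people speaking on separate channels. I am trying the official Node.js documentation sample here. First, I get an error saying the payload size exceeds the maximum limit.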

ubuntu@ip-xxxx:~/nodejs-docs-samples/speech$ node recognize.js async /home/ubuntu/output.wav
(node:18306) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Request payload size exceeds the limit: 10485760 bytes.

However, the documentation only mentions a limit on recording length, not on file size. Here is the link.

Is there a workaround?

Also, I tried a smaller file and got a configuration error:

ubuntu@ip-xxx:~/nodejs-docs-samples/speech$ node recognize.js async /home/ubuntu/output2.wav
(node:18291) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Invalid Configuration, Does not match Wav File Header.
Wav Header Contents:
Encoding: LINEAR16
Channels: 2
Sample Rate: 16000.
Request Contents:
Encoding: linear16
Channels: 1
Sample Rate: 16000.

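I am not sure whether the API accepts 2-channel audio input, since I could not find any such configuration in the documentation. However, I found this link, which suggests splitting the audio into separate channels and processing each one on its own. What is the recommended way to do this programmatically?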

score 3 · Accepted Answer

I ended up going with this approach:

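Split the file into channels using sox
Upload the audio for both channels to Google Cloud Storage (for local files, the Speech API will not process a recording longer than 1 minute, so large files must be uploaded to Google Cloud Storage)
Pass each file through the speech recognition API
Keep the transcripts separate. We could not merge the two because the Google Speech API does not provide word timestamps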
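
The upload and recognition steps above could be sketched roughly as follows. This is only a minimal sketch, assuming a recent version of the `@google-cloud/storage` and `@google-cloud/speech` client libraries and configured credentials. The function names `buildRequest` and `transcribeChannel`, the bucket name passed in, and the `en-US` language code are hypothetical; the 16000 Hz LINEAR16 settings are taken from the WAV header shown in the question.

```javascript
// Build a recognition request for one mono channel file stored in Cloud Storage.
// languageCode is an assumption; encoding and sample rate mirror the question's WAV header.
function buildRequest (gcsUri) {
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US'
    },
    audio: { uri: gcsUri }
  };
}

// Upload one channel file to a bucket, then run asynchronous (long-running) recognition.
async function transcribeChannel (bucketName, localPath) {
  // Required lazily so buildRequest can be used without the client libraries installed.
  const { Storage } = require('@google-cloud/storage');
  const speech = require('@google-cloud/speech');
  const path = require('path');

  const destination = path.basename(localPath);
  await new Storage().bucket(bucketName).upload(localPath, { destination });

  const client = new speech.SpeechClient();
  const [operation] = await client.longRunningRecognize(
    buildRequest(`gs://${bucketName}/${destination}`)
  );
  const [response] = await operation.promise();

  // Join the top alternative of each result into one transcript for this channel.
  return response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
}
```

Calling `transcribeChannel` once per channel file yields one transcript per speaker, which matches the "keep the transcripts separate" step.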

Here is a helper function that splits the file into channels:

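const util = require('util');
// Promisified execFile; passing arguments as an array avoids shell quoting
// problems with file names that contain spaces.
const execFile = util.promisify(require('child_process').execFile);

// Splits a 2-channel WAV file into two mono files using sox (must be on PATH).
// Resolves with the paths of the two output files.
function splitFileToChannels (fileName) {
  const output = {
    channel1: `${fileName}_channel1.wav`,
    channel2: `${fileName}_channel2.wav`
  };
  // "remix N" keeps only input channel N in the output file.
  return Promise.all([
    execFile('sox', [fileName, output.channel1, 'remix', '1']),
    execFile('sox', [fileName, output.channel2, 'remix', '2'])
  ]).then(() => output);
}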
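
Before splitting, it may help to confirm that a file really has two channels, since the API compares the request against the WAV header. The following is a rough sketch, not part of the original answer: `parseWavFormat` and `readWavFormat` are hypothetical helper names, and the code assumes a standard RIFF/WAVE file whose `fmt ` chunk appears within the first 4096 bytes.

```javascript
const fs = require('fs');

// Parse channel count and sample rate from a buffer holding the start of a RIFF/WAVE file.
function parseWavFormat (buffer) {
  if (buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk starts right after the 12-byte RIFF header
  while (offset + 8 <= buffer.length) {
    const chunkId = buffer.toString('ascii', offset, offset + 4);
    const chunkSize = buffer.readUInt32LE(offset + 4);
    if (chunkId === 'fmt ') {
      // fmt data layout: audioFormat (2 bytes), numChannels (2), sampleRate (4), ...
      return {
        channels: buffer.readUInt16LE(offset + 10),
        sampleRate: buffer.readUInt32LE(offset + 12)
      };
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + chunkSize + (chunkSize % 2);
  }
  throw new Error('fmt chunk not found');
}

// Read only the first 4096 bytes of the file and parse the header from them.
function readWavFormat (fileName) {
  const fd = fs.openSync(fileName, 'r');
  try {
    const buffer = Buffer.alloc(4096);
    const bytesRead = fs.readSync(fd, buffer, 0, buffer.length, 0);
    return parseWavFormat(buffer.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

If `readWavFormat(fileName).channels` is 2, the file needs to be split; if it is already 1, it can be sent as is.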

Also, I had to convert the mp3 file to wav format before splitting it into channels.
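
The answer does not say which tool was used for the mp3 conversion. One possible way, sketched here under the assumption that `ffmpeg` is installed and on PATH, is to decode to 16-bit PCM at 16000 Hz so the result matches the LINEAR16 settings from the question. The names `mp3ToWavArgs` and `convertMp3ToWav` are hypothetical.

```javascript
const util = require('util');
const execFile = util.promisify(require('child_process').execFile);

// Build the ffmpeg argument list: overwrite output (-y), read the input (-i),
// encode as signed 16-bit little-endian PCM, and resample to 16000 Hz.
function mp3ToWavArgs (inputPath, outputPath) {
  return ['-y', '-i', inputPath, '-acodec', 'pcm_s16le', '-ar', '16000', outputPath];
}

// Run ffmpeg and resolve with the path of the converted WAV file.
function convertMp3ToWav (inputPath, outputPath) {
  return execFile('ffmpeg', mp3ToWavArgs(inputPath, outputPath))
    .then(() => outputPath);
}
```

The converted file keeps both channels, so it can then be passed to the channel-splitting helper.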
