node.js - Encode LINEAR16 audio to Twilio media audio/x-mulaw | NodeJS

Question

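I have been trying to stream mulaw media back to Twilio. The requirement is that the payload must be audio/x-mulaw with a sample rate of 8000, base64 encoded.

My input is LINEAR16 audio from @google-cloud/text-to-speech (see the Google docs).

This is how I encode the response from @google-cloud/text-to-speech: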

const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()

Then I send the result back to Twilio over the websocket:

twilioWebsocket.send(JSON.stringify({
      event: 'media',
      media: {
        payload: wav.toBase64(),
      },
      streamSid: meta.streamSid,
}))

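The problem is that we only hear random noise on the other end of the Twilio call, so the encoding seems to be wrong.

I also checked the @google-cloud/text-to-speech output audio by saving it to a file, and it is correct and clear.

Can anyone help me with the encoding?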

score 1 · Accepted Answer

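I had the same problem. The error is in wav.toBase64(), because its output includes the wav header. Twilio media streams expect raw audio data, which you can get from wav.data.samples, so your code would be: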

const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()

 const payload = Buffer.from(wav.data.samples).toString('base64');
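
To make the difference concrete, here is a minimal sketch of building the outbound message from headerless mu-law bytes. The helper name `buildMediaMessage` and the stream SID value are hypothetical; the message shape is the one used in the question.

```javascript
// Hypothetical helper: wraps raw (headerless) mu-law bytes in the
// media message shape shown in the question.
function buildMediaMessage(streamSid, mulawSamples) {
  // Buffer.from accepts a Uint8Array or a plain array of byte values.
  const payload = Buffer.from(mulawSamples).toString('base64');
  return JSON.stringify({
    event: 'media',
    streamSid,
    media: { payload },
  });
}

// Usage with three made-up mu-law bytes and a placeholder stream SID.
const message = buildMediaMessage('MZ-example', Uint8Array.from([0xff, 0x7f, 0x00]));
```

The resulting string can be passed to the websocket's send call in place of the JSON built with wav.toBase64().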

score 0 · Accepted Answer

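I just had the same problem. The solution is to convert the LINEAR16 audio to the corresponding MULAW codec by hand.

You can use the code from a music library.

I wrote a function that converts a linear16 byte array to mulaw: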

short2ulaw(b: Buffer): Buffer {
    // Each 16-bit sample becomes one mu-law byte -> buffer is half the size
    // Because of the nature of LINEAR16, the length should ALWAYS be even
    const returnbuffer = Buffer.alloc(b.length / 2)

    for (let i = 0; i < b.length / 2; i++) {
      // JavaScript has no 16-bit integer type. Every number is
      // a double precision 64-bit number, so read the sample via the Buffer API.
      let short = b.readInt16LE(i * 2)

      let sign = 0

      // Record the sign of the 16-bit sample, then continue with its magnitude
      if (short < 0) {
        sign = 0x80
        short = -short
      }

      // Clip so that adding the bias below cannot overflow 15 bits
      short = short > 32635 ? 32635 : short

      const sample = short + 0x84
      // this.exp_lut (not shown) is assumed to be the 256-entry exponent
      // lookup table, indexed by bits 7-14 of the biased sample
      const exponent = this.exp_lut[(sample >> 7) & 0xff]
      const mantissa = (sample >> (exponent + 3)) & 0x0f
      let ulawbyte = ~(sign | (exponent << 4) | mantissa) & 0xff

      // Zero trap: never emit 0x00
      ulawbyte = ulawbyte === 0 ? 0x02 : ulawbyte

      returnbuffer.writeUInt8(ulawbyte, i)
    }

    return returnbuffer
  }
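
The function above depends on a lookup table, this.exp_lut, that is not shown in the answer. As a self-contained alternative, the following sketch finds the exponent with a bit scan instead of a table. The names `linearToMuLawSample` and `linear16ToMuLaw` are hypothetical, and the sketch leaves out the zero trap, so treat it as an illustration of the standard G.711 mu-law steps rather than a drop-in replacement.

```javascript
const BIAS = 0x84;  // 132, added to the magnitude before locating the segment
const CLIP = 32635; // largest magnitude that still fits in 15 bits after the bias

// Encode one signed 16-bit PCM sample as one mu-law byte.
function linearToMuLawSample(sample) {
  let sign = 0;
  if (sample < 0) {
    sign = 0x80;
    sample = -sample; // work with the magnitude
  }
  if (sample > CLIP) sample = CLIP;
  sample += BIAS;

  // Exponent (0..7) is the position of the highest set bit, counted from bit 7.
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;

  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Encode a Buffer of little-endian 16-bit PCM into a Buffer of mu-law bytes.
function linear16ToMuLaw(pcm) {
  const out = Buffer.alloc(Math.floor(pcm.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = linearToMuLawSample(pcm.readInt16LE(i * 2));
  }
  return out;
}
```

With this encoding, digital silence (sample 0) maps to 0xff, which is a quick sanity check when the call only produces noise.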

Now you can use it on raw PCM (Linear16). You only need to strip the bytes at the start of the Google stream, because Google adds a wav header. Then you can base64-encode the resulting buffer and send it to Twilio.
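
One way to strip the header is to walk the RIFF chunks until the `data` chunk is found, instead of assuming a fixed header length. The sketch below is a minimal illustration under that assumption; the function name `extractWavData` is hypothetical, and it expects the whole WAV response in a single Buffer.

```javascript
// Return the raw sample bytes of a RIFF/WAVE buffer, or null if the
// buffer is not a WAV file or has no "data" chunk.
function extractWavData(wav) {
  if (wav.length < 12 ||
      wav.toString('ascii', 0, 4) !== 'RIFF' ||
      wav.toString('ascii', 8, 12) !== 'WAVE') {
    return null;
  }
  let offset = 12; // first chunk starts right after "RIFF" <size> "WAVE"
  while (offset + 8 <= wav.length) {
    const id = wav.toString('ascii', offset, offset + 4);
    const size = wav.readUInt32LE(offset + 4);
    const start = offset + 8;
    if (id === 'data') {
      // Guard against a declared size larger than the bytes actually present.
      return wav.subarray(start, Math.min(start + size, wav.length));
    }
    // Chunks are padded to an even number of bytes.
    offset = start + size + (size % 2);
  }
  return null;
}
```

The returned bytes are the Linear16 samples, which can then be passed to a mu-law conversion function and base64-encoded.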
