node.js - How to do real-time speech recognition | Google Cloud Speech-to-Text

Question

I am trying to transcribe audio coming from my speakers.
I am piping the sound from the speakers into a node.js script (https://askubuntu.com/a/850174):

parec -d alsa_output.pci-0000_00_1b.0.analog-stereo.monitor --rate=16000 --channels=1 | node transcribe.js

Here is my transcribe.js:

const speech = require('@google-cloud/speech');

const client = new speech.SpeechClient();

const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
    config: {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
    },
    interimResults: false, // If you want interim results, set this to true
};

const recognizeStream = client
    .streamingRecognize(request)
    .on('error', console.error)
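    .on('data', data => {
        // A response can arrive without results, so guard before indexing.
        const result = data.results && data.results[0];
        if (result && result.alternatives && result.alternatives[0]) {
            console.log(`Transcription: ${result.alternatives[0].transcript}`);
        }
    });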

process.stdin.pipe(recognizeStream);

However, Google Cloud Speech-to-Text limits streaming recognition to about 1 minute, so I get the error "Exceeded maximum allowed stream duration of 65 seconds."

How can I split the stream into chunks, either using silence as the splitter or into chunks of 30 seconds each?

score 0 · Accepted Answer

We can pipe the audio into the sox utility, which splits it at silences lasting 0.3 seconds and caps each chunk at 55 seconds:

sox -t raw -r 16k -e signed -b 16 -c 1 - ./chunks/output.wav  silence 1 0.3 0.1% 1 0.3 0.1% trim 0 55 : newfile : restart
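
To make the sox arguments easier to read, here is a minimal sketch that builds the same argument list from named options and starts sox from Node with child_process.spawn. The names soxSplitArgs and startSplitter and their option names are made up for illustration. It assumes sox is installed and that the ./chunks directory already exists.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: build the sox arguments from the command above.
function soxSplitArgs({
  rate = '16k',
  silenceSeconds = 0.3,
  threshold = '0.1%',
  maxSeconds = 55,
  out = './chunks/output.wav',
} = {}) {
  return [
    // Input: headerless signed 16-bit mono PCM read from stdin ("-").
    '-t', 'raw', '-r', rate, '-e', 'signed', '-b', '16', '-c', '1', '-',
    // Output name; with "newfile", sox appends a number to it for each file.
    out,
    // silence: skip leading silence until sound above the threshold lasts
    // silenceSeconds, then stop once silence below it lasts silenceSeconds.
    'silence', '1', String(silenceSeconds), threshold,
    '1', String(silenceSeconds), threshold,
    // trim: cap each chunk at maxSeconds.
    'trim', '0', String(maxSeconds),
    // Start a new output file and run the effects chain again.
    ':', 'newfile', ':', 'restart',
  ];
}

// Hypothetical helper: pipe a readable stream of raw PCM into sox.
function startSplitter(input, options) {
  const sox = spawn('sox', soxSplitArgs(options), {
    stdio: ['pipe', 'inherit', 'inherit'],
  });
  input.pipe(sox.stdin);
  return sox;
}

// Example (not run here): parec ... | node split.js
// startSplitter(process.stdin);
```

Called with no options, soxSplitArgs() reproduces the arguments of the command above, so the parec pipeline can feed either the shell command or this script.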

Now we can watch the chunks directory for new files and stream each one to the Google Cloud Speech-to-Text API.
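
The watching step could look roughly like the sketch below. It is one possible approach, not a tested implementation: it polls the directory instead of using a file-system watcher, and it assumes sox names the files output001.wav, output002.wav, and so on, and that the highest-numbered file is still being written. The helper names chunkNumber, finishedChunks, transcribeChunk and watchChunks are hypothetical. The client is passed in, so @google-cloud/speech is only needed where the function is called.

```javascript
const fs = require('fs');
const path = require('path');

// Same recognition settings as transcribe.js in the question.
const request = {
  config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
  interimResults: false,
};

// Extract the number sox appended to the file name, or null if it is not a chunk.
function chunkNumber(name) {
  const match = /^output(\d+)\.wav$/.exec(name);
  return match ? Number(match[1]) : null;
}

// Return chunk files in numeric order, skipping the newest one
// (assumed to be still open for writing by sox).
function finishedChunks(fileNames) {
  return fileNames
    .filter(name => chunkNumber(name) !== null)
    .sort((a, b) => chunkNumber(a) - chunkNumber(b))
    .slice(0, -1);
}

// Stream one finished chunk through streamingRecognize and collect the text.
// Assumes the recognize stream emits 'end' after the last response.
function transcribeChunk(client, filePath) {
  return new Promise((resolve, reject) => {
    const transcripts = [];
    const recognizeStream = client
      .streamingRecognize(request)
      .on('error', reject)
      .on('data', data => {
        const result = data.results && data.results[0];
        if (result && result.alternatives && result.alternatives[0]) {
          transcripts.push(result.alternatives[0].transcript);
        }
      })
      .on('end', () => resolve(transcripts.join(' ')));
    fs.createReadStream(filePath).on('error', reject).pipe(recognizeStream);
  });
}

// Poll the directory and transcribe each finished chunk once, in order.
async function watchChunks(client, dir, intervalMs = 1000) {
  let lastDone = 0;
  for (;;) {
    const ready = finishedChunks(fs.readdirSync(dir))
      .filter(name => chunkNumber(name) > lastDone);
    for (const name of ready) {
      const text = await transcribeChunk(client, path.join(dir, name));
      console.log(`Transcription: ${text}`);
      lastDone = chunkNumber(name);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Example (not run here):
// const speech = require('@google-cloud/speech');
// watchChunks(new speech.SpeechClient(), './chunks').catch(console.error);
```

Because each chunk is at most 55 seconds long, every streaming request stays under the 65-second limit from the question.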
