python - How do I transcribe a large file with the Google Speech API?

Question

How can I transcribe a large audio file asynchronously with the Google Speech API without hitting the "Operation not complete and retry limit reached." error?

Possible solution

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.

Is this feasible in Python? Or should I split the file into smaller files and retry?
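For reference, the polling approach described above could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes the operation object exposes a poll() method and a boolean complete attribute, and the names wait_until_done and FakeOperation are hypothetical helpers, not part of the Google client library. The point is to bound the wait by elapsed time rather than by a fixed number of retries.

```python
import time


def wait_until_done(operation, timeout_s=3600.0, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `operation` until it completes or `timeout_s` seconds elapse.

    Assumption: `operation` has a `poll()` method and a boolean
    `complete` attribute. Returns True on completion, False on timeout.
    """
    deadline = clock() + timeout_s
    while not operation.complete:
        if clock() >= deadline:
            return False
        sleep(interval_s)
        operation.poll()
    return True


class FakeOperation:
    """Hypothetical stand-in that completes after a given number of polls."""

    def __init__(self, polls_needed):
        self.polls_needed = polls_needed
        self.complete = False

    def poll(self):
        self.polls_needed -= 1
        if self.polls_needed <= 0:
            self.complete = True


if __name__ == "__main__":
    # Completes on the third poll; no real waiting because interval_s is 0.
    print(wait_until_done(FakeOperation(3), timeout_s=10, interval_s=0))
```

The sleep and clock parameters are there only so the loop can be exercised without real waiting; with a real operation the defaults would be used.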

Known issues with the Speech API

Encoding.

What I have tried so far

Encoding command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac
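If the conversion needs to run from the same Python script, the command above could be wrapped roughly like this. It is a sketch only: build_downmix_command and downmix_to_mono are hypothetical helper names, and the code assumes ffmpeg is installed and on PATH.

```python
import shutil
import subprocess


def build_downmix_command(src, dst):
    """Return the ffmpeg argument list that downmixes `src` to one channel."""
    return ["ffmpeg", "-i", src, "-ac", "1", dst]


def downmix_to_mono(src, dst):
    """Run ffmpeg to write a mono copy of `src` to `dst`."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_downmix_command(src, dst), check=True)


if __name__ == "__main__":
    print(build_downmix_command("2017-06-13-17_48_51.flac", "mono.flac"))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.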

Why ffmpeg over sox?

I chose ffmpeg because I got this warning when using sox:

sox 2017-06-13-17_48_51.flac --channels=1 --bits=16 2017-06-13-17_48_51_more_stable.flac

sox WARN dither: dither clipped 55 samples; decrease volume?

Input audio file

Input File     : '2017-06-13-17_48_51.flac'
Channels       : 2
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 60.7M
Bit Rate       : 380k
Sample Encoding: 16-bit FLAC

Running this command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac

Output audio file

Input File     : 'mono.flac'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 59.9M
Bit Rate       : 375k
Sample Encoding: 16-bit FLAC
Comment        : 'encoder=Lavf56.40.101'
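To put the numbers above in perspective, here is a small arithmetic sketch. It derives the duration in seconds from the sample count and sample rate, compares it with the polling budget in the script (100 retries with a 2 second sleep, so about 200 seconds of waiting), and lists the time spans that splitting the file would produce. The helper names and the 300 second chunk length are hypothetical, chosen only for illustration.

```python
def duration_seconds(num_samples, sample_rate_hz):
    """Duration of an audio stream in seconds."""
    return num_samples / sample_rate_hz


def chunk_spans(total_s, chunk_s):
    """Return (start, length) pairs in seconds that cover the whole file."""
    spans = []
    start = 0.0
    while start < total_s:
        spans.append((start, min(chunk_s, total_s - start)))
        start += chunk_s
    return spans


if __name__ == "__main__":
    total = duration_seconds(61363200, 48000)  # 1278.4 s, i.e. 00:21:18.40
    polling_budget = 100 * 2                   # retry_count * sleep seconds
    print("audio length (s):", total)
    print("polling budget (s):", polling_budget)
    # 300 s is an arbitrary example chunk length, not an API limit.
    for start, length in chunk_spans(total, 300.0):
        print("start=%.1f length=%.1f" % (start, length))
```

If the service needs longer than roughly 200 seconds to process 1278.4 seconds of audio, the script gives up before the operation finishes, regardless of how the file is encoded.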

Python file

Google Speech API async example with explicit credentials

I changed the FLAC sample rate to 48000 Hz and set the credentials path explicitly in the environment.

import argparse
import io
import time
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloud_speech_service_keys.json"


def transcribe_file(speech_file):
    """Transcribe the given audio file asynchronously."""
    from google.cloud import speech
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate_hertz=16000)

    operation = audio_sample.long_running_recognize('en-US')

    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request]


def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    speech_client = speech.Client()

    audio_sample = speech_client.sample(
        content=None,
        source_uri=gcs_uri,
        encoding='FLAC',
        sample_rate_hertz=48000)

    operation = audio_sample.long_running_recognize('en-US')

    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request_gcs]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='File or GCS path for audio file to be recognized')
    args = parser.parse_args()
    if args.path.startswith('gs://'):
        transcribe_gcs(args.path)
    else:
        transcribe_file(args.path)