
How can I transcribe a large file without hitting the "Operation not complete and retry limit reached." error when using the Google Speech API to transcribe large audio files asynchronously?

Possible solution

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.

Is it feasible to do this in Python? Or should I split the file into smaller files and try again?
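For reference, polling until done could look roughly like the sketch below. It assumes an operation object that exposes a complete attribute and a poll() method, as the one in my script further down does. The helper name wait_for_operation and the default timeout and interval are made up for illustration and are not part of the Google client library. My script currently allows 100 retries of 2 seconds, about 200 seconds in total, which may not be enough for a recording of 21:18.40, so this version waits against a wall-clock deadline instead of a fixed retry count.

```python
import time


def wait_for_operation(operation, timeout_s=3600.0, interval_s=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `operation` until it completes or `timeout_s` seconds elapse.

    `operation` is assumed to expose a boolean `complete` attribute and a
    `poll()` method that refreshes it (hypothetical interface modelled on
    the script in this question). Returns True on completion, False on
    timeout.
    """
    deadline = clock() + timeout_s
    while not operation.complete:
        if clock() >= deadline:
            return False  # Gave up: the deadline passed before completion.
        sleep(interval_s)
        operation.poll()
    return True
```

In the script, the retry loop would then become something like: if not wait_for_operation(operation): print the error and return. The sleep and clock parameters exist only so the helper can be exercised without really waiting.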

Known issues with the Speech API

  • Encoding.

What I have tried so far


Encoding command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac

Why ffmpeg over sox?

I chose ffmpeg because I got this warning when using sox:

sox 2017-06-13-17_48_51.flac --channels=1 --bits=16 2017-06-13-17_48_51_more_stable.flac

sox WARN dither: dither clipped 55 samples; decrease volume?

Input audio file

Input File     : '2017-06-13-17_48_51.flac'
Channels       : 2
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 60.7M
Bit Rate       : 380k
Sample Encoding: 16-bit FLAC

Running this command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac

Output audio file

Input File     : 'mono.flac'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 59.9M
Bit Rate       : 375k
Sample Encoding: 16-bit FLAC
Comment        : 'encoder=Lavf56.40.101'
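If splitting turns out to be the better route, here is a rough sketch of how the mono file could be cut into fixed-length pieces. The 300-second chunk length, the chunk_NNN.flac output names, and the helper names plan_chunks and ffmpeg_split_commands are all assumptions chosen for illustration. The code only builds the ffmpeg argument lists (using the standard -ss start and -t duration options) and does not run them. Cutting at fixed offsets can split a word in half, so this is a starting point rather than a finished approach.

```python
import math


def plan_chunks(total_s, chunk_s):
    """Return (start, length) pairs in seconds that cover `total_s`."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    count = math.ceil(total_s / chunk_s)
    return [(i * chunk_s, min(chunk_s, total_s - i * chunk_s))
            for i in range(count)]


def ffmpeg_split_commands(src, total_s, chunk_s, prefix="chunk"):
    """Build one ffmpeg argument list per chunk (nothing is executed)."""
    commands = []
    for index, (start, length) in enumerate(plan_chunks(total_s, chunk_s)):
        out = "{}_{:03d}.flac".format(prefix, index)  # hypothetical naming
        commands.append(["ffmpeg", "-i", src,
                         "-ss", str(start), "-t", str(length),
                         "-ac", "1", out])
    return commands


if __name__ == "__main__":
    # 61363200 samples at 48000 Hz, as reported for mono.flac above.
    duration_s = 61363200 / 48000
    for cmd in ffmpeg_split_commands("mono.flac", duration_s, 300):
        print(" ".join(cmd))
```

Each resulting list could be passed to subprocess.run, and each chunk then transcribed separately.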

Python file

Google Speech API async example, with explicit credentials

I changed the FLAC sample rate to 48000 Hz and set an explicit path to the credentials file in the environment.

import argparse
import io
import time
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloud_speech_service_keys.json"


def transcribe_file(speech_file):
    """Transcribe the given audio file asynchronously."""
    from google.cloud import speech
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate_hertz=16000)

    operation = audio_sample.long_running_recognize('en-US')

    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request]


def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    speech_client = speech.Client()

    audio_sample = speech_client.sample(
        content=None,
        source_uri=gcs_uri,
        encoding='FLAC',
        sample_rate_hertz=48000)

    operation = audio_sample.long_running_recognize('en-US')

    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request_gcs]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='File or GCS path for audio file to be recognized')
    args = parser.parse_args()
    if args.path.startswith('gs://'):
        transcribe_gcs(args.path)
    else:
        transcribe_file(args.path)