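How can I transcribe a large file without hitting the "Operation not complete and retry limit reached." error when asynchronously transcribing large audio files with the Google Speech API?
Possible solution
If the operation has not completed yet, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.
Is this feasible in Python? Or should I split the file into smaller files and try again?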
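As a rough illustration of the polling idea, here is a minimal sketch of a loop bounded by a time limit rather than a fixed retry count. The helper name `wait_until_done` and its parameters (`timeout_s`, `interval_s`, `sleep`, `clock`) are hypothetical and not part of any Google library; it only assumes an object that exposes a `poll()` method and a `complete` attribute, like the `operation` object in the script further down.

```python
import time


def wait_until_done(poll, is_done, timeout_s=3600.0, interval_s=10.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call poll() every interval_s seconds until is_done() returns True.

    Returns True if the operation finished, or False if timeout_s
    elapsed first. sleep and clock are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while not is_done():
        if clock() >= deadline:
            return False  # gave up: time limit reached
        sleep(interval_s)
        poll()  # refresh the operation's state
    return True
```

With the script below, this could presumably be called as `wait_until_done(operation.poll, lambda: operation.complete)`; the default limits are placeholders and would need tuning to the length of the audio.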
Known issues with the Speech API
- Encoding.
What I have tried so far
Encoding command
ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac
Why ffmpeg over sox?
I chose ffmpeg because sox produced the following warning:
sox 2017-06-13-17_48_51.flac --channels=1 --bits=16 2017-06-13-17_48_51_more_stable.flac
sox WARN dither: dither clipped 55 samples; decrease volume?
Input audio file
Input File : '2017-06-13-17_48_51.flac'
Channels : 2
Sample Rate : 48000
Precision : 16-bit
Duration : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size : 60.7M
Bit Rate : 380k
Sample Encoding: 16-bit FLAC
Running this command
ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac
Output audio file
Input File : 'mono.flac'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size : 59.9M
Bit Rate : 375k
Sample Encoding: 16-bit FLAC
Comment : 'encoder=Lavf56.40.101'
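Python file
Google Speech API async example with explicit credentials
I changed the FLAC sample rate to 48000 Hz and set an explicit path to the credentials file in the environment.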
import argparse
import io
import time
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloud_speech_service_keys.json"


def transcribe_file(speech_file):
    """Transcribe the given audio file asynchronously."""
    from google.cloud import speech
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate_hertz=16000)

    operation = audio_sample.long_running_recognize('en-US')

    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request]


def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    speech_client = speech.Client()

    audio_sample = speech_client.sample(
        content=None,
        source_uri=gcs_uri,
        encoding='FLAC',
        sample_rate_hertz=48000)

    operation = audio_sample.long_running_recognize('en-US')

    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request_gcs]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='File or GCS path for audio file to be recognized')
    args = parser.parse_args()
    if args.path.startswith('gs://'):
        transcribe_gcs(args.path)
    else:
        transcribe_file(args.path)
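For context, the retry loop in the script gives up after at most `retry_count` sleeps of 2 seconds each. The small calculation below compares that waiting budget with the duration reported for the input file. The helper names are hypothetical, and whether the service actually needs longer than the budget for this file is an assumption, not something measured here.

```python
def max_wait_seconds(retry_count, sleep_seconds):
    """Upper bound on the time the retry loop spends sleeping."""
    return retry_count * sleep_seconds


def duration_seconds(hms):
    """Convert an 'HH:MM:SS.ss' duration string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


budget = max_wait_seconds(retry_count=100, sleep_seconds=2)
audio_length = duration_seconds("00:21:18.40")  # duration of the input file

print("Polling budget: {} s".format(budget))
print("Audio length:   {:.1f} s".format(audio_length))
print("Budget covers {:.0%} of the audio length".format(budget / audio_length))
```

If the budget turns out to be the limiting factor, raising `retry_count` or the sleep interval would be the smallest change to try before splitting the file.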