rest - Is it possible to use curl with the Google Cloud Speech API to recognize 10 to 15 minute files?

Question

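I'm using the REST API with cURL because I need to do something quick and simple, and I'm on a box where I can't start dumping garbage, i.e. some fat developer SDK.

I started by base64-encoding the flac file and calling speech.syncrecognize.

That eventually failed: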

{
  "error": {
    "code": 400,
    "message": "Request payload size exceeds the limit: 10485760.",
    "status": "INVALID_ARGUMENT"
  }
}
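The limit in that message is 10,485,760 bytes (10 MiB). Base64 grows the data by a factor of 4/3, so the largest raw file that can go inline is well below 10 MiB. A quick check could be sketched like this; the function names and the `audio.flac` placeholder are illustrative, and the JSON wrapper around the content is ignored:

```shell
# Limit reported in the error message, in bytes (10 MiB).
LIMIT=10485760

# Base64 emits 4 output bytes for every 3 input bytes, rounding up.
b64_len() {
  echo $(( ($1 + 2) / 3 * 4 ))
}

# Succeeds only if $1 raw bytes, once base64-encoded, stay under the limit.
# The surrounding JSON adds a little more, which this check ignores.
fits_inline() {
  [ "$(b64_len "$1")" -lt "$LIMIT" ]
}

# Usage (audio.flac is a placeholder file name):
# size=$(wc -c < audio.flac)
# fits_inline "$size" || echo "too large for inline content; use a gs:// URI"
```

By this arithmetic, anything much over 7.8 million raw bytes will not fit in a single inline request.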

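So OK, you can't send 31,284,578 bytes in the request; you have to use Cloud Storage. So I uploaded the flac audio file and retried, this time using the file in Cloud Storage. That failed: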

{
  "error": {
    "code": 400,
    "message": "For audio inputs longer than 1 min, use the 'AsyncRecognize' method.",
    "status": "INVALID_ARGUMENT"
  }
}

Fine, speech.syncrecognize doesn't like the content size; try again with speech.asyncrecognize. That failed:

{
  "error": {
    "code": 400,
    "message": "For audio inputs longer than 1 min, please use LINEAR16 encoding.",
    "status": "INVALID_ARGUMENT"
  }
}

OK, so speech.asyncrecognize can only do LPCM; upload the file in pcm_s16le format and try again. So finally, I get an operation handle:

{
  "name": "9174269756763138681"
}
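For reference, a request that yields a handle like this might be sketched as follows. The endpoint URL is inferred from the v1beta1 method name, the bucket and object names are hypothetical, `ACCESS_TOKEN` is assumed to hold a valid OAuth token, and the camelCase `sampleRate` field is an assumption about what the REST API expects:

```shell
# Assumed endpoint for the v1beta1 speech.asyncrecognize method.
ASYNC_URL="https://speech.googleapis.com/v1beta1/speech:asyncrecognize"

# Print the JSON request body for a LINEAR16 file stored in Cloud Storage.
#   $1 = gs:// URI of the audio (hypothetical bucket/object)
#   $2 = sample rate in Hz
async_body() {
  printf '{"config":{"encoding":"LINEAR16","sampleRate":%d},"audio":{"uri":"%s"}}\n' "$2" "$1"
}

# Submit the request with curl, reading the body from stdin.
# ACCESS_TOKEN is assumed to be set in the environment.
submit_async() {
  async_body "$1" "$2" |
    curl -s -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      --data-binary @- "$ASYNC_URL"
}

# Usage (not run here):
# submit_async gs://my-bucket/audio.raw 16000
```

Keeping the body builder separate from the curl call makes it easy to inspect the exact JSON before sending it.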

Keep checking it, and eventually it's complete:

{
  "name": "9174269756763138681",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
  }
}

So wait, after all that, with the result now sitting on a queue, there is no REST method to request the result? Someone please tell me that I've missed something glaringly obvious staring me right in the face, and that Google didn't create a completely pointless, incomplete REST API.

score 3 · Accepted Answer

So the answer to the question is: yes, it is possible to use curl with the Google Cloud Speech API to recognize 10 to 15 minute files, assuming you navigate and conform to a fairly tight set of constraints, at least in v1beta1.

What is not obvious from the documentation is that the result is returned by the operations.get method, which would have been clear had any of my attempts actually returned something other than empty results.
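Polling operations.get with curl could look roughly like the sketch below. The base URL is an assumption based on the v1beta1 API, `ACCESS_TOKEN` is assumed to hold a valid OAuth token, and the `grep` match on `"done": true` is a rough stand-in for a real JSON parser. The loop does not handle error replies, so it will keep polling if the service never reports completion:

```shell
# Assumed base URL for the v1beta1 operations.get method.
OPS_URL="https://speech.googleapis.com/v1beta1/operations"

# Build the URL for one operation name.
operation_url() {
  printf '%s/%s\n' "$OPS_URL" "$1"
}

# Succeeds if the JSON text in $1 contains "done": true.
is_done() {
  printf '%s' "$1" | grep -Eq '"done"[[:space:]]*:[[:space:]]*true'
}

# Poll until the operation reports done, then print the final JSON.
#   $1 = operation name, $2 = seconds between polls (default 10)
poll_operation() {
  while :; do
    reply=$(curl -s -H "Authorization: Bearer ${ACCESS_TOKEN}" "$(operation_url "$1")")
    if is_done "$reply"; then
      printf '%s\n' "$reply"
      return 0
    fi
    sleep "${2:-10}"
  done
}

# Usage (not run here):
# poll_operation 9174269756763138681 15
```

When recognition succeeds, the transcript should appear inside the `response` object of the final reply; an empty `response` like the one in the question suggests nothing was recognized.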

The source sample rate of my files is either 44,100 or 48,000 Hz, and I was setting sample_rate to the native rate of the source. However, contrary to the documentation, which states:

Sample rate in Hertz of the audio data sent in all RecognitionAudio messages. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling).

after re-sampling to 16,000 Hz I started to get results with operations.get.
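One way to do that re-sampling is with ffmpeg, which is where the `pcm_s16le` codec name comes from. The sketch below is an assumption about the conversion, not the exact command used: `-ac 1` (mono) is my addition, the file names are placeholders, and the `FFMPEG` variable exists only so the command can be dry-run with `echo`:

```shell
# Convert $1 (any format ffmpeg can read) to headerless 16-bit
# little-endian PCM at 16 kHz, written to $2.
# FFMPEG may be overridden, e.g. FFMPEG=echo for a dry run.
# -ac 1 (mono) is an assumption; the post does not state a channel count.
to_linear16() {
  "${FFMPEG:-ffmpeg}" -i "$1" -ar 16000 -ac 1 -acodec pcm_s16le -f s16le "$2"
}

# Usage (not run here; file names are placeholders):
# to_linear16 audio.flac audio.raw
```

The resulting raw file can then be uploaded to Cloud Storage and referenced by its `gs://` URI.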

I think it's worth noting that correlation does not imply causation. After re-sampling to 16,000 Hz the files become significantly smaller. Thus, I can't prove it's a sample rate issue, and not just the service choking on files over a certain size.
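To put rough numbers on the size difference: raw 16-bit PCM takes rate x channels x 2 bytes per second. The figures below assume mono audio and a 15 minute (900 second) file, neither of which is stated in the post:

```shell
# Bytes of raw 16-bit PCM: rate (Hz) x channels x 2 bytes x seconds.
pcm_bytes() {
  echo $(( $1 * $2 * 2 * $3 ))
}

pcm_bytes 48000 1 900   # 15 min at 48 kHz mono   -> 86400000
pcm_bytes 44100 1 900   # 15 min at 44.1 kHz mono -> 79380000
pcm_bytes 16000 1 900   # 15 min at 16 kHz mono   -> 28800000
```

Under these assumptions, re-sampling from 48 kHz to 16 kHz cuts the file to a third of its size, so the two possible causes cannot be separated by this change alone.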

It's also worth noting that the documentation refers to the sample rate field inconsistently. According to their respective detailed definitions, the gRPC API appears to expect sample_rate while the REST API appears to expect sampleRate, in which case the Quickstart may be giving an incorrect example for the REST API.
