python - How to process the results from the Google Speech-to-Text API

Question

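I am using the Google Speech-to-Text API. It returns an object of type google.cloud.speech_v1.types.RecognizeResponse. I find this almost unusable in Python because I cannot iterate over it to get the multiple text strings it returns.

After a lot of searching for a way to make it usable in Python, I found a solution on Stack Overflow that uses google.protobuf.json_format.MessageToJson(). However, when I run the following function...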

import base64
import json
import os

from google.cloud import speech
from google.protobuf.json_format import MessageToJson

def transcribe(self, fp):
    transcribed = []

    data = fp.read()
    # Base64 form of the audio; not used below, since the client accepts raw bytes.
    speech_content_bytes = base64.b64encode(data)
    speech_content = speech_content_bytes.decode('utf-8')

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self.json_path
    os.environ["GCLOUD_PROJECT"] = proj_name  # proj_name is defined elsewhere
    config = {'language_code': 'en-US'}
    audio = {'content': data}

    client = speech.SpeechClient()
    response = client.recognize(config=config, audio=audio)
    print('response is a ' + str(type(response)))
    result_json = MessageToJson(response)  # JSON text (str)
    print('result_json is a ' + str(type(result_json)))
    result_json = json.loads(result_json)  # parsed into a dict
    print('now result_json is a ' + str(type(result_json)))

    # Keep the top alternative of each result, upper-cased.
    for result in result_json["results"]:
        transcribed.append(result["alternatives"][0]["transcript"].upper())

    return transcribed

...I get the following output:

response is a <class 'google.cloud.speech_v1.types.RecognizeResponse'>
result_json is a <class 'str'>
now result_json is a <class 'dict'>

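As you can see, the result of Google's MessageToJson function is actually a string, which I have to load into a dict with json.loads.

Why does MessageToJson return a string rather than a dict / JSON object?
Is there another way to work with the google.cloud.speech_v1.types.RecognizeResponse object in Python to get the transcribed text?

I don't understand why Google returns an object that is so hard to work with.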

score 0 · Accepted Answer

MessageToJson converts the RecognizeResponse from a protobuf message to JSON format, but as a string.
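Returning a string is consistent with JSON being a text format: serializing produces text, and parsing that text produces a dict. A minimal sketch of the same round trip using only the standard library, where json.dumps plays the role of MessageToJson and the sample dict is made-up data shaped like a response, not real API output:

```python
import json

# Hypothetical data shaped like a RecognizeResponse; not real API output.
sample = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.9}]}
    ]
}

json_text = json.dumps(sample)   # serializing yields a str, like MessageToJson
parsed = json.loads(json_text)   # parsing the text yields a dict again

print(type(json_text).__name__)
print(type(parsed).__name__)
print(parsed["results"][0]["alternatives"][0]["transcript"])
```

So the extra json.loads step in the question is expected rather than a sign that something went wrong.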

You can use the RecognizeResponse directly as follows:

response: RecognizeResponse = client.recognize(config=your_config, audio=your_audio)
final_transcripts = []
final_transcripts_confidence = []
for result in response.results:
    alternative = result.alternatives[0]
    final_transcripts_confidence.append(alternative.confidence)
    final_transcripts.append(alternative.transcript)

If you still want to use MessageToJson and convert the result to a dictionary, you can do the following:

import json
from google.protobuf.json_format import MessageToJson

response: RecognizeResponse = client.recognize(config=your_config, audio=your_audio)
response_json_str = MessageToJson(response, indent=0)
response_dict = json.loads(response_json_str)

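Alternatively, use MessageToDict to convert directly to a dictionary.

Note:
Starting with some versions, the protobuf conversion changed and now raises an error: AttributeError: 'DESCRIPTOR'

To work around this, you should use: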

RecognizeResponse.to_json(response)

or:

RecognizeResponse.to_dict(response)
