python - How to get phonemes from Google Cloud API Text-to-Speech

Question

I am following the Google Cloud API Text-to-Speech Python tutorial. I would like to know whether there is a way to return the phonemes and their durations, which are an intermediate step in generating the synthesized speech. Is that possible? If so, could you please point me to the documentation and, ideally, some sample code that does this? I searched and could not find anything that already answers my question.

Thanks! gma

score 3 · Accepted Answer

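Below are all the steps to get phonemes from Google Cloud API Text-to-Speech. You can find the sample code in Part 3. Here are the steps you can follow:

[Part 1]

In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
Make sure that billing is enabled for your Cloud project.
Enable the Cloud Text-to-Speech API.
Create a service account:
    a. In the GCP Console, go to the Create service account page.
    b. Select a project.
    c. In the Service account name field, enter a name. The Cloud Console fills in the Service account ID field based on this name.
    d. Click Done to finish creating the service account. Do not close your browser window. You will use it in the next step.
Create a service account key:
    a. In the GCP Console, click the email address of the service account that you created.
    b. Click Keys.
    c. Click Add key, then click Create new key.
    d. Click Create. A JSON key file is downloaded to your computer.
    e. Click Close.
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable applies only to your current shell session, so if you open a new session, set the variable again.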

Example 1. Linux or macOS

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the path of the JSON file that contains your service account key.

For example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Example 2. Windows

For PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the path of the JSON file that contains your service account key.

For example:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

For Command Prompt:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

Replace KEY_PATH with the path of the JSON file that contains your service account key.

Install and initialize the Cloud SDK.
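
Before moving on, it can help to confirm that the variable is visible to Python and points at a service account key. The following is a minimal sketch, not part of the official quickstart: the helper name check_credentials is hypothetical, and it only inspects the "type" and "client_email" fields of the key file without contacting Google Cloud.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the service account email if the key file looks usable.

    Hypothetical helper: it reads the key file locally and does not
    verify the key against Google Cloud.
    """
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(key_path):
        raise RuntimeError(f"Key file not found: {key_path}")

    with open(key_path, encoding="utf-8") as fh:
        key = json.load(fh)

    # Service account key files carry a "type" and a "client_email" field.
    if key.get("type") != "service_account":
        raise RuntimeError("File is not a service account key")
    return key["client_email"]
```

Calling check_credentials() in the same shell session where you set the variable should return the service account's email address, or raise a RuntimeError that says which check failed.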

[Part 2]

Install the client library

pip install --upgrade google-cloud-texttospeech

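[Part 3]

Create audio data

Now you can use Text-to-Speech to create an audio file of synthetic human speech. Use the following code to send a synthesis request to the Text-to-Speech API.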

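from google.cloud import texttospeech

# Instantiate a client; it reads credentials from GOOGLE_APPLICATION_CREDENTIALS.
client = texttospeech.TextToSpeechClient()

# Set the text to be synthesized.
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")

# Select the language and the SSML voice gender.
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Select the type of audio file to return.
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request.
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary; write it to a file.
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')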
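
The request above returns only audio bytes. As a hedged sketch, the v1beta1 version of the client library can additionally return timepoints for SSML <mark> tags when enable_time_pointing is set on the request. This gives a timestamp for each mark (here, one per word), not phonemes or visemes. The helper names build_marked_ssml and synthesize_with_timepoints and the mark names w0, w1, ... are hypothetical, and the request itself has not been run here, so check the current API reference before relying on it.

```python
from xml.sax.saxutils import escape


def build_marked_ssml(text):
    """Wrap text in <speak> and put a numbered <mark> before each word."""
    words = text.split()
    parts = [f'<mark name="w{i}"/>{escape(w)}' for i, w in enumerate(words)]
    return "<speak>" + " ".join(parts) + "</speak>"


def synthesize_with_timepoints(ssml):
    """Return (audio_bytes, [(mark_name, seconds), ...]) for the given SSML.

    Assumption: uses the v1beta1 API surface, which may change.
    """
    # Imported lazily so the SSML helper works without the client library.
    from google.cloud import texttospeech_v1beta1 as tts

    client = tts.TextToSpeechClient()
    request = tts.SynthesizeSpeechRequest(
        input=tts.SynthesisInput(ssml=ssml),
        voice=tts.VoiceSelectionParams(language_code="en-US"),
        audio_config=tts.AudioConfig(audio_encoding=tts.AudioEncoding.MP3),
        # Ask the service to report when each <mark> is reached in the audio.
        enable_time_pointing=[tts.SynthesizeSpeechRequest.TimepointType.SSML_MARK],
    )
    response = client.synthesize_speech(request=request)
    timepoints = [(tp.mark_name, tp.time_seconds) for tp in response.timepoints]
    return response.audio_content, timepoints
```

For example, build_marked_ssml("Hello, World!") produces SSML with the marks w0 and w1, and passing that string to synthesize_with_timepoints would return the audio together with one (mark_name, seconds) pair per mark.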

If you run into any problems, refer to the following link:

https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#client-libraries-install-python

score 1 · Accepted Answer

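Thank you for your reply, @Akshansha. I know how to create an audio file of synthetic human speech. My question is more about how to get metadata such as phonemes or visemes. For example, with the Amazon Polly API you can get this kind of data when using Text-to-Speech: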

{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":73,"type":"viseme","value":"E"}
{"time":180,"type":"viseme","value":"r"}
{"time":292,"type":"viseme","value":"i"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":373,"type":"viseme","value":"k"}
{"time":460,"type":"viseme","value":"a"}
{"time":521,"type":"viseme","value":"t"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
{"time":604,"type":"viseme","value":"@"}
{"time":643,"type":"word","start":11,"end":17,"value":"little"}
{"time":643,"type":"viseme","value":"t"}
{"time":739,"type":"viseme","value":"i"}
{"time":769,"type":"viseme","value":"t"}
{"time":799,"type":"viseme","value":"t"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}

What I am asking is whether we can get similar results with Google Cloud API TTS.

Thanks, gma
