ssml - Unable to control volume level with prosody in Google Cloud Text-to-Speech

Question

The SSML volume attribute has no effect on the output audio.

Here is the SSML:

<speak>
    <prosody volume = "+0dB"> This is a sentence with volume 10 For GOOGLE. </prosody>
    <s><prosody volume = "+6dB"> This is a sentence with volume 6 For GOOGLE. </prosody></s> 
    <s><prosody volume = "+24dB"> This is a sentence with volume +24 For GOOGLE. </prosody></s>
    <s><prosody volume = "+48dB"> This is a sentence with volume +48 For GOOGLE.</prosody></s>
    <s><prosody volume = "+196dB"> This is a sentence with volume +196 For GOOGLE.</prosody></s>
</speak>

Here is the sample code:

  // Plain string literals are enough here; nothing is interpolated.
  string ssml = "<speak><prosody volume = \"+0dB\"> This is a sentence with volume 10 For GOOGLE.</prosody>" +
                " <s><prosody volume = \"+6dB\"> This is a sentence with volume 6 For GOOGLE.</prosody></s>" +
                " <s><prosody volume = \"+24dB\"> This is a sentence with volume +24 For GOOGLE.</prosody></s>" +
                " <s><prosody volume = \"+48dB\"> This is a sentence with volume +48 For GOOGLE.</prosody></s>" +
                " <s><prosody volume = \"+196dB\"> This is a sentence with volume +196 For GOOGLE.</prosody></s>" +
                "</speak>";

  Dubb(ssml);

    public static void Dubb(string ssml)
    {
        var client = TextToSpeechClient.Create();

        // The input to be synthesized, can be provided as text or SSML.
        var input = new SynthesisInput
        {
            Ssml = ssml
        };

        // Build the voice request.
        var voiceSelection = new VoiceSelectionParams
        {
            LanguageCode = "en-US",
            SsmlGender = SsmlVoiceGender.Female
        };

        // Specify the type of audio file.
        var audioConfig = new AudioConfig
        {
            AudioEncoding = AudioEncoding.Linear16
        };


        // Perform the text-to-speech request.
        var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);

        // Write the response to the output file.
        using (var output = File.Create("output.wav"))
        {
            response.AudioContent.WriteTo(output);
        }

    }

I expected the volume to increase with each sentence, but it does not.

score 0 · Accepted Answer

I tried this:

<speak>
    <prosody volume = "+0dB"> This is a sentence with volume 10 For GOOGLE. </prosody>
    <s><prosody volume = "+6dB"> This is a sentence with volume 6 For GOOGLE. </prosody></s> 
    <s><prosody volume = "+24dB"> This is a sentence with volume +24 For GOOGLE. </prosody></s>
    <s><prosody volume = "+48dB"> This is a sentence with volume +48 For GOOGLE.</prosody></s>
    <s><prosody volume = "+196dB"> This is a sentence with volume +196 For GOOGLE.</prosody></s>
</speak>

In the TTS UI, it does work as expected.

From there you can export the request as JSON (maybe it can help you):

{
  "audioConfig": {
    "audioEncoding": "LINEAR16",
    "pitch": 0,
    "speakingRate": 1
  },
  "input": {
    "ssml": "<speak> <prosody volume = \"+0dB\"> This is a sentence with volume 10 For GOOGLE. </prosody> <s><prosody volume = \"+6dB\"> This is a sentence with volume 6 For GOOGLE. </prosody></s> <s><prosody volume = \"+24dB\"> This is a sentence with volume +24 For GOOGLE. </prosody></s> <s><prosody volume = \"+48dB\"> This is a sentence with volume +48 For GOOGLE.</prosody></s> <s><prosody volume = \"+196dB\"> This is a sentence with volume +196 For GOOGLE.</prosody></s> </speak>"
  },
  "voice": {
    "languageCode": "en-US",
    "name": "en-US-Standard-A"
  }
}
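
One visible difference between this exported request and the C# code in the question is that the export names a specific voice (en-US-Standard-A), while the question's code selects a voice only by language and gender. Whether that explains the missing volume change is not confirmed here. As a minimal sketch, assuming the Google.Cloud.TextToSpeech.V1 package and working credentials, the same request could be written in C# roughly like this. The class name, the method name DubbWithNamedVoice, and the output file name are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using Google.Cloud.TextToSpeech.V1;

public static class NamedVoiceExample
{
    // Hypothetical variant of Dubb that mirrors the exported JSON request.
    public static void DubbWithNamedVoice(string ssml)
    {
        var client = TextToSpeechClient.Create();

        // The SSML is passed through unchanged.
        var input = new SynthesisInput
        {
            Ssml = ssml
        };

        // Name the same voice as the JSON export instead of selecting by gender.
        var voiceSelection = new VoiceSelectionParams
        {
            LanguageCode = "en-US",
            Name = "en-US-Standard-A"
        };

        // Same audio settings as the export: LINEAR16, pitch 0, speaking rate 1.
        var audioConfig = new AudioConfig
        {
            AudioEncoding = AudioEncoding.Linear16,
            Pitch = 0,
            SpeakingRate = 1
        };

        var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);

        // Write to a separate file so it can be compared with the original output.wav.
        using (var output = File.Create("output-named-voice.wav"))
        {
            response.AudioContent.WriteTo(output);
        }
    }
}
```

Calling DubbWithNamedVoice(ssml) with the same SSML string and comparing the result against the original output.wav would show whether the voice selection makes a difference in your setup.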
