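I basically want the TTS engine to print each word as it speaks it. I pretty much copied and pasted the example from the pyttsx3 documentation to do this, but it doesn't work.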

import pyttsx3
def onStart(name):
   print ('starting', name)
def onWord(name, location, length):
   print ('word', name, location, length)
def onEnd(name, completed):
   print ('finishing', name, completed)
engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
engine.say('The quick brown fox jumped over the lazy dog.')
engine.runAndWait()

This is the result. The word event fires only after the speech has finished, and none of the words are actually printed.

starting None
word None 1 0
finishing None True

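I've been working on this for several days. I've tried other libraries, such as win32com.client.Dispatch('SAPI.Spvoice') and gtts, but none of them seem to do what I want. SAPI.SpVoice appears to have an event that does what I want, but I can't get that to work either, and I'm not sure I'm using it correctly. https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms723593(v=vs.85)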

from win32com.client import Dispatch
import win32com.client

class ContextEvents():
    def onWord():
        print("the word event occured")
        
        # Work with Result
        
s = Dispatch('SAPI.Spvoice')
e = win32com.client.WithEvents(s, ContextEvents)
s.Speak('The quick brown fox jumped over the lazy dog.')

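As I understand it, there needs to be an event class, and the events have to appear in that class in the form On(event), or something like that. I also tried installing espeak, but that didn't work either. Keep in mind that I'm fairly new to Python, so a thorough explanation would be great.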

1 Answer

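I'm not familiar with that library, but what is most likely happening is that the stream is generated and played before the events can be passed to the wrapping library. I can say that if you wanted AWS's Polly to output word-level timing information, you would need two calls: one to get the audio stream and another to get the SSML metadata.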
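To illustrate the two-call pattern, here is a rough Python sketch. It assumes boto3 with configured AWS credentials, and it assumes the timing metadata is what Polly calls "speech marks", returned as newline-delimited JSON when `OutputFormat="json"` is requested. The function names and the `Joanna` default voice are placeholders chosen for this example, and the sample data at the bottom is made up to show the shape of the records.

```python
import json


def parse_speech_marks(raw):
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    marks = []
    for line in raw.decode("utf-8").splitlines():
        line = line.strip()
        if line:  # skip blank lines between records
            marks.append(json.loads(line))
    return marks


def fetch_audio_and_word_marks(text, voice_id="Joanna"):
    """Make two Polly calls: one for the audio, one for the word timings.

    Needs boto3 and AWS credentials, so it is not called in the demo below.
    """
    import boto3  # imported lazily so the parser works without boto3

    polly = boto3.client("polly")
    # Call 1: the audio stream itself.
    audio = polly.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="mp3"
    )
    # Call 2: word-level timing metadata for the same text and voice.
    marks = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="json",
        SpeechMarkTypes=["word"],
    )
    return audio["AudioStream"].read(), parse_speech_marks(
        marks["AudioStream"].read()
    )


if __name__ == "__main__":
    # Made-up sample in the speech-mark shape: time is in milliseconds.
    sample = (
        b'{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}\n'
        b'{"time":373,"type":"word","start":6,"end":11,"value":"world"}\n'
    )
    for mark in parse_speech_marks(sample):
        print(mark["time"], mark["value"])
```

Each parsed record carries a `time` offset and the word `value`, which is the same information the word-timing loop further down consumes.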

The Windows .NET System.Speech.Synthesis library does have progress events you can listen to, but I don't know whether there is a Python library that wraps it.

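However, if you're willing to run PowerShell commands from Python, you could try this gist I wrote, which wraps the Windows synthesis functionality and outputs word timings. Here is an example that should give you what you need: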

$text = "hello world! this is a long sentence with many words";
$sampleRate = 24000;

# generate tts and save bytes to memory (powershell variable)
# events holds event timings
# NOTE: assumes out-ssml-winrt.ps1 is in current directory, change as needed...
$events = .\out-ssml-winrt.ps1 $text -Variable 'soundstream' -SampleRate $sampleRate -Channels 1 -SpeechMarkTypes 'words';

# estimate duration based on samplerate (rough)
$estimatedDurationMilliseconds = $global:soundstream.Length / $sampleRate * 1000;

$global:e = $events;

# add a final event at the end of the loop to wait for audio to complete
$events += @([pscustomobject]@{ type = 'end'; time = $estimatedDurationMilliseconds; value = '' });
# create background player
$memstream = [System.IO.MemoryStream]::new($global:soundstream);
$player = [System.Media.SoundPlayer]::new($memstream)
$player.Play();

# loop through word events
$now = 0;
$events | % {
    $word = $_;
    # milliseconds into wav file event happens
    $when = $word.time;
    # distance from last timestamp to this event
    $delta = $when - $now;
    # wait until right time to display
    if ($delta -gt 0) {
        Start-Sleep -Milliseconds $delta;
    }
    $now = $when;
    # output word
    Write-Output $word.value;
}
# just to let you know - audio should be finished
Write-Output "Playback Complete";
$player.Stop(); $player.Dispose(); $memstream.Dispose();
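
If the word timings end up on the Python side (for example, parsed from the script's output), the display loop above could be ported roughly as follows. This is only a sketch: it assumes each event is a dict with a `time` in milliseconds and a `value` string, mirroring the fields used in the PowerShell loop, and it leaves audio playback out entirely. `replay_words` and its `emit` and `sleep` parameters are names invented for this example.

```python
import time


def replay_words(events, emit=print, sleep=time.sleep):
    """Emit each event's value once its timestamp (in ms) is reached.

    `emit` and `sleep` are injectable so the timing logic can be checked
    without actually waiting.
    """
    now_ms = 0
    for event in events:
        # Distance from the previous timestamp to this event.
        delta_ms = event["time"] - now_ms
        if delta_ms > 0:
            sleep(delta_ms / 1000.0)
        now_ms = event["time"]
        emit(event["value"])


if __name__ == "__main__":
    # Made-up timings, purely for demonstration.
    demo_events = [
        {"time": 0, "value": "hello"},
        {"time": 300, "value": "world"},
    ]
    replay_words(demo_events)
```

Like the PowerShell version, this sleeps for the gap between consecutive timestamps, so any time spent printing accumulates as drift; comparing against a monotonic clock would be tighter if that matters.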
Answered 2021-04-23T22:42:10.447