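I'm building a service that transcribes live audio streams. The async Amazon Transcribe Streaming SDK for Python can distinguish between speakers.
After passing the show_speaker_label=True parameter to the client configuration, the API returns a speaker label for each word, like this: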
{
"Transcript": {
"Results": [
{
"Alternatives": [
{
"Items": [
{
"Confidence": 0.97,
"Content": "From",
"EndTime": 18.98,
"Speaker": "0",
"StartTime": 18.74,
"Type": "pronunciation",
"VocabularyFilterMatch": false
},
{
"Confidence": 1,
"Content": "the",
"EndTime": 19.31,
"Speaker": "0",
"StartTime": 19,
"Type": "pronunciation",
"VocabularyFilterMatch": false
},
{
"Confidence": 1,
"Content": "last",
"EndTime": 19.86,
"Speaker": "0",
"StartTime": 19.32,
"Type": "pronunciation",
"VocabularyFilterMatch": false
},
...
{
"Confidence": 1,
"Content": "chronic",
"EndTime": 22.55,
"Speaker": "0",
"StartTime": 21.97,
"Type": "pronunciation",
"VocabularyFilterMatch": false
},
...
{
"Confidence": 1,
"Content": "fatigue",
"EndTime": 24.42,
"Speaker": "0",
"StartTime": 23.95,
"Type": "pronunciation",
"VocabularyFilterMatch": false
},
{
"EndTime": 25.22,
"StartTime": 25.22,
"Type": "speaker-change",
"VocabularyFilterMatch": false
},
{
"Confidence": 0.99,
"Content": "True",
"EndTime": 25.63,
"Speaker": "1",
"StartTime": 25.22,
"Type": "pronunciation",
"VocabularyFilterMatch": false
},
{
"Content": ".",
"EndTime": 25.63,
"StartTime": 25.63,
"Type": "punctuation",
"VocabularyFilterMatch": false
}
],
"Transcript": "From the last note she still has mild sleep deprivation and chronic fatigue True."
}
],
"EndTime": 25.63,
"IsPartial": false,
"ResultId": "XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX",
"StartTime": 18.74
}
]
}
}
I want to output a simple line-by-line transcript that includes a speaker label for each sentence, like this:
Speaker 1: Hello my name is Frank, what is yours?
Speaker 2: Hi, my name is Lucy. Nice to meet you.
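However, I'm not sure which strategy to use for parsing the API response. Is it better to iterate over the items and keep track of who is currently speaking? Or should I iterate over the results and wait until I hit an item of type "speaker-change"?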
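To make the first option concrete, here is a minimal sketch of what I have in mind. It assumes each event has already been converted to a plain dict shaped like the JSON above (the SDK itself may hand back objects rather than dicts), and the function names group_items_by_speaker and transcript_lines are just placeholders of mine, not part of the SDK.

```python
from typing import Dict, List, Optional, Tuple


def group_items_by_speaker(items: List[Dict]) -> List[Tuple[Optional[str], str]]:
    """Group consecutive items that share a "Speaker" label into (speaker, text) turns."""
    turns: List[Tuple[Optional[str], str]] = []
    current_speaker: Optional[str] = None
    words: List[str] = []

    for item in items:
        item_type = item.get("Type")
        if item_type == "speaker-change":
            # Marker only; the next pronunciation item carries the new label.
            continue
        content = item.get("Content", "")
        if item_type == "punctuation":
            # Punctuation has no speaker label, so attach it to the previous word.
            if words:
                words[-1] += content
            continue
        # Assumption: an item without a label belongs to the current speaker.
        speaker = item.get("Speaker", current_speaker)
        if words and speaker != current_speaker:
            turns.append((current_speaker, " ".join(words)))
            words = []
        current_speaker = speaker
        words.append(content)

    if words:
        turns.append((current_speaker, " ".join(words)))
    return turns


def transcript_lines(response: Dict) -> List[str]:
    """Turn one dict-shaped response into "Speaker N: text" lines, skipping partial results."""
    lines: List[str] = []
    for result in response.get("Transcript", {}).get("Results", []):
        if result.get("IsPartial"):
            continue
        alternatives = result.get("Alternatives", [])
        if not alternatives:
            continue
        for speaker, text in group_items_by_speaker(alternatives[0].get("Items", [])):
            lines.append(f"Speaker {speaker}: {text}")
    return lines
```

Labels are printed exactly as the API returns them ("0", "1") rather than renumbered, and a turn that spans two results would still come out as two separate lines in this sketch.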
I've searched Google for examples, but the solutions I found were either a bit messy or written for the JSON response returned by batch transcription.
Does anyone have experience parsing these results correctly? Your input would be very helpful.