I am building a service that transcribes live audio streams. The asynchronous Amazon Transcribe Streaming SDK for Python offers the ability to distinguish between speakers.

After passing the show_speaker_label=True parameter to the client configuration, the API returns a speaker label for each word, like this:

{
  "Transcript": {
    "Results": [
      {
        "Alternatives": [
          {
            "Items": [
              {
                "Confidence": 0.97,
                "Content": "From",
                "EndTime": 18.98,
                "Speaker": "0",
                "StartTime": 18.74,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "Confidence": 1,
                "Content": "the",
                "EndTime": 19.31,
                "Speaker": "0",
                "StartTime": 19,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "Confidence": 1,
                "Content": "last",
                "EndTime": 19.86,
                "Speaker": "0",
                "StartTime": 19.32,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
             ...
              {
                "Confidence": 1,
                "Content": "chronic",
                "EndTime": 22.55,
                "Speaker": "0",
                "StartTime": 21.97,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              ...
              {
                "Confidence": 1,
                "Content": "fatigue",
                "EndTime": 24.42,
                "Speaker": "0",
                "StartTime": 23.95,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "EndTime": 25.22,
                "StartTime": 25.22,
                "Type": "speaker-change",
                "VocabularyFilterMatch": false
              },
              {
                "Confidence": 0.99,
                "Content": "True",
                "EndTime": 25.63,
                "Speaker": "1",
                "StartTime": 25.22,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "Content": ".",
                "EndTime": 25.63,
                "StartTime": 25.63,
                "Type": "punctuation",
                "VocabularyFilterMatch": false
              }
            ],
            "Transcript": "From the last note she still has mild sleep deprivation and chronic fatigue True."
          }
        ],
        "EndTime": 25.63,
        "IsPartial": false,
        "ResultId": "XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX",
        "StartTime": 18.74
      }
    ]
  }
}
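Whichever strategy is used, the first step is the same: walk Results → Alternatives → Items. A minimal sketch, assuming the response has already been turned into a Python dict with exactly the keys shown above (the streaming SDK may hand you event objects rather than a dict, so this form is an assumption), and that only the first alternative of each non-partial result matters. The helper name iter_final_items is hypothetical:

```python
from typing import Any, Iterator


def iter_final_items(response: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every item of every final (non-partial) result.

    Assumes the shape shown above: Transcript -> Results -> Alternatives -> Items.
    Only the first (most likely) alternative of each result is used.
    """
    for result in response["Transcript"]["Results"]:
        # Partial results are revised by later events; skip them to avoid duplicates.
        if result.get("IsPartial", False):
            continue
        alternatives = result.get("Alternatives") or []
        if not alternatives:
            continue
        yield from alternatives[0].get("Items", [])


# Trimmed-down copy of the tail of the sample response.
sample = {
    "Transcript": {
        "Results": [
            {
                "IsPartial": False,
                "Alternatives": [
                    {
                        "Items": [
                            {"Content": "chronic", "Speaker": "0", "Type": "pronunciation"},
                            {"Content": "fatigue", "Speaker": "0", "Type": "pronunciation"},
                            {"Type": "speaker-change"},
                            {"Content": "True", "Speaker": "1", "Type": "pronunciation"},
                            {"Content": ".", "Type": "punctuation"},
                        ]
                    }
                ],
            }
        ]
    }
}

for item in iter_final_items(sample):
    # Punctuation and speaker-change items carry no "Speaker" key in the sample.
    print(item["Type"], item.get("Speaker"), item.get("Content"))
```

Note that in the sample only pronunciation items carry a Speaker value, so any parsing strategy has to decide where punctuation belongs.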

I would like to output a simple line-by-line transcript that includes the speaker label for each sentence, like this:

Speaker 1: Hello my name is Frank, what is yours?
Speaker 2: Hi, my name is Lucy. Nice to meet you.
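The raw labels in the response are "0", "1", and so on, while the desired output starts at "Speaker 1". A small sketch of the final rendering step, assuming the transcript has already been grouped into hypothetical (speaker_label, text) pairs; speakers are numbered in order of first appearance, so the output does not depend on which label values the service happens to use:

```python
def render_lines(turns: list[tuple[str, str]]) -> list[str]:
    """Turn (speaker_label, text) pairs into "Speaker N: text" lines.

    Speakers are numbered 1, 2, ... in order of first appearance, so raw
    labels such as "0" and "1" never leak into the output.
    """
    numbers: dict[str, int] = {}
    lines = []
    for label, text in turns:
        # setdefault assigns the next number only the first time a label is seen.
        n = numbers.setdefault(label, len(numbers) + 1)
        lines.append(f"Speaker {n}: {text}")
    return lines


turns = [
    ("0", "Hello my name is Frank, what is yours?"),
    ("1", "Hi, my name is Lucy. Nice to meet you."),
]
print("\n".join(render_lines(turns)))
```

Numbering by first appearance (rather than adding 1 to the label) is a design choice of this sketch; it keeps the output stable even if the first speaker heard is not labeled "0".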

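However, I'm not sure which strategy to use for parsing the API response. Is it better to parse the results by iterating over the items and keeping track of who is currently speaking? Or should I iterate over the results and wait until I encounter an item of type "speaker-change"?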
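For reference, the first option (iterate over items and track the current speaker) can be sketched as a small stateful accumulator. State lives on the object so a turn can continue across several results, which is how a live stream delivers them. This assumes each result arrives as a dict shaped like one entry of "Results" above; SpeakerTranscript is a hypothetical name, not part of the SDK:

```python
from typing import Any


class SpeakerTranscript:
    """Accumulate final streaming results into (speaker_label, text) turns."""

    def __init__(self) -> None:
        # Each turn is (speaker_label, list of words); words are mutated in place.
        self._turns: list[tuple[str, list[str]]] = []

    def add_result(self, result: dict[str, Any]) -> None:
        # Partial results are superseded by a later final result; skip them.
        if result.get("IsPartial", False):
            return
        alternatives = result.get("Alternatives") or []
        if not alternatives:
            return
        for item in alternatives[0].get("Items", []):
            kind = item.get("Type")
            if kind == "pronunciation":
                speaker = item["Speaker"]
                # A label different from the current turn's starts a new turn.
                if not self._turns or self._turns[-1][0] != speaker:
                    self._turns.append((speaker, []))
                self._turns[-1][1].append(item["Content"])
            elif kind == "punctuation" and self._turns:
                # Punctuation has no Speaker in the sample; glue it to the last word.
                self._turns[-1][1][-1] += item["Content"]
            # "speaker-change" items are skipped: the per-word labels already
            # show where the speaker switches.

    def turns(self) -> list[tuple[str, str]]:
        return [(speaker, " ".join(words)) for speaker, words in self._turns]


transcript = SpeakerTranscript()
transcript.add_result({
    "IsPartial": False,
    "Alternatives": [{"Items": [
        {"Content": "chronic", "Speaker": "0", "Type": "pronunciation"},
        {"Content": "fatigue", "Speaker": "0", "Type": "pronunciation"},
        {"Type": "speaker-change"},
        {"Content": "True", "Speaker": "1", "Type": "pronunciation"},
        {"Content": ".", "Type": "punctuation"},
    ]}],
})
print(transcript.turns())
```

The same loop could split on "speaker-change" items instead, but this sketch treats the per-word Speaker field as the source of truth: it needs no extra bookkeeping if a switch happens at a result boundary, and the sample does not show whether a marker is emitted in that case.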

I have searched Google for examples, but the solutions I found are either rather messy or meant for the JSON response returned by batch transcription.

Does anyone have experience parsing these results correctly? Any input would be very helpful.
