.net - Microsoft 语音识别：具有置信度分数的替代结果？

Question

我是使用 Microsoft.Speech 识别器（使用 Microsoft Speech Platform SDK 版本 11）的新手，我试图让它从一个简单的语法输出 n 最佳识别匹配，以及每个的置信度分数。

根据文档（以及在对这个问题的回答中e.Result.Alternates提到的），除了得分最高的单词之外，应该能够使用它来访问已识别的单词。但是，即使将置信度拒绝阈值重置为 0（这应该意味着什么都不会被拒绝），我仍然只得到一个结果，并且没有替代（尽管SpeechHypothesized事件表明至少其他单词中的一个似乎确实被识别为 non - 在某些时候置信度为零）。

我的问题：任何人都可以向我解释为什么我只得到一个识别词，即使置信度拒绝阈值设置为零？如何获得其他可能的匹配项及其置信度分数？我在这里想念什么？

下面是我的代码。提前感谢任何可以提供帮助的人:)

在下面的示例中，识别器被发送一个单词“news”的 wav 文件，并且必须从相似的单词（“noose”、“newts”）中进行选择。我想提取每个单词的识别器置信度得分列表（它们都应该不为零），即使它只会返回最好的一个（“新闻”）作为结果。

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Speech.Recognition;

namespace SimpleRecognizer
{
    class Program
    {
        static readonly string[] settings = new string[] {
            "CFGConfidenceRejectionThreshold",
            "HighConfidenceThreshold", 
            "NormalConfidenceThreshold",
            "LowConfidenceThreshold"};

        static void Main(string[] args)
        {
            // Create a new SpeechRecognitionEngine instance.
            SpeechRecognitionEngine sre = new SpeechRecognitionEngine(); //en-US SRE

            // Configure the input to the recognizer.
            sre.SetInputToWaveFile(@"C:\Users\Anjana\Documents\news.wav");

            // Display Recognizer Settings (Confidence Thresholds)
            ListSettings(sre);

            // Set Confidence Threshold to Zero (nothing should be rejected)
            sre.UpdateRecognizerSetting("CFGConfidenceRejectionThreshold", 0);
            sre.UpdateRecognizerSetting("HighConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("NormalConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("LowConfidenceThreshold", 0);

            // Display New Recognizer Settings
            ListSettings(sre);

            // Build a simple Grammar with three choices
            Choices topics = new Choices();
            topics.Add(new string[] { "news", "newts", "noose" });
            GrammarBuilder gb = new GrammarBuilder();
            gb.Append(topics);
            Grammar g = new Grammar(gb);
            g.Name = "g";

            // Load the Grammar
            sre.LoadGrammar(g);

            // Register handlers for Grammar's SpeechRecognized Events
            g.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(gram_SpeechRecognized);

            // Register a handler for the recognizer's SpeechRecognized event.
            sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

            // Register Handler for SpeechHypothesized
            sre.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);

            // Start recognition.
            sre.Recognize();

            Console.ReadKey(); //wait to close

        }
        static void gram_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nNumber of Alternates from Grammar {1}: {0}", e.Result.Alternates.Count.ToString(), e.Result.Grammar.Name);
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }
        static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nSpeech recognized: " + e.Result.Text + ", " + e.Result.Confidence);
            Console.WriteLine("Number of Alternates from Recognizer: {0}", e.Result.Alternates.Count.ToString());
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }
        static void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            Console.WriteLine("Speech from grammar {0} hypothesized: {1}, {2}", e.Result.Grammar.Name, e.Result.Text, e.Result.Confidence);
        }
        private static void ListSettings(SpeechRecognitionEngine recognizer)
        {
            foreach (string setting in settings)
            {
                try
                {
                    object value = recognizer.QueryRecognizerSetting(setting);
                    Console.WriteLine("  {0,-30} = {1}", setting, value);
                }
                catch
                {
                    Console.WriteLine("  {0,-30} is not supported by this recognizer.",
                      setting);
                }
            }
            Console.WriteLine();
        }
    }
}

这给出了以下输出：

Original recognizer settings:
  CFGConfidenceRejectionThreshold = 20
  HighConfidenceThreshold        = 80
  NormalConfidenceThreshold      = 50
  LowConfidenceThreshold         = 20

Updated recognizer settings:
  CFGConfidenceRejectionThreshold = 0
  HighConfidenceThreshold        = 0
  NormalConfidenceThreshold      = 0
  LowConfidenceThreshold         = 0

Speech from grammar g hypothesized: noose, 0.2214646
Speech from grammar g hypothesized: news, 0.640804

Number of Alternates from Grammar g: 1
news, 0.9208503

Speech recognized: news, 0.9208503
Number of Alternates from Recognizer: 1
news, 0.9208503

我还尝试为每个单词使用一个单独的短语（而不是一个具有三个选项的短语），甚至为每个单词/短语使用单独的语法来实现这一点。结果基本相同：只有一个“替代品”。

score 1 · Accepted Answer

我相信这是 SAPI 允许您询问 SR 引擎并不真正支持的东西的另一个地方。

Microsoft.Speech.Recognition 和 System.Speech.Recognition 都使用底层 SAPI 接口来完成它们的工作；唯一的区别是使用的是哪个 SR 引擎。（Microsoft.Speech.Recognition 使用服务器引擎；System.Speech.Recognition 使用桌面引擎。）

替代主要是为听写设计的，而不是上下文无关的语法。您始终可以为 CFG 获得一个替代项，但替代生成代码看起来不会扩展 CFG 的替代项。

不幸的是，Microsoft.Speech.Recognition 引擎不支持听写。（但是，它确实可以处理质量低得多的音频，并且不需要培训。）

.net - Microsoft 语音识别：具有置信度分数的替代结果？

1 回答 1

Related

Reference