c# - Why is the Microsoft Speech Recognition SemanticValue.Confidence value always 1?

Question

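I'm trying to use the SpeechRecognizer with a custom grammar to handle the following pattern:

"Can you open {item}?" where {item} uses a DictationGrammar.

I'm using the speech engine built into Vista, with .NET 4.0.

I would like to get the confidence of the SemanticValues that are returned. See the example below.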

If I simply use "recognizer.LoadGrammar(new DictationGrammar())", I can browse through e.Result.Alternates and view the confidence value of each alternate. That works as long as the DictationGrammar is at the top level.

Made-up example:

Can you open Firefox? .95
Can you open Fairfax? .93
Can you open file fax? .72
Can you pen Firefox? .85
Can you pin Fairfax? .63

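But if I build a grammar that looks for "Can you open {semanticValue Key='item' GrammarBuilder=new DictationGrammar()}?", then I get this:

Can you open Firefox? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open Fairfax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open file fax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you pen Firefox? .85 - Semantics = null
Can you pin Fairfax? .63 - Semantics = null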

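The .91 tells me how confident the engine is that the utterance matched the pattern "Can you open {item}?", but it doesn't distinguish any further than that.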
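For reference, a grammar of the kind described above could be built roughly like this. It is only a minimal sketch, assuming System.Speech is referenced; the class name OpenItemGrammar is made up for illustration, and the grammar name is chosen to match the output shown above.

```csharp
using System.Speech.Recognition;

static class OpenItemGrammar
{
    // Builds "Can you open <dictation>", with the dictated part
    // stored under the semantic key "item".
    public static Grammar Create()
    {
        // Free-form dictation for the {item} slot.
        GrammarBuilder dictation = new GrammarBuilder();
        dictation.AppendDictation();

        // Fixed carrier phrase followed by the keyed dictation slot.
        GrammarBuilder pattern = new GrammarBuilder("Can you open");
        pattern.Append(new SemanticResultKey("item", dictation));

        Grammar grammar = new Grammar(pattern);
        grammar.Name = "can you open";
        return grammar;
    }
}
```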

However, if I then look at e.Result.Alternates.Semantics.Where( s => s.Key == "item" ) and view their Confidence, I get this:

Firefox 1.0
Fairfax 1.0
file fax 1.0

Which doesn't help me much.

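What I really want, when I view the Confidence of the matching SemanticValues, is something like this:

Firefox .95
Fairfax .93
file fax .85

It seems like it should work that way...

Am I doing something wrong? Is there even a way to do this within the Speech framework?

I'm hoping there is some built-in mechanism so that I can do this the "right" way.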

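As for another approach that would probably work:

1. Use the SemanticValue approach to match on the pattern.
2. For anything that matches the pattern, extract the raw audio for {item} (using RecognitionResult.Words and RecognitionResult.GetAudioForWordRange).
3. Run the raw audio for {item} through a recognizer loaded with a DictationGrammar to get the confidence.

...but that's more processing than I really want to do.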
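The three steps above might be sketched as follows. This is a rough outline rather than a tested solution: the ItemRescorer class and the firstItemWord/lastItemWord indices are hypothetical, and working out which words belong to {item} is left to the caller. It uses SpeechRecognitionEngine for the second pass because that class can take its input from a stream.

```csharp
using System.IO;
using System.Speech.Recognition;

static class ItemRescorer
{
    // Re-recognizes the audio behind words [firstItemWord..lastItemWord]
    // of a result using a plain dictation grammar. Returns the confidence
    // of the second pass, or null if nothing was recognized.
    public static float? Rescore(RecognitionResult result, int firstItemWord, int lastItemWord)
    {
        // Step 2: cut out the audio that corresponds to {item}.
        RecognizedAudio itemAudio = result.GetAudioForWordRange(
            result.Words[firstItemWord], result.Words[lastItemWord]);

        using (MemoryStream wave = new MemoryStream())
        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            itemAudio.WriteToWaveStream(wave);
            wave.Position = 0; // rewind before the engine reads it

            // Step 3: run that audio through a dictation-only recognizer.
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveStream(wave);
            RecognitionResult second = engine.Recognize();

            return second == null ? (float?)null : second.Confidence;
        }
    }
}
```

A short clip with no surrounding context may well be recognized less reliably than the full sentence, so the second-pass confidence is only a rough signal.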

score 2 · Accepted Answer

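I think a dictation grammar only does transcription. It converts speech to text without extracting semantic meaning because, by definition, a dictation grammar supports all words and has no clues about your specific semantic mapping. You need to use a custom grammar to extract semantic meaning. If you supply an SRGS grammar, or build one in code or with the SpeechServer tools, you can specify semantic mappings for certain words and phrases. The recognizer can then extract the semantic meaning and give you a semantic confidence.

You should be able to get a Confidence value for the recognition itself from the recognizer; try System.Speech.Recognition.RecognitionResult.Confidence.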
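As a small illustration, a SpeechRecognized handler could print the phrase-level confidence of the result and of each alternate. This is a sketch only; the ConfidenceDump class name is made up, and writing to the console is just for demonstration.

```csharp
using System;
using System.Speech.Recognition;

static class ConfidenceDump
{
    // Intended as a handler for the SpeechRecognized event.
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // Confidence of the best match as a whole phrase.
        Console.WriteLine("{0} ({1:F2})", e.Result.Text, e.Result.Confidence);

        // Each alternate carries its own phrase-level confidence.
        foreach (RecognizedPhrase alternate in e.Result.Alternates)
        {
            Console.WriteLine("  alt: {0} ({1:F2})", alternate.Text, alternate.Confidence);
        }
    }
}
```

It would be attached with something like recognizer.SpeechRecognized += ConfidenceDump.OnSpeechRecognized;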

The help file that comes with the Microsoft Server Speech Platform 10.2 SDK has more details. (This is the Microsoft.Speech API for server applications, which is very similar to the System.Speech API for client applications.) See http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4 or the Microsoft.Speech documentation at http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.semanticvalue(v=office.13).aspx

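For the SemanticValue class it says:

All Speech platform-based recognition engines provide valid instances of SemanticValue for all recognized output, even for phrases with no explicit semantic structure.

The SemanticValue instance for a phrase is obtained using the Semantics property on the RecognizedPhrase object (or on objects that inherit from it, such as RecognitionResult).

SemanticValue objects obtained for recognized phrases without semantic structure are characterized by:

Having no children (Count is 0).

The Value property is null.

An artificial confidence level of 1.0 (returned by Confidence).

Typically, applications create instances of SemanticValue indirectly, adding them to Grammar objects by using SemanticResultValue and SemanticResultKey instances in conjunction with Choices and GrammarBuilder objects.

Direct construction of a SemanticValue is useful during the creation of strongly typed grammars.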
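Going by the characteristics quoted above, one way to avoid trusting that artificial 1.0 is to check whether a SemanticValue carries any structure before reading its Confidence. A minimal sketch (the SemanticsCheck helper is hypothetical, not part of the API):

```csharp
using System.Speech.Recognition;

static class SemanticsCheck
{
    // True when the SemanticValue looks like the placeholder described in
    // the documentation: no children and a null Value. In that case its
    // Confidence of 1.0 carries no information.
    public static bool IsPlaceholder(SemanticValue semantics)
    {
        return semantics == null
            || (semantics.Count == 0 && semantics.Value == null);
    }
}
```

For example, result.Semantics["item"].Confidence would only be used when IsPlaceholder(result.Semantics["item"]) returns false.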

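When you use the SemanticValue features in a grammar, you are typically trying to map different phrases to a single meaning. In your case, the phrases "IE" and "Internet Explorer" should both map to the same semantic meaning. You set up choices in your grammar to cover each phrase that can map to a specific meaning. Here is a simple WinForms example: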

// Requires: using System; using System.Speech.Recognition;
// WriteTextOuput is the form's own output helper (not shown here).
private void btnTest_Click(object sender, EventArgs e)
{
    // Dispose the engine when done so the audio device is released.
    using (SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine())
    {
        Grammar testGrammar = CreateTestGrammar();
        myRecognizer.LoadGrammar(testGrammar);

        try
        {
            // use microphone
            myRecognizer.SetInputToDefaultAudioDevice();
            WriteTextOuput("");

            // Recognize() returns null when nothing was recognized.
            RecognitionResult result = myRecognizer.Recognize();

            if (result != null && result.Semantics.ContainsKey("item"))
            {
                string item = result.Semantics["item"].Value.ToString();
                float confidence = result.Semantics["item"].Confidence;
                WriteTextOuput(String.Format("Item is '{0}' with confidence {1}.", item, confidence));
            }
        }
        catch (InvalidOperationException exception)
        {
            WriteTextOuput(String.Format("Could not recognize input from default audio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message));
        }
    }
}

private Grammar CreateTestGrammar()
{                        
    // item
    Choices item = new Choices();
    SemanticResultValue itemSRV;
    itemSRV = new SemanticResultValue("I E", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("explorer", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("firefox", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("mozilla", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("chrome", "chrome");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("google chrome", "chrome");
    item.Add(itemSRV);
    SemanticResultKey itemSemKey = new SemanticResultKey("item", item);

    //build the permutations of choices...
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(itemSemKey);

    //now build the complete pattern...
    GrammarBuilder itemRequest = new GrammarBuilder();
    // carrier phrase: "Can you open", "Open" or "Please open"
    itemRequest.Append(new Choices("Can you open", "Open", "Please open"));

    // the item is optional: it may appear zero or one time
    itemRequest.Append(gb, 0, 1);

    Grammar TestGrammar = new Grammar(itemRequest);
    return TestGrammar;
}
