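First, there is no built-in concept of numbers. Speech is just a sequence of words, so if you need to recognize numbers, you need to recognize the words that represent them, such as "one" and "fifteen". Some numbers are represented by several words, such as "one hundred" or "fifty one", and you need to recognize those too.
You can start by recognizing the numbers 1 through 9: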
var engine = new SpeechRecognitionEngine(CultureInfo.GetCultureInfo("en-US"));
engine.SetInputToDefaultAudioDevice();
var num1To9 = new Choices(
new SemanticResultValue("one", 1),
new SemanticResultValue("two", 2),
new SemanticResultValue("three", 3),
new SemanticResultValue("four", 4),
new SemanticResultValue("five", 5),
new SemanticResultValue("six", 6),
new SemanticResultValue("seven", 7),
new SemanticResultValue("eight", 8),
new SemanticResultValue("nine", 9));
var gb = new GrammarBuilder();
gb.Culture = CultureInfo.GetCultureInfo("en-US");
gb.Append("set timer for");
gb.Append(num1To9);
gb.Append("seconds");
var g = new Grammar(gb);
engine.LoadGrammar(g); // better not use LoadGrammarAsync
engine.SpeechRecognized += OnSpeechRecognized;
engine.RecognizeAsync(RecognizeMode.Multiple);
Console.WriteLine("Speak");
Console.ReadKey();
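So our grammar can be read as:
- the phrase "set timer for"
- followed by "one" or "two" or "three"...
- followed by "seconds"
We use SemanticResultValue to assign a tag to a specific phrase. In this case the tag is the number (1, 2, 3...) that corresponds to a specific word ("one", "two", "three"). By doing so, you can extract that value from the recognition result: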
private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
var numSeconds = (int)e.Result.Semantics.Value;
Console.WriteLine($"Starting timer for {numSeconds} seconds...");
}
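This is already a working example: it recognizes phrases such as "set timer for five seconds" and lets you extract the semantic value (5) from them.
Now you can combine various number words together, for example: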
var num10To19 = new Choices(
new SemanticResultValue("ten", 10),
new SemanticResultValue("eleven", 11),
new SemanticResultValue("twelve", 12),
new SemanticResultValue("thirteen", 13),
new SemanticResultValue("fourteen", 14),
new SemanticResultValue("fifteen", 15),
new SemanticResultValue("sixteen", 16),
new SemanticResultValue("seventeen", 17),
new SemanticResultValue("eighteen", 18),
new SemanticResultValue("nineteen", 19)
);
var numTensFrom20To90 = new Choices(
new SemanticResultValue("twenty", 20),
new SemanticResultValue("thirty", 30),
new SemanticResultValue("forty", 40),
new SemanticResultValue("fifty", 50),
new SemanticResultValue("sixty", 60),
new SemanticResultValue("seventy", 70),
new SemanticResultValue("eighty", 80),
new SemanticResultValue("ninety", 90)
);
var num20to99 = new GrammarBuilder();
// first word is "twenty", "thirty" etc
num20to99.Append(numTensFrom20To90);
// followed by ONE OR ZERO "digit" words ("one", "two", "three" etc)
num20to99.Append(num1To9, 0, 1);
But assigning the correct semantic values to them becomes tricky, because the GrammarBuilder API is not powerful enough to do that.
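One possible workaround, if you want to stay with GrammarBuilder for two-word numbers, is to give each part its own SemanticResultKey and add the parts together in the handler. The following is an untested sketch under a few assumptions: it uses the System.Speech namespaces (with the Microsoft Speech Platform SDK the equivalent namespace would be Microsoft.Speech.Recognition), the key names "tens" and "ones" and the helper NumberWords are hypothetical names chosen here, and it only covers 20 to 99.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class TwoDigitTimerSketch
{
    // Hypothetical helper: words[i] gets the semantic value first + i * step.
    static Choices NumberWords(string[] words, int first, int step)
    {
        var choices = new Choices();
        for (int i = 0; i < words.Length; i++)
        {
            choices.Add(new GrammarBuilder(
                new SemanticResultValue(words[i], first + i * step)));
        }
        return choices;
    }

    static void Main()
    {
        var culture = CultureInfo.GetCultureInfo("en-US");
        var ones = NumberWords(new[] { "one", "two", "three", "four", "five",
                                       "six", "seven", "eight", "nine" }, 1, 1);
        var tens = NumberWords(new[] { "twenty", "thirty", "forty", "fifty",
                                       "sixty", "seventy", "eighty", "ninety" }, 20, 10);

        var gb = new GrammarBuilder { Culture = culture };
        gb.Append("set timer for");
        // "tens" and "ones" are arbitrary key names (assumptions of this sketch).
        gb.Append(new SemanticResultKey("tens", new GrammarBuilder(tens)));
        // The "ones" word is optional: zero or one occurrence.
        gb.Append(new GrammarBuilder(
            new SemanticResultKey("ones", new GrammarBuilder(ones))), 0, 1);
        gb.Append("seconds");

        using (var engine = new SpeechRecognitionEngine(culture))
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(new Grammar(gb));
            engine.SpeechRecognized += OnSpeechRecognized;
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Speak");
            Console.ReadKey();
        }
    }

    static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        var sem = e.Result.Semantics;
        // The grammar does not compute the total; the handler sums the parts.
        int numSeconds = (int)sem["tens"].Value;
        if (sem.ContainsKey("ones"))
        {
            numSeconds += (int)sem["ones"].Value;
        }
        Console.WriteLine($"Starting timer for {numSeconds} seconds...");
    }
}
```

The grammar itself still cannot produce a single combined value, so the arithmetic moves into your code, and this gets unwieldy quickly once hundreds and thousands are involved.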
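When what you want cannot (easily) be done with the plain GrammarBuilder and related classes, you have to use the more powerful XML grammar files, whose syntax is defined by the W3C Speech Recognition Grammar Specification (SRGS).
Describing these grammar files is beyond the scope of this question, but fortunately for your task a suitable grammar file already ships with the Microsoft Speech SDK, which you have probably already downloaded and installed. So copy the file from "C:\Program Files\Microsoft SDKs\Speech\v11.0\Samples\Sample Grammars\en-US.grxml" (or wherever you installed the SDK) and remove some irrelevant content, such as the first <tag> element, which contains a large CDATA section. The code below assumes the copy is saved as "en-US-sample.grxml".
The relevant rule in this file is named "Cardinal", and it recognizes numbers from 0 to 1 million. Our code then becomes: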
var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");
sampleDoc.Culture = CultureInfo.GetCultureInfo("en-US");
// define new rule, named Timer
SrgsRule rootRule = new SrgsRule("Timer");
// match "set timer for" phrase
rootRule.Add(new SrgsItem("set timer for"));
// followed by whatever "Cardinal" rule defines (reference to another rule)
rootRule.Add(new SrgsRuleRef(sampleDoc.Rules["Cardinal"]));
// followed by "seconds"
rootRule.Add(new SrgsItem("seconds"));
// add to rules
sampleDoc.Rules.Add(rootRule);
// make it a root rule, so that it will be used for recognition
sampleDoc.Root = rootRule;
var g = new Grammar(sampleDoc);
engine.LoadGrammar(g); // better not use LoadGrammarAsync
engine.SpeechRecognized += OnSpeechRecognized;
engine.RecognizeAsync(RecognizeMode.Multiple);
The handler becomes:
private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
var numSeconds = Convert.ToInt32(e.Result.Semantics.Value);
Console.WriteLine($"Starting timer for {numSeconds} seconds...");
}
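Now you can recognize numbers up to 1 million.
Of course, there is no need to define rules in code as we did above. You can define all the rules entirely in XML, then just load the file into an SrgsDocument and create a Grammar from it.
If you want to recognize several commands, here is an example: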
var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");
sampleDoc.Culture = CultureInfo.GetCultureInfo("en-US");
// this rule is the same as above
var setTimerRule = new SrgsRule("SetTimer");
setTimerRule.Add(new SrgsItem("set timer for"));
setTimerRule.Add(new SrgsRuleRef(sampleDoc.Rules["Cardinal"]));
setTimerRule.Add(new SrgsItem("seconds"));
sampleDoc.Rules.Add(setTimerRule);
// new rule, clear timer
var clearTimerRule = new SrgsRule("ClearTimer");
// just match this phrase
clearTimerRule.Add(new SrgsItem("clear timer"));
sampleDoc.Rules.Add(clearTimerRule);
// new root rule, matching either set timer OR clear timer
var rootRule = new SrgsRule("Timers");
rootRule.Add(new SrgsOneOf( // << OneOf is basically the same as Choice
// reference to SetTimer
new SrgsItem(new SrgsRuleRef(setTimerRule),
// assign command name. Both "command" and "settimer" are arbitrary names I chose
new SrgsSemanticInterpretationTag("out = rules.latest();out.command = 'settimer';")),
new SrgsItem(new SrgsRuleRef(clearTimerRule),
// assign command name. If this rule "wins" - command will be cleartimer
new SrgsSemanticInterpretationTag("out.command = 'cleartimer';"))
));
sampleDoc.Rules.Add(rootRule);
sampleDoc.Root = rootRule;
var g = new Grammar(sampleDoc);
The handler becomes:
private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
var sem = e.Result.Semantics;
// here "command" is arbitrary key we assigned in our rule
var commandName = (string) sem["command"].Value;
switch (commandName) {
// also arbitrary values we assigned, not related to rule names or something else
case "settimer":
var numSeconds = Convert.ToInt32(sem.Value);
Console.WriteLine($"Starting timer for {numSeconds} seconds...");
break;
case "cleartimer":
Console.WriteLine("timer cleared");
break;
}
}
For completeness, here is how you can do the same with pure XML. Open the "en-US-sample.grxml" file in an XML editor and add the rules we defined in code above. They look like this:
<rule id="SetTimer" scope="private">
<item>set timer for</item>
<item>
<ruleref uri="#Cardinal" />
</item>
<item>seconds</item>
</rule>
<rule id="ClearTimer" scope="private">
<item>clear timer</item>
</rule>
<rule id="Timers" scope="public">
<one-of>
<item>
<ruleref uri="#SetTimer" />
<tag>out = rules.latest(); out.command = 'settimer'</tag>
</item>
<item>
<ruleref uri="#ClearTimer" />
<tag>out.command = 'cleartimer'</tag>
</item>
</one-of>
</rule>
Now set the root rule on the root grammar tag:
<grammar xml:lang="en-US" version="1.0" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0"
root="Timers">
and save.
Now we don't need to define anything in code at all. All we need to do is load our grammar file:
var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");
var g = new Grammar(sampleDoc);
engine.LoadGrammar(g);
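That's it. Because the "Timers" rule is the root rule in the grammar file, it will be used for recognition, and it behaves exactly like the version we defined in code.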
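Putting the XML-only variant together, a complete console program might look like the sketch below. It assumes the System.Speech namespaces (with the Microsoft Speech Platform SDK you would reference Microsoft.Speech.Recognition and Microsoft.Speech.Recognition.SrgsGrammar instead), and that "en-US-sample.grxml" sits next to the executable with the "Timers" root rule added as described. The class name TimerCommands and the rejection message are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;

static class TimerCommands
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(CultureInfo.GetCultureInfo("en-US")))
        {
            engine.SetInputToDefaultAudioDevice();

            // All rules, including the root rule, come from the XML file.
            var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");
            engine.LoadGrammar(new Grammar(sampleDoc));

            engine.SpeechRecognized += OnSpeechRecognized;
            // Optional: report input that did not match the grammar.
            engine.SpeechRecognitionRejected +=
                (sender, e) => Console.WriteLine("Not recognized, try again");

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Speak");
            Console.ReadKey();
        }
    }

    static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        var sem = e.Result.Semantics;
        // "command" is the key assigned in the <tag> elements of the Timers rule.
        var commandName = (string)sem["command"].Value;
        switch (commandName)
        {
            case "settimer":
                var numSeconds = Convert.ToInt32(sem.Value);
                Console.WriteLine($"Starting timer for {numSeconds} seconds...");
                break;
            case "cleartimer":
                Console.WriteLine("timer cleared");
                break;
        }
    }
}
```

Wrapping the engine in a using block is a choice made here so that the audio device is released when the program exits; the snippets above leave disposal out for brevity.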