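I'm working on a personal project involving a microphone in my apartment that I can issue verbal commands to. To do this, I've been using the Microsoft Speech API, specifically the SpeechRecognitionEngine class from System.Speech.Recognition in C#. I construct a grammar as follows: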
// validCommands is a Choices object containing all valid command strings
// recognizer is a SpeechRecognitionEngine
GrammarBuilder builder = new GrammarBuilder(recognitionSystemName);
builder.Append(validCommands);
recognizer.SetInputToDefaultAudioDevice();
recognizer.LoadGrammar(new Grammar(builder));
recognizer.RecognizeAsync(RecognizeMode.Multiple);
// etc ...
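This seems to work well when I actually give it a command: it has yet to misidentify one of my commands. Unfortunately, it also tends to pick up random conversation as commands! I tried to mitigate this by prefixing the command Choices object with a "name" (recognitionSystemName) that I address the system by. Strangely, this doesn't seem to help. I'm restricting it to a set of predetermined command phrases, so I would expect it to be able to detect when speech doesn't match any of them. My best guess is that it assumes every sound is a command and picks the best match from the command set. Any advice on improving this system so that it no longer triggers on conversation not directed at it would be very helpful.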
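To make the guess above concrete, here is a minimal sketch of the kind of filtering I have in mind: ignore any result whose Confidence falls below a cutoff, and log what the engine rejects on its own through SpeechRecognitionRejected. The 0.90 cutoff, the "lights on" / "lights off" commands, and the ConfidenceFilterSketch and IsConfidentCommand names are assumptions made up for illustration; I have not verified that a threshold like this is enough to solve the problem.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class ConfidenceFilterSketch
{
    // Assumed cutoff; it would need tuning against real recordings.
    const float MinConfidence = 0.90f;

    // Pure decision logic: accept only results at or above the cutoff.
    public static bool IsConfidentCommand(float confidence)
    {
        return confidence >= MinConfidence;
    }

    [STAThread]
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // Hypothetical name and commands standing in for
            // recognitionSystemName and validCommands.
            var builder = new GrammarBuilder("Octavian");
            builder.Append(new Choices("lights on", "lights off"));

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.LoadGrammar(new Grammar(builder));

            // The engine reports its best match; drop it if confidence is low.
            recognizer.SpeechRecognized += (sender, e) =>
            {
                if (IsConfidentCommand(e.Result.Confidence))
                    Console.WriteLine("Accepted: " + e.Result.Text + " @ " + e.Result.Confidence);
                else
                    Console.WriteLine("Ignored: " + e.Result.Text + " @ " + e.Result.Confidence);
            };

            // Raised when the engine itself decides the input does not
            // match any loaded grammar well enough.
            recognizer.SpeechRecognitionRejected += (sender, e) =>
                Console.WriteLine("Rejected by engine @ " + e.Result.Confidence);

            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Keep listening until Enter is pressed, then stop cleanly.
            Console.ReadLine();
            recognizer.RecognizeAsyncCancel();
        }
    }
}
```

Keeping the accept/ignore decision in its own method makes it easy to experiment with different cutoffs without touching the event wiring.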
Edit: I've moved the name recognizer to a separate SpeechRecognitionEngine, but the accuracy is terrible. Here is some test code I wrote to check the accuracy:
using System;
using System.Globalization;
using System.IO;
using System.Speech.Recognition;

namespace RecognitionAccuracyTest
{
    class RecognitionAccuracyTest
    {
        // Number of recognitions so far; used to name the saved audio clips.
        static int recogcount;

        [STAThread]
        static void Main()
        {
            recogcount = 0;
            Console.WriteLine("Beginning speech recognition accuracy test.");

            SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US"));
            recognizer.SetInputToDefaultAudioDevice();
            // The grammar contains only the system name.
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("Octavian")));
            recognizer.SpeechHypothesized += recognizer_SpeechHypothesized;
            recognizer.SpeechRecognized += recognizer_SpeechRecognized;
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Keep the process alive until Enter is pressed, rather than spinning in a busy loop.
            Console.ReadLine();
        }

        static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Recognized @ " + e.Result.Confidence);
            try
            {
                if (e.Result.Audio != null)
                {
                    // Save the recognized audio so it can be compared by ear later.
                    using (FileStream stream = new FileStream("audio" + ++recogcount + ".wav", FileMode.Create))
                    {
                        e.Result.Audio.WriteToWaveStream(stream);
                    }
                }
            }
            catch (Exception ex)
            {
                // Report the failure instead of silently swallowing it.
                Console.WriteLine("Could not save audio: " + ex.Message);
            }
        }

        static void recognizer_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            Console.WriteLine("Hypothesized @ " + e.Result.Confidence);
        }
    }
}
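With the name "Octavian", it recognizes things like "Octopus", "Octagon", "Volkswagen", and "Wow, really?". I can clearly hear the difference in the associated audio clips. Any ideas for making this less awful would be great.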