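I need to use the Microsoft Speech SDK (System.Speech.Recognition) to assess the "quality" of a user's pronunciation. I am using the MS Speech Engine - US, so what I really need is to find out how close the speaker's voice is to a "North American" accent.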
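One approach is to check how close the user's speech is to the phoneme pronunciations of US English. As mentioned in MSDN, this process seems to happen internally in the speech SDK, so I need a way to get that information out. Since we can also set phonemes for the engine ourselves, I believe this is possible.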
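However, I am not clear on what I have to do. So, how can I find out the quality of the user's pronunciation, that is, how close it is to North American English phoneme pronunciation? The user will only speak predefined sentences, for example "Hello World. I am here".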
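One way to make "closeness" concrete is to compare the phoneme sequence the engine is expected to hear with the sequence it actually heard. The following is only a minimal sketch under stated assumptions: it assumes both pronunciations are already available as lists of phoneme tokens (how to obtain and tokenize them is not shown), and the class `PhonemeCloseness` and its methods are hypothetical helpers, not part of System.Speech. The score is a plain Levenshtein edit distance, not a measure the SDK provides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of System.Speech.
public static class PhonemeCloseness
{
    // Levenshtein edit distance between two phoneme sequences:
    // the minimum number of insertions, deletions and substitutions
    // needed to turn 'expected' into 'actual'.
    public static int EditDistance(IReadOnlyList<string> expected, IReadOnlyList<string> actual)
    {
        int[,] d = new int[expected.Count + 1, actual.Count + 1];

        for (int i = 0; i <= expected.Count; i++) d[i, 0] = i;
        for (int j = 0; j <= actual.Count; j++) d[0, j] = j;

        for (int i = 1; i <= expected.Count; i++)
        {
            for (int j = 1; j <= actual.Count; j++)
            {
                int cost = expected[i - 1] == actual[j - 1] ? 0 : 1;
                int deletion = d[i - 1, j] + 1;
                int insertion = d[i, j - 1] + 1;
                int substitution = d[i - 1, j - 1] + cost;
                d[i, j] = Math.Min(Math.Min(deletion, insertion), substitution);
            }
        }

        return d[expected.Count, actual.Count];
    }

    // Normalized score in [0, 1]; 1.0 means the sequences are identical.
    public static double Similarity(IReadOnlyList<string> expected, IReadOnlyList<string> actual)
    {
        int longest = Math.Max(expected.Count, actual.Count);
        if (longest == 0) return 1.0; // two empty sequences count as identical
        return 1.0 - (double)EditDistance(expected, actual) / longest;
    }
}
```

A score like this treats every phoneme substitution as equally bad, which is a simplification; whether it reflects accent quality in any meaningful way is exactly what I am unsure about.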
Update
Using the following code, I got some kind of "phonemes" (as described in MSDN):
    using System;
    using System.IO;
    using System.Speech.Recognition;
    using System.Speech.Synthesis;
    using System.Windows.Forms;

    namespace US_Speech_Recognizer
    {
        public class RecognizeSpeech
        {
            private SpeechRecognitionEngine sEngine; // Speech recognition engine
            private SpeechSynthesizer sSpeak;        // Speech synthesizer

            public RecognizeSpeech()
            {
                // Make the recognizer ready
                sEngine = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

                // Load a grammar containing the single expected sentence
                Choices sentences = new Choices();
                sentences.Add(new string[] { "I am hungry" });
                GrammarBuilder gBuilder = new GrammarBuilder(sentences);
                Grammar g = new Grammar(gBuilder);
                sEngine.LoadGrammar(g);

                // Add a handler for recognized speech
                sEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sEngine_SpeechRecognized);

                sSpeak = new SpeechSynthesizer();
                sSpeak.Rate = -2;

                // The computer speaks the words into a stream to get the phones.
                // Note: the spoken text differs from the grammar phrase "I am hungry".
                Stream stream = new MemoryStream();
                sSpeak.SetOutputToWaveStream(stream);
                sSpeak.Speak("I was hungry");
                stream.Position = 0;
                sSpeak.SetOutputToNull();

                // Configure the recognizer to read from the stream and start recognition
                sEngine.SetInputToWaveStream(stream);
                sEngine.RecognizeAsync(RecognizeMode.Single);
            }

            // Handle the recognition result: collect the pronunciation of each word
            private void sEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
            {
                string text = "";

                if (e.Result.Text == "I am hungry")
                {
                    foreach (RecognizedWordUnit wordUnit in e.Result.Words)
                    {
                        text = text + wordUnit.Pronunciation + "\n";
                    }

                    MessageBox.Show(e.Result.Text + "\n" + text);
                }
            }
        }
    }
Here is the code fragment directly related to the phonemes (taken from the code above):
    // Handle the recognition result: collect the pronunciation of each word
    private void sEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        string text = "";

        if (e.Result.Text == "I am hungry")
        {
            foreach (RecognizedWordUnit wordUnit in e.Result.Words)
            {
                text = text + wordUnit.Pronunciation + "\n";
            }

            MessageBox.Show(e.Result.Text + "\n" + text);
        }
    }
Below is my output. The phonemes I got are displayed from the second line onward. The first line simply shows the recognized sentence.
According to MSDN, these are "phonemes". So, are these actually phonemes? I am asking because I have never seen anything like them before.
The code above is based on this link: http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.srgsgrammar.srgstoken.pronunciation(v=office.14).aspx