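I am building an application that uses Windows speech recognition. I'm thinking of using C++ for this, since I have some experience with the language. I want the speech recognition to work internally: if I upload an audio file into my program, I want speech recognition to write that audio out as a text file, with all of this done internally. Please provide some help with this, and if I haven't explained my question properly, let me know and I'll try to explain again.
Thanks in advance, divs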
(Old question, but no accepted answer, and appears quite high in google)
If you really want to do this in C++, you have to download the SAPI SDK, which does not come standard with Windows: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&displaylang=en (select SpeechSDK51.exe).
The best documentation you can find on SAPI is not on the web, it's in the SDK itself, in the Docs/ folder. The .chm explains everything really well. Here is an additional link to get you started.
However, if C++ is not a requirement for you, I strongly recommend you do it in C#. It's really much simpler (no COM components, no separate SDK, more documentation on MSDN, more tutorials, ...). See this CodeProject article; you'll have to remove all the GUI stuff and all the speech synthesis stuff, and you'll see that speech recognition boils down to about 10 lines of code. Quite impressive.
EDIT sample code, not compiled, not tested :
using System;                      // EventHandler<T>
using System.Speech;
using System.Speech.Recognition;

// in constructor or initialisation
SpeechRecognitionEngine recognizer = null;
recognizer = new SpeechRecognitionEngine();
// A grammar must be loaded before recognition starts; DictationGrammar accepts free-form speech
recognizer.LoadGrammar(new DictationGrammar());
recognizer.SetInputToDefaultAudioDevice();
recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
// RecognizeMode.Multiple keeps recognizing until the operation is cancelled or stopped
recognizer.RecognizeAsync(RecognizeMode.Multiple);

// The callback called when a sentence is recognized
private void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    string text = e.Result.Text;
    // Do whatever you want with 'text' now
}
ta dah, done
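The sample above listens on the default audio device, whereas the question asks for an audio file in and a text file out. Here is a minimal sketch of that variant, also not compiled and not tested. It assumes a console program, a WAV file in a format the installed recognizer accepts, and free-form dictation. The class name `WavToText` and the default paths `input.wav` and `output.txt` are hypothetical placeholders.

```csharp
using System;
using System.IO;
using System.Speech.Recognition;
using System.Threading;

class WavToText
{
    static void Main(string[] args)
    {
        // Hypothetical default paths, for illustration only.
        string wavPath = args.Length > 0 ? args[0] : "input.wav";
        string txtPath = args.Length > 1 ? args[1] : "output.txt";

        using (var recognizer = new SpeechRecognitionEngine())
        using (var writer = new StreamWriter(txtPath))
        using (var done = new ManualResetEvent(false))
        {
            // Free-form dictation; a grammar must be loaded before recognizing.
            recognizer.LoadGrammar(new DictationGrammar());

            // Read audio from the file instead of the microphone.
            recognizer.SetInputToWaveFile(wavPath);

            // Append each recognized phrase to the text file.
            recognizer.SpeechRecognized += (sender, e) => writer.WriteLine(e.Result.Text);

            // Raised when the asynchronous operation finishes,
            // for example when the end of the file is reached.
            recognizer.RecognizeCompleted += (sender, e) => done.Set();

            // Multiple mode keeps recognizing phrase after phrase.
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Block until recognition has completed.
            done.WaitOne();
        }
    }
}
```

Using the asynchronous API with `RecognizeCompleted` rather than a loop over `Recognize()` is a design choice: a rejected phrase in the middle of the file is then not mistaken for the end of the input.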
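Windows provides speech recognition engines for both client and server. Both can be programmed using C++ or the .NET languages. The traditional API for C++ programming is called SAPI. The .NET Framework namespaces for client and server speech are System.Speech and Microsoft.Speech, respectively.
SAPI documentation - http://msdn.microsoft.com/en-us/library/ms723627(VS.85).aspx
The .NET namespace for client recognition is System.Speech - http://msdn.microsoft.com/en-us/library/system.speech.recognition.aspx. Windows Vista and 7 include the speech engine.
The .NET namespace for server recognition is Microsoft.Speech, and the complete SDK for version 10.2 is available at http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4. The speech engine is a free download.
Many earlier questions have already covered this. For examples, see "Prototype based on speech recognition" and "SAPI and Windows 7 Problem".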