
I've been writing an extension that allows the user to issue voice commands to control their browser, and things were going great until I hit a catastrophic problem. It goes like this:

The speech recognition object is in continuous mode, and whenever the onerror: 'no-speech' or onend events fire, it restarts. This way, the extension is constantly waiting to accept input and reacts whenever a command is issued, even after 5 minutes of silence.

After a few days of development, today I reached the point where I was testing it in practical use, and I found that after a little while (and with no change to anything on my part), my onend event started firing constantly. As in, looking at the console, I would see 18,000 requests being made in the space of three seconds, all being instantly denied, thus triggering onend and restarting the request.
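For readers following along, the restart loop described above can be sketched with a guard against this runaway behaviour. This is only an illustration, not the extension's actual code: `keepListening`, `restartDelay` and every threshold value are hypothetical names and numbers, and `recognition` stands for any object with `start()` and `onend`, such as a `webkitSpeechRecognition` instance. It relies on the Web Speech API behaviour that `onend` fires whenever a session ends, including after an error, so the restart happens in one place only.

```javascript
// Delay before the next restart: 0 after a healthy session, otherwise
// exponential backoff capped at maxDelayMs. All values are assumptions.
function restartDelay(failures, baseDelayMs = 500, maxDelayMs = 60000) {
  if (failures <= 0) return 0;
  return Math.min(maxDelayMs, baseDelayMs * 2 ** (failures - 1));
}

// Keeps a recognition-like object running, but slows down when sessions
// end almost immediately (which is what an instantly denied request looks like).
function keepListening(recognition, options = {}) {
  const {
    minSessionMs = 1000, // shorter sessions are counted as failures
    baseDelayMs = 500,
    maxDelayMs = 60000,
    now = () => Date.now(),
    schedule = (fn, ms) => setTimeout(fn, ms),
  } = options;

  let startedAt = 0;
  let failures = 0;

  function start() {
    startedAt = now();
    recognition.start();
  }

  // Restart only from onend; restarting from onerror as well could
  // start the same session twice.
  recognition.onend = () => {
    const sessionMs = now() - startedAt;
    failures = sessionMs < minSessionMs ? failures + 1 : 0;
    schedule(start, restartDelay(failures, baseDelayMs, maxDelayMs));
  };

  start();
}
```

With these assumed defaults, a session that ends after several seconds of silence restarts immediately, while a burst of instant denials is retried after 500 ms, 1 s, 2 s and so on, up to once a minute.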

I'm aware that it would be optimal to wait for sound before sending a request, or to have local speech recognition capabilities without the need for a remote server, but the present API does not allow that.

Are my suspicions correct? Am I getting request limited?


2 Answers


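Are my suspicions correct? Am I getting request limited?

Yes.

I'm aware that it would be optimal to wait for sound before sending a request, or to have local speech recognition capabilities without the need for a remote server, but the present API does not allow that.

To hide the source IP of your requests you could use an anonymizing network such as Tor, though it will not be fast.

It is naive to assume that Google will spend resources processing all the audio recorded on your system. For application development, it is better to rely on an API that provides at least some guarantees. That can be a commercial API or an open source implementation such as CMUSphinx.

With CMUSphinx you can also implement proper command keyword detection and improve accuracy by specifying a grammar for the commands.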

answered 2013-09-18T08:19:57.137

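You could also use a voice activity detection (VAD) algorithm to detect when the user is speaking. This can be done by setting a volume threshold or a frequency threshold (for example, the fundamental frequency of human speech is usually below 400 Hz). That way you only send requests to Google when these conditions are met, rather than sending useless ones. I wouldn't recommend Tor, since it would add significant latency. CMUSphinx is probably the best option for a local system, but if you still want to use a web-based service, I'd suggest using a voice activity detection algorithm or looking for a different web-based service.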
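As a rough illustration of the volume-threshold variant, here is a minimal sketch. The names `rms` and `createVoiceDetector` and all default values are made up for this example and would need tuning against a real microphone. Each frame is assumed to be an array of samples in the range [-1, 1].

```javascript
// Root-mean-square level of one frame of samples in [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Returns a function that is fed one frame at a time and reports whether
// the user currently appears to be speaking.
//   threshold      - RMS level treated as "loud" (assumed value)
//   minFrames      - consecutive loud frames needed to enter the speaking state,
//                    so a single click does not trigger a request
//   hangoverFrames - consecutive quiet frames needed to leave it,
//                    so short pauses between words do not cut the command off
function createVoiceDetector({ threshold = 0.02, minFrames = 3, hangoverFrames = 10 } = {}) {
  let loud = 0;
  let quiet = 0;
  let speaking = false;

  return function onFrame(samples) {
    if (rms(samples) >= threshold) {
      loud += 1;
      quiet = 0;
      if (!speaking && loud >= minFrames) speaking = true;
    } else {
      quiet += 1;
      loud = 0;
      if (speaking && quiet >= hangoverFrames) speaking = false;
    }
    return speaking;
  };
}
```

In a browser the frames could come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, and recognition would be started only when the detector switches from false to true. A frequency threshold would need a spectrum (for example from `getFloatFrequencyData`) instead of the raw samples.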

answered 2013-09-18T16:31:39.737