android - Google speech recognition timeout

Question

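I am developing an Android application that is based on speech recognition.

Until today, everything worked well and promptly: I would start my speech recognizer, speak, and within 1 or 2 seconds at most the application received the results.

That was a very acceptable user experience.

As of today, I have to wait ten seconds or more before the recognition results arrive.

I tried setting the following EXTRAS, none of which made any discernible difference: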

RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS
RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS
RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS

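I have been changing my application continually, but none of those changes were related to the speech recognizer.

Is there any way to reduce the time between the speech recognizer switching from onBeginningOfSpeech() to onResults()?

Here is an example of how long it takes: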

07-01 17:50:20.839 24877-24877/com.voice I/Voice: onReadyForSpeech()
07-01 17:50:21.614 24877-24877/com.voice I/Voice: onBeginningOfSpeech()
07-01 17:50:38.163 24877-24877/com.voice I/Voice: onEndOfSpeech()
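
To put numbers on the log above, the gaps between the callbacks can be computed from the logcat timestamps. This is only a small sketch; `LogcatGap` and `millisBetween` are hypothetical names introduced for illustration, and it assumes both lines were logged on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper, not part of the question: measures the gap between two logcat timestamps.
final class LogcatGap {

    // Logcat prints "MM-dd HH:mm:ss.SSS"; only the time-of-day part after the space is parsed.
    static long millisBetween(String earlier, String later) {
        LocalTime start = LocalTime.parse(earlier.substring(earlier.indexOf(' ') + 1));
        LocalTime end = LocalTime.parse(later.substring(later.indexOf(' ') + 1));
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // onReadyForSpeech() -> onBeginningOfSpeech()
        System.out.println(millisBetween("07-01 17:50:20.839", "07-01 17:50:21.614")); // 775
        // onBeginningOfSpeech() -> onEndOfSpeech()
        System.out.println(millisBetween("07-01 17:50:21.614", "07-01 17:50:38.163")); // 16549
    }
}
```

So roughly 16.5 seconds pass between onBeginningOfSpeech() and onEndOfSpeech() in this trace, compared with the 1 or 2 seconds seen previously.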

score 22 · Accepted Answer

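EDIT - This has apparently been fixed in the upcoming August 2016 release. You can test the beta to confirm.

This is a bug in the Google 'Now' V6.0.23.* release, and it persists in the latest V6.1.28.*.

Since the release of V5.11.34.*, Google's implementation of SpeechRecognizer has been plagued by bugs.

You can use this gist to replicate many of them.

You can use this BugRecognitionListener to work around some of them.

I have reported these directly to the Now team, so they are aware, but so far nothing has been fixed. There is no external bug tracker for Google Now, as it is not part of AOSP, so I'm afraid there is nothing you can star.

The most recent bug you detail pretty much makes their implementation unusable; as you correctly point out, the parameters that control the speech input timing are ignored. According to the documentation:

Additionally, depending on the recognizer implementation, these values may have no effect.

is something we should expect...

If you don't speak or make any detectable sound, the recognition will continue indefinitely.

I'm currently creating a project to replicate this new bug and all of the others, which I'll forward on and link here shortly.

EDIT - I was hoping I could create a workaround that used the detection of partial or unstable results as the trigger to know that the user was still speaking. Once they stopped, I could manually call recognizer.stopListening() after a set period of time.

Unfortunately, stopListening() is also broken and doesn't actually stop the recognition, so there is no workaround for this.

Attempts around the above, destroying the recognizer and relying only on the partial results up to that point (when the recognizer is destroyed, onResults() is not called), failed to produce a reliable implementation, unless you are simply keyword spotting.

There is nothing we can do until Google fixes this. Your only outlet is to email apps-help@google.com to report the problem and hope that the volume they receive gives them a nudge...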

score 9 · Accepted Answer

Note! This only works in online mode. Enable dictation mode and disable partial results:

intent.putExtra("android.speech.extra.DICTATION_MODE", true);
intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, false);

In dictation mode, the speechRecognizer will still call onPartialResults(); however, you should treat the partials as final results.

score 4 · Accepted Answer

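UPDATE:

Just in case anyone is having trouble setting up speech recognition, you can use the Droid Speech library, which I built to overcome the speech timeout issue in Android.

My app was entirely dependent on the speech recognition feature, and Google has dropped a bomb. By the look of things, I believe this will not be fixed, at least in the near future.

For the time being, I did find a solution that gets Google speech recognition to deliver the speech results as intended.

Note: This approach differs slightly from the solutions mentioned above.

The main purpose of this method is to make sure that all the words uttered by the user are caught in onPartialResults().

In normal cases, if a user speaks more than one word in a given instance, the response time is too quick and the partial results will more often than not contain only the first word rather than the complete result.

So, to make sure every single word is caught in onPartialResults(), a handler is introduced to check for the user's pause delay and then filter the results. Also note that the result array from onPartialResults() will more often than not have only a single item.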

SpeechRecognizer userSpeech = SpeechRecognizer.createSpeechRecognizer(this);

Intent speechIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
speechIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
speechIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, this.getPackageName());
speechIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
speechIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, ModelData.MAX_VOICE_RESULTS);

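// Declare these two as fields of the enclosing class: the anonymous listener below
// reassigns speechResultsFound, which Java does not allow for a captured local variable.
Handler checkForUserPauseAndSpeak = new Handler();
boolean speechResultsFound = false;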

userSpeech.setRecognitionListener(new RecognitionListener(){

    @Override
    public void onRmsChanged(float rmsdB)
    {
        // NA
    }

    @Override
    public void onResults(Bundle results)
    {
        if(speechResultsFound) return;

        speechResultsFound = true;

        // Speech engine full results (Do whatever you would want with the full results)
    }

    @Override
    public void onReadyForSpeech(Bundle params)
    {
        // NA
    }

    @Override
    public void onPartialResults(Bundle partialResults)
    {
        // Fetch the list once and guard against a missing (null) result list.
        ArrayList<String> matches = partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if(matches != null && !matches.isEmpty() &&
                matches.get(0) != null &&
                !matches.get(0).trim().isEmpty())
        {
            checkForUserPauseAndSpeak.removeCallbacksAndMessages(null);
            checkForUserPauseAndSpeak.postDelayed(new Runnable()
            {
                @Override
                public void run()
                {
                    if(speechResultsFound) return;

                    speechResultsFound = true;

                    // Stop the speech operations
                    userSpeech.destroy();

                    // Speech engine partial results (Do whatever you would want with the partial results)

                }

            }, 1000);
        }
    }

    @Override
    public void onEvent(int eventType, Bundle params)
    {
        // NA
    }

    @Override
    public void onError(int error)
    {
        // Error related code
    }

    @Override
    public void onEndOfSpeech()
    {
        // NA
    }

    @Override
    public void onBufferReceived(byte[] buffer)
    {
        // NA
    }

    @Override
    public void onBeginningOfSpeech()
    {
        // NA
    }
});

userSpeech.startListening(speechIntent);
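
The pause check described above can also be sketched without any Android classes, which makes the timing logic easy to unit test. This is only an illustration under stated assumptions: `PartialResultDebouncer`, `onPartial` and `poll` are hypothetical names that do not appear in the answer, and timestamps are passed in explicitly instead of being driven by a Handler.

```java
// Hypothetical, Android-free sketch of the pause check; names are illustrative only.
final class PartialResultDebouncer {

    private final long pauseMillis;
    private String latest;      // most recent non-empty partial transcript
    private long lastUpdate;    // time at which that partial arrived
    private boolean delivered;  // plays the role of speechResultsFound in the answer

    PartialResultDebouncer(long pauseMillis) {
        this.pauseMillis = pauseMillis;
    }

    // Call from onPartialResults(); null or blank text is ignored, as in the answer.
    void onPartial(String text, long nowMillis) {
        if (delivered || text == null || text.trim().isEmpty()) {
            return;
        }
        latest = text.trim();
        lastUpdate = nowMillis; // every new partial restarts the pause window
    }

    // Returns the transcript once the user has paused long enough, otherwise null.
    // Fires at most once, so a late onResults() cannot deliver a second result.
    String poll(long nowMillis) {
        if (delivered || latest == null || nowMillis - lastUpdate < pauseMillis) {
            return null;
        }
        delivered = true;
        return latest;
    }
}
```

With a 1000 ms pause, partials arriving at 0, 400 and 900 ms keep the window open, and `poll` first returns the last transcript at 1900 ms. In the answer's code the same effect comes from `removeCallbacksAndMessages(null)` followed by `postDelayed(..., 1000)`.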

score 3 · Accepted Answer

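The best workaround I found (until Google fixes the bug) was to go to the Google App info screen and tap the "Uninstall updates" button. This removes all the updates made to the app that directly affect the speech recognizer, basically returning it to its factory state.

It would probably be a good idea to stop automatic updates until we know it is fixed.

Note: this is only a solution for developers; obviously, if you have an app in the store, this will not help you. Sorry...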

score 2 · Accepted Answer

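UPDATE: As of my testing today, this bug seems to have finally been resolved and this is no longer necessary. Leaving it up in case it gets broken again in the future. From my tests, the speech timeout is working normally.

OK, I know this is VERY UGLY, but it seems to work using onPartialResults (I understand the gotchas with onPartialResults, but I've tried this a few times and it will do until Google fixes this ridiculous bug!). I haven't tested it exhaustively yet (I will post back results as I use it in an app), but I was desperate for a solution. Basically, I use onRmsChanged to detect that the user is done speaking, assuming that when RmsDb falls below the peak and there have been no onPartialResults for 2 seconds, we're done.

One thing I don't like about this is that destroying the SR makes a double uh-oh beep. FWIW and YMMV. Please post any improvements!

NOTE: If you are going to use this repeatedly, don't forget to reset bBegin and fPeak! You will also need to recreate the SR (in onStartCommand, or stop and start the service).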

import android.app.Service;
import android.content.Intent;
import android.os.Bundle;
import android.os.IBinder;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.support.annotation.Nullable;
import android.util.Log;

import java.util.ArrayList;

public class SpeechToTextService extends Service {

    private String TAG = "STT";

    float fPeak;
    boolean bBegin;
    long lCheckTime;
    long lTimeout = 2000;

    @Override
    public void onCreate() {
        super.onCreate();

        bBegin = false;
        fPeak = -999; //Only to be sure it's under ambient RmsDb.

        final SpeechRecognizer sr = SpeechRecognizer.createSpeechRecognizer(getApplicationContext());
        sr.setRecognitionListener(new RecognitionListener() {

            @Override
            public void onReadyForSpeech(Bundle bundle) {
                Log.i(TAG, "onReadyForSpeech");
            }

            @Override
            public void onBeginningOfSpeech() {
                bBegin = true;
                Log.i(TAG, "onBeginningOfSpeech");
            }

            @Override
            public void onRmsChanged(float rmsDb) {
                if(bBegin) {
                    if (rmsDb > fPeak) {
                        fPeak = rmsDb;
                        lCheckTime = System.currentTimeMillis();
                    }
                    if (System.currentTimeMillis() > lCheckTime + lTimeout) {
                        Log.i(TAG, "DONE");
                        sr.destroy();
                    }
                }
                //Log.i(TAG, "rmsDB:"+rmsDb);
            }

            @Override
            public void onBufferReceived(byte[] buffer) {
                Log.i(TAG, "onBufferReceived");
            }

            @Override
            public void onEndOfSpeech() {
                Log.i(TAG, "onEndOfSpeech");
            }

            @Override
            public void onError(int error) {
                Log.i(TAG, "onError:" + error);
            }

            @Override
            public void onResults(Bundle results) {

                ArrayList<String> data = results.getStringArrayList(
                        SpeechRecognizer.RESULTS_RECOGNITION);

                // Guard against both a missing and an empty result list.
                String sTextFromSpeech;
                if (data != null && !data.isEmpty()) {
                    sTextFromSpeech = data.get(0);
                } else {
                    sTextFromSpeech = "";
                }
                Log.i(TAG, "onResults:" + sTextFromSpeech);
            }

            @Override
            public void onPartialResults(Bundle bundle) {

                lCheckTime = System.currentTimeMillis();
                ArrayList<String> data = bundle.getStringArrayList(
                        SpeechRecognizer.RESULTS_RECOGNITION);

                // Guard against both a missing and an empty result list.
                String sTextFromSpeech;
                if (data != null && !data.isEmpty()) {
                    sTextFromSpeech = data.get(0);
                } else {
                    sTextFromSpeech = "";
                }
                Log.i(TAG, "onPartialResults:" + sTextFromSpeech);
            }

            @Override
            public void onEvent(int eventType, Bundle params) {

                Log.i(TAG, "onEvent:" + eventType);
            }
        });

        Intent iSRIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        iSRIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        iSRIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
        iSRIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getPackageName());
        iSRIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
        iSRIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, "en-US");
        sr.startListening(iSRIntent);
    }

    @Nullable
    @Override
    public IBinder onBind(Intent intent) {
        return null;
    }
}
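
The end-of-speech heuristic in the service above (a new RMS peak or a partial result restarts a 2-second window) can be isolated into a plain Java class for testing. This is a sketch under assumptions, not the answer's code: `RmsEndOfSpeechDetector` and its method names are hypothetical, timestamps are passed in explicitly, and, unlike the original, the window is also started in `onBeginningOfSpeech`.

```java
// Hypothetical, Android-free sketch of the RMS/partial-result timeout; names are illustrative only.
final class RmsEndOfSpeechDetector {

    private static final float INITIAL_PEAK = -999f; // below any ambient RMS reading, as in the answer

    private final long timeoutMillis;
    private float peak = INITIAL_PEAK;
    private boolean begun;
    private long checkTime;

    RmsEndOfSpeechDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onBeginningOfSpeech(long nowMillis) {
        begun = true;
        checkTime = nowMillis; // assumption: start the window here as well
    }

    // A partial result means the user is still talking, so restart the window.
    void onPartialResult(long nowMillis) {
        checkTime = nowMillis;
    }

    // Returns true once no new peak and no partial result has arrived within the timeout.
    boolean onRmsChanged(float rmsDb, long nowMillis) {
        if (!begun) {
            return false;
        }
        if (rmsDb > peak) {
            peak = rmsDb;
            checkTime = nowMillis;
        }
        return nowMillis > checkTime + timeoutMillis;
    }

    // Mirrors the note about resetting bBegin and fPeak before reuse.
    void reset() {
        begun = false;
        peak = INITIAL_PEAK;
    }
}
```

In a listener, `onRmsChanged` returning true would be the point at which the answer calls `sr.destroy()`, and `reset()` would be called before the recognizer is recreated.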

score 0 · Accepted Answer

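Offline-only solution:

I had the same problem (the Android system took 25 seconds to produce the transcription of the speech via onPartialResults() after onEndOfSpeech() fired).

I tried the following code and it worked: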

// putExtra() is an instance method, so call it on the recognizer Intent object.
intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true);

This solution works for my app, and it may work for you if you do not use online mode (I downloaded the language pack via the phone settings).
