android - I want to add continuous (hands-free) voice command recognition to my home automation app

Question

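I have created a simple Android app to control a relay board connected to my Raspberry Pi. It uses buttons, plus basic speech recognition that triggers those buttons, to switch the corresponding relay channels on and off.

So far, the speech recognition is handled by RecognizerIntent: I have to press a button in my app to open the Google voice prompt, which listens for my voice command and activates or deactivates the button that controls the matching relay switch.

I want to do the same thing with continuous speech recognition, so that the app keeps listening for my commands without the user having to press a button, allowing hands-free operation.

Here is my existing code, a very simple approach to speech recognition that lets me toggle the buttons for the various devices connected to the relay: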

public void micclick(View view) {
        if(view.getId()==R.id.mic)
        {promptSpeechInput();}
}

private void promptSpeechInput() {
    Intent i= new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    i.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    i.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
    i.putExtra(RecognizerIntent.EXTRA_PROMPT,"Speak!");
    try{
        startActivityForResult(i,100);

    }
    catch (ActivityNotFoundException a)
    {
        Toast.makeText(MainActivity.this,"Sorry your device doesn't support",Toast.LENGTH_SHORT).show();
    }
}
public void onActivityResult(int requestCode, int resultCode, Intent i) {
    super.onActivityResult(requestCode, resultCode, i);
    String voicetxt;
    switch (requestCode) {
        case 100:
            if (resultCode == RESULT_OK && i != null) {
                ArrayList<String> result2 = i.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                voicetxt = result2.get(0);
                if (voicetxt.equals("fan on")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton1.setChecked(true);
                    result.append("Fan: ").append(toggleButton1.getText());
                    sc.onRelayNumber="a";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
                if (voicetxt.equals("fan of")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton1.setChecked(false);
                    result.append("Fan: ").append(toggleButton1.getText());
                    sc.onRelayNumber = "a_off";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
                if (voicetxt.equals("light on")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton2.setChecked(true);
                    result.append("Light: ").append(toggleButton2.getText());
                    sc.onRelayNumber = "b";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
                if (voicetxt.equals("light off")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton2.setChecked(false);
                    result.append("Light: ").append(toggleButton2.getText());
                    sc.onRelayNumber = "b_off";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
                if (voicetxt.equals("air conditioner on")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton3.setChecked(true);
                    result.append("AC: ").append(toggleButton3.getText());
                    sc.onRelayNumber = "c";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
                if (voicetxt.equals("air conditioner of")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton3.setChecked(false);
                    result.append("AC: ").append(toggleButton3.getText());
                    sc.onRelayNumber = "c_off";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
                if (voicetxt.equals("heater on")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton4.setChecked(true);
                    result.append("Heater: ").append(toggleButton4.getText());
                    sc.onRelayNumber = "d";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
                if (voicetxt.equals("heater off")) {
                    StringBuffer result=new StringBuffer();
                    toggleButton4.setChecked(false);
                    result.append("Heater: ").append(toggleButton4.getText());
                    sc.onRelayNumber = "d_off";
                    new Thread(sc).start();
                    Toast.makeText(MainActivity.this, result.toString(),Toast.LENGTH_SHORT).show();
                }
            }
            break;
    }
}

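I want to achieve the same functionality without having to press a button. Please note that I am new to Android app development. If an external library is needed, please describe how to use it, because I don't think continuous recognition is possible with Google's RecognizerIntent. I suspect I may need to include a library such as CMUSphinx, but I'm not sure how to go about it.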

score 5 · Accepted Answer

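For continuous recognition/dictation mode you have a few options. You can use Android's built-in Google speech recognition, although it is not recommended for continuous recognition, as stated at https://developer.android.com/reference/android/speech/SpeechRecognizer.html:

The implementation of this API is likely to stream audio to remote servers to perform speech recognition. As such this API is not intended to be used for continuous recognition, which would consume a significant amount of battery and bandwidth.

But if you really need it, you can work around this by creating your own class that implements IRecognitionListener. (I wrote this in Xamarin.Android; the syntax is very similar to native Android.)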

public class CustomRecognizer : Java.Lang.Object, IRecognitionListener, TextToSpeech.IOnInitListener
{
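    // Context used to create the recognizer (assigned in the constructor)
    private readonly Context _context;

    private SpeechRecognizer _speech;

    private Intent _speechIntent;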


    public string Words;


    public CustomRecognizer(Context _context)
    {
        this._context = _context;
        Words = "";
        _speech = SpeechRecognizer.CreateSpeechRecognizer(this._context);
        _speech.SetRecognitionListener(this);
        _speechIntent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
        _speechIntent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);
        // ExtraPreferOffline is the extra's key; its value is a boolean
        _speechIntent.PutExtra(RecognizerIntent.ExtraPreferOffline, true);
        _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 1000); 
        _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, 1000);
        _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, 1500);
    }

    void startover()
    {
        _speech.Destroy();
        _speech = SpeechRecognizer.CreateSpeechRecognizer(this._context);
        _speech.SetRecognitionListener(this);
        _speechIntent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
        _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 1000);
        _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, 1000);
        _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, 1500);
    StartListening();
    }
    public void StartListening()
    {
        _speech.StartListening(_speechIntent);
    }

    public void StopListening()
    {
        _speech.StopListening();
    }

    public void OnBeginningOfSpeech()
    {

    }

    public void OnBufferReceived(byte[] buffer)
    {
    }

    public void OnEndOfSpeech()
    {

    }

    public void OnError([GeneratedEnum] SpeechRecognizerError error)
    {
        Words = error.ToString();
        startover();
    }

    public void OnEvent(int eventType, Bundle @params)
    {
    }

    public void OnPartialResults(Bundle partialResults)
    {
    }

    public void OnReadyForSpeech(Bundle @params)
    {
    }

    public void OnResults(Bundle results)
    {
        var matches = results.GetStringArrayList(SpeechRecognizer.ResultsRecognition);
        if (matches == null)
            Words = "Null";
        else if (matches.Count != 0)
            Words = matches[0];
        else
            Words = "";

        // do anything you want with the result here

        // restart recognition so that listening continues
        startover();
    }

    public void OnRmsChanged(float rmsdB)
    {

    }

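    public void OnInit([GeneratedEnum] OperationResult status)
    {
        // txtspeech is a TextToSpeech field that this answer does not show;
        // set its language only when initialization succeeded
        if (status == OperationResult.Success)
            txtspeech.SetLanguage(Java.Util.Locale.Default);
    }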


}

Call it from your activity:

void StartRecording()
    {
        // FeatureMicrophone is only the feature's name, so ask the package manager
        // whether the device actually has a microphone
        if (!PackageManager.HasSystemFeature(PackageManager.FeatureMicrophone))
        {
            // no microphone, no recording: output an alert
            Toast.MakeText(this, "NO MICROPHONE", ToastLength.Short).Show();
        }
        else
        {
            //you can pass any object you want to connect to your recognizer here (I am passing the activity)
            CustomRecognizer voice = new CustomRecognizer(this);
            voice.StartListening();
        }
    }

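Don't forget to request permission to use the microphone!

Explanation:

- This removes the annoying "tap to start recording" step.

- It records from the moment you call StartListening() and never stops, because I call startover() (which calls StartListening() again) every time a recording finishes.

- It is a pretty bad workaround, because while the recognizer is processing your recording it receives no audio input until StartListening() is called again (there is no workaround for that).

- Google recognition is not great for voice commands, because the language model is "[lang] sentences": you cannot restrict the vocabulary, so Google will always try to produce a "good sentence" as the result.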
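Because the recognizer returns free-form sentences, one way to cope is to search every returned candidate for a known command phrase instead of comparing the first result with equals(). The following is a minimal Java sketch of that idea, not part of the original answer. The class and method names (CommandMatcher, normalize, match) are hypothetical; the phrases and relay codes are the ones from the question's code. Treating a trailing "of" as "off" is an assumption, prompted by the question's "fan of" check. On Android, the candidate list would come from results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class CommandMatcher {
    // Phrase -> relay code, taken from the question's onActivityResult()
    private static final Map<String, String> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("fan on", "a");
        COMMANDS.put("fan off", "a_off");
        COMMANDS.put("light on", "b");
        COMMANDS.put("light off", "b_off");
        COMMANDS.put("air conditioner on", "c");
        COMMANDS.put("air conditioner off", "c_off");
        COMMANDS.put("heater on", "d");
        COMMANDS.put("heater off", "d_off");
    }

    // Lower-case, collapse whitespace, and (assumption) read a trailing "of" as "off"
    static String normalize(String transcript) {
        String s = transcript.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
        if (s.endsWith(" of")) {
            s = s + "f";
        }
        return s;
    }

    // Returns the relay code of the first candidate that contains a known phrase
    // as whole words, or Optional.empty() if no candidate matches
    static Optional<String> match(List<String> candidates) {
        for (String candidate : candidates) {
            String padded = " " + normalize(candidate) + " ";
            for (Map.Entry<String, String> entry : COMMANDS.entrySet()) {
                if (padded.contains(" " + entry.getKey() + " ")) {
                    return Optional.of(entry.getValue());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(match(Arrays.asList("please turn the light on", "light on")));
    }
}
```

The returned code could then be assigned to sc.onRelayNumber as in the question. Matching whole words avoids treating "fan one" as "fan on", but this remains a plain substring search and not a restricted grammar.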

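For better results and UX, I really suggest using the Google Cloud API (but it must be online and is costly). My second suggestion is CMUSphinx / PocketSphinx, which is open source and works offline, but you have to do everything manually.

PocketSphinx advantages:

- You can create your own dictionary.
- It works offline.
- You can train your own acoustic model (voices, etc.), so you can tune it to your environment and pronunciation.
- You can get real-time results by accessing "PartialResult".

PocketSphinx disadvantages: you have to do everything manually, including setting up the acoustic model, dictionary, language model, thresholds, and so on (overkill if you want something simple).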

score 0 · Accepted Answer

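You can make it respond to "house, (pause) turn on the light" with the following approach (although it will respond to words other than just "house"):

Use the AudioRecord class to record audio continuously (without recording to a file) until the raw recorded data tells you that a certain volume threshold has been reached (for example, when "house" is said). Once you detect this threshold, keep recording until there has been at least 0.5 seconds of silence. At that point, stop recording and immediately call SpeechRecognizer.StartListening. The user then hears a beep, and after the beep they can say "turn on the light".

So something like this should give the desired functionality, although it is not perfect.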
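The detection logic described above could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions: VolumeTrigger and its methods are hypothetical names, loudness is measured as the RMS of 16-bit PCM samples, and the threshold value has to be tuned for the actual microphone and room. On Android, the buffers would be filled by AudioRecord.read(short[], int, int), and a true return value is the point at which to stop recording and start the SpeechRecognizer.

```java
class VolumeTrigger {
    private final double threshold;          // RMS level that counts as "loud" (assumed, needs tuning)
    private final int silenceSamplesNeeded;  // number of quiet samples that ends the trigger word
    private boolean heardSound = false;
    private int silentSamples = 0;

    VolumeTrigger(double threshold, int sampleRateHz, double silenceSeconds) {
        this.threshold = threshold;
        this.silenceSamplesNeeded = (int) Math.round(sampleRateHz * silenceSeconds);
    }

    // Root mean square of the first `length` samples of a 16-bit PCM buffer
    static double rms(short[] pcm, int length) {
        if (length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < length; i++) {
            sum += (double) pcm[i] * pcm[i];
        }
        return Math.sqrt(sum / length);
    }

    // Feed one buffer of audio. Returns true exactly once per trigger: after a
    // loud buffer has been followed by enough consecutive quiet samples.
    boolean feed(short[] pcm, int length) {
        boolean loud = rms(pcm, length) >= threshold;
        if (loud) {
            heardSound = true;   // trigger word (or any loud noise) in progress
            silentSamples = 0;
            return false;
        }
        if (!heardSound) {
            return false;        // still waiting for something loud
        }
        silentSamples += length;
        if (silentSamples >= silenceSamplesNeeded) {
            heardSound = false;  // reset so the detector can fire again later
            silentSamples = 0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        VolumeTrigger trigger = new VolumeTrigger(1000.0, 16000, 0.5);
        short[] loud = new short[1600];
        java.util.Arrays.fill(loud, (short) 5000);
        System.out.println(trigger.feed(loud, loud.length));
    }
}
```

As the answer notes, a volume threshold reacts to any sufficiently loud sound, not just to the word "house".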

Regards.
