ios - AVAudioSession issue when using SFSpeechRecognizer after AVSpeechUtterance

Question

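I am trying to use SFSpeechRecognizer for speech to text after speaking a welcome message to the user via AVSpeechUtterance. But randomly, speech recognition does not start (after the welcome message has been spoken), and the error message below is logged.

[avas] ERROR: AVAudioSession.mm:1049: -[AVAudioSession setActive:withOptions:error:]: Deactivating an audio session that has running I/O. All I/O should be stopped or paused prior to deactivating the audio session.

It works a few times. I am not clear on why it does not work consistently.

I tried the solutions mentioned in other SO posts, which suggest checking whether any audio players are running. I added that check to the speech-to-text part of the code. It returns false (i.e. no other audio player is running), but speech to text still does not start listening for the user's speech. Can you point out what is going wrong?

I am testing on an iPhone 6 running iOS 10.3.

Below are the code snippets used:

Text to speech: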

- (void) speak:(NSString *) textToSpeak {
    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback
      withOptions:AVAudioSessionCategoryOptionDuckOthers error:nil];

    [synthesizer stopSpeakingAtBoundary:AVSpeechBoundaryImmediate];

    AVSpeechUtterance* utterance = [[AVSpeechUtterance alloc] initWithString:textToSpeak];
    utterance.voice = [AVSpeechSynthesisVoice voiceWithLanguage:locale];
    utterance.rate = (AVSpeechUtteranceMinimumSpeechRate * 1.5 + AVSpeechUtteranceDefaultSpeechRate) / 2.5 * rate * rate;
    utterance.pitchMultiplier = 1.2;
    [synthesizer speakUtterance:utterance];
}

- (void)speechSynthesizer:(AVSpeechSynthesizer*)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance*)utterance {
    //Return success message back to caller

    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryAmbient
      withOptions: 0 error: nil];
    [[AVAudioSession sharedInstance] setActive:YES withOptions: 0 error:nil];
}

Speech to text:

- (void) recordUserSpeech:(NSString *) lang {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:lang];
    self.sfSpeechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    [self.sfSpeechRecognizer setDelegate:self];

    NSLog(@"Step1: ");
    // Cancel the previous task if it's running.
    if ( self.recognitionTask ) {
        NSLog(@"Step2: ");
        [self.recognitionTask cancel];
        self.recognitionTask = nil;
    }

    NSLog(@"Step3: ");
    [self initAudioSession];

    self.recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    NSLog(@"Step4: ");

    if (!self.audioEngine.inputNode) {
        NSLog(@"Audio engine has no input node");
    }

    if (!self.recognitionRequest) {
        NSLog(@"Unable to create a SFSpeechAudioBufferRecognitionRequest object");
    }

    self.recognitionTask = [self.sfSpeechRecognizer recognitionTaskWithRequest:self.recognitionRequest resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {

        bool isFinal= false;

        if (error) {
            [self stopAndRelease];
            NSLog(@"In recognitionTaskWithRequest.. Error code ::: %ld, %@", (long)error.code, error.description);
            [self sendErrorWithMessage:error.localizedFailureReason andCode:error.code];
        }

        if (result) {

            [self sendResults:result.bestTranscription.formattedString];
            isFinal = result.isFinal;
        }

        if (isFinal) {
            NSLog(@"result.isFinal: ");
            [self stopAndRelease];
            //return control to caller
        }
    }];

    NSLog(@"Step5: ");

    AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

    [self.audioEngine.inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        //NSLog(@"Installing Audio engine: ");
        [self.recognitionRequest appendAudioPCMBuffer:buffer];
    }];

    NSLog(@"Step6: ");

    [self.audioEngine prepare];
    NSLog(@"Step7: ");
    NSError *err;
    [self.audioEngine startAndReturnError:&err];
}
- (void) initAudioSession
{
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:nil];
    [audioSession setMode:AVAudioSessionModeMeasurement error:nil];
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];
}

-(void) stopAndRelease
{
    NSLog(@"Invoking SFSpeechRecognizer stopAndRelease: ");
    [self.audioEngine stop];
    [self.recognitionRequest endAudio];
    [self.audioEngine.inputNode removeTapOnBus:0];
    self.recognitionRequest = nil;
    [self.recognitionTask cancel];
    self.recognitionTask = nil;
}

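Regarding the logs I added, I can see all of them printed up to "Step7".

While debugging the code on the device, it consistently breaks at the following lines (I have exception breakpoints set), although execution resumes when I continue. However, the same thing also happens during the few successful runs.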

AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

[self.audioEngine prepare];

score 2 · Accepted Answer

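The reason is that the audio has not completely finished when -speechSynthesizer:didFinishSpeechUtterance: is called, which is why you get this kind of error when trying to call setActive:NO. You cannot deactivate the AudioSession or change any of its settings while I/O is running. Workaround: wait for several ms (how long is discussed below) and only then perform the AudioSession deactivation and the rest.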
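The workaround could be sketched roughly as follows. This is a minimal sketch, not a verified fix: the class name SpeechCoordinator and the constant kAssumedDeactivationDelayMs are hypothetical, and the 50 ms value is only a placeholder, since the right delay depends on the device.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical placeholder: the answer only says "several ms".
static const int64_t kAssumedDeactivationDelayMs = 50;

// Hypothetical class standing in for whatever object owns the synthesizer.
@interface SpeechCoordinator : NSObject <AVSpeechSynthesizerDelegate>
@end

@implementation SpeechCoordinator

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    // Do not touch the session here; give the output I/O time to finish first.
    dispatch_time_t when = dispatch_time(
        DISPATCH_TIME_NOW,
        (int64_t)(kAssumedDeactivationDelayMs * NSEC_PER_MSEC));

    dispatch_after(when, dispatch_get_main_queue(), ^{
        AVAudioSession *session = [AVAudioSession sharedInstance];
        NSError *error = nil;

        // Check the result instead of passing nil, so a failure is visible.
        if (![session setActive:NO withOptions:0 error:&error]) {
            NSLog(@"Deactivating the audio session failed: %@", error);
            return;
        }

        // From here on it should be safe to change the category
        // or to hand control back to the caller and start recognition.
    });
}

@end
```

Checking the return value of setActive:withOptions:error: rather than passing nil for the error makes it easier to see whether the chosen delay was long enough.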

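A few words about audio playback completion.

That might seem weird at first glance, but I have spent a lot of time researching this issue. When you put the last chunk of sound to the device output, you only know approximately when it will actually finish playing. Look at the AudioSession property ioBufferDuration:

The audio I/O buffer duration is the number of seconds for a single audio input/output cycle. For example, with an I/O buffer duration of 0.005 s, on each audio I/O cycle:

You receive 0.005 s of audio if obtaining input.

You must provide 0.005 s of audio if providing output.

The typical maximum I/O buffer duration is 0.93 s (corresponding to 4096 sample frames at a sample rate of 44.1 kHz). The minimum I/O buffer duration is at least 0.005 s (256 frames) but might be lower depending on the hardware in use.

So we can interpret this value as the playback time of one chunk. But there is still a small, uncalculated duration between that point and the actual completion of audio playback (hardware delay). I would say you need to wait ioBufferDuration * 1000 + delay ms to be sure that audio playback has completed (ioBufferDuration * 1000 because the duration is in seconds), where delay is some fairly small value.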
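The waiting time described above could be computed along these lines. This is only a sketch: the function name PlaybackDrainDelayMs and its extraDelayMs parameter are hypothetical, and the extra delay remains a guess that has to be tuned per device. As a sanity check on the quoted figures, 4096 frames at 44.1 kHz works out to 4096 / 44100 ≈ 0.093 s, and 256 frames to about 0.006 s.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper: milliseconds to wait after the last utterance
// before deactivating or reconfiguring the audio session.
static int64_t PlaybackDrainDelayMs(int64_t extraDelayMs) {
    AVAudioSession *session = [AVAudioSession sharedInstance];

    // IOBufferDuration is reported in seconds, so convert it to milliseconds.
    double bufferMs = session.IOBufferDuration * 1000.0;

    // Round up so that a partial millisecond is never cut off,
    // then add the small hardware-delay allowance.
    return (int64_t)ceil(bufferMs) + extraDelayMs;
}

// Example use; 20 ms is an arbitrary assumption, not a measured value:
//   int64_t delayMs = PlaybackDrainDelayMs(20);
```

AVAudioSession also exposes an outputLatency property (in seconds), which might be a reasonable starting point for the extra delay, although the answer does not say so.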

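Moreover, even Apple's developers do not seem quite sure about audio completion time. Take a quick look at the new audio class AVAudioPlayerNode and func scheduleBuffer(_ buffer: AVAudioPCMBuffer, completionHandler: AVFoundation.AVAudioNodeCompletionHandler? = nil):

@param completionHandler called after the buffer has been consumed by the player or the player is stopped. may be nil.

@discussion Schedules the buffer to be played following any previously scheduled commands. It is possible for the completionHandler to be called before rendering begins or before the buffer is played completely.

You can read more about audio processing in Understanding the Audio Unit Render Callback Function (AudioUnit is the low-level API that provides fast access to I/O data).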
