I have a navigation app that speaks voice instructions using AVSpeechUtterance. I have set the volume to 1, like so: speechUtteranceInstance.volume = 1. But the volume is still very low compared to music or podcasts coming from the iPhone, especially when the sound goes over a Bluetooth or wired connection (such as being connected to a car via Bluetooth).
Is there any way to boost the volume? (I know this has been asked before, but so far none of the solutions have worked for me.)
After a lot more research and playing around, I found a good workaround solution.
First of all, I think this is an iOS bug. When all of the conditions below are true, I found that the voice instruction itself is ALSO ducked (or at least it sounds ducked), resulting in the voice instruction playing at the same volume as the DUCKED music (and thus way too soft to hear well):
- background audio (music or a podcast) is playing,
- the background audio is ducked via the .duckOthers audio session category option,
- the speech is played through AVSpeechSynthesizer, and
- the audio is routed to a Bluetooth or wired output (such as car speakers connected via Bluetooth).
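For reference, this is the kind of session configuration under which the ducked speech can be reproduced (a minimal sketch distilled from the session setup used in the full code below):

    let audioSession = AVAudioSession.sharedInstance()
    // With .duckOthers set and output routed over Bluetooth, speech spoken via
    // AVSpeechSynthesizer.speak(_:) sounded ducked together with the music.
    try? audioSession.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .duckOthers])
    try? audioSession.setActive(true, options: .notifyOthersOnDeactivation)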
The workaround solution I found is to feed the speechUtterance to an AVAudioEngine. This can only be done on iOS 13 or above, since that release adds the .write method to AVSpeechSynthesizer.
In short, I use AVAudioEngine, AVAudioUnitEQ, and AVAudioPlayerNode, setting the globalGain property of the AVAudioUnitEQ to about 10 dB. There are also a few quirks with this approach, but they can be worked around (see the code comments).
Here's the complete code:
import UIKit
import AVFoundation
import MediaPlayer

class ViewController: UIViewController {

    // MARK: AVAudio properties
    var engine = AVAudioEngine()
    var player = AVAudioPlayerNode()
    var eqEffect = AVAudioUnitEQ()
    var converter = AVAudioConverter(from: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: 22050, channels: 1, interleaved: false)!, to: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)!)
    let synthesizer = AVSpeechSynthesizer()
    var bufferCounter: Int = 0

    let audioSession = AVAudioSession.sharedInstance()

    override func viewDidLoad() {
        super.viewDidLoad()

        let outputFormat = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)!
        setupAudio(format: outputFormat, globalGain: 0)
    }

    func activateAudioSession() {
        do {
            try audioSession.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .duckOthers])
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("An error has occurred while setting the AVAudioSession.")
        }
    }

    @IBAction func tappedPlayButton(_ sender: Any) {
        eqEffect.globalGain = 0
        play()
    }

    @IBAction func tappedPlayLoudButton(_ sender: Any) {
        eqEffect.globalGain = 10
        play()
    }

    func play() {
        let path = Bundle.main.path(forResource: "voiceStart", ofType: "wav")!
        let file = try! AVAudioFile(forReading: URL(fileURLWithPath: path))
        self.player.scheduleFile(file, at: nil, completionHandler: nil)

        let utterance = AVSpeechUtterance(string: "This is to test if iOS is able to boost the voice output above the 100% limit.")
        synthesizer.write(utterance) { buffer in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer, pcmBuffer.frameLength > 0 else {
                print("could not create buffer or buffer empty")
                return
            }

            // QUIRK: need to convert the buffer to a different format because AVAudioEngine does not support the format returned from AVSpeechSynthesizer
            let convertedBuffer = AVAudioPCMBuffer(pcmFormat: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: pcmBuffer.format.sampleRate, channels: pcmBuffer.format.channelCount, interleaved: false)!, frameCapacity: pcmBuffer.frameCapacity)!
            do {
                try self.converter!.convert(to: convertedBuffer, from: pcmBuffer)
                self.bufferCounter += 1
                self.player.scheduleBuffer(convertedBuffer, completionCallbackType: .dataPlayedBack, completionHandler: { (type) -> Void in
                    DispatchQueue.main.async {
                        self.bufferCounter -= 1
                        print(self.bufferCounter)
                        if self.bufferCounter == 0 {
                            // Last buffer played back: stop everything and release the session
                            self.player.stop()
                            self.engine.stop()
                            try! self.audioSession.setActive(false, options: [])
                        }
                    }
                })
                self.converter!.reset()
                //self.player.prepare(withFrameCount: convertedBuffer.frameLength)
            } catch let error {
                print(error.localizedDescription)
            }
        }

        activateAudioSession()
        if !self.engine.isRunning {
            try! self.engine.start()
        }
        if !self.player.isPlaying {
            self.player.play()
        }
    }

    func setupAudio(format: AVAudioFormat, globalGain: Float) {
        // QUIRK: connecting the equalizer to the engine somehow starts the shared audioSession, and if that audioSession is not configured with .mixWithOthers and is not deactivated afterwards, this will stop any background music that was already playing. So first configure the audio session, then set up the engine, and then deactivate the session again.
        try? self.audioSession.setCategory(.playback, options: .mixWithOthers)

        eqEffect.globalGain = globalGain
        engine.attach(player)
        engine.attach(eqEffect)
        engine.connect(player, to: eqEffect, format: format)
        engine.connect(eqEffect, to: engine.mainMixerNode, format: format)
        engine.prepare()

        try? self.audioSession.setActive(false)
    }
}
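One caveat in the code above: the AVAudioConverter is hardcoded to 22050 Hz mono Int16, which is the format AVSpeechSynthesizer happened to deliver in my tests, but that format is not guaranteed by the API. If you want to be defensive, you could build the converter from the first buffer the synthesizer actually delivers; this is a hypothetical refinement (makeConverter is my own helper name, not part of the original workaround):

    // Hypothetical helper: derive the Int16-to-Float32 converter from the actual
    // format of the buffer AVSpeechSynthesizer delivers, instead of hardcoding it.
    func makeConverter(matching pcmBuffer: AVAudioPCMBuffer) -> AVAudioConverter? {
        let source = pcmBuffer.format
        guard let target = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                         sampleRate: source.sampleRate,
                                         channels: source.channelCount,
                                         interleaved: false) else { return nil }
        return AVAudioConverter(from: source, to: target)
    }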
The documentation mentions that the default .volume value is 1.0, which is the loudest. The actual loudness is based on the user's volume setting. As long as the user turns the volume up, I haven't really run into the problem of the speech not being loud enough.
Maybe you could consider showing a visual warning if the user's volume level is below a certain level. This answer seems to show how to do that through AVAudioSession.
AVAudioSession is worth exploring, since some of its settings do affect speech output... for example, whether your app interrupts speech coming from other apps.
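If you go that route, here is a minimal sketch of such a check using AVAudioSession's outputVolume property (the 0.3 threshold is an arbitrary example value, not something from the documentation):

    import AVFoundation

    // Warn when the hardware output volume is low; outputVolume reports the
    // system-wide output volume set by the user (0.0 to 1.0).
    let session = AVAudioSession.sharedInstance()
    try? session.setActive(true)
    if session.outputVolume < 0.3 {
        print("Volume is low; consider showing a visual warning to the user.")
    }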
Try this:
import AVFoundation // AVSpeechSynthesizer is part of AVFoundation

// Use the .playback category so the speech plays even when the silent switch is on.
try? AVAudioSession.sharedInstance().setCategory(.playback, mode: .default, options: [])

let utterance = AVSpeechUtterance(string: "Hello world")
utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")

// Note: in a real app, keep a strong reference to the synthesizer; a local
// constant can be deallocated before the utterance finishes playing.
let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)