
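I'm trying to use DeepSpeech to convert audio to text. It works with the default audio files from Mozilla/DeepSpeech, but when I record audio from my PC's microphone and feed it to the model, it raises an error ('wave.Error: unknown format: 3'). I'm using the sounddevice library to record the audio. I tried changing the sample rate and the number of channels, but that didn't help. I know the error is raised by the 'wave' module, but I can't figure out why. Please help.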

My code:

from deepspeech import Model
import numpy as np
import os
import wave
from playsound import playsound
import sounddevice as sd
from scipy.io.wavfile import write
model_file_path = 'deepspeech-0.8.2-models.pbmm'
lm_file_path = 'deepspeech-0.9.3-models.scorer'

beam_width = 500
lm_alpha = 0.93
lm_beta = 1.18

model = Model(model_file_path)
model.enableExternalScorer(lm_file_path)

model.setScorerAlphaBeta(lm_alpha, lm_beta)
model.setBeamWidth(beam_width)


def read_wav_file(filename):
    with wave.open(filename, 'rb') as w:
        rate = w.getframerate()
        frames = w.getnframes()
        buffer = w.readframes(frames)
        print("Rate:", rate)
        print("Frames:", frames)
        print("Buffer Len:", len(buffer))

    return buffer, rate


def transcribe_batch(audio_file):
    buffer, rate = read_wav_file(audio_file)
    data16 = np.frombuffer(buffer, dtype=np.int16)
    return model.stt(data16)


fs = 48000  # Sample rate
seconds = 5  # Duration of recording
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
sd.wait()  # Wait until recording is finished
write('output.wav', fs, myrecording)  # Save as WAV file
playsound('output.wav')


trans = transcribe_batch('output.wav')
print("transcribed message: ", trans)

with open('subtitle.txt', 'w') as f:
    f.write(trans)


print("executed successfully")

Error:

TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2021-11-14 00:57:24.539394: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "C:/Users/Lenovo/PycharmProjects/deepspeech-wenv/test.py", line 48, in <module>
    trans = transcribe_batch('output.wav')
  File "C:/Users/Lenovo/PycharmProjects/deepspeech-wenv/test.py", line 35, in transcribe_batch
    buffer, rate = read_wav_file(audio_file)
  File "C:/Users/Lenovo/PycharmProjects/deepspeech-wenv/test.py", line 23, in read_wav_file
    with wave.open(filename, 'rb') as w:
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\wave.py", line 499, in open
    return Wave_read(f)
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\wave.py", line 163, in __init__
    self.initfp(f)
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\wave.py", line 143, in initfp
    self._read_fmt_chunk(chunk)
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\wave.py", line 260, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 3
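
A note on what the traceback most likely means. The standard-library wave module only reads integer PCM files (format tag 1). Format tag 3 is IEEE float. sd.rec() records float32 samples unless a dtype is passed, and scipy.io.wavfile.write stores a float32 array as a float WAV, so output.wav probably ends up with tag 3. Below is a minimal sketch, using only the standard library, for checking the tag of a file and for writing float samples as 16-bit PCM instead. The helper names wav_format_tag and write_pcm16 are made up for illustration and are not part of DeepSpeech, sounddevice, or scipy.

```python
import struct
import wave


def wav_format_tag(path):
    """Return the wFormatTag of a RIFF/WAVE file (1 = PCM, 3 = IEEE float)."""
    with open(path, 'rb') as f:
        riff, _size, wave_id = struct.unpack('<4sI4s', f.read(12))
        if riff != b'RIFF' or wave_id != b'WAVE':
            raise ValueError('not a RIFF/WAVE file')
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError('fmt chunk not found')
            chunk_id, chunk_size = struct.unpack('<4sI', header)
            if chunk_id == b'fmt ':
                # The format tag is the first 16-bit field of the fmt chunk.
                return struct.unpack('<H', f.read(2))[0]
            # Skip other chunks; chunks are padded to an even byte count.
            f.seek(chunk_size + (chunk_size & 1), 1)


def write_pcm16(path, samples, rate):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    # Clip first so out-of-range values do not overflow the int16 range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(s * 32767) for s in clipped]
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(struct.pack('<%dh' % len(ints), *ints))
```

With sounddevice itself, the shorter route is probably to request integer samples at record time, for example sd.rec(int(seconds * fs), samplerate=fs, channels=1, dtype='int16'), so that scipy writes a PCM file that wave.open can read. Separately, the released DeepSpeech 0.8/0.9 English models were, as far as I know, trained on 16 kHz audio, so a 48000 Hz recording may transcribe poorly even once the format error is gone; model.sampleRate() reports the rate the loaded model expects.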