python - How to use the GPU when transcribing with deepspeech

Question

I am using the excellent deepspeech package to transcribe an audio file in Python. Here is my quick implementation:

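import wave
import deepspeech
import numpy as np

# Load the pre-trained acoustic model
model_file_path = 'deepspeech-0.9.3-models.pbmm'
model = deepspeech.Model(model_file_path)

# Read every frame of the WAV file into memory
filename = 'podcast.wav'
with wave.open(filename, 'rb') as w:
    frames = w.getnframes()
    buffer = w.readframes(frames)

# Interpret the raw bytes as 16-bit PCM samples and transcribe them
data16 = np.frombuffer(buffer, dtype=np.int16)
text = model.stt(data16)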

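podcast.wav is an audio file of about 20 minutes. Running text = model.stt(data16) takes more than 10 minutes (I interrupted the process after 10 minutes), which is unexpectedly slow given that a GPU is available (I am using Google Colab). I suspect the script is not using the GPU. Is there an alternative implementation of the code above that ensures the GPU is used? I can confirm that deepspeech-gpu is installed.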

score 1 · Accepted Answer

Installing only deepspeech-gpu should be enough.

pip install deepspeech-gpu

Try uninstalling the CPU version, which you may have installed earlier.

pip uninstall deepspeech
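
To see which of the two packages is actually present in the current environment, something like the following sketch may help. It only uses the standard library (importlib.metadata, Python 3.8+); the helper name installed_versions is made up for illustration. It reports what is installed and does not prove which build `import deepspeech` resolves to when both are present.

```python
from importlib import metadata


def installed_versions(names=("deepspeech", "deepspeech-gpu")):
    """Return {distribution name: version string, or None if not installed}.

    `installed_versions` is a hypothetical helper, not part of deepspeech.
    """
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If both names show a version, uninstalling the CPU package as described above, and then reinstalling deepspeech-gpu if the import breaks, is probably the safest way to end up with a single copy.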

You can verify this by monitoring your GPU usage while the transcription runs. See "Show GPU usage while running code in Colab".
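
One possible way to sample GPU usage from Python is to call nvidia-smi, which is normally available on Colab GPU runtimes. This is a minimal sketch, not the method from the linked answer: the helper names parse_gpu_stats and read_gpu_stats are hypothetical, and the parser assumes every field is a plain integer (some GPUs report values such as "[N/A]", which this sketch does not handle).

```python
import shutil
import subprocess

# Ask nvidia-smi for utilization (%) and used memory (MiB), one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn rows like '35, 1024' into a list of (utilization, memory) tuples."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats():
    """Return per-GPU stats, or None when nvidia-smi is not on the PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() from a second cell or a background thread while model.stt(...) is running should show utilization well above zero if the GPU build is in use. A value that stays at 0 suggests the work is still running on the CPU.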
