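I am currently working with some audio data. I have an audio file that was created by splitting a larger file on silence using pydub.
However, if I take this audio file after exporting it with pydub, convert the AudioSegment's array of samples to a numpy array, and write it back out with soundfile, the resulting audio file plays at about half the speed of the original. What could be going wrong?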
import soundfile as sf
import numpy as np
from pydub import AudioSegment, effects
from pydub.silence import split_on_silence
from pathlib import Path
# This code takes a large .mp3 file ("original_audio_mp3") with a sample rate of 44100 Hz
sound = AudioSegment.from_file(original_audio_mp3)
if sound.frame_rate != desired_sample_rate:
    sound = sound.set_frame_rate(desired_sample_rate)  # convert to 16000 Hz sample rate
sound = effects.normalize(sound) # normalize audio file
dBFS = sound.dBFS # get decibels relative to full scale
sound_chunks = split_on_silence(
    sound,
    min_silence_len=200,  # measured in ms
    silence_thresh=dBFS - 30,  # anything 30 dB below the file's dBFS is considered "silence"
)
# this "audio_segment_0.wav" file came from the above code.
audio_file_path = Path("audio_segment_0.wav")
raw_audio = AudioSegment.from_file(audio_file_path).set_frame_rate(16000)
# normalize the chunk
raw_audio = effects.normalize(raw_audio)
# append 200 ms of silence to beginning and end of file
silence = AudioSegment.silent(duration=200, frame_rate=16000)
raw_audio_w_silence = silence + raw_audio + silence
# export it
raw_audio_w_silence.export("pydub_audio.wav", format = 'wav') # the output from this sounds completely OK.
# read audio, manipulate and write with soundfile
new_audio = AudioSegment.from_file("pydub_audio.wav").set_frame_rate(16000)
new_audio_signal = np.array(new_audio.get_array_of_samples(), dtype = np.float32) / 32768.0 # scale to between [-1.0, 1.0]
# the output written below from the scaled numpy array plays at about half the speed of the pydub export above.
sf.write("soundfile_export.wav", data = new_audio_signal, samplerate = new_audio.frame_rate, format = 'wav')
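One thing that may be worth checking, assuming the file is stereo (nothing above confirms this): pydub's get_array_of_samples() returns multi-channel audio as a single flat array with the channels interleaved ([L0, R0, L1, R1, ...]). If such an array is written as mono at the same sample rate, it holds twice as many samples per second of audio as expected, so it would play for twice as long. The sketch below uses only the standard library and made-up data to illustrate the arithmetic; duration_seconds and deinterleave are hypothetical helper names, not part of pydub or soundfile.

```python
from array import array


def duration_seconds(num_values, sample_rate, channels_assumed):
    """Playback duration when num_values samples are read as channels_assumed channels."""
    return num_values / channels_assumed / sample_rate


def deinterleave(samples, channels):
    """Split interleaved samples [L0, R0, L1, R1, ...] into one list per channel."""
    return [list(samples[ch::channels]) for ch in range(channels)]


# Hypothetical data: 1 second of 16-bit stereo audio at 16000 Hz,
# laid out as one flat interleaved array (left = 0, right = 1 in every frame).
sample_rate = 16000
channels = 2
interleaved = array("h", [0, 1] * sample_rate)

# Interpreted correctly as 2 channels, the clip lasts 1 second.
true_duration = duration_seconds(len(interleaved), sample_rate, channels)

# Interpreted as 1 channel, the same values last 2 seconds (half speed).
mono_duration = duration_seconds(len(interleaved), sample_rate, 1)

# Separating the channels recovers one sequence per channel.
left, right = deinterleave(interleaved, channels)
```

If new_audio.channels turns out to be 2 in the code above, reshaping the numpy array to (-1, new_audio.channels) before calling sf.write, or converting to mono first with new_audio.set_channels(1), would be the corresponding things to try.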