28

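I'm aware of the following question: How to create a pydub AudioSegment using an numpy array?

My question is the exact opposite. If I have a pydub AudioSegment, how can I convert it to a numpy array?

I would like to use scipy filters and so on. It is not clear to me what the internal structure of the AudioSegment raw data is.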

4 Answers

19

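Pydub has a facility for getting the audio data as an array of samples. It is an array.array instance (not a numpy array), but you should be able to convert it to a numpy array relatively easily: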

from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")

# this is an array
samples = sound.get_array_of_samples()

You may be able to create a numpy variant of the implementation, though. The method itself is implemented very simply:

def get_array_of_samples(self):
    """
    returns the raw_data as an array of samples
    """
    return array.array(self.array_type, self._data)
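
A numpy variant along those lines might look like the sketch below. It is not part of pydub: the function name get_numpy_array_of_samples and the SAMPLE_DTYPES mapping are made up for illustration, and it assumes the segment exposes raw_data and sample_width and stores signed integer samples in native byte order. A stand-in object is used so the snippet runs without an audio file.

```python
from types import SimpleNamespace

import numpy as np

# Assumed mapping from bytes per sample to a signed integer dtype.
SAMPLE_DTYPES = {1: np.int8, 2: np.int16, 4: np.int32}


def get_numpy_array_of_samples(segment):
    """Return the raw data of `segment` as a 1-D numpy array of interleaved samples."""
    dtype = SAMPLE_DTYPES[segment.sample_width]
    # np.frombuffer reinterprets the bytes without copying them.
    return np.frombuffer(segment.raw_data, dtype=dtype)


# Stand-in for an AudioSegment (hypothetical data, 16-bit mono).
fake_segment = SimpleNamespace(
    raw_data=np.array([1, -2, 3, -4], dtype=np.int16).tobytes(),
    sample_width=2,
)
samples = get_numpy_array_of_samples(fake_segment)
```

Because the array is a view over immutable bytes it is read-only; call samples.copy() before modifying it in place.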

It is also possible to create a new audio segment from a (modified?) array of samples:

new_sound = sound._spawn(samples)

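The above is a little hacky. It was written for internal use within the AudioSegment class, but it mainly just figures out what type of audio data you are passing (array of samples, list of samples, bytes, bytestring, etc.). It is safe to use despite the underscore prefix.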
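
As a rough sketch of the full round trip (samples to numpy, process, back to an array.array that _spawn accepts), something like the following could work. The helper smooth_samples is hypothetical, and a plain moving average built with np.convolve stands in for whatever scipy filter you actually want to apply.

```python
import array

import numpy as np


def smooth_samples(samples, channels, window=5):
    """Apply a moving average to each channel of an interleaved array.array.

    Returns a new array.array with the same typecode as the input.
    """
    dtype = np.dtype(samples.typecode)
    # One row per frame, one column per channel.
    data = np.array(samples, dtype=np.float64).reshape(-1, channels)
    kernel = np.ones(window) / window
    filtered = np.column_stack(
        [np.convolve(data[:, ch], kernel, mode="same") for ch in range(channels)]
    )
    # Round and clip so the result fits the original integer type.
    info = np.iinfo(dtype)
    clipped = np.clip(np.rint(filtered), info.min, info.max).astype(dtype)
    # Flattening row by row restores the interleaved layout.
    return array.array(samples.typecode, clipped.ravel().tolist())


# With a real segment this would be used roughly as:
#   new_sound = sound._spawn(smooth_samples(sound.get_array_of_samples(), sound.channels))
demo = smooth_samples(array.array("h", [100, -100] * 10), channels=2)
```

Processing in float64 and converting back at the end avoids integer overflow inside the filter.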

answered 2016-06-24T20:32:32.057
10

You can get an array.array from an AudioSegment and then convert it to a numpy.ndarray:

from pydub import AudioSegment
import numpy as np
song = AudioSegment.from_mp3('song.mp3')
samples = song.get_array_of_samples()
samples = np.array(samples)
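
Note that for multi-channel audio the resulting array is one-dimensional, with the channels interleaved sample by sample. A minimal sketch of separating them, using a hypothetical helper split_channels and made-up sample values:

```python
import numpy as np


def split_channels(samples, channels):
    """Split an interleaved 1-D sample sequence into one array per channel."""
    flat = np.asarray(samples)
    # Every `channels`-th sample, starting at offset ch, belongs to channel ch.
    return [flat[ch::channels] for ch in range(channels)]


# Interleaved stereo: left samples are 1, 3, 5 and right samples are 2, 4, 6.
left, right = split_channels([1, 2, 3, 4, 5, 6], channels=2)
```

With a real segment the call would be split_channels(samples, song.channels).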
answered 2017-03-02T22:51:25.380
4

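None of the existing answers is perfect: they miss the reshaping and the sample width. I have written this function to help convert audio into the standard audio representation in np: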

from typing import Tuple

import numpy as np
import pydub


def pydub_to_np(audio: pydub.AudioSegment) -> Tuple[np.ndarray, int]:
    """
    Converts pydub audio segment into np.float32 of shape [duration_in_seconds*sample_rate, channels],
    where each value is in range [-1.0, 1.0).
    Returns tuple (audio_np_array, sample_rate).
    """
    # One row per frame, one column per channel.
    samples = np.array(audio.get_array_of_samples(), dtype=np.float32).reshape((-1, audio.channels))
    # Divide by 2**(bits - 1), the magnitude of the most negative integer sample.
    return samples / (1 << (8 * audio.sample_width - 1)), audio.frame_rate
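
After filtering, the float array usually has to be turned back into integer PCM before pydub can use it. The sketch below shows one way to do that for 16-bit output. The name np_to_pcm16_bytes is hypothetical, and scaling by 32767 (rather than 32768) is a choice made here so that +1.0 does not overflow.

```python
import numpy as np


def np_to_pcm16_bytes(audio_np):
    """Convert floats in [-1.0, 1.0], shaped [frames, channels], to interleaved 16-bit PCM.

    Returns a tuple (raw_bytes, channels).
    """
    audio_np = np.asarray(audio_np, dtype=np.float64)
    if audio_np.ndim == 1:
        # Treat a 1-D array as mono.
        audio_np = audio_np[:, np.newaxis]
    # Clip first so out-of-range values saturate instead of wrapping around.
    scaled = np.rint(np.clip(audio_np, -1.0, 1.0) * 32767)
    # "<i2" is little-endian signed 16-bit; row-major order keeps channels interleaved.
    return scaled.astype("<i2").tobytes(), audio_np.shape[1]


raw, channels = np_to_pcm16_bytes(np.array([[0.0, 1.0], [-1.0, 0.25]], dtype=np.float32))

# With pydub installed, a segment could then be built roughly like this
# (sample_rate being the value returned by pydub_to_np):
#   pydub.AudioSegment(data=raw, sample_width=2, frame_rate=sample_rate, channels=channels)
```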

answered 2021-04-02T16:23:16.670
1

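get_array_of_samples (not found on [ReadTheDocs.AudioSegment]: audiosegment module) returns a one-dimensional array, which does not work well because it loses information about the audio stream (frames, channels, ...).

A few days ago I ran into this problem while using [PyPI]: sounddevice (which expects a numpy.ndarray) to play a sound (I needed to play it on a different output audio device). This is what I came up with.

code00.py: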

#!/usr/bin/env python

import sys
from pprint import pprint as pp
import numpy as np
import pydub
import sounddevice as sd


def audio_file_to_np_array(file_name):
    asg = pydub.AudioSegment.from_file(file_name)
    dtype = getattr(np, "int{:d}".format(asg.sample_width * 8))  # Or could create a mapping: {1: np.int8, 2: np.int16, 4: np.int32, 8: np.int64}
    arr = np.ndarray((int(asg.frame_count()), asg.channels), buffer=asg.raw_data, dtype=dtype)
    print("\n", asg.frame_rate, arr.shape, arr.dtype, arr.size, len(asg.raw_data), len(asg.get_array_of_samples()))  # @TODO: Comment this line!!!
    return arr, asg.frame_rate


def main(*argv):
    pp(sd.query_devices())  # @TODO: Comment this line!!!
    a, fr = audio_file_to_np_array("./test00.mp3")
    dvc = 5  # Index of an OUTPUT device (from sd.query_devices() on YOUR machine)
    #sd.default.device = dvc  # Change default OUTPUT device
    sd.play(a, samplerate=fr)
    sd.wait()


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)

Output:

 [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q038015319]> set PATH=%PATH%;f:\Install\pc064\FFMPEG\FFMPEG\4.3.1\bin

 [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q038015319]> dir /b
 code00.py
 test00.mp3

 [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q038015319]> "e:\Work\Dev\VEnvs\py_pc064_03.09.01_test0\Scripts\python.exe" code00.py
 Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] 064bit on win32

    0 Microsoft Sound Mapper - Input, MME (2 in, 0 out)
 >  1 Microphone (Logitech USB Headse, MME (2 in, 0 out)
    2 Microphone (Realtek Audio), MME (2 in, 0 out)
    3 Microsoft Sound Mapper - Output, MME (0 in, 2 out)
 <  4 Speakers (Logitech USB Headset), MME (0 in, 2 out)
    5 Speakers / Headphones (Realtek , MME (0 in, 2 out)
    6 Primary Sound Capture Driver, Windows DirectSound (2 in, 0 out)
    7 Microphone (Logitech USB Headset), Windows DirectSound (2 in, 0 out)
    8 Microphone (Realtek Audio), Windows DirectSound (2 in, 0 out)
    9 Primary Sound Driver, Windows DirectSound (0 in, 2 out)
   10 Speakers (Logitech USB Headset), Windows DirectSound (0 in, 2 out)
   11 Speakers / Headphones (Realtek Audio), Windows DirectSound (0 in, 2 out)
   12 Realtek ASIO, ASIO (2 in, 2 out)
   13 Speakers (Logitech USB Headset), Windows WASAPI (0 in, 2 out)
   14 Speakers / Headphones (Realtek Audio), Windows WASAPI (0 in, 2 out)
   15 Microphone (Logitech USB Headset), Windows WASAPI (1 in, 0 out)
   16 Microphone (Realtek Audio), Windows WASAPI (2 in, 0 out)
   17 Microphone (Realtek HD Audio Mic input), Windows WDM-KS (2 in, 0 out)
   18 Speakers (Realtek HD Audio output), Windows WDM-KS (0 in, 2 out)
   19 Stereo Mix (Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)
   20 Microphone (Logitech USB Headset), Windows WDM-KS (1 in, 0 out)
   21 Speakers (Logitech USB Headset), Windows WDM-KS (0 in, 2 out)

  44100 (82191, 2) int16 164382 328764 164382

 --- (Manually inserted line) Sound is playing :) ---

 Done.

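Notes:

  • As seen, there are no hardcoded values (in terms of dimensions, dtype, ...)
  • I also needed to return the sample rate (as it can't be embedded in the array), and the device requires it (in this case it is 44.1k, which is the default, but I have also tested with files having half that value)
  • All the existing answers use floating point numbers to represent the samples. That did not work for me, because most of my test files have samples that are 16 bits long and np.float16 is not supported (by my FPU), so I had to use int
  • As a side note, while testing with various files, SoundDevice was not able to play an .m4a on my Win laptop (most likely because of its 32k sample rate), while PyDub was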
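
If floating-point samples are wanted anyway, float32 may be an alternative to float16. The sketch below, with the hypothetical helper int_samples_to_float32, rescales an integer array such as the one returned by audio_file_to_np_array. It assumes signed integer input, and it assumes the playback side accepts float32 arrays, which is worth checking against the sounddevice documentation.

```python
import numpy as np


def int_samples_to_float32(arr):
    """Rescale signed integer samples to float32 values in [-1.0, 1.0), keeping the shape."""
    if not np.issubdtype(arr.dtype, np.signedinteger):
        raise TypeError("expected a signed integer array")
    # 2**(bits - 1) is the magnitude of the most negative representable sample.
    scale = float(1 << (8 * arr.dtype.itemsize - 1))
    return (arr / scale).astype(np.float32)


# Made-up 16-bit stereo data: two frames, two channels.
demo = int_samples_to_float32(np.array([[0, 16384], [-32768, 8192]], dtype=np.int16))
```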
answered 2021-12-15T15:54:06.520