python - 将声音文件作为 NumPy 数组导入 Python（audiolab 的替代品）

Question

我过去一直在使用Audiolab导入声音文件，效果很好。然而：

它不支持某些格式，例如 mp3，因为底层的 libsndfile拒绝支持它们
在Windows 下的 Python 2.6 中不起作用，作者不在附近修复它

-

In [2]: from scikits import audiolab
--------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

C:\Python26\Scripts\<ipython console> in <module>()

C:\Python26\lib\site-packages\scikits\audiolab\__init__.py in <module>()
     23 __version__ = _version
     24
---> 25 from pysndfile import formatinfo, sndfile
     26 from pysndfile import supported_format, supported_endianness, \
     27                       supported_encoding, PyaudioException, \

C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py in <module>()
----> 1 from _sndfile import Sndfile, Format, available_file_formats, available_encodings
      2 from compat import formatinfo, sndfile, PyaudioException, PyaudioIOError
      3 from compat import supported_format, supported_endianness, supported_encoding

ImportError: DLL load failed: The specified module could not be found.``

所以我想：

弄清楚为什么它在 2.6 中不起作用（_sndfile.pyd 有问题？），也许找到一种方法来扩展它以使用不受支持的格式
找到 audiolab 的完整替代品

score 14 · Accepted Answer

Audiolab 正在使用 Python 2.6.2 在 Ubuntu 9.04 上为我工作，所以这可能是 Windows 问题。在您到论坛的链接中，作者还暗示这是一个 Windows 错误。

在过去，这个选项也对我有用：

from scipy.io import wavfile
fs, data = wavfile.read(filename)

请注意data可能具有int数据类型，因此它不会在 [-1,1) 范围内缩放。例如，如果data是int16，您必须除以data在2**15[-1,1) 范围内进行缩放。

score 6 · Accepted Answer

Sox http://sox.sourceforge.net/ can be your friend for this. It can read many many different formats and output them as raw in whatever datatype you prefer. In fact, I just wrote the code to read a block of data from an audio file into a numpy array.

I decided to go this route for portability (sox is very widely available) and to maximize the flexibility of input audio types I could use. Actually, it seems from initial testing that it isn't noticeably slower for what I'm using it for... which is reading short (a few seconds) of audio from very long (hours) files.

Variables you need:

SOX_EXEC # the sox / sox.exe executable filename
filename # the audio filename of course
num_channels # duh... the number of channels
out_byps # Bytes per sample you want, must be 1, 2, 4, or 8

start_samp # sample number to start reading at
len_samp   # number of samples to read

The actual code is really simple. If you want to extract the whole file, you can remove the start_samp, len_samp, and 'trim' stuff.

import subprocess # need the subprocess module
import numpy as NP # I'm lazy and call numpy NP

cmd = [SOX_EXEC,
       filename,              # input filename
       '-t','raw',            # output file type raw
       '-e','signed-integer', # output encode as signed ints
       '-L',                  # output little endin
       '-b',str(out_byps*8),  # output bytes per sample
       '-',                   # output to stdout
       'trim',str(start_samp)+'s',str(len_samp)+'s'] # only extract requested part 

data = NP.fromstring(subprocess.check_output(cmd),'<i%d'%(out_byps))
data = data.reshape(len(data)/num_channels, num_channels) # make samples x channels

PS: Here is code to read stuff from audio file headers using sox...

    info = subprocess.check_output([SOX_EXEC,'--i',filename])
    reading_comments_flag = False
    for l in info.splitlines():
        if( not l.strip() ):
            continue
        if( reading_comments_flag and l.strip() ):
            if( comments ):
                comments += '\n'
            comments += l
        else:
            if( l.startswith('Input File') ):
                input_file = l.split(':',1)[1].strip()[1:-1]
            elif( l.startswith('Channels') ):
                num_channels = int(l.split(':',1)[1].strip())
            elif( l.startswith('Sample Rate') ):
                sample_rate = int(l.split(':',1)[1].strip())
            elif( l.startswith('Precision') ):
                bits_per_sample = int(l.split(':',1)[1].strip()[0:-4])
            elif( l.startswith('Duration') ):
                tmp = l.split(':',1)[1].strip()
                tmp = tmp.split('=',1)
                duration_time = tmp[0]
                duration_samples = int(tmp[1].split(None,1)[0])
            elif( l.startswith('Sample Encoding') ):
                encoding = l.split(':',1)[1].strip()
            elif( l.startswith('Comments') ):
                comments = ''
                reading_comments_flag = True
            else:
                if( other ):
                    other += '\n'+l
                else:
                    other = l
                if( output_unhandled ):
                    print >>sys.stderr, "Unhandled:",l
                pass

score 5 · Accepted Answer

FFmpeg 支持 mp3 并在 Windows 上运行（http://zulko.github.io/blog/2013/10/04/read-and-write-audio-files-in-python-using-ffmpeg/）。

读取 mp3 文件：

import subprocess as sp

FFMPEG_BIN = "ffmpeg.exe"

command = [ FFMPEG_BIN,
        '-i', 'mySong.mp3',
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', '44100', # ouput will have 44100 Hz
        '-ac', '2', # stereo (set to '1' for mono)
        '-']
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

将数据格式化为 numpy 数组：

raw_audio = pipe.proc.stdout.read(88200*4)

import numpy

audio_array = numpy.fromstring(raw_audio, dtype="int16")
audio_array = audio_array.reshape((len(audio_array)/2,2))

score 4 · Accepted Answer

如果您想为 MP3 执行此操作

这是我正在使用的：它使用pydub和 scipy。

完整设置（在 Mac 上，在其他系统上可能不同）：

import tempfile
import os
import pydub
import scipy
import scipy.io.wavfile


def read_mp3(file_path, as_float = False):
    """
    Read an MP3 File into numpy data.
    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1]
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
            otherwise ndarray(n_samples, 2)[float] in range [-1, 1]
    """

    path, ext = os.path.splitext(file_path)
    assert ext=='.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)
    _, path = tempfile.mkstemp()
    mp3.export(path, format="wav")
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data/(2**15)
    return rate, data

感谢詹姆斯汤普森的博客

score 2 · Accepted Answer

我最近一直在使用PySoundFile而不是 Audiolab。它很容易安装conda。

像大多数东西一样，它不支持 mp3 。MP3 不再是专利，所以没有理由不支持它；有人只需将支持写入 libsndfile。

python - 将声音文件作为 NumPy 数组导入 Python（audiolab 的替代品）

5 回答 5

Related

Reference