
I'm trying to process an audio file in Python and apply a low-pass filter to remove some background noise. So far I can load the file and build an array of its data values:

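import struct
import wave


class AudioModule:

    def __init__(self, fname=""):
        self.stream = wave.open(fname, 'rb')
        self.frames = []

    def build(self):
        # Read one frame at a time; 'B' unpacks a single unsigned byte,
        # which matches sampwidth == 1 and nchannels == 1.
        self.stream.rewind()
        for x in range(self.stream.getnframes()):
            self.frames.append(struct.unpack('B', self.stream.readframes(1)))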

I used struct.unpack('B', ...) for this particular file. The file being loaded reports the following parameters:

nchannels: 1
sampwidth: 1
framerate: 6000

I know that sampwidth is the width in bytes of each sample, so for this mono file each readframes(1) call returns one byte. After loading, the array contains values like the following (ranging from 128 to 180 throughout):

>>> r.frames[6000:6025]
[(127,), (127,), (127,), (127,), (128,), (128,), (128,), (128,), (128,), (128,),      (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,)]

Question: What do those numbers represent? Other audio files with a larger sample width give completely different numbers. My goal is to remove certain frequencies from the audio file. Unfortunately I know very little about this and don't understand how these values relate to frequency.

What is the best way to remove all frequencies above a certain threshold?

Additionally, the values are written back to a different file as follows:

def store(self, fout=""):
    # Method of AudioModule: write self.frames to a new WAV file
    # using the same parameters as the input stream.
    out = wave.open(fout, 'wb')
    nchannels = self.stream.getnchannels()
    sampwidth = self.stream.getsampwidth()
    framerate = self.stream.getframerate()
    nframes = len(self.frames)
    comptype = "NONE"
    compname = "not compressed"

    out.setparams((nchannels, sampwidth, framerate, nframes,
                   comptype, compname))

    if nchannels == 1:
        for f in self.frames:
            data = struct.pack('B', f[0])
            out.writeframes(data)
    elif nchannels == 2:
        for f in self.frames:
            data = struct.pack('BB', f[0], f[1])
            out.writeframes(data)
    out.close()

1 Answer


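I think these numbers are an abstraction of how far the speaker membrane is displaced, that is, the amplitude of the sound. A larger value means a larger displacement of the membrane. You can read more here.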

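The sample width is the range available for describing the volume. Different sample types have different sample widths. For example, if the sample width were 1 bit, we could only describe the audio as sound or silence. So, in general, a larger sample width gives higher audio quality. For more about sample width, you can read Sample Rate and Bitrate: The Guts of Digital Audio.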
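As an illustration of why files with different sample widths produce such different numbers, here is a minimal sketch. It assumes the usual PCM WAV conventions: 8-bit samples are unsigned with silence at 128, and 16-bit samples are signed little-endian with silence at 0. The helper name `to_float` is hypothetical and is not part of the question's code.

```python
import numpy as np

def to_float(raw: bytes, sampwidth: int) -> np.ndarray:
    """Convert raw PCM bytes to floats in roughly [-1.0, 1.0)."""
    if sampwidth == 1:
        # 8-bit WAV: unsigned values 0..255, silence sits at 128
        samples = np.frombuffer(raw, dtype=np.uint8).astype(np.float64)
        return (samples - 128.0) / 128.0
    if sampwidth == 2:
        # 16-bit WAV: signed little-endian values -32768..32767, silence at 0
        samples = np.frombuffer(raw, dtype="<i2").astype(np.float64)
        return samples / 32768.0
    raise ValueError("only 1- and 2-byte samples are handled in this sketch")

# Values like those in the question: near 128 means near silence
print(to_float(bytes([127, 128, 128, 180]), 1))
```

Under those assumptions, the long runs of 127 and 128 in the question are simply a nearly silent stretch of the recording.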

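Also, the signal stored in an audio file is in the time domain. It does not represent frequency. If you want values in the frequency domain, you can perform an FFT on the array you obtained.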
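A minimal sketch of that time-domain to frequency-domain step, using NumPy's `np.fft.rfft` and `np.fft.rfftfreq`. The signal here is synthetic (a 440 Hz tone centred on 128, chosen purely for illustration) rather than data from the question's file; only the 6000 Hz frame rate is taken from the question.

```python
import numpy as np

framerate = 6000                      # matches the file in the question
t = np.arange(framerate) / framerate  # one second of sample times
# Synthetic 8-bit-style signal: a 440 Hz tone centred on 128 (illustrative)
signal = 128 + 50 * np.sin(2 * np.pi * 440 * t)

centred = signal - signal.mean()      # remove the DC offset (the 128 midpoint)
spectrum = np.fft.rfft(centred)       # complex frequency-domain values
freqs = np.fft.rfftfreq(len(centred), d=1.0 / framerate)  # bin frequencies in Hz

# The bin with the largest magnitude should sit at about 440 Hz
dominant = freqs[np.argmax(np.abs(spectrum))]
print(dominant)
```

Each entry of `spectrum` describes how strongly the frequency in the matching entry of `freqs` is present, which is the link between the sample values and frequency that the question asks about.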

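I suggest using numpy to process the audio. For example, to get the array you want, you only need np.fromstring (np.frombuffer in current NumPy versions, where np.fromstring is deprecated for binary data). FFT and related functions are already defined there. Many samples and papers can be found on Google.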
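Putting the pieces together, the following is a rough sketch of one way to low-pass a file with NumPy alone: load the bytes with `np.frombuffer`, zero every FFT bin above the cutoff, and transform back. This "brick-wall" approach is an assumption made for illustration, not something the answer prescribes. The function names `lowpass_fft` and `filter_wav` are hypothetical, and the sketch only handles 8-bit mono files like the one in the question.

```python
import wave
import numpy as np

def lowpass_fft(samples, framerate, cutoff_hz):
    """Zero every FFT bin above cutoff_hz (a crude 'brick-wall' low-pass)."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / framerate)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(samples))

def filter_wav(src, dst, cutoff_hz):
    """Read an 8-bit mono WAV from src, low-pass it, and write it to dst."""
    with wave.open(src, "rb") as w:
        params = w.getparams()
        raw = w.readframes(w.getnframes())
    if params.sampwidth != 1 or params.nchannels != 1:
        raise ValueError("this sketch only handles 8-bit mono files")
    # Unsigned bytes -> floats centred on zero (128 is the silence level)
    samples = np.frombuffer(raw, dtype=np.uint8).astype(np.float64) - 128.0
    filtered = lowpass_fft(samples, params.framerate, cutoff_hz)
    # Back to unsigned bytes, clipping anything that overshoots 0..255
    out = np.clip(np.round(filtered + 128.0), 0, 255).astype(np.uint8)
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(out.tobytes())

# Hypothetical usage (file names are placeholders):
# filter_wav("noisy.wav", "clean.wav", cutoff_hz=1000)
```

A hard cutoff like this can introduce ringing artifacts, so for real noise removal a designed filter (for example a Butterworth filter from `scipy.signal`) may be a better fit. The sketch is only meant to show how the FFT relates to removing high frequencies.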

Answered 2013-07-16T13:15:13.543