c# - How do I read samples from a music file?

Question

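From a programming standpoint, I am just getting started with music editing. I understand a lot of the ideas around waveforms and the like, but I am stuck on how to read individual samples from a sound file as a byte array.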

I am using the Alvas.Audio library ( http://www.alvas.net/alvas.audio.aspx ) and C#, if that helps in answering the question.

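I know that different file formats store their data differently, but my main question is how to determine programmatically how the data is stored and then walk through the file one sample at a time. I will probably convert all files to .wav format (using the Alvas library), so an answer specific to the wav format would be enough, but I am still curious about iterating over samples when the file is stereo. As I understand it, files with stereo data store the parallel samples consecutively.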

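My end goal is to be able to take the samples from some time range of a song (a few seconds somewhere in the song) and then run some math or other processing on them, but I am not sure that the data I am reading in is actually the right data.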

score 3 · Accepted Answer

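See also: What is PCM format?

PCM (Pulse Code Modulation) is an uncompressed audio format. A Wav file is a container that holds PCM data. See the how-to "What is a Wav file?". The method AudioCompressionManager.GetWaveFormat helps to investigate the audio format.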

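FormatTag = 1 means PCM.
Channels = 1 for mono, 2 for stereo, 8 for 7.1 surround sound (left, right, center, left surround, right surround, left rear, and right rear positions, plus 1 low-frequency effects (LFE) channel, which is usually sent to a subwoofer).
SamplesPerSec = the number of samples (digitized values) per second. It can be any value, but the standard ones are 8000 Hz, 11025 Hz, 12000 Hz, 16000 Hz, 22050 Hz, 24000 Hz, 32000 Hz, 44100 Hz, and 48000 Hz.
BitsPerSample = most commonly 8 bits (1 byte) or 16 bits (2 bytes). 24 bits (3 bytes), 32 bits (4 bytes), and 64 bits (8 bytes) are rare. If we take 16 bits as the baseline, 8 bits can be seen as a kind of compressed format: it is half the size, but a sample can take only 2^8 = 256 distinct values instead of 2^16 = 65536. That is why 8-bit audio sounds noticeably worse than 16-bit audio.
BlockAlign = Channels * BitsPerSample / 8, where 8 is the number of bits per byte.
AvgBytesPerSec (the data rate in bytes per second) = Channels * SamplesPerSec * BitsPerSample / 8.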

You can use the code below to analyze the PCM audio format in more detail.

    // Requires: using System; using System.IO; using Alvas.Audio;
    private void WhatIsPcmFormat(string fileName)
    {
        // Read only the format block from the file, then release the reader.
        WaveReader wr = new WaveReader(File.OpenRead(fileName));
        IntPtr format = wr.ReadFormat();
        wr.Close();
        WaveFormat wf = AudioCompressionManager.GetWaveFormat(format);
        // Nothing is printed for non-PCM (compressed) formats.
        if (wf.wFormatTag == AudioCompressionManager.PcmFormatTag)
        {
            int bitsPerByte = 8;
            // The last two values check that the stored BlockAlign and
            // AvgBytesPerSec agree with the formulas given above.
            Console.WriteLine("Channels: {0}, SamplesPerSec: {1}, BitsPerSample: {2}, BlockAlignIsEqual: {3}, BytesPerSecIsEqual: {4}", 
            wf.nChannels, wf.nSamplesPerSec, wf.wBitsPerSample, 
            (wf.nChannels * wf.wBitsPerSample) / bitsPerByte == wf.nBlockAlign, 
            (int)(wf.nChannels * wf.nSamplesPerSec * wf.wBitsPerSample) / bitsPerByte == wf.nAvgBytesPerSec);
        }
    }

score 2 · Accepted Answer

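Assuming you know how to open a file and read data from it, what you need is a reference for the file format. For WAV files, the following description shows how the data is organized and how to access it.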

Offset  Size  Name             Description

The canonical WAVE format starts with the RIFF header:

0         4   ChunkID          Contains the letters "RIFF" in ASCII form
                               (0x52494646 big-endian form).
4         4   ChunkSize        36 + SubChunk2Size, or more precisely:
                               4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
                               This is the size of the rest of the chunk 
                               following this number.  This is the size of the 
                               entire file in bytes minus 8 bytes for the
                               two fields not included in this count:
                               ChunkID and ChunkSize.
8         4   Format           Contains the letters "WAVE"
                               (0x57415645 big-endian form).

The "WAVE" format consists of two subchunks: "fmt " and "data":
The "fmt " subchunk describes the sound data's format:

12        4   Subchunk1ID      Contains the letters "fmt "
                               (0x666d7420 big-endian form).
16        4   Subchunk1Size    16 for PCM.  This is the size of the
                               rest of the Subchunk which follows this number.
20        2   AudioFormat      PCM = 1 (i.e. Linear quantization)
                               Values other than 1 indicate some 
                               form of compression.
22        2   NumChannels      Mono = 1, Stereo = 2, etc.
24        4   SampleRate       8000, 44100, etc.
28        4   ByteRate         == SampleRate * NumChannels * BitsPerSample/8
32        2   BlockAlign       == NumChannels * BitsPerSample/8
                               The number of bytes for one sample including
                               all channels. I wonder what happens when
                               this number isn't an integer?
34        2   BitsPerSample    8 bits = 8, 16 bits = 16, etc.
          2   ExtraParamSize   if PCM, then doesn't exist
          X   ExtraParams      space for extra parameters

The "data" subchunk contains the size of the data and the actual sound:

36        4   Subchunk2ID      Contains the letters "data"
                               (0x64617461 big-endian form).
40        4   Subchunk2Size    == NumSamples * NumChannels * BitsPerSample/8
                               This is the number of bytes in the data.
                               You can also think of this as the size
                               of the rest of the subchunk following this 
                               number.
44        *   Data             The actual sound data.

Update: added the data inline.
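
As a rough illustration of the layout above, here is a minimal sketch that reads the header fields with the standard BinaryReader instead of Alvas.Audio. The names WavInfo and WavHeader are hypothetical helpers made up for this example. It assumes a seekable stream, and rather than trusting the fixed offset 44 it walks the chunks until it finds "data", because real files can contain additional chunks between "fmt " and "data".

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical container for the header fields; not part of any library.
public sealed class WavInfo
{
    public ushort AudioFormat;   // 1 = PCM
    public ushort NumChannels;
    public uint SampleRate;
    public uint ByteRate;
    public ushort BlockAlign;
    public ushort BitsPerSample;
    public long DataOffset;      // stream position of the first sample byte
    public uint DataSize;        // Subchunk2Size, in bytes
}

public static class WavHeader
{
    // Assumes a seekable stream positioned at the start of the file.
    public static WavInfo Read(Stream stream)
    {
        // BinaryReader reads integers as little-endian, which is what RIFF uses.
        var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Missing RIFF tag.");
        reader.ReadUInt32();     // ChunkSize, not needed here
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Missing WAVE tag.");

        var info = new WavInfo();
        bool haveFormat = false;
        while (true)             // ReadTag throws at end of stream, ending the loop
        {
            string id = ReadTag(reader);
            uint size = reader.ReadUInt32();
            if (id == "fmt ")
            {
                if (size < 16) throw new InvalidDataException("fmt chunk too small.");
                info.AudioFormat = reader.ReadUInt16();
                info.NumChannels = reader.ReadUInt16();
                info.SampleRate = reader.ReadUInt32();
                info.ByteRate = reader.ReadUInt32();
                info.BlockAlign = reader.ReadUInt16();
                info.BitsPerSample = reader.ReadUInt16();
                // Skip ExtraParamSize/ExtraParams (absent for plain PCM) and any pad byte.
                stream.Seek(size - 16 + (size & 1), SeekOrigin.Current);
                haveFormat = true;
            }
            else if (id == "data")
            {
                if (!haveFormat) throw new InvalidDataException("data chunk before fmt chunk.");
                info.DataOffset = stream.Position;
                info.DataSize = size;
                return info;
            }
            else
            {
                // Unknown chunk: skip it. Chunks are padded to an even length.
                stream.Seek(size + (size & 1), SeekOrigin.Current);
            }
        }
    }

    private static string ReadTag(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length < 4) throw new EndOfStreamException("Unexpected end of WAV file.");
        return Encoding.ASCII.GetString(bytes);
    }
}
```

For a canonical file, WavHeader.Read(File.OpenRead(path)) should report DataOffset = 44; the sample bytes then start at that position and run for DataSize bytes.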

score 1 · Accepted Answer

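The most common way of "packing" audio data is PCM, which is what uncompressed WAV files use. Each sample is "packed" into a short integer value (short), so if you have a library that can hand you PCM data, you can get at the samples by treating the data as an array of short values.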

The number of short values per sample equals the number of channels. Since each short is 2 bytes, stereo audio typically takes 4 bytes per sample.
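
The interleaving described above can be sketched as follows. This is only an illustration under the assumption of 16-bit little-endian PCM; PcmSamples and Deinterleave16 are hypothetical names invented for the example, not part of Alvas.Audio. For stereo, channel 0 is the left channel and channel 1 the right.

```csharp
using System;

public static class PcmSamples
{
    // Splits interleaved 16-bit little-endian PCM bytes into one short[] per channel.
    public static short[][] Deinterleave16(byte[] data, int channels)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (channels < 1) throw new ArgumentOutOfRangeException(nameof(channels));

        int bytesPerFrame = channels * 2;          // one short per channel
        int frames = data.Length / bytesPerFrame;  // a trailing partial frame is ignored

        var result = new short[channels][];
        for (int c = 0; c < channels; c++)
        {
            result[c] = new short[frames];
        }

        for (int f = 0; f < frames; f++)
        {
            for (int c = 0; c < channels; c++)
            {
                int i = f * bytesPerFrame + c * 2;
                // Low byte first, then high byte; the cast keeps the sign bit.
                result[c][f] = unchecked((short)(data[i] | (data[i + 1] << 8)));
            }
        }
        return result;
    }
}
```

With a stereo buffer, Deinterleave16(buffer, 2)[0] holds the left-channel samples and [1] the right-channel samples, each in playback order.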

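So, for example, to access the audio data at position 1.0 s in the audio file, you have to skip 44100*4 bytes of sample data, assuming 16-bit stereo audio sampled at 44100 Hz (the most common sample rate, taken from CD).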
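
A small sketch of that calculation, with hypothetical names (PcmSeek, ByteOffsetForTime). It converts the time to a whole number of sample frames first and only then multiplies by the frame size, so the result always lands on a frame boundary and never in the middle of a sample. The returned offset is relative to the start of the sample data, so it has to be added to the position where the data begins (byte 44 in a canonical WAV file).

```csharp
using System;

public static class PcmSeek
{
    // Returns the byte offset, relative to the start of the sample data,
    // of the frame at the given time. blockAlign is the number of bytes
    // per frame (all channels), e.g. 4 for 16-bit stereo.
    public static long ByteOffsetForTime(double seconds, int sampleRate, int blockAlign)
    {
        if (seconds < 0) throw new ArgumentOutOfRangeException(nameof(seconds));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        if (blockAlign <= 0) throw new ArgumentOutOfRangeException(nameof(blockAlign));

        long frame = (long)(seconds * sampleRate);  // whole frames before this time
        return frame * blockAlign;
    }
}
```

For the example in the text, PcmSeek.ByteOffsetForTime(1.0, 44100, 4) gives 176400, which is 44100*4. To grab a few seconds of a song, compute the offsets for the start and end times and read the bytes in between.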
