c# - XAudio2 - Cracking output when using a dynamic buffer

Question

To provide a little context: I am trying to output live audio from a camera in my C# application. After some research, doing it in a managed C++ DLL seemed like the obvious choice. I chose the XAudio2 API because it should be fairly easy to implement and use with dynamic audio content.

So the idea is to create the XAudio2 device in C++ with an empty buffer and push the audio in from the C# side. The audio chunks are pushed every 50 ms because I want to keep the latency as small as possible.

// SampleRate = 44100; Channels = 2; BitsPerSample = 16; Time = 50 (ms);
var blockAlign = (Channels * BitsPerSample) / 8;
var avgBytesPerSecond = SampleRate * blockAlign;
var avgBytesPerMillisecond = avgBytesPerSecond / 1000;
var bufferSize = avgBytesPerMillisecond * Time;
_sampleBuffer = new byte[bufferSize];
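
As a worked check of the arithmetic above, here is a small C++ sketch. The function names are made up for illustration and are not part of the original code. With the values from the comment and a 50 ms chunk, dividing by 1000 before multiplying truncates 176.4 bytes/ms to 176 and yields 8800 bytes, while counting whole sample frames first yields 8820 bytes. Whether that 20-byte difference matters depends on how FillBuffer sources its data, which is not shown here.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for `ms` milliseconds of PCM audio.
// Counts whole sample frames first, so the result is always frame-aligned.
constexpr std::uint32_t PcmBytesForMs(std::uint32_t sampleRate, std::uint32_t channels,
                                      std::uint32_t bitsPerSample, std::uint32_t ms)
{
    const std::uint32_t blockAlign = channels * bitsPerSample / 8;   // bytes per frame
    const std::uint64_t frames = static_cast<std::uint64_t>(sampleRate) * ms / 1000;
    return static_cast<std::uint32_t>(frames * blockAlign);
}

// Same order of operations as the snippet above: divide by 1000, then multiply.
constexpr std::uint32_t TruncatedBytesForMs(std::uint32_t sampleRate, std::uint32_t channels,
                                            std::uint32_t bitsPerSample, std::uint32_t ms)
{
    const std::uint32_t blockAlign = channels * bitsPerSample / 8;
    const std::uint32_t bytesPerMs = sampleRate * blockAlign / 1000; // integer division
    return bytesPerMs * ms;
}

static_assert(PcmBytesForMs(44100, 2, 16, 50) == 8820, "2205 frames * 4 bytes");
static_assert(TruncatedBytesForMs(44100, 2, 16, 50) == 8800, "176 bytes/ms * 50 ms");
```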

Every time the timer fires, it gets the pointer to the audio buffer, reads the audio data, copies it to that pointer, and calls the PushAudio method. I also use a stopwatch to measure how long the processing took and recalculate the timer interval to account for the processing time.

private void PushAudioChunk(object sender, ElapsedEventArgs e)
{
    unsafe
    {
        _pushAudioStopWatch.Reset();
        _pushAudioStopWatch.Start();

        var audioBufferPtr = Output.AudioCapturerBuffer();
        FillBuffer(_sampleBuffer);
        Marshal.Copy(_sampleBuffer, 0, (IntPtr)audioBufferPtr, _sampleBuffer.Length);

        Output.PushAudio();

        _pushTimer.Interval = Time - _pushAudioStopWatch.ElapsedMilliseconds;
        _pushAudioStopWatch.Stop();
        DIX.Log.WriteLine("Push audio took: {0}ms", _pushAudioStopWatch.ElapsedMilliseconds);                
    }
}
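
One detail of the interval calculation above: if a tick ever takes as long as the period, the subtraction reaches zero or goes negative. Assuming _pushTimer is a System.Timers.Timer (which the ElapsedEventArgs parameter suggests), its Interval property rejects values less than or equal to zero. A minimal sketch of a clamped calculation, written in C++ to match the native side; NextIntervalMs is a hypothetical name, and the C# equivalent would be a Math.Max around the same expression:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: delay until the next tick, given the nominal period
// and the time the current tick took. Clamped so it never drops below 1 ms.
constexpr std::int64_t NextIntervalMs(std::int64_t periodMs, std::int64_t elapsedMs)
{
    return std::max<std::int64_t>(1, periodMs - elapsedMs);
}
```

This only protects the timer from an invalid interval. It does not recover the time lost by a slow tick, so some drift against the audio clock remains possible.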

This is the implementation of the C++ part.

Following the documentation on MSDN, I created an XAudio2 device and added the mastering voice and source voice. The buffer is empty at first because the C# part is responsible for pushing in the audio data.

namespace Audio
{
    using namespace System;

    template <class T> void SafeRelease(T **ppT)
    {
        if (*ppT)
        {
            (*ppT)->Release();
            *ppT = NULL;
        }
    }

    WAVEFORMATEXTENSIBLE wFormat;

    XAUDIO2_BUFFER buffer = { 0 };

    IXAudio2* pXAudio2 = NULL;
    IXAudio2MasteringVoice* pMasterVoice = NULL;
    IXAudio2SourceVoice* pSourceVoice = NULL;           

    WaveOut::WaveOut(int bufferSize)
    {
        audioBuffer = new Byte[bufferSize];

        wFormat.Format.wFormatTag = WAVE_FORMAT_PCM;
        wFormat.Format.nChannels = 2;
        wFormat.Format.nSamplesPerSec = 44100;
        wFormat.Format.wBitsPerSample = 16;
        wFormat.Format.nBlockAlign = (wFormat.Format.nChannels * wFormat.Format.wBitsPerSample) / 8;
        wFormat.Format.nAvgBytesPerSec = wFormat.Format.nSamplesPerSec * wFormat.Format.nBlockAlign;
        wFormat.Format.cbSize = 0;
        wFormat.SubFormat = KSDATAFORMAT_SUBTYPE_PCM;

        HRESULT hr = XAudio2Create(&pXAudio2, 0, XAUDIO2_DEFAULT_PROCESSOR);

        if (SUCCEEDED(hr))
        {
            hr = pXAudio2->CreateMasteringVoice(&pMasterVoice);
        }

        if (SUCCEEDED(hr))
        {
            hr = pXAudio2->CreateSourceVoice(&pSourceVoice, (WAVEFORMATEX*)&wFormat,
                0, XAUDIO2_DEFAULT_FREQ_RATIO, NULL, NULL, NULL);
        }

        buffer.pAudioData = (BYTE*)audioBuffer;
        buffer.AudioBytes = bufferSize;
        buffer.Flags = 0;

        if (SUCCEEDED(hr))
        {
            hr = pSourceVoice->Start(0);
        }
    }

    WaveOut::~WaveOut()
    {

    }

    WaveOut^ WaveOut::CreateWaveOut(int bufferSize)
    {
        return gcnew WaveOut(bufferSize);
    }

    uint8_t* WaveOut::AudioCapturerBuffer()
    {
        if (!audioBuffer)
        {
            throw gcnew Exception("Audio buffer is not initialized. Did you forget to set up the audio container?");
        }

        return (BYTE*)audioBuffer;
    }

    int WaveOut::PushAudio()
    {
        HRESULT hr = pSourceVoice->SubmitSourceBuffer(&buffer);

        if (FAILED(hr))
        {
            return -1;
        }

        return 0;
    }
}

The problem I am facing is that there is always some cracking in the output. I tried increasing the timer interval and increasing the buffer size a bit. The result is the same every time.

What am I doing wrong?

Update:

I created three buffers that the XAudio2 engine can cycle through. The cracking went away. The missing part now is filling the buffers at the right time from the C# side, so that the same data is not submitted twice.

void Render(void* param)
{
    std::vector<byte> audioBuffers[BUFFER_COUNT];
    size_t currentBuffer = 0;

    // Get the current state of the source voice
    while (BackgroundThreadRunning && pSourceVoice)
    {
        if (pSourceVoice)
        {
            pSourceVoice->GetState(&state);
        }

        while (state.BuffersQueued < BUFFER_COUNT)
        {
            std::vector<byte> resultData;
            resultData.resize(DATA_SIZE);
            CopyMemory(&resultData[0], pAudioBuffer, DATA_SIZE);

            // Store the copy in the current slot so it stays valid while XAudio2 plays it
            audioBuffers[currentBuffer] = resultData;

            // Submit the new buffer
            XAUDIO2_BUFFER buf = { 0 };
            buf.AudioBytes = static_cast<UINT32>(audioBuffers[currentBuffer].size());
            buf.pAudioData = &audioBuffers[currentBuffer][0];

            pSourceVoice->SubmitSourceBuffer(&buf);

            // Advance the buffer index
            currentBuffer = (currentBuffer + 1) % BUFFER_COUNT;

            // Get the updated state
            pSourceVoice->GetState(&state);
        }

        Sleep(30);
    }
}
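
One possible way to handle the remaining hand-off, sketched under assumptions: instead of the render thread copying from a single shared pAudioBuffer whenever a slot is free, the push path enqueues each new chunk, and the render thread submits only chunks it actually dequeues. ChunkQueue and its methods are hypothetical names, not part of XAudio2 or of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper, not part of XAudio2: a thread-safe FIFO of audio chunks.
class ChunkQueue
{
public:
    // Producer side (e.g. the path called from C#): enqueue a private copy.
    void Push(const void* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        const auto* bytes = static_cast<const std::uint8_t*>(data);
        std::vector<std::uint8_t> chunk(bytes, bytes + size);
        std::lock_guard<std::mutex> lock(mutex_);
        chunks_.push_back(std::move(chunk));
    }

    // Consumer side (e.g. the render thread): take the oldest chunk, if any.
    // Returns false when nothing new has arrived, so old data is never reused.
    bool TryPop(std::vector<std::uint8_t>& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (chunks_.empty()) return false;
        out = std::move(chunks_.front());
        chunks_.pop_front();
        return true;
    }

    std::size_t Size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return chunks_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<std::vector<std::uint8_t>> chunks_;
};
```

In the Render loop, the CopyMemory from pAudioBuffer could then become a TryPop into audioBuffers[currentBuffer], leaving the inner loop when it returns false. The queue is unbounded in this sketch; a real implementation would probably cap it to keep latency from growing.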

score 1 · Accepted Answer

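XAudio2 does not copy the source data when you submit it via SubmitSourceBuffer. You have to keep that data (which lives in your application's memory) valid, and the buffer allocated, for the entire time XAudio2 needs to read from it. This is done for efficiency, to avoid an extra copy, but it puts the multi-threading burden on you: the memory must remain available until it has finished playing. It also means you cannot modify a buffer while it is playing.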

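Your current code just reuses the same buffer, which causes popping because you change the data while it is being played. You can fix this by rotating between two or three buffers. The XAudio2 source voice exposes state information you can use to determine when it has finished playing a buffer, or you can register explicit callbacks that tell you when a buffer is no longer in use.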
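
A minimal, platform-independent sketch of the rotation idea, assuming buffers finish in the order they were submitted. BufferRing, Write, and OnBufferEnd here are hypothetical names on a helper class, not XAudio2 APIs. In real code, the pointer returned by Write would go into XAUDIO2_BUFFER::pAudioData before calling SubmitSourceBuffer, and the helper's OnBufferEnd would be called from IXAudio2VoiceCallback::OnBufferEnd, or after observing that XAUDIO2_VOICE_STATE::BuffersQueued has dropped.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: N fixed buffers, with a count of how many the voice
// may still be reading. Intended for one writer thread and one notifier thread.
template <std::size_t N>
class BufferRing
{
public:
    explicit BufferRing(std::size_t bytesPerBuffer)
    {
        for (auto& slot : slots_) slot.resize(bytesPerBuffer);
    }

    // Copy `size` bytes into the next free slot and return a pointer to it.
    // Returns nullptr if every slot is still queued or the data does not fit,
    // so a buffer that is still playing is never overwritten.
    const std::uint8_t* Write(const void* data, std::size_t size)
    {
        if (inFlight_.load() == N || size > slots_[next_].size()) return nullptr;
        std::memcpy(slots_[next_].data(), data, size);
        const std::uint8_t* submitted = slots_[next_].data();
        next_ = (next_ + 1) % N;
        inFlight_.fetch_add(1);
        return submitted;
    }

    // Call once each time the voice reports that its oldest buffer has finished.
    void OnBufferEnd()
    {
        assert(inFlight_.load() > 0);
        inFlight_.fetch_sub(1);
    }

    std::size_t InFlight() const { return inFlight_.load(); }

private:
    std::array<std::vector<std::uint8_t>, N> slots_;
    std::size_t next_ = 0;                  // touched only by the writer thread
    std::atomic<std::size_t> inFlight_{0};  // shared with the notifier thread
};
```

When Write returns nullptr, the caller has to decide whether to wait or to drop the chunk; for live audio, dropping usually keeps latency bounded. If the subsequent submit call fails, the slot should be released again with OnBufferEnd.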

For examples of using XAudio2, see DirectX Tool Kit for Audio and the classic XAudio2 samples.
