c# - Copying an unmanaged System.IntPtr byte vector into a GPU row of a 2D device byte array

Question

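I am using C# with CUDAfy.NET (yes, this problem would be easier in straight C with pointers, but given the larger system I have reasons for this approach).

I have a video frame grabber card that collects byte[1024 x 1024] image data at 30 FPS. Every 33.3 ms it fills one slot of a circular buffer and returns a System.IntPtr that points to an unmanaged 1D vector of bytes (a byte*). The circular buffer has 15 slots.

On the GPU device (Tesla K40) I want a global array organized as a dense 2D array. That is, I want something like a circular queue, but laid out on the GPU as a dense 2D array.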

byte[15, 1024*1024] rawdata; 
// if CUDAfy.NET supported jagged arrays I could use byte[15][1024*1024], but it does not

How do I fill in a different row every 33 ms? Do I use something like:

gpu.CopyToDevice<byte>(inputPtr, 0, rawdata, offset, length) // length = 1024*1024
// offset is computed as rowID*(1024*1024), where rowID wraps to 0 via modulo 15.
// inputPtr is the System.IntPtr that points to the (unmanaged) buffer in the circular queue?
// rawdata is a device buffer allocated with gpu.Allocate<byte>(1024*1024);
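
The offset arithmetic described in the comments above can be sketched in plain C#. `RingOffsets` and `RowOffset` are hypothetical names chosen for illustration and are not part of CUDAfy.NET; the sketch only assumes the 15 slots and 1024*1024 frame size stated in the question. For byte data an element offset and a byte offset coincide.

```csharp
using System;

// Hypothetical helper (not part of CUDAfy.NET): maps a running frame counter
// to the element offset of its row in a dense [slots, frameSize] buffer.
public static class RingOffsets
{
    public const int Slots = 15;              // slots in the circular buffer
    public const int FrameSize = 1024 * 1024; // bytes per frame

    public static int RowOffset(long frameIndex, int slots = Slots, int frameSize = FrameSize)
    {
        if (frameIndex < 0) throw new ArgumentOutOfRangeException(nameof(frameIndex));
        int rowId = (int)(frameIndex % slots); // wraps back to 0 after the last slot
        return checked(rowId * frameSize);     // throws instead of silently overflowing
    }
}
```

With these defaults the largest offset is 14 * 1024 * 1024, which fits comfortably in an int; the `checked` multiplication only matters if the frame size or slot count grows.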

My kernel's signature is:

[Cudafy]
public static void filter(GThread thread, byte[,] rawdata, int frameSize, byte[] result)

I did try something along these lines, but CUDAfy has no API overload of the form:

GPGPU.CopyToDevice(T) Method (IntPtr, Int32, T[,], Int32, Int32, Int32)

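So I used the gpu.Cast function to change the 2D device array to 1D.

I tried the code below, but I get a CUDA.NET exception: ErrorLaunchFailed

FYI: when I try the CUDA emulator, it aborts on CopyToDevice, claiming that the data is not host allocated.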

public static byte[] process(System.IntPtr data, int slot)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    byte[] output = new byte[FrameSize];
    int offset = slot*FrameSize;
    gpu.Lock();
    byte[] rawdata = gpu.Cast<byte>(grawdata, FrameSize); // What is the size supposed to be? Documentation lacking
    gpu.CopyToDevice<byte>(data, 0, rawdata, offset, FrameSize * frameCount);
    byte[] goutput = gpu.Allocate<byte>(output);
    gpu.Launch(height, width).filter(rawdata, FrameSize, goutput);
    runTime = watch.Elapsed.ToString();
    gpu.CopyFromDevice(goutput, output);
    gpu.Free(goutput);
    gpu.Synchronize();
    gpu.Unlock();
    watch.Stop();
    totalRunTime = watch.Elapsed.ToString();
    return output;
}

score 1 · Accepted Answer

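For now I am proposing this "solution": 1. Run the program only in native mode (not in emulation mode), or 2. Do not handle the pinned memory allocation yourself.

There appears to be an open issue about this, but it only happens in emulation mode.

See: https://cudafy.codeplex.com/workitem/636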

score 1 · Accepted Answer

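If I understand your question correctly, I think you are looking to convert the byte* data you get from the circular buffer into a multidimensional byte array to send to the graphics card API.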

            int slots = 15;
            int rows = 1024;
            int columns = 1024;

//Try this
            for (int currentSlot = 0; currentSlot < slots; currentSlot++)
            {
                IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);
                // use Marshal.Copy ?  
                byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory); 

                int offset =0;
                for (int m = 0; m < rows; m++)
                    for (int n = 0; n < columns; n++)
                    {
                        //then send this to your GPU method
                        rawForGpu[m, n] = ReadByteValue(IntPtr: intPtrToUnManagedMemory, 
                                                        offset++);
                    }
            }

//or try this
            for (int currentSlot = 0; currentSlot < slots; currentSlot++)
            {
                IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);

                // use Marshal.Copy ?
                byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory); 

                byte[,] rawForGpu = ConvertTo2DArray(byteData, rows, columns);
            }
        }

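        private static byte[,] ConvertTo2DArray(byte[] byteArr, int rows, int columns)
        {
            byte[,] data = new byte[rows, columns];
            int totalElements = rows * columns;
            // Convert 1D to 2D (rows, columns): a byte[,] is stored row-major and
            // contiguously, so a single block copy fills it row by row.
            Buffer.BlockCopy(byteArr, 0, data, 0, totalElements);
            return data;
        }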

        private static IntPtr CopyContextFrom(int slotNumber)
        {
            //code that return byte* from circular buffer.
            return IntPtr.Zero;
        }
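
The "use Marshal.Copy ?" comments above can be made concrete with a minimal host-side sketch. It assumes the frame size is known to the caller, so `CopyIntPtrToByteArray` takes an extra `frameSize` parameter that the call in the answer does not show. `FrameStaging` and `StoreFrame` are hypothetical names, and nothing here touches the GPU or CUDAfy.NET.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical host-side staging helpers (names are illustrative only).
public static class FrameStaging
{
    // Copies frameSize bytes from unmanaged memory into a new managed array.
    public static byte[] CopyIntPtrToByteArray(IntPtr source, int frameSize)
    {
        if (source == IntPtr.Zero) throw new ArgumentNullException(nameof(source));
        byte[] managed = new byte[frameSize];
        Marshal.Copy(source, managed, 0, frameSize);
        return managed;
    }

    // Writes one frame into row `slot` of a dense [slots, frameSize] array.
    public static void StoreFrame(byte[] frame, byte[,] ring, int slot)
    {
        int slots = ring.GetLength(0);
        int frameSize = ring.GetLength(1);
        if (slot < 0 || slot >= slots) throw new ArgumentOutOfRangeException(nameof(slot));
        if (frame.Length != frameSize) throw new ArgumentException("Frame size mismatch.", nameof(frame));
        // Buffer.BlockCopy offsets are in bytes; rows of a byte[,] are contiguous.
        Buffer.BlockCopy(frame, 0, ring, slot * frameSize, frameSize);
    }
}
```

This route adds an extra host-side copy per frame, so it is mainly useful when a managed array is required anyway; copying directly from the IntPtr to the device avoids that step.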

score 0 · Accepted Answer

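You should consider using the built-in GPGPU async functionality, such as gpuKern.LaunchAsync(...).

See http://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU for an efficient way to use this feature. Another good example can be found in the CudafyExamples project; look for PinnedAsyncIO.cs. It has everything you need to do what you are describing.

This is in CudaGPU.cs in the Cudafy.Host project, and it matches the method you are looking for (only it is async):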

public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, DevicePtrEx devArray,
                                  int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[, ,] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[,] devArray,
                                  int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[] devArray,
                                  int devOffset, int count, int streamId = 0) where T : struct;
