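I'm currently trying to connect some form of output from a CUDA program to a GL_TEXTURE_2D for use in rendering. I'm not particular about the output type on the CUDA side (whether it's an array or a surface, I can adapt the program to it).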

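So the question is, how would I do this? (My current code copies the output array to system memory and then uploads it to the GPU again using GL.TexImage2D, which is obviously very inefficient - when I disable those two pieces of code, it goes from roughly 300 kernel executions per second to as many as 400.)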

I already have some test code that at least binds a GL texture to CUDA, but I can't even get the device pointer from it...

ctx = CudaContext.CreateOpenGLContext(CudaContext.GetMaxGflopsDeviceId(), CUCtxFlags.SchedAuto);

uint textureID = (uint)GL.GenTexture(); //create a texture in GL
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, width, height, 0, OpenTK.Graphics.OpenGL.PixelFormat.Rgba, PixelType.UnsignedByte, null); //allocate memory for the texture in GL

CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource(textureID, CUGraphicsRegisterFlags.WriteDiscard, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_2D, CUGraphicsMapResourceFlags.WriteDiscard); //using writediscard because the CUDA kernel will only write to this texture

//then, as far as I understood the ManagedCuda example, I have to do the following when I call my kernel
//(done without a CudaGraphicsInteropResourceCollection because I only have one item)
resultImage.Map();
var ptr = resultImage.GetMappedPointer(); //this crashes
kernelSample.Run(ptr); //pass the pointer to the kernel so it knows where to write
resultImage.UnMap();

The following exception is thrown when trying to get the pointer:

ErrorNotMappedAsPointer: This indicates that a mapped resource is not available for access as a pointer.

What do I need to do to fix this?

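And even once this exception is resolved, how would I solve the other part of my question; that is, how do I work with the acquired pointer in my kernel? Can I use a surface for that? Or access it as an arbitrary array (pointer arithmetic)?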

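Edit: Looking at this example, apparently I don't even need to map the resource every time I call the kernel and then call the render function. But how would this translate to ManagedCuda?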

1 Answer

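Thanks to the example I found, I was able to translate this to ManagedCuda (after browsing the source code and fiddling around), and I'm happy to announce that this really does improve my samples per second from about 300 to 400 :)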

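Apparently a 3D array is required (I haven't seen any overloads in ManagedCuda that take 2D arrays), but this doesn't really matter - I just use a 3D array/texture that is exactly 1 deep.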

id = GL.GenTexture();
GL.BindTexture(TextureTarget.Texture3D, id);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage3D(TextureTarget.Texture3D, 0, PixelInternalFormat.Rgba, width, height, 1, 0, OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, IntPtr.Zero); //allocate memory for the texture but do not upload anything

CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource((uint)id, CUGraphicsRegisterFlags.SurfaceLDST, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_3D, CUGraphicsMapResourceFlags.WriteDiscard);
resultImage.Map();
CudaArray3D mappedArray = resultImage.GetMappedArray3D(0, 0);
resultImage.UnMap();

CudaSurface surfaceResult = new CudaSurface(kernelSample, "outputSurface", CUSurfRefSetFlags.None, mappedArray); //nothing needs to be done anymore - this call connects the 3D array from the GL texture to a surface reference in the kernel
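
For readers who know the CUDA runtime API better than ManagedCuda, the register/map/get-array sequence above corresponds roughly to the sketch below. A GL texture is backed by an opaque array rather than linear memory, so the mapped resource is requested as an array and not as a device pointer; this is why asking for a pointer fails with ErrorNotMappedAsPointer. The helper name `mapGlTexture` and its parameters are made up for illustration, and the code assumes a GL context is already current on the calling thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>  // also pulls in the platform GL header (GLuint, GLenum)

// Hypothetical helper: registers an existing GL texture with CUDA, maps it,
// and returns the cudaArray backing mip level 0 (nullptr on failure).
// On success the resource is left mapped and handed back via resourceOut.
cudaArray_t mapGlTexture(GLuint textureId, GLenum target,
                         cudaGraphicsResource_t* resourceOut)
{
    cudaGraphicsResource_t resource = nullptr;

    // SurfaceLoadStore lets kernels write to the array through a surface.
    cudaError_t err = cudaGraphicsGLRegisterImage(
        &resource, textureId, target,
        cudaGraphicsRegisterFlagsSurfaceLoadStore);
    if (err != cudaSuccess) {
        fprintf(stderr, "register failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }

    err = cudaGraphicsMapResources(1, &resource, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "map failed: %s\n", cudaGetErrorString(err));
        cudaGraphicsUnregisterResource(resource);
        return nullptr;
    }

    // Textures are exposed as arrays: array index 0, mip level 0.
    cudaArray_t array = nullptr;
    err = cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "get array failed: %s\n", cudaGetErrorString(err));
        cudaGraphicsUnmapResources(1, &resource, 0);
        cudaGraphicsUnregisterResource(resource);
        return nullptr;
    }

    *resourceOut = resource;
    return array;
}
```

The caller would pair this with `cudaGraphicsUnmapResources(1, &resource, 0)` before OpenGL samples the texture. The CUDA documentation describes a resource as accessible to CUDA only while it is mapped, so the map-once-then-unmap pattern in the C# code above is what worked in my case rather than something the API promises.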

Kernel code:

surface<void, cudaSurfaceType3D> outputSurface;

__global__ void Sample() {
    ...
    surf3Dwrite(output, outputSurface, pixelX, pixelY, 0);
}
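
A slightly fuller sketch of such a kernel is shown below, mainly to illustrate one detail that is easy to miss: the surface write functions take the x coordinate in bytes, not in texels, so it has to be scaled by the texel size. The `width` and `height` parameters, the `uchar4` texel type (assuming the 8-bit RGBA texture allocated above), and the gradient colour are assumptions for illustration, not part of my actual kernel. Surface references like this one are a legacy API and only compile with toolkits older than CUDA 12.

```cuda
#include <cuda_runtime.h>

// Legacy surface reference; the host binds the mapped array to it by name.
surface<void, cudaSurfaceType3D> outputSurface;

// extern "C" keeps the kernel name unmangled so the host can look it up.
// width/height are hypothetical parameters used only for bounds checking.
extern "C" __global__ void Sample(int width, int height)
{
    const int pixelX = blockIdx.x * blockDim.x + threadIdx.x;
    const int pixelY = blockIdx.y * blockDim.y + threadIdx.y;
    if (pixelX >= width || pixelY >= height) return;

    // Placeholder colour: a simple gradient instead of the real sample.
    const uchar4 output = make_uchar4(
        (unsigned char)(255 * pixelX / width),
        (unsigned char)(255 * pixelY / height),
        0, 255);

    // x is a byte offset within the row; y and z are plain indices.
    // z = 0 because the texture is exactly one slice deep.
    surf3Dwrite(output, outputSurface,
                pixelX * (int)sizeof(uchar4), pixelY, 0);
}
```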
answered 2016-03-27T13:07:52.853