memory - How to use constant memory on multiple CUDA devices

Translated from: https://stackoverflow.com/questions/13401073 2012-11-15T15:46:13.527


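I'm trying to use the two CUDA devices I have (right now I'm working with a GeForce GTX 690, which is actually two separate devices), and my program uses constant memory to transfer data to the device.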

I understand clearly how to handle global memory so that it can be used on both devices:

//working with device 0
cudaSetDevice(0);
void* mem_on_dev_0 = NULL;
cudaMalloc(&mem_on_dev_0, ...);
cudaMemcpy(mem_on_dev_0, mem_on_host, ...);
kernel_call<<<...>>>(mem_on_dev_0);

//working with device 1
cudaSetDevice(1);
void* mem_on_dev_1 = NULL;
cudaMalloc(&mem_on_dev_1, ...);
cudaMemcpy(mem_on_dev_1, mem_on_host, ...);
kernel_call<<<...>>>(mem_on_dev_1);
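A minimal sketch of the global-memory pattern above, generalized to every visible device with error checking. The kernel `scale_kernel`, the helper `run_on_all_devices` and the `CUDA_CHECK` macro are hypothetical names for illustration. The runtime calls (`cudaGetDeviceCount`, `cudaSetDevice`, `cudaMalloc`, `cudaMemcpy`, `cudaFree`) are the standard CUDA runtime API.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line and the runtime's error string if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: doubles every element in place.
__global__ void scale_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Copies host_in to every device, runs scale_kernel on each one and stores
// device d's result in results[d]. Returns the number of devices used.
int run_on_all_devices(const std::vector<float>& host_in,
                       std::vector<std::vector<float>>& results)
{
    assert(!host_in.empty());  // a zero-block grid is an invalid launch
    int device_count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&device_count));
    const int n = static_cast<int>(host_in.size());
    const size_t bytes = n * sizeof(float);

    std::vector<float*> dev_ptrs(device_count, nullptr);
    for (int d = 0; d < device_count; ++d) {
        CUDA_CHECK(cudaSetDevice(d));                 // following calls target device d
        CUDA_CHECK(cudaMalloc(&dev_ptrs[d], bytes));  // buffer lives on device d
        CUDA_CHECK(cudaMemcpy(dev_ptrs[d], host_in.data(), bytes,
                              cudaMemcpyHostToDevice));
        scale_kernel<<<(n + 255) / 256, 256>>>(dev_ptrs[d], n);  // asynchronous
        CUDA_CHECK(cudaGetLastError());
    }

    results.assign(device_count, std::vector<float>(n));
    for (int d = 0; d < device_count; ++d) {
        CUDA_CHECK(cudaSetDevice(d));
        // Blocking copy: waits for device d's kernel on the default stream.
        CUDA_CHECK(cudaMemcpy(results[d].data(), dev_ptrs[d], bytes,
                              cudaMemcpyDeviceToHost));
        CUDA_CHECK(cudaFree(dev_ptrs[d]));
    }
    return device_count;
}
```

Launching all kernels in the first loop and collecting results in a second loop lets the devices run concurrently, since kernel launches return to the host immediately.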

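But with constant memory, the usual approach is to declare the constant variable somewhere in the file and then access it through the "symbol" functions: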

// Which device is this memory on?
__device__ __constant__ float g_const_memory[CONST_MEM_SIZE];

// dev_func can be launched on any device
__global__ void dev_func()
{
    //using const memory
    float f = g_const_memory[const_index];
}

void host_func()
{
    //cudaSetDevice(0);  // does this make sense?
    cudaMemcpyToSymbol(g_const_memory, host_mem, ...);
    dev_func<<<...>>>();
}
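A minimal sketch, assuming the CUDA runtime behavior that a file-scope `__constant__` variable has a separate instance on every device and that `cudaMemcpyToSymbol` writes to the instance on the device currently selected with `cudaSetDevice`. Under that assumption, the `cudaSetDevice` + `cudaMemcpyToSymbol` pair is simply repeated once per device, just like the `cudaSetDevice` + `cudaMemcpy` pair for global memory. `kConstSize`, `read_const` and `fill_const_per_device` are hypothetical names. `kConstSize` stands in for the unspecified `CONST_MEM_SIZE`, and `read_const` copies constant memory back out only so the host can check what each device sees.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line and the runtime's error string if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

constexpr int kConstSize = 4;  // hypothetical stand-in for CONST_MEM_SIZE

// Assumed: each device holds its own instance of this array.
__constant__ float g_const_memory[kConstSize];

// Copies the current device's constant memory into global memory.
__global__ void read_const(float* out)
{
    int i = threadIdx.x;
    if (i < kConstSize) out[i] = g_const_memory[i];
}

// Uploads per_device_values[d] into device d's g_const_memory, then reads
// it back through read_const. Returns what each device saw.
std::vector<std::vector<float>> fill_const_per_device(
    const std::vector<std::vector<float>>& per_device_values)
{
    int device_count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&device_count));
    assert(static_cast<int>(per_device_values.size()) >= device_count);

    const size_t bytes = kConstSize * sizeof(float);
    std::vector<std::vector<float>> seen(device_count,
                                         std::vector<float>(kConstSize));
    for (int d = 0; d < device_count; ++d) {
        assert(per_device_values[d].size() == static_cast<size_t>(kConstSize));
        CUDA_CHECK(cudaSetDevice(d));  // selects which instance the symbol names
        CUDA_CHECK(cudaMemcpyToSymbol(g_const_memory,
                                      per_device_values[d].data(), bytes));
        float* d_out = nullptr;
        CUDA_CHECK(cudaMalloc(&d_out, bytes));
        read_const<<<1, kConstSize>>>(d_out);
        CUDA_CHECK(cudaGetLastError());
        CUDA_CHECK(cudaMemcpy(seen[d].data(), d_out, bytes,
                              cudaMemcpyDeviceToHost));
        CUDA_CHECK(cudaFree(d_out));
    }
    return seen;
}
```

Giving each device different values makes the per-device copies visible. If every device should hold the same data, pass the same vector for each `d`. Under this assumption, the commented-out `cudaSetDevice(0)` call in `host_func` is what decides which device's copy gets written.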

I searched Google extensively before asking this question, but I didn't find any answer. Is there really no way to do this?
