cuda - CUDA memory problem

Question

I have a CUDA kernel that I am compiling to a cubin file without any special flags:

nvcc text.cu -cubin

It compiles, but with this message:

Advisory: Cannot tell what pointer points to, assuming global memory space

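along with a reference to a line in some temporary cpp file. I can make the message go away by commenting out some seemingly arbitrary code, which makes no sense to me.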

The kernel is as follows:

__global__ void string_search(char** texts, int* lengths, char* symbol, int* matches, int symbolLength)
{
    int localMatches = 0;
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    int threadId = threadIdx.x + threadIdx.y * blockDim.x;
    int blockThreads = blockDim.x * blockDim.y;

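    // One slot per thread: assumes at most 32 threads per block
    __shared__ int localMatchCounts[32];

    // Each thread starts at its own position and strides by the block size
    for(int i = threadId; i < (lengths[blockId] - (symbolLength - 1)); i += blockThreads)
    {
        if(texts[blockId][i] == symbol[0])
        {
            bool breaking = false;  // reset for every candidate position
            for(int j = 1; j < symbolLength; j++)
            {
                if(texts[blockId][i + j] != symbol[j])
                {
                    breaking = true;
                    break;
                }
            }
            if (breaking) continue;
            localMatches++;
        }
    }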

    localMatchCounts[threadId] = localMatches;

    __syncthreads();

    if(threadId == 0)
    {
        int sum = 0;
        for(int i = 0; i < blockThreads; i++)
        {
            sum += localMatchCounts[i];
        }
        matches[blockId] = sum;
    }
}
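A host-side way to sanity-check the kernel's logic: the loop is intended to split each text among the threads of a block, with thread t starting at position t and advancing by blockThreads, so that the threads together visit every candidate position exactly once and thread 0's sum equals the number of (possibly overlapping) occurrences. The sketch below is plain C++ that can live in the same .cu file; emulateBlockCount and referenceCount are hypothetical helpers that run that partition sequentially and compare it with a direct count. It assumes at most 32 threads per block, matching the size of localMatchCounts.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: runs the kernel's per-block matching sequentially.
// Simulated thread t checks positions t, t + blockThreads, t + 2*blockThreads, ...
int emulateBlockCount(const char* text, int length,
                      const char* symbol, int symbolLength, int blockThreads)
{
    assert(symbolLength > 0 && blockThreads > 0 && blockThreads <= 32);
    int localMatchCounts[32] = {0};  // stands in for the __shared__ array
    for (int threadId = 0; threadId < blockThreads; threadId++)
    {
        int localMatches = 0;
        for (int i = threadId; i < length - (symbolLength - 1); i += blockThreads)
        {
            bool breaking = false;  // reset for every candidate position
            for (int j = 0; j < symbolLength; j++)
            {
                if (text[i + j] != symbol[j]) { breaking = true; break; }
            }
            if (!breaking) localMatches++;
        }
        localMatchCounts[threadId] = localMatches;
    }
    // What thread 0 does after __syncthreads(): add up the per-thread counts
    int sum = 0;
    for (int t = 0; t < blockThreads; t++) sum += localMatchCounts[t];
    return sum;
}

// Hypothetical reference: straightforward count of overlapping occurrences
int referenceCount(const char* text, int length, const char* symbol, int symbolLength)
{
    int count = 0;
    for (int i = 0; i + symbolLength <= length; i++)
        if (std::memcmp(text + i, symbol, symbolLength) == 0) count++;
    return count;
}
```

If the strided sum matches the direct count for every block size tried, the work split is correct; a loop that starts every thread at i = 0 would instead make all threads scan the same positions.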

If I replace the line

localMatchCounts[threadId] = localMatches;

after the first for loop with this line

localMatchCounts[threadId] = 5;

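it compiles without any advisory. The same happens if I comment out seemingly random parts of the loop above that line. I also tried replacing the shared memory array with an ordinary array, to no effect. Can anyone tell me what the problem is?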

The system is Vista 64-bit, for what it's worth.

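EDIT: I fixed the code so that it actually works, although it still produces the compiler advisory. The advisory does not seem to be a problem, at least as far as correctness goes (it may affect performance).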

score 1 · Accepted Answer

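Arrays of pointers such as char** are problematic in kernels, because the kernel cannot access host memory: the pointer table and every string it points to must be copied to the device separately, and the table must hold device addresses. (The compiler also cannot tell which memory space a pointer loaded from such a table refers to, which is what the advisory reports.)
It is better to allocate a single contiguous buffer and partition it in a way that supports parallel access.
In this case, I would define one 1D array containing all the strings placed one after another, and a second 1D array of size 2*numberOfStrings containing, for each string, its offset in the first array and its length: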

For example, preparing the data for the kernel:

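#include <cstring>

// st: host array of numberOfStrings C strings
int* metadata = new int[numberOfStrings * 2];
int lastpos = 0;
for (int cnt = 0; cnt < numberOfStrings; cnt++)
{
    int len = (int)strlen(st[cnt]);
    metadata[2 * cnt] = lastpos;      // offset of string cnt in buffer
    metadata[2 * cnt + 1] = len;      // length of string cnt
    lastpos += len;
}
// buffer = st[0] + st[1] + st[2] + ..., without terminators
char* buffer = new char[lastpos];
for (int cnt = 0; cnt < numberOfStrings; cnt++)
    memcpy(buffer + metadata[2 * cnt], st[cnt], metadata[2 * cnt + 1]);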

In the kernel:

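int currentIndex = threadId + blockId * blockThreads;  // global thread index: one thread per string
if (currentIndex >= numberOfStrings) return;
char* currentString = buffer + metadata[2 * currentIndex];
int currentStringLength = metadata[2 * currentIndex + 1];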
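To see what each thread does with this layout, here is a host-side sketch in plain C++. countInString is a hypothetical helper, not part of the answer: it looks up string currentIndex through its (offset, length) pair and counts occurrences of symbol inside that string only, so a match can never run into the next string even though the buffer has no terminators.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for one kernel thread in the flat layout:
// buffer holds all strings back to back, metadata holds (offset, length) pairs.
int countInString(const char* buffer, const int* metadata, int currentIndex,
                  const char* symbol, int symbolLength)
{
    assert(symbolLength > 0);
    const char* currentString = buffer + metadata[2 * currentIndex];
    int currentStringLength = metadata[2 * currentIndex + 1];
    int matches = 0;
    // Only positions whose whole match fits inside this string are considered
    for (int i = 0; i + symbolLength <= currentStringLength; i++)
        if (std::memcmp(currentString + i, symbol, symbolLength) == 0) matches++;
    return matches;
}
```

Because the pointer arithmetic starts from a single base pointer passed as a kernel argument, the device version never has to load a pointer from memory, which is the situation the advisory complains about.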

score 0 · Accepted Answer

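The problem seems to be related to the char** parameter. Changing it to a char* made the advisory go away, so I suspect CUDA may have trouble with data in this form. In this case, CUDA perhaps prefers its dedicated 2D arrays.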
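If CUDA's 2D facilities are the route taken, the usual pattern is a pitched allocation: each string gets a row of pitch bytes and row k starts at base + k * pitch, so the kernel computes a row address instead of loading a pointer from a char** array. On the device, cudaMallocPitch chooses the pitch and cudaMemcpy2D copies the rows; the sketch below only emulates the layout on the host. PitchedStrings, makePitched and row are hypothetical names, and using the longest string length as the pitch is an assumption (cudaMallocPitch may pad rows further for alignment).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host model of a pitched 2D string layout
struct PitchedStrings
{
    std::vector<char> data;    // rows * pitch bytes, zero-padded
    std::vector<int> lengths;  // actual length of the string in each row
    std::size_t pitch = 0;     // bytes per row (here: longest string length)
};

PitchedStrings makePitched(const std::vector<std::string>& strings)
{
    PitchedStrings p;
    for (const std::string& s : strings) p.pitch = std::max(p.pitch, s.size());
    p.data.assign(strings.size() * p.pitch, '\0');
    for (std::size_t k = 0; k < strings.size(); k++)
    {
        // Copy string k to the start of row k; the rest of the row stays '\0'
        std::copy(strings[k].begin(), strings[k].end(), p.data.data() + k * p.pitch);
        p.lengths.push_back(static_cast<int>(strings[k].size()));
    }
    return p;
}

// Equivalent of texts[blockId]: a row address computed from the base pointer
const char* row(const PitchedStrings& p, std::size_t k)
{
    return p.data.data() + k * p.pitch;
}
```

The trade-off against the offset/length layout is wasted padding when string lengths vary widely, in exchange for simpler, regular indexing.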
