neural-network - How is this OpenCL kernel causing the error CL_INVALID_COMMAND_QUEUE?

Question

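I'm having a problem implementing a feed-forward multi-layer perceptron with backpropagation learning in OpenCL from Java, using JOCL. Here is the kernel code for the calculation phase: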

    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    __kernel void Neuron(__global const double *inputPatterns,
                           __global double *weights,
                           __global const int *numInputs,
                           __global const int *activation,
                           __global const double *bias,
                           __global const int *usingBias,
                           __global double *values,
                           __global const int *maxNumFloats,
                           __global const int *patternIndex,
                           __global const int *inputPatternSize,
                           __global const int *indexOffset,
                           __global const int *isInputNeuron,
                           __global const int *inputs)
    {
       int gid = get_global_id(0);
       double sum = 0.0;
       for(int i = 0; i < numInputs[gid+indexOffset[0]]; i++)
       {
           sum += values[inputs[(gid+indexOffset[0]) * maxNumFloats[0] + i]] *
                   weights[(gid+indexOffset[0]) * maxNumFloats[0] + i];
       }
       if(usingBias[gid+indexOffset[0]])
           sum += bias[gid+indexOffset[0]];
       if(isInputNeuron[gid+indexOffset[0]])
           sum += inputPatterns[gid+indexOffset[0]+(patternIndex[0] * inputPatternSize[0])];
       if(activation[gid+indexOffset[0]] == 1)
           sum = 1.0 / (1.0 + exp(-sum));
       values[gid + indexOffset[0]] = sum;
    }

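Basically, I run this kernel once for each layer in the network. For the first layer there are no "inputs", so the loop does not execute. Because the first layer is the input node layer, however, it does add the relevant value from the input pattern. This executes fine, and I can read back the values at this point.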
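To make the kernel's index arithmetic easier to check, here is a minimal CPU-side sketch of the same per-layer computation in plain Java. It is not part of the original post: the class name `NeuronReference`, the method names, and the tiny 2-1 network in `demo()` are hypothetical, and the single-element buffers are passed as plain `int` values for brevity. One practical use is that Java throws `ArrayIndexOutOfBoundsException` where the GPU would silently read or write out of range.

```java
import java.util.Arrays;

/** CPU reference for the Neuron kernel: one call per layer, one loop iteration per work-item. */
final class NeuronReference {

    /** Mirrors the kernel body for every gid in [0, layerWidth). */
    static void runLayer(double[] inputPatterns, double[] weights, int[] numInputs,
                         int[] activation, double[] bias, int[] usingBias,
                         double[] values, int maxNumFloats, int patternIndex,
                         int inputPatternSize, int indexOffset,
                         int[] isInputNeuron, int[] inputs, int layerWidth) {
        for (int gid = 0; gid < layerWidth; gid++) {
            int n = gid + indexOffset;              // global neuron index
            double sum = 0.0;
            for (int i = 0; i < numInputs[n]; i++) {
                int slot = n * maxNumFloats + i;    // row n, column i of the flattened tables
                sum += values[inputs[slot]] * weights[slot];
            }
            if (usingBias[n] != 0) sum += bias[n];
            if (isInputNeuron[n] != 0) sum += inputPatterns[n + patternIndex * inputPatternSize];
            if (activation[n] == 1) sum = 1.0 / (1.0 + Math.exp(-sum)); // sigmoid
            values[n] = sum;
        }
    }

    /** Hypothetical network: two input neurons feeding one sigmoid output neuron. */
    static double[] demo() {
        int maxNumFloats = 2;
        int[] layerWidths = {2, 1};                  // plays the role of GPUTickList
        double[] inputPatterns = {0.5, -1.0};        // one pattern of size 2
        double[] weights = {0, 0, 0, 0, 1.0, 2.0};   // 3 neurons * maxNumFloats
        int[] inputs = {0, 0, 0, 0, 0, 1};           // neuron 2 reads neurons 0 and 1
        int[] numInputs = {0, 0, 2};
        int[] activation = {0, 0, 1};
        double[] bias = {0, 0, 0.5};
        int[] usingBias = {0, 0, 1};
        int[] isInputNeuron = {1, 1, 0};
        double[] values = new double[3];

        int indexOffset = 0;
        for (int k = 0; k < layerWidths.length; k++) {
            if (k > 0) indexOffset += layerWidths[k - 1]; // same update as the host loop
            runLayer(inputPatterns, weights, numInputs, activation, bias, usingBias,
                     values, maxNumFloats, 0, 2, indexOffset,
                     isInputNeuron, inputs, layerWidths[k]);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Running the real network's arrays through `runLayer` before uploading them would show whether any entry of `inputs` or `numInputs` sends the second layer past the end of `values` or `weights`.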

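When I try to run the second layer, however (which does have inputs: every node of the first layer), a call to clFinish() returns the error CL_INVALID_COMMAND_QUEUE. Sometimes this error is accompanied by a driver crash and recovery. I have read that this might be a problem with TDR timeouts and have tried raising the limit, but I'm not sure whether that made any difference.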

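I'm going through the calls to clSetKernelArg() to check for anything stupid, but can anyone spot anything obviously wrong in the code? It seems that the error is introduced in the second layer by the inclusion of the for loop... I can clarify any of the parameters if needed, but that seemed like overkill for an initial post.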

Also, I'm fully aware that this code will probably be an affront to competent coders everywhere, but feel free to flame :P

Edit: Host code:

    //Calc
    for(int k = 0; k < GPUTickList.length; k++)
    {
        clFlush(clCommandQueue);
        clFinish(clCommandQueue);
        //If input nodes
        if(k == 0)
            //Set index offset to 0
            GPUMapIndexOffset.asIntBuffer().put(0, 0);
        else
            //Update index offset
            GPUMapIndexOffset.asIntBuffer().put(0,
                GPUMapIndexOffset.asIntBuffer().get(0) + GPUTickList[k-1]);
        //Write index offset to GPU buffer
        ret = clEnqueueWriteBuffer(clCommandQueue, memObjects[12], CL_TRUE, 0,
                Sizeof.cl_int, Pointer.to(GPUMapIndexOffset.position(0)), 0, null, null);             
        //Set work size (width of layer)
        global_work_size[0] = GPUTickList[k];
        ret = clEnqueueNDRangeKernel(clCommandQueue, kernel_iterate, 1,
            global_work_offset, global_work_size, local_work_size,
            0, null, null);
    }

Edit 2: I've uploaded the full code to pastebin.

score 2 · Accepted Answer

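Solved. I fixed the error by making everything that was indexed with [0] a direct kernel parameter rather than a buffer. Apparently the hardware doesn't like lots of work-items accessing one particular element of a buffer at once.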
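The answer does not show the revised code, so the following is only a sketch of what that change might look like. It assumes the four values the kernel reads with [0] (maxNumFloats, patternIndex, inputPatternSize, indexOffset) are the ones converted to scalars; the class name `ScalarArgKernel` is hypothetical. The kernel source is held in a Java string, which is the form JOCL's clCreateProgramWithSource takes.

```java
/** Holds a revised kernel source in which the single-element buffers became scalar arguments. */
final class ScalarArgKernel {

    // Assumption: only the four values previously read with [0] change; the rest stay buffers.
    static final String SOURCE = String.join("\n",
        "#pragma OPENCL EXTENSION cl_khr_fp64 : enable",
        "__kernel void Neuron(__global const double *inputPatterns,",
        "                     __global double *weights,",
        "                     __global const int *numInputs,",
        "                     __global const int *activation,",
        "                     __global const double *bias,",
        "                     __global const int *usingBias,",
        "                     __global double *values,",
        "                     const int maxNumFloats,",
        "                     const int patternIndex,",
        "                     const int inputPatternSize,",
        "                     const int indexOffset,",
        "                     __global const int *isInputNeuron,",
        "                     __global const int *inputs)",
        "{",
        "    int n = get_global_id(0) + indexOffset;",
        "    double sum = 0.0;",
        "    for (int i = 0; i < numInputs[n]; i++)",
        "        sum += values[inputs[n * maxNumFloats + i]] * weights[n * maxNumFloats + i];",
        "    if (usingBias[n])",
        "        sum += bias[n];",
        "    if (isInputNeuron[n])",
        "        sum += inputPatterns[n + patternIndex * inputPatternSize];",
        "    if (activation[n] == 1)",
        "        sum = 1.0 / (1.0 + exp(-sum));",
        "    values[n] = sum;",
        "}");

    // Host side with JOCL, left as a comment because it needs the JOCL jar on the classpath:
    //   clSetKernelArg(kernel_iterate, 10, Sizeof.cl_int,
    //                  Pointer.to(new int[]{ indexOffset }));
    // 10 is the position of indexOffset in the signature above.

    public static void main(String[] args) {
        System.out.println(SOURCE);
    }
}
```

With this layout the per-layer clEnqueueWriteBuffer for the index offset is no longer needed; the host would instead call clSetKernelArg for indexOffset before each clEnqueueNDRangeKernel.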

score 1 · Accepted Answer

I'm not sure what you have above the loop. Are you using the queue anywhere outside of this loop? Below is something you may want to try.

    //flush + finish if you need to before the loop, otherwise remove these lines
    clFlush(clCommandQueue);
    clFinish(clCommandQueue);

    //Calc
    for(int k = 0; k < GPUTickList.length; k++)
    {
        //If input nodes
        if(k == 0)
            //Set index offset to 0
            GPUMapIndexOffset.asIntBuffer().put(0, 0);
        else
            //Update index offset
            GPUMapIndexOffset.asIntBuffer().put(0,
                GPUMapIndexOffset.asIntBuffer().get(0) + GPUTickList[k-1]);
        //Write index offset to GPU buffer
        ret = clEnqueueWriteBuffer(clCommandQueue, memObjects[12], CL_TRUE, 0,
                Sizeof.cl_int, Pointer.to(GPUMapIndexOffset.position(0)), 0, null, null);
        //Set work size (width of layer)
        global_work_size[0] = GPUTickList[k];
        //JOCL uses cl_event objects instead of C pointers
        cl_event latestEvent = new cl_event();
        ret = clEnqueueNDRangeKernel(clCommandQueue, kernel_iterate, 1,
            global_work_offset, global_work_size, local_work_size,
            0, null, latestEvent);
        //Wait for this layer to finish before enqueueing the next one
        clWaitForEvents(1, new cl_event[]{ latestEvent });
        clReleaseEvent(latestEvent);
    }
