
Here is my code:

struct S {
    int a, b;
    float c, d;
};
class A {
private:
    S* d;
    S h[3];
public:
    A() {
        cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
    }
    void Init();
};

void A::Init() {
    for (int i=0;i<3;i++) {
        h[i].a = 0;
        h[i].b = 1;
        h[i].c = 2;
        h[i].d = 3;
    }
    cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
}

A a;

In fact, this is part of a complex program that uses both CUDA and OpenGL. When I debug it, it fails at the cudaMemcpy call with this error:

cudaSafeCall() Runtime API error 11: invalid argument.

This program was adapted from another one that runs correctly. In that one, however, I declared the two variables S* d and S h[3] in the main function instead of in a class. What is stranger is that when I implement this class A in a small program, it works fine. I have also updated my driver, and the error still occurs.

Could anyone give me a hint as to why this happens and how to solve it? Thanks.


1 Answer


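Memory operations in CUDA are blocking, so they act as a synchronization point. As a result, errors from earlier operations that were not checked with cudaThreadSynchronize can surface as if they were errors in the memory call.

So if a memory operation returns an error, try placing a cudaThreadSynchronize call before it and checking its result.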
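The check described above could look roughly like the following. This is a minimal sketch, not the asker's actual program: the `check` helper is a hypothetical name introduced here for illustration, and the code assumes CUDA 4.0 or later, where `cudaDeviceSynchronize` replaces the deprecated `cudaThreadSynchronize`. On an older toolkit, `cudaThreadSynchronize` would be used in the same place.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct S {
    int a, b;
    float c, d;
};

// Hypothetical helper (not part of the original post): prints the CUDA
// error string for a failed call and reports whether the call succeeded.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    S h[3];
    for (int i = 0; i < 3; i++) {
        h[i].a = 0;
        h[i].b = 1;
        h[i].c = 2.0f;
        h[i].d = 3.0f;
    }

    S* d = NULL;
    if (!check(cudaMalloc((void**)&d, sizeof(S) * 3), "cudaMalloc")) return 1;

    // Wait for all earlier device work and report any error it left behind,
    // so that such an error is not misattributed to the cudaMemcpy below.
    if (!check(cudaDeviceSynchronize(), "work issued before cudaMemcpy")) return 1;
    // Also pick up errors recorded by earlier calls, such as a bad kernel launch.
    if (!check(cudaGetLastError(), "earlier CUDA call")) return 1;

    if (!check(cudaMemcpy(d, h, 3 * sizeof(S), cudaMemcpyHostToDevice),
               "cudaMemcpy")) return 1;

    cudaFree(d);
    return 0;
}
```

If the synchronize call itself reports the error, the problem most likely lies in work issued before the copy rather than in the arguments passed to cudaMemcpy.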


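Make sure the first malloc statement is actually being executed. If the problem is with CUDA initialization, as @Harrism pointed out, wouldn't it fail at that statement? Try adding printf statements and check that the initialization runs properly. I think an invalid argument error is generally caused by using an uninitialized memory area.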

  1. Add a printf to the constructor that shows the address of the memory region returned by cudaMalloc:

    A()
    {
        d = NULL;
        cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
        printf("D: %p\n", d);
    }
    
  2. Try copying to a locally allocated region, i.e. move the cudaMalloc to just above the cudaMemcpy (for testing only):

    void A::Init()
    {
        for (int i=0;i<3;i++)
        {
            h[i].a = 0;
            h[i].b = 1;
            h[i].c = 2;
            h[i].d = 3;
        }
        cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3)); // here!..
        cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
    }
    

Good luck.

Answered 2012-08-28T10:54:28.213