
The reason I ask is that my code has some strange errors, and I suspect they may be caused by an aliasing problem:

__shared__ float x[32];
__shared__ unsigned int xsum[32];

int idx=threadIdx.x;
unsigned char * xchar=(unsigned char *)x;
//...do something
 if (threadIdx.x<32)
 {
    xchar[4*idx]&=somestring[0];
    xchar[4*idx+1]&=somestring[1];
    xchar[4*idx+2]&=somestring[2];
    xchar[4*idx+3]&=somestring[3];

    xsum[idx]+=*((unsigned int *)(x+idx)); // <- It looks like the compiler sometimes fails to recognize this as aliasing xchar
 };

2 Answers


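The compiler only needs to honour aliasing between compatible types, plus the special case of character types, which may alias any object. The read through (unsigned int *)(x+idx) accesses a float as an unsigned int; those two types are not compatible, so the access is undefined behaviour and the compiler is free to assume the pointers never alias.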

If you want to do bitwise operations on a float, first reinterpret it as an integer (via __float_as_int(), or __float_as_uint() for an unsigned value), operate on that, and finally convert back to float (using __int_as_float() or __uint_as_float()).
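
A minimal sketch of that approach, assuming the four mask bytes from somestring have already been packed into a single 32-bit value. The kernel name maskAndSum, the mask parameter, and the host helper maskOnDevice are hypothetical and not part of the question's code:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: ANDs the bit pattern of each float with a 32-bit mask
// and adds the masked bits to a per-element running sum.
__global__ void maskAndSum(float *x, unsigned int *xsum, unsigned int mask, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        unsigned int bits = __float_as_uint(x[idx]); // reinterpret the bits, no pointer cast
        bits &= mask;
        x[idx] = __uint_as_float(bits);              // store the masked value back as float
        xsum[idx] += bits;
    }
}

// Hypothetical host helper: masks a single value on the device and returns the sum.
unsigned int maskOnDevice(float value, unsigned int mask)
{
    float *dX = nullptr;
    unsigned int *dSum = nullptr;
    unsigned int sum = 0;
    cudaMalloc(&dX, sizeof(float));
    cudaMalloc(&dSum, sizeof(unsigned int));
    cudaMemcpy(dX, &value, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dSum, 0, sizeof(unsigned int));
    maskAndSum<<<1, 32>>>(dX, dSum, mask, 1);
    cudaMemcpy(&sum, dSum, sizeof(unsigned int), cudaMemcpyDeviceToHost); // waits for the kernel
    cudaFree(dX);
    cudaFree(dSum);
    return sum;
}
```

Because xchar[4*idx] addresses the least significant byte on the GPU's little-endian layout, a mask equivalent to the question's four byte-wise ANDs would be somestring[0] | somestring[1] << 8 | somestring[2] << 16 | somestring[3] << 24, with each byte widened to unsigned int first.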

answered 2013-03-29T01:18:23.700

I think you have a race condition here, but I don't know what somestring is. If it is the same for all threads, you can do this:

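__shared__ float x[32];

unsigned char * xchar=(unsigned char *)x;

//...do something

// Only four threads touch the four bytes of x[0], one byte each.
if(threadIdx.x<4) {
     xchar[threadIdx.x]&=somestring[threadIdx.x];
}

__syncthreads(); // make the byte writes visible to every thread in the block

unsigned int xsum=__float_as_uint(x[0]); // read the bits without an unsigned int * cast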

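This means every thread shares the same array, so xsum is the same across all threads. If you want each thread to have its own array, you must allocate an array of 32*number_of_threads_in_block elements and use offsets.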
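
A rough sketch of the per-thread layout, assuming a 1D block with a fixed size known at compile time. THREADS_PER_BLOCK, the kernel name sumOwnSlice, the placeholder data, and the host helper sliceSumForThread are all assumptions made for illustration:

```cuda
#include <cuda_runtime.h>

#define ELEMS_PER_THREAD 32   // slice length, taken from x[32] in the question
#define THREADS_PER_BLOCK 64  // assumed block size for this sketch

// Hypothetical kernel: each thread fills and sums its own 32-element slice.
__global__ void sumOwnSlice(float *out)
{
    __shared__ float x[ELEMS_PER_THREAD * THREADS_PER_BLOCK];
    float *mine = x + threadIdx.x * ELEMS_PER_THREAD; // offset of this thread's slice

    float sum = 0.0f;
    for (int i = 0; i < ELEMS_PER_THREAD; ++i) {
        mine[i] = (float)threadIdx.x; // placeholder data
        sum += mine[i];
    }
    out[threadIdx.x] = sum; // equals 32 * threadIdx.x for the placeholder data
}

// Hypothetical host helper: launches one block and returns the sum for thread t.
float sliceSumForThread(int t)
{
    float h[THREADS_PER_BLOCK];
    float *d = nullptr;
    cudaMalloc(&d, sizeof(h));
    sumOwnSlice<<<1, THREADS_PER_BLOCK>>>(d);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost); // waits for the kernel
    cudaFree(d);
    return h[t];
}
```

No __syncthreads() is needed here because no thread reads another thread's slice. A stride of 32 floats probably maps every thread of a warp to the same shared-memory bank, so a layout like x[i * THREADS_PER_BLOCK + threadIdx.x] may be faster if performance matters.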

PS: The code above only works for 1D blocks. In 2D or 3D blocks, you must compute your own thread ID and make sure that only 4 threads execute the code.
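
One way to compute such a thread ID is to linearize the three thread indices. In the sketch below, the names linearThreadId, countGuarded, and guardedThreads are hypothetical, and the kernel only counts how many threads pass the guard:

```cuda
#include <cuda_runtime.h>

// Linear index of the calling thread within its block (1D, 2D, or 3D).
__device__ unsigned int linearThreadId()
{
    return threadIdx.x + blockDim.x * (threadIdx.y + blockDim.y * threadIdx.z);
}

// Hypothetical kernel: counts the threads of the block that pass the guard.
__global__ void countGuarded(unsigned int *count)
{
    if (linearThreadId() < 4) {
        atomicAdd(count, 1u);
    }
}

// Hypothetical host helper: launches one block of the given shape.
unsigned int guardedThreads(dim3 block)
{
    unsigned int *dCount = nullptr;
    unsigned int hCount = 0;
    cudaMalloc(&dCount, sizeof(unsigned int));
    cudaMemset(dCount, 0, sizeof(unsigned int));
    countGuarded<<<1, block>>>(dCount);
    cudaMemcpy(&hCount, dCount, sizeof(unsigned int), cudaMemcpyDeviceToHost); // waits for the kernel
    cudaFree(dCount);
    return hCount;
}
```

With a guard on threadIdx.x alone, an 8x8 block would let 32 threads through (four x values for each of the eight rows); the linearized ID admits exactly four regardless of block shape, provided the block has at least four threads.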

answered 2013-03-29T00:56:51.523