I have this C++ code that I want to port to CUDA.
for (int im = 0; im < numImages; im++)
{
    for (p = 0; p < xsize*ysize; p++)
    {
        bool ok = false;
        for (f = 0; f < numFeatures; f++)
        {
            if (feature[im][f][p] != 0)
            {
                ok = true;
                break;
            }
        }
        if (ok)
        {
            minDist = 1e9;
            for (i = 0; i < numBins; i++)
            {
                dist = 0;
                for (f = 0; f < numFeatures; f++)
                {
                    dist += (float)((feature[im][f][p]-clusterPoint[f][i])*(feature[im][f][p]-clusterPoint[f][i]));
                }
                if (dist < minDist)
                {
                    minDist = dist;
                    tmp = i;
                }
            } // end for i
            for (f = 0; f < numFeatures; f++)
                csum[f][tmp] += feature[im][f][p];
            ccount[tmp]++;
            averageDist[tmp] += sqrt(minDist);
        } // end if (ok)
    } // end for p
} // end for im
I want to compute csum, ccount, and averageDist on the GPU. csum and averageDist are floats, and ccount is an integer.
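To make the structure easier to discuss, here is a minimal sketch of how I think the loop splits into two phases. It assumes flattened, contiguous arrays instead of the nested indexing above; the flat layouts and the names Assignment, assignPixel, and accumulate are only my illustration and are not part of the original code. Phase 1 (finding the nearest bin for one pixel) looks independent per pixel, so I would expect it to map to one GPU thread per (im, p) pair. Phase 2 (adding into csum, ccount, and averageDist) is where many pixels can write to the same bin, which is the part I suspect needs atomics or a reduction.

```cpp
#include <cmath>
#include <vector>

// Assumed flat layouts (hypothetical, for this sketch only):
//   feature[(im * numFeatures + f) * numPixels + p]
//   clusterPoint[f * numBins + i]
//   csum[f * numBins + i]

struct Assignment {
    int bin;     // index of the nearest cluster, or -1 if the pixel is skipped
    float dist;  // Euclidean distance to that cluster
};

// Phase 1: depends only on one pixel and the cluster centres,
// so every (im, p) pair could be processed independently.
Assignment assignPixel(const std::vector<float>& feature,
                       const std::vector<float>& clusterPoint,
                       int im, int p,
                       int numFeatures, int numPixels, int numBins)
{
    const int base = im * numFeatures * numPixels + p;

    // Skip pixels whose features are all zero.
    bool ok = false;
    for (int f = 0; f < numFeatures; f++) {
        if (feature[base + f * numPixels] != 0) { ok = true; break; }
    }
    if (!ok) return {-1, 0.0f};

    float minDist = 1e9f;
    int best = 0;  // initialised so it is never read uninitialised
    for (int i = 0; i < numBins; i++) {
        float dist = 0.0f;
        for (int f = 0; f < numFeatures; f++) {
            const float d = feature[base + f * numPixels] - clusterPoint[f * numBins + i];
            dist += d * d;
        }
        if (dist < minDist) { minDist = dist; best = i; }
    }
    return {best, std::sqrt(minDist)};
}

// Phase 2: accumulation keyed by bin. Done serially here; on a GPU these
// updates would collide between threads, so they would need something like
// atomicAdd or per-block partial sums that are reduced afterwards.
void accumulate(const std::vector<float>& feature,
                const std::vector<float>& clusterPoint,
                int numImages, int numFeatures, int numPixels, int numBins,
                std::vector<float>& csum,
                std::vector<int>& ccount,
                std::vector<float>& averageDist)
{
    for (int im = 0; im < numImages; im++) {
        for (int p = 0; p < numPixels; p++) {
            const Assignment a = assignPixel(feature, clusterPoint, im, p,
                                             numFeatures, numPixels, numBins);
            if (a.bin < 0) continue;
            for (int f = 0; f < numFeatures; f++)
                csum[f * numBins + a.bin] += feature[(im * numFeatures + f) * numPixels + p];
            ccount[a.bin]++;
            averageDist[a.bin] += a.dist;
        }
    }
}
```

Here numPixels stands for xsize*ysize. Written this way, only the three updates at the end of accumulate touch shared state, so those are the lines my question is really about.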
Is this a parallel reduction problem?