cuda - CUDA: unexpected printf behavior

Question

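I don't understand the behavior I observe with printf in CUDA kernels. Could someone shed some light on it? If this is normal, why does it happen? Is there a way to make sure I print the data before it is modified in the kernel (for debugging)?

Here is the code: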

~>more *
::::::::::::::
Makefile
::::::::::::::
all:
    nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
    g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
::::::::::::::
WTF.cpp
::::::::::::::
#include <iostream> // cout
#include <cstdlib>  // rand, srand
#include <ctime>    // time
#include <cuda_runtime_api.h> // cudaXXX
void PrintOnGPU ( unsigned int const iDataSize, int * const iopData );

using namespace std;

int main ()
{
  // Allocate and initialize CPU data
  unsigned int dataSize = 4;
  srand ( time ( NULL ) ); // Random seed
  int * pCPUData = ( int * ) malloc ( sizeof ( int ) * dataSize );
  for ( unsigned int i = 0; i < dataSize; i++ ) { pCPUData[i] = rand () % 100; cout << "CPU : " << pCPUData[i] << endl; }

  // Print from GPU
  int * pGPUData = NULL;
  cudaMalloc ( ( void ** ) &pGPUData, dataSize * sizeof ( int ) );
  cudaMemcpy ( pGPUData, pCPUData, dataSize * sizeof ( int ), cudaMemcpyHostToDevice );
  PrintOnGPU ( dataSize, pGPUData );

  // Get out
  cudaFree ( pGPUData );
  if ( pCPUData ) { free ( pCPUData ); pCPUData = NULL; }
  return 0;
}
::::::::::::::
WTF.cu
::::::::::::::
#include "stdio.h"

__global__ void WTF ( unsigned int const iDataSize, int * const iopData )
{
  if ( iDataSize == 0 || !iopData ) return;

  // Don't modify : just print
  unsigned long long int tIdx = blockIdx.x * blockDim.x + threadIdx.x; // 1D grid
  if ( tIdx == 0 )
  {
    for ( unsigned int i = 0; i < iDataSize; i++ )
      printf ( "GPU : %i \n", iopData[i] );
  }
  __syncthreads();

  // Modify
  // iopData[tIdx] = 666; // WTF ?...
}

void PrintOnGPU ( unsigned int const iDataSize, int * const iopData )
{
  WTF<<<2,2>>> ( iDataSize, iopData );
}

As expected, I get no value above 100 (line 15 in the cpp file: rand () % 100):

~>make; ./WTF.exe
nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 38
CPU : 73
CPU : 28
CPU : 82
GPU : 38 
GPU : 73 
GPU : 28 
GPU : 82

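Now I uncomment line 17 in the cu file (iopData[tIdx] = 666), which sets every value to 666 (that is, above 100). Since I have 4 data elements (dataSize = 4 in the cpp file), a <<<2,2>>> launch (2 blocks of 2 threads), and a __syncthreads () before the modification in the kernel, I should never printf any modified data, right? However, this is what I get (modified data, with value 666, is printed):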

 ~>make; ./WTF.exe
nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 29
CPU : 72
CPU : 66
CPU : 90
GPU : 29 
GPU : 72 
GPU : 666 
GPU : 666

I don't understand why these 666 values appear: as far as I can tell, they should not. If this behavior is normal, why does it happen?

score 3 · Accepted Answer

This happens because you are launching 2 threadblocks, and threadblocks can execute in any order, either concurrently or one after another.

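Suppose the troublesome line is uncommented. Now suppose threadblock 1 runs first and finishes before threadblock 0 starts. Then threadblock 0 runs. But threadblock 0 is the one doing the printing, and it prints all 4 values. So the values that threadblock 1 already set to 666 are printed by threadblock 0.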

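This cannot happen if threadblock 0 runs first. The threads of threadblock 0 also write their own elements only after the __syncthreads() that follows the print, so my guess is that you will never see the first 2 GPU values listed as 666, only the last 2 (the ones written by threadblock 1). You would also never see it if you launched only 1 threadblock, regardless of the number of threads (at least with the kernel code as posted).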
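To illustrate the single-block case, here is a minimal sketch, not a definitive fix. The names PrintThenModify and PrintOnGPUSingleBlock are hypothetical (they are not in the posted code), and the sketch assumes iDataSize does not exceed the maximum number of threads per block on your device. With one block, __syncthreads() covers every thread that can write, so the print is ordered before all the writes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of the posted kernel, intended for a single-block launch.
__global__ void PrintThenModify ( unsigned int const iDataSize, int * const iopData )
{
  // Same arguments for every thread, so this early return is uniform
  // across the block and does not break the barrier below.
  if ( iDataSize == 0 || !iopData ) return;

  unsigned int tIdx = blockIdx.x * blockDim.x + threadIdx.x;
  if ( tIdx == 0 )
  {
    for ( unsigned int i = 0; i < iDataSize; i++ )
      printf ( "GPU : %i \n", iopData[i] );
  }

  // Barrier for all threads of this block. Because there is only one
  // block, no thread anywhere can write before thread 0 has read.
  __syncthreads();

  if ( tIdx < iDataSize ) iopData[tIdx] = 666;
}

// Hypothetical host wrapper: one block, one thread per element.
// Assumption: iDataSize <= max threads per block.
void PrintOnGPUSingleBlock ( unsigned int const iDataSize, int * const iopData )
{
  if ( iDataSize == 0 || !iopData ) return;
  PrintThenModify<<<1, iDataSize>>> ( iDataSize, iopData );
  cudaDeviceSynchronize (); // wait for the kernel and flush the device printf buffer
}
```

Called with the 4-element buffer from the question, this should print the original values and leave every element set to 666 afterwards. It does not scale beyond one block, so it is mainly useful for small debugging runs.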

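You may also be confused into thinking that __syncthreads() is a device-wide synchronization. It is not. It acts only as a barrier for the threads within one threadblock. There is no synchronization between separate threadblocks.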
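Since there is no barrier across threadblocks inside a kernel, one common way to get "print first, then modify" with several blocks is to split the work into two kernels. The following is a sketch under that assumption, not the only possible approach: the kernel names PrintData and ModifyData are hypothetical, and PrintOnGPU keeps the signature from the question so it could stand in for the posted wrapper. Kernels launched into the same stream run in order, so the second kernel starts only after every block of the first has finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel 1: a single thread prints every element; nothing is written.
__global__ void PrintData ( unsigned int const iDataSize, int const * const iData )
{
  if ( iDataSize == 0 || !iData ) return;

  unsigned int tIdx = blockIdx.x * blockDim.x + threadIdx.x;
  if ( tIdx == 0 )
  {
    for ( unsigned int i = 0; i < iDataSize; i++ )
      printf ( "GPU : %i \n", iData[i] );
  }
}

// Hypothetical kernel 2: each in-range thread overwrites its own element.
__global__ void ModifyData ( unsigned int const iDataSize, int * const iopData )
{
  unsigned int tIdx = blockIdx.x * blockDim.x + threadIdx.x;
  if ( iopData && tIdx < iDataSize ) iopData[tIdx] = 666;
}

void PrintOnGPU ( unsigned int const iDataSize, int * const iopData )
{
  // Both launches use the default stream, so ModifyData cannot start
  // until all blocks of PrintData have completed.
  PrintData<<<2,2>>> ( iDataSize, iopData );
  ModifyData<<<2,2>>> ( iDataSize, iopData );
  cudaDeviceSynchronize (); // wait for both kernels and flush the device printf buffer
}
```

With this layout the printed values should always be the unmodified ones, whatever order the blocks of each kernel run in. The cost is an extra kernel launch, which is usually acceptable for debugging output.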
