c++ - NVIDIA NVCC changes a compile-time constant when using a template trait type

Question

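I'm seeing strange behavior from NVIDIA NVCC (tested with CUDA 4.0 and 4.1) when using C++ templates. I've reduced it to a simple example that demonstrates the behavior.

This has already been filed as a bug report. I'm posting it here anyway because this site is an increasingly reliable source of bugs and fixes, so I'll keep this page updated.

Code: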

#include"stdio.h"

#define PETE_DEVICE __device__

template<class T, int N>  class ILattice;
template<class T>         class IScalar;
template<class T, int IL> struct AddILattice {};

template<class T>
PETE_DEVICE
void printType() {
  printf("%s\n",__PRETTY_FUNCTION__);
}

template<class T> class IScalar {
  T F;
};

template<class T, int N> class ILattice {
  T F[N];
};

template<class T, int N>
struct AddILattice<IScalar<T> , N> {
  typedef ILattice< T , N > Type_t;
};

#define IL 16

__global__ void kernel()
{
  printf("IL=%d\n",IL);  // Here IL==16

  typedef typename AddILattice<IScalar<float> ,IL>::Type_t Tnew;

  // This still works fine. Output:
  // void printType() [with T = ILattice<float, 16>]
  //
  printType<Tnew>();

  // Now problems begin: Output:
  // T=4 Tnew=0 IL=64
  // Here IL should still be 16
  // sizeof(Tnew) should be 16*sizeof(float)
  //
  printf("T=%d Tnew=%d IL=%d\n",sizeof(IScalar<float> ),sizeof(Tnew),IL);   
}   

int main()
{
    dim3  blocksPerGrid( 1 , 1 , 1 );
    dim3  threadsPerBlock( 1 , 1, 1);
    kernel<<< blocksPerGrid , threadsPerBlock , 48*1024 >>>( );

    cudaDeviceSynchronize();
    cudaError_t kernel_call = cudaGetLastError();
    printf("call: %s\n",cudaGetErrorString(kernel_call));

}

Any idea why the compiler changes IL from 16 to 64?

score 2 · Accepted Answer

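Probably because you're using the wrong printf conversion. %d means "output an int", but sizeof does not yield an int; it yields a size_t. Use the size_t length modifier with an unsigned conversion instead, i.e. replace %d with %zu.

Because of the variable argument list, printf cannot know which types were actually passed, so no type conversion takes place; the only type information it has comes from the format string. You therefore have to pass arguments that match it. On a system where size_t has the same size as int (e.g. many 32-bit systems), your code happens to work, but you can't rely on that, and using the correct conversion avoids the problem.

(So the compiler isn't changing your constant; you're printing it incorrectly.)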
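To make the fix concrete, here is a minimal host-side sketch of the question's types with the corrected format string. It is not the original kernel: the helper name describeSizes is hypothetical, it runs on the CPU with snprintf rather than device printf, and it assumes a 4-byte float. One plausible reading of the original output is that each 8-byte size_t was consumed as two 4-byte ints, so the "IL=64" that was printed was really sizeof(Tnew), which is 16 * sizeof(float) = 64.

```cpp
#include <cstdio>
#include <string>

// Same shape as the types in the question.
template<class T>         class IScalar  { T F; };
template<class T, int N>  class ILattice { T F[N]; };

template<class T, int N>  struct AddILattice {};
template<class T, int N>
struct AddILattice<IScalar<T>, N> {
  typedef ILattice<T, N> Type_t;
};

#define IL 16

// Hypothetical helper (not in the original post): formats the same three
// values as the question's last printf, using %zu for the size_t values.
std::string describeSizes() {
  typedef AddILattice<IScalar<float>, IL>::Type_t Tnew;

  // The type really does hold IL floats; checked at compile time.
  static_assert(sizeof(Tnew) == IL * sizeof(float),
                "ILattice<float, IL> should hold IL floats");

  char buf[64];
  std::snprintf(buf, sizeof(buf), "T=%zu Tnew=%zu IL=%d",
                sizeof(IScalar<float>), sizeof(Tnew), IL);
  return std::string(buf);
}
```

Calling describeSizes() and printing the result shows IL unchanged at 16. Inside a kernel, if you are unsure whether the device-side printf accepts %zu, casting each sizeof result to int and keeping %d is a conservative alternative.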
