cuda - Host float constant usage in a CUDA kernel

Question

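I am using CUDA 5.0. I noticed that the compiler allows me to use host-declared int constants inside kernels. However, it refuses to compile any kernel that uses a host-declared float constant. Does anyone know the reason for this apparent discrepancy?

For example, the following code runs fine as is, but it fails to compile if the last line in the kernel is uncommented.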

#include <cstdio>
#include <cuda_runtime.h>

static int   __constant__ DEV_INT_CONSTANT   = 1;
static float __constant__ DEV_FLOAT_CONSTANT = 2.0f;

static int   const        HST_INT_CONSTANT   = 3;
static float const        HST_FLOAT_CONSTANT = 4.0f;

__global__ void uselessKernel(float * val)
{
    *val = 0.0f;

    // Use device int and float constants
    *val += DEV_INT_CONSTANT;
    *val += DEV_FLOAT_CONSTANT;

    // Use host int and float constants
    *val += HST_INT_CONSTANT;
    //*val += HST_FLOAT_CONSTANT; // won't compile if uncommented
}

int main(void)
{
    float * d_val;
    cudaMalloc((void **)&d_val, sizeof(float));

    uselessKernel<<<1, 1>>>(d_val);

    cudaFree(d_val);
}

score 4 · Accepted Answer

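Adding a constant number in device code is fine, but adding a number stored in host memory in device code is not.

As long as the address of the variable is never taken, the compiler/optimizer can replace every reference to the static const int in your code with the value 3. In that case it behaves just like #define HST_INT_CONSTANT 3, and no host memory is allocated for the variable.

For a float variable, however, host memory is always allocated, even if it is declared static const float. Since a kernel cannot access host memory directly, your code using the static const float will not compile.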

In C/C++, an int can be optimized more aggressively than a float.

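The fact that your code runs while that line is commented out could be seen as a bug in CUDA C, I think. The static const int is a host-side object and should not be directly accessible from the device.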
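If you just need the float value inside the kernel, the reasoning above suggests a few workarounds. Here is a minimal sketch of three of them, not a definitive recipe: a preprocessor macro, a __constant__ variable filled with cudaMemcpyToSymbol, and an ordinary kernel argument. The names FLOAT_LITERAL and sumConstants are made up for illustration, and DEV_FLOAT_CONSTANT is redeclared here without an initializer so that it can be filled from the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Option 1 (hypothetical name): a macro is substituted as a literal by the
// preprocessor, so no host variable is involved at all.
#define FLOAT_LITERAL 4.0f

// Option 2: a __constant__ variable lives in device constant memory and is
// filled from the host at run time.
__constant__ float DEV_FLOAT_CONSTANT;

// Host-side constant from the question; it is never referenced in the kernel.
static float const HST_FLOAT_CONSTANT = 4.0f;

// Option 3: the host value arrives as an ordinary by-value kernel argument.
__global__ void sumConstants(float *val, float hostValue)
{
    *val = FLOAT_LITERAL + DEV_FLOAT_CONSTANT + hostValue;
}

int main(void)
{
    float *d_val = NULL;
    float h_val = 0.0f;

    cudaError_t err = cudaMalloc((void **)&d_val, sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the host constant into device constant memory (option 2).
    err = cudaMemcpyToSymbol(DEV_FLOAT_CONSTANT, &HST_FLOAT_CONSTANT, sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_val);
        return 1;
    }

    // Pass the host constant by value as a kernel argument (option 3).
    sumConstants<<<1, 1>>>(d_val, HST_FLOAT_CONSTANT);

    // This blocking copy also waits for the kernel to finish.
    err = cudaMemcpy(&h_val, d_val, sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel or cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_val);
        return 1;
    }

    printf("result = %f\n", h_val);

    cudaFree(d_val);
    return 0;
}
```

Which one fits best depends on the case: the macro is the closest match to what the compiler already does for the int constant, while the kernel argument and the __constant__ variable let the value be chosen at run time.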
