cuda - Thrust functor: "too many resources requested for launch"

Question

I am trying to implement something like this in CUDA:

For each element:

p = { p if p >= floor
      z if p < floor

where floor and z are constants configured at the start of the test.
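For checking device results, the operation above can be written as a plain host-side loop. This is only a minimal reference sketch, not part of the original code: the function name `apply_floor_host` is hypothetical, and the parameter names `floorLevel` and `floorVal` are assumed to correspond to floor and z.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the element-wise operation:
// elements >= floorLevel are kept, elements < floorLevel become floorVal.
std::vector<float> apply_floor_host(const std::vector<float>& input,
                                    float floorLevel, float floorVal)
{
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        const float x = input[i];
        output[i] = (x >= floorLevel) ? x : floorVal;
    }
    return output;
}
```

Comparing this output element by element against the result copied back from the GPU gives a quick sanity check that is independent of any launch configuration.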

I tried to implement it as follows, but I get the error "too many resources requested for launch".

A functor:

struct floor_functor : thrust::unary_function <float, float>
{
        const float floorLevel, floorVal;

        floor_functor(float _floorLevel, float _floorVal) : floorLevel(_floorLevel), floorVal(_floorVal){}

        __host__
        __device__
        float operator()(float& x) const
        {
            if (x >= floorLevel)
                return x;
            else
                return floorVal;
        }
};

used by a transform:

thrust::transform(input->begin(), input->end(), output.begin(), floor_functor(floorLevel, floorVal));

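If I remove one of my functor's members, say floorVal, and use a functor with only one member variable, it works fine.

Does anyone know why this happens, and how I can fix it?

Additional info:

My array is 786432 elements long.

My GPU is a GeForce GTX590.

I am building with the following command: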

`nvcc -c -g -arch sm_11 -Xcompiler -fPIC -Xcompiler -Wall -DTHRUST_DEBUG  -I <my_include_dir> -o <my_output> <my_source>`

My CUDA version is 4.0:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221

My maximum number of threads per block is 1024 (as reported by deviceQuery):

  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535

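Update:

I stumbled on a fix for the problem, but I don't understand it. If I rename my functor from "floor_functor" to basically anything else, it works! I have no idea why this is the case, and I would love to hear anyone's thoughts on it.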

score 1 · Accepted Answer

For a simpler CUDA implementation, you could do this with ArrayFire in one line of code:

p(p < floor) = z;

Just declare your variables as af::array's.

Good luck!

Disclaimer: I work on various CUDA projects, including ArrayFire.
