cuda - CUDA: How to apply __restrict__ on an array of pointers to arrays?

Question

This kernel, which takes two __restrict__ int arrays, compiles fine:

__global__ void kerFoo( int* __restrict__ arr0, int* __restrict__ arr1, int num )
{
    for ( int i = 0; i < num; ++i )  // Iterate over array
        arr1[i] = arr0[i];  // Copy one to other
}

However, the same two int arrays, when packed into an array of pointers, fail to compile:

__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( int i = 0; i < num; ++i )  // Iterate over array
        arr[1][i] = arr[0][i];  // Copy one to other
}

The compiler reports this error:

error: invalid use of `restrict'

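I have some structs that contain an array of pointers to arrays (for example, a struct passed to the kernel has int* arr[16]). How can I pass these to the kernel and still apply __restrict__ to them?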

score 2 · Accepted Answer

The CUDA C manual refers only to the C99 definition of __restrict__; it describes no special CUDA-specific cases.

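Since the parameter in question is an array of two pointers, this use of __restrict__ looks perfectly valid to me, and IMHO the compiler has no reason to complain. I would ask the compiler authors to verify and, if necessary, fix the issue. I would be interested in other opinions, though.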

Regarding @talonmies's comment:

"The whole point of restrict is to tell the compiler that two or more pointer arguments never overlap in memory."

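Strictly speaking, that is not true. restrict tells the compiler that, during the pointer's lifetime, the pointer in question is the only way to access the object it points to. Note that the pointed-to object is assumed to be just an int (in fact, there is only one int in this case). Since the compiler cannot know the size of the array, it is up to the programmer to respect the array bounds.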
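For the struct case from the question, one possible workaround (a sketch, not taken from either answer) is to pass the struct by value, so its pointers arrive in kernel parameter space, and then copy the pointers you actually use into local __restrict__-qualified pointers inside the kernel. PtrArray, kerFooStruct and copyWithStruct are hypothetical names. The sketch assumes arr[0] and arr[1] point to distinct, non-overlapping allocations, and it omits CUDA error checking. Whether nvcc actually generates different code for these local restrict pointers is not guaranteed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical struct mirroring the one described in the question.
struct PtrArray {
    int* arr[16];
};

// The struct is passed by value, so the 16 pointers travel in kernel
// parameter space; no device-side pointer table is needed.
__global__ void kerFooStruct( PtrArray p, int num )
{
    // Local restrict-qualified copies: the programmer promises that, while
    // these pointers are in use, the objects are accessed only through them.
    const int* __restrict__ src = p.arr[0];
    int* __restrict__       dst = p.arr[1];

    // Grid-stride loop over num elements
    for ( int i = blockIdx.x * blockDim.x + threadIdx.x; i < num;
          i += blockDim.x * gridDim.x )
        dst[i] = src[i];  // Copy one to other
}

// Hypothetical host helper: copies num ints (num > 0) through kerFooStruct
// and returns true if the destination matches the source.
bool copyWithStruct( int num )
{
    size_t bytes = num * sizeof(int);
    int* h   = new int[num];
    int* out = new int[num];
    for ( int i = 0; i < num; ++i )
        h[i] = i * 3;

    PtrArray p = {};                          // all 16 pointers start as null
    cudaMalloc( &p.arr[0], bytes );
    cudaMalloc( &p.arr[1], bytes );
    cudaMemcpy( p.arr[0], h, bytes, cudaMemcpyHostToDevice );

    kerFooStruct<<<( num + 255 ) / 256, 256>>>( p, num );
    cudaMemcpy( out, p.arr[1], bytes, cudaMemcpyDeviceToHost );  // waits for the kernel

    bool ok = true;
    for ( int i = 0; i < num; ++i )
        ok = ok && ( out[i] == h[i] );

    cudaFree( p.arr[0] );
    cudaFree( p.arr[1] );
    delete[] h;
    delete[] out;
    return ok;
}
```

Calling copyWithStruct(1000) on a machine with a CUDA device should return true. Passing the struct by value avoids having to place the pointer table in device memory.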

score 0 · Accepted Answer

Filling in the question's loop with an arbitrary iteration count, we get the following program:

__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( int i = 0; i < 1024; i ++)
        arr[1][i] = arr[0][i];  // Copy one to other
}

This compiles fine with CUDA 10.1 (Godbolt.org).
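To actually launch such a kernel, keep in mind that a parameter declared as int* __restrict__ arr[2] is adjusted to int* __restrict__* arr. The kernel therefore receives a pointer, and the two-element table it points to must itself live in device memory. The following is a minimal sketch that assumes a single thread performs the whole copy (matching the sequential loop above) and omits CUDA error checking. copyWithPointerTable is a hypothetical helper name.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Kernel from the answer, with the loop bound taken from num.
// "int* __restrict__ arr[2]" is adjusted to "int* __restrict__* arr",
// so arr must point to a pointer table in device memory.
__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( int i = 0; i < num; i ++ )
        arr[1][i] = arr[0][i];  // Copy one to other
}

// Hypothetical host helper: builds a device-resident table of two device
// pointers, launches kerFoo with one thread, and checks the copy.
bool copyWithPointerTable( int num )
{
    size_t bytes = num * sizeof(int);
    int* h   = new int[num];
    int* out = new int[num];
    for ( int i = 0; i < num; i ++ )
        h[i] = i + 7;

    int* hostTable[2];                       // device addresses, held on the host
    cudaMalloc( &hostTable[0], bytes );
    cudaMalloc( &hostTable[1], bytes );
    cudaMemcpy( hostTable[0], h, bytes, cudaMemcpyHostToDevice );

    int** devTable;                          // the table itself, in device memory
    cudaMalloc( &devTable, 2 * sizeof(int*) );
    cudaMemcpy( devTable, hostTable, 2 * sizeof(int*), cudaMemcpyHostToDevice );

    kerFoo<<<1, 1>>>( devTable, num );       // one thread does the whole copy
    cudaMemcpy( out, hostTable[1], bytes, cudaMemcpyDeviceToHost );  // waits for the kernel

    bool ok = true;
    for ( int i = 0; i < num; i ++ )
        ok = ok && ( out[i] == h[i] );

    cudaFree( devTable );
    cudaFree( hostTable[0] );
    cudaFree( hostTable[1] );
    delete[] h;
    delete[] out;
    return ok;
}
```

Passing the host array hostTable directly to kerFoo would hand the kernel a host address, which is why the table is copied to devTable first.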
