c++ - C++11 alias templates in CUDA

Question

The basic question is: does the CUDA compiler support alias templates?

I am using CUDA 7.5 on Ubuntu with gcc-4.8. All of my template classes are defined in header files and #included into a single translation unit at compile time.

I have a simple cuda_array class that provides a thin wrapper around std::vector. It is essentially a very simple version of thrust::host_vector combined with thrust::device_vector. Its declaration is:

template <typename T, const size_t N>
class cuda_array {
    std::vector<T> host;
    T *device;
public:
    // lots of type aliases to meet container requirements
    void push() { /* cudaMemcpy(...,H2D); */ }
    void pull() { /* cudaMemcpy(...,D2H); */ }
    // a few others that aren't relevant here
};

To make a matrix, I just wrote a quick alias template:

template <typename T, const size_t M, const size_t N>
using cuda_matrix = cuda_array<T, M * N>;
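An alias template is only another name for the type it expands to, so cuda_matrix<int, 16, 32> is not a new type. It is exactly cuda_array<int, 512>. The host-only sketch below checks this with static_assert. The stripped-down cuda_array stub (no device pointer, no push/pull) is an assumption made so that it compiles without CUDA.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for cuda_array (assumption: the device pointer and
// push/pull are omitted because they do not affect type identity).
template <typename T, std::size_t N>
class cuda_array {
    std::vector<T> host = std::vector<T>(N);
public:
    std::size_t size() const { return host.size(); }
};

// The alias template from the question.
template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// The alias denotes exactly the type it expands to...
static_assert(std::is_same<cuda_matrix<int, 16, 32>, cuda_array<int, 512>>::value,
              "cuda_matrix<int, 16, 32> is cuda_array<int, 512>");

// ...so shapes with the same element count are the same type.
static_assert(std::is_same<cuda_matrix<int, 16, 32>, cuda_matrix<int, 32, 16>>::value,
              "16x32 and 32x16 collapse to one type");
```

The second assertion shows the practical consequence. A 16x32 matrix and a 32x16 matrix collapse into one type, so the type itself no longer records M and N.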

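I want to map my matrix-vector multiplication CUDA kernel onto an overloaded operator* for type safety and ease of use (it is left to the caller to make sure push and pull are called correctly):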

template <typename T, const size_t rows, const size_t cols>
__global__ void matrix_vector_mul(T *A, T *b, T *result) {
    __shared__ T shared_b[cols];
    // rest of it
}
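The kernel body is elided in the question. As a testing aid, here is a host-side reference for the product the kernel presumably computes. The row-major layout and the name matrix_vector_mul_reference are assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for what matrix_vector_mul<T, rows, cols>
// is assumed to compute: result = A * b, with A stored row-major as
// rows * cols elements.
template <typename T, std::size_t rows, std::size_t cols>
std::vector<T> matrix_vector_mul_reference(const std::vector<T> &A,
                                           const std::vector<T> &b) {
    assert(A.size() == rows * cols && b.size() == cols);
    std::vector<T> result(rows, T());
    for (std::size_t i = 0; i < rows; ++i) {
        T sum = T();
        for (std::size_t j = 0; j < cols; ++j)
            sum += A[i * cols + j] * b[j];  // row i dotted with b
        result[i] = sum;
    }
    return result;
}
```

You can compare the result against what pull() brings back from the device. This checks the kernel on its own, independently of the operator* overload.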

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    matrix_vector_mul<T, M, N><<<16, 32>>>(m.device_data(), v.device_data(), result.device_data());
    return result;
}

In my main.cpp I have:

cuda_matrix<int,16,32> A;
cuda_array<int,32> b;
auto result = A * b;

The last line throws an error saying:

error: no operator "*" matches these operands
        operand types are: cuda_matrix<int, 16UL, 32UL> * cuda_array<int, 32UL>

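I chased down all the usual suspects for template type-deduction errors that I could think of, but nothing worked. In desperation, I converted my cuda_matrix alias template into a class template: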

template <typename T, const size_t M, const size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};

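The compilation error went away! So it seems CUDA does not support alias templates yet. Or am I doing something silly that I cannot figure out?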

score 4 · Accepted Answer

You have to keep in mind:

§ 14.5.7 [temp.alias]/p2:

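When a template-id refers to the specialization of an alias template, it is equivalent to the associated type obtained by substitution of its template-arguments for the template-parameters in the type-id of the alias template. [Note: An alias template name is never deduced. --end note]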

This means that deduction is not performed against:

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v)

but against:

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_array<T, M * N> &m, cuda_array<T, N> &v)
//                                  ~~~~~~~~~~~~~~~~~~~^

And then:

§ 14.8.2.5 [temp.deduct.type]/p16:

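If, in the declaration of a function template with a non-type template-parameter, the non-type template-parameter is used in a subexpression in the function parameter-list, the expression is a non-deduced context as specified above.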

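M appears only inside M * N, which is a non-deduced context, and in the return type, which never takes part in deduction. So M cannot be deduced, and this operator* is not considered a viable overload.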
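A host-only reproduction of the failure makes this concrete. It assumes a trivial cuda_array stub in place of the real class. A C++11 detection trait (the hypothetical has_multiply) reports that A * b is ill-formed, even though the overload itself is valid.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Host-only stand-in for cuda_array (assumption: storage and the device
// pointer are omitted; only the template parameters matter here).
template <typename T, std::size_t N>
struct cuda_array {};

template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// After alias substitution the first parameter is cuda_array<T, M * N>.
// M appears only inside M * N (non-deduced) and in the return type
// (never used for deduction), so a plain call cannot deduce M.
template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(cuda_matrix<T, M, N> &, cuda_array<T, N> &) {
    return cuda_array<T, M>();
}

// Hypothetical C++11 detection trait: true if `l * r` is well-formed.
template <typename L, typename R, typename = void>
struct has_multiply : std::false_type {};

template <typename L, typename R>
struct has_multiply<L, R, decltype(void(std::declval<L &>() * std::declval<R &>()))>
    : std::true_type {};

// `A * b` from the question: deduction fails, so no operator* is viable.
static_assert(!has_multiply<cuda_matrix<int, 16, 32>, cuda_array<int, 32>>::value,
              "M is not deducible through the alias");
```

Supplying the template arguments explicitly, as in operator*<int, 16, 32>(A, b), does compile, because nothing has to be deduced. That is rarely a convenient way to call an operator, though.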

One workaround is to deduce the total size of the first cuda_array directly and then validate the deduced value:

template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN/N)*N==MN, cuda_array<T, MN/N>>::type;
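Here is a fuller sketch of that workaround, with the declaration given a body. Three pieces are assumptions so that it runs without a GPU: the host loop in place of the kernel launch, the row-major layout, and the vector-backed cuda_array stub.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for cuda_array: data lives in a std::vector and there
// is no device copy (assumption made so the example runs without CUDA).
template <typename T, std::size_t N>
struct cuda_array {
    std::vector<T> host = std::vector<T>(N);
    T &operator[](std::size_t i) { assert(i < N); return host[i]; }
    const T &operator[](std::size_t i) const { assert(i < N); return host[i]; }
};

// MN is deduced directly, since it is the literal template argument of the
// first operand. enable_if then removes the overload unless N divides MN.
template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN> &m, const cuda_array<T, N> &v)
    -> typename std::enable_if<(MN / N) * N == MN, cuda_array<T, MN / N>>::type {
    cuda_array<T, MN / N> result;
    // Host loop standing in for the matrix_vector_mul kernel launch;
    // m is assumed to be row-major with N columns.
    for (std::size_t i = 0; i < MN / N; ++i) {
        T sum = T();
        for (std::size_t j = 0; j < N; ++j)
            sum += m[i * N + j] * v[j];
        result[i] = sum;
    }
    return result;
}
```

With this overload, a cuda_array<int, 512> times a cuda_array<int, 32> yields a cuda_array<int, 16>. If the right operand's size does not divide the left one's, no operator* matches. The cost is that the shape is no longer part of the matrix type: the vector operand is what fixes N.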

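Or use the inheritance trick you already have. Then M and N are separate non-type template parameters of the class template cuda_matrix, so both can be deduced from the argument's type.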
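For comparison, here is a sketch of the inheritance route. It again assumes a host-only cuda_array stub and a host loop in place of the kernel. Because cuda_matrix<T, M, N> is now a specialization of a real class template, deduction reads T, M and N straight off the argument's type.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for cuda_array (assumption: no device memory).
template <typename T, std::size_t N>
class cuda_array {
    std::vector<T> host = std::vector<T>(N);
public:
    T &operator[](std::size_t i) { assert(i < N); return host[i]; }
    const T &operator[](std::size_t i) const { assert(i < N); return host[i]; }
};

// A real class template: M and N stay visible in the type and are deducible.
template <typename T, std::size_t M, std::size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};

template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(const cuda_matrix<T, M, N> &m, const cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    // Host loop standing in for the kernel launch; row-major M x N assumed.
    for (std::size_t i = 0; i < M; ++i) {
        T sum = T();
        for (std::size_t j = 0; j < N; ++j)
            sum += m[i * N + j] * v[j];
        result[i] = sum;
    }
    return result;
}
```

One side effect of this design is that a cuda_matrix can still be passed anywhere a cuda_array<T, M * N> is expected. The reverse does not hold: a plain cuda_array<T, M * N> no longer matches the matrix overload.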
