Suppose I compile the following with NVIDIA CUDA's nvcc compiler:
template<typename T, typename Operator>
__global__ void fooKernel(T t1, T t2) {
    Operator op;
    doSomethingWith(t1, t2);
}
template<typename T>
__device__ __host__ T bar(T t1, T t2) {
    return t1 + t2;
}
template<typename T, typename Operator>
void foo(T t1, T t2) {
    // Operator cannot be deduced from the arguments, so name it explicitly
    fooKernel<T, Operator><<<2, 2>>>(t1, t2);
}
// explicit instantiation
template decltype(foo<int, bar<int>>) foo<int, bar<int>>;
Now, I want my gcc-compiled (non-nvcc) code to call foo():
...
template<typename T, typename Operator> void foo(T t1, T t2);
foo<int, bar<int>> (123, 456);
...
The .o/.a/.so file I compile with nvcc contains what I believe is the appropriate (?) instantiation.
Can the gcc-compiled code link against it and call foo() with only the declaration above?
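To make the intended split concrete, here is a minimal sketch in plain C++ with no CUDA in it. It is an illustration under several assumptions, not the code above: the kernel launch is replaced by a direct call, foo returns T only so the result can be observed, and Operator is a functor type (the hypothetical `Plus`) because a `typename` parameter needs a type, whereas `bar<int>` names a function. `callFromHost` is also a hypothetical name. Both halves sit in one file here; in the real setup the first half would be the gcc-compiled caller and the second half the nvcc-compiled file.

```cpp
// ---- caller side: only a declaration of foo is visible ----
template<typename T, typename Operator>
T foo(T t1, T t2);  // returns T in this sketch so the result can be checked

// Hypothetical functor standing in for bar<int>; Operator must be a type.
template<typename T>
struct Plus {
    T operator()(T a, T b) const { return a + b; }
};

// Hypothetical caller: names exactly the specialization that the
// library side instantiates explicitly.
int callFromHost() {
    return foo<int, Plus<int>>(123, 456);
}

// ---- library side: definition plus explicit instantiation ----
template<typename T, typename Operator>
T foo(T t1, T t2) {
    Operator op;        // stands in for the kernel launch in the question
    return op(t1, t2);
}

// explicit instantiation: forces the foo<int, Plus<int>> symbol to be emitted
template int foo<int, Plus<int>>(int, int);
```

In this sketch the caller and the library must agree on the definition of `Plus`, since its name is part of the instantiated symbol; in a two-file layout that would presumably mean a shared header.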