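Because I need to sort large arrays of numbers with CUDA, I started using Thrust. So far, so good... but what do I do when I want to call a "handwritten" kernel and my data lives in a thrust::host_vector?
My approach was (the copy back to the host is still missing):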
int CUDA_CountAndAdd_Kernel(thrust::host_vector<float> *samples, thrust::host_vector<int> *counts, int n) {
    thrust::device_ptr<float> dSamples = thrust::device_malloc<float>(n);
    thrust::copy(samples->begin(), samples->end(), dSamples);
    thrust::device_ptr<int> dCounts = thrust::device_malloc<int>(n);
    thrust::copy(counts->begin(), counts->end(), dCounts);
    float *dSamples_raw = thrust::raw_pointer_cast(dSamples);
    int *dCounts_raw = thrust::raw_pointer_cast(dCounts);
    CUDA_CountAndAdd_Kernel<<<1, n>>>(dSamples_raw, dCounts_raw);
    thrust::device_free(dCounts);
    thrust::device_free(dSamples);
}
The kernel looks like this:
__global__ void CUDA_CountAndAdd_Kernel_Device(float *samples, int *counts)
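But compilation fails with:
error: argument of type "float *" is incompatible with parameter of type "thrust::host_vector<float, std::allocator<float>> *"
Huh?! I thought I was passing raw float and int pointers? Or am I missing something?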
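
For reference, here is a minimal sketch of what the launch might look like if the intent was to call the `__global__` kernel rather than the host wrapper with nearly the same name. The launch line above names `CUDA_CountAndAdd_Kernel`, which is the host function taking `thrust::host_vector` pointers, and that would be consistent with the parameter type in the error message. The kernel body, the extra bounds parameter `n`, the block size of 256, the error handling, and the use of `thrust::device_vector` in place of `device_malloc` are all assumptions made for illustration; none of them come from the original code.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>

// Hypothetical kernel body: the original post only shows the signature.
// The extra parameter n is an assumption, added for bounds checking.
__global__ void CUDA_CountAndAdd_Kernel_Device(float *samples, int *counts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && samples[i] > 0.0f) {
        counts[i] += 1;  // placeholder work, not the author's algorithm
    }
}

// Host wrapper: copies to the device, launches the kernel, copies back.
// Returns 0 on success and -1 on a CUDA error (an assumed convention).
int CUDA_CountAndAdd_Kernel(thrust::host_vector<float> *samples,
                            thrust::host_vector<int> *counts, int n)
{
    if (n <= 0) return 0;  // nothing to do; avoids a zero-block launch

    // device_vector allocates device memory and copies host -> device.
    thrust::device_vector<float> dSamples(samples->begin(), samples->begin() + n);
    thrust::device_vector<int> dCounts(counts->begin(), counts->begin() + n);

    float *dSamples_raw = thrust::raw_pointer_cast(dSamples.data());
    int *dCounts_raw = thrust::raw_pointer_cast(dCounts.data());

    // Launch the __global__ function, not the host wrapper.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    CUDA_CountAndAdd_Kernel_Device<<<blocks, threads>>>(dSamples_raw, dCounts_raw, n);

    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    // The "missing" copy back: device -> host.
    thrust::copy(dCounts.begin(), dCounts.end(), counts->begin());
    return 0;  // device_vector frees its memory on scope exit
}
```

Called as `CUDA_CountAndAdd_Kernel(&samples, &counts, n)`, this sketch writes the updated counts back into the host vector. Using several blocks instead of `<<<1, n>>>` also avoids the per-block thread limit when `n` is large.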