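I have Thrust code that loads a large array of data (2.4G) into memory, performs calculations on it whose results are stored on the host (~1.5G), then frees the initial data, loads the results onto the device, performs further calculations on them, and finally reloads the initial data. The Thrust code looks like this: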
thrust::host_vector<float> hostData;
// code that loads ~2.4G of data into hostData goes here
thrust::device_vector<float> deviceData = hostData;
thrust::host_vector<float> hostResult;
// code that performs calculations on deviceData and copies the result to hostResult (~1.5G) goes here
free<thrust::device_vector<float> >(deviceData);
thrust::device_vector<float> deviceResult = hostResult;
// code that performs calculations on deviceResult and also stores some results on the device goes here
free<thrust::device_vector<float> >(deviceResult);
deviceData = hostData;
using the free function I have defined:
template<class T> void free(T &V) {
    V.clear();
    V.shrink_to_fit();
    size_t mem_tot;
    size_t mem_free;
    cudaMemGetInfo(&mem_free, &mem_tot);
    std::cout << "Free memory : " << mem_free << std::endl;
}
template void free<thrust::device_vector<int> >(thrust::device_vector<int>& V);
template void free<thrust::device_vector<float> >(
thrust::device_vector<float>& V);
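However, when trying to copy hostData back to deviceData, I get a "terminate called after throwing an instance of 'thrust::system::detail::bad_alloc' what(): std::bad_alloc: out of memory" error, even though cudaMemGetInfo reports that my device has ~6G of free memory at that point. Here is the complete output from the free function: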
Free memory : 6295650304
Free memory : 6063775744
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
what(): std::bad_alloc: out of memory
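This seems to indicate that the device is out of memory even though there is plenty free. Is this the right way to free memory for Thrust vectors? I should also note that the code works well for smaller amounts of data (up to 1.5G).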
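
One release pattern that could be compared against the clear()/shrink_to_fit() approach is the swap-with-an-empty-temporary idiom. Below is a minimal host-only sketch of it. It uses std::vector rather than thrust::device_vector so that it compiles without a GPU, and release_storage and capacity_bytes are hypothetical names that are not part of Thrust or of the code above. Whether this pattern changes the out-of-memory behaviour described here is not established by the sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Thrust): give up a vector's storage by
// swapping it with an empty temporary of the same type. The temporary ends
// up owning the old buffer and releases it when it is destroyed at the end
// of the statement.
template <class Vec>
void release_storage(Vec& v) {
    Vec().swap(v);
}

// Hypothetical helper: number of bytes currently reserved by the vector,
// which can stay large after clear() because clear() only changes size().
template <class Vec>
std::size_t capacity_bytes(const Vec& v) {
    return v.capacity() * sizeof(typename Vec::value_type);
}
```

Calling release_storage(v) on a std::vector<float> leaves v empty, and capacity_bytes(v) can be used before and after to see what was released. With Thrust, the equivalent call would be release_storage(deviceData), on the assumption that thrust::device_vector's default constructor and swap member behave like their std::vector counterparts.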