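While experimenting with boost::compute, I ran into a problem determining the largest vector I can allocate on the device (I am still quite new to boost::compute). The following code snippet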
std::vector<cl_double> host_tmp;
std::cout << "CL_DEVICE_GLOBAL_MEM_SIZE / sizeof(cl_double) = " << device.get_info<cl_ulong>(CL_DEVICE_GLOBAL_MEM_SIZE) / sizeof(cl_double) << "\n";
std::cout << "CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double) = " << device.get_info<cl_ulong>(CL_DEVICE_MAX_MEM_ALLOC_SIZE) / sizeof(cl_double) << "\n";
size_t num_elements = device.get_info<cl_ulong>(CL_DEVICE_MAX_MEM_ALLOC_SIZE) / sizeof(cl_double);
compute::vector<cl_double> dev_tmp(context);
std::cout << "Maximum size of vector reported by .max_size() = " << dev_tmp.max_size() << "\n";
for (auto i = 0; i < 64; ++i) {
    std::cout << "Resizing device vector to " << num_elements << "...";
    dev_tmp.resize(num_elements, queue);
    std::cout << " done.";
    std::cout << " Assigning host data...";
    host_tmp.resize(num_elements);
    std::iota(host_tmp.begin(), host_tmp.end(), 0);
    std::cout << " done.";
    std::cout << " Copying data from host to device...";
    compute::copy(host_tmp.begin(), host_tmp.end(), dev_tmp.begin(), queue);
    std::cout << " done.\n";
    num_elements += 1024 * 1024;
}
produces this output:
CL_DEVICE_GLOBAL_MEM_SIZE / sizeof(cl_double) = 268435456
CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double) = 67108864
Maximum size of vector reported by .max_size() = 67108864
Resizing device vector to 67108864... done. Assigning host data... done. Copying data from host to device... done.
Resizing device vector to 68157440... done. Assigning host data... done. Copying data from host to device... done.
...
Resizing device vector to 101711872...Memory Object Allocation Failure
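Evidently, the reported max_size() is neither a hard limit nor enforced. I assume that, to be safe, I should stay within the reported max_size(); however, if I allocate several vectors of size max_size() on the device, I also get the Memory Object Allocation Failure message.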
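To make the element counts in the output easier to compare with the device's memory, they can be converted to bytes. The following is a minimal sketch, assuming sizeof(cl_double) is 8 bytes; elements_to_mib is a hypothetical helper introduced here for illustration, not part of boost::compute or OpenCL.

```cpp
#include <cstdint>

// Hypothetical helper: size in MiB of `count` elements of `elem_size` bytes.
// The default of 8 bytes assumes sizeof(cl_double) == 8.
constexpr std::uint64_t elements_to_mib(std::uint64_t count,
                                        std::uint64_t elem_size = 8)
{
    return count * elem_size / (1024ull * 1024ull);
}

// Element counts taken from the output above.
static_assert(elements_to_mib(268435456) == 2048, "CL_DEVICE_GLOBAL_MEM_SIZE");
static_assert(elements_to_mib(67108864) == 512, "CL_DEVICE_MAX_MEM_ALLOC_SIZE");
static_assert(elements_to_mib(101711872) == 776, "first failing resize");
```

Under that assumption, the first failing resize asked for 776 MiB, which is above the reported 512 MiB per-allocation limit but well below the 2048 MiB of global memory.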
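- What is the correct/usual way to handle (and avoid) memory allocation failures when using boost::compute?
- How can I determine the maximum size of a vector that can be allocated at any given moment (i.e. when the device may already hold allocated data)?
- If I have too much data, can I have boost::compute process it in chunks automatically, or do I have to split it up myself?
- How do I free memory on the device once I am done with it?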
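If the data does have to be split by hand, the decomposition itself could be sketched as follows. This is only a sketch under assumptions: plan_chunks is a hypothetical helper (not a boost::compute function), and max_chunk is assumed to be an element count the device can actually hold, for example CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double) or something smaller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of boost::compute): split `total` elements
// into consecutive (offset, count) ranges of at most `max_chunk` elements.
std::vector<std::pair<std::size_t, std::size_t>>
plan_chunks(std::size_t total, std::size_t max_chunk)
{
    assert(max_chunk > 0 && "chunk size must be positive");
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t offset = 0; offset < total; offset += max_chunk) {
        // The last chunk may be shorter than max_chunk.
        chunks.emplace_back(offset, std::min(max_chunk, total - offset));
    }
    return chunks;
}
```

Each (offset, count) pair could then drive one compute::copy(host_tmp.begin() + offset, host_tmp.begin() + offset + count, dev_tmp.begin(), queue) call into a device vector sized to max_chunk, followed by whatever processing is needed on that chunk.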