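Like all standard C++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.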
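Occasionally, it is desirable to customize the way device_vector allocates memory, as in the OP's case, where the goal is to sub-allocate storage within a single large allocation performed at program initialization. This can avoid the overhead that may be incurred by many individual calls to the underlying allocation scheme, in this case cudaMalloc.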
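A simple way to provide device_vector with a custom allocator is to inherit from device_malloc_allocator. One could in principle write an entire allocator from scratch, but with the inheritance approach, only the allocate and deallocate member functions need to be provided.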
Once the custom allocator is defined, it can be provided to device_vector as its second template parameter.
This example code demonstrates how to provide a custom allocator which prints a message upon allocation and deallocation:
#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
struct my_allocator : thrust::device_malloc_allocator<T>
{
  // shorthand for the name of the base class
  typedef thrust::device_malloc_allocator<T> super_t;

  // get access to some of the base class's typedefs
  // note that because we inherited from device_malloc_allocator,
  // pointer is actually thrust::device_ptr<T>
  typedef typename super_t::pointer pointer;
  typedef typename super_t::size_type size_type;

  // customize allocate
  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;

    // defer to the base class to allocate storage for n elements of type T
    // in practice, you'd do something more interesting here
    return super_t::allocate(n);
  }

  // customize deallocate
  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;

    // defer to the base class to deallocate n elements of type T at address p
    // in practice, you'd do something more interesting here
    super_t::deallocate(p, n);
  }
};

int main()
{
  // create a device_vector which uses my_allocator
  thrust::device_vector<int, my_allocator<int> > vec;

  // create 10 ints
  vec.resize(10, 13);

  return 0;
}
Here's the output:
$ nvcc my_allocator_test.cu -arch=sm_20 -run
my_allocator::allocate(): Hello, world!
my_allocator::deallocate(): Hello, world!
In this example, note that we hear from my_allocator::allocate() once, upon vec.resize(10,13). my_allocator::deallocate() is called once when vec goes out of scope and destroys its elements.
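Returning to the OP's goal of sub-allocating from one large allocation, the "something more interesting" in allocate and deallocate could be a bump-pointer arena. The following is only a host-side sketch of that idea, not Thrust code: bump_arena, arena_allocator and sub_allocation_demo are hypothetical names introduced for illustration, the big block comes from std::malloc rather than a single cudaMalloc, and the container is std::vector rather than device_vector. It assumes C++11 and alignments no larger than what std::malloc guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical bump arena: one large block obtained up front, handed out in slices.
// In the device case the block would come from a single cudaMalloc call instead.
class bump_arena
{
public:
  explicit bump_arena(std::size_t capacity)
    : base_(static_cast<char*>(std::malloc(capacity))), capacity_(capacity), offset_(0)
  {
    if (!base_) throw std::bad_alloc();
  }

  ~bump_arena() { std::free(base_); }

  bump_arena(const bump_arena&) = delete;
  bump_arena& operator=(const bump_arena&) = delete;

  // carve `bytes` out of the block; `alignment` must be a power of two
  void* allocate(std::size_t bytes, std::size_t alignment)
  {
    // round the current offset up to the next multiple of `alignment`
    std::size_t start = (offset_ + alignment - 1) & ~(alignment - 1);
    if (start > capacity_ || bytes > capacity_ - start) throw std::bad_alloc();
    offset_ = start + bytes;
    return base_ + start;
  }

  // a bump arena never reuses individual slices;
  // everything is released at once when the arena is destroyed
  void deallocate(void*, std::size_t) {}

  std::size_t used() const { return offset_; }

private:
  char*       base_;
  std::size_t capacity_;
  std::size_t offset_;
};

// Minimal standard-style allocator that forwards to a bump_arena.
template<typename T>
struct arena_allocator
{
  typedef T value_type;

  explicit arena_allocator(bump_arena& a) : arena(&a) {}

  template<typename U>
  arena_allocator(const arena_allocator<U>& other) : arena(other.arena) {}

  T* allocate(std::size_t n)
  {
    return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
  }

  void deallocate(T* p, std::size_t n) { arena->deallocate(p, n * sizeof(T)); }

  bump_arena* arena;  // not owned; the arena must outlive every container using it
};

template<typename T, typename U>
bool operator==(const arena_allocator<T>& a, const arena_allocator<U>& b) { return a.arena == b.arena; }

template<typename T, typename U>
bool operator!=(const arena_allocator<T>& a, const arena_allocator<U>& b) { return !(a == b); }

// Two vectors share one arena; returns the number of bytes they consumed from it.
std::size_t sub_allocation_demo()
{
  bump_arena arena(1 << 20);  // the single large allocation, declared first so it is destroyed last
  arena_allocator<int> alloc(arena);

  std::vector<int, arena_allocator<int> > a(alloc);
  std::vector<int, arena_allocator<int> > b(alloc);
  a.resize(10, 13);
  b.resize(10, 7);

  assert(a[9] == 13 && b[0] == 7);
  return arena.used();
}
```

Calling sub_allocation_demo() from main exercises the arena. To carry the same idea over to my_allocator, allocate would presumably return a thrust::device_ptr<T> wrapping a slice of the device block instead of a raw T*. A bump arena is the simplest possible scheme and never reclaims individual slices, so a program whose vectors grow and shrink repeatedly would likely want a free list or a pooling scheme instead.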