c++ - Avoiding initialization in a C++ vector or valarray

Question

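In my project I have to copy a large amount of numerical data from a CUDA (GPU) device, that is, from the video card's memory, into a std::valarray (or std::vector).

So I need to resize these data structures as fast as possible, but when I call the member function vector::resize, it uses a loop to initialize every element of the array to its default value.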

// In a super simplified description, resize behaves like this pseudocode:
vector<T>::resize(N){
   // Setup the new size

   // allocate the new array
   this->_internal_vector = new T[N];

   // init to default
   // This loop is slow !!!!
   for ( i = 0; i < N ; ++i){
      this->_internal_vector[i] = T();
   }
}

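Obviously I don't need this initialization, because I have to copy the data from the GPU and all the old data gets overwritten. The initialization takes time, so I lose performance.

To process the data I need allocated memory, which is what the resize() method provides.

My very dirty and wrong solution is to use vector::reserve(), but then I lose all the features of vector, and if I later resize, the data is replaced with default values.

So, do you know of a strategy that avoids this pre-initialization to default values (in a valarray or a vector)?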

I want a resize method that behaves like this:
vector<T>::resize(N) {
    // Allocate the memory.
    this->_internal_vector = new T[N];

    // Update the size of the vector or valarray

    // !! DO NOT initialize the new values.
}

Performance example:

#include <ctime>
#include <iostream>
#include <valarray>
#include <vector>

int main() {

  std::vector<double> vec;
  std::valarray<double> vec2;

  double *vec_raw;

  unsigned int N = 100000000;

  std::clock_t start;
  double duration;

  start = std::clock();
  // Dirty solution!
  vec.reserve(N);

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration reserve: " << duration << std::endl;

  start = std::clock();

  vec_raw = new double[N];

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration new: " << duration << std::endl;

  start = std::clock();

  for (unsigned int i = 0; i < N; ++i) {
    vec_raw[i] = 0;
  }

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration raw init: " << duration << std::endl;

  start = std::clock();
  // Dirty solution: writing past size() (even within capacity()) is undefined behavior
  for (unsigned int i = 0; i < vec.capacity(); ++i) {
    vec[i] = 0;
  }

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration vec init dirty: " << duration << std::endl;

  start = std::clock();

  vec2.resize(N);

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration valarray resize: " << duration << std::endl;

  delete[] vec_raw;
  return 0;
}

Output:

duration reserve: 1.1e-05
duration new: 1e-05
duration raw init: 0.222263
duration vec init dirty: 0.214459
duration valarray resize: 0.215735

Note: replacing std::allocator does not help, because the loop is invoked by resize().

score 3 · Accepted Answer

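Suppose you have an array (or some other collection) called data, and you want to copy it into a vector vec. The idiomatic way is to call std::vector::reserve and then std::vector::push_back. std::vector::reserve allocates memory for the std::vector but does not initialize that memory or change the vector's size. std::vector::push_back inserts the data and updates the vector's size. Alternatively, use the std::vector::insert overload that takes two iterators, which avoids a loop that pushes back each element individually.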

std::vector<double> vec;
vec.reserve(std::size(data)); // Allocate all data in one call.
vec.insert(std::begin(vec), std::begin(data), std::end(data)); // Insert the data elements.

Alternatively, you can use the std::vector constructor overload that takes two iterators:

std::vector<double> vec{std::begin(data), std::end(data)};

This also allocates all the memory in one call and then adds the elements.
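As a minimal sketch of the two approaches above, assuming the source data is readable from the host through an ordinary pointer (a device pointer would first need a copy such as cudaMemcpy). The helper names copy_with_reserve and copy_with_range_ctor are hypothetical and only serve the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy n doubles from a host buffer into a new vector.
// reserve() allocates once without constructing elements; insert() then
// constructs each element directly from the source, so nothing is zeroed first.
std::vector<double> copy_with_reserve(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::vector<double> vec;
    vec.reserve(n);
    vec.insert(vec.end(), data, data + n);
    return vec;
}

// Same result through the iterator-pair constructor.
std::vector<double> copy_with_range_ctor(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    return std::vector<double>(data, data + n);
}
```

Both functions touch each element exactly once. Neither helps when an API needs an already-sized destination buffer to write into, which is the situation described in the question.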

Update

If you know the data size in advance, you can simply use std::array, for example:

constexpr std::size_t N = 10'000;
std::array<double, N> arr;

arr[5432] = 2.5; // Perfectly valid.
// Or e.g. for CUDA. Note that cudaMemcpy expects the size in bytes.
cudaMemcpy(std::data(arr), gpu_arr, std::size(arr) * sizeof(double), cudaMemcpyDeviceToHost);

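All the storage is allocated at once, and the elements are not zeroed: they are default-initialized, which for fundamental types means nothing is done and the values are indeterminate.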
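The size argument of a raw copy such as cudaMemcpy is a byte count, not an element count. Below is a minimal host-only sketch of the same pattern, with std::memcpy standing in for cudaMemcpy so that it runs without a GPU. The helper name fill_from is hypothetical and not part of the answer:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t N = 10000;

// Hypothetical helper: overwrite all N elements of arr with the data at src.
// std::memcpy is a stand-in for cudaMemcpy; both take the size in BYTES.
void fill_from(std::array<double, N>& arr, const double* src) {
    assert(src != nullptr);
    std::memcpy(arr.data(), src, arr.size() * sizeof(double));
}
```

A std::array declared as a local variable lives on the stack, so this fits moderate sizes such as the 10'000 elements used here rather than the 100000000 elements of the question's benchmark.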

std::array has all the benefits of a C++ collection, such as std::size, std::begin, std::end, std::data, and so on.

score 1 · Accepted Answer

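If you are working with plain old data (no pointers or references, just integers and floating-point numbers), it is best to just use a plain old array. Combine that with correct use of memcpy(), and you are almost guaranteed to get better performance than with any native C++ implementation.

The point is that C++ cannot really treat a big slab of data as a big slab of data. It has to deal with individual objects of unknown type. It does not know whether these objects can be copied by copying their bits, so it must call the appropriate default, copy, or move constructor, (move) assignment operator, and destructor for each individual element. While a good C++ compiler can remove most of the resulting cruft, the result usually cannot compete with a carefully hand-optimized memcpy() implementation, which simply copies blocks of 16 or more bytes, blissfully unaware of whether they are actually eight shorts, two doubles, or 1.33 struct { float x, y, z; }.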
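A minimal sketch of the plain-array approach, assuming the element type is trivially copyable. The helper name copy_to_raw_buffer is hypothetical. It relies on new T[n] default-initializing its elements, which for types such as double performs no writes, whereas std::make_unique<T[]>(n) would value-initialize, that is, zero the buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical helper: allocate n elements without zeroing them, then fill
// the buffer with a raw byte copy from src.
template <typename T>
std::unique_ptr<T[]> copy_to_raw_buffer(const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    assert(src != nullptr || n == 0);
    // Default-initialization: for double or int the memory is left untouched.
    std::unique_ptr<T[]> buf(new T[n]);
    if (n != 0) {
        std::memcpy(buf.get(), src, n * sizeof(T));
    }
    return buf;
}
```

The static_assert makes the "plain old data" precondition explicit, and the std::unique_ptr releases the array with delete[] automatically.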
