I'm learning C++ from Stroustrup's book and working on this exercise:
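Measure the time (§26.6.1) it takes to allocate 10,000 objects of random sizes in the [1000:0) byte range using new; then measure the time it takes to deallocate them using delete. Do this twice, once deallocating in the reverse order of allocation and once deallocating in random order. Then, do the equivalent for allocating 10,000 objects of size 500 bytes from a pool and freeing them. Then, do the equivalent of allocating 10,000 objects of random sizes in the [1000:0) byte range on a stack and then freeing them (in reverse order). Compare the measurements. Do each measurement at least three times to make sure the results are consistent.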
Here is my code:
#include <array>
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>
#include <random>
#include <memory_resource>

int main() {
    constexpr size_t N = 1'000;
    constexpr size_t SZ = 1'000;
    constexpr size_t trials = 1;
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<> sz(1, SZ);
    std::array<char*, N> arr {};

    std::chrono::duration<int, std::micro> dt1 {0};
    // allocate random size, deallocate in reverse order
    for (size_t t = 0; t < trials; t++) {
        auto t1 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < N; i++) {
            arr[i] = new char[sz(gen)];
        }
        // i counts down from N - 1; the loop stops when i wraps around past 0
        for (size_t i = N - 1; i < N; i--) {
            delete[] arr[i];  // memory from new[] must be freed with delete[]
        }
        auto t2 = std::chrono::steady_clock::now();
        dt1 += std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1);
    }
    std::cout << dt1.count() / trials << "us\n";

    std::vector<int> v (N);
    std::iota(v.begin(), v.end(), 0);
    std::shuffle(v.begin(), v.end(), gen);
    std::chrono::duration<int, std::micro> dt2 {0};
    // allocate random size, deallocate in random order
    for (size_t t = 0; t < trials; t++) {
        auto t1 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < N; i++) {
            arr[i] = new char[sz(gen)];
        }
        for (size_t i = 0; i < N; i++) {
            delete[] arr[v[i]];
        }
        auto t2 = std::chrono::steady_clock::now();
        dt2 += std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1);
    }
    std::cout << dt2.count() / trials << "us\n";

    std::pmr::unsynchronized_pool_resource pool;
    std::array<std::pmr::vector<std::byte>, N> arr2;
    std::chrono::duration<int, std::micro> dt3 {0};
    // allocate fixed size in a pool, deallocate in reverse order
    for (size_t t = 0; t < trials; t++) {
        auto t1 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < N; i++) {
            // polymorphic_allocator does not propagate on move assignment, so
            // arr2[i] keeps its own (default-resource) allocator and moves the
            // elements into storage from that allocator, not from the pool
            arr2[i] = std::pmr::vector<std::byte>(500, &pool);
        }
        for (size_t i = N - 1; i < N; i--) {
            // clear() destroys the elements but keeps the capacity,
            // so no memory is released here
            arr2[i].clear();
        }
        auto t2 = std::chrono::steady_clock::now();
        dt3 += std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1);
    }
    std::cout << dt3.count() / trials << "us\n";

    std::chrono::duration<int, std::micro> dt4 {0};
    // allocate fixed size in a pool, deallocate in random order
    for (size_t t = 0; t < trials; t++) {
        auto t1 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < N; i++) {
            arr2[i] = std::pmr::vector<std::byte>(500, &pool);
        }
        for (size_t i = 0; i < N; i++) {
            arr2[v[i]].clear();
        }
        auto t2 = std::chrono::steady_clock::now();
        dt4 += std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1);
    }
    std::cout << dt4.count() / trials << "us\n";

    std::array<std::byte, N * SZ> buf;
    std::chrono::duration<int, std::micro> dt5 {0};
    // allocate random size from a monotonic buffer on the stack; individual
    // deallocations are no-ops, and pool2 releases everything when destroyed
    for (size_t t = 0; t < trials; t++) {
        std::pmr::monotonic_buffer_resource pool2 {buf.data(), buf.size()};
        auto t1 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < N; i++) {
            arr2[i] = std::pmr::vector<std::byte>(sz(gen), &pool2);
        }
        auto t2 = std::chrono::steady_clock::now();
        dt5 += std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1);
    }
    std::cout << dt5.count() / trials << "us\n";
}
Here is a typical output on my machine with trials = 1, compiled with gcc 9.1 at -O3:
901us
137us
637us
210us
1018us
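So my initial conclusion was: "Wow, deallocating in reverse order is much slower than deallocating in random order! That must be the lesson of this exercise."
However, that turned out to be wrong. When I increased the number of trials to get an averaged (i.e., consistent) benchmark, something strange happened:
Setting trials = 30 gives: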
233us
206us
348us
226us
365us
Setting trials = 1'000 gives:
167us
154us
221us
195us
194us
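So when I allocate and deallocate over and over, deallocating in reverse order is not noticeably slower than deallocating in random order!
Why does this happen? Why is deallocating in reverse order slow the first time?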