比你增加或减少计数器更重要的是你是增加内存还是减少内存。大多数缓存都针对增加内存而不是减少内存进行了优化。由于内存访问时间是当今大多数程序面临的瓶颈,这意味着更改程序以增加内存可以提高性能,即使这需要将计数器与非零值进行比较。在我的一些程序中,通过将代码更改为增加内存而不是减少内存,我看到了性能的显着提高。
持怀疑态度?这是我得到的输出:
sum up = 705046256
sum down = 705046256
Ave. Up Memory = 4839 mus
Ave. Down Memory = 5552 mus
sum up = inf
sum down = inf
Ave. Up Memory = 18638 mus
Ave. Down Memory = 19053 mus
从运行这个程序:
#include <chrono>
#include <iostream>
#include <random>
#include <vector>
template<class Iterator, typename T>
void FillWithRandomNumbers(Iterator start, Iterator one_past_end, T a, T b) {
std::random_device rnd_device;
std::mt19937 generator(rnd_device());
std::uniform_int_distribution<T> dist(a, b);
for (auto it = start; it != one_past_end; it++)
*it = dist(generator);
return ;
}
template<class Iterator>
void FillWithRandomNumbers(Iterator start, Iterator one_past_end, double a, double b) {
std::random_device rnd_device;
std::mt19937_64 generator(rnd_device());
std::uniform_real_distribution<double> dist(a, b);
for (auto it = start; it != one_past_end; it++)
*it = dist(generator);
return ;
}
template<class RAI, class T>
inline void sum_abs_up(RAI first, RAI one_past_last, T &total) {
T sum = 0;
auto it = first;
do {
sum += *it;
it++;
} while (it != one_past_last);
total += sum;
}
template<class RAI, class T>
inline void sum_abs_down(RAI first, RAI one_past_last, T &total) {
T sum = 0;
auto it = one_past_last;
do {
it--;
sum += *it;
} while (it != first);
total += sum;
}
template<class T> std::chrono::nanoseconds TimeDown(
std::vector<T> &vec, const std::vector<T> &vec_original,
std::size_t num_repititions, T &running_sum) {
std::chrono::nanoseconds total{0};
for (std::size_t i = 0; i < num_repititions; i++) {
auto start_time = std::chrono::high_resolution_clock::now();
sum_abs_down(vec.begin(), vec.end(), running_sum);
total += std::chrono::high_resolution_clock::now() - start_time;
vec = vec_original;
}
return total;
}
template<class T> std::chrono::nanoseconds TimeUp(
std::vector<T> &vec, const std::vector<T> &vec_original,
std::size_t num_repititions, T &running_sum) {
std::chrono::nanoseconds total{0};
for (std::size_t i = 0; i < num_repititions; i++) {
auto start_time = std::chrono::high_resolution_clock::now();
sum_abs_up(vec.begin(), vec.end(), running_sum);
total += std::chrono::high_resolution_clock::now() - start_time;
vec = vec_original;
}
return total;
}
int main() {
std::size_t num_repititions = 1 << 10;
{
typedef int ValueType;
auto lower = std::numeric_limits<ValueType>::min();
auto upper = std::numeric_limits<ValueType>::max();
std::vector<ValueType> vec(1 << 24);
FillWithRandomNumbers(vec.begin(), vec.end(), lower, upper);
const auto vec_original = vec;
ValueType sum_up = 0, sum_down = 0;
auto time_up = TimeUp(vec, vec_original, num_repititions, sum_up).count();
auto time_down = TimeDown(vec, vec_original, num_repititions, sum_down).count();
std::cout << "sum up = " << sum_up << '\n';
std::cout << "sum down = " << sum_down << '\n';
std::cout << "Ave. Up Memory = " << time_up/(num_repititions * 1000) << " mus\n";
std::cout << "Ave. Down Memory = "<< time_down/(num_repititions * 1000) << " mus"
<< std::endl;
}
{
typedef double ValueType;
auto lower = std::numeric_limits<ValueType>::min();
auto upper = std::numeric_limits<ValueType>::max();
std::vector<ValueType> vec(1 << 24);
FillWithRandomNumbers(vec.begin(), vec.end(), lower, upper);
const auto vec_original = vec;
ValueType sum_up = 0, sum_down = 0;
auto time_up = TimeUp(vec, vec_original, num_repititions, sum_up).count();
auto time_down = TimeDown(vec, vec_original, num_repititions, sum_down).count();
std::cout << "sum up = " << sum_up << '\n';
std::cout << "sum down = " << sum_down << '\n';
std::cout << "Ave. Up Memory = " << time_up/(num_repititions * 1000) << " mus\n";
std::cout << "Ave. Down Memory = "<< time_down/(num_repititions * 1000) << " mus"
<< std::endl;
}
return 0;
}
两者都sum_abs_up
做sum_abs_down
同样的事情,并且以相同的方式计时,唯一的区别是sum_abs_up
内存增加而sum_abs_down
内存减少。我什至通过vec
引用传递,以便两个函数访问相同的内存位置。然而,sum_abs_up
始终比sum_abs_down
. 自己运行一下(我用 g++ -O3 编译它)。
FYIvec_original
在那里进行实验,让我更容易改变sum_abs_up
,并sum_abs_down
以一种使它们改变的方式,vec
同时不允许这些改变影响未来的时间安排。
重要的是要注意我计时的循环有多紧。如果一个循环的主体很大,那么它的迭代器是否会增加或减少内存可能并不重要,因为执行循环主体所需的时间可能会完全占主导地位。此外,重要的是要提到,对于一些罕见的循环,下降内存有时比上升内存要快。但即使有这样的循环,上升总是比下降慢的情况很少发生(不像上升内存的循环,它通常总是比等效的内存下降循环快;少数几次甚至是 40 +% 更快)。
关键是,根据经验,如果你有选择,如果循环的主体很小,如果让你的循环增加内存而不是减少内存之间没有什么区别,那么你应该增加内存。