我目前正在学习 Linux 下的各种分析和性能实用程序,尤其是 valgrind/cachegrind。
我有以下玩具程序:
#include <iostream>
#include <vector>
int
main() {
const unsigned int COUNT = 1000000;
std::vector<double> v;
for(int i=0;i<COUNT;i++) {
v.push_back(i);
}
double counter = 0;
for(int i=0;i<COUNT;i+=8) {
counter += v[i+0];
counter += v[i+1];
counter += v[i+2];
counter += v[i+3];
counter += v[i+4];
counter += v[i+5];
counter += v[i+6];
counter += v[i+7];
}
std::cout << counter << std::endl;
}
g++ -O2 -g main.cpp
编译并运行该程序valgrind --tool=cachegrind ./a.out
,然后cg_annotate cachegrind.out.31694 --auto=yes
产生以下结果:
--------------------------------------------------------------------------------
-- Auto-annotated source: /home/andrej/Data/projects/pokusy/dod.cpp
--------------------------------------------------------------------------------
Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
. . . . . . . . . #include <iostream>
. . . . . . . . . #include <vector>
. . . . . . . . .
. . . . . . . . . int
7 1 1 1 0 0 4 0 0 main() {
. . . . . . . . . const unsigned int COUNT = 1000000;
. . . . . . . . .
. . . . . . . . . std::vector<double> v;
. . . . . . . . .
5,000,000 0 0 1,999,999 0 0 0 0 0 for(int i=0;i<COUNT;i++) {
3,000,000 0 0 0 0 0 1,000,000 0 0 v.push_back(i);
. . . . . . . . . }
. . . . . . . . .
3 0 0 0 0 0 0 0 0 double counter = 0;
250,000 0 0 0 0 0 0 0 0 for(int i=0;i<COUNT;i+=8) {
250,000 0 0 125,000 1 1 0 0 0 counter += v[i+0];
125,000 0 0 125,000 0 0 0 0 0 counter += v[i+1];
125,000 1 1 125,000 0 0 0 0 0 counter += v[i+2];
125,000 0 0 125,000 0 0 0 0 0 counter += v[i+3];
125,000 0 0 125,000 0 0 0 0 0 counter += v[i+4];
125,000 0 0 125,000 0 0 0 0 0 counter += v[i+5];
125,000 0 0 125,000 125,000 125,000 0 0 0 counter += v[i+6];
125,000 0 0 125,000 0 0 0 0 0 counter += v[i+7];
. . . . . . . . . }
. . . . . . . . .
. . . . . . . . . std::cout << counter << std::endl;
11 0 0 6 1 1 0 0 0 }
我担心的是这条线:
125,000 0 0 125,000 125,000 125,000 0 0 0 counter += v[i+6];
为什么这条线有这么多缓存未命中?数据在连续内存中,每次迭代我都在读取 64 字节的数据(假设缓存线的长度为 64 字节)。
我在 Ubuntu Linux 18.04.1、内核 4.19、g++ 7.3.0 上运行这个程序。电脑是AMD 2400G。