c - Cache hits, misses, and prediction - impact on performance

Question

I wrote the following toy benchmark.

int N = 1024*4096;
unsigned char *ary = malloc(N);
ary[0] = 1;
int stride, i;
double start, end;
int sum;
for(stride = 1; stride < N; ++stride) {
    start = getCPUTime();

    sum = 0;
    for(i = 0; i < N; i+=stride) {
        sum += ary[i];
    }

    end = getCPUTime();
    printf("stride %d time %f sum %d\n", stride, (end - start)/(N/stride), sum);
}
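
The snippet calls getCPUTime() without showing it. A minimal stand-in, assuming the standard clock() from <time.h> is precise enough for this purpose (its resolution is platform-dependent and may be coarse), might look like the sketch below. The body is an assumption, not the function actually used for the measurements.

```c
#include <time.h>

/* Hypothetical stand-in for the getCPUTime() called in the benchmark:
   returns the processor time used by the program so far, in seconds. */
double getCPUTime(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}
```

With this definition, `end - start` is in seconds, so the printed value is seconds per access.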

Basically, it walks through the array with different strides. I then plotted the results:

[Image: plot of average time per access against stride]

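(The results are smoothed.)

When the stride is ~128, the CPU can fit all the data that will be accessed into the L1 cache. Given that the accesses are linear, future reads should be predictable.

My question is: why does the average read time keep increasing after that? My reasoning for stride=~128 should also hold for larger values.

Thanks!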

score 0 · Accepted Answer

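Is that the code you actually used? All it does is read data from 16 MB. I ran it on my PC, where the 16 MB comes from RAM, and calculated MB/second: 993 at stride 2, falling to 880 at stride 999. With the running time measured in microseconds, your time calculation gives 0.0040 at stride 2, rising to 0.0045 at stride 999.

There are all sorts of reasons why speed can fall as the stride increases, such as burst reads, cache alignment, and different memory banks.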
