c - 为什么循环裂变在这种情况下有意义？

Question

没有裂变的代码是这样的：

int check(int * res, char * map, int n, int * keys){
    int ret = 0;
    for(int i = 0; i < n; ++i){
        res[ret] = i;
        ret += map[hash(keys[i])]
    }
    return ret;
}

裂变：

int check(int * res, char * map, int n, int * keys){
    int ret = 0;
    for(int i = 0; i < n; ++i){
        tmp[i] = map[hash(keys[i])];
    }
    for(int i = 0; i < n; ++i){
        res[ret] = i;
        ret += tmp[i];
    }
    return ret;
}

笔记：

瓶颈是map[hash(keys[i])]随机访问内存。
通常，这是if(tmp[i]) res[ret++] = i;为了避免 if，我正在使用ret += tmp[i].
map[..]始终为 0 或 1

裂变版本通常要快得多，我试图解释原因。我最好的猜测是，它ret += map[..]仍然引入了一些依赖，并阻止了投机执行。

我想听听是否有人有更好的解释。

score 8 · Accepted Answer

从我的测试中，我得到了融合和分裂循环之间大约 2 倍的速度差异。无论我如何调整循环，这种速度差异都非常一致。

Fused: 1.096258 seconds
Split: 0.562272 seconds

（请参阅底部的完整测试代码。）

虽然我不是 100% 确定，但我怀疑这是由于两件事的结合：

由于来自map[gethash(keys[i])]. _
融合循环版本中添加的依赖项。

很明显，map[gethash(keys[i])]几乎每次都会导致缓存未命中。事实上，可能足以使整个加载存储缓冲区饱和。

现在让我们看看添加的依赖项。问题是ret变量：

int check_fused(int * res, char * map, int n, int * keys){
    int ret = 0;
    for(int i = 0; i < n; ++i){
        res[ret] = i;
        ret += map[gethash(keys[i])];
    }
    return ret;
}

store的地址解析需要该ret变量。res[ret] = i;

在融合循环ret中，来自确定的缓存未命中。
在拆分循环中，ret即将到来tmp[i]- 这要快得多。

融合循环情况的地址解析延迟可能会导致res[ret] = i存储与map[gethash(keys[i])].

由于加载存储缓冲区的大小是固定的，但其中的垃圾数量增加了一倍：
您只能将缓存未命中重叠的次数减少到以前的一半。因此减速了 2 倍。

假设如果我们将融合循环更改为：

int check_fused(int * res, char * map, int n, int * keys){
    int ret = 0;
    for(int i = 0; i < n; ++i){
        res[i] = i;    //  Change "res" to "i"
        ret += map[gethash(keys[i])];
    }
    return ret;
}

这将打破地址解析依赖性。

^{（请注意，它不再相同，但这只是为了展示性能差异。）}

然后我们得到类似的时间：

Fused: 0.487477 seconds
Split: 0.574585 seconds

这是完整的测试代码：

#define SIZE 67108864

unsigned gethash(int key){
    return key & (SIZE - 1);
}

int check_fused(int * res, char * map, int n, int * keys){
    int ret = 0;
    for(int i = 0; i < n; ++i){
        res[ret] = i;
        ret += map[gethash(keys[i])];
    }
    return ret;
}
int check_split(int * res, char * map, int n, int * keys, int *tmp){
    int ret = 0;
    for(int i = 0; i < n; ++i){
        tmp[i] = map[gethash(keys[i])];
    }
    for(int i = 0; i < n; ++i){
        res[ret] = i;
        ret += tmp[i];
    }
    return ret;
}


int main()
{
    char *map = (char*)calloc(SIZE,sizeof(char));
    int *keys =  (int*)calloc(SIZE,sizeof(int));
    int *res  =  (int*)calloc(SIZE,sizeof(int));
    int *tmp  =  (int*)calloc(SIZE,sizeof(int));
    if (map == NULL || keys == NULL || res == NULL || tmp == NULL){
        printf("Memory allocation failed.\n");
        system("pause");
        return 1;
    }

    //  Generate Random Data
    for (int i = 0; i < SIZE; i++){
        keys[i] = (rand() & 0xff) | ((rand() & 0xff) << 16);
    }

    printf("Start...\n");

    double start = omp_get_wtime();
    int ret;

    ret = check_fused(res,map,SIZE,keys);
//    ret = check_split(res,map,SIZE,keys,tmp);

    double end = omp_get_wtime();

    printf("ret = %d",ret);
    printf("\n\nseconds = %f\n",end - start);

    system("pause");
}

score 1 · Accepted Answer

我不认为这是数组索引，而是对函数的调用hash()可能导致管道停顿并阻止最佳指令重新排序。

c - 为什么循环裂变在这种情况下有意义？

2 回答 2

Related

Reference