c++ - Array traversal vs. pointers, in terms of cache efficiency

Question

void foo(Node* p[], int size){

    _uint64 arr_of_values[_MAX_THREADS];


    for (int i=0 ; i < size ; i++ ){
         arr_of_values[i] = p[i]->....;

         // much code here 
         // 
      }
 }

versus

void foo(Node* p[], int size){

    _uint64 arr_of_values[_MAX_THREADS];

    Node** p_end = p + size;           // one past the last pointer in the array
    for (int i = 0 ; p != p_end ; i++ ){
         arr_of_values[i] = (*p)->.....;
         p++;


         // much code here 
         // 
     }

}

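I wrote this function to illustrate what I am asking:

In terms of cache efficiency, which is better: accessing p[i] or using *p++?

(I will never use p[ix] in the rest of the code, but I may use p[i] or *p in the calculations that follow.)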

score 2 · Accepted Answer

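The most important thing is to avoid false sharing. Each thread writes to its own slot in arr_of_values, but 8 or 16 slots share a single cache line (depending on the CPU), which causes heavy false sharing. Either add padding between the slots so that each thread's slot is aligned to its own cache line, or accumulate on the stack and write only once at the end: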

void foo(Node* p[], int size){

    _uint64 arr_of_values[_MAX_THREADS];

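    Node** p_end = p + size;           // one past the last pointer in the array
    _uint64 temp = 0;                  // accumulate locally instead of in the shared array
    for ( ; p != p_end ; ){            
         temp = .....;
         p++;
         // much code here 
         // 
     }  
     arr_of_values[i] = temp;          // i is this thread's slot; written once, at the end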
}
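
The padding alternative mentioned above could look roughly like the sketch below. It assumes a 64-byte cache line (typical on x86-64, but not universal), and the names PaddedSlot, kMaxThreads, slots and publish are hypothetical, as is the summation that stands in for the elided computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines; adjust for the target CPU.
constexpr std::size_t kCacheLineSize = 64;
// Hypothetical stand-in for _MAX_THREADS from the question.
constexpr std::size_t kMaxThreads = 8;

// alignas rounds sizeof(PaddedSlot) up to a full cache line, so two
// neighbouring array elements never share a line.
struct alignas(kCacheLineSize) PaddedSlot {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedSlot) == kCacheLineSize, "one slot per cache line");

// Shared by all threads; thread t writes only slots[t].
PaddedSlot slots[kMaxThreads];

// Accumulate in a local variable, then publish once to this thread's slot.
// The sum is only a placeholder for the real per-thread computation.
void publish(std::size_t thread_index, const std::uint64_t* data, std::size_t n) {
    assert(thread_index < kMaxThreads);
    std::uint64_t temp = 0;            // register or this thread's own stack
    for (std::size_t i = 0; i < n; ++i) {
        temp += data[i];
    }
    slots[thread_index].value = temp;  // single write to shared memory
}
```

With both measures combined, each thread touches shared memory once, and that one write lands on a cache line no other thread writes to.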

Whether you access through a pointer or through an index makes no difference with today's compilers.
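
To make the comparison concrete, here is a minimal sketch of the two traversal styles written so that both compile. The Node definition and its value member are hypothetical, since the question elides the member being read. Both loops read the same pointers in the same order, so their memory access pattern is identical; an optimizing compiler will typically generate very similar code for them, though only inspecting the generated code or measuring confirms that for a given build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical Node; the question does not show its members.
struct Node {
    std::uint64_t value;
};

// Index-based traversal.
std::uint64_t sum_by_index(Node* const p[], std::size_t size) {
    assert(p != nullptr || size == 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < size; ++i) {
        total += p[i]->value;
    }
    return total;
}

// Pointer-based traversal. The end pointer is p + size (one past the last
// slot), not p[size], which would read an element outside the array.
std::uint64_t sum_by_pointer(Node* const p[], std::size_t size) {
    assert(p != nullptr || size == 0);
    std::uint64_t total = 0;
    Node* const* end = p + size;
    for (Node* const* it = p; it != end; ++it) {
        total += (*it)->value;
    }
    return total;
}
```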

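Your next steps should be: get a copy of the software optimization manual. Read it. Measure. Fix the hotspots you measured instead of guessing.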

score 1 · Accepted Answer

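From a cache point of view, the question is not how you access the elements. Using a pointer or an array index is equivalent in this case.

BTW, Node* p[] is an array of pointers, so your Node objects may have been allocated in memory regions far apart from each other (for example, with several calls to ptr = new Node()). You get the best cache performance when:

Your nodes are stored contiguously in memory.
The node size does not exceed the cache size.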
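
One way to get contiguous nodes while keeping the Node* p[] interface is to store the nodes by value and build the pointer table afterwards. This is a sketch under assumptions: the Node layout and the helper names make_nodes and make_pointer_table are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Node; the question does not show its members.
struct Node {
    std::uint64_t value;
};

// Store the nodes by value: std::vector keeps its elements back to back,
// unlike separate `new Node()` calls, which may land anywhere on the heap.
std::vector<Node> make_nodes(std::size_t n) {
    std::vector<Node> nodes(n);
    for (std::size_t i = 0; i < n; ++i) {
        nodes[i].value = i;
    }
    return nodes;
}

// Build the pointer table that a function taking Node* p[] expects.
// The pointers stay valid only while `nodes` is neither resized nor destroyed.
std::vector<Node*> make_pointer_table(std::vector<Node>& nodes) {
    std::vector<Node*> table;
    table.reserve(nodes.size());
    for (Node& node : nodes) {
        table.push_back(&node);
    }
    assert(table.size() == nodes.size());
    return table;
}
```

Walking table.data() then visits adjacent addresses in order, which is the access pattern hardware prefetchers handle best.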
