
I have the following loop, which I am running on an ARM processor.

// pin here is a pointer into some part of an array
for (i = 0; i < v->numelements; i++)
{
    pe   = pptr[i];
    peParent = pe->parent;

    SPHERE  *ps = (SPHERE *)(pe->data);

    pin[0] = FLOAT2FIX(ps->rad2);
    pin[1] = *peParent->procs->pe_intersect == &SphPeIntersect;
    fixifyVector( &pin[2], ps->center ); // Is an inline function

    pin = pin + 5;
}

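From the loop's slow performance I can tell that the compiler is not unrolling it, because when I unroll it by hand it becomes much faster. I think the compiler is getting confused by the pin pointer. Can we use the restrict keyword here to help the compiler, or is restrict reserved for function parameters only? In general, how can we tell the compiler to unroll the loop without worrying about the pin pointer?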


2 Answers


To tell gcc to unroll all loops, you can use the optimization flag -funroll-loops.

To unroll only a specific loop, you can put it in its own function and mark that function with:

__attribute__((optimize("unroll-loops")))

See this answer for more details.
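As a minimal sketch of how the attribute might be applied: GCC documents optimize as a function attribute, so the loop is isolated in a small function of its own. The names UNROLL_LOOPS and scale_values are hypothetical and not part of the question's code, and the macro expands to nothing on compilers other than GCC.

```c
#include <stddef.h>

/* GCC-specific: the optimize attribute applies to a whole function,
   so the loop to be unrolled lives in its own small function.
   UNROLL_LOOPS is a hypothetical macro name used only in this sketch. */
#if defined(__GNUC__) && !defined(__clang__)
#define UNROLL_LOOPS __attribute__((optimize("unroll-loops")))
#else
#define UNROLL_LOOPS /* other compilers: no-op */
#endif

/* Hypothetical helper (not from the question): multiplies n values by k. */
UNROLL_LOOPS
void scale_values(int *out, const int *in, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```

Whether the loop is actually unrolled still depends on the compiler version and the optimization level, so it is worth checking the generated assembly. The GCC manual also describes the optimize attribute as intended for debugging rather than production code; GCC 8 and later additionally accept `#pragma GCC unroll n` placed directly before a loop.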

Edit

If the compiler cannot determine the loop's iteration count on entry, you will need to use -funroll-all-loops instead. Note, from the documentation: "Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly."

answered 2013-04-15T19:12:21.720

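If you over-size pptr by one element, so that pptr[i+1] is still a valid read on the last iteration, you can use the pld instruction.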

  __asm__ __volatile__("pld\t[%0]" :: "r" (pptr[i+1]));

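Alternatively, you may need to preload the next peParent or SPHERE *ps. Loop overhead is very small on ARM, so unrolling the loop is unlikely to give a significant benefit by itself. There are no loop-invariant values to hoist either. It is more likely that, when you unroll the loop, the compiler's scheduler is able to fetch data further ahead of where it is used.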
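The preloading idea can also be sketched without inline assembly by using GCC's __builtin_prefetch, which GCC typically lowers to pld on ARM cores that support it. The Elem type and sum_with_prefetch function below are hypothetical stand-ins for the question's types, and the sketch assumes the pointer array has one extra slot at the end.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's element type. */
typedef struct {
    int value;
} Elem;

/* Sums elem->value over pptr[0..n-1].
   Assumes pptr has n + 1 slots, so reading pptr[i + 1] is always valid.
   The extra slot may hold NULL: a prefetch of an invalid address
   does not fault, it is only a hint. */
long sum_with_prefetch(Elem *const *pptr, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Start loading the next element while this one is processed. */
        __builtin_prefetch(pptr[i + 1]);
        total += pptr[i]->value;
    }
    return total;
}
```

Whether this helps depends on the cache behaviour of the real data, so it should be measured on the target rather than assumed.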

You have not presented all of the code, so the data dependencies cannot be seen. There may be other variables that would benefit from being preloaded. Giving a complete example would probably help everyone answer your question.

answered 2013-04-15T19:29:40.503