1

I have a simple function that takes two variables by reference:

void foo(int*& it2,
         bit_reader<big_endian_tag>& reader2)
{
    for(/* ... */)
    {
        *it2++ = boo(reader2.next());
        // it2++ => 0x14001d890 add qword ptr [r12], 0x4
    }
}

The problem here is that, for it2 and reader2, the optimizer makes the machine write to memory during the loop instead of using registers.

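The following code, however, keeps the variables in registers during the loop as intended, but adds overhead before and after the loop in the form of unnecessary copies: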

void foo2(int*& it2,
         bit_reader<big_endian_tag>& reader2)
{
    auto reader = reader2;
    auto it     = it2;

    for(/* ... */)
    {
        *it++ = boo(reader.next());
        // it++ => 0x14001d890 add r15, 0x4
    }

    reader2 = reader;
    it2 = it;
}

How can I make the first example generate the same code as the second one, but without the extra copies?

3 Answers

5

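The problem is that the compiler cannot prove that it2 does not change inside the function. (Well, it could, but that is far beyond what you can expect from an ordinary C++ compiler.)

How would it know that boo(reader2.next()); does not change the value? Consider: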

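int* i = 0;

struct foo
{
    mutable int myInt; // mutable, so the const member below can hand out an int*
    int blah() const { i = &myInt; return 5; } // redirects the global pointer
};

void bar(int*& ptr, const foo& f)
{
    // Since C++17 the right-hand side is evaluated before the left-hand side,
    // so ptr already points at f.myInt when the store happens.
    *ptr = f.blah(); // changes value of ptr!
}

int otherInt;

int main()
{
    i = &otherInt;
    bar(i, foo()); // ptr aliases the global i
}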

This does not assign anything to otherInt, whereas after your transformation it would:

void bar(int*& ptr, const foo& f)
{
    int* ptrCopy = ptr;
    *ptrCopy = f.blah(); // changes ptr, but not ptrCopy
}

So unless the compiler can prove that the behavior is the same, it cannot perform the optimization.
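To make the difference observable, here is a minimal sketch. All names in it (slot_a, slot_b, cursor, produce, store_direct, store_cached, run) are made up for illustration, and the call is placed in its own statement so the result does not depend on evaluation order:

```cpp
// Hypothetical stand-ins for illustration; none of these names come from the question.
int  slot_a = 0;
int  slot_b = 0;
int* cursor = &slot_a;   // global pointer that the callee can reach

// Plays the role of boo(reader2.next()): it has a side effect on cursor.
int produce()
{
    cursor = &slot_b;
    return 5;
}

// Reads the pointer through the reference after the call.
void store_direct(int*& ptr)
{
    int value = produce();   // may change ptr, because ptr can alias cursor
    *ptr = value;            // writes to wherever ptr points now
}

// Caches the pointer in a local before the call.
void store_cached(int*& ptr)
{
    int* local = ptr;        // snapshot taken before the call
    int value = produce();
    *local = value;          // writes to the old target
}

// Resets the state, runs one variant, and reports which slot received the value.
char run(bool cached)
{
    slot_a = 0;
    slot_b = 0;
    cursor = &slot_a;
    if (cached)
        store_cached(cursor);
    else
        store_direct(cursor);
    return slot_a == 5 ? 'a' : 'b';
}
```

run(false) reports 'b' and run(true) reports 'a': the two variants store to different objects, which is why the compiler is not allowed to turn one into the other by itself.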

C99 solved this problem with the restrict keyword, but C++ has no equivalent. Most C++ compilers do offer it as an extension, though, spelled __restrict__ or __restrict.
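As a rough sketch of what the extension looks like in use (copy_bytes and its parameters are invented for this example, and __restrict is not standard C++):

```cpp
#include <cstddef>

// Hypothetical example. __restrict is a compiler extension, not standard C++.
// It is a promise from the programmer that, inside this function, the objects
// reached through dst, src and written do not overlap. Without that promise,
// a store through an unsigned char* could legally modify *written, so the
// counter would have to be reloaded and stored on every iteration.
void copy_bytes(unsigned char* __restrict dst,
                const unsigned char* __restrict src,
                std::size_t n,
                std::size_t* __restrict written)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        dst[i] = src[i];
        ++*written;   // the compiler may now keep this count in a register
    }
}
```

Whether the generated code actually changes depends on the compiler and the optimization level, and breaking the promise (passing overlapping objects) is undefined behavior.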

To do this in standard C++, you just have to be explicit and make the copies yourself.
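One possible way to spell those copies, shown here only as a sketch: take both arguments by value and hand the updated values back. counting_reader is a made-up stand-in for bit_reader<big_endian_tag>, and fill and count are hypothetical names as well.

```cpp
#include <utility>

// Hypothetical stand-in for bit_reader<big_endian_tag>; only next() matters here.
struct counting_reader
{
    int state = 0;
    int next() { return state++; }
};

// Both arguments are taken by value, so inside the loop they are plain locals
// that nothing else can alias. The updated values are returned to the caller.
std::pair<int*, counting_reader> fill(int* it, counting_reader reader, int count)
{
    for (int k = 0; k < count; ++k)
        *it++ = reader.next();
    return {it, reader};
}
```

The caller then assigns the returned pair back to its own variables. This does not remove the copies, it only moves them to the call boundary, where the compiler may be able to elide them after inlining.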

answered 2012-08-15T21:41:31.863
1

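Well, you can't.

When you pass a parameter by non-const reference, you are asking the compiler to update the original variable. So it has to write the new value to memory.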

answered 2012-08-15T21:41:27.127
0

It is all about optimizing for the memory hierarchy. Computation is fastest when it is done directly on registers, which is why you want to load values from memory into registers before computing on them and then copy the result back to the memory location where you need it. The performance gained by computing in registers generally outweighs the cost of the loads and stores.

How do you make sure values get from memory into registers? For example:

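size_t size;   // number of elements in arr
double* arr;
// i + 1 < size avoids the unsigned wrap-around of size - 1 when size is 0
for (size_t i = 0; i + 1 < size; ++i) {
    double a = arr[i];     // load into a local the compiler can keep in a register
    double b = arr[i + 1]; // load into a local the compiler can keep in a register
    b = a*b + b;           // the floating-point work happens on the locals
    arr[i] = b;            // copy back to memory
}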
answered 2012-08-15T21:54:25.947