c++ - Overhead of calling tiny functions from a tight inner loop? [C++]

Question

Suppose you see a loop like this:

for(int i=0;
    i<thing.getParent().getObjectModel().getElements(SOME_TYPE).count();
    ++i)
{
  thing.getData().insert(
    thing.getData().count(),
    thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName()
    );
}

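If this were Java, I probably wouldn't think twice. But in a performance-critical section of C++, it makes me want to tinker with it... yet I don't know whether the compiler is smart enough to make that futile. This is a made-up example, but all it does is insert strings into a container. Please don't assume any of these are STL types; think in general terms about the following: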

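Is the messy condition in the for loop evaluated every time, or only once?
If these get methods simply return references to member variables on the objects, will they be inlined?
Would you expect a custom [] operator to be optimized at all?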

In other words, is it worth the time (for performance only, not readability) to convert it into something like this:

ElementContainer &source = 
   thing.getParent().getObjectModel().getElements(SOME_TYPE);
int num = source.count();
Store &destination = thing.getData();
for(int i=0;i<num;++i)
{
  destination.insert(destination.count(), source[i].getName());
}

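Remember, this is a tight loop that is called millions of times a second. What I'm wondering is whether all of this shaves a couple of cycles off each iteration, or something more substantial.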

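Yes, I know the quote about "premature optimization". And I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.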

score 4 · Accepted Answer

The general way to answer this kind of question is to look at the generated assembly. With gcc, that means replacing the -c flag with -S.
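As a rough illustration of that workflow, here is a small translation unit you could feed to the compiler. The file name, the Elements class, and sumAll are all made up for this sketch; only the -S and -O2 flags come from GCC itself.

```cpp
// Hypothetical file name: loop.cpp
// Emit assembly instead of an object file (GCC):
//   g++ -O2 -S loop.cpp -o loop.s
// Then find sumAll in loop.s and check whether any call instructions
// to count() or operator[] remain inside the loop body.

#include <cstddef>

// Hypothetical stand-in for the containers in the question.
class Elements {
public:
    Elements(const int* data, std::size_t n) : data_(data), n_(n) {}
    std::size_t count() const { return n_; }
    int operator[](std::size_t i) const { return data_[i]; }
private:
    const int* data_;
    std::size_t n_;
};

// Externally visible and not declared inline, so it shows up
// as its own function in the assembly output.
int sumAll(const Elements& e) {
    int total = 0;
    for (std::size_t i = 0; i < e.count(); ++i) {
        total += e[i];
    }
    return total;
}
```

Comparing the output at -O0 and -O2 is usually the quickest way to see what the optimizer did with the accessors.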

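My own rule is not to fight the compiler. If something ought to be inlined, I make sure the compiler has all the information it needs to perform that inlining, and (possibly) I try to nudge it with an explicit inline keyword.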

Also, inlining saves a few opcodes but makes the code grow, which can be very bad for performance as far as the L1 cache is concerned.
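A minimal sketch of what "giving the compiler the information" can look like, assuming the accessors live in a header. Element and Holder are hypothetical names, loosely modelled on the call chain in the question.

```cpp
#include <string>
#include <utility>

// Hypothetical types, not taken from the question's real code base.
class Element {
public:
    explicit Element(std::string name) : name_(std::move(name)) {}
    // Defined inside the class body: implicitly inline, and the body is
    // visible to every translation unit that includes this header.
    const std::string& getName() const { return name_; }
private:
    std::string name_;
};

class Holder {
public:
    explicit Holder(Element e) : element_(std::move(e)) {}
    const Element& getElement() const;  // declared here ...
private:
    Element element_;
};

// ... and defined outside the class. The explicit `inline` lets this
// definition live in a header; as an optimization request it is only
// a hint that the compiler is free to ignore.
inline const Element& Holder::getElement() const { return element_; }
```

If the definition sat in a separate .cpp file instead, the compiler would generally not see the body at the call site, short of whole-program or link-time optimization.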

score 2 · Accepted Answer

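Everything you ask is compiler-specific, so the only sensible answer is "it depends". If it matters to you, you should (as always) look at the code the compiler emits and run some timing experiments. Make sure your code is compiled with all optimizations turned on; this can make a big difference for things like operator[](), which is often implemented as an inline function but won't actually be inlined (at least in GCC) unless you turn optimization on.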
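A timing experiment along those lines might look roughly like this. IntBox, the two sum functions, and timeMicros are hypothetical helpers written for this sketch; std::chrono::steady_clock is the only real library piece.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical wrapper standing in for a container with a custom operator[].
class IntBox {
public:
    explicit IntBox(std::size_t n) : data_(n, 1) {}
    std::size_t count() const { return data_.size(); }
    int operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<int> data_;
};

// Version A: count() appears in the loop condition.
long long sumInCondition(const IntBox& box) {
    long long total = 0;
    for (std::size_t i = 0; i < box.count(); ++i) total += box[i];
    return total;
}

// Version B: count() hoisted by hand.
long long sumHoisted(const IntBox& box) {
    long long total = 0;
    const std::size_t n = box.count();
    for (std::size_t i = 0; i < n; ++i) total += box[i];
    return total;
}

// Returns the elapsed microseconds for `reps` calls of f(box).
// Results are accumulated into `sink` to make it harder for the
// optimizer to throw the work away.
template <typename F>
long long timeMicros(F f, const IntBox& box, int reps, long long& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) sink += f(box);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Run both versions several times with optimization on and off. If the numbers are indistinguishable at full optimization, the compiler has most likely done the hoisting for you.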

score 1 · Accepted Answer

In general, you shouldn't have all that junk in your "for condition" unless the result is going to change during the loop's execution.

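Use another variable set outside the loop. This removes the WTF when reading the code, it won't hurt performance, and it sidesteps the question of how well the functions get optimized. If those calls are not optimized away, it will also give you a performance gain.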
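To make the difference concrete, here is a small sketch with a container that counts how often count() is called. CountingList and both helper functions are invented for illustration. Because the counter is observable, the compiler has to keep every call here; with a plain accessor it is free to fold them, which is exactly the part you cannot rely on without checking.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container that records every call to count().
class CountingList {
public:
    explicit CountingList(std::vector<std::string> items) : items_(std::move(items)) {}
    std::size_t count() const { ++calls_; return items_.size(); }
    const std::string& operator[](std::size_t i) const { return items_[i]; }
    std::size_t calls() const { return calls_; }
private:
    std::vector<std::string> items_;
    mutable std::size_t calls_ = 0;
};

// count() in the condition: the language evaluates it before every iteration.
std::size_t callsWithCountInCondition(const CountingList& list,
                                      std::vector<std::string>& out) {
    const std::size_t before = list.calls();
    for (std::size_t i = 0; i < list.count(); ++i) out.push_back(list[i]);
    return list.calls() - before;
}

// count() stored in a variable set outside the loop: exactly one call.
std::size_t callsWithHoistedCount(const CountingList& list,
                                  std::vector<std::string>& out) {
    const std::size_t before = list.calls();
    const std::size_t n = list.count();
    for (std::size_t i = 0; i < n; ++i) out.push_back(list[i]);
    return list.calls() - before;
}
```

For a list of three items, the first function reports four calls (three passing checks plus the final failing one) and the second reports one.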

score 1 · Accepted Answer

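If the loop is that critical, I can only suggest that you look at the generated code. If the compiler is allowed to optimize the calls aggressively, it probably won't be an issue. Sorry to say this, but modern compilers optimize very well, and I really recommend profiling to find the best solution for your particular case.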

score 1 · Accepted Answer

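If the methods are small and can and will be inlined, then the compiler may perform the same optimization you did by hand. So look at the generated code and compare.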

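Edit: It is also important to mark const methods as const. In your example, count() and getName() should be const, to let the compiler know that these methods do not alter the contents of the given object.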
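A sketch of what const-correct accessors could look like, with invented Element and ElementContainer classes (the question does not show the real ones). How much the optimizer actually gains from const is something to verify in the generated code; what it guarantees is that the source can be read through a const reference.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classes; names follow the question, bodies are assumed.
class Element {
public:
    explicit Element(std::string name) : name_(std::move(name)) {}
    const std::string& getName() const { return name_; }  // does not modify *this
private:
    std::string name_;
};

class ElementContainer {
public:
    void add(Element e) { items_.push_back(std::move(e)); }  // non-const: mutates
    std::size_t count() const { return items_.size(); }
    const Element& operator[](std::size_t i) const { return items_[i]; }
private:
    std::vector<Element> items_;
};

// Takes the source by const reference, so only const members may be called.
// This compiles only because count(), operator[] and getName() are const.
std::string joinNames(const ElementContainer& source) {
    std::string out;
    const std::size_t n = source.count();
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0) out += ',';
        out += source[i].getName();
    }
    return out;
}
```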

score 0 · Accepted Answer

I think in this case you are asking the compiler to do more than it legitimately can given the scope of compile-time information it has access to. So, in particular cases the messy condition may be optimized away, but really, the compiler has no particularly good way to know what kind of side effects you might have from that long chain of function calls. I would assume that breaking out the test would be faster unless I have benchmarking (or disassembly) that shows otherwise.
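A small sketch of why the compiler has to be careful: when the loop body can change the container, the two forms of the loop are not equivalent, so hoisting the test is only legal if the compiler can prove the body has no such effect. Both functions below are made up for illustration and use std::vector purely for brevity.

```cpp
#include <cstddef>
#include <vector>

// Re-evaluates size() before every iteration, so elements appended
// by the body are visited as well.
std::vector<int> expandHalves(std::vector<int> v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > 1) v.push_back(v[i] / 2);
    }
    return v;
}

// Reads size() once up front, so only the original elements are visited.
std::vector<int> expandHalvesHoisted(std::vector<int> v) {
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] > 1) v.push_back(v[i] / 2);
    }
    return v;
}
```

Starting from a vector holding just 4, the first version ends with 4, 2, 1 while the hoisted version ends with 4, 2. With calls into code the compiler cannot see, it has to assume something like this might be going on.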

This is one of the cases where the JIT compiler has a big advantage over a C++ compiler. It can in principle optimize for the most common case seen at runtime and provide optimized bytecode for that (plus checks to make sure that one falls into that case). This sort of thing is used all the time in polymorphic method calls that turn out not to actually be used polymorphically; whether it could catch something as complex as your example, though, I'm not certain.

For what it's worth, if speed really mattered, I'd split it up in Java too.
