c++ - C++ Eigen Library - Quadratic Programming, Fixed vs. Dynamic, Performance

Question

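I am looking into doing some quadratic programming and have seen different libraries. I have come across various Eigen variants of QuadProg++ (KDE Forums, Benjamin Stephens, StackOverflow posts). As a test, I forked wingsit's Eigen variant on GitHub to implement compile-time-sized problems, so that I could measure the performance gained through templates.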

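I am finding that performance in the templated case is worse than in the dynamically sized (MatrixXd / VectorXd) case from wingsit's code. I know this is not a simple problem, but can anyone see a likely reason for this?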

Note: I do need to increase the problem size / number of iterations, and I will post the results once I can.

Edit: I am using GCC 4.6.3 on Ubuntu 12.04. These are the flags I am using (modified from wingsit's code):

CFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for obj
LFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for exe (-lefence ...)

score 3 · Accepted Answer

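The benefits of statically sized code generally diminish as the size increases. The typical benefits of statically sized code mainly include (but are not limited to) the following: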

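Stack-based allocation is faster than heap allocation. At large sizes, however, stack-based allocation is no longer feasible (stack overflow), nor even beneficial from the point of view of prefetching and locality of reference.
Loop unrolling: when the compiler sees a small, statically sized loop, it can simply unroll it and possibly use SSE instructions. This does not apply to larger sizes.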
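To make these two benefits concrete, here is a minimal sketch that uses only the standard library. std::array and std::vector stand in for Eigen's fixed-size and dynamic-size types, and the names dot_static and dot_dynamic are hypothetical, not taken from QuadProg++ or wingsit's code. Whether the compiler actually unrolls or vectorizes the fixed-size loop depends on the compiler and flags.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Size known at compile time: the elements are stored inside the object
// (typically on the stack), and the loop bound N is a constant that the
// compiler is free to unroll.
template <std::size_t N>
double dot_static(const std::array<double, N>& a,
                  const std::array<double, N>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) sum += a[i] * b[i];
    return sum;
}

// Size known only at run time: the elements live in a heap buffer and the
// loop bound is an ordinary run-time value.
double dot_dynamic(const std::vector<double>& a,
                   const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}
```

Both functions compute the same result; the difference lies only in where the data is stored and in how much the compiler knows about the loop.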

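In other words, for small sizes (up to maybe N=12 or so), statically sized code can be much better and faster than the equivalent dynamically sized code, as long as the compiler is fairly aggressive about inlining and loop unrolling. When the sizes are larger, however, there is no point.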

Additionally, statically sized code also has a number of drawbacks:

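No efficient move semantics / swapping / copy-on-write strategies, because such code is usually implemented with static arrays (in order to get the benefits mentioned above), which cannot simply be swapped (as in, swapping an internal pointer).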
Larger executables whose functions are spread out. Say you have one function template that multiplies two matrices: a new compiled function (inlined or not) is generated for each size combination. If an algorithm then performs a lot of matrix multiplications, each multiplication has to jump to (or execute inline) the specialization for its size combination. You end up with a lot more code that needs to be cached, cache misses become much more frequent, and that can wreck the performance of your algorithm.
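Both drawbacks can be illustrated with a short standard-library sketch. StaticMatrix, multiply and vector_swap_is_pointer_swap are hypothetical names introduced only for this example; they are not Eigen's API.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical fixed-size matrix: the elements are stored inside the object,
// so swapping or moving two of them has to touch every element.
template <std::size_t R, std::size_t C>
struct StaticMatrix {
    std::array<double, R * C> data{};  // zero-initialized, row-major
};

// One function template, but every distinct (R, K, C) combination used in a
// program causes the compiler to generate a separate function body.
template <std::size_t R, std::size_t K, std::size_t C>
StaticMatrix<R, C> multiply(const StaticMatrix<R, K>& a,
                            const StaticMatrix<K, C>& b) {
    StaticMatrix<R, C> out;
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t j = 0; j < C; ++j)
                out.data[i * C + j] += a.data[i * K + k] * b.data[k * C + j];
    return out;
}

// Returns true if swapping two vectors merely exchanged their heap buffers,
// which is the constant-time pointer swap a static array cannot offer.
inline bool vector_swap_is_pointer_swap() {
    std::vector<double> x(4, 1.0), y(4, 2.0);
    const double* x_buffer = x.data();
    std::swap(x, y);  // exchanges internal pointers, copies no elements
    return y.data() == x_buffer;
}
```

A program that multiplies, say, 2x3 by 3x2 and 4x5 by 5x4 matrices contains two separate instantiations of multiply, whereas a dynamically sized implementation would share a single function for both.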

So, the lesson to draw from that is to use static-sized vectors and matrices only for small things like 2 to 6 dimensional vectors or matrices. But beyond that, it's preferable to go with dynamically-sized code (or maybe try static-sized code, but verify that it performs better, because it is not guaranteed to do so). So, I would advise you to reconsider your idea of using static-sized code for larger problems.
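Since the advice is to verify rather than assume, a small timing helper is enough to compare a fixed-size and a dynamic-size variant of the same routine. This is only a sketch: time_ms is a hypothetical helper built on std::chrono, and a serious benchmark would also need warm-up runs, several problem sizes, and a way to keep the compiler from optimizing the measured work away (for example by writing the result to a volatile variable).

```cpp
#include <chrono>

// Hypothetical helper: calls f() `repeats` times and returns the total
// elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f, int repeats) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms once with the static-size solver and once with the dynamic-size solver, at the problem sizes you actually care about and with the same compiler flags, shows which variant is faster on your machine.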
