
This person specifically claims that (with a C compiler) a binary search is slower than if branches hard-coded from generated code. (Please excuse the Clojure code and the whacky title; the claim is that what this person is doing applies to compilers in general.)

He writes:

I occasionally see this kind of code in dark corners. A man will occasionally write this sort of thing when he knows how his processor works, knows how his C compiler works, knows his data structures, and really, really needs his loop to be fast.

This is the code that real programmers write.

Here is the binary search example (please excuse the Clojure):

Start: (1 2 3 4 6 8 9 10 11 12)
Finish: ((((1) (2)) ((3) ((4) (6)))) (((8) (9)) ((10) ((11) (12)))))
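
(A quick sketch, not from the post, of one way such a nested tree can be built: recursively split the sorted list in half.)

(defn binary-tree [xs]
  (if (<= (count xs) 1)
    (seq xs)                                        ; leaf, e.g. (4)
    (let [[left right] (split-at (quot (count xs) 2) xs)]
      (list (binary-tree left) (binary-tree right)))))

(binary-tree '(1 2 3 4 6 8 9 10 11 12))
;; => ((((1) (2)) ((3) ((4) (6)))) (((8) (9)) ((10) ((11) (12)))))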

He then replaces the binary search with generated code that branches on the hard-coded values:

(defn lookup-fn-handwritten [x]
  (if (< x 6) 
    (if (< x 3)                         ; x is < 6
      (if (< x 2)                       ; x is < 3
        (if (< x 1)                     ; x is < 2
          0                             ; < 1
          1)                            ; 1 <= x < 2
        3)                              ; 2 <= x < 3
      (if (< x 4)                       ; 3 <= x < 6
        4                               ; 3 <= x < 4
        2))                             ; 4 <= x < 6
    (if (< x 10)                        ; 6 <= x < 10
      (if (< x 9)                       ; 6 <= x < 9
        (if (< x 8) 
          2                             ; 6 <= x < 8
          3)                            ; 8 <= x < 9
        3)                              ; 9 <= x < 10
    (if (< x 11)                      ; 10 <= x
        (if (< x 12)                    ; 11 <= x
          1                             ; 11 <= x < 12
          0)
        0))))                           ; 12 <= x

http://www.learningclojure.com/2010/09/clojure-faster-than-machine-code.html

My question is: is hard-coding the branches, via generated code and hard-coded values, really more efficient than a binary search? (In any language - the author claims this works in C, but then only seems to demonstrate it on the JVM.)

(Again, please excuse the whacky title of the linked post - it is just crazy.)
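
To make the comparison concrete, here is a minimal sketch (mine, not the author's) of the generic binary search that the generated branches replace. The loop bookkeeping with lo, hi and mid is exactly the "management" a hand-written cascade avoids; since the result values in the cascade are just whatever payload the table maps to, this version simply returns the index of the interval x falls into (i.e. how many keys are <= x).

(def ks [1 2 3 4 6 8 9 10 11 12])

(defn lookup-fn-generic [x]
  (loop [lo 0, hi (count ks)]
    (if (>= lo hi)
      lo                                  ; index of the interval containing x
      (let [mid (quot (+ lo hi) 2)]
        (if (< x (nth ks mid))
          (recur lo mid)                  ; x lies left of ks[mid]
          (recur (inc mid) hi))))))       ; ks[mid] <= x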


4 Answers


Well, the if-cascade probably does the same thing as the binary search, which means that it does the same comparisons, but without the associated "binary search management". It's an unrolled loop, and there is indeed a reason why compilers unroll loops. So yes, it will be faster.

But will it REALLY be faster? Now there's a problem called "cache". Whether you unroll a loop or anything else, your code gets larger, so the benefit might be offset by more memory accesses to run the code.

In addition, you never quite know what kind of instructions the compiler might be using to optimize the code, which it might not be using when you manually unroll the loop. Or the other way 'round, who knows. Even more so in languages that have a binary search "built in" so the compiler knows what it is dealing with.

Thus, just counting operations like "I have all the compares and none of the other stuff" may not be enough; there are other factors that affect execution time. And if you profile on one CPU to find out "my unrolled version is faster", another CPU might still disagree.

Optimizing is a b-word, not sure whether I'm allowed to spell it out here :-)

answered 2012-07-05T12:12:44.170

This kind of comparison can be very difficult on a modern JVM, because the HotSpot compiler does a lot of dynamic optimisation at runtime. If it detects that a piece of code runs many times, it will start inlining function calls extensively, which produces exactly this kind of nested tree of if expressions.

When making this kind of comparison, it is important to "warm up" the JVM on the code in question for a few million iterations so that this inlining actually happens. I don't think the difference would be very noticeable.
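
A minimal sketch of that kind of warm-up, assuming the lookup-fn-handwritten from the question and (arbitrarily) inputs in the range 0-12:

;; warm-up: give HotSpot a few million calls so it can compile and inline the code
(dotimes [_ 10000000]
  (lookup-fn-handwritten (rand-int 13)))

;; only after that is timing meaningful
(time
  (dotimes [_ 10000000]
    (lookup-fn-handwritten (rand-int 13))))

In practice a benchmarking library such as Criterium handles this warm-up for you.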

answered 2012-07-05T19:12:34.007

The "hard-coded" version is also a binary search, just a more efficient one than most binary-search implementations. I don't find the claim extraordinary at all. That said, if you avoid the typical implementation inefficiencies, you can probably make a general-purpose binary search reasonably fast as well.

answered 2012-07-05T13:04:59.827

The cascading if is effectively a specific binary search, so you save by not having the code that implements a generic binary search; in effect it is an unwound loop, so it is always going to be faster.

In a generic situation (i.e. where you're not targeting a specific configuration) you may well find that the CPU cache compromises unrolled loop performance once the loop is too big for the cache.

The first rule of optimisation is to measure, find the problems, and optimise accordingly; the original link is an excellent example of this, tuned to one specific configuration.

So yes, a good algorithm will always beat a bad algorithm pretty much regardless of the compiler, interpreter or CPU. That's the point being made in the original link.

Algorithm efficiency

answered 2012-07-05T12:11:33.300