Suppose I have a main function that performs a boolean logic test to decide which of 2 subfunctions to run, function A or function B. The main function loops a billion times, but the value of the logic test is a constant (it is entered by the user when the program starts).

I see 2 possible ways to write this: 1) Bury the logic test down in function A, so that, at least in theory, the test has to execute a billion times; that doesn't sound efficient. 2) Do the logic test before the main function runs: split the main function into main function 1 and main function 2 (identical except for which subfunction they call), and use the logic test to decide which main function to run. Here the logic test executes only once, but this implementation produces redundant code.

Is there any difference in computational efficiency between implementations 1) and 2)? In other words, does Python perform any automatic optimization that makes the two implementations equivalent at the machine-code level?
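
Roughly, this is the shape of the two versions I'm comparing (a sketch only; func_a, func_b, and N are placeholders for my real code, and in version 1 the test could just as well live inside the subfunction itself):

def func_a(): pass
def func_b(): pass

N = 1000000000   # the billion iterations; shrink this to actually try it

# Version 1: the test sits inside the loop, so it is evaluated N times.
def main_v1(flag):
    for i in range(N):
        if flag:
            func_a()
        else:
            func_b()

# Version 2: the test runs once, at the cost of two near-duplicate loops.
def main_a():
    for i in range(N):
        func_a()

def main_b():
    for i in range(N):
        func_b()

def main_v2(flag):
    if flag:
        main_a()
    else:
        main_b()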

2 Answers

While @mmgp is right on both counts (CPython doesn't do any such optimization, and this is unlikely to be the bottleneck in the kind of code Python is good at), there is a third option: you can pass the function you want to use as a parameter:

>>> def g1():
...         print 'g1'
...     
>>> def g2():
...         print 'g2'
...     
>>> def subfunc(fn):
...         fn()
...     
>>> def caller(a):
...         f = g1 if a else g2
...         for i in range(2):
...                 subfunc(f)
...         
>>> caller(True)
g1
g1
>>> caller(False)
g2
g2

Your subfunctions can stay exactly the same, and you have hoisted the test out of the loop.

Answered 2013-02-13T00:01:37.050

As Patashu suggested, let's test with timeit instead of trying to guess. I'll use the %timeit magic in IPython because it's simpler. Here's the code:

In [275]: def ff(): pass

In [276]: def ft(): pass

In [277]: def f1(b): # naive implementation
   .....:     for i in range(1000000):
   .....:         if b: ft()
   .....:         else: ff()

In [278]: %timeit f1(True)
10 loops, best of 3: 117 ms per loop

In [279]: def f2(b): # DSM's implementation
   .....:     f = ft if b else ff
   .....:     for i in range(1000000):
   .....:         f()

In [280]: %timeit f2(True)
10 loops, best of 3: 99.2 ms per loop

So, it is a little faster, at least in 64-bit CPython 3.3.0 on my Mac.

However, if you know anything about Python optimization, you'll probably notice that this is about the same performance gain you'd expect just from moving a global variable to local. So, let's take that out of the equation by doing the same thing without hoisting the boolean expression:

In [277]: def f3(b): # Just local binding, no if hoisting
   .....:     f, g = ft, ff
   .....:     for i in range(1000000):
   .....:         if b: f()
   .....:         else: g()
In [286]: %timeit f3(True)
10 loops, best of 3: 94.8 ms per loop

I put together a more complete test, including the OP's intended optimization, with code that works unchanged in both 3.x and 2.x, and ran it against Apple 2.7.2, python.org 3.3.0, PyPy 1.9.0/2.7.2, and Jython 2.5.2 (all 64-bit builds on a Mac), then used Cython 0.17.1 pyximport (under Python 3.3.0) to compile the same source as Cython code:

                 3.3.0   2.7.2   PyPy    Jython  Cython
orig             1.136   1.519   0.091   1.680   0.448
OP optimization  1.119   1.362   0.034   1.613   0.460
rebinding        0.936   1.369   0.030   1.492   0.137
DSM version      0.936   1.329   0.031   1.523   0.138
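
(The exact benchmark script isn't reproduced above; the following is a rough reconstruction of the four variants for illustration only, with the timing harness, loop count, and names being assumptions rather than the original code.)

import timeit

def ff(): pass
def ft(): pass

def orig(b):
    # test inside the loop, ft/ff looked up as globals every iteration
    for i in range(1000000):
        if b: ft()
        else: ff()

def op_optimization(b):
    # the OP's idea: decide once, then run one of two near-duplicate loops
    if b:
        for i in range(1000000):
            ft()
    else:
        for i in range(1000000):
            ff()

def rebinding(b):
    # bind the globals to locals, but keep the per-iteration test
    f, g = ft, ff
    for i in range(1000000):
        if b: f()
        else: g()

def dsm_version(b):
    # hoist the test out of the loop and call the chosen function
    f = ft if b else ff
    for i in range(1000000):
        f()

for fn in (orig, op_optimization, rebinding, dsm_version):
    t = min(timeit.repeat(lambda: fn(True), number=1, repeat=3))
    print("%-16s %.3f" % (fn.__name__, t))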

So, it looks like binding the names outside the loop gives a speed boost of somewhere between 1.1x and 3x; additionally hoisting the comparison out of the loop may give you another 3% or so—but all of this is nothing compared to using PyPy instead of CPython, Cython instead of Python, or even 3.x instead of 2.x. Writing actual Cython or custom C code, or moving the loop into numpy, would be even faster.

And if you think about it, this makes sense. If the cost of a billion bool comparisons or global lookups matters, the cost of a billion function calls and a billion trips through the interpreter loop is going to matter far more. If you're not bothering to optimize that away (and you can often do so just by using a generator expression, list comprehension, map call, etc. in place of an explicit loop, even if switching interpreters, rewriting your code around numpy, etc. aren't feasible), you shouldn't be worrying about the small stuff.
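
For instance, a throwaway sketch of that kind of rewrite (f and data here are made-up placeholders, not anything from the question):

def f(x):
    return x * 2

data = range(1000000)

# explicit interpreter-level loop
results = []
for x in data:
    results.append(f(x))

# the same work as a list comprehension or a map call, which moves most of
# the looping machinery out of your bytecode and into the interpreter's C internals
results = [f(x) for x in data]
results = list(map(f, data))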

And obviously, if that last 3% ever really does make a difference, you'll need to perform more realistic tests, on the platforms you actually care about.

It's probably worth using DSM's implementation—but because it's more idiomatic and easier to read, not because it may or may not be faster.

Answered 2013-02-13T00:11:20.090