performance - 推理 Haskell 的性能

Question

以下两个用于计算斐波那契数列第 n 项的 Haskell 程序具有截然不同的性能特征：

fib1 n =
  case n of
    0 -> 1
    1 -> 1
    x -> (fib1 (x-1)) + (fib1 (x-2))

fib2 n = fibArr !! n where
  fibArr = 1:1:[a + b | (a, b) <- zip fibArr (tail fibArr)]

它们在数学上非常接近相同，但fib2使用列表符号来记忆其中间结果，同时fib1具有显式递归。尽管有可能将中间结果缓存在中fib1，但即使对于而言，执行时间也是一个问题fib1 25，这表明递归步骤总是被评估的。引用透明度对 Haskell 的性能有什么贡献吗？我怎么能提前知道它会或不会呢？

这只是我担心的那种事情的一个例子。我想听听关于克服推理延迟执行的函数式编程语言的性能所固有的困难的任何想法。

总结： 我接受 3lectrologos 的回答，因为在 Haskell 中，您不必过多考虑语言的性能，如编译器的优化，这一点似乎非常重要——比我熟悉的任何其他语言都重要和。我倾向于说编译器的重要性是将惰性、函数式语言的性能推理与任何其他类型的性能推理区分开来的因素。

附录：遇到这个问题的任何人都可能想看看Johan Tibell 关于高性能 Haskell的幻灯片。

score 11 · Accepted Answer

在您的特定斐波那契示例中，不难看出为什么第二个应该运行得更快（尽管您没有指定 f2 是什么）。

主要是算法问题：

fib1实现了纯递归算法，并且（据我所知）Haskell 没有“隐式记忆”机制。
fib2使用显式记忆（使用 fibArr 列表存储先前计算的值。

一般来说，对像 Haskell 这样的惰性语言做出性能假设要比对渴望的语言做出性能假设要困难得多。不过，如果您了解底层机制（尤其是惰性）并积累一些经验，您将能够对性能做出一些“预测”。

参考透明度以（至少）两种方式（可能）提高性能：

首先，您（作为程序员）可以确保对同一个函数的两次调用将始终返回相同的结果，因此您可以在各种情况下利用这一点来提高性能。
其次（也是更重要的），Haskell 编译器可以确定上述事实，这可能会启用许多在不纯语言中无法启用的优化（如果您曾经编写过编译器或在编译器优化方面有任何经验，那么您就是可能意识到这一点的重要性）。

如果您想了解更多有关 Haskell 设计选择（懒惰、纯粹）背后的原因，我建议您阅读这篇文章。

score 6 · Accepted Answer

在 Haskell 和惰性语言中，关于性能的推理通常很困难，尽管并非不可能。Chris Okasaki 的Purely Function Data Structures中介绍了一些技术（在以前的版本中也可以在线获得）。

确保性能的另一种方法是固定评估顺序，使用注释或继续传递样式。这样，您就可以控制评估事物的时间。

在您的示例中，您可能会计算“自下而上”的数字并将前两个数字传递给每次迭代：

fib n = fib_iter(1,1,n)
    where
      fib_iter(a,b,0) = a
      fib_iter(a,b,1) = a
      fib_iter(a,b,n) = fib_iter(a+b,a,n-1)

这导致线性时间算法。

只要您有一个动态规划算法，其中每个结果都依赖于 N 个先前的结果，您就可以使用这种技术。否则，您可能必须使用数组或完全不同的东西。

score 5 · Accepted Answer

Your implementation of fib2 uses memoization but each time you call fib2 it rebuild the "whole" result. Turn on ghci time and size profiling:

Prelude> :set +s

If it was doing memoisation "between" calls the subsequent calls would be faster and use no memory. Call fib2 20000 twice and see for yourself.

By comparison a more idiomatic version where you define the exact mathematical identity:

-- the infinite list of all fibs numbers.
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

memoFib n = fibs !! n

actually do use memoisation, explicit as you see. If you run memoFib 20000 twice you'll see the time and space taken the first time then the second call is instantaneous and take no memory. No magic and no implicit memoization like a comment might have hinted at.

Now about your original question: optimizing and reasoning about performance in Haskell...

I wouldn't call myself an expert in Haskell, I have only been using it for 3 years, 2 of which at my workplace but I did have to optimize and get to understand how to reason somewhat about its performance.

As mentionned in other post laziness is your friend and can help you gain performance however YOU have to be in control of what is lazily evaluated and what is strictly evaluated.

Check this comparison of foldl vs foldr

foldl actually stores "how" to compute the value i.e. it is lazy. In some case you saves time and space beeing lazy, like the "infinite" fibs. The infinite "fibs" doesn't generate all of them but knows how. When you know you will need the value you might as well just get it "strictly" speaking... That's where strictness annotation are usefull, to give you back control.

I recall reading many times that in lisp you have to "minimize" consing.

Understanding what is stricly evaluated and how to force it is important but so is understanding how much "trashing" you do to the memory. Remember Haskell is immutable, that means that updating a "variable" is actually creating a copy with the modification. Prepending with (:) is vastly more efficient than appending with (++) because (:) does not copy memory contrarily to (++). Whenever a big atomic block is updated (even for a single char) the whole block needs to be copied to represent the "updated" version. The way you structure data and update it can have a big impact on performance. The ghc profiler is your friend and will help you spot these. Sure the garbage collector is fast but not having it do anything is faster!

Cheers

score 2 · Accepted Answer

除了记忆问题，fib1 还使用非尾调用递归。尾调用递归可以自动重构为一个简单的 goto 并执行得非常好，但 fib1 中的递归无法以这种方式优化，因为您需要来自 fib1 的每个实例的堆栈帧才能计算结果。如果您重写 fib1 以将运行总计作为参数传递，从而允许尾部调用而不是需要保留堆栈帧以进行最终添加，性能将大大提高。但当然不如记忆的例子多:)

score 1 · Accepted Answer

由于分配是任何函数式语言的主要成本，因此了解性能的一个重要部分是了解对象何时被分配、它们的生存时间、它们何时死亡以及它们何时被回收。要获取此信息，您需要一个堆分析器。这是一个必不可少的工具，幸运的是 GHC 附带了一个很好的工具。

如需更多信息，请阅读Colin Runciman的论文。

performance - 推理 Haskell 的性能

5 回答 5

Related

Reference