
Wikipedia writes about Hylomorphism:

In [...] functional programming a hylomorphism is a recursive function, corresponding to the composition of an anamorphism (which first builds a set of results; also known as 'unfolding') followed by a catamorphism (which then folds these results into a final return value). **Fusion of these two recursive computations into a single recursive pattern then avoids building the intermediate data structure.** This is an example of deforestation, a program optimization strategy.

(Bold markup by me)
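For context (my own minimal paraphrase, mirroring how recursion-schemes defines it): hylo is essentially a one-liner, and the fusion is already visible in its definition, since no intermediate data structure appears anywhere in it:

```haskell
-- Stand-alone definition mirroring recursion-schemes' hylo:
-- unfold with coalg, fold with alg, in a single recursive pass.
hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
hylo alg coalg = go
  where
    go = alg . fmap go . coalg
```

Each step unfolds one layer with coalg, recurses into it via fmap, and immediately folds it back with alg, so no full structure is ever assembled.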

Using the recursion-schemes library I wrote a very simple hylomorphism:

import Data.Functor.Foldable
main :: IO ()
main = putStrLn $ show $ hylosum 1000

hylosum :: Int -> Int
hylosum end = hylo alg coalg 1
  where 
    -- Create list of Int's from 1 to n
    coalg :: Int -> ListF Int Int
    coalg n 
       | n > end = Nil
       | otherwise = Cons n (n + 1)
    -- Sum up a list of Int's
    alg :: ListF Int Int -> Int
    alg Nil  =  0
    alg (Cons a x) = a + x
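For comparison, the unfused pipeline the Wikipedia quote describes can be written with plain lists: unfoldr plays the anamorphism and foldr the catamorphism (unfusedSum is my own name, not from the library). Here the intermediate list really is built before being folded away:

```haskell
import Data.List (unfoldr)

-- Unfused version: the anamorphism materializes the list [1..end],
-- which the catamorphism then folds into a sum.
unfusedSum :: Int -> Int
unfusedSum end = foldr (+) 0 (unfoldr step 1)
  where
    step n
      | n > end   = Nothing
      | otherwise = Just (n, n + 1)
```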

In the cabal file I instructed GHC to optimize the code:

name:                Hylo
version:             0.1.0.0
synopsis:            Hylomorphisms and Deforestation        
build-type:          Simple
cabal-version:       >=1.10

executable             Hylo
  main-is:             Main.hs
  ghc-options:         -O2
  build-depends:       base >=4.10 && <4.11 , recursion-schemes      
  default-language:    Haskell2010

Using Stackage lts-10.0 (GHC 8.2.2) I compile with `stack build`, run with `stack exec Hylo -- +RTS -s`, and I get:

500500
      84,016 bytes allocated in the heap
       3,408 bytes copied during GC
      44,504 bytes maximum residency (1 sample(s))
      25,128 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

Now I change hylosum 1000 to hylosum 1000000 (1000 times more) and I get:

500000500000
  16,664,864 bytes allocated in the heap
      16,928 bytes copied during GC
  15,756,232 bytes maximum residency (4 sample(s))
      29,224 bytes maximum slop
          18 MB total memory in use (0 MB lost due to fragmentation)

So heap allocation goes up from 84 KB to about 16,665 KB, roughly 200 times more than before. Therefore I think GHC does not do the deforestation / fusion mentioned in Wikipedia!

This is not really a surprise: the anamorphism creates the list items from left to right (from 1 to n), while the catamorphism consumes them the opposite way, from right to left (from n to 1). It's difficult to see how the hylomorphism could work without creating the whole intermediate list.

Questions: Is GHC able to perform the deforestation? If yes, what do I have to change in my code or cabal file, and how does it actually work? If no, where is the issue: in Wikipedia, in GHC, or in the library?


1 Answer


The data structure is in fact fused away, but the resulting program is not tail-recursive. The optimized code is basically this (no Cons or Nil in sight):

h n | n > end = 0
    | otherwise = n + h (n+1)

To evaluate the result, you must first recursively evaluate h (n+1) and then add the result to n. During the recursive call, the value of n must stay stored somewhere, so a chain of pending additions, 1 + (2 + (3 + ...)), builds up, and we observe memory usage growing with end.

A tighter loop can be obtained by putting the recursive call in tail position and carrying a constant-size accumulator. We want the code to optimize to this:

-- with BangPatterns
h n !acc | n > end = acc
         | otherwise = h (n+1) (n + acc)

In hylosum, the call to (+) happens in alg; we replace it with a call to the continuation that will be built up by hylo:

alg :: ListF Int (Int -> Int) -> Int -> Int
alg Nil acc = acc
alg (Cons n go) !acc = go (n + acc)

With this, I see a constant 51 kB allocated in the heap.
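Putting the pieces together (a self-contained sketch: ListF and hylo are my own minimal re-definitions mirroring recursion-schemes, and hylosum' is my name for the assembled function): hylo now produces a function of type Int -> Int, which is applied to the initial accumulator 0.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DeriveFunctor #-}

-- Minimal stand-ins for the library's ListF and hylo.
data ListF a b = Nil | Cons a b deriving Functor

hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
hylo alg coalg = go where go = alg . fmap go . coalg

-- Accumulator-passing hylosum: alg builds a continuation, so the
-- additions happen left to right with a strict accumulator.
hylosum' :: Int -> Int
hylosum' end = hylo alg coalg 1 0
  where
    coalg :: Int -> ListF Int Int
    coalg n
      | n > end   = Nil
      | otherwise = Cons n (n + 1)
    alg :: ListF Int (Int -> Int) -> Int -> Int
    alg Nil          acc = acc
    alg (Cons n go) !acc = go (n + acc)
```

The bang on acc forces each partial sum before the tail call, which is what keeps the loop running in constant space.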

Answered 2018-03-06T16:48:40.317