memory - 循环函数的意外内存使用

Question

在下面的程序中，我只希望cycle3以恒定的内存运行（它明确地打结）。然而，cycle2由于我无法理解的原因，它也在不断的记忆中运行。我希望做与因为cycle2完全相同的工作cycle1

xs' = xs ++ xs'
xs' = xs ++ xs ++ xs' -- substitute value of xs'
xs' = xs ++ xs ++ xs ++ xs' -- substitute value of xs' again
xs' = xs ++ xs ++ xs ++ ... -- and so on

有人可以解释我在这里缺少什么吗？

module Main where

import System.Environment (getArgs)

cycle1 :: [a] -> [a]
cycle1 [] = error "empty list"
cycle1 xs = xs ++ cycle1 xs

cycle2 :: [a] -> [a]
cycle2 [] = error "empty list"
cycle2 xs = xs' where xs' = xs ++ xs'

cycle3 :: [a] -> [a]
cycle3 [] = error "empty list"
cycle3 xs = let
  xs' = go xs' xs
  in xs'
  where
    go :: [a] -> [a] -> [a]
    go start [last] = last : start
    go start (x:xs) = x : go start xs

testMem :: (Show a) => ([a] -> [a]) -> [a] -> IO ()
testMem f xs = print (xs', xs') -- prints only first val, need second val holds onto reference
  where
    xs' = f xs

main :: IO ()
main = do
  args <- getArgs
  let mCycleFunc = case args of
        ["1"] -> Just cycle1
        ["2"] -> Just cycle2
        ["3"] -> Just cycle3
        _ -> Nothing
  case mCycleFunc of
    Just cycleFunc -> testMem cycleFunc [0..8]
    Nothing -> putStrLn "Valid args are one of {1, 2, 3}."

score 6 · Accepted Answer

cycle1每次消耗一个周期时都会创建一个新列表。原因应该很明显。

cycle2，但是，不这样做。它创建一个变量xs'，在它自己的定义中使用。每次你消费时cycle1都必须重新评估函数，但它没有任何递归函数。它只是引用相同的变量，该变量已经具有已知值。cycle1xscycle2

score 3 · Accepted Answer

它归结为共享或不共享相等的thunk。两个相等的 thunk 是保证产生相同结果的 thunk。在的情况下cycle1，您cycle1 xs每次[]在xs. 需要为该 thunk 分配新内存，并且需要从头开始计算其值，这会在您遍历它时分配新的列表对。

我认为如果您重命名为（并且我删除了“错误”案例），cycle2避免这种情况的方法会变得更容易理解：xs'result[]

cycle2 :: [a] -> [a]
cycle2 xs = result 
    where result = xs ++ result

这个定义在语义上等同于cycle1（对相同的参数产生相同的结果），但理解内存使用的关键是根据创建的 thunk 来看待它。当您执行此函数的编译代码时，它会立即为result. 您可以将 thunk 视为或多或少像这样的可变类型（完全虚构的伪代码）：

type Thunk a = union { NotDone (ThunkData a), Done a }
type ThunkData a = struct { force :: t0 -> ... -> tn -> a
                          , subthunk0 :: t0
                          , ...
                          , subthunkn :: tn }

这要么是一条记录，其中包含指向所需值的 thunk 的指针以及指向强制这些 thunk 的代码的指针，或者只是计算的结果。在的情况下cycle2，thunk forresult指向目标代码 for和(++)thunk 。最后一位意味着 thunk for有一个指向自身的指针，这解释了常量空间行为；强制的最后一步是让它指向自身。xsresultresultresult

cycle1另一方面，在 thunk的情况下，thunk 具有的代码、用于(++)的 thunkxs和要从头开始计算的新thunk。cycle1 xs原则上，编译器可能会认识到对后一个 thunk 的引用可以替换为对“父”块的引用，但编译器不会这样做；而在其中，cycle2它不得不这样做（一个变量的一个实例化绑定=一个块）。

请注意，这种自引用 thunk 行为可以分解为以下的合适实现fix：

-- | Find the least fixed point of @f@.  This implementation should produce
-- self-referential thunks, and thus run in constant space.
fix :: (a -> a) -> a
fix f = result
    where result = f result

cycle4 :: [a] -> [a]
cycle4 xs = fix (xs++)

memory - 循环函数的意外内存使用

2 回答 2

Related

Reference