haskell - 避免单子展开生成的稀疏评估列表中的thunk

Question

我有一个模拟库，它使用包装在 monad 中的 FFI M，并带有上下文。所有的外来函数都是纯函数，所以我决定让 monad 变得惰性，这通常便于流控制。我将我的模拟表示为一个模拟框架列表，我可以通过写入文件或以图形方式显示框架来使用它们。

simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame] 
simulation [] frame = return [frame]
simulation (step:steps) frame 
   = step frame >>= fmap (frame:) . simulation steps

每个帧都包含一个新类型包装ForeignPtr的元组，我可以使用它提升到我的 Haskell 表示

lift :: Frame -> M HFrame

由于我的模拟中的时间步很短，我只想查看n我使用的每一帧

takeEvery n l = foldr cons nil l 0 where
    nil _ = []
    cons x rest 0 = x : rest n
    cons x rest n = rest (n-1)

所以我的代码看起来像

main = consume 
     $ takeEvery n 
     $ runM 
     $ simulation steps initialFrame >>= mapM lift

现在，问题是随着我的增加n，一个thunk 建立起来。我尝试了几种不同的方法来严格评估中的每一帧simulation，但我还没有弄清楚如何去做。ForeignPtr似乎没有NFData实例，所以我不能使用deepseq，但是我所有的尝试seq，包括seq在元组中的每个元素上使用，都没有明显的效果。

编辑：

根据要求，我提供了更多细节，我最初排除了这些细节，因为我认为它们可能主要是这个问题的噪音。

单子

newtype FT c a = FT (Context -> a)

instance Functor (FT c) where
    fmap f (FT a) = FT (f.a)

instance Applicative (FT c) where
    pure a = FT (\_ -> a)
    (<*>) (FT a) (FT b) = FT (\c -> a c $ b c)

instance Monad (FT c) where
    return = pure
    (>>=) (FT a) f = FT (\c -> (\(FT b) -> b c) $ f $ a c)

runFTIn :: Context -> (forall c. FT c a) -> a
runFTIn context (FT a) = a context


runFTWith :: [ContextOption] -> (forall c. FT c a) -> a
runFTWith options a
    = unsafePerformIO
    $ getContext options >>= \c -> return $ runFTIn c a
runFT = runFTWith []

unsafeLiftFromIO :: (Context -> IO a) -> FT c a
unsafeLiftFromIO a = FT (\c -> unsafePerformIO $ a c)

所有的外来函数都从IOwithunsafeLiftFromIO

newtype Box c = Box (ForeignPtr RawBox)
newtype Coordinates c = Coordinates (ForeignPtr RawCoordinates)
type Frame c = (Box c, Coordinates c)

liftBox :: Box c -> FT c HBox
liftCoordinates :: Coordinates c -> FT c HCoordinates
liftFrame (box, coordinates) = do
    box' <- liftBox box
    coordinates' <- liftCoordinates coordinates
    return (box', coordinates')

steps它们本身应该是任意的(Frame c -> FT c (Frame c))，因此严格性最好在更高级别的代码中。

编辑2：

我现在尝试使用Streamly，但是问题仍然存在，所以我认为问题确实是找到一种严格评估ForeignPtrs 的方法。

当前实现：

import Streamly
import qualified Streamly.Prelude as S
import qualified Streamly.Internal.Data.Stream.Serial as Serial

takeEvery n = Serial.unfoldrM ((fmap.fmap) (\(h, t) -> (h, S.drop (n-1) t)) . S.uncons)

(#) = flip ($)
simulation
    :: (IsStream t)
    => Frame c 
    -> t (FT c) (Frame c -> FT c (Frame c)) 
    -> t (FT c) (Frame c)
simulation frame = S.scanlM' (#) frame

编辑3：

澄清症状以及我如何诊断问题。

该库调用OpenCL在 GPU 上运行的函数。我确信指针的释放得到了正确处理 - ForeignPtrs 具有正确的释放功能，并且内存使用与总数无关，steps只要这个数字大于n. 我发现 GPU 上的内存使用基本上与n. 我用于此测试的消费者是

import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Put

writeTrajectory fn = fmap (BL.writeFile fn . runPut) . S.foldr ((>>).putFrame) (pure ()) . serially

为了我的流畅实施，以及

writeTrajectory fn = BL.writeFile fn . runPut . MapM_ putFrame

对于原始实现。两者都应该连续消耗流。我已经生成了steps用于测试的replicate.

我不确定如何更精确地分析 GPU 上的内存使用情况。系统内存使用在这里不是问题。

更新：我开始认为这不是严格问题，而是 GC 问题。运行时系统不知道在 GPU 上分配的内存大小，因此不知道收集指针，当 CPU 端也有东西时，这不是问题，因为这会产生分配同样，激活 GC。n这可以解释稍微不确定的内存使用情况，但与我所看到的线性相关。如何很好地解决这个问题是另一个问题，但我怀疑我的代码将会进行重大改革。

score 1 · Accepted Answer

我认为问题确实是找到一种方法来严格评估 ForeignPtrs

如果这确实是问题，那么一种方法是更改第二个子句simulation：

{-# LANGUAGE BangPatterns #-}

simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame] 
simulation [] frame = return [frame]
simulation (step:steps) frame@(!_, !_)  -- Evaluate both components of the pair
   = step frame >>= fmap (frame:) . simulation steps

haskell - 避免单子展开生成的稀疏评估列表中的thunk

1 回答 1

Related

Reference