我们面临两个挑战。第一个是“[我们]如何在……各种查找结构中传播[a]变化”。第二个是最小化我们执行查找时所做的工作。
让我们编写一些工作代码,以便我们有一些具体的讨论。
首先,让我们看看什么是“更新”或“改变”。更新或更改以一种状态开始,并以另一种状态结束。它是从前一个状态到下一个状态的函数。基本上是type Update = State -> State
。在 Haskell 中,我们可以通过将状态隐藏在 some 中来使状态消失Monad
;这是一种非常普遍的做法,因此尽管它看起来“不纯”,但它非常“Haskell-ish”。你可以通过阅读state monad来了解更多关于这个想法的信息。
这是一个类似于MonadState的类,它让我们讨论可以分配 ( new
)、更新 ( set
) 和检查 ( get
) 的值。
-- Class for a typed dictionary in a monadic context
class (Monad m) => MonadReference m where
type Reference :: * -> *
new :: (Typeable a) => a -> m (Reference a)
set :: (Typeable a) => (Reference a) -> a -> m ()
get :: (Typeable a) => (Reference a) -> m a
我们将使用它来编写一些非常简单的示例代码。
data Person = Person {
name :: String
} deriving (Show, Typeable)
data Company = Company {
legalName :: String
} deriving (Show, Typeable)
-- the only thing we need MonadIO for in this exmple is printing output
example1 :: (MonadIO m, MonadReference m) => m ()
example1 = do
-- alice :: Reference Person
alice <- new $ Person { name = "Alice" }
bob <- new $ Person { name = "Bob" }
-- company :: Reference Company
company <- new $ Company { legalName = "Eve's Surveillance" }
(liftIO . print) =<< get alice
(liftIO . print) =<< get bob
(liftIO . print) =<< get company
(liftIO . putStrLn) ""
set alice Person { name = "Mike" }
set company Company { legalName = "Mike's Meddling" }
(liftIO . print) =<< get alice
(liftIO . print) =<< get bob
(liftIO . print) =<< get company
我们使用new
、get
和set
来创建一些Reference
s、检查它们并修改它们。
为了让它工作,我们需要一些无聊的样板。我们将借用IORef
a 的实现Reference
来运行此代码,而无需自己编写太多代码。
{-# LANGUAGE TypeFamilies, DeriveDataTypeable #-}
module Main (
main
) where
import Data.Typeable
import Data.Traversable
import Control.Applicative
import Data.IORef
--transformers package:
import Control.Monad.IO.Class
main = example1
-- Instead of implementing a dictionary, for an example we'll just use IORefs when we have IO.
instance MonadReference IO where
type Reference = IORef
new = newIORef
set = writeIORef
get = readIORef
现在,除了更新人员之外,我们还希望在多个数据结构中更新人员。我们将看两个数据结构:一个列表[Person]
, 和一个元组,(Person,Company)
。现在,我们可以Reference
为人们制作一个 s 列表,比如(people :: [Reference Person]) = [alice, bob]
,但这不是很有用。例如,我们真的不知道该怎么做show
。如果Reference
不混合在列表中会更有用。天真,Reference [Person]
会更有用。但这对此毫无意义set
,Reference
所以很明显我们的类型错误。Reference [Person]
只会让我们打电话get
把它变成一个m [Person]
,所以我们可以跳过它而只使用m [Person]
. 这是一个这样做的例子:
-- the only thing we need MonadIO for in this exmple is printing output
example2 :: (MonadIO m, MonadReference m) => m ()
example2 = do
-- alice :: Reference Person
alice <- new $ Person { name = "Alice" }
bob <- new $ Person { name = "Bob" }
-- company :: Reference Company
company <- new $ Company { legalName = "Eve's Surveillance" }
(liftIO . print) =<< get alice
(liftIO . print) =<< get bob
(liftIO . print) =<< get company
let people = do
a <- get alice
b <- get bob
return [a, b]
let structure2 = do
a <- get alice
c <- get company
return (a, c)
(liftIO . print) =<< people
(liftIO . print) =<< structure2
(liftIO . putStrLn) ""
set alice Person { name = "Mike" }
set company Company { legalName = "Mike's Meddling" }
(liftIO . print) =<< get alice
(liftIO . print) =<< get bob
(liftIO . print) =<< get company
(liftIO . print) =<< people
(liftIO . print) =<< structure2
现在我们对一个或多个用于执行此操作的库应该是什么样子有了相当多的了解。以下是我们可能已经想到的一些要求:
- 我们需要一些东西来保持所有实体的状态
- 我们需要一种方法来从一个状态到一个拥有新实体的新状态
- 我们需要一种方法来更新存储在状态中的实体
- 我们需要一种从状态中检索实体的方法
以下是对一些代码进行试验后出现的一些要求:
- 我们需要一种方法来生成依赖于实体当前状态的状态依赖值。我们在 、 和 中看到了
get alice
这get bob
一点get company
。
- 我们需要一种方法来从常量中生成依赖于状态的值。我们在 、 和 构造函数的使用中看到了
(:)
这[]
一点(,)
。
- 我们需要一种方法将多个状态相关值组合成新的状态相关值。
我们的示例也存在一些问题。如果我们接受MonadReference m => m a
type 的状态依赖值的类型a
,那么没有什么可以阻止我们认为从状态中获取值的东西也修改它。
我们也有性能问题。每次我们使用它们时,我们所有的状态相关值都会被完全重新计算。良好的性能要求可能是:
- 除非它所依赖的状态已被修改,否则不应计算状态相关值。
有了这些新要求,我们就可以制作新的接口。在我们制作新接口之后,我们可以为它们配备一个简单的实现。在我们有了一个简单的实现之后,我们可以解决我们对性能的要求,并做出一个高性能的实现。
一些可以让我们为下一步做准备的练习包括阅读或使用Control.Applicative
、发布者-订阅者设计模式、可操作的 monad 和转换器Program
或ProgramT
免费的 monad 和转换器Free
、FreeF
、 和FreeT
、Data.Traversable
、Control.Lens
和 knockout.js javascript 库。
更新:新界面
根据我们对状态相关值的新要求,我们可以编写一个新接口:
-- Class for a monad with state dependent values
class (MonadReference m, Applicative Computed, Monad Computed) => MonadComputed m where
type Computed :: * -> *
track :: (Typeable a) => Reference a -> m (Computed a)
runComputed :: (Typeable a) => (Computed a) -> m a
这些解决了我们的新要求,如下所示:
track
生成一个依赖于 a 的状态相关值Reference
,它满足我们的第一个新要求。
Applicative
'spure
和Monad
' return 都提供了一种方法来创建Computed
包含常量的新值。
Applicative
's<*>
和Monad
's>>=
提供了将计算值组合成新计算值的方法。
- 该
Computed
类型为实现排除不需要的类型提供了一种方法。
现在我们可以根据这个接口编写新的示例代码。我们将用三种不同的方式构造计算值:在带有实例的列表上使用' Data.Traversable
sequenceA ,使用实例 for ,最后使用实例 for 。Applicative
Computed
Monad
Computed
Applicative
Computed
-- the only thing we need MonadIO for in this exmple is printing output
example :: (MonadIO m, MonadComputed m) => m ()
example = do
-- aliceRef :: Reference Person
aliceRef <- new $ Person { name = "Alice" }
-- alice :: Computed Person
alice <- track aliceRef
bobRef <- new $ Person { name = "Bob" }
bob <- track bobRef
-- companyRef :: Reference Company
companyRef <- new $ Company { legalName = "Eve's Surveillance" }
-- company :: Computed Company
company <- track companyRef
(liftIO . print) =<< runComputed alice
(liftIO . print) =<< runComputed bob
(liftIO . print) =<< runComputed company
let people = Traversable.sequenceA [alice, bob]
let structure2 = do
a <- alice
c <- company
return (a, c)
let structure3 = (pure (,)) <*> structure2 <*> bob
(liftIO . print) =<< runComputed people
(liftIO . print) =<< runComputed structure2
(liftIO . print) =<< runComputed structure3
(liftIO . putStrLn) ""
set aliceRef Person { name = "Mike" }
set companyRef Company { legalName = "Mike's Meddling" }
(liftIO . print) =<< runComputed alice
(liftIO . print) =<< runComputed bob
(liftIO . print) =<< runComputed company
(liftIO . print) =<< runComputed people
(liftIO . print) =<< runComputed structure2
(liftIO . print) =<< runComputed structure3
请注意,如果我们不想或不需要track aliceRef
并且track bobRef
独立地,我们可以通过创建一个Computed
值列表mapM track [aliceRef, bobRef]
。
现在我们可以为 IO 做另一个简单的实现,这样我们就可以运行我们的示例并看到我们在正确的轨道上。我们将使用操作的Program
类型来简化此操作,并为我们提供一个Applicative
和一个Monad
实例。
-- Evaluate computations built in IO
instance MonadComputed IO where
-- Store the syntax tree in a Program from operational
type Computed = Program IORef
track = return . singleton
runComputed c = case view c of
Return x -> return x
ref :>>= k -> do
value <- readIORef ref
runComputed (k value)
此时整个运行示例为:
{-# LANGUAGE TypeFamilies, DeriveDataTypeable, FlexibleContexts #-}
module Main (
main
) where
import Data.Typeable
import qualified Data.Traversable as Traversable
import Control.Applicative
import Data.IORef
--transformers package:
import Control.Monad.IO.Class
--operational package:
import Control.Monad.Operational
main = example
data Person = Person {
name :: String
} deriving (Show, Typeable)
data Company = Company {
legalName :: String
} deriving (Show, Typeable)
-- the only thing we need MonadIO for in this exmple is printing output
example :: (MonadIO m, MonadComputed m) => m ()
example = do
-- aliceRef :: Reference Person
aliceRef <- new $ Person { name = "Alice" }
-- alice :: Computed Person
alice <- track aliceRef
bobRef <- new $ Person { name = "Bob" }
bob <- track bobRef
-- companyRef :: Reference Company
companyRef <- new $ Company { legalName = "Eve's Surveillance" }
-- company :: Computed Company
company <- track companyRef
(liftIO . print) =<< runComputed alice
(liftIO . print) =<< runComputed bob
(liftIO . print) =<< runComputed company
let people = Traversable.sequenceA [alice, bob]
let structure2 = do
a <- alice
c <- company
return (a, c)
let structure3 = (pure (,)) <*> structure2 <*> bob
(liftIO . print) =<< runComputed people
(liftIO . print) =<< runComputed structure2
(liftIO . print) =<< runComputed structure3
(liftIO . putStrLn) ""
set aliceRef Person { name = "Mike" }
set companyRef Company { legalName = "Mike's Meddling" }
(liftIO . print) =<< runComputed alice
(liftIO . print) =<< runComputed bob
(liftIO . print) =<< runComputed company
(liftIO . print) =<< runComputed people
(liftIO . print) =<< runComputed structure2
(liftIO . print) =<< runComputed structure3
-- Class for a typed dictionary in a monadic context
class (Monad m) => MonadReference m where
type Reference :: * -> *
new :: (Typeable a) => a -> m (Reference a)
set :: (Typeable a) => Reference a -> a -> m ()
get :: (Typeable a) => Reference a -> m a
-- Class for a monad with state dependent values
class (MonadReference m, Applicative Computed, Monad Computed) => MonadComputed m where
type Computed :: * -> *
track :: (Typeable a) => Reference a -> m (Computed a)
runComputed :: (Typeable a) => (Computed a) -> m a
-- Instead of implementing a dictionary, for an example we'll just use IORefs when we have IO.
instance MonadReference IO where
type Reference = IORef
new = newIORef
set = writeIORef
get = readIORef
-- Evaluate computations built in IO
instance MonadComputed IO where
-- Store the syntax tree in a Program from operational
type Computed = Program IORef
track = return . singleton
runComputed c = case view c of
Return x -> return x
ref :>>= k -> do
value <- readIORef ref
runComputed (k value)
我们仍然需要解决我们的性能要求,以最大限度地减少执行查找时所做的工作。我们的目标要求是:
- 除非它所依赖的状态已被修改,否则不应计算状态相关值。
我们现在可以根据我们的界面来澄清这一点:
runComputed
Computed
除非自上次执行以来它所依赖的值已被修改,否则不应计算runComputed
。
我们现在可以看到,我们想要的解决方案将是缓存失效或自下而上的查询评估。我猜想在一种具有惰性求值的语言中,它们的结果都是一样的。
最后更新:性能
配备新界面后,我们现在可以探索并实现我们的性能目标。在这样做的过程中,我发现我们遗漏了一个额外的、微妙的要求。runComputed
如果值未更改,我们希望重用先前计算的值。我们没有注意到的是 Haskell 的类型系统应该并且正在阻止我们这样做。一个类型的值Computed a
总是意味着同样的事情,它从来没有真正被修改过。因此,构建我们的结构的计算将意味着同样的事情,“由这些部分构建的计算”即使在我们执行之后也是如此runComputed
。我们需要溜到某个地方来放置第一次运行计算的副作用。我们可以用类型m (Computed a)
来代替。这样做的新方法MonadComputed m
是:
share :: (Typeable a) => (Computed a) -> m (Computed a)
我们得到的新的Computed a
含义略有不同:“由这些部分构建的可能缓存的计算”。我们已经在做类似的事情,但是告诉 Haskell 而不是告诉我们的代码。例如,我们写道:
let people = Traversable.sequenceA [alice, bob]
这let
告诉 Haskell 编译器每次遇到people
它都应该使用相同的 thunk。如果我们在Traversable.sequenceA [alice, bob]
每次使用时都编写它,Haskell 编译器可能不会创建和维护指向单个 thunk 的指针。在处理内存时知道这可能是一件好事。如果您想在内存中维护某些内容并避免计算,请使用let
,如果您想重新计算它以避免占用内存,请不要使用let
. 在这里,我们明确地想要保留我们的计算结构,所以我们将使用我们的新等价物,share
people <- share $ Traversable.sequenceA [alice, bob]
最后对示例代码的其余更改是为了演示更多可能的更新。
现在我们已经完成了接口,我们可以着手实现。此实现仍将利用IO
and IORef
s。我们的实现将基于订阅更改通知,并在更改发生时使缓存的更改及其依赖项无效。此数据结构存储一个值和想要通知的订阅者:
-- A published value for IO, using Weak references to the subscribers
data Published a = Published {
valueRef :: IORef a,
subscribers :: IORef [Weak (IO ())]
}
当 IO 中发生某些事情时需要通知的事情可能很简单IO ()
,但是依赖计算和值之间的循环会将所有依赖计算保存在内存中,直到忘记原始值。相反,指向依赖项更新操作的Weak
指针(从System.Mem.Weak
)应该允许垃圾收集器收集这些。
首先我们将实现MonadReference IO
. 我们处理Reference
s 到实体的代码被修改为窥视Published
以获取值,并在设置值时执行所有订阅者。
-- A new implementation that keeps an update list
instance MonadReference IO where
type Reference = Published
new = newIORefPublished
set = setIORefPublished
get = readIORefPublished
-- Separate implemenations for these, since we'd like to drop the Typeable constraint
newIORefPublished value =
do
ref <- newIORef value
subscribersRef <- newIORef []
return Published { valueRef = ref, subscribers = subscribersRef }
setIORefPublished published value =
do
writeIORef (valueRef published) value
notify $ subscribers published
--readIORefPublished = readIORef . valueRef
readIORefPublished x = do
putStrLn "getting"
readIORef $ valueRef x
通知订阅者有点棘手。我们需要忘记任何已被垃圾收集删除的订阅者。我预计订阅者可能会在其更新操作期间订阅事物以应对棘手的绑定情况,因此当订阅者被垃圾收集时,我们不会假设新的订阅者集是旧集,除了垃圾收集的订阅者,而是将它们过滤为单独的cleanupWeakRefs
步骤。
notify :: IORef [Weak (IO ())] -> IO ()
notify = go
where
go subscribersRef = do
subscribers <- readIORef subscribersRef
needsCleanup <- (liftM (any id)) (mapM notifySubscriber subscribers)
when needsCleanup $ cleanupWeakRefs subscribersRef
notifySubscriber weakSubscriber = do
maybeSubscriber <- deRefWeak weakSubscriber
case maybeSubscriber of
Nothing -> return True
Just subscriber -> subscriber >> return False
cleanupWeakRefs :: IORef [Weak a] -> IO ()
cleanupWeakRefs ref = do
weaks <- readIORef ref
newWeaks <- (liftM catMaybes) $ mapM testWeak weaks
writeIORef ref newWeaks
where
testWeak weakRef = liftM (>> Just weakRef) $ deRefWeak weakRef
我们已经完成了对实体的处理,是时候进入有趣且棘手的部分,即计算了。这是计算或状态相关值的完整数据类型:
-- Data type for building computations
data IORefComputed a where
Pure :: a -> IORefComputed a
Apply :: IORefComputed (b -> a) -> IORefComputed b -> IORefComputed a
Bound :: IORefComputed b -> (b -> IORefComputed a) -> IORefComputed a
Tracked :: Published a -> IORefComputed a
Shared :: Published (Either (IORefComputed a) a) -> IORefComputed a
Pure
表示不依赖于任何东西的值。Apply
表示由 的应用程序构建的值<*>
。Bound
表示使用Monad
实例的>>=
. Tracked
是使用track
. Shared
是我们记住计算并被通知跟踪值更改的点,使用share
. 我们重用该Published
类型来存储一个值及其订阅者,但我们存储的值是Either
共享缓存脏时需要执行的计算(IORefComputed a)
,或者缓存干净时的缓存值a
。以下是让用户使用这些的实例:
instance Monad IORefComputed where
return = Pure
(>>=) = Bound
(>>) _ = id
instance Applicative IORefComputed where
pure = return
(<*>) = Apply
instance Functor IORefComputed where
fmap = (<*>) . pure
-- Evaluate computations built in IO
instance MonadComputed IO where
type Computed = IORefComputed
track = trackIORefComputed
runComputed = evalIORefComputed
share = shareIORefComputed
-- Separate implementations, again to drop the Typeable constraint
trackIORefComputed = return . Tracked
注意: 的优化>>
几乎肯定会在 存在的情况下违反单子定律_|_
。
runComputed
现在我们需要对和进行非平凡的实现share
。首先我们看一下share
,它完成了大部分新工作:
shareIORefComputed :: IORefComputed a -> IO (IORefComputed a)
shareIORefComputed c =
case c of
Apply cf cx -> do
sharedf <- shareIORefComputed cf
sharedx <- shareIORefComputed cx
case (sharedf, sharedx) of
-- Optimize away constants
(Pure f, Pure x) -> return . Pure $ f x
_ -> do
let sharedc = sharedf <*> sharedx
published <- newIORefPublished $ Left sharedc
-- What we are going to do when either argument changes
markDirty <- makeMarkDirty published published sharedc
subscribeTo sharedf markDirty
subscribeTo sharedx markDirty
return $ Shared published
Bound cx k -> do
sharedx <- shareIORefComputed cx
case cx of
-- Optimize away constants
(Pure x) -> shareIORefComputed $ k x
_ -> do
let dirtyc = sharedx >>= k
published <- newIORefPublished $ Left dirtyc
-- What we are going to do when the argument to k changes
markDirty <- makeMarkDirty published published dirtyc
subscribeTo sharedx markDirty
return $ Shared published
_ -> return c
当我们被要求分享 , 的应用时<*>
,Apply
我们首先分享它的两个论点。如果我们可以确定它是恒定的,我们会优化掉这个值。如果我们不能优化它,我们会创建一个新的、最初是脏的缓存,并在任一参数更改时要求更新。
处理起来>>=
要困难得多。我们与 共享参数>>=
,但我们不知道Computed
函数将返回什么值,直到我们对每个参数进行评估。我们说它可以通过评估整个绑定来计算,并在参数更改时要求缓存无效。我们一定会在以后改进这一点。
在所有其他情况下,无需做任何事情来缓存该值。它要么是一个常量,要么是一个已经缓存的Tracked
or Shared
。
如果您怀疑是否需要share
,请将此定义替换为
shareIORefComputed c = return c
并运行示例。您会看到每次运行时都会读取每个涉及的值runComputed
。您无法runComputed
修改现有Computed
内容以了解它已被缓存的位置,因为我们无法更改 Haskell 中的现有值。
现在我们将实现runComputed
. 基本思想是我们像以前一样评估事物,但是当我们遇到脏共享缓存时,我们会计算它的新值并更新缓存。这些更新不会触发订阅者的通知。
evalIORefComputed :: IORefComputed a -> IO a
evalIORefComputed c =
case c of
Pure x -> return x
Apply cf cx -> do
f <- evalIORefComputed cf
x <- evalIORefComputed cx
return (f x)
Bound cx k -> do
value <- evalIORefComputed cx
evalIORefComputed (k value)
Tracked published -> readIORefPublished published
Shared publishedThunk -> do
thunk <- readIORefPublished publishedThunk
case thunk of
Left computation@(Bound cx k) -> do
x <- evalIORefComputed cx
-- Make a shared version of the computed computation
currentExpression <- shareIORefComputed (k x)
let gcKeyedCurrentExpression = Left currentExpression
writeIORef (valueRef publishedThunk) gcKeyedCurrentExpression
markDirty <- makeMarkDirty publishedThunk gcKeyedCurrentExpression computation
subscribeTo currentExpression markDirty
evalIORefComputed c
Left computation -> do
value <- evalIORefComputed computation
writeIORef (valueRef publishedThunk) (Right value)
return value
Right x ->
return x
这很简单,除了我们为脏共享做的事情>>=
。我们评估参数,然后share
进行计算。诀窍是我们要求在更新这个新值时将整个共享 thunk 标记为脏。当为此的脏标记是垃圾收集时,我们要求收集的垃圾忘记这一点currentExpression
。这提供了一个窗口,在此期间,thunk 可能会被标记为脏,即使它不再依赖于currentExpression
. 现在,共享绑定将通过对其参数的更改、对依赖于其参数的计算值的更改以及对最近依赖于其参数但尚未被垃圾回收的计算值的更改标记为脏。
实现的其余部分是构建对通知的弱引用并通过预先添加新订阅者来订阅已发布的值。
makeMarkDirty :: Published (Either (IORefComputed a) a) -> k -> IORefComputed a -> IO (Weak (IO ()))
makeMarkDirty published key definition =
do
let markDirty = do
existing <- readIORef (valueRef published)
case existing of
Right _ -> setIORefPublished published $ Left definition
_ -> return ()
mkWeak key markDirty Nothing
subscribeTo :: IORefComputed a -> Weak (IO ()) -> IO ()
subscribeTo (Tracked published) trigger = modifyIORef' (subscribers published) (trigger :)
subscribeTo (Shared published) trigger = modifyIORef' (subscribers published) (trigger :)
subscribeTo _ _ = return ()
完整的编译示例代码在 github 上。它需要变压器包。
如果您运行示例代码,您会注意到:
company
改名后,runComputed people
只执行一次get获取缓存值
- 在不进行任何更改后,所有三个结构都只执行一次 get 以获取整个缓存结构
- 更改后
bob
,runComputed structure2
仅执行一次 get 即可获取缓存值,并且计算完成的工作structure3
较少,即使bob
在structure3
.
structure2
,使用实例构建的一个Monad
,由于额外的中间共享值,需要最多的计算工作。