design-patterns - How to add fields that only cache something to ADT?

Question

Often I'm in the need of adding fields to an ADT that only memoize some redundant information. But I haven't figured out completely how to do it nicely and efficiently.

The best way to show the problem is to make an example. Suppose we're working with untyped lambda terms:

type VSym = String

data Lambda = Var VSym 
            | App Lambda Lambda
            | Abs VSym Lambda

And from time to time we need to compute the set of free variables of a term:

fv :: Lambda -> Set VSym
fv (Var v)    = Set.singleton v
fv (App s t)  = (fv s) `Set.union` (fv t)
fv (Abs v t)  = v `Set.delete` (fv t)

Soon we realize that repeated computations of fv are a bottleneck of our application. We would like to add it to the data type somehow. Like:

data Lambda1 = Var (Set VSym) VSym
             | App (Set VSym) Lambda Lambda
             | Abs (Set VSym) VSym Lambda

But it makes the definition quite ugly. Almost (Set VSym) takes more space than all the rest. Moreover, it breaks pattern matching in all functions that use Lambda. And to make things worse, if we later decide to add some other memoizing field, we'll have to rewrite all patterns again.

How to design a general solution that allows adding such memoizing fields easily and unobtrusively? I'd like to reach the following goals:

The data definition should look as close as possible to the original, so that it's easily readable and understandable.
Pattern matches too should remain simple and readable.
Adding a new memoizing field later should not break existing code, in particular:
- not to break existing patterns,
- not to require changes where the function we want to memoize was used (like code that used fv in this example).

I'll describe my current solution: To keep the data definition and pattern matches as little cluttered as possible, let's define:

data Lambda' memo = Var memo VSym 
                  | App memo (Lambda' memo) (Lambda' memo)
                  | Abs memo VSym (Lambda' memo)
type Lambda = Lambda' LambdaMemo

where the data to be memoized is defined separately:

data LambdaMemo = LambdaMemo { _fv :: Set VSym, _depth :: Int }

Then a simple function that retrieves the memoized part:

memo :: Lambda' memo -> memo
memo (Var c _)   = c
memo (App c _ _) = c
memo (Abs c _ _) = c

(This could be eliminated by using named fields. But then we'd have to name all the other fields as well.)

This allows us to pick specific parts from the memoize, keeping the same signature of fv as before:

fv :: Lambda -> Set VSym
fv = _fv . memo

depth :: Lambda -> Int
depth = _depth . memo

Finally, we declare these smart constructors:

var :: VSym -> Lambda
var v = Var (LambdaMemo (Set.singleton v) 0) v

app :: Lambda -> Lambda -> Lambda
app s t = App (LambdaMemo (fv s `Set.union` fv t) (max (depth t) (depth s))) s t

abs :: VSym -> Lambda -> Lambda
abs v t = Abs (LambdaMemo (v `Set.delete` fv t) (1 + depth t)) v t

Now we can efficiently write things that mix pattern matching with reading the memoized fields like

canSubstitute :: VSym -> Lambda -> Lambda -> Bool
canSubstitute x s t
  | not (x `Set.member` (fv t))
      = True -- the variable doesn't occur in `t` at all
canSubstitute x s t@(Abs _ u t')
  | u `Set.member` (fv s)
      = False
  | otherwise
      = canSubstitute x s t'
canSubstitute x s (Var _ _)
      = True
canSubstitute x s (App _ t1 t2)
      = canSubstitute x s t1 && canSubstitute x s t2

This seems to solve:

Pattern matching is still quite reasonable.
If we add a new memoizing field it won't disrupt existing code.
If we decide to memoize a function with signature Lambda -> Something we can easily add it as a new memoizing field.

What I still don't like about this design:

The data definition isn't so bad, but still placing memo everywhere clutters it considerably.
We need to have smart constructors for constructing values but we use the regular constructors for pattern matching. This is not so bad, we simply add one _, but having the same signature for constructing and deconstructing would be nice. I suppose Views or Pattern Synonyms would solve it.
The computation of the memoized fields (free variables, depth) is tightly coupled to the smart constructors. As it's reasonable to assume that those memoized functions will be always catamorphisms, I believe this could be solved to some extent by tools like the fixpoint package.

Any ideas how to improve it? Or are there better ways to solve such a problem?

score 2 · Accepted Answer

我认为您的所有目标都可以通过在函数中使用普通的旧记忆而不是通过在 ADT 本身中缓存结果来实现。就在几周前，我发布了stable-memo包，这应该会有所帮助。检查您的标准，我认为我们不能做得比这更好：

您的数据定义根本没有改变。
模式匹配也不会改变。
现有代码不必仅仅因为您编写了更多的记忆函数而改变。
- 现有模式不会被破坏。
- 没有现有的记忆功能被破坏。

使用它非常简单。只需应用于memo您想要记忆的任何函数，确保您在任何地方都使用该函数的记忆版本，即使在递归调用中也是如此。以下是如何编写您在问题中使用的示例：

import Data.StableMemo

type VSym = String

data Lambda = Var VSym 
            | App Lambda Lambda
            | Abs VSym Lambda

fv :: Lambda -> Set VSym
fv = memo go
  where
    go (Var v)   = Set.singleton v
    go (App s t) = fv s `Set.union` fv t
    go (Abs v t) = v `Set.delete` fv t

design-patterns - How to add fields that only cache something to ADT?

1 回答 1

Related

Reference