Often I'm in the need of adding fields to an ADT that only memoize some redundant information. But I haven't figured out completely how to do it nicely and efficiently.
The best way to show the problem is to make an example. Suppose we're working with untyped lambda terms:
type VSym = String
data Lambda = Var VSym
| App Lambda Lambda
| Abs VSym Lambda
And from time to time we need to compute the set of free variables of a term:
fv :: Lambda -> Set VSym
fv (Var v) = Set.singleton v
fv (App s t) = (fv s) `Set.union` (fv t)
fv (Abs v t) = v `Set.delete` (fv t)
Soon we realize that repeated computations of fv
are a bottleneck of our application. We would like to add it to the data type somehow. Like:
data Lambda1 = Var (Set VSym) VSym
| App (Set VSym) Lambda Lambda
| Abs (Set VSym) VSym Lambda
But it makes the definition quite ugly. Almost (Set VSym)
takes more space than all the rest. Moreover, it breaks pattern matching in all functions that use Lambda
. And to make things worse, if we later decide to add some other memoizing field, we'll have to rewrite all patterns again.
How to design a general solution that allows adding such memoizing fields easily and unobtrusively? I'd like to reach the following goals:
- The
data
definition should look as close as possible to the original, so that it's easily readable and understandable. - Pattern matches too should remain simple and readable.
- Adding a new memoizing field later should not break existing code, in particular:
- not to break existing patterns,
- not to require changes where the function we want to memoize was used (like code that used
fv
in this example).
I'll describe my current solution: To keep the data
definition and pattern matches as little cluttered as possible, let's define:
data Lambda' memo = Var memo VSym
| App memo (Lambda' memo) (Lambda' memo)
| Abs memo VSym (Lambda' memo)
type Lambda = Lambda' LambdaMemo
where the data to be memoized is defined separately:
data LambdaMemo = LambdaMemo { _fv :: Set VSym, _depth :: Int }
Then a simple function that retrieves the memoized part:
memo :: Lambda' memo -> memo
memo (Var c _) = c
memo (App c _ _) = c
memo (Abs c _ _) = c
(This could be eliminated by using named fields. But then we'd have to name all the other fields as well.)
This allows us to pick specific parts from the memoize, keeping the same signature of fv
as before:
fv :: Lambda -> Set VSym
fv = _fv . memo
depth :: Lambda -> Int
depth = _depth . memo
Finally, we declare these smart constructors:
var :: VSym -> Lambda
var v = Var (LambdaMemo (Set.singleton v) 0) v
app :: Lambda -> Lambda -> Lambda
app s t = App (LambdaMemo (fv s `Set.union` fv t) (max (depth t) (depth s))) s t
abs :: VSym -> Lambda -> Lambda
abs v t = Abs (LambdaMemo (v `Set.delete` fv t) (1 + depth t)) v t
Now we can efficiently write things that mix pattern matching with reading the memoized fields like
canSubstitute :: VSym -> Lambda -> Lambda -> Bool
canSubstitute x s t
| not (x `Set.member` (fv t))
= True -- the variable doesn't occur in `t` at all
canSubstitute x s t@(Abs _ u t')
| u `Set.member` (fv s)
= False
| otherwise
= canSubstitute x s t'
canSubstitute x s (Var _ _)
= True
canSubstitute x s (App _ t1 t2)
= canSubstitute x s t1 && canSubstitute x s t2
This seems to solve:
- Pattern matching is still quite reasonable.
- If we add a new memoizing field it won't disrupt existing code.
- If we decide to memoize a function with signature
Lambda -> Something
we can easily add it as a new memoizing field.
What I still don't like about this design:
- The
data
definition isn't so bad, but still placingmemo
everywhere clutters it considerably. - We need to have smart constructors for constructing values but we use the regular constructors for pattern matching. This is not so bad, we simply add one
_
, but having the same signature for constructing and deconstructing would be nice. I suppose Views or Pattern Synonyms would solve it. - The computation of the memoized fields (free variables, depth) is tightly coupled to the smart constructors. As it's reasonable to assume that those memoized functions will be always catamorphisms, I believe this could be solved to some extent by tools like the fixpoint package.
Any ideas how to improve it? Or are there better ways to solve such a problem?