I have long been wondering why lazy evaluation is useful. I have yet to have anyone explain to me in a way that makes sense; mostly it ends up boiling down to "trust me".
Note: I do not mean memoization.
I have long been wondering why lazy evaluation is useful. I have yet to have anyone explain to me in a way that makes sense; mostly it ends up boiling down to "trust me".
Note: I do not mean memoization.
Mostly because it can be more efficient -- values don't need to be computed if they're not going to be used. For example, I may pass three values into a function, but depending on the sequence of conditional expressions, only a subset may actually be used. In a language like C, all three values would be computed anyway; but in Haskell, only the necessary values are computed.
It also allows for cool stuff like infinite lists. I can't have an infinite list in a language like C, but in Haskell, that's no problem. Infinite lists are used fairly often in certain areas of mathematics, so it can be useful to have the ability to manipulate them.
A useful example of lazy evaluation is the use of quickSort
:
quickSort [] = []
quickSort (x:xs) = quickSort (filter (< x) xs) ++ [x] ++ quickSort (filter (>= x) xs)
If we now want to find the minimum of the list, we can define
minimum ls = head (quickSort ls)
Which first sorts the list and then takes the first element of the list. However, because of lazy evaluation, only the head gets computed. For example, if we take the minimum of the list [2, 1, 3,]
quickSort will first filter out all the elements that are smaller than two. Then it does quickSort on that (returning the singleton list [1]) which is already enough. Because of lazy evaluation, the rest is never sorted, saving a lot of computational time.
This is of course a very simple example, but laziness works in the same way for programs that are very large.
There is, however, a downside to all this: it becomes harder to predict the runtime speed and memory usage of your program. This doesn't mean that lazy programs are slower or take more memory, but it's good to know.
I find lazy evaluation useful for a number of things.
First, all existing lazy languages are pure, because it is very hard to reason about side effects in a lazy language.
Pure languages let you reason about function definitions using equational reasoning.
foo x = x + 3
Unfortunately in a non-lazy setting, more statements fail to return than in a lazy setting, so this is less useful in languages like ML. But in a lazy language you can safely reason about equality.
Secondly, a lot of stuff like the 'value restriction' in ML aren't needed in lazy languages like Haskell. This leads to a great decluttering of syntax. ML like languages need to use keywords like var or fun. In Haskell these things collapse down to one notion.
Third, laziness lets you write very functional code that can be understood in pieces. In Haskell it is common to write a function body like:
foo x y = if condition1
then some (complicated set of combinators) (involving bigscaryexpression)
else if condition2
then bigscaryexpression
else Nothing
where some x y = ...
bigscaryexpression = ...
condition1 = ...
condition2 = ...
This lets you work 'top down' though the understanding of the body of a function. ML-like languages force you to use a let
that is evaluated strictly. Consequently, you don't dare 'lift' the let clause out to the main body of the function, because if it expensive (or has side effects) you don't want it always to be evaluated. Haskell can 'push off' the details to the where clause explicitly because it knows that the contents of that clause will only be evaluated as needed.
In practice, we tend to use guards and collapse that further to:
foo x y
| condition1 = some (complicated set of combinators) (involving bigscaryexpression)
| condition2 = bigscaryexpression
| otherwise = Nothing
where some x y = ...
bigscaryexpression = ...
condition1 = ...
condition2 = ...
Fourth, laziness sometimes offers much more elegant expression of certain algorithms. A lazy 'quick sort' in Haskell is a one-liner and has the benefit that if you only look at the first few items, you only pay costs proportional to the cost of selecting just those items. Nothing prevents you from doing this strictly, but you'd likely have to recode the algorithm each time to achieve the same asymptotic performance.
Fifth, laziness allows you to define new control structures in the language. You can't write a new 'if .. then .. else ..' like construct in a strict language. If you try to define a function like:
if' True x y = x
if' False x y = y
in a strict language then both branches would be evaluated regardless of the condition value. It gets worse when you consider loops. All strict solutions require the language to provide you with some sort of quotation or explicit lambda construction.
Finally, in that same vein, some of the best mechanisms for dealing with side-effects in the type system, such as monads, really can only be expressed effectively in a lazy setting. This can be witnessed by comparing the complexity of F#'s Workflows to Haskell Monads. (You can define a monad in a strict language, but unfortunately you'll often fail a monad law or two due to lack of laziness and Workflows by comparison pick up a ton of strict baggage.)
There's a difference between normal order evaluation an lazy evaluation (as in Haskell).
square x = x * x
Evaluating the following expression...
square (square (square 2))
... with eager evaluation:
> square (square (2 * 2))
> square (square 4)
> square (4 * 4)
> square 16
> 16 * 16
> 256
... with normal order evaluation:
> (square (square 2)) * (square (square 2))
> ((square 2) * (square 2)) * (square (square 2))
> ((2 * 2) * (square 2)) * (square (square 2))
> (4 * (square 2)) * (square (square 2))
> (4 * (2 * 2)) * (square (square 2))
> (4 * 4) * (square (square 2))
> 16 * (square (square 2))
> ...
> 256
... with lazy evaluation:
> (square (square 2)) * (square (square 2))
> ((square 2) * (square 2)) * ((square 2) * (square 2))
> ((2 * 2) * (2 * 2)) * ((2 * 2) * (2 * 2))
> (4 * 4) * (4 * 4)
> 16 * 16
> 256
That's because lazy evaluation looks at the syntax tree and does tree-transformations...
square (square (square 2))
||
\/
*
/ \
\ /
square (square 2)
||
\/
*
/ \
\ /
*
/ \
\ /
square 2
||
\/
*
/ \
\ /
*
/ \
\ /
*
/ \
\ /
2
... whereas normal order evaluation only does textual expansions.
That's why we, when using lazy evaluation, get more powerful (evaluation terminates more often then other strategies) while the performance is equivalent to eager evaluation (at least in O-notation).
If you believe Simon Peyton Jones, lazy evaluation is not important per se but only as a 'hair shirt' that forced the designers to keep the language pure. I find myself sympathetic to this point of view.
Richard Bird, John Hughes, and to a lesser extend Ralf Hinze are able to do amazing things with lazy evaluation. Reading their work will help you appreciate it. To good starting points are Bird's magnificent Sudoku solver and Hughes's paper on Why Functional Programming Matters.
Lazy evaluation related to CPU the same way as garbage collection related to RAM. GC allows you to pretend that you have unlimited amount of memory and thus request as many objects in memory as you need. Runtime will automatically reclaim unusable objects. LE allows you pretending that you have unlimited computational resources - you can do as many computations as you need. Runtime just will not execute unnecessary (for given case) computations.
What is the practical advantage of these "pretending" models? It releases developer (to some extent) from managing resources and removes some boilerplate code from your sources. But more important is that you can efficiently reuse your solution in wider set of contexts.
Imagine that you have a list of numbers S and a number N. You need to find the closest to number N number M from list S. You can have two contexts: single N and some list L of Ns (e.i. for each N in L you look up the closest M in S). If you use lazy evaluation, you can sort S and apply binary search to find closest M to N. For good lazy sorting it will require O(size(S)) steps for single N and O(ln(size(S))*(size(S) + size(L))) steps for equally distributed L. If you don't have lazy evaluation to achieve the optimal efficiency you have to implement algorithm for each context.
Consider a tic-tac-toe program. This has four functions:
This creates a nice clear separation of concerns. In particular the move-generation function and the board evaluation functions are the only ones that need to understand the rules of the game: the move tree and minimax functions are completely reusable.
Now lets try implementing chess instead of tic-tac-toe. In an "eager" (i.e. conventional) language this won't work because the move tree won't fit in memory. So now the board evaluation and move generation functions need to be mixed in with the move tree and minimax logic because the minimax logic has to be used to decide which moves to generate. Our nice clean modular structure disappears.
However in a lazy language the elements of the move tree are only generated in response to demands from the minimax function: the entire move tree does not need to be generated before we let minimax loose on the top element. So our clean modular structure still works in a real game.
Here are two more points which I do not believe have been brought up in the discussion yet.
Laziness is a synchronization mechanism in a concurrent environment. It is a lightweight and easy way to create a reference to some computation, and share its results among many threads. If multiple threads attempt to access an unevaluated value, only one of them will execute it, and the others will block accordingly, receiving the value once it becomes available.
Laziness is fundamental to amortizing data structures in a pure setting. This is described by Okasaki in Purely Functional Data Structures in detail, but the basic idea is that lazy evaluation is a controlled form of mutation critical to allowing us implement certain types of data structures efficiently. While we often speak of laziness forcing us to wear the purity hairshirt, the other way applies too: they are a pair of synergistic language features.
When you turn on your computer and Windows refrains from opening every single directory on your hard drive in Windows Explorer and refrains from launching every single program installed on your computer, until you indicate that a certain directory is needed or a certain program is needed, that is "lazy" evaluation.
"Lazy" evaluation is performing operations when and as they are needed. It is useful when it is a feature of a programming language or library because it is generally harder to implement lazy evaluation on your own than simply to precalculate everything up front.
It can boost efficiency. This is the obvious-looking one, but it's not actually the most important. (Note also that laziness can kill efficiency too - this fact is not immediately obvious. However, by storing up lots of temporary results rather than calculating them immediately, you can use up a huge amount of RAM.)
It lets you define flow control constructs in normal user-level code, rather than it being hard-coded into the language. (E.g., Java has for
loops; Haskell has a for
function. Java has exception handling; Haskell has various types of exception monad. C# has goto
; Haskell has the continuation monad...)
It lets you decouple the algorithm for generating data from the algorithm for deciding how much data to generate. You can write one function that generates a notionally-infinite list of results, and another function that processes as much of this list as it decides it needs. More to the point, you can have five generator functions and five consumer functions, and you can efficiently produce any combination - instead of manually coding 5 x 5 = 25 functions that combine both actions at once. (!) We all know decoupling is a good thing.
It more or less forces you to design a pure functional language. It's always tempting to take short-cuts, but in a lazy language, the slightest impurity makes your code wildly unpredictable, which strongly militates against taking shortcuts.
Consider this:
if (conditionOne && conditionTwo) {
doSomething();
}
The method doSomething() will be executed only if conditionOne is true and conditionTwo is true. In the case where conditionOne is false, why do you need to compute the result of the conditionTwo? The evaluation of conditionTwo will be a waste of time in this case, especially if your condition is the result of some method process.
That's one example of the lazy evaluation interest...
One huge benefit of laziness is the ability to write immutable data structures with reasonable amortized bounds. A simple example is an immutable stack (using F#):
type 'a stack =
| EmptyStack
| StackNode of 'a * 'a stack
let rec append x y =
match x with
| EmptyStack -> y
| StackNode(hd, tl) -> StackNode(hd, append tl y)
The code is reasonable, but appending two stacks x and y takes O(length of x) time in best, worst, and average cases. Appending two stacks is a monolithic operation, it touches all of the nodes in stack x.
We can re-write the data structure as a lazy stack:
type 'a lazyStack =
| StackNode of Lazy<'a * 'a lazyStack>
| EmptyStack
let rec append x y =
match x with
| StackNode(item) -> Node(lazy(let hd, tl = item.Force(); hd, append tl y))
| Empty -> y
lazy
works by suspending the evaluation of code in its constructor. Once evaluated using .Force()
, the return value is cached and reused on every subsequent .Force()
.
With the lazy version, appends are an O(1) operation: it returns 1 node and suspends the actual rebuilding of the list. When you get the head of this list, it will evaluate the contents of the node, forcing it return the head and create one suspension with the remaining elements, so taking the head of the list is an O(1) operation.
So, our lazy list is in a constant state of rebuilding, you don't pay the cost for rebuilding this list until you traverse through all of its elements. Using laziness, this list supports O(1) consing and appending. Interestingly, since we don't evaluate nodes until their accessed, its wholly possible to construct a list with potentially infinite elements.
The data structure above does not require nodes to be recomputed on each traversal, so they are distinctly different from vanilla IEnumerables in .NET.
This snippet shows the difference between lazy and not lazy evaluation. Of course this fibonacci function could itself be optimized and use lazy evaluation instead of recursion, but that would spoil the example.
Let's suppose we MAY have to use the 20 first numbers for something, with not lazy evaluation all the 20 numbers have to be generated upfront, but, with lazy evaluation they'll be generated as needed only. Thus you will pay only the calculation price when needed.
Sample output
Not lazy generation: 0.023373 Lazy generation: 0.000009 Not lazy output: 0.000921 Lazy output: 0.024205
import time
def now(): return time.time()
def fibonacci(n): #Recursion for fibonacci (not-lazy)
if n < 2:
return n
else:
return fibonacci(n-1)+fibonacci(n-2)
before1 = now()
notlazy = [fibonacci(x) for x in range(20)]
after1 = now()
before2 = now()
lazy = (fibonacci(x) for x in range(20))
after2 = now()
before3 = now()
for i in notlazy:
print i
after3 = now()
before4 = now()
for i in lazy:
print i
after4 = now()
print "Not lazy generation: %f" % (after1-before1)
print "Lazy generation: %f" % (after2-before2)
print "Not lazy output: %f" % (after3-before3)
print "Lazy output: %f" % (after4-before4)
Lazy evaluation is most useful with data structures. You can define an array or vector inductively specifying only certain points in the structure and expressing all others in terms of the whole array. This lets you generate data structures very concisely and with high run-time performance.
To see this in action, you can have a look at my neural network library called instinct. It makes heavy use of lazy evaluation for elegance and high performance. For example I totally get rid of the traditionally imperative activation calculation. A simple lazy expression does everything for me.
This is used for example in the activation function and also in the backpropagation learning algorithm (I can only post two links, so you'll need to look up the learnPat
function in the AI.Instinct.Train.Delta
module yourself). Traditionally both require much more complicated iterative algorithms.
Other people already gave all the big reasons, but I think a useful exercise to help understand why laziness matters is to try and write a fixed-point function in a strict language.
In Haskell, a fixed-point function is super easy:
fix f = f (fix f)
this expands to
f (f (f ....
but because Haskell is lazy, that infinite chain of computation is no problem; the evaluation is done "outside-to-inside", and everything works wonderfully:
fact = fix $ \f n -> if n == 0 then 1 else n * f (n-1)
Importantly, it matters not that fix
be lazy, but that f
be lazy. Once you've already been given an strict f
, you can either throw your hands in the air and give up, or eta expand it and clutter stuff up. (This is a lot like what Noah was saying about it being the library that's strict/lazy, not the language).
Now imagine writing the same function in strict Scala:
def fix[A](f: A => A): A = f(fix(f))
val fact = fix[Int=>Int] { f => n =>
if (n == 0) 1
else n*f(n-1)
}
You of course get a stack overflow. If you want it to work, you need to make the f
argument call-by-need:
def fix[A](f: (=>A) => A): A = f(fix(f))
def fact1(f: =>Int=>Int) = (n: Int) =>
if (n == 0) 1
else n*f(n-1)
val fact = fix(fact1)
I don't know how you currently think of things, but I find it useful to think of lazy evaluation as a library issue rather than a language feature.
I mean that in strict languages, I can implement lazy evaluation by building a few data structures, and in lazy languages (at least Haskell), I can ask for strictness when I want it. Therefore, the language choice doesn't really make your programs lazy or non-lazy, but simply affects which you get by default.
Once you think of it like that, then think of all the places where you write a data structure that you can later use to generate data (without looking at it too much before then), and you'll see a lot of uses for lazy evaluation.
Lazy evaluation is poor man's equational reasoning (which could be expected, ideally, to be deducing properties of code from properties of types and operations involved).
Example where it works quite well: sum . take 10 $ [1..10000000000]
. Which we don't mind being reduced to a sum of 10 numbers, instead of just one direct and simple numeric calculation. Without the lazy evaluation of course this would create a gigantic list in memory just to use its first 10 elements. It would certainly be very slow, and might cause an out-of-memory error.
Example where it's not as great as we'd like: sum . take 1000000 . drop 500 $ cycle [1..20]
. Which will actually sum the 1 000 000 numbers, even if in a loop instead of in a list; still it should be reduced to just one direct numeric calculation, with few conditionals and few formulas. Which would be a lot better then summing up the 1 000 000 numbers. Even if in a loop, and not in a list (i.e. after the deforestation optimization).
Another thing is, it makes it possible to code in tail recursion modulo cons style, and it just works.
cf. related answer.
The most useful exploitation of lazy evaluation that I've used was a function that called a series of sub-functions in a particular order. If any one of these sub-functions failed (returned false), the calling function needed to immediately return. So I could have done it this way:
bool Function(void) {
if (!SubFunction1())
return false;
if (!SubFunction2())
return false;
if (!SubFunction3())
return false;
(etc)
return true;
}
or, the more elegant solution:
bool Function(void) {
if (!SubFunction1() || !SubFunction2() || !SubFunction3() || (etc) )
return false;
return true;
}
Once you start using it, you'll see opportunities to use it more and more often.
Without lazy evaluation you won't be allowed to write something like this:
if( obj != null && obj.Value == correctValue )
{
// do smth
}
Among other things, lazy languages allow multidimensional infinite data structures.
While scheme, python, etc allow single dimensional infinite data structures with streams, you can only traverse along one dimension.
Laziness is useful for the same fringe problem, but it's worth noting the coroutines connection mentioned in that link.
If by "lazy evaluation" you mean like in combound booleans, like in
if (ConditionA && ConditionB) ...
then the answer is simply that the fewer CPU cycles the program consumes, the faster it will run... and if a chunk of processing instructions will have no impact on the the outcome of the program then it is unecessary, (and therefore a waste of time) to perform them anyway...
if otoh, you mean what I have known as "lazy initializers", as in:
class Employee
{
private int supervisorId;
private Employee supervisor;
public Employee(int employeeId)
{
// code to call database and fetch employee record, and
// populate all private data fields, EXCEPT supervisor
}
public Employee Supervisor
{
get
{
return supervisor?? (supervisor = new Employee(supervisorId));
}
}
}
Well, this technique allows client code using the class to avoid the need to call the database for the Supervisor data record except when the client using the Employee object requires access to the supervisor's data... this makes the process of instantiating an Employee faster, and yet when you need the Supervisor, the first call to the Supervisor property will trigger the Database call and the data will be fetched and available...
Excerpt from Higher order functions
Let's find the largest number under 100,000 that's divisible by 3829. To do that, we'll just filter a set of possibilities in which we know the solution lies.
largestDivisible :: (Integral a) => a
largestDivisible = head (filter p [100000,99999..])
where p x = x `mod` 3829 == 0
We first make a list of all numbers lower than 100,000, descending. Then we filter it by our predicate and because the numbers are sorted in a descending manner, the largest number that satisfies our predicate is the first element of the filtered list. We didn't even need to use a finite list for our starting set. That's laziness in action again. Because we only end up using the head of the filtered list, it doesn't matter if the filtered list is finite or infinite. The evaluation stops when the first adequate solution is found.