Aug 31, 2020

# Brilliant But Lazy

The third and final installment in a three part series about Lodash. Lazy evaluation, performance gains and its applications in Lodash.   Some time back I wrote an article about Lodash. It covered some of the great things about the library as well as some pitfalls of using it. Part of it focused on Lodash's `chain` function, which allows you to pass a value through a chain of Lodash utilities.

Later, I wrote an article about composition with Lodash and the FP module, `flow` and how `flow` can overcome some of  `chain`'s issues.

In this article, I'll be talking a little more about `chain` and how it uses a trick that, in some cases, can drastically improve performance when dealing with arrays.

But first, let's talk about two different approaches to computing values.

## Lazy vs Eager Evaluation

First, let's consider three simple functions in JavaScript.

``````// Simple function to compute the boolean and of its arguments
function and(a, b) {
return a && b
}

// Another simple function to compute the boolean or of its arguments
function or(a, b) {
return a || b
}

// Simple troublemaker
function loopForever() {
while (true) {}
}
``````

The `and` and `or` functions just perform simple boolean operations.

``````const x = true, y = false, z = true

const result = or(x, and(y, z))
// -> result is true
``````

But if you try to pass `loopForever` as an argument into one of them,

``````const neverStopsLooping = or(x, loopForever())
// -> computer go BOOM!
``````

things stop working.

However, if you directly use the `||` operator instead of the `or` function, it works. You can try it yourself below:

Why does this work?

It's because JS and many other languages use short-circuit evaluation for boolean operators.

This is the essence of two broad evaluation strategies. The first example shows how JS usually evaluates expressions like the function call. It does so eagerly, which means that all sub-expressions (like the `loopsForever` function call and the value of the `x` variable) are evaluated first and then the whole expression.

On the other hand, in the last example with the `||` operator, the expression is evaluated from left to right. As soon as it is obvious that the first operand of `||` is true, there is no point in evaluating the entire expression. So the JS runtime doesn't even bother calling `loopsForever`.

The second example definitely isn't eager evaluation and isn't strictly using the other evaluation strategy either, but to a certain degree emulates lazy evaluation.

Some operators in JS work this way, but in general, JS evaluates expressions eagerly. On the other hand, some languages like Haskell and Elixir evaluate lazily by default.

Evaluating all expressions lazily allows you to use things that wouldn't be possible (at least not easily) in JS, such as infinite lists.

For example, in Haskell, it is possible to create and store infinite lists with the following shorthand:

``````-- A finite list
numbersOneToTen = [1..10]

-- An infinite list
naturalNumbers = [1..]
``````

A popular example of its usage is even displayed at the top of its official website, which contains a code snippet to generate the list of all prime numbers. It's surprisingly terse. Most of this code involves filtering out non-primes, but what we're interested in is in the first line, which applies the filter on an infinite list of numbers starting from 2 (`[2..]`).

How do you store an infinite list in memory? Obviously no computer on earth has the necessary amount of memory.

The key is in how lists are defined in Haskell. They're recursively defined as follows:

``````-- A list of type a (a is a stand-in for a type like Integrer or String)

data [a] = [] | (a : [a])

-- This is actually pseudo-code. The actual definition is similar,

In plain words, a list (of some type `a`) is defined either as an empty list or a head concatenated with a tail, which itself is a list.

So, when a list needs to be traversed, it is traversed an element at a time, starting from the head, proceeding recursively to the head of the tail. So, as long as nobody tries to traverse the entire list, the program uses finite memory.

All this talk about infinite lists may not make much practical sense right now, but it will as soon as we see how it can be used as a performance optimisation...

...and that's exactly what Lodash does.

## How Lodash Evaluates Chains

Remember `chain` from Lodash? It takes a value, wraps it up and allows you to call Lodash functions on it as methods. Effectively, it lets you chain together operations with a very convenient syntax.

However, in some cases, it may not be necessary to apply each function in the chain to every element in the input (particularly in the case of arrays). For example, in the above example, we use `take(2)` to pick out the first two elements from the result array. If the input array had 1000 elements, first mapping, then filtering those 1000 elements would be wasteful, if only 2 from the result were actually required.

So, Lodash gets around this by implementing a technique that reduces the number of function applications required to get the final result. We can see this by running the following code snippet:

When we see the number of times map and filter were called, instead of the expected 1000 for each of them, we see something much smaller (it will be random, since our input is random, but will usually be somewhere between 3 to 6).

The technique used by Lodash is clever, in some ways intuitive and is called...

## Shortcut Fusion

You could think of the previous example as working in stages. First the array is passed through `map`, then `filter` and finally through `take`. However, on average, this can be wasteful. Instead, if we sequence this differently and pass each element through `map`, `filter` and `take` before the next element, you could stop passing the elements through the pipeline as soon as the required number of elements (3) is satisfied. This is the essence of shortcut fusion. It means combining (fusing) functions that are going to be applied to elements into a single call and calling it on the elements in one go, instead of requiring passes (where the passes would be the `map`, `filter` and `take` stages).

For example,

``numbers.map(x => x + 2).map(x => x + 3)``

could be fused into

``numbers.map(x => x + 2 + 3)``

or, this

``numbers.filter(x => x > 0).filter(x => x % 2 === 0)``

could be converted to

``numbers.filter(x => x > 0 && x % 2 === 0)``

By doing so, the number of iterations on the array can be halved or better, especially if the entire array isn't going to be used anyway (like if `take` was being used too).

Lodash's magic isn't in the functions, but in the value method. The functions in `map`, `filter` and `take` (called the iteratees) aren't even applied until `value` is called. The `value` method calls the fused iteratees (callbacks passed to `map`, `filter` etc.) and then unwraps the value.

The wrapped value is stored somewhere nested inside many wrappers (`LodashWrapper` and `LazyWrapper`). We can see how the wrapped value remains the same by logging the result of each stage.

The wrapped value (in the most deeply nested `__wrapped__`) doesn't change at all. Instead, what we see is that in each stage, the `__iteratees__` array grows. `__iteratees__` contains descriptors for each callback in `map`, `filter` etc.

Each iteratee has a `type` field (1 if it was a callback inside `filter`, 2 for `map` and so on) and an `iteratee` field, which contains the actual function which will be called on the array.

To keep track of special filters like `take`, the wrapper contains a field called `__takeCount__` which is taken into account when calling `value`.

This implements lazy evaluation because the actual computation is only done when `value` is called, and at that time, only those values which are required are pulled out of the input array and passed through the iteratees.

You can find the code for the implementation of the value method here.

The `chain` method isn't the only Lodash utility that uses lazy evaluation. `flow` does this too:

## Generators

Lodash does fine when dealing with large collections of data efficiently. But what about infinite lists?

Remember the Haskell example? Is it possible to create something similar in JavaScript?

The key to implementing infinite lists is to realise that the list is represented as a head and tail. The tail is itself a list, but is not evaluated until its head or some other element actually needs to be accessed.

The list is traversed one value at a time. So what we need is a construct that allows us to define a pattern and then generate a value at a time on demand.

To do this, we can use generators. Generators can be created using generator functions which have a special syntax in JS.

``````function* infiniteList() {
// ...
}``````

Calling it returns an object that lets you generate values on demand.

``````const naturalNumbers = infiniteList()

``````

We could define the generator for the natural numbers as follows:

It's not the same as the Haskell example, which is immutable. The generator instance is stateful. As soon as you call `next` on it, it's impossible to get the old values back, unless you've stored them.

However, generators can be used as the basis to write powerful code that leverages lazy evaluation. You can read more about how to implement lazy versions of `map`, `filter` and `reduce` in this really good article.

## The Problem With Laziness

Lazy evaluation is great as a performance optimisation. Some languages use it as the default evaluation strategy. The problem with it though, is that it's really difficult to reason about complexity (particularly space utilisation) because memory is allocated on demand and it may not be obvious when it's going to be allocated.

Overuse of lazy evaluation can lead to unexpected performance bottlenecks, which is why a lot of people prefer eager evaluation as the default strategy and criticise Haskell for implementing lazy evaluation by default.

Lazy evaluation can be tricky, but in eager languages, it can improve performance, if used carefully.

Lodash is just one example of this. You can find many other examples of using lazy evaluation to enhance performance. `chain` is pretty cool, but there's only so much data you're going to work with if you're using JavaScript. For truly large datasets, you're probably going to pick other systems like Spark, which leverages lazy evaluation in a very similar way.

If you want a really deep insight into evaluation strategies, the classic book The Structure and Interpretation of Computer Programs contains a chapter on order of evaluation. But beware, examples are in Lisp. However, the explanation is very detailed. It also uses different names: the terms `Applicative order` and `Normal order` to more or less refer to eager and lazy evaluation respectively.    