Aug 31, 2020

Brilliant But Lazy

The third and final installment in a three part series about Lodash. Lazy evaluation, performance gains and its applications in Lodash.
Rishabh Karnad
Rishabh KarnadAssociate Engineering Manager
lines

Some time back I wrote an article about Lodash. It covered some of the great things about the library as well as some pitfalls of using it. Part of it focused on Lodash's chain function, which allows you to pass a value through a chain of Lodash utilities.

Later, I wrote an article about composition with Lodash and the FP module, flow and how flow can overcome some of  chain's issues.

In this article, I'll be talking a little more about chain and how it uses a trick that, in some cases, can drastically improve performance when dealing with arrays.

But first, let's talk about two different approaches to computing values.

Lazy vs Eager Evaluation

First, let's consider three simple functions in JavaScript.

The and and or functions just perform simple boolean operations.

But if you try to pass loopForever as an argument into one of them,

things stop working.

However, if you directly use the || operator instead of the or function, it works. You can try it yourself below:

Why does this work?

It's because JS and many other languages use short-circuit evaluation for boolean operators.

This is the essence of two broad evaluation strategies. The first example shows how JS usually evaluates expressions like the function call. It does so eagerly, which means that all sub-expressions (like the loopsForever function call and the value of the x variable) are evaluated first and then the whole expression.

On the other hand, in the last example with the || operator, the expression is evaluated from left to right. As soon as it is obvious that the first operand of || is true, there is no point in evaluating the entire expression. So the JS runtime doesn't even bother calling loopsForever.

The second example definitely isn't eager evaluation and isn't strictly using the other evaluation strategy either, but to a certain degree emulates lazy evaluation.

Some operators in JS work this way, but in general, JS evaluates expressions eagerly. On the other hand, some languages like Haskell and Elixir evaluate lazily by default.

Evaluating all expressions lazily allows you to use things that wouldn't be possible (at least not easily) in JS, such as infinite lists.

For example, in Haskell, it is possible to create and store infinite lists with the following shorthand:

A popular example of its usage is even displayed at the top of its official website, which contains a code snippet to generate the list of all prime numbers. It's surprisingly terse.

Most of this code involves filtering out non-primes, but what we're interested in is in the first line, which applies the filter on an infinite list of numbers starting from 2 ([2..]).

How do you store an infinite list in memory? Obviously no computer on earth has the necessary amount of memory.

The key is in how lists are defined in Haskell. They're recursively defined as follows:

In plain words, a list (of some type a) is defined either as an empty list or a head concatenated with a tail, which itself is a list.

So, when a list needs to be traversed, it is traversed an element at a time, starting from the head, proceeding recursively to the head of the tail. So, as long as nobody tries to traverse the entire list, the program uses finite memory.

All this talk about infinite lists may not make much practical sense right now, but it will as soon as we see how it can be used as a performance optimisation...

...and that's exactly what Lodash does.

How Lodash Evaluates Chains

Remember chain from Lodash? It takes a value, wraps it up and allows you to call Lodash functions on it as methods. Effectively, it lets you chain together operations with a very convenient syntax.

However, in some cases, it may not be necessary to apply each function in the chain to every element in the input (particularly in the case of arrays). For example, in the above example, we use take(2) to pick out the first two elements from the result array. If the input array had 1000 elements, first mapping, then filtering those 1000 elements would be wasteful, if only 2 from the result were actually required.

So, Lodash gets around this by implementing a technique that reduces the number of function applications required to get the final result. We can see this by running the following code snippet:

When we see the number of times map and filter were called, instead of the expected 1000 for each of them, we see something much smaller (it will be random, since our input is random, but will usually be somewhere between 3 to 6).

The technique used by Lodash is clever, in some ways intuitive and is called...

Shortcut Fusion

You could think of the previous example as working in stages. First the array is passed through map, then filter and finally through take.

However, on average, this can be wasteful. Instead, if we sequence this differently and pass each element through map, filter and take before the next element, you could stop passing the elements through the pipeline as soon as the required number of elements (3) is satisfied.

This is the essence of shortcut fusion. It means combining (fusing) functions that are going to be applied to elements into a single call and calling it on the elements in one go, instead of requiring passes (where the passes would be the map, filter and take stages).

For example,

could be fused into

or, this

could be converted to

By doing so, the number of iterations on the array can be halved or better, especially if the entire array isn't going to be used anyway (like if take was being used too).

Lodash's magic isn't in the functions, but in the value method. The functions in map, filter and take (called the iteratees) aren't even applied until value is called. The value method calls the fused iteratees (callbacks passed to map, filter etc.) and then unwraps the value.

The wrapped value is stored somewhere nested inside many wrappers (LodashWrapper and LazyWrapper). We can see how the wrapped value remains the same by logging the result of each stage.

The wrapped value (in the most deeply nested __wrapped__) doesn't change at all. Instead, what we see is that in each stage, the __iteratees__ array grows. __iteratees__ contains descriptors for each callback in map, filter etc.

Each iteratee has a type field (1 if it was a callback inside filter, 2 for map and so on) and an iteratee field, which contains the actual function which will be called on the array.

To keep track of special filters like take, the wrapper contains a field called __takeCount__ which is taken into account when calling value.

This implements lazy evaluation because the actual computation is only done when value is called, and at that time, only those values which are required are pulled out of the input array and passed through the iteratees.

You can find the code for the implementation of the value method here.

The chain method isn't the only Lodash utility that uses lazy evaluation. flow does this too:

Generators

Lodash does fine when dealing with large collections of data efficiently. But what about infinite lists?

Remember the Haskell example? Is it possible to create something similar in JavaScript?

The key to implementing infinite lists is to realise that the list is represented as a head and tail. The tail is itself a list, but is not evaluated until its head or some other element actually needs to be accessed.

The list is traversed one value at a time. So what we need is a construct that allows us to define a pattern and then generate a value at a time on demand.

To do this, we can use generators. Generators can be created using generator functions which have a special syntax in JS.

Calling it returns an object that lets you generate values on demand.

We could define the generator for the natural numbers as follows:

It's not the same as the Haskell example, which is immutable. The generator instance is stateful. As soon as you call next on it, it's impossible to get the old values back, unless you've stored them.

However, generators can be used as the basis to write powerful code that leverages lazy evaluation. You can read more about how to implement lazy versions of map, filter and reduce in this really good article.

The Problem With Laziness

Lazy evaluation is great as a performance optimisation. Some languages use it as the default evaluation strategy. The problem with it though, is that it's really difficult to reason about complexity (particularly space utilisation) because memory is allocated on demand and it may not be obvious when it's going to be allocated.

Overuse of lazy evaluation can lead to unexpected performance bottlenecks, which is why a lot of people prefer eager evaluation as the default strategy and criticise Haskell for implementing lazy evaluation by default.

Conclusion and Further Reading

Lazy evaluation can be tricky, but in eager languages, it can improve performance, if used carefully.

Lodash is just one example of this. You can find many other examples of using lazy evaluation to enhance performance. chain is pretty cool, but there's only so much data you're going to work with if you're using JavaScript. For truly large datasets, you're probably going to pick other systems like Spark, which leverages lazy evaluation in a very similar way.

If you want a really deep insight into evaluation strategies, the classic book The Structure and Interpretation of Computer Programs contains a chapter on order of evaluation. But beware, examples are in Lisp. However, the explanation is very detailed. It also uses different names: the terms Applicative order and Normal order to more or less refer to eager and lazy evaluation respectively.

Hope you enjoyed this article. Thanks for reading!

Book a Discovery Call.

blog logo