The Master Theorem is a lovely little theorem for solving the sorts of recurrences that often come up in divide-and-conquer algorithms. I’ve seen a bunch of different versions of this theorem over the years. Some are simpler to state but less powerful, while others are extremely expressive but can be a bit intimidating for students learning it for the first time. When I last taught CS161, I used this version of the Master Theorem:

**Theorem:** Let T(n) be a recurrence defined by T(n) = aT(n/b) + O(n^d). Then:

- if a/b^d < 1, then T(n) = O(n^d).
- if a/b^d = 1, then T(n) = O(n^d log n).
- if a/b^d > 1, then T(n) = O(n^(log_b a)).

When I first saw this theorem as an undergrad, I remember thinking two things. First, “Wow! It’s amazing that this just happens to work out this way.” Second, “Where on earth did this come from?” I had an amazing algorithms instructor (Tim Roughgarden) who showed a derivation of this result and worked through the intuition behind it. Even so, I remember thinking that the math behind the formal proof of this theorem was surprisingly tricky and very quickly obscured the intuition.

When it was my turn to teach CS161, I gave a presentation similar to the one I’d seen from Tim, which focused on the intuition behind the three cases, followed up with a quick mathematical derivation of each case. I based the math on what I’d read in CLRS, and I remember thinking that it was a complete pain to get everything to work out correctly.

About a year ago I was talking with a former student of mine (Roslyn Cyrus, whose blog has some very cool descriptions of how to model, represent, and solve puzzle games) about a recurrence relation she’d come across that didn’t quite fit the Master Theorem, and in the course of working through it together I came across a different way of presenting the proof of the Master Theorem that, I believe, is much easier than the one in CLRS, makes the intuition behind the theorem much clearer (and therefore more generalizable), and avoids a lot of the denser algebraic manipulations that typically go along with the theorem. I thought I’d share this approach in case anyone feels like adopting it in their own theory classes (or in case any future algorithms students want to get a better sense of where this result comes from!)

Rather than taking on the Master Theorem directly, I’d like to start with a different example.

Let’s talk nuclear reactors.

Nuclear reactors are cool. I say this as someone who enjoys visiting decommissioned nuclear plants and taking photos of old cooling towers. I have a very weak grasp of the physics behind how they work, but at a *very* high level, a nuclear reactor works like this. You have a bunch of atoms that will break apart if you hit them with a neutron. When those atoms break apart, they release a bunch of energy, and then they kick off more neutrons, which in turn break apart more atoms, which in turn kicks off more neutrons, etc. By extracting the energy that’s given off when the atoms break apart, the reactor generates electricity, and by very closely controlling how many of the neutrons are produced at each point in time, the reactor makes sure it doesn’t blow itself apart.

Let’s imagine that you’re operating a nuclear reactor and you want to figure out how much total power your reactor is going to produce over some period of time. In our simplified model, you can figure this out purely from three quantities. First, when an atom splits, how many other atoms will end up splitting as a direct result of the neutrons that atom kicks off? It might be that every time you split an atom, it fires off two neutrons, which then go on to cause two more atoms to split. Or perhaps each atom, on average, releases 1.3 neutrons, causing (on average) 1.3 more atoms to split. Or maybe, on average, each atom releases 0.7 neutrons, causing (on average) 0.7 more atoms to split. Let’s denote the average number of new atoms that will split based on a single fission event to be r. Intuitively, if r = 1, the reaction will continue at the same rate over time, since each time an atom splits it triggers exactly one future atom to “replace” it and split again. If r < 1, the reaction will slow down and eventually die out, since each fission event doesn’t produce enough new fission events to “replace” it. And, if r > 1, then the reaction will grow over time and produce more and more power, since each fission event ends up triggering even more future fission events, which trigger more future fission events, etc. (That last case is generally considered desirable if you’re designing a nuclear weapon and should be treated with care if you’re designing a nuclear power plant.)

(Fun fact: the actual way nuclear reactors work is more complicated than this due to the fact that there are both *prompt neutrons* generated directly from nuclear fuel fission and *delayed neutrons* generated from the decay of older fission byproducts. But in this simplified model, we’ll ignore that!)

The next quantity you’d probably want to know is how much power your reactor is initially generating. As a simplifying assumption, we’ll assume that the amount of power generated is directly proportional to the number of fission events occurring per unit time. (In reality it’s more complicated. I don’t know exactly *how* it’s more complicated, but I’d be surprised if it were really this simple.) We’ll imagine that the reactor’s initial power output at the time you start monitoring it is denoted by p. If p is large, it means that the reactor is initially producing a lot of power, and if p is small it means that the reactor is initially producing very little power.

Finally, there’s the question of how long you let the reactor run. The longer you let the reactor run, the more power you’ll generate. Here we’re going to make a very aggressive simplifying assumption and assume that the reactor works in “stages,” where one “stage” corresponds to a bunch of atoms splitting, the next “stage” corresponds to the next round of atoms splitting, etc. Obviously, this isn’t how real reactors work, but it makes the math a bit easier. Let’s assume that the total number of “stages” we let the reactor run for is given by t, with t standing for “time.”

So we now have three parameters: r, which tells us whether the reaction grows, shrinks, or remains stable over time; p, which tells us the initial level of activity in the reactor; and t, which tells us how long the reaction will go for. Given these parameters, how much total power ends up getting produced?

Well, let’s think about this one step at a time. In the first “stage” of our nuclear reaction, we’ll produce p total power. Each one of the fission events will trigger r new fission events, and since the power generated is proportional to the number of fission events, in the next stage we’ll produce p·r total power. Depending on what value we have for r, that might be greater than, less than, or equal to the amount of power we generated in the first stage. Each of the fission events in that second stage will in turn trigger r more fission events in the third stage, and because the power generated is proportional to the number of fission events, in the third stage we’ll have p·r^2 total power generated. Similarly, in the fourth stage we’ll have p·r^3 total power generated, in the fifth stage we’ll have p·r^4 total power generated, etc. In other words, the amount of power generated at each stage changes exponentially as a function of the number of steps taken so far.

So how much total power ends up getting generated over t time steps? Well, that’s going to be the sum of the power drawn at each step, which is given by

p + p·r + p·r^2 + … + p·r^(t−1).

We now have a nice summation in terms of our three parameters – r, p, and t – that tells us how much total power is going to be produced. So now we can ask – how exactly do you simplify that summation?
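To make that summation concrete, here’s a quick Python sketch (the function name and variables are mine, not anything standard) that tallies the power stage by stage:

```python
def total_power(r, p, t):
    """Total power over t stages, where stage i (0-indexed) produces p * r**i."""
    return sum(p * r**i for i in range(t))

# A stable reaction (r = 1) produces p units of power per stage,
# so 10 stages at p = 100 yield 1000 total:
print(total_power(1.0, 100.0, 10))  # 1000.0
```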

The good news is that this is a famous summation – it’s the sum of a geometric series! This is one of those sums that I think everyone at some point learned and then promptly forgot – I’ll admit that until I started routinely teaching CS theory I had completely forgotten it. There are two cases that arise for this sum. First, if r = 1, then notice that r^i = 1 for any choice of i, and so this sum is relatively easy to evaluate:

p + p·r + p·r^2 + … + p·r^(t−1) = p + p + p + … + p = p·t.

In other words, the total power generated is equal to the initial power p times the total number of time steps t, which makes sense because the number of fission events per stage isn’t changing as a function of time.

On the other hand, if r ≠ 1, then there are two possibilities. The first possibility is that r < 1, in which case the amount of power generated at each stage in the reactor is lower than the previous stage (it’s decaying geometrically). In that case, using the formula for the sum of an infinite geometric series, we have the nice, handy result that

p + p·r + p·r^2 + … + p·r^(t−1) < p + p·r + p·r^2 + … = p / (1 − r).

In other words, the total power generated will be at most some constant multiple of the original power, and the specific constant multiple depends on the choice of r.

In the last case, where r > 1, we have this lovely formula at our disposal:

p + p·r + p·r^2 + … + p·r^(t−1) = p · (r^t − 1) / (r − 1).

That formula is, um, a bit dense. For the purposes of where we’re going, I think it’s a bit easier to see what’s going on if we rewrite it like this:

p · (r^t − 1) / (r − 1) ≤ (1 / (r − 1)) · p · r^t = O(p · r^t).

In other words, the total power drawn is exponential in the number of time steps t, with the base of the exponent being r and an extra constant factor that also depends on r.

So what’s the net result of all of this mathematical gymnastics? Although we’ve framed this in the context of nuclear reactions, really, the above math applies equally well to *any* process involving some quantity that changes exponentially as a function of time. In fact, we can summarize what we’ve seen in this nice, handy way:

**Theorem:** Let r be a growth parameter, p be an initial value, and t be a number of time steps. Then:

- if r < 1, then p + p·r + … + p·r^(t−1) = O(p).
- if r = 1, then p + p·r + … + p·r^(t−1) = O(p·t).
- if r > 1, then p + p·r + … + p·r^(t−1) = O(p·r^t).
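If you’d like to see those three bounds in action rather than take them on faith, here’s a small numeric check (a sketch of mine, not part of any formal proof):

```python
def total_power(r, p, t):
    """The geometric sum p + p*r + ... + p*r**(t - 1)."""
    return sum(p * r**i for i in range(t))

p = 5.0

# r < 1: bounded by the constant p / (1 - r), no matter how large t gets.
assert total_power(0.7, p, 1000) < p / (1 - 0.7)

# r = 1: exactly p * t.
assert total_power(1.0, p, 1000) == p * 1000

# r > 1: within a constant factor (depending only on r) of p * r**t.
r, t = 1.3, 50
assert total_power(r, p, t) < p * r**t / (r - 1)
```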

The general framework of this theorem – there are some initial parameters which, depending on which of three cases holds, give three different bounds on some overall quantity – looks a *lot* like the setup of the Master Theorem, and that’s not a coincidence. In fact, the Master Theorem is essentially what you get if you work out how the work done in a recurrence relation grows or shrinks as a function of time and then just plug the appropriate quantities back up into the above theorem!

To pull everything together, let’s return to the Master Theorem. The setup for the Master Theorem assumes that we have some recurrence relation T(n) = aT(n/b) + O(n^d). This corresponds to the work done by a recursive function that

- does O(n^d) work at each step, then
- makes a total recursive calls,
- each of which is on an input of size n/b.

The question then is the following: if you call this recursive function on an input of size n, how much total work ends up being done? In other words, what’s a good, reasonable bound on T(n)?

To figure that out, it often helps to visualize exactly what this recursive call tree would look like using a recursion tree. With the recurrence we have up above, the recursion tree would look something like this:

Here, our top-level recursive call (on an input of size n) fires off a recursive calls, each of which is on an input of size n/b. Each of those recursive calls then fires off a new recursive calls, each of which is of size n/b^2, and though it’s not shown, those calls each fire off a calls of size n/b^3, which fire off a calls of size n/b^4, etc., until eventually we hit our base case.

Although we haven’t discussed base cases here, let’s assume that our base case occurs when we get the input down to size 1. Since when we’re i layers deep into our recursion the input size is n/b^i, this means that the recursion terminates when n/b^i = 1, so the number of layers in our tree will be log_b n.
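As a quick sanity check on that layer count, here’s a sketch (mine, not the original analysis) that divides the input down to the base case and counts the layers:

```python
def recursion_depth(n, b):
    """Count how many times n can be divided by b before reaching size 1."""
    layers = 0
    while n > 1:
        n /= b
        layers += 1
    return layers

# With n = 1024 and b = 2, the input halves 10 times, matching log_2(1024) = 10:
print(recursion_depth(1024, 2))  # 10
```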

If we want to sum up how much total work is done across all the recursive calls, we *could* sum up how much work is done at each individual call, but that turns out to be a bit tricky. Rather, we’ll sum up how much work is done at each individual layer in the tree. (If you haven’t seen this before, this is a standard technique that comes up when manipulating recursion trees.)

In the first layer, we have a single recursive call of size n, and since our recurrence relation is T(n) = aT(n/b) + O(n^d), the work done purely in that call is O(n^d).

In the second layer, we have a recursive calls made. Each call is on an input of size n/b. This means that each individual recursive call does O((n/b)^d) work. Across all a calls, that works out to a total of O(a · (n/b)^d) = O(n^d · (a / b^d)) total work.

How about the layer after that? Well, there are a^2 recursive calls here (each of the a calls in the layer above made a calls) and each call is on an input of size n/b^2, since each call in the layer above us was on an input of size n/b and each call shrinks the input size by a factor of b. The work done by each individual call here is O((n/b^2)^d), and summed up across all a^2 calls on this row, the work done is O(a^2 · (n/b^2)^d) = O(n^d · (a / b^d)^2).

It seems like there’s a very interesting trend here. The first layer did O(n^d) work. The second layer did O(n^d · (a/b^d)) work. The third layer did O(n^d · (a/b^d)^2) work. It turns out that using similar logic the fourth layer ends up doing O(n^d · (a/b^d)^3) work, the fifth layer does O(n^d · (a/b^d)^4) work, etc. Notice that the work changes geometrically between levels – each level does a/b^d times as much work as the previous one. We also know how many layers in the tree there are – there are log_b n of them – and we know how much work is done at the top level (it’s O(n^d)). In other words, the work done here is pretty much exactly equal to the work done by our nuclear reactor, with

- r = a / b^d,
- p = n^d, and
- t = log_b n.
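To see that correspondence numerically, here’s a sketch (names mine) that computes the work in each layer of the tree for a recurrence of the form T(n) = aT(n/b) + n^d:

```python
def layer_work(n, a, b, d, layer):
    """Work across one layer (0-indexed): a**layer calls,
    each on an input of size n / b**layer, each doing size**d work."""
    calls = a**layer
    size = n / b**layer
    return calls * size**d

# Mergesort-style recurrence (a = 2, b = 2, d = 1): a / b**d = 1,
# so every layer does the same amount of work.
print([layer_work(1024, 2, 2, 1, i) for i in range(5)])  # [1024.0, 1024.0, 1024.0, 1024.0, 1024.0]
```

Bumping a up to 3 (the Karatsuba-style case) makes each layer do 3/2 as much work as the one above it – the r > 1 regime from the reactor theorem.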

So all that’s left to do now is to plug these quantities into each of the three cases of the earlier theorem and do some algebra!

Our first case is where a/b^d < 1. That’s the exponential decay case – it means that the total work done decays from layer to layer, like in our nuclear reactor where each fission doesn’t trigger enough fissions to replace it. In that case, the sum of the work across the layers of the tree is given by O(p / (1 − r)). Plugging in p = n^d and r = a/b^d gives us that the total work here is O(n^d / (1 − a/b^d)), and since 1 / (1 − a/b^d) is a fixed constant that doesn’t depend on n, the total work in this case is O(n^d). So if a/b^d < 1, we get that the recurrence solves to T(n) = O(n^d).

Our second case is where a/b^d = 1. That’s the stable case – it means that the work done in each layer of the recursion tree is the same, and it corresponds to the case in our nuclear reactor where each fission event replaces itself in the next iteration. In that case, the work done is O(p·t). Plugging in our values, we get that the total work done here is O(n^d · log_b n) = O(n^d log n).

The last case is where a/b^d > 1. That’s the exponential growth case – it means that the work done in each layer of the tree increases as you go down the tree, and it corresponds to the case in our nuclear reactor where the reactor power keeps on growing as more and more fissions occur. In this case, the work done is bounded by O(p·r^t). If we plug in our values of p = n^d, r = a/b^d, and t = log_b n and simplify, we get that the total work done is bounded as follows:

T(n) = O(n^d · (a / b^d)^(log_b n)).

That looks horrible – I’m really sorry! – but it’s about to get a lot better. Using the useful but not particularly well known equality that x^(log_b y) = y^(log_b x), we get that this whole thing simplifies to

T(n) = O(n^d · n^(log_b (a / b^d))),

and since log_b (a / b^d) is just the constant log_b a − d with respect to n, this whole messy expression ends up being O(n^d · n^(log_b a − d)) = O(n^(log_b a)).
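That equality x^(log_b y) = y^(log_b x) follows from writing x as b^(log_b x), so that both sides equal b^(log_b x · log_b y). If you don’t trust the algebra, a quick numeric spot-check (mine) is easy:

```python
import math

x, y, b = 7.0, 19.0, 2.0

lhs = x ** math.log(y, b)
rhs = y ** math.log(x, b)

# Both sides equal b**(log_b(x) * log_b(y)), so they agree up to floating-point error.
assert math.isclose(lhs, rhs)
```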

Summarizing the three cases we found:

- If a/b^d < 1, then T(n) = O(n^d).
- If a/b^d = 1, then T(n) = O(n^d log n).
- If a/b^d > 1, then T(n) = O(n^(log_b a)).

Or, if you think that fraction involving a, b, and d is too messy, we can rewrite the same thing with logarithms:

- If d > log_b a, then T(n) = O(n^d).
- If d = log_b a, then T(n) = O(n^d log n).
- If d < log_b a, then T(n) = O(n^(log_b a)).
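All three cases fit in a tiny function. This sketch (mine – don’t mistake it for a standard library routine) takes a, b, and d and reports the bound:

```python
import math

def master_theorem(a, b, d):
    """Solve T(n) = a*T(n/b) + O(n**d), returning the asymptotic bound as a string."""
    ratio = a / b**d
    if ratio < 1:
        return f"O(n^{d})"
    elif ratio == 1:
        return f"O(n^{d} log n)"
    else:
        return f"O(n^{math.log(a, b):.3f})"

print(master_theorem(1, 2, 0))  # binary search: O(n^0 log n), i.e., O(log n)
print(master_theorem(2, 2, 1))  # mergesort:     O(n^1 log n)
print(master_theorem(3, 2, 1))  # Karatsuba:     O(n^1.585)
```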

*Et voila*. We’ve derived the Master Theorem!

This presentation is lengthy, but I think it hits on the core ideas behind the Master Theorem. First, that **the Master Theorem is a theorem about exponential growth, stability, or decay**. The three cases in the Master Theorem correspond to whether some key parameter of the recurrence causes a certain behavior to grow exponentially, remain stable over time, or decay exponentially. Second, that **the Master Theorem is a theorem about recursion trees**. The specific exponential parameter we get here, a/b^d, is found by looking at the recursion tree for the recurrence, going one layer at a time, and summing up the total work that’s being done.

If you understand that these are the key ideas behind the Master Theorem, then you have more than a handy little tool for most recurrence relations – you have a framework for thinking about other recursion trees that you see in the future.

For example, consider the recurrence relation T(n) = T(n/5) + T(7n/10) + O(n), which comes up in the analysis of the median-of-medians algorithm for linear-time selection. This recurrence relation doesn’t fit into the framework given by the Master Theorem because the recursive bits don’t have the same size. But that doesn’t mean we can’t take a stab at it. If we think about what that recursion tree is going to look like, we’ll see that at the first level we have a single call of size n, and it does O(n) work. Below that are two calls of size n/5 and 7n/10, which collectively will do O(n/5 + 7n/10) = O(9n/10) work. That looks a lot like exponential decay. In fact, it is – if you keep expanding out the tree, you’ll see that the work at each level is at most 9/10 of the work at the level above it. Using what we know about geometric series, that suggests we’d see the same sort of behavior we’d see in general exponential decay – that the work at the top dominates the rest of the work – and that’s indeed what we find. This recurrence solves to T(n) = O(n). (The initial recurrence that spawned the conversation that led to this presentation of the Master Theorem was a similar one with unevenly-sized recursive calls. Can you do a similar analysis on recurrences like that?)
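One way to convince yourself of that linear bound is to just iterate the recurrence numerically. Here’s a sketch of mine that evaluates T(n) = T(n/5) + T(7n/10) + n with a base case of 0 below size 1, and checks that T(n) never exceeds 10n – the geometric-series bound n / (1 − 9/10):

```python
def T(n):
    """Iterate T(n) = T(n/5) + T(7n/10) + n, with T(n) = 0 below the base case."""
    if n < 1:
        return 0.0
    return T(n / 5) + T(7 * n / 10) + n

# Each level does at most 9/10 the work of the level above it, so T(n) <= 10n:
for n in [100, 10_000, 100_000]:
    assert T(n) <= 10 * n
```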

Or what about my all-time favorite recurrence relation, T(n) = √n · T(√n) + O(n)? That doesn’t look *anything* like what the Master Theorem talks about. But look at what happens if you draw out the recursion tree – each level does O(n) work (do you see why?) – so we’d expect that the work done here should be the work per level times the number of levels. That makes the recurrence solve to T(n) = O(n log log n), once you recognize that the number of levels happens to be O(log log n).
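For a recurrence with that square-root shape, the level count is easy to check numerically. This sketch (mine) takes square roots until the input falls to a small base case and counts how many levels that takes:

```python
import math

def levels(n):
    """Count tree levels: how many square roots until the input drops to size <= 2."""
    count = 0
    while n > 2:
        n = math.sqrt(n)
        count += 1
    return count

# Squaring the input adds just one more level - the signature of log(log n) growth:
print(levels(2**16), levels(2**32), levels(2**64))  # 4 5 6
```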

If I get another chance to teach CS161, I’m thinking about using this presentation – with some better visuals and a lot more exposition in the math – to explain the Master Theorem and how to attack other sorts of related recurrence relations. But in the meantime, I figured I’d get this out in case anyone else was interested in taking a look at it.

Enjoy!