recur Documentation, Release 1.0

panopticonopolis

Oct 16, 2019

Contents:

1 Getting Started
  1.1 Introduction: Another Guide To Recursion? And Why Is It So Long?
  1.2 Scope, Frame and Stack
  1.3 Recursion in Light of Frames
  1.4 Counting In Recursion
  1.5 Recursion and Swapped Arguments
  1.6 Palindromes and Recursion-as-Evaluation
  1.7 Summing a List of Lists
  1.8 Using Recursion to Make More: The Power Set
  1.9 Expanding a Series: Pascal's Triangle
  1.10 Multiple Recursive Calls: Fibonacci Sequence, Part 1
  1.11 Memoization: Fibonacci Sequence, Part 2
  1.12 Recursive Approaches To Searching And Sorting
  1.13 Recursion and Self-Similarity: Koch Curves
  1.14 The Sierpinski Triangle
  1.15 Lindenmayer Systems
  1.16 Solving L-System Recursion
  1.17 Boss Level: The Tower of Hanoi

2 Indices and tables


A shamelessly verbose guide to the wonders of recursion


Chapter 1

Getting Started

For a lot of people, learning recursion for the first time pretty much sucks. It doesn’t have to be that way. This guide is intended to help beginning (and perhaps even intermediate) programmers learn to think recursively. It’s not math-heavy, so there are no proofs, and very little discussion of time/space complexity. But I do take a text-heavy approach, because I think patient explanation is a key ingredient in helping people understand this crucial technique. Although the code is presented in Python, recursion is a fairly universal concept, so the material should be accessible to non-Python developers. Sometimes I’ll present the code immediately and then unpack it. Other times, we’ll work towards the final recursive solution, starting only from first principles. At the end of each section I deduce a few heuristics, and include an exercise or two that will apply the material and push comprehension a bit further. I hope the end result is a critical framework developers can use to identify, analyze and solve problems that demand (or simply favor) recursive solutions.

1.1 Introduction: Another Guide To Recursion? And Why Is It So Long?

Oftentimes algorithms are presented to beginning programmers as recipes or to-do lists. You may have a to-do list that starts with 'pick up the kids from school', then realize that you first must put gas in the car. But before that, you need to find your car keys. You can't do the last step unless you've done all the other ones. If you're baking a cake, the final step may be to frost the cake, but not before you have taken the cake out of the oven, which of course requires you to make the cake in the first place, in addition to preparing the frosting, etc.

Of course, there is nothing particularly recursive about this. Breaking a problem down into more manageable subproblems is true of algorithmic design in general. What makes recursion unique in algorithmic thinking is that it's about designing a solution in such a way that, simply by running the same function over and over with slightly modified inputs, you can have your cake and eat it too.

The following guide to recursion is undoubtedly long. But that is because most material currently available online is simply too short. Recursion is an extremely compact method. While this allows for a presentation that is usually considered 'elegant', the practical consequence is that much of the program execution takes place implicitly.


For example, it's not terribly difficult to understand the principle of recursive action, whether generally or for a specific piece of code. But, as with any algorithm, the ability to write out the actual process of computation on a line-by-line basis is where the rubber meets the road. As a result, much of the length of this document is due to the fact that I have annotated what actually happens when a function undergoes recursive computation, using print statements to trace inputs, transformations and outputs.

It's also long because the usual ways of presenting problems and their recursive solutions are inadequate. Usually, the code is simply given ex cathedra, or perhaps accompanied by a brief commentary on its salient aspects. With recursion, there is an abiding mystery of just how we got to that solution. Although it's not true for every algorithm I've chosen to present here, in this guide I prefer to describe how to think about the problem, and then how to convert that critical thinking into code.

If you understand recursion on an intuitive level, this guide is most likely not for you. It will be boring and obvious. But for many others, it's my hope that the explanations of how recursive calls and returns work will be time well spent.

The following text uses Python 3.6 but the concepts should be broadly applicable to many languages. A solid working knowledge of Python is preferred, including the basic data types and their associated methods. (Virtually) no familiarity with object-oriented techniques is needed.

1.2 Scope, Frame and Stack

The most common way of introducing recursion to programmers involves two seemingly simple steps. Given a programming language that allows a function to call itself, we construct a recursive function once we do two things:

1) Identify the base case
2) Identify the recursive case

We formulate these two cases by successively reducing the problem at hand to 'its simplest possible solution', whatever that means. This, as the saying goes, is necessary but not sufficient, and there's much that still needs to be unpacked. The purpose of this guide is to provide the reader with a toolbox of heuristics that can be used to quickly analyze a recursive algorithm. Once the reader is comfortable with analysis, the same set of heuristics can be applied to thinking about a problem recursively, and constructing a solution.
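To give those two steps a concrete shape before we dig into the machinery, here is a tiny sketch of my own (not one of the worked examples that follow) - a function that counts down from n:

def countdown(n):
    if n == 0:               # 1) the base case: nothing left to count
        print('done')
    else:                    # 2) the recursive case: same function, smaller input
        print(n)
        countdown(n - 1)

countdown(3)

>>> 3
>>> 2
>>> 1
>>> done

Exactly how Python keeps all those repeated calls straight is the subject of the rest of this section.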

1.2.1 Understanding how functions work

Recursion is commonly introduced to students within the context of iterative procedures, especially for loops, 'but with functions'. This is the first source of misunderstanding. Recursion is better considered as a consequence of functional scope, frames and the call stack. Put differently, we understand recursion when we understand how functions prioritize, quarantine, and share values bound to variables and the evaluation of expressions (scope), and the way those spaces are created and then discarded (frames added to and subtracted from the stack). This may make little sense in the abstract, so let's look at some code. Just about the simplest program we can write is:

hello = "Hello, World!"
print(hello)

>>> Hello, World!

In this program, there is only one frame in the stack, known as the global frame. We could add a few more variables and statements and whatnot, and everything would still belong to this global frame. All variables and their associated values would be just as available to any further statement we chose to write. (One caveat: variables defined inside a comprehension don't have this sort of permanence - unlike ordinary for loop variables, they are scoped to the comprehension itself and vanish once it finishes.)
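A quick demonstration of that caveat - a sketch you can paste into a script or the REPL:

for i in range(3):
    pass
print(i)        # prints 2 - the for loop variable persists in the global frame

squares = [j * j for j in range(3)]
print(j)        # NameError - j was scoped to the comprehension and is gone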


This changes when we introduce functions. Every time we call (as opposed to merely define) a function, a new frame is created, and is added to the stack. At the same time, the flow of control passes to that frame. The usual analogy is a stack of cafeteria trays: the global frame is the first tray. When we call a function from the global frame, the new frame is placed on top of it. When that function completes its computation, it returns its results, and the frame is discarded. In terms of our analogy, we can put what we want on the tray, then remove it from the top of the stack. The tray beneath it is now available for plating food, or whatever it is we want to do with it. This kind of ordering is commonly known as LIFO, or Last In First Out.

Of course, if the function calls another function and is waiting for results, then its frame remains open or 'on the stack'. But as mentioned, while there may be numerous frames open at any given point during a program's runtime, the Python interpreter is always executing the current computation within the context of a specific frame. This will be extremely important to keep in mind as we begin to understand recursion.

Part of what makes frames valuable is that they regulate the accessibility of a function's variables, otherwise known as scope. For example, consider this trivial function:

def foo(x):
    return x

x = 12

print(foo(x))

>>> 12

Easy enough. But consider that we also get 12 if we write:

def foo(y):
    return y

x = 12

print(foo(x))

Or even if we write:

def foo(y):
    return x

x = 12

print(foo(x))

This is scope at work. When we first specify x = 12 we do so in the global frame of the program. When we pass x to foo(), the function's parameter doesn't care whether the argument in print(foo(_)) is named x or q or whatever, only that it is of the correct type. Otherwise the function can't perform its computation. But once the argument is passed to the function, we must refer to it within the function as it was named in the function definition, ie, foo(y).

Once foo() is invoked, it has its own frame created on the stack, one that now has its own, 'local' scope. (Note that the function has to be called or invoked - simply defining a function doesn't create a new frame.) Within that frame, y is a local variable that exists only within the function. When the function's work is finished, y and all other local variables are destroyed - you can verify this in the second snippet of code by attempting to print(y) outside of the function. Even if you invoke print(y) after the function has returned y, you'll get an error.

So why does return x in the last code snippet not throw an error? When we declared x = 12 we did so in the global frame. As such, it is a global variable. This means that x will be available to all statements and expressions in that frame (such as print(foo(x))), but also to any frame that originates from the global frame, like the one generated when we called foo(). In other words, if Python is executing foo() and comes across a variable that is not stated inside foo(), it will look outside the scope and, if it finds x, it will use it (of course, if it doesn't find it, it will throw an error). Taken to the extreme, we could use this to forego passing any argument at all, although that doesn't seem very useful:

def foo():
    return x

x = 12

print(foo())

>>> 12
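And, as promised above, you can verify what happens when we ask for a function's local variable from the global frame. A minimal sketch, reusing the second snippet (the NameError would normally arrive wrapped in a full traceback):

def foo(y):
    return y

x = 12
print(foo(x))
print(y)        # y only ever existed inside foo()'s frame

>>> 12
>>> NameError: name 'y' is not defined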

It stands to reason, then, that global and local variables retain their separate identities, even if they share a name:

def foo(f):
    f = 5
    print('in foo(), f =', f)

f = 12
foo(f)
print('in global frame, f =', f)

>>> in foo(), f = 5
>>> in global frame, f = 12

Here, the f printed inside the function is the local variable. Python won’t look any further than it has to, and is happy to print 5 for print('in foo(), f =', f). At the same time, print('in global frame, f =', f) gives us 12 in the global frame, even though we are also using f as the argument when we call foo(). Having two variables with the same name doesn’t bother the interpreter, as they are separated from one another by the rules of scope.

Note: Many of the programs we’ll see in this guide will be single functions, so function and frame may seem to be used interchangeably, in the sense that each frame will contain the function that is the entirety of the program. In a program with multiple functions, a separate frame opens for each function, and doesn’t take anything else with it. So there is an important difference, but one we won’t encounter very often.

1.2.2 Nested functions are nested frames

You may have already seen functions defined - or ‘nested’ - within other functions. As you may suspect, when such functions are invoked, the frames that contain them are also nested. Here, foo2() is a function defined within the function foo(). As a result, foo2() can also be called a ‘helper function’:

def foo():
    f = "I'm enclosed"
    print('in foo(), f says', f)

    def foo2():
        f = "I'm local"
        print('in foo2(), f says,', f)

    foo2()

f = "I'm global"
print('in global frame, f says,', f)
foo()

>>> in global frame, f says, I'm global
>>> in foo(), f says I'm enclosed
>>> in foo2(), f says, I'm local

Here, f in foo() is an 'enclosed' variable, since foo() is a wrapper for another function. The separation of local vs enclosed vs global allows each f to be printed correctly. Just as we can reference a global variable from inside a local frame, we can also reference an enclosed variable from inside a local frame, as long as the function in the local frame is defined inside the function whose frame encloses it:

def foo():
    g = "I'm enclosed"
    print('in foo(), g says', g)

    def foo2():
        h = "I'm local"
        print('in foo2(), g says,', g)
        print('from foo2(), f still says,', f)

    foo2()

f = "I'm global"
print('in global frame, f says,', f)
foo()

>>> in global frame, f says, I'm global
>>> in foo(), g says I'm enclosed
>>> in foo2(), g says, I'm enclosed
>>> from foo2(), f still says, I'm global

The last print statement also shows that we can access the global frame, no matter how deeply nested we are. This is true for any frame, as long as there is a direct line of succession through which we can travel.

It's important to note in the above example how foo2() is called. The call doesn't happen in the global frame, but from inside foo()'s frame. On the other hand, try this code and see what happens:

def foo():
    g = "I'm enclosed"
    print('in foo(), g says', g)

    def foo2():
        h = "I'm local"
        print('in foo2(), g says,', g)
        print('from foo2(), f still says,', f)

f = "I'm global"
print('in global frame, f says,', f)
foo()
foo2()
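Everything goes fine until the last line: foo2() was defined inside foo(), so the name doesn't exist in the global frame. Sketching the output (traceback abbreviated):

>>> in global frame, f says, I'm global
>>> in foo(), g says I'm enclosed
>>> NameError: name 'foo2' is not defined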

In order for foo2() to be accessible, we have to call it from the frame that contains its definition. This makes sense, because what's the difference between asking for foo2() from the global frame, and asking for the variable g in foo() from the same, global frame? None; both g and foo2() are locked away inside a scope that can only be accessed by invoking foo(). By the same logic, whatever foo2() returns will be returned to the frame that called it. This seems obvious but is actually another key point to keep in mind.

I'm belaboring these points because frames and scopes are implied and as such can be somewhat invisible. If you're careless with your variable names or call order you may experience unintended consequences. But it's also impossible to understand recursion without understanding - and keeping track of - frame and scope. It's also pretty much impossible to design recursive code if you can't visualize the flow of values as they are passed from one frame to another. Let's simulate what that looks like with an extremely artificial example, a series of nested functions:

def foo1(x=3):

    def foo2(y=2):

        def foo3(z=1):
            return z

        return foo3() + y

    return foo2() + x

print(foo1())

>>> 6

There are four successively nested frames here:

1) The global frame, containing print(foo1()) and the definition of foo1()
2) The first frame, containing foo1() and all of its variables and methods, including the definition of foo2()
3) The second frame, containing foo2() and all of its variables and methods, including the definition of foo3()
4) The third frame, containing foo3() and all of its variables and methods

Executing each function call in the above order invokes the next function (and therefore the next frame). You could say it took us four steps to get to the innermost frame, where foo3() reigns supreme. What happens then?

5) foo3() returns the value z == 1 to foo2(). The third frame is now closed.
6) In the second frame, 1 is added to y, which is 2. The sum 3 is then returned to foo1(), and the second frame is closed.
7) In the first frame, 3 is added to x, which is 3. The sum 6 is then returned to the global frame, and the first frame is closed.
8) Finally, the global frame receives the results from foo1() and prints 6.

Of course we would never write code like this for such a simple computation, but take a moment to make sure you understand the flow of how functions were called and values were passed back, frame by frame. This is not dissimilar from what happens in the course of recursion. The difference is that recursion won't use a cascade of nested functions, but one function calling itself repeatedly.


1.2.3 Functional state and namespaces

As we've discussed, frames fulfill a crucial role in setting boundaries around computation, enforcing privacy through the rules of scope. This implies that each function has its own state, which includes the values assigned to variables, the items that happen to be in a list, etc. Of course, at the moment a function is invoked, it will have certain arguments passed to it, and those values may impact the function's state. In turn, the state of the calling function, which receives the results returned from the called function, may (or may not) change thanks to those results. Recursion uses these same principles. Let's look at some code that demonstrates this.

def foo(x):
    print('state of x in foo() before calling foo2() is', x)

    def foo2(y):
        y = 7
        print('state of "x" passed to foo2() is', y)
        return y

    x = foo2(x)
    print('state of x in foo() after calling foo2() is', x)

    return x

print(foo(12))

>>> state of x in foo() before calling foo2() is 12
>>> state of "x" passed to foo2() is 7
>>> state of x in foo() after calling foo2() is 7
>>> 7

Here we changed the state of x in foo() by assigning x = foo2(x). But we have to be explicit about it - if we had just written foo2(x) without the x = then the state of x in foo() would have remained 12 (see the variant sketched below). In other words, the namespace of x in foo() would not have changed.

Generally speaking, the originating frame - in this case foo() - persists in its own state until the results of the called function are returned to the exact place where the call originally occurred. It's only at that point that the originating frame carries on with its own computation, which may (or may not) involve a change in its state. This may be difficult to visualize at the moment, but it will become clearer as we begin looking at actual recursive code.

This is also why the comparisons between procedural/iterative and recursive solutions are at best limited, and at worst thoroughly confusing. Of course, in many cases it's possible to write both a recursive and an iterative solution to the same problem, and we'll even see solutions that combine both techniques. But the dependence of iteration on frame/scope, at least within a function's internal workings, is fairly minimal, whereas frame/scope is essential to understanding the mechanism of recursion.
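Here is that variant as a minimal sketch: the call's result is never assigned, so nothing in foo()'s state changes.

def foo(x):
    print('state of x in foo() before calling foo2() is', x)

    def foo2(y):
        y = 7
        return y

    foo2(x)    # the returned 7 is discarded; x is never rebound
    print('state of x in foo() after calling foo2() is', x)

foo(12)

>>> state of x in foo() before calling foo2() is 12
>>> state of x in foo() after calling foo2() is 12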

1.2.4 Heuristics

Here are some important keywords that will help us navigate recursion:

Frame: Whenever a function is called, a new frame is opened in which the function performs its computation. Once the computation is finished, the results are returned to the calling function and the frame, along with the function and its contents, is discarded.

Stack: Also known as the call stack, it conceptualizes the order in which frames are created and discarded. This occurs in a last-in-first-out order (LIFO). Any frame that is open (ie, has not concluded its computation) remains on the stack, even though the program may not be executing anything in that frame.


State: The inventory of variables, methods and associated values constituting a function at a given moment in the program's execution. A function can change its state thanks to internal computation, or when a result is returned from a function called by it.

Namespace: The value bound to a variable within a function, at a specific frame. Usually written as 'the namespace of x in foo() at frame 2 is 3.'

1.3 Recursion in Light of Frames

1.3.1 My first recursive code

Now we are slightly better able to answer the question of what happens when a function calls itself. Given the preceding discussion, we can characterize recursion as a specialized case of frame creation and destruction - one where each successive frame and its associated state is created by a function as a slightly modified copy of itself.

Consider one of the simplest examples of recursion, a function summ() that sums a series of numbers. Starting from a given positive integer n, this function should add to n every integer smaller than n, all the way to 1. We would write this out as:

summ(3) == 3 + 2 + 1 == 6

Here is the recursive code, which we'll pick apart below. Don't panic.

def summ(n):
    if n == 1:
        return n
    else:
        return n + summ(n - 1)

print(summ(3))

>>> 6

When our program (call it summ.py) is first executed, we create the global frame. At the global frame, Python’s interpreter defines summ() but does not execute it. It basically just says, ‘Here is a function called summ() that is expecting one argument. Gotcha.’ The next method executed is print(), which calls summ() with an argument of 3. This creates frame 1 in the stack, which, if we could see it, would contain a very familiar-looking function:

def summ(3)
    if 3 == 1:
        return 3
    else:
        return 3 + summ(3 - 1)

Leveraging our discussion of namespace at the end of Scope, Frame and Stack, we would say that “the namespace of n in frame 1 is 3.” Let’s now execute the function summ(3). Since 3 != 1 we skip the if clause and proceed to else. There’s only a return statement there, so let’s try to evaluate it. We can get there partway, since we know n == 3, but can’t complete the return because the second part invokes summ() again, except now with the argument n - 1. So we have to call summ(2). This opens frame 2. In the meantime, frame 1 remains open, still waiting to resolve 3 + summ(2). Here’s what frame 2 looks like:


def summ(2)
    if 2 == 1:
        return 2
    else:
        return 2 + summ(2 - 1)

In frame 2, the namespace of n is 2 - but keep in mind that in frame 1, it is still 3! Regardless, frame 2 is in the same boat as frame 1: 2 != 1, so we skip the if clause and go to else. At the return statement, we know that n == 2 but still need to solve for summ(n - 1), ie, summ(2 - 1) or summ(1). So we invoke summ() once again, this time with an argument of 1. We leave frame 2 open, waiting to resolve 2 + summ(1), and create frame 3 by calling summ(1).

def summ(1)
    if 1 == 1:
        return 1
    else:
        return 1 + summ(1 - 1)

Here in frame 3 we finally satisfy the if condition, since the namespace of n in this frame is indeed 1. We can return 1 and the else block goes untouched. Once we complete the return, the function in frame 3 will terminate and the frame will close. Now we begin moving back through the succession of open frames to conclude the computation. From frame 3, we are returning 1 to the frame that called it (frame 2). But where is it winding up in frame 2?

def summ(2)
    if 2 == 1:
        return 2
    else:
        return 2 + summ(1)

Quite simply, it replaces summ(1) as the last term in the last line. This may seem obvious, but in a recursive setting, knowing where the returned value lands from the called function is essential to tracing the path a recursed value takes. Now that we have a definite value for summ(1), we add it to 2. With our 3 in hand, we are ready to return that value to frame 1, which is what’s next in the call stack. Once we’ve returned 3, frame 2 is discarded.

def summ(3)
    if 3 == 1:
        return 3
    else:
        return 3 + summ(2)

The same process applies: now that we know that summ(2) == 3, we can add 3 + 3 and return 6 to the global frame. Once we’ve done this, the program ends. You can see that there is a distinct resemblance between the program flow in this example and our trivial code from the last example in Scope, Frame and Stack.

def foo1(x=3):

    def foo2(y=2):

        def foo3(z=1):
            return z

        return foo3() + y

    return foo2() + x

print(foo1())

One of the elegant aspects of recursive code is that we don't need to create any additional functions beyond the one we already specified in order to solve not only the specific problem, but the general case as well. In this case, we can now sum any positive integer n. By invoking itself with a modified argument, the function handles as many steps as necessary - it just keeps decrementing until it gets to 1.

I want to point out two things before we leave summ(). The first is about frames, state and namespaces. You'll note that, for every frame, the namespace of n was decremented by 1. Thus, even though summ() was pretty much a carbon copy in every frame, there was the crucial distinction of differing namespaces for n in each frame. Moreover, once each frame was seeded with its namespace, that state persisted until we satisfied the if statement and began returning values. In this case, we never changed the namespace of n - all the work was being done in the return statement, which picked up the existing value of n for that frame and just added it to what had been returned from the called frame/function. But fear not, we will see other examples where namespaces change for variables in frames.

The other thing worth mentioning is something you should always keep in mind when looking at recursive code: how much code gets executed on the way in to satisfying the if statement, and how much gets executed on the way back? In this case, we carried out all of summ()'s code in every frame except for the recursive call. Once we were able to return 1, we could complete the computation for each open frame with ease. As an analogy, imagine you have to navigate a garden to get to, say, a fruit tree. To do so, you make a path by laying down stones until you reach your goal. Once you've gotten to the tree and picked the fruit, all you have to do to get back to your starting point is retrace your steps along the path you've already laid.

I'm being explicit about this because oftentimes the only examples of recursion offered are ones where the work is lopsided in exactly this way. People then get misled into thinking that there is some sort of special relationship between return and the recursive call, when there really isn't. In fact, we'll start pulling apart this syntax in the next sections, so that we can get the function itself to show us what's going on.

1.3.2 Understanding recursion as a process

Now we can go back to the initial statement of what makes a function properly recursive:

1) Identify the base case
2) Identify the recursive case

Until now I've been illustrating the recursive case, which is another way of saying: in what way do we modify the argument the function passes to itself in order to break the problem down to its fundamental state? But in order for recursion to work, we also have to recognize what that fundamental state is. This is the 'base case'.

In the example of summ(), the simplest possible number that is the sum of the whole numbers preceding it is, well, 1. In mathematical notation, f(1) == 1. (We could also specify the base case as if n == 0 but that would add nothing to the sum, and hence only represents an additional, unnecessary computational step.) So for summ() the base case is n == 1. In this case we can also either return 1 or n, as the value at the base case is the same as the variable that eventually led us there (this won't always be true, either).

The larger point is that the base and recursive cases must work in tandem: the recursive case provides an incremental way to get to the base case, and the base case provides the fundamental answer that can then be returned back through the preceding frames to generate the final answer. At every step in the return journey, the returned value is computed against the state of each frame, until we are back where we began, solution in hand.


This process of 'seeding' our frames on the way to the base case is essential. Simply identifying the base case is necessary but insufficient, because if we don't arrive at the base case recursively, there is nothing against which the base case can be successively computed during the return journey. In summ(), this took the form of adding n's namespace to the returned value of every recursive call. It is this unbroken chain of as-yet undetermined computation that is fulfilled when the base case begins its trip back to the original, global frame. In terms of frames, we can narrate the program as saying:

1 Global frame asks, "What is print(summ(3))?"
2 Frame 1 says, "summ(3) is 3 + summ(2) but I don't know what summ(2) is."
3 Frame 2 says, "summ(2) is 2 + summ(1) but I don't know what summ(1) is."
4 Frame 3 says, "I know that summ(1) == 1 so I can now return 1."
3 Frame 2 says, "summ(2) is now 2 + 1 so I can now return 3."
2 Frame 1 says, "summ(3) is now 3 + 3 so I can now return 6."
1 Global frame says, "print(summ(3)) == 6"

If the numbering looks unusual it's because I'm counting by frames and not steps. Steps may be useful for iterative procedures - for example, how many times are you looping through a particular iterable? Counting by frames emphasizes recursion's unique mechanism of first setting up the series of undetermined computations, and then, once the base case has been reached, completing those computations. As the numbering implies, this run of completions always happens in the reverse order.

Once you understand that each frame has its own state and that that state persists while it waits for the return to occur, there is no mystery in how each recursive step 'knows' exactly what to add to the returned result before passing it on. Sometimes recursion can be complex, but if you hold on to this axiom, you should be able to trace the flow of values, no matter how convoluted.

One way of visualizing this is to think of computation within a function as happening 'vertically', in the sense that a function generally executes its statements and expressions from the first line to the last line (taking into account, of course, branching and loops and such). On the other hand, recursive calls and their returns from the frames thus generated are 'horizontal', in the sense that they stop and restart the generating function at the exact point where the function calls itself. I imagine it as a left-to-right process, but however you imagine it, the thing to keep in mind is that, at the moment the recursive call happens, all computation in the originating frame pauses. Similarly, at the moment that the 'horizontal' insertion occurs (ie, a result is returned to the originating frame), the 'vertical' computation restarts, with the state of the function in the originating frame re-engaged.

Another benefit of conceptually separating the 'horizontality' of what is being returned from the 'verticality' of a specific frame's computation is that it reminds us that functions compute until they are done, which overwhelmingly means until the function reaches a return statement. Of course, this doesn't mean that every line is executed. In summ()'s base case, the else clause is never triggered. But in every frame the function executes until it reaches a return statement it can completely evaluate - or until it reaches another recursive call.

In the interest of thinking about this more clearly, allow me to suggest some non-standard terminology. Let's call the entire recursive process a 'recursive cascade'.
I suggest this because once a function calls itself, you're pretty much stuck with letting the entire process work itself out. Recursion is unlike iterative code, where it's possible to break out of a loop once a condition is met. In recursion, you either get to the base case (and out again), or you die trying. Ok, not really, but the calls will never stop on their own, and Python will eventually cut the program short with a RecursionError once the interpreter's recursion limit is exceeded (there's a small demonstration of this below). We'll look at a few exceptions to this, but in general recursion is intended to be exhaustive in its application.

The other two terms are related to the notion of 'horizontality' and 'verticality'. A great way to understand recursive code (and how to build it) is to recognize what needs to happen on the way to the base case, and what needs to happen on the way back. In this sense, I think of all the code that precedes the recursive call as being 'pre-recursive', and everything that happens after it as 'post-recursive'.
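Here is that crash in miniature - a sketch of my own, a function whose base case can never be reached:

def no_base_case(n):
    return n + no_base_case(n - 1)    # n sails right past 1 and keeps falling

print(no_base_case(3))

>>> RecursionError: maximum recursion depth exceeded

By default the interpreter permits roughly a thousand open frames (see sys.getrecursionlimit()) before it cuts the cascade short.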


Pre-recursively, we are generally 'seeding' each frame with the values we want to be available to the function. In the case of summ(), that meant making sure that each namespace of n had a value that had been decremented by 1. Once the base case was achieved, we flipped into post-recursive mode, where we revisited each calling frame (in LIFO order), and computed the base case's returned value with the value of n in each frame.

I realize that you may squint at summ() and struggle to see why I should be so overdetermined in my nomenclature. Suffice to say that it will come in very handy as our examples become more complex. Of course, you may not like these terms at all. For instance, you could argue that there is no such thing as 'pre-recursive' and 'post-recursive' because once a function calls itself, it's all recursive. You could think about recursion using the garden path metaphor I tossed out above, or using something else entirely. I'm just trying to present a number of ways in which you can think about this technique, so please use whatever feels best to you, and let me know if you have any suggestions.

1.3.3 Heuristics and Exercises

Since I like my way of framing recursion, I'll use it to begin defining some heuristics.

The recursive cascade is the recursive mechanism viewed as a whole. Just as functions execute until they are done (ie, reach a return statement), a recursive function, once called, will 'cascade' to the base case and back to the frame that originally called it.

Every recursive algorithm is divided into (at least) two parts - the pre-recursive and post-recursive. The pre-recursive part of the function is executed before the recursive call, and the post-recursive part is executed after the recursive call has returned its results.

The pre-recursive portion of the function seeds the frames with the desired namespaces until the base case has been reached. Once the base case has been attained, the post-recursive portion of the function computes the value returned by the base case against the namespaces of selected variables, as pre-recursively seeded in each frame.

The pre-recursive portion of the algorithm happens in the order that frames are created, whereas the post-recursive always happens in the reverse order.

Question: Am I just hedging by saying 'Every recursive algorithm is divided into (at least) two parts'? In what circumstances might there be more parts?

Exercise: Write a recursive solution factorial() to find the factorial of a given positive integer n. To solve this problem, first consider how factorials are computed. How can this formula then be applied to a recursive context? How can you use what we developed in summ() to characterize both the base and recursive cases? Since we're interested in not just solving the problem but understanding how the solution works, build into your program a functionality to keep track of what is going on. Can you keep track of each argument as it is passed recursively? Can you count the steps? Can you count the frames? How can you tell which is which?

1.4 Counting In Recursion

1.4.1 Steps vs Frames

As the exercise at the end of the previous section implied, counting in recursion is not as intuitive as it is, say, for iterative code. For the latter, it is easy to insert print statements at any point in a loop or any other construction, and quickly understand the state of computation at that point. Inserting counters and print statements into recursive code, however, yields some initially baffling results. Look at what happens if we try to print n every time the function is called:


def summ(n):
    print('n at top of summ() =', n)
    if n == 1:
        return n
    else:
        return n + summ(n - 1)

print(summ(4))

>>> n at top of summ() = 4
>>> n at top of summ() = 3
>>> n at top of summ() = 2
>>> n at top of summ() = 1
>>> 10

This is fine, but doesn’t really tell us anything we didn’t already know. (Also, it seems like something is missing - what do you think it is?). It’s more useful if we place the print statements based on where n occurs in the if/else block. Now we can see the exact value at which we hit the base case:

def summ(n):
    if n == 1:
        print('n at if =', n)
        return n
    else:
        print('n at else =', n)
        return n + summ(n - 1)

print(summ(4))

>>> n at else = 4
>>> n at else = 3
>>> n at else = 2
>>> n at if = 1
>>> 10

It would be more useful still if we could count the number of frames as they are generated. We can achieve this by using the rules of scope. Recall that any variable declared in the global frame is, by default, a global variable. So let’s create a variable frame that will keep track of things for us. In order for summ() to access it, we need to use the global keyword. This tells Python that the variable we are citing within the function is not a new one, but rather one that already exists in the global frame, and the interpreter should go find it there, instead of creating it within - and restricting it to the scope of - summ().

frame = 0
n = 4
print('at global frame =', frame, 'n =', n)

def summ(n):
    global frame
    frame += 1
    if n == 1:
        print('base case frame =', frame, 'n =', n)
        return n
    else:
        print('recursive frame =', frame, 'n =', n)
        return n + summ(n - 1)

print(summ(n))

>>> at global frame = 0 n = 4
>>> recursive frame = 1 n = 4
>>> recursive frame = 2 n = 3
>>> recursive frame = 3 n = 2
>>> base case frame = 4 n = 1
>>> 10

Using global variables is generally regarded as a suspect practice, since you can modify the variable from inside a function. If another function is relying on that global variable to stay constant, this can cause some unpredictable and difficult-to-trace behavior. Fortunately, in this case modifying the global variable is exactly what we want. Now we know how many frames there are, and the exact value of the argument n being passed from frame to frame. Things are looking hopeful.

The problem with all of these examples, however, is that we are seeing only half the picture. What about after the base case, when the recursive cascade reverses and clears up all the unresolved computations? How do we access all of the post-recursive goodness we know is hiding in there?

1.4.2 Counting the post-recursive cascade

It would be great to see the values of n and the reversed order of frames printed to the console as well. As it stands, we can guess that the number of steps would equal the number of pre-recursive calls where an else statement is triggered, but as we look at more complex examples of recursion, this hypothesis will become difficult to verify, to put it kindly. The rub is in the final line:

return n + summ(n - 1)

That is, both the return statement and the recursive call occupy the same line. Each result returned from the previous frame is immediately added to n and returned again, until we reach the global frame. So there's literally no opportunity for our print statements to record any of these computations. We can fix this by putting some daylight between the recursive call and the return with a simple little intervention:

frame = 0
n = 4
print('at global frame =', frame, 'n =', n)

def summ(n):
    global frame
    frame += 1
    if n == 1:
        print('base case frame =', frame, 'n =', n)
        return n
    else:
        print('recursive frame =', frame, 'n =', n)
        r = summ(n - 1)
        frame -= 1
        print('recursive frame =', frame, 'n =', n, 'r =', r)
        return n + r

print(summ(n))


>>> at global frame = 0 n = 4
>>> recursive frame = 1 n = 4
>>> recursive frame = 2 n = 3
>>> recursive frame = 3 n = 2
>>> base case frame = 4 n = 1
>>> recursive frame = 3 n = 2 r = 1
>>> recursive frame = 2 n = 3 r = 3
>>> recursive frame = 1 n = 4 r = 6
>>> 10

By taking the returned result and binding it to a variable r, we have the chance to not only print that value, but also get the correct frame. Also keep in mind that r represents the returned result from the previous, called frame - it is what is being added to n, but is not what is being passed on. This is why you have to wait until the global frame to see the final answer. This trick of breaking apart the return statement is extremely handy when you want to peek into what is being returned during the post-recursive portion of the cascade. I'll be using r (which stands for 'recursion', naturally) throughout this guide.

At first glance it may seem redundant to have two print statements in the else clause, but it's not. As described at the end of Recursion in Light of Frames, the complete execution of any recursive function is split into at least two parts: what happens before the recursive call, and what happens afterwards. Here, all the print statements up to and including base case frame = 4 n = 1 represent the first, pre-recursive part of the function. All the print statements after the spot where the return statement inserts its result represent the post-recursive part of the function's top-to-bottom execution.

This is why we need to add frame -= 1 after r = summ(n - 1). We want to capture our actions as we rewind our way back through the series of open frames. We already have frame += 1 at the top of the function, which counts every frame created on the way to the base case. If we'd continued to increment by inserting frame += 1 after the recursive call, we would have gotten the total number of steps but we would have lost track of which frame we were in, hence frame -= 1.

Question: What do you think happens if you remove frame -= 1 from the code? Try it out. What's the reason for this behavior? Does it break the rules of a function's state? Why not?

By expanding the 'return + recursive call' one-liner, we can now access the complete narrative of the algorithm: the state of each frame as the recursive cascade proceeds both inward, from the global frame towards the base case frame, and back outward, as the paused computations in each open frame are completed. The code is decidedly less elegant than the original, but hopefully the printout makes it less mystifying. Knowing where to place print statements to answer specific questions will help you learn to read recursive algorithms more fluently.

1.4.3 Heuristics and Exercises

In recursion, counting frames is usually more important than counting steps. Being able to separate the pre-recursive and post-recursive state of a function (and the accompanying namespaces for variables) is essential to understanding how a recursive cascade unfolds.

When a function's recursive call is part of the return statement, break the two apart by introducing an intermediate variable. This provides the opportunity to inspect the value actually being returned from the called frame.

Exercise: Revisit your factorial solution from the previous section and apply the above techniques to get the full recursive narrative. What else can you print that would be useful information?

Exercise: Consider the following function decToBin(), which recursively converts a decimal number to its binary equivalent. Add print statements and a global variable to track frames so that you can see what the code is doing at each step.


def decToBin(n):
    if n == 0:
        return 0
    else:
        return n % 2 + 10 * decToBin(int(n / 2))

print(decToBin(7))

1.5 Recursion and Swapped Arguments

So far we've looked at a very simple recursive function, in the sense that summ() only had one argument that decremented regularly until we reached the base case. In a sense, this linear decrementing of n made it its own counter. We'll see this counter functionality appear consistently as a great way to manage the recursive cascade. Of course, functions can take multiple arguments, and we can recursively modify those arguments to reach the answer quickly, skipping many steps along the way. The next algorithm we'll look at does exactly this, in a very clever way.

Python is agnostic about how arguments can be passed to functions. As long as the function receives the correct number of arguments and can run its computation without throwing any errors, the interpreter will be happy. So depending on what your function is trying to do, you can pass an integer in place of a float, or a tuple or even a string in place of a list. This is both a strength and a weakness of the language, but here we'll only use it to our advantage. One way to benefit from this agnostic attitude is by writing recursive calls that re-arrange the order of arguments. We will use this to great effect later on as we draw and solve for the Towers of Hanoi, but let's begin with a simpler example: finding the greatest common divisor of two numbers.

The problem is simple: given two positive, non-zero numbers, what is the largest number that can divide into both? For example, given the pair (20, 12), the greatest common divisor is 4. We can solve for this iteratively by exhaustive enumeration:

def gcdIter1(a, b):
    for x in range(a, -1, -1):
        if a % x == 0 and b % x == 0:
            return x

print(gcdIter1(20, 12))

The algorithm simply begins from a and decrements until x divides both a and b evenly. But this takes too many steps. If, for example, our pair was (2012314, 1234234) we would have to start from 2,012,314 and check each value until we reached the answer 2!

1.5.1 Doing it Euclid’s way

Far better to use Euclid’s algorithm, which asserts that “the greatest common divisor of two numbers does not change if the larger number is replaced by its difference with the smaller number”. Using the pair (20, 12), our sequence of substitutions looks like this:

20 - 12 = 8
12 - 8 = 4
8 - 4 = 4
4 - 4 = 0

4


Or, for the pair (25, 20):

25 - 20 = 5
20 - 5 = 15
15 - 5 = 10
10 - 5 = 5
5 - 5 = 0

5

As we iterate through the algorithm, we get our answer when the subtraction finally bottoms out at 0; the number we were subtracting at that point is the greatest common divisor (or highest common factor, if you prefer). In its classic form, Euclid's algorithm asks us to find the larger number so we can order the subtraction correctly - we want to avoid being left with a negative number. We can sidestep this by dividing the two numbers and selecting for the remainder, using the modulo operator %. In the worst case, the modulo will be the smaller number of the two. Consider the pair (12, 20):

12 % 20 = 12

Since 12 divided by 20 has no other remainder than itself, we flip the values of the two variables and repeat the process:

20 % 12 = 8
12 % 8 = 4
8 % 4 = 0

As with the original algorithm, we return the last number that got us to 0. Iteratively, the code is very simple:

def gcdIter2(a, b):
    while b:
        a, b = b, a % b
    return a

print(gcdIter2(12, 20))

Inserting a step counter and print statements before and after the statement inside the while loop allows us to inspect the progression of the algorithm:
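The instrumented loop itself isn't shown, so here is one way it might look - a sketch whose step counter and print wording are my own, chosen to reproduce the output below:

def gcdIter2(a, b):
    step = 0
    while b:
        step += 1
        print('step =', step)
        print('a =', a, 'b =', b, 'a % b =', a % b)
        a, b = b, a % b                     # the swap the text describes
        print('a is now', a, 'and b is now', b)
        print()
    return a

print(gcdIter2(12, 20))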

>>> step = 1
>>> a = 12 b = 20 a % b = 12
>>> a is now 20 and b is now 12

>>> step = 2
>>> a = 20 b = 12 a % b = 8
>>> a is now 12 and b is now 8

>>> step = 3
>>> a = 12 b = 8 a % b = 4
>>> a is now 8 and b is now 4

>>> step = 4
>>> a = 8 b = 4 a % b = 0
>>> a is now 4 and b is now 0

>>> 4

Plugging in the pair (2012314, 1234234) shows that we can get to 2 in 13 steps, which, to put it mildly, is a vast improvement.


Using this iterative solution as a blueprint, a recursive version is fairly straightforward. As long as b != 0, swap the values a, b with b, a % b, else return a. This provides us with both the base and recursive cases:

def gcdRecur(a, b):
    if b == 0:
        return a
    else:
        return gcdRecur(b, a % b)

If we add print-tracing as we did with summ(), we can precisely track the recursion:

a, b = 12, 20
frame = 0
print('at global frame =', frame, 'a =', a, 'b =', b)

def gcdRecur(a, b):
    global frame
    frame += 1
    if b == 0:
        print('base case frame =', frame, 'a =', a, 'b =', b)
        return a
    else:
        print('recursive frame =', frame, 'a =', a, 'b =', b)
        r = gcdRecur(b, a % b)
        frame -= 1
        print('recursive frame =', frame, 'a =', a, 'b =', b, 'gcdRecur(b, a % b) =', r)
        return r

print(gcdRecur(a, b))

>>> at global frame = 0 a = 12 b = 20
>>> recursive frame = 1 a = 12 b = 20
>>> recursive frame = 2 a = 20 b = 12
>>> recursive frame = 3 a = 12 b = 8
>>> recursive frame = 4 a = 8 b = 4
>>> base case frame = 5 a = 4 b = 0
>>> recursive frame = 4 a = 8 b = 4 gcdRecur(b, a % b) = 4
>>> recursive frame = 3 a = 12 b = 8 gcdRecur(b, a % b) = 4
>>> recursive frame = 2 a = 20 b = 12 gcdRecur(b, a % b) = 4
>>> recursive frame = 1 a = 12 b = 20 gcdRecur(b, a % b) = 4
>>> 4

This yields an interesting observation: once we have gotten to the base case, the returned value of r == 4 never changes. Look at the way the 'return + recursive call' was originally written:

return gcdRecur(b, a % b)

How is this different from summ(), or, for that matter, your implementation of factorial()? Simply put, there is no further computation occurring within the return statement. With return n + summ(n - 1) we still needed to add n from each frame, but here we just want the value of a that happened to be there the instant b == 0 became True. Sometimes all we really want is what the base case tells us. In that case, no post-recursive computation is needed. You can further see the effect of this because, in the post-recursive portion of the readout, there is no change to the namespaces of a or b in any of the frames. They remain exactly as they were seeded during the pre-recursive portion of the cascade.

Obviously, when compared with the iterative solution there is no real gain in terms of the number of steps. Both versions resolve (2012314, 1234234) in the same number of steps, if one were to equate the number of loop iterations with the number of function calls. And to be honest, from a performance point of view, the iterative example is faster. But we will see other examples where recursion is faster, and in some cases, the only possible solution.

1.5.2 Heuristics and Exercises

We can add several heuristics for thinking recursively from this example.

Learn to recognize an effective formula and how to translate it into code. Sometimes it is easier to first translate the formula into an iterative format, test and understand it, and use it as a guide to implementing the recursive solution.

In addition to performing operations on arguments during the recursive call (eg, n - 1, a % b), we can swap those arguments as well.

If the answer you need is captured by reaching the base case, there is no need to perform further computations on the return statement. Just return the desired value from the base case, and make sure that the return statement for the recursive call only carries that value back to the global frame.

Exercise: Implement a recursive algorithm itos() that converts a number, digit by digit, to a string. Don't convert the entire integer to a string and return it - that's cheating! Also, the final returned result should be a single string representing the entire number. For example, if we passed the integer 1234 to itos(), the function would return '1234' such that type('1234') == str. You can break this problem down into three parts:

1) How do you identify your base case?
2) The pre-recursive work: How do you get to that base case? How do you need to seed your frames on the way to the base case?
3) The post-recursive work: What would you add to the base case as it works its way back through the recursed calls? Does the order of what is returned and what is added matter?

Annotate your solution with print statements that show, at each frame, the state of the function, specifying what is being passed and what is being returned, along with a counter that tracks the frames as they are opened and closed.

1.6 Palindromes and Recursion-as-Evaluation

1.6.1 Applying recursion to strings

Recursion can also be used to simply evaluate the truth of an argument that is passed to the recursive function. With the base cases for summ() and factorial(), we identified the simplest possible state of the problem, eg: the sum of all numbers from 1 to 1 is 1, and 1! == 1. When evaluating for truth, we restate our base case to answer the question: what is the most basic state of the problem that would return True?

On the way to the base case, we evaluate ever-smaller subsets of the problem using recursive calls. As long as each call evaluates to True, we keep reaching for the base case. If we can get to the most basic solution of the problem and return True from the base case, it is thanks to the fact that every previous reduction of the problem has also been true.

For example, consider the problem of palindromicity: if a string can be read both forwards and backwards, that string is said to be a palindrome. We might say that the simplest palindrome is a single letter, for example a, since a == a is certainly true. In general for humans, shorter words are quite easy to evaluate, such as kayak. As a foil, we could bring up Python's built-in reversed(), which is effective for individual words but quite slow, performance-wise:

'redder' == ''.join(reversed('redder'))
'driver' != ''.join(reversed('driver'))


However, entire phrases such as 'Go hang a salami; I'm a lasagna hog' may be more difficult to evaluate.

So far we have been practicing recursion using numbers, which lend themselves quite well to operations like incrementing, decrementing, and hitting limits. It's a bit of a shift to focus on strings, but Python treats letters much like numbers. String slicing and indexing is a great way to isolate subsequences or individual letters in a string, and is much faster than reversed() above. So let's consider how these string methods could be used in a recursive solution.

First we decide how to apply a reductive strategy to the matter of palindromes. This is a good example of recognizing that, when considering how to design the recursive case, what you are really asking is, 'How can I restate the problem so that every time I recursively call the function, the problem is smaller, and eventually I am guaranteed to reach the base case?' You're just doing it with strings and not numbers - the approach remains the same.

For example, if our string is racecar we know that the first and last letters are the same, as are the second and second-to-last letters, etc, all the way until we get to the midpoint of the string. Since e is at the midpoint and we know that all single letters show palindromicity, we know that e == e and we're done. This is going to be easy! Recalling string index positions:

    0 1 2 3 4 5 6
s = r a c e c a r

Put another way,

s[0] == s[6] is the same as 'r' == 'r'
s[1] == s[5] is the same as 'a' == 'a'
s[2] == s[4] is the same as 'c' == 'c'
s[3] == s[3] is the same as 'e' == 'e'

We can re-write the above to take advantage of Python's indexing: increasing negative numbers count downwards from the end of the string. This way we don't care how long the string is, and we know that if we keep doing it for long enough we'll get to the midpoint:

s[0] == s[-1]
s[1] == s[-2]
s[2] == s[-3]
s[3] == s[-4]

Already, a glimpse of the recursive case is emerging. If we can just evaluate each pair in the sequence in order, we should be able to establish a word's palindromicity. We can write out some pseudo-code to get started:

1) Take the first and last letters of a string
2) If they are equal, proceed to the next pair of letters
3) If they are not, return False
4) If there is only one letter left, return True

How can we represent this in code? Well, if s[0] == s[-1] then what's left over? You may think that it is s[1:-2], but a Python slice doesn't include the character at its end index - s[1:-2] would leave out the letter at position -2 - so in fact the portion of the string still to be evaluated is s[1:-1]. Note that we are not altering the string in any way - the immutability of strings as a data type prevents that. (Strictly speaking, each slice is a new, smaller string object; the original s is simply never modified.) We are just looking at a sub-section of the original string, and narrowing our view two characters at a time:

1) s[0] == s[-1]    r ? r
2) s[1] == s[-2]    a ? a
3) s[2] == s[-3]    c ? c
4) len(s) == 1      e

Clearly, e is the midpoint, and we’ll want to stop once there, otherwise we’ll just be repeating computations. If we know to stop when there is only one character that we are testing, we can re-state the limit as len(s) == 1. This is


all very sensible, but if we aren't mutating s how can we ever assert that len(s) == 1? In the case of racecar, len(s) == 7 always and forever. Except our limit is literally at len(s[3:-3]) == 1, so somehow we have to persuade Python that that's what we're talking about when we're talking about s. This is where the recursive call works to our advantage, as we can pass slices as our argument:

def pal(s):
    if # some condition:
        # do something
    else:
        return pal(s[1:-1])

In other words, each frame only knows what is passed to it as an argument. Let's begin with racecar in the first frame. The argument for the first recursive call is s[1:-1], so the namespace for string s in frame 2 is aceca. The second recursive call opens a third frame, where the namespace for s is cec, etc. This means that, if you were deep inside the recursion and expected to access the full representation of s == 'racecar', you'd be out of luck - the function in that frame only knows what's passed to it.

I'm going over this again because it's one of the tricky things about recursion. It seems at first a bit much to reconcile the statement that 1) "We aren't mutating or copying s" with 2) "Each function state has its own s, depending on how the recursive call modifies its passed argument." In fact, thanks to the rules of scope and frame creation, both can and must be true. There may be a bit of a cognitive leap when thinking this way about strings as opposed to numbers, but this is no different than the state of n in any given frame of summ(). It may have been 6 in the global frame, and 5 in the first recursed frame, but in the next recursed frame, it was passed as 4 and that's what it is for that frame.

Next, triggering the recursive call is always dependent on the truth of the condition if s[0] == s[-1]. If s[0] != s[-1] at any point in our investigation, we can simply stop what we're doing and return the bad news, as there is no reason to go any further. (As a side note, a good deal of effective programming is knowing when to stop.) So let's build this branch into our code:

def pal(s):
    if # some condition:
        # do something
    else:
        if s[0] == s[-1]:
            return pal(s[1:-1])
        else:
            return False

There’s another bit of elegance made possible by recursion here. Because the next s that is being passed is always snipped of its first and last elements, if s[0] == s[-1] is always the right test for that particular frame. There’s no reason to muck about with evaluating if s[0] == s[-1] and then if s[1] == s[-2] and then if s[2] == s[-3] - the recursive call design handles everything for us correctly.

1.6.2 Multiple base cases

That’s great, but what about the base case? Based on our discussion, it would be intuitive to say:

def pal(s):
    if len(s) == 1:
        return True
    ...

This certainly works for racecar, but if we try redder we get an IndexError: string index out of range. As it turns out, strings, whether palindromic or not, come in two flavors: those that have an odd number of letters, and those with an even number. Since redder belongs to the latter, does this mean we have to change our recursive call?


Not at all. We are far better off defining an additional base case. We know that slicing a character off either end of an odd-numbered string will eventually leave us with a final string of length 1. By the same logic, a string with an even number of characters will end up with no elements in it once the final comparison and slicing is executed:

1) s[0] == s[-1]    r ? r
2) s[1] == s[-2]    e ? e
3) s[2] == s[-3]    d ? d
4) len(s) == 0

That is, if s == 'dd' at step 3, then s[1:-1] == '', and len(s) == 0 for that final recursive call. As you know, an empty string is a perfectly legal thing to have. Here is the final code, accounting for both base cases:

def pal(s):
    if len(s) == 1 or len(s) == 0:
        return True
    else:
        if s[0] == s[-1]:
            return pal(s[1:-1])
        else:
            return False

If we trick out our code with print statements we can see the recursion in action:

frame = 0

def pal(s):
    global frame
    frame += 1
    if len(s) == 1 or len(s) == 0:
        print('frame =', frame, 'BASE; \t s =', s)
        return True
    else:
        if s[0] == s[-1]:
            print('frame =', frame, 'TRUE; \t s =', s)
            r = pal(s[1:-1])
            frame -= 1
            print('frame =', frame, 'BACK; r =', r, 's =', s)
            return r
        else:
            print('frame =', frame, 'FALSE; \t s =', s)
            return False

If we set s to racecar we get the following output:

>>> frame = 1 TRUE;   s = racecar
>>> frame = 2 TRUE;   s = aceca
>>> frame = 3 TRUE;   s = cec
>>> frame = 4 BASE;   s = e
>>> frame = 3 BACK; r = True s = cec
>>> frame = 2 BACK; r = True s = aceca
>>> frame = 1 BACK; r = True s = racecar
>>> True

And for redder we see:

>>> frame = 1 TRUE;   s = redder
>>> frame = 2 TRUE;   s = edde
>>> frame = 3 TRUE;   s = dd
>>> frame = 4 BASE;   s =
>>> frame = 3 BACK; r = True s = dd
>>> frame = 2 BACK; r = True s = edde
>>> frame = 1 BACK; r = True s = redder
>>> True

Note that nothing prints for s in the base case frame, as s is empty by that point. Finally, to show an example where the test fails, let’s try ricercar:

>>> frame = 1 TRUE;   s = ricercar
>>> frame = 2 FALSE;  s = icerca
>>> frame = 1 BACK; r = False s = ricercar
>>> False

Notice how the recursion ‘turns back’ as soon as the conditional isn’t met. Thanks to the branching we put into the else block, we’re not forced to go all the way to the base case, once we know we don’t have to (although we still have to retrace our way through all the frames we opened pre-recursively). This is a very useful trick to know. Our algorithm is pretty good at handling words of any length or parity, but what about phrases? As an addendum, here is some code that will clean up most text and make it legible to our pal pal(). In general this is good programming practice - you want to break up functions into discrete areas of responsibility. It makes it much easier to debug, but also to reuse your code.

def clean(s):
    s = s.lower().replace(' ', '')
    return ''.join([c for c in s if c not in '!@#$%^&*()_+;:,.<>?/\'\"'])

def pal(s):
    if len(s) == 1 or len(s) == 0:
        return True
    else:
        if s[0] == s[-1]:
            return pal(s[1:-1])
        else:
            return False

print(pal(clean('Go hang a salami; I\'m a lasagna hog.')))

There are two takeaways from this example. The first is that you may not be able to restrict yourself to a single base case. If there are two (or more) states that satisfy the most fundamental statement of the problem, then each should absolutely be part of the base case design. We will see multiple base cases play a vital role in the recursive approach to the Fibonacci sequence later on.

The second takeaway is our opportunistic approach to using recursion to get what we want. In summ(), once we hit the base case, we traced our way back down the stack, adding up the numbers that were deposited in each frame, until we reached the global frame. We always knew we would reach an answer, and based on our initial n we also knew how many steps it would take. Similarly, in gcdRecur() we knew that, given non-zero inputs, we would always get an answer, even if we didn't know how many steps it might take. But when we got to the base case and found our answer, we simply pulled it back through the stack, all the way to the global frame, with no additional computation necessary. With pal(), we don't know if the base case is attainable at all. Indeed, our print-tracing for the third, failed case shows that once a comparison of two letters was False, all we had to do was return that boolean through the stack.


Obviously, that boolean could not come from the base case, so the option had to be created in the form of a conditional branch that would stop the recursion. If you are just beginning to work with functions, an 8-line program with no fewer than 3 return statements may seem like overkill. But the program flow must take into account every possibility of how a function should terminate. A function must exit when it encounters a return statement, and it must exit with the returned value. Therefore, using return statements is the most legible way of recognizing and tracing a function's output. However, having said all that, we still have to thread our way back through the call stack, even if there are no additional computations to be made. To see this, use racercar as an input.
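If you do, and assuming the print-traced version of pal() from above, the output should look something like:

>>> frame = 1 TRUE;   s = racercar
>>> frame = 2 TRUE;   s = acerca
>>> frame = 3 TRUE;   s = cerc
>>> frame = 4 FALSE;  s = er
>>> frame = 3 BACK; r = False s = cerc
>>> frame = 2 BACK; r = False s = acerca
>>> frame = 1 BACK; r = False s = racercar
>>> False

The comparison fails three frames deep, yet we still retrace every open frame on the way back.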

1.6.3 Heuristics and Exercises

Testing for palindromicity using recursion suggests to us the following heuristics:

- If there is more than one most basic form of the problem, then there is more than one base case that needs to be incorporated into the code.
- If the purpose of the program is to evaluate the truth of an input, the base case may not be attainable, in which case there must be another way of stopping the recursive process and returning the desired value to the global frame.

Exercise: In the last exercise, we implemented a recursive algorithm itos() that converted an integer to a string. Now, implement a recursive function stoi(), which takes a string and returns the sum of the integer values of each character in that string. Think carefully about how you identify your base case. What is the minimum possible value of a string, and what does it say about the string? How do you arrive at the base case? And what would you add to the base case as it works its way back through the recursed calls? Again, this problem divides neatly into base case, pre-recursive and post-recursive parts. As before, annotate your solution with print statements that show, at each frame, the state of the function, specifying what is being passed and what is being returned, along with a counter to keep track of frames.

1.7 Summing a List of Lists

1.7.1 Recursion... but only when it's needed

In the preceding section on detecting palindromicity, we wrote code where we didn't need to run through all the possible recursive cases. In the case of pal(), if at any point s[0] != s[-1], we returned False and sent the entire recursive cascade into reverse, without ever reaching the base case. We were able to do this by moving away from the simplest implementation of recursion, which can be represented as:

def foo(n):
    if base case is True:
        return base case
    else:
        return foo(# modified n)

Now we have something a little more nuanced:

def foo(n):
    if base case:
        return True
    else:
        if # some condition:
            return foo(# modified n)
        else:
            return False

One of the drawbacks of teaching recursion with only the most common examples (ie, factorial, Fibonacci) is that these examples write the recursive call as part of the return statement. For those algorithms, it's appropriate that the 'return + recursive call' statement drives the entire post-recursive program flow. But recursion is about much more than just having a 'return + recursive call' statement at the end of your function. As pal() made clear, recursive calls can work in harmony with other operations to reach an answer. If we can place the 'return + recursive call' construction anywhere we need within the function, then it stands to reason that recursive calls can be entirely separate from return statements, too. We've already started to peel this apart in Counting In Recursion by creating an intermediate variable r in order to print out the post-recursive portion of the cascade. Now consider a scenario where you want to return the value generated by a recursive call, but also need to perform an additional computation on that value with a specific value in each frame.

To illustrate, here is another example, and one where recursion really begins to shine. Say we have a list L of integers, and we want to return the sum. However, some of these integers sit in sublists within L. We can't use Python's sum() method, since it requires that the list be 'flat' - we get a TypeError if we try to sum an integer with a list. While we know that our input will only contain integers, those integers could be contained in lists within lists within lists, oh my. If we knew that we would only be getting a list with lists inside it - call it a 'depth of two' for the sake of convenience - then it wouldn't be hard to write an iterative solution:

def sumListIter(L):
    total = 0
    for e in L:
        if isinstance(e, int):
            total += e
        else:
            for x in e:
                total += x
    return total

L = [[1, 2], 3, [4, 5]]
print(sumListIter(L))

>>> 15

Unfortunately, if we appended [[6]] to L we'd be out of luck. We could keep writing for loops to intercept varying depths of nestedness, but things will get ugly quickly, and ugly code is always difficult to maintain. Also, there could always be a depth that we hadn't thought to cover, like [[[[[2]]]]]. How would we approach this recursively? Let's start with some pseudo-code:

1) Iterate over the items in list L
2) If an item is a list, iterate over it and add it to total
3) Else an item is an integer, so add it to total
4) When there are no more items in L, return total

Admittedly, this second step is a bit of a puzzle. How can we iterate over a list if we've already passed step 1), which is the point in the function where we iterate over a list? It's precisely here that recursion comes in. We can rewrite the second step as a call back to the first step:

1) Iterate over the items in list L
2) If an item is a list, send it to step 1
3) Else an item is an integer, so add it to total
4) When there are no more items in L, return total

Before we translate this into code, we have to answer a few questions. How do we know whether a variable represents an integer or a list? You have probably used the type() method to figure out what a particular variable or constant is:

x = 6
type(x)

>>> <class 'int'>

type(6)

>>> <class 'int'>

type([6])

>>> <class 'list'>

An empty list is still a list, of course:

type([])

>>> <class 'list'>
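As an aside of my own, isinstance() - which the iterative sumListIter() above already used - is an equally valid, and generally more idiomatic, test; the recursive versions below stick with the type() comparison to match the original presentation:

print(isinstance(6, int))     # True
print(isinstance([6], list))  # True
print(isinstance([], list))   # True - an empty list is still a list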

So for some element e in list L, let's use the type() syntax to our advantage to translate the first two lines of our pseudocode:

def sumListRecur(L):
    for e in L:
        if type(e) == type([]):
            sumListRecur(e)

We add the third line of our pseudocode to cover for when e is an integer, and declare a variable total to collect the sums. Finally, we add the fourth line, which is the return statement:

def sumListRecur(L):
    total = 0
    for e in L:
        if type(e) == type([]):
            sumListRecur(e)
        else:
            total += e
    return total

If we run it with L = [1, 2, [11, 13], 8, [4, [4, 5, 5]], [[5, []]]] as our list, we get

>>> 11

Uh-oh. We still seem to be adding only the items in the list that are not in nested lists. Can you see what’s wrong in the code? Something was lost in translation in the two versions of pseudo-code: total. That is, the recursive call sumListRecur(e) needs a container in which to dump its result. If you go back to the last section of Scope, Frame and Stack, we can only change a variable’s namespace by explicitly binding the new value to the variable. This simple fix does the trick:

def sumListRecur(L):
    total = 0
    for e in L:
        if type(e) == type([]):
            total += sumListRecur(e)
        else:
            total += e
    return total

>>> 58

Let’s unpack this code now, as it has a few interesting details. The first point is the recursive call itself. As we iterate over each e item in L, when we identify an instance where e == type([]), we only need to send that specific e as an argument for the recursive function. In this way, we have the same function sumListRecur() address a smaller version (e) of the total problem (L) - which is pretty much the point of recursion. Moreover, we do this only when we need to, since if e is not a list, it must be an integer, in which case it is added to total during the else block. In the majority of recursive cases seen so far, we have been passing arguments that have either predictably decremented to the base case (eg, summ() and factorial()), or we have sent ever-smaller slices of a defined string (pal()). In the case of pal() this process of decrementation is also fundamentally predictable - the maximum number of slices, if the string is in fact a palindrome, is always len(s) // 2. Only gcdRecur() is unpredictable in terms of the number of steps it takes to get to the base case, but the recursive call drives the algorithm to the base case regardless. With sumListRecur() plenty of work is being done without recursion. Indeed, this program could compute a flat list without resorting to recursion at all. On the other hand, as long as ‘‘e‘‘ is a list, the recursive call will get triggered. In this way, a deeply nested list, such as [[[5, []]]] is as easily handled as a flat list. If recursion is being deployed on an as-needed basis, that means that we may well hit the base case multiple times in the course of processing a list. This may sound trivial, but so far all of our algorithms have engaged recursion in a fairly linear fashion - a sort of ‘one and done’ approach. It’s valuable to recognize that you can use recursion only when you need it, and as often as you need it, within a single algorithm.

1.7.2 Computing values inside each recursive frame

The use of total deserves a fuller description. Recall the iterative code with which we started:

def sumListIter(L):
    total = 0
    for e in L:
        if isinstance(e, int):
            total += e
        else:
            for x in e:
                total += x
    return total

And compare it with our recursive code:

def sumListRecur(L):
    total = 0
    for e in L:
        if type(e) == type([]):
            total += sumListRecur(e)
        else:
            total += e
    return total


Honestly, except for the issue of depth, there doesn't seem to be that much of a difference. In both cases, total scoops up all the values we need and returns the sum. And in the simplest case, where L is a flat list, there's almost no difference at all - in both versions we use the else clause to add integers to total until we get the sum we're after. But in the iterative version, there is only one total. Recall that when we are dealing with recursion, what we are really interested in is what happens within the frames, and that means that each frame has its own total!

As we've established, every time we recursively invoke sumListRecur(e) we create a new frame, to which we pass e as the argument. What does the state of function sumListRecur() in that frame look like? Exactly like the original sumListRecur(), with the difference that e is the parameter and not L. What this also means, however, is that total is set to 0 - after all, that's what we asked the code to do. So how does this help our computation? Think back on the discussion of how each frame in pal() held a different value of s. If I asked you, What is the value of s, you could only ask me to clarify, For which frame? Upon creation, each frame of pal() gets seeded with a different s. In the same way, the new frame of sumListRecur() gets seeded with e, but also total == 0.

So if total is always 0, how can we add up anything? This is where the base case comes in. Let's say that L == [1, [2, 3]]. Since we're running the function for the first time, this is the state of frame 1. The first run through the for e in L loop doesn't have a recursive call, so at this point total == 1. The next run through the loop triggers the recursive call, sending e == [2, 3] to frame 2. In frame 2, we skip the if clause and iterate over the list, adding each item of e to total. Now frame 2 has total == 5. Finally, we get to return total, returning 5 to frame 1. Where does that 5 wind up? In frame 1, in place of sumListRecur(e). It's added to the current value of total, which is 1, yielding a sum of 6. Thanks to the final return total statement in frame 1, this is what is finally returned to the global frame.

Let's add our usual print-tracing statements and see what this looks like for a larger L:

depth = 0

def sumListRecur(L):
    total = 0
    global depth
    depth += 1
    print('\ndepth =', depth)
    print('total =', total)
    for e in L:
        print('  at top of for loop, next e =', e)
        if type(e) == type([]):
            print('  e =', e, 'is a list so recurse...')
            r = sumListRecur(e)
            total += r
            print('  total is now =', total)
        else:
            total += e
            print('  e =', e, 'is int so total =', total)
    depth -= 1
    print('  returning total =', total, 'to depth =', depth)
    print('\ndepth =', depth)
    return total

L = [1, [2, 3], [4, 5], 6]
print('depth =', depth, '\nL =', L)
print(sumListRecur(L))

>>> depth = 0        #global frame
>>> L = [1, [2, 3], [4, 5], 6]

>>> depth = 1        #frame 1
>>> total = 0
>>> at top of for loop, next e = 1
>>> e = 1 is int so total = 1        #total = 1
>>> at top of for loop, next e = [2, 3]
>>> e = [2, 3] is a list so recurse...

>>> depth = 2        #frame 2
>>> total = 0
>>> at top of for loop, next e = 2
>>> e = 2 is int so total = 2
>>> at top of for loop, next e = 3
>>> e = 3 is int so total = 5
>>> returning total = 5 to depth = 1        #total = 5

>>> depth = 1        #frame 1
>>> at top of for loop, next e = [4, 5]
>>> e = [4, 5] is a list so recurse...

>>> depth = 2        #frame 3
>>> total = 0
>>> at top of for loop, next e = 4
>>> e = 4 is int so total = 4
>>> at top of for loop, next e = 5
>>> e = 5 is int so total = 9
>>> returning total = 9 to depth = 1        #total = 9

>>> depth = 1        #frame 1
>>> at top of for loop, next e = 6
>>> e = 6 is int so total = 21
>>> returning total = 21 to depth = 0        #total = 21

>>> depth = 0        #global frame
>>> 21

In order to appreciate the importance of defining total as a variable that has local scope only, consider what happens if we instead initialize it in the global frame, where it is accessible from anywhere, and pass it along with every call:

def sumListRecur(L, total):
    for e in L:
        if type(e) == type([]):
            total += sumListRecur(e, total)
        else:
            total += e
    return total

total = 0
L = [1, [2, 3], [4, 5], 6]
print(sumListRecur(L, total))

>>> 29

The correct sum is 21. Because each recursive call starts from the running total and then has its result added back into that same total, the values already accumulated get counted more than once. Keeping total local to each frame is what prevents this double-counting.

Another important trait of this code concerns frame creation. So far our examples have been rigorously predictable: the entire function's work is inseparable from recursion. However, sumListRecur() only uses recursion when needed, and in some cases not at all. This implies that keeping track of frames is intrinsically different as well. We don't have a monolithic structure for the function's overall execution; rather, the type of each item in L tells us what to do. You may have noticed that in the above print-tracing code I didn't explicitly track frames, preferring instead 'depth', or how many recursive calls were needed for each item in L. The fact is that depth and frame should really be tracked separately, since depth can be revisited, but frames are unique. For example, if L = [1, [2, 3], [4, 5], 6], for each item e in L we get:

e         depth    frame
1         1        1
[2, 3]    2        2
[4, 5]    2        3
6         1        1

We don’t close frame 1 until we have finished unpacking all sublists (ie, the end of the program). While [2, 3] and [4, 5] both have an additional level of depth, each frame is unique. We don’t ‘go back’ to frame 2 when we recurse ‘[4, 5]’ but create a new, third frame. This is important because I don’t want you to think that frame 2 still retains a value for total when in fact it’s been closed. This may seem to be a trivial distinction, but we’ll see that it plays an important role when we encounter functions with multiple recursive calls, and also when we use depth (or ‘order’) to determine drawing the size of the next shape in a , so just keep it in mind for the future.

1.7.3 Where’s the base case?

A final observation on this code: what happened to our base case? Going back to our basic template, the base case is clear:

def foo(n):
    if base case is True:
        return base case
    else:
        return foo(# modified n)

You could just as easily point out pal()’s base case:

def foo(n):
    if base case:
        return True
    else:
        if # some condition:
            return foo(# modified n)
        else:
            return False

But where is it in sumListRecur()?

def sumListRecur(L):
    total = 0
    for e in L:
        if type(e) == type([]):
            total += sumListRecur(e)
        else:
            total += e
    return total

Quite simply, the base case is reached when the function executes all of its statements without triggering a recursive call. This is an interesting counterpoint to the other examples, where the base case is in the if portion of the if/else block. So one way to think about sumListRecur() and similar functions is that the base case is reached when there is nothing left to execute but the last return statement. If this doesn't seem intuitive at first, it's OK - we'll see this come up frequently in more advanced recursive algorithms. To help you get more comfortable, here is a variation of sumListRecur(), where we aren't summing a list of lists, but simply flattening L into a list without sublists:

def flatten(L):
    newlist = []
    for e in L:
        if type(e) == type([]):
            newlist.extend(flatten(e))
        else:
            newlist.append(e)
    return newlist

print(flatten([2, 9, [2, 1, 13, 2], 8, [2, 6]]))

>>> [2, 9, 2, 1, 13, 2, 8, 2, 6]

Here, instead of total, we declare an empty list newlist, but with exactly the same local scope and functionality. Another neat trick is that the result of the recursive call is the argument passed to the extend() list method.

Question: To better understand this code, ask yourself why we chose append() for one case, and extend() for the other. A quick experiment follows below.
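If the behavior of the two methods is hazy, a short experiment of my own makes the difference visible:

L = [1, 2]
L.append([3, 4])   # append adds its argument as one element
print(L)           # [1, 2, [3, 4]]

M = [1, 2]
M.extend([3, 4])   # extend splices in the argument's items
print(M)           # [1, 2, 3, 4]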

1.7.4 Using recursion for directory listing

Recursing through lists has a very practical application as well. Consider being given the task of deriving a directory's structure. You have access to the entire directory, but you have no idea how many folders are in it, nor how many subfolders may exist within any given folder. How do you print out the complete directory? This example may be a bit advanced compared to the code presented so far, but it's worth the effort. Here is some code that leverages Python's os module to print out the directory listing for the Python folder on my local MacOS drive:

import os  # can also be 'import os.path'

def get_dirlist(path):
    """
    Return a sorted list of all entries in path.
    This returns just names, not the full path to the names.
    """
    dirlist = os.listdir(path)
    dirlist.sort()
    return dirlist

def print_files(path, prefix=""):
    """Print recursive listing of contents of path"""
    if prefix == "":  # Detect outermost call, print a heading
        print("Folder listing for", path)
        prefix = "|"

    dirlist = get_dirlist(path)
    for f in dirlist:
        print(prefix + f)  # Print the line
        fullname = os.path.join(path, f)  # Turn name into full pathname
        if os.path.isdir(fullname):  # If a directory, recurse
            print_files(fullname, prefix + "|")

print(print_files('/Users/polis/py/3.6'))

This follows the template of flatten() above, where recursion is triggered when fullname is found to be a directory and not a file. I won't be exhaustive about it, but let's break this code down a bit further. We use a number of methods that we import from the os module. When we call print_files() with a legal pathname, we first generate a header with no pipe. Since pipes are used for all subsequent prints, we immediately set prefix = '|'. Calling get_dirlist() with path as its argument creates a new list dirlist from the os.listdir() method, sorts it and returns it:

>>> Folder listing for /Users/polis/py/3.6/euler
>>> ['14_collatz', '.DS_Store', '12_triangle', '13_largesum', 'euler_diary_2019.txt']  #unsorted
>>> ['.DS_Store', '12_triangle', '13_largesum', '14_collatz', 'euler_diary_2019.txt']  #sorted

Note that os.listdir() returns a flat, unsorted list. Some of these may be files, others may be directories. If we want to see the directory structure in alphabetical order and not the way it is stored in memory then we have to return the sorted list:

>>> | 12_triangle
>>> ['triangle3.py', 'triangle2.py', 'triangle1.py']  #unsorted
>>> ['triangle1.py', 'triangle2.py', 'triangle3.py']  #sorted
>>> | | triangle1.py
>>> | | triangle2.py
>>> | | triangle3.py
>>> | | triangle4.py

Now we want to print out the items returned from get_dirlist(), which have been assigned to dirlist in print_files(). But if we just print them out as strings we have no way of knowing what is a file and what is a directory. To do this, fullname = os.path.join(path, f) uses the join() method to concatenate the item f in dirlist with its pathname. For example, if...

f == collatz6.py

...then fullname = os.path.join(path, f) returns:

/Users/polis/py/3.6/euler/14_collatz/collatz6.py

But we aren’t printing the entire pathname, so why would we do this? Because f may be either a file or a directory. Hence the recursive call:

fullname = os.path.join(path, f)
if os.path.isdir(fullname):
    print_files(fullname, prefix + "|")


This guarantees that we will print out the entire directory structure, and with each additional pipe showing the correct ‘depth’ of the directory structure.
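As a side note of my own (not part of the original walkthrough), the standard library can also do this traversal for us: os.walk() yields each directory's path, subdirectories and files, so a similar listing can be produced without writing the recursion by hand - the recursion is simply hidden inside the library. A sketch, using the same example path as above:

import os

def print_files_walk(path):
    print("Folder listing for", path)
    for dirpath, dirnames, filenames in os.walk(path):
        # depth = number of path separators below the starting path
        depth = dirpath[len(path):].count(os.sep)
        prefix = '|' * (depth + 1)
        for name in sorted(dirnames + filenames):
            print(prefix + name)

print_files_walk('/Users/polis/py/3.6')

Note that the ordering differs slightly from print_files(), since os.walk() groups all of a directory's entries together before descending.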

1.7.5 Heuristics and Exercises

From this section, we can derive a number of helpful heuristics:

- Recursion doesn't have to be the exclusive driver of the algorithm - it can be called on only when needed. But once it's invoked, the recursive process must complete before the next computation can occur.
- A recursive call can happen anywhere within a function, and not just at the return statement.
- Binding the returned value of a recursive call to a variable allows us to capture the value and use it for further computation.
- The base case may not be an explicit value (eg, 1, or True), but simply the fact that the recursive case is no longer being called. As long as the program reaches a return statement without hitting a recursive call, the value being returned will reverse the recursive process.

Exercise: Given a list of lists that contains only numbers, write a recursive function find_min() that returns the smallest integer. A good way to approach this problem is to first solve a smaller problem: How would you find - without using the min() method - the smallest value in a flat list? Once you've done that, consider how total functioned in sumListRecur(), and how you could integrate your approach to finding the minimum value into a recursive context.

Credit: Much of this material is adapted from Chapter 18.2-3 of the fantastic resource, Think Like A Computer Scientist.

1.8 Using Recursion to Make More: The Power Set

1.8.1 Thinking before coding

So far, we have been using recursion reductively: we get an input and distill it down to a sum, product, integer or even a boolean. We simplified a nested list into a flat list, and, in the last exercise, pulled out a minimum value from any nested list, regardless of the depth of nesting in that list. It makes sense to apply recursion to these sorts of problems, as recursion itself is based on reducing a problem down to smaller and smaller subproblems until we get to the simplest possible statement of the problem. There is a certain cognitive harmony in knowing that the answer you need to get from an input, and the method used to get it, work in the same 'direction'. But what if we wanted to take an input and make more out of it?

Our example here is the power set, or the set of all subsets of a given set. The power set is an essential gateway towards understanding a large family of problems revolving around combinatorial optimization. As a point of clarification, in Python, a set is a collection of unique elements, whereas a list may contain duplicates. We'll use lists for our power set discussion, since a set that generates a power set doesn't have to consist of unique elements (for instance, 'apple, banana, banana, mango' just means you're working with two bananas). Also, we can leverage some of our previous experience with lists to better approach the problem.

One thing we can establish right away is the base case. By definition, any set L is itself a member of its power set. The empty set is also always a member of the power set. Therefore we know that the power set of the empty set contains the empty set as its only member:

powerSet([]) == [[]]

Using our basic template, this gives us the first outlines of our code:


def powerSet(L):
    if len(L) == 0:
        return [[]]
    else:
        # a bunch of code

It’s always good to see how a problem behaves, independently of how we might try to solve it. If you spend some time working out a few iterations or instances of a problem, you may be able to make observations or identify patterns that will help greatly with the algorithm design. If you can write a quick snippet of code, great - but sometimes grabbing a pencil and paper is even better. Being able to reason your way to a solution from first principles is one of the best ways to learn algorithmic thinking. So before we get to the rest of the code, let’s see what we can learn by developing the power set ‘in real life’. We already know that an empty set (or list - I’ll use the terms interchangeably here) will always return one element - itself - as its power set. After that, we get some pretty serious expansion:

[] == [[]]
['a'] == [[], ['a']]
['a', 'b'] == [[], ['a'], ['b'], ['a', 'b']]
['a', 'b', 'c'] == [[], ['a'], ['b'], ['a', 'b'], ['c'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]
['a', 'b', 'c', 'd'] == [[], ['a'], ['b'], ['a', 'b'], ['c'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c'], ['d'], ['a', 'd'], ['b', 'd'], ['a', 'b', 'd'], ['c', 'd'], ['a', 'c', 'd'], ['b', 'c', 'd'], ['a', 'b', 'c', 'd']]

Counting the members (sublists) of each, we see that the power set shows exponential growth, or:

len(powerSet([])) == 1
len(powerSet(['a'])) == 2
len(powerSet(['a', 'b'])) == 4
len(powerSet(['a', 'b', 'c'])) == 8
len(powerSet(['a', 'b', 'c', 'd'])) == 16

len(powerSet(L)) == 2**len(L)

Another interesting conclusion we can draw from the above few instances is that the power set is additive. If L == ['a', 'b', 'c'], then its power set will contain the power set of L == ['a', 'b']. Put another way, we don't know what the power set of L == ['a', 'b', 'c'] is until we know the power set of L == ['a', 'b']. If we asked powerSet() to compute L == ['a', 'b', 'c'] we might get the following response:

1. I can't compute the power set of L == ['a', 'b', 'c'] because I first need to compute the power set of L == ['a', 'b']
2. I can't compute the power set of L == ['a', 'b'] because I first need to compute the power set of L == ['a']
3. I can't compute the power set of L == ['a'] because I first need to compute the power set of L == []

In other words, "I can't do 'x' until I've done 'y' and I can't do 'y' until I've done 'z'." Whenever you find yourself in a situation like this, the chances for a recursive solution are pretty good. In that spirit, we want to reduce L until we've gotten it down to the empty set. As we did with our palindrome algorithm, we can use slices to send an ever-smaller list as an argument for each recursive call. This time, we apply slices not to the string, but to the list. For our purposes, the result is the same:

def powerSet(L):
    if len(L) == 0:
        return [[]]
    else:
        return powerSet(L[:-1])

L = ['a', 'b', 'c', 'd']
print('L =', L)
print(powerSet(L))

Once at len(L) == 0, we'll have the smallest statement of the problem in hand: [[]]. On the way, we'll also have seeded each frame with decremented instances of L. It looks like we've got a handle on at least the pre-recursive portion of the algorithm.

Question: Look closely at what gets returned from the base case. Is [[]] the same as L[:-1] at len(L) == 0? What difference does it make?

Let's add our usual print-tracing so we can start tracking frames. Also, it would be good to see the namespace for L in each frame:

frame = 0

def powerSet(L):
    global frame
    frame += 1
    if len(L) == 0:
        print('\nbase case, frame', frame)
        return [[]]
    else:
        print('\npre-recursive, frame', frame)
        print('L =', L)
        return powerSet(L[:-1])

print('global frame =', frame)
L = ['a', 'b', 'c', 'd']
print('L =', L)
print(powerSet(L))

>>> global frame = 0
>>> L = ['a', 'b', 'c', 'd']

>>> pre-recursive, frame 1
>>> L = ['a', 'b', 'c', 'd']

>>> pre-recursive, frame 2
>>> L = ['a', 'b', 'c']

>>> pre-recursive, frame 3
>>> L = ['a', 'b']

>>> pre-recursive, frame 4
>>> L = ['a']

>>> base case, frame 5
>>> [[]]

The only value returned throughout the post-recursive cascade at the moment is [[]], so for now we'll omit printing those returns. Nevertheless, this code should be sufficient to get us going. We now have a structure that recognizes the additive nature of the power set, but is couched in recursive terms.


1.8.2 The right seed in the right frame

Let’s now think about what we expect to happen in each frame, and what post-recursive communication of values between frames might look like. Starting from the base case, we return [[]]. In our example, we know that in frame 4, the namespace of L is ['a']. Post-recursively, we can try to create the power set for a set of two elements, [[]] and ['a']. If we can take that result and return it back to frame 3, we can do the same for that frame, and repeat until we have the complete set. Put another way, each frame has one, unique element that operates on multiple existing elements to create the permu- tations needed for that frame. Let’s call what’s coming from the called frame the base and what’s waiting for it the operator: in frame 5: result == [[]] in frame 4: base == [[]] operator == ['a'] result == [[], ['a']] in frame 3: base == [[], ['a']] operator == ['b'] result == [[], ['a'], ['b'], ['a', 'b']] in frame 2: base == [[], ['a'], ['b'], ['a', 'b']] operator == ['c'] result == [[], ['a'], ['b'], ['a', 'b'], ['c'], ['a', 'c'], ['b', 'c'], ['a', 'b

˓→', 'c']]

It’s clear that the result of each frame becomes the base of the frame that called it (remember, we’re moving away from the base case now). So we know that base should store the results of the recursive call powerSet(L[:-1]). Now we have to compute the interaction of base and operator, add those results to base, and return the whole thing to the next frame. First we’ll solve for the interaction between base and operator, simulating the state found in frame 4: base= [[]] operator=['a'] next_base= base[:] for b in base: next_base.append(b+ operator) print(next_base)

>>> [[], ['a']]

Plugging this into our existing draft of powerSet():

def powerSet(L):
    if len(L) == 0:
        return [[]]
    else:
        base = powerSet(L[:-1])
        operator = ['a']
        next_base = base[:]
        for b in base:
            next_base.append(b + operator)
        return next_base

If you run this code for L = ['a'], you'll get the right answer, but for any other L it doesn't quite work. We're still missing one last piece. Recall what we left in each frame on the way to the base case:

>>> pre-recursive, frame 1
>>> L = ['a', 'b', 'c', 'd']

>>> pre-recursive, frame 2
>>> L = ['a', 'b', 'c']

>>> pre-recursive, frame 3
>>> L = ['a', 'b']

>>> pre-recursive, frame 4
>>> L = ['a']

You can see that the last item in the list is our operator. And it's easy to specify the last item in any list with L[-1:]. So all we have to do is replace operator = ['a'] with operator = L[-1:] to create the generalized case. A good takeaway here is how we got to operator. Decrementing the original list by slices had two purposes: getting to the base case, and setting up a situation where L[-1:] would give us the correct value for operator in every frame. In fact, it would be difficult to disentangle the two. This indeed works, but note that we can also rewrite the for loop as a much more concise list comprehension, and insert it directly into the return statement. Once you know how the code works, it makes it much more readable:

def powerSet(L):
    if len(L) == 0:
        return [[]]
    else:
        base = powerSet(L[:-1])
        operator = L[-1:]
        return base + [(b + operator) for b in base]

L = ['a', 'b', 'c', 'd']
print(powerSet(L))

Question: Why did we write L[-1:]? Isn't L[-1] sufficient?

Question: Can we set operator = L[-1:] before the recursive call? Why or why not?
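As a hint for the first question, here is a quick interpreter sketch of my own:

L = ['a', 'b', 'c']
print(L[-1])        # 'c' - a bare string element
print(L[-1:])       # ['c'] - a one-element list
print([] + L[-1:])  # ['c'] - list + list is legal
# [] + L[-1] would raise a TypeError: we can't concatenate a list and a str

And here is the version with complete print-tracing: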

frame = 0

def powerSet(L):
    global frame
    frame += 1
    if len(L) == 0:
        print('\nbase case, frame is', frame)
        print('returning [[]]')
        return [[]]
    else:
        print('\npre-recursive, frame', frame)
        print('list in this frame is', L)
        print('operator in this frame is', L[-1:])
        base = powerSet(L[:-1])
        operator = L[-1:]
        frame -= 1
        print('\npost-recursive, frame', frame)
        print('base in this frame is', base)
        print('operated on by', operator)
        r = base + [(b + operator) for b in base]
        print('returning', r)
        return r

print('global frame is', frame)
L = ['a', 'b', 'c', 'd']
print('L =', L)
print(powerSet(L))

As with most algorithms, there are many ways to go about generating the power set. I've chosen this one because I think it's a good example of how we can think recursively through the problem, step by step. But it's always instructive to look at other solutions. From Wikipedia:

def powerSet2(L):
    if L == []:
        return [L]
    else:
        e = L[0]
        t = L[1:]
        pt = powerSet2(t)
        fept = [x + [e] for x in pt]
        return pt + fept

Question: How is this different in terms of the order in which the power set is constructed? What causes the difference?

There are also several iterative solutions. You may be disappointed to realize that they are quite similar to the recursive ones. One thing that the iterative solutions suggest is that [[]] is a sort of mathematical MacGuffin, an excuse to get the recursive process going, but in the end it's really nothing. You wouldn't be wrong.

def powerSet3(L):
    result = [[]]
    for l in range(len(L)):
        for k in range(len(result)):
            result.append(result[k] + L[l:l+1])
    return result

def powerSet4(L):
    result = [[]]
    for e in L:
        result.extend([subset + [e] for subset in result])
    return result

Question: What is the difference between the use of append() and extend() list methods? Can you rewrite powerSet3() to use extend(), or powerSet4() to use append()? Why or why not?


1.8.3 Heuristics and Exercises

Recursion doesn’t have to be reductive; it can be used to multiply, expand or further elaborate an input. Before thinking about a problem in recursive terms, it can be helpful to simulate desired inputs and outputs by cre- ating small ‘modules’ that will produce the desired results, and then using that to specify the return statement and computation that is internal to each frame. If you just work from the base case, the next steps may not be apparent. Pre-recursive seeding of frames has more functionality than simply decrementing to the base; it can also provide essential inputs for in-frame computation. Exercise: Write a recursive function powerbin() that, given a list of unique elements, returns an additional list of binary representations of each subset of that list. For example, if the input list was:

L = ['a', 'b', 'c', 'd']

Then the following sample of subsets would be returned as:

['a'] == ['1000']
['a', 'c', 'd'] == ['1011']
['b', 'd'] == ['0101']

What does this tell us about the power set and its relationship to binary counting?

1.9 Expanding a Series: Pascal’s Triangle

1.9.1 Returning the nth layer

Deriving the power set showed us that recursion could be used to expand an input at a literally exponential rate. The implementation also demonstrated the power of performing the same set of calculations on a frame-by-frame basis, and passing those results on to the next frame further down the stack. We’ll extend this even further with Pascal’s triangle, where we’ll derive an entire series from a minimal base case. Pascal’s triangle is complex and beautiful (and pre-dates Pascal substantially). Many other sequences can be derived from it; in turn, we can calculate its values in many ways. We’ll focus on deriving it from its starting point, the number 1. As always, let’s look at how the triangle ‘works’ before we start coding. The triangle itself can be rendered as follows:

      1
     1 1
    1 2 1
   1 3 3 1
  1 4 6 4 1
 1 5 10 10 5 1
 ...

We can represent each row of the triangle as a list that has one more element than the previous one:

[1]                     len = 1
[1, 1]                  len = 2
[1, 2, 1]               len = 3
[1, 3, 3, 1]            len = 4
[1, 4, 6, 4, 1]         len = 5
[1, 5, 10, 10, 5, 1]    len = 6


The method of expansion is simple: each next row is constructed by adding the number above and to the left with the number above and to the right, treating blank entries as 0. Traditionally, the first row is designated as the 0th row:

n    triangle
0    1
1    1+0  1+0
2    1  1+1  1
3    1  1+2  2+1  1
...

There is a way to calculate any nth row without knowing the value of the preceding row, but we are more interested in leveraging recursion so that we can derive the whole triangle from first principles. If n designates a given row of the triangle, we can decrement it until n == 0 gives us the 0th row, whose value we know is 1. Following our trusty basic template, the base case practically writes itself:

def pascal(n):
    if n == 0:
        return [1]
    else:
        # a whole bunch of code

Getting from row 0 to row 1 looks a little tricky, but there’s no reason why we need to deal with it immediately. As we did with powerSet(), sometimes an easier next step is to model a way to get from the nth row to the (n + 1)th row, eg:

1 4 6 4 1 --> 1 5 10 10 5 1

In Pythonic terms, how do we get from the fourth row, call it n4 == [1, 4, 6, 4, 1] to the fifth row, n5 == [1, 5, 10, 10, 5, 1]? If we design this correctly, then the algorithm should work for every value of n, including the base case, since recursion mandates that a function’s behavior will never change, only its inputs and state. On the other hand, it may work for all recursive cases, but not for the transition from the base case to the recursive case. Then we’ll know that we need to tweak something in the base case. To just test for the recursive case, we can set up a ‘fake’ recursive algorithm with the needed input, so we just have to compute the expected output as the return. Since we’re not having pascal() call itself, we don’t have to worry about getting tripped up if something goes wrong. It’s more like a one-shot function:

def pascal(n):
    if n == 0:                # we want to skip this clause...
        return [1]
    else:
        n4 = [1, 4, 6, 4, 1]  # in place of a recursive call
        n5 = # some computation involving n4
        return n5

print(pascal(4))  # ...so we make the passed arg > 0

If we do it correctly, return n5 will give us [1, 5, 10, 10, 5, 1]. We can then further test our model using [1, 2, 1]; if it works, we’ll get [1, 3, 3, 1], and so forth. Finally, we’ll create a connection between these spot tests and the base case: can the same logic convert [1] to [1, 1]? If so, we’ll be well on our way towards a solution. So what can we observe about the relationship between these two lists?

n4 == [1, 4, 6, 4, 1]
n5 == [1, 5, 10, 10, 5, 1]


We know that, for n5, the first term in the row is 1, so we may as well declare our list with an initial value of [1]. We derive the inner terms of n5 by adding consecutive pairs of terms from n4. Finally, the last term of n5 is again 1, making it 1 term longer than n4. Here’s a first draft:

def pascal(n):
    if n == 0:
        return [1]
    else:
        n4 = [1, 4, 6, 4, 1]
        n5 = [1]
        for i in range(len(n4) - 1):
            n5.append(n4[i] + n4[i+1])
        n5.append(1)
        return n5

print(pascal(4))

>>> [1, 5, 10, 10, 5, 1]

Question: Why are we ranging over len(n4) - 1 and not len(n4)?

So this is looking pretty good. Spot-testing other rows also gives us the correct values. Best of all, our little algorithm generates row 1 from the base case, that is, row 0. But before we put it all together, let's rewrite the loop as a (slightly verbose) list comprehension:

def pascal(n):
    if n == 0:
        return [1]
    else:
        n4 = [1, 3, 3, 1]
        return [1] + [(n4[i] + n4[i+1]) for i in range(len(n4) - 1)] + [1]

print(pascal(3))

>>> [1, 4, 6, 4, 1]

This restatement allows us to see, perhaps more clearly than in the for loop, why the computation of the 0th row to the first row works:

return [1] + [(n4[i] + n4[i+1]) for i in range(len(n4) - 1)] + [1]

We are guaranteed to return a list with first and last elements [1, 1]. This is true even if the entire list comprehension in the middle computes to nothing (ie, an empty list), since [1] + [] + [1] == [1, 1]. And this is precisely what happens when the returned value is [1], which is the base case: plugging [1] into the list comprehension yields an empty list. This is how we get from the 0th row to the 1st row, or from the base case to the first recursed frame!
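To see this for yourself, here is a quick check of my own:

r = [1]
middle = [(r[i] + r[i+1]) for i in range(len(r) - 1)]
print(middle)              # [] - range(0) produces no iterations
print([1] + middle + [1])  # [1, 1] - row 1, generated from the base case

Finally, if we swap out the defined input n4 = [1, 3, 3, 1] with a decrementing recursive call such as pascal(n - 1), we are close to being finished. All we have to do is update our variable names and we have our final code: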

def pascal(n):
    if n == 0:
        return [1]
    else:
        r = pascal(n - 1)
        return [1] + [(r[i] + r[i+1]) for i in range(len(r) - 1)] + [1]

print(pascal(4))

>>> [1, 4, 6, 4, 1]

With full print-tracing (and a little bit of variable re-arranging, since we want to print between the calculation of row and the return statement), we have:

frame = 0
n = 5

def pascal(n):
    global frame
    frame += 1
    if n == 0:
        print('\nbase case frame', frame)
        print('n = 0; returning [1]')
        return [1]
    else:
        print('\npre-recursive, frame', frame)
        print('n =', n)
        r = pascal(n - 1)
        row = [1] + [(r[i] + r[i+1]) for i in range(len(r) - 1)] + [1]
        frame -= 1
        print('\npost-recursive, frame', frame)
        print('n =', n)
        print('returning', row)
        return row

print('global frame =', frame)
print('n =', n)
print(pascal(n))

>>> global frame = 0
>>> n = 5

>>> pre-recursive, frame 1
>>> n = 5

>>> pre-recursive, frame 2
>>> n = 4

>>> pre-recursive, frame 3
>>> n = 3

>>> pre-recursive, frame 4
>>> n = 2

>>> pre-recursive, frame 5
>>> n = 1

>>> base case frame 6
>>> n = 0; returning [1]

>>> post-recursive, frame 5
>>> n = 1
>>> returning [1, 1]

>>> post-recursive, frame 4
>>> n = 2
>>> returning [1, 2, 1]

>>> post-recursive, frame 3
>>> n = 3
>>> returning [1, 3, 3, 1]

>>> post-recursive, frame 2
>>> n = 4
>>> returning [1, 4, 6, 4, 1]

>>> post-recursive, frame 1
>>> n = 5
>>> returning [1, 5, 10, 10, 5, 1]
>>> [1, 5, 10, 10, 5, 1]

If you don’t like the verbosity of the list comprehension, here is a very elegant use of the zip() and map() methods that cuts down on the clutter. Otherwise the code is exactly the same:

def pascal(n):
    if n == 0:
        row = [1]
    else:
        r = pascal(n - 1)
        pairs = zip(r[:-1], r[1:])
        # in Python 3, map() returns an iterator, so it must be wrapped
        # in list() before concatenating
        row = [1] + list(map(sum, pairs)) + [1]
    return row

Spend a few minutes with Python’s documentation to figure out exactly how these two methods work. They don’t do anything loops and such can’t do, but they do provide a very convenient shorthand.
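For example, here is a quick sketch of my own showing both methods at work on row 3:

r = [1, 3, 3, 1]
pairs = list(zip(r[:-1], r[1:]))  # [(1, 3), (3, 3), (3, 1)]
print(pairs)
print(list(map(sum, pairs)))      # [4, 6, 4] - the inner terms of row 4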

1.9.2 Returning the entire series

Hang on a minute, though. We’re not really returning the triangle, are we? We’re just getting back the specific row that we asked for as n. All the other rows that get computed on the way are discarded, which seems a bit of a shame. We could set up, outside the function, a loop to append all returned values from pascal() to a list p:

n = 5
p = [pascal(n) for n in range(n)]
print(p)

>>> [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]]

This gives us the correct values for rows 0-4. But it's a little expensive, in the sense that we are repeating the calculations leading up to n = 3 all over again in order to get to n = 4, etc. Is there a way to write the recursion so that it returns the complete list? One of the things that we can do is send a second argument to pascal() that will store all layers so far computed. We still use n to designate the last row/frame that we want, and it still works as our counter to get us down to the base case of if n == 0. But we also create a list tri that scoops up every row as it is created. Here's a first draft:


def pascal(n, tri):
    if n == 0:
        return [[1]]
    else:
        r = pascal(n - 1, tri)
        row = [1] + [(r[i] + r[i+1]) for i in range(len(r) - 1)] + [1]
        tri.append(row)
        print('tri =', tri)
        return row

print(pascal(4, [[1]]))

>>> tri = [[1], [1, 1]]
>>> tri = [[1], [1, 1], [1, 2, 1]]
>>> tri = [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1]]
>>> tri = [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]]
>>> [1, 4, 6, 4, 1]

The recursive call r = pascal(n - 1, tri) may look a little odd. Obviously, now that pascal() has two arguments, the interpreter requires that we pass two arguments every time we call it, but it also looks like we're mashing two values into one variable r. Except we're not, because that's not what's being returned. The value returned is row. If you print out r right after the recursive call, you'll see this:

>>> [[1]]
>>> [1, 1]
>>> [1, 2, 1]
>>> [1, 3, 3, 1]

What you’re seeing is row, not n or tri. Keep in mind that what we are returning to r is first the base case, which is [[1]], followed by each recursed value of row. You may well protest that there is, in fact, an n, because you can print for it and it will yield a value. That value of n you’re accessing was computed on the way towards the base case and is still residing in the frame as a part of the function’s state. It was there since the creation of that frame, and has nothing to do with the chain of return statements. Also note the subtle change in the base case: we now want to return [[1]] and not [1] since we are appending lists to the base case’s return value, which is itself a list whose first element is [1]. Back to our larger problem. We can see from tri that we’re accumulating the rows correctly, but in the end there is nowhere for them to go, since the return statement (ie, what is returned by pascal(n - 1, tri) and bound to r) must be a list that represents the row on which the new row will be based - and not a list of lists. If we have any chance of seeing the entire triangle, what we need to do is return all of tri. This then means that we only want the last item in the tri list. But even if we write. . .

return tri[-1]

...as the return statement, we get the same output as above - the last row of the triangle. We have to re-state the way in which we compute row: if we are sending all of tri to r, then we need to tell the function to operate on the last item of the list in r, which is the most recently calculated row, in order to compute row. For example, if we have been generating the whole list and at a certain point we returned...

r = [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1]]

...then we know that the last element (in this case, [1, 3, 3, 1]) is always represented by r[-1]. We want our calculation of row to take this into account. Looking at the listcomp we built...


row = [1] + [(r[i] + r[i+1]) for i in range(len(r) - 1)] + [1]

. . . it’s clear that if we are applying a list of lists to this we will get a mess, if not an outright error. For example, in the first iteration, r[i] == [1] and r[i + 1] == [1, 1]. Instead of operating on a single list we are mashing entire lists together. What a disaster. Fortunately, Python allows us to specify an element that belongs to a list, even if that list is part of another, larger list:

L = [[1, 2], 3, [4, 5]]
L[0][1]
L[1]
L[2][0]

>>> 2
>>> 3
>>> 4

We can integrate this into a list comprehension, rewriting the row computation as:

row = [1] + [(r[-1][i] + r[-1][i+1]) for i in range(len(r[-1]) - 1)] + [1]

In other words, we are saying “take the ith element of the last item in r and add it to the next element of that same item in r”. Thanks to this tweak, our new code doesn’t look that different from the original:

def pascal(n, tri):
    if n == 0:
        return [[1]]
    else:
        r = pascal(n - 1, tri)
        row = [1] + [(r[-1][i] + r[-1][i + 1]) for i in range(len(r[-1]) - 1)] + [1]
        tri.append(row)
        return tri

print(pascal(4, [[1]]))

>>> [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]]

I admit that this listcomp is even more verbose than the first time around, so we can also restate this in terms of the original for loop formulation:

def pascal(n, tri):
    if n == 0:
        return [[1]]
    else:
        r = pascal(n - 1, tri)
        row = [1]
        for i in range(len(r[-1]) - 1):
            row.append(r[-1][i] + r[-1][i + 1])
        row.append(1)
        tri.append(row)
        return tri

print(pascal(4, [[1]]))

>>> [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]]

To see for yourself, insert a complete set of print-tracing elements and inspect how the recursion unfolds.
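If you want a head start, here is one possible set of tracing statements, using a global frame counter of my own devising (the counter and the print calls are additions for illustration, not part of the algorithm itself):

frame = 0

def pascal(n, tri):
    global frame
    frame += 1
    print('frame =', frame, 'n =', n)
    if n == 0:
        print('base case reached, returning [[1]]')
        return [[1]]
    else:
        r = pascal(n - 1, tri)
        frame -= 1
        row = [1]
        for i in range(len(r[-1]) - 1):
            row.append(r[-1][i] + r[-1][i + 1])
        row.append(1)
        tri.append(row)
        print('frame =', frame, 'appending row', row)
        return tri

print(pascal(4, [[1]]))

Watching the frame counter climb to the base case and then descend as each row is appended makes the pre- and post-recursive halves of the function easy to tell apart.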


1.9.3 Heuristics and Exercises

As we did with powerSet(), if you find yourself stuck for how to think through a problem recursively, solve a small portion of the problem first by creating a 'fake' recursive function. Once this one-shot function works, test it for other inputs, and then see if it works for what you chose to return from the base case. This is very different from solving the entire problem iteratively. While an iterative approach may give you visibility into the problem's general behavior, it may not translate easily (or at all) into a recursive solution. The 'fake recursion' approach is more closely aligned with thinking recursively: we work within a function that's set up to work recursively but doesn't actually recurse. We attempt to solve for a single frame within the larger problem; by the principle of induction, we then continue testing the hypothesis. If it works for 'n', it should work for 'n + 1', 'n - 1', 'n +/- x' and, finally, 'n == 0', our base case.

Sometimes the recursive call just drives to the base case and doesn't need to do anything more than that. With summ() we added the namespace of n in each frame to the returning sum. Here, pascal(n - 1) merely sets up the correct number of frames for the post-recursive cascade.

What is returned by each frame and what is computed within each frame always work together. If we alter what each frame returns, we will probably have to change the computation inside each frame. But recursion demands that each frame receive the same returned variable(s) and perform the same computations. This is one of the frustrations people experience with recursion, as it can lead to situations where nothing works until everything (suddenly) works.

Multiple arguments can be passed to the recursive function to create containers for more comprehensive data. If we cannot alter the way the function is being called (ie, pascal() will only accept one argument), then we can set a default parameter which in many cases will fulfill the requirement, eg: def pascal(n, tri=[[1]]).

Always worth re-stating: a recursive function's work is basically divisible into two parts: the pre-recursive computation and setup on the way to the base case, and the post-recursive computation on the way back. There is no setup on the way back - you have to work with what you've got. When designing a recursive solution, you have to determine what needs to happen on the way in, and what needs to happen on the way back out. The distinct dividing line is the recursive call itself. We've already seen two extreme examples. In pascal(), all of the work happens on the return trip from the base case; this is also known as 'corecursion'. Whereas in pal(), all of the work happens on the way to the base case. Recursion is flexible like that.

Exercise: Building on one of the above heuristics, rewrite our last version of pascal() to use tri=[[1]] as a default argument (see also the caution at the end of this section). What else do you need to change inside and outside the function to make it work? What stays the same? What can you change that may not make a difference at all?

Exercise: If we examine Pascal's triangle, one of its sequences is the triangular numbers:

0   1
1   1 1
2   1 2 1
3   1 3 3 1
4   1 4 6 4 1
5   1 5 10 10 5 1

One way to visualize the triangular numbers is as the number of dots needed to create an equilateral triangle. Starting with 0, the sequence is as follows:

0, 1, 3, 6, 10, 15, 21, 28, 36...

You can see that Pascal’s triangle has this sequence represented (twice!) as an interior diagonal: the 1st element of row 2, the second element of row 3, the third element of row 4, etc. Conversely, the same sequence can be read from: the last element of row 2, the second-to-last element of row 3, the third-to-last element of row 4, etc.


Modify pascal() so that it compiles the triangular numbers in a separate list and returns it along with the triangle layers. See if you can add each triangular number as each new layer is generated. Make sure that you include 0 at the start of your sequence of triangular numbers. Some hints:

• Get rid of the pretty formatting and left-justify the triangle to see how the triangular numbers line up.
• If you get stuck using lists, what other data structure might be more effective?

Bonus: What if you are asked to return only the triangular numbers at the end of the computation, but everything must still be written as one function?
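A caution for the default-argument exercise above: Python evaluates a default argument once, when the function is defined, not on every call. A mutable default like tri=[[1]] is therefore shared between calls, which can produce surprising results the second time you call the function in the same session. A minimal sketch of the behavior, using a throwaway function of my own (grow() is not part of this guide's code):

def grow(x, L=[]):
    L.append(x)             # mutates the one default list, created at definition time
    return L

print(grow(1))              # [1]
print(grow(2))              # [1, 2] - the same default list persisted between calls

Keep this in mind when you test your rewritten pascal() more than once.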

1.10 Multiple Recursive Calls: Fibonacci Sequence, Part 1

1.10.1 Introducing multiple recursive calls

Along with factorials and the Towers of Hanoi, the Fibonacci sequence is one of the classic introductions to recursion. I've chosen to include it at a significantly later point in this guide, since Fibonacci has deep implications for understanding recursion, and particularly the efficiency of certain recursive algorithms. It shows that the most intuitive recursive solution may not be desirable, and will force us to look for better ways of implementing this technique.

Let's first look at the sequence itself, which is seeded with 0 and 1. Each succeeding Fibonacci number is the sum of the two previous ones. Written in the form of a list, the first ten Fibonacci numbers are:

n        0  1  2  3  4  5  6  7   8   9
fib(n)  [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

When we ask for fib(n) we are asking for the number that occupies the nth place in the Fibonacci sequence, just as we might ask for the 10th prime number, or the 6th triangular number. If we were to represent this recursively, a few things immediately stand out. The first is that, like pascal(), we are generating the sequence by looking backwards to retrieve earlier terms that we need to perform the computation. In fact, we need to go all the way back to the base case to be able to calculate the intermediate values. Secondly, both 0 and 1 are the seeds needed to kick off the sequence.

What does this imply for a recursive solution? We've seen something similar with our function to determine palindromicity: for pal(), we had to account for the fact that any string has either an even or an odd number of characters, which meant we needed two base cases. Likewise, fib() has to account for 0 and 1:

def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        # some code

We return n here because if we want fib(0), we can't very well return 1, can we? (We can also write if n < 2: return n, which is a bit more compact.) It's true that, as we decrement n, hitting the fib(1) base case will always return 1, but we'll also hit fib(0) far more often than you might think, as we'll see.

When considering the recursive case, let's go back to the way the sequence is generated. For any number n >= 2, another way to say "add the previous two numbers" is "add the terms for n - 1 and n - 2". Generalizing this to the function, we can write:

fib(n) == fib(n - 1) + fib(n - 2)


In other words, we need to consult the sequence twice in order to get to n. This seems fairly straightforward if we are calculating fib(2): we just add up both base cases and return the result, which is 1.

n         0  1        0  1  2
fib(n)   [0, 1]  -->  [0, 1, 1]

Adding the next term in a sequence is always easy if you have the formula and all the preceding values. But what if we wanted to know fib(3)?

n         0  1        0  1  2  3
fib(n)   [0, 1]  -->  [0, 1, ?, fib(3)]

Working from our ‘(n - 1) + (n - 2)’ formula, to get fib(3) we first need to get fib(2) so we can add it to fib(1), which we already know is 1. And to get fib(2) we’ll need to invoke fib(1) and fib(0), which we know are 1 and 0. This need to work backwards to the base case and then move forwards again suggests that the Fibonacci sequence lends itself to recursion. In fact, there’s not much more to the code than what we’ve just reasoned through:

def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

Recursively speaking, we go back to the base case in order to get the value we are sure of, then add up all the returned values on the trip back through the call stack. Since all of the computation happens at the moment of the recursive calls, it's appropriate to put fib(n - 1) + fib(n - 2) in the return statement. It's just a matter of backfilling the sequence until we have all of the terms needed to derive the desired nth number.

At first glance this solution looks entirely reasonable. But a closer look at the return statement reveals quite a devil in the details. Let's map out what this might look like for fib(3) using our usual 'talking algorithm' approach:

1) Global frame: Hey fib(), compute fib(3) for me.

2) Frame 1 of fib(): I don't know what fib(3) is because first I need to compute fib(2) and add it to fib(1) (return fib(2) + fib(1)). Since fib(2) is first, let me evaluate it:

3) Frame 2 of fib(): I don't know what fib(2) is because I first need to compute fib(1) and add it to fib(0): return fib(1) + fib(0). Since fib(1) comes first, let me evaluate it:

4) Frame 3 of fib(): I know that fib(1) == 1, since it's a base case, so I'll return the result as 1. Closing frame 3.

5) Frame 2 of fib(): Thanks. I now have: return 1 + fib(0), so let's evaluate fib(0):

6) Frame 4 of fib(): I know that fib(0) == 0, since it's a base case, so I'll return the result as 0. Closing frame 4.

7) Frame 2 of fib(): I can now complete return 1 + 0, so I'm returning 1 to frame 1. Closing frame 2.

8) Frame 1 of fib(): Ok, I now know fib(2) == 1, so I can update my return statement to return 1 + fib(1), but I still can't return the answer since I now need to evaluate fib(1):

9) Frame 5 of fib(): I know that fib(1) == 1 is a base case, so I will return 1. Frame 5 now closed.

10) Frame 1 of fib(): Ok, I now know fib(1) == 1, so I can finish evaluating the return statement and close frame 1: return 1 + 1

11) Global frame: I've received 2 from fib().

That turns out to be a lot of steps to compute the next term in the sequence. Here's an illustration of the calls as they unfold, frame by frame.


Fig. 1: Figure 1. The Fibonacci call stack for fib(3)


So far we’ve been illustrating the call stack from the bottom up. Just as cafeteria trays are stacked on top of one another and removed in a last-in-first-out (LIFO) order, that’s the way recursion creates and destroys frames. But multiple recursive calls introduce a branching structure that is conceptually closer to a tree. I find top-down diagrams much easier to read, because we’ve been taught to be top-down readers. So it’s much more intuitive to follow the branches until we ‘get to the bottom’ of a chain of logic. In this case, branches are successive recursive calls, and every time the base case is achieved (and the recursive cascade reversed), we record it as a leaf. An important part of reading these diagrams is the order in which computation happens. You can follow this by the frame numbering, keeping in mind that a computation completed at the base case will be returned to the calling frame, but that that frame may well initiate another branch of frame(s) in order to address the second recursive call in the return statement. So the program flow for the sequence of frames for fib(3) would be:

global-1-2-3-2-4-2-1-5-1-global

If this diagram of fib(3) seems a bit more complex than expected, things get noticeably worse for the next value of n. If we were to ask for the fourth Fibonacci number, our diagram would look like this:

Fig. 2: Figure 2. The Fibonacci call stack for fib(4)

You can see that all the computations that make up fib(3) are a subset of fib(4), and that all of fib(2) is a subset of fib(3), but also that fib(2) shows up a second time in the right portion of fib(4)’s tree! From a more general perspective, you can now see the return statement embodies two separate sides of the tree, where the computation of the first (left) term must be completed before computation of the second (right) term can be undertaken:


return fib(3) + fib(2)
       ^^^^^^   ^^^^^^
        left     right

The diagram for fib(4) points out two important consequences of multiple recursive calls. We're already familiar with the first one: once a function begins recursing, it continues until the base case. But in the case of multiple recursive calls, getting to the base case means splitting off and leaving the second (right) call for later. Of course, since the function calls itself, this means there is a split for every call. Likewise, what gets addressed first every time is the left term, so the recursion works its way through every left term until the base case is reached. Multiple recursion implies that we traverse the entire height of the tree before computing any other branches. In keeping with the top-down visualization, this is known as 'depth-first search'.

In addition to depth-first search, the tree is gradually traversed left-to-right, which you can follow by the 1-2-3-2-4-2-1-5-1 sequence described above. It's worth pointing out that, at the moment we return to frame 1, we have recursed our way through the left-hand side of the tree. We now have the answer to the first term of fib(n - 1) + fib(n - 2). If the tree were symmetrical, we would say that we were at its midpoint. We repeat the process for the right side, continuing to move depth-first and gradually left-to-right. The last base case we hit is the leaf at the rightmost position of the diagram. We'll use this attribute of recursion to our advantage in the algorithms following this section.

Of course, there are several ways to think about this: you could think of it as a metaphoric tree, in which case branches spring from a trunk and each branch ends in leaves. Here you might think of it as a 'height-first' search. There are also diagrams that grow the recursive calls horizontally, such that the 'tree' is on its side, branching from left to right. However, most texts use the top-to-bottom, left-to-right metaphor.
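One way to watch this depth-first, left-to-right order for yourself is to indent each call by its depth in the tree. The depth parameter below is an addition of mine for illustration; it doesn't change the computation:

def fib(n, depth=0):
    print('    ' * depth + 'calling fib(' + str(n) + ')')   # indentation tracks tree depth
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1, depth + 1) + fib(n - 2, depth + 1)

fib(3)

Reading the output top to bottom, you'll see the entire fib(2) branch explored before the second fib(1) call ever happens - exactly the traversal the diagrams describe.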

1.10.2 Measuring efficiency and complexity

The second consequence of multiple recursion is even more important, at least for our need to compute larger Fibonacci numbers. It may not have been apparent from fib(3), but fib(4) shows that there is some amount of redundant computation going on, and if there’s one thing that programmers despise it’s redundant computation. Here, in the first, left-handed branch of fib(4) we compute fib(2) as part of fib(3). As we return the results to frame 1, we also find ourselves having to compute fib(2) from scratch for the second, right-handed recursive call, fib(n - 2). The fact that fib(2) is a subtree of both fib(3) and fib(4) isn’t doing us any favors. You could say it’s a classic case of the right hand not knowing what the left hand is doing. The problem is only compounded when we get into double-digit Fibonacci numbers. This becomes computationally very expensive. In fact, try calculating fib(45) and see if you can’t make and eat a sandwich while you wait. If we include a step counter that simply ticks off the number of times fib() calls itself, we see a real explosion:

n    fib(n)      calls
0    0           1
1    1           1
2    1           3
3    2           5
4    3           9
5    5           15
6    8           25
7    13          41
8    21          67
9    34          109
10   55          177
11   89          287
12   144         465
13   233         753
14   377         1,219
15   610         1,973
...  ...         ...
35   9,227,465   29,860,703
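If you'd like to reproduce these numbers, one simple approach - an instrumentation choice of mine, not the only way to do it - is a global counter that ticks up at the top of every call:

calls = 0

def fib(n):
    global calls
    calls += 1                      # count every invocation, base cases included
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

for n in range(16):
    calls = 0
    print(n, fib(n), calls)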

While dramatic, tabulating the number of calls doesn't give us much of an indication of how long this might take. Fortunately, we can further analyze our algorithms by measuring code execution speed. A good tool for this is Python's timeit module, which is designed for testing small pieces of code. To use it, first write this at the top of your code:

import timeit

Then wrap up your function call like so:

print(timeit.timeit('fib(35)', globals=globals(), number=1))

The keyword argument number sets the number of times timeit() runs the function. Since we’re only interested in a rough cut, you can simply set it to 1.

Note: If you’re running Python from the command line or a similar REPL session, include globals=globals() as another kwarg, so that timeit() can find the global frame. Otherwise Python will throw an error.

Our complete code for measuring fib() looks like this:

import timeit

def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

print(timeit.timeit('fib(35)', globals=globals(), number=1), 'seconds')

>>> 2.999023314 seconds

However, given hardware differences, my execution time will likely be different from yours. Computer science has therefore come up with a formal way of studying computational complexity, independent of hardware and similar variables. I'm not going to dwell on complexity in this guide, but you can see that fib() is quite an expensive way of going about things.

The complexity of what people have come to call the 'naive Fibonacci algorithm' isn't quite 2**n, where n is the Fibonacci number we're after - 2**n would be the complexity of a perfect binary tree, where both the left and right sides have an equal number of nodes and leaves. But it's not far off either. Since the left-hand side of the tree generated by fib(n - 1) will always be larger than the right-hand side, fib(n - 2), we say that the fib(n) tree is asymmetrical. In the end, the efficiency is closer to 1.6**n, which is still exponential and therefore to be avoided.

It looks like recursion has finally failed us. Sure, the code is very pretty to look at, but it's useless if it can't deliver an answer in a reasonable amount of time. On the other hand, if we could find a way to 1) store the result of each computation once we made it, and 2) refer back to that result when needed, we could potentially cut down on the number of steps. And we can, using a technique known as 'memoization', which I'll cover in the next section.
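(An aside on where that 1.6 comes from: counting calls, you'll find calls(n) = calls(n - 1) + calls(n - 2) + 1 - the Fibonacci recurrence plus one - so the number of calls grows at the same rate as the Fibonacci numbers themselves. That rate is the golden ratio, (1 + 5 ** 0.5) / 2, or about 1.618.)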


1.10.3 Heuristics and Exercises

Recursion may be powerful, elegant and compact, but it can also yield solutions that are computationally unwieldy, if not unachievable. Solutions that imply returning multiple recursive calls should be approached with extreme caution.

Multiple recursive calls expand call diagrams from linear 'stacks' to branching 'trees'. When following the flow of a multiply recursing function, keep in mind that the function will recurse fully to the base case of its first recursive call before evaluating any other calls. This depth-first search is complemented by the fact that the tree fills itself out by moving in a gradual, left-to-right direction. The final base case to be evaluated will be the rightmost leaf in the diagram.

Exercise: The tribonacci series is a generalization of the Fibonacci sequence where each term is the sum of the three preceding terms. The first few terms of the sequence are:

[0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, 149, 274]

The tribonacci sequence is defined as follows:

fn(0) == fn(1) == 0
fn(2) == 1
fn(n) == fn(n - 1) + fn(n - 2) + fn(n - 3)

Create a recursive function trib() that returns the tribonacci number for any given n. How would you write the base case? At what point does the sequence noticeably slow down? Take a few minutes to draw out, by hand, the call diagram for trib(5). What does the branching look like? How would you characterize the computational complexity of a function like trib() versus that of fib()? Now measure performance by modifying both fib() and trib() to use timeit(). Write a loop that will record the times for a range of values for both functions. Do the results match up with your intuition? Why or why not?

1.11 Memoization: Fibonacci Sequence, Part 2

1.11.1 Memoizing by list

Quite simply, 'memoization' is a form of caching. Before looking at memoization for Fibonacci numbers, let's do a simpler example, one that computes factorials. From there we'll build out a series of related solutions that will get us to a clearly understandable memoized solution for fib().

For the factorial exercise in Recursion in Light of Frames, you probably came up with something like the following code:

def fact(n):
    if n == 1:
        return n
    else:
        return n * fact(n - 1)

print(fact(5))

>>> 120


As it stands, the program discards all the intermediate values of fact(n - 1) as it returns its way through the call stack. What if we wanted to capture those values? In a way, we already did this with our second version of pascal() in Expanding a Series: Pascal’s Triangle. There, we returned not just the nth row of the triangle, but all the rows preceding it. It took the form of a list of lists that was passed through each recursed frame, starting from the base case. We can modify fact() to do something similar:

def factMemList(n, L):
    if n == 1:
        return L
    else:
        factMemList(n - 1, L)
        L.append(n * L[-1])
        return L

print(factMemList(5, [1]))

>>> [1, 2, 6, 24, 120]

Here, we want to return the entire list L. So within the else block we make the recursive call, and then do some computation on the returned L. You'll notice that we don't assign the recursive call to any variable! Why is that? Didn't I claim earlier that we have to explicitly update the state of a variable? Actually, we could write L = factMemList(n - 1, L), but we don't have to, since factMemList(n - 1, L) already is L, simply by the fact that L is what's being returned. In other words, the namespace of L for a given frame is automatically updated by the return L statement.

For each frame, all that we are interested in getting is the right n and L. We get the right n by seeding each frame with n - 1 on the way to the base case, ie, pre-recursively. Once we get to the base case, we return L. Post-recursively, each successive frame performs the computation (n * L[-1]) and appends it to L. Then that modified L is passed back to the next calling frame. The result is that, at each point, the correct n interacts with L[-1], which is always the last item in the list.

When designing a recursive solution, remember the three basic design decisions that need to be made. In the process of getting to the base case, how do I want to 'seed' the function's state for each frame? At what point do I 'split' the function with the recursive call? And what do I want to do post-recursively - that is, what am I returning, and what kind of computation do I want to perform on that returned object before I reach the current frame's return statement?

Our print-tracing approach tracks the recursion as follows:

frame = 0

def factMemList(n, L):
    global frame
    frame += 1
    if n == 1:
        print('\nframe =', frame)
        print('base case:', L)
        return L
    else:
        print('\nframe =', frame)
        print('n =', n, 'L =', L)
        factMemList(n - 1, L)
        frame -= 1
        print('\nframe =', frame)
        print('n =', n, 'L =', L)
        L.append(n * L[-1])
        print('returning L as', L)
        return L

print('global frame =', frame)
print(factMemList(5, [1]))

>>> global frame = 0

>>> frame = 1
>>> n = 5 L = [1]

>>> frame = 2
>>> n = 4 L = [1]

>>> frame = 3
>>> n = 3 L = [1]

>>> frame = 4
>>> n = 2 L = [1]

>>> frame = 5
>>> base case: [1]

>>> frame = 4
>>> n = 2 L = [1]
>>> returning L as [1, 2]

>>> frame = 3
>>> n = 3 L = [1, 2]
>>> returning L as [1, 2, 6]

>>> frame = 2
>>> n = 4 L = [1, 2, 6]
>>> returning L as [1, 2, 6, 24]

>>> frame = 1
>>> n = 5 L = [1, 2, 6, 24]
>>> returning L as [1, 2, 6, 24, 120]
>>> [1, 2, 6, 24, 120]

This is pretty good! We now have a complete list of factorials up to and including n, and every element at index position L[n] corresponds to the (n + 1)th factorial, so it's not hard to look up whatever we might need.
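For example, a quick lookup sketch:

facts = factMemList(5, [1])
print(facts[3])   # 24, which is 4! - index n holds the (n + 1)th factorial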

1.11.2 Memoizing by dictionary

We can do even better, though, by using Python's dict data type. Like lists, dictionaries are mutable, so we can add to them on the fly. Unlike lists, dictionaries consist of key-value pairs. Dictionaries also require uniqueness - but only for keys. So mapping n -> fn(n) as key-value pairs is more readable than a list. In a list, when we append each fn(n) based on a continuously incrementing n, n is implied by the index position of fn(n). In a dictionary, that n is explicitly stated as the key. (Dictionaries have the added flexibility that we can use any immutable data type as a key, but we won't need that here - integers are all that's required.)

Let's call our dictionary factdict. If n == 6 and fact(6) == 720, in dictionary syntax the key-value pair would read as {6: 720}. We populate factdict simply by declaring factdict[6] = 720, creating both key and value at the same stroke. So if factdict had the key-value pair {1: 1} and we wanted to compute fact(n) for some n, we would write:


factdict = {1: 1}
n = 2
factdict[n] = n * factdict[n - 1]   # value of factdict[n - 1] is 1

>>> {1: 1, 2: 2}

This simply multiplies n and the value assigned to the last key, and sets that product as a value to the new key n. Iteratively, we can easily populate a dictionary with the pairs {n:fn(n)}:

n = 6
factdict = {1: 1}
for i in range(2, n + 1):
    factdict[i] = i * factdict[i - 1]

print(factdict)

>>> {1: 1, 2: 2, 3: 6, 4: 24, 5: 120, 6: 720}

Here we set a new key for each n, represented as i in the loop, and give it the value of i * factdict[i - 1], since factdict[i - 1] will always give us the value from the last available key-value pair. Since we've now expanded the dictionary by one pair, the next time the loop comes around, factdict[i - 1] will again access the most recently added key-value pair. We don't need to rely on any notion of ordering to guarantee this: we are looking up values by key, and our keys always increment by 1. In fact, you could argue that...

factdict[i] = i * factdict[i - 1]

...is just a dictionary rewrite of what we were doing with a list in factMemList():

L.append(n * L[-1])

The other bit that makes this code work is that we don’t start from an empty dictionary - otherwise factdict[i-1] would throw an error. But starting with {1:1} harmonizes nicely with the fact that our base case for factMemList was [1] - both examples are variations on saying that we know 1! == 1, the original base case for fact(). Let’s flesh out the code for factMemDict():

def factMemDict(n, factdict):
    if n == 1:
        return factdict
    else:
        factMemDict(n - 1, factdict)
        factdict[n] = n * factdict[n - 1]
        return factdict

print(factMemDict(5, {1: 1}))

>>> {1: 1, 2: 2, 3: 6, 4: 24, 5: 120}

As you can see, except for the penultimate dictionary-specific line, it’s virtually identical to factMemList()! There’s no reason not to split the function in exactly the same place. Pre-recursively, we wanted to seed each frame with the correct value of n. Post-recursively, we wanted to return the whole dictionary, after adding a new key-value pair with the computation against the n that belongs to each frame. As a side note, I want to point out that the code we’ve developed so far is a bit different from what you usually see in discussions of memoization, like this one:


def factMem(n, factdict):
    if n == 1:
        return factdict[n]
    else:
        factdict[n] = n * factMem(n - 1, factdict)
        return factdict[n]

print(factMem(5, {1: 1}))

Run this code. What’s the difference? Which do you think is more useful? Honestly, whichever version you prefer, it doesn’t look like we’ve achieved much. Factorial is a linearly recursing algorithm, so we can’t improve its efficiency very much through memoization. But we now have a handle on what ‘memoizing’ an algorithm looks like. Let’s get back to where it does matter - fib().

1.11.3 Memoizing Fibonacci

Given the Fibonacci sequence’s nature, we need dictionaries for the job and not lists, as we want to be able to look up existing values quickly (via their keys). So let’s convert our seeds 0 and 1 to a dictionary, {0:0, 1:1}, and modify the base case accordingly:

def fibMem(n, fibdict):
    if n in fibdict:
        return fibdict
    else:
        # a bunch of code

fibdict = {0: 0, 1: 1}

Since we are now using a dictionary, we also have to change our base case test. Rather than evaluating whether n == 0 or n == 1 (or n < 2), we see if n is already a key in the dictionary by asking if n in fibdict. Since both 0 and 1 are in the dictionary by definition, we've got both base cases covered. Also, we are no longer returning an integer or a list, but the seed dictionary itself.

Turning to the recursive case, we want to address both the pre-recursive and post-recursive parts. For the former, we want the correct value of n to be seeded in each frame. For the latter, we want to add each n as a new key to fibdict, with each freshly computed Fibonacci value as its corresponding value. As a one-shot, the dictionary syntax would look like this:

fibdict = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5}
fibdict[6] = fibdict[5] + fibdict[4]
print(fibdict)

>>> {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8}

It’s often a good idea to sketch out an iterative solution prior to the recursive one. We can look to the layout of factMemDict() to assert the iterative case:

n = 6
fibdict = {0: 0, 1: 1}
for i in range(2, n + 1):
    fibdict[i] = fibdict[i - 1] + fibdict[i - 2]

print(fibdict)


>>> {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8}

Pulling together the final code is actually as intuitive as you might hope. The same principles of 'splitting the function' apply here, and it's simply a matter of substituting the factorial-generating dictionary operation with the Fibonacci-generating one:

def fibMem(n, fibdict):
    if n in fibdict:
        return fibdict
    else:
        fibMem(n - 1, fibdict)
        fibdict[n] = fibdict[n - 1] + fibdict[n - 2]
        return fibdict

print(fibMem(8, {0: 0, 1: 1}))

>>> {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13, 8: 21}

This shows how we can incrementally build a solution to a problem that may at first seem difficult to approach. Instead of trying to go straight from fib() to fibMem(), we stepped back, removing the distraction of fib()'s multiple recursive calls, and built simple solutions for factorial, first using lists and then, based on what we learned there, dictionaries. From there, we transferred that learning to fibMem(), addressing the unique attributes of the Fibonacci sequence while changing the fundamental structure of our algorithm only slightly. This kind of incremental development is a very useful skill indeed.

Let's contrast this with the code usually given for memoization of Fibonacci (note that fibdict must be passed along in both recursive calls):

def fibMem2(n, fibdict):
    if n in fibdict:
        return fibdict[n]
    else:
        fibdict[n] = fibMem2(n - 1, fibdict) + fibMem2(n - 2, fibdict)
        return fibdict[n]

print(fibMem2(8, {0: 0, 1: 1}))

>>> 21

Like the alternative example for factorial memoization, the output is not the entire dictionary, but just fibMem2(8). Fair enough, if that’s all you need. And the complete dictionary is there - if you insert print(fibdict) right before the return statement you can have a peek. It’s just that you can’t get it out of the function - it gets discarded along with everything else once the final return of fibdict[n] occurs. In this way, it’s identical to how tri was behaving when we first tried to revise pascal(). Finally, recall the table at the end of the previous section that showed just how computationally expensive fib() was. Let’s add our latest results as a new column:

n    fib(n)      fib(n) calls   fibMem() calls
0    0           1              1
1    1           1              1
2    1           3              2
3    2           5              3
4    3           9              4
5    5           15             5
6    8           25             6
7    13          41             7
8    21          67             8
9    34          109            9
10   55          177            10
11   89          287            11
12   144         465            12
13   233         753            13
14   377         1,219          14
15   610         1,973          15
...  ...         ...            ...
35   9,227,465   29,860,703     35

As you can see, the number of times fibMem() calls itself remains steady at n, no matter how large an n we choose. In other words, its computational complexity is linear, which is vastly preferable to the exponential complexity of the 'naive' version.

There are several other ways of integrating memoization into recursively expensive code. One is to leave the original recursive function untouched, and define another function to handle the memoization instead - a feature known as a 'decorator'. Since this section has gone on long enough already, you can read more about decorators here.
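As a preview, here is a minimal sketch of what a memoizing decorator might look like - my own illustration, not this guide's canonical code (the standard library's functools.lru_cache does the same job):

def memoize(f):
    cache = {}                          # persists between calls via the closure
    def wrapper(n):
        if n not in cache:
            cache[n] = f(n)
        return cache[n]
    return wrapper

@memoize
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

print(fib(35))   # fast: the recursive calls are routed through the cache

The recursive fib() itself is untouched; because the module-level name fib is rebound to the wrapper, even the internal recursive calls hit the cache.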

1.11.4 Heuristics and Exercises

Reduce the complexity of the problem. Can you simplify it by removing a recursive call, by simulating what the memoized data structure should look like, or by using an iterative solution to model the computation?

Consider that the return statement will have to carry back the memoized data structure. What changes have to be made to the rest of the function - pre-recursive, post-recursive and the recursive call itself?

Exercise: The Padovan sequence is defined as follows:

f(0) == f(1) == f(2) == 1
f(n) == f(n - 2) + f(n - 3)

The first few numbers of the sequence are:

[1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 16, 21, 28]

Create a recursive function padovan() that returns the Padovan number for any given n. How would you write the base case? At what point does the sequence noticeably slow down? Now write a memoized version of the same algorithm, padovanMem(). As with the tribonacci exercise, apply the timeit() method to see how performance improves.

Exercise: The Collatz conjecture is known as the "hardest simple problem in mathematics". For any positive number n, if n is even, f(n) = n / 2. If n is odd, f(n) = 3 * n + 1. While it remains unproven, the conjecture states that repeatedly applying f will reach 1 after a finite number of operations, for any positive number n. Whether or not the conjecture is true, the number of steps needed to reduce any given n to 1 is certainly unpredictable:

n    steps
1    0
2    1
3    7
4    2
5    5
6    8
7    16
8    3
9    19
10   6
11   14
12   9
13   9
14   17
15   17
16   4
17   12
18   20
19   20
20   7

Write a recursive function collatz() that returns the integer that takes the greatest number of steps to reach 1 for a given range. For example, between 1 and 1,000,000, the integer 837,799 takes 524 steps to get to 1. You'll find that testing such a range takes a long time. So write a second function, collatzMem(), that uses memoization to reduce the computation by an order of magnitude.

This is a tricky problem. Here are some hints about how to break it down:

1. Write an iterative, brute force solution so you see how the function behaves.
2. Write the recursive version.
3. Go back to the iterative solution and write the memoized modification.
4. Finally, synthesize a solution that is both recursive and memoized.

1.12 Recursive Approaches To Searching And Sorting

Programmers spend a lot of time looking for some things and sorting other things. We'll first look at how recursion can help us find an item in a list. After that, we'll explore one of the better ways of taking a list and sorting it. This may seem backwards - after all, doesn't it make sense to sort a collection of values before you search it? Just bear with me.

1.12.1 Searching by bisecting the search space

Say I gave you a list of items L and assured you that L was sorted (ie, in ascending order). Next, I may well ask you if a given element e was in that list, and if so, where. One way is to interrogate the list item by item (ie, for e in L). While this sort of exhaustive enumeration might work well for small collections, if our list had several billion items we would wind up wasting a lot of time. In the worst case - and we are always interested in the worst case - the sought-after item would occupy the final index position, or be entirely absent from the list, forcing us to examine every item in the list.

Let's instead use the sorted nature of the list to our advantage. Assuming that e is in our sorted list L, we know it must be in one of three places:

1. In the first half of L
2. At the midpoint of L, which we'll call mid


3. In the second half of L

If we get lucky and find that in fact L[mid] == e, then all we have to do is return True, or the index position of e (that is, mid), depending on what's asked of us. If L[mid] != e, however, the sorted nature of the list will tell us whether e must lie in the first half (L[mid] > e) or the second half (L[mid] < e). We can then use this information to narrow our search by half again. Repeating this strategy allows us to quickly zero in on where e might be hiding (or conclude that it's not present at all).

This powerful technique is known as bisection or binary search, and belongs to the class of 'divide-and-conquer' algorithms. Since recursion is itself a form of divide-and-conquer, you can see where this is going. To get us started, here is an iterative function, binSearch(), that does the job:

def binSearch(L, e):
    high = len(L) - 1
    low = 0
    while high >= low:
        mid = (high + low) // 2
        if L[mid] == e:
            return True
        elif L[mid] > e:
            high = mid - 1
        else:
            low = mid + 1
    return False

The while loop ensures that our comparison stops when high and low meet. Once there is no daylight between the two values, we know that e isn't in the list, and we jump out to return False. Otherwise we continue to reduce the search space by half with every iteration.

Question: If our search space (ie, len(L)) were 100 items long, how many bisections are needed to conclude that e is not in L? What if len(L) == 1000? Can you abstract this into a general measurement for the efficiency of binary search, versus exhaustive enumeration? This is a good insight into how computational complexity can be measured, independently of hardware or other variables.

To think about binary search recursively, let's begin with the base case: as successive halvings of the list cause len(L) to approach zero, we will either find or not find e. So we have two base cases:

def binSearchRecur1(L, e):
    if L == []:
        return False
    elif len(L) == 1:
        return L[0] == e
    else:
        # lots of code

This approach elegantly handles two edge cases: if we reach an empty list, we know we haven't found e. On the other hand, if we are passed a list with only one item in it, return L[0] == e returns a boolean that tells us whether L[0] is the e we're looking for.

Let's now consider how to handle all other situations, that is, where len(L) > 1. As with the iterative version, we want to capture the bisection and test it against e. We base our recursive call on this evaluation, sending the halved list back to the function until we have reached one of the base cases:

def binSearchRecur1(L, e):
    if L == []:
        return False
    elif len(L) == 1:
        return L[0] == e
    else:
        half = len(L) // 2
        if L[half] > e:
            return binSearchRecur1(L[:half], e)
        else:
            return binSearchRecur1(L[half:], e)
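A couple of quick sanity checks (the example list here is arbitrary):

print(binSearchRecur1([0, 2, 5, 7, 9], 7))   # True
print(binSearchRecur1([0, 2, 5, 7, 9], 4))   # False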

This looks pretty good: we have successfully covered all of our cases and are guaranteed an answer. The important stuff happens pre-recursively, in the sense that we do all the hard work on the way to the base case. Whether the base case yields True or False, that's what we bring back through the recursive process, without any further alterations. Just like pal(), we only want to know if the hypothesis of our original proposition (in this case, 'is e in L?') holds up through the reductive process of recursion.

However, our algorithm is not as efficient as it could be. Note that in the recursive call we are sending a copy of the list into the next frame (any time you see a : inside a list's square brackets, a copy is being made). Even if it's just half the size of the original list, it's still an expense we could do without. If our initial list has a billion elements, the first recursive halving alone still copies half a billion elements. Since L is not being altered in any way, we would prefer to operate only on the original list itself.

To do this, we can use variables to specify the index positions that will form the boundaries of the search space. Every time we call the function, we'll just change those index positions. In this way, we only need to keep one copy of the list in memory. Let's add to our recursive call additional parameters for those index positions. To do this, we'll rewrite our algorithm using a helper function. You'll recall from our discussion in Scope, Frame and Stack that functions can be nested inside of other functions. These are known as inner (or helper) functions, and, by the rules of scope, all values in the enclosing function are available to the inner function. This turns out to be very useful for our current situation. Here is the complete code:

def binSearchRecur2(L, e):

    def binSearchHelp(L, e, low, high):
        if high == low:
            return L[low] == e

        mid = (low + high) // 2

        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid:
                return False
            else:
                return binSearchHelp(L, e, low, mid - 1)
        else:
            return binSearchHelp(L, e, mid + 1, high)

    if L == []:
        return False
    else:
        return binSearchHelp(L, e, 0, len(L) - 1)

Let’s look at the wrapper function first. We see the same base case for the empty list as above - it should return False. Question: elif len(L) == 1: return L[0] == e seems to have disappeared. Why? Can you find this functionality in the new code? If the list isn’t empty, we call the helper function binSearchHelp() and pass it L and e, as well as arguments representing the bottom and top bounds of the list’s range. This call only happens once, and is not a recursive call! As


soon as we're inside the inner function, recursion takes care of the rest.

Within the helper function, the if high == low block is equivalent to the second base case in binSearchRecur1(): if there is one item in the list, return a boolean evaluating L[low] == e as True or False. Otherwise, we set up the midpoint mid and evaluate where e sits in relation to mid. And we keep bisecting recursively until L[mid] == e, or we run out of search space. You'll notice that we've borrowed a few lines from the original iterative code. However, in the new version we don't have a while loop - we keep the program spinning by recursively calling binSearchHelp() with new arguments for low and high.

This is the code's most important feature: we are not making copies of the list but just looking at a smaller subset of the original list (low, mid - 1 versus [:half], and mid + 1, high versus [half:]). While binSearchRecur2() may be more challenging to read than binSearchRecur1(), running these two algorithms against a large, randomly generated list yields significant differences in execution.

As with the previous section, let's use timeit() to accurately measure code execution speed. Before we can run timeit(), we need to generate a big list and randomly choose an e for our algorithm to find. Our complete code for testing both algorithms looks like this:

import timeit
import random

def binSearchRecur1(L, e):
    if L == []:
        return False
    elif len(L) == 1:
        return L[0] == e
    else:
        half = len(L) // 2
        if L[half] > e:
            return binSearchRecur1(L[:half], e)
        else:
            return binSearchRecur1(L[half:], e)

def binSearchRecur2(L, e):

    def binSearchHelp(L, e, low, high):
        if high == low:
            return L[low] == e
        mid = (low + high) // 2
        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid:
                return False
            else:
                return binSearchHelp(L, e, low, mid - 1)
        else:
            return binSearchHelp(L, e, mid + 1, high)

    if L == []:
        return False
    else:
        return binSearchHelp(L, e, 0, len(L) - 1)

L = [i for i in range(1000000000)]   # creates a large, sorted list of sequential integers
e = random.choice(L)                 # selects one element from L

print(timeit.timeit('binSearchRecur1(L, e)', globals=globals(), number=1), 'seconds')
print(timeit.timeit('binSearchRecur2(L, e)', globals=globals(), number=1), 'seconds')

>>> 36.74825... seconds
>>> 00.00047... seconds

While this disparity may not be as disastrous as what we saw with the Fibonacci sequence, it certainly highlights the importance of optimizing code wherever possible.

1.12.2 Sorting things out: mergeSort()

Sorting has always been a programming challenge. It seems to have fewer options than searching, and the computational costs tend to be higher and less negotiable, partly due to the fact that sorting requires you to examine every item in a list at least once. If you haven't looked at sorting algorithms yet, take some time to familiarize yourself with bubble sort and select sort. That will give you a better context for the following discussion. I'm only going to discuss merge sort, which is the algorithm that lends itself most naturally to recursion.

As with the binary search approach taken above, merge sort falls into the category of divide-and-conquer algorithms. Once again, let's start with the simplest statement of the problem: when can we be sure that, without even inspecting it, a list is sorted? Similar to binary search's base cases, if the list is empty or has only one element, it must, by definition, already be in a sorted state.

def mergeSort(L):
    if len(L) < 2:
        return L
    else:
        # a whole bunch of code

For all other lists where len(L) > 1, we'll employ the following strategy:

1. Keep splitting the list in half (recursively) until all our sublists contain one element.
2. Merge pairs of sublists by comparing the first element of each pair, first appending the smaller element to a new, sorted list, followed by the larger one. We'll do this in a separate function.
3. If during merging one sublist of a pair runs out of elements, add all the remaining elements in the other sublist to the sorted list.

The first step shows us how to recursively arrive at our base cases. The merging will happen after the recursive call, and will do the work of actually sorting the list. The third step takes care of the edge case where a pair of (now-sorted) lists are of unequal length.

What would this look like in terms of actual code? Let's begin at the beginning: breaking down the target list into single-item sublists. As with our searching algorithms, we want the base case to trigger when len(L) < 2, at which point we return that reduced list. Here is the code so far, where we recursively split our target list into left and right halves:

def mergeSort(L):
    if len(L) < 2:
        return L
    else:
        mid = len(L) // 2
        left = mergeSort(L[:mid])
        right = mergeSort(L[mid:])
        return merge(left, right)


The return statement implies that some function merge() will do the actual merging and sorting, so let’s put that aside for the moment, and better understand the recursive mechanism, and at what point sublists get sent from mergeSort() to merge(). The first thing that stands out with mergeSort() is its multiple recursive calls. As with Fibonacci, this means we have a branching call structure. Using L = [8, 4, 1, 6, 5, 9, 2, 0, 3] as our target list, here’s a diagram that shows what happens on the trips towards, and back from, the base case. Note that Fibonacci was a bit lopsided, since fib(n - 1) + fib(n - 2) always meant that the left side of the tree (fib(n - 1)) would have more nodes and leaves than the right side (fib(n - 2)). mergeSort() is our first example of a symmetrical binary tree.

Fig. 3: Figure 1. mergeSort() call tree for L = [8, 4, 1, 6, 5, 9, 2, 0, 3]

In keeping with the depth-first nature of recursion, the subdivision of the list happens until we hit the leftmost leaf in the tree. In this case:

frame 1   [8, 4, 1, 6, 5, 9, 2, 0, 3]
frame 2   [8, 4, 1, 6]
frame 3   [8, 4]

Note that this is all driven by the first recursive call, left = mergeSort(L[:mid]). We get to the base case for the first time in frame 4, where len(L) < 2 and the return assigns left = [8] in frame 3. Since the namespace for L in frame 3 is [8, 4], our second recursive call, right = mergeSort(L[mid:]), creates frame 5, which immediately goes to the base case, returning [4]. Now back in frame 3, we have completed the function and are ready to send [8] and [4] as arguments to merge().

Before we do so, keep in mind that we are still deep in mergeSort()'s recursive structure. Unlike the programs we have seen so far, we aren't returning values generated within mergeSort() back to mergeSort(), but rather


calling another function. This doesn’t mean that we are done with mergeSort() - a quick glance at the tree shows that we have a long way to go yet. For its part, merge() handles the work of putting elements of two lists - of any length - into a third list, all in the right order:

def merge(left, right):
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while i < len(left):
        result.append(left[i])
        i += 1
    while j < len(right):
        result.append(right[j])
        j += 1
    return result

In our current case, [8] and [4] have been passed as arguments to the left and right parameters. What gets passed back to frame 3 of mergeSort() is result == [4, 8]. Now that frame 3 has everything it needs, it returns this result to frame 2, which originally called it. Where does this returned list wind up? Here:

left = mergeSort(L[:mid])

Recall that in frame 2, the namespace for L is [8, 4, 1, 6]. Now we have left = [4, 8] so we move on to the next line, which is right = mergeSort(L[mid:]). This call creates frame 6 to handle the second half of L, which is [1, 6]. Frames 7 and 8 provide us with base case singleton lists, and from frame 6 we send [1] and [6] to merge() as arguments for left and right. The combined and sorted list is returned to frame 6 and immediately passed up to frame 2, where it is bound to right in the statement

right = mergeSort(L[mid:])

Now we have two sorted lists for frame 2:

left = [4, 8]
right = [1, 6]

Having reached the bottom of mergeSort() in frame 2, we repeat the same return, calling merge() with the above values for left and right. Finally, the sorted list [1, 4, 6, 8] is passed up to frame 1, where it is assigned to that frame's namespace for left. We've now completed the left side of the recursive call structure!

Of course, the right side is executed in exactly the same way. We continue to drill down to the leftmost leaf of that side, which is 5, sorting it against 9. The only additional wrinkle is when we get to frame 13 and its odd number of list items. Since [2, 0, 3] splits into [2] and [0, 3], the latter has to undergo recursion one more time. It's ok, frame 13 will wait while this gets resolved. Similar to our example of summing a list of lists, recursion is good at handling these sorts of situations - it will drive on until the base case is triggered. This is another reason why merge() was designed to handle lists with different lengths - if we were only expecting to receive arguments where len(left) == len(right), we could easily get either an IndexError or, worse, an element that is omitted or returned in the wrong order.
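A quick check of merge() on lists of unequal length (the inputs here are arbitrary):

print(merge([2], [0, 3]))      # [0, 2, 3]
print(merge([1, 4], [2]))      # [1, 2, 4]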


Finally, the right side of the tree has returned its portion of the list. So now we are back at frame 1, where:

left = [1, 4, 6, 8]
right = [0, 2, 3, 5, 9]

We do the final call to merge() and, since there are no more frames, return the final sorted list to the global frame. Here is the entire code, along with a random list generator and the timeit module:

import timeit
import random

def merge(left, right):
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while i < len(left):
        result.append(left[i])
        i += 1
    while j < len(right):
        result.append(right[j])
        j += 1
    return result

def mergeSort(L):
    if len(L) < 2:
        return L
    else:
        mid = len(L) // 2
        left = mergeSort(L[:mid])
        right = mergeSort(L[mid:])
        return merge(left, right)

L = [random.randrange(1, 100) for x in range(1000)]
print(timeit.timeit('mergeSort(L)', globals=globals(), number=1))

This seems like an awful lot of work for sorting a list, but it's actually more efficient than most other sorting algorithms. More importantly, sorting a collection makes searching it much more efficient. Returning to my initial remarks at the top of this section, this is especially true if you have a large collection that you know will be searched many times but only needs to be sorted once. In that sense, the cost of sorting can be 'amortized' over the number of searches performed on the sorted set. Given a large enough number of searches, the investment in sorting becomes trivial. So, taking the time to sort may well be worthwhile.

1.12.3 Heuristics and Exercises

Using slices to drive recursion may be clearer to visualize, but it creates copies of lists (or strings, tuples, ranges or bytes) for each frame, which can become computationally burdensome. It's preferable to allow a recursive process to repeatedly access the same, single data structure rather than replicate a piece of that data structure for every recursive call. If you write a recursive solution that is executing slowly over a large collection, see if there is a way to restate the boundaries of the search space without creating unnecessary copies.
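You can verify the copying behavior for yourself - even a full slice builds a brand new list (a small check of my own, not from the guide's code):

L = list(range(5))
M = L[:]                  # a 'copy slice'
print(M == L)             # True: same contents
print(M is L)             # False: a distinct object - a copy was made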


A recursive function doesn't have to be completely self-contained. We can place recursion within more complex program structures. For example, we can separate the recursive function from functions that do other, significant computational work. In the case of mergeSort(), we offloaded a fairly complex merge/sort process into a separate function that we called from the return statement of the recursive function. However, keep in mind that calls to and returns from other functions are still embedded in the recursive process, which must entirely run its course.

Exercise: Rewrite binSearchRecur2() as a single function. What are the advantages, if any, of using a helper function in this case?

Exercise: Here is a bubble sort algorithm, bubble(). Convert it into a recursive form, bubbleRecur().

def bubble(L):
    for i in range(len(L) - 1, 0, -1):
        for j in range(i):
            if L[j] > L[j + 1]:
                temp = L[j]
                L[j] = L[j + 1]
                L[j + 1] = temp
    return L

Exercise: Here is a select sort algorithm, select(). Convert it into a recursive form, selectRecur().

def select(L):
    for i in range(len(L)):
        mini = i
        for j in range(i + 1, len(L)):
            if L[mini] > L[j]:
                mini = j
        temp = L[i]
        L[i] = L[mini]
        L[mini] = temp
    return L

Exercise: We implemented binary search using slices, and then made it more efficient by using index positions. Merge sort also uses slices, so apply this strategy to write a function mergeSort2() that will show dramatically improved performance. Exercise: We now have six sorting algorithms: • bubble() • bubbleRecur() • select() • selectRecur() • mergeSort() • mergeSort2() In a single program, run these algorithms against the same randomly generated list of numbers (your list should be pretty big - say at least 1 million items). Use timeit to get a handle on the differences and improvements in the various algorithms’ execution times.


Credit: Much of this material adapted from pp. 152-164 of John Guttag's Introduction to Computation and Programming Using Python, 2nd Ed.

1.13 Recursion and Self-Similarity: Koch Curves

1.13.1 Turtles all the way down

Let's switch gears now and use recursion to draw some pretty pictures, specifically fractals. Fractals and recursion go together like chocolate and peanut butter, and seeing a figurative representation of recursion drawn line-by-line will further our understanding of how it works. We'll start with the Koch curve. In order to draw the Koch curve, we'll use Python's turtle library. So before we can define any function, we have to import the library into our program, and create a turtle object that will draw lines for us. We'll also specify a window, line width, background color, etc:

    import turtle

    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')

    t.color('orange')
    t.pensize(5)

    # we'll put our ``koch()`` function here

    wn.exitonclick()

When you run this, you should see a black window pop up, with an orange cursor (the turtle) pointing to the right. Clicking on the window after the program has finished calls wn.exitonclick(), which deletes the window and exits the program. (Any print statements we choose to include during the design process will show up in the terminal from which you launch the program.) The turtle library has lots of features and we'll only touch on a few of them here, so check out the documentation to see what else you can do with this simple but rich library.

Let's begin by drawing a straight line. To do that, we'll lift up the pen and position it somewhat to the left of the center of the screen (by default the turtle is placed at x-y coordinates (0, 0)). Now that we're at (-500, 0), we'll put the pen 'down' so we can begin drawing:

    import turtle

    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')

    t.color('orange')
    t.pensize(5)
    t.penup()
    t.setpos(-500, 0)
    t.pendown()
    t.speed()

    def koch(t):
        t.forward(1000)

    koch(t)

    wn.exitonclick()

Ok, so this may not look like much. But you may be surprised to learn that you have already drawn a fractal - in this case, a Koch curve of order 0. So what do other orders of the curve look like, and what does recursion have to do with it?

If a Koch curve of order 0 is a straight line, we generate further orders by trisecting the line, and inserting into the middle portion two lines joined at an acute angle. The length of each of the new lines is 1/3 of the total length of order 0. Traditionally, the inserted lines imply an equilateral triangle, but it can be any acute angle we like. This substitution increases the length of the curve by 33%. Crudely speaking, order 0 to order 1 looks like:

    order 0: ____________

                   /\
    order 1: _____/  \_____

In order to move from order 1 to order 2, we see that we now have not one line to trisect, but four. The process gets repeated until we have trisected all available lines to the desired order (or 'depth', which you'll see is nicely aligned with the concept of 'depth-first search' that we first explored in Multiple Recursive Calls: Fibonacci Sequence, Part 1).

                   /\
    order 1: _____/  \_____

                __/\__
                \    /
    order 2: __/\__/  \__/\__
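If you want to verify that 33% claim for yourself, here is a back-of-the-envelope sketch (my own helper names, not code from this guide): each order replaces every segment with four segments, each a third as long, so the total length grows by a factor of 4/3 per order while the individual segments shrink by a factor of 3:

    def kochTotalLength(L, order):
        return L * (4 / 3) ** order

    def kochSegmentLength(L, order):
        return L // 3 ** order   # length of each individual drawn segment

    print(kochTotalLength(1000, 1))     # 1333.33... - a 33% increase
    print(kochSegmentLength(1000, 3))   # 37 - a number we'll meet again below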

You can view several orders of the Koch curve generated here. However, this animation is deceptive, since it is not the way that recursion generates our curve. Now that we know what we are after, let's first figure out how to draw an order 1 curve from scratch. Here we will simply tell our turtle how far to go, how many degrees to turn, etc:

    import turtle

    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')
    t.color('orange')
    t.pensize(5)
    t.penup()
    t.setpos(-500, 0)
    t.pendown()
    t.speed()

    def koch(t, order, size):
        if order == 0:
            t.forward(size)
        else:
            t.forward(size // 3)   # Go 1/3 of the way
            t.left(60)
            t.forward(size // 3)
            t.right(120)
            t.forward(size // 3)
            t.left(60)
            t.forward(size // 3)

    size = 1000
    order = 1
    koch(t, order, size)

    wn.exitonclick()

In addition to the instructions to draw forwards, and turn left or right, I've made the program a bit more flexible by creating size and order variables and passing them as arguments to koch(). Since order 0 is the simplest possible statement of the problem, that seems like a good candidate for our base case. Otherwise, the turtle trisects the given size and inserts the implied equilateral triangle in the center section.

Take a moment to understand that each next instruction to our turtle is executed from the turtle's last position and direction. Also, by symmetry, turning 120° to the right is the same as turning -120° to the left. Thus we can restate the drawing as a series of left-handed turns, a regularity that suggests that we can write the entire else block as a for loop:

    def koch(t, order, size):
        if order == 0:
            t.forward(size)
        else:
            for angle in [60, -120, 60, 0]:
                t.forward(size // 3)
                t.left(angle)

Question: Why do we need the final 0 in the list? What happens if we omit it?

Now that we have sorted out the 0th and 1st order curves, we can consider more complex forms. In fact, we're ready to insert our recursive call. If you think about it, the above code essentially represents the base case and a single recursive case, except there's no recursive call, which is how we connect the recursive case to the base case. Given the way we have written our function, we know we need to pass t, order and size as the arguments of our recursive call. If our base case is order == 0 then we ought to be decrementing our way towards the base case by order - 1. So a good working hypothesis for our recursive call is koch(t, order - 1, size).

Of course, it has to be inside koch(), but where to put it? First think about where the recursive call wouldn't go. Obviously, we don't need it in the base case, so it must be somewhere in the else block. All the action is happening inside the for loop, so probably somewhere inside there. Now consider the two statements in the loop: one makes the turtle draw a line, the other turns the turtle by a certain angle. We are certainly not interested in modifying the angles, so the only piece of code that makes sense to address is t.forward(size // 3). Are we inserting the recursive call before or after drawing the line, or is something else involved? If you consult the animation above, you'll notice that there is a pattern of substitution occurring that depends on the order curve we want. Let's look at how we get from an order 1 to an order 2 Koch curve:

                  /\
                b/  \c
    order 1: ___/    \___
              a          d

               b__/\__c
                \    /
    order 2: __/\__/  \__/\__
              a              d

For every segment a, b, c, d we want to insert a trisection, which implies an additional iteration of koch(). So it looks like we want to actually replace ``t.forward(size // 3)`` with the recursive call. But if we get rid of ``t.forward(size // 3)``, what will do the actual drawing? This is where the base case comes in, as ``t.forward(size)`` is pretty much the only thing it does. Putting it all together, our recursive function looks like this:

    def koch(t, order, size):
        if order == 0:
            t.forward(size)
        else:
            for angle in [60, -120, 60, 0]:
                koch(t, order - 1, size)
                t.left(angle)

This looks good, right? Except if you run it, the turtle will quickly run off the screen and you won't see more than a couple of segments of the curve. I failed to modify the recursive function's size argument, so in the process of recursing we also super-sized the drawing. We really want to keep fixed the distance between the start and the end of our drawing: no matter how many orders of recursion we request, we have to begin at (-500, 0) and end at (500, 0). Since we are operating on a third of the length of size, we want that proportion to be preserved, hence we modify our recursive call to reflect this: koch(t, order - 1, size // 3). With that modification, our final code looks like this:

    import turtle

    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')
    t.color('orange')
    t.pensize(5)
    t.penup()
    t.setpos(-500, 0)
    t.pendown()
    t.speed(0)   # 0 is the fastest draw setting; 1 is the slowest

    def koch(t, order, size):
        if order == 0:
            t.forward(size)
        else:
            for angle in [60, -120, 60, 0]:
                koch(t, order - 1, size // 3)
                t.left(angle)

    size = 1000
    order = 3
    koch(t, order, size)

    wn.exitonclick()

If we draw the Koch curve using a higher order, say 3 or 4, the final curve is drawn perfectly, starting from the first segment and all the way to the last. This is very different from the YouTube animation above, which showed a complete rendering of one order before moving on to the next. It tells us something important about the way recursion is functioning both here, and generally.

1.13.2 Tracing the recursive calls

As we've discussed before, every time a recursive call is invoked, the current function is interrupted and the program flow pursues the recursion until it reaches the base case (or is otherwise interrupted, as we saw with pal()). So koch() is going to keep recursing until it reaches the base case, and only then will it begin drawing. In terms of frames, our 'talking algorithm' would narrate the recursion for order 3 as follows:

0) Global frame says, "Please compute koch(t, 3, 1000)"
1) Frame 1 says, "I can't compute koch(t, 3, 1000) because I first need to compute koch(t, 2, 333)"
2) Frame 2 says, "I can't compute koch(t, 2, 333) because I first need to compute koch(t, 1, 111)"
3) Frame 3 says, "I can't compute koch(t, 1, 111) because I first need to compute koch(t, 0, 37)"
4) Frame 4 says, "I found the base case: I know how to compute koch(t, 0, 37)! Here you go:"

    ______

Wait, what? Where’s the rest of it? Let’s go back to the code and unpack it by expanding the for loop into a series of statements similar to what we had earlier, but where t.forward(size // 3) is replaced by the recursive call:

    def koch(t, order, size):
        if order == 0:
            t.forward(size)
        else:
            koch(t, order - 1, size // 3)
            t.left(60)
            koch(t, order - 1, size // 3)
            t.right(120)
            koch(t, order - 1, size // 3)
            t.left(60)
            koch(t, order - 1, size // 3)

Ah. It's now clear that the first recursive call only asked the turtle to get to the base case and, once it got there, drew a line of length 37. After that, we moved on to t.left(60), where we repositioned the turtle at an angle of 60° to the left. We recursed once again until we drew, using the base case, another segment of length 37. Following a rightward turn of 120° (or a leftward turn of -120°, as we wrote it in the for loop), we drew the third segment. The final recursive call is executed after another leftward 60° turn, leaving us with. . .

              /\
    _____/  \_____

. . . but always drawn in terms of the subdivision of length that corresponds with how many calls it took us to get to the base case - in this case, a segment length of 37. So we know that the only time the program draws anything will be at the base case, and it will be that smallest length, which corresponds to the number of divisions by 3 needed to reach order 0. That is, if we only drew an order 1 curve (ie, one trisection), at size == 1000 and order == 1, we would have size // 3 == 333 at order == 0, so every length drawn would have to be 333.

Here is a diagram that shows the tree created by this series of recursive calls:

Fig. 4: Figure 1. Order 1 call tree for koch()

If this is a complete iteration of the else block, the next reasonable question is: How do we begin the next trisection at the correct angle? Or, how do we go from. . .

    __/\__   ...to...   __/\__/   ...?

After all, if we only run the program using order == 1, it finishes with the turtle pointing straight to the right, not angled upwards by 60°. Let's look at a diagram for order == 2 to better understand what's going on:

Fig. 5: Figure 2. Order 2 call tree for koch()

The turn happens when frame 2 has run through all of its statements and is discarded. Where does this put us? Back at frame 1, at the next statement: t.left(60)! Once the program turns left by 60°, the next recursive call goes into effect. Frame 3 will go through the same set of drawings and turns for the second call, then turn right 120°, repeat, and finally do the same for one more repetition after turning left 60°. While Figure 2 only shows half of order == 2, if we continue assigning each subset of four lines a letter, we can see the following, where A, C, E and G are completed recursive calls, and B, D and F are turtle turns:

                 D
            C__/\__E
              \    /
    __/\__/          \__/\__
      A     B       F      G

    ...
    else:
        koch(t, order - 1, size // 3)   # A
        t.left(60)                      # B
        koch(t, order - 1, size // 3)   # C
        t.right(120)                    # D
        koch(t, order - 1, size // 3)   # E
        t.left(60)                      # F
        koch(t, order - 1, size // 3)   # G

We've already seen the consequences of multiple recursive calls with the Fibonacci sequence. In that case, two recursive calls created two branches per node (or frame). In the accompanying exercise, the tribonacci sequence called for three recursive calls per function. With koch() we have no fewer than four. Just as with fib() and trib(), once a function is called, the recursion must continue until it's complete. This rule still holds even when we are confronted with multiple recursive calls in a single function. In fact it makes it easier to think through the function, since you only need to be concerned with working your way through one thread at a time.

We thought of fib() as recursing all the way down the 'left-hand' side of the call tree. In the same way, the first koch(t, order-1, size // 3) call resolves every subsequent first koch(t, order-1, size // 3) call. But when we get to a base case, we know that the next three recursive calls will also hit the base case right away. This is what allows the drawing to keep its integrity.

Another essential detail about koch() is the lack of return statements, or any variables that capture the output of the recursed call. It's simply not necessary. The program is basically saying, 'at this point just get to the base case, do what it says and then carry on with the next line'. There's nothing to return, nothing to store, and (unlike fib() and trib()) no additional computation needed beyond the execution of the base case itself. All it does is draw the line of the correct length. The subsequent line of code positions the turtle at the correct angle to draw the next line - but only once the preceding recursive call has returned from the base case.
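A minimal contrast makes the point (fib() here is the standard two-call version discussed earlier in this guide; the framing is mine):

    # fib() must capture and combine the values its recursive calls return...
    def fib(n):
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    # ...whereas koch() returns nothing: each call's entire contribution is the
    # side effect of moving the turtle, so there is no value to store or combine.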

1.13.3 The Koch snowflake

We can now extend the Koch curve into what is known as the Koch snowflake. Instead of executing orders of recursion on a single line, let's do the same trisection on each side of a triangle. A good way to begin thinking about this is to ask: What is an order 0 snowflake? If an order 0 curve is a straight line, it stands to reason that an order 0 snowflake is simply a triangle. So we first make sure we can draw a triangle. We'll reposition our starting point and shorten the initial length so that everything fits on the screen (well, it fits on my screen - you may have to fiddle with the values to ensure it fits yours). Then we'll declare a separate function that will draw an equilateral triangle in a way very similar to what we've done above:

    import turtle

    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')
    t.color("orange")
    t.pensize(5)
    t.penup()
    t.setpos(-450, 250)
    t.pendown()
    t.speed(0)

    def triangle(size):
        for angle in [-120, -120, 0]:
            t.forward(size)
            t.left(angle)

    size = 900
    triangle(size)

Now let's think about how we can apply koch() to a polygon as opposed to simply a line. Well, a polygon is just lines joined at angles, so we want to build the curve on a per-line basis. That is, we don't need to consider the polygon as a whole. As long as we have the instructions to, on the one hand, build the polygon, and on the other, build a curve for each line, we should be in good shape, so to speak. Thinking this way is helpful because otherwise you may find yourself considering how to integrate the polygon with koch() in a way that is overly complicated. The description I've just offered implies a division of labor:

1. Set up the polygon.
2. In place of drawing the first line of the polygon, call koch() and let it do the work.
3. When the first side is completed, turn the turtle at the appropriate angle.
4. Invoke koch() again.
5. Continue until all sides have been completed and the turtle is back where it started.

The other implication is that we would be better off computing our polygon and our curve separately. There's no better way to do that than to write separate functions. Assuming we change nothing about koch(), what does this mean for triangle()? Here's a first draft, so we can just see the two functions together:

    def triangle(size):
        for angle in [-120, -120, 0]:
            t.forward(size)
            t.left(angle)

    def koch(t, order, size):
        if order == 0:
            t.forward(size)
        else:
            koch(t, order - 1, size // 3)
            t.left(60)
            koch(t, order - 1, size // 3)
            t.right(120)
            koch(t, order - 1, size // 3)
            t.left(60)
            koch(t, order - 1, size // 3)

Now, how do we get the two functions talking to one another? Let's go back to how we set up our recursive call in koch(). We quickly established that we wanted to substitute t.forward(size // 3) with the recursive call koch(t, order - 1, size // 3). Consulting our pseudocode above, you can see a similar opportunity: in triangle(), t.forward(size) is exactly where we want koch() to do its work. Here is our final code, with the re-introduction of the for loop in koch():

    import turtle

    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')

    t.color("orange")
    t.pensize(5)
    t.penup()
    t.setpos(-450, 250)
    t.pendown()
    t.speed(0)

    def triangle(size):
        for angle in [-120, -120, 0]:
            koch(t, order, size)
            t.left(angle)

    def koch(t, order, size):
        if order == 0:
            t.forward(size)
        else:
            for angle in [60, -120, 60, 0]:
                koch(t, order - 1, size // 3)
                t.left(angle)

    size = 900
    order = 3
    triangle(size)

    wn.exitonclick()

1.13.4 Heuristics and Exercises

The simplest form of a fractal is known as order 0. When creating a recursively drawn solution, first learn how to draw that shape, and then use that as the base case. Then design the code that would draw the order 1 shape, using a simple if/else branching. Your next step should be to design the recursive call that links the two, such that all recursive calls ultimately resolve themselves at the base case.

In some cases, for example when the purpose of the program is drawing, you may not even need to return a value from a recursive call, or store it in a variable. It's completely sufficient to get to the base case, and simply allow the code to continue executing.

Exercise: An n-flake is any polygon whose lines are recursively replaced with the polygon itself. Obviously, the Koch snowflake is one such n-flake. Modify koch() into an n-flake whose order 0 is a square, and which 'grows' squares on the same trisected basis as the Koch curve. At what depth do the new squares begin to overlap? At what depth do the drawing's details become almost imperceptible? Can you think of a way to 'magnify' the view of a part of the pattern? You may want to use the methods t.fillcolor(), t.begin_fill() and t.end_fill() to fill in the polygon, and play around with other parameters, such as pensize(), to get a better view of what's going on.

Credit: Much of this material adapted from Chapter 18.1 of the fantastic resource, Think Like A Computer Scientist


1.14 The Sierpinski Triangle

1.14.1 Setting up the problem

We now know how to recursively apply a trisection to create complex forms that are nevertheless bounded by the initial length or perimeter of a fractal’s simplest possible order. Next we’ll leverage these and other techniques we’ve learned to develop a very interesting fractal which takes itself as the recursive element: the Sierpinski triangle. Briefly, the Sierpinski triangle is a fractal whose initial equilateral triangle is replaced by three smaller equilateral triangles, each of the same size, that can fit inside its perimeter. You could make the argument that the middle portion of the initial triangle can accommodate a fourth triangle, but we are disallowing rotation, so that region remains empty. Further orders of the fractal replace each of the three new triangles with another three, and so forth.

Fig. 6: Figure 1. Sierpinski triangles, orders 0 to 2

As with the Koch curve and Koch snowflake, we first want to establish the 0th order of the fractal. In fact, it is the same as the Koch snowflake - a single equilateral triangle. Unlike the snowflake, common practice draws it with the base on the bottom, so let's modify our code to reflect this:

    import turtle

    def draw(t, p):
        t.fillcolor('orange')
        t.up()
        t.goto(p[0][0], p[0][1])
        t.down()
        t.begin_fill()
        t.goto(p[1][0], p[1][1])
        t.goto(p[2][0], p[2][1])
        t.goto(p[0][0], p[0][1])
        t.end_fill()

    def main():
        t = turtle.Turtle()
        t.speed()
        wn = turtle.Screen()
        wn.bgcolor("black")
        p = [[-500, -400], [0, 500], [500, -400]]
        draw(t, p)
        wn.exitonclick()

    main()


I've also done a few things differently here, which will come in handy down the road. The first is wrapping the code that used to be in the global frame into a main() function. It's generally good practice to write programs where everything is defined within a function. Compartmentalization helps debugging and keeps functions from accessing variables that, by virtue of being in the global frame and not residing in a specific function, you may not realize are being accessed until something breaks.

The second is in the drawing itself. This time, I'm not concerned with specifying a color for the line. Instead, by using begin_fill() and end_fill(), the line creates the shape that is filled at the moment end_fill() is called. The Sierpinski triangle is shape-based, as opposed to the line-based fractals we have created so far, so this will allow us to better see what we have drawn.

Finally, the most important innovation is our use of coordinates to guide the drawing. We use the turtle's goto() method to tell the turtle where it's going next. The draw() function calls t.goto() four times: we first 'pick up' the turtle and jump to the desired coordinates, put it 'down' and draw our triangle with the next three goto() method calls. To finish drawing, we end at the same coordinates we started from (NB: end_fill() will fill up any space outlined between it and begin_fill(), regardless of whether the shape is closed). Also, writing t.goto(p[0][0], p[0][1]) may seem unnecessary when we could just write t.goto(p[0]), but the importance of referencing specific index positions for coordinates will become apparent soon enough.

Loosely speaking, you could say that, for curves and snowflakes, we used a vector-based approach: the turtle's last known position and angle provided the starting point for the next move's direction and angle. Here we are using a raster-based approach, where everything is based on points on a grid. Both approaches have their merits, and it's certainly possible to write code for these and other fractals using either method.

1.14.2 Midpoints of midpoints

Now that we have our order 0 fractal, let's think about how we can start subdividing it. Each side of our equilateral triangle has a length of 1000, so our next order should be three triangles, each having sides of length 500. But since we're not using vectors, we don't have a statement like t.forward(length) that we can convert to t.forward(length // 2). A raster approach means we have to do some simple arithmetic on the list of coordinates that is p. For example, converting our original triangle to one that has the same starting vertex but only half the length looks like this:

p = [[-500, -400], [0, 500], [500, -400]] q = [[-500, -400], [-250, 50], [0, -400]]

Our vertex stays the same (p[0] == q[0]), but the other two coordinates change. We’re now getting the midpoint between p[0] and p[1], and the midpoint between p[0] and p[2], and making those our new endpoints for q[1] and q[2]. Mathematically, the formula for computing the midpoint (x’, y’) of coordinates (x1, y1) and (x2, y2) is:

x' = (x1 + x2) / 2 y' = (y1 + y2) / 2

Using the above example,

x' = (-500 + 0) / 2 = -250 y' = (-400 + 500) / 2 = 50

So now we know our first midpoint coordinate is (-250, 50). By the same logic, our second is (0, -400). We can abstract up a level to say that, for any pair of coordinates p1, p2, the midpoint is:

((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

We aren't interested in explicitly defining triangles, as we were with koch(). Rather, the series of coordinates that we specify happen to make triangles. The end result may look the same, but by thinking about this in terms of coordinates and midpoints, we've actually vastly simplified the problem. This creates a lot of flexibility, since we just need to get the right midpoints, computed from the right vertices, and arranged in the right order. By the same token, if we wanted the next-smaller triangle r to share the same vertex as p and q, we would apply the formula to q's other two corners. So far, here are our three triangles:

Fig. 7: Figure 2. Three triangles built from a common vertex

p = [[-500, -400], [0, 500], [500, -400]] q = [[-500, -400], [-250, 50], [0, -400]] r = [[-500, -400], [-375, -175], [-250, -400]]

You may be getting the sense that we will be computing midpoints quite often, so this should probably be its own function that we can call at any time, with the function's parameters being the pair of coordinates whose midpoint we need to compute at any given moment:

    def mid(p1, p2):
        return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

Making midpoints - and midpoints of midpoints - has a distinctly recursive smell to it. But before we get to the recursive case, let's look at what it will have to address. For instance, to complete our order 1 Sierpinski triangle, we have to draw the remaining two triangles within the confines of our order 0 triangle. To do that we'll start at A, the bottom left ((-500, -400)), to make the triangle (A, x, y). Then we'll shift the vertex to B ((0, 500)) to make triangle (B, x, z). Finally, triangle (C, z, y) will be created from point C, at coordinates ((500, -400)). For each of these, we'll use the mid()-generated coordinates x, y, z. Any recursive solution will have to place us at these points. For further orders, we'll need to place our starting vertices not just at A, B, C, but at x, y, z, and many other points.
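As a quick sketch, here is how those points fall out of mid() (the values follow from the midpoint formula; the x, y, z names match the labels used here and in Figure 3):

    p = [[-500, -400], [0, 500], [500, -400]]
    A, B, C = p[0], p[1], p[2]

    x = mid(A, B)   # (-250.0, 50.0)
    y = mid(A, C)   # (0.0, -400.0)
    z = mid(B, C)   # (250.0, 50.0)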


Fig. 8: Figure 3. Order of vertices from which midpoints are computed


Question: Can you think of a reason why choosing A, B, C as the starting vertices is preferable to choosing something like A, x, y, ie, the bottom left-hand corner of every triangle?

1.14.3 Determining the order of drawing

Another organizing principle for our recursive solution is that of the orders themselves. If we know that order 0 is a single triangle with sides of length n, then order 1 will have 3 triangles with sides of length n / 2, and order 2 will have 9 triangles with sides of length n / 2 / 2, etc. So we'll need a variable order that tells us how deep into recursion we need to go. Decrementing/incrementing order by 1 controls the depth at which we're operating. By corollary, if main() defines order as 0, then we just go straight to the base case, which is already represented by draw() and the starting list of coordinates, p.

We learned from our Koch snowflake code that it's a good idea to keep the recursive mechanism in a separate function from the function that draws the actual triangle, so let's stick to that and keep draw() as it is. We'll create a new function, sierpinski(), that will call draw() when needed. As we get into more advanced implementations of recursion, you'll see that keeping the recursing function separate from the computing function is a common design decision. This is what our code looks like so far:

    import turtle

    def draw(t, p):
        t.fillcolor('orange')
        t.up()
        t.goto(p[0][0], p[0][1])
        t.down()
        t.begin_fill()
        t.goto(p[1][0], p[1][1])
        t.goto(p[2][0], p[2][1])
        t.goto(p[0][0], p[0][1])
        t.end_fill()

    def mid(p1, p2):
        return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

    def sierpinski(t, order, p):
        draw(t, p)
        if order > 0:
            # insert recursive magic here,
            # probably involving 'sierpinski(t, order - 1, p)'
            # or something like that
            pass

    def main():
        t = turtle.Turtle()
        t.speed()
        wn = turtle.Screen()
        wn.bgcolor("black")

        order = 0
        p = [[-500, -400], [0, 500], [500, -400]]

        sierpinski(t, order, p)
        wn.exitonclick()

    main()


This seems a bit underwhelming, since we're not doing much more than drawing the 0 order fractal. But I want to bring attention to the structure being set up here. Looking at main(), you'll notice that the call to draw() has been replaced by a call to sierpinski(). This makes sense, since we want to first evaluate how much recursion, if any, is needed before we start drawing, which is precisely what sierpinski() will handle. Therefore we'll pass an additional argument order to sierpinski(), which I've set to 0 for the moment.

There is also something unusual going on with the base case. Most algorithms we've seen so far have provided usable results only upon reaching the base case - usually after we decremented some variable to 1 or 0, or met some truth condition. Sometimes we just wanted what the base case gave us (True, in the case of pal(), or a in the case of gcdRecur()). With koch() we only drew a segment when we reached the base case, because it was at the base case that we had the correctly subdivided length of line. But mostly, we were interested in the final result provided to us at the conclusion of the recursive cascade.

Here, we draw the 0 order fractal straight away, regardless of the value that is bound to order. This is because we want to represent the order 0 triangle first. If we didn't, we wouldn't have coloring for the 'empty triangle' in the middle. And if we waited until the end of the program, it would color over everything we'd done until that point. In fact, if we started by drawing the highest order set of triangles first, then each lower order would graphically 'overwrite' the previous, until we got to order 0, which would overwrite everything.

Consider what this means for the general order in which we want to draw our triangles. We can't draw order 1 until we've drawn order 0, so by the same reasoning we can't draw order 2 until we've drawn order 1, etc. This seems backwards from how we've been using recursion so far: we drew (or computed) the highest order (or n) by reaching the base case and making all the subsequent post-recursive computations. With the exception of mergeSort(), we didn't care about any of recursion's intermediate outputs, only the final result. If we want to draw a triangle immediately, this means we have to depart from our usual template:

    def fn(n):
        if n == 1:   # or some other minimum
            return n
        else:
            return fn(n - 1)   # with other computation

The need to call draw() right off the bat means it can't sit inside an if block:

    def sierpinski(t, order, p):
        draw(t, p)
        if order > 0:
            # make recursive call with sierpinski(t, order - 1, p)
            pass

If our design is valid, then each recursive call will itself trigger draw() as its first statement. Furthermore, this means that, for orders other than 0, p has to represent the right coordinates. This is where mid() comes into play. Recall Figure 2, which showed three triangles of decreasing size built from a common vertex, p[0]. We can represent this recursively as:

    def mid(p1, p2):
        return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

    def sierpinski(t, order, p):
        draw(t, p)
        if order > 0:
            sierpinski(t, order - 1, [p[0], mid(p[0], p[1]), mid(p[0], p[2])])

In pseudocode, we might write this as follows:

1. Draw the base case
2. Draw the first triangle of the next order
3. Keep drawing the first triangle of the next order until order == 0

This is a good start. And you can see that we can do the same at the other two vertices of the 0 order fractal, simply by re-arranging the coordinates we want to pass to mid():

Fig. 9: Figure 4. Computing midpoints based on p

If we translate these next two triangle calculations into code, we'll have:

    def sierpinski(t, order, p):
        draw(t, p)
        if order > 0:
            sierpinski(t, order - 1, [p[0], mid(p[0], p[1]), mid(p[0], p[2])])
            sierpinski(t, order - 1, [p[1], mid(p[0], p[1]), mid(p[1], p[2])])
            sierpinski(t, order - 1, [p[2], mid(p[2], p[1]), mid(p[0], p[2])])

Running this code for order 1 gives us the correct drawing. At this point, you may think that we only have a partial solution: a series of triangles of decreasing size, with each series anchored at one of the three vertices of the base case triangle. In other words, there will be a lot of triangles missing. However, if you run it for, say, order == 4, you’ll see that we have created a complete solution. Why is this?

1.14.4 Stepping through the code

Here's our final code for an order 2 triangle. I've included a bit that will be very clarifying: a list of colors that will fill each triangle based on the order we're in at the moment.

    import turtle

    def draw(t, color, p):
        t.fillcolor(color)
        t.up()
        t.goto(p[0][0], p[0][1])
        t.down()
        t.begin_fill()
        t.goto(p[1][0], p[1][1])
        t.goto(p[2][0], p[2][1])
        t.goto(p[0][0], p[0][1])
        t.end_fill()

    def mid(p1, p2):
        return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

    def sierpinski(t, order, p):
        colormap = ['red', 'orange', 'yellow', 'green', 'blue', 'violet']
        draw(t, colormap[order], p)
        if order > 0:
            sierpinski(t, order - 1, [p[0], mid(p[0], p[1]), mid(p[0], p[2])])
            sierpinski(t, order - 1, [p[1], mid(p[0], p[1]), mid(p[1], p[2])])
            sierpinski(t, order - 1, [p[2], mid(p[2], p[1]), mid(p[0], p[2])])

    def main():
        t = turtle.Turtle()
        t.speed()
        wn = turtle.Screen()
        wn.bgcolor("black")
        p = [[-500, -400], [0, 500], [500, -400]]
        sierpinski(t, 2, p)
        wn.exitonclick()

    main()

The recursive calls may seem a bit intimidating, but all we are doing is computing appropriate coordinates for each triangle in a given order. draw() always jumps to and draws the set of coordinates p that is correct for that moment. If you're feeling skeptical, here is a print-traced version of the code that will show you the outputs while the drawing happens:

    import turtle

    call = 0

    def draw(t, color, p):
        t.fillcolor(color)
        t.up()
        t.goto(p[0][0], p[0][1])
        t.down()
        t.begin_fill()
        t.goto(p[1][0], p[1][1])
        t.goto(p[2][0], p[2][1])
        t.goto(p[0][0], p[0][1])
        t.end_fill()

    def mid(p1, p2):
        return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

    def sierpinski(t, order, p):
        global call
        call += 1
        print('\ncall is', call)
        colormap = ['red', 'orange', 'yellow', 'green', 'blue', 'violet']
        print('  draw', colormap[order], 'at', p)
        draw(t, colormap[order], p)
        print('  outer call done, order =', order)
        if order > 0:
            sierpinski(t, order - 1, [p[0], mid(p[0], p[1]), mid(p[0], p[2])])
            print('  1st recursive call done, order =', order)
            sierpinski(t, order - 1, [p[1], mid(p[0], p[1]), mid(p[1], p[2])])
            print('  2nd recursive call done, order =', order)
            sierpinski(t, order - 1, [p[2], mid(p[2], p[1]), mid(p[0], p[2])])
            print('  3rd recursive call done, order =', order)

    def main():
        t = turtle.Turtle()
        t.speed()
        wn = turtle.Screen()
        wn.bgcolor("black")
        p = [[-500, -400], [0, 500], [500, -400]]
        sierpinski(t, 2, p)
        wn.exitonclick()

    main()

You may want to run the print-trace program and consult the output as we step through the code. Keep in mind that this isn't any more difficult to trace through than any other code using multiple recursive calls - all the heuristics that we established with prior algorithms still hold. The added advantage with the Sierpinski triangle is seeing it drawn in real time, so mapping the recursion to the code will be easier - plus the final result is color-coded!

A word on the recursive calls: if each triangle must fit three unique triangles within it, then we'll clearly need three instances where sierpinski() calls itself. Of course, once we have specified three recursive calls within our function, this means that we will have three calls every time the function recurses. As with any function with multiple recursive calls, the most important thing to remember is that the first recursive call continues recursing that first call until it reaches the 'bottom' or 'leaf', in this case when order == 0. Put another way, it is recursion executing in its usual depth-first fashion. Put a third way, this initial, depth-first run is exactly what Figure 2 represents. In terms of our colors, we have the following:

    order == 2  ==  'yellow'
    order == 1  ==  'orange'
    order == 0  ==  'red'

This is the correct order of drawing since, as I pointed out above, each frame draws its triangle before any of its recursive calls. And indeed, the first triangle is yellow, with a length of 1000 for each side. Once we've drawn the first triangle, we pass into the if block, since order == 2, and encounter the first recursive call:

    sierpinski(t, order - 1, [p[0], mid(p[0], p[1]), mid(p[0], p[2])])

This call keeps the bottom left vertex as-is (p[0]) and calls mid() twice to get the new length, 500, that is appropriate to an order 1 triangle. However, we still aren't done with this call, as order != 0. So one more time around with the same call yields a red triangle, still with the same starting vertex, but this time with a length of 250, as p[1] and p[2] in the 'orange order' have once again been modified to what's needed for the 'red order':

    p at yellow: [[-500, -400], [0, 500], [500, -400]]
    p at orange: [[-500, -400], (-250.0, 50.0), (0.0, -400.0)]
    p at red:    [[-500, -400], (-375.0, -175.0), (-250.0, -400.0)]

So far we've replicated the original series of three triangles that we had with p, q, r in Figure 2 above but, as we saw, the code is a complete solution, so let's keep going. Also, keep in mind that all of these calls are opening and closing frames, based on where we are in the program execution. Seeing the call tree should make this more clear:

Fig. 10: Figure 5. Call tree for an order 2 Sierpinski triangle

This further helps to discern the order of drawing. Since the recursive call whose triangle is anchored at p[0] is first, we will first see all possible triangles drawn from the perspective of p[0] at coordinates (-500, -400). This is equivalent to the first, left-most depth-first traversal of the call tree. This is represented in calls 1-3 in Figure 6.

But since frame 2 is at order == 1 we can compute the remaining two recursive calls, that is, the remaining two red triangles at order == 0, whose starting vertices are at p[1] and p[2], respectively. So now we have calls 4-5 sorted out as well.

Fig. 11: Figure 6. Calls 1-3 and 4-5

Having computed everything for frame 2, we return to frame 1, where we move on to the second recursive call, p[1], which is the vertex at the apex of the triangle. We don't have anything to draw in frame 1 (we already did that, as we've now passed into the if block), so we open frame 6, draw the orange triangle and, in frame 7, the smaller red triangle, where both triangles are using p[1] == (0, 500). This takes care of calls 6-7. For calls 8-9, we repeat the same logic as above: since frame 6 is at order == 1, we compute the remaining two red triangles, and return to frame 1 by way of frame 6:

Fig. 12: Figure 7. Calls 6-7 and 8-9

At the final recursive call p[2] in frame 1, all we have left is to fill in the bottom right of the base case triangle. By now you can see the pattern: in frames 10 and 11, p[2] is the starting vertex, providing us with calls 10-11. Finally, we add the last two red triangles, p[1] followed by p[0], for calls 12-13:

Fig. 13: Figure 8. Calls 10-11 and 12-13

The important bit in all of this is being able to follow the order of recursive calls, which, as I’ve said, is exactly the same as any other function that uses multiple recursion. I’ve omitted some discussion about how the various coordinates get passed and re-computed, but you should take a moment to trace through how this works, as it’s very elegant.

1.14.5 Heuristics and Exercises

In the case of the Sierpinski triangle, we wanted to ensure that lower-order drawing didn't obscure higher-level triangles. To do this, every time sierpinski() called itself, draw() would be the first statement executed. This meant that, at the leaf level, only the base case was triggered, and the recursive calls were skipped.


By keeping the recursing function separate from the computing function, you can use ongoing results of recursion as inputs for further computation. We used this technique with sierpinski() to 'outsource' the drawing of triangles, just as mergeSort() 'outsourced' its merging/sorting needs to merge(). If you can write the base case and the minimally recursive case (ie, one recursion), you may well have solved the problem for any recursive depth.

Exercise: Implement a recursive solution for the Sierpinski triangle using a vector-based approach. What parts of the raster-based code can you retain?

Exercise: The Sierpinski carpet is a variation that takes a square as its base case. Each square is divided into nine equal squares, and only the central square is preserved. Write a recursive implementation for the Sierpinski carpet using a raster-based approach.

Credit: Some of this material adapted from Chapter 5.8 of Problem Solving with Algorithms and Data Structures using Python

1.15 Lindenmayer Systems

1.15.1 A different way of generating fractals

Also known as 'rewrite' or 'L-systems', this approach was first developed in 1968 by Aristid Lindenmayer, who used L-systems to describe the behaviour of plant cells, and to model the growth processes of plant development. L-systems can create realistic-looking branching structures, as well as generate fractals in a surprisingly simple way. L-systems are also a gateway to understanding systems of propositional logic. We'll stay away from propositional logic, and just use the vector-based drawing approach to explore some of these shapes, which is perhaps more immediately gratifying.

An L-system has three components:

1. An alphabet, consisting of two types of symbols: variables (which can be replaced) and constants (which cannot)
2. An axiom, consisting of symbols from the alphabet, which provides the starting point for replacement
3. Rule(s) of production, which show us the way(s) in which we can transform each replaceable symbol with said rule(s)

There is one more requirement for all L-systems: each step (eg, an application of the rules of production) must be comprehensive before the next step is applied. Let's use Lindenmayer's original L-system as our first example:

    alphabet:
        variables: A, B
        constants: none
    axiom: A
    rules: A --> AB
           B --> A

We begin with our axiom A at step 0. Our first iteration yields AB by the rule A --> AB. Our second iteration involves taking every replaceable element in AB and applying the rules of production. Since A --> AB and B --> A, our next transformation yields AB --> ABA. You can see that for every iteration of n we generate an ever-increasing string of letters:

    n == 0: A
    n == 1: AB
    n == 2: ABA
    n == 3: ABAAB
    n == 4: ABAABABA
    n == 5: ABAABABAABAAB
    n == 6: ABAABABAABAABABAABABA
    n == 7: ABAABABAABAABABAABABAABAABABAABAAB

We can represent this in Python with a fairly trivial solution:

    def lSysGenerate(s, n):
        for i in range(n):
            s = lSysCompute(s)
        return s

    def lSysCompute(s):
        new = ''
        for c in s:
            if c == 'A':
                new += 'AB'
            elif c == 'B':
                new += 'A'
        return new

    axiom = 'A'
    n = 3
    print(lSysGenerate(axiom, n))

>>> ABAAB

Question: Instead of returning the nth value of the L-system, modify the return statement so that it gathers up the length of each string in a list. Can you identify the sequence? If you go out to n == 8 that should be enough. Derive a formula that will generate the sequence of lengths up to any given n.

So how can we use this system to generate fractals? If we feed it into our turtle program, we can say that every element in the string is a cue for the turtle to draw a segment. But if we can't turn the turtle then all we'll have is a very long, very straight line. This is where the alphabet's constants come in - we'll designate them as our angles. So if we want to turn right by a certain angle, we'll include a +, and if we want to go left, we'll designate it as -. Since these constants are not subject to substitution, they need to be written into the rules of production in order for them to propagate through the string.

Because we're already familiar with it, let's go back to the Koch curve. Recall how we trisect a line of length n:

1. Draw a line of length n // 3
2. Turn left 60°
3. Draw a line of length n // 3
4. Turn right 120°
5. Draw a line of length n // 3
6. Turn left 60°
7. Draw a line of length n // 3

We can use an L-system to go from order 0 to order 1 simply by translating this into symbols, axioms and rules of production. For the axiom, all we need is a line, as represented by A. This, by definition, is our order 0 fractal. To derive our sole rule of production, we rewrite the entire set of instructions above as A-A++A-A, which is also what's required to go from order 0 to order 1.


This implies that our alphabet consists of three symbols: the variable A and the constants + and -. Finally, we write a bit of code that takes A, - and +, and turns them into commands that the turtle can follow:

    def draw(t, s, n):
        for c in s:
            if c == 'A':
                t.fd(n)
            elif c == '-':
                t.left(60)
            elif c == '+':
                t.right(60)

Of course, we want to be able to extend this to create as many iterations of A as we like.

    n == 0: A
    n == 1: A-A++A-A
    n == 2: A-A++A-A-A-A++A-A++A-A++A-A-A-A++A-A
    n == 3: A-A++A-A-A-A++A-A++A-A++A-A-A-A++A-A-A-A++A-A-A-A++A-A++A-A++A-A-A-A++A-A++A-A++A-A-A-A++A-A++A-A++A-A-A-A++A-A-A-A++A-A-A-A++A-A++A-A++A-A-A-A++A-A

Recall that our only variable is A - we want to parse our string in such a way that constants are unaffected, otherwise we will muck up our angles and who knows where the turtle will go. So we’ll rewrite lSysCompute() using the following crisp construction:

    def lSysCompute(s):
        d = {'A': 'A-A++A-A'}
        return ''.join([d.get(c) or c for c in s])
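A quick sanity check before we wire this into the full program (a hypothetical console session; the outputs follow directly from the rule above):

    print(lSysCompute('A'))          # A-A++A-A
    print(lSysCompute('A-A++A-A'))   # A-A++A-A-A-A++A-A++A-A++A-A-A-A++A-A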

The first line in the function declares a dictionary d that has the target variable as the key, and the rule of production as its value. This is very handy, as in the future we can just substitute any dictionary we like for d. If we have multiple variables, each with its own rule of production, then all we do is add a new key-value pair. The second line is a list comprehension: if the character c in s is in the dictionary as a key, then add its value (ie, rule) as the next list item; if not, just add c as it is. Once the list is completed, it’s converted to a string and returned to lSysGenerate(). Here is our complete code for generating L-system Koch curves:

    import string
    import turtle

    def lSysGenerate(s, order):
        for i in range(order):
            s = lSysCompute(s)
        return s

    def lSysCompute(s):
        d = {'A': 'A-A++A-A'}
        return ''.join([d.get(c) or c for c in s])

    def draw(t, s, length, angle):
        for c in s:
            if c in string.ascii_letters:
                t.forward(length)
            elif c == '-':
                t.left(angle)
            elif c == '+':
                t.right(angle)

    def main():
        t = turtle.Turtle()
        wn = turtle.Screen()
        wn.bgcolor('black')

        t.color('orange')
        t.pensize(1)
        t.penup()
        t.setpos(-250, -250)
        t.pendown()
        t.speed(0)

        axiom = 'A'
        length = 10
        angle = 60
        iterations = 3

        draw(t, lSysGenerate(axiom, iterations), length, angle)

        wn.exitonclick()

    main()

The only other modification to the code is in draw(), where I've changed the if block to pick up on the presence of any letter, not just A - another functionality we'll need if we're to parse strings with more than one variable. (NB: don't forget to import the string module at the top of your code, otherwise Python won't know how to interpret string.ascii_letters.) Now we can generate any L-system fractal we like - it's just a matter of plugging in the right axiom, angle, length and rules of production. Try, for example, the Gosper curve, whose whole L-system can be characterized as:

    alphabet:
        variables: A, B
        constants: +, -
    angle: 60°
    axiom: A
    rules: A --> A-B--B+A++AA+B-
           B --> +A-BB--B-A++A+B
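To try the Gosper curve with the program above, the only change needed is to swap its rules into the dictionary and keep the 60° angle (a sketch; everything else in the program stays the same):

    def lSysCompute(s):
        d = {'A': 'A-B--B+A++AA+B-',
             'B': '+A-BB--B-A++A+B'}
        return ''.join([d.get(c) or c for c in s])

    axiom = 'A'
    angle = 60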

Our old friend the Sierpinski triangle can be represented as well. You'll note that the axiom is, once again, the base case that we used in sierpinski():

    alphabet:
        variables: A, B
        constants: +, -
    angle: 120°
    axiom: A-B-B
    rules: A --> A-B+A+B-A
           B --> BB

A particularly interesting shape is the Sierpinski arrowhead, which has the following traits:

    alphabet:
        variables: A, B
        constants: +, -
    angle: 60°
    axiom: A
    rules: A --> B+A+B
           B --> A-B-A

As you can see, the Sierpinski arrowhead has a modest beginning. While it shares the same order 0 shape as the Koch curve, it diverges quite quickly - and only looks 'right' when order % 2 == 0 (that is, when order is even). Despite the fact that the rules of production are quite simple compared to the Gosper curve and Sierpinski's triangle, if you run the arrowhead algorithm with a substantial number of iterations (eg, 8), the shape that it implies will seem quite familiar.

But hang on, you say. This may be very interesting and all, but so far there hasn't been any mention of recursion! Also, you may have noticed that, to see some of these drawings in their entirety, you've also had to revise your length and setpos() variables, as the length of an iteration doesn't scale down nicely as it has done with our implementations of koch() and sierpinski().

I'm not trying to distract you with pretty pictures. By now you should have enough of a grasp on recursion that you can take the above code and figure out how to make it recursive - and have that recursive solution conserve the original length or perimeter of the fractal. Along the way, see if you can discover your own heuristics.

1.15.2 Exercises

Exercise: Modify the above code into a recursive form, using the Koch curve as an example. Make sure that your recursion preserves the original start and end points of the order 0 fractal - that is, if we have a Koch curve that begins at (-500, 0) and ends at (500, 0), then any order of the Koch curve should do the same.

Exercise: For a given L-system, find if a string target exists as part of the system after a given number of iterations. Write a recursive solution that returns the order where target appears, as well as the index position at which it appears. For example, say that we want to find at what point target = 'BAABAABA' first appears in Lindenmayer's original system. After running the function, the output should print target BAABAABA is at index position 6 at order 7. If it doesn't appear, the output should be target BAABAABA not found.

1.16 Solving L-System Recursion

Here are step-by-step solutions to both the exercises presented at the end of Lindenmayer Systems.

1.16.1 Recursive version of a Koch curve L-system

Exercise: Modify the iterative code for a Koch curve L-system into recursive form. Make sure that your recursion preserves the original start and end points of the order 0 fractal - that is, if we have a Koch curve that begins at (-500, 0) and ends at (500, 0), then any order of the Koch curve should do the same. Solution: The first thing to do is to start small. We can simplify the problem by putting aside the whole drawing business and just generating the string using a recursive mechanism. If we can get the right theorem, then we should be able to draw the shape without a problem. Let’s go back to Lindenmayer’s original system, where the rules of production state A —> AB and B —> A. Here is our original iterative code:

96 Chapter 1. Getting Started recur Documentation, Release 1.0

    def lSysGenerate(s, iterations):
        for i in range(iterations):
            s = lSysCompute(s)
            print('s =', s)
        return s

    def lSysCompute(s):
        d = {'A': 'AB', 'B': 'A'}
        return ''.join([d.get(c) or c for c in s])

    axiom = 'A'
    iterations = 3
    print(lSysGenerate(axiom, iterations))

>>> s = AB
>>> s = ABA
>>> s = ABAAB
>>> ABAAB

We can approach the problem intuitively and conservatively, by keeping the code that we know works and rewriting the code that no longer applies. If you examine the two functions, lSysGenerate() feeds the string s to lSysCompute() while lSysCompute() 'does the work'. You can already see the similarity to mergeSort() and sierpinski(), where we put the recursive action in one function and fed its outputs to other functions that merged and sorted lists, or drew triangles. So let's begin by converting lSysGenerate() from an iterative to a recursive form.

To do that, let's divide the problem into its pre- and post-recursive sections. Pre-recursively, it seems like there's really no work that needs to be done. We don't need to seed our frames with anything on the way to the base case. We just want to set up the right number of frames, so that we recurse the correct number of times and get the correct final theorem. This implies that all the work should happen post-recursively. More specifically, this means that the base case returns the seed for all further computations, just as we returned [[]] for powerSet(). With the base case in hand, every pass through the original code's for loop can be recursively restated as each successive frame's application of the rules of production to the base case/preceding frame. If we've done it right, our final returned string will have accumulated all the substitutions needed for the nth order. (I'm taking advantage of the fact that, since the recursion is linear, 'frame' and 'order' mean pretty much the same thing as 'iterations' did in the iterative code. Therefore I'll substitute 'order' for 'iterations' from here on out.)

As for the base case, it's obviously the axiom itself. In the case of the original L-system, this is A, which, along with order, we pass to lSysGenerate(). So with order as our counter, we keep decrementing until we get to the base case of order == 0. Following our standard recursive template, we can assert:

    def lSysGenerate(axiom, order):
        if order == 0:
            return axiom
        else:
            return lSysGenerate(axiom, order - 1)

    order = 3
    print(lSysGenerate('A', order))

>>> A

Since the function doesn't have multiple recursive calls, we know that we're dealing with a fairly simple structure here. Nevertheless, let's throw in some print-tracing to keep track of frames, along with splitting the recursive call from the return statement with r:

    frame = 0

    def lSysGenerate(axiom, order):
        global frame
        if order == 0:
            frame += 1
            print('base case, frame =', frame)
            print('returning', axiom)
            return axiom
        else:
            frame += 1
            print('frame =', frame)
            r = lSysGenerate(axiom, order - 1)
            frame -= 1
            print('frame =', frame)
            print('r =', r)
            return r

    print(lSysGenerate('A', 3))

>>> frame = 1
>>> frame = 2
>>> frame = 3
>>> base case, frame = 4
>>> returning A
>>> frame = 3
>>> r = A
>>> frame = 2
>>> r = A
>>> frame = 1
>>> r = A
>>> A

We next want the opportunity to apply the rules of production to the string axiom that is being passed back to us from the base case. To do this, all we need is to call lSysCompute(), and the most concise way to do it is in the return statement itself. (Calling the external function at the return statement was exactly what we did when we called the merge() function in our mergeSort() program.) So far we have:

    def lSysGenerate(axiom, order):
        if order == 0:
            return axiom
        else:
            return lSysCompute(lSysGenerate(axiom, order - 1))

    def lSysCompute(s):
        d = {'A': 'AB', 'B': 'A'}
        return ''.join([d.get(c) or c for c in s])

    axiom = 'A'
    order = 3
    print(lSysGenerate(axiom, order))

>>> ABAAB


Ok, then. What about drawing? The fact is that we cannot draw the shape until we have the final string in hand, and this doesn't happen until the recursion has completed. Since the iterative code doesn't draw anything until the final theorem is generated, there is no change in the relationship between the draw() and lSysGenerate() functions.

1.16.2 Implementing absolute distance

But we are still missing a crucial part of the solution, which is the ability to draw at a scale that preserves the absolute distance described by the 0th order. With koch() and sierpinski(), we linked the variable tracking the fractal's order with the length of the line that would be drawn. The higher the order, the shorter the line (or the smaller the shape). In the current case, we can't do anything at the base case except retrieve the axiom, and we can't draw the complete figure until we've exited the recursion.

So it looks like we're asking two separate things of our code: figure out the smallest line length we need, and compute the final iteration of the L-system string. We already know (and pretty much don't have a choice) about how the latter works. The trick is to figure out where to compute the former, and how to extract it from the recursive mechanism.

If we want two values from our recursion, it makes sense to include more than just the string in the recursive function's return statement. This means that we will also have two values in the base case's return statement. Since we'll have to return two values through the recursive cases, we'll have to be careful about what we're subjecting to computation. Recall that in our expansion of Pascal's triangle, we went from wanting to pass a single list (ie, the nth layer of the triangle) to a list of lists (ie, all layers of the triangle up to and including the nth layer). The trick is that, while we wanted the entire list of lists, we only wanted to recursively operate on the last returned sublist. We'll be doing something similar here.

So far, we've got the string-generation part down, and there's no need to mess with that. To find the smallest line length, we know that for every order, length will be recomputed successively as length //= 3. So it makes sense to conduct the division pre-recursively. Once we get down to the base case, we'll have the correct minimum value for length. Now all we need to do is pass that value of length - unchanged - back through the recursive process and hand it to the frame that initially called the recursive function. Here's the final code for lSysGenerate(), which I'll walk through below:

def lSysGenerate(axiom, order, length):
    if order == 0:
        return [axiom, length]
    else:
        length //= 3
        r = lSysGenerate(axiom, order-1, length)
        return [lSysCompute(r[0]), r[1]]

We first need to include length as an argument when defining (and calling) lSysGenerate(). Assuming that order > 0, we skip the if block and enter the else block. There, we modify the pre-recursive portion of the else block to compute length //= 3. This gets us to the point where we recursively call the function.

Now, the point of the recursive call is to get us to the base case, with the additional requirement that we bring the successively subdivided variable length along with us. Since our function currently has the parameters lSysGenerate(axiom, order, length), we pass order - 1 and the latest value of length travels with it.

For the base case order == 0, we don't need to recompute length, but we do need to capture it as part of our return statement. So we'll recast our return statement as a list, [axiom, length]. In the case of the Koch curve, if we wanted order 2 and had an initial length of 1000, what should be returned is ['A', 111]. But returned as what? Here I've been a bit more explicit, as I want to emphasize the fact that we're passing a list back to the calling frame. As I've done throughout this guide, I use the variable name r as a simple placeholder, which gives us a more intuitive view of the return statement. Since r is a list that consists of the string and the minimum

length, we want to apply the rules of production (by calling lSysCompute()) to the first index item but leave the second one untouched. In the interest of compact code, you could also write:

return [lSysCompute(lSysGenerate(axiom, order-1, length)[0]), lSysGenerate(axiom, order-1, length)[1]]

But this is difficult to read, and it also implies an additional and unnecessary computation. Be nice to other people, and make your code easy to read.

The last modification that we need to make is in the arguments of the draw() function, since we are now accessing two index items from a list - r[0] == theorem and r[1] == length. Here is the complete code:

import string
import turtle

def lSysGenerate(axiom, order, length):
    if order == 0:
        return [axiom, length]
    else:
        length //= 3
        r = lSysGenerate(axiom, order-1, length)
        return [lSysCompute(r[0]), r[1]]

def lSysCompute(theorem):
    d = {'A': 'A-A++A-A'}
    return ''.join([d.get(c) or c for c in theorem])

def draw(t, theorem, length, angle):
    for c in theorem:
        if c in string.ascii_letters:
            t.fd(length)
        elif c == '-':
            t.left(angle)
        elif c == '+':
            t.right(angle)

def main():
    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')

    t.color('orange')
    t.pensize(1)
    t.penup()
    t.setpos(-500, 0)
    t.pendown()
    t.speed(0)

    axiom = 'A'
    length = 1000
    angle = 60
    order = 4

    r = lSysGenerate(axiom, order, length)
    draw(t, r[0], r[1], angle)

    wn.exitonclick()

main()

You can quickly test this by writing a little for loop outside of main() that overlaps each order of the curve on top of the last, using a different pen color:

def main(order):
    t = turtle.Turtle()
    wn = turtle.Screen()
    wn.bgcolor('black')

    colormap = ['red', 'orange', 'yellow', 'green', 'blue', 'violet', 'white']

    t.color(colormap[order])
    t.pensize(3)
    t.penup()
    t.setpos(-1250, -600)
    t.pendown()
    t.speed(0)

    axiom = 'A'
    length = 1000
    angle = 60

    r = lSysGenerate(axiom, order, length)
    draw(t, r[0], r[1], angle)

order = 7
for i in range(order):
    main(i)

You’ll see that the code applies to any combination of alphabet, axiom and rules of production, although with varying and sometimes surprising results.

1.16.3 Recursively finding a target string

Exercise: For a given L-system, find whether a string target exists as part of the system after a certain number of iterations. Write a recursive solution that returns the order where target appears, as well as its index position in the string as it exists at that order.

For example, say that we want to find at what point target = 'BAABAABA' first appears in Lindenmayer's original system. After running the function, the output should print target BAABAABA is at index position 6 at order 7. If target isn't found, the output should be target BAABAABA not found.

Solution: Given the generative nature of L-systems, keep in mind that target may show up in one order but, thanks to the next application of the rules of production, be substituted out at the next! By the same token, you have to write your solution so that what's returned is the first time target is found, not the most recent. I mention this because recursion tends to favor retrieval of values at two points: at the base case, and at the end of the entire recursive cascade. It's more difficult to capture values during recursion. One way we did this was with mergeSort(), where we called an outside function for input that was then passed into the recursive cascade. We used this same technique in the solution to the first exercise above, where we called lSysCompute() at every return statement of lSysGenerate(). Here is a somewhat different approach, where we don't have to invoke another function. Instead, we'll store the values


as additional parameters of the recursive function itself. Let’s start with the solution we came up with for the previous exercise. Since we’re only interested in finding a string in the theorem, we can use Lindenmayer’s original system and dispense with a drawing component.

def lSysGenerate(axiom, order):
    if order == 0:
        return axiom
    else:
        return lSysCompute(lSysGenerate(axiom, order-1))

def lSysCompute(lString):
    d = {'A': 'AB', 'B': 'A'}
    return ''.join([d.get(c) or c for c in lString])

def main():
    axiom = 'A'
    order = 8
    print(lSysGenerate(axiom, order))

main()

>>> ABAABABAABAABABAABABAABAABABAABAABABAABABAABAABABAABABA

Great. We already know that we have to pass another argument to lSysGenerate() - the string that we’re looking for. Let’s call it target.

def lSysGenerate(axiom, order, target):
    if order == 0:
        return axiom
    else:
        return lSysCompute(lSysGenerate(axiom, order-1, target))

def lSysCompute(lString):
    d = {'A': 'AB', 'B': 'A'}
    return ''.join([d.get(c) or c for c in lString])

def main():
    axiom = 'A'
    order = 8
    target = 'BAABAABA'
    print(lSysGenerate(axiom, order, target))

We also have to unpack the else block in lSysGenerate() so that we can test for the presence of target. If it's found, we want to record that, so we'll need a variable hit that we'll initially set to None. This variable is also returned as part of the function's list, so that it's carried along with everything else.

def lSysGenerate(axiom, order, target):
    hit = None
    if order == 0:
        return [axiom, hit]
    else:
        r = lSysGenerate(axiom, order-1, target)
        if target in r[0]:
            r[1] = r[0].find(target)
        return [lSysCompute(r[0]), r[1]]

At the base case, we return a list, initially ['A', None], so we know that r[0] is the string, and, if target in r[0] is True, then the index position gets recorded in the second list item, r[1]. But we're still missing the exact order (or frame, or iteration) at which this happens. So we'll add another item to our base case's returned list, order.

def lSysGenerate(axiom, order, target):
    hit = None
    if order == 0:
        return [axiom, hit, order]
    else:
        r = lSysGenerate(axiom, order-1, target)
        if target in r[0]:
            r[1] = r[0].find(target)
            r[2] = order
        print(r)
        return [lSysCompute(r[0]), r[1], r[2]]

I snuck in a print(r) statement to see what’s going on here:

>>> ['A', None, 0]
>>> ['AB', None, 0]
>>> ['ABA', None, 0]
>>> ['ABAAB', None, 0]
>>> ['ABAABABA', None, 0]
>>> ['ABAABABAABAAB', None, 0]
>>> ['ABAABABAABAABABAABABA', 6, 7]
>>> ['ABAABABAABAABABAABABAABAABABAABAAB', 6, 8]
>>> ['ABAABABAABAABABAABABAABAABABAABAABABAABABAABAABABAABABA', 6, 8]

Hmmm, so we’re getting the correct index position for target but the value for order isn’t sticking. This is because every frame re-checks to see whether target shows up within the namespace of r[0] for that frame. We need to be able to set things up so that, once target is found, we can stop looking, and preserve both the index position and order as they were in that frame. There is no other way, since, unlike a loop, we can’t just break out of the recursion. To do this, we set up a variable flag. Initially set to False, once target is found, we re-set flag to True and integrate it as a condition for the if block. Once flag == True then the loop is never triggered again, even if target in r[0] continues to be True for succeeding frames. Here’s our code so far:

def lSysGenerate(axiom, order, target, flag):
    hit = None
    if order == 0:
        return [axiom, hit, order, flag]
    else:
        r = lSysGenerate(axiom, order-1, target, flag)
        if target in r[0] and r[3] == False:
            r[1] = r[0].find(target)
            r[2] = order
            r[3] = True
        print(r)
        return [lSysCompute(r[0]), r[1], r[2], r[3]]

This looks pretty good! We are, however, missing one last piece. It's always important to consider edge cases - what if target is part of the axiom itself? As we have written the code so far, we only start checking for target at order == 1, even if it's present in the 0th order. So we have to add a check at the base case. Here is the final, complete code:

def lSysGenerate(axiom, order, target, flag):
    hit = None
    if order == 0:
        if target in axiom:
            hit = axiom.find(target)
            flag = True
        return [axiom, hit, order, flag]
    else:
        r = lSysGenerate(axiom, order-1, target, flag)
        if target in r[0] and r[3] == False:
            r[1] = r[0].find(target)
            r[2] = order
            r[3] = True
        return [lSysCompute(r[0]), r[1], r[2], r[3]]

def lSysCompute(lString):
    d = {'A': 'AB', 'B': 'A'}
    return ''.join([d.get(c) or c for c in lString])

def main():
    axiom = 'A'
    order = 8
    target = 'BAABAABA'
    flag = False
    r = lSysGenerate(axiom, order, target, flag)
    if r[1] is not None:
        print('target', target, 'is at index position', r[1], 'at order', r[2])
    else:
        print('target', target, 'not found')

main()

>>> target BAABAABA is at index position 6 at order 7

This code may not be as elegant or compact as many of the high-level recursive examples. On the other hand, it shows how you can retrieve values while the recursive cascade is still unfolding. Also, by gradually building up the code, I hope I made it a little less intimidating than if I’d simply introduced the final solution straight away.
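As an aside, if all the r[0] ... r[3] index-juggling bothers you, here's one possible refactor - just a sketch, not the only way to do it - that bundles the four returned values into a namedtuple so each field has a name. The logic is the same as the final code above; I've also given flag a default value so the caller doesn't have to pass it, and it assumes lSysCompute() as already defined.

from collections import namedtuple

# Each recursive return carries the theorem plus the search bookkeeping.
Result = namedtuple('Result', ['theorem', 'hit', 'order', 'flag'])

def lSysGenerate(axiom, order, target, flag=False):
    if order == 0:
        # edge case: target may already be part of the axiom itself
        if target in axiom:
            return Result(axiom, axiom.find(target), order, True)
        return Result(axiom, None, order, flag)
    r = lSysGenerate(axiom, order - 1, target, flag)
    if target in r.theorem and not r.flag:
        # first hit: record position and order, then stop looking
        r = Result(r.theorem, r.theorem.find(target), order, True)
    return Result(lSysCompute(r.theorem), r.hit, r.order, r.flag)

r = lSysGenerate('A', 8, 'BAABAABA')
print(r.hit, r.order)   # 6 7

Whether the extra machinery is worth it for four values is a judgment call, but named fields make the post-recursive bookkeeping much easier to follow.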

1.17 Boss Level: The Tower of Hanoi

Finally, we lay siege to the Tower of Hanoi. While I've studied recursion for only a brief time, I've become more and more surprised that many tutorials on the subject include this as only the third or fourth example (the first two are usually factorials and the Fibonacci sequence). In this guide I've gone through dozens of algorithms, and I think the Tower problem is still very difficult to understand, partly because the problem and the solution are both so easily stated. "The code is so short - how hard could it be?" are common Famous Last Words people say when they start digging into the puzzle.

The real learning, as I've tried to emphasize in every section of this guide to recursion, comes from an ability to work through the process by which we come to the solution. So for this puzzle we'll be deriving the answer from first principles, which is more beneficial (and perhaps easier) than reverse-engineering the solution. Then we'll spend some time understanding how that solution actually works - as we'll see, getting the answer and knowing how it works can be two very different things.

First, a little history. The problem was first posed by Édouard Lucas in 1883. Assume n disks are stacked by decreasing size on peg A. There are two other empty pegs, B and C. You have to move the disks from A to C, with the order of stacking preserved. There are two further restrictions:


1. You can only move one disk at a time
2. A larger disk can never be placed on top of a smaller disk

Lucas fancifully set the task to a group of monks in a temple, where they manually move a stack of 64 golden disks from A to C. Upon completing the task, their reward would be the end of the world. Now, executing the minimum number of moves for 64 disks, at one move per second, would take something like 585 billion years. Which means the world will be around for quite a while yet, at least if the disks have to be physically manipulated. It's a good thing they didn't use a computer to avoid any mistakes! (A good case for keeping monks away from computers is made in Arthur C. Clarke's short story The Nine Billion Names of God.)

Before we get started, we need to be more explicit about the rules for moving disks. Our first variation - let's call it simpleHanoi() - will allow us to move a disk straight from peg A to peg C, even if we couldn't land it on peg B because a smaller disk is sitting there. The second, stricter variant allows us to move disks only to a neighboring peg. We'll get to strictHanoi() after we've sorted out simpleHanoi().
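Incidentally, that 585-billion-year figure is easy to verify with a back-of-the-envelope calculation (my own check, ignoring leap years), using the minimum move count we'll derive below:

moves = 2**64 - 1                      # minimum moves for 64 disks
seconds_per_year = 60 * 60 * 24 * 365
print(moves / seconds_per_year)        # ~5.85e11, or about 585 billion years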

1.17.1 Put the algorithm in your hands

While I’ll be discussing the problem in Lucas’s original terms of disks and pegs, you can physically model the Tower problem with any group of objects of increasing size - say, a stack of books. In fact, I strongly recommend that you do this right now. It’s not often that you can hold the steps of an algorithm in your hands. Let’s start with a stack of n = 3 disks as our example, using the simpleHanoi() variant. How many steps does it take? 14? 11? You should be able to get it down to 7 fairly quickly. Programmers are interested in efficiency, so we’re really only concerned with the minimum number of moves. So we can better keep track, let’s also number the disks, starting with 1 for the smallest, top disk, and ending at n for the largest, bottom disk (the reason for doing it this way, as opposed to calling the smallest disk n and the largest disk 1, will become apparent when we get to the function itself). The first thing to notice about the 7-step solution is that, at step 4, we move disk 3 from peg A to peg C. This is the first disk that is permanently in the right spot. What’s more, we only have to move it once! Also, the remaining disks are all on peg B. The next three moves simply stack the remaining disks on peg C, using A as a temporary peg. Moving on to a stack of n = 4, some practice should get you to a minimum number of 15 steps. Note that the same pattern obtains: at move 8, the (largest, bottom) disk 4 can jump from peg A to peg C because disks 1-3 are all on peg B. This allows us to hypothesize that, regardless of the size of n, at the midpoint of the minimum number of steps, we should be able to move disk n from A to C. When you think about it, the corollary that “all remaining disks must be stacked on peg B” is simply the logical pre-requisite to this. So we could say that, in order for us to move disk 4 from A to C, what we really want to do is build the stack of all disks but 1 on peg B. To achieve this, we have to clear the way for disk 3 to move from A to B. You can see there is a strategy slowly emerging here, or at least a way for us to know we’re on the right track. But how do we know what the minimum number of steps is for any given n? Let’s look at n = 5 next. Instead of solving the entire puzzle for 5 disks, just determine how many steps you need to get to the theoretical midpoint, where disk 5 makes its only move from A to C. A bit more practice will reveal that this occurs on move 16. So by move 15, we have our stack of all remaining disks on peg B and disk 5 is ready to make its big jump. From there it’s easy to conclude that the minimum number of moves for n = 5 is 31. If we tabulate our observations for the non-strict variant of the Tower simpleHanoi(), we have the following number of moves for the first few values of n:

         midpoint   total
n == 1       0         1
n == 2       2         3
n == 3       4         7
n == 4       8        15
n == 5      16        31

From this we can extrapolate that, for n disks, the minimum number of moves works out to (2**n) - 1, and that we should hit our midpoint at (2**n) // 2. It looks like we have a handle on the scope of the problem. It's possible to extend our heuristic thinking a bit further:

• We can't move the largest disk to peg C until it's the only disk on peg A, and peg C is empty
• In order for that to be true, all remaining disks must be stacked, in the correct order, on peg B
• In order to do that, the next-largest disk must be at the bottom of peg B
• To do this, we can only move that disk when it is at the top of peg A and peg B is empty
• Therefore all disks smaller than that disk must be stacked on peg A

You have probably already recognized this "I can't do 'x' until I do 'y', and I can't do 'y' until I do 'z'" sort of talk as a euphemism for recursion. But do we have any guidance for how the recursion might be structured? Well, we do have an important clue, which is that the minimum number of moves is (2**n) - 1. If you worked through the mergeSort() example in Recursive Approaches To Searching And Sorting, you'll know that (2**n) - 1 represents a binary tree structure. An initial root node generates two nodes of its own. In turn, each node creates two nodes of its own. This multiplication continues until something ends the tree, resulting in leaves (also known as terminal nodes).

The tree we saw in mergeSort() was created using the two recursive calls. Every time mergeSort() called itself, we would eventually open two frames, one for mergeSort(L[:mid]) and another for mergeSort(L[mid:]). Here's the tree and code for the example in the referenced section.

def mergeSort(L):
    if len(L) == 1:
        return L
    else:
        mid = len(L)//2
        left = mergeSort(L[:mid])
        right = mergeSort(L[mid:])
        return merge(left, right)

In the case of mergeSort() the tree was created by halving the list recursively into left and right sublists, until a sublist had len(L) == 1, which was the condition for 'leafiness'. With our solution to the Tower of Hanoi it will be a little more subtle. But this discussion shows that we have another piece of the puzzle: if simpleHanoi() can be modeled as a binary tree, then it will be represented in code as a function with two recursive calls.

Let's make a few more observations. The move of the next-largest (or n - 1) disk from peg A to peg B occurs at the midpoint between the start of the puzzle and the move where the largest disk jumps to C. (The '1 to C' column is simply a restatement of the last move made overall.)

        n-1 to B   n to C   1 to C
n = 1       0         0        1
n = 2       1         2        3
n = 3       2         4        7
n = 4       4         8       15
n = 5       8        16       31

And when it comes to taking disk 2 from peg B to its destination peg C, there is a similar symmetry at work:


Fig. 14: Figure 1. mergeSort() call tree for L = [8, 4, 1, 6, 5, 9, 2, 0, 3]


        n-1 to B   n to C   n-1 to C   1 to C
n = 1       0         0         0         1
n = 2       1         2         3         3
n = 3       2         4         6         7
n = 4       4         8        12        15
n = 5       8        16        24        31
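These milestones all follow directly from the powers of 2. Here's a quick check (my own, for n >= 2 - n = 1 is the trivial single-move case) that regenerates the table's rows:

for n in range(2, 6):
    print(n,
          2**(n-2),        # move where disk n-1 lands on B
          2**(n-1),        # midpoint: disk n jumps to C
          3 * 2**(n-2),    # move where disk n-1 lands on C
          2**n - 1)        # final move: disk 1 to C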

So far, we’ve established the minimum number of steps for a given n and abstracted it into a formula. We’ve also intuited some basic structural characteristics for a number of instances of the puzzle, all of which seem consistent. The next logical step is to look at the sequence of specific moves. We know that n = 1 is a trivial example that can be stated as ‘Move disk 1 from A to C’. Let’s look at n = 2: 1. Move disk 1 from A to B (disk n - 1 to B) 2. Move disk 2 from A to C (disk n to C, midpoint) 3. Move disk 1 from B to C (disk n - 1 to C) The pattern becomes a bit clearer with n = 3: 1. Move disk 1 from A to C 2. Move disk 2 from A to B (disk n - 1 to B) 3. Move disk 1 from C to B 4. Move disk 3 from A to C (disk n to C, midpoint) 5. Move disk 1 from B to A 6. Move disk 2 from B to C (disk n - 1 to C) 7. Move disk 1 from A to C If we have n = 4 we can see some complexity beginning to arise but our basic structure of midpoints holds: 1. Move disk 1 from A to B 2. Move disk 2 from A to C 3. Move disk 1 from B to C 4. Move disk 3 from A to B (disk n - 1 to B) 5. Move disk 1 from C to A 6. Move disk 2 from C to B 7. Move disk 1 from A to B 8. Move disk 4 from A to C (disk n to C, midpoint) 9. Move disk 1 from B to C 10. Move disk 2 from B to A 11. Move disk 1 from C to A 12. Move disk 3 from B to C (disk n - 1 to B) 13. Move disk 1 from A to B 14. Move disk 2 from A to C 15. Move disk 1 from B to C


We can see that there are other patterns at work here and really begin to appreciate how a (2**n) - 1 tree expands. In this view, you can see that every successive value of n takes the previous sequence of n - 1, concatenates n, and then concatenates the n - 1 sequence again:

n = 1   1
n = 2   121
n = 3   1213121
n = 4   121312141213121
n = 5   1213121412131215121312141213121

Another view shows how the series expands: all existing values are incremented by 1, and then a new element (always a 1) is inserted between every move of the preceding series, and at either end:

n = 1                                  1
n = 2                                1 2 1
n = 3                            1 2 1 3 1 2 1
n = 4                    1 2 1 3 1 2 1 4 1 2 1 3 1 2 1
n = 5    1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1

Of course, these are just two different views of the same series expansion. But it helps to be able to think about things from several angles, and can also lead to insights about how to solve the problem differently.
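If you'd like to convince yourself that the two views agree, here's a short sketch (my own illustration, with hypothetical helper names series() and series2()) that generates the disk-number series both ways and compares them:

# View 1: series(n) = series(n-1) + [n] + series(n-1)
def series(n):
    if n == 0:
        return []
    return series(n - 1) + [n] + series(n - 1)

# View 2: increment every value in the previous series by 1,
# then insert a 1 between every element and at both ends.
def series2(n):
    s = []
    for _ in range(n):
        bumped = [x + 1 for x in s]
        s = [1]
        for x in bumped:
            s += [x, 1]
    return s

print(series(4))                  # [1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1]
print(series(4) == series2(4))    # True

The kth entry tells you which disk moves on step k, so either view reproduces the tables above.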

1.17.2 Filling in the binary tree

Since our hunch about the binary tree structure is bearing out, let's write out the steps in terms of that form. Also, by now you may have concluded that, if we stick to the minimum number of steps, there is only one possible sequence for any given n. This implies that there is only one correct way of populating the tree - another hint that our recursive approach will work out, since recursion is exhaustive by nature.

As the comparison of steps shows, 'Move disk n from A to C' will always occur at the midpoint, so that move occupies the 'root node' of the tree, with an equal number of moves to either side of it. Even though n = 1 is our trivial example, it's always a good place to start:

Fig. 15: Figure 2. Call tree for n = 1

I’m using the circles to designate frames, and diamonds to show when the step actually gets executed. We know that a tree is populated from its initial, root node, and execution of a tree goes from left to right, so rendering n = 2 isn’t too difficult: For n = 2, both frame creation and step execution move in the same direction, so I guess you could draw this diagram as three boxes in a single line. However, things get complicated with n = 3, so it’s more appropriate to consider the two disk 1 nodes as ‘children’ of the disk 2 ‘parent’. For n = 3, keep in mind that multiple recursion behaves in a depth-first fashion. So we can expect the first move to be represented by the left-most bottom leaf, and the final move to occupy the right-most bottom leaf.


Fig. 16: Figure 3. Call tree for n = 2

Fig. 17: Figure 4. Call tree for n = 3


The n = 3 tree clearly shows how the order of execution is developing. The further down the tree we go (ie, the smaller the disk, or the smaller the n), the more moves are required of it. The way each disk occupies its own ‘level’ of the tree is also consistent. This will become very handy when we finally look at designing our recursive calls. Finally, the tree for n = 4 really illustrates the ‘rhythm’ of expansion. We dip down to the leftmost leaf, and execute the triangle of calls of frames 3-5, then back to frame 2, then the trio of frames 6-8. Having completed the left side of the tree, we execute the root node, and then move on to the right side.

Fig. 18: Figure 5. Call tree for n = 4

We’re almost ready to design our recursive calls. But before we do that, I want to point out one difference that you may have noticed between simpleHanoi() and mergeSort(): the order of execution is altered. In mergeSort(), for example, frame 2 (where L == [8, 4, 1, 6]) splits its list into [8, 4] for frame 3, and [1, 6] for frame 6. The two called frames are executed first, followed by the calling frame. In simpleHanoi(), we execute the first called frame, then the calling frame, and lastly the second called frame - another thing we’ll certainly have to take this into account. The other thing to consider is exactly what kind of data we are providing as our solution. I’ve been presenting the puzzle in a very text-heavy way (eg, ‘Move disk 3 from A to B’). This is because most implementations present the print statements as the solution. Once the program is finished, all that you have to show for it is what’s on the screen. Admittedly, this is a little bizarre. With the exception of our drawn fractals, we have so far worked with algorithms that provided us with results that we could then send to other functions. I’ll continue to develop the Tower of Hanoi example as it’s commonly done, but it’s certainly possible to repurpose the code so that it isn’t quite so self-contained.

1.17.3 Designing the recursive calls

Now that we’ve successfully built a model of how the puzzle works, we want to think about how a function could produce an output that matches the order of steps in our binary tree. How do we fit the recursive calls to the model? What do we have to cook with? Not much, it seems - n, A, B and C. But it’s enough. Also, thanks to our construction of both the binary tree and the sequence of moves, we know simpleHanoi() will have: 1) Two recursive calls 2) A way to decrement n so that we can address all ‘levels’ of n 3) The order of execution must be ‘called frame/calling frame/called frame’, versus the ‘called frame/called frame/calling frame’ implementation we had in mergeSort() 4) ‘Move disk n from A to C’ must occur at the midpoint of program execution At this point, let’s see if we can use these guidelines to simply just recreate the text output for n = 2:


Move disk 1 from A to B
Move disk 2 from A to C
Move disk 1 from B to C

To get this result, we could draft a totally fake, non-recursive function:

def simpleHanoi(n):
    print('Move disk', n-1, 'from A to B')
    print('Move disk', n, 'from A to C')
    print('Move disk', n-1, 'from B to C')

n = 2
simpleHanoi(n)

>>> Move disk 1 from A to B
>>> Move disk 2 from A to C
>>> Move disk 1 from B to C

This addresses most of the four points above:

1. We have two recursive calls.
2. n - 1 implies where our recursive calls should be placed.
3. By sandwiching the print statement between the two calls, we guarantee that we will follow the 'called frame/calling frame/called frame' order of execution.
4. This sandwiching also ensures that the move of disk n from A to C occurs at the midpoint. And by recursion, we can assume that, at greater values of n, this symmetry will hold, since the first recursive call represents the left side of the tree, and the second the right.

Obviously this code, in addition to not being recursive, doesn't really work for anything other than n = 2. In fact, if recursion is to do all the work, the only print statement we can have is the one in the middle. The recursive statements will have to arrange the function's variables - A, B, C - so that the correct print statement is executed for n, n - 1, ..., until n == 0. And all in the correct order! Obviously, this means we cannot hard-code the string from A to C, because that will be correct for only a tiny number of moves.

How can we create the flexibility we need? While we were deriving the steps needed to move the disks in the correct sequence, you may have noticed that any given step only ever uses two pegs. This implies that each recursive call should specify both the peg that holds the disk, and the peg that is the disk's destination. The third peg doesn't need to be specified, as it's not at all part of the move. Staying with n = 2, it's simple to restate our fake code to include our peg variable names:

def simpleHanoi(n, A, C, B):
    if n > 0:
        simpleHanoi(n-1, A, B, C)
        print('Move disk', n, 'from', A, 'to', C)
        simpleHanoi(n-1, B, C, A)

n = 2
simpleHanoi(n, 'A', 'C', 'B')

>>> Move disk 1 from A to B
>>> Move disk 2 from A to C
>>> Move disk 1 from B to C

I’ve made only four modifications to the code:


1. I’ve literally translated the first and third print statements into recursive calls (keep in mind that we still need to pass the correct number arguments originally defined in the function, so even though they’re not mentioned in the print statements, we still have to include C and A as arguments in their respective calls). 2. In order to make this literal translation possible, the initial parameters in the function definition (and the argu- ments passed when simpleHanoi() is first invoked) are (n, A, C, B) 3. Instead of hard-coded text, the middle print statement now calls parameters A and C, whose values are, unsur- prisingly, A and C. 4. The function’s contents is now inside an if block. Otherwise the first recursive call will keep decrementing n into negative numbers until the maximum recursion depth is exceeded, and we won’t ever see a single print statement. Ok, so we’ve translated a fake function into something that provides the same output for the simplest possible case that can use recursion. However, here is the remarkable thing: run this version of simpleHanoi() for any value of n and you’ll see that it yields the correct sequence. Somehow, we’ve solved the problem. How the hell did this happen?

1.17.4 Why does it work?

The short answer is thanks to recursion. That is, if it works for the simplest possible binary tree, it should work for a binary tree of any size. This isn’t an exclusive property of binary trees, though. Recall with sierpinski(), once we’d solved the base case and the minimally recursive case, we’d also solved the problem for any order fractal we wanted. You may find this explanation unsatisfying, so let’s look more closely at how simpleHanoi() generates all the right calls, in the right order. Another perspective might be helpful. Up until this point, I haven’t at all mentioned the base case. Is there one? Of course, but not where you might think it to be. Go back to the section on the Sierpinski triangle, and you’ll see a similar construction:

def sierpinski(t, order, p):
    draw(t, p)
    if order > 0:
        pass   # insert recursive magic here

With sierpinski() we wanted to be sure that we drew the triangle at the given coordinates p even if the if block never triggered. This always gave us a result, even if order == 0. In this sense, the base case for sierpinski() is the triangle described by p. It's the simplest statement of the problem. Similarly, the simplest statement of the problem for simpleHanoi() is 'Move disk 1 from A to C', where n = 1. We enforce this by making the execution of the entire function subject to the condition n > 0. In the case of n = 1, both recursive calls send n == 0 to new frames. Since neither new frame triggers the if block, the program exits these frames without any further action. All we're left with is print('Move disk', n, 'from', A, 'to', C). From this, we can generalize that the base case of any recursive treatment of a binary tree is that tree's root node.

For all other cases where n > 1, we trigger recursive calls that have to do actual work. If n > 1 and 'Move disk n from A to C' is always the midpoint, then the left side of the tree finishes when all disks are stacked on peg B, and all calls on the right side of the tree are dedicated to getting the rest of the disks from B to the destination peg, C.

The technique behind these calls should look familiar to you. If it doesn't, go back to the example of the greatest common divisor in the section Recursion and Swapped Arguments. But while gcdRecur() swapped two parameters in a recursively linear context, simpleHanoi() is more complex, swapping three parameters over two recursive calls. This swapping is by far the most difficult thing to understand about simpleHanoi(), so let's take a closer look at how it works. Let's begin with frame 1 for n = 2. There's a lot to keep track of, but keep in mind one of our heuristics:


a frame holds the values of all the parameters and variables for that frame. Those values don't change, either, unless we explicitly bind another value to one of those variables.

What does get changed is the content/arrangement of the arguments for each recursive call. So far we have mostly been breaking down (or augmenting) lists, passing on boolean values, etc. Here all we're doing is taking the parameters of the calling function and swapping them to form the arguments for the called function. That's the trick: once the recursive call is made from a given frame, the swapped values then become the order for the newly called frame. When we return to the calling frame, we re-use the values as they exist in the calling frame to populate the swapped arguments for the second call.

In the abstract this makes sense, but in practice it gets tricky rather quickly, so I'll modify the code to label pegs A, B and C with source (src), temp (tmp) and destination (dst):

def simpleHanoi(n, src, dst, tmp):
    if n > 0:
        simpleHanoi(n-1, src, tmp, dst)
        print('Move disk', n, 'from', src, 'to', dst)
        simpleHanoi(n-1, tmp, dst, src)

n = 2
simpleHanoi(n, 'A', 'C', 'B')

It’s easy to follow the arguments A, C and B as they’re passed to the function and assigned to parameters src, dst and tmp. These, along with n, are used explicitly in the print statement. For the recursive statements, the swapping alters which term is bound to a given peg:

frame 1 function definition:
1) src = 'A'
2) dst = 'C'
3) tmp = 'B'

1st recursive call:
1) src --> src or 'A' --> 'A'
2) dst --> tmp or 'C' --> 'B'
3) tmp --> dst or 'B' --> 'C'

frame 2 function definition:
1) src = 'A'
2) dst = 'B'
3) tmp = 'C'

2nd recursive call:
1) src --> tmp or 'A' --> 'B'
2) dst --> dst or 'C' --> 'C'
3) tmp --> src or 'B' --> 'A'

frame 3 function definition:
1) src = 'B'
2) dst = 'C'
3) tmp = 'A'

Here is an expanded version of our n = 2 call diagram, where I've written the actual values into the function definitions and recursive calls. You can now see clearly how frame 1's first recursive call. . .

simpleHanoi(1, src='A', tmp='B', dst='C')

. . . sets the parameters for frame 2’s function definition:


Fig. 19: Figure 6. Call tree for n = 2 with complete function definitions

def simpleHanoi(1, src='A', dst='B', tmp='C'):
    # frame 2 stuff

By the same token, frame 1's second recursive call. . .

simpleHanoi(1, tmp='B', dst='C', src='A')

. . . sets the parameters for frame 3's function definition:

def simpleHanoi(1, src='B', dst='C', tmp='A'):
    # frame 3 stuff

Since the frames where n == 0 don't do any work, I don't include them in the diagram. You could say they are 'ghost frames', because they just open and close, returning program control to the calling frame. Still, these frames are vital, because it's at this point that the recursion 'turns around'. In a sense, these frames exert the pressure the program needs to bubble back up the call stack.

You can see that a diagram that lists the complete function with all variables, parameters and arguments will get quite big quite quickly. So if you still don't trust my explanation, I invite you to expand it out to cover n = 3 or even n = 4. Eventually, I believe you'll agree with me.

In the next section [forthcoming], we'll look at another way of solving the Tower of Hanoi problem. We'll use the strict version, which won't allow us to jump over the middle peg. It will be a little more laborious, but will also lead to some very surprising results.
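Before we get there, here's a small print-traced sketch (my own instrumentation, not part of the puzzle's solution) that makes the ghost frames visible - frames called with n == 0 open, do nothing, and immediately hand control back. The depth-based indentation also shows where each move sits in the call tree:

def tracedHanoi(n, src, dst, tmp, depth=0):
    pad = '  ' * depth
    if n == 0:
        # a 'ghost frame': opens, does nothing, turns the recursion around
        print(pad + 'ghost frame (n == 0), turning around')
        return
    tracedHanoi(n - 1, src, tmp, dst, depth + 1)
    print(pad + 'Move disk ' + str(n) + ' from ' + src + ' to ' + dst)
    tracedHanoi(n - 1, tmp, dst, src, depth + 1)

tracedHanoi(2, 'A', 'C', 'B')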

1.17.5 Heuristics and Exercises

Gathering as much information about a problem as you can has multiple benefits. As you develop your knowledge you can begin to see patterns and clues, which can inform the design of a recursive (or any other) solution.

• If a recursive solution works for a binary tree of the smallest non-trivial size, there's a good chance it will work for a binary tree of any size.
• Swapping arguments is a powerful method for covering all contingencies of a recursive scenario - as long as you can keep track of what is going on.
• You can control the order in which nodes are executed by changing the order of statements in the recursive function (ie, where the recursive calls are, in relation to the node's statements).
• We can write recursive functions in such a way that, at the leaf level of a binary tree, nothing happens (no statement is executed). This is very useful when you just need to 'turn around' the recursive cascade.


• The base case of a binary tree is always the root node.

Exercise: You've shown simpleHanoi() to your friends but they still don't believe you, because everything is "just print statements" and there's "no real data". Following the usual rules, modify (or re-write) simpleHanoi() so that you begin with three lists. . .

A = [4, 3, 2, 1]
B = []
C = []

. . . and end up with. . .

A = []
B = []
C = [4, 3, 2, 1]

Your (recursive!) solution should provide for a way to store each step in some sort of data structure that will be returned by the function when it finishes executing. Now try to modify your solution to use a dictionary that's initially defined as. . .

d = {'A': [4, 3, 2, 1], 'B': [], 'C': []}

. . . and ends up in the following state:

d = {'A': [], 'B': [], 'C': [4, 3, 2, 1]}

Exercise: Can you write a recursive solution to the Tower of Hanoi as a variation on our expansion of Pascal’s triangle? How about as an L-system? For this exercise, only reproduce which disk is moving for each step. (Hint: refer back to the part where I discuss the expansion of the series for various values of n).

CHAPTER 2

Indices and tables

• genindex
• modindex
• search
