Interesting Topics
for Bachelor Theses
Walter M. Böhm
Institute for Statistics and Mathematics, Vienna University of Economics
[email protected]
October 4, 2018

The picture on the title page is an artwork by Jakob Böhm, www.jacob-julian.com
The other pictures used in this book are from Wikimedia Commons and the MacTutor History of Mathematics Archive maintained at the University of St Andrews.
Foreword
This booklet is a collection of topics which I prepared over time for my students. The selection of topics reflects my personal interests and is therefore biased towards combinatorial mathematics, probability and statistics, operations research, scheduling theory and, yes, the history of mathematics. It is in the nature of things that the level of difficulty varies from topic to topic: some are technically more demanding, others somewhat easier. The only prerequisites for mastering these topics are courses in mathematics and statistics at an undergraduate university level; otherwise, no special prior knowledge of mathematics is required. What is needed, of course, is a serious interest in mathematics.
How is a Topic organized? Each topic consists of three parts:
(1) An Invitation Of course, the major purpose of this invitation is to raise your interest and to draw your attention to a problem which I found very interesting, attractive and challenging. Further, in each invitation I introduce some basic terminology so that you can start reading basic literature related to the topic.
(2) Where to go from here Some of my invitations are more detailed than others, depending on the topic, so you may ask yourself: Is there anything left for me to do? Yes, there is a lot of work still to be done. The second section of each topic contains questions and problems which you may study in your thesis. This list is by no means exhaustive, so there is enough opportunity to unleash your creative potential and hone your skills. For some topics I explicitly indicate issues of general interest; these are points which you should discuss in your thesis in order to make it more or less self-contained and appealing to readers not specialized in the topic. And sometimes there is also a section What to be avoided: here I indicate aspects and issues related to the topic which may lead too far afield or are technically too difficult.
(3) An annotated bibliography This is a commented list of interesting, helpful and important books and journal articles.
This book is not yet finished and probably never will be.
You are free to use this material, though a proper citation is appreciated.
Contents

1 Recreational Mathematics
  1.1 An Invitation
    1.1.1 The Challenge of Mathematical Puzzles
    1.1.2 Some Tiny Treasures From My Collection
  1.2 Where to go from here
  1.3 An annotated bibliography
  1.4 References

2 Shortest Paths in Networks
  2.1 An Invitation
    2.1.1 The problem and its history
    2.1.2 Preparing the stage - graphs, paths and cycles
    2.1.3 Weighted graphs
    2.1.4 Solvability
    2.1.5 It's time to relax
    2.1.6 A sample run of the Bellman-Ford Algorithm
    2.1.7 The complexity of the Bellman-Ford Algorithm
  2.2 Where to go from here
    2.2.1 Issues of general interest
    2.2.2 Some more suggestions
    2.2.3 To be avoided
  2.3 An Annotated Bibliography
  2.4 References

3 The Seven Bridges of Königsberg
  3.1 An Invitation
    3.1.1 Euler's 1736 paper
    3.1.2 Königsberg and a puzzle
    3.1.3 Euler takes notice of the puzzle
    3.1.4 Euler's solution
    3.1.5 What happened to the problem later?
    3.1.6 An epilog: Königsberg and its bridges today
    3.1.7 The Chinese Postman Problem
  3.2 Where to go from here
    3.2.1 Issues of general interest
    3.2.2 Some more suggestions
  3.3 An Annotated Bibliography
  3.4 References

4 The Chains of Andrei Andreevich Markov - I
  4.1 An Invitation
    4.1.1 The Law of Large Numbers and a Theological Debate
    4.1.2 Let's start with a definition
    4.1.3 Example 1: Will We Have a White Christmas This Year?
    4.1.4 Example 2: Losing Your Money - Delinquency Of Loans
  4.2 Where to go from here
    4.2.1 Make up your mind - absorbing or regular chains?
    4.2.2 Google's PageRank Algorithm
    4.2.3 Credit Ratings
    4.2.4 Generating Random Text, maybe Bullshit
    4.2.5 Other Applications
  4.3 An Annotated Bibliography
  4.4 A note on software
  4.5 References

5 The Chains of Andrei Andreevich Markov - II
  5.1 An Invitation
  5.2 An Annotated Bibliography
  5.3 References

6 Benford's Law
  6.1 An Invitation
    6.1.1 Simon Newcomb and the First Digit Law
    6.1.2 The significand function
    6.1.3 Benford's Law and the uniform distribution
    6.1.4 The general digit law
    6.1.5 Testing the Hypothesis
    6.1.6 Remarkable Properties of Benford's Law
  6.2 Where to go from here
    6.2.1 Statistical Forensics
    6.2.2 Experimental Statistics
  6.3 An Annotated Bibliography
  6.4 References

7 The Invention of the Logarithm
  7.1 An Invitation
    7.1.1 A personal remembrance
    7.1.2 Tycho Brahe - the man with the silver nose
    7.1.3 Prostaphaeresis
    7.1.4 John Napier and Henry Briggs
  7.2 Where to go from here
    7.2.1 Historical Issues
    7.2.2 Technical Issues
  7.3 An Annotated Bibliography
  7.4 References

8 Exercise Number One
  8.1 An Invitation
    8.1.1 Exercise number one
    8.1.2 Partitions of integers
    8.1.3 Partitions with restricted parts
    8.1.4 Generating functions
  8.2 Where to go from here
    8.2.1 Issues of general interest
    8.2.2 Some more suggestions
  8.3 An Annotated Bibliography
  8.4 References

9 The Ubiquitous Binomial Coefficient
  9.1 An Invitation
    9.1.1 The classical binomial theorem
    9.1.2 Pascal's triangle
    9.1.3 Newton's binomial theorem
    9.1.4 Binomial sums
  9.2 Where to go from here
  9.3 An Annotated Bibliography
  9.4 References

10 Prime Time for a Prime Number
  10.1 An Invitation
    10.1.1 A new world record
    10.1.2 Why primes are interesting
    10.1.3 Primes and RSA-encryption
    10.1.4 Really big numbers
    10.1.5 Mersenne numbers
    10.1.6 Primality testing
    10.1.7 Generating prime numbers
    10.1.8 Factoring of integers
  10.2 Where to go from here
    10.2.1 Computational issues
    10.2.2 Issues of general interest
    10.2.3 Some more suggestions
    10.2.4 What to be avoided
  10.3 An Annotated Bibliography
  10.4 References

11 Elementary Methods of Cryptology
  11.1 An Invitation
    11.1.1 Some basic terms
    11.1.2 Caesar's Cipher
    11.1.3 Frequency analysis
    11.1.4 Monoalphabetic substitution
    11.1.5 Combinatorial Optimization
    11.1.6 The Vigenère Cipher, le chiffre indéchiffrable
    11.1.7 Transposition Ciphers
    11.1.8 Perfect Secrecy
  11.2 Where to go from here
    11.2.1 Issues of general interest
    11.2.2 Some more suggestions
    11.2.3 What to be avoided
  11.3 An Annotated Bibliography
  11.4 References

12 Parrondo's Paradox
  12.1 An Invitation
    12.1.1 Favorable and unfavorable games
    12.1.2 Combining strategies
  12.2 Where to go from here
    12.2.1 Issues of general interest
    12.2.2 Some more suggestions
  12.3 An Annotated Bibliography
  12.4 References

13 Runs in Random Sequences
  13.1 An Invitation
    13.1.1 Some remarkable examples
    13.1.2 Important random variables related to runs
    13.1.3 Methodological Issues
  13.2 Where to go from here
    13.2.1 Issues of general interest
    13.2.2 Some more suggestions

14 The Myriad Ways of Sorting
  14.1 An Invitation
    14.1.1 Some basic terminology
    14.1.2 An example: selection sort
    14.1.3 Merging
    14.1.4 Divide and Conquer
  14.2 Where to go from here
    14.2.1 Issues of general interest
    14.2.2 Some more suggestions
  14.3 An Annotated Bibliography
  14.4 References

15 Women in Mathematics
  15.1 An Invitation
    15.1.1 Headline news
    15.1.2 Emmy Noether
    15.1.3 Other remarkable women
  15.2 Where to go from here
    15.2.1 What to be avoided
    15.2.2 What you should do
    15.2.3 A final remark on style
  15.3 An Annotated Bibliography
  15.4 References
Topic 1
Recreational Mathematics
A Contradiction in Terms?
Mathematics is too serious and, therefore, no opportunity should be missed to make it amusing. Blaise Pascal¹
Keywords: exciting puzzles, mathematical riddles and mysteries, recreational mathematics
1.1 An Invitation
The subtitle of this topic suggests that mathematics and recreation do not fit together nicely. Mathematics is generally considered a hard and dry business; how can that be reconciled with enjoyable activities like relaxation and recreation? Martin Gardner (1914–2010) writes in the foreword of one of his wonderful books (Gardner, 1959b): It's the element of play which makes recreational mathematics recreational, and this may take many forms, be it solving a puzzle, a magic trick, a paradox, a fallacy, an exciting game. And it's the delight and intellectual pleasure we experience when having solved a difficult puzzle. For this reason it should not come as a surprise that even the most brilliant scientists could not resist the temptations of recreational mathematics. Indeed, as a connoisseur of puzzles and the like you are in the best company: visitors to Albert Einstein, for instance, reported that his bookshelf always had a section stocked with mathematical puzzles and games.
¹ Cited from Petkovic (2010).
1.1.1 The Challenge of Mathematical Puzzles
Mathematical puzzles have been a passion of mine since childhood, and I never missed an opportunity to get my hands on an apparently new one. Unfortunately, some fifty years ago those opportunities were rarer than they are today. Now we have the world wide web, and finding new and interesting puzzles is very easy. But in those days one had to rely mostly on newspapers and magazines. Some of them, the better ones, had puzzle corners in their weekend editions. There, besides the obligatory big weekend crossword, you could find various picture puzzles, also called rebuses, and even problems with a mathematical flavor. Rebuses were often nothing more than pictures cut at random, and one had to rearrange the snippets properly, a very boring business, recommended only to feeble-minded persons, as Henry E. Dudeney² once remarked. But from time to time one could find nice little gems, wonderful mathematical puzzles, exciting challenges for your mind. Higher mathematical education or scholarship was usually not required to solve these newspaper puzzles, but originality, diligence, patience and some basic understanding of logic were very helpful. Solutions to these puzzles were given only one week later, in the next weekend edition. So either you solved the puzzle and trusted your solution, or you had to wait patiently. In the meantime one could discuss the problem with classmates, friends and even teachers. Others my age collected stamps; I collected interesting puzzles by cutting them out of the newspapers and storing the clippings in folders which, as the years passed, grew bigger and bigger. Eventually it became necessary to bring some order into my collection. So I began to categorize my puzzles into those belonging to arithmetic and geometry, number theory, logic, combinatorics, magic squares, graph theory and probability.
All these are well-known fields of mathematics, though frankly speaking, at the age of fourteen my mathematical knowledge was a quantité négligeable. But this changed by and by, because by solving puzzles I learned quite a lot. Well, not necessarily useful mathematics, in the sense that the acquired knowledge was of much use in our math lessons at school; for instance, knowing how to construct a magic square will be of no help when dealing with problems from elementary analysis. Over time I also realized that there exist special books solely devoted to recreational mathematics, and that some of these books were available in public libraries in Vienna. This opened up a whole new world: I could read the wonderful books of H. E. Dudeney and Martin Gardner, collections of most exciting and challenging mathematical puzzles and games. What makes mathematical puzzles so attractive, not only to me but to so many other people? Certainly, it is problem solving, it's the excitement when working
² Henry E. Dudeney (1857–1930), the famous British grand master of puzzles and mathematical games.
on a puzzle, it's the intellectual pleasure when having solved it. Interestingly, problem solving in the realm of recreational mathematics is not held in the highest esteem among many professional mathematicians, particularly those adhering to the pure doctrine. An exemplary representative was Edmund Landau³. He once coined the somewhat contemptuous term Schmierölmathematik, roughly "lubricating-oil mathematics". This is certainly a fairly extreme view of matters. Indeed, when you perform scholarly research in the literature on recreational mathematics, you will soon find out that creative mathematicians are seldom ashamed of their interest in recreational topics. And regarding problem solving, Andrei A. Markov⁴ once remarked (Basharin, Langville, and Naumov, 2004):
The alleged opinion that studies in classes are of the highest scien- tific nature, while exercises in solving problems are of lowest rank, is unfair. Mathematics to a considerable extent consists in solving problems and together with proper discussion, this can be of the high- est scientific nature while studies in classes might be of the lowest rank.
There's nothing more to be said, I think, except that recreational mathematics is pure mathematics uncontaminated by utility (copyright Martin Gardner (1959b)). While problem solving lies at the heart of mathematical puzzles, it would certainly be wrong to classify every solved mathematical problem as a puzzle; think of hard exam problems, for instance, often solved only by extensive and complicated reasoning and calculation. What we need is some kind of working definition: What is a mathematical puzzle? It is surprisingly difficult to arrive at a definition which finds general acceptance⁵. But there are some distinguishing characteristics of mathematical puzzles which Peter Winkler (2011) has worked out when reviewing the book by Miodrag Petkovic (2010) on famous puzzles of great mathematicians.

• First and foremost: A puzzle is an engaging, self-contained mathematical question.
• A puzzle should have a raison d'être, something that makes it worth thinking about.
• It should be easy to communicate among people.
• No special devices like high-speed computers are required; all you need is paper and pencil, if anything at all.

We may take these characteristics as cornerstones of a more complete and acceptable definition. But for the moment, let's dispense with formalities and look for some recreation and diversion. What would be better suited for this purpose than some fine puzzle?

³ Edmund Landau, 1877–1938, German mathematician.
⁴ Andrei A. Markov, 1856–1922, Russian mathematician; see Topic 4: The Chains of Andrei A. Markov.
⁵ This is actually one of the challenges for you when writing your thesis about this topic.
Please follow me on a short sightseeing tour through my collection.
1.1.2 Some Tiny Treasures From My Collection
Do it with matches
The first object I want to present to you is more or less a warm-up. I owe it to my friend and chess partner Karl Segal (1919–1978). It is one example of a plenitude of puzzles dealing with matches, checkers, coins etc. The famous Moscow Puzzles collected by Boris Kordemsky (1972) have a whole chapter devoted to this class of problems, though this one does not appear there. One evening in 1972 we were sitting in a Viennese chess café playing chess. After an exciting game, which I lost terribly, Karl wanted to cheer me up. He took a napkin and a pencil and drew an equality sign. Then he rummaged in his pockets, finally found a box of matches, took out a few and arranged them on the napkin as shown in Figure 1.1.
Figure 1.1: A puzzle with matches
Karl explained:
Of course, you know how to handle Roman numerals. See, this is a mathematical statement, but obviously it is false, as 7 ≠ 1. You are allowed to move one match, and only one match, so as to make this a true statement. The equality sign must not be touched. So, don't change it into ≠ by moving a match. Can you solve it? It is not too difficult.
I have tried to find out the puzzle's origin, but did not succeed. It was presumably created sometime by some anonymous author. Please give it a try. One hint: this puzzle shares a wonderful property with many other puzzles: the solution lies in an unexpected place.
Cutting a plate of gold
Fallacies belong to the basic repertoire of recreational mathematics. They come in various forms: arithmetical, logical and geometrical fallacies. The latter are particularly interesting and often hidden in puzzles which belong to the class of dissection problems. In practically all books on recreational mathematics you can find the following puzzle, so it is very likely that you have seen it before. Still, I included it in my exhibition because there is much more behind it than a geometrical fallacy. The origin of the puzzle is obscure, but David Singmaster (2014) has found indications that the puzzle may be due to Sebastiano Serlio (1475–1554), an Italian Renaissance architect. In Serlio's 1545 book Libro Primo d'Architettura there occurs a geometrical construction which contains a fallacy similar to our puzzle, but it passed unrecognized by Serlio. Graham, Knuth, and Patashnik (2003, p. 293) report that this puzzle was Lewis Carroll's favorite⁶. Here's the puzzle:
You have a plate of pure gold. It has the shape of a square solid with side length 8 cm and thickness 1 cm. You also have a special high-precision saw to cut the plate, which produces no losses due to cutting. Cut the plate as shown in the figure below (left) and rearrange the pieces as shown below (right). You will find that the new rectangle has an area of 65 cm², whereas the square before cutting had an area of 64 cm². So you have won one cubic centimeter of gold. Can you explain this miracle?
Figure 1.2: Cutting a plate of gold
On the day of writing this (June 2017) the gain is about 700 €. What a fine business idea, money out of nothing! Can you find an explanation? No? Not yet? Then you will be surprised to learn that the pieces can also be arranged in another way: please check, now you have lost one cubic centimeter of gold! But matters are even more mysterious. If you cut a square of size 13 × 13 in the same way as before (see Figure 1.3), then
⁶ Lewis Carroll, pen name of Charles Lutwidge Dodgson (1832–1898), British author (Alice's Adventures in Wonderland), mathematician, logician and photographer.
Figure 1.3: Cutting a 13 × 13 square
you find that, after rearranging the pieces, the resulting rectangle has an area of 168 < 13² = 169. Looking closer at Figure 1.2, you may realize that the pieces in the square have side lengths 3, 5 and 8. In Figure 1.3, these are 5, 8 and 13. These numbers are members of one of the most famous integer sequences:
1, 1, 2, 3, 5, 8, 13, 21, 34,...
There's an ingenious device available on the internet, Sloane's On-Line Encyclopedia of Integer Sequences (https://oeis.org/). Just type in the first five numbers to learn more about this sequence. You may also consult the bible of discrete mathematics, Concrete Mathematics by Graham, Knuth, and Patashnik (2003), where this dissection problem is discussed on page 292. In this context it is quite illuminating to see a short note by Oskar Schlömilch (1868)⁷. After having described this puzzle to the readers of the Zeitschrift für Mathematik und Physik he concludes (in translation): We share this little treat because finding the error that has been committed makes a nice exercise for pupils, and because avoiding the error leads to the solution and construction of a quadratic equation. So, after all, it is certainly impossible to cut a plane figure, rearrange the pieces, and obtain a figure of larger area. It's a geometrical fallacy. Nothing more? Well, Ian Stewart (2008, p. 163) points his readers to a really weird mathematical fact, the Banach-Tarski Paradox. In 1924 Stefan Banach and Alfred Tarski⁸ proved that it is possible to dissect a sphere into finitely many pieces (actually five pieces suffice!) which can then be rearranged to make two spheres, each of the same volume as the original. There are no overlaps, no missing bits, the pieces fit together perfectly. It's a mathematical truth, it can be proved. Still, this fact is so counterintuitive that we call it a paradox. It originates from our concept of volume and the impossibility of defining this concept in a sensible way for really complicated geometrical shapes.
⁷ Oskar Schlömilch, 1823–1901, German mathematician.
⁸ Stefan Banach, 1892–1945, and Alfred Tarski, 1901–1983, Polish mathematicians.
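Incidentally, the one-square discrepancy in both dissections can be checked numerically. It is an instance of Cassini's identity for the Fibonacci numbers, F(n−1)·F(n+1) − F(n)² = (−1)ⁿ, a fact not spelled out above but easy to verify; the following Python sketch (the helper fib is my own naming) does exactly that:

```python
def fib(n):
    """Return the n-th Fibonacci number, with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Cassini's identity: fib(n-1) * fib(n+1) - fib(n)**2 == (-1)**n.
# For n = 6 (square of side 8):  5 * 13 - 64  = +1, the "gained" cm^2.
# For n = 7 (square of side 13): 8 * 21 - 169 = -1, the lost cm^2.
for n in range(2, 10):
    discrepancy = fib(n - 1) * fib(n + 1) - fib(n) ** 2
    assert discrepancy == (-1) ** n
    print(fib(n), discrepancy)
```

The "rectangle" obtained from the pieces is in fact not exact: along its diagonal the pieces leave a long, almost invisible sliver of gap (or overlap), whose total area is exactly the one square unit that seems to appear or vanish.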
Another Paradox
The origin of this treasure is unknown; the puzzle started spreading around the world in the mid-1990s like a wave of influenza. You can find it, for instance, in Winkler (2004).
You have just moved into an old house with a basement and an attic. There are three switches in the basement marked with ON and OFF. One of the three switches is connected to a bulb in the attic. You have to find out which switch is connected to it. You are allowed to play with the switches for as long as you need to, but you may only go up to the attic once to check the bulb and then say which switch is connected to it. This is the only possibility to find out whether the bulb in the attic is lit.
Paradoxes convey a counterintuitive fact. This puzzle belongs to a class which can be termed: I think I must have misheard! How can one solve the problem with just a single trip to the attic and a single look at the bulb? Still, it is possible.
The Returning Hunter
I found this old riddle in a classic text by Martin Gardner (1994). It runs as follows: A hunter climbs down from his perch, walks one mile due south, turns and walks one mile due east, turns again and walks one mile due north. He finds himself back where he started. There is a bear at his perch; the hunter shoots the bear. What color is the bear? So, that was easy! Of course, the bear must be white, a polar bear, since the perch is certainly located exactly at the North Pole; otherwise the hunter could not have walked the path described. But this was not the problem. Here it comes: Can you think of another spot on the surface of the earth (assuming it is a perfect sphere) from which you can walk one mile due south, turn and walk one mile due east, turn again, walk one mile due north and arrive at the point where you started? If there is such a spot, where is it located? Is there more than one?
The Problem of the Pennies
Let us conclude our sightseeing tour with a real highlight. Our last puzzle is a very prominent one; several famous mathematicians and physicists have worked out solution methods and published papers about it. Still, it presents itself in a charming and innocuous manner:
There is a set of 12 pennies, one of which is a counterfeit; it may be too light or too heavy. Identify this penny using only three weighings on a simple beam balance. Note that it is unknown to you whether the counterfeit is too light or too heavy.
A variant of this puzzle appears as an unsolved problem posed by Guy and Nowakowski (1995) in the Problems and Solutions Section of the American Mathematical Monthly. The authors report that this puzzle was very popular on both sides of the Atlantic during World War II. It was even suggested that it should be printed on paper strips and dropped over Germany in an attempt to sabotage their war effort. After the war a series of papers proposing solutions appeared, the most elegant one is due to Freeman J. Dyson (1946), a famous British mathematician and physicist. Besides giving a complete solution Dyson shows that if M equals the number of pennies and there is a number n satisfying
M = (3^n − 3)/2, n = 2, 3, . . . ,

then n weighings are always sufficient to identify the counterfeit penny and its type. This is exactly the case in our puzzle: for M = 12 we have n = 3. Dyson also gives a solution for the case 3 ≤ M < (3^n − 3)/2 and shows that the puzzle has no solution if the number of pennies is too large, i.e. if M > (3^n − 3)/2. So much for the theory. Now take a sheet of paper and a pencil and try to solve this challenging puzzle. I'm sure you will not prove immune to the fascination of this problem. Have fun!
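Dyson's bound can be made tangible with a small experiment. The Python sketch below is my own construction, not Dyson's original labeling scheme: it builds a non-adaptive weighing plan for M = 12 pennies and n = 3 weighings. Each coin receives a signature from a distinct ±-class of nonzero ternary vectors, the signs are chosen so that every weighing puts four coins on each pan, and all 24 scenarios (which coin is counterfeit, and whether it is heavy or light) are then simulated to confirm that the three outcomes identify every scenario uniquely:

```python
from itertools import product

# One representative per +/- pair of nonzero ternary vectors of length 3,
# taking the vector whose first nonzero entry is +1 and dropping (1, 1, 1):
# this leaves exactly 12 sign-classes, one per coin.
classes = []
for v in product((-1, 0, 1), repeat=3):
    if v == (0, 0, 0) or v == (1, 1, 1):
        continue
    if next(x for x in v if x != 0) == 1:
        classes.append(v)
assert len(classes) == 12

# Pick a sign for each class so that every weighing (each coordinate) is
# balanced: the column entries must sum to zero, i.e. four coins per pan.
signatures = None
for signs in product((1, -1), repeat=12):
    cand = [tuple(s * x for x in v) for s, v in zip(signs, classes)]
    if all(sum(row[i] for row in cand) == 0 for i in range(3)):
        signatures = cand
        break
assert signatures is not None

# Simulate all 24 scenarios: coin c counterfeit, heavy (+1) or light (-1).
# The outcomes of the three weighings form the vector d * signatures[c];
# distinct sign-classes guarantee that no two scenarios collide.
outcomes = {}
for c in range(12):
    for d in (1, -1):
        result = tuple(d * x for x in signatures[c])
        assert result not in outcomes
        outcomes[result] = (c, d)
print(len(outcomes))  # 24 distinguishable outcome vectors
```

Since all 24 outcome vectors are distinct, three fixed weighings suffice, matching Dyson's formula: (3³ − 3)/2 = 12.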
1.2 Where to go from here
Suppose you have been invited by some renowned magazine to write an article about recreational mathematics. You have been selected for this prestigious job because the editors know about your profound knowledge and expertise in this field. Thus, the creative challenge is to write a suspenseful report about recreational mathematics! Here are some suggestions which I find interesting and which you may consider. However, you should not feel obliged to follow them; your own ideas are certainly welcome.
In any case: please illustrate your work with well-chosen examples and take care also to present elegant solutions, as far as solutions exist. Not every puzzle is solvable, however.
1. Recreational Mathematics in Education and Teaching.
It is a remarkable fact that puzzles and riddles have been used in mathematical education since ancient times. The oldest known example is the Rhind Papyrus or Ahmes Papyrus, which dates to around 1550 BC. It contains about 85 exercises in geometry, arithmetic and algebra, some of which could rightly be called puzzles. I think it would be quite challenging to write your thesis about the role recreational mathematics plays in education. Puzzles and games are a wonderful way to let pupils participate and at the same time expand their capabilities in problem solving. Bonnie Averbach and Orin Chein used recreational mathematics in their math lessons over several years. They gathered a lot of experience, which they finally compiled in a remarkable book (Averbach and Chein, 1980). Their teaching paradigm may be condensed in a few motivating sentences addressed to their students: You participate and be the mathematician. Take a problem and use anything you know to solve it. Think about it; strain your mind and imagination; put it aside, if necessary; keep it in mind; come back to it. If you can solve it on your own, isn't the feeling great? If you can't solve it, maybe some mathematics (new to you) would be helpful to know. Let's develop some and see. It is also my opinion that pedagogically there is much to be gained from the inclusion of recreational mathematics in your lessons⁹.
2. Famous mathematicians and their puzzles.
Famous mathematicians from the days of antiquity up to our time have always taken an interest in mathematical puzzles. To name just a few: Archimedes of Syracuse, Cardano, Pascal, Huygens, Newton, Euler, Gauss, Hamilton, Cayley, Sylvester. In the 20th century and in our days: von Neumann, Banach, Littlewood, Ramanujan, Conway, Erdős, Knuth and the physics Nobel Prize laureate Paul Dirac. What puzzles and games did they create, and on what occasion? In this context the excellent book of Petkovic (2010) will be very helpful.
3. Famous puzzles and their history.
Recall that a good puzzle should have a raison d'être. Some puzzles have gained the distinctive character of being exceptional challenges. They are so interesting that even today many people are discussing them, new
⁹ I have three sons, now adults. One became a chemist, one a physicist and one an artist; each of them has a sound mathematical education. During their childhood and youth I regularly entertained them with exciting puzzles, and we really enjoyed it. Recently, at a family meeting on the occasion of an anniversary, we remembered those good old days and all those puzzles and riddles. But then one of my sons said: "Dear dad, frankly speaking, sometimes this puzzle stuff was really a torture." Well, perhaps one should not overdo it.
solutions are published, variations invented. The Problem of the Pennies presented above is a typical representative. Archimedes' Cattle Problem is a computationally hard puzzle from number theory with a remarkable history. Originally due to Archimedes of Syracuse (287–212 BC), it was rediscovered by Gotthold Ephraim Lessing in 1773 and not solved before 1880. It is still discussed in various mathematical journals. You will have no problem finding more exciting examples. For helpful references please see the annotated bibliography below and search the web.
4. Classes of puzzles.
Another way to organize your thesis is to concentrate on a particular class of puzzles. There are many such classes: puzzles from number theory, geometry, probability, packing problems, magic squares, paper folding, topology (e.g. knots, Borromean rings), combinatorics and logic (there is some overlap between these classes). Paradoxes and fallacies are another way to categorize puzzles; they have always been very popular. Rich sources of paradoxes and fallacies are geometry, arithmetic, topology and, last but not least, probability theory. One word on topology: in abbreviated form, it deals with properties of geometric forms and space which are invariant under continuous transformations. Today it is a major branch of mathematics but, interestingly, it has its origin in a mathematical puzzle, The Seven Bridges of Königsberg, which was solved by Leonhard Euler in 1736¹⁰. Puzzles with a topological background abound; here is one I like very much:
Can a tire tube be turned inside out through a hole in its side?
Hint: Think big! Or what about river crossing puzzles, whose origin may be traced back to medieval times, or train shunting? Never heard of it? Train shunting (switching) problems are particularly popular in Great Britain, where they have a very loyal fan base. There are board games, clubs and even regular meetings and conventions devoted to train shunting. The background is a serious one, coming from operations research and the early days of railroading, when there were no double tracks, no automatic switches and no turntables. Here is an example from my collection; it comes in many variations (see Figure 1.4 below):
At the end of the main line there is a circular track, furthermore there are two wagons and an engine. The objective is to use the engine to change the position of the two wagons and for the engine then to return to the main line. Unfortunately, there is also a low bridge which the engine can pass under, but neither of the two wagons can. The engine can push and pull wagons.
¹⁰There will be a topic in this collection on The Seven Bridges of Königsberg in the near future.
20 Topic 1. Recreational Mathematics
[Figure 1.4: Exchange railway wagons A and B. The diagram shows the main line with the engine, the circular track with wagons A and B, and the low bridge.]
5. Puzzle Composers and Authors. When reading books or papers on recreational mathematics you will again and again come across a couple of famous names: Henry E. Dudeney, Martin Gardner and Ian Stewart have been mentioned already. But there is also Sam Loyd, the great American puzzle author and a somewhat controversial person; a short biography can be found in Gardner (1959b). We should not forget about Lewis Carroll, Peter Winkler, W. W. Rouse Ball and H. S. M. Coxeter, or John Horton Conway. The wide class of logic puzzles is intimately connected with the name of Raymond Smullyan, the great philosopher, logician and piano player. Nor should you forget about the important and original contributions of French authors. Among these is Claude Gaspard Bachet de Méziriac (1581–1638), Jesuit clergyman and mathematician, who authored an influential book on recreational mathematics in 1612, Problèmes Plaisants, which contains, e.g., several variants of medieval river crossing problems, weighing puzzles and magic squares. Or Jacques Ozanam (1640–1718), who also wrote a fine book on mathematical puzzles. And, finally, I must mention another French author, Édouard Lucas (1842–1898). He has written a classical four-volume book on recreational mathematics and is the inventor of the famous solitaire game The Towers of Hanoi. It would be a fine and promising idea to develop your thesis around the lives and works of these remarkable people. What was their mathematical background? How did they become authors of texts on recreational mathematics? What about their scientific work apart from recreational maths? What were their greatest successes? Did they have teaching positions? Are there puzzles and games which are inseparably linked with these people?
6. Recreational Mathematics in Media. As I already remarked in the Invitation, newspapers and magazines have always been sources of new and sometimes interesting mathematical puzzles. This has a surprisingly long tradition. A wonderful example, in many respects worth mentioning, is the Ladies’ Diary or Woman’s Almanack, which was published from 1704 until 1840, when it was succeeded by The Lady’s and Gentleman’s Diary. As an almanac it contained calendar information, medical advice, short stories and mathematical puzzles. These were either invented by John Tipper (before 1680–1713), the first editor of the Diary, or sent in by readers. In issue 25 from 1709 Tipper wrote regarding puzzles: Arithmetical Questions are as entertaining and delightful as any other Subject whatever, they are no other than Enigmas to be solved by Numbers (Albree and Brown, 2009). At the beginning the puzzles were rather easy, but later the level of difficulty increased substantially. Today many periodicals still have puzzle corners, but their character has changed somewhat. Now it is Sudoku and its companions like Kenken, Hashiwokakero etc. which dominate the field. I dare not say that these are puzzles in the sense to be discussed in your thesis. They are tasks, used mainly to kill time when waiting at airports, for instance. Their availability is almost unlimited because they can be created automatically by diverse computer programs. So, mathematically, they are not very interesting. However, sophisticated puzzles and games continue to be invented and published, with the world wide web now playing a major rôle. I think that examining how recreational mathematics is perceived in diverse media, today and in the past, would be another interesting approach to our topic.
7. And What About Games? All the puzzles considered so far challenged our ability to reason; it was the delightful play with problems and ideas. Yet there is another important class of problems in recreational mathematics: games. There are solitaire games, like Lucas’ Towers of Hanoi, solitaire with pegs (in many variants), with polyominoes (an appealing generalization of dominoes with a surprisingly rich mathematical theory behind them), etc. And then we have games for two players. Now a new element comes into play (literally): one has to account for the reasoning ability of an opponent. In many games the player whose turn it is has a choice between two or more possible moves. Which should be selected? This raises the fascinating problem of finding a winning strategy. For some sufficiently simple classes of games it can be proved that such strategies exist, for others not. Why not dedicate your thesis to this aspect of recreational mathematics? In almost all books on puzzles you will also find discussions of various sorts of games, see the Annotated Bibliography below. If you decide to pursue this approach then you should have a look at the classical book of Berlekamp, Conway, and Guy (2001–2004).
1.3 An annotated bibliography
You will find that the list of references at the end of this Topic is a rather long one. Do not be afraid, there is no need to read all these books and journal articles; it is just an offer. Please read whatever you need for your thesis and whatever you find interesting (maybe all?) and make your choice. Of course, you are free to use other texts and resources. Let’s begin with the good old books on recreational mathematics. Édouard Lucas has written classical and often cited textbooks on recreational mathematics, in particular Récréations mathématiques (4 volumes, 1882–1894) and L’arithmétique amusante, 1895. In volume 3 of his Récréations you will find one of the most popular games/puzzles, the Towers of Hanoi. Unfortunately, no English translations are available, to the best of my knowledge, but digitized versions of the French original texts can be read via the web. One of the grand seigneurs of puzzle and game literature is Henry E. Dudeney. Probably his best-known book is The Canterbury Puzzles (1908). The puzzles are presented by characters based on The Canterbury Tales by Geoffrey Chaucer (1343–1400), the greatest English poet of the Middle Ages. On-line editions of the collection are available via www.gutenberg.org. Rouse Ball and Coxeter (1987) is one of those fine books which offer an excellent mix of interesting puzzles and games, partly revivals from very old sources, and the nontrivial mathematical theory required to understand and solve the problems presented. My first book on puzzles and games was The Moscow Puzzles by Kordemsky (1972). It is still available today and offers a wealth of problems and games (and solutions) in 14 chapters. By the way, the English translation has been edited by Martin Gardner. The book of Averbach and Chein (1980) is written in the same spirit, but the presentation is organized differently.
The authors’ intention is not only to raise students’ interest in recreational mathematics, but also to introduce basic concepts needed to solve puzzles. You will find there readable short introductions to logic, graph theory, number theory, etc. Thus the emphasis lies on problem solving. Regarding the latter, which as we know is central to mathematical puzzles, you should also read the famous booklet Pólya (2004). Its title is How to Solve It and it presents, in a charming way, rather general guidelines for solving a mathematical problem. You will enjoy this book. Martin Gardner’s list of publications in recreational maths is quite long. He has been editor of puzzle columns in diverse magazines like Scientific American; many of the problems presented there were later collected in books. Let me mention only a few: Best Mathematical Puzzles of Sam Loyd (1959a), Mathematical Puzzles and Diversions (1959b), Hexaflexagons, Probability Paradoxes, and the Tower of Hanoi (2008). Regarding puzzles and games from logic I recommend reading the excellent and entertaining books by Raymond M. Smullyan (1919–2017). He has authored more than 30 books about logic and logic puzzles. First I should mention the
rather recent Gödelian Puzzle Book (2013). Here you can find entertaining variations on Kurt Gödel’s Incompleteness Theorems, puzzles related to basic concepts of modern logic like truth, provability and undecidability. You may also find amusing and interesting What is the Name of This Book? (2011) and The Lady or the Tiger? (1982). His many puzzles about truth-tellers and liars have become really famous. Some of his books are available on the web. Fine collections of rather recently invented puzzles are the books by Peter Winkler (2004) and (2007). Winkler is professor of discrete mathematics at Dartmouth College. Berlekamp, Conway, and Guy (2001–2004) is the textbook on mathematical games. Volumes 1–3 deal with strategies for two-person games; volume 4 is devoted to solitaire games, topological puzzles, and even Rubik’s Cube finds its place there. Last but not least I want to point you to Petkovic (2010). This book is a collection of stories and puzzles due to great mathematicians. Carefully worked-out solutions are also given.
1.4 References
[1] Joe Albree and Scott H. Brown. “A valuable monument of mathematical genius: The Ladies’ Diary (1704–1840)”. In: Historia Mathematica (2009), pp. 10–47.
[2] Bonnie Averbach and Orin Chein. Problem Solving Through Recreational Mathematics. Dover Publications, 1980.
[3] G. P. Basharin, A. N. Langville, and V. A. Naumov. “The Life and Work of A. A. Markov”. In: Linear Algebra and its Applications 386 (2004), pp. 3–26.
[4] E. R. Berlekamp, J. H. Conway, and Richard K. Guy. Winning Ways for your Mathematical Plays. 2nd ed. 4 vols. A. K. Peters, 2001–2004.
[5] Henry E. Dudeney. The Canterbury Puzzles. E. P. Dutton and Company, 1908. URL: http://www.gutenberg.org/files/27635/27635-h/27635-h.htm.
[6] Freeman J. Dyson. “The problem of the pennies”. In: The Mathematical Gazette 30.291 (1946), pp. 231–234.
[7] Martin Gardner. Best Mathematical Puzzles of Sam Loyd. Dover Publications, 1959.
[8] Martin Gardner. Hexaflexagons, Probability Paradoxes, and the Tower of Hanoi. Cambridge University Press, 2008.
[9] Martin Gardner. Mathematical Puzzles and Diversions. Penguin Books, 1959.
[10] Martin Gardner. My Best Mathematical and Logic Puzzles. Dover Publications, 1994.
[11] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. 2nd ed. Addison-Wesley, 2003.
[12] Richard K. Guy and Richard Nowakowski. “Coin-Weighing Problems”. In: American Mathematical Monthly 102.2 (1995), pp. 164–167.
[13] Boris A. Kordemsky. The Moscow Puzzles. Dover Publications, 1972.
[14] Miodrag S. Petkovic. Famous Puzzles of Great Mathematicians. American Mathematical Society, 2010.
[15] G. Pólya. How to Solve It. Princeton University Press, 2004.
[16] W. W. Rouse Ball and H. S. M. Coxeter. Mathematical Recreations and Essays. 13th ed. Dover Publications, 1987.
[17] O. Schlömilch. “Ein geometrisches Paradox”. In: Zeitschrift für Mathematik und Physik 13 (1868), p. 162.
[18] David Singmaster. Vanishing Area Puzzles. 2014. URL: http://rmm.ludus-opuscula.org/Home/ArticleDetails/94.
[19] Raymond M. Smullyan. The Gödelian Puzzle Book. Dover Publications, 2013.
[20] Raymond M. Smullyan. The Lady or the Tiger? Dover Publications, 1982.
[21] Raymond M. Smullyan. What is the Name of This Book? Dover Recreational Math, 2011. URL: https://archive.org/details/WhatIsTheNameOfThisBook.
[22] Ian Stewart. Professor Stewart’s Cabinet of Mathematical Curiosities. Perseus Books, 2008.
[23] Peter Winkler. Mathematical Mind-benders. Taylor and Francis, 2007.
[24] Peter Winkler. Mathematical Puzzles: A Connoisseur’s Collection. A. K. Peters, 2004.
[25] Peter Winkler. “Review of Famous Puzzles of Great Mathematicians by Miodrag S. Petkovic”. In: The American Mathematical Monthly 118.7 (2011), pp. 661–664.
Topic 2
Shortest Paths in Networks
It’s one thing to feel that you are on the right path, but it’s another to think yours is the only path. Paulo Coelho, 2006
Keywords: combinatorial optimization, graph theory, algorithms, computer science
2.1 An Invitation
2.1.1 The problem and its history
One of my sons lives with his family in a small village close to Zürich. From time to time I set out to visit him, and normally I travel by airplane because it takes only one hour of flight time. But I could also use the car. In that case I have to expect eight or even more stressful hours of driving. On the rare occasions when I decide to use the car, I of course want to take a shortest route, so I use a navigation system. After I type in the point of departure and the destination of my journey, the navigation system calculates two suggested optimal routes, and this takes only fractions of a second. How can this be done so quickly? A naïve approach to carrying out the calculations is this one:
• Find all possible routes and calculate their lengths.
• Select the route with the shortest length.
This idea is not a good one: even for moderate-sized networks of roads the number of possible routes is in general extremely large. Furthermore, most of these routes are not worth considering. For instance, travelling from Vienna to Zürich via Milan hardly makes any sense. Modern navigation systems do their job in a completely different way: they use an algorithm for solving a shortest path problem (SPP). Any famous problem has its history. But for the SPP it is rather difficult to pin down the origins of the problem exactly, because we have almost no written
evidence from ancient times. There is an exception, though: the SPP does occur implicitly in a few medieval texts, sometimes hidden in some sort of puzzle. This is of course a curiosity. But seriously, we may imagine that even in very primitive societies finding shortest paths was an essential task for gathering food, distributing goods, i.e., trading, and for communication. Besides human beings, animal societies are also confronted with SPPs. Due to evolution, certain animal societies are really high performers when solving SPPs. An interesting and striking example is the Argentine ant, Linepithema humile. These ants form mega colonies of monstrous size. One of these mega colonies ranges over more than 6000 km from the northern parts of Spain to the south of Italy, and its subcolonies are connected by well organized and optimized paths. Thus it should not come as a surprise that in modern combinatorial optimization metaheuristics based on ant colonies are routinely applied to solve very difficult and large-scale vehicle routing problems (which involve SPPs, of course). The modern theory of shortest paths has its origins in the 1950s, when providers of communication systems were facing an enormous growth of traffic volume. For instance, when at that time a customer made a long-distance call, the major difficulty was to get the call to its destination. If the route of first choice was busy then the operators had to send the call along a second-best route, a third-best, etc. In the process of automating communication it was necessary to program telephone exchange facilities to find such alternate routes quickly. This certainly involves solving nontrivial SPPs. So it is not surprising that within a couple of years many people independently came up with almost the same algorithms for solving SPPs. An interesting account of the developments during these years is Schrijver (2012).
Today SPPs are among the most important and most fundamental optimization problems. They are interesting and challenging not only per se, but occur again and again as subproblems in more complex settings. The latter range from vehicle routing and related transportation problems, the analysis of large social networks and molecular biology (DNA sequence alignment) to the development of highly integrated microprocessors. Since the networks arising in these applications are usually of extraordinary size, efficiency of algorithms is certainly a major issue.
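The naïve route enumeration described in the Invitation above can be sketched in a few lines of Python. The small road network and its distances below are purely illustrative (invented for this sketch); the point is that on real road maps the number of simple routes explodes, so enumerating them all is hopeless.

```python
# A toy road network with hypothetical distances in km (illustrative
# numbers only, not real driving distances).
roads = {
    "Vienna":    {"Linz": 185, "Graz": 200},
    "Linz":      {"Salzburg": 135, "Vienna": 185},
    "Graz":      {"Salzburg": 280, "Vienna": 200},
    "Salzburg":  {"Innsbruck": 190, "Linz": 135, "Graz": 280},
    "Innsbruck": {"Zurich": 290, "Salzburg": 190},
    "Zurich":    {},
}

def all_simple_routes(graph, src, dst, route=None):
    """Enumerate every route from src to dst that visits no city twice."""
    route = (route or []) + [src]
    if src == dst:
        yield route
        return
    for nxt in graph[src]:
        if nxt not in route:              # never revisit a city
            yield from all_simple_routes(graph, nxt, dst, route)

def route_length(graph, route):
    """Sum of the distances along consecutive cities of the route."""
    return sum(graph[a][b] for a, b in zip(route, route[1:]))

routes = list(all_simple_routes(roads, "Vienna", "Zurich"))
best = min(routes, key=lambda r: route_length(roads, r))
```

On this tiny network the enumeration is instant, but the number of simple routes can grow exponentially with the number of junctions, which is exactly why navigation systems use proper shortest path algorithms instead.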
2.1.2 Preparing the stage - graphs, paths and cycles
The mathematical structure underlying the SPP is that of a graph. In this subsection I’ll introduce a few important definitions and some basic terminology. As the focus of this Invitation does not lie on mathematical rigor, we may approach the concept of a graph in a rather pedestrian fashion. A graph is a set of points V where some pairs of these points are related in a certain way. A most natural way to visualize this concept is a roadmap: the points correspond to villages or junctions of roads in a certain area. Two points are related if there exists a road connection between these points. This idea of a network of roads appears as early as 1736, when Leonhard Euler (1707–1783)
solved the famous problem of the Seven Bridges of Königsberg. Indeed, Euler’s work marks the beginning of a mathematical discipline known today as Graph Theory.¹ The points in the set V will be called vertices. If two points (locations) u, v ∈ V are related, e.g. by a road connection, then we call the ordered pair (u, v) an arc and denote the set of all arcs by A. Note carefully that these pairs (u, v) are considered to be ordered, i.e., (u, v) ≠ (v, u). Therefore, in the context of a network of roads, arcs represent one-way roads. More formally, a graph G is an ordered pair G = [V, A] of the set of vertices and the set of arcs. The number of vertices n = |V| will be called the order of the graph G, the number of arcs m = |A| the size of G. As long as the order n of G is not too large we may draw a diagram of G. This is simply done in the following way:
• Draw n labeled points in the plane; these are the vertices.
• For each arc (u, v) ∈ A draw an arrow connecting vertices u and v such that the arrow points from u to v.
Here is an example: let V = {1, 2, 3, 4} and define the set of arcs by
A = {(1, 2), (1, 3), (2, 1), (2, 3), (3, 4), (4, 2)}
Then one possible diagram of G = [V,A] would be:
[Diagram of G: the four vertices 1, 2, 3, 4 with an arrow for each of the six arcs in A.]
A little bit more has to be said about the arcs of a graph.
• Our definition allows A to equal the empty set ∅, but normally A will be nonempty and contain m = |A| > 0 arcs.
• If there exists a pair a = (u, v) in A then we say that the arc a connects the vertices u and v. It is also customary to say that a is incident from u and incident to vertex v. For instance, in our example above there is an arc a = (2, 3); thus a is incident from vertex 2 and incident to vertex 3.
• We assume that an arc never connects a vertex with itself, so there are no loops (u, u).
• Furthermore, as A is a set, no arc occurs more than once in A. It is possible to extend the definition of a graph so that its arc set A becomes a multiset with multiple occurrences of arcs, but we will not do so in this Invitation.²
¹There will be a Topic devoted to the Königsberg Problem.
²Such parallel arcs are interesting in certain routing problems, e.g., the Chinese Postman Problem.
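As a tiny illustration, the example graph G = [V, A] above can be stored as a successor list, from which the order, the size, and incidence relations can be read off directly. This is just one possible representation, sketched in Python:

```python
# The example graph G = [V, A]: vertices 1..4 and the six arcs from above.
V = {1, 2, 3, 4}
A = {(1, 2), (1, 3), (2, 1), (2, 3), (3, 4), (4, 2)}

# Successor list: succ[u] holds every v with (u, v) in A.
succ = {v: sorted(w for (u, w) in A if u == v) for v in V}

n = len(V)   # order of G
m = len(A)   # size of G

# The arc (2, 3) is incident from vertex 2 and incident to vertex 3:
assert 3 in succ[2]
```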
If for each arc (u, v) ∈ A there is also its opposite (v, u) ∈ A, then G is called an undirected graph; otherwise G is a directed graph, sometimes called a digraph. It helps intuition to think of a directed graph as a system of one-way roads, whereas an undirected graph may be seen as a network of roads, each being two-way. There is a one-to-one correspondence between directed and undirected graphs: if we add to A, for each arc (u, v), its opposite (v, u) unless it is already in A, then we obtain an undirected graph. This is a pretty simple idea, but unfortunately it has its limitations in a shortest path context. For definiteness: in all that follows, when the term graph occurs, it always means directed graph, unless stated otherwise. Here is an example of an undirected graph: V = {1, 2, 3}, and arc set
A = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)}
When drawing a diagram of an undirected graph it is customary to draw a single line segment without arrow tips for each pair of arcs {(u, v), (v, u)}:
[Diagram: the undirected graph on vertices 1, 2, 3; each pair of vertices is joined by a single line segment.]
Let G = [V,A] be a graph. A path P from vertex s to some vertex t is a sequence of contiguous arcs
P = [(s, a), (a, b), (b, c),..., (y, z), (z, t)]
P is a simple path if it does not use the same arc more than once; P is elementary if it does not use the same vertex more than once.
A path P from 1 → 4: P = [(1, 2), (2, 3), (3, 4)]. This P is simple and elementary. [Diagram: the sample graph from above with the arcs of P highlighted.]
The path P in the graph displayed above may be written more compactly as P = [1, 2, 3, 4]. If for a path P the initial vertex and the terminal vertex coincide, then P is called a cycle. For instance, C = [1, 3, 4, 2, 1] is a cycle:³
³A note on terminology. Many authors prefer the term circuit, but we use the term cycle, as it prevails in the literature on shortest paths. Interestingly, although the theory of graphs is a rather mature field of mathematics, there is still some Babylonian confusion. Indeed, Richard Stanley (MIT and Clay Institute) once said: The number of systems of terminology presently used in graph theory is equal, to a close approximation, to the number of graph theorists.
[Diagram: the sample graph with the cycle C = [1, 3, 4, 2, 1] highlighted.]
Note that in this graph [1, 2, 1] is also a cycle, a 2-cycle. A graph is weakly connected if for each pair of vertices u, v ∈ V there exists either a path from u → v or from v → u. A graph is called strongly connected if for each pair of vertices there exists a path from u → v and also a path from v → u. It is easy to verify by inspection that the sample graph given above is strongly connected. But in general, proving connectedness is a rather nontrivial task. In the sequel we will always assume that the graphs we are dealing with are at least weakly connected.
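Checking strong connectivity mechanically is a nice little exercise. The sketch below uses a standard observation (not from the text above): a graph is strongly connected exactly when some fixed vertex s can reach every vertex in G and can also reach every vertex in the reversed graph (the graph with all arcs turned around).

```python
def reachable(succ, s):
    """Set of vertices reachable from s by a depth-first search."""
    seen, stack = set(), [s]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ.get(u, []))
    return seen

def strongly_connected(V, A):
    """True iff the graph G = [V, A] is strongly connected."""
    succ = {v: [] for v in V}
    pred = {v: [] for v in V}
    for u, v in A:
        succ[u].append(v)
        pred[v].append(u)          # successor lists of the reversed graph
    s = next(iter(V))
    # s reaches everything in G and in the reversed graph
    return reachable(succ, s) == set(V) == reachable(pred, s)

# The sample graph from above:
V = {1, 2, 3, 4}
A = {(1, 2), (1, 3), (2, 1), (2, 3), (3, 4), (4, 2)}
```

Removing the arc (4, 2), for instance, destroys strong connectivity, since vertex 4 can then no longer reach the other vertices.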
2.1.3 Weighted graphs
Let G = [V, A] be a graph and assign weights to its arcs: each arc (u, v) ∈ A is assigned a weight w(u, v), which we assume to be an arbitrary real number. Let me now present some examples, and I will use this opportunity to show you what meaning we can give to the notion of the length of a path.
Example 1. (Transportation)
In a transportation network the weights most often represent physical distances between destinations. But weights may also be costs of transportation, as is the case with the graph in Figure 2.1. The weight w(3, 6) = 2 means that transportation from vertex 3 to vertex 6 costs 2 € per unit, whereas when transporting goods from 3 to 2 we make a profit of 4 € per unit because w(3, 2) = −4. In this example it is most natural to define the length ℓ(P) of a path P as the sum of the weights of its constituent arcs. Thus for a path P = [v_0, v_1, . . . , v_k] we define:

ℓ(P) = ∑_{i=1}^{k} w(v_{i-1}, v_i).
[Figure 2.1: A transportation network. The arcs carry the transportation costs used in the text, e.g. w(1, 3) = 7 and w(3, 2) = −4.]

For instance, the path P = [1, 3, 2, 5, 6] in the graph displayed in Figure 2.1 has length

ℓ(P) = 7 + (−4) + 3 + 5 = 11.

By simple inspection you will find that this P is not the shortest path from 1 → 6.
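In code, the length ℓ(P) is a one-line sum. The sketch below uses only the arc weights actually quoted in the text for Figure 2.1 (the figure’s remaining arcs are omitted); the quoted weight w(3, 6) = 2 already exhibits a shorter route 1 → 3 → 6.

```python
# Arc weights of Figure 2.1 that are quoted in the text; the other
# arcs of the figure are omitted in this sketch.
w = {(1, 3): 7, (3, 2): -4, (2, 5): 3, (5, 6): 5, (3, 6): 2}

def ell(path, w):
    """Length of a path given as a vertex list [v0, v1, ..., vk]."""
    return sum(w[u, v] for u, v in zip(path, path[1:]))

P = [1, 3, 2, 5, 6]
length_P = ell(P, w)          # 7 + (-4) + 3 + 5 = 11
shortcut = ell([1, 3, 6], w)  # 7 + 2 = 9, so P is not shortest
```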
Example 2. (Reliability)
The weights assigned to the arcs of a graph may also be probabilities. When does this make sense? Consider for instance a communication network. We may model this as a graph with vertices representing transmitters or relay stations; arcs are radio or cable connections. An interesting type of weight for the arc (u, v) is the maximum capacity of the wire connecting vertices u and v. But here we are more interested in the reliability of a connection, the probability that the connection is available at a particular time. Figure 2.2 gives an example of a small communication network.
[Figure 2.2: A communication network. Each arc is labeled with its reliability, a probability between 0 and 1.]
32 Topic 2. Shortest Paths in Networks
Consider the path P = [1, 2, 3, 5, 7] in the corresponding graph. What is the reliability of this path? Assuming statistical independence, we would calculate:
R(P ) = 0.98 · 0.94 · 0.99 · 0.98 = 0.89375
In other words, the reliability of a path is just the product of the weights of those arcs which lie on the path P . Of course, we would be interested in finding a path of maximum reliability. The problem now looks somewhat different from that in Example 1. But this is not so. We may most easily transform our problem to that of finding a shortest path. Just take logs!
If we define the reliability of P = [v_0, v_1, . . . , v_k] as

R(P) = ∏_{i=1}^{k} w(v_{i-1}, v_i),

then

ln R(P) = ∑_{i=1}^{k} ln w(v_{i-1}, v_i).

There are two important observations you can make at this point:
• Maximizing R(P) is equivalent to maximizing ln R(P), since the logarithm is a monotone increasing function.
• Because arc weights w(u, v) are probabilities, we have 0 < w(u, v) ≤ 1. But this implies that the logs are nonpositive: ln w(u, v) ≤ 0 for all arcs in the graph. Therefore

R(P) → max ⇔ ln R(P) → max ⇔ − ln R(P) → min

In other words, if we replace the arc weights w(u, v) by − ln w(u, v), then finding a path of maximum reliability becomes a shortest path problem! Note that the transformed weights − ln w(u, v) are all nonnegative.
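The transformation can be tried out directly. The sketch below uses the reliabilities of the path [1, 2, 3, 5, 7] quoted in the text, plus one hypothetical direct arc (1, 7) with reliability 0.85 (my own assumption, added so that there is an actual choice); Dijkstra’s algorithm on the transformed weights −ln w then picks the most reliable route.

```python
import heapq
from math import log, exp

# Reliabilities of the arcs on the path [1, 2, 3, 5, 7] as quoted in
# the text, plus a hypothetical direct link (1, 7) for comparison.
rel = {(1, 2): 0.98, (2, 3): 0.94, (3, 5): 0.99, (5, 7): 0.98, (1, 7): 0.85}

# Transform w(u, v) -> -ln w(u, v); these weights are nonnegative.
cost = {a: -log(r) for a, r in rel.items()}

def dijkstra(cost, s, t):
    """Shortest path s -> t for nonnegative arc weights."""
    succ = {}
    for (u, v), c in cost.items():
        succ.setdefault(u, []).append((v, c))
    heap, best = [(0.0, s, [s])], {}
    while heap:
        d, u, path = heapq.heappop(heap)
        if u in best:
            continue                      # u already settled
        best[u] = (d, path)
        for v, c in succ.get(u, []):
            if v not in best:
                heapq.heappush(heap, (d + c, v, path + [v]))
    return best[t]

d, path = dijkstra(cost, 1, 7)
# exp(-d) recovers the reliability of the most reliable path.
```

The chain via 2, 3 and 5 wins, because its reliability 0.98 · 0.94 · 0.99 · 0.98 ≈ 0.894 exceeds the assumed 0.85 of the direct link.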
Example 3. (Trading currencies)
This beautiful example is taken from Sedgewick and Wayne (2011, p. 679). Trading currencies is an important branch of business for many big banks, like Deutsche Bank and others. The basic idea lying at the heart of this business is to exploit short-term deviations from equilibrium on foreign exchange markets. To see how it works, consider a set of five currencies: U.S. dollars (USD), Euros (EUR), British pounds (GBP), Swiss francs (CHF) and Canadian dollars (CAD). On a particular day⁴ the exchange rates between these currencies were:
⁴Unfortunately the authors do not disclose the exact date on which these data were recorded.
33 Topic 2. Shortest Paths in Networks
        USD    EUR    GBP    CHF    CAD
USD     1      0.741  0.657  1.061  1.005
EUR     1.349  1      0.888  1.433  1.366
GBP     1.521  1.126  1      1.614  1.538
CHF     0.942  0.698  0.619  1      0.953
CAD     0.995  0.732  0.650  1.049  1

(The entry in row u and column v is the exchange rate from currency u to currency v.)
Suppose now we want to change 1000 USD into CAD. How many Canadian dollars do we get? Easy, just calculate:
1000 USD = 1000 · 1.005 = 1005 CAD
Can we get more? Let’s try this: first we convert USD into Swiss francs, and then we convert these into Canadian dollars:
1000 USD = 1000 · 1.061 · 0.953 = 1011.1 CAD
You see: it makes a difference. Alternatively, we may convert to EUR first and then to CAD:
1000 USD = 1000 · 0.741 · 1.366 = 1012.2 CAD
This is again a little bit more; compared to the direct conversion USD → CAD we have a plus of 0.72 percent. Not very much, it seems, but thinking of a global player moving around billions of dollars, the situation appears in another light. Now it is quite natural to ask: is there an optimal strategy for converting currencies? We do not want to rely on trial and error any longer; we need a mathematical model. Let’s model the currency exchange problem by a graph: as vertices we take the five currencies. Each vertex is connected to every other vertex by an arc, because we can convert every currency into any other. The weights w(u, v) of the arcs are the exchange rates from currency u to currency v given in the table above. Figure 2.3 shows a diagram of this graph. Note that I have drawn this graph in such a way that the orientations of the arcs are encoded in the arc labels together with the exchange rates. Otherwise the diagram would have become too messy to be useful. A sequence of conversions corresponds to a path in this graph; the weight of the path is the product of the exchange rates along the path and equals the total exchange rate along that path. For instance, the path P = [USD, EUR, CAD] has a weight of w(P) = 0.741 · 1.366 = 1.0122.
For a path P = [v_0, v_1, . . . , v_k] the total exchange rate equals

E(P) = ∏_{i=1}^{k} w(v_{i-1}, v_i).

Of course, we want paths which maximize E(P). As in Example 2, by taking logarithms of the weights and changing their signs we turn this maximum problem
34 Topic 2. Shortest Paths in Networks
[Figure 2.3: The currency exchange problem. Every pair of currencies is joined by two opposite arcs, each labeled with the corresponding exchange rate from the table.]
into a problem of finding a shortest path:
E(P) → max ⇔ − ln E(P) = − ∑_{i=1}^{k} ln w(v_{i-1}, v_i) → min
At first sight the situation looks completely analogous to that encountered in Example 2. But there is a subtle difference. In Example 2 the weight transformation w(u, v) ↦ − ln w(u, v) resulted in weights that are all nonnegative, because the weights were probabilities. Here this is no longer the case: the transformed weights may be positive or negative. This has serious consequences.
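These consequences can be made concrete. With negative transformed weights the appropriate tool is the Bellman-Ford algorithm rather than Dijkstra's, and its ability to detect negative cycles is exactly what a currency trader wants: a negative cycle in the transformed graph is a round trip of conversions with total rate greater than 1. The sketch below builds the graph from the table above; the routine is a textbook Bellman-Ford, not code from the source.

```python
from math import log

# Exchange-rate table from the text; rates[u][v] converts currency u into v.
rates = {
    "USD": {"EUR": 0.741, "GBP": 0.657, "CHF": 1.061, "CAD": 1.005},
    "EUR": {"USD": 1.349, "GBP": 0.888, "CHF": 1.433, "CAD": 1.366},
    "GBP": {"USD": 1.521, "EUR": 1.126, "CHF": 1.614, "CAD": 1.538},
    "CHF": {"USD": 0.942, "EUR": 0.698, "GBP": 0.619, "CAD": 0.953},
    "CAD": {"USD": 0.995, "EUR": 0.732, "GBP": 0.650, "CHF": 1.049},
}

def has_negative_cycle(rates):
    """Bellman-Ford on arc weights -ln(rate). A negative cycle means a
    sequence of conversions with total rate > 1, i.e. an arbitrage."""
    nodes = list(rates)
    arcs = [(u, v, -log(r)) for u, row in rates.items() for v, r in row.items()]
    dist = {v: 0.0 for v in nodes}   # all-zero start finds cycles anywhere
    for _ in range(len(nodes) - 1):
        for u, v, c in arcs:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
    # if an arc still relaxes after n-1 rounds, a negative cycle exists
    return any(dist[u] + c < dist[v] - 1e-12 for u, v, c in arcs)
```

For instance, the round trip USD → EUR → CAD → USD has total rate 0.741 · 1.366 · 0.995 ≈ 1.0071 > 1, so a negative cycle must be reported for this table.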
Example 4. (Scheduling)
This nice example has been adapted from Gondran and Minoux (1995, p. 65). The construction of a single-family house requires the performance of a number of tasks like masonry, making the roof, sanitary installations, etc. These tasks cannot be performed in arbitrary order. For instance, the carpentry of the roof requires a greater part of the masonry to be finished. The following table gives a (highly aggregated and simplified) list of tasks, their durations in days and their dependencies:
35 Topic 2. Shortest Paths in Networks
Task Nr.  Task                                   Duration  Previous tasks
1         masonry                                10        –
2         carpentry of roof                       3        1
3         tiling of roof                          1        2
4         sanitary and electrical installations   8        1
5         front                                   2        3, 4
6         windows                                 1        3, 4
7         garden                                  4        3, 4
8         ceiling                                 3        6
9         painting                                2        8
10        moving in                               1        5, 7, 9
Let’s represent the project of building a house by constructing a precedence graph G = [V, A]. This allows us to take explicit care of the dependencies between tasks. G has as vertices the 10 tasks, so V = {1, 2, . . . , 10}. Whenever a task v requires another task u to be finished, this induces an arc (u, v) ∈ A in G. Each such arc is assigned a weight equal to the duration of task u.
[Figure 2.4: The precedence graph for building a house. Each arc (u, v) is labeled with the duration of task u.]
See Figure 2.4 for a diagram of this precedence graph. One of the most important properties of precedence graphs is that they cannot have cycles. Looking at Figure 2.5 reveals immediately why this is so. If we try to interpret the graph in Figure 2.5 as a precedence graph we run into serious trouble, for it says:
• Task 2 cannot start before task 1 has been finished.
• Task 3 cannot start before task 2 has been finished.
• Task 1 cannot start before task 3 has been finished ???
The last statement says that task 1 precedes itself, which is impossible. Proper precedence graphs are so-called DAGs, an acronym for directed acyclic graph.
What about paths and their lengths in a precedence graph?
[Figure 2.5: Precedence graphs must be acyclic; the diagram shows the cycle 1 → 2 → 3 → 1.]
Consider for instance the path P1 = [1, 4, 5] in Figure 2.4; it has total length ℓ(P1) = 18. This tells us that performing tasks 1 and 4 in this order will take 18 days. But can we start working on task 5 immediately after task 4 has been completed? No! From the precedence graph we can read off that task 5 also depends on task 3, which in turn depends on task 2. So we find: the earliest time that task 5 can be started is the length of the longest path from 1 to 5 in the graph, because this takes care of all activities task 5 depends on. This fact is of considerable significance in project scheduling: any activity or task u represented by a vertex in a precedence graph cannot be started earlier than the length of a longest path to u. In our example vertex u = 10 is of particular interest: the length of a longest path from 1 → 10 gives us the earliest time that the last task, moving in, can start, and together with its duration this yields the makespan of building the house, the earliest time (counted from t = 0) that the house is finished. So, in Example 4 we end up with a longest path problem.⁵ Is it possible, as in Examples 2 and 3, to transform this problem into a shortest path problem? Yes, and that is again very easy: just negate all arc weights, i.e., apply the transformation w(u, v) ↦ −w(u, v). In general, as the next section shows, this transformation leads to serious trouble. But for precedence graphs it works fine, because precedence graphs have no cycles; they are DAGs.
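For a DAG the longest-path computation is easy: process the vertices in a topological order and take maxima. In the house example the task numbering itself happens to be a topological order (every arc of the table goes from a smaller to a larger task number), so a sketch needs only a few lines; es[v] below is the length of a longest path from task 1 to task v, i.e. the earliest start of v.

```python
# Durations (days) and dependencies from the task table.
duration = {1: 10, 2: 3, 3: 1, 4: 8, 5: 2, 6: 1, 7: 4, 8: 3, 9: 2, 10: 1}
previous = {2: [1], 3: [2], 4: [1], 5: [3, 4], 6: [3, 4],
            7: [3, 4], 8: [6], 9: [8], 10: [5, 7, 9]}

def earliest_start(duration, previous):
    """Earliest start es[v] = length of a longest path from 1 to v.
    Increasing task numbers are a topological order here, because
    every arc (u, v) of the table has u < v."""
    es = {}
    for v in sorted(duration):
        es[v] = max((es[u] + duration[u] for u in previous.get(v, [])),
                    default=0)
    return es

es = earliest_start(duration, previous)
# es[10] is the length of a longest path 1 -> 10; the house is finished
# once task 10 (moving in) has also been carried out.
makespan = es[10] + duration[10]
```

A longest path realizing es[10] is [1, 4, 6, 8, 9, 10] with length 10 + 8 + 1 + 3 + 2 = 24 days, so the house is finished after 25 days.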
2.1.4 Solvability
In the last section we have seen that several optimal path problems can be reduced to the SSP. Thus at first sight it seems more or less obvious that the SSP must have a solution for any given weighted graph. But unfortunately, this is not so. The problem arises only when we have negative weights. The small example in Figure 2.6 shows why. What is a shortest path from 1 → 4? Well, guided by intuition you may suggest that P = [1, 2, 3, 4] is a shortest path with length ℓ(P) = 20. But is it really the shortest path from 1 → 4? No! Consider Q = [1, 2, 3, 5, 2, 3, 4]. This path has length ℓ(Q) = 17, so Q is shorter than P. But Q isn’t a shortest path either, because the path R = [1, 2, 3, 5, 2, 3, 5, 2, 3, 4] is even shorter: it has length ℓ(R) = 14. Now you can see the problem very clearly: the paths R and Q contain the
5Eric Denardo once ironically remarked that only a notorious pessimist can be interested in longest paths.
[Figure 2.6 shows vertices 1, 2, 3, 4 in a row with arc weights w(1, 2) = 10, w(2, 3) = 2, w(3, 4) = 8, and a vertex 5 above with w(3, 5) = 5 and w(5, 2) = −10.]
Figure 2.6: There’s a problem with this graph!
cycle C = [2, 3, 5, 2] which has negative length ℓ(C) = −3. Each time a path traverses the cycle C the length of a path 1 → 4 is reduced by 3 units. Since we can traverse this cycle as often as we like, it follows that the length of any path 1 → 4 can be made arbitrarily small. In other words, the SSP has no solution for the graph given in Figure 2.6! Thus we have the important result: whenever a connected graph contains negative cycles then the SSP has no solution; one also says the SSP is an ill-posed problem. Although the SSP has no solution, the existence of negative cycles in a graph is a truly significant message, it tells us something very important about the problem at hand! To see this, let’s look once again at the currency exchange problem discussed in Example 3. Consider the path P = [USD, EUR, CAD, USD] which is a cycle. After transforming the exchange rates w(u, v) ↦ −ln w(u, v), we find that P has length
ℓ(P) = −ln(0.741) − ln(1.366) − ln(0.995) = −0.0071196,
so this cycle has negative length. What does it mean? In the first place it means that the SSP on this graph has no solution: there exists no (finite!) optimal path to convert USD to CAD. In the second place, the existence of a negative cycle opens a fascinating economic perspective! The total exchange rate along this cycle equals
E(P) = e^(−ℓ(P)) = e^(0.0071196) = 1.0071
In other words, starting with 1000 USD, converting these to EUR, then EUR to CAD and these back to USD, we get 1007.1 USD. Of course the profit does not appear to be very high. But bear in mind that a trader with an initial capital of USD 1 000 000 can make a profit of USD 7 100 every minute, about USD 420 000 per hour! In practice the so-called arbitrage profit is limited only by the time required to perform the exchanges, but using high frequency trading devices the profit may be extremely large, indeed. Thus we get money out of nothing! What a fine business idea. Frankly speaking, the picture I have drawn is not a complete one as it does not account for transaction cost. But if the trader is a global player like Deutsche
Bank then transaction cost can be kept at a minimum level. Thus the arbitrage business is highly profitable in the real world. Let us pause for a moment here. In this section we have seen that the SSP has a solution if the underlying graph has no cycles of negative length. Thus an algorithm for the SSP should be capable of: • detecting negative cycles; • determining shortest paths efficiently. In the next section I will introduce you to a very general idea covering both issues.
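Before moving on, the arbitrage arithmetic above can be checked in a few lines. The following Python sketch applies the transformation w(u, v) ↦ −ln w(u, v) to the rates of Example 3 and recovers ℓ(P) and E(P); note that the stated value ℓ(P) = −0.0071196 corresponds to a CAD → USD rate of 0.995:

```python
import math

# Exchange rates of the cycle USD -> EUR -> CAD -> USD (Example 3).
rates = {("USD", "EUR"): 0.741, ("EUR", "CAD"): 1.366, ("CAD", "USD"): 0.995}

# Transformed arc weights w(u, v) -> -ln w(u, v).
w = {arc: -math.log(r) for arc, r in rates.items()}

cycle = [("USD", "EUR"), ("EUR", "CAD"), ("CAD", "USD")]
length = sum(w[arc] for arc in cycle)   # l(P); negative => arbitrage
factor = math.exp(-length)              # E(P), the total exchange rate

print(length)   # about -0.00712
print(factor)   # about 1.0071
```

Each round trip multiplies the capital by E(P) ≈ 1.0071, exactly the profit factor discussed above.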
2.1.5 It’s time to relax
Let G = [V, A] be a directed and connected graph with given arc weights. Before we embark on formulating an algorithm for the SSP we should be clear about what we really want: • If there is a negative cycle then it should be detected.
otherwise
• We may fix two vertices s and t and find a shortest path connecting s and t. This is known as the single-pair shortest path problem (SPSP). • We may fix one vertex s and want to find the shortest paths to all other vertices, commonly referred to as the single-source shortest path problem (SSSP). • Alternatively we want to determine the shortest paths between all pairs of vertices in G, the all-pairs shortest path problem (APSP). Quite remarkably, from a computational point of view finding shortest paths from one source to all other vertices is not significantly more expensive than solving the single-pair problem. In this section we shall concentrate on the SSSP and defer the all-pairs problem to Section 2. A majority of algorithms for the single-source problem are based on a very simple and effective idea: relaxation of arcs. The origin of this idea is not completely clear. As far as I could find out, it appeared for the first time in Ford (1956), and independently in the 1958 French edition of Berge (1962, p. 70). Relaxation is a very simple and intuitive concept: suppose we have found a path P(s, u) connecting s and u with length d(u) and a path P(s, v) to v having length d(v). Neither P(s, u) nor P(s, v) need be a shortest path! Suppose further that there exists an arc (u, v) with length w(u, v).
[Diagram: a path P(s, u) from s to u, a path P(s, v) from s to v, and the arc (u, v) with weight w(u, v).]
Relaxing the arc (u, v) means that we test whether we can improve the path to v found so far by going through u. This would be the case, if
d(v) > d(u) + w(u, v). (2.1)
If inequality (2.1) happens to hold, then just take the shortcut by going to v via u, that’s the idea! Easy, isn’t it? Any arc (u, v) satisfying (2.1) is called eligible for relaxation, otherwise ineligible. Now let’s craft this into an iterative algorithm. This is, apart from implementation details (some of which will be discussed shortly), the famous Bellman-Ford Algorithm6. For our algorithm to run we need two vectors d and p, each of dimension n = |V|, the number of vertices. • The vector d = [d(1), d(2), . . . , d(n)] holds the lengths of the shortest paths from s to any other vertex found so far. We initialize the distance vector by:
d(s) = 0, d(k) = ∞ for k ≠ s
• The second vector p is used to keep track of shortcuts. Initially all components of p are set to zero. Whenever an arc (u, v) is relaxed we set p(v) = u, thus indicating that going to v is shorter by passing through u and then taking arc (u, v). We will need p to reconstruct the shortest paths from s to any other vertex. After initialization we perform the following two steps:
(A) Find any eligible arc (u, v). If there is none, then STOP. (B) If (u, v) is eligible, then set d(v) = d(u) + w(u, v), update the predecessor vector p by putting p(v) = u and return to (A).
Before we give this algorithm a try, we have to discuss briefly some important questions.
6Richard Bellman (1920–1984), Lester Randolph Ford Jr. (1927–2017)
1. Does this algorithm actually find all shortest paths from s to any other vertex in G? Yes, unless G has a negative cycle, because, as we already know, in this case the SSP is ill-posed and has no solution. As the algorithm is iterative, finding the solution requires the algorithm to converge, meaning that the sequence of distance vectors has a limit. An informal argument for convergence is this: at any stage of the algorithm the distances in vector d are bounded from below by the actual minimum lengths of shortest paths; it can never happen that some d(u) is smaller than the length of a shortest path from s → u. Furthermore, the relaxation condition (2.1) guarantees that the values in vector d decrease monotonically. Note that this is an informal and incomplete argument. Of course, convergence of the Bellman-Ford Algorithm requires a rigorous proof, please see Section 2.
2. In which order? In step (A) we are required to find any eligible arc; nothing was said about how and in which order eligible arcs are processed. The Bellman-Ford Algorithm resolves this ambiguity by first forming a list of all arcs in some order, there are m = |A| of them, and then processing them one after the other, checking eligibility. By organizing the search for eligible arcs in a more sophisticated manner we get other algorithms for the SSP which are more efficient than Bellman-Ford. See Section 2.
3. When to stop? We stop when there are no more eligible arcs. But how many iterations are necessary? Suppose that there is no negative cycle; then a shortest path from s to some vertex v cannot have more than n − 1 arcs. For, if it had, then the path would have passed through at least one vertex more than once. But that can happen only if it passed through a cycle, and if all cycles have length ≥ 0, removing the cycle never makes the path longer, so a shortest path need not contain one. On the other hand, if there are negative cycles, we will always find eligible arcs and the algorithm will never stop unless we force it to do so. In the Bellman-Ford Algorithm this is simply done by: • Process the list of all arcs n − 1 times. Stop earlier when there are no more eligible arcs. • Check if there is still an eligible arc. If so, then we have encountered a negative cycle.
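The steps (A) and (B) together with the stopping rules just described fit into a few lines of code. The following is a minimal Python sketch (function and variable names are mine): it makes n − 1 passes over the arc list and then uses one extra pass to check for a negative cycle. As a test case it uses the graph of Figure 2.6, which contains the negative cycle [2, 3, 5, 2]:

```python
def bellman_ford(n, arcs, s):
    """Bellman-Ford on vertices 1..n; arcs is a list of (u, v, weight).
    Returns (d, p), or None if a negative cycle is reachable from s."""
    INF = float("inf")
    d = {u: INF for u in range(1, n + 1)}
    p = {u: 0 for u in range(1, n + 1)}
    d[s] = 0
    for _ in range(n - 1):              # n - 1 passes over the arc list
        changed = False
        for u, v, w in arcs:
            if d[u] + w < d[v]:         # eligibility test (2.1)
                d[v] = d[u] + w
                p[v] = u
                changed = True
        if not changed:                 # no eligible arc left: stop early
            break
    for u, v, w in arcs:                # one extra pass: still eligible?
        if d[u] + w < d[v]:
            return None                 # negative cycle detected
    return d, p

# The graph of Figure 2.6 contains the negative cycle [2, 3, 5, 2]:
arcs = [(1, 2, 10), (2, 3, 2), (3, 4, 8), (3, 5, 5), (5, 2, -10)]
print(bellman_ford(5, arcs, 1))   # None: the SSP has no solution here
```

Dropping the arc (5, 2) removes the cycle and the same function returns finite distances for all vertices reachable from 1.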
2.1.6 A sample run of the Bellman-Ford Algorithm
Let’s return to the simple transportation network of Example 1 and let’s find all shortest paths from vertex 1 to any other vertex. The weighted graph G = [V,A] of this network is repeated here for ease of reading:
[Diagram of the weighted graph G of Example 1; its arcs and weights are listed in the table below.]
The list of arcs and their weights is:
(u, v):    (1, 2)  (1, 3)  (2, 4)  (2, 5)  (2, 6)  (3, 2)  (3, 6)  (5, 3)  (5, 4)  (5, 6)
w(u, v):     10      7       6       3       5      −4       2       4       1       5
We initialize the distance and predecessor vectors d and p:
u   1   2   3   4   5   6
d   0   ∞   ∞   ∞   ∞   ∞
p   0   0   0   0   0   0
Now we perform a first pass through the list of arcs. Each of them is eligible for relaxation, so we get step by step:
arc (1, 2):  d(2) = ∞ > d(1) + w(1, 2) = 0 + 10 = 10, thus relax (1, 2) and put d(2) = 10, p(2) = 1
arc (1, 3):  d(3) = ∞ > d(1) + w(1, 3) = 0 + 7 = 7, thus relax (1, 3) and put d(3) = 7, p(3) = 1
...
arc (3, 2):  d(2) = 10 > d(3) + w(3, 2) = 7 + (−4) = 3, thus relax (3, 2) and put d(2) = 3, p(2) = 3
...
The first pass results in distances d and predecessors p:
u   1   2   3   4   5   6
d   0   3   7  12  11   9
p   0   3   1   5   3   3
In the second pass we find that out of 10 arcs only 5 are still eligible for relaxation, viz.,
(2, 4), (2, 5), (2, 6), (5, 4), (5, 6)
The second pass yields distances and predecessors:
u   1   2   3   4   5   6
d   0   3   7   7   6   8        (2.2)
p   0   3   1   5   2   2
Passing through the arc list a third time we find that no more arcs are eligible for relaxation, therefore we stop here. The distances and predecessors (2.2) found during the second pass are optimal. Now let’s see what we have found: column u = 4 of (2.2) tells us that a shortest path from 1 to 4 has length d(4) = 7. Also, a shortest path 1 → 6 has length d(6) = 8, etc. But what is the shortest path to, say, u = 4? This path can be found by means of the p-row in table (2.2). Let P1,4 = [1, . . . ? . . . , 4] denote the shortest path from 1 → 4. In column u = 4 we have p(4) = 5. Thus in the shortest path 1 → 4 the vertex visited immediately before 4 is vertex 5, so
P1,4 = [1,...? ... 5, 4]
The vertex visited before u = 5 is the predecessor p(5) = 2. The predecessor of 2 is p(2) = 3, the predecessor of 3 is p(3) = 1. Here we may stop, because 1 is the starting vertex; thus a shortest path from 1 → 4 is
P1,4 = [1, 3, 2, 5, 4]
But we have found even more! Looking more closely at the predecessor vector p,
u      1   2   3   4   5   6
p(u)   0   3   1   5   2   2

we see that the columns u = 2, 3, 4, 5, 6 determine arcs of the form (p(u), u). They form a subset B ⊂ A of the arcs of our graph G = [V, A], a very special subset. We used some of them to determine the shortest path 1 → 4. Let us redraw the diagram of our graph and emphasize these arcs, and also add the calculated distances d(u) to the vertices of the graph:
[Diagram: the graph of Example 1 with the arcs of B emphasized and the distances d(u) = 0, 3, 7, 7, 6, 8 attached to the vertices.]
The arcs in the subset B together with the original vertex set V = [1, 2, 3, 4, 5, 6] form a subgraph T = [V, B] of G = [V, A]. It has two very special properties: • T is a tree: in T there is exactly one path from vertex 1 to any other vertex in T. • It is a spanning tree because all vertices of the graph G belong to this tree.
This spanning tree is called the shortest path tree of G rooted in vertex 1. From T we can determine immediately all shortest paths from 1 to any other vertex in V . For instance, a shortest path from 1 → 6 is Q = [1, 3, 2, 6] and it has length d(6) = 8.
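The backward walk through the predecessor vector is easy to automate. A small Python sketch (p is copied from table (2.2), with p(1) = 0 marking the source):

```python
# Predecessor vector p from table (2.2); p(1) = 0 marks the source.
p = {1: 0, 2: 3, 3: 1, 4: 5, 5: 2, 6: 2}

def shortest_path(v):
    """Walk backwards through p until the source is reached."""
    path = [v]
    while p[path[-1]] != 0:
        path.append(p[path[-1]])
    return path[::-1]

print(shortest_path(4))   # [1, 3, 2, 5, 4]
print(shortest_path(6))   # [1, 3, 2, 6]
```

Both outputs agree with the paths P1,4 and Q read off the shortest path tree above.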
2.1.7 The complexity of the Bellman-Ford Algorithm
What is the amount of computational work to be done when solving the SSSP for an arbitrary graph G by means of the Bellman-Ford Algorithm? Of course, this depends on both the order n = |V| of G and its size m = |A|. Let us perform a worst case analysis. At the heart of the Bellman-Ford Algorithm there is the relaxation part:
if d(v) > d(u) + w(u, v) then
    d(v) = d(u) + w(u, v)
    p(v) = u
The CPU time t required to perform the relaxation of an arc (u, v) certainly depends on computer architecture and processor. This time is bounded by some constant a, i.e., t ≤ a, where the constant a is machine dependent. However, in any case a is independent of n and m. The graph has m arcs, so the time required to perform one pass through all arcs cannot be larger than a · m. If we cannot exclude a priori the existence of a negative cycle, then n passes are necessary. It follows that in total Bellman-Ford requires computing time not more than a · m · n. Thus, if we denote by τ the running time of the Bellman-Ford Algorithm, we have in the worst case τ ≤ a · m · n, where a is a constant independent of m and n. Mathematicians have a nice formalism to express this bounding; they say that τ is a big-Oh of m · n, written as
τ = O(m · n). (2.3)
This is the time complexity of the Bellman-Ford Algorithm. The big Ohs follow some rather weird arithmetic rules, but as big-Oh is a really important concept, you should learn how to handle it. Chapter 9 of Graham, Knuth, and Patashnik (2003) is a wonderful source and very helpful. The upper bound (2.3) looks somewhat harmless, but it is not. Size m and order n of a graph are not really independent. The number of arcs m may be any number between m = 0 (the graph has no arcs at all) and m = n(n − 1). In the latter case there is an arc between any pair of vertices; such a graph is called complete. These are two extremes, of course. If m is of order O(n²), i.e., m ≤ M · n² with M independent of n, then the graph G is said to be dense, it has really a lot of arcs. If m is of order O(n) then we say that G is a sparse graph, there are about as many arcs as there are vertices. Practical experience tells us that most graphs in real world applications of the SSSP are sparse.
To summarize: the running time of the Bellman-Ford Algorithm is of order O(n³) when G is dense, and it runs considerably faster, namely in O(n²) time, when G is sparse. So, after all, is Bellman-Ford an efficient algorithm? Under particular circumstances (negative weights, possibility of negative cycles) it is fairly efficient. But you will find out that for graphs having certain special properties Bellman-Ford is rather slow compared to other algorithms for the SSSP, see Section 2 below. After having read this Invitation so far, the question arises: Are you still interested in this topic? If so, fine! Welcome aboard! Please read on and see what I want from you.
2.2 Where to go from here
There are a few points which I find you should discuss carefully, issues of general interest. I have also collected some ideas and suggestions, optional issues, which you may find worth exploring and presenting in your thesis. But, of course, you are free (and strongly encouraged) to formulate your own ideas and make them part of your work. Regarding structure and design: your thesis should be a fine mixture of theory and practice. Develop theoretical concepts and underpin these by appropriate examples. Note that this makes it necessary to implement algorithms in some computing environment. Regarding the latter there are practically no restrictions. The computing environment may be R, matlab/octave, java, python or some other programming language.
2.2.1 Issues of general interest
Data structures
A most important point to be discussed quite early in your thesis is how to represent graphs numerically. You will have to learn about adjacency matrices and adjacency lists. Explain these concepts, discuss their implementation, their advantages and disadvantages, see Cormen et al. (2001, chapter 23) or Gondran and Minoux (1995, chapter 1).
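For a first impression, both representations can be sketched in a few lines (Python; the arc list is the one of Example 1, the variable names are mine):

```python
# Adjacency matrix: an (n+1) x (n+1) table, entry [u][v] holds w(u, v),
# None meaning "no arc".  Adjacency list: per vertex the outgoing arcs only.
n = 6
arcs = [(1, 2, 10), (1, 3, 7), (2, 4, 6), (2, 5, 3), (2, 6, 5),
        (3, 2, -4), (3, 6, 2), (5, 3, 4), (5, 4, 1), (5, 6, 5)]

matrix = [[None] * (n + 1) for _ in range(n + 1)]
adj = {u: [] for u in range(1, n + 1)}
for u, v, w in arcs:
    matrix[u][v] = w          # O(n^2) memory, O(1) arc lookup
    adj[u].append((v, w))     # O(n + m) memory, fast iteration over arcs

print(matrix[3][2])   # -4
print(adj[5])         # [(3, 4), (4, 1), (6, 5)]
```

The trade-off sketched in the comments is exactly the one you should discuss: matrices favor random access, lists favor sparse graphs and whole-graph sweeps.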
Relaxation
Provide a more detailed discussion of the relaxation principle. It has some interesting properties which deserve a presentation, see Cormen et al. (2001, chapter 25). Also, give a complete proof that relaxation converges to the optimum and actually finds all shortest paths from a single source vertex. Good references
in this context are Sedgewick and Wayne (2011, chapter 4) and Gondran and Minoux (1995, chapter 2).
The Bellman-Ford Algorithm
Implement the Bellman-Ford Algorithm whose basics we have discussed above. Demonstrate it by running the algorithm on a graph of your choice (maybe you find some interesting application). How does your algorithm behave in the case of a graph having a negative cycle?
Dijkstra’s Algorithm
This is one of the most famous and most efficient algorithms for solving the single-source shortest path problem. It is due to Edsger W. Dijkstra (1959)7. Recall that in the Bellman-Ford Algorithm we process the list of all arcs in some order. However, a clever choice of the processing order may improve efficiency a lot. Explain in detail how Dijkstra’s algorithm determines the order in which arcs are processed. Show that a naïve implementation has running time O(n²), where, as always, n = |V| is the number of vertices. If the graph is sparse, then using special data structures this time bound can be improved to O((m + n) log n), with m = |A| the number of arcs. However, Dijkstra’s Algorithm can be used only when arc weights are nonnegative. It fails in the presence of negative weights. Why?
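As a teaser, a heap-based sketch of Dijkstra's idea might look as follows (Python; the small test graph is made up for illustration and has only nonnegative weights):

```python
import heapq

def dijkstra(n, arcs, s):
    """Single-source shortest paths; requires nonnegative arc weights."""
    adj = {u: [] for u in range(1, n + 1)}
    for u, v, w in arcs:
        adj[u].append((v, w))
    INF = float("inf")
    d = {u: INF for u in range(1, n + 1)}
    d[s] = 0
    pq = [(0, s)]              # priority queue of (distance, vertex)
    done = set()
    while pq:
        du, u = heapq.heappop(pq)   # closest unfinished vertex first
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u]:
            if d[v] > du + w:       # the same relaxation test (2.1)
                d[v] = du + w
                heapq.heappush(pq, (d[v], v))
    return d

# A made-up example graph with nonnegative weights:
arcs = [(1, 2, 2), (1, 3, 5), (2, 3, 1), (3, 4, 2), (2, 4, 7)]
print(dijkstra(4, arcs, 1))   # {1: 0, 2: 2, 3: 3, 4: 5}
```

The heap determines the processing order: vertices are finished in order of increasing distance, which is exactly the point you should explain in your thesis.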
2.2.2 Some more suggestions
Unit weight graphs - Breadth-First Search
There are interesting applications of the shortest path problem where all arcs of the underlying graph have the same weight, say w(u, v) = 1 for all arcs (u, v) ∈ A. Find an interesting application of unit weight graphs and give an example. Interestingly, there are some problems in recreational mathematics which lead to unit weight graphs8. The most efficient way to solve the SSSP for these special graphs is a method commonly known as Breadth-First Search (BFS). Explain this algorithm and show that its running time is O(m + n). One also says that BFS is linear in time. Actually, this is the best we can hope for. The book of Dasgupta, Papadimitriou, and Vazirani (2008, chapter 4) will be very helpful in this context.
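A sketch of BFS on a unit-weight graph (Python; adjacency lists, the example graph is made up):

```python
from collections import deque

def bfs_distances(adj, s):
    """Shortest path lengths from s when every arc has weight 1."""
    d = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()            # FIFO order: nearest vertices first
        for v in adj.get(u, []):
            if v not in d:             # first visit = shortest distance
                d[v] = d[u] + 1
                queue.append(v)
    return d

# A small made-up unit-weight graph:
adj = {1: [2, 3], 2: [4], 3: [4, 5], 4: [6], 5: [6]}
print(bfs_distances(adj, 1))   # {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
```

Each vertex enters the queue at most once and each arc is inspected at most once, which is where the O(m + n) bound comes from.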
7(1930–2002), Dutch computer scientist
8Alcuin of York’s (ca. 735–804) medieval puzzle of the goat, the wolf and the cabbage is a quite famous example; also the Two Jugs or Wine Decanting Problem is a shortest path problem on unit weight graphs.
Longest paths in scheduling
In Example 4 I have presented a classical problem from scheduling theory: finding the makespan of a project consisting of several concurrent tasks. The underlying precedence graphs have no cycles and therefore it is always possible to put some order on the vertices, a topological order. Once such an order is established (you should explain how this can be done), finding the longest path in a precedence graph can be accomplished in linear time. Discuss various aspects of the planning problem by means of an interesting example. See Gondran and Minoux (1995, pp. 67). You may also point out the relation to a classical optimization paradigm, dynamic programming.
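The two ingredients, a topological sort plus one dynamic-programming sweep, can be sketched like this (Python; the topological order is computed with Kahn's algorithm, and the mini-project data is made up):

```python
from collections import defaultdict

def longest_paths(n, arcs, s):
    """Longest path lengths from s in a DAG on vertices 1..n,
    via a topological order and one DP sweep."""
    adj = defaultdict(list)
    indeg = {u: 0 for u in range(1, n + 1)}
    for u, v, w in arcs:
        adj[u].append((v, w))
        indeg[v] += 1
    order = []                                   # Kahn's algorithm
    queue = [u for u in indeg if indeg[u] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    NEG = float("-inf")
    d = {u: NEG for u in range(1, n + 1)}
    d[s] = 0
    for u in order:                              # sweep in topological order
        if d[u] > NEG:
            for v, w in adj[u]:
                d[v] = max(d[v], d[u] + w)       # longest, hence max
    return d

# A made-up mini-project: two chains of tasks meeting in vertex 4.
arcs = [(1, 2, 3), (1, 3, 2), (2, 4, 4), (3, 4, 6)]
print(longest_paths(4, arcs, 1)[4])   # 8: the makespan, via [1, 3, 4]
```

Because every vertex and arc is touched a bounded number of times, the whole computation is linear in n + m.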
All pairs of shortest paths
There are several interesting problems in combinatorial optimization where it is necessary to determine shortest paths between any pair of vertices in a weighted graph. The most prominent application is the Traveling Salesman Problem. A salesman has to visit customers in n different cities which we regard as vertices in a network of roads. How can we design a shortest route such that each city is visited at least once and the salesman returns to the city where he started his tour? Note, I have emphasized that each city has to be visited at least once. If it were possible to design the tour in such a way that each city is visited exactly once, then the route would be a Hamiltonian cycle. The Hamiltonian property is very special. Except for special cases of network structure, for instance when the graph is complete, no efficient methods are known to find out whether such a cycle even exists, not to speak of finding it. However, it is possible to embed a connected graph into a complete graph, which is always Hamiltonian. For this to be accomplished, shortest paths between all pairs of vertices have to be found. One reasonable possibility is to apply Dijkstra’s Algorithm to each vertex, provided there are no negative arc weights. Using a fast implementation of Dijkstra’s Algorithm this can be done very efficiently. But there are also other methods of comparable efficiency, the most prominent being the Algorithm of Floyd-Warshall which has a running time of O(n³). A nice feature of this algorithm and some of its refinements is that it can also handle negative arc weights. Of course, negative cycles are still not permitted.
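For comparison, the triple loop of Floyd-Warshall fits into a few lines. The sketch below (Python; names are mine) recomputes on the graph of Example 1 the distances d(4) = 7 and d(6) = 8 found by Bellman-Ford in Section 2.1.6:

```python
def floyd_warshall(n, arcs):
    """All-pairs shortest path lengths on vertices 1..n; handles
    negative arc weights, but not negative cycles."""
    INF = float("inf")
    d = [[INF] * (n + 1) for _ in range(n + 1)]
    for u in range(1, n + 1):
        d[u][u] = 0
    for u, v, w in arcs:
        d[u][v] = w
    for k in range(1, n + 1):          # allow k as an intermediate vertex
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# The arc list of Example 1 (Section 2.1.6):
arcs = [(1, 2, 10), (1, 3, 7), (2, 4, 6), (2, 5, 3), (2, 6, 5),
        (3, 2, -4), (3, 6, 2), (5, 3, 4), (5, 4, 1), (5, 6, 5)]
d = floyd_warshall(6, arcs)
print(d[1][4], d[1][6])   # 7 8
```

The three nested loops over n vertices make the O(n³) running time evident.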
Negative arc weights
Negative weights cause problems, as we have seen. Not only may they induce negative cycles which render the SSP unsolvable, they also affect efficiency: Dijkstra runs much faster than Bellman-Ford. So it’s quite natural to ask whether there is a way to get rid of negative weights. The following idea is attractive because of its simplicity and seems so obvious, but it does not work in general. Possible troubles are sketched in Figure 2.7.
In the graph shown on the left side we have a negative weight, w(3, 4) = −9.
[Figure 2.7: twice the same graph on vertices 1, 2, 3, 4. Left: w(1, 2) = 3, w(1, 4) = 2, w(2, 3) = 7, w(3, 4) = −9. Right: the same arcs after adding 9 to every weight: 12, 11, 16, 0.]
Figure 2.7: Adding constants may cause troubles
A shortest path 1 → 4 certainly exists and is easily found by inspection: P = [1, 2, 3, 4]. Suppose we add 9 to all weights; then these become nonnegative, but the shortest path changes too, as can be seen from the right-hand picture. Now a shortest path 1 → 4 is simply P = [1, 4]. Although this idea is too simple to work in general, a more sophisticated approach of changing weights does work and gives rise to another interesting way of solving the APSP, Johnson’s Algorithm, which on sparse graphs is even more efficient than the algorithm of Floyd-Warshall.
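The failure of the naive shift is easy to demonstrate numerically. In the sketch below (Python; the weights are those read off Figure 2.7), adding 9 penalizes the three-arc path P three times but the one-arc path Q only once:

```python
# Arc weights read off Figure 2.7 (left graph).
arcs = {(1, 2): 3, (1, 4): 2, (2, 3): 7, (3, 4): -9}

def plen(path, w):
    """Length of a path given as a list of vertices."""
    return sum(w[(u, v)] for u, v in zip(path, path[1:]))

P, Q = [1, 2, 3, 4], [1, 4]
print(plen(P, arcs), plen(Q, arcs))         # 1 2: P is the shorter path

shifted = {arc: w + 9 for arc, w in arcs.items()}   # add 9 to every weight
print(plen(P, shifted), plen(Q, shifted))   # 28 11: now Q wins
```

A path gains 9 units per arc, so paths with many arcs are penalized more heavily; this is exactly why a uniform shift cannot preserve shortest paths.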
Scaling
Substantial improvements in efficiency are achievable when arc weights are integers. This is not an uncommon situation; for instance, in scheduling arc weights are processing times which are typically given by positive integers. Scaling algorithms take the binary representation of the weights and uncover the bits one at a time, from the most significant (leftmost) bit down to the least significant bit. In a first pass shortest paths are determined using the most significant bit of the arc weights only, in the second pass we use the two highest order bits, and so on. The point is that these problems are not independent and it is possible to find optimal path lengths dk(u) based on the first k bits from the path lengths dk−1(u) very efficiently. This idea was introduced by Gabow (1985), see also Cormen et al. (2001, pp. 547). Scaling algorithms have a running time of order O(m log W), where W = max w(u, v) is the maximum arc weight. Variants of these algorithms can also handle negative arc weights.
2.2.3 To be avoided
Your thesis should not become a new booklet on graph theory. It is not necessary to give an introduction into the basic concepts of graph theory and explain the terms graph, path, cycle, etc. You may presuppose that your interested readers either have some knowledge in this field or are willing to acquire it. In the latter case it is sufficient to give some references to books or other resources you have found to be useful during your studies. See the Annotated Bibliography
in Section 3. Avoid discussion of the SSP on undirected graphs unless you know what you are doing. As long as arc weights are positive there is no problem, all standard algorithms will also work for undirected graphs because these can be transformed easily into directed graphs; we have talked about that very briefly on page 30. However, in case of negative weights this simple transformation will inevitably create negative cycles and render the SSP unsolvable. Solving the SSP on undirected graphs with general weights is much more difficult because this requires the concept of matching, a very advanced theme in graph theory. As this thesis topic is one with a strong graph theoretic flavour, you will have to draw diagrams of graphs, there will be tables, outlines and listings of algorithms, etc. Do not copy such items from anywhere and paste them into your text. This is very bad style and will not be accepted. As your thesis has to be typeset in LaTeX, watch out for appropriate LaTeX packages. For instance, graph drawing is very easy when you use the tikz package, available from various sites on the web. This package is also very well documented, thus you will have no trouble drawing fine diagrams. By the way, all graphs in this Invitation have been created with tikz.
2.3 An Annotated Bibliography
There is a large number of excellent books on graph theory. One of the best introductory texts is Chartrand (1975). It is strongly recommended if you want to get an overview of the major concepts of graph theory (which you should). Another (more advanced) book is Berge (1962). Claude Berge (1926–2002) was one of the most influential French mathematicians of the 20th century, contributing many deep results in combinatorics and graph theory. Berge was also interested in the arts, he was sculptor, painter and novelist9. Berge’s book has a chapter on shortest paths (chapter 7). In it you can find one of the first formulations of the relaxation principle. Interestingly, most books on graph theory do not cover the SSP. An explanation might be that most graph theorists view the SSP not as a problem interesting in graph theory but as one belonging to combinatorial optimization. However, there are a couple of books emphasizing algorithms for graphs and these books usually contain thorough discussions of the SSP. The books I like most are Gondran and Minoux (1995), Christofides (1975) and Gibbons (1991). The text of Michel Gondran and Michel Minoux is strongly influenced by Claude Berge; it has a long chapter on the SSP. There you will find a simple proof that iterated relaxations actually converge to the optimal solution. And you will also find there a detailed account of the project scheduling problem and longest paths. In textbooks on combinatorial optimization the SSP always occurs at a promi-
9Among other things he wrote the murder mystery Who killed the Duke of Densmore? Sherlock Holmes could answer this question by means of a theorem on perfect graphs, an important topic in the scientific work of Berge.
nent place as it is the classical combinatorial optimization problem. A wonderful book is Lawler (2001). This text is more or less self-contained, it has a very readable introductory chapter on graph theory (chapter 2) and gives a deep coverage of the SSP in chapter 3. The approach to the SSP is somewhat different from ours, as it is not directly based on the relaxation idea but on a fundamental system of nonlinear equations arising from Bellman’s Optimality Principle, which lies at the heart of dynamic programming. When talking about combinatorial optimization one must also mention Papadimitriou and Steiglitz (1998) as it is a classical textbook in this field. The SSP occurs in this book at various places (there’s no chapter devoted exclusively to the SSP). Really interesting is the formulation of the SSP as a linear programming problem with integer-valued decision variables. When you have some background in linear programming, you may find this book a valuable source. The SSP is also of some significance in computer science, so it does not come as a surprise that textbooks from this field usually contain interesting material. There are three books I strongly recommend. First, there is the Bible of Computer Programmers, Cormen et al. (2001). It is a quite voluminous book with an extensive part on algorithms for graphs. There you can find, among other things, a thorough discussion of the relaxation principle. A fine text is Dasgupta, Papadimitriou, and Vazirani (2008), a really gentle introduction to algorithms with a careful coverage of the SSP. Last, but not least, there is the excellent book of Sedgewick and Wayne (2011). Like Cormen et al. it has a part on algorithms for graphs with a fine chapter on the SSP. This book also discusses in detail the computer implementation of algorithms; the language used throughout the book is Java.
Finally, regarding the origins of the SSP in the 1950s, you may consult the paper of Schrijver (2012).
2.4 References
[1] Claude Berge. The Theory of Graphs and its Applications. Methuen, London, 1962.
[2] Gary Chartrand. Introductory Graph Theory. Dover Publications, 1975.
[3] Nicos Christofides. Graph Theory: An Algorithmic Approach. Academic Press, 1975.
[4] Thomas H. Cormen et al. Introduction to Algorithms. 2nd ed. McGraw-Hill Higher Education, 2001.
[5] Sanjoy Dasgupta, Christos H. Papadimitriou, and Umesh Vazirani. Algorithms. 1st ed. New York, NY, USA: McGraw-Hill, Inc., 2008. url: http://cseweb.ucsd.edu/~dasgupta/book/toc.pdf.
[6] Edsger W. Dijkstra. “A note on two problems in connexion with graphs”. In: Numer. Math. 1 (1959), pp. 269–271.
[7] L. R. Ford. Network Flow Theory. RAND Corporation, 1956.
[8] Harold N. Gabow. “Scaling algorithms for network problems”. In: Journal of Computer and System Sciences 31.2 (1985), pp. 148–168.
[9] Alan Gibbons. Algorithmic Graph Theory. Cambridge University Press, 1991.
[10] Michel Gondran and Michel Minoux. Graphs and Algorithms. John Wiley and Sons, 1995.
[11] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. 2nd ed. Addison-Wesley, 2003.
[12] Eugene L. Lawler. Combinatorial Optimization: Networks and Matroids. Dover Publications, 2001. url: www.plouffe.fr/simon/math/CombinatorialOptimization.pdf.
[13] Christos H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Dover Publications, 1998.
[14] Alexander Schrijver. “On the history of the shortest path problem”. In: Documenta Mathematica (2012), pp. 155–167.
[15] Robert Sedgewick and Kevin Wayne. Algorithms. 4th ed. Addison-Wesley, 2011.
Topic 3
The Seven Bridges of Königsberg
This question is so banal, but seemed to me worthy of attention in that neither geometry, nor algebra, nor even the art of counting was sufficient to solve it. Leonhard Euler, March 13, 1736 (in a letter to Giovanni Marinoni)
Keywords: graph theory, Euler paths, Euler cycles, the Chinese Postman Problem and its variants, urban operations research
3.1 An Invitation
3.1.1 Euler’s 1736 paper
This topic deals with a classical problem from graph theory which is very easy to state and also easy to understand. In pictorial language: consider a network of roads and find a tour through the network such that each road is used exactly once. Such a tour, when it exists, will be called an Euler cycle if the tour returns to the location where it started. If initial point and end point of the tour do not coincide but still all roads are traversed exactly once, then we shall call it an Euler path. The problem has its origin in a famous puzzle which attracted Euler’s interest in 1736. The results of Euler’s efforts to solve the puzzle were compiled into one of Euler’s most celebrated publications (Euler, 1736). This paper marks the beginning of two new branches of mathematics: topology and graph theory.

[Portrait: Leonhard Euler (1707–1783)]

Topology, today a basic discipline of modern mathematics, deals with properties of space and bodies which remain invariant with respect to continuous transformations, like stretching, etc. Topology goes back to Gottfried Leibniz (1646–1716) who envisaged a new kind of analysis, a geometry of position (geometria situs). On the other hand, graph theory deals with binary
relations between elements of a given set. As such it is a prominent part of what is known today as discrete mathematics. There is some confusion with respect to the date of publication of Euler's paper. In the records of the St. Petersburg Academy of Science it is noted that Euler presented the Königsberg problem and its solution in a talk given on August 26, 1735. The paper is contained in the Commentarii of 1736, but their publication was delayed, so this volume did not appear in print before 1741. Very likely, the presentation date 1735 is a misprint (Grötschel and Yuan, 2012).
3.1.2 Königsberg and a puzzle
In the 18th century Königsberg (Regiomonti), the old capital of East Prussia, was one of the most important cultural and economic centers around the Baltic Sea. It was home of the Albertina University, home of the philosopher Immanuel Kant and of several famous mathematicians, just to mention Carl Gustav Jacob Jacobi, David Hilbert and Hermann Minkowski. Over centuries Königsberg grew into a large and wealthy town with many beautiful churches and a fine cathedral dedicated to Virgin Mary and St. Adalbert. In the middle of the 18th century (the time our story plays) Königsberg had about 50 000 inhabitants. Its situation on the Pregel River made it an ideal trading center for many commodities, such as grain, potash, salt, hemp, and wood (R. J. Wilson, 1986). The city was set on both sides of the Pregel and included two large islands which were connected to each other and to the mainland by seven bridges: Grüne Brücke, Krämerbrücke, Schmiedebrücke, Hohe Brücke, Holzbrücke, Köttelbrücke, and Honigbrücke.

[Photograph: Königsberg around 1930; in the foreground the open Schmiedebrücke]

In Königsberg there was the nice tradition of the Corso. On Sunday afternoons families used to promenade through the center of the city, having some rest in one of the cafés, meeting friends, having conversations, and, yes, puzzling over a remarkable question which must have originated at that time: Is it possible to walk through the city, crossing each of the seven bridges once and only once, and to return to the point where the promenade started?
3.1.3 Euler takes notice of the puzzle
Nobody could solve this puzzle, and since all attempts to solve it had always failed, it was commonly believed that the task is impossible. The question

Figure 3.1: Map of Königsberg and its seven bridges. Source: Heritage Schoolhouse

remained unsettled until the famous Swiss mathematician Leonhard Euler (1707–1783) took notice of it. It is not known when and from whom Euler learned about this problem for the first time. Actually, Euler never visited Königsberg, as far as we know. But very likely the historical course was this (Sachs, Stiebitz, and J. R. Wilson, 1988): Carl Leonhard Gottlieb Ehler, at that time mayor of the city of Dantzig, was a friend of Euler and a mathematical enthusiast. During the years 1735–1742 he corresponded with Euler in St. Petersburg, mainly acting as an intermediary between Euler and Heinrich Kühn, professor of mathematics at the Academic Gymnasium in Dantzig. Via Ehler, Kühn communicated the Königsberg Problem to Euler. In a letter dated March 9, 1736, Ehler wrote to Euler1:
You would render to me and our friend Kühn a most valuable service, putting us greatly in your debt, most learned Sir, if you would send us the solution, which you know well, to the problem of the seven Königsberg bridges, together with a proof. It would prove to be an outstanding example of the calculus of position [Calculi Situs], worthy of your great genius. I have added a sketch of the said bridges . . .
It took Euler only a few days to find a solution. He immediately reported it to Giovanni Jacobo Marinoni (1670–1755), astronomer at the imperial court in Vienna. In a letter dated March 13, 1736, Euler wrote:
A problem was proposed to me about an island in the city of K¨onigsberg, surrounded by a river spanned by seven bridges, and I was asked whether someone could traverse the separate bridges in a connected walk in such a way that each bridge is crossed only once. I was informed that hitherto no-one had demonstrated the possibility of doing this, or shown that it is
1This and the following excerpts of letters are translations taken from Sachs, Stiebitz, and J. R. Wilson (1988).
impossible. This question is so banal, but seemed to me worthy of attention in that neither geometry, nor algebra, nor even the art of counting [ars combinatoria] was sufficient to solve it. In view of this, it occurred to me to wonder whether it belonged to the geometry of position [geometria situs], which Leibniz had once so much longed for. And so, after some deliberation, I obtained a simple, yet completely established, rule with whose help one can immediately decide for all examples of this kind, with any number of bridges in any arrangement, whether such a round trip is possible, or not. . .
And on April 13, 1736, Euler replied to Ehler's letter of March 9. The following citation reveals clearly that Euler didn't consider the Königsberg problem an interesting one from the standpoint of a mathematician. Indeed, he considered it merely a puzzle:
Thus you see, most noble Sir, how this type of solution bears little relationship to mathematics, and I do not understand why you expect a mathematician to produce it, rather than anyone else, for the solution is based on reason alone, and its discovery does not depend on any mathematical principle. Because of this, I do not know why even questions which bear so little relationship to mathematics are solved more quickly by mathematicians than by others. In the meantime, most noble Sir, you have assigned this question to the geometry of position, but I am ignorant as to what this new discipline involves, and as to which types of problem Leibniz and Wolff2 expected to see expressed in this way.
Not much later Euler must have changed his mind, now considering this puzzle important enough to write a paper about it.
3.1.4 Euler’s solution
In the sequel I will cite freely from Euler's paper, in the translation due to Michael Behrend (2012). Euler starts his paper by relating the Königsberg Problem to the geometry of position (geometria situs) initiated by Leibniz. And now let us listen to Euler's own words. In paragraph 2 he writes:

§ 2. This problem, then, which was described to me as quite well known, was as follows: At Königsberg in Prussia there is an island A called der Kneiphof, and the river around it is divided into two branches, as shown in the figure; and the branches of this river are crossed by seven bridges a, b, c, d, e, f, and g. The following question was now raised concerning these bridges: whether someone could arrange a walk in such a way as to travel over every bridge once and not more than once. Some people (I was told) deny that this is possible, and others doubt it; but nobody asserts it. From this I set myself the following quite general problem: whatever the form of the river and its
2Abraham Wolff (1710–1795), Jewish mathematician in Berlin, friend of Euler. Gotthold Ephraim Lessing memorialized Wolff in the figure of the dervish Al Hafi in his drama Nathan der Weise.
distribution into branches, and whatever the number of bridges, to find whether it is possible for each bridge to be crossed only once, or not.
Figure 3.2: A simplified map of K¨onigsberg
Euler begins his analysis by replacing the map of the city by a simpler diagram, as shown in Figure 3.2. It is likely that for his study Euler used an even more simplified diagram, which represents what is called a graph.
Figure 3.3: The translation of Figure 3.2 into a graph
You can find a more formal definition of graphs in Topic 2: Shortest Paths in Networks. However, for our purpose it is sufficient to pursue a somewhat pedestrian approach: A graph is a pair G = (V, E) where V is a set of points (in our case V = {A, B, C, D}, the islands and riverbanks) and E a set of lines connecting these points (the bridges E = {a, b, c, d, e, f, g}). The points are called vertices, the line segments edges. Euler then remarks that in principle it is possible to solve the Königsberg Problem by making an exhaustive list of all possible routes and finding by inspection whether a particular route satisfies the conditions of the problem. But he immediately rejects this approach as impractical because the number of different routes will be too large, in general. Indeed, Euler doesn't want to solve only the
Königsberg Problem, he is seeking a general method suitable for much bigger networks. Next Euler represents routes as sequences of capital letters. For instance, ABDC represents the route starting on the island A, then passing to the riverbank B by using bridge a or b, continuing to island D crossing the bridge f and going to C by passing bridge g. In modern graph-theoretic terms such a sequence is called a walk. Observe that Euler, when writing ABDC for a walk, allows for some ambiguity in that the bridges used are not specified. Later in his paper he resolves this ambiguity by writing more explicitly for a possible walk from A to C: AaBfDgC. More generally: If a graph has vertex set V and edge set E, then a walk is a sequence of alternating vertices and edges v0, e1, v1, . . . , en, vn. If all edges are different (which is certainly required by the Königsberg Problem) the walk is called a path. If initial and terminal vertex of a path coincide, the path becomes a cycle. If this cycle contains all edges of the graph, the path (cycle) is called an Euler path (cycle). By careful analysis Euler identifies the crucial condition for the existence of a solution of the Königsberg Problem and its generalizations to more complex networks. It is the parity (odd or even) of the number of bridges which lead to an area. He writes in paragraph 20:

§ 20. Therefore in any given case it will be very easy to decide straightaway whether a crossing by each bridge once only can be planned or not, with the help of the following rule. If there are more than two regions with an odd number of bridges leading into them, then it can safely be stated that there is no such crossing. And if there are exactly two regions with an odd number of bridges leading into them, then the crossing can be done, provided the walk is started in one of these two regions.
Finally, if there is no region at all with an odd number of bridges leading into it, then the crossing can be planned in the desired way, and the start of the walk can be placed in any region. Therefore the rule just given fully satisfies the statement of the problem.

In modern graph theory, the number of edges leading to a vertex v is called the degree of that vertex, usually denoted by d(v). Thus § 20 may be summarized by the following statement: let G = (V, E) be a graph which is connected, which means that it is possible to find a path between any pair of vertices, so there are no isolated areas. Then:
1. G contains an Euler cycle if it has no vertices of odd degree.
2. G contains an Euler path (initial and terminal vertex are different) if it has exactly two vertices of odd degree.
These conditions are really easy to check. For the Königsberg Problem we read off from Figure 3.3:

d(A) = 5, d(B) = 3, d(C) = 3, d(D) = 3,
so all vertex degrees are odd numbers, and therefore the Königsberg Problem has no solution: there is no Euler cycle and also no Euler path. Euler also discusses another example, where an Euler path exists but no Euler cycle, see Figure 3.4.
Figure 3.4: Another example in Euler’s paper
Representing areas again as vertices, it is an easy task to draw the graph representing this map. Counting the vertex degrees yields:
d(A) = 8, d(B) = 4, d(C) = 4, d(D) = 3, d(E) = 5, d(F ) = 6
There are exactly two vertices of odd degree, D and E, thus no Euler cycle exists, but an Euler path can be found. It must lead from D to E (or vice versa) such that it traverses each bridge exactly once. Can you find this path? Of course, you can find one such path (maybe more?) just by patiently searching the graph for an Euler path. But, certainly, you will come to the conclusion that a naïve search may be too cumbersome in bigger graphs. What is needed is an algorithm! Let us read what Euler says about this problem. In paragraph 21 (the last in his paper) he writes:

§ 21. But when it has been found that such a crossing can be arranged, the question remains how the walk is to be carried out. For this I use the following rule: in imagination, let the bridges that lead from one region to another be removed in pairs as many times as possible, by which the number of bridges will in most cases be greatly reduced; then let a walk be planned across the remaining bridges, which is easily done; then the bridges that were imagined as removed will not much disturb the walk so found, as will at once be seen on very little consideration; nor do I think it necessary to give more rules for arranging the walks.

Frankly speaking, the rule Euler formulates can hardly be considered an algorithm. Still, there is no doubt that Euler knew how to construct an Euler path. But we should bear in mind that the theory of algorithms did not exist
in Euler's time, nor did Euler have the concept of recursion, which proves to be very useful (Grötschel and Yuan, 2012).
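Euler's rule of § 20 translates directly into a short computer check. The following sketch (in Python, one possible choice; the function names are my own) encodes the Königsberg graph of Figure 3.3 as an edge list and applies the parity rule:

```python
from collections import Counter

# The seven bridges of Figure 3.3 as an edge list (a multigraph):
# a, b join A and B; c, d join A and C; e = A-D, f = B-D, g = C-D.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def degrees(edges):
    """Vertex degrees d(v): every edge contributes 1 to each endpoint."""
    d = Counter()
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d

def euler_check(edges):
    """Euler's rule of § 20 (the graph is assumed to be connected)."""
    odd = sorted(v for v, deg in degrees(edges).items() if deg % 2 == 1)
    if len(odd) == 0:
        return "Euler cycle exists"
    if len(odd) == 2:
        return "Euler path exists between %s and %s" % (odd[0], odd[1])
    return "neither Euler cycle nor Euler path"

print(degrees(bridges))       # d(A) = 5, d(B) = 3, d(C) = 3, d(D) = 3
print(euler_check(bridges))   # neither Euler cycle nor Euler path
```

Adding an eighth bridge between B and D, as in the historical episode discussed below, turns the verdict into "Euler path exists between A and C".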
3.1.5 What happened to the problem later?
About 100 years after its publication in the St. Petersburg Commentarii, Euler's work on the Königsberg Problem was almost forgotten. In 1851 E. Coupy published a French translation of Euler's paper, mainly intended for his students at the École Polytechnique in Paris, and in 1875 Louis Saalschütz, professor at the Königsberg University, pointed out that after a new bridge (the Kaiserbrücke) had been built connecting riverbanks B and D, an Euler path from A to C was now possible. Hence, there are only two odd vertices, A and
Figure 3.5: In 1875 a new bridge was built connecting B and D
C, so at least one Euler path from A → C is possible. E.g.,
[A, D, B, A, B, D, C, A, C]
However, an Euler cycle is still impossible.

Euler's findings have been rediscovered several times, notably in Recreational Mathematics3, in particular in the context of unicursal problems, mazes and labyrinths. A unicursal problem is a diagram-tracing puzzle: we are given a diagram like the famous Lantern of Santa Claus in Figure 3.6. It is required to draw the diagram with as few pen strokes as possible, without drawing a line more than once. The lantern is certainly unicursal, as it has an Euler path from A → B. Indeed, there are 44 such paths!
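If you distrust that count, you can let a computer do the patient searching. The little backtracking sketch below counts all Euler paths starting in A; the edge list is my reading of Figure 3.6 (a square A, B, C, E with both diagonals, topped by the roof apex D), and by the left–right symmetry of the lantern the count does not depend on which bottom corner is labeled A:

```python
# The lantern of Figure 3.6, as I read it (labeling is my assumption):
lantern = [("A", "B"), ("A", "E"), ("B", "C"), ("E", "C"),
           ("A", "C"), ("B", "E"), ("E", "D"), ("C", "D")]

def count_euler_paths(edges, v):
    """Count, by backtracking, all walks from v that use every edge exactly once."""
    if not edges:
        return 1                       # every edge drawn: one complete stroke sequence
    total = 0
    for i, (a, b) in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        if a == v:                     # traverse this edge away from v ...
            total += count_euler_paths(rest, b)
        elif b == v:                   # ... in whichever direction it is incident
            total += count_euler_paths(rest, a)
    return total

print(count_euler_paths(lantern, "A"))   # 44 - and each of these paths ends in B
```

Starting in an even-degree vertex such as C, the same function returns 0, in accordance with Euler's rule.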
3.1.6 An epilog: Königsberg and its bridges today
The beautiful medieval city of Königsberg no longer exists. It was destroyed by bombing raids in late summer 1944 and by military actions during the battle of Königsberg in winter 1944/45. The bridges, the principal actors in
3You may have seen that Chapter 1 in our booklet is devoted to this subject.
Figure 3.6: The Santa Claus Lantern is unicursal
our story, were either destroyed or severely damaged. Today Königsberg is a Russian naval base named Kaliningrad, and the river Pregel has been renamed Pregolya. The seven bridges no longer exist. Today there are eight bridges crossing the Pregolya river, partly reconstructions of the old ones. Of the Schmiedebrücke only the pillars still exist.
3.1.7 The Chinese Postman Problem
Are there any useful applications of Euler's findings? So far you might have got the impression that Euler merely solved a puzzle, not more. Yes, there are! Let me introduce you to a most famous optimization problem: the Chinese Postman Problem.

In the city of Qufu4 (Shandong Province, China) there are many post offices. Each post office serves a certain district of the city, and these districts are divided into subnetworks of roads which are assigned to postmen for mail delivery. Figure 3.7 displays such a subnetwork of roads to be served by a single postman.

Figure 3.7: A road network for a Chinese postman

4Place of birth of the great Chinese philosopher Confucius (551 BC – 479 BC).

The numbers attached to the roads are walking distances, in kilometers, say. On each of his daily tours the postman starts at the post office (PO), then has to pass through each of the roads at least once, and finally returns to the office. Now a natural question is: How can we arrange a tour of minimum length? This question has become known as the famous Chinese Postman Problem (CPP), stated and solved first by the Chinese mathematician Mei-Ko Kwan in 1960. The name of this problem is very likely due to Jack Edmonds (1965), who coined the term in honor of Mei-Ko Kwan. The CPP is a classical combinatorial optimization problem like the Traveling Salesman Problem (TSP), but unlike the latter it is well-behaved in the sense that even large instances can be solved in reasonable time.

But how is the CPP related to the Königsberg Puzzle discussed by Euler? To see this relation, let us first represent the road network of Figure 3.7 schematically by a graph. The post office, junctions and turns are the vertices of the graph; the roads are represented by edges to which the distances are assigned as weights. The resulting graph is shown in Figure 3.8 below, where the post office is located at vertex 1.
Figure 3.8: The graph corresponding to the network of Figure 3.7 (vertices 1–9 arranged in a 3 × 3 grid; edge weights 1–2: 5, 2–3: 2, 1–4: 3, 2–5: 3, 3–6: 3, 4–5: 5, 5–6: 1, 4–7: 4, 5–8: 6, 6–9: 3, 7–8: 6, 8–9: 2)
If this graph had an Euler cycle, this would immediately yield a solution of the CPP: the postman would simply have to follow the cycle, and as the cycle passes through each edge exactly once it is certainly of minimum length, because any Euler cycle has the same length, just the sum of all edge lengths of the graph. In this case there is nothing to optimize. But looking at Figure 3.8 you realize that this graph is not Eulerian, because the vertices 2, 4, 6 and 8 have odd degree. Therefore no Euler cycle can exist. So, what should the postman do? Easy, think in practical terms: he will have to traverse some of the streets more than once. And now we do indeed have an optimization problem: determine those streets which have to be traversed twice in such a way that the resulting round trip has minimum length. Mei-Ko Kwan's solution approach is in principle a very easy one:
• Determine all vertices of odd degree in the graph. Let this set be denoted by M. The number m = |M| of elements of M will always be an even number, as Euler (1736) shows in § 16 of his paper. In our example M = {2, 4, 6, 8}.

• Find all pairwise matchings of the vertices in M. In other words, find the set of all pairings of elements of M. If |M| = m, then it is easily seen that the number of pairwise matchings equals (m − 1)(m − 3) ··· 3 · 1. In our case we have m = 4 and therefore 3 · 1 = 3 matchings:
M1 = {2 ↔ 4, 6 ↔ 8},  M2 = {2 ↔ 6, 4 ↔ 8},  M3 = {2 ↔ 8, 4 ↔ 6}

• For each matching determine the shortest paths connecting the vertices of each pair:

  Matching  vertices  shortest path  length
  M1        2 ↔ 4     [2, 1, 4]      8
            6 ↔ 8     [6, 9, 8]      5
  Total length of M1                 13

Note that the shortest path 2 ↔ 4 is not unique; we could have taken as well the path [2, 5, 4], also of length 8. Similarly,

  Matching  vertices  shortest path  length
  M2        2 ↔ 6     [2, 5, 6]      4
            4 ↔ 8     [4, 7, 8]      10
  Total length of M2                 14

And finally,

  Matching  vertices  shortest path  length
  M3        2 ↔ 8     [2, 5, 8]      9
            4 ↔ 6     [4, 5, 6]      6
  Total length of M3                 15
Matching M1 has the shortest total length, so we duplicate the edges of the two shortest paths comprising this matching. In other words, for the shortest path [2, 1, 4] duplicate the edges (2, 1) and (1, 4); for the path [6, 9, 8] duplicate the edges (6, 9) and (9, 8). These are the roads the postman has to walk twice. The last step is to figure out an Euler cycle in the augmented graph. As this network is a really small one, we may do this by simple inspection. Just take a pencil, pass along the edges of the graph depicted in Figure 3.9 and record the vertices visited to obtain:

C = [1, 2, 3, 6, 9, 6, 5, 2, 1, 4, 5, 8, 9, 8, 7, 4, 1]

We note in passing that this solution is not unique. This was a very simple example, a toy problem, I have to concede. But the CPP isn't merely a toy problem, it has many serious and important applications, and in these, routing has to be performed in networks of really large size. For this task we need an algorithm, of course. Actually, we need three algorithms:
Figure 3.9: The augmented graph (Figure 3.8 with the edges (2, 1), (1, 4), (6, 9) and (9, 8) doubled)
1.) An algorithm for determining shortest paths in a weighted graph. Floyd's Algorithm, mentioned in Topic 2 (Shortest Paths in Networks), is a good choice.
2.) We need an algorithm to solve the minimum-cost matching problem on the vertices of odd degree.
3.) Finally, we need one more algorithm to determine an Euler cycle in the augmented graph.
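To see how these building blocks interact, here is a sketch of the first two steps for the toy network of Figure 3.8 (Python as one possible environment; the edge weights are read off the figure, and the brute-force enumeration of pairings merely stands in for a proper minimum-cost matching algorithm, feasible here only because m = 4):

```python
# Edge weights of Figure 3.8 (post office = vertex 1)
edges = {(1, 2): 5, (2, 3): 2, (1, 4): 3, (2, 5): 3, (3, 6): 3, (4, 5): 5,
         (5, 6): 1, (4, 7): 4, (5, 8): 6, (6, 9): 3, (7, 8): 6, (8, 9): 2}
V = range(1, 10)

# Step 1: all-pairs shortest distances via Floyd-Warshall
INF = float("inf")
dist = {(u, v): 0 if u == v else INF for u in V for v in V}
for (u, v), w in edges.items():
    dist[u, v] = dist[v, u] = w
for k in V:
    for u in V:
        for v in V:
            dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v])

# Step 2: cheapest pairwise matching of the odd-degree vertices
deg = {v: 0 for v in V}
for (u, v) in edges:
    deg[u] += 1
    deg[v] += 1
odd = [v for v in V if deg[v] % 2 == 1]          # -> [2, 4, 6, 8]

def pairings(vs):
    """All (m-1)(m-3)...3*1 ways of splitting vs into pairs."""
    if not vs:
        yield []
        return
    for i, partner in enumerate(vs[1:]):
        rest = vs[1:i + 1] + vs[i + 2:]
        for tail in pairings(rest):
            yield [(vs[0], partner)] + tail

best = min(pairings(odd), key=lambda m: sum(dist[u, v] for u, v in m))
cost = sum(dist[u, v] for u, v in best)
print(best, cost)                  # [(2, 4), (6, 8)] 13 -- matching M1 of the text

# Step 3 would duplicate the edges on the corresponding shortest paths and
# extract an Euler cycle; the optimal tour length is already determined:
print(sum(edges.values()) + cost)  # 43 + 13 = 56
```

The optimal tour thus has length 43 + 13 = 56, the sum of all edge lengths plus the cost of the cheapest matching.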
In real-world applications of the CPP, networks sometimes have hundreds or even thousands of vertices and edges. Clearly, what is needed are efficient algorithms to perform the steps outlined above. More about that in Section 3.2. Typical examples of large-scale CPPs come from urban operations research:
• Routing of trucks for waste collection and street cleaning.
• Optimal organization of snow and ice control on roads during winter time.
• Routing of school buses, etc.
Here are some examples from industrial production where the CPP also proves to be very useful:
• In the shipbuilding industry huge plates of steel have to be cut, a process usually performed by plasma cutting devices. Here it is necessary to minimize the number of piercing points and the waste of raw material.
• In large storage depots and container terminals stacker cranes are used to move goods and containers around. A typical stacker crane must start from an initial position, perform a set of movements, and return to the initial position. The objective is to schedule the movements of the crane so as to minimize total cost.
• Multifunctional robots are key elements of modern industrial production. They must carry out complicated movements depending on the task to be performed. These movements can be optimized in such a way that
processing times and consumption of energy are minimized.
In all these examples (and there are many more!) it has been reported that formulation and solution of the underlying CPPs resulted in impressive gains in efficiency and savings in cost; see Section 3.3, An Annotated Bibliography, below for some references.
3.2 Where to go from here
Writing a thesis about Euler cycles and Chinese postmen is a nice challenge, as you will certainly realize once you get more deeply involved in the subject. On the one side there is beautiful mathematics, on the other side there are very interesting applications. In this section I present some ideas you may take up.
3.2.1 Issues of general interest
At the outset, however, an important point: do assume that the graphs you are dealing with are connected. Roughly, this means that there exists a path between any pair of vertices in the graph. You will have to state this assumption, but I recommend not putting too much emphasis on this issue. It would lead you too far afield, as connectivity is a nontrivial graph property, in particular in the case of directed graphs, a concept to be introduced shortly.
Algorithms for finding Euler cycles
As we have remarked above, Euler did not outline a general procedure to determine Euler cycles. This gap was closed more than 100 years after Euler's 1736 paper. Two classical algorithms were invented during the 19th century, interestingly without reference to Euler's original work on the subject, which at that time had been more or less forgotten. The paper of Carl Hierholzer (1873) presents the first algorithm together with a complete proof of the statement that a connected graph has an Euler cycle if and only if all its vertices have even degree. Actually, Euler only proved the only-if part, i.e., that even degrees are a necessary condition for this property. Hierholzer's approach is simple and elegant. It successively finds cycles in the graph G = (V, E) and glues them together to an Euler cycle. When properly implemented (using a stack data structure) this algorithm is very efficient: the amount of computational work to find an Euler cycle is proportional to the number of edges |E|. The other algorithm is due to Fleury (1883). It is actually the most often cited algorithm, as you can check by a quick web search. Fleury's algorithm is so popular because it is very intuitive: just walk through the graph, deleting each edge after it has been traversed, unless it is a bridge and you are forced to
65 Topic 3. The Seven Bridges of Konigsberg¨ walk over it. A bridge is an edge which when removed from the graph results in disconnected components. Whereas it is easy to figure out whether an edge is a bridge in small graphs, this tasks becomes quite formidable in bigger graphs and requires an algorithm, the paper of Jens Schmidt (2012) presents one. As a result the computational complextity of Fleury’s algorithm is considerably higher than that of Hierholzer’s algorithm. There exist alternatives to Hierholzer and Fleury, of course. One such alterna- tive is an algorithm due to Tucker (1976). You can find a detailed presentation and analysis of algorithms to find Euler paths and Euler cycles in the profound book of Fleischner (1991, Chapter X.). Your thesis should discuss these algorithms (at least those of Hierholzer and Fleury) carefully, their pros and cons, their time complexity. Maybe (yet an- other challenge) you write small computer programs to implement them. You may do this in R, matlab, its clone octave or any other environment you like. Also, you will have to think about practical and efficient ways to represent graphs by appropriate data structures. And, of course, you should illustrate the capabilities of your programs by some nice examples.
The Chinese Postman Problem
The CPP is certainly the most obvious application of Euler cycles. Recall from the Invitation that the CPP is a combinatorial optimization problem that consists of three layers: at the top level, an Euler cycle is sought in a graph which has been suitably augmented. This augmentation, the addition of edges representing those which have to be traversed more than once, requires a shortest path problem and a minimum-cost matching problem to be solved. For the shortest path problem you may have a look into Topic 2 for a first orientation. The book of Christofides (1975) provides more technical information about shortest paths and about the matching problem. Regarding the matching problem: this is really difficult. Therefore I recommend to explain briefly the basics and to avoid getting too deeply involved in the general matching problem. Just take it as some kind of black box. When you implement the CPP, use any of the available software libraries to handle the matching part. It is a very good idea to illustrate your exposition of the CPP by one or more well-chosen applications. An easy-to-read and very informative first introduction can be found in chapter 6 of Larson and Odoni (1981). Material useful in this context is contained in chapters 7 and 16 of Farahani and Miandoabchi (2013). A class of important applications is the provision of public services like municipal waste collection. The paper of Belien, De Boeck, and Van Ackere (2014) is an up-to-date exposition with many references which you may find very helpful.
66 Topic 3. The Seven Bridges of Konigsberg¨
3.2.2 Some more suggestions
Directed graphs
In a directed graph all edges have an orientation, i.e., using once again the analogy of a network of roads, all roads are one-way. Such directed edges are usually called arcs. Directed graphs, too, can have Euler cycles. Figure 3.10 shows an example of a directed graph which has an Euler cycle. You
Figure 3.10: An Eulerian digraph

should formulate conditions for a directed graph to have such cycles (or paths) and devise algorithms for finding them. Do the algorithms of Hierholzer and Fleury still work in the case of directed graphs?
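As a hint of what such a condition involves: for directed graphs the parity of degrees is replaced by a balance between in-degrees and out-degrees. The sketch below tests this necessary condition (connectivity again assumed); the arc list is only my guess at Figure 3.10, chosen for illustration:

```python
from collections import Counter

def balanced(arcs):
    """Necessary condition for a directed Euler cycle: for every vertex,
    in-degree equals out-degree (connectivity is checked separately)."""
    indeg, outdeg = Counter(), Counter()
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    return all(indeg[v] == outdeg[v] for v in set(indeg) | set(outdeg))

# One possible reading of Figure 3.10 -- the arc list is my own assumption,
# chosen so that the digraph on {1, 2, 3, 4} has an Euler cycle:
arcs = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (3, 1)]
print(balanced(arcs))               # True:  1-2-3-4-1-3-1 traverses every arc once
print(balanced([(1, 2), (2, 3)]))   # False: vertices 1 and 3 are unbalanced
```

Proving that this balance condition, together with connectivity, is also sufficient is a nice exercise for your thesis.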
Once again: the Chinese postman
The CPP can also be considered in directed graphs. Now the situation is somewhat different compared to the CPP in undirected graphs. In the latter the postman will never have to walk along a road more than twice (can you explain why?). In a directed graph, however, it may be the case that the postman has to traverse arcs (one-way roads) several times. On the other hand, the matching problem is easier to solve: it boils down essentially to a classical transportation or minimum-cost flow problem. Just to get an impression of the situation, have a look at Figure 3.11. It displays a small network with 13 vertices consisting entirely of one-way streets; distances are given in units of 100 m. You can easily check that this directed graph is not Eulerian, but again an augmentation can establish this property. The augmentation process results in the Eulerian network shown in Figure 3.12. The street connecting vertices 6 and 10 has to be traversed five times! The optimal postman tour has total length 10 160 m and is found to be

C = [1, 2, 3, 6, 10, 11, 12, 13, 6, 10, 7, 9, 11, 12, 13, 6, 10, 7, 5, 4, 8, 9, 11, 12, 13, 6, 10, 7, 5, 2, 3, 6, 10, 7, 5, 4, 1].
Figure 3.11: A network with one-way streets
Figure 3.12: The Eulerian network
Can you verify my calculations?
Really poor guys: the rural and the windy postman
The classical CPP is a tractable problem, i.e., it can be solved in polynomial time. But there are variants of considerable importance in practical applications that are really hard. One such variant is the Rural Postman Problem (RPP). Here the network consists of two types of roads: roads whose traversal is mandatory (mail has to be delivered there) and roads which may be traversed but need not be. A typical example is displayed in Figure 3.13, where the mandatory roads are drawn in heavy lines. We may think of a postman who has to serve two districts which are connected by roads belonging to a district served by some other postman. Again an augmentation process is necessary, but the difficulty is
Figure 3.13: A network for a rural postman

that in this process roads of both types (mandatory and non-mandatory) may be used. The RPP is indeed an extremely difficult combinatorial optimization problem. Finally, there is the Windy Postman Problem. This poor guy has to serve a network of roads which may be traversed in either direction (no orientation), but the cost of traversal does depend on the direction, see Figure 3.14. Think of a
4 min → ← 6 min
Figure 3.14: Direction-dependent costs for the windy postman

postman delivering his mail by bicycle, where edge weights are travel times. If there is some headwind, then these travel times certainly depend on the direction. Again, no efficient solution procedure is known. A thorough discussion of these problems is found in the papers of Eiselt, Gendreau, and Laporte (1995a) and (1995b).
3.3 An Annotated Bibliography
To prepare your thesis you will certainly need some acquaintance with basic terminology and concepts from graph theory. Unfortunately, terminology in graph theory is far from standardized, see footnote 3 in Topic 2. Anyway, gentle introductions to graph theory are the books by Chartrand (1975), Hartsfield and Ringel (2003) and Gibbons (1991). Gibbons's book is more technical in that it puts strong emphasis on algorithms, but it is still very readable. In its chapter 6 you find a thorough discussion of Eulerian graphs (directed and undirected) as well as of postman problems. Christofides (1975) is one of my favorite books. It combines a clear exposition of concepts with a detailed discussion of algorithms. Chapter 9 of this book is devoted to the Euler problem, chapter 12 to the matching problem. Another great book is Gondran and Minoux (1995),
in particular chapter 8. This chapter also has some exercises presenting various applications.

Regarding historical background, I recommend the fine textbook by Biggs, Lloyd, and R. J. Wilson (2006), which is a commented collection of milestone publications in graph theory over the period 1736–1936. It begins with an English translation of Euler's 1736 paper and translations of several other papers related to the Euler problem. Here you will also find interesting facts about Euler cycles and labyrinths! Grötschel and Yuan (2012) give an account of the history of the Königsberg bridges puzzle, Euler's work on it, Mei-Ko Kwan and the Chinese Postman Problem.

The classical paper on the CPP is certainly Edmonds and Johnson (1973). At the heart of this publication there is a thorough discussion of the matching problem which has to be solved for the CPP.

If you want to read more from and about Leonhard Euler, consult The Euler Archive (2011) hosted by the Mathematical Association of America. It contains a small part of Euler's enormous Opera Omnia; some of his papers come with English translations.

The paper Hierholzer (1873) has a tragic history. Hierholzer died in 1871 at the age of only 31 years. The cited paper was published posthumously after it had been reconstructed, without written records, by Hierholzer's friends Christian Wiener and Jacob Lüroth. An English translation is given in Biggs, Lloyd, and R. J. Wilson (2006), a text already mentioned.

Janet Heine Barnett (2005) and R. J. Wilson (1986) are especially worth studying because both contain a thoroughly commented translation of Euler's 1736 paper, restating its findings in modern graph-theoretic terms.

A detailed presentation of Eulerian graphs is the two-volume series Fleischner (1990) and Fleischner (1991). In Chapter X you will find various algorithms for finding Euler cycles together with a careful discussion of their efficiency.
Finally, regarding the CPP: have a look at Eiselt, Gendreau, and Laporte (1995a) and (1995b). Although these papers are rather technical, take your time and read them, as they are considered the standard references on the CPP and its hard variants, e.g., the rural postman.
3.4 References
[1] The Euler Archive. A digital library dedicated to the life and work of Leonhard Euler. 2011. URL: http://eulerarchive.maa.org/.
[2] Michael Behrend. Republications in maze mathematics. 2012. URL: www.cantab.net/users/michael.behrend/repubs/maze_maths/pages/euler.html.
[3] J. Belien, L. De Boeck, and J. Van Ackere. Municipal Solid Waste Collection and Management Problems: A Literature Review. 2014. URL: https://lirias.kuleuven.be/bitstream/123456789/407421/1/Municipal+Solid+WasteCollectionProblems+-+final+paper_revised_3.pdf.
[4] N. L. Biggs, E. K. Lloyd, and R. J. Wilson. Graph Theory 1736–1936. Oxford University Press, 2006.
[5] Gary Chartrand. Introductory Graph Theory. Dover Publications, 1975.
[6] Nicos Christofides. Graph Theory - An Algorithmic Approach. Academic Press, 1975.
[7] Jack Edmonds. "The Chinese postman problem". In: Operations Research 13, Suppl. 1 (1965), p. 373.
[8] Jack Edmonds and Ellis L. Johnson. "Matching, Euler tours and the Chinese postman". In: Mathematical Programming 5.1 (1973), pp. 88–124.
[9] H. A. Eiselt, Michel Gendreau, and Gilbert Laporte. "Arc Routing Problems, Part I: The Chinese Postman Problem". In: Operations Research 43.2 (1995), pp. 231–242.
[10] H. A. Eiselt, Michel Gendreau, and Gilbert Laporte. "Arc Routing Problems, Part II: The Rural Postman Problem". In: Operations Research 43.3 (1995), pp. 399–414.
[11] Leonhard Euler. "Solutio problematis ad geometriam situs pertinentis". In: Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128–140. English translation in Michael Behrend, Republications in maze mathematics, 2012, URL: www.cantab.net/users/michael.behrend/repubs/maze_maths/pages/euler.html.
[12] R. Z. Farahani and E. Miandoabchi. Graph Theory for Operations Research and Management: Applications in Industrial Engineering. Business Science Reference, 2013.
[13] H. Fleischner. Eulerian Graphs and Related Topics I. Annals of Discrete Mathematics. North-Holland, 1990.
[14] H. Fleischner. Eulerian Graphs and Related Topics II. Annals of Discrete Mathematics. North-Holland, 1991.
[15] Alan Gibbons. Algorithmic Graph Theory. Cambridge University Press, 1991.
[16] Michel Gondran and Michel Minoux. Graphs and Algorithms. John Wiley and Sons, 1995.
[17] M. Grötschel and Y. Yuan. "Euler, Mei-Ko Kwan, Königsberg and a Chinese Postman". In: Documenta Mathematica, Extra Volume ISMP (2012), pp. 43–50.
[18] Nora Hartsfield and Gerhard Ringel. Pearls in Graph Theory: A Comprehensive Introduction. Dover Publications, 2003.
[19] Janet Heine Barnett. Early Writings on Graph Theory. 2005. URL: www-users.math.umn.edu/~reiner/Classes/Konigsberg.pdf.
[20] Carl Hierholzer. "Über die Möglichkeit, einen Linienzug ohne Wiederholung und ohne Unterbrechungen zu umfahren". In: Mathematische Annalen (1873), pp. 30–32.
[21] R. C. Larson and A. R. Odoni. Urban Operations Research. Prentice Hall PTR, 1981. URL: http://web.mit.edu/urban_or_book/www/book/.
[22] H. Sachs, M. Stiebitz, and R. J. Wilson. "An Historical Note: Euler's Königsberg Letters". In: Journal of Graph Theory 12 (1988), pp. 133–139.
[23] Jens M. Schmidt. A Simple Test on 2-Vertex- and 2-Edge-Connectivity. 2012. URL: http://arxiv.org/abs/1209.0700.
[24] Alan Tucker. "A New Applicable Proof of the Euler Circuit Theorem". In: The American Mathematical Monthly 83.8 (1976), pp. 638–640.
[25] R. J. Wilson. "An Eulerian Trail Through Königsberg". In: Journal of Graph Theory 10.3 (1986), pp. 265–275.
Current version of this topic finished on Dec. 12th, 2017.
Topic 4
The Chains of Andrei Andreevich Markov - I
Finite Markov Chains and Their Applications
He was too young to have been blighted by the cold world's corrupt finesse; his soul still blossomed out, and lighted at a friend's word, a girl's caress. Alexander Pushkin, Eugene Onegin1
Keywords: applied probability, stochastic processes, limiting behavior; applications: weather prediction, credit risk, Google’s PageRank, voter migration, simulation of Markov chains
4.1 An Invitation
4.1.1 The Law of Large Numbers and a Theological Debate
[Portrait: Andrei A. Markov, 1856–1922]

Andrei Andreevich Markov was born on June 14, 1856 in Ryazan, about 200 km south-east of Moscow. After finishing classical gymnasium he studied mechanics and mathematics at the University of St. Petersburg, where he became a disciple of P. L. Chebyshev, one of the most influential and prolific Russian mathematicians of the 19th century. Among the widespread research interests of Chebyshev, ranging from analysis and probability to the theory of numbers, was the Law of Large Numbers (LLN), which finally attracted Markov's attention. A first version of this fundamental law was formulated and proved by Jakob Bernoulli (1654–1705). In his analysis of games of chance he showed that in a prolonged sequence of independent random trials, each having only two possible outcomes, success or failure, the relative frequency of observing success comes close to the theoretical success probability p. Stochastic independence, however, is a crucial condition in Bernoulli's derivation of
1Novel in verse published in 1833, cited from Hayes (2013). Its text was the subject of Markov's original statistical experiments on language.
the law, and that did not change until Markov's contributions to the theory of the LLN. In the 19th century the concept of independence was not fully understood. This lack of comprehension had the remarkable effect that the LLN got into the focus of a passionate theological debate about the existence of free will versus predestination. The debate was initiated by Pavel Nekrasov (1853–1924), a Russian mathematician with excellent relations to the Russian Orthodox Church. In a paper published in 1902 Nekrasov argued that voluntary acts are expressions of free will, and as such they are like independent events in probability theory: there are no causal links between these events. The LLN applies only to such events, and this is, as he said, supported by data collected in the social sciences, like crime statistics.

Markov strongly objected to this interpretation of the LLN, and in 1906 he addressed the problem of deriving and proving a LLN for dependent trials. For this purpose he devised a simple stochastic process having only two states. Markov was able to show that as the process evolves over time, the fractions of time the system spends in either of these states approach a limit. As a sample application he performed a statistical analysis of the first 20,000 letters of Alexander Pushkin's (1799–1837) verse novel Eugene Onegin. The two states of his process (he used the term chain) were: a letter is a vowel (state 1), a letter is a consonant (state 2). He then counted how often a vowel is followed by a vowel or by a consonant, and similarly he counted transitions from consonant to vowel and from consonant to consonant. In this way he formed a matrix of transition probabilities which became the basis of the subsequent analysis and the demonstration of a LLN for dependent trials. So, quite remarkably, the first practical application of Markov chains was statistical linguistics, and in this field of science they are used even today.
4.1.2 Let's Start with a Definition
Let X0,X1,X2,... denote a sequence of random variables, each taking its values in some finite set S. The sequence {Xn, n ≥ 0} is called a stochastic process and S is its state space. The index n of Xn usually denotes time which we assume to be discrete. Thus there is some sort of clock with ticks at times n = 1, 2, 3,... and at these clock ticks the process may or may not change its state. Hence, if an observer finds that the random event {Xn = k} occurred, we say the process Xn is in state k ∈ S at time n.
The behavior of the random process {Xn, n ≥ 0} can be described in various ways. One is to record values attained by Xn and determine the joint distribution, that is the probability:
P (X0 = k0,X1 = k1,...,Xn = kn).
Equivalently, we may be interested in the probability of finding Xn in a particular state kn, given the whole history of the process {Xn, n ≥ 0}. In other
words, we are interested in the conditional probability
P(Xn = kn | X0 = k0, X1 = k1, ..., Xn−1 = kn−1).   (4.1)

The conditioning event on the right-hand side is the history Hn−1 of the process.
Generally, it is extremely difficult to determine probabilities like (4.1), because the dependence of Xn on its history Hn−1 may be rather complicated. But there are situations that can be handled easily. One is independence in which case
P (Xn = k|Hn−1) = P (Xn = k).
An example is rolling a single die, where Xn denotes the value shown in the n-th experiment. Here the state space is S = {1, 2, 3, 4, 5, 6}, and whatever numbers have turned up in the first n−1 experiments, we always have (provided the die is a fair one):
P(Xn = k | Hn−1) = P(Xn = k) = 1/6 for all k ∈ {1, 2, 3, 4, 5, 6}.

The case of independence is very well understood, and probability theory provides us with marvelous tools like a Law of Large Numbers, a Central Limit Theorem, etc. for independent stochastic sequences. Since independence makes everything much easier, it is not surprising that these fundamental laws were discovered quite early; they have been known since the 18th century. But what if there is no independence? When in 1906 A. A. Markov addressed the problem of proving a Law of Large Numbers for sequences of dependent random trials, he did so by assuming a very specific and simple type of dependence: he assumed that of the whole history Hn−1 of Xn only the state occupied at time n − 1 counts. In other words:
P (Xn = j|Hn−1) = P (Xn = j|Xn−1 = i), for every pair i, j ∈ S. (4.2)
A typical example is the game Monopoly. Here Xn is the position of your token on the board after the n-th move. No matter how you came into position Xn−1 = i (say), rolling the dice determines how many steps your token moves ahead on the board. Where your token lands at the end of that move, in position Xn = j (say), depends only on the position before the move and the outcome of rolling the dice.2 Formula (4.2) is a formal statement of what is known as the Markov property, and the random process {Xn, n ≥ 0} is called a finite state Markov chain. The basic structural parameters of a Markov chain (MC) are the one-step transition probabilities:
pij = P (Xn = j|Xn−1 = i). (4.3)
2You can find a nice analysis of Monopoly as a Markov Chain in the paper of Ash and Bishop (2003).
In a rather general setting these probabilities may depend on the time n, i.e., the pij may vary with n as functions of time, pij(n). However, in this invitation (and in your thesis also) we shall assume that the transition probabilities are independent of time. An MC having this property is called time-homogeneous.
The double indexing of these probabilities pij suggests combining them in a square matrix P. If the state space S has size |S| = N, then P is a matrix of order N × N:
    [ p11  p12  ...  p1N ]
P = [ p21  p22  ...  p2N ]
    [  .    .          . ]
    [ pN1  pN2  ...  pNN ]
P is called the transition matrix of the MC {Xn, n ≥ 0}. It has two rather obvious properties:
• The components are nonnegative: pij ≥ 0 for all i, j ∈ S, because these numbers are probabilities.
• The sum of each row equals 1, i.e., for each i we have Σ_{j=1}^{N} pij = 1. This is because the chain {Xn, n ≥ 0} cannot leave its state space.

Any square matrix having these characteristics is called a stochastic matrix. We will see soon that these properties have remarkable and far-reaching consequences. Besides the transition matrix P we need another ingredient, an initial distribution. This specifies the probability of X0 occupying a particular state at time zero, viz. P(X0 = i) for all states i. It will be very convenient to arrange these initial probabilities in a vector which we will almost exclusively use as a row vector:3
π0^t = [π01, π02, ..., π0N], where π0i = P(X0 = i)   (4.4)
Because π0 represents a probability distribution, we have
Σ_{i=1}^{N} π0i = 1.   (4.5)
Any vector with nonnegative components which sum to one is called a probability vector. Since matrix algebra will play a prominent role in the sequel, why not restate the basic properties of P and π0 in terms of matrix operations? For this purpose we need to define a one-vector 1 of order N × 1 as a vector
3In this invitation we adhere to the convention that any vector a is always interpreted as a column vector. If it happens (and it will happen) that we need a as a row vector, then we simply transpose it to a^t.
consisting entirely of ones:

1 = [1, 1, ..., 1]^t.

Furthermore, for any matrix A or vector a, we write A ≥ 0 and a ≥ 0 if all components of this matrix (this vector) are nonnegative. The symbol 0 represents the zero matrix or the zero vector. Then the basic properties of P and π0 described above can be written compactly as:

P ≥ 0, π0 ≥ 0    (nonnegativity)
P · 1 = 1        (row sums equal to 1)   (4.6)
π0^t · 1 = 1     (this is (4.5))          (4.7)

Observe that the left-hand side of (4.6) is the usual matrix product of the transition matrix with the one-vector. Also, (4.7) is the matrix product of the row vector π0^t and the one-vector (a column vector!). As these are of orders 1 × N and N × 1, the result is a matrix of order 1 × 1, i.e., the scalar value 1.
The specification of the initial distribution π0 and the transition matrix P completely determines the stochastic evolution of the MC {Xn, n ≥ 0}. So, quite naturally the question arises:
Given π0 and P, how can we calculate the distribution of the states Xn at some time n? Let us denote this distribution by the probability vector:
πn^t = [πn1, πn2, ..., πnN], where πni = P(Xn = i).   (4.8)
So, how to calculate πn?
It will turn out that πn can be obtained by solving a simple recurrence relation. For the latter we can even find an explicit solution which, quite remarkably, is a power law!
4.1.3 Example 1: Will We Have a White Christmas This Year?
Every year at the beginning of December the question “Will we have a White Christmas this year?” pops up quite regularly in the weather shows of almost all TV-channels. Markov chain analysis may help us to shed some light on this really important question. Rotondi (2010) approaches this forecasting problem by defining a two-state MC with state space S = {G = green day,W = white day}.
A green day is defined as a day with observed snow depth < 50 mm, whereas a white day must have snow depth ≥ 50 mm. Snow depth data are available from the Global Historical Climatology Network (GHCN) which collects data from weather stations all over the world. One such station is located in the Central Park in New York. Over several years for the period from December 17th to December 31st transitions4 between the states W and G are counted and the observed relative frequencies are taken as statistical estimates of transition probabilities. This resulted in the transition matrix:
          G      W
P = [   0.964  0.036 ]   G
    [   0.224  0.776 ]   W
Thus the probability pGG that a green day is followed by a green day equals 0.964, so it’s very likely that a green day follows a green day. Also, a green day is followed by a white day with probability pGW = 0.036. In an analogous manner we interpret the second row of P. It will be very convenient to draw a diagram of the possible transitions in this chain:
[Transition graph: loops G→G (0.964) and W→W (0.776); arcs G→W (0.036) and W→G (0.224).]
Technically speaking, this is a directed graph with nodes corresponding to states and arcs (links) corresponding to possible transitions of the chain. Each arc i → j is assigned a weight which equals the transition probability pij. This graph is called the transition graph of the chain {Xn, n ≥ 0}. No special knowledge of graph theory is required for this topic, but if you want to know more, you may consult the Invitation to Topic 2, Shortest Paths in Networks, which provides a rudimentary overview of the basic terminology from the theory of graphs. Let us attack our weather forecasting problem experimentally. All we need is a computer; actually, a pocket calculator suffices. Hard-nosed guys among you can do the job with paper and pencil only. Suppose that today is December 17th and it is a green day. This assumption specifies the following initial distribution:
π0G = P(X0 = G) = 1, π0W = P(X0 = W) = 0   ⟹   π0^t = [1, 0]
4This period was chosen in order to minimize bias due to seasonality of weather patterns.
What is the probability that tomorrow, the day after tomorrow, etc. we will have a green or a white day? To determine these probabilities we have to invoke the Law of Total Probability: if there is an arbitrary event A and further events B1, B2, ..., Bn which are mutually disjoint and whose union equals the sample space Ω, i.e.,
Bi ∩ Bj = ∅ for all i ≠ j, and B1 ∪ B2 ∪ ... ∪ Bn = Ω,

then the total probability of A is given by
P (A) = P (A|B1)P (B1) + P (A|B2)P (B2) + ... + P (A|Bn)P (Bn). (4.9)
In our case with A = {X1 = G} and two conditions B1 = {X0 = G} and B2 = {X0 = W } we get:
P (X1 = G) = P (X1 = G|X0 = G)P (X0 = G) + P (X1 = G|X0 = W )P (X0 = W )
= π0G · pGG + π0W · pWG = 1 · 0.964 + 0 · 0.224 = 0.964 (A)
Similarly,
P (X1 = W ) = P (X1 = W |X0 = G)P (X0 = G) + P (X1 = W |X0 = W )P (X0 = W )
= π0G · pGW + π0W · pWW = 1 · 0.036 + 0 · 0.776 = 0.036 (B)
Not very spectacular, but looking closer at (A) and (B) you will find that both probabilities are sums of products, and whenever you encounter such a pattern, be sure: there's a matrix multiplication lurking behind it. Indeed:
π1^t = [1, 0] · [ 0.964 0.036 ; 0.224 0.776 ] = [0.964, 0.036]
In other words, we have found for our example:
π1^t = π0^t · P   (4.10)
So what about π2^t, π3^t, etc., the distribution of states in two, three days? That's easy: the Markov property (4.2) comes to our help. The latter implies that
π2^t = π1^t · P,

but, by (4.10),

π2^t = (π0^t P) · P = π0^t · P^2.
Continuing in this manner we obtain:
π3^t = π2^t · P = (π0^t P^2) · P = π0^t P^3
These easy-to-grasp steps when properly continued show us that there exists a fundamental recurrence relation having a very simple resolution as a power law:
πn^t = πn−1^t P   ⟹   πn^t = π0^t P^n,  n = 1, 2, ...   (4.11)
Moreover, at no point in our calculations have we made use of the fact that P is only of order 2 × 2. Indeed, the law of total probability holds for any finite number of conditioning events. Hence it follows that the basic recurrence (4.11) holds generally for all finite state Markov chains. Let's now use (4.11) to calculate the state probabilities for our example chain. Of course, normally we will not perform the calculations by hand. It is much more convenient to use some computing environment which supports matrix calculations, e.g., matlab or its free clone octave. Alternatively, you may also use R. The latter is somewhat special; I will have to say more about this tool in Section 4 below. Starting with a green day, our initial probability vector equals π0^t = [1, 0], and we obtain successively:
day              n    P(Xn = G)   P(Xn = W)
December 17th    0    1.0000      0.0000
                 1    0.9640      0.0360
                 2    0.9374      0.0626
                 3    0.9176      0.0824
                 4    0.9031      0.0969
                 5    0.8923      0.1077
                 6    0.8843      0.1157
Christmas Eve    7    0.8784      0.1216
...
New Year's Eve   14   0.8636      0.1364
...
January 16th     30   0.8615      0.1385
January 17th     31   0.8615      0.1385
...
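The table above is easy to reproduce in any matrix-capable environment. Here is a minimal sketch in Python with NumPy (matlab, octave or R work just as well; the numbers are the Central Park estimates quoted above):

```python
import numpy as np

# transition matrix of the green/white-day chain (row 0: from G, row 1: from W)
P = np.array([[0.964, 0.036],
              [0.224, 0.776]])

pi = np.array([1.0, 0.0])   # December 17th is a green day

history = [pi]
for n in range(1, 15):      # iterate the recurrence pi_n = pi_{n-1} P
    pi = pi @ P
    history.append(pi)

print(history[7])           # Christmas Eve: approximately [0.8784, 0.1216]
```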
Hm, that looks interesting. Our calculations show: starting with a green day on December 17th, it is very likely that we won't have a White Christmas; in fact, only with probability 12.2 % will there be snow on that day. Actually, the probability is about 88 % that December 24th is a green day again. Note however that this is not the probability of having 7 green days in a row; it just means that after 7 days we are again in state G. But more can be seen: there is a remarkable pattern, it seems that the state
probabilities approach a limit. Starting with a green day we expect in the long run a green day with probability 86 % and a white day with probability 14 %. Having become curious through these observations, we continue our experiment: this time we start with a white day, so π0^t = [0, 1]. Repeating the procedure we obtain:
day              n    P(Xn = G)   P(Xn = W)
December 17th    0    0.0000      1.0000
                 1    0.2240      0.7760
                 2    0.3898      0.6102
...
Christmas Eve    7    0.7569      0.2431
...
New Year's Eve   14   0.8488      0.1512
...
January 16th     30   0.8614      0.1386
...
January 26th     40   0.8615      0.1385
January 27th     41   0.8615      0.1385
...
Now the probability of a white Christmas is 24 %. And interestingly, the long run distribution of green and white days is the same as before. These are quite remarkable and unexpected observations. Let us summarize what we have found so far experimentally:
• It seems that the state distribution πn approaches a limit as n → ∞.
• Also, there is some indication that this limit is independent of the initial state distribution π0. We have found the same limit for the two initial distributions [1, 0] and [0, 1]. We could also have started with another initial distribution of green and white days, for instance π0^t = [0.5, 0.5]. Running through the recurrence we will again find that the state distribution settles at [0.8615, 0.1385].
• It is not implausible that these observations are somehow related to special properties of the transition matrix P. Indeed, if we let the computer calculate some powers of P, we find another striking pattern:

P^10 = [ 0.86836 0.13164 ; 0.81912 0.18088 ],   P^20 = [ 0.86187 0.13813 ; 0.85945 0.14055 ],

P^40 = [ 0.86154 0.13846 ; 0.86153 0.13847 ],   P^80 = [ 0.86154 0.13846 ; 0.86154 0.13846 ], ...
Thus, the powers P^n themselves approach a limit; their rows are getting closer and closer to the conjectured limiting distribution of green and white days. The observations we have made are typical of regular chains, a term of great importance in the theory of finite MCs. An MC is called regular if, from some n onward, the powers P^n contain no zeros. More formally:
P^n > 0 for sufficiently large n.
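A quick way to inspect the powers P^n is NumPy's matrix_power; a small sketch for the snowfall chain (any matrix-capable environment does the same):

```python
import numpy as np

P = np.array([[0.964, 0.036],
              [0.224, 0.776]])

# powers of the transition matrix at n = 10, 20, 40, 80
powers = {n: np.linalg.matrix_power(P, n) for n in (10, 20, 40, 80)}
for n, Pn in powers.items():
    print(n, Pn.round(5))
```

For n = 80 the two rows agree to five decimals, and P^n > 0 already holds for n = 1, so this chain is regular.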
A necessary (though not sufficient) condition for a chain to be regular can be read off its transition graph: it must be possible to find a path of whatever length (= number of arcs comprising the path) between any pair of states. In terms of graph theory: the transition graph is strongly connected. For regular chains the following fundamental theorem holds:
• The powers Pn of the transition matrix approach a limiting matrix A as n → ∞:
lim_{n→∞} P^n = A   (4.12)
• Each row of A is the same vector α^t. This is the limiting distribution of states, which is independent of the initial distribution π0 of the chain. In other words, whatever the initial distribution π0, the sequence πn generated by the basic recurrence

πn^t = πn−1^t P,  n = 1, 2, ...   (4.13)

has the limit

lim_{n→∞} πn^t = α^t.   (4.14)

Since the probabilities must sum to one, the vector α must satisfy the sum condition

α^t · 1 = 1.   (4.15)
• The limiting probability vector α is also a stationary distribution of the chain {Xn, n ≥ 0} in the sense that
α^t P = α^t   (4.16)
This follows directly from the existence of the limit (4.14) when applied to the recurrence (4.13). The term stationary simply says that if the chain starts with initial distribution α the distribution of states after one, two, etc. steps will always be the same, it will never change. Indeed,
α^t P^2 = (α^t P) P = α^t P = α^t,

and similarly for any n ≥ 1 we have α^t P^n = α^t.
Returning to our snowfall example: this chain is certainly regular ab initio, as P > 0, and

A = lim_{n→∞} P^n = [ 0.8615 0.1385 ; 0.8615 0.1385 ],   α^t = [0.8615, 0.1385].
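The limiting fractions can also be observed empirically by simulating one long sample path of the chain; a sketch in Python with NumPy (the seed and step count are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1856)   # arbitrary seed

# green/white-day chain; state 0 = G, state 1 = W
P = np.array([[0.964, 0.036],
              [0.224, 0.776]])

n_steps = 100_000
state = 0                           # start with a green day
visits = np.zeros(2)
for _ in range(n_steps):
    state = rng.choice(2, p=P[state])   # next state drawn from current row
    visits[state] += 1

frac = visits / n_steps
print(frac)                         # close to [0.8615, 0.1385]
```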
The way we have calculated the limiting distribution α is known as the power method, because

π1^t = π0^t P,  π2^t = π0^t P^2,  π3^t = π0^t P^3, ...

which is just a more explicit way of writing the recurrence (4.13). At the outset it is by no means clear that the sequence generated by the power method converges (and indeed, there are non-regular MCs where it does not!). Also, convergence may be very slow, so that a substantial amount of computational work may be necessary to come sufficiently close to the limit. There are several alternatives to the power method. A quite efficient and elegant one is to start with the stationary equation (4.16) and the condition α^t · 1 = 1. A close look at (4.16) reveals that it is actually a system of linear equations in the unknowns α1, α2, ..., αN, the components of α. It can be rewritten as:
α^t = α^t P   ⟹   α^t (I − P) = 0,   (4.17)

where I denotes the identity matrix. This system does not have a unique solution, but uniqueness can be enforced by adding the sum condition (4.15) to (4.17). Let's try this with our snowfall example. We have α^t = [αG, αW] and
I − P = [ 0.036 −0.036 ; −0.224 0.224 ],

therefore the matrix equations (4.17) and (4.15) can be written more explicitly as:
 0.036 αG − 0.224 αW = 0    (α^t × 1st column of I − P)
−0.036 αG + 0.224 αW = 0    (α^t × 2nd column of I − P)
 αG + αW = 1
The first equation can be removed, as it is essentially identical to the second. Solving the last two equations yields:

αG = 0.224/0.26 = 0.86154,   αW = 0.036/0.26 = 0.13846,

which conforms nicely to our experimental results. Although this approach looks more attractive than the power method, there are situations where the latter is really the method of choice. A remarkable example is the MC used in Google's PageRank algorithm, which has several trillions of
states. In this case it is practically impossible to solve the system of linear equations (4.17), because it is simply too large. So much for the basic ideas about regular Markov chains. Much more can be said about them; I'll postpone this discussion to Section 2, where you will also find several really interesting applications, among them Google's famous PageRank. Not all MCs are regular, however. Absorbing chains are rather different, as our next example shows.
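The linear-equation approach generalizes to any regular chain of moderate size: transpose α^t(I − P) = 0, replace one redundant equation by the normalization α^t · 1 = 1, and solve. A sketch with NumPy (the variable names are my own):

```python
import numpy as np

P = np.array([[0.964, 0.036],
              [0.224, 0.776]])
N = P.shape[0]

# (I - P)^t alpha = 0, with the last equation replaced by sum(alpha) = 1
A = (np.eye(N) - P).T
A[-1, :] = 1.0
b = np.zeros(N)
b[-1] = 1.0

alpha = np.linalg.solve(A, b)
print(alpha)   # approximately [0.86154, 0.13846], as computed by hand above
```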
4.1.4 Example 2: Losing Your Money - Delinquency of Loans
Lending money is always a risky business, as everybody knows. In accounting, Markov chain models are often used as probability models for accounts receivable. Rating models like those of Standard & Poor's or Moody's are well-known examples. Here we shall analyze a rather simple model discussed in Grimshaw and Alexander (2011). The basic idea of this model is: accounts receivable move through different delinquency states each month. For instance, an account in the state current (state C) this month will be in the state current next month if a payment has been made by the due date, and it will move to the state delinquent (state D) if no payment has been received. It may also happen that the account in state current is completely repaid; this is state R. An account in the state delinquent (D) may become a loss L or default if the borrower fails to pay and there is no realistic hope that he will ever repay the loan. A simple MC for this model thus has four states: C (current), D (delinquent), L (loss or default) and R (repaid). Suppose the following data for the state transition probabilities are available:
           C     D     L     R
    C  [  0.95  0.04  0     0.01 ]
P = D  [  0.15  0.75  0.07  0.03 ]
    L  [  0     0     1     0    ]
    R  [  0     0     0     1    ]
Observe that the rows corresponding to the states L and R are special, all components are zero except for pLL = pRR = 1. This means that whenever the chain enters either of those states, it gets trapped there, these states can never be left again. Accordingly, L and R are called absorbing states. Their special character comes out clearly when we draw the transition graph of this chain, as it is shown below. Observe that states R and L have only incoming arcs, no outgoing ones. For the other states C and D we find that they communicate. Between these states a joining path can be found in any direction. Thus our state space S naturally decomposes into two disjoint subsets:
S = T ∪ A, with T = {C,D}, A = {L, R}
The set T is called transient because sooner or later any borrower will be in an
[Transition graph of the loan chain: loops C→C (0.95) and D→D (0.75); arcs C→D (0.04), D→C (0.15), C→R (0.01), D→R (0.03), D→L (0.07). The absorbing states R and L have only incoming arcs.]
absorbing state5. In Example 1 we started our analysis by experimentation; this time we will rely on matrix algebra. We will be able to find the form of P^n and much more by exploiting the special structure of P. This structure is easy to see:

    [ 0.95  0.04 | 0     0.01 ]
P = [ 0.15  0.75 | 0.07  0.03 ] = [ T  R ]
    [ 0     0    | 1     0    ]   [ 0  I ]   (4.18)
    [ 0     0    | 0     1    ]

Thus P can be divided into four blocks: a square matrix T governing the transitions between the transient states,

T = [ 0.95  0.04 ; 0.15  0.75 ],

and a rectangular matrix R holding the probabilities of one-step transitions from any transient state into one of the two absorbing states,

R = [ 0  0.01 ; 0.07  0.03 ].

In the bottom row of P we have a 2 × 2 matrix of zeros 0 and a 2 × 2 identity matrix I. The partition (4.18) is called a 2 × 2 block matrix. The nice thing about such a matrix is that it can be multiplied with itself in much the same way as any 2 × 2 matrix having scalar components. We only have to take care of the fact that when multiplying sub-blocks the commutative law will not hold in general, because these blocks are matrices. So let's calculate P^2. The standard multiplication scheme is:

P^2 = [ T  R ; 0  I ] · [ T  R ; 0  I ] = [ T·T + R·0   T·R + R·I ; 0·T + I·0   0·R + I·I ] = [ T^2   TR + R ; 0   I ]
5In the long run we are all dead. (John Maynard Keynes, 1883-1946)
Similarly, we calculate P3 = P2 · P:
    P³ = | T³  T²R + TR + R |  =  | T³  (I + T + T²)R |
         | 0   I            |     | 0   I             |
Can you see the pattern? Obviously Pⁿ will look like
    Pⁿ = | Tⁿ  (I + T + T² + ... + Tⁿ⁻¹)R |
         | 0   I                          |
That's a very simple structure, except for the upper right corner. But again that expression reminds us of something: it's essentially a geometric series! We can find its sum in exactly the same way6 as ordinary geometric series are summed, because the matrix T commutes with any of its powers Tᵏ. Recall the classical summation formula for scalar geometric series

    1 + a + a² + ... + aⁿ⁻¹ = (1 − aⁿ)/(1 − a),   provided a ≠ 1,

and in the limit:

    lim n→∞ (1 + a + a² + ... + aⁿ⁻¹) = 1/(1 − a),   provided |a| < 1.

These formulas hold verbatim also for matrix geometric series:
    I + T + T² + ... + Tⁿ⁻¹ = (I − Tⁿ)(I − T)⁻¹,        (4.19)

provided I − T has an inverse. Fortunately, it can be shown that Tⁿ converges to the zero matrix component-wise, i.e. lim n→∞ Tⁿ = 0, which implies the existence of the inverse of I − T. This is the matrix analogue of the limiting relation lim n→∞ aⁿ = 0 when |a| < 1. It is important to keep in mind that all these claims need a proof, see Section 2. It follows that (4.19) becomes in the limit:
    lim n→∞ (I + T + T² + ... + Tⁿ⁻¹) = (I − T)⁻¹        (4.20)

So putting things together we have found something really remarkable:
• The n-step transition matrix of an absorbing chain is given by:
    Pⁿ = | Tⁿ  (I − Tⁿ)(I − T)⁻¹R |        (4.21)
         | 0   I                  |
• In the long run the behavior of an absorbing chain is governed by the limiting matrix:
    lim n→∞ Pⁿ = | 0  (I − T)⁻¹R |        (4.22)
                 | 0  I          |
because Tⁿ → 0.
6You may consult any textbook on elementary mathematics.
In order to avoid our discussion becoming too academic, let’s use our example data from above. In particular, we shall determine the one-year transition matrix P12. Using any computing device supporting matrix calculations, you will find:
          | 0.6751  0.1156  0.0777  0.1316 |
    P¹² = | 0.4335  0.0971  0.2994  0.1700 |
          | 0       0       1       0      |
          | 0       0       0       1      |

(Note that each row must sum to 1.)
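For instance, a NumPy sketch reproducing the one-year matrix and the upper right block of the limiting matrix (4.22), again assuming the state order C, D, L, R:

```python
import numpy as np

# Loan-example transition matrix, states ordered C, D, L, R
P = np.array([[0.95, 0.04, 0.00, 0.01],
              [0.15, 0.75, 0.07, 0.03],
              [0.00, 0.00, 1.00, 0.00],
              [0.00, 0.00, 0.00, 1.00]])

# One-year (12-step) transition matrix
P12 = np.linalg.matrix_power(P, 12)
print(np.round(P12, 4))

# Limiting matrix (4.22): the upper right block is (I - T)^(-1) R
T, R = P[:2, :2], P[:2, 2:]
B = np.linalg.inv(np.eye(2) - T) @ R
print(np.round(B, 4))   # rows: long-run [lost, repaid] probabilities for C and D
```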
The result is interesting. From the first row it follows that about 7.8 % of the loans initially in state C have defaulted by the end of the year or even earlier. Among those loans that were initially delinquent (row 2) the corresponding probability is almost 30 %. The long-run behavior is determined by (4.22):
                 | 0  0  0.4308  0.5692 |
    lim n→∞ Pⁿ = | 0  0  0.5385  0.4615 |
                 | 0  0  1       0      |
                 | 0  0  0       1      |
This is remarkable again: the limiting matrix has different rows. In contrast to regular chains, for an absorbing chain the limiting distribution depends on the initial distribution. If we start with a good loan (state C), the initial distribution and its limit are
    π₀ᵗ = [1, 0, 0, 0],   lim n→∞ π₀ᵗ Pⁿ = [0, 0, 0.4308, 0.5692],

thus in the long run 43 % would be lost and 57 % repaid, whereas for delinquent loans these rates are 54 % and 46 %, respectively. On the other hand, if we start with a portfolio of 50 % good and 50 % delinquent loans,
    π₀ᵗ = [0.5, 0.5, 0, 0],   lim n→∞ π₀ᵗ Pⁿ = [0, 0, 0.4846, 0.5154],

losses would be about 48.5 %. That raises an interesting question: as loans can be traded, what is an optimal loan portfolio? The inverse appearing in (4.22) is of special interest; we will denote it by
    N = (I − T)⁻¹.
The matrix N is commonly known as the fundamental matrix of an absorbing chain because a lot of interesting quantities can be derived from it. In our example:
    N = (I − T)⁻¹ = | 38.4615  6.1538 |
                    | 23.0769  7.6923 |
N is called fundamental because its entries nij are expectations! It can be shown that for transient states i, j ∈ T :
nij = mean number of times the chain is in state j, given it started in state i.
The row sums therefore equal the mean number of steps until absorption.
    N · 1 = | 38.4615  6.1538 | · | 1 | = | 44.6153 |
            | 23.0769  7.6923 |   | 1 |   | 30.7692 |
For example, consider a delinquent loan. It takes on average 31 months until this loan is either repaid or lost.
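As a quick numerical check, the fundamental matrix and the mean times to absorption can be computed in a few lines; a NumPy sketch using the T block from (4.18):

```python
import numpy as np

# Transient block T of the loan chain, taken from the partition (4.18)
T = np.array([[0.95, 0.04],
              [0.15, 0.75]])

# Fundamental matrix N = (I - T)^(-1)
N = np.linalg.inv(np.eye(2) - T)
print(np.round(N, 4))                # ≈ [[38.4615, 6.1538], [23.0769, 7.6923]]

# Row sums of N: mean number of steps (here: months) until absorption
print(np.round(N @ np.ones(2), 3))   # ≈ [44.615, 30.769]
```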
4.2 Where to go from here
The topic Finite Markov Chains is an extraordinarily wide one, regarding the theory behind it but also regarding its applications. The latter should be the focus of your thesis, but nevertheless, you should not forget about basic theory. In principle, there are (at least) two routes you may pursue:
• Your thesis may become a well balanced mix of theory and application. If you prefer this route, then watch out for one, maybe two really interesting real-world applications of MCs. Describe them in detail, their pros and cons, the data used; discuss extensions and generalizations; statistical estimation of transition probabilities is also an issue. Develop the theory of MCs (see below) as far as it is necessary for your application(s).
• Computer simulation of MCs is another fascinating topic; a suggestion is presented below. But be warned: although this sounds easy, it is not. And: keep your fingers off the class of methods known as Markov Chain Monte Carlo as far as it is concerned with Bayesian statistics. This is really beyond the scope of your thesis.
4.2.1 Make up your mind - absorbing or regular chains?
In the Invitation to this topic I have sketched some important theoretical concepts, though the exposition has been kept on a minimum theoretical level, simply because my main intention was to raise your interest in this topic. But now it's time to expand your knowledge about MCs and incorporate it into your work. If you want to discuss absorbing chains and some of their applications:
• Explain the concept of a fundamental matrix.
• Show (elementary!) that the matrix Tⁿ → 0.
• Show that the entries in the fundamental matrix equal the expected duration of stay in a transient state.
• Determine the expected time to absorption and also its variance.
If you put your emphasis on applications of regular chains:
• There's also a fundamental matrix for regular chains. Give a formula and discuss its derivation.
• Give examples of various interesting quantities which can be derived from the fundamental matrix, e.g. discuss first passage times, i.e., the number of steps required to reach a given state for the first time.
• There is a Law of Large Numbers for the average number of visits to a state and a Central Limit Theorem for the number of visits. You should give an account of these most important results, though you should not prove them. That's too difficult.
Finally, if you decide to concentrate on simulation of MCs, your thesis still should not neglect theory. Thus the basics of regular and absorbing chains should be discussed, maybe also of periodic chains. For the theoretical part the famous book Kemeny and Snell (1983) will be very helpful, in particular chapters 3 and 4. Please have a look at the Annotated Bibliography, Section 3. As I already remarked, examples are a most important part of your thesis. Searching the net you will find a tremendous number of applications. But please have a look at Langville and Hilgers (2006); the title of this paper says it all: the five greatest applications of Markov Chains. So make up your mind and decide what you want to work on. There are some applications which I found particularly interesting. You are invited to have a look at them; maybe you find them interesting too.
4.2.2 Google’s PageRank Algorithm
If you call Google's main page and type: Markov chains applications, you get almost instantly (about 0.4 secs) 1 220 000 hits. The speed is impressive, of course. But even more impressive is that the search results are ranked by importance, and it is this ranking which is to a great part responsible for Google's economic success. PageRank's philosophy is that a webpage is important if it is pointed to by other important pages. The crucial step is to measure the relative importance of pages. Sergey Brin and Larry Page, in 1999 students at Stanford University and later founders of Google, had the ingenious idea (Brin et al., 1999) to accomplish this task by representing the hyperlink structure of the world wide web as a gigantic transition graph. Each node in this graph represents a web page, and if a page has a link to another page then there is an arc pointing to that page. It is possible to construct a transition probability matrix of enormous size equal to the number of web pages online; it has several trillions of rows and columns. To bring order into the web Sergey Brin, Larry Page and coworkers suggested to calculate the vector α of the limiting distribution of the corresponding MC. Once α is known, pages are ranked according to the values in this vector. Pages with high limiting probabilities get a high rank, those with small values get low
ranks7. There are several mathematical hurdles. For instance, it is by no means clear that the resulting chain is regular. And indeed, it is not, because there exist dangling pages which have no outgoing links. But by some ingenious tricks it can be made regular (like our snowfall example). So the limiting distribution exists and can be computed, theoretically. In practice, however, this is highly nontrivial because of the size. For the calculation of α the power method is used, with a number of additional measures taken which are not disclosed to the public. Still, the amount of computational work to be performed regularly is gigantic. Actually, Moler (2002) has called this the world's largest matrix computation. It should be remarked that Google's PageRank method is not the only approach to the information retrieval problem in the WWW. More information on this and related applications of regular chains can be found in Langville and Meyer (2005) and the textbook Langville and Meyer (2006).
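To get a feeling for the method, here is a minimal power-method sketch on a toy web of four pages. The link structure, the damping factor 0.85 and the uniform random jump are illustrative assumptions on my part (the damping trick is one common way to make the chain regular), not Google's actual, undisclosed setup:

```python
import numpy as np

# A toy web of 4 pages; links[i] lists the pages that page i points to
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4

# Row-stochastic transition matrix of the "random surfer"
H = np.zeros((n, n))
for i, outs in links.items():
    H[i, outs] = 1.0 / len(outs)

# Mix with a uniform jump (damping factor 0.85) so that the chain
# becomes regular and a unique limiting distribution alpha exists
G = 0.85 * H + 0.15 / n * np.ones((n, n))

# Power method: iterate alpha <- alpha G until (practical) convergence
alpha = np.full(n, 1.0 / n)
for _ in range(100):
    alpha = alpha @ G

print(np.round(alpha, 4))       # limiting probabilities
print(alpha.argsort()[::-1])    # page ranks: highest probability first
```

Page 2 collects links from three pages and comes out on top, while page 3, which no page points to, gets the lowest rank.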
4.2.3 Credit Ratings
In Example 2 above we have already touched the credit business: a simple absorbing chain was constructed to describe the changes in the state of a debt. On a higher level, e.g. when dealing with government bonds, various international agencies collect data around the world and publish ratings. For instance, Standard and Poor's has its RatingsDirect (Vazza and Kraemer, 2015), which is published every year and contains a lot of interesting data. Most interesting is rating data. S & P are using a scale of seven grades ranging from triple A (AAA) over AA, A, BBB, BB and B down to CCC/C. Triple A is best, of course; CCC/C is rather bad, but not the worst case that can happen. There are two more ratings: Default (D) and Not Rated (NR). Default means that investors have (very likely) lost their money, and an asset in state NR is usually also not a good thing. S & P publish tables with transition rates of assets or issuers changing between the various rating levels. These tables are in principle transition matrices of MCs with nine states, two of them (Default and NR) absorbing. The tables also show different levels of aggregation: there are global one-year transition matrices, matrices for the USA, for Europe, for financial institutions and the insurance business. Transition matrices over longer periods, 5 years and 10 years, are also published. Moody's has a similar system with 22 states (see e.g. Metz and Cantor (2007)), two of them again absorbing: Default (D) and Withdrawn (WR). I think that these reports are a rich source of data and valuable information. So that's a perfect playing ground to apply absorbing Markov chain theory.
7Bad guys (pages) which try to cheat Google's search engine get zero rank!
4.2.4 Generating Random Text, maybe Bullshit
Generating random text by simulating Markov chains is a fascinating and sometimes also really funny business. Practically all so-called bullshit generators you can find on the web are based on this idea in one way or another. For instance, Donald Trump's public speeches inspired some people familiar with basic Markov chain theory to create the speech generator located at http://trump.frost.works/8. The first publication I am aware of that discusses Markov chain text generation is the seminal paper on the mathematical foundations of communication theory by Claude Shannon (1948), who argued that any source transmitting data gives rise to a Markov chain. He also gives nice demonstrations and examples of text sequences generated by a MC. Basically, we need three ingredients for MC text generation.
1. A sufficiently large text corpus to estimate the transition matrix of a finite-state MC.
2. An algorithm to simulate the MC given its transition matrix.

3. A generator of uniform pseudo-random numbers to drive the simulation.
As a text corpus one may take e.g. a collection of public speeches of a famous politician, a novel like Tolstoi's War And Peace, etc. The corpus has to be split into tokens, which may be single letters or groups of consecutive letters including punctuation and white space; but more interesting are tokens which are whole words from the text. Just to give you a very small example9: suppose the text corpus is the following tweet of D. Trump from 28 Jan 2014 (for simplicity all characters lower case):
snowing in texas and louisiana, record setting freezing temperatures throughout the country and beyond. global warming is an expensive hoax!
As tokens we may define for instance all 2-groups (bigrams) of consecutive letters. This may remind you of Markov's analysis of Eugene Onegin. In our sample corpus (· represents white space):
sn no ow wi in ng g··i in n··t te ex xa as ...
Take these as the states of the chain and now perform transition counts. For instance, in our sample there are 5 occurrences of the token in, so the chain visits this state five times. Four times this token changes to ng, and only once to n·. From these counts you can easily construct a transition matrix.
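The transition counts just described are easily automated; a small Python sketch for the bigram tokens of the sample tweet (variable names are of course my own choice):

```python
from collections import Counter

corpus = ("snowing in texas and louisiana, record setting freezing "
          "temperatures throughout the country and beyond. global "
          "warming is an expensive hoax!")

# Successive overlapping bigrams are the states of the chain
states = [corpus[i:i + 2] for i in range(len(corpus) - 1)]

# Count the one-step transitions between consecutive bigrams
counts = Counter(zip(states, states[1:]))

# The token "in" occurs 5 times: 4 transitions to "ng", 1 to "n " (n·)
print(sum(c for (a, _), c in counts.items() if a == "in"))   # 5
print(counts[("in", "ng")], counts[("in", "n ")])            # 4 1
```

Dividing each row of the count table by its row sum yields the estimated transition matrix, e.g. P(in → ng) = 0.8 and P(in → n·) = 0.2.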
8For the sake of fairness, there are also Hillary Clinton speech generators, see e.g. http://www.themolecularuniverse.com/HillarySpeech/ 9You may also have a look at Topic 11- Elementary Methods of Cryptology to see it live in a cryptanalytic context.
Alternatively, you could also use trigrams, but note that then your transition matrix becomes very large. Anyway, for a text generator to produce something coming close to human language, it may be more fruitful to define words or groups of consecutive words as tokens.

Having estimated a transition probability matrix P, it's really easy to simulate chains. Fix an initial state and then generate a sequence u1, u2, ... of pseudo random numbers having a uniform distribution on the interval [0, 1]. For instance, when the chain is in state k, take the k-th row of P. Its cumulative sums split the unit interval [0, 1] into N contiguous subintervals, N being the number of states. Just check into which interval uk falls. If it falls into the i-th subinterval, the chain jumps into state i. Then repeat the procedure, this time with row i of P.

You may implement this two-step process (estimation, simulation) in any programming language you like, Java, Python, etc. Fortunately, there are also software packages making life easier; for instance R has the package markovchain. I'll have to say more about that package later.

Whatever way you do it, you should check the properties of the estimated transition matrix. Strange things may happen. Maybe the chain is regular, fine. Maybe it has absorbing states: then sooner or later your simulated chain will get trapped in one of these absorbing states and your text generator keeps on producing something like ...... But it may also happen that the chain gets trapped in a subset of states, which results in a marked deprivation of the generated text. Also, it is possible that the chain exhibits ultimately periodic behavior. This is very interesting from a mathematical point of view, but not really welcome for a text generator, when from some time onward it keeps printing an endless string of the sort bla bla bla ....
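The cumulative-sum procedure sketched above can be written in a few lines of Python; the three states and the matrix entries below are made-up illustrative data:

```python
import random

# Toy three-state chain with assumed (made-up) transition probabilities;
# each row of P sums to 1
states = ["sunny", "rainy", "snowy"]
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]

def simulate(P, start, n_steps, rng=random.random):
    """At every step draw u ~ U[0,1] and jump into the state whose
    cumulative-sum subinterval of [0,1] contains u."""
    chain = [start]
    k = start
    for _ in range(n_steps):
        u, cum = rng(), 0.0
        for i, p in enumerate(P[k]):
            cum += p
            if u < cum:
                break
        k = i          # i is the subinterval containing u
        chain.append(k)
    return chain

random.seed(2018)
path = simulate(P, start=0, n_steps=10)
print(" -> ".join(states[k] for k in path))
```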
4.2.5 Other Applications
Here are a few more interesting applications reviewed in telegraphic style:
• Brand Switching Models, also known as or related to so-called brand choice models. The behavior of consumers having the choice between different brands of some commodity can be modeled as a regular Markov chain. An analysis of these chains allows interesting statements about consumer loyalty and other aspects important from a marketing point of view. The paper of Colombo and Morrison (1989) is an easily accessible starting point.
• Voter migration. In western democracies there are strong tendencies that voters no longer stick to a particular party but tend to change to alternative political competitors. Impressive amounts of data are available today. It is not a new idea but pretty challenging to model voter migration as a MC. See the interesting websites maintained by Baxter (n.d.) or the SORA Institute for Social Research and Consulting, www.sora.at.
• Production management. Another application of absorbing Markov chain theory is modeling the flow of material through a production system. The stochastic character of this type of model usually stems from uncertainty due to possible reprocessing of parts of insufficient quality as well as scrapping. The paper of Pillai and Chandrasekharan (2008) is exemplary in this context.
4.3 An Annotated Bibliography
A beautiful paper to begin with is Hayes (2013). The author presents a gentle introduction to the basics of Markov chains; there's also a weather example and a discussion of Markov's statistical analysis of Alexander Pushkin's Eugene Onegin. In addition it has a nice account of Markov's life and his work on Markov chains. Interesting details about Markov are revealed in the papers Langville and Hilgers (2006) and Basharin, Langville, and Naumov (2004). The former also discusses five great applications of MCs, among them Markov's original linguistic analyses and the PageRank algorithm of Google.

Markov chains are covered in practically all serious textbooks on probability, random processes and stochastic modeling. Unfortunately many of these books are not easily accessible to beginners and difficult to read. A major reason is the emphasis on Markov chains with infinite state spaces, but the mathematical theory for the latter is rather intricate. The standard textbook on finite chains is certainly Kemeny and Snell (1983). It is very well suited for beginners and elaborates clearly the matrix algebraic aspects of the subject. Unfortunately, the book is rather old and the notation used is somewhat nonstandard today. More demanding but still strongly recommended is Karlin and Pinsky (2011), in particular chapters 3 and 4. There you can also find a lot of interesting applications carefully presented and worked out.

Caveat. In the process of preparing this topic I have read quite a number of recent publications on Markov chains, and I noticed that it is becoming more and more fashionable to construct transition probability matrices as column-sum stochastic. Where we have defined pij as the probability of a jump from state i to state j, some authors do it the other way round, so pij becomes the probability of a jump from j to i. Obviously, this is done to achieve some notational simplification. That's not worth it.
I consider this bad style because it breaks with the tradition of all classical and serious texts on Markov chains.
4.4 A note on software
Applications of MCs, including simulation, are greatly facilitated by several software tools. The R markovchain package developed by Spedicato et al. (2015) is a rather comprehensive collection of software routines to handle finite MCs. Among standards like the calculation of fundamental matrices, it offers diagnostics for a chain to be regular, etc. There is also the possibility to draw
transition graphs, although you should not expect too much unless the chain has a sufficiently small state space. Quite interesting for you are routines for statistical estimation of transition matrices and, last but not least, there is also a routine to simulate chains.
4.5 References
[1] R. B. Ash and R. L. Bishop. "Monopoly as a Markov Process". 2003. url: www.math.uiuc.edu/~bishop/monopoly.pdf.
[2] G. P. Basharin, A. N. Langville, and V. A. Naumov. "The Life and Work of A. A. Markov". In: Linear Algebra and its Applications 386 (2004), pp. 3–26.
[3] Martin Baxter. Electoral Calculus. url: http://www.electoralcalculus.co.uk.
[4] S. Brin et al. "The PageRank Citation Ranking: Bringing Order in the Web". Technical Report 1999-0120, Computer Science Department, Stanford University (1999).
[5] R. A. Colombo and D. G. Morrison. "A Brand Switching Model with Implications for Marketing Strategies". In: Marketing Science 8.1 (1989), pp. 89–99.
[6] S. D. Grimshaw and W. P. Alexander. "Markov chain models for delinquency: Transition matrix estimation and forecasting". In: Applied Stochastic Models in Business and Industry 27.3 (2011), pp. 267–279.
[7] Brian Hayes. "First Links in the Markov Chain". In: American Scientist 101 (2013), pp. 92–97.
[8] Samuel Karlin and Mark A. Pinsky. An Introduction to Stochastic Modeling. Academic Press, 2011.
[9] J. G. Kemeny and L. Snell. Finite Markov Chains. Springer, 1983.
[10] A. N. Langville and P. von Hilgers. The Five Greatest Applications of Markov Chains. 2006. url: http://langvillea.people.cofc.edu/MCapps7.pdf.
[11] A. N. Langville and Carl D. Meyer. "A Survey of Eigenvector Methods for Web Information Retrieval". In: SIAM Review 47.1 (2005), pp. 135–161.
[12] A. N. Langville and Carl D. Meyer. Google's PageRank and Beyond. The Science of Search Engine Rankings. Princeton University Press, 2006.
[13] Albert Metz and Richard Cantor. Introducing Moody's Credit Transition Model. 2007. url: http://www.moodysanalytics.com/~/media/Brochures/Credit-Research-Risk-Measurement/Quantative-Insight/Credit-Transition-Model/Introductory-Article-Credit-Transition-Model.pdf.
[14] Cleve Moler. The World's Largest Matrix Computation. 2002. url: https://www.mathworks.com/company/newsletters/articles/the-world-s-largest-matrix-computation.html.
[15] V. M. Pillai and M. P. Chandrasekharan. "An absorbing Markov chain model for production systems with rework and scrapping". In: Computers and Industrial Engineering 55 (2008), pp. 695–706.
[16] Michael A. Rotondi. "To Ski or Not to Ski: Estimating Transition Matrices to Predict Tomorrow's Snowfall Using Real Data". In: Journal of Statistical Education 18.3 (2010).
[17] Claude E. Shannon. "A Mathematical Theory of Communication". In: The Bell System Technical Journal 27.3 (1948), pp. 379–423.
[18] G. A. Spedicato et al. The markovchain Package: A Package for Easily Handling Discrete Markov Chains in R. 2015. url: https://cran.r-project.org/web/packages/markovchain/vignettes/an_introduction_to_markovchain_package.pdf.
[19] Diane Vazza and Nick W. Kraemer. 2014 Annual Global Corporate Default Study And Rating Transitions. 2015. url: https://www.nact.org/resources/2014_SP_Global_Corporate_Default_Study.pdf.
96 Topic 5
The Chains of Andrei Andreevich Markov - II
Finite Markov Chains and Matrix Theory
The theory of finite homogeneous Markov chains provides one of the most beautiful and elegant applications of the theory of matrices. Carl D. Meyer, 1975
Keywords: probability theory, stochastic processes, matrix algebra
5.1 An Invitation
This topic is under development and has not been finished yet.
Actually, when preparing Topic 4, Finite Markov Chains and Their Applications, I realized that it would be a good idea to devote a separate topic entirely to matrix theory and its applications. There is nothing to add to Carl D. Meyer's quote above. This topic will presumably cover the following points:
• Nonnegative matrices and the Perron-Frobenius Theorem.
• Structural properties of a Markov transition matrix, reducibility.
• Convergence of transition matrices, summability.
• Spectral decomposition and Sylvester's Formula.
• Convergence of the power method for regular chains.
• The fundamental matrix as a generalized inverse.
• Stochastic complements and uncoupling Markov chains.
• ... and maybe more to come.
5.2 An Annotated Bibliography
Here are two interesting papers due to C. D. Meyer which strongly motivate me to work out this topic, Meyer (1975) and Meyer (1989).
5.3 References
[1] Carl D. Meyer. “Stochastic Complementation, Uncoupling Markov Chains, and the Theory of Nearly Reducible Systems”. In: SIAM Review 31.2 (1989), pp. 240–272. [2] Carl D. Meyer. “The Role of the Group Generalized Inverse in the Theory of Finite Markov Chains”. In: SIAM Review 17.3 (1975), pp. 443–464.
98 Topic 6 Benford’s Law
The Law of Anomalous Numbers is thus a general probability law of widespread application. Frank Benford, 1938
Keywords: Mathematical and statistical forensics, experimental statistics, probabilistic number theory
6.1 An Invitation
6.1.1 Simon Newcomb and the First Digit Law
A distinguished applied mathematician was extremely successful in bets that a number chosen at random in the Farmer's Almanac, the US Census Report or a similar compendium would have the first significant digit less than 5. It is reported by Feller (1971) that this man won almost 70 % of his bets. How is this possible? Normally the numbers we use in everyday life are expressed in the decimal system, so that the digits making up a number are integers in the range 0, 1, 2, ..., 9, except for the first or leading digit, which by convention and convenience is never zero. If we look at a collection of numbers like a table of physical constants, atomic weights, population counts of cities, distances of galaxies from the earth, numbers in annual reports of companies, the Farmer's Almanac, etc., we intuitively expect that all digits in these numbers should occur with equal frequency. Thus the leading digit, say D₁, should be 1, 2, ..., 9 with frequency close to 1/9, and the second digit D₂, the third D₃, etc. should take their values with equal frequencies 1/10. Interestingly, very often this is not true, at least for the examples I have just mentioned and for many other examples as well. In 1881 the astronomer Simon Newcomb (1835-1909) published a short note in the American Journal of Mathematics which he opened as follows:
That the ten digits do not occur with equal frequency must be evident to anyone making use of logarithm tables, and noticing how much faster the first pages wear out than the last ones.
He argued that numbers whose first digit is small are more likely to be used than numbers with first digit being greater than 4 or 5, say. In particular, he stated:
The law of probability of the occurrence of numbers is such that all mantissae of their logarithms are equally probable.
Newcomb's discovery did not have much resonance and apparently was forgotten until in 1938 Frank Benford, at that time a physicist at General Electric, rediscovered Newcomb's result, which since then is commonly known as Benford's Law1. Benford (1938) collected lots of data from areas as diverse as numbers of inhabitants of towns, physical measurements, the Farmer's Almanac, atomic weights, voltages of X-ray tubes, results from sports leagues, powers and square roots of natural numbers, etc. For most of these data he found that the frequency (we are tempted to say, the probability) of the first digit D₁ is very close to the logarithmic law:

    P(D₁ = d) = log(1 + 1/d),   d = 1, 2, ..., 9        (6.1)
Here log means (as always in this introduction) logarithm to base 10. Formula (6.1) is commonly known as Benford’s First Digit Law. We shall call it also the weak version of Benford’s Law. By simple calculation we obtain:
    d           1      2      3      4      5      6      7      8      9
    P(D₁ = d)   0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
Table 6.1: Benford’s Law - first digit probabilities
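The entries of Table 6.1 are easily reproduced; the following sketch also computes the probability P(D₁ < 5) behind the almanac bets:

```python
import math

# First digit probabilities according to Benford's Law (6.1)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
print({d: round(p, 3) for d, p in benford.items()})

# The probabilities sum to 1 (the product 2/1 * 3/2 * ... * 10/9 telescopes
# to 10), and P(D1 < 5) is the bettor's winning chance of about 69.9 %
print(round(sum(benford[d] for d in range(1, 5)), 3))   # 0.699
```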
A bar plot of these data is displayed in Figure 6.1 below. It explains why the applied mathematician mentioned above had a chance of about 69.9 % to win his bets. Not so bad. Benford called numbers following the logarithmic law anomalous numbers and observed that the fit to this law was even better when data from very different sources were combined into a single large sample. That's quite remarkable too, because these data have very different units of measurement! Mark Nigrini (1992) wrote his PhD thesis on Benford's Law and suggested using it as an auditing and accounting tool to detect anomalies in company data. Indeed, he found that most accounting data very closely follow Benford's
1Benford's article suffered a much better fate than Newcomb's paper, possibly in part because it immediately preceded a physics article by Hans Bethe et al. on the multiple scattering of electrons (Miller, 2015).
Figure 6.1: Benford’s Law - first digit probabilities
Law. However, in the case of accounting fraud this is quite often not the case. A significant deviation of observed data from Benford's Law may be an indication that the data have been manipulated, maybe with fraudulent intent. Nigrini's work initiated a new discipline nowadays known as forensic statistics or analytic forensics. Very quickly judicial authorities became aware of these new ideas. The Wall Street Journal (July 10, 1995) reported that the chief financial investigator for the district attorney's office in Brooklyn, N.Y., Mr. R. Burton, used Nigrini's program to analyze 784 checks issued by seven companies and found that the amounts on 103 checks didn't conform to expected patterns. "Bingo, that means fraud," says Mr. Burton. The district attorney has since caught the culprits, some bookkeepers and payroll clerks, and is charging them with theft. In particular, Mr. Burton obtained the frequencies of first digits given in Table 6.2. Look how different these frequencies are from those predicted by Benford's Law. It seems that faking data in an intelligent way is not so easy, and probably Mafia people should learn more mathematics!
    d      1      2      3      4      5      6      7      8      9
    f(d)   0.000  0.019  0.000  0.097  0.612  0.233  0.010  0.029  0.000
Table 6.2: First digit frequencies of fraudulent data (Theodore P. Hill, 1999)
You might have got the impression that all numeric data follow Benford's Law, but this is not true. Here are a few examples of data which are not Benford: ZIP codes, prices of products like ¤ 1.99 often used in supermarkets for psychological reasons, telephone numbers, numbers drawn in lotteries, etc. Other examples are data sampled from a normal distribution, data generated by a random walk process, data having a narrow range of possible values like body height, etc. Why are many data sets Benford and others not? It has been argued that this may be a property inherent to our decimal number system, but it turned out that this argument does not hold: Benford's Law can be observed also for binary, octal and hexadecimal numbers. It has also been argued that there may be some universal law behind our data, probably comparable to one of the most fundamental laws in probability, the
Central Limit Theorem. It was not until 1995 that the pioneering works of Theodore Hill (see the annotated bibliography in Section 3 below) shed a bright light on this mysterious law.
6.1.2 The significand function
We are now going to work out some basic mathematical ideas which not only prove to be very useful but also allow us to state a much stronger version of Benford's Law. In his 1938 paper Benford evaluated 20 229 data entries from various sources, and it must have been a rather fatiguing process to count the digits of so many numbers. Of course, today we will not do that by hand but use the computer. But how can we persuade the computer to return the first or second or third digit of a given number? This requires some technique. The significand function providing the basis of these techniques plays a fundamental role in the context of Benford's Law. Let x > 0 be a real number; then the significand function S(x) is defined as
S(x) = t,   t ∈ [1, 10)   (6.2)

where t is the unique number such that x = t · 10^σ for some necessarily unique σ ∈ Z. The exponent σ is called the scale of x. For convenience we define S(x) = 0 when x = 0. Since we are interested in the digits of x, its sign will never play a role. For example, for x = 2356.88 = 2.35688 · 10^3 we have S(x) = 2.35688 with scale σ = 3. How do we get the scale? For this purpose we need a most important integer function, the floor ⌊x⌋, which is defined as the largest integer less than or equal to x, i.e., x rounded down. Thus
⌊4.81⌋ = 4,   ⌊−2.4⌋ = −3,   etc.
The scale σ of a positive number x is simply ⌊log x⌋. Indeed, log 2356.88 = 3.3723, thus σ = ⌊3.3723⌋ = 3. Note that σ is not very interesting for our purposes, as we are primarily interested in the values of the digits and not in the position of the decimal point. Thus it will be convenient to strip off the integer part of a number x, which yields the fractional part of x, denoted by ⟨x⟩:
⟨x⟩ = x − ⌊x⌋
The unique representation x = S(x) · 10^σ implies S(x) = x · 10^−σ. But for x > 0 it holds that x = 10^(log x), therefore we get the following explicit representation of the significand function, valid for all x ≠ 0:
S(x) = 10^(log|x| − ⌊log|x|⌋) = 10^⟨log|x|⟩   (6.3)
The function S(x) gives us direct access to the digits making up a number x.
Let Dm(x) := Dm, m ∈ N, be the m-th significant digit of x when counted from left. Since we agree that a number will never have leading zeroes, clearly:
D1 ∈ {1, 2,..., 9},Dm ∈ {0, 1,..., 9} for m > 1.
Also, let
S(x) = D1 + D2 · 10^−1 + D3 · 10^−2 + ... = Σ_{m∈N} Dm · 10^(1−m)   (6.4)

be the decimal expansion of the significand function. For example, S(2356.88) = 2.35688, thus its significant digits are:
D1 = 2,D2 = 3,D3 = 5,D4 = 6,D5 = 8,D6 = 8, and, of course by (6.4):
S(2356.88) = 2 + 3 · 10^−1 + 5 · 10^−2 + 6 · 10^−3 + 8 · 10^−4 + 8 · 10^−5.
Cleverly using the floor function we get the digits Dm easily from the significand function:
Dm = ⌊10^(m−1) S(x)⌋ − 10⌊10^(m−2) S(x)⌋,   for all m ∈ N.   (6.5)

In particular, the leading digit is given by
D1 = ⌊S(x)⌋ − 10⌊10^−1 S(x)⌋.
For instance, when x = 2356.88, then S(x) = 2.35688 and
D2 = ⌊10^1 · 2.35688⌋ − 10⌊10^0 · 2.35688⌋ = ⌊23.5688⌋ − 10⌊2.35688⌋ = 23 − 20 = 3, etc.
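The formulas (6.2)–(6.5) are easy to put on a computer. Here is a minimal sketch; I use Python purely for illustration (the text recommends R later on), and the function names significand, scale and digit are my own:

```python
import math

def significand(x):
    """S(x) via (6.3): 10 raised to the fractional part of log10|x|."""
    if x == 0:
        return 0.0
    lg = math.log10(abs(x))
    return 10 ** (lg - math.floor(lg))

def scale(x):
    """The scale sigma = floor(log10|x|) of a nonzero number x."""
    return math.floor(math.log10(abs(x)))

def digit(x, m):
    """The m-th significant digit D_m of x, formula (6.5)."""
    s = significand(x)
    return math.floor(10 ** (m - 1) * s) - 10 * math.floor(10 ** (m - 2) * s)

print(significand(2356.88), scale(2356.88))      # 2.35688... and 3
print([digit(2356.88, m) for m in (1, 2, 3, 4, 5)])  # [2, 3, 5, 6, 8]
```

Note that for very deep digit positions floating-point rounding in the logarithms may eventually spoil the result; for the first few digits this is harmless.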
6.1.3 Benford’s Law and the uniform distribution
Now we put some randomness into the story: let X be a random variable and S(X) its significand function. We define: X satisfies Benford’s Law (strong version), if S(X) has a logarithmic distribution:
P (S(X) ≤ t) = log t, t ∈ [1, 10) (6.6)
Note that S(X) is then a continuous random variable. We shall see shortly that (6.6) implies Benford's First Digit Law, but the converse is not true. It is easy to see that the logarithmic law (6.6) holds if and only if the logarithm of S(X) has a continuous uniform distribution on [0, 1]. Apply the substitution log t = s in the equation above. Then t = 10^s and:
P(S(X) ≤ 10^s) = s,   s ∈ [0, 1).
Using the explicit expression (6.3) for the significand we obtain
P(10^⟨log|X|⟩ ≤ 10^s) = s,   s ∈ [0, 1).
Upon taking logs we get
P(⟨log|X|⟩ ≤ s) = P(log S(X) ≤ s) = s,   s ∈ [0, 1).   (6.7)
Thus we realize that log S(X) ~ U(0, 1), a continuous uniform distribution on the interval [0, 1]. These observations tell us that we will observe Benford's Law if the logs of the significands of observed data are close to a uniform distribution, and vice versa. Note that (6.7) gives us a simple way to generate pseudo-random numbers that follow Benford's Law: just generate a random number U ~ U(0, 1) and form Z = 10^U, then Z must be Benford.
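This recipe is easy to try out. A quick sketch, using nothing beyond the standard library (the sample size 50 000 is an arbitrary choice of mine):

```python
import math
import random

random.seed(42)

# Z = 10^U with U ~ U(0,1) satisfies (6.7), hence Z is Benford
sample = [10 ** random.random() for _ in range(50_000)]

# compare empirical first-digit frequencies with Benford's First Digit Law
counts = [0] * 10
for z in sample:
    counts[int(z)] += 1        # Z lies in [1, 10), so int(z) is its leading digit
emp = [counts[d] / len(sample) for d in range(1, 10)]
ben = [math.log10(1 + 1 / d) for d in range(1, 10)]
print(max(abs(e - b) for e, b in zip(emp, ben)))  # small, of order 1/sqrt(n)
```

The maximum deviation between the empirical and the theoretical digit frequencies shrinks like 1/√n, exactly the sampling behavior one expects under the law.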
6.1.4 The general digit law
The logarithmic distribution (6.6) has lots of information to offer. For brevity let S(X) := S and recall that the significand S has a decimal expansion
S = D1 + D2 · 10^−1 + D3 · 10^−2 + ...,

moreover, S is a continuous random variable, so P(S ≤ t) = P(S < t) because P(S = t) = 0. From (6.6) we can easily derive not only Benford's First Digit Law, but also the distribution of the second digit D2, of the third digit D3, etc.
Let’s begin with D1 and consider the event {D1 = d1}. Some reflection shows that it is equivalent to the event
{D1 = d1} ≡ {d1 ≤ S < d1 + 1}
Thus
P(D1 = d1) = P(d1 ≤ S < d1 + 1) = P(S ≤ d1 + 1) − P(S ≤ d1)
           = log(d1 + 1) − log(d1) = log((d1 + 1)/d1) = log(1 + 1/d1),

which is Benford's Law for the first significant digit. But without any problems we can get more. For the joint distribution of the first two digits D1 and D2 we obtain:
P(D1 = d1, D2 = d2) = P(d1 + d2 · 10^−1 ≤ S < d1 + (d2 + 1) · 10^−1)
                    = log(1 + 1/(10 d1 + d2))   (6.8)
You will have no difficulty filling in the details. Of course, 1 ≤ D1 ≤ 9, 0 ≤ D2 ≤ 9. Example. If a number is randomly selected from a data set following Benford's Law (strong version), then the probability that it starts with 50 ... equals
P(D1 = 5, D2 = 0) = log(1 + 1/50) ≈ 0.0086.
To obtain the marginal distribution of D2 we have to sum (6.8) over all possible values of d1:
P(D2 = d2) = Σ_{k=1}^{9} log(1 + 1/(10k + d2))   (6.9)

A routine calculation yields:
d           0     1     2     3     4     5     6     7     8     9
P(D2 = d) 0.120 0.114 0.109 0.104 0.100 0.097 0.093 0.090 0.088 0.085
Table 6.3: Benford’s Law - second digit probabilities
A quite instructive barplot of the distributions of the first and second digits is displayed in Figure 6.2. You can see that the distribution of D2 is already rather close to a discrete uniform distribution, and this pattern prevails when we consider the marginal distributions of Dm for m > 2.

[Figure 6.2: Benford's Law - first and second digit probabilities]

Continuing the arguments outlined above it is not difficult to find the joint distribution of the first k digits:

P(D1 = d1, D2 = d2, ..., Dk = dk) = log(1 + 1/(10^(k−1) d1 + 10^(k−2) d2 + ... + dk))   (6.10)

Formula (6.10) is also known as Benford's Law for the first k digits.
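Formulas (6.8)–(6.10) are easily evaluated numerically. The sketch below reproduces the example above and the second-digit probabilities of Table 6.3 (the function name benford_joint is my own):

```python
import math

def benford_joint(*digits):
    """P(D1 = d1, ..., Dk = dk) according to (6.10)."""
    k = len(digits)
    base = sum(d * 10 ** (k - 1 - i) for i, d in enumerate(digits))
    return math.log10(1 + 1 / base)

# marginal distribution of the second digit, formula (6.9)
second = [sum(benford_joint(k, d2) for k in range(1, 10)) for d2 in range(10)]

print(round(benford_joint(5, 0), 4))   # P(D1=5, D2=0), about 0.0086
print([round(p, 3) for p in second])   # the probabilities of Table 6.3
```

Summing the marginal over all ten values of d2 gives exactly 1, a useful sanity check on the implementation.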
6.1.5 Testing the Hypothesis
Benford or not Benford?
It is a question of utmost practical importance to find out whether a given data set conforms to Benford's Law. This question is a statistical decision problem, and I will discuss very briefly how it can be addressed methodologically. I decided to include this in our Invitation for two reasons:
• The majority of publications related to Benford's Law are empirical in nature.
• Many of these papers are methodologically rather poor.
At the very beginning it is important to announce some bad news: no finite data set can be exactly Benford in the sense of (6.10). The reason is that the Benford probabilities (6.10) of sets of k given significant digits become arbitrarily small as k → ∞, and no discrete probability distribution with finitely many points of support can take arbitrarily small positive values. Despite this drop of bitterness it is still legitimate to ask: how do we measure close conformance to Benford's Law, or a significant deviation from it? In statistics, this question commonly runs under the headline goodness-of-fit. We have a null hypothesis:
H0 : data conform to Benford’s Law and the alternative hypothesis:
H1 : data do not conform to Benford's Law

Given a significance level α, we want to decide whether for a given data set hypothesis H0 has to be rejected or not. Stated equivalently: do our data show a statistically significant deviation from Benford's Law? A rather traditional route of attack is this: use (6.10), usually for small values of k, and compare the predicted probabilities with empirical frequencies observed in sample data. Virtually all empirical work on Benford's Law pursues this way and solves a discrete goodness-of-fit problem.
Separate testing of single digits
The idea is very simple and should be familiar to you from elementary statistics courses: suppose we want to find out whether there is a significant deviation between the observed frequency fd of the event {D1 = d} and the corresponding probability, which should be pd = P(D1 = d) = log(1 + 1/d), as we know from (6.1). Testing conformity to Benford's First Digit Law is done for each of the nine possible values d of D1 separately:
H0 : pd = log(1 + 1/d),
H1 : pd 6= log(1 + 1/d)
The test statistics are
Td = √n · |fd − pd| / √(pd(1 − pd)),   d = 1, 2, ..., 9
For large sample size n the statistics Td are approximately standard normal; the corresponding p-values equal P(|Td| > |td|), where td is the observed sample value of Td. The results of these tests have to be interpreted with care. Suppose that testing the nine hypotheses leads to a rejection of only the hypothesis for d = 1. Is this sufficient to conclude that our data are not Benford? No, not at all. The point is that performing these tests simultaneously affects the chosen level of significance: the probability that at least one of the nine tests rejects by pure chance is larger than the nominal α. So this procedure should only be used in an exploratory analysis of your data.
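A sketch of these nine z-tests; the standard normal cdf is computed via math.erf, and the helper names are my own:

```python
import math

def benford_p(d):
    """First Digit Law probability p_d = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

def single_digit_tests(first_digits):
    """Return {d: (T_d, p-value)} for the nine separate first-digit tests."""
    n = len(first_digits)
    result = {}
    for d in range(1, 10):
        p = benford_p(d)
        f = sum(1 for x in first_digits if x == d) / n
        t = math.sqrt(n) * abs(f - p) / math.sqrt(p * (1 - p))
        phi = 0.5 * (1 + math.erf(t / math.sqrt(2)))  # standard normal cdf
        result[d] = (t, 2 * (1 - phi))                # two-sided p-value
    return result

# a blatantly non-Benford sample: every number starts with digit 1
res = single_digit_tests([1] * 100)
print(res[1])   # huge T_1, p-value practically zero
```

Even this toy sample illustrates the multiple-testing issue: the hypothesis for d = 1 is rejected overwhelmingly, while the statistics for digits 2 to 8 are only moderately large.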
Distance-based tests
Separate testing of single digits is easy, but because of its lack of power this approach is not very reliable. An alternative is testing based on the vectors
p = [p1, p2, ..., p9] and f = [f1, f2, ..., f9],

where the pd are calculated according to (6.1), of course. We are tempted to say that the observed frequencies f are close to the probability distribution under H0 represented by p if some appropriate measure of deviation has a small value; otherwise we would reject H0. To be concrete, our testing problem is now more specifically:
H0 : D1 has distribution p and therefore is Benford
H1 : D1 has another distribution and therefore is not Benford

Several distance measures are in practical use; probably the oldest is the χ²-statistic. It is just the sum of the squared and normalized deviations between pd and fd:
χ² = n Σ_{d=1}^{9} (pd − fd)²/pd   (6.11)

The χ²-test is one of the most often used tests in empirical studies about Benford's Law. However, it has a major drawback: if the sample size n increases, then χ² will also grow, and as a result the power of the test becomes so large that practically always H0 is rejected. Alternatives are available; for instance, instead of using the normalized squared deviations we may use the so-called Chebyshev distance, which just takes the maximum distance between pd and fd:

μ* = √n · max_{1≤d≤9} |pd − fd|   (6.12)
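Both statistics are one-liners. A sketch (the argument f holds the observed relative frequencies):

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def chi2_stat(f, n):
    """Chi-square distance (6.11) for observed relative frequencies f."""
    return n * sum((p - fd) ** 2 / p for p, fd in zip(BENFORD, f))

def chebyshev_stat(f, n):
    """Chebyshev distance (6.12)."""
    return math.sqrt(n) * max(abs(p - fd) for p, fd in zip(BENFORD, f))

# perfectly Benford frequencies give distance zero ...
print(chi2_stat(BENFORD, 1000), chebyshev_stat(BENFORD, 1000))
# ... while uniform digit frequencies do not
print(chi2_stat([1 / 9] * 9, 900), chebyshev_stat([1 / 9] * 9, 900))
```

The first call illustrates the drawback mentioned above in reverse: both statistics scale with n, so for fixed observed frequencies a larger sample always produces a larger value.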
It should be noted that in performing these types of goodness-of-fit tests we are by no means restricted to the leading digit D1. The same procedures apply when testing the distribution of the second digit D2; its H0-distribution is given by (6.9). It is also possible to formulate P(D1 = d1, D2 = d2) as the null distribution, i.e., to test the joint distribution of the first two digits. This is done very often in accounting studies. See Section 2, where we will have to say more about this idea.
Tests based on the empirical distribution function
Given a sample (X1, X2, ..., Xn) of independently and identically distributed random variables with distribution function F(x) = P(X ≤ x), the empirical distribution function (ecdf) Fn(t) of the sample is defined by

Fn(t) = (number of sample values ≤ t)/n = (1/n) Σ_{i=1}^{n} 1(Xi ≤ t),   (6.13)

where 1(A) is the indicator function of event A. The ecdf has many remarkable properties, the most important of these being given in the Glivenko-Cantelli Lemma:
Let Dn = sup_t |Fn(t) − F(t)|; then lim_{n→∞} Dn = 0 with probability 1.   (6.14)
Note that Dn is the maximum absolute deviation between the ecdf and the true distribution function F(t), and the lemma states: this distance becomes arbitrarily small and remains that small as the sample size increases. M. Loève has called (6.14) the fundamental theorem of statistics. It implies that the whole unknown probabilistic structure of the sequence Xi can be discovered from data with certainty. Note also the formal similarity of Dn to the Chebyshev distance (6.12) introduced above.
Figure 6.3: The maximum deviation Dn between the ecdf Fn(t) and F (t)
Example. In Figure 6.3 I have displayed the ecdf of a sample of size n = 10 taken from a standard normal distribution: X = {−3.01, −1.09, −0.59, −0.55, 0.79, 1.17, 1.31, 1.42, 1.97, 2.17}
For instance: Fn(0.5) = 0.4 because there are 4 sample values ≤ 0.5, and Fn(t) has jumps of height 1/n = 0.1 since all sample values are different. The maximum absolute deviation has value Dn = 0.3852. The fundamental result (6.14) gives rise to several classical goodness-of-fit tests. Indeed, Dn is the test statistic of the famous Kolmogorov-Smirnov Test: it tests the null hypothesis H0 : F(t) = F0(t) against H1 : F(t) ≠ F0(t) in the two-sided case and rejects H0 if the observed Dn is too large. So it is quite natural to use this test to find out whether observed data deviate significantly from the Benford distribution. However, there is a problem: in the classical setting of the Kolmogorov-Smirnov Test it is assumed that the sample comes from a continuous distribution. Of course, the Benford First Digit Law is not continuous; the null distribution is the step function given by:
F0(t) = 0 for t < 1,
F0(t) = log(d + 1) for d ≤ t < d + 1, d = 1, 2, ..., 8,   (6.15)
F0(t) = 1 for t ≥ 9.
To apply the Kolmogorov-Smirnov Test to a discrete distribution we need a suitably adapted variant of this test to have reliable p-values. Such variants are available, see Arnold and Emerson (2011), and are now part of standard statistics packages like R. I will show you in the next section how to apply these. An interesting alternative to Kolmogorov-Smirnov is the Cramér-von Mises Test. Here the test statistic is essentially the sum of the squared deviations between F0(t) and the ecdf Fn(t):

W² = n ∫_{−∞}^{∞} [Fn(t) − F0(t)]² dF0(t)   (6.16)

You should not worry about the integral occurring here; actually W² is a sum because F0(t) is a step function. Observe the formal similarity between the statistic W² and the χ²-statistic (6.11). Before applying these statistical tests a decision has to be made which software to use. I recommend R, which as I found does a good job regarding our problem. There are two packages which are particularly interesting for our purposes, BenfordTests and dgof, which implement the Kolmogorov-Smirnov and Cramér-von Mises Tests for discrete distributions mentioned above, and some more. There is also a package benford.analysis which may be interesting for your experiments, but it is not discussed here. The functions provided by BenfordTests are very handy to use, whereas those of dgof are a bit more complicated to use.
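For reliable p-values one should use those adapted discrete tests; the statistic Dn itself, however, is easy to compute. A sketch for first-digit data follows. Since Fn and F0 are both step functions jumping only at the integers 1, ..., 9, it suffices to evaluate the distance there:

```python
import math

def benford_cdf(d):
    """F0(d) = log10(d + 1), the step function (6.15) at integer d."""
    return math.log10(d + 1)

def ks_distance(first_digits):
    """D_n = sup_t |F_n(t) - F0(t)| for a sample of first digits."""
    n = len(first_digits)
    dist = 0.0
    for d in range(1, 10):
        fn = sum(1 for x in first_digits if x <= d) / n
        dist = max(dist, abs(fn - benford_cdf(d)))
    return dist

# uniformly distributed digits are clearly not Benford:
print(ks_distance(list(range(1, 10))))   # about 0.269, attained at d = 3
```

Turning this statistic into a valid p-value is exactly where the discrete variants of Arnold and Emerson (2011) come in; the naive continuous Kolmogorov-Smirnov quantiles would be conservative here.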
6.1.6 Remarkable Properties of Benford’s Law
In the sequel I will introduce to you some important and really remarkable results, and I will do so in a rather informal way. The major motivation is
that you should know about these properties of Benford's Law without being incommoded too much by heavy mathematics. Indeed, the derivations and proofs are very technical and require quite sophisticated mathematical tools. The main source for this section is Arno Berger and Theodore P. Hill (2015).
Scale-invariance
A law as universal as Benford’s should be scale-invariant in the sense that it is independent of units of measurement. In his excellent paper Raimi (1976) writes:
. . . that law must surely be independent of the system of units chosen, since God is not known to favor either the metric system or the English system. In other words: a universal first digit law, if it exists, must be scale-invariant.
Just to give you an example: if the accounting data of a big company are Benford in US $, then they should be so too in EUR, British Pounds, etc. This is what we expect. Given a random variable X, scale-invariance means that the distribution of the digits D1, D2, ... is the same for X and αX for any real scaling constant α > 0. Or, interpreted in the sense of Benford's Law (strong version): the significands S(X) and S(αX) both follow the logarithmic distribution. It is important to note (and a source of a common misunderstanding) that scale-invariance refers only to the digit distribution, not to the distribution of X itself. Indeed, no non-null random variable can itself be scale-invariant. The scale-invariance property characterizes the law: if for any α > 0 and any d ∈ {1, 2, ..., 9}

P(D1(X) = d) = P(D1(αX) = d) = log(1 + 1/d),

then X is Benford and vice versa; the same is true for (6.10). The Benford distribution is the only distribution with this property.
Powers and products
The power theorem. The classical continuous distributions like the normal, uniform and exponential are not Benford, but something interesting happens if we raise these to powers; Experiment 3 above has given us some indication. Indeed, it can be proved that if X is any continuous random variable having a probability density, then for the sequence X^n, n = 1, 2, ..., the significands S(X^n) tend in distribution to the logarithmic law (6.6). This result is of considerable value in applications, e.g., in fraud detection, see Section 2. The product theorem. Also, if X and Y are independent random variables and X is Benford, then so is the product XY, provided P(XY = 0) = 0. This
result has interesting implications. Consider for instance an inventory stock In. Very often these stocks behave like a random walk process, as goods are added to and withdrawn from the inventory in random amounts. We shall see in a moment that random walks are usually not Benford, but prices of goods often are. Thus if we want to determine the value Vn of an inventory stock at some epoch n, we calculate Vn = In · pn. So, even when In is not Benford but pn is, Vn will be Benford. A limit theorem for products. The nice behavior of Benford's Law with regard to products is also reflected in the following very important result:
If X1, X2, ... are independent and identically distributed continuous random variables, then the significands of their product

X1 · X2 ··· Xn = Π_{i=1}^{n} Xi

tend in distribution to Benford's Law.
Here is an example: let C0 be some initial capital, not necessarily random, and suppose that we earn interest on C0 with interest rates r1, r2, ..., rn. Then by compounding interest the value Cn of our capital at epoch n will be:

Cn = C0(1 + r1)(1 + r2) ··· (1 + rn).

If the interest rates are continuous random variables, then the product limit theorem applies and Cn will be approximately Benford for large n.
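The compound-interest example can be simulated directly. In the sketch below the rates are drawn uniformly from (0, 1); these exaggerated rates and the 100 periods are my choice purely to speed up convergence, not an attempt at realism:

```python
import math
import random

random.seed(7)

def first_digit(x):
    """Leading digit of x via the significand (6.3)."""
    lg = math.log10(abs(x))
    return int(10 ** (lg - math.floor(lg)))

def compound(c0, periods):
    """C_n = C_0 (1+r_1)...(1+r_n) with random rates r_i ~ U(0, 1)."""
    c = c0
    for _ in range(periods):
        c *= 1 + random.random()
    return c

sample = [compound(100.0, 100) for _ in range(20_000)]
freq = [sum(1 for c in sample if first_digit(c) == d) / len(sample)
        for d in range(1, 10)]
ben = [math.log10(1 + 1 / d) for d in range(1, 10)]
print(max(abs(f - b) for f, b in zip(freq, ben)))  # close to zero
```

The reason convergence is so fast: log10 Cn is a sum of 100 iid terms, so its fractional part is already almost perfectly uniform, which by (6.7) is exactly the Benford property.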
Sums
Regarding sums the situation is not as nice as it is for products. Here is an intuitive argument: Benford's Law is a logarithmic distribution. Now log(xy) = log x + log y, but it is not possible to express log(x + y) in simple terms of log x and log y. Indeed, if X and Y are both Benford, then X + Y will not be Benford. Moreover, the following striking result holds:
If X1, X2, ..., Xn are independent and identically distributed random variables with finite variance, then Σ_{i=1}^{n} Xi is not Benford in the limit n → ∞; not even a subsequence will be Benford.

As a result, the classical random walk processes Sn = Σ_{i=1}^{n} Xi, with increments being discrete with ±1 or having some other distribution with finite variance, are not Benford. An informal argument is this: the conditions stated above are those of the classical Central Limit Theorem. Thus the standardized sum Σ_{i=1}^{n} Xi will tend in distribution to a standard normal, but the latter can be shown not to be Benford. However, there is one more invariance property. Sum-invariance. In his PhD thesis Nigrini (1992) observed that in the data sets he considered, the sum of significands of data points with D1 = 1 was very
close to sums of items with D1 = 2 or any other possible value of the first digit. More precisely, let {x1, x2, . . . , xn} be a data sample of size n and let
Sd1,d2,...,dm(xk) be the significand of sample point xk if its first m digits equal d1, d2, ..., dm; otherwise set Sd1,d2,...,dm(xk) = 0. By the Law of Large Numbers the arithmetic mean of these significands tends to the mathematical expectation E[Sd1,d2,...,dm(X)], i.e.,

lim_{n→∞} (1/n) Σ_{k=1}^{n} Sd1,d2,...,dm(xk) = E[Sd1,d2,...,dm(X)],

for all m ∈ N. So far, nothing special. However, if the data source X is Benford, then this limit is independent of the digits d1, d2, ..., dm; moreover, this invariance property characterizes Benford's Law. Indeed, if X is Benford, then it can be shown:

E[Sd1,d2,...,dm(X)] = 10^(1−m) / ln 10   (6.17)

for all possible tuples d1, d2, ..., dm. For m = 1 we have E[Sd(X)] ≈ 0.4343 for all digit values d, for m = 2 E[Sd1,d2(X)] ≈ 0.0434, etc.
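Sum-invariance is easy to observe in simulation. The sketch below draws an exactly Benford sample via Z = 10^U and checks that the mean of Sd over the whole sample is close to 1/ln 10 for every digit d, as (6.17) predicts for m = 1:

```python
import math
import random

random.seed(1)

EXPECTED = 1 / math.log(10)   # 10^(1-m)/ln 10 with m = 1, about 0.4343

n = 50_000
sig = [10 ** random.random() for _ in range(n)]  # Benford significands in [1, 10)

# mean of S_d(x_k) over the WHOLE sample (S_d is zero when the first digit is not d)
means = [sum(s for s in sig if int(s) == d) / n for d in range(1, 10)]
print([round(m, 3) for m in means])   # all close to 0.434
```

Note the balancing act behind this: large first digits occur rarely but contribute large significands, small first digits occur often but contribute small ones, and under Benford's Law the two effects cancel exactly.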
Hill’s Limit Theorem
Many of the properties discussed so far characterize Benford's Law, but none of them explains its astounding empirical ubiquity. As Benford (1938) already observed, many data sets do not conform to the law closely, while others do reasonably well. But, as Raimi (1976) writes: what came closest of all, however, was the union of all his tables. Stated differently: Benford obtained the best fit to the logarithmic law when combining samples coming from sources as diverse as sports results, numbers from newspapers, atomic weights, etc. This seemingly harmless observation, unnoticed for many years, was the starting point of Hill's seminal work. Theodore P. Hill (1995) derived a new statistical limit law which may be seen as some kind of Central Limit Theorem for significant digits. This limit theorem offers a natural explanation for the empirical evidence of Benford's Law. Recall, the Central Limit Theorem tells us that under certain mild conditions sums of independent random variables have a normal distribution when the number of summands tends to infinity. Similarly, Hill's Theorem: if probability distributions are selected at random, and random samples are taken from each of these distributions in any way such that the overall process is scale-unbiased, then the frequencies of significant digits will tend to the logarithmic distribution (6.6) as the sample size n → ∞. Some explanations are in order now:
• What does it mean: a probability distribution is selected at random? Easy (in principle): we perform a random experiment and its result will be a probability distribution. For example, suppose our random experiment
has as possible outcomes two probability distributions F1 and F2 forming a sample space Ω = {F1,F2}. F1 may be, e.g., a uniform distribution on [0, 1], F2 a standard normal distribution.
Suppose also that F1 is selected with probability 1/2, and this is also the probability for selecting F2. Think of tossing an unbiased coin (in technical terms: we are constructing a random probability measure): if the coin shows heads F1 is selected, otherwise F2. Once a distribution has been determined, a sample of m1 (say) independent observations is taken from this distribution. The process is repeated: again a distribution is selected at random, becoming this way the source for another sample of m2 observations. This new sample is combined with the first sample to give a larger sample with m1 + m2 observations. The process continues and stops when our combined sample has reached some required size n. In our example Ω was a set containing only two points, the distribution functions F1 and F2. But Ω may be a continuum as well. For instance:
Ω = {all normal densities with parameters µ and σ², where µ has a uniform distribution on [−α, β]}
Here the base experiment is this: first select a random number u uniformly from the interval [−α, β]. Then take a sample from a normal distribution with µ = u. Repeat this process as long as required.
• What does it mean that the selection process is scale-unbiased? This is not the same as the scale-invariance discussed above. In fact, it is a much weaker requirement: the sampling process on average does not favor one scale over another. It is even possible that none of the distributions in Ω is Benford and therefore scale-invariant. The justification of scale-unbiasedness is somewhat akin to the assumption of independence in the context of the Central Limit Theorem. Checking the assumption can be done indirectly by a goodness-of-fit test for the logarithmic law.
6.2 Where to go from here
Having read the Invitation you may now be sufficiently motivated to go on reading and see what I want from you. Writing a nontrivial thesis about Benford's Law is certainly a challenge. There are at least two ways to go about it.
6.2.1 Statistical Forensics
There is an enormous amount of literature on the application of Benford's Law to detect data manipulations. Proven cases of such manipulations range from the private sector (financial statements of big companies) over macroeconomic
data reported by governments to EU authorities, to the falsification of large sets of clinical data. Why not write a thrilling case study? You won't have difficulties finding attractive examples of spectacular bankruptcies or insider trading; you might remember the Libor scandal or diverse illegal manipulations of foreign exchange markets. And, of course, this list is by no means complete. It has been shown several times that Benford's Law is applicable in diverse auditing contexts, including internal, external and governmental auditing. A typical procedure is to test the Benford hypothesis on the first and/or the second digit of data like revenues, canceled checks, inventory and disbursements. Quite often these tests pointed auditors to telltale irregular patterns in various financial transactions. The US Internal Revenue Service uses Benford's Law to sniff out tax cheats, and Deutsche Bank crunched the numbers on Russell 3000 companies and found that a Benford distribution applies to almost every balance sheet and income statement item, from annual sales to accounts receivable to net income. The vast majority of companies' data adheres to Benford's Law, with about 5 per cent of Russell 3000 companies not conforming based on Deutsche's calculations. Similar results were found for global firms; see the interesting article in the South China Morning Post, July 10, 2015. Interesting studies have been performed for the public sector. In a series of papers Rauch et al. (2011) studied, among others, European Union economic data relevant to compliance with the Stability and Growth Pact criteria. One of their findings was a significant deviation of Greek official data from Benford's Law. The fact that the Greek data manipulation was officially confirmed by the EU Commission can be seen as evidence that Benford's Law might be a valuable forensic tool. But even the sciences and academia are not immune to dishonesty and deception.
A well-known case of data falsification in clinical experiments is reported by J. Interlandi in the New York Times Magazine (October 2006), the Poehlman affair. Lee, Tam Cho, and Judge (2015) have shown that the data manipulated by Eric Poehlman show a significant deviation from Benford's Law. And finally, another interesting forensic application of Benford's Law is the detection of manipulated elections. Much-discussed examples are the election in Turkey, Nov. 1, 2015, and the presidential election in Iran 2009. Remarkably, it is often the last digit of vote counts that gives indications of manipulation. If you decide to pursue the forensic route in your thesis, there are a few points to take care of:
• Collect a sufficient amount of data, at least a few hundred items. I know that this is the really cumbersome part of your study. For instance, accounting data are often available only via annual reports of companies. These in turn are usually published as PDF documents, thus you will have to use appropriate software tools to extract data from those files. Regarding macroeconomic data the situation is much better, as the EU offers free public access to most of these data.
• Keep in mind that if your data or parts thereof show significant deviations from Benford's Law, this does not automatically mean fraud or data falsification. Indeed, examples are known where unmanipulated data are far from Benford. In such a case it is very likely that your null hypothesis that the data are Benford is rejected. Still, it may be possible to use Benford's Law as a forensic tool if you transform your data. Such a transformation is mentioned in Section 7, Remarkable Properties; see also Morrow (2010).
• Formulate your hypotheses carefully and perform various statistical analyses. Regarding tests, please read also the subsection on Experimental Statistics below. In addition to the methods outlined in Section 5, the R-package benford.analysis will be very helpful in this context. Give critical interpretations of your results.
6.2.2 Experimental Statistics
This is a second interesting route to follow. What can be said about the discriminative capabilities of the statistical tests presented in Section 5, about their power? Recall, the power of a hypothesis test is defined as the probability of rejecting the null hypothesis when the alternative is true. In other words, this is the probability that the test correctly signals that data are not Benford, so it is a most important measure of the quality of a test. Determining the power of a statistical test requires the specification of an alternative. Normally it is not enough to say HA : data are not Benford. We must be more specific.
A standard scenario
One possibility is to state as the alternative: the distribution of digits is that of a uniform distribution (say), a normal distribution, or an exponential distribution. All these are known to be non-Benford. So it would be very interesting to estimate the power of the tests described above by means of a systematic simulation study.
• What is the effect of the sample size on the power?
• How does variance influence power? For instance, what is the effect if we widen the interval of support of a uniform distribution, or if we increase the standard deviation of a normal distribution?
I expect that standard tests of the Benford hypothesis based on digits will show rather different behavior.
Faking data intelligently
Probably the most interesting alternative hypotheses arise if we try to fake data in an intelligent way. What does this mean? It is no challenge to generate data sets where the first digit follows a Benford
distribution very closely. This requires only a clever application of the properties of the significand function outlined in Section 2. For instance, you may generate a random sample of uniform variates, take the significands and replace the first digit by digits from a sample following Benford's Law. What will happen? Data sets manipulated in this way will very likely pass any of the first digit tests, be it χ², Kolmogorov-Smirnov, etc. But how likely is it that the manipulation is revealed when testing the first two digits? What happens if we fake the first two digits of the data? How can this be done? An interesting idea is discussed in Jamain (2001, Section 4): data which are basically Benford are contaminated by non-Benford data. How do the tests perform depending on the amount of contamination? Now I want you to do something I never ask of my students: please activate some criminal energy in you! Develop ideas, scenarios and models to manipulate data in such a way that the classical digit-based tests perform as badly as possible.
Testing characterizing properties
Tests based on the first few significant digits are tests of the weak form of Benford's Law. But can we also test the strong version (6.6)? This can be done in several ways, and it would be very interesting to find out how these approaches perform compared to digit-based tests. Here are some ideas you may consider in your experiments:
• Since the significand function S(X) is a continuous random variable, testing (6.6) amounts to a classical continuous goodness-of-fit test. Equivalently, you may test whether log S(X) has a uniform distribution on [0, 1].
• What about scale-invariance? We know that a random variable is scale-invariant if and only if it is Benford. Smith (1997) has suggested a Ones-Scaling Test. It checks the distribution of leading ones in rescaled data, i.e., it examines the relative frequencies of
D1(X) = 1,  D1(αX) = 1,  D1(α²X) = 1,  . . .
where α > 1 is some constant such that log α is irrational. For instance, α = 2 may be a proper choice. How can we test not only leading ones but all possible leading digits simultaneously? Devise such a procedure generalizing Smith's idea.
• What about sum-invariance? Only Benford can have this property. When preparing this topic the following idea came to my mind: for a sample {x1, x2, . . . , xn} define
Sd(xk) = S(xk) if D1(xk) = d,   and Sd(xk) = 0 otherwise.
By sum-invariance the arithmetic means μ̂d of the Sd(xk) should have roughly the same value for all possible digits d ∈ {1, 2, . . . , 9}.
We know from (6.17) its theoretical value: if data are Benford, then μd = E(Sd(X)) = 1/ln 10 ≐ 0.4343.
Now by the Central Limit Theorem the standardized values

Td = √n (μ̂d − 1/ln 10) / sd

are approximately standard normal for large sample size n, where sd² denotes the sample variance of Sd(X). The sum of squares Σ⁹_{d=1} Td² of the statistics Td follows a χ² distribution with 9 degrees of freedom. So we can use this distribution to determine critical values and p-values and thus have one more test of the Benford hypothesis. However, my argument has a weakness: it requires independence of the Td. Suppose we tacitly ignore this point, how does this sum-invariance test perform?
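A minimal sketch of this sum-invariance test might look as follows (the naming is mine; the Benford sample is generated via X = 10^U with U uniform on [0, 1), which has the strong Benford property and equals its own significand):

```python
import math
import random

random.seed(1)
n = 5_000
mu0 = 1 / math.log(10)                        # theoretical mean 1/ln 10 = 0.4343...

# Benford sample: X = 10**U with U uniform on [0, 1) lies in [1, 10),
# so X equals its own significand S(X).
xs = [10 ** random.random() for _ in range(n)]

T2_sum = 0.0
for d in range(1, 10):
    sd_vals = [x if int(x) == d else 0.0 for x in xs]      # the S_d(x_k)
    mean_d = sum(sd_vals) / n
    var_d = sum((v - mean_d) ** 2 for v in sd_vals) / (n - 1)
    T = math.sqrt(n) * (mean_d - mu0) / math.sqrt(var_d)   # standardized T_d
    T2_sum += T * T
# If the T_d were independent, T2_sum would be approximately chi-square
# with 9 degrees of freedom (5% critical value about 16.92).
```

Simulating the distribution of `T2_sum` under the Benford hypothesis is one way to check how badly the ignored dependence distorts the nominal χ² critical values.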
6.3 An Annotated Bibliography
Searching the web with Google using Benford's Law as a search key yields, at the time of writing, more than 100 000 hits. There is an enormous number of publications dedicated to this law and it looks to me as if this number is growing exponentially. The papers and books on Benford's Law fall roughly into two categories: (i) theoretical (extensions of the law, conditions when it does hold and when it does not, special topics from probability, number theory, computer science), and (ii) applications (forensic statistics, auditing, social sciences, astronomy). There is an extensive online bibliography created and maintained by A. Berger, T. P. Hill, and E. Rogers (2016) covering all these categories. Strongly recommended is the fine book edited by S. Miller (2015). This textbook has six parts, the first two of them devoted to the mathematical theory of Benford's Law; the other parts cover many interesting applications from fraud detection, diagnosis of elections, and applications in economics, psychology, the natural sciences and computer science. Regarding the mathematical theory, the most comprehensive text and standard reference is the book by Arno Berger and Theodore P. Hill (2015). Although sometimes technically demanding, it gives a very readable and excellent coverage of the current state of the art. Also recommended regarding theory is the master's thesis by Jamain (2001). If you plan to work on statistical forensics then there is no way around reading Nigrini. After his PhD thesis (Nigrini, 1992) he has published quite a number of papers on applications of Benford's Law to fraud detection, auditing and accounting. He has also authored a very interesting book (Nigrini, 2012) on the subject. There you find several examples and demonstrations of statistical tests, partly developed by the author and implemented in a spreadsheet calculator.
6.4 References
[1] Taylor A. Arnold and John W. Emerson. "Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions". In: The R Journal 3.2 (2011), pp. 34–39. url: http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Arnold+Emerson.pdf.
[2] Frank Benford. "The law of anomalous numbers". In: Proc. Amer. Philosophical Soc. 78 (1938), pp. 551–572.
[3] A. Berger, T. P. Hill, and E. E. Rogers. Benford Online Bibliography. 2016. url: http://www.benfordonline.net.
[4] Arno Berger and Theodore P. Hill. An Introduction to Benford's Law. Princeton University Press, 2015.
[5] William Feller. An Introduction to Probability Theory and Its Applications. 2nd ed. Vol. 2. John Wiley and Sons, 1971.
[6] Theodore P. Hill. "A statistical derivation of the significant-digit law". In: Statist. Sci. 10.4 (1995), pp. 354–363.
[7] Theodore P. Hill. "The difficulty of faking data". In: Chance 12.3 (1999), pp. 27–31. url: http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1048&context=rgp_rsr.
[8] Adrien Jamain. "Benford's law". MA thesis. Imperial College London, 2001.
[9] J. Lee, W. K. Tam Cho, and G. Judge. "Generalizing Benford's Law". In: Stephen J. Miller, ed. Benford's Law: Theory and Applications. Princeton University Press, 2015.
[10] Stephen J. Miller, ed. Benford's Law: Theory and Applications. Princeton University Press, 2015.
[11] John Morrow. Benford's Law, Families of Distributions and a Test Basis. 2010. url: http://www.johnmorrow.info/projects/benford/benfordMain.pdf (last accessed Feb 25, 2016).
[12] Simon Newcomb. "Note on the Frequency of Use of the Different Digits in Natural Numbers". In: Amer. J. Math. 4.1-4 (1881), pp. 39–40. url: http://dx.doi.org/10.2307/2369148.
[13] M. J. Nigrini. Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley Corporate F&A. John Wiley & Sons, 2012.
[14] M. J. Nigrini. "The detection of income tax evasion through an analysis of digital distributions". PhD thesis. Cincinnati, OH, USA: Department of Accounting, University of Cincinnati, 1992.
[15] Ralph A. Raimi. "The first digit problem". In: Amer. Math. Monthly 83.7 (1976), pp. 521–538.
[16] Bernhard Rauch et al. "Fact and Fiction in EU-Governmental Economic Data". In: German Economic Review 12.3 (2011), pp. 243–255. url: http://dx.doi.org/10.1111/j.1468-0475.2011.00542.x.
[17] S. W. Smith. "Explaining Benford's Law". Chap. 34 in: The Scientist and Engineer's Guide to Digital Signal Processing. San Diego, CA: California Technical Publishing, 1997. Republished in softcover by Newnes, 2002.
Topic 7
The Invention of the Logarithm
A Success Story
For it would be without doubt an incredible stain in analysis, if the doctrine of logarithms were so replete with contradictions that it were impossible to find a reconciliation. So for a long time these difficulties tormented me, and I was under several illusions concerning this matter, in order to satisfy myself in some manner without being obliged to completely overturn the theory of logarithms. Leonhard Euler, 1749
Keywords: history of mathematics, logarithmic and exponential functions, numerical mathematics
7.1 An Invitation
7.1.1 A personal remembrance
In the early 1970s I was attending secondary school in Vienna (Astgasse). The mathematics courses there were pretty demanding and I clearly remember that one of the most cumbersome affairs in these courses was to perform various numerical calculations. We had to do these by hand, sometimes with the aid of a slide rule. The use of the latter required a good deal of dexterity and skill, and getting correct results was not only a matter of meticulous preciseness but also of good luck1. This annoying situation changed in the sixth class. At the beginning of that year every pupil was handed a small innocuous booklet, Vierstellige Logarithmen. After having browsed through the book, my opinion was: I have never seen such a boring book! There were tables after tables, practically no text, only tables on each page. However, a few weeks later I had to modify my opinion. After having been introduced to the concept of the exponential function and its inverse, the logarithmic function, after having been taught how to solve simple
1At that time I didn’t know that slide rules use essentially logarithmic scales!
exponential equations, our instructor gave us a brief course on how to use the tables of logarithms. It turned out that only a few rules had to be obeyed:

log(a · b) = log(a) + log(b),   log(a/b) = log(a) − log(b),
log(a^b) = b · log(a),   log(1) = 0.   (7.1)
It was a quantum leap! Multiplying two numbers? Easy, just determine their logarithms by a table look-up, add these, and one more table look-up gives the result2. Division of numbers? What a nerve-racking and tedious task when done by hand! But with logarithms it is as easy as subtracting two numbers! Calculating powers and roots? Also easy, it's just a multiplication or a division! Needless to say, I was really enthusiastic about this new tool, and my enthusiasm was shared by most of my classmates3. In my last year at secondary school the first pocket calculators became available; Texas Instruments was the leading company, offering the scientific calculator TI SR 50. It was as large as a brick and almost as heavy as one. And it was extraordinarily expensive! The price of such a handy computer was higher than the average monthly salary of a worker at that time. But the capabilities of these small computers were really amazing. Now you could do all that hard numerical stuff without resort to logarithm tables. Not surprisingly, at the beginning of the 1980s tables of logarithms were completely replaced by pocket calculators, since their prices had gone down drastically. An era ended at that time, a truly remarkable success story which had lasted more than 350 years. But how did it begin?
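The table workflow sketched above is easy to replay on a computer. The following sketch (helper names are mine) imitates a four-figure log table by rounding log10 to four places, then multiplies via rule (7.1):

```python
import math

def table_log(x):
    """Simulated look-up in a four-place table of common logarithms:
    round log10(x) to 4 decimals, as in 'Vierstellige Logarithmen'."""
    return round(math.log10(x), 4)

def table_multiply(a, b):
    """Multiply via log(a*b) = log(a) + log(b) and one inverse look-up."""
    return 10 ** (table_log(a) + table_log(b))

print(table_multiply(17, 258))   # close to 4386, up to four-figure rounding error
```

The small error illustrates why tables with 10 or 14 decimal places were worth the enormous effort of computing them.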
7.1.2 Tycho Brahe - the man with the silver nose
Tycho Brahe (1546–1601), born on 14 December 1546, originated from a famous Danish noble family. At an age of only 13 years he started studying at the University of Copenhagen. The solar eclipse in 1560, which had been predicted with high accuracy, inspired him to concentrate his studies on astronomy. He continued his studies in Leipzig and Rostock. There, in 1566, he got involved in a heavy dispute with another Danish nobleman (rumors say that the dispute was about a mathematical formula). The dispute ended in a duel with sabers in which Tycho's nose was cut off. That accident disfigured him for the rest of his life. In part this disfigurement
2Actually, some intermediate scaling steps are necessary, but as our logarithms were to base 10, these were also quite easy. 3A few years later at the university I was taught the basic principles of Laplace Transforms, another key experience, which reminded me of logarithms. Using Laplace Transforms, differ- entiation of a function becomes (essentially) a multiplication by a variable, a definite integral is a simple division, etc.
could be hidden by an artificial nose made of silver. For this reason Tycho became known as the man with the silver nose. Tycho's studies and later his scientific work as an astronomer were characterized by his efforts to collect astronomical data from very accurate measurements, and these in turn required high-precision instruments. In 1575 he was already a renowned and eminent scientist and planned to leave Denmark for Basel. King Frederick II did not want to lose him and offered him the island of Ven, located in the Öresund between Sweden and Denmark, where he built Uraniborg Castle, a combination of residence, alchemistic and technical laboratories and astronomical observatories. There he lived together with his court jester, a dwarf named Jepp, and his moose4 and pursued deep and comprehensive scientific studies. This work was interrupted from time to time by fabulous festivities. Indeed, at Uraniborg opulent banquets were held regularly for illustrious guests and Tycho proved to be a charming and entertaining host. It happened in the fall of 1590 that King James VI of Scotland (later King James I of England) sailed with a delegation to Denmark to meet his bride-to-be, Anna of Denmark. Due to very bad weather the royal party was forced to land on Ven, not far from Uraniborg, where they found shelter for a few days. Tycho showed himself from his best side as entertainer and host, and on that occasion he told Dr. John Craig (?–1620), personal physician of King James, about a recent, truly marvelous invention, prostaphaeresis. With its help extremely complex and expensive astronomical calculations could now be carried out with breathtaking ease.
7.1.3 Prostaphaeresis
This tongue-twisting word is a composition of the two Greek words aphairein and prostithenai, which mean to subtract and to add, respectively. The basis of this remarkable method is formed by classical addition theorems for trigonometric functions. For instance, it has been known since ancient times that
cos(α + β) = cos(α) cos(β) − sin(α) sin(β)   (7.2)
cos(α − β) = cos(α) cos(β) + sin(α) sin(β).   (7.3)
These formulas can be proved by elementary geometry or, more illuminating, by using Euler's Equation, see Section 2 below. If we add (7.2) and (7.3) we obtain after simplification:

cos(α) cos(β) = [cos(α + β) + cos(α − β)] / 2.   (7.4)

Looking closer at (7.4) you may realize a remarkable fact: the left-hand side is a multiplication of two numbers, namely the product of two cosines, whereas
4The moose, kept as a house pet, was allowed to move freely inside Uraniborg and had the somewhat strange and not species-appropriate attitude to drink lots of Danish beer. One day the moose being heavily drunk dropped down the staircase and broke its neck.
on the right-hand side we have essentially a summation (scaled down by two). Johannes Werner (1468–1522), a German mathematician and astronomer, was one of the first to realize the potential of this formula to reduce the burdensome and cumbersome multiplication of numbers to much easier addition. All one needed to exploit this potential was just a comprehensive collection of tables of the sine and cosine functions. But such tabular material was already available at that time: rather voluminous collections of tables giving the sine, cosine, tangent and secant (1/cos(α)) functions with an accuracy of 10 decimal places and even more. To see how the method works, let me give you an example. However, we will not use tables but rather a pocket calculator. The point is to see how prostaphaeretic multiplication is done. Suppose we want to calculate the product 17 · 258. Since sine and cosine functions take their values in the interval [−1, 1], we scale first:
17 · 258 = 100 · 0.17 · 1000 · 0.258 = 10^5 · 0.17 · 0.258
Then we find angles α and β such that
cos(α) = 0.17, cos(β) = 0.258
These angles were determined formerly by look up in the tables, today we find them using the arccos-function of a pocket calculator:
α ≈ 80° 12′ 43″,   β ≈ 75° 2′ 56″
Of course, using radians instead of degrees would be a bit more comfortable, but let us follow the way people worked at that time as closely as possible. Next we apply Werner’s formula (7.4):
α + β = 155° 15′ 39″   cos(α + β) = −0.90822
α − β = 5° 9′ 47″    cos(α − β) = 0.99594
sum: 0.08772;  × 0.5 = 0.04386;  × 10^5 = 4386

which is indeed the correct result 17 · 258 = 4386. Observe that only additions, table look-ups and scaling (simply a shift of the decimal point) are necessary. Division is also easy. Suppose we want to calculate x/y. Rewrite this as x · 1/y and put x = cos(α) and 1/y = cos(β). From the latter we have y = sec(β), the secant function, which was also extensively tabulated. Now use again Werner's formula (7.4). Although nowadays prostaphaeresis appears to us as a rather laborious method, at Tycho's and Kepler's time it was considered a major breakthrough in computational mathematics because it reduces the hard work of multiplication and division to the much simpler operations of addition and subtraction. Therefore it is not surprising that it found broad dissemination in Europe because it
simplified computational work so much, especially in astronomy and nautical navigation. John Craig was deeply impressed by Tycho's explanation of the new method. Incidentally, he was not only a physician but had also studied astronomy in Germany, and one of his teachers was Paul Wittich (1546–1586), in later years an assistant of Tycho. When Craig returned to Scotland he contacted his friend John Napier and told him about this wonderful invention of prostaphaeresis, the Artificium Tychonicum, as Johannes Kepler once called it.
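Werner's formula is easy to replay on a computer. The following sketch (the function name is mine) multiplies two numbers in [0, 1] exactly as in the worked example, with arccos/cos calls standing in for the old table look-ups:

```python
import math

def prosthaphaeretic_product(x, y):
    """Multiply x, y in [0, 1] via Werner's formula (7.4):
    cos(a)cos(b) = (cos(a + b) + cos(a - b)) / 2."""
    a, b = math.acos(x), math.acos(y)          # formerly: table look-ups
    return (math.cos(a + b) + math.cos(a - b)) / 2

# 17 * 258 = 10^5 * 0.17 * 0.258
print(round(1e5 * prosthaphaeretic_product(0.17, 0.258)))  # 4386
```

Note that apart from the two cosine "look-ups" only one addition, one subtraction and a halving are needed, which is precisely what made the method so attractive.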
7.1.4 John Napier and Henry Briggs
That John Craig was a friend of John Napier is one of those remarkable incidences in the history of science which often gave the impetus to new and important developments. John Napier (1550–1617), Laird of Merchiston, was an affluent Scottish laird, mainly busy administering his estates. Science was his leisure-time activity. He concentrated on various disciplines like theology and mathematics. Thus he was not a professional mathematician but a very talented amateur. Computational mathematics was a topic he was most interested in. For instance, he invented the Napier Rods, wooden tablets which could be used to multiply numbers and even calculate square roots. Indeed, in 1614 Napier wrote: I have always endeavoured according to my strength and the measure of my ability to do away with the difficulty and tediousness of calculations, the irksomeness of which is wont to deter very many from the study of mathematics. At the time when Craig informed him about the new method of prostaphaeresis he had already been thinking for some years about ways to reduce multiplication to addition and division to subtraction. Craig's message prompted him to increase his efforts. Interestingly, he did not develop prostaphaeresis further, as we might expect; instead he based his approach on an idea which actually goes back to Archimedes of Syracuse (287?–212? BC). It is the idea of correspondence between arithmetic and geometric progressions. Let me give an example of such a correspondence: let n = 0, 1, 2, . . . denote an arithmetic progression; it is simply the sequence of non-negative integers. To each term in this sequence assign an = 2^n, a geometric progression with initial value 1 and common ratio 2. For n = 0, 1, 2, . . . , 10 the correspondence is conveniently given in the following table:

n    0  1  2  3  4   5   6   7    8    9    10
an   1  2  4  8  16  32  64  128  256  512  1024
Now let us call n, the numbers in the first row, the logarithms of an, the numbers in the second row. We can easily check by example that this overly simplistic table is indeed a table of logarithms satisfying the basic log-rules (7.1). For instance, to multiply 8 by 128 we would use our table this way: log(8 · 128) = log(8) + log(128) = 3 + 7 = 10, but 10 = log(1024), thus 8 · 128 = 1024. Also division is easy with our table. Suppose we want to calculate 512/64; then by applying the division rule (7.1) and table look-up: log(512/64) = log(512) − log(64) = 9 − 6 = 3, but 3 = log(8), therefore 512/64 = 8. What about a square root, √256, say? Easy again: log(√256) = log(256^(1/2)) = (1/2) log(256) = (1/2) · 8 = 4. From our table we obtain 4 = log(16), which implies √256 = 16. Thus all log-rules apply. That's fine.
But now a problem appears. Suppose we want to calculate √128 by means of our table. Proceeding as before, we find log(√128) = 3.5, but our table has no entry an for n = 3.5. Also, the table doesn't have entries n, i.e. logs, for the integers between 8 and 16, nor for those between 16 and 32, etc. There are gaps in the table! And these gaps become progressively larger. Thus in this simple layout our log-table turns out to be not very useful; it is not sufficiently dense. Napier was certainly aware of this problem and constructed his tables using a geometric progression in which the terms an are very close together, thus much denser. His solution was so simple that the world wondered why no one had thought of it before, as Pierce (1977) remarks. In modern notation the arithmetic and geometric progressions Napier used were:

n    0  1            2              . . .  m              . . .
an   v  v(1 − 1/v)   v(1 − 1/v)^2   . . .  v(1 − 1/v)^m   . . .

The number v was chosen by Napier to be v = 10^7. This choice was mainly inspired by the major applications Napier had in mind, numerical calculations with values of trigonometric functions. That's also the reason why Napier called the an sines.
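The toy table of powers of two, including its gaps, can be replayed in a few lines (a sketch under my own naming):

```python
# Toy log table: the correspondence n <-> 2**n from the table above
table = {2 ** n: n for n in range(11)}      # number -> its 'logarithm'
antilog = {n: 2 ** n for n in range(11)}    # 'logarithm' -> number

assert antilog[table[8] + table[128]] == 1024   # 8 * 128 via 3 + 7 = 10
assert antilog[table[512] - table[64]] == 8     # 512 / 64 via 9 - 6 = 3
assert antilog[table[256] // 2] == 16           # sqrt(256) via 8 / 2 = 4
# sqrt(128) would need the entry 3.5 -- no such key: the table has gaps
assert 3.5 not in antilog
```

The failing look-up at 3.5 is exactly the density problem Napier had to solve.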
His choice of the ratio 1 − 1/v was a rather clever one as it results in a very slowly decreasing sequence. Indeed, we find:

0.9999999^0 · 10^7 = 10 000 000
0.9999999^1 · 10^7 = 9 999 999
0.9999999^2 · 10^7 = 9 999 998.0000001
0.9999999^3 · 10^7 = 9 999 997.0000003
. . .
0.9999999^α · 10^7 = A
This is essentially Napier’s First Table. We call the exponent α in the last line above the Napier Logarithm of A and denote it by LN(A) = α. Thus we define (as Napier did) the logarithm of A by
(1 − 1/v)^α · v = A  ⇔  α = LN(A),   where v = 10^7.   (7.5)
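Definition (7.5) can be solved for α, giving LN(A) = ln(A/v) / ln(1 − 1/v); this makes it easy to explore Napier's logarithm numerically (a sketch, function name mine):

```python
import math

V = 10 ** 7

def napier_log(A):
    """Napier's logarithm from (7.5): the alpha with (1 - 1/v)**alpha * v = A,
    i.e. alpha = ln(A/v) / ln(1 - 1/v)."""
    return math.log(A / V) / math.log(1 - 1 / V)

print(napier_log(V))   # LN(10^7) = 0
print(napier_log(1))   # a huge number with integer part 161 180 948, cf. (7.6)
```

Note how the roles are inverted compared to common logarithms: LN decreases as A increases, and it is LN(10^7), not LN(1), that vanishes.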
Napier himself introduced the term logarithm as a synthesis of the Greek words logos and arithmos meaning ratio and number, respectively. The calculation of the numbers in this table is greatly facilitated by the fact that computation can be performed recursively and only subtractions are necessary. To see this, observe that
an = (1 − 1/v)^n v = (1 − 1/v)^(n−1) v (1 − 1/v) = an−1 (1 − 1/v) = an−1 − 10^(−7) an−1,
but 10^(−7) an−1 is merely a shift of the decimal point. Still, the amount of computational work mastered by Napier was really impressive. Later Napier constructed two more tables based on the first table to cover a broader range of values. It is important to observe that the Napier logs do not satisfy the standard rules (7.1). In particular, there is no base β in this system (as, e.g., the Eulerian number e is the base of the natural logarithms). Moreover, LN(1) ≠ 0. Indeed, it is a huge number which can be shown to be (see Section 2):

LN(1) ≐ 161 180 948.53537 38070   (7.6)

Here and in the sequel the symbol ≐ means that the right-hand number is given correctly in all displayed decimal places. In (7.6) therefore LN(1) is correct to 10 decimal places. As a result the all-important multiplication rule actually is:
LN(A · B) = LN(A) + LN(B) − LN(1),   (7.7)

and similar adaptations are necessary for the division and power rules. So Napier's system of logarithms is a system which one has to get used to. But once one has acquired some fluency with its rules, practical calculations using his tables could be performed rather routinely. After almost twenty years of incredibly hard work John Napier published his tables in 1614 in a book entitled Mirifici Logarithmorum Canonis Descriptio. In 1619 a second book, Mirifici Logarithmorum Canonis Constructio, was published posthumously which gives a description of the method he had used to calculate his tables. Napier's publications were almost immediately accepted and appreciated by the scientific community of that time. It is only legitimate to say that Napier's logarithms represented a major breakthrough in computational mathematics. Henry Briggs (1561–1630) came across Napier's 1614 Canon almost immediately after its publication. At that time he was professor of geometry at
Gresham College, London. He began to read it with interest, but by the time he had finished, his interest had turned into enthusiasm. The book was his constant companion: he carried it with him when he went abroad; he conversed about it with his friends; and he expounded it to his students who attended his lectures, as Thompson and Pearson (1925) report. Briggs decided to leave London for Scotland to visit Napier. When he arrived at Napier's house Briggs addressed him full of deep admiration (Cajori, 1913a): My Lord, I have undertaken this long journey purposely to see your person, and to know by what engine of wit or ingenuity you came first to think of this most excellent help in astronomy viz., the logarithms. Briggs didn't come empty-handed. Indeed, he had a lot of ideas and suggestions to improve the wonderful invention. He remained there as Napier's guest for about a month, and during that time, in many fruitful discussions and conversations, the concept of the common logarithm was born, i.e. logarithms with base 10. This idea improved substantially on Napier's own first construction, because now the system had a base, so that:
log10(1) = 0 and log10(10) = 1. As a result, the somewhat clumsy rules for Naperian logs where simplified considerably. For instance the essential multiplication property becomes
As a result, the somewhat clumsy rules for Naperian logs were simplified considerably. For instance the essential multiplication property becomes

log10(α · β) = log10(α) + log10(β),

in contrast to (7.7), because now log10(1) = 0. In this new system of common logarithms all our standard rules (7.1) hold. Back in London, Briggs immediately started calculating the new logarithms. He presented his results to Napier in 1616 on the occasion of a second visit to Scotland. Soon after that meeting Napier died. In 1619 Briggs moved ahead in his career and became the first Savilian Professor of Geometry at the University of Oxford. During the following five years he carried on his computational work, and in 1624 he published his famous book Arithmetica Logarithmica, which contained the common logarithms of 30 000 numbers, the values given with an accuracy of 14 (!) decimal places. This book also has an excellent introduction to the new methods and techniques Briggs had to develop to perform his incredibly messy computations. Soon afterwards, in 1628, based on Briggs' work, the Dutchman Adriaan Vlacq (1600–1667)5 published tables of common logarithms of the numbers 1 to 100 000 with an accuracy of 10 decimal places. By 1630, when Briggs died, logarithms had been widely accepted and disseminated all over Europe as a most marvelous tool for numerical computations in fields as diverse as physics, engineering, astronomy and especially nautical navigation. The great French astronomer and mathematician Pierre-Simon Laplace (1749–1827) put it succinctly when he asserted that Napier and Briggs, by shortening the labours (of calculation), doubled the life of the astronomer. So it is not an exaggeration to say: the work of Napier, Briggs and their successors on logarithms has given rise to an almost unparalleled success story!
5Vlacq had two professions: he was a surveyor and also a quite successful book publisher.
7.2 Where to go from here
This very short introduction to the history of logarithms is meant to be a starting point for a deeper study of this interesting subject. If you have read other topics in this book you may remember that usually at this place I present various suggestions divided into mandatory and optional, i.e., suggestions which you should or may take care of in your thesis. As this topic is a rather special one, I have divided my suggestions into those which emphasize the historical perspective and those which are of a more technical flavor. In designing your thesis you may:
• put your emphasis on history, or
• concentrate on some technical issues, e.g., how to calculate logarithms numerically, or how to define logarithms for negative and complex numbers, or
• as a third possibility, try to find a way to bring both aspects together in a fine and interesting way.
So, make up your mind and read on. And, of course, bear in mind: your own ideas are always welcome!
7.2.1 Historical Issues
1. Decimal fractions and mathematical notation
The time around the end of the 16th century and the beginning of the 17th century was a transitional period in the history of mathematics. The foundations of many wonderful discoveries, notably the invention of the differential and integral calculus, were laid at that time. Two important innovations are directly connected to the works of Napier and Briggs: (a) the development of the modern exponential notation and (b) the propagation of decimal fractions. Indeed, Napier seems to be the first to have used the decimal point systematically. You will find interesting material about these issues at various places in Boyer and Merzbach (2011); also Cajori (1913b) is a valuable source.
2. Properties of Naperian Logarithms
Give a careful exposition of the mathematical properties of Napier’s logarithms. I recommend that you start with the definition (7.5) and write LN(A) in terms of natural logs. In this way you can readily determine the numerical value (7.6) of LN(1) and formulate analogues to our common rules (7.1). You will find that by a proper scaling the rules for multiplication, division and powers are not so different from (7.1). Furthermore, it would be fine if you could elaborate on a geometric device used by Napier to define his logs. The papers of Ayoub (1993) and Panagiotou (2011) will be very helpful in this context.
3. Other people working on logarithms at Napier’s time
Raymond Ayoub (1993) writes that the invention of the logarithms in 1614 is one of those rare parthenogenic events in the history of science: there seemed to be no visible developments which foreshadowed its creation. But still, it is true that several contemporaries of Napier and Briggs worked on very similar concepts, to mention just Jost Bürgi and John Speidell. Others, like Johannes Kepler, worked on their own systems of logarithms, inspired by the works of Napier and Briggs. Give a brief account of the approaches these people pursued.
4. The natural logarithm
Recall the original idea of John Napier: construction of logarithms based on a correspondence between a geometric and an arithmetic progression, where the geometric progression should be sufficiently dense. In 1647 the Belgian Jesuit Gregory of St. Vincent published a study about an interesting property of the algebraic curve xy = 1, which is a hyperbola, see the figure given below.
[Figure: the hyperbola y = 1/x; the points A, B, C, D lie on the curve above the points K, L, M, N on the x-axis, with O the origin.]
He observed and proved by geometric arguments: If the line segments OK, OL, OM, ON form a geometric progression, thus if |OK| = α > 1:
|OL| = α^2,  |OM| = α^3,  |ON| = α^4, . . .
Then the areas
(ABLK), (BCML), (CDNM) are all equal. But this in turn means that the areas
(ABLK), (ACMK), (ADNK)
form an arithmetic progression! Thus we have again a correspondence between an arithmetic and a geometric progression. However, such a correspondence is the basic principle of any logarithmic system. But now, in this special case, logarithms have a very natural meaning, as they represent areas of certain geometric figures. Still, the question remains: is this observation helpful at all? It converts a difficult concept into another difficult concept, namely solving a quadrature problem, i.e., finding the area below a hyperbolic curve. Today we know that such quadratures can be solved by means of integral calculus. In 1668 Nicolaus Mercator (1620–1687) developed an entirely new approach to the aforementioned quadrature problem, thereby finding a way to determine the values of natural logarithms, as he called them. He considered the algebraic curve (x + 1)y = 1, equivalent to y = 1/(x + 1). By long division he obtained the non-terminating series

1/(1 + x) = 1 − x + x^2 − x^3 + . . .   (7.8)

At Mercator's time it was already known that for integers n ≠ −1
∫ x^n dx = x^(n+1)/(n + 1) + C.

Then he integrated (7.8) term by term to obtain:
ln(1 + x) = x − x^2/2 + x^3/3 − x^4/4 + . . .   (7.9)

Using this series he evaluated ln(1.1) approximately by setting x = 0.1 in (7.9). Truncating the series after a few terms he arrived at
ln(1.1) ≈ 0.1 − 0.1^2/2 + 0.1^3/3 − 0.1^4/4 + 0.1^5/5 = 0.095310,

which is correct to 6 decimal places. Of course, this interesting story does not end here, nor is it complete. You are invited to fill the gaps in this short exposition and find out more. For instance:
• How could Mercator know that the antiderivative of x^n equals x^(n+1)/(n + 1)?
• Is it always possible (in the sense of allowed) to integrate an infinite series term by term?
• Will the Mercator series work for all values of x?
And truly interesting from a historical point of view: this idea of series integration played a very important role when differential and integral calculus were invented; in particular it figured prominently in the Leibniz–Newton calculus controversy, a most famous dispute over priority in the development of calculus. I suggest that you consult the papers of Panagiotou (2011) and Burn (2001), but see also Chapter 5 of Sonar (2016).
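Mercator's computation is easy to reproduce (a sketch; the function name is mine):

```python
def mercator_ln1p(x, terms):
    """Partial sum of Mercator's series (7.9): x - x^2/2 + x^3/3 - ..."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

print(f"{mercator_ln1p(0.1, 5):.6f}")   # 0.095310, correct to six places
```

Increasing `terms` while moving x towards 1 is a quick way to experience for yourself how slowly the series converges near the edge of its domain.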
7.2.2 Technical Issues
5. How did Briggs calculate his tables?
This is a really interesting question! You should bear in mind that at the time of Napier and Briggs several important mathematical concepts were not yet known. The idea of a function was not yet available, let alone exponential and logarithmic functions and their relation to each other. And it took another fifty years until differential and integral calculus were invented.
So, Briggs had to perform his extensive calculations without the help of powerful mathematical tools. This lack of mathematical machinery he compensated with extraordinary diligence and ingenuity. Regarding ingenuity, Briggs not only anticipated Newton's binomial theorem (discovered around 1664), he also invented the calculus of finite differences, a class of powerful and fascinating methods to deal with sequences of numbers. You will find the work of Denis Roegel (2010) to be very helpful.
6. How are logarithms calculated today?
The development of high speed computers during the last decades has rendered tables of logarithms essentially obsolete. So why bother about methods to calculate logarithms numerically?
Well, simply because logarithms are ubiquitous. There is a vast number of formulas in practically all areas of mathematics and its applications which contain logarithms. Or, think about the rather elementary task of solving an exponential equation. Also, when dealing with very large numbers, so large that computers run into trouble, logarithms come to our help. A striking example is the factorial function n! = 1 · 2 · 3 ··· n which grows extremely fast. But by taking logs, numbers can be kept at a manageable size, since the logarithmic function grows rather slowly.6 For instance, if n = 100, then ln(100!) ≈ 363.74, but 100! ≈ 10^158, which makes a big difference.
But how do computers calculate logarithms? One approach is to use power series expansions. Several useful series are known, the oldest being the Mercator series (7.9). The latter, however, is convergent only when −1 < x ≤ 1, and for values x which are close to 1 in absolute value, (7.9) and other series of this type converge very slowly and become practically useless. Find other power series which are suited for the calculation of logarithms! You will find out that there are a lot of them. The Taylor-Maclaurin Theorem will be very helpful. So, you should be able to use this tool to find other series.
6Sometimes also iterated logs are used, e.g. in number theory or probability. These are functions of the form ln ln(x) or even ln ln ln(x). The famous number theorist Carl Pomerance (1944– ) once humorously remarked: ln ln ln(x) goes off to infinity with x but has never been observed to do so.
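One such series, easily derived from (7.9) by forming ln(1 + x) − ln(1 − x), is ln((1 + x)/(1 − x)) = 2(x + x^3/3 + x^5/5 + ...). The Python sketch below (function names are my own) shows why it is preferred: for ln(2), the Mercator series at x = 1 is still off in the fourth digit after a thousand terms, while the new series at x = 1/3 is essentially exact after ten terms.

```python
import math

def mercator(x, n):
    """Truncated Mercator series (7.9) for ln(1 + x)."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, n + 1))

def artanh_series(x, n):
    """Truncated series for ln((1+x)/(1-x)) = 2(x + x^3/3 + x^5/5 + ...)."""
    return 2 * sum(x ** (2 * k + 1) / (2 * k + 1) for k in range(n))

target = math.log(2)
# Mercator at x = 1: 1000 terms, error still about 5e-4
print(abs(mercator(1.0, 1000) - target))
# the same logarithm via the odd-powers series at x = 1/3: 10 terms, error below 1e-10
print(abs(artanh_series(1 / 3, 10) - target))
```

The substitution (1 + x)/(1 − x) = 2 gives x = 1/3, well inside the region of fast convergence.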
Discuss also convergence issues of the series you suggest. In this context, it is good to know about the following trick: any real number x > 0 can be written as x = 2^m · y, where m is an integer and 1/√2 < y ≤ √2. So,

    ln(x) = m ln(2) + ln(y).

Setting y = 1 + z, by the above bounds you always have |z| ≤ √2 − 1 < 1.
An alternative to series are rational approximations, i.e. fractions of polynomial expressions. A striking example is this one, which is also known as Shank's approximation:

    ln((1 + x)/(1 − x)) ≈ 2x(15 − 4x^2)/(15 − 9x^2)    (7.10)

To approximate ln(a) just put

    (1 + x)/(1 − x) = a  =⇒  x = (a − 1)/(a + 1)

For instance, to approximate ln(2) = 0.69314718..., set x = 1/3 and obtain ln(2) ≈ 0.6931216931217, which is correct to four decimal places.
Approximations like (7.10) often (though not always) have their origin in a continued fraction. One well-known continued fraction for the natural logarithm is

    ln((1 + x)/(1 − x)) = 2x / (1 − x^2/(3 − 4x^2/(5 − 9x^2/(7 − 16x^2/(9 − 25x^2/(11 − ...))))))    (7.11)

To make use of expressions like (7.11) we terminate the continued fraction early, after the first, the second, etc. partial denominator, by dropping subsequent terms. In this way we obtain a series of approximants which become successively more accurate:

    ln((1 + x)/(1 − x)) ≈ 2x/1 = 2x                                         (1st approximant)
                        ≈ 2x/(1 − x^2/3) = 6x/(3 − x^2)                      (2nd approximant)
                        ≈ 2x/(1 − x^2/(3 − 4x^2/5)) = 2x(15 − 4x^2)/(15 − 9x^2)  (3rd approximant)

etc.
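The approximants of (7.11) can be evaluated mechanically, bottom-up. The sketch below (my own helper, using exact rational arithmetic) truncates after a chosen number of partial denominators and confirms that the third approximant is exactly Shank's formula (7.10):

```python
from fractions import Fraction

def cf_approximant(x, depth):
    """Evaluate the continued fraction (7.11) for ln((1+x)/(1-x)),
    truncated after `depth` partial denominators."""
    # the j-th level has numerator (j*x)^2 over partial denominator 2j+1;
    # build the fraction from the bottom up
    value = Fraction(0)
    for k in range(depth - 1, 0, -1):
        value = Fraction(k * k) * x * x / (Fraction(2 * k + 1) - value)
    return 2 * x / (1 - value)

x = Fraction(1, 3)                 # corresponds to a = 2, i.e. ln(2)
third = cf_approximant(x, 3)
shanks = 2 * x * (15 - 4 * x * x) / (15 - 9 * x * x)
print(third == shanks)             # True: 3rd approximant equals (7.10)
print(round(float(third), 10))     # 0.6931216931
```

Increasing `depth` by one improves the result by roughly a factor of 30 at x = 1/3, so a handful of levels already beats any pocket calculator display.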
As you can see, the 3rd approximant already equals Shank's approximation. You are invited to give these approximations a try: just take your pocket calculator and check various values, e.g., ln(2), ln(10), etc.
This was just one spectacular example; several other continued fractions for logs are known. The theory of continued fractions is a really fascinating area of mathematical analysis, but frankly speaking, it is also rather difficult. If you want to learn more about them you may consult Jones and Thron (1980). In Chapter 6 of this book you can find (7.11) as a special case of a more general result.
There are many other powerful approaches to calculate logs. An extremely efficient algorithm is based on the concept of the arithmetic-geometric mean; you may have a look at the paper of Carlson (1972).
7. Logarithms of negative and imaginary numbers
This suggestion is interesting both historically and from a technical point of view. You should have some basic knowledge of complex numbers, as presented in typical textbooks for undergraduates.
By the end of the 17th century logarithms, common (base 10) or natural, were well established as an invaluable tool for computational mathematics. In an expression like ln(x) or log10(x), x was always considered a positive number. It was also known that the logarithmic function and the exponential function are inverses of each other. Since y = 10^x > 0 for all x and because x = log10(y), nobody felt the need to consider logarithms of negative numbers. Still, at the beginning of the 18th century a remarkable debate started about the question what sense should be given to the expression log(−a) for a positive real number a. Here log denotes the logarithm to any base. In a remarkable correspondence in the years 1712–1713 Gottfried W. Leibniz and Johann Bernoulli (1667–1748) discussed this problem. Unfortunately these letters were not published before 1745.
But why this discussion? Cajori (1913b) gives an explanation. In the 18th century there was the tendency to take rules derived only for a special case of a mathematical concept and apply them to more general cases. This tendency became more and more pronounced and was called the principle of the permanence of formal laws. So by this principle, or simply guided by scientific curiosity, the question of logs of negative numbers became more and more interesting.
The controversy between Leibniz and Bernoulli did not result in a satisfactory answer. On the contrary, quite disturbing contradictions were found when extending the concept of a logarithm to negative numbers. Indeed, negative
numbers themselves were not generally accepted at that time. For instance, the renowned French mathematician Blaise Pascal (1623–1662) regarded the subtraction of 4 from 0 as pure nonsense (Kline, 1980, pp 114-116)! It seemed to many people simply inconceivable that there exist numbers less than nothing.
One of the disturbing arguments used by Bernoulli was: it must be true that ln(x) = ln(−x) for all x ≠ 0, because

    f(x) = ln(x)   =⇒  f'(x) = 1/x, and
    g(x) = ln(−x)  =⇒  g'(x) = (−1)/(−x) = 1/x  (by the chain rule)    (A)

Note that we immediately run into trouble at this point, because by our standard rules (7.1) we should have:
    ln(−x) = ln[(−1)x] = ln(−1) + ln(x),

but this would imply that ln(−1) = 0, which Bernoulli knew could not be true. Leibniz objected that logarithms of negative numbers must be imaginary.
When the correspondence between Leibniz and Bernoulli was finally published, it acted as a tremendous stimulus on Leonhard Euler (1707–1783), who was Bernoulli's student. In two epoch-making papers of 1747 and 1749 Euler carefully worked out this problem and found that its solution lies at an unexpected place: all numbers except zero have an infinity of logarithms. His proof is based on one of the most remarkable formulas in mathematical analysis, Euler's formula, as it is called today:

    e^(ix) = cos(x) + i sin(x),    (7.12)

where i denotes the imaginary unit, defined^7 by i^2 = −1. Since the sine and cosine functions are periodic with period 2π, it follows that
ln(cos(x) + i sin(x)) = i(x ± 2nπ), n ∈ N (7.13) Setting x = π, we obtain cos(π) + i sin(π) = −1 and
    ln(−1) = ±πi, ±3πi, ...,

and none of these values is zero. That ln(x) is multivalued causes the problem that ln(x) is no longer a function in the strict sense, therefore some additional restrictions are necessary. This leads to the concept of the principal value of the logarithm, which is defined such that the argument φ or angle of a complex number z = |z|e^(iφ) is restricted to the interval −π < φ ≤ π. As a consequence our standard rules (7.1) do not always hold. For instance, it is not generally
7 The symbol i was introduced by Euler himself, although in the aforementioned papers of 1747 and 1749 he mostly writes √−1 instead of i.
true that ln(a · b) = ln(a) + ln(b). A counterexample, directly following from (7.12) and (7.13), is this one:

    ln(−i) = ln((−1) · i) = ln(−1) + ln(i) = πi + πi/2 = 3πi/2,

but

    ln(−i) = −πi/2 ≠ 3πi/2,

which follows from (7.13). This is a dangerous trap when you are performing numerical calculations with complex logarithms, as computer software generally uses principal values.
If you want to elaborate on these interesting aspects of logarithms then you should definitely read the original papers of Euler (see the annotated bibliography in Section 3 below). The historical controversy between Leibniz and Bernoulli is discussed in Cajori (1913b), where you can find a synopsis of this famous correspondence. Cajori (1913c) is devoted to Euler's contributions and includes a synopsis of the correspondence on this subject between Euler and D'Alembert from April 15, 1747 to September 28, 1748.
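The principal-value trap described above is easy to reproduce; Python's standard cmath module, for instance, always returns the principal value (this is just one concrete software example):

```python
import cmath

# cmath.log returns the principal value ln|z| + i*arg(z), with arg in (-pi, pi]
print(cmath.log(-1))                 # about 3.141592653589793j, i.e. pi*i
print(cmath.log(-1j))                # about -1.5707963267948966j, i.e. -pi*i/2

# the rule ln(a*b) = ln(a) + ln(b) fails for principal values:
lhs = cmath.log((-1) * 1j)           # ln(-i) = -pi*i/2
rhs = cmath.log(-1) + cmath.log(1j)  # pi*i + pi*i/2 = 3*pi*i/2
print(cmath.isclose(lhs, rhs))       # False
```

The two results differ by 2πi, exactly the period of the complex exponential.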
7.3 An Annotated Bibliography
There is an enormous amount of literature on the history of logarithms. In several books on the history of mathematics in general you will find detailed accounts of John Napier, Henry Briggs and their time, including developments like decimal fractions and the invention of modern mathematical notation. In the sequel I want to draw your attention to a few books which I found very interesting.
Boyer and Merzbach (2011) is a very readable and rather complete textbook on the history of mathematics. This is also true of Struik (2008), a book first published in 1948. Also recommended is Wussing (2008). Sonar (2016) is devoted primarily to the famous Newton-Leibniz Controversy, but it also sheds some light on other developments in mathematics during the 17th century. For instance the discovery of the Mercator series is described in detail.
Papers on logarithms and their history have been published continually since the 19th century, often when there is an anniversary. Florian Cajori is the author of an outstanding series of papers. In Cajori (1913a) you find an account of the work of Napier and Briggs; Cajori (1913b) and Cajori (1913c) are devoted to the early discussions of logarithms of negative and imaginary values. In Cajori (1913d) you find an exposition of the developments up to 1800, which is interesting insofar as Euler's 1747 paper was not published before 1862. Logarithms viewed as complex functions and the idea of a principal value are presented in Cajori (1913e) and Cajori (1913f).
Also, you should read the fine overviews due to Panagiotou (2011) and Burn (2001), which cover in detail the invention of hyperbolic or natural logarithms, a
development which foreshadowed the revolutionary discoveries of the differential and integral calculus.
Also recommended is the profound study by Denis Roegel (2010). It is quite voluminous as it contains a reconstruction of Briggs' tables. But on the first 34 pages you find a detailed elaboration of the techniques used by Briggs to calculate common logarithms.
Raymond Ayoub's (1993) paper is a rather complete and very readable exposition of the mathematics of Naperian logarithms.
And last but not least, please read the excellent papers by Leonhard Euler: Euler (1747) and Euler (1749). For both papers English translations of the original French text (thanks to Stacy Langton and Todd Doucet) are available from the Euler Archive (http://eulerarchive.maa.org/).
7.4 References
[1] Raymond Ayoub. "What is a Naperian Logarithm?" In: The American Mathematical Monthly 100.4 (1993), pp. 351–364.
[2] Carl B. Boyer and Uta C. Merzbach. A History of Mathematics. John Wiley & Sons, 2011.
[3] R. P. Burn. "Alphonse Antonio de Sarasa and Logarithms". In: Historia Mathematica 28 (2001), pp. 1–17.
[4] Florian Cajori. "History of the Exponential and Logarithmic Concepts". In: The American Mathematical Monthly 20.1 (1913), pp. 5–14.
[5] Florian Cajori. "History of the Exponential and Logarithmic Concepts". In: The American Mathematical Monthly 20.2 (1913), pp. 35–47.
[6] Florian Cajori. "History of the Exponential and Logarithmic Concepts". In: The American Mathematical Monthly 20.3 (1913), pp. 75–84.
[7] Florian Cajori. "History of the Exponential and Logarithmic Concepts". In: The American Mathematical Monthly 20.4 (1913), pp. 107–117.
[8] Florian Cajori. "History of the Exponential and Logarithmic Concepts". In: The American Mathematical Monthly 20.5 (1913), pp. 148–151.
[9] Florian Cajori. "History of the Exponential and Logarithmic Concepts". In: The American Mathematical Monthly 20.7 (1913), pp. 205–210.
[10] B. C. Carlson. "An Algorithm for Computing Logarithms and Arctangents". In: Mathematics of Computation 26.118 (1972), pp. 543–549.
[11] Leonhard Euler. De la controverse entre Mrs. Leibniz et Bernoulli sur les logarithmes des nombres negatifs et imaginaires. 1747. url: http://eulerarchive.maa.org/docs/translations/E168en.pdf.
[12] Leonhard Euler. Sur les logarithmes des nombres negativs et imaginaires. 1749. url: http://eulerarchive.maa.org/docs/translations/E807en.pdf.
[13] W. B. Jones and W. J. Thron. Continued Fractions - Analytic Theory and Applications. Reading, MA, USA: Addison-Wesley, 1980.
[14] Morris Kline. Mathematics - The Loss of Certainty. Oxford University Press, 1980.
[15] E. N. Panagiotou. "Using History to Teach Mathematics: The Case of Logarithms". In: Science & Education 20 (2011), pp. 1–35.
[16] R. C. Pierce. "A Brief History of Logarithms". In: The Two-Year College Mathematics Journal 8 (1977), pp. 22–26.
[17] Denis Roegel. A reconstruction of the tables of Briggs' Arithmetica logarithmica (1624). 2010. url: http://locomat.loria.fr/briggs1624/briggs1624doc.pdf.
[18] Thomas Sonar. Die Geschichte des Prioritätenstreits zwischen Leibniz und Newton. Springer Spektrum, 2016.
[19] Dirk J. Struik. A Concise History of Mathematics. 4th ed. Dover Publications, 2008.
[20] A. J. Thompson and Karl Pearson. "Henry Briggs and His Work on Logarithms". In: The American Mathematical Monthly 32.3 (1925), pp. 129–131.
[21] Hans Wussing. 6000 Jahre Mathematik - Eine kulturgeschichtliche Zeitreise. Vol. 1, Von den Anfängen bis Leibniz und Newton. Springer Verlag, 2008.
Topic 8
Exercise Number One Partition Theory
Keywords: partitions of integers, the Money Changing Problem, combinatorics, generating functions
This chapter has not been finished yet. October 4, 2018
8.1 An Invitation
8.1.1 Exercise number one
This topic should introduce you to one of the most fascinating areas of discrete mathematics: the theory of integer partitions. To begin with, let us have a look at this famous problem:
1. Auf wieviel Arten läßt sich ein Franken in Kleingeld umwechseln? Als Kleingeld kommen (in der Schweiz) in Betracht: 1-, 2-, 5-, 10-, 20- und 50-Rappenstücke (1 Franken = 100 Rappen).
(In how many ways can one Franken be changed into small coins? The coins available in Switzerland are 1-, 2-, 5-, 10-, 20- and 50-Rappen pieces; 1 Franken = 100 Rappen.)
Can you find the answer? This is Exercise 1 in Aufgaben und Lehrsätze aus der Analysis I by George Pólya and Gábor Szegő, one of the classical textbooks in mathematics, its first edition published in 1925. The problem is also known as the Money Changing Problem and was discussed and solved by Leonhard Euler already in the 18th century.
8.1.2 Partitions of integers
Technically speaking, solving Exercise 1 requires counting a special class of integer partitions.
A partition of a positive integer n is a sequence of integers λ = (λ1, λ2, . . . , λk), such that
λ1 ≥ λ2 ≥ ... ≥ λk > 0 (A)
George Pólya (1887–1985) and Gábor Szegő (1895–1985)
and
λ1 + λ2 + ... + λk = n.
The numbers λi are called the parts of λ. Symbolically one writes λ ⊢ n to express the fact that n is split into parts given by λ. The partition function p(n) counts the number of partitions of n. For instance, λ = (2, 1, 1, 1) ⊢ 5, because 5 = 2 + 1 + 1 + 1. This λ is only one out of seven partitions of 5, indeed, we have
5 = 5 = 4 + 1 = 3 + 2 = 3 + 1 + 1 = 2 + 2 + 1 = 2 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1, thus p(5) = 7. Note, that due to condition (A) the order of summands is not taken into account, i.e. the partitions λ1 = (3, 2) and λ2 = (2, 3) are considered to be the same object. The partition function p(n) grows very fast. Indeed, it can be shown that
p(10) = 42, p(20) = 627, p(50) = 204226, p(100) = 190569292, p(200) = 3972999029388, etc.
Is there a formula for p(n)? Yes, and this is one of the most exciting results of 20th-century mathematics, the celebrated Hardy-Ramanujan-Rademacher Formula. Unfortunately, this formula is extremely complicated, but it yields a simple approximation:

    p(n) ∼ (1/(4n√3)) exp(π √(2n/3))   for n → ∞
In 1929 J. E. Littlewood wrote a fascinating review of the Collected Papers of Srinivasa Ramanujan, which you should read in order to get an impression of the ingenuity of S. Ramanujan; see the references below.
Are there other ways to calculate p(n) exactly? Find out! Actually, there are several alternatives to the Hardy-Ramanujan-Rademacher Formula.
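One elementary alternative is a dynamic program that admits parts of size 1, 2, ..., N one size at a time. The Python sketch below (my own code, not the Rademacher formula) reproduces the values quoted above and shows how good the simple asymptotic already is at n = 100:

```python
import math

def partition_counts(N):
    """Return the list p(0), ..., p(N), adding one part size at a time."""
    p = [1] + [0] * N
    for part in range(1, N + 1):
        for n in range(part, N + 1):
            p[n] += p[n - part]   # extend partitions by one more part of this size
    return p

p = partition_counts(200)
print(p[10], p[20], p[50], p[100], p[200])
# 42 627 204226 190569292 3972999029388

# leading-order Hardy-Ramanujan asymptotic at n = 100
hr = math.exp(math.pi * math.sqrt(2 * 100 / 3)) / (4 * 100 * math.sqrt(3))
print(hr / p[100])   # about 1.05: within roughly 5 percent already
```

The double loop costs O(N^2) additions, so even N in the thousands is no problem with exact integer arithmetic.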
8.1.3 Partitions with restricted parts
Very often one is interested in partitions of n such that the parts λi satisfy certain conditions. For instance, we may consider partitions with all parts being odd integers. Taking n = 5, then the partitions of 5 into odd parts are:
5 = 5 = 3 + 1 + 1 = 1 + 1 + 1 + 1 + 1, and
p(5|all parts odd) = 3    (B)
We may also consider partitions such that all parts are different. For n = 5 we find:
5 = 5 = 4 + 1 = 3 + 2, and
p(5|all parts different) = 3    (C)
Interestingly, (B) and (C) yield the same value for n = 5. A mere coincidence, or is there a general rule? Indeed, Euler already discovered and proved that
p(n|all parts odd) = p(n|all parts different)    (D)

for all n ≥ 0. This is one, presumably the most famous, of many known so-called partition identities.
How to prove (D)? This can be done in several ways. You should discuss at least one of them. A simple and very powerful device is a graphical representation of a partition called a Young diagram. It consists of rows of boxes, like
[Young diagram with rows of 8, 6, 2, 2 and 1 boxes]

which represents the partition (8, 6, 2, 2, 1) ⊢ 19. A very similar representation is the Ferrers graph, which uses circles instead of boxes. Several partition identities are directly deducible from these graphical devices. Here is an example of a partition identity which is easily proved using Young diagrams: The number of partitions of n with largest part equal to k equals the number of partitions of n with exactly k parts. Prove it!
The idea of partitions into odd parts can be generalized. Let A be a finite or infinite subset of the set of positive integers N. For instance, A = {1, 3, 5, ...}, the set of odd numbers. Then a very interesting problem is to find the number of partitions λ ⊢ n such that all parts are elements of A, in other words, what is
p(n|all λi ∈ A) ? (E) Observe, when
A = {1, 2, 5, 10, 20, 50},

then (E) is the answer to the Money Changing Problem of Exercise 1. But beware! Consider a country with only 9, 17, 31 and 1000 ¤ bills. How many ways are there to change a 1000 ¤ bill? It is not at all clear that there is even one possibility to give such change! In fact, the problem of determining whether even one solution exists is known to be very hard; indeed, it is NP-hard.
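The same dynamic program works with a restricted part set A. The sketch below (the helper name is my own) computes the quantity (E): it counts the Swiss change of Exercise 1 and confirms that the strange 9/17/31 currency can change a 1000 bill after all (for instance 96 × 9 + 8 × 17 = 1000):

```python
def count_partitions_with_parts(n, A):
    """Quantity (E): number of partitions of n with all parts from A."""
    ways = [1] + [0] * n
    for part in sorted(A):
        for m in range(part, n + 1):
            ways[m] += ways[m - part]
    return ways[n]

# Exercise 1: change 1 Franken = 100 Rappen into 1-, 2-, 5-, 10-, 20-, 50-Rappen coins
print(count_partitions_with_parts(100, {1, 2, 5, 10, 20, 50}))

# the strange currency: is there at least one way to change a 1000 bill?
print(count_partitions_with_parts(1000, {9, 17, 31}) > 0)   # True
```

Of course, this only decides the instance by brute counting; the NP-hardness statement concerns the general decision problem, not a fixed small currency.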
8.1.4 Generating functions
The most important tool in the study of partitions are generating functions. They may be viewed as some sort of clotheslines on which we hang up numbers we want to display, e.g. the partition numbers p(n). Technically, generating functions are power series in some variable z, e.g. the generating function P (z) of the partition numbers p(n) is
P(z) = p(0) + p(1)z + p(2)z^2 + p(3)z^3 + ... = 1 + z + 2z^2 + 3z^3 + 5z^4 + 7z^5 + 11z^6 + ...
It is remarkable that this function can be written as an infinite product:
    P(z) = 1/(1 − z) · 1/(1 − z^2) · 1/(1 − z^3) ··· = ∏_{i=1}^{∞} 1/(1 − z^i)    (F)

This is a really fundamental relation discovered by Leonhard Euler. Two of the most important points about generating functions are:
• If we have a closed expression for the generating function, like (F), then we may be able to extract the coefficient of z^n in P(z) by various techniques. Indeed, all information about the counting sequence p(n) is contained in P(z)!
• Furthermore, having understood the genesis of (F), you will be able to find generating functions for various restricted partition numbers. As an example, the generating function of the number of partitions with all parts different can be shown to be
    P(z|all parts different) = (1 + z)(1 + z^2)(1 + z^3) ··· = ∏_{i=1}^{∞} (1 + z^i)
There are many other examples. Particularly interesting is the generating function of the number of partitions with parts taken from some finite or infinite set A. Recall the Money Changing Problem!
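Both products can be tried out directly. The sketch below (my own code) multiplies out truncated versions of the two products, the one for distinct parts and the one for odd parts, and checks coefficient by coefficient up to z^30 that they agree, which is exactly Euler's identity (D):

```python
def poly_mul(a, b, N):
    """Multiply coefficient lists a and b, truncating beyond z^N."""
    out = [0] * (N + 1)
    for i, ai in enumerate(a[:N + 1]):
        if ai:
            for j, bj in enumerate(b[:N + 1 - i]):
                out[i + j] += ai * bj
    return out

N = 30
# prod (1 + z^i) for i = 1..N: partitions into distinct parts
distinct = [1] + [0] * N
for i in range(1, N + 1):
    distinct = poly_mul(distinct, [1] + [0] * (i - 1) + [1], N)

# prod 1/(1 - z^i) over odd i: partitions into odd parts
odd = [1] + [0] * N
for i in range(1, N + 1, 2):
    geometric = [1 if k % i == 0 else 0 for k in range(N + 1)]  # truncated 1/(1 - z^i)
    odd = poly_mul(odd, geometric, N)

print(distinct[5], odd[5])   # 3 3, the values from (B) and (C)
print(distinct == odd)       # True: Euler's identity (D), verified up to z^30
```

This is of course only a finite check, not a proof; but it shows how generating functions turn a combinatorial identity into an identity of power series.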
8.2 Where to go from here
8.2.1 Issues of general interest
• Prepare a careful though interesting introduction to partition theory.
• Discuss some partition identities and prove them. There are two major techniques of proof: the method of bijections and generating functions; compare these methods.
• Solve the Money Changing Problem as presented in Exercise 1 of Pólya and Szegő.
• Partition theory has many applications, e.g., in computer science but also in statistics. A prominent application is the Wilcoxon rank sum test, or equivalently Mann-Whitney's U-statistic. Show how this famous 2-sample test is related to partitions.
8.2.2 Some more suggestions • Describe and implement an algorithm to generate systematically all par- titions of n, of course n must not be too large. You may do this in R, octave/matlab, whatever you want. How to handle restricted partitions? It may be interesting also to experiment with some symbolic compution software like Mathematica or Maple. As far as I know, Mathematica is accesible to you via a WU-campus license.
8.3 An Annotated Bibliography
The literature on integer partitions is enormous. Here are a few important resources.
• You should start your reading with Andrews and Eriksson (2004). This booklet, available as a paperback, is an excellently written introductory textbook.
• George Andrews is the grand seigneur of partition theory. Andrews (2003) is a classical text written for professional mathematicians. It is a reprint of the 1976 edition, which was originally published as part of the Encyclopedia of Mathematics. However, chapters 1 and 2 are easy to read and so it may be profitable for you to have a look at these. Chapter 5 gives a thorough derivation and proof of the Hardy-Ramanujan-Rademacher Formula. Chapter 14 is particularly useful if you want to develop algorithms for systematically counting various types of partitions.
• Sedgewick and Flajolet (2009) is a wonderful textbook. Partitions are not treated systematically in this book, but there is a wealth of material on partitions spread over the book. The first chapters introduce the so-called symbolic method, an extremely powerful and elegant technique to find generating functions.
• The famous book by Pólya and Szegő has also been translated into English. Exercises 1-27 are related to partition problems. You may wonder why the Money Changing Problem, which is obviously of combinatorial nature, has found its place in a book on analysis?
• Regarding Wilf (1990a): the title says it all! A free pdf-version of this book is available, see Wilf (1990b).
More is still missing . . .
8.4 References
[1] G. E. Andrews. The Theory of Partitions. Cambridge University Press, 2003.
[2] G. E. Andrews and K. Eriksson. Integer Partitions. Cambridge University Press, 2004.
[3] J. E. Littlewood. "Collected Papers of Srinivasa Ramanujan". In: Mathematical Gazette 14 (1929), pp. 427–428.
[4] Robert Sedgewick and Philippe Flajolet. Analytic Combinatorics. Cambridge University Press, 2009.
[5] H. S. Wilf. Generatingfunctionology. Academic Press, 1990.
[6] H. S. Wilf. Generatingfunctionology. 1990. url: http://www.math.upenn.edu/~wilf/gfology2.pdf.
Topic 9
The Ubiquitous Binomial Coefficient
Keywords: discrete mathematics, binomial theorem, binomial identities, summation of series
9.1 An Invitation
9.1.1 The classical binomial theorem
From elementary mathematics you are certainly familiar with the classical binomial coefficient

    C(n, k) = n(n − 1)(n − 2) ··· (n − k + 1) / k!,    (9.1)

where n and k are nonnegative integers and k! denotes the well-known factorial function k! = k · (k − 1) ··· 2 · 1. The symbol C(n, k) is usually read as "n choose k"; n is called the upper index, k the lower index. It is very likely that you are also aware of

    C(n, k) = n! / (k!(n − k)!),    (9.2)

a representation of binomial coefficients which is very common, though from a computational point of view not the best we can have. Binomial coefficients as we have defined them so far are always nonnegative integers. This is by no means clear a priori if you look at (9.1) or (9.2).
The name binomial coefficient stems from the fact that these numbers occur as coefficients in the expansion of the binomial (1 + z)^n into ascending powers of z, viz:

    (1 + z)^n = C(n, 0) + C(n, 1)z + C(n, 2)z^2 + ... + C(n, n − 1)z^(n−1) + C(n, n)z^n    (9.3)

This formula is known as the (classical) Binomial Theorem, and the binomial function f(z) = (1 + z)^n is also called the generating function of the binomial coefficients, a very important concept in mathematics. You should check that

    C(n, k) = 0   for all k > n,
and therefore the series (9.3) is always terminating; indeed, it is a polynomial of degree n in z. Using formula (9.1), which is due to Blaise Pascal (1623–1662), we find successively:

    (1 + z)   = 1 + z
    (1 + z)^2 = 1 + 2z + z^2
    (1 + z)^3 = 1 + 3z + 3z^2 + z^3
    (1 + z)^4 = 1 + 4z + 6z^2 + 4z^3 + z^4
    ...

The first three expansions have been known already in ancient times, e.g. they were known to Euclid (around 300 BC) and Diophantus (215–299?).
Pascal's formula can easily be found using a simple combinatorial argument. Just rewrite:

    (1 + z)^n = (1 + z)(1 + z) ··· (1 + z),   (n factors)    (9.4)

and now find out how in this n-fold product the term z^k is composed. You should work out this argument in precise terms in your thesis and thereby show that C(n, k) equals the number of ways to form subsets of size k out of a ground set having n elements.
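The computational remark about (9.1) versus (9.2) is easy to illustrate. In the sketch below (my own helper), the product form needs only k multiplications, returns 0 automatically for k > n, and agrees with the factorial form wherever both apply:

```python
from math import comb, factorial

def binom(n, k):
    """(9.1): n(n-1)...(n-k+1) / k!  For k > n the numerator contains the
    factor (n - n) = 0, so the result is 0 without any special case."""
    num = 1
    for j in range(k):
        num *= n - j
    # exact: a product of k consecutive integers is divisible by k!
    return num // factorial(k)

# agreement of (9.1) with the factorial form (9.2) and the standard library
for n in range(8):
    for k in range(n + 1):
        assert binom(n, k) == factorial(n) // (factorial(k) * factorial(n - k)) == comb(n, k)

# (9.3): coefficients of (1 + z)^4 are Pascal's row 1 4 6 4 1
print([binom(4, k) for k in range(5)])   # [1, 4, 6, 4, 1]
print(binom(5, 7))                       # 0, since k > n
```

Note that (9.2) would force the computation of three potentially huge factorials, while (9.1) stops after k factors; this is the computational advantage hinted at in the text.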
9.1.2 Pascal’s triangle
The binomial coefficients can be arranged in a triangular array. The first lines of this array read as: