06-02552 Principles of Programming Languages (and “Extended”)
The University of Birmingham, Spring Semester 2018-19
School of Computer Science
© Uday Reddy 2018-19

Handout 0: Historical development of “programming” concepts

Digital computers were devised in the 1950s, and the activity of writing instructions for these computers came to be known as programming. However, “programming” in the broad sense of devising procedures, methods, recipes or rules for solving problems is much older, perhaps as old as humanity. As soon as human beings formed societies and started working together to carry out their daily activities, they needed to devise procedures for those activities and communicate them to each other, first in speech and then in writing. These ways of describing recipes are still in use in pretty much the same way, and we use them in our daily lives, e.g., in following cooking recipes, using knitting patterns, or operating household gadgets. They are also in use in industrial and business contexts, e.g., for describing manufacturing processes on the factory floor or business processes in organizations.

“Programming” in mathematics is of special interest to us because it directly gave birth to programming in computer science. Such programs are typically called algorithms, a term derived from the name of al-Khwarizmi, a Persian mathematician of around 800 A.D., whose book Al-jabr wa’l muqabalah on algebraic manipulations was highly influential in the development of modern mathematics. The devising of mathematical algorithms began in pre-historic times, as soon as numbers were invented to count and to measure: methods were then needed for the arithmetic operations of addition, subtraction, multiplication, division and others. All these algorithms were devised before civilization began, and we learn them in the early years of our schooling.

Babylonian algorithms The earliest written description of an algorithm we have available is that for division, more precisely for finding inverses 1/x, found on a clay tablet dating back to the Sumerian civilization of Babylonia in 2000 B.C. The first few stanzas of the tablet read as follows:1

[The number is 4;10. What is its inverse? Proceed as follows. Form the inverse of 10, you will find 6. Multiply 6 by 4, you will find 24. Add on 1, you will find 25. Form the inverse of 25], you will find 2;24. [Multiply 2;24 by 6], you will find 14;24. [The inverse is 14;24]. Such is the way to proceed. [The number is 8;20]. What is its inverse? [Proceed as follows. Form the inverse of 20], you will find 3. Mul[tiply 3 by 8], you will find 24. [Add on 1], you will find [25]. [Form the inverse of 25], you will find 2;24. [Multiply 2;24 by 3], you will find [7;12. The inverse is 7;12. Such is the way to proceed].

Recall that the Babylonians used base 60 for their number system (whereas we use base 10). So, 4 and 10 are two separate digits in the base-60 number system, representing the number 4 × 60 + 10. The inverse of the number is 14;24, i.e., 14 × 60⁻¹ + 24 × 60⁻². The details of the algorithm are not important for our purpose. But we should pay attention to the form of the algorithm. Notice the following facts:

• Whereas nowadays we would use a symbol like x to denote the input of the algorithm, the Babylonians in 2000 B.C. did not yet have symbolic notation. Instead, they described the algorithm on sample inputs, several of them in fact, to provide insight to the reader.

1. Jean-Luc Chabert et al., A History of Algorithms: From the Pebble to the Microchip, Springer, 1994.

• Each step of the algorithm is an elementary operation such as addition or multiplication, which is described separately. From our point of view, this is like an assembly language where each elementary operation forms a separate instruction for the machine.

• While the algorithm is described in an “imperative style,” i.e., expressed in terms of commands or instructions to the reader, the overall computation is functional. Each step of the algorithm acts on some inputs and produces some outputs, which then form the inputs for the succeeding steps. Mathematical algorithms are still described in this style, albeit intermixed with algebraic expressions.
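As a modern aside, the functional character of the tablet's recipe can be made concrete by computing the same reciprocal in Haskell (the language used later in these notes). This is our own sketch, not anything Babylonian: we compute the base-60 digits of the reciprocal of 4;10 (which is 4 × 60 + 10 = 250 in decimal), scaled by 60 because the scribes worked with a "floating" sexagesimal point.

```haskell
import Data.Ratio ((%))

-- First k base-60 fractional digits of a rational r in [0, 1).
toBase60 :: Int -> Rational -> [Integer]
toBase60 0 _ = []
toBase60 k r = d : toBase60 (k - 1) (r' - fromIntegral d)
  where r' = r * 60
        d  = floor r'

-- The reciprocal of 4;10 (= 250), scaled by 60 to place the
-- sexagesimal point as the scribes did.  The digits come out
-- as 14 and 24, matching the tablet's answer 14;24.
inverseDigits :: [Integer]
inverseDigits = toBase60 2 (60 % 250)
```

Each step consumes the previous step's output, exactly in the "you will find" style of the tablet.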

Euclid’s constructions Euclid, a Greek mathematician from Alexandria around 300 B.C., wrote a treatise on mathematics called The Elements. It was divided into 13 “books”, with books 1-4 devoted to plane geometry, books 5-10 to ratios and proportions, and books 11-13 to spatial geometry. Being founded in geometry, Euclid’s Elements is full of geometric constructions that can be carried out with a ruler and compass. Such “constructions” are once again algorithms, imperative in style. But the imperative style has more significance in geometry than in algebra because, as the steps of a construction are carried out on paper or tablet, the figures on the paper are modified, with additional lines or arcs being drawn. Thus Euclid’s imperative constructions involve change of state of the figures on the paper. They do not merely produce “outputs.”

Euclid’s constructions rarely ask one to erase the lines or arcs that have been drawn. However, many constructions do produce intermediate lines or arcs, which do not form part of the net result of the construction. All of these intermediate lines need to be erased at the end of the construction, a process similar to garbage collection in programming languages.

Every construction of Euclid came with a specification, a precise statement of the problem being solved, and a proof of correctness to the effect that the construction solves the stated problem. Euclid was thus the founder of the rigorous method of mathematical proof, which has since become the standard in mathematics. He might also be regarded as the founder of programming theory with rigorous specifications and proofs of correctness. But this latter aspect of his work has not yet been widely recognized.2

Diophantus and algebra Diophantus, another Greek mathematician from Alexandria, in the third century A.D., took the decisive step of introducing algebraic notation for sequences of computation steps. Whereas the Babylonians wrote “multiply 6 by 4, you will find 24,” Diophantus introduced the short-hand notation familiar today: 6 ∗ 4. The advantage of the Diophantine notation is that one can string together multiple operations, all in a single expression. Thus sequential algorithms could now be written as compact expressions or formulas. For example, a formula for a root of a quadratic equation is written as (−b + √(b² − 4ac)) / 2a.

The development of computer programming languages paralleled the Diophantine invention when scientists at IBM, led by John Backus, developed Fortran (short for “formula translation”). A computer program called a compiler was devised to translate Diophantine-style algebraic formulas into sequential algorithms in the machine language.

Algebraic notation provides sequencing via function composition. When we write an expression such as g(f(x)), we are saying, “apply the function f to the value x obtaining an output, and then apply the function g to that output obtaining the final result.” Thus, the algebraic notation gives us a compact way of describing a long series of computation steps. However, sequencing alone is not enough to express all algorithms. One needs at least two other ingredients: conditional evaluation and repetition.

Conditional evaluation is expressed in standard mathematics as definition by cases. For example:

    max(x, y) = x, if x ≥ y
                y, if x < y

Similar notation is used in Haskell for definition by cases:

    max x y | x >= y    = x
            | otherwise = y

One also uses a simpler if-then-else form for conditional expressions:

    max(x, y) = if x ≥ y then x else y

The introduction of repetition is however a much larger invention.
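As an illustration of "stringing together multiple operations in a single expression", here is the quadratic-root formula written in Haskell. This is our own sketch: each sub-expression plays the role of one Babylonian-style step, but the whole computation is a single formula.

```haskell
-- A root of a*x^2 + b*x + c, written as one Diophantine-style
-- expression: multiply, subtract, take a square root, add, divide,
-- all strung together by function composition and nesting.
root :: Double -> Double -> Double -> Double
root a b c = (-b + sqrt (b * b - 4 * a * c)) / (2 * a)
```

For instance, `root 1 (-3) 2` computes a root of x² − 3x + 2, yielding 2.0.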

2. I am indebted to Professor Tony Hoare for exhibiting these connections.

Peano, Gödel and primitive recursion Mathematicians have long used the principle called mathematical induction to prove statements parameterized by natural numbers. The so-called “natural numbers” are simply the series of non-negative integers 0, 1, 2, .... The principle of mathematical induction is the following rule:

A property P(n) is true for all natural numbers n provided:

• Base case: P(0), i.e., P is true for 0, and
• Step case: if P(x) is true for any natural number x, then P(x + 1) is true.

We can also express the principle as a proof rule as follows:

    P(0)        ∀x. P(x) ⇒ P(x + 1)
    -------------------------------
              ∀n. P(n)

The key idea of the rule is that, in showing that P is true for x + 1, we can assume that it is already true for x. This assumption is referred to as the inductive hypothesis. The use of such an inductive hypothesis is the key to proving properties indexed by an infinite collection of values such as the natural numbers. For example, it is straightforward using mathematical induction to prove that the sum of the first n natural numbers is given by a simple formula:

    P(n) :  1 + 2 + ··· + n = n(n + 1)/2

We leave the proof as an exercise. The essence of mathematical induction is problem reduction. The problem of the truth of P(x + 1) is reduced to that of the truth of P(x), which I will display as:

    P(x + 1)  ⇝  P(x)

This means: if you know how to prove P(x), then there is a way to prove P(x + 1). In this way the bigger problem of the truth of P(x + 1) is “reduced” to the smaller problem of the truth of P(x). (Note that the direction of reduction, a problem-solving idea, is the reverse of the direction of implication P(x) ⇒ P(x + 1), the logical idea.)

How does problem reduction achieve anything, since we still have the smaller problem of P(x), which remains to be proved? That is where the idea of “induction” comes in. We can repeat the reduction as many times as needed, for example:

    P(3)  ⇝  P(2)  ⇝  P(1)  ⇝  P(0)

The truth of P(3) is reduced to that of P(2), the truth of P(2) to that of P(1), and the truth of P(1) to that of P(0). The truth of P(0) is however covered by the base case, which has already been proved. Thus we can conclude that P(3) is true. Similarly, we can conclude that P(n) is true for every natural number n. Note that we don’t need to enumerate all the steps P(3), P(2), P(1) and P(0). The mathematical induction proof gives a general recipe. We assume that everybody knows that the recipe can be repeated as many times as needed.

The mathematician Giuseppe Peano (Italy, 1858-1932) proposed a system of axioms for natural numbers, in which the principle of mathematical induction was a key axiom. His axiomatization is a pretty good one. Mathematical induction captures a key idea of what it means for something to be a natural number. Kurt Gödel (Austria and USA, 1906-1978) set out to investigate whether Peano’s axioms were complete, i.e., whether every mathematical fact about natural numbers can be shown to follow from Peano’s axioms. Surprisingly, he proved that they were incomplete, and even more surprisingly, that it is impossible to produce a complete axiom system for natural numbers. This is the famous Gödel’s incompleteness theorem.
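(For readers who want a hint at the exercise: the step case amounts to the one-line calculation

    1 + 2 + ··· + x + (x + 1)  =  x(x + 1)/2 + (x + 1)  =  (x + 1)(x + 2)/2,

where the first equality uses the inductive hypothesis and the second is ordinary algebra; the result is exactly the formula for n = x + 1.)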
In the process of proving his incompleteness theorem, Gödel noticed that one could define functions of natural numbers using a scheme similar to that of Peano. This method is now known as primitive recursion. To define a function f(n) for all natural numbers n, we need to:

• Base case: define f(0).

• Step case: define f(x + 1) as a function of f(x).

This involves the same problem reduction idea as in mathematical induction. We can reduce the problem of defining f(x + 1) to the problem of f(x).

    f(x + 1)  ⇝  f(x)

Then, for any particular natural number n, this kind of reduction can be done only a finite number of times, e.g.,

    f(3)  ⇝  f(2)  ⇝  f(1)  ⇝  f(0)

In words, we can reduce the problem of defining f(3) to that of defining f(2), that of defining f(2) to defining f(1), and that of defining f(1) to defining f(0). But the definition of f(0) is covered by the base case. So, we can claim that f(3) is defined. Similarly, we can claim that f(n) is defined for every natural number n. This is the same idea as in mathematical induction. There we were interested in proving; here, in defining. But, from a problem-solving perspective, it is the same idea.

Here is an example definition using primitive recursion, for the multiplication of natural numbers:

    a ∗ 0 = 0
    a ∗ (x + 1) = a ∗ x + a

Both equations are straightforward mathematical facts that we know about multiplication. However, when we put them together in this primitive recursive fashion, they allow one to compute a ∗ n for any natural number n. For example, we can compute:

a ∗ 3 = a ∗ 2 + a = a ∗ 1 + a + a = a ∗ 0 + a + a + a = 0 + a + a + a
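The two equations translate almost verbatim into Haskell. Here is a sketch (we recurse with n − 1 on the second argument, since Haskell patterns cannot be written literally as x + 1):

```haskell
-- Primitive recursion on the second argument, following the two
-- equations:  a * 0 = 0  and  a * (x + 1) = a * x + a.
mult :: Integer -> Integer -> Integer
mult _ 0 = 0                    -- base case
mult a n = mult a (n - 1) + a   -- step case: here n plays the role of x + 1
```

For negative second arguments this recursion never reaches the base case; the scheme promises nothing outside the natural numbers.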

Notice how this works from a problem-solving perspective:

    a ∗ 3  ⇝  a ∗ 2  ⇝  a ∗ 1  ⇝  a ∗ 0

In each step, we reduce an instance of a ∗ (x + 1) to an instance of a ∗ x. When we reach the base case, no more reduction is needed.

The same idea of primitive recursion can be used for all kinds of inductively defined data structures, such as lists and trees. For defining functions of lists, the primitive recursion scheme is:

• Base case: define f([ ]).

• Step case: define f(a : x) as a function of f(x).

For example, here is the Haskell definition of the list appending operation (denoted ++):

    []    ++ y = y
    (a:x) ++ y = a : (x ++ y)

Haskell calculates the appending of specific lists using the same problem reduction process:

    [a1, a2, a3] ++ [b]
      = a1 : ([a2, a3] ++ [b])
      = a1 : a2 : ([a3] ++ [b])
      = a1 : a2 : a3 : ([] ++ [b])
      = a1 : a2 : a3 : [b]

Once again, notice the problem reduction at work:

    [a1, a2, a3] ++ [b]  ⇝  [a2, a3] ++ [b]  ⇝  [a3] ++ [b]  ⇝  [] ++ [b]

Finally, let us look at binary trees, defined using the following type definition:

    data Tree a = EMPTY | NODE (Tree a, a, Tree a)

The corresponding primitive recursion scheme is:

• Base case: define f(EMPTY).

• Step case: define f(NODE(l, v, r)) as a function of f(l) and f(r).

For example, consider the definition of the height of a tree:

    height(EMPTY) = 0
    height(NODE(l, v, r)) = 1 + max(height(l), height(r))

However, there is an important difference when we move to the case of binary trees. The problem here is a bit more sophisticated. Can you spot what it is?
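A runnable Haskell sketch (note that the type definition needs the keyword `data`) makes the situation concrete. The step case now makes two recursive calls, one per subtree, so the problem reductions form a tree of subproblems rather than a straight-line chain.

```haskell
-- Binary trees and their height, by primitive recursion on the tree.
data Tree a = EMPTY | NODE (Tree a, a, Tree a)

height :: Tree a -> Int
height EMPTY             = 0
height (NODE (l, _, r)) = 1 + max (height l) (height r)
                          -- two reductions in one step case
```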

General recursion Sequencing, conditional evaluation and primitive recursion are enough for describing the majority of useful algorithms. However, theoretically, they are not enough for describing all computable functions. To achieve completeness (“Turing completeness”), it is necessary to allow recursion to be “general.”

In primitive recursion, we reduce the problem of f(x + 1) to that of f(x), or, equivalently, reduce the problem of f(z) for any positive z to that of f(z − 1). One might wonder: why can’t we reduce the problem of f(z) to that of calculating f for some other smaller input, e.g., f(z − 2) or f(z/2)? It turns out that all such reductions have the same power as the Peano-Gödel reduction of f(z) to f(z − 1). We will discuss that issue later.

One might also wonder: can we use problem reduction to a larger input, say f(z + 1) or f(2 ∗ z)? That would not work in general. If we reduce the problem of calculating f(3) to that of f(4), and then that of f(4) to that of f(5), etc., the process of calculation would never terminate and we would get no result. However, we might choose some strategic input, say n, up to which the problem is “reduced” upwards and then reduced downwards again. We might follow a “zig-zag” path through the problem space. Such generality is rarely needed, but it is necessary for achieving theoretical completeness of the computational mechanisms.

General recursion allows one to write recursive definitions where we can reduce the problem of f(z) to f(z′) for any z′, smaller or larger. The termination of the evaluation is then not guaranteed. The functions definable by general recursion are called partial recursive functions, “partial” because the functions so defined may terminate for some inputs and go on interminably for other inputs. Most programming languages provide general recursion as the mechanism, not only for theoretical completeness but also because the Peano-Gödel reduction is not always convenient.
It is then the programmer’s responsibility to ensure that the definitions terminate.
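As a hypothetical illustration (our own, not from the text), here is a general-recursive Haskell definition whose reduction zig-zags through the problem space: the number of steps the well-known Collatz process takes to reach 1. Nobody has proved that it terminates for every input, so it is exactly the kind of definition whose termination the programmer must worry about.

```haskell
-- General recursion: on an odd n, the recursive call goes to the
-- *larger* argument 3*n + 1, so the reduction goes upwards as well
-- as downwards, and termination is an open mathematical question.
collatz :: Integer -> Integer
collatz 1 = 0
collatz n
  | even n    = 1 + collatz (n `div` 2)
  | otherwise = 1 + collatz (3 * n + 1)
```

For example, `collatz 6` follows the zig-zag path 6, 3, 10, 5, 16, 8, 4, 2, 1, taking 8 steps.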

Iterative algorithms If recursion was only proposed in the early 20th century, how did mathematicians describe iterative algorithms in earlier times? There are many such algorithms in mathematics, e.g., Newton’s method for finding roots of functions and the Gaussian elimination algorithm for solving systems of linear equations. All such algorithms were described in natural-language prose, typically along with their proofs of correctness. No systematic notations were employed. The majority of such algorithms were “functional” in spirit, but expressed in “imperative” style.

Alan Turing and the Turing machine In 1936, Alan Turing, then a young mathematician at the University of Cambridge, boldly decided to attack one of the deepest problems in mathematics at that time: whether it is possible to mechanically compute “yes” or “no” answers to all mathematical questions. Turing proved that this was not possible. To prove this fact, he needed to define a formal model of what is meant by mechanically computable. The model he defined is called the Turing machine, and it became the basis for the subsequent design of digital computers.

Turing’s machine model had an infinite tape on which the machine could remember data. The tape was divided into small cells, each of which could store a binary digit (0 or 1). Somewhat remarkably, Turing defined his machine in such a way that it could overwrite the cells on the tape with new data, erasing the values that might have been present in them from earlier calculations. Thus Turing machines were fundamentally endowed with a mechanism for change of state, a crucial feature of imperative computations.

Turing’s machine model is thus decidedly an “imperative” model. The running of the machine causes the old data on the tape to be erased and new data to be written. The state of the tape constantly changes. This is radically different from “functional” computation in the sense that the Turing machine does not simply calculate new data from old data, but is able to overwrite the old data, destroying it in the process. The machine thus evolves in time, and the state of the machine irreversibly changes during the computation.

Why Turing introduced the idea of “change” or “overwriting” of data is not clear. The Turing machine tape was meant to model the mathematician’s notebook where one records the results of calculations. Mathematicians did not generally overwrite data in doing calculations in their notebooks.
It is likely that Turing needed to give his machine as many capabilities as possible in order to convince the readers that he was capturing the entire idea of “mechanically computable.” Turing might have also been aware of the computing machines that were known in his time, such as those of Charles Babbage (1791–1871), and understood that physical machines would need to overwrite data in order to reuse their finite amount of storage. During the Second World War, Turing was involved in the British code-breaking project, designing a machine called the Bombe for doing the calculations involved in code-breaking.

Early computers The full-scale development of imperative programming took place only after digital computers were invented and the problem of programming them was attacked. Seminal work in developing the concepts of imperative programming was done by the mathematicians Goldstine and von Neumann, who essentially wrote the first programming manual for digital computers in 1947. The programming languages available at that

time were simply machine languages, which are composed of sequences of machine instructions, much like those of the Turing machine. The machine languages were a throwback to the old Babylonian methods of describing algorithms. But they also relied much more crucially on state change, working within the confines of small memories, and the programs were highly error-prone. Maurice Wilkes, an early computer scientist at the Cambridge Computer Laboratory, wrote:

“I well remember when this realization first came on me with full force. The EDSAC was on the top floor of the building and the tape-punching and editing equipment one floor below. [...] It was on one of my journeys between the EDSAC room and the punching equipment that, ‘hesitating at the angles of stairs,’ the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs.”

The reasons for such errors were manifold: the fact that the machines were extremely exacting, the fact that many low-level details had to be packed into programs, but also, more importantly, the fact that the programs were made to deal with a dynamically evolving state. Edsger Dijkstra wrote in 1968:

“. . . our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.”

This is an extremely deep observation that caused a revolution in scientific thinking about programming in general. First of all, Dijkstra is noting that our minds are better geared to understanding static relationships than dynamic processes. Do you agree with that? Secondly, he is saying that, if the correspondence between the static program and the dynamic execution process is as close as possible, then we can achieve better results in terms of correct programs.

This was a Eureka moment for Computer Science. Prior to that, people generally assumed that programming was “easy”: it is easy enough to describe processes of computation that we already know how to do by hand, so what is the big issue? However, this expectation did not come to fruition in practice. The programs written in the 1960s were extremely buggy, and major software projects such as the operating system for the IBM 360 (called “OS/360”) turned out to be disasters.

Imperative programming The issue of low-level detail was addressed by the development of high-level programming languages. The programming language Fortran introduced formulaic notation into programming languages, much like what Diophantus did for mathematics. However, the higher-level structure of the program code was still the same as that of machine languages, i.e., imperative in nature.

The programming language Algol 60, designed by an international committee of experts in Computer Science, introduced almost all the features of imperative programming as practised today: the distinction between commands and expressions, the hierarchical nesting of commands as well as expressions, the concept of local variables, the concept of procedures (the counterpart of “functions” for imperative programs), and the formal semantics of parameter passing for procedures. The notations and techniques devised in Algol 60 greatly reduced the conceptual gap between static program and dynamic execution that Dijkstra spoke about. These notations have been continuously reused in all imperative languages designed since then, such as Pascal, C, and Java.

Functional programming This style of programming, based entirely on Diophantine-style algebraic expressions and Gödel’s recursion, takes Dijkstra’s prescription to its extreme. There is in essence no dynamic execution process at all. All execution is carried out by replacing equals by equals, i.e., using static relationships that we can master easily. It was first put into practice in the programming language Lisp, designed by John McCarthy at MIT in 1960. In Britain, Peter Landin vociferously advocated functional programming, designing a language called ISWIM (an acronym for the funny expression “if you see what I mean”). The programming languages ML, Miranda and Haskell follow on the trail of ISWIM.
