
06-02552 Princ. of Progr. Languages (and “Extended”)
The University of Birmingham
Spring Semester 2019-20
School of Computer Science
© Uday Reddy 2019-20

Handout 10: Procedures and objects

The basic imperative language (see Handout 11) has only commands, variables and expressions. In this handout, we examine adding procedures and objects (actually, classes).

1 Algol-like languages

Procedures can be added to the basic imperative language using the typed lambda calculus. This was in fact done in the first systematically designed programming language, Algol 60, defined by an international committee of computer scientists.¹ In 1981, John Reynolds² proposed a systematic redesign of Algol based on the typed lambda calculus and called it Idealized Algol. We follow his approach.

1. Typed lambda calculus. Recall that the types of a typed lambda calculus are given by the syntax:

T ::= b | T1 → T2

where b ranges over basic types. For a functional programming language, we choose basic types such as Int, Bool etc. For an imperative programming language, we can choose basic types that represent state-based computations.

2. Lambda calculus for imperative programs. To obtain a typed imperative programming language, we pick the basic types to be those representing imperative programming concepts (cf. Handout 11). These are:

• Mutable variables, also called references, representing storage locations.

• Expressions that read the state of variables and return a value (“state readers”).

• Commands that alter the state of variables (“state transformers”).

Variables, expressions and commands are not treated as types in typical imperative programming languages. Rather, they are designated as separate syntactic categories. However, to obtain the full power of the typed lambda calculus, it is useful to regard them as types. The basic data types such as int, bool and char are not regarded as types of the lambda calculus. This is because there are no terms in imperative programming languages that directly denote data values (except constants). Rather, terms denote either mutable variables or expressions, each of which might deal with values of particular data types. Let δ stand for types such as int, bool, etc. Then the basic types of the lambda calculus for imperative programs are:

• var[δ], also written as ref[δ], for variables that store δ-typed data values.

• exp[δ], for expressions that return δ-typed data values.

• comm, for commands.

In summary, the types of our lambda calculus for imperative programs are as follows:

T ::= var[δ] | exp[δ] | comm | T1 → T2

3. Terminology: variables, references and identifiers. Note that the term “variable” in imperative programming refers to storage locations whose values can be modified. In contrast, lambda calculus as well as standard mathematics use the term “variable” for a completely different concept, viz., symbols used to stand for values.

¹ It is doubtful whether the committee members knew the lambda calculus fully, but they reinvented some of its ideas for themselves. Thus Algol had only part of the lambda calculus, not the full calculus. Peter Landin showed the correspondence between the two procedure mechanisms a few years later. See Landin, Peter. A correspondence between ALGOL 60 and Church’s Lambda-notations: Part II, Communications of the ACM, March 1965.

² Reynolds, John. The essence of Algol, in Algorithmic Languages, North-Holland, 1981.

To avoid conflict between the two uses, Algol 68 introduced the term “reference” for mutable variables in the sense of imperative programming. The terminology did not catch on within the imperative language culture (except in isolated usages like “call by reference”). However, it became standard in functional programming culture. So, we use both the terms “variable” and “reference” for this concept.

Algol 60 used the term “identifier” for what we call a variable in mathematics and the lambda calculus. So an “identifier” is a symbol, used for formal parameters of functions and for naming various things, like functions, types, classes etc. For example, in the term λx. x + y of type exp[int] → exp[int], the symbol x is a bound identifier and y is a free identifier.

4. Constants for imperative programs. All the primitive operations of imperative programs are modelled as constants in our typed lambda calculus. We group them into four classes, for ease of exposition:

• Primitive operations for expressions. All the constants and primitive operations needed for data values are expressed as constants that act on exp types. Some examples are:

0, 1, 2, ...  :: exp[int]
true, false   :: exp[bool]
+, -, ...     :: exp[int] → exp[int] → exp[int]
=, <, ...     :: exp[int] → exp[int] → exp[bool]
&&, ||        :: exp[bool] → exp[bool] → exp[bool]
not           :: exp[bool] → exp[bool]

The only surprising thing about these types is that they involve the exp type constructor. We need the exp type constructor because, in general, the arguments of operations such as + are “expressions” which can read the state of variables. The result of such an application, e.g., x + y, is again an expression that is state-dependent.

• Primitive operations that deal with commands. These are as follows:

skip  :: comm
(;)   :: comm → comm → comm
if    :: exp[bool] → comm → comm → comm

Note that (;) is an infix operator. For if, we write

if B then C1 else C2    for    if B C1 C2

• Primitive operations that deal with variables. These are as follows:

read  :: var[δ] → exp[δ]
(:=)  :: var[δ] → exp[δ] → comm

The assignment operator (:=) is an infix operator. Note that the type implies that, if V is a variable and E an expression, then V := E is a command. Its effect is to evaluate E and assign its value to V. The read operation is normally left implicit in typical Algol-like languages. But its type says that, if V is a variable, then read V is an expression. Its effect is to read the value of V and return it. Leaving the read operation implicit means that, whenever a variable is used in a position where an expression is expected, we automatically insert a read operator. For example, we write

x := x + 1

where the variable x on the right hand side is used where an expression is needed. So, we understand it as:

x := (read x) + 1

This convention is only used in what we traditionally call “imperative” languages. Functional languages like ML and Haskell make the read operation explicit.

• Finally, we have an operation for local variable declarations:

local[δ] :: (var[δ] → comm) → comm

The effect of local[δ] B is to create a new local variable for δ-typed values, say V, and then execute B(V). After B(V) finishes, the local variable is deallocated.

Since this form looks a little heavy in normal usage:

local[int](λx. C)

we use the simpler notation:

{local[int] x; C}

and understand it to have the same effect.
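To make the types of these constants concrete, here is a minimal sketch of one possible semantic model, written in Python. It is an assumption for illustration, not the handout's definition: a state is a dictionary from locations to values, an exp[δ] is a function from states to values, a comm is a function from states to states, and a var[δ] is simply a location. The helper names (const, plus, less, seq, if_, read, assign, local) are hypothetical, and local takes an explicit initial value because the handout does not say how new variables are initialised.

```python
from typing import Any, Callable

# Assumed model: a state maps locations to values.
State = dict
Exp = Callable[[State], Any]      # exp[δ]: reads the state, returns a value
Comm = Callable[[State], State]   # comm: maps a state to a new state
# var[δ] is represented by any hashable location.


def const(v: Any) -> Exp:                      # 0, 1, true, ... :: exp[δ]
    return lambda s: v


def plus(e1: Exp, e2: Exp) -> Exp:             # (+) :: exp[int] → exp[int] → exp[int]
    return lambda s: e1(s) + e2(s)


def less(e1: Exp, e2: Exp) -> Exp:             # (<) :: exp[int] → exp[int] → exp[bool]
    return lambda s: e1(s) < e2(s)


def skip(s: State) -> State:                   # skip :: comm
    return s


def seq(c1: Comm, c2: Comm) -> Comm:           # (;) :: comm → comm → comm
    return lambda s: c2(c1(s))


def if_(b: Exp, c1: Comm, c2: Comm) -> Comm:   # if :: exp[bool] → comm → comm → comm
    return lambda s: c1(s) if b(s) else c2(s)


def read(v: Any) -> Exp:                       # read :: var[δ] → exp[δ]
    return lambda s: s[v]


def assign(v: Any, e: Exp) -> Comm:            # (:=) :: var[δ] → exp[δ] → comm
    return lambda s: {**s, v: e(s)}


def local(init: Any, body: Callable[[Any], Comm]) -> Comm:
    # local[δ] :: (var[δ] → comm) → comm, with an assumed initial value
    def run(s: State) -> State:
        v = object()                           # a fresh location V
        final = dict(body(v)({**s, v: init}))  # execute B(V)
        del final[v]                           # deallocate V when B(V) finishes
        return final
    return run


# x := x + 1, with the implicit read made explicit: x := (read x) + 1
incr = assign("x", plus(read("x"), const(1)))
print(incr({"x": 41}))   # {'x': 42}

# {local[int] t; t := x; x := t + t}
double = local(0, lambda t: seq(assign(t, read("x")),
                                assign("x", plus(read(t), read(t)))))
print(double({"x": 5}))  # {'x': 10}
```

Since every command returns a new dictionary rather than mutating its argument, the model keeps the "state transformer" reading of comm literally.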

In summary, all the behaviour of imperative programs can be modelled using a few primitive functions in terms of the basic types var[δ], exp[δ] and comm.

5. Procedures. The procedures of Algol-like languages are mapped directly to the functions of the lambda calculus. For example, the Algol 60 procedure declaration:

procedure swap(var[int] x, var[int] y) { local[int] t; t := x; x := y; y := t }

is thought of as the definition of a function swap:

let swap = λx. λy. {local[int] t; t := read x; x := read y; y := read t}

The type of swap is:

swap : var[int] → var[int] → comm
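The reading of swap as a function from two variables to a command can be mimicked in Python, as a rough sketch under assumptions of our own: a hypothetical Var class plays the role of var[int], and a command is a zero-argument function that is run for its effect. Applying swap to two variables only builds a command; nothing happens until that command is executed.

```python
from typing import Callable

Comm = Callable[[], None]          # a command: run it for its effect on the store


class Var:
    """Hypothetical stand-in for var[int]: a mutable cell."""
    def __init__(self, value: int = 0) -> None:
        self.value = value


def swap(x: Var) -> Callable[[Var], Comm]:
    """swap :: var[int] → var[int] → comm, curried as λx. λy. ..."""
    def with_y(y: Var) -> Comm:
        def command() -> None:
            t = Var()              # local[int] t, discarded when command returns
            t.value = x.value      # t := read x
            x.value = y.value      # x := read y
            y.value = t.value      # y := read t
        return command
    return with_y


i, j = Var(1), Var(2)
c = swap(i)(j)          # applying swap yields a command; nothing has run yet
c()                     # executing the command
print(i.value, j.value) # 2 1
```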

In general, a procedure is always a function that has comm as its codomain type.³ A function that has an exp[δ] type as its codomain type is thought of as a “function procedure”, or sometimes just called a “function”. (But this is misleading because all procedures are functions.)

6. Semantics of procedure call. Prior to Algol 60, the meaning of a procedure call such as swap i j was understood operationally, in terms of the machine instructions that would be executed. That story might run as follows:

1. Push references to the variables i and j on the system stack.
2. Push the program counter on the stack, and jump to the code of swap.
3. When the code of swap finishes, pop the arguments i and j as well as the saved program counter from the system stack, and jump back to the saved program counter position.

The definition of Algol 60 put paid to such operational descriptions. The semantics of a procedure call, as given in the Algol 60 Report, is simply to copy the body of the procedure to where the procedure call appears, and replace the formal parameters by the arguments, like so:

. . .                  . . .
swap i j;      =⇒      {local[int] t; t := read i; i := read j; j := read t};
. . .                  . . .

This semantics came to be known as the Algol copy rule. We might also call it procedure unfolding. Note that the copy rule is precisely the β-reduction rule of the lambda calculus.
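The copy rule can be checked on a small example. The Python sketch below uses an illustrative model of our own (a hypothetical Var class and zero-argument commands) to define a procedure twice = λc. c; c with a command parameter, and compares the call twice (x := x + 1) with the text produced by the copy rule, x := x + 1; x := x + 1. Because the argument is passed as an unexecuted command, it runs once for each occurrence of c in the body, just as β-reduction predicts.

```python
from typing import Callable

Comm = Callable[[], None]          # a command: run it for its effect


class Var:
    """Hypothetical stand-in for var[int]: a mutable cell."""
    def __init__(self, value: int = 0) -> None:
        self.value = value


def incr(x: Var) -> Comm:
    """The command x := (read x) + 1."""
    def command() -> None:
        x.value = x.value + 1
    return command


def twice(c: Comm) -> Comm:
    """twice = λc. c; c"""
    def body() -> None:
        c()
        c()
    return body


a = Var(0)
twice(incr(a))()          # the procedure call: twice (a := a + 1)

b = Var(0)
incr(b)()                 # the unfolded body: b := b + 1; b := b + 1
incr(b)()

print(a.value, b.value)   # 2 2
```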

³ It is not useful to think of this as a function “returning” a command. The idea of “returning” results only makes sense in purely functional languages. Here it is better to think of swap as “mapping” two variables x and y to a command.

2 Objects and classes

Whereas in functional programming data abstraction is achieved by abstracting over a type, in imperative programming it is more common to abstract over storage. The resulting abstractions are called objects. The behaviour of objects is defined via classes. Historically, objects and classes were introduced in the language Simula 67.⁴ They were popularised by C++ in the 1980s and have been widely adopted since then.

An object encapsulates some amount of storage, represented by mutable variables or other objects, and provides operations that can be used by clients. The clients cannot access the internal storage of the object directly. Rather, they manipulate the object via the exported operations. (Note that the idea is quite similar to abstract data types, where clients cannot access the representation of the abstract type, but manipulate it via the exported operations.) Since we often need several objects of the same kind, what we define in a program is a class rather than an object. We can make instances of the class by using a declaration of the form

instance K a

where K is a class and a is a symbol used to denote the new instance being created. A class is defined using a construct of this form:

class :: T {
    . . . local variables and instances
    init C0;
    methods
    . . . method definitions
}

Here T is a type, called the method signature or the interface type, which declares the types of the methods; the command C0 is meant for initialising the local variables of the class; and the method definitions include a number of exported operations. We assume that, by default, all the local variables are “private,” i.e., hidden inside the class. (However, nothing stops us from exporting an entire variable as a “method” if we need to make it public. But this is rarely done.) As an example, we show a class defining point objects. First we define the signature type:

type PointSig = { set    :: exp[real] -> exp[real] -> comm,
                  xcoord :: exp[real],
                  ycoord :: exp[real] }

Every point object is expected to have a set method, using which we can set its x and y coordinates, and methods xcoord and ycoord of expression type, which read the internal variables of the point object and return real values. Next we define a class implementing the signature:

Point = class :: PointSig {
    local[real] x;
    local[real] y;
    init {x := 0; y := 0};
    methods
        set = λxnew. λynew. {x := xnew; y := ynew}
        xcoord = read x
        ycoord = read y
}
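One way to read the class construct is as a function that allocates fresh local variables, runs the init command, and returns a record of methods sharing those variables. The Python sketch below follows that reading for the Point class; the PointSig dataclass and the function name Point are illustrative choices, not part of the handout's language. Calling Point() plays the role of instance Point p, and x and y exist only inside the closure, so a client can reach them only through the methods. One simplification: Python passes the arguments of set as values, whereas the handout's set takes expressions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PointSig:
    """The interface type: a record of methods (an assumed encoding)."""
    set: Callable[[float, float], None]   # set :: exp[real] -> exp[real] -> comm
    xcoord: Callable[[], float]           # xcoord :: exp[real]
    ycoord: Callable[[], float]           # ycoord :: exp[real]


def Point() -> PointSig:
    # local[real] x; local[real] y; init {x := 0; y := 0}
    x = 0.0
    y = 0.0

    def set_(xnew: float, ynew: float) -> None:
        nonlocal x, y
        x, y = xnew, ynew                 # {x := xnew; y := ynew}

    # xcoord = read x and ycoord = read y: each reads the current value when called
    return PointSig(set=set_, xcoord=lambda: x, ycoord=lambda: y)


p = Point()                               # instance Point p
p.set(3.0, 4.0)
print(p.xcoord(), p.ycoord())             # 3.0 4.0
```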

⁴ Simula 67 was designed by Ole-Johan Dahl and Kristen Nygaard at the Norwegian Computing Centre in Oslo, both of whom received the ACM Turing Award in 2001. was involved in clarifying the early ideas of objects and classes.

We can define another class using the polar representation of points for the same signature type:

PolarPoint = class :: PointSig {
    local[real] r;
    local[real] theta;
    init {r := 0; theta := 0};
    methods
        set = λxnew. λynew. { r := sqrt(xnew*xnew + ynew*ynew);
                              theta := atan2(ynew, xnew) }
        xcoord = read r * cos(read theta)
        ycoord = read r * sin(read theta)
}

Just as in the case of abstract data types, client programs of the two point classes can use their instances interchangeably. All point objects are of type PointSig, and no difference will be observed between the instances of the two classes, modulo some rounding differences.

Class invariants. Similar to the idea of data type invariants in the case of abstract data types, we have a notion of class invariants, which arise from the fact that the clients of a class do not have access to the data representation. Whenever a method of an object is invoked, its state will satisfy the class invariant. For example, the objects of the PolarPoint class always satisfy the class invariant:

(read r) ≥ 0 ∧ −π < read theta ≤ π
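A companion sketch for PolarPoint, under the same illustrative encoding as before (a PointSig record of closures; the names are ours), checks two claims made above: the class invariant holds whenever set is entered and is re-established when it exits, and xcoord and ycoord return the coordinates that were set, up to rounding. The angle is computed with Python's math.atan2, which, unlike math.atan(y / x), puts the angle in the correct quadrant and does not fail when x is 0. Because math.atan2 can return −π when y is a negative zero, the sketch maps that case to π so that the invariant holds.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PointSig:
    set: Callable[[float, float], None]
    xcoord: Callable[[], float]
    ycoord: Callable[[], float]


def PolarPoint() -> PointSig:
    r = 0.0                                     # init {r := 0; theta := 0}
    theta = 0.0

    def invariant() -> bool:
        # (read r) >= 0 and -pi < read theta <= pi
        return r >= 0 and -math.pi < theta <= math.pi

    def set_(xnew: float, ynew: float) -> None:
        nonlocal r, theta
        assert invariant()                      # holds when the method is invoked
        r = math.hypot(xnew, ynew)              # sqrt(xnew*xnew + ynew*ynew)
        theta = math.atan2(ynew, xnew)          # result lies in [-pi, pi]
        if theta == -math.pi:                   # only when ynew is a negative zero
            theta = math.pi
        assert invariant()                      # re-established on exit

    return PointSig(set=set_,
                    xcoord=lambda: r * math.cos(theta),
                    ycoord=lambda: r * math.sin(theta))


q = PolarPoint()
for x, y in [(3.0, 4.0), (-3.0, 4.0), (-3.0, -4.0), (0.0, -2.0)]:
    q.set(x, y)
    # agrees with the Cartesian Point class up to rounding
    assert math.isclose(q.xcoord(), x, abs_tol=1e-9)
    assert math.isclose(q.ycoord(), y, abs_tol=1e-9)
```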

To illustrate class invariants using a more sophisticated example, we look at a class for sorted lists.

SortedList = class :: SortedListSig {
    local[list int] l;   -- invariant: l is sorted in increasing order
    init {l := []}
    methods
        insert :: exp[int] -> comm
        insert x = {l := insert’ x (read l)}

        delete :: exp[int] -> comm
        delete x = {l := delete’ x (read l)}

        member :: exp[int] -> exp[bool]
        member x = member’ x (read l)
}

We are reusing here the Haskell functions insert’, delete’ and member’ that we defined in the Data type invariants section of Handout 2. Recall that all these functions expect sorted lists as arguments, and that insert’ and delete’ return sorted lists as results. Therefore, the class invariant of the SortedList class is maintained: the variable l always contains a sorted list. The class invariant is an assertion, i.e., it is expressed using mutable variables and evaluates to true or false in individual states. The proof obligations for the class methods are:

true {init command} I
I {insert x} I
I {delete x} I
I =⇒ true

These are similar to the proof obligations for data type invariant for the Haskell SortedList module, except that they are now formulated for the imperative commands using assertions.
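The same discipline can be imitated in Python, as a sketch only. The functions insert_sorted, delete_sorted and member_sorted below are hypothetical stand-ins for insert’, delete’ and member’ from Handout 2, whose exact definitions are not repeated here (in particular, keeping duplicates is our assumption). The assertions check the proof obligations at run time: the invariant after init, and the invariant before and after insert and delete. Note that member_sorted stops searching early, which is correct only because the invariant guarantees that l is sorted. Python does not enforce privacy; the leading underscore on _l is only a convention.

```python
from typing import List


def insert_sorted(x: int, xs: List[int]) -> List[int]:
    """Stand-in for insert': put x at its place in a sorted list."""
    i = 0
    while i < len(xs) and xs[i] < x:
        i += 1
    return xs[:i] + [x] + xs[i:]


def delete_sorted(x: int, xs: List[int]) -> List[int]:
    """Stand-in for delete': remove one occurrence of x, if any."""
    if x in xs:
        i = xs.index(x)
        return xs[:i] + xs[i + 1:]
    return xs


def member_sorted(x: int, xs: List[int]) -> bool:
    """Stand-in for member': may stop early because xs is sorted."""
    for y in xs:
        if y >= x:
            return y == x
    return False


class SortedList:
    def __init__(self) -> None:
        self._l: List[int] = []              # init {l := []}
        assert self._invariant()             # true {init command} I

    def _invariant(self) -> bool:
        # I: l is sorted in increasing order
        return all(a <= b for a, b in zip(self._l, self._l[1:]))

    def insert(self, x: int) -> None:
        assert self._invariant()
        self._l = insert_sorted(x, self._l)
        assert self._invariant()             # I {insert x} I

    def delete(self, x: int) -> None:
        assert self._invariant()
        self._l = delete_sorted(x, self._l)
        assert self._invariant()             # I {delete x} I

    def member(self, x: int) -> bool:
        return member_sorted(x, self._l)     # relies on I; leaves the state unchanged
```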

Queue = class :: QueueSig {
    local[list int] f;   -- front part of the queue
    local[list int] r;   -- rear part of the queue

    init {f := []; r := []}

    methods
        insert :: exp[int] -> comm
        insert x = {r := x:r}

        delete :: comm
        delete = {if (f == []) then {f := reverse r; r := []}; f := tail f}

        front :: exp[int]
        front = if (f == []) then last r else head f
}

Figure 1: Queue class using a two-list representation
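For experimenting with Fig. 1, here is a direct Python transcription, offered as an illustrative sketch rather than an efficient implementation: the lists are rebuilt on every update to mirror the immutable lists of the handout. The front part f is kept in order, and the rear part r in reverse order with the newest element first, which is why front uses last r when f is empty. Raising IndexError on deletion from an empty queue is our own choice; in the handout, tail f is simply undefined there.

```python
from typing import List


class Queue:
    def __init__(self) -> None:
        self._f: List[int] = []      # front part of the queue, oldest first
        self._r: List[int] = []      # rear part of the queue, newest first

    def insert(self, x: int) -> None:
        self._r = [x] + self._r      # r := x:r

    def delete(self) -> None:
        if not self._f:              # if (f == []) then {f := reverse r; r := []}
            self._f = list(reversed(self._r))
            self._r = []
        if not self._f:
            raise IndexError("delete from an empty queue")
        self._f = self._f[1:]        # f := tail f

    def front(self) -> int:
        # if (f == []) then last r else head f
        return self._r[-1] if not self._f else self._f[0]


q = Queue()
for x in [1, 2, 3]:
    q.insert(x)
print(q.front())   # 1
q.delete()
print(q.front())   # 2
```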

Logical relations. The technique of logical relations can also be adapted to classes. However, here we have two equivalent ways of presenting the proof:

1. proving the equivalence of two separate classes,
2. merging the two classes into a single one using so-called auxiliary variables.

We discuss both techniques using the example of the Melville representation of queues. A class for queues using two lists is given in Fig. 1.

Notation. The methods delete and front might appear a little strange because they do not take any arguments. To help our own intuition, we might add null arguments as follows:

delete :: () -> comm
front  :: () -> exp[int]

But it is quite unnecessary to do so. Commands and expressions are like IO actions in Haskell: they are evaluated or executed only when they are used, not when they are defined.

We would like to establish that the queue class of Fig. 1 is equivalent to a simple queue class with a single list as the representation for queues:

SimpleQueue = class :: QueueSig {
    local[list int] l;   -- list of elements with the front at the left
    init {l := []}
    methods
        insert x = {l := l++[x]}
        delete = {l := tail l}
        front = head l
}

Queue = class :: QueueSig {
    local[list int] f;   -- front part of the queue
    local[list int] r;   -- rear part of the queue

    local[list int] l;   -- list of all the elements
                         -- invariant: l = f++(reverse r)

    init {f := []; r := []; l := []}

    methods
        insert x = {r := x:r; l := l++[x]}

        delete = {if (f == []) then {f := reverse r; r := []}; f := tail f; l := tail l}

        front = if (f == []) then last r else head f   -- should be equal to (head l)
}

Figure 2: Queue class with an auxiliary variable for the list of elements

The logical relations proof technique involves first formulating a class invariant:

R: l = f++(reverse r)

which involves the local states of the two classes (with l in the state of the SimpleQueue class, and f and r in the state of the Queue class). Then the proof obligations involve proving that the logical relation is preserved by the methods of the two classes run side by side:

true {init}{init’} R
R {insert x}{insert’ x} R
R {delete}{delete’} R
R =⇒ front = front’

The proofs of these conditions are similar to the corresponding conditions for the ADT logical relation.
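Proof obligations like these can also be tested, though not proved, by running the two classes side by side on the same sequence of operations and checking R after every step. The Python sketch below does this with randomly chosen inserts and deletes. The class transcriptions, the function names related and check_side_by_side, and the decision to delete only from non-empty queues (where tail is defined) are our own assumptions. The sketch reads the representation fields directly, which is appropriate here only because R is a statement about the representations of both classes.

```python
import random
from typing import List


class Queue:                       # two-list representation (Fig. 1)
    def __init__(self) -> None:
        self.f: List[int] = []
        self.r: List[int] = []

    def insert(self, x: int) -> None:
        self.r = [x] + self.r

    def delete(self) -> None:
        if not self.f:
            self.f, self.r = list(reversed(self.r)), []
        self.f = self.f[1:]

    def front(self) -> int:
        return self.r[-1] if not self.f else self.f[0]


class SimpleQueue:                 # single-list representation
    def __init__(self) -> None:
        self.l: List[int] = []

    def insert(self, x: int) -> None:
        self.l = self.l + [x]

    def delete(self) -> None:
        self.l = self.l[1:]

    def front(self) -> int:
        return self.l[0]


def related(q: Queue, s: SimpleQueue) -> bool:
    """R: l = f ++ (reverse r)"""
    return s.l == q.f + list(reversed(q.r))


def check_side_by_side(steps: int, seed: int = 0) -> bool:
    rng = random.Random(seed)
    q, s = Queue(), SimpleQueue()
    assert related(q, s)                       # true {init}{init'} R
    for _ in range(steps):
        if s.l and rng.random() < 0.5:
            q.delete()                         # R {delete}{delete'} R
            s.delete()
        else:
            x = rng.randrange(100)
            q.insert(x)                        # R {insert x}{insert' x} R
            s.insert(x)
        assert related(q, s)
        if s.l:
            assert q.front() == s.front()      # R => front = front'
    return True


print(check_side_by_side(1000))   # True
```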

3 Alternative methods for reasoning about classes

Auxiliary variables. The auxiliary variable technique involves

• adding the state of the abstract representation (in this case that of the SimpleQueue class) as additional variables to the main class, and

• adding the code for its manipulation in all the methods.

The resulting merged class is shown in Fig. 2. The auxiliary variable code is the declaration of l together with every assignment to l. The variable l is called auxiliary because it never affects the state of the main queue variables or the results returned by the methods. In general, we can always add auxiliary variables to programs for the purpose of aiding the proof of correctness. A variable is said to be auxiliary if it never affects the genuine program variables (either through assignments or through conditional branching) and never affects any return value of the methods.

Queue = class :: QueueSig {
    local[list int] f;   -- front part of the queue
    local[list int] r;   -- rear part of the queue

    abstract methods
        elems :: exp[list int];   -- list of all the elements
        elems = f++(reverse r)

    init {f := []; r := []}
    assert (elems == [])

    methods
        insert x = assert (elems == l)
                   {r := x:r}
                   assert (elems == l++[x])

        delete = assert (elems == l)
                 {if (f == []) then {f := reverse r; r := []}; f := tail f}
                 assert (elems == (tail l))

        front = if (f == []) then last r else head f   -- should be equal to (head elems)
}

Figure 3: Queue class with an abstract method

With the addition of the auxiliary variable in Fig. 2, the logical relation becomes an ordinary invariant on the local state of the class, and the verification conditions are standard:

true {init} R
R {insert x} R
R {delete} R
R =⇒ true
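With the auxiliary variable, R becomes an invariant that a single class can check about itself. The Python sketch below is a transcription of Fig. 2 under our own naming, with the verification conditions checked as run-time assertions and the auxiliary lines marked by comments. Deleting those lines, together with the assertions that read l, leaves the behaviour of f, r and every returned value unchanged, which is what makes l auxiliary.

```python
from typing import List


class Queue:
    def __init__(self) -> None:
        self.f: List[int] = []       # front part of the queue
        self.r: List[int] = []       # rear part of the queue
        self.l: List[int] = []       # auxiliary: list of all the elements
        assert self.invariant()      # true {init} R

    def invariant(self) -> bool:
        return self.l == self.f + list(reversed(self.r))   # R: l = f ++ reverse r

    def insert(self, x: int) -> None:
        assert self.invariant()
        self.r = [x] + self.r
        self.l = self.l + [x]        # auxiliary
        assert self.invariant()      # R {insert x} R

    def delete(self) -> None:
        assert self.invariant()
        if not self.f:
            self.f, self.r = list(reversed(self.r)), []
        self.f = self.f[1:]
        self.l = self.l[1:]          # auxiliary
        assert self.invariant()      # R {delete} R

    def front(self) -> int:
        result = self.r[-1] if not self.f else self.f[0]
        assert result == self.l[0]   # "should be equal to (head l)"
        return result


q = Queue()
for x in [1, 2, 3]:
    q.insert(x)
q.delete()
print(q.front(), q.l)   # 2 [2, 3]
```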

Abstract methods. Finally, we consider a third technique for proving the correctness of a class. It involves using abstract methods: expressions that depend on the state of the class and yield mathematical quantities, which can be used to express the effect of the methods in terms of pre- and post-conditions. The queue class with an abstract method called elems is shown in Fig. 3. This method is never executed by a client program, which is the reason for calling it an “abstract” method. Its sole purpose is to enable writing specifications for the actual methods. Note that

• the initialisation code has the post-condition saying that the list of elements is empty,

• the insert method says that if the list of elements is l initially, then it becomes l++[x], and

• the delete method says that if the list of elements is l initially, then it becomes (tail l).
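A Python rendering of Fig. 3, again only a sketch with names of our choosing, treats elems as a private helper that only the checking code calls. The pre-condition elems == l is expressed by saving the value of elems in a local name l before the update, so that l plays the role of the logical variable in the specification; the post-conditions are then assertions about the new value of elems.

```python
from typing import List


class Queue:
    def __init__(self) -> None:
        self._f: List[int] = []                # front part of the queue
        self._r: List[int] = []                # rear part of the queue
        assert self._elems() == []             # post-condition of init

    def _elems(self) -> List[int]:
        # abstract method: elems = f ++ (reverse r), used only in specifications
        return self._f + list(reversed(self._r))

    def insert(self, x: int) -> None:
        l = self._elems()                      # pre: elems == l
        self._r = [x] + self._r
        assert self._elems() == l + [x]        # post: elems == l ++ [x]

    def delete(self) -> None:
        l = self._elems()                      # pre: elems == l
        if not self._f:
            self._f, self._r = list(reversed(self._r)), []
        self._f = self._f[1:]
        assert self._elems() == l[1:]          # post: elems == tail l

    def front(self) -> int:
        result = self._r[-1] if not self._f else self._f[0]
        assert result == self._elems()[0]      # should be equal to (head elems)
        return result


q = Queue()
q.insert(1)
q.insert(2)
q.delete()
print(q.front())   # 2
```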
