Concepts in Programming Languages
Alan Mycroft∗
Computer Laboratory, University of Cambridge
2011–2012 (Easter Term)

Practicalities
• Course web page: www.cl.cam.ac.uk/teaching/1112/ConceptsPL/ with lecture slides, exercise sheet and reading material.
• One exam question.

(*) I thank Marcelo Fiore for permission to use these slides

Topics
I. Introduction and motivation.
II. The first procedural language: FORTRAN (1954–58).
III. The first declarative language: LISP (1958–62).
IV. Block-structured procedural languages: Algol (1958–68), Pascal (1970).
V. Object-oriented languages (concepts and origins): Simula (1964–67), Smalltalk (1971–80).
VI. Types in programming languages: ML (1973–1978).
VII. Data abstraction and modularity: SML Modules (1984–97).
VIII. The state of the art: Scala (2007).

Main books
• J. C. Mitchell. Concepts in programming languages. Cambridge University Press, 2003.
• T. W. Pratt and M. V. Zelkowitz. Programming Languages: Design and implementation (3RD EDITION). Prentice Hall, 1999.
⋆ M. L. Scott. Programming language pragmatics (2ND EDITION). Elsevier, 2006.
• R. Sethi. Programming languages: Concepts & constructs (2ND EDITION). Addison-Wesley, 1996.

Topic I
Introduction and motivation

References:
• Chapter 1 of Concepts in programming languages by J. C. Mitchell. CUP, 2003.
• Chapter 1 of Programming languages: Design and implementation (3RD EDITION) by T. W. Pratt and M. V. Zelkowitz. Prentice Hall, 1999.
• Chapter 1 of Programming language pragmatics (2ND EDITION) by M. L. Scott. Elsevier, 2006.

Goals
• Critical thinking about programming languages.
  ? What is a programming language!?
• Study programming languages.
  - Be familiar with basic language concepts.
  - Appreciate trade-offs in language design.
  - Trace history, appreciate evolution and diversity of ideas.
  - Be prepared for new programming methods, paradigms.

Why study programming languages?
• To improve the ability to develop effective algorithms.
• To improve the use of familiar languages.
• To increase the vocabulary of useful programming constructs.
• To allow a better choice of programming language.
• To make it easier to learn a new language.
• To make it easier to design a new language.
• To simulate useful features in languages that lack them.
• To make better use of language technology wherever it appears.

What makes a good language?
• Clarity, simplicity, and unity.
• Orthogonality.
• Naturalness for the application.
• Support of abstraction.
• Ease of program verification.
• Programming environments.
• Portability of programs.
• Cost of use.
  - Cost of execution.
  - Cost of program translation.
  - Cost of program creation, testing, and use.
  - Cost of program maintenance.

What makes a language successful?
• Expressive power.
• Ease of use for the novice.
• Ease of implementation.
• Open source.
• Excellent compilers.
• Economics, patronage, and inertia.

Application domains

Era     Application    Major languages                      Other languages
1960s   Business       COBOL                                Assembler
        Scientific     FORTRAN                              ALGOL, BASIC, APL
        System         Assembler                            JOVIAL, Forth
        AI             LISP                                 SNOBOL
Today   Business       COBOL, SQL, spreadsheet              C, PL/I, 4GLs
        Scientific     FORTRAN, C, C++, Maple, Mathematica  BASIC, Pascal
        System         BCPL, C, C++                         Pascal, Ada, BASIC, MODULA
        AI             LISP, Prolog
        Publishing     TeX, Postscript, word processing
        Process        UNIX shell, TCL                      Marvel, Esterel
        New paradigms  Smalltalk, SML, Haskell, Java,       Eiffel, C#, Scala
                       Python, Ruby

Influences
• Computer capabilities.
• Applications.
• Programming methods.
• Implementation methods.
• Theoretical studies.
• Standardisation.

? Why are there so many languages?
• Evolution.
• Special purposes.
• Personal preference.

Motivating application in language design
A specific purpose provides focus for language designers; it helps to set criteria for making design decisions.
A specific, motivating application also helps to solve one of the hardest problems in programming language design: deciding which features to leave out.

Examples: Good languages designed with a specific purpose in mind.
• LISP: symbolic computation, automated reasoning
• FP: functional programming, algebraic laws
• BCPL: compiler writing
• Simula: simulation
• C: systems programming
• ML: theorem proving
• Smalltalk: Dynabook
• Clu, SML Modules: modular programming
• C++: object orientation
• Java: Internet applications

Program execution model
Good language design presents an abstract machine.
• FORTRAN: Flat register machine; memory arranged as linear array
• LISP: cons cells, read-eval-print loop
• Algol family: stack of activation records; heap storage
• BCPL, C: underlying machine + abstractions
• Simula: Object references
• FP, ML: functions are basic control structure
• Smalltalk: objects and methods, communicating by messages
• Java: Java virtual machine

Classification of programming languages
• Imperative
  - procedural: C, Ada, Pascal, Algol, FORTRAN, ...
  - object oriented: Scala, C#, Java, Smalltalk, SIMULA, ...
  - scripting: Perl, Python, PHP, ...
• Declarative
  - functional: Haskell, SML, Lisp, Scheme, ...
  - logic: Prolog
  - dataflow: Id, Val
  - constraint-based: spreadsheets
  - template-based: XSLT

Theoretical foundations
Examples:
• Formal-language theory.
• Automata theory.
• Algorithmics.
• λ-calculus.
• Semantics.
• Formal verification.
• Type theory.
• Complexity theory.
• Logic.

Standardisation
Consider: int i; i = (1 && 2) + 3 ;
? Is it valid C code? If so, what's the value of i?
? How do we answer such questions!?
! Read the reference manual.
! Try it and see!
! Read the ANSI C Standard.

Language standardisation
• Proprietary standards.
• Consensus standards.
  - ANSI (American National Standards Institute)
  - IEEE (Institute of Electrical and Electronics Engineers)
  - BSI (British Standards Institution)
  - ISO (International Organization for Standardization)

Language-standards issues
Timeliness. When do we standardise a language?
Conformance. What does it mean for a program to adhere to a standard and for a compiler to compile a standard?
  Ambiguity and freedom to optimise; machine dependence; undefined behaviour.
Obsolescence. When does a standard age and how does it get modified?
  Deprecated features.

Language standards: PL/1
? What does the following
      9 + 8/3
  mean?
  - 11.666... ?
  - Overflow ?
  - 1.666... ?

DEC(p,q) means p digits with q after the decimal point.

Type rules for DECIMAL in PL/1:
  DEC(p1,q1) + DEC(p2,q2)
    => DEC(MIN(1+MAX(p1-q1,p2-q2)+MAX(q1,q2),15),MAX(q1,q2))
  DEC(p1,q1) / DEC(p2,q2)
    => DEC(15,15-((p1-q1)+q2))

For 9 + 8/3 we have:
  DEC(1,0) + DEC(1,0)/DEC(1,0)
    => DEC(1,0) + DEC(15,15-((1-0)+0))
    => DEC(1,0) + DEC(15,14)
    => DEC(MIN(1+MAX(1-0,15-14)+MAX(0,14),15),MAX(0,14))
    => DEC(15,14)

So the calculation is as follows
  9 + 8/3 = 9 + 2.66666666666666
          = 11.66666666666666   (OVERFLOW)
          = 1.66666666666666    (OVERFLOW disabled)
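The two type rules can be checked mechanically. The following Python sketch encodes them and replays 9 + 8/3. The helper names (`add_type`, `div_type`, `store`) are illustrative and not part of PL/1, and `store` assumes non-negative values, truncation of excess fraction digits, and silent loss of the high-order digits when OVERFLOW is disabled.

```python
from decimal import Decimal, ROUND_DOWN

MAX_PRECISION = 15  # the limit of 15 digits that appears in the rules


def add_type(a, b):
    """Result type of DEC(p1,q1) + DEC(p2,q2)."""
    (p1, q1), (p2, q2) = a, b
    p = min(1 + max(p1 - q1, p2 - q2) + max(q1, q2), MAX_PRECISION)
    return (p, max(q1, q2))


def div_type(a, b):
    """Result type of DEC(p1,q1) / DEC(p2,q2)."""
    (p1, q1), (p2, q2) = a, b
    return (MAX_PRECISION, MAX_PRECISION - ((p1 - q1) + q2))


def store(value, dec_type):
    """Fit a non-negative value into DEC(p,q); also report lost high digits."""
    p, q = dec_type
    truncated = value.quantize(Decimal(1).scaleb(-q), rounding=ROUND_DOWN)
    limit = Decimal(10) ** (p - q)  # smallest value needing more integer digits
    return truncated % limit, truncated >= limit


digit = (1, 0)                             # 9, 8 and 3 are all DEC(1,0)
quotient_type = div_type(digit, digit)     # DEC(15,14)
sum_type = add_type(digit, quotient_type)  # DEC(15,14): one integer digit only

quotient, _ = store(Decimal(8) / Decimal(3), quotient_type)
result, overflowed = store(Decimal(9) + quotient, sum_type)
print(sum_type, quotient, result, overflowed)
```

The sum type leaves room for a single digit before the decimal point, which is why 11.666... cannot be represented and the leading 1 is lost.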

History
1951–55: Experimental use of expression compilers.
1956–60: FORTRAN, COBOL, LISP, Algol 60.
1961–65: APL notation, Algol 60 (revised), SNOBOL, CPL.
1966–70: APL, SNOBOL 4, FORTRAN 66, BASIC, SIMULA, Algol 68, Algol-W, BCPL.
1971–75: Pascal, PL/1 (Standard), C, Scheme, Prolog.
1976–80: Smalltalk, Ada, FORTRAN 77, ML.
1981–85: Smalltalk-80, Prolog, Ada 83.
1986–90: C++, SML, Haskell.
1991–95: Ada 95, TCL, Perl.
1996–2000: Java.
2000–05: C#, Python, Ruby, Scala.

Language groups
• Multi-purpose languages
  - Scala, C#, Java, C++, C
  - Haskell, SML, Scheme, LISP
  - Perl, Python, Ruby
• Special-purpose languages
  - UNIX shell
  - SQL
  - LaTeX

Things to think about
• What makes a good language?
• The role of
  1. motivating applications,
  2. program execution,
  3. theoretical foundations
  in language design.
• Language standardisation.

Topic II
FORTRAN: A simple procedural language

References:
• Chapter 10(§1) of Programming Languages: Design and implementation (3RD EDITION) by T. W. Pratt and M. V. Zelkowitz. Prentice Hall, 1999.
• The History of FORTRAN I, II, and III by J. Backus. In History of Programming Languages by R. L. Wexelblat. Academic Press, 1981.

FORTRAN = FORmula TRANslator (1957)
• Developed in the 1950s by an IBM team led by John Backus.
• The first high-level programming language to become widely used.
• At the time the utility of any high-level language was open to question!
  The main complaint was the efficiency of compiled code. This heavily influenced the design, orienting it towards providing execution efficiency.
• Standards: 1966, 1977 (FORTRAN 77), 1990 (FORTRAN 90).

John Backus
"As far as we were aware, we simply made up the language as we went along. We did not regard language design as a difficult problem, merely a simple prelude to the real problem: designing a compiler which could produce efficient programs." (a)

(a) In R. L. Wexelblat, History of Programming Languages, Academic Press, 1981, page 30.

Overview: Execution model
• FORTRAN program = main program + subprograms
  - Each is compiled separately from all others.
  - Translated programs are linked into final executable form during loading.
• All storage is allocated statically before program execution begins; no run-time storage management is provided.
• Flat register machine. No stacks, no recursion. Memory arranged as linear array.

Overview: Compilation

        FORTRAN program
              |
           Compiler
              |
  Incomplete machine language     Library routines
                  \                  /
                       Linker
                         |
             Machine language program

Overview: Data types
• Numeric data: Integer, real, complex, double-precision real.
• Boolean data, called logical.
• Arrays of fixed declared length.
• Character strings of fixed declared length.
• Files.

Example
      PROGRAM MAIN
      PARAMETER (MaXsIz=99)
      REAL A(mAxSiZ)
   10 READ (5,100,END=999) K
  100 FORMAT(I5)
      IF (K.LE.0 .OR. K.GT.MAXSIZ) STOP
      READ *,(A(I),I=1,K)
      PRINT *,(A(I),I=1,K)
      PRINT *,'SUM=',SUM(A,K)
      GO TO 10
  999 PRINT *, "All Done"
      STOP
      END

Overview: Control structures
• FORTRAN 66
  Relied heavily on statement labels and GOTO statements.
• FORTRAN 77
  Added some modern control structures (e.g., conditionals).

Example
C     SUMMATION SUBPROGRAM
      FUNCTION SUM(V,N)
      REAL V(N)
      SUM = 0.0
      DO 20 I = 1,N
      SUM = SUM + V(I)
   20 CONTINUE
      RETURN
      END

Commentary
• Columns and lines are relevant.
• Blanks are ignored (by early FORTRANs).
• Variable names are from 1 to 6 characters long, begin with a letter, and contain letters and digits.
• Programmer-defined constants.
• Arrays: when sizes are given, lower bounds are assumed to be 1; otherwise subscript ranges must be explicitly declared.
• Variable types may not be declared: implicit naming convention.
• Data formats.
• FORTRAN 77 has no while statement.
• Functions are compiled separately from the main program. Information from the main program is not used to pass information to the compiler. Failure may arise when the loader tries to merge subprograms with main program.
• Function parameters are uniformly transmitted by reference (or value-result).
  Recall that allocation is done statically.
• DO loops by increment.
• A value is returned in a FORTRAN function by assigning a value to the name of a function.

On syntax
A misspelling bug ...
      do 10 i = 1,100     vs.     do 10 i = 1.100
... that is reported to have caused the NASA Mariner 1 probe to fail to reach Venus!
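Why the single character matters follows from the rule that blanks are ignored. The Python sketch below is a deliberately tiny classifier, not a FORTRAN parser: the two regular expressions and the function name `classify` are assumptions made for illustration. It shows that once blanks are removed, the comma is the only thing that distinguishes a DO loop from an assignment to a variable called DO10I.

```python
import re

# Patterns are applied after all blanks have been removed.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGNMENT = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement):
    """Classify one statement the way a blank-insensitive scanner might."""
    text = statement.replace(" ", "").upper()
    loop = DO_LOOP.match(text)
    if loop:
        label, variable, start, stop = loop.groups()
        return ("loop", label, variable, start, stop)
    assignment = ASSIGNMENT.match(text)
    if assignment:
        return ("assignment",) + assignment.groups()
    return ("unknown", text)


print(classify("do 10 i = 1,100"))  # a loop over I ending at label 10
print(classify("do 10 i = 1.100"))  # an assignment of 1.100 to DO10I
```

Because variables need not be declared, the second statement is accepted without complaint.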

Types
• FORTRAN has no mechanism for creating user types.
• Static type checking is used in FORTRAN, but the checking is incomplete.
  Many language features, including arguments in subprogram calls and the use of COMMON blocks, cannot be statically checked (in part because subprograms are compiled independently).
  Constructs that cannot be statically checked are ordinarily left unchecked at run time in FORTRAN implementations.

Storage: Representation and Management
• Storage representation in FORTRAN is sequential.
• Only two levels of referencing environment are provided, global and local.
  The global environment may be partitioned into separate common environments that are shared amongst sets of subprograms, but only data objects may be shared in this way.

The sequential storage representation is critical in the definition of the EQUIVALENCE and COMMON declarations.

• EQUIVALENCE
  This declaration allows more than one simple or subscripted variable to refer to the same storage location.
  ? Is this a good idea?
  Consider the following:
      REAL X
      INTEGER Y
      EQUIVALENCE (X,Y)

• COMMON
  The global environment is set up in terms of sets of variables and arrays, which are termed COMMON blocks.
  A COMMON block is a named block of storage and may contain the values of any number of simple variables and arrays.
  COMMON blocks may be used to isolate global data to only a few subprograms needing that data.
  ? Is the COMMON block a good idea?
  Consider the following:
      COMMON/BLK/X,Y,K(25)          in MAIN
      COMMON/BLK/U,V,I(5),M(4,5)    in SUB
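Both questions can be explored with a small simulation of sequential storage. The Python sketch below rests on stated assumptions: REAL is taken to be an IEEE 754 single-precision value, INTEGER a 32-bit value, every simple variable occupies one storage unit, and arrays are laid out in column-major order. The offset helpers `k_offset` and `m_offset` are hypothetical names.

```python
import struct

# EQUIVALENCE (X,Y): one four-byte storage unit with two names.
unit = bytearray(4)
struct.pack_into("<f", unit, 0, 1.0)          # X = 1.0
y = struct.unpack_from("<i", unit, 0)[0]      # the same bits read as Y
print(y)                                      # not 1: the bit pattern of 1.0


# COMMON/BLK/: 27 storage units seen through two different layouts.
def k_offset(n):
    """Offset of K(n) in MAIN, after X and Y."""
    return 2 + (n - 1)


def m_offset(i, j):
    """Offset of M(i,j) in SUB, after U, V and I(1..5), column-major."""
    return 7 + (i - 1) + 4 * (j - 1)


# K(6) in MAIN and M(1,1) in SUB name the same storage unit.
print(k_offset(6), m_offset(1, 1))
```

Nothing in either program unit records that the two views must agree, so a change to one declaration silently changes the meaning of the other.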

Aliasing
Aliasing occurs when two names or expressions refer to the same object or location.
• Aliasing raises serious problems for both the user and implementor of a language.
• Because of the problems caused by aliasing, new language designs sometimes attempt to restrict or eliminate altogether features that allow aliases to be constructed.

Parameters
There are two concepts that must be clearly distinguished.
• The parameter names used in a function declaration are called formal parameters.
• When a function is called, expressions called actual parameters are used to compute the parameter values for that call.

FORTRAN subroutines and functions
• Actual parameters may be simple variables, literals, array names, subscripted variables, subprogram names, or arithmetic or logical expressions.
  The interpretation of a formal parameter as an array is done by the called subroutine.
• Each subroutine is compiled independently and no checking is done for compatibility between the subroutine declaration and its call.
• The language specifies that if a formal parameter is assigned to, the actual parameter must be a variable, but because of independent compilation this rule cannot be checked by the compiler.
  Example:
      SUBROUTINE SUB(X,Y)
      X = Y
      END

      CALL SUB(-1.0,1.0)
• Parameter passing is uniformly by reference.

Topic III
LISP: functions, recursion, and lists

References:
• Chapter 3 of Concepts in programming languages by J. C. Mitchell. CUP, 2003.
• Chapters 5(§4.5) and 13(§1) of Programming languages: Design and implementation (3RD EDITION) by T. W. Pratt and M. V. Zelkowitz. Prentice Hall, 1999.
• J. McCarthy. Recursive functions of symbolic expressions and their computation by machine. Communications of the ACM, 3(4):184–195, 1960. (a)
• J. McCarthy. History of LISP. In History of Programming Languages. Academic Press, 1981.

(a) Available on-line from http://www-formal.stanford.edu/jmc/recursive.html.

LISP = LISt Processing (±1960)
• Developed in the late 1950s and early 1960s by a team led by John McCarthy at MIT.
• McCarthy described LISP as "a scheme for representing the partial recursive functions of a certain class of symbolic expressions".
• Motivating problems: Symbolic computation (symbolic differentiation), logic (Advice taker), experimental programming.
• Software embedding LISP: Emacs (text editor), GTK (Linux graphical toolkit), Sawfish (window manager), GnuCash (accounting software).

Programming-language phrases
• Expressions. A syntactic entity that may be evaluated to determine its value.
• Statement. A command that alters the state of the machine in some explicit way.
• Declaration. A syntactic entity that introduces a new identifier, often specifying one or more attributes.

Innovation in the design of LISP
• LISP is an expression-based language.
  Conditional expressions that produce a value were new in LISP.
• Pure LISP has no statements and no expressions with side effects. However, LISP also supports impure constructs.

Some contributions of LISP
• Lists.
• Recursive functions.
• Garbage collection.
• Programs as data.

Overview
• LISP syntax is extremely simple. To make parsing easy, all operations are written in prefix form (i.e., with the operator in front of all the operands).
• LISP programs compute with atoms and cells.
• The basic data structures of LISP are dotted pairs, which are pairs written with a dot between the components. Putting atoms or pairs together, one can write symbolic expressions in a form traditionally called S-expressions. Most LISP programs use lists built out of S-expressions.
• LISP is an untyped programming language.

• Most operations in LISP take list arguments and return list values.
  Example:
      ( cons '(a b c) '(d e f) )
  [Figure: cons-cell representation]
  Remark: The function (quote x), or simply 'x, just returns the literal value of its argument.
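The cons-cell view of lists can be made concrete with a short model. In this Python sketch a cell is a two-element tuple and the empty list is `None`; the names `cons`, `from_list` and `show` are illustrative choices and the printed notation is only an approximation of what a LISP system displays.

```python
NIL = None  # the empty list


def cons(car, cdr):
    """Build one cell holding two components."""
    return (car, cdr)


def from_list(items):
    """Build a chain of cells from a (possibly nested) Python list."""
    result = NIL
    for item in reversed(items):
        head = from_list(item) if isinstance(item, list) else item
        result = cons(head, result)
    return result


def show(s):
    """Render an S-expression in list notation, falling back to a dot."""
    if s is NIL:
        return "nil"
    if not isinstance(s, tuple):
        return str(s)
    parts = []
    while isinstance(s, tuple):
        parts.append(show(s[0]))
        s = s[1]
    if s is not NIL:
        parts += [".", show(s)]
    return "(" + " ".join(parts) + ")"


abc = from_list(["a", "b", "c"])
rest = from_list(["d", "e", "f"])
print(show(cons(abc, rest)))  # the list (a b c) becomes the first element
print(show(cons("a", "b")))   # a dotted pair that is not a list
```

The first call allocates a single new cell whose second component is the existing list (d e f), so that list is shared rather than copied.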

? How does one recognise a LISP program?

  ( defvar x 1 )                 val x = 1 ;
  ( defun g(z) (+ x z) )         fun g(z) = x + z ;
  ( defun f(y)                   fun f(y)
    ( + ( g y )                    = g(y) +
        ( let                        let
          ( ( x y) )                   val x = y
          (                          in
            g x )                      g(x)
    ) ) )                            end ;
  ( f (+ x 1) )                  f(x+1) ;

! It is full of parentheses!

Historically, LISP was a dynamically scoped language ...

  ( defvar x T )
  ( defun test(y) (eq x y) )
  ( cond
    ( x ( let ( (z 0) ) (test z) ) )
  )
vs.
  ( defvar x T )
  ( defun test(y) (eq x y) )
  ( cond
    ( x ( let ( (x 0) ) (test x) ) )
  )

... when Scheme was introduced in 1978, it was a statically scoped variant of LISP.

Renaming of local variables
Lexical scope is deeply related to renaming of variables. It should not matter whether a program uses one name or another one for a local variable. Let us state this supposition as a principle:
    Consistent renaming of local names in the source text has no effect on the computation set up by a program.
This renaming principle motivates static scope because a language that obeys the renaming principle uses lexical scope. The reason is that the renaming principle can be applied to rename local variables until each name has only one declaration in the entire program. This one declaration is the one obtained under lexical scope.

Static and dynamic scope
Static scope rules relate references with declarations of names in the program text; dynamic scope rules relate references with associations for names during program execution.
There are two main rules for finding the declaration of a global identifier:
• Static scope. A global identifier refers to the identifier with that name that is declared in the closest enclosing scope of the program text.
• Dynamic scope. A global identifier refers to the identifier associated with the most recent environment.
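The two rules give different answers for the small program with x = 1, g(z) = x + z and f(y) = g(y) + (let x = y in g(x)), called as f(x+1). The Python sketch below is one way to see this. Python itself is statically scoped, so the static version is written directly; the dynamic version simulates a stack of bindings, and the helpers `lookup` and `with_binding` are assumptions introduced for the simulation.

```python
def static_result():
    x = 1

    def g(z):
        return x + z              # always the x declared around g

    def f(y):
        def let_body():
            x = y                 # a new local x, invisible to g
            return g(x)
        return g(y) + let_body()

    return f(x + 1)


bindings = [("x", 1)]             # dynamic environment, newest binding last


def lookup(name):
    """Find the most recent binding of name."""
    for bound_name, value in reversed(bindings):
        if bound_name == name:
            return value
    raise NameError(name)


def with_binding(name, value, body):
    """Run body with one extra binding, then remove it again."""
    bindings.append((name, value))
    try:
        return body()
    finally:
        bindings.pop()


def g_dynamic():
    return lookup("x") + lookup("z")  # whichever x was bound most recently


def f_dynamic():
    first = with_binding("z", lookup("y"), g_dynamic)
    second = with_binding(
        "x", lookup("y"),
        lambda: with_binding("z", lookup("x"), g_dynamic))
    return first + second


def dynamic_result():
    return with_binding("y", lookup("x") + 1, f_dynamic)


print(static_result(), dynamic_result())
```

Renaming the let-bound x to a fresh name would leave the static result unchanged but would change the dynamic one, which is the renaming principle at work.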

The importance of static scope
Static scope rules play an important part in the design and implementation of most programming languages.
• Static scope rules allow many different sorts of connections to be established between references to names and their declarations during translation.
  For instance, relating a variable name to a declaration for the variable and relating a constant name to a declaration for the constant.
  Other connections include relating names to type declarations, relating formal parameters to formal parameter specifications, relating subprogram calls to subprogram declarations, and relating statement labels referenced in goto statements to labels on particular statements.
  In each of these cases, a different set of simplifications may be made during translation that make execution of the program more efficient.
• Static scope rules are also important for the programmer in reading a program because they make it possible to relate a name referenced in a program to a declaration for the name without tracing the course of program execution.

Abstract machines
The terminology abstract machine is generally used to refer to an idealised computing device that can execute a specific programming language directly. Systems people use virtual machine (as in JVM) for a similar concept.
Typically an abstract machine may not be fully implementable. However, an abstract machine should be sufficiently realistic to provide useful information about real program execution.
An important goal in discussing abstract machines is to identify the mental model of the computer that a programmer uses to write and debug programs.

LISP abstract machine
The abstract machine for Pure LISP has four parts:
1. A LISP expression to be evaluated.
2. A continuation, which is a function representing the remainder of the program to evaluate when done with the current expression.
3. An association list, also known as the A-list.
   The purpose of the A-list is to store the values of variables that may occur either in the current expression to be evaluated or in the remaining expressions in the program.
4. A heap, which is a set of cons cells (or dotted pairs) that might be pointed to by pointers in the A-list.

Recursion
McCarthy (1960)

  ( defun subst ( x y z )
    ( cond
      ( (atom z) ( cond ( (eq z y) x ) ( T z ) ) )
      ( T ( cons (subst x y (car z) ) (subst x y (cdr z)) ) )
    )
  )

"In general ..., the routine for a recursive function uses itself as a subroutine. For example, the program for subst x y z uses itself as a subroutine to evaluate the result of substituting into the subexpressions car z and cdr z. While subst x y (cdr z) is being evaluated, the result of the previous evaluation of subst x y (car z) must be saved in a temporary storage register. However, subst may need the same register for evaluating subst x y (cdr z). This possible conflict is resolved by the SAVE and UNSAVE routines that use the public push-down list (a). The SAVE routine has an index that tells it how many registers in the push-down list are already in use. It moves the contents of the registers which are to be saved to the first unused registers in the push-down list, advances the index of the list, and returns to the program from which control came. This program may then freely use these registers for temporary storage. Before the routine exits it uses UNSAVE, which restores the contents of the temporary registers from the push-down list and moves back the index of this list. The result of these conventions is described, in programming terminology, by saying that the recursive subroutine is transparent to the temporary storage registers."

(a) 1995: now called a stack

Garbage collection
In computing, garbage refers to memory locations that are not accessible to a program.
    At a given point in the execution of a program P, a memory location ℓ is garbage if no completed execution of P from this point can access location ℓ. In other words, replacing the contents of ℓ or making this location inaccessible to P cannot affect any further execution of the program.
Garbage collection is the process of detecting garbage during the execution of a program and making it available for reuse.

Garbage collection
McCarthy (1960)
    "... When a free register is wanted, and there is none left on the free-storage list, a reclamation† cycle starts."
    † We already called this process "garbage collection", but I guess that I chickened out of using it in the paper, or else the Research Laboratory of Electronics grammar ladies wouldn't let me.
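Whether a location can ever be accessed again is not decidable in general, so collectors commonly approximate the definition by reachability: a cell that cannot be reached from the program's roots is certainly garbage. The Python sketch below illustrates only the marking step under that assumption. The heap is modelled as a dictionary from hypothetical integer addresses to (car, cdr) pairs, and the function name `reachable` is an illustrative choice.

```python
def reachable(heap, roots):
    """Return the addresses of all cells reachable from the roots."""
    marked = set()
    pending = list(roots)
    while pending:
        address = pending.pop()
        if address in marked or address not in heap:
            continue              # already visited, or an atom / nil
        marked.add(address)
        pending.extend(heap[address])  # follow both car and cdr
    return marked


heap = {
    1: ("a", 2),     # cell 1: car is the atom a, cdr points to cell 2
    2: ("b", None),  # cell 2: end of the list (a b)
    3: ("c", 4),     # cells 3 and 4 point only at each other
    4: ("d", 3),
}
roots = [1]          # e.g. pointers held in the A-list

live = reachable(heap, roots)
garbage = set(heap) - live
print(sorted(live), sorted(garbage))
```

Cells 3 and 4 refer to each other but are still garbage, which is why tracing from the roots finds cycles that simple reference counting would miss.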

Programs as data
• LISP data and LISP program have the same syntax and internal representation. This allows data structures to be executed as programs and programs to be modified as data.
• One feature that sets LISP apart from many other languages is that it is possible for a program to build a data structure that represents an expression and then evaluate the expression as if it were written as part of the program. This is done with the function eval.

Reflection
A language mechanism is said to be reflective if it allows a program to reason about its own structure.
A language is said to be fully reflective if it allows a program to reason about all aspects of its current structure and state.
Fully reflective languages are still just research prototypes (e.g. Maude), but limited forms of reflection appear in several languages (e.g. LISP, Prolog, Python).

Parameter passing in LISP
The actual parameters in a function call are always expressions, represented as list structures.
LISP provides two main methods of parameter passing:
• Pass/Call-by-value. The most common method is to evaluate the expressions in the actual-parameter list, and pass the resulting values.
• Pass/Call-by-name.⋆ A less common method is to transmit the expression in the actual parameter list unevaluated, and let the called function evaluate them as needed using eval.
⋆ The programmer may specify transmission by name using nlambda in place of lambda in the function definition.

Strict and lazy evaluation
Example: Consider the following function definitions with parameter-passing by value.

  ( defun CountFrom(n) ( CountFrom(+ n 1) ) )

  ( defun FunnyOr(x y)
    ( cond ( x 1) ( T y ) )
  )

  ( defun FunnyOrelse(x y)
    ( cond ( (eval x) 1) ( T (eval y) ) )
  )
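The difference between the two Funny functions can be tried out in a strict language by passing thunks in place of quoted expressions. The Python sketch below is an analogy rather than a translation: zero-argument lambdas stand in for quoted LISP expressions, calling a thunk stands in for eval, and the helper `diverges` relies on Python raising RecursionError where a LISP system would simply never return.

```python
def count_from(n):
    return count_from(n + 1)      # never produces a value


def funny_or(x, y):
    """Both arguments have already been evaluated by the caller."""
    return 1 if x else y


def funny_orelse(x, y):
    """Arguments are thunks; each is evaluated only if it is needed."""
    return 1 if x() else y()


def diverges(thunk):
    """Report whether running thunk exhausts Python's recursion limit."""
    try:
        thunk()
    except RecursionError:
        return True
    return False


# Strict: the second argument is evaluated before funny_or even starts.
print(diverges(lambda: funny_or(True, count_from(0))))
# Lazy: the second thunk is never forced because the first is true.
print(funny_orelse(lambda: True, lambda: count_from(0)))
```

With a false first argument both versions must look at the second one, so `funny_or(False, True)` and `funny_orelse(lambda: False, lambda: True)` agree.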

? What happens in the following calls?

  ( FunnyOr T (CountFrom 0) )
  ( FunnyOr nil T )

  ( FunnyOrelse 'T '(CountFrom 0) )
  ( FunnyOrelse 'nil 'T )

Topic IV
Block-structured procedural languages: Algol and Pascal

References:
• Chapters 5 and 7 of Concepts in programming languages by J. C. Mitchell. CUP, 2003.
• Chapters 10(§2) and 11(§1) of Programming languages: Design and implementation (3RD EDITION) by T. W. Pratt and M. V. Zelkowitz. Prentice Hall, 1999.

• Chapter 5 of Programming languages: Concepts & constructs by R. Sethi (2ND EDITION). Addison-Wesley, 1996.
• Chapter 7 of Understanding programming languages by M. Ben-Ari. Wiley, 1996.

Parameters
There are two concepts that must be clearly distinguished:
• A formal parameter is a declaration that appears in the declaration of the subprogram. (The computation in the body of the subprogram is written in terms of formal parameters.)
• An actual parameter is a value that the calling program sends to the subprogram.
Example: Named parameter associations.
Normally the actual parameters in a subprogram call are just listed and the matching with the formal parameters is done by position:
    procedure Proc(First: Integer; Second: Character);
    Proc(24,'h');
In Ada it is possible to use named association in the call:
    Proc(Second => 'h', First => 24);
? What about in ML? Can it be simulated?
This is commonly used together with default parameters:
    procedure Proc(First: Integer := 0; Second: Character := '*');
    Proc(Second => 'o');

Parameter passing
The way that actual parameters are evaluated and passed to procedures depends on the programming language and the kind of parameter-passing mechanisms it uses.
The main distinctions between different parameter-passing mechanisms are:
• the time that the actual parameter is evaluated, and
• the location used to store the parameter value.
NB: The location of a variable (or expression) is called its L-value, and the value stored in this location is called the R-value of the variable (or expression).

Parameter passing: Pass/Call-by-value
• In pass-by-value, the actual parameter is evaluated. The value of the actual parameter is then stored in a new location allocated for the function parameter.
• Under call-by-value, a formal parameter corresponds to the value of an actual parameter. That is, the formal x of a procedure P takes on the value of the actual parameter. The idea is to evaluate a call P(E) as follows:
    x := E;
    execute the body of procedure P;
    if P is a function, return a result.

Parameter passing: Pass/Call-by-reference
• In pass-by-reference, the actual parameter must have an L-value. The L-value of the actual parameter is then bound to the formal parameter.
• Under call-by-reference, a formal parameter becomes a synonym for the location of an actual parameter. An actual reference parameter must have a location.

Example:
    program main;
    begin
      function f( var x: integer; y: integer): integer;
      begin
        x := 2;
        y := 1;
        if x = 1 then f := 1 else f := 2
      end;

      var z: integer;
      z := 0;
      writeln( f(z,z) )
    end

The difference between call-by-value and call-by-reference is important to the programmer in several ways:
• Side effects. Assignments inside the function body may have different effects under pass-by-value and pass-by-reference.
• Aliasing. Aliasing occurs when two names refer to the same object or location.
  Aliasing may occur when two parameters are passed by reference or one parameter passed by reference has the same location as the global variable of the procedure.
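The effect of the var keyword in the example can be simulated with explicit storage cells. In the Python sketch below a `Cell` object plays the role of a location (L-value), so passing the cell models pass-by-reference and passing its contents models pass-by-value. The class and the second function `f_both_var`, in which y is also a reference parameter, are hypothetical additions for comparison.

```python
class Cell:
    """A storage location holding one value."""

    def __init__(self, value):
        self.value = value


def f(x, y):
    """f(var x: integer; y: integer): x is shared, y is a private copy."""
    x.value = 2
    y = 1                         # changes only the local copy
    return 1 if x.value == 1 else 2


def f_both_var(x, y):
    """Variant in which y is also passed by reference."""
    x.value = 2
    y.value = 1                   # x and y may be the same cell
    return 1 if x.value == 1 else 2


z = Cell(0)
as_declared = f(z, z.value)       # side effect: z becomes 2
w = Cell(0)
both_by_reference = f_both_var(w, w)  # aliasing: the second store wins
print(as_declared, z.value, both_by_reference, w.value)
```

In the variant, the two formals are aliases for one location, so the assignment through y overwrites the value just stored through x and the function takes the other branch.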


• Efficiency. Pass-by-value may be inefficient for large structures if the value of the large structure must be copied. Pass-by-reference may be less efficient than pass-by-value for small structures that would fit directly on the stack, because when parameters are passed by reference we must dereference a pointer to get their value.

Parameter passing: Pass/Call-by-value/result
Call-by-value/result is also known as copy-in/copy-out because the actuals are initially copied into the formals and the formals are eventually copied back out to the actuals.
Actuals that do not have locations are passed by value. Actuals with locations are treated as follows:
1. Copy-in phase. Both the values and the locations of the actual parameters are computed. The values are assigned to the corresponding formals, as in call-by-value, and the locations are saved for the copy-out phase.
2. Copy-out phase. After the procedure body is executed, the final values of the formals are copied back out to the locations computed in the copy-in phase.
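The two phases can be written out directly. The Python sketch below models locations as `Cell` objects and wraps a procedure body in a hypothetical helper `call_value_result`; the left-to-right order of the copy-out phase is an assumption of this sketch, and a language definition may specify it differently or leave it open.

```python
class Cell:
    """A storage location holding one value."""

    def __init__(self, value):
        self.value = value


def call_value_result(body, *actuals):
    """Call body with copy-in/copy-out semantics for every parameter."""
    formals = [Cell(actual.value) for actual in actuals]   # copy-in
    result = body(*formals)
    for actual, formal in zip(actuals, formals):           # copy-out
        actual.value = formal.value
    return result


def body(x, y):
    x.value = 2
    y.value = 1
    return 1 if x.value == 1 else 2


z = Cell(0)
by_value_result = call_value_result(body, z, z)  # formals are distinct cells
r = Cell(0)
by_reference = body(r, r)                        # formals are one shared cell
print(by_value_result, z.value, by_reference, r.value)
```

With the same actual passed twice, the formals stay independent during the body, so the function result differs from the by-reference call even though the final value of the variable happens to agree here.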

Examples:
• A parameter in Pascal is normally passed by value. It is passed by reference, however, if the keyword var appears before the declaration of the formal parameter.
    procedure proc(in: Integer; var out: Real);
• The only parameter-passing method in C is call-by-value; however, the effect of call-by-reference can be achieved using pointers. In C++ true call-by-reference is available using reference parameters.
• Ada supports three kinds of parameters:
  1. in parameters, corresponding to value parameters;
  2. out parameters, corresponding to just the copy-out phase of call-by-value/result; and
  3. in out parameters, corresponding to either reference parameters or value/result parameters, at the discretion of the implementation.

Parameter passing: Pass/Call-by-name
The Algol 60 report describes call-by-name as follows:
1. Actual parameters are textually substituted for the formals. Possible conflicts between names in the actuals and local names in the procedure body are avoided by renaming the locals in the body.
2. The resulting procedure body is substituted for the call. Possible conflicts between nonlocals in the procedure body and locals at the point of call are avoided by renaming the locals at the point of call.

Block structure
• In a block-structured language, each program or subprogram is organised as a set of nested blocks.
  A block is a region of program text, identified by begin and end markers, that may contain declarations local to this region.
• In-line (or unnamed) blocks are useful for restricting the scope of variables by declaring them only when needed, instead of at the beginning of a subprogram. The trend in programming is to reduce the size of subprograms, so the use of unnamed blocks is less useful than it used to be.

87 88 Nested procedures can be used to group statements that  Each declaration is visible within a certain region are executed at more than one location within a of program text, called a block. subprogram, but refer to local variables and so cannot be  external to the subprogram. Before modules and When a program begins executing the instructions object-oriented programming were introduced, nested contained in a block at run time, memory is procedures were used to structure large programs. allocated for the variables declared in that block.  When a program exits a block, some or all of the  Block structure was first defined in Algol. Pascal contains memory allocated to variables declared in that nested procedures but not in-line blocks; C contains in-line blocks but not nested procedures; Ada supports both. block will be deallocated.  An identifier that is not declared in the current  Block-structured languages are characterised by the block is considered global to the block and refers following properties: to the entity with this name that is declared in the  New variables may be declared at various points in a closest enclosing block. program.


Algol had a major effect on language design

• The Algol-like programming languages evolved in parallel with the LISP family of languages, beginning with Algol 58 and Algol 60 in the late 1950s.
• The most prominent Algol-like programming languages are Pascal and C, although C differs from most of the Algol-like languages in some significant ways. Further Algol-like languages are: Algol 58, Algol W, Euclid, etc.

• The main characteristics of the Algol family are:
  - the familiar semicolon-separated sequence of statements,
  - block structure,
  - functions and procedures, and
  - static typing.

Algol 60

• Designed by a committee (including Backus, McCarthy, Perlis) between 1958 and 1963.
• Intended to be a general purpose programming language, with emphasis on scientific and numerical applications.
• Compared with FORTRAN, Algol 60 provided better ways to represent data structures and, like LISP, allowed functions to be called recursively.
  Eclipsed by FORTRAN because of the lack of I/O statements, separate compilation, and library; and because it was not supported by IBM.

Algol 60 Features

• Simple statement-oriented syntax.
• Block structure.
• Recursive functions and stack storage allocation.
• Fewer ad hoc restrictions than previous languages (e.g., general expressions inside array indices, procedures that could be called with procedure parameters).
• A primitive static type system, later improved in Algol 68 and Pascal.


Algol 60 Some trouble spots

• The Algol 60 type discipline had some shortcomings. For instance:
  - Automatic type conversions were not fully specified (e.g., x := x/y was not properly defined when x and y were integers: is it allowed, and if so was the value rounded or truncated?).
  - The type of a procedure parameter to a procedure does not include the types of parameters.
  - An array parameter to a procedure is given type array, without array bounds.

• Algol 60 was designed around two parameter-passing mechanisms, call-by-name and call-by-value.
  Call-by-name interacts badly with side effects; call-by-value is expensive for arrays.
• There are some awkward issues related to control flow, such as memory management, when a program jumps out of a nested block.

Algol 60 procedure types (a)

In Algol 60, the type of each formal parameter of a procedure must be given. However, proc is considered a type (the type of procedures). This is much simpler than the ML types of function arguments. However, this is really a type loophole; because calls to procedure parameters are not fully type checked, Algol 60 programs may produce run-time errors.

Write a procedure declaration for Q that causes the following program fragment to produce a run-time type error:

    proc P ( proc Q )
      begin Q(true) end;
    P(Q);

where true is a Boolean value. Explain why the procedure is statically type correct, but produces a run-time type error. (You may assume that adding a Boolean to an integer is a run-time error.)

(a) Exercise 5.1 of Concepts in programming languages by J. Mitchell, CUP, 2003.

Algol 60 pass-by-name: Copy rule

    real procedure sum(E,i,low,high); value low, high;
      real E; integer i, low, high;
    begin
      sum:=0.0;
      for i := low step 1 until high do sum := sum+E;
    end

    integer j; real array A[1:10]; real result;
    for j:= 1 step 1 until 10 do A[j] := j;
    result := sum(A[j],j,1,10)

By the Algol 60 copy rule, the function call to sum above is equivalent to:

    begin
      sum:=0.0;
      for j := 1 step 1 until 10 do sum := sum+A[j];
    end
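The effect of the copy rule on the call sum(A[j],j,1,10) can be imitated by passing the by-name parameters as re-evaluated closures (thunks). This is a sketch only: Python has no call-by-name, so the `Ref` cell and the `sum_by_name` function are hypothetical stand-ins for an Algol variable and for the procedure sum.

```python
class Ref:
    """A mutable cell standing in for an Algol integer variable."""

    def __init__(self, value):
        self.value = value


def sum_by_name(expr, i, low, high):
    """expr is a thunk for the by-name parameter E; i is the cell passed for i."""
    total = 0.0
    for k in range(low, high + 1):
        i.value = k       # assigning to the by-name parameter updates the caller's j
        total += expr()   # every use of E re-evaluates the actual parameter A[j]
    return total


A = {k: float(k) for k in range(1, 11)}   # real array A[1:10] with A[j] = j

j = Ref(1)
by_name = sum_by_name(lambda: A[j.value], j, 1, 10)

# For contrast: evaluating A[j] once, as call-by-value would, freezes the argument.
j2 = Ref(1)
frozen = A[j2.value]
by_value = sum_by_name(lambda: frozen, j2, 1, 10)

print(by_name, by_value)
```

With the thunk, each loop iteration sees a different element of A, so the result is the sum of the array; with the frozen argument the same element is added ten times.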


Algol 60 pass-by-name (a)

The following Algol 60 code declares a procedure P with one pass-by-name integer parameter. Explain how the procedure call P(A[i]) changes the values of i and A by substituting the actual parameters for the formal parameters, according to the Algol 60 copy rule. What integer values are printed by the program? And, by using pass-by-value parameter passing?

    begin
      integer i; i:=1;
      integer array A[1:2]; A[1]:=2; A[2]:=3;
      procedure P(x); integer x;
        begin i:=x; x:=1 end
      P(A[i]); print(i,A[1],A[2])
    end

(a) Exercise 5.2 of Concepts in programming languages by J. Mitchell, CUP, 2003.

Algol 68

• Intended to remove some of the difficulties found in Algol 60 and to improve the expressiveness of the language.
  It did not entirely succeed however, with one main problem being the difficulty of efficient compilation (e.g., the implementation consequences of higher-order procedures were not well understood at the time).
• One contribution of Algol 68 was its regular, systematic type system.
  The types (referred to as modes in Algol 68) are either primitive (int, real, complex, bool, char, string, bits, bytes, semaphore, format, file) or compound (array, structure, procedure, set, pointer).

  Type constructions could be combined without restriction. This made the type system seem more systematic than previous languages.
• Algol 68 memory management involves a stack for local variables and heap storage. Algol 68 data on the heap are explicitly allocated, and are reclaimed by garbage collection.
• Algol 68 parameter passing is by value, with pass-by-reference accomplished by pointer types. (This is essentially the same design as that adopted in C.)
• The decision to allow independent constructs to be combined without restriction also led to some complex features, such as assignable pointers.

Algol innovations

• Use of BNF syntax description.
• Block structure.
• Scope rules for local variables.
• Dynamic lifetimes for variables.
• Nested if-then-else expressions and statements.
• Recursive subroutines.
• Call-by-value and call-by-name arguments.
• Explicit type declarations for variables.
• Static typing.
• Arrays with dynamic bounds.


Pascal

• Designed in the 1970s by Niklaus Wirth, after the design and implementation of Algol W.
• Very successful programming language for teaching, in part because it was designed explicitly for that purpose.
  Also designed to be compiled in one pass. This hindered language design; e.g., it forced the problematic forward declaration.
• Pascal is a block-structured language in which static scope rules are used to determine the meaning of nonlocal references to names.

• A Pascal program is always formed from a single main program block, which contains within it definitions of the subprograms used.
  Each block has a characteristic structure: a header giving the specification of parameters and results, followed by constant definitions, type definitions, local variable declarations, other nested subprogram definitions, and the statements that make up the executable part.

• Pascal is a quasi-strong, statically typed programming language.
  An important contribution of the Pascal type system is the rich set of data-structuring concepts: e.g. enumerations, subranges, records, variant records, sets, sequential files.
• The Pascal type system is more expressive than the Algol 60 one (repairing some of its loopholes), and simpler and more limited than the Algol 68 one (eliminating some of the compilation difficulties).

A restriction that made Pascal simpler than Algol 68:

    procedure
      Allowed( j,k: integer );

    procedure
      AlsoAllowed( procedure P(i:integer);
                   j,k: integer );

    procedure
      NotAllowed( procedure
                    MyProc( procedure P( i:integer ) ) );

• Pascal was the first language to propose index checking.
• Problematically, in Pascal, the index type of an array is part of its type. The Pascal standard defines conformant array parameters whose bounds are implicitly passed to a procedure. The Ada programming language uses so-called unconstrained array types to solve this problem.
  The subscript range must be fixed at compile time, permitting the compiler to perform all address calculations during compilation.

    procedure
      Allowed( a: array [1..10] of integer ) ;

    procedure
      NotAllowed( n: integer;
                  a: array [1..n] of integer ) ;

• Pascal uses a mixture of name and structural equivalence for determining if two variables have the same type.
  Name equivalence is used in most cases for determining if formal and actual parameters in subprogram calls have the same type; structural equivalence is used in most other situations.
• Parameters are passed by value or reference.
  Complete static type checking is possible for correspondence of actual and formal parameter types in each subprogram call.

Pascal variant records

Variant records have a part common to all records of that type, and a variable part, specific to some subset of the records.

    type
      kind = ( unary, binary) ;
    type                           { datatype }
      UBtree = record              { 'a UBtree = record of }
        value: integer ;           {   'a * 'a UBkind }
        case k: kind of            { and 'a UBkind = }
          unary: ^UBtree ;         {   unary of 'a UBtree }
          binary: record           {   | binary of }
            left: ^UBtree ;        {       'a UBtree * }
            right: ^UBtree         {       'a UBtree ; }
          end
      end ;

Variant records introduce weaknesses into the type system for a language.
1. Compilers do not usually check that the value in the tag field is consistent with the state of the record.
2. Tag fields are optional. If omitted, no checking is possible at run time to determine which variant is present when a selection is made of a field in a variant.

Note that datatype and case in ML effectively provide a safe form of variant records. Why are they safe?
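As a rough illustration of a variant representation in which the tag and the fields cannot disagree, here is a Python sketch of the unary/binary tree. It is an approximation only: Python detects a wrong field selection at run time, whereas ML rejects it at compile time. The class names `Unary` and `Binary` and the function `size` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Unary:
    value: int
    child: "Tree"


@dataclass
class Binary:
    value: int
    left: "Tree"
    right: "Tree"


# None plays the role of a nil pointer.
Tree = Union[Unary, Binary, None]


def size(t):
    """Count nodes; the class of t is the tag, so the fields always match it."""
    if t is None:
        return 0
    if isinstance(t, Unary):
        return 1 + size(t.child)
    if isinstance(t, Binary):
        return 1 + size(t.left) + size(t.right)
    raise TypeError("not a tree")


tree = Binary(1, Unary(2, None), None)
print(size(tree))
```

Because the variant is fixed when the node is built, there is no separate tag field that could be changed or omitted independently of the data.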


Summary

• The Algol family of languages established the command-oriented syntax, with blocks, local declarations, and recursive functions, that is used in most current programming languages.
• The Algol family of languages is statically typed, as each expression has a type that is determined by its syntactic form and the compiler checks before running the program to make sure that the types of operations and operands agree.

Topic V
Object-oriented languages: Concepts and origins
SIMULA and Smalltalk

References:
⋆ Chapters 10 and 11 of Concepts in programming languages by J. C. Mitchell. CUP, 2003.
• Chapters 8, and 12 (§§2 and 3) of Programming languages: Design and implementation (3RD EDITION) by T. W. Pratt and M. V. Zelkowitz. Prentice Hall, 1999.

• Chapter 7 of Programming languages: Concepts & constructs by R. Sethi (2ND EDITION). Addison-Wesley, 1996.
• Chapters 14 and 15 of Understanding programming languages by M. Ben-Ari. Wiley, 1996.
⋆ B. Stroustrup. What is “Object-Oriented Programming”? (1991 revised version). Proc. 1st European Conf. on Object-Oriented Programming. (Available on-line from http://public.research.att.com/~bs/papers.html .)

Objects in ML !?

    exception Empty ;
    fun newStack(x0)
      = let val stack = ref [x0]
        in ref{ push = fn(x)
                  => stack := ( x :: !stack ) ,
                pop = fn()
                  => case !stack
                       of nil => raise Empty
                        | h::t => ( stack := t; h )
        }end ;

    exception Empty
    val newStack = fn : 'a -> {pop:unit -> 'a, push:'a -> unit} ref
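The same idea, an "object" as a record of closures over hidden state, can be sketched in Python. This is an illustration under stated assumptions: `new_stack` is a hypothetical name, a dictionary stands in for the ML record, and the outer ref of the ML version (which lets a stack variable be re-pointed at another record) is omitted.

```python
class Empty(Exception):
    """Raised when pop is requested from an empty stack."""


def new_stack(x0):
    stack = [x0]   # hidden state, reachable only through the closures below

    def push(x):
        stack.append(x)

    def pop():
        if not stack:
            raise Empty()
        return stack.pop()

    return {"push": push, "pop": pop}


s = new_stack(1)
first = s["pop"]()          # the initial element
s["push"](4)
for x in [3, 2, 1]:
    s["push"](x)
popped = [s["pop"]() for _ in range(4)]
print(first, popped)
```

The list `stack` outlives the call to `new_stack` because the two closures still refer to it, which is why such objects cannot live in a stack-allocated activation record.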


    val BoolStack = newStack(true) ;
    val BoolStack = ref {pop=fn,push=fn}
      : {pop:unit -> bool, push:bool -> unit} ref

    val IntStack0 = newStack(0) ;
    val IntStack0 = ref {pop=fn,push=fn}
      : {pop:unit -> int, push:int -> unit} ref

    val IntStack1 = newStack(1) ;
    val IntStack1 = ref {pop=fn,push=fn}
      : {pop:unit -> int, push:int -> unit} ref

    IntStack0 := !IntStack1 ;
    val it = () : unit

    #pop(!IntStack0)() ;
    val it = 1 : int

    #push(!IntStack0)(4) ;
    val it = () : unit

    map ( #push(!IntStack0) ) [3,2,1] ;
    val it = [(),(),()] : unit list

    map ( #pop(!IntStack0) ) [(),(),(),()] ;
    val it = [1,2,3,4] : int list

NB:
• ! The stack discipline for activation records fails!
• ? Is ML an object-oriented language?
  ! Of course not!
  ? Why?

Basic concepts in object-oriented languages (a)

Four main language concepts for object-oriented languages:
1. Dynamic lookup.
2. Abstraction.
3. Subtyping.
4. Inheritance.

(a) Notes from Chapter 10 of Concepts in programming languages by J. C. Mitchell. CUP, 2003.


Dynamic lookup

• Dynamic lookup means that when a message is sent to an object, the method to be executed is selected dynamically, at run time, according to the implementation of the object that receives the message. In other words, the object “chooses” how to respond to a message.
  The important property of dynamic lookup is that different objects may implement the same operation differently, and so may respond to the same message in different ways.

• Dynamic lookup is sometimes confused with overloading, which is a mechanism based on static types of operands. However, the two are very different. ? Why?
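A small Python sketch of dynamic lookup follows. The classes `Point` and `ColoredPoint` and the function `report` are hypothetical; the point is only that the call site is the same while the method that runs depends on the receiver.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def describe(self):
        return f"point at ({self.x}, {self.y})"


class ColoredPoint(Point):
    def __init__(self, x, y, color):
        super().__init__(x, y)
        self.color = color

    def describe(self):
        # A different implementation of the same operation.
        return f"{self.color} " + super().describe()


def report(p):
    # One call site: the method executed is chosen at run time by the receiver p.
    return p.describe()


plain = report(Point(1, 2))
colored = report(ColoredPoint(1, 2, "red"))
print(plain)
print(colored)
```

Nothing in `report` mentions either class, so the selection cannot have been made from the text of the call alone.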

Abstraction

• Abstraction means that implementation details are hidden inside a program unit with a specific interface. For objects, the interface usually consists of a set of methods that manipulate hidden data.
• Abstraction based on objects is similar in many ways to abstraction based on abstract data types: Objects and abstract data types both combine functions and data, and abstraction in both cases involves distinguishing between a public interface and private implementation.
  Other features of object-oriented languages, however, make abstraction in object-oriented languages more flexible than abstraction with abstract data types.

Subtyping

• Subtyping is a relation on types that allows values of one type to be used in place of values of another. Specifically, if an object a has all the functionality of another object b, then we may use a in any context expecting b.
• The basic principle associated with subtyping is substitutivity: If A is a subtype of B, then any expression of type A may be used without type error in any context that requires an expression of type B.

• The primary advantage of subtyping is that it permits uniform operations over various types of data.
  For instance, subtyping makes it possible to have heterogeneous data structures that contain objects that belong to different subtypes of some common type.
• Subtyping in an object-oriented language allows functionality to be added without modifying general parts of a system.

Inheritance

• Inheritance is the ability to reuse the definition of one kind of object to define another kind of object.
• The importance of inheritance is that it saves the effort of duplicating (or reading duplicated) code and that, when one class is implemented by inheriting from another, changes to one affect the other. This has a significant impact on code maintenance and modification.

Inheritance is not subtyping

Subtyping is a relation on interfaces, inheritance is a relation on implementations.

One reason subtyping and inheritance are often confused is that some class mechanisms combine the two. A typical example is C++, in which A will be recognized by the compiler as a subtype of B only if B is a public base class of A.
Combining subtyping and inheritance is an elective design decision.

History of objects: SIMULA and Smalltalk

• Objects were invented in the design of SIMULA and refined in the evolution of Smalltalk.
• SIMULA: The first object-oriented language.
  The object model in SIMULA was based on procedure activation records, with objects originally described as procedures that return a pointer to their own activation record.
• Smalltalk: A dynamically typed object-oriented language.
  Many object-oriented ideas originated with or were popularised by the Smalltalk group, which built on Alan Kay's then-futuristic idea of the Dynabook.

SIMULA

• Extremely influential as the first language with classes, objects, dynamic lookup, subtyping, and inheritance.
• Originally designed for the purpose of simulation by O.-J. Dahl and K. Nygaard at the Norwegian Computing Center, Oslo, in the 1960s.
• SIMULA was designed as an extension and modification of Algol 60. The main features added to Algol 60 were: class concepts and reference variables (pointers to objects); pass-by-reference; input-output features; coroutines (a mechanism for writing concurrent programs).

A generic event-based simulation program

    Q := make_queue(initial_event);
    repeat
      select event e from Q
      simulate event e
      place all events generated by e on Q
    until Q is empty

naturally requires:
• A data structure that may contain a variety of kinds of events. ⇒ subtyping
• The selection of the simulation operation according to the kind of event being processed. ⇒ dynamic lookup
• Ways in which to structure the implementation of related kinds of events. ⇒ inheritance

Objects in SIMULA

Class: A procedure returning a pointer to its activation record.
Object: An activation record produced by call to a class, called an instance of the class. ⇒ a SIMULA object is a closure
• SIMULA implementations place objects on the heap.
• Objects are deallocated by the garbage collector (which deallocates objects only when they are no longer reachable from the program that created them).

SIMULA Object-oriented features

• Objects: A SIMULA object is an activation record produced by call to a class.
• Classes: A SIMULA class is a procedure that returns a pointer to its activation record. The body of a class may initialise the objects it creates.
• Dynamic lookup: Operations on an object are selected from the activation record of that object.

• Abstraction: Hiding was not provided in SIMULA 67 but was added later and used as the basis for C++.
• Subtyping: Objects are typed according to the classes that create them. Subtyping is determined by class hierarchy.
• Inheritance: A SIMULA class may be defined, by class prefixing, as an extension of a class that has already been defined, including the ability to redefine parts of a class in a subclass.

SIMULA 67 did not distinguish between public and private members of classes.
A later version of the language, however, allowed attributes to be made “protected”, which means that they are accessible for subclasses (but not for other classes), or “hidden”, in which case they are not accessible to subclasses either.

SIMULA Further object-related features

• Inner, which indicates that the method of a subclass should be called in combination with execution of superclass code that contains the inner keyword.
• Inspect and qua, which provide the ability to test the type of an object at run time and to execute appropriate code accordingly. (inspect is a class (type) test, and qua is a form of type cast that is checked for correctness at run time.)

SIMULA Sample code (a)

    CLASS POINT(X,Y); REAL X, Y;
    COMMENT***CARTESIAN REPRESENTATION
    BEGIN
      BOOLEAN PROCEDURE EQUALS(P); REF(POINT) P;
        IF P =/= NONE THEN
          EQUALS := ABS(X-P.X) + ABS(Y-P.Y) < 0.00001;
      REAL PROCEDURE DISTANCE(P); REF(POINT) P;
        IF P == NONE THEN ERROR ELSE
          DISTANCE := SQRT( (X-P.X)**2 + (Y-P.Y)**2 );
    END***POINT***

(a) See Chapter 4 (§1) of SIMULA begin (2nd edition) by G. Birtwistle, O.-J. Dahl, B. Myhrhaug, and K. Nygaard. Chartwell-Bratt Ltd., 1980.


    CLASS LINE(A,B,C); REAL A,B,C;
    COMMENT***Ax+By+C=0 REPRESENTATION
    BEGIN
      BOOLEAN PROCEDURE PARALLELTO(L); REF(LINE) L;
        IF L =/= NONE THEN
          PARALLELTO := ABS( A*L.B - B*L.A ) < 0.00001;
      REF(POINT) PROCEDURE MEETS(L); REF(LINE) L;
        BEGIN REAL T;
          IF L =/= NONE and ~PARALLELTO(L) THEN
            BEGIN
              ...
              MEETS :- NEW POINT(...,...);
            END;
        END;***MEETS***

      COMMENT*** INITIALISATION CODE
      REAL D;
      D := SQRT( A**2 + B**2 );
      IF D = 0.0 THEN ERROR ELSE
        BEGIN
          D := 1/D;
          A := A*D; B := B*D; C := C * D;
        END;
    END***LINE***

SIMULA Subclasses and inheritance

SIMULA syntax for a class C1 with subclasses C2 and C3 is

    CLASS C1 ... ;
    C1 CLASS C2 ... ;
    C1 CLASS C3 ... ;

When we create a C2 object, for example, we do this by first creating a C1 object (activation record) and then appending a C2 object (activation record).

Example:

    POINT CLASS COLOREDPOINT(C); COLOR C;
    BEGIN
      BOOLEAN PROCEDURE EQUALS(Q); REF(COLOREDPOINT) Q;
        ...;
    END***COLOREDPOINT**

    REF(POINT) P; REF(COLOREDPOINT) CP;
    P :- NEW POINT(1.0,2.5);
    CP :- NEW COLOREDPOINT(2.5,1.0,RED);

NB: SIMULA 67 did not hide fields. Thus,

    CP.C := BLUE;

changes the color of the point referenced by CP.


SIMULA Object types and subtypes

• All instances of a class are given the same type. The name of this type is the same as the name of the class.
• The class names (types of objects) are arranged in a subtype hierarchy corresponding exactly to the subclass hierarchy.

Examples:

1.  CLASS A; A CLASS B;
    REF(A) a; REF(B) b;
    a :- b; COMMENT***legal since B is
                   ***a subclass of A
    ...
    b :- a; COMMENT***also legal, but checked at
                   ***run time to make sure that
                   ***a points to a B object, so
                   ***as to avoid a type error

2.  inspect a
      when B do b :- a
      otherwise ...

3. An error in the original SIMULA type checker surrounding the relationship between subtyping and inheritance:

    CLASS A; A CLASS B;

   SIMULA subclassing produces the subtype relation B<:A.

    REF(A) a; REF(B) b;

   SIMULA also uses the semantically incorrect principle that, if B<:A then REF(B)<:REF(A).

   So: this code ...

    PROCEDURE ASSIGNa( REF(A) x )
      BEGIN x :- a END;
    ASSIGNa(b);

   ... will statically type check, but may cause a type error at run time.

P.S. The same type error occurs in the original implementation of Eiffel. A similar problem occurs in Java's covariant arrays (see later).
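Returning to the generic event-based simulation program that motivated SIMULA, its loop can be sketched in Python as follows. All names (`Event`, `Arrival`, `Departure`, `run`) and the rule that an arrival generates a departure two time units later are hypothetical, chosen only to show where subtyping, dynamic lookup and inheritance enter.

```python
from collections import deque


class Event:
    def __init__(self, time):
        self.time = time

    def label(self):
        # Written once and reused by every kind of event: inheritance.
        return f"{type(self).__name__}@{self.time}"

    def simulate(self):
        raise NotImplementedError


class Arrival(Event):
    def simulate(self):
        return [Departure(self.time + 2)]   # an arrival generates a later departure


class Departure(Event):
    def simulate(self):
        return []                           # a departure generates nothing further


def run(initial_event):
    queue = deque([initial_event])   # may hold any kind of Event: subtyping
    trace = []
    while queue:
        e = queue.popleft()
        trace.append(e.label())
        queue.extend(e.simulate())   # operation chosen by the kind of e: dynamic lookup
    return trace


trace = run(Arrival(0))
print(trace)
```

The loop itself never needs to change when a new kind of event is added; only a new subclass with its own `simulate` is required.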


Smalltalk

• Developed at XEROX PARC in the 1970s.
• Major language that popularised objects; very flexible and powerful.
• The object metaphor was extended and refined.
  - Used some ideas from SIMULA; but it was a completely new language, with new terminology and an original syntax.
  - Abstraction via private instance variables (data associated with an object) and public methods (code for performing operations).
  - Everything is an object; even a class. All operations are messages to objects.

Smalltalk Motivating application: Dynabook

• Concept developed by Alan Kay.
• Influence on Smalltalk:
  - Objects and classes as useful organising concepts for building an entire programming environment and system.
  - Language intended to be the operating system interface as well as the programming language for Dynabook.
  - Syntax designed to be used with a special-purpose editor.
  - The implementation emphasised flexibility and ease of use over efficiency.

Smalltalk Terminology

• Object: A combination of private data and functions. Each object is an instance of some class.
• Class: A template defining the implementation of a set of objects.
• Subclass: Class defined by inheriting from its superclass.
• Selector: The name of a message (analogous to a function name).
• Message: A selector together with actual parameter values (analogous to a function call).
• Method: The code in a class for responding to a message.
• Instance variable: Data stored in an individual object (instance of a class).

Smalltalk Classes and objects

    class name      Point
    super class     Object
    class var       pi
    instance var    x, y
    class messages and methods
        ⟨...names and codes for methods...⟩
    instance messages and methods
        ⟨...names and codes for methods...⟩

    Definition of Point class


A class message and method for point objects

    newX:xvalue Y:yvalue ||
      ^ self new x: xvalue y: yvalue

A new point at coordinates (3, 4) is created when the message

    newX:3 Y:4

is sent to the Point class. For instance:

    p <- Point newX:3 Y:4

Some instance messages and methods

    x || ^x
    y || ^y
    moveDx: dx Dy: dy ||
      x <- x+dx
      y <- y+dy

After executing the following code

    p moveDx:2 Dy:1

the value of each of the expressions p x and p y is the object 5.

Smalltalk Inheritance

    class name      ColoredPoint
    super class     Point
    class var
    instance var    color
    class messages and methods
        newX:xv Y:yv C:cv    ⟨...code...⟩
    instance messages and methods
        color    ||^color
        draw     ⟨...code...⟩

    Definition of ColoredPoint class

• ColoredPoint inherits instance variables x and y, methods x, y, moveDx:Dy:, etc.
• ColoredPoint adds an instance variable color and a method color to return the color of a ColoredPoint.
• The ColoredPoint draw method redefines (or overrides) the one inherited from Point.
• An option available in Smalltalk is to specify that a superclass method should be undefined on a subclass.


Example: Consider

    newX:xv Y:yv C:cv ||
      ^ self new x:xv y:yv color:cv

    cp <- ColoredPoint newX:1 Y:2 C:red
    cp moveDx:3 Dy:4

The value of cp x is the object 4, and the value of the expression cp color is the object red.
Note that even though moveDx:Dy: is an inherited method, defined originally for points without color, the result of moving a ColoredPoint is again a ColoredPoint.

Smalltalk Abstraction

Smalltalk rules:
• Methods are public.
  Any code with a pointer to an object may send any message to that object. If the corresponding method is defined in the class of the object, or any superclass, the method will be invoked. This makes all methods of an object visible to any code that can access the object.
• Instance variables are protected.
  The instance variables of an object are accessible only to methods of the class of the object and to methods of its subclasses.

Smalltalk Dynamic lookup

The run-time structures used for Smalltalk classes and objects support dynamic lookup in two ways.
1. Methods are selected through the receiver object.
2. Method lookup starts with the method dictionary of the class of the receiver and then proceeds upwards through the class hierarchy.

Example: A factorial method

    factorial ||
      self <= 1
        ifTrue: [^1]
        ifFalse: [^ (self-1) factorial * self]

in the Integer class for

              Integer
             /       \
      SmallInt       LargeInt
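The two-step lookup described above (start at the method dictionary of the receiver's class, then walk up the superclass chain) can be sketched as follows. This is a toy model, not how any Smalltalk system is implemented: `Klass`, `send`, `small_int` and `MessageNotUnderstood` are hypothetical names, and objects are plain dictionaries holding a class pointer and a value.

```python
class Klass:
    """Toy stand-in for a Smalltalk class: a method dictionary and a superclass pointer."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.methods = {}


class MessageNotUnderstood(Exception):
    pass


def send(receiver, selector, *args):
    cls = receiver["class"]            # lookup starts at the class of the receiver
    while cls is not None:
        method = cls.methods.get(selector)
        if method is not None:
            return method(receiver, *args)
        cls = cls.superclass           # not found here: try the superclass
    raise MessageNotUnderstood(selector)


Integer = Klass("Integer")
SmallInt = Klass("SmallInt", superclass=Integer)


def small_int(n):
    return {"class": SmallInt, "value": n}


def factorial(self):
    n = self["value"]
    if n <= 1:
        return 1
    return send(small_int(n - 1), "factorial") * n


Integer.methods["factorial"] = factorial   # defined once, in the superclass

answer = send(small_int(5), "factorial")   # found one step above SmallInt
print(answer)
```

Sending a selector that no class in the chain defines ends in the toy equivalent of the “message not understood” error.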


Smalltalk Interfaces as object types

Although Smalltalk does not use any static type checking, there is an implicit form of type that every Smalltalk programmer uses in some way.
The type of an object in Smalltalk is its interface, i.e. the set of messages that can be sent to the object without receiving the error “message not understood”.
The interface of an object is determined by its class, as a class lists the messages that each object will answer. However, different classes may implement the same messages, as there are no Smalltalk rules to keep different classes from using the same selector names.

Smalltalk Subtyping

Type A is a subtype of type B if any context expecting an expression of type B may take any expression of type A without introducing a type error.

Semantically, in Smalltalk, it makes sense to associate subtyping with the superset relation on class interfaces.
? Why?

• In Smalltalk, the interface of a subclass is often a subtype of the interface of its superclass. The reason is that a subclass will ordinarily inherit all of the methods of its superclass, possibly adding more methods.
• In general, however, subclassing does not always lead to subtyping in Smalltalk.
  1. Because it is possible to delete a method from a superclass in a subclass, a subclass may not produce a subtype.
  2. On the other hand, it is easy to have subtyping without inheritance.

Smalltalk Object-oriented features

• Objects: A Smalltalk object is created by a class.
  At run time, an object stores its instance variables and a pointer to the instantiating class.
• Classes: A Smalltalk class defines variables, class methods, and the instance methods that are shared by all objects of the class.
  At run time, the class data structure contains pointers to an instance variable template, a method dictionary, and the superclass.

• Abstraction: Abstraction is provided through protected instance variables. All methods are public but instance variables may be accessed only by the methods of the class and methods of subclasses.
• Subtyping: Smalltalk does not have a compile-time type system. Subtyping arises implicitly through relations between the interfaces of objects. Subtyping depends on the set of messages that are understood by an object, not the representation of objects or whether inheritance is used.
• Inheritance: Smalltalk subclasses inherit all instance variables and methods of their superclasses. Methods defined in a superclass may be redefined in a subclass or deleted.

Topic VI
Types in programming languages

References:
• Chapter 6 of Concepts in programming languages by J. C. Mitchell. CUP, 2003.
• Sections 4.9 and 8.6 of Programming languages: Concepts & constructs by R. Sethi (2ND EDITION). Addison-Wesley, 1996.

Types in programming

• A type is a collection of computational entities that share some common property.
• There are three main uses of types in programming languages:
  1. naming and organizing concepts,
  2. making sure that bit sequences in computer memory are interpreted consistently,
  3. providing information to the compiler about data manipulated by the program.

• Using types to organise a program makes it easier for someone to read, understand, and maintain the program. Types can serve an important purpose in documenting the design and intent of the program.
• Type information in programs can be used for many kinds of optimisations.


Type safety

A programming language is type safe if no program is allowed to violate its type distinctions.

    Safety        Example language              Explanation
    Not safe      C, C++                        Type casts, pointer arithmetic
    Almost safe   Pascal                        Explicit deallocation; dangling pointers
    Safe          LISP, SML, Smalltalk, Java    Type checking

Type systems

A type system for a language is a set of rules for associating a type with phrases in the language.
Terms strong and weak refer to the effectiveness with which a type system prevents errors. A type system is strong if it accepts only safe phrases. In other words, phrases that are accepted by a strong type system are guaranteed to evaluate without type error. A type system is weak if it is not strong.

Type checking

A type error occurs when a computational entity is used in a manner that is inconsistent with the concept it represents.
Type checking is used to prevent some or all type errors, ensuring that the operations in a program are applied properly.
Some questions to be asked about type checking in a language:
• Is the type system strong or weak?
• Is the checking done statically or dynamically?
• How expressive is the type system; that is, amongst safe programs, how many does it accept?

Static and dynamic type checking

Run-time type checking: The compiler generates code so that, when an operation is performed, the code checks to make sure that the operands have the correct types.
  Examples: LISP, Smalltalk.
Compile-time type checking: The compiler checks the program text for potential type errors.
  Example: SML.
NB: Most programming languages use some combination of compile-time and run-time type checking.


Static vs. dynamic type checking

Main trade-offs between compile-time and run-time checking:

    Form of type checking   Advantages                            Disadvantages
    Run-time                Prevents type errors                  Slows program execution
    Compile-time            Prevents type errors                  May restrict programming
                            Eliminates run-time tests             because tests are
                            Finds type errors before execution    conservative
                            and run-time tests

Type checking in ML

Idea: Given a context Γ, an expression e, and a type τ, decide whether or not the expression e is of type τ in context Γ.

Examples:

\[
\frac{\Gamma \vdash e_1 : \mathtt{bool} \qquad \Gamma \vdash e_2 : \mathtt{bool}}
     {\Gamma \vdash e_1 \ \mathtt{orelse}\ e_2 : \mathtt{bool}}
\]

\[
TC(\Gamma,\ e_1 \ \mathtt{orelse}\ e_2,\ \tau) =
\begin{cases}
TC(\Gamma, e_1, \mathtt{bool}) \wedge TC(\Gamma, e_2, \mathtt{bool}) & \text{if } \tau = \mathtt{bool} \\
\mathit{false} & \text{otherwise}
\end{cases}
\]

\[
\frac{\Gamma \vdash e_1 : \tau_1 \qquad \Gamma \vdash e_2 : \tau_2}
     {\Gamma \vdash (e_1, e_2) : \tau_1 * \tau_2}
\]

\[
TC(\Gamma,\ (e_1, e_2),\ \tau) =
\begin{cases}
TC(\Gamma, e_1, \tau_1) \wedge TC(\Gamma, e_2, \tau_2) & \text{if } \tau = \tau_1 * \tau_2 \\
\mathit{false} & \text{otherwise}
\end{cases}
\]

Type equality

The question of type equality arises during type checking.
? What does it mean for two types to be equal!?

Structural equality. Two type expressions are structurally equal if and only if they are equivalent under the following three rules.
  SE1. A type name is structurally equal to itself.
  SE2. Two types are structurally equal if they are formed by applying the same type constructor to structurally equal types.
  SE3. After a type declaration, say type n = T, the type name n is structurally equal to T.
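The two TC equations (for orelse and for pairs) translate almost directly into a recursive function. The sketch below uses a hypothetical encoding that is not part of the slides: expressions and types are tuples, the context Γ is a dictionary from variable names to types, and a pair type τ1 * τ2 is written `("*", t1, t2)`.

```python
def tc(ctx, e, t):
    """Decide whether expression e has type t in context ctx."""
    kind = e[0]
    if kind == "var":                      # ("var", name)
        return ctx.get(e[1]) == t
    if kind == "bool":                     # ("bool", True) or ("bool", False)
        return t == "bool"
    if kind == "int":                      # ("int", 3)
        return t == "int"
    if kind == "orelse":                   # ("orelse", e1, e2)
        return t == "bool" and tc(ctx, e[1], "bool") and tc(ctx, e[2], "bool")
    if kind == "pair":                     # ("pair", e1, e2)
        is_product = isinstance(t, tuple) and len(t) == 3 and t[0] == "*"
        return is_product and tc(ctx, e[1], t[1]) and tc(ctx, e[2], t[2])
    raise ValueError("unknown expression form: " + str(kind))


ctx = {"x": "bool", "n": "int"}
disj = ("orelse", ("var", "x"), ("bool", True))
pair = ("pair", ("var", "n"), disj)

print(tc(ctx, disj, "bool"))
print(tc(ctx, pair, ("*", "int", "bool")))
print(tc(ctx, pair, ("*", "bool", "bool")))
```

In the pair case the component types are read off the expected type, which is exactly the side condition τ = τ1 * τ2 in the equation.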


Name equality:

Pure name equality. A type name is equal to itself, but no constructed type is equal to any other constructed type.
Transitive name equality. A type name is equal to itself and can be declared equal to other type names.
Type-expression equality. A type name is equal only to itself. Two type expressions are equal if they are formed by applying the same constructor to equal expressions. In other words, the expressions have to be identical.

Examples:

• Type equality in Pascal/Modula-2. Type equality was left ambiguous in Pascal. Its successor, Modula-2, avoided ambiguity by defining two types to be compatible if
  1. they are the same name, or
  2. they are s and t, and s = t is a type declaration, or
  3. one is a subrange of the other, or
  4. both are subranges of the same basic type.
• Type equality in C/C++. C uses structural equivalence for all types except for records (structs). struct types are named in C and C++ and the name is treated as a type, equal only to itself. This constraint saves C from having to deal with recursive types.

Type declarations
There are two basic forms of type declarations:
Transparent. An alternative name is given to a type that can also be expressed without this name.
Opaque. A new type is introduced into the program that is not equal to any other type.

Type inference
• Type inference is the process of determining the types of phrases based on the constructs that appear in them.
• An important language innovation.
• A cool algorithm.
• Gives some idea of how other static analysis algorithms work.

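Type inference in ML
Idea

Typing rule:
$$\Gamma \vdash x : \tau \qquad \text{if } x : \tau \text{ in } \Gamma$$
Inference rule:
$$\Gamma \vdash x : \gamma \qquad \gamma \approx \alpha \qquad \text{if } x : \alpha \text{ in } \Gamma$$

Typing rule:
$$\frac{\Gamma \vdash f : \sigma \to \tau \qquad \Gamma \vdash e : \sigma}{\Gamma \vdash f(e) : \tau}$$
Inference rule:
$$\frac{\Gamma \vdash f : \alpha \qquad \Gamma \vdash e : \beta}{\Gamma \vdash f(e) : \gamma} \qquad \alpha \approx \beta \to \gamma$$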

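Typing rule:
$$\frac{\Gamma, x : \sigma \vdash e : \tau}{\Gamma \vdash (\mathtt{fn}\ x \Rightarrow e) : \sigma \to \tau}$$
Inference rule:
$$\frac{\Gamma, x : \alpha \vdash e : \beta}{\Gamma \vdash (\mathtt{fn}\ x \Rightarrow e) : \gamma} \qquad \gamma \approx \alpha \to \beta$$

Example:
$$\dfrac{\dfrac{\dfrac{f : \alpha_1, x : \alpha_3 \vdash f : \alpha_5 \ \surd \qquad \dfrac{f : \alpha_1, x : \alpha_3 \vdash f : \alpha_7 \ \surd \qquad f : \alpha_1, x : \alpha_3 \vdash x : \alpha_8 \ \surd}{f : \alpha_1, x : \alpha_3 \vdash f(x) : \alpha_6}}{f : \alpha_1, x : \alpha_3 \vdash f(f(x)) : \alpha_4}}{f : \alpha_1 \vdash \mathtt{fn}\ x \Rightarrow f(f(x)) : \alpha_2}}{\vdash \mathtt{fn}\ f \Rightarrow \mathtt{fn}\ x \Rightarrow f(f(x)) : \alpha_0}$$

$$\alpha_0 \approx \alpha_1 \to \alpha_2,\quad \alpha_2 \approx \alpha_3 \to \alpha_4,\quad \alpha_5 \approx \alpha_6 \to \alpha_4,\quad \alpha_5 \approx \alpha_1,\quad \alpha_7 \approx \alpha_8 \to \alpha_6,\quad \alpha_7 \approx \alpha_1,\quad \alpha_8 \approx \alpha_3$$

Solution: $\alpha_0 = (\alpha_3 \to \alpha_3) \to \alpha_3 \to \alpha_3$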
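The constraints of the example can be solved mechanically by unification. The following is a minimal sketch, assuming a type language with only type variables and function types; the names Ty, TVar, Fun, resolve, occurs and unify are illustrative and not taken from the course.

```scala
sealed trait Ty
final case class TVar(id: Int) extends Ty
final case class Fun(arg: Ty, res: Ty) extends Ty

// Apply a substitution to a type, following variable bindings to the end.
def resolve(s: Map[Int, Ty], t: Ty): Ty = t match {
  case TVar(i)   => s.get(i).map(resolve(s, _)).getOrElse(t)
  case Fun(a, r) => Fun(resolve(s, a), resolve(s, r))
}

// Occurs check: binding a variable to a type containing it would be circular.
def occurs(i: Int, t: Ty): Boolean = t match {
  case TVar(j)   => i == j
  case Fun(a, r) => occurs(i, a) || occurs(i, r)
}

// Solve a list of constraints l ≈ r; None means the constraints are unsatisfiable.
def unify(cs: List[(Ty, Ty)], s: Map[Int, Ty] = Map.empty): Option[Map[Int, Ty]] = cs match {
  case Nil => Some(s)
  case (l, r) :: rest =>
    (resolve(s, l), resolve(s, r)) match {
      case (a, b) if a == b              => unify(rest, s)
      case (TVar(i), b) if !occurs(i, b) => unify(rest, s + (i -> b))
      case (a, TVar(i)) if !occurs(i, a) => unify(rest, s + (i -> a))
      case (Fun(a1, r1), Fun(a2, r2))    => unify((a1, a2) :: (r1, r2) :: rest, s)
      case _                             => None
    }
}

def v(i: Int): Ty = TVar(i)

// The seven constraints generated for fn f => fn x => f(f(x)).
val constraints: List[(Ty, Ty)] = List(
  v(0) -> Fun(v(1), v(2)), v(2) -> Fun(v(3), v(4)), v(5) -> Fun(v(6), v(4)),
  v(5) -> v(1), v(7) -> Fun(v(8), v(6)), v(7) -> v(1), v(8) -> v(3)
)
val alpha0: Option[Ty] = unify(constraints).map(resolve(_, v(0)))
```

Here alpha0 resolves to the type written (α3 -> α3) -> α3 -> α3 on the slide.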

177 178

Polymorphism
Polymorphism, which literally means “having multiple forms”, refers to constructs that can take on different types as needed.
Forms of polymorphism in contemporary programming languages:
Parametric polymorphism. A function may be applied to any arguments whose types match a type expression involving type variables.
Parametric polymorphism may be:
  Implicit. Programs do not need to contain types; types and instantiations of type variables are computed. Example: SML.
  Explicit. The program text contains type variables that determine the way that a construct may be treated polymorphically. Explicit polymorphism often involves explicit instantiation or type application to indicate how type variables are replaced with specific types in the use of a polymorphic construct. Example: C++ templates.
Ad hoc polymorphism or overloading. Two or more implementations with different types are referred to by the same name.
Subtype polymorphism. The subtype relation between types allows an expression to have many possible types.

let-polymorphism
• The standard sugaring
    let val x = v in e end  ↦  (fn x => e)(v)
  does not respect ML type checking. For instance
    let val f = fn x => x in f(f) end
  type checks, whilst
    (fn f => f(f))(fn x => x)
  does not.
• Type inference for let-expressions is involved, requiring type schemes.

Polymorphic exceptions
Example: Depth-first search for finitely-branching trees.
  datatype
    'a FBtree = node of 'a * 'a FBtree list ;
  fun dfs P (t: 'a FBtree)
    = let
        exception Ok of 'a;
        fun auxdfs( node(n,F) )
          = if P n then raise Ok n
            else foldl (fn(t,_) => auxdfs t) NONE F ;
      in
        auxdfs t handle Ok n => SOME n
      end ;
  val dfs = fn : ('a -> bool) -> 'a FBtree -> 'a option

181 182

When a polymorphic exception is declared, SML ensures that it is used with only one type. The type of a top level exception must be monomorphic and the type variables of a local exception are frozen.
Consider the following nonsense:
  exception Poly of 'a ; (*** ILLEGAL!!! ***)
  (raise Poly true) handle Poly x => x+1 ;

Topic VII
Data abstraction and modularity
SML Modules (a)
References:
• Chapter 7 of ML for the working programmer (2ND EDITION) by L. C. Paulson. CUP, 1996.

(a) Largely based on an Introduction to SML Modules by Claudio Russo <http://research.microsoft.com/~crusso>.

• The Standard ML Basis Library edited by E. R. Gansner and J. H. Reppy. CUP, 2004. [A useful introduction to SML standard libraries, and a good example of modular programming.]
• <http://www.standardml.org/>

The Core and Modules languages
SML consists of two sub-languages:
• The Core language is for programming in the small, by supporting the definition of types and expressions denoting values of those types.
• The Modules language is for programming in the large, by grouping related Core definitions of types and expressions into self-contained units, with descriptive interfaces.
The Core expresses details of data structures and algorithms. The Modules language expresses software architecture. Both languages are largely independent.

The Modules language
Writing a real program as an unstructured sequence of Core definitions quickly becomes unmanageable.
  type nat = int
  val zero = 0
  fun succ x = x + 1
  fun iter b f i =
    if i = zero then b
    else f (iter b f (i-1))
  ...
  (* thousands of lines later *)
  fun even (n:nat) = iter true not n
The SML Modules language lets one split large programs into separate units with descriptive interfaces.

SML Modules
Signatures and structures
An abstract data type is a type equipped with a set of operations, which are the only operations applicable to that type. Its representation can be changed without affecting the rest of the program.
• Structures let us package up declarations of related types, values, and functions.
• Signatures let us specify what components a structure must contain.

Structures
In Modules, one can encapsulate a sequence of Core type and value definitions into a unit called a structure. We enclose the definitions in between the keywords struct . . . end.
Example: A structure representing the natural numbers, as non-negative integers.
  struct
    type nat = int
    val zero = 0
    fun succ x = x + 1
    fun iter b f i = if i = zero then b
                     else f (iter b f (i-1))
  end

The dot notation
One can name a structure by binding it to an identifier.
  structure IntNat =
    struct
      type nat = int
      ...
      fun iter b f i = ...
    end
Components of a structure are accessed with the dot notation.
  fun even (n:IntNat.nat) = IntNat.iter true not n
NB: Type IntNat.nat is statically equal to int. Value IntNat.iter dynamically evaluates to a closure.

189 190

Nested structures
Structures can be nested inside other structures, in a hierarchy.
  structure IntNatAdd =
    struct
      structure Nat = IntNat
      fun add n m = Nat.iter m Nat.succ n
    end
  ...
  fun mult n m =
    IntNatAdd.Nat.iter IntNatAdd.Nat.zero (IntNatAdd.add m) n
The dot notation (IntNatAdd.Nat) accesses a nested structure. Sequencing dots provides deeper access (IntNatAdd.Nat.zero). Nesting and dot notation provide name-space control.

Concrete signatures
Signature expressions specify the types of structures by listing the specifications of their components. A signature expression consists of a sequence of component specifications, enclosed in between the keywords sig . . . end.
  sig type nat = int
      val zero : nat
      val succ : nat -> nat
      val 'a iter : 'a -> ('a->'a) -> nat -> 'a
  end
This signature fully describes the type of IntNat. The specification of type nat is concrete: it must be int.

Opaque signatures
On the other hand, the following signature
  sig type nat
      val zero : nat
      val succ : nat -> nat
      val 'a iter : 'a -> ('a->'a) -> nat -> 'a
  end
specifies structures that are free to use any implementation for type nat (perhaps int, or word, or some recursive datatype). This specification of type nat is opaque.

Example: Polymorphic functional stacks.
  signature STACK =
  sig
    exception E
    type 'a reptype (* <-- INTERNAL REPRESENTATION *)
    val new: 'a reptype
    val push: 'a -> 'a reptype -> 'a reptype
    val pop: 'a reptype -> 'a reptype
    val top: 'a reptype -> 'a
  end ;

193 194

  structure MyStack: STACK =
  struct
    exception E ;
    type 'a reptype = 'a list ;
    val new = [] ;
    fun push x s = x::s ;
    fun split( h::t ) = ( h , t )
      | split _ = raise E ;
    fun pop s = #2( split s ) ;
    fun top s = #1( split s ) ;
  end ;

  val MyEmptyStack = MyStack.new ;
  val MyStack0 = MyStack.push 0 MyEmptyStack ;
  val MyStack01 = MyStack.push 1 MyStack0 ;
  val MyStack0' = MyStack.pop MyStack01 ;
  MyStack.top MyStack0' ;

  val MyEmptyStack = [] : 'a MyStack.reptype
  val MyStack0 = [0] : int MyStack.reptype
  val MyStack01 = [1,0] : int MyStack.reptype
  val MyStack0' = [0] : int MyStack.reptype
  val it = 0 : int

Named and nested signatures
Signatures may be named and referenced, to avoid repetition:
  signature NAT =
  sig type nat
      val zero : nat
      val succ : nat -> nat
      val 'a iter : 'a -> ('a->'a) -> nat -> 'a
  end
Nested signatures specify named sub-structures:
  signature Add =
  sig structure Nat: NAT (* references NAT *)
      val add: Nat.nat -> Nat.nat -> Nat.nat
  end

Signature inclusion
To avoid nesting, one can also directly include a signature identifier:
  sig include NAT
      val add: nat -> nat -> nat
  end
NB: This is equivalent to the following signature.
  sig type nat
      val zero: nat
      val succ: nat -> nat
      val 'a iter: 'a -> ('a->'a) -> nat -> 'a
      val add: nat -> nat -> nat
  end

197 198

Signature matching
Q: When does a structure satisfy a signature?
A: The type of a structure matches a signature whenever it implements at least the components of the signature.
• The structure must realise (i.e. define) all of the opaque type components in the signature.
• The structure must enrich this realised signature, component-wise:
  ⋆ every concrete type must be implemented equivalently;
  ⋆ every specified value must have a more general type scheme;
  ⋆ every specified structure must be enriched by a substructure.

Properties of signature matching
• The components of a structure can be defined in a different order than in the signature; names matter but ordering does not.
• A structure may contain more components, or components of more general types, than are specified in a matching signature.
• Signature matching is structural. A structure can match many signatures and there is no need to pre-declare its matching signatures (unlike “interfaces” in Java and C#).
• Although similar to record types, signatures actually play a number of different roles.

Using signatures to restrict access
The following structure uses a signature constraint to provide a restricted view of IntNat:
  structure ResIntNat =
    IntNat : sig type nat
                 val succ : nat->nat
                 val iter : nat->(nat->nat)->nat->nat
             end
NB: The constraint str:sig prunes the structure str according to the signature sig:
• ResIntNat.zero is undefined;
• ResIntNat.iter is less polymorphic than IntNat.iter.

Subtyping
Signature matching supports a form of subtyping not found in the Core language:
• A structure with more type, value, and structure components may be used where fewer components are expected.
• A value component may have a more general type scheme than expected.

Transparency of :
Although the : operator can hide names, it does not conceal the definitions of opaque types. Thus, the fact that ResIntNat.nat = IntNat.nat = int remains transparent.
For instance the application ResIntNat.succ(~3) is still well-typed, because ~3 has type int . . . but ~3 is negative, so not a valid representation of a natural number!

SML Modules
Information hiding
In SML, we can limit outside access to the components of a structure by constraining its signature in transparent or opaque manners.
Further, we can hide the representation of a type by means of an abstype declaration.
The combination of these methods yields abstract structures.

Using signatures to hide the identity of types
With different syntax, signature matching can also be used to enforce data abstraction:
  structure AbsNat =
    IntNat :> sig type nat
                  val zero: nat
                  val succ: nat->nat
                  val 'a iter: 'a->('a->'a)->nat->'a
              end
The constraint str :> sig prunes str but also generates a new, abstract type for each opaque type in sig.

• The actual implementation of AbsNat.nat by int is hidden, so that AbsNat.nat ≠ int. AbsNat is just IntNat, but with a hidden type representation.
• AbsNat defines an abstract datatype of natural numbers: the only way to construct and use values of the abstract type AbsNat.nat is through the operations, zero, succ, and iter.
  E.g., the application AbsNat.succ(~3) is ill-typed: ~3 has type int, not AbsNat.nat. This is what we want, since ~3 is not a natural number in our representation.
In general, abstractions can also prune and specialise components.

205 206

1. Opaque signature constraints
  structure MyOpaqueStack :> STACK = MyStack ;

  val MyEmptyOpaqueStack = MyOpaqueStack.new ;
  val MyOpaqueStack0 = MyOpaqueStack.push 0 MyEmptyOpaqueStack ;
  val MyOpaqueStack01 = MyOpaqueStack.push 1 MyOpaqueStack0 ;
  val MyOpaqueStack0' = MyOpaqueStack.pop MyOpaqueStack01 ;
  MyOpaqueStack.top MyOpaqueStack0' ;

  val MyEmptyOpaqueStack = - : 'a MyOpaqueStack.reptype
  val MyOpaqueStack0 = - : int MyOpaqueStack.reptype
  val MyOpaqueStack01 = - : int MyOpaqueStack.reptype
  val MyOpaqueStack0' = - : int MyOpaqueStack.reptype
  val it = 0 : int

2. abstypes
  structure MyHiddenStack: STACK =
  struct
    exception E ;
    abstype 'a reptype = S of 'a list (* <-- HIDDEN *)
    with                              (* REPRESENTATION *)
      val new = S [] ;
      fun push x (S s) = S( x::s ) ;
      fun pop( S [] ) = raise E
        | pop( S(_::t) ) = S( t ) ;
      fun top( S [] ) = raise E
        | top( S(h::_) ) = h ;
    end ;
  end ;

  val MyHiddenEmptyStack = MyHiddenStack.new ;
  val MyHiddenStack0 = MyHiddenStack.push 0 MyHiddenEmptyStack ;
  val MyHiddenStack01 = MyHiddenStack.push 1 MyHiddenStack0 ;
  val MyHiddenStack0' = MyHiddenStack.pop MyHiddenStack01 ;
  MyHiddenStack.top MyHiddenStack0' ;

  val MyHiddenEmptyStack = - : 'a MyHiddenStack.reptype
  val MyHiddenStack0 = - : int MyHiddenStack.reptype
  val MyHiddenStack01 = - : int MyHiddenStack.reptype
  val MyHiddenStack0' = - : int MyHiddenStack.reptype
  val it = 0 : int

SML Modules
Functors
• An SML functor is a structure that takes other structures as parameters.
• Functors let us write program units that can be combined in different ways. Functors can also express generic algorithms.

209 210

Functors
Modules also supports parameterised structures, called functors.
Example: The functor AddFun below takes any implementation, N, of naturals and re-exports it with an addition operation.
  functor AddFun(N:NAT) =
    struct
      structure Nat = N
      fun add n m = Nat.iter n (Nat.succ) m
    end

• A functor is a function mapping a formal argument structure to a concrete result structure.
• The body of a functor may assume no more information about its formal argument than is specified in its signature. In particular, opaque types are treated as distinct type parameters. Each actual argument can supply its own, independent implementation of opaque types.
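For readers who think in Scala, the functor idea has a loose analogue. This is only an analogy, not SML semantics: the trait Nat stands in for the signature NAT, its abstract type member N for the opaque type nat, and the class AddFun for the functor. The opaque type appears as the type parameter T, echoing the remark that opaque types are treated as type parameters. All Scala names here are assumptions for the sketch.

```scala
// Rough analogue of signature NAT: N plays the role of the opaque type nat.
trait Nat {
  type N
  def zero: N
  def succ(n: N): N
  def iter[A](b: A)(f: A => A)(i: N): A
}

// Rough analogue of structure IntNat.
object IntNat extends Nat {
  type N = Int
  val zero: Int = 0
  def succ(n: Int): Int = n + 1
  def iter[A](b: A)(f: A => A)(i: Int): A =
    if (i == zero) b else f(iter(b)(f)(i - 1))
}

// Rough analogue of functor AddFun: the body sees only what Nat specifies.
class AddFun[T](val nat: Nat { type N = T }) {
  def add(n: T)(m: T): T = nat.iter(n)(nat.succ)(m)
}

// Analogue of: structure IntNatAdd = AddFun(IntNat)
val IntNatAdd = new AddFun[Int](IntNat)
```

With this instance, IntNatAdd.add(2)(3) applies succ three times to 2.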

Functor application
A functor may be used to create a structure by applying it to an actual argument:
  structure IntNatAdd = AddFun(IntNat)
  structure AbsNatAdd = AddFun(AbsNat)
The actual argument must match the signature of the formal parameter, so it can provide more components, of more general types.
Above, AddFun is applied twice, but to arguments that differ in their implementation of type nat (AbsNat.nat ≠ IntNat.nat).

Example: Generic imperative stacks.
  signature STACK =
  sig
    type itemtype
    val push: itemtype -> unit
    val pop: unit -> unit
    val top: unit -> itemtype
  end ;

  exception E ;
  functor Stack( T: sig type atype end ) : STACK =
  struct
    type itemtype = T.atype
    val stack = ref( []: itemtype list )
    fun push x
      = ( stack := x :: !stack )
    fun pop()
      = case !stack of [] => raise E
                     | _::s => ( stack := s )
    fun top()
      = case !stack of [] => raise E
                     | t::_ => t
  end ;

  structure intStack
    = Stack(struct type atype = int end) ;

  structure intStack : STACK

  intStack.push(0) ;
  intStack.top() ;
  intStack.pop() ;
  intStack.push(4) ;

  val it = () : unit
  val it = 0 : intStack.itemtype
  val it = () : unit
  val it = () : unit

  map ( intStack.push ) [3,2,1] ;
  map ( fn _ => let val top = intStack.top()
                in intStack.pop(); top end )
      [(),(),(),()] ;

  val it = [(),(),()] : unit list
  val it = [1,2,3,4] : intStack.itemtype list

Why functors?
Functors support:
Code reuse. AddFun may be applied many times to different structures, reusing its body.
Code abstraction. AddFun can be compiled before any argument is implemented.
Type abstraction. AddFun can be applied to different types N.nat.

217 218

Topic VIII
The state of the art
Scala <www.scala-lang.org>

References:
• Scala By Example by M. Odersky. Programming Methods Laboratory, EPFL, 2008.
• An overview of the Scala programming language by M. Odersky et al. Technical Report LAMP-REPORT-2006-001, Second Edition, 2006.
• A Scala Tutorial for Java Programmers by M. Schinz and P. Haller. Programming Methods Laboratory, EPFL, 2008.

Scala (I)
• Scala has been developed from 2001 in the Programming Methods Laboratory at EPFL by a group led by Martin Odersky. It was first released publicly in 2004, with a second version released in 2006.
• Scala is aimed at the construction of components and component systems. One of the major design goals of Scala was that it should be flexible enough to act as a convenient host language for domain specific languages implemented by library modules.

• Scala has been designed to work well with Java and C#. Every Java class is seen in Scala as two entities, a class containing all dynamic members and a singleton object, containing all static members. Scala classes and objects can also inherit from Java classes and implement Java interfaces. This makes it possible to use Scala code in a Java framework.
• Scala’s influences: Beta, C#, FamilyJ, gbeta, Haskell, Java, Jiazzi, ML≤, Moby, MultiJava, Nice, OCaml, Pizza, Sather, Smalltalk, SML, XQuery, etc.

221 222

A procedural language!
  def qsort( xs: Array[Int] ) {
    def swap(i: Int, j:Int) {
      val t = xs(i); xs(i) = xs(j); xs(j) = t
    }
    def sort(l: Int, r: Int) {
      val pivot = xs( (l+r)/2 ); var i = l; var j = r
      while (i <= j) {
        while ( lt( xs(i), pivot ) ) i += 1
        while ( lt( xs(j), pivot ) ) j -= 1
        if ( i<=j ) { swap(i,j); i += 1; j -= 1 }
      }
      if (l
    }
    sort(0,xs.length-1)
  }

NB:
• Definitions start with a reserved word.
• Type declarations use the colon notation.
• Array selections are written in functional notation. (In fact, arrays in Scala inherit from functions.)
• Block structure.

A declarative language!
  def qsort[T]( xs: Array[T] )( lt: (T,T)=>Boolean ): Array[T]
    = if ( xs.length <= 1 ) xs
      else {
        val pivot = xs( xs.length/2 )
        Array.concat( qsort( xs filter (x => lt(x,pivot)) )( lt ) ,
                      xs filter (x => x == pivot ) ,
                      qsort( xs filter (x => lt(pivot,x)) )( lt ) )
      }

NB:
• Polymorphism.
• Type declarations can often be omitted because the compiler can infer them from the context.
• Higher-order functions.
• The binary operation e ⋆ e′ is always interpreted as the method call e.⋆(e′).
• The equality operation == between values is designed to be transparent with respect to the type representation.
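A version of the declarative qsort that compiles with current Scala releases might look as follows. The ClassTag context bound is an addition not present on the slide; it is assumed here because Array.concat needs run-time element type information to build a generic array.

```scala
import scala.reflect.ClassTag

// Functional quicksort over arrays, parameterised by the ordering lt.
def qsort[T: ClassTag](xs: Array[T])(lt: (T, T) => Boolean): Array[T] =
  if (xs.length <= 1) xs
  else {
    val pivot = xs(xs.length / 2)
    Array.concat(
      qsort(xs.filter(x => lt(x, pivot)))(lt), // elements below the pivot
      xs.filter(x => x == pivot),              // elements equal to the pivot
      qsort(xs.filter(x => lt(pivot, x)))(lt)  // elements above the pivot
    )
  }
```

A call such as qsort(Array(3, 1, 2))(_ < _) passes the ordering as an anonymous function, which is the higher-order aspect the slide points out.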

225 226

Scala (II)
Scala fuses (1) object-oriented programming and (2) functional programming in a statically typed programming language.
1. Scala uses a uniform and pure object-oriented model similar to that of Smalltalk: Every value is an object and every operation is a message send (that is, the invocation of a method). In fact, even primitive types are not treated specially; they are defined as type aliases of Scala classes.
2. Scala is also a functional language in the sense that functions are first-class values.

Mutable state
• Real-world objects with state are represented in Scala by objects that have variables as members.
• In Scala, all mutable state is ultimately built from variables.
• Every defined variable has to be initialised at the point of its definition.
• Variables may be private.

Functions
A function in Scala is a first-class value.
The anonymous function
  ( x1: T1, ... , xn: Tn ) => E
is equivalent to the block
  { def f ( x1: T1 , ... , xn: Tn ) = E ; f }
where f is a fresh name which is used nowhere else in the program.

Blocks
Scala is an expression-oriented language: every function returns some result.
Blocks in Scala are themselves expressions. Every block ends in a result expression which defines its value.
Scala uses the usual block-structured scoping rules.

229 230

Classes and objects
• classes provide fields and methods. These are accessed using the dot notation. However, there may be private fields and methods that are inaccessible outside the class.
  Scala, being an object-oriented language, uses dynamic dispatch for method invocation. Dynamic method dispatch is analogous to higher-order function calls. In both cases, the identity of the code to be executed is known only at run-time. This similarity is not superficial. Indeed, Scala represents every function value as an object.

Parameter passing
Scala uses call-by-value by default, but it switches to call-by-name evaluation if the parameter type is preceded by =>.

Imperative control structures
A functional implementation of while loops:
  def whileLoop( cond: => Boolean )( comm: => Unit )
    { if (cond) { comm ; whileLoop( cond )( comm ) } }
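To see why both parameters of whileLoop must be call-by-name, consider the following small usage sketch. The helper sumBelow is a hypothetical name introduced only for this example.

```scala
// Both parameters are passed by name, so cond and comm are re-evaluated
// on every iteration instead of once at the call site.
def whileLoop(cond: => Boolean)(comm: => Unit): Unit =
  if (cond) { comm; whileLoop(cond)(comm) }

// Hypothetical helper: sums 0 until n using the user-defined loop.
def sumBelow(n: Int): Int = {
  var i = 0
  var sum = 0
  whileLoop(i < n) {
    sum += i
    i += 1
  }
  sum
}
```

Had cond been passed by value, i < n would be evaluated once and the loop would either never run or never stop.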

• Every class in Scala has a superclass which it extends. A class inherits all members from its superclass. It may also override (i.e. redefine) some inherited members.
  If class A extends class B, then objects of type A may be used wherever objects of type B are expected. We say in this case that type A conforms to type B.
• Scala maintains the invariant that interpreting a value of a subclass as an instance of its superclass does not change the representation of the value.
  Amongst other things, it guarantees that for each pair of types S <: T and each instance s of S the following semantic equality holds:
    s.asInstanceOf[T].asInstanceOf[S] = s

• Methods in Scala do not necessarily take a parameter list. These parameterless methods are accessed just as value fields.
  The uniform access of fields and parameterless methods gives increased flexibility for the implementor of a class. Often, a field in one version of a class becomes a computed value in the next version. Uniform access ensures that clients do not have to be rewritten because of that change.

233 234

• abstract classes may have deferred members which are declared but which do not have an implementation. Therefore, no objects of an abstract class may be created using new.
  abstract class IntSet {
    def incl( x:Int ): IntSet
    def contains( x:Int ): Boolean
  }
  Abstract classes may be used to provide interfaces.

• Scala has object definitions. An object definition defines a class with a single instance. It is not possible to create other objects with the same structure using new.
  object EmptySet extends IntSet {
    def incl( x: Int ): IntSet
      = new NonEmptySet(x,EmptySet,EmptySet)
    def contains( x: Int ): Boolean = false
  }
  An object is created the first time one of its members is accessed. (This strategy is called lazy evaluation.)

• A trait is a special form of an abstract class that does not have any value (as opposed to type) parameters for its constructor and is meant to be combined with other classes.
  trait IntSet {
    def incl( x:Int ): IntSet
    def contains( x:Int ): Boolean
  }
  Traits may be used to collect signatures of some functionality provided by different classes.

Case study (I)
  abstract class Expr {
    def isNumber: Boolean
    def isSum: Boolean
    def numValue: Int
    def leftOp: Expr
    def rightOp: Expr
  }
  class Number( n: Int ) extends Expr {
    def isNumber: Boolean = true
    def isSum: Boolean = false
    def numValue: Int = n
    def leftOp: Expr = error("Number.leftOp")
    def rightOp: Expr = error("Number.rightOp")
  }

  class Sum( e1: Expr, e2: Expr ) extends Expr {
    def isNumber: Boolean = false
    def isSum: Boolean = true
    def numValue: Int = error("Sum.numValue")
    def leftOp: Expr = e1
    def rightOp: Expr = e2
  }
  def eval( e: Expr ): Int = {
    if (e.isNumber) e.numValue
    else if (e.isSum) eval(e.leftOp) + eval(e.rightOp)
    else error("bad expression")
  }
? What is good and what is bad about this implementation?

Case study (II)
  abstract class Expr {
    def eval: Int
  }
  class Number( n: Int ) extends Expr {
    def eval: Int = n
  }
  class Sum( e1: Expr, e2: Expr ) extends Expr {
    def eval: Int = e1.eval + e2.eval
  }

This implementation is easily extensible with new types of data:
  class Prod( e1: Expr, e2: Expr ) extends Expr {
    def eval: Int = e1.eval * e2.eval
  }
But, is this still the case for extensions involving new operations on existing data?

Case study (III)
Case classes
  abstract class Expr
  case class Number( n: Int ) extends Expr
  case class Sum( e1: Expr, e2: Expr ) extends Expr
  case class Prod( e1: Expr, e2: Expr ) extends Expr
• Case classes implicitly come with a constructor function, with the same name as the class. Hence one can construct expression trees as:
    Sum( Sum( Number(1) , Number(2) ) , Number(3) )

241 242

• Case classes and case objects implicitly come with implementations of methods toString, equals, and hashCode.
• Case classes implicitly come with nullary accessor methods which retrieve the constructor arguments.
• Case classes allow the constructions of patterns which refer to the case class constructor.

Case study (III)
Pattern matching
The match method takes as argument a number of cases:
  def eval( e: Expr ): Int
    = e match {
        case Number(x) => x
        case Sum(l,r) => eval(l) + eval(r)
        case Prod(l,r) => eval(l) * eval(r)
      }
If none of the patterns matches, the pattern matching expression is aborted with a MatchError exception.
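Put together, the case-class version of the case study can be run as below. Two things are additions for illustration and not part of the slides: the sealed modifier, which lets the compiler check that a match covers all cases, and the second operation show, which indicates how a new operation can be added without touching the classes.

```scala
sealed abstract class Expr
case class Number(n: Int) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr
case class Prod(e1: Expr, e2: Expr) extends Expr

def eval(e: Expr): Int = e match {
  case Number(x)  => x
  case Sum(l, r)  => eval(l) + eval(r)
  case Prod(l, r) => eval(l) * eval(r)
}

// A new operation on the existing data: no class needs to change.
def show(e: Expr): String = e match {
  case Number(x)  => x.toString
  case Sum(l, r)  => "(" + show(l) + " + " + show(r) + ")"
  case Prod(l, r) => "(" + show(l) + " * " + show(r) + ")"
}
```

The trade-off is the mirror image of the method-based design: adding a new case class now requires revisiting every match expression.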

Generic types
Generic types and methods
• Classes in Scala can have type parameters.
  abstract class Set[A] {
    def incl( x: A ): Set[A]
    def contains( x: A ): Boolean
  }
• Scala has a fairly powerful type inferencer which allows one to omit type parameters to polymorphic functions and constructors.

Generic types
Variance annotations
The combination of type parameters and subtyping poses some interesting questions.
? If T is a subtype of a type S, should Array[T] be a subtype of the type Array[S]?
! Yes, Java decrees so!
! No, if one wants to avoid run-time checks!
Example:
  val x = new Array[String](1)
  val y: Array[Any] = x // Scala forbids this because
                        // Array is not covariant
  y.update( 0 , new Rational(1,2) )

245 246

In Scala, generic types like the following one:
  class Array[A] {
    def apply( index: Int ): A
    ...
    def update( index: Int, elem: A )
    ...
  }
have by default non-variant subtyping.
However, one can enforce co-variant (or covariant) subtyping by prefixing a formal type parameter with a +. There is also a prefix - which indicates contra-variant subtyping.

Scala uses a conservative approximation to verify soundness of variance annotations: a covariant type parameter of a class may only appear in covariant position inside the class. Hence, the following class definition is rejected:
  class Array[+A] {
    def apply( index: Int ): A
    ...
    def update( index:Int , elem: A )
    ...
  }

Functions are objects
Recall that Scala is an object-oriented language in that every value is an object. It follows that functions are objects in Scala.
Indeed, the function type
  ( A1, ..., Ak ) => B
is equivalent to the following parameterised class type:
  abstract class Functionk[-A1,...,-Ak,+B]
    { def apply( x1:A1,...,xk:Ak ): B }
• Since function types are classes in Scala, they can be further refined in subclasses. An example is arrays, which are treated as special functions over the type of integers.

The function x => x+1 would be expanded to an instance of Function1 as follows:
  new Function1[Int,Int] {
    def apply( x:Int ): Int = x+1
  }
Conversely, when a value of a function type is applied to some arguments, the apply method of the type is implicitly inserted; e.g. for f an object of type Function1[A,B], the application f(x) is expanded to f.apply(x).
• NB: Function subtyping is contravariant in its arguments whereas it is covariant in its result. ? Why?
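Both points, functions as Function1 objects and the variance of function types, can be observed directly. In this sketch the classes Animal and Cat and the value names are hypothetical, introduced only to have a subtype pair to work with.

```scala
class Animal { def name: String = "animal" }
class Cat extends Animal { override def name: String = "cat" }

// A function written out explicitly as a Function1 object.
val inc: Function1[Int, Int] = new Function1[Int, Int] {
  def apply(x: Int): Int = x + 1
}

val describe: Animal => String = a => a.name

// Accepted because Function1 is declared as Function1[-A, +B]:
// Animal => String is a subtype of Cat => Any.
val catToAny: Cat => Any = describe
```

This suggests an answer to the question on the slide: a function that accepts any Animal can safely be used where only Cats will be supplied, and a result that is a String can safely be treated as Any.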

249 250

Generic types
Type parameter bounds
  trait Ord[A] {
    def lt( that: A ): Boolean
  }
  case class Num( value: Int ) extends Ord[Num] {
    def lt( that: Num ) = this.value < that.value
  }
  trait Heap[ A <: Ord[A] ] {
    def insert( x: A ): Heap[A]
    def min: A
    def remove: Heap[A]
  }

Generic types
View bounds
One problem with type parameter bounds is that they require forethought: if we had not declared Num as a subclass of Ord, we would not have been able to use Num elements in heaps. By the same token, Int is not a subclass of Ord, and so integers cannot be used as heap elements.

A more flexible design, which admits elements of these types, uses view bounds:
  trait Heap[ A <% Ord[A] ] {
    def insert( x: A ): Heap[A]
    def min: A
    def remove: Heap[A]
  }
A view bounded type parameter clause [ A <% T ] only specifies that the bounded type A must be convertible to the bound type T, using an implicit conversion.
Views allow one to augment a class with new members and supported traits.

Generic types
Lower bounds
Covariant generic functional stacks:
  abstract class Stack[+A] {
    def push[B >: A]( x: B ): Stack[B]
      = new NonEmptyStack(x,this)
    def top: A
    def pop: Stack[A]
  }
  class NonEmptyStack[+A]( elem: A, rest: Stack[A] ) extends Stack[A] {
    def top = elem
    def pop = rest
  }
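The effect of the lower bound B >: A on push can be tried out with a self-contained version of the covariant stack. This sketch adds an empty-stack object extending Stack[Nothing] so that it can run on its own, uses sys.error in place of error, and introduces the values ints and mixed as examples.

```scala
abstract class Stack[+A] {
  // B >: A keeps A out of contravariant position, so Stack can stay covariant.
  def push[B >: A](x: B): Stack[B] = new NonEmptyStack(x, this)
  def top: A
  def pop: Stack[A]
}
class NonEmptyStack[+A](elem: A, rest: Stack[A]) extends Stack[A] {
  def top: A = elem
  def pop: Stack[A] = rest
}
object EmptyStack extends Stack[Nothing] {
  def top: Nothing = sys.error("EmptyStack.top")
  def pop: Nothing = sys.error("EmptyStack.pop")
}

val ints: Stack[Int] = EmptyStack.push(1).push(2)
// Pushing a String onto a Stack[Int] widens the element type to Any.
val mixed: Stack[Any] = ints.push("three")
```

Pushing never fails to type check; it only makes the resulting element type less precise.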

253 254

  object EmptyStack extends Stack[Nothing] {
    def top = error("EmptyStack.top")
    def pop = error("EmptyStack.pop")
  }
• Scala does not allow objects to be parameterised with types.
• Nothing is a subtype of all other types.

Implicit parameters and conversions
• Implicit parameters
  In Scala, there is an implicit keyword that can be used at the beginning of a parameter list.
  def qsort[T]( xs: Array[T] )( implicit o: Ord[T] ): Array[T]
    = if ( xs.length <= 1 ) xs
      else {
        val pivot = xs( xs.length/2 )
        Array.concat( qsort( xs filter (x => o.lt(x,pivot)) ) ,
                      xs filter (x => x == pivot ) ,
                      qsort( xs filter (x => o.lt(pivot,x)) ) )
      }
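As a small illustration of how an implicit parameter is filled in by the compiler, the sketch below passes a conversion from A to Ord[A] implicitly, which is roughly what a view bound [A <% Ord[A]] asks for. The function smaller and the value intOrd are hypothetical names; the one-argument Ord trait is the one from the type parameter bounds slide.

```scala
trait Ord[A] { def lt(that: A): Boolean }

// The second parameter list is implicit: callers normally omit it.
def smaller[A](x: A, y: A)(implicit view: A => Ord[A]): A =
  if (view(x).lt(y)) x else y

// An implicit value in scope; the compiler supplies it for A = Int.
implicit val intOrd: Int => Ord[Int] =
  (x: Int) => new Ord[Int] { def lt(that: Int): Boolean = x < that }
```

A call smaller(3, 5) is completed by the compiler to smaller(3, 5)(intOrd), so Int can be used even though it was never declared as a subclass of Ord.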

  The principal idea behind implicit parameters is that arguments for them can be left out from a method call. If the arguments corresponding to implicit parameters are missing, they are inferred by the Scala compiler.
  NB: View bounds are convenient syntactic sugar for implicit parameters.
• Implicit conversions
  As last resort in case of type mismatch the Scala compiler will try to apply an implicit conversion.
    implicit def int2ord( x: Int ): Ord[Int]
      = new Ord[Int] { def lt( y: Int ) = x < y }
  Implicit conversions can also be applied in member selections.

Mixin-class composition
Every class or object in Scala can inherit from several traits in addition to a normal class.
  trait AbsIterator[T] {
    def hasNext: Boolean
    def next: T
  }
  trait RichIterator[T] extends AbsIterator[T] {
    def foreach( f: T => Unit ): Unit =
      while (hasNext) f(next)
  }

257 258

  class StringIterator( s: String ) extends AbsIterator[Char] {
    private var i = 0
    def hasNext = i < s.length
    def next = { val x = s charAt i; i = i+1; x }
  }
Traits can be used in all contexts where other abstract classes appear; however only traits can be used as mixins.

  object Test {
    def main( args: Array[String] ): Unit = {
      class Iter extends StringIterator(args(0))
        with RichIterator[Char]
      val iter = new Iter
      iter.foreach(System.out.println)
    }
  }
The class Iter is constructed from a mixin composition of the parents StringIterator (called the superclass) and RichIterator (called a mixin) so as to combine their functionality.
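The mixin example can be exercised without a command-line argument as follows. The helper chars is a hypothetical name added for this sketch, and next is declared with an empty parameter list here, a stylistic choice for a side-effecting method that differs from the slide.

```scala
import scala.collection.mutable.ListBuffer

trait AbsIterator[T] {
  def hasNext: Boolean
  def next(): T
}
trait RichIterator[T] extends AbsIterator[T] {
  // foreach is written once, against the abstract interface only.
  def foreach(f: T => Unit): Unit = while (hasNext) f(next())
}
class StringIterator(s: String) extends AbsIterator[Char] {
  private var i = 0
  def hasNext: Boolean = i < s.length
  def next(): Char = { val x = s.charAt(i); i += 1; x }
}

// Hypothetical helper: collects the characters visited by the mixed-in foreach.
def chars(s: String): List[Char] = {
  class Iter extends StringIterator(s) with RichIterator[Char]
  val buf = ListBuffer.empty[Char]
  (new Iter).foreach(c => buf += c)
  buf.toList
}
```

Iter gets hasNext and next from its superclass StringIterator and foreach from the mixin RichIterator.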

The class Iter inherits members from both StringIterator and RichIterator.

NB: Mixin-class composition is a form of multiple inheritance!

Language innovations
• Flexible syntax and type system.
• Pattern matching over class hierarchies unifies functional and object-oriented data access.
• Abstract types and mixin composition unify concepts from object and module systems.