9/23/20

Highlights of Psuedo-Code

• Virtual – More regularity – Higher level • Decreased chance of errors – Automate tedious and error-prone tasks • Increased security CS4100 – Error checking Dr. Martin • Simplify debugging – trace From Principles of Programming Languages: Design, Evaluation, and Implementation (Third Edition), by Bruce J. MacLennan, Chapter 2, and based on slides by Istvan Jonyer.

1 2

Now: FORTRAN Backus at IBM

The First Generation • Visionary at IBM • Early 1950s • Recognized need for faster coding practice – Simple assemblers and libraries of • Need “language” that allows decreasing costs to subroutines were tools of the day linear, in size of the program – Automatic programming was considered • for IBM 701 – Language based on mathematical notation unfeasible – Interpreter to simulate floating point arithmetic – Good coders liked being masters of the trade • Laning and Zierler at MIT in 1952 – Algebraic language

3 4

Backus at IBM Meanwhile

• Goals • Grace Hopper organizes Symposia via Office of Naval Research (ONR) – Get floating point operations into hardware: IBM 704 • Exposes deficiencies in pseudo-code • Backus meets Laning and Zierler – Decrease programming costs • Later (1978) Backus says: • Programmers to write in conventional mathematical notation – “As far as we were aware we simply made up the language as we went along. We did not regard language design as a difficult • Still generate efficient code problem, merely as a simple prelude to the real problem: designing • IBM authorizes project a compiler which could produce efficient programs.” – Backus begins outlining FORTRAN • FORTRAN compiler works! • IBM Mathematical FORmula TRANslating System – Has few assistants – Project is overlooked (greeted with indifference and skepticism according to Dijkstra)

5 6

1 9/23/20

FORTRAN timeline FORTRAN • 1954: Project approved • 1957: FORTRAN – First version released • Goals • 1958: FORTRAN II and III – Decrease programming costs (to IBM) – Still many dependencies on IBM 704 – Efficiency • 1962: FORTRAN IV – “ANS FORTRAN” by American National Standards Institute – Breaks machine dependence – Few implementations follow the specifications • We’ll look at 1966 ANS FORTRAN

7 8

Sample FORTRAN program Structural Organization

DIMENSION DTA(900) • Preliminary specification did not include subprograms SUM 0.0 (like in pseudo-code) READ 10, N • FORTRAN I, however, already included subprograms 10 FORMAT(I3) DO 20 I = 1, N READ 30, DTA(I) Main program 30 FORMAT(F10.6) IF (DTA(I)) 25, 20, 20 Subprogram 1 25 DTA(I) = -DTA(I) . 20 CONTINUE . … .

Subprogram n

9 10

Constructs Declarative Constructs

• Declarative constructs • Declarations include – (First part in pseudo-code: data – Allocate area of memory of a specified size initialization) – Attach symbolic name to that area of memory – Declare facts about the program, to be – Initialize the memory used at compile-time • FORTRAN example • Imperative constructs – DIMENSION DTA (900) – (Second part in pseudo-code: program) – DATA DTA, SUM / 900*0.0, 0.0 • initializes DTA to 900 zeroes – Commands to be executed during run-time • SUM to 0.0

11 12

2 9/23/20

Imperative Constructs Building a FORTRAN Program

• Categories: • Interpretation unacceptable, since the selling point – Computational is speed • E.g.: Assignment, Arithmetic operations • Need the following stages to build: • FORTRAN: AVG = SUM / FLOAT(N) – Control-flow 1. Compilation Translate code to relocatable object code • E.g.: comparisons, loop • FORTRAN: 2. Linking – IF-statements Incorporating libraries (resolving external dependencies) – DO loop 3. Loading – GOTO Program loaded into memory; converted from relocatable to – Input/output absolute format • E.g.: read, print 4. Execution • FORTRAN: Elaborate array of I/O instructions (tapes, drums, Control is turned over to the processor etc.)

13 14

DESIGN: Control Structures Compilation

• Compilation has 3 phases – Syntactic analysis • Control structures control flow in the • Classify statements, constructs and extract their parts program – Optimization • FORTRAN has considerable optimizations, since that was the • Most important statement in FORTRAN: selling point – Assignment Statement – Code synthesis • Put together parts of object code instructions in relocatable format

15 16

DESIGN: Control Structures Arithmetic IF-statement • Machine Dependence (1st generation) • In FORTRAN, these were based on • Example of machine dependence native IBM 704 branch instructions – IF (a) n1, n2, n3 – Evaluate a: branch to – “Assembly language for IBM 704” • n1: if -, • n2: if 0, FORTRAN II statement IBM 704 branch operation • n3: if + GOTO n TRA k (transfer direct) – CAS instruction in IBM 704 GOTO n, (n1, n2,…,nm) TRA i (transfer indirect) GOTO (n1, n2,…,nm), n TRA i,k (transfer indexed) • More conventional IF-statement was later IF (a) n1, n2, n3 CAS k introduced IF ACCUMULATOR OVERFLOW n1, n2 TOV k – IF (X .EQ. A(I)) K = I - 1 … …

17 18

3 9/23/20

Principles of Programming GOTO

• The Portability Principle • Workhorse of control flow in FORTRAN – Avoid features or facilities that are • 2-way branch: dependent on a particular computer or a IF (condition) GOTO 100 case for false small class of . GOTO 200 100 case for true 200 • Equivalent to if-then-else in newer languages

19 20

n-way Branching Reversing TRUE and FALSE with Computed GOTO

GOTO (L1, L2, L3, L4 ), I 10 case 1 GOTO 100 • To get if-then-else –style if: 20 case 2 GOTO 100 IF (.NOT. (condition)) GOTO 100 30 case 3 case for true GOTO 100 40 case 4 GOTO 200 GOTO 100 100

100 case for false • Transfer control to label Lk if I contains k 200 • Jump Table

21 22

n-way Branching Loops with Computed GOTO • Loops are implemented using combinations GOTO (10, 20, 30, 40 ), I of IF and GOTOs 10 case 1 • Trailing-decision loop: GOTO 100 … … 20 case 2 100 body of loop GOTO 100 IF (loop not done) GOTO 100 30 case 3 • Leading-decision loop: GOTO 100 100 IF (loop done) GOTO 200 40 case 4 GOTO 100 …body of loop… 100 GOTO 100 • IF and GOTO are selection statements 200 … • Readable?

23 24

4 9/23/20

But wait, there’s more! Hmmm…

• Mid-decision loop: • Very difficult to know what control 100 …first half of loop… structure is intended IF (loop done) GOTO 200 …second half of loop… • Spaghetti code GOTO 100 • Very powerful 200 … • Must be a principle in here somewhere

25 26

Principles of Programming GOTO: A Two-Edged Sword

• The Structure Principle (Dijkstra) • Very powerful – The static structure of the program should – Can be used for good or for evil correspond in a simple way to the dynamic • But seriously is GOTO good or bad? structure of the corresponding computations. – Good: very flexible, can implement elaborate control structures • What does this mean? – Bad: hard to know what is intended – Should be able to visualize behavior of – Violates the structure principle program based on written form

27 28

Ex: Computed and Assigned But that’s not all! GOTOs • We just saw the Computed GOTO:

GOTO (L1, L2, …, Ln), I – Jumps to label 1, 2, … ASSIGN 20 TO N • N has address of stmt • Now consider the Assigned GOTO: 20, say it is 347

GOTO N, (L1, L2, …, Ln) – Jumps to ADDRESS in N • Look for 347 in jump – List of labels not necessary GOTO (20, 30, 40, 50), N table - out of range – Must be used with ASSIGN-statement • Not checked ASSIGN 20 TO N • Fetch value at 347 and use as destination for – Put address of statement 20 into N jump – Not the same as N = 20 !!!! • Problem??? – Computed should have been Assigned

29 30

5 9/23/20

Ex: Computed and Assigned Principles of Programming GOTOs • The Syntactic Consistency Principle I = 3 • I expected to have an address – Things that look similar should be similar and things that look different should be

• GOTO statement with different. GOTO I, (20, 30, 40, 50) address 3 – Probably in area used by system, i.e. not a stmt • Problem??? – Assigned should have been computed

31 32

Syntactic Consistency Even worse…

• Best to avoid syntactic forms that can be converted to • Confusing the two GOTOs will not be other forms by a simple error – ** and * caught by the compiler – Weak Typing (more on this later) • Violates the defense in depth principle • Integer variables – Integers – Addresses of statements – Character strings • Maybe a LABEL type? – Catch errors at compile time

33 34

Principles of Programming The DO-loop • Fortunately, FORTRAN provides the DO-loop • The Defense in Depth Principle • Higher-level than IF-GOTO-style control structures – No direct machine-equivalency – If an error gets through one line of defense, DO 100 I = 1, N then it should be caught by the next line of A(I) = A(I) * 2 defense. 100 CONTINUE • I is called the controlled variable • CONTINUE must have matching label • DO allows stating what we want: higher level – Only built-in higher level structure

35 36

6 9/23/20

Nesting Principles of Programming • The DO-loop can be nested • Preservation of Information Principle DO 100 I = 1, N – The language should allow the representation of ... information that the user might know and that the DO 200 J = 1, N compiler might need. ... 200 CONTINUE • Do-loop makes explicit 100 CONTINUE – Control variable – They must be correctly nested – Initial and final values – Optimized: controlled variable can be stored in – Extent of loop • If and GOTO – Note: we could have done this with GOTO – Compiler has to figure out

37 38

Subprograms Subprograms • AKA subroutine SUBROUTINE Name(formals) – User defined • When invoked …body… – Function returns a value – Using call stmt RETURN • Can be used in an expression END – Formals bound to • Important, late addition actuals • Why are they important? … – Formals aka dummy – Subprograms define procedural abstractions CALL Name (actuals) variables – Repeated code can be abstracted out, variables formalized – Allow large programs to be modularized • Humans can only remember a few things at a time (about 7)

39 40

Example Principles of Programming

SUBROUTINE DIST (D, X, Y) • The Abstraction Principle D = X – Y IF (D .LT. 0) D = -D – Avoid requiring something to be stated RETURN more than once; factor out the recurring END pattern.

… CALL DIST (DIFFER, POSX, POSY) …

41 42

7 9/23/20

Libraries Parameter Passing

• Subprograms encourage libraries • Once we decide on subprograms, we – Subprograms are independent of each need to figure out how to pass other parameters – Can be compiled separately • Fortran parameters – Can be reused later – Input – Maintain library of already debugged and – Output compiled useful subprograms • Need address to write to – Both

43 44

A Dangerous Side-Effect Parameter Passing • What if parameter passed in is not a variable? SUBROUTINE SWITCH (N) • Pass by reference N = 3 – On chance may need to write to RETURN • all vars passed by reference END – Pass the address of the variable, not its value … – Advantage: CALL SWITCH (2) • Faster for larger (aggregate) data constructs • The literal 2 can be changed to the literal 3 in FORTRAN’s • Allows output parameters literal table!!! – Disadvantage: – I = 2 + 2 I = 6???? • Address has to be de-referenced – Violates security principle – Not by programmer—still, an additional operation • Values can be modified by subprogram • Need to pass size for data constructs - if wrong?

45 46

Principles of Programming Pass by Value-Result

• Security principle • Also called copy-restore – No program that violates the definition of • Instead of pass by reference, copy the value of actual the language, or its own intended structure, parameters into formal parameters should escape detection. • Upon return, copy new values back to actuals • Both operations done by caller – Can know not to copy meaningless result • E.g. actual was a constant or expression • Callee never has access to caller’s variables

47 48

8 9/23/20

Activation Records What happens exactly?

• What happens when a subprogram is • Before subprogram invocation: called? – Place parameters into callee’s activation – Transmit parameters record – Save caller’s status – Save caller’s status • Save content of registers – Enter the subprogram • Save instruction pointer (IP) – Restore caller’s state – Save pointer to caller’s activation record in – Return to caller callee’s activation record – Enter the subprogram

49 50

What happens exactly? Contents of Activation Record

• Returning from subprogram: • Parameters passed to subprogram – Restore instruction pointer to caller’s • P (resumption address) – Return to caller • Dynamic link (address of caller’s – Caller needs to restore its state (registers) activation record) – If subprogram is a function, return value • Temporary areas for storing registers must be made accessible

51 52

DESIGN: Data Structures Primitives

• First data structures • Primitives are scalars only – Suggested by mathematics – Integers • Primitives – Floating point numbers • Arrays – Double-precision floating point – Complex numbers – No text (string) processing

53 54

9 9/23/20

Representations Arithmetic Operators • Word-oriented • 2 + 3.1 = ? – 2 is integer, 3.1 is floating point – Most commonly 32 • How do we handle this situation? • Integer – Explicit type-casting: FLOAT(2) + 3.1 • Type-casting is also called “coercion” – Represented on 31 bits + 1 sign – FORTRAN: Operators are overloaded – Automatic type coercion • Floating point • Always coerce to encompassing set – Integer + Float à float addition – Using scientific notation: characteristic + – Float * Double à double multiplication mantissa – Integer – Complex à complex subtraction • Types dominate their subsets sm sc c7 … c0 m21 … m0

55 56

Example Hollerith Constants • Early form of character string in FORTRAN – 6HCARMEL is a six character string ‘CARMEL’ (H is for • X**(1/3) = ? Hollerith) 1/3 = 0 – Second-class citizens • No operations allowed 1/3.0 = 0.33333 • Can be read into an integer variable, which cannot (should not) be altered • Problems – Integer representing a Hollerith constant may be altered, which makes no sense • Weak typing – No type checking is performed

57 58

Constructor: Array Representation

• Constructor • Simple, intuitive representation • Column-major order – Method to build complex data structures – Most languages do row-major order Element Address from primitive ones – Addressing equation: A(1,1) A • a{A(2)} = a{A(1)} + 1 = a{A(1)} – 1 + 2 A(2,1) A + 1 • FORTRAN only has array constructors • a{A(i)} = a{A(1)} – 1 + i … • a{A(i,j)} = a{A(1,1)} + (j – 1)m + i – 1 A(m,1) A + m - 1 DIMENSION DTA, COORD(10,10) • FORTRAN uses 1-based addressing – One addressable slot of each elt A(1,2) A + m – Initialization is not required … – Maximum 3 dimensions A(m,2) A + 2m - 1 … A(m,n) A + nm - 1

59 60

10 9/23/20

Optimizations Subscripts • Subscripts can be expressions • Arrays are mostly associated with loops – A(i+m*c) – Most programmers initialize controlled variable to 1, and – This defeats above optimization reference array A(i) – Therefore, subscripts are limited to – Optimization: • c and c’ are integers, v is an integer variable • c • Initialize controlled variable to address of array element • v • Therefore, we’ll increment address itself • v+c, v-c • Dereference controlled variable to get array element • c*v • c*v+c’, c*v-c’ – A(J - 1) ok; A(1+J) not ok • Optimizations like this sold FORTRAN

61 62

DESIGN: Name Structures Declarations

• What do name structures structure? • Declarations are non-executable – Names, of course! statements • Primitives bind names to objects • Unlike IF, GOTO, etc., which are executable statements – INTEGER I, J, K • Allocate integers I, J, and K, and bind the • Static allocation names to memory locations – Allocated once, cannot be deallocated for • Declare: name, type, storage reuse – FORTRAN does not do dynamic allocation

63 64

Optional Declaration Now: Semantics (meaning) • FORTRAN does not require variables to be declared – First use will declare a variable • “They went to the bank of the Rio • What’s wrong with this? – COUNT = COUMT + 1 Grande.” – What if first use is not assignment? • What does this mean? • Convention: – Variables starting with letters i, j, k, l, m, n are integers • How do we know? – Others are floating point – Bad practice: Encourages funny names (KOUNT, ISUM, • CONTEXT, CONTEXT, CONTEXT XLENGTH…)

65 66

11 9/23/20

Programming Languages SCOPE

• X = COUNT(I) • Scope of a binding of a name • What does this mean – Region of program where binding is visible – X integer or real • In FORTRAN – COUNT array or function – Subprogram names GLOBAL • Again Context • Can be called from anywhere – Set of variables visible when statement is – Variable names LOCAL seen • To subprogram where declared • Context is called ENVIRONMENT

67 68

Contour Diagram Once we have subprograms…

Global scope

R Main program • We need to find a way to data S X – Parameters

R(2) • Pass by reference S(X) • Pass by value-result

R S – Caller copies value of actual to formal variable N N – On return, caller copies result value to actual X Y » Omit for constants or expressions as actuals Y S(X)

69 70

Once we have subprograms… Information Hiding • Share Data With Just Parameters? – Cumbersome, and hard to maintain • Information Hiding Principle – Produces long list of parameters • Modules should be designed so that – If data structure changes, there are many changes • The user has all the information needed to use to be made the module and nothing more. – Violates information hiding • The implementor has all the information needed to implement the module correctly and nothing more.

71 72

12 9/23/20

Sharing Data Common • FORTRAN’s solution: • COMMON blocks allow more flexibility – Allows sharing data between subprograms – Scope rules necessitate this • Consider a symbol table

SUBROUTINE ARRAY2 (N, L, C, D1, D2) COMMON /SYMTAB/ NAMES(100), LOC(100), TYPE(100) ... SUBROUTINE VAR (N, L, C) COMMON /SYMTAB/ NAMES(100), LOC(100), TYPE(100)

73 74

COMMON Problems Aliasing • Tedious to write • The ability to have more than one name • Unreadable for the same memory location • Virtually impossible to change AND • Very flexible! • COMMON permits aliasing, which is dangerous COMMON /B/ M, A(100) – If COMMON specifications don’t agree, misuse is possible COMMON /B/ X, K, C(50), D(50)

75 76

Aliasing EQUIVALENCE • Since dynamic memory allocation is not supported, and memory is scarce, FORTRAN has EQUIVALENCE

DIMENSION INDATA(10000), RESULT(8000) EQUIVALENCE INDATA(1), RESULT(8)

• Allows a way to explicitly alias two arrays to the same memory

77 78

13 9/23/20

EQUIVALENCE DESIGN: Syntactic Structures

• This is only to be used when usage of • Languages are defined by lexics and syntax INDATA and RESULT do not overlap – Lexics • Allows access to different data types (float as • Way to combine characters to form words or symbols if it was integer, etc.) • E.g. Identifier must begin with a letter, followed by no more than 5 letters or digits • Has same dangers as COMMON – Syntax • Way to combine symbols into meaningful instructions • Syntactic analysis: Lexical analyzer (scanner) Syntactic analyzer (parser)

79 80

Fixed Format Lexics Blanks Ignored • Still using punch-cards! • Particular columns had particular meanings • FORTRAN ignored spaces (not just white spaces) • Statements (columns 7-72) were free format • Thisisveryunfortunate!

Columns Purpose DIMENSION INDATA(10000), RESULT(8000) D I M E N S I O N I N D A T A (1 0 0 0 0), R E S U L T (8000) 1-5 Statement number DIMENSIONINDATA(10000),RESULT(8000)

6 Continuation

7-72 Statement • Lexing and parsing such a language is very difficult 73-90 Sequence number

81 82

Blanks Ignored No Reserved Words

• In combination with other features, it • FORTRAN allows variable named IF promoted mistakes DIMENSION IF(100) • How do you read this? DO 20 I = 1. 100 IF (I - 1) = 1 2 3 DO 20 I = 1, 100 IF (I - 1) 1, 2, 3 DO20I = 1.100 • The compiler does not know what • Variable DO20I is unlikely, but . and , are IF (I - 1) will be next to each other on the keyboard… – Needs to see , or = to decide

83 84

14 9/23/20

Algebraic Notation Operators Need Precedence

• One of the main goals was to facilitate • b2 – 4ac == (b2) – (4ac) scientific computing • ab2 == a(b2) – Algebraic notation had to look like math • Precedence rules 1. Exponentiation – (-B + SQRT(B**2 – 4*AA*C))/(2*A) 2. Multiplication and division – Very good, compared to our pseudo-code 3. Addition and subtraction • Operations on the same level are associated to the • Problems left (read left to right) – How do you parse and execute such a • How about unary operators (-)? statement?

85 86

Some Highlights Some Highlights

• Integer type is overworked • Arrays – Integer – Only data structure – Character strings – Data constructor – Addresses • Weak typing – Static – Limited to three dimensions • Combine the two and we have a security loophole – Meaningless operations can be performed without warning – Restrictions on index expressions – Optimized – Column major order for 2-dimensional – Not required to be initialized

87 88

15