9/23/20
Highlights of Psuedo-Code
• Virtual computer – More regularity FORTRAN – Higher level • Decreased chance of errors – Automate tedious and error-prone tasks • Increased security CS4100 – Error checking Dr. Martin • Simplify debugging – trace From Principles of Programming Languages: Design, Evaluation, and Implementation (Third Edition), by Bruce J. MacLennan, Chapter 2, and based on slides by Istvan Jonyer.
1 2
Now: FORTRAN Backus at IBM
The First Generation • Visionary at IBM • Early 1950s • Recognized need for faster coding practice – Simple assemblers and libraries of • Need “language” that allows decreasing costs to subroutines were tools of the day linear, in size of the program – Automatic programming was considered • Speedcoding for IBM 701 – Language based on mathematical notation unfeasible – Interpreter to simulate floating point arithmetic – Good coders liked being masters of the trade • Laning and Zierler at MIT in 1952 – Algebraic language
3 4
Backus at IBM Meanwhile
• Goals • Grace Hopper organizes Symposia via Office of Naval Research (ONR) – Get floating point operations into hardware: IBM 704 • Exposes deficiencies in pseudo-code • Backus meets Laning and Zierler – Decrease programming costs • Later (1978) Backus says: • Programmers to write in conventional mathematical notation – “As far as we were aware we simply made up the language as we went along. We did not regard language design as a difficult • Still generate efficient code problem, merely as a simple prelude to the real problem: designing • IBM authorizes project a compiler which could produce efficient programs.” – Backus begins outlining FORTRAN • FORTRAN compiler works! • IBM Mathematical FORmula TRANslating System – Has few assistants – Project is overlooked (greeted with indifference and skepticism according to Dijkstra)
5 6
1 9/23/20
FORTRAN timeline FORTRAN • 1954: Project approved • 1957: FORTRAN – First version released • Goals • 1958: FORTRAN II and III – Decrease programming costs (to IBM) – Still many dependencies on IBM 704 – Efficiency • 1962: FORTRAN IV – “ANS FORTRAN” by American National Standards Institute – Breaks machine dependence – Few implementations follow the specifications • We’ll look at 1966 ANS FORTRAN
7 8
Sample FORTRAN program Structural Organization
DIMENSION DTA(900) • Preliminary specification did not include subprograms SUM 0.0 (like in pseudo-code) READ 10, N • FORTRAN I, however, already included subprograms 10 FORMAT(I3) DO 20 I = 1, N READ 30, DTA(I) Main program 30 FORMAT(F10.6) IF (DTA(I)) 25, 20, 20 Subprogram 1 25 DTA(I) = -DTA(I) . 20 CONTINUE . … .
Subprogram n
9 10
Constructs Declarative Constructs
• Declarative constructs • Declarations include – (First part in pseudo-code: data – Allocate area of memory of a specified size initialization) – Attach symbolic name to that area of memory – Declare facts about the program, to be – Initialize the memory used at compile-time • FORTRAN example • Imperative constructs – DIMENSION DTA (900) – (Second part in pseudo-code: program) – DATA DTA, SUM / 900*0.0, 0.0 • initializes DTA to 900 zeroes – Commands to be executed during run-time • SUM to 0.0
11 12
2 9/23/20
Imperative Constructs Building a FORTRAN Program
• Categories: • Interpretation unacceptable, since the selling point – Computational is speed • E.g.: Assignment, Arithmetic operations • Need the following stages to build: • FORTRAN: AVG = SUM / FLOAT(N) – Control-flow 1. Compilation Translate code to relocatable object code • E.g.: comparisons, loop • FORTRAN: 2. Linking – IF-statements Incorporating libraries (resolving external dependencies) – DO loop 3. Loading – GOTO Program loaded into memory; converted from relocatable to – Input/output absolute format • E.g.: read, print 4. Execution • FORTRAN: Elaborate array of I/O instructions (tapes, drums, Control is turned over to the processor etc.)
13 14
DESIGN: Control Structures Compilation
• Compilation has 3 phases – Syntactic analysis • Control structures control flow in the • Classify statements, constructs and extract their parts program – Optimization • FORTRAN has considerable optimizations, since that was the • Most important statement in FORTRAN: selling point – Assignment Statement – Code synthesis • Put together parts of object code instructions in relocatable format
15 16
DESIGN: Control Structures Arithmetic IF-statement • Machine Dependence (1st generation) • In FORTRAN, these were based on • Example of machine dependence native IBM 704 branch instructions – IF (a) n1, n2, n3 – Evaluate a: branch to – “Assembly language for IBM 704” • n1: if -, • n2: if 0, FORTRAN II statement IBM 704 branch operation • n3: if + GOTO n TRA k (transfer direct) – CAS instruction in IBM 704 GOTO n, (n1, n2,…,nm) TRA i (transfer indirect) GOTO (n1, n2,…,nm), n TRA i,k (transfer indexed) • More conventional IF-statement was later IF (a) n1, n2, n3 CAS k introduced IF ACCUMULATOR OVERFLOW n1, n2 TOV k – IF (X .EQ. A(I)) K = I - 1 … …
17 18
3 9/23/20
Principles of Programming GOTO
• The Portability Principle • Workhorse of control flow in FORTRAN – Avoid features or facilities that are • 2-way branch: dependent on a particular computer or a IF (condition) GOTO 100 case for false small class of computers. GOTO 200 100 case for true 200 • Equivalent to if-then-else in newer languages
19 20
n-way Branching Reversing TRUE and FALSE with Computed GOTO
GOTO (L1, L2, L3, L4 ), I 10 case 1 GOTO 100 • To get if-then-else –style if: 20 case 2 GOTO 100 IF (.NOT. (condition)) GOTO 100 30 case 3 case for true GOTO 100 40 case 4 GOTO 200 GOTO 100 100
100 case for false • Transfer control to label Lk if I contains k 200 • Jump Table
21 22
n-way Branching Loops with Computed GOTO • Loops are implemented using combinations GOTO (10, 20, 30, 40 ), I of IF and GOTOs 10 case 1 • Trailing-decision loop: GOTO 100 … … 20 case 2 100 body of loop GOTO 100 IF (loop not done) GOTO 100 30 case 3 • Leading-decision loop: GOTO 100 100 IF (loop done) GOTO 200 40 case 4 GOTO 100 …body of loop… 100 GOTO 100 • IF and GOTO are selection statements 200 … • Readable?
23 24
4 9/23/20
But wait, there’s more! Hmmm…
• Mid-decision loop: • Very difficult to know what control 100 …first half of loop… structure is intended IF (loop done) GOTO 200 …second half of loop… • Spaghetti code GOTO 100 • Very powerful 200 … • Must be a principle in here somewhere
25 26
Principles of Programming GOTO: A Two-Edged Sword
• The Structure Principle (Dijkstra) • Very powerful – The static structure of the program should – Can be used for good or for evil correspond in a simple way to the dynamic • But seriously is GOTO good or bad? structure of the corresponding computations. – Good: very flexible, can implement elaborate control structures • What does this mean? – Bad: hard to know what is intended – Should be able to visualize behavior of – Violates the structure principle program based on written form
27 28
Ex: Computed and Assigned But that’s not all! GOTOs • We just saw the Computed GOTO:
GOTO (L1, L2, …, Ln), I – Jumps to label 1, 2, … ASSIGN 20 TO N • N has address of stmt • Now consider the Assigned GOTO: 20, say it is 347
GOTO N, (L1, L2, …, Ln) – Jumps to ADDRESS in N • Look for 347 in jump – List of labels not necessary GOTO (20, 30, 40, 50), N table - out of range – Must be used with ASSIGN-statement • Not checked ASSIGN 20 TO N • Fetch value at 347 and use as destination for – Put address of statement 20 into N jump – Not the same as N = 20 !!!! • Problem??? – Computed should have been Assigned
29 30
5 9/23/20
Ex: Computed and Assigned Principles of Programming GOTOs • The Syntactic Consistency Principle I = 3 • I expected to have an address – Things that look similar should be similar and things that look different should be
• GOTO statement with different. GOTO I, (20, 30, 40, 50) address 3 – Probably in area used by system, i.e. not a stmt • Problem??? – Assigned should have been computed
31 32
Syntactic Consistency Even worse…
• Best to avoid syntactic forms that can be converted to • Confusing the two GOTOs will not be other forms by a simple error – ** and * caught by the compiler – Weak Typing (more on this later) • Violates the defense in depth principle • Integer variables – Integers – Addresses of statements – Character strings • Maybe a LABEL type? – Catch errors at compile time
33 34
Principles of Programming The DO-loop • Fortunately, FORTRAN provides the DO-loop • The Defense in Depth Principle • Higher-level than IF-GOTO-style control structures – No direct machine-equivalency – If an error gets through one line of defense, DO 100 I = 1, N then it should be caught by the next line of A(I) = A(I) * 2 defense. 100 CONTINUE • I is called the controlled variable • CONTINUE must have matching label • DO allows stating what we want: higher level – Only built-in higher level structure
35 36
6 9/23/20
Nesting Principles of Programming • The DO-loop can be nested • Preservation of Information Principle DO 100 I = 1, N – The language should allow the representation of ... information that the user might know and that the DO 200 J = 1, N compiler might need. ... 200 CONTINUE • Do-loop makes explicit 100 CONTINUE – Control variable – They must be correctly nested – Initial and final values – Optimized: controlled variable can be stored in – Extent of loop index register • If and GOTO – Note: we could have done this with GOTO – Compiler has to figure out
37 38
Subprograms Subprograms • AKA subroutine SUBROUTINE Name(formals) – User defined • When invoked …body… – Function returns a value – Using call stmt RETURN • Can be used in an expression END – Formals bound to • Important, late addition actuals • Why are they important? … – Formals aka dummy – Subprograms define procedural abstractions CALL Name (actuals) variables – Repeated code can be abstracted out, variables formalized – Allow large programs to be modularized • Humans can only remember a few things at a time (about 7)
39 40
Example Principles of Programming
SUBROUTINE DIST (D, X, Y) • The Abstraction Principle D = X – Y IF (D .LT. 0) D = -D – Avoid requiring something to be stated RETURN more than once; factor out the recurring END pattern.
… CALL DIST (DIFFER, POSX, POSY) …
41 42
7 9/23/20
Libraries Parameter Passing
• Subprograms encourage libraries • Once we decide on subprograms, we – Subprograms are independent of each need to figure out how to pass other parameters – Can be compiled separately • Fortran parameters – Can be reused later – Input – Maintain library of already debugged and – Output compiled useful subprograms • Need address to write to – Both
43 44
A Dangerous Side-Effect Parameter Passing • What if parameter passed in is not a variable? SUBROUTINE SWITCH (N) • Pass by reference N = 3 – On chance may need to write to RETURN • all vars passed by reference END – Pass the address of the variable, not its value … – Advantage: CALL SWITCH (2) • Faster for larger (aggregate) data constructs • The literal 2 can be changed to the literal 3 in FORTRAN’s • Allows output parameters literal table!!! – Disadvantage: – I = 2 + 2 I = 6???? • Address has to be de-referenced – Violates security principle – Not by programmer—still, an additional operation • Values can be modified by subprogram • Need to pass size for data constructs - if wrong?
45 46
Principles of Programming Pass by Value-Result
• Security principle • Also called copy-restore – No program that violates the definition of • Instead of pass by reference, copy the value of actual the language, or its own intended structure, parameters into formal parameters should escape detection. • Upon return, copy new values back to actuals • Both operations done by caller – Can know not to copy meaningless result • E.g. actual was a constant or expression • Callee never has access to caller’s variables
47 48
8 9/23/20
Activation Records What happens exactly?
• What happens when a subprogram is • Before subprogram invocation: called? – Place parameters into callee’s activation – Transmit parameters record – Save caller’s status – Save caller’s status • Save content of registers – Enter the subprogram • Save instruction pointer (IP) – Restore caller’s state – Save pointer to caller’s activation record in – Return to caller callee’s activation record – Enter the subprogram
49 50
What happens exactly? Contents of Activation Record
• Returning from subprogram: • Parameters passed to subprogram – Restore instruction pointer to caller’s • P (resumption address) – Return to caller • Dynamic link (address of caller’s – Caller needs to restore its state (registers) activation record) – If subprogram is a function, return value • Temporary areas for storing registers must be made accessible
51 52
DESIGN: Data Structures Primitives
• First data structures • Primitives are scalars only – Suggested by mathematics – Integers • Primitives – Floating point numbers • Arrays – Double-precision floating point – Complex numbers – No text (string) processing
53 54
9 9/23/20
Representations Arithmetic Operators • Word-oriented • 2 + 3.1 = ? – 2 is integer, 3.1 is floating point – Most commonly 32 bits • How do we handle this situation? • Integer – Explicit type-casting: FLOAT(2) + 3.1 • Type-casting is also called “coercion” – Represented on 31 bits + 1 sign bit – FORTRAN: Operators are overloaded – Automatic type coercion • Floating point • Always coerce to encompassing set – Integer + Float à float addition – Using scientific notation: characteristic + – Float * Double à double multiplication mantissa – Integer – Complex à complex subtraction • Types dominate their subsets sm sc c7 … c0 m21 … m0
55 56
Example Hollerith Constants • Early form of character string in FORTRAN – 6HCARMEL is a six character string ‘CARMEL’ (H is for • X**(1/3) = ? Hollerith) 1/3 = 0 – Second-class citizens • No operations allowed 1/3.0 = 0.33333 • Can be read into an integer variable, which cannot (should not) be altered • Problems – Integer representing a Hollerith constant may be altered, which makes no sense • Weak typing – No type checking is performed
57 58
Constructor: Array Representation
• Constructor • Simple, intuitive representation • Column-major order – Method to build complex data structures – Most languages do row-major order Element Address from primitive ones – Addressing equation: A(1,1) A • a{A(2)} = a{A(1)} + 1 = a{A(1)} – 1 + 2 A(2,1) A + 1 • FORTRAN only has array constructors • a{A(i)} = a{A(1)} – 1 + i … • a{A(i,j)} = a{A(1,1)} + (j – 1)m + i – 1 A(m,1) A + m - 1 DIMENSION DTA, COORD(10,10) • FORTRAN uses 1-based addressing – One addressable slot of each elt A(1,2) A + m – Initialization is not required … – Maximum 3 dimensions A(m,2) A + 2m - 1 … A(m,n) A + nm - 1
59 60
10 9/23/20
Optimizations Subscripts • Subscripts can be expressions • Arrays are mostly associated with loops – A(i+m*c) – Most programmers initialize controlled variable to 1, and – This defeats above optimization reference array A(i) – Therefore, subscripts are limited to – Optimization: • c and c’ are integers, v is an integer variable • c • Initialize controlled variable to address of array element • v • Therefore, we’ll increment address itself • v+c, v-c • Dereference controlled variable to get array element • c*v • c*v+c’, c*v-c’ – A(J - 1) ok; A(1+J) not ok • Optimizations like this sold FORTRAN
61 62
DESIGN: Name Structures Declarations
• What do name structures structure? • Declarations are non-executable – Names, of course! statements • Primitives bind names to objects • Unlike IF, GOTO, etc., which are executable statements – INTEGER I, J, K • Allocate integers I, J, and K, and bind the • Static allocation names to memory locations – Allocated once, cannot be deallocated for • Declare: name, type, storage reuse – FORTRAN does not do dynamic allocation
63 64
Optional Declaration Now: Semantics (meaning) • FORTRAN does not require variables to be declared – First use will declare a variable • “They went to the bank of the Rio • What’s wrong with this? – COUNT = COUMT + 1 Grande.” – What if first use is not assignment? • What does this mean? • Convention: – Variables starting with letters i, j, k, l, m, n are integers • How do we know? – Others are floating point – Bad practice: Encourages funny names (KOUNT, ISUM, • CONTEXT, CONTEXT, CONTEXT XLENGTH…)
65 66
11 9/23/20
Programming Languages SCOPE
• X = COUNT(I) • Scope of a binding of a name • What does this mean – Region of program where binding is visible – X integer or real • In FORTRAN – COUNT array or function – Subprogram names GLOBAL • Again Context • Can be called from anywhere – Set of variables visible when statement is – Variable names LOCAL seen • To subprogram where declared • Context is called ENVIRONMENT
67 68
Contour Diagram Once we have subprograms…
Global scope
R Main program • We need to find a way to share data S X – Parameters
R(2) • Pass by reference S(X) • Pass by value-result
R S – Caller copies value of actual to formal variable N N – On return, caller copies result value to actual X Y » Omit for constants or expressions as actuals Y S(X)
69 70
Once we have subprograms… Information Hiding • Share Data With Just Parameters? – Cumbersome, and hard to maintain • Information Hiding Principle – Produces long list of parameters • Modules should be designed so that – If data structure changes, there are many changes • The user has all the information needed to use to be made the module and nothing more. – Violates information hiding • The implementor has all the information needed to implement the module correctly and nothing more.
71 72
12 9/23/20
Sharing Data Common • FORTRAN’s solution: • COMMON blocks allow more flexibility – Allows sharing data between subprograms – Scope rules necessitate this • Consider a symbol table
SUBROUTINE ARRAY2 (N, L, C, D1, D2) COMMON /SYMTAB/ NAMES(100), LOC(100), TYPE(100) ... SUBROUTINE VAR (N, L, C) COMMON /SYMTAB/ NAMES(100), LOC(100), TYPE(100)
73 74
COMMON Problems Aliasing • Tedious to write • The ability to have more than one name • Unreadable for the same memory location • Virtually impossible to change AND • Very flexible! • COMMON permits aliasing, which is dangerous COMMON /B/ M, A(100) – If COMMON specifications don’t agree, misuse is possible COMMON /B/ X, K, C(50), D(50)
75 76
Aliasing EQUIVALENCE • Since dynamic memory allocation is not supported, and memory is scarce, FORTRAN has EQUIVALENCE
DIMENSION INDATA(10000), RESULT(8000) EQUIVALENCE INDATA(1), RESULT(8)
• Allows a way to explicitly alias two arrays to the same memory
77 78
13 9/23/20
EQUIVALENCE DESIGN: Syntactic Structures
• This is only to be used when usage of • Languages are defined by lexics and syntax INDATA and RESULT do not overlap – Lexics • Allows access to different data types (float as • Way to combine characters to form words or symbols if it was integer, etc.) • E.g. Identifier must begin with a letter, followed by no more than 5 letters or digits • Has same dangers as COMMON – Syntax • Way to combine symbols into meaningful instructions • Syntactic analysis: Lexical analyzer (scanner) Syntactic analyzer (parser)
79 80
Fixed Format Lexics Blanks Ignored • Still using punch-cards! • Particular columns had particular meanings • FORTRAN ignored spaces (not just white spaces) • Statements (columns 7-72) were free format • Thisisveryunfortunate!
Columns Purpose DIMENSION INDATA(10000), RESULT(8000) D I M E N S I O N I N D A T A (1 0 0 0 0), R E S U L T (8000) 1-5 Statement number DIMENSIONINDATA(10000),RESULT(8000)
6 Continuation
7-72 Statement • Lexing and parsing such a language is very difficult 73-90 Sequence number
81 82
Blanks Ignored No Reserved Words
• In combination with other features, it • FORTRAN allows variable named IF promoted mistakes DIMENSION IF(100) • How do you read this? DO 20 I = 1. 100 IF (I - 1) = 1 2 3 DO 20 I = 1, 100 IF (I - 1) 1, 2, 3 DO20I = 1.100 • The compiler does not know what • Variable DO20I is unlikely, but . and , are IF (I - 1) will be next to each other on the keyboard… – Needs to see , or = to decide
83 84
14 9/23/20
Algebraic Notation Operators Need Precedence
• One of the main goals was to facilitate • b2 – 4ac == (b2) – (4ac) scientific computing • ab2 == a(b2) – Algebraic notation had to look like math • Precedence rules 1. Exponentiation – (-B + SQRT(B**2 – 4*AA*C))/(2*A) 2. Multiplication and division – Very good, compared to our pseudo-code 3. Addition and subtraction • Operations on the same level are associated to the • Problems left (read left to right) – How do you parse and execute such a • How about unary operators (-)? statement?
85 86
Some Highlights Some Highlights
• Integer type is overworked • Arrays – Integer – Only data structure – Character strings – Data constructor – Addresses • Weak typing – Static – Limited to three dimensions • Combine the two and we have a security loophole – Meaningless operations can be performed without warning – Restrictions on index expressions – Optimized – Column major order for 2-dimensional – Not required to be initialized
87 88
15