Structure of Programming Languages – Lecture 6

CSCI 6636 – 4536

March 10, 2020

Outline

1 Expressions and Evaluation

2 Goto Statements

3 Structured Control
   What Kinds of Control are Needed?
   Structured Control
   Conditionals
   Repetition
   Exceptions

4 Control Expressions
   Conditional Expressions

5 Homework

Expressions and Evaluation – Arithmetic Expression Syntax

All programming languages support arithmetic expressions. The four primary arithmetic operations, +, -, *, and /, are always supported. JavaScript uses ** for exponentiation. APL has dedicated symbols for dozens of operations.

Parentheses are used to influence the order of evaluation.
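Here is a small illustration in Python. Python is used only as a convenient notation, and other languages differ in the details. Precedence and associativity decide how an unparenthesized expression groups, and parentheses override both:

```python
# Precedence: * binds tighter than +, so this groups as 2 + (3 * 4).
a = 2 + 3 * 4
# Parentheses override precedence.
b = (2 + 3) * 4
# Left associativity: 10 - 4 - 3 groups as (10 - 4) - 3.
c = 10 - 4 - 3
# Python's ** is right-associative: 2 ** 3 ** 2 groups as 2 ** (3 ** 2).
d = 2 ** 3 ** 2
# Parentheses force the other grouping.
e = (2 ** 3) ** 2

print(a, b, c, d, e)  # 14 20 3 512 64
```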

Expressions and Evaluation – Arithmetic Expression Semantics

The syntax for expressions is not enough to define their meaning. We must also know the evaluation rules. In most languages, the semantics of arithmetic expressions are defined by the rules for precedence and associativity; however, these rules differ from language to language. Some languages (Lisp, Scheme) are written in fully parenthesized prefix form. Forth is written in non-parenthesized postfix form. APL is (was) evaluated right to left, modified by parenthesized subexpressions. It is very difficult to comprehend the meaning of a complex APL expression.
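To see why postfix form needs neither parentheses nor precedence rules, here is a minimal stack-based evaluator in the Forth style. It is a Python sketch, not Forth itself. The function name `eval_postfix`, the four-operator table, and the use of floating-point numbers are all assumptions made for illustration.

```python
import operator

# Binary operators understood by this sketch (an illustrative subset).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_postfix(source):
    """Evaluate a space-separated postfix expression, Forth style.

    Numbers are pushed on a stack. An operator pops its two operands and
    pushes the result. The order of the tokens alone fixes the order of
    the operations.
    """
    stack = []
    for token in source.split():
        if token in OPS:
            right = stack.pop()   # the top of the stack is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

# (2 + 3) * 4 in infix is 2 3 + 4 * in postfix.
print(eval_postfix("2 3 + 4 *"))   # 20.0
# 2 + 3 * 4 in infix is 2 3 4 * + in postfix.
print(eval_postfix("2 3 4 * +"))   # 14.0
```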

Expressions and Evaluation – Order of Evaluation: Compile Time vs. Run Time

Precedence and associativity govern the order in which operands are parsed and added to the computation tree at compile time. They do NOT always govern evaluation order at run time.

The order in which the run-time system evaluates the two operands of an operator, or the arguments in a function call, is defined in some languages, undefined in others.

Expressions and Evaluation – Semantics: Order of Evaluation

In Scheme, the language semantics leave the evaluation order of the arguments unspecified. A correct program therefore must not depend on the order used: given the same arguments, the result of the operation must always be the same. In C/C++, the operands/arguments can be evaluated in any order that is convenient for the compiler. This leaves freedom for an optimizer to work. A program that relies on any particular evaluation order is not portable, and in some cases (for example, i++ + i++) its behavior is undefined. In Java and C#, the order is left-to-right.
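Like Java and C#, Python defines a left-to-right evaluation order. The sketch below records which operand is evaluated when. The helper `trace` and the function `add3` are hypothetical names. Precedence still groups the expression as f + (g * h), but the operands are evaluated in the order f, g, h. An equivalent C program is allowed to evaluate them in a different order.

```python
calls = []

def trace(name, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(name)
    return value

# Precedence groups this as f + (g * h), yet evaluation is still f, g, h.
result = trace("f", 1) + trace("g", 2) * trace("h", 3)
print(result, calls)   # 7 ['f', 'g', 'h']

calls.clear()

def add3(a, b, c):
    return a + b + c

# Function arguments are also evaluated left to right.
add3(trace("a", 1), trace("b", 2), trace("c", 3))
print(calls)           # ['a', 'b', 'c']
```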

Expressions and Evaluation – Lazy vs. Strict Evaluation

Strict Evaluation: Evaluate the expressions and statements in the order they are given in the program. Evaluate the argument-expressions before you call a function, and bind the results to the parameter name within the function.

Lazy Evaluation: Do not evaluate an expression unless it is needed to produce the output. Do not evaluate a parameter until you need it.
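Python passes arguments strictly. Lazy parameters can be simulated, however, by passing thunks: zero-argument functions that are called only when their value is needed. A lazy language such as Haskell does this automatically. The function names `choose` and `choose_lazy` are hypothetical.

```python
def choose(flag, if_true, if_false):
    """Strict: both arguments were already evaluated before the call."""
    return if_true if flag else if_false

def choose_lazy(flag, if_true, if_false):
    """Lazy (simulated): arguments are thunks, called only when needed."""
    return if_true() if flag else if_false()

x, z = 0, 10

# Strict call: z / x is evaluated before choose() runs, so it fails
# even though its value would not be used.
try:
    choose(x != 0, z / x, z)
except ZeroDivisionError:
    print("strict: division was evaluated and failed")

# Lazy call: z / x is wrapped in a lambda and never evaluated.
answer = choose_lazy(x != 0, lambda: z / x, lambda: z)
print("lazy:", answer)   # lazy: 10
```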

Expressions and Evaluation – In Praise of Laziness

A result from Lambda Calculus: outside-in evaluation is strictly more powerful than inside-out. Consider:

    if (x != 0) answer = z/x; else answer = z;

O-I: Evaluate (x != 0) first, then choose one clause. I-O: Evaluate z/x first and bomb with a divide-by-zero error whenever x is 0. Every programming language evaluates conditionals outside-in. Most languages evaluate everything else inside-out.

Some modern functional languages (Haskell, for example) use outside-in evaluation consistently; this is called lazy evaluation. Nothing is evaluated until and unless the result is needed.
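Python generators give a taste of this "compute only what is needed" behavior. The sketch below defines an unbounded stream of squares but produces only the elements that are actually requested. The `computed` list exists only to make the laziness visible, and the names are hypothetical.

```python
from itertools import count, islice

computed = []   # records which elements were actually produced

def squares():
    """An unbounded stream of squares, produced only on demand."""
    for n in count(1):
        computed.append(n)
        yield n * n

stream = squares()                 # creating the generator computes nothing
print(computed)                    # []
first_three = list(islice(stream, 3))
print(first_three, computed)       # [1, 4, 9] [1, 2, 3]
```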

Goto Statements – Jumps and Goto

A goto statement transfers control to some other labelled statement in the program. A jump transfers control forward or backward a defined distance in the executable code. Of course, assembly languages rely on goto for conditionals and loops. The Forth assembler uses jumps to implement conditionals and loops. In Basic, all lines were given numbers reflecting their proper order, and the target of a goto could be any specific line number. In , labels were arbitrary identifiers, declared at the top of a program unit. In C#, statement labels are identifiers written at the beginning of a line. You can also goto a case label within a switch. A C/C++ goto is like C#'s, but the target cannot be a case label.
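To make the Basic model concrete, here is a toy interpreter for a line-numbered program whose only means of control transfer is a jump to a line number. The instruction set (`LET`, `IF`, `GOTO`, `END`) and the function `run` are hypothetical. This is not real Basic syntax; it only sketches how a goto selects the next line to execute.

```python
def run(program):
    """Run a tiny line-numbered program (made-up instruction set).

    `program` maps line numbers to instructions:
      ("LET", var, fn)      set var to fn(env)
      ("IF", test, target)  if test(env) is true, go to line `target`
      ("GOTO", target)      go to line `target`
      ("END",)              stop and return the variables
    Lines run in numeric order unless a jump intervenes.
    """
    lines = sorted(program)
    env = {}
    pc = 0                                      # index into `lines`
    while True:
        instr = program[lines[pc]]
        op = instr[0]
        if op == "LET":
            env[instr[1]] = instr[2](env)
        elif op == "IF":
            if instr[1](env):
                pc = lines.index(instr[2])      # the target is a line number
                continue
        elif op == "GOTO":
            pc = lines.index(instr[1])
            continue
        elif op == "END":
            return env
        pc += 1

# Sum 1..5 using only LET, IF ... GOTO, and GOTO.
program = {
    10: ("LET", "i", lambda e: 1),
    20: ("LET", "total", lambda e: 0),
    30: ("IF", lambda e: e["i"] > 5, 70),
    40: ("LET", "total", lambda e: e["total"] + e["i"]),
    50: ("LET", "i", lambda e: e["i"] + 1),
    60: ("GOTO", 30),
    70: ("END",),
}
print(run(program)["total"])   # 15
```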

Goto Statements – Go to considered harmful. . .

In 1966, a paper by Boehm and Jacopini proved that a language that provides sequencing, selection, and iteration is sufficient to write a program to compute any computable function. In 1968, Edsger Dijkstra published “Go To Statement Considered Harmful”. It described the relatively poorer quality of programs written by programmers who relied on goto’s. Dijkstra advocated using if-then-else and while or repeat control structures because they mirror the structure of the process better than multiple go-to’s.

Donald Knuth responded in 1974 with an article, “Structured Programming with go to Statements”, which was a defense of the need for goto’s to achieve larger-scale error handling and responsive execution.

Dijkstra and Knuth are two of the great men in Computing, and this debate is still famous.

Goto Statements

For the next ten years, textbooks pushed the idea of programming with “structured style”. Using any other control statements was considered “poor technique”.

Fortran textbooks were published that contained clumsy, inefficient implementations of while loops and if-then-else instead of the native control statements.

One-in / one-out control structures became sacred cows: essential to avoid “spaghetti code”. Boolean status flags were used to control execution instead of goto’s.

Then C was introduced. It had a one-in/two-out control structure: a loop containing a break. By using the break effectively, the need for boolean spaghetti evaporated. Then C++ came, with exceptions. At that point, there was no longer a need for goto.
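The difference between "boolean spaghetti" and a one-in/two-out loop shows up clearly in a simple linear search. Both functions below return the index of `target`, or -1 if it is missing. The function names are hypothetical. The first version uses a status flag in the structured style. The second leaves the loop with `break` as soon as the answer is known.

```python
def find_with_flag(items, target):
    """Structured style: a Boolean status flag controls the loop."""
    i = 0
    found = False
    while i < len(items) and not found:
        if items[i] == target:
            found = True
        else:
            i += 1
    return i if found else -1

def find_with_break(items, target):
    """One-in/two-out loop: it ends normally or exits early via break."""
    index = -1
    for i, item in enumerate(items):
        if item == target:
            index = i
            break          # second exit: stop as soon as we know the answer
    return index
```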

Goto Statements – From the Delphi website, circa 1995:

Delphi is a dialect of Pascal, with some OO constructs added.

The Goto keyword forces a jump to a given label.

It should Never be used in modern code since it makes code very difficult to maintain.

It is mostly used to force a termination of heavily nested code, where the logic to safely exit would be tortuous.

Never jump into or out of try statements, or into a loop or conditional block.

Be careful! Use with extreme caution and only when fully justified.

Goto Statements – Why goto Statements are Obsolete

Using a goto is no longer good practice because it has bad effects on:
Translation: the compiler must remember the location of all statement labels for the scope of the program, just in case they are needed. To compile a goto statement, the compiler must search the list of labels.
Modularity: all parts of the program between the goto and the target become part of one module. This breaks or confuses modern ideas of modularity.
Debug-ability: using goto’s makes it much harder to trace a program’s logic, and therefore harder to debug.
Maintainability: for the same reason, it is harder to modify or maintain.
With the introduction of break, continue, and exceptions, there is no longer a need for goto’s.

Goto Statements – New Kinds of Spaghetti Code

All major languages developed after C implement, tweak, and extend the C control structures, and support exceptions. Goto’s are uniformly frowned upon, but many of these languages (C++ and C#, for example) still provide them; Java reserves goto as a keyword but does not implement it. However, we now have a variety of ways to make spaghetti code without using goto.

The term spaghetti code is now used to describe any program where the path of execution is not clear from the appearance of the code, and a flow chart is needed to figure out the logic.

You can write spaghetti code using “structured units” if you include deeply nested logic units controlled by multiple state variables (boolean or not). And it is still poor technique.

Structured Control – Minimal Necessary Control

Jacopini and Boehm proved that the if...then...else, while, and statement sequences form an adequate basis for programming without goto’s. Procedural languages are built on this foundation.

But you do not need all of these control structures: if...then...else and recursion (replacing loops and sequences) form an adequate basis for a language. Functional languages are built on this foundation; the pure functional languages do not need or support sequences or loops.
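The claim that if...then...else plus recursion can replace loops is easy to check on a small case. Both hypothetical functions below sum 1..n. The first uses a while loop and assignment. The second uses only a conditional and recursion, and no variable is ever reassigned. Note that Python does not eliminate tail calls, so the recursive version is limited by Python's recursion depth for large n. Pure functional languages avoid that limit.

```python
def sum_loop(n):
    """Sum 1..n with a while loop and assignment (procedural basis)."""
    total, i = 0, 1
    while i <= n:
        total += i
        i += 1
    return total

def sum_rec(n, i=1, total=0):
    """Sum 1..n using only if...else and recursion (functional basis).

    Each 'iteration' is a new call with new parameter values.
    """
    if i > n:
        return total
    else:
        return sum_rec(n, i + 1, total + i)

print(sum_loop(100), sum_rec(100))   # 5050 5050
```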

Structured Control – Structured vs. Unconstrained Control

The control capabilities built into a language can be structured or unconstrained. A goto statement or expression is unconstrained: It can go to a target anywhere. An if...then...else statement or a while loop is structured: it has a defined beginning and end, and control flows through it in a prescribed way. Function calls are structured control because a call defines a controlled interface between two parts of a program, and control returns to the point at which the call started. Exceptions and break statements are structured because they can only end up at defined spots in the program.

Structured Control – Minimal necessary control

To avoid using goto’s, a language needs a set of structured control constructs: the programmer needs a way to write conditional code and to repeat blocks of code. if...then...else and recursion form a minimal adequate basis for a language. Functional languages are built on this foundation.

if...then...else, the while loop, and statement sequences also form an adequate basis for programming. Procedural languages are built on this foundation. However, many languages of both sorts provide a wide assortment of other control structures for convenience and program clarity. Note: a simple if, without the possibility of an else clause, is not an adequate basis for defining a language.

Structured Control

Conditionals
Loops
Exceptions

These are familiar today to every programmer. The goal here is to understand the variations, the details, the design decisions, and what a compiler does with the control statement.

Structured Control – Variations on the Conditional

An easy way to start learning a new language is to find out how to write expressions, statements, and simple and multi-way conditionals. This is easy because all languages are very much alike in this area. However, there are lots of minor variations. Keywords vary: if, cond, WHEN, then, switch, EVALUATE, else, elseif, elif, case. The method of delimiting the three clauses varies: parentheses, brackets, case labels, keywords (then, elif, elseif, endif). There can be one test per possible action to execute (as in LISP cond) or one test altogether, followed by a sequence of several possible clauses, which will be selected based on the outcome of the test (as in a C switch). After an action is selected and executed, control normally goes to the end of the conditional unit. (The C switch is poorly designed because a separate break is needed to do this.)

Structured Control – Simple Conditional Semantics

A simple conditional statement consists of a test followed by two sets of actions (clauses). The first clause (the then clause) is used when the test result is true. The second clause (the else clause) is executed when the result is false.

A conditional statement must be evaluated outside-in. The condition must be evaluated first. The result of the condition is then tested and used to select either the then clause or the else clause. One or the other will be executed, but never both. The outside-in evaluation lets us write guarded expressions which avoid executing infinite loops or computations that would crash the program.
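A guarded computation in Python relies on the same outside-in rule. The test runs first, so the risky division is reached only when it is safe. Python's `and` operator short-circuits in the same way, so it can also serve as a guard. The function names are hypothetical.

```python
def safe_ratio(z, x):
    # The condition is evaluated first; z / x is reached only when x != 0.
    if x != 0:
        answer = z / x
    else:
        answer = z
    return answer

def ratio_exceeds(z, x, limit):
    # `and` also evaluates outside-in: when x == 0, z / x is never computed.
    return x != 0 and z / x > limit

print(safe_ratio(10, 0), safe_ratio(10, 4))   # 10 2.5
```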

Structured Control – Multi-way Conditionals-1

A multiway conditional has more than two clauses following the test. A cond in Lisp and Scheme is a series of (test action) pairs. The conditions are evaluated top-to-bottom. The action selected is the one following the first test that returns true. This is similar to the if..elseif..else in Fortran, the if..elsif..else in Ada, and the if..elif..else in Python. The same semantics is achieved in C with fewer keywords by nesting simple if statements and using braces to delimit clauses.
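The "first true test wins" semantics of cond can be sketched in Python in two ways. The first uses the built-in if..elif..else. The second uses a small hypothetical `cond` helper that takes (test, action) pairs as zero-argument functions, so the tests run top to bottom and only the selected action is executed. When no test succeeds, the helper returns `None`. That is a choice made for this sketch, not Scheme's rule.

```python
def cond(*clauses):
    """Scheme-style cond: run the action paired with the first true test."""
    for test, action in clauses:
        if test():
            return action()
    return None   # this sketch's choice when no test succeeds

def sign(n):
    # The same decision with Python's built-in if..elif..else.
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

def sign_cond(n):
    return cond(
        (lambda: n < 0,  lambda: "negative"),
        (lambda: n == 0, lambda: "zero"),
        (lambda: True,   lambda: "positive"),   # plays the role of else
    )

print(sign(-3), sign_cond(0), sign_cond(5))   # negative zero positive
```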

Structured Control – Multi-way Conditionals-2

The C switch and the Ada case illustrate a different multi-way semantics: there is only one test expression at the top, followed by multiple pairs, each pair consisting of one or more case labels and a set of actions. The case labels are integer or enumerated constants. A well-designed version has an optional default clause at the end. To execute a switch, the test expression is evaluated and the resulting value is compared to the case labels. If one matches, the corresponding action is selected. Reasonably clean programs can be built without the multi-way conditionals. They are common in languages, though, because they often provide a better way to model complex decision-making.
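The one-test, many-labels semantics can be approximated in Python with a dispatch table. The selector is evaluated once, its value is looked up among the case labels, and `dict.get` supplies the default. Several labels can share one action, much like stacked case labels in C. The function `day_kind` and its day numbering are hypothetical.

```python
def day_kind(day):
    """One selector value, several case labels, and a default clause."""
    actions = {
        1: "weekday", 2: "weekday", 3: "weekday", 4: "weekday", 5: "weekday",
        6: "weekend", 7: "weekend",
    }
    return actions.get(day, "invalid day")   # the default clause

print(day_kind(3), day_kind(7), day_kind(9))   # weekday weekend invalid day
```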

Structured Control – Compiling a Conditional

During code generation, the compiler works on the fully parsed program. To compile a simple conditional, it must insert a conditional jump and an unconditional jump into the code:

        if (test)
          on false, go to y
          True Action(s)
          go to z
    y:  False Action(s)
    z:  next line of code

When the conditional jump must be generated, the compiler does not know how far ahead to go, so it remembers the address of the unfinished jump instruction. At the end of the true actions, an incomplete unconditional forward jump is generated. The next address (y) becomes the target for the conditional jump. After compiling the false actions, the second jump is completed with address z.
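Here is a minimal sketch of this backpatching step. The code generator emits instructions into a list, leaves `None` wherever a jump target is still unknown, records that slot's address, and fills it in once the target address is reached. The instruction tuples and the name `compile_if` are hypothetical, and addresses are counted from 0 within this fragment.

```python
def compile_if(test, true_code, false_code):
    """Emit jump-based code for `if (test) true else false`, with backpatching.

    Made-up mini instruction set:
      ("TEST", expr)           evaluate the condition
      ("JUMP_IF_FALSE", addr)  conditional forward jump
      ("JUMP", addr)           unconditional jump
      ("SET", text)            stands in for ordinary actions
    None marks a jump target that is not yet known.
    """
    code = [("TEST", test)]
    code.append(("JUMP_IF_FALSE", None))     # target y unknown: remember the slot
    cond_jump = len(code) - 1
    code.extend(true_code)
    code.append(("JUMP", None))              # target z unknown: remember the slot
    end_jump = len(code) - 1
    y = len(code)                            # next address starts the false actions
    code[cond_jump] = ("JUMP_IF_FALSE", y)   # backpatch the conditional jump
    code.extend(false_code)
    z = len(code)                            # first address after the conditional
    code[end_jump] = ("JUMP", z)             # backpatch the unconditional jump
    return code

code = compile_if("x != 0", [("SET", "answer = z / x")], [("SET", "answer = z")])
for addr, instr in enumerate(code):
    print(addr, instr)
# 0 ('TEST', 'x != 0')
# 1 ('JUMP_IF_FALSE', 4)
# 2 ('SET', 'answer = z / x')
# 3 ('JUMP', 5)
# 4 ('SET', 'answer = z')
```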

Structured Control – Loops vs. Recursion

A programming language needs a way to specify repetition. This could be done using recursion. It could be done using loops. Modern languages usually support both kinds of repetition, in one way or another. Generally, a loop is faster than a recursion to do the same thing. Often, a recursion is more concise than a loop.
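Flattening a nested list shows the conciseness point well. The recursive version below follows the shape of the data directly. The loop version must manage an explicit stack of iterators to do the same job. Both function names are hypothetical, and no speed comparison is implied by this particular sketch.

```python
def flatten_rec(items):
    """Recursive version: short, and mirrors the nesting of the data."""
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(flatten_rec(item))
        else:
            result.append(item)
    return result

def flatten_loop(items):
    """Loop version: an explicit stack replaces the recursion."""
    result = []
    stack = [iter(items)]            # one iterator per level of nesting
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()              # this level is finished; resume the outer one
            continue
        if isinstance(item, list):
            stack.append(iter(item)) # descend into the sublist
        else:
            result.append(item)
    return result

print(flatten_rec([1, [2, [3, 4]], 5]))   # [1, 2, 3, 4, 5]
```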

Structured Control – Loop Statements

A loop statement executes a process that changes memory and/or produces a side-effect. A for-each loop applies an action to every element of an array or list. The general loop (FORTH and C) has a body of code with an exit test potentially anywhere within that body. Restricted loops have an exit test at the top (while) or at the bottom (do . . . while). Counted loops execute a block of code the number of times specified by the control code. Depending on the language, this can be computed at the beginning of the loop or remain flexible throughout the loop. We do not need all these loops to write clean, structured programs. They exist because they are useful ways to describe program logic.
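Each kind of loop can be shown in Python with a few lines. Python has no do...while statement, so the bottom-tested loop is written in the usual Python idiom: a general loop whose exit test follows the body. The data and variable names are illustrative only.

```python
data = [3, 1, 4, 1, 5]

# for-each: apply an action to every element.
total = 0
for value in data:
    total += value

# Restricted loop with the exit test at the top (while).
n, steps = 20, 0
while n > 1:
    n //= 2
    steps += 1

# Exit test at the bottom (do...while), written as a general loop:
# the body always runs at least once before the test.
attempts = 0
while True:
    attempts += 1
    if attempts >= 3:
        break

# Counted loop: range(5) fixes the number of repetitions up front.
squares = []
for i in range(5):
    squares.append(i * i)

print(total, steps, attempts, squares)   # 14 4 3 [0, 1, 4, 9, 16]
```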

Structured Control – Compiling a While Loop

To compile a loop, the compiler must insert a conditional forward jump after the loop test and an unconditional backward jump at the bottom:

    x:  while (test)
          on false, go to z
          Loop Action(s)
          go to x
    z:  next line of code

At the time the conditional jump must be generated, the compiler does not know how far ahead to go, so it remembers the address of the unfinished jump instruction. When it comes to the end of the loop actions, it generates an unconditional backward jump to the loop test (x). Then it uses the next machine address (z) as the target for the incomplete conditional jump.
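Here is a minimal sketch of the while-loop case. Only the forward exit jump needs backpatching. The backward jump can be emitted complete, because the address of the test (x) is already known when the bottom of the loop is reached. The instruction tuples and the name `compile_while` are hypothetical, and addresses are counted from 0 within this fragment.

```python
def compile_while(test, body):
    """Emit jump-based code for `while (test) body`, with one backpatch.

    Made-up mini instruction set: ("TEST", expr), ("JUMP_IF_FALSE", addr),
    ("JUMP", addr), and ("SET", text) standing in for ordinary actions.
    None marks a jump target that is not yet known.
    """
    code = []
    x = len(code)                            # address of the loop test
    code.append(("TEST", test))
    code.append(("JUMP_IF_FALSE", None))     # exit target z unknown: remember it
    exit_jump = len(code) - 1
    code.extend(body)
    code.append(("JUMP", x))                 # backward jump: x is already known
    z = len(code)                            # first address after the loop
    code[exit_jump] = ("JUMP_IF_FALSE", z)   # backpatch the forward jump
    return code

code = compile_while("i < n", [("SET", "i = i + 1")])
for addr, instr in enumerate(code):
    print(addr, instr)
# 0 ('TEST', 'i < n')
# 1 ('JUMP_IF_FALSE', 4)
# 2 ('SET', 'i = i + 1')
# 3 ('JUMP', 0)
```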

Structured Control – Compiling a General Loop

Some programs (for example, servers) run an explicit infinite loop. In such loops, there is no exit test at all. This is one-in-zero-out control. Far more common are situations that call for processing both before and after the loop test. Examples include the need for an eof test after reading input and before processing it. In these loops, the loop test can appear anywhere in the body of the loop. It is implemented using if. . . break logic:

    x:  Loop Action(s)
        if (test), on true go to z
        Loop Action(s)
        go to x
    z:  next line of code

As with a restricted loop, a conditional forward jump and an unconditional backward jump are required. The forward jump must be generated in an incomplete form and patched later, once address z is known; the target of the backward jump (x) is already known when that jump is generated.
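The read-then-test-then-process pattern appears in Python as a `while True` loop with the exit test in the middle of the body. This sketch sums one integer per line until end of input. `io.StringIO` stands in for a real file, and the function name is hypothetical.

```python
import io

def total_of_lines(stream):
    """Sum one integer per line until end of input.

    The exit test sits in the middle of the body: after the read,
    before the processing.
    """
    total = 0
    while True:
        line = stream.readline()   # action before the test: read
        if line == "":             # readline() returns "" only at end of file
            break
        total += int(line)         # action after the test: process
    return total

print(total_of_lines(io.StringIO("10\n20\n12\n")))   # 42
```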

Structured Control – Counted Loops

Counted loops differ more in syntax and semantics from language to language than any other kind of control structure. The biggest issue is whether there is a constant tripcount, calculated once before repetition begins; that is, whether the required number of repetitions of the loop body can be changed by something in the body of the loop. Two factors can make the tripcount unpredictable: Can the exit condition be changed within the loop, or is it constant? Can the loop variable be changed by the loop body, or only by the control clause? A predictable tripcount makes the task of the compiler and the work at runtime slightly easier. There is a much larger effect, however, on the clarity and debug-ability of the program. C’s for loop is extreme: totally flexible and much more prone to run-time surprises and gotchas.
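Python's `for` over `range()` is an example of a constant tripcount. The range object is built once before the first iteration, so changing the bound or rebinding the loop variable inside the body does not change the number of repetitions. A `while` loop re-tests its condition on every pass, and that gives it the flexibility (and the surprises) of a C for loop. The variable names are illustrative.

```python
# Constant tripcount: range(n) is evaluated once, before the loop starts.
n = 3
trips = 0
for i in range(n):
    n = 100          # changing the bound does not affect this loop
    i = 50           # rebinding i is overwritten on the next iteration
    trips += 1
print(trips)         # 3

# Flexible tripcount: the condition is re-evaluated on every pass,
# so the body can move the bound while the loop runs.
m, j, trips_while = 3, 0, 0
while j < m:
    if j == 0:
        m = 5
    j += 1
    trips_while += 1
print(trips_while)   # 5
```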

Structured Control – Compiling a Counted Loop

To compile a counted loop, the compiler must insert a conditional forward jump after the loop test and an unconditional backward jump at the bottom. The code begins with a jump over the increment, so that the first pass goes straight to the condition:

        go to y
    x:  Increment
    y:  Condition
          on false, go to z
          Loop Action(s)
          go to x
    z:  next line of code

At the time the conditional jump must be generated, the compiler does not know how far ahead to go, so it remembers the address of the unfinished jump instruction. When it comes to the end of the loop actions, it generates an unconditional backward jump to the increment (x). It then uses the next available address (z) as the target of the incomplete conditional jump. (The initial jump to y is patched the same way, as soon as the increment code has been generated.)

Structured Control – Exceptions

Long after structured programming became the accepted procedural paradigm, goto statements were still needed, possibly 1% of the time. Important applications were: Error handling that could not be done locally. Escape from the bottom levels of a deeply nested control structure. Exceptions became part of most major languages by the mid-90’s, although some hardware platforms failed to support them completely.
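The second use, escaping from the bottom of a deeply nested structure, can be sketched with an exception that carries the result out of two loops at once. `Found` and `find_in_grid` are hypothetical names. Many style guides prefer a plain `return` for a case this small; an exception is used here only to show the escape mechanism.

```python
class Found(Exception):
    """Hypothetical exception used to leave nested loops in one step."""
    def __init__(self, position):
        super().__init__(position)
        self.position = position

def find_in_grid(grid, target):
    try:
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                if value == target:
                    raise Found((r, c))   # exits both loops at once
    except Found as hit:
        return hit.position
    return None

print(find_in_grid([[1, 2], [3, 4]], 3))   # (1, 0)
```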

Structured Control – Implementation Difficulties

All stack frames from the one that throws the exception to the context that handles it must be removed from the runtime stack. The exception cannot be allocated on the stack, because the stack will be unwound. Therefore, in C++, an exception is a system object, and both allocation and deallocation are handled by the system. The major theoretical problem involved is how to handle resources (dynamic allocation, locks, open files) that belong to the intermediate stack frames when control makes a superman-jump from the context that threw the exception to an earlier context that could handle it. In Java, this is handled by finally clauses. This system is known to be a major contributing factor to software bugs when the software deals with multiple resources. In C++, the problem is handled by running the destructors for all objects in the dying stack frames. This works reliably if programmers write correct destructors.
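Python uses finally clauses in the same way Java does. The sketch below raises an exception two frames below its handler and logs the cleanup that runs in each intermediate frame as the stack is unwound. The "acquire"/"release" entries stand for real resources such as locks or files. All names are hypothetical.

```python
log = []

def inner():
    log.append("inner: acquire")
    try:
        raise ValueError("failure deep in the call chain")
    finally:
        log.append("inner: release")      # runs while this frame is unwound

def middle():
    log.append("middle: acquire")
    try:
        inner()
        log.append("middle: never reached")
    finally:
        log.append("middle: release")     # runs while this frame is unwound

def outer():
    try:
        middle()
    except ValueError as err:
        log.append(f"outer: handled {err}")

outer()
for entry in log:
    print(entry)
# middle: acquire
# inner: acquire
# inner: release
# middle: release
# outer: handled failure deep in the call chain
```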

Control Expressions

The Functional Philosophy
Conditional Expressions
Loop Expressions

Control Expressions – Expressions vs. Statements

All programming languages have control elements that allow testing and repetition. In statement-based languages, these are usually control statements. In functional languages, these are usually control expressions or functions. Some languages support both but rely more heavily on one or the other.

Control Expressions – The Functional Philosophy

Names and Values: A name is either undefined, or it has a single, immutable meaning. (No mutators, no assignment.) The type of a value can be deduced from the value itself. (It starts with some sort of a type tag.)

Functions: A function should depend only on its parameters and its own logic. It should not change anything outside its scope. Given the same parameters, the result should always be the same. A function IS an object and can be a parameter to or result of another function.
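Python is not a functional language, but it can illustrate both principles. A pure function depends only on its parameters. An impure one changes state outside its scope, so the same call can give different results. Functions are also objects that can be passed to, and returned from, other functions. All names in this sketch are hypothetical.

```python
counter = 0

def impure_next():
    """Not pure: it changes state outside its own scope."""
    global counter
    counter += 1
    return counter

def pure_add(a, b):
    """Pure: the result depends only on the parameters."""
    return a + b

def twice(f):
    """Functions are values: take a function, return a new function."""
    return lambda x: f(f(x))

add_three_twice = twice(lambda x: pure_add(x, 3))
print(add_three_twice(10))             # 16
print(impure_next(), impure_next())    # 1 2  (same call, different results)
```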

Control Expressions – Functional Principles

State: The state of a computation is entirely determined by the set of parameters in use on the stack. There is no global state.

Types: Objects have types; variable names don’t.

Control: Control is done through expressions, not statements. Statements do not exist.

Control Expressions – Order of Execution

Lazy functional languages such as Haskell use lazy evaluation throughout (ML, by contrast, uses strict evaluation). As a result of laziness, the sequence in which lines of code are written is not the same as the order in which they are evaluated. (An apparent statement sequence is an illusion: the results would be the same if the lines were scrambled.) Instead, evaluation order is determined by how function calls are nested in the code, and by whether local variables and parameters are used at all. Parameter binding is the primary means of naming a value. Functional languages use dynamic binding (no declarations) to attach names to objects, and allow only one binding during the lifetime of the name.

Control Expressions – Conditional Expressions

A conditional expression makes a test and returns a value. It can therefore be used in the middle of any expression, including another conditional expression. The conditional expression was supported by Algol-60. C’s conditional expression is ? :, and Java and C# share this syntax. Algol-60 and Python have analogous forms. Scheme and Lisp provide not-quite-identical versions of cond. Haskell and Miranda provide guarded expressions.
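Python's form is `a if test else b`. Because it yields a value, it can sit inside a larger expression, and it can nest, since the else part may itself be a conditional expression. The function name and the size thresholds are illustrative.

```python
def describe(n):
    # A conditional expression nested inside a string concatenation.
    return "size: " + ("small" if n < 10 else "medium" if n < 100 else "large")

x, z = 0, 10
# The guarded division z/x, written as an expression instead of a statement.
answer = z / x if x != 0 else z

print(describe(5), describe(50), describe(500), answer)
# size: small size: medium size: large 10
```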

Control Expressions – Loop Expressions

A loop expression evaluates a function repeatedly and calculates an answer, or iterates down an array or list, producing a result that is a number or another array or list. In Scheme, map operates on lists and produces a list as its result:

    (map 1+ (list 1 2 3 4 5))
    ;Value 2: (2 3 4 5 6)

In APL, any function can be applied to an array and produce an array:

    A ← 1 2 3 4 5
    B ← 2 3 ρ ι 6
    A + 1
2 3 4 5 6
    B × 2
2  4  6
8 10 12

These control expressions can accomplish in one line what the usual control statements take several lines to do.
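Python's `map` and comprehensions play a similar role. Each one below is a single expression that replaces an explicit loop. The reshape step imitates APL's `2 3 ρ ι 6` by hand, because Python lists have no built-in reshape. The variable names are illustrative.

```python
a = [1, 2, 3, 4, 5]

# Like Scheme's (map 1+ ...): apply a function to every element.
a_plus_1 = list(map(lambda v: v + 1, a))            # [2, 3, 4, 5, 6]

# Like APL's B ← 2 3 ρ ι 6: the values 1..6 arranged as 2 rows of 3.
values = list(range(1, 7))
b = [values[r * 3:(r + 1) * 3] for r in range(2)]   # [[1, 2, 3], [4, 5, 6]]

# Like APL's B × 2, spelled out as a nested comprehension.
b_times_2 = [[v * 2 for v in row] for row in b]     # [[2, 4, 6], [8, 10, 12]]

print(a_plus_1, b_times_2)
```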

Homework

Read Chapters 8 and 10 of the text.

1 This expression can be evaluated inside-out or outside-in:
      x = sqrt(3*z - floor(y))
  Which function is called first in an inside-out evaluation? Which is called first in an outside-in evaluation?

2 What is the difference between a conditional statement and a conditional expression? Note that a conditional expression is NOT what you might write in the parentheses of a while statement.

3 What restrictions are necessary in a C for loop to ensure that you can predict the tripcount ahead of time?

4 List the jump statements (things like break, continue, goto, and exceptions) that are supported in Ruby. Explain how each one works.

5 After reading the relevant section in the text, explain why unrestricted control statements (goto) make it difficult to translate and maintain a large program.
