Semantics Overview
• Syntax is about “form” and semantics about “meaning”
  – Boundary between syntax & semantics is not always clear
• First we’ll motivate why semantics matters.
• Then we’ll look at issues close to the syntax end, what some call “static semantics”, and the technique of attribute grammars.
• Then we’ll sketch three approaches to defining “deeper” semantics
  (1) Operational semantics
  (2) Axiomatic semantics
  (3) Denotational semantics

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Motivation
• Capturing what a program in some programming language means is very difficult
• We can’t really do it in any practical sense
  – For most work-a-day programming languages (e.g., C, C++, Java, Perl, C#)
  – For large programs
• So, why is it worth trying?
• One reason: program verification!
• Program verification is the process of formally proving that a computer program does exactly what is stated in the program specification it was written to realize

Program Verification
• Program verification can be done for simple programming languages and small or moderately sized programs
• It requires a formal specification for what the program should do, e.g., what its inputs will be and what actions it will take or output it will generate given the inputs
• That’s a hard task in itself!
• There are applications where it is worth it to (1) use a simplified programming language, (2) work out formal specs for a program, (3) capture the semantics of the simplified PL and (4) do the hard work of putting it all together and proving program correctness.
• What are they?

Program Verification
• There are applications where it is worth it to (1) use a simplified programming language, (2) work out formal specs for a program, (3) capture the semantics of the simplified PL and (4) do the hard work of putting it all together and proving program correctness. Like…
• Security and encryption
• Financial transactions
• Applications on which lives depend (e.g., healthcare, aviation)
• Expensive, one-shot, un-repairable applications (e.g., Martian rover)
• Hardware design (e.g., Pentium chip)

Double Int kills Ariane 5
• It took the European Space Agency 10 years and $7B to produce Ariane 5, a giant rocket capable of hurling a pair of three-ton satellites into orbit with each launch and intended to give Europe supremacy in the commercial space business
• All it took to explode the rocket less than a minute into its maiden voyage in 1996 was a small computer program trying to stuff a 64-bit number into a 16-bit space.
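To make “stuffing a 64-bit number into a 16-bit space” concrete, here is a minimal Python sketch. It is not the Ariane flight software; the function names and sample values are assumptions chosen for illustration. It contrasts an unchecked conversion, which silently keeps only the low 16 bits, with a checked one that refuses values that do not fit.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1

def to_int16_unchecked(value: float) -> int:
    """Truncate to an integer and keep only the low 16 bits (two's complement)."""
    n = int(value)
    return (n - INT16_MIN) % 2**16 + INT16_MIN

def to_int16_checked(value: float) -> int:
    """Convert only if the value fits in a signed 16-bit integer."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return n

print(to_int16_unchecked(1234.5))   # 1234: small values survive
print(to_int16_unchecked(40000.0))  # -25536: a large value wraps to nonsense
```

A specification that states the legal range of each input is exactly the kind of information a verification effort needs in order to rule out the second case.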


Intel Pentium Bug
• In the mid 90’s a bug was found in the floating point hardware in Intel’s latest Pentium microprocessor
• Unfortunately, the bug was only found after many had been made and sold
• The bug was subtle, affecting only the ninth decimal place of some computations
• But users cared
• Intel had to recall the chips, taking a $500M write-off

So…
• While automatic program verification is a long range goal …
• Which might be restricted to applications where the extra cost is justified
• We should try to design programming languages that help, rather than hinder, our ability to make progress in this area.
• We should continue research on the semantics of programming languages …
• And the ability to prove program correctness

Semantics
• Next we’ll look at issues close to the syntax end, what some call “static semantics”, and the technique of attribute grammars.
• Then we’ll sketch three approaches to defining “deeper” semantics
  (1) Operational semantics
  (2) Axiomatic semantics
  (3) Denotational semantics

Static Semantics
• Static semantics covers some language features difficult or impossible to handle in a BNF/CFG
• It’s also a mechanism for building a parser that produces an abstract syntax tree of its input
• Attribute grammars are one common technique
• Categories attribute grammars can handle:
  - Context-free but cumbersome (e.g., type checking)
  - Non-context-free (e.g., variables must be declared before they are used)


Parse tree vs. abstract syntax tree
• Parse trees follow a grammar and usually have lots of useless nodes
• An abstract syntax tree (AST) eliminates useless structural nodes
• And uses just those nodes that correspond to constructs in the higher level programming language
• It’s much easier to interpret an AST or generate code from it

[Figure: three trees labeled “parse tree”, “an AST” and “another AST”, built from e, + and int nodes over the values 1, 2 and 3]

Attribute Grammars
• Attribute Grammars (AGs) were developed by Donald Knuth in ~1968
• Motivation:
  • CFGs cannot describe all of the syntax of programming languages
  • Additions to CFGs to annotate the parse tree with some “semantic” info
• Primary value of AGs:
  • Static semantics specification
  • Compiler design (static semantics checking)

Attribute Grammar Example
• Ada has this rule to describe procedure definitions:

Attribute Grammars
A Grammar is formally defined by specifying four components.
• S is the start symbol
• N is a set of non-terminal symbols
Def: An attribute grammar is a CFG G=

Attribute Grammars
• Let X0 => X1 ... Xn be a grammar rule
• Functions of the form S(X0) = f(A(X1),...A(Xn)) define synthesized attributes
  - i.e., attribute defined by a node’s children
• Functions of the form I(Xj) = f(A(X0),…A(Xn)) for 1 <= j <= n define inherited attributes
  - i.e., attribute defined by parent and siblings
• Initially, there are intrinsic attributes on the leaves

Attribute Grammars
Example: expressions of the form id + id
• id's can be either int_type or real_type
• types of the two id's must be the same
• type of the expression must match its expected type
BNF:
  ` +
  -> id
Attributes:
  actual_type - synthesized for and `

Attribute Grammars
Attribute Grammar:
1. Syntax rule: `[1] + [2]
   Semantic rules: `
2. Syntax rule: ` -> id
   Semantic rule: .actual_type ← lookup_type (id, )

Compilers usually maintain a “symbol table” where they record the names of procedures and variables along with type information. Looking up this information in the symbol table is a common operation.

Attribute Grammars (continued)
How are attribute values computed?
• If all attributes were inherited, the tree could be decorated in top-down order
• In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used

Attribute Grammars (continued)
Suppose we process the expression A+B using rule `[1] + [2]
`

Attribute Grammar Summary
• AGs are a practical extension to CFGs that allow us to annotate the parse tree with information needed for semantic processing
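The id + id example can be sketched in Python as follows. This is only an illustration under stated assumptions: the symbol table contents, the function name check_sum, the one-argument lookup_type and the error messages are hypothetical and not taken from the slides. What it does follow from the slides is the split between actual_type (synthesized, flowing up from the leaves) and expected_type (inherited, handed down from the context).

```python
# Hypothetical symbol table: declared names and their types.
SYMBOL_TABLE = {"A": "int_type", "B": "int_type", "X": "real_type"}

def lookup_type(name: str) -> str:
    """Intrinsic attribute of a leaf: the declared type of an identifier."""
    return SYMBOL_TABLE[name]

def check_sum(left_id: str, right_id: str, expected_type: str):
    """Decorate the tree for `left_id + right_id`.

    expected_type is inherited (supplied by the parent context);
    actual_type is synthesized (computed from the children).
    Returns (actual_type, list_of_error_messages).
    """
    left_actual = lookup_type(left_id)    # synthesized from the leaf
    right_actual = lookup_type(right_id)  # synthesized from the leaf
    errors = []
    if left_actual != right_actual:
        errors.append(f"operand types differ: {left_actual} vs {right_actual}")
    actual_type = left_actual             # assumption: the sum takes its operand type
    if actual_type != expected_type:
        errors.append(f"expected {expected_type}, got {actual_type}")
    return actual_type, errors

print(check_sum("A", "B", "int_type"))   # ('int_type', [])
print(check_sum("A", "X", "int_type")[1])
```

Processing A+B with an expected type of int_type decorates the tree without errors; swapping B for X, or expecting real_type, trips one of the two predicates.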

Static vs. Dynamic Semantics
• Attribute grammars are an example of static semantics (e.g., type checking) that don’t reason about how things change when a program is executed
• But understanding what a program means often requires reasoning about how, for example, a variable’s value changes
• Dynamic semantics tries to capture this
  – E.g., proving that an array index will never be out of its intended range

Dynamic Semantics
• No single widely acceptable notation or formalism for describing semantics.
• Here are three approaches at which we’ll briefly look:
  – Operational semantics
  – Axiomatic semantics
  – Denotational semantics

Dynamic Semantics
• Q: How might we define what expressions in a language mean?
• A: One approach is to give a general mechanism to translate a sentence in L into a set of sentences in another language or system that is well defined
• For example:
  • Define the meaning of computer science terms by translating them into ordinary English
  • Define the meaning of English by showing how to translate into French
  • Define the meaning of French expressions by translating into mathematical logic

Operational Semantics
• Idea: describe the meaning of a program in language L by specifying how statements affect the state of a machine (simulated or actual) when executed.
• The change in the state of the machine (memory, registers, stack, heap, etc.) defines the meaning of the statement
• Similar in spirit to the notion of a Turing Machine and also used informally to explain higher-level constructs in terms of simpler ones.

Alan Turing and his Machine
• The Turing machine is an abstract machine introduced in 1936 by Alan Turing
  – Alan Turing (1912–54) was a British mathematician, logician, cryptographer, considered a father of modern computer science
• It can be used to give a mathematically precise definition of algorithm or 'mechanical procedure’
  – Concept is still widely used in theoretical computer science, especially in complexity theory and the theory of computation.

Operational Semantics
• This is a common technique
• Here’s how we might explain the meaning of the for statement in C in terms of a simpler reference language:

  c statement          operational semantics
  for(e1;e2;e3) {}     e1;
                       loop: if e2=0 goto exit
                       e3;
                       goto loop
                       exit:
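The for-statement translation can be tried out on a toy machine. The Python sketch below is an assumption-laden illustration, not a definition from the slides: the instruction format (label, op, arg), the op names and the sample loop body are all made up. It runs the lowered form of `for (i = 0; i < 3; i = i + 1) { total = total + i; }` and shows that the meaning of the statement is the change it makes to the machine state.

```python
def run(program, state):
    """Execute a list of (label, op, arg) instructions and return the final state."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    pc = 0
    while pc < len(program):
        _, op, arg = program[pc]
        if op == "exec":             # arg: function that updates the state in place
            arg(state)
        elif op == "ifzero_goto":    # arg: (condition, target); jump when condition is false
            condition, target = arg
            if not condition(state):
                pc = labels[target]
                continue
        elif op == "goto":           # arg: target label
            pc = labels[arg]
            continue
        elif op == "halt":
            break
        pc += 1
    return state

program = [
    (None,   "exec",        lambda s: s.update(i=0)),                        # e1
    ("loop", "ifzero_goto", (lambda s: s["i"] < 3, "exit")),                 # if e2=0 goto exit
    (None,   "exec",        lambda s: s.update(total=s["total"] + s["i"])),  # loop body (assumed)
    (None,   "exec",        lambda s: s.update(i=s["i"] + 1)),               # e3
    (None,   "goto",        "loop"),
    ("exit", "halt",        None),
]

print(run(program, {"total": 0}))  # {'total': 3, 'i': 3}
```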

Operational Semantics
• To use operational semantics for a high-level language, a virtual machine is needed
• A hardware pure interpreter is too expensive
• A software pure interpreter also has problems:
  • The detailed characteristics of the particular computer make actions hard to understand
  • Such a semantic definition would be machine-dependent

Operational Semantics
A better alternative: a complete computer simulation
• Build a translator (translates source code to the machine code of an idealized computer)
• Build a simulator for the idealized computer
Evaluation of operational semantics:
• Good if used informally
• Extremely complex if used formally (e.g., VDL)

Vienna Definition Language
• VDL was a language developed at the IBM Vienna Labs as a language for formal, algebraic definition via operational semantics.
• It was used to specify the semantics of PL/I
• See: The Vienna Definition Language, P. Wegner, ACM Comp Surveys 4(1):5-63 (Mar 1972)
• The VDL specification of PL/I was very large, very complicated, a remarkable technical accomplishment, and of little practical use.

The Lambda Calculus
• The first use of operational semantics was in the lambda calculus
  – A formal system designed to investigate function definition, function application and recursion
  – Introduced by Alonzo Church and Stephen Kleene in the 1930s
• The lambda calculus can be called the smallest universal programming language
• It’s widely used today as a target for defining the semantics of a programming language

What’s a calculus, anyway?
“A method of computation or calculation in a special notation (as of logic or symbolic logic)” -- Merriam-Webster

The Lambda Calculus
• The lambda calculus consists of a single transformation rule (variable substitution) and a single function definition scheme
• The lambda calculus is universal in the sense that any computable function can be expressed and evaluated using this formalism
• We’ll revisit the lambda calculus later in the course
• The Lisp language is close to the lambda calculus model

The Lambda Calculus
• The lambda calculus
  – introduces variables ranging over values
  – defines functions by (lambda) abstracting over variables
  – applies functions to values
• Examples:
  simple expression: x + 1
  function that adds one to its arg: λx. x + 1
  applying it to 2: (λx. x + 1) 2
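Python’s lambda expressions are close enough to the notation to replay the examples directly. The first two lines mirror the slide. The Church numerals that follow are an extra illustration not on the slides; they hint at how numbers and addition can be built from nothing but abstraction and application, and the names zero, succ, plus and to_int are chosen here for readability.

```python
# λx. x + 1 and its application to 2
add_one = lambda x: x + 1
print(add_one(2))                      # 3

# Church numerals: the number n is "apply f n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(numeral):
    """Convert a Church numeral to a Python int by counting applications."""
    return numeral(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(plus(two)(three)))        # 5
```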

Operational Semantics Summary
• The basic idea is to define a language’s semantics in terms of a reference language, system or machine
• Its use ranges from the theoretical (e.g., lambda calculus) to the practical (e.g., Java Virtual Machine)

Axiomatic Semantics
• Based on formal logic (first order predicate calculus)
• Original purpose: formal program verification
• Approach: Define axioms and inference rules in logic for each statement type in the language (to allow transformations of expressions to other expressions)
• The expressions are called assertions and are either
  • Preconditions: An assertion before a statement states the relationships and constraints among variables that are true at that point in execution
  • Postconditions: An assertion following a statement

Logic 101
Propositional logic:
  Logical constants: true, false
  Propositional symbols: P, Q, S, ... that are either true or false
  Logical connectives: ∧ (and), ∨ (or), ⇒ (implies), ⇔ (is equivalent), ¬ (not), which are defined by truth tables.
  Sentences are formed by combining propositional symbols, connectives and parentheses and are either true or false.
  e.g.: P∧Q ⇔ ¬ (¬P ∨ ¬Q)
First order logic adds
  (1) Variables which can range over objects in the domain of discourse
  (2) Quantifiers including: ∀ (forall) and ∃ (there exists)
  (3) Predicates to capture domain classes and relations
  Examples:
    (∀p) (∀q) p∧q ⇔ ¬ (¬p ∨ ¬q)
    ∀x prime(x) ⇒ ∃y prime(y) ∧ y>x
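Because a propositional sentence has only finitely many truth assignments, its truth table can be enumerated mechanically. The short Python sketch below prints the tables for the binary connectives and checks that the example sentence P∧Q ⇔ ¬(¬P ∨ ¬Q) holds in every row. The helper names implies and iff are chosen for this illustration.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """P ⇒ Q is false only when P is true and Q is false."""
    return (not p) or q

def iff(p: bool, q: bool) -> bool:
    """P ⇔ Q holds when both sides have the same truth value."""
    return p == q

rows = list(product([True, False], repeat=2))

print("P      Q      and    or     implies iff")
for p, q in rows:
    print(f"{p!s:6} {q!s:6} {(p and q)!s:6} {(p or q)!s:6} {implies(p, q)!s:7} {iff(p, q)!s}")

# The example sentence is a tautology: it is true in all four rows.
de_morgan_holds = all(iff(p and q, not ((not p) or (not q))) for p, q in rows)
print(de_morgan_holds)  # True
```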

Axiomatic Semantics
• Axiomatic semantics is based on Hoare Logic (after computer scientist Sir Tony Hoare)
• Based on triples that describe how execution of a statement changes the state of the computation
• Example: {P} S {Q} where
  - P is a logical statement of what’s true before executing S
  - Q is a logical expression describing what’s true after
• In general we can reason forward or backward
  - Given P and S determine Q
  - Given S and Q determine P
• Concrete example: {x>0} x = x+1 {x>1}

Axiomatic Semantics
A weakest precondition is the least restrictive precondition that will guarantee the postcondition
Notation:
  {P} Statement {Q}
  where P is the precondition and Q is the postcondition
Example:
  {?} a := b + 1 {a > 1}
We often need to infer what the precondition must be for a given post-condition
  One possible precondition: {b>10}
  Another: {b>1}
  Weakest precondition: {b > 0}

Weakest Precondition?
• A weakest precondition is the least restrictive precondition that will guarantee the post-condition
• There are an infinite number of possible preconditions P? that satisfy
  {P?} a := b + 1 {a > 1}
• Namely b>0, b>1, b>2, b>3, b>4, …
• The weakest precondition is one that logically is implied by all of the (other) preconditions
  • b>1 => b>0
  • b>2 => b>0
  • b>3 => b>0
  • …

Axiomatic Semantics in Use
Program proof process:
• The post-condition for the whole program is the desired results
• Work back through the program to the first statement
• If the precondition on the first statement is the same as (or implied by) the program specification, the program is correct
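The claim about a := b + 1 {a > 1} can be checked by brute force over a finite range of integers. This is a test on a sample domain, not a proof, and the domain and helper names below are assumptions made for the sketch. A candidate precondition is valid if every state satisfying it leads to the postcondition, and it is the weakest if it accepts exactly the states that do.

```python
def post_holds(b: int) -> bool:
    """Run a := b + 1 and test the postcondition a > 1."""
    a = b + 1
    return a > 1

DOMAIN = range(-50, 51)   # assumed finite sample of integer states

def is_valid(pre) -> bool:
    """Every state allowed by pre ends with the postcondition true."""
    return all(post_holds(b) for b in DOMAIN if pre(b))

def is_weakest(pre) -> bool:
    """pre accepts exactly the states from which the postcondition holds."""
    return all(pre(b) == post_holds(b) for b in DOMAIN)

candidates = {
    "b > 10": lambda b: b > 10,
    "b > 1":  lambda b: b > 1,
    "b > 0":  lambda b: b > 0,
    "b > -1": lambda b: b > -1,
}
for name, pre in candidates.items():
    print(name, "valid:", is_valid(pre), "weakest:", is_weakest(pre))
```

On this domain b > 10 and b > 1 are valid but not weakest, b > 0 is both, and b > -1 is not valid because b = 0 leaves a = 1.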

Example: Assignment Statements
Here’s how we might define a simple assignment statement of the form x := e in a programming language.
• {Qx->E} x := E {Q}
• Where Qx->E means the result of replacing all occurrences of x with E in Q
So from
  {Q} a := b/2-1 {a<10}
We can infer that the weakest precondition Q is
  b/2-1<10
which can be rewritten as
  b<22

Axiomatic Semantics
• The Rule of Consequence:
    {P} S {Q}, P’ => P, Q => Q’
    ----------------------------
    {P'} S {Q'}
• An inference rule for sequences
  for a sequence S1 ; S2:
    {P1} S1 {P2}
    {P2} S2 {P3}
  the inference rule is:
    {P1} S1 {P2}, {P2} S2 {P3}
    ---------------------------
    {P1} S1; S2 {P3}

A notation from symbolic logic for specifying a rule of inference with premise P and consequence Q is
    P
    ---
    Q
e.g., modus ponens can be specified as:
    P, P=>Q
    -------
    Q
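The assignment axiom and the sequence rule can be mimicked with predicates represented as Python functions over a state dictionary. This is a sketch under assumptions: the names assign and seq, the dictionary representation of states and the second example program are invented for illustration. Substituting E for x in Q becomes “evaluate Q in the state where x has been updated to the value of E”.

```python
def assign(var, expr):
    """Precondition transformer for `var := expr` (the assignment axiom)."""
    def transform(post):
        return lambda state: post({**state, var: expr(state)})
    return transform

def seq(first, second):
    """Sequence rule: the precondition of S1; S2 is computed back through S2, then S1."""
    return lambda post: first(second(post))

# {?} a := b/2 - 1 {a < 10}
pre = assign("a", lambda s: s["b"] / 2 - 1)(lambda s: s["a"] < 10)
same_as_b_lt_22 = all(pre({"b": b}) == (b < 22) for b in range(-100, 101))
print(same_as_b_lt_22)  # True

# {?} y := x + 1; z := y * 2 {z > 4}   (hypothetical two-statement program)
program = seq(assign("y", lambda s: s["x"] + 1), assign("z", lambda s: s["y"] * 2))
pre2 = program(lambda s: s["z"] > 4)
same_as_x_gt_1 = all(pre2({"x": x}) == (x > 1) for x in range(-100, 101))
print(same_as_x_gt_1)   # True
```

The checks compare the computed precondition with b<22 and x>1 on a finite sample of integers only, so they illustrate the rules rather than prove them.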

Conditional Example
Here’s a rule for a conditional statement
    {B ∧ P} S1 {Q}, {¬B ∧ P} S2 {Q}
    --------------------------------
    {P} if B then S1 else S2 {Q}
And an example of its use for the statement
    {P} if x>0 then y=y-1 else y=y+1 {y>0}
So the weakest precondition P can be deduced as follows:
  The postcondition of S1 and S2 is Q.
  The weakest precondition of S1 is x>0 ∧ y>1 and for S2 is x<=0 ∧ y>-1
  The rule of consequence and the fact that y>1 ⇒ y>-1 support the conclusion that the weakest precondition for the entire conditional is y>1 (taking P to be a condition on y alone).

Conditions
Suppose we have:
    {P}
    If x>0 then y=y-1 else y=y+1
    {y>0}
Our rule
    {B ∧ P} S1 {Q}, {¬B ∧ P} S2 {Q}
    --------------------------------
    {P} if B then S1 else S2 {Q}
Consider the two cases:
  – x>0 and y>1
  – x<=0 and y>-1
• What is a (weakest) condition that implies both y>1 and y>-1?

Conditional Example
• What is a (weakest) condition that implies both y>1 and y>-1?
• Well y>1 implies y>-1
• y>1 is the weakest condition that ensures that after the conditional is executed, y>0 will be true.
• Our answer then is this:
    {y>1}
    If x>0 then y=y-1 else y=y+1
    {y>0}

Loops
For the loop construct {P} while B do S end {Q} the inference rule is:
    {I ∧ B} S {I}
    ------------------------------
    {I} while B do S {I ∧ ¬B}
where I is the loop invariant, a proposition necessarily true throughout the loop’s execution
• I is true before the loop executes and also after the loop executes
• B is false after the loop executes
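The answer {y>1} for the conditional example can be sanity-checked by running the statement on a grid of integer states. The Python sketch below uses an assumed sample domain and made-up helper names. It confirms that y>1 guarantees y>0 afterwards for every x, and that relaxing the condition on y to y>0 lets a failing state through.

```python
def run_conditional(x: int, y: int) -> int:
    """if x>0 then y=y-1 else y=y+1; returns the new value of y."""
    return y - 1 if x > 0 else y + 1

DOMAIN = range(-20, 21)   # assumed finite sample of integer values

# {y > 1} is sufficient whatever x is.
sufficient = all(
    run_conditional(x, y) > 0
    for x in DOMAIN for y in DOMAIN if y > 1
)
print(sufficient)  # True

# Relaxing to y > 0 admits y = 1, which fails in the then-branch.
counterexamples = [
    (x, y) for x in DOMAIN for y in DOMAIN
    if y > 0 and not run_conditional(x, y) > 0
]
print(counterexamples[0])  # (1, 1)
```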

Loop Invariants
A loop invariant I must meet the following conditions:
1. P => I (the loop invariant must be true initially)
2. {I} B {I} (evaluation of the Boolean must not change the validity of I)
3. {I and B} S {I} (I is not changed by executing the body of the loop)
4. (I and (not B)) => Q (if I is true and B is false, Q is implied)
5. The loop terminates (this can be difficult to prove)
• The loop invariant I is a weakened version of the loop postcondition, and it is also a precondition.
• I must be weak enough to be satisfied prior to the beginning of the loop, but when combined with the loop exit condition, it must be strong enough to force the truth of the postcondition

Evaluation of Axiomatic Semantics
• Developing axioms or inference rules for all of the statements in a language is difficult
• It is a good tool for correctness proofs, and an excellent framework for reasoning about programs
• It is much less useful for language users and compiler writers
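The numbered conditions can be watched in action by instrumenting a small loop with assertions. The example loop (summing 0 through n-1), its invariant and the function names are assumptions chosen for this sketch. Runtime assertions only test the invariant on the inputs actually run, so this is a check, not a proof.

```python
def invariant(i: int, total: int, n: int) -> bool:
    """I: 0 <= i <= n and total is the sum 0 + 1 + ... + (i-1)."""
    return 0 <= i <= n and total == i * (i - 1) // 2

def sum_below(n: int) -> int:
    assert n >= 0                          # precondition P
    i, total = 0, 0
    assert invariant(i, total, n)          # condition 1: P => I
    while i < n:                           # B; evaluating it changes nothing (condition 2)
        total += i
        i += 1
        assert invariant(i, total, n)      # condition 3: {I and B} S {I}
    # Condition 4: I and not B give i == n, hence the postcondition Q.
    assert invariant(i, total, n) and not (i < n)
    assert total == n * (n - 1) // 2       # Q
    return total                           # condition 5: n - i shrinks each pass, so the loop ends

print(sum_below(5))  # 10
```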

Denotational Semantics
• A technique for describing the meaning of programs in terms of mathematical functions on programs and program components.
• Programs are translated into functions about which properties can be proved using the standard mathematical theory of functions, and especially domain theory.
• Originally developed by Scott and Strachey (1970) and based on recursive function theory
• The most abstract semantics description method

Denotational Semantics
• The process of building a denotational specification for a language:
  1. Define a mathematical object for each language entity
  2. Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects
• The meaning of language constructs is defined by only the values of the program's variables

Denotational Semantics (continued)
The difference between denotational and operational semantics: In operational semantics, the state changes are
• Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable:
  VARMAP(ij, s) = vj

Example: Decimal Numbers
Mdec (

Expressions
Me(` =>
  if VARMAP( , s) = undef
    then error
    else VARMAP(, s)

Assignment Statements
  if Me(E, s) = error
    then error
    else s’ = {`
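Since parts of these definitions did not survive, the following Python sketch shows one way the surviving fragments fit together. Everything beyond those fragments is an assumption: states are dictionaries from names to values, expressions are integers, variable names or ('+' or '*', left, right) tuples, and the function names varmap, m_e and m_a are lower-case stand-ins for VARMAP, Me and the assignment mapping.

```python
UNDEF = "undef"
ERROR = "error"

def varmap(name, state):
    """VARMAP(i, s): the current value of variable i in state s, or undef."""
    return state.get(name, UNDEF)

def m_e(expr, state):
    """Meaning of an expression: a value, or error."""
    if isinstance(expr, int):              # numeric literal
        return expr
    if isinstance(expr, str):              # variable reference
        value = varmap(expr, state)
        return ERROR if value == UNDEF else value
    op, left, right = expr                 # assumed binary form: (op, left, right)
    lhs, rhs = m_e(left, state), m_e(right, state)
    if lhs == ERROR or rhs == ERROR:
        return ERROR
    return lhs + rhs if op == "+" else lhs * rhs

def m_a(name, expr, state):
    """Meaning of `name := expr`: a new state s', or error."""
    value = m_e(expr, state)
    if value == ERROR:
        return ERROR
    return {**state, name: value}          # s' equals s except at `name`

print(m_e(("+", "x", 2), {"x": 3}))        # 5
print(m_a("x", ("*", "x", 2), {"x": 3}))   # {'x': 6}
print(m_a("x", "y", {}))                   # error
```

Note that m_a returns a fresh state rather than mutating its argument, which matches the view of a statement as a function from states to states.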

Logical Pretest Loops
Ml(while B do L, s) Δ=
  if Mb(B, s) = undef
    then error
  else if Mb(B, s) = false
    then s
  else if Msl(L, s) = error
    then error
  else Ml(while B do L, Msl(L, s))

Logical Pretest Loops
• The meaning of the loop is the value of the program variables after the statements in the loop have been executed the prescribed number of times, assuming there have been no errors
• In essence, the loop has been converted from iteration to recursion, where the recursive control is mathematically defined by other recursive state mapping functions
• Recursion, when compared to iteration, is easier to describe with mathematical rigor
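The definition of Ml transcribes almost line for line into a recursive Python function. In this sketch the condition B and the body L are passed in as their meaning functions m_b and m_sl, an assumption made to keep the example self-contained; the sample loop and the dictionary representation of states are likewise invented.

```python
UNDEF = "undef"
ERROR = "error"

def m_l(m_b, m_sl, state):
    """Ml(while B do L, s), with B and L given by their meaning functions."""
    condition = m_b(state)
    if condition == UNDEF:
        return ERROR
    if condition is False:
        return state
    next_state = m_sl(state)
    if next_state == ERROR:
        return ERROR
    return m_l(m_b, m_sl, next_state)      # iteration expressed as recursion

# while i < 3 do total := total + i; i := i + 1
m_b = lambda s: s["i"] < 3 if "i" in s else UNDEF
m_sl = lambda s: {**s, "total": s["total"] + s["i"], "i": s["i"] + 1}

print(m_l(m_b, m_sl, {"i": 0, "total": 0}))   # {'i': 3, 'total': 3}
print(m_l(m_b, m_sl, {"total": 0}))           # error
```

Python limits recursion depth, so this form suits short loops only; the mathematical definition has no such limit.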

Denotational Semantics
Evaluation of denotational semantics:
• Can be used to prove the correctness of programs
• Provides a rigorous way to think about programs
• Can be an aid to language design
• Has been used in compiler generation systems

Summary
This lecture we covered the following
• Backus-Naur Form and Context Free Grammars
• Syntax Graphs and Attribute Grammars
• Semantic Descriptions: Operational, Axiomatic and Denotational