Overview • Syntax is about form and semantics meaning – Boundary between syntax & semantics is not always clear • First we’ll motivate why semantics matters • Then we’ll look at issues close to the syntax end (e.g., static semantics) and attribute grammars • Finally we’ll sketch three approaches to defining “deeper” semantics: (1) (2) Axiomatic semantics (3)

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Motivation Motivation: Some Reasons • Capturing what a program in some • How to convey to the programming language programming language means is very difficult compiler/interpreter writer what she should do? • We can’t really do it in any practical sense – Natural language may be too ambiguous – For most work-a-day programming • How to know that the compiler/interpreter did languages (e.g., C, C++, Java, Perl, C#, the right thing when it executed our code? Python) – We can’t answer this w/o a very solid idea of what – For large programs the right thing is • So, why is worth trying? • How to be sure that the program satisfies its specification? – Maybe we can do this automatically if we know what the program means

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Program Verification Program Verification

• Program verification is the process of formally proving that the computer program does exactly • There are applications where it is worth it to what is stated in the program’s specification (1) use a simplified programming language • Program verification can be done for simple (2) work out formal specs for a program programming languages and small or moderately sized programs (3) capture the semantics of the simplified PL and (4) do the hard work of putting it all together and • It requires a formal specification for what the proving program correctness. program should do – e.g., its inputs and the actions to take or output to generate given the inputs • What are they? • That’s a hard task in itself!

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Program Verification Double Int kills Ariane 5

• There are applications where it is worth it to (1) use a simplified programming language, (2) work out formal • It took the European Space Agency specs for a program, (3) capture the semantics of the 10 years and $7B to produce simplified PL and (4) do the hard work of putting it all Ariane 5, a giant rocket capable of together and proving program correctness. Like… hurling a pair of three-ton satellites • Security and encryption into orbit with each launch and • Financial transactions intended to give Europe supremacy in the commercial space business • Applications on which lives depend (e.g., healthcare, aviation) • All it took to explode the rocket less • Expensive, one-shot, un-repairable applications (e.g., than a minute into its maiden voyage Martian rover) in 1996 was a small computer program trying to • Hardware design (e.g. Pentium chip) stuff a 64-bit number into a 16-bit space.

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Intel Pentium Bug So…

• In the mid 90’s a bug was found in • While automatic program verification is a the floating point hardware in Intel’s long range goal … latest Pentium microprocessor • Which might be restricted to applications • Unfortunately, the bug was only found where the extra cost is justified after many had been made and sold • We should try to design programming • The bug was subtle, effecting only the ninth languages that help, rather than hinder, our decimal place of some computations ability to make progress in this area. • But users cared • We should continue research on the semantics of programming languages … • Intel had to recall the chips, taking a $500M • And the ability to prove program correctness write-off

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Semantics Static Semantics • Static semantics covers some language features • Next we’ll look at issues close to the syntax difficult or impossible to handle in a BNF/CFG end, what some calls static semantics, and • It’s also a mechanism for building a parser that the technique of attribute grammars. produces an abstract syntax tree of its input • Then we’ll sketch three approaches to • Attribute grammars are one common technique defining “deeper” semantics • Categories attribute grammars can handle: (1) Operational semantics - Context-free but cumbersome (e.g., type (2) Axiomatic semantics checking) (3) Denotational semantics - Non-context-free (e.g., variables must be declared before they are used)

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Parse tree vs. abstract syntax tree Attribute Grammars • Parse trees follow a grammar and usually have many nodes that • Attribute Grammars (AGs) were developed are artifacts of how the grammar was written by Donald Knuth in ~1968 • An abstract syntax tree (AST) eliminates useless structural nodes • It uses nodes corresponding to constructs in the programming • Motivation: language and is easier to interpret or generate code from it • CFGs can’t describe all of the syntax of • Consider 1 + 2 + 3: programming languages e! +! +! • Additions to CFGs to annotate the parse e!! +! intint!! e+!! int! e+!! 3! tree with some “semantic” info

e! +! int! 3! inte!! int! 3! 1! 2! • Primary value of AGs:

int! 2! 1! 2! • Static semantics specification

1! • Compiler design (static semantics checking) parse tree an AST another AST CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Attribute Grammar Example Attribute Grammars A Grammar is formally defined by specifying four components. • Ada has this rule to describe procedure definitions: • S is the start symbol Def: An attribute grammar is a CFG• N is a set of non-terminal symbols => procedure end ; • T is a set of terminal symbols G=(S,N,T,P) • P is a set of productions or rules • The name after procedure must be the same as the name after end with the following additions: – For each grammar symbol x there is a set A(x) of • This is not possible to capture in a CFG (in practice) because there are too many names attribute values –Each rule has a set of functions that define certain • Solution: annotate parse tree nodes with attributes and add a “semantic” rules or constraints to the syntactic attributes of the non-terminals in the rule rule in the grammar – Each rule has a (possibly empty) set of predicates to check for attribute consistency rule: => procedure [1] end [2] ; constraint: [1].string == [2].string

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Attribute Grammars Attribute Grammars Example: expressions of the form id + id

• Let X0 => X1 ... Xn be a grammar rule •id 's can be either int_type or real_type

• Functions of the form S(X0) = f(A(X1),...A(Xn) • types of the two id's must be the same define synthesized attributes • type of the expression must match its expected type - i.e., attribute defined by a nodes children BNF: -> + • Functions of the form I(Xj) = f(A(X0),…A(Xn)) for i <= j <= n define inherited attributes -> id - i.e., attribute defined by parent and siblings Attributes: • Initially, there are intrinsic attributes on the actual_type - synthesized for and leaves expected_type - inherited for - i.e., attribute predefined

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Attribute Grammars Attribute Grammars (continued) Attribute Grammar: How are attribute values computed?

1. Syntax rule: -> [1] + [2] • If all attributes were inherited, the tree Semantic rules: could be decorated in top-down order .actual_type ← [1].actual_type Predicate: ! • If all attributes were synthesized, the tree [1].actual_type == [2].actual_type .expected_type == .actual_type could be decorated in bottom-up order

Compilers usually maintain a 2. Syntax rule: -> id “symbol table” where they • In many cases, both kinds of attributes are record the names of proce- Semantic rule:! dures and variables along used, and it is some combination of top- with type type information. .actual_type ← Looking up this information down and bottom-up that must be used in the symbol table is a com- lookup_type (id, ) mon operation.

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Attribute Grammars (continued) Attribute Grammar Summary

Suppose we process the expression A+B • AGs are a practical extension to CFGs that allow us to using rule -> [1] + [2] annotate the parse tree with information needed for semantic processing .expected_type ← inherited from parent – e.g., interpretation or compilation [1].actual_type ← lookup (A, [1]) [2].actual_type ← lookup (B, [2]) • We call the annotated tree an abstract syntax tree [1].actual_type == [2].actual_type – It no longer just reflects the derivation .actual_type ← [1].actual_type • AGs can move information from anywhere in abstract .actual_type == .expected_type syntax tree to anywhere else in a controlled way – Needed for no-local syntactic dependencies (e.g., Ada example) and for semantics

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Static vs. Dynamic Semantics Dynamic Semantics • Attribute grammar is an example of static semantics (e.g., type checking) that don’t • No single widely acceptable notation or reason about how things change when a formalism for describing dynamic program is executed semantics • Understanding what a program means often • Here are three approaches we’ll briefly requires reasoning about how, for example, a examine: variable’s value changes – Operational semantics • Dynamic semantics tries to capture this – Axiomatic semantics – Denotational semantics – E.g., proving that an array index will never be out of its intended range CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Dynamic Semantics Operational Semantics

• Q: How might we define what expression in a • Idea: describe the meaning of a program in language mean? language L by specifying how statements effect • A: One approach is to give a general mechanism the state of a machine (simulated or actual) when to translate a sentence in L into a set of sentences executed. in another language or system that is well defined • The change in the state of the machine (memory, • For example: registers, stack, heap, etc.) defines the meaning of • Define the meaning of computer science terms by the statement translating them in ordinary English • Define the meaning of English by showing how to • Similar in spirit to the notion of a Turing Machine translate into French and also used informally to explain higher-level • Define the meaning of French expression by constructs in terms of simpler ones. translating into mathematical logic turtles all the way down CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Alan Turing and his Machine Operational Semantics • The Turing machine is an abstract machine introduced in 1936 by Alan Turing • This is a common technique – Alan Turing (1912 –54) was a British mathematician, logician, • Here’s how we might explain the meaning of the cryptographer, considered a father of modern computer science for statement in C in terms of a simpler • It can be used to give a reference language: mathematically precise

definition of algorithm c statement operational semantics or 'mechanical procedure’ for(e1;e2;e3) e1; {} loop: if e2=0 goto exit • Concept widely used in theo- retical computer science, e3; goto loop especially in complexity exit: theory and the theory of computation

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Operational Semantics Operational Semantics

• To use operational semantics for a high-level A better alternative: a complete computer language, a virtual machine in needed simulation • A hardware pure interpreter is too expensive • Build a translator (translates source code to the • A software pure interpreter also has machine code of an idealized computer) problems: • The detailed characteristics of the particular • Build a simulator for the idealized computer computer makes actions hard to understand Evaluation of operational semantics: • Such a semantic definition would be • Good if used informally machine-dependent • Extremely complex if used formally (e.g. VDL)

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

What’s a calculus, anyway? Vienna Definition Language The Lambda “ACalculus method of computation or calculation in a special notation (as of logic or symbolic logic)” -- • The first use of operationalMerriam-Webster semantics was in • VDL was a language developed at the lambda calculus IBM Vienna Labs as a language for formal, algebraic definition via – A formal system designed to investigate function operational semantics. definition, function application and recursion • It was used to specify the semantics of PL/I – Introduced by Alonzo Church and Stephen Kleene in the 1930s • See: The Vienna Definition Language, P. Wegner, ACM Comp Surveys 4(1):5-63 (Mar 1972) • The lambda calculus can be called the smallest universal programming language • The VDL specification of PL/I was very large, very complicated, a remarkable technical • It’s widely used today as a target for defining accomplishment, and of little practical use. the semantics of a programming language

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. The Lambda Calculus The Lambda Calculus

• The lambda calculus consists of a single • The lambda calculus transformation rule (variable substitution) and – introduces variables ranging over values a single function definition scheme – defines functions by (lambda) abstracting • The lambda calculus is universal in the sense over variables that any computable function can be expressed and evaluated using this formalism – applies functions to values • We’ll revisit the lambda calculus later in the • Examples: course simple expression: x + 1 • The Lisp language is close to the lambda function that adds one to its arg: λx. x + 1 calculus model applying it to 2: (λx. x + 1) 2

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Operational Semantics Summary Axiomatic Semantics • Based on formal logic (first order predicate calculus) • The basic idea is to define a language’s • Original purpose: formal program verification semantics in terms of a reference • Approach: Define axioms and inference rules in logic language, system or machine for each statement type in the language (to allow transformations of expressions to other expressions) • It’s use ranges from the theoretical (e.g., • The expressions are called assertions and are either lambda calculus) to the practical (e.g., Java Virtual Machine) • Preconditions: An assertion before a statement states the relationships and constraints among variables that are true at that point in execution • Postconditions: An assertion following a statement

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Logic 101 Propositional logic: Logical constants: true, false Propositional symbols: P, Q, S, ... that are either true or false Logical connectives: ∧ (and) , ∨ (or), ⇒ (implies), ⇔ (is equivalent), ¬ (not) which are defined by the truth tables below. Sentences are formed by combining propositional symbols, connectives and parentheses and are either true or false. e.g.: P∧Q ⇔ ¬ (¬P ∨ ¬Q) First order logic adds (1) Variables which can range over objects in the domain of discourse (2) Quantifiers including: ∀ (forall) and ∃ (there exists) (3) Predicates to capture domain classes and relations Examples: (∀p) (∀q) p∧q ⇔ ¬ (¬p ∨ ¬q) ∀x prime(x) ⇒ ∃y prime(y) ∧ y>x

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Axiomatic Semantics Axiomatic Semantics • Axiomatic semantics is based on (after A weakest precondition is the least restrictive computer scientists Sir Tony Hoare) precondition that will guarantee the postcondition • Based on triples that describe how execution of a Notation: statement changes the state of the computation {P} Statement {Q} • Example: {P} S {Q} where precondition postcondition - P is a logical statement of what’s true before executing S Example: - Q is a logical expression describing what’s true after {?} a := b + 1 {a > 1} • In general we can reason forward or backward We often need to infer what the precondition must be for a - Given P and S determine Q given post-condition - Given S and Q determine P One possible precondition: {b>10} • Concrete example: {x>0} x = x+1 {x>1} Another: {b>1} Weakest precondition: {b > 0}

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Weakest Precondition? Axiomatic Semantics in Use • A weakest precondition is the least restrictive precondition that will guarantee the post-condition Program proof process: • There are an infinite number of possible preconditions P? that satisfy • The post-condition for the whole program {P?} a := b + 1 {a > 1} is the desired results • Namely b>0, b>1, b>2, b>3, b>4, … • Work back through the program to the first • The weakest precondition is one that logically is implied statement by all of the (other) preconditions • If the precondition on the first statement is • b>1 => b>0 the same as (or implied by) the program • b>2 => b>0 • b>3 => b>0 specification, the program is correct • …

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Example: Assignment Statements Axiomatic Semantics

Here’s how we might define a simple assignment • The Rule of Consequence: A notation from symbolic logic for statement of the form x := e in a programming {P} S {Q}, P’ => P, Q => Q’ specifying a rule of language. {P'} S {Q'} inference with pre- mise P and conse- • {Qx->E} x := E {Q} • An inference rule for sequences quence Q is • Where Qx->E means the result of replacing all P occurrences of x with E in Q for a sequence S1 ; S2: Q So from {P1} S1 {P2} e.g., modus ponens can be specified as: {Q} a := b/2-1 {a<10} {P2} S2 {P3} P, P=>Q We can infer that the weakest precondition Q is the inference rule is: Q b/2-1<1 which can be rewritten as or b<22 {P1} S1 {P2}, {P2} S2 {P3} {P1} S1; S2 {P3}

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Conditional Example Conditions Here’s a rule for a conditional statement Suppose we have: {B ∧ P} S1 {Q}, {¬Β ∧ P} S2 {Q} {P} {P} if B then S1 else S2 {Q} If x>0 then y=y-1 else y=y+1 And an example of its use for the statement {y>0} {P} if x>0 then y=y-1 else y=y+1 {y>0} Our rule So the weakest precondition P can be deduced as follows: {B ∧ P} S1 {Q}, {¬Β ∧ P} S2 {Q} {P} if B then S1 else S2 {Q} The postcondition of S1 and S2 is Q. Consider the two cases: The weakest precondition of S1 is x>0 ∧ y>1 and for S2 is x<=0 ∧ y>-1 – x>0 and y>1 The rule of consequence and the fact that y>1 ⇒ y>-1 supports the conclusion – x<=0 and y>-1 That the weakest precondition for the entire conditional is y>1 . • What is a (weakest) condition that implies both y>1 and y>-1

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Conditional Example Loops For the loop construct {P} while B do S end {Q} • What is a (weakest) condition that implies both the inference rule is: y>1 and y>-1? • Well y>1 implies y>-1 {I ∧ B} S {I} _ • y>1 is the weakest condition that ensures that {I} while B do S {I ∧ ¬B} after the conditional is executed, y>0 will be true. where I is the loop invariant, a proposition necessarily true throughout the loop’s • Our answer then is this: execution {y>1} • I is true before the loop executes and also If x>0 then y=y-1 else y=y+1 after the loop executes {y>0} • B is false after the loop executes

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Loop Invariants Evaluation of Axiomatic Semantics A loop invariant I must meet the following conditions: 1. P => I (the loop invariant must be true initially) • Developing axioms or inference rules for 2. {I} B {I} (evaluation of the Boolean must not change the validity of I) all of the statements in a language is 3. {I and B} S {I} (I is not changed by executing the body of the loop) difficult 4. (I and (not B)) => Q (if I is true and B is false, Q is implied) • It is a good tool for correctness proofs, 5. The loop terminates (this can be difficult to prove) and an excellent framework for • The loop invariant I is a weakened version of the loop reasoning about programs postcondition, and it is also a precondition. • It is much less useful for language users • I must be weak enough to be satisfied prior to the beginning of and compiler writers the loop, but when combined with the loop exit condition, it must be strong enough to force the truth of the postcondition

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Denotational Semantics Denotational Semantics • A technique for describing the meaning of programs in terms of mathematical functions on • The process of building a denotational programs and program components. specification for a language: 1. Define a mathematical object for each • Programs are translated into functions about language entity which properties can be proved using the standard 2. Define a function that maps instances of the mathematical theory of functions, and especially language entities onto instances of the domain theory. corresponding mathematical objects • Originally developed by Scott and Strachey • The meaning of language constructs are defined (1970) and based on recursive function theory by only the values of the program's variables • The most abstract semantics description method

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Denotational Semantics (continued) Example: Decimal Numbers The difference between denotational and operational semantics: In operational semantics, the state changes are → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 defined by coded algorithms; in denotational semantics, | (0|1|2|3|4|5|6|7|8|9) they are defined by rigorous mathematical functions M ('0') = 0, M ('1') = 1, …, M ('9') = 9 • The state of a program is the values of all its current dec dec dec M ( '0') = 10 * M () variables dec dec Mdec ( '1’) = 10 * Mdec () + 1 s = {, , …, } …

• Let VARMAP be a function that, when given a variable Mdec ( '9') = 10 * Mdec () + 9 name and a state, returns the current value of the variable

VARMAP(ij, s) = vj

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Expressions Assignment Statements

Me(, s) Δ= case of Ma(x := E, s) = => Mdec(, s) Δ => if Me(E, s) = error if VARMAP(, s) = undef then error then error else VARMAP(, s) else s’ = {,,...,}, => where for j = 1, 2, ..., n, if (Me(., s) = undef vj’ = VARMAP(ij, s) if ij <> x OR Me(., s) = undef) = Me(E, s) if ij = x then error else if (. = ‘+’ then Me(., s) + Me(., s) else Me(., s) * Me(., s)

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. Logical Pretest Loops Logical Pretest Loops Ml(while B do L, s) Δ=

if Mb(B, s) = undef • The meaning of the loop is the value of the then error program variables after the statements in the loop have been executed the prescribed number of else if Mb(B, s) = false times, assuming there have been no errors then s • In essence, the loop has been converted from iteration to recursion, where the recursive control else if Msl(L, s) = error is mathematically defined by other recursive state then error mapping functions • Recursion, when compared to iteration, is easier to else Ml(while B do L, Msl(L, s)) describe with mathematical rigor

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Denotational Semantics Summary

Evaluation of denotational semantics: This lecture we covered the following • Can be used to prove the correctness of • Backus-Naur Form and Context Free programs Grammars • Provides a rigorous way to think about • Syntax Graphs and Attribute Grammars programs • Semantic Descriptions: Operational, • Can be an aid to language design Axiomatic and Denotational • Has been used in compiler generation systems

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.