Outline

• Parser overview Introduction to • Context-free (CFG’s) (adapted from CS 164 at Berkeley) • Derivations • -Directed Translation

The Functionality of the Parser Example

• Input: sequence of tokens from lexer • Pyth: if x == y: z =1 else: z = 2 • Output: abstract syntax of the program •Parser input: IF ID == ID : ID = INT ELSE : ID = INT •Parser output (): • One-pass : directly generate assembly code IF-THEN-ELSE – This is what you will do in the first assignment –Bali  SaM code == == ID ID IDINT ID INT

1 Why A Tree? Notation for Programming Languages

• Each stage of the compiler has two purposes: • Grammars: – Detect and filter out some class of errors E  int – Compute some new information or translate the E  E + E representation of the program to make things E  E * E easier for later stages E  ( E ) • Recursive structure of tree suits recursive • We can view these rules as rewrite rules structure of language definition – We start with E and replace occurrences of E with • With tree, later stages can easily find “the some right-hand side else clause”, e.g., rather than having to scan •E  E * E  ( E ) * E  ( E + E ) * E  … through tokens to find it.  (int + int) * int

Context-Free Grammars Examples of CFGs

•A CFG consists of Simple arithmetic expressions: –A set of non-terminals N E  int • By convention, written with capital letter in these notes E  E + E –A set of terminals T E  E * E • By convention, either lower case names or punctuation E  ( E ) –A start symbol S (a non-terminal) – One non-terminal: E –A set of productions – Several terminals: int, +, *, (, ) • Assuming E  N • Called terminals because they are never replaced E  , or – By convention the non-terminal for the first production is the start one E  Y1 Y2 ... Yn where Yi  N  T

2 Key Idea The Language of a CFG (Cont.)

1. Begin with a string consisting of the start Write symbol * X1 … Xn  Y1 … Ym 2. Replace any non-terminal X in the string by a if right-hand side of some production X1 … Xn  …  …  Y1 … Ym X  Y1 … Yn 3. Repeat (2) until there are only terminals in in 0 or more steps the string 4. The successive strings created in this way are called sentential forms.

The Language of a CFG Examples:

Let G be a context-free with start • S  0 also written as S  0 | 1 symbol S. Then the language of G is: S  1 Generates the language { “0”, “1” } * L(G) = { a1 … an | S  a1 … an and every ai • What about S  1 A is a terminal } A  0 | 1 • What about S  1 A A  0 | 1 A • What about S | ( S )

3 Derivations and Parse Trees Derivation Example

•Aderivation is a sequence of sentential forms • Grammar resulting from the application of a sequence of E  E + E | E * E | (E) | int productions S  …  … •String int * int + int • : summary of derivation w/o specifying completely the order in which rules were applied – Start symbol is the tree’s root

– For a production X  Y1 … Yn add children

Y1, …, Yn to node X

Derivation in Detail (1) Derivation in Detail (2)

E E E E  E + E E + E

4 Derivation in Detail (3) Derivation in Detail (4)

E E E E  E + E  E + E E + E E + E  E * E + E  E * E + E  int * E + E E * E E * E

int

Derivation in Detail (5) Derivation in Detail (6)

E E E E  E + E  E + E E + E E + E  E * E + E  E * E + E  int * E + E  int * E + E  int * int + E E * E  int * int + E E * E int  int * int + int int int int int

5 Notes on Derivations AST vs. Parse Tree

• A parse tree has • AST is condensed form of a parse tree – Terminals at the leaves – operators appear at internal nodes, not at leaves. – Non-terminals at the interior nodes – "Chains" of single productions are collapsed. – Lists are "flattened". • A left-right traversal of the leaves is the – Syntactic details are omitted original input • e.g., parentheses, commas, semi-colons • The parse tree shows the association of • AST is a better structure for later compiler operations, the input string does not ! stages – There may be multiple ways to match the input – omits details having to do with the source language, – Derivations (and parse trees) choose one – only contains information about the essential structure of the program.

Example: 2 * (4 + 5) Parse tree vs. AST Summary of Derivations

E • We are not just interested in whether T * s  L(G) • Also need derivation (or parse tree) and AST. T * F • Parse trees slavishly reflect the grammar. 2 + • Abstract syntax trees abstract from the grammar, F cutting out detail that interferes with later stages. ( E ) 4 5 • A derivation defines a parse tree – But one parse tree may have many derivations int (2) E T + • Derivations drive translation (to ASTs, etc.) T F • Leftmost and rightmost derivations most important in F int (5) parser implementation int (4)

6 Ambiguity Ambiguity. Example

• Grammar The string int + int + int has two parse trees  E E + E | E * E | ( E ) | int E E

E + E E E •Strings + int + int + int E + E int int E + E

int * int + int int int int int

+ is left-associative

Ambiguity. Example Ambiguity (Cont.)

The string int * int + int has two parse trees • A grammar is ambiguous if it has more than one parse tree for some string E E – Equivalently, there is more than one rightmost or leftmost derivation for some string E + E E * E • Ambiguity is bad E * E int int E + E – Leaves meaning of some programs ill-defined • Ambiguity is common in programming languages int int int int – Arithmetic expressions –IF-THEN-ELSE * has higher precedence than +

7 Dealing with Ambiguity Ambiguity. Example

• There are several ways to handle ambiguity The int * int + int has only one parse tree now E E • Most direct method is to rewrite the grammar unambiguously E + T E * E E  E + T | T T int int E + E T  T * int | int | ( E ) T * int int int • Enforces precedence of * over + • Enforces left-associativity of + and * int

Ambiguity Associativity Declarations

• Impossible to convert automatically an ambiguous • Consider the grammar E  E + E | int grammar to an unambiguous one • Used with care, ambiguity can simplify the grammar • Ambiguous: two parse trees of int + int + int – Sometimes allows more natural definitions E E – But we need disambiguation mechanisms • Instead of rewriting the grammar E + E E + E – Use the more natural (ambiguous) grammar – Along with disambiguating declarations E + E int int E + E • Most tools allow precedence and associativity declarations to disambiguate grammars int int int int •Examples … • Left-associativity declaration: %left ‘+’

8 Summary

• Grammar is specified using a context-free language (CFL) • Derivation: starting from start symbol, use grammar rules as rewrite rules to derive input string – Leftmost and rightmost derivations • Parse trees and abstract syntax trees • Ambiguous grammars – Ambiguity should be eliminated by modifying grammar, by specifying precedence rules etc. depending on how ambiguity arises in the grammar • Remaining question: how do we find the derivation for a given input string?

9