Parsing • Context-Free Grammars (CFG’S) (Adapted from CS 164 at Berkeley) • Derivations • Syntax-Directed Translation

Parsing • Context-Free Grammars (CFG’S) (Adapted from CS 164 at Berkeley) • Derivations • Syntax-Directed Translation

Outline • Parser overview Introduction to Parsing • Context-free grammars (CFG’s) (adapted from CS 164 at Berkeley) • Derivations • Syntax-Directed Translation The Functionality of the Parser Example • Input: sequence of tokens from lexer • Pyth: if x == y: z =1 else: z = 2 • Output: abstract syntax tree of the program •Parser input: IF ID == ID : ID = INT ELSE : ID = INT •Parser output (abstract syntax tree): • One-pass compiler: directly generate assembly code IF-THEN-ELSE – This is what you will do in the first assignment –Bali SaM code == == ID ID IDINT ID INT 1 Why A Tree? Notation for Programming Languages • Each stage of the compiler has two purposes: • Grammars: – Detect and filter out some class of errors E int – Compute some new information or translate the E E + E representation of the program to make things E E * E easier for later stages E ( E ) • Recursive structure of tree suits recursive • We can view these rules as rewrite rules structure of language definition – We start with E and replace occurrences of E with • With tree, later stages can easily find “the some right-hand side else clause”, e.g., rather than having to scan •E E * E ( E ) * E ( E + E ) * E … through tokens to find it. (int + int) * int Context-Free Grammars Examples of CFGs •A CFG consists of Simple arithmetic expressions: –A set of non-terminals N E int • By convention, written with capital letter in these notes E E + E –A set of terminals T E E * E • By convention, either lower case names or punctuation E ( E ) –A start symbol S (a non-terminal) – One non-terminal: E –A set of productions – Several terminals: int, +, *, (, ) • Assuming E N • Called terminals because they are never replaced E , or – By convention the non-terminal for the first production is the start one E Y1 Y2 ... Yn where Yi N T 2 Key Idea The Language of a CFG (Cont.) 1. Begin with a string consisting of the start Write symbol * X1 … Xn Y1 … Ym 2. Replace any non-terminal X in the string by a if right-hand side of some production X1 … Xn … … Y1 … Ym X Y1 … Yn 3. Repeat (2) until there are only terminals in in 0 or more steps the string 4. The successive strings created in this way are called sentential forms. The Language of a CFG Examples: Let G be a context-free grammar with start • S 0 also written as S 0 | 1 symbol S. Then the language of G is: S 1 Generates the language { “0”, “1” } * L(G) = { a1 … an | S a1 … an and every ai • What about S 1 A is a terminal } A 0 | 1 • What about S 1 A A 0 | 1 A • What about S | ( S ) 3 Derivations and Parse Trees Derivation Example •Aderivation is a sequence of sentential forms • Grammar resulting from the application of a sequence of E E + E | E * E | (E) | int productions S … … •String int * int + int • Parse tree: summary of derivation w/o specifying completely the order in which rules were applied – Start symbol is the tree’s root – For a production X Y1 … Yn add children Y1, …, Yn to node X Derivation in Detail (1) Derivation in Detail (2) E E E E E + E E + E 4 Derivation in Detail (3) Derivation in Detail (4) E E E E E + E E + E E + E E + E E * E + E E * E + E int * E + E E * E E * E int Derivation in Detail (5) Derivation in Detail (6) E E E E E + E E + E E + E E + E E * E + E E * E + E int * E + E int * E + E int * int + E E * E int * int + E E * E int int * int + int int int int int 5 Notes on Derivations AST vs. Parse Tree • A parse tree has • AST is condensed form of a parse tree – Terminals at the leaves – operators appear at internal nodes, not at leaves. – Non-terminals at the interior nodes – "Chains" of single productions are collapsed. – Lists are "flattened". • A left-right traversal of the leaves is the – Syntactic details are omitted original input • e.g., parentheses, commas, semi-colons • The parse tree shows the association of • AST is a better structure for later compiler operations, the input string does not ! stages – There may be multiple ways to match the input – omits details having to do with the source language, – Derivations (and parse trees) choose one – only contains information about the essential structure of the program. Example: 2 * (4 + 5) Parse tree vs. AST Summary of Derivations E • We are not just interested in whether T * s L(G) • Also need derivation (or parse tree) and AST. T * F • Parse trees slavishly reflect the grammar. 2 + • Abstract syntax trees abstract from the grammar, F cutting out detail that interferes with later stages. ( E ) 4 5 • A derivation defines a parse tree – But one parse tree may have many derivations int (2) E T + • Derivations drive translation (to ASTs, etc.) T F • Leftmost and rightmost derivations most important in F int (5) parser implementation int (4) 6 Ambiguity Ambiguity. Example • Grammar The string int + int + int has two parse trees E E + E | E * E | ( E ) | int E E E + E E E •Strings + int + int + int E + E int int E + E int * int + int int int int int + is left-associative Ambiguity. Example Ambiguity (Cont.) The string int * int + int has two parse trees • A grammar is ambiguous if it has more than one parse tree for some string E E – Equivalently, there is more than one rightmost or leftmost derivation for some string E + E E * E • Ambiguity is bad E * E int int E + E – Leaves meaning of some programs ill-defined • Ambiguity is common in programming languages int int int int – Arithmetic expressions –IF-THEN-ELSE * has higher precedence than + 7 Dealing with Ambiguity Ambiguity. Example • There are several ways to handle ambiguity The int * int + int has only one parse tree now E E • Most direct method is to rewrite the grammar unambiguously E + T E * E E E + T | T T int int E + E T T * int | int | ( E ) T * int int int • Enforces precedence of * over + • Enforces left-associativity of + and * int Ambiguity Associativity Declarations • Impossible to convert automatically an ambiguous • Consider the grammar E E + E | int grammar to an unambiguous one • Used with care, ambiguity can simplify the grammar • Ambiguous: two parse trees of int + int + int – Sometimes allows more natural definitions E E – But we need disambiguation mechanisms • Instead of rewriting the grammar E + E E + E – Use the more natural (ambiguous) grammar – Along with disambiguating declarations E + E int int E + E • Most tools allow precedence and associativity declarations to disambiguate grammars int int int int •Examples … • Left-associativity declaration: %left ‘+’ 8 Summary • Grammar is specified using a context-free language (CFL) • Derivation: starting from start symbol, use grammar rules as rewrite rules to derive input string – Leftmost and rightmost derivations • Parse trees and abstract syntax trees • Ambiguous grammars – Ambiguity should be eliminated by modifying grammar, by specifying precedence rules etc. depending on how ambiguity arises in the grammar • Remaining question: how do we find the derivation for a given input string? 9.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    9 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us