Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

Outline Introduction to Parsing • Regular languages revisited Ambiguity and Syntax Errors • Parser overview • Context-free grammars (CFG’s) • Derivations • Ambiguity • Syntax errors Compiler Design 1 (2011) 2 Languages and Automata Limitations of Regular Languages • Formal languages are very important in CS Intuition: A finite automaton that runs long – Especially in programming languages enough must repeat states • A finite automaton cannot remember # of • Regular languages times it has visited a particular state – The weakest formal languages widely used • because a finite automaton has finite memory – Many applications – Only enough to store in which state it is – Cannot count, except up to a finite limit • We will also study context-free languages • Many languages are not regular • E.g., language of balanced parentheses is not regular: { (i )i | i ≥ 0} Compiler Design 1 (2011) 3 Compiler Design 1 (2011) 4 The Functionality of the Parser Example • Input: sequence of tokens from lexer • If-then-else statement if (x == y) then z =1; else z = 2; • Output: parse tree of the program • Parser input IF (ID == ID) THEN ID = INT; ELSE ID = INT; • Possible parser output IF-THEN-ELSE == = = ID ID ID INT ID INT Compiler Design 1 (2011) 5 Compiler Design 1 (2011) 6 Comparison with Lexical Analysis The Role of the Parser • Not all sequences of tokens are programs . • . Parser must distinguish between valid and Phase Input Output invalid sequences of tokens Lexer Sequence of Sequence of characters tokens • We need – A language for describing valid sequences of tokens Parser Sequence of Parse tree – A method for distinguishing valid from invalid tokens sequences of tokens Compiler Design 1 (2011) 7 Compiler Design 1 (2011) 8 Context-Free Grammars CFGs (Cont.) • Many programming language constructs have a • A CFG consists of recursive structure – A set of terminals T – A set of non-terminals N • A STMT is of the form – A start symbol S (a non-terminal) if COND then STMT else STMT , or – A set of productions while COND do STMT , or … Assuming X ∈ N the productions are of the form • Context-free grammars are a natural notation X →ε , or for this recursive structure X → Y1 Y2 ... Yn where Yi ∈ N ∪T Compiler Design 1 (2011) 9 Compiler Design 1 (2011) 10 Notational Conventions Examples of CFGs • In these lecture notes A fragment of our example language (simplified): – Non-terminals are written upper-case – Terminals are written lower-case STMT → if COND then STMT else STMT – The start symbol is the left-hand side of the first ⏐ production while COND do STMT ⏐ id = int Compiler Design 1 (2011) 11 Compiler Design 1 (2011) 12 Examples of CFGs (cont.) The Language of a CFG Grammar for simple arithmetic expressions: Read productions as replacement rules: → E → E * E X Y1 ... Yn Means X can be replaced by Y1 ... Yn ⏐ E + E X →ε ⏐ ( E ) Means X can be erased (replaced with empty string) ⏐ id Compiler Design 1 (2011) 13 Compiler Design 1 (2011) 14 Key Idea The Language of a CFG (Cont.) (1) Begin with a string consisting of the start More formally, we write symbol “S” X XX→ XXYYXX (2) Replace any non-terminal X in the string by 11LLin L i−+11 L min1 L a right-hand side of some production if there is a production XYY→ 1 n L XYYim→ 1 L (3) Repeat (2) until there are no non-terminals in the string Compiler Design 1 (2011) 15 Compiler Design 1 (2011) 16 The Language of a CFG (Cont.) The Language of a CFG Write Let G be a context-free grammar with start ∗ symbol S. Then the language of G is: XXYY11LLnm→ if aaSaa|→∗ and every a is a terminal { 11KKnn i } XX11LLLLnm→→→ YY in 0 or more steps Compiler Design 1 (2011) 17 Compiler Design 1 (2011) 18 Terminals Examples • Terminals are called so because there are no L(G) is the language of the CFG G rules for replacing them ii Strings of balanced parentheses {() |i ≥ 0} • Once generated, terminals are permanent Two grammars: • Terminals ought to be tokens of the language SS→ () SS→ () OR S → ε | ε Compiler Design 1 (2011) 19 Compiler Design 1 (2011) 20 Example Example (Cont.) A fragment of our example language (simplified): Some elements of the our language STMT → if COND then STMT id = int ⏐ if COND then STMT else STMT if (id == id) then id = int else id = int ⏐ while COND do STMT while (id != id) do id = int ⏐ id = int while (id == id) do while (id != id) do id = int COND → (id == id) if (id != id) then if (id == id) then id = int else id = int ⏐ (id != id) Compiler Design 1 (2011) 21 Compiler Design 1 (2011) 22 Arithmetic Example Notes Simple arithmetic expressions: The idea of a CFG is a big step. E→∗ E+E | E E | (E) | id But: Some elements of the language: • Membership in a language is just “yes” or “no”; id id + id we also need the parse tree of the input (id) id ∗ id • Must handle errors gracefully (id) ∗∗ id id (id) • Need an implementation of CFG’s (e.g., yacc) Compiler Design 1 (2011) 23 Compiler Design 1 (2011) 24 More Notes Derivations and Parse Trees • Form of the grammar is important A derivation is a sequence of productions – Many grammars generate the same language S →→→ – Parsing tools are sensitive to the grammar LLL A derivation can be drawn as a tree Note: Tools for regular languages (e.g., lex/ML-Lex) – Start symbol is the tree’s root are also sensitive to the form of the regular expression, but this is rarely a problem in practice – For a production XYY → 1 L n add children YY 1 L n to node X Compiler Design 1 (2011) 25 Compiler Design 1 (2011) 26 Derivation Example Derivation Example (Cont.) • Grammar E→∗ E+E | E E | (E) | id E E → E+E • String E + E id ∗ id + id →∗EE+E →∗id E + E E * E id →∗id id + E id id →∗id id + id Compiler Design 1 (2011) 27 Compiler Design 1 (2011) 28 Derivation in Detail (1) Derivation in Detail (2) E E E + E E E → E+E Compiler Design 1 (2011) 29 Compiler Design 1 (2011) 30 Derivation in Detail (3) Derivation in Detail (4) E E E E E + E E + E → E+E → E+E E * E →∗EE+E E * E →∗E+EE →∗id E + E id Compiler Design 1 (2011) 31 Compiler Design 1 (2011) 32 Derivation in Detail (5) Derivation in Detail (6) E E E E → E+E → E+E E + E E + E →∗EE+E →∗EE+E E * E →∗id E + E E * E id →∗id E + E → id∗ id + E →∗id id + E id id id id →∗id id + id Compiler Design 1 (2011) 33 Compiler Design 1 (2011) 34 Notes on Derivations Left-most and Right-most Derivations • A parse tree has • The example is a left-most – Terminals at the leaves derivation – At each step, replace the E – Non-terminals at the interior nodes left-most non-terminal → E+E • An in-order traversal of the leaves is the • There is an equivalent → E+id original input notion of a right-most derivation →∗EE + id • The parse tree shows the association of →∗Eid + id operations, the input string does not →∗id id + id Compiler Design 1 (2011) 35 Compiler Design 1 (2011) 36 Right-most Derivation in Detail (1) Right-most Derivation in Detail (2) E E E + E E E → E+E Compiler Design 1 (2011) 37 Compiler Design 1 (2011) 38 Right-most Derivation in Detail (3) Right-most Derivation in Detail (4) E E E E E + E E + E → E+E → E+E id → E+id E * E id → E+id → EE∗ + id Compiler Design 1 (2011) 39 Compiler Design 1 (2011) 40 Right-most Derivation in Detail (5) Right-most Derivation in Detail (6) E E E E → E+E → E+E E + E E + E → E+id → E+id E * E id →∗EE + id E * E id → EE∗ + id →∗Eid + id → E ∗id + id id id id →∗id id + id Compiler Design 1 (2011) 41 Compiler Design 1 (2011) 42 Derivations and Parse Trees Summary of Derivations • Note that right-most and left-most • We are not just interested in whether derivations have the same parse tree s ∈ L(G) – We need a parse tree for s • The difference is just in the order in which branches are added • A derivation defines a parse tree – But one parse tree may have many derivations • Left-most and right-most derivations are important in parser implementation Compiler Design 1 (2011) 43 Compiler Design 1 (2011) 44 Ambiguity Ambiguity (Cont.) • Grammar This string has two parse trees E → E + E | E * E | ( E ) | int E E • String E + E E * E int * int + int E * E int int E + E int int int int Compiler Design 1 (2011) 45 Compiler Design 1 (2011) 46 Ambiguity (Cont.) Dealing with Ambiguity • A grammar is ambiguous if it has more than • There are several ways to handle ambiguity one parse tree for some string – Equivalently, there is more than one right-most or • Most direct method is to rewrite grammar left-most derivation for some string unambiguously • Ambiguity is bad – Leaves meaning of some programs ill-defined E → T + E | T • Ambiguity is common in programming languages → T int* T | int| ( E ) – Arithmetic expressions – IF-THEN-ELSE • Enforces precedence of * over + Compiler Design 1 (2011) 47 Compiler Design 1 (2011) 48 Ambiguity: The Dangling Else The Dangling Else: Example • Consider the following grammar • The expression if C1 then if C2 then S3 else S4 S → if C then S has two parse trees | if C then S else S if if | OTHER C if C1 if S4 1 • This grammar is also ambiguous C2 S3 C2 S3 S4 • Typically we want the second form Compiler Design 1 (2011) 49 Compiler Design 1 (2011) 50 The Dangling Else: A Fix The Dangling Else: Example Revisited • else matches the closest unmatched then • The expression if C1 then if C2 then S3 else S4 • We can describe this in the grammar if if S → MIF /* all then are matched */ | UIF /* some then are unmatched */ C1 if C1 if S MIF → if C then MIF else MIF 4 | OTHER C S UIF → if C then S C2 S3 S4 2 3 | if C then MIF else UIF • A valid parse tree • Not valid because the (for a UIF) then expression is not • Describes the same set of strings a MIF Compiler Design 1 (2011)

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    16 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us