Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

Outline Introduction to Parsing • Regular languages revisited Ambiguity and Syntax Errors • Parser overview • Context-free grammars (CFG’s) • Derivations • Ambiguity • Syntax errors 2 Languages and Automata Limitations of Regular Languages • Formal languages are very important in CS Intuition: A finite automaton that runs long – Especially in programming languages and compilers enough must repeat states • A finite automaton cannot remember number • Regular languages of times it has visited a particular state – The weakest formal languages widely used • because a finite automaton has finite memory – Many applications – Only enough to store in which state it is – Cannot count, except up to a finite limit • We will also study context-free languages • Many languages are not regular • E.g., the language of balanced parentheses is not regular: { (i )i | i ≥ 0} 3 4 The Functionality of the Parser Example • Input: sequence of tokens from lexer • If-then-else statement if (x == y) then z = 1; else z = 2; • Output: parse tree of the program • Parser input IF (ID == ID) THEN ID = INT; ELSE ID = INT; • Possible parser output IF-THEN-ELSE == = = ID ID ID INT ID INT 5 6 Comparison with Lexical Analysis The Role of the Parser • Not all sequences of tokens are programs ... • Parser must distinguish between valid and Phase Input Output invalid sequences of tokens Lexer Sequence of Sequence of characters tokens • We need – A language for describing valid sequences of tokens Parser Sequence of Parse tree – A method for distinguishing valid from invalid tokens sequences of tokens 7 8 Context-Free Grammars CFGs (Cont.) • Many programming language constructs have a A CFG consists of recursive structure – A set of terminals T – A set of non-terminals N • E.g. A STMT is of the form – A start symbol S (a non-terminal) if COND then STMT else STMT , or – A set of productions while COND do STMT , or … Assuming X ∈ N the productions are of the form • Context-free grammars are a natural notation X →ε , or for this recursive structure X → Y1 Y2 ... Yn where Yi ∈ N ∪T 9 10 Notational Conventions Examples of CFGs • In these lecture notes A fragment of an example language (simplified): – Non-terminals are written upper-case – Terminals are written lower-case STMT → if COND then STMT else STMT – The start symbol is the left-hand side of the first ⏐ production while COND do STMT ⏐ id = int 11 12 Examples of CFGs (cont.) The Language of a CFG Grammar for simple arithmetic expressions: Read productions as replacement rules: → E → E * E X Y1 ... Yn Means X can be replaced by Y1 ... Yn (in this order) ⏐ E + E X →ε ⏐ ( E ) Means X can be erased (replaced with empty string) ⏐ id 13 14 Key Idea The Language of a CFG (Cont.) (1) Begin with a string consisting of the start More formally, we write symbol “S” X XX→ XXYYXX (2) Replace any non-terminal X in the string by 11LLin L i−+11 L min1 L a right-hand side of some production if there is a production XYY→ 1 n L XYYim→ 1 L (3) Repeat (2) until there are no non-terminals in the string 15 16 The Language of a CFG (Cont.) The Language of a CFG Write Let G be a context-free grammar with start ∗ symbol S. Then the language of G is: XXYY11LLnm→ if ∗ aaSaa|→ and every a is a terminal { 11KKnn i } XX11LLLLnm→→→ YY in 0 or more steps 17 18 Terminals Examples • Terminals are called so because there are no L(G) is the language of the CFG G rules for replacing them ii Strings of balanced parentheses {() |i ≥ 0} • Once generated, terminals are permanent Two equivalent ways of writing the grammar G: • Terminals ought to be tokens of the language SS→ () SS→ () or S → ε | ε 19 20 Example Example (Cont.) A fragment of our example language (simplified): Some elements of the our language STMT → if COND then STMT id = int ⏐ if COND then STMT else STMT if (id == id) then id = int else id = int ⏐ while COND do STMT while (id != id) do id = int ⏐ id = int while (id == id) do while (id != id) do id = int COND → (id == id) if (id != id) then if (id == id) then id = int else id = int ⏐ (id != id) 21 22 Arithmetic Example Notes Simple arithmetic expressions: The idea of a CFG is a big step. E→∗ E+E | E E | (E) | id But: Some elements of the language: • Membership in a language is just “yes” or “no”; id id + id we also need the parse tree of the input (id) id ∗ id • Must handle errors gracefully (id) ∗∗ id id (id) • Need an implementation of CFG’s – e.g., yacc/bison/ML-yacc/... 23 24 More Notes Derivations and Parse Trees • Form of the grammar is important A derivation is a sequence of productions – Many grammars generate the same language S →→→ – Parsing tools are sensitive to the grammar LLL A derivation can be drawn as a tree – Start symbol is the tree’s root – For a production XYY → 1 L n add children YY 1 L n to node X Note: Tools for regular languages (e.g., lex/ML-Lex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice 25 26 Derivation Example Derivation Example (Cont.) • Grammar Ε → E+E | E*E | (E) | id E→∗ E+E | E E | (E) | id E E → E+E • String E + E id ∗ id + id →∗EE+E →∗id E + E E * E id →∗id id + E id id →∗id id + id 27 28 Derivation in Detail (1) Derivation in Detail (2) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E + E E E → E+E 29 30 Derivation in Detail (3) Derivation in Detail (4) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E E + E E + E → E+E → E+E E * E →∗EE+E E * E →∗E+EE →∗id E + E id 31 32 Derivation in Detail (5) Derivation in Detail (6) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E → E+E → E+E E + E E + E →∗EE+E →∗EE+E E * E →∗id E + E E * E id →∗id E + E → id∗ id + E →∗id id + E id id id id →∗id id + id 33 34 Notes on Derivations Left-most and Right-most Derivations • A parse tree has • What was shown before Ε → E+E | E*E | (E) | id – Terminals at the leaves was a left-most derivation – At each step, we replaced E – Non-terminals at the interior nodes the left-most non-terminal → E+E • An in-order traversal of the leaves is the • There is an equivalent → E+id original input notion of a right-most derivation →∗EE + id – Shown on the right • The parse tree shows the association of →∗Eid + id operations; the input string does not ! →∗id id + id 35 36 Right-most Derivation in Detail (1) Right-most Derivation in Detail (2) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E + E E E → E+E 37 38 Right-most Derivation in Detail (3) Right-most Derivation in Detail (4) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E E + E E + E → E+E → E+E id → E+id E * E id → E+id → EE∗ + id 39 40 Right-most Derivation in Detail (5) Right-most Derivation in Detail (6) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E → E+E → E+E E + E E + E → E+id → E+id E * E id →∗EE + id E * E id → EE∗ + id →∗Eid + id → E ∗id + id id id id →∗id id + id 41 42 Derivations and Parse Trees Summary of Derivations • Note that: • We are not just interested in whether – right-most and left-most derivations have the same s ∈ L(G) parse tree – We need a parse tree for s – for each parse tree, there is a right-most and a left-most derivation • A derivation defines a parse tree – But one parse tree may have many derivations • The difference is just in the order in which branches are added • Left-most and right-most derivations are important in parser implementation 43 44 Ambiguity Ambiguity (Cont.) • Grammar: • A grammar is ambiguous if it has more than E → E + E | E * E | ( E ) | int one parse tree for some string – Equivalently, if there is more than one right-most or left-most derivation for some string • The string int * int + int has two parse trees • Ambiguity is bad E E – Leaves meaning of some programs ill-defined • Ambiguity is common in programming languages E + E E * E – Arithmetic expressions E E int int E + E – IF-THEN-ELSE * int int int int 45 46 Dealing with Ambiguity Ambiguity: The Dangling Else • There are several ways to handle ambiguity • Consider the following grammar • Most direct method is to rewrite the grammar S → if C then S unambiguously | if C then S else S | OTHER E → T + E | T T → int * T | int | ( E ) • This grammar is also ambiguous • This grammar enforces precedence of * over + 47 48 The Dangling Else: Example The Dangling Else: A Fix • The expression • else should match the closest unmatched then if C1 then if C2 then S3 else S4 • We can describe this in the grammar has two parse trees S → MIF /* all then are matched */ if if | UIF /* some then are unmatched */ MIF → if C then MIF else MIF C if C1 if S4 1 | OTHER UIF → if C then S | if C then MIF else UIF C2 S3 C2 S3 S4 • Typically we want the second form • Describes the same set of strings 49 50 The Dangling Else: Example Revisited Ambiguity • The expression if C1 then if C2 then S3 else S4 • No general techniques for handling ambiguity if if • Impossible to convert automatically an ambiguous grammar to an unambiguous one C1 if C1 if S4 • Used with care, ambiguity can simplify the C2 S3 S4 C2 S3 grammar • A valid parse tree – Sometimes allows more natural definitions • Not valid because the (for a UIF) then expression is not – However, we need disambiguation mechanisms a MIF 51 52 Precedence and Associativity Declarations Associativity Declarations • Instead of rewriting the grammar • Consider the grammar E → E + E | int – Use the more natural (ambiguous) grammar • Ambiguous: two parse trees of int + int + int – Along with disambiguating declarations E E • Most tools allow precedence and associativity E + E E E declarations to disambiguate grammars + E + E int int E + E • Examples … int int int int • Left associativity declaration: %left + 53 54 Precedence Declarations Error Handling • Consider the grammar E → E + E | E * E | int • Purpose of the compiler is And the string int + int * int – To detect non-valid programs – To translate the valid ones E E • Many kinds of possible errors (e.g.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    16 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us