Parsing Grammars

Parsing Grammars

Grammars Parsing A.k.a. Syntax Analysis Recognize sentences in a language. Discover the structure of a document/program. Construct (implicitly or explicitly) a tree (called as a parse tree) to represent the structure. The parse tree is used to guide translation. Compilers Parsing CSE 304/504 1 / 36 Grammars Grammars The syntactic structure of a language is defined using grammars. Grammars (like regular expressions) specify a set of strings over an alphabet. Efficient recognizers (automata) can be constructed to determine whether a string is in the language. Language heirarchy: Finite Languages (FL) Enumeration Regular Languages (RL ⊃ FL) Regular Expressions Context-Free Languages (CFL ⊃ RL) Context-Free Grammars Compilers Parsing CSE 304/504 2 / 36 Grammars Regular Languages Languages represented by Languages recognized ≡ regular expressions by finite automata Examples: p fa; b; cg p f, a; b; aa; ab; ba; bb;:::g p f(ab)n j n ≥ 0g ×f anbn j n ≥ 0g Compilers Parsing CSE 304/504 3 / 36 Grammars Context-Free Grammars Terminal Symbols: Tokens Nonterminal Symbols: set of strings made up of tokens Productions: Rules for constructing the set of strings associated with nonterminal symbols. Example: Stmt −! while Expr do Stmt Start symbol: a nonterminal symbol that represents the set of all strings in the language. Compilers Parsing CSE 304/504 4 / 36 Grammars Grammars Notation where recursion is explicit. Examples: f, a; b; aa; ab; ba; bb;:::g = L( (a j b)∗): E −! a Notational shorthand: E −! b E −! a j b S −! S −! j ES S −! ES fanbn j n ≥ 0g : S −! S −! aSb fw j no. of a's in w = no. of b's in wg: Not expressible in CFG. Compilers Parsing CSE 304/504 5 / 36 Grammars The First Useful Example E −! E + E E −! E − E E −! E ∗ E E −! E = E E −! ( E ) E −! id L(E) = fid; id + id; id − id;:::; id + (id ∗ id) − id;:::g Compilers Parsing CSE 304/504 6 / 36 Grammars Context-Free Grammars: Notations Production: rule with nonterminal symbol on left hand side, and a (possibly empty) sequence of terminal or nonterminal symbols on the right hand side. Notations: Terminals: lower case letters, digits, punctuation Nonterminals: Upper case letters Arbitrary Terminals/Nonterminals: X ; Y ; Z Strings of Terminals: u; v; w Strings of Terminals/Nonterminals: α; β; γ Start Symbol: S Compilers Parsing CSE 304/504 7 / 36 Grammars Derivations E −! E + E Grammar: E −! id E derives id + id: E =) E + E =) E + id =) id + id αAβ =) αγβ iff A −! γ is a production in the grammar. α =∗) β if α derives β in zero or more steps. ∗ Example: E =) id + id Sentence: A sequence of terminal symbols w such that S =+) w (where S is the start symbol) Sentential Form: A sequence of terminal/nonterminal symbols α such that S =∗) α Compilers Parsing CSE 304/504 8 / 36 Grammars Derivations E −! E + E Grammar: E −! id Leftmost derivation: Leftmost nonterminal is replaced first: E =) E + E =) id + E =) id + id ∗ Written as E =)lm id + id Rightmost derivation: Rightmost nonterminal is replaced first: E =) E + E =) E + id =) id + id ∗ Written as E =)rm id + id Compilers Parsing CSE 304/504 9 / 36 Grammars Parse Trees A Parse Tree is a graphical representation of a derivation E −! E + E Grammar: E −! id E E =) E + E E =) E + E =) id + E EE+ =) E + id =) id + id =) id + id id id A Parse Tree succinctly captures the structure of a sentence. Compilers Parsing CSE 304/504 10 / 36 Grammars Ambiguity A Grammar is ambiguous if there are multiple parse trees for the same sentence. E E Grammar: E −! E + E EE+ E * E E −! E ∗ E E −! id id EE* EE+ id Sentence: id + id ∗ id id id id id Compilers Parsing CSE 304/504 11 / 36 Grammars Disambiguition Express Preference for one parse tree over others. Example: id + id ∗ id The usual precedence of ∗ over + means: Preferred Compilers Parsing CSE 304/504 12 / 36 Grammars Parsing Construct a parse tree for a given string. S −! (S)S S −! a S −! (a)a (a)(a) S S ()S S ()SS a ()SS a a a ε Compilers Parsing CSE 304/504 13 / 36 Recursive-Descent Parsing A Procedure for Parsing Grammar: S −! a Algorithm parse S() f switch (input token) f case TOKEN A: consume(TOKEN A); return; default: /* Parse Error */ g g Compilers Parsing CSE 304/504 14 / 36 Recursive-Descent Parsing A Procedure for Parsing (Contd.) S −! a Grammar: S −! Algorithm parse S() f switch (input token) f case TOKEN A: /* Production 1 */ consume(TOKEN A); return; case TOKEN EOF : /* Production 2 */ return; default: /* Parse Error */ g g Compilers Parsing CSE 304/504 15 / 36 Recursive-Descent Parsing A Procedure for Parsing (Contd.) S −! (S)S Grammar: S −! a S −! Algorithm parse S() f switch (input token) f case TOKEN OPEN PAREN: /* Production 1 */ consume(TOKEN OPEN PAREN); parse S(); consume(TOKEN CLOSE PAREN); parse S(); return; /* Continued on next page */ Compilers Parsing CSE 304/504 16 / 36 Recursive-Descent Parsing A Procedure for Parsing (contd.) S −! (S)S Grammar: S −! a S −! case TOKEN A: /* Production 2 */ consume(TOKEN A); return; case TOKEN CLOSE PAREN: case TOKEN EOF : /* Production 3 */ return; default: /* Parse Error */ Compilers Parsing CSE 304/504 17 / 36 Recursive-Descent Parsing Predictive Parsing: Restrictions May not be able to choose a unique production S −! a B d B −! b B −! bc In general, we may need a backtracking parser: Recursive Descent Parsing Compilers Parsing CSE 304/504 18 / 36 Top-Down Predictive Parsing FIRST and FOLLOW Grammar: S −! (S)S j a j FIRST(X ) = First character of any string that can be derived from X FIRST(S) = f(; a; g. FOLLOW(A) = First character that, in any derivation of a string in the language, appears immediately after A. FOLLOW(S) = f); EOFg S a 2 FIRST(C) b 2 FOLLOW(C) C a b Compilers Parsing CSE 304/504 19 / 36 Top-Down Predictive Parsing FIRST and FOLLOW A −! a S −! ASB Grammar: B −! b S −! FIRST (X ): First terminal in some α such that X =)∗ α. FOLLOW (A): First terminal in some β such that S =)∗ αAβ. FIRST (S) = f a, g FOLLOW (S) = f b, EOF g FIRST (A) = f a g FOLLOW (A) = f a, b g FIRST (B) = f b g FOLLOW (B) = f b, EOF g Compilers Parsing CSE 304/504 20 / 36 Top-Down Predictive Parsing Definition of FIRST A −! a S −! ASB Grammar: B −! b S −! FIRST (α) is the smallest set such that α = Property of FIRST (α) a, a terminal a 2 FIRST (α) A −! 2 G =) 2 FIRST (α) A, a nonterminal A −! β 2 G, β 6= =) FIRST (β) ⊆ FIRST (α) X1; X2;:::; Xk , a FIRST (X1) − fg ⊆ FIRST (α) string of terminals FIRST (Xi ) ⊆ FIRST (α) if 8j < i 2 FIRST (Xj ) and nonterminals 2 FIRST (α) if 8j < k 2 FIRST (Xj ) FIRST (A) ⊇ FIRST (a) ⊇ fag FIRST (B) ⊇ FIRST (b) ⊇ fbg FIRST (S) ⊇ fg, and FIRST (S) ⊇ FIRST (fASBg) ⊇ FIRST (A) ⊇ fag Compilers Parsing CSE 304/504 21 / 36 Top-Down Predictive Parsing Definition of FOLLOW A −! a S −! ASB Grammar: B −! b S −! FOLLOW (A) is the smallest set such that A Property of FOLLOW (A) EOF 2 FOLLOW (S) = S, the start symbol Book notation: $ 2 FOLLOW (S) B −! αAβ 2 G FIRST (β) − fg ⊆ FOLLOW (A) B −! αA, or FOLLOW (B) ⊆ FOLLOW (A) B −! αAβ, 2 FIRST (β) FOLLOW (S) ⊇ fEOF g, and FOLLOW (S) ⊇ FIRST (B) − ⊇ fbg FOLLOW (A) ⊇ FIRST (SB) − ⊇ fa; bg FOLLOW (B) ⊇ FOLLOW (S) ⊇ fb; EOF g Compilers Parsing CSE 304/504 22 / 36 Top-Down Predictive Parsing Parsing Table A −! a S −! ASB Grammar: B −! b S −! Algorithm parse S() f switch (input token) f case TOKEN A: /* Production 3 */ parse A(); parse S(); parse B(); return; case TOKEN B: case TOKEN EOF : /* Production 4 */ return; Parsing Table: Input Symbol Nonterminal a b EOF S S −! ASB S −! S −! Compilers Parsing CSE 304/504 23 / 36 Top-Down Predictive Parsing Table-driven Parsing A −! a S −! ASB Grammar: B −! b S −! Parsing Table: Input Symbol Nonterminal a b EOF S S −! ASB S −! S −! A A −! a B B −! b Compilers Parsing CSE 304/504 24 / 36 Top-Down Predictive Parsing Nonrecursive Parsing Instead of recursion, use an explicit stack along with the parsing table. Data objects: Parsing Table: M(A; a), a two-dimensional array, dimensions indexed by nonterminal symbols (A) and terminal symbols (a). A Stack of terminal/nonterminal symbols Input stream of tokens The above data structures manipulated using a table-driven parsing program. Compilers Parsing CSE 304/504 25 / 36 Top-Down Predictive Parsing Table-driven Parsing: a Sketch stack initialized to EOF . while (! stack.isEmpty()) f X = stack.top(); if (X is a terminal symbol) consume(X ); else /* X is a nonterminal */ if (M[X ; input token] = X −! Y1; Y2;:::; Yk ) f stack.pop(); for i = k downto 1 do stack.push(Yi ); g else /* Syntax Error */ g Compilers Parsing CSE 304/504 26 / 36 Top-Down Predictive Parsing Constructing Parsing Table A −! a S −! ASB Grammar: B −! b S −! First(S) = f a, g Follow(S) = f b, EOF g First(A) = f a g Follow(A) = f a, b g First(B) = f b g Follow(B) = f b, EOF g Input Symbol Nonterminal a b EOF S S −! ASB S −! S −! A A −! a B B −! b Compilers Parsing CSE 304/504 27 / 36 Top-Down Predictive Parsing A Procedure to Construct Parsing Tables FIRST (X ): First terminal in some α such that X =)∗ α.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    18 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us