Compilers: Parsing, CSE 304/504

Grammars

Parsing, a.k.a. Syntax Analysis

Recognize sentences in a language.
Discover the structure of a document/program.
Construct (implicitly or explicitly) a tree, called a parse tree, to represent the structure.
The parse tree is used to guide translation.

Grammars

The syntactic structure of a language is defined using grammars.
Grammars (like regular expressions) specify a set of strings over an alphabet.
Efficient recognizers (automata) can be constructed to determine whether a string is in the language.
Language hierarchy:
  Finite Languages (FL): Enumeration
  Regular Languages (RL ⊃ FL): Regular Expressions
  Context-Free Languages (CFL ⊃ RL): Context-Free Grammars

Regular Languages

Languages represented by regular expressions ≡ Languages recognized by finite automata
Examples:
  ✓ {a, b, c}
  ✓ {ε, a, b, aa, ab, ba, bb, ...}
  ✓ {(ab)^n | n ≥ 0}
  × {a^n b^n | n ≥ 0}

Context-Free Grammars

Terminal Symbols: Tokens
Nonterminal Symbols: set of strings made up of tokens
Productions: Rules for constructing the set of strings associated with nonterminal symbols.
  Example: Stmt → while Expr do Stmt
Start symbol: a nonterminal symbol that represents the set of all strings in the language.

Grammars

Notation where recursion is explicit.
Examples:
  {ε, a, b, aa, ab, ba, bb, ...} = L((a | b)∗):
    E → a
    E → b
    S → ε
    S → ES
  Notational shorthand:
    E → a | b
    S → ε | ES
  {a^n b^n | n ≥ 0}:
    S → ε
    S → aSb
  {w | no. of a's in w = no. of b's in w}: not expressible as a regular expression, but expressible as a CFG, e.g.
    S → ε | aSbS | bSaS

The First Useful Example

  E → E + E
  E → E − E
  E → E ∗ E
  E → E / E
  E → ( E )
  E → id

L(E) = {id, id + id, id − id, ..., id + (id ∗ id) − id, ...}

Context-Free Grammars: Notations

Production: rule with a nonterminal symbol on the left-hand side, and a (possibly empty) sequence of terminal or nonterminal symbols on the right-hand side.
Notations:
  Terminals: lower case letters, digits, punctuation
  Nonterminals: upper case letters
  Arbitrary Terminals/Nonterminals: X, Y, Z
  Strings of Terminals: u, v, w
  Strings of Terminals/Nonterminals: α, β, γ
  Start Symbol: S

Derivations

Grammar:
  E → E + E
  E → id

E derives id + id: E ⇒ E + E ⇒ E + id ⇒ id + id
αAβ ⇒ αγβ iff A → γ is a production in the grammar.
α ⇒* β if α derives β in zero or more steps.
  Example: E ⇒* id + id
Sentence: A sequence of terminal symbols w such that S ⇒+ w (where S is the start symbol)
Sentential Form: A sequence of terminal/nonterminal symbols α such that S ⇒* α

Derivations

Grammar:
  E → E + E
  E → id

Leftmost derivation: the leftmost nonterminal is replaced first:
  E ⇒ E + E ⇒ id + E ⇒ id + id
  Written as E ⇒*_lm id + id
Rightmost derivation: the rightmost nonterminal is replaced first:
  E ⇒ E + E ⇒ E + id ⇒ id + id
  Written as E ⇒*_rm id + id

Parse Trees

A Parse Tree is a graphical representation of a derivation.

Grammar:
  E → E + E
  E → id

Derivations:
  E ⇒ E + E ⇒ E + id ⇒ id + id
  E ⇒ E + E ⇒ id + E ⇒ id + id

Both derivations yield the same parse tree:

        E
      / | \
     E  +  E
     |     |
     id    id

A Parse Tree succinctly captures the structure of a sentence.

Ambiguity

A Grammar is ambiguous if there are multiple parse trees for the same sentence.

Grammar:
  E → E + E
  E → E ∗ E
  E → id

Sentence: id + id ∗ id
[Figure: two parse trees for the sentence, one with + at the root, grouping it as id + (id ∗ id), and one with ∗ at the root, grouping it as (id + id) ∗ id]

Disambiguation

Express preference for one parse tree over others.
Example: id + id ∗ id
The usual precedence of ∗ over + means the parse tree that groups the sentence as id + (id ∗ id) is preferred.

Parsing

Construct a parse tree for a given string.

Grammar:
  S → (S)S
  S → a
  S → ε

Strings: (a)a and (a)(a)
[Figure: parse trees for (a)a and (a)(a)]

Recursive-Descent Parsing

A Procedure for Parsing

Grammar:
  S → a

Algorithm:
    parse_S() {
      switch (input_token) {
        case TOKEN_A:
          consume(TOKEN_A); return;
        default:
          /* Parse Error */
      }
    }

A Procedure for Parsing (Contd.)

Grammar:
  S → a
  S → ε

Algorithm:
    parse_S() {
      switch (input_token) {
        case TOKEN_A:        /* Production 1 */
          consume(TOKEN_A); return;
        case TOKEN_EOF:      /* Production 2 */
          return;
        default:
          /* Parse Error */
      }
    }

A Procedure for Parsing (Contd.)

Grammar:
  S → (S)S
  S → a
  S → ε

Algorithm:
    parse_S() {
      switch (input_token) {
        case TOKEN_OPEN_PAREN:   /* Production 1 */
          consume(TOKEN_OPEN_PAREN);
          parse_S();
          consume(TOKEN_CLOSE_PAREN);
          parse_S();
          return;
        case TOKEN_A:            /* Production 2 */
          consume(TOKEN_A); return;
        case TOKEN_CLOSE_PAREN:
        case TOKEN_EOF:          /* Production 3 */
          return;
        default:
          /* Parse Error */
      }
    }

Predictive Parsing: Restrictions

May not be able to choose a unique production:
  S → a B d
  B → b
  B → bc
In general, we may need a backtracking parser: Recursive Descent Parsing

Top-Down Predictive Parsing

FIRST and FOLLOW

Grammar: S → (S)S | a | ε

FIRST(X) = First character of any string that can be derived from X.
  FIRST(S) = {(, a, ε}
FOLLOW(A) = First character that, in any derivation of a string in the language, appears immediately after A.
  FOLLOW(S) = {), EOF}
[Figure: a parse tree rooted at S containing a nonterminal C; a ∈ FIRST(C) is the first terminal derived from C, and b ∈ FOLLOW(C) is the terminal immediately after the part derived from C]

FIRST and FOLLOW

Grammar:
  A → a
  B → b
  S → ASB
  S → ε

FIRST(X): First terminal in some α such that X ⇒* α.
FOLLOW(A): First terminal in some β such that S ⇒* αAβ.

  FIRST(S) = {a, ε}     FOLLOW(S) = {b, EOF}
  FIRST(A) = {a}        FOLLOW(A) = {a, b}
  FIRST(B) = {b}        FOLLOW(B) = {b, EOF}

Definition of FIRST

Grammar:
  A → a
  B → b
  S → ASB
  S → ε

FIRST(α) is the smallest set such that:

  α                                Property of FIRST(α)
  a, a terminal                    a ∈ FIRST(α)
  A, a nonterminal                 A → ε ∈ G ⟹ ε ∈ FIRST(α)
                                   A → β ∈ G, β ≠ ε ⟹ FIRST(β) ⊆ FIRST(α)
  X1 X2 ... Xk, a string of        FIRST(X1) − {ε} ⊆ FIRST(α)
  terminals and nonterminals       FIRST(Xi) − {ε} ⊆ FIRST(α) if ∀j < i, ε ∈ FIRST(Xj)
                                   ε ∈ FIRST(α) if ∀j ≤ k, ε ∈ FIRST(Xj)

For the example grammar:
  FIRST(A) ⊇ FIRST(a) ⊇ {a}
  FIRST(B) ⊇ FIRST(b) ⊇ {b}
  FIRST(S) ⊇ {ε}, and FIRST(S) ⊇ FIRST(ASB) ⊇ FIRST(A) ⊇ {a}

Definition of FOLLOW

Grammar:
  A → a
  B → b
  S → ASB
  S → ε

FOLLOW(A) is the smallest set such that:

  A                                Property of FOLLOW(A)
  A = S, the start symbol          EOF ∈ FOLLOW(S) (book notation: $ ∈ FOLLOW(S))
  B → αAβ ∈ G                      FIRST(β) − {ε} ⊆ FOLLOW(A)
  B → αA, or                       FOLLOW(B) ⊆ FOLLOW(A)
  B → αAβ with ε ∈ FIRST(β)

For the example grammar:
  FOLLOW(S) ⊇ {EOF}, and FOLLOW(S) ⊇ FIRST(B) − {ε} ⊇ {b}
  FOLLOW(A) ⊇ FIRST(SB) − {ε} ⊇ {a, b}
  FOLLOW(B) ⊇ FOLLOW(S) ⊇ {b, EOF}

Parsing Table

Grammar:
  A → a
  B → b
  S → ASB
  S → ε

Algorithm:
    parse_S() {
      switch (input_token) {
        case TOKEN_A:        /* Production 3 */
          parse_A(); parse_S(); parse_B();
          return;
        case TOKEN_B:
        case TOKEN_EOF:      /* Production 4 */
          return;
        default:
          /* Parse Error */
      }
    }

Parsing Table:

                  Input Symbol
  Nonterminal     a           b          EOF
  S               S → ASB     S → ε      S → ε

Table-driven Parsing

Grammar:
  A → a
  B → b
  S → ASB
  S → ε

Parsing Table:

                  Input Symbol
  Nonterminal     a           b          EOF
  S               S → ASB     S → ε      S → ε
  A               A → a
  B                           B → b

Nonrecursive Parsing

Instead of recursion, use an explicit stack along with the parsing table.
Data objects:
  Parsing Table: M(A, a), a two-dimensional array, dimensions indexed by nonterminal symbols (A) and terminal symbols (a).
  A Stack of terminal/nonterminal symbols
  Input stream of tokens
The above data structures are manipulated using a table-driven parsing program.

Table-driven Parsing: a Sketch

    stack initialized to EOF, with the start symbol S pushed on top.
    while (!stack.isEmpty()) {
      X = stack.top();
      if (X is a terminal symbol) {
        stack.pop();
        consume(X);          /* syntax error if X differs from input_token */
      } else                 /* X is a nonterminal */
        if (M[X, input_token] = X → Y1 Y2 ... Yk) {
          stack.pop();
          for i = k downto 1 do
            stack.push(Yi);
        }
        else /* Syntax Error */
    }

Constructing Parsing Table

Grammar:
  A → a
  B → b
  S → ASB
  S → ε

  FIRST(S) = {a, ε}     FOLLOW(S) = {b, EOF}
  FIRST(A) = {a}        FOLLOW(A) = {a, b}
  FIRST(B) = {b}        FOLLOW(B) = {b, EOF}

                  Input Symbol
  Nonterminal     a           b          EOF
  S               S → ASB     S → ε      S → ε
  A               A → a
  B                           B → b

A Procedure to Construct Parsing Tables

FIRST(X): First terminal in some α such that X ⇒* α.
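
The FIRST and FOLLOW definitions, the parsing table, and the stack-based loop from the preceding slides can be tried end to end on the example grammar S → ASB | ε, A → a, B → b. The following is a minimal Python sketch, not the course's own implementation. The function names (compute_first, compute_follow, build_table, parse) and the dictionary encoding of the grammar are illustrative assumptions. The table-filling rule is the standard LL(1) one, which this excerpt does not spell out: enter A → α under every terminal in FIRST(α), and under every symbol in FOLLOW(A) when ε ∈ FIRST(α).

```python
EPS, EOF = "ε", "EOF"

# Grammar from the slides. Each right-hand side is a tuple of symbols;
# the empty tuple () stands for the ε-production.
GRAMMAR = {
    "S": [("A", "S", "B"), ()],
    "A": [("a",)],
    "B": [("b",)],
}
START = "S"

def first_of_string(symbols, first):
    """FIRST of a string X1 ... Xk, given the FIRST sets of nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # FIRST of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every Xi can derive ε (or the string is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # grow the sets until nothing changes (smallest solution)
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # FOLLOW is computed for nonterminals only
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table(grammar, first, follow):
    table = {}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            rhs_first = first_of_string(rhs, first)
            lookaheads = rhs_first - {EPS}
            if EPS in rhs_first:
                lookaheads |= follow[lhs]
            for tok in lookaheads:
                if (lhs, tok) in table:
                    raise ValueError(f"conflict at M[{lhs}, {tok}]")
                table[(lhs, tok)] = rhs
    return table

def parse(tokens, grammar, table, start):
    """Return True iff tokens (given without EOF) form a sentence."""
    tokens = list(tokens) + [EOF]
    pos = 0
    stack = [EOF, start]
    while stack:
        top = stack.pop()
        if top in grammar:  # nonterminal: expand using the table
            rhs = table.get((top, tokens[pos]))
            if rhs is None:
                return False  # empty table entry: syntax error
            stack.extend(reversed(rhs))  # push Yk ... Y1, so Y1 ends on top
        elif top == tokens[pos]:  # terminal (or EOF): match and advance
            pos += 1
        else:
            return False
    return pos == len(tokens)

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, START)
table = build_table(GRAMMAR, first, follow)
print(parse("aabb", GRAMMAR, table, START))  # True
print(parse("abb", GRAMMAR, table, START))   # False
```

For this grammar the computed sets and table agree with the ones listed in the slides, for example FOLLOW(A) = {a, b} and M[S, b] = S → ε. The conflict check in build_table is one simple way to detect a grammar that is not suitable for predictive parsing: two productions competing for the same table entry.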