Computer Science 332 Compiler Construction

Top-Down Parsing

Goal:
• Find a leftmost derivation for an input string, or
• Construct a parse tree for the input starting from the root and creating nodes of the parse tree in preorder (parent, then children)

4.4: Top-Down Parsing

Discussed deterministic special case (predictive parsing) in 2.4.
(Skip Sections on Transition Diagrams and Error Recovery)
General case is nondeterministic (backtracking).
Of more theoretical than practical interest.

Recursive-Descent Parsing

Requires backtracking.
Consider grammar
    S → cAd
    A → ab | a
Parse input string w = cad:
1. Start with the root S.
2. Expand S into children c A d.
3. Expand A into a b: FAIL (b does not match d).
4. Backtrack and expand A into a instead.
5. The leaves c a d match the input: SUCCEED.

Nonrecursive Predictive Parsing

Maintain stack explicitly, instead of relying on run-time support for recursion.
Components
• Input buffer: w$
• Stack: terminals and nonterminals
• Parsing table: nonterminal × input symbol → production
• Output stream: derivation

Nonrecursive Predictive Parsing

Table M determines action based on stack symbol X and input symbol a.
Initial stack is start symbol on top of $.
Possibilities are
1. X = a = $ : halt successfully.
2. X = a ≠ $ : pop X and advance input pointer.
3. X = nonterminal : Consult table entry M[X, a]. If empty, report error. Else pop X and push table entry.

Predictive Parsing Algorithm

    set input pointer ip to first symbol of w$
    repeat
        let X be the top stack symbol and a the symbol pointed to by ip
        if X is a terminal or $ then
            if X = a then
                pop X from the stack and advance ip
            else error()
        else /* X is a nonterminal */
            if M[X, a] = X → Y_1 Y_2 ... Y_k then begin
                pop X from the stack
                push Y_k, Y_{k-1}, ..., Y_1 onto the stack with Y_1 on top /* order? */
                output the production X → Y_1 Y_2 ... Y_k
            end
            else error()
    until X = $ /* stack is empty */

Predictive Parsing Example

Grammar (note elimination of left recursion):
    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id
Input: id + id * id
Table (rows: nonterminal; columns: input symbol):

    Nonterminal   id          +             *             (           )         $
    E             E → T E'                                E → T E'
    E'                        E' → + T E'                             E' → ε    E' → ε
    T             T → F T'                                T → F T'
    T'                        T' → ε        T' → * F T'               T' → ε    T' → ε
    F             F → id                                  F → ( E )

FIRST and FOLLOW

• Recall FIRST from Chapter 2: FIRST(α) is set of terminals that begin strings derived from α.
• Together with FOLLOW, helps us build parse table from grammar.
• FOLLOW(A) is set of terminals a that can appear immediately to the right of A in some sentential form; i.e., a such that S ⇒* αAaβ.

COMPUTING FIRST

1. If X is terminal, then FIRST(X) is {X}.
2. If X → ε is a production, add ε to FIRST(X).
3. If X → Y_1 Y_2 ... Y_k is a production, place a in FIRST(X) if for some i, a is in FIRST(Y_i) and ε is in all of FIRST(Y_1) ... FIRST(Y_{i-1}); that is, Y_1 ... Y_{i-1} ⇒* ε. If ε is in FIRST(Y_j) for all j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in FIRST(Y_1) is surely in FIRST(X). If Y_1 does not derive ε, then we add nothing more to FIRST(X), but if Y_1 ⇒* ε, then we add FIRST(Y_2) and so on.

COMPUTING FOLLOW

1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).

Exercise: Compute FIRST, FOLLOW for nonterminals in grammar:
    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id

Construction of Predictive Parse Tables

Input: Grammar G
Output: Parsing table M
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.

LL(1) Grammars

Ambiguous grammars will have more than one entry M[A, a] for some nonterminal A, terminal a.
E.g., ambiguous if / then / else grammar:
    S  → iEtSS' | a
    S' → eS | ε
    E  → b
This grammar produces a table M containing entry M[S', e] = {S' → ε, S' → eS} (because FOLLOW(S') = {e, $}).

LL(1) Grammars

A grammar without such duplicate entries is called LL(1).
First L means “read input Left to right”.
Second L means “build Leftmost derivation”.
1 means one symbol of lookahead in input to make decisions.
No ambiguous or left-recursive grammar can be LL(1).
More technically: Grammar G is LL(1) iff for A → α | β,
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

LL(1) Grammars

So what to do when M has multiply-defined entries?
Can try to make G LL(1) by eliminating left recursion, and left factoring the result; this may produce an LL(1) grammar.
Won't work for some grammars, like our if / then / else example.
For such grammars, we may be able to eliminate all but one of the multiple entries; e.g., change M[S', e] = {S' → ε, S' → eS} to M[S', e] = S' → eS.
But this must be done on a case-by-case basis; there are no universal rules.
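
Sketch: backtracking recursive descent

The cad example from the Recursive-Descent Parsing slides can be tried out with a few lines of Python. This is only a minimal sketch, not code from the course: the names GRAMMAR, derive and accepts are made up for illustration, and backtracking is expressed with generators (each alternative is tried in turn, and a failed alternative simply yields nothing). It assumes a grammar without left recursion.

```python
# Grammar from the slides: S -> cAd, A -> ab | a.
# GRAMMAR, derive and accepts are illustrative names, not from the lecture.
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each alternative in order. When one alternative
        # leads nowhere, the loop moves on to the next (this is the backtrack).
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the current input symbol.
        yield from derive(rest, text, pos + 1)


def accepts(text):
    """True if the whole of `text` can be derived from the start symbol S."""
    return any(end == len(text) for end in derive(["S"], text, 0))
```

For w = cad, the alternative A → ab matches a, fails on b against d, and the search falls through to A → a, as in the slide sequence.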
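
Sketch: table-driven predictive parser

The Predictive Parsing Algorithm slide can be followed almost line by line in Python. The sketch below hard-codes the table from the Predictive Parsing Example (an empty list stands for an ε body) and returns the productions used, which form a leftmost derivation. The names TABLE, NONTERMINALS and predictive_parse are assumptions for illustration, and error handling is reduced to raising SyntaxError.

```python
# Parse table M from the example slide; an empty list represents ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order; raise SyntaxError on an error entry."""
    buffer = list(tokens) + ["$"]   # input buffer w$
    stack = ["$", start]            # start symbol on top of $
    ip = 0
    output = []
    while True:
        top, a = stack[-1], buffer[ip]
        if top == "$" and a == "$":
            return output           # case 1: X = a = $
        if top not in NONTERMINALS:
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()             # case 2: X = a, pop and advance
            ip += 1
        else:
            body = TABLE.get((top, a))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            # Push Y_k ... Y_1 so that Y_1 ends up on top of the stack.
            stack.extend(reversed(body))
            output.append((top, body))
```

Calling predictive_parse(["id", "+", "id", "*", "id"]) starts with E → T E' and ends with E' → ε. The reversed push is what the "order?" remark on the slide is about: Y_1 must be on top so that the leftmost symbol is expanded first.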
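
Sketch: FIRST, FOLLOW and table construction

The FIRST and FOLLOW rules and the table construction steps above can be run as a fixed-point iteration: keep applying the rules until no set grows. The following is a rough sketch under stated assumptions, not the course's reference solution: grammars are dicts from nonterminal to a list of bodies, an empty list stands for ε, and all function and variable names are invented here. Table entries are kept as lists so that multiply-defined entries stay visible.

```python
EPS, END = "ε", "$"


def first_of(seq, first):
    """FIRST of a string of symbols, given the per-symbol FIRST sets."""
    out = set()
    for sym in seq:
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)  # every symbol can derive ε (or seq is empty)
    return out


def first_follow(grammar, start):
    """Apply the FIRST and FOLLOW rules repeatedly until nothing changes."""
    nonterminals = set(grammar)
    symbols = {s for alts in grammar.values() for alt in alts for s in alt}
    first = {t: {t} for t in symbols - nonterminals}
    first.update({n: set() for n in nonterminals})
    follow = {n: set() for n in nonterminals}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                new_first = first_of(alt, first)
                if not new_first <= first[head]:
                    first[head] |= new_first
                    changed = True
                for i, sym in enumerate(alt):
                    if sym not in nonterminals:
                        continue
                    tail = first_of(alt[i + 1:], first)
                    new_follow = tail - {EPS}
                    if EPS in tail:  # β is empty or β ⇒* ε
                        new_follow |= follow[head]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow


def build_table(grammar, start):
    """Map (nonterminal, terminal) to the list of bodies placed in M."""
    first, follow = first_follow(grammar, start)
    table = {}
    for head, alts in grammar.items():
        for alt in alts:
            alt_first = first_of(alt, first)
            lookaheads = alt_first - {EPS}
            if EPS in alt_first:
                lookaheads |= follow[head]  # FOLLOW already contains $ when it applies
            for a in lookaheads:
                table.setdefault((head, a), []).append(alt)
    return table


EXPR = {
    "E": [["T", "E'"]], "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]], "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
IF_ELSE = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E": [["b"]],
}
```

With these definitions, build_table(EXPR, "E") has exactly one body per entry, while build_table(IF_ELSE, "S") puts both S' → eS and S' → ε into M[S', e], which is the duplicate entry discussed on the LL(1) slides. Missing keys play the role of error entries.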
