
Computer Science 332
Compiler Construction

4.4: Top-Down Parsing
(Skip Sections on Transition Diagrams and Error Recovery)

Top-Down Parsing

Goal:
• Find a leftmost derivation for an input string, or
• Construct a parse tree for the input starting from the root and creating nodes of the tree in preorder (parent, then children)

Discussed deterministic special case (predictive parsing) in 2.4.
General case is nondeterministic (backtracking).
Of more theoretical than practical interest.

Recursive-Descent Parsing

Requires backtracking.
Consider grammar

    S → cAd
    A → ab | a

Parse input string w = cad:

1. Start with the root S; the input pointer is at c.

         S

2. Expand S → cAd. The leaf c matches the input c; advance to a.

         S
       / | \
      c  A  d

3. Expand A using its first alternative, A → ab. The leaf a matches the input a; advance to d.

         S
       / | \
      c  A  d
        / \
       a   b

4. The leaf b does not match the input d: FAIL. Backtrack to A and reset the input pointer to a.

5. Expand A using its second alternative, A → a. The leaf a matches the input a; advance to d.

         S
       / | \
      c  A  d
         |
         a

6. The leaf d matches the input d: SUCCEED.
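The backtracking search for w = cad can be sketched in Python as follows. This is a minimal illustration rather than a production parser: the `GRAMMAR` dictionary encoding and the names `derive` and `accepts` are assumptions made for this example, and the sketch would loop forever on a left-recursive grammar.

```python
# Hypothetical encoding of the slide's grammar: each nonterminal maps to its
# alternatives, tried in order. Any symbol not in GRAMMAR is a terminal.
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each alternative in turn. If the symbols after it
        # fail to match, the generator resumes here and tries the next
        # alternative from the same input position (this is the backtracking).
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must equal the current input symbol.
        yield from derive(rest, text, pos + 1)

def accepts(text, start="S"):
    """True if some choice of alternatives derives exactly `text`."""
    return any(end == len(text) for end in derive([start], text, 0))

print(accepts("cad"))   # True: A -> ab fails on d, then A -> a succeeds
print(accepts("cabd"))  # True: A -> ab succeeds directly
print(accepts("cab"))   # False: no alternative leaves a d to match
```

Generators keep the sketch short: abandoning an alternative simply means asking the generator for its next result, so the input position never has to be restored by hand.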

Nonrecursive Predictive Parsing

Maintain stack explicitly, instead of relying on run-time support for recursion.

Components
• Input buffer: w$
• Stack: terminals and nonterminals
• Parsing table: nonterminal × input symbol → production
• Output stream: derivation

Nonrecursive Predictive Parsing

Table M determines action based on stack symbol X and input symbol a.
Initial stack is start symbol on top of $.
Possibilities are
1. X = a = $ : halt successfully
2. X = a ≠ $ : pop X and advance input pointer
3. X = nonterminal : Consult table entry M[X, a]. If empty, report error. Else pop X and push table entry.

Predictive Parsing

set input pointer ip to first symbol of w$
repeat
    let X be the top stack symbol and a the symbol pointed to by ip
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a nonterminal */
        if M[X, a] = X → Y_1 Y_2 ... Y_k then begin
            pop X from the stack
            push Y_k, Y_{k-1}, ..., Y_1 onto the stack with Y_1 on top /* order ? */
            output the production X → Y_1 Y_2 ... Y_k
        end
        else error()
until X = $ /* stack is empty */

Predictive Parsing Example

Grammar (note elimination of left recursion):

    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id

Input: id + id * id

Table (rows: nonterminals; columns: input symbols):

Nonterminal  id         +             *             (           )        $
E            E → T E'                               E → T E'
E'                      E' → + T E'                             E' → ε   E' → ε
T            T → F T'                               T → F T'
T'                      T' → ε        T' → * F T'               T' → ε   T' → ε
F            F → id                                 F → ( E )

FIRST and FOLLOW

• Recall FIRST from Chapter 2: FIRST(α) is set of terminals that begin strings derived from α.
• Together with FOLLOW, helps us build parse table from grammar.
• FOLLOW(A) is set of terminals a that can appear immediately to the right of A in some sentential form; i.e., a such that S ⇒* αAaβ.
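The table-driven loop, run with the parsing table from the expression-grammar example, can be sketched in Python as follows. The dictionary layout of `M`, the use of an empty list for ε, and the function name `predictive_parse` are assumptions made for this illustration; the input is taken to be already split into tokens.

```python
# Parsing table M from the example, keyed by (nonterminal, input symbol).
# Each entry is the right side of the production; [] stands for ε.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]      # input buffer w$
    stack = ["$", start]               # start symbol on top of $
    ip = 0
    output = []
    while True:
        X, a = stack[-1], buffer[ip]
        if X == "$" and a == "$":
            return output              # case 1: halt successfully
        if X not in NONTERMINALS:      # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()                # case 2: pop X and advance ip
            ip += 1
        else:                          # case 3: consult M[X, a]
            if (X, a) not in M:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            rhs = M[(X, a)]
            stack.pop()
            stack.extend(reversed(rhs))  # push Y_k ... Y_1 so Y_1 ends up on top
            output.append(f"{X} -> {' '.join(rhs) or 'ε'}")

for production in predictive_parse("id + id * id".split()):
    print(production)
```

Pushing the right side in reverse answers the "order ?" question in the pseudocode: Y_1 must be on top so that the leftmost symbol is expanded or matched first.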

COMPUTING FIRST

1. If X is terminal, then FIRST(X) is {X}.
2. If X → ε is a production, add ε to FIRST(X).
3. If X → Y_1 Y_2 ... Y_k is a production, place a in FIRST(X) if for some i, a is in FIRST(Y_i) and ε is in all of FIRST(Y_1) ... FIRST(Y_{i-1}); that is, Y_1 ... Y_{i-1} ⇒* ε. If ε is in FIRST(Y_j) for all j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in FIRST(Y_1) is surely in FIRST(X). If Y_1 does not derive ε, then we add nothing more to FIRST(X), but if Y_1 ⇒* ε, then we add FIRST(Y_2) and so on.

COMPUTING FOLLOW

1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).

Exercise: Compute FIRST, FOLLOW for nonterminals in grammar:

    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id

Construction of Predictive Parse Tables

Input: Grammar G
Output: Parsing table M

1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.

LL(1) Grammars

Ambiguous grammars will have more than one entry M[A, a] for some nonterminal A, terminal a.

E.g., ambiguous if / then / else grammar:

    S  → iEtSS' | a
    S' → eS | ε
    E  → b

This grammar produces a table M containing entry M[S', e] = {S' → ε, S' → eS} (because FOLLOW(S') = {e, $}).
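The FIRST and FOLLOW rules and the table-construction steps can be sketched in Python and applied to the if / then / else grammar to reproduce the doubly defined entry. The grammar encoding (an empty list for an ε right side) and all function names are assumptions made for this illustration. The rules are applied repeatedly until no set changes.

```python
EPS, END = "ε", "$"

# The if/then/else grammar; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S":  [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E":  [["b"]],
}

def first_of(symbols, first):
    """FIRST of a string of symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # FIRST of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                         # every symbol can derive ε
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                  # rule 1
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue
                    beta_first = first_of(rhs[i + 1:], first)
                    new = beta_first - {EPS}        # rule 2
                    if EPS in beta_first:
                        new |= follow[A]            # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

def build_table(grammar, first, follow):
    table = {}
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            rhs_first = first_of(rhs, first)
            targets = rhs_first - {EPS}             # step 2
            if EPS in rhs_first:
                targets |= follow[A]                # step 3, covers $ as well
            for a in targets:
                table.setdefault((A, a), []).append(rhs)
    return table

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "S")
table = build_table(GRAMMAR, first, follow)
print(sorted(follow["S'"]))     # ['$', 'e']
print(table[("S'", "e")])       # [['e', 'S'], []], i.e. S' -> eS and S' -> ε
```

Entries are kept as lists so that a conflict shows up as a list with more than one production instead of being silently overwritten.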

LL(1) Grammars

A grammar without such duplicate entries is called LL(1).

First L means "read input Left to right".
Second L means "build Leftmost derivation".
1 means one symbol of lookahead in input to make decisions.

No ambiguous or left-recursive grammar can be LL(1).

More technically: Grammar G is LL(1) iff, for every pair of productions A → α | β,
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

LL(1) Grammars

So what to do when M has multiply-defined entries?

Can try to make G LL(1) by eliminating left recursion and left factoring the result; this may produce an LL(1) grammar.

Won't work for some grammars, like our if / then / else example.

For such grammars, we may be able to eliminate all but one of the multiple entries; e.g., change M[S', e] = {S' → ε, S' → eS} to M[S', e] = S' → eS.

But this must be done on a case-by-case basis; there are no universal rules.
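The three conditions of the technical LL(1) definition can be checked mechanically once FIRST(α), FIRST(β) and FOLLOW(A) are known. The sketch below is only an illustration: the function name `ll1_violations` is hypothetical, the sets are typed in by hand instead of being computed, and condition 3 is applied in both directions because the pair α, β is unordered.

```python
EPS = "ε"

def ll1_violations(first_alpha, first_beta, follow_a):
    """Return the numbers of the LL(1) conditions that A → α | β violates."""
    violated = []
    # 1. No terminal may begin strings derived from both α and β.
    if (first_alpha & first_beta) - {EPS}:
        violated.append(1)
    # 2. At most one of α and β may derive the empty string.
    if EPS in first_alpha and EPS in first_beta:
        violated.append(2)
    # 3. If one side derives ε, the other may not begin with a terminal
    #    in FOLLOW(A).
    if (EPS in first_beta and (first_alpha - {EPS}) & follow_a) or \
       (EPS in first_alpha and (first_beta - {EPS}) & follow_a):
        violated.append(3)
    return violated

# S' → eS | ε with FOLLOW(S') = {e, $}: e is in both FIRST(eS) and FOLLOW(S').
print(ll1_violations({"e"}, {EPS}, {"e", "$"}))   # [3]

# E' → + T E' | ε with FOLLOW(E') = {), $}: no condition is violated.
print(ll1_violations({"+"}, {EPS}, {")", "$"}))   # []
```

The first call shows why the if / then / else grammar yields two productions in M[S', e]; keeping only S' → eS there is the hand-made resolution described above, not something this check can decide automatically.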