<<

MIT 6. 035 Top-Down

Martin Rinard Laboratory for Massachusetts Institute of Technology Orientation

specification • Lexical structure – regular expressions • Syntactic structure – • This Lecture - recursive ddescentescent parsers • Code parser as set of mutually recursive procedures • Structure ooff pprogramrogram matches sstructuretructure of grammar Starting Point

• Assume lexical has produced a sequence of tokens • Each token has a type and value • Types ccorrespondorrespond ttoo tterminalserminals • Values to contents of token read in • Examples • Int 549 – integer token with value 549 read in • if - if keyword, no need for a value • AddOp + - add operator, value + Basic Approach

• Start with Start symbol • Bu ild a le ftf tmost der iva tion • If leftmost symbol is nonterminal, choose a production and applyapply itit • If leftmost symbol is terminal, match against input • If all terminals match, have found a parse! •Key: find correct productions for nonterminals Graphical Illustration of Leftmost Derivation

Sentential Form

NT1 T1 T2 T3 NT2 NT3

Apply Production Not Here Here Grammar for Parsing Examp le

Start → Expr • Set ooff ttokokensens iiss Expr → Expr + Term { +, -, *, /, Int }, where Expr → Expr - Term Int = [0-9][0-9]* Expr → Term • FFoo r ccononveeniencenience , may reprepreesentsent Term → Term * Int each Int n token by n Term → Term / Int Term → Int Parsing Example

Parse Remaining Input Start 2-2*2

Sentential Form

Start

Current Position in Parsing Example

Parse Remaining Input Tree Start 2-2*2 Exprp Sentential Form

Expr

Apppplied Production

Start → Expr Current Position in Parse Tree Parsing Example

Parse Remaining Input Tree Start 2-2*2 Exprp Sentential Form

Expr - Term Expr - Term

Expr → Expr + Term Apppplied Production Expr → Expr - Term Expr → Term Expr → Expr - Term Parsing Example

Parse Remaining Input Tree Start 2-2*2 Exppr Sentential Form

Expr - Term Term - Term Term Exprp → Exprp + Teerm Apppplied Production Expr → Expr - Term Expr → Term Expr →→Te Termrm rm o m r Int

Te → - m 2-2*2 r Int lied Production

maining Input Te e Sentential F pp R App rsing Example m a er P T

p pr -

Start Ex

m r e Int s e Expr Te r e Pa Tr rm o

m r

Te - 2-2*2 2 maining Input e Sentential F R ! n e k Match Input To rsing Example m a er P T

p pr -

Start Ex

m r e Int 2 s e Expr Te r e Pa Tr rm o

m r

Te

-2*2 - 2 maining Input e Sentential F R ! n e k Match Input To rsing Example m a er P T

p pr -

Start Ex

m r e Int 2 s e Expr Te r e Pa Tr rm o

m r

Te

2*2 - 2 maining Input e Sentential F R ! n e k Match Input To rsing Example m a er P T

p pr -

Start Ex

m r e Int 2 s e Expr Te r e Pa Tr Int

rm * o Int * m r m r Te 2*2 Te

- → lied Production maining Input

2 m e r Sentential F pp R

App Te Int rsing Example m a * er P T

m r p pr - Te Start Ex

m r e Int 2 s e Expr Te r e Pa Tr rm o Int Int

* → 2*2

Int m r - lied Production

maining Input Te 2 e Sentential F pp R App Int rsing Example m a * er P T

m r p pr -

Te Int Start Ex

m r e Int 2 s e Expr Te r e Pa Tr rm o Int

* 2 2*2 - 2 maining Input e Sentential F R ! n e k Match Input Int To rsing Example m a * er P T

m r p pr -

Te Int 2 Start Ex

m r e Int 2 s e Expr Te r e Pa Tr rm o Int

* *2 2

- 2 maining Input e Sentential F R ! n e k Match Input Int To rsing Example m a * er P T

m r p pr -

Te Int 2 Start Ex

m r e Int 2 s e Expr Te r e Pa Tr rm o Int

* 2 2

- 2 maining Input e Sentential F R ! n e k Match Input Int To rsing Example m a * er P T

m r p pr -

Te Int 2 Start Ex

m r e Int 2 s e Expr Te r e Pa Tr rm o 2 * 2 2 - 2 maining Input e Sentential F R e s r Pa Int 2 Complete! rsing Example m a * er P T

m r p pr -

Te Int 2 Start Ex

m r e Int 2 s e Expr Te r e Pa Tr Summary

• Three Actions (Mechanisms) • AlApply pro ductit ion to expand current nonterminal in parse tree • Match ccurrenturrent terminal (consuming input) • Accept the parse as correct • Parser generates ppreorderreorder ttraversalraversal of parse ttreeree • visit parents before children • visit ssiblingsiblings ffromrom llefteft ttoo rrightight Policy Problem

• Which production to use for each nonterminal? • Classical Separation of Policy and Mechanism • One Approach: • Treat it as a search problem • At each choice point, try next alternative • If it is clear that current try fails, go back to previous choice and trytry something ddifferentifferent • General technique for searching • Used a lot in classical AI and natural langgguage processing (parsing, ) Backtracking Examp le

Parse Remaining Input Tree Start 2-2*2

Sentential Form Start Backtracking Examp le

Parse Remaining Input Tree Start 2-2*2 Exp r Sentential Form Expr

Applied Production Start → Expr Backtracking Examp le

Parse Remaining Input Tree Start 2-2*2 Expr Sentential Form Expr + Term Expr + Term

Applied Production Expr → Expr + Term Backtracking Examp le

Parse Remaining Input Tree Start 2-2*2 Expr Sentential Form Expr + Term Term + Term Term Applied Production Expr → Term Backtracking Examp le

Parse Remaining Input Tree Start Match Input 2-2*2 Expr Token! Sentential Form Expr + Term Int + Term Term Applied Production Int Term → Int Backtracking Examp le

Parse Remaining Input Tree Start Can’ t Match -2*2 Expr Input Sentential Form Token! Expr + Term 2 - Term Term Applied Production Int 2 Term → Int Backtracking Examp le

Parse Remaining Input Tree Start So Backtrack! 2-2*2 Exp r Sentential Form Expr

Applied Production Start → Expr Backtracking Examp le

Parse Remaining Input Tree Start 2-2*2 Expr Sentential Form Expr - Term Expr - Term

Applied Production Expr → Expr - Term Backtracking Examp le

Parse Remaining Input Tree Start 2-2*2 Expr Sentential Form Expr - Term Term - Term Term Applied Production Expr → Term Backtracking Examp le

Parse Remaining Input Tree Start 2-2*2 Expr Sentential Form Expr - Term Int - Term Term Applied Production Int Term → Int Backtracking Examp le

Parse Remaining Input Tree Start Match Input -2*2 Expr Token! Sentential Form Expr - Term 2 - Term Term

Int 2 Backtracking Examp le

Parse Remaining Input Tree Start Match Input 2*2 Expr Token! Sentential Form Expr - Term 2 - Term Term

Int 2 Left + Top-Down Parsing = Infinite Loop • Example Production: Term → Term*Num • PtPoten ttiial pars ing steps:

Term Term Term Num Term * Num Term *

Term * Num General Search Issues

• Three components • Search space (parse trees) • Search (parsing(parsing algorithm)algorithm) • Goal to find (parse tree for input program) • Would like to (but can’t always) ensure that • Find goal (hopefully quickly) if it exists • Search terminates if it does not • Handled inin variousvarious ways in various contextscontexts • Finite search space makes it easy • Exploration strategies for infinite search space • Sometimes oneone goal more important (model checking) • For parsing, hack grammar to remove Eliminating Left Recursion • Start with productions of form •A→ A α • A →β • α, β sequences of terminals and nonterminals that do not startstart with A • Repeated application of A →A α A builds parseparse tree like tthis:his: A α

A α β α Eliminating Left Recursion • Replacement productions –A→ A α A →βR R is a new nonterminal –A→β R →αR – R → ε New ParseParse TTrreeee Old Parse Tree A A R β R A α α R β α α ε ’ ’ agment m ’ r Fr m erm r Te T t Te t In t In

* / ε ammar In → → → → ’ ’ ’

m m m r r r

New Gr erm Te Te T Te t t In Hacked Grammar In

ammar * /

m m

agment r r t Fr

Te Te In → → →

Original Gr m m m r r r

Te Te Te Parse Tree Comparisons

Original Grammar New GGrraammarmmar

Term Term Int Term’ Term * Int Term’ Int * Int * Int * Int Term’

ε Eliminating Left Recursion

• Changes search space exploration algorithm • EElilim ina tes didirec t iifnfiiinitte recursion • But grammar less intuitive • Sets things up for predictivepredictive pparsingarsing Predictive Parsing

• Alternative to backtracking • UfUseful for programming lan guages, whihich can be designed to make parsing easier • Basic iideadea • Look ahead in input stream • Decide which pproductionroduction ttoo aapplypply basedbased on next tokens in input stream • We will use one token of lookahead ’ ’ m ’ m r r m r Te Te t Te t In t In

* / ε In → → → → ’ ’ ’

m m m m r r r r

Te Te Te Te

’ r p x ’ ’ E

m Expr r Expr ε Expr Te + - → → → → → ’ ’ ’

Predictive Parsing Example Grammar Start Expr Expr Expr Expr Choice Points

• Assume Term’ is current position in parse tree • Have three possiblepossible productionsproductions to apply Term’ → * Int Term’ Teerm’ → / Int Teerm’ Term’ →ε • Use next token to decide • If next token is *, apply Term’ → * Int Term’ • If next token is /, apply Term’ → / Int Term’ • Otherwise , aapplypply Term ’ → ε Predictive Parsing + Hand Coding = Recursive DescentDescent Parser

• One procedure per nonterminal NT

• PdProductitions NT → β1 , …, NT → βn • Procedure examines the current input symbol T to determine which production to apppply

•If T∈First(βk) • Apply production k

• Consume terminals in βk (h(check for correct terminal)

• Recursively call procedures for nonterminals in βk • Current input symbol stored in global variable token • Procedures return • true if parse succeedssucceeds • false if parse fails Boolean Term() Example if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) Boolean TTermPrime()ermPrime() if (token = *) token = NextToken(); if (token = Int n) token = NexttTToken(); return((TTermPrime()) else return(false) else if (token = /) token = NextToken(); if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) else return(true) Term → Int Term’ Term’ → * Int Term’ Term’ → / Int Term’ Term’ →ε Multiple Productions With Same Prefix in RHS • Example Grammar NT → if then NT → if then else • Assume NT is current position in parse treet ree, aandnd if is the next token • Unclear whichwhich production to apply

• Multiple k such that T∈First(βk) • if ∈ First(if then) •if∈ First(if then else) Solution: Left Factor the Grammar

• New Grammar Factors Common Prefix Into Single Production NT → if then NT’ NT’NT → else NT’ →ε • No choice when next token iiss iif!f! • All choices have been unified in one production. 2 ?

ε α e t and

1 α enera g can can generate 2

2

2 NT Nonterminals 1 NT α α or

1 2 1 and NT NT NT 1 f if i

→ → NT ust choose based on t t a M hat about productions with nonterminals? ust choose based on possible first terminals

NT NT • W M that Wh • • • i

n NT ≤ i ≤ ε 1 ε ε ves i er or all ves i f di d er di d NT and

n derives NT es i l li es i l li NT ... NT mp

1 i mp i ε NT ves →ε → i er

NT NT di d wo rules • • T • Fixed Point Algorithm for Derives ε for all nonterminals NT set NT de ri ves ε to be f alse for all productions of the form NT →ε set NT derives ε to be true while (some NT derives ε changed in last iteration)

for aallll productions ooff tthehe form NT → NT 1 ... NTn

if (for all 1≤i ≤n NTi derives ε) set NT derives ε to be true First(β)

•T∈ First(β ) if T can appear as the first symbol in a derivationderivation startingstarting fromfrom β 1) T∈First(T ) 2) First( S ) ⊆ First( S β) 3) NT derives ε implies First(β) ⊆ First(NT β) 4) NT → S β implies First(S β) ⊆ First(NT )

•Notation • T is a terminal, NT is a nonterminal, S is a terminal or nonterminal, and β is a sequence of terminals or nonterminals ) ) ’ ’

)? m ’ m r r )

m ) r Te Te ’

Te rm’ m r e e First(

First( T Te ⊆ First( ⊆ m ) aints

) u ints ints Num a a * Num T /N / rm’ rm’ e e Constr First( First( stem of Subset ⊆ Constr Constr ⊆ y S

* Num T / Num T /) quest: What is e First(/) First(*) te R ∈ ∈ a First( First( * / First( First(*) Inclusion ) ) β )

β NT

S NT quest Gener rm’ rm’ e e e implies T T First( s ε t t implies First( ) First( le ⊆ s In In β e )

T ⊆

* / ⊆ ammar ( Ru β

) S ) Gr

→ → →ε S β les + R ’ deriv → u m First rm’ rm’ r R e e ∈ T T Te T NT NT First( First(S ) 1 2) First( 3) 4) Until ) = {} ) = {} aints {} }

rm’ { rm’ to e e orithm )= Solution int g g ’ {*} {/} o

m = Sets r = e d P Num T Te * Num T / e Fix itializ opagate Constr n First( First( First( First(/) I Pr First(*) ation Al ) g ) ’ a

m p pg r

erm’ ) ) o ’ T Te m r rm’ erm T Te e First(

First( T aints ⊆ ⊆ ) )

aint Pr Num

Constr * / Num rm’ rm’ e e First( First( ⊆ ⊆ Constr )

* Num T / Num T * First(/) First(*) ∈ ∈ * First( First( First( First(/) / ) = {} rm’ ) = {} rm’ e e

} T T t

rm’ t { rm’ e

e In In orithm ε )= Solution * / ammar g g ’ {*} {/} Gr → → → m = r = ’

Num T m Te * Num T / rm’ rm’ r e e T T Te First( First( First( First(/) First(*) ation Al ) g ) ’ a

m p pg r

erm’ ) ) o ’ T Te m r rm’ erm T Te e First(

First( T aints ⊆ ⊆ ) )

aint Pr Num

Constr * / Num rm’ rm’ e e First( First( ⊆ ⊆ Constr )

* Num T / Num T * First(/) First(*) ∈ ∈ * First( First( First( First(/) / ) = {*} rm’ ) = {/} rm’ e e

} T T t

rm’ t { rm’ e

e In In orithm ε )= Solution * / ammar g g ’ {*} {/} Gr → → → m = r = ’

Num T m Te * Num T / rm’ rm’ r e e T T Te First( First( First( First(/) First(*) ation Al ) g ) ’ a

m p pg r

erm’ ) ) o ’ T Te m r rm’ erm T Te e First(

First( T aints ⊆ ⊆ ) )

aint Pr Num

Constr * / Num rm’ rm’ e e First( First( ⊆ ⊆ Constr )

* Num T / Num T * First(/) First(*) ∈ ∈ First( * First( First( First(/) / ) = {*} rm’ ) = {/} rm’ e e T T t

rm’ t {*,/} rm’ e

e In = In orithm ε ) Solution * / ammar g g ’ {*} {/} Gr → → → m = r = ’

Num T m Te * Num T / rm’ rm’ r e e T T Te First( First( First( First(/) First(*) ation Al ) g ) ’ a

m p pg r

erm’ ) ) o ’ T Te m r rm’ erm T Te e First(

First( T aints ⊆ ⊆ ) )

aint Pr Num

Constr * / Num rm’ rm’ e e First( First( ⊆ ⊆ Constr )

* Num T / Num T * First(/) First(*) ∈ ∈ First( * First( First( First(/) / ) = {*} rm’ ) = {/} rm’ e e T T t

rm’ t {*,/} rm’ e

e In = In orithm ε ) Solution * / ammar g g ’ {*} {/} Gr → → → m = r = ’

Num T m Te * Num T / rm’ rm’ r e e T T Te First( First( First( First(/) First(*) ation Al ) g ) ’ a

m p pg r

erm’ ) ) o ’ T Te m r rm’ erm T Te e First(

First( T aints ⊆ ⊆ ) )

aint Pr Num

Constr * / Num rm’ rm’ e e First( First( ⊆ ⊆ Constr )

* Num T / Num T * First(/) First(*) ∈ ∈ First( * First( First( First(/) / Building A Parse Tree

• Have each procedure return the section of the parse treetree fforor the ppartart ooff tthehe string it parsed • Use exceptions to make code structure clean Building Parse Tree In Example

Term() if (token = Int n) oldToken = token; token = NextToken(); node = TermPrime(); if (node == NULL) return oldToken; else return(new TermNode(oldToken, node); else throw SyntaxError TermPrime() if (token = *) || (token = /) first = token; next = NextToken(); if (next = Int n) token = NextToken(); return(new TermPrimeNode(first, next, TermPrime()) else throw SSyntaxErroryntaxError else return(NULL) Parse Tree for 2*3*4

Concrete Desired Parse Tree Abstract Term Parse Tree Int Term ’ Term 2 Int * Int Term ’ Term * 4 3 Int * Int * Int Term’ 2 3 4 ε Why Use Hand-Coded Parser?

• Why not use parser generator? •• What do you ddoo ifif youryour parserparser ddoesnoesn’ ttw work?ork? • – write more code •Pasarseer ggee neraatotor • Hack grammar • But if parser generator doesn’t work, nothing you can do • If you have complicated grammar • Increase chance of going outsideoutside comfort zzoneone of parser generator • Your parser may NEVER work Bottom Line

• Recursive descent parser properties • Probably more work • But less risk of a disaster - you can almost always make a recursive descent parser work • May have easier time dealing with resulting code • Single language system • No need to deal with potentially flaky parser generator • No integration issuesissues with automatically generated code •Ify our parser development time is small compared to rest of project, or you have a really complicated language, use hand-coded recursive descent parser Summary

• Top-Down Parsing • Use L ook ahea d to Avoidid Bac ktk trac kiking •Parser is • Hand-Coded • Set of Mutually Recursive Procedures Direct Generation of Abstract Tree

• TermPrime builds an incomplete tree • Missing leftmost child • Returns root and incomplete node • (root, incomplete) = TermPrime() • Called with token = * • Remaining tokens = 3 * 4 root Term

incomplete Term * Int 4 MissingMissingMissingleft leftLeftchild child child * Int toto be filled inin byby 3 callercaller Code for Term Input to Term() parse if (token = Int n) leftmostInt = token; token = NextToken(); 2*3*4 (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete. leftChild = leftmostIn t; return root; else throw SyntaxError

token Int 2 Code for Term Input to Term() parse if (token = Int n) leftmostInt = token; token = NextToken(); 2*3*4 (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete. leftChild = leftmostIn t; return root; else throw SyntaxError

token Int 2 Code for Term Input to Term() parse if (token = Int n) leftmostInt = token; token = NextToken(); 2*3*4 (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete. leftChild = leftmostIn t; return root; else throw SyntaxError

token Int 2 Code for Term Input to Term() parse if (token = Int n) leftmostInt = token; token = NextToken(); 2*3*4 (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete. leftChild = leftmostIn t; return root; else throw SyntaxError root Term

incomplete Term * Int 4 leftmostInt Int * Int 2 3 Code for Term Input to Term() parse if (token = Int n) leftmostInt = token; token = NextToken(); 2*3*4 (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete. leftChild = leftmostIn t; return root; else throw SyntaxError root Term

incomplete Term * Int 4 leftmostInt Int * Int 2 3 Code for Term Input to Term() parse if (token = Int n) leftmostInt = token; token = NextToken(); 2*3*4 (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete. leftChild = leftmostIn t; return root; else throw SyntaxError root Term

incomplete Term * Int 4 leftmostInt Int * Int 2 3 Code for TermPrime

TermPrime() if (token = *) || (token = /) Missing left child op = tktoken; next = NexttTToken(()); to be filled in by if (next = Int n) caller token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) root = new ExprNode(NULL, op, next); return (root, root); else newChild = new ExprNode(NULL, op, next); incomplete.leftChild = newChild; return(root, newChild); else throw SSyntaxErroryntaxError else return(NULL,NULL) MIT OpenCourseWare http://ocw.mit.edu

6.035 Computer Language Engineering Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.