Recursive Descent Parser

MIT 6.035
Top-Down Parsing
Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology

Orientation
• Language specification
  • Lexical structure - regular expressions
  • Syntactic structure - grammar
• This lecture - recursive descent parsers
  • Code parser as set of mutually recursive procedures
  • Structure of program matches structure of grammar

Starting Point
• Assume lexical analysis has produced a sequence of tokens
  • Each token has a type and value
  • Types correspond to terminals
  • Values correspond to contents of token read in
• Examples
  • Int 549 - integer token with value 549 read in
  • if - if keyword, no need for a value
  • AddOp + - add operator, value +

Basic Approach
• Start with Start symbol
• Build a leftmost derivation
  • If leftmost symbol is nonterminal, choose a production and apply it
  • If leftmost symbol is terminal, match against input
  • If all terminals match, have found a parse!
• Key: find correct productions for nonterminals

Graphical Illustration of Leftmost Derivation
• Sentential form: NT1 T1 T2 T3 NT2 NT3
• Apply production here: NT1 (the leftmost nonterminal)
• Not here: NT2, NT3

Grammar for Parsing Example
  Start → Expr
  Expr → Expr + Term
  Expr → Expr - Term
  Expr → Term
  Term → Term * Int
  Term → Term / Int
  Term → Int
• Set of tokens is { +, -, *, /, Int }, where Int = [0-9][0-9]*
• For convenience, may represent each Int n token by n

Parsing Example (input 2-2*2)
• Begin at Start. Sentential form: Start. Remaining input: 2-2*2
• Apply Start → Expr. Sentential form: Expr. Remaining input: 2-2*2
• Choices for Expr are Expr → Expr + Term, Expr → Expr - Term, Expr → Term. Apply Expr → Expr - Term. Sentential form: Expr - Term. Remaining input: 2-2*2
• Apply Expr → Term. Sentential form: Term - Term. Remaining input: 2-2*2
• Apply Term → Int. Sentential form: Int - Term. Remaining input: 2-2*2
• Match input token 2. Sentential form: 2 - Term. Remaining input: -2*2
• Match input token -. Sentential form: 2 - Term. Remaining input: 2*2
• Apply Term → Term * Int. Sentential form: 2 - Term * Int. Remaining input: 2*2
• Apply Term → Int. Sentential form: 2 - Int * Int. Remaining input: 2*2
• Match input token 2. Sentential form: 2 - 2 * Int. Remaining input: *2
• Match input token *. Sentential form: 2 - 2 * Int. Remaining input: 2
• Match input token 2. Sentential form: 2 - 2 * 2. Remaining input: empty
• Parse complete! Final parse tree:
  Start
    Expr
      Expr
        Term
          Int 2
      -
      Term
        Term
          Int 2
        *
        Int 2

Summary
• Three Actions (Mechanisms)
  • Apply production to expand current nonterminal in parse tree
  • Match current terminal (consuming input)
  • Accept the parse as correct
• Parser generates preorder traversal of parse tree
  • visit parents before children
  • visit siblings from left to right

Policy Problem
• Which production to use for each nonterminal?
• Classical Separation of Policy and Mechanism
• One Approach: Backtracking
  • Treat it as a search problem
  • At each choice point, try next alternative
  • If it is clear that current try fails, go back to previous choice and try something different
• General technique for searching
• Used a lot in classical AI and natural language processing (parsing, speech recognition)

Backtracking Example (input 2-2*2)
• Begin at Start. Sentential form: Start. Remaining input: 2-2*2
• Apply Start → Expr. Sentential form: Expr. Remaining input: 2-2*2
• Apply Expr → Expr + Term. Sentential form: Expr + Term. Remaining input: 2-2*2
• Apply Expr → Term. Sentential form: Term + Term. Remaining input: 2-2*2
• Apply Term → Int. Sentential form: Int + Term. Remaining input: 2-2*2
• Match input token 2. Sentential form: 2 + Term. Remaining input: -2*2
• Can't match input token! The sentential form expects + but the next input token is -
• So backtrack! Return to sentential form Expr with remaining input 2-2*2
• Apply Expr → Expr - Term. Sentential form: Expr - Term. Remaining input: 2-2*2
• Apply Expr → Term. Sentential form: Term - Term. Remaining input: 2-2*2
• Apply Term → Int. Sentential form: Int - Term. Remaining input: 2-2*2
• Match input token 2. Sentential form: 2 - Term. Remaining input: -2*2
• Match input token -. Sentential form: 2 - Term. Remaining input: 2*2
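
The backtracking search traced above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's implementation: the names GRAMMAR, tokenize, derive and parse are hypothetical, and the length check in derive is an added pruning rule (valid because this grammar has no ε productions, so every symbol yields at least one token). Without some such bound, a naive top-down search over this left-recursive grammar need not terminate.

```python
# Grammar from the slides (left-recursive, no epsilon productions).
GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]],
    "Term": [["Term", "*", "Int"], ["Term", "/", "Int"], ["Int"]],
}


def tokenize(text):
    """Map '2-2*2' to token types ['Int', '-', 'Int', '*', 'Int']."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if "0" <= ch <= "9":
            while i < len(text) and "0" <= text[i] <= "9":
                i += 1
            tokens.append("Int")
        elif ch in "+-*/":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def derive(form, tokens, trace):
    """Return True if sentential form `form` derives exactly `tokens`.

    `trace` accumulates the productions of the leftmost derivation found.
    """
    if not form:
        return not tokens
    # Assumed pruning rule: each symbol yields at least one token, so a
    # form longer than the remaining input can never match.
    if len(form) > len(tokens):
        return False
    head, rest = form[0], form[1:]
    if head not in GRAMMAR:
        # Terminal: match against the next input token.
        return tokens[0] == head and derive(rest, tokens[1:], trace)
    mark = len(trace)
    for production in GRAMMAR[head]:
        # Nonterminal: try the next alternative at this choice point.
        trace.append((head, " ".join(production)))
        if derive(production + rest, tokens, trace):
            return True
        del trace[mark:]  # this try failed, so backtrack
    return False


def parse(text):
    """Return the list of applied productions, or None if there is no parse."""
    trace = []
    return trace if derive(["Start"], tokenize(text), trace) else None


if __name__ == "__main__":
    for nonterminal, rhs in parse("2-2*2"):
        print(f"{nonterminal} → {rhs}")
```

The search tries Expr → Expr + Term first, fails, and only then moves to Expr → Expr - Term, which mirrors the order of the example. The cost can grow exponentially with input length, which is one reason to prefer a policy that avoids backtracking.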

Left Recursion + Top-Down Parsing = Infinite Loop
• Example Production: Term → Term * Num
• Potential parsing steps: Term ⇒ Term * Num ⇒ Term * Num * Num ⇒ ... (the leftmost symbol is always Term, so no input is ever consumed)

General Search Issues
• Three components
  • Search space (parse trees)
  • Search algorithm (parsing algorithm)
  • Goal to find (parse tree for input program)
• Would like to (but can't always) ensure that
  • Find goal (hopefully quickly) if it exists
  • Search terminates if it does not
• Handled in various ways in various contexts
  • Finite search space makes it easy
  • Exploration strategies for infinite search space
  • Sometimes one goal more important (model checking)
• For parsing, hack grammar to remove left recursion

Eliminating Left Recursion
• Start with productions of form
  • A → A α
  • A → β
  • α, β sequences of terminals and nonterminals that do not start with A
• Repeated application of A → A α builds a parse tree that grows down its left edge, deriving β followed by a sequence of α:
  A
    A
      A
        β
      α
    α

Eliminating Left Recursion
• Replacement productions
  • Old: A → A α, A → β
  • New: A → β R, R → α R, R → ε
  • R is a new nonterminal
• Old parse tree (for β α α):
  A
    A
      A
        β
      α
    α
• New parse tree (for β α α):
  A
    β
    R
      α
      R
        α
        R
          ε

Hacked Grammar
• Original grammar fragment
  Term → Term * Int
  Term → Term / Int
  Term → Int
• New grammar fragment
  Term → Int Term’
  Term’ → * Int Term’
  Term’ → / Int Term’
  Term’ → ε

Parse Tree Comparisons (for Int * Int)
• Original grammar
  Term
    Term
      Int
    *
    Int
• New grammar
  Term
    Int
    Term’
      *
      Int
      Term’
        ε

Eliminating Left Recursion
• Changes search space exploration algorithm
• Eliminates direct infinite recursion
• But grammar less intuitive
• Sets things up for predictive parsing

Predictive Parsing
• Alternative to backtracking
• Useful for programming languages, which can be designed to make parsing easier
• Basic idea
  • Look ahead in input stream
  • Decide which production to apply based on next tokens in input stream
  • We will use one token of lookahead

Predictive Parsing Example Grammar
  Start → Expr
  Expr → Term Expr’
  Expr’ → + Term Expr’
  Expr’ → - Term Expr’
  Expr’ → ε
  Term → Int Term’
  Term’ → * Int Term’
  Term’ → / Int Term’
  Term’ → ε

Choice Points
• Assume Term’ is current position in parse tree
• Have three possible productions to apply
  Term’ → * Int Term’
  Term’ → / Int Term’
  Term’ → ε
• Use next token to decide
  • If next token is *, apply Term’ → * Int Term’
  • If next token is /, apply Term’ → / Int Term’
  • Otherwise, apply Term’ → ε

Predictive Parsing + Hand Coding = Recursive Descent Parser
• One procedure per nonterminal NT
  • Productions NT → β1, …, NT → βn
  • Procedure examines the current input symbol T to determine which production to apply
    • If T ∈ First(βk)
    • Apply production k
    • Consume terminals in βk (check for correct terminal)
    • Recursively call procedures for nonterminals in βk
• Current input symbol stored in global variable token
• Procedures return
  • true if parse succeeds
  • false if parse fails

Example
  Term → Int Term’
  Term’ → * Int Term’
  Term’ → / Int Term’
  Term’ → ε

  Boolean Term()
      if (token = Int n) token = NextToken(); return(TermPrime())
      else return(false)

  Boolean TermPrime()
      if (token = *)
          token = NextToken();
          if (token = Int n) token = NextToken(); return(TermPrime())
          else return(false)
      else if (token = /)
          token = NextToken();
          if (token = Int n) token = NextToken(); return(TermPrime())
          else return(false)
      else return(true)

Multiple Productions With Same Prefix in RHS
• Example Grammar
  NT → if then
  NT → if then else
• Assume NT is current position in parse tree, and if is the next token
• Unclear which production to apply
• Multiple
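
The Term() and TermPrime() pseudocode from the Example slide can be turned into a small runnable recognizer. The sketch below is one possible Python rendering, not the course's code: the Parser class, tokenize helper, and EOF marker are assumptions introduced for illustration, and the Expr’ procedures assume the productions Expr’ → + Term Expr’ and Expr’ → - Term Expr’ (the form that left-recursion elimination yields). The current token is kept in an instance attribute rather than a global variable.

```python
import re


def tokenize(text):
    """Split text into (type, value) pairs, ending with an EOF marker."""
    tokens = []
    for lexeme in re.findall(r"[0-9]+|\S", text):
        if lexeme[0] in "0123456789":
            tokens.append(("Int", int(lexeme)))
        else:
            tokens.append((lexeme, lexeme))  # operators, or unknown characters
    tokens.append(("EOF", None))
    return tokens


class Parser:
    """One method per nonterminal; each returns True if the parse succeeds."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.token = self.tokens[0]  # one token of lookahead

    def next_token(self):
        # Only called after matching a non-EOF token, so pos stays in range.
        self.pos += 1
        self.token = self.tokens[self.pos]

    def start(self):
        # Start -> Expr, and the whole input must be consumed.
        return self.expr() and self.token[0] == "EOF"

    def expr(self):
        # Expr -> Term Expr'
        return self.term() and self.expr_prime()

    def expr_prime(self):
        # Expr' -> + Term Expr' | - Term Expr' | epsilon
        if self.token[0] in ("+", "-"):
            self.next_token()
            return self.term() and self.expr_prime()
        return True  # epsilon: consume nothing

    def term(self):
        # Term -> Int Term'
        if self.token[0] == "Int":
            self.next_token()
            return self.term_prime()
        return False

    def term_prime(self):
        # Term' -> * Int Term' | / Int Term' | epsilon
        if self.token[0] in ("*", "/"):
            self.next_token()
            if self.token[0] == "Int":
                self.next_token()
                return self.term_prime()
            return False
        return True  # epsilon: consume nothing


def recognize(text):
    """Return True if text is a sentence of the example grammar."""
    return Parser(text).start()


if __name__ == "__main__":
    for sample in ["2-2*2", "2-", "2 2"]:
        print(repr(sample), recognize(sample))
```

Each method looks only at the current token to pick a production, so no backtracking is needed. The explicit EOF check in start() is a design choice of this sketch: without it, an input such as "2 2" would be accepted after its first token, because the ε alternatives succeed without consuming input.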
