Recursive-Descent Parsing Continued
Total Page:16
File Type:pdf, Size:1020Kb
Recursive-Descent Parsing continued CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 14, 2020 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks [email protected] © 2017–2020 Glenn G. Chappell Review 2020-02-14 CS F331 / CSCE A331 Spring 2020 2 Review Overview of Lexing & Parsing Parsing Lexer Parser Character Lexeme AST or Stream Stream Error return (*dp + 2.6); //x return (*dp + 2.6); //x returnStmt key op id op num binOp: + lit punct punct unOp: * numLit: 2.6 Two phases: id: dp § Lexical analysis (lexing) § Syntax analysis (parsing) The output of a parser is typically an abstract syntax tree (AST). Specifications of these will vary. 2020-02-14 CS F331 / CSCE A331 Spring 2020 3 Review The Basics of Syntax Analysis — Categories of Parsers Parsing methods can be divided into two broad categories. Top-Down Parsers § Go through derivation top to bottom, expanding nonterminals. § Important subcategory: LL parsers (read input Left-to-right, produce Leftmost derivation). § Often hand-coded—but not always. § Method we look at: Predictive Recursive Descent. Bottom-Up Parsers § Go through the derivation bottom to top, reducing substrings to nonterminals. § Important subcategory: LR parsers (read input Left-to-right, produce Rightmost derivation). § Almost always automatically generated. § Method we look at: Shift-Reduce. 2020-02-14 CS F331 / CSCE A331 Spring 2020 4 Review The Basics of Syntax Analysis — Categories of Grammars LL(k) grammars: those usable by LL parsers that k is a number. make decisions based on the next k lexemes. Similarly: LR(k) grammars: those usable by LR parsers that make decisions based on the next k lexemes. All Grammars CFGs LR(1) Grammars Regular Grammars LL(1) Grammars 2020-02-14 CS F331 / CSCE A331 Spring 2020 5 Review Recursive-Descent Parsing — Intro thru Example #2 [1/2] Recursive Descent is a top-down parsing method. When we avoid backtracking: predictive. Predictive Recursive- Descent is an LL parsing method. There is one parsing function for each nonterminal. This parses all strings that the nonterminal can be expanded into. Parsing-function code is a translation of the right-hand side of the production for the nonterminal. § A terminal in the right-hand side becomes a check that the input string contains the proper lexeme. § A nonterminal in the right-hand side becomes a call to its parsing function. § If that returns false, our function must return false—no backtracking. § Brackets [ … ] in the right-hand side become a conditional. § Braces { … } in the right-hand side become See rdparser2.lua a loop. & userdparser2.lua. 2020-02-14 CS F331 / CSCE A331 Spring 2020 6 Review Recursive-Descent Parsing — Intro thru Example #2 [2/2] A naively written parser may call some inputs correct when it cannot parse the entire input. Example: ((x))) Syntactically Extra correct stuff Two Solutions § Introduce a new end of input lexeme (standard symbol: $). Revise the grammar to include it. § After parsing, check whether the end of the input was reached. Our parsers use the second solution. See rdparser2.lua & userdparser2.lua. 2020-02-14 CS F331 / CSCE A331 Spring 2020 7 Review Recursive-Descent Parsing — Example #3: Expressions [1/3] We wish to parse arithmetic expressions in their usual form, with variables, numeric literals, binary +, -, *, and / operators, and parentheses. When given a syntactically correct expression, our parser should return an abstract syntax tree (AST). All operators will be binary and left-associative: “a + b + c” means “(a + b) + c”. Precedence will be as usual: “a + b * c” means “a + (b * c)”. These may be overridden using parentheses: “(a + b) * c”. Due to the limitations of our in-class lexer, the expression “k-4” will need to be written as “k - 4”. 2020-02-14 CS F331 / CSCE A331 Spring 2020 8 Review Recursive-Descent Parsing — Example #3: Expressions [2/3] We started with Grammar 3, which has start symbol expr. This grammar encodes our associativity and precedence rules, and it allows us to use parentheses to override them. Grammar 3 expr → term | expr ( “+” | “-” ) term term → factor | term ( “*” | “/” ) factor factor → ID | NUMLIT | “(” expr “)” But Grammar 3 is not an LL(1) grammar, so we cannot use it. We see the problem if we try to write a Recursive-Descent parser. 2020-02-14 CS F331 / CSCE A331 Spring 2020 9 Review Recursive-Descent Parsing — Example #3: Expressions [3/3] Grammar 3 function parse_expr() expr → term if parse_term() then | expr ( “+” | “-” ) term … -- Construct AST term → factor return true | term ( “*” | “/” ) factor elseif parse_expr() then factor → ID … | NUMLIT | “(” expr “)” The above code has problems. § First, we are not supposed to be doing backtracking; if the call to parse_term returns false, then we must return false. § But even if we solve that, there is another problem. If parse_expr is called with input that does not begin with a valid term, then we get infinite recursion. 2020-02-14 CS F331 / CSCE A331 Spring 2020 10 Review Recursive-Descent Parsing — LL(1) Grammars An LL(1) grammar is a CFG that can be handled by an LL parsing method—such as Predictive Recursive Descent—without lookahead. The name comes from the fact that these parsers handle their input in a Left-to-right order, and they go through the steps required to generate a Leftmost derivation. When there is a choice to be made, an LL parser without lookahead must be able to make that choice based on the current lexeme. If this cannot be done, then the grammar is not LL(1). 2020-02-14 CS F331 / CSCE A331 Spring 2020 11 Review Recursive-Descent Parsing — Transforming Grammars Sometimes, we can transform a non-LL(1) grammar into an LL(1) grammar that generates the same language. For example, here is Grammar C, which is not LL(1), along with an LL(1) grammar that generates the same language. Grammar C Grammar Ca xx → yy | zz xx → yy | zz | “” yy → “” | “a” yy → “a” zz → “” | “b” zz → “b” 2020-02-14 CS F331 / CSCE A331 Spring 2020 12 Review Recursive-Descent Parsing — Back to Ex. #3 (Left-Associativity) Grammar 3b, below, is what we function parse_expr() want. It works with a Predictive if not parse_term() then Recursive-Descent parser, and return false we can use it to parse left- end associative binary operators. while matchString("+") or matchString("-") do Grammar 3b if not parse_term() then return false expr → term { ( “+” | “-” ) term } end term → factor { ( “*” | “/” ) factor } end factor → ID return true | NUMLIT end | “(” expr “)” Now, what about generating an AST? 2020-02-14 CS F331 / CSCE A331 Spring 2020 13 Recursive-Descent Parsing continued 2020-02-14 CS F331 / CSCE A331 Spring 2020 14 Recursive-Descent Parsing Back to Example #3: Expressions — ASTs [1/4] We want write a parser that returns an abstract syntax tree (AST). First we review some rooted-tree terminology. A node with one or more children—that is, a node A that is not a leaf—is an internal node. B C The tree at right has two internal nodes, labeled A and C. D E The subtrees of an internal node are those rooted A at each of its children. In the tree at right, the two subtrees of the A B C node are circled. D E 2020-02-14 CS F331 / CSCE A331 Spring 2020 15 Recursive-Descent Parsing Back to Example #3: Expressions — ASTs [2/4] Recall that a parse tree, or concrete syntax tree, includes one leaf node for each lexeme in the input, and also one node for each nonterminal in the derivation. Below is the parse tree for a + 2, based on Grammar 3b. expr This is not new! term + term It is the same kind of tree we drew before. factor factor a 2 2020-02-14 CS F331 / CSCE A331 Spring 2020 16 Recursive-Descent Parsing Back to Example #3: Expressions — ASTs [3/4] In an abstract syntax tree (AST), each internal node represents an operation. Its subtrees are the ASTs for the entities the operation is applied to. On the left is our parse tree for a + 2. On the right is an AST for the same expression. Abstract Parse Tree expr + Syntax Tree This is new. term + term a 2 factor factor a 2 Above, operation is broadly defined. For example, an AST node might represent the operation “execute all these statements, in order”. Its subtrees would be ASTs for the statements. 2020-02-14 CS F331 / CSCE A331 Spring 2020 17 Recursive-Descent Parsing Back to Example #3: Expressions — ASTs [4/4] The subtrees of an internal node in an AST are ASTs themselves. So we can build an AST from smaller ASTs in the same way we build an expression from smaller expressions. AST for AST for + a + 2 * (a + 2) * b a 2 + b a 2 Notes § ASTs omit portions of syntax that only serve to guide the parser— like semicolons, parentheses, commas, extra nonterminals. § There is no universal specification for an AST. Instead the ASTs used in a software project are specified so as to best meet the needs of that project. 2020-02-14 CS F331 / CSCE A331 Spring 2020 18 Recursive-Descent Parsing Back to Example #3: Expressions — Representing ASTs [1/7] + * a 2 + b a 2 We need to represent ASTs like these in Lua. § Represent a single node by the string form of its lexeme. § If there is more than one node in an AST, then represent it as an array whose first item represents the root, while each remaining item represents one of the subtrees of the root, in order. The first AST, in Lua: { "+", "a", "2" } TRY IT: What is the Lua representation of the second AST? 2020-02-14 CS F331 / CSCE A331 Spring 2020 19 Recursive-Descent Parsing Back to Example #3: Expressions — Representing ASTs [2/7] + * a 2 + b a 2 We need to represent ASTs like these in Lua.