Shift-Reduce Parsing continued
Parsing Wrap-Up
PL Feature: Type System

CS F331 Programming Languages
CSCE A331 Programming Language Concepts
Lecture Slides
Wednesday, February 20, 2019

Glenn G. Chappell
Department of Computer Science
University of Alaska Fairbanks
[email protected]
© 2017–2019 Glenn G. Chappell

Review: Overview of Lexing & Parsing

[Diagram: a character stream ("cout << ff(12.6);") goes into the Lexer, which produces a lexeme stream (id, op, id, punct, numlit, ...); the Parser consumes the lexeme stream and produces an AST or an error. The AST for the example has root binOp: << with children id: cout and funcCall, the latter with children id: ff and numLit: 12.6.]

Two phases:
§ Lexical analysis (lexing)
§ Syntax analysis (parsing)

The output of a parser is often an abstract syntax tree (AST). Specifications of these can vary.

Review: The Basics of Syntax Analysis — Categories of Parsers

Parsing methods can be divided into two broad categories.

Top-Down
§ Go through the derivation from top to bottom, expanding nonterminals.
§ Important subcategory: LL parsers (read input Left-to-right, produce a Leftmost derivation).
§ Often hand-coded—but not always.
§ Method we look at: Predictive Recursive Descent.

Bottom-Up
§ Go through the derivation from bottom to top, reducing substrings to nonterminals.
§ Important subcategory: LR parsers (read input Left-to-right, produce a Rightmost derivation).
§ Almost always automatically generated.
§ Method we look at: Shift-Reduce.

Review: The Basics of Syntax Analysis — Categories of Grammars

LL(1) grammars: those usable by LL parsers without lookahead.
LR(1) grammars: those usable by LR parsers without lookahead.

[Diagram: nested regions showing the relationships among grammar classes: All Grammars, CFGs, LR(1) Grammars, LL(1) Grammars, Regular Grammars.]

Review: Recursive-Descent Parsing [1/2]

Recursive Descent is a top-down parsing method.
§ When we avoid backtracking: predictive. Predictive Recursive Descent is an LL parsing method.
§ There is one parsing function for each nonterminal. This parses all strings that the nonterminal can be expanded into.

The natural grammar for expressions with left-associative binary operators is not LL(1). But we can transform it appropriately.

Not usable (left recursion):
    e → t
      | e ( "+" | "-" ) t

Usable:
    e → t { ( "+" | "-" ) t }

Review: Recursive-Descent Parsing [2/2]

On a correct parse, a parser typically returns an abstract syntax tree (AST). We specify the format of an AST for each line in our grammar. It is helpful to include information in the AST telling what kind of entity each node represents.

Expression: a + 2
AST (diagram): binOp: + with children simpleVar: a and numLit: 2
AST (Lua): {{ BIN_OP, "+" }, { SIMPLE_VAR, "a" }, { NUMLIT_VAL, "2" }}

See rdparser4.lua.
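The full parser is in rdparser4.lua, which is not reproduced in these slides. The following is only a minimal Lua sketch of the idea for the usable production e → t { ( "+" | "-" ) t }, building a left-associative AST roughly in the format shown above. The helper names parse_term, currLexeme, and advance, and the node-kind constant, are assumptions made here for illustration, not the actual rdparser4.lua interface.

-- Hypothetical node-kind constant, in the spirit of the AST format above.
local BIN_OP = "BIN_OP"

-- parse_expr
-- Parses  e -> t { ("+" | "-") t }  and builds a left-associative AST.
-- Assumes parse_term(), currLexeme(), and advance() are provided by the
-- surrounding parser. Returns (true, ast) on success, (false, nil) on error.
local function parse_expr()
    local good, ast = parse_term()
    if not good then return false, nil end

    while currLexeme() == "+" or currLexeme() == "-" do
        local op = currLexeme()
        advance()                          -- consume the operator
        local good2, ast2 = parse_term()
        if not good2 then return false, nil end
        -- Everything parsed so far becomes the LEFT operand of the new
        -- operator node; this is what makes the result left-associative.
        ast = { { BIN_OP, op }, ast, ast2 }
    end
    return true, ast
end

Each pass through the loop wraps the AST built so far as the left operand of a new operator node, which is how the EBNF repetition recovers left associativity without left recursion.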
Review: Shift-Reduce Parsing [1/3]

We are looking at a class of table-based bottom-up parsing algorithms. Tables are produced before execution. Shortly we will take a brief look at this. Parser execution uses a state machine with a stack, called a Shift-Reduce parser. The name comes from two operations:
§ Shift. Advance to the next input symbol.
§ Reduce. Apply a production: replace a substring with a nonterminal.

In the form we presented it, Shift-Reduce parsing is an LR parsing method that can handle all LR(1) grammars. As we will see, in practice, the class of grammars is usually further restricted.

Review: Shift-Reduce Parsing [2/3]

The parser runs as a state machine with an associated stack.
§ A stack item holds a symbol—terminal or nonterminal—and a state.
§ The current state is the state in the top stack item.

[Diagram: the stack, drawn with the top item first. Each item shows its symbol on the left and its state on the right.
    expr  3    <- top stack item; 3 is the current state
    ID    7
    ==    5 ]

The parsing table includes an action table (columns are terminals) and a goto table (columns are nonterminals). Rows are states.

Operation
§ Begin by pushing an item holding the start state and any symbol.
§ At each step, do a lookup in the action table using the current state and the current input symbol. Do what the action-table entry says.

Review: Shift-Reduce Parsing [3/3]

Action-Table Entries
S# (# is the number of a state)
    Shift—Push item: current symbol + state #. Advance input.
R# (# is the number of a production)
    Reduce—Pop RHS of production #. Push LHS + state from goto table (lookup: state before push + LHS nonterminal).
ACCEPT
    Terminate: syntactically correct.
ERROR (I represent this by a blank table cell)
    Terminate: syntax error.

Shift-Reduce Parsing continued: ASTs

How can a Shift-Reduce parser construct and return an AST? Hold three things in each stack item: symbol, state, AST.

Process
§ When doing Shift, push the AST for the lexeme being shifted.
§ When doing Reduce, construct and push the AST for the new nonterminal, based on the ASTs in the popped stack items.
§ When doing ACCEPT, the top of the stack should have the start symbol (program?) and its AST. Return this AST to the caller.

If an AST is stored as a pointer-based tree, and a stack item holds a pointer to the root node of the AST, then this process can be very efficient. In particular, AST nodes never need to be copied.

Shift-Reduce Parsing: Parsing-Table Generation [1/2]

Shift-Reduce parsers (and variations) dominate the field of automatically generated parsers.

A parsing table can be generated using a formalized, automatic process that is similar to the process we followed when writing a state-machine lexer. However, for a grammar describing a real-world PL, the resulting parsing tables are typically far too large. Thus, when Shift-Reduce parsers were first introduced [D. Knuth 1965], they were not considered practical. This changed as ways were found of producing much smaller parsing tables.

Shift-Reduce Parsing: Parsing-Table Generation [2/2]

One practical way of generating smaller parsing tables is Lookahead LR (LALR) [F. DeRemer 1969]. Despite the name, this does not necessarily do lookahead as we have described it; rather, during parsing-table generation, a kind of hypothetical lookahead is done, with the results used to collapse multiple states into a single state. However, this state-collapse idea does not work for all LR(k) grammars. The LR(k) grammars for which it gives correct results are called—you guessed it—LALR(k) grammars.

LALR parsers appear to be the most common of the automatically generated parsers. In particular, the Yacc (“Yet Another Compiler Compiler”) parser generator and its descendants (GNU Bison, etc.) use LALR.
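Whichever table-construction method is used (canonical LR(1), LALR, etc.), the driver that interprets the tables at parse time is short. Below is a minimal Lua sketch that combines the action-table entries and the AST-building process described above. The table layout (ACTION, GOTO_TAB, PROD), the end-of-input marker "$", the choice of 1 as the start state, and the node constant LEXEME_VAL are all assumptions made for this sketch, not the interface of any particular parser generator.

-- Minimal sketch of a Shift-Reduce driver that also builds ASTs.
-- Assumed setup: ACTION and GOTO_TAB are the generated tables; PROD[p]
-- describes production p with fields lhs, rhslen, and build (a function
-- mapping child ASTs to the new AST); LEXEME_VAL is a node-kind constant.
local function sr_parse(lexemes)
    -- Each stack item holds three things: symbol, state, AST.
    local stack = { { sym = nil, state = 1, ast = nil } }  -- assumed start state
    local i = 1                                  -- index of current input lexeme

    while true do
        local state = stack[#stack].state        -- current state = top item's state
        local look = lexemes[i] or "$"           -- "$" marks end of input
        local act = ACTION[state][look]

        if act == nil then                       -- blank cell: syntax error
            return false, nil
        elseif act.op == "S" then                -- Shift
            table.insert(stack, { sym = look, state = act.state,
                                  ast = { LEXEME_VAL, look } })
            i = i + 1                            -- advance input
        elseif act.op == "R" then                -- Reduce by production act.prod
            local prod = PROD[act.prod]
            local kids = {}
            for _ = 1, prod.rhslen do            -- pop the RHS; keep its ASTs
                table.insert(kids, 1, table.remove(stack).ast)
            end
            local below = stack[#stack].state    -- state before the push
            table.insert(stack, { sym = prod.lhs,
                                  state = GOTO_TAB[below][prod.lhs],
                                  ast = prod.build(kids) })
        else                                     -- act.op == "ACCEPT"
            return true, stack[#stack].ast       -- AST for the start symbol
        end
    end
end

Note how a Reduce pops one stack item per RHS symbol, then consults the goto table using the state that is now on top; this is exactly the R# entry described earlier, with the children's ASTs reused (not copied) to build the new node.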
Shift-Reduce Parsing: Lookahead [1/2]

Multiple-lexeme lookahead makes Predictive Recursive-Descent parsing a more powerful technique. Without lookahead, such a parser can handle LL(1) grammars and the associated LL(1) languages. With an additional lexeme of lookahead, we can handle LL(2) grammars and LL(2) languages. This allows us to use grammars and languages that we could not handle without lookahead.

But the situation with LR parsers is different.

Shift-Reduce Parsing: Lookahead [2/2]

We can modify a Shift-Reduce parser to do multiple-lexeme lookahead. The grammars that can be used by an LR parser (like Shift-Reduce) that makes each decision based on k upcoming lexemes are the LR(k) grammars. And the languages that can be generated by an LR(k) grammar are the LR(k) languages.

There are LR(2) grammars that are not LR(1) grammars. And there are LR(3) grammars that are not LR(2) grammars, etc. However, the class of LR(k) languages is exactly the same class for all values of k. For example, given an LR(2) grammar, we can always transform it into an LR(1) grammar that generates the same language. Adding lookahead thus allows an LR parser to handle additional grammars, but no additional languages.

Shift-Reduce Parsing: Automata [1/2]

A Deterministic Push-Down Automaton (DPDA) is a DFA with a stack added. Each transition has an associated push or pop. DPDAs are not capable of recognizing all CFLs. Those they can recognize are the Deterministic Context-Free Languages.

Note, for those interested: The machine that can recognize any CFL is the Nondeterministic Push-Down Automaton (NPDA). An automaton is nondeterministic if its operation is not fully specified; perhaps there are choices along the way. A nondeterministic automaton accepts its input if there is some sequence of choices that leads to acceptance in the ordinary deterministic sense.

Shift-Reduce Parsing: Automata [2/2]

A Shift-Reduce parser is almost a DPDA. The main difference is that a DPDA only pops one stack item at a time, while a Reduce operation may pop multiple items. But this difference is not sufficient to affect the languages that can be recognized. In fact, the following three categories of languages are identical.
§ The languages that can be generated by an LR(1) grammar, that is, the LR(k) languages (for any k).
§ The languages that can be parsed by a Shift-Reduce parser.
§ The languages that can be recognized by a DPDA, that is, the Deterministic Context-Free Languages.
(For a concrete illustration of a DPDA-style recognizer, see the Lua sketch below.)

Parsing Wrap-Up: Efficiency of Parsing [1/5]

We have discussed parsing algorithms that can handle some, but not all, CFLs.
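Aside, referring back to the automata slides above: the DPDA idea can be made concrete with a very small example. The following Lua sketch recognizes { a^n b^n : n >= 0 }, which is a Deterministic Context-Free Language but not a regular language. The function name and the counter-as-stack representation are choices made here purely for illustration.

-- Illustration of the DPDA idea (not part of the course code).
-- Recognizes { a^n b^n : n >= 0 }. Because the only stack symbol needed is
-- a marker for "a", the stack is represented here by a simple counter.
local function accepts_anbn(s)
    local depth = 0          -- number of markers on the "stack"
    local seen_b = false
    for i = 1, #s do
        local c = s:sub(i, i)
        if c == "a" then
            if seen_b then return false end   -- an "a" after a "b": reject
            depth = depth + 1                 -- push
        elseif c == "b" then
            seen_b = true
            if depth == 0 then return false end
            depth = depth - 1                 -- pop
        else
            return false                      -- unexpected character
        end
    end
    return depth == 0                         -- accept iff the stack is empty
end

-- accepts_anbn("aaabbb")  --> true
-- accepts_anbn("aab")     --> false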