Recursive-Descent

22 March 2019 OSU CSE 1 BL Structure

Code Tokenizer Parser Generator

string of string of abstract string of characters tokens program integers (source code) (“words”) (object code)

Note that the parser starts with a string of tokens.

22 March 2019 OSU CSE 2 Plan for the BL Parser

• Design a context-free grammar (CFG) to specify syntactically valid BL programs • Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

22 March 2019 OSU CSE 3 Parsing

• A CFG can be used to generate strings in its language – “Given the CFG, construct a string that is in the language” • A CFG can also be used to recognize strings in its language – “Given a string, decide whether it is in the language” – And, if it is, construct a derivation tree (or AST)

22 March 2019 OSU CSE 4 Parsing Parsing generally refers to this last • A CFG can step,be used i.e., going to generate from a stringstrings (in the in its languagelanguage) to its derivation tree or— – “Given the CFG,for a constructprogramming a string language that is— in the language”perhaps to an AST for the program. • A CFG can also be used to recognize strings in its language – “Given a string, decide whether it is in the language” – And, if it is, construct a derivation tree (or AST)

22 March 2019 OSU CSE 5 A Recursive-Descent Parser

• One parse method per non-terminal symbol • A non-terminal symbol on the right-hand side of a rewrite rule leads to a call to the parse method for that non-terminal • A terminal symbol on the right-hand side of a rewrite rule leads to “consuming” that token from the input token string • | in the CFG leads to “if-else” in the parser

22 March 2019 OSU CSE 6 Example: Arithmetic Expressions expr → expr add-op term | term term → term mult-op factor | factor factor → ( expr ) | digit-seq add-op → + | - mult-op → * | DIV | REM digit-seq → digit digit-seq | digit digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

22 March 2019 OSU CSE 7 A Problem expr → expr add-op term | term term → term mult-op factor | factor factor → ( expr ) | digit-seq add-op → + | - Do you see a mult-op → * | DIV | REM problem with a digit-seq → digit digit-seq | digitrecursive descent digit → | | | | | | parser| | for| this 0 1 2 3 4 5 6CFG?7 8 (Hint!9 )

22 March 2019 OSU CSE 8 A Solution expr → term { add-op term } term → factor { mult-op factor } factor → ( expr ) | digit-seq add-op → + | - mult-op → * | DIV | REM digit-seq → digit digit-seq | digit digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

22 March 2019 OSU CSE 9 A Solution expr → term { add-op term } term → factor { mult-op factor } factor → ( expr ) | digit-seq add-op → + | - The special CFG symbols { and } mult-op → | | mean* DIV that theREM enclosed sequence of digit-seq →symbolsdigit digit occurs-seq | zerodigit or more times; digit →this0 | 1helps| 2 | 3change| 4 | 5 a| 6left| 7-recursive| 8 | 9 CFG into an equivalent CFG that can be parsed by recursive descent.

22 March 2019 OSU CSE 10 The specialA Solution CFG symbols { and } also simplify a non-terminal for a number expr → termthat { addhas- opno termleading } zeroes. term → factor { mult-op factor } factor → ( expr ) | number add-op → + | - mult-op → * | DIV | REM number → 0 | nz-digit { 0 | nz-digit } nz-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

22 March 2019 OSU CSE 11 A Recursive-Descent Parser

• One parse method per non-terminal symbol • A non-terminal symbol on the right-hand side of a rewrite rule leads to a call to the parse method for that non-terminal • A terminal symbol on the right-hand side of a rewrite rule leads to “consuming” that token from the input token string • | in the CFG leads to “if-else” in the parser • {...} in the CFG leads to “while” in the parser

22 March 2019 OSU CSE 12 More Improvements If we treat every number as a token, expr → termthen { add things-op term get simpler } for the term → factorparser: { mult now-op there factor are } only 5 non- factor → ( exprterminals) | number to worry about. add-op → + | - mult-op → * | DIV | REM number → 0 | nz-digit { 0 | nz-digit } nz-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

22 March 2019 OSU CSE 13 More Improvements If we treat every add-op and mult-op expr → termas a { token,add-op then term it’s } even simpler: term → factorthere { mult are -onlyop factor 3 non }-terminals. factor → ( expr ) | number add-op → + | - mult-op → * | DIV | REM number → 0 | nz-digit { 0 | nz-digit } nz-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

22 March 2019 OSU CSE 14 Can you writeImprovements the tokenizer for this language, so every number, add-op, and mult-op is a token? expr → term { add-op term } term → factor { mult-op factor } factor → ( expr ) | number add-op → + | - mult-op → * | DIV | REM number → 0 | nz-digit { 0 | nz-digit } nz-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

22 March 2019 OSU CSE 15 Evaluating Arithmetic Expressions

• For this problem, parsing an arithmetic expression means evaluating it • The parser goes from a string of tokens in the language of the CFG on the previous slide, to the value of that expression as an int

22 March 2019 OSU CSE 16 Structure of Solution

"4 + 29 DIV 3" <"4", "+", "29", "DIV", "3", "EOI"> 13 Tokenizer Parser

string of string of value of characters tokens arithmetic (arithmetic expression expression)

22 March 2019 OSU CSE 17 StructureWe will use a ofQueue Solution to hold a mathematical value like this. "4 + 29 DIV 3" <"4", "+", "29", "DIV", "3", "EOI"> 13 Tokenizer Parser

string of string of value of characters tokens arithmetic (arithmetic expression expression)

22 March 2019 OSU CSE 18 StructureWe will also assume of Solution the tokenizer adds an “end-of-input” token at the end. "4 + 29 DIV 3" <"4", "+", "29", "DIV", "3", "EOI"> 13 Tokenizer Parser

string of string of value of characters tokens arithmetic (arithmetic expression expression)

22 March 2019 OSU CSE 19 Parsing an expr

• We want to parse an expr, which must start with a term and must be followed by zero or more (pairs of) add-ops and terms: expr → term { add-op term } • An expr has an int value, which is what we want returned by the method to parse an expr

22 March 2019 OSU CSE 20 Contract for Parser for expr

/** * Evaluates an expression and returns its value. * ... * @updates ts * @requires * [an expr string is a proper prefix of ts] * @ensures * valueOfExpr = [value of longest expr string at * start of #ts] and * #ts = [longest expr string at start of #ts] * ts */ private static int valueOfExpr(Queue ts) {...}

22 March 2019 OSU CSE 21 Parsing a term

• We want to parse a term, which must start with a factor and must be followed by zero or more (pairs of) mult-ops and factors: term → factor { mult-op factor } • A term has an int value, which is what we want returned by the method to parse a term

22 March 2019 OSU CSE 22 Contract for Parser for term

/** * Evaluates a term and returns its value. * ... * @updates ts * @requires * [a term string is a proper prefix of ts] * @ensures * valueOfTerm = [value of longest term string at * start of #ts] and * #ts = [longest term string at start of #ts] * ts */ private static int valueOfTerm(Queue ts) {...}

22 March 2019 OSU CSE 23 Parsing a factor

• We want to parse a factor, which must start with the token "(" followed by an expr followed by the token ")"; or it must be a number token: factor → ( expr ) | number • A factor has an int value, which is what we want returned by the method to parse a factor

22 March 2019 OSU CSE 24 Contract for Parser for factor

/** * Evaluates a factor and returns its value. * ... * @updates ts * @requires * [a factor string is a proper prefix of ts] * @ensures * valueOfFactor = [value of longest factor string at * start of #ts] and * #ts = [longest factor string at start of #ts] * ts */ private static int valueOfFactor(Queue ts){ ... }

22 March 2019 OSU CSE 25 Code for Parser for expr private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts); while (ts.front().equals("+") || ts.front().equals("-")) { String op = ts.dequeue(); int nextTerm = valueOfTerm(ts); if (op.equals("+")) { value = value + nextTerm; } else /* "-" */ { value = value - nextTerm; } } return value; } 22 March 2019 OSU CSE 26 exprCode for→ Parserterm { add for-op termexpr } add-op → + | - private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts); while (ts.front().equals("+") || ts.front().equals("-")) { String op = ts.dequeue(); int nextTerm = valueOfTerm(ts); if (op.equals("+")) { value = value + nextTerm; } else /* "-" */ { value = value - nextTerm; } } return value; } 22 March 2019 OSU CSE 27 Code for Parser for expr

private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts); while (ts.front().equals("+") || This method is ts.front().equals("-")) { very similar to String op = ts.dequeue(); int nextTerm = valueOfTerm(ts); valueOfExpr. if (op.equals("+")) { value = value + nextTerm; } else /* "-" */ { value = value - nextTerm; } } return value; } 22 March 2019 OSU CSE 28 Code for Parser for expr private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts); Look ahead while (ts.front().equals("+") || ts.front().equals("-")) { one token in String op = ts.dequeue(); ts to see int nextTerm = valueOfTerm(ts); what’s next. if (op.equals("+")) { value = value + nextTerm; } else /* "-" */ { value = value - nextTerm; } } return value; } 22 March 2019 OSU CSE 29 Code for Parser for expr private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts); while (ts.front().equals("+") || “Consume” ts.front().equals("-")) { the next token String op = ts.dequeue(); int nextTerm = valueOfTerm(ts); from ts. if (op.equals("+")) { value = value + nextTerm; } else /* "-" */ { value = value - nextTerm; } } return value; } 22 March 2019 OSU CSE 30 Code for Parser for expr private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts); while (ts.front().equals("+") || Evaluate next ts.front().equals("-")) { term. String op = ts.dequeue(); int nextTerm = valueOfTerm(ts); if (op.equals("+")) { value = value + nextTerm; } else /* "-" */ { value = value - nextTerm; } } return value; } 22 March 2019 OSU CSE 31 Code for Parser for expr private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts); Evaluate while (ts.front().equals("+") || ts.front().equals("-")) { (some of) the String op = ts.dequeue(); expression. int nextTerm = valueOfTerm(ts); if (op.equals("+")) { value = value + nextTerm; } else /* "-" */ { value = value - nextTerm; } } return value; } 22 March 2019 OSU CSE 32 Code for Parser for term private static int valueOfTerm(Queue ts) {

Can you write the body, using valueOfExpr as a guide?

}

22 March 2019 OSU CSE 33 Code for Parser for factor private static int valueOfFactor( Queue ts) { int value; if (ts.front().equals("(")) { ts.dequeue(); value = valueOfExpr(ts); ts.dequeue(); } else { String number = ts.dequeue(); value = Integer.parseInt(number); } return value; }

22 March 2019 OSU CSE 34 factorCode for→ Parser( expr ) for| number factor private static int valueOfFactor( Queue ts) { int value; if (ts.front().equals("(")) { ts.dequeue(); value = valueOfExpr(ts); ts.dequeue(); } else { String number = ts.dequeue(); value = Integer.parseInt(number); } return value; }

22 March 2019 OSU CSE 35 Code for Parser for factor private static int valueOfFactor( Queue ts) { int value; Look ahead if (ts.front().equals("(")) { one token in ts.dequeue(); to see value = valueOfExpr(ts); ts what’s next. ts.dequeue(); } else { String number = ts.dequeue(); value = Integer.parseInt(number); } return value; }

22 March 2019 OSU CSE 36 Code for Parser for factor private static int valueOfFactor( Queue ts) { int value; if (ts.front().equals("(")) { ts.dequeue(); What token value = valueOfExpr(ts); does this ts.dequeue(); throw away? } else { String number = ts.dequeue(); value = Integer.parseInt(number); } return value; }

22 March 2019 OSU CSE 37 Though method is called parseInt, it Codeis not for one Parser of our parser for methods; factor it is a private staticstatic int methodvalueOfFactor from the( Java library’s Queue ts) {class (with utilities). int value; Integer int if (ts.front().equals("(")) { ts.dequeue(); value = valueOfExpr(ts); ts.dequeue(); } else { String number = ts.dequeue(); value = Integer.parseInt(number); } return value; }

22 March 2019 OSU CSE 38 Code for Parser for factor private static int valueOfFactor( Queue ts) { int value; if (ts.front().equals("(")) { ts.dequeue(); value = valueOfExpr(ts); ts.dequeue(); } else { String numberRecursive = ts.dequeue descent();: notice that value = valueOfExprInteger.parseIntcalls(number); valueOfTerm, } which calls valueOfFactor, return value;which here may call . } valueOfExpr

22 March 2019 OSU CSE 39 Code for Parser for factor private static int valueOfFactor( Queue ts) { int value; How do you if (ts.front().equals("(")) { know this ts.dequeue(); (indirect) value = valueOfExpr(ts); recursion ts.dequeue(); terminates? } else { String number = ts.dequeue(); value = Integer.parseInt(number); } return value; }

22 March 2019 OSU CSE 40 A Recursive-Descent Parser

• One parse method per non-terminal symbol • A non-terminal symbol on the right-hand side of a rewrite rule leads to a call to the parse method for that non-terminal • A terminal symbol on the right-hand side of a rewrite rule leads to “consuming” that token from the input token string • | in the CFG leads to “if-else” in the parser • {...} in the CFG leads to “while” in the parser

22 March 2019 OSU CSE 41 Observations

• This is so formulaic that tools are available that can generate RDPs from CFGs • In the lab, you will write an RDP for a language similar to the one illustrated here – The CFG will be a bit different – There will be no tokenizer, so you will parse a string of characters in a Java StringBuilder • See methods charAt and deleteCharAt

22 March 2019 OSU CSE 42 Resources • Wikipedia: Recursive Descent Parser – http://en.wikipedia.org/wiki/Recursive_descent_parser • Java Libraries API: StringBuilder – http://docs.oracle.com/javase/8/docs/api/

22 March 2019 OSU CSE 43