Language Processing Systems

Prof. Mohamed Hamada Syntax Analysis () Software Engineering Lab. The University of Aizu Japan

1. Uses Regular Expressions to define tokens Parsing 2. Uses Finite Automata to recognize tokens

next char next token Bottom Up Parsing lexical Syntax Top Down Parsing get next analyzer analyzer char get next token Source Shift-reduce Parsing Program symbol Predictive Parsing table LL(k) Parsing LR(k) Parsing (Contains a record for each identifier) Left Recursion Uses Top-down parsing or Bottom-up parsing Left Factoring To construct a

Parsing LR Parser Example

Top Down Parsing Bottom Up Parsing Input

S t Shift-reduce Parsing LR Parsing Predictive Parsing a Output Program LL(k) Parsing LR(k) Parsing c k Left Recursion action goto Left Factoring Shift reduce parser Shift reduce parser

1. Construct the action-goto table from the given grammar 1. Construct the action-goto table from the given grammar

This is what make difference between different typs 2. Apply the shift-reduce parsing algorithm to construct the parse tree of shift reduce parsing such as SLR, CLR, LALR

In this course due to short of time we will not study how to construct the action-goto table

Shift reduce parser 2. Apply the shift-reduce parsing algorithm to construct the parse tree LR Parser Example The following algorithm shows how we can construct the move parsing table for an input string w$ with respect to a given grammar G.

set ip to point to the first symbol of the input string w$ Can be parsed with this action repeat forever begin The following grammar: and goto table if action[top(stack), current-input(ip)] = shift(s) then begin push current-input(ip) then s on top of the stack (1) E → E + T (2) E T State action goto advance ip to the next input symbol → id + * ( ) $ E T F end (3) T → T * F 0 s5 s4 1 2 3 else if action[top(stack), current-input(ip)] = reduce A à β then (4) T → F 1 s6 acc begin (5) F → ( E ) 2 r2 s7 r2 r2 pop 2*|β| symbols off the stack; (6) F → id 3 r4 r4 r4 r4 push A then goto[top(stack), A] on top of the stack; 4 s5 s4 8 2 3 5 r6 r6 r6 r6 output the production A à β s represents shift 6 s5 s4 9 3 end 7 s5 s4 10 else if action[top(stack), current-input(ip)] = accept then r represents reduce 8 s6 s11 9 r1 s7 r1 r1 return acc represents accept 10 r3 r3 r3 r3 else error() 11 r5 r5 r5 r5 empty represents error end

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E →LR T Parser Example (2) E →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id + id * id $ INPUT: id * id + id $ (6) F → id (6) F → id

LR Parsing LR Parsing STACK: E0 STACK: E5 Program Program id 0

State action goto State action goto id + * ( ) $ E T F id + * ( ) $ E T F F 0 s5 s4 1 2 3 0 s5 s4 1 2 3 1 s6 acc 1 s6 acc id 2 r2 s7 r2 r2 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E →LR T Parser Example (2) E →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id

LR Parsing LR Parsing STACK: 0 STACK: E3 Program Program F 0 T State action goto State action goto id + * ( ) $ E T F F id + * ( ) $ E T F F 0 s5 s4 1 2 3 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E →LR T Parser Example (2) E →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id

LR Parsing LR Parsing STACK: 0 STACK: E2 Program Program T T 0 T State action goto State action goto id + * ( ) $ E T F F id + * ( ) $ E T F F 0 s5 s4 1 2 3 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E’ →LR T Parser Example (2) E’ →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id

LR Parsing LR Parsing STACK: E7 STACK: E5 Program Program * id 2 T 7 T F * T State action goto State action goto 0 id + * ( ) $ E T F F 2 id + * ( ) $ E T F F id 0 s5 s4 1 2 3 T 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 0 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E’ →LR T Parser Example (2) E’ →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id

LR Parsing LR Parsing STACK: E7 STACK: 10E Program Program T * F 2 T F 7 T * F * T State action goto State action goto 0 id + * ( ) $ E T F F id 2 id + * ( ) $ E T F F id 0 s5 s4 1 2 3 T 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 0 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E →LR T Parser Example (2) E →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id E LR Parsing LR Parsing STACK: 0 STACK: 2 Program T Program T T T * F 0 T * F State action goto State action goto id + * ( ) $ E T F F id id + * ( ) $ E T F F id 0 s5 s4 1 2 3 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E →LR T Parser Example (2) E’ →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id E E LR Parsing LR Parsing STACK: 0 STACK: 1 Program T Program T E T * F 0 T * F State action goto State action goto id + * ( ) $ E T F F id id + * ( ) $ E T F F id 0 s5 s4 1 2 3 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E’ →LR T Parser Example (2) E’ →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id E E LR Parsing LR Parsing STACK: 6 STACK: 5 Program T Program T F + id 1 T * F 6 T * F id E State action goto + State action goto 0 id + * ( ) $ E T F F id 1 id + * ( ) $ E T F F id 0 s5 s4 1 2 3 E 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 0 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E’ →LR T Parser Example (2) E’ →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id E E T LR Parsing LR Parsing STACK: 6 STACK: 3 Program T F Program T F + F 1 T * F id 6 T * F id E State action goto + State action goto 0 id + * ( ) $ E T F F id 1 id + * ( ) $ E T F F id 0 s5 s4 1 2 3 E 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 0 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E’ →LR T Parser Example (2) E’ →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) (5) F → ( E ) E INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id E T E + T LR Parsing LR Parsing STACK: 6 STACK: 9 Program T F Program T F + T 1 T * F id 6 T * F id E State action goto + State action goto 0 id + * ( ) $ E T F F id 1 id + * ( ) $ E T F F id 0 s5 s4 1 2 3 E 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 0 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

GRAMMAR: GRAMMAR: (1) E → E + T (1) E → E + T (2) E →LR T Parser Example (2) E’ →LR T Parser Example (3) T → T * F (3) T → T * F OUTPUT: OUTPUT: (4) T → F (4) T → F (5) F → ( E ) E (5) F → ( E ) E INPUT: id * id + id $ INPUT: id * id + id $ (6) F → id (6) F → id E + T E + T LR Parsing LR Parsing STACK: 0 STACK: 1 Program T F Program T F E T * F id 0 T * F id State action goto State action goto id + * ( ) $ E T F F id id + * ( ) $ E T F F id 0 s5 s4 1 2 3 0 s5 s4 1 2 3 1 s6 acc id 1 s6 acc id 2 r2 s7 r2 r2 2 r2 s7 r2 r2 3 r4 r4 r4 r4 3 r4 r4 r4 r4 4 s5 s4 8 2 3 4 s5 s4 8 2 3 5 r6 r6 r6 r6 5 r6 r6 r6 r6 6 s5 s4 9 3 6 s5 s4 9 3 7 s5 s4 10 7 s5 s4 10 8 s6 s11 8 s6 s11 9 r1 s7 r1 r1 9 r1 s7 r1 r1 10 r3 r3 r3 r3 10 r3 r3 r3 r3 11 r5 r5 r5 r5 11 r5 r5 r5 r5

Constructing Parsing Grammar Hierarchy Tables Non-ambiguous CFG All LR parsers use the same parsing program that we demonstrated in the previous slides. CLR(1) What differentiates the LR parsers are the action LALR(1) and the goto tables: LL(1) Simple LR (SLR): succeeds for the fewest grammars, but is the easiest to implement. SLR(1) Canonical LR: succeeds for the most grammars, but is the hardest to implement. It splits states when necessary to prevent reductions that would get the parser stuck.

Lookahead LR (LALR): succeeds for most common syntactic constructions used in programming languages, but produces LR tables much smaller than canonical LR.

Parsing How parser works? 1. Uses Regular Expressions to define tokens 2. Uses Finite Automata to recognize tokens

Bottom Up Parsing next char next token Top Down Parsing lexical Syntax get next analyzer analyzer char get next token Source Shift-reduce Parsing Predictive Parsing Program symbol LL(k) Parsing LR(k) Parsing table (Contains a record for each identifier) Left Recursion Left Factoring Uses Top-down parsing or Bottom-up parsing To construct a Parse tree How to write parser? How to write a parser?