
Lecture 6: Syntax Analysis (cont.)

Xiaoyuan Xie 谢晓园 [email protected] School of Computer Science, Room E301

2018/10/20

Where are we?

Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis → IR Generation → IR Optimization → Code Generation → Optimization → Machine Code

Derivations Revisited

• A derivation encodes two pieces of information:
  – What productions were applied to produce the resulting string from the start symbol?
  – In what order were they applied?
• Multiple derivations might use the same productions, but apply them in a different order.

Derivation exercise 1

Productions:
assign_stmt → id := expr ;
expr → expr op term
expr → term
term → id
term → real
term → integer
op → +
op → -

Let's derive (please use left-most derivation):
id := id + real - integer ;

id := id + real - integer ;

Left-most derivation (production used at each step shown on the right):
assign_stmt ⇒ id := expr ;                 assign_stmt → id := expr ;
⇒ id := expr op term ;                     expr → expr op term
⇒ id := expr op term op term ;             expr → expr op term
⇒ id := term op term op term ;             expr → term
⇒ id := id op term op term ;               term → id
⇒ id := id + term op term ;                op → +
⇒ id := id + real op term ;                term → real
⇒ id := id + real - term ;                 op → -
⇒ id := id + real - integer ;              term → integer

Parse Trees

• A parse tree is a tree encoding the steps in a derivation.
• Internal nodes represent nonterminal symbols used in the production.
• An inorder walk of the leaves contains the generated string.
• It encodes what productions are used, not the order in which those productions are applied.

Parse Tree

• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a derivation.

Ex. E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
(the parse tree is grown step by step alongside the derivation; tree figures omitted)

E → E op E | ( E ) | -E | id
op → + | - | * | /

E  E op E E  id op E E op E  id + E id + E op E  id + E op E id * id  id + id op E

 id + id * E  id + id * id

Parse Trees and Derivations

Consider the expression grammar: E → E+E | E*E | (E) | -E | id
Leftmost derivations of id + id * id:

E E E  E + E E + E  id + E E + E E + E id E id + E  id + E * E E + E id E * E

Parse Trees and Derivations (cont.)

⇒ id + id * E
⇒ id + id * id

(parse tree figures omitted)

Alternative Parse Tree & Derivation

E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id

(parse tree figure omitted)

WHAT’S THE ISSUE HERE ?

Two distinct leftmost derivations!

Challenges in Parsing

Ambiguity

• A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.

Derivation 1: E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Derivation 2: E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id

Two parse trees for id+id*id (figures omitted).

Is Ambiguity a Problem?

Depends on semantics.

(The two parse trees group the expression as (id + id) * id and id + (id * id), which in general mean different things; figures omitted.)

Resolving Ambiguity

• If a grammar can be made unambiguous at all, it is usually made unambiguous through layering:
  – Is there exactly one way to build each piece of the string?
  – Is there exactly one way of combining those pieces back together?

Resolving Ambiguity

• For most parsers, the grammar must be unambiguous.
• An unambiguous grammar ⇒ a unique parse tree for each sentence.
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.

Ambiguity – Operator Precedence

• Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules.

E  E+E | E*E | E^E | id | (E) E  E+T | T disambiguate the grammar T  T*F | F F  G^F | G precedence: ^ (right to left) G  id | (E) * (left to right) + (left to right)

Rewrite to eliminate the ambiguity Or, simply tell which parse tree should be selected
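As an illustrative sketch of how the layered grammar enforces precedence and associativity, here is a tiny recursive-descent evaluator for it (single-character, single-digit tokens; the left-recursive rules E → E+T and T → T*F are realized as loops, the right-recursive F → G^F as recursion; all function and helper names here are ours, not from the lecture):

```python
def tokenize(s):
    """Split an expression into single-character tokens, ignoring spaces."""
    return [c for c in s if not c.isspace()]

def parse(tokens):
    pos = 0  # current lookahead index

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected}, got {tok}")
        pos += 1
        return tok

    def E():                 # E -> E + T | T, as a loop: left-associative
        v = T()
        while peek() == '+':
            eat('+'); v = v + T()
        return v

    def T():                 # T -> T * F | F, as a loop: left-associative
        v = F()
        while peek() == '*':
            eat('*'); v = v * F()
        return v

    def F():                 # F -> G ^ F | G, via recursion: right-associative
        v = G()
        if peek() == '^':
            eat('^'); v = v ** F()
        return v

    def G():                 # G -> digit | ( E )
        if peek() == '(':
            eat('('); v = E(); eat(')'); return v
        return int(eat())

    v = E()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return v

print(parse(tokenize("2+3*4")))  # 14: * binds tighter than +
print(parse(tokenize("2^3^2")))  # 512: ^ groups right to left, 2^(3^2)
```

Because * is only reachable below +, and ^ below *, the one parse tree the grammar permits is exactly the tree the precedence table asks for.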

A Parser

Input: a context-free grammar G and a token stream s (from the lexer).
Output: Yes, if s ∈ L(G); No, otherwise; plus error messages.

• Syntax analyzers (parsers) = CFG acceptors that also output the corresponding derivation when the token stream is accepted

• Various kinds: LL(k), LR(k), SLR, LALR

Types

• Top-Down Parsing
  – Recursive descent parsing
  – Predictive parsing
  – LL(1)
• Bottom-Up Parsing
  – Shift-Reduce Parsing
  – LR parser

Homework

Page 206: Exercise 4.2.1 Page 207: Exercise 4.2.2 (d) (f) (g)

Top-Down Parsing

Two Key Points

expression ⇒ term ⇒ term*factor ⇒ term/factor*factor

– Q1: Which non-terminal should be replaced? Leftmost derivation.
– Q2: Which production should be used?

Top-Down Parsing: the parse tree is created top to bottom (from root to leaves). By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed to develop a parse tree in a left-to-right fashion that is consistent with scanning the input.

An illustration with PDA

P: (1) Z → aBeA  (2) A → Bc  (3) B → d  (4) B → bB  (5) B → ε
Input: a b e c

Reading head | Stack | Analysis
a            | Z     | a production for Z starting with a? – (1); push aBeA, match a
b            | BeA   | a production for B starting with b? – (4); push bB, match b
e            | BeA   | a production for B starting with e? – (5) B ⇒ ε; match e
c            | A     | a production for A starting with c? – (2)(5) A ⇒ Bc ⇒ c; match c

The input is accepted: Z ⇒ aBeA ⇒ abBeA ⇒ abeA ⇒ abeBc ⇒ abec.

Problem - Backtracking

• General category of Top-Down Parsing
• Choose a production rule based on the input symbol
• May require backtracking to correct a wrong choice.
• Example: S → cAd, A → ab | a
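The backtracking behaviour for this example can be sketched in a few lines of Python (a hypothetical mini-parser, not the lecture's code; parse_A yields every way A can match, so the caller can backtrack over the alternatives in order):

```python
def parse_A(s, i):
    """Try each alternative for A in order; yield every possible end position."""
    if s[i:i+2] == 'ab':   # first try A -> ab
        yield i + 2
    if s[i:i+1] == 'a':    # on backtracking, try A -> a
        yield i + 1

def parse_S(s):
    # S -> c A d
    if not s.startswith('c'):
        return False
    for j in parse_A(s, 1):          # backtrack over the choices for A
        if s[j:] == 'd':
            return True
    return False

print(parse_S('cad'))   # True: A -> ab fails, backtrack to A -> a
print(parse_S('cabd'))  # True: A -> ab succeeds immediately
print(parse_S('cd'))    # False: no alternative for A matches
```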

Input: cad. Expand S ⇒ cAd and match c. For A, first try A → ab: the tree predicts c a b, but matching b against the input d fails. Problem: backtrack.

Undo the expansion of A and try A → a instead: S ⇒ cAd ⇒ cad, which matches the input. (tree figures omitted)

Problem – Left Recursion

• A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α.

Left recursion + top-down parsing = infinite loop
E.g. Term → Term*Num

Term ⇒ Term*Num ⇒ Term*Num*Num ⇒ … (the leftmost nonterminal is always Term, so the expansion never terminates; tree figure omitted)

Elimination of Left Recursion

• Eliminating direct left recursion: A → Aα | β generates the strings βα* (in general βiαi* over the alternatives βi), so replace it with the right-recursive pair A → βA', A' → αA' | ε.

Elimination of Left Recursion

• A  A | elimination of left recursion PP PP|

1 • P P1| P2|…| P m | 1 | 2|… | n • elimination of left recursion

P1 P| 2 P|…| nP

P 1 P| 2 P|…| mP| 

Elimination of Left Recursion (e.g.)

• G[E]:
E → E+T | T
T → T*F | F
F → (E) | i

Elimination of left recursion:
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → (E) | i

Elimination of Left Recursion (e.g.)

P → PaPb | BaP
• We have α = aPb, β = BaP
• So P → βP', P' → αP' | ε. (Multiple occurrences of P on a right-hand side? Only the leftmost one makes the production left recursive.)
• After rewriting: P → BaPP', P' → aPbP' | ε

Elimination of Indirect Left Recursion

Direct: A ⇒ Aα in a single step. Indirect: A ⇒+ Aα through other nonterminals.
e.g. S → Aa | b, A → Sd | ε, and then S ⇒ Aa ⇒ Sda

Elimination of Left Recursion (e.g.)

S  A b 1:S A  S a | b 2:A

A  Aba | b

A  bA’ A’  baA’ |  Elimination of Left recursion (eg.)

Order 1: S, 2: A.
S → Aa | b
A → Ac | Sd | ε

Substitute S into A:
A → Ac | Aad | bd | ε

Eliminate the direct left recursion:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

1:S S  Qc | c 2:Q Q  Rb | b 3:R R  S a | a S  Qc | c Q  Rb | b R  Sa | a  (Qc|c)a | a Qca | ca |a (Rb|b)ca | ca | a

S  Qc | c Q  Rb | b R  (bca | ca | a)R’ R’  bcaR’ |  Elimination of Left recursion (eg.)

1:R S  Qc | c 2:Q Q  Rb | b 3:S R  S a | a R  Sa | a Q  Rb | b  Sab | ab | b S  Qc | c  Sabc | abc | bc | c

S  (abc | bc | c)S’ S’  abcS’ |  Problem - Left Factoring

• A  1 | 2

A A A 1 | 2

• A 1 | 2|… | n | A A|

A 1 | 2|… | n

Problem - Left Factoring

• E.g – S iEtS | iEtSeS | a Eb – For, S, the longest pre-fix is iEtS, Thus, S iEtSS’ | a S’  eS|  Eb Problem - Left Factoring

• E.g. G:
  (1) S → aSb
  (2) S → aS
  (3) S → ε

For (1), (2), extract the left factor:
  S → aS(b|ε)
  S → ε

We have G′:
  S → aSA | ε
  A → b | ε

Homework

Page 216: Exercise 4.3.1

Two Parsing Methods

A Naïve Method – Recursive-Descent Parsing

• Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
• It is a general parsing technique, but not widely used.
• Not efficient.

Recursive-Descent Parsing

A typical procedure for a nonterminal in a top-down parse

Recursive-Descent Parsing

• Example

P:                Predict
(1) Z → aBd       {a}
(2) B → d         {d}
(3) B → c         {c}
(4) B → bB        {b}

void Z() {
    if (token == a) {
        match(a);
        B();
        match(d);
    } else error();
}

void B() {
    switch (token) {
        case d: match(d); break;
        case c: match(c); break;
        case b: match(b); B(); break;
        default: error();
    }
}

void main() { read(); Z(); }

Example input: a b c d (parse tree figure omitted)

A Non-Recursive Method – Predictive Parsing

• No backtracking, efficient
• Needs a special form of grammars (LL(1) grammars).
• A non-recursive (table-driven) predictive parser is also known as an LL(1) parser.

A Non-Recursive Method

– Predict(A → α)
– First(α)
– Follow(A)

FIRST Set

FIRST Example

 First()

First(E’T’E) =? First(T’E’) = ?

E {i, n , ( } P: E’ { + ,  } (1) E  TE’ S = {E’, T’} T { i, n , ( } (2) E’  + TE’ (3) E’  T’ { *,  } (4) T  FT’ First(E’T’E) = {+,*,i,n,(} F { i, n , ( } (5) T’  * F T’ First(T’E’) = {+,*,} (6) T’  (7) F  (E) (8) F  i (9) F  n Motivation Behind FIRST

• FIRST is used to help find the appropriate production to follow, given the top-of-the-stack non-terminal and the current input symbol.
• If A → α, and a is in FIRST(α), then when a = input, replace A with α. (a is one of the first symbols of α, so when A is on the stack and a is the input, POP A and PUSH α.)

Example: A  aB | bC B  b |dD C  c D  d

FOLLOW Set

FOLLOW(A): the set of terminals a that can appear immediately to the right of A in some sentential form, i.e. S ⇒* αAaβ; if A can appear rightmost, the end marker # is also in FOLLOW(A).

FOLLOW Set (cont.)

FOLLOW Set Example

P:
(1) E → TE'
(2) E' → +TE'
(3) E' → ε
(4) T → FT'
(5) T' → *FT'
(6) T' → ε
(7) F → (E)
(8) F → i
(9) F → n

      First(X)       Follow(X)
E     {i, n, (}      {#, )}
E'    {+, ε}         {#, )}
T     {i, n, (}      {+, ), #}
T'    {*, ε}         {+, ), #}
F     {i, n, (}      {*, +, ), #}

Motivation Behind FOLLOW

• FOLLOW is used when FIRST has a conflict, to resolve choices, or when FIRST gives no suggestion. When α ⇒ ε or α ⇒* ε, what follows A dictates the next choice to be made.
• If A → α, and b is in FOLLOW(A), then when α ⇒* ε and b is the input character, we expand A with α, which will eventually derive ε, which b follows! (α ⇒* ε, i.e., FIRST(α) contains ε.)
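The FOLLOW rules can be sketched the same way (a Python sketch; the FIRST sets are hard-coded from the previous slide, '#' is the end marker, and the iteration runs to a fixed point):

```python
EPS, END = 'eps', '#'
grammar = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], [EPS]],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], [EPS]],
    'F':  [['(', 'E', ')'], ['i'], ['n']],
}
FIRST = {'E': {'i', 'n', '('}, "E'": {'+', EPS}, 'T': {'i', 'n', '('},
         "T'": {'*', EPS}, 'F': {'i', 'n', '('}}

def first_of_string(symbols):
    out = set()
    for s in symbols:
        if s not in grammar:            # terminal
            return out | {s}
        out |= FIRST[s] - {EPS}
        if EPS not in FIRST[s]:
            return out
    return out | {EPS}

def compute_follow():
    follow = {nt: set() for nt in grammar}
    follow['E'].add(END)                # start symbol is followed by '#'
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                for i, s in enumerate(rhs):
                    if s not in grammar:
                        continue
                    tail = first_of_string(rhs[i + 1:])
                    new = tail - {EPS}
                    if EPS in tail:     # everything after s can vanish
                        new |= follow[nt]
                    if not new <= follow[s]:
                        follow[s] |= new
                        changed = True
    return follow

follow = compute_follow()
print(sorted(follow['E']))     # ['#', ')']
print(sorted(follow["T'"]))    # ['#', ')', '+']
```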

Motivation Behind FOLLOW

S ⇒* αAaβ

a is in Follow(A); c is in First(A)

Predict Set

• Predict(A ) • Predict(A ) = First(), if  First(); • Predict(A ) = First()- {}  Follow(A), if  First(); Predict Set Example

P:              First
(1) E → TE'     First(TE') = {i, n, (}
(2) E' → +TE'   First(+TE') = {+}
(3) E' → ε      Follow(E') = {#, )}
(4) T → FT'     First(FT') = {i, n, (}
(5) T' → *FT'   First(*FT') = {*}
(6) T' → ε      Follow(T') = {), +, #}
(7) F → (E)     First((E)) = {(}
(8) F → i       First(i) = {i}
(9) F → n       First(n) = {n}

First sets: E {i, n, (}; E' {+, ε}; T {i, n, (}; T' {*, ε}; F {i, n, (}
Follow sets: E {#, )}; E' {#, )}; T {+, ), #}; T' {+, ), #}; F {*, +, ), #}

Now we consider LL(1).

Simple Predictive Parser: LL(1)

• Top-down, predictive parsing:
  – L: Left-to-right scan of the tokens
  – L: Leftmost derivation
  – (1): One token of lookahead
• Construct a leftmost derivation for the sequence of tokens.
• When expanding a nonterminal, we predict the production to use by looking at the next token of the input. The decision is forced.

LL(1) Grammars

• A grammar G is LL(1) if and only if the following conditions hold for any two distinct production rules A → α and A → β:
  – α and β cannot both derive strings starting with the same terminal:
    for A → α1 | α2 | … | αn, FIRST(αi) ∩ FIRST(αj) = ∅ (1 ≤ i ≠ j ≤ n)
  – At most one of α and β can derive ε.
  – If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A): if ε ∈ FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅

Predictive parsers can now be constructed for LL(1) grammars, since the proper production to apply for a nonterminal can be selected by looking only at the current input symbol.
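Equivalently, a grammar is LL(1) exactly when, for every nonterminal, the Predict sets of its alternatives are pairwise disjoint. A small Python check (Predict sets hard-coded from the earlier slides; helper names are ours):

```python
from itertools import combinations

# Predict sets of the expression grammar, one set per alternative.
PREDICT = {
    'E':  [{'i', 'n', '('}],            # E  -> TE'
    "E'": [{'+'}, {'#', ')'}],          # E' -> +TE' | eps
    'T':  [{'i', 'n', '('}],            # T  -> FT'
    "T'": [{'*'}, {'+', ')', '#'}],     # T' -> *FT' | eps
    'F':  [{'('}, {'i'}, {'n'}],        # F  -> (E) | i | n
}

def is_ll1(predict_sets):
    """LL(1) iff no two alternatives of a nonterminal predict the same token."""
    for nt, alts in predict_sets.items():
        for a, b in combinations(alts, 2):
            if a & b:                   # overlapping prediction: conflict
                return False
    return True

print(is_ll1(PREDICT))                  # True: the grammar is LL(1)

# The earlier example A -> ab | a: both alternatives predict 'a'.
AMBIG = {'A': [{'a'}, {'a'}]}
print(is_ll1(AMBIG))                    # False: needs left factoring
```

An overlap is exactly a multiply-defined entry in the parsing table built next.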

Predictive Parser

a grammar → (eliminate left recursion) → (left factor) → a grammar suitable for predictive parsing (an LL(1) grammar); no 100% guarantee.

When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

A → α1 | ... | αn    input: ... a ...
                             ↑ current token

Revisit LL(1) Grammar

A grammar is LL(1) == its parsing table has no multiply-defined entries.

Properties of LL(1) grammars:
• The grammar can't be ambiguous or left recursive.
• The grammar is LL(1) ⇒ for A → α | β:
  1. α and β do not derive strings starting with the same terminal a.
  2. Either α or β can derive ε, but not both.

Note: it may not be possible for a grammar to be manipulated into an LL(1) grammar.

A Grammar which is not LL(1)

• A left recursive grammar cannot be an LL(1) grammar.
  – A → Aα | β
    • Any terminal that appears in FIRST(β) also appears in FIRST(Aα) because Aα ⇒ βα.
    • If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A).
• A grammar that is not left factored cannot be an LL(1) grammar.
  – A → αβ1 | αβ2
    • Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).
• An ambiguous grammar cannot be an LL(1) grammar.

Examples

• Example: S  c A d A  aa | a

Left Factoring: S  c A d A  aB B  a | 

• Example: S Sa | 

Eliminate left recursion: S B B  aB | 

A Grammar which is not LL(1) (cont.)

• What do we do if the resulting parsing table contains multiply-defined entries?
  – If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
  – If the grammar is not left factored, left factor the grammar.
  – If the new grammar's parsing table still contains multiply-defined entries, the grammar is ambiguous or inherently not LL(1).

LL(1): a Predictive Parser

Components: a symbol stack, the input token stream, and the LL(1) analysis table.

The LL(1) driver handles the cases:
• top symbol X = a ∈ VT: match the terminal
• top symbol X ∈ VN: consult the analysis table
• empty stack

• The symbol stack is used to store the intermediate results of the analysis.
• When the end of the input stream is reached and the stack is empty, the string is accepted.
• LL(1) Analysis Table: T(A, a) indicates which production should be used for derivation.

LL(1) Analysis Table

For an LL(1) grammar G = (VN, VT, S, P) with VT = {a1, …, an} and VN = {A1, …, Am}, the analysis table has one row per nonterminal A1 … Am and one column per terminal a1 … an plus #:

LL(Ai, aj) = Ai → α, if aj ∈ Predict(Ai → α)
LL(Ai, aj) = error(), if aj does not belong to any Predict(Ai → α)

LL(1) Analysis Table

• Example 1

P:                Predict
(1) Z → aBd       {a}
(2) B → d         {d}
(3) B → c         {c}
(4) B → bB        {b}

     a    b    c    d    #
Z   (1)
B        (4)  (3)  (2)

LL(1) Analysis Table

• Example 2:

(1) E  TE’ { i, n, ( } +*( ) i n # (2) E’  + TE’ {+} E (1) (1) (1) (3) E’   {#, )} (4) T  FT’ {i,n,(} E’ (2)(3) (3) (5) T’  * F T’ {*} T (4) (4) (4) (6) T’   {),+, # } T’ (6) (5) (6) (6) (7) F  (E) { ( } F (8) F  i {i} (7) (8) (9) (9) F  n {n} LL(1) Driver METHOD: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. A complete example

• LL1-example.pdf

Homework

Page 231: Exercise 4.4.1 (b) (d) Exercise 4.4.3

Homework

Given a grammar G(T) whose productions are:
T → a[L] | a
L → LL | T
where 'a', '[', ']' are terminals, T and L are non-terminals, and T is the start symbol.

(1) Please write down a left-most derivation for the sentence "a[aa]".
(2) Try to eliminate the left recursion and left factor (let's denote the new grammar after this elimination as G').
(3) For G', compute the First and Follow sets of all non-terminal symbols.
(4) Construct the LL(1) parsing table, and tell whether the new grammar G' is LL(1) or not.
(5) Write down the process for analyzing "a[a]" with your LL(1) table.
