Compiler Construction Lecture 4: Lexical Analysis in the Real World 2020-01-17 Michael Engel

Includes material by Jan Christian Meyer

Overview
• NFA to DFA conversion
  • Subset construction algorithm
• DFA state minimization:
  • Hopcroft's algorithm
  • Myhill-Nerode method
• Using a scanner generator
  • lex syntax and usage
  • lex examples

What have we achieved so far?
• We know a method to convert a regular expression, (all | and), into a nondeterministic finite automaton (NFA) using the McNaughton, Thompson and Yamada algorithm
[Figure: NFA for (all | and)]

Overhead of constructed NFAs
Let's look at another example: a(b|c)*
• Construct the simple NFAs for a, b and c:
    s0 --a--> s1      s2 --b--> s3      s4 --c--> s5
• Construct the NFA for b|c
[Figure: NFA for b|c; new states s6 and s7 are connected to the NFAs for b and c by four ε-transitions]
• Now construct the NFA for (b|c)*
[Figure: NFA for (b|c)*; new states s8 and s9 and four more ε-transitions are added around the NFA for b|c]
• Looks pretty complex already? We're not even finished…
• Finally, construct the NFA for a(b|c)*
[Figure: NFA for a(b|c)*; the NFA for a (s0, s1) is joined to the NFA for (b|c)* by one more ε-transition]
• This NFA has many more states than a minimal human-built DFA:
    s0 --a--> s1, with a loop on s1 for b and c

From NFA to DFA
• An NFA is not really helpful, since its implementation is not obvious
• We know: every DFA is also an NFA (without ε-transitions)
• Every NFA can also be converted to an equivalent DFA (this can be proven by induction, we just show the construction)
• The method to do this is called subset construction:
    NFA: (QN, Σ, δN, n0, FN)  →  DFA: (QD, Σ, δD, d0, FD)
  The alphabet Σ stays the same. The set of states QN, the transition function δN, the start state n0 and the set of accepting states FN are modified.

Subset construction algorithm
    q0 ← ε-closure({n0});
    QD ← {q0};
    WorkList ← {q0};
    while (WorkList ≠ ∅) do
        remove q from WorkList;
        for each character c ∈ Σ do
            t ← ε-closure(δN(q,c));
            δD[q,c] ← t;
            if t ∉ QD then
                add t to QD and to WorkList;
        end;
    end;
Idea of the algorithm: find sets of states that are equivalent (due to ε-transitions) and join these to form states of a DFA.
ε-closure: ε-closure(S) contains the set of states S and any states in the NFA that can be reached from one of the states in S along paths that contain only ε-transitions (these are equivalent to a state in S).

Subset construction example
[Figure: NFA for a(b|c)* with states n0 to n9]

    δN  | a    b    c    ε
    ----+---------------------
    n0  | n1   –    –    –
    n1  | –    –    –    n2
    n2  | –    –    –    n3,n9
    n3  | –    –    –    n4,n6
    n4  | –    n5   –    –
    n5  | –    –    –    n8
    n6  | –    –    n7   –
    n7  | –    –    –    n8
    n8  | –    –    –    n3,n9
    n9  | –    –    –    –

Initialization:
    q0 ← ε-closure({n0}) = {n0};
    QD ← {{n0}};
    WorkList ← {{n0}};

while-loop iteration 1:
    WorkList = {{n0}};
    q ← {n0};
    c ← 'a':
        t ← ε-closure(δN(q,c))
          = ε-closure(δN(n0,'a'))
          = ε-closure({n1}) = {n1,n2,n3,n4,n6,n9}
        δD[q,'a'] ← {n1,n2,n3,n4,n6,n9};
        QD ← {{n0},{n1,n2,n3,n4,n6,n9}};
        WorkList ← {{n1,n2,n3,n4,n6,n9}};
    c ← 'b','c':
        t ← {}
        no change to QD, WorkList
From now on, we will skip the iterations of the for loop that do not change QD.

while-loop iteration 2:
    WorkList = {{n1,n2,n3,n4,n6,n9}};
    q ← {n1,n2,n3,n4,n6,n9};
    c ← 'b':
        t ← ε-closure(δN(q,'b'))
          = ε-closure({n5}) = {n5,n8,n9,n3,n4,n6}
        δD[q,'b'] ← {n5,n8,n9,n3,n4,n6};
        QD ← {{n0},{n1,n2,n3,n4,n6,n9},{n5,n8,n9,n3,n4,n6}};
        WorkList ← {{n5,n8,n9,n3,n4,n6}};
    c ← 'c':
        t ← ε-closure(δN(q,'c'))
          = ε-closure({n7}) = {n7,n8,n9,n3,n4,n6}
        δD[q,'c'] ← {n7,n8,n9,n3,n4,n6};
        QD ← {{n0},{n1,n2,n3,n4,n6,n9},{n5,n8,n9,n3,n4,n6},{n7,n8,n9,n3,n4,n6}};
        WorkList ← {{n5,n8,n9,n3,n4,n6},{n7,n8,n9,n3,n4,n6}};

while-loop iteration 3:
    WorkList = {{n5,n8,n9,n3,n4,n6},{n7,n8,n9,n3,n4,n6}};
    q ← {n7,n8,n9,n3,n4,n6};
    c ← 'b','c':
        t ← ε-closure(δN(q,c))
          = ε-closure({n5}) for 'b', ε-closure({n7}) for 'c'   // we ran around the graph once!
        both sets are already in QD
No new states are added to QD in this and the following iteration!
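The worklist algorithm and the worked example above can be tried out with a short Python sketch. It is only an illustration under a few assumptions that are not in the slides: the NFA is stored as a dictionary, the empty string stands for ε, the worklist is processed first-in first-out (so the discovered sets happen to be numbered d0 to d3 in the same order as in the example), and an empty result set is treated as "no transition", matching the "no change to QD, WorkList" step of iteration 1. The names `epsilon_closure`, `move` and `subset_construction` are chosen here for readability.

```python
from collections import deque

EPS = ""  # stands for ε in this sketch (an assumption, not slide notation)

# Transition table δN of the NFA for a(b|c)*, copied from the example.
NFA = {
    ("n0", "a"): {"n1"},
    ("n1", EPS): {"n2"},
    ("n2", EPS): {"n3", "n9"},
    ("n3", EPS): {"n4", "n6"},
    ("n4", "b"): {"n5"},
    ("n5", EPS): {"n8"},
    ("n6", "c"): {"n7"},
    ("n7", EPS): {"n8"},
    ("n8", EPS): {"n3", "n9"},
}
ALPHABET = "abc"


def epsilon_closure(states):
    """Return `states` plus everything reachable via ε-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, c):
    """δN extended to sets: NFA states reachable from `states` on character c."""
    result = set()
    for s in states:
        result |= NFA.get((s, c), set())
    return result


def subset_construction(start):
    q0 = epsilon_closure({start})
    dfa_states = [q0]        # QD, in order of discovery
    delta = {}               # δD[(q, c)] = t
    worklist = deque([q0])
    while worklist:
        q = worklist.popleft()
        for c in ALPHABET:
            t = epsilon_closure(move(q, c))
            if not t:        # empty set: no transition on c, nothing to record
                continue
            delta[(q, c)] = t
            if t not in dfa_states:
                dfa_states.append(t)
                worklist.append(t)
    return dfa_states, delta


dfa_states, delta = subset_construction("n0")
names = {q: f"d{i}" for i, q in enumerate(dfa_states)}
for (q, c), t in sorted(delta.items(), key=lambda kv: (names[kv[0][0]], kv[0][1])):
    print(f"{names[q]} --{c}--> {names[t]}")
```

Sets of NFA states are stored as `frozenset` so that they can serve as dictionary keys, which is what lets a whole set act as a single DFA state.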
Subset construction example: resulting DFA
[Figure: NFA for a(b|c)* with states n0 to n9, and the constructed DFA with states d0 to d3]

    Set   DFA    NFA states               ε-closure(δN(q,*))
    name  state                           a     b     c
    ----------------------------------------------------------
    q0    d0     {n0}                     q1    –     –
    q1    d1     {n1,n2,n3,n4,n6,n9}      –     q2    q3
    q2    d2     {n5,n8,n9,n3,n4,n6}      –     q2    q3
    q3    d3     {n7,n8,n9,n3,n4,n6}      –     q2    q3

Subset construction: result
• Our NFA for a(b|c)*:
[Figure: NFA for a(b|c)* with states n0 to n9]
• The subset construction algorithm yields the constructed DFA:
    d0 --a--> d1
    d1 --b--> d2,  d1 --c--> d3
    d2 --b--> d2,  d2 --c--> d3
    d3 --b--> d2,  d3 --c--> d3
• This is still bigger than the minimal DFA:
    s0 --a--> s1, with a loop on s1 for b and c

Minimization of DFAs
[Figure: constructed DFA (d0 to d3) next to the minimal DFA (s0, s1)]
• DFAs resulting from subset construction can have a large set of states
  • This does not increase the time needed to scan a string
  • It does increase the size of the recognizer in memory
• On modern computers, the speed of memory accesses often governs the speed of computation
  • A smaller recognizer may fit better into the processor's cache memory

Minimization of DFAs
[Figure: constructed DFA (states renumbered) next to the minimal DFA (s0, s1)]
• We need a technique to detect when two states are equivalent
  • i.e.
