Regular Languages & Regular Grammars

Chapter 3: Regular Languages & Regular Grammars ∗

Peter Cappello
Department of Computer Science
University of California, Santa Barbara
Santa Barbara, CA 93106
[email protected]

• Please read the corresponding chapter before attending this lecture.
• These notes are supplemented with figures, and material that arises during the lecture in response to questions.
• Please report any errors in these notes to [email protected]. I'll fix them immediately.

∗ Based on An Introduction to Formal Languages and Automata, 3rd Ed., Peter Linz, Jones and Bartlett Publishers, Inc.

We now look at the grammatical equivalent of finite acceptors: regular expressions.

3.12 Connection Between Regular Expressions & Regular Languages

Regular Expressions Denote Regular Languages

By definition, if L is a regular language, then it is accepted by some DFA, M_D. We know:
• ∀ DFA M_D, ∃ NFA M_N such that L(M_D) = L(M_N);
• ∀ NFA M_N, ∃ DFA M_D such that L(M_D) = L(M_N).

Thm. 3.1 Let r be a regular expression. Then there exists an NFA, M, that accepts L(r).

Proof:
1. We prove this by showing how to construct the desired NFA.
2. The structure of the construction reflects the recursive definition of regular expressions.
3. (a) Let r = ∅. Draw a picture of an "empty set" NFA.
   (b) Let r = λ. Draw a picture of a "λ" NFA.
   (c) Let r = a ∈ Σ. Draw a picture of an "a" NFA.
   (d) Let NFA M_1 [M_2] accept the language denoted by regular expression r_1 [r_2]. Draw a picture of an NFA M_3 that accepts the language denoted by r_1 + r_2.
   (e) Let NFA M_1 [M_2] accept the language denoted by regular expression r_1 [r_2]. Draw a picture of an NFA M_3 that accepts the language denoted by r_1 · r_2.
   (f) Let NFA M accept the language denoted by regular expression r. Draw a picture of an NFA M∗ that accepts the language denoted by regular expression r∗.
4. It should be clear that this recursive procedure constructs the desired NFA.
However, one could be more rigorous and prove that it is so by induction on the number of operators (+, ·, and ∗), where the basis is the construction of the NFAs that correspond to the primitive regular expressions.

Example: Give an NFA that accepts the language denoted by 01∗ + 10∗.

Regular Expressions for Regular Languages

• We have seen how to transform:
  - a regular expression to an equivalent NFA;
  - an NFA to an equivalent DFA.
• Composing these, we can transform a regular expression to a DFA.
• Question: Can we go the other way?
• We can easily transform a DFA to an equivalent NFA.
• Can we transform an NFA to an equivalent regular expression?

Generalized Transition Graphs

A generalized transition graph (GTG) is a transition graph whose edges are labelled with regular expressions.

Example: Draw an example of a GTG.

• The TG representation of an NFA can be interpreted as a GTG.
• If L is regular, there is a GTG that accepts it.
• If M is a GTG, then L(M) is regular. Why?
• Let M_1 and M_2 be GTGs. M_1 is equivalent to M_2 when L(M_1) = L(M_2).
• We pursue the goal of transforming a GTG to an equivalent regular expression.
• The process transforms a GTG M to an equivalent GTG M′ with 1 state removed; the removed state is neither the initial state nor a final state.

Example: Transform the GTG fragment (draw one) to an equivalent one with 1 fewer state.

• In the example above, state q has 2 incoming arcs and 2 outgoing arcs (not including the self-loop).
• We needed to create 4 = 2 × 2 arcs to replace q.
• In general, if the state to be removed has I incoming arcs and O outgoing arcs (not including self-loops), then I · O replacement arcs must be created when removing it.

Thm. 3.2 Let L be a regular language. There exists a regular expression r such that L = L(r).

Proof:
1. By definition, there is a finite acceptor M such that L = L(M).
2. Without loss of generality, let M be an NFA with 1 final state, q_f ≠ q_0.
3. Interpret M as a GTG.
4. Call a state internal when it is neither q_0 nor q_f.
5. while ( the GTG has an internal state ) { Transform the GTG to an equivalent GTG with 1 fewer state. }
   // Draw a picture of this situation.
6. With r_1 labelling the self-loop on q_0, r_2 the arc from q_0 to q_f, r_3 the arc from q_f to q_0, and r_4 the self-loop on q_f, the regular expression r = r_1∗ r_2 (r_4 + r_3 r_1∗ r_2)∗ denotes the language accepted by the GTG.
7. Since all the GTGs in the sequence are equivalent, L(M) = L(r).

Example: Let L be the set of all binary strings with an even number of 0s and an odd number of 1s.

• Draw a picture of this, using 4 states.
• Use the procedure given in the proof of Thm. 3.2 to compute a regular expression that denotes this language.
• After getting down to 2 states, the initial and the final state, we have:
  r_1 = 00 + 01(11)∗10
  r_2 = 1 + 01(11)∗0
  r_3 = 1 + 0(11)∗10
  r_4 = 0(11)∗0
• We obtain a regular expression for L by making these substitutions into our generic formula:
  r = r_1∗ r_2 (r_4 + r_3 r_1∗ r_2)∗.

Regular Expressions for Describing Simple Patterns

• One application concerns compilers; the tokens of the language's grammar typically form a regular language.
• Lex is an example of a tool that takes a description of a regular language as input, and produces C code that can be used to transform an input stream into a stream of tokens.
• This can be done in Java too.
• Text editors often allow one to define a search pattern as a regular expression. For example, in the UNIX ed program, the expression /abc*d/ returns the 1st occurrence of the string "ab" followed by 0 or more 'c' characters, followed by a 'd' character.
• For the ed program, this requires:
  1. taking the regular expression at run time;
  2. producing an equivalent NFA;
  3. converting it to a DFA;
  4. wrapping a driver around the DFA to return the 1st occurrence of a string in the language accepted by the DFA (2.4).
• Actually, minimizing the number of states in the DFA would help.
• We develop the needed theory during the next lecture.
• This would be an instructive project.
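
The substitution in the even-0s/odd-1s example of Thm. 3.2 can be spot-checked mechanically. The sketch below is a minimal check and not part of the original notes. It writes r_1, ..., r_4 in Python's re syntax (| in place of +), taking r_1 = 00 + 01(11)∗10 because the self-loop on the odd/odd state reads 11, substitutes them into r = r_1∗ r_2 (r_4 + r_3 r_1∗ r_2)∗, and compares the result with a direct simulation of the 4-state DFA on every binary string up to a chosen length. The names dfa_accepts and mismatches are hypothetical helpers introduced only for this illustration. Python's re module is a backtracking matcher, so this does not follow the regular expression to NFA to DFA pipeline outlined above; it only tests the final expression.

```python
import re
from itertools import product

# Labels of the two-state GTG left after eliminating the internal states.
# Python's re uses | where these notes write +.
r1 = "(?:00|01(?:11)*10)"   # self-loop on the initial state
r2 = "(?:1|01(?:11)*0)"     # initial state -> final state
r3 = "(?:1|0(?:11)*10)"     # final state -> initial state
r4 = "(?:0(?:11)*0)"        # self-loop on the final state

# Generic formula: r = r1* r2 (r4 + r3 r1* r2)*
r = f"{r1}*{r2}(?:{r4}|{r3}{r1}*{r2})*"
pattern = re.compile(r)


def dfa_accepts(w):
    """Simulate the 4-state DFA by tracking the parities of 0s and 1s."""
    zeros_odd = False
    ones_odd = False
    for c in w:
        if c == "0":
            zeros_odd = not zeros_odd
        else:
            ones_odd = not ones_odd
    # Accept exactly when the number of 0s is even and the number of 1s is odd.
    return (not zeros_odd) and ones_odd


def mismatches(max_len):
    """Return the binary strings of length <= max_len on which
    the regular expression and the DFA disagree."""
    bad = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if (pattern.fullmatch(w) is not None) != dfa_accepts(w):
                bad.append(w)
    return bad


if __name__ == "__main__":
    print(mismatches(10))   # prints [] when the two descriptions agree
```

An empty list for a bounded length is evidence rather than a proof; the equivalence itself follows from the state-elimination argument in Thm. 3.2.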
