Regular Languages

CSCI 2670, Department of Computer Science, Fall 2014

Outline

- Regular Expressions
- Converting Regular Expressions to NFAs
- Generalized Nondeterministic Finite Automata
- Converting GNFAs
- Nonregular Languages
- The Pumping Lemma

Regular Expressions

- Regular languages can also be defined via regular expressions (regexps), a form of shorthand for languages defined using the regular operations.

Definition. Let $\Sigma$ be any alphabet.

1. Each symbol $a \in \Sigma$ is a regular expression;
2. $\varepsilon$ is a regular expression;
3. $\emptyset$ is a regular expression;
4. if $R_1$ and $R_2$ are regular expressions, then $(R_1 \cup R_2)$ is a regular expression;
5. if $R_1$ and $R_2$ are regular expressions, then $(R_1 \circ R_2)$ is a regular expression;
6. if $R_1$ is a regular expression, then $R_1^*$ is a regular expression.

Regular Expressions

- If $a \in \Sigma$, the regexp $a$ denotes the language $\{a\}$.
- The regexp $\varepsilon$ denotes the language $\{\varepsilon\}$.
- The regexp $(a \cup b)$ denotes the language $\{a\} \cup \{b\}$.
- The regexp $(a \circ b)$ denotes the language $\{a\} \circ \{b\}$.
- The regexp $(a \cup b)^* \circ a$ denotes $\{wa \mid w \text{ is any string over } \{a, b\}\}$.

Note the following:

- $R_1 \circ R_2$ is often abbreviated as $R_1 R_2$.
- If $\Sigma = \{a_1, a_2, \ldots\}$, then $\Sigma$ is used in place of $(a_1 \cup a_2 \cup \cdots)$.
- For any regexp $R$, $R\emptyset = \emptyset$.
- $\emptyset^*$ denotes $\{\varepsilon\}$.
- $(R_1 \cup R_2)$ is sometimes written $(R_1 \mid R_2)$.
- $R^+$ denotes the concatenation of one or more strings from $L(R)$.
- $R^k$ denotes the concatenation of exactly $k$ strings from $L(R)$.
- The precedence of operators (greatest to least) is: $*$, $\circ$, $\cup$.

Regular Expressions

Example. What languages do the following denote (where $\Sigma = \{0, 1\}$)?

- $0^* 1 0^*$
- $\Sigma^* 1 \Sigma^*$
- $\Sigma^* 001 \Sigma^*$
- $1^* (01^+)^*$
- $(\Sigma\Sigma)^*$
- $(\Sigma\Sigma\Sigma)^*$
- $(01 \cup 10)$
- $0\Sigma^* 0 \cup 1\Sigma^* 1 \cup 0 \cup 1$
- $(0 \cup \varepsilon)1^*$
- $(0 \cup \varepsilon)(1 \cup \varepsilon)$

Two Identities

For any regular expression $R$:

- $R \cup \emptyset = R$
- $R \circ \varepsilon = R$

The following do not hold in general, however:

- $R \cup \varepsilon = R$
- $R \circ \emptyset = R$

Example. If $R = ab$, then

- $L(R) = \{ab\}$
- $L(ab \cup \emptyset) = \{ab\}$
- $L(ab \circ \emptyset) = \{\}$
- $L(ab \cup \varepsilon) = \{ab, \varepsilon\}$
- $L(ab \circ \varepsilon) = \{ab\}$

Equivalence between Regular Expressions and Finite Automata

Theorem. A language is regular if and only if some regular expression describes it.

- That is, a language $L$ is regular if and only if there exists a regular expression $R$ such that $L(R) = L$.
- Note that the theorem is a biconditional statement, so both directions must be proven:
  - If a language is described by a regular expression, then it is regular.
  - If a language is regular, then it is described by some regular expression.

Equivalence between Regular Expressions and Finite Automata

Lemma. If a language is described by a regular expression, then it is regular.

Proof. The proof proceeds by constructing, for each regexp $R$, an NFA $N$ that recognizes $L(R)$. The proof uses structural induction.

Basis:

1. $R = a$, where $a \in \Sigma$: $N = (\{q_0, q_1\}, \Sigma, \delta, q_0, \{q_1\})$, where $\delta(q_0, a) = \{q_1\}$ and $\delta(q, b) = \emptyset$ whenever $q \neq q_0$ or $b \neq a$ (with $b$ ranging over $\Sigma \cup \{\varepsilon\}$).
2. $R = \varepsilon$: $N = (\{q_0\}, \Sigma, \delta, q_0, \{q_0\})$, where $\delta(q, a) = \emptyset$ for all states $q$ and all $a \in \Sigma \cup \{\varepsilon\}$.
3. $R = \emptyset$: $N = (\{q_0\}, \Sigma, \delta, q_0, \{\})$, where $\delta(q, a) = \emptyset$ for all states $q$ and all $a \in \Sigma \cup \{\varepsilon\}$.

The languages recognized in the three basis cases:

1. $L(a) = \{a\}$
2. $L(\varepsilon) = \{\varepsilon\}$
3. $L(\emptyset) = \emptyset$

Proof, continued.

Recursion: The recursive cases are taken care of by the proofs that regular languages are closed under union, concatenation, and Kleene star.

4. $R = R_1 \cup R_2$
5. $R = R_1 \circ R_2$
6. $R = R_1^*$

Equivalence between Regular Expressions and Finite Automata

Example.

1. Convert the regular expression $(ab \cup a)^*$ to an NFA.
2. Convert the regular expression $(a \cup b)^* aba$ to an NFA.
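The structural induction in the lemma can be tried out in code. The following is a minimal Python sketch, not the course's own construction: the class `NFA` and the helpers `symbol`, `epsilon`, `empty`, `union`, `concat`, and `star` are hypothetical names, `None` is assumed to stand for an $\varepsilon$ move, and the recursive cases use the usual $\varepsilon$-move closure constructions, which the slides cite but do not spell out. It builds NFAs for the two example expressions above.

```python
from itertools import count

_fresh = count()  # hypothetical helper: hands out unique state names


class NFA:
    """delta maps (state, symbol) -> set of states; symbol None is an epsilon move."""

    def __init__(self, start, accepts, delta):
        self.start, self.accepts, self.delta = start, set(accepts), delta

    def _closure(self, states):
        # All states reachable from `states` using epsilon moves only.
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, None), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts_string(self, w):
        current = self._closure({self.start})
        for ch in w:
            moved = set()
            for q in current:
                moved |= self.delta.get((q, ch), set())
            current = self._closure(moved)
        return bool(current & self.accepts)


def symbol(a):   # basis case 1: R = a
    q0, q1 = next(_fresh), next(_fresh)
    return NFA(q0, {q1}, {(q0, a): {q1}})

def epsilon():   # basis case 2: R = epsilon
    q0 = next(_fresh)
    return NFA(q0, {q0}, {})

def empty():     # basis case 3: R = empty set
    q0 = next(_fresh)
    return NFA(q0, set(), {})

def _merge(d1, d2):
    # Copy both transition tables so the sub-automata are left untouched.
    merged = {k: set(v) for k, v in d1.items()}
    for k, v in d2.items():
        merged.setdefault(k, set()).update(v)
    return merged

# The builders below assume their arguments have disjoint state sets,
# so each sub-NFA object should be used only once.
def union(n1, n2):   # case 4: new start state with epsilon moves to both starts
    q0 = next(_fresh)
    delta = _merge(n1.delta, n2.delta)
    delta[(q0, None)] = {n1.start, n2.start}
    return NFA(q0, n1.accepts | n2.accepts, delta)

def concat(n1, n2):  # case 5: epsilon moves from n1's accept states to n2's start
    delta = _merge(n1.delta, n2.delta)
    for q in n1.accepts:
        delta.setdefault((q, None), set()).add(n2.start)
    return NFA(n1.start, n2.accepts, delta)

def star(n1):        # case 6: new accepting start state, accept states loop back
    q0 = next(_fresh)
    delta = _merge(n1.delta, {})
    delta[(q0, None)] = {n1.start}
    for q in n1.accepts:
        delta.setdefault((q, None), set()).add(n1.start)
    return NFA(q0, n1.accepts | {q0}, delta)


# Example 1: (ab U a)*
ex1 = star(union(concat(symbol("a"), symbol("b")), symbol("a")))
# Example 2: (a U b)* aba
ex2 = concat(star(union(symbol("a"), symbol("b"))),
             concat(symbol("a"), concat(symbol("b"), symbol("a"))))
print(ex1.accepts_string("aab"), ex2.accepts_string("bbaba"))
```

Each builder corresponds to one numbered case of the definition, so the nesting of calls mirrors the parse of the regular expression. The resulting NFAs carry more $\varepsilon$ moves than a hand-drawn answer typically would, but they recognize the same languages.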
Generalized NFAs

- To prove that each regular language is described by a regular expression, we define generalized nondeterministic finite automata (GNFAs).
- GNFAs are like NFAs, except:

1. The start state $q_{\mathrm{start}}$ has edges leading to every other state, but no incoming edges.
2. There is a unique accept state $q_{\mathrm{accept}}$, with $q_{\mathrm{accept}} \neq q_{\mathrm{start}}$, that has incoming edges from every other state. It has no outgoing edges.
3. For all $q, q' \in Q - \{q_{\mathrm{accept}}, q_{\mathrm{start}}\}$, there is exactly one edge from $q$ to $q'$. Note that $q$ and $q'$ might be the same.

Generalized NFAs

- The labels of the edges in a GNFA are arbitrary regular expressions.
- A DFA $M$ can be converted into a GNFA:
  - Add a new start state $q_{\mathrm{start}}$ with an $\varepsilon$ edge leading to the old start state.
  - Add a new accept state $q_{\mathrm{accept}}$ with an $\varepsilon$ edge leading from each accept state in $M$ to $q_{\mathrm{accept}}$.
  - If edges $q \xrightarrow{a} q'$ and $q \xrightarrow{b} q'$ exist, replace both with $q \xrightarrow{a \cup b} q'$.
  - If no edge leads from $q$ to $q'$, add $q \xrightarrow{\emptyset} q'$.
- It's not proven in the text, but it should be clear that none of these alterations changes the language accepted by the automaton.

From GNFAs to regular expressions

- The conversion from GNFA to regular expression proceeds by combining nodes and labels in the graph. If a state $q_{\mathrm{rip}}$ exists such that
  - $q_i \xrightarrow{R_1} q_{\mathrm{rip}}$,
  - $q_{\mathrm{rip}} \xrightarrow{R_2} q_{\mathrm{rip}}$,
  - $q_{\mathrm{rip}} \xrightarrow{R_3} q_j$, and
  - $q_i \xrightarrow{R_4} q_j$,
- then
  - delete $q_{\mathrm{rip}}$ and each edge above, and
  - add the edge $q_i \xrightarrow{R_1 R_2^* R_3 \cup R_4} q_j$.
- Do this for each $q_i$ and $q_j$ connected via $q_{\mathrm{rip}}$.
- Repeat the process until only two nodes exist, $q_{\mathrm{start}}$ and $q_{\mathrm{accept}}$. Let CONVERT(G) be the regexp obtained as a result of this process.

Generalized NFAs (Definition)

Definition. A generalized nondeterministic finite automaton (GNFA) is a 5-tuple $(Q, \Sigma, \delta, q_{\mathrm{start}}, q_{\mathrm{accept}})$:

- $Q$ is a finite, nonempty set of states.
- $\Sigma$ is a finite, nonempty alphabet.
- $\delta : (Q - \{q_{\mathrm{accept}}\}) \times (Q - \{q_{\mathrm{start}}\}) \to \mathcal{R}$ is the transition function, where $\mathcal{R}$ is the set of regular expressions over $\Sigma$.
- $q_{\mathrm{start}} \in Q$ is the start state.
- $q_{\mathrm{accept}} \in Q$ is the unique accept state.

- The function $\delta$ gives the label of the edge $(q_i, q_j)$, where $q_i \in Q - \{q_{\mathrm{accept}}\}$ and $q_j \in Q - \{q_{\mathrm{start}}\}$.
- Here, $q_i$ can't be the accept state, because no edge originates there.
- Here, $q_j$ can't be the start state, because no edge ends there.

Language Recognition for GNFAs

Definition. Let $G$ be a GNFA and $w = w_1 w_2 \cdots w_k$ a string, where each $w_i \in \Sigma^*$. $G$ accepts $w$ iff there is a sequence of states $q_0, q_1, \ldots, q_k$ such that

- $q_0 = q_{\mathrm{start}}$,
- $q_k = q_{\mathrm{accept}}$, and
- for each $i$, $w_i \in L(R_i)$, where $R_i = \delta(q_{i-1}, q_i)$.

- We split $w$ into $w_1 w_2 \cdots w_k$, where each $w_i$ corresponds to a string generated by a regular expression on an edge.
- Specifically, $w_i$ is in the language denoted by the label on the edge from $q_{i-1}$ to $q_i$.

Equivalence between Regular Expressions and GNFAs

Proposition. For any GNFA $G$, CONVERT(G) is equivalent to $G$.

Proof. The proof proceeds by induction on the number of nodes in $G$.

Basis: If $G$ has only 2 nodes, then they must be the distinct start and accept states, and the regular expression on the edge between them is CONVERT(G) and describes exactly the strings accepted by $G$.

Induction: Suppose the claim holds for GNFAs of $k - 1$ states and that $G$ has $k$ states (where $k > 2$). Since $G$ has more than 2 states, it can be reduced. Let $G'$ be a GNFA obtained by removing a state $q_{\mathrm{rip}}$ from $G$ according to the procedure described earlier. Let $\delta'$ be the transition function for $G'$.

Proof, continued.

- Let $w = w_1 w_2 \cdots w_n$ be a string accepted by $G$. Then there exists a sequence $q_{\mathrm{start}}, q_1, q_2, \ldots, q_{\mathrm{accept}}$ demonstrating that $G$ accepts $w$. Note that for each $i$, $w_i \in L(R_i)$, where $R_i = \delta(q_{i-1}, q_i)$ and $q_0 = q_{\mathrm{start}}$.
- State $q_{\mathrm{rip}}$ is either in this sequence, or it's not.

1. If not, then the sequence $q_{\mathrm{start}}, q_1, q_2, \ldots, q_{\mathrm{accept}}$ demonstrates that $G'$ accepts $w$, since for each $q_i$ and $q_{i+1}$, $\delta'(q_i, q_{i+1}) = R \cup S$, where $\delta(q_i, q_{i+1}) = R$ and $S$ is some other regular expression, and $L(R) \subseteq L(R \cup S)$.
2. If $q_{\mathrm{rip}}$ is in the sequence $q_{\mathrm{start}}, q_1, q_2, \ldots, q_{\mathrm{accept}}$, then the sequence with all occurrences of $q_{\mathrm{rip}}$ removed constitutes an accepting computation path for $G'$. Each run $q_i, q_{\mathrm{rip}}, \ldots, q_{\mathrm{rip}}, q_j$ in the original sequence reads a string in $L(R_1 R_2^* R_3)$, which is contained in $L(\delta'(q_i, q_j))$ because $\delta'(q_i, q_j) = R_1 R_2^* R_3 \cup R_4$.
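The DFA-to-GNFA conversion and the CONVERT procedure described in this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the course's implementation: regular expressions are plain strings in Python `re` syntax (so `|` plays the role of $\cup$), `None` is assumed to stand for $\emptyset$, the empty string for $\varepsilon$, alphabet symbols are assumed to be ordinary letters or digits, and the names `dfa_to_gnfa`, `convert`, `union`, `concat`, and `star` are hypothetical.

```python
import re
from itertools import product

EMPTY = None  # assumption: stands for the regexp "empty set"
EPS = ""      # assumption: stands for the regexp "epsilon"

def union(r, s):
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    return r if r == s else f"{r}|{s}"

def concat(r, s):
    if r is EMPTY or s is EMPTY:   # R concatenated with the empty set is empty
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    # Union has the lowest precedence, so parenthesize operands containing it.
    left = f"({r})" if "|" in r else r
    right = f"({s})" if "|" in s else s
    return left + right

def star(r):
    if r is EMPTY or r == EPS:     # the star of the empty set denotes {epsilon}
        return EPS
    return (r if len(r) == 1 else f"({r})") + "*"

def dfa_to_gnfa(states, alphabet, delta, start, accepts):
    """Add new start/accept states; every missing edge gets the label EMPTY."""
    S, F = "q_start", "q_accept"   # assumed not to clash with the DFA's state names
    g = {(p, q): EMPTY for p in [S, *states] for q in [*states, F]}
    g[(S, start)] = EPS
    for q in accepts:
        g[(q, F)] = EPS
    for q, a in product(states, alphabet):
        target = delta[(q, a)]
        g[(q, target)] = union(g[(q, target)], a)   # parallel edges become a|b
    return g, S, F

def convert(states, g, S, F):
    """Rip out the original states one by one; return the final S -> F label."""
    remaining = list(states)
    while remaining:
        rip = remaining.pop()
        loop = star(g[(rip, rip)])                  # R2*
        new = {}
        for qi in [S, *remaining]:
            for qj in [*remaining, F]:
                via = concat(concat(g[(qi, rip)], loop), g[(rip, qj)])  # R1 R2* R3
                new[(qi, qj)] = union(via, g[(qi, qj)])                 # ... U R4
        g = new
    return g[(S, F)]

# Usage: a two-state DFA accepting strings with an even number of b's.
dfa_states = ["q0", "q1"]
dfa_delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
             ("q1", "a"): "q1", ("q1", "b"): "q0"}
g, S, F = dfa_to_gnfa(dfa_states, "ab", dfa_delta, "q0", {"q0"})
even_b = convert(dfa_states, g, S, F)
print(even_b, bool(re.fullmatch(even_b, "abab")))
```

The dictionary `g` has keys only for pairs in $(Q - \{q_{\mathrm{accept}}\}) \times (Q - \{q_{\mathrm{start}}\})$, matching the domain of $\delta$ in the GNFA definition. The order in which states are ripped affects the shape of the resulting expression but not the language it denotes. If the DFA accepts no strings, `convert` returns `EMPTY`, which has no direct counterpart in `re` syntax.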
