Chapter 3: Regular Languages & Regular Grammars ∗

Peter Cappello Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 [email protected]

• Please read the corresponding chapter before attending this lecture.
• These notes are supplemented with figures and with material that arises during the lecture in response to questions.

• Please report any errors in these notes to [email protected]. I’ll fix them immediately.

∗Based on An Introduction to Formal Languages and Automata, 3rd Ed., Peter Linz, Jones and Bartlett Publishers, Inc.

We now look at the grammatical equivalent of finite acceptors: regular expressions.

3.2 Connection Between Regular Expressions & Regular Languages

Regular Expressions Denote Regular Languages

By definition, if L is a regular language, then it is accepted by some DFA, M_D. We know:

• ∀ DFA M_D, ∃ NFA M_N, L(M_D) = L(M_N),
• ∀ NFA M_N, ∃ DFA M_D, L(M_D) = L(M_N).

Thm. 3.1 Let r be a regular expression. Then there exists an NFA M that accepts L(r).

Proof:

1. We prove this by showing how to construct the desired NFA.
2. The structure of the construction reflects the recursive definition of regular expressions.
3. (a) Let r = ∅. Draw a picture of an “empty set” NFA.
   (b) Let r = λ. Draw a picture of a “λ” NFA.
   (c) Let r = a ∈ Σ. Draw a picture of an “a” NFA.
   (d) Let NFA M1 [M2] accept the language denoted by regular expression r1 [r2]. Draw a picture of an NFA M3 that accepts the language denoted by r1 + r2.
   (e) Let NFA M1 [M2] accept the language denoted by regular expression r1 [r2]. Draw a picture of an NFA M3 that accepts the language denoted by r1 · r2.
   (f) Let NFA M accept the language denoted by regular expression r. Draw a picture of an NFA M∗ that accepts the language denoted by regular expression r∗.
4. It should be clear that this recursive procedure constructs the desired NFA. However, one could be more rigorous and prove that it is so by induction on the number of operators (+, ·, and ∗), where the basis is the construction of the NFAs that correspond to the primitive regular expressions.

Example: Give an NFA that accepts the language denoted by 01∗ + 10∗.
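The recursive construction in the proof can be sketched in Python and then applied to this example. This is a minimal sketch, assuming a hypothetical nested-tuple encoding of regular expressions and using the label `None` for λ-transitions; the helper names `build`, `closure`, and `accepts` are illustrative, not from the notes. Each branch of `build` corresponds to one of cases (a) through (f), and every fragment has exactly one initial and one final state. For r∗, the sketch adds λ-transitions from the new initial state to the old initial state, from the old final state to the new final state, from the new initial state to the new final state, and from the new final state back to the new initial state.

```python
from itertools import count

LAMBDA = None  # label used in this sketch for λ-transitions

def build(r, fresh, edges):
    """Return (start, final) of an NFA fragment accepting L(r).

    r is a nested tuple (a hypothetical encoding, not from the notes):
      ("empty",), ("lambda",), ("sym", a),
      ("union", r1, r2), ("concat", r1, r2), ("star", r1).
    fresh yields new state numbers; edges collects (src, label, dst) triples.
    """
    kind = r[0]
    if kind in ("empty", "lambda", "sym"):          # cases (a), (b), (c)
        s, f = next(fresh), next(fresh)
        if kind == "lambda":
            edges.append((s, LAMBDA, f))
        elif kind == "sym":
            edges.append((s, r[1], f))
        return s, f                                  # "empty": f is unreachable
    if kind == "union":                              # case (d)
        s, f = next(fresh), next(fresh)
        s1, f1 = build(r[1], fresh, edges)
        s2, f2 = build(r[2], fresh, edges)
        edges += [(s, LAMBDA, s1), (s, LAMBDA, s2), (f1, LAMBDA, f), (f2, LAMBDA, f)]
        return s, f
    if kind == "concat":                             # case (e)
        s1, f1 = build(r[1], fresh, edges)
        s2, f2 = build(r[2], fresh, edges)
        edges.append((f1, LAMBDA, s2))
        return s1, f2
    if kind == "star":                               # case (f)
        s, f = next(fresh), next(fresh)
        s1, f1 = build(r[1], fresh, edges)
        edges += [(s, LAMBDA, s1), (f1, LAMBDA, f), (s, LAMBDA, f), (f, LAMBDA, s)]
        return s, f
    raise ValueError(f"unknown node {kind!r}")

def closure(states, edges):
    """All states reachable from `states` using only λ-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is LAMBDA and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, w):
    """Build the NFA for r, then simulate it on the string w."""
    edges = []
    start, final = build(r, count(), edges)
    current = closure({start}, edges)
    for a in w:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == a}, edges)
    return final in current

# The example: 01* + 10*
r = ("union",
     ("concat", ("sym", "0"), ("star", ("sym", "1"))),
     ("concat", ("sym", "1"), ("star", ("sym", "0"))))
print([w for w in ["", "0", "0111", "1000", "01", "10", "11", "010"] if accepts(r, w)])
# ['0', '0111', '1000', '01', '10']
```

Since a primitive, union, or star node adds 2 states and a concatenation adds none, the NFA built this way has at most twice as many states as the expression has nodes.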

Regular Expressions for Regular Languages

• We have seen how to transform:
  – a regular expression to an equivalent NFA;
  – an NFA to an equivalent DFA.
• Composing these, we can transform a regular expression to an equivalent DFA.
• Question: Can we go the other way?
• We can easily transform a DFA to an equivalent NFA.
• Can we transform an NFA to an equivalent regular expression?

Generalized Transition Graphs

A generalized transition graph (GTG) is a transition graph whose edges are labelled with regular expressions.

Example: Draw an example of a GTG.

• The TG representation of an NFA can be interpreted as a GTG.
• If L is regular, there is a GTG that accepts it.
• If M is a GTG, then L(M) is regular. Why?

• Let M1 and M2 be GTGs. M1 is equivalent to M2 when L(M1) = L(M2).
• We pursue the goal of transforming a GTG to an equivalent regular expression.
• The basic step transforms a GTG M to an equivalent GTG M′ with 1 state removed, where the removed state is neither the initial state nor a final state.

Example: Transform the GTG fragment (draw one) to an equivalent one with 1 fewer state.

• In the example above, state q has 2 incoming arcs and 2 outgoing arcs (not including the self-loop).
• We needed to create 4 = 2 × 2 arcs to replace q.
• In general, if the state to be removed has I incoming arcs and O outgoing arcs (not including self-loops), then I × O replacement arcs must be created when removing it.
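The single-state removal step can be sketched as a function on a GTG stored as a dictionary from arcs to labels. This is a minimal sketch, assuming labels written in Python's `re` syntax (so textbook + becomes `|`) and at most one arc per ordered pair of states; the function name `eliminate` and the example labels are hypothetical. Each incoming arc with label e_in and each outgoing arc with label e_out yield one replacement arc labelled e_in (loop)∗ e_out, which is unioned with any label already on that arc, so I × O arcs are produced.

```python
import re

def eliminate(arcs, q):
    """Return an equivalent GTG with state q removed.

    arcs maps (p, t) to a label written in Python `re` syntax ("|" for +).
    Each incoming arc p -> q and outgoing arc q -> t (self-loops excluded)
    yields one replacement arc p -> t labelled (in)(loop)*(out), unioned with
    any label p -> t already had.
    """
    loop = arcs.get((q, q))
    star = f"({loop})*" if loop is not None else ""   # no self-loop: ∅* = λ
    incoming = {p: e for (p, t), e in arcs.items() if t == q and p != q}
    outgoing = {t: e for (s, t), e in arcs.items() if s == q and t != q}
    result = {pair: e for pair, e in arcs.items() if q not in pair}
    for p, e_in in incoming.items():
        for t, e_out in outgoing.items():
            new = f"({e_in}){star}({e_out})"
            old = result.get((p, t))
            result[(p, t)] = new if old is None else f"{old}|{new}"
    return result

# A fragment like the one drawn in lecture (labels are made up):
# q has 2 incoming arcs, 2 outgoing arcs, and a self-loop.
gtg = {("a", "q"): "x", ("b", "q"): "y", ("q", "q"): "z",
       ("q", "c"): "u", ("q", "d"): "v"}
smaller = eliminate(gtg, "q")
print(len(smaller))                                  # 4
print(smaller[("a", "c")])                           # (x)(z)*(u)
print(re.fullmatch(smaller[("b", "d")], "yzzv") is not None)  # True
```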

Thm. 3.2 Let L be a regular language. There exists a regular expression r such that L = L(r).

Proof:
1. By definition, there is a finite acceptor M such that L = L(M).

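2. Without loss of generality, let M be an NFA with 1 final state, qf ≠ q0. (Otherwise, add a new final state qf, add a λ-transition to it from each old final state, and make the old final states nonfinal.)
3. Interpret M as a GTG.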

4. Call a state internal when it is neither q0 nor qf.
5. while ( the GTG has an internal state ) {
     Transform the GTG to an equivalent GTG with 1 fewer state.
   }
   // Draw a picture of this situation.

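6. r = r1∗ r2 (r4 + r3 r1∗ r2)∗ denotes the language accepted by the resulting two-state GTG, where r1 labels the self-loop on q0, r2 the arc from q0 to qf, r3 the arc from qf to q0, and r4 the self-loop on qf.
7. Since the GTGs in the sequence are all equivalent, L(M) = L(r).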
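The generic formula in step 6 needs care when some of the four arcs are missing, since a missing arc has label ∅: ∅∗ = λ, and concatenating with ∅ gives ∅. A minimal sketch in Python, assuming labels in Python's `re` syntax (`|` for +) and `None` for a missing arc; the function name `two_state_regex` and the example GTG are hypothetical.

```python
import re
from typing import Optional

def two_state_regex(r1: Optional[str], r2: Optional[str],
                    r3: Optional[str], r4: Optional[str]) -> Optional[str]:
    """Apply r = r1* r2 (r4 + r3 r1* r2)* to a two-state GTG.

    r1: self-loop on q0, r2: q0 -> qf, r3: qf -> q0, r4: self-loop on qf.
    Labels use Python `re` syntax; None marks a missing arc (label ∅).
    Returns None when the accepted language is ∅.
    """
    if r2 is None:
        return None                               # r2 = ∅ makes the whole product ∅
    loop0 = f"({r1})*" if r1 is not None else ""  # ∅* = λ
    back = []
    if r4 is not None:
        back.append(f"({r4})")
    if r3 is not None:
        back.append(f"({r3}){loop0}({r2})")       # qf -> q0, loop at q0, back to qf
    tail = f"({'|'.join(back)})*" if back else "" # ∅* = λ
    return f"{loop0}({r2}){tail}"

# Two-state GTG (labels chosen for illustration) accepting binary strings ending in 1.
pattern = two_state_regex("0", "1", "0", "1")
print(pattern)                                   # (0)*(1)((1)|(0)(0)*(1))*
print(re.fullmatch(pattern, "0101") is not None) # True
print(re.fullmatch(pattern, "010") is not None)  # False
```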

Example: Let L be the set of all binary strings with an even number of 0s and an odd number of 1s.
• Draw a DFA for L, using 4 states (one for each combination of parities).
• Use the procedure given in the proof of Thm. 3.2 to compute a regular expression that denotes this language.
• After eliminating the internal states (first the state for an odd number of 0s and an even number of 1s, then the state for an odd number of each) and getting down to 2 states, the initial and the final state, we have:
  r1 = 00 + 01(11)∗10
  r2 = 1 + 01(11)∗0
  r3 = 1 + 0(11)∗10
  r4 = 0(11)∗0
• We obtain a regular expression for L by making these substitutions into our generic formula: r = r1∗ r2 (r4 + r3 r1∗ r2)∗.
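One way to check this hand computation is to compare the resulting expression with the definition of L on every short binary string. A minimal sketch in Python, assuming the four labels listed in the example; textbook + is written `|` in Python's `re` syntax, and `re.fullmatch` succeeds only if the whole string matches. The helper name `in_L` is hypothetical.

```python
import re
from itertools import product

# Labels on the two remaining states; textbook "+" is written "|" here.
r1 = "00|01(11)*10"   # self-loop on the initial state
r2 = "1|01(11)*0"     # initial -> final
r3 = "1|0(11)*10"     # final -> initial
r4 = "0(11)*0"        # self-loop on the final state

# Generic formula r = r1* r2 (r4 + r3 r1* r2)*
r = f"({r1})*({r2})(({r4})|({r3})({r1})*({r2}))*"

def in_L(w: str) -> bool:
    """Even number of 0s and odd number of 1s."""
    return w.count("0") % 2 == 0 and w.count("1") % 2 == 1

# Compare the expression with the definition on every binary string of length <= 12.
mismatches = [
    w for n in range(13)
    for w in ("".join(t) for t in product("01", repeat=n))
    if (re.fullmatch(r, w) is not None) != in_L(w)
]
print(mismatches)  # []
```

An exhaustive check up to a fixed length is evidence, not a proof; the proof of Thm. 3.2 is what guarantees equality for all lengths.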

Regular Expressions for Describing Simple Patterns

• One application concerns programming languages; the tokens of a language’s grammar typically form a regular language.
• Lex is an example of a tool that takes a description of a regular language as input and produces C code that can be used to transform an input stream into a stream of tokens.
• This can be done in Java too.
• Text editors often allow one to define a search pattern as a regular expression. For example, in the UNIX ed program, the expression /abc*d/ finds the 1st occurrence of the string “ab” followed by 0 or more ‘c’ characters, followed by a ‘d’ character.
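Both uses can be illustrated with Python's `re` module. This is a minimal sketch: the search mirrors the ed pattern abc∗d, and the scanner is a tiny Lex-like tokenizer whose token names and patterns are made up for illustration. One design difference worth noting: Lex picks the longest match, whereas Python's alternation takes the first alternative that matches at the current position.

```python
import re

# ed-style search: "ab", then zero or more 'c', then 'd'.
m = re.search(r"abc*d", "xx abd abccd")
print(m.group(), m.start())   # abd 3

# A tiny Lex-like scanner. Token names and patterns are hypothetical.
# Unlike Lex, Python tries the alternatives left to right and takes the
# first one that matches, not the longest match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (token_class, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()
        pos = match.end()

print(list(tokenize("x1 = 42 + y")))
# [('IDENT', 'x1'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```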

• For the ed program, this requires:
  1. taking the regular expression at run time;
  2. producing an equivalent NFA;
  3. converting it to a DFA;
  4. wrapping a driver around the DFA to return the 1st occurrence of a string in the language accepted by the DFA (2.4).
• Actually, minimizing the number of states in the DFA would help.
• We develop the needed theory during the next lecture.
• This would be an instructive project.
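Step 4, the driver, can be sketched independently of how the DFA was obtained. This is a minimal sketch, assuming a hypothetical hand-built DFA for abc∗d stored as a dictionary, with unlisted transitions going to an implicit dead state; the names `DELTA`, `START`, `ACCEPT`, and `first_occurrence` are illustrative. The driver restarts the DFA at each position of the text and reports the leftmost match, which takes quadratic time in the worst case.

```python
# Hypothetical hand-built DFA for abc*d over any characters.
# Transitions not listed go to an implicit dead state.
DELTA = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 2, (2, "d"): 3}
START, ACCEPT = 0, {3}

def first_occurrence(text):
    """Return (start, end) of the leftmost match, taking the shortest match
    at that start, or None. Assumes λ is not in the language (START is not
    accepting), which holds for abc*d."""
    for start in range(len(text)):
        state = START
        for end in range(start, len(text)):
            state = DELTA.get((state, text[end]))
            if state is None:          # dead state: no match from this start
                break
            if state in ACCEPT:
                return start, end + 1  # text[start:end + 1] is the match
    return None

print(first_occurrence("xxabccd yy"))   # (2, 7)
print(first_occurrence("acd"))          # None
```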
