Regular Expressions

Midterm Logistics

● Midterm next Tuesday, May 8 from 7PM – 10PM.

● Test location by last name:

● A – J: Go to 200-002

● K – Z: Go to Hewlett 201

● Covers material up through and including today's lecture.

● Open-book, open-note, open-computer, closed-network.

● SCPD: Exam will be emailed out Tuesday morning. You can start the exam any time within a 24-hour window.

● If you cannot make the normal exam time, please let us know no later than Friday.

Regular Languages

● A language is called a regular language iff:

● It is accepted by some DFA.
● It is accepted by some NFA.

● Regular languages are closed under various operations:

● Complement
● Union
● Intersection

Concatenation

● The concatenation of two languages L1 and L2 over the alphabet Σ is the language

L1 L2 = { wx | w ∈ L1 ∧ x ∈ L2 }

● The set of strings that can be split into two pieces: a string in L1 and a string in L2.

Concatenation Example

● Example: Let Σ = { a, b, …, z, A, B, …, Z }

● Noun = { Velociraptor, Rainbow, Whale, … }
● Verb = { Eats, Juggles, Loves, … }
● The = { The }

● The language TheNounVerbTheNoun is

● { TheVelociraptorEatsTheVelociraptor, TheWhaleLovesTheRainbow, TheRainbowJugglesTheVelociraptor, … }
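For finite languages, the definition of concatenation can be tried out directly. The sketch below is only an illustration: the helper name `concat` is an assumption, and the three-word sets are finite samples of the Noun and Verb languages from the example.

```python
def concat(l1, l2):
    """Return { w + x : w in l1, x in l2 } for two finite languages."""
    return {w + x for w in l1 for x in l2}

# Finite samples of the languages from the example (the real sets are larger).
noun = {"Velociraptor", "Rainbow", "Whale"}
verb = {"Eats", "Juggles", "Loves"}
the = {"The"}

# TheNounVerbTheNoun, built one concatenation at a time.
sentences = concat(concat(concat(concat(the, noun), verb), the), noun)

print(len(sentences))                          # 27 (3 nouns * 3 verbs * 3 nouns)
print("TheWhaleLovesTheRainbow" in sentences)  # True
```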

Concatenating Regular Languages

● If L1 and L2 are regular languages, is L1L2?

● Intuition: can we split a string w into two strings x and y such that w = xy, x ∈ L1, and y ∈ L2?

● Idea: Run the automaton for L1 on w, and whenever L1 reaches an accepting state, hand the rest of w off to L2.

● If L2 accepts the remainder, then L1 accepted the first part and the string is in L1L2.

● If L2 rejects the remainder, then the split was incorrect.
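The splitting intuition can be sketched without automata by trying every split point. This is a brute-force stand-in for the NFA's nondeterministic guess, not the construction itself; `in_concatenation`, `only_as`, and `only_bs` are hypothetical names chosen for this example.

```python
def in_concatenation(w, in_l1, in_l2):
    """Return True if w = xy for some split with x in L1 and y in L2."""
    return any(in_l1(w[:i]) and in_l2(w[i:]) for i in range(len(w) + 1))

# Hypothetical membership tests standing in for the two automata.
def only_as(s):
    return all(c == "a" for c in s)

def only_bs(s):
    return all(c == "b" for c in s)

print(in_concatenation("aab", only_as, only_bs))  # True: split as "aa" + "b"
print(in_concatenation("aba", only_as, only_bs))  # False: no split works
```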

Concatenating Regular Languages

[Figure: the automata for L1 and L2, joined by ε-transitions from the accepting states of the first automaton into the start state of the second.]

Lots and Lots of Concatenation

● Consider the language L = { aa, b }

● LL is the set of strings formed by concatenating pairs of strings in L.

● { aaaa, aab, baa, bb }

● LLL is the set of strings formed by concatenating triples of strings in L.

● { aaaaaa, aaaab, aabaa, aabb, baaaa, baab, bbaa, bbb }

● LLLL is the set of strings formed by concatenating quadruples of strings in L

● { aaaaaaaa, aaaaaab, aaaabaa, aaaabb, aabaaaa, aabaab, aabbaa, aabbb, baaaaaa, baaaab, baabaa, baabb, bbaaaa, bbaab, bbbaa, bbbb }

Language Exponentiation

● We can define what it means to “exponentiate” a language as follows:

● L^0 = { ε }

● The set containing just the empty string.
● Idea: Any string formed by concatenating zero strings together is the empty string.

● L^(n+1) = L L^n

● Idea: Concatenating (n + 1) strings together works by concatenating n strings, then concatenating one more.
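The recursive definition translates almost line for line into code for finite languages. In this sketch the function name `power` is an assumption; it starts from { ε } and prepends one more string from L on each round.

```python
def power(lang, n):
    """Return the n-th power of a finite language, following L^(n+1) = L L^n."""
    result = {""}  # L^0 = { ε }
    for _ in range(n):
        result = {w + x for w in lang for x in result}
    return result

L = {"aa", "b"}
print(sorted(power(L, 2)))  # ['aaaa', 'aab', 'baa', 'bb']
print(len(power(L, 4)))     # 16
```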

The Kleene Closure

● An important operation on languages is the Kleene Closure, which is defined as

L* = ⋃_{i=0}^{∞} L^i

● This is an infinite union of sets. It is defined as “the set of all x contained in L^i for some natural number i.”

● Intuitively, all possible ways of concatenating any number of copies of strings in L together.

Reasoning about Infinity

● How do we prove properties of this infinite union?

● A Bad Line of Reasoning:

● L^0 = { ε } is regular.
● L^1 = L is regular.
● L^2 = LL is regular.
● L^3 = (LL)L is regular.
● …
● So their infinite union is regular.

Reasoning about Infinity

[Figure: a zigzagging line, refined step by step, with segments labeled x.]

Reasoning About the Infinite

● If a series of finite objects all have some property, their infinite union does not necessarily have that property!

● No matter how many times we zigzag that line, it's never straight.
● Concluding that it must be equal “in the limit” is not mathematically precise.
● (This is why calculus is interesting).

● A better intuition: Can we convert an NFA for the language L to an NFA for the language L*?

The Kleene Star

[Figure: an NFA for L is converted into an NFA for L* by adding ε-transitions.]

Kleene Star in Action

L = { ma, mom, mommy, mum }

[Figure: an NFA for L*, built from the NFA for L with ε-transitions, stepped through on the input mamommum.]
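Membership in L* for a finite L can also be checked without building the NFA. The sketch below uses a small dynamic program over prefixes of the input; it is an illustrative alternative to the ε-transition construction, and the name `in_kleene_star` is an assumption.

```python
def in_kleene_star(w, lang):
    """Return True if w is a concatenation of zero or more strings from lang."""
    # reachable[i] is True when the prefix w[:i] is in lang*.
    reachable = [False] * (len(w) + 1)
    reachable[0] = True  # the empty prefix uses zero strings
    for i in range(len(w)):
        if not reachable[i]:
            continue
        for piece in lang:
            # Skip the empty string: it never advances the position.
            if piece and w.startswith(piece, i):
                reachable[i + len(piece)] = True
    return reachable[len(w)]

L = {"ma", "mom", "mommy", "mum"}
print(in_kleene_star("mamommum", L))  # True: ma + mom + mum
print(in_kleene_star("mamo", L))      # False
```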

Summary

● NFAs are a powerful type of automaton that allows for nondeterministic choices.

● NFAs can also have ε-transitions that move from state to state without consuming any input.

● The subset construction shows that NFAs are not more powerful than DFAs, because any NFA can be converted into a DFA that accepts the same language.

● The union, intersection, difference, complement, concatenation, and Kleene closure of regular languages are all regular languages.

Rethinking Regular Languages

● We currently have several tools for showing a language is regular.

● Construct a DFA for it.
● Construct an NFA for it.
● Apply closure properties to existing languages.

● We have not spoken much of this last idea.

Constructing Regular Languages

● Idea: Build up all regular languages as follows:

● Start with a small set of simple languages we already know to be regular.
● Using closure properties, combine these simple languages together to form more elaborate languages.

● A bottom-up approach to the regular languages.

Regular Expressions

● Regular expressions are a family of descriptions that can be used to capture the regular languages.

● Often provide a compact and human-readable description of the language.

● Used as the basis for numerous software systems (Perl, flex, grep, etc.)

Atomic Regular Expressions

● The regular expressions begin with three simple building blocks.

● The symbol Ø is a regular expression that represents the empty language Ø.

● The symbol ε is a regular expression that represents the language { ε }.

● This is not the same as Ø!

● For any a ∈ Σ, the symbol a is a regular expression for the language { a }.

Compound Regular Expressions

● We can combine together existing regular expressions in four ways.

● If R1 and R2 are regular expressions, R1R2 is a regular expression that represents the concatenation of the languages of R1 and R2.

● If R1 and R2 are regular expressions, R1 | R2 is a regular expression representing the union of the languages of R1 and R2.

● If R is a regular expression, R* is a regular expression for the Kleene closure of the language of R.

● If R is a regular expression, (R) is a regular expression with the same meaning as R.

Operator Precedence

● Regular expression operator precedence is, from highest to lowest:

(R)
R*
R1R2
R1 | R2

● So ab*c|d is parsed as ((a(b*))c)|d

Regular Expression Examples

● The regular expression trick|treat represents the regular language { trick, treat }

● The regular expression booo* represents the regular language { boo, booo, boooo, … }

● The regular expression candy!(candy!)* represents the regular language { candy!, candy!candy!, candy!candy!candy!, … }

Regular Expressions, Formally

● The language of a regular expression is the language described by that regular expression.

● Formally:

● ℒ(ε) = {ε}
● ℒ(Ø) = Ø
● ℒ(a) = {a}
● ℒ(R1 R2) = ℒ(R1) ℒ(R2)
● ℒ(R1 | R2) = ℒ(R1) ∪ ℒ(R2)
● ℒ(R*) = ℒ(R)*
● ℒ((R)) = ℒ(R)
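The formal definition of ℒ can be read as a recursive function on the structure of the expression. The sketch below assumes a hypothetical encoding of regular expressions as nested tuples, such as ("cat", r1, r2), with one character per symbol; because ℒ(R) may be infinite, it only returns the strings of length at most max_len. Parentheses need no case of their own since the tuple nesting already records grouping.

```python
def language(r, max_len):
    """Return the strings of length <= max_len in the language of regex tree r."""
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "empty":
        return set()
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {w + x for w in left for x in right if len(w + x) <= max_len}
    if kind == "star":
        base = language(r[1], max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:  # add one more copy of a base string per round
            frontier = {w + x for w in frontier for x in base
                        if len(w + x) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

# booo* written as b o o o*
o = ("sym", "o")
boo_star = ("cat", ("sym", "b"), ("cat", o, ("cat", o, ("star", o))))
print(sorted(language(boo_star, 5)))  # ['boo', 'booo', 'boooo']
```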

Regular Expressions are Awesome

● Let Σ = {0, 1}
● Let L = { w | w contains 00 as a substring }

(0 | 1)*00(0 | 1)*

11011100101
0000
11111011110011111

Regular Expressions are Awesome

● Let Σ = {0, 1}
● Let L = { w | |w| = 4 }

● The length of a string w is denoted |w|.

(0|1)(0|1)(0|1)(0|1)

● Shorthand: (0|1)^4

0000
1010
1111
1000

Regular Expressions are Awesome

● Let Σ = {0, 1}
● Let L = { w | w contains at most one 0 }

1*(0 | ε)1*

● Shorthand: 1*0?1*

11110111
111111
0111
0

Regular Expressions are Awesome

● Let Σ = { a, ., @ }, where a represents “some letter.”
● A regular expression for email addresses is

aa* (.aa*)* @ aa*.aa* (.aa*)*

● Shorthand: a+(.a+)*@a+(.a+)+

[Figure: a DFA with states q0 through q8 for a+(.a+)*@a+(.a+)+]

Extensions to Regular Expressions

● To simplify regular expressions, we introduce the following shorthand (which is just syntactic sugar for some other regular expression).

● R^n represents RR...R (n times).

● R? represents R | ε.

● Either zero or one copies of R.

● R+ represents RR*.

● n ≥ 1 copies of R.
● Sometimes called Kleene Plus.
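These shorthands also exist in Python's `re` module, where R^n is written `R{n}` and ε inside a union is written as an empty alternative. The sketch below checks, for all strings over {0, 1} up to a chosen length, that each shorthand agrees with its expansion. The helper `same_language` is an assumption, and a bounded check like this is evidence rather than a proof.

```python
import re
from itertools import product

def same_language(p1, p2, alphabet="01", max_len=6):
    """Return True if p1 and p2 accept the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(p1, w)) != bool(re.fullmatch(p2, w)):
                return False
    return True

print(same_language(r"1*0?1*", r"1*(0|)1*"))              # R? versus R | ε
print(same_language(r"(0|1){4}", r"(0|1)(0|1)(0|1)(0|1)"))  # R^4 versus RRRR
print(same_language(r"(01)+", r"(01)(01)*"))               # R+ versus RR*
```

All three comparisons should print True; swapping in a pattern that differs, such as `0?` against `0`, makes the check fail on the empty string.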

Regular Expressions and Languages

● So far, we have two characterizations of regular languages:

● Any language accepted by some DFA.

● Any language accepted by some NFA.

● Theorem: A language is regular iff it is described by some regular expression.

● Need to show that if a regular expression exists for L, L is regular.

● Need to show that if L is regular, there is a regular expression for L.

● The second direction is not obvious!

A Marvelous Construction

● To show that any language described by a regular expression is regular, we show how to convert a regular expression into an NFA.

● Theorem: For any regular expression R, there is an NFA N such that

● ℒ(R) = ℒ(N)

● N has exactly one accepting state.

● N has no transitions into its start state.

● N has no transitions out of its accepting state.

● These are stronger requirements than are necessary for a normal NFA. We enforce these rules to simplify the construction.

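Base Cases

[Figure: automaton for ε: a start state with an ε-transition to the accepting state.]

[Figure: automaton for Ø: a start state and an accepting state with no transitions.]

[Figure: automaton for single character a: a start state with an a-transition to the accepting state.]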

Construction for R1R2

[Figure: the automata for R1 and R2 in sequence, with an ε-transition from the accepting state of R1 to the start state of R2.]

Construction for R1 | R2

[Figure: a new start state with ε-transitions into the automata for R1 and R2, and ε-transitions from their accepting states into a new accepting state.]

Construction for R*

[Figure: a new start state and a new accepting state around the automaton for R, with ε-transitions that allow skipping R entirely or repeating it.]
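The base cases and the three constructions can be sketched as one recursive function. This is an illustrative version under stated assumptions: regular expressions are encoded as nested tuples (a hypothetical encoding, not from the slides), states are integers, and the ε-wiring for R* follows the usual textbook layout, which may differ in detail from the figure. Each fragment keeps the three properties from the theorem: one accepting state, no transitions into the start state, none out of the accepting state.

```python
from itertools import count

def build(r, edges, fresh):
    """Append edges for regex tree r; return its (start, accept) states.

    Each edge is (source, label, target); a label of None is an ε-transition.
    """
    kind = r[0]
    if kind == "cat":
        s1, a1 = build(r[1], edges, fresh)
        s2, a2 = build(r[2], edges, fresh)
        edges.append((a1, None, s2))  # hand off from R1 to R2
        return s1, a2
    start, accept = next(fresh), next(fresh)
    if kind == "eps":
        edges.append((start, None, accept))
    elif kind == "sym":
        edges.append((start, r[1], accept))
    elif kind == "alt":
        for part in (r[1], r[2]):
            s, a = build(part, edges, fresh)
            edges.append((start, None, s))
            edges.append((a, None, accept))
    elif kind == "star":
        s, a = build(r[1], edges, fresh)
        edges += [(start, None, s), (a, None, accept),
                  (start, None, accept), (a, None, s)]
    elif kind != "empty":  # "empty" adds no edges at all
        raise ValueError(f"unknown node: {kind}")
    return start, accept

def eps_closure(states, edges):
    """Return every state reachable from states using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, w):
    """Simulate the NFA built from r on the string w."""
    edges = []
    start, accept = build(r, edges, count())
    current = eps_closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return accept in current

bit = ("alt", ("sym", "0"), ("sym", "1"))
has00 = ("cat", ("star", bit),
         ("cat", ("sym", "0"), ("cat", ("sym", "0"), ("star", bit))))
print(accepts(has00, "1001"))  # True
print(accepts(has00, "0101"))  # False
```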

The Other Direction

● Proving that if L is regular, there is a regular expression R for L is much trickier.

● Idea: Convert an NFA for L into a regular expression for L.

● How do we do this?

From NFAs to Regular Expressions

[Figure: two single-state automata, each with a self-loop labeled s1, s2, …, sn. Regular expression for the first: Ø. Regular expression for the second: (s1 | s2 | … | sn)*]

● Key idea: Allow transitions to be labeled with arbitrary regular expressions, so the self-loop can be relabeled s1 | s2 | … | sn.

From NFAs to Regular Expressions

[Figure: a start state with a single transition labeled R into an accepting state. Regular expression: R]

● Key idea: If we can convert any NFA into something that looks like this, we can easily read off the regular expression.

[Figure: the single-state automata drawn in this form, with the transition labeled (s1 | s2 | … | sn)* in one case and Ø in the other.]

[Figure: a two-state automaton with states q1 and q2, self-loops R11 on q1 and R22 on q2, a transition R12 from q1 to q2, and a transition R21 from q2 to q1. In the two-state form above, its single transition is labeled R11* R12 (R22 | R21R11*R12)*]

From NFAs to Regular Expressions

[Figure: the two-state automaton with states q1 and q2, extended with a new start state qs and a new accepting state qf. An ε-transition leads from qs to q1 and another from q2 to qf.]

● Could we eliminate q1 from the NFA?

● Every path through q1 is replaced by a shortcut: the transition from qs to q2 is labeled ε R11* R12, and a new loop on q2 is labeled R21 R11* R12.

● With q1 removed, the two loops on q2 merge into R22 | R21 R11* R12, and the transition from qs to q2 simplifies to R11* R12.

● Eliminating q2 the same way labels the transition from qs to qf with R11* R12 (R22 | R21R11*R12)* ε, which simplifies to R11* R12 (R22 | R21R11*R12)*.

The Construction at a Glance

● Start with an NFA for the language.

● For simplicity, add a new start state qs and a new accepting state qf to the NFA, with ε-transitions from qs to the old start state and from each old accepting state to qf. The old accepting states are then no longer accepting.

● Repeatedly remove states other than qs and qf from the NFA by “shortcutting” them until only two states remain: qs and qf.

● The transition from qs to qf is then a regular expression for the NFA.
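The steps above can be sketched in code. This is a minimal illustration under several assumptions: edge labels are Python `re` patterns stored in a dictionary keyed by (source, target), ε is the empty pattern, a missing key plays the role of Ø, and the state names "qs" and "qf" are reserved. The three-state automaton at the bottom is a hypothetical example chosen for the sketch, and the resulting pattern is correct but far from simplified.

```python
import re

def union(r1, r2):
    """Pattern for r1 | r2, where r1 = None stands for a missing edge (Ø)."""
    if r1 is None:
        return r2
    return f"(?:{r1}|{r2})"

def concat(*parts):
    """Concatenate patterns, grouping each part to protect precedence."""
    return "".join(f"(?:{p})" for p in parts)

def star(r):
    """Pattern for r*; a missing or ε self-loop contributes nothing."""
    return "" if not r else f"(?:{r})*"

def nfa_to_regex(states, start, accepting, transitions):
    """Eliminate states in the given order and return the qs-to-qf label."""
    edge = dict(transitions)
    edge[("qs", start)] = ""          # ε from the new start state
    for q in accepting:
        edge[(q, "qf")] = ""          # ε into the new accepting state
    for rip in states:
        loop = star(edge.get((rip, rip)))
        preds = [p for (p, q) in edge if q == rip and p != rip]
        succs = [q for (p, q) in edge if p == rip and q != rip]
        for p in preds:
            for q in succs:
                shortcut = concat(edge[(p, rip)], loop, edge[(rip, q)])
                edge[(p, q)] = union(edge.get((p, q)), shortcut)
        edge = {k: r for k, r in edge.items() if rip not in k}
    return edge.get(("qs", "qf"))     # None means the language is Ø

# Hypothetical automaton: q0 and q1 accept, q2 is a dead state.
transitions = {("q0", "q1"): "0", ("q1", "q0"): "0|1",
               ("q0", "q2"): "1", ("q2", "q2"): "0|1"}
regex = nfa_to_regex(["q2", "q1", "q0"], "q0", ["q0", "q1"], transitions)
print(bool(re.fullmatch(regex, "0100")))  # True
print(bool(re.fullmatch(regex, "1")))     # False
```

On this automaton the pattern should accept the same strings as (0(0 | 1))*(0 | ε). Eliminating the states in a different order gives a different but equivalent pattern.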

Another Example

[Figure: an NFA over {0, 1} with states q0, q1, and q2, converted to a regular expression by state elimination.]

● Relabel multi-symbol transitions: 0, 1 becomes 0 | 1.

● Add qs and qf, with ε-transitions from qs to q0 and from q0 and q1 to qf.

● Eliminate q2.

● Eliminate q1: the new loop on q0 is labeled 0(0 | 1), and the new transition 0ε from q0 to qf merges with the existing ε into 0 | ε.

● Eliminate q0: the transition from qs to qf is labeled ε(0(0 | 1))*(0 | ε), which simplifies to

(0(0 | 1))*(0 | ε)

One More Example

[Figure: an NFA over {0, 1} with states q0, q1, q2, q3, and q5, converted to a regular expression by state elimination after adding qs and qf.]

● Eliminate q3: new labels 11*ε = 11* and 11*0, the latter merging with a parallel 0 into 0 | 11*0.

● Eliminate q5: new labels 00*ε = 00* and 00*1, the latter merging with a parallel 1 into 1 | 00*1.

● Eliminate q1: new labels 111*, (1 | 00*1)11*, (0 | 11*0)(1 | 00*1), and 1(0 | 11*0).

● Merge parallel transitions: 1(0 | 11*0) | 0 and (1 | 00*1)11* | 00*.

● Eliminate q2: new label (1(0 | 11*0) | 0)((0 | 11*0)(1 | 00*1))*((1 | 00*1)11* | 00*).

● Merge with the parallel 111* and eliminate q0. The transition from qs to qf is then

(1(0 | 11*0) | 0)((0 | 11*0)(1 | 00*1))*((1 | 00*1)11* | 00*) | 111*

Our Transformations

● DFA to NFA: direct conversion
● NFA to DFA: subset construction
● Regexp to NFA: recursive transform
● NFA to Regexp: state elimination

Regular Languages

● A language L is regular iff

● L is accepted by some DFA.
● L is accepted by some NFA.
● L is described by some regular expression.

● What constructions on regular languages can we do with regular expressions?

Reversal

● The reverse of a string w is the string w^R of the characters of w in the opposite order.

● hello^R = olleh
● velociraptor^R = rotparicolev
● aibohphobia^R = aibohphobia

Reversing a Language

● Given a language L, the reverse of L is the language L^R defined as

L^R = { w^R | w ∈ L }

● { whale, rainbow }^R = { elahw, wobniar }
● { mom, momm, mommm, … }^R = { mom, mmom, mmmom, … }

Reversing a Regular Language

● If L is regular, then L^R is regular.

● Idea: Get a regular expression for L, then transform it into a regular expression for L^R.

● We could also transform DFAs or NFAs, but the regular expression transformation is a bit easier.

Reversing a Regular Expression

● Let REV(E) denote a regular expression for ℒ(E)^R.

● REV is defined inductively as follows:

● REV(a) = a, for any a ∈ Σ.

● REV(ε) = ε

● REV(Ø) = Ø

● REV(R1R2) = REV(R2) REV(R1)

● REV(R1 | R2) = REV(R1) | REV(R2)

● REV(R*) = REV(R)*

● REV((R)) = (REV(R))

Reversing a Regular Expression

REV( (2 | 1*)0 )
= REV(0) REV( (2 | 1*) )
= 0 REV( (2 | 1*) )
= 0 ( REV(2 | 1*) )
= 0 ( REV(2) | REV(1*) )
= 0 ( 2 | REV(1*) )
= 0 ( 2 | REV(1)* )
= 0 ( 2 | 1* )
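The inductive definition of REV maps directly onto a recursive function. The sketch below assumes regular expressions are encoded as nested tuples (a hypothetical encoding chosen for illustration); parentheses need no case of their own because the nesting already records grouping.

```python
def rev(r):
    """Return a regex tree for the reverse of the language of r."""
    kind = r[0]
    if kind in ("eps", "empty", "sym"):
        return r                                   # REV(ε), REV(Ø), REV(a)
    if kind == "cat":
        return ("cat", rev(r[2]), rev(r[1]))       # swap the two halves
    if kind == "alt":
        return ("alt", rev(r[1]), rev(r[2]))
    if kind == "star":
        return ("star", rev(r[1]))
    raise ValueError(f"unknown node: {kind}")

# (2 | 1*)0 from the worked example
two_or_ones = ("alt", ("sym", "2"), ("star", ("sym", "1")))
example = ("cat", two_or_ones, ("sym", "0"))
print(rev(example) == ("cat", ("sym", "0"), two_or_ones))  # True: 0(2 | 1*)
```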

String Homomorphism

● Let Σ1 and Σ2 be alphabets.

● Consider any function h : Σ1 → Σ2* that associates symbols in Σ1 with strings in Σ2*.

● For example:

● Σ1 = { 0, 1 }
● Σ2 = { a, b, c, d }
● h(0) = acdb
● h(1) = ccc

● Another example:

● Σ1 = { a, b, c, d, … }
● Σ2 = { A, B, C, D, … }
● h(a) = A
● h(b) = B
● ...

● A third example:

● Σ1 = { 0, 1 }
● Σ2 = { 0, 1 }
● h(0) = ε
● h(1) = 1

String Homomorphism

● Given a function h : Σ1 → Σ2*, consider the function h* : Σ1* → Σ2* defined recursively as follows:

● h*(ε) = ε
● h*(wa) = h*(w) h(a)

● This function is called a string homomorphism.

● From Greek “same shape.”

A Simple Homomorphism

● Example: h(a) = A, h(b) = B

h*(baa)
= h*(ba) h(a)
= h*(b) h(a) h(a)
= h*(ε) h(b) h(a) h(a)
= h(b) h(a) h(a)
= B h(a) h(a)
= BA h(a)
= BAA

A Simple Homomorphism

● Example: h(0) = a, h(1) = bc

h*(0110)
= h*(011) h(0)
= h*(01) h(1) h(0)
= h*(0) h(1) h(1) h(0)
= h*(ε) h(0) h(1) h(1) h(0)
= h(0) h(1) h(1) h(0)
= a h(1) h(1) h(0)
= abc h(1) h(0)
= abcbc h(0)
= abcbca

A Simple Homomorphism

● Example: h(0) = ε, h(1) = 1

h*(0110)
= h*(011) h(0)
= h*(01) h(1) h(0)
= h*(0) h(1) h(1) h(0)
= h*(ε) h(0) h(1) h(1) h(0)
= h(0) h(1) h(1) h(0)
= h(1) h(1) h(0)
= 1 h(1) h(0)
= 11 h(0)
= 11
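The recursive definition of h* can be written as a short function that peels off the last character, mirroring h*(wa) = h*(w) h(a). In this sketch h is represented as a dictionary from symbols to strings, and the name `h_star` is an assumption.

```python
def h_star(h, w):
    """Apply the homomorphism h (a dict from symbols to strings) to w."""
    if w == "":
        return ""                           # h*(ε) = ε
    return h_star(h, w[:-1]) + h[w[-1]]     # h*(wa) = h*(w) h(a)

print(h_star({"a": "A", "b": "B"}, "baa"))   # BAA
print(h_star({"0": "a", "1": "bc"}, "0110")) # abcbca
print(h_star({"0": "", "1": "1"}, "0110"))   # 11
```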

String Homomorphism, Intuitively

● String homomorphism represents building a new string that has the same structure as an older string.

● Example: Let Σ1 = { 0, 1, 2 } and consider the string 0121

● If Σ2 = {A, B, C, …, Z, a, b, …, z, ', [, ], . }, define h : Σ1 → Σ2* as

● h(0) = That's the way

● h(1) = [Uh huh uh huh]

● h(2) = I like it

● Then h*(0121) = That's the way [Uh huh uh huh] I like it [Uh huh uh huh]

● Note that h*(0121) has the same structure as 0121, just expressed differently.

Homomorphisms of Languages

● If L ⊆ Σ1* is a language and h* : Σ1* → Σ2* is a homomorphism, the language h*(L) is defined as

h*(L) = { h*(w) | w ∈ L }

● The language formed by applying the homomorphism to every string in L.

Homomorphisms of Regular Languages

● If L is a regular language over Σ1 and h* : Σ1* → Σ2* is a homomorphism, then is h*(L) a regular language?

● If so, how might we prove it?
● If not, why not?

Homomorphisms of Regular Languages

● Idea: Transform a regular expression for L into a regular expression for h*(L).

● Define HOM(R) as

● HOM(ε) = ε
● HOM(Ø) = Ø
● HOM(a) = (h(a))
● HOM(R1 R2) = HOM(R1) HOM(R2)
● HOM(R1 | R2) = HOM(R1) | HOM(R2)
● HOM(R*) = HOM(R)*
● HOM((R)) = (HOM(R))

Homomorphisms of Regular Languages

● Consider the language (0120)* and the function

● h(0) = n
● h(1) = y
● h(2) = a

● Then h*((0120)*) = ((n)(y)(a)(n))*

Homomorphisms of Regular Languages

● Consider the language 011* and the function

● h(0) = Here
● h(1) = Kitty

● Then h*(011*) = (Here)(Kitty)(Kitty)*
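The HOM rules can be sketched as a function that rewrites a regular expression tree into a string in the notation used here. The tuple encoding, including an explicit ("group", r) node for parentheses, is a hypothetical choice made so that each rule has its own case; printing an empty h(a) as (ε) is also an assumption of this sketch.

```python
def hom(r, h):
    """Return HOM(r) as a string, where h maps symbols to strings."""
    kind = r[0]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "Ø"
    if kind == "sym":
        return "(" + (h[r[1]] or "ε") + ")"     # HOM(a) = (h(a))
    if kind == "cat":
        return hom(r[1], h) + hom(r[2], h)
    if kind == "alt":
        return hom(r[1], h) + " | " + hom(r[2], h)
    if kind == "star":
        return hom(r[1], h) + "*"
    if kind == "group":
        return "(" + hom(r[1], h) + ")"         # HOM((R)) = (HOM(R))
    raise ValueError(f"unknown node: {kind}")

# 011* with h(0) = Here, h(1) = Kitty
one = ("sym", "1")
here_kitty = ("cat", ("sym", "0"), ("cat", one, ("star", one)))
print(hom(here_kitty, {"0": "Here", "1": "Kitty"}))  # (Here)(Kitty)(Kitty)*
```

Because the parentheses of the source expression are kept as explicit nodes, a star applied to a parenthesized group still covers the whole group after the rewrite.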

The Big List of Closure Properties

● The regular languages are closed under

● Union

● Intersection

● Complement

● Set Difference (why?)

● Set Symmetric Difference (why?)

● Concatenation

● Kleene Closure

● Reversal

● String Homomorphism

● Plus a whole lot more!

Next Time

● The Limits of Regular Languages

● What languages are not regular? ● The Pumping Lemma for Regular Languages