The Limits of Regular Languages

The Limits of Regular Languages Announcements ● Midterm tomorrow night in Hewlett 200/201, 7PM – 10PM. ● Open-book, open-note, open-computer, closed-network. ● Covers material up to and including DFAs. Regular Expressions The Regular Expressions ● Goal: Assemble all regular languages from smaller building blocks! ● Atomic regular expressions: Ø ε a ● Compound regular expressions: R1R2 R1 | R2 R* (R) Operator Precedence ● Regular expression operator precedence: (R) R* R1R2 R1 | R2 ● ab*c|d is parsed as ((a(b*))c)|d Regular Expressions are Awesome ● Let Σ = { a, ., @ }, where a represents “some letter.” ● Regular expression for email addresses: aa* (.aa*)* @ aa*.aa* (.aa*)* [email protected] [email protected] [email protected] Regular Expressions are Awesome ● Let Σ = { a, ., @ }, where a represents “some letter.” ● Regular expression for email addresses: a+(.a+)*@a+(.a+)+ [email protected] [email protected] [email protected] Regular Expressions are Awesome a+(.a+)*@a+(.a+)+ @, . a, @, . @, . @, . q 2 q8 q7 @ ., @ . a @ @, . a start q a q @ q a . a 0 1 3 q04 q5 q6 a a a The Power of Regular Expressions Theorem: If R is a regular expression, then ℒ(R) is regular. Proof idea: Induction over the structure of regular expressions. Atomic regular expressions are the base cases, and the inductive step handles each way of combining regular expressions. A Marvelous Construction ● To show that any language described by a regular expression is regular, we show how to convert a regular expression into an NFA. ● Theorem: For any regular expression R, there is an NFA N such that ● ℒ(R) = ℒ(N) ● N has exactly one accepting state. ● N has no transitions into its start state. ● N has no transitions out of its accepting state. start A Marvelous Construction To show that any language described by a regular expression is regular, we show howTheseThese to convertareare strongerstronger a regular expression into an NFA.requirementsrequirements thanthan areare necessary for a normal NFA. Theorem: For any regular expressionnecessary Rfor, there a normal is an NFA. We enforce these rules to NFA N such that We enforce these rules to simplifysimplify thethe construction.construction. ℒ(R) = ℒ(N) ● N has exactly one accepting state. ● N has no transitions into its start state. ● N has no transitions out of its accepting state. start Base Cases start ε Automaton for ε start Automaton for Ø start a Automaton for single character a Construction for R1R2 ε start Machine for R1 Machine for R2 Construction for R1 | R2 ε ε start Machine for R1 ε ε Machine for R2 Construction for R* ε start ε ε Machine for R ε The Power of Regular Expressions Theorem: If L is a regular language, then there is a regular expression for L. This is not obvious! Proof idea: Show how to convert an arbitrary NFA into a regular expression. From NFAs to Regular Expressions s1, s2, …, sn start Regular expression: | | … | (s1 s2 sn)* From NFAs to Regular Expressions s1 | s2 | … | sn start Regular expression: | | … | (s1 s2 sn)* KeyKey idea:idea: LabelLabel transitionstransitions withwith arbitraryarbitrary regularregular expressions. expressions. From NFAs to Regular Expressions start R Regular expression: R KeyKey idea:idea: IfIf wewe cancan convertconvert anyany NFANFA intointo somethingsomething thatthat lookslooks likelike this,this, wewe cancan easilyeasily readread offoff thethe regularregular expression.expression. From NFAs to Regular Expressions start R Regular expression: R s1 | s2 | … | sn start From NFAs to Regular Expressions start R Regular expression: R (s | s | … | s )* start 1 2 n Regular expression: | | … | (s1 s2 sn)* From NFAs to Regular Expressions start R Regular expression: R s1 | s2 | … | sn start From NFAs to Regular Expressions start R Regular expression: R start Ø Regular expression: Ø From NFAs to Regular Expressions start R Regular expression: R R11 R22 R12 start q q 1 R21 2 From NFAs to Regular Expressions start R Regular expression: R R11* R12 (R22 | R21R11*R12)* start q1 q2 From NFAs to Regular Expressions R11 R22 R12 start q q 1 R21 2 From NFAs to Regular Expressions R11 R22 R12 ε ε start q q q q s 1 R21 2 f From NFAs to Regular Expressions R11 R22 R12 ε ε start q q q q s 1 R21 2 f CouldCould wewe eliminateeliminate thisthis statestate fromfrom thethe NFA?NFA? From NFAs to Regular Expressions ε R11* R12 R11 R22 R12 ε ε start q q q q s 1 R21 2 f Note:Note: We'reWe're usingusing concatenationconcatenation andand KleeneKleene closureclosure inin orderorder to skip this state. to skip this state. From NFAs to Regular Expressions ε R11* R12 R11 R22 R12 ε ε start q q q q s 1 R21 2 f R21 R11* R12 From NFAs to Regular Expressions ε R11* R12 R22 start ε qs q2 qf R21 R11* R12 From NFAs to Regular Expressions R11* R12 start ε qs q2 qf R22 | R21 R11* R12 Note:Note: We'reWe're usingusing unionunion toto combinecombine thesethese transitionstransitions together.together. From NFAs to Regular Expressions R * R start 11 12 ε qs q2 qf R22 | R21 R11* R12 From NFAs to Regular Expressions R11* R12 (R22 | R21R11*R12)* ε R * R start 11 12 ε qs q2 qf R22 | R21 R11* R12 From NFAs to Regular Expressions R11* R12 (R22 | R21R11*R12)* start qs qf R11 R22 R12 start q q 1 R21 2 The Construction at a Glance ● Start with an NFA for the language L. ● Add a new start state qs and accept state qf to the NFA. ● Add ε-transitions from each original accepting state to qf, then mark them as not accepting. ● Repeatedly remove states other than qs and qf from the NFA by “shortcutting” them until only two states remain: qs and qf. ● The transition from qs to qf is then a regular expression for the NFA. Our Transformations direct conversion state elimination DFA NFA Regexp subset construction recursive transform Theorem: The following are all equivalent: · L is a regular language. · There is a DFA D such that ℒ(D) = L. · There is an NFA N such that ℒ(N) = L. · There is a regular expression R such that ℒ(R) = L. Why This All Matters ● DFAs correspond to computers with finite memory. ● The equivalence of DFAs and NFAs tells us that given finite memory, nondeterminism does not increase computational power. ● Though it might save on memory. ● The equivalence of DFAs and regular expressions tells us that all problems solvable by finite computers can be assembled out of smaller building blocks. Is every language regular? An Important Observation 0, 1 1 0, 1 q5 0 q2 1 1 start 0 0 1 q0 q1 q3 q4 0 0 1 1 0 1 1 1 q0 q1 q2 q3 q1 q2 q3 q4 An Important Observation 0, 1 1 0, 1 q5 0 q2 1 1 start 0 0 1 q0 q1 q3 q4 0 0 1 1 0 1 1 1 q0 q1 q2 q3 q1 q2 q3 q4 An Important Observation 0, 1 1 0, 1 q5 0 q2 1 1 start 0 0 1 q0 q1 q3 q4 0 0 1 1 1 q0 q1 q2 q3 q4 An Important Observation 0, 1 1 0, 1 q5 0 q2 1 1 start 0 0 1 q0 q1 q3 q4 0 0 1 1 0 1 1 1 q0 q1 q2 q3 q1 q2 q3 q4 An Important Observation 0, 1 1 0, 1 q5 0 q2 1 1 start 0 0 1 q0 q1 q3 q4 0 0 1 1 0 1 1 0 1 1 1 q0 q1 q2 q3 q1 q2 q3 q1 q2 q3 q4 Visiting Multiple States ● Let D be a DFA with n states. ● Any string w accepted by D that has length at least n must visit some state twice. ● Number of states visited is equal to the length of the string plus one. ● By the pigeonhole principle, some state is duplicated. ● The substring of w between those revisited states can be removed, duplicated, tripled, etc. without changing the fact that D accepts w. Intuitively y x z start Informally ● Let L be a regular language. ● If we have a string w ∈ L that is “sufficiently long,” then we can split the string into three pieces and “pump” the middle. ● We can write w = xyz such that xy0z, xy1z, xy2z, …, xynz, … are all in L. ● Notation: yn means “n copies of y.” The Weak Pumping Lemma ● The Weak Pumping Lemma for Regular Languages states that For any regular language L, There exists a positive natural number n such that For any w ∈ L with |w| ≥ n, There exists strings x, y, z such that For any natural number i, w = xyz, y ≠ ε xyiz ∈ L The Weak Pumping Lemma ● The Weak Pumping Lemma for Regular Languages states that For any regular language L, There exists a positive natural number n such that For any w ∈ L with |w| ≥ n, There exists strings x, y, z such that For any natural number i, ThisThis numbernumber nn isis w = xyz, sometimessometimes calledcalled thethe pumping length. y ≠ ε pumping length. xyiz ∈ L The Weak Pumping Lemma ● The Weak Pumping Lemma for Regular Languages states that For any regular language L, There exists a positive natural number n such that For any w ∈ L with |w| ≥ n, There exists strings x, y, z such that For any natural number i, w = xyz, StringsStrings longerlonger thanthan thethe pumpingpumping lengthlength y ≠ ε mustmust havehave aa specialspecial property.property. xyiz ∈ L The Weak Pumping Lemma ● The Weak Pumping Lemma for Regular Languages states that For any regular language L, There exists a positive natural number n such that For any w ∈ L with |w| ≥ n, There exists strings x, y, z such that For any natural number i, w = xyz, w can be broken into three pieces, y ≠ ε where the middle piece isn't empty, where the middle piece can be xyiz ∈ L replicated zero or more times.

The Limits of Regular Languages

Midterm Study Sheet for CS3719 Regular Languages and Finite

Chapter 6 Formal Language Theory

Deterministic Finite Automata 0 0,1 1

6.045 Final Exam Name

(A) for Any Regular Expression R, the Set L(R) of Strings

Formal Languages We’Ll Use the English Language As a Running Example

Using Jflap in Academics to Understand Automata Theory in a Better Way

Chapter 3: Lexing and Parsing

Regular Expressions

Regular Languages Context-Free Languages

Generalized Nondeterministic Finite Automaton (GNFA); It Can Be Shown (We Omit the Proof for Brevity) That Any DFA Can Be Converted Into a GNFA

Regular Languages