Regular Expressions

CS 172: Computability and Complexity
Regular Expressions
Sanjit A. Seshia, EECS, UC Berkeley
Acknowledgments: L. von Ahn, L. Blum, M. Blum

The Picture So Far

[Figure: DFA, NFA, Regular language]

Today’s Lecture

[Figure: DFA, NFA, Regular language, Regular expression]

Regular Expressions

  • Q. What is a regular expression?
  • A. It’s a “textual” / “algebraic” representation of a regular language
    – A DFA can be viewed as a “pictorial” / “explicit” representation
  • We will prove that regular expressions (regexps) indeed represent regular languages

Regular Expressions: Definition

  • σ is a regular expression representing {σ} (σ ∈ Σ)
  • ε is a regular expression representing {ε}
  • ∅ is a regular expression representing ∅

If R1 and R2 are regular expressions representing L1 and L2, then:

  • (R1R2) represents L1 ⋅ L2
  • (R1 ∪ R2) represents L1 ∪ L2
  • (R1)* represents L1*

Operator Precedence

  1. *
  2. ⋅ (often left out; a ⋅ b is written ab)
  3. ∪

Example of Precedence

R1*R2 ∪ R3 = ((R1*)R2) ∪ R3

What’s the regexp?

{ w | w has exactly a single 1 }

0*10*

What language does ∅* represent?

{ε}

What’s the regexp?

{ w | w has length ≥ 3 and its 3rd symbol is 0 }

Σ²0Σ*, where Σ = (0 ∪ 1)

Some Identities

Let R, S, T be regular expressions

  • R ∪ ∅ = ?
  • R ⋅ ∅ = ?
  • Prove: R(S ∪ T) = RS ∪ RT (what’s the proof idea?)

Some Applications of Regular Expressions

  • String matching & searching
    – Utilities like grep, awk, …
    – Search in editors: emacs, …
  • Programming Languages
    – Perl
    – Compiler design: lex/yacc
  • Computer Security
    – Virus signatures
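As a concrete instance of the string-matching application, the example regexps above (0*10* and Σ²0Σ*) can be checked against their set-builder descriptions by brute force. This is a minimal sketch using Python's `re` module, which writes union as `|`; the helper names `all_strings` and `agrees` are made up for illustration, and agreement on all strings up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product

# Python's `re` writes union as `|`; Σ = (0 ∪ 1) becomes the character class [01].
EXACTLY_ONE_1 = re.compile(r"0*10*")            # { w | w has exactly a single 1 }
THIRD_SYMBOL_0 = re.compile(r"[01]{2}0[01]*")   # Σ²0Σ*

# One instance of the identity R(S ∪ T) = RS ∪ RT, with R = 0*1, S = 0, T = 11.
LEFT = re.compile(r"(?:0*1)(?:0|11)")
RIGHT = re.compile(r"(?:0*1)0|(?:0*1)11")


def all_strings(max_len, alphabet="01"):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def agrees(pattern, predicate, max_len=8):
    """True if the regexp and the predicate accept the same strings up to max_len."""
    return all(
        (pattern.fullmatch(w) is not None) == predicate(w)
        for w in all_strings(max_len)
    )


print(agrees(EXACTLY_ONE_1, lambda w: w.count("1") == 1))
print(agrees(THIRD_SYMBOL_0, lambda w: len(w) >= 3 and w[2] == "0"))
print(agrees(LEFT, lambda w: RIGHT.fullmatch(w) is not None))
```

`fullmatch` is used rather than `search` because a regexp in the lecture's sense describes whole strings, not substrings.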
Virus Signature as String

Sequence of words, one for each instruction:

  …
  pop ecx            i0
  jecxz SFModMark    i1
  mov esi, ecx       i2
  mov eax, 0d601h    i3
  pop edx            i4
  pop ecx            i0
  …

[Figure: Chernobyl virus code fragment; the word sequence i0 i1 i2 i3 i4 i0 is flagged “virus!”]

Virus Signature as Regexp

Sequence of words doesn’t work!

[Figure: simple obfuscated Chernobyl virus code fragment, in which nop instructions are inserted between the instructions above (pop ecx, jecxz SFModMark, mov esi, ecx, mov eax, 0d601h, pop edx, pop ecx); the accompanying diagram over i0 … i4 also carries nop labels and is again flagged “virus!”]

Equivalence Theorem

A language is regular if and only if some regular expression describes it

Part I (“if part”)

Some regular expression R describes a language ⇒ That language is regular

There exists NFA N such that R describes L(N)

Given regular expression R, we show there exists NFA N such that R represents L(N)

Proof Idea: Induction on the length of R

Base Cases (R has length 1):

  • R = σ
  • R = ε
  • R = ∅

[Figure: an NFA for each base case]

Inductive Step: Assume R has length k > 1 and that any regular expression of length < k represents a language that can be recognized by an NFA

What might R look like?

  • R = R1 ∪ R2
  • R = R1R2
  • R = (R1)*

(remember: we have NFAs for R1 and R2)

Part I (“if part”): DONE!

An Example

Transform (1(0 ∪ 1))* to an NFA

[Figure: the resulting NFA, with edges labeled ε, 1, and 1,0]

Part II (“only if part”)

A language is regular ⇒ Some regular expression R describes it

Turn DFA into equivalent regular expression

Proof Sketch

  1. DFA → Generalized NFA (GNFA)
     • NFA with edges labeled by regexps, 1 start state, and 1 accept state
  2. GNFA with k states → GNFA with 2 states
     • k > 2; delete states but maintain equivalence
  3. 2-state GNFA → regular expression R
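The Part I induction (regular expression to NFA) can be sketched in code: one small NFA per base case, and one combinator per inductive case. This is a minimal sketch, not the lecture's own construction; the slides show the NFAs only as figures, so the ε-edge wiring below follows one common textbook choice and is an assumption. The class `NFA` and the functions `symbol`, `epsilon`, `empty`, `union`, `concat`, and `star` are illustrative names.

```python
from itertools import count

_ids = count()  # source of fresh state names


class NFA:
    def __init__(self, start, accepts, delta):
        self.start = start
        self.accepts = set(accepts)
        self.delta = delta  # (state, symbol) -> set of states; symbol None means ε

    def _closure(self, states):
        """All states reachable from `states` using ε-edges only."""
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, None), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts_string(self, w):
        current = self._closure({self.start})
        for c in w:
            moved = set()
            for q in current:
                moved |= self.delta.get((q, c), set())
            current = self._closure(moved)
        return bool(current & self.accepts)


def _merge(*deltas):
    """Combine transition tables without sharing sets between NFAs."""
    out = {}
    for d in deltas:
        for key, targets in d.items():
            out.setdefault(key, set()).update(targets)
    return out


# Base cases: R = σ, R = ε, R = ∅
def symbol(c):
    s, t = next(_ids), next(_ids)
    return NFA(s, {t}, {(s, c): {t}})

def epsilon():
    s = next(_ids)
    return NFA(s, {s}, {})

def empty():
    return NFA(next(_ids), set(), {})


# Inductive cases. Each argument must be a freshly built NFA (do not reuse one twice).
def union(a, b):
    s = next(_ids)  # new start state with ε-edges to both old start states
    return NFA(s, a.accepts | b.accepts,
               _merge(a.delta, b.delta, {(s, None): {a.start, b.start}}))

def concat(a, b):
    links = {(q, None): {b.start} for q in a.accepts}
    return NFA(a.start, b.accepts, _merge(a.delta, b.delta, links))

def star(a):
    s = next(_ids)  # new accepting start state, so that ε is accepted
    links = {(q, None): {a.start} for q in a.accepts}
    return NFA(s, a.accepts | {s}, _merge(a.delta, links, {(s, None): {a.start}}))


# The example from the slides: (1(0 ∪ 1))*
n = star(concat(symbol("1"), union(symbol("0"), symbol("1"))))
for w in ["", "10", "1011", "1", "101", "0"]:
    print(repr(w), n.accepts_string(w))
```

The combinators take already-built NFAs rather than parsing a regexp string, which keeps the sketch close to the structure of the induction: the inductive hypothesis supplies NFAs for R1 and R2, and each combinator only adds ε-edges.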
GNFA Example & Definition

[Figure: a transition labeled 01*0]

A GNFA is a tuple (Q, Σ, δ, qstart, qaccept)

  • Q – set of states
  • Σ – finite alphabet (not regexps)
  • qstart – initial state (unique, no incoming edges)
    – ε transitions to old start state
  • qaccept – accepting state (unique, no outgoing edges)
    – ε transitions from old accept states
  • δ : (Q \ {qaccept}) × (Q \ {qstart}) → ℛ, where ℛ is the set of all regexps over Σ

Example: Any string matching 01*0 can cause the transition.

Step 1: DFA to GNFA

[Figure: a DFA with edges labeled a; a, b; and b]

What’s the corresponding GNFA?

Step 1: DFA to GNFA

[Figure: qstart, the DFA, and qaccept, connected by ε-edges]

  • Add unique and distinct start and accept states
  • Edges with multiple labels → regexp labels
  • If internal states (q1, q2) don’t have an edge between them, add one labeled with ∅

Step 2: Eliminate states from GNFA

While machine has more than 2 states:

Pick an internal state, rip it out and relabel the arrows with regular expressions to account for the missing state

[Figure: a path with edges 0 and 0 through a state with a self-loop 1 is replaced by a single edge labeled 01*0]

Example:

[Figure: q0 –ε→ q1 –b→ q2 –ε→ q3, with a self-loop a on q1 and a self-loop a ∪ b on q2]

[Figure: after ripping q1: q0 –a*b→ q2 –ε→ q3, with a self-loop a ∪ b on q2]

[Figure: after ripping q2: q0 –(a*b)(a ∪ b)*→ q3]

δ(q0, q3) = (a*b)(a ∪ b)*

Formally:

Add qstart and qaccept to create GNFA G

Run CONVERT(G) to eliminate states & get regexp:

  If #states = 2
    return the expression on the arrow going from qstart to qaccept
  If #states > 2
    select qrip ∈ Q different from qstart and qaccept
    define Q′ = Q – {qrip}
    define δ′ as:
      δ′(qi, qj) = δ(qi, qrip) δ(qrip, qrip)* δ(qrip, qj) ∪ δ(qi, qj)
    return CONVERT(G′)   /* recursion */

(what does this look like, pictorially?)
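The CONVERT procedure above can be sketched directly, run on the four-state example (q0, q1, q2, q3). The representation is an assumption made for illustration: regexps are stored as Python `re` pattern strings, ε is the empty string, ∅ is `None`, and a missing entry in `delta` means ∅. The helper names `r_union`, `r_concat`, and `r_star` are hypothetical.

```python
import re
from itertools import product


def r_union(a, b):
    """R ∪ S, using R ∪ ∅ = R."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def r_concat(*parts):
    """R1 R2 ... Rn, using R ⋅ ∅ = ∅."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def r_star(a):
    """R*, using ∅* = ε* = ε."""
    if a is None or a == "":
        return ""
    return f"(?:{a})*"


def convert(states, delta, q_start, q_accept):
    """Rip out internal states one at a time; return the final regexp (None for ∅)."""
    states = list(states)
    if len(states) == 2:
        return delta.get((q_start, q_accept))
    q_rip = next(q for q in states if q not in (q_start, q_accept))
    rest = [q for q in states if q != q_rip]
    loop = r_star(delta.get((q_rip, q_rip)))
    new_delta = {}
    for qi in rest:
        if qi == q_accept:
            continue  # no outgoing edges from q_accept
        for qj in rest:
            if qj == q_start:
                continue  # no incoming edges to q_start
            via = r_concat(delta.get((qi, q_rip)), loop, delta.get((q_rip, qj)))
            new_delta[(qi, qj)] = r_union(via, delta.get((qi, qj)))
    return convert(rest, new_delta, q_start, q_accept)


# The example GNFA: q0 -ε-> q1 -b-> q2 -ε-> q3, loop a on q1, loop a ∪ b on q2.
delta = {
    ("q0", "q1"): "",
    ("q1", "q1"): "a",
    ("q1", "q2"): "b",
    ("q2", "q2"): "(?:a|b)",
    ("q2", "q3"): "",
}
regexp = convert(["q0", "q1", "q2", "q3"], delta, "q0", "q3")
print(regexp)

# Compare with (a*b)(a ∪ b)* on every string over {a, b} up to length 6.
expected = re.compile(r"a*b[ab]*")
found = re.compile(regexp)
print(all(
    (found.fullmatch("".join(w)) is None) == (expected.fullmatch("".join(w)) is None)
    for n in range(7) for w in product("ab", repeat=n)
))
```

Unions and stars are wrapped in non-capturing groups so that plain string concatenation respects the precedence order * then ⋅ then ∪. The output is more heavily parenthesized than the hand-derived (a*b)(a ∪ b)*, but the bounded comparison suggests it describes the same language.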
Prove: CONVERT(G) is equivalent to G

Proof by induction on k (number of states in G)

Base Case: k = 2

Inductive Step: Assume claim is true for k-1 states

  • Prove that G and G′ are equivalent
  • By the induction hypothesis, G′ is equivalent to CONVERT(G′)

The Complete Picture

[Figure: DFA, NFA, Regular language, Regular expression]

Which language is regular?

C = { w | w has equal number of 1s and 0s }
NOT REGULAR

D = { w | w has equal number of occurrences of 01 and 10 }
REGULAR!

Next Steps

  • Read Sipser 1.4 in preparation for next lecture
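The claim that D is regular can be made concrete by exhibiting a regexp for it. The slides do not give one, so the candidate below is an assumption: ε ∪ 0 ∪ 1 ∪ 0Σ*0 ∪ 1Σ*1, that is, w is empty or begins and ends with the same symbol. A minimal sketch that compares it with the definition of D on all short binary strings (a bounded check, not a proof; `in_D` and `first_disagreement` are illustrative names):

```python
import re
from itertools import product

# Hypothetical candidate regexp for D: ε ∪ 0 ∪ 1 ∪ 0Σ*0 ∪ 1Σ*1, with Σ = (0 ∪ 1).
D_REGEXP = re.compile(r"(?:[01]|0[01]*0|1[01]*1)?")


def in_D(w):
    """Definition of D: equal number of occurrences of 01 and 10."""
    return w.count("01") == w.count("10")


def first_disagreement(max_len):
    """Return a string on which the regexp and the definition differ, or None."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            w = "".join(symbols)
            if (D_REGEXP.fullmatch(w) is not None) != in_D(w):
                return w
    return None


print(first_disagreement(12))
```

The intuition behind the candidate is that occurrences of 01 and 10 alternate along a binary string, so their counts are equal exactly when the string returns to the symbol it started with. No comparable regexp exists for C, since C is not regular.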
