
Regular Expressions (RegE)

Definition (Regular expressions (RegE), value of a RegE L(α))
Regular expressions α, β ∈ RegE(Σ) over a finite non–empty alphabet Σ = {x1, x2, ..., xn} and their values L(α) ≡ [α] are defined by induction.

Basis:
  expression α    for                 value L(α)
  λ               empty string        L(λ) = {λ}
  ∅               empty expression    L(∅) = {} ≡ ∅
  a               a ∈ Σ               L(a) = {a}

Induction:
  expression      value                       remark
  α + β           L(α + β) = L(α) ∪ L(β)
  αβ              L(αβ) = L(α)L(β)            the dot . may be used
  α∗              L(α∗) = L(α)∗
  (α)             L((α)) = L(α)               brackets do not change the value

The class of regular expressions over Σ, RegE(Σ), is the smallest class closed under the operations above.
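The inductive definition of L(α) can be sketched in Python as follows. The tuple encoding of expressions and the function name `lang` are choices made for this illustration only; because L(α) may be infinite, the sketch returns just the words of length at most n. Brackets need no case of their own, since nesting of tuples plays their role.

```python
# Expressions as nested tuples (an encoding chosen for this sketch):
# ("lam",)  ("empty",)  ("sym", "a")  ("+", e1, e2)  (".", e1, e2)  ("*", e)

def lang(e, n):
    """Return the words of L(e) that have length at most n."""
    op = e[0]
    if op == "lam":
        return {""}
    if op == "empty":
        return set()
    if op == "sym":
        return {e[1]} if n >= 1 else set()
    if op == "+":
        return lang(e[1], n) | lang(e[2], n)
    if op == ".":
        return {u + v for u in lang(e[1], n) for v in lang(e[2], n) if len(u + v) <= n}
    if op == "*":
        base = lang(e[1], n)
        words, frontier = {""}, {""}
        while frontier:  # append one more factor until nothing new of length <= n appears
            frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - words
            words |= frontier
        return words
    raise ValueError(f"unknown operator {op!r}")

zero, one, lam = ("sym", "0"), ("sym", "1"), ("lam",)
# (λ + 1)(01)∗(λ + 0)
alternating = (".", (".", ("+", lam, one), ("*", (".", zero, one))), ("+", lam, zero))
print(sorted(lang(alternating, 3), key=lambda w: (len(w), w)))
```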

Automata and Grammars Regular Expressions, Moore and Mealy Machine, 2 way FA 5 March 23, 2017 1 / 32 Examples, Precedence

Example (Regular Expressions)
The language of alternating 0’s and 1’s may be written either as (01)∗ + (10)∗ + 1(01)∗ + 0(10)∗ or as (λ + 1)(01)∗(λ + 0).
The language L((0∗10∗10∗1)∗0∗) = {w | w ∈ {0, 1}∗, |w|_1 = 3k, k ≥ 0}.
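Both claims of the example can be checked by brute force on short words. The sketch below uses Python's `re` module, where union is written `|` and (λ + x) becomes `x?`; the helper `words` and the length bound 8 are arbitrary choices for the illustration, so this is a sanity check, not a proof.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Yield every word over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

long_form = re.compile(r"(01)*|(10)*|1(01)*|0(10)*")
short_form = re.compile(r"1?(01)*0?")
ones_mod_3 = re.compile(r"(0*10*10*1)*0*")

# The two expressions for alternating words agree on all short words.
same = all(bool(long_form.fullmatch(w)) == bool(short_form.fullmatch(w))
           for w in words("01", 8))
# The third expression accepts exactly the words whose number of 1's is divisible by 3.
mod3_ok = all(bool(ones_mod_3.fullmatch(w)) == (w.count("1") % 3 == 0)
              for w in words("01", 8))
print(same, mod3_ok)
```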

Definition (Precedence) The star ∗ has the highest precedence, then concatenation ., and the union + has the lowest precedence. For example, 0 + 10∗ is read as 0 + (1(0∗)).

Theorem (RegE and DFA) Any language recognizable by a DFA can be expressed by a regular expression. Any language of a regular expression can be recognized by a λ-NFA (therefore also a DFA).

From DFA to Regular Expressions

Compare to Kleene Theorem.

$$R_{ij}^{(k+1)} = R_{ij}^{(k)} + R_{i(k+1)}^{(k)} \left(R_{(k+1)(k+1)}^{(k)}\right)^{*} R_{(k+1)j}^{(k)}$$

Finally, $\mathrm{RegE} = \bigoplus_{j \in F_A} R_{1j}^{(n)}$, the union over accepting states j.
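A minimal sketch of this recurrence in Python, under some assumptions of the illustration: states are numbered 1..n with state 1 initial, every R is kept as a pair (expression text, words of length at most `MAX_LEN`), and the base case puts λ on the diagonal plus the symbols labelling direct edges. The word sets are there only so that the result can be compared with the automaton; the produced expression is not simplified.

```python
from itertools import product

MAX_LEN = 6  # words longer than this are ignored in the check

EMPTY = ("∅", set())
LAMBDA = ("λ", {""})

def union(a, b):
    if a[0] == "∅":
        return b
    if b[0] == "∅":
        return a
    return (f"({a[0]}+{b[0]})", a[1] | b[1])

def concat(a, b):
    if a[0] == "∅" or b[0] == "∅":
        return EMPTY
    return (a[0] + b[0], {u + v for u in a[1] for v in b[1] if len(u + v) <= MAX_LEN})

def star(a):
    words, frontier = {""}, {""}
    while frontier:
        frontier = {u + v for u in frontier for v in a[1] if len(u + v) <= MAX_LEN} - words
        words |= frontier
    return (f"({a[0]})*", words)

def dfa_to_regex(n, delta, accepting):
    """States are 1..n, state 1 is initial; delta maps (state, symbol) to a state."""
    states = range(1, n + 1)
    R = {(i, j): (LAMBDA if i == j else EMPTY) for i in states for j in states}
    for (i, a), j in delta.items():
        R[i, j] = union(R[i, j], (a, {a}))
    for k in states:  # additionally allow state k as an intermediate state
        R = {(i, j): union(R[i, j], concat(concat(R[i, k], star(R[k, k])), R[k, j]))
             for i in states for j in states}
    result = EMPTY
    for j in accepting:
        result = union(result, R[1, j])
    return result

# Hypothetical example DFA: state 1 = even number of 1's (accepting), state 2 = odd.
delta = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
text, words = dfa_to_regex(2, delta, accepting={1})
print(text)
```

Without simplification the expression grows roughly fourfold with each added state, which is where the 4ⁿ factor in the conversion cost comes from.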

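From Regular Expression to a λ–NFA automaton

By structural induction on R. In each step, we construct a λ-NFA E recognizing the language L(R) = L(E) with additional properties:
  Exactly one accepting state.
  No edges into the initial state.
  No edges from the accepting state.

Basis:
  empty string λ: a single λ edge from the initial state to the accepting state.
  empty set ∅: an initial state and an accepting state with no edge between them.
  a single letter a ∈ Σ: a single edge labelled a from the initial state to the accepting state.

Induction:
  Union R + S: a new initial state with λ edges to the initial states of R and S, and λ edges from their accepting states to a new accepting state.
  Concatenation RS: a λ edge from the accepting state of R to the initial state of S.
  Closure R∗: a new initial state and a new accepting state, with λ edges from the new initial state to the initial state of R and to the new accepting state, and from the accepting state of R back to its initial state and to the new accepting state.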
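The construction can be sketched in Python roughly as follows. The tuple encoding of expressions, the edge-list representation (label `None` standing for λ) and all function names are assumptions of this sketch; the simulation at the end is only there to try the automaton out.

```python
from itertools import count

# Expressions: ("lam",) ("empty",) ("sym", "a") ("+", e1, e2) (".", e1, e2) ("*", e)

def build(e, edges, fresh):
    """Append the λ-NFA fragment for e to `edges`; return (initial, accepting)."""
    op = e[0]
    if op == ".":  # concatenation: a λ edge from the end of R to the start of S
        s1, t1 = build(e[1], edges, fresh)
        s2, t2 = build(e[2], edges, fresh)
        edges.append((t1, None, s2))
        return s1, t2
    s, t = next(fresh), next(fresh)
    if op == "lam":
        edges.append((s, None, t))
    elif op == "sym":
        edges.append((s, e[1], t))
    elif op == "+":
        for sub in e[1:]:
            si, ti = build(sub, edges, fresh)
            edges += [(s, None, si), (ti, None, t)]
    elif op == "*":
        s1, t1 = build(e[1], edges, fresh)
        edges += [(s, None, s1), (t1, None, t), (t1, None, s1), (s, None, t)]
    elif op != "empty":  # "empty": two states and no edge
        raise ValueError(f"unknown operator {op!r}")
    return s, t

def closure(states, edges):
    """All states reachable from `states` by λ edges only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, r in edges:
            if p == q and label is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(e, w):
    edges = []
    start, final = build(e, edges, count())
    current = closure({start}, edges)
    for x in w:
        step = {r for p, label, r in edges if p in current and label == x}
        current = closure(step, edges)
    return final in current

zero, one, lam = ("sym", "0"), ("sym", "1"), ("lam",)
alternating = (".", (".", ("+", lam, one), ("*", (".", zero, one))), ("+", lam, zero))
print(accepts(alternating, "1010"), accepts(alternating, "0110"))
```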

Converting Among Representations

Converting NFA to DFA
  λ closure in O(n³): search of n states multiplied by n² arcs for λ transitions.
  Subset construction: a DFA with possibly 2ⁿ states. For each state, O(n³) time to compute transitions.
  In total O(n³2ⁿ).
Converting DFA to NFA
  Just modify the transition table by putting set-brackets around states and adding a column for λ in the case of a λ−NFA. This takes O(n).
Automaton to Regular Expression Conversion
  O(n³4ⁿ) (see construction in Section 3.2.1)
RE to Automaton Conversion
  λ−NFA in time O(n).

Conversion diagram (figure): nodes λ−NFA, NFA, DFA, RE with edge costs O(n), O(n³2ⁿ), O(n³4ⁿ).
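The NFA to DFA step can be sketched as below. The dictionary encoding (the key `(q, "")` holding λ transitions) and the example automaton are assumptions of this illustration; only subsets reachable from the start set are generated, so the 2ⁿ bound is a worst case.

```python
from itertools import product

def lambda_closure(states, delta):
    """States reachable by λ transitions; (q, "") is the λ column in this sketch."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(alphabet, delta, start, accepting):
    start_set = lambda_closure({start}, delta)
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        S = todo.pop()
        for a in alphabet:
            step = {r for q in S for r in delta.get((q, a), ())}
            T = lambda_closure(step, delta)
            dfa_delta[S, a] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return seen, dfa_delta, start_set, dfa_accepting

def run_dfa(dfa_delta, start_set, dfa_accepting, w):
    S = start_set
    for x in w:
        S = dfa_delta[S, x]
    return S in dfa_accepting

# Hypothetical NFA: words over {0,1} whose second-to-last symbol is 1.
nfa = {(0, "0"): {0}, (0, "1"): {0, 1}, (1, "0"): {2}, (1, "1"): {2}}
dfa_states, dfa_delta, dfa_start, dfa_accepting = subset_construction("01", nfa, 0, {2})
print(len(dfa_states))
```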

Automata with the output

Definition (Moore machine)

A Moore machine is a six–tuple A = (Q, Σ, Y, δ, µ, q0) consisting of:
  Q    finite non–empty set of states
  Σ    finite non–empty set of symbols (input alphabet)
  Y    finite non–empty set of symbols (output alphabet)
  δ    a mapping Q × Σ → Q (transition function)
  µ    a mapping Q → Y (output function)
  q0 ∈ Q (initial state)

The output function may imitate final states: F ⊆ Q may be replaced by the output function µ : Q → {0, 1} defined by µ(q) = 0 if q ∉ F, µ(q) = 1 if q ∈ F.

Moore Machine Example

Example (Tennis Game Score)
A machine calculates the tennis score.
  Input alphabet: ID of the player who scored a point.
  Output alphabet & states: the score (Q = Y and µ(q) = q).

  State/output   A       B
  00:00          15:00   00:15
  15:00          30:00   15:15
  15:15          30:15   15:30
  00:15          15:15   00:30
  30:00          40:00   30:15
  30:15          40:15   30:30
  30:30          40:30   30:40
  15:30          30:30   15:40
  00:30          15:30   00:40
  40:00          A       40:15
  40:15          A       40:30
  40:30          A       deuce
  30:40          deuce   B
  15:40          30:40   B
  00:40          15:40   B
  deuce          A:40    40:B
  A:40           A       deuce
  40:B           deuce   B
  A              15:00   00:15
  B              15:00   00:15
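The tennis machine can be run as a Moore machine with a few lines of Python. The dictionary below transcribes the transition table; the entry for 00:40 on input A is taken to be 15:40, which is an assumption where the extracted table is unclear. The function name `moore_output` is hypothetical, and the output of the start state is not emitted.

```python
# state -> (next state on A, next state on B); mu(q) = q because Q = Y
TENNIS = {
    "00:00": ("15:00", "00:15"), "15:00": ("30:00", "15:15"),
    "15:15": ("30:15", "15:30"), "00:15": ("15:15", "00:30"),
    "30:00": ("40:00", "30:15"), "30:15": ("40:15", "30:30"),
    "30:30": ("40:30", "30:40"), "15:30": ("30:30", "15:40"),
    "00:30": ("15:30", "00:40"), "40:00": ("A", "40:15"),
    "40:15": ("A", "40:30"),     "40:30": ("A", "deuce"),
    "30:40": ("deuce", "B"),     "15:40": ("30:40", "B"),
    "00:40": ("15:40", "B"),     # assumed: one point for A from 00:40
    "deuce": ("A:40", "40:B"),   "A:40": ("A", "deuce"),
    "40:B": ("deuce", "B"),      "A": ("15:00", "00:15"),
    "B": ("15:00", "00:15"),
}

def moore_output(table, state, word):
    """Return the outputs of the states entered while reading `word`."""
    out = []
    for player in word:
        state = table[state]["AB".index(player)]
        out.append(state)  # mu(q) = q
    return out

print(" . ".join(moore_output(TENNIS, "00:00", "AABA")))
```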

Mealy machine

Definition (Mealy machine)

A Mealy machine is a six–tuple A = (Q, Σ, Y, δ, λM, q0) consisting of:
  Q    finite non–empty set of states
  Σ    finite non–empty set of symbols (input alphabet)
  Y    finite non–empty set of symbols (output alphabet)
  δ    a mapping Q × Σ → Q (transition function)
  λM   a mapping Q × Σ → Y (output function)
  q0 ∈ Q (initial state)

The output is determined by a state and the input symbol.
A Mealy machine is more general than a Moore machine.
The output function µ of a Moore machine may be replaced as follows:

  ∀x ∈ Σ λM(q, x) = µ(q)   or   ∀x ∈ Σ λM(q, x) = µ(δ(q, x))

Mealy Machine Example

Example (Mealy Machine)
The automaton for integer division of the input by 8 (the remainder is discarded).
  The input is shifted by three bits.
  We need to remember the last three bits: a three–bit dynamic memory.

  State\symbol   0       1
  → 000          000/0   001/0
  001            010/0   011/0
  010            100/0   101/0
  011            110/0   111/0
  100            000/1   001/1
  101            010/1   011/1
  110            100/1   101/1
  111            110/1   111/1

After three steps it computes correctly regardless of the initial state.
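The table follows a simple rule: the state holds the last three input bits, the new bit is shifted in on the right, and the bit shifted out on the left is the output. A minimal Python sketch of that reading of the table (the function names are hypothetical):

```python
def mealy_step(state, bit):
    """state is a three-character bit string; returns (next state, output)."""
    return state[1:] + bit, state[0]

def mealy_output(state, word):
    out = []
    for bit in word:
        state, y = mealy_step(state, bit)
        out.append(y)
    return "".join(out)

result = mealy_output("000", "1101010")
print(result, int("1101010", 2) // 8 == int(result, 2))
```

The last check compares the output, read as a binary number, with integer division of the input by 8.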

Extended Output Function

For any word in the input alphabet Σ∗ we get a word in the output alphabet Y∗.

Moore machine
  output function µ : Q → Y
  µ∗ : Q × Σ∗ → Y∗
  µ∗(q, λ) = λ (sometimes µ∗(q, λ) = µ(q))
  µ∗(q, wx) = µ∗(q, w).µ(δ∗(q, wx))
  Example: µ∗(00:00, AABA) = (00:00 .) 15:00 . 30:00 . 30:15 . 40:15

Mealy machine
  output function λM : Q × Σ → Y
  λM∗ : Q × Σ∗ → Y∗
  λM∗(q, λ) = λ
  λM∗(q, wx) = λM∗(q, w).λM(δ∗(q, w), x)
  Example: λM∗(000, 1101010) = 0001101

Lemma (Moore and Mealy Machines Reductions)
For any Moore machine there exists a Mealy machine mapping each input word to the same output word. For any Mealy machine there exists a Moore machine mapping each input word to the same output word.

Proof.
  ⇒ Given a Moore machine, take the Mealy machine B = (Q, Σ, Y, δ, λM, q0) where λM(q, x) = µ(δ(q, x)).
  ⇐ Given a Mealy machine, we define the states of the Moore machine as Q × Y, with δ′([q, y], x) = [δ(q, x), λM(q, x)] and µ([q, y]) = y.
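Both directions of the proof can be sketched on dictionaries. The function names, the dictionary encoding and the parameter `y0` (an arbitrary output attached to the Moore start state, which the proof does not specify) are assumptions of this sketch; it also assumes the convention µ∗(q, λ) = λ, so the start state's output is never emitted.

```python
def moore_to_mealy(delta, mu):
    """λM(q, x) = µ(δ(q, x)); states and transitions stay the same."""
    return {(q, x): mu[p] for (q, x), p in delta.items()}

def mealy_to_moore(outputs, delta, lam, q0, y0):
    """Moore states are pairs (q, y) from Q × Y."""
    states = {q for q, _ in delta} | set(delta.values()) | {q0}
    m_delta = {((q, y), x): (p, lam[q, x]) for (q, x), p in delta.items() for y in outputs}
    mu = {(q, y): y for q in states for y in outputs}
    return m_delta, mu, (q0, y0)

def run_mealy(delta, lam, q, word):
    out = []
    for x in word:
        out.append(lam[q, x])
        q = delta[q, x]
    return out

def run_moore(delta, mu, q, word):
    out = []
    for x in word:
        q = delta[q, x]
        out.append(mu[q])  # output of the state just entered
    return out

# Example: the division-by-8 Mealy machine with three-bit states.
bits = [format(i, "03b") for i in range(8)]
delta = {(q, x): q[1:] + x for q in bits for x in "01"}
lam = {(q, x): q[0] for q in bits for x in "01"}

m_delta, mu, start = mealy_to_moore("01", delta, lam, "000", "0")
word = "1101010"
print(run_moore(m_delta, mu, start, word) == run_mealy(delta, lam, "000", word))
```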

Further Generalisation

A finite automaton performs the following actions:
  reads a symbol
  changes its state
  moves its reading head to the right
The head is not allowed to move to the left.

What happens, if we allow the head to move left and right? The automaton does not write anything on the tape!

Two way finite automata

Definition (Two way finite automata)

A two–way deterministic finite automaton is a five–tuple A = (Q, Σ, δ, q0, F), where
  Q is a finite set of states,
  Σ is a finite set of input symbols,
  the transition function δ is a mapping Q × Σ → Q × {−1, 0, 1}, extended by head movements,
  q0 ∈ Q is the initial state,
  F ⊆ Q is the set of accepting states.

We may represent it by a graph or a table.

Two–way DFA computation

Definition (Two–way DFA computation)
A string w is accepted by the two–way DFA iff:
  the computation started in the initial state at the left–most symbol of w,
  the first time the head left w to the right, the automaton was in an accepting state,
  the computation is not defined outside the word w (the computation ends without accepting w).

We may add special end–symbols # ∉ Σ to any word.
If L(A) = {#w# | w ∈ L ⊆ Σ∗} is regular, then L is also regular:
  L = ∂_# ∂_#^R (L(A) ∩ #Σ∗#)
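The acceptance condition can be simulated directly. The sketch below uses no end–symbols and treats leaving the word to the left, a missing transition, or a repeated configuration (state, position) as rejection; since the automaton is deterministic, a repeated configuration means a cycle. The three-state transition table is a small automaton chosen for illustration; tracing it by hand suggests it accepts the words over {a, b} that contain a b but no two adjacent b's, and the code checks this on short words.

```python
from itertools import product

def accepts_2dfa(delta, start, accepting, w):
    """delta maps (state, symbol) to (state, move) with move in {-1, 0, +1}."""
    state, pos, seen = start, 0, set()
    while 0 <= pos < len(w):
        if (state, pos) in seen or (state, w[pos]) not in delta:
            return False  # cycle or undefined transition
        seen.add((state, pos))
        state, move = delta[state, w[pos]]
        pos += move
    # accept only if the head left the word to the right in an accepting state
    return pos == len(w) and state in accepting

delta = {
    ("p", "a"): ("p", +1), ("p", "b"): ("q", +1),
    ("q", "a"): ("q", +1), ("q", "b"): ("r", -1),
    ("r", "a"): ("p", +1), ("r", "b"): ("r", -1),
}

agree = all(
    accepts_2dfa(delta, "p", {"q"}, w) == ("b" in w and "bb" not in w)
    for n in range(8)
    for w in map("".join, product("ab", repeat=n))
)
print(agree)
```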

Two–way automaton example

Example (Two–way automaton example)

Let A = (Q, Σ, δ, q1, F). We define a two–way DFA B = (Q ∪ Q′ ∪ Q″ ∪ {q0, qN, qF}, Σ, δ′, q0, {qF}) accepting the language L(B) = {#u# | uu ∈ L(A)} (it is neither left nor right quotient!):

  δ′     x ∈ Σ      #          remark
  q0     qN, −1     q1, +1     q1 is the initial state of A
  q      p, +1      q′, −1     p = δ(q, x)
  q′     q′, −1     q″, +1
  q″     p″, +1     qF, +1     q ∈ F, p = δ(q, x)
  q″     p″, +1     qN, +1     q ∉ F, p = δ(q, x)
  qN     qN, +1     qN, +1
  qF     qN, +1     qN, +1

Theorem Languages accepted by two–way DFA are exactly regular languages.

Two–way DFA and Regular Languages

Proof: DFA → two–way DFA

To a DFA we add the move of the head to the right: A = (Q, Σ, δ, q0, F) → 2A = (Q, Σ, δ′, q0, F), where δ′(q, x) = (δ(q, x), +1).

For the other direction, we need some preparation.

The influence of u ∈ Σ∗ on the computation over v ∈ Σ∗:
  the first time we leave u to the right,
  we leave v to the left and return back to v.

Function fu describing the computation of a two–way DFA over u

We define fu : Q ∪ {q0′} → Q ∪ {0}:
  fu(q0′) is the state of the first transition to the right in case the computation begins at the left end of u in the state q0,
  fu(p), p ∈ Q, is the state of the transition to the right in case the computation begins at the right end of u in the state p,
  the symbol 0 denotes failure (a cycle, or the head moves left from the initial symbol).

We define similarity ∼ on strings: u ∼ w ⇔def fu = fw, that is, strings are similar iff they define identical functions f.

Languages recognized by two–way DFA are regular
Similarity ∼ is a right congruence with a finite index. According to the Myhill–Nerode theorem, the language L(A) is regular.

Constructive proof

We need to transcribe the left–right movement into a linear computation. We are interested in accepting computations only. We focus on transitions in cuts between input symbols.

Observations:
  The direction of movement repeats (right, left).
  The first and the last transitions are to the right.
  The automaton is deterministic, so any accepting computation is without cycles.
  The first and the last cut contain only one state.

Algorithm: 2DFA → NFA
  Find all possible cuts, i.e. state sequences (there is a finite number of them).
  Define non–deterministic transitions between cuts according to the input symbol.
  We re–construct the computation by composing cuts like a puzzle.

Formal reduction two–way DFA to NFA

Let A = (Q, Σ, δ, q0, F) be a two–way DFA. We define an equivalent NFA B = (Q′, Σ, δ′, (q0), F′) as follows:
  Q′: all possible correct transition sequences, i.e.
    sequences of states (q1, ..., qk), qi ∈ Q,
    with an odd length (k = 2m + 1),
    where no state repeats at odd nor at even positions: (∀i ≠ j)(q2i ≠ q2j) & (∀i ≠ j)(q2i+1 ≠ q2j+1)
  F′ = {(q) | q ∈ F}, sequences of length 1
  δ′(c, x) = {d | d ∈ Q′ & c → d is a locally consistent transition for x}:
    turns: δ(ci, x) = (ci+1, −1) and δ(di, x) = (di+1, +1)
    the rest cz, dz: |cz| = |dz| & (∀i = 2m + 1 ≤ |c|) δ(ci, x) = (di, +1) & (∀i = 2m ≤ |c|) δ(di, x) = (ci, −1).

L(A) = L(B)
The trajectory of the two–way DFA A corresponds to cuts in the NFA B, therefore L(A) = L(B).

Example (Reduction of a two–way DFA to an NFA)
Let us have the two–way DFA:

          a       b
  → p     p,+1    q,+1
  ∗ q     q,+1    r,−1
    r     p,+1    r,−1

Possible cuts and their transitions:
  leftwards only r: all even positions are r, that means there is only one even position,
  possible cuts: (p), (q), (p,r,q), (q,r,p).

              a               b
  → (p)       (p)             (q)
  ∗ (q)       (q), (q,r,p)
    (p,r,q)
    (q,r,p)                   (q)

Non–accepting computation example (figure).
Resulting NFA (figure): states (p), (q), (q,r,p).

Finite Automata Summary

Finite Automata
  DFA, minimal DFA
  NFA 2ⁿ, λ–NFA, two–way FA nⁿ
  Regular Expressions
Automata and Languages
  regular languages
  closed under set operations
  closed under string operations
  closed under substitution, homomorphism, inverse homomorphism
  all finite automata and regular expressions describe the same class of languages.
Key theorems
  Myhill–Nerode theorem (congruence)
  Kleene theorem (elementary languages and operations)
  Pumping lemma.
Automata with the output
  Moore machine
  Mealy machine.

Pattern Searching in Text

Static text is indexed, so it is better searched by other means than RegE. RegE are useful for dynamic text (e.g. messages).

Example (Search for streets in addresses on the web)
  Street identification: Street|St\.|Avenue|Ave\.|Road|Rd\.
  the name before: '[A-Z][a-z]*( [A-Z][a-z]*)*'
  house number: [0-9]+[A-Z]?
  all together: '[0-9]+[A-Z]? [A-Z][a-z]*( [A-Z][a-z]*)* (Street|St\.|Avenue|Ave\.|Road|Rd\.)'

We are missing:
  Boulevard, Place, Way
  Streets without any identifier (almost all Czech streets)
  Street names with numbers.
  ...
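The combined pattern can be tried with Python's `re` module. The sample sentence is made up for the illustration, and the alternation of street identifiers is put in parentheses so that it binds as one unit after the name.

```python
import re

STREET = re.compile(
    r"[0-9]+[A-Z]? "                        # house number, optional letter
    r"[A-Z][a-z]*( [A-Z][a-z]*)* "          # capitalised name words
    r"(Street|St\.|Avenue|Ave\.|Road|Rd\.)" # street identifier
)

text = "Send it to 123A Main Street or to 45 Old Mill Rd. by Friday."
found = [m.group(0) for m in STREET.finditer(text)]
print(found)

# An address without an identifier is missed, as the list above warns.
print(STREET.search("Narodni 25, Praha"))
```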

String Substitution, (String) Homomorphism

Definition (String Substitution, (String) Homomorphism)
We have a finite alphabet Σ. For each x ∈ Σ we have σ(x), a language over the alphabet Yx. Further, we define:
  σ(λ) = {λ}
  σ(u.v) = σ(u).σ(v)

The mapping σ : Σ∗ → P(Y∗), where Y = ⋃_{x∈Σ} Yx, is a substitution.
  σ(L) = ⋃_{w∈L} σ(w)
  An e–free (λ–free) substitution is a substitution where no σ(x) contains λ.
  For w = a1 ... an ∈ Σⁿ, h(w) = h(a1) ... h(an). Further, h(L) = {h(w) | w ∈ L}.

Example (substitution)
σ(0) = {a^i b^j | i, j ≥ 0}, σ(1) = {cd}, σ(010) = {a^i b^j cd a^k b^l | i, j, k, l ≥ 0}

(String) Homomorphism

Definition ((String) Homomorphism)

A homomorphism h is a special case of a substitution where h(x) = wx is a single string for every x ∈ Σ. If wx ≠ λ for all x, it is an e–free (λ–free) homomorphism. Inverse homomorphism: h⁻¹(L) = {w | h(w) ∈ L}.

Example (homomorphism) The function h defined by h(0) = ab and h(1) = λ is a homomorphism. For example, h(0011) = abab. For L = 10∗1, h(L) = (ab)∗.

Theorem (Closure under homomorphism) If the language L and all languages σ(x), x ∈ Σ, are regular, then so are σ(L), h(L) and h⁻¹(L).

Notes to Closure Properties

Simplification of the automata design

L.∅ = ∅.L = ∅
{λ}.L = L.{λ} = L
(L∗)∗ = L∗
(L1 ∪ L2)∗ = L1∗(L2.L1∗)∗ = L2∗(L1.L2∗)∗
(L1.L2)^R = L2^R.L1^R

∂w(L1 ∪ L2) = ∂w(L1) ∪ ∂w(L2)
∂w(Σ∗ − L) = Σ∗ − ∂w L

h(L1 ∪ L2) = h(L1) ∪ h(L2)

Proof of non–regularity
L = {w | w ∈ {0, 1}∗, |w|_0 = |w|_1} is not regular: if it were, its intersection with the regular language {0^i 1^j | i, j ≥ 0} would be regular as well, but L ∩ {0^i 1^j | i, j ≥ 0} = {0^i 1^i | i ≥ 0} is not regular (pumping lemma).

Homomorphism preserves regularity

Theorem If L is a regular language over alphabet Σ, and h is a homomorphism on Σ, then h(L) is also regular.

Proof. Let L = L(R) for some regular expression R. For an expression E, let h(E) denote the expression obtained from E by replacing each symbol a with the string h(a). The proof is done by structural induction on subexpressions E of R: we claim L(h(E)) = h(L(E)).
Basis: h({λ}) = {λ}, h(∅) = ∅. If E = a then L(E) = {a}, so h(L(E)) = {h(a)}. Thus, L(h(E)) = {h(a)}.
Induction:
  Union: L(h(F + G)) = L(h(F) + h(G)) = L(h(F)) ∪ L(h(G)) and h(L(F + G)) = h(L(F) ∪ L(G)) = h(L(F)) ∪ h(L(G)). The right sides are equal by the inductive hypothesis, therefore the left sides are also equal.
  The proofs for concatenation and closure are similar.

Inverse Homomorphism

Definition (Inverse homomorphism) Suppose h is a homomorphism from some alphabet Σ to strings in another alphabet T. Then h⁻¹(L), ’h inverse of L’, is the set of strings w in Σ∗ such that h(w) is in L.

Example Let L = (00 + 1)∗, h(a) = 01 and h(b) = 10. We claim h⁻¹(L) = (ba)∗.

Proof: h((ba)∗) ⊆ L is easy to see, since h(ba) = 1001. Any other w generates an isolated 0 (4 cases to consider).

Figure: a homomorphism applied in the forward and inverse direction.
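The claim of the example can be checked by brute force on short words with Python's `re` module. The length bound 8 and the helper name `apply_h` are arbitrary choices of this sketch, and a finite check is of course not a proof.

```python
import re
from itertools import product

h = {"a": "01", "b": "10"}
L = re.compile(r"(00|1)*")
claimed_inverse = re.compile(r"(ba)*")

def apply_h(w):
    """Apply the homomorphism symbol by symbol."""
    return "".join(h[x] for x in w)

# w is in h^-1(L) exactly when h(w) is in L; compare with membership in (ba)*.
agree = all(
    bool(L.fullmatch(apply_h(w))) == bool(claimed_inverse.fullmatch(w))
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
)
print(apply_h("ba"), agree)
```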

Inverse Homomorphism DFA

Theorem If h is a homomorphism from alphabet Σ to alphabet T, and L is a regular language over T, then h⁻¹(L) is also a regular language.

Proof. The proof starts with a DFA A for L. We construct a DFA for h⁻¹(L).

For A = (Q, T, δ, q0, F) we define B = (Q, Σ, γ, q0, F) where γ(q, a) = δ∗(q, h(a)) (δ∗ operates on strings).
By induction on |w|, γ∗(q0, w) = δ∗(q0, h(w)).
Therefore, B accepts exactly those strings w that are in h⁻¹(L).

Figure: the DFA for h⁻¹(L) applies h to its input, and then simulates the DFA for L.
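The construction of γ is a one-liner once δ∗ is available. The sketch below applies it to a hand-made DFA for (00 + 1)∗ with h(a) = 01 and h(b) = 10; the state names `s0`, `s1`, `dead` and the function names are assumptions of the illustration.

```python
from itertools import product

def run(delta, q, word):
    """δ*(q, word): follow the transitions of a DFA along a string."""
    for x in word:
        q = delta[q, x]
    return q

def inverse_hom_dfa(states, sigma, delta, h):
    """γ(q, a) = δ*(q, h(a)); states, start state and accepting states are unchanged."""
    return {(q, a): run(delta, q, h[a]) for q in states for a in sigma}

# DFA A for (00 + 1)*: s0 = every block of 0's so far is complete, s1 = one pending 0.
states = ["s0", "s1", "dead"]
delta = {
    ("s0", "0"): "s1", ("s0", "1"): "s0",
    ("s1", "0"): "s0", ("s1", "1"): "dead",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
h = {"a": "01", "b": "10"}

gamma = inverse_hom_dfa(states, "ab", delta, h)
accepted = [w for n in range(5) for w in map("".join, product("ab", repeat=n))
            if run(gamma, "s0", w) == "s0"]
print(accepted)
```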

Visit every state example

Example
Suppose A = (Q, Σ, δ, q0, F) is a DFA. Let L be the language of all strings w in Σ∗ such that δ∗(q0, w) is in F and for every state q ∈ Q there is some prefix xq of w such that δ∗(q0, xq) = q. This language is regular.

  M = L(A), the language accepted by the DFA A in the usual way.
  We define a new alphabet T of triples {[paq]; p, q ∈ Q, a ∈ Σ, δ(p, a) = q}.
  We define the homomorphism h([paq]) = a for all p, q, a.
  The language L1 = h⁻¹(M) is regular since M is regular (DFA and inverse homomorphism).
  h⁻¹(101) includes 2³ = 8 strings, like [p1p][q0q][p1p] ∈ {[p1p], [q1q]}{[p0q], [q0q]}{[p1p], [q1q]}.
  We construct L from L1 (next slide).

Example DFA (figure): states p, q with δ(p, 1) = p, δ(p, 0) = q, δ(q, 0) = δ(q, 1) = q.

Constructing the language L from the language M by applying operations that preserve regularity of languages:

  Enforce the start at q0. Define E1 = ⋃_{a∈Σ, q∈Q} {[q0aq]} = {[q0a1q0], [q0a2q1], ..., [q0amqn]}. Then L2 = L1 ∩ L(E1.T∗).
  Adjacent states must be equal. Define the non–matching pairs E2 = ⋃_{q≠r; p,q,r,s∈Q; a,b∈Σ} {[paq][rbs]}. Define L3 = L2 − L(T∗.E2.T∗).
  It ends in an accepting state, since we started from M, the language of accepting computations of the DFA A.
  All states. For each state q ∈ Q, define Eq to be the regular expression that is the sum of all the symbols in T such that q appears in neither the first nor the last position. We subtract L(Eq∗) from L3: L4 = L3 − ⋃_{q∈Q} L(Eq∗).
  Remove states, leave symbols: L = h(L4).

We conclude L is regular.

Decision Properties of Regular Languages

Lemma (Testing Emptiness of Regular Languages) For finite automata, it is a question of graph reachability of any final state from the initial one. Reachability is O(n²).

Lemma For a regular expression, we can convert it to a λ−NFA in O(n) time and then check reachability.

It can also be done by direct inspection:
Basis: ∅ denotes the empty language; λ and a are not empty.
Induction:
  R = R1 + R2 is empty iff both L(R1) and L(R2) are empty.
  R = R1R2 is empty iff either L(R1) or L(R2) is empty.
  R = R1∗ is never empty, it includes λ.
  R = (R1) is empty iff R1 is empty, since they are the same language.
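The direct inspection translates into a short recursive function. The tuple encoding of expressions is an assumption of this sketch; brackets need no case because nesting of tuples replaces them.

```python
# Expressions: ("lam",) ("empty",) ("sym", "a") ("+", e1, e2) (".", e1, e2) ("*", e)

def is_empty(e):
    """Decide whether L(e) is the empty language, by structural induction."""
    op = e[0]
    if op == "empty":
        return True
    if op in ("lam", "sym"):
        return False
    if op == "+":
        return is_empty(e[1]) and is_empty(e[2])
    if op == ".":
        return is_empty(e[1]) or is_empty(e[2])
    if op == "*":
        return False  # the closure always contains λ
    raise ValueError(f"unknown operator {op!r}")

a, empty = ("sym", "a"), ("empty",)
print(is_empty((".", a, empty)), is_empty(("+", a, empty)), is_empty(("*", empty)))
```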

Testing Membership in a Regular Language

Given a string w, |w| = n, and a regular language L, is w ∈ L?
  DFA: Run the automaton; with a suitable representation and constant time transitions, it is O(n).
  NFA with s states: running time O(ns²). Each input symbol can be processed by taking the previous set of states, which numbers at most s states.
  λ−NFA: first compute the λ−closure. Then, for each symbol, process it and compute the λ−closure of the result.
  For a regular expression of size s, we convert it to a λ−NFA with at most 2s states and then simulate it, taking O(ns²).
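The NFA case can be sketched as a set-of-states simulation: each symbol maps the current set of at most s states to the next one. The dictionary encoding and the example NFA (words whose second-to-last symbol is 1) are assumptions of this illustration; λ transitions are left out for brevity.

```python
from itertools import product

def nfa_accepts(delta, start, accepting, w):
    """delta maps (state, symbol) to a set of successor states."""
    current = {start}
    for x in w:
        # at most s states, each with at most s successors: O(s^2) per symbol
        current = {r for q in current for r in delta.get((q, x), ())}
    return bool(current & accepting)

nfa = {(0, "0"): {0}, (0, "1"): {0, 1}, (1, "0"): {2}, (1, "1"): {2}}
print(nfa_accepts(nfa, 0, {2}, "0110"), nfa_accepts(nfa, 0, {2}, "0101"))
```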
