Regular Expressions (1) Let Σ be an alphabet. The set of regular expressions over Σ is recursively defined: • ∅ is a denoting the set ∅. Regular Expressions and Regular • ǫ is a regular expression denoting the set {ǫ}. Grammars • For each a ∈ Σ: a is a regular expression denoting {a}. • If r and s are regular expressions denoting R and S, then Laura Kallmeyer (r + s), (rs) and (r∗) are also regular expressions denoting Heinrich-Heine-Universit¨at D¨usseldorf R ∪ S, RS and R∗ respectively. Sommersemester 2011

Regular Grammars 1 12-13 April 2011 Regular Grammars 3 12-13 April 2011

Kallmeyer Parsing Kallmeyer Parsing

Regular Expressions (2) Assume that ∗ has a higher priority than concatenation which in turn has a higher priority than +. + ∗ Overview a is an abbreviation for aa . x L x x 1. Regular expressions For a regular expression , we define ( ) as the set denoted by . 2. NFA with empty transitions 3. FSA and regular expressions 4. Regular Grammars 5. Regular Grammars and NFAs

Regular Grammars 2 12-13 April 2011 Regular Grammars 4 12-13 April 2011 Regular Expressions (3) NFA with empty transitions (2)

The following holds: Let hQ, Σ, δ, q0, F i be an NFA with empty transitions. For each language L: there is a regular expression x with L = L(x) Definition of δˆ:

iff there is a FSA M with L = L(M). ′ ′ • ǫ-closure(q) := {q | there is a path from q to q containing only ǫ-transitions}∪{q} In order to show this, we • δˆ(q, ǫ) := ǫ-closure(q) • define NFAs with empty transitions and show their equivalence ˆ to NFAs; • δ(q,wa) := Sp∈P ǫ-closure(p) where P = {p | there is an r ∈ δˆ(q, w) such that p ∈ δ(r, a)} • show how to construct an equivalent NFA with empty transitions for a given regular expression; • show how to construct an equivalent regular expression for a given DFA.

Regular Grammars 5 12-13 April 2011 Regular Grammars 7 12-13 April 2011

Kallmeyer Parsing Kallmeyer Parsing

NFA with empty transitions (1) NFA with empty transitions (3) ′ ′ A NFA with empty transitions is defined like an NFA except that δ Construction of an equivalent NFA hQ, Σ, δ , q0, F i (without empty allows for ε as second argument: transitions) for a given NFA hQ, Σ, δ, q0, F i with empty transitions:

′ ′ A NFA with empty transitions M is a quintuple hQ, Σ, δ, q0, F i • F := F ∪{q0} if ǫ-closure(q0) ∩ F 6= ∅. Otherwise, F := F . such that • δ′(q,a) := δˆ(q,a). • Q, Σ, q0 and F are like in a NFA. • δ : Q × (Σ ∪{ε}) → P(Q) is the transition function. ⇒ DFAs, NFAs and NFAs with empty transitions are equivalent.

Regular Grammars 6 12-13 April 2011 Regular Grammars 8 12-13 April 2011 FSA and regular expressions (1) FSA and regular expressions (3)

To show: • r = r1 + r2:  q1 ... qf1 1. For a given regular expression r there exists an NFA with ǫ m m empty transitions that accepts L(r). q0  m ǫ  ⇒ The set of languages denoted by regular expressions is a q2 ... qf2 m m subset of the set of languages accepted by FSAs.  2. For a given DFA, there exists a regular expression r such that • r = r1r2: L(r) is exactly the language acctepted by the DFA. ǫ  q1 ... qf1 q2 ... qf2 ⇒ The set of languages accepted by FSAs is a subset of the set m m m m  of languages denoted by regular expressions. ∗ • r = r1: ǫ  q0 ǫq1 ... qf1 ǫ qf0 m m m m ǫ 

Regular Grammars 9 12-13 April 2011 Regular Grammars 11 12-13 April 2011

Kallmeyer Parsing Kallmeyer Parsing

FSA and regular expressions (2) FSA and regular expressions (4) Construction of an equivalent NFA with empty transitions for a To show: For each DFA, there is an equivalent regular expression. given regular expression r: Assume that {q1,...,qn} it the set of states. • r = ǫ: k Define Ri,j as the set of sequences of input symbols that can be q0 obtained from following all paths from q to q that do not pass m i j  through states ql with l>k. • r = ∅: Question: What do the Rk look like and what are the  i,j q0 qf k m m corresponding regular expressions ri,j?  • r = a, a ∈ Σ: • k = 0: Only paths of length 0 or 1 can be considered, since,

a between qi and qj , no other states ql are possible. q0 qf m m 0 Each r has the form a1 + a2 + ··· + a or (if i = j)  i,j m a1 + a2 + ··· + am + ǫ or (if i 6= j and there is no edge from qi

to qj, ∅).

Regular Grammars 10 12-13 April 2011 Regular Grammars 12 12-13 April 2011 FSA and regular expressions (5) FSA and regular expressions (7)

k • In general, Rij contains The JFLAP conversion uses a different definition: k−1 k – all elements from Rij , Rij is the set obtained by using no states ql with l

Regular Grammars 13 12-13 April 2011 Regular Grammars 15 12-13 April 2011

Kallmeyer Parsing Kallmeyer Parsing

FSA and regular expressions (6) Regular Grammars (1) Example: abcs.jff A type 2 grammar is called

2 1 1 1 ∗ 1 ∗ ′ r12 = r12 + r12(r22) r22 • right-linear iff for all productions A → β: β ∈ T or β = β X with β′ ∈ T ∗, X ∈ N. r1 = r0 + r0 (r0 )∗r0 12 12 11 11 12 • left-linear iff for all productions A → β: β ∈ T ∗ or β = Xβ′ ∗ = a + ǫ(ǫ )a = a with β′ ∈ T ∗, X ∈ N.

1 0 0 0 ∗ 0 • regular if it is left-linear or right-linear. r22 = r22 + r21(r11) r12 ∗ For each left- there is an equivalent right-linear =(b + ǫ) + c(ǫ) a = ǫ + b + ca grammar and vice versa. 2 + ∗ ∗ r12 = a + a(ǫ + b + ca) = a(ǫ + b + ca) = a(b + ca)

Regular Grammars 14 12-13 April 2011 Regular Grammars 16 12-13 April 2011 Regular Grammars (2) Regular Grammars (4) The following holds: Construction of a right-linear grammar for given DFA: L = L(G) for a regular grammar G iff L is a regular set (i.e., can • The states are the nonterminals. be described by a regular expression). • The start state is the start symbol.

a To show this, we show that • For each transition ′  Q Q • for each right-linear grammar, there exists an equivalent NFA; ′ ′ add a production Q  → aQ and, if Q ∈ F , a production Q → a. and • for each DFA, there is an equivalent right-linear grammar.

If we construct the right-linear grammar for LR (L reversed) and we mirror all right-hand sides of productions, we obtain a left-linear grammar for L.

Regular Grammars 17 12-13 April 2011 Regular Grammars 19 12-13 April 2011

Kallmeyer Parsing

Regular Grammars (3) Construction of equivalent NFA (with empty transitions) for given right-linear G: • Start state [S].

• For all productions A → β1β2 there is a state [β2]. • Transitions simulate top-down left-to-right traversal of derivation tree: For every A → β: ǫ  [A] [β]   a For every state [aβ] with a ∈ T :  [aβ] [β]   • Final state is [ǫ].

Regular Grammars 18 12-13 April 2011