Parsing Regular Expressions and Regular Grammars

Regular Expressions (1) Let Σ be an alphabet. The set of regular expressions over Σ is recursively defined: Parsing • ∅ is a regular expression denoting the set ∅. Regular Expressions and Regular • ǫ is a regular expression denoting the set {ǫ}. Grammars • For each a ∈ Σ: a is a regular expression denoting {a}. • If r and s are regular expressions denoting R and S, then Laura Kallmeyer (r + s), (rs) and (r∗) are also regular expressions denoting Heinrich-Heine-Universität Düsseldorf R ∪ S, RS and R∗ respectively. Sommersemester 2011 Regular Grammars 1 12-13 April 2011 Regular Grammars 3 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing Regular Expressions (2) Assume that ∗ has a higher priority than concatenation which in turn has a higher priority than +. + ∗ Overview a is an abbreviation for aa . x L x x 1. Regular expressions For a regular expression , we define ( ) as the set denoted by . 2. NFA with empty transitions 3. FSA and regular expressions 4. Regular Grammars 5. Regular Grammars and NFAs Regular Grammars 2 12-13 April 2011 Regular Grammars 4 12-13 April 2011 Regular Expressions (3) NFA with empty transitions (2) The following holds: Let hQ, Σ, δ, q0, F i be an NFA with empty transitions. For each language L: there is a regular expression x with L = L(x) Definition of δˆ: iff there is a FSA M with L = L(M). ′ ′ • ǫ-closure(q) := {q | there is a path from q to q containing only ǫ-transitions}∪{q} In order to show this, we • δˆ(q, ǫ) := ǫ-closure(q) • define NFAs with empty transitions and show their equivalence ˆ to NFAs; • δ(q,wa) := Sp∈P ǫ-closure(p) where P = {p | there is an r ∈ δˆ(q, w) such that p ∈ δ(r, a)} • show how to construct an equivalent NFA with empty transitions for a given regular expression; • show how to construct an equivalent regular expression for a given DFA. Regular Grammars 5 12-13 April 2011 Regular Grammars 7 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing NFA with empty transitions (1) NFA with empty transitions (3) ′ ′ A NFA with empty transitions is defined like an NFA except that δ Construction of an equivalent NFA hQ, Σ, δ , q0, F i (without empty allows for ε as second argument: transitions) for a given NFA hQ, Σ, δ, q0, F i with empty transitions: ′ ′ A NFA with empty transitions M is a quintuple hQ, Σ, δ, q0, F i • F := F ∪{q0} if ǫ-closure(q0) ∩ F 6= ∅. Otherwise, F := F . such that • δ′(q,a) := δˆ(q,a). • Q, Σ, q0 and F are like in a NFA. • δ : Q × (Σ ∪{ε}) → P(Q) is the transition function. ⇒ DFAs, NFAs and NFAs with empty transitions are equivalent. Regular Grammars 6 12-13 April 2011 Regular Grammars 8 12-13 April 2011 FSA and regular expressions (1) FSA and regular expressions (3) To show: • r = r1 + r2: q1 ... qf1 1. For a given regular expression r there exists an NFA with ǫ m m empty transitions that accepts L(r). q0 m ǫ ⇒ The set of languages denoted by regular expressions is a q2 ... qf2 m m subset of the set of languages accepted by FSAs. 2. For a given DFA, there exists a regular expression r such that • r = r1r2: L(r) is exactly the language acctepted by the DFA. ǫ q1 ... qf1 q2 ... qf2 ⇒ The set of languages accepted by FSAs is a subset of the set m m m m of languages denoted by regular expressions. ∗ • r = r1: ǫ q0 ǫq1 ... qf1 ǫ qf0 m m m m ǫ Regular Grammars 9 12-13 April 2011 Regular Grammars 11 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing FSA and regular expressions (2) FSA and regular expressions (4) Construction of an equivalent NFA with empty transitions for a To show: For each DFA, there is an equivalent regular expression. given regular expression r: Assume that {q1,...,qn} it the set of states. • r = ǫ: k Define Ri,j as the set of sequences of input symbols that can be q0 obtained from following all paths from q to q that do not pass m i j through states ql with l>k. • r = ∅: Question: What do the Rk look like and what are the i,j q0 qf k m m corresponding regular expressions ri,j? • r = a, a ∈ Σ: • k = 0: Only paths of length 0 or 1 can be considered, since, a between qi and qj , no other states ql are possible. q0 qf m m 0 Each r has the form a1 + a2 + ··· + a or (if i = j) i,j m a1 + a2 + ··· + am + ǫ or (if i 6= j and there is no edge from qi to qj, ∅). Regular Grammars 10 12-13 April 2011 Regular Grammars 12 12-13 April 2011 FSA and regular expressions (5) FSA and regular expressions (7) k • In general, Rij contains The JFLAP conversion uses a different definition: k−1 k – all elements from Rij , Rij is the set obtained by using no states ql with l<k. – all concatenation of k k+1 k+1 k+1 ∗ k+1 Then ri,j = rij + rik (rkk ) rkj . Rk−1 a) an element from ik , r1 is the regular expression for the language accepted by the DFA. k−1 i,j b) any number (including 0) of elements from Rkk and − k 1 ∗ ∗ ∗ c) an element from Rkj . With this, we obtain (ab c) ab for the sample automaton. k k−1 k−1 k−1 ∗ k−1 Consequently, ri,j = rij + rik (rkk ) rkj . Finally, the regular expression for the language accepted by the n n n automaton is r1f1 + r1f2 + ··· + r1fm if F = {qf1 ,...,qfm }. Regular Grammars 13 12-13 April 2011 Regular Grammars 15 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing FSA and regular expressions (6) Regular Grammars (1) Example: abcs.jff A type 2 grammar is called 2 1 1 1 ∗ 1 ∗ ′ r12 = r12 + r12(r22) r22 • right-linear iff for all productions A → β: β ∈ T or β = β X with β′ ∈ T ∗, X ∈ N. r1 = r0 + r0 (r0 )∗r0 12 12 11 11 12 • left-linear iff for all productions A → β: β ∈ T ∗ or β = Xβ′ ∗ = a + ǫ(ǫ )a = a with β′ ∈ T ∗, X ∈ N. 1 0 0 0 ∗ 0 • regular if it is left-linear or right-linear. r22 = r22 + r21(r11) r12 ∗ For each left-linear grammar there is an equivalent right-linear =(b + ǫ) + c(ǫ) a = ǫ + b + ca grammar and vice versa. 2 + ∗ ∗ r12 = a + a(ǫ + b + ca) = a(ǫ + b + ca) = a(b + ca) Regular Grammars 14 12-13 April 2011 Regular Grammars 16 12-13 April 2011 Regular Grammars (2) Regular Grammars (4) The following holds: Construction of a right-linear grammar for given DFA: L = L(G) for a regular grammar G iff L is a regular set (i.e., can • The states are the nonterminals. be described by a regular expression). • The start state is the start symbol. a To show this, we show that • For each transition ′ Q Q • for each right-linear grammar, there exists an equivalent NFA; ′ ′ add a production Q → aQ and, if Q ∈ F , a production Q → a. and • for each DFA, there is an equivalent right-linear grammar. If we construct the right-linear grammar for LR (L reversed) and we mirror all right-hand sides of productions, we obtain a left-linear grammar for L. Regular Grammars 17 12-13 April 2011 Regular Grammars 19 12-13 April 2011 Kallmeyer Parsing Regular Grammars (3) Construction of equivalent NFA (with empty transitions) for given right-linear G: • Start state [S]. • For all productions A → β1β2 there is a state [β2]. • Transitions simulate top-down left-to-right traversal of derivation tree: For every A → β: ǫ [A] [β] a For every state [aβ] with a ∈ T : [aβ] [β] • Final state is [ǫ]. Regular Grammars 18 12-13 April 2011.

Load more