Parsing Regular Expressions and Regular Grammars

Regular Expressions (1) Let Σ be an alphabet. The set of regular expressions over Σ is recursively defined: Parsing • ∅ is a regular expression denoting the set ∅. Regular Expressions and Regular • ǫ is a regular expression denoting the set {ǫ}. Grammars • For each a ∈ Σ: a is a regular expression denoting {a}. • If r and s are regular expressions denoting R and S, then Laura Kallmeyer (r + s), (rs) and (r∗) are also regular expressions denoting Heinrich-Heine-Universität Düsseldorf R ∪ S, RS and R∗ respectively. Sommersemester 2011 Regular Grammars 1 12-13 April 2011 Regular Grammars 3 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing Regular Expressions (2) Assume that ∗ has a higher priority than concatenation which in turn has a higher priority than +. + ∗ Overview a is an abbreviation for aa . x L x x 1. Regular expressions For a regular expression , we define ( ) as the set denoted by . 2. NFA with empty transitions 3. FSA and regular expressions 4. Regular Grammars 5. Regular Grammars and NFAs Regular Grammars 2 12-13 April 2011 Regular Grammars 4 12-13 April 2011 Regular Expressions (3) NFA with empty transitions (2) The following holds: Let hQ, Σ, δ, q0, F i be an NFA with empty transitions. For each language L: there is a regular expression x with L = L(x) Definition of δˆ: iff there is a FSA M with L = L(M). ′ ′ • ǫ-closure(q) := {q | there is a path from q to q containing only ǫ-transitions}∪{q} In order to show this, we • δˆ(q, ǫ) := ǫ-closure(q) • define NFAs with empty transitions and show their equivalence ˆ to NFAs; • δ(q,wa) := Sp∈P ǫ-closure(p) where P = {p | there is an r ∈ δˆ(q, w) such that p ∈ δ(r, a)} • show how to construct an equivalent NFA with empty transitions for a given regular expression; • show how to construct an equivalent regular expression for a given DFA. Regular Grammars 5 12-13 April 2011 Regular Grammars 7 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing NFA with empty transitions (1) NFA with empty transitions (3) ′ ′ A NFA with empty transitions is defined like an NFA except that δ Construction of an equivalent NFA hQ, Σ, δ , q0, F i (without empty allows for ε as second argument: transitions) for a given NFA hQ, Σ, δ, q0, F i with empty transitions: ′ ′ A NFA with empty transitions M is a quintuple hQ, Σ, δ, q0, F i • F := F ∪{q0} if ǫ-closure(q0) ∩ F 6= ∅. Otherwise, F := F . such that • δ′(q,a) := δˆ(q,a). • Q, Σ, q0 and F are like in a NFA. • δ : Q × (Σ ∪{ε}) → P(Q) is the transition function. ⇒ DFAs, NFAs and NFAs with empty transitions are equivalent. Regular Grammars 6 12-13 April 2011 Regular Grammars 8 12-13 April 2011 FSA and regular expressions (1) FSA and regular expressions (3) To show: • r = r1 + r2: q1 ... qf1 1. For a given regular expression r there exists an NFA with ǫ m m empty transitions that accepts L(r). q0 m ǫ ⇒ The set of languages denoted by regular expressions is a q2 ... qf2 m m subset of the set of languages accepted by FSAs. 2. For a given DFA, there exists a regular expression r such that • r = r1r2: L(r) is exactly the language acctepted by the DFA. ǫ q1 ... qf1 q2 ... qf2 ⇒ The set of languages accepted by FSAs is a subset of the set m m m m of languages denoted by regular expressions. ∗ • r = r1: ǫ q0 ǫq1 ... qf1 ǫ qf0 m m m m ǫ Regular Grammars 9 12-13 April 2011 Regular Grammars 11 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing FSA and regular expressions (2) FSA and regular expressions (4) Construction of an equivalent NFA with empty transitions for a To show: For each DFA, there is an equivalent regular expression. given regular expression r: Assume that {q1,...,qn} it the set of states. • r = ǫ: k Define Ri,j as the set of sequences of input symbols that can be q0 obtained from following all paths from q to q that do not pass m i j through states ql with l>k. • r = ∅: Question: What do the Rk look like and what are the i,j q0 qf k m m corresponding regular expressions ri,j? • r = a, a ∈ Σ: • k = 0: Only paths of length 0 or 1 can be considered, since, a between qi and qj , no other states ql are possible. q0 qf m m 0 Each r has the form a1 + a2 + ··· + a or (if i = j) i,j m a1 + a2 + ··· + am + ǫ or (if i 6= j and there is no edge from qi to qj, ∅). Regular Grammars 10 12-13 April 2011 Regular Grammars 12 12-13 April 2011 FSA and regular expressions (5) FSA and regular expressions (7) k • In general, Rij contains The JFLAP conversion uses a different definition: k−1 k – all elements from Rij , Rij is the set obtained by using no states ql with l<k. – all concatenation of k k+1 k+1 k+1 ∗ k+1 Then ri,j = rij + rik (rkk ) rkj . Rk−1 a) an element from ik , r1 is the regular expression for the language accepted by the DFA. k−1 i,j b) any number (including 0) of elements from Rkk and − k 1 ∗ ∗ ∗ c) an element from Rkj . With this, we obtain (ab c) ab for the sample automaton. k k−1 k−1 k−1 ∗ k−1 Consequently, ri,j = rij + rik (rkk ) rkj . Finally, the regular expression for the language accepted by the n n n automaton is r1f1 + r1f2 + ··· + r1fm if F = {qf1 ,...,qfm }. Regular Grammars 13 12-13 April 2011 Regular Grammars 15 12-13 April 2011 Kallmeyer Parsing Kallmeyer Parsing FSA and regular expressions (6) Regular Grammars (1) Example: abcs.jff A type 2 grammar is called 2 1 1 1 ∗ 1 ∗ ′ r12 = r12 + r12(r22) r22 • right-linear iff for all productions A → β: β ∈ T or β = β X with β′ ∈ T ∗, X ∈ N. r1 = r0 + r0 (r0 )∗r0 12 12 11 11 12 • left-linear iff for all productions A → β: β ∈ T ∗ or β = Xβ′ ∗ = a + ǫ(ǫ )a = a with β′ ∈ T ∗, X ∈ N. 1 0 0 0 ∗ 0 • regular if it is left-linear or right-linear. r22 = r22 + r21(r11) r12 ∗ For each left-linear grammar there is an equivalent right-linear =(b + ǫ) + c(ǫ) a = ǫ + b + ca grammar and vice versa. 2 + ∗ ∗ r12 = a + a(ǫ + b + ca) = a(ǫ + b + ca) = a(b + ca) Regular Grammars 14 12-13 April 2011 Regular Grammars 16 12-13 April 2011 Regular Grammars (2) Regular Grammars (4) The following holds: Construction of a right-linear grammar for given DFA: L = L(G) for a regular grammar G iff L is a regular set (i.e., can • The states are the nonterminals. be described by a regular expression). • The start state is the start symbol. a To show this, we show that • For each transition ′ Q Q • for each right-linear grammar, there exists an equivalent NFA; ′ ′ add a production Q → aQ and, if Q ∈ F , a production Q → a. and • for each DFA, there is an equivalent right-linear grammar. If we construct the right-linear grammar for LR (L reversed) and we mirror all right-hand sides of productions, we obtain a left-linear grammar for L. Regular Grammars 17 12-13 April 2011 Regular Grammars 19 12-13 April 2011 Kallmeyer Parsing Regular Grammars (3) Construction of equivalent NFA (with empty transitions) for given right-linear G: • Start state [S]. • For all productions A → β1β2 there is a state [β2]. • Transitions simulate top-down left-to-right traversal of derivation tree: For every A → β: ǫ [A] [β] a For every state [aβ] with a ∈ T : [aβ] [β] • Final state is [ǫ]. Regular Grammars 18 12-13 April 2011.

Parsing Regular Expressions and Regular Grammars

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support