CS 236 Language and Computation Course Notes Sect 2.2: The Chomsky Hierarchy and Regular Languages
Anton Setzer (Based on a book draft by J. V. Tucker and K. Stephenson) Dept. of Computer Science, Swansea University
http://www.cs.swan.ac.uk/~csetzer/lectures/languageComputation/09/index.html
December 12, 2009
CS 236 Sect. 2.2 1/ 65 2.2.1. Chomsky Hierarchy (12.1)
2.2.2. Regular Languages (12.2)
2.2.3. Regular Expressions (13.8)
2.2.1. Chomsky Hierarchy (12.1)
Chomsky Hierarchy
The Chomsky hierarchy classifies grammars into four types according to the form of their production rules:
- Unrestricted grammars.
  - The most general kind of grammar.
- Context-sensitive grammars.
  - It’s usually an accident if a grammar of a language is context-sensitive. C and C++ have some context-sensitive aspects (dealt with by selecting correct strings after the parsing).
- Context-free grammars.
  - Easy to understand and supported by parser generators.
  - In language design one aims at languages having an underlying context-free grammar.
- Regular grammars.
  - Simple to parse. Used for dividing the input stream of characters into tokens.
Unrestricted Grammars
In the following four definitions, let G = (T, N, S, P) be a grammar.
Definition
Any grammar G is of Type 0 or unrestricted; that is, all productions
u −→ v
with u ∈ (T ∪ N)+ and v ∈ (T ∪ N)* are allowed.
Context-Sensitive Grammars
Definition
A grammar G is of Type 1 or context-sensitive, if all its productions have the form
uAv −→ uwv
where A ∈ N is a nonterminal, which rewrites to a non-empty string w ∈ (T ∪ N)+, but only where A is in the context of strings u, v ∈ (T ∪ N)*. Furthermore a production A −→ ε is allowed, but only if A does not occur on the right-hand side of any production.
Context-Free Grammars
Definition
A grammar G is of Type 2 or context-free, if all its productions have the form
A −→ w
where A ∈ N is a nonterminal, which rewrites to a string w ∈ (T ∪ N)*.
Regular Grammars
Definition
1. A grammar G is left-linear, iff all its productions have the form
A −→ Ba or A −→ a or A −→ ε
2. A grammar G is right-linear, iff all its productions have the form
A −→ aB or A −→ a or A −→ ε
3. A grammar G is of Type 3 or regular, iff it is left-linear or right-linear.
In the above, A, B ∈ N are nonterminals and a ∈ T.
Note that in a regular grammar either all productions must be left-linear or all productions must be right-linear, so no mixing of left-linear and right-linear productions is allowed.
A Hierarchy of Languages
Definition
A language L ⊆ T* is unrestricted, context-sensitive, context-free, or regular, iff there exists a grammar G of the relevant type such that L(G) = L.
Remark
For any L we have
L regular ⇒ L context-free ⇒ L context-sensitive ⇒ L unrestricted.
Hierarchy of Languages
We have that
- every regular grammar is context-free,
- every context-sensitive grammar is an unrestricted grammar.
However, not every context-free grammar is context-sensitive, since context-sensitive grammars allow a production A −→ ε only if A does not occur on the right-hand side of a production. (Otherwise all unrestricted languages would be context-sensitive.)
However, from a context-free grammar one can construct a context-free grammar of the same language in which a production A −→ ε occurs only if A does not occur on the right-hand side of a production. This grammar is therefore context-sensitive as well.
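The production shapes used in the definitions above can be checked mechanically. The following Python sketch is illustrative only. It assumes a representation that is not part of these notes: every symbol is a single character, uppercase letters are nonterminals, lowercase letters are terminals, and the empty string stands for ε. The function names are hypothetical, and the context-sensitive test is left out.

```python
def _is_nonterminal(symbol):
    # Assumed convention: one uppercase character per nonterminal.
    return len(symbol) == 1 and symbol.isupper()

def is_context_free(productions):
    """Every left-hand side is a single nonterminal."""
    return all(_is_nonterminal(lhs) for lhs, _ in productions)

def is_right_linear(productions):
    """Every production is A -> aB, A -> a or A -> epsilon."""
    def ok(rhs):
        return (rhs == ""
                or (len(rhs) == 1 and rhs.islower())
                or (len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper()))
    return all(_is_nonterminal(lhs) and ok(rhs) for lhs, rhs in productions)

def is_left_linear(productions):
    """Every production is A -> Ba, A -> a or A -> epsilon."""
    def ok(rhs):
        return (rhs == ""
                or (len(rhs) == 1 and rhs.islower())
                or (len(rhs) == 2 and rhs[0].isupper() and rhs[1].islower()))
    return all(_is_nonterminal(lhs) and ok(rhs) for lhs, rhs in productions)

def is_regular(productions):
    # No mixing: all productions left-linear, or all right-linear.
    return is_left_linear(productions) or is_right_linear(productions)

# Productions are (lhs, rhs) pairs.
right_linear_example = [("S", ""), ("S", "aA"), ("A", "aS")]
context_free_example = [("S", ""), ("S", "aSa")]
mixed_example = [("S", "aT"), ("T", "Sb"), ("S", "a")]
unrestricted_example = [("S", ""), ("S", "aa"), ("a", "aaa")]
```

Note that `mixed_example` uses only left-linear and right-linear shapes, yet `is_regular` rejects it, because the two kinds are mixed within one grammar.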
Hierarchy of Languages
[Figure: nested regions, from innermost to outermost: regular, context-free, context-sensitive, unrestricted.]
Examples of Equivalent Grammars
We give grammars of each type defining the language
L_{a^{2n}} := {a^i | i is even}
Unrestricted Grammar for L_{a^{2n}}
grammar G^{unrestricted, a^{2n}}
terminals a
nonterminals S
start symbol S
productions S −→ ε
            S −→ aa
            a −→ aaa
Context-Sensitive Grammar for L_{a^{2n}}
grammar G^{context-sensitive, a^{2n}}
terminals a
nonterminals S, T
start symbol S
productions S −→ ε
            S −→ aa
            S −→ aaT
            aT −→ aTaa
            aT −→ aaa
Context-Free Grammar for L_{a^{2n}}
grammar G^{context-free, a^{2n}}
terminals a
nonterminals S
start symbol S
productions S −→ ε
            S −→ aSa
Regular Grammar for L_{a^{2n}}
grammar G^{regular, a^{2n}}
terminals a
nonterminals S, A
start symbol S
productions S −→ ε
            S −→ aA
            A −→ aS
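That the four grammars above define the same language can be spot-checked by enumerating short words. The sketch below is a hedged illustration, not a general decision procedure: it assumes single-character symbols with uppercase nonterminals, writes ε as the empty string, and prunes sentential forms longer than max_len + 1. That bound is adequate for these four grammars, since only S −→ ε shortens a form and every form contains at most one nonterminal. The name `generate` is hypothetical.

```python
from collections import deque

def generate(productions, start, max_len):
    """Collect the terminal words of length <= max_len derivable from start.

    productions: list of (lhs, rhs) strings, "" standing for epsilon.
    Symbols are single characters; uppercase letters are nonterminals.
    """
    limit = max_len + 1            # longest sentential form we keep
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if len(form) <= max_len and not any(c.isupper() for c in form):
            words.add(form)
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:       # try every occurrence of lhs
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words

unrestricted = [("S", ""), ("S", "aa"), ("a", "aaa")]
context_sensitive = [("S", ""), ("S", "aa"), ("S", "aaT"),
                     ("aT", "aTaa"), ("aT", "aaa")]
context_free = [("S", ""), ("S", "aSa")]
regular = [("S", ""), ("S", "aA"), ("A", "aS")]

languages = [generate(g, "S", 6)
             for g in (unrestricted, context_sensitive, context_free, regular)]
```

Each entry of `languages` is the set of generated words of length at most 6, so comparing the four sets gives a quick sanity check of the claim.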
Example 1 (Grammars of the Levels of the Chomsky Hierarchy)
grammar G
terminals a, b
nonterminals S
start symbol S
productions S −→ aSa, S −→ bSb, S −→ ε
L(G) = ? G is of which type?
Example 2
grammar G
terminals a
nonterminals S
start symbol S
productions S −→ a, S −→ aS
L(G) = ? G is of which type?
Example 3
grammar G
terminals a, b
nonterminals S
start symbol S
productions S −→ ab, S −→ aSb
L(G) = ? G is of which type?
Example 4
grammar G^{a^n b^n c^n}
terminals a, b, c
nonterminals S, B, C, H
start symbol S
productions S −→ aSBC, S −→ aBC,
            CB −→ HB, HB −→ HC, HC −→ BC,
            aB −→ ab, bB −→ bb, bC −→ bc, cC −→ cc.
L(G) = {a^n b^n c^n | n ≥ 1}. G is of which type?
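To see how the productions of Example 4 cooperate, it helps to follow one derivation, here of aabbcc. The Python sketch below checks that each step of a hand-written derivation applies exactly one production. The helper `is_step` and the string encoding (one character per symbol) are assumptions made for this illustration.

```python
def is_step(before, after, productions):
    """True if `after` arises from `before` by applying one production once."""
    for lhs, rhs in productions:
        pos = before.find(lhs)
        while pos != -1:
            if before[:pos] + rhs + before[pos + len(lhs):] == after:
                return True
            pos = before.find(lhs, pos + 1)
    return False

productions = [
    ("S", "aSBC"), ("S", "aBC"),
    ("CB", "HB"), ("HB", "HC"), ("HC", "BC"),   # together these swap C and B
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]

# One possible derivation of aabbcc (n = 2).
derivation = [
    "S", "aSBC", "aaBCBC",
    "aaBHBC", "aaBHCC", "aaBBCC",               # CB has been rewritten to BC
    "aabBCC", "aabbCC", "aabbcC", "aabbcc",
]

valid = all(is_step(x, y, productions)
            for x, y in zip(derivation, derivation[1:]))
```

The three productions involving H only reorder nonterminals; the terminals are introduced afterwards from left to right.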
Examples
[Figure: the nested regions regular, context-free, context-sensitive, unrestricted, with the example languages {a^n | n ≥ 1} (regular), {a^n b^n | n ≥ 1} (context-free) and {a^n b^n c^n | n ≥ 1} (context-sensitive).]
2.2.2. Regular Languages (12.2)
Finite languages are regular
grammar G^{ab,aabb,aaabbb}
terminals a, b
nonterminals S
start symbol S
productions S −→ ab
            S −→ aabb
            S −→ aaabbb
The above grammar is not regular, since the right-hand side of a production in a regular grammar may contain at most one terminal. But we can amend this:
Finite languages are regular
grammar G^{ab,aabb,aaabbb}
terminals a, b
nonterminals S, S1, S2, S3, S4, S5, S6, S7, S8, S9
start symbol S
productions S −→ aS1, S1 −→ b
            S −→ aS2, S2 −→ aS3, S3 −→ bS4, S4 −→ b
            S −→ aS5, S5 −→ aS6, S6 −→ aS7, S7 −→ bS8, S8 −→ bS9, S9 −→ b
Observation
The above can be generalised to the following
Lemma
1. Assume a grammar G which has only productions of the form
A −→ Bw or A −→ w'
for some w ∈ T+, w' ∈ T*, A, B ∈ N. Then L(G) = L(G') for some left-linear grammar G'.
2. Assume a grammar G which has only productions of the form
A −→ wB or A −→ w'
for some w ∈ T+, w' ∈ T*, A, B ∈ N. Then L(G) = L(G') for some right-linear grammar G'.
Proof
- In (2) replace
  - productions A −→ a_1 a_2 ··· a_n B with n ≥ 2 by A −→ a_1 A_1, A_1 −→ a_2 A_2, ..., A_{n−1} −→ a_n B for some new nonterminals A_i,
  - productions A −→ a_1 a_2 ··· a_n with n ≥ 2 by A −→ a_1 A_1, A_1 −→ a_2 A_2, ..., A_{n−1} −→ a_n for some new nonterminals A_i.
- (1) is proved similarly.
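The replacement step in the proof of (2) can be sketched in Python as follows. This is an illustration under stated assumptions: a production A −→ wB is encoded as the triple (A, w, B), a production A −→ w as (A, w, None), and the fresh nonterminals are named N1, N2, ... (hypothetical names, assumed not to clash with existing ones).

```python
from itertools import count

def split_long_productions(productions):
    """Replace A -> a1...an B (n >= 2) by a chain of one-terminal productions.

    productions: list of (head, word, tail); tail is a nonterminal or None.
    Productions whose word has length <= 1 are kept unchanged.
    """
    fresh = count(1)
    result = []
    for head, word, tail in productions:
        if len(word) <= 1:
            result.append((head, word, tail))
            continue
        current = head
        for symbol in word[:-1]:
            new_nt = f"N{next(fresh)}"      # assumed to be a new nonterminal
            result.append((current, symbol, new_nt))
            current = new_nt
        result.append((current, word[-1], tail))
    return result

# The finite language {ab, aabb, aaabbb} from the example above.
finite_language = [("S", "ab", None), ("S", "aabb", None), ("S", "aaabbb", None)]
right_linear = split_long_productions(finite_language)
```

For the finite language above this yields twelve productions, one per terminal occurrence, matching the shape of the amended grammar given earlier (with different nonterminal names).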
Lemma
All finite languages are regular.
Proof: Extend the example above.
A Left-Linear Grammar for a^m b^n
The following left-linear grammar generates {a^m b^n | m, n ≥ 1}.
grammar G^{left-linear, a^m b^n}
terminals a, b
nonterminals S, T
start symbol S
productions S −→ Sb
            S −→ Tb
            T −→ Ta
            T −→ a
A Right-Linear Grammar for a^m b^n
The following right-linear grammar generates {a^m b^n | m, n ≥ 1}:
grammar G^{right-linear, a^m b^n}
terminals a, b
nonterminals S, T
start symbol S
productions S −→ aS
            S −→ aT
            T −→ bT
            T −→ b
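Right-linear grammars are simple to parse because a derivation only ever has one nonterminal, at the right end. A word can therefore be checked by reading it left to right while tracking which nonterminals could currently be at the end. The sketch below assumes the triple encoding (A, a, B) for A −→ aB, (A, a, None) for A −→ a and (A, "", None) for A −→ ε; the function name `accepts` is hypothetical.

```python
def accepts(productions, start, word):
    """Decide whether a right-linear grammar derives `word` from `start`."""
    # Set of possible trailing nonterminals; None means "no nonterminal left".
    current = {start}
    for symbol in word:
        current = {tail for head, terminal, tail in productions
                   if head in current and terminal == symbol}
    # Accept if the derivation already finished with the last symbol,
    # or if a remaining nonterminal can be erased by A -> epsilon.
    return None in current or any(
        head in current and terminal == "" and tail is None
        for head, terminal, tail in productions)

# The right-linear grammar for {a^m b^n | m, n >= 1} from above.
ambn = [("S", "a", "S"), ("S", "a", "T"), ("T", "b", "T"), ("T", "b", None)]

# The regular grammar for words a^i with i even, which uses S -> epsilon.
even_as = [("S", "", None), ("S", "a", "A"), ("A", "a", "S")]
```

A finished derivation (state None) has no outgoing productions, so it is discarded automatically if more input follows.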
Right-Linear Grammar for Numbers
Here is a right-linear grammar for numbers without leading zeros. We use “|” as in BNF.
grammar G^{Number}
terminals 0, 1, ..., 9
nonterminals Number, Digits
start symbol Number
productions Number −→ 0
            Number −→ 1 Digits | 2 Digits | ··· | 9 Digits
            Digits −→ 0 Digits | 1 Digits | ··· | 9 Digits
            Digits −→ ε
Right-Linear Grammar for Numbers
Why didn’t we use the following, as in the section on BNF?
grammar G^{Number}
terminals 0, 1, ..., 9
nonterminals Number, Digit, NonZeroDigit, Digits
start symbol Number
productions Number −→ Digit | NonZeroDigit Digits
            Digits −→ Digit | Digit Digits
            Digit −→ 0 | NonZeroDigit
            NonZeroDigit −→ 1 | 2 | ··· | 9
Answer:
Right-Linear Grammar for Post Codes
The next grammar generates postcodes of the form SA1 8PP, or in general LLd dLL for digits d and capital letters L without any leading zeros. We use the notation | as in BNF. We write ␣ for a blank.
Right-Linear Grammar for Post Codes
grammar G^{Postcode}
terminals 0, 1, ..., 9, A, B, ..., Z, ␣
nonterminals postcode, letter2, digit1, blank1, digit2, letter3, letter4
start symbol postcode
productions postcode −→ A letter2 | B letter2 | ··· | Z letter2
            letter2 −→ A digit1 | B digit1 | ··· | Z digit1
            digit1 −→ 0 blank1 | 1 blank1 | ··· | 9 blank1
            blank1 −→ ␣ digit2
            digit2 −→ 0 letter3 | 1 letter3 | ··· | 9 letter3
            letter3 −→ A letter4 | B letter4 | ··· | Z letter4
            letter4 −→ A | B | ··· | Z
Example Derivation
Here is a derivation of SA1␣8PP ∈ L(G^{Postcode}):
postcode ⇒ S letter2
         ⇒ SA digit1
         ⇒ SA1 blank1
         ⇒ SA1␣ digit2
         ⇒ SA1␣8 letter3
         ⇒ SA1␣8P letter4
         ⇒ SA1␣8PP
Easier Proof that Postcodes are Regular
Can you give an easier proof that the language of postcodes is regular (both left-linear and right-linear)?
Adding Silent Productions
We can generalise the lemma above by also allowing productions of the form A −→ B:
Lemma
1. Assume a grammar G which has only productions of the form
A −→ Bw or A −→ w
for some w ∈ T*, A, B ∈ N. Then L(G) = L(G') for some left-linear grammar G'.
2. Assume a grammar G which has only productions of the form
A −→ wB or A −→ w
for some w ∈ T*, A, B ∈ N. Then L(G) = L(G') for some right-linear grammar G'.
Multi-step Right-Linear/Left-Linear/Regular Grammars
We call grammars as above multistep right-linear/left-linear/regular grammars.
Proof
In a first step we omit all transitions A −→ B for A, B ∈ N: Let G = (T, N, S, P) be a grammar having such transitions. We form a grammar G' having no such transitions, defined as follows:
grammar G'
terminals T
nonterminals N
start symbol S
productions A −→ w  if A ⇒*_G A' and A' −→ w ∈ P for some A, A' ∈ N, w ∈ T*
            A −→ wB if A ⇒*_G A' and A' −→ wB ∈ P for some A, A', B ∈ N, w ∈ T+
Proof
So in G' we just jump over all silent transitions A −→ B in G. We can in fact decide whether A ⇒* A', since such a derivation must have the form A = A' or A = A_1 ⇒ A_2 ⇒ ··· ⇒ A_n = A' for some A_i ∈ N. If such a derivation exists, then a derivation exists in which all A_i are distinct (omit loops). Therefore n can be restricted to the number of elements in N, so there are only finitely many possible derivations, which we can enumerate. For each of them we can check whether it is in fact a derivation, and therefore determine all possible derivations A ⇒* A'.
Proof
Now one can easily see that for w ∈ T*
S ⇒*_G w iff S ⇒*_{G'} w
Proof
We have now obtained a grammar which fulfils the assumption of the first lemma in this section. So the languages are definable by left-linear or right-linear grammars.
Remark
The left/right-linear grammars as in the previous lemma can be computed from the corresponding multistep left/right-linear grammars.
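The elimination of silent transitions can be sketched in Python for the right-linear case. Instead of enumerating loop-free chains as in the proof, this sketch computes the reachability relation A ⇒* A' by a simple fixed-point iteration, which yields the same relation. The encoding is an assumption: (A, w, B) stands for A −→ wB, (A, w, None) for A −→ w, and a silent transition A −→ B is (A, "", B). Function names are hypothetical.

```python
def silent_closure(nonterminals, productions):
    """Map each A to the set of all A' with A =>* A' via silent transitions."""
    reach = {a: {a} for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, word, tail in productions:
            if word == "" and tail is not None:       # silent: head -> tail
                for a in nonterminals:
                    if head in reach[a] and tail not in reach[a]:
                        reach[a].add(tail)
                        changed = True
    return reach

def eliminate_silent(nonterminals, productions):
    """Build the productions of G': jump over all silent transitions."""
    reach = silent_closure(nonterminals, productions)
    result = set()
    for a in nonterminals:
        for head, word, tail in productions:
            is_silent = word == "" and tail is not None
            if head in reach[a] and not is_silent:
                result.add((a, word, tail))           # A -> w or A -> wB
    return result

# Hypothetical example: U -> S | T, S -> aS | a, T -> b.
example_nonterminals = {"U", "S", "T"}
example_productions = [("U", "", "S"), ("U", "", "T"),
                       ("S", "a", "S"), ("S", "a", None), ("T", "b", None)]
without_silent = eliminate_silent(example_nonterminals, example_productions)
```

In the example, U inherits the productions of S and T, so the result contains U −→ aS, U −→ a and U −→ b and no silent transition.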
Derivations in Regular Grammars
Theorem
(a) Let G = (T, N, S, P) be a left-linear grammar, A ∈ N, w ∈ (N ∪ T)*, A ⇒* w. Then the derivation of A ⇒* w is
A ⇒ A_1 a_1 ⇒ A_2 a_2 a_1 ⇒ ··· ⇒ A_n a_n ··· a_2 a_1 = w   (1)
or
A ⇒ A_1 a_1 ⇒ A_2 a_2 a_1 ⇒ ··· ⇒ A_n a_n ··· a_2 a_1 ⇒ a_{n+1} a_n ··· a_2 a_1 = w   (2)
or
A ⇒ ε   (3)
for productions A_i −→ A_{i+1} a_{i+1} (in (1), (2)), A_n −→ a_{n+1} (in (2)) and A −→ ε (in (3)), where A_0 = A.
Derivations in Regular Grammars
Theorem
(b) Let G = (T, N, S, P) be a right-linear grammar, A ∈ N, w ∈ (N ∪ T)*, A ⇒* w. Then the derivation of A ⇒* w is
A ⇒ a_1 A_1 ⇒ a_1 a_2 A_2 ⇒ ··· ⇒ a_1 a_2 ··· a_n A_n = w   (1)
or
A ⇒ a_1 A_1 ⇒ a_1 a_2 A_2 ⇒ ··· ⇒ a_1 a_2 ··· a_n A_n ⇒ a_1 a_2 ··· a_n a_{n+1} = w   (2)
or
A ⇒ ε   (3)
for productions A_i −→ a_{i+1} A_{i+1} (in (1) and (2)), A_n −→ a_{n+1} (in (2)) and A −→ ε (in (3)), where A_0 = A.
Proof
The above are the only derivations possible, provided that a production A −→ ε is used only when A does not occur on the right-hand side of any production.
Mixing of Left- and Right-Linear
Remark
In a regular grammar we are not allowed to mix left-linear and right-linear productions. Otherwise we would obtain truly context-free languages.
Example (Mixing Left/Right-Linear Rules)
The following grammar generates the language L(G) = ? which (as we will see later) is context-free but not regular.
grammar G
terminals a, b
nonterminals S, T
start symbol S
productions S −→ ab
            S −→ aT
            T −→ Sb
2.2.3. Regular Expressions (13.8)
Operators for Forming Languages
Definition
Let L_1, L_2, L ⊆ T* be languages over the alphabet T.
1. The concatenation L_1.L_2 of L_1 and L_2 is defined as
L_1.L_2 := {w_1 w_2 | w_1 ∈ L_1, w_2 ∈ L_2}
2. The union L_1 | L_2 of L_1 and L_2 is defined as
L_1 | L_2 := L_1 ∪ L_2
The union is sometimes denoted by +.
3. The iteration or Kleene-star L* of L is defined as
L* := {w_1 w_2 ··· w_n | n ≥ 0, w_1, ..., w_n ∈ L}
Note that ε ∈ L*.
Regular Expressions
Definition
Let T be an alphabet. We define the set of regular expressions over T inductively, where each regular expression will be a language L ⊆ T*.
- ∅ and {ε} are regular expressions.
- If a ∈ T then {a} is a regular expression.
- If L, L' are regular expressions, so are L.L', L | L', L*.
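Since regular expressions are defined here as languages, the three operators can be tried out directly on finite sets of strings. The Python sketch below is an illustration only: words are Python strings, ε is the empty string, and because L* is usually infinite, `star` takes an extra length bound `max_len` (an assumption of this sketch, not part of the definition).

```python
def concat(l1, l2):
    """L1.L2: every word of L1 followed by every word of L2."""
    return {w1 + w2 for w1 in l1 for w2 in l2}

def union(l1, l2):
    """L1 | L2."""
    return l1 | l2

def star(l, max_len):
    """The words of L* that have length <= max_len."""
    result = {""}                      # n = 0 gives the empty word
    frontier = {""}
    while frontier:
        frontier = {w + v for w in frontier for v in l
                    if len(w + v) <= max_len} - result
        result |= frontier
    return result

# Numbers without leading zero, restricted to at most three digits.
non_zero_digit = {str(d) for d in range(1, 10)}
digit = union({"0"}, non_zero_digit)
number = union({"0"}, concat(non_zero_digit, star(digit, 2)))
```

Here `number` contains "0", "7", "10" and "100", but not "01" or the empty word.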
Examples of Regular Expressions
- The set of non-zero digits is defined as
NonZeroDigit = {1} | {2} | ··· | {9}
- The set of digits is defined as
Digit = {0} | NonZeroDigit
- The set of numbers without leading zero is
Number = {0} | (NonZeroDigit.(Digit*))
- The set of capital letters is defined by
CapitalLetter = {A} | {B} | ··· | {Z}
Examples of Regular Expressions
- The set of postcodes can be defined as
postcode = CapitalLetter.CapitalLetter.Digit.{␣}.Digit.CapitalLetter.CapitalLetter
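The regular expressions for numbers and postcodes above translate almost literally into the notation of Python's `re` module. The following sketch is one possible rendering; the pattern and function names are chosen for this illustration, and `re.fullmatch` is used so that the whole string has to belong to the language.

```python
import re

# Number = {0} | (NonZeroDigit.(Digit*))
NUMBER = re.compile(r"0|[1-9][0-9]*")

# postcode = CapitalLetter.CapitalLetter.Digit.{blank}.Digit.CapitalLetter.CapitalLetter
POSTCODE = re.compile(r"[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]")

def is_number(text):
    """True if text is a number without leading zero."""
    return NUMBER.fullmatch(text) is not None

def is_postcode(text):
    """True if text has the form LLd dLL."""
    return POSTCODE.fullmatch(text) is not None
```

For instance, `is_postcode("SA1 8PP")` holds, while `is_number("01")` does not.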
Regular Expressions in Programming
- Regular expressions occur very often in programming.
- They occur
  - in Linux/Unix (command grep/egrep),
  - in scripting languages (Perl, Python, Ruby),
    - (one of the main innovations of Ruby over Python was an improved notation ∼ for matching of regular expressions),
  - in SQL,
  - and are supported in most programming languages by libraries.
Notations for Regular Expressions
- One usually writes a instead of {a}. In order to avoid ambiguities, one has to keep the symbols of T distinct from the operators on regular expressions, and writes \( for the symbol ( in the alphabet, similarly \), \|, \*.
- One writes LL' for L.L'.
- One writes [a_1 ··· a_n] for {a_1} | ··· | {a_n}.
- One writes [a-z] for [a b c ··· z], similarly for [0-9].
- One writes the star on the line, L∗, for the iteration L*, and L^+ or L+ for L.L* (so L^+ := {w_1 w_2 ··· w_n | n ≥ 1, w_1, ..., w_n ∈ L}).
- Lots of other useful operators for constructing regular expressions have been defined.
- Each language has its own set of regular expression operators (often using different notations), and its own syntax. Sometimes operators are introduced which go beyond regular languages.
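As a concrete instance of these notations, the sketch below shows how they look in Python's `re` module, which is only one of the many dialects mentioned above. The variable names are chosen for this illustration.

```python
import re

lowercase_word = re.compile(r"[a-z]+")     # L+ with L = [a-z]
digits_or_empty = re.compile(r"[0-9]*")    # L* with L = [0-9]
bracketed = re.compile(r"\([ab]*\)")       # \( and \) are the alphabet symbols ( and )
literal_bar = re.compile(r"a\|b")          # \| is the symbol |, not the union operator
union_ab = re.compile(r"a|b")              # unescaped | is the union

def in_language(pattern, word):
    """True if the whole word belongs to the language of the pattern."""
    return pattern.fullmatch(word) is not None
```

The difference between `literal_bar` and `union_ab` shows why alphabet symbols that coincide with operators have to be escaped.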
Example Use of Regular Expressions
- Assume you have files called logiccomputationch1.tex, logiccomputationch2.tex, logiccomputationch3.tex, ... Concatenate all of them into one file:
cssetzer@cs-svr1:> cat logiccomputationch[0-9].tex > logiccomputationall.tex
- Process lines in a file containing entries separated by “,”; do something if the first field is a student number (a string consisting of digits only). Python code:
    import re

    regExpStud = re.compile('^[0-9]*$')   # first field consists of digits only
    with open(filename) as file:          # filename is assumed to be set elsewhere
        for line in file:
            a = line.split(',')
            if regExpStud.match(a[0]):
                print(a[1][:-1])          # cut off trailing '\n'
Closure of Regular Languages
In order to show that all regular expressions are regular, we first show the following
Lemma
Let G, G' be both left-linear grammars or both right-linear grammars. Then we can define left-linear or right-linear grammars G1, G2, G3 s.t.
1. L(G1) = L(G) | L(G'),
2. L(G2) = L(G).L(G'),
3. L(G3) = L(G)*.
Proof
Assume in 1./2./3.
G = (T, N, S, P), G' = (T', N', S', P').
After renaming of nonterminals we can assume N ∩ N' = ∅. Let S'' be a new symbol not in N ∪ N' ∪ T ∪ T'. We define multi-step left/right-linear grammars with those properties, from which one can construct ordinary (one-step) left/right-linear grammars with those properties. We only carry out the proof for right-linear grammars.
Proof of 1.
We define G1 as follows:
grammar G1
terminals T ∪ T'
nonterminals N ∪ N' ∪ {S''}
start symbol S''
productions S'' −→ S
            S'' −→ S'
            P
            P'
Proof of 1.
So G1 has the productions from G and G' plus S'' −→ S and S'' −→ S'.
Derivations in G1 have the form
S'' ⇒ S ⇒* w and S'' ⇒ S' ⇒* w'
for derivations
S ⇒*_G w and S' ⇒*_{G'} w'.
So for w'' ∈ (T ∪ T')* we have
S'' ⇒*_{G1} w'' iff S ⇒*_G w'' or S' ⇒*_{G'} w'',
so L(G1) = L(G) ∪ L(G').
Proof of 2.
We define G2 as follows: grammar G2
terminals T ∪ T'
nonterminals N ∪ N'
start symbol S
productions A −→ aA' for A −→ aA' ∈ P (A, A' ∈ N, a ∈ T)
            A −→ aS' for A −→ a ∈ P (A ∈ N, a ∈ T)
            P'
Proof of 2.
So G2 has
- the productions from G',
- the productions of the form A −→ aA' from G, and
- productions A −→ aS', if A −→ a is a production from G.
A derivation in G2 starts with a derivation
S ⇒ a_1 A_1 ⇒ a_1 a_2 A_2 ⇒ a_1 a_2 a_3 A_3 ⇒ ··· ⇒ a_1 a_2 ··· a_{n−1} A_{n−1} ⇒ a_1 a_2 ··· a_n S'
for derivations in G of the form
S ⇒ a_1 A_1 ⇒ a_1 a_2 A_2 ⇒ a_1 a_2 a_3 A_3 ⇒ ··· ⇒ a_1 a_2 ··· a_{n−1} A_{n−1} ⇒ a_1 a_2 ··· a_n.
Proof of 2.
Then this is followed by a derivation
a_1 a_2 ··· a_n S' ⇒ a_1 a_2 ··· a_n b_1 B_1 ⇒ a_1 a_2 ··· a_n b_1 b_2 B_2 ⇒ ··· ⇒ a_1 a_2 ··· a_n b_1 b_2 ··· b_{m−1} B_{m−1} ⇒ a_1 a_2 ··· a_n b_1 b_2 ··· b_m,
for a derivation in G' of the form
S' ⇒ b_1 B_1 ⇒ b_1 b_2 B_2 ⇒ ··· ⇒ b_1 b_2 ··· b_{m−1} B_{m−1} ⇒ b_1 b_2 ··· b_m.
Therefore S ⇒*_{G2} w for some w ∈ (T ∪ T')* if and only if S ⇒*_G w' and S' ⇒*_{G'} w'' for some w', w'' s.t. w = w'w''.
So L(G2) = L(G).L(G').
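Proof of 3.
We define G3 as follows:
grammar G3
terminals T
nonterminals N
start symbol S
productions S −→ ε,
            A −→ aA' for A −→ aA' ∈ P (A, A' ∈ N, a ∈ T)
            A −→ aS for A −→ a ∈ P (A ∈ N, a ∈ T)
Here we assume that S does not occur on the right-hand side of a production in P; otherwise the added production S −→ ε would allow a derivation of G to stop early, so one first introduces a new start symbol.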
Proof of 3.
Derivations in G3 are S ⇒ ε, or they start similarly as for concatenation with
S ⇒* wS for a derivation S ⇒* w in G with w ∈ T+.
In the latter case it can continue either (using S −→ ε) with
wS ⇒ w
or with
wS ⇒* ww'S for a derivation S ⇒* w' in G.
Again in the latter case we can continue (using S −→ ε) with
ww'S ⇒ ww'
or with
ww'S ⇒* ww'w''S for a derivation S ⇒* w'' in G,
etc.
Proof of 3.
We obtain that in G3 we have
S ⇒* w
if and only if there exist derivations in G of
- S ⇒* w_1
- S ⇒* w_2
- ···
- S ⇒* w_n
s.t. w = w_1 w_2 ··· w_n. So we get
L(G3) = {w_1 w_2 ··· w_n | n ≥ 0, w_1, ..., w_n ∈ L(G)} = L(G)*
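The three constructions G1, G2 and G3 can be sketched in Python for right-linear grammars. This is an illustration under explicit assumptions, not a full implementation of the proof: a grammar is a pair (start, productions), productions are triples (A, a, B) for A −→ aB, (A, a, None) for A −→ a, (A, "", B) for a silent production and (A, "", None) for A −→ ε; the input grammars have disjoint nonterminals and no ε-productions; and for `star` the start symbol does not occur on a right-hand side. All function names are hypothetical.

```python
def closure(states, productions):
    """Add every nonterminal reachable through silent productions A -> B."""
    states = set(states)
    changed = True
    while changed:
        changed = False
        for head, terminal, tail in productions:
            if (head in states and terminal == ""
                    and tail is not None and tail not in states):
                states.add(tail)
                changed = True
    return states

def accepts(grammar, word):
    """Decide membership for a (multi-step) right-linear grammar."""
    start, productions = grammar
    current = closure({start}, productions)
    for symbol in word:
        moved = {tail for head, terminal, tail in productions
                 if head in current and terminal == symbol}
        current = closure(moved, productions)   # None = derivation finished
    return None in current or any(
        head in current and terminal == "" and tail is None
        for head, terminal, tail in productions)

def union(g, h, new_start="S''"):
    """G1: new start symbol with silent productions to both old starts."""
    (s, p), (s2, p2) = g, h
    return new_start, p | p2 | {(new_start, "", s), (new_start, "", s2)}

def concat(g, h):
    """G2: every A -> a of G becomes A -> aS', where S' is the start of G'."""
    (s, p), (s2, p2) = g, h
    glued = {(head, a, s2 if tail is None else tail) for head, a, tail in p}
    return s, glued | p2

def star(g):
    """G3: every A -> a becomes A -> aS, and S -> epsilon is added."""
    s, p = g
    looped = {(head, a, s if tail is None else tail) for head, a, tail in p}
    return s, looped | {(s, "", None)}

g_ab = ("S", {("S", "a", "B"), ("B", "b", None)})   # language {ab}
g_c = ("Z", {("Z", "c", None)})                     # language {c}
```

With these two small grammars, `union(g_ab, g_c)` accepts ab and c, `concat(g_ab, g_c)` accepts abc, and `star(g_ab)` accepts the empty word, ab, abab, and so on.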
Regular Expressions are Regular
Lemma
Let L be a regular expression. Then there exist a left-linear grammar G and a right-linear grammar G' s.t.
L(G) = L(G') = L
Proof
Induction on the definition of regular expressions.
Case 1: L = ∅, {ε}, {a} (where a ∈ T). Then L is finite, therefore definable by a left/right-linear grammar.
Case 2: L = L_1 | L_2 or L = L_1.L_2 or L = L_1*. By IH the L_i are defined by left/right-linear grammars G_i. By the last lemma it follows that L can be defined by a left/right-linear grammar.