CS 236 Language and Computation

Course Notes, Sect. 2.2: The Chomsky Hierarchy and Regular Languages
Anton Setzer (based on a book draft by J. V. Tucker and K. Stephenson)
Dept. of Computer Science, Swansea University
http://www.cs.swan.ac.uk/~csetzer/lectures/languageComputation/09/index.html
December 12, 2009

2.2.1. Chomsky Hierarchy (12.1)
2.2.2. Regular Languages (12.2)
2.2.3. Regular Expressions (13.8)

2.2.1. Chomsky Hierarchy (12.1)

Chomsky Hierarchy

The Chomsky hierarchy is the classification of grammars into four types according to the form of their production rules:

- Unrestricted grammars.
  - The most general class: no restriction is placed on the form of the productions.
- Context-sensitive grammars.
  - It is usually an accident if the grammar of a language is context-sensitive. C and C++ have some context-sensitive aspects (dealt with by selecting the correct strings after parsing).
- Context-free grammars.
  - Easy to understand and supported by parser generators.
  - In language design one aims at languages that have an underlying context-free grammar.
- Regular grammars.
  - Simple to parse. Used for dividing the input stream of characters into tokens.

Unrestricted Grammars

In the following four definitions, let G = (T, N, S, P) be a grammar.

Definition
Any grammar G is of Type 0, or unrestricted; that is, every production u → v with u ∈ (T ∪ N)^+ and v ∈ (T ∪ N)^* is allowed.

Context-Sensitive Grammars

Definition
A grammar G is of Type 1, or context-sensitive, if all its productions have the form
    uAv → uwv
where A ∈ N is a nonterminal, which rewrites to a non-empty string w ∈ (T ∪ N)^+, but only where A is in the context of strings u, v ∈ (T ∪ N)^*.
Furthermore, a production A → ε is allowed, but only if A does not occur on the right-hand side of any production.
Context-Free Grammars

Definition
A grammar G is of Type 2, or context-free, if all its productions have the form
    A → w
where A ∈ N is a nonterminal, which rewrites to a string w ∈ (T ∪ N)^*.

Regular Grammars

Definition
1. A grammar G is left-linear iff all its productions have the form A → Ba or A → a or A → ε.
2. A grammar G is right-linear iff all its productions have the form A → aB or A → a or A → ε.
3. A grammar G is of Type 3, or regular, iff it is left-linear or right-linear.
In the above, A, B ∈ N are nonterminals and a ∈ T.

Note that in a regular grammar either all productions must be left-linear or all productions must be right-linear, so no mixing of left-linear and right-linear productions is allowed.

A Hierarchy of Languages

Definition
A language L ⊆ T^* is unrestricted, context-sensitive, context-free, or regular iff there exists a grammar G of the relevant type such that L(G) = L.

Remark
For any L we have
    L regular ⇒ L context-free ⇒ L context-sensitive ⇒ L unrestricted.

Hierarchy of Languages

We have that
- every regular grammar is context-free;
- every context-sensitive grammar is an unrestricted grammar.

However, not every context-free grammar is context-sensitive, since a context-sensitive grammar allows a production A → ε only if A does not occur on the right-hand side of any production. (Otherwise all unrestricted languages would be context-sensitive.)

However, from a context-free grammar one can construct a context-free grammar for the same language which has a production A → ε only if A does not occur on the right-hand side of any production. This grammar is therefore context-sensitive as well.
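The four definitions above are purely syntactic conditions on the productions, so they can be checked mechanically. The following Python sketch is one possible way to do this and is not part of the course material. It assumes that every symbol is a single character, that a production is a pair (lhs, rhs) of strings with the empty string standing for ε, and that the function names are hypothetical. It returns the largest type number whose condition holds, so a context-free grammar with a production A → ε where A occurs on a right-hand side is reported as Type 2 although it is not context-sensitive in the sense of the definition.

```python
def is_left_linear(productions, nonterminals, terminals):
    # Every production has the form A -> Ba, A -> a or A -> ε.
    return all(
        lhs in nonterminals and (
            rhs == ""
            or (len(rhs) == 1 and rhs in terminals)
            or (len(rhs) == 2 and rhs[0] in nonterminals and rhs[1] in terminals)
        )
        for lhs, rhs in productions
    )


def is_right_linear(productions, nonterminals, terminals):
    # Every production has the form A -> aB, A -> a or A -> ε.
    return all(
        lhs in nonterminals and (
            rhs == ""
            or (len(rhs) == 1 and rhs in terminals)
            or (len(rhs) == 2 and rhs[0] in terminals and rhs[1] in nonterminals)
        )
        for lhs, rhs in productions
    )


def is_context_free(productions, nonterminals):
    # Every left-hand side is a single nonterminal.
    return all(lhs in nonterminals for lhs, _ in productions)


def is_context_sensitive(productions, nonterminals):
    rhs_symbols = {symbol for _, rhs in productions for symbol in rhs}
    for lhs, rhs in productions:
        if rhs == "":
            # A -> ε is allowed only if A never occurs on a right-hand side.
            if lhs not in nonterminals or lhs in rhs_symbols:
                return False
            continue
        # Search for a split lhs = u A v and rhs = u w v with w non-empty.
        if not any(
            symbol in nonterminals
            and len(rhs) > len(lhs) - 1
            and rhs.startswith(lhs[:i])
            and rhs.endswith(lhs[i + 1:])
            for i, symbol in enumerate(lhs)
        ):
            return False
    return True


def chomsky_type(productions, nonterminals, terminals):
    if (is_left_linear(productions, nonterminals, terminals)
            or is_right_linear(productions, nonterminals, terminals)):
        return 3
    if is_context_free(productions, nonterminals):
        return 2
    if is_context_sensitive(productions, nonterminals):
        return 1
    return 0


# Two small grammars written in this encoding (illustrative only).
palindromes = [("S", "aSa"), ("S", "bSb"), ("S", "")]
mixed = [("S", "aS"), ("S", "Sb"), ("S", "a")]  # mixes left- and right-linear
print(chomsky_type(palindromes, {"S"}, {"a", "b"}))
print(chomsky_type(mixed, {"S"}, {"a", "b"}))
```

The grammar named mixed illustrates the note on regular grammars: each production is left-linear or right-linear on its own, but the grammar as a whole is neither, so the sketch classifies it only as context-free.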
Hierarchy of Languages

[Figure: nested regions, from innermost to outermost: regular, context-free, context-sensitive, unrestricted.]

Examples of Equivalent Grammars

We give grammars of each type defining the language
    L^{a^2n} := { a^i | i is even }.

Unrestricted Grammar for L^{a^2n}

grammar       G^{unrestricted, a^2n}
terminals     a
nonterminals  S
start symbol  S
productions   S → ε
              S → aa
              a → aaa

Context-Sensitive Grammar for L^{a^2n}

grammar       G^{context-sensitive, a^2n}
terminals     a
nonterminals  S, T
start symbol  S
productions   S → ε
              S → aa
              S → aaT
              aT → aTaa
              aT → aaa

Context-Free Grammar for L^{a^2n}

grammar       G^{context-free, a^2n}
terminals     a
nonterminals  S
start symbol  S
productions   S → ε
              S → aSa

Regular Grammar for L^{a^2n}

grammar       G^{regular, a^2n}
terminals     a
nonterminals  S, A
start symbol  S
productions   S → ε
              S → aA
              A → aS

Example 1 (Grammars of the Levels of the Chomsky Hierarchy)

grammar       G
terminals     a, b
nonterminals  S
start symbol  S
productions   S → aSa, S → bSb, S → ε

L(G) = ? Of which type is G?

Example 2

grammar       G
terminals     a
nonterminals  S
start symbol  S
productions   S → a, S → aS

L(G) = ? Of which type is G?

Example 3

grammar       G
terminals     a, b
nonterminals  S
start symbol  S
productions   S → ab, S → aSb

L(G) = ? Of which type is G?

Example 4

grammar       G^{a^n b^n c^n}
terminals     a, b, c
nonterminals  S, B, C, H
start symbol  S
productions   S → aSBC, S → aBC,
              CB → HB, HB → HC, HC → BC,
              aB → ab, bB → bb, bC → bc, cC → cc

L(G) = { a^n b^n c^n | n ≥ 1 }. Of which type is G?
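One way to gain some confidence that the four grammars given earlier for the language of even-length strings of a's really agree is to enumerate, for each grammar, all words up to a small length and compare the results. The Python sketch below does this by breadth-first search over sentential forms. It is only a bounded experiment, not a proof, and it rests on assumptions that are not in the notes: symbols are single characters, ε is the empty string, the function name words_up_to is hypothetical, and sentential forms longer than max_len + slack are discarded. A slack of 1 suffices for these four grammars because the only length-decreasing production is S → ε; for other grammars the search may miss words.

```python
from collections import deque


def words_up_to(productions, start, nonterminals, max_len, slack=1):
    """Collect all derivable terminal words of length <= max_len (bounded search)."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if len(form) <= max_len and not any(s in nonterminals for s in form):
            words.add(form)
        # Apply every production at every position where its left side occurs.
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new_form) <= max_len + slack and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                pos = form.find(lhs, pos + 1)
    return words


# The four grammars for the even-length strings of a's, in this encoding.
unrestricted = [("S", ""), ("S", "aa"), ("a", "aaa")]
context_sensitive = [("S", ""), ("S", "aa"), ("S", "aaT"),
                     ("aT", "aTaa"), ("aT", "aaa")]
context_free = [("S", ""), ("S", "aSa")]
regular = [("S", ""), ("S", "aA"), ("A", "aS")]

for name, productions, nonterminals in [
    ("unrestricted", unrestricted, {"S"}),
    ("context-sensitive", context_sensitive, {"S", "T"}),
    ("context-free", context_free, {"S"}),
    ("regular", regular, {"S", "A"}),
]:
    found = words_up_to(productions, "S", nonterminals, max_len=6)
    print(name, sorted(found, key=len))
```

Note that terminal-only forms are still expanded further, since the unrestricted grammar rewrites the terminal a itself.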
Examples

[Figure: the nested regions regular, context-free, context-sensitive, unrestricted, annotated with the example languages { a^n | n ≥ 1 } (regular), { a^n b^n | n ≥ 1 } (context-free) and { a^n b^n c^n | n ≥ 1 } (context-sensitive).]

2.2.2. Regular Languages (12.2)

Finite languages are regular

grammar       G^{ab, aabb, aaabbb}
terminals     a, b
nonterminals  S
start symbol  S
productions   S → ab
              S → aabb
              S → aaabbb

The above grammar is not regular, since a production of a regular grammar may contain only one terminal on its right-hand side. But we can amend this:

grammar       G^{ab, aabb, aaabbb}
terminals     a, b
nonterminals  S, S1, S2, S3, S4, S5, S6, S7, S8, S9
start symbol  S
productions   S → aS1, S1 → b
              S → aS2, S2 → aS3, S3 → bS4, S4 → b
              S → aS5, S5 → aS6, S6 → aS7, S7 → bS8, S8 → bS9, S9 → b

Observation

The above can be generalised to the following lemma.

Lemma
1. Assume a grammar G which has only productions of the form A → Bw or A → w′ for some w ∈ T^+, w′ ∈ T^*, A, B ∈ N. Then L(G) = L(G′) for some left-linear grammar G′.
2. Assume a grammar G which has only productions of the form A → wB or A → w′ for some w ∈ T^+, w′ ∈ T^*, A, B ∈ N. Then L(G) = L(G′) for some right-linear grammar G′.

Proof
- In (2) replace
  - productions A → a_1 a_2 ··· a_n B with n ≥ 2 by A → a_1 A_1, A_1 → a_2 A_2, ..., A_{n−1} → a_n B for some new nonterminals A_i;
  - productions A → a_1 a_2 ··· a_n with n ≥ 2 by A → a_1 A_1, A_1 → a_2 A_2, ..., A_{n−1} → a_n for some new nonterminals A_i.
- (1) is proved similarly.

Lemma
All finite languages are regular.
Proof: Extend the example above.
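The replacement step in the proof of part (2) of the lemma is an algorithm in disguise, and it can be sketched in a few lines of Python. The sketch is an illustration under assumptions that are not in the notes: a production is a pair (lhs, rhs) where rhs is a tuple of symbol names, the input is assumed to satisfy the hypothesis of the lemma (it is not checked), and the function name to_right_linear and the naming scheme for new nonterminals (a prefix followed by a counter) are hypothetical.

```python
from itertools import count


def to_right_linear(productions, nonterminals, prefix="S"):
    """Split A -> a1...an B and A -> a1...an (n >= 2) into right-linear chains."""
    counter = count(1)

    def fresh():
        # Produce a nonterminal name that does not clash with the given ones.
        while True:
            name = f"{prefix}{next(counter)}"
            if name not in nonterminals:
                return name

    result = []
    new_nonterminals = set(nonterminals)
    for lhs, rhs in productions:
        # Separate the terminal word from an optional trailing nonterminal.
        if rhs and rhs[-1] in nonterminals:
            word, tail = rhs[:-1], rhs[-1:]
        else:
            word, tail = rhs, ()
        if len(word) < 2:
            result.append((lhs, rhs))  # already of the required form
            continue
        current = lhs
        for terminal in word[:-1]:
            new = fresh()
            new_nonterminals.add(new)
            result.append((current, (terminal, new)))
            current = new
        result.append((current, (word[-1],) + tail))
    return result, new_nonterminals


# The finite-language grammar from the example: S -> ab | aabb | aaabbb.
finite = [("S", tuple("ab")), ("S", tuple("aabb")), ("S", tuple("aaabbb"))]
right_linear, names = to_right_linear(finite, {"S"})
for lhs, rhs in right_linear:
    print(lhs, "->", " ".join(rhs))
```

With the default prefix the sketch produces the same nonterminals S1 to S9 as the amended grammar above, which is a convenient cross-check of the construction.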
A Left-Linear Grammar for a^m b^n

The following left-linear grammar generates { a^m b^n | m, n ≥ 1 }.

grammar       G^{left-linear, a^m b^n}
terminals     a, b
nonterminals  S, T
start symbol  S
productions   S → Sb
              S → Tb
              T → Ta
              T → a

A Right-Linear Grammar for a^m b^n

The following right-linear grammar generates { a^m b^n | m, n ≥ 1 }:

grammar       G^{right-linear, a^m b^n}
terminals     a, b
nonterminals  S, T
start symbol  S
productions   S → aS
              S → aT
              T → bT
              T → b
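Regular grammars are simple to parse because a word can be checked in a single pass while tracking only the set of nonterminals that could still be pending. The Python sketch below illustrates this for the two grammars above. It is an assumption-laden illustration rather than the method of the notes: symbols are single characters, ε is the empty string, all function names are hypothetical, and the left-linear case is handled by mirroring every right-hand side and reading the word backwards.

```python
from itertools import product

FINAL = object()  # marker: the last production used was of the form A -> a


def accepts_right_linear(productions, start, word):
    """Decide membership for a right-linear grammar in one left-to-right pass."""
    current = {start}
    for symbol in word:
        following = set()
        for lhs, rhs in productions:
            if lhs in current and rhs[:1] == symbol:
                # A -> aB continues with B; A -> a finishes the derivation.
                following.add(rhs[1] if len(rhs) == 2 else FINAL)
        current = following
    # Accept if the derivation finished, or a pending nonterminal can vanish.
    return FINAL in current or any(
        lhs in current and rhs == "" for lhs, rhs in productions
    )


def accepts_left_linear(productions, start, word):
    # Mirroring A -> Ba to A -> aB yields a right-linear grammar for the
    # reversed language, so the reversed word is checked against it.
    mirrored = [(lhs, rhs[::-1]) for lhs, rhs in productions]
    return accepts_right_linear(mirrored, start, word[::-1])


def is_am_bn(word):
    # Reference predicate for { a^m b^n | m, n >= 1 }, written independently.
    m = len(word) - len(word.lstrip("a"))
    rest = word[m:]
    return m >= 1 and len(rest) >= 1 and set(rest) == {"b"}


right_linear = [("S", "aS"), ("S", "aT"), ("T", "bT"), ("T", "b")]
left_linear = [("S", "Sb"), ("S", "Tb"), ("T", "Ta"), ("T", "a")]

# Compare both grammars with the reference predicate on all short words.
all_words = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
agree = all(
    accepts_right_linear(right_linear, "S", w)
    == accepts_left_linear(left_linear, "S", w)
    == is_am_bn(w)
    for w in all_words
)
print("grammars agree on all words up to length 6:", agree)
```

Tracking a set of nonterminals rather than a single one is needed because the right-linear grammar is nondeterministic: after reading an a, both S → aS and S → aT may apply.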
