Morphic Characterizations with Insertion and Locality in the Framework of Chomsky-Sch¨Utzenberger Theorem 藤岡薫 † Kaoru Fujioka

FIT2011（第 10 回情報科学技術フォーラム）

A-010

Morphic Characterizations with Insertion and Locality in the Framework of Chomsky-Sch¨utzenberger Theorem 藤岡薫 † Kaoru Fujioka

1. Introduction |w| ≥ k, w is in L iff the prefix of w of length k is in Representing a class of languages through opera- A, the suffix of w of length k is in B, and any proper tions on its subclasses is a traditional issue within interior substring of w of length k is in C [3]. Let formal language theory. For context-free languages, LOC(k) be the class of strictly k-testable languages. there is a well-known Chomsky-Schützenberger cha- It has already been known the proper inclusion ⊂ ⊂ · · · ⊂ ⊂ · · · ⊂ racterization using its subclasses: each context-free LOC(1) LOC(2) LOC(k) REG, language L can be represented in the form L = where REG is the class of regular languages [4]. h(D ∩ R), where D is a Dyck language (parenthesis For languages L1, L2, and a morphism h, we use the ∩ { | ∈ ∩ language), R is a regular language, and h is a pro- following notation: h(L1 L2) = h(w) w L1 } L L jection [1]. L2 . For language classes 1 and 2, we introduce Among the variety of representation theo- the following language class: rems for context-free languages [7] [5], Chomsky- H(L1 ∩ L2) = {h(L1 ∩ L2) | h is a morphism, Schützenberger theorem is unique in that it consists Li ∈ Li (i = 1, 2)}. of Dyck languages, regular languages, and simple operations. In this work, we obtain some characteri- We show characterizations and representation the- zations and representation theorems of context-free orems of language families in Chomsky hierarchy with languages and regular languages in Chomsky hier- this notation. archy by insertion systems, strictly locally testable 3. Characterizations by context-free inser- languages, and morphisms in the framework of tion systems Chomsky-Schützenberger theorem. In this section, using insertion systems of weight 2. Preliminaries (i, 0) for i ≥ 1, in which no insertion operation can be

An insertion system is a triple γ = (T,P,AX ), controlled by contexts, we prove the following charac- where T is an alphabet, P is a finite set of inser- terizations concerning the class of regular languages tion rules of the form (u, x, v) with u, x, v ∈ T ∗, and the class of context-free languages. and AX is a finite set of strings over T called ax- We prove the representation theorem for regular r ioms. We write α =⇒γ β if α = α1uvα2 and languages with the help of strictly 2-testable langua- 0 ∩ β = α1uxvα2 for some insertion rule r :(u, x, v) ∈ P ges such that REG = H(INS1 LOC(2)). To show ∗ ⊇ 0 ∩ with α1, α2 ∈ T . A language generated by γ is defi- this, we consider the inclusion REG H(INS1 { ∈ ∗ | ⇒∗ ∈ } LOC(2)), which can be derived from INS0 ⊂ REG, ned by L(γ) = w T s = γ w for some s AX . 1 An insertion system γ is said to be of weight (i, j) if LOC(2) ⊂ REG, and the closure properties of regular i = max{|x| | (u, x, v) ∈ P }, j = max{|u| | (u, x, v) ∈ languages. The other inclusion is proved by simula- ∈ } ≥ j ting the derivations in regular grammar using inser- P or (v, x, u) P . For i, j 0, INSi is the class of all languages generated by insertion systems of weight tion systems and strictly 2-testable languages by the (i′, j′) with i′ ≤ i and j′ ≤ j. induction. From the definition, for any 0 ≤ i′ ≤ i and 0 ≤ j′ ≤ In contrast to this, we show that a regular language ′ j i L = {al | l ≥ 0} ∪ {bl | l ≥ 0} cannot be written in j, we have an inclusion INS ′ ⊆ INS [6]. i j ∩ Furthermore, it has already been known that for the form h(L(γ) R), for any insertion system γ of ≥ 1 weight (i, 0) (∀i ≥ 1), strictly 1-testable language R, any i 1, INSi is properly included by the class of context-free languages, denoted by CF [6] and INS0 and morphism h by contradiction. Then we have the 1 0 ∩ ⊂ is properly included by the class of regular languages, proper inclusion H(INS1 LOC(1)) REG. denoted by REG [2]. The value of weight (1, 0) in insertion systems is op- For a positive integer k ≥ 1, a language L over an timal for expressing regular languages which is shown alphabet T is strictly k-testable if there is a triplet by considering the following example: For an inser- { } { } { } (A, B, C) with A, B, C ⊆ T k such that for any w with tion system γ = ( a, b , (λ, ab, λ) , λ ), a strictly 1-testable language R prescribed by A = B = C = † Office for Strategic Research Planning, Kyushu University, {a, b}, and a morphism h : T ∗ → T ∗ such as h(c) = c 6-10-1 Hakozaki Higashi-ku Fukuoka-shi, 812-8581, Japan. [email protected] (∀c ∈ T ), it is shown that L(γ) ∩ R = h(L(γ) ∩ R) =

{ | ∈ ̸ } 1 ⊂ w w L(γ), w = λ is not regular by considering a can be derived directly from the fact that INS1 language L(γ) ∩ R ∩ {aibj | i, j ≥ 1} = {aibi | i ≥ 1}, CF , LOC(2) ⊂ REG, and the closure property of which is not regular. context-free languages. For context-free languages, we prove the representa- 0 ∩ 5. Conclusion tion theorem such as CF = H(INS2 LOC(2)). To ⊆ 0 ∩ show this, the inclusion CF H(INS2 LOC(2)) In this work, we contribute to the study of inser- is proved by simulating a derivation of context-free tion systems controlled by a context of length 1 and grammar in Chomsky normal form. The other one is context-free insertion systems for new characterizati- 0 ⊂ ⊂ proved by using INS2 CF , LOC(2) REG, and ons of context-free and regular languages. closure properties of context-free languages. We showed the following: • 0 ∩ ≥ 4. Characterizations by insertion systems H(INS1 LOC(k))) = REG with k 2. with a context of length one • H(INS0 ∩ LOC(1)) ⊂ REG ⊂ H(INS0 ∩ Now we consider the characterization and re- 1 i LOC(k)) with i, k ≥ 2. presentation theorems using insertion systems of ≥ 1 • 0 ∩ ≥ weight (i, 1) with i 1. As is shown in [6], INSi H(INSi LOC(k)) = CF with i, k 2. is known to be a proper subset of the class of 1 • 1 ∩ ⊂ ≥ context-free languages. We prove that INSi with H(INSi LOC(1)) CF with i 1. i ≥ 1 is incomparable with the class of regular • H(INS1 ∩ LOC(k)) = CF with i ≥ 1, k ≥ 2. languages by considering an insertion system γ = i ({a, b, c, d}, {(a, c, b), (c, d, b), (c, a, d), (a, b, d)}, {ab}) Acknowledgments and a regular language L = {a2in | n ≥ 1}. With the help of strictly 1-testable languages and This work was supported in part by Grants-in-Aid 1 ⊆ for scientiﬁc research from the Ministry of Education, simple operations, we show the inclusion INSi 1 ∩ Culture, Sports, Science and Technology of Japan H(INSi LOC(1)). The inclusion can be derived easily if we consider the fact V ∗ ∈ LOC(1) for any (No. 23740081). alphabet V . References Furthermore, we show a characterization of context- [1] Chomsky, N., Schutzenberger,M.P. The algebraic free languages using insertion systems of weight (i, 1) theory of context-free languages. Computer Pro- with i ≥ 1 and strictly 1-testable languages. The in- 1 ∩ ⊆ ≥ gramming and Formal Systems, pp.118-161, 1963. clusion H(INSi LOC(1)) CF (i 1) can be de- 1 ⊂ ⊂ rived directly from the fact INSi CF , LOC(1) [2] Fujioka, K. Morphic characterizations of langua- REG, and the closure property of context-free lan- ges in Chomsky hierarchy with insertion and loca- 1 ∩ guages. We prove the proper inclusion H(INSi lity. Inf. Comput., 209, 3, pp.397-408, 2011. LOC(1)) ⊂ CF (i ≥ 1) by showing that for a context- free language L = {anban | n ≥ 1}, there are no inser- [3] Head, T. Splicing representations of strictly lo- tion system γ of weight (i, 1), strictly 1-testable lan- cally testable languages Discrete Applied Math, 87, guage R, and morphism h such that L = h(L(γ) ∩ R) pp.87-139, 1998. by contradiction. [4] McNaughton,R., Papert,S.A. Counter-Free Auto- In contrast to this, we prove the representation the- mata (M.I.T. research monograph no. 65). The orem for the class of context-free languages by inser- MIT Press, 1971. tion systems of weight (1, 1) with the help of strictly 1 ∩ 2-testable languages, that is, H(INS1 LOC(2)) = [5] Okubo, F., Yokomori, Y. Morphic Characteriza- CF . tions of Language Families in Terms of Insertion To show the representation theorem, from Systems and Star Languages. Int. J. Found. Com- Chomsky-Sch¨utzenberger theorem, CF = put. Sci., 22, 1, pp. 247-260, 2011. H(Dyck ∩REG), we consider the inclusion 1 ∩ ⊇ ∩ [6] P˘aun, G., Rozenberg, G., Salomaa, A. DNA H(INS1 LOC(2)) H(Dyck REG), where Dyck is the class of Dyck languages. For any Computing. New Computing Paradigms. Springer, language L = h(D ∩ L(G)) with Dyck language D, 1998. regular grammar G, and morphism h, we construct [7] Rozenberg, G., Salomaa, A. Handbook of formal an insertion system γ of weight (1, 1), a strictly languages Springer-Verlag New York, Inc., 1997. 2-testable language R, and a morphism h′ and prove that h′(L(γ) ∩ R) = L holds. 1 ∩ ⊆ The converse inclusion H(INS1 LOC(2)) CF

Morphic Characterizations with Insertion and Locality in the Framework of Chomsky-Sch¨Utzenberger Theorem 藤岡 薫 † Kaoru Fujioka

Morphic Characterizations with Insertion and Locality in the Framework of Chomsky-Sch¨Utzenberger Theorem 藤岡薫 † Kaoru Fujioka