Multiset Controlled Grammars: a Normal Form and Closure Properties
Total Page:16
File Type:pdf, Size:1020Kb
Indonesian Journal of Electrical Engineering and Computer Science Vol. 8, No. 1, October 2017, pp. 36 ~ 42 DOI: 10.11591/ijeecs.v8.i1.pp36-42 36 Multiset Controlled Grammars: A Normal Form and Closure Properties Salbiah Ashaari1, Sherzod Turaev*2, M. Izzuddin M. Tamrin3, Abdurahim Okhunov4, Tamara Zhukabayeva5 1,2,3Kulliyyah of Information and Communication Technology, International Islamic University Malaysia 53100 Kuala Lumpur, Malaysia 4Kulliyyah of Engineering, International Islamic University Malaysia, 53100 Kuala Lumpur, Malaysia 5Faculty of Information Technology, L.N. Gumilyov Eurasian National University, 010008 Astana, Kazakhstan *Corresponding author, e-mail: [email protected], [email protected], [email protected], [email protected], [email protected] Abstract Multisets are very powerful and yet simple control mechanisms in regulated rewriting systems. In this paper, we review back the main results on the generative power of multiset controlled grammars introduced in recent research. It was proven that multiset controlled grammars are at least as powerful as additive valence grammars and at most as powerful as matrix grammars. In this paper, we mainly investigate the closure properties of multiset controlled grammars. We show that the family of languages generated by multiset controlled grammars is closed under operations union, concatenation, kleene-star, homomorphism and mirror image. Keywords: multiset, regulated grammar, multiset controlled grammar, generative capacity, closure property Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved. 1. Introduction A regulated grammar is depicted as a grammar with an additional (control) mechanism that able to restrict the use of the productions (a.k.a. rules) during derivation process in order to avoid certain derivations as well as to obtain a subset of the language generated in usual way. The primary motivation for introducing regulated grammars came from the fact that a plenty of languages of interest are seen to be non-context free such as the languages with reduplication, multiple agreement or crossed agreement properties. Therefore, the main aim of regulated grammars is to achieve a higher computational power and yet at the same time does not increase the complexity of the model [1, 2]. It is believed that the first regulated grammar, which is a matrix grammar, was introduced by Abraham in 1965 with the idea such in a derivation step, a sequence of productions are applied together [3]. Since then, a plenty of regulated grammars have been introduced and investigated in several papers such as [1-13], where each has a different control mechanism, and provides useful structures to handle a variety of issues in formal languages, programming languages, DNA computing, security, bioinformatics and many other areas. In this paper, we continue our research on multiset controlled grammars (see [4]); we investigate the closure properties of the family of languages generated by multiset controlled grammars. The study of closure properties is one of a crucial investigation in formal language theory since it provides a meaningful merit in both theory and practice of grammars. This paper is structured as follows. First, we give some basic notations and knowledge concerning to the theory of formal languages in which include grammars with regulated rewriting and set-theoretic operations on languages that will be used throughout the study. Then, we recall the definitions of multiset controlled grammars defined in [4] together with results on their generative power. Then, we demonstrate that for multiset controlled grammars one can construct equivalent normal forms, which will be used in study of the closure properties. In the last section, we put in a nutshell all materials studied previously with some possible future research topics on those grammars. Received June 6, 2017; Revised August 16, 2017; Accepted September 1, 2017 IJEECS ISSN: 2502-4752 37 2. Preliminaries In this section, we present some basic notations, terminologies and concepts concerning to the formal languages theories, multiset and regulated rewriting grammars that will be used in the following sections. For details, the reader is referred to [1, 2, 14-16]. Throughout the paper, we use the following basic notations. Symbols and represent the set membership and negation of set membership of an element to a set. Symbol signifies the set inclusion and marks the strict inclusion. Then, for a two sets and ∈ , ∉ if and . Further, notation | | is used to portray the cardinality of a set in which is⊆ the number of elements in the set⊂ as well as notation 2 is used to depict the 퐴power 퐵set퐴 of⊊ a퐵 set퐴 ⊆. Symbol퐵 퐴 ≠ denotes퐵 the empty set.퐴 The sets of integer,퐴 natural, real and rational퐴 numbers are denoted by , , and , respectively.퐴 퐴 ∅ An alphabet is a finite and nonempty set of symbols or letters, which is ℤdenotedℕ ℝ byℚ and a string (sometimes referred as word) over the alphabet is a finite sequence of symbols (concatenation of symbols) from . The string without symbols is called null or empty string andΣ denoted by . The set of all strings (including ) over the alphabet,Σ is denoted by and = { }. A string is a substringΣ of a string if and only if there exist , such∗ that += ∗ where휆 , , , . String is a휆 proper substring of ifΣ and Σ . The 1 2 lengthΣ Σ of− string휆 , denoted푤 by | |∗, is the number of symbols휈 in the string. Obviously,푢 푢 | | = 0 and 1 2 1 2 |휈 |푢=푤|푢| + | |, 푢, 푢 푤 . 휈A ∈languageΣ is푤 a subset of . A language휈 is푤 ≠ -휆free if 푤 ≠ 휈. For a set , a multiset푤 is a mapping∗ 푤: . The set of all multisets∗ over is denoted by 휆 . Then, the푤휈 set 푤 is 푣a called푤 휈 ∈ theΣ basic set of퐿 . For a multisetΣ and퐿 element휆 휆 ∉⊕퐿 , ( ) represents퐴 the number of occurrences휇 퐴 → ℕ of ⊕in . ⊕ 퐴 퐴 A퐴 Chomsky grammar is a quadruple퐴 = ( , , ,휇 )∈ 퐴where is an alphabet푎 ∈ 퐴 휇 푎of nonterminals, is an alphabet of terminals푎 and휇 = , is the start symbol and is a finite set of productions such that ( ) 퐺( 푁)푇×푆(푃 ) . We푁 simply use notation for a production푇 ( , ) . A direct derivation∗푁 ∩ 푇 relation∅∗ 푆 ∈ over푁 (∗ ) , denoted by 푃 , is defined as provided if and only푃 ⊆ if푁 there∪ 푇 푁is 푁a ∪rule푇 푁 ∪ 푇 such ∗that = and 퐴 =→ 푤 for some , 퐴 푤( ∈ 푃 ) . Since is a relation, then its th,푁 ∪ 푇 0, power is denoted⇒ 1 2 by , its transitive푢 ⇒ 푣 closure by ∗ , and its reflexive and퐴 →transitive푤 ∈ 푃 closure by푢 푥. 퐴A푥 string 1 2 1 2 푣 푥(푛푤푥 ) is a sentential푥 푥 ∈ form푁 ∪ 푇if + . If⇒ , then is called푛 a푛 sentence≥ or ∗a terminal …. ⇒ ∗ ⇒ ∗ ∗ ⇒ string and is said to be a successful derivation. We also use the notations or to 푤 ∈ 푁 ∪ 푇 푆 ⇒ 푤 푤 ∈ 푇 푤 푚 푟0푟1 푟푛 denote the derivation∗ that uses the sequence of rules = … , , 1 . The language 푆ge⇒nerated푤 by , denoted by ( ), is defined as ( ) = { | ⇒ ��}�.� �Two� 0 1 푛 푖 grammars and are called to be equivalent if and only if푚 they푟 generate푟 푟 푟 the∈ 푃 ∗same≤ 푖 ∗language,≤ 푛 i.e., ( ) = ( ). There are퐺 five main types퐿 퐺of grammars depending퐿 퐺 on their푤 ∈ 푇productions푆 ⇒ 푤 forms 1 2 : 퐺 퐺 1 2 a)퐿 퐺 a regular퐿 퐺 if and , 푢 →b)푣 a linear if ∗ ∗ and , c) a context-free푣 ∈ if∗푇 ∪ ∗푁( 푇 ∗ ) 푢and∈ 푁 , d) a context-푣sensitive∈ 푇 ∪ 푇 if 푁푇 ( ∗ 푢 )∈ 푁 ( ) and ( ) where | | | | and e) a recursively enumerable푣 ∈ 푁 ∪ 푇 or unrestricted∗푢 ∈+ 푁 if∗ ( ) and+ ( ) where contains at least one 푢nonterminal∈ 푁 ∪ 푇 symbol.푁 푁 ∪ 푇 푣 ∈ 푁 ∪+ 푇 푢 ≤ ∗푣 The families of languages generated by arbitrary,푢 ∈context푁 ∪ 푇 sensitive,푣 context∈ 푁 ∪ 푇free, regular,푢 linear and finite grammars are denoted by , , , , , and , respectively. For these language families, Chomsky hierarchy holds: 퐑퐑 퐂퐂 퐂퐂 퐑퐑퐑 퐋퐋퐋 � . �Before⊂ 퐑퐑퐑 moving⊂ 퐋퐋퐋 to ⊂the퐂퐂 operations⊂ 퐂퐂 ⊂ 퐑퐑 on languages, we recall some definition of regulated grammars mentioned in this study. A matrix grammar is a quadruple = ( , , , ) where , and are defined as for context-free grammar and is a set of matrices, that are finite sequences of context-free rules from × ( ) . Its language is defined퐺 푁by푇 푆( 푀) = { 푁| 푇 푆 ∗ 푀 ∗ and }. 휋 푁 푁 ∪ 푇 퐿 퐺 푤 ∈ 푇 푆 An additive∗ valence grammar is a 5-tuples = ( , , , , ) where , , , are defined as⇒ 푤for a 휋context∈ 푀 -free grammar and is a mapping from into set (set ). The language generated by the grammar consists of all string퐺 푁 푇 such푆 푃 푣that there푁 푇is 푆a 푃derivation 푣 ∗푃 ℤ ℚ 푤 ∈ 푇 푆 Multiset Controlled Grammars: Normal Form and Closure Properties (Salbiah Ashaari) 38 ISSN: 2502-4752 … where ( ) = 0 . The families of languages generated by matrix and additive 푟1푟2 푟푛 valence grammars 푛(with erasing rues) are denoted by , , ( , ), respectively. 푘=1 푘 Now,������ �we푤 recall some∑ 푣 set푟 -theoretic operations that will be used to investigate휆 휆 the closure properties of a grammar. Let and be two languages.퐌퐌퐌 Then,푎퐕퐌퐋 the퐌퐌퐌 union 푎(퐕퐌퐋), intersection ( ), difference ( ) and concatenation ( ) of and are defined as: 1 2 퐿 퐿 ∪ ∩ 1 2 − = { ⋅ }퐿 퐿 = { } 1 2 1 2 퐿 ∪ 퐿 = {푤 ∶ 푤 ∈ 퐿 표푟 푤 ∈ 퐿 } 1 2 1 2 퐿 ∩ 퐿 = { 푤 ∶ 푤 ∈ 퐿 푎푛푎 푤 ∈ 퐿 } 1 2 1 2 퐿 − 퐿 푤 ∶ 푤 ∈ 퐿 푎푛푎 푤 ∉ 퐿 1 2 1 2 1 1 2 2 The complement퐿 ⋅ 퐿 of푤 푤 ∶ 푤 with∈ 퐿 respect푎푛푎 푤 to∈ 퐿 is defined as = . For a language , the iterations of is defined as∗: ∗ ∗ 퐿 ⊆ Σ Σ 퐿� Σ − 퐿 퐿 =퐿{ }, 0 = , 퐿1 = 휆 , 퐿 2 퐿 퐿 = 퐿퐿 (iteration closure: Kleene star). ∗⋯= 푖 (positive iteration closure: Kleene plus). 푖≥0 퐿+ ⋃ 퐿 푖 푖≥1 Given퐿 ⋃ two퐿 alphabets , , a mapping : is called a morphism or synonymously a homomorphism if and only if: ∗ ∗ 1 2 1 2 (i) for every , there existsΣ Σ such that =ℎ Σ( )→ andΣ is distinct, (ii) for every , ∗ ( ) = ( ) (∗ ) .