Journal of Theoretical and Applied Information Technology 30th November 2017. Vol.95. No 22 © 2005 – ongoing JATIT & LLS
ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195

MULTISET CONTROLLED GRAMMARS

¹SALBIAH ASHAARI, ¹SHERZOD TURAEV, ¹M. IZZUDDIN M. TAMRIN, ²ABDURAHIM OKHUNOV, ³TAMARA ZHUKABAYEVA
¹Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 53100 Kuala Lumpur, Malaysia
²Kulliyyah of Engineering, International Islamic University Malaysia, 53100 Kuala Lumpur, Malaysia
³Faculty of Information Technology, L.N. Gumilyov Eurasian National University, 010008 Astana, Kazakhstan
E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]
ABSTRACT
This study focuses on defining a new variant of regulated grammars called multiset controlled grammars and on investigating their computational power. In general, a multiset controlled grammar is a grammar equipped with an arithmetic expression over multisets of terminals, where every production in the grammar is assigned a multiset that represents the number of occurrences of terminals on the right-hand side of the production. A derivation in the grammar is said to be successful if and only if its multiset value satisfies a certain relational condition. In the study, we have found that control by multisets is a powerful yet simple method for regulating generative processes in grammars. We have shown that multiset controlled grammars are at least as powerful as additive valence grammars and at most as powerful as matrix grammars.
Keywords: Multisets, Context-free Grammars, Regulated Grammars, Generative Capacity.
1. INTRODUCTION

Grammars are language generation models that define the strings of their languages through a rewriting process that starts from a special start symbol. Basically, they can be classified into two fundamental categories: context-free grammars and non-context-free grammars. Of the two, context-free grammars are the most developed and best examined grammar class in the Chomsky hierarchy, since they have many good properties in terms of computation and complexity. Additionally, they are easy to implement in many formal language applications. Unfortunately, context-free grammars are not able to cover all aspects that occur in the modeling of phenomena, and it is well known that many real-world problems cannot be described accurately by context-free languages, as discussed in [1].

Thus, we need to go beyond context-free grammars, and one of the solutions is to consider context-sensitive grammars, which are more powerful. Nevertheless, in spite of their great power, they have serious problems in practical use: many of their decision problems are either undecidable or solvable only by exponential algorithms. Furthermore, it is hard or impossible to describe the derivations of context-sensitive grammars by a graph or tree structure, which is an essential tool in analyzing the structure of the problems [1,2]. These are the reasons why many researchers look for intermediate grammars between context-free and context-sensitive grammars, called regulated or controlled grammars, which combine the beauty and simplicity of context-free grammars with the power of context-sensitive grammars.

A regulated or controlled grammar is portrayed as a grammar with some additional
mechanisms by which the application of certain rules is restricted in order to avoid certain derivations. In [1-10], we can find a large number of regulated grammars, both old and new, that preserve the context-free nature of the rules, such as matrix grammars, regularly controlled grammars, vector grammars, random context grammars, tree controlled grammars, semi-conditional grammars, global indexed grammars, Petri net controlled grammars, string-regulated graph grammars, Parikh vector controlled grammars and many more. All of those grammars have achieved many remarkable results within formal language theory, and they differ from each other in their restrictions, which are based either on context or on the use of rules during the process of generating the languages. However, under certain circumstances they are too complicated, or not computationally complete, or belong to a group of grammars with too many unsolvable decision problems, which has lessened their practical interest. Therefore, here we introduce a new controlled grammar called multiset controlled grammar, in which the restriction of rules is based on terminal multisets.

A multiset, as defined in [11], is a collection of unordered objects called elements in which repeated occurrences of identical elements are allowed. It is important to consider multisets since there exist circumstances, such as repeated hydrogen and oxygen atoms in a sulfuric acid molecule, repeated roots of polynomial equations, repeated observations in statistical samples and so on, where the repeated elements need to be counted in order to attain definiteness and adequacy [9]. On account of its aptness, the term has been used interchangeably with a variety of synonyms in different contexts, like heap, bag, occurrence set, fireset, list, weighted set, sample and bunch [13,14].

The notion of relating multiset rewriting with Chomsky grammars was initiated by Kudlek, Martin and Paun in 2001 under the name of "multiset grammars", where they considered the applicability of rules as multisets in restricting the use of productions of grammars [14]. Additionally, the lower and upper bounds of the computational power as well as the closure properties of multiset grammars have been investigated in [14,15]. In both papers, it was found that multiset grammars are more powerful than context-free grammars and have at most the computational power of matrix grammars.

Interestingly, in the same year in which multiset grammars were introduced, the authors of [16] came up with a Chomsky hierarchy characterization of multiset grammars in terms of multiset automata, whose working mode is based on the addition and subtraction of multisets. In that paper [16], they defined three types of multiset automata, known as multiset finite automata (MFA), multiset linear bounded automata (MLBA) and multiset Turing machines (MTM). It was shown that MFAs are as powerful as multiset regular and context-free grammars, Parikh sets of regular and context-free grammars as well as semilinear grammars, while MLBAs are as powerful as monotone grammars and MTMs are as powerful as arbitrary grammars. Besides, it was proven that the deterministic variants of all of those automata are strictly less powerful than the non-deterministic variants, unlike the grammar cases [16].

Since then, the study of multisets has continued to expand. In 2007, the researchers in [17] developed a new grammar model known as random context multiset grammars, which is based on a partial order relation on the objects the grammars deal with, together with the concept of multiset random context checkers and transducers. In that study, they showed how those grammars can generate recursively enumerable sets of finite multisets and can also be easily extended to antiport P systems [17].

Further, in 2013, the authors of [18] made use of the fuzzy concept to introduce two new extensions of multiset grammars and automata, called fuzzy multiset grammars and fuzzy multiset finite automata, and discussed the relationship between fuzzy multiset regular grammars and fuzzy multiset finite automata. They demonstrated that if a fuzzy multiset language is accepted by a fuzzy multiset finite automaton, it can also be generated by a fuzzy multiset regular grammar and vice versa [18]. In addition, they investigated closure properties of the family of fuzzy multiset finite languages under certain regular operations such as union, addition and Kleene star [18]. Shortly thereafter, in 2015, the researchers in [19] widened the study done in [18] by associating a deterministic fuzzy multiset finite automaton with a given fuzzy multiset finite automaton and showing that both automata are equally powerful in the sense of fuzzy multiset language acceptance. They also studied and presented two minimal
realizations of a fuzzy multiset language and proved that the two are isomorphic [19].

In 2016, Sharma et al. presented an in-depth study of fuzzy multiset grammars in which they introduced two new variants of grammars, called fuzzy multiset left linear grammars and fuzzy multiset right linear grammars. They proved that both grammars, along with fuzzy multiset grammars and fuzzy multiset regular grammars, are equivalent to each other in normal form except for the case of the empty string. They also showed that the class of fuzzy multiset automaton languages is closed under homomorphism, inverse homomorphism, right quotient by any multiset, quotient with arbitrary multisets and reversal of a fuzzy multiset finite language [22].

This paper is outlined as follows. First, we recall some well-known basic notations, terminologies and concepts related to formal language theory, multisets and regulated rewriting grammars, which will be used throughout this paper. Then, we introduce a new type of controlled grammars called multiset controlled grammars. Further, we investigate their generative power and closure properties. Lastly, we give a brief summary of the material discussed in this paper together with some open problems regarding multiset controlled grammars.

2. PRELIMINARIES

In this section, we recall some well-known basic notations, terminologies and concepts related to formal language theory, multisets and regulated rewriting grammars that will be used in the next sections. For more detailed information, the reader can refer to [1,2,20-21].

2.1 General Notations
Throughout the paper, we use the following notations. The symbols $\in$ and $\notin$ are used to represent the set membership and negation of set membership of an element to a set, respectively. The symbol $\subseteq$ signifies the set inclusion which is not necessarily proper, and the symbol $\subset$ marks the strict inclusion. $|A|$ denotes the cardinality of a set $A$, which is the number of elements in $A$, and $2^A$ denotes the power set of a set $A$. The symbol $\emptyset$ denotes the empty set, which is the set without elements. The sets of integer, natural, real and rational numbers are denoted by $\mathbb{Z}$, $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{Q}$, respectively. An alphabet $\Sigma$ is a finite and nonempty set of elements known as symbols or letters, and a string (sometimes referred to as a word) over the alphabet $\Sigma$ is a finite sequence of symbols of $\Sigma$. The string without symbols is called the null or empty string, and it is denoted by $\lambda$. The set of all strings (including $\lambda$) over the alphabet $\Sigma$ is represented by $\Sigma^*$, and the set of all strings except the empty string is denoted by $\Sigma^+$, i.e., $\Sigma^+ = \Sigma^* - \{\lambda\}$. A language $L$ is a subset of $\Sigma^*$.

For a set $A$, a mapping $\mu : A \to \mathbb{N}$ is called a multiset. The set of all multisets over $A$ is denoted by $A^{\oplus}$. Then, the set $A$ is called the basic set of $A^{\oplus}$. For a multiset $\mu \in A^{\oplus}$ and an element $a \in A$, $\mu(a)$ represents the number of $a$'s in $\mu$.

2.2 Grammars
A phrase structure grammar is a quadruple $G = (N, T, S, P)$ where $N$ is an alphabet of nonterminals, $T$ is an alphabet of terminals with $N \cap T = \emptyset$, $S \in N$ is the start symbol, and $P$ is a finite set of productions of the form $\alpha \to \beta$ where $\alpha \in (N \cup T)^* N (N \cup T)^*$ and $\beta \in (N \cup T)^*$. If a production is of the form $\alpha \to \lambda$, then it is called an erasing rule.

For a grammar $G = (N, T, S, P)$, the direct derivation relation over $(N \cup T)^*$, denoted by $\Rightarrow$, is defined as follows: $x \Rightarrow y$ if and only if there is a rule $\alpha \to \beta \in P$ such that $x = x_1 \alpha x_2$ and $y = x_1 \beta x_2$ for $x_1, x_2 \in (N \cup T)^*$. Since $\Rightarrow$ is a relation, its $n$th power, $n \ge 0$, is denoted by $\Rightarrow^n$, its transitive closure by $\Rightarrow^+$, and its reflexive and transitive closure by $\Rightarrow^*$. A string $w \in (N \cup T)^*$ is a sentential form if $S \Rightarrow^* w$. If $w \in T^*$, then $w$ is called a sentence or a terminal string and $S \Rightarrow^* w$ is said to be a successful derivation. We also use the notations $\Rightarrow^{r_1 r_2 \cdots r_n}$ or $\Rightarrow^{\pi}$ to denote a derivation that uses the sequence of rules $\pi = r_1 r_2 \cdots r_n$, $r_i \in P$, $1 \le i \le n$.

The language generated by $G$, denoted by $L(G)$, is defined as $L(G) = \{w \in T^* \mid S \Rightarrow^* w\}$. Two grammars $G_1$ and $G_2$ are called equivalent if and only if they generate the same language, i.e., $L(G_1) = L(G_2)$.

The Chomsky hierarchy classifies all grammars into four basic categories according to the form of production rules, i.e., a grammar $G = (N, T, S, P)$ is called
• unrestricted or recursively enumerable grammar (type-0) if its productions are of the form $\alpha \to \beta$ where $\alpha \in (N \cup T)^+$, $\beta \in (N \cup T)^*$ and $\alpha$ contains at least one nonterminal symbol.
• context-sensitive grammar (type-1) if its productions are of the form $\alpha \to \beta$ where $|\alpha| \le |\beta|$, $\alpha \in (N \cup T)^* N (N \cup T)^*$ and $\beta \in (N \cup T)^+$.
• context-free grammar (type-2) if its productions are of the form $\alpha \to \beta$ where $\alpha \in N$ and $\beta \in (N \cup T)^*$.
• linear grammar if its productions are of the form $\alpha \to \beta$ where $\alpha \in N$ and $\beta \in T^* \cup T^* N T^*$.
• regular grammar (type-3) if its productions are of the form $\alpha \to \beta$ where $\alpha \in N$ and $\beta \in T^* \cup T^* N$.

The families of languages generated by arbitrary (unrestricted), context-sensitive, context-free, regular, linear and finite grammars are denoted by $\mathbf{RE}$, $\mathbf{CS}$, $\mathbf{CF}$, $\mathbf{REG}$, $\mathbf{LIN}$ and $\mathbf{FIN}$, respectively. For these language families, the Chomsky hierarchy holds: $\mathbf{FIN} \subset \mathbf{REG} \subset \mathbf{LIN} \subset \mathbf{CF} \subset \mathbf{CS} \subset \mathbf{RE}$.

2.3 Grammars with Regulated Rewriting
We recall some definitions of controlled grammars that will be used throughout the paper.

A matrix grammar is a quadruple $G = (N, T, S, M)$ where $N$, $T$ and $S$ are defined as for a context-free grammar and $M$ is a set of matrices, that is, finite sequences of context-free rules from $N \times (N \cup T)^*$. The language generated by $G$ is defined by $L(G) = \{w \in T^* \mid S \Rightarrow^{\pi} w \text{ and } \pi \in M^*\}$.

An additive valence grammar is a 5-tuple $G = (N, T, S, P, v)$ where $N$, $T$, $S$, $P$ are defined as for a context-free grammar and $v$ is a mapping from $P$ into $\mathbb{Z}$ ($\mathbb{Q}$). The language generated by the additive (multiplicative) grammar consists of all strings $w \in T^*$ such that there is a derivation $S \Rightarrow^{r_1 r_2 \cdots r_n} w$ where $\sum_{i=1}^{n} v(r_i) = 0$.

The families of languages generated by matrix and additive valence grammars are considered both without and with erasing rules.

3. DEFINITIONS AND EXAMPLES

In this section, we give a precise definition of a multiset controlled grammar, a derivation step and a successful derivation, together with some examples of languages of multiset controlled grammars.

Definition 1. A multiset controlled grammar is a 6-tuple $G = (N, T, S, P, \oplus, F)$ where $N$, $T$ and $S$ are defined as for a context-free grammar, $P$ is a finite subset of $N \times (N \cup T)^* \times T^{\oplus}$ and $F$ is a linear or nonlinear function defined on $T^{\oplus}$. A triple $(A, x, \omega) \in P$ is written as $A \to x[\omega]$.

If $F(a_1, a_2, \ldots, a_n)$, $a_i \in T$, $1 \le i \le n$, is linear, then it is a linear combination of the numbers of occurrences of $a_1, a_2, \ldots, a_n$ with constant coefficients. As a nonlinear function $F$, we can consider logarithms, polynomials, rational, exponential and power functions, and so on.

Definition 2. A string-weight pair $(x, \omega) \in (N \cup T)^* \times T^{\oplus}$ is called a sentential form, written as $x[\omega]$. We say that $u[\omega] \in (N \cup T)^* \times T^{\oplus}$ directly derives $v[\omega'] \in (N \cup T)^* \times T^{\oplus}$ in $G$, written as $u[\omega] \Rightarrow_r v[\omega']$, if and only if $u = u_1 A u_2$ and $v = u_1 x u_2$ for some $u_1, u_2 \in (N \cup T)^*$, where $r : A \to x[\nu] \in P$ with $\omega' = \omega \oplus \nu$.

Definition 3. A derivation $S \Rightarrow_{r_1} w_1[\omega_1] \Rightarrow_{r_2} w_2[\omega_2] \Rightarrow \cdots \Rightarrow_{r_n} w_n[\omega_n]$, $n \ge 1$, in $G$ is called successful if and only if $w_n \in T^*$ and $\omega_n \in T^{\oplus}$. For short, it can be written as $S \Rightarrow^{\pi} w_n[\omega_n]$ where $\pi = r_1 r_2 \cdots r_n$.

Definition 4. The language of the grammar $G$, denoted by $L(G)$, consists of all strings obtained by successful derivations in $G$, i.e., all $w \in T^*$ such that $S \Rightarrow^{\pi} w[\omega]$ for some $\omega \in T^{\oplus}$, where $\pi = r_1 r_2 \cdots r_n$.

Definition 5. A threshold language of the grammar $G$ consists of all strings $w \in T^*$ obtained by successful derivations $S \Rightarrow^{\pi} w[\omega]$ for which the value $F(\omega)$ stands in a given relation to a given cut point or cut point set.

Families of threshold languages are considered for multiset controlled regular, linear and context-free grammars, with and without erasing rules. A bracket notation is used to show that a statement holds in both cases, with and without erasing rules.
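The derivation step and the threshold condition described in this section can be sketched in code. The following is a minimal illustration only: the toy grammar, the rule names `r0` to `r4`, the linear function $F = \#a - \#b$ and the cut point 0 with the relation "=" are all assumptions chosen for the sketch and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy grammar (an assumption for illustration): each production
# is (left-hand nonterminal, right-hand side, multiset of terminals it adds).
PRODUCTIONS = {
    "r0": ("S", "AB", Counter()),
    "r1": ("A", "aA", Counter({"a": 1})),
    "r2": ("A", "a", Counter({"a": 1})),
    "r3": ("B", "bB", Counter({"b": 1})),
    "r4": ("B", "b", Counter({"b": 1})),
}


def derive(start, rule_names):
    """Apply the named rules in order, summing the multisets they carry."""
    form, weight = start, Counter()
    for name in rule_names:
        lhs, rhs, omega = PRODUCTIONS[name]
        if lhs not in form:
            raise ValueError(f"rule {name} is not applicable to {form!r}")
        form = form.replace(lhs, rhs, 1)  # rewrite the leftmost occurrence
        weight = weight + omega           # multiset addition
    return form, weight


def linear_f(weight):
    """Assumed linear function: number of a's minus number of b's."""
    return weight["a"] - weight["b"]


def in_threshold_language(rule_names, cut_point=0):
    """True if the derivation ends in a terminal string and F equals the cut point."""
    form, weight = derive("S", rule_names)
    is_terminal = form.islower()  # nonterminals are uppercase in this sketch
    return is_terminal and linear_f(weight) == cut_point
```

Under these assumptions, the rule sequence `r0 r1 r2 r3 r4` derives `aabb` with equal counts of `a` and `b` and is accepted, whereas `r0 r1 r2 r4` derives `aab` and is rejected because the value of the assumed function is 1 rather than 0.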
Further, to demonstrate the derivation process of multiset controlled grammars, we provide two examples using linear and nonlinear functions $F$.

Example 1 (Linear Function). Let $G = (\{S, A, B\}, \{a, b, c\}, S, P, \oplus, F)$ be a multiset controlled grammar where $P$ consists of five productions whose assigned multisets are $[0,0,0]$, $[1,1,0]$, $[1,1,0]$, $[0,1,1]$ and $[0,1,1]$, in this order, together with a linear function $F$, and where we consider a threshold language of the grammar with cut point $0$.

We start with the only applicable production rule. Then we can either
• terminate the derivation by applying the corresponding productions, or
• continue rewriting by applying the other productions.

Then, the derivation can be continued from $A$ and $B$ by applying productions any number of times. To terminate the derivation, productions

grammar where $P$ consists of three productions, each with the assigned multiset $[1]$, together with a logarithmic function $F$, and where we consider a threshold language of the grammar $G$.

We start with the only applicable production rule. Then we can either
• terminate the derivation by applying the corresponding production, or
• continue rewriting by applying the other production.

Then, the derivation can be continued any number of times. To terminate the derivation, the corresponding production should be applied. In general, each derivation step adds $1$ to the multiset value, and the logarithm is applied to the resulting value. For instance, a string can be obtained by a derivation whose multiset value grows step by step up to $4$, to which the logarithm is then applied.
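Control by a nonlinear function can be sketched in the same spirit. The snippet below is an assumption-laden illustration, not a reconstruction of the example above: it uses a hypothetical one-terminal grammar with productions $S \to aS\,[1]$ and $S \to a\,[1]$, takes $F$ to be the base-2 logarithm of the number of $a$'s, and accepts a string when that value is a natural number.

```python
import math
from collections import Counter


def derive_a_power(n):
    """Derive a^n with the assumed rules S -> aS [1] (n - 1 times) and S -> a [1]."""
    if n < 1:
        raise ValueError("at least one production must be applied")
    form, weight = "S", Counter()
    for _ in range(n - 1):
        form = form.replace("S", "aS", 1)   # assumed rule S -> aS with multiset [1]
        weight = weight + Counter({"a": 1})
    form = form.replace("S", "a", 1)        # assumed rule S -> a with multiset [1]
    weight = weight + Counter({"a": 1})
    return form, weight


def log_f(weight):
    """Assumed nonlinear function: base-2 logarithm of the number of a's."""
    return math.log2(weight["a"])


def accepted(n):
    """True if F of the accumulated multiset is a natural number."""
    _, weight = derive_a_power(n)
    return log_f(weight).is_integer()
```

With these assumptions, strings whose length is a power of two pass the check and all others fail, which shows how a simple context-free rule set combined with a nonlinear condition on the multiset can select a non-context-free set of strings.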