Indexed Counter Languages Informatique Théorique Et Applications, Tome 26, No 1 (1992), P
Total Page:16
File Type:pdf, Size:1020Kb
INFORMATIQUE THÉORIQUE ET APPLICATIONS J. DUSKE M. MIDDENDORF R. PARCHMANN Indexed counter languages Informatique théorique et applications, tome 26, no 1 (1992), p. 93-113 <http://www.numdam.org/item?id=ITA_1992__26_1_93_0> © AFCET, 1992, tous droits réservés. L’accès aux archives de la revue « Informatique théorique et applications » im- plique l’accord avec les conditions générales d’utilisation (http://www.numdam. org/conditions). Toute utilisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit contenir la présente mention de copyright. Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques http://www.numdam.org/ Informatique théorique et Applications/Theoretical Informaties and Applications (vol. 26, n° 1, 1992, p. 93 à 113) INDEXED COUNTER LANGUAGES (*) by J. DUSKE (x), M. MIDDENDORF (*) and R. PARCHMANN (*) Communicated by J. BERSTEL Abstract. — Starting with the characterization of context-free counter languages by rightlinear indexed grammars with one index, indexed counter grammars are introduced and investigated. The family of indexed counter languages is a full AFL properly contained in the family of indexed languages and incomparable with the full trio oflinear indexed languages. Furthermore by modifying the dérivation mode, a characterization of type-0 languages by indexed counter grammars is given. Résumé. - Après une caractêrisation des langages algébriques à compteurs par des grammaires indexées linéaires droites d'index 1, on introduit et étudie les grammaires indexées à compteurs. La famille des langages indexés à compteurs est une AFL proprement contenue dans la famille des langages indexés, et incomparable au cône rationnel des langages indexés linéaires. De plus, en modifiant le mode de dérivation, on obtient une caractêrisation des langages de type 0 par des grammaires indexées à compteurs. 1. INTRODUCTION Indexed grammers have been introduced by Aho [1] as an extension of context-free grammars. In the study of indexed grammars the question arises whether the generative power of these grammars dépends on the number of indices. It is obvious, that an indexed grammar with an empty set of indices can only generate context-free languages. On the other hand two indices suffîce to generate ail indexed languages, because the indices of a gênerai indexed grammar can be coded by words of two indices. The concept of indexed grammars permits two principle ways of represen- ting context-free languages; first by using a context-free grammar, which is a special form of an indexed grammar, and second by using a rightlinear (*) Received August 1990, revised November 1990. (*) Institut fur Informatik, Universitât Hannover, D-3000 Hannover, Germany. Informatique théorique et Applications/Theoretical Informaties and Applications 0988-3754/92/01 93 21/S4.10/© AFCET-Gauthier-Villars 94 J. DUSKE, M. MIDDENDORF, R. PARCHMANN indexed grammar, which can be interpreted as a grammatical description of a pushdown automaton. Hence there is a natural way of representing context-free counter languages as rightlinear indexed grammars with only one index and a special "endmar- ker". Extending this concept to the case of a (genera!) indexed grammar, the use of only one index symbol can be interpreted as "counting while perform- ing context-free dérivations". The "endmarker" allows to detect a count of zero and to start counting again. In Section 2 we introducé such grammars formally and give examples. These grammars are called indexed counter grammars or ic-grammars and the corresponding languages are called indexed counter or ic-languages. Our investigations show that this subclass of indexed languages shares many properties with the context-free counter languages. In Section 3 we first give normal forms of ic-grammars. In particular, using regularity properties of the index words appearing in dérivations of indexed grammars, we arrive at the e-free Standard form of ic-grammars. Then it is shown that grammars in this standard form allow dérivations of terminal words such that the lengths of the appearing index words in these dérivations are linear bounded by the lengths of the derived words. With the aid of this result we give in Section 4 an example of an indexed language which is not an ic-language, i. e, the family of ic-languages (which is a full AFL) is properly contained in the family of indexed languages. We also investigate the linear and rightlinear forms of ic-grammars and completely characterize the subset and proper subset relation between the classes of (linear, rightlinear) indexed, (linear, rightlinear) indexed counter and (linear, rightlinear) context-free grammars. In the final Section 5 we consider indexed grammars with another dériva- tion mode introduced in [6]. Under this dérivation mode ic-grammârs generate the same languages as phrase structure grammars. This result shows that type-O languages can be obtained by "counting on leftmost dérivations of a context-free grammar". 2. FORMAL DEFINITIONS AND BASIC PROPERTIES Let us first defme the notion of an (linear, rightlinear) indexed grammar: DÉFINITION 2.1: An indexed grammar is a 5-tuple G = (N,T,I,P,S) where (1) TV, T, ƒ are finite, pairwise disjoint sets; the sets of variables, terminals and indices respectively; Informatique théorique et Applications/Theoretical Informaties and Applications INDEXED COUNTER LANGUAGES 95 (2) P is a finite set of pairs (Af 0), AeN, fel\j {e}, 0e(7V7* U T)*9 the set of productions; (Af 0) is denoted by Af-> 0; (3) S e TV, the start variable. Let ® = u1B1$1u2B2$2. .£„p„wM +1 with WJÊ?* for ie[l:«+l], and PJ.GT* foryefl :n] with n^O, be an element of (NP U T)*9 and let y G 7*, then we set For 0', 0"e(ffur)*, we set 0'=>0" iff ®' = ® X (0:y)02 with®!, ®2e(NI*\JT)* and Af ^QeP,feI{J {e}. n + # => is the ft-fold product, => is the transitive and => is the reflexive, transitive closure of =>. An indexed grammar G = (N, T, /, P, S) is called a linear indexed grammar, iff each production in P is of one of the forms Af-+ uByv or Af^y u with A, BeN,feIU{e}9u, veT* and y e/*. An indexed grammar G = (TV, 71, ƒ, P, 5) is called a rightlinear indexed gram- mar, iff each production in P is of one of the forms Af^uBy or Af^u witlij, £eiV,/e/U {e}, weT* andye/*. The language L(G) generated by an (linear, rightlinear) indexed grammar G = (N,T,I9P9S) is the set L(G) = { w\ we T*9 sXw}. A language L is called an (linear, rightlinear) index language iff L~L(G) for an (linear, rightlinear) indexed grammar G. The index words in a dérivation of such a rightlinear grammar can be interpreted as the pushdown list; the nonterminals can be interpreted as states of a suitable pushdown automaton. Vice versa for a given pushdown automaton a rightlinear indexed grammar can be constructed which générâtes the language that is accepted by that automaton. It follows that the rightlinear indexed languages are exactly the languages accepted by pushdown automata, Le. the context-free languages, as has been shown by Aho [1]. A pushdown automaton with only one pushdown symbol is a counter. Such a device must stop if the pushdown store is empty, Le. it has counted to zero. An iterated counter may count down to zero several times. For this purpose there is a bottom marker # in the pushdown store of an iterated counter. vol. 26, n° 1, 1992 96 J. DUSKE, M. MIDDENDORF, R. PARCHMANN The formai définition of such an automaton is as follows [4]: DÉFINITION 2.2: An iterated counter is a pushdown automaton K= (Z, T, r, Ô, z0, #, F) with r = {ƒ # } and where zeZ and aeTKJ [e] {e dénotes the empty word). The classes of languages accepted by these automata with final state, empty store or both coincide [4]. It is easy to construct a rightlinear indexed grammar for the iterated counter K= (Z, T,T,8,z0, #,F) of Définition 2.2 which générâtes the set of all words accepted by K with final states. Set G = (N9T,I9P9S) with TV- Z U { S }, ƒ = r = {ƒ, # } and the following set of productions: (a) S^zo# (b) if (z', f) e 8 (z, fl, ƒ) then zf -> az' ƒ ' e ƒ> (c) if (z', / # ) G ô (z, a, # ) then z # -> az' ƒ #e? (J) for ail zeF the production z -> e is in P and if (z',e)eS(z,a, #) with z'ef, then z# -+aeP. The productions of the form z# -^az'f(# represent the capability of K to start counting again, i. e, the itération capability. Counting with the help of the pushdown store of K corresponds to counting on dérivations of a rightlinear grammar. It is now interesting to investigate the problem of counting on dérivations of (linear) context-free grammars. This leads to the définition of an (rightlinear, linear) indexed counter grammar. DÉFINITION 2.3: An indexed grammar G = (N,T,I,P,S) is called indexed counter grammar {ic-grammar) iff ƒ== {ƒ, # } and the productions in P are of one of the forms: (a) S where S does not appear in any other production in P (b) Ag^G, ge{f,e}, ®e(Nf*\JT)* (c) A # -> 0, 0 G {NP # U T)*. G is called linear indexed counter grammar {linear ic-grammar) iff in the above définition 0 e 7* Nf* T* U T* in {b) and 0 e T* Nf* # r* U T* in (c). G is called rightlinear indexed counter grammar {rightlinear ic-grammar) iff in the above définition @eT* Nf* \J V in (b) and 0 e r* TV/* # U T* in (c).