
Indexed Languages and Unification Grammars*

Tore Burheim†

Abstract

Indexed languages are interesting in computational linguistics because they are the least class of languages in the Chomsky hierarchy that has not been shown to be inadequate for describing the string set of natural language sentences. We here define a class of unification grammars that exactly describes the class of indexed languages.

Introduction

The occurrence of purely syntactical cross-serial dependencies in Swiss-German shows that context-free grammars cannot describe the string sets of natural language [Shi]. The least class in the Chomsky hierarchy that can describe unlimited cross-serial dependencies is the class of indexed grammars [Aho]. Gazdar discusses in [Gaz] the applicability of indexed grammars to natural languages and shows how they can be used to describe different syntactic structures. We are here going to study how we can describe the class of indexed languages with a unification grammar formalism. After defining indexed grammars and a simple unification grammar framework, we show how we can define an equivalent unification grammar for any given indexed grammar. Two grammars are equivalent if they generate the same language. With this background we define a class of unification grammars and show that this class describes the class of indexed languages.

Indexed grammars

Indexed grammars are a grammar formalism with generative capacity between context-free grammars and context-sensitive grammars. Context-free grammars cannot describe cross-serial dependencies, due to the pumping lemma, while indexed grammars can. However, the class of languages generated by indexed grammars, the indexed languages, is a proper subset of the context-sensitive languages [Aho]. An indexed grammar can be seen as a context-free grammar where we add a string, or stack, of indices to the nonterminal nodes in the phrase structure trees, or derivation trees as we will call them. Some production rules add an index to the beginning of the string, while the use of
other production rules is dependent on the first index in the string. When such a production rule is applied, the index on which it depends is removed and the rest of the index string is kept by the daughters. In this way we may distribute information from one part of the derivation tree to another. The original definition of indexed grammars was given by Aho [Aho]. We are here using the definition used by Hopcroft and Ullman [HU], with some minor notational variations.

* This work has been supported by a grant from the Norwegian Research Council.
† University of Bergen, Department of Informatics, Bergen, Norway, and University of the Saarland, Computational Linguistics, Saarbrücken, Germany. Email: Tore.Burheim@ii.uib.no

Definition 1. An indexed grammar $G$ is a 5-tuple $G = \langle N, T, I, P, S \rangle$ where

$N$ is a finite set of symbols, called nonterminals,
$T$ is a finite set of symbols, called terminals,
$I$ is a finite set of symbols, called indices,
$P$ is a finite set of ordered pairs, each on one of the forms $\langle A, Bf \rangle$, $\langle Af, \alpha \rangle$ or $\langle A, \alpha \rangle$, where $A$ and $B$ are nonterminal symbols in $N$, $\alpha$ is a finite string in $(N \cup T)^*$, and $f$ is an index in $I$. An element in $P$ is called a production rule and is written $A \to Bf$, $Af \to \alpha$ or $A \to \alpha$,
$S$ is a symbol in $N$, and is called the start symbol,

and such that $N$, $T$ and $I$ are pairwise disjoint.

An indexed grammar $G = \langle N, T, I, P, S \rangle$ is on reduced form if each production in $P$ is on one of the forms

a) $A \to Bf$
b) $Af \to B$
c) $A \to BC$
d) $A \to t$

where $A$, $B$, $C$ are in $N$, $f$ is in $I$ and $t$ is in $T \cup \{\varepsilon\}$.

Aho showed in his original paper [Aho] that for every indexed grammar there exists an indexed grammar on reduced form which generates the same language.

To define constituent structures and derivation trees we are going to use tree domains¹. Let $\mathbb{N}_+$ be the set of all integers greater than zero. A tree domain $D$ is a set $D \subseteq \mathbb{N}_+^*$ of number strings such that if $x \in D$ then all prefixes of $x$ are also in $D$, and for all $i \in \mathbb{N}_+$ and $x \in \mathbb{N}_+^*$, if $xi \in D$ then $xj \in D$ for all $j$, $1 \le j < i$. The out degree $d(x)$ of an element $x$ in a tree domain $D$ is the cardinality of the set $\{i \mid xi \in D,\ i \in \mathbb{N}_+\}$. The set of terminals
of $D$ is $\mathrm{term}(D) = \{x \mid x \in D,\ d(x) = 0\}$. The elements of a tree domain are totally ordered lexicographically as follows: $x \preceq y$ if $x$ is a prefix of $y$, or there exist strings $z, z', z'' \in \mathbb{N}_+^*$ and $i, j \in \mathbb{N}_+$ with $i < j$, such that $x = ziz'$ and $y = zjz''$. We also define that $x \prec y$ if $x \preceq y$ and $x \ne y$.

A tree domain $D$ can be viewed as a tree graph in the following way: the elements of $D$ are the nodes in the tree, $\varepsilon$ is the root, and for every $x \in D$ the element $xi \in D$ is $x$'s child number $i$. A tree domain may be infinite, but we shall restrict attention to finite tree domains. A finite tree domain can also describe the topology of a derivation tree. This representation provides a name for every node in the derivation tree directly from the definition of a tree domain. Our definition of derivation trees for indexed grammars with the use of tree domains is based on Hayashi [Hay].

¹ See Gallier [Gal] for more about tree domains.

Definition 2. A derivation tree based on an indexed grammar $G = \langle N, T, I, P, S \rangle$ is a pair $\langle D, C_I \rangle$ of a finite tree domain $D$ and a function $C_I : D \to (NI^* \cup T \cup \{\varepsilon\})$ where

i) $C_I(\varepsilon) = S$.

ii) $C_I(x) \in NI^*$ for every node $x$ in $D$ with $d(x) > 0$. Moreover, if $C_I(x) = A\gamma$ for $A \in N$ and $\gamma \in I^*$, and $C_I(xi) = B_i\gamma_i$ with $B_i \in (N \cup T \cup \{\varepsilon\})$ and $\gamma_i \in I^*$ for every $i$, $1 \le i \le d(x)$, then either

a) $A \to B_1 f$ is a production rule in $P$ such that $d(x) = 1$, $f \in I$ and $\gamma_1 = f\gamma$, or
b) $Af \to B_1 \ldots B_{d(x)}$ is a production rule in $P$ such that $f \in I$, where $\gamma = f\gamma'$, and $\gamma_i = \gamma'$ if $B_i \in N$ and $\gamma_i = \varepsilon$ if $B_i \in T \cup \{\varepsilon\}$, or
c) $A \to B_1 \ldots B_{d(x)}$ is a production rule in $P$ such that $\gamma_i = \gamma$ if $B_i \in N$ and $\gamma_i = \varepsilon$ if $B_i \in T \cup \{\varepsilon\}$.

iii) $C_I(x) \in T \cup \{\varepsilon\}$ for every node $x$ in $D$ with $d(x) = 0$.

The symbol function $C_I^{sym} : D \to (N \cup T \cup \{\varepsilon\})$ and the index string function $C_I^{idx} : D \to I^*$ are total functions on $D$ such that if $C_I(x) = A\gamma$, where $A \in (N \cup T \cup \{\varepsilon\})$ and $\gamma \in I^*$, then $C_I^{sym}(x) = A$ and $C_I^{idx}(x) = \gamma$ for all $x \in D$.

The terminal string of a derivation tree $\langle D, C_I \rangle$ is the string $C_I(x_1) \ldots C_I(x_n)$ where $\{x_1, \ldots, x_n\} = \mathrm{term}(D)$ and $x_i \prec x_{i+1}$ for all $i$, $1 \le i < n$.

We also define the license function $\mathit{license} : (D \setminus \mathrm{term}(D)) \to P$ such that if $A \to \alpha$ is a production rule according to a), b) or c) in ii) for a node $x$ in $D$, then $\mathit{license}(x) = A \to \alpha$.

Informally, this is a traditional phrase structure tree. If we have a node with label $A\gamma$, where $A$ is a nonterminal symbol and $\gamma$ is a string of indices, and we use a production rule $A \to Bf$, then the node's only child gets the label $Bf\gamma$. If we instead use a production rule $A \to BC$ on the same node, it gets two children labeled $B\gamma$ and $C\gamma$ respectively, or if we use a production rule $A \to t$, where $t$ is a terminal symbol, then we remove all the indices and the node's only child gets the label $t$. If we have a node labeled with $Af\gamma$, where $f$ is an index, and we use a production rule $Af \to B$, then the node's only child gets the label $B\gamma$. We also see that the terminal string is a string in $T^*$, since $C_I(x) \in T \cup \{\varepsilon\}$ for all $x \in \mathrm{term}(D)$.

Definition 3. A string $w$ is grammatical with respect to an indexed grammar $G$ if and only if there exists a derivation tree based on $G$ with $w$ as the terminal string. The language generated by $G$, $L(G)$, is the set of all grammatical strings with respect to $G$.

Example 1. Let $G = \langle N, T, I, P, S \rangle$ be an indexed grammar where $T = \{a, b, c\}$ is the set of terminal symbols, $N = \{S, S', A, B, C\}$ is the set of nonterminal symbols, $I = \{f, g\}$ is the set of indices, and $P$ is the least set containing the following production rules:

$S \to S'f$      $Ag \to aA$      $Af \to a$
$S' \to S'g$     $Bg \to bB$      $Bf \to b$
$S' \to ABC$     $Cg \to cC$      $Cf \to c$

Figure 1 shows the derivation tree for the string $aabbcc$ based on this grammar. The language $L(G)$ generated by this grammar is $\{a^n b^n c^n \mid n \ge 1\}$.

Figure 1: Derivation tree for the string $aabbcc$ based on the grammar in Example 1. (Node labels, top down: S; S'f; S'gf; Agf, Bgf, Cgf; a Af, b Bf, c Cf; a, b, c.)

We close this presentation of indexed grammars by showing a simple technical observation that we will use in later proofs.

Definition 4. An indexed grammar $G = \langle N, T, I, P, S \rangle$ has a marked index-end if and only if it has one and only one production rule where the start symbol occurs, and this rule is on the form $S \to A\$$, where $A \in N$ and the index $\$$ does not occur in any other production rule.

If an indexed grammar has a marked index-end, then in any derivation tree every nonterminal node except the root gets a $\$$ at the end of the index list. Since no rule requires that there
is an empty index list, and neither $\$$ nor the start symbol occurs in any other production rule, it is straightforward to construct an equivalent grammar with a marked index-end for any indexed grammar.

Lemma 1. For every indexed grammar $G$ there exists an indexed grammar with a marked index-end, $G_\$$, such that $L(G) = L(G_\$)$.

Proof. Let $G = \langle N, T, I, P, S \rangle$ be an indexed grammar and assume that $S_0$ and $\$$ do not occur in $G$. $G_\$$ is defined from $G$ by adding the production rule $S_0 \to S\$$ such that $S_0$ becomes the new start symbol and is added to the set of nonterminal symbols, and $\$$ is added to the set of indices. Formally, if $G = \langle N, T, I, P, S \rangle$ and $S_0, \$ \notin (N \cup T \cup I)$, then $G_\$ = \langle N \cup \{S_0\}, T, I \cup \{\$\}, P \cup \{\langle S_0, S\$ \rangle\}, S_0 \rangle$. Then $G_\$$ has a marked index-end, and we have to show that for any string $w$, $w \in L(G)$ if and only if $w \in L(G_\$)$.

Let $\langle D, C_I \rangle$ be any derivation tree based on $G$ and assume that $w$ is its terminal string. From this we construct a derivation tree $\langle D', C'_I \rangle$ based on $G_\$$ as follows. First let $D' = \{1x \mid x \in D\} \cup \{\varepsilon\}$. Then let $C'_I(\varepsilon) = S_0$ and let $C'_I(1x) = C_I(x)\$$ for all $x \in (D \setminus \mathrm{term}(D))$. Let also $C'_I(1x) = C_I(x)$ for all $x \in \mathrm{term}(D)$. The derivation tree $\langle D', C'_I \rangle$ then has the same terminal string as $\langle D, C_I \rangle$. Since no rule requires that there is an empty index list and $\$$ does not occur in any production rule in $G$, a production rule that is licensing a no
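
The index handling of Definition 2 and the construction used in Lemma 1 can be illustrated with a small Python sketch. It is not part of the original presentation. The grammar encoding, the name "S1" for S', the function names and the pruning bounds are all assumptions made for illustration, and the bounds are only known to be adequate for the grammar of Example 1. The sketch rewrites the leftmost nonterminal of a sentential form, which yields the same terminal strings as the derivation trees defined above.

```python
from collections import deque

# Example 1, with "S1" standing in for S' (a naming assumption of this sketch).
EXAMPLE_1 = {
    "nonterminals": {"S", "S1", "A", "B", "C"},
    "push": [("S", "S1", "f"), ("S1", "S1", "g")],       # rules A -> B f
    "pop": [("A", "g", ["a", "A"]), ("A", "f", ["a"]),    # rules A f -> alpha
            ("B", "g", ["b", "B"]), ("B", "f", ["b"]),
            ("C", "g", ["c", "C"]), ("C", "f", ["c"])],
    "copy": [("S1", ["A", "B", "C"])],                    # rules A -> alpha
}


def daughters(grammar, rhs, stack):
    """Nonterminal daughters inherit the index stack; terminals carry none."""
    nts = grammar["nonterminals"]
    return tuple((s, stack if s in nts else ()) for s in rhs)


def successors(grammar, form):
    """Yield the forms obtained by rewriting the leftmost nonterminal."""
    nts = grammar["nonterminals"]
    for pos, (sym, stack) in enumerate(form):
        if sym in nts:
            break
    else:
        return  # only terminals are left
    before, after = form[:pos], form[pos + 1:]
    for a, b, f in grammar["push"]:          # case a): push f on the stack
        if a == sym:
            yield before + ((b, (f,) + stack),) + after
    for a, f, rhs in grammar["pop"]:         # case b): pop f from the stack
        if a == sym and stack[:1] == (f,):
            yield before + daughters(grammar, rhs, stack[1:]) + after
    for a, rhs in grammar["copy"]:           # case c): copy the stack
        if a == sym:
            yield before + daughters(grammar, rhs, stack) + after


def language(grammar, start, max_len):
    """Terminal strings of length <= max_len found by a bounded search."""
    nts = grammar["nonterminals"]
    found, seen = set(), set()
    queue = deque([((start, ()),)])
    while queue:
        form = queue.popleft()
        if all(sym not in nts for sym, _ in form):
            found.add("".join(sym for sym, _ in form))
            continue
        for nxt in successors(grammar, form):
            # Pruning bounds: heuristics assumed sufficient for Example 1 only.
            fits = len(nxt) <= max_len and all(len(st) <= max_len for _, st in nxt)
            if fits and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


def with_marked_index_end(grammar, start, new_start="S0", end="$"):
    """Construction of Lemma 1: add the single rule new_start -> start end."""
    marked = dict(grammar)
    marked["nonterminals"] = grammar["nonterminals"] | {new_start}
    marked["push"] = grammar["push"] + [(new_start, start, end)]
    return marked, new_start


print(sorted(language(EXAMPLE_1, "S", 9), key=len))
```

Under these assumptions the search finds abc, aabbcc and aaabbbccc for the length bound 9, and applying `with_marked_index_end` to the same grammar leaves that set unchanged, which is what Lemma 1 states for the full languages.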