Some Results on Top-Context-Free Tree Languages

Some Results on Topcontextfree Tree Languages Dieter Hofbauer Maria Hub er Gregory Kucherov Technische Universitt Berlin Franklinstrae FR D Berlin Germany email dietercstuberlinde CRIN INRIALorraine rue du Jardin Botanique VillerslsNancyFrance email huberkucherovloriafr Abstract Topcontextfree tree languages called corgulier by Arnold and Dauchet constitute a natural sub class of contextfree tree languages In this pap er wegive further evidence for the imp ortance of this class by exhibiting certain closure prop erties We systematically treat closure under the op erations replacement and substitution as well as under the corresp onding iteration op erations Several other wellknown language classes are considered as well Furthermore various characteri zations of the regular topcontextfree languages are given among others by means of restricted regular expressions Intro duction This pap er is motivated by our previous work on tree languages related to term rewriting systems It is wellknown that for a leftlinear termrewriting Red R of ground terms reducible by R is a regular tree lan system Rtheset guage ConverselyifRed R is regular then R can eectively b e linearized ie a nite language can b e substituted for its nonlinear variables without changing the set of reducible ground terms However little is known ab out conditions on which Red R is contextfree Here again nonlinear vari ables play a crucial role This motivation led us to study the class of topcontextfree languages which turned out to b e of sp ecial imp ortance Topcontextfree tree languages are lan guages generated bycontextfree tree grammars in which righthand sides of pro duction rules contain nonterminal symb ols if at all only at the top p osi tion This class has b een studied by Arnold and Dauchet who proved in particular the following result Theorem The language L ff t t j t L g is contextfreeiL is top contextfre eiL itself is topcontextfree The theorem remains true if contextfree and topcontextfree are re placed by regular and nite resp ectivelyWe conjecture that this analogy Partially supp orted by a p ostdo c grant of the MEN Supp orted by the MESR can also b e drawn for the language Red R mentioned ab ove It is contextfree i a topcontextfree language can b e substituted for the nonlinear variables in R without changing the set of reducible ground terms This pap er is devoted to the study of some related topics that could serve as a basis for further investigations The class of topcontextfree languages is in a sense orthogonal to that of regular languages Though these two classes intersect there are very simple lan guages that are topcontextfree but not regular and vice versa For example the i i language ff g ag a j i g is topcontextfree but not regular Conversely Arnold and Dauchet showed that the language of all terms over a signature of one binary and one constant symb ol is not topcontextfree In order to obtain a criterion that a language is not topcontextfree we more generallyprovethat every topcontextfree language is slim a simple syntactic prop erty We further study the class of languages that are b oth topcontextfree and regular We prop ose a numberofequivalentcharacterizations of this class in terms of grammars linear topcontextfree nonbranching regular linear reg ular expressions and more syntactic prop erties slim passable p olynomially sizeb ounded Closure prop erties play an imp ortantroleinallbranches of formal language theory A detailed analysis of closure prop erties for the class of regular tree lan guages can b e found in In this pap er wecontinue this topic analyzing closure of dierent classes under the op erations replacement substitution and their it erations Replacement and substitution considered also in are two p ossible extentions of the string pro duct to the tree case In contrast to replacement sub stitution replaces equal symb ols by equal terms Wemake an exhaustivestudy of closure prop erties under these op erations for six classes of tree languages nite regular linear topcontextfree topcontextfree linear contextfree and contextfree Notations We assume the reader to b e familiar with basic denitions in term rewriting and formal language theory A signature is a nite set of function symb ols of xed arity for n denotes the set of symb ols in of arity n T X n is the set of nite terms over and a set of variables X The set of ground terms ie terms without variables over is denoted by T For t T X t is the set of p ositions in t dened in the usual way as sequences of natural P os numbers Wewrite p q if p is a prex of q The subterm of t at p osition p is tj For p Post tj is a principal subterm of t if jpj and it is a p p prop er subterm of t if jpj Here jpj denotes the length of p The depth of t is jtj maxfjpjjp Post g its size is size tjP ost jIf t is a subterm of tthen t can b e written as ct where c is a context A context is called context in case it contains only symb ols from Avariable x is said to b e linear in t if there is only one p osition p Post suchthattj x and is said to b e nonlinear in t otherwise A term is linear if p all its variables are linear and is nonlinear otherwise If unambiguous we sometimes prefer vector notation to three dots nota tion For example f t abbreviates f t t n and t will b e clear from n i the context and f x stands for f x x For f and L L T n n n let f L L ff t t j t L t L g The cardinalityofa n n n n nite multiset S is denoted by jS j Contextfree languages Replacement and Substitution G N P S consists of disjoint Denition A contextfree tree grammar signatures N nonterminals and terminalsa niterewrite system P over N and a distinct constant symbol S N initial symbolallrulesin P are of the form Ax x t n where A N n x x arepairwise dierent variables and t n n T fx x g N n G is said to be regular if N contains only constant symbols G is said to be topcontextfree if al l proper subterms of righthand sides of rules areinT fx x g n G is said to be linear if al l righthand sides of rules in P are linear The language generated by a grammar G N P S is LGft T j S tg P A language L T is called linear contextfree regular if there is a linear contextfree regular grammar generating LFor A N wealso n use the more general notation LG Ax x ft T X j Ax x tg n n P G S Thus LG L Fin Reg LinTopCf TopCf LinCf Cf will denote the class of nite regular linear topcontextfree topcontextfree linear contextfree context free languages By denition wehave the inclusions Fin Reg LinCf Cf Fin LinTopCf TopCf Cf and LinTopCf LinCf All inclusions are prop er and b oth LinTopCf and TopCf are incomparable with Reg Topcontextfree languages were studied by Arnold and Dauchet under the name of langages corguliersIn they showed that TopCf coincides with the class of languages obtained by deterministic topdown tree transformations on monadic regular languages It can even b e shown that it is sucient to con sider a single monadic language eg T where and are symb ols of arity fg and is a constantsymb ol Regular tree languages have b een extensively studied eg in For context free tree languages see eg Several normal forms have b een dened for regular and contextfree grammars In this pap er we will use the fact that for each grammar there is a reduced grammar of the same typ e generating the same language A grammar is said to b e reduced if all nonterminals A in N n n are reachable ie S cAt t for some term cAt t T n n N P and productive ie LG Ax x n It is wellknown that such a normal form always exists For topcontext free grammars we will use another normal form Wecallacontextfree grammar slow if all righthand sides of its rules contain exactly one symb ol Thus slow topcontextfree grammars contain only rules of the form Ax x f y y n m fy y z Ax x B z m k n where B N k f m fy y z z g fx x g k m m k n Whereas not all contextfree languages can b e generated byslow grammars for an example see exercise for each linear topcontextfree language there isaslow reduced linear topcontextfree grammar generating this language A pro of can b e found in The op erations creplacement and csubstitution constitute two dierentways of replacing a constant c in all terms of a language L by terms of a language L The csubstitution op eration substitutes all o ccurrences of c in a term of L by the same term of L When applying the creplacement op eration all cs in L Thus the terms of L are replaced indep endently by p ossibly dierent terms of csubstitution corresp onds to the usual substitution if c is treated as a variable The creplacement corresp onds to a substitution where all cs are considered as dierent linear variables In creplacementandcsubstitution are called OI and IOsubstitution resp ectively Denition For languages L L and a constant symbol c the creplacement S of L into L is L L t L where t L is denedbyforn c c c tL L if f c f t t L n c f t L t L otherwise c n c S S ft t gwhere t t is The csubstitution of L into L is L L c c c tL t L dened by for n t if f c f t t t n c f t t t t otherwise c n c When c is clear from the context or arbitrarywe will not mention it Given C and C C is said to b e closed under replacement substi classes of languages tution by C ifL L C L L C resp ectively holds for all L C L C c c and all constantsymbols cWe will also write CC C CC C C is said to b e closed under replacement substitution if

Some Results on Top-Context-Free Tree Languages

Regular Description of Context-Free Graph Languages

Weighted Regular Tree Grammars with Storage

Tree Automata Techniques and Applications

Taxonomy of XML Schema Languages Using Formal Language Theory

Tree Automata

Language, Automata and Logic for Finite Trees

A Model-Theoretic Description of Tree Adjoining Grammars 1

Uniform Vs. Nonuniform Membership for Mildly Context-Sensitive Languages: a Brief Survey

Context-Free Graph Grammars and Concatenation of Graphs

Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures

Efficient Techniques for Parsing with Tree Automata

Regular Rooted Graph Grammars a Web Type and Schema Language