Regular Description of Context-Free Graph Languages

Regular Description of ContextFree Graph Languages Jo ost Engelfriet and Vincent van Oostrom Department of Computer Science Leiden University POBox RA Leiden The Netherlands email engelfriwileidenunivnl Abstract A set of lab eled graphs can b e dened by a regular tree language and one regular string language for each p ossible edge lab el as follows For each tree t from the regular tree language the graph g r t has the same no des as t with the same lab els and there is an edge with lab el from no de x to no de y if the string of lab els of the no des on the shortest path from x to y in t b elongs to the regular string language for Slightly generalizing this denition scheme we allow g r t to have only those no des of t that have certain lab els and we allow a relab eling of these no des It is shown that in this way exactly the class of CedNCE graph languages generated by CedNCE graph grammars is obtained one of the largest known classes of contextfree graph languages Introduction There are many kinds of contextfree graph grammars see eg ENRR EKR Some are no de rewriting and others are edge rewriting In b oth cases a pro duc tion of the grammar is of the form X D C Application of such a pro duction to a lab eled graph H consists of removing a no de or edge lab eled X from H replacing it by the graph D and connecting D to the remainder of H accord ing to the embedding pro cedure C Since these grammars are contextfree in the sense that one no de or edge is replaced their derivations can b e mo d eled by derivation trees as in the case of contextfree grammars for strings However in particular for certain types of no de rewriting grammars the gram mar may still b e contextsensitive in the sense that the edges of the graph generated according to the derivation tree may dep end on the order in which the pro ductions are applied A graph grammar that do es not suer from this quite disastrous contextsensitivity is said to b e conuent or to have the nite ChurchRosser prop erty see Cou for a uniform treatment Thus for a conuent graph grammar G each derivation tree of G yields a unique graph ? The rst author was supp orted by ESPRIT BRWG No COMPUGRAPH I I ?? The present address of the second author is Faculty of Mathematics and Computer Science Vrije Universiteit de Bo elelaan a HV Amsterdam The Nether lands email o ostromcsvunl in the graph language generated by G Due to this close relationship to deriva tion trees the generated graph language can b e describ ed in terms of a reg ular tree language the set of derivation trees and a nite number of regular string languages to simulate the embedding pro cedure We will show this for the particular case of the no de rewriting edNCE graph grammars studied in Kau Bra Bra ELR ELR Schu ELW EL EL ER CER Thus we dene the notion of a regular path description of a graph language mainly deter mined by a regular tree language and a nite number of regular string languages and prove that regular path descriptions have the same p ower as the conuent edNCE grammars or CedNCE grammars The idea of using regular tree and string languages for the description of graphs was introduced in Wel and in vestigated in ELW for sp ecial cases of the CedNCE grammar The structure of this pap er is as follows In Section we dene the edNCE grammar and in particular the conuent edNCE grammar In Section we in tro duce the notion of a regular path description of a graph language generalizing the regular path descriptions of Wel ELW In Sections and also some ex amples and some easy lemmas can b e found Section contains the pro of of the main result the characterization of the CedNCE graph languages by regular path descriptions We use this result to show that the b oundary edNCE gram mars or BedNCE grammars cf eg RW ELW have less generating p ower than the CedNCE grammars In Section we consider a number of sp ecial cases of the main result In particular we dene sp ecial types of regular path descrip tions that characterize the b oundary ap ex and linear edNCE graph languages In Section we investigate the string generating p ower of CedNCE grammars we view a graph grammar as a generator of all the strings that lab el directed paths in the generated graphs We use the main result to show that the class of string languages generated by CedNCE grammars in this way equals the class of output languages of nondeterministic treewalking transducers This implies that this string generating metho d is more p owerful than the one of EH that gives the output languages of deterministic treewalking transducers The main result of this pap er strengthens our b elief that the class of CedNCE graph languages which seems to b e the largest known class of graph languages that can b e generated by contextfree graph grammars where contextfree is taken in the sense of Cou is a robust class of contextfree graph languages it can b e characterized in several dierent ways Other characterizations can b e found in CER by handle rewriting hypergraph grammars and in Oos Eng by monadic second order logic The results of this pap er were established in and presented in Oos and in Eng The only added result is the characterization of ap ex edNCE languages Theorem which uses EHL More recent work on the class of CedNCE graph languages or its sub classes can b e found in eg Bra Cou Cou Eng Eng SW SW KL For a survey see ER The reader is assumed to b e familiar with the basic concepts of formal lan guage theory see eg HU and of regular tree languages see eg GS Conuent edNCE Graph Grammars In this subsection we give formal denitions for the edNCE graph grammars and in particular for the conuent edNCE CedNCE graph grammars These grammars generate directed graphs with lab eled no des and lab eled edges Let b e an alphab et of no de lab els and an alphab et of edge lab els A graph over and is a tuple H V E where V is the nite set of no des E fv w j v w V v w g is the set of edges and V is the no de lab eling function The comp onents of H are also denoted as V E H H and resp ectively Thus we consider directed graphs without lo ops multiple H edges b etween the same pair of no des are allowed but they must have dierent lab els A graph is undirected if for every v w E also w v E Graphs with unlab eled no des andor edges can b e mo deled by taking andor to b e a singleton resp ectively The set of all graphs over and is denoted GR A subset of GR is called a graph language As usual two graphs H and K are disjoint if V V Also as usual H K H and K are isomorphic if there is a bijection f V V such that E H K K ff v f w j v w E g and for all v V f v v The H H K H reader is assumed to b e familiar with the way in which concrete graphs are used as representatives of abstract graphs which are equivalence classes of concrete graphs with resp ect to isomorphism We are usually interested in abstract graphs but mostly discuss concrete ones For instance whereas a graph language is dened to b e a set of concrete graphs we usually view it as a set of abstract graphs After these preliminaries we turn to the denition of edNCE graph gram mar The name of these grammars can b e explained as follows NCE stands for neighbourhood controlled embedding the d stands for directed graphs and the e means that not only the no des but also the edges of the graphs are lab eled in particular the e stresses the fact that the edNCE grammar allows for dynamic edge relabeling Thus edNCE grammars are graph grammars with neighbour hood controlled embedding and dynamic edge relabeling They were introduced in Nag Nag Nag as depth contextfree graph grammars and studied in eg Kau Bra Bra Schu They were also investigated as generalizations of NLC graph grammars in eg EL EL ELW Denition An edNCE grammar is a tuple G P S where is the alphab et of no de lab els is the alphab et of terminal no de lab els is the alphab et of edge lab els is the alphab et of nal edge lab els P is the nite set of pro ductions and S is the initial nonterminal A pro duction is of the form X D C with X D GR and C V fin outg ut D Elements of are called nonterminal no de lab els and elements of nonnal edge lab els A no de with a terminal or nonterminal lab el is said to b e a terminal or nonterminal no de resp ectively and similarly for nal and nonnal edges For a pro duction p X D C X is the lefthand side of p D is the righthand side of p and C is its connection relation We write lhsp X rhsp D and conp C Each element x d of C with x V and d fin outg is a connection instruction of H p To improve readability a connection instruction x d will always b e written as x d In the literature the elements of a connection instruction are often listed in another order Two pro ductions X D C and X D C are called isomorphic if X X and there is an isomorphism f from D to D such that C f f x d j x d C g We will assume that P do es not contain distinct isomorphic pro ductions By copyP we denote the innite set of all pro ductions that are isomorphic to a pro duction in P an element of copyP will b e called a production copy of G The pro cess of rewriting in an edNCE grammar is dened through the appli cation of pro ductions or rather pro duction copies in the usual way Informally a rewriting step according to a pro duction p X D C consists of removing a no de v lab eled X the mother no de from the given hostgraph H substi tuting D the daughter

Regular Description of Context-Free Graph Languages

Weighted Regular Tree Grammars with Storage

Tree Automata Techniques and Applications

Taxonomy of XML Schema Languages Using Formal Language Theory

Tree Automata

Language, Automata and Logic for Finite Trees

A Model-Theoretic Description of Tree Adjoining Grammars 1

Uniform Vs. Nonuniform Membership for Mildly Context-Sensitive Languages: a Brief Survey

Context-Free Graph Grammars and Concatenation of Graphs

Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures

Efficient Techniques for Parsing with Tree Automata

Regular Rooted Graph Grammars a Web Type and Schema Language

Practical MAT Learning of Natural Languages Using Treebanks