Grammars with Restricted Derivation Trees

∗ Jiríˇ Koutný Department of Information Systems Faculty of Information Technology Brno Universtity of Technology Božetechovaˇ 1/2, 616 66 Brno, Czech Republic [email protected]

Abstract 1. Introduction This extended abstract of the doctoral thesis studies the- The theory is an inherent part of the oretical properties of grammars with restricted derivation theoretical computer science particularly concerned with trees. After presenting the state of the art concerning this the study of the formal models. The formal models are investigation area, the research is focused on the three mathematical objects used to describe the formal lan- main kinds of the restrictions placed upon the derivation guages. The fundamental models include grammars and trees. First, it introduces completely new investigation automata. The former are used to generate words and area represented by cut-based restriction and examines the latter accept them. the power of the grammars restricted in this way. Se- cond, it investigates several new properties of path-based Grammars are the kind of the rewriting models that start restriction placed upon the derivation trees. Specifically, from a specified symbol (i.e., start symbol). Then, the it studies the impact of erasing productions on the power symbol is modified according to the given set of rewrit- of grammars with restricted path and introduces two cor- ing productions. Each production is composed of two responding normal forms. Then, it describes a new re- components—the left-hand side and the right-hand side lation between grammars with restricted path and some of a production. The application of a production on a pseudoknots. Next, it presents a counterargument to the word is done by rewriting a symbol equivalent to the left- power of grammars with controlled path that has been hand side of a production by its right-hand side in the considered as well-known so far. Finally, it introduces a word. This process is known as a derivation step. During generalization of path-based restriction to not just one the computation of a derivation step, just one symbol is but several paths. The model generalized in this way is rewritten in the word. Given a start symbol of a grammar, studied, namely its pumping, closure, and parsing prop- a derivation step is computed repeatedly by applying the erties. productions from the given set. Once the resulting word is composed of the symbols that cannot be rewritten any- more, the process of applying derivation steps ends and Categories and Subject Descriptors the resulting word belongs to the language of the gram- F.4.3 [Mathematical Logic and Formal Languages]: mar. Formal Languages—Classes defined by grammars or au- tomata, Operations on languages Essentially, the grammars are composed of a finitely many symbols that are rewritten by finitely many production in Keywords finitely many derivation steps. In this way, the grammars tree controlled grammars, level controlled grammars, path represent a finite description of even infinite languages. controlled grammars, paths controlled grammars, cut con- By the notion of infinite languages are meant those lan- trolled grammars, ordered cut controlled grammars, regu- guages that contains infinitely many words. Since the lated rewriting, restricted derivation trees most of the languages commonly used in practice are infi- nite, the grammars represent a powerful tool how to deal with them. In the formal language theory, there exists a ∗ Recommended by thesis supervisor: Prof. Alexander huge variety of grammars which essentially differ in two Meduna. Defended at Faculty of Information Technology, domains. Specifically, in the complexity of the produc- Brno University of Technology, Brno on September 18, tions and in the way how to select appropriate production 2012 to be applied in a derivation step. ⃝c Copyright 2012. All rights reserved. Permission to make digital or hard copies of part or all of this work for personal or classroom use Generally, the complexity of rewriting productions can be is granted without fee provided that copies are not made or distributed seen from two angles—theoretical and practical. Theore- for profit or commercial advantage and that copies show this notice on tical viewpoint: As little as possible restrictions placed on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM the form of the rewriting productions in a rewriting model must be honored. Abstracting with credit is permitted. To copy other- is desirable. More specifically, the more complex rewrit- wise, to republish, to post on servers, to redistribute to lists, or to use ing productions are, the larger class of languages may the any component of this work in other works requires prior specific per- model generate. In other words, to generate complex lan- mission and/or a fee. Permissions may be requested from STU Press, guages, complex productions are needed. By the notion Vazovova 5, 811 07 Bratislava, Slovakia. of a form of a production, namely the number of the sym- Koutný, J.: Grammars with Restricted Derivation Trees. Information bols on its left-hand side is meant. Practical viewpoint: Sciences and Technologies Bulletin of the ACM Slovakia, Vol. 4, No. 4 Grammars are theoretical models that are implemented (2012) 15-23 16 Koutn´y, J.: Grammars with Restricted Derivation Trees

in many practical applications. From the perspective of One way to extend the power of context-free grammars is cost-effective implementation, the simple rewriting pro- to consider context-sensitive grammars where the produc- ductions are desirable. As simple-enough productions for tions are more complex. Indeed, context-sensitive gram- effective implementation, those of the form with just one mars contain the productions with even more than a sin- symbol on their left-hand side are considered. Such kind gle symbol on the left-hand side. However, despite their of productions are referred to as context-free productions, great power, generating complex languages by context- since they can be applied without any consideration of a sensitive grammars actually leads to several fundamental context of currently rewritten symbol. problems making their practical usage problematic (see [6], [8], [29], [36], [44], and [45]). Specifically, for context- Non-regulated rewriting models like grammars and au- sensitive grammars, many problems are undecidable, it is tomata belong to the well-known core of the formal lan- difficult to describe the derivation by a graph structure, guage theory and they are frequently used in practice. etc. Indeed, automata including its variants underlie lexical analysers (see [3] and [30]), context-free grammars rep- One of the many others approaches extending the power resent the basis of both top-down as well as bottom-up of context-free grammars is represented by matrix gram- parsers (see [3] and [4]), etc. However, the power of the mars introduced by Abraham in [1]. The fundamental models with simple productions is indeed smaller than re- underlying principle in a derivation step in matrix gram- quired for usage in many practical applications. On the mars is that not just one but a fixed number of context- other hand, the models that use only simple productions free productions are required to be applied in a given or- are usually easier to implement. As a background, it is der. This provides synchronization among different parts desirable to extend context-free grammars as in many ap- of a generated word and many non-context-free languages plications there are some natural phenomena which can- can be generated in this way (see [32], [38], and[46]). not be captured by context-free rewriting. More precisely, the motivation is based on the observation that many There are lots of other well-known approaches for extend- of the languages commonly used in practice, including ing the power of context-free grammars which preserve programming and natural languages, are not context-free the context-free nature of productions. Specifically, Ran- (see [15], [16], [34], [35], and [39]). Consequently, that dom Context Grammars (see [43]), Programmed Gram- means such languages cannot be generated by a grammar mars (see [37]), Ordered Grammars (see [14]), Indian and with only context-free productions. For these reasons, the Russian Parallel Grammars (see [25]), idea whether or not it would be possible to use grammars (see [2]), and many others. However, these approaches do with only context-free productions and increase the cor- not represent the main topic of this work although some responding power in some other way—without changing connections can probably be found. the form of rewriting productions. 2. Derivation Tree Restricted Models Basically, this can be achieved by two fundamental app- One of the power-extending approaches is represented by roaches—using a kind of a regulation of rewriting or using the restrictions placed upon the derivation trees. Given a more than one grammar with context-free productions in grammar, by the notion of a derivation tree, a graph struc- a model: Using a kind of a regulation of rewriting. By the ture depicting the application of productions on the start notion of a regulation, the way how to select appropriate symbol up to the resulting word is meant. Indisputably, production to be applied is meant. Indeed, a situation is the investigation of context-free grammars with restricted common in which, given a word, it is possible to apply derivation trees represents an important trend in today’s several productions. Informally, the essential idea is rep- formal language theory (see [7], [9], [11], [13], [24], [17], resented by the observation that a regulation mechanism [19], [20], [22], [26], [27], [28], [31], [33], [41], and [42]). In somehow prescribes the order of productions the grammar essence, these grammars generate their languages just as must follow. Therefore, many different kinds of such a ordinary context-free grammars do but their derivation regulation have been introduced in order to ensure select- trees must satisfy some simple prescribed conditions. ing appropriate production. All of the resulting models based on a kind of regulation are collectively referred to The following two sections gives an informal overview of as regulated rewriting models. Using more than one gram- the results related to the investigation of derivation-tree- mar with context-free productions in a model. Roughly restricted grammars. Through those two sections, it is speaking, the main underlying idea is based on the obser- assumed that the reader is familiar with the graph theory vation that from the cooperation of several simple mod- (see [5]) and the theory of formal languages (see [29]), els, we can obtain more power than from each of them including the theory of regulated rewriting (see [10]). if they work separately. These systems were also thorou- ghly studied and the corresponding investigation area is referred to as the theory of grammar systems. However, 2.1 Level Based Restriction we will deal with them only rarely in this work. The idea of restrictions placed upon the derivation trees of context-free grammars is introduced by Culik and Maurer Informally, the goal of this work is to introduce a model in [9] and the resulting grammars restricted in this way are that generates more than context-free languages and is u- referred to as tree controlled grammars. In essence, the sable in practice. From the theoretical viewpoint it means, notion of a tree controlled grammar is defined as follows: the model should be able to generate namely the pro- take a context-free grammar, G, and a , gramming languages and the languages used in linguistics R. A word, w, generated by G belongs to the language de- (e.g., multiple repetition, cross-dependencies, and copy- fined by G and R if there is a derivation tree, t, for w in G language). From the practical viewpoint, it should be such that all levels of t (except the last one) are described possible to develop sophisticated parsing methods work- by R. Given a tree controlled grammar, (G, R), G and ing in a polynomial time for the model. R are referred to as controlled grammar and control lan- guage, respectively. Culik and Maurer investigate several Information Sciences and Technologies Bulletin of the ACM Slovakia, Vol. 4, No. 4 (2012) 15-23 17

basic properties of tree controlled grammars—namely, the trana, and P˘aun in [26] study a new type of a restriction membership problem and the power. in a derivation: a derivation tree in a context-free gram- mar is accepted if it contains a path described by a control Based on the original definition of a tree controlled gram- language. More precisely, they consider two context-free mar, P˘aun studies the modifications where many well- grammars, G and G′. A word, w, generated by G belongs known types of both controlled grammars and control lan- to the language defined by G and G′ if there is a deriva- guages are considered in [33]. More precisely, P˘aun stud- tion tree, t, for w in G such that there exists a path of ies controlling the levels of the derivation trees of a regular t described by the language of G′. Based on this restric- grammar by several types of a control language, control- tion, they introduce a path controlled grammar and study ling the levels of the derivation trees of a context-free several properties of this model. Specifically, they study grammar without erasing productions by several types of controlling a path of the derivation trees of several types a control language, and controlling the levels of the deriva- of grammars by a regular language and controlling a path tion trees of a context-free grammar by a finite language. of the derivation trees of a by a linear or context-free language. Then, they establish two kinds of It is well-known that tree controlled grammars with a pumping properties depending on the type of a controlled context-free grammar controlled by a regular language grammar, some consequences to the closure properties of characterize the class of recursively enumerable languages. path controlled grammars, and a basic parsing property Thus, the question arises whether or not it is possible to for path controlled grammars. They also investigate the achieve the same power as tree controlled grammars have power of path controlled grammars. However, there exists when the levels of the derivation trees are restricted by a a serious problem with the correctness of the proof they subregular control language. This problem is studied by present. Dassow and Truthe in [13], where many types of subreg- ular languages are considered. Dassow and Truthe study As a continuation of the investigation of path-based re- primarily controlling the levels of the derivation trees of strictions, Mart´ın-Vide and Mitrana study parsing pro- a context-free grammar by two types of a language such perties of path controlled grammars, closure properties of that one is a subset of the other and controlling the le- path controlled grammars, and several decision problems vels of the derivation trees of a context-free grammar by for path controlled grammars in [27] and [28]. many different types of subregular languages. The same authors, Dassow and Truthe, also study hierarchies of 2.3 Goals of the Thesis subregularly tree controlled languages in [11] and [12]. As it clearly follows from the previous sections, level- They present controlling the levels of the derivation trees based restriction is well-studied and the most of the im- of a context-free grammar by the union of monoids, by portant questions have been answered. On the other regular languages with restricted descriptional complexi- hand, in the case of path-based restriction many basic ty, and by the language accepted by a deterministic finite properties including the generative power have not been automaton with at most given number of states. successfully investigated yet. Moreover, several other re- strictions placed upon the derivation trees have not yet Stiebe in [40] states that there is a tree controlled gram- been introduced at all. Indeed, the restrictions placed mar for every linearly bounded queue automaton. Then, upon the cuts of the derivation tree as well as upon seve- Stiebe proves that controlling the levels of the derivation ral paths of the derivation trees represent completely new trees of a context-free grammar by the language accepted investigation areas. Thus, the goals of the doctoral thesis by a minimal finite automaton with at most five states consist in three investigation areas. characterize the class of context-sensitive languages. If, additionally, erasing productions in a controlled grammar are allowed, controlling the levels of the derivation trees • First, to introduce new investigation area represen- of a context-free grammar by the language accepted by a ted by cut-based restrictions and establish the gen- minimal finite automaton with at most five states chara- erative power of the model restricted in this way. cterizes the class of recursively enumerable languages. • Second, to establish several new results in the inves- tigation of one-path-restricted grammars introduced Turaev, Dassow, and Selamat in [41] examine tree con- in [26]. trolled grammars with bounded nonterminal complexity and demonstrate that seven nonterminals in a tree con- • Third, to generalize one-path-restricted model into trolled grammar are enough to generate any recursively several paths and investigate several its properties. enumerable language. Then, they establish that three nonterminals in a tree controlled grammar are enough 3. New Results in Restrictions Placed upon De- to generate any regular language and any regular sim- ple matrix language can be generated by a tree controlled rivation Trees grammar with three nonterminals. Finally, they demon- First we reformulate the fundamental definitions so that strate that three nonterminals in a tree controlled gram- all derivation-tree-based restrictions can be studied using mar are enough to generate any linear language. The the same notation. Then, we present new results in the same authors in [42] state several further nonterminal- derivation-tree based restrictions investigation area. complexity-related properties of tree controlled grammars. Since restriction placed upon a level, a path, and a cut is, in essence, a restriction placed upon a derivation tree, we 2.2 Path Based Restriction use a slightly modified but equivalent formulation of the As an attempt to increase the power of context-free gram- definitions stated in [26], [27], and [28]. Consequently, mars without changing the basic formalism and loosing aforementioned modifications allow us to investigate all some basic properties of context-free languages (decida- derivation-tree-based restrictions using the same termi- bility, efficient parsing, etc.), Marcus, Mart´ın-Vide, Mi- nology—i.e., restriction on the levels (see [9], [13], [33], 18 Koutn´y, J.: Grammars with Restricted Derivation Trees

[41], and [42]), the paths (see [7], [17], [18], [19], [20], [?], Next, we summarize the most interesting results and point [22], [23], [24], [26], [27], and [28]), or the cuts (see [24]). out some important open questions. Based on the State of More precisely, all restrictions placed upon the derivation the Art in the area of restrictions placed upon the deriva- trees are covered by the general notion of tree controlled tion trees summarized above, this work deals in principle grammar that generates its language under several kinds with three kinds of derivation-tree based restrictions, cut- of the restrictions. based, path-based, and several-path-based.

3.1 Cut Based Restriction Definition 3.1. For a sequence, s, of the nodes of a In this section, we introduce new derivation-tree-based derivation tree, the word obtained by concatenating all restrictions of tree-controlled grammars which are based symbols of s is denoted as word(s). on the restriction placed upon the cuts. Then, we in- troduce several properties of grammars with restriction Definition placed upon the cuts. Specifically, we investigate the 3.2. The class of regular, linear, ε-free power. context-free, context-free, propagating scattered context grammars, matrix grammars, context-sensitive, and un- 3.1.1 Definitions restricted grammars is denoted as REG, LIN, CFε, CF, PSC, MAT, CS, RE, respectively. Definition 3.8. Let (G, R) be a tree controlled gram- mar. The language that (G, R) generates under the cuts control by R is denoted by cutL(G, R) and defined by the Definition 3.3. The set of all regular, linear, ε-free following equivalence: context-free, context-free, context-sensitive, and unrestric- ∈ ∗ ∈ ted grammars is denoted as GREG, GLIN, GCF , GCF, For all x T , x cutL(G, R) if and only if there is ε ∈ △ GCS, and GRE, respectively. a derivation tree, t G (x), and a set, xM, of its cuts such that

Note that hereafter the notion tree controlled grammar is 1. for each c ∈ M, word(c) ∈ R, and used in different meaning than in the previous section, see x the following definitions of a tree controlled grammar and 2. xM covers the whole t. the definitions of the languages as well as the classes that tree controlled grammars generate under various kinds of restrictions that are introduced in the following three Definition 3.9. The class of tree controlled languages sections. under the cuts control is defined as cut-TC(CF, REG) = {cutL(G, R):(G, R) is a tree con- trolled grammar in which G is a context-free grammar and Definition 3.4. A tree controlled grammar is a pair, R ∈ REG} and the class of tree controlled languages with (G, R), where G = (V,T,P,S) is a controlled grammar, ε-free controlled grammar under cuts control is defined as and R is a control language over V . cut-TCε(CFε, REG) = {cutL(G, R):(G, R) is a tree controlled grammar in which G is an ε-free context-free grammar and R ∈ REG}. Definition 3.5. Let (G, R) be a tree controlled gram- △ ∈ ∗ mar where G = (V,T,P,S), then (G,R) (x), x V , Definition ≼ denotes the set of the derivation trees with frontier x in 3.10. Let be a binary relation on a se- G. quence, xM, of the cuts such that for each two cuts, c1, c2 ∈ xM, c1 ≼ c2 if and only if for each node, n2, of c2 either there is a node, n1, of c1 such that n2 is a direct ̸ In the research presented through this section, we do not descendent of n1, or n1 = n2. In other words, n1 = n2 directly deal with level-based restriction placed upon the implies n2 is a direct descendent of n1. derivation trees. However, for the sake of completeness, note the following definitions related to level-based re- Definition 3.11. Let (G, R) be a tree controlled gram- striction placed upon the derivation trees. mar. The language that (G, R) generates under the or- dered-cuts control by R is denoted by ord-cutL(G, R) and defined by the following equivalence: Definition 3.6. Let (G, R) be a tree controlled gram- mar. The language that (G, R) generates under the levels ∗ For all x ∈ T , x ∈ ord-cutL(G, R) if and only if control by R is denoted by levelsL(G, R) and defined by the there is a derivation tree, t ∈ G△(x), and a sequence, following equivalence: c1x, c2x,..., ∗ cnx, of the cuts of t, for some nx ≥ 1, such that For all x ∈ T , x ∈ levelsL(G, R) if and only if there is a derivation tree, t ∈ G△(x), such that for all levels, s, of t (except the last one), word(s) ∈ R. 1. for all i = 1x, 2x, . . . , nx, word(ci) ∈ R,

2. {c1x, c2x, . . . , cnx} covers the whole t, and Definition 3.7. For some language classes X and Y, ≼ − 3. cix c(i+1) for all i = 1, 2, . . . , n 1. the class of tree controlled languages under the levels con- x trol is defined as levels-TC(X, Y) = {levelsL(G, R):(G, R) is a tree con- Definition 3.12. The class of tree controlled langua- trolled grammar in which G ∈ GX and R ∈ Y}. ges under the ordered cuts control is defined as Information Sciences and Technologies Bulletin of the ACM Slovakia, Vol. 4, No. 4 (2012) 15-23 19

ord-cut-TC(CF, REG) = {ord-cutL(G, R):(G, R) is control by R is denoted by pathL(G, R) and defined by the a tree controlled grammar in which G is a context-free following equivalence: grammar and R ∈ REG} and the class of tree controlled ∗ languages with ε-free controlled grammar under ordered For all x ∈ T , x ∈ pathL(G, R) if and only if there cuts control is defined as is a derivation tree, t ∈ G△(x), such that there is a path, ord-cut-TCε(CFε, REG) = {ord-cutL(G, R):(G, R) is p, of t with word(p) ∈ R. a tree controlled grammar in which G is an ε-free context- free grammar and R ∈ REG}. Definition 3.14. For X, Y ∈ {LIN, CF}, the class of 3.1.2 Results tree controlled languages under the path control is defined as Concerning cut-based restriction placed upon the deriva- path-TC(X, Y) = {pathL(G, R):(G, R) is a tree con- tion trees, we have introduced two fundamental types of trolled grammar in which G ∈ GX and R ∈ Y} and the such kind of a restriction and thus, we have opened a class of tree controlled languages with ε-free controlled new investigation area in derivation-tree-restricted mod- grammar under path control is defined as els. Next, we have proved that both restrictions increase { path-TCε(CFε, Y) = pathL(G, R):(G, R) is a tree the power of context-free grammars so they characterize controlled grammar in which G is an ε-free context-free RE: grammar and R ∈ Y}. ord-cut-TC(CF, REG) = cut-TC(CF, REG) = RE.

Definition 3.15. Let (G, R) be a tree controlled gram- An important open problem consists of the investigation mar that generates the language under path control by R, of cut controlled grammars where ε-productions are for- where G = (V,T,P,S). (G, R) is in 1st normal form if bidden. Consequently, the grammars restricted in this every production, r : A → x ∈ P , is of the form A ∈ V −T way should be placed into the relation with some other and x ∈ T ∪ (V − T ) ∪ (V − T )2. well-known language families, such as CS, and the deci- ding the question whether or not: Definition 3.16. Let (G, R) be a tree controlled gram- ord-cut-TCε(CFε, REG) = CS, mar that generates the language under path control by R, where G = (V,T,P,S). (G, R) is in 2nd normal form if cut-TCε(CFε, REG) = CS. every production, r : A → x ∈ P , is of the form A ∈ V −T and x ∈ T ∪ ((V ∪ {E}) − T )2 where E ∩ V = ∅ and E → ε ∈ P . The alphabet of G should now include E, Next problem is represented by the descriptional comple- with E ̸∈ V . xity of ord-cut-TC(CF, REG) and cut-TC(CF, REG). The results presented above are based on the transforma- tion of an into Pentonnen normal Definition form. However, using Geffert normal form, the number 3.17. Let Σ be an alphabet. The following of nonterminals in the resulting cut controlled grammar languages over Σ are pseudoknots: would be reduced. 1. {xyxRyR : x, y ∈ Σ∗}, Another future research idea is represented by the con- R R ∗ {u1xu2yu3x u4y u5 : x, y, ui ∈ Σ , 1 ≤ i ≤ 5}, trolling the cuts of the derivation trees in which several types of subregular control languages are considered. In 2. {xyxRzzRyR : x, y, z ∈ Σ∗}, R R R ∗ this way, the question whether or not a kind of a subregu- {u1xu2yu3x u4zu5z u6y u7 : x, y, z, ui ∈ Σ , 1 ≤ ral language is enough to increase the power of controlled i ≤ 7}, grammar properly. Consequently, the relation between the power of level-based and cut-based models restricted 3. {xyxRzyRzR : x, y, z ∈ Σ∗}, R R R ∗ in this way would be founded out. {u1xu2yu3x u4zu5y u6z u7 : x, y, z, ui ∈ Σ , 1 ≤ i ≤ 7},

3.2 Path Based Restriction ∗ 4. {xyzxRyRzR : x, y, z ∈ Σ }, In this section, we introduce a path-based restriction on R R R ∗ {u1xu2yu3zu4x u5y u6z u7 : x, y, z, ui ∈ Σ , 1 ≤ tree-controlled grammars that is equivalent to the model i ≤ 7}. introduced in [26], [27], and [28]. Next, we formally define the pseudoknot structure represented as a language. We introduce new results related to the normal forms and the Note that presented pseudoknots form obviously non-con- presence of erasing productions in a controlled grammar. text-free languages. Then, this section presents a relationship between biology and the formal language theory in the form of word re- presentation of pseudoknots generated by path controlled 3.2.2 Results grammars. Last section of this part being a counterar- As a continuation of the investigation of path-based res- gument against the proof of the power of path controlled trictions introduced in [26] and studied in [27] and [28], grammars that has been considered as correct so far. we have considered the impact of ε-productions in path controlled grammars to the power and we have stated that ε-productions can be removed from a path controlled 3.2.1 Definitions grammar without affecting its language: Definition 3.13. Let (G, R) be a tree controlled gram- mar. The language that (G, R) generates under the path path-TC(CF, CF) = path-TCε(CFε, CF). 20 Koutn´y, J.: Grammars with Restricted Derivation Trees

We have established two Chomsky-like normal forms for 3.3.1 Definitions path controlled grammars and we have formulated algo- Definition 3.18. Let (G, R) be a tree controlled gram- rithms that transform a path controlled grammar into its mar. The language that (G, R) generates under the all- normal form: paths control by R is denoted by all-pathsL(G, R) and de- fined by the following equivalence:

• Let L ∈ path-TC(CF, CF). Then, there exists ∗ For all x ∈ T , x ∈ all-pathsL(G, R) if and only if a tree controlled grammar, (G, R), in 1st normal there is a derivation tree, t ∈ G△(x), such that for all form such that L = pathL(G, R). paths, s, of t, word(s) ∈ R. • Let L ∈ path-TC(CF, CF). Then, there exists nd a tree controlled grammar, (G, R), in 2 normal Definition 3.19. The class of tree controlled langua- form such that L = pathL(G, R). ges under all paths control is defined as all-path-TC(CF, REG) = {all-pathsL(G, R):(G, R) is a tree controlled grammar in which G is a context-free A future investigation idea consists of the modifying a grammar and R ∈ REG}. general parsing methods that are based on Chomsky nor- mal form such that it will be able to parse path controlled grammars in a polynomial time. Definition 3.20. Let (G, R) be a tree controlled gram- mar. The language of tree controlled grammar under Another practically motivated idea is represented the re- not common n-path control by R, n ≥ 1, is denoted by lation between path controlled grammars and the theory nc-n-pathL(G, R) and defined by the following equivalence: of pseudoknots. We have demonstrated several typical ∗ pseudoknots used in biology represented by the words of For all x ∈ T , x ∈ nc-n-pathL(G, R) if there exists non-context-free languages. We have demonstrated some a derivation tree, t ∈ G△(x), such that there is a set, Qt, pseudoknots belong to path-TC(LIN, LIN): of n paths of t such that for each path, p ∈ Qt, word(p) ∈ R.

{xyxRyR : x, y ∈ Σ∗} ∈ path-TC(LIN, LIN), ∗ Definition ∈ { } {xyxRzzRyR : x, y, z ∈ Σ } ∈ path-TC(LIN, LIN), 3.21. For X, Y REG, LIN, CF , the {xyxRzyRzR : x, y, z ∈ Σ∗} ∈ path-TC(LIN, LIN). class of tree controlled languages under not-common n- path control is defined as nc-n-path-TC(X, Y) = {nc-n-pathL(G, R):(G, R) is Apparently, there is a huge variety of another pseudo- a tree controlled grammar with G ∈ GX and R ∈ Y}. knot structures in biology. For example, {xyzxRyRzR : x, y, z ∈ Σ∗} and it is an open question whether or not Definition 3.22. Let p , p be any different two paths those pseudoknots can be generated by tree controlled 1 2 of a derivation tree, t. Then, p and p contain at least grammars with linear components that generate the lan- 1 2 one common node (the root of t, root(t)), and p ends in guage under path control. 1 a different leaf of t than p2. Let cmn(p1, p2) denote the maximal number of consecutive common nodes of p and The last presented result deals with a reflection on the 1 p . power of path controlled grammars that has been con- 2 sidered as well-known (see [26]) for more than last ten years. However, we have presented an argument against Definition 3.23. Let Qt be a nonempty set of some the correctness of the proof given in [26] that states paths of a derivation tree, t. The paths of Qt are divided in a common node of t if and only if for some k ≥ 1, path-TC(CF, CF) ⊆ MAT. 2 cmn(p1, p2) = k for every pair (p1, p2) ∈ Qt . Let all paths We have concluded the counterargument by stating that of Qt be divided in a common node of t. If Qt = {p}, | | ≥ the aforementioned inclusion still may hold; however, it then mQt = word(p) , otherwise mQt 1 denotes the cannot be proved in the way given in [26]. More precisely, maximal number of consecutive common nodes of all paths we have found the language tha can be generated by a in Qt. grammar with controlled path. However, this language cannot be generated by the grammar obtained by the Definition construction introduced in [26]. Apparently, the power 3.24. Let (G, R) be a tree controlled gram- mar. The language of tree controlled grammar under n- of path controlled grammar still represents an open prob- ≥ lem. path control by R, n 1, is denoted by n-pathL(G, R) and defined by the following equivalence:

3.3 Several Paths Based Restriction ∗ For all x ∈ T , x ∈ n-pathL(G, R) if there exists This section being a generalization of path-restricted re- a derivation tree, t ∈ G△(x), such that there is a set, Qt, writing model to a restriction placed upon not just one of n paths of t that are divided in a common node of t and but several paths. Then, it presents several properties of for each path, p ∈ Qt, word(p) ∈ R. the model restricted in this way. Specifically, the power of a kind of all-paths-restricted rewriting model, closure and pumping properties in relation to the number of controlled Definition 3.25. For X, Y ∈ {REG, LIN, CF}, the paths, and the approximation of the power for n-path class of tree controlled languages under n-path control is restricted model. The last section of this part presents an defined as { application related result concerning the parsing of path n-path-TC(X, Y) = n-pathL(G, R):(G, R) is a tree ∈ G ∈ } restricted languages. controlled grammar with G X and R Y . Information Sciences and Technologies Bulletin of the ACM Slovakia, Vol. 4, No. 4 (2012) 15-23 21

Conventions restriction placed on not just one but several paths. Con- Hereafter, tree controlled grammars that generate the lan- sequently, we have found some subsets of n-path con- guage under the n-path control are referred to as n-path trolled grammars so their languages satisfy pumping pre- tree controlled grammars. mises similar to well-known premises stated by pumping lemmata for CF, LIN, and REG—more precisely: Note that if we consider 0 controlled paths (i.e., n = 0 and consequently card(Q ) = 0) in the definition of t • If L ∈ I-n-path-TC(CF, LIN), n ≥ 1, then there L(G, R), then, clearly, the power of such a model n-path are two constants, k, q ≥ 0, such that each word, equals CF. z ∈ L, with |z| ≥ k can be written as

z = u1v1u2v2 . . . u4nv4nu4n+1 Definition 3.26. Let (G, R) be a tree controlled gram- | | ≤ ≥ ≥ with 0 < v1v2 . . . v4n q and for all i 1, mar. Consider n-pathL(G, R), for n 1. If for each i i i u1v1u2v2 . . . u4nv4nu4n+1 ∈ L. word, z ∈ n-pathL(G, R), there exist a derivation tree, ∈ △ ≥ t G (z), a set of its paths, Qt, mQt 1, and a parti- • If L ∈ III-n-path-TC(CF, LIN), n ≥ 1, then tion, word(p) = uvwxy, for each path, p ∈ Qt, satisfying there are two constants,k, q ≥ 0, such that each the premise of the pumping lemma for linear languages word, z ∈ L, with |z| ≥ k can be written as such that it holds z = u1v1u2v2 . . . u2n+2v2n+2u2n+3

with 0 < |v1v2 . . . v2n+2| ≤ q and for all i ≥ 1, • 1 ≤ m ≤ |u|, i i i Qt u1v1u2v2 . . . u2n+2v2n+2u2n+3 ∈ L. then n-pathL(G, R) is I-n-pathL(G, R), • If L ∈ V-n-path-TC(CF, LIN), n ≥ 1, then there • | | ≤ | | ≥ u < mQt uv , are two constants,k, q 0, such that each word, ∈ | | ≥ then n-pathL(G, R) is II-n-pathL(G, R), z L, with z k can be written as

• | | ≤ | | z = u1v1u2v2u3v3u4v4u5 uv < mQt uvw , then n-pathL(G, R) is III-n-pathL(G, R), with 0 < |v1v2v3v4| ≤ q and for all i ≥ 1, i i i i u1v u2v u3v u4v u5 ∈ L. • | | ≤ | | 1 2 3 4 uvw < mQt uvwx , then n-pathL(G, R) is IV -n-pathL(G, R), A natural question that still remains open is whether or • | | ≤ | | uvwx < mQt uvwxy , not there are similar pumping properties also for the lan- then n-pathL(G, R) is V -n-pathL(G, R). guages of II-n-path-TC(CF, LIN) and IV-n-path-TC (CF, LIN).

Definition 3.27. For i ∈ {I, II, III, IV, V}, n ≥ 1, We have also proved some closure properties, that is the class of i-n-path tree controlled languages is defined as • ≥ ∈ { } ∈ i-n-path-TC(CF, LIN) = { L(G, R):(G, R) is for n 1, i I, II, III, IV, V and TG,TR i-n-path { } tree controlled grammar in which G is a context-free gram- REG, LIN, CF , ∈ } mar and R LIN . n-path-TC(TG,TR), i-n-path-TC(TG,TR) are closed under intersection with regular languages, 3.3.2 Results union, and non-erasing homomorphism, It is well-know that path controlled grammars where the • ≥ ∈ { } controlling grammar is regular characterize the same lan- for n 1, i I, III, V , i-n-path-TC(CF, LIN) guage class as its controlled grammar (see [26]) do. We are not closed under concatenation, intersection, and have proved that the power of context-free grammars re- complement. mains unchanged even if we restrict all paths in their derivation trees by regular languages: Since n-path controlled grammars are a natural gener- CF = all-path-TC(CF, REG). alization of grammars with just one path controlled, we have studied several properties that are well-known for path controlled grammars in the case of controlling given Since for each context-free grammar, there is a regular n paths. Most importantly, we have tried to establish language that describes all paths in all its derivation trees; the power for n-path controlled grammar. We have found and there is no regular language which increases its power the approximation of the power that can be applied also when used to restrict the paths, if we consider tree con- on grammars with just one path controlled. However, trolled grammars (G, R) with R ∈ REG, then, obviously, this approximation does not say too much since it is well- the power of such a model equals CF for any n ≥ 1. known that PSC is not closed under erasing homomor- Therefore, we investigate the properties of tree controlled phism. Thus, we have informally concluded that: ”We grammar with non-regular control language. More specif- either have the power to check what we need but not to re- ically, we study tree controlled grammars that generates move it (using PSC) or vice versa (using MAT).” More their languages under n-path control with linear control precisely, we have stated the following: languages. Let L ∈ n-path-TC(CF, CF), for n ≥ 1. Then We have introduced a generalization of path controlled there exists L′ ∈ PSC with L = h(L′) for some homo- grammars so that they generate the language under the morphism h. 22 Koutn´y, J.: Grammars with Restricted Derivation Trees

Finally, we have studied several parsing properties of path- Acknowledgements. We gladly acknowledge the sup- based restriction which is indisputably one of the most im- port of the CZ.1.05/1.1.00/02.0070, CZ.1.07/2.4.00/12.00 portant language-class-characterizing property from the 17, FIT-S-10-2, FIT-S-11-2, FR2581/2010/G1, GD102/09 practical viewpoint. Formally, we have studied a polyno- /H042, MEB041003, and MSM0021630528 grants. mial time parsing possibilities and we have stated that: References • For a tree controlled grammar, (G, R) with an un- [1] S. Abraham. Some questions of language theory. In Proceedings ambiguous context-free grammar, G, and a linear of the 1965 conference on Computational linguistics, 1965. control language, R, the membership [2] A. V. Aho. Indexed grammars - an extension of context-free grammars. Journal of the ACM, 15:25, 1968. x ∈ nc-n-pathL(G, R), [3] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers principles, techniques, and tools. Addison-Wesley, 1986. ≥ | |k ≥ n 1, is decidable in O( x ), for some k 0. [4] A. V. Aho and J. D. Ullman. The theory of parsing, translation, and compiling. Prentice-Hall, Inc., 1972. • For a tree controlled grammar, (G, R), where G is [5] A. Bondy. Graph Theory. Springer, 2010. a context-free grammar and R ∈ LIN, there is a tree controlled grammar, (G′,R′), such that G′ does [6] R. V. Book. On the structure of context-sensitive grammars. International Journal of Computer and Information Sciences, not contain unit productions and nc-n-pathL(G, R) = 2:11, 1973. ′ ′ ≥ nc-n-pathL(G ,R ), n 1. [7] M. Cermák,ˇ J. Koutný, and A. Meduna. Parsing based on n-path • For tree controlled grammar (G, R) where G is m- tree-controlled grammars. Theoretical and Applied Informatics, 2011:16, 2012. ambiguous LR grammar, m ≥ 1, and an unam- biguous language R ∈ LIN, the membership x ∈ [8] A. B. Cremers. Normal forms for context-sensitive grammars. L(G, R), n ≥ 1, is decidable in O(|x|k), for Acta Informatica, 3:15, 1973. nc-n-path ˇ some k ≥ 0. [9] K. Culik and H. A. Maurer. Tree controlled grammars. Computing, 19:11, 1977. The significant disadvantage concerning n-path tree con- [10] J. Dassow and Gh. Paun.˘ Regulated Rewriting in Formal trolled grammars is that the number of n paths satisfying Language Theory. Springer, 1989. the properties of Def. 3.24 is strictly limited by the length [11] J. Dassow, R. Stiebe, and B. Truthe. Two collapsing hierarchies of of the right-hand sides of the productions of underlying subregularly tree controlled languages. Theoretical Computer context-free grammar. That is, given a general context- Science, 410:11, 2009. free grammar, G, and a linear language, R, controlling [12] J. Dassow and B. Truthe. On two hierarchies of subregularly tree the paths, the membership of a certain language might controlled languages. In 10th International Workshop on be decidable. However, given the same context-free lan- Descriptional Complexity of Formal Systems, DCFS 2008, guage as L(G) as a context-free grammar, H, in Chom- Charlottetown, Prince Edward Island, Canada, July 16-18, 2008. sky normal form together with R, we might not be able to [13] J. Dassow and B. Truthe. Subregularly tree controlled grammars find suitable path restriction to obtain the same language. and languages. In Automata and Formal Languages - 12th On the other hand, the derivation trees of tree controlled International Conference AFL 2008, Balatonfured, 2008. grammars that generates their languages under n-path [14] I. Friš. Grammars with partial ordering of the rules. Information control by a linear language are constructed exactly as in and Control, 12:11, 1968. context-free grammars and, in addition, we have to check [15] J. Higginbotham. English is not a context-free language. some of their paths. Thus, there is actually great possi- Linguistic Inquiry, 15:10, 1984. bility to use well-known parsing methods for context-free [16] A. K. Joshi. Tree adjoining grammars: How much languages to construct the derivation trees and to check context-sensitivity is required to provide reasonable structural their paths. However, in this viewpoint, n-path tree con- descriptions. Technical report, University of Pennsylvania, 1985. trolled grammars seems to be a quite fragile formalism [17] J. Koutný. Regular paths in derivation trees of context-free since it requires a context-free grammar to have a pro- grammars. In Proceedings of the 15th Conference STUDENT duction with at least n nonterminals on the right-hand EEICT 2009 Volume 4, 2009. side which ensures the division of n paths in a common [18] J. Koutný. On n-path-controlled grammars. In Proceedings of the node. Moreover, it means that any attempt to use a pars- 16th Conference STUDENT EEICT 2010 Volume 5, 2010. ing method that transforms a context-free grammar into [19] J. Koutný. Syntax analysis of tree-controlled languages. In Chomsky normal form will basically destroy any path re- Proceedings of the 17th Conference STUDENT EEICT 2011 striction with n ≥ 3. Moreover, several nice properties of Volume 3, 2011. context-free grammars have been lost—e.g., decomposi- [20] J. Koutný. On path-controlled grammars and pseudoknots. In tion based on pumping lemma for linear languages is po- Proceedings of the 18th Conference STUDENT EEICT 2012 tentially ambiguous and thus, the membership problem Volume 3, 2012. for i-n-path-TC, i ∈ {I, II, III, IV, V}, is potentially [21] J. Koutný, Z. Krivka,ˇ and A. Meduna. On grammars with ambiguous also. controlled paths. Acta Cybernetica, submitted. [22] J. Koutný, Z. Krivka,ˇ and A. Meduna. Pumping properties of There are still many questions to be answered, namely path-restricted tree-controlled languages. In 7th Doctoral the power of grammars with path or paths controlled non- Workshop on Mathematical and Engineering Methods in regularly, further closure properties, decision properties, Computer Science, 2011. etc. However, there are several other more general vari- [23] J. Koutný and A. Meduna. On normal forms and erasing rules in ants of path-based restriction. Indeed, a modification of path controlled grammars. Schedae Informaticae, in press. the formalism such that the paths do not have to be di- [24] J. Koutný and A. Meduna. Tree-controlled grammars with vided in a common node of a derivation tree, or a variant restrictions placed upon cuts and paths. Kybernetika, 48:11, 2012. where a path in a tree does not have to start in the root [25] M. K. Levitina. On some grammars with rules of global and end in a leaf of the tree have not been investigated replacement. Scientific-technical information yet. (Nauchno-tehnichescaya informacii), Series 2, page 5, 1972. Information Sciences and Technologies Bulletin of the ACM Slovakia, Vol. 4, No. 4 (2012) 15-23 23

[26] S. Marcus, C. Martín-Vide, V. Mitrana, and Gh. Paun.˘ A new-old [42] S. Turaev, J. Dassow, and M. H. Selamat. Nonterminal complexity class of linguistically motivated regulated grammars. In Walter of tree controlled grammars. Theoretical Computer Science, Daelemans, Khalil Sima’an, Jorn Veenstra, Jakub Zavrel (Eds.): 412:7, 2011. Computational Linguistics in the Netherlands 2000, Selected [43] A. P. J. van der Walt. Random context grammars. In Proceedings Papers from the Eleventh CLIN Meeting, Tilburg, 2000. of Symposium on Formal Languages, 1970. [27] C. Martín-Vide and V. Mitrana. Further properties of [44] B. Wegbreit. A generator of context-sensitive languages. Journal path-controlled grammars. In Proceedings of FG-MoL 2005: The of Computer and System Sciences, 3:6, 1969. 10th Conference on and The 9th Meeting on [45] W. A. Woods. Context-sensitive parsing. Communications of the Mathematics of Language, 2005. ACM, 13:9, 1970. [28] C. Martin-Vide and V. Mitrana. Decision problems on [46] G. Zetzsche. Toward understanding the generative capacity of path-controlled grammars. International Journal of Foundations erasing rules in matrix grammars. International Journal of of Computer Science, 18:11, 2007. Foundations of Computer Science, 22:16, 2011. [29] A. Meduna. Automata and Languages: Theory and Applications. Springer Verlag, 2005. Selected Papers by the Author [30] A. Meduna. Elements of Compiler Design. Taylor and Francis ˇ Informa plc, 2008. M. Cermák, J. Koutný, and A. Meduna. Parsing based on n-path tree-controlled grammars. Theoretical and Applied Informatics, [31] P. Navrátil. Parsing based on regulated grammars. Master’s thesis, 2011:16, 2012. Faculty of Information Technology, Brno University of Technology, 2003. J. Koutný and A. Meduna. On normal forms and erasing rules in path [32] G. Paun.˘ On the generative capacity of simple matrix grammars of controlled grammars. Schedae Informaticae, in press. finite index. Information Processing Letters, 7:3, 1978. J. Koutný and A. Meduna. Tree-controlled grammars with restrictions [33] Gh. Paun.˘ On the generative capacity of tree controlled grammars. placed upon cuts and paths. Kybernetika, 48:11, 2012. Computing, 21:8, 1979. J. Koutný, Z. Krivka,ˇ and A. Meduna. On grammars with controlled [34] G. K. Pullum. On two recent attempts to show that english is not a paths. Acta Cybernetica, submitted. CFL. Computational Linguistics, 10:5, 1984. [35] G. K. Pullum and G. Gazdar. Natural languages and context-free J. Koutný, Z. Krivka,ˇ and A. Meduna. Pumping properties of languages. Linguistics and Philosophy, 4:34, 1982. path-restricted tree-controlled languages. In 7th Doctoral Workshop on Mathematical and Engineering Methods in [36] Jr. R. L. Cannon. An algebraic technique for context-sensitive Computer Science, 2011. parsing. International Journal of Computer and Information Sciences, 5:20, 1976. J. Koutný. Syntax analysis of tree-controlled languages. In [37] D. J. Rosenkrantz. Programmed grammars and classes of formal Proceedings of the 17th Conference STUDENT EEICT 2011 languages. Journal of the ACM, 16:25, 1969. Volume 3, 2011. [38] A. Salomaa. Matrix grammars with a leftmost restriction. J. Koutný. On n-path-controlled grammars. In Proceedings of the 16th Information and Control, 20:7, 1972. Conference STUDENT EEICT 2010 Volume 5, 2010. [39] S. Shieber. Evidence against the context-freeness of natural J. Koutný. On path-controlled grammars and pseudoknots. In language. Linguistics and Philosophy, 8:11, 1985. Proceedings of the 18th Conference STUDENT EEICT 2012 [40] R. Stiebe. On the complexity of the control language in tree Volume 3, 2012. controlled grammars. In Colloquium on the Occasion of the 50th Birthday of Victor Mitrana, 2008. J. Koutný. Regular paths in derivation trees of context-free grammars. In Proceedings of the 15th Conference STUDENT EEICT 2009 [41] S. Turaev, J. Dassow, and M. H. Selamat. Language classes Volume 4, 2009. generated by tree controlled grammars with bounded nonterminal complexity. In Descriptional Complexity of Formal Systems - 13th International Workshop, DCFS 2011, Gießen/Limburg, Germany, July 25-27, 2011. Proceedings, 2011.