The Automated Inference of Tree Systems

AN ABSTRACT OF THE THESIS OF Barry Arthur Levine for the degree of Doctor of Philosophy in Computer Science presented on March 29, 1979 Title: THE AUTOMATED INFERENCE OF TREE SYSTEMS Redacted for privacy Abstract approved: Professor Curtis R. Cook Tree systems are used in syntactic pattern recognition for describing two-dimensional patterns. We extend results on tree automata with the introduction of the subtree-invariant equivalence relation R. R relates two trees when the appearance of one implies the appearance of the other in similar trees. A new state minimizing algorithm for tree automata is formed using R. We also determine a bound for Brainerd's minimization method. We introduce the Group Unordered Tree Automaton (GUTA) which accepts all orientations of open-line patterns described using directed arc primitives. The specification of a GUTA includes an unordered tree automaton M, which only accepts a standard orientation of a given class of open-line pictures, and a transformation group, which des- cribes how the primitives transform under rotational shifts.The GUTA performs all orientational parses in parallel, reports all successful transformations and operates in the same time complexity as M.The GUTA is much easier to specify than the equivalent non-decomposed unordered tree automaton. The problem of automating the design of unordered and ordered tree automata (grammars) is studied bothon a system directed and on a highly interactive level. The system directed method uses Pao's lattice technique to infer tree automata (grammars) from structurally complete samples. It is shown that the method can infer any context- free grammar when provided with skeletalstructure descriptions. This extends the results of Pao which only deal withproper subclasses of context-free grammars. The highly interactive inference system is basedon the use of tree derivatives, also introduced in this thesis, for determining automaton states and possible state merging. Tree derivatives are sets of tree forms derived by replacing selected subtrees with marked nodes. The derivative sets are used to determine subtree-invariant equivalence relations which characterize tree automata. A minimization algorithm based on tree derivatives is given. We use tree derivatives to prove that a tree automaton withn states can be fully characterized by the set of trees that it accepts of depth atmost 2n. The inference method compares tree derivative sets and infers subtree-invariant equivalence relations. A relation is inferred if there is sufficient overlap between the derivative sets. Our method was compared to other tree automata inference schemes, including Crespi-Reghizzi's algorithm. We have shown that our method is appli- cable to the entire class of context-freegrammars and requires a smaller sample than Crespi-Reghizzi's algorithm whichcan only infer a proper subclass of operator precedence grammars. Furthermore, it appears more general than the other inference systems for tree automata or grammars. THE AUTOMATED INFERENCE OF TREE SYSTEMS by Barry Arthur Levine A THESIS submitted to Oregon State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy Completed March 29, 1979 ComMencement June 1979 APPROVED: Redacted for privacy Professor of Department of Computer Science in charge of major Redacted for privacy Head of Department of Computer Science Redacted for privacy Dean of Gt^aduate School Date thesis is presented March 29, 1979 Typed by Joyce E. McEwen for Barry Arthur Levine ACKNOWLEDGEMENT I would like to thank Dr. Curtis Cook for his invaluable guidance and support in helping me produce this thesis. Many thanks are also due to Dr. Paul Cull for suggesting further research. TABLE OF CONTENTS Page 1 INTRODUCTION 1 2 SYNTACTIC PATTERN RECOGNITION 4 2.1 Motivation 4 2.2 Phrase Structure Grammars and the Chomsky Hierarchy 5 2.3 Syntactic Pattern Recognition with PSG'S 9 2.4 Extensions for Multidimensional Patterns 14 3 REGULAR TREE GRAMMARS AND TREE AUTOMATA 17 3.1 Trees and Tree Grammars 17 3.2 Tree Automata 25 3.3 Results on Tree Systems 29 4 TREE TRANSFORMATIONS AND GUTA 41 4.1 Regular Unordered Tree Grammars (Rutg) and Unordered Tree Automata (UTA) 41 4.2 Introduction to Tree Transformations 46 4.3Group Transformations 50 4.4 Non-Uniform Thresholds 58 5 INFERENCE OF TREE SYSTEMS 60 5.1 Introduction 60 5.2 Inductive Inference 61 5.3A Convergent System 65 5.4 Grammatical Inference 74 5.5 Lattice Size 76 5.6 Speedup of Convergence 77 5.7An Application to String Inference 81 6 TREE DERIVATIVES 88 6.1 Introduction 88 61,2 A Review of Biermann and Feldman's Results 89 6.3Tree Derivatives and Some General Properties 92 6.4 Inference of Tree Automata Using Tree Derivatives 106 6.4.1 Current Results on the Inference of Tree Automata 106 6.4.2 Motivations for Inference 108 6.4.3An Extension of Finite- State Results 110 TABLE OF CONTENTS (Continued) Page 6.4.4 Using Powers of Derivatives 115 6.4.5A Closer Look at Derivatives 119 6.4.6 Repeated Passes 123 6.5 Inference of Finite-State Machines Using Tree Derivatives 128 6.6 A Generalized Inference Technique 134 7 AN EFFICIENT ALGORITHM FOR THE INFERENCE OF TREE AUTOMATA 139 7.1 Introduction 139 7.2 Al: Changes to the Relation Set and G . 141 7.3 A2: Elimination of Grammatical 1 Productions 145 7.4 A3: Deterministic Parsing 147 7.5 A4: Final Space-Time Efficiency Improvements 148 7.5.1 Introduction to the Changes 148 7.5.2 A4 Formally Specified 162 7.5.3 Proof of Correctness--A4=A3 165 7.6 Summary of Algorithmic Improvements 168 8 APPLICATIONS OF ALGORITHM A4 169 8.1 Introduction 169 8.2 The Inference of Programming Languages 170 8.3 The Inference of Tree Systems 182 9 DISCUSSION 185 9.1 Summary of Results 185 9.2 Extensions 187 BIBLIOGRAPHY 189 APPENDIX A 192 LIST OF ILLUSTRATIONS Figure Page 2.1 Type 1 Chromosome 12 2.2 Type 2 Chromosome 13 2.3 Natural Rubber Molecule 15 2.4 Structural Description of Natural Rubber Molecule 15 3.1 Ordered Tree 18 4.1 Box Pattern 42 4.2 Valid Tree Representation for Box Pattern 42 4.3 Invalid Tree Representation for Box Pattern 42 4.4 Label Isomorphic Trees 43 4.5 Isomorphic Trees 43 4.6 Open Line Picture 47 4.7 Closed Line Picture 47 4.8 Oriented Primitives 47 4.9 Labelled Pattern 48 4.10 90° Shift 48 4.11 1800 Shift 49 5.1 Tree Construction for CF Inference 83 7.1 Tree Sample 140 LIST OF TABLES Table Page 6.1 113 6.2 115 THE AUTOMATED INFERENCE OF TREE SYSTEMS CHAPTER 1. INTRODUCTION Syntactic pattern recognition is the application of formal language theory to the generation and recognition of patterns. Researchers have developed new classes of grammars and automata ([32], [40], and [36]). The applications have served to enhance the study of linguistic systems. These new grammars and automata provide a formal means for dealing with the analysis of patterns. Tree automata (grammars) are used for the recognition (description) of sets of labelled trees. In this thesis we introduce a new type of tree automaton, the Group Unordered Tree Automaton (GUTA), and we extend string derivatives to tree derivatives. The GUTA are unordered tree automata that recognize all orientations of open line patterns. The specification of a GUTA includes an unordered tree automaton and a transformation group. The GUTA specification demon- strates how pattern transformations can be explicitly recorded in the recognition device. The problem of automating the design of unordered and ordered tree automata (grammars) is studied on both a system directed and highly interactive level. First, a system-directed inference method is introduced. When provided with a structurally complete sample, this method infers the target tree automaton (grammar). It requires a teacher to answer yes or no to the question Is the tree t in the target language?". It is shown that the method could be modified to infer any context-free grammar when provided with structural string descriptions. 2 This extends the results of Pao and Crespi-Reghizzi which only deal with proper subclasses of context-free grammars. The second type of inference system for tree automata requires direction from both the user and system for a solution. The method is based on the use of tree derivatives introduced in this thesis. Tree derivatives are sets of tree forms which serve to characterize the sets of trees accepted (generated) by tree automata (grammars). Tree derivatives are used to prove that a tree automaton with n states can be fully characterized by the set of trees that it accepts of depth at most 2n. This result yields a new minimization algorithm for tree automata. The second inference method used the derivative forms of trees for determining automaton states and state merging and is more general than the other inference algorithms reported in the literature. Finally, we demonstrate that our method can be used in a practical setting--the design of programming languages. Chapter 2 offers an historical background in the area of syntactic pattern recognition. Chapter 3 develops the necessary termi- nology and basic results surrounding tree grammars and tree automata. Chapter 4 discusses the design of GUTA and expresses the need for automated inference methods within that context. Chapter 5 extends Pao's results [34] for the inference of tree automata. These methods also lead to a system, combining the results of Pao and Crespi-Reghizzi [15], for inferring context-free grammars. Chapter 6 introduces the idea of tree derivatives. Properties of tree derivatives are obtained and inference techniques based on tree derivatives are developed.

Load more