TECHNISCHE UNIVERSITEIT EINDHOVEN

Department of Mathematics and Computer Science

MASTER’S THESIS

ForestFIRE and FIREWood A Toolkit & GUI for Tree Algorithms

by Roger Strolenberg

Date of defense 1st of June 2007

Tutor: ir. L.G.W.A. Cleophas, TU/e
Supervisor: dr.ir. C. Hemerik, TU/e

Abstract

Many fields in computer science use trees to represent hierarchical data. Parsing these trees and performing searches for patterns in them are well-known problems. Tree parsing, for example, is a known issue in compilers, because compilers can parse intermediate representation trees to help in translating such trees into sequences of machine-dependent instructions. Loek Cleophas' PhD research focuses on tree domain problems. He gathered, structured and classified a collection of tree parsing, matching and acceptance algorithms. This thesis discusses a toolkit and GUI that were developed to create an environment to experiment with these algorithms and to collect information on their properties. These properties can be used to select the most promising algorithms for the instruction selection process in compilers. The process of building the toolkit started with a study of existing tree toolkits. The knowledge gained from this study was used to construct the toolkit and user interface. The resulting toolkit implements a subset of the algorithms described by Cleophas. Finally, the toolkit and GUI were used to experiment with two types of algorithms. The description and results of these experiments are included at the end of this report.

Acknowledgments

I devote this section to all the people who provided me with knowledge, inspiration, and support throughout my Master's project. Starting with my tutor Loek Cleophas, PhD student at the Technische Universiteit Eindhoven, who guided me through the project, provided the necessary theory and helped improve the toolkit and thesis. I also want to thank Kees Hemerik, my supervisor at TU/e, who provided valuable input and watched over the progress. Next, I want to thank my close friends, without whom things would have been very different: my old schoolmates, Nicky, Frank, Tim, and Manon, for their encouragement and kind words; Isabelle, for the challenging games and the pleasant conversations during the breaks. There are also some colleagues who may not be forgotten. Rudy, a colleague and friend, worked on a similar Master's project; we had very useful reviews of each other's work and discussed daily programming curiosities. I also want to thank my colleagues Jan and Harald for removing the small language imperfections from my thesis. The project would also not have been the same without the hilarious Wednesday afternoon lunches at the TU/e with Harald, Erik and Rudy. And of course I want to thank my parents and my sister, who have always supported me. Without them, I would not be where I am today.

Acronyms

This section contains the acronyms mentioned in the document. This list can be used as a quick reference.

DAC Data Access Component

DFRTA Deterministic Frontier-to-Root Tree Automaton

DRFTA Deterministic Root-to-Frontier Tree Automaton

FR Frontier-to-Root

JRE Java Runtime Environment

LCL Lazarus Component Library

LHS Left Hand Side

MDV Multiple Data View

NFRTA Nondeterministic Frontier-to-Root Tree Automaton

NRFTA Nondeterministic Root-to-Frontier Tree Automaton

RF Root-to-Frontier

RHS Right Hand Side

RTG Regular Tree Grammar

SDV Single Data View

STF Smallest Tree First

SWT Standard Widget Toolkit

TA Tree Automaton

TTF Tallest Tree First

Contents

Abstract
Acknowledgments
Acronyms

0 Introduction

1 Domain
  1.1 Basic concepts
    1.1.1 Trees
    1.1.2 Tree languages
    1.1.3 Regular tree grammars
    1.1.4 Tree patterns
    1.1.5 Finite tree automata
  1.2 Problems of interest
  1.3 Application areas

2 Research into existing toolkits
  2.1 ATerms
  2.2 BEG
  2.3 BURG
  2.4 iBURG
  2.5 Timbuk
  2.6 Treebag
  2.7 Twig
  2.8 Summary

3 The Toolkit and GUI
  3.1 ForestFIRE
    3.1.1 Trees
    3.1.2 Regular tree grammars
    3.1.3 Tree patterns
    3.1.4 Tree automata
  3.2 FIREWood
    3.2.1 Architecture
    3.2.2 Resulting user interface
  3.3 Implementation details

4 Experiments
  4.1 Used tree grammars
    4.1.1 Thesis grammar
    4.1.2 iBurg standard grammars
    4.1.3 Report of ten Eikelder
    4.1.4 Mono project grammars
  4.2 Grammar transformation experiments
    4.2.1 RED-Z
    4.2.2 RED-U
    4.2.3 Influence of RED-Z/RED-U order
    4.2.4 RED-Z*/U* order with reuse
    4.2.5 RED-Z node selection strategies
    4.2.6 Conclusion
  4.3 Automaton construction experiments
    4.3.1 Measurement techniques
    4.3.2 Automaton construction: general issues
    4.3.3 Constructions of nondeterministic automata
    4.3.4 Constructions of deterministic automata
    4.3.5 Conclusion
  4.4 DFRTA based tree parsing experiments
    4.4.1 The parsing algorithm
    4.4.2 Automaton comparison

5 Conclusions
  5.1 Results
  5.2 Recommendations for future work
  5.3 Evaluation

A MSc Assignment description
  A.1 Original assignment description
  A.2 Additional data structure requirements

B Formal definitions
  B.1 Tree related definitions
  B.2 Tree grammar related definitions
  B.3 Tree automata related definitions

C ForestFIRE library
  C.1 Basic collections
    C.1.1 List
    C.1.2 Dictionary
    C.1.3 Set
  C.2 Trees
    C.2.1 Data structures
    C.2.2 Invariants
    C.2.3 Related algorithms
  C.3 Regular tree grammars
    C.3.1 Data structures
    C.3.2 Invariants
    C.3.3 Related algorithms
  C.4 Tree patterns
    C.4.1 Data structures
    C.4.2 Invariants
    C.4.3 Related algorithms
  C.5 Tree automata
    C.5.1 Data structures
    C.5.2 Invariants
    C.5.3 Related algorithms

D FIREWood file format
  D.1 Alphabets
  D.2 Trees
  D.3 Tree grammars
  D.4 Tree patterns and pattern collections
  D.5 Example

E Tree automaton construction – results

Bibliography

0 Introduction

The Software Engineering and Technology (SET) expertise group at the Department of Mathematics and Computer Science of the Technische Universiteit Eindhoven (TU/e) has as its main objective to create methods and supporting tools for the development and maintenance of reliable software. Compilers, to which this master's project is related, play an important role in supporting the software development process. A few years ago Loek G.W.A. Cleophas started his PhD research, which focuses on algorithms for the tree parsing, matching and acceptance problems. These algorithms can be used in many fields; one of these fields is compilers. To be more precise: the final part of the software compilation process needs to translate a generated parse tree into a sequence of instructions that can be executed on a machine. This instruction selection process can be done efficiently by using tree parsing algorithms. Cleophas collected a large collection of algorithms in this area and described them in a general form in his PhD thesis [Cle07]. This collection contains parsing, matching and acceptance algorithms, but also supporting algorithms and structures that are used to solve these problems efficiently. The goal of this master's project was to build a toolkit containing (a subset of) these algorithms, especially tree grammar transformations, tree automaton construction algorithms and tree parsing algorithms that use tree automata. This toolkit should, in combination with a graphical user interface, provide an environment to experiment with these algorithms. This practical experience was to be used to gain insight into these algorithms, their properties, and their applicability in practice.

The starting point for this Master's project was the original assignment description created by Cleophas and an additional list of requirements (see Appendix A). The assignment description provided three lists of requirements for the toolkit, with different priorities. These requirements describe the domain concepts, like trees and grammars, and the algorithms that needed to be implemented. The additional list of requirements contained specific operations for the data structures that implement the domain structures. During this project, as many of these requirements as possible were implemented. The design and implementation of the toolkit were preceded by an investigation of existing tree (grammar) toolkits. This study was used to see how these toolkits implement basic domain structures, like trees and tree grammars. This knowledge and the formal descriptions of the domain structures and related algorithms in Cleophas' draft PhD thesis were then used to implement the toolkit. Finally, the implemented toolkit and GUI were used to perform a collection of experiments. These experiments focus on tree grammar transformations, tree automaton constructions and tree parsing. In summary, this led to the following project phases:

• Studying toolkits using tree structures and algorithms related to them
• Designing and implementing the toolkit and GUI


• Performing experiments with the toolkit and GUI

These phases were executed in a time frame of about eight months. We will describe each of these phases in detail in the upcoming chapters. However, before discussing the details of the assignment itself there will be an introduction to the domain, including trees, regular tree grammars, tree automata and the problems of tree acceptance, matching and parsing.

1 Domain

This chapter introduces the problem domain in which the toolkit operates: the domain of regular tree languages on ranked trees. Concepts of this domain that are important to the toolkit are introduced in this chapter. These concepts can also be found in more detail in Loek Cleophas' draft PhD thesis [Cle07, Chapter 3]. After introducing these concepts, a description is given of the domain-specific problems described in [Cle07]: tree acceptance, tree parsing and tree pattern matching. Finally, an overview is given of the application areas for these three domain problems.

• Basic concepts
  – Trees
  – Regular tree languages
  – Regular tree grammars
  – Tree patterns
  – Tree automata
• Problems of interest
  – Tree acceptance
  – Tree pattern matching
  – Tree parsing
• Application areas

1.1 Basic concepts

Each of the upcoming sections will discuss one of the basic concepts in the domain of regular tree languages. These concepts are introduced to simplify the explanation of the domain specific problems and to provide the basic knowledge needed to read the rest of the report. All of the concepts discussed in this section are more or less generalizations of similar concepts in the string domain [Lin01]. Tree grammars are for instance comparable to string grammars. This tree-string relation will therefore sometimes be used to clarify the tree concepts.

1.1.1 Trees

As described in the introduction, this report focuses on the domain of regular tree languages on ranked trees; to be more precise, on ranked, ordered, node-labeled trees. Node labeled means that each node in a tree contains a symbol as its label (see Figure 1.1).


Figure 1.1: Node labeled tree (in prefix notation: a(a(b(c), c), c, c))

The term ranked implies that the symbol labeling a node determines the number of child nodes of that node. A symbol is not allowed to appear in nodes with different numbers of child nodes. The tree in Figure 1.1, for example, is not a ranked tree, because there are two a-nodes with a different number of child nodes. Removing one of the c-children of the root a-node turns this tree into a ranked tree. Figure 1.2 shows this modified tree. Symbol a in this tree has rank 2, b has rank 1 and c has rank 0.

Figure 1.2: Ranked tree (in prefix notation: a(a(b(c), c), c))

Ordered addresses the fact that the child nodes of a parent node are ordered. A node with a symbol of rank n has n child nodes, where the leftmost child is considered the first child and the rightmost child the nth child. Exchanging child nodes/subtrees results in a different tree if the exchanged items are not equal (see Figure 1.3).

Figure 1.3: Two different trees (a(c, d) ≠ a(d, c))

This description of trees is quite informal. Their formal definition is based on the concept of tree domains (see Definition B.1.1). A tree domain describes the paths to all nodes and thereby describes the bare structure of a tree (see Example 1.1.1).

Example 1.1.1 This example shows the structure of the tree of Figure 1.2 as a tree domain. This is the tree domain in set notation:

{ε, 1, 2, 1 · 1, 1 · 2, 1 · 1 · 1}


Each of the elements is a path to a node. The ε refers to the root, 1 to the first child of the root, 2 to the second child of the root, 1 · 1 to the first child of the node that is the first child of the root, and so on. This tree domain represents the complete tree structure.

The tree labeling function (Definition B.1.2) is used in addition to the paths to define the labels of the nodes, and thereby creates a labeled tree. This labeling function, based on an alphabet of symbols, defines a relation for each path (see Example 1.1.2).

Example 1.1.2 This example shows the tree domain of Example 1.1.1 with the tree labeling function. This is the original tree domain:

{ε, 1, 2, 1 · 1, 1 · 2, 1 · 1 · 1}

This is the labeling function from the tree domain to the alphabet {a, b, c} that turns the tree domain into the tree of Figure 1.2:

{(ε, a), (1, a), (2, c), (1 · 1, b), (1 · 2, c), (1 · 1 · 1, c)}

Each element from the tree domain is now linked to a symbol from the alphabet. This therefore corresponds to a tree where each node has a label and in this case results in the tree depicted in Figure 1.2.

The representation in Example 1.1.2 does not oblige symbols to have a fixed rank. A ranked labeled tree can be constructed by using a so-called ranked alphabet (Definition B.1.2) in the labeling function instead of a normal alphabet. A ranked alphabet defines a rank for each symbol. Using the ranked alphabet {(a, 2), (b, 1), (c, 0)} in Example 1.1.2 would turn it into a ranked tree.

The tree domain representation, as shown, can be used to represent trees. These definitions are often used in descriptions of algorithms related to trees. However, this report will mostly use the graphical representation or the corresponding prefix notation, because it is more compact and easier to read.
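To make the tree concept concrete in code, the following Java sketch shows one possible encoding of a ranked, ordered, node-labeled tree. The class names and methods are illustrative assumptions, not ForestFIRE's actual data structures (those are described in Chapter 3 and Appendix C):

import java.util.Arrays;
import java.util.List;

// A symbol from a ranked alphabet: the rank fixes the number of children
// of every node labeled with this symbol.
final class Symbol {
    final String name;
    final int rank;
    Symbol(String name, int rank) { this.name = name; this.rank = rank; }
}

// A node of a ranked, ordered, node-labeled tree; children are ordered,
// index 0 being the first (leftmost) child.
final class Node {
    final Symbol symbol;
    final List<Node> children;
    Node(Symbol symbol, Node... children) {
        if (children.length != symbol.rank)
            throw new IllegalArgumentException("rank mismatch for " + symbol.name);
        this.symbol = symbol;
        this.children = Arrays.asList(children);
    }
    // Prefix notation, e.g. a(a(b(c), c), c) for the tree of Figure 1.2.
    public String toString() {
        if (children.isEmpty()) return symbol.name;
        StringBuilder sb = new StringBuilder(symbol.name).append('(');
        for (int i = 0; i < children.size(); i++)
            sb.append(i > 0 ? ", " : "").append(children.get(i));
        return sb.append(')').toString();
    }
}

With symbols a (rank 2), b (rank 1) and c (rank 0), the tree of Figure 1.2 would be built as new Node(a, new Node(a, new Node(b, new Node(c)), new Node(c)), new Node(c)).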

1.1.2 Tree languages

The concept of a tree language is comparable to the concept of a string language. A tree language is defined as a set of trees, whereas a string language is defined as a set of strings. The concept of a tree language is important because it helps describe subsets of all possible trees. These languages facilitate problem descriptions like: given a tree t and a tree language l, determine whether t ∈ l. Tree languages are usually not represented directly by a set of trees, but are defined indirectly by a tree grammar (covered in Section 1.1.3). This is again similar to string languages and string grammars.

1.1.3 Regular tree grammars

A regular tree grammar (rtg) is a grammar that can be used to generate trees. These trees are generated in the same way that strings are generated by string grammars: by applying productions to nonterminal symbols, starting with the start symbol. A regular tree grammar

is defined by a five-tuple consisting of: two symbol alphabets N and Σ (nonterminal and terminal symbols, respectively), a ranking function r, a start symbol S, and a set of production rules Prods (sometimes abbreviated as P). Each production rule is of the form A → α, where A, the left hand side (lhs), is a nonterminal and α, the right hand side (rhs), is a tree. This tree can contain both terminal and nonterminal symbols, with the restriction that nonterminals can only be present at leaves (nonterminals therefore always have rank 0). Such a regular tree grammar is comparable to a regular string grammar in right regular form. Definition B.2.1 of Appendix B contains the formal definition. The nonterminal and terminal alphabets together with the ranking function define which symbols can be used in the production rules. Example 1.1.3 contains an example tree grammar. This grammar will be used as a running example throughout this section. Furthermore, in this report we will use capitals for nonterminals and lowercase characters for terminals.

Example 1.1.3 Let G = (N, Σ, r, Prods, S) where

• N = {S, B}
• Σ = {a, b, c, d}
• r = {(S, 0), (B, 0), (a, 2), (b, 1), (c, 0), (d, 0)}
• Prods = {(1) S → a(B, d), (2) S → a(b(B), c), (3) S → c, (4) B → b(B), (5) B → S, (6) B → d}

Creating a tree with such a grammar starts at the nonterminal start symbol. A rule (whose lhs is this start symbol) is applied to the start symbol, replacing the symbol with the right hand side of the production rule. This process is repeated until no nonterminals remain. An example tree construction can be seen in Example 1.1.4.

Example 1.1.4 Using the grammar of Example 1.1.3, Figure 1.4 shows a possible sequence of applications of productions, starting at the start symbol. The first application is of rule one, which replaces the start symbol S with the rhs tree of that rule. This is followed by the applications of rules four and six, finally resulting in the tree on the right.

S ⇒(1) a(B, d) ⇒(4) a(b(B), d) ⇒(6) a(b(d), d)

Figure 1.4: Example construction of a tree based on the grammar of Example 1.1.3

The tree constructed in Example 1.1.4 is one of the many possible trees that can be constructed using the example grammar. The complete set of trees that can be constructed is the tree language produced by that grammar, which is denoted as Lrtg(G) for a regular tree grammar G.
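The five-tuple itself maps naturally onto a small data structure. The following is a minimal Java sketch (hypothetical names, building on the Symbol and Node classes sketched in Section 1.1.1; the toolkit's actual structures are described in Appendix C) that folds the ranking function r into the Symbol objects:

import java.util.List;
import java.util.Set;

// A production rule A → α: the lhs is a nonterminal (rank 0); the rhs is
// a tree over N and Σ with nonterminals only at the leaves.
final class Production {
    final Symbol lhs;
    final Node rhs;
    Production(Symbol lhs, Node rhs) { this.lhs = lhs; this.rhs = rhs; }
}

// The five-tuple (N, Σ, r, Prods, S); the ranking function r is carried
// by the Symbol objects themselves.
final class RegularTreeGrammar {
    final Set<Symbol> nonterminals;  // N
    final Set<Symbol> terminals;     // Σ
    final List<Production> prods;    // Prods
    final Symbol start;              // S
    RegularTreeGrammar(Set<Symbol> nonterminals, Set<Symbol> terminals,
                       List<Production> prods, Symbol start) {
        this.nonterminals = nonterminals;
        this.terminals = terminals;
        this.prods = prods;
        this.start = start;
    }
}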


Before proceeding with tree automata some detail is given about possible special shapes of regular tree grammars. Two topics are discussed: grammar characteristics, and reachability and productivity.

Grammar characteristics

There are two special characteristics of regular tree grammars described in [Cle07]. These characteristics describe the presence of production rules with distinctively shaped right hand sides. The first characteristic describes whether a grammar may contain (U+) or may not contain (U−) production rules in which the right hand side is a single nonterminal. These rules are called chain rules or unit productions. The grammar in Example 1.1.3 is, for instance, a U+ grammar, because rule number five is a chain rule. Removing this rule would convert the grammar into a U− grammar (and also change the language produced by it). The second characteristic concentrates on the opposite case, where the right hand sides contain large trees with so-called Z-nodes. These Z-nodes are non-root nodes that contain terminals (for example the d-node in the first rule of the example grammar). A grammar that may contain rules with Z-nodes is called a Z+ grammar, and the term Z− grammar is used for the opposite case. The example grammar is a Z+, U+ grammar. Converting this grammar to Z−, U− results in a grammar that only contains production rules where the rhs has one level containing a single terminal (A → a) or has at most two levels with no terminal leaves (A → a(B0, ..., Bn)). The conversions are done by applying the transformations known as RED-Z and RED-U (described in Chapter 4), which respectively convert a Z+ grammar into a Z− grammar and a U+ grammar into a U− grammar. These conversions and their effects on the shape of the grammars were the subject of the experiments and are discussed in detail in Chapter 4 of this report.
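Both characteristics are simple syntactic checks on the right hand sides of the production rules. A minimal sketch, assuming the hypothetical grammar encoding from the previous sketches:

// U+/Z+ checks on a grammar: a chain rule has a lone nonterminal as rhs;
// a Z-node is a non-root rhs node labeled with a terminal.
final class GrammarShape {
    // U+: some production's rhs is a single nonterminal (a chain rule).
    // Nonterminals have rank 0, so such an rhs is just one leaf node.
    static boolean isUPlus(RegularTreeGrammar g) {
        for (Production p : g.prods)
            if (g.nonterminals.contains(p.rhs.symbol)) return true;
        return false;
    }

    // Z+: some production's rhs contains a terminal below the root.
    static boolean isZPlus(RegularTreeGrammar g) {
        for (Production p : g.prods)
            for (Node child : p.rhs.children)
                if (containsTerminal(child, g)) return true;
        return false;
    }

    private static boolean containsTerminal(Node n, RegularTreeGrammar g) {
        if (g.terminals.contains(n.symbol)) return true;
        for (Node c : n.children)
            if (containsTerminal(c, g)) return true;
        return false;
    }
}

Applied to the running example grammar, isUPlus would report true because of rule five, and isZPlus would report true because of the d-node in rule one.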

Reachability and productivity

The subject of reachability and productivity concerns the usability of production rules and symbols of a grammar. Some production rules and symbols in a grammar may be useless for constructing trees. These useless items can be divided into two types: start unreachable rules or symbols, and unproductive rules or symbols. Start reachable symbols are symbols which can be reached from the start symbol, and start reachable production rules are rules whose lhs is a start reachable symbol. A symbol B is called reachable from another symbol A if B can be reached by applying any sequence of production rules from A (see Definitions B.2.2 and B.2.3). Symbols and rules which are not reachable from the start symbol are useless in a grammar. Example 1.1.5 shows a grammar that contains such start unreachable symbols and rules.

Example 1.1.5 The running example grammar of this chapter (Example 1.1.3) does not contain any unreachable symbols or rules. The concept of reachability can be nicely illustrated by adding symbols and production rules to the example grammar. The first items to be added are the nonterminal C and the following production rule:

(7) C → b(d)


Nonterminal C is not present in any of the other six rules and is therefore unreachable from the start symbol S. A consequence of this is that rule seven is also unreachable. Nonterminals that are only reachable via C are also unreachable. This can be shown by adding the following two production rules and nonterminal D:

(8) C → b(D), (9) D → c

Nonterminal D is present in the right hand side of rule eight, but it still qualifies as unreachable, because rule eight is unreachable.

These unreachable productions can be removed from the grammar because they do not contribute to the tree derivation process.

Unproductive rules and symbols cause similar problems. A nonterminal is called productive if there exists a productive production rule in the grammar with that symbol as its lhs; a terminal is always considered productive (see Definition B.2.4). A rule is marked as productive if there is no unproductive symbol in its right hand side. The uselessness of unproductive symbols and rules is illustrated by Example 1.1.6.

Example 1.1.6 This example illustrates unproductive symbols and production rules. As in Example 1.1.5, the standard example grammar does not contain unproductive items. Unproductiveness is illustrated by adding additional rules to the grammar of Example 1.1.3. Only the nonterminals C and D and the following rule are needed to illustrate both types of unproductive items:

(7) C → b(D)

There is no way to obtain a tree from nonterminal D, hence this nonterminal is marked as unproductive. If one were to apply rule seven, the derivation would get stuck, because one cannot replace the nonterminal D. This also explains why a rule with an unproductive nonterminal in its right hand side is called unproductive.

Unproductive rules and symbols are just as useless as unreachable items. One may therefore remove them from a grammar.

However, we must remark that not only unproductive and unreachable symbols/rules are useless. An unproductive symbol can also make other symbols and production rules useless, even when these are not unproductive or unreachable themselves. Assume the rhs of a production rule contains an unproductive symbol X. If this rhs also contains a (reachable and productive) symbol Y that is present in no other rhs, then Y is useless as well. The same holds for the production rules with Y as lhs.
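The set of productive nonterminals can be computed with a straightforward fixpoint iteration over the rules; reachability is computed analogously, starting from the start symbol and following the nonterminals in the right hand sides. A sketch, again under the hypothetical encoding used earlier:

import java.util.HashSet;
import java.util.Set;

final class Productivity {
    // Grow the set of productive nonterminals until nothing changes: a
    // nonterminal becomes productive once some rule with it as lhs has
    // an rhs whose nonterminal leaves are all already productive.
    static Set<Symbol> productiveNonterminals(RegularTreeGrammar g) {
        Set<Symbol> productive = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Production p : g.prods)
                if (!productive.contains(p.lhs) && rhsProductive(p.rhs, g, productive)) {
                    productive.add(p.lhs);
                    changed = true;
                }
        }
        return productive;
    }

    // Terminals are always productive; nonterminal leaves must already
    // be in the productive set.
    private static boolean rhsProductive(Node n, RegularTreeGrammar g, Set<Symbol> prod) {
        if (g.nonterminals.contains(n.symbol)) return prod.contains(n.symbol);
        for (Node c : n.children)
            if (!rhsProductive(c, g, prod)) return false;
        return true;
    }
}

On the grammar of Example 1.1.6 this computation would never add D to the set, marking it (and rule seven) unproductive.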


1.1.4 Tree patterns

Tree patterns are in essence trees in which variable symbols may occur, with the restriction that these variables may only be present in leaf nodes. These trees are called patterns because the variables represent arbitrary subtrees. Figure 1.5 shows such a tree pattern with a variable ν. This pattern matches the tree on the right hand side of that figure.

Figure 1.5: Tree pattern a(ν, c) that matches the tree a(a(b(c), c), c)

Computing whether a pattern matches a subtree is one of the interesting problems related to tree patterns. One of the three major domain problems is a variant of this matching problem. This domain problem will be discussed in detail in Section 1.2.
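A naive matching check follows directly from this description. The sketch below is hypothetical code (reusing the Symbol and Node classes sketched earlier) that treats one distinguished rank-0 symbol as the variable ν:

final class PatternMatching {
    // The variable symbol; in a ranked alphabet it has rank 0 (a leaf).
    static final Symbol NU = new Symbol("ν", 0);

    // Does 'pattern' match the subtree rooted at 'subject'? A variable
    // leaf matches any subtree; otherwise the symbols must agree, and in
    // a ranked alphabet equal symbols imply equally many children.
    static boolean matchesAt(Node pattern, Node subject) {
        if (pattern.symbol == NU) return true;
        if (!pattern.symbol.name.equals(subject.symbol.name)) return false;
        for (int i = 0; i < pattern.children.size(); i++)
            if (!matchesAt(pattern.children.get(i), subject.children.get(i))) return false;
        return true;
    }
}

Finding all occurrences of a pattern in a tree then amounts to calling matchesAt at every node of the input tree; the automaton-based techniques discussed in Section 1.2 address the same problem more systematically.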

1.1.5 Finite tree automata

Finite tree automata (tas) play the same role in the domain of regular tree languages as finite string automata do in the area of regular string languages. Tree automata, like string automata, can be divided into categories based on two criteria: determinism and processing direction. There are deterministic and nondeterministic tree automata, which in turn can be divided into root-to-frontier (top-down) and frontier-to-root (bottom-up) automata. Let us first discuss the general shape of a tree automaton before addressing the differences among the four variants. The formal definition describes a tree automaton as a 6-tuple (see Definition B.3.1). Informally, a tree automaton, like a string automaton, consists of a set of states, a set of transitions, root/leaf accepting states and an alphabet. The root accepting states and leaf accepting states are comparable to the start states and accepting states of string automata, depending on the processing direction of the ta. Root-to-frontier (rf) processing starts at the top, so the root accepting states are used as start states and the leaf accepting states as accepting states, where each leaf with a symbol x must match a leaf accepting state with an incoming x-transition. This is the other way around for frontier-to-root (fr) automata. However, the shape of the transitions in tree automata is different from that in string automata. This is caused by the ranked alphabet of symbols used in this domain.

Figure 1.6: Transitions in tree automata

Figure 1.6 shows such a tree automaton transition, which is clearly different from the ones used in string automata. Transitions for a symbol in a string automaton are defined as an element of the set Q × Q, where Q is the set of states. In a tree automaton a transition is not a relation between two states, but between one state and a vector of states. To be more precise, the transition relation for a symbol a is defined as Q × Q^n, where n is r(a), in other words the rank of a. This is depicted in Figure 1.6, where each transition consists of a single edge, a small circle and n branches. There are n branches because a transition on a symbol a of rank n corresponds to a node with symbol a that has n child nodes. Figure 1.7 shows what a tree automaton with multiple transitions can look like. Tree automata are in fact a generalization of string automata: a tree automaton that only contains transitions for symbols of rank one can be treated as a string automaton. Removing the small circles (where transitions split for symbols of rank two or higher) will more or less turn such a tree automaton into a string automaton.

Figure 1.7: Example (rf) tree automaton

The transitions in such an automaton can be undirected, root-to-frontier directed or frontier-to-root directed. An undirected automaton allows processing in both the root-to-frontier and frontier-to-root direction, and therefore uses undirected transitions. The other two variants are used in rf and fr automata, respectively. Figure 1.7, for instance, shows an rf automaton. The arrows in this automaton point from the parent to the child states. The arrows in an fr automaton point in the other direction. This report will focus on the directed automata, because they are more useful in practice.

There is also a difference when looking at determinism, as discussed in the first paragraph. This difference between nondeterministic and deterministic tree automata will be discussed in the upcoming two sections. These sections will also provide further details on rf and fr variants for both these deterministic and nondeterministic automata.

Nondeterministic tree automata

The phenomenon of nondeterminism can be described in a similar way as for string automata. Assume one has an rf tree automaton. If this tree automaton is nondeterministic, then there can be an arbitrary number of transitions for a symbol a (or for the ε-symbol) from a given state. This can also be seen in Definition B.3.2, which defines the relation for a symbol a (Ra) of a nondeterministic automaton (nrfta) as Ra ∈ Q → P(Q^n), where n is again the rank of symbol a.


Figure 1.7 shows such an nrfta. The a-transitions starting in state qs provide an example of this nondeterminism, because there are two a-transitions that result in different state vectors ((q2, q3) and (q3, q4)). The same can be observed for an nfrta if we convert the rf variant in Figure 1.7 into an fr one. Our starting point is now the state vector (q3, q4). We again look at the possible a-transitions and encounter two of them, leading to states qs and q3. This illustrates how both rf and fr tree automata can be nondeterministic.
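The relation Ra ∈ Q → P(Q^n) can be encoded directly as nested maps. The following Java sketch is an illustrative encoding (integer states, hypothetical names) in which a lookup yields a set of child-state vectors; having more than one vector in such a set is exactly the nondeterminism described above:

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Nrfta {
    // For each symbol name and parent state, the set of child-state
    // vectors (each of length r(symbol)) allowed by the relation.
    final Map<String, Map<Integer, Set<List<Integer>>>> relation = new HashMap<>();

    // All child-state vectors reachable from 'state' on 'symbol'.
    Set<List<Integer>> step(String symbol, int state) {
        return relation
            .getOrDefault(symbol, Collections.emptyMap())
            .getOrDefault(state, Collections.emptySet());
    }
}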

Another property shared with string automata is the possible presence of so-called ε-transitions in nondeterministic tree automata. These are transitions that can be taken without reading a symbol. Nondeterministic tree automata that contain such ε-transitions are denoted by the terms εnrfta and εnfrta, depending on the processing direction. Figure 1.8 shows an example of an εnrfta.

Figure 1.8: Example ta with ε-transitions

State qs in Figure 1.8 has three outgoing ε-transitions, so from qs it is possible to proceed to q0, q1 and q2 without reading a symbol. A consequence of this is that an automaton containing ε-transitions is nondeterministic, because being in state qs also means being in the states reachable from qs by an ε-transition.

This nondeterminism has an effect on the usage of these automata. If one, for example, has an nrfta and starts processing the root symbol of the tree in a root accepting state, then this can result in multiple state vectors for the child nodes due to the nondeterminism. All these vectors have to be considered when proceeding to read the child symbols. This makes the use of nondeterministic tree automata less efficient, similar to the use of nondeterministic string automata.

Deterministic tree automata

For each state in a deterministic rf automaton (drfta) there is at most one transition for each possible symbol, and there exist no ε-transitions; likewise, in the fr variant of a deterministic automaton (dfrta) there is at most one transition for each symbol/state vector combination. This determinism can also be seen in the definitions of these tree automata (see Definitions B.3.4 and B.3.5). For instance, the transitions Ra for a symbol a of rank n in a drfta are defined as Ra ∈ Q → Q^n, where the transition set in an nrfta for the same symbol a is defined as Ra ∈ Q → P(Q^n). The absence of the powerset shows that each transition for a certain symbol and state can only result in one vector in a drfta.
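This determinism makes a frontier-to-root run particularly simple: each node receives exactly one state, computed bottom-up from the states of its children. The following Java sketch is a hypothetical encoding of such a dfrta run (integer states, reusing the Node class sketched in Section 1.1.1), not the toolkit's implementation:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Dfrta {
    // (symbol name, vector of child states) -> the unique resulting state.
    final Map<String, Map<List<Integer>, Integer>> delta = new HashMap<>();
    final Set<Integer> rootAccepting;
    Dfrta(Set<Integer> rootAccepting) { this.rootAccepting = rootAccepting; }

    // Frontier-to-root run: leaves (rank 0) look up the empty vector;
    // returns null when no transition applies and the run gets stuck.
    Integer run(Node t) {
        List<Integer> childStates = new ArrayList<>();
        for (Node c : t.children) {
            Integer q = run(c);
            if (q == null) return null;
            childStates.add(q);
        }
        Map<List<Integer>, Integer> forSymbol = delta.get(t.symbol.name);
        return forSymbol == null ? null : forSymbol.get(childStates);
    }

    // Accept when the state computed for the root is root accepting.
    boolean accepts(Node t) {
        Integer q = run(t);
        return q != null && rootAccepting.contains(q);
    }
}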

It is important to mention that not all tree automata have the same accepting power. It is well known that the accepting power of the DRFTA is less than that of the other three automaton types (see e.g. [Cle07, Lemma 3.6.30]), so in practice most of the attention in the area of deterministic tree automata will go to DFRTAs.

1.2 Problems of interest

As described earlier, the PhD research of Loek Cleophas focuses on solutions for three algorithmic problems in the area of regular tree languages. These problems are defined in more detail here. Let us start with the tree acceptance problem (Definition 1.2.1), the simplest of the three. The research in [Cle07] focused on solving this problem by constructing tree automata based on the rtg and processing the input tree with these automata. If an automaton accepts the tree, then the tree can be generated by the rtg.

Definition 1.2.1 Tree acceptance: Given a regular tree grammar and an input tree, determine whether the input tree can be generated by the regular tree grammar, i.e. is part of the language denoted by the regular tree grammar.

The second problem focuses on tree patterns instead of tree grammars (Definition 1.2.2).

Definition 1.2.2 Tree pattern matching: Given a finite, non-empty set of trees (the pattern set) and an input tree, find all occurrences of the patterns in the input tree.

What needs to be determined, for each node in the input tree, is which patterns match at that node. This problem can again be solved by using tree automata: by constructing a tree automaton from the pattern set and processing the input tree with it. The states of the automaton correspond to certain patterns of the pattern set, so what needs to be done is to record which state is assigned to which node of the tree in an accepting computation of the tree automaton. The final problem, called tree parsing (Definition 1.2.3), is an extension of the tree acceptance problem. However, for tree parsing one does not only want to know whether a tree can be produced by a grammar, but also how it can be produced; in other words, which production rule needs to be applied at each node of the input tree to obtain the complete tree.

Definition 1.2.3 Tree parsing: Given a regular tree grammar and an input tree, determine all parses of the input tree that can be generated by the regular tree grammar. A variation of this problem that is often used is to determine a parse that is optimal (with respect to some cost function).

This can again be solved by creating an automaton based on the rtg and processing the tree with this automaton. However, one has to do some extra bookkeeping during processing (as for the tree pattern matching problem) to obtain and store the parse information.


This report will focus on algorithms related to these problems, especially the tree acceptance and tree parsing problems. The algorithms of interest manipulate the shape of the input grammars (to remove useless items, as discussed in Section 1.1.3) and construct automata from these grammars. These algorithms and the experiments with them are discussed in detail in Chapter 4.

1.3 Application Areas

There is a significant collection of application areas where these three problems in the regular tree language domain play a role:

• Code generation in compilers, particularly for instruction selection or optimization
• Term rewriting and unification
• Genetics, in particular DNA/RNA pattern matching
• XML document processing
• Type inference
• Protocol verification, particularly for cryptography and network protocols

Concepts in these domains can be translated to trees; terms, for instance, can be represented as trees. However, our focus is on the topic of code generation. In a compiler, a string (source code) is translated into a parse tree by lexical and syntactical analysis [ALSU07]. The resulting parse tree has to be translated into a sequence of machine instructions. This translation into machine instructions is called code generation/instruction selection. Figure 1.9 gives a simplified overview of this compilation process for an example expression.

Figure 1.9: Compilation process: the expression R1 := c1 + M[c2 + R2] is converted into the tree +(c1, M(+(c2, R2))), which is then translated into a sequence of five MOV/ADD instructions.

The phase in which the intermediate tree is translated into instructions can be described as a tree parsing problem. To do this, one has to translate the instruction set of the target architecture into a tree grammar. Each instruction has to be converted into a single production. The lhs of the production rule has to correspond to the register in which the result of the instruction is stored, while the rhs of the created production rule must represent the semantics of the instruction. An example of an instruction set with corresponding production rules can be seen in Table 1.1.


Instruction        Production rule
MOV #c, Ri         Ri → c
MOV M[Rj], Ri      Ri → M(Rj)
ADD Ri, Rj         Ri → +(Ri, Rj)

Table 1.1: Three instructions with their corresponding production rules

The derived tree grammar and the intermediate tree can then be used as input to an algorithm that solves the tree parsing problem. This algorithm computes which rules from the grammar have to be applied to obtain the source tree, and thereby describes which instructions have to be selected to compute the expressions that correspond to (parts of) the tree. The experiments on rtg transformations and ta constructions at the end of this report present algorithms that can be used in solutions to these problems. Especially the ta constructions are useful for solving the instruction selection problem. The ta construction algorithms can construct automata from tree grammars, and the resulting automata can be used to parse a tree and thereby compute which production rules correspond to which nodes of the subject tree. Each rule corresponds to an instruction, so the parse result can then be used to determine which instructions need to be executed. More details about how such an automaton is used to parse a tree can be found in the final section of the chapter about the experiments.
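To illustrate just the final step (hypothetical names; the parsing algorithm itself is treated in Chapter 4): once a tree parse has attached a production rule to every node, emitting code reduces to a bottom-up walk that outputs the instruction associated with each node's rule.

import java.util.List;
import java.util.Map;

final class InstructionEmitter {
    // The instruction template attached to each production, as in Table 1.1.
    final Map<Production, String> instructionFor;
    InstructionEmitter(Map<Production, String> instructionFor) {
        this.instructionFor = instructionFor;
    }

    // 'ruleAt' is the parse result: the production chosen for each node.
    // Children are emitted first, so operands are computed before use.
    void emit(Node t, Map<Node, Production> ruleAt, List<String> out) {
        for (Node c : t.children) emit(c, ruleAt, out);
        Production rule = ruleAt.get(t);
        if (rule != null && instructionFor.containsKey(rule))
            out.add(instructionFor.get(rule));
    }
}

This sketch glosses over register assignment, chain rules and cost-optimal rule selection, all of which a real instruction selector has to handle.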

2 Research into existing toolkits

The starting point of the assignment was to investigate existing toolkits in the area of regular tree grammars and related algorithms. The primary goal was to analyze the design of these toolkits and to find out whether some parts of their designs could be reused in the design of our toolkit. Our toolkit contains a large subset of the domain concepts discussed in Chapter 1. The research into existing toolkits therefore focused on the data structures used for representing the main domain concepts: trees, their nodes, tree grammars and tree automata. Before discussing these toolkits in detail, I would like to start with the remark that most of the structures are quite straightforward. Trees, for instance, are defined by a root node; this root node points to its child nodes and in this way recursively defines the structure. Nodes have a terminal symbol and zero or more children, sometimes with a maximum of two due to the area of instruction selection applications, where most instructions have at most two arguments. Many of the systems also use other structures, such as tree automata and tree grammars. Most of the systems talk about trees; others treat them as terms. In the next sections I will give an overview of each system and discuss the details of the data structures used for representing trees, tree grammars and tree automata.

2.1 ATerms

ATerms [vdBdJKO00] is a C library for efficient representation and storage of terms (which can be used to build tree-like structures). The library offers a representation for storing the trees/terms in memory during program execution and a separate binary format for storing trees/terms after execution or exchanging them between programs that use the ATerms library. Some of this storage optimization is achieved by storing subtrees with multiple occurrences only once in a tree (called subtree sharing). Pointers to that single instance are then created for these subtrees. As mentioned, ATerms treats trees as terms. A term in ATerms can be an integer, real, string, function application, list, placeholder, blob or annotation pair. Trees can be constructed using the function application. A function application consists of a function symbol and an arbitrary number of terms as arguments. The function symbol is used to represent a node symbol and the arguments represent the subtrees. For example: 'PLUS'(term1, term2, ..., termn). The function application 'PLUS' represents a node labeled by a terminal with the name 'PLUS' and children term1 to termn. The internal representation in the software is quite straightforward. Each of the seven types of terms is mapped onto a piece of memory. The function application is stored as the name of the function application (PLUS in the example) and a list of pointers to the terms contained in it. The details for all of these terms can be found in [vdBdJKO00, Section 3.4]. This way a tree can be represented by the ATerm that represents its root node. This is very similar to the tree structure sketched in the introduction above.


Our toolkit does not only use trees, but also tree grammars and the like. Unfortunately, there is no built-in support for data structures like tree grammars, or for other data structures and algorithms related to our research field. However, one could construct tree grammars and tree automata using the optimized trees provided by ATerms. The ATerms tree structures also have some other interesting properties besides the efficient storage:

• ATerms can have terms with more than two children.
• Function applications containing the same terminal can have different numbers of arguments, so the terminals have no fixed rank.
• Terms support annotation, which could be used in e.g. tree parsing.
• ATerms has automatic garbage collection for its terms, which simplifies usage.

The annotation feature of the terms is very useful for our toolkit. It offers algorithms and associated data structures the possibility to store node-specific information. It is also possible to store multiple items in each node, because each block of information is stored under a specific key value, as in a dictionary. These key values are used to set and get the data item for that key. This creates a nice way to store information for different purposes in a structured manner. Such storage can be found in the other systems, but not in such a structured form. The dictionary-type annotation makes ATerms nodes more flexible than nodes in other systems. The only disadvantage is that terms can only be annotated with other terms. This makes ATerms annotation not very practical when it is used to store complex annotation data in parsing algorithms. In summary, ATerms contains nice concepts. Some of these, like garbage collection and structured annotation, are interesting features. Other features, like the optimized storage through subtree sharing, are less interesting, because they can result in difficulties when manipulating tree structures. When a subtree, for instance, has multiple occurrences in a tree (and is therefore shared) and is changed, then this subtree is changed everywhere. This is not always what a user wants; ATerms even calls such shared terms immutable. It is therefore not practical either to use ATerms in our toolkit or to reuse this optimization approach.
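The structured, dictionary-style annotation is the idea most worth carrying over. A generic illustration of the concept (deliberately not the ATerms C API, and with hypothetical names) wraps a node with a key-value store, so that algorithm-specific data stays out of the node structure itself:

import java.util.HashMap;
import java.util.Map;

// Key-based per-node annotation in the style described for ATerms: each
// algorithm stores its data under its own key, keeping nodes clean.
final class AnnotatedNode {
    final Node node;
    private final Map<String, Object> annotations = new HashMap<>();
    AnnotatedNode(Node node) { this.node = node; }

    void annotate(String key, Object value) { annotations.put(key, value); }
    Object annotation(String key) { return annotations.get(key); }
}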

2.2 BEG

BEG [ESL89] is a tool that generates, from a tree grammar, a compiler back end (instruction selector/code generator) that computes the minimal cost parse for an intermediate representation tree. The tree grammar specification is written in a language called BEGL. BEG uses such a specification to construct a Modula-2 back end that computes this minimal cost parse for an input tree. The documentation of BEG gives a clear overview of the data structure it uses for representing nodes of the input tree. The structure is very similar to the structure described in the introduction: a tree node is represented by a structure that contains a string for the terminal symbol and a list of other nodes that represent the children. BEG, like ATerms, has annotation possibilities in each node, but in BEG they are only targeted at the parsing algorithm. The parser stores in each node the matching production rules and their costs, in two separate arrays. Little attention is paid to the description of this design, even though there are other possibilities for storing this information. For instance,


BURG (discussed in Section 2.3) uses a slightly more complicated structure for storing parse data, which is more flexible and creates less pollution in the nodes. It could be that other designs were considered, but unfortunately no information about the design decisions can be found in the BEG description [ESL89]. Also, not much information can be found on how BEG represents tree grammars and their production rules. What can be gathered from the code samples in [ESL89] is that the trees on the right hand side of the production rules are comparable to the normal BEG trees. Normal BEG tree nodes contain two arrays for storing match information. These arrays are in theory not necessary for the rhs of a production rule. However, one cannot derive from the paper whether the same tree data structure is used or a stripped-down variant. In total, BEG uses straightforward data structures. The additional features, like the annotation option, are specifically targeted at the parsing algorithm used by BEG. These features are therefore not applicable to our toolkit, because our goal is to be more flexible.

2.3 BURG

BURG [FHP92b] is, just like BEG, a code generator toolkit. BURG emits a tree parser, called BURM, as a C program that computes the lowest cost parse of an input tree. This parser is constructed using the BURS technique [Pro95]. BURG is used by providing a grammar and a description of the tree structure to the toolkit. The BURS technique is then used to create a tree automaton based parser that is used by BURM to parse each input tree.

Let us start by looking at the tree structures used by BURG. The first information about BURG's data structures was found in the BURG input file format. The input file, used for defining the grammar and the shape of the input tree, showed that BURG uses conventional tree structures, where a tree is recursively defined starting at the root node. The user must define the basic shape of the input tree by defining (using the C #define directive) the four main fields of a node: the symbol field, the left and right child node, and an integer annotation field. The latter two items immediately introduce one of the shortcomings of the structures used by the BURG toolkit. The restriction to symbols with a rank of at most 2 is not explained, but it is probably related to the fact that BURG is used for instruction selection. Another remarkable item is the integer field that is used for annotation. Other toolkits, like BEG, store quite a lot of data in the tree itself; BURG uses integer values that refer to states of the BURS automaton used. Such an integer value can then be used to retrieve information related to the node. This information itself is stored in separate structures, like hash tables and arrays. This is a cleaner and more flexible solution than BEG's. Tree grammars are the next structures that need to be discussed. BURG reads the grammars that are specified in the input file and uses them to construct an automaton by applying the BURS technique. The tree grammar data structures used can, in contrast to the trees, not be found in the output BURM C file, but only in BURG itself. The paper describing BURG [FHP92b] does not contain any information about the data structures used for representing grammars; however, the source code of BURG provided an impression of these structures. A grammar is represented by a set of lists: a list of nonterminals, a list of terminals, a list of production rules and some auxiliary lists such as a list of chain rules. The

production rules, in turn, consist of an lhs nonterminal and an rhs pattern. Such a pattern is a tree without any non-root terminal node; it therefore only consists of a root symbol and a list of child symbols. BURG reads the input grammar and converts it into the Z− form, which it uses internally because it simplifies the construction of the BURS tree automaton. BURG further defines for each rule a number that can be used to refer to the production rule. These numbers are also used in the constructed BURS automaton. The BURS (fr) tree automaton in BURG is the final data structure that was important to study. This BURS automaton is represented as a standard tree automaton. It consists of a collection of states and a set of transition tables. The states are used as indices into these tables via a mapping function that translates them to integer indices. The states themselves are created for specific patterns. How these states and transitions are used for tree parsing can be found in [Pro95]. The BURS tree automaton has a conventional form; however, it is targeted at the special BURS tree parsing technique and is therefore not directly applicable to our toolkit. What can be learned from the BURS automaton is the usage of the two-dimensional transition tables. These tables allow transitions based on two indices (the indices refer to terminals/subtrees). This idea can be expanded to n-dimensional tables for symbols of rank n. Summarized, BURG provides implementation ideas for trees, tree grammars and tree automata. The tree and tree grammar structures are modeled similarly to those in other toolkits. The two most important exceptions are that there are no encapsulating grammar objects and that a single integer is used as a reference to annotation data instead of a complete structure. The advantage of BURG is that it also provides implementation details about the BURS tree automata used. This provides the idea of using state-indexed transition tables realized by mapping functions that translate states to unique integer values, sketched below.
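A minimal sketch of that idea, with illustrative names (BURG's own tables are generated C code): states are mapped to consecutive integers by a mapping function, and the transitions for a symbol of rank 2 become a two-dimensional array lookup. For a symbol of rank n the same idea yields an n-dimensional table.

import java.util.HashMap;
import java.util.Map;

// State-indexed transition table for one symbol of rank 2: the mapping
// function assigns each state a unique small integer used as array index.
final class TransitionTable2D {
    private final Map<Object, Integer> stateIndex = new HashMap<>();
    private final int[][] table;  // table[left][right] = resulting state index

    TransitionTable2D(int numStates) {
        table = new int[numStates][numStates];
    }

    // Mapping function: translate a state to its unique integer index.
    int indexOf(Object state) {
        return stateIndex.computeIfAbsent(state, s -> stateIndex.size());
    }

    void set(Object left, Object right, int result) {
        table[indexOf(left)][indexOf(right)] = result;
    }

    int lookup(Object left, Object right) {
        return table[indexOf(left)][indexOf(right)];
    }
}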

2.4 iBURG

iBURG [FHP92a] is a variant of BURG that, like BURG, constructs a tree parser from a tree grammar. iBURG uses the same input language as BURG, but a different parsing technique, because BURS was considered to be inflexible. This relation to BURG can also be seen in the data structures: iBURG uses roughly the same structures as BURG. The shape of the subject trees is exactly the same as in BURG, because iBURG uses the same type of input file, in which the user must define the basic tree structure. The tree grammar data structure used by iBURG, however, is more advanced than its BURG counterpart. The basis of the rules is as one would expect: a rule defines an lhs, which is a nonterminal, an rhs, which is a tree, and a cost value. But rules also have a pointer to another rule with a similar rhs (based on a similar root or children) and a pointer to a chain rule with the same right hand side. Similar additions are made to the nonterminals: they also point to the rules that use them. These extras can simplify the parsing process, but they also create a lot of extra initialization work, because all these pointers have to be set. Like BURG, iBURG generates an output C program, for the specified grammar, that parses trees provided by the user. iBURG also uses a tree automaton to parse the input tree. However, in contrast to BURG, this is not a conventional tree automaton consisting of explicit states and transitions. iBURG defines states in this C program as integers and uses a function containing a complicated generated switch statement instead of the more conventional transition tables. This approach can, however, only be used when generating source code for a parser

and not in a toolkit that constructs these automata in memory (which will be the case for the toolkit designed in this report). Overall, iBURG's basic structures for trees and grammars are as one would expect, but they also contain small extra ideas. One such idea is the extra pointers that connect structures which, for instance, contain the same symbol in the same spot (e.g. two production rules with the same rhs root symbol). These pointers can be practical when computing matches based on these symbols.

2.5 Timbuk

Timbuk [GT01] is part of a collection of three tools. These three tools (Timbuk, Taml, Tabi) form a toolkit for term rewriting systems and tree automata. The Timbuk tool gives the user the ability to specify an automaton or a set of terms (for which an automaton accepting these patterns can be constructed by the toolkit). The user can then test whether his automaton/set of terms matches another automaton or set of terms when they are rewritten by rewrite rules he has defined. The Timbuk manual [GT] does not give much insight into the implementation of Timbuk. To find more detailed information I took a look at the OCaml source code of Timbuk (version 2.0), where the definitions of terms and automata can be found. However, there are not many comments in the source code that describe the structures or explain why they are implemented the way they are. Nevertheless, some things can be distilled from this source code. Timbuk defines terms/trees in a different way than most of the other systems, due to the functional nature of the OCaml programming language. A term can have four shapes: variable (nonterminal), constant (leaf terminal), function (terminal with subterms) and 'special'. The function of this last one is not completely clear; the Timbuk manual does not discuss 'special' terms. The source code of the automata portion suggests that this shape is used to represent automaton states as terms, but its exact role is unclear. Timbuk uses no tree grammars, because it focuses on automata constructed from tree patterns instead of grammars. Examining the structure of these automata was not easy, because only the source code was available, but the names used for variables give a good idea of the function of the different parts of the automaton. However, it was harder to get an idea of how these different parts are used. The type is defined by:

type tree_automata = {
  alphabet: Alphabet_type.t;
  state_ops: Alphabet_type.t;
  states: State_set_type.t;
  final_states: State_set_type.t;
  prior: transition_table;
  transitions: transition_table
}

The roles of parts like alphabet, states, final_states and transitions are clear from their names, but for the other fields this is not so clear. The state_ops alphabet is the collection of names used for the states. These are stored separately and have to be disjoint from the

normal alphabet that is used. The role of the prior field is unclear. There is no comment in the source code that describes its role or gives hints about why it is needed to form an automaton. Studying the source code with someone with more OCaml experience could shed some light on the function of this field. Unfortunately, the only available Timbuk information sources were the Timbuk manual [GT] and paper [GT01], which mostly contain a description of the functionality of the Timbuk tool. There was also source code available; however, the use of OCaml, a language with which I was not familiar, made it hard to analyze the code and get a clear view of how Timbuk functions. In summary, Timbuk offers a large set of structures. Many of these structures are also necessary in a tool like ForestFIRE, but unfortunately the absence of good documentation and code comments makes it hard to get a detailed view of their design and design philosophy. It is therefore hard to decide what can be reused.

2.6 Treebag

Treebag is a toolkit that enables a user to define tree grammars, algebras, transformations on trees and visualizations of trees. A user can define components of each of the four types. These components can then be connected, for instance, to visualize trees that are generated by a self-defined grammar and afterwards transformed by a self-defined transformation. These components can be defined in text files and loaded into the toolkit, where they can be connected with directed edges.

When examining the data structures I had to turn to the source code of the toolkit, because there was not much design/implementation information in the manual [Dre]. Tree nodes in Treebag are called terms and their structure is straightforward. There are no fields for algorithmic data, probably because the tree transformations should be loosely coupled to the trees. Remarkably, Treebag is the only system that uses terminals with ranks and stores the rank in each node explicitly. This results in duplicate information; a dictionary-like data structure would be more space-efficient if one wanted to store ranks explicitly. A reason for this implementation could be that retrieval of the rank is a bit faster, because it can be accessed directly in each node. Trees in Treebag are created using tree grammars that are defined by the user. The system can do this randomly, by applying random production rules, or the user can do it by hand by applying the production rules he likes. The shape of these production rules is very similar to the production rules in the other systems. A production rule is defined by a structure containing a left hand side and a right hand side. The rhs is defined as a term, which can be any form of tree. The lhs is formed by one symbol only. Additionally there is a field called weight, which can be used to give a cost to each of the production rules (Treebag does not oblige the user to specify this weight in his grammar definition). It seems that the grammars in Treebag do not store a separate list of the terminals and nonterminals that are used in the production rules, probably because these lists are not necessary to generate the trees.
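To make the shape of such a production rule concrete, the fragment below sketches it in Java (the language used for ForestFIRE later in this report). The class and field names are illustrative only, not Treebag's actual ones.

    // Illustrative sketch of a Treebag-style production rule: a single
    // lhs symbol, an arbitrary term as rhs, and an optional weight.
    class Term {
        String symbol;        // node label
        int rank;             // rank stored explicitly in each node, as Treebag does
        Term[] children;

        Term(String symbol, Term... children) {
            this.symbol = symbol;
            this.children = children;
            this.rank = children.length;  // duplicates information, see above
        }
    }

    class WeightedProduction {
        String lhs;           // exactly one symbol
        Term rhs;             // any term/tree shape
        Double weight;        // may be null: the weight is optional

        WeightedProduction(String lhs, Term rhs, Double weight) {
            this.lhs = lhs;
            this.rhs = rhs;
            this.weight = weight;
        }
    }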


Overall, Treebag's structures for trees and grammars are quite similar to the structures in other systems and do not contain any striking differences. Yet, as with Timbuk, the absence of design documentation makes it hard to retrieve much information about the structures used and the design decisions made.

2.7 Twig

Twig is a tree manipulation language that enables a user to describe tree grammars and convert them, like BEG and BURG, into a code generator for a compiler. The paper [AGT89] describes the language for defining tree grammars (with costs for each production rule) and an algorithm that uses a special string automaton to compute the minimal cost parse for a given tree. There is not much information on how the trees and grammars are represented in the Twig system. However, from the code samples in the paper one can derive that the tree nodes are defined as described in the introduction of this chapter, although the number of children is, as in other systems, limited to two. A Twig tree node also contains extra fields for storing algorithmic data for the parsing algorithm. As in BEG, each node contains two arrays for storing subtree matches and their cost. There are two further fields that contain a state and a bit string. These are used to create a link between the nodes in the tree and the automaton that Twig uses for computing the match. So, in total there are four extra fields which are directly related to the matching algorithm used. Such a large collection of algorithm-specific fields in each tree node would not be a good idea for a toolkit like ForestFIRE, which targets an expandable collection of algorithms: it would pollute the tree nodes. Unfortunately, information about the structures used for the tree grammars was not available in [AGT89]. It only contains a discussion of the high level conversion of the grammar into an automaton. As already mentioned, Twig constructs automata to check whether a grammar accepts a certain tree. These automata are not standard tree automata, but generalized variants of the Aho-Corasick string automaton [AC75]. However, these string automata are very similar to tree automata. Studying these automata in Twig was not easy, because only example automata are given, without many details about the internal representation of this structure. The only information about the structure was that it contains a function which computes the next state of an automaton for a given input state and the symbol being read. Also, it was not possible to obtain a copy of the Twig system to examine it and retrieve more details about the data structures that Twig uses.

In summary, Twig's tree and node structures are comparable to the structures of BEG and BURG. However, Twig shows that storing algorithmic data inside the tree can create a large number of extra fields in each node and is therefore not a very elegant solution. One can be brief about the structures used for the grammars and automata: the Twig description [AGT89] contains almost no information about them.

2.8 Summary

The systems studied provide some different ideas for implementing the domain concepts that were needed in our toolkit. As seen, many implementations of trees were available, but these were not always well documented. Tree grammars and tree automata were present in far fewer toolkits. Some of the ideas in these data structures, although not always directly inspired by these systems, can also be seen in our toolkit. Other features are deliberately not present in our data structures. Here I summarize the most remarkable features that were encountered for the three domain concepts studied in the systems.

Trees

As already mentioned in the introduction of this chapter, trees were mostly implemented in the same way. Three aspects caught our attention: storage of ranks, rank restriction, and annotation possibilities. Remarkably, there was only one system (Treebag) that explicitly stores ranks in its trees; it stores these ranks in each node of a tree. In our toolkit the choice was made to store the ranks explicitly as well, but in a different way: an explicitly ranked alphabet is coupled to a tree, instead of a rank being stored in each node. The details of this implementation and the reasons for it can be found in Section 3.1.1. The second aspect is the limitation of the rank. Many of the systems studied restrict their nodes to having only two child nodes. This means that all symbols in such a tree are at most of rank two. This is enough for most instruction selection applications, but it also seems unnecessarily restrictive in many cases, because many algorithms used in these systems are also capable of handling symbols of rank larger than two. Our implementation does not have this limitation and is therefore more flexible than most of the systems studied here. The first two observations described aspects of the existing systems that cannot be found in our toolkit. The annotation field, by contrast, is present in our toolkit (see Section 3.1.1), because it is practical in many algorithms. The implementation of this annotation structure is most comparable to the implementation of ATerms. The other systems provide annotation fields that are strongly tied to the algorithms that use them; our implementation, like ATerms, focuses more on flexibility. This flexibility means that an arbitrary algorithm can use the annotation facility provided by the nodes.

Tree grammars

Tree grammars, like trees, are mostly implemented in a straightforward manner by the studied systems. They consist of a collection of production rules that have a symbol as lhs and a tree as rhs. Strangely enough, most systems do not wrap these grammars in classes. This is mostly due to the fact that they use a single grammar, which makes it less profitable to create such an additional structure. This situation does not apply to our toolkit, because the goal is to be capable of handling more than one tree grammar at once. The only non-standard feature encountered for tree grammars was the collection of references used by the iBURG toolkit. iBURG uses these collections to get fast access to symbols or rules with comparable symbols. This could also be used in our toolkit, but was not done due to the additional implementation costs.

Tree automata

Finally there are the implementations of tree automata. Of the domain concepts I was interested in, this unfortunately was the one I encountered the least in the systems studied in this chapter. There was only one system (BURG) that uses tree automata and also documented them in a usable manner. This system described a way of indexing transition tables using states. A similar technique can also be found in the DFRTAs described in Section 4.3.


3 The Toolkit and GUI

The primary goal of this research project was to design and build a new toolkit in the area of regular tree languages. This new toolkit, built with the knowledge gained by studying the existing toolkits, should facilitate experiments with algorithms related to the three domain problems mentioned in Section 1.2. The construction of this toolkit was shaped by two requirements:

• Contain an expandable collection of algorithms related to the domain problems (with focus on the tree grammar transformations, tree automaton construction algorithms and tree parsing algorithms, which were the subjects of the experiments)
• Provide a visual interface to use these algorithms in an easy way

What therefore was needed was a toolkit that implements all standard domain concepts (trees, tree grammars etc.), provides the possibility to implement algorithms that use these concepts and furthermore comes with a user interface to experiment with these algorithms. This resulted in a toolkit consisting of two parts. The first is a library, called ForestFIRE, containing domain specific data structures and a collection of algorithms targeted at the domain problems (with special attention for the three specific groups of algorithms described above). The second part is FIREWood, a graphical user interface that provides the possibility to experiment with these algorithms and with algorithms added in the future. This chapter provides the details of the design and construction of these two parts in two separate sections. The first section discusses the heart of the system: the ForestFIRE library. It describes which data structures and algorithms are implemented and how they relate to the concepts and algorithms that can be found in the domain. The second section provides details on the design of the graphical FIREWood application that should provide easy access to the algorithms inside the library. That section focuses on the major design decisions made during the construction of FIREWood. In a final section some details are given about the practical implementation process, including the programming language chosen, the platforms supported, et cetera.

3.1 ForestFIRE

This section describes how the domain concepts of Chapter 1 are modeled and implemented in the ForestFIRE library. These concepts are divided into four sections: trees, regular tree grammars, tree patterns and tree automata. Each of these sections discusses the concepts that are related to that topic. These concepts are translated to data structures/classes, where each concept can result in one or more classes. A tree, for instance, contains an alphabet, nodes etc. Each of the resulting classes is described in detail to clarify its role and design. Furthermore, the relations between these different classes are visualized by a UML class diagram. Finally there are references to Appendix C, which contains the precise interfaces of these classes. This appendix serves as a dictionary of all the data structures in ForestFIRE. Additionally, each section discusses invariants for these classes. These invariants describe which properties of these classes have to be maintained such that the data structures do not violate properties that are important for the corresponding concept (a tree may, for instance, only use symbols that can be found in its alphabet). The sections conclude with the algorithms that are related to the concept. For each of the concepts it is briefly described which algorithms are implemented. Additionally, Appendix C describes the classes and methods that implement these algorithms. The implementation of these algorithms is in many cases quite straightforward (e.g. counting chain rules); however, others are more complicated. The details of these more complicated algorithms can be found in [Cle07]. The only complicated algorithms discussed in detail in this report are the algorithms that were the subject of the experiments. The description of these algorithms can be found in Chapter 4. Many of the data structures and algorithms discussed in this section can also be found in the original assignment description and additional list of requirements for the toolkit (see Appendix A).

3.1.1 Trees

This section describes all data structures related to the tree concept, like nodes, symbols, alphabets etc. These data structures are modeled in such a way that they fit the descriptions given in Chapter 1 and [Cle07]. A tree is represented by a collection of nodes with links that form the tree structure. The tree contains a reference to its root node and leaf nodes (to support easy rf and fr traversal). Each of the nodes in the tree structure contains a symbol and an unlimited sequence of child nodes. Using an unlimited sequence of nodes means that we do not restrict the number of child nodes to two, as most of the existing toolkits do. The tree also contains an alphabet. This alphabet contains the set of (ranked) symbols that can be used in the tree structure. Each node in this structure refers to its corresponding symbol in the alphabet. These references strongly resemble the tree function in the formal definition (see Definition B.1.2). None of the existing toolkits stored the rank with the symbol. However, we have chosen to do this, because it is efficient (the rank is not stored in each node) and results in easy retrieval from each node. These symbols can be used to represent terminals, nonterminals and variables. They contain a name (the label), a type, and a rank in case of a terminal. The type is used to denote whether a symbol is a terminal, nonterminal or variable. Nonterminals and variables are represented by unranked symbols; the rank of these symbols is considered to be zero.

The choice to model a tree in this conventional way was made because this conventional shape is easier to implement and provides an elegant way to point to subtrees by just using a reference to a node. Figure 3.1 shows a UML class diagram of how the tree, node and symbol classes are related.


Figure 3.1: UML Class diagram for tree and node related classes.

Figure 3.1 shows that the node structure has even more fields than described above. One field that is important to highlight is the annotation field. Many algorithms that use tree-like structures compute information for a certain node, and sometimes require space to store this information. The annotation dictionary was created to facilitate this: it allows an algorithm to store this information under its own key(s). Additionally, a class was added to represent dotted trees. Dotted trees are used to point to a specific node within a tree. The representation of this dotted tree is similar to its formal 2-tuple definition, which combines a tree and a pointer to a node. In the formal definition this node pointer is a path from the root to that node. However, we define it as a direct pointer to that node, because this provides faster access. The details of all these data structures can be found in Section C.2, which describes all classes separately and provides a description of the individual fields of these classes.
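As a rough illustration of how such an annotation dictionary can be used, consider the following simplified Java fragment. The class and method names are made up for the example (the SketchTree class referenced here is defined in a companion fragment below); the exact ForestFIRE interfaces are listed in Appendix C.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Minimal sketch of a tree node with an annotation dictionary.
    class SketchNode {
        String symbol;                        // reference into the tree's alphabet
        SketchNode parent;                    // parent node (null for the root)
        SketchTree tree;                      // back-reference to the owning tree
        List<SketchNode> children = new ArrayList<SketchNode>();
        Map<Object, Object> annotation = new HashMap<Object, Object>();
    }

    class AnnotationDemo {
        // An algorithm stores per-node results under its own key, so the
        // node class needs no knowledge of the algorithms that use it.
        static final Object MATCH_KEY = new Object();

        static void recordMatch(SketchNode n, Object matchInfo) {
            n.annotation.put(MATCH_KEY, matchInfo);
        }

        static Object lookupMatch(SketchNode n) {
            return n.annotation.get(MATCH_KEY);
        }
    }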

There are also some important invariants for these classes, because it is possible to construct illegal trees with these data structures if one does not follow certain rules. These rules are captured in class invariants, which include, for example:

• Each node that points to a tree as its parent tree must also be part of this tree, and vice versa.
• A symbol used in a node of tree t must also be present in the alphabet of t.

• If a node n1 is in the list of child nodes of a node n2, then n1 must refer to n2 as its parent.

All invariants and details about these invariants can be found in Section C.2.2. The data structures themselves offer facilities to avoid violation of these invariants. There exists, for example, a special method for assigning a node as root node. This method makes sure that all pointers to the parent tree in the tree structure are set to the corresponding tree. However, not all of these invariants can be handled by the library itself; the user of the library has to follow certain rules when using the data structures if he wants to make use of these properties. These rules can be found in the source code documentation. The invariants become important when a user starts manipulating the data structures in another way or modifying the implementation of the data structures. This combination of classes and invariants enables us to represent all concepts related to trees in a correct way.
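A sketch of such a root-assignment facility, continuing the simplified classes from the fragment above, could look as follows; the real ForestFIRE method is listed in Appendix C and may differ in detail.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of a tree that preserves the invariants above: assigning a
    // root walks the whole structure once, pointing every node back to
    // this tree and (re)establishing parent and leaf references.
    class SketchTree {
        SketchNode root;
        List<SketchNode> leaves = new ArrayList<SketchNode>();

        void setRoot(SketchNode newRoot) {
            root = newRoot;
            leaves.clear();
            link(newRoot);
        }

        private void link(SketchNode n) {
            n.tree = this;                    // node points to its parent tree
            if (n.children.isEmpty()) {
                leaves.add(n);                // keep leaf references for fr traversal
            }
            for (SketchNode c : n.children) {
                c.parent = n;                 // child/parent invariant
                link(c);
            }
        }
    }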


3.1.2 Regular tree grammars

This section discusses the implementation of the data structures and algorithms related to regular tree grammars. We divide the discussion into a part on data structures and one on algorithms. Finally there is a third part that discusses the implementation of one algorithm in detail.

Data structures

The regular tree grammar definition in the toolkit is similar to its formal 5-tuple definition (see Definition B.2.1). This formal definition describes the following items:

• N, the nonterminal alphabet.
• T, the terminal alphabet.
• r, a ranking function for the terminals.
• P, the production rules or productions.
• S, the start symbol.

However, there are some differences. The regular tree grammar definition in the toolkit uses one combined alphabet for terminals and nonterminals, instead of two separate alphabets. This is done because the symbols as defined contain a type field which indicates whether a symbol is a terminal or nonterminal, so there is no direct reason for separating them. Another difference is the absence of an explicit ranking function. Ranks are stored within the symbol itself, as described in the previous section about trees, or considered to be zero in case of an unranked symbol (nonterminal or variable). The last two items of the formal definition are found in the toolkit implementation of a grammar: the RegularTreeGrammar class contains a start symbol and a set of production rules. These production rules are also shaped as in their formal definition. They contain a nonterminal lhs and a tree rhs, but there is also a small addition in the form of a cost field. This cost field is added because there exist parse algorithms that use a cost value to compute the minimum cost parse for trees. This field can then be used to store the cost of each production rule. A UML class diagram of these structures can be found in Figure 3.2.

Figure 3.2: UML Class diagram for grammar related classes.

Classes Tree and Symbol in Figure 3.2 were discussed in Section 3.1.1. As can be seen, there is also an additional DottedRule class. The role of this class is similar to the role of the DottedTree class: instead of storing a reference to a node inside a specific tree, it points to a node inside the rhs of a production rule. Dotted rules therefore contain a field that refers to a production rule and another that refers to a node. These dotted rules are not used in our experiments; however, they can play a role in future parsing algorithms.

There are two important invariants for tree grammars. The first states that the left hand side symbol of every production rule must be part of the alphabet of the grammar. The other states that the alphabet of every right hand side tree must be a subset of the alphabet of the grammar itself. The formal definition of these invariants can be found in Section C.3.2. Of course the general tree invariants discussed in Section 3.1.1 hold for the rhs as well.
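As an illustration only, these two invariants could be checked along the following lines (with simplified types; the precise formulation is in Section C.3.2):

    import java.util.Set;

    // Sketch of the two grammar invariants: (1) the lhs symbol is in the
    // grammar's alphabet; (2) the rhs tree's alphabet is a subset of it.
    class GrammarInvariants {
        static boolean holdsFor(Set<String> grammarAlphabet,
                                String lhs,
                                Set<String> rhsAlphabet) {
            return grammarAlphabet.contains(lhs)
                && grammarAlphabet.containsAll(rhsAlphabet);
        }
    }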

Algorithms

The toolkit implements a large collection of algorithms related to tree grammars. The role of these algorithms varies: some of them retrieve simple statistics, like the number of nodes in the right hand sides of the production rules, but others perform more complicated operations, like removing all chain rules or non-root terminal nodes in such a way that the language produced by the grammar is not changed. All these algorithms are separated from the data structures themselves, to avoid garnishing these data structures with additional methods. This results in a new collection of classes. The detailed interfaces of these classes can be found in Section C.3.3. The algorithms are divided into the following categories, each of which is discussed briefly in this section.

• Standard analysis
• Usability
• Grammar transformation
• Dotted rule and subtree retrieval

The first category provides algorithms that compute basic statistics of the grammar. This includes things like the number of nodes (mentioned earlier), but also the number of non-root terminal nodes. These algorithms only perform measurements and do not change the tree grammar.

The usability category also only measures statistics, but related to the reachability and productivity of rules and symbols. The classes provide interfaces to retrieve both unproductive/unreachable/useless and productive/reachable/useful items.

The third category focuses on algorithms that manipulate the grammars. These so called transformations can be used to remove non-root terminal nodes, chain rules or useless symbols and rules. This is realized by two new classes. The first class is targeted at removing chain rules or non-root terminal nodes, thereby converting the grammar into a grammar with the so called U− or Z− characteristic. The transformations related to usability are wrapped by the other class. The usability algorithms are easy to implement, because removing useless items from a grammar will not influence the language produced by the grammar. This is different for the transformations related to the U and Z characteristic. These transformations need to split rhs subtrees (to remove Z-nodes) or replace rules (to remove chain rules) in a special way (described in Section 4.2.1) to obtain the desired characteristic without changing the language produced by the grammar. Experiments using these transformations are discussed in Chapter 4.

The final category contains algorithms that retrieve all subtrees that are present in the production rules of a tree grammar, where these substructures are called items. These items comprise all unique subtrees in the rhs of the production rules and all unique lhs nonterminals. Figure 3.3 shows the item set of an example grammar. The items can be retrieved as subtrees (in this case the subtrees are cloned into new trees) or as dotted rules, where a dotted rule is created for each existing node in the production rules. The first technique is called subtree retrieval and the second one dotted tree retrieval. The difference between the two is that subtree retrieval creates one tree for every unique item (so only one d instead of two for the example in Figure 3.3), while dotted tree retrieval results in dotted trees for all existing items, even when their structure is the same (see the sketch after Figure 3.3). These collections of items become useful when constructing automata from tree grammars. The tree automata that were used in the experiments make use of subtree retrieval to construct automata. Details about these automata can be found in Section 4.3.

Figure 3.3: Items of a tree grammar
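The following fragment sketches one possible way to realize subtree retrieval, reusing the simplified node class from Section 3.1.1; it deduplicates structurally equal subtrees via their prefix notation. This is an illustration of the idea, not necessarily how ForestFIRE implements it (in particular, the cloning of subtrees into new trees is omitted).

    import java.util.Collection;
    import java.util.LinkedHashMap;
    import java.util.Map;

    class SubtreeRetrieval {
        // Collect every unique subtree below the given rhs roots, keyed by
        // its prefix notation so that structurally equal subtrees coincide.
        static Collection<SketchNode> uniqueSubtrees(Iterable<SketchNode> rhsRoots) {
            Map<String, SketchNode> items = new LinkedHashMap<String, SketchNode>();
            for (SketchNode root : rhsRoots) {
                collect(root, items);
            }
            return items.values();
        }

        private static void collect(SketchNode n, Map<String, SketchNode> items) {
            String key = prefix(n);
            if (!items.containsKey(key)) {
                items.put(key, n);            // keep the first occurrence only
            }
            for (SketchNode c : n.children) {
                collect(c, items);
            }
        }

        // Prefix notation of the subtree rooted at n, e.g. "a(b,d(d))".
        private static String prefix(SketchNode n) {
            if (n.children.isEmpty()) {
                return n.symbol;
            }
            StringBuilder sb = new StringBuilder(n.symbol).append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(prefix(n.children.get(i)));
            }
            return sb.append(')').toString();
        }
    }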

Appendix C shows the related classes and interfaces of all these algorithms. Some of these algorithms are not very complex, like the retrieval of statistics and item sets; the others are more complex. Information about the usability transformations can be found in [Cle07, Section 3.3.3]. The grammar transformation algorithms that remove chain rules and non-root terminal nodes are discussed in detail in Section 4.2, because they were used in experiments. However, to get an impression of the implementation of such an algorithm, the next section highlights the implementation of the transformation algorithm that removes chain rules.

Example algorithm implementation

The previous section gave a brief overview of the algorithms implemented by ForestFIRE in the area of tree grammars. This section will give an impression of how such an algorithm, the RED-U transformation algorithm, is implemented. Many of the algorithms in ForestFIRE, like the RED-U transformation algorithm, originate from Cleophas' PhD thesis, where they are described using a more formal approach. These formal definitions were translated to the Java implementations (Java being the programming language used for ForestFIRE) that can be found in the toolkit. Many of these implementations bear a close resemblance to their definition. This section compares such a definition to its implementation to show this resemblance. As described, the algorithm chosen to show this resemblance is the RED-U transformation step, which removes a single chain rule (also called unit production) from a grammar. Definition 3.1.1 shows the formal definition of this transformation step.


Definition 3.1.1 (red-u, removing a single unit production). Let G = (N, Σ, r, Prods, S) be an rtg with characteristic value U+. Then there must be a production A → B ∈ Prods with A, B ∈ N. Then G′ = (N, Σ, r, Prods′, S), where

    Prods′ = Prods \ {A → B} ∪ ⟨Set C, γ : B ⇒* C ∧ C → γ ∈ Prods ∧ γ ∉ N : A → γ⟩

is the resulting transformed grammar.

This definition describes that the chain rule A → B is removed and that new rules are added to replace it. These new rules A → γ are created for each rhs γ of a rule C → γ where C is reachable from B via chain rules. The new rules make sure that A can still produce the trees that were producible via the chain rule A → B (an example of this can be seen in Section 4.2.2). Let us now look at the Java implementation in Listing 3.1, which contains the method RemoveChainRule (also listed in Section C.3.3.3). The Java code of this algorithm is, like the definition, divided into two parts. The first part (line 4) removes the chain rule that is provided to the method from the grammar. The remaining lines handle the creation of the rules that replace this chain rule. Lines 9–13 compute, using Warshall's algorithm, all nonterminals C that can be reached from B by using chain rules. Warshall's algorithm computes the transitive closure of the chain rules; this means that B itself is not returned. Line 13 was therefore added to add B itself to the set of nonterminals. The remaining issue is finding all rules C → γ where γ is not a nonterminal. This can be seen in the final lines of code, which check for each rule whether it has a closure nonterminal C as lhs. If this is the case and the rhs is not a nonterminal either, then a new rule is created using A as lhs and γ as rhs. This new rule is added to a temporary collection of rules, and these rules are added to the grammar at the end. This has to be done afterwards, because otherwise it would interfere with the 'foreach' loop.

1  void RemoveChainRule(RegularTreeGrammar grammar, GrammarProductionRule rule)
2  {
3      //Remove chain rule from the set of production rules
4      grammar.productionRules().remove(rule);
5
6      //Compute nonterminal closure of chain rules
7      //(using Warshall) and retrieve nonterminals
8      //reachable from rule.rhs().getRoot().symbol()
9      Warshall closure = new Warshall(grammar);
10     ArrayList<Symbol> closureSymbols =
11         closure.getClosureSymbols(rule.rhs().getRoot().symbol());
12     //add RHS to obtain reflexive closure
13     closureSymbols.add(rule.rhs().getRoot().symbol());
14
15     //Construct new rules that replace the chain rule
16     ArrayList<GrammarProductionRule> newRules =
17         new ArrayList<GrammarProductionRule>();


18
19     for(GrammarProductionRule r : grammar.productionRules())
20     {
21         //Check if LHS is in the symbol closure
22         if (closureSymbols.contains(r.getLhs()))
23
24             //Check if the rule is not a chain rule
25             if (r.rhs().getRoot().symbol().symbolType()
26                     != SymbolType.NonTerminal) {
27
28                 //Make a new rule based on the RHS of r
29                 GrammarProductionRule newRule = new GrammarProductionRule(
30                     rule.getLhs(), r.rhs().clone());
31
32                 newRules.add(newRule);
33             }
34     }
35
36     for(GrammarProductionRule r : newRules)
37     {
38         grammar.productionRules().add(r);
39     }
40 }
Listing 3.1: Java implementation of RED-U transformation step

This shows how closely related the formal definition and the implementation of an algorithm are. The same holds for the other algorithms, including those related to the other domain concepts.

3.1.3 Tree patterns

Chapter 1 shows that a tree pattern is in essence a normal tree. A tree pattern is therefore implemented as a tree, with the exception that it contains not only terminals, but also variables. However, to represent patterns one can still use the normal tree classes, because the symbols used in nodes can represent terminals, nonterminals and variables. The only additional data structure needed in the area of tree patterns is a collection to wrap multiple patterns into a pattern set, because pattern sets are an important part of the tree pattern matching problem. Such a set contains a group of patterns and an alphabet that contains all symbols that may be used inside the patterns. These pattern sets can for instance be used to create automata that accept only patterns from this collection. These automata can then be used to solve the tree pattern matching problem. Figure 3.4 shows the existing tree class and the pattern collection class and how they relate.

Figure 3.4: UML Class diagram for pattern related classes.
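A pattern set can thus be as simple as the following sketch, pairing a collection of trees with a shared alphabet. The names are simplified and the subset check anticipates the invariant discussed next; see Section C.4 for the actual class.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Sketch of a pattern set: the patterns are ordinary trees whose
    // alphabets must be subsets of the set's own alphabet.
    class SketchPatternSet {
        Set<String> alphabet = new HashSet<String>();
        List<SketchTree> patterns = new ArrayList<SketchTree>();

        void addPattern(SketchTree pattern, Set<String> patternAlphabet) {
            if (!alphabet.containsAll(patternAlphabet)) {
                throw new IllegalArgumentException(
                    "pattern uses symbols outside the set's alphabet");
            }
            patterns.add(pattern);
        }
    }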

32 Chapter 3. The Toolkit and GUI 3.1. ForestFIRE

All the invariants of the standard tree also apply in the tree pattern situation. The only extra invariant needed is that the alphabets of the trees inside a pattern set are subsets of the alphabet of the pattern set. This ensures that the patterns do not contain symbols that are not present in the alphabet of the set (for details see Section C.4.2).

The only two algorithms implemented for tree patterns determine the collection of pattern subtrees and of dotted pattern trees, respectively (similar to the subtree and dotted rule retrieval for tree grammars). The first retrieves subtrees as normal trees, which results in a collection of cloned trees in which no duplicate tree structures can be found. The other focuses on retrieval of subtrees as dotted trees. In that case a dotted tree is created for each node in the patterns. This means that there exist dotted trees that point to a similar tree structure if the pattern set contains multiple occurrences of a subtree.

3.1.4 Tree automata

This section discusses how concepts related to tree automata are implemented in the toolkit. First, the data structures are discussed that represent concepts like tree automata and tree automaton states. These data structures are accompanied by a collection of invariants, just as in the previous sections. Finally, all algorithms related to automata are discussed. These algorithms focus on the construction and usage of the automata.

Data structures

This section presents the classes needed for representing the different types of tree automata discussed in this report:

• Nondeterministic Root-to-Frontier Tree Automata (nrfta)
• Deterministic Root-to-Frontier Tree Automata (drfta)
• Nondeterministic Frontier-to-Root Tree Automata (nfrta)
• Deterministic Frontier-to-Root Tree Automata (dfrta)

These automata were implemented with the help of two abstract classes (see Figure 3.5). One abstract class represents the general tree automaton and a specialized abstract dfrta class supports different implementations of dfrtas (the goal of this becomes clear in Chapter 4). The nondeterministic automata (with or without ε-transitions) inherit from the general abstract tree automaton class, while dfrtas inherit from the Abstract dfrta class, which in its turn inherits from the general abstract automaton class. drftas are not implemented, because they have less acceptance power than the other three types, as mentioned in Chapter 1. However, they could be implemented in a similar way to the dfrtas. The general tree automaton class describes a standard tree automaton as the 5-tuple in the formal definition (see Definition B.3.1):

• Q, the state set.
• (V, r), a ranked alphabet.
• R, the set of transition relations.


• Qra, the root accepting states.

• Qla, the leaf accepting states.

The state set, the root accepting states and the ranked alphabet can be found in the AbstractTreeAutomaton class as defined in Section C.5.1.1. However, this is different for the other items. The set of transition relations is not defined by this class. The reason for this is that the optimal shape of these transition relations is different for RF and FR automata. These transitions are also not made accessible to the user in raw form, because we want to hide the implementation from the user and give the user access to the transition relations via a NextState-method. The same is done for adding transitions: a method is defined that allows the user to add a transition for a symbol, relating a state and a vector of states. How these transitions are stored is hidden from the user, because there are many possible ways to do this; it depends on the direction of the automaton, as mentioned before, but also on the type of construction used. The toolkit contains a collection of such implementations. The details of these implementations are discussed in Section 4.3.2, because they are closely related to the different automaton constructions used in the experiments. The other exception is the absence of leaf accepting states. These states can be distilled from the transition relations, because they have incoming transitions for symbols of rank 0 (as described in [Cle07, Remark 3.6.2]). The retrieval of the leaf accepting states is in practice implemented by defining a transition relation for each symbol of rank 0 from the empty tuple '()' to the corresponding leaf accepting state. Calling the NextState-method with the empty tuple for each such symbol will return these states.
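The external interface that results from these choices can be sketched roughly as follows. The names are simplified variants of those in Section C.5, and how the transitions are stored behind nextState is deliberately left to concrete subclasses.

    import java.util.Collections;
    import java.util.List;
    import java.util.Set;

    // Sketch of the abstract tree automaton interface: transitions are
    // added and queried per symbol with a vector of states, hiding their
    // internal representation.
    abstract class SketchTreeAutomaton<State> {
        abstract Set<State> states();
        abstract Set<State> rootAcceptingStates();

        // relate, for 'symbol', a vector of states to a target state
        abstract void addTransition(String symbol, List<State> stateVector,
                                    State target);

        // deterministic variant: the unique next state, or null if none
        abstract State nextState(String symbol, List<State> stateVector);
    }

    class LeafStates {
        // Leaf accepting states need not be stored separately: asking for
        // the next state of a rank-0 symbol with the empty tuple () yields
        // exactly the corresponding leaf accepting state.
        static <S> S leafStateFor(SketchTreeAutomaton<S> a, String rank0Symbol) {
            List<S> emptyTuple = Collections.emptyList();
            return a.nextState(rank0Symbol, emptyTuple);
        }
    }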

Figure 3.5 shows how all the tree automata classes relate to each other. Only a single implementation of the Abstract dfrta class is shown; in practice there are five such implementations. These implementations all use different optimization techniques for the transition relations, but they do not change the external behavior of the automata. They are therefore not described here, but in Appendix C. Details about these optimizations can be found in Section 4.3.

Figure 3.5: UML Class diagram for automaton related classes.


Figure 3.5 also shows an AbstractAutomatonState class with three descendants. These classes are used to represent the states of an automaton. Each state refers to a single item (for nondeterministic automata) or a set of items (for deterministic automata). Such an item or item set indicates which subtrees are matched when that state is reached, and therefore indicates derivability (when the automaton is constructed from a grammar) or pattern matches. The shape of these item sets depends on how the items are represented. As discussed in the sections about tree grammars and tree patterns, it is possible to store subtrees as subtrees, but also as dotted trees or dotted rules, depending on the source (grammar or pattern). The DottedTreeAutomatonState is used by automata constructed from dotted trees (extracted from tree patterns), while the DottedRuleAutomatonState is used by automata constructed from dotted rules (obtained from a tree grammar). The SubtreeAutomatonState is used by automata constructed from subtrees, extracted from either tree grammars or tree patterns. Our tree automaton implementations focus on automaton states based on subtrees and therefore use the SubtreeAutomatonState class.

There are also invariants for these data structures. The first one states that the set of root accepting states is a subset of the state set of the automaton. The second invariant states that the states returned by the NextState-method are present in the state set. These invariants are described in more detail in Section C.5.2.

Algorithms

The ForestFIRE library implements a large collection of algorithms related to tree automata. Most of these algorithms are targeted at the construction of tree automata (from tree grammars) and their usage in tree acceptance and parsing. The implemented construction algorithms can be used to construct (ε)nfrtas, (ε)nrftas and dfrtas from regular tree grammars. This is realized by only two algorithms: one for constructing nondeterministic automata and another for constructing the dfrtas. The algorithm for nondeterministic automata can produce (ε)nfrtas and (ε)nrftas, depending on the parameters used. The classes that implement these two algorithms and the methods that can be used to construct these automata can be found in Section C.5.3. These algorithms were also used in experiments and are therefore discussed in detail in Section 4.3. The construction algorithms in turn make use of another group of algorithms called item set providers. These algorithms play an important role in the construction of both nondeterministic and deterministic automata (see also Section 4.3), because they are involved in the state creation process. These item set providers retrieve a special set of subtrees from the grammar for which the automaton is constructed. The states of the automaton are then constructed based on these subtrees. Another group of algorithms uses these automata to solve the tree acceptance problem: the so called acceptance algorithms. The toolkit implements three acceptance algorithms, one based on nfrtas, another for nrftas and a final one for dfrtas. These different algorithms are needed because the shape of an acceptance algorithm depends on the properties of the automaton used. An FR automaton processes trees in a bottom-up way, while an nrfta works in the other direction. Acceptance algorithms for deterministic automata can, for instance, be optimized compared to those for nondeterministic ones, because they can exploit the determinism of the automaton.
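For a dfrta, for example, acceptance amounts to a single bottom-up pass over the tree; a sketch under the simplified interface above (again with invented names) could read:

    import java.util.ArrayList;
    import java.util.List;

    class FrAcceptance {
        // Bottom-up acceptance with a deterministic FR automaton: compute
        // the state of every node from the states of its children, then
        // check whether the root's state is root accepting.
        static <S> boolean accepts(SketchTreeAutomaton<S> a, SketchNode root) {
            S rootState = stateOf(a, root);
            return rootState != null && a.rootAcceptingStates().contains(rootState);
        }

        private static <S> S stateOf(SketchTreeAutomaton<S> a, SketchNode n) {
            List<S> childStates = new ArrayList<S>();
            for (SketchNode c : n.children) {
                S s = stateOf(a, c);
                if (s == null) {
                    return null;              // no transition: the tree is rejected
                }
                childStates.add(s);
            }
            // leaves reach this call with the empty tuple, matching the
            // rank-0 convention described earlier
            return a.nextState(n.symbol, childStates);
        }
    }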


Finally, a parsing algorithm was implemented that solves the parsing problem using a dfrta. This algorithm stores the parse information (which rule needs to be applied at which node) in the annotation fields of the nodes, such that one can find out which grammar rules need to be applied to derive the complete tree. A set of experiments was also performed with this parsing algorithm in combination with different dfrtas. These experiments are discussed in Section 4.4. Details of the classes that implement all these algorithms can be found in Section C.5.3.

3.2 FIREWood

FIREWood is the application that provides a graphical user interface on top of the ForestFIRE library and supports easy access to tasks that make use of the different data structures and algorithms from the library. This includes, e.g., visualizations of trees in prefix notation, but also access to tree acceptance and parsing algorithms. In this project there was a special interest in tasks related to tree acceptance and parsing. However, there was a broader list of requirements for the FIREWood application:

• Support input/output of domain concepts (trees, tree grammars etc.) from/to files
• Provide an environment for experiments with rtg transformations, ta constructions and tree parsing algorithms
• Support the possibility to add new visualizations for new and existing concepts and related algorithms

This section discusses how these requirements were translated into an architecture and provides an overview of the main building blocks of the FIREWood application. The other part of this section contains a discussion of the resulting application and describes how the application can be used in practice to support the experiments in this domain. Details concerning the implementation (like the programming language and graphical libraries used) can be found in Section 3.3.

3.2.1 Architecture

The architecture of the FIREWood application describes how the requirements were translated into a user interface application and how the application functions on a high level. The application should allow a user to load a collection of domain concepts like trees, tree grammars etc. Furthermore, the user interface needed to provide a set of replaceable views that visualize the loaded concepts. These views provide information about a concept's structure or enable the user to perform operations on it. A tree grammar, for instance, is one of the possible loaded concepts; if such a grammar is selected, the views make it possible to inspect its rules and remove chain rules. This led to three practical tasks that had to be implemented:

1. Reading and writing tree concepts from/to a file
2. Presenting loaded concepts to the user
3. Providing an (expandable) collection of views depending on the type of concept

36 Chapter 3. The Toolkit and GUI 3.2. FIREWood

Before discussing these tasks in detail, we describe the basic concepts that the user can define in files and load into the FIREWood application: alphabet, tree, tree grammar, tree pattern and tree pattern collection. All these concepts play a role in the three domain problems (tree acceptance, tree parsing and tree pattern matching). The alphabet can mostly be found as part of the other four concepts; however, it was added separately because the other concepts are likely to use the same alphabet if they are related to each other. This way a single alphabet definition can be used in the definitions of multiple concepts. The three tasks result in a data access component, a multiple data view component, and a single data view component, which are detailed below.

Data access component

The data access component (dac) is responsible for providing access to concepts stored in the FIREWood file format (for its definition see Appendix D). The component can read files in this format, convert them into the corresponding ForestFIRE data structures and write them back to a file. This component therefore provides access to the user defined trees, tree grammars etc. All the other components access the data structures specified by the user through this component. The dac consists of three groups of data structures (see Figure 3.6). Firstly, there are the file readers and writers. These readers convert a file into a so called main data container. This container holds all data structures read, in smaller containers called single data containers. These single data containers contain the ForestFIRE data structure(s) that correspond to a single concept. All these data structures are wrapped by a container to control access to them. Adding a new concept to a loaded collection of concepts can, for instance, have consequences for the collection of alphabets used, because alphabets can be shared. The containers make sure that such internal dependencies are handled appropriately.

Figure 3.6: Data access component that translates files into containers

The data access component is not restricted to the five concepts described above. The main data container is defined as a collection of abstract single data containers, and an implementation of the abstract container is made for each concept that can be defined in the input files. New concepts can therefore be supported by creating a new implementation of the abstract single data container and extending the reader and writer classes with additional code to handle the new type of concept.
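Extending the dac for a new concept then comes down to something like the following skeleton. The class names are hypothetical stand-ins for the actual FIREWood classes.

    // Skeleton for supporting a new concept in the data access component.
    // AbstractSingleDataContainer stands in for the abstract container
    // class described above; MyConcept is a placeholder for the new
    // ForestFIRE data structure.
    abstract class AbstractSingleDataContainer<T> {
        abstract String name();    // name under which the concept is listed
        abstract T content();      // the wrapped ForestFIRE data structure(s)
    }

    class MyConcept { /* the new data structure */ }

    // 1. Wrap the new concept in its own container...
    class MyConceptContainer extends AbstractSingleDataContainer<MyConcept> {
        private final String name;
        private final MyConcept concept;

        MyConceptContainer(String name, MyConcept concept) {
            this.name = name;
            this.concept = concept;
        }

        String name() { return name; }
        MyConcept content() { return concept; }
    }

    // 2. ...and extend the reader and writer classes with code that
    //    recognizes the new concept in the FIREWood file format and
    //    produces/consumes such containers (omitted here).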

Multiple data view

The multiple data view component (mdv) makes the ForestFIRE data structures loaded by the dac accessible to the user. How this access is realized is not explicitly defined by this component. What the component does define is a way to communicate with the dac. This is realized by an abstract class that defines a number of operations related to the complete collection of concepts. The role of these operations is to react to changes in the main data container and to signal the application when a user selects a concept in the mdv. An implementation that inherits from this abstract class can then implement the visualization of these concepts in the desired way (see Figure 3.7(a)). The idea behind this abstract class is that one can replace the visualization easily. The FIREWood application also provides an implementation of a concrete mdv in the form of a tree view (see Figure 3.7(b)). This tree view contains all loaded concepts, grouped by the different types of concepts. The user can inspect these concepts by clicking on them. The single data view component is then triggered to provide a visualization of the selected concept. This communication with the single data view component is provided by the abstract mdv class.

Figure 3.7: Multiple Data View. (a) mdv visualizes the content of the main container; (b) mdv implemented as tree control.

Single data view

The goal of the single data view component (sdv) is to visualize a single concept and provide access to algorithms and statistics related to that concept. This single data view also has to support new visualizations when additional algorithms are added to the ForestFIRE library. The design philosophy behind this component is similar to that behind the multiple data view component. The component consists of an abstract class and possible implementations of this abstract class. The abstract class provides an interface to respond to visualization requests generated by the mdv. The abstract class describes such a visualization only as an environment with a collection of subviews (mostly related to the different algorithms etc.). It depends on the implementation of the abstract sdv which subviews there are for a type of concept and how these are implemented. The concrete sdv presents these subviews to the user when the mdv signals that the user has selected a concept. Figure 3.8(a) shows how the sdv, mdv and dac are related to each other. As for the multiple data view component, an example implementation of the single data view was included, in the form of a tab control, where the subviews are the tab pages (see Figure 3.8(b)). When a concept is selected in the mdv (the tree view), a collection of tabs is opened in the tab control, depending on the type of the requested concept. Our standard implementation provides standard tabs for each type of concept. A creator of new algorithms for a specific type of concept can then define new tab pages to access his algorithms. These pages are automatically shown to the user if he couples these tab pages to the corresponding concept type. It is, however, also possible to use a completely different kind of control (or set of controls) for an implementation of the abstract single data view class.

Figure 3.8: Single Data View. (a) sdv visualizing a single data container on request; (b) sdv implemented as tab control.

The current toolkit contains a large collection of these tab pages, especially for the tree grammar concept. This concept was important because the toolkit was used as a platform for experiments with tree grammar transformations and tree automaton constructions for tree grammars. The current version of FIREWood therefore contains tab pages that provide access to such algorithms. Examples of this can be seen in the next section, which contains a brief tour through the interface of FIREWood with the tree view mdv and tab control sdv. The coupling of tab pages to concept types, mentioned above, can be pictured as a simple registry, as sketched below.
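This sketch is an illustration of the idea only, not FIREWood's actual API; the interface and class names are invented.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustration: tab pages register themselves per concept type, and
    // the sdv opens all pages registered for the type the user selected.
    interface TabPageFactory {
        String title();
        Object createPage(Object concept);  // would build the actual widget
    }

    class TabRegistry {
        private final Map<Class<?>, List<TabPageFactory>> byType =
            new HashMap<Class<?>, List<TabPageFactory>>();

        void register(Class<?> conceptType, TabPageFactory factory) {
            List<TabPageFactory> pages = byType.get(conceptType);
            if (pages == null) {
                pages = new ArrayList<TabPageFactory>();
                byType.put(conceptType, pages);
            }
            pages.add(factory);
        }

        List<TabPageFactory> pagesFor(Object concept) {
            List<TabPageFactory> pages = byType.get(concept.getClass());
            return pages != null ? pages : new ArrayList<TabPageFactory>();
        }
    }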

3.2.2 Resulting user interface

The architecture of FIREWood described three main components: the dac, mdv and sdv. All three components can be found in the user interface. The design already described that the mdv and sdv appear as a tree view and a tab control, respectively. Let us start with an overview of the main window of the application to clarify how these three components are visually present:


Figure 3.9: FIREWood main form

When the application is started, one can open a file through the menu bar at the top. The desired file can then be selected through a file dialog. When the file is selected, the dac is triggered to parse it. After this, the content of the file is visualized by the tree view on the left if the file contains no errors (if this is not the case, then the loading is canceled and the errors are shown to the user). Clicking on one of the loaded concepts opens a collection of tab pages, where each of the tab pages has a different purpose. This can, for instance, be visualizing simple information, such as the properties of the concept in Figure 3.9, but also providing access to the complicated parse algorithm. Let us illustrate the possibilities of FIREWood by showing how the application was used in the experiments. Each of the two experiments discussed a collection of algorithms. These algorithms can be found as tab pages in the tab control. Consider, for instance, the grammar transformations. These transformations remove the chain rules (u-rules) and non-root terminal nodes (z-nodes) that are discussed in Section 1.1.3. The tab page collection for tree grammars was therefore expanded with pages for analyzing and removing u-rules and z-nodes. Figure 3.10 shows the z/u analysis page for the grammar described in Example 1.1.3. This tab page contains a tree where u/z-items are marked with colored squares that indicate whether a rule/node is a chain rule or z-node. This tab page can therefore be used to study the grammar for the existence of such special rules and nodes.


Figure 3.10: z/u analysis page

The row of tab pages in Figure 3.10 also contains three transformation tab pages (only two are visible in the figure). Two of these tab pages are devoted to the removal of u-rules and z-nodes, while the third is targeted at the removal of unproductive and unreachable symbols and rules. As an example, we look at the tab page that can be used to remove non-root terminal nodes by applying the red-z reduction (this reduction is discussed in Section 4.2).

Figure 3.11: z removal page


The red-z reduction page (see Figure 3.11) contains a tree view, just like the analysis page. However, this tree can now be used to mark non-root terminal nodes that should be removed. Additionally, the user can select which red-z reduction style is used. After clicking the preview button, the two text boxes on the right show which rules need to be removed from and added to the grammar to remove these z-nodes. These two boxes can be used to get some insight into the effects of the transformation (number of new rules, number of new nonterminals etc.). This kind of information was used in the chapter about the experiments to learn more about the characteristics of this red-z reduction. Finally, one can use the apply button to remove/add the specified rules and store the new grammar as a new concept. This new grammar can then be inspected in more detail by selecting it in the mdv tree view on the left. The updated collection of concepts can afterwards be stored in the original file or a new file.

This small example shows what these tab pages look like and how the user interface can contribute to the research process. Similar tab pages were created for the other transformations and the automaton construction algorithms. The experiments that were performed with the help of the toolkit are discussed in Chapter 4.

3.3 Implementation details

The implementation of the toolkit had one important technical requirement: the toolkit & GUI should be usable on both the Microsoft Windows XP operating system and Apple Mac OS X. This meant that a cross platform programming language was needed, in combination with a cross platform graphics library. Research was therefore conducted to find combinations of programming languages and libraries that fulfil these requirements. This resulted in three candidates:

• Lazarus in combination with the Lazarus Component Library (LCL)
• Java in combination with the Standard Widget Toolkit (SWT)
• C++ in combination with a cross platform library (e.g. WxWidgets)

Lazarus [laz] was the language selected from these three candidates. The reason for this was its resemblance to the teaching language used at the department of mathematics and computer science at the TU/e. Lazarus is an open source and cross platform implementation of Delphi and was therefore a very promising candidate. However, a number of problems occurred with the LCL library after the first weeks of implementation. The LCL tab control was burdened with a number of bugs, and the LCL library was also not as cross platform as hoped, because controls behaved differently on the different platforms. After a month of programming, the choice was made to abandon Lazarus/LCL and switch to another candidate from the list. The choice was made to continue with Java and SWT, because Java is considered a programming language more suitable for rapid development than C++ and it had the advantage of 'compile once, run anywhere'. After this decision the implementation continued and the already implemented part was translated to Java. Fortunately, this rewriting took only two weeks. The choice of switching to another programming language and library turned out to be a good one, because Java and SWT gave rise to significantly fewer problems than Lazarus with LCL. The complete library and GUI application were implemented in Java version 1.5. In this implementation we mapped ForestFIRE and FIREWood to two Java packages, where each of the packages was divided into a collection of subpackages. This resulted in the following main hierarchy:

• ForestFIRE
  – Trees
  – Grammars
  – Patterns
  – Automata
• FIREWood
  – Data Access Component
  – Multiple Data View
  – Single Data View

The ForestFIRE packages follow the structure of the domain concepts that are described in Section 3.1. Each of the subpackages contains all the classes related to that concept (the trees package, for instance, contains the tree and node classes). These subpackages can also contain child packages that consist of classes implementing algorithms related to that concept. Table 3.1 shows the main packages listed above and the algorithmic subpackages with their corresponding Java names. This table shows, for instance, the trees subpackage and the trees.algorithms subpackage. The FIREWood package is divided into several subpackages, of which three directly correspond to the main components of the FIREWood application; the mdv and sdv subpackages contain both the presented abstract classes and the example implementations.

Table 3.1 provides, next to the names of the (sub)packages, an impression of their size. The toolkit and GUI together contain around 10,000 lines of code and 100 classes. The table lists the number of classes and lines of code for each subpackage in ForestFIRE and FIREWood. The table also contains some subpackages, like base, controls and extensions, that were not mentioned earlier. These packages contain classes that support the main ForestFIRE and FIREWood subpackages. The base subpackage, for example, contains the special collection types that are used in the toolkit (see Section C.1).


Package                  Number of classes   Lines of code
forestfire               57                  5785
  base                   8                   523
  trees                  6                   313
  trees.algorithms       1                   61
  grammars               3                   159
  grammars.algorithms    4                   562
  patterns               5                   103
  patterns.algorithms    3                   88
  automata               10                  2797
  automata.interfaces    3                   101
  automata.acceptance    3                   126
  automata.construction  6                   554
  automata.parsing       4                   345
  extensions             1                   53
firewood                 41                  5634
  controls               4                   192
  dac                    13                  864
  mdv                    1                   173
  sdv                    1                   155
  sdv.tabs               20                  3724
  extensions             1                   29

Table 3.1: Code statistics of toolkit and GUI

The ForestFIRE package (and its subpackages) can be used independently of the FIREWood package, so it is for instance possible to use it in other Java applications. The FIREWood application was also combined with the ForestFIRE library in a single JAR archive. This executable archive was used throughout this project to perform the needed experiments. All the packages and classes also contain JavaDoc documentation. This documentation simplifies the extension of both the toolkit and the GUI. We thereby realized the goal of building a toolkit and GUI that not only helped in performing the experiments, but that can also easily be extended to support new tree algorithms.

4 Experiments

This chapter discusses the experiments that were carried out with the ForestFIRE/FIREWood toolkit. These experiments focused on three subjects:

• Tree grammar transformations
• Tree automaton constructions
• Tree parsing

The tree grammar transformation experiments measure the influence of the grammar transformations that remove the chain rules and non-root terminal nodes. The grammar transformations were a subject of the experiments because certain tree automaton constructions focus on grammars with the Z− and/or U− characteristic. The transformations realize these characteristics by removing existing rules from the grammar and adding new rules and nonterminals to it. It was therefore interesting to discover how the Z− and U− characteristics can be obtained while introducing as few new rules and nonterminals as possible. The second set of experiments measures the characteristics of tree automaton constructions that are used to construct tree automata from tree grammars. The characteristics measured are running time and the size of the resulting automata. These experiments were carried out because tree automata constructed from tree grammars are very useful for solving the tree acceptance and tree parsing problems. The results of these experiments were therefore used to determine which construction algorithm is the most promising for building these automata. The final collection of experiments focuses on an algorithm that uses deterministic tree automata to solve the tree parsing problem. The goal of these experiments was to find out which type of deterministic automaton discussed in the automaton construction experiments is most efficient for solving the parsing problem. Before discussing these experiments it is important to provide some background on the tree grammars that are used. Section 4.1 introduces these grammars. The remaining two sections, 4.2 and 4.3, discuss the two categories of experiments.

4.1 Used tree grammars

The chosen grammars came from four sources: the draft PhD thesis of Loek Cleophas [Cle07], the iBurg software [iBU], a report by Huub ten Eikelder [tE89] and the Burg-files of the Mono project [mon]. These grammars were either grammars related to instruction selection or small grammars that were easy to inspect during the experiments. In the next sections we discuss each of these grammars in more detail. The details presented include basic statistics like the number of nodes, rules, terminals and nonterminals, but also more advanced statistics like the number of non-root terminal nodes and chain rules.


4.1.1 Thesis grammar

The first grammar was taken from the draft PhD thesis of Loek Cleophas. This grammar, ’Example 6.0.3’, has the following characteristics:

example603.ini
Number of rules                     6
Total number of nodes               12
Number of nonterminals              2
Number of terminals                 4
Number of non-root terminal nodes   3
Number of chain rules               1

4.1.2 iBurg standard grammars

Two grammars were taken from the collection of iBurg-files that is bundled with the iBurg software [iBU]. These grammars, ’Sample 4’ and ’Sample 5’, have the following characteristics:

sample4.ini
Number of rules                     12
Total number of nodes               20
Number of nonterminals              5
Number of terminals                 7
Number of non-root terminal nodes   1
Number of chain rules               4

sample5.ini
Number of rules                     10
Total number of nodes               23
Number of nonterminals              4
Number of terminals                 5
Number of non-root terminal nodes   3
Number of chain rules               2

These two grammars are related to instruction selection in compilers, where intermediate representation trees are translated to instructions. The rules in, for instance, ’Sample 5’ are rules that can be expected in a grammar used for instruction selection:

reg → con
loc → reg
reg → PLUS(reg, reg)
reg → PLUS(MEM(loc), reg)

Assume reg stands for register, con for constant and loc for memory location; then it can easily be seen that these rules are common instruction selection rules. These two grammars

are small, but representative of how the transformations perform on production rules that are likely to be found in the area of instruction selection.

4.1.3 Report of ten Eikelder

The report of ten Eikelder [tE89] describes the implementation of the bottom-up acceptor that is discussed by C. Hemerik and J. P. Katoen in [HK89]. This bottom-up acceptor is based on the use of dfrtas. The report by ten Eikelder contained a grammar targeted at the instruction set of the 68000 architecture. This grammar was selected because one of the constructions considered in this report produces the same dfrtas discussed by ten Eikelder. The characteristics of this grammar are:

68000.ini
Number of rules                     33
Total number of nodes               104
Number of nonterminals              3
Number of terminals                 12
Number of non-root terminal nodes   30
Number of chain rules               1

4.1.4 Mono project grammars

Finally, three larger grammars were added, originating from the Mono Project [mon]. The Mono Project provides an open source implementation of the .NET Framework. These Mono grammars are used for translating parse trees that consist of machine independent commands into machine dependent code. Grammars for the following architectures were used: X86, IA64 and Sparc. The characteristics of these grammars are:

mono-x86.ini
Number of rules                     505
Total number of nodes               1412
Number of nonterminals              8
Number of terminals                 269
Number of non-root terminal nodes   371
Number of chain rules               1

mono-ia64.ini
Number of rules                     432
Total number of nodes               1064
Number of nonterminals              8
Number of terminals                 262
Number of non-root terminal nodes   221
Number of chain rules               1


mono-sparc.ini
Number of rules                     484
Total number of nodes               1263
Number of nonterminals              8
Number of terminals                 272
Number of non-root terminal nodes   288
Number of chain rules               1

The files mono-x86.ini, mono-ia64.ini and mono-sparc.ini were built from the Burg-files contained in the Mono source code package. They can be assembled by combining the standard Mono Burg-files inssel.brg and inssel-float.brg with inssel-long.brg or inssel-long32.brg (depending on whether the architecture is 64-bit or 32-bit) and the machine dependent set of production rules that can be found in inssel-x86.brg, inssel-ia64.brg and inssel-sparc.brg. All these ini-files have the same single chain rule: base → reg. This chain rule originates from the inssel.brg file. A pleasant property of these real world grammars is that all terminals and nonterminals have a meaning, just like in the preceding iBurg grammars. The nonterminals, for instance, represent the following concepts:

stmt      Statement
reg       Standard register
lreg      Long register
freg      Floating point register
base      Base address
cflags    Standard compare result flag
fpcflags  Floating point compare result flag
i8con     8-bit integer constant

The advantage of these Mono grammars is that they are used in a real world application for instruction selection. They are therefore good candidates for the experiments.

4.2 Grammar transformation experiments

This section presents the results of the experiments that were carried out in the area of tree grammar transformations. Two transformations were the subject of the experiments:

• Removal of non-root terminal nodes in production rules (RED-Z*)
• Removal of chain rules (RED-U*)

RED-Z* refers to the complete transformation that removes all non-root terminal nodes, whereas RED-Z denotes a single transformation step that removes a single non-root terminal node. The same regular expression style abbreviation is used for the chain rules: RED-U* is the transformation that removes all chain rules and RED-U is one transformation step that removes one chain rule. The goal of the experiments is to find out how these transformations have to be applied such that as few new rules and nonterminals as possible are created. This number can for instance

be reduced by changing the order of transformations and applying rule reuse techniques in the RED-Z transformation. To clarify all this we start with a description of these two transformation algorithms, RED-Z(*) and RED-U(*). This discussion describes how these algorithms work and how they can be optimized by reusing symbols and rules. After these two sections the experiments with these transformations are discussed. There are three groups of experiments:

The first experiment measures the effect on the grammar size (number of rules, number of nonterminals) of the order of transformation steps: RED-Z*;RED-U*, RED-U*;RED-Z* and (RED-Z|RED-U)*. This experiment is carried out without reusing suitable nonterminals. The effect of reuse of suitable nonterminals during the RED-Z transformation steps will be examined in the second experiment. The effects are investigated for all the orders of RED-Z and RED-U transformation steps used in the first experiment. The last section will present the experiment that considers different strategies for selecting the next Z-node to be processed by a RED-Z transformation step: Shortest Tree First (STF), Tallest Tree First (TTF) or Random. This is done to measure the effect that these node selection strategies have on the level of reuse. This experiment will concentrate on the results of the RED-Z* transformations.

4.2.1 RED-Z

A RED-Z transformation step removes a single non-root terminal node from the right hand side of a production rule. This can be done by removing the subtree α rooted at this terminal node from that tree and replacing it with a new nonterminal X. A new production rule must then be created with the nonterminal X as left hand side and the subtree α as right hand side. Figure 4.1 shows such a RED-Z transformation step.

(1) S → a(d, d)   =⇒   (1a) S → a(X, d),  (1b) X → d

Figure 4.1: Removal of a non-root terminal node

This means that a new nonterminal and a new rule are created for each non-root terminal node that is removed from the grammar. We can now also remove the last non-root terminal node from the tree in Figure 4.1 by again applying RED-Z (see Figure 4.2), thereby implicitly having applied RED-Z*.

(1a) S → a(X, d),  (1b) X → d   =⇒   (1a) S → a(X, Y),  (1b) X → d,  (1c) Y → d

Figure 4.2: Removal of all non-root terminal nodes

However, one can also reuse rules and symbols if a grammar has two or more similar subtrees rooted at a non-root terminal node (e.g. the two d nodes in rule (1) of Figure 4.1). Such

subtrees can be replaced by the same nonterminal and therefore result in only one additional nonterminal and production rule (see Figure 4.3). The effects of this reuse are discussed in detail in Section 4.2.4.

(1) S → a(d, d)   =⇒   (1a) S → a(X, X),  (1b) X → d

Figure 4.3: Removal of non-root terminal nodes with reuse

The efficiency of this reuse is determined by the order in which the non-root terminal nodes are removed from a grammar. This is discussed in more detail in Section 4.2.5.
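To make the reuse concrete, the following minimal Java sketch performs RED-Z steps on string-encoded right hand sides (e.g. "a(d,d)" for the tree in Figure 4.1). All names are illustrative stand-ins, not the actual ForestFIRE classes; the reuse is realized by a table mapping already extracted subtrees to the nonterminals introduced for them.

    import java.util.*;

    // Minimal RED-Z sketch on string-encoded trees, e.g. "a(d,d)".
    final class RedZSketch {
        private final Map<String, String> reuse = new HashMap<>(); // subtree -> nonterminal
        private final List<String> newRules = new ArrayList<>();
        private int fresh = 0;

        // Replace one occurrence of 'subtree' in 'rhs' by a nonterminal,
        // reusing a previously introduced nonterminal when possible.
        String removeZNode(String rhs, String subtree) {
            String nt = reuse.get(subtree);
            if (nt == null) {                         // no reuse possible:
                nt = "X" + fresh++;                   // create a fresh nonterminal
                reuse.put(subtree, nt);
                newRules.add(nt + " -> " + subtree);  // add the rule X -> subtree
            }
            return rhs.replaceFirst(java.util.regex.Pattern.quote(subtree), nt);
        }

        public static void main(String[] args) {
            RedZSketch r = new RedZSketch();
            String rhs = "a(d,d)";                    // rule (1) S -> a(d,d)
            rhs = r.removeZNode(rhs, "d");            // first d: new rule X0 -> d
            rhs = r.removeZNode(rhs, "d");            // second d: X0 is reused
            System.out.println("S -> " + rhs);        // S -> a(X0,X0)
            System.out.println(r.newRules);           // [X0 -> d]
        }
    }

With the reuse table, the two d subtrees are mapped to the same nonterminal, as in Figure 4.3; removing the table makes every step introduce a fresh nonterminal, as in Figure 4.2.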

4.2.2 RED-U

The goal of the RED-U transformation step is, as already briefly discussed in Section 3.1.2, to remove chain rules. Let us recapitulate how this transformation step works. A chain rule A → B can be removed if one creates a new rule A → α for each subtree α (which must not consist of a single nonterminal) that is reachable from B. An example of such a RED-U transformation step can be seen in Figure 4.4. These reachable subtrees are gathered by computing the nonterminal closure for the nonterminal B. The nonterminal closure of B is the set of nonterminals that is reachable from B by applying chain rules. Subtrees that are in the rhs of production rules that have as lhs a nonterminal contained in the closure are called the rhs subtrees reachable from B. A small sketch of this closure computation is given below, after Figure 4.4.

(1) S → T,  (2) T → a(c, c, c)   =⇒   (1) S → a(c, c, c),  (2) T → a(c, c, c)

Figure 4.4: Removal of a chain rule
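The nonterminal closure driving a RED-U step can be computed with a simple worklist. The sketch below is only an illustration, not the toolkit's implementation; it assumes the chain rules are given as a map from each nonterminal to the nonterminals its chain rules directly produce.

    import java.util.*;

    // Sketch: the nonterminal closure of B under a set of chain rules.
    final class Closure {
        static Set<String> of(String b, Map<String, List<String>> chain) {
            Set<String> seen = new HashSet<>();
            Deque<String> work = new ArrayDeque<>();
            work.push(b);
            while (!work.isEmpty()) {
                String n = work.pop();
                if (seen.add(n)) {                    // first visit of n
                    for (String m : chain.getOrDefault(n, Collections.emptyList()))
                        work.push(m);                 // follow chain rule n -> m
                }
            }
            return seen;                              // the closure, including B itself
        }

        public static void main(String[] args) {
            Map<String, List<String>> chain = new HashMap<>();
            chain.put("S", Arrays.asList("T"));       // chain rule S -> T
            System.out.println(of("S", chain));       // {S, T} (order may vary)
        }
    }

The rhs subtrees reachable from B are then the right hand sides of all rules whose left hand side lies in this closure, excluding right hand sides that consist of a single nonterminal.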

4.2.3 Influence of RED-Z/RED-U order

The first experiment carried out compared the resulting Z−, U− grammars after applying both transformations in different orders. Three possibilities are discussed: RED-Z*;RED-U*, RED-U*;RED-Z* and (RED-Z | RED-U)*. This last possibility transforms the grammar by randomly applying RED-Z and RED-U transformation steps until the Z− and U− characteristics are reached. Due to this nondeterminism this complete transformation is executed a thousand times to get a better view of the average results. The first two apply the reduction steps in a fixed order, so these only have to be executed once. During this experiment the number of rules and the number of nonterminals were measured after each transformation. This experiment was carried out for all grammars, but the results are not reported for the IA64 and Sparc grammars of Mono, because they are very similar to the results of the X86 Mono grammar. These are the results of this experiment:


[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of the RED-Z*;RED-U*, RED-U*;RED-Z* and (RED-Z | RED-U)* transformations.]

Figure 4.5: ’Example 6.0.3’

[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of the RED-Z*;RED-U*, RED-U*;RED-Z* and (RED-Z | RED-U)* transformations.]

Figure 4.6: ’Sample 4’


[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of the RED-Z*;RED-U*, RED-U*;RED-Z* and (RED-Z | RED-U)* transformations.]

Figure 4.7: ’Sample 5’

[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of the RED-Z*;RED-U*, RED-U*;RED-Z* and (RED-Z | RED-U)* transformations.]

Figure 4.8: ’Mono X86’

These charts show that the resulting grammar is smaller when the original grammar first undergoes a RED-Z* transformation instead of a RED-U* transformation. This can be explained by the fact that each RED-U step creates additional non-root terminal nodes for each rule with non-root terminal nodes that is reachable from that chain rule. These additional nodes all result in a new nonterminal and a new production rule if one applies RED-Z transformation steps (without reuse of suitable nonterminals and rules) afterwards. The Example 6.0.3 grammar, for instance, contains three non-root terminal nodes. These

three nodes are in production rules that can be directly reached from the only chain rule in that grammar. So applying RED-U* generates three additional non-root terminal nodes, which all result in an extra nonterminal and production rule when applying RED-Z*.

It is however not true that RED-Z*;RED-U* always performs better than RED-U*;RED-Z*. The (RED-Z | RED-U)* experiments produced results that illustrate this. One would expect that the size of the (RED-Z | RED-U)* grammars is somewhere between the sizes of the RED-U*;RED-Z* and RED-Z*;RED-U* grammars. The average results confirm this expectation, but some of the individual runs showed remarkable results. There were resulting grammars that were even larger than the one from the RED-U*;RED-Z* transformation. Figure 4.9 shows this unexpected result for the X86 Mono grammar.

[Bar chart of the number of rules and nonterminals for RED-Z*;RED-U*, RED-U*;RED-Z*, and the minimum, median and maximum of the (RED-Z | RED-U)* runs.]

Figure 4.9: The minimum, median and maximum during the (RED-Z | RED-U)* transformations for the X86 Mono grammar.

Even more important is the fact that the example grammar constructed to demonstrate the bad (RED-Z | RED-U)* performance also shows that RED-U*;RED-Z* can produce a smaller grammar than RED-Z*;RED-U* when a grammar has a certain shape. This proves that the property suggested above (RED-Z*;RED-U* always performs better than RED-U*;RED-Z*) is certainly not true. Example 4.2.1 shows that RED-U*;RED-Z* can outperform RED-Z*;RED-U* and that the random variant can perform even worse than these two.

Example 4.2.1 This example shows that the relation RED-Z*;RED-U* ≤ (RED-Z | RED-U)* ≤ RED-U*;RED-Z* does not always hold: (RED-Z | RED-U)* can produce a larger grammar than RED-U*;RED-Z*, and RED-U*;RED-Z* can perform better than RED-Z*;RED-U*. The following grammar is used to illustrate this:

T = {b, c}
r = {(X,0), (Y,0), (b,1), (c,0)}


Prod = { (1) X → Y,  (2) X → b(c),  (3) Y → b(c) }

First RED-U*;RED-Z* and RED-Z*;RED-U* will be applied to this grammar. This results in the following intermediate and final sets of production rules:

RED-U*;RED-Z*:
  after RED-U*:  { (2) X → b(c),  (3) Y → b(c) }
  after RED-Z*:  { (2) X → b(V),  (3) Y → b(W),  (2a) V → c,  (3a) W → c }

RED-Z*;RED-U*:
  after RED-Z*:  { (1) X → Y,  (2) X → b(V),  (3) Y → b(W),  (2a) V → c,  (3a) W → c }
  after RED-U*:  { (1) X → b(W),  (2) X → b(V),  (3) Y → b(W),  (2a) V → c,  (3a) W → c }

The result is that the grammar from the RED-U*;RED-Z* transformation is smaller than the one from the RED-Z*;RED-U* transformation. This is caused by the RED-U* transformation in the first sequence: the rule that it creates is already present in the grammar. The number of non-root terminals is therefore not increased by this transformation, which has a positive impact on the resulting grammar. The (RED-Z | RED-U)* transformation can produce an even larger grammar by applying the transformation steps in a particular order:

Original set of rules:
  P = { (1) X → Y,  (2) X → b(c),  (3) Y → b(c) }
Apply RED-Z to rule (2):
  P = { (1) X → Y,  (2) X → b(V),  (3) Y → b(c),  (2a) V → c }
Apply RED-U to rule (1):
  P = { (2) X → b(V),  (3) Y → b(c),  (2a) V → c,  (4) X → b(c) }
Apply RED-Z to rule (3):
  P = { (2) X → b(V),  (3) Y → b(W),  (2a) V → c,  (4) X → b(c),  (3a) W → c }


Apply RED-Z to rule (4):
  P = { (2) X → b(V),  (3) Y → b(W),  (2a) V → c,  (4) X → b(Z),  (3a) W → c,  (4a) Z → c }

The result of this (RED-Z | RED-U)* transformation is a grammar that is even larger than the ones after RED-U*;RED-Z* and RED-Z*;RED-U*.

This unexpected behavior is caused by the fact that RED-U steps do not always create new rules. If a new rule, say x, is exactly the same as an existing rule y in the grammar, then the addition of x to the set of rules will not result in an additional rule. This situation changes when rule y contains a non-root terminal node and a RED-Z step is applied to y, changing it into rule y′. If this RED-Z step is followed by the same RED-U step, then this leads to an additional rule, because rule y′ is not equal to rule x anymore.

Summarized, the best transformation order depends on the shape of the grammar. If a grammar contains no chain rules, or only chain rules that are converted into unique rules by RED-U, then one should choose the RED-Z*;RED-U* transformation. However, the larger the number of non-unique rules created by the RED-U* transformation, the wiser it is to opt for the RED-U*;RED-Z* transformation. A more advanced option would be to first remove the chain rules that do not produce new rules and then apply RED-Z* followed by the remaining RED-U steps.

4.2.4 RED-Z*/U* order with reuse

This section discusses the effects of reusing suitable nonterminals and production rules. This experiment considers the same transformation orders as the previous section, except for the (RED-Z | RED-U)* transformation. Reusing new nonterminals and production rules can result in fewer rules and fewer nonterminals after the transformation when the original grammar contains production rules that have similar subtrees in their right hand sides. As discussed earlier, this reuse takes place in the RED-Z transformation steps. The reuse efficiency may depend on the order in which the non-root terminal nodes are selected to undergo a RED-Z transformation step. This experiment uses the STF (Shortest Tree First) node selection strategy. In this selection strategy, the smallest subtrees rooted at non-root terminals are removed first, followed by the nodes with larger subtrees. The effects of using other node selection strategies are investigated in Section 4.2.5. Below are the results of this experiment for the RED-U*;RED-Z* and RED-Z*;RED-U* transformation sequences. The graphs show the results of the previous section (RED-Z*, no reuse) and the new results (RED-Z*, STF with reuse). Additionally we present the results for the Mono IA64 and Sparc grammars.


[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of RED-Z*, RED-U*;RED-Z* and RED-Z*;RED-U*, each with STF reuse and without reuse.]

Figure 4.10: ’Example 6.0.3’, RED-Z* with and without reuse.

[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of RED-Z*, RED-U*;RED-Z* and RED-Z*;RED-U*, each with STF reuse and without reuse.]

Figure 4.11: ’Sample 4’, RED-Z* with and without reuse.


[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of RED-Z*, RED-U*;RED-Z* and RED-Z*;RED-U*, each with STF reuse and without reuse.]

Figure 4.12: ’Sample 5’, RED-Z* with and without reuse.

[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of RED-Z*, RED-U*;RED-Z* and RED-Z*;RED-U*, each with STF reuse and without reuse.]

Figure 4.13: ’Mono X86’, RED-Z* with and without reuse.


[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of RED-Z*, RED-U*;RED-Z* and RED-Z*;RED-U*, each with STF reuse and without reuse.]

Figure 4.14: ’Mono IA64’, RED-Z* with and without reuse.

[Bar chart comparing the number of rules and nonterminals of the standard grammar with the results of RED-Z*, RED-U*;RED-Z* and RED-Z*;RED-U*, each with STF reuse and without reuse.]

Figure 4.15: ’Mono Sparc’, RED-Z* with and without reuse.

The graphs show what one would expect: the RED-Z* transformations with reuse produce grammars with fewer nonterminals and production rules than the corresponding ones without reuse. Below, a table is presented to provide a better view of these differences. This table compares the number of rules after such transformations with reuse to the number of rules after such transformations without reuse. This is done by setting the number of rules in the resulting grammar with no reuse to 100%. The table then contains the relative sizes when reuse is enabled in the RED-Z transformation steps:


Number of rules difference in percentages
                RED-Z*   RED-U*;RED-Z*   RED-Z*;RED-U*
Example 6.0.3   100%     79%             100%
Sample 4        100%     92%             100%
Sample 5        92%      89%             94%
Mono X86        64%      67%             70%
Mono IA64       71%      75%             78%
Mono Sparc      68%      70%             74%

The table shows that the reuse of nonterminals and production rules is very attractive. The number of rules in the large Mono grammars drops to 64–78% of its size without reuse. The effect on the number of nonterminals is even more impressive:

Number of nonterminals difference in percentages
                RED-Z*   RED-U*;RED-Z*   RED-Z*;RED-U*
Example 6.0.3   100%     63%             100%
Sample 4        100%     75%             100%
Sample 5        86%      75%             86%
Mono X86        16%      14%             16%
Mono IA64       17%      15%             17%
Mono Sparc      17%      14%             17%

This last table shows that the number of nonterminals drops significantly for the large Mono grammars. The reuse of these nonterminals and rules is, as expected, very useful for large grammars.

The results in Figures 4.10 – 4.15 also show that there is no difference in grammar size between RED-Z*;RED-U* and RED-U*;RED-Z* when reusing nonterminals and production rules. This is also proven in [Cle07, Section 3.3.3.1 and Remark 3.3.37]. First applying the RED-U* transformation can still result in copying non-root terminal nodes. However, these are now, due to STF reuse, replaced by the same nonterminal when applying RED-Z*. So RED-U*;RED-Z* will not, as before, perform worse than RED-Z*;RED-U*. Experiments were also carried out with the (RED-Z | RED-U)* transformation. With reuse enabled it also produced grammars that had exactly the same size as the grammars produced by RED-Z*;RED-U* and RED-U*;RED-Z*. In summary, this means that the RED-Z transformation steps with reuse make the result independent of the application order of RED-U and RED-Z, and that they are more efficient in terms of the size of the resulting grammar.

4.2.5 RED-Z node selection strategies

The final experiment in the area of grammar transformations concentrates on different RED-Z node selection strategies. The node selection strategies differ in the order in which they remove all the non-root terminal nodes and could therefore have an impact on the reuse of existing nonterminals and rules. The three node selection strategies that were investigated are:


• STF (Shortest Tree First)
• TTF (Tallest Tree First)
• Random

The first node selection strategy, STF (used in the previous experiment), removes the non-root terminal nodes based on the size of the subtree rooted at each node, starting with a node that has the smallest subtree. The TTF selection strategy does the opposite by starting with a node that has the tallest subtree. The last variant removes the nodes in a random order; a small sketch of the three strategies is given below. This experiment compares these three node selection strategies with the original grammar and the results of the RED-Z* transformation without reuse. This is done for all the grammars described in Section 4.1.
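The three strategies differ only in how they order the list of candidate non-root terminal nodes before the RED-Z* loop consumes it. In the minimal sketch below the subtree size is a plain integer; the real toolkit derives it from the tree, and the pair representation is purely illustrative.

    import java.util.*;

    // Sketch: ordering Z-node candidates for the three selection strategies.
    // A candidate is a pair {node id, size of the subtree rooted at the node}.
    final class NodeSelection {
        static void order(List<int[]> candidates, String strategy) {
            if ("STF".equals(strategy)) {
                candidates.sort(Comparator.comparingInt(c -> c[1]));                    // smallest subtree first
            } else if ("TTF".equals(strategy)) {
                candidates.sort(Comparator.comparingInt((int[] c) -> c[1]).reversed()); // tallest subtree first
            } else {
                Collections.shuffle(candidates);                                        // random order
            }
        }
    }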

Two remarks on the results displayed below:

• The random selection strategy is not deterministic; the results for this selection strategy are determined by taking the average number of rules and nonterminals over a thousand runs. The minimum, maximum and median of these runs are also visualized for the X86, IA64 and Sparc Mono grammars.
• The results for the ’Sample 4’ grammar are not listed because it has only one non-root terminal node, which does not create different situations for the node selection strategies.

[Bar chart of the number of rules and nonterminals for the standard grammar and for RED-Z* with no reuse, STF, TTF and random node selection.]

Figure 4.16: ’Example 6.0.3’, results of the different RED-Z node selection strategies.


[Bar chart of the number of rules and nonterminals for the standard grammar and for RED-Z* with no reuse, STF, TTF and random node selection.]

Figure 4.17: ’Sample 5’, results of the different RED-Z node selection strategies.


[Two bar charts: (a) the number of rules and nonterminals for the standard grammar and for RED-Z* with no reuse, STF, TTF and random node selection; (b) the minimum, median and maximum over the random runs.]

Figure 4.18: ’Mono X86’, results of the different RED-Z node selection strategies together with minimum, maximum and median of the random runs.


[Two bar charts: (a) the number of rules and nonterminals for the standard grammar and for RED-Z* with no reuse, STF, TTF and random node selection; (b) the minimum, median and maximum over the random runs.]

Figure 4.19: ’Mono IA64’, results of the different RED-Z node selection strategies together with minimum, maximum and median of the random runs.


[Two bar charts: (a) the number of rules and nonterminals for the standard grammar and for RED-Z* with no reuse, STF, TTF and random node selection; (b) the minimum, median and maximum over the random runs.]

Figure 4.20: ’Mono Sparc’, results of the different RED-Z node selection strategies together with minimum, maximum and median of the random runs.

The graphs show that the STF and TTF node selection strategies result in smaller grammars than the random selection strategy on average. However, the difference between STF/TTF and random selection is very small (around 1.5%). This difference can be explained by the random processing of similar subtrees that have a non-root terminal as root. Processing these subtrees by chance in different directions (STF or TTF) can have a negative influence on the reuse. The chance that this happens is quite small because it requires parallel processing of

similar subtrees in opposite order. Example 4.2.2 shows a situation where the random selection strategy performs worse compared to the STF and TTF node selection strategies.

Example 4.2.2 This example shows that the random node selection strategy can perform worse than the STF and TTF strategies. This is illustrated with the following grammar:

T = {a, b, c}
r = {(X,0), (a,2), (b,1), (c,0)}
P = { (1) X → a(b(c), b(c)) }

Applying RED-Z* with TTF based reuse to this grammar results in:

P = { (1) X → a(S, S),  (2) S → b(Z),  (3) Z → c }

More rules and nonterminals can be produced, however, if the RED-Z steps are applied in a particular order. First one of the bottom nodes is removed. This results in the following set of rules:

P = { (1) X → a(b(c), b(Z)),  (2) Z → c }

This is followed by the removal of the two b-nodes, which results in the following rules:

P = { (1) X → a(S, T),  (2) Z → c,  (3) S → b(c),  (4) T → b(Z) }

After this the last non-root terminal node in rule 3 can be removed:

P = { (1) X → a(S, T),  (2) Z → c,  (3) S → b(Z),  (4) T → b(Z) }

The resulting grammar has one additional rule compared to the resulting grammar of the RED-Z* transformation with TTF node selection. This is caused by the fact that the random selection can change subtrees that were exactly the same into subtrees that are visually different due to the replacement of nodes. These changed subtrees are then replaced by different nonterminals instead of the same one (S and T instead of a single S in the grammar above).

Figures 4.16–4.20 also suggest that the STF and TTF node selection strategies perform similarly. Theoretically TTF can be less effective than STF. Unfortunately this was not visible in the results of the experiment, due to the shape of the grammars. Example 4.2.3 was constructed to show that this can happen.

Example 4.2.3 This example illustrates that there exist grammars for which the TTF node selection strategy in the RED-Z* transformation performs worse than the STF selection strategy. The following grammar is used to illustrate this:


Original set of rules:

T = {a, b, c}
r = {(X,0), (Z,0), (a,2), (b,1), (c,0)}
P = { (1) X → a(b(c), b(Z)),  (2) Z → c }

Applying RED-Z* with STF based reuse to this grammar results in:

P = { (1) X → a(S, S),  (2) Z → c,  (3) S → b(Z) }

Applying RED-Z* with TTF based reuse to the original set of rules results in:

P = { (1) X → a(S, T),  (2) Z → c,  (3) S → b(Z),  (4) T → b(Z) }

RED-Z* with TTF reuse created a larger grammar than the transformation with STF node selection. The reason for this is easy to see: starting at the top, the two subtrees below symbol a in rule 1 look different while they actually generate the same single subtree. The STF selection strategy does not have this problem because it first replaced the smaller subtree at the bottom of rule 1.

Summarizing, the STF selection strategy is slightly more effective than the TTF and the random node selection strategies. The differences, mostly small, depend on the grammar used. Nevertheless, the complexity of the three strategies is almost equal, because sorting is not necessary for STF when retrieving the non-root terminals in a bottom-up direction (using the leaves field in our toolkit). The STF selection strategy is therefore the most profitable node selection strategy when reusing production rules.

4.2.6 Conclusion

The three experiments provided much insight into the two transformations, especially into RED-Z/RED-Z*. These experiments indicate that the reuse optimization of the RED-Z transformation performs very well, because it reduces the number of new rules significantly compared to RED-Z without reuse. The first experiment (without RED-Z reuse) showed that RED-U and RED-Z can influence each other negatively. The RED-Z with STF reuse in the second experiment removes this negative influence. This RED-Z reuse also results in grammars that are significantly smaller than the ones transformed without reuse. The final experiment was used to test the efficiency of three reuse strategies (STF, TTF, Random). This experiment showed that the STF tactic performs at least as well as the other two. Summarized, the transformations performed best when the STF reuse strategy is used in the RED-Z algorithm. If reuse is enabled, then the application order of RED-U and RED-Z also makes no difference.

4.3 Automaton construction experiments

In this section, the experiments with tree automaton construction algorithms are discussed. These automaton constructions build automata that can be used for solving the tree

acceptance and tree parsing problems. There are different algorithms for different types of automata. Let us therefore first give a quick overview of the types of tree automata. As discussed in Chapter 1 there are two main types of automata: nondeterministic and deterministic automata, where nondeterministic automata can also contain ε-transitions. The two types can themselves be divided into two categories depending on the acceptance direction of the automaton: root-to-frontier (rf) or frontier-to-root (fr). This results in six types of automata.

• Nondeterministic
  – Root-to-frontier ((ε)nrfta)
  – Frontier-to-root ((ε)nfrta)
• Deterministic
  – Root-to-frontier (drfta)
  – Frontier-to-root (dfrta)

We already mentioned that the accepting power of the drfta is less than the accepting power of the other three automata. The experiments in this section focus on the constructions of the remaining five types of automata. The conducted experiments were used to measure the efficiency of the different construction algorithms and the characteristics of the resulting automata:

• Number of states
• Number of transitions
• Memory use

The measurements were used to compare the different construction algorithms and types of automata. These results may be used to select the most efficient construction technique for automata used in tree acceptance and tree parsing applications. Here we mean most efficient in terms of construction speed and/or automaton memory use.

The construction methods will be discussed separately for deterministic and nondeterministic automata, because they differ in construction algorithms and optimization possibilities. However, some general topics are treated before discussing these two groups of automata. First the measurement techniques used in the experiments are described. This is followed by a discussion of general automaton construction issues. Finally the specific construction algorithms for nondeterministic and deterministic automata are discussed.

4.3.1 Measurement techniques

The automaton construction experiments measure two things: the construction process itself (running time) and the constructed automaton (number of states, number of transitions, memory size). Characteristics like the number of states and the number of transitions are measured easily by counting them. Measuring running time

and memory usage is a bit more complicated due to restrictions of the Java programming language and runtime environment.

Runtime measurements

The time consumption measurements were performed using built-in methods of Java. The Java class library provides the System.nanoTime() method, which returns the difference, measured in nanoseconds, between the current time and a specific point in time (chosen by the JRE). This method is called before and after a construction. The difference between these two measurements is considered to be the running time of the construction algorithm.
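A minimal sketch of this timing approach follows; the measured action is passed in as a Runnable, which in the experiments would wrap one of the construction algorithms.

    // Sketch: running time measurement with System.nanoTime().
    final class Stopwatch {
        static long timeNanos(Runnable action) {
            long start = System.nanoTime();   // sample before the construction
            action.run();
            return System.nanoTime() - start; // elapsed time in nanoseconds
        }

        public static void main(String[] args) {
            long ns = timeNanos(() -> { /* a construction would run here */ });
            System.out.printf("construction took %.3f ms%n", ns / 1e6);
        }
    }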

Memory usage measurements

The memory usage measurements were carried out in a similar way to the time measurements. Java provides the possibility to request the total size (in bytes) of the reserved memory pool. It also provides a method that returns how many bytes are still available in this pool. Subtracting the free space from the total space gives the space that is currently used. This value was computed before and after the construction (before and after the time measurements, to be more precise). However, there were some difficulties with these measurements caused by the garbage collector. Measuring the free memory inside the memory pool is only reliable if the garbage collector has removed all unreferenced objects. This can be achieved by manually triggering the garbage collector until the amount of free memory stabilizes, because a single garbage collection run does not remove all unreferenced objects.
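The sketch below illustrates this: Runtime.totalMemory() and Runtime.freeMemory() provide the pool sizes, and System.gc() is triggered repeatedly until the used-memory figure stops shrinking. The sleep interval and the example allocation are arbitrary illustrative choices.

    // Sketch: measuring used memory after stabilizing the garbage collector.
    final class MemoryProbe {
        static long usedBytesStable() {
            Runtime rt = Runtime.getRuntime();
            long used = rt.totalMemory() - rt.freeMemory();
            long previous;
            do {                               // collect until nothing more is freed
                previous = used;
                System.gc();
                try { Thread.sleep(50); } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                used = rt.totalMemory() - rt.freeMemory();
            } while (used < previous);
            return used;
        }

        public static void main(String[] args) {
            long before = usedBytesStable();
            int[] data = new int[1_000_000];   // stands in for a constructed automaton
            long after = usedBytesStable();
            System.out.println("delta: " + (after - before) + " bytes for " + data.length + " ints");
        }
    }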

Hardware & Java environment

The last measurement aspect to discuss is the environment of the experiments: the hardware and the Java Runtime Environment. These facts can be found in Table 4.1.

CPU               AMD Athlon64 3500+ (Winchester)
Memory            1024MB Dual Channel DDR400
Operating system  Microsoft Windows XP SP2 (32bit)
Java version      Java SE 6 Update 1

Table 4.1: Used hardware & software

4.3.2 Automaton construction: general issues

Two general tree automaton subjects are treated before the specific deterministic and nondeterministic variants are discussed, namely state construction and transition storage.

State construction

The tree automata in the upcoming experiments are constructed from regular tree grammars. Each of the states in such an automaton corresponds to a subtree of a production rule in the grammar (for nondeterministic automata) or a set of such subtrees (for deterministic automata). Each construction algorithm discussed in this report receives a set of subtrees,

called an Item Set, from the outside world and creates states corresponding to (sets of) these subtrees. The standard set of subtrees used for these algorithms is the All-Sub item set. This set is defined in the following way:

Items = Subtrees(RHS(Prods′))   where   Prods′ = Prods ∪ {S′ → S}

This definition states that the All-Sub item set contains all subtrees in the right hand sides of the production rules and the start symbol. Example 4.3.1 shows such an All-Sub item set for an example grammar. As mentioned, states are constructed from the subtrees in these item sets. A consequence of this is that providing a smaller item set to a construction algorithm can result in an automaton with fewer states (this can be seen in the upcoming sections about the experiments with the construction algorithms). However, if one shrinks such a set, one must ensure that the items in it can still be used to represent all possible subtrees of the production rules. There are two smaller item sets used in the experiments:

• Proper-N
• Proper-S

The Proper-N set is constructed by using the following definition.

Items = N ∪ ProperSubtrees(RHS(Prods′))

The proper subtrees in this definition are subtrees that do not occur only as a complete right hand side. Trees that occur only as complete right hand sides can be removed, because they can also be represented by a nonterminal that directly produces such a tree. Example 4.3.1 shows a Proper-N item set as well. Compared to the All-Sub set, the tree with root a is removed, because it is not a proper subtree of any right hand side. This tree is obsolete because it can be represented by nonterminal A.

The Proper-S item set is similar to the Proper-N item set and can be constructed using the following definition.

Items = S ∪ ProperSubtrees(RHS(Prods′))

This item set contains the same number of items as Proper-N or fewer, depending on the shape of the grammar. Unlike Proper-N, Proper-S does not contain nonterminals that are not the start symbol and are not present as a proper subtree in the right hand side of a production rule. These removed nonterminals can be divided into two types:

• Nonterminals that are only present as the complete right hand side of a rule (and as a left hand side).
• Nonterminals that are only present as the left hand side of a rule.

In the first case the corresponding rule is a chain rule and this nonterminal can be represented by the left hand side nonterminal of that rule. The nonterminals of the second case are unreachable, as discussed in Section 1.1.3. Unreachable (and unproductive) nonterminals

and rules are useless and therefore removed from the grammars before constructing the item set. Example 4.3.1 also provides the Proper-S item set for an example grammar. It shows that nonterminal A is removed compared to Proper-N. This nonterminal can be removed because it is superfluous due to the chain rule and its absence as a proper subtree in the rules.

Example 4.3.1 This example illustrates for the grammar below which subtrees can be found in the All-Sub, Proper-N and Proper-S item sets.

N = {S, A}
Σ = {a, b, c}
r = {(S,0), (A,0), (a,2), (b,1), (c,0)}
Prods = { (1) S → A,  (2) A → a(b(c), c) }

The three item sets that can be constructed for this grammar are:

All-Sub:   S, A, a(b(c), c), b(c), c

Proper-N:  S, A, b(c), c

Proper-S:  S, b(c), c

Experiments will be performed with all three sets in the construction algorithms, because they could have a large impact on the size of the resulting automata. The results can be found in separate sections about the nondeterministic and deterministic construction algorithms.
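As an illustration of how such an item set can be gathered, the sketch below collects the subtrees of a right hand side by a recursive walk, which is the core of building All-Sub. The Node class and the string-based comparison are stand-ins for the toolkit's tree classes; the nonterminal items (here S and A) and the proper-subtree filtering needed for Proper-N and Proper-S are omitted.

    import java.util.*;

    // Sketch: collecting all subtrees of a right hand side tree.
    final class ItemSets {
        static final class Node {
            final String label;
            final List<Node> children;
            Node(String label, Node... cs) { this.label = label; this.children = Arrays.asList(cs); }
            public String toString() {
                return children.isEmpty() ? label : label + children.toString();
            }
        }

        // Adds every subtree of 'rhs' (including rhs itself) to 'items'.
        static void collect(Node rhs, Set<String> items) {
            items.add(rhs.toString());
            for (Node c : rhs.children) collect(c, items);
        }

        public static void main(String[] args) {
            // rule (2) A -> a(b(c), c) from Example 4.3.1
            Node rhs = new Node("a", new Node("b", new Node("c")), new Node("c"));
            Set<String> items = new LinkedHashSet<>();
            collect(rhs, items);
            System.out.println(items);   // [a[b[c], c], b[c], c]
        }
    }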

Transition storage

For each terminal, a transition relation exists that relates a single state to a vector of states of length n, where n is the rank of the terminal. Such a relation can be undirected, fr-directed or rf-directed. The latter two types can be found in frontier-to-root tas and root-to-frontier tas, respectively. Only these two types of automata are discussed in this report, because they provide the possibility to optimize the transition storage for this specific type of transition and thereby optimize the usage of the automata. FR-directed transitions go from a vector of states to a single state; rf ones go the other way around. This means that the transition relation for a symbol a of rank n > 0 can be represented as two types of functions.

70 Chapter 4. Experiments 4.3. Automaton construction experiments

For nondeterministic tas:
  FR: Ra ∈ Q^n → P(Q)
  RF: Ra ∈ Q → P(Q^n)

For deterministic tas:
  FR: Ra ∈ Q^n → Q
  RF: Ra ∈ Q → Q^n

These functions can also be used for symbols of rank 0, where Q^0 is then the empty tuple. However, we chose to remove this empty tuple for the case n = 0, because it contains no information. This leads to the following functions:

For nondeterministic tas:
  FR: Ra ∈ P(Q)
  RF: Ra ∈ Q

For deterministic tas:
  FR: Ra ∈ Q
  RF: Ra ∈ Q

The transition storage implementations for deterministic automata used in the experiments use conventional techniques. As discussed earlier, only dfrtas are implemented, because their rf counterparts have less acceptance power. The storage of the fr-directed transitions for these automata is achieved by creating an n-dimensional transition table for each symbol of rank n > 0 (transition tables for symbols of rank 0 simply consist of a single record). In practice these are integer indexed tables; lookup is therefore done by translating input states to unique integer values (like in BURG, see Section 2.3). These transition tables are constructed by using nested dynamic arrays. Dynamic arrays were chosen due to the characteristics of the construction algorithms for these dfrtas. The algorithms compute the state set in an iterative way, so one does not know the final number of states before the end of the construction and therefore one does not know the length of these arrays. Dynamic arrays provide the possibility to enlarge the transition table as the number of states grows and thereby solve this problem in an easy way (a small sketch of such a table is given below). There is also a penalty paid: dynamic arrays have more overhead than static arrays, and in practice this overhead turned out to be quite large. This is caused by the large number of arrays that have to be used. The formula below shows the number of dynamic arrays used in a dfrta with |Q| states.

Σ_{t ∈ terminals, t.rank > 0}  Σ_{i=0}^{t.rank−1} |Q|^i

There were also transition tables implemented based on a combination of static and dynamic arrays. This implementation is not discussed in detail, because it was only used to compare the efficiency of this storage. This second implementation uses a low number of dynamic arrays:


Σ_{t ∈ terminals} t.rank

When these two implementations were compared, they confirmed the statement that the dynamic arrays use a large amount of memory. An automaton for a large grammar uses almost 150 megabytes when using the transition tables that are completely constructed from dynamic arrays. Switching to the second implementation resulted in a drop of 40 megabytes while nothing else changed in the automaton and construction algorithm. However, the first implementation was chosen for the experiments because the second version could only be used for tables where all dimensions have the same size. This was not true for the filtered dfrtas that are discussed at the end of Section 4.3.4. The first implementation was therefore chosen to be able to compare the different dfrta construction algorithms in a fair way.
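For a rank-2 terminal, the fully dynamic variant can be sketched as follows; each dimension is an ArrayList that grows on demand, so entries can be added while the state set is still being computed. Representing states by integer indices and using −1 for an absent entry are illustrative choices.

    import java.util.ArrayList;

    // Sketch: a growable two-dimensional transition table for a rank-2 terminal.
    final class DynTable2 {
        private final ArrayList<ArrayList<Integer>> rows = new ArrayList<>();

        void set(int q0, int q1, int result) {
            while (rows.size() <= q0) rows.add(new ArrayList<>()); // grow first dimension
            ArrayList<Integer> row = rows.get(q0);
            while (row.size() <= q1) row.add(-1);                  // grow second dimension
            row.set(q1, result);
        }

        int get(int q0, int q1) {
            if (q0 >= rows.size() || q1 >= rows.get(q0).size()) return -1;
            return rows.get(q0).get(q1);
        }
    }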

We now describe the implementation of the transition tables for nondeterministic tas. As described in the beginning of this section, nondeterministic automata can contain multiple transitions on a symbol for one state or state vector.

Figure 4.21: An example nfrta with multiple transitions for the same input vector/symbol combination.

Figure 4.21 shows, for example, how state vector (q1, q2) has two a-transitions originating from it. This means that the transition storage must not store a single result state or result vector, but a set of states/vectors. Two other aspects played a role when designing the transition tables for the nondeterministic tas, namely the direction of the transition relations (fr or rf) and the fact that the tables should be optimized for the relatively low number of transitions (compared to deterministic tas). The transition storage is based on hash tables instead of normal tables, since nondeterministic automata mostly have a low number of transitions for each state. The exact shape of the transition storage depends on whether it is targeted at an fr or rf automaton. The transition tables for nrftas were easy to construct: a single hash table was created for each symbol with a rank larger than zero. This table stores for each state a set of vectors that can be reached with a transition for that symbol from this source state. The transition tables for nfrtas are comparable to those for dfrtas, but nfrtas use nested hash tables instead of nested dynamic arrays, because these are more efficient here. This is caused by the fact that nfrtas, unlike dfrtas, do not have a transition for each possible symbol/state vector combination. Another difference is that the dfrta transition storage points to a single state for each vector stored in the multi-dimensional storage, while the nfrta storage points to a set of result states.
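For a rank-2 symbol, the fr-directed storage just described can be sketched with nested hash maps whose innermost value is the set of result states (a set, because the automaton is nondeterministic). States are plain integers here; the real implementation uses the toolkit's state objects.

    import java.util.*;

    // Sketch: nested hash table transition storage of an nfrta for one rank-2 symbol.
    final class NfrtaTable2 {
        private final Map<Integer, Map<Integer, Set<Integer>>> table = new HashMap<>();

        void add(int q0, int q1, int result) {
            table.computeIfAbsent(q0, k -> new HashMap<>())
                 .computeIfAbsent(q1, k -> new HashSet<>())
                 .add(result);
        }

        Set<Integer> lookup(int q0, int q1) {
            Map<Integer, Set<Integer>> inner = table.get(q0);
            if (inner == null) return Collections.emptySet();
            return inner.getOrDefault(q1, Collections.emptySet());
        }
    }

For the example of Figure 4.21, adding both a-transitions for the vector (q1, q2) simply yields a two-element result set under the keys for q1 and q2.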

The different implementations to store transitions have different properties. These properties will emerge and be discussed in the upcoming experiments.


4.3.3 Constructions of nondeterministic automata

The constructions for nondeterministic tree automata are based on the constructions described in [Cle07, Section 6.6, Construction of tree acceptors]. This section describes fr and rf versions of four constructions for nondeterministic tree automata.

• tga-ta:all-sub
• tga-ta:all-sub:rem-ε
• tga-ta:proper-n:rem-ε
• tga-ta:proper-s:rem-ε

These four constructions are based on the different item sets that can be used for constructing automaton states. The first two are based on the item set All-Sub and the last two on Proper-N and Proper-S. Remarkable is the REM-ε addition to the last three constructions. The first construction is the only one that generates an automaton with ε-transitions. The other three are targeted at ε-free tree automata (and therefore contain this REM-ε addition), because these automata lead to more efficient acceptance and parsing algorithms.

Eight different types of automata can be constructed based on the two directions and four constructions. The upcoming four sections discuss the four construction algorithms and the characteristics of the constructed nfrtas and nrftas.

TGA-TA:ALL-SUB

The tga-ta:all-sub automaton construction creates a nondeterministic automaton based on the All-Sub item set and is described in [Cle07, Construction 6.6.2, TGA-TA:ALL-SUB]. This construction algorithm creates a state for each unique subtree that can be found in the item set. Symbol transitions between these states are generated from the parent-child relations of these subtrees. This collection of transitions is then extended by an ε-transition for each rule between the state for the left hand side nonterminal and the state for the right hand side tree. The first and fifth lines of Tables E.1 – E.14, especially the Basic Statistics tables, contain the measurements for these constructions (both fr and rf). A pleasant property of this first construction algorithm is the short construction time and the small amount of used memory. This is due to the fact that the number of states is proportional to the number of different subtrees and the number of transitions is proportional to the number of unique parent-child node relationships and the number of production rules. This advantage will be seen in all nondeterministic automaton constructions.

There is one difference between the constructed εnfrtas and εnrftas. This difference is found in the way the transitions are stored. The tables in Appendix E show that the fr variant uses slightly more memory than the rf variant. The difference can be explained by the data structures used for transition storage. The fr variant uses a multidimensional hash table for each terminal in the grammar, as described in Section 4.3.2. The dimension of the table depends on the rank of the corresponding terminal. The rf variant uses only a one-dimensional hash table for each terminal. An

rf automaton therefore contains fewer hash tables to store transitions. All these hash tables carry some overhead. This overhead is the major contributor to the small difference in size between nfrtas and nrftas.

TGA-TA:ALL-SUB:REM-ε

The disadvantage of the previous construction is that it produces an automaton with ε-transitions. This results in computing the ε-closure each time a state is visited in an acceptance algorithm that uses these automata. The construction in this section constructs an ε-free automaton based on the All-Sub item set. A description of this construction can be found in [Cle07, Construction 6.6.10, TGA-TA:ALL-SUB:REM-ε]. This construction computes the ε-closure during the construction and adds additional non-ε-transitions to replace the ε-transitions. For example: if state α has an a-transition to state β and state γ can be reached from β by an ε-transition, then an extra a-transition is added from α to γ to replace this ε-transition. The ε-transition can be removed if this is done for all transitions to β. A small sketch of this replacement step is given below.
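In the sketch, transitions are simplified to triples of strings (for an fr automaton the source would be a state vector), and epsClosure is assumed to map every state to its ε-closure including the state itself; this illustrates the idea rather than reproducing the construction from [Cle07].

    import java.util.*;

    // Sketch: replace transitions into a state by copies into every state of
    // that state's ε-closure; the ε-transitions themselves can then be dropped.
    final class RemEps {
        static List<String[]> removeEpsilon(List<String[]> transitions,   // {source, symbol, target}
                                            Map<String, Set<String>> epsClosure) {
            List<String[]> result = new ArrayList<>();
            for (String[] t : transitions) {
                Set<String> closure = epsClosure.getOrDefault(t[2], Collections.singleton(t[2]));
                for (String reach : closure)
                    result.add(new String[] { t[0], t[1], reach });       // extra non-ε transition
            }
            return result;                                                // ε-free transition set
        }
    }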

Lines two and six in Tables E.1 – E.14 show the measurements when these constructions are applied. Obviously, only the number of transitions has changed if one compares these automata to the automata with ε-transitions. There are also some unexpected results: for instance, the occupied memory of the fr automaton shrinks while that of the rf automaton grows. The one-dimensional hash table that contains the ε-transitions disappears in both variants. However, the effect of the replacement of these transitions is different. This too is caused by the different implementations of the fr and rf transition storage. The rf automaton has to store the new transitions as new vectors in the one-dimensional transition tables. These new vectors contain many existing states. This is illustrated with Figure 4.22. The rf variant of this automaton has the a-transition qs → (q0, q1, q2). Removing the ε-transition results in the additional transition qx → (q0, q1, q2). This means storing an additional vector of size three and thereby duplicating pointers to all three states.

Figure 4.22: Undirected ta with an ε-transition.

The fr automaton does not have the disadvantage of the rf variant, due to its multidimensional hash tables. Removing ε-transitions can result in creating additional hash tables, but in many cases it only results in adding the result state to the set of result states for an existing vector. This last case also applies to the example in Figure 4.22: the vector (q0, q1, q2) already exists and only the result set {qs} is expanded with state qx. This addition is in practice smaller than the memory that is freed by the removal of the hash map that stored the ε-transitions.


TGA-TA:PROPER-N:REM-ε

The TGA-TA:ALL-SUB:REM-ε construction produces ε-free automata, but these automata contain states which are not reachable from the root accepting state. These are the states whose corresponding match is a non-proper subtree or an unreachable nonterminal (when not removed from the grammar). Constructing the automaton based on the Proper-N item set will not create states for non-proper subtrees. This section discusses the construction that can be found in [Cle07, Construction 6.6.19, TGA-TA:PROPER-N:REM-ε]. The construction produces an ε-free automaton based on this Proper-N item set. Tables E.1 – E.14 confirm that the construction produces automata with fewer states and transitions. The measurements show that the resulting automaton is constructed in less time and uses only 20% of the memory of the standard automaton based on the All-Sub item set. Table 4.2 compares the fr automata created with the TGA-TA:PROPER-N:REM-ε construction to the fr automata created with the TGA-TA:ALL-SUB:REM-ε construction.

               # States   # Transitions   Memory Use
Example 6.0.3   62.5%       78.6%          78.5%
Sample 4        46.2%       75.9%          67.3%
Sample 5        50.0%       72.7%          71.9%
68000           25.0%       58.5%          62.0%
Mono X86        11.8%       61.1%          59.0%
Mono IA64        9.1%       62.8%          58.4%
Mono Sparc      10.4%       61.6%          58.4%

Table 4.2: nfrta characteristics for the TGA-TA:PROPER-N:REM-ε construction compared to the TGA-TA:ALL-SUB:REM-ε construction

The table shows that there is a significant drop in memory usage for all grammars. Especially the large grammars seem to benefit from this different item set. One can construct the same table for the rf automata. The columns with the numbers of states and transitions would contain the same values. However, there are again some small differences in memory usage. These differences are the same as described in the previous section about the standard All-Sub construction and are caused by the use of different data structures.

TGA-TA:PROPER-S:REM-ε

This construction is closely related to the previous construction. The construction algorithm [Cle07, Construction 6.6.15, TGA-TA:PROPER-S:REM-ε] is the same, with the exception that it uses the Proper-S item set instead of the Proper-N item set. This construction can result in automata that are smaller than the ones constructed by the Proper-N variant. This is the case if the grammar contains nonterminals that are not present as a proper subtree or as start symbol. The tables in Appendix E show that there is no difference between the constructions with the Proper-N and Proper-S item set for our grammars. This is due to the fact that none of the grammars used contains nonterminals that are not present as a proper subtree or start symbol. The behavior and the result of this construction are therefore the same as for the previous one.


4.3.4 Constructions of deterministic automata

This section focuses on the construction of dfrtas. drftas are not covered because their acceptance/parse power is less than that of their dfrta counterparts. This section discusses five construction algorithms for dfrtas.

• DFRTA Standard Construction
• DFRTA Construction with Subtree Filtering
• DFRTA Construction with Index Filtering
• DFRTA Construction with Symbol Filtering
• DFRTA Construction with Index & Symbol Filtering

Deterministic automata in general have more states and transitions than their corresponding nondeterministic variants. The difference can be very large when no optimization techniques are used during the construction of deterministic automata. The first construction technique applies no special optimization techniques; the last four algorithms focus on decreasing the size of the automaton. All these constructions can be based on any of the three item sets (All-Sub, Proper-N and Proper-S). The upcoming subsections discuss the characteristics of these five constructions and the effect of the different item sets.

Standard DFRTA construction

The standard construction of a dfrta differs from the construction of the nondeterministic automata. The construction of a dfrta has two phases. The first phase creates states for all terminals of rank zero. For each such terminal a state q is created and the terminal is added to the match set of this state. Next, all production rules with this terminal as rhs are gathered, and the transitive chain rule closure on nonterminals (computed using Warshall's Algorithm [War62]) is determined. All nonterminals that are reachable from these rhss are then added to the match set of the state. Finally a start transition, of the form () → q, is added for each of these terminals. These are simply stored as q, since () → q ≅ q. Example 4.3.2 provides an example of this first phase.

Example 4.3.2 This example shows how the initial states are created from the input grammar in the first phase of the standard dfrta construction algorithm. This example is based on the following grammar:

N = {S, X, Y}
Σ = {a, b, c, d}
r = {(S,0), (X,0), (Y,0), (a,2), (b,1), (c,0), (d,0)}
Prods = {(1) S → a(X, Y), (2) S → b(Y), (3) X → c, (4) Y → d}

Terminals c and d are the terminals of rank zero, so this results in two initial states.

• q0: {c}


• q1: {d}

The match sets need to be expanded with the nonterminals that produce these terminals. This results in these updated match sets.

• q0: {c, X}

• q1: {d, Y}

Finally the match sets have to be updated with the closures of X and Y, respectively. However, there are no chain rules that produce X or Y, so these two sets/states form the state set after phase 1. Transitions are created for these states and their corresponding terminals as described in the introduction. This results in the following two ‘transitions’.

Tc   q0
Td   q1

These states and transitions can be considered as the start states and transitions of this automaton.
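The chain rule closure of this example grammar is trivial, since there are no chain rules. As a minimal illustration of the Warshall-style closure computation mentioned above, the following Java fragment (with toy data; the names are not ForestFIRE's) computes which nonterminals are reachable from which by chain rules:

    import java.util.*;

    final class ChainClosureSketch {
        public static void main(String[] args) {
            List<String> nts = List.of("S", "X", "Y");
            // reach[i][j] == true iff nonterminal i derives j by chain rules;
            // seeded with the chain rules themselves (toy data: S -> X).
            boolean[][] reach = new boolean[3][3];
            reach[0][1] = true;
            for (int i = 0; i < 3; i++) reach[i][i] = true;   // reflexive closure
            for (int k = 0; k < 3; k++)                       // Warshall: via k
                for (int i = 0; i < 3; i++)
                    for (int j = 0; j < 3; j++)
                        reach[i][j] |= reach[i][k] && reach[k][j];
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    if (reach[i][j])
                        System.out.println(nts.get(i) + " =>* " + nts.get(j));
        }
    }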

The second phase creates new states based on the current state set and the item set (All-Sub, Proper-N or Proper-S). This is done in an iteration over all existing states and the states that are created during this phase. When processing state qi in this iteration, the algorithm looks at all terminals with a rank larger than zero. It creates the following set of state vectors of length n, where n is the rank of the terminal:

{q0, ..., qi}^n \ {q0, ..., qi−1}^n

This means that all vectors of length n are created that contain only states qj where j ≤ i and that have at least one occurrence of qi. The algorithm determines for each of these vectors which subtrees from the item set match the terminal with this child state vector. These matches are stored in a match set for that terminal in combination with that vector. If a subtree in this match set is also present as the complete right hand side of a production rule, then the match set is expanded with the nonterminal left hand side of that rule and the closure of this nonterminal. A new state is created if there is no state with this match set. Transitions between the states from the vector and the (newly) constructed state are added to the automaton. This second phase is illustrated in Example 4.3.3.

Example 4.3.3 This example shows how new states are created from the existing states in the second phase of the standard dfrta construction algorithm that uses the All-Sub item set. States q0 and q1 are the states that came out of the first phase.

• q0: {c, X}

• q1: {d, Y}

The All-Sub item set that contains all possible matches for new states is: {a(X, Y), b(Y), X, Y, c, d}. Now we can start executing phase 2. This means computing new states for possible state vectors (consisting of existing states). This starts with case i = 0, which checks all vectors that only contain states qj where j ≤ i and that have at least one occurrence of qi (this automatically holds for i = 0).

Case i=0: Terminal a will be the first symbol for which the vectors are constructed. Creating these vectors for state q0 and terminal a of rank 2 results in this single vector: (q0, q0). Visualizing this vector as a tree with root a gives a({c, X}, {c, X}). There is no subtree in the All-Sub item set that matches this subtree (all possibilities for the child match sets are considered). This means that a new state q2 is created with an empty match set. This results in this small table.

Ta   q0
q0   q2

The same procedure is followed for terminal b, resulting in vector (q0), which also matches no subtree and therefore results in the same state q2.

Tb
q0   q2

Both symbols are processed for state q0, so one can proceed to the second iteration.

Case i=1:

Proceeding with i = 1 means that we continue by handling state vectors containing q0 and q1. For symbol a this results in three vectors that need to be processed: (q0, q1), (q1, q0) and (q1, q1). The first vector can be visualized as the tree a({c, X}, {d, Y}).

Subtree a(X, Y) from the item set matches this tree. This results in a new state q3 with match set {a(X, Y), S}. The second and third vector result in an empty match, so they result in transitions from these vectors to the existing state q2. The complete iteration for terminal a therefore results in the following transition table.

Ta   q0   q1
q0   q2   q3
q1   q2   q2

The same is done for terminal b, where vector (q1) results in a new state q4 with match set {b(Y), S} and thus gives the following one-dimensional transition table.

Tb
q0   q2
q1   q4


The second phase will be repeated until all states (cases i = 2 to i = 4) are processed. This does not result in new states for this example grammar. The state set below is therefore the complete set of states for the resulting automaton.

             match set
Phase 1  q0  {c, X}
         q1  {d, Y}
Phase 2  q2  ∅
         q3  {a(X, Y), S}
         q4  {b(Y), S}

The only changes in the iterations for q2, q3 and q4 are in the transition tables of terminals a and b. New vectors can be constructed for both terminals based on these states. However, transitions from these vectors do not result in new states. This results in these final transition tables:

Ta   q0   q1   q2   q3   q4
q0   q2   q3   q2   q2   q2
q1   q2   q2   q2   q2   q2
q2   q2   q2   q2   q2   q2
q3   q2   q2   q2   q2   q2
q4   q2   q2   q2   q2   q2

Tb
q0   q2
q1   q4
q2   q2
q3   q2
q4   q2

The final result is an automaton with 5 states and 32 transitions: 25 entries in the table for a, 5 in the table for b, plus the two start transitions for the terminals of rank 0 from phase 1.
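A minimal sketch of the phase-2 vector enumeration in Java (hypothetical names): it generates exactly the set {q0, ..., qi}^n \ {q0, ..., qi−1}^n by enumerating all vectors over {q0, ..., qi} and keeping those with at least one occurrence of qi. For i = 1 and a rank-2 terminal it prints the three vectors processed for terminal a in Example 4.3.3.

    import java.util.*;

    final class VectorEnumerationSketch {
        static List<int[]> vectors(int i, int n) {
            List<int[]> result = new ArrayList<>();
            enumerate(new int[n], 0, i, false, result);
            return result;
        }
        // pos: next position to fill; seenQi: whether qi already occurs
        static void enumerate(int[] v, int pos, int i, boolean seenQi, List<int[]> out) {
            if (pos == v.length) {
                if (seenQi) out.add(v.clone());   // skip vectors without qi
                return;
            }
            for (int q = 0; q <= i; q++) {
                v[pos] = q;
                enumerate(v, pos + 1, i, seenQi || q == i, out);
            }
        }
        public static void main(String[] args) {
            // Prints [0, 1], [1, 0], [1, 1], i.e. (q0,q1), (q1,q0), (q1,q1).
            for (int[] v : vectors(1, 2)) System.out.println(Arrays.toString(v));
        }
    }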

This second phase is very time consuming, because in practice it creates all possible state vectors of length n for each terminal of rank n. These vectors are compared to all the subtrees in the item set. This comparison is not trivial, because each state vector for a symbol has to be compared to all subtrees α in the grammar with that symbol as root. The complexity is increased by the fact that each state at position i of the vector can contain multiple matching subtrees and that each of these subtrees has to be compared with the subtree at position i of subtree α. However, an optimization is applied here: the construction stores all subtrees in the production rules of the grammar indexed by their root symbol, which avoids searching for the subtrees α that need to be checked against the vectors. This optimizes the computation of the matches, but still requires checking many possible matches for a large number of vectors for each symbol.

The first three dfrta lines in the tables in Appendix E show the results of the experiments with this standard dfrta construction, for all three item sets introduced before. The experiments with the All-Sub item set show the most remarkable results. The number of states is almost equal to the number of states of the nondeterministic variants, but the construction time needed for the large Mono grammars is very high. The algorithm needs about five minutes to construct an automaton, and the resulting automaton also needs more than one hundred megabytes to store its states and transitions. The large number of transitions (e.g. 25 million for the Mono X86 grammar) explains this memory consumption.


Switching to the Proper-N or Proper-S item set makes a big difference. The automaton constructed with such an item set uses 98% less memory. Table 4.3 shows the relative memory size and construction time for the Mono automata constructed with the Proper-N set, where 100% is the value for the construction with the All-Sub item set.

         Memory   Construction Time
X86      1.66%    0.63%
IA64     1.22%    0.31%
Sparc    1.47%    0.51%

Table 4.3: Standard Proper-N construction relative to All-Sub

The difference in size and construction time can be easily explained. The standard dfrta construction inspects all combinations of states for each terminal in the grammar to construct all transitions. This results in the following number of checks/transitions for an automaton with |Q| states:

∑_{t ∈ terminals} |Q|^rank(t)

The X86 Mono grammar for example contains around 80 terminals of rank 2. With the 557 states constructed from the All-Sub set, these 80 terminals alone lead to 80 · 557² ≈ 25 million transitions. These transitions are stored in n-dimensional tables which are modeled by nested dynamic arrays. There is a cell in these tables for each transition, and each of these cells contains a pointer to the resulting state. This results in 95 megabytes of pointers for the X86 Mono grammar. The remaining 50 megabytes are occupied by the overhead of the dynamic arrays that represent the tables. These 50 megabytes could be reduced by using static arrays instead of dynamic arrays, as discussed in Section 4.3.2. Dynamic arrays were however used to create a fair comparison between standard dfrtas and filtered dfrtas (which cannot be constructed efficiently using static arrays).

Summarized, the automata created with this standard dfrta construction algorithm have a large collection of transitions, even when using the Proper-N or Proper-S item sets. This leads to impractical automata. However, there are filtering techniques to reduce the number of transitions. The following four sections discuss constructions that use these filtering techniques.

DFRTA with subtree filtering

The dfrta construction with the subtree filtering technique tries to reduce the transition table size. This is done by reducing the possible indices for these tables (which are all states in the standard dfrta construction). The dfrta construction with subtree filtering adds an additional filtered match set table (R-table) and a translation table (φ-table) to the automaton. The filtered match set table contains all unique match sets from the original state set after removing subtrees that are not proper subtrees, so it contains only match sets with proper subtrees. This can result in fewer match sets than states, because two match sets from two original states can become equal after removing non-proper subtree matches. The φ-table describes which filtered match set corresponds to which original state.


These filtered match sets are used as indices for the transition tables instead of the original states. This replacement does not affect the acceptance/parse power of the automaton, because a transition based on a non-proper subtree always leads to the state with the empty match set and is therefore useless: the match set of the resulting state could only contain subtrees in which this non-proper subtree occurs as a child tree, which contradicts the fact that it is a non-proper subtree. Such transitions therefore all result in the state with the empty match set. The result of using these filtered match sets instead of the (match sets of the) original states is that the transition tables for each of the terminals can be based on these smaller filtered match sets instead of the original states. This results in fewer transitions, less memory use and faster constructions. An example of this can be seen in Example 4.3.4.

Example 4.3.4 Examples 4.3.2 and 4.3.3 show the standard dfrta construction. This example shows what happens when applying subtree filtering during the construction. The set {X, Y} is the set of proper subtrees of the grammar and is therefore used for filtering. The table below shows the original match sets and the match sets after filtering with these proper subtrees.

             match set       filtered match set
Phase 1  q0  {c, X}          {X}
         q1  {d, Y}          {Y}
Phase 2  q2  ∅               ∅
         q3  {a(X, Y), S}    ∅
         q4  {b(Y), S}       ∅

This results in the following filter table R and translation table φ:

R-Table      φ-Table
R0: ∅        q0 → R1
R1: {X}      q1 → R2
R2: {Y}      q2 → R0
             q3 → R0
             q4 → R0

The filter table contains only three different sets while there are five states (and corresponding match sets), so using these filtered sets as base for the transition tables reduces the size of each transition table from 5^n entries to 3^n entries, where n is the rank of the terminal. The transition tables for terminal a are printed in Table 4.4 to illustrate this. The left part of the table contains the standard table based on the original states, and the right part contains the transition table based on the subtree filtered match sets.


Ta   q0   q1   q2   q3   q4
q0   q2   q3   q2   q2   q2
q1   q2   q2   q2   q2   q2
q2   q2   q2   q2   q2   q2
q3   q2   q2   q2   q2   q2
q4   q2   q2   q2   q2   q2

Ta   R0   R1   R2
R0   q2   q2   q2
R1   q2   q2   q3
R2   q2   q2   q2

Table 4.4: Transition tables for a; left: standard indexing, right: filtered match set indexing.
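The core of the R-table and φ-table construction can be sketched as follows (a hypothetical representation, with subtrees written as plain strings): each match set is intersected with the set of proper subtrees, and the distinct filtered sets are numbered in order of appearance. Applied to Example 4.3.4 this yields the three R-entries and five φ-entries; only the numbering of the R-entries may differ from the example, since it depends on the insertion order.

    import java.util.*;

    final class SubtreeFilterSketch {
        public static void main(String[] args) {
            // Match sets of Example 4.3.4, subtrees written as strings.
            List<Set<String>> matchSets = List.of(
                Set.of("c", "X"), Set.of("d", "Y"), Set.of(),
                Set.of("a(X,Y)", "S"), Set.of("b(Y)", "S"));
            Set<String> properSubtrees = Set.of("X", "Y");

            Map<Set<String>, Integer> rTable = new LinkedHashMap<>(); // R-table
            int[] phi = new int[matchSets.size()];                    // phi-table
            for (int q = 0; q < matchSets.size(); q++) {
                Set<String> filtered = new TreeSet<>(matchSets.get(q));
                filtered.retainAll(properSubtrees);       // drop non-proper matches
                phi[q] = rTable.computeIfAbsent(filtered, f -> rTable.size());
            }
            rTable.forEach((set, r) -> System.out.println("R" + r + ": " + set));
            for (int q = 0; q < phi.length; q++)
                System.out.println("q" + q + " -> R" + phi[q]);
        }
    }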

Appendix E shows the results of the experiments with this filtering technique for all grammars, and these results confirm the expectations. If we compare the standard construction and the construction with subtree filtering for the All-Sub item set, then we see a large difference. The normal set of states is not influenced, but the number of transitions drops significantly. The R-Entries column in the ‘Table Statistics’ tables in Appendix E shows how large the filtered match set tables are. Take for example the Mono IA64 grammar. The standard dfrta for this grammar contains 438 states. A 2-dimensional transition table of a terminal of rank 2 will contain 438² entries. The filtered match set table in the subtree filtered automaton contains only 41 entries. The transition table for the same terminal of rank 2 will now contain 41² entries, which is 0.9% of the entries of the standard automaton. This difference shrinks when using the Proper-N or Proper-S item set. This is caused by the fact that the states of these automata do not contain non-proper subtrees (except the nonterminals N or the start symbol S) due to the restricted item set. Table 4.5 compares the standard construction and the construction with subtree filtering for the Mono grammars.

                   # Transitions   Memory Use   Construction Time
X86 – All-Sub        1.6%            2.5%          2.8%
X86 – Proper-N      97.0%           99.0%        243.0%
IA64 – All-Sub       0.9%            2.4%          2.0%
IA64 – Proper-N     95.4%           99.1%        217.7%
Sparc – All-Sub      1.1%            2.4%          2.3%
Sparc – Proper-N    92.7%           95.5%        291.6%

Table 4.5: Construction characteristics for the Mono grammars for the subtree filter construction compared to the standard construction

The table confirms that there is a large gain when applying subtree filtering in combination with the All-Sub item set, and that there is almost no gain when Proper-N is used as the item set. The construction time actually increases for this last item set, due to the extra work that is needed for building the filter tables. The results for the Proper-S set are omitted from the table because this item set contains the same subtrees as Proper-N for these Mono grammars and therefore results in the same automata. Overall, subtree filtering has a positive influence on the automaton size, but there are more efficient filtering techniques, even when using the Proper-N or Proper-S item set. These constructions are presented in the upcoming sections.


DFRTA with index filtering

The construction with the index filtering technique is similar to the one with subtree filtering, but instead of creating a single table of filtered match sets that are used for indexing the transition tables, multiple filtered tables (and φ-tables, but these are not discussed in detail) are constructed. The index filtering technique constructs n tables, where n is the maximal rank in the grammar. The ith of these n tables contains the match sets filtered on the subtrees that occur as ith child of any terminal (see Example 4.3.5).

Example 4.3.5 This example shows the effect of index filtering based on Examples 4.3.2 and 4.3.3. The used grammar uses terminals with rank at most two. As described, the ith filter table is based on the subtrees that are present as ith child of a terminal (the fact that they are children of a terminal implies that they are also proper subtrees). The table below shows the possible subtrees for both indices.

Index 1    Index 2
{X, Y}     {Y}

These two sets can now be used to filter the match sets of the states for both indices:

             match set       filtered for Index 1   filtered for Index 2
Phase 1  q0  {c, X}          {X}                    ∅
         q1  {d, Y}          {Y}                    {Y}
Phase 2  q2  ∅               ∅                      ∅
         q3  {a(X, Y), S}    ∅                      ∅
         q4  {b(Y), S}       ∅                      ∅

The table for index 1 contains 3 match sets and the table for index 2 only 2 sets. The index filtering technique thus results in a 3×2 transition table for terminal a, whereas the subtree filtering technique would result in a table of nine entries for a symbol of rank two. This shows the usefulness of the index filtering technique.

The construction with index filtering was performed for all grammars. Appendix E shows the results of these experiments. The focus is mainly on the Mono grammars, due to their size. These grammars do not contain terminals with a rank larger than 2, so the corresponding automata use two filtered match set tables (this can also be seen in the R-Tables column). The total number of entries in these two tables is slightly larger than the number of entries in the single subtree filter table, yet each index table is smaller than the subtree filtering table. The advantage of this can be seen in the number of transitions and the memory use. The number of transitions is halved with respect to subtree filtering. The results for this construction based on the Proper-N item set are summarized in Tables 4.6 and 4.7.

              # Transitions   Memory Use   Construction Time
Mono X86       47.6%           53.4%         32.2%
Mono IA64      42.3%           54.3%         29.0%
Mono Sparc     46.7%           53.9%         29.6%

Table 4.6: Construction characteristics for the index filter construction when compared to the subtree filter construction (Constructions with Proper-N item set)


              # Transitions   Memory Use   Construction Time
Mono X86       46.1%           52.9%         78.4%
Mono IA64      40.4%           53.8%         63.2%
Mono Sparc     43.3%           51.5%         86.2%

Table 4.7: Construction characteristics for the index filter construction when compared to the standard dfrta construction (Constructions with Proper-N item set)

These tables show that the index filtering method is a good technique to reduce both the size of the automaton and the construction time. The construction time overhead introduced by subtree filtering has almost disappeared for the Mono grammars (see the last column of Table 4.7 compared to the same column in Table 4.5). There are two similar filtering techniques that perform even better. These filters are discussed in the next two sections.

DFRTA with symbol filtering

The dfrta construction with symbol filtering is based on filtering match sets with the subtrees that are present as child trees of a certain symbol. This is realized by a filter table and φ-table for each symbol. Such a filter table contains all unique match sets from the states after filtering them with this criterion. This method can be very efficient when subtrees are only present as children of certain terminals, because these subtrees will then only be used as indices in the tables for those symbols. Example 4.3.6 shows the effect of this filtering technique for the example grammar.

Example 4.3.6 This example shows the effect of symbol filtering based on Examples 4.3.2 and 4.3.3. The symbol filtering will create two tables. The first table contains the unique match sets after filtering them with the subtrees that occur as a child of terminal a. The second table does the same for terminal b. The table below shows which subtrees can be found in the filtered match sets.

Symbol a    Symbol b
{X, Y}      {Y}

These two sets can then be used to create the filtered tables for both symbols:

             match set       filtered for Symbol a   filtered for Symbol b
Phase 1  q0  {c, X}          {X}                     ∅
         q1  {d, Y}          {Y}                     {Y}
Phase 2  q2  ∅               ∅                       ∅
         q3  {a(X, Y), S}    ∅                       ∅
         q4  {b(Y), S}       ∅                       ∅

This results in a filter table with 3 match sets for terminal a and a table with 2 match sets for terminal b. This means that the transition table of symbol a contains 3×3 records and the transition table of b only contains 2 records.
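The per-symbol filter criteria can be derived directly from the grammar, as the following sketch illustrates (hypothetical types, not ForestFIRE's API): for every terminal, collect the subtrees that occur as its children anywhere in a rule rhs. For the example grammar this prints, in some order, {X, Y} for a and {Y} for b, matching the table above.

    import java.util.*;

    final class SymbolFilterCriteriaSketch {
        record Tree(String symbol, List<Tree> children) {
            @Override public String toString() {
                return children.isEmpty() ? symbol
                     : symbol + children.toString().replace('[', '(').replace(']', ')');
            }
        }
        static Map<String, Set<Tree>> childSubtreesPerSymbol(List<Tree> rhss) {
            Map<String, Set<Tree>> result = new HashMap<>();
            for (Tree rhs : rhss) collect(rhs, result);
            return result;
        }
        static void collect(Tree t, Map<String, Set<Tree>> acc) {
            for (Tree child : t.children()) {
                acc.computeIfAbsent(t.symbol(), s -> new HashSet<>()).add(child);
                collect(child, acc);   // also visit nested children
            }
        }
        public static void main(String[] args) {
            Tree x = new Tree("X", List.of()), y = new Tree("Y", List.of());
            Tree aXY = new Tree("a", List.of(x, y)), bY = new Tree("b", List.of(y));
            // rhss of the example grammar: a(X, Y) and b(Y)
            childSubtreesPerSymbol(List.of(aXY, bY))
                .forEach((sym, subs) -> System.out.println(sym + ": " + subs));
        }
    }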

The example shows that the symbol filtering technique performs slightly worse than the index filtering technique (index filtering: 9 transitions (see previous section), symbol filtering: 11 transitions), but this changes when using grammars that are targeted at instruction selection (like the Mono grammars). Such grammars contain large numbers of terminals, and many of the subtrees in the rules are only found as child trees of certain terminals. This results in a large collection of very small filter tables. If one applies index filtering to these grammars, then this results in larger transition tables, because subtrees are mostly not restricted to a certain index only. They therefore end up in two (the used instruction selection grammars all use terminals with rank at most two) very large filtered tables. This is illustrated by Example 4.3.7.

Example 4.3.7 This example shows the benefit of symbol filtering compared to index filtering when a grammar is used in which subtrees are only present as children of certain terminals. This is the grammar used:

N = {S, X, Y}
Σ = {a, b, c, d}
r = {(S,0), (X,0), (Y,0), (a,2), (b,1), (c,0), (d,0)}
Prods = {(1) S → a(X, X), (2) S → a(Y, Y), (3) X → b(c), (4) Y → d}

The set of states (with corresponding match sets) for both the index-filtered and symbol-filtered dfrta is as follows if one uses the Proper-N item set in the construction:

q0: {c}
q1: {Y}
q2: ∅
q3: {S}
q4: {X}

Both filter techniques can be applied to this state set to construct the transition tables. This is first done for the index filtering technique. Applying this filtering results in two filter tables, where entry Rx,y contains the yth filtered set for index x:

Index 1        Index 2
R1,0: {c}      R2,0: ∅
R1,1: {Y}      R2,1: {Y}
R1,2: ∅        R2,2: {X}
R1,3: {X}

Using these filtered items as indices for the transition tables results in 4×3 cells for the symbol of rank 2 and a table with 4 cells for the symbol of rank 1. For this grammar this gives a total of 16 + 2 transitions (two additional transitions for the two symbols of rank 0). Applying the symbol filtering technique also results in two tables, but now one for each terminal with a rank larger than zero:

Symbol a       Symbol b
Ra,0: ∅        Rb,0: {c}
Ra,1: {Y}      Rb,1: ∅
Ra,2: {X}

These filter tables are clearly smaller than the filter tables of the index filtering technique.


This is caused by the fact that the symbols X and Y are only present as children of symbol a, while symbol c is only present as a child of b. Index filtering includes both matches in the filter tables for index 1 and 2, because X and Y are present at both indices. These filtered items result in a 3×3 table for the symbol of rank 2 and a table with only 2 records for the symbol of rank 1. The total number of transitions for this automaton is therefore 11 + 2 transitions. This example shows that symbol filtering can perform better than index filtering for a grammar where subtrees are related to specific symbols instead of specific indices. This is often the case for instruction selection grammars.

The Mono tables in Appendix E confirm this statement. The filter tables (R-Tables) contain many entries, but they are divided over a large number of tables, resulting in smaller tables and therefore smaller transition tables. Table 4.8 compares the average number of R-table entries for index filtering and symbol filtering.

              Average Index R-Table   Average Symbol R-Table
Mono X86       44 entries              3.0 entries
Mono IA64      28 entries              2.7 entries
Mono Sparc     35.5 entries            2.9 entries

Table 4.8: Average filter table comparison between index filtering and symbol filtering.

The average size of the tables for these grammars drops by more than a factor of 10. This has an enormous impact on the number of transitions, because the average transition table of a terminal of rank 2 now has no more than 3×3 entries. This also has a large impact on the construction time. Table 4.9 compares the number of transitions, memory usage and construction time between symbol filtering and index filtering. The same comparison is made between symbol filtering and the standard construction in Table 4.10.

              # Transitions   Memory Use   Construction Time
Mono X86        1.3%           69.4%         5.0%
Mono IA64       2.5%          112.5%         6.8%
Mono Sparc      1.5%           92.1%         4.1%

Table 4.9: Construction characteristics for the symbol filter construction when compared to the index filter construction (Constructions with Proper-N item set)

              # Transitions   Memory Use   Construction Time
Mono X86        0.6%           36.7%         3.9%
Mono IA64       1.0%           60.5%         4.3%
Mono Sparc      0.7%           47.4%         3.6%

Table 4.10: Construction characteristics for the symbol filter construction when compared to the standard dfrta construction (Constructions with Proper-N item set)

As expected, the tables show a large reduction in transition table entries and construction time, caused by the smaller number of transitions that need to be constructed. However, the memory usage is surprising; it stays around the level of the index filtering, while the number of transitions drops. This is partly caused by the extra filter tables and so-called φ-tables (these are necessary for translating states to filtered match sets when using the automaton in acceptance/parse applications). These additional data structures generate extra data. The number of φ-entries is for instance 10 times larger than the number of transitions (there is an entry for each state/filter table combination). Another cause of the extra data is the overhead of the large number of dynamic Java data structures that are used for the representation of the filter tables and the φ-tables. Nevertheless, this filtering technique is very attractive due to the reduced construction time. Table 4.9 shows that less than 7% of the construction time of the index filtering remains for the Mono grammars, while the memory usage stays at a similar level.

DFRTA with index & symbol filtering

The final construction algorithm uses a combination of the index and symbol filtering techniques. With this combined technique, filter tables and φ translation tables are created for each index i of each terminal a. The match sets in such a filter table are then filtered with the subtrees that are present at index i of terminal a. Example 4.3.8 illustrates this filtering technique.

Example 4.3.8 This example shows the effect of combined symbol and index filtering for the automaton of Examples 4.3.2 and 4.3.3. This filtering technique will create three tables. The first two tables refer to symbol a, where the first table contains all filtered match sets for symbol a at index 1 and the second table does the same for index 2. The third table contains the filtered match sets for index 1 of symbol b. The table below shows which subtrees can be found in the filtered match sets.

Symbol a, Index 1    Symbol a, Index 2    Symbol b, Index 1
{X}                  {Y}                  {Y}

These three sets can then be used to create the filtered tables:

             match set       Symbol a, Index 1   Symbol a, Index 2   Symbol b, Index 1
Phase 1  q0  {c, X}          {X}                 ∅                   ∅
         q1  {d, Y}          ∅                   {Y}                 {Y}
Phase 2  q2  ∅               ∅                   ∅                   ∅
         q3  {a(X, Y), S}    ∅                   ∅                   ∅
         q4  {b(Y), S}       ∅                   ∅                   ∅

This results in very small filter tables, where each table contains only two match sets. The resulting transition table for symbol a will only contain 4 entries and the table for symbol b only 2.

The example shows that the filter tables shrink even further when applying the combined symbol and index filtering. This can also be seen in the experimental results in Appendix E. The gain from symbol filtering to combined symbol/index filtering is not as large as the gain from index filtering to symbol filtering, but the combination still seems promising.


Table 4.11 shows the average filter table size for index, symbol and combined symbol/index filtering.

              Average Index R-Table   Average Symbol R-Table   Average Index & Symbol R-Table
Mono X86       44 entries              3.0 entries              2.7 entries
Mono IA64      28 entries              2.7 entries              2.5 entries
Mono Sparc     35.5 entries            2.9 entries              2.6 entries

Table 4.11: Average filter table comparison between index filtering, symbol filtering and com- bined filtering.

Not only these average values should be considered; the size of the transition tables is also important. The transition tables for the Mono grammars are halved, but there is also a penalty: the additional filter tables and φ-tables double the memory usage. Table 4.12 below shows the results for combined symbol/index filtering relative to symbol filtering (100%).

              # Transitions   Memory Use   Construction Time
Mono X86       57.6%          175.9%        101.4%
Mono IA64      67.7%          172.0%        110.0%
Mono Sparc     66.6%          175.6%        108.6%

Table 4.12: Construction characteristics for the combined filter construction when compared to the symbol filter construction (Constructions with Proper-N item set)

One may conclude that for these grammars the gain of combining the two filters is largely negated by the overhead incurred. The combined filter reduces the transition tables even further compared to the symbol filtering technique, but for the Mono grammars it is more profitable to opt for symbol filtering alone instead of the combined filtering. The combined filtering technique can however become interesting for grammars that contain many subtrees that are only present at certain indices of certain terminals.

4.3.5 Conclusion

The goal of these experiments was to examine which of the discussed construction algorithms creates an automaton that performs well in tree acceptance and parsing, but is also constructed quickly. Constructions for nondeterministic automata offer small automata combined with very short construction times (even less than 50 milliseconds for the Mono grammars). However, nondeterministic automata are not practical in acceptance and parsing algorithms, due to the fact that the nondeterminism leads to a growing number of states that need to be processed for each transition taken in the automaton. This means that accepting and parsing trees using nondeterministic automata can take a lot of time. The deterministic automata eliminate this problem. However, as shown, the standard dfrta construction is disappointing: it constructs automata that are too large for real life applications, even when one uses the Proper-N or Proper-S item set. Fortunately there are promising results for the constructions that use filtering techniques to reduce the size of the transition tables in dfrtas. Especially the construction with symbol filtering, which takes advantage of subtrees that can only be found as child trees of certain terminals (which is often the case for instruction selection grammars), performs well. This symbol filtering reduces the number of transitions to 0.6% of those of the standard construction for the Mono X86 grammar (using the Proper-N item set in both cases). Figures 4.23 and 4.24 give an overview of the size of the dfrtas and nfrtas constructed from the Mono grammars using the Proper-N item set. They show how the number of transitions and the memory usage change when the different filtering techniques are applied for the constructions of the Mono grammars.

Figure 4.23: Relative number of transitions of the Mono dfrtas and nfrta constructed with the Proper-N item set

Figure 4.24: Relative memory usage of the Mono dfrtas and nfrta constructed with the Proper-N item set

Which filtering technique is the most efficient depends on the shape of the grammar. Automata for grammars that contain subtrees that are only present at certain indices should be constructed using index filtering. If the subtrees are tied to certain terminals, then one should opt for symbol filtering. Combined filtering becomes profitable for grammars that have both properties. Summarized, the most important fact is that the dfrtas constructed with a well-chosen filtering technique are not much larger (around a factor 3 for the Mono grammars) than the nondeterministic variants, while still offering the property of being deterministic. Automata constructed with these filter based construction algorithms are therefore very promising for real world acceptance, matching and parsing applications.

4.4 DFRTA based tree parsing experiments

The automata discussed in this thesis are very useful for solving the tree acceptance and tree parsing problem. This section discusses the experiments that were carried out with an algorithm, implemented in the toolkit, that determines the lowest cost parse of an input tree with the help of a dfrta. This parsing algorithm [FSW94, Section 7] can solve the parsing problem with each type of dfrta discussed in Section 4.3.4. In this section we discuss the parsing algorithm itself and provide example parses for a dfrta constructed with the All-Sub item set and the Proper-S item set. The differences in filtering techniques cannot be seen in the parsing algorithm because they are encapsulated by the automata. These differences only become visible when measuring the execution speeds with the different filtering techniques. These measurements are presented in Section 4.4.2.

4.4.1 The parsing algorithm

The parsing algorithm consists of two phases. The first phase labels the subject tree from bottom to top using the dfrta. This labeling stores for each node which rules can be applied to produce the subtree rooted at that node. Using these labels, the second phase determines from top to bottom which sequence of rules results in the lowest cost parse of the tree. This algorithm is not exactly the same for each type of dfrta. Some small adaptations are needed when using an automaton created from the Proper-N or Proper-S item set instead of the All-Sub item set. We start by discussing the two phases for the All-Sub item set in the two upcoming sections. In the final section we discuss the changes necessary for working with Proper-N and Proper-S.

First phase

The first phase processes the subject tree with the dfrta in a frontier-to-root direction. During the traversal each node is linked to a state (as a normal acceptor would do). Each leaf node with a symbol x is linked to the state with the incoming x-transition. The rest of the tree is linked as one would expect, by taking transitions that match the symbols that are encountered in the tree. Let us illustrate this by constructing a dfrta for the grammar in Example 1.1.3. Figure 4.26 repeats the production rules and shows the match sets of the states, while Figure 4.25 contains the complete automaton (without trap state q2).


Figure 4.25: dfrta constructed with All-Sub

(1) S → a(B, d), (2) S → a(b(c), B), (3) S → c,
(4) B → b(B), (5) B → S, (6) B → d

(a) Production rules of the grammar

q0: {c, B, S}
q1: {d, B}
q3: {b(B), b(c), B}
q4: {a(B, d), B, S}
q5: {b(B), B}
q6: {a(b(c), B), B, S}
q7: {a(b(c), B), a(B, d), B, S}

(b) Match sets of the automaton in Figure 4.25

Figure 4.26: Production rules and match sets of the constructed automaton.

The subject tree that will be parsed with the help of the automaton can be seen in Figure 4.27(a). For this example the algorithm starts by assigning states q0 and q1 to the leaf nodes of the tree. When proceeding on the left, by reading the b node, one arrives in state q3. This process is repeated until the complete tree is visited (see Figure 4.27(b)).


Figure 4.27: Subject tree with and without corresponding states; (a) the subject tree a(b(c), d), (b) the same tree with its matching states (a: q7, b: q3, c: q0, d: q1)

However, the goal of this phase is not just to label all nodes with states, but to retrieve the production rules that should be applied at each node n to produce the subtree α rooted at n as cheaply as possible. The algorithm therefore not only links a state to a node when it visits that node, but also records, by using the match set of the state, which rules could be applied. Let us for now simplify the problem by ignoring chain rules and costs, and by only looking at standard rules. A subtree α rooted at a node n can be produced with an arbitrary nonterminal as starting point. So, what the algorithm does is store in each node, for every nonterminal, which rules can be applied to produce α. These rules are found by comparing each subtree m in the match set of the linked state to each production rule r. If the rhs of r is equal to m, then r can be used to produce α from the lhs nonterminal of r. Let us illustrate this using the example subject tree. Node b is matched by q3, where q3 contains three matches. The match b(B) can be found as the rhs of rule 4 (see Figure 4.28). This means that node b can be produced from nonterminal B (the lhs of rule 4) by applying rule 4. Figure 4.29 shows the result when these production rules are gathered for all the nodes.

Figure 4.28: Subtree b(B) in the match set {b(B), b(c), B} of the b-node matches the rhs of production rule 4

Figure 4.29: Subject tree with applicable rules for each node/nonterminal.
node a:  S: S → a(b(c), B), S → a(B, d);  B: -
node b:  S: -;  B: B → b(B)
node d:  S: -;  B: B → d
node c:  S: S → c;  B: -


The whole subtree rooted at node b can now be obtained from B by applying rule 4 and recursively applying rules for the nonterminals in the rhs of rule 4. Rule 4, for instance, contains the nonterminal B. This nonterminal corresponds to node c in the subject tree. This means that the B production rule for this node must be applied. However, there is no such rule, due to the fact that we ignore chain rules (B → S; S → c would be a possibility). After handling the chain rules, node c will contain B → S; S → c for nonterminal B (see Figure 4.30), and the complete subtree b(c) can be obtained from nonterminal B by applying: B → b(B); B → S; S → c.

A remaining question is: how does the algorithm handle chain rules? As for the automaton constructions, this is done by computing the nonterminal closure. If a production rule r is added for a node, as described above, we compute the closure based on the lhs nonterminal of that rule. For the nonterminals in the closure we store the sequence of chain rules that have to be applied plus the production rule r. The nonterminal closure for nonterminal S is for instance computed when rule S → c is added. This closure computation tells us that chain rule B → S makes it possible to produce subtree c from nonterminal B. Figure 4.30 shows the additional matches that are created by chain rules for the example grammar.

Figure 4.30: Subject tree with matching rules and chain rules.
node a:  S: S → a(b(c), B), S → a(B, d);  B: B → S; S → a(b(c), B) and B → S; S → a(B, d)
node b:  S: -;  B: B → b(B)
node d:  S: -;  B: B → d
node c:  S: S → c;  B: B → S; S → c

The last aspect taken care of by the algorithm is providing the lowest cost parse. It is for instance possible to produce a subtree rooted at a node n by applying different rules from the same nonterminal (see for instance the two possibilities for nonterminal S in the root node in Figure 4.30). What we want is a lowest cost sequence of rules. Let us illustrate the effect of these costs by assigning a cost of 1 to each rule. While determining the matching rules for each node we can then also compute the cost of using each rule. This cost is determined by the cost of the rule itself and the costs that are registered in the nodes that correspond to nonterminal leaves of that production rule. The c-leaf, for example, can be produced from S by a single rule and therefore has cost 1. Producing c from B is more expensive due to the additional chain rule and therefore has cost 2. The b-node above c can be produced from nonterminal B by applying rule B → b(B). The rule itself has cost 1, but we also have to look at the cost for nonterminal B in the rhs of the rule. This nonterminal corresponds to the c-node below, so we have to add the cost of B in this c-node. This results in a final cost of 3. In this way the cost is computed for each sequence of rules added to a node. Figure 4.31 shows the costs for all rules in the tree.


Figure 4.31: Subject tree with minimum cost matching rules.
node a:  S: S → a(b(c), B) (2), S → a(B, d) (4);  B: B → S; S → a(b(c), B) (3) and B → S; S → a(B, d) (5)
node b:  S: - (∞);  B: B → b(B) (3)
node d:  S: - (∞);  B: B → d (1)
node c:  S: S → c (1);  B: B → S; S → c (2)

The root node contains two sequences for each of the nonterminals B and S. However, we are only interested in the lowest cost solution, so we do not have to store the more expensive solutions. The algorithm therefore stores only one record for each nonterminal in a node. This record is updated if a production rule (or a sequence, in case chain rules are involved) is encountered that has a lower cost. This first stage of the algorithm therefore results in the following labeled subject tree:

Figure 4.32: Subject tree after phase 1.
node a:  S: S → a(b(c), B) (2);  B: B → S; S → a(b(c), B) (3)
node b:  S: - (∞);  B: B → b(B) (3)
node d:  S: - (∞);  B: B → d (1)
node c:  S: S → c (1);  B: B → S; S → c (2)

Summarized, this leads to the following approach. For each node n we determine the corresponding state q. For every match m in the match set of q we gather all production rules r that have a rhs equal to m. If applying r is cheaper than the current solution for the lhs nonterminal, then we replace that solution by this new rule r. Finally we compute the nonterminal closure with costs, using the Floyd-Warshall algorithm [Flo62], and update the nonterminals in n if the chain rule sequence plus r is cheaper than the current value for that nonterminal. This results in the following recursive method that visits the nodes of the subject tree:


LabelTree(n, a, rtg)
Input: A node n from the subject tree, a dfrta a and the regular tree grammar rtg from which the automaton is constructed and the parse should be determined.
Output: The automaton state of n (and the subtree rooted at n is annotated).

    vector ← ∅;
    for i ← 0 to n.children.count
        vector[i] ← LabelTree(n.children[i], a, rtg)

    q ← a.nextState(n.symbol, vector);

    foreach match ∈ q.matches
        foreach rule ∈ rtg.productionRules where rule.rhs = match
            cost ← rule.cost + ∑_{l ∈ match.leaves} leafCost(n, l);

            if cost < n.annotation[rule.lhs].cost
                n.annotation[rule.lhs].cost ← cost;
                n.annotation[rule.lhs].rules ← rule;

            foreach s ∈ rtg.alphabet where s.type = nonterminal
                costAndRules ← getClosureCostAndRules(rule.lhs, s);
                if (costAndRules.cost + cost) < n.annotation[s].cost
                    n.annotation[s].cost ← cost + costAndRules.cost;
                    n.annotation[s].rules ← rule concat costAndRules.rules;
    return q;

Second phase

The second phase is less complex than phase one. This phase traverses the subject tree, starting at the root with the start nonterminal of the tree grammar, and applies the rule listed for that nonterminal in the root node. The algorithm then executes the same process for the nonterminals that can be found in the production rule that was applied. These nonterminals correspond to certain nodes in the subject tree, and the rule for that nonterminal in each such node can then again be applied. Let us clarify this with the example subject tree of phase one. The start nonterminal of the used grammar (see Example 1.1.3) is the nonterminal S. The algorithm therefore starts by applying rule S → a(b(c), B) to the start nonterminal (see Figures 4.33 and 4.34). This results in a tree with one nonterminal B. This nonterminal is at the location of the d node in the subject tree. We therefore proceed by applying the production rule listed for B in this node: B → d. This finally results in the complete subject tree.


Figure 4.33: Rules selected by phase 2 (S → a(b(c), B) at the root node and B → d at the d-node)

S ⇒(2) a(b(c), B) ⇒(6) a(b(c), d)

Figure 4.34: Production of the subtree by the selected rules
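A minimal sketch of this second phase in Java; the names are hypothetical, and it assumes for brevity that phase 1 stored, per node, one selected rule per nonterminal and the subject tree node covered by each nonterminal leaf of that rule. Starting from the start nonterminal at the root, it emits the selected rule and recurses for each nonterminal leaf. For the example parse it prints S → a(b(c), B) followed by B → d.

    import java.util.*;

    final class Phase2Sketch {
        record Rule(String lhs, String rhs, List<String> ntLeaves) {}
        static final class Node {
            final Map<String, Rule> annotation = new HashMap<>();  // phase 1 result
            final List<Node> coveredByNtLeaf = new ArrayList<>();  // node per nt leaf
        }
        // Apply the rule selected for `nonterminal` at n, then recurse for
        // every nonterminal leaf of that rule's rhs.
        static void emit(Node n, String nonterminal, List<Rule> out) {
            Rule r = n.annotation.get(nonterminal);
            out.add(r);
            for (int i = 0; i < r.ntLeaves().size(); i++)
                emit(n.coveredByNtLeaf.get(i), r.ntLeaves().get(i), out);
        }
        public static void main(String[] args) {
            Node root = new Node(), dNode = new Node();
            root.annotation.put("S", new Rule("S", "a(b(c), B)", List.of("B")));
            root.coveredByNtLeaf.add(dNode);               // B covers the d node
            dNode.annotation.put("B", new Rule("B", "d", List.of()));
            List<Rule> parse = new ArrayList<>();
            emit(root, "S", parse);
            for (Rule r : parse) System.out.println(r.lhs() + " -> " + r.rhs());
        }
    }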

Adaptations for Proper-N and Proper-S

dfrtas created with the Proper-N and Proper-S item sets differ from those created with All-Sub, because the match sets of the states do not contain all rhs trees of production rules as subtrees. This creates a problem, because in the first phase the algorithm compares all subtrees from the match sets with the rhss of the production rules. The absence of some of these trees means that some rules are not coupled to nodes. To illustrate this problem we again parse the subject tree of Figure 4.27(a), but now using an automaton constructed from the Proper-N item set. Figure 4.35 shows this automaton (again without trap state q2) and Figure 4.36 shows the corresponding match sets.

Figure 4.35: dfrta constructed with Proper-N


q0: {c, B, S}
q1: {d, B}
q3: {b(c), B}
q4: {B, S}
q5: {B}

Figure 4.36: Match sets of the automaton in Figure 4.35

Linking the nodes to states is done in the same way as when using the All-Sub item set. Figure 4.37 shows the resulting labeled subject tree. The problem becomes visible if we look at the match set of state q3 for the b-node of the subject tree. The match set of this state does not contain b(B), as it did in the All-Sub case, because b(B) is not a proper subtree. The result of this is that the rule B → b(B) would not be registered in this node if we proceeded with the standard approach.

Figure 4.37: Subject tree with matching states (a: q4, b: q3, c: q0, d: q1)

Luckily there is a solution to this problem. Instead of comparing the subtrees in the match set of a node n to the rhss of the production rules, we compare the rhss of the production rules to the match sets of the child nodes of n, in combination with the symbol at n. In other words, the algorithm has to check two things at node n for each production rule r:

• Is the symbol of the root node of the rhs of r equal to the symbol of node n?
• Is there, for each subtree at index i of the root node of r, an equal subtree in the match set of the ith child node of n?

If both checks are positive, then the production rule r can be used to produce the subtree rooted at n. Let us look at the b-node in the subject tree and rule B → b(B) to clarify this approach (see Figure 4.38). First we have to compare the symbol b of the node to the root node of the production rule. These are both b, so we proceed by comparing the subtrees in the match set of the c-node (the first and only child node of b) to the first subtree of the root node of the production rule (B). The c-node has match set {c, B, S}, in which B is contained. Thus the first child node of n matches the first child node of the rhs of the production rule. This production rule can therefore be used to produce the b-node.
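These two checks can be sketched compactly (hypothetical types, not ForestFIRE's API): compare the root symbols, then look up each child of the rule's rhs in the match set of the corresponding child node. For rule B → b(B) at the b-node with child match set {c, B, S} this returns true, as argued above.

    import java.util.*;

    final class ProperNMatchSketch {
        record Tree(String symbol, List<Tree> children) {}
        static boolean applies(Tree rhs, String nodeSymbol,
                               List<Set<Tree>> childMatchSets) {
            if (!rhs.symbol().equals(nodeSymbol)) return false;  // check 1: root symbols
            for (int i = 0; i < rhs.children().size(); i++)      // check 2: per index
                if (!childMatchSets.get(i).contains(rhs.children().get(i)))
                    return false;
            return true;
        }
        public static void main(String[] args) {
            // Rule B -> b(B) at the b node; child match set {c, B, S}.
            Tree b_ = new Tree("B", List.of()), c = new Tree("c", List.of());
            Tree s = new Tree("S", List.of());
            Tree rhs = new Tree("b", List.of(b_));
            System.out.println(applies(rhs, "b", List.of(Set.of(c, b_, s)))); // true
        }
    }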


Figure 4.38: Node b and the subtree B in the match set of its child match the rhs of production rule B → b(B)

So instead of comparing two subtrees, the algorithm has to compare two symbols and a collection of subtrees. The rest of the algorithm is equal to the All-Sub variant. This small difference makes the Proper-N/Proper-S variant more complex than the All-Sub variant, which could result in longer running times. However, Proper-N and Proper-S automata can be constructed faster than All-Sub automata. Section 4.4.2 compares the two algorithms and measures whether the gain offered by the construction algorithms for the Proper-N and Proper-S automata is not negated by the more complex parsing algorithm.

4.4.2 Automaton comparison

The parsing algorithm described above can parse trees with the help of all types of dfrtas described in Section 4.3.4, but it can perform differently depending on the type of dfrta used. As discussed earlier, there are two variants of the algorithm: one that can only be used for automata constructed with the All-Sub item set, and another that can also be used for automata constructed with the Proper-N and Proper-S item sets. Furthermore, there are the different filtering techniques used inside the automata. These filtering techniques are not visible to the algorithm. However, they can influence the performance of the algorithm, because they influence the state retrieval complexity. This section therefore measures the running time of the parse algorithm when using the different types of dfrtas. We have chosen to execute this parsing algorithm for the Mono X86 and Mono IA64 grammars, in combination with a collection of generated trees for each of the grammars. These collections contain one hundred trees with a total of 71504 and 73248 nodes, respectively. Figure 4.39 shows the running time of these parses for both grammars.


Figure 4.39: Parse time for two grammars and two example collections of trees; (a) parse time for the Mono X86 grammar and a collection of one hundred subject trees (71504 nodes in total), (b) parse time for the Mono IA64 grammar and a collection of one hundred subject trees (73248 nodes in total)

The graphs show that all filter techniques perform similarly and that differences are mostly found between the automata constructed from the All-Sub item set and the automata constructed from the Proper-N or Proper-S item set. This performance difference between the item sets is caused by the two variants of the parsing algorithm. Table 4.13 shows the relative parse times for the Proper-S item set compared to the All-Sub item set.


Filtering technique          X86      IA64
No filtering                 137.0%   124.1%
Subtree filtering            137.7%   125.8%
Index filtering              137.4%   125.6%
Symbol filtering             136.1%   124.2%
Symbol & index filtering     135.0%   122.7%

Table 4.13: Parse time with dfrtas constructed with Proper-S item set compared to the All-Sub item set, for the different filter techniques

These results show that parsing with dfrtas constructed from the Proper-S (and also the Proper-N) item set is roughly 22% to 38% slower than with their All-Sub counterparts for these two example collections of trees. Section 4.3.4, however, shows that dfrtas can be constructed faster when using the Proper-N or Proper-S item set instead of All-Sub. Whether Proper-N/Proper-S or All-Sub is the most efficient choice depends on the usage. If the dfrta is constructed for parsing a single small tree, then the construction time of the dfrta has the largest impact on the total running time. When a single automaton is used many times for parsing large trees, it could be wiser to opt for the All-Sub item set, because most of the processor time is then consumed by the parsing algorithm. In summary, the filtering technique used in the dfrtas does not have a noticeable impact on the running time of the parsing algorithm. The item set used does have an impact, but which item set is the most profitable depends on the size of the subject trees and the number of parses executed with a single dfrta.

5 Conclusions

This chapter gives an overview of the achievements and results, provides suggestions for future research, and closes with an evaluation of the project.

5.1 Results

The master's project resulted in a toolkit, a GUI, and a collection of results from the conducted experiments. Before providing an overview of the conclusions of the experiments, we highlight the key features of the constructed toolkit and GUI.

Toolkit and GUI

The ForestFIRE toolkit implements a large collection of data structures and algorithms related to trees. This collection contains the data structures necessary for representing the domain concepts (e.g. trees, tree grammars, tree patterns, and tree automata) and implementing the tree algorithms. The tree algorithms themselves mainly focus on three areas:

• Tree grammar transformations
• Tree automaton constructions
• Tree parsing and acceptance

The group of transformation algorithms consists of algorithms that remove chain rules (RED-U), non-root terminal nodes (RED-Z), and useless symbols and/or production rules. The second group contains algorithms that construct (ε)nfrtas, (ε)nrftas and (optimized) dfrtas from tree grammars. The final group consists of a set of algorithms that solve the tree acceptance and tree parsing problems by using tree automata. In addition to these algorithms, a collection of supporting algorithms was implemented.

Let us look back at the original requirements for the toolkit. As described in the introduction of this thesis, the goal was to implement as many as possible of the features listed in Appendix A. The features listed in the original assignment description (Section A.1) are divided into three groups, each with a different priority. Almost all data structures and algorithms listed in the most important group are implemented. Tree automaton constructions that construct automata from pattern sets, and constructions based on dotted rules/trees, are not implemented due to the restricted time frame. Unfortunately there was also not much time left for the other two groups, which resulted in only one implemented feature from these groups: the operation for generating random trees from tree grammars.


However, the remaining features can be implemented very easily, thanks to the large collection of basic data structures and algorithms the toolkit provides. This advantage was also encountered during the implementation of the parsing algorithm: the availability of tree automata and operations to compare trees made it possible to implement the parser within a few hours. The additional list of requirements in Section A.2 describes requirements for the data structures that implement the basic domain concepts. These requirements consisted of special operations and wellformedness rules. All of these are realized in ForestFIRE as class methods and invariants, respectively.

The FIREWood GUI is a very useful addition to the toolkit, because it provides easy access to the algorithms inside the toolkit. Furthermore, the GUI makes it possible to load user-defined trees and tree grammars from special text files. FIREWood converts these structures into corresponding ForestFIRE data structures, which can then be used as input for one of the algorithms. FIREWood was a very useful addition to the toolkit during the experiments, but it is not just targeted at the currently implemented algorithms. FIREWood was, like ForestFIRE, built with extensibility in mind, so new algorithms implemented for ForestFIRE can be made accessible very easily. I therefore want to conclude by saying that ForestFIRE and FIREWood are practical tools for experimenting with the currently implemented algorithms, and that they have large potential for supporting an even larger collection of tree algorithms.

Tree grammar transformation experiments

The tree grammar transformation experiments focused on different optimization techniques and orders for removing chain rules (using RED-U) and non-root terminal nodes (using RED-Z) from a grammar. This section contains a compact overview of the results of these experiments:

1. The optimal transformation order (RED-Z/RED-U) depends on the grammar shape when no reuse is applied in the RED-Z steps.
2. Reusing nonterminals and production rules in RED-Z(*) reduces the number of rules to 64–78% of the no-reuse case for the used Mono grammars.
3. The shortest tree first node selection strategy for the RED-Z transformation steps results in the most efficient reuse.
4. The shortest tree first reuse removes the differences between the different transformation orders.

Tree automaton construction experiments

The tree automaton experiments measured the efficiency and size of the resulting tree automata for different automaton construction techniques. This is an overview of the most important conclusions of these experiments:


1. Constructions of nondeterministic automata are considerably faster than the constructions of deterministic ones.
2. Constructions of nfrtas/nrftas for the Mono grammars result in smaller automata (e.g. 38–39% fewer transitions) when using the Proper-N or Proper-S item set instead of the All-Sub item set.
3. The Standard dfrta construction is unusable in practice due to long running times and extreme memory use.
4. Filtering techniques for the dfrtas reduce memory usage and construction time such that these constructions perform almost as well as the constructions for the nondeterministic automata.
5. For the Mono grammars the symbol filtering technique performed best. This filter technique will probably also perform well for other grammars related to instruction selection.

Tree parsing experiments

The final group of experiments measured the effect of using different types of dfrtas in a parsing algorithm that solves the parsing problem using these automata. These are the main results of these experiments:

1. dfrtas constructed with the All-Sub item set perform better than those constructed with Proper-N or Proper-S, due to the adaptations needed in the parsing algorithm for automata created with Proper-N or Proper-S. These adaptations resulted in roughly 22–38% longer running times.
2. There is no notable difference between the different filter techniques used in the dfrtas.

5.2 Recommendations for future work

This master's project raised some new questions related to the three groups of experiments. This section provides a list of interesting research subjects, grouped by these three experiment areas.

Tree grammar transformations:

1. Measure the effect of reusing original nonterminals with more than one occurrence as lhs in the RED-Z(*) transformation.

Tree automaton constructions:

1. Measure the influence of tree automaton constructions with U−/Z− grammars instead of U+/Z+.


2. Compare constructions based on subtrees to constructions based on dotted rules.
3. Study additional transition table reduction techniques.

Tree parsing:

1. Investigate optimization techniques for parsing with automata constructed from the Proper-N and Proper-S item sets.
2. Compare the implemented parsing algorithm to other existing tree parsing algorithms.

5.3 Evaluation

This master's project was completed in eight months. Like any project, it had some pleasant parts and some less pleasant parts. Before discussing these, it should be mentioned that tree algorithms are a very interesting topic due to their broad application area (instruction selection, term rewriting, genetics, etc.).

The toolkit building process and the experiments were the nicest parts of the project, especially when the construction of the toolkit was near completion. The final algorithms could then be implemented elegantly by reusing already implemented data structures and algorithms. Warshall's algorithm, for instance, is implemented in a single class, which could be reused in many other algorithms.

The experiments had the pleasant property that they helped to tune the implementation of the more complex algorithms, like the automaton construction algorithms. Another advantage of these experiments was that they revealed unexpected behavior. An example of this can be seen with the grammar transformations: the expectation was that a certain RED-Z/RED-U order would always perform equally well or better than another order, but the experiments with these transformations suddenly provided contradictory results. This would probably not have been discovered without the experiments.

Unfortunately, there were also some drawbacks. The initially chosen development environment, Lazarus, caused many problems, because its LCL library was not as platform independent as claimed by its developers. This caused some delay in the development process, because all Lazarus code had to be converted to Java code. Fortunately, this went faster than expected due to my C# knowledge.

Overall, it was a pleasant and instructive project, which introduced me to the topic of tree algorithms and which, with the cooperation of Loek Cleophas and Kees Hemerik, resulted in a practical toolkit and some useful experimental results.

A MSc Assignment description

This appendix contains the description of the original master's assignment and an additional list of requirements that described the main goals of this project. The assignment description introduces the domain and provides three lists of features and experiments, sorted by importance, that needed to be implemented or performed. The list with additional requirements focuses on specific operations that should be implemented for the data structures that represent the main domain concepts. This list contains operations like: 'retrieve the number of chain rules in a grammar'.

A.1 Original Assignment description

Context

The MSc assignment will take place in the context of the PhD research of Loek Cleophas, which is related to regular tree languages. The area of regular tree languages has a rich theory, with many results that are generalizations of regular string languages, and many relations between the two areas. Parts of this theory have broad (potential) applicability in a number of areas, among which are code generation in compilers (particularly for instruction selection or optimization) and term rewriting.

Underlying these and other practical applications are the following three important algorithmic problems within the field of regular tree languages:

1. Tree acceptance. Given a regular tree grammar and an input tree, determine whether the input tree can be generated by the regular tree grammar, i.e. is part of the language denoted by the regular tree grammar.
2. Tree pattern matching. Given a finite, non-empty set of trees (the pattern set) and an input tree, find the set of all occurrences of the patterns in the input tree.
3. Tree parsing. Given a regular tree grammar and an input tree, determine all parses of the input tree that can be generated by the regular tree grammar. A variation of this problem that is often used is to determine a parse that is optimal (with respect to some cost function).

These problems are related (e.g. tree parsing generalizes/extends tree acceptance, and tree pattern matching is used in tree acceptance algorithms) and involve many of the same algorithmic ingredients. Many algorithms solving these problems have been described in the literature. Unfortunately, a number of deficiencies exist:

1. The field is rather inaccessible. Much of the theory is scattered over the literature, with only a few overview publications, and none of them being algorithm oriented.
2. The algorithms that have been published are hard to compare due to differences in presentation style and level of formality.


3. Many practical algorithms have been published with little or no reference to the theory or correctness arguments.
4. No large collection of implementations of the algorithms solving these algorithmic problems exists.
5. It is hard to choose between different algorithms for practical applications.

To solve these deficiencies, an overview of relevant parts of the theory has been constructed and literature research to find algorithms solving the above problems has been performed. Based on these, the algorithms have been rephrased in a common presentation style and a preliminary classification of algorithms for tree acceptance (currently bottom-up/frontier-to-root ones only) and tree pattern matching has been created. Taken together, this work helps to solve the first three deficiencies mentioned.

Assignment Goal & Expected Results

The proposed MSc assignment has the broad goal of providing a starting point for solving the fourth and fifth deficiencies, based on the results from Cleophas's PhD research. Ideally it should result in:

• an (extendable) collection of algorithms & data structures related to trees, in the form of a toolkit ('Forest FIRE'); this collection should implement foundational data structures & algorithms related to tree acceptance, tree pattern matching and tree parsing algorithms; time permitting, some of these algorithms themselves should be implemented;
• a graphical environment ('FIRE Wood') to experiment with and compare the implemented algorithms & data structures;
• a comparison of the efficiency and trade-offs involved in such algorithms & data structures;
• reporting on the assignment, in the form of regular meetings with the tutor and/or supervisor, an intermediate presentation, and an MSc thesis together with its oral presentation and defense.

The extent to which these results can be obtained will likely be limited by the fixed amount of time available for an MSc project (1120 hours, i.e. 40 ECTS * 28 hours). Realistically, the extent to which the first three results are completed is likely to be limited. Further on, features will be classified as either required, desired or nice to have. The required features together with the mandatory reporting form the minimal requirements for completion of the assignment. (Even if desired or nice to have features are excluded due to time constraints, the design & implementation of included features should be extendable so excluded features can be included at a later time.)

Planning

A detailed planning should be created by the student and discussed with his tutors/supervisors during the first phase of the assignment. Roughly, the assignment will be divided into the following phases:

1. read & investigate: draft PhD thesis, existing implementations (ATerms, Timbuk, Treebag, TWIG, BEG, iBURG, ...); create detailed planning


2. define general toolkit & GUI structure, interfaces, representations (for required features, taking desired and nice to have ones into account), decide on implementation language (Delphi? Java? ...?); report on this
3. implement required features; report on this
4. experiment with resulting toolkit & GUI; report on this
5. time permitting, repeat last two steps for desired features and for nice to have features
6. create final report, give presentation, thesis defense

As mentioned earlier, the total amount of time available for the assignment is 1120 hours; how this time is (planned to be) divided over the various phases will be part of the detailed planning created in the first phase.


Required Features

Data structures
• trees over ranked alphabets; without variables, with a single variable, with multiple variables (?)
• sets of trees (pattern sets)
• regular tree grammars (RTGs), consisting of a nonterminal set, a ranked terminal alphabet, production rules (form A → α, or more generally β → α) and a start nonterminal
• tree automata (TAs); directed frontier-to-root (FR), root-to-frontier (RF); nondeterministic with and without epsilon-transitions ((epsilon-less) NFRTA / NRFTA), deterministic (DFRTA / DRFTA)

Issues:
• what alternative representations are there? look at literature to see what is used and why; other representations? (dis)advantages of them
• for TAs in particular, how to represent transition tables; different per TA kind? how to compress transition tables (e.g. use general table compression techniques, use filtering techniques based on the possibility of subpatterns occurring as the i-th son of symbol a, use maximal subterm sharing of ATerms?)

Input/Output
• ability to define ranked alphabets, (ordered ranked) trees, pattern sets, nonterminal sets, production rules, RTGs
• ability to import and export trees, pattern sets, RTGs, TAs

Algorithms
• RTG transformations
  – removing unit productions (e.g. A → B)
  – removing productions whose rhs has any non-root node labeled by a terminal symbol
• TA constructions & transformations
  – construction of TA based on RTG
  – construction of TA based on tree pattern set
  – construction based on subtrees vs. on dotted trees/production rules
  – directed FR, RF; nondeterministic(?), deterministic

Experiments
• investigate influence of RTG transformations on RTG size, influence of order of the transformations; effect of transformations on resulting TAs
• compare time and space efficiency of TA constructions


Desired Features

Data structures
• Aho-Corasick automata (ACAs); basic goto-version, version specific to use for tree stringpath matching (see Prague Stringology Conference 2005 paper)

Issues:
• what alternative representations are there? look at literature to see what is used and why; other representations? (dis)advantages of them

Input/Output
• ability to generate (pseudo-random) trees, pattern sets, RTGs

Algorithms
• TA constructions & transformations
  – allow separate transformations to remove epsilon-transitions, perform (reachability based) subset construction, in addition to direct construction of epsilon-less and deterministic TAs (?)
• ACA constructions (see PSC 2005 paper)
  – basic goto-version
  – stringpath-specific version

Experiments
• compare time and space efficiency of TA and ACA constructions
• compare RTG transformations followed by TA construction to direct TA construction followed by TA transformations


’Nice to Have’ Features

Data structures
• trees over sorts
• rewrite rules
• extension to hedge trees/ordered unranked trees (trees whose nodes have a list of children; these can be used to model XML trees)

Input/Output
• ability to define, import, export rewrite rules
• ability to generate C code for rewriting

Algorithms
• TPMn/TGAn algorithms (tree pattern matching resp. tree (grammar) acceptance, for every subject node)
  – on-the-fly FR/BU
  – FR/BU using DFRTA; with/without filtering, other compression techniques (?)
  – RF/TD (see PSC 2005 paper)
    ∗ using DRFTA
    ∗ using ACA
    ∗ using stringpath-specific ACA

Experiments
• compare speed of TPMn/TGAn algorithms using different TA/ACA kinds
• effect of TA transformations on TPMn/TGAn algorithm speed


A.2 Additional data structure requirements

This section contains a list of requirements for the domain concepts that needed to be implemented in the toolkit. This list was defined in addition to the original assignment description, which can be found in Section A.1.

Original content:

This document gives a brief overview of the types of analyses, transformations and other operations needed for trees, pattern sets and Regular Tree Grammars, together with references giving more details on their definition or use. It does not include very basic analyses and operations, e.g. creating these objects or counting the number of alphabet symbols.

Trees
• analysis
  – wellformedness with respect to a ranked alphabet: does a tree use symbols from the alphabet only, and respect the ranking function? (likely not an analysis to be invoked explicitly; instead, to be enforced at tree creation time)

Pattern sets
• analysis
  – number of patterns
  – wellformedness, see above
• operations
  – obtain dotted tree set for it (see "Tree notions" in handwritten notes, IJFCS paper Definition 11)
    ∗ analysis:
      · number of elements
      · number of elements when flattened (see handwritten notes for definition of flattening function fl)
    ∗ operation: flattening (to obtain subtree set for pattern set, as below)
  – obtain subtree set for it (see IJFCS paper page 4 underneath Def. 4; expressible using fl and previously mentioned operation to obtain dotted tree set)
    ∗ analysis:
      · number of elements

Regular Tree Grammars
• analysis
  – number of
    ∗ rules


    ∗ nodes, summed over all rules
    ∗ chain rules
    ∗ non-root nodes labeled by terminal symbol, summed over all rules
    ∗ rules with non-root nodes labeled by terminal symbol
  – wellformedness, see above
  – existence of (can be expressed in terms of "number of" analyses)
    ∗ chain rules
    ∗ non-root nodes labeled by terminal symbol, over all rules
  – "usability" (see thesis draft, Section 4.3.1.2 Removing Useless Symbols and Productions)
    ∗ reachable/unreachable
      · terminals
      · nonterminals
      · production rules
    ∗ productive/unproductive
      · terminals
      · nonterminals
      · production rules
    ∗ useful/useless: useful if and only if reachable and productive, useless if and only if not useful
• transformations; for each one, allow it to be performed one rule at a time (either with manual or with automatic pseudo-random selection), or for all rules at once; default behavior is to make a copy of the RTG and perform the transformations on this copy; perhaps provide a setting to toggle between this and not making a copy & modifying the original RTG
  – remove useless (see Transformation 4.3.21 in Section 4.3.1.2)
    ∗ productions
    ∗ nonterminals
    ∗ terminals
  – perhaps a variant of remove useless that only removes useless productions & nonterminals, leaving the associated terminal alphabet unchanged
  – remove chain rules (see Sections 6.1.1 and 6.1.2)
  – remove rules with non-root nodes labeled by terminal symbol (see same)
• (other) operations
  – obtain dotted rule set for it (see "Tree notions" in handwritten notes)
    ∗ analysis:
      · number of elements


      · number of elements when right-flattened (see handwritten notes for definition of right-flattening function rfl)
    ∗ operation: right-flattening (to obtain the subtree set for the rules' right hand sides, as below)
  – obtain right hand sides' subtree set for it (see handwritten notes; expressible using rfl and the previously mentioned operation to obtain the dotted rule set)
    ∗ analysis:
      · number of elements


B Formal definitions

This appendix contains all formal definitions of the concepts described in the Domain Chapter. All these definitions are taken from [Cle07].

B.1 Tree related definitions

Definition B.1.1 (Tree domain) Given a set of edge labels E, a tree domain is a finite non-empty subset D of E∗ such that pref(D) ⊆ D, i.e. D is prefix-closed. In particular, ε ∈ D for any tree domain D. We use · to indicate concatenation of elements of E. Unless explicitly noted otherwise, we assume the edge label set E to be N+, the positive natural numbers. □

Definition B.1.2 (Tree) Given a tree domain D and an alphabet Σ, a (node labeled) tree t is a function t ∈ D → Σ. We use t(n) for the label of a node n ∈ D. 

Definition B.1.3 (Ranked alphabet) A ranked alphabet is a pair (Σ, r) such that Σ is an alphabet (a finite, non-empty set of symbols) and r ∈ Σ → N is a ranking function. For a ∈ Σ, we call r(a) the rank or arity of a. 

Definition B.1.4 (Ranked Tree) A ranked tree is a node labeled tree t whose alphabet is a ranked alphabet (Σ, r) and for which, for all n ∈ D,

r(t(n)) = ⟨# i : i ∈ E ∧ n · i ∈ Dt : i⟩. □

Definition B.1.5 (Ordered tree domain, ordered tree) A tree domain D is ordered if and only if the underlying edge label set E is well ordered (i.e. has a minimal element and is totally ordered) and, for all n ∈ D and i ∈ E, n · i ∈ D ⇒ ⟨∀ j : j ∈ E ∧ j ≤ i : n · j ∈ D⟩. A tree is ordered if and only if its tree domain is ordered. □

B.2 Tree grammar related definitions

Definition B.2.1 (rtg, Regular Tree Grammar) A regular tree grammar (rtg) G is a 5-tuple (N, Σ, r, Prod, S) such that

• N is an alphabet, the nonterminals


• Σ is an alphabet, the terminals
• N ∩ Σ = ∅
• (N ∪ Σ, r) is a ranked alphabet with N = N0 (i.e. nonterminals have rank 0)
• Prod ⊆ N × Tree(N ∪ Σ, r), the finite set of (production) rules or productions
• S ∈ N0, the start symbol □

Definition B.2.2 (Reachable symbol) Let G be an rtg. A symbol X ∈ N ∪ Σ is reachable from a nonterminal A if and only if

⟨∃ α : α ∈ Tree(N ∪ Σ, r) ∧ A ⇒* α : X ∈ α(Dα)⟩

(Note that α(Dα) denotes the set of all node labels of tree α.) □

Definition B.2.3 (Start-reachable symbol) Let G be an rtg. A symbol X ∈ N ∪ Σ is start-reachable if and only if it is reachable from the start symbol S. 

Definition B.2.4 (Productive nonterminal, productive terminal) Let G be an rtg. A nonterminal B ∈ N is productive if and only if

⟨∃ t : t ∈ Tree(Σ, r) : B ⇒* t⟩

All terminals are productive (assuming Σ0 ≠ ∅). □

B.3 Tree automata related definitions

Definition B.3.1 (Tree Automaton) A tree automaton (ta) M is a 6-tuple (Q, Σ, r, R, Qra, Qla) such that

• Q is a finite set, the state set
• (Σ, r) is a ranked alphabet
• R = ⟨Set a : a ∈ Σ : Ra⟩ ∪ {Rε} is the set of transition relations, where Ra ⊆ Q × Q^n for all a ∈ Σn, and Rε ⊆ Q × Q (the epsilon transition relation)
• Qra ⊆ Q, the root accepting states
• Qla ⊆ Q, the leaf accepting states, defined by Qla = ⟨Set a, q : a ∈ Σ0 ∧ (q, ()) ∈ Ra : q⟩ □

Definition B.3.2 (ε-Nondeterministic Root-to-Frontier Tree Automaton) An ε-nondeterministic root-to-frontier tree automaton (εnrfta) M = (Q, Σ, r, R, Qra, Qla) is a ta where Ra ∈ Q → P(Q^n) for all a ∈ Σn, such that q⃗ ∈ Ra(p) ≡ p Ra q⃗, and Rε ∈ Q → P(Q) such that q ∈ Rε(p) ≡ p Rε q. □


Definition B.3.3 (ε-Nondeterministic Frontier-to-Root Tree Automaton) An ε-nondeterministic frontier-to-root tree automaton (εnfrta) M = (Q, Σ, r, R, Qra, Qla) is a ta where Ra ∈ Q^n → P(Q) for all a ∈ Σn, such that p ∈ Ra(q⃗) ≡ p Ra q⃗, and Rε ∈ Q → P(Q) such that p ∈ Rε(q) ≡ p Rε q. □

Definition B.3.4 (Deterministic Root-to-Frontier Tree Automaton) A deterministic root-to-frontier tree automaton (drfta) M = (Q, Σ, r, R, Qra, Qla) is an nrfta where Ra ∈ Q → Q^n for all a ∈ Σn (i.e. the Ra are functions yielding a single state tuple for every state) and Qra = {qra} (i.e. there is a unique root accepting state). □

Definition B.3.5 (Deterministic Frontier-to-Root Tree Automaton) A deterministic frontier-to-root tree automaton (dfrta) M = (Q, Σ, r, R, Qra, Qla) is an nfrta where Ra ∈ Q^n → Q for all a ∈ Σn (i.e. the Ra are functions yielding a single state for every state tuple). □


C ForestFIRE library

This appendix contains descriptions of all important classes (related to domain concepts and algorithms) and invariants in the ForestFIRE library. These descriptions are divided over the different concepts: a section is devoted to each concept (tree, tree grammar, tree pattern and tree automata). The chapter starts, however, with an introduction to some basic collection types that are part of the data structures described in this appendix. It must be noted that the classes and interfaces presented here are not implemented exactly as described, due to restrictions of Java, the target programming language. Java, for instance, does not support properties, so the properties in this appendix are implemented as getter and setter methods. However, all the functionality described here is available in the library.
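To illustrate this translation, the following minimal Java fragment shows how a read/write property such as Capacity could be rendered as a getter/setter pair; the class and field names are made up for the example and are not part of the ForestFIRE API.

// Illustrative only: an Object Pascal-style property 'Capacity: Integer'
// rendered in Java as a getter/setter pair.
public class PropertyExample {
    private int capacity;

    public int getCapacity() { return capacity; }            // property read
    public void setCapacity(int value) { capacity = value; } // property write
}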

C.1 Basic collections

ForestFIRE uses a selection of collection types to represent (parts of) domain concepts (e.g. alphabets, lists of nodes). This section will discuss the most important types of collections that are used in ForestFIRE:

• List
• Dictionary
• Set

The next three sections will discuss the characteristics, interfaces and implementation issues for each of these collection types.

C.1.1 List

A list is one of the most well known collection types. A list is a sequence of items, where each item in the list is accessible with an integer index. Many operations are based on these integer indexes: one can add items to the back of such a list or insert an item at a certain index. This was the designed interface (in Object Pascal syntax) of a list for items of type T:

Properties
  Count: Integer
    Returns the number of items that are contained in the list.
  Capacity: Integer
    Returns or sets the maximum number of items that can be stored in the list.
  Item[index: Integer]: T
    Returns or stores (only possible when the index already exists) the item at the defined index.


Methods
  Add(item: T): Integer
    Adds the item to the end of the list and returns the index of that position.
  Insert(item: T, index: Integer)
    Inserts an item into the list at the position defined by the index.
  Remove(item: T): Integer
    Removes the defined item from the list if it is present and returns the position at which the item was stored.
  IndexOf(item: T): Integer
    Returns the index of the defined item if it is present in the list.

Most of the properties and methods that are introduced above are straightforward. However, we also introduce a capacity property. This property provides the possibility to set the maximum size of the list, which can be very useful, for instance, when defining the list of child nodes: the number of child nodes is fixed by the rank of the symbol, and the capacity of the list can then be used to ensure that no more children are added to the node.

The remaining question was: how to implement such a list? There are many ways to do this: a linked list, a dynamic array, etc. However, most modern programming languages already provide this type of collection. The list was therefore implemented by reusing the standard list of the programming language library (ArrayList in Java). A wrapper was created around that list to add the functionality that was not directly supported by the standard implementation. This resulted in a new list that provides the interface described above.
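A minimal sketch of such a wrapper is given below. The class name BoundedList and the details of the capacity check are illustrative assumptions; the actual ForestFIRE list class may differ.

import java.util.ArrayList;

// Sketch: a list backed by java.util.ArrayList, extended with the Capacity
// behavior described above (the standard class does not enforce a maximum size).
public class BoundedList<T> {
    private final ArrayList<T> items = new ArrayList<>();
    private int capacity = Integer.MAX_VALUE;

    public int getCount() { return items.size(); }

    public int getCapacity() { return capacity; }
    public void setCapacity(int capacity) { this.capacity = capacity; }

    // Adds the item at the end and returns its index; rejects overflow.
    public int add(T item) {
        if (items.size() >= capacity)
            throw new IllegalStateException("list is at capacity");
        items.add(item);
        return items.size() - 1;
    }

    public void insert(T item, int index) {
        if (items.size() >= capacity)
            throw new IllegalStateException("list is at capacity");
        items.add(index, item);
    }

    // Removes the item if present and returns the index it occupied (-1 otherwise).
    public int remove(T item) {
        int index = items.indexOf(item);
        if (index >= 0) items.remove(index);
        return index;
    }

    public int indexOf(T item) { return items.indexOf(item); }

    public T get(int index) { return items.get(index); }
}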

C.1.2 Dictionary

A Dictionary is a collection type that stores key/value pairs, where each key can be used only once. These keys are used as a kind of index to store or retrieve the value items. The types used for these keys and values depend on the application: sometimes one wants to use strings as keys (just like a normal word dictionary), but it can also be any other type of data. The following interface (in Object Pascal syntax) was designed for a dictionary that stores items of type V by using keys of type K:

Properties
  Count: Integer
    Returns the number of key/value pairs that are contained in the dictionary.
  Item[key: K]: V
    Returns or stores (only possible when the key already exists) the item for the defined key.


Methods
  Add(key: K, value: V)
    Adds the value with the specified key to the dictionary.
  Remove(key: K)
    Removes the item from the dictionary that is stored for the defined key.
  ContainsKey(key: K)
    Returns a truth value that indicates whether the defined key is used as a key within the dictionary.
  ContainsValue(value: V)
    Returns a truth value that indicates whether the defined value can be found within the dictionary.

We see that the Insert and IndexOf methods are not present, due to the use of a different indexing strategy. There is also no capacity property, because this functionality is not strictly necessary for our goals. The main role of this dictionary type, as annotation container, will be presented in Section C.2. This role does not need the possibility to define a maximum size, because it depends on the user of the tree how many items are stored in this annotation dictionary.

What remains is to discuss how this collection type can be implemented. The most common way to implement such a structure is by using a hash table; one should then define a hash function that computes a hash value for the types of objects that are used for indexing. The target programming language Java already provides an implementation of a hash table. This class was reused and wrapped by a new class, just as with the list class, to provide the missing functionality.

C.1.3 Set

The last major collection type used in the ForestFIRE library is the set. A set is a pool of items, where items are stored without any index and where each item is contained at most once. Items can be added or removed by only passing the item itself to the method. The add operation itself must take care that no duplicate items are added to the collection. This is the designed Object Pascal interface for a set containing items of the type T:

Properties
  Count: Integer
    Returns the number of items that are contained in the set.
  Content: Array of T
    Returns an array that contains all the items in the set.

Methods
  Add(item: T)
    Adds the defined item to the set if it is not present.
  Remove(item: T)
    Removes the defined item from the set if it is present.

The remaining question is: how should one implement such a set? There are two well known ways to implement a set [Ski98, Section 8.1.5]. One is to model the set as a bit vector. To do this one has to know the universe of the items that can be contained in the set. A bit vector is an array of bits, one for each item of the universe: a one in the bit vector means that the corresponding item is contained in the set, and a zero means that it is not present. Another method is to use the standard list as defined above. One stores the items of the set in this list; when an item is added, one first checks whether it is already present, and only if this is not the case is the item added to the list. This check can be done quite quickly if one keeps the list sorted. However, not all items can be sorted; it depends on the type of item stored in the list whether this optimization can be used. This internal list can then be hidden from the user of the set by creating a class that wraps it and that provides the standard interface described above. This last option was chosen, because for many sets in the library it is not possible to determine what the precise universe is. This is, for example, the case for the terminal alphabet of a tree: items can be added to this set of terminals all the time, which would mean that the bit vector would have to be extended every time the alphabet changes. The second method can handle such dynamic sets more easily, because it does not use the universe to set up its internal data structure.
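The chosen, list-backed option can be sketched as follows; the class name ListBackedSet is illustrative, and the sketch omits the sorting optimization mentioned above.

import java.util.ArrayList;
import java.util.List;

// Sketch: a set backed by an internal list, where add() guards against duplicates.
public class ListBackedSet<T> {
    private final List<T> items = new ArrayList<>();

    public int getCount() { return items.size(); }

    // Adds the item only if it is not already present (linear search).
    public void add(T item) {
        if (!items.contains(item)) items.add(item);
    }

    public void remove(T item) { items.remove(item); }

    // Returns the content of the set as an array.
    public Object[] getContent() { return items.toArray(); }
}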

C.2 Trees

This section contains brief descriptions of the data structures related to the concept of ranked, ordered, labeled trees. These are all the classes that can be found in Figure 3.1. Additionally, a set of invariants is described that has to be maintained to ensure that no illegal structures are created.

C.2.1 Data structures

This section contains descriptions of all classes of the library that are used for representing trees and related concepts like nodes, symbols and alphabets.

C.2.1.1 Tree class

The Tree class represents a tree that consists of a set of nodes, where each node contains a symbol that is present in the alphabet of the tree. The alphabet can therefore be used to check whether nodes that are added to the tree contain valid symbols, i.e. symbols contained in the alphabet (see the invariant section). The root node and leaf node references can be used to visit the nodes of the tree in a top-down or bottom-up way.

Properties
  Alphabet: List of Symbol
    Set of symbols that can be used inside the nodes of the tree.
  Root: Node
    The root node of the tree.
  Leaves: List of Node
    The leaf nodes of the tree.


Methods None

C.2.1.2 Node class

The Node class represents a node in the tree structure. A node contains a symbol that tells whether it is a terminal, nonterminal or variable. A node also has references to its children and parent node, to provide easy top-down and bottom-up traversal of the tree. Further, there is a reference to the tree in which this node is contained; this can, for instance, be used to get access to the alphabet of the tree for checking or modification purposes (see the invariant section). Additionally, a node contains an annotation field that can be used by algorithms to store information in a node.

Properties
  Parent: Node
    Parent node of this node; 'null' when the node is the root node.
  ParentTree: Tree
    The tree that contains this node.
  Annotation: Dictionary
    A dictionary-like data structure to store annotations.
  Symbol: Symbol
    Reference to the symbol that contains the name and type of the symbol in this node.
  Children: List of Node
    Child nodes of this node.

Methods None

C.2.1.3 Symbol class

The Symbol class provides a structure for representing symbols like:

• Terminal
• Nonterminal
• Variable

Each symbol contains a type that tells whether the symbol is a terminal, nonterminal or variable, and a name field which contains the name of the symbol.


Properties
  Name: String
    The name of the symbol.
  Type: Enumeration
    Contains the type of the symbol: Terminal, Nonterminal or Variable.

Methods None

C.2.1.4 RankedSymbol class

The RankedSymbol class extends the Symbol class to represent a ranked symbol, such as a ranked terminal. This class is not used for nonterminals and variables, even if a tree uses ranked terminals; these nonterminals and variables are considered to have rank zero.

Properties
  Name: String
    The name of the symbol.
  Type: Enumeration
    Contains the type of the symbol: Terminal, Nonterminal or Variable.
  Rank: Integer
    The rank of the symbol.

Methods None

C.2.1.5 DottedTree class

This class represents a dotted tree according to its 2-tuple definition. It is used for referring to a node within a tree; it therefore contains a reference to the tree and to a node inside this tree.

Properties
  Tree: Tree
    Reference to the original tree.
  Node: Node
    Reference to a node inside the tree.

Methods None

C.2.2 Invariants

This section discusses several invariants for the data structures that are presented in Section C.2.1. These invariants must be kept valid to ensure that only wellformed trees are represented by these classes. We start by giving a couple of short definitions to shorten the notation of the invariants. First we have to define what it means for a node to be part of a tree; this is done by Definitions C.2.1 and C.2.2.

Definition C.2.1 Whether a node is part of a tree (Node n ∈ Tree t) is defined as follows:

Node n ∈ Tree t = NodeInTree(t.root, n)

What remains is to formally define the NodeInTree(m, n) function. This is done by Definition C.2.2.

Definition C.2.2 NodeInTree(m, n) expresses recursively whether node n is part of the subtree that starts in node m:

NodeInTree(m, n) = (m ≡ n) ∨ ⟨∨ r : 1 ≤ r ≤ m.Symbol.Rank : NodeInTree(m.Children[r], n)⟩

The upcoming sections discuss a variety of invariants, where each section covers a certain aspect of trees.
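Definition C.2.2 transcribes almost directly into Java; the Node type below is a stripped-down stand-in for the Node class of Section C.2.1.2, not the actual ForestFIRE class.

import java.util.List;

final class NodeInTreeSketch {
    record Node(String symbol, List<Node> children) {}

    // NodeInTree(m, n): is n part of the subtree rooted at m?
    static boolean nodeInTree(Node m, Node n) {
        if (m == n) return true;            // m ≡ n (reference equality)
        for (Node child : m.children())     // disjunction over all children
            if (nodeInTree(child, n)) return true;
        return false;
    }
}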

C.2.2.1 Trees and symbols

The first and most important invariant is that each node of a tree must use a symbol that is present in the alphabet of that tree. This is formalized by Invariant C.2.3.

Invariant C.2.3 Each node n of a tree t must use a symbol that is present in the alphabet of that tree:

⟨∀ n, t : Tree t ∧ Node n ∈ t : n.Symbol ∈ t.Alphabet⟩

Another invariant in the area of symbols is the invariant that concerns the role of the symbol in individual nodes. Each node may only have as many children as defined by the rank of its symbol. This is described formally by Invariant C.2.4.

Invariant C.2.4 The number of children in a node n must be equal to the rank of the symbol that is stored in the node. The #-symbol in the predicate below denotes the number of items in the list of children.

⟨∀ n : Node n : n.Symbol.Rank = #(n.Children)⟩

The last invariant of this section concerns the symbols that are used for representing nonterminals and variables. Nodes that use these symbols can only occur as leaf nodes. To achieve this we define an invariant stating that nodes with a nonterminal or variable symbol have no child nodes:

Invariant C.2.5 All symbols that represent a nonterminal or variable are of rank zero.

⟨∀ n : Node n ∧ (n.Symbol.Type = NonTerminal ∨ n.Symbol.Type = Variable) : #(n.Children) = 0⟩


C.2.2.2 Child-parent relation

This section contains invariants that describe the parent and child relation between nodes of a tree. The first invariant expresses that the tree has a tree-like shape, because the children of a node n must point to n as their parent node. This property ensures that the data structure cannot take on a different structure. For the formal definition see Invariant C.2.6.

Invariant C.2.6 For each node n of a tree t the parent of each of the children of n is equal to n:

⟨∀ n, t : Tree t ∧ Node n ∈ t : ⟨∀ m : m ∈ n.Children : m.Parent = n⟩⟩

The next invariant ensures that each node n of tree t has a path back to the root node of the tree. This invariant is comparable to Definition C.2.1, with the difference that it ensures the existence of a path from the node to the top, instead of a path from the root to the node n:

Invariant C.2.7 Each node n of a tree t must have a path to the root node of tree t:

⟨∀ n, t : Tree t ∧ Node n ∈ t : ⟨∃ i : i ∈ N : n(.Parent)^i = t.Root⟩⟩

Finally, we introduce a small invariant that makes sure that the ParentTree property of a node is set to the tree in which it is contained:

Invariant C.2.8 The ParentTree property for every node n in tree t must contain t.

⟨∀ n, t : Tree t ∧ Node n ∈ t : n.ParentTree = t⟩
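Taken together, Invariants C.2.3, C.2.4 and C.2.6 can be checked at runtime by a single recursive traversal. The following sketch uses simplified stand-ins for the Symbol/Node/Tree classes of this appendix rather than the real ones.

import java.util.List;
import java.util.Set;

final class WellformednessSketch {
    static final class Symbol { String name; int rank; }
    static final class Node { Symbol symbol; Node parent; List<Node> children; }
    static final class Tree { Set<Symbol> alphabet; Node root; }

    static boolean wellformed(Tree t) { return check(t, t.root); }

    private static boolean check(Tree t, Node n) {
        if (!t.alphabet.contains(n.symbol)) return false;      // Invariant C.2.3
        if (n.symbol.rank != n.children.size()) return false;  // Invariant C.2.4
        for (Node child : n.children) {
            if (child.parent != n) return false;               // Invariant C.2.6
            if (!check(t, child)) return false;
        }
        return true;
    }
}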

C.2.2.3 Dotted tree invariant

There is also an important invariant in the area of dotted trees. A dotted tree contains a node and a tree. This node must be part of the specified tree:

Invariant C.2.9 The specified node of the dotted tree must be part of the specified tree.

⟨∀ dt : DottedTree dt : dt.Node ∈ dt.Tree⟩

C.2.3 Related algorithms

This section describes the classes and methods that implement tree related algorithms. These algorithms were created to support some special tree grammar algorithms and mostly focus on providing tree statistics, like the number of nodes. All these algorithms are implemented by the newly introduced TreeAnalyzer class.


C.2.3.1 Number of nodes

The number of nodes in a tree can be retrieved by the getNumberOfNodes method of the TreeAnalyzer class. The method traverses the tree from the root to the leaves and counts the encountered nodes.

TreeAnalyzer class
  GetNumberOfNodes(t: Tree): Integer
    Returns the number of nodes inside the tree.

C.2.3.2 Number of non-root terminal nodes

The number of non-root terminal nodes can be retrieved by the getNumberOfZPlusNodes method. This method is implemented in a similar way to the getNumberOfNodes method; the only difference is that it counts the nodes with a terminal symbol, with the exception of the root node.

TreeAnalyzer class
  GetNumberOfZPlusNodes(t: Tree): Integer
    Returns the number of non-root nodes that contain a terminal.
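Both traversals are simple recursions over the tree. The sketch below shows one possible shape of the two counting methods; the Node record is again a minimal stand-in that only records whether a node's symbol is a terminal.

import java.util.List;

final class TreeAnalyzerSketch {
    record Node(boolean isTerminal, List<Node> children) {}

    // Counts this node plus all nodes in its subtrees.
    static int getNumberOfNodes(Node n) {
        int count = 1;
        for (Node child : n.children())
            count += getNumberOfNodes(child);
        return count;
    }

    // Counts non-root nodes labeled by a terminal: the root itself is skipped.
    static int getNumberOfZPlusNodes(Node root) {
        int count = 0;
        for (Node child : root.children())
            count += countTerminalNodes(child);
        return count;
    }

    private static int countTerminalNodes(Node n) {
        int count = n.isTerminal() ? 1 : 0;
        for (Node child : n.children())
            count += countTerminalNodes(child);
        return count;
    }
}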

C.3 Regular tree grammars

This section about regular tree grammars describes all classes that implement data structures related to the concept of regular tree grammars over ranked, ordered, labeled trees. Additionally, there is a section that discusses invariants on these data structures, and a third section that discusses classes implementing algorithms related to regular tree grammars.

C.3.1 Data structures

This section contains descriptions of all classes of the library that are used for representing regular tree grammars.

C.3.1.1 RegularTreeGrammar class

The standard RegularTreeGrammar class represents a tree grammar. A tree grammar is constructed from a set of production rules, an alphabet that contains the terminals and nonterminals that can be used as start symbol or inside the production rules, and a start nonterminal.


Properties
  Alphabet: Set of Symbol
    The alphabet of terminals and nonterminals that are used in the grammar and its production rules.
  ProductionRules: Set of ProductionRule
    Production rules of the grammar.
  StartSymbol: Symbol
    The nonterminal symbol that forms the start symbol for the grammar.

Methods None

C.3.1.2 GrammarProductionRule class

The GrammarProductionRule class represents a single grammar production rule. Just like in its formal definition, a production rule has a left hand side symbol (of type nonterminal) and a right hand side tree. Finally, there is a cost field that can be used to store the cost of applying that production rule (mostly used in the area of instruction selection).

Properties
  LHS: Symbol
    The left hand side nonterminal of the production rule.
  RHS: Tree
    The right hand side tree of the production rule.
  Cost: Integer
    The cost of applying this rule.

Methods None

C.3.1.3 DottedRule class

This class represents a dotted rule according to its 2-tuple definition; it can be used to point to a specific node inside the rhs of a rule. It contains a reference to the production rule and to a node inside the rhs tree of that production rule.

Properties
  ProductionRule: GrammarProductionRule
    Reference to the source production rule.
  Node: Node
    Reference to a node inside the rhs tree.

Methods None


C.3.2 Invariants

As can be seen in Figure 3.2, the grammar classes are strongly related. We introduce a set of invariants to ensure that only wellformed grammars are created by these classes. The first invariant addresses the different alphabets in the right hand sides of the production rules: these alphabets may not contain symbols that cannot be found in the general alphabet of the grammar. This invariant is defined precisely by Invariant C.3.1.

Invariant C.3.1 The alphabet of a production rule r of a grammar g must be a subset of the general alphabet of grammar g.

⟨∀ g, r : Grammar g ∧ Rule r ∈ g : r.RHS.Alphabet ⊆ g.Alphabet⟩

Another concern is the left hand side of the production rules. One must ensure that these lhs symbols are nonterminals and that they are present in the alphabet of the grammar. This is defined by Invariant C.3.2.

Invariant C.3.2 The left hand side symbol of a rule r of grammar g must be contained in the alphabet of grammar g, and the symbol must be a nonterminal.

⟨∀ g, r : Grammar g ∧ Rule r ∈ g : r.LHS.Type = NonTerminal ∧ r.LHS ∈ g.Alphabet⟩

The symbols in the right hand side tree also need to be part of the alphabet, but this invariant already holds due to the combination of Invariant C.3.1 and Invariant C.2.3 of the previous section. Finally, there is an invariant in the area of dotted rules. A dotted rule contains a production rule and a node. This node must be part of the right hand side tree of that production rule:

Invariant C.3.3 The specified node of the dotted rule must be part of the tree in the rhs of the specified production rule.

⟨∀ dr : DottedRule dr : dr.Node ∈ dr.ProductionRule.RHS⟩

C.3.3 Related algorithms

This section contains all implemented algorithms that are related to regular tree grammars. A description is given of the classes that implement these algorithms. Detailed descriptions of the algorithms themselves are not given in this section; these can be found in Chapter 4 for the grammar transformation algorithms and in [Cle07] for the algorithms that were not part of the experiments. This section divides the algorithms into several categories: standard analysis, usability, grammar transformation, dotted rule retrieval and subtree retrieval. Each of the upcoming subsections provides details on the algorithms in that particular area.

C.3.3.1 Standard analysis

The standard analysis algorithms measure basic statistics for a tree grammar. This can be simple statistics, like the number of nodes and rules, but also more complicated statistics, like the presence or number of chain rules or non-root terminal nodes. All these algorithms can be found in the newly created RTGStandardAnalyzer class.


Number of nodes

The number of nodes in the production rules of a grammar can be retrieved by inspecting the getNumberOfNodes property of the RTGStandardAnalyzer class. This property is implemented by using the getNumberOfNodes property of the TreeAnalyzer class: the values for all trees in the production rules are added to obtain the complete number of nodes.

RTGStandardAnalyzer class addition
  GetNumberOfNodes(g: RTG): Integer
    Returns the number of nodes inside the trees in the right hand sides of the production rules.

Number of rules

The number of production rules in a grammar can be retrieved by inspecting a standard count property on the list that stores the production rules inside the grammar.

RTGStandardAnalyzer class addition
  Count(g: RTG): Integer
    Returns the number of production rules inside the grammar.

Number of chain rules

The number of chain rules in a grammar can be retrieved by inspecting the getNumberOfChainRules property of the RTGStandardAnalyzer class. This property is implemented by defining an IsChainRule property for each production rule; counting the rules for which this returns a positive result then yields the number of chain rules. We also provide an additional getChainRules method that does not provide the number of chain rules but the rules themselves. This method is used together with the transformation operation that removes chain rules; it can then be used to show which chain rules will/can be removed. The existence of chain rules can be retrieved by inspecting the HasChainRules property in the RTGStandardAnalyzer class. This property is implemented by testing whether the getNumberOfChainRules property returns a nonzero value.

RTGStandardAnalyzer class addition
  IsChainRule(r: ProductionRule): Bool
    Returns a boolean value that describes whether the rule is a chain rule or not.
  GetNumberOfChainRules(g: RTG): Integer
    Returns the number of chain rules inside the grammar.
  GetChainRules(g: RTG): List of ProductionRule
    Returns the list of chain rules within the grammar.
  HasChainRules(g: RTG): Bool
    Checks whether the grammar contains chain rules or not.
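The IsChainRule test itself is tiny: a production rule is a chain rule exactly when its rhs tree is a single node labeled by a nonterminal. A sketch with minimal stand-in types (not the ForestFIRE classes):

final class ChainRuleSketch {
    enum SymbolType { TERMINAL, NONTERMINAL, VARIABLE }
    record Node(SymbolType type, java.util.List<Node> children) {}
    record Rule(String lhs, Node rhs) {}

    // A chain rule has the form A -> B: its rhs is one nonterminal leaf node.
    static boolean isChainRule(Rule r) {
        return r.rhs().type() == SymbolType.NONTERMINAL
            && r.rhs().children().isEmpty();
    }
}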

Non-root nodes labeled by terminals

The getNumberOfZPlusNodes method in the RTGStandardAnalyzer class provides the possibility to retrieve the number of non-root terminal nodes. This is realized by using the TreeAnalyzer class, which contains a method that returns the number of non-root terminal nodes for a tree. Calling this method for all rhs trees of the production rules and adding the return values results in the total number of non-root terminal nodes of the grammar. Just like in the previous section about chain rules, we also provide an additional method that does not only count the non-root terminal nodes, but also lists them. This is done by the getZPlusNodes method, which provides a collection of dotted rules; these dotted rules point to the non-root terminals within the rules. The existence of non-root terminal nodes in the rhs of the production rules can be retrieved by inspecting the HasZPlusNodes property. This property can easily be implemented by testing whether the getNumberOfZPlusNodes property returns a nonzero value.

RTGStandardAnalyzer class
  GetNumberOfZPlusNodes(g: RTG): Integer
    Returns the number of non-root nodes that contain a terminal.
  GetZPlusNodes(g: RTG): List of DottedRule
    Returns the non-root nodes that contain a terminal, over all production rules, as a list of dotted rules.
  HasZPlusNodes(g: RTG): Bool
    Checks whether the grammar contains rules whose rhs has non-root terminal nodes.

Rules that contain non-root nodes labeled by terminals

This operation, which counts the rules that contain non-root nodes labeled by terminals, is comparable to the previous operation. The terminal counting in a tree can be implemented as described in the previous section. The getNumberOfZPlusRules method in the RTGStandardAnalyzer class can then count the rules with trees that contain one or more non-root terminals.

RTGStandardAnalyzer class
  GetNumberOfZPlusRules(g: RTG): Integer
    Returns the number of rules that contain non-root nodes that are labeled by a terminal.

C.3.3.2 Usability

This section describes a set of operations that list rules or symbols which are or are not productive or reachable. A new class is created to separate all these operations from the RegularTreeGrammar class. This class, called RTGUsabilityAnalyzer, can provide these usability statistics for each grammar. More information about the reachability and productivity properties can be found in Section 1.1.3 and [Cle07, Section 3.4.2].

(Un)reachable symbols/production rules

One usability aspect is the reachability of symbols and rules. There are three different grammar operations defined to get the set of reachable symbols and production rules. The same operations are defined for retrieving unreachable symbols and rules; these three additional operations can easily be implemented by computing the difference between the complete set of terminals/nonterminals/rules and the reachable items.

RTGUsabilityAnalyzer class
  GetReachableTerminals(g: RTG): Set of Symbol
    Returns the set of reachable terminals.
  GetReachableNonTerminals(g: RTG): Set of Symbol
    Returns the set of reachable nonterminals.
  GetReachableRules(g: RTG): Set of ProductionRule
    Returns the set of reachable production rules.
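Reachability itself is naturally computed as a fixpoint: starting from the start symbol, keep adding every nonterminal that occurs in the rhs of a rule whose lhs is already reachable. The following sketch (with simplified stand-in types, covering only nonterminals) illustrates the idea; the actual ForestFIRE implementation may differ.

import java.util.*;

final class ReachabilitySketch {
    record Rule(String lhs, Set<String> rhsNonterminals) {}

    static Set<String> reachableNonterminals(String start, List<Rule> rules) {
        Set<String> reachable = new HashSet<>();
        reachable.add(start);
        boolean changed = true;
        while (changed) {                     // iterate until a fixpoint is reached
            changed = false;
            for (Rule r : rules) {
                if (reachable.contains(r.lhs())
                        && reachable.addAll(r.rhsNonterminals())) {
                    changed = true;           // a new nonterminal became reachable
                }
            }
        }
        return reachable;
    }
}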

(Un)productive terminals/nonterminals/production rules

Productiveness is another usability characteristic. The RTGUsabilityAnalyzer class defines operations for this characteristic in a similar way to the reachability characteristic. The non-productive items can again be computed by taking the difference between the complete set of terminals/nonterminals/rules and their productive set.

RTGUsabilityAnalyzer class addition
GetProductiveTerminals(g : RTG) : Set of Symbol
    Returns the set of productive terminals.
GetProductiveNonTerminals(g : RTG) : Set of Symbol
    Returns the set of productive nonterminals.
GetProductiveRules(g : RTG) : Set of ProductionRule
    Returns the set of productive production rules.

Useful/useless symbols/production rules The useful/useless property is a combination of the reachability and productiveness properties. A terminal, nonterminal or production rule is useful if and only if it is reachable and productive. There are three RTGUsabilityAnalyzer class operations defined for retrieving these useful items. The useless items can again be retrieved by computing the difference between all items and the useful items.

RTGUsabilityAnalyzer class addition
GetUsefulTerminals(g : RTG) : Set of Symbol
    Returns the set of useful terminals.
GetUsefulNonTerminals(g : RTG) : Set of Symbol
    Returns the set of useful nonterminals.
GetUsefulRules(g : RTG) : Set of ProductionRule
    Returns the set of useful production rules.


C.3.3.3 Transformations This section describes two classes that implement algorithms related to tree grammar transformations: the RTGUselessItemRemover class, which implements the algorithms for removing unreachable and unproductive items, and the RTGStandardRemover class, which implements the algorithms for removing chain rules and non-root terminal nodes. These last two algorithms can be used to convert grammars such that they have the special characteristics that are described in Chapter 1. They were also the subject of experiments; more details about their implementation can therefore be found in Chapter 4.

Remove useless symbols/production rules The three operations, for removing useless terminals, nonterminals and rules, will be handled by one class that contains a method for each of the removal operations. Each of these methods takes a regular tree grammar as input and creates a new grammar that does not contain the useless terminals, nonterminals or production rules. The three different methods can be used in sequence to remove all useless items.

RTGUselessItemRemover class
RemoveUselessTerminals(g : RTG) : RTG
    Outputs a copy of grammar g that does not contain any useless terminals.
RemoveUselessNonTerminals(g : RTG) : RTG
    Outputs a copy of grammar g that does not contain any useless nonterminals.
RemoveUselessRules(g : RTG) : RTG
    Outputs a copy of grammar g that does not contain any useless production rules.
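A hypothetical usage fragment, chaining the three removal operations in sequence; the method casing is an assumption based on the interface table above.

    // Each call returns a fresh copy; chaining them removes all useless items.
    RTGUselessItemRemover remover = new RTGUselessItemRemover();
    RTG cleaned = remover.removeUselessRules(g);
    cleaned = remover.removeUselessNonTerminals(cleaned);
    cleaned = remover.removeUselessTerminals(cleaned);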

Remove chain rules Chain rules are removed in a similar way as the useless terminals and other useless items. An additional class is introduced that contains an operation that takes a grammar and outputs a copy of the grammar that does not contain chain rules, without changing the language produced by that grammar. This class is called RTGStandardRemover, because it contains transformations that have corresponding analysis operations in the RTGStandardAnalyzer class. We also introduce an additional method, called RemoveChainRule, that provides the possibility to remove a single chain rule from a grammar. This is done to investigate the effect of the removal of certain chain rules. More information about the transformation algorithm implemented by this class can be found in Section 4.2.2 and [Cle07, Transformation 3.4.34].

RTGStandardRemover class
RemoveChainRules(g : RTG) : RTG
    Outputs a copy of grammar g that does not contain any chain rules.
RemoveChainRule(g : RTG, r : GrammarProductionRule)
    Transforms the grammar g in such a way that the chain rule r is not present anymore.


Remove rules containing non-root terminals To remove rules that contain non-root terminals in their right hand side, we add an additional method to the RTGStandardRemover class. This RemoveZPlusRules method takes a regular tree grammar and creates a new grammar that does not contain trees with non-root terminals, without changing the language produced by that grammar. We also introduce an additional method, called RemoveZPlusNode, that allows us to remove only one non-root terminal node in a production rule. This way we can analyze the effect of the removal of different nodes. It takes a grammar and a dotted rule as input parameters and transforms the grammar in such a way that it does not contain that node. More information about the transformation algorithm implemented by this class can be found in Section 4.2.1 and [Cle07, Section 3.4.3].

RTGStandardRemover class
RemoveZPlusRules(g : RTG) : RTG
    Outputs a copy of grammar g that does not contain rules that contain non-root terminals.
RemoveZPlusNode(g : RTG, n : DottedRule)
    Transforms the grammar g in such a way that the non-root terminal node n is not present anymore.

C.3.3.4 Dotted rules This section discusses operations to obtain and analyze dotted rules from the production rules of a grammar. These operations are implemented as separate classes to avoid cluttering the regular tree grammar class with a large collection of methods which are not vital for the grammar itself.

Get dotted rule set This operation produces all possible dotted rules from the set of production rules of a grammar. To realize this, the DottedRuleSetProvider class is constructed, which contains the GetDottedRules method that consumes a grammar and produces these dotted rules. The produced dotted rules are wrapped in a new DottedRuleCollection class, because there are additional operations defined on such a collection of dotted rules.

DottedRuleSetProvider class
GetDottedRules(g : RTG) : DottedRuleCollection
    Produces all possible dotted rules from the production rules inside grammar g.

DottedRuleCollection class
Rules : Set of DottedRule
    The set of dotted rules that are stored in the dotted rule collection.

Flatten The flatten function converts a collection of dotted rules into a set of subtrees. This is realized by creating a new tree for each dotted rule that contains a cloned structure of the subtree referred to by the dotted rule. Flattening a set of dotted rules can result in duplicate tree structures. These duplicate trees are removed such that only unique subtrees are output.


DottedRuleCollection class addition
Flatten() : Set of Tree
    Returns the set of unique subtrees that are defined by the collection of dotted rules.
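A minimal sketch of Flatten, assuming each dotted rule can clone the subtree it points to (the helper name cloneSubtreeAtDot is hypothetical) and that Tree implements structural equals/hashCode, so that a hash set removes the duplicate clones.

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch: clone the subtree at each dot and deduplicate structurally.
    public Set<Tree> flatten(Collection<DottedRule> rules) {
        Set<Tree> subtrees = new HashSet<>();
        for (DottedRule r : rules)
            subtrees.add(r.cloneSubtreeAtDot());  // assumed helper name
        return subtrees;
    }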

Number of dotted rules The DottedRuleCollection class inherits from the standard list that is defined in Section C.1.1. The number of dotted rules in the collection can therefore be retrieved by inspecting the inherited count property.

DottedRuleCollection class addition
Count : Integer
    Returns the number of dotted rules in the collection.

Number of flattened dotted rules This operation returns the number of subtrees that can be obtained from a set of dotted rules when they are flattened using the flatten function.

DottedRuleCollection class addition
GetNumberOfElementsWhenFlattened() : Integer
    Returns the number of subtrees that would result if the dotted rules in the collection were flattened.

C.3.3.5 Subtrees This section discusses operations to directly obtain all subtrees as standard trees from a grammar. These operations are implemented in the same way as for the dotted rules, by constructing a new class. This is again done to avoid cluttering the grammar class with a large collection of methods.

Get subtree set This operation produces all unique subtrees that can be extracted from the right hand side trees of the production rules. This operation is realized by the new SubtreeSetProvider class, which contains a method that consumes a grammar and outputs a set of trees. There is no special subtree collection class constructed, because there is no special functionality needed from the set of subtrees. The only needed functionality is a function for retrieval of the number of subtrees, but this is provided by the standard collection types.

SubtreeSetProvider class
GetSubTrees(g : RegularTreeGrammar) : Set of Tree
    Produces all unique subtrees from the right hand side trees of the production rules inside grammar g.

Number of Elements The number of elements in a set of subtrees is retrieved by inspecting the standard count property of the used set type (see Section C.1), because there is no separate class defined for storing subtrees.


C.4 Tree patterns

This section contains a description of all data structures and invariants related to tree patterns.

C.4.1 Data structures Most of the data structures for tree patterns are derived directly from trees, because patterns are represented as trees. The only additional data structure needed is a class that wraps a collection of tree patterns such that a group of tree patterns can easily be manipulated.

C.4.1.1 PatternSet class The PatternSet class represents a collection of tree patterns. This collection also contains an alphabet that specifies which terminals, nonterminals and variables can be used in the contained patterns.

Properties
Alphabet : Set of Symbol
    Set of symbols that can be used in patterns.
Patterns : Set of Tree
    The set of tree patterns.

Methods None

C.4.2 Invariants There are not many invariants for this chapter, because patterns also reuse the tree invariants. However, it is important to create an invariant that describes that the symbols in the alphabet of a pattern also have to be present in the alphabet of the pattern collection that this pattern is part of. This is formalized by Invariant C.4.1.

Invariant C.4.1 The alphabet of a pattern p in a pattern set s must be a subset of the alphabet of the collection s.

⟨∀s, p : PatternSet s ∧ Tree p ∈ s : p.Alphabet ⊆ s.Alphabet⟩
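For illustration, this invariant could be checked at runtime as in the following sketch; the getters are hypothetical renderings of the properties listed above.

    // Sketch: every pattern's alphabet must be contained in the set's alphabet.
    public boolean satisfiesInvariantC41(PatternSet s) {
        for (Tree p : s.getPatterns())
            if (!s.getAlphabet().containsAll(p.getAlphabet()))
                return false;  // a pattern uses a symbol outside the set's alphabet
        return true;
    }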

C.4.3 Related algorithms All but one of the algorithms related to tree patterns have as their goal to retrieve (properties of) the subtrees that can be retrieved from a pattern collection. This section describes the classes that implement these algorithms.

C.4.3.1 Number of patterns The number of patterns can be retrieved by inspecting the count property of the PatternSet class, because the class is implemented as a descendant of the list class that provides a standard count property.


PatternSet class addition
Count : Integer
    Returns the number of patterns that are stored in the set.

C.4.3.2 Dotted trees This section discusses operations to obtain dotted trees from a pattern collection. These operations are implemented as separate classes to avoid cluttering the PatternSet class with a large collection of complicated methods.

Get dotted tree set This operation produces all possible dotted trees from a collection of patterns. The previous section already introduced a class that distills all possible dotted rules from the production rules of a grammar (see Section C.3.3.4). A similar class is introduced to retrieve all dotted trees from a collection of tree patterns.

DottedTreeSetProvider class
GetDottedTrees(p : PatternSet) : DottedTreeCollection
    Produces all possible dotted trees from a set of tree patterns.

DottedTreeCollection class
Trees : Set of DottedTree
    The set of dotted trees that are stored in the dotted tree collection.

Flatten The flatten function converts a collection of dotted trees into a set of standard trees, by cloning the referenced tree structure. This function is similar to the Flatten function for dotted rules (see Section C.3.3.4). Therefore a similar solution is provided, by extending the DottedTreeCollection class with a Flatten function.

DottedTreeCollection class addition
Flatten() : Set of Tree
    Returns the set of unique subtrees that are defined by the collection of dotted trees.

Number of dotted trees The DottedTreeCollection class inherits from the standard list that is defined in Section C.1.1. The number of dotted trees in the collection can therefore be retrieved by inspecting the inherited count property.

DottedTreeCollection class addition
Count : Integer
    Returns the number of dotted trees in the collection.

Number of flattened dotted trees This operation returns the number of subtrees that can be obtained from a set of dotted trees when they are flattened using the flatten function. Such an operation is already described for the DottedRuleCollection class (see Section C.3.3.4). The same solution is chosen for the DottedTreeCollection class.

DottedTreeCollection class addition
GetNumberOfElementsWhenFlattened() : Integer
    Returns the number of subtrees that would result if the dotted trees in the collection were flattened.

C.4.3.3 Subtrees This section discusses operations to obtain all unique subtrees as standard trees from a set of tree patterns. These operations are implemented in the same class used for subtree extraction from tree grammars.

Get subtree set This operation retrieves all unique subtrees that can be extracted from a set of tree patterns. This is realized by cloning all unique subtrees that can be found in the patterns.

SubtreeSetProvider class addition
GetSubTrees(p : PatternSet) : Set of Tree
    Produces all unique subtrees from the collection of patterns p.

Number of elements The number of elements in a set/list of subtrees is retrieved by inspecting the standard count property of the used set/list (see Section C.1), because there is no separate class defined for storing a collection of subtrees.

C.5 Tree automata

This section describes the classes that implement tree automata and classes that contain the algorithms that are related to tree automata. Furthermore this section describes a collection of invariants for the tree automata classes.

C.5.1 Data structures The data structures section describes all classes that are implemented to represent (ε)NFRTAs, (ε)NRFTAs and DFRTAs. These are the classes that can be found in Figure 3.5 of Chapter 3.

C.5.1.1 AbstractTreeAutomaton class The AbstractTreeAutomaton class is an abstract class that presents the standard interface for a tree automaton based on the formal 5-tuple definition. Three items of the 5-tuple can be found as properties: the state set, the alphabet and the root accepting states. The reason why the other two items are not present in the base class can be found in Section 3.1.4. This class also defines an abstract method for adding transitions. This is possible because the general shape of a single transition is the same for all automata if one abstracts from the direction. However, it depends on the type of automaton what the most efficient way is to store them. This interface can therefore be implemented differently by each automaton.

Properties
StateSet : Set of AutomatonState
    The states that can be found inside the automaton.
Alphabet : Set of Symbol
    The ranked alphabet used inside the automaton.
RootAccepting : Set of AutomatonState
    The root accepting states of the automaton.

Methods
AddTransition(ss : AutomatonState, rs : AutomatonState^n, s : Symbol)
    Adds a transition based on symbol s with rank n between state ss and the state vector rs of length n.

C.5.1.2 NRFTA class When using an NRFTA (or DRFTA) one walks from the top to the bottom of the tree to find which states match which nodes. If one is performing such a traversal and one wants to use an NRFTA to find the child states for a node for which the state is known, then one has to feed this current state and the symbol of the current node to the automaton. An NRFTA will then return a set of vectors of n states (this set also contains all vectors that can be reached after executing all possible ε-transitions) that represent the possible states for these child nodes. This traversal is realized by the NextState method. Additionally there is a property that expresses whether the automaton contains ε-transitions.

Properties
ContainsEpsilonTransitions : Boolean
    Indicates whether the automaton contains ε-transitions.

Methods
NextState(cs : AutomatonState, s : Symbol) : Set of AutomatonState^n
    See the NRFTA class description.
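The following sketch shows how such a top-down traversal could use NextState to check whether a subtree can be matched from a given state; the type and method names (TreeNode, getSymbol, getChildren, nextState) are assumptions, not the library's exact API.

    import java.util.List;

    // Sketch of a root-to-frontier check with an NRFTA.
    class RFMatchSketch {
        boolean matchFrom(NRFTA ta, AutomatonState q, TreeNode n) {
            // Each vector is one possible assignment of states to n's children;
            // for a rank-0 symbol an accepting state yields the empty vector.
            for (List<AutomatonState> vec : ta.nextState(q, n.getSymbol())) {
                boolean ok = true;
                for (int i = 0; ok && i < n.getChildren().size(); i++)
                    ok = matchFrom(ta, vec.get(i), n.getChildren().get(i));
                if (ok) return true;  // one viable vector suffices
            }
            return false;
        }
    }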

C.5.1.3 NFRTA class When using an NFRTA (or DFRTA) one walks from the bottom to the top of the tree to find which states match which nodes. If one knows all the states for the child nodes of a certain node α and wants to use an NFRTA to find the state of this node α, then one has to feed all these child node states and the symbol of α to the automaton. An NFRTA will then return a set of states for the parent node (this set also contains all states that can be reached after executing all ε-transitions). This traversal is realized by the NextState method. When one uses an NFRTA one starts with the leaf accepting states. As described earlier there is no explicit field that contains these leaf accepting states. The states for a specific symbol of rank 0 can be retrieved by calling the NextState method with an empty set of states and the symbol. The method will then return the leaf accepting states that accept this symbol.

Properties
ContainsEpsilonTransitions : Boolean
    Indicates whether the automaton contains ε-transitions.

Methods
NextState(cs : List of AutomatonState, s : Symbol) : Set of AutomatonState
    See the NFRTA introduction.
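A sketch of how a bottom-up traversal could compute the set of possible states per node with this interface; TreeNode and the method casing are assumptions. Every combination (vector) of child states is fed to NextState and the resulting sets are united; for a leaf the cartesian product contains exactly one empty vector, which matches the leaf-state retrieval described above.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Sketch of frontier-to-root state computation with an NFRTA.
    class FRStatesSketch {
        Set<AutomatonState> states(NFRTA ta, TreeNode n) {
            List<Set<AutomatonState>> perChild = new ArrayList<>();
            for (TreeNode c : n.getChildren())
                perChild.add(states(ta, c));
            Set<AutomatonState> result = new HashSet<>();
            for (List<AutomatonState> vector : vectors(perChild))
                result.addAll(ta.nextState(vector, n.getSymbol()));
            return result;
        }

        // All child-state vectors: the cartesian product of the per-child sets.
        List<List<AutomatonState>> vectors(List<Set<AutomatonState>> sets) {
            List<List<AutomatonState>> out = new ArrayList<>();
            out.add(new ArrayList<>());  // one empty vector for the leaf case
            for (Set<AutomatonState> s : sets) {
                List<List<AutomatonState>> next = new ArrayList<>();
                for (List<AutomatonState> prefix : out)
                    for (AutomatonState q : s) {
                        List<AutomatonState> v = new ArrayList<>(prefix);
                        v.add(q);
                        next.add(v);
                    }
                out = next;
            }
            return out;
        }
    }

A tree would then be accepted if and only if the state set computed for its root has a non-empty intersection with the RootAccepting set.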

C.5.1.4 AbstractDFRTA class The goal of the AbstractDFRTA class is to support different DFRTA implementations. The interface of this abstract class is the same as that of the NFRTA class; even the NextState method is the same. However, its output is restricted to a singleton set containing only one state. There are five types of DFRTA classes that inherit from this abstract class. These five classes all implement different optimization techniques that reduce the size of the automaton and the time needed to construct the automaton. These five types of DFRTAs are directly related to the five tree automaton construction algorithms that are discussed in Section 4.3 of Chapter 4:

• DFRTAStandard class
• DFRTAFilterSubtree class
• DFRTAFilterIndex class
• DFRTAFilterSymbol class
• DFRTAFilterIndexSymbol class

All these automata can be constructed using the DTAGenerator class. Calling the generate method of the generator class with a fresh automaton, a tree grammar and a corresponding item set will fill the automaton with the states and transitions such that it accepts the input grammar. The applied filtering technique depends on the type of automaton that is provided to the generate method. The AbstractDFRTA class also contains methods that are not listed below. These methods are used to facilitate easy construction by the DTAGenerator class and optimize the memory usage of the automata. A description of these methods can be found in the JavaDoc documentation of the library.

Properties None

Methods
NextState(cs : List of AutomatonState, s : Symbol) : (Singleton) Set of AutomatonState
    Similar to the NextState method of an NFRTA, with the exception that it returns a singleton set.

C.5.1.5 AbstractAutomatonState class This class represents an abstract state of an automaton. Each state has a name, and each descendant class defines in its own way how to represent the subtrees that are matched in that state. Each automaton can then use the type of state it wants/needs.

Properties
Name : String
    The name of the state.

Methods None

C.5.1.6 DottedTreeAutomatonState class This class represents a state based on dotted tree matches. This kind of state can be used in automata constructed from pattern sets. The dotted trees in the match set of a state tell which node(s) of which pattern match with that state. This match information can then be used when, for instance, solving the tree pattern matching problem.

Properties
Name : String
    The name of the state.
Matches : Set of DottedTree
    The set of dotted trees that tells which node of a certain pattern matches with this state.

Methods None

141 C.5. Tree automata Chapter C. ForestFIRE library

C.5.1.7 DottedRuleAutomatonState class This class represents a state based on dotted rule matches. This kind of state can be used in automata constructed from tree grammars. The dotted rules in the match set of a state tell which node(s) of which production rule match with that state. This match information can then be used when, for instance, solving the tree parsing problem.

Properties
Name : String
    The name of the state.
Matches : Set of DottedRule
    The set of dotted rules that tells which node of a certain production rule matches with this state.

Methods None

C.5.1.8 SubtreeAutomatonState class This class represents a state based on standard tree matches. This kind of state can be used in automata constructed from either pattern sets or tree grammars. This type of state can therefore be found in many types of automata. The only disadvantage of these standard tree matches is that it is impossible to quickly retrieve their origin (which pattern or production rule), because they are cloned from their source.

Properties
Name : String
    The name of the state.
Matches : Set of Tree
    The set of subtrees that tells which node in a production rule or pattern matches with this state.

Methods None

C.5.2 Invariants This section presents all invariants that ensure that the classes above can only represent well-formed automata. The first invariants that we have to define are the invariants that limit the usage of states to the states defined in the state set of an automaton. We start with defining an invariant that describes the fact that all root accepting states should be contained in the state set:

Invariant C.5.1 The set of root accepting states of an automaton must be a subset of the set of all possible states.

⟨∀a : AbstractTreeAutomaton a : a.RootAccepting ⊆ a.StateSet⟩


We could define a similar invariant for the leaf accepting states, but the problem is that these states are not defined explicitly, because they can be obtained by using the NextState- method. It is however important to mention that all the states that are returned by this method must be part of the state set of the corresponding automaton:

Invariant C.5.2 The states of an automaton that are produced by the NextState method must be a subset of the set of all possible states (State q ∈ a.NextState means that q can be produced by the NextState method of automaton a).

⟨∀a, q : AbstractTreeAutomaton a ∧ State q ∈ a.NextState : q ∈ a.StateSet⟩

The last invariant we introduce expresses the fact that each symbol of the alphabet of an automaton must have a role in the automaton. This means that every symbol must be part of the transition relations of the automaton:

Invariant C.5.3 Each symbol of the alphabet of an automaton must be part of at least one transition (there is a separate definition for the RFTAs and the FRTAs).

⟨∀a, s : NRFTA a ∧ Symbol s ∈ a.Alphabet : ⟨∃q : State q ∈ a.StateSet : a.NextState(q, s) ≠ ∅⟩⟩

⟨∀a, s : NFRTA a ∧ Symbol s ∈ a.Alphabet : ⟨∃qs : List of State qs ⊆ a.StateSet : a.NextState(qs, s) ≠ ∅⟩⟩

C.5.3 Related algorithms This section contains all the interfaces and classes related to tree automaton algorithms. These algorithms are divided into three categories: state construction algorithms, automaton construction algorithms and tree acceptance/parse algorithms. The first part discusses the interfaces of algorithms that provide item sets that are used for state construction in automaton construction algorithms. The second part discusses the interfaces of the construction algorithms themselves, and finally the interfaces of the match and parse algorithms that use tree automata are discussed.

C.5.3.1 Automaton state construction Tree automaton states are, as described earlier, constructed based on sets of subtrees called item sets. These item sets consist of a set of unique subtrees, which can be represented by dotted rules, dotted trees or standard (cloned) trees. This section discusses the classes that can be used to construct such item sets that contain standard trees.

AbstractItemSetProvider class The AbstractItemSetProvider class is an abstract class from which all classes are derived that implement algorithms to obtain item sets, as cloned standard trees, from tree grammars. This class therefore specifies a constructor function with which an instance of the item set provider can be created based on a grammar. The item set can be retrieved by calling the GetSubTrees method. This method returns a tuple that contains the item set (a set of unique subtrees of the grammar) and the subtree from the item set that refers to the start symbol. This second item is interesting when automata are constructed, because they then know which subtree must be present in the match set of the root accepting state. There are three classes that inherit from this abstract class: ProviderAllSub, ProviderProperN and ProviderProperS. These classes implement the retrieval of the three different item sets that are discussed in the tree automaton constructions in Chapter 4.

AbstractItemSetProvider class
Create(RegularTreeGrammar g)
    Creates an instance of this class for a specific tree grammar.
GetSubTrees() : (Tree, Set of Tree)
    Produces a tuple that contains all unique subtrees from g and the subtree corresponding to the start symbol.

C.5.3.2 Automaton construction There are only two automaton construction algorithms implemented in ForestFIRE: one for nondeterministic automata and one for deterministic automata. However, these algorithms can be used to construct many different types of automata. This is caused by the separation of responsibilities: a part of the logic is transferred from the construction algorithm to the automata. The construction algorithm for nondeterministic automata can, for instance, create FR and RF automata based on the type of automaton. The automaton itself defines how the transitions are stored. This resulted in a low amount of duplicated code.

NTAGenerator class The NTAGenerator class is the class that implements the construction algorithm for non-deterministic automata. This algorithm (discussed in detail in Chapter 4) can be used to construct RF and FR automata with and without ε-transitions, based on one of the tree item sets. These different types of automata can be created by using the correct parameters in the GenerateAutomaton-method.

NTAGenerator class
GenerateAutomaton(RTG g, AbstractTA ta, AbstractISP isp, Boolean epsilon)
    Generates a nondeterministic tree automaton based on grammar g and the item set provided by isp, with or without ε-transitions depending on the parameter epsilon, and stores this automaton in ta.

DTAGenerator class The DTAGenerator class is the class that implements the construction algorithm for DFRTAs. The algorithm can be used to construct a DFRTA from a tree grammar based on one of the three item sets. The experiments in Chapter 4 discuss filter techniques that can be used in the construction. These optimizations can be applied by using one of the four special DFRTAs that inherit from the AbstractDFRTA class. These automata hide the filtering techniques from the construction process such that only this single construction algorithm is needed.


DTAGenerator class
GenerateAutomaton(RTG g, AbstractDFRTA ta, AbstractISP isp)
    Generates a DFRTA based on grammar g and the item set provided by isp and stores this automaton in ta.
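A hypothetical usage fragment; the constructor signatures and method casing are assumptions based on the interface tables above, and g is an RTG assumed to be in scope.

    // The concrete automaton class determines the filtering technique applied.
    AbstractDFRTA automaton = new DFRTAFilterIndexSymbol();
    AbstractItemSetProvider provider = new ProviderProperS(g);  // item set for g
    new DTAGenerator().generateAutomaton(g, automaton, provider);
    // automaton is now filled with states and transitions for grammar g.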

C.5.3.3 Tree acceptance & parsing Two of the application areas of tree automata are the tree acceptance and tree parsing problems. This section defines classes that solve these two problems based on a tree automaton. There are three acceptor classes defined that solve the acceptance problem with a specific type of automaton (NFRTA, NRFTA and DFRTA) and one parser class that solves the parsing with a provided DFRTA.

NFRAcceptor class The NFRAcceptor class implements an algorithm that determines whether an NFRTA accepts a certain input tree. The class realizes this by implementing the algorithm described in [Cle07, Algorithm 6.5.2].

NFRAcceptor class
Accept(NFRTA ta, Tree t) : Boolean
    Returns whether tree t is accepted by NFRTA ta.

NRFAcceptor class The NRFAcceptor class implements an algorithm that determines whether an NRFTA accepts a certain input tree. The class realizes this by implementing the algorithm described in [Cle07, Algorithm 6.4.2].

NRFAcceptor class
Accept(NRFTA ta, Tree t) : Boolean
    Returns whether tree t is accepted by NRFTA ta.

DFRAcceptor class The DFRAcceptor class implements an algorithm that determines whether a DFRTA accepts a certain input tree. The class realizes this by implementing the algorithm described in [Cle07, Algorithm 6.5.4].

DFRAcceptor class
Accept(DFRTA ta, Tree t) : Boolean
    Returns whether tree t is accepted by DFRTA ta.

OptimalCostParser class The OptimalCostParser class implements a parser that parses a tree for an input grammar by using a DFRTA. This parsing algorithm delivers the minimal cost parse (based on the costs specified for each production rule) for that tree, if a parse exists. The parsing algorithm internally constructs a DFRTA from the grammar and uses this automaton to compute which subtrees from the production rules are matched in which node. Comparing these subtrees with the complete right hand side of the rules indicates which rules can be applied. These matching production rules are then stored (with the lhs nonterminal as key) in the annotation dictionary of the node. After the parsing action each node will contain, for each nonterminal, a sequence of production rules (caused by chain rules) that describes which rules have to be applied such that the node matches this nonterminal. This is the interface of the OptimalCostParser class.

OptimalCostParser class
Create(RegularTreeGrammar g)
    Constructs a parser that can parse trees that are part of the language generated by g.
Parse(Tree t)
    Parses the tree based on an automaton constructed from the tree grammar and stores in each node which rules have to be applied to get the subtree rooted at this node.
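A hypothetical usage fragment; the annotation accessor shown at the end is an assumption, since the text only states that rules are stored per nonterminal in each node's annotation dictionary.

    // g : RegularTreeGrammar, t : Tree, startSymbol : Symbol (assumed in scope)
    OptimalCostParser parser = new OptimalCostParser(g);
    parser.parse(t);
    // After parsing, each node carries, per nonterminal, the rule sequence of
    // a minimal-cost derivation; reading it for the root (assumed API):
    List<ProductionRule> derivation = t.getRoot().getAnnotations().get(startSymbol);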

D FIREWood file format

This appendix discusses the FIREWood file format for defining trees, tree grammars etc. The goal of this file format is to provide an easy way to define input structures for the FIREWood application. The file format is based on the INI format. We start by defining a def-list. This is a list of user defined structures. Each such definition will be called a def-item. There are five possible structures that can be defined:

• alphabets
• trees
• tree grammars
• tree patterns
• tree pattern collections

This resulted in the following EBNF-grammar for defining different structures:

def-list = def-item{def-item}

def-item = def-alphabet | def-tree | def-grammar | def-pattern | def-patterncollection

string = character{character}

character = a..z,A..Z

The definitions of the five different concepts have a common general shape, based on the general shape of an INI definition. Each definition of a concept consists of a name between brackets (the reference name of the concept) and a list of variable definitions, where one of the variable definitions specifies the type of the concept. This is, for instance, the general shape of a tree definition:

[mytree]
type=Tree
....=....
etc.


Each type of concept will contain a special list of variable definitions, next to the type vari- able definition, that need to be present. A tree for instance needs to define its corresponding alphabet and its structure. See for instance the example specification in Section D.5. The next sections will discuss a format for each of the five different types of structures.

D.1 Alphabets

This section describes an alphabet definition. There were two possibilities for this definition. One could define one default alphabet for the complete file that every tree, tree grammar etc. uses, or one could offer the possibility to define multiple alphabets. The latter option was chosen, to provide the possibility of defining structures with different alphabets in a single file. The list of variable definitions for the alphabet contains two elements: the type definition and the alphabet itself. The type must be set to Alphabet and the alphabet itself is defined by a comma separated sequence of symbols with an optional rank (separated by the ':' symbol), where the symbols of the alphabet are strings of the characters a to z. Symbols with no rank are stored by ForestFIRE as unranked symbols. This is for instance used for nonterminals and variables. This is the EBNF grammar for this definition:

def-alphabet = "["string"]" type=Alphabet symbols=def-alphabet-list

def-alphabet-list = "{"def-symbol {,def-symbol}"}"

def-symbol = string [def-rank]

def-rank = ":" number

number = digit {digit}

digit = 0..9

This is an example definition of an alphabet with name alphabetx:

[alphabetx]
type = Alphabet
symbols = {w:3, r:2, s:1, t:0, u:0}


D.2 Trees

Trees are defined by an alphabet and a definition of the tree structure itself. The alphabet is defined by a string that points to the defined alphabet with the same name. The tree structure is described in a prefix notation, where each node symbol is placed before a sequence of child nodes. This is the grammar of the tree definition, together with an example:

def-tree = "["string"]" type=Tree alphabet=string structure=def-tree-structure

def-tree-structure = string | string "("def-tree-children")"

def-tree-children = def-tree-structure {,def-tree-structure}

This is an example definition of a tree with name treex:

[treex]
type = Tree
alphabet = alphabetx
structure = w(r(t,u),s(u),s(s(t)))
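Reading such a prefix notation back into a tree amounts to a small recursive descent over def-tree-structure. The following sketch illustrates this; the Node helper type is hypothetical and error handling for malformed input is omitted.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal recursive-descent sketch for def-tree-structure.
    class StructureParser {
        static class Node {
            final String symbol;
            final List<Node> children = new ArrayList<>();
            Node(String symbol) { this.symbol = symbol; }
        }

        private final String input;
        private int pos;

        StructureParser(String text) { this.input = text.replaceAll("\\s", ""); }

        Node parse() {
            StringBuilder symbol = new StringBuilder();
            while (pos < input.length() && Character.isLetter(input.charAt(pos)))
                symbol.append(input.charAt(pos++));
            Node node = new Node(symbol.toString());
            if (pos < input.length() && input.charAt(pos) == '(') {
                pos++;                              // consume '('
                node.children.add(parse());
                while (input.charAt(pos) == ',') {  // further children
                    pos++;
                    node.children.add(parse());
                }
                pos++;                              // consume ')'
            }
            return node;
        }
    }

For the example above, new StructureParser("w(r(t,u),s(u),s(s(t)))").parse() yields a root node w with three children.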

D.3 Tree grammars

Grammars are defined by a terminal alphabet, a nonterminal alphabet, a start nonterminal and a set of production rules. The terminal alphabet and nonterminal alphabet are defined in the same way as the alphabet of a tree, by a string that points to a predefined alphabet. There is no separate part of the grammar definition that defines the start symbol. Instead the start symbol is defined as the first symbol in the nonterminal alphabet. This approach shortens the definition of a grammar. The production rule list description is a bit more complicated. The rules are divided by semicolons and each rule is described by a string that represents the lhs, followed by a ':' and a rhs tree structure as used in the tree definition. It is also possible to define a cost for each rule. This is done by putting an integer value after a "#" behind the production rule. This is the grammar of the tree grammar definition, together with an example tree grammar definition:


def-grammar = "["string"]" type=Grammar terminal-alphabet=string nonterminal-alphabet=string rules=def-productions

def-productions = "{"def-prule {; def-prule}"}"

def-prule = string ":" def-tree-structure [def-cost]

def-cost = "#" number

This is an example definition of a tree grammar with name grammarx, together with a nonterminal alphabet called alphabety:

[alphabety]
type = Alphabet
symbols = {S, X, Y}

[grammarx]
type = Grammar
terminal-alphabet = alphabetx
nonterminal-alphabet = alphabety
rules = {S: w(X,Y,Y) # 1; X: r(t,X) # 3; X: u # 0; Y: s(Y) # 1; Y: t # 2; Y: X # 1}

D.4 Tree patterns and pattern collections

Let us start with the definition of tree patterns. The tree pattern definition is almost the same as the normal tree definition. There is only one difference: the pattern definition also contains an alphabet of variables. This alphabet is described just like a standard alphabet of a tree, by pointing to the desired alphabet definition. This is the grammar of the tree pattern definition together with an example of such a definition:

def-pattern = "["string"]" type=Pattern terminal-alphabet=string variable-alphabet=string structure=def-tree-structure

This is an example definition of a tree pattern with name patternx, together with a variable alphabet called alphabetz:


[alphabetz]
type = Alphabet
symbols = {v, w}

[patternx]
type = Pattern
terminal-alphabet = alphabetx
variable-alphabet = alphabetz
structure = r(v,s(w))

Finally the pattern collection definition can be described. Such a collection is defined by the type 'PatternCollection' and a comma separated list of pattern variable names. These patterns should be defined in the same file before the definition of the pattern collection. This is the grammar of this definition together with an example:

def-patterncollection = "["string"]" type=PatternCollection patterns=def-pattern-list

def-pattern-list = "{"string {, string}"}"

This is an example definition of a pattern collection with name patterncolx, with two imaginary names for the patterns:

[patterncolx]
type = PatternCollection
patterns = {patternx, patterny}

D.5 Example

This section shows the definition file for the grammar described in Example 1.1.3:

[alphabetT]
type=Alphabet
symbols={a:2, b:1, c:0, d:0}

[alphabetNT]
type=Alphabet
symbols={S, B}

[mytree]
type=Tree
alphabet=alphabetT
structure = a(b(c), b(b(d)))

[mygrammar]


type=Grammar
terminal-alphabet=alphabetT
nonterminal-alphabet=alphabetNT
rules={S: a(B,d); S: a(b(c),B); S: c; B: b(B); B: S; B: d}

E Tree automaton construction – results

This appendix contains all measurement data collected during the automaton construction experiments. The results are divided over the different tree grammars. A table of measurements is presented for each of these grammars.


Table E.1: Example 6.0.3 – Basic Statistics

                                            # States  # Transitions  Time (ms)  Memory (bytes)  Memory (KB)
εNFRTA - All-Sub                                   8             12       0.07            6344          6.2
NFRTA - All-Sub                                    8             14       0.07            4472          4.4
NFRTA - ProperN                                    5             11       0.07            3512          3.4
NFRTA - ProperS                                    5             11       0.07            3512          3.4
εNRFTA - All-Sub                                   8             12       0.08            5400          5.3
NRFTA - All-Sub                                    8             14       0.09            6472          6.3
NRFTA - ProperN                                    5             11       0.06            4584          4.5
NRFTA - ProperS                                    5             11       0.06            4584          4.5
DFRTA - All-Sub                                    8             74       0.73           17160         16.8
DFRTA - ProperN                                    6             44       0.35            8776          8.6
DFRTA - ProperS                                    6             44        0.3            8776          8.6
DFRTA - SubTree Filtering - All-Sub                8             32        0.4           18048         17.6
DFRTA - SubTree Filtering - ProperN                6             32       0.25            9824          9.6
DFRTA - SubTree Filtering - ProperS                6             32       0.24            9824          9.6
DFRTA - Index Filtering - All-Sub                  8             18       0.73           18816         18.4
DFRTA - Index Filtering - ProperN                  6             18       0.22           10496         10.3
DFRTA - Index Filtering - ProperS                  6             18       0.21           10496         10.3
DFRTA - Symbol Filtering - All-Sub                 8             21       0.39           19768         19.3
DFRTA - Symbol Filtering - ProperN                 6             21       0.22           11208         10.9
DFRTA - Symbol Filtering - ProperS                 6             21       0.22           11208         10.9
DFRTA - Symbol & Index Filtering - All-Sub         8             14       0.43           21048         20.6
DFRTA - Symbol & Index Filtering - ProperN         6             14        0.2           12312         12.0
DFRTA - Symbol & Index Filtering - ProperS         6             14        0.2           12312         12.0

Table E.2: Example 6.0.3 – Table Statistics

                                            # R-Tables  # R-Entries  # φ-Tables  # φ-Entries
εNFRTA - All-Sub                                     -            -           -            -
NFRTA - All-Sub                                      -            -           -            -
NFRTA - ProperN                                      -            -           -            -
NFRTA - ProperS                                      -            -           -            -
εNRFTA - All-Sub                                     -            -           -            -
NRFTA - All-Sub                                      -            -           -            -
NRFTA - ProperN                                      -            -           -            -
NRFTA - ProperS                                      -            -           -            -
DFRTA - All-Sub                                      -            -           -            -
DFRTA - ProperN                                      -            -           -            -
DFRTA - ProperS                                      -            -           -            -
DFRTA - SubTree Filtering - All-Sub                  1            5           1            8
DFRTA - SubTree Filtering - ProperN                  1            5           1            6
DFRTA - SubTree Filtering - ProperS                  1            5           1            6
DFRTA - Index Filtering - All-Sub                    2            7           2           16
DFRTA - Index Filtering - ProperN                    2            7           2           12
DFRTA - Index Filtering - ProperS                    2            7           2           12
DFRTA - Symbol Filtering - All-Sub                   2            7           2           16
DFRTA - Symbol Filtering - ProperN                   2            7           2           12
DFRTA - Symbol Filtering - ProperS                   2            7           2           12
DFRTA - Symbol & Index Filtering - All-Sub           3            9           3           24
DFRTA - Symbol & Index Filtering - ProperN           3            9           3           18
DFRTA - Symbol & Index Filtering - ProperS           3            9           3           18


Table E.3: Sample 4 – Basic Statistics

                                            # States  # Transitions  Time (ms)  Memory (bytes)  Memory (KB)
εNFRTA - All-Sub                                  13             20       0.29           10000          9.8
NFRTA - All-Sub                                   13             29       0.14            6976          6.8
NFRTA - ProperN                                    6             22       0.08            4696          4.6
NFRTA - ProperS                                    6             22       0.07            4696          4.6
εNRFTA - All-Sub                                  13             20       0.11            9048          8.8
NRFTA - All-Sub                                   13             29        0.1           12856         12.6
NRFTA - ProperN                                    6             22       0.07            8424          8.2
NRFTA - ProperS                                    6             22       0.07            8424          8.2
DFRTA - All-Sub                                    9            183       0.84           24728         24.1
DFRTA - ProperN                                    7            115       0.56           10144          9.9
DFRTA - ProperS                                    7            115       0.49           10144          9.9
DFRTA - SubTree Filtering - All-Sub                9             87       0.57           25384         24.8
DFRTA - SubTree Filtering - ProperN                7             87       0.44           11408         11.1
DFRTA - SubTree Filtering - ProperS                7             87       0.43           11408         11.1
DFRTA - Index Filtering - All-Sub                  9             43        0.5           25848         25.2
DFRTA - Index Filtering - ProperN                  7             43       0.33           11776         11.5
DFRTA - Index Filtering - ProperS                  7             43       0.33           11776         11.5
DFRTA - Symbol Filtering - All-Sub                 9             32       0.51           28296         27.6
DFRTA - Symbol Filtering - ProperN                 7             32       0.32           13888         13.6
DFRTA - Symbol Filtering - ProperS                 7             32       0.32           13888         13.6
DFRTA - Symbol & Index Filtering - All-Sub         9             17       0.48           30496         29.8
DFRTA - Symbol & Index Filtering - ProperN         7             17       0.28           15736         15.4
DFRTA - Symbol & Index Filtering - ProperS         7             17       0.27           15736         15.4

Table E.4: Sample 4 – Table Statistics

                                            # R-Tables  # R-Entries  # φ-Tables  # φ-Entries
εNFRTA - All-Sub                                     -            -           -            -
NFRTA - All-Sub                                      -            -           -            -
NFRTA - ProperN                                      -            -           -            -
NFRTA - ProperS                                      -            -           -            -
εNRFTA - All-Sub                                     -            -           -            -
NRFTA - All-Sub                                      -            -           -            -
NRFTA - ProperN                                      -            -           -            -
NRFTA - ProperS                                      -            -           -            -
DFRTA - All-Sub                                      -            -           -            -
DFRTA - ProperN                                      -            -           -            -
DFRTA - ProperS                                      -            -           -            -
DFRTA - SubTree Filtering - All-Sub                  1            6           1            9
DFRTA - SubTree Filtering - ProperN                  1            6           1            7
DFRTA - SubTree Filtering - ProperS                  1            6           1            7
DFRTA - Index Filtering - All-Sub                    2            8           2           18
DFRTA - Index Filtering - ProperN                    2            8           2           14
DFRTA - Index Filtering - ProperS                    2            8           2           14
DFRTA - Symbol Filtering - All-Sub                   4           11           4           36
DFRTA - Symbol Filtering - ProperN                   4           11           4           28
DFRTA - Symbol Filtering - ProperS                   4           11           4           28
DFRTA - Symbol & Index Filtering - All-Sub           6           13           6           54
DFRTA - Symbol & Index Filtering - ProperN           6           13           6           42
DFRTA - Symbol & Index Filtering - ProperS           6           13           6           42


Table E.5: Sample 5 – Basic Statistics

                                            # States  # Transitions  Time (ms)  Memory (bytes)  Memory (KB)
εNFRTA - All-Sub                                  12             18       0.27            9960          9.7
NFRTA - All-Sub                                   12             22       0.18            6936          6.8
NFRTA - ProperN                                    6             16       0.07            4984          4.9
NFRTA - ProperS                                    6             16       0.06            4984          4.9
εNRFTA - All-Sub                                  12             18       0.12            8128          7.9
NRFTA - All-Sub                                   12             22       0.09            9552          9.3
NRFTA - ProperN                                    6             16       0.06            5736          5.6
NRFTA - ProperS                                    6             16       0.06            5736          5.6
DFRTA - All-Sub                                    9            173       0.84           26392         25.8
DFRTA - ProperN                                    7            107       0.53            9976          9.7
DFRTA - ProperS                                    7            107       0.48            9976          9.7
DFRTA - SubTree Filtering - All-Sub                9             80       0.56           27048         26.4
DFRTA - SubTree Filtering - ProperN                7             80       0.44           11240         11.0
DFRTA - SubTree Filtering - ProperS                7             80       0.43           11240         11.0
DFRTA - Index Filtering - All-Sub                  9             32        0.5           27640         27.0
DFRTA - Index Filtering - ProperN                  7             32        0.3           11736         11.5
DFRTA - Index Filtering - ProperS                  7             32        0.3           11736         11.5
DFRTA - Symbol Filtering - All-Sub                 9             38       0.51           29456         28.8
DFRTA - Symbol Filtering - ProperN                 7             38       0.35           13264         13.0
DFRTA - Symbol Filtering - ProperS                 7             38       0.34           13264         13.0
DFRTA - Symbol & Index Filtering - All-Sub         9             18       0.48           31568         30.8
DFRTA - Symbol & Index Filtering - ProperN         7             18       0.28           15104         14.8
DFRTA - Symbol & Index Filtering - ProperS         7             18       0.27           15104         14.8

Table E.6: Sample 5 – Table Statistics

                                            # R-Tables  # R-Entries  # φ-Tables  # φ-Entries
εNFRTA - All-Sub                                     -            -           -            -
NFRTA - All-Sub                                      -            -           -            -
NFRTA - ProperN                                      -            -           -            -
NFRTA - ProperS                                      -            -           -            -
εNRFTA - All-Sub                                     -            -           -            -
NRFTA - All-Sub                                      -            -           -            -
NRFTA - ProperN                                      -            -           -            -
NRFTA - ProperS                                      -            -           -            -
DFRTA - All-Sub                                      -            -           -            -
DFRTA - ProperN                                      -            -           -            -
DFRTA - ProperS                                      -            -           -            -
DFRTA - SubTree Filtering - All-Sub                  1            6           1            9
DFRTA - SubTree Filtering - ProperN                  1            6           1            7
DFRTA - SubTree Filtering - ProperS                  1            6           1            7
DFRTA - Index Filtering - All-Sub                    2            8           2           18
DFRTA - Index Filtering - ProperN                    2            8           2           14
DFRTA - Index Filtering - ProperS                    2            8           2           14
DFRTA - Symbol Filtering - All-Sub                   3           10           3           27
DFRTA - Symbol Filtering - ProperN                   3           10           3           21
DFRTA - Symbol Filtering - ProperS                   3           10           3           21
DFRTA - Symbol & Index Filtering - All-Sub           5           13           5           45
DFRTA - Symbol & Index Filtering - ProperN           5           13           5           35
DFRTA - Symbol & Index Filtering - ProperS           5           13           5           35


Table E.7: ten Eikelder 68000 – Basic Statistics

                                            # States  # Transitions  Time (ms)  Memory (bytes)  Memory (KB)
εNFRTA - All-Sub                                  36             66       0.37           32864         32.1
NFRTA - All-Sub                                   36             65       0.25           23536         23.0
NFRTA - ProperN                                    9             38       0.12           14584         14.2
NFRTA - ProperS                                    9             38       0.12           14584         14.2
εNRFTA - All-Sub                                  36             66        0.3           25064         24.5
NRFTA - All-Sub                                   36             65       0.24           27856         27.2
NRFTA - ProperN                                    9             38       0.12           10504         10.3
NRFTA - ProperS                                    9             38       0.12           10504         10.3
DFRTA - All-Sub                                   39           9208      62.11          171848        167.8
DFRTA - ProperN                                   10            624       3.15           24536         24.0
DFRTA - ProperS                                   10            624       3.12           24536         24.0
DFRTA - SubTree Filtering - All-Sub               39            508       3.49          115776        113.1
DFRTA - SubTree Filtering - ProperN               10            508       2.79           26184         25.6
DFRTA - SubTree Filtering - ProperS               10            508       2.81           26184         25.6
DFRTA - Index Filtering - All-Sub                 39            354       2.74          117120        114.4
DFRTA - Index Filtering - ProperN                 10            354       2.12           26136         25.5
DFRTA - Index Filtering - ProperS                 10            354       2.11           26136         25.5
DFRTA - Symbol Filtering - All-Sub                39            153       2.02          131718        128.6
DFRTA - Symbol Filtering - ProperN                10            153       1.47           33080         32.3
DFRTA - Symbol Filtering - ProperS                10            153        1.5           33080         32.3
DFRTA - Symbol & Index Filtering - All-Sub        39             89       1.86          149960        146.4
DFRTA - Symbol & Index Filtering - ProperN        10             89       1.18           40648         39.7
DFRTA - Symbol & Index Filtering - ProperS        10             89       1.16           40648         39.7

Table E.8: ten Eikelder 68000 – Table Statistics

                                            # R-Tables  # R-Entries  # φ-Tables  # φ-Entries
εNFRTA - All-Sub                                     -            -           -            -
NFRTA - All-Sub                                      -            -           -            -
NFRTA - ProperN                                      -            -           -            -
NFRTA - ProperS                                      -            -           -            -
εNRFTA - All-Sub                                     -            -           -            -
NRFTA - All-Sub                                      -            -           -            -
NRFTA - ProperN                                      -            -           -            -
NRFTA - ProperS                                      -            -           -            -
DFRTA - All-Sub                                      -            -           -            -
DFRTA - ProperN                                      -            -           -            -
DFRTA - ProperS                                      -            -           -            -
DFRTA - SubTree Filtering - All-Sub                  1            9           1           39
DFRTA - SubTree Filtering - ProperN                  1            9           1           10
DFRTA - SubTree Filtering - ProperS                  1            9           1           10
DFRTA - Index Filtering - All-Sub                    2           15           2           78
DFRTA - Index Filtering - ProperN                    2           15           2           20
DFRTA - Index Filtering - ProperS                    2           15           2           20
DFRTA - Symbol Filtering - All-Sub                   8           33           8          312
DFRTA - Symbol Filtering - ProperN                   8           33           8           80
DFRTA - Symbol Filtering - ProperS                   8           33           8           80
DFRTA - Symbol & Index Filtering - All-Sub          14           49          14          546
DFRTA - Symbol & Index Filtering - ProperN          14           49          14          140
DFRTA - Symbol & Index Filtering - ProperS          14           49          14          140


Table E.9: Mono X86 – Basic Statistics

                                            # States  # Transitions  Time (ms)  Memory (bytes)  Memory (MB)
εNFRTA - All-Sub                                 532           1029       5.97          528848          0.5
NFRTA - All-Sub                                  532           1205       4.98          381248          0.4
NFRTA - ProperN                                   63            736       1.32          225056          0.2
NFRTA - ProperS                                   63            736       1.32          225056          0.2
εNRFTA - All-Sub                                 532           1029       7.54          391040          0.4
NRFTA - All-Sub                                  532           1205       4.97          508560          0.5
NRFTA - ProperN                                   63            736       1.33          207880          0.2
NRFTA - ProperS                                   63            736       1.34          207880          0.2
DFRTA - All-Sub                                  557       24907955     273009       152940456        145.9
DFRTA - ProperN                                   65         348299    1714.59         2535240          2.4
DFRTA - ProperS                                   65         348299    1701.26         2535240          2.4
DFRTA - SubTree Filtering - All-Sub              557         337821    7661.07         3833328          3.7
DFRTA - SubTree Filtering - ProperN               65         337821    4166.51         2510232          2.4
DFRTA - SubTree Filtering - ProperS               65         337821    4159.49         2510232          2.4
DFRTA - Index Filtering - All-Sub                557         160651    3157.93         2688280          2.6
DFRTA - Index Filtering - ProperN                 65         160651    1343.52         1341568          1.3
DFRTA - Index Filtering - ProperS                 65         160651    1333.87         1341568          1.3
DFRTA - Symbol Filtering - All-Sub               557           2097     321.78         6099712          5.8
DFRTA - Symbol Filtering - ProperN                65           2097      67.58          931144          0.9
DFRTA - Symbol Filtering - ProperS                65           2097      67.53          931144          0.9
DFRTA - Symbol & Index Filtering - All-Sub       557           1207     381.24        11805000         11.3
DFRTA - Symbol & Index Filtering - ProperN        65           1207      68.53         1637712          1.6
DFRTA - Symbol & Index Filtering - ProperS        65           1207      68.15         1637712          1.6

Table E.10: Mono X86 – Table Statistics

                                            # R-Tables  # R-Entries  # φ-Tables  # φ-Entries
εNFRTA - All-Sub                                     -            -           -            -
NFRTA - All-Sub                                      -            -           -            -
NFRTA - ProperN                                      -            -           -            -
NFRTA - ProperS                                      -            -           -            -
εNRFTA - All-Sub                                     -            -           -            -
NRFTA - All-Sub                                      -            -           -            -
NRFTA - ProperN                                      -            -           -            -
NRFTA - ProperS                                      -            -           -            -
DFRTA - All-Sub                                      -            -           -            -
DFRTA - ProperN                                      -            -           -            -
DFRTA - ProperS                                      -            -           -            -
DFRTA - SubTree Filtering - All-Sub                  1           64           1          557
DFRTA - SubTree Filtering - ProperN                  1           64           1           65
DFRTA - SubTree Filtering - ProperS                  1           64           1           65
DFRTA - Index Filtering - All-Sub                    2           88           2         1114
DFRTA - Index Filtering - ProperN                    2           88           2          130
DFRTA - Index Filtering - ProperS                    2           88           2          130
DFRTA - Symbol Filtering - All-Sub                 238          722         238       132566
DFRTA - Symbol Filtering - ProperN                 238          722         238        15470
DFRTA - Symbol Filtering - ProperS                 238          722         238        15470
DFRTA - Symbol & Index Filtering - All-Sub         318          872         318       177126
DFRTA - Symbol & Index Filtering - ProperN         318          872         318        20670
DFRTA - Symbol & Index Filtering - ProperS         318          872         318        20670


Table E.11: Mono IA64 – Basic Statistics

                                            # States  # Transitions  Time (ms)  Memory (bytes)  Memory (MB)
εNFRTA - All-Sub                                 441            865       4.13          447136          0.4
NFRTA - All-Sub                                  441           1078       3.45          319984          0.3
NFRTA - ProperN                                   40            677       0.91          186880          0.2
NFRTA - ProperS                                   40            677       0.91          186880          0.2
εNRFTA - All-Sub                                 441            865       5.69          330025          0.3
NRFTA - All-Sub                                  441           1078       3.37          456080          0.4
NRFTA - ProperN                                   44            677       0.91          199768          0.2
NRFTA - ProperS                                   44            677       0.91          199768          0.2
DFRTA - All-Sub                                  438       14075158     203738        86757368         82.7
DFRTA - ProperN                                   42         135562     622.21         1060696          1.0
DFRTA - ProperS                                   42         135562     621.49         1060696          1.0
DFRTA - SubTree Filtering - All-Sub              438         129342    4127.71         2073816          2.0
DFRTA - SubTree Filtering - ProperN               42         129342    1354.73         1051240          1.0
DFRTA - SubTree Filtering - ProperS               42         129342    1351.51         1051240          1.0
DFRTA - Index Filtering - All-Sub                438          54706    1659.59         1612144          1.5
DFRTA - Index Filtering - ProperN                 42          54706     393.34          570560          0.5
DFRTA - Index Filtering - ProperS                 42          54706     393.02          570560          0.5
DFRTA - Symbol Filtering - All-Sub               438           1351     193.08         4721856          4.5
DFRTA - Symbol Filtering - ProperN                42           1351      26.92          642160          0.6
DFRTA - Symbol Filtering - ProperS                42           1351      26.73          642160          0.6
DFRTA - Symbol & Index Filtering - All-Sub       438            915     266.39         9121960          8.7
DFRTA - Symbol & Index Filtering - ProperN        42            915      29.59         1104440          1.1
DFRTA - Symbol & Index Filtering - ProperS        42            915      29.41         1104440          1.1

Table E.12: Mono IA64 – Table Statistics

                                            # R-Tables  # R-Entries  # φ-Tables  # φ-Entries
εNFRTA - All-Sub                                     -            -           -            -
NFRTA - All-Sub                                      -            -           -            -
NFRTA - ProperN                                      -            -           -            -
NFRTA - ProperS                                      -            -           -            -
εNRFTA - All-Sub                                     -            -           -            -
NRFTA - All-Sub                                      -            -           -            -
NRFTA - ProperN                                      -            -           -            -
NRFTA - ProperS                                      -            -           -            -
DFRTA - All-Sub                                      -            -           -            -
DFRTA - ProperN                                      -            -           -            -
DFRTA - ProperS                                      -            -           -            -
DFRTA - SubTree Filtering - All-Sub                  1           41           1          438
DFRTA - SubTree Filtering - ProperN                  1           41           1           42
DFRTA - SubTree Filtering - ProperS                  1           41           1           42
DFRTA - Index Filtering - All-Sub                    2           56           2          876
DFRTA - Index Filtering - ProperN                    2           56           2           84
DFRTA - Index Filtering - ProperS                    2           56           2           84
DFRTA - Symbol Filtering - All-Sub                 234          643         234       102492
DFRTA - Symbol Filtering - ProperN                 234          643         234         9828
DFRTA - Symbol Filtering - ProperS                 234          643         234         9828
DFRTA - Symbol & Index Filtering - All-Sub         307          776         307       134466
DFRTA - Symbol & Index Filtering - ProperN         307          776         307        12894
DFRTA - Symbol & Index Filtering - ProperS         307          776         307        12894


Table E.13: Mono Sparc – Basic Statistics

                                            # States  # Transitions  Time (ms)  Memory (bytes)  Memory (MB)
εNFRTA - All-Sub                                 491            967       5.05          494104          0.5
NFRTA - All-Sub                                  491           1145       4.26          352840          0.3
NFRTA - ProperN                                   51            705       1.08          206040          0.2
NFRTA - ProperS                                   51            705       1.09          206040          0.2
εNRFTA - All-Sub                                 491            967       6.77          364456          0.3
NRFTA - All-Sub                                  491           1145       4.25          482160          0.5
NRFTA - ProperN                                   51            705       1.07          200064          0.2
NRFTA - ProperS                                   51            705       1.07          200064          0.2
DFRTA - All-Sub                                  487       18342396     188725       112685992        107.5
DFRTA - ProperN                                   53         225066     965.44         1653248          1.6
DFRTA - ProperS                                   53         225066     957.41         1653248          1.6
DFRTA - SubTree Filtering - All-Sub              487         208720    4371.86         2756664          2.6
DFRTA - SubTree Filtering - ProperN               53         208720    2815.37         1578624          1.5
DFRTA - SubTree Filtering - ProperS               53         208720    2810.04         1578624          1.5
DFRTA - Index Filtering - All-Sub                487          97543    1726.73         2049936          2.0
DFRTA - Index Filtering - ProperN                 53          97543     832.37          851064          0.8
DFRTA - Index Filtering - ProperS                 53          97543     829.99          851064          0.8
DFRTA - Symbol Filtering - All-Sub               487           1502     230.77         5395448          5.1
DFRTA - Symbol Filtering - ProperN                53           1502      34.39          783600          0.7
DFRTA - Symbol Filtering - ProperS                53           1502      34.06          783600          0.7
DFRTA - Symbol & Index Filtering - All-Sub       487           1001     314.37        10456104         10.0
DFRTA - Symbol & Index Filtering - ProperN        53           1001      37.35         1375792          1.3
DFRTA - Symbol & Index Filtering - ProperS        53           1001      37.15         1375792          1.3

Table E.14: Mono Sparc – Table Statistics

                                            # R-Tables  # R-Entries  # φ-Tables  # φ-Entries
εNFRTA - All-Sub                                     -            -           -            -
NFRTA - All-Sub                                      -            -           -            -
NFRTA - ProperN                                      -            -           -            -
NFRTA - ProperS                                      -            -           -            -
εNRFTA - All-Sub                                     -            -           -            -
NRFTA - All-Sub                                      -            -           -            -
NRFTA - ProperN                                      -            -           -            -
NRFTA - ProperS                                      -            -           -            -
DFRTA - All-Sub                                      -            -           -            -
DFRTA - ProperN                                      -            -           -            -
DFRTA - ProperS                                      -            -           -            -
DFRTA - SubTree Filtering - All-Sub                  1           51           1          487
DFRTA - SubTree Filtering - ProperN                  1           51           1           53
DFRTA - SubTree Filtering - ProperS                  1           51           1           53
DFRTA - Index Filtering - All-Sub                    2           71           2          974
DFRTA - Index Filtering - ProperN                    2           71           2          106
DFRTA - Index Filtering - ProperS                    2           71           2          106
DFRTA - Symbol Filtering - All-Sub                 242          700         242       117854
DFRTA - Symbol Filtering - ProperN                 242          700         242        12826
DFRTA - Symbol Filtering - ProperS                 242          700         242        12826
DFRTA - Symbol & Index Filtering - All-Sub         319          841         319       155353
DFRTA - Symbol & Index Filtering - ProperN         319          841         319        16907
DFRTA - Symbol & Index Filtering - ProperS         319          841         319        16907

Bibliography

[AC75] Alfred V. Aho and Margaret J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975.

[AGT89] Alfred V. Aho, Mahadevan Ganapathi, and Steven W. K. Tjiang. Code generation using tree matching and dynamic programming. ACM Transactions on Programming Languages and Systems, 11(4):491–516, 1989.

[ALSU07] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers, principles, techniques, & tools. Pearson – Addison Wesley, second edition, 2007.

[Cle07] Loek G.W.A. Cleophas. Three Taxonomies of Tree Algorithms. PhD thesis, Technische Universiteit Eindhoven, May 2007. Draft Version.

[Dre] Frank Drewes. The TREEBAG Manual, 1.2 edition. http://www.informatik.uni-bremen.de/theorie/treebag/manual/manual.html.

[ESL89] H. Emmelmann, F.W. Schröer, and L. Landwehr. BEG: a generator for efficient back ends. In PLDI '89: Proceedings of the ACM SIGPLAN 1989 Conference on Programming Language Design and Implementation, pages 227–237, New York, NY, USA, 1989. ACM Press.

[FHP92a] Christopher W. Fraser, David R. Hanson, and Todd A. Proebsting. Engineering a simple, efficient code-generator generator. ACM Letters on Programming Languages and Systems, 1(3):213–226, 1992.

[FHP92b] Christopher W. Fraser, Robert R. Henry, and Todd A. Proebsting. BURG: fast optimal instruction selection and tree parsing. ACM SIGPLAN Notices, 27(4):68–76, 1992.

[Flo62] Robert W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345, 1962.

[FSW94] Christian Ferdinand, Helmut Seidl, and Reinhard Wilhelm. Tree automata for code selection. Acta Informatica, 31(9):741–760, 1994.

[GT] Thomas Genet and Valérie Viet Triem Tong. Timbuk – A Tree Automata Library, 2.0 edition. http://www.irisa.fr/lande/genet/timbuk/Manual.pdf.

[GT01] Thomas Genet and Valérie Viet Triem Tong. Reachability analysis of term rewriting systems with Timbuk. In LPAR '01: Proceedings of the Artificial Intelligence on Logic for Programming, pages 695–706, London, UK, 2001. Springer-Verlag.


[HK89] C. Hemerik and J. P. Katoen. Bottom-up tree acceptors. Science of Computer Programming, 13(1):51–72, December 1989.

[iBU] iBurg. http://www.cs.princeton.edu/software/iburg/.

[laz] Lazarus. http://www.lazarus.freepascal.org/.

[Lin01] Peter Linz. An Introduction to Formal Languages and Automata. Jones and Bartlett Publishers, third edition, 2001.

[mon] Mono project. http://www.mono-project.com/.

[Pro95] Todd A. Proebsting. BURS automata generation. ACM Transactions on Programming Languages and Systems, 17(3):461–486, 1995.

[Ski98] Steven S. Skiena. The Algorithm Design Manual. Springer-Verlag New York, Inc., New York, NY, USA, 1998.

[tE89] H.M.M. ten Eikelder. A simple implementation of a bottom-up tree acceptor. Technical report, Technische Universiteit Eindhoven, 1989.

[vdBdJKO00] M. G. T. van den Brand, H. A. de Jong, P. Klint, and P. A. Olivier. Efficient annotated terms. Software-Practice and Experience, 30(3):259–291, 2000.

[War62] Stephen Warshall. A theorem on boolean matrices. J. ACM, 9(1):11–12, 1962.
