Symbolic Analysis and Atomistic Model As a Basis for a Program Comprehension Methodology
JYVÄSKYLÄ STUDIES IN COMPUTING 90

Erkki Laitila

Symbolic Analysis and Atomistic Model as a Basis for a Program Comprehension Methodology

Academic dissertation to be publicly discussed, by permission of the Faculty of Information Technology of the University of Jyväskylä, in the Building Agora (Ag Aud. 3), on April 26, 2008 at 12 o'clock noon.

UNIVERSITY OF JYVÄSKYLÄ
JYVÄSKYLÄ 2008

Editors
Tommi Kärkkäinen, Department of Mathematical Information Technology, University of Jyväskylä
Irene Ylönen, Marja-Leena Tynkkynen, Publishing Unit, University Library of Jyväskylä

Cover picture by Erkki Laitila

URN:ISBN:978-951-39-3252-7
ISBN 978-951-39-3252-7 (PDF)
ISBN 978-951-39-2908-4 (nid.)
ISSN 1456-5390

Copyright © 2008, by University of Jyväskylä
Jyväskylä University Printing House, Jyväskylä 2008

ABSTRACT

Laitila, Erkki
Symbolic Analysis and Atomistic Model as a Basis for a Program Comprehension Methodology
Jyväskylä: University of Jyväskylä, 2008, 326 p.
(Jyväskylä Studies in Computing, ISSN 1456-5390; 90)
ISBN 978-951-39-3252-7 (PDF), 978-951-39-2908-4 (nid.)
Finnish summary
Diss.

Research on program comprehension (PC) is important because the amount of source code in mission-critical applications is increasing worldwide. Software maintenance takes more than half of all software development time, and the effort to understand code accounts for about half of that.
Although of great importance, research on program comprehension is not yet very advanced. Notwithstanding its many excellent qualities, modern object-oriented code is harder to understand and more difficult to analyze than earlier procedural code because of encapsulation and object bindings. As a solution to this problem we propose an information flow structure with four stages that helps in systematically obtaining new knowledge from the code. The first stage loads the program through GrammarWare into a symbolic form, which serves as the building material for the model constructed in the second stage, which we call ModelWare. In our research we wanted to find the smallest possible structure that could be used for modeling. This gave us the idea of an "atom" in the source code. The idea was then implemented as a so-called hybrid object, which combines object-based abstraction with the expressiveness of a logic language. As a consequence, semantics and associations could be presented in a symbolic form. The third stage, code simulation based on SimulationWare, enables symbolic analysis, which provides a program simulation functionality comparable to dynamic analysis. The last stage in our methodology, KnowledgeWare, is aimed at collecting knowledge: the user constructs, stage by stage, the most suitable representations for the current tasks, which include code inspection, error detection and verification of current operations. The methodology is programmed with Visual Prolog and implemented in our JavaMaster tool, which enables the handling of Java code in accordance with the main stages. The formalism of the resulting implementation architecture combines the main functions in program development: reverse engineering for maintenance, and forward engineering for the design of new code.

Keywords: software maintenance, program comprehension, reverse engineering, grammars and automata, model theory, knowledge capture.
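The four-stage information flow described in the abstract can be pictured with a toy sketch. The Python below is not the JavaMaster implementation (which is programmed with Visual Prolog and handles Java code); the `Atom` class, the four function names, and the tiny assignment language are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """Hypothetical 'atom': one symbolic command plus links to the atoms it uses."""
    name: str
    command: tuple                      # ("const", 2) or ("add", "x", "y")
    links: list = field(default_factory=list)


def grammarware(source: str) -> list:
    """Stage 1 (assumed shape): turn toy lines such as 'z = x + y' into symbolic tuples."""
    symbols = []
    for line in source.strip().splitlines():
        target, expr = (part.strip() for part in line.split("="))
        terms = [term.strip() for term in expr.split("+")]
        if len(terms) == 1:
            symbols.append((target, ("const", int(terms[0]))))
        else:
            symbols.append((target, ("add", *terms)))
    return symbols


def modelware(symbols: list) -> dict:
    """Stage 2: build one atom per symbol and link each atom to its operands."""
    atoms = {name: Atom(name, command) for name, command in symbols}
    for atom in atoms.values():
        if atom.command[0] == "add":
            atom.links = [atoms[operand] for operand in atom.command[1:]]
    return atoms


def simulationware(atoms: dict) -> list:
    """Stage 3: run every atom once and record a trace of (name, value) pairs."""
    values, trace = {}, []

    def run(atom: Atom) -> int:
        if atom.name not in values:
            if atom.command[0] == "const":
                values[atom.name] = atom.command[1]
            else:
                values[atom.name] = sum(run(linked) for linked in atom.links)
            trace.append((atom.name, values[atom.name]))
        return values[atom.name]

    for atom in atoms.values():
        run(atom)
    return trace


def knowledgeware(trace: list) -> str:
    """Stage 4: present the trace in a form a maintainer could inspect."""
    return "; ".join(f"{name} = {value}" for name, value in trace)


source = """
x = 1
y = 2
z = x + y
"""
report = knowledgeware(simulationware(modelware(grammarware(source))))
print(report)  # x = 1; y = 2; z = 3
```

Each stage consumes only the output of the previous one, which mirrors the staged flow from symbolic form to model to simulation to collected knowledge; a real analysis of Java would of course need a full grammar and a much richer atom structure.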
Author's address
Laitila, Erkki
Department of Mathematical Information Technology
University of Jyväskylä, Finland
P.O. Box 35 (Agora), 40014 University of Jyväskylä
[email protected]

Supervisors
Neittaanmäki, Pekka
Department of Mathematical Information Technology
University of Jyväskylä, Finland

Kärkkäinen, Tommi
Department of Mathematical Information Technology
University of Jyväskylä, Finland

Koskinen, Jussi
Department of Computer Science and Information Systems
University of Jyväskylä, Finland

Reviewers
Lahdelma, Risto
Department of Information Technology
University of Turku, Finland

Leiss, Ernst L.
University of Houston, USA

Opponents
Sajaniemi, Jorma
University of Joensuu, Finland

Seppänen, Veikko
University of Oulu, Finland

ACKNOWLEDGEMENTS

The primary motivation for this work has its origins in the practical problems that I have come across in programming. For the past two decades I have been able to dedicate my company working hours both to implementing industrial Prolog applications and to studying interesting phenomena in artificial intelligence. I thank my wife, Maritta, for allowing me this rather flexible lifestyle, which finally led to this contribution.

My aspiration towards program comprehension is due to my background both as a practitioner and as an entrepreneur. My participation (2001-2002) in an educational program, sponsored by Tekes and aimed at Finnish software entrepreneurs in the USA, allowed me to define, for this research, the most promising pragmatic stance in software reverse engineering, where static analysis and dynamic analysis are the current paradigms. During the re-orientation, Visual Prolog has been my secret weapon, the means to practice computer-aided invention and to implement prototypes and applications. My thanks for this excellent product, with which one can create formalisms such as hybrid-object models, go to Leo Jensen and his colleagues at PDC.
Of the topic-related conversations that I had, the most pragmatic were those with Tuomo Tuomikoski and Prof. Veikko Seppänen from Elektrobit Ltd. My heartfelt thanks go to my principal supervisor, Prof. Pekka Neittaanmäki, for organizing the research work since 2004, when I began to update my initial knowledge as a computer engineer in order to obtain a PhD degree. I would like to express my gratitude to the supervisors for reviewing the most critical chapters. It surely has been a challenging task to assimilate all my proposals: I extended the scope no fewer than three times.

Being a proposal for a new paradigm, symbolic code analysis also met criticism. Doubts were cast on the workings of the software atom and on whether the Turing machine model would be relevant in this work. It was encouraging when Emeritus Prof. Aarni Perko voluntarily evaluated the thesis and wrote the first positive report about it. Further positive arguments came from Prof. Ernst Leiss, who saw a virtue, not a burden, in my inclination towards reaching a comprehensive understanding instead of further specialization. To him, I therefore wish to express my sincere gratitude. I would like to thank Prof. Risto Lahdelma, a specialist in Prolog and software, for several suggestions during the long examination process, which improved the readability.

I would like to thank COMAS (Jyväskylä Graduate School in Computing and Mathematical Sciences) for providing the financial means for my investigation. I would also like to thank the Ellen and Artturi Nyyssönen Foundation for their contribution.

During the research Steve Legrand has helped me in completing the text. I would like to thank Steve for many interesting conversations leading to a conference trip as far as Mexico, and for nine checked articles. The quality must have been good, because all six of my latest articles have been accepted for international conferences.
Finally, I want to salute my children, who patiently supported my efforts during lonely times when the whole software formalism loomed over my work like a huge Gordian knot. My special thanks to Ville, who has kept me informed about the best practices in Java programming.

In Jyväskylä, Nenäinniemi 7.4.2008
Erkki Laitila

FIGURES

FIGURE 1 Program comprehension data flow
FIGURE 2 Reverse engineering, a horse shoe diagram (Tichelaar, 2001)
FIGURE 3 Main levels of the methodology.
FIGURE 4 The main approaches for the program comprehension tool.
FIGURE 5 Program comprehension as automata and transformations.
FIGURE 6 Definition of a computable function by a register machine.
FIGURE 7 The research as a three machine model
FIGURE 8 The research approach and the corresponding theories
FIGURE 9 Defining the Symbolic language
FIGURE 10 Direct, semantic translation
FIGURE 11 Summary of GrammarWare
FIGURE 12 Atomistic hybrid object, AHO, the architecture
FIGURE 13 Class hierarchy of the symbolic atom.
FIGURE 14 Atomistic metaphor for the Java main method
FIGURE 15 The semantics of the symbolic atom.