AN INFRASTRUCTURE TO SUPPORT INTEROPERABILITY IN REVERSE ENGINEERING

A Dissertation Presented to the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

by Nicholas A. Kraft

May 2007

Accepted by:
Dr. Brian A. Malloy, Committee Chair
Dr. Harold Grossman
Dr. James F. Power
Dr. Roy P. Pargas

ABSTRACT

An infrastructure that supports interoperability among reverse engineering tools and other tools is described. The three major components of the infrastructure are: (1) a hierarchy of schemas for low- and middle-level program representation graphs, (2) g4re, a tool chain for reverse engineering C++ programs, and (3) a repository of reverse engineering artifacts, including the previous two components, a test suite, and tools, GXL instances, and XSLT transformations for graphs at each level of the hierarchy. The results of two case studies that investigated the space and time costs incurred by the infrastructure are provided. The results of two empirical evaluations that were performed using the API module of g4re, and were focused on computation of object-oriented metrics and three-dimensional visualization of class template diagrams, respectively, are also provided.

DEDICATION

For C.R.K. and J.H.K.

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Brian Malloy, for his advice, support, and, most of all, friendship. I would also like to thank the members of my committee, especially Dr. James Power. Thanks to Dr. Jason Hallstrom for his help and friendship, and to Dr. Errol Lloyd for his advice and collaboration. Special thanks to Dr. Baker, Dr. Finkbine, Dr. Hollingsworth, Dr. Lang, and Mr. Manwani for preparing and encouraging me, and to Dr. Murali Sitaraman for bringing me to Clemson. I would also like to thank the faculty and staff of the Computer Science department, especially Dr. Grossman and Dr. Westall. A very special thanks to Jay Harris for his support, trust, friendship, and respect. Also, thanks to Ben Hoipkemier for his multiple contributions to this project, to Daniel Lowhorn for being a great teammate, to Andy Dalton and George Dowding for many late night conversations, to Tony Bowles and John Hunt for helping me survive 822, and to Jennifer and Brian for always taking care of me. Finally, I am grateful to all of my family and friends for their love and support, particularly my parents Jane and Bill, my brother Charlie, my wife Christine, my grandparents Carl and Jean, my oldest friends Chris, Jack, and Sue Robinson, my buddy Ringo, and my in-laws the Stemm family.

I would also like to thank the many anonymous reviewers who read drafts of this work for providing both insightful comments and encouraging words. I would also like to thank the College of Engineering and Science, and the Graduate School at Clemson University for the awards in recognition of my research. I am truly honored and humbled to have been the first student from the Department of Computer Science to win these awards.

TABLE OF CONTENTS

Page

TITLE PAGE ...... i

ABSTRACT ...... iii

DEDICATION ...... v

ACKNOWLEDGMENTS ...... vii

LIST OF FIGURES ...... xi

LIST OF LISTINGS ...... xiii

LIST OF TABLES ...... xv

1 Introduction ...... 1
1.1 Research Problems ...... 2
1.2 Dissertation Outline ...... 6

2 Background ...... 9
2.1 Graph eXchange Language ...... 9
2.2 Program Representation Graphs ...... 12
2.2.1 Abstract Syntax Graph ...... 12
2.2.2 Call Graph ...... 14
2.2.3 Class Graphs ...... 14
2.2.4 Control Flow Graphs ...... 16

3 Related Work ...... 19
3.1 Exchange Formats and Schemas ...... 19
3.2 Infrastructures for Reverse Engineering ...... 21
3.3 Evaluating Results from Reverse Engineering ...... 23
3.4 Linking in Reverse Engineering Tools ...... 24
3.5 Tools for Reverse Engineering C++ Programs ...... 25
3.5.1 Tools that Provide a C++ Parser ...... 25
3.5.2 Tools that Utilize the GCC C++ Parser ...... 26
3.6 Discussion ...... 27

4 Schemas for Low- and Middle-Level Graphs ...... 31
4.1 Hierarchy of Schemas ...... 31
4.2 Low-Level Schemas: Levels 0 and I ...... 35
4.3 Middle-Level Schemas: Levels II, III, and IV ...... 38
4.4 Comparing Schema Instances ...... 43

5 g4re – Tool Chain for Reverse Engineering C++ Programs ...... 45
5.1 Architecture ...... 45
5.1.1 The ASG module ...... 47
5.1.2 The Schema and Serialization Modules ...... 48
5.1.3 The Transformation Module ...... 48
5.1.4 The Linking Module ...... 49
5.1.5 The API Module ...... 50
5.2 Sample Usage ...... 51
5.2.1 Input ...... 51
5.2.2 Usage ...... 54

6 Case Studies: Realizing the Infrastructure with g4re ...... 57
6.1 Test Suite ...... 57
6.2 Case Study: Exchanging Low-Level Graphs ...... 59
6.2.1 Exchanging Graphs at Level 0 ...... 60
6.2.2 Exchanging Graphs at Level I ...... 65
6.2.3 Discussion ...... 70
6.3 Case Study: Exchanging Middle-Level Graphs ...... 71
6.3.1 Exchanging Graphs at Levels II, III, and IV ...... 71
6.3.2 Transforming GXL Graphs with XSLT ...... 73
6.3.3 Discussion ...... 78

7 Applications: Empirical Evaluation with g4re ...... 79
7.1 Application: Computing Object-Oriented Metrics ...... 79
7.1.1 Overview ...... 79
7.1.2 Related Work ...... 81
7.1.3 Methodology ...... 83
7.1.4 Case Study ...... 85
7.2 Application: Visualizing Class Template Diagrams ...... 91
7.2.1 Overview ...... 91
7.2.2 Related Work ...... 93
7.2.3 Methodology ...... 94
7.2.4 Case Study ...... 96

8 Conclusion ...... 101

APPENDICES ...... 107
A Acronyms and Abbreviations ...... 109
B Repository of Reverse Engineering Artifacts ...... 111

BIBLIOGRAPHY ...... 113

LIST OF FIGURES

Figure Page

2.1 Overview of GXL Validation. The process of validating GXL instance and/or schema graphs. Solid edges represent input and output. Dashed edges represent notes. ...... 11
2.2 Sample AST for struct Node. An AST for struct Node, which is defined in Source Listing 2.1. Uses of the types int and Node have not yet been resolved to their definitions. ...... 13
2.3 Sample ASG for struct Node. An ASG for struct Node, which is defined in Source Listing 2.1. Uses of the types int and Node have been resolved to their definitions. ...... 13

4.1 Hierarchy of Schemas. Schemas for program representation graphs organized by levels. Solid edges with open arrows represent input and output. Dashed edges with filled arrows represent realization. Solid edges represent the progression of information from a graphical representation at one level to a graphical representation at a subsequent level. Filled arrows indicate a single instance is needed, empty arrows indicate that a set of instances are needed. ...... 32
4.2 Level I: CppInfo API – Templates. Excerpt from the CppInfo API schema that illustrates the key classes, aggregations, and associations related to templates. ...... 37
4.3 Level I: CppInfo API – Function Types. Excerpt from the CppInfo API schema that illustrates the key classes, aggregations, and associations related to function types. ...... 38
4.4 Level II: Class Diagram. Excerpt from the UML 2.0 schema that illustrates the key class-related components in a class diagram. ...... 39
4.5 Level II: Class Diagram (continued). Excerpt from the UML 2.0 schema that illustrates the key template-related components in a class diagram. ...... 40
4.6 Level II: Call Graph. Schema for a call graph, a graph whose nodes represent functions and function call sites, and whose edges represent function calls. ...... 40
4.7 Level II: Control Flow Graph (CFG). Schema for a CFG, a graph whose nodes represent blocks of straight-line code, and whose edges represent flow of control between the blocks. ...... 41

4.8 Level III: Object Relation Diagram (ORD). Schema for an ORD, a graph whose nodes represent classes, and whose edges represent relationships, including polymorphic relationships, between the classes. ...... 42
4.9 Level III: Interprocedural Control Flow Graph (ICFG). Schema for an ICFG, a graph whose nodes represent blocks of straight-line code, and whose edges represent flow of control between the blocks and caller-callee relationships. ...... 42
4.10 Level IV: Class Firewall. Schema for a class firewall, a graph whose nodes represent classes, and whose edges represent testing dependencies between the classes. ...... 43

5.1 Overview of g4re. Dashed lines represent "use" dependencies. Bold text indicates an implementation artifact. Italic text indicates a third party ...... 46
5.2 Overview of API usage. Solid, directed lines show input, unless otherwise noted. Dashed lines show notes. ...... 52
5.3 UML Activity Diagram for Transformer Input. The process of creating a set of files to transform. ...... 53
5.4 UML Activity Diagram for Linker Input. The process of creating a set of files to link. ...... 53
5.5 UML Activity Diagram for API Input. The process of creating a file for use with the API. ...... 54

7.1 System overview. Solid, directed lines show input. Dashed lines show notes. ...... 94
7.2 Schema for Class Template Diagram (CTD). A UML class diagram representation of the CTD schema. ...... 95
7.3 Visualization of CTD for Pixie. The interface to the visualization system, and a three-dimensional CTD diagram for Pixie. The two sliders control the azimuth and the elevation. The keyboard controls horizontal and vertical movement, and zooming. ...... 98
7.4 Visualization of CTD for FluxBox. A three-dimensional CTD diagram for FluxBox. ...... 99

LIST OF SOURCE LISTINGS

Source Listing Page

2.1 Source code for struct Node. Definition of the C++ struct Node. Node consists of an integer, value, and a pointer to another Node, next. ...... 12

5.1 Sample C++ program. Two disjoint inheritance hierarchies that consist of ten classes. ...... 54
5.2 Sample user program. A simple program analysis that counts the number of root, interior, and leaf classes. ...... 55

6.1 Source code for class Parser. Definition of the C++ class Parser. Parser inherits from the class Base. ...... 60
6.2 Instance of a tu file. Definition of class Parser as represented in a tu file. A node definition in a tu file consists of: a unique integer prepended with "@", a string representing the node type, edges of the form "edge: dest", fields of the form "field: value", and a set of single word attributes. ...... 60
6.3 GXL instance of the generic schema. Definition of class Parser as represented in a GXL encoded instance of the generic schema. The generic GXL schema is a direct encoding of the tu file format, but with internal gcc information, such as addresses and string lengths, omitted. The "@" symbol is translated to "n" to conform to XML standards. ...... 61
6.4 GXL instance of the CppInfo schema. Definition of class Parser as represented in the GXL encoded, linked instance of the CppInfo schema. ...... 66
6.5 GXL encoded ORD instance. A GXL encoded instance of the ORD schema containing two classes, ::A and ::B, and one Inheritance edge. The edge indicates that ::B inherits from ::A. ...... 71
6.6 GXL encoded Class Firewall instance. A GXL encoded instance of the Class Firewall schema containing two classes, ::A and ::B, and one edge that indicates that if ::A has changed and must be tested, then ::B must be retested as well. ...... 72
6.7 XSLT for summarizing ORD instances. The XSLT style sheet we used to generate the results listed in Table 6.10. We used similar style sheets to generate the results listed in Tables 6.9 and 6.11. ...... 74

LIST OF TABLES

Table Page

6.1 Test suite. The 12 test cases that we use in our study. For each test case, we list the version, the number of C++ translation units, and the approximate number of thousands of non-commented, non-preprocessed lines of code (NCLOC). The test suite contains 1,200 C++ translation units and approximately one million lines of code. ...... 58
6.2 Level 0: Numbers of nodes and edges. The numbers of nodes and edges for ASGs that represent the test cases. ...... 62
6.3 Level 0: Size on disk (MB). The size on disk, in megabytes, for ASGs that represent the test cases. ...... 63
6.4 Level 0: Time (s). The running time, in seconds, to parse and build in-memory representations of ASGs that represent the test cases. ...... 64
6.5 Level I: Numbers of nodes and edges. The numbers of nodes and edges for APIs that represent the test cases. ...... 67
6.6 Level I: Size on disk (MB). The size on disk, in megabytes, for APIs that represent the test cases. ...... 68
6.7 Level I: Time (s). The running time, in seconds, to parse and build in-memory representations of APIs that represent the test cases. ...... 69
6.8 Levels II, III, and IV: Size on disk (kB). The size on disk, in kilobytes, for class diagrams, ORDs, and class firewalls that represent the test cases. ...... 73
6.9 Class Diagram sizes for the test suite. The number of classes and edges, by type, in the 12 instances of the Class Diagram schema constructed for the applications and libraries in our test suite. ...... 75
6.10 ORD sizes for the test suite. The number of classes and edges, by type, in the 12 instances of the ORD schema constructed for the applications and libraries in our test suite. ...... 76
6.11 Class Firewall sizes for the test suite. The numbers of classes and edges in the 12 instances of the Class Firewall schema. In addition, the minimum, maximum, and average class firewall sizes for each of the instances. Class firewall sizes are expressed as number of classes. ...... 77
7.1 Test suite. The eight test cases that we use in our study: four SDL games and four language processing tools. For each test case, we list the version, the number of source files, the number of translation units, the number of C++ translation units, and the approximate number of thousands of non-commented, non-preprocessed lines of code (NCLOC). ...... 85
7.2 Modularity and delegation. The number of classes and functions in each test case; the classes and functions are broken down into categories that relate to modularity and delegation. ...... 87
7.3 Depth Of Inheritance Tree (DIT). Statistics about the DIT metric for each test case. ...... 88
7.4 Number of Ancestors (NOA). Statistics about the NOA metric for each test case. ...... 88
7.5 Number of Children (NOC). Statistics about the NOC metric for each test case. ...... 89
7.6 Weighted Methods per Class (WMC). Statistics about the WMC metric for each test case. ...... 90
7.7 Size on disk (kB). The size on disk, in kilobytes, for uncompressed GXL encodings of instances of the Class Template Diagram schema, and the uncompressed dot encodings of the inheritance hierarchies. ...... 96
7.8 Time (s). The running time, in seconds, for each phase of our visualization system, and for the entire visualization system. ...... 97

Chapter 1

Introduction

In their roadmap for reverse engineering, Müller et al. [2000] explain why reverse engineering tools are critical for controlling the high cost and risk of legacy system evolution. They identify wider adoption as a significant challenge to increased effectiveness of reverse engineering tools, and state that failure to adopt is caused in part by the lack of interoperability among reverse engineering tools and common academic and industrial software tools. Standard exchange formats (SEF) such as the Graph eXchange Language (GXL) have been created to increase interoperability, but previous research has not adequately exploited the semantic specification capabilities of these SEFs. These semantic specification capabilities allow meaning, in addition to structure, to be encoded and exchanged, thus allowing type validation in addition to structural validation.

Accessibility, comparability, reproducibility, and reusability of results are issues tightly coupled to interoperability. To evaluate a new technique, a comparison of the new technique to an existing technique must be performed; to perform the comparison, the reproduction or reuse of the existing results is required. However, researchers have reported considerable difficulty in interpreting and reproducing experimental results. In language design and implementation, for example, Das [2000] and Murphy et al. [1998] reported difficulty in reproducing results in points-to analysis and call graph construction, respectively. In addition, benchmarks are commonly used to evaluate and compare results, but require manual effort if infrastructure support is not provided.

To address the problems of accessibility, comparability, reproducibility, and reusability, we describe an infrastructure that supports interoperability among reverse engineering tools and other software tools [Kraft et al. 2005b; 2007a]. The three major components of our infrastructure are: (1) a hierarchy of schemas for low- and middle-level program representation graphs, (2) g4re, a tool chain for reverse engineering C++ programs [Kraft et al. 2005a; 2007b], and (3) a repository of reverse engineering artifacts, including the previous two components, a test suite, and tools, GXL instances, and XSLT transformations for graphs at all levels of the hierarchy [Kraft 2006].

1.1 Research Problems

The problems we investigated are in the area of reverse engineering; in particular, we investigated infrastructure support for interoperability. Statements of the problems follow.

Problem 1: Schemas for Low- and Middle-Level Program Representation Graphs

At the Dagstuhl Seminar on Interoperability of Reengineering Tools [Lethbridge 2001], GXL [Holt et al. 2006] was ratified as the standard format for exchange of graphs among reverse engineering and reengineering tools. In addition, the participants agreed upon three levels at which interoperability should be applied: (1) low-level graph structures, including abstract syntax trees (AST) and graphs (ASG); (2) middle-level graph structures, including call graphs and control flow graphs (CFG); and (3) high-level graph structures, including architecture descriptions. Two GXL schemas for low-level graphs, but no GXL schemas for middle-level graphs, have been described in the literature.

Only two schemas for low-level graphs have been encoded in GXL and suggested as standard schemas: the Columbus C++ ASG schema [Ferenc et al. 2001], and the Dagstuhl Middle Metamodel (DMM) [Lethbridge et al. 2004]. However, neither of these schemas has been ratified, or widely adopted by tool developers. In addition, neither schema accurately and adequately represents C++ templates, including instantiations, specializations, and partial specializations. Finally, the DMM represents neither information about function pointers, nor the information necessary to create instances of several schemas for middle-level graphs, including control flow graphs.

Goal

Create GXL schemas for low- and middle-level graphs, and use known transformations between graphs to guide the organization of the schemas into a hierarchy.

Evaluation Criteria

1. The hierarchy is arranged in levels, such that an instance of a schema at one level can be created using only information contained in instances of schemas at previous levels.

2. Instances of low-level schemas contain the information needed to create instances of middle-level schemas, including call graphs, class-centric graphs, control flow graphs, and dependency graphs.

3. Low-level schemas are language-specific, and middle-level schemas are language-independent.

4. Low-level schemas for C++ accurately and adequately represent templates, including instantiations, specializations, and partial specializations.

5. Low-level schemas for C++ represent function pointers, including member function pointers.

Problem 2: Tool Support for Reverse Engineering C++ Programs

Source code based reverse engineering tools require a parser and front end to recognize the application under analysis, and to create a representation of the recognized program, such as an ASG. The difficulties that arise during the construction of a parser and front end for C++ are well documented, and are largely due to the size of the language, and the complexity of the template sublanguage [Bodin et al. 1994; Knapen et al. 1999; Lilley 1997; Power and Malloy 2000; Reiss and Davis 1995; Roskind 1989; Veldhuizen 2003]. Many research endeavors have focused on the creation of a C++ parser and front end to enable analysis of C++ programs [Ferenc et al. 2002; Malloy et al. 2003a; McPeak 2005]; however, none of these tools can provide a correct parse tree for C++ programs that use templates, including "hello world", which uses the iostream library, and thus templates.

The GNU Compiler Collection [2002] includes gcc, a public domain, industrial strength compiler for C++ that fully handles templates. Several researchers have used the gcc parser and front end to create tools for reverse engineering C++ [Antoniol et al. 2004; Dean et al. 2001; Gschwind et al. 2004; Hennessy et al. 2003]. Until version 3.0 of gcc was released, such tools were, of necessity, tightly coupled to the compiler internals. Since then, gcc has provided, via a command line flag, a facility for writing the ASG for the given translation unit to a text file. The ASG representation stored in these text files is called generic; instances of generic allow for the construction of a reverse engineering tool for C++ that uses the gcc parser and front end without binding to the compiler internals. Despite this facility in gcc, there remains no public domain, general purpose tool for reverse engineering C++ programs that can fully handle templates.

Goal

Create a public domain, general purpose tool for reverse engineering C++ programs.

Evaluation Criteria

1. The tool is open-source and available on the Web.

2. The tool correctly parses, instantiates, and specializes templates.

3. The tool consists of loosely coupled, reusable modules.

4. The tool provides a module for linking C++ translation units.

5. The tool provides an API module for accessing information about declarations, statements, and some expressions.

6. The tool exchanges information via conforming instances of GXL schemas.

7. The tool is robust and efficient enough to use on medium-sized C++ programs, which contain up to 500,000 non-commented, non-preprocessed lines of code.

8. The tool is general purpose.

Problem 3: A Repository of Reverse Engineering Artifacts

Empirical results are relevant to both researchers and practitioners; they reveal correlations among software technologies and practices. Accessibility of results and other artifacts has been identified as a key hurdle to the adoption of existing infrastructures [Müller et al. 2000]. Distribution of empirical evaluation artifacts is needed to enhance reproducibility and reusability of results, as well as to allow studies to be expanded [Do et al. 2005]. Researchers in software testing have led the way in providing access to empirical results [Andrews 2004; Graves et al. 2001; Harrold et al. 2001; Jones and Harrold 2005; Orso et al. 2004; Rothermel and Harrold 1998]. However, researchers in reverse engineering and program analysis have not responded to this need [Kraft et al. 2005b; 2007a].

Goal

Create a public repository of reverse engineering artifacts, and populate it with empirical results, including all tools, scripts, and documents needed to reproduce the results.

Evaluation Criteria

1. The repository contains a test suite, including important details about each test case:

   • Version
   • Size metrics
   • Configuration and build information

2. For at least one graph at each level of our hierarchy (see Problem 1), the repository contains:

   • A GXL schema
   • Tools that exchange information via conforming GXL instances of the schema
   • GXL instances of the schema for each test case in the test suite
   • A graph transformation that summarizes the information in a GXL instance
   • Empirical results that show the space and time costs incurred by the documents and tools, respectively

3. The repository contains all artifacts needed to reproduce the results described in the previous items, including platform information for each experiment.

4. The repository is available to the public, particularly the reverse engineering community.

1.2 Dissertation Outline

In Chapter 2, we provide background information on the Graph eXchange Language (GXL), and various program representation graphs: the abstract syntax graph (ASG), the call graph, three class-centric graphs, and three control flow graphs. We summarize related research in Chapter 3. In addition, we conclude that chapter with a discussion of the relationship between the summarized research and our own.

In Chapters 4, 5, and 6, we describe the three major components of our infrastructure to support interoperability in reverse engineering. We describe the first component, our hierarchy of schemas for program representation graphs [Kraft et al. 2005b; 2007a], in Chapter 4, and the second component, g4re, our tool chain for reverse engineering C++ programs [Kraft et al. 2005a; 2007b], in Chapter 5. In Chapter 6, we present two case studies in which we use g4re to determine the space and time costs incurred by our infrastructure [Kraft et al. 2007a]. In that same chapter, we also present the final major component of our infrastructure: our repository of artifacts, which includes a test suite, and tools, GXL instance graphs, and XSLT transformations for graphs at all levels of the hierarchy [Kraft 2006].

We present two empirical evaluations performed using g4re in Chapter 7. The two application areas are software measurement and program comprehension, in particular, computation of object-oriented metrics [Jamieson et al. 2005] and three-dimensional software visualization [Hoipkemier et al. 2006], respectively. Finally, in Chapter 8, we summarize our contributions to the area of reverse engineering.

There are two appendices. In Appendix A, we list the acronyms and abbreviations that we use throughout the dissertation. In Appendix B, we provide links to our online repository, where we make our schemas, our tools, and other artifacts from our infrastructure available to the reverse engineering community.


Chapter 2

Background

In this chapter we describe concepts, technologies, and terms that relate to our work in the area of reverse engineering. In particular, we describe Graph eXchange Language (GXL) in Section 2.1, and various program representation graphs in Section 2.2.

2.1 Graph eXchange Language

Graph eXchange Language (GXL) [Holt et al. 2006] was ratified as the standard exchange format (SEF) for reverse engineering and reengineering tools at the Dagstuhl Seminar on Interoperability of Reengineering Tools [Lethbridge 2001]. GXL is an XML language [Bray et al. 2006] that is defined by a document type definition (DTD), and conceptualized as a typed, attributed, directed graph. GXL describes both instance graphs and schema graphs, which are represented by UML class diagrams [Object Management Group 2005], using the same structural elements (node and edge types).

A UML class diagram is a static representation of a program consisting of rectangles to represent classes in the system, and lines connecting the rectangles to represent relationships among the classes. A rectangle for a class is divided into three horizontal sections. The top section displays the name of the class, which is in italics if the class is abstract. The middle section displays the data members, or attributes, of the class, including their types and visibilities (public, protected, private, or package). The bottom section displays the member functions, or operations, of the class, including their return types, parameters, and optionally, exception specifiers. A line between two classes can indicate an aggregation, an association, a composition, or a generalization; a role name and a multiplicity can be assigned to each end of a line that is not a generalization.

A GXL schema is a UML class diagram encoded in XML. GXL provides a common base from which any schema for representing software can be derived; the common base is the GXL metaschema, a schema for E-R graphs that is classified as an explicit-external schema by Jin et al. [2002]. The GXL metaschema gives the structure for all GXL graphs, and like all GXL schemas, is an instance of the GXL metaschema. Thus, the GXL metaschema is its own schema.

All GXL graphs, both instance and schema, are constrained in ways that cannot be expressed by either the GXL DTD, or, for schemas, a UML class diagram. These constraints include: (1) ordered incidences must define a proper ordering, (2) a schema graph must contain at least one GraphClass node, (3) a schema graph must not contain a cycle of isA (generalization) edges, and (4) in a schema graph, an isA edge must not connect nodes of different types (i.e., a NodeClass must not inherit an EdgeClass, and so on) [Holt et al. 2006]. The GXL Validator was designed to validate GXL graphs against these constraints, as well as constraints specified by the GXL DTD, the GXL metaschema, and the specified GXL schema [Kaczmarek 2003].

The GXL Validator is used to validate both instance and schema graphs. GXL schemas are validated against the GXL metaschema, which is validated against itself.

Validating GXL is important; validation can reveal errors in both the modeling and the generation of GXL instances. In addition, valid GXL files are more likely to be accepted by available XML tools than non-valid files.

In Figure 2.1, we illustrate an overview of GXL validation. Input files, shown at the top of the figure, are the GXL metaschema, an optional GXL schema graph, and an optional GXL instance graph. The executable gxlvalidator, shown as an ellipse in the middle of the figure, performs several tests that check the constraints described in the previous paragraph, and outputs the results of the validation. We list the tests that gxlvalidator performs in the note at the bottom right of the figure.

Figure 2.1: Overview of GXL Validation. The process of validating GXL instance and/or schema graphs. Solid edges represent input and output. Dashed edges represent notes.

2.2 Program Representation Graphs

In this section, we describe several program representation graphs that are commonly used in compilation, program analysis, and program comprehension. We first describe the abstract syntax graph (ASG), which is sometimes called an abstract semantic graph, in Subsection 2.2.1. We then describe the call graph in Subsection 2.2.2, class graphs in Subsection 2.2.3, and control flow graphs in Subsection 2.2.4.

2.2.1 Abstract Syntax Graph

Given an input string, a parser derives a parse tree for the string (i.e., it parses the string). An abstract syntax tree (AST) is an abridged parse tree; an AST is constructed by a parser in lieu of the unabridged parse tree, in which non-terminals, keywords, and punctuation are explicitly represented. Using the semantic rules for the input language, a semantic analyzer transforms an AST to an abstract syntax graph (ASG). An ASG is often the output of a compiler front end, and includes semantic information such as edges from variable uses to their declarations, edges from type uses to their definitions, and for C++, template instantiations and specializations.

1 struct Node
2 {
3   int value;
4   Node* next;
5 };

Source Listing 2.1: Source code for struct Node. Definition of the C++ struct Node. Node consists of an integer, value, and a pointer to another Node, next.

In Source Listing 2.1, we list C++ code for the definition of struct Node. Node consists of an integer, value, on line 3, and a pointer to another Node, next, on line 4. In Figure 2.2, we illustrate a possible AST for struct Node, and in Figure 2.3, we illustrate a possible ASG for struct Node. Note that in Figure 2.2 the uses of the types int and Node have not yet been resolved to their definitions, but that in Figure 2.3 they have.

Figure 2.2: Sample AST for struct Node. An AST for struct Node, which is defined in Source Listing 2.1. Uses of the types int and Node have not yet been resolved to their definitions.

Figure 2.3: Sample ASG for struct Node. An ASG for struct Node, which is defined in Source Listing 2.1. Uses of the types int and Node have been resolved to their definitions.

2.2.2 Call Graph

A call graph is a directed graph, G = (V,E). The set of nodes, V, contains the functions in a program. For any two functions f0, f1 ∈ V, an edge (f0, f1) appears in the set of edges, E, if there is a potential call to f1 by f0. The call graph for a program is a directed acyclic graph (DAG) if the program does not use recursion. To reverse engineer a call graph from the source code of a program, information about at least the following constructs is required: function declarations and function calls (sometimes called call sites). Call graphs are commonly used for program profiling [Graham et al. 1982].

In strictly first-order procedural languages, constructing a program call graph is straightforward because, at every call site, the target of the call is directly evident from an inspection of the source code. However, in object-oriented languages such as C++, the target of a call cannot always be precisely determined; rather, the target is partially determined by the data values that reach the call site. For example, in C++, the method invoked by a call to a virtual method through a base class pointer is dependent on the class of the object receiving the call. In general, determining the flow of values needed to build a precise call graph requires an interprocedural data and control flow analysis of the program.
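To make this imprecision concrete, consider the following minimal C++ sketch; the class and function names are hypothetical and are used only for illustration, not drawn from any test suite discussed in this dissertation.

#include <iostream>

class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const { return 0.0; }         // Shape::area
};

class Circle : public Shape {
public:
    explicit Circle(double r) : r_(r) {}
    double area() const { return 3.14159 * r_ * r_; }   // Circle::area
private:
    double r_;
};

// The call site s->area() is a virtual call through a base class pointer.
// A call graph built without interprocedural analysis must conservatively
// include edges from print_area to both Shape::area and Circle::area,
// because the target depends on the dynamic type of *s.
void print_area(const Shape* s) {
    std::cout << s->area() << std::endl;
}

int main() {
    Circle c(1.0);
    print_area(&c);   // dynamic dispatch selects Circle::area at run time
    return 0;
}

A more precise call graph requires the kind of interprocedural data and control flow analysis described above, or at least a class hierarchy analysis that prunes the candidate targets at each call site.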

2.2.3 Class Graphs

In this subsection we describe graphs that are centered around the class. We first describe the class diagram, and next describe the object relation diagram (ORD). Finally, we describe the class firewall.

Class Diagram

A class diagram is a directed graph, G = (V,E). The set of nodes, V, contains the classes in a program, and the set of edges, E, contains the relationships among the classes. To reverse engineer a class diagram of low precision from the source code of a program, information about only class declarations is required. However, to reverse engineer class diagrams of high precision, access to information about the following constructs is required: declarations, scopes, types, and control structures.

The classes in a class diagram include generic, or template, classes, and may also include instantiated template classes. The edges in a class diagram represent the relationships among the classes in the program, and are specified by the syntax and semantics of data members, and the parameters and local variables of member functions. The edge types in a class diagram are: aggregation, association, composition, dependency, inheritance, and ownedElement, as defined by the UML specification, version 2.0 [Object Management Group 2005].

Object Relation Diagram

An object relation diagram (ORD), sometimes called a class dependency diagram, is a directed graph, G = (V,E). The set of nodes, V, contains the classes in a program, and the set of edges, E, contains the relationships among the classes. An ORD can be constructed given only the information in a class diagram, and is commonly used to determine an integration order for class-based testing [Kraft et al. 2006].

The classes in an ORD include those that are not template classes, as well as instantiated template classes. The edges in an ORD represent the relationships among the classes in the program, and are identical to those in a class diagram, but for the addition of polymorphic edges [Labiche et al. 2000; Malloy et al. 2003]. A polymorphic edge is generated by each association or dependency edge, e, where the type of the variable or parameter is an indirect type referring to a base class, b. Each edge e generates a set of polymorphic edges with source, src(e), and destination, a derived class of b.
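As a small hypothetical illustration (the class names below are invented for this sketch), a single association to a base class gives rise to polymorphic edges to each of its derived classes.

class Base {
public:
    virtual ~Base() {}
};

class Derived1 : public Base {};
class Derived2 : public Base {};

// Holder has an association edge to Base through the pointer member b_,
// an indirect type referring to a base class. In the ORD, that edge also
// generates polymorphic edges from Holder to Derived1 and from Holder to
// Derived2, one for each class derived from Base.
class Holder {
public:
    explicit Holder(Base* b) : b_(b) {}
private:
    Base* b_;
};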

Class Firewall

A class firewall is a directed graph, G = (V,E). The set of nodes, V, contains the classes in a program, and the set of edges, E, contains the dependencies between the classes. A class firewall can be constructed given only the information in an ORD, and is used to define the scope of regression testing required in the presence of a change to a particular class in a system [Kung et al. 1995; Skoglund and Runeson 2005]. For example, assume that a developer finds a fault in class X, and then modifies it in an effort to fix the fault. The firewall for class X defines those classes that must be retested to ensure that no new fault has been introduced into the system.
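A minimal sketch of the idea, using hypothetical classes: because Y depends on X, the class firewall for X contains Y.

// X is the class that is modified; Y depends on X through its data member
// x_. The ORD therefore contains an edge from Y to X, and the class
// firewall for X contains Y: if X changes, Y must be retested as well.
class X {
public:
    int compute() const { return 42; }
};

class Y {
public:
    int doubled() const { return 2 * x_.compute(); }
private:
    X x_;
};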

2.2.4 Control Flow Graphs

In this subsection we describe graphs that express the flow of control for a program. We first describe the control flow graph (CFG), followed by the interprocedural control flow graph (ICFG), and finally, the class control flow graph (CCFG).

Control Flow Graph

A control flow graph (CFG) is a directed graph, G = (V,E). The set of nodes, V, contains the basic blocks in a function plus two special nodes, and the set of edges, E, contains the flow of control between the blocks in the function. A basic block is a sequence of statements that has one entry and one exit. The two special nodes, begin and end, represent the entry and exit points for the function. To reverse engineer a CFG from the source code of a program, information about at least the following constructs is required: function declarations, control structures, and logical expressions. CFGs are commonly used for code generation and optimization [Aho et al. 2006].
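As a small illustration (a hypothetical function, not taken from any test case in this dissertation), the comments below identify the basic blocks that would become the nodes of the function's CFG.

// CFG nodes: begin, B1, B2, B3, B4, end.
// CFG edges: begin->B1, B1->B2 (condition true), B1->B3 (condition false),
//            B2->B4, B3->B4, and B4->end.
int clamp_to_zero(int x) {
    int result;          // B1: entry block, ending with the condition below
    if (x < 0) {
        result = 0;      // B2: then branch
    } else {
        result = x;      // B3: else branch
    }
    return result;       // B4: join block containing the return
}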

Interprocedural Control Flow Graph

To encode control flow for a group of interacting functions that have a single entry point, such as a group of functions that constitute an entire program, an interprocedural control flow graph (ICFG) is required. An ICFG for a program, P, contains a CFG for each function in P. Each function call is represented as a pair of nodes: the call node and the return node. Each call node is connected to the entry node of the called function by a call edge, and each exit node is connected to the return node of the calling function by a return edge. An ICFG can be created from the information contained in a call graph combined with the information contained in several CFGs. ICFGs are commonly used for whole program optimizations [Aho et al. 2006].

Class Control Flow Graph

A class control flow graph (CCFG) consists of the set of CFGs, one CFG for each function in the class. One additional node is the entry node for the class, which is the predecessor node for the begin node of each CFG representing a constructor. Another additional node indicates that functions in the class can be invoked in an arbitrary order by clients of the class. This node is the predecessor node for the begin node of the CFG for every function, and the successor node of the end node of the CFG for every function. The CCFG allows standard data-flow analyses to be applied to object-oriented programs [Buy et al. 2000]. A CCFG can be created from the information contained in an ICFG combined with the information contained in a class diagram.


Chapter 3

Related Work

In this chapter we summarize research that relates to our work in the area of reverse engineering. In particular, we summarize research on tools and techniques for promoting interoperability in reverse engineering, including: (1) exchange formats and schemas in Section 3.1, (2) infrastructures in Section 3.2, (3) evaluating results in Section 3.3, and (4) linking translation units in Section 3.4. We also summarize research on tools for reverse engineering C++ programs in Section 3.5. Finally, we provide a discussion of the relationship between the summarized research and our work in Section 3.6.

3.1 Exchange Formats and Schemas

Several exchange formats and languages have been proposed by the research community, and discussed at length at the Workshop on Standard Exchange Formats [Elliott Sim and Koschke 2001] and the Dagstuhl Seminar on Interoperability of Reengineering Tools [Lethbridge 2001]. GXL (see Section 2.1 for details), the standard format for the exchange of graphs among reverse engineering and reengineering tools, originated from the synthesis of several exchange formats and languages [Holt et al. 2003], including: GRAph eXchange format (GraX) [Ebert et al. 1999], Tuple Attribute Language (TA) [Holt 1997], Relation Partition Algebra (RPA) [Feijs and van Ommering 1999], Rigi Standard Format (RSF) [Wong 1998], and the graph format of the PROGRES graph rewriting system [Münch 1999]. More recently, Eichberg et al. [2004] proposed the use of the generic standards XML and XQuery, and Mamas and Kontogiannis [2000] described cppML, an XML DTD that represents the C++ grammar. In addition, the Electronics Industry Association (EIA), the Object Management Group (OMG), and W3C have provided the CASE Data Interchange Format (CDIF) [Ernst 1997], the XML Metadata Interchange (XMI) [Group 2005], and the Resource Description Framework (RDF), respectively.

Schemas complement exchange formats by giving meaning to the data being exchanged. Many noteworthy schemas exist; some are automatically derived, while others are manually derived. Manually derived schemas are created by software engineers, and are more abstract than their automatically derived (tool generated) counterparts. In addition, some schemas are language-specific, while others are language-independent. Here, the term language-independent indicates that a schema is applicable to more than one, but not necessarily all, languages. We focus on manually derived schemas for ASTs and ASGs; first we review language-independent schemas, next we review schemas specific to C++.

Czeranski et al. [2000] present the Bauhaus schema for modeling C and a subset of Ada. Aigner et al. [2006] present the Stanford University Intermediate Format (SUIF), which includes a schema that represents several languages, including C and C++. generic is a schema that is used by GCC [2002] to represent several languages, including C, C++, Objective-C, Fortran, and Java. generic was designed to facilitate semantic analysis and optimizations [Merrill 2003].

The Datrix team at Bell Canada Inc. [2000] presents the Datrix schema for modeling C, C++, Java, and other Algol-based languages. The Datrix schema is not source complete, but does provide mangled names for linking purposes. Datrix attempts to represent several languages with a single representation. Both Bell Canada and Dean et al. [2001] report using the Datrix schema to represent C++. Neither implementation handles C++ templates. Templates are not properly or fully represented by the Datrix schema; this failure is an artifact of attempting to be language-independent.

Ferenc et al. [2001] present the Columbus schema for modeling C++. The Columbus schema is not source complete, but does provide mangled names for linking purposes. The authors indicate that the ISO/IEC C++ standard [ISO/IEC JTC 1 1998] served as the basis for all design decisions. They also claim that a portion of the schema, the language-independent portion, could be used as a common root for modeling other programming languages, but these claims have not been investigated. The Columbus implementation [Ferenc et al. 2002], which exchanges data via the Columbus schema, does not handle C++ templates [Gschwind et al. 2004]. In addition, the schema does not properly or fully represent templates; there are several violations of basic object-oriented modeling principles in the template portion of the schema.

Lethbridge et al. [2004] present the Dagstuhl Middle Metamodel (DMM) for representing software in reverse engineering applications. The DMM schema is called a middle model because it represents information at a higher level of abstraction than the AST/ASG, but a lower level of abstraction than an architecture description diagram. The schema was specifically designed to be represented as a GXL schema. No reference implementation is provided, and only a handful of tools have leveraged the schema thus far [McQuillan and Power 2006].

3.2 Infrastructures for Reverse Engineering

One of the earliest approaches to providing a general framework for interoperability is the ECMA Reference Model, the "Toaster Model", which outlines the functionality required to support a tool integration process [National Institute of Standards and Technology 1993]. The dimensions of functionality addressed by the model include: data integration, provided by the repository manager; control integration, provided by the subsystem interaction manager; presentation integration, provided by the user interaction manager; and process integration, provided by the development manager.

One of the earliest approaches to a reverse engineering infrastructure is the LSME system [Murphy and Notkin 1996]. This system is based on lexical analysis, and specifically identifies the ability to add additional source languages and extractors as central to the approach. Murphy and Notkin demonstrate this flexibility by applying the approach to extracting source models for ANSI C, CLOS, Eiffel, Modula 3, and TCL.

Kullbach et al. [1998] present the EER/GRAL approach to graph-based conceptual modeling of multi-lingual systems. In this approach, models to represent information from a single language are built and then integrated into a unified model. A graph query language is available to perform queries on the unified model.

Dali is a collection of various tools in the form of a workbench for collecting and manipulating architectural information [Kazman and Carrière 1999]. The Dali workbench was designed to be open, so that new tools could be easily integrated, and lightweight, so that such integration would not unnecessarily impact unrelated parts of the workbench. Kazman et al. identify an extraction phase, encompassing both parsing and profiling, accumulating information in a repository, which then feeds analysis and visualization phases. They use an SQL database for primary model storage, but then use application specific file formats to facilitate interchange between tools.

Salah and Mancoridis [2003] use the Dali architecture in their software comprehension environment, which has a three-layer architecture. The three layers are composed of: (1) a data gathering subsystem, (2) a repository subsystem, and (3) an analysis and visualization subsystem. The environment supports both static and dynamic analysis of C++ and Java programs. Information can be accessed using either SQL or a specialized query language.

Finnigan et al. [1997] describe a Software Bookshelf that was originally designed to support converting PL/I source code to C++. Their information repository, which describes the content of the bookshelf, is accessed through a web server using object-oriented database technology. The Portable Bookshelf (PBS) implementation of these ideas is based around a toolkit that includes a fact extractor, a manipulator, and graph layout tools. This "pipeline philosophy" has since evolved into the SWAG Kit and the LDX/BFX pipeline, each of which emphasizes collections of stand-alone tools communicating only via well-defined inputs and outputs [Holt et al. 2005].

Jin and Cordy [2005] advocate non-prescriptive integration that focuses on sharing services, rather than simply data, with the OASIS service-sharing methodology. In this approach, each tool in the integration is known as a participant. Each participant offers a set of shared services to the other participants, but not all services offered by a participant must be shared. Two sets of components must be created in the OASIS methodology: a domain ontology and conceptual service adapters.

Moose is a language-independent reverse- and re-engineering environment that was first developed in the context of FAMOOS [Nierstrasz et al. 2005]. Language independence is achieved by the use of a common metamodel as the core of Moose. Services provided around this core include a meta-metamodel tailoring of the Moose metamodel; a GUI for browsing, querying, and grouping; and metric evaluation and visualization. Moose uses both the CDIF and XMI exchange formats to interact with external tools.

Al-Ekram and Kontogiannis [2005] present an XML-based framework that attempts to represent higher level artifacts in a language-neutral way. The framework includes an XML DTD for each of several artifacts, including a control flow graph, a program dependence graph, and a call graph. The basic elements that are common between the artifacts are represented as Facts, and are encoded by another XML DTD, FactML. The framework is multi-layered and follows a "pipe-filter" architectural style.

3.3 Evaluating Results from Reverse Engineering

Two important attributes of a reverse engineering infrastructure are: providing for repeatability of results, and allowing comparison of results from different approaches. One way this can be achieved is by agreement on standard exchange formats and schemas for exchanging information (see Section 3.1); this allows output from different tools or tool sets to be directly compared. There can be considerable difficulties involved in comparing results when a standard exchange format is not paired with a schema (standard or otherwise).

Murphy et al. [1998] describe a comparison of nine tools for extracting call graphs from C programs. They compare outputs from three software systems, and find a considerable variance. Das [2000] describes a points-to analysis algorithm, and compares his results to those from tools implementing competing approaches. He notes that it took his team several months to synchronize the output of the tools so that the results could be compared. In each case, the problem was with different definitions and interpretations of required information, rather than with different output formats.

Sim et al. [2002] note the importance of benchmarks in software engineering in general, and in evaluating fact extractors in particular. They describe the construction of a benchmark suite designed to test the accuracy and robustness of fact extractors, and comparatively evaluate four tools by applying the benchmark suite.

Lin et al. [2003] describe a four-level hierarchy of completeness, and use it to validate the CPPX fact extractor [Dean et al. 2001]. They use a test suite consisting of programs used to demonstrate the Datrix model, as well as test cases from the gcc test suite. Vinciguerra et al. [2003] describe an experimentation framework for evaluating C++ and Java disassembly and decompilation tools. The framework includes a layered test suite of programs, and a focused set of reverse engineering tasks.

3.4 Linking in Reverse Engineering Tools

Relatively little work exists on combining information extracted from different translation units. This process is analogous to compile-time linking, where external references in one translation unit are resolved to definitions in another. Wu and Holt [2004] describe a study of linking information extracted from a PostgreSQL implementation, and note that a naive approach to linking can give rise to linkage anomalies. They describe an approach to alleviating these anomalies with heuristics and build simulation. Guo et al. [2003] describe a method for assigning globally unique identifiers (UIDs) to the declarations and references in a Java program. Each UID is based on scope and file information, and is attached via XML markup to entity references in the source code. While the goal of this work is not linking, the technique for assigning UIDs is directly applicable to linking translation units at the ASG level.

3.5 Tools for Reverse Engineering C++ Programs

A reverse engineering tool that accepts C++ source code must have a parser, and likely, a corresponding front end. The difficulties that arise during the construction of a parser for C++ are well documented, and are largely due to the complexity of the template sublanguage [Bodin et al. 1994; Knapen et al. 1999; Lilley 1997; Power and Malloy 2000; Reiss and Davis 1995; Roskind 1989; Veldhuizen 2003]. Consequently, the selection of robust reverse engineering tools that accept C++ programs is inadequate.

Available reverse engineering tools for C++ can be divided into two categories: (1) those that provide their own parser (and possibly front end), and (2) those that utilize the C++ parser and front end from either the GNU Compiler Collection [Free Software Foundation 2002], or the Edison Design Group [2000]. We provide an overview of the first category in Subsection 3.5.1, and an overview of the second category in Subsection 3.5.2.

3.5.1 Tools that Provide a C++ Parser

The C++ parsers provided by reverse engineering tools extract varying levels of information, ranging from limited information, such as class hierarchies, to detailed information, such as statements and expressions. Parsers that extract limited information, known as fuzzy parsers [Koppler 1997], are well suited to tasks such as graphical browsing and graph visualization, but are not sufficient for program analysis tasks. Parsers that extract detailed information are ideal for program analysis tasks, but none of the parsers described in this subsection are able to fully accept templates.

Lapierre et al. [2001] present Datrix, an analyzer that extracts information from C, C++, or Java programs. Datrix extracts information for each translation unit in accordance with the Datrix ASG Model [Bell Canada Inc. 2000], and output is expressed in either TA (Tuple-Attribute Language) or VCG format. The Datrix project at Bell Canada ended in the year 2000, and the Datrix analyzer is no longer available.

Source–Navigator™ is an analysis and graphical browsing framework for C, C++, Java, Tcl, FORTRAN, and COBOL [Source–Navigator Team 2005]. The provided fuzzy parser extracts enough high level information to provide class hierarchies, imprecise call graphs, and include graphs. Source–Navigator does not provide statement level information, and the plain text output does not conform to a schema.

Ferenc et al. [2002] present Columbus, a fully integrated reverse engineering framework supporting fact extraction, linking, and analysis for C and C++ programs. Columbus provides output in a variety of formats, including CPPML, GXL, RSF, and XMI. Nevertheless, Columbus is unable to fully accept templates, as noted by Gschwind et al. [2004].

3.5.2 Tools that Utilize the GCC C++ Parser

Industrial strength C++ parser front ends are provided by the GNU Compiler Collection [Free Software Foundation 2002] and the Edison Design Group [2000]. They both accept virtually all of the constructs defined by the ISO C++ standard, including templates [ISO/IEC JTC 1 1998; Malloy et al. 2003b]. However, gcc is in the public domain, which allows the reverse engineering tools that use it to be freely distributable; we summarize only tools that use gcc in this subsection.

There are two common approaches to using gcc. The first approach is to modify the source code of the parser, which creates a custom version of gcc. The second approach is to use the tu files described in Subsection 6.2.1.

Dean et al. [2001] present CPPX, a tool that uses gcc for parsing and semantic analysis. CPPX predates the incorporation of tu files into gcc, and is built directly into the source code. CPPX constructs an ASG that is compliant with the Datrix ASG Schema [Bell Canada Inc. 2000], and can be serialized to GXL, TA, or VCG format. The Datrix ASG Schema is meant to accommodate several languages; this generality makes it difficult to accurately represent many C++ language constructs, such as template specializations. The last release of CPPX, based on version 3.0 of gcc, does not properly handle the C++ Standard Library.

Hennessy et al. [2003] present gccXfront, a tool that harnesses the gcc parser to tag C and C++ source code. The tool annotates source code with syntactic tags in XML by modifying the bison parser generator tool, as described by Malloy et al. [2002]. This approach is no longer viable, because the C++ parser in gcc has migrated to recursive descent technology.

GCC.XML uses tu files to generate an XML representation for class, function, and namespace declarations, but does not propagate information such as function and method bodies [Kitware, Inc. 2005]. As a result, many common program representations, such as the call graph or the ORD, cannot be constructed using the output of GCC.XML.

Antoniol et al. [2004] present XOGASTAN, a collection of tools that convert a tu file to a GXL instance graph, and construct an in-memory representation of the

GXL instance graph. XOGASTAN fails to create GXL output for certain generic node types, including try_catch_expr and using_directive. Additionally, XOGASTAN has limited analysis capabilities for C++.

Gschwind et al. [2004] present TUAnalyzer, a system that uses tu files to perform analysis of template instantiations of classes and functions. TUAnalyzer performs

virtual method resolution by using the 'base' and 'binf' attributes, along with the output provided by the compiler switch -fdump-class-hierarchy, which reconstructs the virtual method table. The scope of TUAnalyzer is limited to analysis of templates; also, TUAnalyzer does not produce an output representation of the tu file for exchange with other reverse engineering tools.

3.6 Discussion

Existing schemas do not properly or fully represent C++ templates. A correct

representation of templates is critical in reverse engineering, because all non-trivial

C++ programs use templates (due to the C++ Standard Library). For example,

"hello world", the famous five-line program, uses the iostream header, which uses templates extensively. In addition, generic programming is becoming increasingly popular, and the revised version of the ISO C++ standard (due in 2009) will increase the power and flexibility of templates. To address the shortcomings of existing schemas, we present our CppInfo schema for C++ in Section 4.2. The CppInfo schema properly and fully represents C++ templates.
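To make the point concrete, the following minimal program (an illustrative sketch, not code taken from this dissertation) already instantiates class templates, because std::cout is an object of the class template std::basic_ostream:

#include <iostream>

int main() {
    // std::cout is an instance of the class template std::basic_ostream<char>,
    // so even this trivial program requires a schema that can represent templates.
    std::cout << "hello world" << std::endl;
    return 0;
}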

Previous research on infrastructures has leveraged standard exchange formats

(SEF) such as GXL, but has not adequately exploited the semantic specification capabilities of these SEFs. In addition, previous research has not addressed the problem of delineating interactions among schemas at the semantic level. Our infrastructure utilizes the semantic specification capabilities of GXL.

The benchmark approach to evaluating reverse engineering tools has been used in previous research for evaluation and comparison, but requires manual comparison of the results. The approach that we support with our schema hierarchy imposes an additional requirement that the tool output must conform to a common schema, or be translated to conform to a common schema. This additional requirement permits comparison of results to be fully automated.

Linking translation units from a program into a unified representation has been addressed in previous research for several languages, including PostgreSQL, Java, and C++. We have adopted certain elements of these approaches, such as a variation of UIDs. In addition, to address the current lack of a publicly accessible repository containing representations of linked translation units for C++ programs, we provide, in our SourceForge.net repository [Kraft 2006], GXL instances of unified representations that conform to our CppInfo API schema.

Most currently available tools for reverse engineering C++ do not properly handle templates. However, some tools based on versions of gcc greater than 3.0 do handle templates. XOGASTAN is similar to our g4re tool chain, but has limited analysis capabilities, and does not provide a schema for understanding its GXL output.

TUAnalyzer is complementary to g4re; it uses information acquired from gcc using the flag -fdump-class-hierarchy. However, the scope of TUAnalyzer is limited to

inspecting template instantiations and reconstructing the virtual function call table.

Our g4re tool chain is intended to be both industrial-strength and general-purpose.


Chapter 4

Schemas for Low- and Middle-Level Graphs

In this chapter we present the hierarchy of schemas for low- and middle-level program representation graphs that is central to our infrastructure approach and facilitates interoperability and reuse for reverse engineering tools and applications [Kraft et al. 2005b; 2007a]. In Section 4.1, we present an overview of the hierarchy of schemas. In Section 4.2, we illustrate the low-level graphs in Levels 0 and I of our hierarchy, and in Section 4.3, we illustrate middle-level graphs in Levels II, III, and IV of our hierarchy. Finally, in Section 4.4, we describe an approach to comparing instances of the schemas in our hierarchy.

4.1 Hierarchy of Schemas

In Figure 4.1, we illustrate the hierarchy of schemas that is central to our infrastructure, and facilitates interoperability and reuse for reverse engineering tools and applications. There are two major partitions in our hierarchy: low-level and middle-level; there are five minor partitions in our hierarchy: Levels 0 through IV. The dashed ellipses in the figure represent schemas for graphical representations of code that differ for disparate languages, such as abstract syntax graphs (ASG), and application programming interfaces (API). The solid ellipses in the low-level partition of the

figure represent the schemas used in our implementation; we discuss them further in Section 4.2. The solid ellipses in the middle-level partition of the figure represent schemas for graphical representations of code that are language independent, such as call graphs, and control flow graphs; we discuss them further in Section 4.3. The solid edges in Figure 4.1 represent the progression of information from a graphical representation at one level to a graphical representation at a subsequent level [Buy et al. 2000; Harrold et al. 1993; Sinha et al. 1999; Skoglund and Runeson 2005].

Figure 4.1: Hierarchy of Schemas. Schemas for program representation graphs organized by levels. Solid edges with open arrows represent input and output. Dashed edges with filled arrows represent realization. Solid edges represent the progression of information from a graphical representation at one level to a graphical representation at a subsequent level. Filled arrows indicate that a single instance is needed; empty arrows indicate that a set of instances is needed.

The filled arrows indicate that a single instance is needed, while empty arrows indicate that a set of instances is needed.

The edge from Level 0 to Level I indicates that the information needed to build an instance of the API schema is present in an instance of the ASG schema, which contains information about a parsed and analyzed translation unit. We use the generic ASG schema, the internal ASG schema used by gcc, in our implementation of the infrastructure. In addition, we use an API schema in place of another middle-level schema, such as the Dagstuhl Middle Metamodel (DMM) [Lethbridge et al. 2004], which does not retain information about control statements or function calls. Information about control statements and function calls is needed to create instances of schemas at subsequent levels of our hierarchy, in particular, control flow graphs (CFGs) and call graphs. We use the CppInfo API schema in our implementation of the infrastructure.

The edge from CppInfo to Class Diagram indicates that the information needed to build a class diagram [Fowler 2003] is found in the information about classes in an instance of the CppInfo API schema. Similarly, an instance of the CppInfo API schema provides: the statement level information needed to build a control flow graph (CFG) [Aho et al. 2006], the function declaration and call site information needed to build a call graph [Grove et al. 1997], and the statement and transfer of control information needed to build a control dependence graph (CDG) [Cytron et al. 1991]. The information needed to create instances of all schemas in Level II of Figure 4.1 is present in an instance of the CppInfo schema.

The information needed to create instances of the schemas shown in Level III of

Figure 4.1 is present in instances of the schemas shown in Level II. The edge from

Class Diagram in Level II to ORD in Level III indicates that the information needed to build an object relation diagram (ORD)1 is present in an instance of the Class

Diagram schema [Kraft et al. 2006; Malloy et al. 2003]. The only ORD edges not readily available in a Class Diagram instance are polymorphic edges. The information

1The use of the term ORD is a misnomer, because the nodes are classes, not objects; however, the term is used in previous research, and we continue to use it in this paper.

needed to generate polymorphic edges is extracted from the association, dependency, and inheritance information present in a Class Diagram instance; therefore, building an ORD instance using only information present in a Class Diagram instance is possible.

Two additional schemas are shown in Level III of Figure 4.1: schemas for an interprocedural control flow graph (ICFG) [Aho et al. 2006], and a program dependence graph (PDG) [Ferrante et al. 1987]. The edges from Call Graph and CFG in Level II to ICFG in Level III indicate that the information present in a Call Graph instance and a set of CFG instances can be used to build an ICFG instance [Aho et al. 2006]. Note that our Call Graph schema contains information about each individual call site. A Call Graph instance must contain this information to be used to build an ICFG instance; all solid edges in Figure 4.1 require that instances of the source and sink schemas conform to their respective schemas.

Finally, the edges from CDG and CFG in Level II to PDG in Level III indicate that the information present in a CDG instance and a set of CFG instances can be used to build a PDG instance [Harrold et al. 1993].

In Level IV of Figure 4.1, we illustrate ellipses representing schemas for a class

firewall, a class control flow graph (CCFG), and a system dependence graph (SDG).

Instances of these three schemas can be built from information present in instances of schemas in Levels II and III of the hierarchy. The edge from ORD in Level

III to Class Firewall in Level IV indicates that the information present in an ORD instance can be used to build a Class Firewall instance [Skoglund and Runeson 2005].

The edges from Class Diagram in Level II and ICFG in Level III to CCFG in Level

IV indicate that the information present in a Class Diagram instance and an ICFG instance can be used to build a class control flow graph (CCFG) instance [Buy et al.

2000]. Finally, the edge from PDG in Level III to SDG in Level IV indicates that the information present in a set of PDG instances can be used to build an SDG instance [Sinha et al. 1999].

4.2 Low-Level Schemas: Levels 0 and I

Levels 0 and I in Figure 4.1 comprise the low-level partition of our hierarchy of schemas. We first describe Level 0, which contains the schema for an abstract syntax graph (ASG). We then describe Level I, which contains a schema for an application programming interface (API).

Level 0

In Level 0 of our infrastructure is the schema for an ASG, which contains information about a parsed and analyzed translation unit. We used the generic ASG schema in our implementation of the infrastructure. generic, the internal ASG schema used by gcc, is documented almost exclusively by source code and comments. generic consists of 200 concrete node classes and 75 concrete edge classes, and was designed to facilitate semantic analysis and front end optimizations. The key advantage of generic is the accurate and adequate representations of templates, including instantiations, specializations, and partial specializations. The key disadvantage of generic is the complex and low-level representation it uses. For example, generic uses 139 concrete node classes to represent expressions, and includes representations of artificial, or compiler generated, statements to manage the stack and heap. We were not the first to create a GXL schema for generic, but unlike previous approaches, which used manually derived domain information to

generate GXL schemas for generic [Antoniol et al. 2004; Kitware, Inc. 2005], we used an instrumented parser to automatically reverse engineer domain information.

In addition, we were the first to publicly distribute a GXL schema for generic. We wrote a collection of modules named GxlSW to automate the construction of a GXL schema, and used it to construct a GXL schema for generic. Input to GxlSW is a plain-text, simplified UML class diagram, and domain type information;

our instrumented parser writes the reverse engineered domain information in this

format. To generate domain information for generic, we used our instrumented

parser, and two test suites: the C and C++ test suite from gcc, and a C++ test

suite [Malloy et al. 2003b] extracted from the ISO C++ standard [ISO/IEC JTC 1

1998]. We wrote two small (approximately 10 line) files that provide domain type information for GXL. We list a link to the generated GXL schema for generic in Appendix B.

Level I

In Level I of our infrastructure is the schema for an API, which contains information about declarations, such as classes (including class templates, class template instantiations, and class template specializations); namespaces; functions (including function templates and function template instantiations); and variables, statements (including control statements and exception statements), and some expressions (function calls). We designed the CppInfo API schema to use in our implementation of the infrastructure. There are 137 classes in CppInfo: 70 nodes (28 abstract), 20 aggregation edges (1 abstract), 26 association edges (1 abstract), and

19 attributes. In addition, there are 2 enumerations, and 6 enumerators. Note that while CppInfo does not currently include representations for most expressions, our preliminary work suggests that the addition of expressions will introduce no more than 20 total node classes (in stark contrast to the 139 concrete node classes used by generic), and 10 concrete edge classes. The key advantages of CppInfo over other schemas proposed by the reverse engineering community are the accurate and adequate representations of templates, including instantiations, specializations, and partial specializations, and function pointers2.

In Figure 4.2, we illustrate the key classes, aggregations, and associations from

CppInfo that relate to templates. We illustrate the primary class, Template, to the right of center in the middle of the figure. A Template is a Scope that is composed of zero or more Instantiations of type TemplateInstantiation, zero or more Specializations of type TemplateSpecialization, and zero or more PartialSpecializations of type TemplatePartialSpecialization. In addition, a Template is associated with a TemplateParameterList via a HasTemplateParameterList relationship. Note that a TemplateParameterList is composed of zero or more TemplateParameters.

2Lethbridge [2003] has identified the absence of representations for these language elements to be a key weakness of the Dagstuhl Middle Metamodel with regard to C++. He also states that practitioners have identified function pointers as being particularly important.

Figure 4.2: Level I: CppInfo API – Templates. Excerpt from the CppInfo API schema that illustrates the key classes, aggregations, and associations related to templates.

Not shown in Figure 4.2 are concrete classes for class and function templates, template instantiations, template specializations, and template partial specializations. These concrete classes derive from the abstract classes shown in the figure, along with the abstract class for either a class or function. For example, ClassTemplate inherits from both Class and Template. Also not shown in the figure are concrete classes for template parameters. There are three template parameter classes in CppInfo: ParameterTemplateParameter, TemplateTemplateParameter, and TypeTemplateParameter.
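The combination can be pictured with the following minimal C++ sketch (the class and member names are illustrative assumptions, not the actual CppInfo source), in which a concrete schema class such as ClassTemplate derives from both Class and Template:

#include <string>
#include <vector>

// Illustrative skeleton of the template-related CppInfo classes.
class Scope { };
class TemplateParameter { };
class TemplateParameterList {
public:
    std::vector<TemplateParameter*> parameters;            // zero or more TemplateParameters
};
class TemplateInstantiation;
class TemplateSpecialization;

class Class : public virtual Scope {
public:
    std::string name;
};

class Template : public virtual Scope {                    // a Template is a Scope
public:
    TemplateParameterList* parameterList;                  // HasTemplateParameterList
    std::vector<TemplateInstantiation*> instantiations;    // Instantiations
    std::vector<TemplateSpecialization*> specializations;  // Specializations
};

// Concrete schema class: a class template is both a Class and a Template.
// (Virtual inheritance from Scope is an assumption made to avoid a duplicate base.)
class ClassTemplate : public Class, public Template { };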

In Figure 4.3, we illustrate the key classes, aggregations, and associations from

CppInfo that relate to function types. We illustrate the primary class, FunctionType, in the center of the figure. A FunctionType is associated with a Type via the HasReturnType relationship, and is composed of zero or more ParameterTypes of type

Type. There are two concrete subclasses of FunctionType: FreeFunctionType and

MemberFunctionType, which is associated with a Class via the HasClass relationship.

To the left of center in Figure 4.3, we illustrate the class IndirectType, from which

PointerType is derived. IndirectType is associated with a Type via the HasBaseType relationship. Note that, in addition to C-style function pointers, C++ member function pointers (more commonly known as pointers-to-members) can be accurately represented using CppInfo.

Figure 4.3: Level I: CppInfo API – Function Types. Excerpt from the CppInfo API schema that illustrates the key classes, aggregations, and associations related to function types.
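As a small example of the language elements that these classes must model (illustrative code, not taken from the schema), the program below declares both a free function pointer and a pointer-to-member; the latter requires the HasClass association in addition to the return and parameter types:

#include <iostream>

class Widget {
public:
    int area(int scale) const { return 2 * scale; }
};

int clamp(int value) { return value < 0 ? 0 : value; }

int main() {
    int (*freeFn)(int) = &clamp;                          // FreeFunctionType: return type int, one parameter type
    int (Widget::*memberFn)(int) const = &Widget::area;   // MemberFunctionType: additionally HasClass = Widget
    Widget w;
    std::cout << freeFn(-3) << " " << (w.*memberFn)(5) << std::endl;   // prints "0 10"
    return 0;
}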

4.3 Middle-Level Schemas: Levels II, III, and IV

Levels II, III, and IV in Figure 4.1 comprise the middle-level partition of our hierarchy of schemas. We first illustrate schemas in Level II, including the class diagram, the call graph, and the CFG. We then illustrate schemas in Level III, including the

ORD, and the ICFG. Finally, we illustrate a schema in Level IV, the class firewall.

We designed all schemas in this subsection, except for the class diagram schema, which we excerpted from the UML 2.0 specification [Object Management Group

2005]; we wrote GXL schemas for all schemas in this subsection.

Figure 4.4: Level II: Class Diagram. Excerpt from the UML 2.0 schema that illustrates the key class-related components in a class diagram.

Level II

In Figure 4.4, we illustrate an excerpt from the UML 2.0 schema that illustrates the key class-related components in a class diagram [Object Management Group

2005]. We illustrate the primary component, Class, in the center of the figure, near the bottom. A Class is composed of zero or more ownedElements of type Classifier, zero or more ownedOperations of type Operation, and zero or more ownedAttributes of type Property. In addition, a Class can be involved in a Generalization or an

Association relationship. Note that an Operation has zero or more typed Parameters, and a Property has attributes to indicate an aggregate or a composite.

In Figure 4.5, we illustrate an excerpt from the UML 2.0 schema that illustrates the key template-related components in a class diagram [Object Management Group

2005]. We illustrate Classifier, to the right of center of the figure, at the bottom.

Recall from Figure 4.4 that Class is derived from Classifier, which, as we illustrate in

Figure 4.5, can have a TemplateSignature, be used as a formal TemplateParameter, or be used as a boundElement (an actual template parameter).

In Figure 4.6, we illustrate the schema for a call graph. The schema consists of three classes: Function, which represents a function and has a name, FunctionCall,

Figure 4.5: Level II: Class Diagram (continued). Excerpt from the UML 2.0 schema that illustrates the key template-related components in a class diagram.

Figure 4.6: Level II: Call Graph. Schema for a call graph, a graph whose nodes represent functions and function call sites, and whose edges represent function calls.

which represents a function invocation, and isCaller, which is an association class that specifies the line number for the isCaller association.

Figure 4.7: Level II: Control Flow Graph (CFG). Schema for a CFG, a graph whose nodes represent blocks of straight-line code, and whose edges represent flow of control between the blocks.

In Figure 4.7, we illustrate the schema for a control flow graph (CFG) [Aho et al.

2006]. The schema consists of a Procedure, which is composed of two or more Nodes and one or more Edges. The base class Node has three derived classes: Begin, Block,

and End. Minimally, a Procedure contains a Begin, which is a special node whose

successor is the Block whose leader is the first statement in the Procedure, and an

End, which is a special node whose predecessor is the final Block in the Procedure. A Block represents a sequence of statements that has one entry and one exit. The class

Edge represents the isPredecessor and isSuccessor relationships between two Nodes.
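An in-memory form of this schema can be sketched as follows (the C++ names are illustrative assumptions; the actual schema is defined in GXL):

#include <vector>

struct Node { virtual ~Node() { } };
struct Begin : Node { };                 // unique entry node
struct End   : Node { };                 // unique exit node
struct Block : Node { /* a sequence of statements with one entry and one exit */ };

struct Edge {
    Node* predecessor;                   // isPredecessor
    Node* successor;                     // isSuccessor
};

struct Procedure {
    std::vector<Node*> nodes;            // two or more Nodes, including Begin and End
    std::vector<Edge>  edges;            // one or more Edges
};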

Level III

In Figure 4.8, we illustrate the schema for an object relation diagram (ORD). The schema consists of the class Class, along with the edge Edge and its six subclasses:

Association, Composition, Dependency, Inheritance, OwnedElement, and Polymorphic.

These six subclasses represent the kinds of relationships between classes [Kraft et al.

2006].

In Figure 4.9, we illustrate the schema for an interprocedural control flow graph

(ICFG). The schema for an ICFG is identical to that of a CFG, but for the addition

Figure 4.8: Level III: Object Relation Diagram (ORD). Schema for an ORD, a graph whose nodes represent classes, and whose edges represent relationships, including polymorphic relationships, between the classes.

Figure 4.9: Level III: Interprocedural Control Flow Graph (ICFG). Schema for an ICFG, a graph whose nodes represent blocks of straight-line code, and whose edges represent flow of control between the blocks and caller-callee relationships.

of the ordered isCaller association. The isCaller association represents the caller-callee relationship between one Procedure and another.

Level IV

Figure 4.10: Level IV: Class Firewall. Schema for a class firewall, a graph whose nodes represent classes, and whose edges represent testing dependencies between the classes.

In Figure 4.10, we illustrate the schema for a class firewall. The schema consists of two classes: Class and Edge. The Class node corresponds to the Class node in the ORD schema. An Edge indicates a dependency between one Class and another;

the Class involved in the isRetested association must be retested whenever the Class involved in the isCUT association is changed and must be tested.
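One common way to realize this schema is to compute, for a class under test (CUT), every class that depends on it directly or transitively through ORD edges; the sketch below assumes this formulation and illustrative names, and is not taken from Skoglund and Runeson [2005] or from our implementation:

#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// ORD edges stored as: dependent class -> classes it depends on.
typedef std::map<std::string, std::vector<std::string> > Ord;

// Classes that must be retested (isRetested) when the class under test (isCUT) changes.
std::set<std::string> classFirewall(const Ord& ord, const std::string& cut) {
    std::set<std::string> firewall;
    bool changed = true;
    while (changed) {                    // iterate to a fixed point to capture transitive dependents
        changed = false;
        for (Ord::const_iterator it = ord.begin(); it != ord.end(); ++it) {
            if (firewall.count(it->first)) continue;
            for (std::size_t i = 0; i < it->second.size(); ++i) {
                if (it->second[i] == cut || firewall.count(it->second[i])) {
                    firewall.insert(it->first);
                    changed = true;
                    break;
                }
            }
        }
    }
    return firewall;
}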

4.4 Comparing Schema Instances

The schemas in Figure 4.1 were designed to be minimal yet complete; only the information required to construct a given graph structure is represented. To perform a comparison, the tools under study need not produce instances of the same schema; however, comparison of the instances generated by each tool can only be undertaken for those parts of the schema that are common to both tools. Alternatively, comparison of instances can be undertaken if intermediate transformations are used to create instances of the appropriate schema provided by our infrastructure.

One technique for comparison of GXL instance graphs is to use XSLT style sheets that are specified at the schema level. These style sheets can be applied to conforming instances of a given schema, and can be automatically generated

given domain information about the schema. This technique is not feasible for GXL encoded instances of low-level graphs. XSLT processors use a DOM representation of the input file; instances of low-level graphs are too large to be stored in a DOM tree. However, we successfully applied XSLT style sheets to GXL encodings of middle-level graph instances (see Chapter 6), and preliminary experimental results suggest that we can partially automate the generation of a system to report the differences between instances of the schema.

Chapter 5

g4re – Tool Chain for Reverse Engineering C++ Programs

In this chapter we present the design and implementation of the g4re tool chain

[Kraft et al. 2005a; 2007b]. g4re is a tool chain, because it is constituted by applications and libraries that are used either individually, or in concert. We designed g4re with a GXL-based pipe-filter architecture; each constituent application or library in the chain takes, as input, the output of the preceding application or library in the chain. An important benefit of this architecture is that g4re consists of a set of loosely coupled, reusable modules: the ASG module, the schema and serialization modules, the transformation module, the linking module, and the API module. We wrote all modules in ISO C++.

In Section 5.1 we present the architecture of g4re. In this section, we also include an overview of the CppInfo schema. In Section 5.2 we present a sample usage of g4re.

5.1 Architecture

In Figure 5.1, we provide an overview of the packages in the tool chain. We illustrate implementation artifacts, which we indicate with bold text, and third party libraries, which we indicate with italic text, at the left of the figure. We illustrate the ASG module, generic, as a package in the large g4re package at the right of the figure, and describe it in Section 5.1.1. We illustrate the schema and serialization modules, schema and serialization, as packages in the large cppinfo package at the center of the

figure, and describe them in Section 5.1.2. We illustrate the transformation module, g4xformer, as a package in the large g4re package at the right of the figure, and describe it in Section 5.1.3. Finally, we illustrate the linking module and the API module, linker and api, as packages in the large cppinfo package at the center of the figure, and describe them in Sections 5.1.4 and 5.1.5, respectively.

Figure 5.1: Overview of g4re. Dashed lines represent “use” dependencies. Bold text indicates an implementation artifact. Italic text indicates a third party library.

5.1.1 The ASG module

In the generic package, we provide parsing, storage, traversal, and serialization facilities for working with the generic ASG representation of gcc. The input to this package is a tu file, or a GXL encoding of a tu file. The output of this package is a gzipped GXL encoding of the input file, or an in-memory representation of the generic ASG. We implemented two parsers: a tu file parser that uses a scanner generated by flex, and a GXL file parser that uses the expat XML parsing library and the zlib compression library via the pattern and utility library with standard extensions (pulse).

We also implemented a simple node list representation for storage of the parsed

ASG, and several parameterized methods for traversing the leftmost child right sibling (LCRS) tree that underlies the ASG. Finally, we implemented an extensible serialization facility that we use to create GXL encodings of tu files.
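The traversal facility can be pictured with the following sketch (illustrative names, not the actual g4re source): each ASG node keeps a leftmost-child pointer and a right-sibling pointer, and a parameterized preorder walk applies a caller-supplied operation to every node:

#include <cstddef>

struct AsgNode {
    AsgNode* firstChild;     // leftmost child
    AsgNode* nextSibling;    // right sibling
    // ... node kind, attributes, and edge targets ...
};

template <typename Operation>
void traversePreorder(AsgNode* node, Operation& apply) {
    for (; node != NULL; node = node->nextSibling) {   // walk the sibling chain
        apply(node);                                    // visit the current node
        traversePreorder(node->firstChild, apply);      // then descend into its children
    }
}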

The first parser we wrote provides functionality to parse a tu file and to store the corresponding ASG. After parsing a tu file, we perform a series of transformations on the stored ASG to remove extraneous information and to make it more suitable for reverse engineering tasks. In particular, we:

• remove fields that store internal information used by the gcc back end,

• mark methods as static if their parameter lists do not contain a this pointer,

• mark methods as const if their parameter lists contain a const this pointer,

• remove the this pointer from all method parameter lists.

We use this parser in conjunction with our serialization facility to create GXL instances of tu files. The second parser we wrote provides functionality to parse a GXL file or gzipped

GXL file and to store the corresponding ASG. Three advantages of this parser over the tu parser are:

1. reentrance,

2. the lack of post-parse transformation overhead,

3. a higher compression rate for GXL files than for tu files.

We use the tu parser (in conjunction with our serialization facility) to create the GXL file(s) accepted by this parser; thus, there is a one-time cost associated with its use.

5.1.2 The Schema and Serialization Modules

In the schema package, we provide a class library that implements the CppInfo schema1 for the ISO C++ programming language. In the current implementation, we provide 72 classes, 42 of which are concrete, that provide information about

C++ language elements. Language elements include declarations, such as classes

(including class templates and class template instantiations); namespaces; functions

(including function templates and function template instantiations); and variables, statements (including control statements and exception statements), and some expressions.

In the serialization package, we provide serialization facilities for working with instances of the schema representation of C++. We implemented a parser to read

GXL encodings; gzipped files are also accepted. We implemented visitor [Gamma et al. 1995] classes to write gzipped GXL encodings. We used C++ templates to allow the package to read and write both individual and linked instances of the schema representation.

5.1.3 The Transformation Module

In the g4xformer package, we provide an implementation of the transformation from the ASG representation that we provide in the generic package to the representation that we provide in the schema package. The input to this package is a tu file, or a GXL encoding of a tu file; gzipped files of either type are also accepted. The

1We describe the CppInfo schema in more detail in Section 4.2.

output of this package is a gzipped GXL encoding of the instance of the schema representation that corresponds to the generic ASG in the input file. We implemented the transformation in three passes. In the first pass, we traverse the generic ASG in program order, and create the core of the instance of the schema representation. The core consists of all declaration, declarator, and statement nodes, as well as structural edges. In the second pass, we adorn the core with edges that indicate the use of a type; these edges include inheritance edges. In addition, in the second pass, we build all cv-qualified types. We also resolve uses of bound template template parameters to their template declarations2. Finally, in the third pass, we adorn the graph that results from the second pass with edges that indicate uses of expressions, including function calls3.

5.1.4 The Linking Module

In the linker package, we provide an implementation of our linking algorithm. The input to this package is a set of GXL encodings of instances of the schema representation for all C++ translation units in a program; gzipped files are also accepted.

The input files need not be created by the g4xformer package. The output of this

package is a gzipped GXL encoding of the linked, or unified, instance of the schema representation for all C++ translation units in the program.

Programs written in C++ consist of multiple files, both header and source. A

C++ translation unit consists of a source file and all files that it includes, either directly or transitively. A C++ compiler, such as gcc, operates on a single translation unit at a time; the generated object code for all translation units in a program is linked by the system linker, e.g., ld on UNIX systems. A C++ reverse engineering tool, such as g4re, also operates on a single translation unit at a time; however, the generated output is not object code, but rather a program representation such as an ASG.

We perform linking n−1 times, where n is the number of translation units, when

2This task is not needed for compilation, and is not performed by gcc.
3Calls to virtual functions are designated as such in tu files, but sets of possible targets are not identified. These sets are available via the gcc compiler flag -fdump-class-hierarchy.

n is greater than one. Otherwise, we perform linking one time. We achieve linking by performing a traversal of the most recently constructed instance of the schema representation. We add or append a schema class instance to the unified instance of the schema representation if the class instance does not exist, or is incomplete, in the unified instance. A schema class instance is incomplete if it is missing a required element (as defined by the CppInfo schema) or contains another incomplete instance. Using our definition of incomplete, we resolve function declarations to their corresponding definitions.

There is a special case for linking function parameters. A function parameter from a function declaration (prototype) is not always identical to the corresponding function parameter from the function definition. A function parameter may only have an initial value in a function declaration. In addition, the name of the function parameter may differ; e.g., anonymous function parameters are commonly used in header files.
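The merge step at the heart of the algorithm can be sketched as follows (illustrative types and names, under the assumption that schema class instances are keyed by a unique identifier; the real linker operates on CppInfo instances):

#include <map>
#include <string>

struct Element {
    bool complete;           // true if no required part is missing and no child is incomplete
    // ... the remainder of the schema class instance ...
};

typedef std::map<std::string, Element> SchemaInstance;   // keyed by unique identifier

// Merge one translation unit's instance into the unified instance: add an element
// if it is absent, or replace it if the copy already present is still incomplete
// (e.g., a function declaration being resolved to its definition).
void link(SchemaInstance& unified, const SchemaInstance& translationUnit) {
    for (SchemaInstance::const_iterator it = translationUnit.begin();
         it != translationUnit.end(); ++it) {
        SchemaInstance::iterator existing = unified.find(it->first);
        if (existing == unified.end()) {
            unified.insert(*it);
        } else if (!existing->second.complete) {
            existing->second = it->second;
        }
    }
}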

5.1.5 The API Module

In the api package, we provide an abstract class that defines the interface for an API that provides access to information about language elements in a C++ program.

In addition, we provide a concrete implementation of the API. The input to this

package is a GXL encoding of a linked instance of the schema representation; gzipped

files are also accepted. The input files need not be created by the linker package.

The output of this package is an API, an in-memory representation of the linked

instance that may be queried by a user program.

We designed the api package to provide a clear and flexible interface. We provide

two points of access. The first point of access is a pointer to the global namespace,

from which a user can traverse the ASG that underlies the API. We provide iterator

classes, as well as an abstract visitor class, to use when accessing the API in this

fashion [Gamma et al. 1995]. The second point of access is a collection of lists that

each contain instances of a particular schema class. We provide, in two forms, the

lists for Namespace, Class, Enumeration, Enumerator, Function, Variable, and Typedef. The first form provides all instances of the particular schema class; the second

form provides filtered instances of the particular schema class. Filtered instances are

determined by user-provided filter lists that contain the names of source files from

which schema class instances should be ignored. We provide a script that generates

filter lists.

5.2 Sample Usage

In Figure 5.2 we provide an overview of API usage. We illustrate a GXL file containing a linked instance of the schema representation at the top of the figure. Next,

we illustrate a sample user program, metrics, that instantiates then queries the API

to compute metrics. We illustrate the API, the abstract class cppinfo::api::Interface,

in the middle of the figure. Finally, we illustrate filter lists at the bottom of the

figure.

The user program instantiates the API with the name of the GXL file; when the

API is instantiated, it reads the filter lists. The user program queries the instan-

tiated API to perform a reverse engineering task, such as a program analysis. In

Section 5.2.1 we describe the process of acquiring the GXL files needed to instantiate the API. In Section 5.2.2 we present a sample user program that instantiates and queries an API to perform a simple program analysis.

5.2.1 Input

In Figure 5.3 we illustrate the process of using gcc, and optionally g4re and/or gzip, to create a set of files that contain instances of the generic schema. We show the input, a C++ source file, at the left of the figure. We show the output, a set of files to transform, at the right of the figure (see Subsection 5.1.3 for details). This set may contain any combination of the four possible encodings of the input.

We show the use of the gcc command line flag -fdump-translation-unit-all to obtain a plain text representation of the generic instance for each translation unit in a program. We show the creation of these representations, known as tu

Figure 5.2: Overview of API usage. Solid, directed lines show input, unless otherwise noted. Dashed lines show notes.

Figure 5.3: UML Activity Diagram for Transformer Input. The process of creating a set of files to transform.

Figure 5.4: UML Activity Diagram for Linker Input. The process of creating a set of files to link.

files, in the upper left of Figure 5.3. We use tu files rather than hard-coding our solution into the gcc source code. This provides flexibility, and fits our theme of exchange among reverse engineering tools. In the upper right of the figure, we show the optional use of the g4re command line flag -fencode to obtain, for each tu file, a GXL encoding of an instance of the generic schema. At the bottom of the figure, we show the optional use of gzip to compress either a tu file, or a GXL instance.

In Figure 5.4 we illustrate the process of using g4re to create a set of GXL files

that contain instances of the CppInfo schema. We show the input, the set of files

to transform (obtained as shown in Figure 5.3), at the left of the figure. We show the output, a set of files to link, at the right of the figure (see Subsection 5.1.4 for details).

We show the use of the g4re command line flag -ftransform to obtain, for each generic instance, a GXL encoding of a temporary instance of the CppInfo schema. We show the creation of these temporary instances, which use string encodings of

Figure 5.5: UML Activity Diagram for API Input. The process of creating a file for use with the API.

the unique names for the contained instances of CppInfo classes, in the center of

Figure 5.4. We omit showing the optional use of gzip in this figure.

In Figure 5.5 we illustrate the process of using g4re to create a GXL file that

contains a linked instance of the CppInfo schema. We show the input, the set of files

to link (obtained as shown in Figure 5.4), at the left of the figure. We show the

output, a GXL encoding of the linked instance of the CppInfo schema, at the right

of the figure (see Subsection 5.1.5 for details).

We show the use of the g4re command line flag -flink to obtain, for a set of temporary instances of the CppInfo schema, a GXL encoding of the linked instance

of the CppInfo schema. We show the creation of the linked instance, which uses

unique integers to identify the contained instances of CppInfo classes, at the right of

Figure 5.5. We omit showing the optional use of gzip in this figure.
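Taken together, the three activity diagrams correspond to an invocation sequence of roughly the following form; the gcc and g4re flags are those named above, while the exact g4re command syntax and the file names are assumptions made for illustration:

g++ -c -fdump-translation-unit-all Parser.cpp    # write the tu file for the translation unit
g4re -fencode Parser.cpp.tu                      # optional: GXL encoding of the generic instance
g4re -ftransform Parser.cpp.tu.gxl               # temporary CppInfo instance (.cpp.tu.ci.gxl)
g4re -flink *.ci.gxl                             # linked, unified CppInfo instance (.cil.gxl)
gzip *.gxl                                       # optional compression at any stage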

5.2.2 Usage

1  class Shape {};
2  class Circle : public Shape {};
3  class Rectangle : public Shape {};
4
5  class Square : public Rectangle {};
6
7  class Visitor {};
8  class ComputationVisitor : public Visitor {};
9  class SerializationVisitor : public Visitor {};
10
11 class AreaComputationVisitor : public ComputationVisitor {};
12 class PerimeterComputationVisitor : public ComputationVisitor {};
13
14 class XmlSerializationVisitor : public SerializationVisitor {};

Source Listing 5.1: Sample C++ program. Two disjoint inheritance hierarchies that consist of ten classes.

In Source Listing 5.1, we list a small C++ program that consists of ten classes. We

list two root classes, Shape and Visitor, on lines 1 and 7, respectively. Root classes do

not have base classes. We list three interior classes, Rectangle, ComputationVisitor,

and SerializationVisitor, on lines 3, 8, and 9, respectively. Interior classes have one or

more base classes, and one or more derived classes. Finally, we list five leaf classes,

Circle, Square, AreaComputationVisitor, PerimeterComputationVisitor, and XmlSerial-

izationVisitor, on lines 2, 5, 11, 12, and 14, respectively. Leaf classes have one or

more base classes, but no derived classes.

1  void countClasses( const cppinfo::api::Filename_t& filename ) {
2    using cppinfo::api::Interface;
3    using cppinfo::api::LinkedInterface;
4    using cppinfo::ConstClassPtrListIterator_t;
5    Interface* interface = new LinkedInterface( filename );
6    unsigned root = 0, interior = 0, leaf = 0;
7    ConstClassPtrListIterator_t i = interface->getClasses().createIterator();
8    while ( true == i->isValid() ) {
9      const cppinfo::ConstClassPtr_t c = i->getCurrent();
10     unsigned baseCount = c->getBaseClasses().size();
11     unsigned derivedCount = c->getDerivedClasses().size();
12     if ( 0 == baseCount ) {
13       ++root;
14     }
15     else {
16       if ( 0 < derivedCount )
17         ++interior;
18       else
19         ++leaf;
20     }
21     i->moveNext();
22   }
23   delete i;
24 }

Source Listing 5.2: Sample user program. A simple program analysis that counts the number of root, interior, and leaf classes.

In Source Listing 5.2, we list a C++ function that instantiates and queries an

API instance to compute the number of root, interior, and leaf classes in a C++

program. We list the function declaration on line 1, where the parameter filename

denotes the input program (see Subsection 5.2.1 for details). We list the API in-

stantiation on line 5, where the filename is passed to the constructor of class cp-

pinfo::api::LinkedInterface. On line 7, we use the list point of access provided by

the API to obtain an iterator that accesses each class in the input program4. Finally, we list a while loop on lines 8–21 that computes the number of root, interior, and leaf classes.

4The list contains all ClassNonTemplate, ClassTemplate, and ClassTemplateInstantiation instances. A trivial addition to the loop is required to exclude instances of one or two of these classes.

Chapter 6

Case Studies: Realizing the Infrastructure with g4re

In this chapter we present two case studies in which we use the g4re tool chain to realize our infrastructure. We designed our case studies to determine the space and time costs incurred by the use of our infrastructure. We measure space in two dimensions: size on disk, and size of graph(s), i.e., the number of nodes and edges.

We measure times for parsing and building in-memory representations, as well as for the linking process, and the application of XSLT style sheets.

First, in Section 6.1, we describe the twelve applications and libraries that serve as the test suite in our case studies. In Section 6.2, we exchange low-level graphs, and measure the space and time costs incurred. In Section 6.3, we exchange middle-level graphs, and again measure the space and time costs incurred. In this section we also apply XSLT style sheets to each middle-level graph. We use style sheets that summarize the contents of each middle-level graph instance; the process of writing the style sheets, which requires knowledge of only the schema, is automatable.

6.1 Test Suite

In Table 6.1, we list the twelve open source applications and libraries, or test cases, that form the test suite for our studies. In the first column, we list the names that we use to refer to the test cases. In the next three columns of the table, we list relevant data about the test cases. We list the version numbers in the second column, the number of C++ translation units in the third, and the approximate number of thousands of non-commented, non-preprocessed lines of code in the fourth.

Test Case    Version        C++ Translation Units   NCLOC (≈ K)
AvP          CVS 07/22/05    95                      295
CppUnit      1.10.2          51                        4
Doxygen      1.4.4           69                      170
FluxBox      0.9.14         107                       32
FOX          1.4.17         245                      110
HippoDraw    1.15.8         249                       55
Jikes        1.22            38                       70
Keystone     0.2.3           52                       16
Licq         1.3.0           28                       36
Pixie        1.5.2           78                       80
Scintilla    1.66            78                       35
Scribus      1.2.3          110                       80

Table 6.1: Test suite. The 12 test cases that we use in our study. For each test case, we list the version, the number of C++ translation units, and the approximate number of thousands of non-commented, non-preprocessed lines of code (NCLOC). The test suite contains 1,200 C++ translation units and approximately one million lines of code.

The twelve applications and libraries that form our test suite are widely used, are freely available on the Web, and consist of approximately one million lines of non-commented, non-preprocessed code. AvP is a port of the Fox Interactive/Rebellion Developments game Aliens vs Predator (Gold Edition) [Rebellion 2005]. CppUnit is a C++ port of the JUnit framework for unit testing [CppUnit Project 2006]. Doxygen is a documentation system for C, C++, and Java [van Heesch 2006]. FluxBox is a light-weight X11 window manager built for speed and flexibility [FluxBox Project 2006]. FOX is a toolkit to facilitate development of graphical user interfaces [van der Zijp 2006]. HippoDraw provides a highly interactive data analysis environment [Kunz 2006]. Jikes is a Java compiler system from IBM [IBM Jikes Project 2006]. Keystone is a parser and front end for ISO C++ [Keystone Project 2005; Malloy et al. 2003a]. Licq is a multi-threaded ICQ clone [Licq Project 2006]. Pixie is a RenderMan®-like photorealistic renderer [Arikan 2006]. Scintilla is a source code editing component that includes support for syntax styling, error indicators, code completion, and call tips [Hodgson 2006]. The final test case, Scribus, is a professional, cross-platform desktop publishing system [Scribus Project 2006].

We executed all experiments on a custom workstation with a Dual Core AMD Opteron™ 165 processor, 2048 MB of PC3200 DDR RAM, and a 250 GB, 7200 RPM SATA II hard disc, on which we installed the 10.2 after formatting with version 3.6 of the ReiserFS filesystem. We performed the experiments with version 1.5.0 of g4re, which we compiled with version 4.1.1 of gcc. We created all tu files with gcc version 3.3.6.

6.2 Case Study: Exchanging Low-Level Graphs

In this section we describe the results of our first case study, in which we examine low-level graphs from our infrastructure. g4re exchanges multiple formats, as discussed in Subsection 5.2.1. In Subsections 6.2.1 and 6.2.2, we describe the formats that g4re exchanges, and present results for exchanging GXL encoded instances of schemas at Levels 0 and I of our infrastructure, respectively. We discuss the results of the case study in Subsection 6.2.3.

6.2.1 Exchanging Graphs at Level 0

In this subsection we investigate the costs associated with exchanging instances of

low-level graphs; in particular, we investigate the costs of exchanging instances of

the generic ASG schema, in both tu and GXL formats. First, we illustrate the two exchange formats. Second, we measure the space and time costs incurred by

exchanging ASGs, which are found in Level 0 of our infrastructure.

1 class Base {};
2 class Parser : public Base {};

Source Listing 6.1: Source code for class Parser. Definition of the C++ class Parser. Parser inherits from the class Base.

In Source Listing 6.1, we list C++ code for the definition of class Parser. We

list a base class, Base, on line 1, and the class Parser on line 2. The inheritance

relationship between Parser and Base is public and non-virtual.

@3 type_decl        name: @4      type: @5      srcp: Parser.cpp:2    artificial    chan: @6      addr: b7e0a460
@4 identifier_node  strg: Parser  lngt: 6       addr: b66b3ac0
@5 record_type      name: @3      size: @7      algn: 8       base: @8      public    struct    flds: @9      fncs: @10     binf: @11     addr: b7e0a310

Source Listing 6.2: Instance of a tu file. Definition of class Parser as represented in a tu file. A node definition in a tu file consists of: a unique integer prepended with “@”, a string representing the node type, edges of the form “edge: dest”, fields of the form “field: value”, and a set of single word attributes.

We list the definition of a C++ class, Parser, in the generic tu file format in Source Listing 6.2, and the corresponding definition as a GXL encoded instance of

the generic schema in Source Listing 6.3. GXL is clearly more verbose than the gcc tu file format; the respective character counts for the text in the two listings are 447 and 1178.

Note that the text in Source Listing 6.2 contains information not present in Source Listing 6.3. Extraneous information, such as an address or string length, is omitted from the GXL encoding. Empty lists are detected and removed during encoding; the flds edge is omitted in this example. The fncs edge is not omitted, because gcc provides a constructor, copy constructor, and assignment operator for each class.

artificial Parser.cpp:2 Parser struct false public

Source Listing 6.3: GXL instance of the generic schema. Definition of class Parser as represented in a GXL encoded instance of the generic schema. The generic GXL schema is a direct encoding of the tu file format, but with internal gcc information, such as addresses and string lengths, omitted. The “@” symbol is translated to “n” to conform to XML standards.

Test Case    Nodes (.cpp.tu[.gxl][.gz])   Edges (.cpp.tu[.gz])   Edges (.cpp.tu.gxl[.gz])
AvP           3 286 604                    8 607 856              8 509 901
CppUnit       4 574 861                   10 983 481             10 911 237
Doxygen       7 558 527                   17 894 321             17 724 872
FluxBox      12 016 093                   30 111 171             29 852 859
FOX          12 139 219                   32 260 488             31 953 355
HippoDraw    18 835 420                   44 662 239             44 338 296
Jikes         7 543 803                   17 437 798             17 321 098
Keystone      6 159 791                   15 152 153             15 047 146
Licq          2 663 307                    6 813 822              6 751 433
Pixie         3 278 791                    7 665 603              7 620 166
Scintilla     1 414 562                    3 456 874              3 427 785
Scribus      17 418 294                   44 859 563             44 426 635

Table 6.2: Level 0: Numbers of nodes and edges. The numbers of nodes and edges for ASGs that represent the test cases.

It is well known that XML imposes significant storage costs; however, this fact has not hindered its widespread adoption. Due to the prevalence of XML, there are several tools, available in popular languages such as C, C++, and Java, that were designed with these costs in mind. We designed and implemented a wrapper for the XML parser expat [eXpat Project 2005] that uses zlib [zlib Project 2005] to read compressed files. We also implemented a subclass of the C++ standard library class ostream to write compressed files. To provide a complete comparison, we instrumented our flex scanner to read compressed tu files.

In Table 6.2, we list the numbers of nodes and edges for ASGs that represent the test cases. In column 1, we list the test cases. In column 2, we list the number of nodes in the possibly GXL-encoded instance of the generic schema. In columns 3 and 4, we list the numbers of edges in the tu files and GXL encoded tu files, respectively.
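The compressed-output facility mentioned above can be realized along the following lines (a minimal sketch using the zlib gzip API; the class names are assumptions, not the actual g4re code):

#include <cstdio>
#include <ostream>
#include <streambuf>
#include <zlib.h>

// A stream buffer that forwards every write to a gzip-compressed file.
class GzStreambuf : public std::streambuf {
public:
    explicit GzStreambuf(const char* path) : file_(gzopen(path, "wb")) { }
    ~GzStreambuf() { if (file_) gzclose(file_); }
protected:
    virtual int overflow(int c) {
        if (c == EOF) return EOF;
        char ch = static_cast<char>(c);
        return gzwrite(file_, &ch, 1) == 1 ? c : EOF;
    }
    virtual std::streamsize xsputn(const char* s, std::streamsize n) {
        return gzwrite(file_, s, static_cast<unsigned>(n));
    }
private:
    gzFile file_;
};

// An ostream that owns the buffer; GXL text written to it is compressed on the fly.
class GzOstream : public std::ostream {
public:
    explicit GzOstream(const char* path) : std::ostream(0), buf_(path) { rdbuf(&buf_); }
private:
    GzStreambuf buf_;
};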

Test Case    .cpp.tu   .cpp.tu.gz   .cpp.tu.gxl   .cpp.tu.gxl.gz
AvP            809         84          1 376           122
CppUnit        567         98          1 784           116
Doxygen        863        152          2 794           172
FluxBox      1 540        250          4 842           312
FOX          1 643        230          5 162           303
HippoDraw    2 283        376          7 222           469
Jikes          872        145          2 795           181
Keystone       773        126          2 439           157
Licq           341         56          1 081            69
Pixie          414         56          1 202            71
Scintilla      177         27            554            34
Scribus      2 184        352          6 967           440

Table 6.3: Level 0: Size on disk (MB). The size on disk, in megabytes, for ASGs that represent the test cases.

We apply the pruning algorithm discussed in Subsection 5.1.1 during the parse of a tu file. We show the effects of our pruning algorithm in Table 6.2. Our pruning algorithm does not remove any nodes, but it does remove edges. In the table, we show the difference between the numbers of edges in the tu files and the GXL encodings of the tu files. Next, we investigate the storage costs introduced by the use of GXL, and the savings that can be achieved by compressing files of each format.

In Table 6.3, we list the sizes on disk, in megabytes, for ASGs that represent the test cases. In column 1, we list the test cases. In columns 2 and 3, we list the total size of the uncompressed and compressed tu files, respectively. In columns 4 and 5, we list the total sizes of the uncompressed and compressed GXL encoded tu files, respectively.

A comparison of columns 2 and 4 of the table shows the significant storage cost introduced by the use of uncompressed GXL. For all but one of the test cases, the uncompressed GXL encodings of the tu files more than double the storage costs.

For example, the total storage cost of the tu files for Jikes is 872 megabytes, but the total storage cost of the GXL encodings is 2 795 megabytes; the tu files are 3.2 times smaller than the GXL encodings. The outlier is AvP, for which the tu files, at 809 megabytes, are only 1.7 times smaller than the GXL encodings. On average,

uncompressed tu files are 3.02 times smaller than the GXL encodings of the tu files, with a standard deviation of 0.42. Columns 3 and 5 show the significant savings in storage cost that compression introduces when compared to columns 2 and 4, respectively. In addition, the gap between the storage costs of the two file formats is significantly reduced when compression is used. On average, compressed tu files are 1.25 times smaller than the GXL encodings of the tu files, with a standard deviation of 0.08. GXL, and XML in general, compresses at a higher ratio than other text formats. Next, we investigate the run-time costs introduced by the use of compression and GXL.

Test Case    .cpp.tu   .cpp.tu.gz   .cpp.tu.gxl   .cpp.tu.gxl.gz
AvP           97.39      112.62       136.58         155.71
CppUnit      123.47      142.85       174.66         199.56
Doxygen      206.07      238.65       279.39         322.82
FluxBox      341.50      388.24       472.39         552.80
FOX          347.15      411.66       503.60         577.47
HippoDraw    514.72      584.73       715.28         829.80
Jikes        208.93      233.54       253.77         291.35
Keystone     171.89      194.75       239.23         275.83
Licq          76.06       87.47        90.77         105.75
Pixie         86.74      104.06       125.20         144.57
Scintilla     38.63       46.65        56.30          64.99
Scribus      508.73      572.60       600.87         703.67

Table 6.4: Level 0: Time (s). The running time, in seconds, to parse and build in-memory representations of ASGs that represent the test cases.

In Table 6.4, we list the running times, in seconds, to parse and build in-memory representations of ASGs that represent the test cases. In column 1, we list the test cases. In columns 2 and 3, we list the total times for the uncompressed and compressed tu files, respectively. In columns 4 and 5, we list the total times for the uncompressed and compressed GXL encoded tu files, respectively.

As stated in Subsection 5.1.1, we parse tu files using a flex generated scanner, GXL files using expat, and compressed files using zlib. We use the same node list

graph data structure to store each graph instance in memory. A comparison of

columns 2 and 4 of the table shows the run-time cost introduced by the use of GXL.

The running times for GXL inputs are consistently higher than those for tu inputs, but the run-time costs introduced by GXL are much lower than the corresponding

64 storage costs. On average, parsing the uncompressed tu files is 1.36 times faster than parsing the uncompressed GXL encodings of the tu files, with a standard deviation of 0.10. The average time for uncompressed tu files is 226.77 seconds, with a standard deviation of 164.86. On average, parsing the compressed tu files is

1.36 times faster than parsing the compressed GXL encodings of the tu files, with a standard deviation of 0.08. The average time for compressed tu files is 259.82 seconds, with a standard deviation of 186.81.

6.2.2 Exchanging Graphs at Level I

In this subsection we continue to investigate the costs associated with exchanging instances of low-level graphs; in particular, we investigate the costs of exchanging instances of the CppInfo schema. First, we illustrate a GXL encoded instance of the CppInfo schema. Second, we measure the space and time costs incurred by exchanging APIs, which are found in Level I of our infrastructure.

We list the definition of C++ class Parser (see Source Listing 6.1 for details) as a GXL encoded, linked instance of the CppInfo schema in Source Listing 6.4.

The character count for the text in the figure is 1307, which is larger than even the

GXL encoding of the original tu file. However, we implemented maximal sharing of strings, such as file and identifier names, and integers, such as line and column numbers, to improve the scalability of this format.

We show the effects of our linking process in Table 6.5. In the table, we show the differences between the numbers of nodes and edges in the intermediate (unlinked) instances and the linked instances of the CppInfo schema. In columns 2 and 3, we list the sums of nodes and edges, respectively, for all intermediate instances for each test case. The numbers of nodes and edges for intermediate instances vary widely.

The minimum number of nodes is 780 024 for Scintilla, and the maximum number

of nodes is 10 164 005 for HippoDraw. The minimum number of edges is 2 391 321

for Scintilla, and the maximum number of edges is 34 941 134 for HippoDraw. The

average numbers of nodes and edges are 4 262 119 and 14 445 413, with standard

deviations of 3 402 982 and 11 469 128, respectively.

false false class public false 2 Parser

Source Listing 6.4: GXL instance of the CppInfo schema. Definition of class Parser as represented in the GXL encoded, linked instance of the CppInfo schema.

66 .cpp.tu.ci.gxl[.gz] .cil.gxl[.gz] Test Case Nodes Edges Nodes Edges AvP 2 059 850 6 321 574 148 972 631 882 CppUnit 2 657 601 9 208 857 85 355 330 845 Doxygen 2 234 210 7 956 801 208 463 805 926 FluxBox 6 562 227 23 026 116 215 846 1 264 464 FOX 9 631 093 29 647 216 221 383 1 016 806 HippoDraw 10 164 005 34 941 134 254 270 1 470 270 Jikes 2 932 380 10 204 160 154 132 554 202 Keystone 3 314 379 11 731 213 139 570 625 173 Licq 1 142 403 3 996 935 128 045 541 960 Pixie 1 538 147 4 832 153 109 408 491 839 Scintilla 780 024 2 391 321 129 658 437 110 Scribus 8 129 110 29 087 482 330 537 1 510 133

Table 6.5: Level I: Numbers of nodes and edges. The numbers of nodes and edges for APIs that represent the test cases.

In columns 4 and 5 of Table 6.5, we list the numbers of nodes and edges, respectively, for the linked instance for each test case. These numbers are substantially smaller than those for the intermediate instances. The minimum number of nodes is 85 355 for CppUnit, and the maximum number of nodes is 330 537 for Scribus. The minimum number of edges is 330 845 for CppUnit, and the maximum number of edges is 1 510 133 for Scribus. The average numbers of nodes and edges are 177 136 and 806 717, with standard deviations of 70 277 and 409 918, respectively.

The substantial reductions indicate a high ratio of duplication among translation units for all test cases. Recall that duplication is the result of compiler-specific information, as well as header files, being present in multiple translation units. Next, we investigate the savings in storage costs introduced by the linking process.
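As an illustration of the kind of merging a linker must perform, the sketch below collapses duplicate nodes from different translation units by keying them on a fully qualified name. The key choice and the names are assumptions for illustration; the actual g4re linker may use a different key and algorithm.

#include <string>
#include <unordered_map>

// Map a node's fully qualified name (e.g., "::Parser") to the id of the
// canonical node in the linked graph. Nodes seen in later translation units
// that carry a name already in the table are merged rather than added, which
// collapses the duplication caused by shared headers.
class Linker {
public:
    // Returns the canonical id for 'qualifiedName', creating one if needed.
    unsigned canonicalId(const std::string& qualifiedName) {
        auto it = canon_.find(qualifiedName);
        if (it != canon_.end()) return it->second;   // duplicate: reuse
        unsigned id = next_++;
        canon_.emplace(qualifiedName, id);
        return id;
    }
private:
    std::unordered_map<std::string, unsigned> canon_;
    unsigned next_ = 0;
};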

In Table 6.6, we list the sizes on disk, in megabytes, for APIs that represent the test cases. In column 1, we list the test cases. In columns 2 and 3, we list the total size of the uncompressed and compressed GXL encoded, intermediate instances of the CppInfo schema, respectively. In columns 4 and 5, we list the total sizes of the uncompressed and compressed GXL encoded, linked instances of the CppInfo schema, respectively.

Test Case     .cpp.tu.ci.gxl   .cpp.tu.ci.gxl.gz   .cil.gxl   .cil.gxl.gz
AvP           1 586            62                  99         5
CppUnit       3 443            103                 54         3
Doxygen       2 102            80                  126        7
FluxBox       8 609            258                 188        10
FOX           7 270            279                 149        8
HippoDraw     12 826           389                 219        11
Jikes         3 425            111                 89         5
Keystone      4 404            132                 98         5
Licq          1 380            44                  85         5
Pixie         1 212            47                  73         4
Scintilla     625              24                  71         4
Scribus       7 932            289                 231        12

Table 6.6: Level I: Size on disk (MB). The size on disk, in megabytes, for APIs that represent the test cases.

Test Case     .cpp.tu.ci.gxl   .cpp.tu.ci.gxl.gz   .cil.gxl   .cil.gxl.gz
AvP           110.81           116.17              8.43       9.47
CppUnit       202.19           217.50              4.70       5.19
Doxygen       143.85           150.62              10.89      11.53
FluxBox       521.57           548.98              16.01      16.52
FOX           516.11           542.16              12.46      13.56
HippoDraw     774.65           815.89              18.23      20.29
Jikes         211.79           223.77              7.68       8.12
Keystone      264.85           270.78              8.84       9.05
Licq          85.50            88.86               7.51       7.75
Pixie         83.88            87.96               6.10       6.52
Scintilla     42.72            44.70               6.08       6.58
Scribus       534.36           445.43              19.46      21.58

Table 6.7: Level I: Time (s). The running time, in seconds, to parse and build in-memory representations of APIs that represent the test cases.

A comparison of columns 2 and 3 of the table to columns 4 and 5 of the table, respectively, shows the significant savings introduced by the linking process. For all test cases, the uncompressed GXL encoding of the linked instance is at least 8.8 times smaller than the uncompressed GXL encodings of the intermediate instances.

For example, the total storage cost of the linked instance for Jikes is 89 megabytes, but the total storage cost of the intermediate instances is 3 425 megabytes; the linked instance is 38.5 times smaller than the intermediate instances. CppUnit shows the biggest difference in storage costs, with the linked instance 63.8 times smaller than the intermediate instances. Scintilla shows the smallest difference in storage costs.

The savings for the compressed GXL encodings are similar, although the ratios drop slightly due to the high rate of compression. A large reduction in size indicates a high level of duplication among translation units (intermediate instances), likely caused by poor compiler firewalling. Next, we investigate the savings in run-time costs introduced by the linking process.

In Table 6.7, we list the running times, in seconds, to parse and build in-memory representations of APIs that represent the test cases. In column 1, we list the test cases. In columns 2 and 3, we list the total times for the uncompressed and compressed GXL encoded, intermediate instances of the CppInfo schema, respectively. In columns 4 and 5, we list the total times for the uncompressed and compressed GXL encoded, linked instances of the CppInfo schema, respectively.

A comparison of columns 2 and 4 shows a significant savings in run-time costs when dealing with a linked representation of a program. This result follows directly from the significant savings in storage costs shown in Tables 6.5 and 6.6. The time to parse a linked instance is well under 30 seconds for all test cases, whether or not the GXL encoding is compressed. The time to parse the intermediate instances is under 60 seconds for only one test case, and over half of the test cases take over three minutes to parse. The maximum time to parse compressed GXL encodings of intermediate instances is nearly 14 minutes (815.89 seconds), for HippoDraw.

6.2.3 Discussion

The results for exchanging low-level graphs show that the storage costs can be prohibitive. The largest files recorded in this case study, uncompressed GXL encodings of intermediate instances of the CppInfo schema, total over 53 gigabytes of disc space for the 12 test cases. However, compressed GXL encodings of linked instances of the CppInfo schema, the smallest files recorded in this case study, total only 79 megabytes of disc space for the 12 test cases.

The results show that the run-time costs for low-level graphs can also be prohibitive. The slowest parsing times in this case study were for compressed GXL encodings of tu files. For the 12 test cases, these files took over 70 minutes to parse. The fastest parsing times in this case study were for uncompressed GXL encodings of linked instances of the CppInfo schema. For the 12 test cases, these files took just over 2 minutes to parse.

We presented results that show the importance of a linker for C++ reverse engineering tools, and presented the first experimental evidence which shows the significant savings that can be achieved by linking C++ translation units. Unfortunately, the smallest files recorded in this case study are still too large to be exchanged via email or newsgroups. This is important, as accessibility of results has been identified as a key hurdle to the adoption of existing infrastructures [Müller et al. 2000].

6.3 Case Study: Exchanging Middle-Level Graphs

In this section we describe the results of our second case study, in which we examine middle-level graphs from our infrastructure. In Subsection 6.3.1, we present results for exchanging GXL encoded instances of schemas at Levels II, III, and IV of our infrastructure. In Subsection 6.3.2, we extract results from GXL encoded instances of the Class Diagram, ORD, and Class Firewall schemas by applying XSLT style sheets. We discuss the results of the case study in Subsection 6.3.3.

6.3.1 Exchanging Graphs at Levels II, III, and IV

In this subsection we investigate the costs associated with exchanging instances of middle-level graphs. In particular, we investigate the storage costs of exchanging GXL encoded instances of the Class Diagram, ORD, and Class Firewall schemas. First, we illustrate GXL encoded instances of the ORD and Class Firewall schemas. We omit an instance of the Class Diagram, because it would be nearly identical to the ORD instance. Second, we measure the space costs incurred by exchanging graphs at Levels II, III, and IV.

[GXL markup omitted: the instance contains the classes ::A and ::B and one Inheritance edge.]

Source Listing 6.5: GXL encoded ORD instance. A GXL encoded instance of the ORD schema containing two classes, ::A and ::B, and one Inheritance edge. The edge indicates that ::B inherits from ::A.

In Source Listing 6.5, we list a prototypical GXL encoded instance of the ORD schema. We list two classes, ::A and ::B. In addition, we list an Inheritance edge, which indicates that ::B inherits from ::A. In this case, the Class Diagram instance would be identical, but for the references to the schema (shown as xlink:href attributes in type tags).

[GXL markup omitted: the instance contains the classes ::A and ::B and one class firewall edge.]

Source Listing 6.6: GXL encoded Class Firewall instance. A GXL encoded instance of the Class Firewall schema containing two classes, ::A and ::B, and one edge that indicates that if ::A has changed and must be tested, then ::B must be retested as well.

In Source Listing 6.6, we list a prototypical GXL encoded instance of the Class Firewall schema. We again list two classes, ::A and ::B. We also list one Edge edge, which indicates that if ::A has changed and must be tested, then ::B must be retested as well. This edge results from the Inheritance edge in the ORD instance.

In Table 6.8, we list the sizes on disk, in kilobytes, for class diagrams, ORDs, and class firewalls that represent the test cases. In column 1, we list the test cases. In columns 2 and 3, we list the total sizes of the uncompressed and compressed GXL encoded instances of the Class Diagram schema, respectively. In columns 4 and 5, we list the total sizes of the uncompressed and compressed GXL encoded instances of the ORD schema, respectively. In columns 6 and 7, we list the total sizes of the uncompressed and compressed GXL encoded instances of the Class Firewall schema, respectively.

Test Case     .cd.gxl   .cd.gxl.gz   .ord.gxl   .ord.gxl.gz   .cfw.gxl   .cfw.gxl.gz
AvP           4 207     227          5 845      301           2 735      140
CppUnit       183       10           186        10            76         8
Doxygen       4 530     245          4 309      217           2 038      100
FluxBox       1 297     71           899        49            400        24
FOX           2 922     158          582        28            953        52
HippoDraw     1 706     93           4 016      200           1 065      52
Jikes         1 345     73           4 561      221           1 041      52
Keystone      1 066     58           3 246      156           813        40
Licq          908       49           1 366      68            264        16
Pixie         1 301     71           1 988      101           693        36
Scintilla     399       22           218        12            68         4
Scribus       1 329     72           1 371      69            320        20

Table 6.8: Levels II, III, and IV: Size on disk (kB). The size on disk, in kilobytes, for class diagrams, ORDs, and class firewalls that represent the test cases.

In columns 3, 5, and 7, we list the sizes in kilobytes for the compressed GXL encoded instances of the Class Diagram, ORD, and Class Firewall schemas, respectively. The average sizes of the compressed GXL encodings of instances of the Class Diagram, ORD, and Class Firewall schemas are 95.75, 119.33, and 45.33 kilobytes, with standard deviations of 75.13, 96.76, and 39.61, respectively. Neither the contents, nor the sizes of these instances are directly comparable. However, the results show that none of the compressed GXL encodings for the 12 test cases is larger than 301 kilobytes, and that 25 of the 36 compressed files are no more than 100 kilobytes in size.

6.3.2 Transforming GXL Graphs with XSLT

In this subsection we apply XSLT style sheets to the GXL instances of the middle-level graphs. In particular, we investigate the run-time costs of the transformations, and present the results for instances of the Class Diagram, ORD, and Class Firewall schemas. First, we illustrate a representative XSLT style sheet for summarizing GXL instances, in this case, instances of the ORD schema. Second, we apply XSLT style sheets to the instances of each of the three schemas, and summarize the results. We used xsltproc [xsltproc Project 2005] to apply the style sheets to the GXL graphs.

Footnote 1: This table uses kilobytes. The similar tables in Section 6.2 use megabytes.

[XSLT markup omitted: the style sheet prints a labeled count for each of Nodes, Edges, Association, Composition, Dependency, Inheritance, OwnedElement, and Polymorphic.]

Source Listing 6.7: XSLT for summarizing ORD instances. The XSLT style sheet we used to generate the results listed in Table 6.10. We used similar style sheets to generate the results listed in Tables 6.9 and 6.11.

Test Case     Time (s)   Classes   Association   Composition   Dependency   Inheritance   OwnedElement   Total Edges
AvP           0.66       2 099     1 353         388           6 128        371           523            8 763
CppUnit       0.02       59        27            3             349          28            6              413
Doxygen       0.57       752       406           577           6 372        492           33             7 880
FluxBox       0.15       318       163           349           1 603        233           43             2 391
FOX           0.53       500       387           352           6 311        224           203            7 477
HippoDraw     0.24       272       379           27            3 289        195           1              3 891
Jikes         0.38       433       749           150           4 645        180           55             5 779
Keystone      0.15       163       173           22            2 120        111           4              2 430
Licq          0.09       224       32            17            1 249        161           1              1 460
Pixie         0.19       309       405           30            2 296        146           50             2 927
Scintilla     0.04       89        52            79            2 198        14            1              2 813
Scribus       0.17       243       1 154         33            1 568        17            25             2 797

Table 6.9: Class Diagram sizes for the test suite. The number of classes and edges, by type, in the 12 instances of the Class Diagram schema constructed for the applications and libraries in our test suite.

In Source Listing 6.7, we list an XSLT style sheet for summarizing the information in a GXL encoded instance of the ORD schema. As we noted in the introduction to this chapter, writing such a style sheet requires knowledge of only the schema, and not any particular instance. We list nine variables that contain the sets of instances of classes, edges, association edges, composition edges, dependency edges, inheritance edges, owned element edges, and polymorphic edges, respectively. We also list nine statements that print the sizes of the sets.

In Table 6.9, we present results from applying the XSLT style sheet PrintCdSummary.xslt to GXL encoded instances of the Class Diagram schema that represent each of the 12 test cases. In particular, we list the run-time costs of applying the style sheet, and summaries of the contents. In column 2, we show that xsltproc took less than one second to apply the style sheet to each of the test cases. In column 3, we list the total number of classes found in each instance; this class count includes all instances of the CppInfo schema classes ClassNonTemplate, ClassTemplate, and ClassTemplateInstantiation. In columns 4 through 8, we list the number of each individual edge type from the schema. Finally, in column 9, we list the total number of edges for each test case. On average, Class Diagram instances for our test cases contain hundreds of classes, and thousands of edges. Dependency edges are most common.

Test Case     Time (s)   Classes   Association   Composition   Dependency   Inheritance   OwnedElement   Polymorphic   Total Edges
AvP           3.33       2 082     1 346         381           6 075        367           381            15 872        24 422
CppUnit       0.06       56        27            3             349          26            6              409           820
Doxygen       2.36       724       390           575           6 267        475           31             11 144        18 882
FluxBox       0.37       307       161           346           1 600        226           40             1 470         3 843
FOX           9.76       499       387           352           6 311        223           203            27 716        35 192
HippoDraw     2.10       271       379           27            3 289        195           1              14 043        17 934
Jikes         2.50       427       748           147           4 640        179           53             14 533        20 300
Keystone      1.78       162       173           22            2 120        111           4              12 185        14 615
Licq          0.60       224       32            17            1 249        161           1              4 613         6 073
Pixie         0.91       299       398           30            2 271        142           45             5 938         8 824
Scintilla     0.24       89        52            79            2 198        14            1              469           2 813
Scribus       0.57       243       1 154         33            1 568        17            25             3 293         6 090

Table 6.10: ORD sizes for the test suite. The number of classes and edges, by type, in the 12 instances of the ORD schema constructed for the applications and libraries in our test suite.

In Table 6.10, we present results from applying the XSLT style sheet PrintOrdSummary.xslt to GXL encoded instances of the ORD schema that represent each of the 12 test cases. In particular, we list the run-time costs of applying the style sheet, and summaries of the contents. In column 2, we show that, for half of the test cases, xsltproc took less than one second to apply the style sheet; the maximum running time was 9.76 seconds for FOX. In column 3, we list the total number of classes found in each instance. This class count includes all instances of the CppInfo schema classes ClassNonTemplate and ClassTemplateInstantiation. In columns 4 through 9, we list the number of each individual edge type from the schema. Finally, in column 10, we list the total number of edges for each instance. On average, ORD instances for our test cases contain hundreds of classes, and tens of thousands of edges. Polymorphic edges are, by far, the most common.

Test Case     Time (s)   Classes   Edges   Min   Max   Avg
AvP           2.25       2 082     9 695   1     724   182.67
CppUnit       0.25       56        275     1     40    21.66
Doxygen       3.12       724       7 888   1     623   369.34
FluxBox       0.66       307       1 436   1     216   154.76
FOX           6.00       499       3 636   1     231   41.79
HippoDraw     0.83       271       4 200   1     210   116.07
Jikes         1.52       427       3 994   1     330   297.95
Keystone      3.33       162       3 242   1     140   87.89
Licq          0.50       224       959     1     172   26.90
Pixie         0.76       299       2 665   1     162   83.43
Scintilla     0.50       89        219     1     38    21.53
Scribus       0.62       243       1 175   1     107   64.40

Table 6.11: Class Firewall sizes for the test suite. The numbers of classes and edges in the 12 instances of the Class Firewall schema. In addition, the minimum, maximum, and average class firewall sizes for each of the instances. Class firewall sizes are expressed as number of classes.

In Table 6.11, we present results from applying the XSLT style sheet PrintCfwSummary.xslt to GXL encoded instances of the Class Firewall schema that represent each of the 12 test cases. In particular, we list the run-time costs of applying the style sheet, and summaries of the contents. In column 2, we show that, for half of the test cases, xsltproc took less than one second to apply the style sheet; the maximum running time was 6.00 seconds for FOX. In column 3, we list the total number of classes found in each instance. These classes are the same set of classes found in the corresponding ORD instance. In column 4, we list the total number of edges for each instance. On average, Class Firewall instances for our test cases contain hundreds of classes, and thousands of edges.

In columns 5 and 6, we list the minimum and maximum number of classes, respectively, found in the class firewall for any class from the particular test case. For each of the 12 test cases the minimum class firewall size is one (1). The maximum class firewall size is as small as 38 classes in Scintilla, and as large as 724 classes in AvP. The average class firewall size for all 12 test cases is 122 classes, with a standard deviation of 112.

6.3.3 Discussion

The results for exchanging middle-level graphs show, for both storage and run-time costs, savings of at least one order of magnitude when compared to the results for exchanging low-level graphs. Thus, the results indicate significant savings in the costs of exchange for applications that do not require full low-level information about a program. For example, no compressed GXL encoding of a middle-level graph is greater than 301 kilobytes for any of the 12 test cases. In addition, it took no more than 9.76 seconds to apply, using xsltproc, a style sheet that summarizes the contents of the given graph.

An application that builds a class firewall can take advantage of the savings that we highlight in this case study by taking ORD instances, rather than ASG or API instances, as input. This is the technique that we used to create GXL encoded instances of the Class Firewall schema for this case study. Other applications of these savings are described in Chapter 4.
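For concreteness, the following sketch derives a class firewall from an ORD by collecting the changed class together with every class that can reach it along ORD edges, that is, every class that depends on it directly or transitively. This follows the usual firewall definition and is a sketch under that assumption, not code taken from g4re.

#include <queue>
#include <set>
#include <vector>

// ord[i] lists the classes that class i depends on (its outgoing ORD edges).
// The firewall of a changed class c contains c itself plus every class from
// which c is reachable, so we traverse the reversed edges starting at c.
std::set<int> classFirewall(const std::vector<std::vector<int>>& ord, int changed) {
    std::vector<std::vector<int>> rev(ord.size());
    for (std::size_t src = 0; src < ord.size(); ++src)
        for (int dst : ord[src])
            rev[dst].push_back(static_cast<int>(src));   // reverse each edge

    std::set<int> firewall{changed};
    std::queue<int> work;
    work.push(changed);
    while (!work.empty()) {
        int c = work.front();
        work.pop();
        for (int dependent : rev[c])
            if (firewall.insert(dependent).second)       // newly discovered
                work.push(dependent);
    }
    return firewall;                                     // classes to retest
}

Because the traversal needs only classes and typed edges, an ORD instance is sufficient input; no ASG- or API-level detail is required.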

We demonstrate the use of XSLT to extract information from GXL encoded instances of three different schemas. We show that this process is efficient, and present experimental results for the 12 test cases in our test suite. All GXL files that we created for this case study are available in our SourceForge.net repository for use by practitioners and other researchers.

Chapter 7

Applications: Empirical Evaluation with g4re

In this chapter we present empirical evaluations with g4re in two areas: (1) software measurement and (2) program comprehension. In Section 7.1, we present a system for computation of object-oriented metrics, and a case study that examines the use of object technology in games and language processing tools. In Section 7.2, we present a system for three-dimensional visualization of class template diagrams, and a case study in which we visualize ten open source C++ applications. In each section, we note the platform and g4re version that we used to conduct the respective case study.

7.1 Application: Computing Object-Oriented Metrics

In this section we present our system for computing object-oriented metrics, and a case study using the system [Jamieson et al. 2005]. In Subsection 7.1.1, we provide an overview that includes motivation and background information on game application programming interfaces (APIs) and object-oriented metrics. In Subsection 7.1.2, we summarize research that relates to our system and case study. In Subsection 7.1.3, we define the metrics that we apply to the test suite in our case study. Finally, in Subsection 7.1.4, we describe our case study, and present the results.

7.1.1 Overview

Game developers have migrated from traditional approaches to the use of object technology to take advantage of its extensibility, and ease of modification and reuse.

The object-oriented paradigm has a natural application to the domains of graphics, graphical user interfaces, and games. Early games such as Quake and Doom were implemented in C, because of its small learning curve and its fast compilation and execution speeds. However, current game development teams are large, and consist of not only programmers, but also analysts, artists, and musicians. In addition, there have been considerable recent gains in compilation and execution speeds for C++; most modern games are developed using C++.

In this section, we evaluate the exploitation of object technology as it is used in a test suite of games and language processing tools. We developed a metric computation system using g4re, and used it to apply several well-known object-oriented metrics to the test suite. The results formed a basis of comparison for the use of object technology in the two groups of programs. We used the results to draw conclusions about the modularity, the use of inheritance, and method complexity in the two groups of programs in the test suite. First, we provide some background information about game APIs and object-oriented metrics.

Game APIs

In early game development, DOS-based games were generally implemented with commands issued directly to the computer's hardware. These early DOS games used calls to device drivers for input devices, such as a mouse or a joystick, and calls to specific sound cards, such as Creative Labs' Sound Blaster. Video programming was the most difficult aspect of game development; the generation of fast and smooth graphics required significant programming skill. Graphics code frequently exploited the speed of assembly language programming, and depended on certain hardware-level features of the VGA graphics adapter.

Currently, few game developers write register-level video code, instead relying on prewritten application programming interfaces (APIs) that form a layer of software between the game and the hardware. The most popular API in current use is DirectX, which is implemented in C++ [Parberry 2000; Parberry et al. 2005]. The DirectX API provides low-level access to multimedia hardware in a device-independent manner. New versions of DirectX are released to permit game developers to take advantage of hardware advances as they occur, even after games have shipped. Unfortunately, the DirectX API is specific to Windows.

With the popularity of the Linux operating system, game developers became interested in platform-independent game programming. Several APIs have been introduced, including ClanLib [2005], and the Simple DirectMedia Layer (SDL) [2005]. SDL is a multimedia library that has been used to port a number of Windows-based games to Linux, and is the most popular of the platform-independent game APIs [Pazera 2003].

SDL supports virtually all of the major operating systems, including FreeBSD, Linux, MacOS, Solaris, and Windows. In addition to graphics support, SDL provides interfaces for playing sound, accessing CD-ROM drives, and achieving portable multi-threaded applications. SDL is released under the GNU LGPL and has accumulated a collection of user-contributed libraries that provide additional functionality.

Object-Oriented Metrics

Software metrics are quantitative measures that enable software developers, testers, and maintainers to evaluate the static properties of a software system [Fenton and Pfleeger 1998]. Software metrics are computed and the resultant data are collected, analyzed, and compared throughout the lifetime of a software system to evaluate improvement or deterioration of the software system. Software metrics are also useful for identifying problem modules of a software system.

Object-oriented metrics were introduced to measure software properties specific to object-oriented software systems, including properties pertaining to classes and their object instances [Chidamber and Kemerer 1994; Fenton and Pfleeger 1998]. The primary focus of object-oriented metrics is measuring properties of classes and their instances. Properties of interest include scope of properties, object complexity, coupling, and cohesion.

7.1.2 Related Work

The literature on object-oriented software metrics is extensive; Chidamber and Kemerer [1994] and Fenton and Pfleeger [1998] present detailed surveys. By contrast, there has been relatively little work focused on applying metrics to assess the relative design characteristics of systems in different application domains. Further, our work is unique in its consideration of gaming applications. Other researchers have, however, considered comparisons that are similar in spirit to our own.

Paulson et al. [2004] perform an empirical evaluation of the differences between open-source and proprietary software. Their goal is to evaluate the validity of common perceptions regarding open-source projects. Their study considers five dimensions of comparison: (1) system growth, (2) design creativity, (3) complexity, (4) reliability, and (5) modularity. For each dimension of comparison, they apply a series of metrics to a test suite consisting of three open-source projects and three proprietary projects. Based on the resulting figures, the authors conclude that relative to their proprietary counterparts, open-source projects: (1) do not grow faster, (2) foster more creativity, (3) are more complex, (4) are more reliable, and (5) are less modular. We note that we share the authors' interest in complexity and modularity, and use a similar metric for evaluating complexity.

MacCormack et al. [2004] focus on the modularity of open-source software. Their approach is novel in its use of metrics defined over design structure matrixes [Steward 1981]. Each matrix captures the dependencies between the source files in a given implementation. A value of one at position (i, j) denotes the existence of a call from a function defined in file i to a function defined in file j. Similarly, a zero value denotes the absence of such a call. The authors consider two metrics. The first metric estimates the number of files affected, on average, by an arbitrary system change. The second metric is based on a clustering algorithm that groups collections of interdependent files. The metric estimates the cost of coordination between the individuals responsible for implementing the various elements by allocating a higher cost to inter-cluster dependencies, and a lower cost to intra-cluster dependencies.
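As a rough sketch of the first metric, one can close the call relation transitively and count, for each changed file, how many other files reach it; the propagation rule and the names below are assumptions for illustration, not the authors' exact formulation.

#include <vector>

// dsm[i][j] == 1 means a function in file i calls a function in file j.
// For each file j, count how many files can be affected by changing it
// (files that transitively call into j), then average over all files.
double averageFilesAffected(const std::vector<std::vector<int>>& dsm) {
    const int n = static_cast<int>(dsm.size());
    std::vector<std::vector<int>> reach = dsm;       // transitive closure
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (reach[i][k] && reach[k][j]) reach[i][j] = 1;

    long total = 0;
    for (int j = 0; j < n; ++j)          // file j changes
        for (int i = 0; i < n; ++i)      // file i depends on j, transitively
            if (i != j && reach[i][j]) ++total;
    return n ? static_cast<double>(total) / n : 0.0;
}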

When applied to their test suite, which consists of one open-source system and one proprietary system, the resulting figures contradict the results of Paulson et al.: the open-source system appears more modular. Later, however, the authors evaluate a major redesign of the proprietary system, which fares better: it is more modular than the open-source system. We note that the size of the test suite makes it difficult to draw definitive conclusions.

Ferrett and Offutt [2002] report an empirical evaluation that considers different implementation paradigms, rather than different application domains. The authors analyze programs written in Fortran, C, C++, and Java to compare the modularity of programs written in a procedural style with those written in an object-oriented style. Ferrett and Offutt define a module as a function (or subroutine) in Fortran or C, and as a method in C++ or Java. They measure modularity by counting implementation modules, lines of code per module, and number of parameters per module. The assumption is that modularity increases with each of these measures. Their test suite includes 38 programs, with eight programs written in Java, and ten programs each written in Fortran, C, and C++. Based on the test suite, the authors conclude that object-oriented programs typically have more modules that are smaller and have fewer parameters. It is interesting to note that despite the aggregate conclusion, C++ programs appear to be no more modular than their C counterparts. The authors provide a possible explanation by observing that C++ programs are often developed by programmers trained in C; the programs need not be object-oriented. One final point is that the test suite contains programs that are only partially developed, and target different domains. The authors note that without experimental controls to account for these factors, the conclusions may be biased.

7.1.3 Methodology

In this subsection, we define the object-oriented metrics that we computed for the case study in Section 7.1.4. We define one metric measuring complexity, Weighted Methods per Class. We define three metrics measuring the use of inheritance: Depth of Inheritance Tree, Number of Ancestors, and Number of Children.

Metric 1: Weighted Methods per Class (WMC). This metric measures the complexity of an object, and is an indicator of the time and effort required to develop and maintain a class. Given a class C with methods M_1, M_2, ..., M_n, weighted with cyclomatic complexities c_1, c_2, ..., c_n, respectively, the metric is computed as

WMC(C) = \sum_{i=1}^{n} c_i

Given a method M with control flow graph G = (V, E), let D equal the set of decision nodes in V, where a decision node represents one of { if, switch, for, while, do while, catch }. The cyclomatic complexity, c, of M is the number of linearly independent paths in G and is computed as

c(M) = |D| + 1
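These two definitions translate directly into code; the sketch below assumes that the number of decision nodes per method has already been extracted from the control flow graph, and the function names are illustrative.

#include <vector>

// c(M) = |D| + 1, where |D| is the number of decision nodes
// (if, switch, for, while, do-while, catch) in M's control flow graph.
int cyclomaticComplexity(int decisionNodeCount) {
    return decisionNodeCount + 1;
}

// WMC(C) = c(M_1) + c(M_2) + ... + c(M_n) over the n methods of class C.
int weightedMethodsPerClass(const std::vector<int>& methodDecisionCounts) {
    int wmc = 0;
    for (int d : methodDecisionCounts)
        wmc += cyclomaticComplexity(d);
    return wmc;
}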

Metric 2: Depth of Inheritance Tree (DIT). This metric is the length of the maximum path from a class to the root of its inheritance hierarchy, relates to scope of properties, and is an indicator of the number of ancestor classes that can potentially affect a class. Given a class C with a set of base classes BC, the metric is computed as

DIT(C) = \begin{cases} 0 & \text{if } |BC| = 0 \\ \max(\{DIT(B_i) : B_i \in BC\}) + 1 & \text{if } |BC| > 0 \end{cases}
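A direct transcription of the DIT definition, memoized so that shared base classes are computed only once; the class identifiers and adjacency representation are assumptions for illustration.

#include <algorithm>
#include <unordered_map>
#include <vector>

// bases[c] lists the direct base classes of class c.
// DIT(c) = 0 if c has no bases, otherwise 1 + the maximum DIT of its bases.
int dit(int c,
        const std::vector<std::vector<int>>& bases,
        std::unordered_map<int, int>& memo) {
    auto it = memo.find(c);
    if (it != memo.end()) return it->second;     // already computed
    int depth = 0;
    for (int b : bases[c])
        depth = std::max(depth, dit(b, bases, memo) + 1);
    return memo[c] = depth;
}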

Metric 3: Number of Ancestors (NOA). This metric is the total number of ancestor classes of a class. In the absence of multiple inheritance, NOA is equivalent to DIT. In the presence of multiple inheritance, care must be taken to avoid counting an ancestor class more than once, due to the possibility of a diamond-shaped inheritance hierarchy.
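The usual way to respect that caveat is to accumulate ancestors into a set, so that a class reached along two paths of a diamond is counted only once; the sketch below does exactly that, with an assumed adjacency representation.

#include <set>
#include <vector>

// bases[c] lists the direct base classes of class c. Collecting ancestors
// into a set counts each class once, even when multiple inheritance forms
// a diamond, so NOA(c) = |ancestors(c)|.
void collectAncestors(int c, const std::vector<std::vector<int>>& bases,
                      std::set<int>& ancestors) {
    for (int b : bases[c])
        if (ancestors.insert(b).second)          // not seen before
            collectAncestors(b, bases, ancestors);
}

int noa(int c, const std::vector<std::vector<int>>& bases) {
    std::set<int> ancestors;
    collectAncestors(c, bases, ancestors);
    return static_cast<int>(ancestors.size());
}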

Test Case     Version        Source Files   Translation Units (Total)   Translation Units (C++)   NCLOC (≈ K)
ASC           1.16.1.0       436            199                         194                       130
AvP           CVS 07/22/05   509            222                         95                        295
Freespace2    CVS 07/22/05   652            220                         220                       365
Scorched3D    38.1           1 069          513                         492                       110
Doxygen       1.3.9.1        260            122                         90                        170
g4re          1.0.4          128            60                          60                        10
Jikes         1.22           75             38                          38                        70
Keystone      0.2.3          123            52                          52                        16

Table 7.1: Test suite. The eight test cases that we use in our study: four SDL games and four language processing tools. For each test case, we list the version, the number of source files, the number of translation units, the number of C++ translation units, and the approximate number of thousands of non-commented, non-preprocessed lines of code (NCLOC).

Metric 4: Number of Children (NOC). This metric is the number of immediate successors of a class, and measures the breadth of inheritance. Given a class C with a set of derived classes DC, the metric is computed as

NOC(C) = |DC|

7.1.4 Case Study

In this subsection we describe the results that we obtained using our metrics tool to evaluate game application software. We evaluated game software by comparing metrics computed for four games and four language processing tools. The results that we report in this section capture information about the sizes of the programs, and the exploitation of object technology as measured by the metrics described in Subsection 7.1.3.

We first provide information about the eight (8) test cases that form our test suite: four games implemented using the SDL API, and four language processing tools. We then provide results describing: (1) modularity and delegation, (2) the use of inheritance, and (3) complexity of methods.

Test Suite

In Table 7.1, we list the eight test cases that form our test suite, and size statistics for each test case. In column 1 of the table, we list the names that we use to refer to each of the test cases. We list the SDL games in the first four rows. The four games are: Allied Strategic Command (ASC), Aliens vs Predator (AvP), Freespace 2 (Freespace2), and Scorched 3D (Scorched3D). We list the language processing tools in the last four rows. The four language processing tools are: Doxygen, g4re, Jikes, and Keystone. Doxygen is a documentation system for C, C++, and Java [van Heesch 2006]. Jikes is a Java compiler system from IBM [IBM Jikes Project 2006]. Keystone is a parser and front end for ISO C++ [Keystone Project 2005; Malloy et al. 2003a].

In column 2 of Table 7.1, we list the version number for each test case. In columns 3 through 5, we list the number of source files, the number of total translation units, and the number of C++ translation units for each test case, respectively. Three of the games and one of the language processing tools (Doxygen) use C and/or assembly language in addition to C++. Finally, in column 6 of the table, we list the approximate number of thousands of non-commented, non-preprocessed lines of code (NCLOC) for each test case.

The games are larger than the language processing tools. For example, the average NCLOC for the games is 231 000, but the average NCLOC for the language processing tools is 78 000; therefore, the average game in our test suite is three times as large as the average language processing application.

We executed all experiments on a custom workstation with an AMD Athlon64™ 3000+ processor, 1024 MB of PC3200 DDR RAM, and an 80 GB, 7200 RPM SATA hard disc on which we installed the Slackware 10.1 operating system after formatting with version 3.6 of the ReiserFS filesystem. We performed the experiments with version 1.0.4 of g4re, which we compiled with version 4.0.2 of gcc. We created all tu files with gcc version 3.3.4.

              Classes                                Functions
Test Case     Total    Abstract   Root    Leaf      Total     Member   Virtual   Pure
ASC           1 389    58         901     390       8 693     7 775    2 170     208
AvP           1 732    28         1 369   327       11 548    9 350    1 216     90
Freespace2    332      0          320     12        9 468     1 687    48        0
Scorched3D    799      50         405     364       8 432     7 210    1 907     112
Doxygen       315      9          153     157       5 422     4 570    2 159     249
g4re          78       17         27      37        849       798      303       106
Jikes         378      5          210     158       5 717     5 685    602       16
Keystone      160      14         49      87        2 354     2 306    1 178     189

Table 7.2: Modularity and delegation. The number of classes and functions in each test case; the classes and functions are broken down into categories that relate to modularity and delegation.

Modularity and Delegation

In Table 7.2, we present results that capture information about modularity of and delegation in the eight test cases. In column 1 we list the test cases; again, we list the SDL games in the first four rows, and the language processing tools in the last four rows. In columns 2 through 5, we list the number of total, abstract, root, and leaf classes, respectively. In columns 6 through 9, we list the number of total, member, virtual, and pure virtual functions, respectively.

The games AvP and ASC contain the most and second most classes, respectively, of the eight test cases. AvP contains 1 732 classes; ASC contains 1 389 classes. It is somewhat surprising that the AvP game contains a large number of classes in view of the large number of C files in the program. In Table 7.1, we indicated that AvP contains 222 translation units, but only 95 of these are C++ translation units; over half of the translation units consist of C or assembly code. However, g4re uses an ASG representation of the input program; all template classes are instantiated in an ASG for C++. So, the count of classes that we listed in column 2 of Table 7.2 includes class template instantiations in addition to classes that are not templates. We use this set of classes when computing the metrics in the following subsections.

The results in Table 7.2 suggest that, for our test suite of four games and four language processing tools, the games may be more modular than the language processing tools. For example, the four games contain a total of 4 252 classes in 923 000 NCLOC; the class to one thousand NCLOC ratio is 4.6. However, the four language processing tools contain a total of 931 classes in 310 000 NCLOC; the class to one thousand NCLOC ratio is 3.0.

Test Case     Min   Max   Mean    Std Dev   Median   Mode
ASC           0     4     0.645   0.991     0        0
AvP           0     5     0.304   0.707     0        0
Freespace2    0     1     0.036   0.187     0        0
Scorched3D    0     9     0.931   1.496     0        0
Doxygen       0     4     0.927   1.122     1        0
g4re          0     5     1.897   1.592     2        0
Jikes         0     3     0.714   0.960     0        0
Keystone      0     5     1.794   1.450     2        0

Table 7.3: Depth Of Inheritance Tree (DIT). Statistics about the DIT metric for each test case.

Test Case     Min   Max   Mean    Std Dev   Median   Mode
ASC           0     6     0.687   1.097     0        0
AvP           0     5     0.307   0.712     0        0
Freespace2    0     1     0.036   0.187     0        0
Scorched3D    0     9     1.036   1.632     0        0
Doxygen       0     7     1.041   1.214     1        0
g4re          0     5     1.910   1.605     2        0
Jikes         0     4     0.728   0.979     0        0
Keystone      0     5     1.794   1.450     2        0

Table 7.4: Number of Ancestors (NOA). Statistics about the NOA metric for each test case.

Inheritance

In Tables 7.3, 7.4, and 7.5, we list statistics about the Depth of Inheritance Tree (DIT), Number of Ancestors (NOA), and Number of Children (NOC) metrics, respectively. Column 1 of each table lists the test cases. Columns 2 and 3 list the minimum and maximum values, respectively. Columns 4 and 5 list the mean and standard deviation of the values, respectively. Columns 6 and 7 list the median and mode values, respectively. The median is the value for which an equal number of values lie above and below; the mode is the most common value.

The DIT metric equals the length of the longest path from a class to the root of its inheritance tree. The NOA metric equals the total number of ancestor classes of a class. The NOC metric measures the number of immediate successors of a class in its inheritance tree. Intuitively, the DIT metric measures the depth of the inheritance tree, and the NOC metric measures the breadth of the inheritance tree. The NOA metric indicates whether or not multiple inheritance is used, when compared to the DIT metric.

Test Case     Min   Max   Mean    Std Dev   Median   Mode
ASC           0     27    0.314   1.550     0        0
AvP           0     30    0.121   0.850     0        0
Freespace2    0     12    0.036   0.659     0        0
Scorched3D    0     11    0.200   0.912     0        0
Doxygen       0     48    0.359   2.767     0        0
g4re          0     6     0.487   1.171     0        0
Jikes         0     26    0.455   2.474     0        0
Keystone      0     10    0.606   1.656     0        0

Table 7.5: Number of Children (NOC). Statistics about the NOC metric for each test case.

In Table 7.3, we list statistics about DIT that we computed for the eight test cases. Scorched3D had a maximum value of 9, which gave it the deepest inheritance tree in the test suite. In addition, Scorched3D had the largest mean value, 0.931, but also the largest standard deviation, 1.496, which means that Scorched3D had the largest amount of variation among the depths of inheritance trees for its classes. Freespace2 is the only test case that did not have an inheritance tree with a depth greater than one (1).

In Table 7.4, we list statistics about NOA that we computed for the eight test cases. A comparison of the mean NOA values to the mean DIT values listed in Table 7.3 indicates that multiple inheritance is used by three of the four games, and two of the four language processing tools. However, the differences between the mean values for NOA and the mean values for DIT indicate that the two language processing tools made heavier use of multiple inheritance than the three games.

In Table 7.5, we list statistics about NOC that we computed for the eight test cases. AvP had a maximum value of 30, which gave it the second broadest inheritance tree in the test suite. However, AvP also had the second lowest mean, and the second lowest variation among the numbers of children for its classes. The median and mode values for AvP are zero (0), as are the median and mode values for all test cases.

Test Case     Min   Max     Mean     Std Dev   Median   Mode
ASC           0     561     12.977   30.365    4        0
AvP           0     107     7.290    10.594    3        3
Freespace2    0     123     6.660    15.707    3        3
Scorched3D    0     240     17.372   19.058    12       3
Doxygen       0     430     27.676   57.497    7        7
g4re          0     206     17.756   30.270    13       0
Jikes         0     2 016   32.397   119.124   13       10
Keystone      0     557     24.388   52.474    15       14

Table 7.6: Weighted Methods per Class (WMC). Statistics about the WMC metric for each test case.

When taken together with the statistics listed in Tables 7.3 and 7.4, the statistics listed in Table 7.5 show that: (1) Freespace2 makes very little use of inheritance, (2) on average, the four language processing tools make more use of inheritance than the four games, and (3) on average, maximum depth to number of classes and maximum breadth to number of classes ratios are much lower for the four games than for the four language processing tools.

Complexity

In Table 7.6, we list statistics about the Weighted Methods per Class (WMC) metric, which equals the sum of the cyclomatic complexities for the methods in a class. Column 1 lists the test cases. Columns 2 and 3 list the minimum and maximum values, respectively. Columns 4 and 5 list the mean and standard deviation of the values, respectively. Columns 6 and 7 list the median and mode values, respectively.

The classes (and hence methods) in the four language processing tools are more complex than the classes in the four games. For example, the average maximum value for the language processing tools is 802.250, but the average maximum value for the games is only 257.750. Similarly, the average mean value for the language processing tools is 25.554, but the average mean value for the games is only 11.074.

Concluding Remarks

We have shown that, for our test suite, the games are more modular than the language processing tools, but that the language processing tools make more use of inheritance. In addition, we have shown that the methods in the games are less complex than the methods in the language processing tools. However, our metrics computation system does not include library code in its analysis, and it is likely that much of the method complexity in games is found in the SDL API.

7.2 Application: Visualizing Class Template Diagrams

In this section we present our system for visualizing class template diagrams, and a case study using the system [Hoipkemier et al. 2006]. In Subsection 7.2.1, we provide an overview that includes motivation and background information on generic programming. In Subsection 7.2.2, we summarize research that relates to our system and case study. In Subsection 7.2.3, we provide an overview of our methodology for the case study. Finally, in Subsection 7.2.4, we describe our case study, and present the results.

7.2.1 Overview

Generic programming deals with finding abstract representations of efficient algorithms and data structures, and expressing them in an adaptable interoperable manner [Jazayeri et al. 2000]. The generic programming paradigm is a popular ancillary tool to object technology; conferences, seminars, and other literature have recently appeared to address problems and concerns related to this paradigm [Eichelberger and v. Gudenberg 2000; Glück and Lowry 2005; Jazayeri et al. 2000; Lengauer et al. 2004; Veldhuizen 2000].

The canonical example of generic programming is the C++ Standard Template Library (STL). In addition, other libraries, such as Boost, Loki, and Blitz++, rely heavily on both generic and generative programming to produce code that is more general, more efficient, and more easily incorporated into existing applications than their non-generic counterparts [Alexandrescu 2001; Siek and Lumsdaine 2005; Veldhuizen and Gannon 1998]. In recognition of the importance of generics, they have recently been introduced into both the C# 2.0 [Microsoft Corporation 2006] and the Java 5 [Sun Microsystems Inc 2006] programming languages.

One problem with generic programming is the dearth of technology to facilitate comprehension, documentation, and debugging of programs that utilize generic programming [Jazayeri et al. 2000]. This may be true, in part, because C++, the most mature language vehicle for generic programming thus far, is notoriously difficult to parse [Bodin et al. 1994; Klint et al. 2005; Knapen et al. 1999; Lilley 1997; Malloy et al. 2003a]. We were unable to find any tool description in the literature that provides a facility for the comprehension and debugging of generic C++ code.

Various graphical program representations have been used to reverse engineer class diagrams; most of these have used either the source code or Java bytecode as input, although some representations have used object code as input [Duffy and Malloy 2005; Eng 2002; Gutwenger et al. 2003; van Heesch 2006]. Unlike these systems, we used an abstract syntax graph (ASG). An ASG for a C++ program contains additional information about templates not included in the source code; in particular, information about class template instantiations and specializations is available. This is important, because C++ programmers often specialize types to provide space and/or time savings [Vandevoorde and Josuttis 2002].

In this section, we evaluate the utilization of generic programming in a test suite of ten deployed open source programs. We developed a visualization system using g4re, and applied it to the test suite. We designed the system to visualize classes (non-templates), class templates, class template instantiations, and class template specializations in three dimensions. We placed classes (non-templates and templates) in the X-Y plane, and class template instantiations and specializations along the Z axis. We used the results to draw conclusions about the frequency and efficiency of generic programming in these applications.

7.2.2 Related Work

The literature on software visualization can be partitioned into three categories: (1) data visualization [Jones et al. 2004; Marcus et al. 2003], (2) class diagram layout [Eiglsperger et al. 2004; Gutwenger et al. 2003], and (3) code visualization. There has been no reported work that provides either visualization of templates, or three-dimensional visualization of class diagrams. Next, we describe relevant research on code visualization.

Malloy and Power [2005] exploit a molecular metaphor to visualize, in three dimensions, the dynamic object relationships in Java applications. They instrument Java bytecode to collect trace data, and then analyze it and visualize it in three dimensions using VRML. Their quantitative and graphical results include analyses of the SPEC JVM98 and JOlden benchmark suites; they reveal interesting relationships among the data structures in these applications. Unfortunately, their approach does not accommodate Java generics.

Lewerentz and Simon [2002] present a metrics based approach to software visualization that supports quality assessment of large object-oriented software systems written in C++ and Java. They use a combination of software metrics data and program structure information to form a virtual information space. They then visualize the information space using three-dimensional graph structures that represent, in a uniform manner, many aspects and views of the application under study. They lay out the graphs by using a generic similarity measure to calculate geometric distances between graph nodes, and a force-directed mapping into three-dimensional space. Their approach does not include the visualization of C++ class templates or Java generics.

Eng [2002] presents a framework to visualize intermediate representations of Java programs that are constructed by an optimizing compiler. He visualizes both the software, and the characteristics of the execution platform. His visualization interface illustrates classes, including data and methods, and profile information in a single framework. His approach does not include visualization in three dimensions, or the visualization of Java generics.

Figure 7.1: System overview. Solid, directed lines show input. Dashed lines show notes.

7.2.3 Methodology

In this subsection, we describe the system that we used to conduct the case study in Section 7.2.4. We describe both of the components in the system: builder and visualizer. In addition, we illustrate the schema for a Class Template Diagram (CTD), which is the subject of our visualization.

In Figure 7.1, we provide an overview of the system. We illustrate the builder component to the left of center at the middle of the figure. The builder component, part of the ctd package, uses the API provided by g4re to build and write a GXL encoded instance of a class diagram subset, which we termed a Class Template Diagram (CTD). The nodes in a CTD represent classes (non-templates), class templates, and class template instantiations and specializations. The nodes contain information about classes, including names, attributes, and operations. The edges in a CTD represent either inheritance, including information about access and the virtual specifier, or one of the aggregations: hasInstantiation and hasSpecialization.
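The node and edge content just described might be modeled along the following lines; the enumerators mirror the schema classes and edge kinds named in the text, but the exact fields are assumptions for illustration, not the ctd package's actual types.

#include <string>
#include <vector>

// Kinds of CTD nodes and edges named in the text.
enum class NodeKind { ClassNonTemplate, ClassTemplate,
                      ClassTemplateInstantiation, ClassTemplateSpecialization };
enum class EdgeKind { Inheritance, HasInstantiation, HasSpecialization };

struct CtdNode {
    NodeKind kind;
    std::string name;                        // class or template name
    std::vector<std::string> attributes;     // data members
    std::vector<std::string> operations;     // member functions
};

struct CtdEdge {
    EdgeKind kind;
    std::size_t source, target;              // indices into the node list
    std::string access;                      // inheritance only: public/protected/private
    bool isVirtual = false;                  // inheritance only: virtual base
};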

We illustrate the CTD schema in Figure 7.2, and list a link to it in Appendix B.

Figure 7.2: Schema for Class Template Diagram (CTD). A UML class diagram representation of the CTD schema.

We illustrate the visualizer component to the right of center at the middle of Figure 7.1. The visualizer component, part of the ctd package, takes the GXL encoded instance of the CTD schema (generated by the builder component) as input. The visualizer first sends an inheritance hierarchy, which includes only instances of ClassNonTemplate and ClassTemplate, to dot, which computes the two-dimensional layout for the inheritance hierarchy. The visualizer then reads the layout that dot has computed, and places instances of ClassTemplateInstantiation and ClassTemplateSpecialization on the Z axis, behind the corresponding instances of ClassTemplate. Finally, the visualizer connects inheritance edges involving instantiations and specializations, and uses OpenGL to draw the three-dimensional diagram to the screen.
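A minimal sketch of that placement step: each class template keeps the X-Y position computed by dot, and its instantiations and specializations are stacked behind it along the Z axis. The spacing constant and names are assumptions for illustration, not the actual visualizer code.

#include <vector>

struct Position3D { float x, y, z; };

// Given the dot-computed position of a class template and the number of its
// instantiations/specializations, place the template in the X-Y plane (z = 0)
// and stack the instantiations behind it along the Z axis.
std::vector<Position3D> layoutTemplateColumn(float dotX, float dotY,
                                             std::size_t instantiationCount,
                                             float zSpacing = 2.0f) {
    std::vector<Position3D> positions;
    positions.push_back({dotX, dotY, 0.0f});               // the template itself
    for (std::size_t i = 1; i <= instantiationCount; ++i)
        positions.push_back({dotX, dotY, -zSpacing * i});  // behind the template
    return positions;
}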

The three-dimensional diagram generated by our visualizer component is interactive, and can be manipulated in real time. All classes are represented by boxes, on which attribute and operation information is drawn as it would appear in a UML Class Diagram. Instances of ClassNonTemplate are colored yellow, instances of ClassTemplate are colored light blue, and instances of ClassTemplateInstantiation and ClassTemplateSpecialization are colored red. The inheritance edges are colored green at the isDest end, and brown at the isSrc end. The aggregation edges hasInstantiation and hasSpecialization are colored blue.

Test Case     .ctd.gxl   .dot
Doxygen       2 904      1 048
FluxBox       824        336
FOX           1 336      848
HippoDraw     808        488
Jikes         1 248      536
Keystone      264        176
Licq          624        384
Pixie         616        304
Scintilla     216        136
Scribus       720        432

Table 7.7: Size on disk (kB). The size on disk, in kilobytes, for uncompressed GXL encodings of instances of the Class Template Diagram schema, and the uncompressed dot encodings of the inheritance hierarchies.

7.2.4 Case Study

In this subsection we describe the results that we obtained using our system to visualize the use of generic programming in open source, C++ software. The results that we report in this section capture the space and time costs of our system, and information about the use of generic programming in our test suite of 10 open source programs (see footnote 1).

We executed all experiments on a Sun™ workstation with an AMD Opteron™ 148 processor, 1024 MB of PC3200 DDR RAM, and an 80 GB, 7200 RPM SATA hard disc on which we installed the Fedora Core 4 operating system after formatting with the ext3 filesystem. We performed the experiments with version 1.0.8 of g4re, which we compiled with version 4.0.0 of gcc. We created all tu files with gcc version 3.3.6.

              .ctd.gxl               .dot
Test Case     Generate   Parse       Generate   Parse     Render   Total
Doxygen       6.43       1.03        0.34       231.55    0.21     239.56
FluxBox       2.08       0.32        0.06       59.20     0.05     61.71
FOX           3.07       0.40        0.11       91.87     0.06     95.51
HippoDraw     1.77       0.28        0.09       49.51     0.04     51.69
Jikes         2.98       0.44        0.10       80.14     0.08     83.74
Keystone      0.50       0.09        0.03       17.80     0.01     18.43
Licq          1.18       0.20        0.04       38.75     0.03     40.20
Pixie         1.26       0.23        0.06       36.51     0.04     38.10
Scintilla     0.49       0.06        0.03       12.89     0.02     13.49
Scribus       1.84       0.24        0.06       44.11     0.03     46.28

Table 7.8: Time (s). The running time, in seconds, for each phase of our visualization system, and for the entire visualization system.

Space and Time Costs

In Table 7.7, we list the sizes on disk, in kilobytes, for graphs that are generated, parsed, and used by our visualization system to represent the test cases. In column 1, we list the test cases. In column 2, we list the total sizes of the uncompressed GXL encoded instances of the Class Template Diagram (CTD) schema. In column 3, we list the total sizes of the dot encoded inheritance hierarchies.

The CTD instances are comparable in size (see footnote 2) to the middle-level graphs described in Section 6.3. The dot files are, on average, half the size of the corresponding CTD instance. Next, we examine the time costs for generating and parsing these files.

In Table 7.8, we list the running times, in seconds, for our visualization system. In column 1, we list the test cases. In columns 2 and 3, we list the times to generate and to parse GXL encoded instances of the Class Template Diagram schema, respectively. Recall that the builder component generates the file, and the visualizer component parses the file. In columns 4 and 5, we list the times to generate dot encodings of inheritance hierarchies, and to parse the dot generated two-dimensional layout, respectively. In column 6, we list the times to render the OpenGL three-dimensional representation of the diagram. Finally, in column 7, we list the total time for the visualization system.

Footnote 1: The test suite for this study is a subset of the test suite for the case studies in Chapter 6, from which AvP and CppUnit have been omitted. See Subsection 6.1 or our online repository for information about the test suite. Furthermore, see Subsections 6.2.1 and 6.2.2 for information about the space and time costs of the input to the system.
Footnote 2: Unlike the rest of g4re, the visualization system does not use the pulse library to write GXL files. Differences in formatting can cause sizes on disk to vary widely for GXL files.

Figure 7.3: Visualization of CTD for Pixie. The interface to the visualization system, and a three-dimensional CTD diagram for Pixie. The two sliders control the azimuth and the elevation. The keyboard controls horizontal and vertical movement, and zooming.

The running time for each test case is dominated by the time spent parsing the dot generated two-dimensional layout. Even with this cost, which could be greatly reduced by eliminating the dependency on dot, the running time is under one minute for over half of the test cases. Clearly, our use of three dimensions adds minimal costs to the system as a whole.

Visualization Results

Three of the test cases did not contain any template classes: Licq, Scintilla, and Scribus. Also, only three of the test cases contained ten or more template classes: Doxygen, FluxBox, and Pixie. However, the number of class template instantiations can be large; for example, in Doxygen, each class template averages 10 instantiations. In addition, in Jikes the average number of instantiations per class template is 20.

Figure 7.4: Visualization of CTD for FluxBox. A three-dimensional CTD diagram for FluxBox.

Our visualization system allowed us to obtain information about the test cases that was not readily available from inspection of the source code. In Figure 7.3, we show the interface to the system. The two sliders control the azimuth and the elevation. The keyboard controls horizontal and vertical movement, and zooming. The default view shows all classes in the system, together with inheritance and template instantiation and specialization relationships. We provide options that allow the user to view only the templates in the system, including instantiations and specializations; the check boxes allow the user to hide the background and text.
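Those controls amount to a translation and two rotations applied before the diagram is drawn. The sketch below uses the fixed-function OpenGL calls that were current for this system; the variable names are hypothetical, and this is not the actual visualizer code.

#include <GL/gl.h>

// Apply the current view before rendering: pan and zoom translate the scene,
// then the elevation and azimuth sliders rotate it about the X and Y axes.
void applyCamera(float azimuthDeg, float elevationDeg,
                 float panX, float panY, float zoom) {
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glTranslatef(panX, panY, -zoom);            // keyboard: move and zoom
    glRotatef(elevationDeg, 1.0f, 0.0f, 0.0f);  // slider: elevation
    glRotatef(azimuthDeg, 0.0f, 1.0f, 0.0f);    // slider: azimuth
}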

To the right of the controls in Figure 7.3, we illustrate a three-dimensional CTD diagram for Pixie. The cubes aligned at the left of the diagram lie on the Z-axis; these cubes represent instantiations of the CArray<T> class template. The majority of instantiations in Pixie are instantiations of the CArray<T> class template; this indicates that modifications to CArray<T> have far-reaching effects on Pixie. This information is not readily available from inspection of the source code. We illustrate an additional three-dimensional CTD diagram, for FluxBox, in Figure 7.4.
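The concentration of instantiations on a single template can be illustrated with a small fragment; the element types below are hypothetical, but they show how ordinary uses of a container-like template such as CArray<T> produce several distinct instantiations, each of which appears as a separate node in the CTD.

    // Hypothetical uses of a CArray<T>-style container; each distinct
    // argument type produces a separate class template instantiation.
    template <class T>
    class CArray {
        T*  items;
        int numItems;
    public:
        CArray() : items(0), numItems(0) { }
    };

    struct CVertex { float x, y, z; };

    CArray<int>      indices;   // instantiation: CArray<int>
    CArray<float>    weights;   // instantiation: CArray<float>
    CArray<CVertex>  vertices;  // instantiation: CArray<CVertex>
    CArray<CVertex*> links;     // instantiation: CArray<CVertex*>

Because every such instantiation depends on the primary template, a change to CArray<T> potentially affects all of them, which is exactly the effect that the column of cubes along the Z-axis makes visible.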

Concluding Remarks

We have described our approach to three-dimensional visualization of class template diagrams for 10 open source C++ programs. Unlike systems that use the source code to construct UML class diagrams, our system uses an ASG; therefore, our system has access to information about templates that is not readily available from inspection of the source code. We have also shown that our approach is feasible for medium-sized programs.

Chapter 8

Conclusion

In Section 1.1, we identified the problems we investigated in the area of reverse engineering; the problems focus on infrastructure support for interoperability. In this chapter, for each problem, we first restate the original goal, and highlight our publications that addressed the goal. We then restate the original evaluation criteria, and state how we satisfied the criteria.

Problem 1: Schemas for Low- and Middle-Level Program Representation Graphs

Goal

Our goal for this problem was to create GXL schemas for low- and middle-level graphs, and to use known transformations between graphs to guide the organization of the schemas into a hierarchy. We published work that addressed this goal in the Proceedings of the 12th Working Conference on Reverse Engineering [Kraft et al. 2005b], and in the journal Information and Software Technology [Kraft et al. 2007a]. We presented this work in Chapter 4.

Evaluation Criteria

1. The hierarchy is arranged in levels, such that an instance of a schema at one level can be created using only information contained in instances of schemas at previous levels.
2. Instances of low-level schemas contain the information needed to create instances of middle-level schemas, including call graphs, class-centric graphs, control flow graphs, and dependency graphs.
3. Low-level schemas are language-specific, and middle-level schemas are language-independent.
4. Low-level schemas for C++ accurately and adequately represent templates, including instantiations, specializations, and partial specializations.
5. Low-level schemas for C++ represent function pointers, including member function pointers.

In Section 4.1, to address evaluation criteria 1 and 2, we illustrated the hierarchy of schemas, which consists of two major partitions, low-level and middle-level, and five minor partitions, Levels 0 through 4. We also described the structure of the hierarchy, which represents the progression of information from schemas at one level to schemas at a subsequent level.

In Sections 4.2 and 4.3, to address evaluation criterion 3, we presented our low-level schemas. The abstract syntax graph (ASG) and application programming interface (API) schemas are language-specific; we focused on the C++ language. We also presented several language-independent middle-level schemas, including those for the call graph and the class firewall.

In Section 4.2, to address evaluation criteria 4 and 5, we described the gcc ASG schema for C++, GENERIC, and our API schema for C++, CppInfo. Both schemas provide an accurate and adequate representation of templates, including instantiations, specializations, and partial specializations. In addition, both schemas provide a representation of function pointers, including member function pointers (sometimes called pointers to members).
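For reference, the following fragment (with hypothetical class names) gathers the C++ constructs that these schemas must capture: a primary class template, a full specialization, a partial specialization, an implicit instantiation, and pointers to member functions.

    // Illustrative constructs only; the names Pair and Widget are hypothetical.
    template <class T, class U>
    struct Pair {                          // primary class template
        T first;
        U second;
    };

    template <>
    struct Pair<int, int> {                // full (explicit) specialization
        int both[2];
    };

    template <class T>
    struct Pair<T, T*> {                   // partial specialization
        T  value;
        T* link;
    };

    struct Widget {
        void draw() const { }
        void (Widget::*handler)() const;   // data member holding a pointer to member function
    };

    Pair<char, double> p;                  // implicit instantiation of the primary template
    void (Widget::*pmf)() const = &Widget::draw;   // pointer to member function

An extractor that drops any of these cases cannot populate schema instances that satisfy criteria 4 and 5.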

Problem 2: Tool Support for Reverse Engineering C++ Programs

Goal

Our goal for this problem was to create a public domain, general purpose tool for reverse engineering C++ programs. We published work that addressed this goal in the Proceedings of the Dagstuhl Seminar on Transformation Techniques in Software Engineering [Kraft et al. 2005a], in the Proceedings of Future Play 2005 [Jamieson et al. 2005], in the Proceedings of the 18th International Conference on Software Engineering and Knowledge Engineering [Hoipkemier et al. 2006], and in the Special Issue on Experimental Software and Toolkits of the journal Science of Computer Programming [Kraft et al. 2007b]. We presented this work in Chapters 5, 6, and 7.

Evaluation Criteria

1. The tool is open-source and available on the Web.
2. The tool correctly parses, instantiates, and specializes templates.
3. The tool consists of loosely coupled, reusable modules.
4. The tool provides a module for linking C++ translation units.
5. The tool provides an API module for accessing information about declarations, statements, and some expressions.
6. The tool exchanges information via conforming instances of GXL schemas.
7. The tool is robust and efficient enough to use on medium-sized C++ programs, which contain up to 500,000 non-commented, non-preprocessed lines of code.
8. The tool is general purpose.

To address evaluation criterion 1, we placed our tool chain for reverse engineering C++ programs, g4re, in our SourceForge.net repository [Kraft 2006]. In Section 5.1, we described the architecture of g4re, which uses gcc to correctly parse, instantiate, and specialize templates, thereby addressing evaluation criterion 2. We also described the six constituent modules: the ASG module, the schema and serialization modules, the transformation modules, the linking module, and the API module. The modular GXL-based pipe-filter architecture of g4re addresses evaluation criteria 3 and 6; notable modules are the linking module and the API module, which address evaluation criteria 4 and 5, respectively. We plan to reuse the linking and API modules for another project in the immediate future.
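As an illustration of the pipe-filter style, the sketch below chains a sequence of filters so that the output document of one stage becomes the input of the next; the Filter interface and runPipeline function are hypothetical simplifications, and the real modules exchange GXL documents rather than plain strings.

    // Minimal pipe-filter composition sketch (hypothetical types).
    #include <string>
    #include <vector>

    struct Filter {
        virtual ~Filter() { }
        // Consumes one serialized graph and produces the next one in the chain.
        virtual std::string run(const std::string& input) = 0;
    };

    // Applies each filter in order, mirroring the progression from an
    // ASG-level document to higher-level graph documents.
    std::string runPipeline(const std::vector<Filter*>& filters,
                            std::string document) {
        for (std::vector<Filter*>::const_iterator it = filters.begin();
             it != filters.end(); ++it) {
            document = (*it)->run(document);
        }
        return document;
    }

The value of this style is that a new analysis only has to read and write conforming GXL instances in order to be inserted anywhere in the chain.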

In Chapters 6 and 7, we described many experiments that we have performed with g4re. In Section 6.1, we described our most recent test suite, which contains 12 popular, open-source applications and libraries. The test suite consists of 1,200 C++ translation units and approximately 1,000,000 lines of non-commented, non-preprocessed code. The largest test case that we used with g4re contained approximately 365,000 lines of code; we described this test case in Subsection 7.1.4. In Sections 6.2 and 6.3, we described case studies that measure the costs of common reverse engineering and program analysis tasks, and address criterion 7. Finally, to address evaluation criterion 8, we presented empirical evaluations that we performed with g4re in two areas: (1) software measurement, specifically, computation of object-oriented metrics, in Section 7.1, and (2) program comprehension, specifically, three-dimensional visualization of class template diagrams, in Section 7.2.
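To make the metric computations concrete, the sketch below computes the Depth of Inheritance Tree (DIT) metric over a simple adjacency-list view of a class diagram; the ClassGraph type is a hypothetical stand-in for the information exposed at the class diagram level, not a data structure taken from g4re.

    // Illustrative only: DIT over a map from class name to direct base classes.
    #include <algorithm>
    #include <cstddef>
    #include <map>
    #include <string>
    #include <vector>

    typedef std::map<std::string, std::vector<std::string> > ClassGraph;

    // DIT(c) = 0 for a root class; otherwise 1 + the maximum DIT of its bases.
    int dit(const ClassGraph& g, const std::string& cls) {
        ClassGraph::const_iterator it = g.find(cls);
        if (it == g.end() || it->second.empty())
            return 0;                        // no recorded bases: treat as a root
        int depth = 0;
        const std::vector<std::string>& bases = it->second;
        for (std::size_t i = 0; i < bases.size(); ++i)
            depth = std::max(depth, 1 + dit(g, bases[i]));
        return depth;
    }

For example, for a graph in which D derives from B, B derives from A, and A has no bases, dit(g, "D") evaluates to 2.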

Problem 3: A Repository of Reverse Engineering Artifacts

Goal

Our goal for this problem was to create a public repository of reverse engineering artifacts, and to populate it with empirical results, including all tools, scripts, and documents needed to reproduce the results. We published work that addressed this goal in the Proceedings of the 12th Working Conference on Reverse Engineering [Kraft et al. 2005b], and in the journal Information and Software Technology [Kraft et al. 2007a]. We presented the repository artifacts in the case studies of Chapter 6. We present links to the repository in Appendix B.

Evaluation Criteria

1. The repository contains a test suite, including important details about each test case:
   • Version
   • Size metrics
   • Configuration and build information

2. For at least one graph at each level of our hierarchy (see Problem 1), the repository contains:
   • A GXL schema
   • Tools that exchange information via conforming GXL instances of the schema
   • GXL instances of the schema for each test case in the test suite
   • A graph transformation that summarizes the information in a GXL instance
   • Empirical results that show the space and time costs incurred by the documents and tools, respectively

3. The repository contains all artifacts needed to reproduce the results described in the previous items, including platform information for each experiment.

4. The repository is available to the public, particularly the reverse engineering community.

We created a repository at SourceForge.net to hold our reverse engineering artifacts [Kraft 2006]. We populated our repository with the test suite information and artifacts specified in evaluation criteria 1 through 3. The specific graphs for which we created GXL schemas are: the ASG, the API, the class diagram, the call graph, the CFG, the ORD, the ICFG, and the class firewall. The specific graphs for which we created tools are: the ASG, the API, the class diagram, the ORD, and the class firewall¹. All of the tools are part of the GXL-based pipe-filter architecture of g4re. Finally, to address evaluation criterion 4, we described our repository in the two publications mentioned above, cited it in several other publications, and discussed it in our November 2005 presentation at WCRE.

¹Special thanks to the CpSc 829 students in Fall 2005, particularly Ben Hoipkemier, for providing the class firewall tool.


APPENDICES

A Acronyms and Abbreviations

API – Application Programming Interface

ASG – Abstract Syntax Graph
CCFG – Class Control Flow Graph
CDG – Control Dependence Graph
CFG – Control Flow Graph
CTD – Class Template Diagram
DIT – Depth of Inheritance Tree
DTD – Document Type Definition
GCC – GNU Compiler Collection
GXL – Graph eXchange Language
ICFG – Interprocedural Control Flow Graph
NOA – Number of Ancestors
NOC – Number of Children
OO – Object-Oriented
ORD – Object Relation Diagram
PDG – Program Dependence Graph
SDG – System Dependence Graph
SEF – Standard Exchange Format
TU – Translation Unit
UML – Unified Modeling Language
WCRE – Working Conference on Reverse Engineering
WMC – Weighted Methods per Class
XML – Extensible Markup Language
XSL – Extensible Stylesheet Language
XSLT – XSL Transformations


B Repository of Reverse Engineering Artifacts

GXL Schemas

Level 0    GENERIC.gxl
Level I    CppInfo.gxl
Level II   ClassDiagram.gxl
           CallGraph.gxl
           CFG.gxl
Level III  ORD.gxl
           ICFG.gxl
Level IV   ClassFirewall.gxl

GXL Instances

Level 0    ASG/
Level I    API/
Level II   ClassDiagram/
Level III  ORD/
Level IV   ClassFirewall/

Other Artifacts

tools/
transformations/
results/


BIBLIOGRAPHY

Aho, A. V., Lam, M. S., Sethi, R., and Ullman, J. D. 2006. Compilers: Principles, Techniques, and Tools, Second ed. Addison-Wesley.

Aigner, G., Diwan, A., Heine, D. L., Lam, M. S., Moore, D. L., Murphy, B. R., and Sapuntzakis, C. 2006. An overview of the SUIF2 compiler infrastructure. http://suif.stanford.edu/suif/suif2/doc-2.2.0-4/overview.ps.

Al-Ekram, R. and Kontogiannis, K. 2005. An XML-based framework for language neutral program representation and generic analysis. In Proceedings of the Ninth European Conference on Software Maintenance and Reengineering. IEEE Computer Society, Manchester, UK.

Alexandrescu, A. 2001. Modern C++ Design: Generic Programming and Design Patterns Applied. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Andrews, J. H. 2004. Relevant empirical testing research: Challenges and responses. ACM SIGSOFT Software Engineering Notes 29, 5 (September), 1–4.

Antoniol, G., Penta, M. D., Masone, G., and Villano, U. 2004. Compiler hacking for source code analysis. Software Quality Journal 12, 4 (December), 383–406.

Arikan, O. 2006. Pixie version 1.5.2. http://pixie.sourceforge.net.

Bell Canada Inc. 2000. DATRIX - Abstract Semantic Graph Reference Manual, 1.4 ed. Bell Canada Inc., Montreal, Canada.

Bodin, F., Beckman, P., Gannon, D., Gotwals, J., Narayana, S., Srinivas, S., and Winnicka, B. 1994. Sage++: An object-oriented toolkit and class library for building Fortran and C++ restructuring tools. In Proceedings of the Second Annual Object-Oriented Numerics Conference. Sunriver, OR, USA, 122–136.

Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., and Yergeau, F. 2006. Extensible markup language (XML) 1.0. W3C recommendation, W3C.

Buy, U., Orso, A., and Pezze, M. 2000. Automated testing of classes. In Proceedings of the International Symposium on Software Testing and Analysis. Portland, OR, USA.

Chidamber, S. R. and Kemerer, C. F. 1994. A metrics suite for object oriented design. IEEE Transactions on Software Engineering 20, 6, 476–493.

ClanLib Project. 2005. ClanLib Game SDK. http://www.clanlib.org.

CppUnit Project. 2006. CppUnit version 1.10.2. http://cppunit.sourceforge.net.

Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems 13, 4 (October), 451–490.

Czeranski, J., Eisenbarth, T., Kienle, H., Koschke, R., Plödereder, E., Simon, D., Girard, J. F., and Würthner, M. 2000. Data exchange in bauhaus. In Proceedings of the Seventh Working Conference on Reverse Engineering. IEEE Computer Society, Brisbane, Australia, 293–295.

Das, M. 2000. Unification-based pointer analysis with directional assignments. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation. Vancouver, BC, Canada, 35–46.

Dean, T. R., Malton, A. J., and Holt, R. C. 2001. Union schemas as a basis for a C++ extractor. In Proceedings of the Eighth Working Conference on Reverse Engineering. IEEE Computer Society, Stuttgart, Germany. www.cppx.com.

Do, H., Elbaum, S., and Rothermel, G. 2005. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering 10, 4 (October), 405–435.

Duffy, E. B. and Malloy, B. A. 2005. A language and platform-independent approach for reverse engineering. In Proceedings of the Third ACIS International Conference on Software Engineering Research, Management & Applications. 415–422.

Ebert, J., Kullbach, B., and Winter, A. 1999. GraX – an interchange format for reengineering tools. In Proceedings of the Sixth Working Conference on Reverse Engineering. IEEE Computer Society, Atlanta, GA, USA, 89–98.

Edison Design Group. 2000. C++ Front End. http://www.edg.com/cpp.html.

Eichberg, M., Mezini, M., Ostermann, K., and Schäfer, T. 2004. XIRC: A kernel for cross-artifact information engineering in environments. In Proceedings of the Eleventh Working Conference on Reverse Engineering. IEEE Computer Society, Delft, The Netherlands, 182–191.

Eichelberger, H. and v. Gudenberg, J. W. 2000. UML description of the STL. In Proceedings of the First Workshop on C++ Template Programming. Erfurt, Germany. http://oonumerics.org/tmpw00/.

Eiglsperger, M., Gutwenger, C., Kaufmann, M., Kupke, J., Jünger, M., Leipert, S., Klein, K., Mutzel, P., and Siebenhaller, M. 2004. Automatic layout of uml class diagrams in orthogonal style. Information Visualization 3, 3, 189–208.

Elliott Sim, S., Easterbrook, S., and Holt, R. C. 2002. On using a benchmark to evaluate C++ extractors. In Proceedings of the Tenth International Workshop on Program Comprehension. Paris, France, 114–123.

Elliott Sim, S. and Koschke, R. 2001. WoSEF: Workshop on standard exchange format. ACM SIGSOFT Software Engineering Notes 26, 1 (January), 44–49.

Eng, D. 2002. Combining static and dynamic data in code visualization. In Proceedings of the 2002 ACM SIGPLAN-SIGSOFT workshop on Program Analysis for Software Tools and Engineering. Charleston, SC, USA, 43–50.

Ernst, J. 1997. Introduction to cdif. http://www.eigroup.org/cdif/intro.html.

eXpat Project. 2005. The Expat XML Parser version 1.95.8. http://expat.sourceforge.net.

Feijs, L. M. G. and van Ommering, R. C. 1999. Relation partition algebra – mathematical aspects of uses and part-of relations. Science of Computer Programming 33, 2 (February), 163–212.

Fenton, N. E. and Pfleeger, S. L. 1998. Software Metrics: A Rigorous and Practical Approach. PWS Publishing Co., Boston, MA, USA.

Ferenc, R., Beszedes, A., Tarkiainen, M., and Gyimothy, T. 2002. Columbus - reverse engineering tool and schema for C++. In Proceedings of the 18th International Conference on Software Maintenance. Montreal, Canada, 172–181.

Ferenc, R., Elliott Sim, S., Holt, R. C., Koschke, R., and Gyimothy, T. 2001. Towards a standard schema for C/C++. In Proceedings of the Eighth Working Conference on Reverse Engineering. Stuttgart, Germany, 49–58.

Ferrante, J., Ottenstein, K. J., and Warren, J. D. 1987. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems 9, 3 (July), 319–349.

Ferrett, L. K. and Offut, J. 2002. An empirical comparison of modularity of procedural and object-oriented software. In Proceedings of the Eighth International Conference on Engineering of Complex Computer Systems. IEEE Computer Society, Greenbelt, MD, USA, 173–182.

Finnigan, P., Holt, R., Kalas, I., Kerr, S., Kontogiannis, K., Müller, H., Mylopoulos, J., Perelgut, S., Stanley, M., and Wong, K. 1997. The software bookshelf. IBM Systems Journal 36, 4 (November), 564–593.

FluxBox Project. 2006. FluxBox version 0.9.14. http://www.fluxbox.org.

Fowler, M. 2003. UML Distilled: A Brief Guide to the Standard Object Modeling Language, Third ed. Addison-Wesley.

Free Software Foundation. 2002. GNU Compiler Collection. http://www.gnu.org/software/gcc/.

Gamma, E., Helm, R., Johnson, R., and Vlissides, J. 1995. Design Patterns. Addison-Wesley.

Glück, R. and Lowry, M. R., Eds. 2005. Generative Programming and Component Engineering. Lecture Notes in Computer Science, vol. 3676. Tallinn, Estonia.

Graham, S. L., Kessler, P. B., and Mckusick, M. K. 1982. Gprof: A call graph execution profiler. ACM SIGPLAN Notices 17, 6 (June), 120–126.

Graves, T. L., Harrold, M. J., Kim, J.-M., Porter, A., and Rothermel, G. 2001. An empirical study of regression test selection techniques. ACM Transactions on Software Engineering and Methodology 10, 2 (April), 184–208.

Group, O. M. 2005. MOF 2.0/XMI mapping specification, v2.1. Tech. rep. September.

Grove, D., DeFouw, G., Dean, J., and Chambers, C. 1997. Call graph construction in object-oriented languages. ACM SIGPLAN Notices 32, 10 (October), 108–124.

Gschwind, T., Pinzger, M., and Gall, H. 2004. TUAnalyzer - analyzing templates in C++ code. In Proceedings of the Eleventh Working Conference on Reverse Engineering. IEEE Computer Society, Delft, The Netherlands, 48–57.

Guo, X., Cordy, J. R., and Dean, T. R. 2003. Unique renaming of java using source transformation. In Proceedings of the Third IEEE International Workshop on Source Code Analysis and Manipulation. IEEE Computer Society, Amsterdam, The Netherlands, 151–160.

Gutwenger, C., Jünger, M., Klein, K., Kupke, J., Leipert, S., and Mutzel, P. 2003. A new approach for visualizing uml class diagrams. In Proceedings of the 2003 ACM Symposium on Software Visualization. San Diego, CA, USA, 179–188.

Harrold, M. J., Malloy, B. A., and Rothermel, G. 1993. Efficient construction of program dependence graphs. In Proceedings of the International Symposium on Software Testing and Analysis. Boston, MA, USA, 139–148.

Harrold, M. J., Rosenblum, D., Rothermel, G., and Weyuker, E. 2001. Empirical studies of a prediction model for regression test selection. IEEE Transactions on Software Engineering 27, 3 (March), 248–263.

Hennessy, M., Malloy, B. A., and Power, J. F. 2003. gccXfront: Exploiting gcc as a front end for program comprehension tools via XML/XSLT. In Proceedings of the Eleventh International Workshop on Program Comprehension. IEEE Computer Society, Portland, OR, USA, 298–299.

Hodgson, N. 2006. Scintilla version 1.66. http://www.scintilla.org.

Hoipkemier, B. N., Kraft, N. A., and Malloy, B. A. 2006. 3d visualization of class template diagrams for deployed open source applications. In Proceedings of the Eighteenth International Conference on Software Engineering and Knowledge Engineering. San Francisco, CA, USA.

Holt, R., Godfrey, M., and Malton, A. 2005. Swag: Software architecture group. http://swag.uwaterloo.ca/tools.html.

Holt, R., Schürr, A., Elliott Sim, S., and Winter, A. 2003. GXL - Graph eXchange Language. http://www.gupro.de/GXL.

Holt, R., Schürr, A., Elliott Sim, S., and Winter, A. 2006. GXL: A Graph-Based Standard Exchange Format for Reengineering. Science of Computer Programming 60, 4, 149–170.

Holt, R. C. 1997. An introduction to TA: The tuple-attribute language. http://www.swag.uwaterloo.ca/pbs/papers/ta.html.

IBM Jikes Project. 2006. Jikes version 1.22. http://jikes.sourceforge.net.

ISO/IEC JTC 1. 1998. International Standard: Programming Languages – C++, First ed. Number 14882:1998(E) in ASC X3. ANSI.

Jamieson, A. C., Kraft, N. A., Hallstrom, J. O., and Malloy, B. A. 2005. A metric evaluation of game application software. In Proceedings of Future Play 2005. East Lansing, MI, USA.

Jazayeri, M., Loos, R., and Musser, D., Eds. 2000. Generic Programming. Lecture Notes in Computer Science, vol. 1766. Springer-Verlag, Heidelberg, Germany.

Jin, D. and Cordy, J. 2005. Ontology-based software analysis and reengineering tool integration: The oasis service-sharing methodology. In Proceedings of the 21st International Conference on Software Maintenance. Budapest, Hungary, 613–616.

Jin, D., Cordy, J. R., and Dean, T. R. 2002. Where’s the schema? a taxonomy of patterns for software exchange. In Proceedings of the 10th International Workshop on Program Comprehension. IEEE Computer Society, Washington, DC, USA, 65–75.

Jones, J. A. and Harrold, M. J. 2005. Empirical evaluation of the tarantula automatic fault-localization technique. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. Long Beach, CA, USA, 273–282.

Jones, J. A., Orso, A., and Harrold, M. J. 2004. Gammatella: Visualizing program-execution data for deployed software. Information Visualization 3, 3, 173–188.

Kaczmarek, A. 2003. Gxl validator, validierung von gxl-dokumenten auf instanz-, schema-, und metaschema-ebene. Studienarbeit, Universität Koblenz, Fachbereich Informatik, Koblenz, Germany.

Kazman, R. and Carrière, S. J. 1999. Playing detective: Reconstructing software architecture from available evidence. Journal of Automated Software Engineering 6, 2 (April), 107–138.

Keystone Project. 2005. Keystone version 0.2.3. http://keystone.sourceforge.net.

Kitware, Inc. 2005. GCC-XML. http://www.gccxml.org.

Klint, P., Lämmel, R., and Verhoef, C. 2005. Towards an engineering discipline for grammarware. Draft, Submitted for publication; Online since July 2003, 47 pages.

Knapen, G., Lague, B., Dagenais, M., and Merlo, E. 1999. Parsing C++ despite missing declarations. In Proceedings of the Seventh International Workshop on Program Comprehension. Pittsburgh, PA, USA.

Koppler, R. 1997. A systematic approach to fuzzy parsing. Software – Practice and Experience 27, 6 (June), 637–649.

Kraft, N. A. 2006. Repository of reverse engineering artifacts. http://g4re.sourceforge.net.

Kraft, N. A., Lloyd, E. L., Malloy, B. A., and Clarke, P. J. 2006. The implementation of an extensible system for comparison and visualization of class ordering methodologies. Journal of Systems and Software 79, 8, 1092–1109.

Kraft, N. A., Malloy, B. A., and Power, J. F. 2005a. g4re: Harnessing gcc to reverse engineer C++ applications. In Transformation Techniques in Software Engineering. Number 05161 in Dagstuhl Seminar Proceedings. Dagstuhl, Germany.

Kraft, N. A., Malloy, B. A., and Power, J. F. 2005b. Toward an infrastructure to support interoperability in reverse engineering. In Proceedings of the Twelfth Working Conference on Reverse Engineering. IEEE Computer Society, Pittsburgh, PA, USA.

Kraft, N. A., Malloy, B. A., and Power, J. F. 2007a. An infrastructure to support interoperability in reverse engineering. Information and Software Technology 49, 3, 292–307.

Kraft, N. A., Malloy, B. A., and Power, J. F. 2007b. A tool chain for reverse engineering C++ applications. Science of Computer Programming, Special Issue on Experimental Software and Toolkits. (accepted for publication).

Kullbach, B., Winter, A., Dahm, P., and Ebert, J. 1998. Program comprehension in multi-language systems. In Proceedings of the Fifth Working Conference on Reverse Engineering. IEEE Computer Society, Honolulu, HI, USA, 135–143.

Kung, D., Gao, J., and Hsia, P. 1995. Class firewall, test order, and regression testing of object-oriented programs. Journal of Object-Oriented Programming 8, 2 (May), 51–65.

Kunz, P. F. 2006. HippoDraw 1.15.8. http://www.slac.stanford.edu/grp/ek/hippodraw/.

Labiche, Y., Thévenod-Fosse, P., Waeselynck, H., and Durand, M.-H. 2000. Testing levels for object-oriented software. In Proceedings of the 22nd International Conference on Software Engineering. ACM Press, Limerick, Ireland, 136–145.

Lapierre, S., Lague, B., and Leduc, C. 2001. Datrix source code model and its interchange format: Lessons learned and considerations for future work. ACM SIGSOFT Software Engineering Notes 26, 1 (January), 53–56.

Lengauer, C., Batory, D., Consel, C., and Odersky, M., Eds. 2004. Domain- Specific Program Generation. Number 3016 in Lecture Notes in Computer Science. Springer-Verlag.

Lethbridge, T. C. 2001. Report from the dagstuhl seminar on interoperability of reengineering tools. In Proceedings of the Ninth International Workshop on Program Comprehension. Toronto, ON, Canada, 119.

Lethbridge, T. C. 2003. The dagstuhl middle model: An overview. Presented at the First International Workshop on Metamodels and Schemas for Reverse Engineering.

Lethbridge, T. C., Tichelaar, S., and Plödereder, E. 2004. The dagstuhl middle metamodel: A schema for reverse engineering. Electronic Notes in Theoretical Computer Science 94, 7–18.

Lewerentz, C. and Simon, F. 2002. Metrics-based 3d visualization of large object-oriented programs. In Proceedings of the First International Workshop on Visualizing Software for Understanding and Analysis. Paris, France, 70–77.

Licq Project. 2006. Licq version 0.2.3. http://www.licq.org.

Lilley, J. 1997. PCCTS-based LL(1) C++ parser: Design and theory of operation. Version 1.5.

Lin, Y., Holt, R. C., and Malton, A. 2003. Completeness of a fact extractor. In Proceedings of the Tenth Working Conference on Reverse Engineering. IEEE Computer Society, Victoria, BC, Canada, 196–205.

MacCormack, A., Rusnak, J., and Baldwin, C. 2004. Exploring the structure of complex software designs: An empirical study of open source and proprietary code. Working Paper 05-016, Harvard Business School, Boston, MA, USA.

Malloy, B. A., Clarke, P. J., and Lloyd, E. L. 2003. A parameterized cost model to order classes for integration testing of c++ applications. In Proceedings of the 14th International Symposium on Software Reliability Engineering. Denver, CO, USA, 353–364.

Malloy, B. A., Gibbs, T. H., and Power, J. F. 2003a. Decorating tokens to facilitate recognition of ambiguous language constructs. Software, Practice & Experience 33, 1, 19–39.

Malloy, B. A., Gibbs, T. H., and Power, J. F. 2003b. Progression toward conformance for C++ language compilers. Dr. Dobbs Journal, 54–60.

Malloy, B. A. and Power, J. F. 2002. Program annotation in XML: A parser-based approach. In Proceedings of the Ninth Working Conference on Reverse Engineering. IEEE Computer Society, Richmond, VA, USA, 190–198.

Malloy, B. A. and Power, J. F. 2005. Using a molecular metaphor to facilitate comprehension of 3d object diagrams. In Proceedings of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing. Dallas, TX, USA, 233–240.

Mamas, E. and Kontogiannis, K. 2000. Towards portable source code representations using XML. In Proceedings of the Seventh Working Conference on Reverse Engineering. IEEE Computer Society, Washington, DC, USA, 172.

Marcus, A., Feng, L., and Maletic, J. I. 2003. 3d representations for software visualization. In Proceedings of the 2003 ACM Symposium on Software Visualization. San Diego, CA, USA, 27–36, 207–208.

McPeak, S. 2005. Elkhound: A fast, practical GLR parser generator. Tech. Rep. UCB/CSD-2-1214, University of California, Berkeley. April.

McQuillan, J. A. and Power, J. F. 2006. Experiences of using the dagstuhl middle metamodel for defining software metrics. In Proceedings of the Fourth International Conference on Principles and Practices of Programming in Java. Mannheim, Germany, 194–198.

Merrill, J. 2003. GENERIC and GIMPLE: A new tree representation for entire functions. In Proceedings of the 2003 GCC Developers Summit. Ottawa, Canada, 171–180.

Microsoft Corporation. 2006. C# language specification. http://msdn.microsoft.com/netframework/ecma/.

Müller, H. A., Jahnke, J. H., Smith, D. B., Storey, M.-A., Tilley, S. R., and Wong, K. 2000. Reverse engineering: A roadmap. In Proceedings of the Conference on The Future of Software Engineering. Limerick, Ireland.

Münch, M. 1999. Programmed graph rewriting system progres. Proceedings of the International Workshop on Applications of Graph Transformations with Industrial Relevance 1779, 441–448.

Murphy, G. C. and Notkin, D. 1996. Lightweight lexical source model extraction. ACM Transactions on Software Engineering and Methodology 5, 3 (July), 262–292.

Murphy, G. C., Notkin, D., Griswold, W. G., and Lan, E. S. 1998. An empirical study of static call graph extractors. ACM Transactions on Software Engineering and Methodology 7, 2 (April), 158–191.

National Institute of Standards and Technology. 1993. ECMA: Reference model for frameworks of software engineering environments. Technical Report.

Nierstrasz, O., Ducasse, S., and Gîrba, T. 2005. The story of moose: an agile reengineering environment. In Proceedings of the European Software Engineering Conference (ESEC/FSE 2005). ACM Press, New York, NY, USA, 1–10.

Object Management Group. 2005. UML Superstructure Specification, v2.0.

Orso, A., Apiwattanapong, T., Law, J. B., Rothermel, G., and Harrold, M. 2004. An empirical comparison of dynamic impact analysis algorithms. In Proceedings of the 26th International Conference on Software Engineering. Edinburgh, Scotland, 491–500.

Parberry, I. 2000. Learn Computer Game Programming with DirectX 7.0. Woodward Publishing Co., Plano, TX, USA.

Parberry, I., Roden, T., and Kazemzadeh, M. B. 2005. Experience with an industry-driven capstone course on game programming: Extended abstract. In Proceedings of the 36th SIGCSE Technical Symposium on Computer Science Education. St. Louis, MO, USA, 91–95.

Paulson, J. W., Succi, G., and Eberlein, A. 2004. An empirical study of open-source and closed-source software products. IEEE Transactions on Software Engineering 30, 4, 246–256.

Pazera, E. 2003. Focus on SDL. Premier Press, Cincinnati, OH, USA.

Power, J. F. and Malloy, B. A. 2000. Symbol table construction and name lookup in ISO C++. In Proceedings of the 37th International Conference on Technology of Object-Oriented Languages and Systems. Sydney, Australia, 57–68.

Rebellion. 2005. Aliens vs Predator version CVS 07/22/05. http://www.icculus.org/avp.

Reiss, S. and Davis, T. 1995. Experiences writing object-oriented compiler front ends. Tech. rep., Brown University. January.

Roskind, J. 1989. A YACC-able C++ 2.1 grammar, and the resulting ambiguities. Independent Consultant, Indialantic FL.

Rothermel, G. and Harrold, M. J. 1998. Empirical studies of a safe test selection technique. IEEE Transactions on Software Engineering 24, 6 (June), 401–419.

Salah, M. and Mancoridis, S. 2003. Toward an environment for comprehending distributed systems. In Proceedings of the Tenth Working Conference on Reverse Engineering. IEEE Computer Society, Victoria, BC, Canada, 238–247.

Scribus Project. 2006. Scribus version 1.2.3. http://www.scribus.net.

Siek, J. G. and Lumsdaine, A. 2005. Essential language support for generic programming. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation. Chicago, IL, USA, 73–84.

Simple DirectMedia Layer Project. 2005. Simple DirectMedia Layer. http://www.libsdl.org.

Sinha, S., Harrold, M. J., and Rothermel, G. 1999. System-dependence-graph-based slicing of programs with arbitrary interprocedural control flow. In Proceedings of the 21st International Conference on Software Engineering. Los Angeles, CA, USA.

Skoglund, M. and Runeson, P. 2005. A case study of the class firewall regression test selection technique on a large scale distributed software system. In Proceedings of the International Symposium on Empirical Software Engineering. Australia.

Source–Navigator Team. 2005. The Source–Navigator IDE. http://sourcenav.sourceforge.net.

Steward, D. V. 1981. The design structure system: A method for managing the design of complex systems. IEEE Transactions on Engineering Management 28, 3, 71–84.

Sun Microsystems Inc. 2006. Java language specification. Version 5.

van der Zijp, J. 2006. The FOX Library version 1.4.17. http://www.fox-toolkit.org.

van Heesch, D. 2006. Doxygen version 1.4.4. http://stack.nl/~dimitri/doxygen/.

Vandevoorde, D. and Josuttis, N. M. 2002. C++ Templates: The Complete Guide. Addison Wesley.

Veldhuizen, T. L. 2000. Five compilation models for C++ templates. In Proceedings of the First Workshop on C++ Template Programming. Erfurt, Germany. http://oonumerics.org/tmpw00/.

Veldhuizen, T. L. 2003. C++ templates are turing complete. Tech. rep., Indiana University.

Veldhuizen, T. L. and Gannon, D. 1998. Active libraries: Rethinking the roles of compilers and libraries. In Proceedings of the SIAM Workshop on Object Oriented Methods for Interoperable Scientific and Engineering Computing. SIAM Press, Yorktown Heights, NY, USA.

Vinciguerra, L., Wills, L., Kejriwal, N., Martino, P., and Vinciguerra, R. 2003. An experimentation framework for evaluating disassembly and decompilation tools for C++ and Java. In Proceedings of the Tenth Working Conference on Reverse Engineering. IEEE Computer Society, Victoria, BC, Canada, 14–23.

Wong, K. 1998. Rigi user’s manual – version 5.4.4. http://www.rigi.cs.uvic.ca/downloads/rigi/doc/user.html.

Wu, J. and Holt, R. C. 2004. Resolving linkage anomalies in extracted software system models. In Proceedings of the Twelfth IEEE International Workshop on Program Comprehension. Bari, Italy, 241–245.

xsltproc Project. 2005. xsltproc version 1.1. http://xmlsoft.org/XSLT/xsltproc2.html.

zlib Project. 2005. zlib version 1.2.3. http://www.zlib.net.
