Towards Efficient Instrumentation for Reverse-Engineering Object Oriented Software Through Static and Dynamic Analyses
Total Page:16
File Type:pdf, Size:1020Kb
Towards Efficient Instrumentation for Reverse-Engineering Object Oriented Software through Static and Dynamic Analyses by Hossein Mehrfard A dissertation submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Ottawa-Carleton Institute for Electrical and Computer Engineering Carleton University Ottawa, Ontario © 2017 Hossein Mehrfard Abstract In software engineering, program analysis is usually classified according to static analysis (by analyzing source code) and dynamic analysis (by observing program executions). While static analysis provides inaccurate and imprecise results due to programming language's features (e.g., late binding), dynamic analysis produces more accurate and precise results at runtime at the expense of longer executions to collect traces. One prime mechanism to observe executions in dynamic analysis is to instrument either the code or the binary/byte code. Instrumentation overhead potentially poses a serious threat to the accuracy of the dynamic analysis, especially for time dependent software systems (e.g., real-time software), since it can cause those software systems to go out of synchronization. For instance, in a typical real-time software, the dynamic analysis result is correct if the instrumentation overhead, which is due to gathering dynamic information, does not add to the response time of real-time behaviour to the extent that deadlines may be missed. If a deadline is missed, the dynamic analysis result and the system’s output are not accurate. There are two ways to increase accuracy of a dynamic analysis: devising more efficient instrumentation and using a hybrid (static plus dynamic) analysis. A hybrid analysis is a favourable approach to cope with the overhead problem over a purely dynamic analysis. Yet, in the context of reverse engineering source code to produce method calls dynamic and hybrid instrumentations typically lead to large execution traces and consequently large execution overhead. ii This thesis is a step towards efficient and accurate information collection through a hybrid analysis procedure to reverse engineer source code to produce method calls, with the prime objective to reduce instrumentation overhead. To that aim, the first contribution of this thesis is to systematically analyze the contribution to instrumentation overhead of different elements of an existing and promising hybrid solution. Then, a second contribution of the thesis is to suggest an instrumentation optimization process with a range of different designs for those elements to reduce the overhead and select the best one for each element to optimize that solution. The resulting optimized hybrid technique, our third contribution, which potentially produces more accurate instrumentation compared to that hybrid solution for multi-thread software by reducing execution overhead by three quarters, has a reasonable efficiency to reverse engineer programs to produce method calls for multi-threaded software. A final contribution of this thesis is to suggest a set of recommendations for efficient instrumentation. iii Acknowledgements First and foremost, thanks to the God, for shedding lights throughout my life. The God who resembles to nothing and falls beyond any imagination. I would like to express my sincere gratitude to my Ph.D. advisor, Prof. Yvan Labiche for the continuous technical and financial support during the course of my Ph.D. Your professionalism and decency have always been exemplary for me. I learned a great deal from you during my Ph.D. study, especially your attention to details finally taught me to how to craft a perfect software engineering experiment. Thank you Yvan for making my Ph.D. an exciting experience. I would like to thank my Ph.D. committee members, Prof. Guéhéneuc, Prof. Lethbridge, Prof. Franks, and Prof. Baysal for taking their time and reading my thesis. Your insightful comments and different perspectives improved the quality of this manuscript. My special thanks to my fiancé, Sara. Your love and emotional support motivated me to finish this work. Thank you Sara. My deep gratitude to my parents, Masoud and Mojgan, for their unconditional love and constant support, I owe it all to you. I am grateful to my brothers, Ali and Ehsan, who have supported me along the way. I would like to appreciate the SCE staffs, especially Jerry Buburuz and Daren Russ for facilitating my experiments and helping me with the last minute favours. My appreciation also goes to my friends at Carleton university Ehsan Rahimi and Alireza Sharifian for making my Ph.D. an enjoyable experience. Also, I am grateful to my friend Alireza Salimi for the stimulating discussions and his help on my Ph.D. iv Last but not the least, I would like to thank my fellow labmates in SQUALL laboratory for the constructive discussions and their feedback, and for the sleepless nights we were working together before deadlines. v Table of Contents Abstract .............................................................................................................................. ii Acknowledgements .......................................................................................................... iv Table of Contents ............................................................................................................. vi List of Tables .................................................................................................................... ix List of Figures ................................................................................................................... xi 1 Introduction .............................................................................................................................. 1 1.1 Contributions .............................................................................................................. 8 1.2 Manuscript Organization ............................................................................................ 9 2 Background and related work ................................................................................................. 11 2.1 Background .............................................................................................................. 11 2.2 Related works ........................................................................................................... 26 3 An Existing Hybrid Reverse Engineering Approach .............................................................. 52 3.1 Control flow and trace information .......................................................................... 53 3.2 Tool support ............................................................................................................. 56 3.3 Conclusion ............................................................................................................... 72 4 Investigating the Overhead of Kolbah’s Hybrid Approach .................................................... 74 4.1 Research Questions .................................................................................................. 75 4.2 Case studies .............................................................................................................. 77 4.3 Experiment set up ..................................................................................................... 80 4.4 Measurement framework ......................................................................................... 83 4.5 Criteria to select case study ...................................................................................... 86 4.6 Experimental design for overhead study .................................................................. 88 4.7 Results of overhead study ........................................................................................ 90 4.8 Probe effect on functional behaviour ....................................................................... 97 vi 4.9 Identifying potential improvement areas in Light .................................................... 99 4.10 Conclusion ............................................................................................................. 113 5 Optimizing the Hybrid Approach ......................................................................................... 114 5.1 Characteristics of the dynamic analysis ................................................................. 115 5.2 Research Questions ................................................................................................ 117 5.3 Case studies and experiment set up ........................................................................ 118 5.4 Optimizing the Light Instrumentation .................................................................... 119 5.5 Optimized dynamic analysis .................................................................................. 156 5.6 Frequency of probe effect observation ................................................................... 160 5.7 Threats to Validity .................................................................................................. 163 5.8 Conclusion ............................................................................................................. 163 6 Conclusion ............................................................................................................................ 165 6.1 Contribution ..........................................................................................................