Mining Software Repositories to Assist Developers and Support Managers

by

Ahmed E. Hassan

A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science

Waterloo, Ontario, Canada, 2004

© Ahmed E. Hassan 2004

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

Abstract

This thesis explores mining the evolutionary history of a software system to support software developers and managers in their endeavors to build and maintain complex software systems.

We introduce the idea of evolutionary extractors, which are specialized extractors that can recover the history of software projects from software repositories, such as source control systems. The challenges faced in building C-REX, an evolutionary extractor for the C programming language, are discussed. We examine the use of source control systems in industry and the quality of the recovered C-REX data through a survey of several software practitioners.

Using the data recovered by C-REX, we develop several approaches and techniques to assist developers and managers in their activities.

We propose Source Sticky Notes to assist developers in understanding legacy software systems by attaching historical information to the dependency graph. We present the Development Replay approach to estimate the benefits of adopting new software maintenance tools by reenacting the development history.

We propose the Top Ten List, which assists managers in allocating testing resources to the subsystems that are most susceptible to faults.
To assist managers in improving the quality of their projects, we present a complexity metric which quantifies the complexity of the changes to the code instead of quantifying the complexity of the source code itself.

All presented approaches are validated empirically using data from several large open source systems.

The presented work highlights the benefits of transforming software repositories from static record-keeping repositories to active repositories used by researchers to gain empirically based understanding of software development, and by software practitioners to predict, plan and understand various aspects of their project.

Acknowledgements

This thesis would not have been possible without the support of many exceptional people to whom I am grateful.

I am greatly indebted to my supervisor Prof. Richard C. Holt for his support and guidance throughout the years. Ric gave me the freedom to pursue research on many interesting topics while providing encouraging words and advice when needed. Ric has taught me many valuable lessons about being a good researcher and a caring teacher. I sincerely appreciate his advice concerning my presentations and writings. His counsel has been invaluable in making my work readable and my presentations entertaining. Throughout my research, Ric has continued to challenge my findings and question my results as we traveled the world presenting the work in this thesis at several international venues. Thanks Ric for making this a very enjoyable and educational experience! It saddens me that the journey has come to an end. Thanks for being a great teacher, a friendly proponent and a challenging opponent.

I appreciate the time and effort that Prof. Jo Atlee, Prof. Charlie Clark, and Prof. Kostas Kontogiannis put into reading my thesis and their valuable comments. I am grateful to Prof. Dewayne Perry for taking time out of his busy schedule to act as the external examiner for my thesis.
His insight and knowledge about prior research have been extremely valuable in strengthening my work, especially in spots where I lacked the hard evidence to support my intuition.

I would like to thank Prof. Michael W. Godfrey for always being like a big brother in offering his views and advice on several aspects of academic life. I also would like to thank Mary McColl for her assistance with various administrative details throughout my degree.

I appreciate the lively and engaging discussions with many current and previous students of Ric and members of SWAG, in particular, Prof. Susan Sim, Prof. Bil Tzerpos, and Jingwei Wu (with whom I collaborated on many research papers).

The work in this thesis uses several open source repositories. I gratefully acknowledge the significant contributions of members of the Open Source community who have given freely of their time to produce large software systems with rich and detailed source code repositories, and who assisted me in understanding and acquiring these valuable repositories.

I am very fortunate to work with many great people at Research In Motion. I would like to thank them for their friendship and their willingness to listen to a crazy scientist who disturbed them with his ideas and thoughts during their peaceful breaks. In particular, I would like to thank Vi Thuan Banh, Denny Chiu, James Godfrey, and Sean (J. F.) Wilson. Many of the ideas in this thesis were developed by monitoring some of their work habits (e.g., the Source Sticky Notes approach). I am also grateful to Vi Thuan for proofreading parts of this thesis.

Throughout this journey, I was lucky to have many good friends who have always been willing to discuss my research ideas and to disagree with me. Thankfully, they still remain my friends at the end of the journey even though sometimes I was too busy to show my appreciation. I would like to thank Mohammed Abouzour, Andrew Hunter, Prof. Karim Karim and Stephen Sheeler.
Thanks guys for always telling me when things just did not make sense; you made this journey fun and endurable!

Special thanks to my uncle (Mamdouh Bekhit) for his continuous and caring support. I am also grateful to my family for providing an environment where I could escape in order to relax and regain my strength and sanity.

The work in this thesis would not have been possible if it were not for a truly great and extremely patient person: My mom, to whom I dedicate this thesis. Thanks Mom!

Ahmed E. Hassan
February 2005

To My Mom

Related Publications

The following is a list of our publications that are on the topic of mining software repositories:

1. Exploring Software Evolution Using Spectrographs, Jingwei Wu, Ahmed E. Hassan and Richard C. Holt, Proceedings of WCRE 2004: Working Conference on Reverse Engineering, Delft, 2004.
2. Predicting Change Propagation in Software Systems, Ahmed E. Hassan and Richard C. Holt, Proceedings of ICSM 2004: International Conference on Software Maintenance, Chicago, Illinois, USA, September 11-17, 2004.
3. Evolution Spectrographs: Visualizing Punctuated Change in Software Evolution, Jingwei Wu, Claus W. Spitzer, Ahmed E. Hassan and Richard C. Holt, Proceedings of IWPSE 2004: International Workshop on Principles of Software Evolution, Kyoto, Japan, September 6-7, 2004.
4. Studying The Evolution of Software Systems Using Evolutionary Code Extractors, Ahmed E. Hassan and Richard C. Holt, Draft, Proceedings of IWPSE 2004: International Workshop on Principles of Software Evolution, Kyoto, Japan, September 6-7, 2004.
5. Using Development History Sticky Notes to Understand Software Architecture, Ahmed E. Hassan and Richard C. Holt, Proceedings of IWPC 2004: International Workshop on Program Comprehension, Bari, Italy, June 24-26, 2004.
6. MSR 2004: The International Workshop on Mining Software Repositories, Ahmed E. Hassan, Richard C. Holt and Audris Mockus, Proceedings of ICSE 2004: International Conference on Software Engineering, Scotland, UK, May 23-28, 2004. Workshop Website: http://msr.uwaterloo.ca
7. Studying The Chaos of Code Development, Ahmed E. Hassan and Richard C. Holt, Proceedings of WCRE 2003: Working Conference on Reverse Engineering, Victoria, British Columbia, Canada, November 13-16, 2003.
8. The Chaos of Software Development, Ahmed E. Hassan and Richard C. Holt, Proceedings of IWPSE 2003: International Workshop on Principles of Software Evolution, Helsinki, Finland, September 1-2, 2003.

Contents

1 Introduction
1.0.1 Prior Research
1.0.2 Personal Experience
1.0.3 The Open Source Phenomena
1.1 Research Hypothesis
1.2 Thesis Organization
1.3 Thesis Overview
1.3.1 Part I: Extracting Information From Software Repositories
1.3.2 Part II: Using Software Repositories to Support Developers
1.3.3 Part III: Using Software Repositories to Support Managers
1.4 Thesis Contributions

Part I: Extracting Information From Software Repositories

2 Studying The Evolution of Software Systems Using Evolutionary Code Extractors
2.1 Introduction
2.1.1 Organization Of Chapter
2.2 Describing Source Code Evolution
2.3 The Dimensions Of Source Code Evolution
2.3.1 Frequency of Snapshots
2.3.2 Data Source
2.3.3 The Characteristics of the Source Code
2.3.4 Level of Detail
2.4 Challenges And Complexity
2.4.1 Robustness and Scalability
2.4.2 Accuracy
2.4.3 The Changing and Unstable Nature of Source Code
2.4.4 Development Time
2.5 Previous Work
2.6 Conclusion

3 C-REX: An Evolutionary Code Extractor for C
3.1 Introduction
3.1.1 Organization of Chapter
3.2 Evolutionary Code Extractors
3.2.1 Frequency of Snapshots
3.2.2 Data Source
3.2.3 The Characteristics of Code
3.2.4 Level of Detail
3.3 Challenges In Developing C-REX
3.3.1 Robustness and Scalability of the Extractor
3.3.2 Accuracy of the Extracted Information