University of Calgary PRISM: University of Calgary's Digital Repository

Graduate Studies
The Vault: Electronic Theses and Dissertations

2019-03-18

Library Migration: A Retrospective Analysis and Tool

Zaidi, Syed Sajjad Hussain

Zaidi, S. S. H. (2019). Library Migration: A Retrospective Analysis and Tool (Unpublished master's thesis). University of Calgary, Calgary, AB. http://hdl.handle.net/1880/110093

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY

Library Migration: A Retrospective Analysis and Tool

by

Syed Sajjad Hussain Zaidi

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

GRADUATE PROGRAM IN COMPUTER SCIENCE

CALGARY, ALBERTA MARCH, 2019

© Syed Sajjad Hussain Zaidi 2019

Abstract

Modern software engineering practices advocate the principle of reuse through third-party software libraries. Software libraries offer application programming interfaces (APIs) for use by developers, thereby significantly reducing development cost and time. These libraries are, however, subject to deprecation, vulnerabilities, and instability. Furthermore, as the product matures, developers have to worry about upgrading and migrating to better alternatives in order to attract and retain clients. Refactoring a system to start using a new library can cause a domino effect resulting in serious damage to the software system. Little is currently known about library migration; the few existing studies are either too preliminary or too problematic to tell us much.

We perform an empirical study of 114 open source Java-based software systems in which library migration had occurred. We find that library migration leads to significant compatibility issues and breakage of source code in practice: library migration broke 67% of the software projects in which it occurred, while 22% of the transitively dependent projects broke due to coupling of APIs with external dependencies. Developers do not effectively use available resources to notify their clients about such migrations, preferring to use internal GitHub threads in contrast to the release notes, and even then often failing to discuss or announce the fact that library migration is planned or has been performed. We also found that mitigating library migration requires substantial effort by developers: on average, transformations affected 9.3% of the total classes in the dependent system, while impacting 15% of the total lines of code. We propose a systematic recommendation tool to assist developers in migrating or upgrading to a new software library. Our prototype tool, named EDW (for External Dependency Watcher), can be utilized to estimate and predict impacts while mitigating migrations among third-party libraries.

We evaluate EDW by detecting the impacts of migration across three granularities in ten distinct open source projects: our tool has perfect precision for all three granularities; for recall, it obtained over 0.99 for class- and field-granularity while achieving ∼0.86 for method-granularity. We conduct a controlled experiment on EDW: human participants were able to perform the tasks in significantly less time and with better precision/recall using EDW as compared to JRipples.

Acknowledgements

I would like to thank my supervisor, Dr. Robert Walker, for giving me the opportunity to work with him. He taught me how to think critically while being pragmatic. Thanks, Rob, for pushing me towards better research when I hit a dead end and for having faith in me. I would also like to thank Dr. Günther Ruhe for his helpful comments and small talks; I was also able to learn a lot from him, while assisting him during his course. I would like to thank May Sayed for giving me useful tips when I started my program. I would like to thank Hao Men for helping me out in my research. I would also like to thank Hamza Sayed for his helpful feedback and comments on my research. Finally, I would like to acknowledge Jing Zhou’s work that guided me to start my research; I would like to meet him in person some day.

Dedication

I would like to dedicate this work to my Dad and Mom; without them I am nothing. I would also like to thank my Taya, Tai, and Chacho for having my back.

Table of Contents

Abstract ii

Acknowledgements iii

Dedication iv

Table of Contents v

List of Figures viii

List of Tables x

1 Introduction 1
1.1 Background and Motivation ...... 3
1.2 Research Questions and Objectives ...... 7
1.3 Contributions ...... 7
1.4 Thesis Structure ...... 8

2 Motivation 9
2.1 Motivational Scenario ...... 9
2.2 Summary ...... 21

3 Related Work 22
3.1 Library Upgrading ...... 23
3.2 Tool Support for Library Migration ...... 26
3.3 Change Impact Analysis ...... 28
3.3.1 Incremental change ...... 30
3.4 Summary ...... 31

4 Data Collection and Analysis 32
4.1 Selection of Projects ...... 33
4.1.1 GitHub Crawler ...... 35
4.1.2 Code Analyzer ...... 36
4.1.3 Manual Selection ...... 37
4.2 Data Gathering ...... 38
4.2.1 Find it EZ ...... 38

4.2.2 JAPICC ...... 39
4.2.3 Code Compare ...... 39
4.3 Analysis and Results ...... 40
4.4 Summary ...... 53

5 Tool 58
5.1 Overview of EDW ...... 60
5.2 Dependency Viewer Module ...... 62
5.2.1 External JAR submodule ...... 64
5.2.2 Affected classes submodule ...... 65
5.2.3 Imports submodule ...... 65
5.3 Editor Module ...... 65
5.4 Impact Analysis Viewer Module ...... 68
5.5 Graph Builder Module ...... 69
5.6 Metrics Builder Module ...... 69
5.7 Workflow ...... 71
5.8 Summary ...... 73

6 Simulation Study 74
6.1 Selection of Projects ...... 74
6.2 Procedure ...... 76
6.2.1 Ground truth ...... 76
6.2.2 Estimated impact set for EDW ...... 77
6.2.3 Estimated impact set for JRipples ...... 77
6.2.4 Comparing estimated impact set and ground truth ...... 78
6.3 Results ...... 79
6.3.1 RQ5: How well is EDW able to locate the initial impact set for library migration? ...... 79
6.3.2 RQ6: What factors impact how well EDW performs? ...... 83
6.4 Summary ...... 87

7 Human-Subjects Experiment 88
7.1 Hypotheses ...... 88
7.2 Subject System ...... 89
7.3 Tasks ...... 90
7.4 Participants ...... 90
7.5 Experimental Procedure ...... 91
7.6 Data Collection ...... 92
7.7 Results ...... 93
7.7.1 Quantitative ...... 93
7.7.2 Qualitative ...... 96
7.8 Limitations ...... 101
7.9 Summary ...... 101

8 Discussion 103
8.1 Trends in Software Library Usage ...... 103
8.2 Code Breaking of Systems and Their Clients ...... 104
8.2.1 Categories of Libraries that Broke Clients ...... 105
8.3 Notification of Software Library Migration ...... 106
8.4 Evaluation of EDW ...... 106
8.5 Threats to Validity ...... 107
8.5.1 Threats to validity of the library migration study ...... 107
8.5.2 Threats to validity for the evaluation of EDW ...... 108
8.6 Future Work ...... 109
8.6.1 Knowledge base for external libraries ...... 109
8.6.2 Improve effort estimation ...... 110
8.6.3 Improvements to EDW ...... 110
8.7 Summary ...... 111

9 Conclusion 113
9.1 Future Work ...... 115

Bibliography 117

A Studied Systems 126

B Studied Library Migrations 131

C Systems that Broke Transitively Dependent Clients 140

D Systems Used for Evaluation 142

E Individual Project Results for Evaluation 144

F Experimental Materials 149
F.1 Task Descriptions and Instructions ...... 150
F.2 Pre-Study Questionnaire ...... 152
F.3 Post-Task Questionnaire ...... 154
F.4 Final Questionnaire ...... 155

G Detailed Experimental Results 157

List of Figures

1.1 Relationships between a studied project, a system that depends on that project, and the libraries depended upon by both...... 2

2.1 Dependencies from classes in the developer’s system to Guava that he ultimately discovers...... 15
2.2 Transitive dependencies, from utilityFunction1 in the developer’s system to Guava, that he ultimately discovers...... 16

4.1 Approach for data collection and analysis...... 34
4.2 Algorithm to detect elimination of external dependencies...... 36
4.3 Project count versus reasons behind successful library migration without code breaking...... 41
4.4 Project count versus reasons behind complete removal of dependency from a project...... 45
4.5 Project count versus alternative library selection criterion...... 48
4.6 Percentage of classes impacted versus Maven project (sorted in descending order)...... 54
4.7 Percentage of classes impacted versus Android project (sorted in descending order)...... 54
4.8 Percentage of classes impacted versus project from either platform (sorted in descending order)...... 55
4.9 Percentage of LOC impacted versus Maven project (sorted in descending order)...... 55
4.10 Percentage of LOC impacted versus Android project (sorted in descending order)...... 56
4.11 Percentage of LOC impacted versus project from either platform (sorted in descending order)...... 56

5.1 EDW analysis on the project cron-utils while using our standard view arrangement...... 61
5.2 EDW structure and interaction between modules...... 62
5.3 Dependency viewer after analysis on the project cron-utils...... 63
5.4 Editor after selection of the affected class StringValidations...... 66
5.5 Impact analysis view after selecting the affected class StringValidations from the dependency viewer...... 67
5.6 Impact analysis view after performing complete CIA on Guava for cron-utils...... 70
5.7 Dependency graph for the class StringValidations in the project cron-utils...... 70
5.8 Four metrics calculated for the Guava library in the project cron-utils...... 71

6.1 The field Logger used in a nested member...... 84
6.2 Areas marked “Next” and “Impacted” by the impact analysis view...... 85

6.3 Method from the class AlwaysFieldValueGenerator in project cron-utils that is dependent on the class List from the Guava library...... 86
6.4 Method from the class AlwaysFieldValueGeneratorTest that depends on the class AlwaysFieldValueGenerator...... 86

List of Tables

4.1 Libraries that support smooth replacement with another one while having minimal code impact...... 43
4.2 Mutually exclusive classification of rationales behind library migration...... 48
4.3 Errors and types that impact client code after library migration...... 51
4.4 Percentage of impacted classes...... 52
4.5 Percentage of impacted LOC...... 52

6.1 Systems for evaluation...... 75
6.2 Approach mean accuracy for class-granularity...... 80
6.3 Approach mean accuracy for method- and field-granularity...... 80
6.4 Recall of projects at method-granularity...... 82

7.1 Guava impact set for tasks...... 90
7.2 Overview of participants’ experience...... 91
7.3 Time taken by individual participants for tasks (in minutes)...... 93
7.4 Average time taken in minutes for tasks...... 93
7.5 Results of the Kolmogorov–Smirnov test for elapsed time...... 94
7.6 Results of t tests on time taken...... 94
7.7 Average JRipples task result...... 95
7.8 Results for Wilcoxon test on F-score data combined from Task 1 and 2...... 96
7.9 Results for the post-task questionnaire...... 97
7.10 JRipples task categorization...... 100

8.1 Preferred libraries involved in Android migrations...... 104
8.2 Domain of libraries that have a higher probability of breaking transitively dependent clients...... 105

A.1 Maven projects...... 126
A.2 Android projects...... 129

B.1 Maven projects...... 131
B.2 Android projects...... 137

C.1 Breaking APIs...... 140

D.1 Systems for evaluation...... 142

E.1 Individual project results for class-granularity, JRipples search strategy 1...... 145

E.2 Individual project results for class-granularity, JRipples search strategy 2...... 145
E.3 Individual project results for class-granularity, JRipples search strategy 3...... 146
E.4 Individual project results for class-granularity, EDW...... 146
E.5 Individual project results for method-granularity, JRipples search strategy 1...... 147
E.6 Individual project results for method-granularity, EDW...... 147
E.7 Individual project results for field-granularity, JRipples search strategy 1...... 148
E.8 Individual project results for field-granularity, EDW...... 148

G.1 Effectiveness: JRipples on Task 1 at class-granularity...... 158
G.2 Effectiveness: JRipples on Task 1 at method-granularity...... 158
G.3 Effectiveness: JRipples on Task 1 at field-granularity...... 158
G.4 Effectiveness: JRipples on Task 2 at class-granularity...... 159
G.5 Effectiveness: JRipples on Task 2 at method-granularity...... 159
G.6 Effectiveness: JRipples on Task 2 at field-granularity...... 159

Chapter 1

Introduction

Research suggests that software reuse positively impacts the quality of software at a reduced cost [Lim, 1994]. The demand for reuse has promoted the now-predominant practice of using third-party software libraries or dependencies [Ebert, 2008]. Software libraries are readily available without payment from various online repositories such as Maven Central Repository,1 SourceForge,2 and GitHub.3 The reuse of tested libraries from these repositories results in improved development time, usability, and maintainability [Mohagheghi and Conradi, 2007, Jezek et al., 2015]. Libraries expose their application programming interface (API) to allow third-party developers to interact with the library functionalities; an API is an implicit agreement between the software library and the client application that uses it. As a result, the clients can use exposed functionalities of the library without knowledge of the underlying implementation, at least in principle. Software reuse through external libraries creates strong dependencies between the client system and the library [Dig and Johnson, 2006]. Old versions of libraries are subject to deprecation and evolution, resulting in newer releases. Due to the emergence of better alternatives in the market, developers have to consider either migrating or updating their third-party libraries. However, the difficulties of dependency management are greatly underestimated by developers, and most software systems keep their outdated libraries [Kula et al., 2018b]. While it might seem like

1https://search.maven.org/ [accessed 2018/10/11]
2https://sourceforge.net/ [accessed 2018/10/11]
3https://github.com/ [accessed 2018/10/11]

an obvious decision to shift towards a better library, systems with complex dependencies require careful consideration before doing so; upgrading a system to start using a new library version results in additional rework for the developers [Bavota et al., 2015]. Figure 1.1 explains some key context and terminology that we use in this thesis. We study software projects that depend on external libraries and that are themselves depended on by yet other systems (i.e., transitively dependent systems); the studied projects undergo library migration, as per the following. The solid arrows express dependency relationships in which the project is directly involved, while the dashed arrows express dependency relationships that are indirectly mediated by the project; the “x” symbols indicate relationships that are eliminated, and the “+” symbols indicate relationships that are added. Thus, the studied project undergoes library migration as its dependency on the external library is replaced by an alternative dependency. Before migration, the project depended on the old library; as a consequence, its dependent system (i.e., another system that depended on the studied project) had a transitive dependency on the old library. After migration, the system’s dependency was moved to the new library; as a result, the transitively dependent system was required to change as well. No systematic study has been conducted on migrations between distinct libraries from different vendors. Thus, we have little evidence as to the frequency and severity of this phenomenon; when and if to perform library migration and what the consequences of such transitions may be, all

Figure 1.1: Relationships between a studied project, a system that depends on that project, and the libraries depended upon by both.

remain unclear. The first goal of this thesis is to empirically study the migration between external software libraries; we examine 114 open source Java4-based systems, in which library migration has occurred, to this end. The aim of this study is to empirically explore the impacts of library migrations on Java projects and their transitively dependent systems. The second goal of this thesis complements the first one. As a corollary to the lack of knowledge about library migration, current tool support is lacking. For the most part, developers are on their own while performing library transitions, which is error-prone and wastes time. On the other hand, developers can reduce their manual effort by utilizing a recommendation system that suggests the impacts of library migration: we introduce a prototype tool (EDW) that can assist developers in performing library migration.

1.1 Background and Motivation

Nowadays, software developers prefer to use external libraries for integrating a broad range of functionalities into their systems [López de la Mora and Nadi, 2018]. As a consequence, the availability of third-party libraries has significantly increased over the years. Fox [2010] reported Maven Central Repository to contain over 260,000 artifacts, serving over 70 million downloads per week; as of August 2018, the repository had grown to 3,081,887 artifacts. These software libraries generally support smooth integration into existing systems with minimal effort. Software libraries expose their application programming interfaces (APIs) to client applications, making their usage possible without requiring knowledge of the implementation details, in principle. An API is the core agreement between the dependee (library) and the dependent (software system), the breakage of which can result in issues such as lack of backwards compatibility. Ideally, after successful integration, the system should continue to use the same library without any alterations; the system can then evolve without worrying about any change in the used dependency. However, the reality contradicts the ideal: the APIs of libraries evolve and deprecate over time for the same

4https://www.java.com/ [accessed 2018/10/11]

reason as any software [Lientz, 1983]. Similar to other software, outdated libraries that do not undergo continual change can become progressively less useful [Lehman, 1996]. Therefore, developers have to consider migrating from deprecated and outdated libraries in favour of better alternatives.

Availability of better alternatives, or issues with existing dependencies, can drive developers to migrate to new libraries. For example, JUNG (the Java Universal Network & Graph Framework)5 is an open source Java-based modelling and visualization framework, a common choice for developers who require a means for modelling, analyzing, and visualizing data that can be represented as a graph or network. JUNG had depended on Apache Commons Collections,6 as this library provides many advanced utilities and data structures free of cost that were useful for JUNG. JUNG was strongly coupled with the external library, exposing some of its data structures in its own APIs; however, in 2016 the JUNG developers decided to replace it with Guava (Google Core Libraries for Java),7 another commonly used collections library. The decision to migrate was made due to the discovery of a security vulnerability in Apache Commons Collections. In addition, the JUNG developers thought that Guava was a better alternative to rely on, as per their release notes8:

The commons-collections-generics library dependency has been replaced with Guava v19. This change is not backwards-compatible with existing code. It was necessary because of a security issue, but we also decided that Guava was a much better library to depend on in future for a variety of reasons (some of which will become apparent in future JUNG releases). In terms of interfaces, the replacement of commons-collections with Guava has the following user-visible effects:

• Predicate to Predicate

• Factory to Supplier

• Transformer to Function

5https://github.com/jrtom/jung [accessed 2018/10/11]
6https://commons.apache.org/proper/commons-collections/ [accessed 2018/10/11]
7https://github.com/google/guava [accessed 2018/10/11]
8https://github.com/jrtom/jung/releases/tag/jung-2.1 [accessed 2018/10/11]

There are also (of course) many internal changes related to this replacement, e.g.,

• ChainedTransformer to Functions.compose()

• Buffer, UnboundedFifoBuffer to Queue, LinkedList

• BidiMap to BiMap
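To make the flavour of these mappings concrete, the hypothetical sketch below shows what a client of the old Transformer and Factory types might look like after such a migration. The class name and examples are ours, not taken from JUNG; java.util.function.Function and Supplier stand in for Guava's equivalents (which recent Guava versions make interchangeable with them), and the pre-migration Commons Collections code appears only in comments, since it no longer compiles once the dependency is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class MigrationSketch {
    // Before migration (Apache Commons Collections; shown for contrast only,
    // as it no longer compiles once the dependency is removed):
    //
    //   Transformer<String, Integer> length = new Transformer<String, Integer>() {
    //       public Integer transform(String input) { return input.length(); }
    //   };
    //   Factory<List<String>> listFactory = new Factory<List<String>>() {
    //       public List<String> create() { return new ArrayList<String>(); }
    //   };

    // After migration: Transformer becomes Function (transform -> apply),
    // and Factory becomes Supplier (create -> get).
    static Function<String, Integer> length = input -> input.length();
    static Supplier<List<String>> listFactory = ArrayList::new;

    public static void main(String[] args) {
        System.out.println(length.apply("JUNG"));        // prints 4
        System.out.println(listFactory.get().isEmpty()); // prints true
    }
}
```

The renamed types are the easy part of such a migration; as the JUNG notes hint, the method names and call sites change too, which is why the replacement was not backwards compatible.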

Available research [McDonnell et al., 2013, Kula et al., 2015, Bogart et al., 2016] mostly concentrates on exploring library upgrading. These studies have explored the impacts and consequences of upgrading between different versions of the same library. Library upgrading is usually less extreme than library migration, as it involves updating between different versions of the same library, and so one would expect there to exist significant commonality. Library migration, on the other hand, is performed between two distinct libraries, generally from different vendors. Due to the scarcity of research on library migration, we have little evidence as to the frequency and severity of this phenomenon, such as the impacts of library migration on a project and its transitively dependent systems. For example, checkstyle9 is a Java-based development tool that allows developers to write code that adheres to best coding practices. The checkstyle developers decided to eliminate their dependency on Guava in favour of moving to the standard Java software development kit (SDK). As a result of dropping Guava, the new release was not backwards compatible, since Guava was being exposed in their APIs, as follows.10

Since this change is being done in API code, this will break backward compatibility with any existing code that uses it if it is not re-compiled. When upgrading, to mitigate this issue, all custom JARs must be recompiled against new checkstyle code.
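As a hypothetical illustration of the kind of substitution such a move entails (the class and examples below are ours, not taken from checkstyle), several widely used Guava idioms have direct equivalents in the Java 8+ SDK:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GuavaToSdk {
    public static void main(String[] args) {
        // Guava: Joiner.on(", ").join("a", "b", "c")
        // SDK:   String.join
        String joined = String.join(", ", "a", "b", "c");

        // Guava: Optional.fromNullable(name).or("default")
        // SDK:   java.util.Optional with orElse
        String value = Optional.ofNullable((String) null).orElse("default");

        // Guava: Lists.transform(words, lengthFunction)
        // SDK:   streams with map/collect
        List<Integer> lengths = Stream.of("one", "three")
                .map(String::length)
                .collect(Collectors.toList());

        System.out.println(joined);  // prints: a, b, c
        System.out.println(value);   // prints: default
        System.out.println(lengths); // prints: [3, 5]
    }
}
```

Substitutions like these are mechanical inside a single codebase; the breakage checkstyle warns about arises because the Guava types had leaked into its public API, so every client compiled against them must change as well.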

Researchers have explored the rationales that drive developers to perform library upgrading. Their studies have found that library upgrading normally happens after the occurrence of problems and issues in the existing dependency version [Bavota et al., 2015]. However, the findings in this domain are restricted to the evolution of APIs across different versions of a library. Therefore, the rationales that drive developers to perform library migrations remain unclear. Hypothetically, library migrations

9https://github.com/checkstyle/checkstyle [accessed 2018/10/11]
10https://github.com/checkstyle/checkstyle/issues/3455 [accessed 2018/10/11]

should be motivated by legitimate rationales such as improvement in stability, performance, size,

and quality of the system. For example, the project DCMonitor11 underwent library migration from the database system named InfluxDB12 that it depended on, to an alternative called Prometheus.13 The migration was motivated by improvement in performance and stability14:

I’m really tired of InfluxDB! From my experience, it is very unstable, highly [with high] resource usage (CPU, disk IO), and the Java client is not developer friendly. Since DCMonitor is mean[t] to be a light weight tool, it should be easy to use, maintain and resource restraint. But relying on InfluxDB makes things difficult. Luckily the history metric storage is only used like a k-v storage with time range and group by, there are lots of storage system[s] [that] can handle this. Maybe the good old MySQL is a nice option.

We note that developers have different resources available to them for notifying third-party developers about migrations and their rationales (e.g., release notes, change logs). However, which of these means are more effective and which are more often used in the community is also not clear. The effort required for mitigating library migration is still not apparent and needs further exploration. Xavier et al. [2017] studied the impacts of backward incompatibility due to evolution of library APIs, finding that 14.78% of API changes broke backward compatibility while impacting 2.54% of the clients. Bavota et al. [2015] found that the impacts of upgrading to a newer version of a library are rather limited, as only 5% of the total classes and 6% of kLOC are impacted by upgrading; the limited impact is to be expected, as vendors try to ensure that the newer version of their library is backward compatible. Most library version upgrades involve fewer changes in the client code for integration. However, this is not the case during migration to another library altogether, as the dependencies are from independent developers with the possibility of having

11https://github.com/shunfei/DCMonitor [accessed 2018/10/11] 12https://mvnrepository.com/artifact/org.influxdb/influxdb-java [accessed 2018/10/11] 13https://github.com/prometheus/prometheus [accessed 2018/10/11] 14https://github.com/shunfei/DCMonitor/issues/28 [accessed 2018/10/11]

incompatible initial structures.

1.2 Research Questions and Objectives

The first objective of our study is to answer the following research questions, investigating the phenomenon of library migration.

RQ0: How often does library migration occur?
RQ1: How often does library migration in a software system break it?
RQ2: What means are used by developers for notifying about library migrations?
RQ3: What impact does library migration from a system have on its dependent clients?
RQ4: How much effort is required for mitigating library migration?

Our second objective is to assist developers in the practical problem of migrating from one library to another, providing developer support in the form of a tool we call EDW. Our research questions for this objective are as follows.

RQ5: How well is EDW able to locate the initial impact set for library migration?

RQ6: What factors impact how well EDW performs?

1.3 Contributions

The thesis has two main contributions.

• We empirically explore library migration in 114 Java-based projects, selected from both Maven and Android15 platforms.

• We develop a prototype tool, EDW, to assist the developer in performing library migration. EDW utilizes change impact analysis for locating the impacted areas of change for the user, and provides resources through which library migration can be performed systematically. We conduct both a simulation and an experiment with human participants, in order to evaluate EDW. The high precision and recall along with significant time savings of our tool show that it has promise in guiding developers during the process of library migration.

15https://developer.android.com/ [accessed 2018/10/11]

1.4 Thesis Structure

The remainder of the thesis is structured as follows. Chapter 2 describes a motivational scenario regarding how prior knowledge and support would assist a developer who wants to perform a library migration. Chapter 3 presents related work, elaborating on how it falls short of answering our research questions. Chapter 4 discusses the project selection criteria, data gathering, analysis, and results for our empirical study on library migration. Chapter 5 presents our prototype tool,

EDW, and explains the workings of its internal modules. Chapter 6 discusses the evaluation of EDW and JRipples for our simulation study. Chapter 7 presents our controlled experiment using human participants, in order to compare the effectiveness and efficiency of EDW with JRipples. Chapter 8 discusses some of the related topics of our research while specifying its threats to validity and scope of future work. Chapter 9 concludes the thesis.

Chapter 2

Motivation

Nowadays, software developers rely on third-party libraries to provide functionality for various tasks (e.g., data parsing, networking) in their systems [López de la Mora and Nadi, 2018]. There are several stable and tested libraries available online that developers can use for these functionalities, access to which is delivered by means of their APIs [Fox, 2010]. Using an available library avoids wasting time on re-inventing the wheel and can spare significant resources for the non-trivial modules of a system. However, the utilized libraries can easily become deprecated or fail to be enhanced as the environment around them changes, for the same reason as any software [Lientz, 1983]. As a result, client developers might waste significant effort, time, and resources in performing library migration. Section 2.1 presents a motivational scenario to demonstrate how issues in third-party libraries can create the need for migration. The scenario also highlights how prior knowledge about migration and a readily available tool would help in such a situation.

2.1 Motivational Scenario

Imagine a software developer who wants to create a novel universal visualization framework as the basis for their software startup company. Their framework would allow the user to model, analyze, and visualize data as a graph or network, while requiring fewer resources than the alternatives but

providing significantly better performance and results. Due to the competition from other available benchmark software (e.g., JUNG, JGraph), the developer wants to put their product on the market as quickly as possible. Since the time pressure is high and the available resources are limited, they decide to prioritize development of the visualization algorithm. This algorithm is the key element that would make this application stand out amongst its competitors. The developer decides to adhere to the principle of reuse by adopting external software libraries where possible. Since the emphasis of the project would be on handling data, the algorithm would require various readily available data structures and their respective utilities.

In order to search for a library that would fulfill their needs, the developer uses Maven Central Repository search, as is commonly done. Keeping all the resources and requirements in mind, they choose to use one of the popular libraries from the search: Guava, a collection and extension of the standard Java collections framework that provides advanced utilities to the developer. With time this dependency becomes an integral part of their project, as it becomes coupled with the APIs of the system; with successive releases, their product becomes popular and is downloaded by many clients.

After some time, the developer discovers that the library they had decided to use during the initial development phase has severe security vulnerabilities, despite having initially been deemed safe and stable by the community. As a result, support for this library has declined over the years and the vendor of the library has started to show intentions of deprecating it after several other security vulnerabilities were found; thus, this library has become a risk for its dependent systems. The developer is now in a dilemma about what to do, as their system has become strongly coupled with the library.
As a consequence, migration from Guava to something else will not be backward compatible. The decision regarding migration from this library is delicate, as the impacts can propagate to other working modules of the system as well as to transitively dependent clients. Upon recognizing the vulnerabilities in the used library, the developer has the following options:

1. Keep using the same library while exposing the clients to a security risk.

2. Replace the existing library with a newer one that does not have this security vulnerability.

3. Replace the existing library with a module developed in-house.

If the developer chooses the first option, his clients will eventually discover the transitive vulnerability and stop using his system; his reputation would be damaged, as it would be evident that he knew about the problem but exposed clients to it anyway; and besides, he recognizes that such (in)action would be unethical. Thus, the second or third option is more appropriate. The second option would require finding a reasonable alternative to the existing library, while the third would cost more time and resources but could save the company from similar issues in the future. Both options would require migrating from the existing library to an alternative one; however, replacing or removing a strongly coupled dependency is not a simple matter. Finally, after reviewing his available time and resources, the developer decides to go with option 3. To avoid the need for future library migration, the developer decides to replace Guava with functionality from the standard Java software development kit (SDK) instead of introducing a new external library. (An SDK is shipped with each release of the Java platform, so no external libraries need to be added to use the offered functionality.) As the available tool support for library migration is lacking, the developer decides to use a manual process for performing the change, as follows.

1. The developer needs to first locate classes that are explicitly dependent on the library to be replaced. These are the initial impact points that would be directly affected in migrating from the library. This step would require a thorough search for all directly affected classes throughout their system.

2. After locating impacts at the class-level granularity, the developer needs to locate the affected members (e.g., methods and fields) and statements inside these classes.

3. Once the initial impact points are found, the developer needs to figure out the transitively impacted areas: secondary points that would be affected as a result of changing the initial impact points. These secondary points of impact are harder to identify, as they can arise from detailed semantic dependencies.

4. Finally, the developer has to replace the initial impact points with calls to the Java SDK, changing the secondary impact points accordingly.
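Step 1, at least, is mechanical enough to script. The following is a minimal sketch (the `ImportScanner` class name is hypothetical; it assumes imports appear as ordinary single-line declarations) that collects every class imported from a given package prefix under a source tree:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ImportScanner {
    // Returns the set of classes imported under the given package prefix,
    // collected from every .java file beneath root.
    public static Set<String> scan(Path root, String prefix) throws IOException {
        // Matches import declarations whose package begins with the prefix.
        Pattern importDecl = Pattern.compile(
                "^\\s*import\\s+(" + Pattern.quote(prefix) + "\\.[\\w.]+)\\s*;",
                Pattern.MULTILINE);
        Set<String> imported = new TreeSet<>();
        List<Path> javaFiles;
        try (Stream<Path> files = Files.walk(root)) {
            javaFiles = files.filter(p -> p.toString().endsWith(".java"))
                             .collect(Collectors.toList());
        }
        for (Path p : javaFiles) {
            Matcher m = importDecl.matcher(Files.readString(p));
            while (m.find()) {
                imported.add(m.group(1)); // e.g. com.google.common.collect.Lists
            }
        }
        return imported;
    }

    public static void main(String[] args) throws IOException {
        scan(Path.of(args.length > 0 ? args[0] : "."), "com.google.common")
                .forEach(System.out::println);
    }
}
```

Note that scanning with a broad prefix can also report imports from unrelated libraries that happen to share it; scanning per exact package is more precise.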

The first step, locating the affected classes, requires the developer to figure out the list of classes imported from Guava. In order to locate these imports efficiently, the developer localizes his manual search by utilizing his knowledge of the system, first exploring the utility module. The utility module contains general-purpose functions that are used throughout the system; based on the developer's experience, this module has the highest probability of using data structures from Guava. The developer starts to explore the module in an iterative manner. He locates the class UtilityClass1, which relies directly on Guava (see Listing 2.1); the class imports five classes from Guava (lines 2–6). After discovering these imports, the developer decides to locate all classes relying on them by means of the search functionality provided by his integrated development environment (IDE); he finds references in four more classes, all of which reside in the utility module, reinforcing his assumption that the use of the external library is localized to the utility module.

Listing 2.1: Listing of UtilityClass1.

1  // ... other imports
2  import com.google.common.base.Joiner;
3  import com.google.common.collect.Range;
4  import com.google.common.collect.Lists;
5  import com.google.common.collect.Sets;
6  import com.google.common.base.Strings;
7
8  public class UtilityClass1 {
9    public static final String SEPARATOR = "_";
10   private Lists fields;
11   // ...
12
13   public static Strings utilityFunction1 (boolean edges, Graph graph) {
14     Strings graphName;
15     if (edges) {
16       final List edgesByPart = new LinkedList<>();
17       graphName = Joiner.on(SEPARATOR).join(edgesByPart);
18     }
19     else
20       graphName = "EMPTY";
21
22     return graphName;
23   }
24
25   private Lists utilityFunction2 (ZonedDateTime graphReferenceDate, String[] from, int year, int month, List dateToBeSelectedFrom) {
26     Range rangeForMonth = Range.closedOpen(LocalDate.of(year, month, 1).getDayOfYear(), month == 12 ? LocalDate.of(year, 12, 31).getDayOfYear() + 1 : LocalDate.of(year, month + 1, 1).getDayOfYear());
27     Sets selectedDatesBasedOnMonth = Sets.newHashSet();
28
29     for (Integer dayOfYear : dateToBeSelectedFrom) {
30       if (rangeForMonth.contains(dayOfYear))
31         selectedDatesBasedOnMonth.add(dayOfYear);
32     }
33
34     Lists finalSelectedDates = Lists.newArrayList(Arrays.asList(from));
35     for (Integer dayOfYear : selectedDatesBasedOnMonth)
36       finalSelectedDates.add(LocalDate.ofYearDay(graphReferenceDate.getYear(), dayOfYear).getDayOfMonth());
37
38     return finalSelectedDates;
39   }
40
41   public void utilityFunction3 (String graphName) {
42     if (Strings.isNullOrEmpty(graphName)) {
43       // ...
44     }
45   }
46
47   public void utilityFunction4 (Object graph) {
48     fields = Lists.newArrayList();
49     fields.add(graph);
50   }
51
52   // ...
53 }

Realizing that the five initially located classes draw on only two packages (com.google.common.base and com.google.common.collect), the developer decides to perform an IDE search using the common path prefix (i.e., com.google.common) with the expectation of immediately locating all affected classes. However, this search yields many false positives, as the system also depends upon a testing framework, called Truth, whose imported classes share this prefix with Guava (e.g., com.google.common.truth.Truth). Truth has more extensive usage and coupling with the system in comparison to Guava. Instead of wasting time on going through false positives, the developer decides to perform another IDE search while being more specific about imports, using com.google.common.base and com.google.common.collect.

Figure 2.1: Dependencies from classes in the developer's system to Guava that he ultimately discovers.

Going through the results he further discovers four more imported classes (com.google.common.collect.Iterables, com.google.common.collect.Table, com.google.common.base.MoreObjects, and com.google.common.base.Function) that he previously missed. The developer then locates all affected classes of these imports before moving to the next step. Figure 2.1 illustrates the dependencies between his system's classes and the classes from Guava, as discovered by the developer.

Having performed this extensive search in the utility module, the developer now thinks that he has located all affected classes relying on Guava within his system, so he moves to Step 2.

The developer starts his search for affected members from UtilityClass1: he manually locates the impacted LOC directly using the imports in the class, and in the process, he tries to comprehend the purpose of using the library. His search yields four utility methods and one field that depend directly on the Guava library; Listing 2.1 shows the related logic in the utility functions and fields. For example, utilityFunction2() generates the list of dates based on the given year and month: the function relies on the Range class for defining an interval of days falling in the given month range, which is then utilized to populate the list of final selected dates to be returned after performing some logic on it. After successfully performing Step 2, the developer now thinks he has comprehended the coupling of the system with the library. He can now move to locating transitive impacts of library migration.

Figure 2.2: Transitive dependencies, from utilityFunction1 in the developer's system to Guava, that he ultimately discovers.

In order to perform Step 3, the developer utilizes an impact analysis tool (Find it EZ16) that recommends the ripple effects of changing the initial impact points. The developer is required to perform impact analysis on each of the located members of affected classes from Step 2. He starts Step 3 from UtilityClass1 (Listing 2.1) by locating the ripple effects of changing utilityFunction1() (line 13). The impact analysis tool would report any member of a class that has a dependency on (i.e., a call to) utilityFunction1(), without being more specific. Figure 2.2 shows the recommended classes along with their members that depend on utilityFunction1(). The dashed lines show dependencies that will not be impacted after library migration. The developer has to manually review each transitively affected member in order to verify whether it would actually be impacted after migration or not.
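This kind of ripple-effect reasoning amounts to a traversal of a reverse call graph. Below is a minimal sketch of the structural part (the graph is hand-built here from the member names in Figure 2.2; a real tool would extract it from the code, and the result is still an overapproximation, since callers that use a return value as-is are not truly impacted):

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RippleEffects {
    // callers maps each member to the members that call it (a reverse call graph).
    // Returns every member transitively reachable from the seed impact points;
    // manual or semantic review must still prune the result.
    public static Set<String> transitiveImpacts(
            Map<String, List<String>> callers, Collection<String> seeds) {
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(seeds);
        while (!work.isEmpty()) {
            String member = work.pop();
            for (String caller : callers.getOrDefault(member, List.of())) {
                if (impacted.add(caller)) {
                    work.push(caller); // newly discovered: expand its callers too
                }
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callers = Map.of(
                "utilityFunction1", List.of("callerFunction1", "callerFunction2", "callerFunction3"),
                "callerFunction3", List.of("callerFunction4"));
        // Reports all four callers, even though in the scenario only
        // callerFunction3 and its dependents turn out to be truly impacted.
        System.out.println(transitiveImpacts(callers, List.of("utilityFunction1")));
    }
}
```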

For example, the transitively dependent classes DependentClass1 and DependentClass2 will not be impacted after library migration, as callerFunction1() and callerFunction2() only utilize the returned value (i.e., graphName) as-is in a conditional statement. On the other hand, DependentClass3 is transitively impacted, as it first stores the returned graphName before performing further logic on it. Since the developer has determined that DependentClass3 would be impacted, he has to further determine the ripple effects of callerFunction3 on the system. In this case, subDependentClass is not impacted, as callerFunction4 also uses the returned value as-is. Similarly, the developer determines transitive impacts for the other affected members from Step 2.

Finally, after locating the impacted classes and the directly and transitively affected members, and after comprehending library usage, he moves to Step 4, where he replaces the library with the Java SDK. The developer has to find the appropriate alternatives in the SDK that would retain the functionality after replacement. Listing 2.2 shows the members of class UtilityClass1 after they are replaced with the alternative library; the developer has to keep the same semantics while migrating between the two libraries, which have different structures. For example, in utilityFunction2() he replaces the use of Range with LocalDate (lines 28–29) for generating the month boundaries that are then used to populate the list of final selected dates. Similar changes are made for the imports Joiner (line 18), Sets (line 32), Lists (lines 43 and 12), and Strings (line 38).

16 https://www.finditez.com/ [accessed 2018/10/11]

Listing 2.2: Listing of modified UtilityClass1.

1  // ... other imports
2  import java.lang.String;
3  import java.time.LocalDate;
4  import java.time.Month;
5  import java.util.stream.Collectors;
6  import java.util.Collections;
7  import java.lang.Math;
8  import java.util.List;
9
10 public class UtilityClass1 {
11   public static final String SEPARATOR = "_";
12   private List fields;
13
14   public static String utilityFunction1 (boolean edges, Graph graph) {
15     String graphName;
16     if (edges) {
17       final List edgesByPart = new LinkedList<>();
18       graphName = String.join(SEPARATOR, edgesByPart);
19     }
20     else
21       graphName = "EMPTY";
22
23     return graphName;
24   }
25
26   private List utilityFunction2 (ZonedDateTime graphReferenceDate, String[] from, int year, int month, List dateToBeSelectedFrom) {
27     // Generate Range
28     int low = LocalDate.of(year, month, 1).getDayOfYear();
29     int high = month == 12 ? LocalDate.of(year, 12, 31).getDayOfYear() + 1 : LocalDate.of(year, month + 1, 1).getDayOfYear();
30
31     // Generate Days candidates
32     List finalSelectedDates = dateToBeSelectedFrom.stream().filter(dayOfYear -> dayOfYear >= low && dayOfYear < high).map(dayOfYear -> LocalDate.ofYearDay(graphReferenceDate.getYear(), dayOfYear).getDayOfMonth()).collect(Collectors.toList());
33
34     return finalSelectedDates;
35   }
36
37   public void utilityFunction3 (String graphName) {
38     if (graphName != null && !graphName.isEmpty()) {
39       /* ... */ }
40   }
41
42   public void utilityFunction4 (Object graph) {
43     fields = Collections.singletonList(graph);
44   }
45
46   // ...
47 }
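The replacements in Listing 2.2 can be spot-checked with small semantic-preservation tests. The sketch below is hypothetical (class and check names are mine) and exercises only the SDK side, restating the original Guava calls as comments; note that Collections.singletonList, unlike a Guava Lists.newArrayList, yields an immutable list — exactly the kind of subtle semantic difference such tests can catch:

```java
import java.util.Collections;
import java.util.List;

public class MigrationChecks {
    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError("replacement changed semantics: " + what);
    }

    public static void main(String[] args) {
        // Joiner.on(SEPARATOR).join(parts) was replaced by String.join(SEPARATOR, parts).
        check(String.join("_", List.of("a", "b", "c")).equals("a_b_c"), "join");

        // Strings.isNullOrEmpty(s) was replaced by s == null || s.isEmpty().
        String empty = "";
        check(empty == null || empty.isEmpty(), "null-or-empty");

        // Lists.newArrayList() plus add(x) was replaced by Collections.singletonList(x);
        // the replacement is immutable, so any later add() calls would now throw.
        List<Object> fields = Collections.singletonList("graph");
        check(fields.size() == 1 && fields.get(0).equals("graph"), "singleton");

        System.out.println("all replacement checks passed");
    }
}
```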

After performing the change and running unit tests to confirm semantic preservation of the system, the developer thinks that he has successfully migrated from the library. As the imports from Guava are not needed anymore, the developer decides to remove them before dropping the Guava dependency itself. However, in the process of doing this, he finds that he failed to consider the class com.google.common.primitives.Ints, imported by the class UtilityClass5, shown in Listing 2.3. The import declaration was buried between other import declarations and resides in a different package (com.google.common.primitives), which resulted in him missing it during the first iteration. Realizing his mistake, the developer again performs an IDE search for other imported classes in this package and then repeats Steps 2–4 on the located classes.

Listing 2.3: Listing of UtilityClass5.

1  // ... other imports
2  import com.google.common.primitives.Ints;
3  // ... other imports
4
5  public class UtilityClass5 {
6    public void utilityFunction5 () {
7      int checked = Ints.checkedCast(Math.round(getGraphSize()));
8      if (checked > 0) {
9        // ...
10     }
11   }
12 }
13
14 // ...

After making the required changes and removing the now-unused imports, the developer believes that he has successfully migrated from the library. As all usage of the library appears to have been removed from the system, the developer finally removes Guava from the system's build path and re-builds the system. The build fails: his initial assumption about the localized usage of Guava was wrong. The file management module, written by another developer, was using the imported classes com.google.common.io.Files and com.google.common.io.CharStreams directly. These imported classes reside in a different package and belong to another module, which resulted in them being missed in the previous searches. Again realizing his mistake, this time he decides to perform a thorough search of the whole system based on the build errors. He has to perform Steps 1–4 again on any newly discovered imports while removing the imports one at a time. Finally, after multiple iterations of these steps, the developer successfully replaces Guava with the Java SDK. At this point, the developer has spent significant time and effort on the following:

1. Manual search for the directly affected classes, likely missing some and/or getting false hits.

2. Manual exploration of the directly affected classes in search of impacted members and their ripple effects, based on his past experience, via trial and error; in the process, he likely misses some and/or gets false hits, plus he has to remember all the details.

3. Risk of having to repeat the same steps, due to his limited knowledge and unsystematic, manual approach.

4. The need to comprehend library usage for replacing it with another alternative.

On the other hand, the developer could have saved both time and effort if he had known beforehand the classes affected by the migration, along with their affected members and transitive impacts. Moreover, the migration could have been performed more efficiently if he had followed a systematic approach with some insight and information on library migrations. The developer could then have spent more time on comprehending how the library is used and how to replace it.

2.2 Summary

We described a scenario in which a software system was in need of migration from an unstable and unsafe external library. The motivational scenario demonstrates how mitigating the migration of a strongly coupled dependency requires the developer to perform several steps manually. The developer is required to first find the impacted areas at different granularities: this involves locating the affected classes, members, and lines of code, in this specific order. Once the developer has finalized the directly affected classes, the next step is to find the transitively impacted segments and their ripple effects. All of these steps have to be performed while remembering or recording the results of the preceding steps and actions. As a result, errors can easily be introduced into the procedure, requiring iteration of the steps and repeated effort. Tool support for the mechanical aspects of the procedure would free the developer's time and attention for the more complex aspects of the problem.

Chapter 3

Related Work

Library migration falls under the general category of software evolution; however, our research is most strongly related to the sub-domain of library upgrading, which considers upgrading existing dependencies on libraries to new versions. Efforts have been made to understand library upgrading by exploring the evolution of APIs in dependencies, through empirical studies specific to the domains of API evolution, API breakage, and API migration. However, all of these studies are restricted to an "intra" level; that is, they centre on studying changes among different versions or variants of the same library. In contrast, library migration between independent libraries has not been explored before, and we have little evidence as to the frequency and severity of this phenomenon. Our research targets the scarcity of information regarding such "inter"-level library migrations and explores the impacts of these changes on dependent systems. Due to the lack of knowledge regarding library migration, we also lack specialized development tools aimed at helping the developer to transition their software between libraries. Section 3.1 discusses previous research on library upgrading. Section 3.2 highlights existing tool support for library migration. Section 3.3 discusses approaches to change impact analysis. Section 3.3.1 highlights the process of incremental change.

3.1 Library Upgrading

Library upgrading (also known as "library modernization") due to API evolution has been studied in recent years. Researchers have empirically studied how developers deal with library upgrading driven by changes in a library's APIs, whose impacts propagate to dependent systems. Mitigation of the issues of library upgrading has been explored through research on API evolution in software ecosystems, API migration, API compatibility, and API upgrading in historic repository releases.

Bavota et al. [2015] thoroughly investigate the evolution of dependencies and their APIs in the Apache ecosystem. The goal of their study is to understand what factors drive developers toward updating their dependencies. They find that a new release of a library with major changes usually triggers an upgrade in the dependent systems. Their results show that, at the API level, developers tend to upgrade a dependency when substantial changes such as bug-fixing activities are included. However, in most cases that involve API changes, developers are hesitant to perform the final transition. McDonnell et al. [2013] study the Android ecosystem to understand the impacts of API evolution in Android projects. They find that client adoption lags behind the exponentially growing API evolution. They discovered that in their studied projects 28% of library references were outdated. Developers did indeed upgrade to newer API versions, but the adoption time was found to be much slower than the average API release intervals, as the projects had a median lag time of 16 months. The findings of McDonnell et al. are further strengthened by Hora et al. [2018]: they study the impacts of API evolution on the Pharo ecosystem, centring on exploring the reaction of developers to the API evolution in the source code. They find that "53% of the analyzed API changes caused reaction in 5% of the systems and affected 4.7% of developers [...]". Bogart et al. [2015] explore the stability of dependencies among modules of an ecosystem. Continuing this work, Bogart et al. [2016] compare API breaking changes in libraries belonging to three different ecosystems: Eclipse, R/CRAN, and Node.js/npm. They describe the differences in the applied practices, policies, and tools when performing a breaking API change in these distinct ecosystems. They find that the three ecosystems differ significantly in their API cost negotiation and community values. These studies explore the area surrounding API evolution and its consequences on the respective ecosystems. The focus of these studies is to determine the significance of changes in APIs while moving from an old version of a library to a newer one.

Some research studies library upgrading by reviewing migration of APIs between similar library versions. For instance, Dig and Johnson [2006] study API migration between software systems, on two major releases of four frameworks and one library written in Java. They define a catalogue of breaking and non-breaking API changes that can be used by future researchers. In their research they find that 80% of the changes that break client code were due to refactoring operations; Cossette and Walker [2012], however, call this result into question. Hora and Valente [2015] propose a tool called apiwave, which keeps track of API popularity and migrations for popular frameworks and libraries; their tool notifies system maintainers regarding the evolution of APIs in the supported frameworks. Kula et al. [2018a] conduct a large-scale study on library migration for GitHub projects. They report that 81.5% of the studied systems kept their outdated dependencies even when subject to the risk of security vulnerability; 69% of the interviewed developers were not even aware of the vulnerable dependencies; and most developers do not prioritize dependency updates, as they mean additional work. They conclude that updating deprecated dependencies is not common practice for most developers and projects. Kapur et al. [2010] study the issue of dangling references during API migration; they present a prototype tool called Trident that can be used to automate refactoring of references during migration. The available research is restricted to migration of APIs belonging to different versions of the same library; the area of migration across domains and distinct libraries is yet to be studied thoroughly.

Another take on library upgrading considers API compatibility issues. Xavier et al. [2017] study breaking API changes in Java libraries and explore their impacts on client code. They find that 14.78% of library releases broke compatibility with previous versions, impacting 2.54% of the clients in the process. Moreover, the projects with a higher frequency of breaking changes are comparatively more popular, large, and active. They recommend that, to avoid the imminent risk of using obsolete libraries, developers update their systems as early as possible. Jezek et al. [2015] study the compatibility issues of Java APIs as third party libraries evolve. The goal of their study is to locate impacts of API evolution on programs using these libraries. They find that API incompatibility is a commonly occurring issue and can cause problems for programs depending on the library. They conclude that better tools and methods are direly needed to support this ever-growing library evolution. These empirical studies express concerns regarding issues with API incompatibilities while updating a library to a newer version.

Researchers have also studied library upgrading in the history of releases in online repositories.

Kula et al. [2015] study the latency of adopting new releases in Maven projects. They explore the adoption of new library versions based on the trust factor of functional and non-functional correctness. They find that 82% of the studied systems were inclined towards adopting the latest versions, in contrast to Bavota et al. [2015]. Visser et al. [2012] perform a historical analysis of stability and impact on clients of the Apache Commons library. They present four stability metrics based on method changes and removals. Their metrics can be used to calculate the stability of implementation of third-party libraries. Raemaekers et al. [2014] study the means and approaches of notification about update options regarding breaking changes in libraries on the Maven Central Repository, finding that the current mechanism to signal interface instability is not often used and requires further improvement. The research in this domain has studied library upgrading by exploring different releases of the same library in a repository. These studies focus on exploring the API evolution of different versions of the same library: what changes drive developers to opt for the latest releases, and the impacts of those changes. However, this research does not explore movement between distinct libraries. The questions of when and why to update a library to an alternative one still remain unanswered. The reasons forcing a developer to move to another community or alternative option, the impacts of this decision, and the effort required to perform the transition are still not clear. Thus library "upgrading" performed at an inter-level (i.e., library migration) requires further exploration. Our research addresses some of these unanswered questions by studying library migration between independent libraries and explores the rationales behind such transformations.

3.2 Tool Support for Library Migration

Developers have to perform upgrading of software libraries manually, due to a lack of better alternatives. The task of modifying a software system to start using a new library is developer driven, error prone, and time consuming. The available literature shows that current tool support for dependency management and upgrading is lacking. Walker and Murphy [2000] pointed out that tight coupling to external libraries can cause brittle APIs, but they presented only anecdotal evidence in support of it. Bogart et al. [2015] argue that current collaboration tools work well within a single community but break down when used for inter-dependent projects between distinct communities. Their preliminary results show that, in the studied ecosystems, developers struggled with dependency management and evolution, and existing tools were rarely used to help the cause. Cossette and Walker [2012] study the incompatibilities of APIs between different versions of Java software libraries, discovering that most of the available recommendation techniques for API migration are inadequate, with an accuracy rate of only 20%.

Researchers have recently started to explore approaches from the field of visualization to help developers while mitigating library evolution. Kula et al. [2014] study the lack of historical knowledge regarding efficient maintenance of outdated libraries, aiming to assist maintainers by visualizing the evolution of relationships between a system and its dependencies. To realize their idea, they propose a time-series visualization called library-centric dependents diffusion plots (LDP); LDP incorporates knowledge of how different systems manage their dependencies. Their approach shows potential for understanding dependency management in the evaluated real-world examples. Kula et al. [2018a] introduce the software universe graph (SUG), which can be used to model the popularity, adoption, and diffusion trends of dependencies in a repository or ecosystem. The SUG models the usage of a library over time and the latency of introducing new dependencies into a system. The visualization provided by the SUG can be used to extract valuable insights about dependency management. Building on the findings of Kula et al., Todorov et al. [2017] propose a tool named SoL Mantra (Software Library Mantra) that makes use of the wisdom of the crowd to suggest update opportunities to developers; it checks which of a system's dependencies are out of date and recommends which dependencies should be updated together or individually. By making use of the coexistence coefficient provided by Kula et al., SoL Mantra can also compute the complexity of each potential update. The tool makes extensive use of visualization, presenting the maintainer with an orbital layout17 of the dependency relations. Their visualization, however, gets messier for massive systems that have overlapping dependencies; such overlapping visualizations are very hard for the user to comprehend without proper support. SoL Mantra only provides visual cues for update opportunities among different versions of the same library. It is helpful in an environment where a developer wants to update to a newer version of the same dependency; however, the tool is not built for the purpose of moving to another, distinct library. Another visualization-based approach to upgrading libraries is proposed by Gerasimou et al. [2018] as an end-to-end methodology. Their approach makes use of a visualization technique called the city metaphor [Wettel et al., 2011], used to highlight the impacted areas of change as three-dimensional cities, associating the extracted dependency metrics with visual properties of city components. Their approach is a four-step process that starts with parsing and analysis of the source code that needs upgrading. The second step visualizes the gathered dependency metrics by utilizing the city metaphor. Steps 3 and 4 involve the automatic transformation of the old source code to the upgraded one, along with its verification. The impacted areas are automatically mapped and transformed according to certain template patterns, which are then verified for accuracy. Their results remain preliminary.

17 https://github.com/emeeks/d3.layout.orbit [accessed 2018/10/11]

3.3 Change Impact Analysis

Analogously to a stone thrown in a pond, an initial software change in a system can result in "ripple effects" that demand subsequent, secondary changes to the rest of the code base [Bohner and Arnold, 1996]. Therefore performing change impact analysis (CIA) is an essential step of incremental software development [Rajlich and Gosavi, 2004]. The origin of impact analysis can be traced back to Haney [1972], who described a simple model to estimate the stability of a large system as a function of its internal structure. The proposed matrix was able to record the probability of change propagation in a system of significant size. The traditional definition of CIA comes from Bohner and Arnold [1996]: the process of identifying and estimating the areas of a system that need modification in order to accomplish a change. As most changes in a system propagate due to program dependencies, several impact analysis techniques exploit models that utilize the system dependence graph (SDG). Horwitz et al. [1990] propose the SDG approach along with its two-phase graph-reachability algorithm to compute inter-procedural slices. An SDG is a directed graph that utilizes program dependence graphs [Ferrante et al., 1987] to model the inter-procedural structure of a system under analysis. Building on the results of the Horwitz et al. algorithm, Reps et al. [1994] propose an algorithm that is asymptotically faster than the original one. However, the asymptotic complexity of their algorithm is still bounded by O(n^3), where n is the maximum number of parameters at any call site. Similarly, other research [Liang and Harrold, 1998, Forgács and Gyimóthy, 1997, Sridharan et al., 2007, Graf, 2010] tries to improve the originally proposed algorithm and model of SDGs by either reducing additional parameter vertices, supporting new functionality, or optimizing the graph traversal algorithm.

Even with all these proposed advancements, constructing and maintaining SDGs or similar graphs for the whole system is too expensive for practical purposes [Men, 2018]. The available CIA techniques can be classified into two broad categories: structural analysis and semantic analysis. The structural analysis techniques construct dependence graphs in order to extract structural dependencies between program entities, working on the principle that, if a program entity in a structural dependency network changes, other dependent entities might also have to change [Petrenko and Rajlich, 2009, Sun et al., 2010]. A call-graph-based approach is proposed by Bohner and Arnold [1996], but it suffers from imprecise results, as the call graph does not contain the impacts of callees on callers. In order to further improve accuracy, Badri et al. [2005] incorporate control flow into their model. Similarly, different variants of structural approaches [Lee et al., 2000, Tonella, 2003, Breech et al., 2004, Jász et al., 2008] have been proposed over the years. These techniques usually start with building an intermediate representation of the program based on the extracted structural dependencies. Change impact is then calculated based on the transitive closure and traversal of the constructed representation. The second category of CIA techniques incorporates the semantics of the source code in its analysis. These techniques can distinguish between changes that have local (e.g., refactoring) or substantial (e.g., functional) impacts. Semantic CIA approaches recommend probable changes by exploiting semantic rules and patterns in the source code. Poshyvanyk et al. [2009] try to overcome the limitations of static coupling measures by capturing the conceptual relations between program entities. Their approach uses latent semantic indexing to detect similarities between code entities. Gyori et al. [2017] propose an inter-procedural semantic CIA based on equivalence relations between program versions. Their technique improves precision in the presence of semantics-preserving changes. Several other approaches [Robbes et al., 2007, ten Hove et al., 2009, Canfora and Cerulo, 2005, Ajienka and Capiluppi, 2016] have been proposed over the years that perform fine-grained CIA by utilizing code semantics. These approaches compute semantic similarity over the embedded text information in the analyzed artifacts.
The semantic approaches to CIA are more thorough, and they can discover couplings that are harder to find for the traditional techniques. These approaches, however, consume significant resources to achieve their results, being too slow when applied at industrial scale. Structural techniques, on the other hand, are comparatively fast and have more feasible applications in the industrial domain, and fine-grained variants have been proposed to remedy their lower precision and accuracy. However, structural techniques still tend to produce overestimated impact sets, since precise static analysis is undecidable in general [Landi, 1992].
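At their core, the structural techniques reduce to reachability: the estimated impact set of a change is the set of entities that can reach the changed entity in the dependency graph, i.e., the transitive closure of reverse dependencies. A minimal sketch in Java (class and method names are our own illustration, not from any cited tool):

```java
import java.util.*;

// Minimal sketch of structural change impact analysis: the impact set
// of a change is the set of entities that can reach the changed entity
// in the dependency graph (transitive closure of reverse dependencies).
public class StructuralCia {

    // dependsOn.get(a) = entities that a depends on
    public static Set<String> impactSet(Map<String, Set<String>> dependsOn, String changed) {
        // Invert the graph: who depends on whom.
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            for (String target : e.getValue()) {
                dependents.computeIfAbsent(target, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // Breadth-first traversal from the changed entity.
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(changed);
        while (!work.isEmpty()) {
            String current = work.remove();
            for (String dep : dependents.getOrDefault(current, Collections.emptySet())) {
                if (impacted.add(dep)) {
                    work.add(dep);
                }
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = new HashMap<>();
        g.put("A", Set.of("B"));  // A calls B
        g.put("B", Set.of("C"));  // B calls C
        g.put("D", Set.of("C"));  // D calls C
        // Changing C potentially impacts B, A, and D.
        System.out.println(impactSet(g, "C"));
    }
}
```

The overestimation discussed above is visible even here: every transitive dependent is flagged, whether or not the concrete change actually affects it.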

3.3.1 Incremental change

The process of incremental change (IC) is a systematic approach to introducing new functionality into an existing system. Rajlich and Gosavi [2004] propose an incremental change model to support evolution in software systems. Their model is divided into three broad phases, namely initiation, design, and implementation of the incremental change. The initiation phase includes interaction with the customer or stakeholders to initiate the change request. The design phase involves concept location and impact analysis, which are performed before the actual implementation. Concept location allows the developers to determine the initial location of the change within the source code; if effectively followed, it can successfully determine the areas where the change is to be made [Marcus et al., 2005]. The activity of impact analysis, on the other hand, identifies the full extent of an initial change. The implementation phase that follows design introduces the actual change into the system through several activities: pre-factoring, actualization, change propagation, post-factoring, and testing.

Buckner et al. [2005] propose JRipples to partly automate the IC cycle of Rajlich and Gosavi [2004]; it is an interactive tool for program comprehension during IC that supports the activities of impact analysis and change propagation. JRipples requires the developer to discover at least one class of the impact set. The rest of the impact set is discovered by iteratively inspecting the neighbours in the dependency graph. In order to calculate impacts, JRipples supports different propagation rules (marks) that determine how a change in one node is propagated to dependent ones. An “Impacted” mark indicates that the class will be changed during the IC. A “Propagating” mark indicates that the class itself will not be changed but its neighbouring nodes might be affected. Finally, a “Next” mark indicates that the marked class might change as a consequence of changing a neighbouring node. The JRipples analysis ends when there are no longer any classes marked as “Next”.
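The propagation of these marks can be sketched as follows; this is our own simplified re-implementation of the published rules, not JRipples’ actual code, and the class and method names are ours:

```java
import java.util.*;

// Simplified sketch of JRipples-style change propagation: marking a
// class as Impacted or Propagating pushes the "Next" mark onto its
// unvisited neighbours; the analysis ends when no class is marked Next.
public class RipplesSketch {
    enum Mark { NEXT, IMPACTED, PROPAGATING, UNCHANGED }

    final Map<String, Set<String>> neighbours; // undirected dependency graph
    final Map<String, Mark> marks = new HashMap<>();

    RipplesSketch(Map<String, Set<String>> neighbours) { this.neighbours = neighbours; }

    // The developer seeds the analysis with one initially impacted class.
    void seed(String cls) { apply(cls, Mark.IMPACTED); }

    // The developer inspects a Next-marked class and classifies it.
    void apply(String cls, Mark mark) {
        marks.put(cls, mark);
        if (mark == Mark.IMPACTED || mark == Mark.PROPAGATING) {
            for (String n : neighbours.getOrDefault(cls, Set.of())) {
                marks.putIfAbsent(n, Mark.NEXT); // only unvisited neighbours
            }
        }
    }

    boolean done() { return !marks.containsValue(Mark.NEXT); }

    Set<String> marked(Mark m) {
        Set<String> out = new TreeSet<>();
        marks.forEach((c, mk) -> { if (mk == m) out.add(c); });
        return out;
    }

    public static void main(String[] args) {
        RipplesSketch r = new RipplesSketch(Map.of(
            "A", Set.of("B"), "B", Set.of("A", "C"), "C", Set.of("B")));
        r.seed("B");
        System.out.println("next to inspect: " + r.marked(Mark.NEXT));
    }
}
```

Note that the loop is developer-driven: the tool supplies the worklist of “Next” classes, but a human decides each class’s mark, which is precisely where the information overload discussed below arises.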

JRipples is built to support IC after the developer has located the initial impact points in the code base; it does not specifically assist in library migration. As demonstrated in Section 2.1, its use results in error-proneness and repetition that can waste time and resources: in essence, the developer misses low-level details of the migration due to information overload.

3.4 Summary

We discussed the existing work related to software library upgrading, and pointed out how that phenomenon potentially differs from library migration; thus, the literature contains little knowledge of the issues that arise in library migration. Similarly, available tool support for library upgrading relies on visualization to assist the developer in transformation; these tools only provide visual cues and feedback to the developers. None of the available tools and approaches is appropriate for efficiently performing library migration. We also discussed the available literature on change impact analysis. Semantic approaches are quite costly in contrast to their purely structural counterparts, but this higher cost buys greater accuracy. Approaches based on SDGs and similar graph representations are expensive to construct and maintain, which generally makes them impractical. Finally, we highlighted the systematic approach of incremental change to support the evolution of software systems. JRipples, a tool for incremental change, can support a manual approach to performing library migration, but this approach would require a developer to manually locate the classes, members, and impact points relying on the old libraries being removed, before calculating their ripple effects; something more is needed.

Chapter 4

Data Collection and Analysis

In this chapter, we address our research questions on software library migration. Our aim is to empirically explore the impacts of library migration on Java projects and their transitively dependent systems. We use the terms “dependency” and “library” interchangeably in this chapter; they refer to an external system or JAR file that was added to a project in order to provide certain functionality. Our research questions are as follows.

RQ0: How often does library migration occur?

RQ1: How often does library migration in a software system break it?

RQ2: What means are used by developers for notifying about library migrations?

RQ3: What impact does library migration from a system have on its dependent clients?

RQ4: How much effort is required for mitigating library migration?

To consider certain narrower aspects, we subdivide some of the research questions as follows.

RQ1.1: When do developers remove an external library without adding a new one?

RQ2.1: What are the rationales that drive developers to migrate libraries in a working system?

Figure 4.1 overviews our approach. The process conforms to a pipe-and-filter architectural style, where each step transforms its input into an output that serves as the input to the subsequent step. The rectangular boxes constitute the processing steps performed for our study (the greyed ones are described further); the curved-bottom boxes represent the data passed to/from the processing steps; the projected cylinders represent databases. The solid arrows represent the sequencing of the processing steps with the data passed therein (note that the apparent cycle is only an artifact of the representation); the dashed arrows represent additional dependencies. The solid circle at top represents the start of the process, and the target symbol at bottom represents the end of the process. A total of seven processing steps are performed to find and analyze data relevant to our research questions. (1) Our custom crawler performs a GitHub search for relevant projects by generating a series of queries; the search accesses the GitHub repository and determines metadata identifying the projects relevant to the queries. The metadata is then used by the crawler to download the files stored in GitHub for the relevant projects (see Section 4.1.1). (2) We use a custom analyzer to analyze the downloaded project source code for occurrences of library migration, filtering out projects that have not (potentially) undergone library migration (see Section 4.1.2). (3) We determine a random subset of these filtered projects through a manual selection process, and obtain the binary code for the filtered subset through access to the Maven Central Repository (see Section 4.1.3). (4) An industrial tool, Find it EZ, is used to determine the initial points of impact involved in the library migrations (see Section 4.2.1). (5) A second industrial tool, JAPICC, is used to determine whether the library migrations have caused any breaking changes to the projects’ exposed APIs (see Section 4.2.2). (6) A third industrial tool, Code Compare, is used to determine whether the changes that occurred broke the code (see Section 4.2.3). (7) We collect and analyze the resulting data (see Section 4.3).

4.1 Selection of Projects

We used GitHub, a web-based hosting service that supports the version control system Git, to retrieve the systems used in this study. GitHub allows its users to access a large number of public repositories and is considered to be the largest host of open source code in the world [Gousios et al., 2014].

Figure 4.1: Approach for data collection and analysis.

We selected GitHub due to its unrestricted access to commits, developer conversations, and source code. GitHub is home to open source projects currently written in a total of 337 unique programming languages18; however, we narrow our study to Java, the third most popular language on GitHub, for the sake of avoiding the engineering complexities of handling additional languages. Furthermore, in order to automate the extraction of dependencies, we restrict the studied systems to Java-based Maven and Android projects. Both of these platforms contain XML-based, central code management files (named POM.xml and Build.xml for Maven and Android projects, respectively) containing general information about the project and the configuration details used during compilation. More specifically, these files list any external libraries that a project depends on.

4.1.1 GitHub Crawler

We wrote a Java crawler that utilizes GitHub search to identify and download relevant projects.

GitHub offers representational state transfer (REST) APIs19 that help researchers automate the tedious process of fetching large code bases. The GitHub APIs allow the user to set various parameters (e.g., star interval and language) to limit the search. The number of stars corresponds to the popularity of a project; we decided to only download projects that had 100 stars or more.

In order to protect itself against denial-of-service (DoS) attacks, GitHub limits the API hit rate per user: 5,000 hits per hour when using authenticated access and 60 otherwise. We adapted to this limitation by making use of periodic sleeping, reduced star intervals, and access tokens; the access tokens provide the authentication needed to permit 5,000 hits per hour, while the periodic sleeping waits out the current rate-limit window. The crawler generates queries with increasing star intervals, in order to avoid exceeding the hit limit and to feasibly manage the retrieval and download of relevant projects. The crawler generates a query for each interval of width 100, starting from 100 until it reaches 27,000 (i.e., it starts from the interval [100,200], then

18https://octoverse.github.com/ [accessed 2018/10/11] 19https://developer.github.com/v3/? [accessed 2018/10/11]

for all p ∈ Projects do
    for all vi ∈ Versions(p) \ {vn} do
        xmli ← locateXMLFile(vi);   xmli+1 ← locateXMLFile(vi+1)
        DEPi ← Parse(xmli);   DEPi+1 ← Parse(xmli+1)
        report DEPi \ DEPi+1
    end for
end for

Figure 4.2: Algorithm to detect elimination of external dependencies.

[201,300], and so on). We stopped at 27,000 as we manually verified that the most popular relevant project at that time had 26,220 stars. The metadata of projects returned from the API is paginated; therefore, the crawler periodically increases the page number, starting from 1, until it reaches a page with 0 projects. The metadata of retrieved projects contains a set of downloadable links for each of their released versions; the crawler uses these links to download the source code of the projects for a star interval. The first query generated by our crawler is: “https://api.github.com/search/repositories?access_token=token&page=1&q=language:Java%20language:Maven%20language:Android%20stars:100..200”. Other queries generated by our crawler are similar; only the star intervals and page numbers are adjusted for retrieving the relevant projects. The crawler was used to download a total of 9,561 projects in May 2017, the complete set of projects that satisfied our queries at that time.
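The query-generation loop described above can be sketched as follows; the URL shape mirrors the example query given in the text, and the access token is a placeholder:

```java
import java.util.*;

// Sketch of the star-interval / pagination query generation used by
// the crawler. The URL shape mirrors the example query in the text;
// the access token is a placeholder.
public class QueryGenerator {
    static final String BASE = "https://api.github.com/search/repositories";

    static String query(int starLow, int starHigh, int page, String token) {
        return BASE + "?access_token=" + token + "&page=" + page
             + "&q=language:Java%20language:Maven%20language:Android%20stars:"
             + starLow + ".." + starHigh;
    }

    // Enumerate the first page of every star interval of width 100,
    // from [100,200] up to maxStars, mirroring the crawler's loop
    // ([100,200], [201,300], and so on).
    static List<String> firstPages(int maxStars, String token) {
        List<String> queries = new ArrayList<>();
        for (int low = 100; low < maxStars; low += 100) {
            int high = Math.min(low + 100, maxStars);
            queries.add(query(low == 100 ? 100 : low + 1, high, 1, token));
        }
        return queries;
    }

    public static void main(String[] args) {
        firstPages(500, "token").forEach(System.out::println);
    }
}
```

In the real crawler, each query is followed by further pages (incrementing the page parameter) until an empty page is returned, with sleeping inserted whenever the rate limit is approached.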

4.1.2 Code Analyzer

We wrote a Java-based code analyzer to filter the downloaded projects. Selected projects were required to have migrated between libraries in some pair of successive released versions. We tried different approaches for our analyzer, including the use of change logs, release notes, version numbers, and different granularities, but such approaches resulted in excessively high false positive rates; therefore, we reverted to the simple approach of checking for the absence of prior dependencies in newer versions. Our simple algorithm for the code analyzer is shown in Figure 4.2. The algorithm iterates through the located projects. Assuming that the versions of a project are ordered chronologically, it then iterates through them, also locating each version’s successor (the last version is not selected as it has no successor). It locates the relevant XML configuration file in both versions, which are parsed to find any external dependency present in the older version but absent from the newer one. The projects demonstrating this phenomenon are collected for further analysis. (Note: the algorithm will locate and parse the XML configuration file twice in total for versions 2 through n − 1 of each project. As the necessary processing has low runtime cost, we chose this naive approach over the alternative of caching the information for the successor version, which would have increased implementation complexity and hence the potential for introducing bugs. Since the runtime cost of the algorithm is negligible relative to the high cost of locating and downloading the projects, optimizing it was not warranted.) We ran our analyzer on the 9,561 downloaded projects, returning a total of 1,135 projects, consisting of 680 Android and 455 Maven projects.
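The core of the Figure 4.2 algorithm can be sketched directly in Java. The helper below is our own illustration, with the POM structure simplified to <dependency> elements carrying <groupId> and <artifactId> children:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

// Sketch of the analyzer's core step (Figure 4.2): parse the dependency
// coordinates out of two successive versions' build files and report
// those present in the older version but absent from the newer one.
public class DependencyDiff {

    static Set<String> parseDeps(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<String> deps = new TreeSet<>();
            NodeList nodes = doc.getElementsByTagName("dependency");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element d = (Element) nodes.item(i);
                String g = d.getElementsByTagName("groupId").item(0).getTextContent();
                String a = d.getElementsByTagName("artifactId").item(0).getTextContent();
                deps.add(g + ":" + a);
            }
            return deps;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // DEPi \ DEPi+1: dependencies removed in the newer version.
    static Set<String> removed(String olderXml, String newerXml) {
        Set<String> diff = new TreeSet<>(parseDeps(olderXml));
        diff.removeAll(parseDeps(newerXml));
        return diff;
    }

    public static void main(String[] args) {
        String v1 = "<project><dependencies>"
                + "<dependency><groupId>com.google.guava</groupId><artifactId>guava</artifactId></dependency>"
                + "<dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency>"
                + "</dependencies></project>";
        String v2 = "<project><dependencies>"
                + "<dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency>"
                + "</dependencies></project>";
        System.out.println(removed(v1, v2)); // guava was dropped between versions
    }
}
```

A non-empty result is only a candidate migration; as described next, manual selection is still needed to separate genuine migrations from module restructuring and other removals.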

4.1.3 Manual Selection

We manually selected a subset of the downloaded projects that fulfilled the following set of criteria.

• Each selected project must be actively using a library, in its older version, that was found missing in the newer one.

• Each selected project must migrate between libraries not developed by the same developers. (Maven projects follow a modular structure; therefore, there are many intra-modular dependencies in which the internal modules act as dependencies for each other in a Maven system. The developer has control over these modules; therefore, we decided to ignore such dependencies.)

• Each selected project must have been actively updated in the last 2 years. (We wished to avoid stale or abandoned projects.)

• Transitive dependencies (those used by the IDE or required transitively by other external libraries) were excluded, as they cannot satisfy the other criteria.

We found 218 projects that fulfilled the criteria. We randomized the list of these filtered projects using the RAND function provided by Microsoft Excel: RAND assigns a random number between 0 and 1 to each project, and the projects are then sorted on these numbers to obtain a randomized list. We began by selecting the first 120 projects from the randomized list; when we discovered that six of these projects involved only trivial migration changes, we eliminated them as well, resulting in a final project count of 114: 74 Maven projects and 40 Android ones. We obtained the binary code for these projects by manually accessing the Maven Central Repository. The studied projects are listed in Appendix A; Appendix B lists the third party libraries involved.
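This sort-by-random-key shuffle can be sketched in Java as follows (our own illustration; Collections.shuffle would be the idiomatic equivalent, and the seed here exists only to make the sketch reproducible):

```java
import java.util.*;

// Sketch of the sort-by-random-key shuffle used for project selection:
// pair each project with a random number in [0,1), sort on that key,
// and take a prefix of the result.
public class RandomSample {
    static List<String> sample(List<String> projects, int n, long seed) {
        Random rnd = new Random(seed); // seeded only for reproducibility of the sketch
        Map<String, Double> key = new HashMap<>();
        for (String p : projects) key.put(p, rnd.nextDouble());
        List<String> shuffled = new ArrayList<>(projects);
        shuffled.sort(Comparator.comparingDouble(key::get));
        return shuffled.subList(0, Math.min(n, shuffled.size()));
    }

    public static void main(String[] args) {
        List<String> projects = List.of("p1", "p2", "p3", "p4", "p5");
        System.out.println(sample(projects, 3, 42L));
    }
}
```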

4.2 Data Gathering

To study migration of external dependencies, we collected information on: (1) impacts on projects undergoing library migration; and (2) impacts on transitively dependent client systems after library migration in a project they were depending on. We decided to use tools for data extraction where possible, as a manual approach would have been error-prone and time-consuming for the scale of our study. Our process of gathering data involved the following steps.

4.2.1 Find it EZ

In order to answer our research questions, we had to locate the classes directly impacted by the library being migrated. Find it EZ is a language-independent software productivity tool that can be used for change impact analysis at industrial scale20; thus, it suited our needs, and a community version is available free of charge.

Find it EZ requires a search query to locate the dependent classes in a project. We decided to use the unique common parent package of a library as this search query. The package name appears as an import reference in any class that depends on the library; hence there is no possibility of missing a dependent class during the search, meaning that we avoid false negatives, assuming we made no errors. We manually compiled this package name for each library and then used it to locate the impacted classes through Find it EZ. We then manually examined each of the reported impacted classes during our analysis; hence, there is no possibility of a false positive in our final set of impacted classes, assuming we made no errors.

20https://www.finditez.com/ [accessed 2018/10/11]
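The package-prefix search amounts to flagging any class whose import statements begin with the library’s parent package; a minimal stand-in for this step (our own code, not Find it EZ) might look like:

```java
import java.util.*;

// Minimal stand-in for the package-prefix search: a class is flagged
// as directly impacted if any of its import statements refers to the
// migrated library's unique common parent package.
public class ImportSearch {

    static boolean importsPackage(String javaSource, String parentPackage) {
        for (String line : javaSource.split("\n")) {
            String t = line.trim();
            if (t.startsWith("import ")
                    && t.substring("import ".length()).startsWith(parentPackage)) {
                return true;
            }
        }
        return false;
    }

    static List<String> impactedClasses(Map<String, String> sources, String parentPackage) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : sources.entrySet()) {
            if (importsPackage(e.getValue(), parentPackage)) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> sources = Map.of(
            "A.java", "import com.google.common.collect.Lists;\nclass A {}",
            "B.java", "import java.util.List;\nclass B {}");
        System.out.println(impactedClasses(sources, "com.google.common"));
    }
}
```

A production tool must also handle static imports and fully qualified references used without an import; the sketch covers only the common import-statement case.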

4.2.2 JAPICC

In order to determine the consequences of library migration for a project’s dependent clients, we decided to explore compatibility issues between the binary code of each project’s newer and older versions. If there are backward compatibility issues between successive versions of a project, any client relying on the older version will neither run nor compile with the newer one. For our analysis, we only examined the compatibility issues resulting from library migration in the project.

In order to achieve our goal we used the Java API Compliance Checker (JAPICC),21 a tool for checking the backward compatibility of a Java library API. The tool checks the class declarations of the older and newer versions to analyze changes that may break compatibility for a client depending on the older version of a system. The tool reports the compatibility issues that can arise for a client that tries to use the newer version of the project it depends upon. JAPICC reports the Java Virtual Machine or compiler error that would be generated for each breaking change found in the API. During our analysis, we only considered those breaking changes in the report that resulted from library migration.

4.2.3 Code Compare

We had to determine the severity and type of impact resulting from library migration in a project. In order to achieve this, we had to flag the discrepancies resulting from migration in the impacted classes of a project. We decided to use a diff comparison tool [Hunt and McIlroy, 1976]

21https://github.com/lvc/japi-compliance-checker [accessed 2018/10/11]

to highlight these discrepancies. We used Code Compare, an industrial-scale diff comparison tool,22 on the impacted classes for the versions involved in the migrations; Code Compare allows the user to view a line-by-line comparison between two different projects or files. The diff comparison flagged the discrepancies between successive releases. We examined the discrepancies that resulted from the migration of a library, which we then used to determine whether the system’s code broke. We only considered changes in functionality that break system code; by “break system code”, we mean that the code had compilation errors after migrating the system away from a library. We decided to ignore cosmetic changes (i.e., package and method name changes) that had trivial impact on the source code while answering our questions. If such changes were considered, all studied migrations might break the source code; e.g., changing the name of a package being used might result in compilation errors, but correcting such errors requires trivial effort.
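Line-by-line diff tools of this kind rest on the longest-common-subsequence (LCS) algorithm of Hunt and McIlroy [1976]; a textbook LCS-based sketch (our own code, not Code Compare’s implementation):

```java
import java.util.*;

// Textbook sketch of a line diff: compute the longest common
// subsequence (LCS) of the two line sequences, then report lines
// outside the LCS as deletions (-) or additions (+).
public class LineDiff {

    static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size(), m = newLines.size();
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i][j] = oldLines.get(i).equals(newLines.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                out.add("  " + oldLines.get(i)); i++; j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + oldLines.get(i)); i++;
            } else {
                out.add("+ " + newLines.get(j)); j++;
            }
        }
        while (i < n) out.add("- " + oldLines.get(i++));
        while (j < m) out.add("+ " + newLines.get(j++));
        return out;
    }

    public static void main(String[] args) {
        // A migration-style discrepancy: one import replaced, the rest unchanged.
        List<String> v1 = List.of("import com.ning.http.client.AsyncHttpClient;", "class C {}");
        List<String> v2 = List.of("import org.asynchttpclient.AsyncHttpClient;", "class C {}");
        diff(v1, v2).forEach(System.out::println);
    }
}
```

In our setting, the interesting output lines are the +/- pairs that touch uses of the migrated library; everything else is noise to be filtered during manual examination.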

4.3 Analysis and Results

In this section, we address the research questions based on the collected data.

RQ0: How often does library migration occur?

We found that 1,135 of the 9,561 studied projects underwent library migration (11.87%). These 9,561 projects were retrieved through our GitHub crawler in May 2017; recall from our earlier description that these were all the Java-based projects (in Maven and Android) accessible via GitHub with a popularity rating of at least 100 stars, which we use as a proxy for a minimum level of community interest in the system. Thus, many projects undergo library migration; it is not a rare phenomenon. We cannot speak to the frequency of library migration in nascent projects, or in projects that are unpopular for other reasons.

We found exactly one case in which a project underwent library migration twice: JStorm migrated from Google JSON Simple to Fast Json and then reverted back to Google JSON Simple when the developers were not content with the results of the initial migration.23 Our speculative explanations for why second migrations are otherwise absent include: none of the projects has been in existence long enough to experience the need for a second library migration; the developers’ choices about how to perform the initial library migration avoided the need for a second one; or the first library migration was sufficiently painful that the developers avoided doing it again.

22https://www.devart.com/codecompare/ [accessed 2018/10/11] 23https://github.com/alibaba/jstorm

RQ1: How often does library migration in a software system break it?

We found that 67.3% of library migrations broke the dependent system. This shows that dependencies are often coupled with the system, and moving them out can have a significant impact. However, we also found that 32.7% of migrations did not break the dependent system, having only minimal impact. The reasons behind successful library migration without code breakage are listed in Figure 4.3 and described in detail below.

• Version upgrade

We found that 18 projects depended on libraries that changed their product name (i.e., package name) after a major release. This modification was due to a change in ownership, developers, or hosting platform. Apart from these cosmetic changes, however, the new releases of such libraries were mostly backward compatible; as a result, the dependent systems were easily repaired.

Figure 4.3: Project count versus reasons behind successful library migration without code breaking.

For example, Jackson is a popular high-performance JSON processor for Java. Before 2011 it was maintained at Codehaus, a prominent open source Java project community and hosting platform; however, with the increasing popularity of GitHub, Codehaus’s influence started waning. The Jackson developers therefore decided to move their project to GitHub. They released Jackson version 2.x, which changed the name of their parent package from codehaus to fasterxml. Moving from Jackson 1.x to 2.x in most cases (depending on API usage) only requires a change in package names.
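Concretely, much of a Jackson 1.x-to-2.x migration is a mechanical rewrite of import statements. The sketch below handles the common case of org.codehaus.jackson.map (e.g., ObjectMapper) moving to com.fasterxml.jackson.databind; other 1.x subpackages moved to different 2.x packages, so this mapping is deliberately partial, and the rewriting code is our own illustration:

```java
// Sketch of the mechanical part of a Jackson 1.x -> 2.x migration:
// classes in org.codehaus.jackson.map (e.g., ObjectMapper) moved to
// com.fasterxml.jackson.databind. Other subpackages moved elsewhere,
// so a real migration needs a per-package mapping and a review of
// each use site; this sketch handles only the common databind case.
public class JacksonImportRewrite {
    static final String OLD_PREFIX = "import org.codehaus.jackson.map.";
    static final String NEW_PREFIX = "import com.fasterxml.jackson.databind.";

    static String rewrite(String importLine) {
        return importLine.startsWith(OLD_PREFIX)
                ? NEW_PREFIX + importLine.substring(OLD_PREFIX.length())
                : importLine;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("import org.codehaus.jackson.map.ObjectMapper;"));
    }
}
```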

• Smooth replacement

We found 10 libraries (shown in Table 4.1) that support smooth replacement (i.e., replacement with minimal code impact). These libraries were involved in 14 smooth migrations in our study, as three of them (Asynchronous Http Client, NineOldAndroids, and Picasso) were involved in more than one library migration. The pairs of dependencies (Google Guice Core Library and Javax Inject, Guava and Guava Mini, Retrofit and OkHttp) have the same developers and allow easy replacement with each other.

For example, Javax Inject and Guice are both dependency-injection frameworks. Javax Inject is provided by standard Java with standardized functionality, while Guice is an external dependency. Javax annotations can function as direct replacements for Guice with little effort. The author of Guice (Bob Lee) is a spec lead for the Javax team; thus, the basic infrastructure of both libraries is similar. Moving from Javax to Guice is simple since Javax is a subset of Guice, while moving in the opposite direction might require deletion of certain functionality.

In cases where library migration is the last resort, developers should first consider an alternative that can accommodate smooth replacement.

Table 4.1: Libraries that support smooth replacement with another one while having minimal code impact.

From → To

Jackson Dataformat YAML24 → SnakeYAML25
Asynchronous Http Client by com.ning26 → Asynchronous Http Client by org.asynchttpclient27
Google Guice Core Library28 → Javax Inject29
Apache Log4j API30 → SLF4J API31
Guava32 → Guava Mini33
Plexus Archiver Component34 → Google Guice Core Library35
Apache HttpComponents Client for Android36 → HttpClient Android Library37
NineOldAndroids38 → standard Android SDK39
Retrofit40 → OkHttp41
Picasso42 → Glide43

• Removed impact set

For 12 systems, the module dependent on the library to be removed was itself discarded. These modules were so tightly coupled with the dependency that the developers preferred to completely rewrite the module instead of modifying the existing code. Since the complete impact set (including transitive impacts) using the dependency was removed, there is no question of breaking the code. For most of these migrations the module was already segregated

24https://mvnrepository.com/artifact/com.fasterxml.jackson.dataformat/jackson-dataformat-yaml [accessed 2018/10/11] 25https://mvnrepository.com/artifact/org.yaml/snakeyaml [accessed 2018/10/11] 26https://mvnrepository.com/artifact/com.ning/async-http-client [accessed 2018/10/11] 27https://mvnrepository.com/artifact/org.asynchttpclient/async-http-client [accessed 2018/10/11] 28https://mvnrepository.com/artifact/com.google.inject/guice [accessed 2018/10/11] 29https://mvnrepository.com/artifact/javax.inject/javax.inject [accessed 2018/10/11] 30https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api [accessed 2018/10/11] 31https://mvnrepository.com/artifact/org.slf4j/slf4j-api [accessed 2018/10/11] 32https://mvnrepository.com/artifact/com.google.guava/guava [accessed 2018/10/11] 33https://mvnrepository.com/artifact/com.github.davidmoten/guava-mini [accessed 2018/10/11] 34https://mvnrepository.com/artifact/org.codehaus.plexus/plexus-archiver [accessed 2018/10/11] 35See footnote 28. 36https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient-android [accessed 2018/10/11] 37https://mvnrepository.com/artifact/cz.msebera.android/httpclient [accessed 2018/10/11] 38https://mvnrepository.com/artifact/com.nineoldandroids/library [accessed 2018/10/11] 39See footnote 15. 40https://mvnrepository.com/artifact/com.squareup.retrofit/retrofit [accessed 2018/10/11] 41https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp [accessed 2018/10/11] 42https://mvnrepository.com/artifact/com.squareup.picasso/picasso [accessed 2018/10/11] 43https://mvnrepository.com/artifact/com.github.bumptech.glide/glide [accessed 2018/10/11]

and had minimal coupling with other parts of the project.

For example, the project OpenZipkin dropped their support for the Elasticsearch native transport. The release notes stated that the user base had no objection to removing the native support, and that doing so would allow a reduction in size44:

Most tools in the Elasticsearch world use HTTP as means to insert or query data.

Before, we supported storage commands via the HTTP transport (port 9200) or the native transport (port 9300). Polling users, we found no resistance to removing the native transport. By removing this, we deleted 3.6K lines of code and simplified the dependencies of the project, allowing for easier maintenance moving forward.

• Extraction of code

We found that 10 projects depended on only a small part of a library. Their developers decided to drop the library in favour of extracting the required classes from it as-is, integrating the extracted part into their system. This extraction made the project more approachable for clients by reducing the system’s total external dependencies. It was only performed for libraries whose licenses were lenient enough to allow it; e.g., certain licenses only allow use of the software as a complete JAR and restrict the user from performing reverse engineering of any kind. Since the required part was extracted as-is, including the packages, this change did not break the code.

For example, the developer of checkstyle decided to extract code from Apache Commons Lang since checkstyle depended on only a small part of it45:

We need to remove the dependency, as right now we depend only on util fields of that library, we could copy them to our CommonUtils.java class. Dependency should be removed.

44https://github.com/openzipkin/zipkin/releases?after=1.23.1 [accessed 2018/10/11] 45https://github.com/checkstyle/checkstyle/issues/2428 [accessed 2018/10/11]

The results show that, in many projects, library migration can break the code. The key exceptions involve situations in which the migration is between closely related libraries, in which the dependent modules are completely rewritten, or in which the limited functionality needed is extracted from the library.

RQ1.1: When do developers remove an external library without adding a new one?

Developers usually remove a dependency in favour of an alternative one; however, in 43.6% of migrations, libraries were removed from a project without a new one being added. The results show that a considerable number of developers decided not to add a dependency after migration. Figure 4.4 shows the reasons behind the complete removal of a dependency from a project. For 37 migrations, developers dropped a dependency outright due to the availability of a better alternative offered by the standard Java software development kit (SDK). An SDK is typically a set of software that is shipped along with the platform and does not require any extra addition for functionality usage. The benefits of using an SDK include reductions in size and external dependencies, better support, and stability. For example, checkstyle decided to drop its dependency on Guava since JDK 8 started providing better support46:

46https://github.com/checkstyle/checkstyle/issues/3433 [accessed 2018/10/11]

Figure 4.4: Project count versus reasons behind complete removal of dependency from a project.

Since now we use Java 1.8, it[’]s worth cutting down on checkstyle’s dependencies on Guava library: Guava’s Optional replaced with Java’s native; Guava’s Predicate and Iterables should be replaced with Java’s Predicate and streams.

For 15 migrations, developers discarded a library when there was no further need of the functionality provided by it. This involves a change in approach or a drop of support for certain functionality that was previously provided by the system. For example, the Hangout project provided various common utilities including input/output/filter to the user for Logstash, an open source tool for collecting, parsing, and storing logs for future use. They provided a support filter for the Jinjava template engine; however, later they decided to drop support for this filter and removed the complete package along with it.

For 10 projects, the developers preferred to write their own in-house module instead of using an external library. The in-house development was motivated by customization and control of the module in future releases, benefits that are not obtained from an external library, since the purpose of using one is to use the library as-is in order to save resources. For example, FXGL, a game development framework, decided to drop using Ehcache in favour of developing an internal cache, as Ehcache was creating random memory leaks which were very difficult to detect.

For 10 projects, the developers removed the external library after extracting the required part from it. They integrated the extracted part into their system. For example, Zipkin, a distributed tracing system, removed their dependency on Okio as they were only using one class from it. They extracted the Base64 class from Okio and integrated it into their system.47

The results show that around half of the projects preferred to completely drop an external library when they had an alternative solution that did not require using a dependency.

47https://github.com/openzipkin/zipkin/pull/1245 [accessed 2018/10/11]

RQ2: What means are used by developers for notifying about library migrations?

41.2% of migrations were discussed in release notes: only 19% of the Android projects mentioned their migrations, in contrast to 43% of the Maven projects. These numbers are surprisingly low, since migrating a library can directly impact the client using the dependent system. In order to dig further into this question, we manually analyzed the pull requests (PRs) of the studied projects. GitHub PRs allow developers to discuss and review potential changes within the community before pushing them to the repository. We found that 59.39% of migrations were discussed by the developer in PRs. The individual numbers also increase to 62.38% and 55% for Maven and Android, respectively. This still relatively low ratio shows that developers do not follow a thorough process before performing a migration. The number (98) of migrations discussed in PRs is greater than the number (68) found in release notes, suggesting that developers tend to preferentially use internal threads as a means of notification while moving away from a dependency, in contrast to release notes. However, these discussions are mostly not reviewed by general clients, who are only concerned with the release notes and the final product release; furthermore, there are still a large number of migrations that are not discussed at all: in the studied projects, many migrations were performed by the developers without being discussed in any of the explored sources.

We conclude that release notes are under-used for notification of dependency migrations. In contrast, developers prefer to use pull requests or other such internal threads, but this apparently remains insufficient considering the large number of migrations without any discussion or notification whatsoever.

RQ2.1: What are the rationales that drive developers to migrate libraries in a working system?

We manually classified the rationales behind library migration into 4 mutually-exclusive categories listed in Table 4.2. These categories were selected based on the motivation behind the change as explicitly stated in the sources of information available to us. These rationales were driven by expected post-migration improvements (i.e., optimized size, performance, scalability, etc.) in the system. We found that 52.72% of migrations were motivated by the availability of a “better” alternative.

Table 4.2: Mutually exclusive classification of rationales behind library migration.

Rationale                                  Maven  Android  Total
Moving to a better alternative                61       26     87
Intention to remove external dependency       39        0     39
Improving API                                  0       23     23
Unknown                                        9        7     16

In order to explore the criteria of what constitutes a better alternative according to developers, we further classified the implicit advantages of introducing a new library. Figure 4.5 lists these subcategories in descending order of the number of projects. 41.3% of the alternative libraries were selected based on better performance, functionality, resource consumption, and future scalability. For example, in the Metrics project, the developer replaced Google Caliper with Oracle JMH as JMH provided better functionality and performance than its counterpart. The developer of the PR for this migration posted the following48:

48https://github.com/dropwizard/metrics/pull/394 [accessed 2018/10/11]

Figure 4.5: Project count versus alternative library selection criterion.

There is a recent tool from [the] Oracle Java Virtual Machine engineers Oracle JMH, which is used for JDK benchmarking and verification. It has the advantage over Caliper by options for organizing multi-threaded benchmarking.

The owner of the repository replied49:

This is awesome. I hadn’t heard of JMH, and it resolves my long-standing complaint about how hard it is to produce a contended environment in Caliper. That said, I’d rather wait until JMH is available in Maven Central [Repository] to drop Caliper.

The rest of the alternatives were selected based on evolution of dependencies, reduction in total dependencies by unification or dropping extra libraries, size, better community, deprecation, avoiding bugs, and module segregation.

Secondly, we found that 23.3% of migrations were motivated by an intention to remove external dependencies from the system. These library removals were motivated secondarily by internal implementation, reduction in size, elimination of external dependencies, improvement in the Java SDK, and removal of unnecessary functionality. For example, cron-utils removed various dependencies as their usage was affecting the clients. One of the developers wrote50:

We would prefer to remove dependencies that are not required to provide the functionality we support up to that moment. The goal is that anyone can use cron-utils without having to download many transitive dependencies, which is most important for mobile applications. Another goal is to avoid downloading the same library twice due to version conflicts.

Thirdly, for the Android projects, 41.1% of libraries were migrated to improve the minimum API level of the application software (i.e., app). API level is an integer value that uniquely identifies the framework API revision supported by a version of the Android platform. Each year a new Android OS and API level is announced by Google along with deprecation and dropping of older ones.

49https://github.com/dropwizard/metrics/pull/394 [accessed 2018/10/11] 50https://github.com/jmrozanec/cron-utils/issues/259 [accessed 2018/10/11]

For example, for the project Circular Reveal, the developers removed deprecated dependencies to improve the minimum API level. The developer of the PR wrote51:

IMO, anything supporting below API 14 is a waste of time and resources - NineOldsAndroid dependency could be removed and save us thousands of method calls and kilobytes.

The NineOldAndroids library description confirms that new applications should instead rely on API level 14 directly:

NineOldAndroids is deprecated. No new development will be taking place. Existing versions will (of course) continue to function. New applications should use minSdkVersion="14" or higher which has access to the platform animation APIs.

These findings show that the Android community is strongly concerned about software evolution. This is due to the fact that the use of external libraries with deprecated API levels usually results in an application that will not work with the latest OS.

Finally, for 16 projects, we were unable to locate a rationale behind their library migrations from the considered sources. Because these changes went against the norms followed in other projects, we classified them as being performed for unknown rationales.

For example, in JStorm the developers moved from GSON to Fastjson, despite the fact that GSON is more widely used in the community and has much better performance in contrast to other available alternatives. We note that documentation and support for Fastjson was available in the Chinese language and the developers of JStorm were native Chinese speakers.

The explored sources and our results indicate that migrations of software libraries are usually driven by valid rationales, rarely attributable to whimsy.

RQ3: What impact does library migration in a system have on its dependent clients?

22.42% of migrations broke dependent client code. JAPICC distinguishes binary code incompatibilities into three severity levels: high, medium, and low. Low-severity problems can be considered warnings and do not affect the final compatibility verdict of the tool. The 37 breaking changes were given a verdict of incompatibility, and thus had at least one medium- or high-severity problem. These projects are listed in Appendix C.

51https://github.com/ozodrukh/CircularReveal/issues/63 [accessed 2018/10/11]

Table 4.3: Errors and types that impact client code after library migration.

Type                        Error
Signature                   This method has been removed because the return type is part of the method signature. A client program may be interrupted by a NoSuchMethodError exception.
Superclass method or field  1) Access of a client program to the fields or methods of the old superclass may be interrupted by a NoSuchFieldError or NoSuchMethodError exception. 2) A static field from a superinterface of a client class may hide a field (with the same name) inherited from the new superclass and cause an IncompatibleClassChangeError exception.
Class removal               A client program may be interrupted by a NoClassDefFoundError exception.
Exception                   A client program may be interrupted by an added/removed exception.

Table 4.3 summarizes the errors and types that impacted client code after library migration. The breaking of client code occurred due to discrepancies in either the data types or the methods of a system. 81% of the problems with data types were due to removed classes and methods. Due to strong coupling, developers preferred to revamp the dependent segments from scratch while using the new alternative library. JAPICC also detected discrepancies in the exceptions being thrown from methods or classes: the updated method either removed an exception or introduced an alternative one, as the old exception class was being used from the original library.
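To make the failure mode concrete, the following minimal sketch (our own illustration with hypothetical names, not drawn from the studied dataset) approximates what happens when a client calls a method that a migration removed. At run time the JVM raises a linkage error such as NoSuchMethodError; here we simulate the failed lookup reflectively, which surfaces as a NoSuchMethodException.

```java
// Illustrative sketch: a client compiled against an old library version calls
// a method that the migrated library no longer provides. The JVM then throws
// NoSuchMethodError at link time; we approximate that with a reflective lookup.
public class RemovedMethodDemo {
    // Returns true if the named public no-argument method exists on the class.
    public static boolean methodExists(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "removedHelper" is a hypothetical method dropped during migration.
        System.out.println(methodExists(String.class, "length"));        // existing API
        System.out.println(methodExists(String.class, "removedHelper")); // gone after "migration"
    }
}
```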

None of the non-Maven Android projects broke client code. This indicates that the studied Android projects were loosely coupled with their external libraries. We speculate that this behaviour of Android projects can be explained by the limited domain of external libraries currently being used. The result can also be attributed to the expected usage of the studied Android projects: in general, most Android apps are used as an Android Package Kit (APK) on a mobile phone; they are not developed to act as an external dependency for other Android systems.

The results show that library migration can propagate impacts to transitively dependent clients.

Table 4.4: Percentage of impacted classes.

Platform  Mean  Median
Maven      8.6    3.8
Android   15.5    8.8
combined   9.3    6.9

Table 4.5: Percentage of impacted LOC.

Platform  Mean  Median
Maven     12.1    5.58
Android   21.0   10.1
combined  15.0    7.5

This occurs only if the dependee project uses the migrated library in its API or exposed code. Migration then forces the developers to break their API contracts, causing backward incompatibility issues for the clients.

RQ4: How much effort is required for mitigating library migration?

To estimate the effort required by developers in mitigating library migration, we determined the impacts of transformations at two granularities: affected classes and affected lines of code (LOC).

In order to calculate these, we located the affected classes via Find it EZ (for the initial impact point) and the developer’s commit (for any secondary impacts); the affected LOC were obtained using diff comparison of the obtained classes, providing us with the additions and deletions for successive versions of the impacted classes. Finally, to get the percentage change at both granularities, we required the counts of total classes and LOC for each project, which were retrieved using ProjectCodeMeter,52 an industrial-scale estimation tool which allows the user to assess various code quality and team productivity metrics. To compare the data collected from different projects, we normalized the impact values for classes and LOC by each project’s total number of classes and LOC, respectively. We found that on average 9.3% of total classes and 15% of total LOC were affected as a result of library migration. Tables 4.4 and 4.5 summarize the results for the percentage

52http://www.projectcodemeter.com/cost_estimation/index.html

of classes impacted and of LOC impacted, respectively. The trend of the results suggests that the Android projects required slightly more effort (in terms of affected classes and LOC) for integration than the Maven ones did. The discrepancy between mean and median indicates that our data distribution is skewed to the right (the mean exceeds the median).

In order to visualize our data points, we plotted six histograms for the impacted percentages of classes and LOC: Figures 4.6, 4.7, and 4.8 show the percentage of classes impacted for Maven, Android, and all projects combined, respectively; Figures 4.9, 4.10, and 4.11 show the impacted percentages of LOC for Maven, Android, and all projects combined, respectively. The results depict that most migrations (69.7% for classes and 61.2% for LOC) had an impact of only 1–10% of the total classes and LOC, thus skewing the results. This, however, makes sense, as in these projects the purpose of using an external library was to fill a small void of functionality; these projects required comparatively minimal effort for migrating from a library. On the other hand, the remaining migrations (30.3% for classes and 38.8% for LOC) required significant effort for mitigation, as the library was at the core of the system, highly coupled with internal modules. There were 28 migrations that had an impact of 20% or more on the classes, while there were 43 such migrations for LOC. The outcome indicates that using an external library at the core of a system is dangerous, as this can have significant impact during the mitigation efforts involved in library migration.
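The normalization described above is simple arithmetic; a minimal sketch (with invented example numbers, not the thesis's data) of computing the impacted-class and impacted-LOC percentages:

```java
public class ImpactMetrics {
    // Percentage of impacted items relative to the project total.
    public static double impactPercent(int impacted, int total) {
        if (total <= 0) throw new IllegalArgumentException("total must be positive");
        return 100.0 * impacted / total;
    }

    public static void main(String[] args) {
        // Hypothetical project: 12 of 250 classes and 900 of 20000 LOC touched.
        System.out.printf("classes: %.1f%%%n", impactPercent(12, 250));    // 4.8%
        System.out.printf("LOC:     %.1f%%%n", impactPercent(900, 20000)); // 4.5%
    }
}
```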

4.4 Summary

We described the procedure for project selection and data gathering, and answered our research questions after analyzing the collected data. Our findings are as follows.

• RQ0: We found that many projects undergo library migration (11.9% of the studied projects).

• RQ1: We found that 67% of the migrations ended up breaking the dependent system. In 43% of the migrations, the developers decided against introducing a new alternative external library, so as to avoid library migration in the future.

Figure 4.6: Percentage of classes impacted versus Maven project (sorted in descending order).

Figure 4.7: Percentage of classes impacted versus Android project (sorted in descending order).

Figure 4.8: Percentage of classes impacted versus project from either platform (sorted in descending order).

Figure 4.9: Percentage of LOC impacted versus Maven project (sorted in descending order).

Figure 4.10: Percentage of LOC impacted versus Android project (sorted in descending order).

Figure 4.11: Percentage of LOC impacted versus project from either platform (sorted in descending order).

• RQ2: Migrations of external libraries are performed for various reasonable motivations (i.e., the existence of a better alternative), rarely attributable to whimsy. However, developers tend to be uncommunicative about these changes: they under-use release notes for notifying client users about such library migrations and they even under-use other communication channels (such as pull requests) that are more oriented towards other project developers.

• RQ3: 22% of library migrations impacted the dependent client projects. These migrations resulted in breaking of APIs that caused backwards compatibility problems for the client systems.

• RQ4: Mitigation of library migration required significant effort from the project developers. We found that on average developers had to change 9.3% of the classes in their systems while modifying 15% of the total lines of code.

Chapter 5

Tool

In Chapter 2 we presented a motivational scenario in which a developer wastes significant time and energy while migrating between libraries due to following a manual approach. To solve this problem, we create a prototype tool called EDW (for External Dependency Watcher), a recommendation system that assists users by forecasting the impacts of library migration. EDW is intended to assist developers during library migration by following a systematic approach, thereby avoiding redundant rework due to information overload. From our empirical study we found that migrating from a library can end up breaking a dependent system. The compilation errors that arise in such a broken system are localized to the classes and members that directly reference the types and fields of the library. For simplicity, we refer to these classes and members as the direct impact points.

EDW satisfies the following requirements: (1) locate the direct impact points of a library; (2) assist in locating the transitive impact points for the discovered direct impact set of a library; and (3) assist in estimating the mitigation effort for a library migration.

As demonstrated in Chapter 2, the first step during library migration is to locate these direct impact points. EDW automatically locates the directly impacted classes and members for a library to be migrated. EDW supports the discovery of direct impact points through the dependency viewer (Section 5.2) and editor (Section 5.3) modules. The dependency viewer module allows the user to view all impacted classes for a selected library. Then the editor module can be used to locate and highlight the directly affected members for an impacted class.

Once the direct impact points have been located, their transitive impacts have to be determined in order to discover the complete impact set of a library. Without determining the transitive impact points, the developer cannot figure out the ripple effects of a library to be migrated. EDW assists in locating the transitive impact points through the impact analysis module (Section 5.4). The module calculates the ripple effects of changing the direct impact points by utilizing the impact analysis provided by the JRipples tool [Buckner et al., 2005].

The process of locating all transitive impact points for a library with several direct impact points can result in information overload. The ripple effect can become huge, making it harder for the developer to recognize the relationship between direct and transitive impact points. Therefore, a graph can assist the developer to direct their search during library migration. EDW generates this graph through the graph builder module (Section 5.5). The generated graph visualizes the dependency links between direct and transitively impacted classes.

Before initiating a library migration, the developer has to decide whether the benefits obtained from migration outweigh the migration cost. Therefore, the availability of quantitative measures can provide developers with information about some aspects of the cost of migration. EDW provides metrics to estimate the cost of migration through the metrics builder module (Section 5.6). The calculated metrics can be used to estimate the mitigation effort for a library migration.

Section 5.1 describes the design and structure of EDW. Section 5.2 describes the dependency viewer module. Section 5.3 describes the editor module. Section 5.4 describes the impact analysis viewer module. Section 5.5 describes the graph builder module. Section 5.6 describes the metrics builder module. Finally, Section 5.7 describes the workflow of applying EDW to a library migration.

5.1 Overview of EDW

EDW is an Eclipse plugin developed using the Plug-in Development Environment (PDE) of Eclipse.53 It is developed for the latest Eclipse Oxygen54 build and works for Java-based Maven projects.

EDW starts its analysis with the creation of an evolving interoperation graph (EIG) [Rajlich, 2000]. The EIG models a software system as a set of components and their interactions: the components are represented as nodes while their interactions are represented as edges. As the software evolves, changes occur in components and can even propagate to other modules of the system. The EIG supports marking its nodes and edges to record these changes for the user to view and analyze; classes can be marked as “Impacted”, “Propagating”, or “Next”. An “Impacted” mark indicates that the class will be changed during the incremental change. A “Propagating” mark indicates that the class itself will not be changed but its neighbouring nodes might be affected. Finally, a “Next” mark indicates that the marked class might change as a consequence of changing a neighbouring node. The analysis ends when there are no longer any classes marked as “Next”.
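The marking discipline can be sketched as a worklist walk over the class-dependency graph. The sketch below is our own simplified, fully automatic rendering of the rules above (hypothetical class names); the real tool instead asks the user to decide the fate of each “Next” class, whereas here every “Next” class is conservatively re-marked “Propagating” so the walk continues.

```java
import java.util.*;

public class EigSketch {
    public enum Mark { IMPACTED, PROPAGATING, NEXT, UNCHANGED }

    // dependents maps a class to the classes that depend on it.
    public static Map<String, Mark> propagate(Map<String, Set<String>> dependents,
                                              Set<String> directImpact) {
        Map<String, Mark> marks = new HashMap<>();
        Deque<String> next = new ArrayDeque<>();
        for (String c : directImpact) marks.put(c, Mark.IMPACTED);
        for (String c : directImpact)
            for (String n : dependents.getOrDefault(c, Set.of()))
                if (!marks.containsKey(n)) { marks.put(n, Mark.NEXT); next.add(n); }
        // The analysis ends when no class remains marked "Next". In EDW the user
        // inspects each "Next" class; this sketch auto-decides "Propagating" so
        // that the dependents of every candidate are visited as well.
        while (!next.isEmpty()) {
            String c = next.remove();
            marks.put(c, Mark.PROPAGATING);
            for (String n : dependents.getOrDefault(c, Set.of()))
                if (!marks.containsKey(n)) { marks.put(n, Mark.NEXT); next.add(n); }
        }
        return marks;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
            "LibFacade", Set.of("Service"),
            "Service", Set.of("Controller"));
        System.out.println(propagate(deps, Set.of("LibFacade")));
    }
}
```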

EDW comprises five modules, each having its own graphical view (in the Eclipse IDE) for interacting with the user. The data collected during analysis is kept in a centralized database accessible by all the modules. We provide a distinct Eclipse perspective for EDW that arranges the graphical views for the user, placing the editor in the centre with the impact analysis viewer to its right and the project explorer to its left; Figure 5.1 shows EDW operating on the project cron-utils55 while using our default view arrangement. The remaining modules are available in views at the bottom of the perspective. The views can be moved in accordance with developer preferences, as per standard Eclipse customizability support. Figure 5.2 highlights the relationships between these modules (further described in the subsequent sections). The boxes represent the five modules, one of which (the dependency viewer module) contains three submodules. The dashed arrows represent dataflow between (sub)modules. The developer is expected to start with the external JAR submodule, and this control flow is shown as a solid arrow; other control flow is flexible at the developer’s discretion (i.e., the developer can switch to different views arbitrarily).

53https://www.eclipse.org/pde/ 54https://www.eclipse.org/oxygen/ 55https://github.com/jmrozanec/cron-utils

Figure 5.1: EDW analysis on the project cron-utils while using our standard view arrangement.

Figure 5.2: EDW structure and interaction between modules.

5.2 Dependency Viewer Module

The dependency viewer module is the core of EDW, as it acts as an interface that allows interaction with the other provided modules; it is divided into three submodules: (1) the external JAR submodule; (2) the affected classes submodule; and (3) the imports submodule. These submodules collectively perform discovery of direct impact points for EDW. The external JAR submodule displays all used libraries for the selected project, while the affected classes and imports submodules list the impacted classes and used imports for a selected library, respectively. Figure 5.3 shows the dependency viewer after analysis on the project cron-utils. The user has selected Guava in the external JAR submodule. This selection resulted in populating the affected classes and imports submodules with the impacted classes and their used imports from Guava, respectively. The affected class CronBuilder and the imported class Function have also been expanded to view the affected members and API signature, respectively.

Figure 5.3: Dependency viewer after analysis on the project cron-utils.

5.2.1 External JAR submodule

Maven maintains the required build information for a project, including all dependency links, in the POM.xml file. The external JAR submodule therefore parses the POM.xml file in order to locate all external dependencies being used in a project. Creating an EIG from scratch every time for a system of significant size can waste time and resources; by locating all the external dependencies at the start, EDW can perform its analysis for different libraries without recreating the EIG each time.

In order to determine impact points, the external JAR (EJ) submodule extracts the least common parent package name for the library that the user wants to migrate. This package name is referenced as an import declaration in any class that depends on the library. The heuristic of using the least common package name for the search has two advantages. First, searching for this name is faster, as the number of packages is significantly lower than the total number of classes. Second, using a single package name results in a unique and refined search result, in contrast to using the set of packages that directly contain a class: in the latter case, the same class can be returned more than once. The EJ submodule provides the extracted package as a query to the Eclipse search engine, which in turn locates the affected classes for a proposed library migration.

The Eclipse search engine requires creation of a search pattern, a search scope, and a result collector. EDW creates a search pattern that is limited to searching import references using an exact-match strategy, while the search scope is restricted to the workspace of the project being analyzed. Searching with the Eclipse search engine is efficient because the IDE already holds a Java model representation of the project. This model is a lightweight representation of the Java project; it does not contain as much information as the AST, but is fast to create. Searching via any other means would have required creating an AST or model from scratch, which would have been expensive for our analysis. The search returns the initial impact set of the migration, which is then provided as an input to the affected classes and imports submodules.
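The least-common-parent-package heuristic amounts to taking the longest common prefix of the imported classes' package names. A self-contained sketch of that computation (a hypothetical re-implementation with invented imports, not EDW's actual code):

```java
import java.util.*;

public class PackagePrefix {
    // Longest common package prefix of a set of fully qualified imported classes.
    public static String leastCommonPackage(List<String> imports) {
        if (imports.isEmpty()) return "";
        List<String[]> pkgs = new ArrayList<>();
        for (String imp : imports) {
            String[] parts = imp.split("\\.");
            // drop the trailing class name, keeping only package segments
            pkgs.add(Arrays.copyOf(parts, parts.length - 1));
        }
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; ; i++) {
            String segment = null;
            for (String[] parts : pkgs) {
                if (i >= parts.length) return prefix.toString();
                if (segment == null) segment = parts[i];
                else if (!segment.equals(parts[i])) return prefix.toString();
            }
            if (prefix.length() > 0) prefix.append('.');
            prefix.append(segment);
        }
    }

    public static void main(String[] args) {
        System.out.println(leastCommonPackage(List.of(
            "com.google.common.base.Function",
            "com.google.common.collect.Sets")));  // prints com.google.common
    }
}
```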

5.2.2 Affected classes submodule

The affected classes submodule analyzes the search result passed from the external JAR submodule in order to produce the list of affected classes. After retrieving this list, it traverses the EIG of the system and marks the affected classes as “Impacted”; these marks are later used by the impact analysis module to locate the transitive impact set. It also provides the list of affected classes as input to the editor module for further analysis.

5.2.3 Imports submodule

The imports submodule analyzes the search result passed from the external JAR submodule in order to locate the used classes of the selected library. It then extracts, from the library being migrated, the API signatures of these used classes. The API signatures comprise the publicly available methods offered by the imported classes of the library, and can assist in understanding the usage of the library and its coupling in the system.

5.3 Editor Module

The editor module locates the directly affected members (methods or fields) of the discovered impact set (classes) in order to mark them as “Impacted” in the EIG. EDW analyzes the classes by creating abstract syntax trees (ASTs) for the impact set. An AST represents Java source code as a tree, which is more convenient to analyze and modify programmatically. In a simple AST, every element of the source code is mapped to a node or a subtree. However, such an AST misses certain information, such as the types of variables, or parent and child classes. This information is contained in the bindings (from uses of names to their declarations). EDW utilizes bindings to support polymorphism; therefore, it can also locate the impacted overridden methods in a class.

Figure 5.4: Editor after selection of the affected class StringValidations.

In order to save resources, EDW creates ASTs following the conservative strategy of lazy instantiation: an AST is created for an affected class only when the user selects it for analysis for the very first time. Without this strategy, EDW would have had to create the ASTs of all affected classes at the same time, likely running out of memory in addition to causing a slow startup period.
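Lazy instantiation of this kind is conveniently expressed as a memoizing cache. A minimal sketch (the string value is a stand-in for a parsed AST; this is our own illustration, not EDW's implementation):

```java
import java.util.*;

public class LazyAstCache {
    private final Map<String, String> cache = new HashMap<>();
    public int parses = 0;  // counts how many times we actually "parsed"

    // Returns the (stand-in) AST for a class, parsing it only on first request.
    public String astFor(String className) {
        return cache.computeIfAbsent(className, name -> {
            parses++;                     // real code would invoke the parser here
            return "AST(" + name + ")";   // stand-in for an expensive parse result
        });
    }

    public static void main(String[] args) {
        LazyAstCache c = new LazyAstCache();
        c.astFor("StringValidations");
        c.astFor("StringValidations");  // cache hit: no second parse
        System.out.println(c.parses);   // prints 1
    }
}
```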

The ASTVisitor class56 is used for traversing and searching the nodes of the instantiated AST.

ASTVisitor conforms to the Visitor design pattern57 that allows efficient search of nodes in an AST.

56https://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.cdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fcdt%2Fcore%2Fdom%2Fast%2FASTVisitor.html [accessed 2018/10/11] 57https://sourcemaking.com/design_patterns/visitor [accessed 2018/10/11]

Figure 5.5: Impact analysis view after selecting the affected class StringValidations from the dependency viewer.

The EDW visitor locates the AST nodes that depend on the referenced classes of the library to be migrated. The immediate parent (method or field) of each found node is stored in the database along with its statement line number. In order to highlight the impacted code segments for modification, EDW attaches Eclipse markers to them. The editor module then allows the user to modify these impacted code segments while mitigating a library migration.

Figure 5.4 shows an instance of the editor after selection of the affected class StringValidations. The directly impacted code segments found by the AST traversal are highlighted using markers.
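The visitor-based search can be illustrated on a toy AST. The node type below is our own simplified stand-in (not Eclipse JDT's ASTVisitor): the visitor collects every node whose referenced type falls under the migrated library's package prefix.

```java
import java.util.*;

public class VisitorSketch {
    // Toy AST node: a referenced type name plus child nodes (hypothetical, not JDT).
    public static class Node {
        final String referencedType;
        final List<Node> children;
        public Node(String referencedType, List<Node> children) {
            this.referencedType = referencedType;
            this.children = children;
        }
    }

    public interface Visitor { void visit(Node n); }

    // Depth-first traversal applying the visitor to every node.
    public static void accept(Node n, Visitor v) {
        v.visit(n);
        for (Node child : n.children) accept(child, v);
    }

    // Collect references to types under the migrated library's package prefix.
    public static List<String> impactedRefs(Node root, String libraryPrefix) {
        List<String> hits = new ArrayList<>();
        accept(root, n -> {
            if (n.referencedType.startsWith(libraryPrefix)) hits.add(n.referencedType);
        });
        return hits;
    }

    public static void main(String[] args) {
        Node root = new Node("java.lang.String", List.of(
            new Node("com.google.common.base.Function", List.of()),
            new Node("java.util.List", List.of(
                new Node("com.google.common.collect.Sets", List.of())))));
        System.out.println(impactedRefs(root, "com.google.common"));
    }
}
```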

5.4 Impact Analysis Viewer Module

EDW utilizes the hierarchical view of JRipples to perform change impact analysis (CIA) on the initial impact set. As the JRipples code base is incompatible with the latest Eclipse build, we extracted its CIA module and updated it to make it compatible. As already discussed, the directly affected classes and their members are marked as “Impacted” in the EIG by the preceding modules. The impact analysis viewer module traverses the EIG edges in order to determine the transitively affected members of the migration. The transitive members are automatically marked as “Next”, which indicates that they should be manually inspected by the user. The user can change the “Next”-marked members to “Propagating”, “Impacted”, or “Unchanged” based on their observations while mitigating the migration. Further details of this module are available online.58

Figure 5.5 shows the impact analysis view after selecting the affected class StringValidations. The directly affected class members are marked as “Impacted” while any probably affected one is marked as “Next”.

58http://jripples.sourceforge.net/jripples/manual/reference/views/hierarchical_view.html [accessed 2018/10/11]

5.5 Graph Builder Module

EDW provides a graph builder module that highlights the dependency links between affected classes.

EDW supports the calculation of a complete CIA, which calculates probable impacts for all affected classes of the initial impact set at once. Therefore, performing a complete CIA on a project of significant size can result in information overload. Figure 5.6 shows the impact analysis view after performing a complete CIA on Guava for the project cron-utils. It can be seen that there are several classes marked as "Impacted" along with their probable impact segments marked as "Next". This much information can easily become overwhelming, as it is very hard to determine which "Next" class depends on which "Impacted" class. By using the dependency graph, on the other hand, the user can direct their search during library migration.

EDW creates a graph using the Eclipse Zest framework,59 an Eclipse visualization toolkit, which has a set of graph components built for Java. The module creates a directed graph (specifically, a tree) relative to an affected class. Each graph node represents a class, subdivided into three compartments, with its top compartment showing affected fields and its bottom one showing affected methods. The graph itself is organized into three parts: the top-level nodes depict the classes of the system that depend on an affected class of library migration; the bottom-level nodes show the classes referred to by the library upon which the affected class depends; and the middle node shows the affected class itself. The tree structure is designed to make the dependency link between artifacts more evident. For example, Figure 5.7 shows the dependency graph for the project cron-utils class with affected class StringValidations. (Note: any interdependencies between the nodes, other than with the affected class, are not shown.)
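The three-level layout described above can be modelled as a small data structure, independently of Zest. The record and field names below are illustrative assumptions, not taken from EDW's source code.

```java
import java.util.List;

// Illustrative model of the three-level dependency tree that the graph
// builder renders: dependents on top, the affected class in the middle,
// and the referenced library classes at the bottom.
public class DependencyTree {
    record Node(String className,
                List<String> affectedFields,
                List<String> affectedMethods) {}

    record Tree(List<Node> dependents,      // top: classes depending on the affected class
                Node affected,              // middle: the affected class itself
                List<String> libraryClasses) {} // bottom: library classes it refers to

    public static void main(String[] args) {
        // Hypothetical contents modelled on the cron-utils example.
        Tree t = new Tree(
            List.of(new Node("CronParser", List.of(), List.of("parse"))),
            new Node("StringValidations", List.of(), List.of()),
            List.of("com.google.common.base.Joiner"));
        System.out.println(t.affected().className()); // StringValidations
    }
}
```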

5.6 Metrics Builder Module

The metrics builder module allows the user to estimate the minimum effort required for migrating a selected library. It provides quantitative measures for the supported metrics that can provide the developer with information about some aspects of the cost of migration. The developer can then decide whether the benefits obtained from migration outweigh the migration cost.

59https://www.eclipse.org/gef/zest/ [accessed 2018/10/11]

Figure 5.6: Impact analysis view after performing complete CIA on Guava for cron-utils.

Figure 5.7: Dependency graph for the class StringValidations in the project cron-utils.

Currently, EDW supports four different metrics related to library migration, all measuring the average change in one kind of item: classes, methods, fields, or LOC. For example, Figure 5.8 shows the four metrics calculated for Guava on the cron-utils project. These metrics are calculated using the affected and total number for each metric respectively. As the supported metrics are preliminary, we have provided an Eclipse extension point for future development.
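The ratio computation behind these metrics can be sketched as follows. The counts below are made up for illustration (chosen to match the 36%-of-classes and 9%-of-LOC figures quoted in the workflow section); EDW derives the actual counts from its analysis modules.

```java
// Minimal sketch of the four migration metrics: each is the share of
// affected items over the total for that item kind.
public class MigrationMetrics {
    static double percentAffected(int affected, int total) {
        return total == 0 ? 0.0 : 100.0 * affected / total;
    }

    public static void main(String[] args) {
        // Illustrative counts only.
        System.out.printf("classes: %.0f%%%n", percentAffected(9, 25));     // 36%
        System.out.printf("methods: %.0f%%%n", percentAffected(32, 400));   // 8%
        System.out.printf("fields:  %.0f%%%n", percentAffected(12, 150));   // 8%
        System.out.printf("LOC:     %.0f%%%n", percentAffected(540, 6000)); // 9%
    }
}
```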

Figure 5.8: Four metrics calculated for the Guava library in the project cron-utils.

5.7 Workflow

To illustrate the workflow of applying EDW to library migration, we use the motivational example (from Section 2.1) with the cron-utils project, in which the developer wants to migrate away from Guava. Before performing the migration, the developer starts with the metrics builder module (see Figure 5.8), which estimates the impacts of this library migration. The metrics show the estimated percentage change required to migrate cron-utils away from Guava; from this information, the developer can decide whether the benefits obtained from migration outweigh the migration cost. The metrics show that migrating cron-utils away from Guava would impact 36% of the total classes while affecting 9% of the total code base.

During migration, the developer needs to locate the affected classes that are dependent on Guava. They can achieve this goal through EDW by selecting Guava in the external JAR submodule, which would then locate the affected classes of the library, passing them as input to the affected classes submodule; the external JAR submodule would also pass the used imports of Guava as input to the import submodule. Figure 5.3 shows the dependency viewer module with its populated subviews after performing analysis on cron-utils: the developer has selected the Guava dependency in the external JAR subview, which results in populating the other two subviews respectively.

Once EDW has located the directly impacted classes for the developer, the next step is to determine the affected members of each class. The developer can perform this by selecting a class from the affected classes submodule, which would open it in the editor module, locating and highlighting the affected members with directly impacted LOC for the selected class. Figure 5.4 shows an instance of the editor after selection of the affected class StringValidations; the direct impact points are highlighted in green for the user to view.

After EDW has located all affected classes along with their members, the developer must consider the transitive impact points of change. They can do this by performing a CIA on the located classes through the impact analysis view. Figure 5.5 shows the impact analysis view after selecting the affected class StringValidations: it shows the already located affected classes and members as “Impacted”, while recommending the potentially transitively impacted points as “Next”. The developer must explore the recommended points in order to consider whether these will truly be impacted after migration. While performing CIA on a project of significant size, the recommended points can result in information overload (e.g., see Figure 5.6). To avoid this, the developer can create a dependency graph via the graph builder view, highlighting the relationship between an impacted class and its transitively affected classes and members; Figure 5.7 shows the dependency graph for the affected class StringValidations. Once the developer has understood the consequences and impacts of library migration through EDW, they can finally replace Guava with an alternative.

5.8 Summary

In this chapter, we introduced our prototype tool EDW, which can assist developers while performing a library migration. The plugin is built as a recommendation system and requires the developer to work along with it while performing the change. Our plugin works on Maven Java projects in the context of software evolution where a developer wants to perform a library migration.

EDW is based on a modular approach, distributing its functionality among five modules, each with a corresponding graphical view. A developer can perform discovery of direct impact points, impact analysis, graph generation, and metrics generation using the respective modules. In order to support future development, we have provided Eclipse extension points for our modules.

Chapter 6

Simulation Study

This chapter discusses our evaluation of EDW through simulation. Our research questions are as follows.

RQ5. How well is EDW able to locate the initial impact set for library migration?

RQ6. What factors impact how well EDW performs?

As a point of comparison, we also evaluate JRipples on the same projects, in order to determine how well EDW performs in contrast for locating the initial impact set. We followed a similar procedure and setup for collecting the initial impact set for both tools, evaluating them at three different granularities. Section 6.1 explains our criteria for the selection of evaluated systems. Section 6.2 discusses the procedure followed for evaluation. Section 6.3 presents the results of our evaluation.

6.1 Selection of Projects

We evaluate EDW on projects obtained from GitHub. In order to reduce bias, the selection of systems was based on a set of criteria as follows.

• Each selected project must not be already studied in our retrospective study from Chapter 4.

• Each selected project must have a library migration between any two of its successive released versions.

Table 6.1: Systems for evaluation.

ID  Name                   Library replaced
1   graylog2-server60      JSON.simple61
2   Esper62                Apache Commons Logging63
3   Choco solver64         SLF4J API Module65
4   WebProtégé66           Google Guice Core Library67
5   GeoWave68              Apache Log4j69
6   HAPI-FHIR70            JSR 374 (JSON Processing) API71
7   Commands72             Guava: Google Core Libraries For Java73
8   azure-cosmosdb-java74  JSON in Java75
9   Apache Dubbo76         EasyMock77
10  ta4j-origins78         Joda-Time79

• Each selected project must possess successive released versions that were both compilable under the latest Eclipse build. (We wanted to make sure that the systems were compatible with our tool.)

• Each selected project must not be deprecated.

• Each selected project must have a unique library being migrated (amongst the set of selected projects) to avoid redundancy.

60https://github.com/Graylog2/graylog2-server [accessed 2018/10/11]
61https://code.google.com/archive/p/json-simple/ [accessed 2018/10/11]
62http://www.espertech.com/esper/ [accessed 2018/10/11]
63https://commons.apache.org/proper/commons-logging/ [accessed 2018/10/11]
64http://www.choco-solver.org/ [accessed 2018/10/11]
65https://mvnrepository.com/artifact/org.slf4j/slf4j-api [accessed 2018/10/11]
66https://webprotege.stanford.edu/ [accessed 2018/10/11]
67https://mvnrepository.com/artifact/com.google.inject/guice [accessed 2018/10/11]
68https://github.com/locationtech/geowave [accessed 2018/10/11]
69https://logging.apache.org/log4j/ [accessed 2018/10/11]
70http://hapifhir.io/ [accessed 2018/10/11]
71https://static.javadoc.io/javax.json/javax.json-api/1.1/overview-summary.html [accessed 2018/10/11]
72https://github.com/aikar/commands [accessed 2018/10/11]
73https://opensource.google.com/projects/guava [accessed 2018/10/11]
74https://github.com/Azure/azure-cosmosdb-java [accessed 2018/10/11]
75https://github.com/stleary/JSON-java [accessed 2018/10/11]
76https://github.com/apache/incubator-dubbo [accessed 2018/10/11]
77http://easymock.org/ [accessed 2018/10/11]
78https://github.com/mdeverdelhan/ta4j-origins [accessed 2018/10/11]
79https://www.joda.org/joda-time/ [accessed 2018/10/11]

We used GitHub search for locating the projects of our study. We searched for Maven projects with at least 100 stars; the returned projects were sorted based on popularity of the repository. The first 10 projects that fulfilled our criteria were selected for our study. Table 6.1 lists these systems while Appendix D specifies their details.

6.2 Procedure

We evaluated EDW and JRipples on three different granularities, as follows.

• Class-granularity: Project classes which directly depended on the library under consideration.

• Method-granularity: Any method in the directly affected classes that was changed due to migrating the library under consideration.

• Field-granularity: Any field in the directly affected classes that was changed due to migrating the library under consideration.

6.2.1 Ground truth

To evaluate the selected projects, we first collected the "ground truth" (i.e., the actual impact set) from the actual replacements in developers' commits. The library migrations in each project were accompanied by a specific online commit on GitHub, e.g., a specific commit for the Joda-Time replacement in the project cron-utils.80 Ideally, such a commit only contains changes to the project resulting from a migration, and we assumed that this ideal is met (i.e., we ignore the possibility of needed changes missed by the committer, or of performed changes unrelated to the migration that were extraneously included in the commit). We recorded the full path of each artifact (affected classes, methods, and fields) in the impact set. For example, the following instances of artifacts were collected for each respective granularity during the evaluation of the Choco solver project.

79https://github.com/search [accessed 2018/11/10]
80https://github.com/jmrozanec/cron-utils/pull/107/commits/289577660c55a80baada1ca397c10af980464f9f [accessed 2018/11/10]

• Class-granularity: org.chocosolver.util.objects.graphs.MultivaluedDecisionDiagram

• Field-granularity: org.chocosolver.solver.constraints.Propagator.LOGGER

• Method-granularity: org.chocosolver.memory.EnvironmentException.EnvironmentException

6.2.2 Estimated impact set for EDW

We collected the estimated impact set for EDW by running it on each selected project in turn. Within EDW, we selected the library away from which the project was known to have been migrated in the past. EDW located and marked the impacted classes, methods, and fields of the library to be migrated. The impacted classes were retrieved from the affected classes submodule (Section 5.2.2) in the dependency viewer (Section 5.2), while the affected members were located through the editor module (Section 5.3). The reported results were recorded for each granularity and an estimated impact set with the full path of the artifacts was created. The full path for method- and field-granularity contains the path up to the parent class as a prefix, as shown in the example results for Choco solver in the preceding section.

6.2.3 Estimated impact set for JRipples

JRipples requires a manual search for the initial impact points; we therefore assumed knowledge of each system's key classes containing the import declarations of the library to be migrated. We collected the estimated impact set for JRipples on the projects through the following procedure.

1. With JRipples, we first located the initially impacted classes through manual, Eclipse-based search. Since a developer is free to follow whatever procedure they deem appropriate, we followed three independent strategies to generate queries, to be used for the search. The strategies vary in their selection of different combinations of library import prefixes, and we deem each to be a likely and feasible option. Each strategy was used to locate all affected classes, working from scratch each time.

• Strategy 1: Iteratively use all unique import declarations of the library to locate their affected classes. For the Guava example in the motivational scenario (Section 2), this strategy would entail using each of the 10 imports being used in the system (from com.google.common.base.Joiner to com.google.common.primitives.Int).

• Strategy 2: Iteratively use the set of common prefix names among the import declarations. For the Guava example in the motivational scenario, this strategy would be followed using the set of common prefix names com.google.common.base, com.google.common.collect, and com.google.common.primitives.

• Strategy 3: Use the prefix common amongst the used import declarations. For the Guava example in the motivational scenario, this strategy would be performed using com.google.common.
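The three query sets can be derived mechanically from a system's import list, which makes the strategies concrete. This sketch assumes the imports are available as plain strings; the import names below follow the Guava example, with illustrative class names.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Deriving the three JRipples search-strategy query sets from a system's
// import declarations. Illustrative only; not part of either tool.
public class SearchStrategies {
    // Strategy 1: every unique import declaration.
    static Set<String> strategy1(List<String> imports) {
        return new TreeSet<>(imports);
    }

    // Strategy 2: the package prefixes (each import minus its class name).
    static Set<String> strategy2(List<String> imports) {
        return imports.stream()
                .map(i -> i.substring(0, i.lastIndexOf('.')))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Strategy 3: the single longest dotted prefix common to all imports.
    static String strategy3(List<String> imports) {
        String prefix = imports.get(0);
        for (String imp : imports) {
            while (!imp.startsWith(prefix)) {
                prefix = prefix.substring(0, prefix.lastIndexOf('.'));
            }
        }
        return prefix;
    }

    public static void main(String[] args) {
        List<String> imports = List.of(
            "com.google.common.base.Joiner",
            "com.google.common.collect.Sets",
            "com.google.common.primitives.Ints");
        System.out.println(strategy2(imports));
        System.out.println(strategy3(imports)); // com.google.common
    }
}
```

Note how the query sets shrink from Strategy 1 to Strategy 3 while each individual query becomes broader, matching the precision decline reported in Section 6.3.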

2. We then reviewed the returned search results for each strategy to verify its impact. The verified affected classes were then manually marked in JRipples as "Impacted". There were overlaps in searches where the reviewed class was already marked from a previous search result. We recorded such classes as an "overlap" for the search strategy.

3. Once all affected classes were marked in JRipples, we iteratively searched for the affected methods and fields inside each affected class based on its used import declarations. The located members were then manually marked as "Impacted" in JRipples.

4. Similarly to EDW, we collected the marked results for each granularity and recorded an estimated impact set with the full path of each artifact.

6.2.4 Comparing estimated impact set and ground truth

We ignored all "Next" marked artifacts, as these require human intervention for confirming any impact of migration on them. To reduce the possibility of human error, we wrote a simple program that compared the estimated and actual impact sets. If a tool identified an artifact as impacted and it was part of the ground truth set, the comparator labelled it as a true positive (TP). Otherwise, if an artifact was identified by a tool as affected but was not part of the ground truth set, the comparator labelled it as a false positive (FP). Finally, if an affected artifact was present in the ground truth but was not identified by a tool, it was labelled as a false negative (FN). We obtained the average precision and recall for each granularity with both tools by calculating the unweighted arithmetic mean of the precisions and recalls for the individual projects.
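The comparator logic reduces to simple set operations over the full artifact paths. This is a minimal sketch of the comparison described above, not the actual program; the example paths are taken from the Choco solver illustration earlier in this chapter.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal impact-set comparator: TP = estimated ∩ ground truth,
// FP = estimated \ ground truth, FN = ground truth \ estimated.
// Precision and recall follow directly from the three counts.
public class ImpactSetComparator {
    static double[] score(Set<String> estimated, Set<String> truth) {
        Set<String> tp = new HashSet<>(estimated); tp.retainAll(truth);
        Set<String> fp = new HashSet<>(estimated); fp.removeAll(truth);
        Set<String> fn = new HashSet<>(truth);     fn.removeAll(estimated);
        double p = estimated.isEmpty() ? 0 : (double) tp.size() / (tp.size() + fp.size());
        double r = truth.isEmpty()     ? 0 : (double) tp.size() / (tp.size() + fn.size());
        return new double[] { p, r };
    }

    public static void main(String[] args) {
        Set<String> truth = Set.of(
            "org.chocosolver.solver.constraints.Propagator.LOGGER",
            "org.chocosolver.memory.EnvironmentException.EnvironmentException");
        Set<String> estimated = Set.of(
            "org.chocosolver.solver.constraints.Propagator.LOGGER");
        double[] pr = score(estimated, truth);
        System.out.printf("P=%.2f R=%.2f%n", pr[0], pr[1]); // P=1.00 R=0.50
    }
}
```

The per-project precision and recall values produced this way are then averaged with an unweighted arithmetic mean, as described above.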

6.3 Results

6.3.1 RQ5: How well is EDW able to locate the initial impact set for library migration?

Table 6.2 lists the true positives, false positives, and false negatives at class-granularity for both tools, across the ten projects; precision (P), recall (R), and F1 score (F1) [Sasaki, 2007] are calculated as averages across the ten projects. Note that F1 is the harmonic mean of precision and recall. "MSI" (manual search interpretation) represents the total number of search hits that the user has to manually interpret in order to verify their impacts, across all ten projects. Table 6.3 lists the equivalent results for method- and field-granularity for both tools, where "PTR" (points to remember) represents the total number of direct impact points that the user must keep in mind in the absence of tool support. The user needs to use some means (such as a piece of paper or whiteboard) for storing and retrieving the impact points; for simplicity, we refer to this process of storing and retrieving information as "memorization" and "remembering". Appendix E summarizes the results for individual projects at each granularity. JRipples under Strategies 2 and 3 cannot be applied to locate the affected members due to their abstract nature: the strategies do not specify the particular class of the third-party library being used in the system, hence Eclipse is unable to locate any points of impact. EDW has a precision of 1 for all granularities. For recall, it achieved over 0.99 for class- and field-granularity, and ∼0.86 for method-granularity. EDW produces no overlap, no need to manually interpret search results, and no points that must be remembered.

Table 6.2: Approach mean accuracy for class-granularity.

Approach              TP     FP    FN   P     R     F1    Overlap  MSI
EDW                   102.2  0     0.4  1     0.99  0.99  0        0
JRipples: Strategy 1  102.2  2.6   0.4  0.89  0.99  0.93  50       232
JRipples: Strategy 2  102.6  7.7   0    0.86  1     0.92  15.34    391
JRipples: Strategy 3  102.6  39.3  0    0.86  1     0.92  0        502.6

Table 6.3: Approach mean accuracy for method- and field-granularity.

Approach              Granularity  TP      FP    FN    P     R     F1    PTR
EDW                   method       32.4    37.1  0     1.00  0.86  0.92  0
JRipples: Strategy 1  method       32.4    9.8   4.7   0.85  0.86  0.82  155
JRipples: Strategy 2  method       0       0     0     0     0     0     N/A
JRipples: Strategy 3  method       0       0     0     0     0     0     N/A
EDW                   field        149.5   0     0.67  1     0.99  0.99  0
JRipples: Strategy 1  field        149.67  6.5   1.67  0.93  0.99  0.95  512
JRipples: Strategy 2  field        0       0     0     0     0     0     N/A
JRipples: Strategy 3  field        0       0     0     0     0     0     N/A

For comparison, JRipples places a much greater burden on the developer. While its recall is essentially the same as that of EDW (which should not be surprising, since EDW builds atop it), it produces significantly more false positives (as evidenced by reduced precision), overlap for two of the three search strategies, the need for frequent manual interpretation of search results, and many points to remember independently of the tool. We expect each of these burdens to be error-prone and fatiguing for the developer.

• Class-granularity.

EDW and JRipples under Strategy 1 achieved promising recall for 9 of the evaluated systems. However, the average recall drops to 0.99 because, for the project Esper, the tools did not find four of the affected classes. The reason for this drop is the usage of an on-demand import declaration, i.e., import org.apache.commons.logging.*. This issue is discussed in detail in the next section. On the other hand, JRipples under Strategies 2 and 3 was able to locate all affected classes and hence achieved perfect recall.

EDW achieved a precision of 1 for the evaluated systems. For the search strategies of JRipples, precision declines as the strategies move from more specific to more general. EDW was able to achieve better precision without any manual search interpretation needed; the JRipples strategies, on the other hand, required the developer to review an average of 232, 391, and 502.6 MSI, respectively.

Each FP, MSI, and overlap requires human attention; thus, each represents an opportunity to confuse the developer during migration. The JRipples strategies require manual interpretation and attention from the developer to verify the results, which requires extra decisions. Search Strategy 1 has slightly better precision than the other two approaches; however, it requires the developer to redundantly review 50% of the retrieved classes due to overlap. Since Eclipse search is not directly linked with JRipples, the developer must first review the search results and verify whether they are already marked or not. Manually and redundantly reviewing the same search results is an opportunity for confusion.

• Member-granularity.

Method-granularity recall. EDW and JRipples under Strategy 1 achieved promising recall for seven of the evaluated systems. However, their recall dropped for Choco solver (0.43), GeoWave (0.35), and azure-cosmosdb-java (0.87); the average recall therefore comes out to 0.86. For the project Choco solver, four directly impacted methods were not recalled by EDW, due to their nested nature, as explained in the next section. JRipples under Strategies 2 and 3 cannot be applied to locate affected members in some cases, due to their abstract nature. Table 6.4 summarizes the recall with the resulting FNs and "Next" (transitively affected) marked methods. Both tools were unable to directly mark transitively affected methods as "Impacted", which affected their recall. We discuss this issue in the next section.

Field-granularity recall. EDW and JRipples achieved promising recall for five of the evaluated systems. The project graylog2-server did not have any affected fields, while the projects WebProtégé, HAPI-FHIR, and azure-cosmosdb-java had only one affected field each; therefore, these projects were not considered while calculating average precision and recall. For the affected fields, the average recall drops to 0.99 only because, for the project Esper, the tools did not find four of the affected classes that contained the fields. JRipples under Strategy 1 did not find these four classes; under the other two strategies, it did.

Table 6.4: Recall of projects at method-granularity.

Project  TP  FN  Next  Recall
1        12  0   0     1
2        8   0   0     1
3        24  31  27    0.43
4        23  0   0     1
5        6   11  11    0.35
6        37  0   0     1
7        30  0   0     1
8        62  9   9     0.87
9        69  0   0     1
10       53  0   0     1

Method- and field-granularity precision. EDW achieved a precision of 1 for locating the impacted members in the evaluated projects. The precision of JRipples is comparatively worse than that of EDW for locating affected members; the decline is due to the limitations of Eclipse search for members inside a class. JRipples also requires the developer to remember an average of 155 and 512 points for methods and fields, respectively, during migration. JRipples cannot directly mark the affected LOC, and hence the developer is required to track these manually. These points are spread across different classes, which can make it difficult and impractical for the developer to memorize them. Therefore, each direct impact point represents an opportunity that requires human attention and can create confusion.

Discussion

The precision and recall of JRipples can be attributed to our assumption that the developer is clearly aware of the used imports. In a scenario where this assumption does not hold, the developer would have to first manually figure out the used imports. As demonstrated by our motivational scenario, the discovery of imports requires extra steps and decisions that can result in further confusion and mistakes. EDW, on the other hand, makes no such assumption; it does not require the developer to be aware of the imports of the library to be migrated.

Locating the initial impact set through JRipples results in more FPs, searches, MSI, and overlap. Each of these requires human interpretation for decision and verification, which can result in more mistakes and confusion. Moreover, for method- and field-granularity, the developer has to manually remember a significant number of PTR. Therefore, the JRipples approach can become impractical for a project that is significantly impacted by library migration. For example, in Esper, the three strategies resulted in 1,405, 2,949, and 2,976 MSI respectively, while there were a total of 2,831 PTR for affected members. Manually going through these with JRipples requires more decisions and hence can create greater difficulty and confusion.

On the other hand, EDW automates the steps for locating the initial impact set, with comparatively better precision. The developer is not required to manually and redundantly go through any search result and hence makes fewer decisions at this stage. It also marks the affected class or members, so the developer does not have to remember the direct impact points.

6.3.2 RQ6: What factors impact how well EDW performs?

We found that all false negatives discovered during our evaluation were due to three limitations of our tool, as follows.

• On-demand import declarations.

Our tool ignores any on-demand import declarations. Such declarations import all the classes in a package at once, without naming a specific class. We decided to ignore these import declarations since our empirical investigations suggested that most stabilized systems prefer to fully specify their imported classes, as this reduces extra dependencies. Moreover, EDW is a recommendation system that needs the developer to work along with it during migration; the developer should be aware of the library classes being used, which is a requirement for performing a successful transformation.

This issue can be worked around by replacing such import declarations through the programmatic support provided by the Eclipse IDE (i.e., "organize imports"), even on a temporary, internal basis. This could also be integrated directly into the tool by adding an extension point to the current search strategy.
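A trivial check distinguishes the two import forms involved, which illustrates why a class-name-based search cannot resolve on-demand imports. This snippet is an illustration of the declarations only, not EDW's actual filtering code.

```java
import java.util.List;

// On-demand imports end in ".*" and name no specific class; single-type
// imports name exactly one class and can therefore be searched for.
public class ImportKinds {
    static boolean isOnDemand(String importDecl) {
        return importDecl.endsWith(".*");
    }

    public static void main(String[] args) {
        List<String> decls = List.of(
            "org.apache.commons.logging.*",    // on-demand: skipped by EDW
            "org.apache.commons.logging.Log"); // single-type: handled
        for (String d : decls)
            System.out.println(d + " -> onDemand=" + isOnDemand(d));
    }
}
```

Running "organize imports" in Eclipse rewrites the first form into explicit single-type imports, which is the workaround suggested above.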

• Granularity.

As already discussed, our tool works at the granularity of class members (methods and fields). However, Java also supports declaring nested classes. Currently, our impact analysis module is unable to mark such nested members. For example, in Figure 6.1 the field Logger from the library is defined inside a class that also defines a nested type (an enumeration type called Trace). The methods of Trace use this field; however, they are declared at a granularity one level below our tool's reach. These methods are not marked by EDW as "Impacted" in the impact analysis view; the user would have to mark them manually during the analysis. This limitation can impact the recall of methods and members defined inside a nested element.

Figure 6.1: The field Logger used in a nested member.

• Transitively affected methods.

EDW and JRipples do not directly mark transitively affected methods as "Impacted", which increased the FNs and hence reduced the recall of the tools at this granularity. These transitively affected methods are first marked as "Next" by the impact analysis module. For example, Figure 6.2 shows five marked artifacts (method, field, or class) in the class CronParser; the method parse() is marked "Impacted" by the tool, while cronDefination() is labelled "Next". An artifact marked "Next" is considered to have a transitive dependency on a method marked "Impacted". For example, Figure 6.3 shows a method of AlwaysFieldValueGenerator in the project cron-utils that directly depends on the class List from the external library Guava. Figure 6.4 shows another method, from the class AlwaysFieldValueGeneratorTest, that depends on the method from Figure 6.3. The tools consider this a transitive dependency between the method of AlwaysFieldValueGeneratorTest and the external library. Human intervention is required to decide whether the method of AlwaysFieldValueGeneratorTest actually depends on the library or not. In reality, the method in Figure 6.4 does transitively depend on the library, since the Guava class List is used in the highlighted line.

In calculating recall during the evaluation of the tools, we did not consider "Next" marked artifacts to be in our estimated impact set. This is reasonable since, as in the example, the developer has to review the source code in order to figure out whether these artifacts would be impacted or not. The total number of "Next" marked artifacts for a project can become extensive, since all dependent neighbour nodes of an impacted artifact are first marked as "Next" by both tools. The developer then has to intervene to verify whether these recommended artifacts are actually transitively impacted or not. Considering all of them to be positives would have increased the recall of both tools at the expense of precision. Therefore, we decided to ignore all "Next" marked artifacts during our evaluation.

Figure 6.2: Areas marked "Next" and "Impacted" by the impact analysis view.

Figure 6.3: Method from the class AlwaysFieldValueGenerator in the project cron-utils that depends on the class List from the Guava library.

Figure 6.4: Method from the class AlwaysFieldValueGeneratorTest that depends on the class AlwaysFieldValueGenerator.

For the three mentioned projects, some of their “Next” marked methods were part of the ground truth, as they were indeed transitively impacted. These methods were therefore changed by developers in their commit for library migration. Ignoring all “Next” marked methods resulted in reducing the recall of our tool for these projects. However, the tools still recommended these transitively affected methods to the developer, which could have been verified after exploring the impacts of library migration. For most of these methods, developers would have been required to make only one extra decision in order to locate the complete impact set.

6.4 Summary

In this chapter we evaluated EDW on real-world Java projects, assessing precision and recall relative to the assumed ground-truth data. Our results demonstrate that EDW achieves promising results, with perfect precision for all considered granularities and a recall of 0.86 for affected methods, while performing almost perfectly for the other two granularities. We discussed three factors that can affect the performance of our tool, so that it can be improved in future. The recall of our tool can be impacted either by the usage of on-demand import declarations or by having affected members at nested granularity; since these are engineering details, these limitations can be easily addressed in future work. We also evaluated JRipples for locating the initial impact set of library migration. We found that using Eclipse search with JRipples has comparatively lower precision and results in more searches that require human interpretation; hence, there are more opportunities for mistakes and confusion for developers. It remains to be seen whether actual mistakes and confusion happen in practice.

Chapter 7

Human-Subjects Experiment

This chapter presents our controlled experiment with human participants, conducted in order to contrast the effectiveness and efficiency of EDW with JRipples. The experiment was approved by the Conjoint Faculties Research Ethics Board at the University of Calgary, under ethics approval REB18-0239. The rest of this chapter is organized as follows. Section 7.1 states the null and alternative hypotheses for the study. Section 7.2 presents the system selected for our experiment, along with the criteria that led to its selection. Sections 7.3, 7.4, and 7.5 discuss the tasks, participants, and data collection of our experiment, respectively. Section 7.6 highlights the procedure we followed. Section 7.7 presents the results of our experiment, followed by the limitations of our study in Section 7.8.

7.1 Hypotheses

The null and alternative hypotheses that we test quantitatively, with respect to RQ5, are as follows.

H0-1: There is no difference in the time taken to locate the initial impact set of a library using either EDW or JRipples.

HA-1: Participants using EDW to locate the initial impact set of a library will complete their task in less time than when performing the task using JRipples.

H0-2: There is no difference in precision and recall when locating the initial impact set of a library using either EDW or JRipples.

HA-2: Participants using EDW to locate the initial impact set of a library achieve better precision and recall for their task than with JRipples.

7.2 Subject System

We selected an open source Java project called JUNG⁸¹ (Java Universal Network/Graph Framework) for our experiment. JUNG supports the modelling, analysis, and visualization of data that can be represented as a graph or network. We selected its released version 2.1, which uses a third-party library named Guava. JUNG is a Maven project with a modular structure; we selected two of its modules, “api” and “io”, for our experimental tasks. These modules have different code bases, but follow a similar coding standard since they come from the same developers. The api module contains 30 classes with 10 kLOC, while io contains 54 classes with 6 kLOC. The selection of JUNG for our experimental tasks is based on the following reasons.

• JUNG is open source with its code publicly available on GitHub; this was a requirement for the tasks in our experiment.

• JUNG is a popular project on GitHub; its code base is standardized with proper commenting and documentation. The presence of self-explanatory variables, classes, and comments can assist an unfamiliar developer in comprehending the general structure of the system.

• We selected Guava for the library migration of our experimental tasks. Guava is a well-known utility and collection library provided by Google. Moreover, since the library is an extension of the standard Java collection library, it can be easier for an unfamiliar person to comprehend its usage in a system.

⁸¹ https://github.com/jrtom/jung

Table 7.1: Guava impact set for tasks.

Task  Module  Granularity  Affected
1     api     Class        7
1     api     Method       4
1     api     Field        9
2     io      Class        15
2     io      Method       51
2     io      Field        36

• The selected modules are of reasonable sizes, meaning that the direct impact set for Guava in each module is small enough to locate in the time available for the tasks.

7.3 Tasks

Two different tasks were performed by the participants during our experiment. Task 1 was performed on the api module, while Task 2 was performed on the io module. Participants were given 7.5 minutes per task with EDW and 30 minutes per task with JRipples. Table 7.1 lists the size of the Guava direct impact set for the three granularities of the two modules.

The participants were asked to find the complete set of classes and their members (i.e., fields and methods) that directly referenced the types and fields from Guava. Participants performed this task with both tools on each module, first EDW and then JRipples, to allow any bias (such as learning effects) to benefit JRipples. They were provided with detailed instructions for the tasks with each tool. Since the tasks involved locating the direct impact set, no prior understanding and comprehension of the code base was required before the experimental trials. The description of the tasks and their instructions is available in Appendix F.1.

7.4 Participants

We recruited eight graduate students from the University of Calgary for our experiment; Table 7.2 lists their characteristics. Note that, for familiarity, “1” means the participant was unfamiliar with the tool or language, whereas “5” means that they used it on an everyday basis. The selected participants were familiar with the Eclipse IDE and the Java programming language. All the participants were also familiar with third-party libraries in general and had used them in their recent projects. Four participants had industry experience of 3–8 years. Participant P6 had industry experience of 6 months, while two participants (P4 and P7) were employed in industry concurrently with their graduate studies. Moreover, participants P1 and P4 indicated that they used Eclipse and Java on a daily basis. The rest of the participants indicated that they had actively used Eclipse and Java in the recent past.

Table 7.2: Overview of participants’ experience.

             Familiarity (1–5)          Experience (years)
Participant  Eclipse  Java  libraries  industry  development
P1           5        4     4          0         5
P2           4        4     4          3         7
P3           4        4     4          0         4
P4           5        5     5          6         8
P5           4        4     3          0         3
P6           3        4     5          0.6       5
P7           4        4     5          8         12
P8           4        4     5          4         10

7.5 Experimental Procedure

The experiment started with the participant reading and signing a consent form; we then asked them to fill out a short pre-study questionnaire about their experience. Participants were then given a brief verbal overview of the experimental setup and the terminology to be used.

The participants were then given training sessions to familiarize them with the tools (EDW and JRipples). First, they were given a demonstration of how the tools work, along with a description of the features to be used. Participants then worked on a sample task with each tool in turn. During the training sessions, participants were allowed and encouraged to ask any questions they had.

Once the participants were comfortable with the tools, we started the actual experimental trials. The participants were provided with the source code, task description, and instructions for each task. We first explained the task and encouraged them to ask any questions before starting. The participants first performed Task 1 with EDW, with a time limit of 7.5 minutes. After completing Task 1 with EDW, participants performed the same task with JRipples, with a time limit of 30 minutes. They were asked to fill out a post-task questionnaire after completing Task 1 with each tool. The participants were then asked to perform Task 2 with both tools in the same order, again filling out a post-task questionnaire after each trial.

The participants were asked to announce their results once they were done, after which we recorded the completion time and saved the state of the tool in use. The participants recorded their results by populating the hierarchical view of the tools; this view contains all classes and members of a project under analysis. The located classes and members were marked as “impacted” by the participants during trials; we recorded the state of the view after each task. Announcing the result means that the participant had populated the view and had taken a final look at it to verify that the results roughly looked reasonable.

After completing both tasks with each tool, the participants were given a final questionnaire comparing their experience with the two tools. Each participant spent roughly 2–3 hours in total on the experiment and was given an honorarium of 100 CAD as compensation for their time. The task descriptions, instructions, and questionnaires are available in Appendix F.

7.6 Data Collection

A variety of data were collected during the experiment. Some of it came from our notes while observing the participants’ decisions and comments during their trials, while the rest came from the state of the Eclipse IDE after each task completion. We did not record audio or video of the participants during the experiment: recordings would have been expensive to analyze, would have complicated ethics approval, and might have made the participants less comfortable during the experimental trials.

7.7 Results

This section presents the quantitative and qualitative results for our study.

7.7.1 Quantitative

Hypotheses H0-1 and HA-1

Table 7.3 lists the individual times taken in minutes by the participants to perform the two tasks with EDW and JRipples. Table 7.4 lists the average time taken in minutes for the two tasks with each tool. Participants on average took ∼3 minutes to perform a task with EDW; they took 12.8 and 25.9 minutes to perform Tasks 1 and 2, respectively, with JRipples. While using JRipples, participants were able to perform Task 1 in comparatively less time than Task 2, as Task 1 had an overall smaller direct impact set. However, with EDW the participants on average required roughly the same amount of time for both tasks.

We conduct a t-test [Helmert, 1875] for dependent samples, in order to test the null hypothesis

Table 7.3: Time taken by individual participants for tasks (in minutes).

Participant  EDW Task 1  JRipples Task 1  EDW Task 2  JRipples Task 2
P1           2           11               3           27
P2           4           10               3           30
P3           3           13               3           30
P4           3           19               4           29
P5           3           15               2           19
P6           2           10               3           21
P7           2           11               2           27
P8           2           13               2           24

Table 7.4: Average time taken in minutes for tasks.

Tool      Task 1  Task 2
EDW       2.6     2.8
JRipples  12.8    25.9

(we use an online calculator⁸² to perform the calculation). The assumptions of the t-test for dependent samples are that the data are normally distributed and belong to the same experimental block. The times collected for EDW and JRipples are paired, as the same tasks were performed using both tools. We use the Kolmogorov–Smirnov (KS) goodness-of-fit test [Karson, 1967] to test the normality of our collected data (we use an online calculator⁸³ for the calculation); the null hypothesis is that the data is normally distributed, and the alternative hypothesis is that it is not. Table 7.5 lists the p-values for the KS test results at an α level of 0.01. (The α level specifies the acceptable probability of falsely rejecting a true null hypothesis.) As p > 0.01 in all cases, the null hypothesis cannot be rejected, and we proceed as though the data is normally distributed.

Table 7.5: Results of the Kolmogorov–Smirnov test for elapsed time.

Data source  EDW p-value  JRipples p-value
Task 1       0.33         0.71
Task 2       0.51         0.67
combined     0.13         0.62

Table 7.6 lists the results of one-tailed t-tests on the time taken for the tasks, as given in Table 7.3. At an α level of 0.01, p < 0.01 in all cases; hence we reject the null hypothesis H0-1. The results show that participants using EDW to locate the initial impact set of a library complete their task in significantly less time than when performing the task with JRipples.

⁸² https://www.socscistatistics.com/tests/ttestdependent/Default.aspx
⁸³ https://www.socscistatistics.com/tests/kolmogorov/Default.aspx

Table 7.6: Results of t-tests on time taken.

Data source  mean  std deviation  t     p
Task 1       7.68  5.65           9.55  0.00003
Task 2       14.2  12.2           17.1  0.00001
combined     11    9.99           8.87  0.00001
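The dependent-samples t statistic can be reproduced directly from the Task 1 times in Table 7.3. The following is a minimal sketch (an illustrative re-implementation, not the online calculator used in the study):

```java
// Dependent-samples (paired) t statistic, from the standard formula
// t = mean(d) / (sd(d) / sqrt(n)), where d are per-participant differences.
public class PairedT {
    static double tStatistic(double[] a, double[] b) {
        int n = a.length;
        double meanD = 0;
        for (int i = 0; i < n; i++) meanD += a[i] - b[i];
        meanD /= n;
        double ss = 0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - meanD;
            ss += dev * dev;
        }
        double sd = Math.sqrt(ss / (n - 1)); // sample std dev of the differences
        return meanD / (sd / Math.sqrt(n));  // t with n - 1 degrees of freedom
    }

    public static void main(String[] args) {
        double[] jripples = {11, 10, 13, 19, 15, 10, 11, 13}; // Task 1, Table 7.3
        double[] edw      = { 2,  4,  3,  3,  3,  2,  2,  2};
        System.out.printf("t = %.2f%n", tStatistic(jripples, edw)); // t = 9.55, as in Table 7.6
    }
}
```

Comparing the resulting t value (with 7 degrees of freedom) against the t distribution yields the one-tailed p-values reported in Table 7.6.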

Hypotheses H0-2 and HA-2

Table 7.7 lists the average precision, recall, and F-score for the three granularities at which we apply JRipples. Appendix G lists the detailed results of the eight participants for each granularity of the two tasks with JRipples. EDW achieved a precision and recall of 1 for the selected system. On the other hand, participants made mistakes with JRipples due to the manual work it requires.

For Task 1, JRipples achieved a recall of 0.82, 0.85, and 0.69 for class-, method-, and field-granularity, respectively. For Task 2, JRipples achieved a recall of 0.78, 0.52, and 0.65 for class-, method-, and field-granularity, respectively. The precision suffered for method- and field-granularity of Task 1. None of the participants found all impacted methods for Task 2; only P7 achieved a recall of 0.78, with a precision of 0.86. In contrast, with EDW, all participants achieved perfect precision and recall. The effectiveness of performing a task with JRipples apparently depends on its complexity: a task with a bigger impact set and system can lead the developer to more false positives and false negatives.

The data for the F-score is not normally distributed, because EDW achieved an F-score of 1 for all participants. Therefore, we cannot perform a t-test to calculate significance. We instead use the Wilcoxon signed-rank test [Wilcoxon, 1945] (we use an online calculator⁸⁴ to perform the calculation), a non-parametric test that evaluates whether two dependent samples were drawn from populations with the same distribution; it does not require the data to be normally distributed. Table 7.8 lists the results of the Wilcoxon test on the F-score for the three granularities, calculated

⁸⁴ https://www.socscistatistics.com/tests/signedranks/Default2.aspx

Table 7.7: Average JRipples task result.

Granularity  Task  P     R     F
Class        1     1     0.82  0.86
Method       1     0.72  0.85  0.74
Field        1     0.75  0.69  0.73
Class        2     0.97  0.78  0.85
Method       2     0.92  0.50  0.63
Field        2     0.99  0.65  0.73

Table 7.8: Results of the Wilcoxon test on F-score data combined from Tasks 1 and 2.

Granularity  p
Class        0.02
Method       0.0004
Field        0.002

individually. Note that, for class-granularity, there are only six non-tied samples (two from Task 1 and four from Task 2); therefore, the p-value calculated by the Wilcoxon test is not exact.

For method- and field-granularity, EDW had a significant impact on the F-score relative to JRipples at an α level of 0.01, as p < 0.01. For class-granularity, the impact was significant at an α level of 0.05, as p < 0.05. We therefore reject the null hypothesis H0-2. The results show that participants using EDW to locate the initial impact set of a library migration achieve better precision, recall, and F-score in their tasks than when performing the task with JRipples.
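The signed-rank statistic behind Table 7.8 can be sketched as follows. This is a generic implementation, not the online calculator used in the study, and the per-participant JRipples F-scores below are illustrative stand-ins (the real values are in Appendix G). Tied pairs (difference of zero) are dropped, which is why the class-granularity test had few usable samples, and tied magnitudes receive their average rank:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Wilcoxon signed-rank statistic W: rank the non-zero |differences| (ties get
// their average rank) and take the smaller of the positive and negative rank sums.
public class Wilcoxon {
    static double wStatistic(double[] a, double[] b) {
        List<double[]> pairs = new ArrayList<>(); // each entry: {|d|, sign(d)}
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            if (d != 0) pairs.add(new double[]{Math.abs(d), Math.signum(d)});
        }
        pairs.sort(Comparator.comparingDouble(p -> p[0]));
        double[] ranks = new double[pairs.size()];
        int i = 0;
        while (i < pairs.size()) {
            int j = i;
            while (j < pairs.size() && pairs.get(j)[0] == pairs.get(i)[0]) j++;
            double avgRank = (i + 1 + j) / 2.0; // average rank over the tie group
            for (int k = i; k < j; k++) ranks[k] = avgRank;
            i = j;
        }
        double wPlus = 0, wMinus = 0;
        for (int k = 0; k < pairs.size(); k++) {
            if (pairs.get(k)[1] > 0) wPlus += ranks[k];
            else wMinus += ranks[k];
        }
        return Math.min(wPlus, wMinus);
    }

    public static void main(String[] args) {
        double[] edwF = {1, 1, 1, 1, 1, 1, 1, 1};               // EDW: perfect F-scores
        double[] jrF  = {0.9, 0.8, 1, 0.7, 0.85, 1, 0.95, 0.6}; // illustrative JRipples F-scores
        // The two tied pairs are dropped, leaving six samples; all remaining
        // differences favour EDW, so the smaller rank sum is 0.
        System.out.println(wStatistic(edwF, jrF));
    }
}
```

A small W relative to the sample size corresponds to a small p-value, as in Table 7.8.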

7.7.2 Qualitative

Table 7.9 summarizes the answers to the question “How did you find the task?”, asked in the post-task questionnaire after each task (see Section F.3). Note that “1” means the participant found the task to be difficult, whereas “5” means the participant found the task to be easy. The table shows that, while using JRipples, most of the participants (except P4, and P3 for Task 1) found the tasks to be difficult or moderately difficult. With EDW, all participants found the tasks to be easy or moderately easy. This shows that the participants perceived the tasks to be easier with EDW than with JRipples.

For the question “Did the task take more time or less than you anticipated at the beginning?”, asked in the same post-task questionnaire, all participants answered “less” for the tasks with EDW, and most answered “more” for the tasks with JRipples (except P4 in Task 1). This is consistent with the results in Table 7.4, which show that participants took more time for the tasks when using JRipples. For the last four questions in the final questionnaire (see

Table 7.9: Results for the post-task questionnaire.

Participant  EDW Task 1  JRipples Task 1  EDW Task 2  JRipples Task 2
P1           5           1                5           1
P2           5           1                5           1
P3           5           2                5           1
P4           5           4                5           3
P5           5           3                5           1
P6           4           2                5           1
P7           4           1                5           1
P8           5           2                5           1

Section F.4), most participants strongly preferred EDW; they strongly agreed that they were more likely to succeed with EDW in estimating the changes needed; they strongly agreed that they could attempt larger changes and modification tasks with EDW; and they strongly agreed that they would use EDW again. The only exception was P4, who agreed (but not strongly) with the last three questions. This shows that the participants found EDW to be more helpful for the tasks than JRipples.

Observations during the experiment

For JRipples, all participants used a similar approach to locating the direct impact set, relying on the search functionality provided by the Eclipse IDE. They started by manually going through the classes iteratively, until they found a class that directly referenced Guava. Once the participants had a general idea about the usage of Guava in the system, they used IDE search to locate the impacted classes while marking them manually in the JRipples view. Apart from P1, all participants used a variant of either search strategy 2 or 3 (see Section 6.2.3) as a query for the IDE search. P1 used search strategy 1, wrongly assuming that the only Guava import in the system was the one they had found in the first manually located class; P1 discovered other imports during the task which they had initially missed. P1 then performed another search using the newly discovered imports, commenting that they were no longer sure about the success of the task. Participants P3 and P6 wrongly localized their IDE

search for Task 1 to only one of the packages of the system; therefore, they were unable to obtain a complete impact set of the system. P3 corrected this mistake for Task 2, but still did not mark the complete set of impacted classes due to confusing classes with similar names. P8 initially tried to remove Guava from the build path of the system, in order to inject errors; their idea was to locate the classes that would produce errors after removal of the library. However, P8 had to stop after a few unsuccessful tries due to the time limit, and then decided to use Eclipse IDE search to complete the task.

Once the participants found the directly referencing classes, they searched for the directly referencing members. Apart from P7 and P8, all participants did this by either manual traversal or IDE file search of each marked class for each of its Guava imports. Participants P7 and P8 used an interesting approach to locate the affected members based on their experience; they removed all the

Guava imports from a class in order to inject errors. They then searched for the members for which Eclipse reported errors. This approach was comparatively faster; however, it was also error-prone: the IDE signals an error for any method that uses a Guava-typed field defined in the class, so the approach flags both directly and transitively impacted methods, and manually validating the large number of flagged methods led to false positives. Even though participants P7 and P8 were careful to mark only the directly impacted members, they still ended up marking some false positives during the task.

The participants complained that they were not confident about the success of their task while performing it with JRipples. The JRipples view does not distinguish between overloaded methods defined inside a class; therefore, the user is required to manually cross-check each overloaded method. Some participants thus complained that they got confused when a class contained overloaded methods.

For EDW, on the other hand, the participants were able to perform both tasks easily due to its automation. Some participants were skeptical about the results of EDW; they decided to randomly check a few classes and announced the results only after making sure that the tool was working correctly.

Characterizing good tasks as bad and vice versa. In order to review how sure participants were about their performance of a task with a tool, we explored their answers to the post-task question, “How well do you think you did the task?” We categorized the actual performance of a participant as good if they achieved a recall greater than 0.7 in at least two granularities.

Table 7.10 lists the categorization of both tasks for each participant with JRipples. The columns labelled “us” show our evaluation of the results, while those labelled “them” show the participants’ self-evaluation after completing the task. The cell values “G”, “B”, and “A” signify that the participant’s performance in the task was good, bad, or average, respectively. “NS” means that the participant was not sure how they did; we interpret any case of “NS” as the participant judging their own performance to be bad.

For EDW, all participants said they thought their performance was good, because the tool automated the task for them. However, for JRipples the participants were unclear about their performance. For Task 1, only P4 correctly characterized their good performance as good in the post-task questionnaire; the remaining participants either performed well but characterized their performance as bad, or vice versa. Participants P1, P2, P5, P7, and P8 said that they had performed Task 1 badly, even though they had actually performed it well. Participants P3 and P6 said the opposite: that they had performed well when in reality they had not. For Task 2, participants P1 and P6 correctly characterized their bad performance. P5 claimed average performance, which was not the case. Participants P2, P4, P7, and P8 performed Task 2 well, but none of them were aware of this. These results show that the majority of the participants were not sure about the success of a task after completing it with JRipples, and thus that developer opinion is not a trustworthy measure of success for determining the direct impact set during library migration.
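The “good” criterion above can be stated as a one-line rule. This sketch covers only the good/bad boundary (the criterion behind the “average” category is not spelled out here), with illustrative recall values:

```java
// Categorization rule from the text: performance on a task counts as good ("G")
// when recall exceeds 0.7 in at least two of the three granularities, and bad
// ("B") otherwise. The "average" ("A") category is not covered by this sketch.
public class Categorize {
    static String category(double classRecall, double methodRecall, double fieldRecall) {
        int above = 0;
        for (double r : new double[]{classRecall, methodRecall, fieldRecall}) {
            if (r > 0.7) above++;
        }
        return above >= 2 ? "G" : "B";
    }

    public static void main(String[] args) {
        System.out.println(category(0.82, 0.85, 0.69)); // G: two granularities above 0.7
        System.out.println(category(0.50, 0.60, 0.80)); // B: only one above 0.7
    }
}
```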

Interesting comments and quotes. We selected some interesting comments made by the participants regarding the tools for further discussion.

Table 7.10: JRipples task categorization.

Participant  Task 1: us  Task 1: them  Task 2: us  Task 2: them
P1           G           B (NS)        B           B
P2           G           B (NS)        G           B (NS)
P3           B           G             B           G
P4           G           G             G           B (NS)
P5           G           A             B           A
P6           B           G             B           B
P7           G           B (NS)        G           B
P8           G           B             G           B

Some participants pointed out the advantages of using JRipples. P6 said that JRipples helps in showing the code structure of the selected project. Similarly, P4 commented that “JRipples helps in record keeping” of the impact set. P3 stated that “JRipples finds automatically the dependent class after marking a class impacted”. Note that all these features are inherently available in EDW as well, as it is built on top of JRipples.

Some participants pointed out disadvantages of using JRipples. P2 commented that, based on their experience, “In a complex project with too many dependencies on a library, it is impossible for JRipples because of manual search of impacted classes”; they were concerned about the generalization of JRipples to bigger systems with a significant number of classes. P6 got confused while marking overloaded methods in a class, as JRipples does not distinguish between them in its view; they commented, “In JRipples there were times when I found an impacted code element but after I noticed that there is another element of code with the same name which caused problem in tagging the real impacted ones. I didn’t have this problem with EDW”. Similarly, P7 mentioned the issue of manual work: “For JRipples I had to get into each class and find the impacted set. Also had to comment out the import to break the code and identify which lines are impacted”. Meanwhile, P3 had trouble finding the imports from Guava, stating that they were “looking for a way to find the Guava library with the right name ‘google’”.

For EDW, the participants pointed out some benefits. P7 stated that EDW was handy for migration: “The tool is very helpful. It is good to be able to see all the third party libraries in one place.” P8 commented that “EDW was much easier in three clicks you can find the impacts”. However, some participants wanted further improvement in the process of EDW. P4 was not happy with the number of clicks and asked us to further “Reduce the number of clicks”. P6 believed that “Even more automation can be useful”.

7.8 Limitations

EDW automates the search for the direct impact set of a library; therefore, there is minimal chance of a learning effect when subsequently performing similar tasks with JRipples. Nevertheless, we took a number of steps to reduce the potential learning effect (in favour of JRipples) during our experiment. First, we did not tell participants that they would perform similar tasks with both tools. This step was taken to prevent them from consciously remembering the results from EDW. Second, we tried to ensure that the participants did not thoroughly explore the results from EDW for a task. As the participants were given only 7.5 minutes with EDW for each task, it was highly unlikely for them to consciously memorize the impact set in this time.

Our study consisted of only 8 participants, all of them graduate students. Only half of the participants had significant industry experience; therefore, a study with more participants, and specifically industrial developers, is needed to generalize the results.

7.9 Summary

We performed a controlled experiment with 8 participants, in order to compare the effectiveness and efficiency of EDW with JRipples. Participants were asked to find the complete set of classes and their members that directly referenced the types and fields from Guava. The experimental trials required the participants to perform this task on two modules, “api” and “io”, of the project JUNG. Participants first used EDW to perform each task, then repeated the task with JRipples. Participants were asked to provide feedback on the tool used after each trial, and at the end to provide final feedback on both tools. We recorded completion time and the confusion matrix as measures of efficiency and effectiveness, respectively.

Participants were able to perform the tasks with higher efficiency and effectiveness while using EDW, in comparison to JRipples. Compared to JRipples, EDW significantly reduced the time required for the tasks. Participants on average took only ∼3 minutes to perform a task with EDW; with JRipples, they spent on average 12.8 and 25.9 minutes on Tasks 1 and 2, respectively. Similarly, EDW significantly improved precision and recall in contrast to JRipples. For Task 1, JRipples achieved an F-score of 0.86, 0.74, and 0.73 for class-, method-, and field-granularity, respectively. For Task 2, JRipples achieved an F-score of 0.85, 0.63, and 0.73 for class-, method-, and field-granularity, respectively. With EDW, participants achieved an F-score of 1 due to the automation provided by the tool.

Chapter 8

Discussion

In this chapter we discuss remaining issues related to our empirical study on library migration and the proposed tool. Section 8.1 discusses trends in the usage of external libraries. Section 8.2 explains the breaking of source code as a consequence of migration and highlights the types of libraries that have a high probability of breaking client code. Section 8.3 highlights the approaches used to give notification of library migration. Section 8.4 describes issues with our tool and our evaluation thereof. Sections 8.5 and 8.6 discuss threats to the validity of our work and future work, respectively.

8.1 Trends in Software Library Usage

We found a trend in the choice of external libraries being used by developers of Android projects. The libraries being used mostly belonged to the domains of network communication, graphical user interfaces, data encoding, and data structures. In contrast, we found no such trends for the Maven projects. The libraries being used in Maven projects belonged to diverse domains and were harder to classify. We note that the available library domains in Android are restricted due to the structure of the platform, as most of the used libraries perform narrower tasks compared to their counterparts found in Maven. This observation is strengthened by the fact that library migration did not result in breaking transitively dependent client code in any of the Android projects.

Table 8.1: Preferred libraries involved in Android migrations.

Library            Alternative               Projects
OkHttp             standard Java             9
NineOldAndroids    standard Android support  6
ActionBarSherlock  standard Android support  5
Retrofit           standard Java             4
Picasso            Glide                     3

Furthermore, in Android we also found a trend in the selection of alternative libraries introduced to replace an existing one. Table 8.1 summarizes the libraries involved in Android migrations along with the alternatives that replaced them. Different developers consistently opted for the same alternative, but we note that there are not many libraries available for the Android platform. It is interesting to note that a large majority of projects migrated to the standard SDK support, due to the scarcity of external libraries. The trend shows that the Android community is still maturing in terms of software libraries, and there are not many viable substitutes available in the market.

8.2 Code Breaking of Systems and Their Clients

Our findings regarding the breaking of source code show that a system can become strongly coupled to a software library; as a result, migrating away from that library can have severe impacts on the dependent code. Our analysis shows that these impacts can propagate further, to transitively dependent systems, in cases where APIs are coupled with the external library. Our results highlight two important points: first, library migration will often break dependent code; second, developers should be careful when adopting an external library. While using a library has short-term benefits, in the long term migration away from this dependency can have severe consequences for both the developer and their clients.

Table 8.2: Domains of libraries that have a higher probability of breaking transitively dependent clients.

Domain           Examples
data encoding    JSON.simple, JAXB
data structures  Guava, Apache Commons
web asset        Jetty, Spring

8.2.1 Categories of Libraries that Broke Clients

We found three categories of libraries that most often broke transitively dependent clients; the breakage mostly occurred due to dependencies belonging to the three categories listed in Table 8.2. These categories are taken from the Maven Central Repository, which uses them to organize and search the repository.

Generally, libraries from these categories are directly used by clients, and their removal from the system can result in discrepancies. For example, JavaScript Object Notation (JSON) is a data-encoding format that is commonly used for communication among platforms. JSON.simple is a Java library for working with JSON; it provides a data structure named JSONObject that contains the attribute-value mapping of the encoded data. A client project receiving or utilizing a JSONObject from a system’s old version would expect an object with the same format of data mapping from the newer version; therefore, a change in object type or format due to migration to an alternative JSON library would most likely break the client code. Note that each JSON library has its own data structure for containing the attribute-value mapping; for example, the Gson library provides its own JsonObject class for this purpose. We recommend that these libraries be used following the dependency inversion principle [Martin, 1996] to limit their impact on client code: the dependency inversion principle prescribes depending on abstractions rather than concretions.
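As an illustration of that recommendation, the sketch below hides the JSON library behind an application-owned interface. `JsonCodec` and `NaiveCodec` are hypothetical names, and the adapter body is a stand-in; a real adapter would delegate to JSON.simple, Gson, or whichever library is in use. Migrating the library then means rewriting one adapter rather than every client:

```java
import java.util.Map;

// Dependency inversion sketch: clients depend on this application-owned
// abstraction instead of JSON.simple's JSONObject or Gson's classes directly.
interface JsonCodec {
    Map<String, Object> decode(String json);
    String encode(Map<String, Object> data);
}

// One adapter per concrete library; a real implementation would delegate to
// JSON.simple or Gson here. This stand-in avoids any third-party dependency.
class NaiveCodec implements JsonCodec {
    public Map<String, Object> decode(String json) {
        return Map.of("raw", json); // placeholder for real parsing
    }
    public String encode(Map<String, Object> data) {
        return data.toString(); // placeholder for real serialization
    }
}

public class Client {
    public static void main(String[] args) {
        JsonCodec codec = new NaiveCodec(); // injected; the client never names the library
        System.out.println(codec.decode("{}").containsKey("raw")); // true
    }
}
```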

8.3 Notification of Software Library Migration

We found that developers are uncommunicative about library migration. Even though release notes are a standard way of informing clients about changes, developers underuse them. They prefer using pull requests and internal threads for discussing migrations. Similarly, the rationale behind a migration, where available at all, is more frequently offered through pull requests than in release notes. These findings sit uneasily alongside our finding that dependency migration often breaks client code, since one would expect the project developers to notify their clients about a migration in order to warn them of incompatibility issues when using the new release.

We speculate that project developers do not use release notes that often due to a lack of awareness of their external importance. The project developers do not consider the migration of dependencies to be a major change, and as a result avoid informing their clients about it, even though the reality runs contrary to this assumption. As a result, clients face incompatibility issues when using new releases of systems that have migrated away from external libraries.

This underuse of notification resources might be due either to lack of knowledge or to the unavailability of a systematic approach for library migration. If project developers start following a systematic, incremental process to handle this change, the migration could be performed properly with appropriate notification to all involved stakeholders.

8.4 Evaluation of EDW

In our evaluation, EDW works well for locating the affected impact points, because it makes use of direct import references to locate the impacted classes. The recall of EDW can be impacted negatively by on-demand import declarations and nested granularity, as explained in Section 6.3.2. However, since these are both minor engineering details, the limitations can be easily addressed in future work.
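The idea of locating impacted classes through direct import references can be sketched as follows; this is our own simplified illustration (EDW itself builds on the Eclipse infrastructure, and the ImportScanner class here is hypothetical). Note how an on-demand declaration such as `import org.json.*;` cannot be attributed to concrete types, which is exactly the recall limitation mentioned above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportScanner {
    // Matches single-type import declarations such as
    // "import com.google.gson.Gson;". On-demand imports ("import org.json.*;")
    // do not match, because '*' is not a type name.
    private static final Pattern IMPORT =
        Pattern.compile("^\\s*import\\s+(?:static\\s+)?([\\w.]+)\\s*;", Pattern.MULTILINE);

    // Returns the imported names in a compilation unit that fall under the
    // given library package prefix.
    public static List<String> directImportsOf(String source, String libraryPrefix) {
        List<String> hits = new ArrayList<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            if (m.group(1).startsWith(libraryPrefix)) {
                hits.add(m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String unit = "package app;\n"
                    + "import com.google.gson.Gson;\n"
                    + "import org.json.*;\n"       // on-demand: invisible to the scan
                    + "import java.util.List;\n"   // filtered out by the prefix check
                    + "class A {}\n";
        System.out.println(directImportsOf(unit, "com.google.gson"));
    }
}
```

A class that uses a library only through an on-demand import is missed by such a scan, which is why on-demand declarations reduce recall.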

We evaluated JRipples for locating the initial impact set of library migration, in contrast to EDW. We found that EDW significantly outperforms JRipples in three areas. First, EDW improves the time required for locating the direct impact points of a library. Second, the precision and recall of EDW are comparatively better. Third, EDW does not require human intervention while locating the direct impact points, as it automates the process. JRipples, on the other hand, relies on manual Eclipse search for locating the affected classes. Moreover, it also requires the developer to manually record or memorize the points of impact of affected members; therefore, JRipples requires more decisions from the developer, each of which represents an opportunity for confusion and mistakes.

8.5 Threats to Validity

In this section we discuss the threats to validity of our research. Section 8.5.1 highlights the threats to validity of our empirical study on software library migration, while Section 8.5.2 discusses the threats to validity for the evaluation of EDW.

8.5.1 Threats to validity of the library migration study

The accuracy of our empirical study on dependency migration depends on the selection of systems, the accuracy of the tools we used, and the measures we used for effort estimation.

Selected systems

We defined and followed criteria while selecting projects for our study; we note three issues that arose due to our choices. (1) We chose only open source systems; therefore, our findings may not reflect how external dependencies are treated in closed and corporate systems. It is possible that closed and corporate systems might be hesitant to use external libraries due to issues with licensing and copyright; further study would be needed to test this conjecture. (2) We only selected Maven and Android projects for our study; as a result, our findings might not generalize across other languages and platforms. (3) We only considered popular and stable products with a history of at least two years; therefore, our results might not be applicable to relatively new or unpopular projects.

The accuracy of third party tools

Our findings are dependent on the third party software that we used: Find it EZ, JAPICC, and Code Compare. We used Find it EZ to locate the affected classes, which were then used to determine the impacts of library migrations. JAPICC was used to find the probability of API breakage, along with the compatibility issues that arose in client code due to migrations. Code Compare was used to determine the discrepancies between the impacted classes of a project resulting from migration. The accuracy of our analysis depends on how well these tools performed their respective functions. To validate the results, we manually checked a few random results from these tools and found them to be accurate. However, checking all the outputs from the tools was infeasible; as a result, the accuracy of the third party software remains a threat to validity for our study.

Measurement of effort

Two measures (the number of changed classes and the number of changed lines of code) were used to estimate the effort required by developers during library migration. We acknowledge that estimating migrations through these measures might not be the best indicator of effort; indeed, our own experience indicates that migrations involving fewer affected artifacts (classes and LOC) can be more complex and time consuming than those that impact more artifacts. We chose these measures for lack of better alternatives that were readily obtainable. Therefore, the results of our study should be taken as a rough estimate of effort.

8.5.2 Threats to validity for the evaluation of EDW

The accuracy of the evaluation of EDW is impacted by two factors: (1) the selection of the evaluation data set, and (2) manual analysis by a single experimenter.

Selection of the evaluation data set

We evaluated our tool on the first ten projects that satisfied our criteria mentioned in Section 6.1. However, the results of our evaluation still depend on the projects we selected. The size of the evaluated data set is also not large and could be improved in future work.

Manual evaluation by a single experimenter

The evaluation of EDW was performed manually by the author. To avoid researcher bias, we used developers' commits of the library migration as the ground truth, and we wrote a comparator script to compile the data that populated the confusion matrix and generated precision and recall for our results. However, the qualitative analysis of the data sources was performed by a single researcher; therefore, this stands as a threat to validity for our evaluation of EDW.
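As an illustration of what such a comparator computes (the MigrationComparator class below is our own sketch, not the actual script), precision and recall follow directly from comparing the tool-reported classes against the ground-truth classes changed in the migration commits:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class MigrationComparator {
    // Precision = TP / (TP + FP); recall = TP / (TP + FN), where the ground
    // truth is the set of classes actually changed in the developers'
    // migration commits and "reported" is what the tool flagged as impacted.
    public static double[] precisionRecall(Set<String> groundTruth, Set<String> reported) {
        Set<String> truePositives = new TreeSet<>(reported);
        truePositives.retainAll(groundTruth); // intersection = true positives
        double precision = reported.isEmpty()
            ? 1.0 : (double) truePositives.size() / reported.size();
        double recall = groundTruth.isEmpty()
            ? 1.0 : (double) truePositives.size() / groundTruth.size();
        return new double[] { precision, recall };
    }

    public static void main(String[] args) {
        Set<String> truth = new TreeSet<>(Arrays.asList("A.java", "B.java", "C.java", "D.java"));
        Set<String> reported = new TreeSet<>(Arrays.asList("A.java", "B.java", "C.java"));
        double[] pr = precisionRecall(truth, reported);
        // All 3 reported classes are correct; 3 of the 4 true classes are found.
        System.out.println("precision=" + pr[0] + " recall=" + pr[1]);
    }
}
```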

8.6 Future Work

In this section we discuss the future scope of our work.

8.6.1 Knowledge base for external libraries

A large number of software libraries providing similar functionality are available online. These libraries vary in their performance, platform, resource consumption, and supported use cases. Currently, developers have to search various forums and repositories to find an optimal library that fits their system's needs. This search results in developers wasting time and resources on a non-productive task. Lack of time can even result in a developer making a rash selection, with repercussions for the product in the future. We would like to create a queryable, central knowledge base of all available libraries. The knowledge base would contain real-time data regarding the history of performance, bugs, vulnerabilities, patches, and popularity of the classified libraries. Such a system would allow developers to easily compare the popular alternatives available for a domain before making a final decision. This centralization could guide developers in selecting an external library and reduce the required effort and time.

8.6.2 Improve effort estimation

The measures used for estimating the effort invested by developers in mitigating library migration need further improvement. In future work we would use better alternatives, such as expert-based and combination-based models, for measuring effort.

8.6.3 Improvements to EDW

Our tool does fairly well in terms of precision and recall. However, there is still scope for future work. Some of the areas that can be improved are as follows.

Search strategy

The search strategy used to locate affected classes in EDW is rather simple: we make use of the Eclipse IDE search to locate any reference to an external library in the system. The search cannot be customized per use case, as it does not support filters or wildcards of any kind. Therefore, in future work we would improve the basic search by supporting filters, customization, and wildcards for the user.

Support for nested granularity

The current version of EDW does not support marking a class member that is declared at a nested granularity. Support for nested members would make the analysis performed through EDW more thorough. Incorporating this into EDW would require changing our impact analysis module; integrating a mode for selecting different granularity levels would allow the tool to support nested granularity in the future.

Extension points

Eclipse extension points allow further functionality to be contributed to an existing plugin. We have provided extension points in EDW to support future improvements: the graph builder and metrics builder modules can be extended by future researchers through these extension points. The graph builder module can create a dependency graph for one affected class at a time; a possible extension would be to support graph generation for multiple affected classes present in a module, a package, or the complete system. This extension would allow the user to conveniently switch between different granularities of graph representation while performing a library migration. Similarly, the metrics builder module supports metrics that are related to the usability of a single library; that module could be extended to support system-wide metrics such as the total number of dependencies, unnecessary dependencies, or coupling of dependencies. The system-wide metrics could provide developers an overview of the management and usability of dependencies in a project.

Furthermore, EDW currently only supports Java- and Maven-based projects. It could be extended to other platforms to further support library migration; ideal candidates would be platforms that store information about external libraries, such as Apache Ant, Gradle, and Android.

Human-Subjects Experiment

The human study introduced in Chapter 7 involved only 8 participants. The participants were all graduate students, with only half having significant industry experience. Therefore, a study with more participants, and especially with industry developers, is required to generalize the results.

8.7 Summary

In this chapter we highlighted several remaining topics related to our research. We discussed the trends we found in the domains of libraries used in the Android projects. We highlighted the breakage of source code, in the studied systems and their clients, resulting from migration. We also stated the patterns we found regarding the domains of libraries that have a higher probability of breaking transitively dependent systems. We further discussed our findings regarding the notification of library migrations by developers. We also explained the evaluation of EDW and highlighted the limitations of JRipples and EDW. We highlighted the threats to validity of our study: our analysis of library migration is affected by the choice of selected systems, the accuracy of third party tools, and the measures used for effort estimation, while the threats to validity of our tool evaluation are due to the systems selected for evaluation and to analysis by a single experimenter. Finally, we discussed future work for our research and highlighted how it can be further improved.

Chapter 9

Conclusion

Third party software libraries are used by developers for the sake of reuse [Ebert, 2008], resulting in improved development time and maintainability [Mohagheghi and Conradi, 2007]. However, libraries are subject to deprecation and evolution, resulting in newer releases; as a consequence, developers have to consider migrating to better alternatives. But the difficulties of dependency management are greatly underestimated by developers, and most software systems keep their outdated libraries [Kula et al., 2018b]. Existing work on library migration is restricted to updates between versions of the same library, and does not consider when, whether, or why software systems migrate from one library to another. Thus, until now, we had little evidence and understanding of this phenomenon.

The first part of this thesis comprises a retrospective study of library migration in 114 open source, Java-based software systems. The study explores the consequences of migration on both directly and transitively dependent systems, the developers' rationales behind such migrations, and the effort required to mitigate such changes. We found that many projects undergo library migration (11.9% of the studied projects) and that 67% of the migrations ended up breaking the dependent system. To avoid library migration in the future, in 43% of the migrations the developers decided against introducing a new, alternative external library. Migrations of external libraries are performed almost always for various, identifiable, reasonable motivations (e.g., the existence of a better alternative). However, developers tend to be uncommunicative about these changes: they under-use release notes for notifying client users about such library migrations, and they even under-use other communication channels (such as pull requests) that are more oriented towards other project developers. 22% of the studied library migrations impacted the dependent client projects; these migrations resulted in breaking of APIs that caused backwards compatibility problems for the client systems. Mitigation of library migration required significant effort from the consuming developers: on average, these developers had to change 9.3% of the classes in their systems while modifying 15% of the total lines of code.

As a corollary to the lack of knowledge about library migration, we lack specialized tool support aimed at helping developers during library migration. Therefore, the second part of the thesis deals with solving the problem of library migration by providing a prototype tool, called EDW. EDW utilizes the concept of change impact analysis for locating areas affected by library migration. Through EDW, the user can view all external dependencies of a project along with their direct and transitive impacts. Developers are notified of the impacted lines of code through highlighting, and the tool allows them to change affected artifacts through the editor. The user can also estimate the effort required for transformation through the metrics module; this allows a cost-worth analysis before performing a migration. Our evaluation shows that EDW has a promising precision of 1 across all considered granularities. For recall, it obtained over 0.99 at class and field granularity, while achieving ∼0.86 at method granularity. Using Eclipse search with JRipples, on the other hand, results in similar recall but with comparatively lower precision. Moreover, JRipples results in more searches that require human interpretation; hence, there are more opportunities for mistakes and confusion for the developers.

We conducted a controlled experiment with human participants, in order to compare the efficiency and effectiveness of EDW against JRipples, an existing tool for structurally-based change impact analysis. Participants were able to perform the tasks with higher efficiency and effectiveness while using EDW, and in significantly lower time: participants on average took only ∼3 minutes to perform a task with EDW, but with JRipples they spent on average 12.8 and 25.9 minutes on the two tasks posed to them. Similarly, EDW significantly improves precision and recall in contrast to JRipples: EDW resulted in perfect precision and recall in terms of the initial impact points, while JRipples achieved F-scores varying between 0.63 and 0.86, depending on the task pursued and the granularity at which the measurements were taken.

The two contributions of this thesis are:

• We empirically explored library migration in 114 Java-based projects, selected from both the Maven and Android platforms in order to diversify the sample.

• We developed a prototype tool, EDW, to assist the developer in performing library migration. EDW utilizes change impact analysis to locate the impacted areas of change for the user, and provides resources through which library migration can be performed systematically.

We conducted both a simulation and an experiment with human participants in order to evaluate EDW. The high precision and recall of our tool, along with its significant time savings, show that it has promise in guiding developers during the process of library migration.

9.1 Future Work

In order to support developers in their search for an appropriate software library, there is a need for a centralized knowledge base containing real-time data regarding the history of performance, bugs, vulnerabilities, and popularity of classified libraries. The knowledge base would allow developers to compare the available alternatives in a library domain before making a final selection.

EDW does well in terms of precision and recall; however, there is still scope for its future improvement. The search strategy used by EDW to locate affected classes is rather basic; it could be extended in the future to support filters, customization, and wildcards for the user. Similarly, the current version of EDW does not support marking a nested class member; such support could be incorporated into EDW to make its analysis more thorough. In order to support such future work, we have provided extension points in EDW. The graph builder and metrics modules currently support a single selected library; they could be extended to support system-wide graphs and metrics, which would provide developers an overview of the management and usability of dependencies in a project. The prospects for improving support beyond locating the initial impact points are limited, due to the fundamental limits of computability for change impact analysis; any future improvements in the general context of change impact analysis could be applied here, too.

The human study performed for the evaluation of EDW involved only 8 graduate students, with only half having significant industry experience. Therefore, a study with more participants with significant industry experience is required to generalize the results. Finally, EDW only supports Java- and Maven-based projects; it could be extended to support other platforms such as Apache Ant, Gradle, and Android.

Bibliography

Nemitari Ajienka and Andrea Capiluppi. Semantic coupling between classes: Corpora or identifiers? In Proceedings of the International Symposium on Empirical Software Engineering and Measurement, pages 40:1–40:6, 2016. doi: 10.1145/2961111.2962622.

Linda Badri, Mourad Badri, and Daniel St-Yves. Supporting predictive change impact analysis: A control call graph based technique. In Proceedings of the Asia-Pacific Software Engineering Conference, pages 167–175, 2005. doi: 10.1109/APSEC.2005.100.

Gabriele Bavota, Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, and Sebastiano Panichella. How the Apache community upgrades dependencies: An evolutionary study. Empirical Software Engineering, 20(5):1275–1317, October 2015. doi: 10.1007/s10664-014-9325-9.

Christopher Bogart, Christian Kästner, and James Herbsleb. When it breaks, it breaks: How ecosystem developers reason about the stability of dependencies. In Proceedings of the International Conference on Automated Software Engineering Workshop, pages 86–89, 2015. doi: 10.1109/ASEW.2015.21.

Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. How to break an API: Cost negotiation and community values in three software ecosystems. In Proceedings of the SIGSOFT International Symposium on Foundations of Software Engineering, pages 109–120, 2016. doi: 10.1145/2950290.2950325.

Shawn A. Bohner and Robert S. Arnold. Software change impact analysis. IEEE Computer Society Press, 1996.

B. Breech, A. Danalis, Stacey Shindo, and Lori Pollock. Online impact analysis via dynamic compilation technology. In Proceedings of the IEEE International Conference on Software Maintenance, pages 453–457, 2004. doi: 10.1109/ICSM.2004.1357834.

Jonathan Buckner, Joseph Buchta, Maksym Petrenko, and Václav Rajlich. JRipples: A tool for program comprehension during incremental change. In Proceedings of the IEEE International Workshop on Program Comprehension, pages 149–152, 2005. doi: 10.1109/WPC.2005.22.

Gerardo Canfora and Luigi Cerulo. Impact analysis by mining software and change request repositories. In Proceedings of the IEEE International Software Metrics Symposium, pages 29:1–29:9, 2005. doi: 10.1109/METRICS.2005.28.

Bradley E. Cossette and Robert J. Walker. Seeking the ground truth: A retroactive study on the evolution and migration of software libraries. In Proceedings of the ACM SIGSOFT International Symposium on the Foundations of Software Engineering, pages 55:1–55:11, 2012. doi: 10.1145/2393596.2393661.

Danny Dig and Ralph Johnson. How do APIs evolve?: A story of refactoring. Journal of Software Maintenance and Evolution, 18(2):83–107, March 2006. doi: 10.1002/smr.v18:2.

Christof Ebert. Open source software in industry. IEEE Software, 25:52–53, May 2008. doi: 10.1109/MS.2008.67.

Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319–349, July 1987. doi: 10.1145/24039.24041.

István Forgács and Tibor Gyimóthy. An efficient interprocedural slicing method for large programs. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, 1997. URL https://www.researchgate.net/publication/245660970_An_Efficient_Interprocedural_Slicing_Method_for_Large_Programs/download.

Brian Fox. Now available: Central download statistics for OSS projects, 14 December 2010. URL https://blog.sonatype.com/2010/12/now-available-central-download-statistics-for-oss-projects/.

Simos Gerasimou, Maria Kechagia, Dimitris Kolovos, Richard Paige, and Georgios Gousios. On software modernisation due to library obsolescence. In Proceedings of the International Workshop on API Usage and Evolution, pages 6–9, 2018. doi: 10.1145/3194793.3194798.

Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, and Andy Zaidman. Lean GHTorrent: GitHub data on demand. In Proceedings of the Working Conference on Mining Software Repositories, pages 384–387, 2014. doi: 10.1145/2597073.2597126.

Jurgen Graf. Speeding up context-, object- and field-sensitive SDG generation. In Proceedings of the IEEE Working Conference on Source Code Analysis and Manipulation, pages 105–114, 2010. doi: 10.1109/SCAM.2010.9.

Alex Gyori, Shuvendu K. Lahiri, and Nimrod Partush. Refining interprocedural change-impact analysis using equivalence relations. In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 318–328, 2017. doi: 10.1145/3092703.3092719.

Frederick M. Haney. Module connection analysis: A tool for scheduling software debugging activities. In Proceedings of the Fall Joint Computer Conference, Part I, pages 173–179, 1972. doi: 10.1145/1479992.1480016.

Friedrich Robert Helmert. Über die Bestimmung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer Beobachtungsfehler. Zeitschrift für Angewandte Mathematik und Physik, 20:300–303, 1875.

Andre Hora and Marco Tulio Valente. Apiwave: Keeping track of API popularity and migration. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution, pages 321–323, 2015. doi: 10.1109/ICSM.2015.7332478.

André Hora, Romain Robbes, Marco Tulio Valente, Nicolas Anquetil, Anne Etien, and Stéphane Ducasse. How do developers react to API evolution? A large-scale empirical study. Software Quality Journal, 26(1):161–191, March 2018. doi: 10.1007/s11219-016-9344-4.

Susan Horwitz, Thomas Reps, and David Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12(1):26–60, January 1990. doi: 10.1145/77606.77608.

James W. Hunt and M. Douglas McIlroy. An algorithm for differential file comparison. Computer Science Technical Report 41, Bell Laboratories, June 1976. URL http://www.cs.dartmouth.edu/~doug/diff.pdf.

Judit Jász, Árpád Beszédes, Tibor Gyimóthy, and Václav Rajlich. Static execute after/before as a replacement of traditional software dependencies. In Proceedings of the IEEE International Conference on Software Maintenance, pages 137–146, 2008. doi: 10.1109/ICSM.2008.4658062.

Kamil Jezek, Jens Dietrich, and Premek Brada. How Java APIs break: An empirical study. Information and Software Technology, 65(C):129–146, September 2015. doi: 10.1016/j.infsof.2015.02.014.

Puneet Kapur, Brad Cossette, and Robert J. Walker. Refactoring references for library migration. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, volume 45, pages 726–738, 2010. doi: 10.1145/1932682.1869518.

Marvin Karson. Handbook of Methods of Applied Statistics, volume 1: Techniques of Computation, Descriptive Methods, and Statistical Inference. John Wiley, 1967. doi: 10.1080/01621459.1968.11009335.

Raula Gaikovina Kula, Coen De Roover, Daniel M. German, Takashi Ishio, and Katsuro Inoue. Visualizing the evolution of systems and their library dependencies. In Proceedings of the IEEE Working Conference on Software Visualization, pages 127–136, 2014. doi: 10.1109/VISSOFT.2014.29.

Raula Gaikovina Kula, Daniel M. German, Takashi Ishio, and Katsuro Inoue. Trusting a library: A study of the latency to adopt the latest Maven release. In Proceedings of the IEEE International Conference on Software Analysis, Evolution, and Reengineering, pages 520–524, 2015. doi: 10.1109/SANER.2015.7081869.

Raula Gaikovina Kula, Coen De Roover, Daniel M. German, Takashi Ishio, and Katsuro Inoue. A generalized model for visualizing library popularity, adoption, and diffusion within a software ecosystem. In Proceedings of the IEEE International Conference on Software Analysis, Evolution, and Reengineering, pages 288–299, 2018a. doi: 10.1109/SANER.2018.8330217.

Raula Gaikovina Kula, Daniel M. German, Ali Ouni, Takashi Ishio, and Katsuro Inoue. Do developers update their library dependencies? Empirical Software Engineering, 23(1):384–417, February 2018b. doi: 10.1007/s10664-017-9521-5.

William Landi. Undecidability of static analysis. ACM Letters on Programming Languages and Systems, 1(4):323–337, December 1992. doi: 10.1145/161494.161501.

Michelle Lee, A. Jefferson Offutt, and Roger T. Alexander. Algorithmic analysis of the impacts of changes to object-oriented software. In Proceedings of the International Conference on Technology of Object-Oriented Languages and Systems, pages 61–70, 2000. doi: 10.1109/TOOLS.2000.868959.

M. M. Lehman. Laws of software evolution revisited. In Proceedings of the European Workshop on Software Process Technology, pages 108–124, 1996. doi: 10.1007/BFb0017737.

Donglin Liang and M. J. Harrold. Slicing objects using system dependence graphs. In Proceedings of the IEEE International Conference on Software Maintenance, pages 358–367, 1998. doi: 10.1109/ICSM.1998.738527.

Bennet P. Lientz. Issues in software maintenance. ACM Computing Surveys, 15(3):271–278, September 1983. doi: 10.1145/356914.356919.

Wayne C. Lim. Effects of reuse on quality, productivity, and economics. IEEE Software, 11(5):23–30, September 1994. doi: 10.1109/52.311048.

Fernando López de la Mora and Sarah Nadi. Which library should I use?: A metric-based comparison of software libraries. In Proceedings of the International Conference on Software Engineering: New Ideas and Emerging Results Track, pages 37–40, 2018. doi: 10.1145/3183399.3183418.

Andrian Marcus, Vaclav Rajlich, Joseph Buchta, Maksym Petrenko, and Andrey Sergeyev. Static techniques for concept location in object-oriented code. In Proceedings of the International Workshop on Program Comprehension, pages 33–42, 2005. doi: 10.1109/WPC.2005.33.

Robert C. Martin. The dependency inversion principle. C++ Report, 8, May 1996.

Tyler McDonnell, Baishakhi Ray, and Miryung Kim. An empirical study of API stability and adoption in the Android ecosystem. In Proceedings of the IEEE International Conference on Software Maintenance, pages 70–79, 2013. doi: 10.1109/ICSM.2013.18.

Hao Men. Fast and Scalable Change Propagation through Context-Insensitive Slicing. PhD thesis, University of Calgary, Calgary, Canada, November 2018.

Parastoo Mohagheghi and Reidar Conradi. Quality, productivity and economic benefits of software reuse: A review of industrial studies. Empirical Software Engineering, 12(5):471–516, October 2007. doi: 10.1007/s10664-007-9040-x.

Maksym Petrenko and Václav Rajlich. Variable granularity for improving precision of impact analysis. In Proceedings of the IEEE International Conference on Program Comprehension, pages 10–19, 2009. doi: 10.1109/ICPC.2009.5090023.

Denys Poshyvanyk, Andrian Marcus, Rudolf Ferenc, and Tibor Gyimóthy. Using information retrieval based coupling measures for impact analysis. Empirical Software Engineering, 14(1):5–32, February 2009. doi: 10.1007/s10664-008-9088-2.

Steven Raemaekers, Arie van Deursen, and Joost Visser. Semantic versioning versus breaking changes: A study of the Maven repository. In Proceedings of the IEEE International Working Conference on Source Code Analysis and Manipulation, pages 215–224, 2014. doi: 10.1109/SCAM.2014.30.

Václav Rajlich. Modeling software evolution by evolving interoperation graphs. Annals of Software Engineering, 9(1–4):235–248, January 2000. doi: 10.1023/A:1018933026438.

Václav Rajlich and Prashant Gosavi. Incremental change in object-oriented programming. IEEE Software, 21(4):62–69, 2004. doi: 10.1109/MS.2004.17.

Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. Speeding up slicing. SIGSOFT Software Engineering Notes, 19(5):11–20, December 1994. doi: 10.1145/195274.195287.

Romain Robbes, Michele Lanza, and Mircea Lungu. An approach to software evolution based on semantic change. In Proceedings of the International Conference on Fundamental Approaches to Software Engineering, pages 27–41, 2007.

Yutaka Sasaki. The truth of the F-measure. Technical report, School of Computer Science, University of Manchester, January 2007. URL https://www.researchgate.net/publication/268185911_The_truth_of_the_F-measure/download.

Manu Sridharan, Stephen J. Fink, and Rastislav Bodik. Thin slicing. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 112–122, 2007. doi: 10.1145/1273442.1250748.

Xiaobing Sun, Bixin Li, Chuanqi Tao, Wanzhi Wen, and Sai Zhang. Change impact analysis based on a taxonomy of change types. In Proceedings of the IEEE Computer Software and Applications Conference, pages 373–382, 2010. doi: 10.1109/COMPSAC.2010.45.

David ten Hove, Arda Goknil, Ivan Ivanov, Klaas van den Berg, and Koos de Goede. Change impact analysis for SysML requirements models based on semantics of trace relations. In Proceedings of the ECMDA Traceability Workshop, pages 17–28, June 2009. URL https://research.utwente.nl/files/5290256/Goknil_paper.pdf.

Boris Todorov, Raula Gaikovina Kula, Takashi Ishio, and Katsuro Inoue. SoL Mantra: Visualizing update opportunities based on library coexistence. In Proceedings of the IEEE Working Conference on Software Visualization, pages 129–133, 2017. doi: 10.1109/VISSOFT.2017.23.

Paolo Tonella. Using a concept lattice of decomposition slices for program understanding and impact analysis. IEEE Transactions on Software Engineering, 29(6):495–509, 2003. doi: 10.1109/TSE.2003.1205178.

Joost Visser, Arie van Deursen, and Steven Raemaekers. Measuring software library stability through historical version analysis. In Proceedings of the IEEE International Conference on Software Maintenance, pages 378–387, 2012. doi: 10.1109/ICSM.2012.6405296.

Robert J. Walker and Gail C. Murphy. Implicit context: Easing software evolution and reuse. In Proceedings of the ACM SIGSOFT International Symposium on the Foundations of Software Engineering, pages 69–78, 2000. doi: 10.1145/355045.355054.

Richard Wettel, Michele Lanza, and Romain Robbes. Software systems as cities: A controlled experiment. In Proceedings of the International Conference on Software Engineering, pages 551–560, 2011. doi: 10.1145/1985793.1985868.

Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945. URL http://www.jstor.org/stable/3001968.

Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente. Historical and impact analysis of API breaking changes: A large-scale study. In Proceedings of the IEEE International Conference on Software Analysis, Evolution, and Reengineering, pages 138–147, 2017. doi: 10.1109/SANER.2017.7884616.

Appendix A

Studied Systems

Tables A.1 and A.2 respectively list the Maven and Android projects studied in our research. The tables contain general information, such as stars, commits, and releases, for each project. The URL column for each project lists the group and project name of the system; prepending each with "https://github.com/" yields the URL of the project repository. We retrieved this corpus before August 2017.
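Reconstructing a full repository URL from a URL-column entry is a one-line operation; the following sketch is ours (the helper name `repo_url` is illustrative, not part of the thesis tooling):

```python
def repo_url(group_and_project: str) -> str:
    """Prepend the GitHub host to a 'group/project' entry from Tables A.1/A.2."""
    return "https://github.com/" + group_and_project

print(repo_url("vmware/admiral"))  # prints: https://github.com/vmware/admiral
```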

Table A.1: Maven projects.

Project Stars Commits Issues Releases URL
admiral 100 2383 20 6 vmware/admiral
allure1 696 1379 5 60 allure-framework/allure1
apollo 1670 1457 11 77 ctripcorp/apollo
auto 5414 826 88 32 google/auto
bitsquare 1401 3359 132 42 bitsquare/bitsquare
brave 666 906 24 60 openzipkin/brave
cassandra-lucene-index 399 1112 25 18 Stratio/cassandra-lucene-index
checkstyle 2174 7060 384 79 checkstyle/checkstyle
commafeed 1457 2497 90 13 Athou/commafeed
crawler4j 2152 299 77 3 yasserg/crawler4j
crawljax 354 1468 68 18 crawljax/crawljax
cron-utils 195 665 15 28 jmrozanec/cron-utils
DCMonitor 189 95 1 4 shunfei/DCMonitor
density-converter 167 123 1 16 patrickfav/density-converter


disconf 2732 1156 115 24 knightliao/disconf
Discord4J 237 1235 10 41 austinv11/Discord4J
druid 5185 8009 657 396 druid-io/druid
EasyReport 263 339 11 6 xianrendzw/EasyReport
erlyberly 404 356 19 30 andytill/erlyberly
fb-android-dagger 296 121 6 7 adennie/fb-android-dagger
ff4j 266 698 10 18 clun/ff4j
freedomotic 209 1340 45 8 freedomotic/freedomotic
FXGL 263 1690 56 22 AlmasB/FXGL
galen 1037 1421 107 82 galenframework/galen
geo 183 358 4 12 davidmoten/geo
GeoIP2-java 251 459 4 29 maxmind/GeoIP2-java
GoogleAuth 330 186 5 13 wstrange/GoogleAuth
google-java-format 1002 349 35 5 google/google-java-format
grakn 270 1916 6 18 graknlabs/grakn
greenmail 180 653 25 8 greenmail-mail-test/greenmail
ha-bridge 664 454 45 69 bwssytems/ha-bridge
hangout 198 595 5 17 childe/hangout
hawkular-apm 109 1658 2 27 hawkular/hawkular-apm
Hive2Hive 317 1967 32 6 Hive2Hive/Hive2Hive
HotSUploader 168 837 18 23 eivindveg/HotSUploader
hprose-java 297 436 0 44 hprose/hprose-java
htmlelements 182 464 17 10 yandex-qatools/htmlelements
jacoco 748 1404 56 35 jacoco/jacoco
JCloisterZone 152 896 55 9 farin/JCloisterZone
jmx_exporter 172 169 3 8 prometheus/jmx_exporter
jparsec 174 331 7 11 jparsec/jparsec
jstorm 3055 330 140 23 alibaba/jstorm
kumuluzee 147 387 9 14 kumuluz/kumuluzee
lambadaframework 191 91 15 5 lambadaframework/lambadaframework
leshan 140 643 21 18 eclipse/leshan
lolibox 131 170 2 8 chocotan/lolibox
light-task-scheduler 1139 1032 50 31 ltsopensource/light-task-scheduler
maven-jaxb2-plugin 158 459 45 24 highsource/maven-jaxb2-plugin
metrics 5039 2218 114 62 dropwizard/metrics
MongoDB-Plugin 154 108 15 19 T-baby/MongoDB-Plugin


mpush 948 1518 6 8 mpusher/mpush
mvvmFX 179 982 17 19 sialcasa/mvvmFX
netty-zmtp 197 272 0 8 spotify/netty-zmtp
newts 146 552 0 14 OpenNMS/newts
objenesis 199 571 0 14 easymock/objenesis
PixelController 184 2521 16 32 neophob/PixelController
RankSys 146 446 8 7 RankSys/RankSys
requests 159 182 4 15 clearthesky/requests
retrofit 22788 1494 58 40 square/retrofit
rtree 500 1019 11 43 davidmoten/rtree
seyren 861 640 40 8 scobal/seyren
simple-java-mail 188 474 3 14 bbottema/simple-java-mail
SlimFast 170 112 2 12 HubSpot/SlimFast
solo 2954 1829 6 10 b3log/solo
speedment 945 2852 82 33 speedment/speedment
symphony 1694 5083 43 11 b3log/symphony
tablesaw 567 666 40 23 lwhite1/tablesaw
underscore-java 132 535 0 28 javadev/underscore-java
webmagic 4960 1005 97 23 code4craft/webmagic
weixin-java-tools 1794 380 9 21 chanjarster/weixin-java-tools
weixin-popular 714 203 8 33 liyiorg/weixin-popular
xembly 146 319 6 45 yegor256/xembly
YCSB 1684 1110 117 45 brianfrankcooper/YCSB
zipkin 6242 1209 152 69 openzipkin/zipkin

Table A.2: Android projects.

Project Stars Commits Issues Releases URL
AboutLibraries 1823 964 0 97 mikepenz/AboutLibraries
android 1303 1380 211 38 cSploit/android
Android-ActionItemBadge 1087 191 0 29 mikepenz/Android-ActionItemBadge
AndroidScreencast 175 107 12 12 xSAVIKx/AndroidScreencast
AndroidPicker 2406 296 37 29 gzu-liyujiang/AndroidPicker
Android-Remote 128 738 41 11 clementine-player/Android-Remote
android-sdk 369 705 4 42 qiniu/android-sdk
AndroidViewAnimations 7915 140 29 14 daimajia/AndroidViewAnimations
andstatus 120 1636 104 113 andstatus/andstatus
AntennaPod 1608 3939 268 77 AntennaPod/AntennaPod
Applozic-Android-SDK 375 1224 55 17 AppLozic/Applozic-Android-SDK
AppUpdate 100 31 0 15 fccaikai/AppUpdate
CircularReveal 2082 88 8 19 ozodrukh/CircularReveal
cwac-pager 101 47 0 9 commonsguy/cwac-pager
Dragger 1102 191 0 12 ppamorim/Dragger
FileDownloader 4585 626 44 53 lingochamp/FileDownloader
glide 16810 1684 412 23 bumptech/glide
Iron 101 216 1 16 FabianTerhorst/Iron
java-telegram-bot-api 202 287 2 24 pengrad/java-telegram-bot-api
LicensesDialog 541 220 1 13 PSDev/LicensesDialog
LollipopShowcase 1776 83 0 6 mikepenz/LollipopShowcase
material-menu 2328 118 3 20 balysv/material-menu
MusicUU 308 59 22 15 Qrilee/MusicUU
Ok2Curl 131 99 2 15 mrmike/Ok2Curl
openxc-android 199 571 0 14 openxc/openxc-android
OkHttpFinal 608 101 13 2 pengjianbo/OkHttpFinal
PanicButton 1107 1046 4 18 PanicInitiative/PanicButton
PictureSelector 1853 995 14 54 LuckSiege/PictureSelector


pulsar 184 2521 16 32 apache/incubator-pulsar
rpicheck 100 443 18 15 eidottermihi/rpicheck
RxFlux 324 62 1 2 skimarxall/RxFlux
RxJavaSamples 2672 12 18 2 rengwuxian/RxJavaSamples
ShowcaseView 4866 496 65 14 amlcurran/ShowcaseView
signal-cli 365 242 17 18 AsamK/signal-cli
SkyTube 142 378 143 9 ram-on/SkyTube
slidr 1505 71 8 4 r0adkll/Slidr
SlyceMessaging 876 121 20 10 Slyce-Inc/SlyceMessaging
SMSSync 551 1816 79 44 ushahidi/SMSSync
sugar 2353 483 225 6 chennaione/sugar
ViewAnimator 1625 59 9 3 florent37/ViewAnimator

Appendix B

Studied Library Migrations

Tables B.1 and B.2 list the studied successive versions of the projects along with the two libraries involved in each migration. The tables also indicate whether the developers removed the dependency after migration.
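As a concrete illustration of what a "Y" in the Removed column corresponds to, consider a hypothetical pom.xml fragment (ours, not taken from any studied project) for a migration away from Joda-Time to the standard Java java.time API:

```xml
<!-- Hypothetical pom.xml fragment. After migrating all call sites from
     Joda-Time to java.time, this dependency element is deleted from the
     build file; rows marked "Y" in Tables B.1 and B.2 reflect such removals,
     while "N" rows keep the old dependency alongside the new one. -->
<dependency>
  <groupId>joda-time</groupId>
  <artifactId>joda-time</artifactId>
  <version>2.10</version>
</dependency>
```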

Table B.1: Maven projects.

System | Version Migrating | Version Next | Library Migrated | Alternative | Removed
admiral | 0.9.2 | 1.1.0 | Jackson-Dataformat-YAML | SnakeYAML | N
allure1 | 1.4.10 | 1.4.11 | Old-JAXB-Runtime | standard Java (java.nio.charset) | Y
apollo | 0.6.3 | 0.7.0 | Plexus-Archiver | Google Guice | N
apollo | 0.6.3 | 0.7.0 | UnidalFrameWork | internal implementation | Y
auto | 1.4 | 1.4.1 | org.ow2.asm | module removed | Y
bitsquare | 0.3.4 | 0.3.5 | Enzo | standard Java (javafx) | Y
brave | 2.4.2 | 3.0.0-alpha-1 | Apache-Commons-Lang | AutoValue | N
brave | 2.4.2 | 3.0.0-alpha-1 | Guava | standard Java (javafx) | Y


cassandra-lucene-index | 2.2.9.0 | 3.0.5.4 | Data-Mapper-For-Jackson | Jackson-Core | N
checkstyle | 6.16.1 | 6.17 | Apache-Commons-Lang | extracted files | Y
checkstyle | 7.1.1 | 7.1.2 | Guava | standard Java | Y
commafeed | 2.2.0 | 2.3.0 | Swagger-Jaxrs | Swagger-Annotations | N
commafeed | 2.2.0 | 2.3.0 | Querydsl-JPA-Support | Querydsl-Core-Module | N
crawler4j | 4.1 | 4.3 | Lidalia-SLF4J-Extensions | SLF4J | N
crawljax | 2.2 | 3.0.0 | Apache-Commons-Configuration | internal configuration | Y
crawljax | 2.2 | 3.0.0 | Apache-Commons-IO | standard Java (java.io) | Y
cron-utils | 5.0.0 | 5.0.1 | Apache-Commons-Lang | extracted files | Y
cron-utils | 4.1.3 | 5.0.0 | Guava | standard Java | Y
cron-utils | 4.1.3 | 5.0.0 | Joda-Time | standard Java | Y
DCMonitor | 0.1.2-rc | 0.1.3 | Influxdb | Prometheus | N
density-converter | 0.9.1 | 0.9.2 | Imgscalr | Thumbnailator | N
disconf | 2.6.2 | 2.6.25 | Jackson | knightliao.apollo.JsonUtils | N
disconf | 2.6.30 | 2.6.31 | Jersey-Core-Client | Apache-HttpClient | N
disconf | 2.6.2 | 2.6.25 | Spring-Framework | Spring-core | N
Discord4J | 2.7.0 | 2.8.0 | Gson | Jackson-Core | N
Discord4J | 2.4.1 | 2.4.2 | Java-WebSocket | Jetty-Core | N
druid | 0.9.2 | 0.10.0 | metamx:java-Util | extracted files | Y
EasyReport | 1.0.15 | 2.0.16 | Apache-Log4j | projectlombok | N
erlyberly | 0.2.3 | 0.3.0 | ShichimiFX | javafx | N
fb-android-dagger | 1.0.3 | 1.0.5 | Guava | extracted files | Y


ff4j | 1.4 | 1.5 | Jackson | Jackson-Core (fasterxml) | N
freedomotic | 5.5.1 | 5.6.0-rc3 | standard Java (logging) | SLF4J | N
FXGL | 0.1.9 | 0.2.0 | Ehcache | Internal-Cache | Y
galen | 1.6.4 | 2.0.0 | Rainbow4J | extracted files | Y
geo | 0.7.4 | 0.7.5 | Guava | module segregation | N
GeoIP2-java | 2.7.0 | 2.8.0-rc1 | Google-HTTP-Client | Apache-HttpClient | N
GoogleAuth | 0.4.2 | 0.4.3 | Javax.ws | Apache-HttpClient | N
GoogleAuth | 0.4.1 | 0.4.2 | Guava | standard Java | Y
google-java-format | 1.1 | 1.2 | AutoValue | internal implementation | Y
google-java-format | 1 | 1.1 | Jcommander | internal implementation | Y
grakn | 0.11.0 | 0.12.0 | Logback-Core | module removed | N
greenmail | 1.4.1 | 1.5.0 | Apache-Commons-Lang | extracted files | Y
ha-bridge | 2.0.7 | 3.0.0 | Jackson-Core (fasterxml) | Gson | N
hangout | 0.1.2 | 0.1.3 | Jinjava | module removed | Y
hawkular-apm | 0.1.0. | 0.2.0. | Swagger-Annotations | Swagger-Annotations | N
Hive2Hive | 1.1.0 | 1.1.1-alpha1 | Jansi | internal implementation | Y
HotSUploader | 1.0-rc.1 | 1.0-rc.2 | Google-HTTP-Client | standard Java (java.net) | Y
HotSUploader | 2.0.5 | 2.1.0 | Gluon-Ignite-Common | Spring-Framework | N


hprose-java | 2.0.30 | 2.0.31 | JavaServlet-Specification | Java-Servlet-API | N
htmlelements | 1.16 | 1.17 | Commons-Lang | Apache-Commons-Lang | N
htmlelements | 1.14 | 1.15 | Lambdaj | standard Java (java.util) | Y
jacoco | 0.6.5 | 0.7.0 | ASM-All | ASM-Debug-All | N
JCloisterZone | 2.7 | 3.0.2 | Apache-MINA-Core | standard Java (java.nio.channels) | Y
jmx_exporter | 0.4 | 0.5 | JSON.simple | SnakeYAML | N
jparsec | 2.3 | 3.0-rc1 | Cglib-Nodep | standard Java (Lambda) | Y
jstorm | 0.9.6.1 | 0.9.6.2 | Curator | Apache-Curator | N
jstorm | 0.9.6.1 | 0.9.6.2 | Fastjson | JSON.simple | N
jstorm | 0.9.6 | 0.9.6.1 | JSON.simple | Fastjson | N
kumuluzee | 2.1.0 | 2.1.0 | YamlBeans | SnakeYAML | N
lambadaframework | 0.0.3 | 0.0.4 | Ini4j | module removed | Y
leshan | 0.1.11-M5 | 0.1.11-M7 | Apache-Commons-Codec | extracted files | Y
light-task-scheduler | 1.6.6 | 1.6.7.1 | ReflectASM | module removed | Y
light-task-scheduler | 1.6.4 | 1.6.5 | Curator | Apache-Curator | N
light-task-scheduler | 1.6.4 | 1.6.5 | Apache-Commons-DbUtils | extracted files | Y
lolibox | 0.0.3- | 0.0.4- | JavaServlet-Specification | module removed | N
lolibox | 0.1.0 | 0.2.0 | Argparse4j | module removed | Y
lolibox | 0.1.0 | 0.2.0 | Javax.ws.rs-Api | Spring-Web | Y
maven-jaxb2-plugin | 0.12.2 | 0.12.3 | Maven-Plugin-API | SLF4J | N


metrics | 3.0.2 | 3.1.0 | Caliper | Oracle-JMH | N
MongoDB-Plugin | 1.0.6.3 | 1.0.7 | SLF4J | Logback-Classic | N
mpush | 0.6.1 | 0.7.0 | Gson | Fastjson | N
mvvmFX | p-0.1.3 | 0.1.4 | Javax.inject | Google-Guice-Core-Library | N
mvvmFX | p-0.1.3 | m-0.1.4 | standard Java (reflect) | TypeTools | N
netty-zmtp | 0.2.0 | 0.2.1 | JeroMQ | JeroMQ | N
newts | 1.0.0 | 1.1.0 | Dropwizard | Dropwizard | N
objenesis | 2.4 | 2.5 | Spring-OSGi-Core | OPS4J-PaxExam | N
PixelController | 1.5.1 | 2.0.0.RC1 | Processing-Core | standard Java | Y
RankSys | 0.1 | 0.3 | Apache-Commons-Lang | standard Java (java.util) | Y
requests | 4.7.0 | 4.7.2 | Apache-Log4j-API | module removed | Y
retrofit | 0.6.0-rc3 | 0.6.0-rc4 | Google-Guice-Core | Javax.inject | N
retrofit | 1.6.0 | 2.0.0 | RxJava | RxJava | N
retrofit | 1.6.0 | 2.0.0 | AppEngine-API | module removed | Y
retrofit | 0.6.0-rc6 | 0.1.0 | EasyMock | Mockito | N
retrofit | 0.6.0-rc6 | 0.1.0 | Javax.inject | module removed | Y
retrofit | 0.6.0-rc6 | 0.1.0 | Apache-HttpClient | module removed | Y
rtree | 0.7.4 | 0.7.5 | Guava | Guava-Mini | N
seyren | 1.2.1 | 1.3.0 | PagerDuty-API | PagerDuty-Incidents | N
simple-java-mail | 3.1.0 | 4.0.0 | Hamcrest-Core | AssertJ-Assertions | N
SlimFast | 0.13 | 0.14 | JetS3t | AWS-SDK | N
solo | 1.3.0 | 1.4.0 | Appengine-ApiStubs | module removed | Y
solo | 2.1.0 | 2.2.0 | Pegdown | Flexmark-Core | N
solo | 1.3.0 | 1.4.0 | MarkdownPapers-Core | Pegdown | N


speedment | 2.0.0-EA | 2.0.0EA2 | Apache-Log4j | internal implementation | Y
speedment | 2.3.7 | 3.0.0-EA | Gson | internal implementation | Y
speedment | 2.2.3 | 2.3.0 | Apache-Groovy | Internal-Json | Y
speedment | 2.3.7 | 3.0.0-EA | ControlsFX | JavaFX-Maven-Plugin | Y
symphony | 2.1.0 | 2.2.0 | Pegdown | Flexmark-Core | N
symphony | 1.3.0 | 1.4.0 | MarkdownPapers-Core | Pegdown | N
symphony | 1.3.0 | 1.4.0 | Jetty-Websocket | WebSocket-Server-API | N
tablesaw | v0.7.1 | v0.7.1.1 | Opencsv | Opencsv | N
underscore-java | v1.17 | v1.18 | Guava | standard Java | Y
webmagic | 0.4.2 | 0.4.3 | Apache-Log4j-API | SLF4J | N
webmagic | 0.5.3 | 0.6.0 | Guava | standard Java | Y
weixin-popular | 2.0.1-R | 2.2.0-R | Jackson-Databind | Fastjson | N
xembly | 0.16.2 | 0.16.3 | Rexsl-Test | Jcabi-Matchers | N
YCSB | 0.6.0 | 0.7.0RC1 | yahoo-gemfire-binding | com.gemstone | N
zipkin | 1.6.0 | 1.7.0 | Okio | extracted files | Y
zipkin | 1.6.0 | 1.7.0 | Moshi | Gson | N
zipkin | 1.20.0 | 1.22.0 | elasticsearch | module removed | Y

Table B.2: Android projects.

System | Version Migrating | Version Next | Library Migrated | Alternative | Removed
AboutLibraries | 5.5.9 | 5.6.0 | cardsui-for-android | removed | Y
android | 1.5.3 | 1.5.4 | ActionBarSherlock | Android-support | Y
Android-ActionItemBadge | 1.2.0 | 2.5.5 | Android-Iconify | Android-Iconics-Library | N
AndroidPicker | 1.4.5 | 1.4.6 | MaterialViewPager | Android-support | Y
Android-Remote | 10.1 | 11 | ShowcaseView-Library | Android-support | Y
AndroidScreencast | 0.0.9 | 0.0.10 | Spring-Framework | Dagger | N
AndroidScreencast | 0.0.9 | 0.0.10 | Apache-Log4j | SLF4J | N
android-sdk | 7.1.3 | 7.2.0 | OkHttp | OkHttp | N
AndroidViewAnimations | 1.1.3 | 2 | Nine-Old-Androids | Android-support | Y
andstatus | 24.03-r | 25.04-r | HttpComponents-Client-For-Android | HttpClient-Android-Library | N
AntennaPod | 1.4.0.12 | 1.4.1.4 | Apache-Commons-Lang | Android-support | Y
Applozic-Android-SDK | 1.61 | 4.82 | Apache-HttpCore | standard Java (java.net) | Y
AppUpdate | 2.0.6 | 2.0.7 | OkHttp | standard Java (java.net) | Y
CircularReveal | 1.3.0 | 1.3.1 | Nine-Old-Androids | Android-support (animation) | Y
cwac-pager | 0.2.5 | 0.2.6 | ActionBarSherlock | Android-support | Y
Dragger | 1.0.5.1 | 1.0.5.2 | Nine-Old-Androids | Android-support | Y
FileDownloader | 1.3.0 | 1.3.9 | OkHttp | standard Java (java.net) | Y
glide | 3.3.1 | 3.4.0 | Hamcrest-Library | Truth-Core | N
Iron | 0.6.5 | 0.7.2 | Kryo-Serializers | Kryo | N
java-telegram-bot-api | 1.3.3 | 2.0.0 | Retrofit | OkHttp | N
LicensesDialog | 1.2.0 | 1.3.0 | Android-Native (Log) | SLF4J | N


LicensesDialog | 1.2.0 | 1.3.0 | Simple-XML | XML-Pull-Parsing-API | N
LicensesDialog | 1.4.0 | 1.5.0 | SLF4J | Removed | Y
LicensesDialog | 1.2.0 | 1.3.0 | Apache-Commons-IO | standard Java (java.io) | Y
LollipopShowcase | 2.0.2 | 2.2.0 | Android-Iconify | Android-Iconics-Library | N
LollipopShowcase | 2.0.2 | 2.2.0 | AboutLibraries | AboutLibraries-Library | N
LollipopShowcase | 2.0.2 | 2.2.0 | Crouton | Snackbar | N
LollipopShowcase | 2.2.0 | 2.3.0 | Snackbar | Android-support | Y
material-menu | 1.5.5 | 2.0.0 | ActionBarSherlock | Android-support | Y
material-menu | 1.5.5 | 2.0.0 | Nine-Old-Androids | Android-support | Y
material-menu | 1.5.5 | 2.0.0 | AppCompatActivity | removed | Y
MusicUU | 1.1.8 | 1.2.1 | OkHttp | NoHttp | N
MusicUU | 1.1.8 | 1.2.1 | Retrofit | NoHttp | N
MusicUU | 1.1.8 | 1.2.1 | Zhy.OkHttpUtils | OkHttpUtils | N
Ok2Curl | 0.0.4 | 0.1.0 | OkHttp | OkHttp | N
OkHttpFinal | 1.2.2 | 2.0.6 | OkHttp | OkHttp | N
openxc-android | 5.3.2 | 6.0.0-a1 | Jackson | Gson | N
PanicButton | 1.2.0 | 1.2.1 | RoboGuice | Android-support | N
PictureSelector | 1.4.5 | 1.4.6 | OkHttp | removed | Y
PictureSelector | 1.5.0 | 1.5.2 | EventBus | self-written | Y
pulsar | 1.15.7 | 1.16 | AsynchronousHttp-Clientning | Asynchronous-Http-Client | N
pulsar | 1.15.7 | 1.16 | JSON-In-Java | Gson | N
rpicheck | 1.6.5a | 1.7.1 | Android-PullToRefresh | Android-Support | Y
rpicheck | 1.6.5a | 1.7.1 | ActionBarSherlock | Android-Support | Y
RxFlux | 0.2.1 | 0.4.1 | OkHttp | OkHttp | N
RxFlux | 0.2.1 | 0.4.1 | Retrofit | Retrofit | N
RxJavaSamples | 1.0 | 1.01 | OkHttp | OkHttp | N
RxJavaSamples | 1.0 | 1.01 | Retrofit | Retrofit | N
ShowcaseView | 4.0 | 5.0.0 | Nine-Old-Androids | Android-support | Y


signal-cli | 0.0.4 | 0.0.5 | JSON-In-Java | Jackson-Core | N
SkyTube | 2.3 | 2.4 | Picasso | Glide | N
Slidr | 2.0.3 | 2.0.4 | Picasso | Glide | N
SlyceMessaging | 1.0.1 | 1.0.2 | Picasso | Glide | N
SMSSync | 2.5.1 | 2.6 | ActionBarSherlock | Android-Support | Y
sugar | 1.4 | 1.5 | Guava: Google-Core-Libraries-For-Java | Java.net (Map) | Y
ViewAnimator | 1.0.3 | 1.0.4 | Nine-Old-Androids | Android-support | Y

Appendix C

Systems that Broke Transitively Dependent Clients

Table C.1 lists the projects that likely broke client code after library migration. The table shows the severity of data type problems (DT) and method problems (M) as reported by JAPICC for each project. Details regarding the severity metric used by JAPICC can be found online.85 The table also shows whether migration caused issues with parameters (P), return types (RT), and superinterfaces (SI), as well as the compatibility score (C) between versions.

Table C.1: Breaking APIs.

System DT-High DT-Med DT-Low M-High M-Med M-Low P RT SI C
admiral Y N N N N N Y N N 0.201
allure1 Y N N N N N N N N 0.863
Android Y N N N N N Y N N 0.343
apollo Y N N N N N N N Y 0.976
apollo N Y N N N N N N Y 0.976
brave Y N N Y N N Y Y Y 0.875
brave Y N N N N N N N N 0.875
cassandra-lucene-index N N Y Y N N Y N N 0.695
checkstyle N N N Y N N N Y N 1

85 https://lvc.github.io/japi-compliance-checker/#Maintainers


crawljax Y N N N N N Y N N 0.017
disconf Y N N N N N Y N N 0.962
geo N N N N N N N N N 1
ha-bridge N N Y Y Y Y Y N N 0.806
hangout Y N N N N N N N N 0.833
Hive2Hive Y N N N N N N N N 0.74
HotSUploader Y N N N N N N N N 0.737
JCloisterZone Y N N N N N Y N N 0
jmx_exporter Y N N N N Y N N N 0.985
jparsec Y N N N N N Y N N 0
jstorm Y N N Y N N Y Y N 0.97
jstorm Y N N Y N N Y Y N 0.978
jstorm N N Y Y N N N Y N 0.978
LicensesDialog Y N N N N N N N Y 0.922
light-task-scheduler N N Y Y N N Y Y N 0.911
lolibox Y N N N N N N N N 0.279
metrics N Y N N N Y N Y Y 0.882
mpush Y N N N N N Y N N 0.745
newts N N Y N N N N N Y 0.985
openxc-android Y N N N N N N N N 0.904
pulsar Y N N Y Y N Y Y N 0.987
retrofit Y N N N N N N N N 0.009
speedment Y N N N N N N N N 0.001
symphony Y N N N N N Y Y N 0.849
webmagic Y N N N N N N N N 0.955
weixin-popular Y N N N N N Y Y N 0.927
YCSB Y N N N N N N N N 0
zipkin Y N N N N N N N N 0.998

Appendix D

Systems Used for Evaluation

Table D.1 lists the projects that were used for evaluation of EDW, along with their successive versions and involved libraries. The table also shows the total classes (C), methods (M), fields (F), and lines of code (LOC) for each evaluated project.

Table D.1: Systems for evaluation.

System | First version | New version | Migrated library | Alternative library | C | M | F | LOC
Graylog2-server | 0.13.0-rc.1 | 0.20.0 | JSON.simple | FasterXML | 226 | 1089 | 651 | 18389
Esper | 5.4.0 | 6.0.0 | Apache Commons Logging | SLF4J | 5086 | 39652 | 13726 | 539559
Choco-solver | 3.3.1 | 3.3.2 | SLF4J | Removed | 797 | 6673 | 2478 | 121959
Webprotege | 2.6.0 | 3.0.0 | Google Guice Core Library | Dagger | 1987 | 11216 | 4016 | 110105
Geowave | 0.9.3 | 0.9.4 | Apache Log4j | SLF4J | 1422 | 11533 | 4843 | 192608
Hapi-fhir | 1.6 | 2 | JSR 374 (JSON Processing) | Gson | 5428 | 4079 | 1332 | 59193
Commands | commit | 0.1 (master) | Guava | Java native | 144 | 950 | 317 | 12396


Azure-cosmosdb-java | 1.0.2 | 2.0.0 | JSON in Java | Jackson Core | 274 | 2277 | 1274 | 35973
Incubator-dubbo | 2.61 | 2.6.2 | EasyMock | Mockito | 546 | 4462 | 1371 | 54919
Ta4j-origins | 0.8 | 1 | Joda Time | Java native | 298 | 989 | 489 | 20590

Appendix E

Individual Project Results for Evaluation

Tables E.4, E.1, E.2, and E.3 list the TP, FP, FN, precision, recall, F-score, and MSI (manual search interpretation) at class-granularity for EDW and the three JRipples search strategies, respectively. Tables E.6, E.5, E.8, and E.7 list the TP, FP, FN, precision, recall, F-score, and PTR (points to remember) at method- and field-granularity, for EDW and JRipples search strategy 1, respectively.
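The precision, recall, and F-score columns in these tables follow the standard information-retrieval definitions over the TP, FP, and FN counts. A minimal sketch (ours, not EDW code), checked against Project 1 of Table E.1:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard metrics from confusion-matrix counts; zero-safe."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Project 1 in Table E.1: TP=10, FP=2, FN=0
p, r, f1 = precision_recall_f1(10, 2, 0)
print(round(p, 2), round(r, 2), round(f1, 2))  # prints: 0.83 1.0 0.91
```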

Table E.1: Individual project results for class-granularity, JRipples search strategy 1.

Project TP FP FN P R F1 MSI
1 10 2 0 0.83 1 0.91 24
2 636 0 4 1 0.99 0.99 1405
3 31 0 0 1 1 1 78
4 54 18 0 0.75 1 0.86 70
5 186 0 0 1 1 1 592
6 2 2 0 0.5 1 0.67 15
7 23 0 0 1 1 1 38
8 43 0 0 1 1 1 48
9 23 4 0 0.85 1 0.92 31
10 14 0 0 1 1 1 25
sum 1022 26 4 — — — 2326
mean 102.2 2.6 0.4 0.89 0.99 0.93 232.6

Table E.2: Individual project results for class-granularity, JRipples search strategy 2.

Project TP FP FN P R F1 MSI
1 10 2 0 0.83 1 0.91 24
2 640 0 0 1 1 1 2949
3 31 0 0 1 1 1 47
4 54 18 0 0.75 1 0.85 88
5 186 45 0 0.81 1 0.89 651
6 2 2 0 0.5 1 0.67 8
7 23 0 0 1 1 1 42
8 43 6 0 0.87 1 0.93 48
9 23 4 0 0.85 1 0.92 31
10 14 0 0 1 1 1 25
sum 1026 77 0 — — — 3910
mean 102.6 7.7 0 0.86 1 0.91 391

Table E.3: Individual project results for class-granularity, JRipples search strategy 3.

Project TP FP FN P R F1 MSI
1 10 2 0 0.83 1 0.91 21
2 640 26 0 0.96 1 0.98 2976
3 31 0 0 1 1 1 47
4 54 18 0 0.75 1 0.85 88
5 186 337 0 0.36 1 0.52 1621
6 2 0 0 1 1 1 130
7 23 0 0 1 1 1 42
8 43 6 0 0.87 1 0.93 51
9 23 4 0 0.85 1 0.92 27
10 14 0 0 1 1 1 23
sum 1026 393 0 — — — 5026
mean 102.6 39.3 0 0.86 1 0.91 502.6

Table E.4: Individual project results for class-granularity, EDW.

Project TP FP FN P R F1 MSI
1 10 0 0 1 1 1 0
2 636 0 4 1 0.99 0.99 0
3 31 0 0 1 1 1 0
4 54 0 0 1 1 1 0
5 186 0 0 1 1 1 0
6 2 0 0 1 1 1 0
7 23 0 0 1 1 1 0
8 43 0 0 1 1 1 0
9 23 0 0 1 1 1 0
10 14 0 0 1 1 1 0
sum 1022 0 4 — — — 0
mean 102.2 0 0.4 1 0.99 0.99 0

Table E.5: Individual project results for method-granularity, JRipples search strategy 1.

Project TP FP FN P R F1 PTR
1 12 2 0 0.85 1 0.92 50
2 8 0 0 1 1 1 18
3 24 0 27 1 0.47 0.64 94
4 23 0 0 1 1 1 258
5 6 0 11 1 0.35 0.52 12
6 37 18 0 0.67 1 0.80 128
7 30 5 0 0.86 1 0.92 88
8 62 27 9 0.69 0.87 0.75 161
9 69 18 0 0.79 1 0.88 594
10 53 28 0 0.65 1 0.79 147
sum 324 98 47 — — — 1550
mean 32.4 9.8 4.7 0.85 0.86 0.82 155

Table E.6: Individual project results for method-granularity, EDW.

Project TP FP FN P R F1 PTR
1 12 0 0 1 1 1 0
2 8 0 0 1 1 1 0
3 51 0 31 1 0.43 0.60 0
4 23 0 0 1 1 1 0
5 17 0 11 1 0.35 0.52 0
6 37 0 0 1 1 1 0
7 30 0 0 1 1 1 0
8 71 0 9 1 0.87 0.93 0
9 69 0 0 1 1 1 0
10 53 0 0 1 1 1 0
sum 324 371 0 — — — 0
mean 32.4 37.1 0 1 0.99 0.86 0

Table E.7: Individual project results for field-granularity, JRipples search strategy 1.

Project TP FP FN P R F1 PTR
2 636 31 4 0.95 0.99 0.97 2813
3 16 0 1 1 0.94 0.97 17
5 202 0 0 1 1 1 202
7 15 7 0 0.68 1 0.81 15
9 18 1 0 0.94 1 0.97 18
10 7 0 0 1 1 1 7
sum 898 39 5 — — — 3072
mean 149.67 6.5 1.67 0.93 0.99 0.95 512

Table E.8: Individual project results for field-granularity, EDW.

Project TP FP FN P R F1 PTR
2 636 0 4 1 0.99 0.99 0
3 16 0 0 1 1 1 0
5 202 0 0 1 1 1 0
7 15 0 0 1 1 1 0
9 18 0 0 1 1 1 0
10 7 0 0 1 1 1 0
sum 897 0 4 — — — 0
mean 149.5 0 0.67 1 0.99 0.99 0

Appendix F

Experimental Materials

This appendix reproduces the materials provided to participants during the experimental trials described in Section 7.5. The detailed task descriptions and instructions are provided in Section F.1. The pre-study, post-task, and final questionnaires are provided in Sections F.2, F.3, and F.4, respectively.

F.1 Task Descriptions and Instructions

Task 1

Task 2

F.2 Pre-Study Questionnaire

F.3 Post-Task Questionnaire

F.4 Final Questionnaire

Appendix G

Detailed Experimental Results

Tables G.1, G.2, and G.3 list the confusion matrix, precision, recall, and F-score achieved by JRipples for each participant in Task 1 at class-, method-, and field-granularity, respectively. Tables G.4, G.5, and G.6 list the same measures for each participant in Task 2. Note that EDW achieved perfect precision and recall in all cases.

Table G.1: Effectiveness: JRipples on Task 1 at class-granularity.

Participant TP FP FN P R F
P1 7 0 0 1 1 1
P2 7 0 0 1 1 1
P3 2 0 5 1 0.29 0.44
P4 7 0 0 1 1 1
P5 7 0 0 1 1 1
P6 2 0 5 1 0.29 0.44
P7 7 0 0 1 1 1
P8 7 0 0 1 1 1

Table G.2: Effectiveness: JRipples on Task 1 at method-granularity.

Participant TP FP FN P R F
P1 4 0 0 1 1 1
P2 4 0 0 1 1 1
P3 3 0 1 1 0.75 0.86
P4 4 2 0 0.67 1 0.8
P5 4 1 1 0.8 0.8 0.8
P6 2 1 2 0.67 0.5 0.57
P7 4 11 0 0.27 1 0.42
P8 3 5 1 0.38 0.75 0.5

Table G.3: Effectiveness: JRipples on Task 1 at field-granularity.

Participant TP FP FN P R F
P1 9 0 0 1 1 1
P2 9 0 0 1 1 1
P3 0 0 9 0 0 0
P4 9 0 0 1 1 1
P5 7 0 2 1 0.78 0.88
P6 0 0 9 0 0 0
P7 9 0 0 1 1 1
P8 7 1 2 0.73 0.78 1

Table G.4: Effectiveness: JRipples on Task 2 at class-granularity.

Participant TP FP FN P R F
P1 8 2 7 0.89 0.53 0.67
P2 15 0 0 1 1 1
P3 11 0 4 1 0.73 0.85
P4 15 0 0 1 1 1
P5 8 0 7 1 0.53 0.70
P6 7 1 8 0.88 0.47 0.61
P7 15 0 0 1 1 1
P8 15 0 0 1 1 1

Table G.5: Effectiveness: JRipples on Task 2 at method-granularity.

Participant TP FP FN P R F
P1 25 0 26 1 0.49 0.65
P2 40 0 11 1 0.78 0.88
P3 25 0 26 1 0.49 0.66
P4 26 4 25 0.86 0.50 0.64
P5 14 0 37 1 0.27 0.43
P6 13 6 38 0.68 0.25 0.37
P7 40 6 11 0.86 0.78 0.82
P8 30 1 21 0.96 0.58 0.73

Table G.6: Effectiveness: JRipples on Task 2 at field-granularity.

Participant TP FP FN P R F
P1 23 0 13 1 0.64 0.78
P2 28 0 8 1 0.78 0.88
P3 16 1 20 0.94 0.44 0.60
P4 35 0 1 1 0.97 0.99
P5 2 0 34 1 0.06 0.11
P6 13 0 23 1 0.36 0.53
P7 34 0 2 1 0.94 0.97
P8 36 0 0 1 1 1
