Effective and Efficient Reuse with Software Libraries
Total Page:16
File Type:pdf, Size:1020Kb
Effective and Efficient Reuse with Software Libraries Lars Heinemann Institut für Informatik der Technischen Universität München Effective and Efficient Reuse with Software Libraries Lars Heinemann Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. Daniel Cremers Prüfer der Dissertation: 1. Univ.-Prof. Dr. Dr. h.c. Manfred Broy 2. Prof. Martin Robillard, Ph.D. McGill University, Montréal, Kanada Die Dissertation wurde am 26.07.2012 bei der Technischen Universität München eingere- icht und durch die Fakultät für Informatik am 20.11.2012 angenommen. Abstract Research in software engineering has shown that software reuse positively affects the competitive- ness of an organization: the productivity of the development team is increased, the time-to-market is reduced, and the overall quality of the resulting software is improved. Today’s code repositories on the Internet provide a large number of reusable software libraries with a variety of functionality. To analyze how software projects utilize these libraries, this thesis contributes an empirical study on the extent and nature of software reuse in practice. The study results indicate that third-party code reuse plays a central role in modern software development and that reuse of software libraries is the predominant form of reuse. It shows that many of today’s software systems consist to a considerable fraction of code reused from software libraries. While these libraries represent a great reuse opportunity, they also pose a great challenge to de- velopers. Often these libraries and their application programming interfaces (APIs) are large and complex, leading to cognitive barriers for developers. These obstacles can cause unintended reim- plementation in a software system. This negatively affects the system’s maintainability due to the unnecessary size increase of the code base and may even compromise its functional correctness, as library functionality is often more mature. To investigate the applicability of analytic methods for finding missed reuse opportunities, this work presents an empirical study on detecting functionally similar code fragments. It reports on the considerable challenges of the automated detection of reimplemented functionality in real-world software systems. The results of the study emphasize the need for constructive approaches to avoid reimplementation during forward-engineering. Motivated by these results, this thesis introduces an API method recommendation system to support developers in using large APIs of reusable libraries. The approach employs data mining for extract- ing the intentional knowledge embodied in the identifiers of code bases using the APIs. The knowl- edge obtained is used to assist developers with context-specific recommendations for API methods. The system suggests methods that are useful for the current programming task based on the code the developer is editing within the integrated development environment (IDE). It thereby makes reuse of software libraries both more effective and efficient. As opposed to existing approaches, the proposed system is not dependent on the previous use of API entities and can therefore produce sensible recommendations in more cases. We evaluate the approach with a case study showing its applicability to real-world software systems and quantitatively comparing it to previous work. In addition, we present an approach that transfers the idea of API recommendation systems to the field of model-based development. We introduce a recommendation system that assists develop- ers with context-dependent model element recommendations during modeling. We instantiate and evaluate the approach for the Simulink modeling language. To the best of our knowledge, this is the first work to adopt recommendation systems to the field of model-based development. 3 Acknowledgements I am grateful to all the people that supported me on the way to this thesis. First, I want to thank Manfred Broy for providing me with the opportunity to work at his chair and for supervising this thesis. Working at the chair in teaching and research has both extended and sharpened my view on computer science in a way I did not expect. The professional environment at his chair is outstanding in many ways, in particular with respect to the challenging research projects involving key players from both industry and academia. I am also grateful to Martin Robillard for co-supervising this thesis and his valuable feedback on my work. In addition, I want to express my gratitude to the complete staff at the chair of Prof. Broy. I very much enjoyed working with all of you in the last three and a half years. I especially thank Silke Müller for the organizational support, the IT department for always providing me with all I needed in terms of hardware and software resources, Florian Deißenböck for the most direct and honest review feedback I ever received, Elmar Jürgens for being a great example regarding his remarkable writing skills, Benjamin Hummel for a constantly open door for discussing research ideas, Veronika Bauer for sharing an office with me, and Andreas Vogelsang for his numerous and always helpful paper reviews. My scientific work in the last years included the publication of a number of research papers. I am thankful for the co-authorship in joint paper projects of Veronika Bauer, Florian Deißenböck, Mario Gleirscher, Markus Herrmannsdörfer, Benjamin Hummel, Maximilian Irlbeck, Elmar Jür- gens, Klaus Lochmann, Daniel Ratiu, and Stefan Wagner. I also worked in interesting and challenging research projects at the chair of Prof. Broy together with a number of fellow researchers. I want to express my gratitude to Veronika Bauer, Sebastian Eder, Benedikt Hauptmann, Markus Herrmannsdörfer, Benjamin Hummel, Maximilian Junker, Klaus Lochmann, and Daniel Méndez-Fernández for being great colleagues during research projects. Further thanks go to the proof-readers of this thesis for insightful discussions and valuable feedback: Veronika Bauer, Jonas Eckhardt, Sebastian Eder, Henning Femmer, Bendikt Hauptmann, Benjamin Hummel, Maximilan Junker, Elmar Jürgens, Klaus Lochmann, Birgit Penzenstadler, and Andreas Vogelsang. Last but not least I want to thank my family and friends for their support during my entire education and especially this thesis project. 5 Contents 1 Introduction 9 1.1 Problem Statement . 11 1.2 Contribution . 11 1.3 Outline . 12 2 Preliminaries 13 2.1 Why Software Reuse? . 13 2.2 Software Reuse Approaches . 13 2.3 Obstacles for Software Reuse . 16 2.4 Perspective of this Thesis . 17 2.5 Data Mining Background . 18 2.6 Recommendation Systems in Software Engineering . 21 2.7 Model-Based Development . 22 2.8 ConQAT: Continuous Quality Assessment Toolkit . 23 3 State of the Art 25 3.1 Quantifying Reuse in Software Projects . 25 3.2 Detection of Functional Similarity in Programs . 27 3.3 Developer Support for Library Reuse . 31 3.4 Recommendation Systems for Model-Based Development . 37 4 Empirical Study on Third-Party Code Reuse in Practice 39 4.1 Terms . 40 4.2 Study Design . 40 4.3 Study Objects . 41 4.4 Study Implementation and Execution . 43 4.5 Results . 46 4.6 Discussion . 52 4.7 Threats to Validity . 55 4.8 Summary . 57 5 Challenges of the Detection of Functionally Similar Code Fragments 59 5.1 Terms . 59 5.2 Dynamic Detection Approach . 61 5.3 Case Study . 68 5.4 Threats to Validity . 80 5.5 Comparison to Jiang&Su’s Approach . 80 7 Contents 5.6 Discussion . 82 5.7 Summary . 83 6 Developer Support for Library Reuse 85 6.1 Motivation . 85 6.2 Terms . 86 6.3 Approach . 86 6.4 Case Study . 92 6.5 Discussion . 104 6.6 Threats to Validity . 105 6.7 Summary . 106 7 Beyond Code: Reuse Support for Model-Based Development 107 7.1 Approach . 108 7.2 Case Study . 109 7.3 Discussion . 113 7.4 Threats to Validity . 113 7.5 Summary . 114 8 Tool Support 115 8.1 API Method Recommendation . 115 8.2 Model Recommendation . 118 9 Conclusion 121 9.1 Third-Party Code Reuse in Practice . 121 9.2 Detection of Functionally Similar Code Fragments . 122 9.3 Developer Support for Library Reuse . 123 9.4 Reuse Support for Model-Based Development . 124 10 Future Work 125 10.1 Analysis of Third-Party Code Reuse . 125 10.2 Detection of Functionally Similar Code Fragments . 126 10.3 API Recommendation . 126 10.4 Model Recommendation . 127 Bibliography 129 8 1 Introduction Software reuse addresses the use of existing artifacts for the construction of new software. Research in software engineering has shown that it positively affects the competitiveness of an organization in multiple ways: the productivity of the development team is increased, the time-to-market is reduced, and the overall quality of the resulting software is improved [46, 66]. While software reuse can take many forms, a particularly attractive form are externally developed artifacts, for which the complete development and maintenance is outsourced to third parties. Fueled by both the free/open source movements and commercial marketplaces for software components in combination with the evolution of the Internet as a commodity resource in software development, a large number of reusable libraries and frameworks is available for building new software [101]. The Internet itself has become an interesting reuse repository [43] and is emerging as “a de facto standard library for reusable assets” [26]. For example, the Java library search engine findJAR.com1 indexes more than 140,000 Java class libraries2 and the Apache Maven repository3 contains about 40,000 unique artifacts4. To effectively use a software library, and thus achieve high reuse rates, profound knowledge about the library’s content is required. Unfortunately, recent research indicates that developers still rein- vent the wheel and reimplement general purpose functionality in their code instead of reusing ma- ture and well-tested library functionality [52, 57].