Impact of License Selection on Open Source Software Quality Benjamin J
Purdue University
Purdue e-Pubs

Open Access Theses, Theses and Dissertations, Fall 2014

Impact of license selection on open source software quality
Benjamin J. Cotton, Purdue University

Part of the Computer Engineering Commons, and the Computer Sciences Commons

Recommended Citation
Cotton, Benjamin J., "Impact of license selection on open source software quality" (2014). Open Access Theses. 314. https://docs.lib.purdue.edu/open_access_theses/314

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries.

PURDUE UNIVERSITY GRADUATE SCHOOL
Thesis/Dissertation Acceptance

Benjamin James Cotton
Impact of license selection on open source software quality
Master of Science
Kevin Dittman, Jeffrey Brewer, Jeffrey Whitten

To the best of my knowledge and as understood by the student in the Thesis/Dissertation Agreement, Publication Delay, and Certification/Disclaimer (Graduate School Form 32), this thesis/dissertation adheres to the provisions of Purdue University's "Policy on Integrity in Research" and the use of copyrighted material.

Kevin Dittman, Jeffrey Whitten, 11/24/2014

IMPACT OF LICENSE SELECTION ON OPEN SOURCE SOFTWARE QUALITY

A Thesis Submitted to the Faculty of Purdue University by Benjamin J. Cotton
In Partial Fulfillment of the Requirements for the Degree of Master of Science

December 2014
Purdue University
West Lafayette, Indiana

Dedicated to my wife, Angela, and my daughters, Eleanor and Bridget, whose unconditional love and support made this possible.

ACKNOWLEDGMENTS

No thesis is ever completed without support, advice, and encouragement. I would like to thank the following people for their contributions to my efforts.

Professors Kevin Dittman, Jeffrey Whitten, and Jeffrey Brewer, whose guidance and input as I developed the idea for this work kept me on the right track.
Professors Dittman and Whitten also taught several classes that kept me motivated in the pursuit of my degree.

Gerry McCartney, whose creation of the ITaP Scholarship Program inspired me to apply to graduate school in the first place.

Preston Smith, Randy Herban, Carol Song, and Rob Futrick, who were my supervisors at various times during my graduate studies and graciously allowed me time to attend class in the middle of the workday. Similarly, I must acknowledge the coworkers who had to deal with my sporadic absences.

Finally, but not least, members of various open source communities, including the Fedora Documentation team and the Greater Lafayette Open Source Symposium. Their ideas, both related to my thesis and not, have shaped my interest in open source and community collaboration.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
GLOSSARY
ABSTRACT
CHAPTER 1. INTRODUCTION
  1.1 Statement of the Problem
  1.2 Significance of the Problem
  1.3 Research Question
  1.4 Licenses
    1.4.1 Copyleft
    1.4.2 Permissive
  1.5 Assumptions
  1.6 Limitations
  1.7 Delimitations
  1.8 Summary
CHAPTER 2. REVIEW OF THE LITERATURE
  2.1 Definition of Quality
  2.2 Quality Metrics
    2.2.1 Bug reports
    2.2.2 Selecting metrics
    2.2.3 Static Analysis
  2.3 Technical Debt
    2.3.1 Definition
    2.3.2 Measurement
  2.4 SonarQube
  2.5 Summary
CHAPTER 3. METHODOLOGY
  3.1 Hypotheses
  3.2 Data Collection
    3.2.1 Software Selection
    3.2.2 Metrics Collected
    3.2.3 Collection Environment
  3.3 Analysis Methods
  3.4 Threats to Validity
  3.5 Summary
CHAPTER 4. PRESENTATION OF DATA AND FINDINGS
  4.1 Presentation of the data
  4.2 Analysis of the data
  4.3 Summary
CHAPTER 5. CONCLUSION, DISCUSSION, AND RECOMMENDATIONS
  5.1 Conclusion
  5.2 Future Work
  5.3 Summary
LIST OF REFERENCES

LIST OF TABLES

1.1 The open source licenses included in this study
2.1 Attributes of software quality as defined by Boehm, Brown, and Lipow (1976)
3.1 Software projects included in this study
3.2 The measures used to evaluate projects
4.1 Complexity and technical debt measurements
4.2 Mean technical debt of projects by language and paradigm

LIST OF FIGURES

4.1 Distribution of technical debt for programs in this study
4.2 Distribution of programming languages for programs in this study
4.3 Technical debt of projects analyzed in this study
4.4 Technical debt of Java programs in this study

ABBREVIATIONS

BSD    Berkeley Software Distribution
CDDL   Common Development and Distribution License
FSF    Free Software Foundation
GPL    GNU General Public License
ISO    International Organization for Standardization
KLOC   thousand lines of code
LGPL   GNU Lesser General Public License
LOC    lines of code
MPL    Mozilla Public License
MTBF   mean time between failures
MTTF   mean time to failure
OSI    Open Source Initiative
OSS    open source software
PMI    Project Management Institute
SQALE  software quality assessment based on life-cycle expectations

GLOSSARY

free software: software under a license that provides the four freedoms defined by the Free Software Foundation (2013b)
open source: software under a license that meets the definition given by the Open Source Initiative (n.d.)
permissive: software under a license that is open source but not free software

ABSTRACT

Cotton, Benjamin J. M.S., Purdue University, December 2014. Impact of license selection on open source software quality. Major Professor: Kevin C. Dittman.

Open source software plays an important part in the modern world, powering businesses large and small. However, little work has been done to evaluate the quality of open source software. Two different license paradigms, copyleft and permissive, exist within the open source world, and this study examines the difference in software quality between them. In this thesis, the author uses technical debt as a measure of software quality. Eighty open source projects (40 from each paradigm) were downloaded from the popular open source hosting website SourceForge. Each project was evaluated with the SonarQube technical debt model, using complexity, code duplication, comments, and unit test coverage as inputs. The technical debt was normalized by cyclomatic complexity, and the two paradigms were compared with the Mann-Whitney test. The results showed a clear difference between the two paradigms.
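The comparison summarized above (normalize each project's technical debt by its cyclomatic complexity, then compare the two license paradigms with a rank-based test) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in this thesis: it assumes that normalization means dividing a project's total technical debt by its total cyclomatic complexity, and it implements the two-sided Mann-Whitney U test with the large-sample normal approximation and the usual tie correction. The helper names `normalized_debt`, `rank_with_ties`, and `mann_whitney_u` are hypothetical; a vetted implementation such as SciPy's `scipy.stats.mannwhitneyu` would be preferable for real analysis.

```python
import math
from typing import Sequence


def normalized_debt(debt: float, complexity: int) -> float:
    """Technical debt per unit of cyclomatic complexity.

    Hypothetical helper: assumes normalization means simple division.
    """
    if complexity <= 0:
        raise ValueError("cyclomatic complexity must be positive")
    return debt / complexity


def rank_with_ties(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position in the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p), where U is the statistic for sample x. The variance uses
    the standard tie correction; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    # U for x: its rank sum minus the smallest possible rank sum.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction term: sum of (t^3 - t) over each group of t tied values.
    counts: dict[float, int] = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

For example, `mann_whitney_u([1, 2, 3], [4, 5, 6])` returns U = 0, because every value in the first sample ranks below every value in the second, with a two-sided p-value of about 0.05; these are toy numbers, not data from this study. A rank-based test fits this comparison because it does not assume that normalized technical debt is normally distributed.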
However, the results presented in this thesis are only a starting point. The collected data suggest that the programming language used in a project has an impact on the project's quality. In addition, SonarQube plugins for the popular C and C++ languages were beyond the budget of this work, which excluded many projects from consideration. This thesis closes with several suggestions for further avenues of investigation.

CHAPTER 1. INTRODUCTION

This chapter presents the foundation of the study. It begins with a statement of the problem and its significance, followed by a clear statement of the research question. Important definitions are then provided, including explanations of the license paradigms referenced throughout the study. Finally, the assumptions, limitations, and delimitations applicable to the study are enumerated.

1.1 Statement of the Problem

The development of open source software has grown from the purview of hobbyist programmers into a major source of revenue. In 2012, Red Hat became the first open source company to record a billion dollars in revenue (Babcock, 2012). Red Hat has seen steady growth in revenue over the past decade and reported net revenue above $100 million in 2011 and 2012 (Red Hat, 2013). Other companies, such as Oracle, also generate revenue from support of their open source offerings. Many large Internet corporations such as Google, Facebook, and Amazon make heavy use of open source software to run their businesses. Small businesses especially rely on the open source WordPress and MySQL projects for their web presence (Hendrickson, Magoulas, & O'Reilly, 2012). Researchers (Kuan, 2002; Mockus, Fielding, & Herbsleb, 2002) have investigated the quality of open source projects in comparison to their proprietary counterparts.