Software Reuse in Open Source Java Projects

On the Extent and Nature of Software Reuse in Open Source Java Projects
Lars Heinemann, Florian Deissenboeck, Mario Gleirscher, Benjamin Hummel, Maximilian Irlbeck
Technische Universität München
ICSR 2011, Pohang, Korea

Software Reuse
• Reuse of existing artifacts for constructing new software
• Proven benefits
  • Increased productivity
  • Reduced time to market
  • Improved quality

Software Reuse
• Tremendous reuse opportunities
  • Class libraries (e.g. Apache Commons)
  • Frameworks (e.g. Eclipse: 40 MLOC)
  • Open source code (Google Code Search: several GLOC)
• Internet serves as reuse repository

Research Problem
• Unclear how software projects make use of available reuse opportunities
• Lack of data on amount of reuse in software projects
• Assessing success of software reuse is difficult

Contribution
• Empirical knowledge about extent and nature of software reuse in OSS
• Quantitative data on software reuse in 20 open source projects
• Substantiates discussion of success/failure of software reuse
• Provides practitioners with a benchmark

Terms
• Software reuse: using code developed by third parties (excluding OS/platform)
• White-box reuse: code incorporated in source form (internals exposed, potentially modified)
• Black-box reuse: code incorporated in binary form (internals hidden, no modifications)

Study Design (GQM)
We analyze open source projects for the purpose of understanding the state of the practice in software reuse with respect to its extent and nature from the viewpoint of the developers and maintainers in the context of Java open source software.

Study Design (GQM)
Question                                          Metric
RQ 1: Do open source projects reuse software?     Existence of software reuse
RQ 2: How much white-box reuse occurs?            White-box reuse rate
RQ 3: How much black-box reuse occurs?            Black-box reuse rate

Reuse Rate
• Overall code of software system = project's own code + reused code
• White-box reuse rate = reused source code [LOC] / overall source code [LOC]
• Black-box reuse rate = reused binary code [bytes] / overall binary code [bytes]

Study Objects
• 20 Java projects from [repository name missing in source]
• Criteria: Production/Stable, standalone app, pure Java, Java SE platform, source download available
• All among the 50 most downloaded
• Source code size: 0.4 to 790 kLOC, bytecode size: 17 to 22,761 KB
• Test code excluded with heuristics (e.g. folders named test/tests)

Study Implementation a) Detecting white-box reuse
• White-box reuse = copied code
• Can be detected automatically by clone detectors
• Clone detection against 22 commonly used Java libraries (~6 MLOC)
• Detection of reuse of statement sequences with > 15 statements

Study Implementation a) Detecting white-box reuse
• In addition: manual inspection of source directory tree
• Clues: file/package names
• Source of files identified via header comments/web search
• Detection of reuse of whole files/directories, not limited to a fixed set of libraries

Study Implementation b) Detecting black-box reuse
• Bytecode-based static analysis
• Aggregates bytecode size of all library types referenced by the project's source code
• Traverses type dependency graph using the Java constant pool (type usages and method calls)
• Includes transitive dependencies

Study Implementation b) Detecting black-box reuse
• Java API not covered by the reuse definition, but potential variations in its use are of interest
• Black-box reuse baseline of an empty Java program: 5 MB (2,082 types)
• Object → Class → ClassLoader ... (Reflection API / Collections API)

Results RQ 1: Do open source projects reuse software?
• 18 of the 20 projects (90%) reuse software from third parties
• Exceptions: HSQLDB (relational database engine), YouTube Downloader (video download utility)

Results RQ 2: How much white-box reuse occurs?
• Clone detection found 791 clones, 11,701 copied LOC in 7 study objects
• Clones found: complete files with minor modifications (e.g. different version)
• Manual inspection additionally found whole copied libraries in 4 study objects
• Overall: white-box reuse found for 9 of 20 projects
• Reuse rates: 0% - 10%

Results RQ 3: How much black-box reuse occurs?
[Chart: absolute bytecode size distribution (MB) per project; series: Java API baseline, Java API, 3rd party, own]
• Java API: 13 - 17 MB
• 3rd party: 0 - 42 MB
• Projects as listed on the chart axis: iReport-Designer, soapUI, RODIN, SQuirreL SQL Client, Azureus/Vuze, OpenProj, TV-Browser, DrJava, Sweet Home 3D, JabRef, Mobile Atlas Creator, JEdit, Buddi, DavMail, FreeMind, HSQLDB, PDF Split and Merge, Mediathek View, subsonic, YouTube Downloader

Results RQ 3: How much black-box reuse occurs?
[Chart: relative bytecode size distribution (%) per project; series: Java API, 3rd party, own]
• Combined: 41 - 99%
• Java API: 23 - 99%
• 3rd party: 0 - 62%
• Projects as listed on the chart axis: PDF Split and Merge, YouTube Downloader, DavMail, Mediathek View, Buddi, Mobile Atlas Creator, subsonic, HSQLDB, FreeMind, OpenProj, Sweet Home 3D, iReport-Designer, JabRef, soapUI, RODIN, JEdit, TV-Browser, DrJava, SQuirreL SQL Client, Azureus/Vuze

Results RQ 3: How much black-box reuse occurs?
[Chart: relative bytecode size distribution (%) without Java API per project; series: 3rd party, own]
• Projects as listed on the chart axis: PDF Split and Merge, iReport-Designer, DavMail, Buddi, soapUI, OpenProj, RODIN, Mobile Atlas Creator, SQuirreL SQL Client, DrJava, Sweet Home 3D, TV-Browser, JabRef, FreeMind, Mediathek View, JEdit, subsonic, Azureus/Vuze, HSQLDB, YouTube Downloader

Discussion a) Extent of reuse
• Software reuse common among Java OSS
• On average: high black-box reuse rates
• Expected to have significant impact on development effort
• Black-box reuse rates vary considerably

Discussion b) Influence of project size on reuse rate
• Lee & Litecky found a negative influence of project size on reuse rate (survey of 500 Ada professionals)
• Without Java API: Spearman correlation of 0.05 (two-tailed p-value 0.83)
• With Java API: Spearman -0.93 (p-value < 0.0001) → significant and strong negative correlation

Discussion c) Types of reused functionality
• Categorization of reused libraries (e.g. networking, text/XML, rich client platforms)
• No predominant category found
• Nearly all projects reuse software from more than one category
• No significant insights, except that reuse is diverse w.r.t. types of functionality

Threats to internal validity a) Overestimation of reuse
• False positives from clone detection
  • Mitigated by manual inspection of results
• Unclear if code was copied into study objects or from them
  • Mitigated by manual inspection
• Black-box analysis considers a whole class as the element of reuse

Threats to internal validity b) Underestimation of reuse
• Fixed set of libraries in clone detection
• False negatives in clone detection
• Manual inspection for copied code inherently incomplete
• Black-box analysis misses calls via reflection and boundaries formed by Java interfaces
• Other forms of component interaction

Threats to external validity
• Unclear how representative study objects are for all Java OSS
• Transferability to other PLs or commercial development unclear
• Impact of PL is expected to be high
• Availability of reusable code depends on PL (e.g. Java vs. COBOL)

Conclusions
• Early visions of development by plugging reusable components not realistic
• But: reuse in form of libraries common in Java OSS
• High black-box reuse rates (9 of 20 projects > 50%)
• Availability of reusable functionality well-established for Java platform

Future Work
• Other programming ecosystems
• Legacy programming languages, e.g. COBOL
• Scripting languages, e.g. Python
• Commercial software development environments

Thank you. Questions?
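
Sketch: white-box reuse detection. The slides on "Study Implementation a)" describe finding copied statement sequences of more than 15 statements by running a clone detector against a set of libraries. The following is only a minimal illustration of that idea, not the clone detector used in the study: it treats every non-blank line as one statement, matches exact (whitespace-normalized) windows, and all names (`statements`, `copied_statement_indices`, the toy sources) are hypothetical.

```python
MIN_STATEMENTS = 16  # "more than 15 statements", as stated on the slide


def statements(source):
    """Assumption: one statement per non-blank line, whitespace-normalized."""
    return [" ".join(line.split()) for line in source.splitlines() if line.strip()]


def copied_statement_indices(project_src, library_src, min_len=MIN_STATEMENTS):
    """Return indices of project statements covered by a window of at least
    min_len consecutive statements that also occurs in the library."""
    proj = statements(project_src)
    lib = statements(library_src)
    # Every window of min_len consecutive library statements.
    lib_windows = {tuple(lib[i:i + min_len]) for i in range(len(lib) - min_len + 1)}
    copied = set()
    for i in range(len(proj) - min_len + 1):
        if tuple(proj[i:i + min_len]) in lib_windows:
            copied.update(range(i, i + min_len))
    return copied


# Toy data: a 20-statement "library" and a project that copied statements 1..18.
library_lines = [f"x{i} = {i};" for i in range(20)]
library_src = "\n".join(library_lines)
project_src = "\n".join(
    ["own_a();", "own_b();", "own_c();"] + library_lines[1:19] + ["own_d();", "own_e();"]
)

copied = copied_statement_indices(project_src, library_src)
print(len(copied))  # 18
```

A real clone detector also normalizes identifiers and literals and tolerates small edits, which is why the slides list false positives and false negatives as threats to validity; this exact-match sketch would miss any modified copy.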
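
Sketch: black-box reuse rate. "Study Implementation b)" describes aggregating the bytecode size of all library types reachable from the project's code, including transitive dependencies, and the "Reuse Rate" slide defines the rate as reused bytes over overall bytes (own plus reused). The sketch below assumes the type dependency graph has already been extracted (the study reads it from the Java constant pool; here it is a plain dictionary). The function names and the toy graph are assumptions made for illustration.

```python
from collections import deque


def reachable_library_types(own_types, depends_on):
    """Breadth-first traversal of the type dependency graph, starting at the
    project's own types; returns all reachable types that are not the project's."""
    own = set(own_types)
    seen = set(own)
    queue = deque(own)
    while queue:
        current = queue.popleft()
        for target in depends_on.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen - own


def black_box_reuse_rate(own_types, depends_on, size_in_bytes):
    """Reused binary code [bytes] / overall binary code [bytes]."""
    reused = reachable_library_types(own_types, depends_on)
    reused_bytes = sum(size_in_bytes[t] for t in reused)
    own_bytes = sum(size_in_bytes[t] for t in own_types)
    overall = own_bytes + reused_bytes
    return reused_bytes / overall if overall else 0.0


# Hypothetical toy graph: lib.Unused ships with the library but is never referenced.
depends_on = {
    "app.Main": ["lib.Parser", "app.Util"],
    "app.Util": [],
    "lib.Parser": ["lib.Lexer"],
    "lib.Lexer": [],
    "lib.Unused": ["lib.Lexer"],
}
size_in_bytes = {
    "app.Main": 3000,
    "app.Util": 1000,
    "lib.Parser": 4000,
    "lib.Lexer": 2000,
    "lib.Unused": 9000,
}
own_types = ["app.Main", "app.Util"]

rate = black_box_reuse_rate(own_types, depends_on, size_in_bytes)
print(f"{rate:.2f}")  # 0.60
```

Counting only reachable types keeps an unused part of a large library from inflating the rate, while counting whole classes (rather than the methods actually called) is the overestimation threat named on the internal validity slide.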
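
Sketch: Spearman rank correlation. "Discussion b)" reports Spearman correlations between project size and reuse rate. As a reminder of what that coefficient measures, the sketch below computes it as the Pearson correlation of the ranks, with tied values sharing their average rank. It does not compute the p-values quoted on the slide, and the sample data is invented, not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Invented example: larger projects with strictly lower reuse rates.
sizes_kloc = [10, 50, 200, 800]
reuse_rates = [0.9, 0.7, 0.4, 0.1]
rho = spearman(sizes_kloc, reuse_rates)
print(round(rho, 3))
```

Because only ranks matter, a value near -1 says that reuse rate falls monotonically as size grows, without implying a linear relationship.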