Software Reuse in Open Source Java Projects

On the Extent and Nature of Software Reuse in Open Source Java Projects
Lars Heinemann, Florian Deissenboeck, Mario Gleirscher, Benjamin Hummel, Maximilian Irlbeck
Technische Universität München
ICSR 2011, Pohang, Korea

Software Reuse
• Reuse of existing artifacts for constructing new software
• Proven benefits
  • Increased productivity
  • Reduced time to market
  • Improved quality

Software Reuse (continued)
• Tremendous reuse opportunities
  • Class libraries (e.g. Apache Commons)
  • Frameworks (e.g. Eclipse: 40 MLOC)
  • Open source code (Google Code Search: several GLOC)
• Internet serves as reuse repository

Research Problem
• Unclear how software projects make use of available reuse opportunities
• Lack of data on amount of reuse in software projects
• Assessing success of software reuse difficult

Contribution
• Empirical knowledge about extent and nature of software reuse in OSS
• Quantitative data on software reuse in 20 open source projects
• Substantiates discussion of success/failure of software reuse
• Provides practitioners with benchmark

Terms
• Software reuse: Using code developed by third parties (excluding OS/platform)
• White-box reuse: Code incorporated in source form (internals exposed, potentially modified)
• Black-box reuse: Code incorporated in binary form (internals hidden, no modifications)

Study Design (GQM)
We analyze open source projects for the purpose of understanding the state of the practice in software reuse with respect to its extent and nature from the viewpoint of the developers and maintainers in the context of Java open source software.

Study Design (GQM) (continued)
Question                                         Metric
RQ 1: Do open source projects reuse software?    Existence of software reuse
RQ 2: How much white-box reuse occurs?           White-box reuse rate
RQ 3: How much black-box reuse occurs?           Black-box reuse rate

Reuse Rate
• Overall code of software system = project's own code + reused code
• White-box reuse rate = reused source code [LOC] / overall source code [LOC]
• Black-box reuse rate = reused binary code [bytes] / overall binary code [bytes]

Study Objects
• 20 Java projects from [...]
• Criteria: Production/Stable, standalone app, pure Java, Java SE platform, source download available
• All among 50 most downloaded
• Source code size: 0.4 to 790 kLOC, bytecode size: 17 to 22,761 KB
• Test code excluded with heuristics (e.g. folders named test/tests)

Study Implementation a) Detecting white-box reuse
• White-box reuse = copied code
• Can be detected automatically by clone detectors
• Clone detection against 22 commonly used Java libraries (~6 MLOC)
• Detection of reuse of statement sequences with more than 15 statements

Study Implementation a) Detecting white-box reuse (continued)
• In addition: manual inspection of source directory tree
• Clues: file/package names
• Source of files identified via header comments/web search
• Detection of reuse of whole files/directories, not limited to fixed set of libraries

Study Implementation b) Detecting black-box reuse
• Bytecode-based static analysis
• Aggregates bytecode size of all library types referenced by project's source code
• Traverses type dependency graph using Java Constant Pool (type usages and method calls)
• Includes transitive dependencies

Study Implementation b) Detecting black-box reuse (continued)
• Although not covered by reuse definition, potential variations in use of Java API interesting
• Black-box reuse baseline of empty Java program: 5 MB (2,082 types)
• Object → Class → ClassLoader ... (Reflection API / Collections API)

Results RQ 1: Do open source projects reuse software?
• 18 of the 20 projects (90%) reuse software from third parties
• Exceptions: HSQLDB (relational database engine), YouTube Downloader (video download utility)

Results RQ 2: How much white-box reuse occurs?
• Clone detection found 791 clones, 11,701 copied LOC in 7 study objects
• Clones found: complete files with minor modifications (e.g. different version)
• Manual inspection additionally found whole copied libraries in 4 study objects
• Overall: white-box reuse found for 9 of 20 projects
• Reuse rates: 0% to 10%

Results RQ 3: How much black-box reuse occurs?
[Figure: Absolute bytecode size distribution (MB) per project, split into own code, 3rd party, Java API, and Java API baseline. Java API: 13 to 17 MB; 3rd party: 0 to 42 MB. Projects shown: iReport-Designer, soapUI, RODIN, SQuirreL SQL Client, Azureus/Vuze, OpenProj, TV-Browser, DrJava, Sweet Home 3D, JabRef, Mobile Atlas Creator, jEdit, Buddi, DavMail, FreeMind, HSQLDB, PDF Split and Merge, Mediathek View, subsonic, YouTube Downloader.]

Results RQ 3: How much black-box reuse occurs? (continued)
[Figure: Relative bytecode size distribution (%) per project, split into own code, 3rd party, and Java API. Combined: 41 to 99%; Java API: 23 to 99%; 3rd party: 0 to 62%.]

Results RQ 3: How much black-box reuse occurs? (continued)
[Figure: Relative bytecode size distribution (%) without Java API per project, split into own code and 3rd party.]

Discussion a) Extent of reuse
• Software reuse common among Java OSS
• On average: high black-box reuse rates
• Expected to have significant impact on development effort
• Black-box reuse rates vary considerably

Discussion b) Influence of project size on reuse rate
• Lee & Litecky found a negative influence of project size on reuse rate (survey of 500 Ada professionals)
• Without Java API: Spearman correlation of 0.05 (two-tailed p-value 0.83)
• With Java API: Spearman correlation of -0.93 (p-value < 0.0001) → significant and strong negative correlation

Discussion c) Types of reused functionality
• Categorization of reused libraries (e.g. networking, text/XML, rich client platforms)
• No predominant category found
• Nearly all projects reuse software from more than one category
• No significant insights, except that reuse is diverse w.r.t. types of functionality

Threats to internal validity a) Overestimation of reuse
• False positives from clone detection
  • Mitigated by manual inspection of results
• Unclear if code was copied into study objects or from them
  • Mitigated by manual inspection
• Black-box analysis considers a whole class as the element of reuse

Threats to internal validity b) Underestimation of reuse
• Fixed set of libraries in clone detection
• False negatives in clone detection
• Manual inspection for copied code inherently incomplete
• Black-box analysis misses calls via reflection and boundaries introduced by Java interfaces
• Other forms of component interaction

Threats to external validity
• Unclear how representative study objects are for all Java OSS
• Transferability to other programming languages (PL) or commercial development unclear
• Impact of PL is expected to be high
• Availability of reusable code depends on PL (e.g. Java vs. COBOL)

Conclusions
• Early visions of development by plugging reusable components not realistic
• But: reuse in form of libraries common in Java OSS
• High black-box reuse rates (9 of 20 projects > 50%)
• Availability of reusable functionality well established for Java platform

Future Work
• Other programming ecosystems
• Legacy programming languages, e.g. COBOL
• Scripting languages, e.g. Python
• Commercial software development environments

Thank you. Questions?
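
The black-box analysis under Study Implementation b) aggregates the bytecode size of all library types reachable from the project's own types, including transitive dependencies, and relates it to the overall binary code. The following is a minimal Python sketch of that idea, assuming the type dependency graph and the per-type bytecode sizes have already been extracted. The type names, sizes, and the function names reachable_types and black_box_reuse_rate are hypothetical and are not taken from the study's tooling.

```python
from collections import deque


def reachable_types(roots, deps):
    """Return every type reachable from `roots` in the dependency graph.

    `deps` maps a type name to the types it references. The traversal is
    breadth-first, so transitive dependencies are included.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for target in deps.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def black_box_reuse_rate(own_types, deps, size):
    """Reused binary code [bytes] divided by overall binary code [bytes].

    Overall code is taken as own code plus reused code, following the
    Reuse Rate slide. Library types that are never referenced do not count.
    """
    used = reachable_types(own_types, deps)
    reused = sum(size[t] for t in used if t not in own_types)
    own = sum(size[t] for t in own_types)
    return reused / (own + reused)


# Hypothetical toy data, not from the study.
own_types = {"app.Main", "app.Util"}
deps = {
    "app.Main": ["app.Util", "lib.Parser"],
    "lib.Parser": ["lib.Lexer"],   # transitive dependency of app.Main
    "lib.Unused": ["lib.Lexer"],   # never referenced by the project
}
size = {
    "app.Main": 300,
    "app.Util": 100,
    "lib.Parser": 400,
    "lib.Lexer": 200,
    "lib.Unused": 999,
}

print(black_box_reuse_rate(own_types, deps, size))  # 0.6
```

Because a whole type is counted once it is referenced, a sketch like this shares the overestimation threat named under internal validity: a single call into a large class adds the full class size to the reused bytes.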
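
Discussion b) relates project size to reuse rate with Spearman's rank correlation. As a rough illustration of how such a coefficient is computed, the following Python sketch ranks both series (ties receive the average rank) and then takes the Pearson correlation of the ranks. The input values are made up for the example and are not the study's measurements; the helper names average_ranks and spearman are hypothetical, and no p-value is computed.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up example: larger projects with lower reuse rates.
project_size_kloc = [5, 40, 120, 790]
reuse_rate = [0.9, 0.7, 0.5, 0.4]

# The ranks are perfectly inverse, so rho is -1 up to floating-point rounding.
print(round(spearman(project_size_kloc, reuse_rate), 3))
```

A rank-based coefficient is a natural fit here because project sizes span several orders of magnitude, and ranks are insensitive to that scale.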
