Comparison of Open Source License Scanning Tools
Bachelor Degree Project
Comparison of Open Source License Scanning Tools
Author: Hailing Zhang
Supervisor: Morgan Ericsson, Lu Wang
Semester: VT 2020
Subject: Computer Science
Abstract
We aim to determine the features of four popular FOSS scanning tools, FOSSology, FOSSA, FOSSID (SCAS), and Black Duck, and thereby provide a reference for users choosing a suitable tool for open-source license compliance in their projects. The sanity tests first verify the license detection function by using these tools to scan the same project. We use the number of licenses found and the scanned size as metrics of accuracy. We then generate test samples in different programming languages and sizes to compare scanning efficiency. The experiment data demonstrate that each tool fits different user requirements. Thus this project could be considered a definitive user guide.
Keywords: Software licenses, FOSS scanning tool, accuracy, efficiency
Preface
We would like to thank Morgan Ericsson for his guidance and advice during the writing of this thesis. We also want to thank Lu Wang for the research topic, and Björn Kihlblom, Mats Fröjdh, and Wei Cao for their feedback. We would not have been able to finish this degree project without the resources provided by Ericsson.
Contents
1 Introduction
  1.1 Related work
  1.2 Problem formulation
  1.3 Motivation
  1.4 Objectives
  1.5 Scope
  1.6 Target group
  1.7 Outline
2 Background
  2.1 Software licenses
    2.1.1 Free and Open Source Software
    2.1.2 Software license compliance
  2.2 Tools introduction
    2.2.1 FOSSology
    2.2.2 FOSSA
    2.2.3 FOSSID
    2.2.4 Black Duck
3 Method
  3.1 Method selection
  3.2 Reliability and Validity
4 Implementation
  4.1 Experiment design
    4.1.1 Sanity test design
    4.1.2 Advanced test design
  4.2 Experiment preparation
  4.3 Experiment execution
  4.4 Experiment results
5 Results
  5.1 Sanity test results
  5.2 Advanced test results
    5.2.1 Results of advanced test A
    5.2.2 Results of advanced test B
6 Analysis
  6.1 FOSSology
  6.2 FOSSA
  6.3 FOSSID
  6.4 Black Duck
7 Discussion
8 Conclusion
  8.1 Future work
References
1 Introduction
The technical advantages of free and open-source software (FOSS) lead companies to use it in almost all products [1], because FOSS components usually get ample support from the open-source community. Faster technology iteration at lower cost promotes the spread of emerging technologies and fosters innovation [26]. On the other hand, license compatibility problems and copyright obligations give rise to legal controversy [8]. Because reused code may carry license terms and conditions that oblige the licensee to use the source code under preconditions, unintended consequences could jeopardize corporate intellectual property and obstruct later development. In this context, commercial vendors such as Black Duck and FOSSID came to market. They assist organizations in identifying licenses and discovering reused snippets. The availability of scanning tools mitigates legal risk, especially when developers modify, redistribute, or create derivative works based on FOSS [20].
1.1 Related work
Researchers have put considerable effort into implementing new scanning tools and analyzing the legal theory. Still, few published papers discuss the differences in performance among scanning tools. Because of business competition, most existing scanning projects are released under copyleft licenses. Under nondisclosure agreements and copyright protection, analyzing the algorithms becomes impossible because the source code is not accessible. Research comparing scanning tools is therefore scarce and focuses on open-source licensing projects.
The diploma thesis "Software Licensing Analysis Tool" by Tomáš Radej [20] inspired the design of our controlled experiments. The author compared LicenseCheck and Licorice by running detection on a random sample of packages taken from the Fedora operating system's repository.
Kapitsaki, Tselikas, and Foukarakis contributed a visualization of license compatibility and an integrated framework to support license conflict detection in their article "An insight into license tools for open source software systems" [14]. The article investigates software licensing and gives a critical, comparative overview of existing assistive approaches and tools. Their research demonstrates the role of the different methods in license use decisions. This thesis therefore chooses tools with different working principles for its experiments. The accuracy of the license risk reported by each tool will be determined based on FOSS license categories.
The OSI and FSF documents list compatibility relationships among licenses [10] [19], which lay the theoretical foundation of this project, especially for designing test samples. Every tool's website and user guide emphasizes the importance of license compliance [4] [5] [8], which motivated this thesis and also provides references for the design of experiments in Chapter 2.
1.2 Problem formulation
This thesis aims to determine the capabilities and characteristics of FOSS scanning tools on the market. Since it is a challenge for an organization to know which scanning tool to use in its development work, we will try to determine the performance of FOSS scanning tools through controlled experiments. By analyzing the scanning results, we record each tool's computational efficiency and accuracy as a database for choosing suitable FOSS scanning tools for future projects at Ericsson.
1.3 Motivation
This is an era defined by software; FOSS components in emerging products are universal and increasing [12]. FOSS scanning tools have thus gained attention from commercial companies. The vendors all declare that they have the most comprehensive knowledge base of open-source components, vulnerabilities, and license information [8]. This project aims to provide experiment data as a reference for open-source compliance in product development, so that enterprises or individuals can save the time and expense of testing the various commercial scanning tools.
The proper tool can ensure that the company's intellectual property rights are not unintentionally exposed while it contributes to FOSS and FOSS forums. Using scanning tools also helps the company fulfill its legal obligations under open-source licenses without limiting its ability to commercialize products and retain ownership of them. Besides, copyright protection could help open-source software flourish by ensuring that users respect authors' requirements. After all, making better software is what open source is all about. This thesis attempts to help users of FOSS components develop and publish their products legitimately, thus improving the software industry by popularizing the concept of software license compliance.
1.4 Objectives
The objectives of this thesis are listed below.
O1 Compare the capabilities of FOSSology, FOSSA, FOSSID (SCAS), and Black Duck by using them to perform license detection on the same project.
O2 Compare the scanning time of FOSSology and Black Duck on projects of different sizes and programming languages.
1.5 Scope
The scope of the thesis project is limited; we will only test the scanning tools mentioned earlier. Because they are released under non-free licenses, the analysis of scanning results will not involve the source code or the algorithms that cause the differences in performance. For a similar reason, the description of test objects will include the programming language, lines of code, and the instructions of open-source components. We designed the experiments to observe the performance of the candidate tools under different programming languages instead of code statements.
We discussed the license definitions in Section 1.1. From a practical point of view, FOSS scanning aims to find licenses and code that may jeopardize product security rather than to recognize only the FOSS licenses approved by both OSI and FSF. Since scientific writing is supposed to use plain and accurate descriptions rather than rhetorical flourishes, this project will not limit the scanning scope to the FOSS licenses approved by both FSF and OSI, but will cover popular licenses of each category approved by either OSI or FSF.
Besides, vendors tend to emphasize that their tools can integrate into the continuous integration and delivery pipeline, but this thesis will not discuss that function, because the difference does not affect the tools' performance and the test samples will not be integrated with any parent project. This project is in the computer science area, and the author does not have any legal background, so this project does not give legal advice.
Although some tools have functions beyond FOSS license detection, such as vulnerability identification, risk evaluation, and dependency version confirmation, this project will not discuss these aspects. These extra functions belong to another kind of scanning tool, one that finds security vulnerabilities such as cross-site scripting, SQL injection, and insecure server configuration.
1.6 Target group
Companies across all industries are racing to use, participate in, and contribute to open source projects for the various advantages they offer from leveraging external engineering resources that