Understanding and Auditing the Licensing of Open Source Software Distributions

Understanding and Auditing the Licensing of Open Source Software Distributions Daniel M. German†, Massimiliano Di Penta‡, Julius Davies† † Dept. of Computer Science, University of Victoria, Canada ‡ Dept of Engineering, University of Sannio, Italy [email protected], [email protected], [email protected] Abstract—Free and open source software (FOSS) is often When one installs a new application/library in a Unix distributed in binary packages, sometimes part of GNU/Linux system, this is often done from what is known as a bi- operating system distributions, or part of products dis- nary package (such as RPM packages in Fedora/Redhat-like tributed/sold to users. FOSS creates great opportunities for users, developers and distributions or .deb packages in Debian-like distributions). integrators, however it is important for them to understand the Other than the various artifacts composing the applica- licensing requirements of any package they use. Determining tion/library, the package also contains metadata describing, the license of a package and assessing whether it depends among other things, (i) under what open source license the on other software with incompatible licenses is not trivial. package is distributed (which we call the declared license Although this task has been done in a labor intensive manner by software distributions, automatic tools to perform this of the binary package), and (ii) the list of other packages analysis are highly desired. required in order to successfully install and use the current This paper proposes a method to understand licensing com- package (its required packages) [2]. patibility issues in software packages, and reports an empirical From a legal point of view, modifying and redistributing study aimed at auditing licensing issues in binary packages of a FOSS package poses two important issues: the Fedora-12 GNU/Linux distribution. The objective of this study is (i) to understand how the license declared in packages 1) Can we trust the declared license of the package? i.e., is consistent with those of source code files, and (ii) to audit is that license consistent with those of the files the the licensing information of Fedora-12, highlighting cases of package contains? incompatibilities between dependent packages. The obtained results—supported by feedback received from 2) Do the dependency requirements of a binary package Fedora contributors—show that there exist many nuances in create potential legal concerns? Software with different determining the license of a binary package from its source licenses can be combined to create larger systems, but code, as well as cases of license incompatibility issues due to such combinations can increase the chance for license package dependencies. incompatibilities. Keywords: Software licenses, evolution, mining software In this paper we describe a method to help understand repositories, open source systems, empirical study. licensing issues that can arise from changing, combining, and re-distributing packages in open source distributions. I. INTRODUCTION First, we identify the license of all files contained in the The advent of free and open source software (FOSS) has source code from which the binary package is created created a large ecosystem of applications, libraries and com- (the source package). This gives us a detailed overview of ponents that are readily available for download and usage. licenses present in the source code of a package, and allows Often these applications/libraries/components are distributed us to identify possible inconsistencies with the declared together with the operating system, and this happens for the license of the binary package. Second, we combine the various distributions of the GNU/Linux operating system. dependency graph of a binary package with the declared The intellectual property of FOSS is protected by licensing licenses of its dependencies, trying to identify any license mechanisms and copyright notices that determine how an inconsistencies. open source can be used, even impacting the architecture of We have carried out a large empirical study aimed at a system [1]. analyzing licensing issues in the entire Linux-based Fedora- Dealing with issues related to open source licensing is not 12 operating system. Results indicated the presence of trivial, as at the time of writing there are 65 open source some inconsistencies between the declared licenses and the licenses approved by the Open Source Initiative, and many source code ones, as well as problems arising because more in use; each of them imposing particular constraints of dependencies between different packages. For many of concerning how one can use and/or change a software. these potential problems, we contacted Fedora people or the License auditing is usually a manual analysis (assisted by package maintainers, obtaining clarifications and feedback custom scripts) performed by GNU/Linux distributions (such about the problems we pointed out, as reported in the paper. as Debian, OpenSuse, RedHat, and Ubuntu). In many cases, either Fedora or the package maintainers made changes to address such issues. work the finding about licensing change, in fact this is one The main contribution of this paper is a better understand- of the reasons for misalignment between licenses declared ing of how auditing of licenses happens in a large ecology of in the packages and licenses in source code files. packages that use many different licenses. This information There have been some attempts towards the creation is valuable towards the creation of computer support to assist of automatic environments for the verification of software legal auditors in their day-to-day work. licenses [6], [7]. In both cases, they take a simple approach: The paper is organized as follows. After a discussion packages are licensed under a single, well defined license, of related work in Section II, Section III details some of and the dependency data is used to identify potential vi- the problems that can arise when combining open source olations. Alspaugh et al. [7] propose a requirement-based software distributed under different licenses. Section IV approach: the objective is to determine any restrictions describes the proposed method to audit open source distri- in the way components are integrated into a system be- butions for license compatibility issues. Section V describes fore this is built. Instead, Tuunanen et al. [6] propose a the empirical study we carried out on Fedora-12. Results more comprehensive approach: their environment identifies are presented and discussed in Section VI, while Section licenses from source code, uses compiling information to VII discusses the threats to validity. Finally, Section VIII determine if two components are connected to find potential concludes the paper and outlines directions for future work. violations. In both cases the approach has been evaluated against small applications composed of few components, II. RELATED WORK and they do not deal with the complexities of a large Linux This section describes related work concerning (i) the distribution containing more than 10,000 binary applications analysis and understanding of legal issues in software sys- and hundreds of thousands of source code files (we found tems, and (ii) the analysis of software distributions. 327,286 source code files in Fedora-12). In recent years legal issues and problems concerned with Although specific on issues related to software licens- intellectual property have triggered a series of research ing, this is not the first study aimed at analyzing en- works. Licenses impose constraints and thus can be defined tire GNU/Linux distributions. Robles et al. [8] and then as logical formulae constraining what can and cannot be Gonzalez-Barahona´ et al. [9] related the evolution of soft- done with a system. Software licensing patterns have been ware distributions with the evolution of single applications, recently studied by German et al. [1] using such a formal- finding that the former influences the latter. German et ization of licenses. They introduced several legal patterns, al. [2] proposed a method to analyze build dependencies along with examples of occurrences of these patterns. As in software distributions, and used it to analyze the Debian German et al. [1] did, we also consider constraints imposed GNU/Linux distribution. They showed how the retrieved by open source licenses and, in particular, we rely on inter-dependencies helped to understand how packages are these constraints to mine inconsistencies (i) between licenses used and variants in which the package can be installed. declared in the packages and source code licenses, and (ii) Similarly to this work, they used package descriptions to incompatibilities due to dependencies between packages and analyze dependencies—although their study was done on libraries having different licenses. Debian packages, while our study focuses on RPM packages. German et al. [3], presented a study of the influence of Stemming from that idea, we propose to integrate package software licenses on code migration between the FreeBSD, dependency information with license information to identify Linux, and OpenBSD kernels. Their findings support the potential cases of re-distribution with license incompatibili- hypothesis of a preferential code flow induced by permis- ties. sive licenses from FreeBSD and OpenBSD towards Linux. Hindle et al. [4] discovered that many of the largest commits III. CHALLENGES OF COMBINING OPEN SOURCE correspond to changes to the licenses

Understanding and Auditing the Licensing of Open Source Software Distributions

SUSE® Linux Enterprise Desktop 12 and the Workstation Extension: What's New ?

Speech Recognition Systems: a Comparative Review

Speech-To-Text System for Phonebook Automation

Remote Desktop Server: XDMCP

Multipurpose Large Vocabulary Continuous Speech Recognition Engine Julius

Indicators for Missing Maintainership in Collaborative Open Source Projects

The Wavesurfer Automatic Speech Recognition Plugin

Speech Phonetization Alignment and Syllabification (SPPAS): a Tool for the Automatic Analysis of Speech Prosody Brigitte Bigi, Daniel Hirst

List of Application Added in ARL #2607

Debian and Ubuntu

Portuguese Flash Cards

Xrdp.Log.Pdf