Code Siblings: Technical and Legal Implications of Copying Code Between Applications
Daniel German, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, and Giuliano (Giulio) Antoniol
MSR 2009, Vancouver

The Challenge
- Code, like any other artistic production, is regulated by copyright law
- Companies hold ownership of their source code
- The free and open source software (FOSS) model is different
- Copying 27 LOC out of 525 KLOC resulted in a copyright infringement
- Users and companies must be aware of copyright law and ownership

Code Has Preferential Migration Flows

License Types
- Permissive: the MIT/X11 and BSD licenses
  - Minor constraints on the licensee
  - Fragments can be included in a system under a different license
  - BSD-licensed fragments can be included in proprietary systems
  - CAVEAT! There are multiple BSD licenses: the original BSD (4-clause), the new BSD (3-clause), and the 2-clause BSD
  - Code licensed under the original 4-clause BSD cannot be included in systems licensed under the GPL
- Reciprocal: GNU variants
  - Any system that includes the fragments must be licensed under the same license
  - GPL-licensed fragments can only be included in systems licensed under the same version of the GPL

The Scale of the Problem
- Widely adopted systems are in the range of MLOC and thousands of files
- If 27 LOC in 525 KLOC can lead to copyright infringement:
  - Implications for companies reusing code
  - Implications for end users
- We are like detectives
  - Help monitor and detect license inconsistencies
  - Help monitor and identify inconsistent licenses in code fragments

Empirical Study
- Code siblings: code fragments that migrated from one system to another and then evolved following their own paths
- Three *nix kernels
  - Linux: ~7 MLOC and 20,000 files
  - FreeBSD: ~8 MLOC and 21,000 files
  - OpenBSD: ~2 MLOC and 5,500 files
- Overall size as of Jan. 2009: 17 MLOC

Research Questions
- RQ1: What kinds of open source licenses are used in the three kernels?
- RQ2: How many potential siblings exist between the BSD kernels and the Linux kernel?
- RQ3: What licenses are used by siblings and, if different, why?
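The inclusion rules listed under License Types can be written down as a small lookup. The following is a minimal sketch that encodes only the rules stated on that slide; the `License` enum, the `can_include` function, and the choice to answer `False` for anything the slide does not cover are assumptions made for illustration, not part of the study's tooling.

```python
from enum import Enum


class License(Enum):
    # Hypothetical labels for the licenses named on the slide.
    MIT_X11 = "MIT/X11"
    BSD_2 = "2-clause BSD"
    BSD_3 = "3-clause (new) BSD"
    BSD_4 = "4-clause (original) BSD"
    GPL_V2 = "GPL v2"
    GPL_V3 = "GPL v3"
    PROPRIETARY = "proprietary"


PERMISSIVE = {License.MIT_X11, License.BSD_2, License.BSD_3, License.BSD_4}
RECIPROCAL = {License.GPL_V2, License.GPL_V3}


def can_include(fragment: License, system: License) -> bool:
    """Return True if the slide's rules allow a fragment in a system."""
    if fragment in RECIPROCAL:
        # Reciprocal: the system must use the same version of the GPL.
        return system == fragment
    if fragment == License.BSD_4 and system in RECIPROCAL:
        # The caveat: original 4-clause BSD code cannot go into GPL systems.
        return False
    if fragment in PERMISSIVE:
        # Permissive fragments may be included under a different license.
        return True
    # Anything not covered by the slide: answer conservatively (assumption).
    return False


if __name__ == "__main__":
    for frag, sys_lic in [
        (License.BSD_3, License.PROPRIETARY),
        (License.BSD_4, License.GPL_V2),
        (License.GPL_V2, License.GPL_V3),
    ]:
        print(f"{frag.value} into {sys_lic.value}: {can_include(frag, sys_lic)}")
```

A table-driven check like this only restates the slide; real compatibility decisions depend on the full license texts.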
Technologies and Setup
- Clone detection tool
  - CCFinderX
  - Minimum clone length of 100 tokens
  - Parse only .c files
  - Concentrate on pairs of files sharing a high percentage of common code fragments: at least ~30%, i.e., ~20 LOC
  - Prune files mapped into more than five siblings
- License detection and identification
  - First comment(s) of each file
  - FOSSology version 1.0.0
  - 78 different license variants
  - Added 5 more licenses

Sibling(s) Origin
- Identify current siblings
- Trace back into past siblings: their code fragments in the same files
- When they disappear, we have their origins
- Take the oldest of the two as the true origin
[Figure: Sys 1, File i and Sys 2, File j, each containing cloned fragments; an arrow between the siblings marks the migration direction]

RQ1: Kinds of Open Source Licenses
- Linux … is Linux …
  - 65% of files under the GPL, plus 25% of files "promoted" to the GPL by L. Torvalds
  - A few files (35) have two licenses
- FreeBSD
  - 75% of the files with a BSD license
  - 189 files (5%) with no license
  - 179 files with a corporate license (Intel licenses)
  - 167 files with an MIT license
  - A few files with multiple licenses: 19 BSD and GPL, 15 BSD and Educational, 14 MIT and GPL
- OpenBSD
  - 76% BSD licenses
  - 295 files (9%) with an MIT license, 179 with an educational license
  - 138 files (4%) without a license
  - 59 files with BSD and Educational, 25 with MIT and BSD, and 14 with BSD and GPL

RQ2: Siblings Between Kernels
[Figure: bar chart comparing FreeBSD vs. Linux and OpenBSD vs. Linux; categories: siblings, filtered siblings, clone pairs, Files Linux, Files BSD, File Pairs, File Pairs (same name); y-axis from 0 to 2500, with a second panel on a 0 to 250 scale]
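The filtering steps from Technologies and Setup and the "oldest is the origin" rule from Sibling(s) Origin can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `FilePair` record, its field names, and the order of the two filters (threshold first, then pruning) are hypothetical, and the clone detector's output is assumed to be already reduced to one shared-code ratio per file pair.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FilePair:
    # Hypothetical record for one Linux/BSD file pair reported by a clone detector.
    linux_file: str
    bsd_file: str
    shared_ratio: float       # fraction of common cloned code, 0.0 to 1.0
    linux_first_seen: date    # when the cloned fragment first appears in Linux
    bsd_first_seen: date      # when it first appears in the BSD kernel


MIN_SHARED_RATIO = 0.30       # "at least ~30%" from the slide
MAX_SIBLINGS_PER_FILE = 5     # "prune files mapped into more than five siblings"


def filter_siblings(pairs):
    """Keep pairs above the threshold whose files map to at most five siblings."""
    candidates = [p for p in pairs if p.shared_ratio >= MIN_SHARED_RATIO]
    linux_counts = Counter(p.linux_file for p in candidates)
    bsd_counts = Counter(p.bsd_file for p in candidates)
    return [
        p for p in candidates
        if linux_counts[p.linux_file] <= MAX_SIBLINGS_PER_FILE
        and bsd_counts[p.bsd_file] <= MAX_SIBLINGS_PER_FILE
    ]


def origin(pair):
    """Take the older of the two siblings as the origin (ties go to BSD here)."""
    return "Linux" if pair.linux_first_seen < pair.bsd_first_seen else "BSD"
```

Pruning files with many siblings presumably removes boilerplate-heavy files that match almost everything; the tie-breaking rule in `origin` is arbitrary and only there to keep the sketch total.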
RQ3: Code Migration and Licenses
Each row gives the license of the source file, the license of the destination file, and the number of files.

FreeBSD to Linux (before Jan 1, 2002; almost nothing after)
  FreeBSD        Linux          Files
  BSD            GPL            8
  BSD            MIT            2

OpenBSD to Linux
  OpenBSD        Linux          Files
  BSD            None           2
  BSD            BSD+GPL        1
  Corporate      BSD+GPL        89
  BSD            MIT            2
  GPL            None           1
  Phrase         BSD+GPL        1
  BSD            Unknown        1
  X.Net+BSD      MIT            1
  BSD+GPL        GPL            1
  BSD+Phrase     Phrase+GPL     1
  MIT            GPL            23

Linux to FreeBSD (after Jan 1, 2002; nothing before)
  Linux          FreeBSD        Files
  BSD+GPL        Corporate      8
  GPL            BSD            17
  GPL            BSD+GPL        1
  GPL            CPL+BSD+GPL    1
  MIT            BSD            1
  MIT+GPL        None           2
  None           BSD            1
  None           BSD            1
  Phrase+GPL     MIT            2

AIC7xxx: Maintaining Siblings
- 1994: Linux, AIC7xxx series SCSI adapters
- 1995: Linux code is incorporated into an OpenBSD driver
- 1996: NetBSD driver is ported to FreeBSD
  - #ifdef to maintain the variants
- 1997: A mailing list is created in FreeBSD to unify the efforts of people in the different kernels
  - The major development of the driver seems to happen in FreeBSD
- 2000: Development propagates to Linux, NetBSD, and OpenBSD
- Today: Development mostly in Linux and FreeBSD

GPL Code in FreeBSD
- 2002: Silicon Graphics xfs file system integrated into Linux
- Dec 12, 2005: xfs appears in FreeBSD
  - The license of xfs is the GPL
  - FreeBSD is licensed under the 2-clause BSD
  - Including xfs in a BSD kernel requires the kernel to be under the GPL too
- Compiling GPL-licensed code into the kernel makes it "RESTRICTED"
  - It can no longer be distributed in binary form, nor can its source code be made available for mirroring

License Defects
- FreeBSD rdma_cma.c and Linux cma.c are siblings
- In Linux, it appeared on Jun 17, 2006, with 64 changes, including 8 made after it appeared in FreeBSD
- The Linux sibling is licensed under the GPL v2 and the 2-clause BSD licenses
- The FreeBSD sibling is licensed under the terms of the new BSD license, the GPL v2, and the Common Public License
- Original license still present in FreeBSD
- Linux license was changed:

  commit a9474917099e007c0f51d5474394b5890111614f
  Author: Sean Hefty <[email protected]>
  Date: Mon Jul 14 23:48:43 2008 -0700

      RDMA: Fix license text

      The license text for several files references a third software license
      that was inadvertently copied in. Update the license to what was
      intended. This update was based on a request from HP.
      [..]

Conclusion
- Code moves and code siblings do exist
- Siblings have a preferential flow
  - Initially from the BSD(s) to Linux: frequent
  - Today from Linux to FreeBSD: less frequent
- Companies directly contribute code to different kernels: see the Intel drivers with dual licenses
- Managing siblings is a difficult problem

If you don't monitor, code may sneak in …
Questions?
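The comparison behind the License Defects slide amounts to a set difference between the licenses found in two siblings. The sketch below is a minimal illustration of that idea, not the detection method used in the study: the function name and the license labels are hypothetical, and the two example sets simply restate what the slide reports for the RDMA siblings.

```python
def license_differences(licenses_a, licenses_b):
    """Return the licenses that appear in only one of two sibling files.

    The result is a pair of sorted lists: (only in A, only in B).
    A non-empty list flags the pair for manual review; it does not by
    itself prove a licensing defect.
    """
    a, b = set(licenses_a), set(licenses_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # License sets as described on the slide; the labels are illustrative.
    linux_sibling = {"GPL-2.0", "BSD-2-Clause"}
    freebsd_sibling = {"BSD-3-Clause", "GPL-2.0", "CPL"}

    only_linux, only_freebsd = license_differences(linux_sibling, freebsd_sibling)
    print("Only in the Linux sibling:  ", only_linux)
    print("Only in the FreeBSD sibling:", only_freebsd)
```

In this example the Common Public License shows up only on the FreeBSD side, which matches the "third software license" that the Linux commit message describes removing.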