Code siblings: technical and legal implications of copying code between applications
Daniel German, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, and Giuliano (Giulio) Antoniol
MSR 2009, Vancouver

The Challenge
Code, like any other artistic production, is regulated by copyright law
Companies own their source code
The free and open source software (FOSS) model is different
Copying 27 LOC out of 525 KLOC resulted in a copyright infringement
Users and companies must be aware of copyright law and ownership

Code Has Preferential Migration Flows

License Types
Permissive – the MIT/X11 and BSD licenses
    Minor constraints on the licensee
    Inclusion of fragments in a system under a different license
    BSD-licensed fragments can be included in proprietary systems
    CAVEAT! Multiple BSD licenses: the original BSD (4-clause BSD), the new BSD (3-clause BSD), and the 2-clause BSD
    Code licensed under the original 4-clause BSD cannot be included inside systems licensed under the GPL
Reciprocal – GNU variants
    Any system that includes the fragments must be licensed under the same license
    GPL-licensed fragments can only be included in systems licensed under the same version of the GPL
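
The inclusion rules on this slide can be written down as a small check. The sketch below is only an illustration of the slide's simplified rules, not legal advice and not a tool used in the study; the `License` type, the license names, and `can_include` are hypothetical, and real cases (for example "GPL v2 or later" wording) need more care.

```python
from typing import NamedTuple


class License(NamedTuple):
    family: str        # "permissive", "reciprocal", or "proprietary" (assumed labels)
    name: str          # e.g. "BSD-4-clause", "MIT", "GPL" (assumed labels)
    version: str = ""  # only used for reciprocal licenses such as the GPL


def can_include(fragment: License, system: License) -> bool:
    """Apply the slide's simplified rules for including a fragment in a system."""
    if fragment.family == "reciprocal":
        # Reciprocal: the including system must carry the same license and version.
        return (
            system.family == "reciprocal"
            and system.name == fragment.name
            and system.version == fragment.version
        )
    # Permissive fragments may go into differently licensed systems,
    # with the caveat that 4-clause BSD code cannot go into GPL systems.
    if fragment.name == "BSD-4-clause" and system.name == "GPL":
        return False
    return True
```

For instance, a 3-clause BSD fragment passes the check for a proprietary system, while a 4-clause BSD fragment fails it for a GPL system.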

The Scale of the Problem
Widely adopted systems are in the range of MLOC and thousands of files
If 27 LOC in 525 KLOC lead to copyright infringement:
    Implications for companies reusing code
    Implications for end users
We are like detectives
    Help monitor and detect license inconsistencies
    Help monitor and identify inconsistent licenses in code fragments

Empirical Study
Code siblings: code fragments that migrated from one system to another and then evolved following their own paths
Three *nix kernels
    Linux: ~7 MLOC and 20,000 files
    FreeBSD: ~8 MLOC and 21,000 files
    OpenBSD: ~2 MLOC and 5,500 files
Overall size as of Jan. 2009: 17 MLOC

Research Questions
RQ1: What kinds of open source licenses are used in the three kernels?
RQ2: How many potential siblings exist between the BSD kernels and the Linux kernel?
RQ3: What licenses are used by siblings and, if different, why?

Technologies and Setup
Clone detection tool
    CCFinderX
    Minimum of 100 tokens
    Parse only .c files
    Concentrate on pairs of files sharing a high percentage of common code fragments, at least ~30%, i.e., ~20 LOC
    Prune files mapped into more than five siblings
License detection and identification
    First comment(s)
    FOSSology version 1.0.0
    78 different license variants
    Added 5 more licenses
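
The two filtering steps in the clone detection setup (keep file pairs sharing at least ~30% of their code, then prune files mapped into more than five siblings) could be sketched as follows. This is a minimal illustration under assumptions: the input tuples, the function name, and the way the shared fraction is computed are hypothetical and are not CCFinderX output or the authors' scripts.

```python
from collections import defaultdict


def filter_sibling_candidates(pairs, min_shared=0.30, max_siblings=5):
    """Filter candidate sibling file pairs.

    pairs: iterable of (file_a, file_b, shared_fraction), where file_a comes
    from one kernel, file_b from the other, and shared_fraction is the
    (assumed precomputed) share of common cloned code between the two files.
    """
    # Step 1: keep only pairs that share a high percentage of code.
    kept = [(a, b, s) for a, b, s in pairs if s >= min_shared]

    # Step 2: count the distinct siblings each file is mapped to.
    siblings_of_a = defaultdict(set)
    siblings_of_b = defaultdict(set)
    for a, b, _ in kept:
        siblings_of_a[a].add(b)
        siblings_of_b[b].add(a)

    # Step 3: prune pairs whose files map into more than max_siblings siblings.
    return [
        (a, b, s)
        for a, b, s in kept
        if len(siblings_of_a[a]) <= max_siblings
        and len(siblings_of_b[b]) <= max_siblings
    ]
```

Pruning heavily shared files is a design choice that trades recall for precision: a file that clones into many others is more likely boilerplate than a genuine migrated sibling.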

Sibling(s) Origin
Identify current siblings
Trace back into past siblings – their code fragments in the same files
When they disappear, then we have their origins
Take the oldest of the two as the true origin
[Figure: cloned fragments in Sys 1 – File i and Sys 2 – File j, with the migration direction between the siblings]
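
The origin heuristic on this slide (trace each sibling back until its cloned fragment disappears, then take the older of the two) can be sketched as below. The snapshot representation and both function names are assumptions made for illustration; the slides do not say how the history was stored or queried.

```python
from datetime import date


def first_appearance(history):
    """Trace back from the newest snapshot until the fragment disappears.

    history: list of (snapshot_date, fragment_present) for one sibling file.
    Returns the earliest date of the unbroken run ending at the newest
    snapshot, or None if the newest snapshot does not contain the fragment.
    """
    earliest = None
    for snapshot_date, present in sorted(history, reverse=True):
        if not present:
            break  # the fragment disappears here, so the run started just after
        earliest = snapshot_date
    return earliest


def origin(system_a, history_a, system_b, history_b):
    """Return the system whose sibling is older, or None if undecidable."""
    first_a = first_appearance(history_a)
    first_b = first_appearance(history_b)
    if first_a is None or first_b is None or first_a == first_b:
        return None
    return system_a if first_a < first_b else system_b
```

The result is only as precise as the snapshot granularity: two siblings first seen in the same snapshot cannot be ordered, which is why the sketch returns `None` in that case.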

RQ1: Kinds of open source licenses
Linux … is Linux …
    65% of files under the GPL, plus 25% of files “promoted” to GPL by L. Torvalds
    A few files (35) have two licenses
FreeBSD
    75% of the files with a BSD license
    189 files (5%) with no license
    179 files with a corporate license (Intel licenses)
    167 files with an MIT license
    A few multiple licenses – 19 BSD and GPL, 15 BSD and Educational, 14 MIT and GPL
OpenBSD
    76% BSD licenses
    295 files (9%) with an MIT license, 179 with an educational license
    138 (84%) without license
    59 files with BSD and Educational, 25 with MIT and BSD, and 14 with BSD and GPL

RQ2: Siblings between kernels
[Figure: two bar charts comparing FreeBSD vs. Linux and OpenBSD vs. Linux. Labels recovered: Siblings, Filtered siblings, Clone pairs, Files Linux, Files BSD, File Pairs, File Pairs (same name).]

RQ3: Code Migration and Licenses
Before Jan 1, 2002; almost nothing after

FreeBSD       Linux          Files
BSD           GPL            8
BSD           MIT            2

OpenBSD       Linux          Files
BSD           None           2
BSD           BSD+GPL        1
Corporate     BSD+GPL        89
BSD           MIT            2
GPL           None           1
Phrase        BSD+GPL        1
BSD           Unknown        1
X.Net+BSD     MIT            1
BSD+GPL       GPL            1
BSD+Phrase    Phrase+GPL     1
MIT           GPL            23

After Jan 1, 2002; nothing before

Linux         FreeBSD        Files
BSD+GPL       Corporate      8
GPL           BSD            17
GPL           BSD+GPL        1
GPL           CPL+BSD+GPL    1
MIT           BSD            1
MIT+GPL       None           2
None          BSD            1
None          BSD            1
Phrase+GPL    MIT            2

AIC7xxx: Maintaining Siblings
1994: Linux AIC7xxx series SCSI adapters
1995: Linux code is incorporated into an OpenBSD driver
1996: NetBSD driver is ported to FreeBSD
    #ifdef to maintain the variants
1997: A mailing list is created in FreeBSD to unify the efforts of people in the different kernels
    The major development of the driver seems to happen in FreeBSD
2000: Development propagates to Linux, NetBSD, and OpenBSD
Today: Development mostly in Linux and FreeBSD

GPL code in FreeBSD
2002: Silicon Graphics xfs file system integrated into Linux
Dec 12, 2005: xfs appears in FreeBSD
    The license of xfs is GPL
    FreeBSD is licensed under the 2-clause BSD
    Including xfs in a BSD kernel requires the kernel to be under the GPL too
Compiling GPL-licensed code into the kernel makes it “RESTRICTED”
    It can no longer be distributed in binary form, nor can its source code be made available for mirroring

License Defects
FreeBSD rdma_cma.c / Linux cdma.c are siblings
In Linux, it appeared on Jun 17, 2006, with 64 changes, including 8 changes after it appeared in FreeBSD
The Linux sibling is licensed under the GPL v2 and the 2-clause BSD licenses
The FreeBSD sibling is licensed under the terms of the new BSD license, the GPL v2, and the Common Public License
Original license still present in FreeBSD
Linux license was changed:
    commit a9474917099e007c0f51d5474394b5890111614f
    Author: Sean Hefty

Code moves and code siblings do exist
Siblings have a preferential flow
    Initially from BSD(s) to Linux – frequent
    Today from Linux to FreeBSD – less frequent
Companies directly contribute to code in different kernels – see Intel drivers with dual licenses
Managing siblings is a difficult problem

If you don’t monitor, code may sneak in …

Questions?