Sourcerer’s Apprentice and the study of code snippet migration Stephen Romansky, Abram Hindle Cheng Chen, Baljeet Malhotra Department of Computing Science, University of Alberta BlackDuck Edmonton, Canada Burnaby, Canada romansky,
[email protected] cchen,
[email protected] Abstract—On the worldwide web, not only are webpages A common open source license violation is the lack of connected but source code is too. Software development is attribution, most Free/Libre Open Source Software (F/LOSS) becoming more accessible to everyone and the licensing for licenses require that the authors who wrote the code are software remains complicated. We need to know if software licenses are being maintained properly throughout their reuse attributed in documentation, in the source code, or in startup and evolution. This motivated the development of the Sourcerer’s messages. Not attributing the opensource copyright holder Apprentice, a webservice that helps track clone relicensing, violates the opensource license. Thus using the wrong license because software typically employ software licenses to describe or misattributing code can be costly because: how their software may be used and adapted. But most developers do not have the legal expertise to sort out license conflicts. In this • a developer or company can lose the rights to use, reuse, paper we put the Apprentice to work on empirical studies that and distribute source code and software they rely on; demonstrate there is much sharing between StackOverflow code • a developer or company a developer may be required to and Python modules and Python documentation that violates the licensing of the original Python modules and documentation: distribute their proprietary source code unexpectedly, if a software snippets shared through StackOverflow are often being copy-left license was included; relicensed improperly to CC-BY-SA 3.0 without maintaining • or, the developer or company may be sued for copyright the appropriate attribution.