Npm Packages As Ingredients: a Recipe-Based Approach
Total Page:16
File Type:pdf, Size:1020Kb
npm Packages as Ingredients: a Recipe-based Approach Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Themistoklis Diamantopoulos, Napoleon-Christos Oikonomou, and Andreas L. Symeonidis Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Thessaloniki, Greece fkyrcha, mpapamic, thdiaman, [email protected], [email protected] Keywords: Dependency Networks, Software Reuse, JavaScript, npm, node. Abstract: The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has been exponential, not only in numbers but also in terms of interconnectivity, to the extend that often the size of de- pendencies has become more than the size of the written code. This reuse-oriented paradigm, often attributed to the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interest- ing insights on the way developers build their packages. In this work we view the dependency network of the npm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe, which corresponds to the produced software package. We employ network analysis and information retrieval techniques in order to capture the dependencies that tend to co-occur in the development of npm packages and identify the communities that have been evolved as the main drivers for npm’s exponential growth. 1 INTRODUCTION Given that dependencies and reusability have be- come very important in today’s software develop- The popularity of JS is constantly increasing, and ment process, npm registry has become a “must” along is increasing the popularity of frameworks for place for developers to share packages, defining code building server (e.g. Node.js), web (e.g. React, Vue.js, reuse as a state-of-the-practice development paradigm Angular, etc.), desktop (e.g. Electron) or mobile ap- (Chatzidimitriou et al., 2018). A white paper by Con- plications (e.g. React Native, NativeScript, etc.), even trast Security (Williams and Dabirsiaghi, 2014) men- IoT solutions (e.g. Node-RED). A common denom- tions that up to 80% of the code in today’s software inator to this explosive growth has been the launch applications comes from libraries and frameworks. of the npm registry (i.e. the package manager of JS) This is evident in the npm ecosystem, where the num- in 2010. The npm ecosystem is often seen as one of ber of dependencies for a package has been shown to the JS revolutions1 that have transformed JavaScript grow with time (Wittern et al., 2016). There are even from “a language that was adding programming ca- extreme cases, where one-liner libraries have more pabilities to HTML” into a full-blown ecosystem. In than 70 dependencies (Haney, 2016). Such extreme fact, the growth is so rapid that terms like “JS frame- reusability is usually attributed to the lack of a stan- work fatigue” have become common among develop- dard library in node.js and to the micropackaging cul- ers. Indicatively, the June 2018 Redmonk survey2, ture of the npm ecosystem. the GitHub status report3, and the 2018 Stack Over- Against this background, in this work we extract flow survey4 position JS as the most popular program- the collective knowledge and preferences when creat- ming language, while Module Counts5 depicts an ex- ing JavaScript (node.js) packages through mining the ponential growth of npm modules against repositories most proliferate module repository, the npm registry. of other languages. Inspired by (Teng et al., 2012), we treat packages as recipes and dependencies as ingredients. Just like a 1https://youtu.be/L-fx2xXSVso recipe comprises a list of ingredients along with a pro- 2https://redmonk.com/sogrady/2018/08/10/language- cess on how to combine them, one can consider an rankings-6-18/ npm package as a recipe: a list of core dependencies 3https://octoverse.github.com/ and a list of development dependencies, also known 4https://insights.stackoverflow.com/survey/2018/ as devDependencies in the npm lingo, that are com- 5http://www.modulecounts.com/ bined together with some code that uses them. And just like online recipes receive reviews and comments, code under development and other information (is- npm packages have repository stars, forks, watchers sues, commits, stars, forks, watchers etc.). All the and package downloads, GitHub issues, Stack Over- above hold true for node.js applications as well. That flow posts, and so on. This type of information can is, end-user applications that too have dependencies provide insights not only about package popularity or and devDependencies and allow us to see package us- quality but also about user preferences, development age from an application developer viewpoint. tendencies, and which dependencies go well with oth- ers. We focus on the following research questions: • RQ1: Are there any interesting communities (clus- 3 METHODOLOGY ters) developed by dependencies that tend to co- occur together in “recipes”? The target of our methodology is to distill the • RQ2: Is the same true for devDependencies? knowledge found in the package.json files of the npm registry, with respect to their declared depen- • RQ3: Can we identify the scope these communi- ties are used for? dencies. To do so, we parse them and extract the de- pendencies and devDependencies, the description and • RQ4: What are the similarities or differences be- keywords that denote their functionality, along with tween package recipes (node.js packages of the certain meta-data regarding package popularity. npm registry) and application recipes (node.js ap- We construct four networks in order to capture plications that are not part of the npm registry)? developers’ practices about how they combine them: By providing answers to these questions we aim to get one that reflects the relationships between dependen- a better understanding of how packages work when cies and one for devDependencies. This is performed used together as dependencies. We can also answer twice one for npm packages and one for node.js ap- questions as to which packages mix well together and plications. Upon creating the networks, we employ for what purpose. cluster detection algorithms in order to find communi- ties of packages that are often used together and apply information retrieval techniques to the keywords and 2 BACKGROUND description fields in order to assign keywords of func- tionality to the communities. Last but not least, we try The npm registry6 is the largest and fastest grow- to identify if employing frequently used dependencies ing module (or packages in the JS terminology) repos- has any impact to the popularity of the produced pack- itory of all programming languages. At the moment age in terms of GitHub stars or package downloads. of writing it includes more than 800K packages, while more than 500 are added every day. Packages pub- 3.1 The Dataset lished to the registry must contain a package.json file. The main goals of the package.json7 file are: In such big open databases like npm, the quality and 1. to list the packages that the project depends on. usefulness of the packages usually follows a Power Law8 or the Pareto Principle9. So in order to mine 2. to allow specifying the versions of a package that useful, quality packages, we augmented each pack- the project can use via semantic versioning rules. age with its monthly download count gathered from 3. to make one’s build reproducible, and therefore npms.io10 and the GitHub stars from the repository much easier to share with other developers. mentioned in the package.json file. Among other things, this file holds information, such Using popularity thresholds of 5;000 monthly as the package name and version, its description, the downloads and 70 GitHub stars, we extracted 8;732 authors, keywords, license, repository, and more. packages. The aforementioned thresholds were not Besides the dependencies, the package.json file arbitrarily selected, but were investigated using the contains a devDependencies list that accounts for elbow method based on the number of packages that packages used while developing the software project satisfy them. This number is exponentially increasing and not included in the “production” build of the when we further decrease those thresholds. For each product. Most often, the npm packages are associated package (recipe) we kept the name, its dependencies with a GitHub repository, which contains the source (ingredients), its devDependencies (another type of 6https://www.npmjs.com/ 8https://en.wikipedia.org/wiki/Power law 7https://docs.npmjs.com/getting-started/using-a- 9https://en.wikipedia.org/wiki/Pareto principle package.json 10https://api-docs.npms.io/ ingredients), the keywords and the description fields 3.2 Package Dependencies Complement found in the package.json file, augmented with its Network monthly downloads and the GitHub stars of the repos- itory declared in its package.json file. We only kept We constructed a dependency complement network the name of the package and the names of its de- based on the Pointwise Mutual Information (PMI) cri- pendencies and didn’t consider any declared semantic terion, defined on pairs of dependencies (a;b) as fol- version. As for the applications, by crawling GitHub, lows: we downloaded package.json files that were con- p(a;b) tained in the root folder of repositories with more PMI(a;b) = log (1) than 70 stars and their package names did not exist p(a)p(b) in the npm registry. Upon applying the aforemen- where p(a) and p(b) denote the frequency of pack- tioned methodology, we gathered 13;884 application ages containing a and b, respectively, and p(a;b) de- package.json files. The value of the threshold in notes the frequency of packages where a and b co- the number of stars was chosen again using the elbow occur. More specifically: methods as shown in Figure 111.