Npm Packages As Ingredients: a Recipe-Based Approach
Total Page:16
File Type:pdf, Size:1020Kb
npm Packages as Ingredients: A Recipe-based Approach Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Themistoklis Diamantopoulos, Napoleon-Christos Oikonomou and Andreas L. Symeonidis Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Thessaloniki, Greece Keywords: Dependency Networks, Software Reuse, JavaScript, npm, Node. Abstract: The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has been exponential, not only in numbers but also in terms of interconnectivity, to the extend that often the size of de- pendencies has become more than the size of the written code. This reuse-oriented paradigm, often attributed to the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interest- ing insights on the way developers build their packages. In this work we view the dependency network of the npm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe, which corresponds to the produced software package. We employ network analysis and information retrieval techniques in order to capture the dependencies that tend to co-occur in the development of npm packages and identify the communities that have been evolved as the main drivers for npm’s exponential growth. 1 INTRODUCTION come very important in today’s software develop- ment process, npm registry has become a “must” The popularity of JS is constantly increasing, and place for developers to share packages, defining code along is increasing the popularity of frameworks for reuse as a state-of-the-practice development paradigm building server (e.g. Node.js), web (e.g. React, Vue.js, (Chatzidimitriou et al., 2018). A white paper by Con- Angular, etc.), desktop (e.g. Electron) or mobile ap- trast Security (Williams and Dabirsiaghi, 2014) men- plications (e.g. React Native, NativeScript, etc.), even tions that up to 80% of the code in today’s software IoT solutions (e.g. Node-RED). A common denom- applications comes from libraries and frameworks. inator to this explosive growth has been the launch This is evident in the npm ecosystem, where the num- of the npm registry (i.e. the package manager of JS) ber of dependencies for a package has been shown to in 2010. The npm ecosystem is often seen as one of grow with time (Wittern et al., 2016). There are even the JS revolutions1 that have transformed JavaScript extreme cases, where one-liner libraries have more from “a language that was adding programming ca- than 70 dependencies (Haney, 2016). Such extreme pabilities to HTML” into a full-blown ecosystem. In reusability is usually attributed to the lack of a stan- fact, the growth is so rapid that terms like “JS frame- dard library in node.js and to the micropackaging cul- work fatigue” have become common among develop- ture of the npm ecosystem. ers. Indicatively, the June 2018 Redmonk survey2, Against this background, in this work we extract the GitHub status report3, and the 2018 Stack Over- the collective knowledge and preferences when creat- flow survey4 position JS as the most popular program- ing JavaScript (node.js) packages through mining the ming language, while Module Counts5 depicts an ex- most proliferate module repository, the npm registry. ponential growth of npm modules against repositories Inspired by (Teng et al., 2012), we treat packages as of other languages. recipes and dependencies as ingredients. Just like a Given that dependencies and reusability have be- recipe comprises a list of ingredients along with a pro- 1https://youtu.be/L-fx2xXSVso cess on how to combine them, one can consider an 2https://redmonk.com/sogrady/2018/08/10/language- npm package as a recipe: a list of core dependencies rankings-6-18/ and a list of development dependencies, also known 3https://octoverse.github.com/ as devDependencies in the npm lingo, that are com- 4https://insights.stackoverflow.com/survey/2018/ bined together with some code that uses them. And 5http://www.modulecounts.com/ just like online recipes receive reviews and comments, 544 Chatzidimitriou, K., Papamichail, M., Diamantopoulos, T., Oikonomou, N. and Symeonidis, A. npm Packages as Ingredients: A Recipe-based Approach. DOI: 10.5220/0007966805440551 In Proceedings of the 14th International Conference on Software Technologies (ICSOFT 2019), pages 544-551 ISBN: 978-989-758-379-7 Copyright c 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved npm Packages as Ingredients: A Recipe-based Approach npm packages have repository stars, forks, watchers code under development and other information (is- and package downloads, GitHub issues, Stack Over- sues, commits, stars, forks, watchers etc.). All the flow posts, and so on. This type of information can above hold true for node.js applications as well. That provide insights not only about package popularity or is, end-user applications that too have dependencies quality but also about user preferences, development and devDependencies and allow us to see package us- tendencies, and which dependencies go well with oth- age from an application developer viewpoint. ers. We focus on the following research questions: • RQ1: Are there any interesting communities (clus- ters) developed by dependencies that tend to co- 3 METHODOLOGY occur together in “recipes”? • RQ2: Is the same true for devDependencies? The target of our methodology is to distill the knowl- edge found in the package.json files of the npm reg- • RQ : Can we identify the scope these communi- 3 istry, with respect to their declared dependencies. To ties are used for? do so, we parse them and extract the dependencies and • RQ4: What are the similarities or differences be- devDependencies, the description and keywords that tween package recipes (node.js packages of the denote their functionality, along with certain meta- npm registry) and application recipes (node.js ap- data regarding package popularity. plications that are not part of the npm registry)? We construct four networks in order to capture By providing answers to these questions we aim to get developers’ practices about how they combine them: a better understanding of how packages work when one that reflects the relationships between dependen- used together as dependencies. We can also answer cies and one for devDependencies. This is performed questions as to which packages mix well together and twice one for npm packages and one for node.js ap- for what purpose. plications. Upon creating the networks, we employ cluster detection algorithms in order to find communi- ties of packages that are often used together and apply information retrieval techniques to the keywords and 2 BACKGROUND description fields in order to assign keywords of func- tionality to the communities. Last but not least, we try The npm registry6 is the largest and fastest growing to identify if employing frequently used dependencies module (or packages in the JS terminology) reposi- has any impact to the popularity of the produced pack- tory of all programming languages. At the moment of age in terms of GitHub stars or package downloads. writing it includes more than 800K packages, while more than 500 are added every day. Packages pub- 3.1 The Dataset lished to the registry must contain a package.json file. The main goals of the package.json7 file are: In such big open databases like npm, the quality and 1. to list the packages that the project depends on. usefulness of the packages usually follows a Power 8 9 2. to allow specifying the versions of a package that Law or the Pareto Principle . So in order to mine the project can use via semantic versioning rules. useful, quality packages, we augmented each pack- age with its monthly download count gathered from 3. to make one’s build reproducible, and therefore npms.io10 and the GitHub stars from the repository much easier to share with other developers. mentioned in the package.json file. Among other things, this file holds information, such Using popularity thresholds of 5;000 monthly as the package name and version, its description, the downloads and 70 GitHub stars, we extracted 8;732 authors, keywords, license, repository, and more. packages. The aforementioned thresholds were not Besides the dependencies, the package.json file arbitrarily selected, but were investigated using the contains a devDependencies list that accounts for elbow method based on the number of packages that packages used while developing the software project satisfy them. This number is exponentially increasing and not included in the “production” build of the when we further decrease those thresholds. For each product. Most often, the npm packages are associated package (recipe) we kept the name, its dependencies with a GitHub repository, which contains the source (ingredients), its devDependencies (another type of 6https://www.npmjs.com/ 8https://en.wikipedia.org/wiki/Power law 7https://docs.npmjs.com/getting-started/using-a- 9https://en.wikipedia.org/wiki/Pareto principle package.json 10https://api-docs.npms.io/ 545 ICSOFT 2019 - 14th International Conference on Software Technologies ingredients), the keywords and the description fields 3.2 Package Dependencies Complement found in the package.json file, augmented with its Network monthly downloads and the GitHub stars of the repos- itory declared in its package.json file. We only kept We constructed a dependency complement network the name of the package and the names of its de- based on the Pointwise Mutual Information (PMI) cri- pendencies and didn’t consider any declared semantic terion, defined on pairs of dependencies (a;b) as fol- version. As for the applications, by crawling GitHub, lows: we downloaded package.json files that were con- p(a;b) tained in the root folder of repositories with more PMI(a;b) = log (1) than 70 stars and their package names did not exist p(a)p(b) in the npm registry.