Automated Large-Scale Multi-Language Dynamic Program Analysis in the Wild Alex Villazón Universidad Privada Boliviana, Bolivia
[email protected] Haiyang Sun Università della Svizzera italiana, Switzerland
[email protected] Andrea Rosà Università della Svizzera italiana, Switzerland
[email protected] Eduardo Rosales Università della Svizzera italiana, Switzerland
[email protected] Daniele Bonetta Oracle Labs, United States
[email protected] Isabella Defilippis Universidad Privada Boliviana, Bolivia isabelladefi
[email protected] Sergio Oporto Universidad Privada Boliviana, Bolivia
[email protected] Walter Binder Università della Svizzera italiana, Switzerland
[email protected] Abstract Today’s availability of open-source software is overwhelming, and the number of free, ready-to-use software components in package repositories such as NPM, Maven, or SBT is growing exponentially. In this paper we address two straightforward yet important research questions: would it be possible to develop a tool to automate dynamic program analysis on public open-source software at a large scale? Moreover, and perhaps more importantly, would such a tool be useful? We answer the first question by introducing NAB, a tool to execute large-scale dynamic program analysis of open-source software in the wild. NAB is fully-automatic, language-agnostic, and can scale dynamic program analyses on open-source software up to thousands of projects hosted in code repositories. Using NAB, we analyzed more than 56K Node.js, Java, and Scala projects. Using the data collected by NAB we were able to (1) study the adoption of new language constructs such as JavaScript Promises, (2) collect statistics about bad coding practices in JavaScript, and (3) identify Java and Scala task-parallel workloads suitable for inclusion in a domain-specific benchmark suite.