
Zurich Open Repository and Archive University of Zurich Main Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch

Year: 2017

Android Apps and User Feedback: A Dataset for Software Evolution and Quality Improvement

Grano, Giovanni ; Di Sorbo, Andrea ; Mercaldo, Francesco ; Visaggio, Corrado A ; Canfora, Gerardo ; Panichella, Sebastiano

DOI: https://doi.org/10.1145/3121264.3121266

Posted at the Zurich Open Repository and Archive, University of Zurich
ZORA URL: https://doi.org/10.5167/uzh-139426
Conference or Workshop Item

Originally published at: Grano, Giovanni; Di Sorbo, Andrea; Mercaldo, Francesco; Visaggio, Corrado A; Canfora, Gerardo; Panichella, Sebastiano (2017). Android Apps and User Feedback: A Dataset for Software Evolution and Quality Improvement. In: Proceedings of the 2nd ACM SIGSOFT International Workshop on App Market Analytics, Paderborn, Germany, 1 January 2017 - 1 January 2017, 8-11. DOI: https://doi.org/10.1145/3121264.3121266

Android Apps and User Feedback: A Dataset for Software Evolution and Quality Improvement

Giovanni Grano (University of Zurich), Andrea Di Sorbo (University of Sannio), Francesco Mercaldo (Ist. di Informatica e Telematica, CNR), Corrado A. Visaggio (University of Sannio), Gerardo Canfora (University of Sannio), Sebastiano Panichella (University of Zurich)
[email protected] [email protected] [email protected] [email protected] [email protected] [email protected]

ABSTRACT
Nowadays, Android represents the most popular mobile platform, with a market share of around 80%. Previous research showed that data contained in user reviews and the code change history of mobile apps represent a rich source of information for reducing software maintenance and development effort and increasing customers' satisfaction. Stemming from this observation, we present in this paper a large dataset of Android applications belonging to 23 different app categories, which provides an overview of the types of feedback users report on the apps and documents the evolution of the related code metrics. The dataset contains about 395 applications of the F-Droid repository, including around 600 versions, 280,000 user reviews, and more than 450,000 user feedback (extracted with specific text mining approaches). Furthermore, for each app version in our dataset, we employed the Paprika tool and developed several Python scripts to detect 8 different code smells and compute 22 code quality indicators. The paper discusses the potential usefulness of the dataset for future research in the field.
Dataset URL: https://github.com/sealuzh/user_quality

CCS CONCEPTS
· Software and its engineering → Software development process management; Software evolution;

KEYWORDS
App Reviews, Mobile Applications, Software Maintenance and Evolution

ACM Reference format:
Giovanni Grano, Andrea Di Sorbo, Francesco Mercaldo, Corrado A. Visaggio, Gerardo Canfora, and Sebastiano Panichella. 2017. Android Apps and User Feedback: A Dataset for Software Evolution and Quality Improvement. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference'17), 4 pages.
https://doi.org/10.1145/nnnnnnn.nnnnnnn

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Conference'17, July 2017, Washington, DC, USA
© 2017 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn

1https://play.google.com/store/apps
2https://f-droid.org

1 INTRODUCTION
App stores, such as Google Play and the Apple App Store, represent a rich source of information for software researchers and developers interested in better understanding how mobile software is created and maintained [9], since these markets provide open access to huge numbers of software applications together with consumers' feedback [11]. Indeed, previous research showed that data contained in both user reviews and the code change history of mobile apps represent a rich source of information for the development of mobile software, for reducing software maintenance effort and for increasing customers' satisfaction [5, 12, 14].

In this context, users' feedback is particularly important, since the maintenance and evolution of mobile applications are strictly guided by the requests contained in user reviews [3, 6, 11]. For instance, investigating the types of feedback users report on the apps they are using can give valuable information about the features on which they pay more attention, or help better understand the most common issues related to a specific app category [5, 11, 16]. Besides that, a more in-depth analysis of the code changes performed by developers when integrating users' feedback in the code base of mobile applications can provide key insights into how developers evolve these apps to gain a higher customers' satisfaction (e.g., to increase the downloads of a given app). Unfortunately, app stores lack functionalities to organize informative user review feedback toward proper software maintenance tasks or to filter reviews according to the treated topics [1].

In this paper we propose a dataset containing 288,065 reviews extracted from Google Play1 related to 395 apps mined from F-Droid2. Every review is connected with a specific version of the app and then split into atomic sentences. Each of the obtained sentences is labelled with the intention and the topics it deals with, relying on the two-dimension URM taxonomy proposed by Di Sorbo et al. [5]. Moreover, for each of the app versions, 22 code quality metrics (e.g., object-oriented and Android-oriented metrics) and 8 different code smells have been computed. The goal of this work is to provide data that researchers may promptly use to conduct experiments aimed at better (i) understanding how specific aspects related to code quality could affect app reviews and star-ratings, and (ii) comprehending how developers react to specific user review feedback when evolving their mobile applications. To the best of the authors' knowledge, this is the first attempt to create

a publicly available data collection containing such a substantial amount of data, in which reviews related to specific app releases are labeled according to software maintenance categories (i.e., types of user feedback) and apps are analyzed by static analysis tools for computing their software quality.

Table 1: Intention Categories Definition
Information Giving: Sentences that inform other users or developers about some aspect of the app.
Information Seeking: Sentences describing attempts to obtain information or help from other users or developers.
Feature Request: Sentences expressing ideas, suggestions or needs for enhancing the app.
Problem Discovery: Sentences reporting unexpected behavior or issues.
Other: Sentences not belonging to any of the previous categories.

Table 2: Topic Definitions
App: sentences related to the entire app, e.g., generic crash reports, ratings, or general feedback
GUI: sentences related to the Graphical User Interface or the look and feel of the app
Contents: sentences related to the content of the app
Pricing: sentences related to app pricing
Feature or Functionality: sentences related to specific features or functionality of the app
Improvement: sentences related to explicit enhancement requests
Updates/Versions: sentences related to specific versions or the update process of the app
Resources: sentences dealing with device resources such as battery consumption, storage, etc.
Security: sentences related to the security of the app or to personal data privacy
Download: sentences containing feedback about the app download
Model: sentences reporting feedback about specific devices or OS versions
Company: sentences containing feedback related to the company/team which develops the app
Other: sentences not treating any of the previous topics
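To make the topic dimension of Table 2 concrete, a naive keyword matcher might look like the sketch below. This is illustrative only: the keyword sets are invented for this example and are far simpler than the curated keyword- and n-gram-based classifier actually used by SURF.

```python
import re

# Invented keyword lists, one per (a subset of) Table 2 topics; illustrative only.
TOPIC_KEYWORDS = {
    "GUI": {"screen", "button", "layout", "interface"},
    "Pricing": {"price", "pay", "purchase"},
    "Resources": {"battery", "storage", "memory"},
    "Updates/Versions": {"update", "version", "upgrade"},
    "Security": {"privacy", "permission", "secure"},
}

def tag_topics(sentence):
    """Return the Table 2 topics a sentence deals with, or ["Other"] as fallback."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))  # crude tokenization
    hits = [topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws]
    return hits or ["Other"]
```

Note that a sentence can receive several topics at once (e.g., a complaint about an update draining the battery touches both Updates/Versions and Resources), while the intention dimension of Table 1 assigns exactly one category per sentence.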

2 RELATED WORK
Although app stores represent a relatively recent phenomenon, they immediately captured the interest of the research community and, nowadays, there are already over 180 papers devoted to their study [9]. As a consequence, several datasets involving quite high numbers of apps with structured (e.g., source code) and unstructured information (e.g., commit messages) have been proposed in the literature. For instance, Krutz et al. [8] provided a dataset that reports the results obtained by several static analysis tools on 4,416 different versions of 1,179 open-source Android applications, combined with data of the version control commits related to these applications. Collections containing huge amounts of app reviews have also been published for pursuing different research goals. For example, the Data Set for Mobile App Retrieval3 includes 1,385,607 user reviews of 43,041 mobile apps and has been mainly used to run experiments about accuracy improvements in mobile app retrieval [15]. The SoftWare Marketplace (SWM) review dataset4 contains 1,132,373 reviews from 15,094 apps and has been involved in research works aimed at detecting spam or fake reviews [2, 18, 19]. Other existing publicly available data5 could be used to build and test sentiment analysis techniques, since they contain reviews clustered according to the sentiment expressed in them (i.e., negative and positive sentiment). Nevertheless, to the best of our knowledge, no previous work provided a comprehensive dataset that, at the same time, (i) sheds light on the types of feedback users report for different versions of several apps and (ii) combines such information with software quality indicators computed on the app versions they refer to.

3 DATASET CONSTRUCTION
Our dataset was built in two phases: (i) in the data collection phase we analyzed the F-Droid repository and the Google Play store for collecting the app version data and the information related to their user reviews; (ii) in the analysis phase we examined the Android package (i.e., the apk) of the mined apps using several static analysis scripts/tools and labeled the extracted reviews through the use of two automated classifiers.

3.1 Data Collection Phase
In this phase, we primarily built a web crawler (available at the dataset URL) to collect from the F-Droid repository the meta-data (package name, available versions, release date of each version) and the apks of each app. The crawler initially mined data for 1,929 different apps. The versions of each mobile application have been ordered according to the release date (i.e., from the oldest to the latest version). All the apps (i) not appearing in the Google Play Store and (ii) whose latest version was released before the year 2014 (i.e., this could indicate that the app is no longer maintained) have been discarded. A second scraper tool6 was built to download from the Google Play Store all the user reviews related to the remaining 965 apps. It relies on PhantomJS7 and Selenium8 in order to navigate the Play Store web site and extract the reviews from the resulting HTML code. We set up a cronjob in order to mine new reviews 4 times a week. The tool gathered 297,323 app reviews in total, and for each user comment it also extracted (i) the package name of the app to which the review refers, (ii) the review content, (iii) the star-rating assigned by the user to the app, and (iv) the posting date of the review. Relying on the release date of each application's version and on the posting date of each user comment, we assigned each review to one of the app versions as described below.

Given a generic version of an app, Vi, and the subsequent version of the same app, Vi+1, the reviews assigned to the version Vi, i.e., Ri, are collected considering the reviews whose posting date occurs after the release date of Vi and before the release date of Vi+1. Although this assumption may produce a wrong app version assignment for some reviews, Pagano and Maalej [10] empirically demonstrated that user feedback is mostly triggered by new releases, i.e., it usually arrives in the first few days after the download of a new app version. We discarded 8,758 reviews (because their publication date was too old for assigning them to any of the available versions), obtaining a dataset containing 288,565 reviews belonging to 710 different versions. Then we decided to keep in the collection exclusively the app versions having at least 10 reviews assigned (according to previous studies [17]), discarding all the remaining ones. At the end of this filtering process we obtained a dataset of 288,065 reviews related to 629 versions of 395 different apps.

3https://sites.google.com/site/daehpark/Resources/data-set-for-mobile-app-retrieval
4http://odds.cs.stonybrook.edu/swmreview-dataset/
5https://github.com/amitt001/Android-App-Reviews-Dataset
6https://github.com/sealuzh/user_quality/tree/master/tools
7http://phantomjs.org/
8http://www.seleniumhq.org/

3.2 Analysis Phase
In this phase we classified user reviews' feedback according to software maintenance and evolution categories and computed software quality indicators for each version of the mined apps. To achieve these goals, we employed some existing tools recently presented in the literature. In the following we briefly explain how we performed the two tasks.

3.2.1 User Reviews Classification. In order to classify users' comments according to software maintenance and evolution categories, we selected as conceptual framework the URM taxonomy proposed by Di Sorbo et al. [5], which represents a robust and suitable model for organizing user comments into meaningful maintenance tasks. This model assigns each review to (i) one of the intention categories shown in Table 1 and (ii) one or more of the topics detailed in Table 2. For performing the two-dimension classification encompassed by the model, we employed:
• ARDOC, a review classifier previously defined by Panichella et al. [13], which combines Natural Language Processing (NLP), Sentiment Analysis (SA), and Text Analysis (TA) techniques in order to automatically mine intentions in user reviews, according to the categories defined in Table 1. This classifier has been shown to achieve high precision (ranging between 84% and 89%) and recall (ranging between 84% and 89%) in categorizing reviews from real-world applications. We used for our purposes the original implementation of the tool, freely accessible as a library [13].
• The topic classifier module based on topic-related keywords and n-grams, used in the SURF summarizer tool [5], which is able to assign to each sentence in a review one (or more) of the topics defined in Table 2. This classifier has been shown to achieve a classification accuracy of 76%.

3.2.2 Quality Analysis of Applications' Code. To evaluate the code quality of the Android applications, we developed Python scripts that compute a set of code quality indicators (all the quality metrics that our scripts are able to compute are detailed in the Appendix available online9). In particular, the apks of all the mined versions with at least 10 reviews assigned have been disassembled, in order to obtain a set of human-readable Dalvik bytecode .smali files from the binary Dalvik bytecode format .dex ones. To accomplish this task we used apktool10, a tool which allows decompiling and recompiling Android applications. Thus, we developed a set of Python scripts11 able to parse the .smali files and automatically compute the suite of code metrics for each of the available apks in our dataset. In our analysis, we compute the metrics by parsing the .smali classes (and not the Java ones), in order to take into account the code optimizations eventually applied by the compiler. In addition, we enrich our analysis by detecting code smells in the selected apks employing Paprika [7], a tooled approach which decompiles the application with Soot12 and performs the detection of 4 categories of object-oriented code smells (i.e., Blob Class (BLOB), Swiss Army Knife (SAK), Long Method (LM) and Complex Class (CC)) and 4 categories of Android anti-patterns (i.e., Internal Getter/Setter (IGS), Member Ignoring Method (MIM), No Low Memory Resolver (NLMR), Leaking Inner Class (LIC)). We computed and stored all the code smells above for each of the available versions of the apps in our set, except for the few ones that Paprika was not able to work with.

9https://github.com/sealuzh/user_quality/wiki/Code-Quality-Metrics

Table 3: Applications and Versions for each App Category
App Category | Apps | Total apks | Total reviews
Books & Reference | 19 | 35 | 15,892
Business | 1 | 1 | 1,172
Comics | 4 | 5 | 2,287
Communication | 33 | 64 | 31,219
Education | 12 | 14 | 1,291
Entertainment | 5 | 5 | 2,584
Finance | 7 | 10 | 621
Games | 30 | 44 | 20,378
Health & Fitness | 3 | 4 | 1,149
Libraries & Demo | 3 | 5 | 990
Lifestyle | 4 | 6 | 246
Maps & Navigation | 10 | 15 | 1,411
Music & Audio | 17 | 32 | 3,025
News & Magazines | 6 | 9 | 1,988
Personalization | 18 | 26 | 12,037
Photography | 7 | 10 | 4,275
Productivity | 45 | 80 | 8,361
Shopping | 2 | 2 | 2,647
Social | 7 | 11 | 6,146
Tools | 139 | 214 | 151,509
Travel & Local | 9 | 12 | 984
Video Players & Editors | 12 | 22 | 15,352
Weather | 2 | 3 | 2,501
TOTAL | 395 | 629 | 288,065

4 ANALYTICS & DATA SHARING
To offer a more complete overview of the dataset, in this section we provide some statistics about the collected data, as well as information about its final structure. In detail, Table 3 reports, for each of the covered Google Play categories, the number of apps collected, the apks available and the reviews mined.

The analyzed apks contain a total of 100,638,277 byte-code instructions, 832,347 classes, and 6,375,906 methods. Each apk has from a minimum of 10 to a maximum of 103,535 reviews assigned. On average, each apk has about 458 reviews assigned, and in turn each user comment is generally composed of 1.57 sentences. Furthermore, every one of them belongs to one of the intention categories (see Table 1) and, on average, deals with about 1.34 of the maintenance topics (see Table 2). To provide a more detailed description of the reviews' dataset, Table 4 reports, for each of the maintenance topics, the amounts of (i) sentences discussing the topic, (ii) sentences of the Feature Request (FR) intention category dealing with the topic, (iii) sentences of the Problem Discovery (PD) intention category treating the topic, (iv) sentences of the Information Seeking (IS) intention category dealing with the topic, (v) sentences of the Information Giving (IG) intention category discussing the topic, and (vi) sentences of the Other intention category treating the specific topic.
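To give a flavor of the .smali-based analysis described in Section 3.2.2, the sketch below counts classes, methods and (approximately) instructions in disassembled smali text. It is a toy illustration under our own simplifying assumptions, not the authors' metric scripts, which compute 22 full-fledged indicators.

```python
def count_smali_entities(smali_text):
    """Count classes, methods and (approximately) executable instructions
    in the text of one or more disassembled .smali files.

    Simplifying assumption (ours): every non-empty line that is not a
    directive ('.'), a label (':') or a comment ('#') counts as one
    Dalvik instruction.
    """
    classes = methods = instructions = 0
    for raw in smali_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(".class"):
            classes += 1
        elif line.startswith(".method"):
            methods += 1
        elif not line.startswith((".", ":", "#")):
            instructions += 1
    return {"classes": classes, "methods": methods, "instructions": instructions}
```

Real metric extraction would additionally track inheritance, field and method signatures, and per-method instruction counts, but the same line-oriented parsing of apktool's output is the starting point.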
10http://ibotpeaches.github.io/Apktool/
11https://github.com/sealuzh/user_quality/tree/master/code_metrics_scripts
12https://sable.github.io/soot/

Moreover, for the 629 apks of our dataset, we were able to detect a total of (i) 3,263 Blob Classes, (ii) 44,834 Long Methods, (iii) 432 Swiss Army Knives, (iv) 8,640 Complex Classes, (v) 9,012 Internal

Getters/Setters, (vi) 6,768 Member Ignoring Methods, (vii) 2,280 No Low Memory Resolvers, and (viii) 23,293 Leaking Inner Classes.

Table 4: Number of sentences discussing each topic
Topic | Sentences | FR | PD | IS | IG | Other
App | 117,409 | 4,879 | 11,089 | 1,600 | 11,943 | 87,898
GUI | 37,620 | 3,381 | 5,034 | 705 | 3,560 | 24,940
Contents | 16,819 | 1,315 | 1,973 | 434 | 1,620 | 11,477
Download | 7,853 | 333 | 1,346 | 363 | 830 | 4,981
Company | 1,672 | 118 | 190 | 57 | 152 | 1,155
Feature/Functionality | 173,847 | 15,480 | 27,810 | 4,342 | 14,972 | 111,243
Improvement | 8,281 | 1,005 | 304 | 54 | 755 | 6,163
Pricing | 4,016 | 142 | 216 | 62 | 559 | 3,037
Resources | 3,071 | 155 | 375 | 50 | 263 | 2,228
Update/Version | 21,669 | 1,358 | 3,886 | 548 | 2,423 | 13,454
Model | 22,044 | 1,308 | 3,397 | 459 | 2,055 | 14,825
Security | 2,392 | 212 | 313 | 65 | 218 | 1,584
Other | 189,784 | 630 | 2,019 | 1,402 | 2,842 | 182,891
TOTAL | 606,477 | 30,316 | 57,952 | 10,141 | 42,192 | 465,876

Dataset Schema. We provide our dataset both as a relational DBMS and in CSV format. In our repository's wiki page13 we show an ER diagram and accurately describe its structure. The Apk table encompasses the meta-data (i.e., package name, app category, version code, release date) related to each apk involved in the dataset. The Review table contains all the extracted reviews with the corresponding apk id. Each review is linked with one or more records of the Sentence table, which reports the classification outputs (i.e., its intention and topics) for each atomic sentence. The CodeMetrics table provides the code quality metrics for each apk, while the Smell table holds the quantities of code smells detected in each app version. Finally, the UserMetrics table summarizes, through a set of cumulative metrics (e.g., total number of reviews, overall number of sentences, the amount of sentences falling in each intention and topic category, etc.), the processed review data related to each apk.

13https://github.com/sealuzh/user_quality/wiki/Database-Schema

Research Opportunities. Recently, researchers have started to investigate the impact of specific code quality metrics on the market success of mobile applications [4]. To the extent of our knowledge, there are no studies that deepen this analysis by considering the different app categories (i.e., social, tools, games and so on) and the semantic analysis of the user feedback related to different versions of the same app. In this context, we believe that this dataset can be successfully exploited for many different purposes, for example: (i) to study more in depth the relationships between code quality metrics and success within the different app categories, (ii) to investigate the influence of code quality on app rating and user satisfaction, (iii) to look at the consequences on code quality of integrating specific user feedback in the code base, and (iv) to study the evolution trends of quality metrics and code smells between different versions of the same app in the presence of particular kinds of user feedback.

5 CONCLUSIONS
The dataset we provide comprises 395 different apps from the F-Droid repository, including code quality indicators for 629 versions of these apps. It also encloses the app reviews related to each of these versions, which have been automatically categorized by classifying the types of user feedback from a software maintenance and evolution perspective. A total of 288,065 user reviews and more than 450,000 user feedback have been gathered, for enabling future research aimed at supporting developers in evolving and maintaining mobile applications in a faster and more efficient way, increasing users' satisfaction. The data provided are useful for understanding potential correlations between the various collected data and metrics, not only looking into individual apps, but also analyzing them in aggregation.

REFERENCES
[1] U. Abelein, H. Sharp, and B. Paech. Does involving users in software development really influence system success? IEEE Software, 30(6):17-23, 2013.
[2] L. Akoglu, R. Chandy, and C. Faloutsos. Opinion fraud detection in online reviews by network effects. In Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[3] A. Ciurumelea, A. Schaufelbuhl, S. Panichella, and H. C. Gall. Analyzing reviews and code of mobile apps for better release planning. In SANER, pages 91-102. IEEE Computer Society, 2017.
[4] L. Corral and I. Fronza. Better code for better apps: A study on source code quality and market success of android applications. In Proceedings of the Second ACM International Conference on Mobile Software Engineering and Systems, MOBILESoft '15, pages 22-32, Piscataway, NJ, USA, 2015. IEEE Press.
[5] A. Di Sorbo, S. Panichella, C. Alexandru, J. Shimagaki, C. Visaggio, G. Canfora, and H. Gall. What would users change in my app? summarizing app reviews for recommending software changes. In Foundations of Software Engineering (FSE), 2016 ACM SIGSOFT International Symposium on, pages 499-510, 2016.
[6] E. Guzman, O. Aly, and B. Bruegge. Retrieving diverse opinions from app reviews. In 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1-10, 2015.
[7] G. Hecht, O. Benomar, R. Rouvoy, N. Moha, and L. Duchien. Tracking the software quality of android applications along their evolution (t). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 236-247, Nov 2015.
[8] D. E. Krutz, M. Mirakhorli, S. A. Malachowsky, A. Ruiz, J. Peterson, A. Filipski, and J. Smith. A dataset of open-source android applications. In Proceedings of the 12th Working Conference on Mining Software Repositories, MSR '15, pages 522-525, Piscataway, NJ, USA, 2015. IEEE Press.
[9] W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman. A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering, PP(99):1-1, 2016.
[10] D. Pagano and W. Maalej. User feedback in the appstore: An empirical study. In Proceedings of the 21st IEEE International Requirements Engineering Conference (RE 2013). IEEE Computer Society, 2013.
[11] F. Palomba, M. Linares-Vasquez, G. Bavota, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia. User reviews matter! tracking crowdsourced reviews to support evolution of successful apps. In Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on, pages 291-300, Sept 2015.
[12] F. Palomba, P. Salza, A. Ciurumelea, S. Panichella, H. C. Gall, F. Ferrucci, and A. D. Lucia. Recommending and localizing change requests for mobile apps based on user reviews. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017, pages 106-117, 2017.
[13] S. Panichella, A. Di Sorbo, E. Guzman, C. Visaggio, G. Canfora, and H. Gall. Ardoc: App reviews development oriented classifier. In Foundations of Software Engineering (FSE), 2016 ACM SIGSOFT International Symposium on, pages 1023-1027, 2016.
[14] S. Panichella, A. D. Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, and H. C. Gall. How can i improve my app? classifying user reviews for software maintenance and evolution. In 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015, Bremen, Germany, September 29 - October 1, 2015, pages 281-290, 2015.
[15] D. H. Park, M. Liu, C. Zhai, and H. Wang. Leveraging user reviews to improve accuracy for mobile app retrieval. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '15, pages 533-542, New York, NY, USA, 2015. ACM.
[16] A. D. Sorbo, S. Panichella, C. V. Alexandru, C. A. Visaggio, and G. Canfora. SURF: summarizer of user reviews feedback. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017 - Companion Volume, pages 55-58, 2017.
[17] Y. Tian, M. Nagappan, D. Lo, and A. E. Hassan. What are the characteristics of high-rated apps? a case study on free android applications. In Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), ICSME '15, pages 301-310, Washington, DC, USA, 2015. IEEE Computer Society.
[18] J. Ye and L. Akoglu. Discovering opinion spammer groups by network footprints. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 267-282. Springer, 2015.
[19] J. Ye, S. Kumar, and L. Akoglu. Temporal opinion spam detection by multivariate indicative signals. In Tenth International AAAI Conference on Web and Social Media, 2016.
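As an appendix-style illustration of how the tables described in the Dataset Schema paragraph fit together, here is a miniature rendition in SQLite. The table names follow the paper, but the column names and sample data are simplified guesses made for this sketch; the authoritative structure is the ER diagram in the repository wiki.

```python
import sqlite3

# Hypothetical miniature of the dataset schema (column names are our own).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Apk    (id INTEGER PRIMARY KEY, package TEXT, category TEXT,
                     version_code TEXT, release_date TEXT);
CREATE TABLE Review (id INTEGER PRIMARY KEY, apk_id INTEGER REFERENCES Apk(id),
                     stars INTEGER, body TEXT);
CREATE TABLE Smell  (apk_id INTEGER REFERENCES Apk(id), name TEXT, amount INTEGER);
""")
conn.execute("INSERT INTO Apk VALUES (1, 'org.example.app', 'Tools', '1.2', '2016-05-01')")
conn.executemany("INSERT INTO Review (apk_id, stars, body) VALUES (?, ?, ?)",
                 [(1, 2, 'crashes on start'), (1, 5, 'great app')])
conn.execute("INSERT INTO Smell VALUES (1, 'Long Method', 12)")

# One of the research questions the dataset enables: relating per-version
# star ratings to code-smell counts.
row = conn.execute("""
    SELECT a.package, AVG(r.stars) AS avg_stars, s.amount AS long_methods
    FROM Apk a
    JOIN Review r ON r.apk_id = a.id
    JOIN Smell  s ON s.apk_id = a.id AND s.name = 'Long Method'
    GROUP BY a.id
""").fetchone()
```

The same join, run over the full dataset, supports the research opportunities listed above, e.g., correlating smell density with average rating per category.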