Crowdsourcing User Reviews to Support the Evolution of Mobile Apps
Fabio Palomba1, Mario Linares-Vásquez2, Gabriele Bavota3, Rocco Oliveto4, Massimiliano Di Penta5, Denys Poshyvanyk6, Andrea De Lucia7

1Delft University of Technology, The Netherlands; 2Universidad de los Andes, Colombia; 3Università della Svizzera italiana (USI), Lugano, Switzerland; 4University of Molise, Pesche (IS), Italy; 5University of Sannio, Italy; 6The College of William and Mary, USA; 7University of Salerno, Italy

Abstract

In recent software development and distribution scenarios, app stores are playing a major role, especially for mobile apps. On the one hand, app stores allow continuous releases of app updates. On the other hand, they have become the premier point of interaction between app providers and users. After installing/updating apps, users can post reviews and provide ratings, expressing their level of satisfaction with apps and possibly pointing out bugs or desired features. In this paper we empirically investigate, by performing a study on the evolution of 100 open source Android apps and by surveying 73 developers, to what extent app developers take user reviews into account, and whether addressing them contributes to apps' success in terms of ratings. In order to perform the study, as well as to provide a monitoring mechanism for developers and project managers, we devised an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings. The results of our study indicate that (i) on average, half of the informative reviews are addressed, and over 75% of the interviewed developers claimed to take them into account often or very often; and (ii) developers implementing user reviews are rewarded in terms of significantly increased user ratings.
Keywords: Mobile app evolution, User reviews, Mining app stores, Empirical study

Preprint submitted to Elsevier, November 15, 2017

1. Introduction

Online app stores are promoting and supporting a more dynamic way of distributing software directly to users, with release periods shorter than in the case of traditional software systems. For instance, periodically planned releases that follow an established road map have been replaced by continuous releases that become available for an upgrade with a cadence of a few weeks, if not days [63]. This phenomenon is particularly evident for, though not limited to, mobile apps, where releases are managed through online app stores, such as the Apple App Store [2], Google Play Market [28], or Windows Phone App Store [57]. The pressure for continuous delivery of apps is a constant in the mobile economy, in which users demand new and better features from the apps, and the ratings/reviews provided by the users are a strong mechanism to promote the apps in the market.

The distribution of updates, related to the introduction of new features and to bug fixes, through online app stores is accompanied by a mechanism that allows users to rate releases using scores (i.e., star ratings) and text reviews. The former (i.e., the score) is usually expressed as a choice of one to five stars, and the latter (i.e., the review) is a free-text description that does not have a predefined structure and is used to informally describe bugs and desired features. The review is also used to describe impressions, positions, comparisons, and attitudes toward the apps [9, 32, 38, 66, 71]. Therefore, online store reviews are free and fast crowd feedback mechanisms that can be used by developers as a backlog for the development process. Also, given this easy online access to app-store-review mechanisms, thousands of these informative reviews can describe various issues exhibited by the apps in certain combinations of devices, screen sizes, operating systems, and network conditions that may not necessarily be reproducible during regular development/testing activities.

Consequently, by analyzing ratings and reviews, development teams are encouraged to improve their apps, for example by fixing bugs or by adding commonly requested features. According to a Gartner report's recommendation [40], given the complexity of mobile testing, "development teams should monitor app store reviews to identify issues that are difficult to catch during testing, and to clarify issues that cause problems on the users' side". Moreover, useful app reviews reflect crowd-based needs and are a valuable source of comments, bug reports, feature requests, and informal user experience feedback [8, 9, 25, 38, 41, 48, 58, 60, 61, 66]. However, because of this easy access to the app stores and the lack of control over the reviews' content, relying on crowdsourced reviews for planning releases may not be widely adopted by mobile developers, because (i) a large amount of reviews needs to be analyzed, and (ii) some of these reviews may not provide tangible benefits to the developers [9].

In this paper we investigate to what extent app development teams can leverage crowdsourcing mechanisms for planning future changes, and how these changes impact user satisfaction as measured by follow-up ratings. Specifically, we present (i) a mining study conducted on 100 Android open source apps, in which we linked user reviews to source code changes and analyzed the impact of implementing those user reviews in terms of app success (i.e., ratings); and (ii) an online survey featuring 73 responses from mobile app developers, aimed at investigating whether they rely on user reviews, what kind of requirements developers try to elicit from user reviews, and whether they observe benefits in doing so. The survey does not aim at obtaining a different, subjective measure of success. Instead, it helps in understanding to what extent developers actually rely on user reviews to improve their apps.

To support our study, and to provide developers and project managers with an automated mechanism to monitor whether and how user reviews have been implemented, we have developed an approach named CRISTAL (Crowdsourcing RevIews to SupporT App evoLution) able to identify traceability links between incoming app reviews and source code changes likely addressing them, and to use such links to analyze the impact of crowd reviews on the development process. Although approaches exist for recovering traceability links between requirements and source code [1, 51], or even between development mailing lists and code [4], such approaches are not directly applicable in our context because of specific characteristics of user reviews in mobile markets. First, as shown by Chen et al. [9], not all reviews can be considered useful and/or informative. Also, there is a vocabulary mismatch between user reviews and source code or issues reported in issue trackers. Unlike issue reports and emails, reviews do not refer to implementation details; moreover, because of the GUI-driven nature of mobile apps and user interaction based on gestures, user reviews include mobile-specific GUI terms. In order to address these challenges, CRISTAL includes a multi-step reviews-to-code-changes traceability recovery approach that first identifies informative comments among reviews (based on a recent approach by Chen et al. [9]), and then traces crowd reviews onto commit notes and issue reports by leveraging a set of heuristics specifically designed for this traceability task.

In summary, the paper makes the following contributions:

1. CRISTAL's novel reviews-to-code-changes traceability recovery approach, designed to deal with (i) noise in crowd reviews, and (ii) the inherent abstraction mismatch between reviews and developers' source code lexicon.

2. CRISTAL's monitoring mechanism. After linking reviews to changes, the premier goal of CRISTAL is to enable developers to track how many reviews have been addressed, and to analyze the ratings to assess users' reaction to these changes. More specifically, the ability to monitor the extent to which user reviews have been addressed over a project's evolution history can be used as a support for release planning activities.

3. Results of an empirical study conducted on 100 Android apps. The study leverages CRISTAL to provide quantitative evidence on (i) how development teams follow suggestions contained in informative reviews, and (ii) how users react to those changes. The results of the study demonstrate that, on average, 49% of the informative reviews are implemented by developers in the new app release, and that this results in a substantial reward in terms of an increase in the app's rating. It should be noted that the value of responding to a user review of a mobile app has never been explored. Our analysis of app reviews and responses from 10,713 top apps in the Google Play Store shows that developers of frequently-reviewed apps never respond to reviews. However, we observe that there are positive effects to responding to reviews (users change their rating 38.7% of the time following a developer response), with a median rating increase of 20%.

4. Results of a survey conducted with 73 Android developers, aimed at investigating the developers' perception of user reviews, to what extent they address them, and whether they observe any tangible benefits from such an activity. The achieved results show that developers (i) strongly rely on user reviews when planning the new release of their apps, (ii) mainly look in user reviews for bugs experienced by their users or for suggestions for new features, and (iii) confirm the positive effect of implementing change requests embedded in the user reviews on the app's ratings.

5. A comprehensive replication package [69] that includes all the materials used in our studies.

Paper structure. Section 2 describes CRISTAL, while Section 3 reports results of a study aimed at evaluating
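To make the tracing idea concrete, the following is a minimal, hypothetical sketch of the kind of heuristic described above: matching an informative review to later commits through shared vocabulary within a time window, and computing the fraction of reviews that appear to have been addressed. This is not CRISTAL's actual implementation; the data classes, the Jaccard similarity, the 90-day window, and the 0.2 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Review:
    text: str
    date: datetime
    rating: int  # 1-5 stars

@dataclass
class Commit:
    message: str
    date: datetime

STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "on", "is", "it"}

def tokens(text: str) -> set:
    """Lowercase, strip punctuation, drop stopwords and very short tokens."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return {t for t in cleaned.split() if t not in STOPWORDS and len(t) > 2}

def similarity(a: str, b: str) -> float:
    """Jaccard index over the two token sets (0.0 when either is empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def link_review(review: Review, commits: list, window_days: int = 90,
                threshold: float = 0.2) -> list:
    """Return commits that plausibly address `review`: they occur after the
    review, within the time window, and share enough vocabulary with it."""
    deadline = review.date + timedelta(days=window_days)
    return [c for c in commits
            if review.date < c.date <= deadline
            and similarity(review.text, c.message) >= threshold]

def addressed_ratio(reviews: list, commits: list) -> float:
    """Fraction of (informative) reviews linked to at least one change,
    i.e., a crude version of the monitoring metric discussed above."""
    if not reviews:
        return 0.0
    linked = sum(1 for r in reviews if link_review(r, commits))
    return linked / len(reviews)
```

A real pipeline would first filter out non-informative reviews (as done via the Chen et al. approach [9]) and use stemming and issue-tracker data rather than raw token overlap, but the window-plus-similarity structure of the heuristic is the same.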