Extracting Signal from the Noise of App Reviews
Faculty of Science, Engineering and Technology
Swinburne University of Technology
Melbourne, Australia

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

Leonard Hoon
2016

Copyright © 2016 Leonard Hoon

Abstract

The App Store and Google Play are mobile app store fronts furnished with a public review system. The competition on these store fronts is intense: developers can push new releases to users directly; users can publicly express opinions on the apps they have used. The resulting transparent feedback loop, while evident to the app community, is visible to competitors as well. Competitors who study these app reviews within a domain can inform the design of their own competing app, exploiting any prior user dissatisfaction. Thus, it behoves developers to maintain quality, in either attempts to enter this market with a new app or efforts in continuous refinement of an existing app portfolio.

Opinion mining research has yielded observations of interest to businesses by analysing user reviews of their products. These research efforts leverage static analysis techniques and natural language processing approaches to discern generalisable patterns and analyse content. However, a good framework to mine mobile app reviews for content useful to app developers is lacking. Our work addresses this gap.

In this thesis we demonstrate that the majority of reviews, when read in isolation, do not yield developers much actionable feedback. However, using a series of analytics, we can construct a frame of context to detect developer-useful information. More specifically, our work analyses 8.7 million reviews from 17,330 apps at a store-, category- and app-level perspective. We present a statistically constructed app review model that profiles review properties, growth and evolution. A text stabilisation treatment is also offered to prepare reviews for content analysis.
Through use of the model to reduce the content search space, we offer a technique to detect and sort reviews by content usefulness. Holistic application of these analytics forms an app development benchmarking framework to inform project expectations, priorities and scoping.

Dedicated to the pursuit of knowledge and the changes wrought

Acknowledgements

I would like to thank my supervisors, Jean-Guy and Rajesh, for their patience and support. I would like to thank my family for their love and encouragement. I would like to thank Milica, Scott, Andrew, Kon, Felix, John and all my friends from Swinburne, Deakin and back home for motivating me to finish. This thesis is the capstone of this journey; the next adventure awaits.

Leonard Hoon, 2016

Declaration

I declare that this thesis contains no material that has been accepted for the award of any other degree or diploma and to the best of my knowledge contains no material previously published or written by another person except where due reference is made in the text of this thesis.

Leonard Hoon, 2016

Publications arising from this Thesis

The work described in this thesis has been published as described in the following list:

[1] R. Vasa, L. Hoon, K. Mouzakis, and A. Noguchi, “A Preliminary Analysis of Mobile App User Reviews,” in Proceedings of the 24th Australian Computer-Human Interaction Conference (OzCHI ’12), pp. 241–244, Nov. 2012.

[2] L. Hoon, R. Vasa, J.-G. Schneider, and K. Mouzakis, “A Preliminary Analysis of Vocabulary in Mobile App User Reviews,” in Proceedings of the 24th Australian Computer-Human Interaction Conference (OzCHI ’12), pp. 245–248, Nov. 2012.

[3] L. Hoon, R. Vasa, J.-G. Schneider, and J. Grundy, “An Analysis of the Mobile App Review Landscape: Trends and Implications,” tech. rep., Faculty of Science, Engineering and Technology, Swinburne University of Technology, Melbourne, Victoria, Australia, 2013. http://hdl.handle.net/1959.3/352848.

[4] K. Mouzakis, L. Hoon, and R. Vasa, “Socrates Mobile App Review Dataset.” http://hdl.handle.net/1959.3/364882, Oct. 2013. Swinburne Research Bank, Swinburne University of Technology.

[5] L. Hoon, R. Vasa, G. Y. Martino, J.-G. Schneider, and K. Mouzakis, “Awesome! Conveying Satisfaction on the App Store,” in Proceedings of the 25th Australian Computer-Human Interaction Conference (OzCHI ’13), pp. 229–232, Nov. 2013.

[6] L. Hoon, K. Mouzakis, R. Vasa, F. T. C. Tan, and M. Fitzgerald, “Examining the Role of IS Affordances in Enabling Time Critical Clinical Practices: An Information Processing Perspective,” in Proceedings of 25th Australasian Conference on Information Systems (ACIS ’14), Dec. 2014.

[7] L. Hoon, F. T. C. Tan, R. Vasa, K. Mouzakis, and M. Fitzgerald, “ICT-Enabled Time-Critical Clinical Practices: Examining the Affordances of an Information Processing Solution,” Australasian Journal of Information Systems, vol. 19, 2015.

[8] L. Hoon, M. A. Rodriguez-García, R. Vasa, R. Valencia-García, and J.-G. Schneider, “App Reviews: Breaking the User and Developer Language Barrier,” in Trends and Applications in Software Engineering, vol. 405 of Advances in Intelligent Systems and Computing, pp. 223–233, Springer International Publishing, 2016.

Although the thesis is written in a linear fashion, our research efforts took an exploratory vector as we delved through ideation, modelling, formulation and backtracking from dead-ends. The following text outlines the relevance of each publication listed in relation to this thesis.

The early articles [1,2] presented our initial experiments post data collection. Specifically, our first study to determine summary statistics of reviews [1] and rudimentary content analysis [2] are described. These works bridged a gap in the literature at the time of writing by providing insight into the general properties of app reviews in terms of size, price, ratings and vocabulary as well as the relationships between these fields.
The findings from these early works informed our technical report [3], which analyses review count growth and the evolution of both rating and size. The growth modelling and observations pertaining to ratings and size presented app developers with an initial benchmarking framework to track and monitor their app from a review perspective. The statistically themed articles described thus far outline the experiments contributing to Chapter 4: Statistical Landscape.

Our more recent work, informed by [2], presents findings in regards to sentiment analysis and search space reduction [5]. The elimination of short reviews, which make up over 80% of the dataset, greatly reduced the area of analysis for semantic analysis without disregarding the fact that short reviews, in volume, still convey significance. Additionally, the first segment of our dataset was released with the publication to encourage work in this space [4]. The Health & Fitness category was selected as our early works found the reviews in this category to be the most verbose and, hence, to offer the highest amount of raw data to work with for semantic analysis. Post rudimentary content and sentiment analysis, we took our experiments in the direction of constructing an ontology to map user review vernacular to themes presented in software quality [8]. Decomposing the vocabulary employed in reviews afforded tuning opportunities for semantic analysis, without its probabilistic calculations being skewed by high volume but small sized reviews. The Socrates dataset and where to obtain it are detailed in Chapter 3, while the text and language processing research described forms the bulk of Chapter 5: Vocabulary Landscape.

Finally, our information processing works [6,7] helped frame the structure of this thesis. In particular, the premise of affordance as potential, as a separate entity from the actualisation of realising said potential, resonates strongly with our work.
In the same vein, our expansive dataset is full of potential findings, but without the experiments we subject it to, said potential would never be realised. It was in the writing of these works that the shape and structure of this thesis began to form, strongly influencing its prose.

Contents

1 Introduction
  1.1 Research Goals
  1.2 Usage Scenarios
  1.3 Thesis Organisation
2 Literature Review
  2.1 Electronic Word-of-Mouth
  2.2 Digital User Reviews
  2.3 Affordance of User Reviews
  2.4 App Store Ecosystem Dynamics
  2.5 Ecosystem Dynamics via Simulation
  2.6 App Release Strategies
  2.7 Ratings, Recommendations and Reviews
  2.8 Review Information Extraction
    2.8.1 Classifying Reviews
    2.8.2 Summarising Reviews
    2.8.3 Review Information Extraction Summary
  2.9 Literature Summary
    2.9.1 App Store Dynamics
    2.9.2 App Reviews
  2.10 Thesis Scope
3 Data Selection and Collection
  3.1 Mobile App Store Fronts
  3.2 App Store Selection
  3.3 Data Selection
  3.4 Thesis Dataset
4 Statistical Landscape
  4.1 Overview
  4.2 Typical Properties of a Review
    4.2.1 Review Size
    4.2.2 Review Size across Ratings
    4.2.3 Review Size across Categories
  4.3