Please Please Me:∗ Does the Presence of Test Cases Influence Mobile App Users’ Satisfaction?

Vinicius H. S. Durelli (Federal University of São João del Rei, São João del Rei, Minas Gerais) [email protected]
Rafael S. Durelli (Federal University of Lavras, Lavras, Minas Gerais) [email protected]
Andre T. Endo (Federal University of Technology, Cornelio Procopio, Parana) [email protected]
Elder Cirilo (Federal University of São João del Rei, São João del Rei, Minas Gerais) [email protected]
Washington Luiz (Federal University of Minas Gerais, Belo Horizonte, Minas Gerais) [email protected]
Leonardo Rocha (Federal University of São João del Rei, São João del Rei, Minas Gerais) [email protected]

ABSTRACT
Mobile application developers have started to realize that quality plays a vital role in increasing the popularity of mobile applications (apps), thereby directly influencing economic profit (in-app purchase revenue) and app-related success factors (i.e., number of downloads). Therefore, developers have become increasingly concerned with taking preemptive actions to ensure the quality of their apps. In general, developers have been relying on testing as their main quality assurance practice. However, little is known about how much mobile app testing contributes to increasing user-level satisfaction. In this paper we investigate to what extent testing mobile apps contributes to achieving higher user satisfaction. To this end, we probed into whether there is a relation between having automated tests and overall user satisfaction. We looked into users' ratings, which express their level of satisfaction with apps, and users' reviews, which often include bug (i.e., fault) reports. By analyzing a quantitative indicator of user satisfaction (i.e., user rating), we found that there is no significant difference between apps with automated tests and apps that have been developed without test suites. We also applied sentiment analysis on user reviews to examine the main differences between apps with and without test suites. The results of our review-based sentiment analysis suggest that most apps with and without test suites score quite high for user satisfaction. In addition, we found that update-related problems are far more common in apps with test suites, while apps without test suites are likely to have battery-drain problems.

CCS CONCEPTS
• Software and its engineering → Formal software verification;

KEYWORDS
Software testing, mobile application, sentiment analysis

ACM Reference Format:
Vinicius H. S. Durelli, Rafael S. Durelli, Andre T. Endo, Elder Cirilo, Washington Luiz, and Leonardo Rocha. 2018. Please Please Me: Does the Presence of Test Cases Influence Mobile App Users' Satisfaction?. In XXXII BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING (SBES 2018), September 17–21, 2018, Sao Carlos, Brazil. ACM, New York, NY, USA, Article 4, 10 pages. https://doi.org/10.1145/3266237.3266272

∗ In this paper we explore whether automated testing helps developers to create high quality apps that "please" the end user, hence the title – as made famous by The Beatles in 1963 – "Please Please Me".

1 INTRODUCTION
The mobile application (usually referred to as mobile app) market has been growing steadily over the last decade. As a result, the software industry has employed thousands of developers to produce a myriad of mobile applications and generate millions of dollars in revenue [2]. Currently, mobile apps are available through many app stores (or app marketplaces); two notable examples are the Apple App Store¹ and the Google Play Store². App stores act as online platforms where users can browse through a catalog of apps, download apps, and provide feedback concerning their user experience with the downloaded apps (which usually takes the form of mobile app user reviews).

¹ https://www.apple.com/ios/app-store/
² https://play.google.com/store/apps

Mobile app development is rife with challenges. One of the major challenges is meeting the high expectations of mobile app users. To evaluate the quality of mobile apps, some app stores enforce development policies and publishing guidelines that must be observed by app developers. In addition, apps often have to go through an evaluation before being approved and showcased in the app store. Due to the rapidly growing mobile market, unlike desktop- and web-based software users, mobile app users can choose from a wide variety of apps. Given their greater choice, mobile app users have developed a low tolerance for faulty apps. Naturally, the quality of mobile apps greatly influences how users perceive their experience using these apps. Some studies suggest that apps that fail to meet users' expectations end up deleted from users' devices [11]. Therefore, we surmise that quality issues can negatively impact an app's rating and reviews and ultimately hurt its popularity over time.

App developers have recognized that quality plays a critical role in driving popularity and have started taking preemptive actions to ensure the quality of their apps [7, 20]. Most developers have been relying on testing as their main quality assurance practice. Despite the growing interest in devising better approaches to testing mobile apps as well as building tools that can automate and improve testing of mobile apps [29], little research has been done to explore the relation between having automated tests and user satisfaction [18]. Therefore, this paper examines the contribution of mobile app testing to increasing the level of satisfaction of users with apps in the context of the Android operating system (OS) and the Google Play Store.

We conjecture that apps that include some sort of automated testing (i.e., test suites) fare better in terms of quality and thus satisfy user expectations better. More specifically, we set out to examine the following research questions:
RQ1 Do apps with test suites score higher in terms of user satisfaction than apps without test suites?
• RQ1.1 Do apps with test suites achieve higher average ratings than apps without test suites?
• RQ1.2 Do apps with test suites achieve more positive reviews?

We analyzed 60 mobile apps. Although the number of apps we investigated is relatively small, the analysis we carried out was sound and we believe that the results are of general interest to the software engineering community. Our experimental results on a real-world app sample would seem to indicate the following:
• Both our qualitative and quantitative results would seem to suggest that apps with and without test suites score similarly in terms of user satisfaction.
• However, these two groups of apps are prone to different problems that lead to poor reviews: (i) apps with test suites have a high incidence of update-related problems and (ii) apps without tests are fraught with battery-drain issues.

The remainder of this paper is organized as follows. Sections 2 and 3 provide background on topic modeling and sentiment analysis, presenting definitions and a brief description of the review-based sentiment analysis approach we used in this paper. Section 4 presents related work. Section 5 describes the experimental design and discusses threats to validity. Section 6 presents results and statistical analysis, and compares apps with and without test suites in terms of proxies for user satisfaction. Section 7 discusses our findings in the light of previous studies. Section 8 presents concluding remarks.

2 TOPIC MODELING
Topic modeling (TM) is centered around the problem of discovering relations among documents and topics, as well as relations among the terms that compose the documents and topics, thus enabling the organization of textual documents according to the discovered topics. Basically, TM can be broken down into three main steps: (i) data representation, (ii) latent topic decomposition, and (iii) topic extraction. During data representation, textual documents are usually represented by adopting a fixed-length vector representation based on simple term occurrence information, which is encoded by term frequency-inverse document frequency (TF-IDF). The methods used in the second step (i.e., latent topic decomposition) can be divided into two main models: probabilistic and non-probabilistic. Essentially, when using probabilistic models, data is modeled based on two concepts: (i) each document follows a topic distribution θ(d); and (ii) each topic follows a term distribution ϕ(z). θ(d) and ϕ(z) reflect to what extent a topic is important to a document and to what extent a term is important to a topic, respectively [5]. Non-probabilistic topic modeling employs strategies such as matrix factorization, wherein a dataset with n documents and m different terms is encoded as a design matrix A ∈ R^{n×m} and the goal is to decompose A into sub-matrices that preserve some desired property or constraint. The two main non-probabilistic strategies are (i) singular value decomposition (SVD) [6] and (ii) non-negative matrix factorization (NMF) [15]. Several strategies have been proposed for extracting relevant topics from a set of previously identified topics. Bicalho et al. [1] proposed a strategy based on NMF: semantic topic combination (SToC), whose main goal is to group semantically similar topics. In this paper, to answer our research questions, we consider TF-IDF, NMF, and SToC as our data representation, topic decomposition, and topic extraction strategies of choice, respectively.
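To make these three steps concrete, the sketch below (in Python, assuming scikit-learn is available) runs TF-IDF and NMF over a handful of made-up reviews and lists the top terms per topic. It is only an illustration of the generic pipeline, not the tooling used in this study, and it does not implement SToC; the example reviews and the number of topics are hypothetical.

```python
# A minimal sketch of the TF-IDF + NMF pipeline described above, using
# scikit-learn. Illustrative only: the reviews and parameters are made up,
# and the SToC topic-grouping step [1] is not implemented here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

reviews = [
    "app crashes after the latest update",
    "great app but it drains my battery",
    "love the new interface, very intuitive",
    "update broke the sync feature, please fix",
    "battery drain is terrible since last version",
]

# Step (i): data representation -- encode reviews as a TF-IDF design matrix A (n x m).
vectorizer = TfidfVectorizer(stop_words="english")
A = vectorizer.fit_transform(reviews)

# Step (ii): latent topic decomposition -- factor A into W (document-topic)
# and H (topic-term) non-negative matrices.
nmf = NMF(n_components=2, random_state=42)
W = nmf.fit_transform(A)   # document-topic weights
H = nmf.components_        # topic-term weights

# Step (iii): a simple stand-in for topic extraction -- list the top terms per topic.
terms = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(H):
    top_terms = [terms[i] for i in topic.argsort()[::-1][:4]]
    print(f"topic {topic_idx}: {', '.join(top_terms)}")
```

In the study itself, the topic-extraction step additionally groups semantically similar topics with SToC [1].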
3 SENTIMENT ANALYSIS
Sentiment analysis is concerned with the task of automatically detecting the polarity (i.e., positive or negative) of sentiment in textual representations [9]. Basically, there are two types of methods for carrying out sentence-level analysis: (i) supervised-based and (ii) lexicon-based. Since supervised methods need to process labeled data, these methods do not perform properly when complex data (i.e., real-world textual data) has to be taken into account. Consequently, recent efforts have come up with sentence-level methods based on lexicons. Lexicon-based methods create lexicons and combine different natural language processing techniques to determine a sentence's polarity [9]. In this paper, we adopted a lexicon-based strategy [16] that outperforms several supervised strategies.

4 RELATED WORK
Several studies have been exploring user feedback to understand the most problematic features in apps [16], to probe into the relationship between bugs and end user ratings [14], or even to obtain ideas for app improvements [8]. In [14], Khalid et al. analyzed 10,000 Android mobile apps to examine whether poor ratings are related to certain categories of static analysis warnings. They observed that the "Bad Practice", "Internationalization", and "Performance" categories have significantly higher densities in low-rated apps than in high-rated apps. Moreover, the results also show that static analysis tools (e.g., FindBugs) can help developers to identify certain types of bugs that have the potential to result in poor ratings.

In [8] and [16], the authors were interested in extracting app features and the users' sentiments associated with them in order to help developers to capture the most problematic features and understand the needs of app users. The authors introduced a method to automatically extract features from mobile app reviews. Their method aggregates the sentiments of users for each feature by applying automated sentiment analysis techniques on app reviews. They observed that sentiment analysis can be successfully applied to requirements evolution tasks and can help developers to capture user opinions on a single feature or to filter irrelevant reviews. Their method also sheds some light on the features that more positively or negatively impact the overall evaluation of mobile apps.

In [3], Carreño and Winbladh proposed an approach to explore user feedback. Their approach automatically extracts topics representing new requirements that better represent users' needs. Their technique extracts topics prevalent in user comments and allows these topics to be analyzed by software developers. Similarly, in [10], the authors propose MARA (Mobile App Review Analyzer), a prototype for the automatic retrieval of mobile app feature requests from online reviews. In a recent study [21], Nayebi et al. argue that app store mining is not enough for app improvement. Nayebi et al. investigated how Twitter can provide complementary information to support mobile app development. The authors applied machine learning classifiers, topic modeling, and crowd-sourcing to successfully mine 22.4% feature requests and 12.89% bug reports from Twitter. Moreover, they also found that 52.1% of all feature requests and bug reports were discussed on both tweets and reviews.

5 EXPERIMENT SETUP
As mentioned, in response to the competitive market that favors good-quality mobile apps, developers have been testing their apps prior to release. Thus, we set out to investigate whether testing mobile apps contributes to achieving higher average user satisfaction. We conjecture that mobile apps with sound test suites tend to score higher in terms of user satisfaction than poorly tested apps.
This section describes the experiment setup we used to investigate the following research questions (RQs):
RQ1 Do apps with test suites score higher in terms of user satisfaction than apps without test suites?
• RQ1.1 Do apps with test suites achieve higher average ratings than apps without test suites?
• RQ1.2 Do apps with test suites achieve more positive reviews?
The elements that constitute user satisfaction (RQ1.2) in the context of this experiment are defined in Subsection 5.1.2. In Subsection 5.1.2 we also set forth the hypotheses we set out to investigate in this experiment and provide the operational definitions [26] of user satisfaction.

5.1 Scoping
5.1.1 Experiment Goals. Defining the scope of an experiment comes down to setting its goals. We used the organization proposed by the Goal/Question/Metric (GQM) [28] template to do so. According to this goal definition template, the scope of our study can be summarized as follows.
Analyze if testing mobile apps leads to greater user satisfaction
for the purpose of evaluation
with respect to user ratings and user sentiment analysis
from the point of view of the researcher
in the context of users, i.e., customers, evaluating mobile apps.

5.1.2 Hypotheses Formulation. In the context of our study, user satisfaction is defined as the extent to which users believe that an app meets their needs and provides a frustration-free experience. To answer RQ1 we analyzed two proxy measures of user satisfaction: (i) users' ratings and (ii) sentiment analysis of users' reviews. Thus, RQ1 was further broken down into two RQs: RQ1.1 and RQ1.2.
We framed our prediction for RQ1.1 as follows: mobile apps with test suites achieve higher average ratings than apps without test suites. RQ1.1 was turned into the following hypotheses:
Null hypothesis, H0−ratings: there is no difference between mobile apps with test suites and mobile apps with no test suites in terms of average users' ratings.
Alternative hypothesis, H1−ratings: mobile apps with test suites achieve higher average users' ratings when compared to apps without test suites.
RQ1.2 was translated into the following hypotheses:
Null hypothesis, H0−reviews: there is no difference in how users perceive mobile apps with test suites and mobile apps without test suites.
Alternative hypothesis, H1−reviews: there is a significant difference in how users perceive apps with test suites and apps without test suites.
Given that our main goal is to investigate whether testing apps leads to higher average ratings and greater user satisfaction, this experiment has one treatment and one control group: the treatment group is composed of apps with test suites and the control group includes only apps without test suites. Let r denote the rating and µ the average rating, so that µ_rT and µ_rNT denote the average rating of apps with test suites (i.e., T) and apps without test suites (i.e., NT), respectively.³ The first set of hypotheses can then be formally stated as:
H0−ratings: µ_rT = µ_rNT
and
H1−ratings: µ_rT > µ_rNT
Similarly, to gain a better understanding and answer RQ1.2, we investigated whether apps with test suites lead to greater user satisfaction in comparison to apps that were developed without test suites. Let s be the perceived satisfaction of the app user; then the second set of hypotheses can be formulated as follows:
H0−reviews: µ_sT = µ_sNT
and
H1−reviews: µ_sT ≠ µ_sNT

³ Here, T stands for apps with Test suites and NT stands for apps with No Test suites.

5.2 Variables Selection
As mentioned, the purpose of this experiment is to evaluate whether the presence of automated tests leads to higher end user satisfaction. Thus, we are particularly interested in two dependent variables:
• User ratings; and
• User reviews on the Google Play Store.
We believe that these two dependent variables are the main indicators of app quality available to those interested in downloading an app. The most straightforward indicator of user satisfaction in the Google Play Store is the user rating: apps are ranked based on a five-star rating scale in which users can assign one to five stars to rate apps, with five stars representing the highest quality. In order to answer RQ1.1 we looked into user ratings.
Apart from ratings, users can also write reviews. We argue that the textual content in user reviews also represents an important dimension that can be explored from a software quality perspective. In effect, previous research [4] has shown that the information contained in user reviews can be helpful for app developers. However, due to the unstructured and somewhat informal nature of reviews [19], the quality of the feedback provided by users varies greatly, from useful fault reports and feature requests to generic praise and complaints. We believe that sentiment analysis allows for an objective interpretation of information that might otherwise be hard to evaluate properly, i.e., user reviews. Therefore, to overcome the complexities involved in analyzing unstructured textual information (i.e., written language) and answer RQ1.2, we employed a sentiment analysis method to synthesize all useful information from user reviews. In this context, we conjecture that the sentiment expressed in informative reviews can be used to gauge the quality of mobile apps.
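To illustrate the kind of lexicon-based scoring described in Section 3, the following sketch scores made-up review sentences with the VADER lexicon [9] via NLTK. It is an illustration only: it is not the feature-oriented method of Luiz et al. [16] that is actually employed in this paper, and the sample reviews are hypothetical.

```python
# Hedged sketch: lexicon-based polarity scoring with VADER [9] via NLTK.
# This illustrates sentence-level polarity detection; it is NOT the
# feature-oriented method of Luiz et al. [16] used in the study.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

reviews = [  # made-up examples
    "Works flawlessly, best podcast app I have used.",
    "After the last update the app keeps crashing.",
]

for review in reviews:
    scores = analyzer.polarity_scores(review)
    # 'compound' ranges from -1 (most negative) to +1 (most positive)
    polarity = "positive" if scores["compound"] >= 0 else "negative"
    print(f"{polarity:8s} {scores['compound']:+.2f}  {review}")
```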
5.3 Sample Selection
To perform this experiment, we divided the population of open-source app projects into two groups: projects that include test suites (i.e., the treatment group) and projects that were developed without any sort of automated test suite (i.e., the control group). Given that our sample was randomly selected from an open source repository, we tried to include a wide range of apps that differ in size, complexity, and category. We adhered to the following criteria in constructing our sample:
• Open-source projects: we randomly selected open-source apps listed in F-Droid⁴ and hosted in GitHub⁵ repositories. We followed the guidelines proposed by Kalliamvakou et al. [13] during the construction of our sample.
• Availability at the Google Play Store: as ratings and reviews are essential data for this study, all apps and app-related information should be available online. We settled on Android apps distributed by the Google Play Store.
• Minimum number of reviews: in order to perform user sentiment analysis, we had to select apps with a significant number of reviews: as described in Subsection 6.3, for part of our sample we selected apps that have at least 350 reviews.
• Presence of test suites: for the treatment group, we focused on selecting projects that include automated test cases. In effect, the treatment group was composed of apps with automated tests and the adopted test-to-code ratio was 1 line of test code to 10 lines of production code (i.e., 1:10): a ratio of 1:10 indicates that, for every line of test code, there are 10 lines of production code. It is worth noting that settling for the presence of automated tests is an approximation and other factors might impact the results. Furthermore, the aspects tested in the selected apps vary, and that could influence the results. This is further discussed in Subsection 5.4.

⁴ https://f-droid.org/
⁵ https://github.com

We used a tool named Prof.MApp (Profiling Mobile Apps) [25] to gather data on mobile apps. Given an Android app, Prof.MApp inspects the source code to classify files into two categories: production code and test code; from analyzing the source files in the latter category, we extracted information related to the presence of automated tests.
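As a rough illustration of how such a test-to-code ratio can be derived, the sketch below classifies Java/Kotlin files by their source directory and divides test lines by production lines. The directory markers and the counting heuristic are simplifying assumptions for illustration and may differ from what Prof.MApp [25] actually implements.

```python
# Simplified, hypothetical approximation of the production/test split behind
# the 1:10 test-to-code ratio. Prof.MApp [25] is the tool actually used in
# the study; its exact heuristics may differ from this sketch.
from pathlib import Path

TEST_DIR_MARKERS = ("/src/test/", "/src/androidTest/")  # common Android test roots

def count_lines(path: Path) -> int:
    """Count non-blank lines in a source file."""
    return sum(1 for line in path.read_text(errors="ignore").splitlines() if line.strip())

def test_to_code_ratio(repo_root: str) -> float:
    """Return test LoC divided by production LoC for a checked-out app repository."""
    test_loc = prod_loc = 0
    for source in Path(repo_root).rglob("*"):
        if source.suffix not in {".java", ".kt"} or not source.is_file():
            continue
        loc = count_lines(source)
        if any(marker in source.as_posix() for marker in TEST_DIR_MARKERS):
            test_loc += loc
        else:
            prod_loc += loc
    return test_loc / prod_loc if prod_loc else 0.0

# Example (hypothetical path): apps with a ratio >= 0.1 (i.e., 1:10) would be
# candidates for the treatment group.
# print(test_to_code_ratio("/tmp/k9mail-checkout"))
```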

5.4 Validity Evaluation
As with any experimental study, this experiment has several threats to validity. Internal validity is concerned with the confidence that can be placed in the cause-effect relationship between the treatments and the dependent variables in the experiment. External validity has to do with generalization, namely, whether or not the cause-effect relationship between the treatments and the dependent variables can be generalized outside the scope of the experiment. Conclusion validity focuses on the conclusions that can be drawn from the relationship between treatment and outcome. Finally, construct validity is about the adequacy of the treatments in reflecting the cause and the suitability of the outcomes in representing the effect. We categorized all threats to validity according to this classification.

5.4.1 Internal Validity. We mitigated the selection bias issue by using randomization. However, since we assumed that all types of mobile apps have the same characteristics, no blocking factor was applied to minimize the threat of possible variations in, for instance, the complexity of the apps, usability, and performance. Thus, we cannot rule out the possibility that some variability in how end users perceive the quality of the chosen apps stems from other quality factors as opposed to the amount of fault-related problems in the chosen apps. Also, we had to settle for apps that have any sort of automated test cases, which can be seen as an approximation that does not account for other factors that might impact the results. This is an oversimplification of an ideal scenario in which mobile apps have many types of test cases (e.g., user interface test cases, functionality test cases, security test cases). In effect, our analysis shows that the majority of app developers push testing aside when trying to get their apps to the market, so we could not find apps that have been thoroughly tested. We had to settle for apps that have a certain test-to-code ratio; thus, these apps lack tests for several aspects. Our analysis is concerned with finding out whether the inclusion of any sort of automated test helps developers to improve business-related metrics such as user satisfaction.

5.4.2 External Validity. The sample might not be representative of the target population. As mentioned, we randomly selected apps for the treatment and the control groups. However, we cannot rule out the threat that the results could have been different if another sample had been selected. Another potential threat to the external validity of our results is that our analysis includes only apps from the Google Play Store. We did not take apps from different mobile marketplaces into account because the tool we used to extract app information is tailored towards analyzing Android programs (as mentioned in Subsection 5.3). Another reason we did not consider different mobile technologies and marketplaces has to do with the fact that most studies involving mobile apps are centered around open source apps available on the Google Play Store. In a way, Android is seen by app developers as a more open-source-friendly platform than iOS. Although the Google Play Store is a very popular app store, further investigation including more app stores is required before we can determine whether our findings can be generalized to different app stores and different developer populations. Consequently, we cannot be confident that the results hold for a more representative sample of the general population.

5.4.3 Conclusion Validity. The main threat to conclusion validity has to do with the quality of the data collected during the course of the experiment. More specifically, the quality of ratings and reviews is of key importance to the interpretation of our results. End users are free to rate apps and write reviews that might not always reflect the attributes of a given app; thus, it is possible that some data is not accurate due to misguided or ill-informed reviews or even fabricated information. It is worth noting, however, that data inconsistencies were filtered out during the sentiment analysis process.

5.4.4 Construct Validity. The measures used in the experiment may not be appropriate to quantify the effects we intend to investigate. For instance, user-generated app reviews may neither be the only nor the most important predictor of user satisfaction. If the measures we used do not properly match the target theoretical construct, the results of the experiment might be less reliable.

6 EXPERIMENTAL RESULTS
In this section we describe the experimental results for the 60 mobile apps we analyzed.⁶ First we discuss some descriptive statistics, then we present the hypothesis testing.

⁶ The raw data of the experiment is available at: http://ow.ly/rP9G1013urH.

6.1 Descriptive Statistics
As shown in Table 1, the size of the apps ranges from 159,841 lines of code (141,023 lines of production code and 18,818 lines of test code) to 114 lines of code. Among the apps that include test suites, GreenBits is the one with the most production code and test code (18,818 lines of test code), while the app with the fewest lines of test code is RPN (180). On average, the apps in the treatment group contain around 4,436 lines of test code. As for the apps that do not include test suites (i.e., the control group), Vuze Remote is the largest app, containing 41,209 lines of production code. The smallest app in the control group (Simple Reboot) has 114 lines of production code.

From observing Tables 1, 2, and 3 it can be seen that most apps in both groups have received a significant amount of feedback from users. Considering the apps with test suites, ZXing is the app that has received the most user feedback: 624,366 user reviews (Table 1). The least reviewed app in this group is Loyalty Card Keychain, with only 91 user reviews (Table 1). As for the apps in the control group, the most reviewed app has received feedback from 39,731 users (No-frills CPU Control), and the app with the least amount of feedback has been reviewed by 531 users (Reddinator).

Due to the outliers in our data concerning the amount of feedback received by the apps, we consider the trimmed mean and the median values in Tables 2 and 3 to be more accurate measures of central tendency than the mean. Therefore, on average (trimmed mean), the apps in the treatment group have received approximately 7,389.04 reviews, while the apps in the control group received 4,153.46 reviews. Additionally, also due to the aforementioned outliers, the median absolute deviation (MAD) is a more robust measure of statistical dispersion than the standard deviation (see Tables 2 and 3).

Since the age of an app project might be an indicator of app maturity, we also looked into the lifespan of the app projects in both groups. In the context of our study, a given project's lifespan represents the time (in days) since the project's repository was created on GitHub. All apps in both groups have been under development for at least one year. In the treatment group, the longest-running project is SMS Backup+, which has been under development for over eight years (2,976 days). The most recent app is Loyalty Card Keychain, whose repository has been available for a little over two years (759 days). On average (mean), apps in the treatment group have been developed and maintained for 1,874 days. In the control group, the longest-running project (Sokoban) has been around for 2,690 days, and the most recent app (Camera Roll - Gallery) has been available for a little over one year (419 days). The apps in the control group have been under development for an average of 1,537 days.

Another possible indicator of project maturity is the number of commits to a project repository: this metric refers to the total number of commits that have occurred within an app project throughout its lifespan. To some extent, this quantitative metric also reflects how actively developers have been participating in the project. In the treatment group, the app with the most commits is c:geo: 11,049 (making it by far the app that has changed the most considering both groups). The app whose project has changed the least is Speech Trainer, with only 31 commits. In the control group, the app with the largest number of commits is Lightning (1,698) and the one with the smallest number of commits is Sokoban (14).

Prior to an in-depth analysis of the textual content in app reviews, we focused on a more quantitative analysis of user satisfaction. As mentioned, a proxy that provides a more quantitative indicator of user satisfaction is the user rating, which is represented on a five-star rating scale. In order to answer RQ1.1 we probed into user ratings. The user ratings of the two groups are summarized in the boxplots shown in Figure 1. Overall, ratings in our sample are very positive for both groups. In Figure 1 the thick horizontal lines represent the medians, so apps with test suites achieve a slightly higher median rating (4.4 stars) than apps without test suites (4.3 stars). In Figure 1 the boxes around the medians encompass the 25th (Q1) through the 75th (Q3) percentiles. The "whiskers", i.e., the lines above and below the boxes, highlight the range of the data, excluding possible outliers. This is determined by the interquartile range (IQR), which is defined as IQR = Q3 − Q1. Values that are more than 1.5 × IQR below or above the box are considered outliers. It is worth noting that we also added markers for the mean to the boxplots: in Figure 1 the mean is shown with a diamond. A more in-depth analysis of the results presented in Figure 1 is given in the following subsection.

6.2 Hypothesis Testing
As mentioned, one of our goals is to examine whether apps that include test suites have higher average ratings, thereby indicating that apps with some sort of automated test lead to greater user satisfaction. To this end, we devised two hypotheses: the null hypothesis H0−ratings and the alternative hypothesis H1−ratings (described in Subsection 5.1.2).

Table 1: Mobile apps we analyzed in our experiment.
The entries in the table are in descending order by the number of lines of code (i.e., production code).

Google Play Store and GitHub Information
App Name | Production Code | Test Code | Google Play Reviews | Rating | Project Lifespan | #Commits
Treatment Group (Apps with Test Suites)
GreenBits | 141,023 | 18,818 | 289 | 3.5 | 1,133 | 1,775
K-9 Mail | 68,249 | 13,272 | 87,972 | 4.3 | 2,581 | 7,606
c:geo | 64,469 | 8,432 | 57,427 | 4.4 | 2,424 | 11,049
OpenKeychain | 50,669 | 6,723 | 2,235 | 4.5 | 2,182 | 6,672
Mirakel | 42,109 | 5,444 | 280 | 4.1 | 1,842 | 3,722
AntennaPod | 38,183 | 6,549 | 17,058 | 4.6 | 2,038 | 4,165
ownCloud | 37,104 | 4,003 | 6,710 | 3.0 | 2,013 | 6,374
My Expenses | 32,633 | 4,426 | 7,117 | 4.3 | 2,191 | 6,448
Etar | 31,991 | 3,958 | 196 | 4.6 | 908 | 5,589
And Bible | 31,401 | 4,816 | 5,648 | 4.6 | 2,346 | 2,634
ZXing | 31,323 | 7,706 | 624,366 | 4.1 | 2,331 | 3,421
Kore | 27,515 | 4,659 | 13,956 | 4.4 | 1,141 | 670
AnySoftKeyboard | 22,607 | 3,689 | 22,139 | 4.4 | 2,142 | 4,041
Mozilla Stumbler | 21,350 | 2,578 | 889 | 4.5 | 1,700 | 2,698
GnuCash | 17,891 | 2,531 | 4,203 | 4.3 | 2,110 | 1,679
Materialistic | 14,131 | 11,220 | 2,224 | 4.7 | 1,133 | 1,686
Wikimedia Commons | 11,034 | 1,593 | 242 | 4.2 | 905 | 3,231
GPS Logger for Android | 9,743 | 1,014 | 4,199 | 4.0 | 2,445 | 1,364
Dir | 8,786 | 1,775 | 239 | 4.4 | 2,119 | 814
Agit: Git | 8,651 | 1,290 | 142 | 3.2 | 2,741 | 894
SMS Backup+ | 7,165 | 2,558 | 59,648 | 4.4 | 2,976 | 1,554
Termux | 6,954 | 2,018 | 10,920 | 4.7 | 859 | 409
WiFiAnalyzer | 6,946 | 9,262 | 7,508 | 4.4 | 811 | 1,049
Equate | 5,547 | 1,045 | 211 | 4.8 | 1,405 | 390
Ministocks | 3,972 | 410 | 2,847 | 4.1 | 1,782 | 249
Calendar Widget | 2,438 | 850 | 8,473 | 4.5 | 2,117 | 414
Loyalty Card Keychain | 2,124 | 965 | 91 | 4.5 | 759 | 378
Blokish | 2,082 | 246 | 2,082 | 4.3 | 2,506 | 97
Speech Trainer | 1,261 | 1,057 | 241 | 3.9 | 2,306 | 31
RPN | 832 | 180 | 200 | 4.5 | 2,268 | 33
Control Group (Apps without Test Suites)
Vuze Remote | 41,209 | 0 | 3,376 | 4.3 | 1,146 | 41
Transdrone | 23,969 | 0 | 3,556 | 4.3 | 1,705 | 560
Linux CLI Launcher | 21,721 | 0 | 8,183 | 4.7 | 657 | 165
LibreTorrent | 16,919 | 0 | 778 | 4.3 | 497 | 317
Camera Roll - Gallery | 16,373 | 0 | 1,083 | 4.5 | 419 | 313
Lightning | 14,748 | 0 | 1,343 | 3.5 | 1,853 | 1,698
Pathfinder Open Reference | 12,758 | 0 | 7,781 | 4.7 | 2,262 | 249
Reddinator | 11,812 | 0 | 531 | 4.3 | 1,811 | 357
WhereYouGo | 11,290 | 0 | 1,864 | 3.7 | 1,282 | 224
Sokoban | 9,816 | 0 | 893 | 4.4 | 2,690 | 14
OpenVPN for Android | 9,457 | 0 | 29,637 | 4.4 | 1,552 | 1,004
Taskbar | 8,981 | 0 | 2,284 | 4.4 | 572 | 659
OpenSudoku | 6,079 | 0 | 15,704 | 4.6 | 2,197 | 241
SealNote | 5,531 | 0 | 2,562 | 4.5 | 1,121 | 337
Night Screen | 5,399 | 0 | 6,026 | 4.5 | 758 | 136
Binaural Beats Therapy | 4,630 | 0 | 13,663 | 4.2 | 2,628 | 104
OONI Probe | 4,529 | 0 | 1,028 | 4.6 | 1,078 | 415
HN - Hacker News Reader | 4,035 | 0 | 1,362 | 4.4 | 2,011 | 306
Torchie Torch | 3,408 | 0 | 2,265 | 4.2 | 758 | 135
Ted | 2,300 | 0 | 2,100 | 4.6 | 2,192 | 41
Pedometer | 2,249 | 0 | 2,663 | 4.1 | 1,557 | 438
No-frills CPU Control | 1,997 | 0 | 39,731 | 4.3 | 1,848 | 38
Tinfoil for FB | 1,560 | 0 | 10,109 | 4.1 | 2,319 | 308
Activity Launcher | 1,168 | 0 | 3,664 | 4.3 | 1,471 | 49
Counter | 946 | 0 | 1,791 | 4.5 | 2,220 | 206
OI Flashlight | 750 | 0 | 1,840 | 4.1 | 2,113 | 73
Locale | 668 | 0 | 7,729 | 3.7 | 461 | 93
Ridmik Bangla Dictionary | 563 | 0 | 8,421 | 4.6 | 1,891 | 36
Wikivoyage | 215 | 0 | 570 | 2.7 | 1,872 | 23
Simple Reboot | 114 | 0 | 4,097 | 4.3 | 1,164 | 16

To test these hypotheses we applied a well-known approach to comparing two independent groups: Wilcoxon's rank-sum test [27]. According to the results of this non-parametric test, apps with automated tests (median = 4.4, Table 2) do not differ significantly from apps without tests (median = 4.3, Table 3): W = 470, p = 0.771, r = −0.04.⁷

⁷ We used the following equation to calculate the effect size: r = z / √N, in which z is the z-score and N is the size of the study (i.e., the number of apps) on which z is based.
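The comparison above can be reproduced along the following lines with SciPy, assuming the per-app ratings from Table 1 are available as two lists; the rating vectors below are placeholders rather than the study data, and the effect size follows the r = z/√N formula from the footnote.

```python
# Hedged sketch: Wilcoxon rank-sum test plus the r = z / sqrt(N) effect size
# from footnote 7. The two rating lists are placeholders; the actual per-app
# ratings are listed in Table 1.
from math import sqrt
from scipy.stats import ranksums

with_tests = [4.3, 4.4, 4.5, 4.1, 4.6]      # placeholder ratings (treatment)
without_tests = [4.3, 4.4, 4.2, 4.5, 4.1]   # placeholder ratings (control)

statistic, p_value = ranksums(with_tests, without_tests)  # statistic is a z-score
n = len(with_tests) + len(without_tests)
effect_size = statistic / sqrt(n)

print(f"z = {statistic:.3f}, p = {p_value:.3f}, r = {effect_size:.3f}")
```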

Table 2: Descriptive statistics summarizing the characteristics of the treatment group (i.e., group of apps that include test suites).

Overview of the Treatment Group
Statistic | Google Play Reviews | Rating | Lifespan | Commits
Max | 624,366 | 4.8 | 2,976 | 11,049
Min | 91 | 3.0 | 759 | 31
Mean | 31,658.4 | 4.27 | 1,873.80 | 2,704.53
Trimmed | 7,389.04 | 4.35 | 1,895.29 | 2,318.67
Median | 3,523.0 | 4.4 | 2,113.5 | 1,682.5
Std Dev | 113,831.38 | 0.42 | 641.29 | 2,748.98
MAD‡ | 4,918.53 | 0.22 | 491.48 | 2,029.68
‡ MAD stands for median absolute deviation.
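The robust statistics reported in Table 2 (and in Table 3 below), namely the trimmed mean and the MAD, can be computed as in the following sketch. The data vector is a placeholder and the 10% trimming proportion is an assumption, since the text does not state the exact proportion used.

```python
# Hedged sketch of the robust summary statistics in Tables 2 and 3. The review
# counts are placeholders, the 10% trimming proportion is an assumption, and
# note that R's mad() applies a 1.4826 scaling constant by default, which may
# or may not have been used in the reported values.
import numpy as np
from scipy.stats import trim_mean

reviews = np.array([91, 289, 560, 2224, 7117, 59648, 624366])  # placeholder data

mean = reviews.mean()
trimmed = trim_mean(reviews, proportiontocut=0.1)  # drop 10% from each tail
median = np.median(reviews)
mad = np.median(np.abs(reviews - median))          # unscaled median absolute deviation

print(f"mean={mean:.2f} trimmed={trimmed:.2f} median={median:.1f} MAD={mad:.1f}")
```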

Table 3: Descriptive statistics summarizing the characteristics of the control group (i.e., group of apps that do not include test suites).

Overview of the Control Group
Statistic | Google Play Reviews | Rating | Lifespan | Commits
Max | 39,731 | 4.7 | 2,690 | 1,698
Min | 531 | 2.7 | 419 | 14
Mean | 6,221.13 | 4.26 | 1,536.83 | 286.53
Trimmed | 4,153.46 | 4.33 | 1,545.46 | 215.92
Median | 2,612.5 | 4.3 | 1,631.0 | 215.0
Std Dev | 8,762.01 | 0.41 | 675.59 | 346.36
MAD‡ | 2,308.41 | 0.30 | 788.0 | 210.53
‡ MAD stands for median absolute deviation.

Figure 1: Overview of the ratings of the two app groups (boxplots of user ratings for the groups with and without tests).

6.3 Review-based Sentiment Analysis
As mentioned, we surmise that some of the sentiment expressed in written reviews can be used to shed some light on the quality of mobile apps. Several studies show that reviews include valuable information about user experience and that such information can be used as a proxy to evaluate quality [8, 22]. Furthermore, according to Rodrigues et al. [24], it is common that the star rating assigned to an app does not reflect the contents of the associated written review: that is, the information in reviews often does not match the star rating given to the app. We believe that a sentiment-based method is an effective way to probe into a corpus of written accounts of user experience. Thus, to further investigate the benefits of automated tests, we employed a sentiment analysis method to synthesize useful information from user reviews. More specifically, we adopted a lexicon-based method [16] that allowed us to filter, summarize, and analyze app reviews. The method we employed is able to automatically extract relevant information from reviews and analyze the sentiment (strength) associated with such information.

In hopes of obtaining more meaningful results, we sorted through all the apps in both groups and selected two subsets comprised of the 15 apps with the largest number of reviews in each group. It is worth mentioning that all 30 apps we selected have at least 350 written reviews. Table 4 gives an overview of the subsets of apps we selected to perform sentiment analysis.

Figure 2: An example of output from the sentiment analysis method we used in our study.

Figure 2 gives an example of the output yielded by our sentiment analysis method for each app. As shown in Figure 2, the overall sentiment strength (on a star rating scale from one to five) appears in the top part of the example output. Additionally, both the topics extracted by the topic modeling method and their respective sentiment scores are shown in the example output in Figure 2. Similarly to Luiz et al. [16], we normalized the sentiment strength to values that range from 1 to 5. For instance, an app whose sentiment inferred from our analysis was 80% positive and 20% negative has a normalized sentiment score of 4 stars. The arrows indicate whether the set of topics in a given row is below average (in which case the arrow appears in red) or above average (green arrow pointing upwards) when compared to the overall sentiment strength (inferred from all reviews of an app). We used these arrows to pick out topics that negatively influence the overall rating of an app.
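The normalization just described can be expressed as a simple linear mapping, as in the sketch below; the formula is inferred from the 80%-positive example in the text and may differ from the exact computation used by Luiz et al. [16].

```python
# Hedged sketch of the sentiment-strength normalization described above: a
# linear mapping of the positive-sentiment proportion onto a star scale, so
# that 80% positive / 20% negative yields 4.0 stars. The exact formula in
# Luiz et al. [16] may differ; this matches only the worked example in the text.
def normalized_sentiment(positive_fraction: float) -> float:
    """Map the fraction of positive sentiment (0.0-1.0) onto a 1-to-5 star scale."""
    return max(1.0, min(5.0, 5.0 * positive_fraction))

print(normalized_sentiment(0.80))  # -> 4.0, matching the example in the text
```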
As shown in Table 4, the results of the review-based sentiment analysis indicate that most apps in the treatment group scored quite high for user satisfaction. The smallest score was 3.2 (ownCloud) and the highest score was 4.7 (Materialistic). On average (mean), the score of the apps in this group was 3.93.

As mentioned, the approach we used for topic modeling and sentiment analysis also yields a high-level summary of the topics that tend to co-occur in the same reviews, as well as the general sentiment associated with these topics. By analyzing these topics, we found that apps with test suites are adversely affected by updates.

Table 4: Subsets of apps with most reviews.

App Name | #Reviews | SA‡ | Rating
Treatment Group (Apps with Test Suites)
c:geo | 4,480 | 3.8 | 4.4
K-9 Mail | 4,480 | 3.3 | 4.3
SMS Backup+ | 4,480 | 3.8 | 4.4
AntennaPod | 3,198 | 4.2 | 4.6
AnySoftKeyboard | 2,966 | 4.0 | 4.4
Termux | 2,295 | 4.3 | 4.7
Kore | 1,890 | 3.9 | 4.4
Calendar Widget | 1,623 | 4.2 | 4.5
And Bible | 1,327 | 4.5 | 4.6
ownCloud | 1,035 | 3.2 | 3.0
Ministocks | 760 | 3.9 | 4.1
ZXing | 560 | 3.4 | 4.1
Blokish | 491 | 4.0 | 4.3
OpenKeychain | 402 | 3.8 | 4.5
Materialistic | 396 | 4.7 | 4.7
Control Group (Apps without Test Suites)
OpenVPN for Android | 2,629 | 4.0 | 4.4
Binaural Beats Therapy | 2,597 | 4.1 | 4.2
Linux CLI Launcher | 2,584 | 4.2 | 4.7
No-frills CPU Control | 2,577 | 4.2 | 4.3
Locale | 2,367 | 3.8 | 3.7
Tinfoil for Facebook | 2,312 | 3.8 | 4.1
Pathfinder Open Reference | 1,636 | 4.4 | 4.7
Ridmik Bangla Dictionary | 1,406 | 4.4 | 4.6
Night Screen | 1,053 | 3.7 | 4.5
SealNote | 821 | 4.1 | 4.5
Torchie | 618 | 3.8 | 4.2
OpenSudoku | 600 | 4.5 | 4.6
Transdrone | 587 | 4.0 | 4.3
Taskbar | 520 | 4.3 | 4.4
Vuze Remote | 516 | 4.0 | 4.3
‡ SA stands for sentiment analysis.

Figure 3: Topics extracted from user reviews for (a) c:geo, (b) Ministocks, (c) SMS Backup+, and (d) AnySoftKeyboard. The results of our sentiment analysis method would seem to indicate that apps with tests have update-related issues.

As shown in Figure 3, one topic that often negatively influenced user-level satisfaction was related to updates. Additionally, since the topic fix often appears along with update, we surmise that frequent updates might actually lead to more problems than they fix.

Considering the results of our sentiment analysis, the apps in the control group also scored quite high for user satisfaction. The app with the smallest score was Night Screen (3.7) and the one with the highest score was OpenSudoku (4.5). On average (mean), the apps in this group scored slightly higher than the apps in the treatment group: 4.09. The results of the sentiment analysis would seem to suggest that energy inefficiencies are the main reason behind the negative reviews of the apps in this group. As highlighted in Figure 4, battery-drain issues are common in the control group: the topics battery, life, drain, drains, and consuming appear in many negative reviews. It is worth mentioning that the results of our review-based sentiment analysis suggest that apps with tests are not completely devoid of battery-drain issues. More precisely, our results indicate that apps with tests are less prone to draining batteries faster than usual.

Figure 4: Topics extracted from user reviews for (a) Tinfoil for FB, (b) Locale, (c) Torchie Torch, and (d) No-frills CPU Control. According to the results of our sentiment analysis method, some apps without test suites are fraught with battery-drain issues.

6.4 Bug Reports and Updates
Most apps are updated very frequently in order to create a better experience for users and to fix faults. Therefore, we conjectured that users often leave bug-related (i.e., fault-related) and update-related feedback in their written reviews. To take this into account, we decided to count the number of one-star and two-star reviews with keywords related to bug reports. Previous work has shown that one-star and two-star reviews often highlight negative issues [19]. The bug-related keywords we used in our experiment were extracted from Maalej and Nabil [17].
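A simplified version of this counting step is sketched below; the bug-related keyword list is an illustrative stand-in for the one taken from Maalej and Nabil [17], the update-related stems are the ones given later in this subsection, and the reviews are made up.

```python
# Hedged sketch of the review-counting step: the proportion of one- and
# two-star reviews mentioning bug-related keywords, and of reviews containing
# the update-related stems (updat, upgrad, chang). The bug-related keyword
# list is an illustrative stand-in for the one from Maalej and Nabil [17];
# the review data is made up.
BUG_KEYWORDS = ("bug", "crash", "freeze", "error", "fix")   # illustrative only
UPDATE_STEMS = ("updat", "upgrad", "chang")                 # as listed in Section 6.4

reviews = [  # (stars, text) -- placeholder data
    (1, "Constantly crashes since the last update."),
    (2, "New version changed the layout, full of bugs."),
    (5, "Great app, works perfectly."),
]

def proportion(predicate, reviews):
    """Percentage of reviews satisfying the given predicate."""
    return 100.0 * sum(1 for r in reviews if predicate(r)) / len(reviews)

low_star_bug = proportion(
    lambda r: r[0] <= 2 and any(k in r[1].lower() for k in BUG_KEYWORDS), reviews)
update_related = proportion(
    lambda r: any(s in r[1].lower() for s in UPDATE_STEMS), reviews)

print(f"1-2 star reviews with bug keywords: {low_star_bug:.1f}%")
print(f"reviews with update-related terms:  {update_related:.1f}%")
```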

6.4 Bug Reports and Updates (c) Torchie Torch (d) No-frills CPU Most apps are updated very frequently in order to create a better experience for users and fix faults. Therefore, we conjectured that Figure 4: Topics extracted from user reviews. According to the re- users often leave bug-related (i.e., fault-related) and update-related sults of our sentiment analysis method, some apps without test feedback in their written reviews. To take this into account, we suites are fraught with battery-drain issues. decided to count the number of one-star and two-star reviews with keywords related to bug reports. Previous work has shown that one- bug-related keywords we used in our experiment were extracted star and two-star reviews often highlight negative issues [19]. The from Maalej and Nabil [17].

8 Figure 5a compares the control and treatment groups with re- 20 20 spect to the number of one-two star reviews with bug related key- words; given that we examined apps with varying number of re- views, each data point is a proportion (in %) to the total number ● 15 15 of reviews. As shown in Figure 5a, there is no major difference ● between the two groups: the interquartile ranges are quite simi- lar and the treatment group (i.e., apps with tests) has the greatest ● percentage of reviews with bug-related information. 10 ● 10 ● ● Since regression testing can be used to expose faults caused by ● ● updates and/or changes in apps, we checked whether the treatment group has fewer reviews whose main issue brought up by end users 5 5 is update related. To look for reviews related to update issues, we adopted the following terms: updat, upgrad, and chang. Figure 5b % reviews with update−related keywords % reviews

compares the control and treatment groups with respect to the keywords with bug−related % 1−2 star reviews number of reviews with update related keywords; each data point 0 0 With tests Without tests With tests Without tests is a proportion (in %) to the total number of reviews. Considering Group Group the median, the treatment group is slightly better. That is, interest- ingly, apps with test suites have less reviews with update-related (a) (b) terms – though there exists a reasonable intersection between the interquartile ranges. However, as mentioned, update-related prob- Figure 5: Boxplots outlining the (a) percentage of one-star and two- lems are the main source of negative reviews considering apps with star reviews with bug related keywords and the (b) percentage of re- updat upgrad chang test suites: although update-related keywords do not appear often views with update related (i.e., , , and ) keywords. in the reviews for apps with test suites, when they appear they are usually accompanied by keywords that carry negative sentiments. ● ● 15 ● 7 DISCUSSION AND COMPARISON WITH 25 RELATED WORK ● ● Since our analysis was not able to evince any meaningful impact of 20 10 having automated test suites, in this section we lay out some of the ● issues that we believe need to be addressed in future research. ● 15 ● ●

Concerning the amount of tests. In our study the treatment group was composed of apps with automated tests and the adopted test-to-code ratio was 1:10. Since the presence of automated tests in mobile apps is lacking⁸, we initially assumed that it was a reasonable threshold. However, this might not be the case, as observed in Figures 6a and 6b. We relate the test-to-production ratio with bug-related reviews (Figure 6a) and with update-related reviews (Figure 6b). It is worth noting that most of our apps have a ratio below 0.3. Thus, one can argue that this amount of automated tests is not able to bring about the benefits of a sound and systematic practice of software testing.

⁸ This is based on open source apps, though there exists evidence of such observation in industry [12].

Figure 6: Dot plots showing (a) the test-to-code ratio (test LoC / src LoC) versus the percentage of bug-related reviews, and (b) the test-to-code ratio versus the percentage of update-related reviews.

User reviews. The number of reviews that most apps in our sample received provides an indication that users are willing to spend time writing reviews and sharing their experiences. Nevertheless, we found that a considerable number of reviews is not very informative: generally, the written reviews are brief, similar to tweets, and thus do not provide much insight into the pros and cons of apps. As remarked by Pinto and Castor [23], the feedback left by customers/users in other domains (e.g., movies or hotels) tends to be 3-4 times longer, and these customers seem to be willing to elaborate more on their written reviews. Consequently, we found that the user experience reported in reviews often falls short of providing informative feedback concerning bugs (i.e., faults) and similar issues.

Sample size. We considered 30 apps for each group. Although 30 is a common choice for experiments, we might need a slightly larger sample size to confirm our results and answer the RQs we posed. However, as mentioned, we believe that some insights gained from this study can still be generalized to similar settings.

User perception. Given that the apps we analyzed are adopted in very different contexts by a diverse group of people, the answers to the RQs we posed might not be evinced by observing direct feedback from users (i.e., star ratings and reviews). Yet, similar studies (viz., [14]) have shown that there is a correlation between some of the proxies we used to evaluate quality and user ratings.

Energy efficiency is a main concern. The results of our study back up the conclusions made by Pinto and Castor [23]. Our review-based sentiment analysis showed that apps without tests are fraught with battery-drain issues, which leads to poor app reviews. As pointed out by Pinto and Castor, the creation of techniques and tools (e.g., software energy profilers) that allow developers to create, maintain, and evolve energy-efficient software has received little attention. Instead, most research on computing and energy efficiency has been centered around low-level hardware and software issues. Therefore, we also believe that energy consumption has become a ubiquitous problem largely due to the lack of specialized tools and knowledge.

8 CONCLUDING REMARKS
Software testing has proven to be central to quality improvement efforts. Moreover, since user feedback has become crucial for modern software development, in this paper we studied the extent to which software testing contributes to the quality of apps in terms of user satisfaction. To this end, we analyzed a total of 60 mobile apps taking into account two proxy measures of user satisfaction: users' ratings and users' reviews. While there is an increasing number of studies based on analyzing/mining app reviews and quantitatively analyzing the reasons behind app users' dissatisfaction, there is little empirical evidence on the benefits of software testing in terms of end user satisfaction and the success of mobile apps.

We found that there is no significant difference between apps with test suites and apps that have been developed without test suites in terms of their user rating. The key value of this research is that we looked further into user feedback in the form of written user reviews in hopes of characterizing the main problems of apps with and without test suites. The results of our review-based sentiment analysis indicate that the vast majority of the apps with and without test suites score quite high for user satisfaction. We also found that apps with tests have a high incidence of update-related problems, while apps without test suites are prone to deplete a device's battery faster than usual. We believe that our study can be seen as a step towards characterizing the different problems that occur in apps that have automated tests and apps developed without test suites.

There are several interesting avenues for future exploration. First, as we pointed out in our threats to validity section, all apps we analyzed are from the Google Play Store. Thus, as future work, it would be interesting to analyze apps from different app marketplaces and increase the size of our sample. Second, although several studies have taken app store reviews into account to analyze different aspects of app development, app users' feedback is also widely available on social media. We believe that in a future study it would be important to look at how users' feedback on social media compares to users' feedback present in app stores. Third, another important point is that we should look at more sophisticated ways to identify bug-related (i.e., fault-related) and update-related issues in reviews. Finally, we plan to look into how developers have been testing different aspects of mobile apps (e.g., usability and functionality). More specifically, we plan to go over information available in bug trackers, use more sophisticated methods to examine the types of automated tests included in our sample, and carry out surveys with app committers/developers.

REFERENCES
[1] P. V. Bicalho, T. Cunha, F. Mourão, G. L. Pappa, and Wagner M. Jr. 2014. Generating Cohesive Semantic Topics from Latent Factors. In Brazilian Conference on Intelligent Systems (BRACIS). 271–276.
[2] M. Butler. 2011. Android: Changing the Mobile Landscape. IEEE Pervasive Computing 10, 1 (2011), 4–7.
[3] Laura V. Carreño and Kristina Winbladh. 2013. Analysis of User Comments: An Approach for Software Requirements Evolution. In Proceedings of the 35th International Conference on Software Engineering (ICSE). IEEE, 582–591.
[4] N. Chen, J. Lin, S.C. Hoi, X. Xiao, and B. Zhang. 2014. AR-miner: Mining Informative Reviews for Developers from Mobile App Marketplace. In Proceedings of the International Conference on Software Engineering (ICSE). ACM, 767–778.
[5] Z. Chen and B. Liu. 2014. Topic Modeling Using Topics from Many Domains, Lifelong Learning and Big Data. In Proceedings of the International Conference on Machine Learning. II–703–II–711.
[6] G.H. Golub and C. Reinsch. 1970. Singular Value Decomposition and Least Squares Solutions. Numerische Mathematik 14, 5 (1970), 403–420.
[7] G. Grano, A. Ciurumelea, S. Panichella, F. Palomba, and H. C. Gall. 2018. Exploring the Integration of User Feedback in Automated Testing of Android Applications. In IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). 72–83.
[8] E. Guzman and W. Maalej. 2014. How Do Users Like This Feature? A Fine Grained Sentiment Analysis of App Reviews. In IEEE 22nd International Requirements Engineering Conference (RE). 153–162.
[9] C. J. Hutto and E. Gilbert. 2014. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. In International Conference on Weblogs and Social Media (ICWSM). The AAAI Press.
[10] C. Iacob and R. Harrison. 2013. Retrieving and Analyzing Mobile Apps Feature Requests from Online Reviews. In 10th Working Conference on Mining Software Repositories (MSR). 41–44.
[11] S. Ickin, K. Petersen, and J. Gonzalez-Huerta. 2017. Why Do Users Install and Delete Apps? A Survey Study. In Software Business. Springer, 186–191.
[12] M.E. Joorabchi, A. Mesbah, and P. Kruchten. 2013. Real Challenges in Mobile App Development. In The ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, 15–24.
[13] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D.M. German, and D. Damian. 2014. The Promises and Perils of Mining GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR). ACM, 92–101.
[14] H. Khalid, M. Nagappan, and A. E. Hassan. 2016. Examining the Relationship between FindBugs Warnings and App Ratings. IEEE Software 33, 4 (2016), 34–39.
[15] Daniel D. Lee and H. Sebastian Seung. 1999. Learning the Parts of Objects by Non-negative Matrix Factorization. Nature 401, 6755 (1999), 788–791.
[16] W. Luiz, F. Viegas, R. Alencar, F. Mourão, T. Salles, D. Carvalho, M. Gonçalves, and L. Rocha. 2018. A Feature-Oriented Sentiment Rating for Mobile App Reviews. In Proceedings of the 2018 World Wide Web Conference. 1909–1918.
[17] W. Maalej and H. Nabil. 2015. Bug Report, Feature Request, or Simply Praise? On Automatically Classifying App Reviews. In IEEE International Requirements Engineering Conference. 116–125.
[18] W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman. 2017. A Survey of App Store Analysis for Software Engineering. IEEE Transactions on Software Engineering 43, 9 (2017), 817–847.
[19] S. McIlroy, N. Ali, H. Khalid, and A. E. Hassan. 2016. Analyzing and Automatically Labelling the Types of User Issues that are Raised in Mobile App Reviews. Empirical Software Engineering 21, 3 (2016), 1067–1106.
[20] K. Moran, M. L. Vásquez, and D. Poshyvanyk. 2017. Automated GUI Testing of Android Apps: From Research to Practice. In IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C). 505–506.
[21] M. Nayebi, H. Cho, and G. Ruhe. 2018. App Store Mining is not Enough for App Improvement. Empirical Software Engineering (2018).
[22] D. Pagano and W. Maalej. 2013. User Feedback in the Appstore: An Empirical Study. In IEEE International Requirements Engineering Conference. 125–134.
[23] G. Pinto and F. Castor. 2017. Energy Efficiency: A New Concern for Application Software Developers. Communications of the ACM 60, 12 (2017), 68–75.
[24] P. Rodrigues, S. Silva, G. Barbosa, F. Coutinho, and F. Mourao. 2017. Beyond the Stars: Towards a Novel Approach to Evaluate Applications in Web Stores of Mobile Apps. In Proceedings of the 2017 World Wide Web Conference. 109–117.
[25] D.B. Silva, M.M. Eler, V.H.S. Durelli, and A.T. Endo. 2018. Characterizing Mobile Apps from a Source and Test Code Viewpoint. Information and Software Technology 101 (2018), 32–50.
[26] S.W. VanderStoep and D.D. Johnson. 2008. Research Methods for Everyday Life: Blending Qualitative and Quantitative Approaches. Jossey-Bass. 352 pages.
[27] R. R. Wilcox. 2016. Introduction to Robust Estimation and Hypothesis Testing (Statistical Modeling and Decision Science) (4th ed.). Academic Press. 810 pages.
[28] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén. 2012. Experimentation in Software Engineering. Springer. 236 pages.
[29] Samer Zein, Norsaremah Salleh, and John Grundy. 2016. A Systematic Mapping Study of Mobile Application Testing Techniques. Journal of Systems and Software 117 (2016), 334–356.