The Pennsylvania State University the Graduate School College of Engineering APPGRADER: an APP QUALITY GRADING SYSTEM BASED on C

The Pennsylvania State University
The Graduate School
College of Engineering

APPGRADER: AN APP QUALITY GRADING SYSTEM BASED ON CODE-LEVEL FEATURES

A Thesis in
Computer Science and Engineering
by
Xi Li

© 2018 Xi Li

Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Science

August 2018

The thesis of Xi Li was reviewed and approved* by the following:

Sencun Zhu
Associate Professor of Computer Science and Engineering
Thesis Advisor

Danfeng Zhang
Assistant Professor of Computer Science and Engineering

Chita R. Das
Distinguished Professor of Computer Science and Engineering
Department Head of Computer Science and Engineering

*Signatures are on file in the Graduate School.

ABSTRACT

The app ranking systems currently used by app markets are based mainly on app ratings and downloads. However, these systems handle two cases poorly: (i) apps with abnormally high ratings and fake downloads; (ii) newly published apps with limited user feedback. Rankings of these apps may not match their actual quality, which misleads users. Therefore, in an attempt to explore app ranking systems and address their understudied status, we propose AppGrader, a novel app quality grading system that ranks apps within the same category based on app functionality measured by code-level features. The system is inspired by an analysis of 18 million app reviews, which suggests that, when giving ratings, most users may consider the user interface and other features that can be extracted directly from app code. Our system therefore statically analyzes app code and generates a "feature view graph" for each app, which encodes the app's code-level features. For app ranking, we apply a Graph Convolutional Network to cluster apps into different classes based on the complexity of their corresponding feature view graphs, where each class indicates one level of app quality.
According to the system evaluation from two perspectives, system accuracy and label dissimilarity, AppGrader performs well on 1440 real-world apps, with an average accuracy of around 72% and a label dissimilarity of around 1, which indicates that AppGrader could be applied to evaluate apps with fake ratings and newly published apps.

Keywords: Mobile App Ranking, App Quality Evaluation, App View, Graph Convolutional Network, Deep Learning

TABLE OF CONTENTS

List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Background
Chapter 4 Methodology
    4.1 System Overview
    4.2 App Code-Level Features
        4.2.1 User Experience of an App
        4.2.2 App Review Analysis
    4.3 System Architecture
        4.3.1 Code De-compiler
        4.3.2 Feature View Graph Generator
            4.3.2.1 Generate View Graph
            4.3.2.2 Generate Feature Vector
        4.3.3 Graph Convolutional Network
Chapter 5 Implement and Evaluation
    5.1 Data Collection and Processing
        5.1.1 Data Collection
        5.1.2 Label Assigning
        5.1.3 GCN Inputs
    5.2 Experiment Results and Analysis
        5.2.1 Efficiency
        5.2.2 Effectiveness
            5.2.2.1 System Accuracy
            5.2.2.2 Label Dissimilarity
            5.2.2.3 Synthesized Result
            5.2.2.4 Case Study
Chapter 6 Discussion
    6.1 Limitation
    6.2 Future Work
Chapter 7 Conclusion
Bibliography
Appendix A App Review Analysis
Appendix B Detailed Feature Vector
Appendix C Testing Result of AppGrader

LIST OF FIGURES

4.1 System architecture
4.2 Interaction between a user and an app
4.3 Distribution of top 500 high frequency terms
4.4 Distribution of 50 topics among 18 million app reviews
4.5 Proportion of topics indicating app features in total 50 topics
4.6 Graph convolutional network for node classification
4.7 Graph convolutional network for app classification
5.1 Comparison result in terms of system accuracy under three groups of ground truth labels
5.2 Comparison of propagation models of GCN (Thomas et al. 2017)
5.3 Comparison result in terms of label dissimilarity under three groups of ground truth labels
5.4 Comparison of test accuracy and label dissimilarity of the first test (weights: 33%, 33%, 33%)

LIST OF TABLES

4.1 Terms indicating app features in top 500 high-frequency terms
4.2 Topics indicating app features
4.3 Intent construction methods
4.4 Android APIs for activity switching
4.5 Android methods that listen user inputs
5.1 Data overview
5.2 Summary of results in terms of app classification accuracy
5.3 Summary of results in terms of average label dissimilarity for false predictions
5.4 Mis-estimated camera apps
A.1 High-frequency terms counting result (top 500)
B.1 Detailed feature vector
C.1 Testing result of AppGrader on 123 apps in the first test (weights: 33%, 33%, 33%)

CHAPTER 1
INTRODUCTION

Recently, with the rapid development of smartphones and tablets, mobile application markets such as Google Play for Android and the App Store for iOS have been growing fast. Take Google Play as an example: over 1 million apps were released in 2016¹, and the number of downloads exceeded 65 billion in the same year². By 2017, there were more than 3.5 million available apps³, and the number is still increasing. Although downloading an app from a market takes only a few clicks, the huge number of apps overwhelms users: even within a single category, there are 200 thousand apps on average to choose from. Therefore, an app quality evaluation system that ranks apps by quality is indispensable for freeing users from sifting through countless apps.

¹ https://www.statista.com/statistics/742370/annual-new-apps-google-play/
² https://www.statista.com/statistics/281106/number-of-android-app-downloads-from-google-play/
³ https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/

The current ranking strategies applied by app markets are mainly based on app ratings and downloads. Apart from the app ranking, users may also refer to the reviews written by people who have already tried the app. Nevertheless, these ranking strategies have drawbacks. Some app companies or developers may manipulate the ratings and reviews of their apps to get more downloads and, in turn, earn higher revenues. It is not uncommon to find deceptively positive reviews and abnormally high ratings. Besides, newly published apps have limited user feedback. As a result, the rankings of these apps deviate from their real quality, which may mislead users. The most reliable way to judge an app is to try it yourself, but that is inconvenient and time-consuming. In the sea of mobile apps, it is hard for people to choose an appropriate one.

Moreover, app developers are caught in a predicament created by ill-conceived ranking systems. For example, the app rankings offered by app markets merely inform developers of how popular apps are, not of the reasons behind their popularity, which ultimately impedes the emergence of excellent apps.

However, there is little research on app ranking systems. We only find ranking systems that offer app store optimization rather than optimizing the app itself, such as RankMyApp [1] and Appfigures [2]. These systems conduct sentiment analysis on app reviews to mine keywords and popular topics, and then perform app store optimization based on the mining results. That is, these systems analyze your app, your competitors' apps, and the app market, and give opinions on the app name, app description, app icon, and screenshots to improve your app's visibility in a certain category.
However, they only modify the external parts of an app rather than evaluating the app itself, and therefore cannot help users find the high-quality apps they expect. These systems are superficial and do not solve the primary problems.

Compared with the relatively rare app ranking systems, there are plenty of app recommendation systems. Nevertheless, these app recommendation systems do not solve the problems mentioned above either. Some systems take user privacy preference into consideration. For instance, Liu et al. propose a system that recommends apps that strike a trade-off between app functionality and user privacy preference [3]. But the fact is that most users lack safety awareness. When receiving a permission request from an app to access sensitive data, most users click "allow" without much thought for their privacy. Thus, this type of system cannot learn real user privacy preferences and recommend appropriate apps to them. Other systems are personalized recommendation systems that analyze a user's usage history to recommend apps the user may like, such as [4, 5, 6, 7]. If a user is downloading an app under a specific category, these recommendations may miss the mark because of limited training samples. In this case, the user has to resort to app ratings and reviews again, which, as we mentioned before, are not trustworthy.

Given these observations, it is necessary to build an innovative app ranking system that is efficient and effective and, most importantly, solves the problems we mentioned. This system must give a relatively impartial evaluation of app quality and rank apps by their level of quality. The most challenging part is choosing the criterion for evaluating app quality. Since external ratings and reviews are not applicable, we decide to focus on the app itself: its user interface and app structure. The inspiration comes from our observation of the way in which people rate an app.

