SENSITIVE TEXT ICON CLASSIFICATION

FOR ANDROID APPS

by

ZHIHAO CAO

Submitted in partial fulfillment of the requirements for the degree of

Master of Science

Thesis Advisor: Dr. Xusheng Xiao

Department of Electrical Engineering and Computer Science

CASE WESTERN RESERVE UNIVERSITY

January, 2018

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Zhihao Cao

candidate for the degree of Master of Science.

Committee Chair

Xusheng Xiao

Committee Member

Andy Podgurski

Committee Member

Ming-Chun Huang

Date of Defense

Nov. 30 2017

*We also certify that written approval has been obtained for any proprietary material contained therein.

Contents

List of Tables 3

List of Figures 4

List of Abbreviations 7

Abstract 8

1 Introduction 9

2 Background 16

2.1 Permission System in Android…………………………………………………..16

2.2 Sensitive UI Detection in Android……………………………………………….17

2.3 Pixel and Color Model…………………………………………………………...19

2.4 Optical Character Recognition…………………………………………………...21

3 Design of DroidIcon 23

3.1 Overview…………………………………………………………..……………..23

3.2 Image Mutation…………………………………………………………………..24

3.2.1 Image Scaling………………………………………………………24

3.2.2 Color Inversion…………………………………………………………….31

3.2.3 Opacity Conversion………………………………………………..33

3.2.4 Grayscale Conversion………..…………………………………………….37

3.2.5 Contrast Adjustment……………………………………………………….42

3.3 Text Icon Classification 48

3.3.1 Text Cleaning……………………...……………………………………….48

3.3.2 Keyword Dataset Construction…………………………………………….49

3.3.3 Classification Algorithm…………………………………………………...50

3.4 DroidIcon………..……………………………………………………………….58

4 Evaluations 60

4.1 Icon Dataset Construction………………………………………………………..60

4.2 Effectiveness of DroidIcon…..…………………………………………………..61

4.3 Case Study……………………………………………………………………….73

5 Related Work 76

6 Discussion and Conclusion 78

Bibliography 80


List of Tables

3.1 The pseudo code for image scaling……...……………………………….30

3.2 The pseudo code for color inversion……………………………………..32

3.3 The pseudo code for opacity conversion…………...……………………37

3.4 The pseudo code for grayscale conversion….…...………………………40

3.5 The pseudo code for contrast adjustment………………………………...46

3.6 Keyword Set……………………………………………………………...49

3.7 The pseudo code for text icon classification……………………………..57

3.8 The pseudo code for DroidIcon …………………………………………58


List of Figures

1.1 Motivation example of DroidIcon……………………..………………...13

2.1 Screenshots of Android Permission Requests………….………………...17

2.2 Example sensitive text label……………………………………………...18

2.3 Example of pixels in an image…………………………………………...19

2.4 RGB color model mapped to a cube……………………………………..20

3.1 Overview of DroidIcon……….………………………………………….23

3.2 Normalized sinc function………………………………………………...26

3.3 Lanczos window for a = 1, 2, 3…………………………………………..26

3.4 (a) Lanczos kernel for a = 2………………………………………………….28

3.4 (b) Lanczos kernel for a = 3………………………………………………….28

3.5 (a) Before scaling……...……………………………………………………..30

3.5 (b) After scaling……...………………………………………………………30

3.6 An example icon with bright characters and deep background………….31

3.7 The icon in Figure 3.6 after color inversion……..………………………33

3.8 User Interface containing ghost buttons………………………………....34

3.9 (a) Example ghost button……………………………………...……………..35

3.9 (b) Example ghost button without transparent ghost background...…………35

3.10 Converted icon with opacity mapped to RGB color……...……………...37

3.11(a) Image of colored bars…………………………………………………....39

3.11(b) Converted bars after using Intensity…………………………………..…39

3.11(c) Converted bars after using Luminance………………………………..…39

3.12(a) Example color icon that OCR fails to process……..…………………….40

3.12(b) The grayscale image for Figure 3.12 (a)……………………………...…41

3.12(c) Example icon after grayscale conversion and color inversion..…………41

3.13(a) Example of image with very low contrast……………………………….43

3.13(b) Example of image with very high contrast………………………………44

3.14(a) Example icon before contrast adjustment………………………….…….47

3.14(b) Example icon after contrast adjustment………...……………………….47

3.14(c) Example icon after contrast adjustment and color inversion….…………47

3.15 Example Email icon with extracted text “L\_/j Email”...………………..48

3.16 Demonstration of Levenshtein distance………………………………….52

4.1 Number of words in text icons………………………..………………….61

4.2 Recall of OCR…………………………………………………………...62

4.3 Recall of OCR + Classification ………………………………………...62

4.4 Comparison of the recalls of OCR and OCR + Classification……...….62

4.5 Recall of OCR + Classification + Image Scaling…………………...... 64

4.6 Recall of OCR + Classification + Color Inversion………………...….64

4.7 Recall of OCR + Classification + Opacity Conversion……………….64

4.8 Recall of OCR + Classification + Grayscale Conversion….………….65

4.9 Recall of OCR + Classification + Contrast Adjustment...... 65

4.10 Comparison of recalls among all the image mutation techniques……….66

4.11 Recall of DroidIcon…………………………...………………………...67

4.12 Comparison of recalls between OCR and DroidIcon……………….…..67

4.13 Recall, precision, accuracy, and F1-score achieved by DroidIcon……...69

4.14 Icons with unusual or decorative fonts…………………………………..69

4.15 Icons with unsuitable character and image size….………………………69

4.16 Scaled Icon from Figure 4.15……….……………………………………70

4.17 Email icon with image scaling and contrast adjustment.……………..….70

4.18 Icon with similar colors in text and background………………………....71

4.19 Icon in Figure 4.18 after contrast adjustment……………………………71

4.20 Comparison of DroidIcon’s performance using different similarity

thresholds……………………………………………………………...... 72

4.21 Influence of similarity threshold on effectiveness……………………….72

4.22 Case study: Text Icons for Location, Message, Email, Contracts and

Call……………………………………………………..………………...74

4.23 Case study: Text Icons for Messaging and Email

……………………………………………………………………………75


List of Abbreviations

UI User Interface

OCR Optical Character Recognition


Sensitive Text Icon Classification for Android Apps

Abstract

by

ZHIHAO CAO

As smartphones play an increasingly important role in people's daily lives, users' privacy and security have become a serious concern. Previous research efforts in improving mobile app security mainly focused on the predefined sources of sensitive information managed by smartphone platforms. To the best of our knowledge, text icons, a type of user interface element that may indicate uses of the users' sensitive information, have been largely neglected. In this thesis, we propose an approach to automatically identify text icons in the UIs of smartphone apps and classify them into predefined categories of sensitive information. In particular, we develop an algorithm, DroidIcon, based on OCR

(Optical Character Recognition) to determine whether the texts contained in text icons indicate uses of sensitive information. To evaluate the effectiveness of DroidIcon, we apply the algorithm to 707 text icons collected from 2000 popular Android apps. The algorithm achieves an accuracy of 90.52%, a precision of 91.28% and a recall of 88.25% for classifying text icons into pre-defined categories of sensitive information.


Chapter 1

Introduction

With the rapid development of mobile phones, smartphones have become more and more popular and are playing an important role in people's daily lives. Today, millions of mobile applications (i.e., apps) are available in app stores. These apps enable smartphones to address various kinds of user needs. In order to provide better services, apps use more and more of the users' sensitive information to customize their functionality. However, certain apps may have behaviors that are less than desirable or even harmful. For example, some apps obtain users' personal data such as GPS coordinates, contact lists, and e-mail addresses without consent from the users, and advertisers exploit such data as a marketing channel to bundle pushy ads with apps [27].

To protect users’ sensitive information in smartphones, a lot of research efforts have been spent in constraining the uses of private user data through a data-access control mechanism. That is, in order to access users’ sensitive information, apps need to request the corresponding permissions from the users. For example, to access the users’ contact list, the apps need to request the READ_CONTACTS permission. However, this kind of protection mechanism has shown limited success [28], since many apps have legitimate reasons to request users’ permissions in using their private data and it is difficult to distinguish such legitimate behaviors from the undersized behaviors. For example, apps

recommending restaurants use users' GPS data to suggest restaurants near the users, and apps providing travel planning services let users make phone calls or send messages.

To detect undesired behavior in mobile apps, we are motivated by the vision: can analysis of an app's program behavior be contrasted with the intents of the app to determine whether the app will perform within the user's expectation? In other words, we aim to automatically check the compatibility between the intents expressed by an app and its behind-the-scenes behaviors. For example, if an app's user interface (UI) has no texts or images to indicate that it will access users' GPS data (i.e., no intents for GPS data), but the app sends out users' GPS data when a button is pressed, then red flags should be raised.

Other useful scenarios include reading users’ contacts, sending SMS messages, and taking pictures.

Apps’ UIs contain various types of semantic information that express the intents of the apps. For example, a button with the text “Location” in the UI indicates that the app will access user’s sensitive location data once the user clicks the button. Therefore, understanding these types of semantic information provides us an important mechanism for automatically detecting apps’ intentions in using user’s sensitive data, which is the first step towards automatically checking the compatibility between apps’ behaviors and their intentions.

Existing research works [1][2] focus on detecting sensitive information via analyzing the texts in UIs, such as text labels and input fields. However, another important type of UI element, the icon, which also contains rich semantic information, has not been explored yet. Icons are an important component of UIs and have been widely used in mobile

apps. As mentioned in [26], designers replace text labels with icons because icons make the UI more stylish, save screen space, and are fast to recognize at a glance.

Among the icons used in apps' UIs, text icons, which refer to icons embedded with texts, are widely used to show the apps' intentions in using users' private information.

Unfortunately, existing works [1][2][29][30] focus on analyzing the textual artifacts of

Android apps, and have limited capability in analyzing the texts in icons to understand their semantic information. The reason is that these texts are represented using pixels in digital images, rather than texts that can be extracted directly from UI layout files [1][2]. Although these works may analyze the file names of icons to infer semantic information based on the keywords in the file names, many apps adopt file names such as “icon1.png” or “1.png”, which do not provide much semantic information and render these works ineffective.

To address the important problem of understanding the semantic information of icons in UIs, this thesis proposes an approach, Sensitive Text Icon Classification

(DroidIcon), that classifies text icons into one of the pre-defined semantic categories. More specifically, DroidIcon adapts Optical Character Recognition (OCR) techniques to extract characters from the icons, computes the similarity between the words formed by the extracted characters and the keywords in each semantic category, and classifies the icons to the semantic category based on the highest similarity.

In particular, since icons in Android apps are usually small, diversified, and partially or totally transparent, OCR techniques face challenges in recognizing the characters with high precision. In fact, directly applying existing OCR techniques can only infer semantic information from less than 10% of the studied icons. To address this challenge, we propose DroidIcon that explores the possibility of applying various image

mutation techniques to convert the icons into OCR-friendly images. Our algorithm significantly improves the precision of character recognition, and thus the overall effectiveness of our approach as well.

To determine whether the texts in text icons indicate the uses of sensitive information, we define 9 categories of semantic information based on the frequently used sensitive information in mobile apps. Among the 9 categories, 7 of them indicate different types of sensitive information: Camera, Contacts, Location, Email, Phone, Photo, and SMS, while the remaining two do not: Non-sensitive text and Non-text. The non-sensitive text category means an icon contains text but the text does not indicate the uses of users' sensitive information, and the non-text category refers to the icons with no embedded text. Based on the 9 predefined categories, given a text icon, our work aims to determine which category it belongs to. The semantic information provided by these 9 categories can be used by various types of privacy analysis, such as checking whether the semantic information represented by a category is compatible with the permissions requested by apps.

Based on our empirical study of icons from 2000 apps downloaded from Google

Play, most of the icons contain fewer than 3 words (Section 4.1). Thus, to classify a text icon into a semantic category, we adopt the keyword-based approach that compares the words formed by the extracted characters from the icon to the keywords used in each of the 7 sensitive semantic categories. If a match is found, then the icon is classified into the corresponding semantic category; otherwise, the icon is classified into the Non-sensitive text category. Icons without texts are classified into the Non-text category.

However, even though the precision of character recognition can be improved via iterative image mutations, it is still very difficult, if not impossible, to perfectly recognize

every character from text icons, since texts could be presented using custom fonts and styles. Thus, in many cases, OCR may extract part of the embedded text from the text icons, and obtain a set of incomplete words. To address this challenge, we propose an edit distance based algorithm to find the most similar keyword via computing the similarity between the extracted words and the keywords in each semantic category. If the similarity between an extracted word and a keyword is higher than a threshold, we consider it a match and the icon is classified into the corresponding semantic category.

Figure 1.1 Motivation example of DroidIcon

To better illustrate the motivation of DroidIcon, we show a real Android app, named

MyCityWay, whose UI is shown in Figure 1.1. This app provides information about local places to users. As we can see, there exist five icons (marked in red) in this UI. Among these icons, three of them contain texts that indicate the uses of sensitive information: "Call", "Direction", and "Map". The "Call" icon indicates it will access the user's phone call information. The "Direction" and "Map" icons indicate they will use the user's location information. The developer will get access to the sensitive data when a user clicks

the icons. This may lead to a potential risk of exploiting the user's sensitive information if the app abuses the users' phone call information or accesses other types of sensitive information contrary to what the users expect. Therefore, if we can detect the apps' intentions in using the user's sensitive information and classify them into the correct sensitive category, we can apply appropriate behavioral analysis to check whether the corresponding behaviors of the program are within the user's expectation. As shown in Figure 1.1, given an icon, our algorithm classifies it to one of 9 predefined semantic categories. The "Call" icon should be classified to the Phone category. The "Direction" and "Map" icons should be classified to the Location category. The "Home" icon should be classified to the Non-sensitive

Text category. The remaining icon should be classified to the Non-text category. The classification result will be used for further behavioral analysis.

To evaluate the effectiveness of DroidIcon, we apply DroidIcon to 707 text icons extracted from 2000 apps downloaded from Google Play. Among the 707 text icons, 332 positive icons contain sensitive texts and the other 375 negative icons either contain non-sensitive texts or do not embed any text. We compare DroidIcon with OCR in recognizing texts and classifying text icons, and the results show that DroidIcon correctly classifies

90.1% of the 332 positive icons, while the approach based on OCR alone correctly classifies less than 10% of them. We also measure the effectiveness of the different image mutation techniques adopted in DroidIcon, and show the improvement brought by each technique. Based on the results, we show that DroidIcon, which iteratively applies different image mutation techniques, achieves the best results.

The rest of the thesis is organized as follows. In chapter 2, we present related work about sensitive UI detection and basic background knowledge of our work. In chapter 3,

we present an overview of our algorithm and introduce each component of the algorithm in detail. In chapter 4, we conduct experiments to evaluate the effectiveness of our algorithm and present a case study. In chapter 5, we discuss related work, and in chapter 6, we conclude our work.


Chapter 2

Background

In this chapter, we first introduce the background of the Android permission system and related works on sensitive UI detection in Android. Then we provide background knowledge about pixels, color models, and Optical Character Recognition (OCR) for later use.

2.1 Permission System in Android

Android has become a very popular platform for third-party applications because of its unrestricted application market and open-source nature. It supports third-party development with an extensive API that includes access to phone hardware, settings, and user data [31].

Access to privacy- and security-relevant parts of Android's rich API is controlled by an install-time application permission system. This means each application must declare what permissions it needs and notify users during installation (Figure 2.1(a)). All applications can only access their own files by default. Therefore, in order to access system resources such as text messages, the list of contacts, and private images, third-party apps have to obtain the corresponding permissions from the user. For example, to access the list of contacts, the permission READ_CONTACTS must be requested and granted. A recent improvement of

Android’s permission system supports runtime permission requests (Figure 2.1(b)), which

pops up a dialog to request a permission the first time an app uses the user's protected information.


Figure 2.1 Screenshots of Android Permission Requests
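To make the permission model above concrete, the following sketch (in Java, using a hypothetical activity name and request code) shows how an app can check for the READ_CONTACTS permission and trigger the runtime dialog of Figure 2.1(b); the permission must also be declared in the app's AndroidManifest.xml.

import android.Manifest;
import android.content.pm.PackageManager;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

public class ContactsActivity extends AppCompatActivity {

    // Hypothetical request code used to match the result callback.
    private static final int REQUEST_READ_CONTACTS = 42;

    private void loadContactsIfPermitted() {
        // AndroidManifest.xml must also declare:
        // <uses-permission android:name="android.permission.READ_CONTACTS" />
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.READ_CONTACTS)
                == PackageManager.PERMISSION_GRANTED) {
            readContacts();  // permission already granted
        } else {
            // Pops up the runtime permission dialog shown in Figure 2.1(b).
            ActivityCompat.requestPermissions(
                    this,
                    new String[]{Manifest.permission.READ_CONTACTS},
                    REQUEST_READ_CONTACTS);
        }
    }

    private void readContacts() {
        // Placeholder for the actual query against the contacts provider.
    }
}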

However, it is difficult for users to make decisions about whether to grant permissions to an app or not. The reason is that it is very difficult to distinguish the legitimate behaviors of benign apps from the undesired behaviors of malicious apps, since many benign apps request the same permissions as malicious apps. Therefore, if we can understand the intents expressed by an app and check whether the corresponding behaviors behind the screen are compatible with those intents, we can leverage such mismatches to detect the undesired behaviors.

2.2 Sensitive UI Detection in Android

In Android apps, UIs communicate the intents of the app through texts and images, and thus contain lots of semantic information that may indicate the uses of the user's sensitive information. There already exist some research efforts that focus on detecting the use of

17 user’s sensitive information using the semantic information in the UI. UIPicker [1] detects sensitive user information through applying supervised learning on the semantic information extracted from the program code of UI elements. Besides the features extracted from the texts and layout descriptions, UIPicker also considers the texts of the sibling elements in the layout file which could include unrelated texts as features.

SUPOR [2] leverages the semantic information from the text labels that are physically close to input fields on the screen. Generally, text labels are used in UIs as descriptions to guide users to enter the input. Therefore, understanding the semantics of text labels could help us determine whether the corresponding input fields need to access the user's sensitive information. Then we can analyze the corresponding program behind the screen to check whether the app behaves as expected or maliciously.

Figure 2.2 Example sensitive text label [2]

Figure 2.2 shows an example UI that requires users to input their sensitive information: User ID and Password. There are two text labels, "User ID" and "Password", in the UI that guide the user to enter the information in the input fields. SUPOR first parses the layout files and finds the text labels together with input fields. Then it compares the texts in the text labels against a predefined keyword dataset to determine the sensitiveness of the input fields. Finally, the result is sent to the privacy analysis part for behavioral analysis.

Both UIPicker and SUPOR have studied the possibility of detecting uses of the user's sensitive information from semantic information in texts. Images, especially icons, another important type of element in UIs, have been largely neglected by these approaches. Thus, UIs with sensitive text icons but without sensitive texts may be classified by these approaches as non-sensitive, causing lots of false negatives in privacy analysis. To address this important problem, we propose DroidIcon to detect sensitive text icons in the UIs of

Android apps.

2.3 Pixel and Color Model

In order to sense, represent, and display images in electronic systems, researchers proposed pixels, which are the smallest elements in a digital image. A digital image is a rectangular grid of pixels with fixed rows and columns. Figure 2.3 is an example of pixels

[4]. It shows an image with a portion enlarged, in which the individual pixels are rendered as small squares and can easily be seen.

Figure 2.3 Example of pixels in an image [4]

In an image, a pixel represents a single color dot. All the pixels arranged in a rectangular grid form a colorful image. To represent the full range of colors, researchers propose the color model, an abstract mathematical model that represents colors as tuples

of numeric values. The most commonly used color model is RGB, which is widely used in various digital image formats such as JPEG, PNG, etc. RGB is an additive color model, which means a color is created by mixing a number of different primary colors. RGB refers to the three primary colors "Red", "Green", and "Blue". We can create millions of colors based on these three colors. In this thesis, all the icon images are in either

JPEG or PNG format. Therefore, all of them adopt the RGB model.

A color in the RGB color model is expressed as an RGB triplet (r, g, b), where "r",

“g”, and “b” are the numeric values that describe how much red, green, and blue are included in the color, respectively. Each value for a primary color can vary from zero to a defined maximum value. If all the values are zero, the resulting color is black; if all the values are maximum, the resulting color is the brightest color, i.e., white. Therefore, the geometric representation of RGB color model is a cube, where each color is a point within the cube, on its face, or along its edges.

Figure 2.4 RGB color model mapped to a cube [5]

Figure 2.4 shows a cube that the RGB color model is mapped to. The horizontal x-axis represents the values for the red color, the y-axis represents the blue color, and the z-

axis represents the green color. The origin, representing the black color, is the vertex hidden from view.

The value of a primary color could be quantified in different ways. In computers, each primary color is often represented as an integer ranging from 0 to 255, so that it can be stored in a single 8-bit byte. For example, the RGB triplet value of black is (0, 0, 0), red is (255, 0, 0), and white is (255, 255, 255). In this thesis, we utilize this kind of representation to help us manipulate the colors of icons.

RGBA is a color space based on RGB that provides an extra alpha channel. The alpha channel is normally used to represent the degree of opacity of the color. The value can also be represented using an integer between 0 and 255. If a pixel has 0 in its alpha channel, it is fully transparent (invisible); if it has 255, the pixel has a fully opaque color, which is the same as traditional RGB.
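For illustration, the following Java sketch (using java.awt.image.BufferedImage, which we also rely on when describing the image mutation algorithms in Chapter 3) shows how the four RGBA channels can be unpacked from and repacked into the 32-bit ARGB integer that represents a pixel; the helper names are ours.

import java.awt.image.BufferedImage;

public final class PixelChannels {

    // Reads the four 8-bit channels of the pixel at (x, y) from the packed ARGB integer.
    static int[] readArgb(BufferedImage image, int x, int y) {
        int argb = image.getRGB(x, y);
        int alpha = (argb >> 24) & 0xFF;  // 0 = fully transparent, 255 = fully opaque
        int red   = (argb >> 16) & 0xFF;
        int green = (argb >> 8) & 0xFF;
        int blue  = argb & 0xFF;
        return new int[]{alpha, red, green, blue};
    }

    // Packs four 8-bit channel values back into a single ARGB integer.
    static int packArgb(int alpha, int red, int green, int blue) {
        return (alpha << 24) | (red << 16) | (green << 8) | blue;
    }
}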

In our studies, we find that many PNG images describe their contents via color opacity instead of using different colors. Thus, we need to transform the opacity differences to color differences to make the image content distinguishable by OCR.

2.4 Optical Character Recognition

Optical Character Recognition (OCR) is an important research field of computer vision. The task of OCR is to identify characters in images of printed or handwritten texts and convert them into machine-encoded text (e.g., ASCII), so that they can be recognized and edited by computer programs.

Currently, many types of OCR libraries are publicly available [7], such as

OCR, FreeOCR, Asprise OCR, etc. We build DroidIcon upon Asprise OCR, which provides high performance open source APIs for common OCR tasks. It has a very high

detection accuracy as mentioned in [8]: "By running a sample of 200 image e-mails, we determined that Asprise OCR was performing with an accuracy of 95%. It had the best detection rate among the approaches we analyzed". It supports various kinds of image formats such as JPEG, PNG, etc. It also provides different SDKs to support multiple programming languages and can recognize texts in more than 20 languages such as

English, Spanish, French, etc. In this thesis, we use its Java SDK and focus only on English texts.

Although Asprise OCR is introduced as a high-performance OCR engine, it does not perform well on Android icons for several reasons. First of all, due to the size limitation of smartphone screens, the icon size is usually small. For example, the smallest size is 48 x 48 in our collected icon dataset. This leads to low resolutions of the texts, which in turn affects the OCR accuracy. Second, the OCR engine works much better on icons with dark-colored characters on a bright background than on icons with bright-colored characters on a dark background. However, due to the diversified icon styles in Android apps, there are many icons that have bright-colored characters on a dark background, posing challenges for OCR.

Third, in order to provide a better user experience, there exist many icons with low contrast and ghost buttons (icons designed via opacity differences) in UIs. It is also difficult for the OCR engine to correctly recognize the texts embedded in these icons. Therefore, we propose to use different image mutations to convert all these kinds of images to OCR-friendly icons.


Chapter 3

Design of DroidIcon

In this chapter, we first present an overview of DroidIcon and then explain each component of DroidIcon in detail.

3.1 Overview

Figure 3.1 Overview of DroidIcon

Figure 3.1 is the overview of DroidIcon. It consists of three major components:

Image Mutation, Optical Character Recognition, and Text Icon Classification. DroidIcon takes an APK icon as input and outputs the classification of the icon.

The image mutation component accepts an APK icon image and applies different mutations to the icon iteratively. The mutated icons are used as the input to the optical character recognition component. The character recognition component detects and extracts the texts embedded in the icon. The extracted texts are sent to the text icon classification component. The text icon classification component determines the semantic category of the icon by checking the extracted texts against the keywords in each semantic category. We predefine 9 semantic categories, where 7 of them indicate the uses of sensitive information and the remaining two categories (Non-sensitive text and Non-text) indicate that there are no sensitive texts in the icon.

3.2 Image Mutation

To address the challenges faced by OCR, we leverage different image mutation techniques to convert icons to OCR-friendly images. As shown in Figure 3.1, we have five techniques: Image Scaling, Color Inversion, Opacity Conversion, Grayscale

Conversion, and Contrast Adjustment.

3.2.1 Image Scaling

Resolution (pixels per inch) [16] is an important factor to control the image quality, and thus directly affects the accuracy of OCR. Lower resolutions typically produce images where pixels of a character are condensed in a small region, compromising the accuracy of character recognition. On the other hand, higher resolutions produce larger images where pixels of a character are spread to different areas, also affecting the accuracy of character recognition. Therefore, it is important to scale the image size so that it is neither too big nor too small. We crawl icons from apps downloaded from Google Play, and these icons have variable resolutions. In our dataset, small icons have the size of only 48x48 and large

icons have the size of 300x300. Based on our empirical observations in applying OCR on these icons, the OCR engine performs better when the image size is around 100x100.

Thus, we adopt 100 pixels as the standard size for image scaling.

Enlarging or shrinking images can be interpreted as a form of resampling or image reconstruction. Currently, many image scaling algorithms have been proposed.

Theoretically, sinc cardinal resampling [10] provides the best performance. However, the assumptions behind sinc resampling are not completely met in real-world digital images.

Therefore, we implement Lanczos resampling, an approximation to the sinc cardinal method, as our image scaling algorithm, as it yields better results in practice than sinc cardinal resampling.

Lanczos resampling is an interpolation algorithm. Interpolation is a method of constructing new data points within the range of a discrete set of known data points. Given a set of input samples, the effect of each input sample on the interpolated values is defined by the reconstruction kernel L(x), called the Lanczos kernel. The kernel is composed of two parts: the normalized sinc function sinc(x), windowed (multiplied) by the Lanczos window sinc(x/a) for −a ≤ x ≤ a. It is defined as [11]:

L(x) = \operatorname{sinc}(x) \cdot \operatorname{sinc}(x/a) \text{ for } -a \le x \le a, \text{ and } 0 \text{ otherwise}    (3.1)

where a is the size of the kernel. The normalized sinc function is defined as:

\operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}    (3.2)

and the window function is defined as:

\operatorname{sinc}(x/a) = \frac{a \sin(\pi x / a)}{\pi x}    (3.3)

Figure 3.2 is the plot of the normalized sinc function; we can see that the function is symmetric about x = 0. A lobe has a smaller peak absolute value when it is farther from the y-axis. The plot tells us that the farther a sample is, the lower its effect is.

Figure 3.2 Normalized sinc function

Figure 3.3 is the plot of the Lanczos window for a = 1, 2, 3. When x is outside the range [−a, a], the L(x) values are set to zero, limiting the support of the Lanczos kernel to [−a, a].

Figure 3.3 Lanczos window for a = 1, 2, 3

Based on (3.2) and (3.3), (3.1) can be written as [11]:

L(x) = \begin{cases} 1 & \text{if } x = 0 \\ \dfrac{a \sin(\pi x) \sin(\pi x / a)}{\pi^2 x^2} & \text{if } 0 < |x| \le a \\ 0 & \text{otherwise} \end{cases}    (3.4)

Based on (3.4), given one-dimensional input samples s_i, we can define the effect of all the input samples on the interpolated value S(x) for an arbitrary real argument x. It can be represented as the discrete convolution of these input samples with the Lanczos kernel [11]:

S(x) = \sum_{i = \lfloor x \rfloor - a + 1}^{\lfloor x \rfloor + a} s_i \, L(x - i)    (3.5)

where \lfloor x \rfloor is the floor function. Based on the one-dimensional Lanczos kernel, we extend it to a two-dimensional kernel [11]:

L(x, y) = L(x) \, L(y)    (3.6)

where x and y each represent a dimension. And we can conduct the two-dimensional discrete convolution S(x, y) based on (3.5) [11]:

S(x, y) = \sum_{i = \lfloor x \rfloor - a + 1}^{\lfloor x \rfloor + a} \; \sum_{j = \lfloor y \rfloor - a + 1}^{\lfloor y \rfloor + a} s_{ij} \, L(x - i) \, L(y - j)    (3.7)

An image represented in the RGB model can be interpreted as a two-dimensional grid, where each pixel is a point in this grid and the coordinates of the pixel correspond to the x (row) and y (column) values in the grid. In this way, we can apply the two-dimensional Lanczos kernel to scale the image to a given size.

As claimed by Turkowski and Gabriel [12], the Lanczos filter (with a = 2) achieves "the best compromise in terms of reduction of aliasing, sharpness, and minimal ringing". Therefore, we use the Lanczos kernel with a = 2 for scaling down icons.

According to Jim Blinn [13], the Lanczos kernel (with a = 3) "keeps low frequencies and rejects high frequencies better than any (achievable) filter we've seen so far”. Therefore, we use Lanczos kernel with a = 3 for scaling up icons. Figure 3.4(a) and

3.4(b) show the Lanczos kernel when a = 2 and a = 3.

Figure 3.4 (a) Lanczos kernel for a = 2 Figure 3.4 (b) Lanczos kernel for a = 3

To scale an input image with width W and height H, the image is represented as a discrete function I(i, j):

I(i, j) = p_{ij}    (3.8)

where i ∈ [0, W−1] and j ∈ [0, H−1]. Each pair (i, j) represents the coordinates of a pixel in the image. The pair (0, 0) represents the coordinates of the upper-left pixel and the pair (W−1, H−1) represents the coordinates of the lower-right pixel in the image. For each pixel, we denote the channels of its color as R(i, j) for red, G(i, j) for green, B(i, j) for blue, and A(i, j) for the alpha channel if the image has one.

Accordingly, the output image with width W' and height H' is defined as:

O(p, q) = p'_{pq}    (3.9)

where p ∈ [0, W'−1] and q ∈ [0, H'−1]. To obtain the pixels in the output image, we use the Lanczos kernel to compute each channel value of the pixel based on the corresponding channel of a set of pixels from the input image.

For example, to compute the red channel R'(p, q) of an output pixel, we first compute S_R(x, y):

S_R(x, y) = \sum_{i} \sum_{j} R(i, j) \, L(x - i) \, L(y - j)    (3.10)

where i ∈ [0, W−1] and j ∈ [0, H−1]. The position (x, y) is computed from (p, q) using the resampling ratio, i.e., x = p / ratio and y = q / ratio. Since R'(p, q) is the weighted average of S_R(x, y), we can compute R'(p, q) as:

R'(p, q) = \dfrac{S_R(x, y)}{totalWeight}    (3.11)

where totalWeight is:

totalWeight = \sum_{i} \sum_{j} L(x - i) \, L(y - j)    (3.12)

Then we can compute G'(p, q), B'(p, q), and A'(p, q) using the same approach. These values form the pixel (p, q) in the output image. We repeat this procedure for all (p, q) where p ∈ [0, W'−1] and q ∈ [0, H'−1], and then obtain the output image.

Algorithm 1 shows the detailed steps for scaling an image.

Figure 3.5 (a) is an example icon before scaling. The OCR engine cannot extract any text from it since the image size is 48x48, causing the characters to be too small for the OCR engine to recognize.


Figure 3.5 (a) Before scaling Figure 3.5 (b) After scaling

Algorithm 1 Image Scaling
Procedure SCALING(I, R)
Input: I as an input image, R as the scaling ratio
Output: O as the output image after scaling input I
  O.width ← I.width * R, O.height ← I.height * R
  if R > 1 then a ← 3 else a ← 2
  for all col ∈ O.width
    x ← col / R
    for all row ∈ O.height
      y ← row / R
      totalWeight ← 0
      let outputRGBA[] be an array of zeros, one entry per channel
      for all subRow ∈ (⌊y⌋ − a + 1, ⌊y⌋ + a)
        for all subCol ∈ (⌊x⌋ − a + 1, ⌊x⌋ + a)
          weight ← LanczosKernel(x − subCol) * LanczosKernel(y − subRow)
          if weight > 0 then
            totalWeight ← totalWeight + weight
            p ← I.pixel(subCol, subRow)
            for all channel c of p
              outputRGBA[c] ← outputRGBA[c] + p.RGBA[c] * weight
            end for
          end if
        end for
      end for
      for all channel c
        outputRGBA[c] ← outputRGBA[c] / totalWeight
      end for
      O.pixel(col, row) ← construct color using outputRGBA[]
    end for
  end for
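For reference, a minimal Java sketch of the LanczosKernel function invoked in Algorithm 1, implementing Equation (3.4), is given below; the method name is ours, and a full implementation would typically cache the computed weights.

public final class Lanczos {

    // One-dimensional Lanczos kernel L(x) from Equation (3.4):
    // 1 at x = 0, the windowed sinc value for 0 < |x| < a, and 0 outside [-a, a].
    static double lanczosKernel(double x, int a) {
        if (x == 0.0) {
            return 1.0;
        }
        if (x <= -a || x >= a) {
            return 0.0;
        }
        double px = Math.PI * x;
        return a * Math.sin(px) * Math.sin(px / a) / (px * px);
    }
}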


Figure 3.5 (b) shows the example icon after scaling. We can see that although the image is enlarged to around twice the original size, each character of the text is more distinguishable, with smoother boundaries than in the original. When we apply the OCR engine to extract the text from the resized icon, the resulting text is "SMS", which exactly matches one of the sensitive keywords for the SMS semantic category. Therefore, this icon is classified into the SMS category.

3.2.2 Color Inversion

Most OCR engines perform much better on images consisting of dark-colored characters on a bright background than on images consisting of bright characters on a dark background. In Android apps, many text icons are designed using a black theme, as shown in Figure 3.6. In the icon, the text has white characters on a black background. It is very easy for human beings to recognize these characters.

However, the OCR engine cannot extract any text from this image if not specially trained using the icons of this style.

Figure 3.6 An example icon with bright characters and deep background

In order to address this issue, we propose to convert the characters to a dark color and the background to a bright color. As mentioned in Chapter 2, the RGB color model uses different degrees of three primary colors (red, green, and blue) to compose a color.

Each primary color is often stored as an integer number in the range 0 to 255. The RGB value is (0, 0, 0) for the color black and (255, 255, 255) for the color white.

Given this property of RGB colors, we define the color inversion as:

C' = 255 − C    (3.13)

where C' is an RGB channel of the output pixel and C is the corresponding RGB channel of the input pixel.

As we have defined the input image and the output image in (3.8) and (3.9), we can apply the color inversion to a pixel in the output image:

C'_{channel}(p, q) = 255 − C_{channel}(i, j)    (3.14)

where i = p, j = q, and channel ∈ {red, green, blue}.

Based on the equation (3.14), we repeat this conversion for all the pixels in the output image. Algorithm 2 shows the detailed steps for the color inversion.

Algorithm 2 Color Inversion
Procedure INVERSION(I)
Input: I as an input image
Output: O as the output image after color inversion
  for all col ∈ I.width
    for all row ∈ I.height
      color ← I.getRGB(col, row)
      newRed ← 255 − color.getRed
      newGreen ← 255 − color.getGreen
      newBlue ← 255 − color.getBlue
      newColor ← construct color using newRed, newGreen, newBlue
      O.setRGB(col, row, newColor)
    end for
  end for
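A minimal Java sketch of Algorithm 2 on a BufferedImage, assuming the packed-ARGB pixel representation from Section 2.3 and leaving the alpha channel unchanged, could look as follows.

import java.awt.image.BufferedImage;

public final class ColorInversion {

    // Returns a copy of the input image with each RGB channel replaced by 255 - value.
    static BufferedImage invert(BufferedImage input) {
        BufferedImage output =
                new BufferedImage(input.getWidth(), input.getHeight(), BufferedImage.TYPE_INT_ARGB);
        for (int col = 0; col < input.getWidth(); col++) {
            for (int row = 0; row < input.getHeight(); row++) {
                int argb = input.getRGB(col, row);
                int alpha = (argb >> 24) & 0xFF;
                int red   = 255 - ((argb >> 16) & 0xFF);
                int green = 255 - ((argb >> 8) & 0xFF);
                int blue  = 255 - (argb & 0xFF);
                output.setRGB(col, row, (alpha << 24) | (red << 16) | (green << 8) | blue);
            }
        }
        return output;
    }
}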

Figure 3.7 is the example icon after the color inversion. The original image is shown in Figure 3.6. We can see that the characters now have the darker color and the background has the brighter color, which makes it easier for the OCR engine to recognize the characters. When

32 we apply the OCR engine on this converted icon, the extracted text is “_| email now>~<”.

Although this resulting text contains redundant characters, it includes all the characters in the original icon. Since the sensitive keyword “email” is included in the extracted text, this icon can be correctly classified to the Email category. We will introduce how we match the sensitive keywords in Section 3.3.3.

Figure 3.7 The icon in Figure 3.6 after color inversion

Since not all icons have a dark background and bright texts, given an icon, DroidIcon will apply the color inversion to obtain a converted icon, and then apply OCR on both the converted icon and the original icon. If any text extracted from these two icons matches any sensitive keyword in a semantic category, DroidIcon will classify the icon to the corresponding semantic category. If the texts from both icons cannot exactly match any sensitive keyword, DroidIcon will apply another image mutation technique.

3.2.3 Opacity Conversion

With the development of UI/UX design, the "ghost button" has become one of the dominant design trends in the UI design world for smartphone apps [15].


Figure 3.8 User Interface containing ghost buttons [14]

Ghost buttons are transparent buttons with a basic shape, such as a rectangle or a circle. They are generally bordered by a thin line, while the internal section consists of plain text. They have been widely used in many mobile apps.

Figure 3.8 is an example UI of an app. There are two rectangular ghost buttons in the UI: the "Email Us" button and the "Frequently Asked Questions" button. The contents of both buttons are transparent except for the texts. This means that the background areas of the buttons are transparent to some extent.

The ghost buttons look the same as other images that have no transparent background. However, when we apply OCR to the icons of the ghost buttons, we find the recognition accuracy is very low; the result is usually an empty text.

By carefully inspecting the colors of each pixel in these icons, we find that the icons are designed by adjusting the opacity of pixels. Some icons actually have the same color for all their pixels, but the pixels are set to totally transparent except for the pixels that form the texts and the borders. The other icons indeed have background colors different from the colors of the texts, but the pixels in the background are set to transparent. The reason why the OCR engine cannot recognize the text in these ghost button icons is that OCR detects characters in the icon based on color differences, but it cannot detect the opacity differences between pixels.

Figure 3.9 (a) Example ghost button

Figure 3.9 (b) Example ghost button without transparent background

Figure 3.9 (a) is an icon of a ghost button collected from an Android app. We cannot see the color of its background since the background is transparent. Figure 3.9 (b) is the converted icon after we set all the pixels to opaque. We can see that the background actually has a black color. Since the background color is darker than the text color, which is green, the OCR engine cannot extract the text effectively.

The similarly dark colors of both the background and the text cause the OCR engine to perform poorly. In order to address this problem, we propose an algorithm to map the degree of opacity to the degree of color, which transforms the differences in opacity into differences in color. The key idea of this algorithm is to map the value of the alpha channel to the color channels. The transform equation is:

C' = 255 − A    (3.15)

where C' is an RGB channel of the output pixel and A is the alpha channel of the input pixel.

We apply this equation to convert each channel of the pixel based on (3.15):

C'_{channel}(p, q) = 255 − A(i, j)    (3.16)

where i = p, j = q, A(i, j) is the alpha channel of the pixel (i, j) in the input image I, and channel ∈ {red, green, blue} is one of the RGB channels of the pixel (p, q) in the output image O.

We apply (3.16) to each RGB channel of the pixel, so the image will be converted to a black-and-white image. The alpha value is 0 for fully transparent and 255 for fully opaque. A transparent background will be mapped to a brighter color and opaque text will be mapped to a darker color. As mentioned in the last section, this kind of image is easier for the OCR engine to extract text from. Algorithm 3 shows the detailed steps for the opacity conversion.


Algorithm 3 Opacity Conversion
Procedure OPACITYCONVERSION(I)
Input: I as an input image
Output: O as a new image converting opacity differences to color differences
  for all col ∈ I.width
    for all row ∈ I.height
      color ← I.getRGB(col, row)
      alpha ← color.getAlpha
      newRed ← 255 − alpha
      newGreen ← 255 − alpha
      newBlue ← 255 − alpha
      newColor ← construct color using newRed, newGreen, newBlue
      O.setRGB(col, row, newColor)
    end for
  end for
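A minimal Java sketch of Algorithm 3 is shown below; it ignores the original colors entirely and maps the alpha value of each pixel to a gray level, so that opaque text becomes dark and a transparent background becomes bright (the class and method names are ours).

import java.awt.image.BufferedImage;

public final class OpacityConversion {

    // Maps the alpha channel of each pixel to a gray level (255 - alpha), per Equation (3.16):
    // a fully transparent background becomes white, fully opaque text becomes black.
    static BufferedImage alphaToGray(BufferedImage input) {
        BufferedImage output =
                new BufferedImage(input.getWidth(), input.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int col = 0; col < input.getWidth(); col++) {
            for (int row = 0; row < input.getHeight(); row++) {
                int alpha = (input.getRGB(col, row) >> 24) & 0xFF;
                int gray = 255 - alpha;
                output.setRGB(col, row, (gray << 16) | (gray << 8) | gray);
            }
        }
        return output;
    }
}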

Figure 3.10 is the converted icon after applying Algorithm 3 to the icon in Figure

3.9 (a). The text now has the black color and the background has the white color. When we apply OCR on this converted icon, the extracted text is “Add photo”. Thus, the text is correctly recognized after the opacity is mapped to the RGB colors.

Figure 3.10 Converted icon with opacity mapped to RGB color

In our dataset, we have a certain number of ghost buttons or similar icons, and Algorithm 3 is very effective in handling these icons.

3.2.4 Grayscale Conversion

A grayscale [18] image is a type of image in which the value of each pixel represents only the amount of light. In other words, it carries only luminance information.

Generally, an image contains both luminance information and chrominance information.

By converting a color image to a grayscale image, the image preserves only the luminance information and loses the chrominance information.

Modern image recognition techniques often operate on grayscale images [17]. The main reason why grayscale images are preferred over the original color images is that grayscale representations reduce the noise that color introduces for these image recognition techniques. For many applications of image processing, color information is not always helpful, especially in distinguishing edges, while other features such as luminance are far more important to these techniques.

Converting color images to grayscale images simplifies a three-channel (RGB) image to a one-channel image, which removes the color information from the image.

Although this loses some information, it preserves the most useful information, luminance, which is the important information for OCR. In fact, without the color information, the noise that color introduces for OCR is eliminated, which improves the accuracy of OCR.

There exist many techniques for grayscale conversion [19]. One straightforward color-to-grayscale algorithm is Intensity [21]:

C' = \dfrac{C_r + C_g + C_b}{3}    (3.17)

where C' represents each RGB channel of the output pixel and C_r, C_g, C_b represent the red, green, and blue channels of the input pixel.

Based on Intensity, a more effective version of the averaging algorithm was proposed, Luminance [20]:

C' = 0.3\,C_r + 0.59\,C_g + 0.11\,C_b    (3.18)

Luminance is designed to match human brightness perception by using a weighted combination of the RGB channels. It performs better than other techniques [17] and is widely used in many image processing applications (e.g., GIMP, MATLAB). Therefore, we use it to generate grayscale images in DroidIcon.

Figure 3.11 (a) Image of colored bars

Figure 3.11 (b) Converted bars after using Intensity

Figure 3.11 (c) Converted bars after using Luminance

Figure 3.11 (a) shows example color bars. We apply both Intensity and Luminance to convert these bars to grayscale images. Figure 3.11 (b) shows the resulting color bars using Intensity. Clearly, we can observe that some bars with different colors in Figure 3.11

(a) are converted to the same color. This indicates that the loss of color information causes collisions in the grayscale images, which is not ideal. Figure 3.11 (c) shows the resulting bars using Luminance. We can see that the weighted conversion is a much more accurate

representation than the average conversion adopted by Intensity. The intensity levels change gradually from black to white. Based on (3.8) and (3.9), we can convert each channel of a pixel in the output image as:

C'_{channel}(p, q) = 0.3\,R(i, j) + 0.59\,G(i, j) + 0.11\,B(i, j)    (3.19)

where i = p, j = q, and channel ∈ {red, green, blue}.

Based on (3.19), all three color channels are set to the same value, and thus only luminance information is preserved. Algorithm 4 shows the detailed steps for the grayscale conversion:

Algorithm 4 Grayscale Conversion
Procedure GRAYSCALE(I)
Input: I as an input image
Output: O as a new grayscale image
  for all col ∈ I.width
    for all row ∈ I.height
      color ← I.getRGB(col, row)
      grayscale ← color.getRed * 0.3 + color.getGreen * 0.59 + color.getBlue * 0.11
      newRed ← grayscale
      newGreen ← grayscale
      newBlue ← grayscale
      newColor ← construct color using newRed, newGreen, newBlue
      O.setRGB(col, row, newColor)
    end for
  end for
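A compact Java counterpart of Algorithm 4, using the Luminance weights of Equation (3.18), might look as follows (names are ours).

import java.awt.image.BufferedImage;

public final class GrayscaleConversion {

    // Converts each pixel to gray using the Luminance weights 0.3, 0.59, 0.11 from Equation (3.18).
    static BufferedImage toGrayscale(BufferedImage input) {
        BufferedImage output =
                new BufferedImage(input.getWidth(), input.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int col = 0; col < input.getWidth(); col++) {
            for (int row = 0; row < input.getHeight(); row++) {
                int argb = input.getRGB(col, row);
                int red   = (argb >> 16) & 0xFF;
                int green = (argb >> 8) & 0xFF;
                int blue  = argb & 0xFF;
                int gray = (int) Math.round(0.3 * red + 0.59 * green + 0.11 * blue);
                output.setRGB(col, row, (gray << 16) | (gray << 8) | gray);
            }
        }
        return output;
    }
}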

Figure 3.12 (a) shows an example icon in which the colors introduce noise for OCR in recognizing characters. This icon contains the text "SMS".

Figure 3.12 (a) Example color icon that OCR fails to process

It is very easy for a human to recognize the text "sms" in it. However, when we apply OCR to extract the text from it, the extracted text is "\fij", which is completely different from the correct text "sms". To reduce the noise introduced by the colors, we apply the grayscale conversion and obtain the converted icon shown in Figure 3.12 (b).

Figure 3.12 (b) The grayscale image for Figure 3.12 (a)

When we apply OCR on the icon shown in Figure 3.12 (b), the extracted text is empty. As explained in the previous section, the converted icon has a bright color for the text and a dark color for the background. Thus, it is not suitable for the OCR engine to recognize the characters. To address this issue, we apply color inversion to improve the accuracy of OCR.

Figure 3.12 (c) Example icon after grayscale conversion and color inversion

Figure 3.12 (c) shows the icon after applying color inversion on the icon shown in

Figure 3.12 (b). The characters are now in a dark color on a brighter background. When we apply OCR on this icon, the extracted text is "'r w LsmsJ,". Although there still exist some noisy characters in the resulting text, it contains the keyword "sms". Thus, this icon can be correctly matched with the sensitive category SMS in the text icon classification.

3.2.5 Contrast Adjustment

Contrast is the difference in luminance or color that makes an object (or its representation in an image or display) distinguishable [22]. The number of characters the

OCR engine can recognize from the input image depends on the quality of the image in terms of contrast [23]. Therefore, image contrast enhancement is important for achieving higher OCR accuracy.

There are many possible definitions of contrast. Some of the definitions include color; others do not. In Section 3.2.4, we show that grayscale images contain only luminance information, so that they have less noise for character recognition. Moreover, luminance information is more important in distinguishing edges and other features than chrominance information. Therefore, we define the contrast as the difference in luminance between objects in the image.

To measure the luminance information of an image, we introduce a concept named brightness, which is the perception elicited by the luminance of a visual target. In the RGB color space, the brightness of a pixel is defined as the arithmetic mean of the red, green, and blue channels:

C = \dfrac{C_r + C_g + C_b}{3}    (3.20)

where C is the brightness, and C_r, C_g, C_b represent the red, green, and blue channels.

The contrast of an image is the difference between the maximum and minimum pixel brightness in the image. Consider two images A and B. The brightness of pixels in the image A ranges from 100 to 200, while the brightness in the image B ranges from 50

to 250. That is, the contrasts of images A and B are 100 and 200, respectively. Thus, image B has a larger contrast than image A.

An image with an appropriate contrast improves the accuracy of OCR, while a too high or too low contrast compromises the effectiveness of OCR. Figure 3.13 (a) is an example image with a very low contrast. The low contrast indicates that the range of brightness is very small, and thus the pixels of the image tend to have similar colors. This poses challenges for OCR in identifying the edges or the contours of objects in the image.

Figure 3.13 (a) Example of image with very low contrast


Figure 3.13 (b) Example of image with very high contrast

Figure 3.13 (b) is an example image that has very high contrast. Many pixels in the image become either black, white, or blue. However, in the original image, they have different brightness, which means that some color information has been lost during the contrast adjustment. The reason is that the high contrast causes all the channels of different pixels to be increased to the maximum value (i.e., 255) or decreased to the minimum value

0. For example, two pixels, (150, 150, 150) and (200, 200, 200), represent different levels of gray. If the contrast is adjusted to a very large value, both of them are converted to the same color (255, 255, 255), resulting in information loss. Therefore, the contrast adjustment should be applied at a level such that it maximizes the accuracy of

OCR in recognizing the text but at the same time minimizes the information lost.

The purpose of contrast adjustment is to make bright pixels brighter and dark pixels darker under the premise that the average brightness of all pixels remains almost the same. To achieve this, we can perform the adjustment based on the difference between the brightness of a pixel and the average brightness of all pixels. It is defined as:

C' = B + F \cdot (C − B)    (3.21)

where C is a channel of the input pixel, C' is the corresponding channel of the output pixel, and B is the average brightness of all pixels. F is the contrast adjustment factor defined by the user, which can be represented as:

F = 1 + p    (3.22)

where p is the percentage of adjustment applied to the brightness difference C − B.

To compute the average brightness B of the image, we need to compute the brightness of each pixel. Generally, the average brightness of an image is between 100 and 150. Based on (3.8) and (3.9), we apply (3.21) to all the pixels, and convert each channel of each pixel as:

C'_{channel}(p, q) = B + (1 + p) \cdot (C_{channel}(i, j) − B)    (3.23)

where i = p, j = q, and channel ∈ {red, green, blue}.

DroidIcon applies equation (3.23) on each channel of each pixel with the input percentage p. The value of p should be set suitably to avoid either a too large or a too small contrast adjustment. In order to increase the differences between the image text and its background, we need to slightly increase the contrast of the image. The average brightness of an image is usually between 100 and 150, and most of the pixels have color values in the three channels that are close to the average brightness; only a few pixels have values that differ greatly from the average brightness. Therefore, we set p to 0.5 to enlarge the differences for more pixels. Algorithm 5 shows the detailed steps for the contrast adjustment technique.


Algorithm 5 Contrast Adjustment
Procedure CONTRAST(I, f)
Input: I as an input image, f as the adjustment percentage
Output: O as the output image
  count ← 0
  b ← 0
  for all col ∈ I.width
    for all row ∈ I.height
      count ← count + 1
      color ← I.getRGB(col, row)
      b ← b + (color.getRed + color.getGreen + color.getBlue) / 3
    end for
  end for
  avgb ← b / count
  for all col ∈ I.width
    for all row ∈ I.height
      color ← I.getRGB(col, row)
      factor ← 1 + f
      newRed ← Truncate(avgb + factor * (color.getRed − avgb))
      newGreen ← Truncate(avgb + factor * (color.getGreen − avgb))
      newBlue ← Truncate(avgb + factor * (color.getBlue − avgb))
      newColor ← construct color using newRed, newGreen, newBlue
      O.setRGB(col, row, newColor)
    end for
  end for

In the algorithm, the Truncate() function limits the value of a channel to be between

0 and 255: values smaller than 0 are truncated to 0, and values larger than 255 are truncated to 255.
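Putting Algorithm 5 and the Truncate() clamp together, the following is a minimal two-pass Java sketch of the contrast adjustment: the first pass computes the average brightness and the second pass scales each channel around it; with p = 0.5 as chosen above, the factor is 1.5 (names are ours).

import java.awt.image.BufferedImage;

public final class ContrastAdjustment {

    // Adjusts contrast around the average brightness, per Equation (3.23), with factor F = 1 + p.
    static BufferedImage adjust(BufferedImage input, double p) {
        int width = input.getWidth();
        int height = input.getHeight();

        // Pass 1: average brightness B over all pixels (mean of the per-pixel RGB means).
        double sum = 0;
        for (int col = 0; col < width; col++) {
            for (int row = 0; row < height; row++) {
                int argb = input.getRGB(col, row);
                sum += (((argb >> 16) & 0xFF) + ((argb >> 8) & 0xFF) + (argb & 0xFF)) / 3.0;
            }
        }
        double avg = sum / (width * height);

        // Pass 2: move each channel away from the average by the factor (1 + p), clamped to [0, 255].
        double factor = 1 + p;
        BufferedImage output = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int col = 0; col < width; col++) {
            for (int row = 0; row < height; row++) {
                int argb = input.getRGB(col, row);
                int red   = truncate(avg + factor * (((argb >> 16) & 0xFF) - avg));
                int green = truncate(avg + factor * (((argb >> 8) & 0xFF) - avg));
                int blue  = truncate(avg + factor * ((argb & 0xFF) - avg));
                output.setRGB(col, row, (red << 16) | (green << 8) | blue);
            }
        }
        return output;
    }

    // Clamps a channel value to the valid range [0, 255], like the Truncate() function above.
    private static int truncate(double value) {
        return (int) Math.max(0, Math.min(255, Math.round(value)));
    }
}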

Figure 3.14 (a) shows an example for contrast adjustment. It is an icon belonging to the SMS semantic category. It contains bright color texts and a dark color background.

Obviously, the OCR engine cannot extract the correct text from it. After inverting its color, the extracted result is still an empty text. Therefore, we adjust its contrast to see if it can improve the recognition result.


Figure 3.14 (a) Example icon before contrast adjustment

The average brightness of Figure 3.14 (a) is 138. We apply contrast adjustment and obtain the output image in Figure 3.14 (b). In this converted icon, we can see that the contour of the text becomes clearer and the background color becomes darker. We further apply color inversion on this converted icon.

Figure 3.14 (b) Example icon after contrast adjustment

Figure 3.14 (c) shows the icon after color inversion. When we apply OCR on this icon, the extracted text is “3,sms ‘”. While this text is not perfect, it suffices to make this icon correctly classified into the SMS semantic category.

Figure 3.14 (c) Example icon after contrast adjustment and color inversion

3.3 Text Icon Classification

3.3.1 Text Cleaning

Text cleaning is an important task to improve the accuracy of text icon classification.

Although OCR engines, such as Asprise [38], provide very high accuracy in character recognition, the extracted text often includes extra characters that are inferred from the shapes of objects rather than from the text in the icon. For example, Figure 3.15 shows an example email icon that contains not only the text “Email” but also an email sign. Unfortunately, the shape of the email sign is recognized by the OCR engine as extraneous characters. When we apply OCR on the icon, the extracted text is “L\_/j Email”, where the mail sign is recognized as “L\_/j”, introducing noise into the classification.

Figure 3.15 Example Email icon with extracted text “L\_/j Email”

Generally, the noise in the extracted texts falls into four types: extra alphabetic characters, digits, punctuation characters, and extra space characters. This noise is caused by extra graphical features, the original text layout, unusual text fonts, etc. To improve the accuracy of the subsequent keyword-based classification, we perform text cleaning to remove the digits, punctuation characters, and extra space characters in the extracted text.

After text cleaning, the extracted text “L\_/j Email” becomes “LjEmail”, where only the alphabetic characters are preserved. As we can see, the resulting text still contains both upper and lower case characters. In order to have a uniform letter case for keyword matching, we convert all the upper case characters to the corresponding lower case characters, and the resulting text becomes “ljemail”. We name this result the “candidate text”, which is used by the text icon classification component to determine the semantic category the icon belongs to.
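A minimal Java sketch of this cleaning step (assuming the rules above; the method name is illustrative) is:

    // Text cleaning: keep only alphabetic characters and normalize to lower case.
    static String cleanExtractedText(String ocrOutput) {
        String lettersOnly = ocrOutput.replaceAll("[^A-Za-z]", ""); // drop digits, punctuation, spaces
        return lettersOnly.toLowerCase();                           // uniform case for keyword matching
    }

With this helper, the extracted text “L\_/j Email” would be cleaned to “ljemail”, the candidate text used below.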

3.3.2 Keyword Dataset Construction

The goal of text icon classification is to determine the most probable semantic category the input icon belongs to. While there could be numerous types of semantics that can be expressed by icons, we define 9 semantic categories based on the sensitive information categories used in Android [2].

Table 3.6 Keyword Set

Category Sensitive Keywords

Camera camera, retake

Contacts contact, group

Email email, mail

Location location, locate, gps, map, address

Phone call, phone

Photo photo, image, video, audio

SMS sms, message

Among the 9 categories, 7 of them indicate 7 different types of sensitive information frequently used in Android apps. The 7 sensitive semantic categories are:

Camera, Contacts, Email, Location, Phone, Photo and SMS. These sensitive categories are commonly used in apps to access sensitive personal data, including users' private photos, messages, emails, current locations, etc. [2, 31]. The other two categories are non-sensitive semantic categories: non-sensitive text icons and non-text icons. We use the keywords from previous work [2] and further collect more keywords to form the keyword dataset for each sensitive category. Table 3.6 shows the keyword set for each sensitive category. All the keywords of a category are compared with the candidate text; if the text matches any keyword of the category, the icon is classified into that category.
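For illustration, the keyword set in Table 3.6 could be stored as a simple category-to-keywords map; the following Java sketch is only one possible representation, not the data structure used in DroidIcon:

    import java.util.List;
    import java.util.Map;

    // Sketch: the keyword dataset of Table 3.6 as a map from category to keywords.
    final class KeywordDataset {
        static final Map<String, List<String>> KEYWORDS = Map.of(
            "Camera",   List.of("camera", "retake"),
            "Contacts", List.of("contact", "group"),
            "Email",    List.of("email", "mail"),
            "Location", List.of("location", "locate", "gps", "map", "address"),
            "Phone",    List.of("call", "phone"),
            "Photo",    List.of("photo", "image", "video", "audio"),
            "SMS",      List.of("sms", "message")
        );
    }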

3.3.3 Classification Algorithm

To classify text icons to one of the 9 semantic categories, a straightforward approach is to perform keyword matching, as used in previous studies [2]. However, the extracted texts from the icons are often not perfect, which may include extra characters or miss some characters. Therefore, it is unlikely for the words in the extracted texts to exactly match a sensitive keyword from the semantic categories, rendering the classification ineffective.

To address the issues of extra characters and missing characters, instead of pursuing exact matching, we consider a word a match with a sensitive keyword if they are similar enough. More specifically, we develop an edit distance based algorithm to compute the similarities between words in the extracted texts and the keywords, and identify the most similar keyword based on the computed similarities; if the similarity is over a pre-defined threshold, we consider the extracted text to match the keyword and classify the icon into the corresponding semantic category.

Edit distance is a widely used approach to quantify how dissimilar two strings are by computing the minimum cost of operations required to transform one string to another.

Different edit distance algorithms have different definitions of the string operations. In this thesis, we use the Levenshtein distance, which considers three operations: removal, insertion, or substitution of a character in the string. All the operations have the same cost. Thus, the Levenshtein distance is equal to the minimum number of operations required to transform one string into another.

Dynamic programming is a frequently used and efficient way to compute edit distance. It is a method that solves a complex problem by breaking it into a collection of simpler subproblems and combines the solutions of the subproblems to solve the whole problem. Based on the idea of dynamic programming, the Levenshtein distance D between a = a_1 … a_m and b = b_1 … b_n is defined as [24]:

D[i][j] = D[i−1][j−1]                                            if a_i = b_j
D[i][j] = min( D[i−1][j] + 1, D[i][j−1] + 1, D[i−1][j−1] + 1 )   if a_i ≠ b_j
for 1 ≤ i ≤ m, 1 ≤ j ≤ n    (3.19)

where D[i][j] is the edit distance between the prefix a_1 … a_i of a and the prefix b_1 … b_j of b.

(3.19) is a recurrence that computes D[i][j] based on D[i−1][j], D[i][j−1], and D[i−1][j−1]. First, the algorithm computes D[i][j] for small i and j, and then computes larger D[i][j] based on the previously computed smaller values. After computing all the D[i][j] for 1 ≤ i ≤ m, 1 ≤ j ≤ n, we obtain the final result D[m][n], which is the minimum edit distance between a and b. The algorithm steps can be demonstrated using a two-dimensional matrix, as shown in Figure 3.16. We use this matrix to explain more details about the Levenshtein distance.


Figure 3.16 Demonstration of Levenshtein distance

Figure 3.16 is a matrix that demonstrates how the edit distance is computed between the two strings a = “lcafiion” and b = “location”. One axis of the matrix represents the string a and the other represents the string b. Each D[i][j] in the matrix is the minimum edit distance between the prefix a_1 … a_i and the prefix b_1 … b_j, where 1 ≤ i ≤ m, 1 ≤ j ≤ n. We can fill in all D[i][j] according to the recurrence relation (3.19). For each D[i][j], we consider three operations:

(1) Removing the i-th character of a, which corresponds to D[i][j] = D[i−1][j] + 1 in (3.19).

(2) Inserting the j-th character of b, which corresponds to D[i][j] = D[i][j−1] + 1 in (3.19).

(3) Substituting the i-th character of a with the j-th character of b. There are two cases for this operation: if a_i equals b_j, then D[i][j] = D[i−1][j−1]; otherwise, D[i][j] = D[i−1][j−1] + 1 in (3.19).

The initialization of the matrix is D[i][0] = i and D[0][j] = j for 1 ≤ i ≤ m, 1 ≤ j ≤ n. The initialization ensures that if the length of either a or b is zero, the edit distance equals the length of the other string. After the matrix initialization, to compute the minimum edit distance between a and b, the algorithm chooses the minimum value among the three operations for each cell. This step is repeated until all the D[i][j] have been filled.

In Figure 3.16, each operation is represented as a colored arrow, where orange up arrows represent removals, purple left arrows represent insertions, and blue diagonal arrows represent substitutions. All the arrows compose a backtrace path from D[m][n] to D[1][1], illustrating how the edit distance between a = “lcafiion” and b = “location” is computed step by step.

D[m][n] stores the final result. As shown in Figure 3.16, the edit distance between “lcafiion” and “location” is D[m][n] = 3.
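A compact Java sketch of this dynamic programming computation (a direct implementation of (3.19), not necessarily the exact code used in DroidIcon) is:

    // Levenshtein distance via dynamic programming, following recurrence (3.19).
    static int editDistance(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i;   // distance to the empty string
        for (int j = 0; j <= n; j++) d[0][j] = j;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    d[i][j] = d[i - 1][j - 1];                         // characters match
                } else {
                    d[i][j] = 1 + Math.min(d[i - 1][j],                // removal
                                  Math.min(d[i][j - 1],                // insertion
                                           d[i - 1][j - 1]));          // substitution
                }
            }
        }
        return d[m][n];
    }

With this helper, editDistance("lcafiion", "location") returns 3, matching Figure 3.16.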

By using the Levenshtein distance, we can compute the edit distance between any word in the extracted text of an icon and any sensitive keyword. Two words are considered more similar if they have a smaller edit distance. However, the Levenshtein distance is an absolute distance, which depends on the maximum length of the strings: its bound is [0, max(|a|, |b|)], where |a| is the length of a and |b| is the length of b. This introduces two problems during the classification. We describe them below together with our solutions:

(1) The extracted text usually contains redundant words that do not express intentions of using sensitive information. For example, for a text icon that contains the text “enablegps”, the word “gps” indicates the intention to use users’ GPS data, which is sensitive, while the word “enable” is redundant. Redundant words may also come from inaccurate character recognition of the OCR engine. These redundant words make the words in the extracted text contain many more characters than the sensitive keywords, decreasing the accuracy of keyword matching. For example, the edit distance between “enablegps” and the keyword “gps” is 6, which is quite large and indicates that they are dissimilar. However, we know “enablegps” should match “gps” since “gps” is a substring of “enablegps”.

To address the issue of redundant words, we introduce the concept of n-grams [25]. An n-gram is a contiguous sequence of n items from a given sequence of text. For example, the 3-grams of a word include all the substrings of the word whose length is 3. If the length of the word is equal to or smaller than the length of the keyword, we compute the edit distance between the word and the keyword directly. Otherwise, we generate an n-gram list from the word where n ∈ [len(keyword) − 1, len(keyword) + 1]. For example, given the word “enablegps” and the keyword “gps”, the length of the keyword is 3, so the n-gram list contains all the 2-grams, 3-grams, and 4-grams: {“en”, “na”, “ab”, “bl”, “le”, “eg”, “gp”, “ps”, “ena”, “nab”, “abl”, “ble”, “leg”, “egp”, “gps”, “enab”, “nabl”, “able”, “bleg”, “legp”, “egps”}. We then compare each n-gram with the keyword. Since there exists an n-gram “gps” that exactly matches the keyword, the corresponding icon is classified into the “Location” category.

(2) The Levenshtein distance represents the absolute distance between two words. However, such an absolute distance may not reflect the similarity properly. For example, if a word has the same edit distance to two different keywords, it is unclear which keyword is more similar to the word. Additionally, a smaller Levenshtein distance does not necessarily indicate that two words are more similar. Consider another example. Given the word “locatl”, we compare it with the two keywords “locate” and “call”. The edit distance between “locatl” and “locate” is 1. For the keyword “call”, we generate an n-gram list that includes {“loca”, “ocat”, “catl”}, and the edit distance between “catl” and “call” is also 1. While both have the same distance, we know “locatl” is more similar to “locate” than to “call”. To address this issue, we propose to measure the similarity using a relative distance instead of an absolute one. The relative distance is computed as:

dist_rel = dist_abs / len(keyword)    (3.20)

where dist_rel is the relative edit distance, dist_abs is the absolute edit distance, and len(keyword) is the length of the keyword. The similarity between a word and a keyword is then computed based on (3.20):

similarity = 1 − dist_rel    (3.21)

Based on (3.21), we can compute the similarity between “locatl” and “locate” as 0.833, and the similarity between “catl” and “call” as 0.75. Therefore, the word “locatl” is more similar to “locate”, and the corresponding icon is classified into the “Location” category. A small code sketch of the n-gram generation and similarity computation is given after this list.
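As sketched below (illustrative Java, reusing the editDistance helper above; method names are hypothetical), the n-gram generation and the relative-similarity computation of (3.20) and (3.21) can be combined:

    import java.util.ArrayList;
    import java.util.List;

    // Generate all n-grams of `word` for n in [kwLen - 1, kwLen + 1].
    static List<String> generateNgrams(String word, int kwLen) {
        List<String> ngrams = new ArrayList<>();
        for (int n = kwLen - 1; n <= kwLen + 1; n++) {
            if (n < 1 || n > word.length()) continue;   // skip invalid lengths
            for (int start = 0; start + n <= word.length(); start++) {
                ngrams.add(word.substring(start, start + n));
            }
        }
        return ngrams;
    }

    // Similarity between a word and a keyword based on (3.20) and (3.21).
    static double similarity(String word, String keyword) {
        int kwLen = keyword.length();
        if (word.length() <= kwLen) {
            return 1.0 - editDistance(word, keyword) / (double) kwLen;
        }
        double best = 0.0;                               // word is longer: use n-grams
        for (String ngram : generateNgrams(word, kwLen)) {
            best = Math.max(best, 1.0 - editDistance(ngram, keyword) / (double) kwLen);
        }
        return best;
    }

Under these assumptions, similarity("locatl", "locate") is about 0.833 and similarity("enablegps", "gps") is 1, consistent with the examples above.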

To classify text icons, our approach compares each word from the extracted text of an icon with every keyword iteratively and identifies the most similar keyword. If there is a keyword that exactly matches any word in the extracted text, which means similarity = 1, our approach terminates and classifies the icon into the corresponding category.

However, this approach has a potential problem that may cause a large number of false positives (FPs). An FP means that an icon that should be classified into the non-text or the non-sensitive text category is classified as a sensitive text icon. The reason for FPs is that any non-empty extracted text is likely to contain a word that is somewhat similar to a keyword, even if the similarity is very small. If we consider any similarity larger than 0 a match, our approach could potentially generate many FPs.

Empirically, we observe that the extracted texts from sensitive icons have significantly higher similarities (close to 1) than those from non-sensitive text icons. Therefore, we propose to set a threshold for the similarity comparison. Only a word whose similarity is larger than the given threshold is considered sensitive. If the highest similarity is below the threshold but larger than 0, the corresponding icon is classified as a non-sensitive text icon. If the similarity is 0, the corresponding icon is classified as a non-text icon.

Besides affecting FPs, the threshold value also affects false negatives (FNs): a higher threshold typically produces more FNs. Therefore, there is a trade-off between FPs and FNs. We set the threshold value to 0.6 because this value achieves the highest F1 score according to our experiments; the details of the experiments are presented in the next chapter. Algorithm 6 shows the pseudo code for the text icon classification.


Algorithm 6 Text Icon Classification
Procedure CLASSIFICATION (S, C)
Input: S as candidate string, C as the map which contains all the categories and corresponding keywords
Output: P as the predicted category that S belongs to
1:  P ← null
2:  maxSimi ← 0
3:  for all category ∈ C
4:    for all kw ∈ category
5:      if S.length > kw.length then
6:        ngramList ← GenerateNgram(S)
7:        for all ngram ∈ ngramList
8:          simi ← 1 - EditDistance(ngram, kw) / kw.length
9:          if simi = 1 then return P ← C.getCategory(kw)
10:         if simi > maxSimi and simi > 0.5 then
11:           P ← C.getCategory(kw)
12:           maxSimi ← simi
13:         end if
14:       end for
15:     else
16:       simi ← 1 - EditDistance(S, kw) / kw.length
17:       if simi = 1 then return P ← C.getCategory(kw)
18:       if simi > maxSimi and simi > 0.5 then
19:         P ← C.getCategory(kw)
20:         maxSimi ← simi
21:       end if
22:     end if
23:   end for
24: end for
25: return P

The algorithm uses two global variables to keep track of the current maximum similarity and the most similar category. The helper function GenerateNgram() is used to generate the n-gram list of the extracted text, and EditDistance() is used to compute the Levenshtein distance between the extracted text and a keyword.

3.4 DroidIcon

In Section 3.2, we introduced a suite of image mutation techniques. However, different image mutation techniques work for different types of icons, and no single technique works well for all of them. Therefore, DroidIcon adopts an iterative process that tries different image mutation techniques on the input icon until the best classification result is obtained. DroidIcon first chooses an image mutation technique to apply to the input icon and tries to classify the converted icon into one of the 7 sensitive categories. If none matches, DroidIcon chooses another image mutation technique to apply. This process is repeated until the icon is classified into one of the 7 categories or all the techniques have been applied. Algorithm 7 shows the detailed steps of DroidIcon.

Algorithm 7 DroidIcon
Procedure DROIDICON (I)
Input: I as an input icon image, Options as four image mutation algorithms which are {ORIGINAL(I), OPACITYCONVERSION(I), GRAYSCALE(I), CONTRAST(I, 0.5)}
Output: P as the predicted category that icon I belongs to
1:  P ← null
2:  maxSimi ← 0
3:  R ← 100 / min(I.width, I.height)
4:  I ← SCALING(I, R)
5:  for all option ∈ Options
6:    apply option to I
7:    Text ← Extract text from I using OCR
8:    CLASSIFICATION(Text, C)
9:    if maxSimi = 1 return P
10:   INVERSION(I)
11:   Text ← Extract text from I using OCR
12:   CLASSIFICATION(Text, C)
13:   if maxSimi = 1 return P
14: end for
15: return P

The algorithm first defines two global variables, P and maxSimi, to keep track of the maximum similarity and the corresponding category. The scaling ratio is computed based on the icon's height and width (Line 3), and the icon is then scaled using Algorithm 1 (Image Scaling). The next step is to apply the image mutation algorithms to the icon iteratively (Lines 5-14). The image mutation techniques include Algorithm 3 (Opacity Conversion), Algorithm 4 (Grayscale Conversion), and Algorithm 5 (Contrast Adjustment). Before applying any of these algorithms, DroidIcon first extracts text from the original icon. If that text cannot be matched with any keyword, DroidIcon starts to apply the image mutation techniques. After an image mutation is applied, DroidIcon extracts text from the converted icon with the original colors first. DroidIcon then applies color inversion (Algorithm 2) to invert the colors of the converted icon and extracts text again. DroidIcon then applies Algorithm 6 (Text Icon Classification) on the extracted text to classify the icon. If the extracted text exactly matches any keyword, DroidIcon outputs the semantic category this keyword belongs to. Otherwise, the algorithm updates P and maxSimi if a more similar keyword is found. The procedure (Lines 5-14) repeats for all the image mutation algorithms. If the algorithm still does not find an exactly matching keyword, the most similar keyword is used to classify the icon.
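The following Java sketch illustrates this iterative loop under simplifying assumptions: scale(), invert(), extractText(), and bestCategory() are hypothetical stand-ins for Algorithms 1, 2, the OCR call, and Algorithm 6, and the maxSimi bookkeeping for the non-exact fallback is omitted. It is not the exact DroidIcon implementation.

    import java.awt.image.BufferedImage;
    import java.util.List;
    import java.util.function.UnaryOperator;

    // Sketch of DroidIcon's mutate-then-classify loop (Algorithm 7), simplified to
    // return on the first exact keyword match.
    final class DroidIconSketch {
        static String classifyIcon(BufferedImage icon,
                                   List<UnaryOperator<BufferedImage>> mutations) {
            double ratio = 100.0 / Math.min(icon.getWidth(), icon.getHeight());
            BufferedImage scaled = scale(icon, ratio);                 // Algorithm 1: image scaling
            for (UnaryOperator<BufferedImage> mutate : mutations) {    // original, opacity, grayscale, contrast
                BufferedImage converted = mutate.apply(scaled);
                String match = bestCategory(extractText(converted));   // converted icon, original colors
                if (match != null) return match;
                match = bestCategory(extractText(invert(converted)));  // Algorithm 2: color inversion
                if (match != null) return match;
            }
            return null;  // no exact match; fall back to the most similar keyword, as described above
        }

        // Hypothetical helpers standing in for Algorithms 1, 2, 6 and the OCR engine call.
        static BufferedImage scale(BufferedImage img, double ratio) { throw new UnsupportedOperationException(); }
        static BufferedImage invert(BufferedImage img) { throw new UnsupportedOperationException(); }
        static String extractText(BufferedImage img) { throw new UnsupportedOperationException(); }
        static String bestCategory(String text) { throw new UnsupportedOperationException(); }
    }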


Chapter 4

Evaluations

We first conduct comprehensive experiments on DroidIcon to demonstrate the effectiveness of each component. Then we conduct case studies on the UIs of two different apps to demonstrate how DroidIcon can be used to detect the intentions of using sensitive information in the UIs.

4.1 Icon Dataset Construction

To construct the testing dataset, we adopt a semi-automatic process, which is motivated by the observation that file names often indicate the semantics of the icons. For example, an icon that contains “email” in its file name is probably an email icon. We observe that icon images are usually stored in the resource folder of an Android APK. We crawl the APKs of the top 2000 apps from the official Google Play store, and then apply a tool to automatically extract the icons whose names contain sensitive keywords from the resource folders of these APKs. We then manually collect the icons that contain sensitive text and divide them into the corresponding sensitive categories. In total, we collect 332 sensitive text icons, which form the positive dataset. For the negative dataset, we collect 105 icons that contain text that is not sensitive, and another 270 icons with no text, for a total of 375 negative icons. These 707 icons are used in the experiments to evaluate the effectiveness of DroidIcon.


Figure 4.1 Number of Words in Text Icons

Figure 4.1 shows the number of words contained in the 332 text icons. Most of them contain three or fewer words. This indicates that most text icons do not contain sentences or paragraphs, but mostly phrases or single words. Therefore, keyword matching can be used to effectively classify the text icons.

4.2 Effectiveness of DroidIcon

DroidIcon is designed to be recall-oriented: it focuses on maximizing the number of successfully detected sensitive icons. As a side effect, this may decrease precision, since there is a trade-off between achieving higher recall and higher precision. Therefore, in our evaluations, we conduct experiments on the recalls to measure the effectiveness of DroidIcon.

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        1        0          4        3          2        0       7
FN        16       36         58       54         32       25      94
Recall    5.88%    0%         6.45%    5.26%      5.88%    0%      6.93%

Figure 4.2 Recall of OCR

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        2        4          13       24         8        5       21
FN        15       32         49       33         26       20      80
Recall    11.76%   11.11%     20.97%   42.11%     23.52%   20%     20.79%

Figure 4.3 Recall of OCR + Classification

Figure 4.4 Comparison of the recalls of OCR and OCR + Classification

First, we conduct an experiment to measure the effectiveness of the OCR engine on the collected icons. We name this approach OCR. In this experiment, we first extract the texts from the icons directly using the OCR engine. Only when a word in the extracted text exactly matches a sensitive keyword is the icon classified into the corresponding semantic category. Otherwise, if the extracted text is not empty, the icon is classified as a non-sensitive text icon; if the extracted text is an empty string, the icon is classified as a non-text icon.

Figure 4.2 shows the result of this experiment. We compute the recall of each category defined in (4.1):

Recall = TP / (TP + FN)    (4.1)
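For example, using the Camera column of Figure 4.2, Recall = 1 / (1 + 16) ≈ 5.88%.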

We can see that the recalls of all the categories are less than 10%, indicating that the OCR engine does not perform well on the icons. Only very few sensitive text icons are correctly detected.

In our second experiment, we improve OCR by using our edit distance based text icon classification technique. We name this approach OCR + Classification. Figure 4.3 shows the result of OCR + Classification. As we can see, the recall in each category ranges from 11% to about 40%. Such results show that the text icon classification technique significantly improves the recall performance for each category, especially in the Location category (from 5.26% to 42.11%).

To evaluate the effectiveness of different image mutation techniques, we conduct a series of experiments that apply each technique to improve text icon classification.

Therefore, we have five experiments:

(1) OCR + Classification + Image Scaling

(2) OCR + Classification + Color Inversion

(3) OCR + Classification + Opacity Conversion

(4) OCR + Classification + Grayscale Conversion

(5) OCR + Classification + Contrast Adjustment

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        4        4          19       23         9        3       20
FN        13       32         43       34         25       22      81
Recall    23.53%   11.11%     30.65%   40.35%     26.47%   12%     24.69%

Figure 4.5 Recall of OCR + Classification + Image Scaling

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        12       8          16       21         15       11      22
FN        5        28         46       36         19       14      79
Recall    70.59%   22.22%     25.81%   36.84%     44.12%   44%     21.78%

Figure 4.6 Recall of OCR + Classification + Color Inversion

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        3        13         7        4          4        7       10
FN        14       23         55       53         30       18      91
Recall    17.65%   36.11%     11.29%   7.02%      11.76%   28%     10.99%

Figure 4.7 Recall of OCR + Classification + Opacity Conversion

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        2        12         14       24         8        6       24
FN        15       24         48       33         26       19      77
Recall    11.76%   33.33%     22.58%   42.11%     23.52%   24%     24.76%

Figure 4.8 Recall of OCR + Classification + Grayscale Conversion

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        1        3          11       14         4        5       21
FN        16       33         51       43         30       20      80
Recall    5.88%    8.33%      17.74%   24.56%     11.76%   20%     20.79%

Figure 4.9 Recall of OCR + Classification + Contrast Adjustment


Figure 4.10 Comparison of recalls among all the image mutation techniques

Figure 4.5 to Figure 4.9 show the recalls for the five experiments. In Figure 4.10, we use a bar chart to compare the performance of each mutation technique. We can see that each mutation algorithm leads to a certain improvement in each category compared to OCR. However, we also find that, except for the Color Inversion technique, which achieves a recall of 70.59% on Camera icons, the recall of each technique in every category is no more than 50%. This demonstrates that a single mutation technique cannot work well for all icons, but can improve the OCR performance only on certain types of icons. This motivates DroidIcon to combine all the mutation algorithms, so that different types of text icons can be properly handled.

          Camera   Contacts   Email    Location   Phone    Photo   SMS
TP        16       34         51       56         29       23      84
FN        1        2          11       1          5        2       17
Recall    94.12%   94.44%     82.26%   98.25%     85.29%   92%     83.17%

Figure 4.11 Recall of DroidIcon

Figure 4.12 Comparison of recalls between OCR and DroidIcon

Figure 4.11 shows the recall of DroidIcon. We can see that the recalls are more than 80% for all the sensitive categories, and for the Camera, Contacts, Location and Photo categories, the recalls are above 90%. We compare these recalls with those achieved by OCR in Figure 4.12. It clearly shows that DroidIcon significantly improves the recall for each category, demonstrating the effectiveness of our algorithm.

Besides the recall, we also compute accuracy, precision, and F1-score as:

Accuracy = (TP + TN) / (Positive + Negative)    (4.2)

Precision = TP / (TP + FP)    (4.3)

F1 = 2 * Precision * Recall / (Precision + Recall)    (4.4)

TP: 293    FN: 39
FP: 24 (non-sensitive text) + 4 (non-text)
TN: 81 (non-sensitive text) + 266 (non-text)
Recall: 88.25%    Precision: 91.28%    Accuracy: 90.52%    F1-score: 89.7%

Figure 4.13 Recall, precision, accuracy, and F1-score achieved by DroidIcon
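As a quick consistency check of (4.1)-(4.4), the following small Java sketch recomputes the metrics from the counts in Figure 4.13 (the method name is illustrative):

    // Recompute the metrics in (4.1)-(4.4) from the counts reported in Figure 4.13.
    static void printMetrics() {
        int tp = 293, fn = 39;                      // sensitive text icons detected / missed
        int fp = 24 + 4, tn = 81 + 266;             // non-sensitive text + non-text icons
        double recall = tp / (double) (tp + fn);                      // ~0.8825
        double precision = tp / (double) (tp + fp);                   // ~0.9128
        double accuracy = (tp + tn) / (double) (tp + fn + fp + tn);   // ~0.9052
        double f1 = 2 * precision * recall / (precision + recall);    // ~0.897
        System.out.printf("recall=%.4f precision=%.4f accuracy=%.4f f1=%.4f%n",
                          recall, precision, accuracy, f1);
    }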

From Figure 4.13, we can see that our algorithm also achieves 91.28% precision, 90.52% accuracy, and an 89.7% F1-score. The reason our algorithm achieves such good performance is that Android icons usually have their own specific designs, such as small sizes and bright-colored characters on a dark background, and our algorithm is specifically designed to handle icons with these designs.

In terms of False Positives (FPs), there are 28 in total, where 24 of them are caused by non-sensitive text icons and 4 of them are caused by non-text icons. This shows that non-sensitive text icons are more likely to cause FPs. The reason is that non-text icons do not contain any text, and only the shapes of objects in the icons may cause extra characters to be extracted by the OCR engine, whereas non-sensitive text icons contain texts that can be used to compute similarities, which is more likely to cause FPs.

There are 39 False Negatives (FNs) produced by DroidIcon. There are three major reasons for FNs:

(1) The texts in the icons use unusual or decorative fonts that are more difficult for the OCR engine to recognize. As we have introduced, the OCR engine's performance highly depends on the font types of the input. The Asprise OCR engine achieves high performance on commonly used fonts such as Times, Arial, and Helvetica, but performs worse on unusual or decorative fonts, as shown in Figure 4.14.

Figure 4.14 Icons with unusual or decorative fonts

(2) Unsuitable character size or image size will also result in false negatives. Although DroidIcon already scales the image, this cannot work well for all icons. The image scaling algorithm we use is an interpolation algorithm, which means the color of a pixel in the converted icon is computed from the pixels of the original icon with different weights. Therefore, the scaling process may introduce more noise or information loss in the converted icons.

Figure 4.15 Icons with unsuitable character and image sizes

Figure 4.15 shows two icons that cause FNs due to unsuitable character and image sizes. The left icon's size is 57x57, which is very small. The text “GPS Off” is very small and barely visible, and the OCR engine does not extract any character from the original icon. Figure 4.16 is the scaled icon. Although the character size becomes larger, its resolution is very low. Therefore, DroidIcon fails to recognize the text, and the final output is still an empty text.

Figure 4.16 Scaled Icon from Figure 4.15

The OCR engine also does not extract any character from the right icon in Figure 4.15. Figure 4.17 is the converted icon after image scaling and contrast adjustment are applied. The character size is larger, but the character “i” in the original icon looks very similar to “l”. Also, there exists some noise in the character “a” caused by the scaling. In the end, the extracted string is “lllll”. This text causes the icon to be classified into the “Phone” category, causing an FN.

Figure 4.17 Email icon with image scaling and contrast adjustment

(3) Another reason for FNs is that the text and the background of some text icons may have similar colors. In Chapter 3, we introduced the contrast adjustment technique, which aims to increase the brightness difference between the text and the background. However, if the text and background have similar colors, the benefit of contrast adjustment is limited.

Figure 4.18 Icon with similar colors in the text and the background

Figure 4.18 shows an icon that has bright colors in both the text and the background. When we analyze the pixels of this icon, the average brightness is 199, the text color is around (255, 238, 158), and the background color is (255, 255, 255). According to the contrast adjustment technique, the adjustment is based on the difference between the average brightness and the value of each color channel. Therefore, similar colors will have similar adjustments. The red and green channels of both the text and the background become 255 after adjustment, and thus the difference between the text and the background can only be identified through the blue channel.

Figure 4.19 Icon in Figure 4.18 after contrast adjustment

Figure 4.19 shows the result after contrast adjustment is applied to the icon shown in Figure 4.18. We can see that the text color is bright yellow, which is still similar to the background color. The final output is still an empty text, causing an FN.

Threshold    0        0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9      1.0
TP           307      307      306      304      301      301      293      266      255      246      246
FP           281      281      278      189      130      65       28       5        5        5        1
FN           25       25       26       28       31       31       39       66       77       86       86
Recall       92.47%   92.47%   92.17%   91.57%   90.66%   90.66%   88.25%   80.12%   76.81%   74.1%    74.1%
Precision    52.21%   52.21%   52.4%    61.66%   69.84%   82.24%   91.28%   98.15%   98.08%   98%      99.6%
F1-score     66.73%   66.73%   66.81%   73.7%    78.9%    86.24%   89.7%    88.22%   86.15%   84.39%   84.98%

Figure 4.20 Comparison of DroidIcon's performance using different similarity thresholds

Figure 4.21 Influences of similarity threshold on effectiveness

DroidIcon uses a similarity threshold to control the FPs and FNs. Figure 4.20 shows the performance of DroidIcon under different similarity thresholds. The recall decreases and the precision increases as the threshold becomes larger. Since precision and recall form a trade-off, we use the F1 score to identify the best trade-off. When the threshold is 0.6, the F1 score is the highest, 89.7%. Therefore, we use a similarity threshold of 0.6 in DroidIcon.

4.3 Case Study

To improve the awareness and understanding of sensitive information expressed in text icons, we conduct case studies on two UIs selected from two different apps. These case studies present how text icons express intentions of using users' sensitive information, and also demonstrate the usefulness of DroidIcon.


Figure 4.22 Case study: Text Icons for Location, Message, Email, Contacts and Call

Figure 4.22 is a UI from an app named MyCityWay. This app aims to help people find information about local places. This UI shows the information of a house and provides some options for the user to choose. We manually find seven sensitive icons in the UI, marked using red boxes. We apply DroidIcon to these icons, and the result shows that all of them are successfully detected. The “Call”, “Direction”, and “Map” icons are correctly classified after image scaling. The “MY PLACES”, “CONTACTS”, “Email”, and “SMS” icons are correctly classified after color inversion. We can further apply behavioral analysis to the behaviors triggered by clicking these sensitive icons to detect potential security risks.


Figure 4.23 Case study: Text Icons for Messaging and Email

Figure 4.23 shows a UI from a game app. This UI provides users two ways to contact the developers: through SMS or email. We apply DroidIcon to these two sensitive text icons, but DroidIcon fails to classify both of them because of their decorative fonts. This case shows that DroidIcon still has some limitations in handling decorative fonts.


Chapter 5

Related Work

There exist some works that leverage semantic information to detect the uses of users' sensitive information. SUPOR [2], UIPicker [1], and UiRef [32] are among the first works to analyze the semantics of the descriptive texts in apps' UIs, such as text labels, to determine whether the corresponding user inputs contain sensitive data. AsDroid [33] also checks whether the descriptive texts are compatible with the intents represented by the sensitive APIs. WHYPER [29] and AutoCog [34] utilize natural language processing techniques to analyze app descriptions and infer the mapping between sentences in app descriptions and the permissions claimed by the apps. Our approach focuses on analyzing the semantics of descriptive texts in icons, complementing these techniques to better understand apps' intents.

Many research efforts have been devoted to improving the performance of OCR on low-quality images. Li et al. [35] observe that OCR software typically can handle only images with black text on a white background. Thus, they propose to invert white text and enhance the resolution based on Shannon interpolation. Their experiments on digital video images demonstrate that their approach improves OCR accuracy considerably. Jacobs et al. [36] propose a machine learning approach based on a convolutional neural network to improve the resolution of low-resolution document images. Their approach converts the image to grayscale and achieves up to 95% accuracy in their experiments. Leung et al. [37] propose to enhance the contrast of images in order to reduce background noise and increase the readability of texts, thus improving the effectiveness of OCR. Inspired by these works, our approach combines different image mutation techniques according to the characteristics of Android icons, and iteratively applies the mutation techniques to produce OCR-friendly icons.


Chapter 6

Discussion and Conclusion

Although DroidIcon shows promising results in classifying sensitive text icons, it still has limitations in handling some text icons.

First, DroidIcon is designed based on OCR techniques. Existing OCR techniques are mainly based on machine learning, and thus their performance largely depends on the learning approach and training dataset they use. In Chapter 4, we show that the OCR engine does not perform well on unusual or decorative fonts. Therefore, we plan to introduce font mutation techniques to convert the fonts in future work. Since Android icons have their own specific design styles, we could also train an OCR engine specialized for Android icons for text icon identification in future work.

Second, we focus on detecting sensitive texts from text icons. Icons with no text may also indicate the uses of users' sensitive information, as pointed out in Chapter 4. For example, in Figure 3.15, besides the “Email” text, there is also a mail sign. This kind of information could be leveraged using computer vision techniques. We plan to address this issue in our future work.

Finally, after the sensitive text icons are identified, we can further apply behavioral analysis to detect the potential security risks in using users' sensitive information. For example, when a non-sensitive text icon is clicked, the app may still use users' sensitive information, which raises a red flag to some extent. We plan to develop program analysis techniques for analyzing the behaviors triggered by clicking these sensitive icons in our future work.

To conclude, we leverage a largely neglected resource, text icons, for detecting mobile apps' intentions of using users' sensitive information. In particular, we proposed DroidIcon, a text icon classification approach based on OCR (Optical Character Recognition) to determine whether the texts contained in text icons indicate the uses of sensitive information. To demonstrate the effectiveness of DroidIcon, we conduct experiments on 707 icons collected from the 2000 most popular Android apps. The results show that DroidIcon achieves an accuracy of 90.52%, a precision of 91.28%, and a recall of 88.25%. We also conduct two case studies that demonstrate the effectiveness of DroidIcon in detecting UIs that indicate the uses of users' sensitive information.


Bibliography

[1] Nan, Y., Yang, M., Yang, Z., Zhou, S., Gu, G., & Wang, X. (2015, August). UIPicker: User-Input Privacy Identification in Mobile Applications. In USENIX Security Symposium (pp. 993-1008).

[2] Huang, J., Li, Z., Xiao, X., Wu, Z., Lu, K., Zhang, X., & Jiang, G. (2015, August). SUPOR: Precise and Scalable Sensitive User Input Detection for Android Apps. In USENIX Security Symposium (pp. 977-992).

[3] Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., ... & McDaniel, P. (2014). FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. ACM SIGPLAN Notices, 49(6), 259-269.

[4] https://en.wikipedia.org/wiki/Pixel

[5] https://en.wikipedia.org/wiki/RGB_color_model

[6] https://en.wikipedia.org/wiki/Optical_character_recognition

[7] Patel, C., Patel, A., & Patel, D. (2012). Optical character recognition by open source OCR tool Tesseract: A case study. International Journal of Computer Applications, 55(10).

[8] Youn, S., & McLeod, D. (2009, March). Improved spam filtering by extraction of information from text embedded image e-mail. In Proceedings of the 2009 ACM Symposium on Applied Computing (pp. 1754-1755). ACM.

[9] https://asprise.com/royalty-free-library/java-ocr-api-overview.html

[10] https://en.wikipedia.org/wiki/Image_scaling

[11] https://en.wikipedia.org/wiki/Lanczos_resampling

[12] Turkowski, K. (1990). Filters for common resampling tasks. Graphics Gems, 147-165.

[13] Blinn, J. (1998). Jim Blinn's corner: dirty pixels. Morgan Kaufmann.

[14] https://www.pinterest.co.uk/pin/357051076681322226/

[15] https://www.techinasia.com/talk/ghost-buttons-ux-design

[16] https://www.dynamsoft.com/blog/document-imaging/scan-settings-for-best-ocr-accuracy/

[17] Kanan, C., & Cottrell, G. W. (2012). Color-to-grayscale: does the method matter in image recognition? PLoS ONE, 7(1), e29740.

[18] https://en.wikipedia.org/wiki/Grayscale

[19] Čadík, M. (2008, October). Perceptual evaluation of color-to-grayscale image conversions. In Computer Graphics Forum (Vol. 27, No. 7, pp. 1745-1754). Blackwell Publishing Ltd.

[20] Pratt, W. K. (1978). Digital image processing. Wiley-Interscience, New York.

[21] Jack, K. (2011). Video demystified: a handbook for the digital engineer. Elsevier.

[22] https://en.wikipedia.org/wiki/Contrast_(vision)

[23] Leung, C. C., Chan, K. S., Chan, H. M., & Tsui, W. K. (2005). A new approach for image enhancement applied to low-contrast–low-illumination IC and document images. Pattern Recognition Letters, 26(6), 769-778.

[24] https://en.wikipedia.org/wiki/Edit_distance

[25] Kondrak, G. (2005). N-gram similarity and distance. In String Processing and Information Retrieval (pp. 115-126). Springer Berlin/Heidelberg.

[26] http://babich.biz/icons-as-part-of-an-awesome-user-experience/

[27] https://www.bullguard.com/bullguard-security-center/mobile-security/mobile-threats/android-malicious-apps.aspx

[28] Barrera, D., Kayacik, H. G., van Oorschot, P. C., & Somayaji, A. (2010). A methodology for empirical analysis of permission-based security models and its application to Android. In Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS) (pp. 73-84).

[29] Pandita, R., Xiao, X., Yang, W., Enck, W., & Xie, T. (2013). WHYPER: Towards automating risk assessment of mobile applications. In Proceedings of the 22nd USENIX Security Symposium (pp. 527-542). Washington, DC, August 2013.

[30] Gorla, A., Tavecchia, I., Gross, F., & Zeller, A. (2014). Checking app behavior against app descriptions. In Proceedings of the International Conference on Software Engineering (ICSE) (pp. 1025-1035).

[31] Felt, A. P., Chin, E., Hanna, S., Song, D., & Wagner, D. (2011, October). Android permissions demystified. In Proceedings of the 18th ACM Conference on Computer and Communications Security (pp. 627-638). ACM.

[32] Andow, B., Acharya, A., Li, D., Enck, W., Singh, K., & Xie, T. (2017, July). UiRef: analysis of sensitive user inputs in Android applications. In Proceedings of the 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks (pp. 23-34). ACM.

[33] Huang, J., Zhang, X., Tan, L., Wang, P., & Liang, B. (2014, May). AsDroid: Detecting stealthy behaviors in Android applications by user interface and program behavior contradiction. In Proceedings of the 36th International Conference on Software Engineering (pp. 1036-1046). ACM.

[34] Qu, Z., Rastogi, V., Zhang, X., Chen, Y., Zhu, T., & Chen, Z. (2014, November). AutoCog: Measuring the description-to-permission fidelity in Android applications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (pp. 1354-1365). ACM.

[35] Li, H., Kia, O. E., & Doermann, D. S. (1999, January). Text enhancement in digital video. In Proceedings of SPIE, The International Society for Optical Engineering (pp. 2-9). SPIE.

[36] Jacobs, C., Simard, P. Y., Viola, P., & Rinker, J. (2005, August). Text recognition of low-resolution document images. In Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on (pp. 695-699). IEEE.

[37] Leung, C. C., Chan, K. S., Chan, H. M., & Tsui, W. K. (2005). A new approach for image enhancement applied to low-contrast–low-illumination IC and document images. Pattern Recognition Letters, 26(6), 769-778.

[38] Asprise Software. Asprise: Java OCR SDK Library. (2017). https://asprise.com/royalty-free-library/java-ocr-api-overview.html