Mass Discovery of Android Malware Behavioral Characteristics for Detection Consideration

Xin Su1,2,3, Weiqi Shi3, Jiuchuan Lin4, and Xin Wang5

1 Provincial Key Laboratory of Network Investigational Technology, Hunan Police Academy
2 Key Laboratory of Network Crime Investigation of Hunan Provincial Colleges, Hunan Police Academy, Changsha, China
3 Department of Information Technology, Hunan Police Academy, Changsha, China
4 Key Lab of Information Network Security of Ministry of Public Security, The Third Research Institute of Ministry of Public Security, Shanghai, China
5 College of Computer Science and Electronics Engineering, Changsha, China

Abstract. Android malware has surged in volume and grown more sophisticated, posing a great threat to users. The key challenge in detecting Android malware is how to discover its behavioral characteristics at a large scale and use them for detection. In this work, we aim to discover discriminatory features extracted from Android APK files for Android malware detection. To achieve this goal, we first extract a very large number of static features from each Android application (or app). Second, we explain the importance of each kind of feature in Android malware detection. Third, we feed these features into three different classifiers (SVM, DT, and Random Forest) for the detection of Android malware. We conduct extensive experiments on large real-world app sets consisting of 6,820 Android malware samples and 37,581 benign Android apps. The experimental results and our analysis give insights into which discriminatory features most effectively characterize Android malware, as a basis for building an effective and efficient Android malware detection approach.

1 Introduction

The Android platform dominates the smartphone operating system market and has become the main target of mobile malware. According to the IDC report [2], Android again took first place with an 87.6% market share in the second quarter of 2016. However, the rapid deployment and extensive availability of Android apps have made them attractive targets for various malware. Malware authors generally take advantage of the update mechanism of mobile apps to infect existing Android apps with malicious code and thus compromise the security of the smartphone. Recent statistical data show that malware based on the Android platform accounts for 97% of mobile malware [1]. The private data of users, such as the IMEI, the contact list, and other user-specific data, are the primary targets of attackers, which poses a serious threat to the security and privacy of Android users. Consequently, there is an urgent need to identify and cope with malware on the Android platform.

© Springer Nature Switzerland AG 2018. X. Sun et al. (Eds.): ICCCS 2018, LNCS 11065, pp. 101–112, 2018. https://doi.org/10.1007/978-3-030-00012-7_10

This ever-growing malware threat has stimulated research into Android app security. Existing work has mainly focused on (i) permission security model analysis [8,16], (ii) app vulnerability mitigation [9,11], and (iii) malware behavior analysis and detection [6,19] based on static or dynamic analysis. As the number of apps and malapps, as well as their variants, increases explosively in the market, it is crucial to discover and precisely characterize the behavior of an app in order to develop methods for Android malware detection at a large scale. The challenge of characterizing apps is threefold. First, apps running on Android have distinct characteristics compared to traditional desktop software. Second, Android malware is becoming increasingly sophisticated by leveraging legitimate apps and system vulnerabilities to evade detection systems. Third, some unprotected data on smartphones, such as sensor data, can be exploited to steal confidential information.

In order to better characterize the behavior of Android apps for malware classification, in this work we aim at discovering discriminatory features of apps. We first extract 11 kinds of feature sets from Android APK files. Then, we explain the importance of each kind of feature in order to better understand how the features perform differently, how to select features in detection tasks, and when to retrain the classification models. Extensive experiments are conducted with different feature sets and feature groups. First, in order to study the discriminative power of each feature set, we feed them into three classifiers and compare the classification performance. Second, the composition of relevant features selected by the Random Forest classifier is thoroughly analyzed to reveal the most useful features of each feature set for Android malware analysis and detection. In summary, the main contributions of this work are as follows:

– We explore eleven kinds of Android app feature sets to discover and characterize the behaviors of Android apps based on static analysis. We propose to employ three classifiers, namely, linear Support Vector Machine (SVM), J48 (a C4.5 decision tree), and Random Forest (RF), and compare the discriminative power of different feature sets and the performance of different classifiers.
– We analyze the composition of relevant features and discover the usage patterns of features in Android malware. These patterns help to understand the behaviors of malapps and to identify the most suitable features for automated Android malware detection.
– We conduct extensive experiments with very large sets of benign Android apps and Android malware. The experimental results demonstrate the effectiveness of our methods and models.

The rest of this paper is organized as follows. Section 2 describes the feature sets. Section 3 describes the three classifiers used in this study. Section 4 provides a detailed evaluation and analysis of the results. Section 5 describes the related work, and Sect. 6 concludes this paper.

2 Behavioral Characteristics Description

As the first step, our approach performs a lightweight static analysis of a given Android app. The behavioral characteristics we extract are organized into 11 behavioral characteristic sets (abbreviated as BC). We further divide these 11 sets into two types according to the source they are extracted from. The first type, configuration-based behavioral characteristic sets, is extracted from the configuration files of an Android app, such as AndroidManifest.xml and the rsa file. The second type, dex-based behavioral characteristic sets, is extracted from the dex code. Next, we describe these behavioral characteristic sets in the following 9 aspects.

2.1 Behavioral Characteristic Sets from Configuration File

Every app developed for Android must include a manifest file called AndroidManifest.xml, which provides data supporting the installation and later execution of the app. The information stored in this file can be efficiently retrieved on the device using the Android Asset Packaging Tool, which enables us to extract the following sets:

Component Name (BC1): The majority of Android malapps are repackaged legitimate apps [18], in which the attackers insert the same malicious payload (usually in the form of components) into many different legitimate apps. We include component names as a behavioral characteristic set to capture the component reuse present in both benign apps and malware.

Requested Permission (BC2) and Hardware and Software Requirement (BC3): In the Android system, these two behavioral characteristics indicate the demands of an app for system resources. Permission request patterns can characterize an app's intent to access resources. In this paper, we use all the permissions defined by the Android platform and by the Android apps. Moreover, Android apps signal their hardware and software requirements to devices in their manifest files with <uses-feature> elements. We thus extract the hardware and software feature descriptors defined in the Android documentation as the third behavioral characteristic set.

Intent Filter (BC4): The Android platform uses an intent as a messaging object that an app or the platform can send to another app's component to request an action or process. Android malware often declares an intent filter to receive specific system events, e.g., BOOT_COMPLETED, in order to activate malicious activity. In this work, we extract all the intent filters in the manifest files of the samples as a feature set.
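The four manifest-based sets described above can be illustrated with a minimal sketch. It assumes the manifest has already been decoded from the APK's binary XML into plain text (for example by the Android Asset Packaging Tool or a similar decoder); the function name, the dictionary keys, and the sample manifest are hypothetical and are not taken from our implementation.

```python
import xml.etree.ElementTree as ET

# Attributes in a decoded manifest carry the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def manifest_features(manifest_xml: str) -> dict:
    """Collect BC1-BC4 style features from a decoded AndroidManifest.xml string."""
    root = ET.fromstring(manifest_xml)
    name_attr = ANDROID_NS + "name"
    features = {
        "component": set(),   # BC1: component names
        "permission": set(),  # BC2: requested permissions
        "feature": set(),     # BC3: hardware and software requirements
        "intent": set(),      # BC4: actions declared in intent filters
    }
    for elem in root.iter():
        value = elem.get(name_attr)
        if value is None:
            continue
        if elem.tag in COMPONENT_TAGS:
            features["component"].add(value)
        elif elem.tag == "uses-permission":
            features["permission"].add(value)
        elif elem.tag == "uses-feature":
            features["feature"].add(value)
        elif elem.tag == "action":
            features["intent"].add(value)
    return features


# Hypothetical manifest used only to exercise the sketch.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-feature android:name="android.hardware.telephony"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

features = manifest_features(SAMPLE)
```

The sketch only reads `<uses-permission>` entries; permissions that an app defines itself would need an additional branch for `<permission>` elements.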

Besides the AndroidManifest.xml file, the rsa file is another configuration file of an Android app. We extract one behavioral characteristic set from this file, described as follows.

Certificate Information (BC5): App developers must sign their APK files with a certificate, the private key of which is held by themselves. This certificate helps to distinguish a developer from others. Developer information such as the country, email address, organization, state or province, as well as the SHA-1 thumbprint, can be extracted from the certificate.

2.2 Behavioral Characteristic Sets from Dex File

Android apps are developed in Java and compiled into optimized bytecode for the Dalvik virtual machine. This bytecode can be efficiently disassembled and provides our approach with information about API calls and data used in an app. To achieve a low run-time, we implement a lightweight disassembler based on the dex libraries of the Android platform that can output all API calls and strings contained in an app. We use this information to construct the following behavioral characteristic sets.

Restricted API Calls (BC6) and Used Permissions (BC7): Requesting a permission does not mean that the app actually accesses the corresponding resources. We scan the disassembled code of the app samples and record whether they invoke API calls protected by permissions. Additionally, we use the API-permission mapping provided by PScout [4] to obtain the used permissions. Used permissions and restricted API calls reflect the resources an app actually accesses at different levels of granularity.
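The relation between restricted API calls and used permissions can be sketched as a lookup in an API-permission map. The three entries below are a hand-written, illustrative excerpt and do not reproduce PScout's data or file format; the function names are hypothetical.

```python
# Illustrative excerpt only: a real API-permission map has thousands of
# entries and its own file format.
API_PERMISSIONS = {
    "android.telephony.TelephonyManager.getDeviceId": {
        "android.permission.READ_PHONE_STATE",
    },
    "android.telephony.SmsManager.sendTextMessage": {
        "android.permission.SEND_SMS",
    },
    "android.location.LocationManager.getLastKnownLocation": {
        "android.permission.ACCESS_FINE_LOCATION",
        "android.permission.ACCESS_COARSE_LOCATION",
    },
}


def restricted_and_used(api_calls, api_permissions=API_PERMISSIONS):
    """Return (restricted API calls, used permissions) for one app."""
    restricted = {call for call in api_calls if call in api_permissions}
    used = set()
    for call in restricted:
        used |= api_permissions[call]
    return restricted, used


def over_requested(requested, used):
    """Permissions declared in the manifest but never exercised in the code."""
    return set(requested) - set(used)


# Hypothetical API calls found in one disassembled app.
calls = {
    "android.telephony.SmsManager.sendTextMessage",
    "java.lang.StringBuilder.append",
}
restricted, used = restricted_and_used(calls)
unused = over_requested(
    {"android.permission.SEND_SMS", "android.permission.CAMERA"}, used
)
```

In this example the app requests two permissions but only exercises one of them, which is exactly the gap between requested and used permissions that the two sets are meant to expose.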

String (BC8): By matching with regular expression patterns, we collect all the URLs, IP addresses, file path strings, and numbers (with more than three digits) in the disassembled code as a feature set. These strings may indicate many malicious behaviors.
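A minimal sketch of such pattern matching is given below. The four regular expressions are assumptions chosen for illustration, since the exact expressions are not listed here, and the sample text is hypothetical.

```python
import re

# Illustrative patterns, not the exact expressions used in the experiments.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")
# A path starts with "/" that is not part of a URL or a longer token.
PATH_RE = re.compile(r"(?<![\w:/.])/(?:[\w.-]+/)+[\w.-]+")
# "More than three digits" means at least four consecutive digits.
NUMBER_RE = re.compile(r"(?<![\d.])\d{4,}(?![\d.])")


def string_features(disassembled: str) -> dict:
    """Collect URL, IP address, file path, and long-number strings."""
    ips = {
        ip
        for ip in IPV4_RE.findall(disassembled)
        if all(int(part) <= 255 for part in ip.split("."))
    }
    return {
        "url": set(URL_RE.findall(disassembled)),
        "ip": ips,
        "path": set(PATH_RE.findall(disassembled)),
        "number": set(NUMBER_RE.findall(disassembled)),
    }


# Hypothetical disassembly fragment.
SAMPLE = (
    'const-string v0, "http://example.com/update.php"\n'
    'const-string v1, "10.0.2.15"\n'
    'const-string v2, "/system/bin/su"\n'
    'const-string v3, "10086"\n'
)

feats = string_features(SAMPLE)
```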

Payload Information (BC9): Payload refers to the files inside the APK archive. We include payload information as a behavioral characteristic set because some Android malware contains extra .apk files in the host app and tricks users into installing them, and because Android malware can change a file name extension from .apk or .dex to .png so as not to arouse suspicion.
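One way to detect such embedded or disguised payloads is to compare the leading bytes of each archive entry with its file name extension. The sketch below assumes this magic-byte check as the detection rule, which is our illustration rather than a description of the original extractor; the function name and the demo archive are hypothetical.

```python
import io
import zipfile

# Leading bytes that identify a payload type regardless of the file name.
MAGIC = {b"PK\x03\x04": "zip", b"dex\n": "dex", b"\x7fELF": "elf"}
EXPECTED_EXTENSIONS = {
    "zip": (".apk", ".zip", ".jar"),
    "dex": (".dex",),
    "elf": (".so",),
}


def payload_features(apk_bytes: bytes) -> dict:
    """Report embedded .apk files and entries whose extension hides their type."""
    embedded_apks, disguised = set(), set()
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as handle:
                kind = MAGIC.get(handle.read(4))
            if kind is None:
                continue  # not an archive, dex file, or native binary
            name = info.filename.lower()
            if not name.endswith(EXPECTED_EXTENSIONS[kind]):
                disguised.add(info.filename)
            elif name.endswith(".apk"):
                embedded_apks.add(info.filename)
    return {"embedded_apk": embedded_apks, "disguised": disguised}


def _demo_apk() -> bytes:
    """Build a tiny in-memory archive that mimics a suspicious APK."""
    dex_stub = b"dex\n035\x00" + b"\x00" * 8
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as z:
        z.writestr("classes.dex", dex_stub)
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as z:
        z.writestr("classes.dex", dex_stub)
        z.writestr("assets/update.apk", inner.getvalue())
        z.writestr("res/drawable/icon.png", dex_stub)  # dex renamed to .png
        z.writestr("res/drawable/logo.png", b"\x89PNG\r\n\x1a\n")
    return outer.getvalue()


payload = payload_features(_demo_apk())
```

Note that this rule also flags native binaries stored under an extension other than .so, which may or may not be desirable depending on the feature definition.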

Code Patterns (BC10): In this behavioral characteristic set, we check whether an app dynamically loads a .dex file or Linux native code, whether an app executes shell commands, whether an app uses Java reflection techniques, whether an app invokes cryptographic functions, etc.
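These checks can be sketched as substring tests on the disassembled code. The sketch assumes a smali-style disassembly, and the marker strings are illustrative choices that are not taken from our implementation; each resulting flag becomes one binary feature.

```python
# Smali-style markers chosen for illustration; a production rule set would
# be larger and would need to handle obfuscation.
CODE_PATTERNS = {
    "dynamic_dex_loading": ("Ldalvik/system/DexClassLoader;",),
    "native_code": (
        "Ljava/lang/System;->loadLibrary(",
        "Ljava/lang/System;->load(",
    ),
    "shell_command": ("Ljava/lang/Runtime;->exec(",),
    "reflection": ("Ljava/lang/reflect/Method;->invoke(",),
    "cryptography": ("Ljavax/crypto/",),
}


def code_pattern_features(smali: str) -> dict:
    """Map each code pattern to True if any of its markers occurs in the code."""
    return {
        name: any(marker in smali for marker in markers)
        for name, markers in CODE_PATTERNS.items()
    }


# Hypothetical disassembly fragment.
SAMPLE = (
    "invoke-static {v0}, "
    "Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V\n"
    "invoke-virtual {v1, v2}, "
    "Ljava/lang/Runtime;->exec(Ljava/lang/String;)Ljava/lang/Process;\n"
)

patterns = code_pattern_features(SAMPLE)
```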

Suspicious API Calls (BC11): Inspired by Drebin [6], we extract certain API calls that allow access to sensitive smartphone resources such as accessing device ID, sending and receiving SMS messages, which are frequently used by Android malware.

3 Approach

The purpose of our study is to find discriminatory behavioral characteristics to effectively classify Android malware based on a variety of behavioral characteristics which are directly extracted from APK files with static analysis techniques. Thus, we treat Android malware detection as a binary classification problem. The framework of our approach is shown in Fig. 1 and consists of four steps. First, we collect a large number of apps from Google Play and third-party Android app markets, as well as Android malware in the wild. Second, we extract as many behavioral characteristics from the apps as possible, in order to characterize each app with a vector. Finally, we conduct comprehensive experiments including (A) a comparison of the performance of different feature sets and (B) a comparison of classifiers.

Fig. 1. Overview of our approach

We already discussed the behavioral characteristic sets extracted from Android apps, and the dataset we used will be discussed in Sect. 4. In this section, we mainly discuss the machine learning classifiers we use in this work.

3.1 Classification Models

Linear Support Vector Machine (SVM): SVM is one of the machine learning classifiers currently receiving the most attention, and it is applied in many areas because of its high performance. With kernel functions, an SVM can also classify data that are not linearly separable. Of the input features, unnecessary ones are removed by the SVM machine learning classifier itself before the modeling is carried out, so there is some overhead in terms of time. However, it can be expected to perform better than other machine learning classifiers in terms of complexity or accuracy of analysis. Figure 2 shows how the SVM finds the hyperplane that serves as the criterion for classifying data during learning. Hyperplanes (a), (b), and (c) all separate the two classes correctly, but the greatest advantage of the SVM is that it selects hyperplane (c), which maximizes the margin (the distance between the hyperplane and the nearest training samples of each class) and accordingly maximizes the capability of generalization.

Fig. 2. Classification method of SVM
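The choice of the maximum-margin hyperplane can be written as the standard hard-margin optimization problem. This is the textbook formulation, stated here for reference under the assumption of linearly separable training samples $(\mathbf{x}_i, y_i)$ with labels $y_i \in \{-1, +1\}$; the symbols $\mathbf{w}$, $b$, and $N$ are introduced only for this illustration.

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, N
```

The separating hyperplane is $\mathbf{w}^{\top}\mathbf{x} + b = 0$ and the width of the margin is $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin. In practice a soft-margin variant with slack variables is used when the classes overlap.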

Decision Tree (DT): Decision tree is a classification model defined by recursively partitioning the training data into a tree structure. In such a tree structure, nodes represent features, leaves represent class labels, and branches emanating from nodes to nodes or nodes to leaves represent conjunctions of features that generate the class labels. Inducing a decision tree is a multistage or sequential process. In the experiments, we firstly put all the training samples at the root node, and then partition the training set depending on the chosen feature at this node. These two steps are then executed recursively at the child nodes from the previous step with the partitioned training subsets. During this process, the training data set is gradually split into homogeneous subsets. Many specific decision tree algorithms have been proposed. We employ C4.5 in this work.

Random Forest (RF): Random forest is a combined classifier consisting of a collection of decision trees where each tree is learned independently on a randomly selected subset of training data. A subset for training each decision tree is selected by randomly sampling from both features and objects. The final classification will be done by voting within all the generated trees.
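C4.5 selects the feature at each node by its gain ratio, i.e., the information gain of the split normalized by the split information. The following minimal sketch computes this criterion for binary features; the feature columns and labels are hypothetical, and the sketch omits the other parts of C4.5 such as pruning and the handling of continuous attributes.

```python
from collections import Counter
from math import log2


def entropy(values) -> float:
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_column, labels) -> float:
    """Information gain of splitting on one feature, divided by split information."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_column, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(
        len(part) / total * entropy(part) for part in partitions.values()
    )
    gain = entropy(labels) - remainder
    split_info = entropy(feature_column)
    # A feature with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical training data: four apps and three binary features.
labels = ["malware", "malware", "benign", "benign"]
columns = {
    "permission::SEND_SMS": [1, 1, 0, 0],   # separates the classes perfectly
    "permission::INTERNET": [1, 1, 1, 1],   # constant, carries no information
    "intent::BOOT_COMPLETED": [1, 0, 1, 0],  # independent of the label
}
ratios = {name: gain_ratio(col, labels) for name, col in columns.items()}
best_feature = max(ratios, key=ratios.get)
```

The feature with the highest gain ratio is placed at the current node, and the procedure is repeated on each resulting subset.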

3.2 Embedded into Vector

Because the majority of classification models process data as numerical vectors, we need to map our extracted behavioral characteristics into a joint feature vector. To this end, we define the behavioral characteristic sets as follows:

\[ F = \{f_1, f_2, \ldots, f_n\}, \qquad F^a = \{f^a_1, f^a_2, \ldots, f^a_m\} \tag{1} \]

All behavioral characteristics extracted from the Android apps are contained in F, and each given app a is described by the set F^a of the characteristics it contains. As shown in Eq. 1, n is the size of the feature set and m is the number of different behavioral characteristics in a. We then define the behavioral characteristic vector V of an Android app as in Eq. 2.

\[ V = \{v_1, v_2, \ldots, v_n\}, \qquad v_i = \begin{cases} 1 & \text{if } f_i \in F \text{ and } f_i \in F^a, \\ 0 & \text{otherwise,} \end{cases} \qquad 1 \le i \le n \tag{2} \]

Thus a behavioral characteristic vector takes the form V = {0, 1, 0, ...}, where 1 indicates that the behavioral characteristic is contained in the app and 0 indicates that it is not. However, V is often a sparse vector. In order to reduce the storage overhead, we transform V into a compressed format V∗. Assuming that the behavioral characteristics are arranged in a fixed order, we can index a feature by its position, and V∗ is defined as follows:

\[ V^{*} = \{1, 4, 6, \ldots\} \tag{3} \]

In Eq. 3, the positions of the non-zero elements of V are stored in V∗, which saves a great amount of memory space when modeling. Unlike Drebin's high-dimensional vector, V contains only several hundred typical behavioral characteristics, owing to the behavioral characteristics we extract. According to our experiments, an algorithm such as the SVM used in Drebin takes more than one hour to build a model on such high-dimensional vectors.
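The embedding of Eqs. 1 to 3 can be sketched in a few lines. The sketch assumes that the fixed order of F is obtained by sorting the feature names and that positions are counted from zero; the feature names and the two example apps are hypothetical.

```python
def build_vocabulary(apps: dict) -> dict:
    """Fix an order for F: every characteristic seen in any app gets one position."""
    all_features = sorted(set().union(*apps.values()))
    return {feature: index for index, feature in enumerate(all_features)}


def to_dense(app_features, vocabulary: dict) -> list:
    """V: v_i = 1 if the i-th characteristic of F occurs in the app, else 0."""
    vector = [0] * len(vocabulary)
    for feature in app_features:
        index = vocabulary.get(feature)
        if index is not None:  # ignore characteristics outside F
            vector[index] = 1
    return vector


def to_sparse(dense: list) -> list:
    """V*: positions (0-based here) of the non-zero entries of V."""
    return [index for index, value in enumerate(dense) if value]


# Hypothetical apps, each described by its set F^a of characteristics.
apps = {
    "benign_app": {"permission::INTERNET"},
    "suspect_app": {
        "permission::INTERNET",
        "permission::SEND_SMS",
        "intent::BOOT_COMPLETED",
    },
}
vocabulary = build_vocabulary(apps)
dense_benign = to_dense(apps["benign_app"], vocabulary)
sparse_benign = to_sparse(dense_benign)
sparse_suspect = to_sparse(to_dense(apps["suspect_app"], vocabulary))
```

Storing only the positions is worthwhile when each app contains a small fraction of the n characteristics, since the storage then grows with m rather than with n.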

4 Evaluation

4.1 Dataset

We collect a very large data set in order to comprehensively evaluate our methods. The data set consists of two parts: benign apps collected from Google Play and popular third-party Android app markets, and Android malware collected from different sources. For benign apps, we first download the 150 most popular apps (the 150 with the top rating scores) from each of the 31 most popular categories, which cover most of the app categories defined in Google Play, such as Game and Book; this yields 150 × 31 = 4,650 apps. Then, we collect 42,602 Android apps from four popular third-party Android app markets: Anzhi, hiapk, mi, and ZOL. We use VirusTotal to verify these apps and find that 32,931 of them are labeled as benign. We therefore obtain 4,650 + 32,931 = 37,581 benign Android apps. For Android malware, we collect 1,260 malware samples from the Android Malware Genome Project [18] and 5,560 malware samples from the dataset used by Drebin [6]. In total, the Android app dataset contains 37,581 benign Android apps and 6,820 Android malware samples.

4.2 Behavioral Characteristic Set Comparison

In this experiment, we first evaluate the detection accuracy obtained with each single behavioral characteristic set, in order to find the importance of each set in Android malware detection. Table 1 shows the detailed results. From Table 1, we can see that Requested permission (BC2) achieves the best performance among the single behavioral characteristic sets. This result means that requested permissions are an important behavioral characteristic for Android malware detection. However, no single type of behavioral characteristic set achieves good performance on its own, which means that a single type of behavioral characteristic set cannot fully characterize Android malware and cannot yield good detection results.

Table 1. Comparison results of single behavioral characteristic sets (the order of classifier results is SVM/DT/RF)

BC                     Precision           Recall              F-Measure           ROC Area
Code                   0.686/0.697/0.694   0.685/0.695/0.694   0.685/0.69/0.69     0.684/0.725/0.724
Hardware and software  0.563/0.563/0.564   0.548/0.548/0.548   0.411/0.411/0.413   0.507/0.503/0.547
Intent                 0.762/0.839/0.854   0.629/0.827/0.84    0.723/0.823/0.837   0.69/0.879/0.892
Requested permission   0.876/0.916/0.938   0.876/0.916/0.937   0.876/0.916/0.937   0.868/0.938/0.975
Suspicious API         0.752/0.773/0.779   0.753/0.773/0.779   0.752/0.772/0.777   0.749/0.842/0.861
Used permission        0.841/0.853/0.867   0.837/0.853/0.867   0.835/0.852/0.867   0.83/0.917/0.943
Restricted API         0.859/0.901/0.916   0.843/0.873/0.904   0.851/0.887/0.91    0.905/0.915/0.927
Payload                0.755/0.684/0.699   0.555/0.685/0.7     0.407/0.684/0.698   0.512/0.74/0.779
Cert information       0.796/0.754/0.806   0.744/0.759/0.807   0.769/0.756/0.807   0.8/0.805/0.834
String                 0.731/0.751/0.736   0.724/0.74/0.762    0.727/0.745/0.763   0.76/0.65/0.784
Component name         0.84/0.861/0.893    0.839/0.86/0.847    0.839/0.86/0.87     0.851/0.865/0.875

Second, we divide these behavioral characteristic sets into two categories based on the source they are extracted from, as described in Sect. 2, and feed both categories into the three classification models to compare the detection results. Figure 3 shows the detailed results.

(a) Results of configuration-based behavioral characteristic sets (b) Results of dex-based behavioral characteristic sets

Fig. 3. The comparison results of two categories behavioral characteristic sets

From Fig. 3, we can see that the dex-based behavioral characteristic sets achieve better performance than the configuration-based ones. This result suggests that Android malware exhibits more distinctive behaviors in its dex code.
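For reference, the precision, recall, and F-measure used in this comparison follow the usual definitions over true positives (TP), false positives (FP), and false negatives (FN). The sketch below computes them for a single class; the counts are hypothetical and are not taken from our experiments, and the averaging over classes that an evaluation toolkit may apply is not shown.

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Precision, recall, and F-measure (harmonic mean) for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for the malware class.
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
```

With these counts, 90 of the 100 apps flagged as malware are truly malware (precision 0.9), while 90 of the 120 malware samples are found (recall 0.75).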

4.3 Detection Results

In this section, we feed all 11 behavioral characteristic sets into the three classification models, and Fig. 4 shows the detailed results. From this figure, we can see that using all behavioral characteristic sets achieves better results than using a single set or a single category, because all 11 behavioral characteristic sets together cover more behaviors of Android malware.

Fig. 4. Classification results of all 11 behavioral characteristic sets

Moreover, we find that RF achieves the best results among the three classification models, which means that RF is the most suitable classifier for the dataset we use in this work.

5 Related Work

The analysis and detection of Android malware has been an active area of research in the past several years. Several lines of research have been proposed to cope with the growing amount of increasingly sophisticated Android malware. We divide this research into two categories, described as follows.

5.1 Extract Behavioral Characteristics from Android App

Wang et al. [15] extract static behavioral characteristics, such as requested permissions, intents, and components, to characterize Android apps, and use an ensemble learning algorithm to build a classification model to detect Android malware. Feizollah et al. [7] evaluate the effectiveness of Android Intents (explicit and implicit) as a distinguishing behavioral characteristic for identifying Android malware. This work also shows that Intents are a semantically rich behavioral characteristic that is able to encode the intentions of malware when compared to other well-studied features such as permissions. Wu et al. [17] adopt a machine learning approach that uses dataflow application program interfaces (APIs) as classification features to detect Android malware. The authors conduct a thorough analysis to extract dataflow-related API-level features and improve the k-nearest neighbour classification model. Saracino et al. [13] present MADAM, a novel host-based malware detection system for Android devices which simultaneously analyzes and correlates features at four levels (kernel, application, user, and package) to detect and stop malicious behaviors. Chen et al. [5] study the problem of learning and verifying unwanted behaviours abstracted as automata from malware.

5.2 Detection of Android Malware

Milosevic et al. [12] present two machine learning aided approaches for static analysis of Android malware. The first approach is based on permissions and the other is based on source code analysis utilizing a bag-of-words representation model. The permission-based model is computationally inexpensive, and is implemented as a feature of the OWASP Seraphimdroid Android app, which can be obtained from the Google Play Store. Tong et al. [14] first collect execution data of sample malware and benign apps using a net link technology to generate patterns of system calls related to file and network access. Then, the authors build up a malicious pattern set and a normal pattern set by comparing the patterns of malware and benign apps with each other. Finally, they match the patterns of an unknown app against both the malicious and normal pattern sets offline in order to detect Android malware. FlowDroid [3] detects malware by building a precise model of Android's lifecycle, which helps to reduce missed leaks or false positives. Jiang et al. [10] propose a novel multi-channel intelligent attack detection method based on LSTM-RNNs.

6 Conclusions

In this work, we aim to discover and characterize Android app behavioral characteristics, and combine them with three well-known classification models to detect Android malware at a large scale. To achieve this goal, we first extract 11 kinds of static behavioral characteristic sets from Android apps, and categorize them into two groups: configuration-based and dex-based behavioral characteristics. Then, we explain the importance of the extracted behavioral characteristic sets from 9 aspects for the purpose of Android malware detection. Third, we feed the 11 types of behavioral characteristics into three classification models to detect Android malware. Finally, we conduct two kinds of experiments to evaluate the efficiency of our approach, namely a behavioral characteristic comparison experiment and an Android malware detection experiment. The experimental results show that using a single behavioral characteristic set to detect Android malware cannot achieve high accuracy, and that Random Forest obtains the highest detection accuracy among the three classification models.

Acknowledgement. This work is supported by the Science and Technology Projects of Hunan Province (No. 2016JC2074), the Research Foundation of Education Bureau of Hunan Province, China (No. 16B085), the Open Research Fund of Key Laboratory of Network Crime Investigation of Hunan Provincial Colleges (No. 2017WLZC008), the National Science Foundation of China (No. 61471169), the Key Lab of Information Network Security, Ministry of Public Security (No. C16614).

References

1. Mobile malware. http://www.forbes.com/sites/gordonkelly-/2014/03/24/report-97-of-mobile-malware-is-on-android-this-is-the-easy-way-you-stay-safe/
2. Smartphone OS market share, Q2 2016. http://www.idc.com/prodserv/smartphone-os-market-share.jsp
3. Arzt, S., et al.: FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269 (2014)
4. Au, K.W.Y., Zhou, Y.F., Huang, Z., Lie, D.: PScout: analyzing the android permission specification. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 217–228 (2012)
5. Chen, W., Aspinall, D., Gordon, A.D., Sutton, C., Muttik, I.: On robust malware classifiers by verifying unwanted behaviours. In: Ábrahám, E., Huisman, M. (eds.) IFM 2016. LNCS, vol. 9681, pp. 326–341. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-33693-0_21
6. Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K.: DREBIN: effective and explainable detection of android malware in your pocket. In: Network and Distributed System Security Symposium, pp. 23–26 (2014)
7. Feizollah, A., Anuar, N.B., Salleh, R., Suarez-Tangil, G., Furnell, S.: AndroDialysis: analysis of android intent effectiveness in malware detection. Comput. Secur. 65, 121–134 (2017)
8. Felt, A.P., Ha, E., Egelman, S., Haney, A., Chin, E., Wagner, D.: Android permissions: user attention, comprehension, and behavior. In: Proceedings of the Eighth Symposium on Usable Privacy and Security, pp. 1–14 (2012)
9. Felt, A.P., Wang, H.J., Moshchuk, A., Hanna, S., Chin, E.: Permission re-delegation: attacks and defenses. In: Proceedings of the 20th USENIX Conference on Security, pp. 22–22 (2011)
10. Jiang, F., et al.: Deep learning based multi-channel intelligent attack detection for data security, pp. 1–1 (2018)
11. Lu, L., Li, Z., Wu, Z., Lee, W., Jiang, G.: CHEX: statically vetting android apps for component hijacking vulnerabilities. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 229–240 (2012)
12. Milosevic, N., Dehghantanha, A., Choo, K.K.R.: Machine learning aided android malware classification. Comput. Electr. Eng. 61, 266–274 (2017)
13. Saracino, A., Sgandurra, D., Dini, G., Martinelli, F.: MADAM: effective and efficient behavior-based android malware detection and prevention. IEEE Trans. Depend. Secur. Comput. 15(1), 83–97 (2018)
14. Tong, F., Yan, Z.: A hybrid approach of mobile malware detection in android. J. Parallel Distrib. Comput. 103, 22–31 (2017)
15. Wang, W., Li, Y., Wang, X., Liu, J., Zhang, X.: Detecting android malicious apps and categorizing benign apps with ensemble of classifiers. Future Gener. Comput. Syst. 78, 987–994 (2018)
16. Wang, W., Wang, X., Feng, D., Liu, J., Han, Z., Zhang, X.: Exploring permission-induced risk in android applications for malicious application detection. In: IEEE Transactions on Information Forensics and Security, pp. 1869–1882 (2017)
17. Wu, S., Wang, P., Li, X., Zhang, Y.: Effective detection of android malware based on the usage of data flow apis and machine learning. Inf. Softw. Technol. 75, 17–25 (2016)
18. Zhou, Y., Jiang, X.: Dissecting android malware: characterization and evolution. In: S&P, pp. 95–109 (2012)
19. Zhou, Y., Wang, Z., Zhou, W., Jiang, X.: Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: Network and Distributed System Security Symposium, pp. 50–52 (2012)