Mass Discovery of Android Malware Behavioral Characteristics for Detection Consideration
Xin Su1,2,3, Weiqi Shi3, Jiuchuan Lin4 (✉), and Xin Wang5

1 Hunan Provincial Key Laboratory of Network Investigational Technology, Hunan Police Academy, Changsha, China
2 Key Laboratory of Network Crime Investigation of Hunan Provincial Colleges, Hunan Police Academy, Changsha, China
3 Department of Information Technology, Hunan Police Academy, Changsha, China
4 Key Lab of Information Network Security of Ministry of Public Security, The Third Research Institute of Ministry of Public Security, Shanghai, China
[email protected]
5 College of Computer Science and Electronics Engineering, Hunan University, Changsha, China

Abstract. Android malware has surged and grown more sophisticated, posing a great threat to users. The key challenge in detecting Android malware is how to discover its behavioral characteristics at a large scale and use them for detection. In this work, we are motivated to discover the discriminatory features extracted from Android APK files for Android malware detection. To achieve this goal, we first extract a very large number of static features from each Android application (or app). Second, we explain the importance of each kind of feature in Android malware detection. Third, we feed these features into three different classifiers (SVM, DT, and Random Forest) for the detection of Android malware. We conduct extensive experiments on large real-world app sets consisting of 6,820 Android malware samples and 37,581 benign Android apps. The experimental results and our analysis give insights into which discriminatory features are most effective in characterizing Android malware for building an effective and efficient Android malware detection approach.

1 Introduction

The Android platform dominates the smartphone operating system market and has become the main target for mobile malware. According to a report from IDC [2],
Android takes first place again with an 87.6% market share in the second quarter of 2016. However, this rapid deployment and extensive availability of Android apps has made them attractive targets for various malware. Malware authors generally take advantage of the update mechanism of mobile apps to infect existing Android apps with malicious code and thus compromise the security of the smartphone. Recent statistical data show that malware based on the Android platform accounts for 97% of mobile malware [1]. The private data of users, such as the IMEI, contact lists, and other user-specific data, are the primary target of attackers, which poses a serious threat to the security and privacy of Android users. Consequently, there is an urgent need to identify and cope with malware on the Android platform.

© Springer Nature Switzerland AG 2018. X. Sun et al. (Eds.): ICCCS 2018, LNCS 11065, pp. 101–112, 2018. https://doi.org/10.1007/978-3-030-00012-7_10

This ever-growing malware threat has stimulated research into Android app security. Existing work has mainly focused on (i) permission security model analysis [8,16], (ii) app vulnerability mitigation [9,11], and (iii) malware behavior analysis and detection [6,19] based on static or dynamic analysis. As the number of apps and malicious apps (malapps), as well as their variants, increases explosively in the market, it is crucial to discover and precisely characterize the behavior of an app so as to develop methods for Android malware detection at a large scale. The main challenge of characterizing apps is threefold. First, apps running on Android have distinct characteristics compared to traditional desktop software. Second, Android malware is becoming increasingly sophisticated, leveraging legitimate apps and system vulnerabilities to evade detection systems. Third, some unprotected data on smartphones, such as sensor data, can be exploited to steal confidential information.
In order to better characterize the behavior of Android apps for malware classification, in this work we aim to discover discriminatory features of apps. We first extract 11 kinds of feature sets from Android APK files. Then, we explain the importance of each kind of feature to better understand how the features perform differently, how to select features in detection tasks, and when to retrain the classification models. Extensive experiments are conducted with different feature sets and feature groups. First, in order to study the discriminative power of each feature set, we feed them into four classifiers to compare the classification performance. Second, the composition of relevant features selected by the Random Forest classifier is thoroughly analyzed to reveal the most useful features of each feature set for Android malware analysis and detection.

In summary, the main contributions of this work are as follows:

– We explore eleven kinds of Android app feature sets to discover and characterize the behaviors of Android apps based on static analysis. We propose to employ three classifiers, namely linear Support Vector Machine (SVM), J48, and Random Forest (RF), and compare the discriminative power of different feature sets and the performance of different classifiers.
– We analyze the composition of relevant features and discover the usage patterns of features in Android malware. These patterns help to understand the behaviors of malapps and to identify the most suitable features for automated Android malware detection.
– We conduct extensive experiments with very large sets of benign Android apps and malware. The experimental results demonstrate the effectiveness of our methods and models.

The rest of this paper is organized as follows. Section 2 describes the feature sets. Section 3 describes four different classifiers in this study. Section 4 provides a detailed evaluation and results analysis.
Section 5 describes the related work, and Sect. 6 concludes this paper.

2 Behavioral Characteristics Description

As the first step, our approach performs a lightweight static analysis of a given Android app. The behavioral characteristics we extract can be categorized into 11 behavioral characteristic sets (abbreviated as BC). We further divide the 11 sets into 2 types according to the source from which they are extracted. The first type, configuration-based behavioral characteristic sets, is extracted from Android app configuration files, such as AndroidManifest.xml and the rsa file. The second type, dex-based behavioral characteristic sets, is extracted from dex code. Next, we describe these behavioral characteristics in the following 9 aspects.

2.1 Behavioral Characteristic Sets from Configuration File

Every app developed for Android must include a manifest file called AndroidManifest.xml, which provides data supporting the installation and later execution of the app. The information stored in this file can be efficiently retrieved on the device using the Android Asset Packaging Tool, which enables us to extract the following sets:

Component Name (BC1): The majority of Android malapps are repackaged legitimate apps [18], in which the attackers insert the same malicious payload (usually in the form of components) into many different legitimate apps. We include component names as a behavioral characteristic set to capture the component reuse present in both benign apps and malware.

Request Permission (BC2) and Hardware and Software Requirement (BC3): In the Android system, these two behavioral characteristics indicate the demands of apps for system resources. Permission request patterns can characterize an app's intent to access resources. In this paper, we use all the permissions defined by the Android platform and by Android apps.
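As a rough illustration of how the component names (BC1) and requested permissions (BC2) could be collected, the following Python sketch parses a manifest with the standard library. It is not the extraction tool used in this work. It assumes that the manifest has already been decoded from the binary XML stored in the APK into plain text, and the function name `extract_manifest_sets`, the dictionary keys, and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace that qualifies android:* attributes in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Manifest elements that declare app components.
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def extract_manifest_sets(manifest_xml: str) -> dict:
    """Collect component names and requested permissions from a
    plain-text (already decoded) AndroidManifest.xml string.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(manifest_xml)
    name_attr = ANDROID_NS + "name"

    # Requested permissions: <uses-permission android:name="..."/>
    permissions = {
        el.get(name_attr)
        for el in root.iter("uses-permission")
        if el.get(name_attr)
    }

    # Component names: android:name of activities, services,
    # broadcast receivers, and content providers.
    components = {
        el.get(name_attr)
        for tag in COMPONENT_TAGS
        for el in root.iter(tag)
        if el.get(name_attr)
    }

    return {"BC1_components": components, "BC2_permissions": permissions}


# Made-up manifest used only to exercise the sketch.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <uses-permission android:name="android.permission.READ_CONTACTS"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <application>
        <activity android:name=".MainActivity"/>
        <service android:name=".SyncService"/>
    </application>
</manifest>"""

sets = extract_manifest_sets(SAMPLE_MANIFEST)
print(sorted(sets["BC1_components"]))
print(sorted(sets["BC2_permissions"]))
```

Each set is kept as a collection of strings, so that sets from many apps can later be merged into a common feature vocabulary. How the features are actually encoded for the classifiers is not specified here and is left open in this sketch.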
Moreover, Android apps signal their hardware and software requirements to devices in their manifest files with <uses-feature> elements. We thus extract the hardware and software feature descriptors defined in the Android documentation as the third behavioral characteristic set.

Filter Intent (BC4): The Android platform uses an intent as a messaging object that an app or the platform can send to another app's component to request an action or process. Android malware often declares an intent filter to receive specific system events, e.g., BOOT_COMPLETED, in order to activate malicious activity. In this work, we extract all the intent filters in the manifest files of the samples as a feature set.

Besides the AndroidManifest.xml file, the rsa file is another configuration file of an Android app. We also extract one behavioral characteristic set from this file, as described below.

Certificate Information (BC5): App developers must sign their APK files with a certificate, the private key of which is held by themselves. This certificate helps to distinguish one developer from others. Developer information such as the country, email address, organization, and state or province, as well as the SHA-1 thumbprint, can be extracted from the certificate.

2.2 Behavioral Characteristic Sets from Dex File

Android apps are developed in Java and compiled into optimized bytecode for the Dalvik virtual machine. This bytecode can be efficiently disassembled and provides our approach with information about the API calls and data used in an app. To achieve a low run-time, we implement a lightweight disassembler based on the dex libraries of the Android platform that can output all API calls and strings contained in an app. We use this information to construct the following behavioral