2016 International Conference on Mathematical, Computational and Statistical Sciences and Engineering (MCSSE 2016) ISBN: 978-1-60595-396-0

A Method to Detect Malware Based on Behavior Using Formal Concept Analysis

Shao-ming CHEN*, Yi-yang WANG and Bin LIANG

Guangdong branch of National Computer Network Emergency Response Technical Team/Coordination Center of China, No. 4 middle road, Tianhe District, Guangzhou, China

*Corresponding author

Keywords: Malware detection, App’s behavior, Permissions, FCA.

Abstract. Malware threats have recently become a real concern. To address this problem, we propose a new approach in this paper. The method analyzes the usage history of apps and constructs a formal concept lattice from the permissions that the apps use. The concepts of the lattice are then compared with the permissions that a new application requests before installation. In this way we can find an optimal concept, identify malware, and inform users about the risk of apps that are about to be installed. An experiment illustrates that our method can effectively identify malicious apps and protect the user's information security.

Introduction

Android is one of the most popular platforms for phones today, and the majority of these devices are unprotected. It is not surprising that the majority of malicious mobile attacks are designed for the Android mobile operating system [1]. Malware is a generic term that refers to any code added, changed, or removed from a software system to intentionally cause harm or subvert the system's intended function. It compromises a system's security, damages a system, or obtains sensitive information without the user's permission [2].

Malware can be detected statically or dynamically. Detecting malware without actually running it is called static analysis. Detecting malware by executing it and observing its behaviour is called dynamic analysis.

In our previous paper we presented a method called BIA for batch analysis of an application's initial permissions and the permissions it applies for when used. In this paper we present a method called BOAF, which uses Formal Concept Analysis to analyze the permissions an application applies for when used.

Related Works

In recent years, many techniques and schemes have been proposed to address the growing problem of Android malware. We briefly discuss malware detection techniques here.

Yang et al. [3] identified privacy leakage based on whether a sensitive data transmission is intended by the user. Their research systematically studied means of distinguishing user-intended from unintended sensitive transmissions and thus provides a useful automated tool for identifying legitimate transmissions.

Suleiman Y. et al. proposed an approach to detecting malware based on Bayesian classification models obtained from static code analysis. The strategy leverages the applications' reliance on the platform APIs and their structured packaging to extract certain properties. These properties then form the basis of a Bayesian classifier, which is used to determine whether an Android app is suspicious [4]. In practice, reverse engineering an app is hard work.

Peng et al. [5] used probabilistic learning methods to calculate risk scores from the permissions requested by an Android app and identified a hierarchical mixture of naive Bayes models as the best classifier for the detection task.

Bayer et al. presented a method in which a binary is run in the PC emulator Qemu and its security-relevant activities are monitored by analyzing Windows native calls or API calls. The method does not modify the binary, so as to avoid detection by the malware, and uses hooks and breakpoints implanted in the relevant API and native libraries [6]. The method destroys the integrity of the operating system or the integrity of the malicious program, and it can be used for testing the integrity of software or malicious programs.

Elish et al. [7] used a highly accurate classification method. They proposed a scheme that statically extracts a property called user-trigger dependence as a classification feature. This property includes data flow features related to the manner in which user inputs trigger sensitive API invocations.

The paper is organized as follows. Section 3 describes the proposed malware detection method. Section 4 presents the experimental results obtained with the proposed method. Section 5 contains conclusions and directions for further research.

Detection of Android Malware Based on Behavior Using Formal Concept Analysis (BOAF)

Android

The Android operating system is built on the Linux kernel and is developed by Google. Android has a layered architecture consisting of the Linux kernel layer, a middle layer and the application layer; each layer provides consistent services to the layer above it and masks the differences of the current and lower layers [8]. The main part of the Android security model is the permission mechanism. The permission mechanism limits applications' access to the user's private data (e.g., telephone numbers and contacts), resources (e.g., log files) and system interfaces (e.g., Internet and GPS). In the permission mechanism, the phone's resources are organized into categories, and each category corresponds to one kind of accessed resource [9].

Formal Concept Analysis

Formal Concept Analysis (FCA) is a mathematical method for analyzing binary relations. It is a powerful tool for analyzing data and extracting knowledge from a formal context by means of a concept lattice. The concept lattice was first introduced by Wille in 1982 [10] and is established on the theoretical basis of FCA. In FCA, data are structured into formal concepts, which form a concept lattice ordered by a subconcept–superconcept relation. At present, FCA has been extensively applied in several areas.

Definition 3.1 A formal context is an ordered triple FC = (G, M, I), where G and M are finite nonempty sets and I ⊆ G × M is a binary relation. The elements of G are interpreted as objects and the elements of M as attributes. If (g, m) ∈ I, the object g is said to have the attribute m.

Definition 3.2 For A ⊆ G and B ⊆ M, let A↑ = {m ∈ M | (g, m) ∈ I for all g ∈ A} be the set of attributes shared by all objects in A, and let B↓ = {g ∈ G | (g, m) ∈ I for all m ∈ B} be the set of objects that have all attributes in B. A formal concept of a context FC = (G, M, I) is a pair (A, B) ∈ P(G) × P(M) such that A↑ = B and B↓ = A. The set A is called the formal concept's extension and the set B is called the formal concept's intension.

The Method of BOAF

Our method is based on the authors' previous paper and extends that research. It proceeds as follows:

(1) For a given class of applications, we take the permissions of the applications in the class as the permission set P.
(2) We take the frequency with which each permission in the permission set P is used by the given class of applications as its support degree S:

S(p_i) = c / n    (1)

where c is the frequency of permission p_i and n is the total number of applications.

(3) The permissions are ranked by support degree, and the top-ranked permissions are selected to represent this class of applications.

(4) We calculate the permissions of g categories of apps. The g categories of apps are interpreted as objects and the total set of permissions as attributes, so we can build a formal context and a concept lattice.


(5) Each concept of the concept lattice is compared with the permissions that a new application requests before installation. If the requested permissions are consistent with the attributes of a concept, then the more objects the concept contains, the more common these permissions are, because they are used by many classes of apps; in that case we consider the new app secure. In this way we can detect whether the application is malicious and let the user decide whether to install it.

Figure 1 shows the structure of the method proposed in this paper.

Figure 1. The structure of the method.
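Steps (4) and (5) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the context `CONTEXT` is a hypothetical four-class example (it is not the data of this paper), the function names are invented for illustration, and concepts are enumerated by brute force over all subsets of objects, which is practical only for a small number of app classes. The default threshold follows the rule used in the validation, where a concept with more than 2 objects is taken to indicate a safe app.

```python
from itertools import combinations

# Hypothetical formal context: app class (object) -> representative permissions
# (attributes). Illustrative values only, not taken from the paper's tables.
CONTEXT = {
    "music":  {"INTERNET", "CHANGE_NETWORK_STATE", "WRITE_EXTERNAL_STORAGE"},
    "social": {"INTERNET", "CHANGE_NETWORK_STATE", "READ_CONTACTS"},
    "news":   {"INTERNET", "CHANGE_NETWORK_STATE"},
    "office": {"INTERNET", "WRITE_EXTERNAL_STORAGE"},
}


def intent(objects, context):
    """A-up: the attributes shared by every object in `objects`."""
    shared = set().union(*context.values())
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def extent(attrs, context):
    """B-down: the objects that have every attribute in `attrs`."""
    return frozenset(g for g, has in context.items() if set(attrs) <= has)


def concepts(context):
    """All formal concepts (A, B), found by closing every subset of objects."""
    found = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            b = intent(subset, context)
            found.add((extent(b, context), b))
    return found


def support_for(requested, context):
    """Size of the largest extent among concepts whose intent covers `requested`."""
    sizes = [len(a) for a, b in concepts(context) if set(requested) <= b]
    return max(sizes, default=0)


def is_considered_safe(requested, context, more_than=2):
    """Assumed decision rule: safe if a covering concept has more than `more_than` objects."""
    return support_for(requested, context) > more_than


if __name__ == "__main__":
    for a, b in sorted(concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(a), sorted(b))
    print(is_considered_safe({"INTERNET", "CHANGE_NETWORK_STATE"}, CONTEXT))
    print(is_considered_safe({"INTERNET", "READ_CONTACTS"}, CONTEXT))
```

In this toy context, a request for INTERNET and CHANGE_NETWORK_STATE is covered by a concept with three app classes and is accepted, while a request for INTERNET and READ_CONTACTS is covered only by the concept of a single class and is flagged.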

Validation

This experiment extends the BIA experiment. The procedure can be summarized in the following steps:

(1) We use 10 dangerous application permissions that can easily be exploited by malicious programs: CHANGE_NETWORK_STATE, DELETE_CACHE_FILE, INTERNET, READ_CONTACTS, SEND_SMS, READ_SMS, WRITE_EXTERNAL_STORAGE, ACCESS_FINE_LOCATION, CALL_PHONE and READ_OWNER_DATA.

(2) We choose 10 apps from a secure and reliable application store: Kugou music, QQ music, Netease music, Xiami music, Baidu music, Kuwo music, Duomi music, Migu music, Love music and Love 4G. We use "P1" to "P10" to denote the permissions above in the order listed, and "*" indicates that the application applies for the permission. We record the permissions that the 10 applications request while running during one week; they are shown in Table 1.

Table 1. The permissions of the music apps when used.

App P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
Kugou * * *
QQ * * *
Netease * * *
Xiami * * *
Baidu * * * * *
Kuwo * * * *
Duomi * * * * * *
Migu * * * *
Love * * * *
Love 4G * * * * *
Total 10 1 10 5 5 4 2 0 1 2

We calculate the support degrees S(p_i):

S(p_1) = 10/10 = 1, S(p_2) = 1/10 = 0.1, S(p_3) = 10/10 = 1, S(p_4) = 5/10 = 0.5, S(p_5) = 5/10 = 0.5, S(p_6) = 4/10 = 0.4, S(p_7) = 2/10 = 0.2, S(p_8) = 0/10 = 0, S(p_9) = 1/10 = 0.1, S(p_10) = 2/10 = 0.2.
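The support degrees above, and the ranking of step (3) of the method, can be reproduced with a few lines of Python. This is a minimal sketch: the counts are the "Total" row of Table 1, while the function names and the tie-breaking rule (ties keep the P1..P10 order) are assumptions made for illustration, since the paper does not say how permissions with equal support are ordered.

```python
# Usage counts per permission, copied from the "Total" row of Table 1.
TOTALS = {"P1": 10, "P2": 1, "P3": 10, "P4": 5, "P5": 5,
          "P6": 4, "P7": 2, "P8": 0, "P9": 1, "P10": 2}
N_APPS = 10  # number of music apps observed


def support_degrees(counts, n_apps):
    """Equation (1): S(p_i) = c / n for every permission."""
    return {p: c / n_apps for p, c in counts.items()}


def top_permissions(support, k):
    """Rank by descending support; sorted() is stable, so ties keep
    the original P1..P10 order (an assumed tie-breaking rule)."""
    ranked = sorted(support, key=lambda p: -support[p])
    return ranked[:k]


if __name__ == "__main__":
    support = support_degrees(TOTALS, N_APPS)
    print(support)
    print(top_permissions(support, 8))
```

Note that P2 and P9 both have support 0.1, so which of them falls inside a top-8 cut depends entirely on the tie-breaking rule chosen.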


The permissions are ranked and the top 8 permissions are selected to represent this class of applications. We calculate the permissions of 10 categories of apps and build a formal context and a concept lattice. We use "a" to "j" to denote the 10 classes of apps: music, social, office, news, shooting, financial, medical, game, sports and shopping, respectively. They are shown in Table 2 and Figure 2.

Table 2. Context of the 10 classes of apps.

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
1 * * * * * * * *
2 * * * * * * * *
3 * * * * * * * *
4 * * * * * * * *
5 * * * * * * * *
6 * * * * * * * *
7 * * * * * * * *
8 * * * * * * * *
9 * * * * * * * *
10 * * * * * * * *

Figure 2. The concept lattice of the 10 classes of apps.

Each concept is compared with the permissions that a new application requests before installation. If the requested permissions are included in the attributes of a concept, then the more objects the concept contains, the more secure the new application is; we consider the new application safe when the concept contains more than 2 objects. Otherwise, we consider the application risky, that is, a malicious program.

We choose 10 categories of applications, including music, with 10 applications in each category. We compare the methods of Bayer [6] and Willems [11], BIA, and this approach; the experiment shows that the proposed method detects malicious programs effectively. The results are shown in Table 3, Table 4 and Figure 3.

Table 3. Detection of malware among 10 music apps.

NewApp      Used our method   In fact
Poweramp    safe              safe
Kugou       safe              safe
QQ          malicious         safe
            malicious         malicious
Guitar      safe              safe
Violin      safe              safe
Tianlai     safe              safe
DJ          safe              safe
Yoyo        malicious         malicious
            safe              safe

Accuracy rate: 90%
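The 90% accuracy rate follows directly from the ten verdict pairs of Table 3: nine agree and one (QQ, flagged as malicious although it is in fact safe) does not. A small Python check, with hypothetical function names and the verdicts listed in table order:

```python
# (verdict of the method, actual status) for the 10 rows of Table 3, in order.
VERDICTS = [
    ("safe", "safe"), ("safe", "safe"), ("malicious", "safe"),
    ("malicious", "malicious"), ("safe", "safe"), ("safe", "safe"),
    ("safe", "safe"), ("safe", "safe"), ("malicious", "malicious"),
    ("safe", "safe"),
]


def accuracy(pairs):
    """Fraction of apps whose predicted verdict matches the actual status."""
    correct = sum(1 for predicted, actual in pairs if predicted == actual)
    return correct / len(pairs)


def false_alarms(pairs):
    """Number of safe apps that were flagged as malicious."""
    return sum(1 for predicted, actual in pairs
               if predicted == "malicious" and actual == "safe")


if __name__ == "__main__":
    print(accuracy(VERDICTS))
    print(false_alarms(VERDICTS))
```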


Table 4. Detection of malware in 10 classes of apps.

Categories   By Bayer   By Willems   By BIA   By ours
music        80%        80%          90%      90%
social       100%       100%         100%     100%
office       70%        60%          80%      90%
news         90%        90%          90%      90%
shooting     80%        90%          90%      90%
financial    90%        80%          100%     100%
medical      80%        80%          80%      80%
game         60%        70%          70%      80%
sports       70%        80%          90%      90%
shopping     80%        80%          80%      90%

Figure 3. Experimental result.

Conclusions

A method called BOAF, which uses Formal Concept Analysis for batch analysis of the permissions an application applies for when used, has been presented. The experimental results are encouraging. This research contributes to the improvement of malware detection in several ways. First, we selected apps from a safe store and extracted the permissions that the apps require and use; these permissions can accurately characterize secure programs. Second, we demonstrated that records of normal user usage can greatly improve malware detection. Third, we use the mathematical method of FCA. Further research will concentrate on the automation of our approach.

References

[1] Paul McNeil et al., SCREDENT: Scalable Real-time Anomalies Detection and Notification of Targeted Malware in Mobile Devices, Procedia Computer Science 83 (2016) 1219-1225.

[2] Aya Hellal et al., Minimal contrast frequent pattern mining for malware detection, Computers & Security 62 (2016) 19-32.

[3] Z. Yang, M. Yang, Y. Zhang, G. Gu, P. Ning, X.S. Wang, Appintent: Analyzing sensitive data transmission in android for privacy leakage detection, in: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS '13, ACM, New York, USA, 2013, pp. 1043-1054.


[4] Shaikh Bushra Almin and Madhumita Chatterjee, A Novel Approach to Detect Android Malware, Procedia Computer Science 45 (2015) 407-417.

[5] H. Peng, et al., Using probabilistic generative models for ranking risks of android apps, in: Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS '12, ACM, New York, USA, 2012, pp. 241-252.

[6] U. Bayer, A. Moser, C. Krugel and E. Kirda, Dynamic Analysis of Malicious Code, Journal in Computer Virology, vol. 2(1), pp. 67-77, (2006).

[7] K.O. Elish, X. Shu, D.D. Yao, B.G. Ryder, X. Jiang, Profiling user-trigger dependence for android malware detection, Computers & Security 49 (2015) 255-273.

[8] Google, Android Home Page, 2009. (http://www.android.com)

[9] Google, Android Security and Permissions, 2013. (http://d.android.com/guide/topics/security.html)

[10] R. Wille, Restructuring lattice theory: an approach based on hierarchies of concepts, in: I. Rival (Ed.), Ordered Sets, Reidel, Dordrecht, Boston, 1982, pp. 445–470.

[11] C. Willems, T. Holz and F. Freiling, Toward Automated Dynamic Malware Analysis using CWSandbox, IEEE Security and Privacy, vol. 5(2), pp. 32–39, (2007).
