This is a repository copy of Intelligent OS X malware threat detection with code inspection.

White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/128371/

Version: Published Version

Article: Pajouh, H.H., Dehghantanha, A. orcid.org/0000-0002-9294-7554, Khayami, R. et al. (1 more author) (2018) Intelligent OS X malware threat detection with code inspection. Journal of Computer Virology and Hacking Techniques, 14 (3). pp. 213-223. ISSN 2274-2042 https://doi.org/10.1007/s11416-017-0307-5

Reuse This article is distributed under the terms of the Creative Commons Attribution (CC BY) licence. This licence allows you to distribute, remix, tweak, and build upon the work, even commercially, as long as you credit the authors for the original work. More information and the full terms of the licence here: https://creativecommons.org/licenses/

Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request.

[email protected]
https://eprints.whiterose.ac.uk/

J Comput Virol Hack Tech
DOI 10.1007/s11416-017-0307-5

ORIGINAL PAPER

Intelligent OS X malware threat detection with code inspection

Hamed Haddad Pajouh1 · Ali Dehghantanha2 · Raouf Khayami1 · Kim-Kwang Raymond Choo3,4

Received: 31 July 2017 / Accepted: 27 September 2017 © The Author(s) 2017. This article is an open access publication

Abstract  With the increasing market share of the Mac OS X operating system, there is a corresponding increase in the number of malicious programs (malware) designed to exploit vulnerabilities on Mac OS X platforms. However, existing manual and heuristic OS X malware detection techniques are not capable of coping with such a high rate of malware. While machine learning techniques offer promising results in automated detection of Windows and Android malware, there have been limited efforts in extending them to OS X malware detection. In this paper, we propose a supervised machine learning model. The model applies a kernel-based Support Vector Machine and a novel weighting measure based on application calls to detect OS X malware. For training and evaluating the model, a dataset combining 152 malware and 450 benign samples was created. Using common supervised machine learning algorithms on this dataset, we obtain over 91% detection accuracy with a 3.9% false alarm rate. We also utilize the Synthetic Minority Over-sampling Technique (SMOTE) to create three synthetic datasets with different distributions, based on the refined version of the collected dataset, to investigate the impact of different sample sizes on the accuracy of malware detection. Using the SMOTE datasets we achieve over 96% detection accuracy with a false alarm rate of less than 4%. All malware classification experiments are tested using the cross-validation technique. Our results show that increasing the sample size in the synthetic datasets has a direct positive effect on detection accuracy, while it increases the false alarm rate compared to the original dataset.

Keywords  OS X malware detection · RBF–SVM · Mach-O · Supervised classification · Cyber threat intelligence

Corresponding author: Ali Dehghantanha, [email protected]
Hamed Haddad Pajouh, [email protected]
Raouf Khayami, [email protected]
Kim-Kwang Raymond Choo, [email protected]

1 Department of Computer Engineering and Information Technology, Shiraz University of Technology, Shiraz, Iran
2 School of Computing, Science and Engineering, University of Salford, Salford, UK
3 Department of Information Systems and Cyber Security, The University of Texas at San Antonio, San Antonio, TX 78249, USA
4 School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA 5095, Australia

1 Introduction

Malicious software (malware) is a serious threat to the security of computing systems [1,2]. Kaspersky Labs alone detected more than 121,262,075 unique malware samples in 2015 [3], Panda Labs predicted that half of all security issues are directly related to malware infections [4], and McAfee reported a 744% rise in OS X malware in 2016 over 2015 [5]. The increasing Mac OS X market share (second after Microsoft Windows [6]) and its fast adoption rate motivate cyber threat actors to shift their focus to developing OS X malware. The "myth" that OS X is a more secure system only further increases malware success rates. For example, the OS X Flashback Trojan successfully infected over 700,000 machines in 2012 [7]. Security researchers have developed a wide range of anti-malware tools and malware detection techniques in their battle against the ever increasing malware and potentially malicious programs, including approaches based on supervised and unsupervised machine learning techniques for malware detection [7].


Fig. 1 Research methodology

In approaches using supervised techniques, tagged datasets of malicious and benign programs are required for training. Approaches using unsupervised techniques generally do not require the separation of malware and goodware, and programs are generally classified based on observable similarities or differences [8].

While there have been promising results on the use of machine learning in Windows and Android malware detection [9,10], there has been no prior work on using machine learning for OS X malware detection. This could be, perhaps, due to the lack of a suitable research dataset and the difficulties in collecting OS X malware.

In this paper, we propose a machine learning model to detect OS X malware based on the Radial Basis Function (RBF) kernel in the SVM technique. This provides us a novel measure based on an application's library calls to detect malware from benign samples. We then propose a new weighting measure for classifying OS X goodware and malware based on the frequency of library calling. This measure weights each library based on its frequency of occurrence in malware and benign applications.

These datasets are then evaluated using five main classification techniques, namely: Naïve Bayes, Bayesian Net, Multi Layer Perceptron (MLP), Decision Tree-J48, and Weighted Radial Basis Function Kernels-Based Support Vector Machine (Weighted-RBFSVM). The following performance indicators are used for evaluating the performance of our machine learning classifiers:

True Positive (TP): the ratio of goodware correctly classified as benign;
True Negative (TN): the ratio of malware correctly detected as malware;
False Positive (FP): the ratio of malware files identified as benign; and
False Negative (FN): the ratio of goodware classified as malware.

Accuracy (ACC) measures the ratio at which a classifier correctly detects malware and benign samples (goodware), and is computed using the following formula:

ACC = (TP + TN) / (FN + TP + FP + TN)   (1)

The False Alarm Rate (FAR) is the rate at which a classifier wrongly detects goodware as malware, and is computed as:

FAR = FP / (FP + TN)   (2)

Our research methodology is presented in Fig. 1. The organization of this paper is as follows. Section 2 discusses related research, and Sect. 3 describes our dataset development. Sections 4 and 5 present our malware classification and a discussion of this work, respectively. Finally, we conclude in the last section.
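For concreteness, here is a minimal sketch (ours, not from the paper) of computing Eqs. (1) and (2) from a confusion matrix in Python; the label convention (1 = malware, 0 = benign) is an assumption:

```python
from sklearn.metrics import confusion_matrix

def acc_far(y_true, y_pred):
    """Compute accuracy (Eq. 1) and false alarm rate (Eq. 2).

    Assumed label convention: 1 = malware, 0 = benign (goodware).
    """
    # confusion_matrix with labels=[0, 1] returns [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # Eq. (1)
    far = fp / (fp + tn)                    # Eq. (2): goodware flagged as malware
    return acc, far

# Toy example: 4 benign and 2 malware samples, one benign misclassified
print(acc_far([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]))  # (0.833..., 0.25)
```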
2 Literature review

Machine learning techniques have been used for malware detection. Nauman et al. [11] used game-theoretic rough sets (GTRS) and information-theoretic rough sets (ITRS) to show that a three-way decision-making approach (acceptance, rejection and deferment) outperforms two-way (accept, reject) decision-making techniques in network flow analysis for Windows malware detection. Fattori et al. [12] developed an unsupervised system-centric behavioral Windows malware detection model with reportedly 90% accuracy. Their approach monitors interactions between applications and the underlying Windows operating system to classify malicious applications. Mohaisen et al. [13] proposed an unsupervised behavioral-based (dynamic) Windows malware classification technique that monitors file system and memory interactions, achieving more than 98% precision. Huda et al. [14] proposed a hybrid framework for malware detection based on program interactions with the Windows Application Program Interface (API), using Support Vector Machine (SVM) wrappers and statistical measures, and obtained over 96% detection accuracy. Nissim et al. [15] proposed an SVM-based Active Learning framework to detect novel Windows malware using supervised learning, with an average accuracy of 97%. Damodaran et al. [16] utilized Hidden Markov Models (HMMs) to trace APIs and opcodes of Windows malware sequences and developed a fully dynamic approach for malware detection based on API calls, with over 90% accuracy. Mangialardo and Duarte [17] proposed a hybrid supervised machine learning model using C5.0 and Random Forests (RF) algorithms, with an accuracy of 93.00% for detecting Linux malware.

Due to the increasing use of smart devices such as Android and iOS devices, there has been a corresponding increase in the number of Android and iOS malware [18–20]. Suarez-Tangil et al. [21], for example, proposed an Android malware detection model. Yerima et al. [22] utilized ensemble learning techniques for Android malware detection and reportedly achieved an accuracy rate between 97.33 and 99%, with a relatively low false alarm rate (less than 3%). Saracino et al. [23] designed a host-based Android malware detection system called MADAM, which was evaluated using real-world apps.

OS X malware has also been on the increase [24], but there is limited published research in OS X malware analysis and detection. For example, a small number of researchers have developed OS X malware and rootkit detection techniques, and malware detectors that trace suspicious activities in memory (such as unwanted access, read, write and execute) [25–27]. However, applying machine learning to detect OS X malware is limited to the Walkup approach [28], which utilized Information Gain (IG) to select effective features for supervised classification of OS X malware. Hence, the development of machine learning techniques for OS X malware detection is the gap that this paper seeks to contribute to.

3 Dataset development

As part of this research, we collected 152 malware samples from [29–31]. These samples were collected between Jan 2012 and June 2016; thus, the OS versions which can run them are: OS X 10.8 (Mountain Lion), 10.9 (Mavericks), 10.10 (Yosemite) and 10.11 (El Capitan). Duplicated samples were detected by performing a SHA-256 hash comparison and removed from the datasets. Known OS X malware such as WireLurker, MacVX, LaoShu, and Kitmos are among the malware in our dataset. Similar to previous datasets, such as those of Masud et al. [32], in order to build a non-biased dataset for detecting malware as anomalous samples, we need at least 456 goodware samples (three times the number of malware samples) in our datasets.

To explain how the dataset was collected, we first present an overall definition of each Mac OS X application in Fig. 2. As can be seen, extracting an OS X application bundle usually reveals a directory named Contents. This directory contains several files and components, as follows [33]:

Contents: This directory is the main part of each application bundle and contains several directories and files, introduced below.
info.plist: This file contains the configuration information for the application. The Mac operating system relies on the presence of info.plist to obtain related information about the application and other relevant files.
MacOS: Contains the application's executable code file (Mach-O). Usually, this directory contains only a binary file with the application's main entry point and statically linked code.
Resources: Contains all resource files of the application, i.e. pictures, audio, video, etc.
Framework: Contains all private shared libraries of the application and the frameworks used by the executable code.
PlugIns: Contains all loadable files and libraries which extend the application's features and capabilities.
SharedSupport: Contains all non-critical resources which do not extend the application's capabilities.

Fig. 2 MacOS application bundle structure
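As an illustration of the de-duplication step, a minimal sketch (ours, not the authors' tooling) that keeps one representative per SHA-256 digest; the directory path is hypothetical:

```python
import hashlib
from pathlib import Path

def dedup_sha256(sample_dir: str) -> dict:
    """Map each unique SHA-256 digest to one representative file,
    dropping byte-identical duplicate samples."""
    unique = {}
    for path in Path(sample_dir).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        unique.setdefault(digest, path)   # keep the first occurrence only
    return unique

samples = dedup_sha256("./samples/malware")   # hypothetical directory
print(f"{len(samples)} unique binaries")
```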


Fig. 3 The process of dataset development

Therefore, we randomly downloaded a total of 460 apps from the top 100 apps listed in the Utilities, Social Network, Weather, Video and Audio, Productivity, Health and Fitness, and Network categories of the Apple App Store [34] as of Jun 2016. The dominance of benign samples in the collected dataset was intended to obtain desirable False Alarm rates, by training the classifier with more goodware and detecting anomalies from them, just like the real-world benchmark datasets on anomaly detection provided in [35–37]. We then manually extracted the Mach-O binaries of all malware and benignware samples in the respective datasets. Mach-O binaries are the executable portion of an OS X application [38] and consist of three sections, as follows (see also Fig. 3):

1. Header: contains common information about the binary, such as byte order (magic number), CPU type, and number of load commands.
2. Load Commands: contains information about the logical structure of an executable file and the data stored in virtual memory, such as the symbol table, dynamic symbol table, etc.
3. Segments: the biggest part of each Mach-O file, which contains the application code and data.

We wrote a Python script [39] to extract features from Mach-O files (Table 1). Our script parsed each Mach-O binary and created three separate output files, as follows:

Mach-O HD: contains all Mach-O header information, such as CPU type, number of commands, and size of commands.
Mach-O LC: includes all information about library import/export, the symbol table and string functions.
Mach-O SG: provides the raw data of the three Mach-O file sections (i.e. Data, Text and Segment) (Table 1).

3.1 Data preprocessing

Similar to many other malware machine learning datasets, our datasets include several features with missing values; thus, we utilized the K-Nearest Neighbor (KNN) imputation technique [40] for the estimation of missing values. The imputation is performed in two steps, as follows:

• Utilizing the Euclidean distance for computing the distance between each sample with a missing value (i.e. x_i) and all other samples without a missing value, to detect the K nearest samples.
• Imputing the missing value of x_i by computing the average value of the K nearest samples.

Since the extracted feature values are in different ranges, a normalization technique is used to increase the SVM performance. As all extracted features are Integer values (except Library Name), Eq. (3) can be used to convert them to the [0, 1] interval:

x_N = (x_i − min{feature_d}) / range_d,  where range_d = max{feature_d} − min{feature_d}   (3)

In Eq. (3), x_N and x_i denote the respective normalized value and raw extracted value of the feature in the dth dimension. Figure 4 shows the overlap of the collected datasets between two feature vectors belonging to the malicious and benign classes, before and after preprocessing. It is clear that there are minimal overlaps and the class borders are more distinct.
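A minimal preprocessing sketch under our assumptions, using scikit-learn's KNNImputer as a stand-in for the KNN imputation of [40], followed by Eq. (3) min-max scaling; the feature matrix is hypothetical:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical integer feature matrix (rows = samples); NaN marks a missing value
X = np.array([[12.0, 3500.0, np.nan],
              [ 9.0, 2100.0, 4.0],
              [15.0, np.nan, 7.0],
              [11.0, 2800.0, 5.0]])

# Step 1: KNN imputation -- Euclidean distance to the K complete samples,
# with the missing entry replaced by the neighbours' average
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)

# Step 2: Eq. (3) min-max normalization of each dimension to [0, 1]
X_normalized = MinMaxScaler().fit_transform(X_imputed)
print(X_normalized)
```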


Table 1 OS X dataset features

  #   Feature name     Description                                                         Value type  File
  1.  ncmds            Number of commands of each sample                                   Integer     Mach-O HD
  2.  sizeofcmds       Size of commands of each sample                                     Integer     Mach-O HD
  3.  noloadcmd        Number of commands which the sample will load during execution      Integer     Mach-O LC
  4.  rebase_size      Size of the rebase information                                      Integer     Mach-O LC
  5.  bind_size        Size of the information which will be bound during execution        Integer     Mach-O LC
  6.  lazy_bind_size   Size of the lazy binding information                                Integer     Mach-O LC
  7.  export_size      Size of the export information                                      Integer     Mach-O LC
  8.  nsyms            Number of symbol table entries                                      Integer     Mach-O LC
  9.  strsize          String table size in bytes                                          Integer     Mach-O LC
  10. LoadDYLIB        Number of DYLIBs called and loaded during execution of the sample   Integer     Mach-O LC
  11. DYLIBnames       Names of the loaded DYLIBs                                          Nominal     Mach-O LC
  12. Segments         Number of total segments contained in each sample                   Integer     Mach-O SG
  13. SectionsTEXT     Number of text sections contained in each sample                    Integer     Mach-O SG
  14. SectionsData     Number of data sections contained in each sample                    Integer     Mach-O SG

  Feature descriptions are adopted from the Apple developer guidelines (Mach-O programming topics) [38]
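Several of these features map directly onto Mach-O header and load-command fields. As an illustration, a sketch using the third-party macholib package (our choice; the authors' extraction script [39] may differ) to read a few of them; the sample path is hypothetical:

```python
from macholib.MachO import MachO
from macholib.mach_o import LC_LOAD_DYLIB

def header_features(path):
    """Extract a few Table 1 style features from a Mach-O binary."""
    binary = MachO(path)
    hdr = binary.headers[0]              # first architecture slice
    dylibs = []
    for load_cmd, cmd, data in hdr.commands:
        if load_cmd.cmd == LC_LOAD_DYLIB:
            # the trailing bytes of a dylib command hold the library path
            dylibs.append(data.rstrip(b"\x00").decode("utf-8", "replace"))
    return {
        "ncmds": hdr.header.ncmds,           # number of load commands
        "sizeofcmds": hdr.header.sizeofcmds, # total size of load commands
        "LoadDYLIB": len(dylibs),
        "DYLIBnames": dylibs,
    }

print(header_features("Sample.app/Contents/MacOS/Sample"))  # hypothetical path
```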

3.1.1 Feature selection

Feature selection techniques are used to find the most relevant attributes for classification. At this stage, three feature selection techniques commonly used for malware detection based on code inspection [41,42] were applied: Information Gain, Chi-Square and Principal Component Analysis.

Information Gain (IG) [43] is a technique used to evaluate attributes to find an optimum separation in classification, based on the mutual dependencies of labels and attributes. Suppose we have m class labels (for binary classification m = 2), with c_i the ith class and t the attribute dimension to be evaluated; the IG scores can then be obtained using Eq. (4):

G(t) = −Σ_{i=1}^{m} Pr(c_i) log Pr(c_i)
     + Pr(t) Σ_{i=1}^{m} Pr(c_i|t) log Pr(c_i|t)
     + Pr(t̄) Σ_{i=1}^{m} Pr(c_i|t̄) log Pr(c_i|t̄),
IG = G(t) − G(t_i)   (4)

Chi-Square measures the lack of independence between attributes [44]. The method calculates the χ²_avg(t) score function for attributes as per Eqs. (5) and (6), where N is the sample size, A is the frequency of co-occurrence of t and c together, B is the frequency of occurrence of t without c, C is the number of times c occurs without t, and D is the frequency without the occurrence of either t or c:

χ²(t, c) = N × (AD − CB)² / ((A + C) × (B + D) × (A + B) × (C + D))   (5)

χ²_avg(t) = Σ_{i} Pr(c_i) χ²(t, c_i)   (6)

Principal Component Analysis (PCA) can be used to perform feature selection and extraction. We used PCA as a feature selection mechanism to select the most informative features for classification. After the feature selection methods were used to calculate the relevance scores, the features with the highest scores were considered.

These feature selection methods provided a sequence of effective features after applying them to the collected datasets, based on their parameters (see Tables 2 and 3).

Fig. 4 Probability density function (PDF) of the sizeOfcmds and bindSize features (a) before pre-processing and (b) after pre-processing
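A minimal sketch of scoring and ranking features with χ² and an IG-style mutual-information estimate in scikit-learn (our stand-in for the exact ranker used in the paper); the feature matrix and labels are hypothetical:

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(200, 5)).astype(float)  # hypothetical features
y = rng.integers(0, 2, size=200)                       # 0 = benign, 1 = malware

chi2_scores, _ = chi2(X, y)             # chi-square scores in the spirit of Eqs. (5)/(6)
ig_scores = mutual_info_classif(X, y)   # information-gain style scores

# Rank features from highest to lowest score, as in Tables 2 and 3
print("chi2 ranking:", np.argsort(chi2_scores)[::-1] + 1)
print("IG ranking:  ", np.argsort(ig_scores)[::-1] + 1)
```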


Table 2 Selected features from the three different techniques

  Method     Selected features (ranked)
  Info-gain  4, 3, 1, 5, 2, 10, 6, 7, 9, 13, 8, 12, 11
  χ²         4, 5, 3, 1, 2, 10, 6, 9, 7, 8, 13, 12, 11
  PCA        4, 5, 3, 1, 2, 10, 6, 9, 7, 8, 13, 12, 11

Table 4 Applied collected and synthetic datasets distribution

  Dataset           Benign  Malicious  Total records
  Original dataset  460     152        612
  2x_SMOTE          920     304        1224
  3x_SMOTE          1380    456        1836
  5x_SMOTE          2300    760        3060

Table 3 Feature scores obtained from the ranker search method used to select appropriate features

  Feature            PCA     InfoGain  χ²
  1. ncmds           0.648   0.2197    178.62
  2. sizeofcmds      0.4757  0.1852    151.86
  3. noloadcmd       0.379   0.2256    183.25
  4. rebase_size     0.3049  0.2794    216.90
  5. bind_size       0.2336  0.2368    176.77
  6. lazy_bind_size  0.1738  0.1721    132.58
  7. export_size     0.1281  0.1062    94.45
  8. nsyms           0.0854  0.1026    70.09
  9. strsize         0.0553  0.1226    94.30
  10. LoadDYLIB      0.0331  0.1841    138.63
  11. Segments       0.0     0.0329    33.67
  12. SectionsTEXT   0.0     0.0475    39.00
  13. SectionsData   0.012   0.1024    87.91

3.2 Library weighting

One of the extracted features is the set of system libraries called by an application. In this phase, the probability of calling each system library is calculated. For each system library, two indicators are calculated: first, the overall occurrence probability of the library in the dataset; second, the occurrence probability of the library in each of the malware and goodware classes. Then, the sample weight (SW) of each library is calculated for both the malign and benign classes as per Eqs. (7) and (8):

SW_{i|m} = Σ_{j=1}^{n} freq(lib_j|m)_i / Σ_{v=1}^{n} lib_{v|m}   (7)

SW_{i|b} = Σ_{j=1}^{n} freq(lib_j|b)_i / Σ_{v=1}^{n} lib_{v|b}   (8)

In the above equations, SW_{i|m,b} represents the ith sample's weight for each class (malign or benign), and freq(lib_j|m)_i is the number of occurrences of the jth library (lib) called by the ith application in the malign (m) or benign (b) class (i.e. lib_{v|m} denotes the vth library in the malign class). After these two measures are calculated, we use them as new features for classification.
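A sketch of the Eqs. (7)/(8) style sample weights under our reading: for one application, the class-wise call frequencies of its libraries are summed and normalized by the total call count of that class. The toy frequency tables are hypothetical:

```python
from collections import Counter

# Hypothetical per-class library-call frequency tables (from training data)
freq_malware = Counter({"libSystem": 120, "libsqlite3": 45, "CoreGraphics": 10})
freq_benign  = Counter({"libSystem": 300, "libsqlite3": 20, "CoreGraphics": 250})

def sample_weight(app_libs, class_freq):
    """Eqs. (7)/(8): sum of the class frequencies of the app's libraries,
    normalized by the total call count observed for that class."""
    total = sum(class_freq.values())
    return sum(class_freq[lib] for lib in app_libs) / total

app = ["libSystem", "libsqlite3"]           # libraries called by one sample
lib_w_m = sample_weight(app, freq_malware)  # new feature: lib-w-m
lib_w_b = sample_weight(app, freq_benign)   # new feature: lib-w-b
print(lib_w_m, lib_w_b)                     # ~0.943, ~0.561
```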

3.3 SMOTE dataset development

Synthetic Minority Over-sampling Technique (SMOTE) [45] is a supervised re-sampling technique for balancing minority classes. SMOTE uses the K-Nearest Neighbors (KNN) algorithm to find the best location in each dimension at which to generate synthetic samples (see Fig. 5). We used SMOTE to create three datasets of double, triple and quintuple the size of the original dataset, all with the same class proportion as the original dataset (see Table 4). We believe our collected datasets pave the way for future research on the application of machine learning to OS X malware detection.

Fig. 5 The SMOTE technique uses KNN to generate synthetic samples
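As an illustration of the over-sampling mechanics, a minimal sketch using the imbalanced-learn implementation of SMOTE (an assumption; the paper does not name its implementation, and plain SMOTE balances the minority class rather than scaling both classes in proportion as Table 4 does):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Hypothetical stand-in for the 460 benign / 152 malware feature matrix
X, y = make_classification(n_samples=612, weights=[0.75, 0.25], random_state=0)
print("before:", Counter(y))

# SMOTE places each synthetic minority sample on the segment between a
# minority point and one of its k nearest minority-class neighbours
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```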


Fig. 8 Accuracy and false alarm rates for the original dataset and the synthetic datasets

4 OS X malware classification

Five main supervised classification techniques, Naïve Bayes, Bayesian Net, Multi Layer Perceptron (MLP), Decision Tree-J48, and Weighted Radial Basis Function Kernels-Based Support Vector Machine (Weighted-RBFSVM), are then evaluated using our datasets. The main classification task of the proposed methodology is developed using SVM.

The machine learning algorithm in [46] separates data into N dimensions with different categories in each hyperplane. The dimension with the largest margin is then used for classification. The given training data samples are paired and labeled as (X, Y), where X is the dataset feature vector (which contains the features x_1, x_2, x_3, ..., x_n) and Y represents the labels (malicious or benign) for the X features. Both X and Y are fed as inputs to the SVM classifier. The SVM is then used to maximize the margin between the given classes and obtain the best classification result. The boundary of the margin function is defined by the support vector data samples. This margin is calculated from the candidate support vectors, which are those nearest to the optimized margin (the largest margin that separates the two types of data); see Fig. 6.

Fig. 6 Support vectors and maximizing the margin

The problem of maximizing the margin in SVM can be solved using Quadratic Programming (QP), as shown in Eq. (9):

Minimize: W(α) = −Σ_{k=1}^{l} α_k + (1/2) Σ_{k=1}^{l} Σ_{p=1}^{l} γ_k γ_p α_k α_p k(χ_k, χ_p)
subject to: ∀k: 0 ≤ α_k ≤ C and Σ_{k=1}^{l} α_k γ_k = 0   (9)

In the above equation, l denotes the number of training objects, α_k is the vector of l variables in which each component α_k corresponds to the training sample x_k, and C is the margin parameter which controls the effects of noise and outliers within the training set. Samples in the training set with α_k greater than zero are the support vector objects. Others, with an α_k value of zero, are considered non-support-vector objects; thus, they are not considered in the calculation of the margin function.
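To make the training and evaluation step concrete, a sketch of an RBF-kernel SVM with class weighting, evaluated by tenfold cross-validation in scikit-learn; this is our assumption of a typical setup, not the authors' exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the normalized 612-sample feature matrix
X, y = make_classification(n_samples=612, weights=[0.75, 0.25], random_state=0)

# RBF kernel k(x_k, x_p) = exp(-gamma * ||x_k - x_p||^2), Eq. (10);
# C is the margin parameter of Eq. (9); class_weight compensates imbalance
clf = make_pipeline(
    MinMaxScaler(),   # Eq. (3) style scaling to [0, 1]
    SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced"),
)

scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```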

Fig. 7 Added library-weighting features and corresponding support vectors

Table 5 Supervised classification results by cross-validation

  Classifier           Dataset              Accuracy  False alarm
  Naïve Bayes          Original_row         51        36.3
  Bayesian Net         Original_row         82.35     19.78
  MLP                  Original_row         81.37     7.8
  Decision Tree-J48    Original_row         88.07     8
  Weighted-Linear      Original_row         89        4.1
  Weighted-Sigmoid     Original_row         85.95     3.9
  Weighted-Polynomial  Original_row         87.95     3.0
  Weighted-RBF         Original_normalized  91        3.9


For better separation, data points in the SVM kernel function are used as k(x_k, x_p) in the QP equation (see Eq. 9). Kernel functions map the training data into higher dimensions to find a separating hyperplane with a maximum margin [47]. Common kernel functions for the SVM classifier include the Linear, Polynomial, RBF and Sigmoid kernels. In this research, due to the proximity of the data (see Fig. 4), the RBF kernel function [48] is utilized (see Eq. 10):

k(χ_k, χ_p) = exp(−γ ‖χ_k − χ_p‖²)   (10)

Although SVM is a promising supervised classifier, it has its own drawbacks. SVM performance and accuracy rely heavily on the complexity, structure and size of the training data [49]. In our research, the size of the training dataset is suitable for SVM classification and there are not too many features. Moreover, our dataset is normalized, which reduces the complexity of the training set.

5 Findings and discussion

Using the library-weighting measure, we created two new features, namely lib-w-b (library-weight-benign) and lib-w-m (library-weight-malware), to increase the accuracy of classification (see Fig. 7). Table 5 presents the evaluation results of Naïve Bayes, Bayesian Net, MLP, Decision Tree-J48, and Weighted-RBFSVM on the original dataset with the tenfold Cross Validation (CV) technique. Due to data normalization and well-separated features (shown in Fig. 7), it is clear that the Weighted-RBFSVM offers the highest accuracy (91%) and the lowest false alarm rate (3.9%).

Table 6 shows the results of evaluating Naïve Bayes, Bayesian Net, MLP, Decision Tree-J48, and Weighted-RBFSVM against our three SMOTE datasets using the tenfold Cross Validation (CV) technique. While accuracy increased in all cases and we obtained much higher accuracy (i.e. a 96.62% detection rate for Decision Tree-J48 on SMOTE_5X), the false alarm rate was not reduced and more training time was required due to the bigger size of the datasets [50]. In addition, the complexity of the classification technique was reduced by the two newly added features (lib-w-b, lib-w-m). For instance, the J48 classification complexity before adding the two new features was 65 nodes and 35 leaves, but after providing the new features it was reduced to 55 nodes and 33 leaves, respectively.

Table 6 Supervised classification results by cross-validation

  Classifier         Dataset   Accuracy  False alarm
  Naïve Bayes        SMOTE_2X  54.33     43.15
  Naïve Bayes        SMOTE_3X  55.35     44.72
  Naïve Bayes        SMOTE_5X  54.71     4.87
  Bayesian Net       SMOTE_2X  87.55     13.84
  Bayesian Net       SMOTE_3X  86.88     14.89
  Bayesian Net       SMOTE_5X  84.72     18.84
  MLP                SMOTE_2X  85.62     7.3
  MLP                SMOTE_3X  88.15     6.68
  MLP                SMOTE_5X  89.02     5.1
  Decision Tree-J48  SMOTE_2X  92.82     7.1
  Decision Tree-J48  SMOTE_3X  95.75     4.28
  Decision Tree-J48  SMOTE_5X  96.62     4
  Weighted-RBFSVM    Original  91        3.9

Figure 9 depicts the frequency of occurrence of every library call in the original dataset. Figure 8 depicts the accuracy and false alarm rate for the original and SMOTE datasets.

Fig. 9 Percentage of library intersection in the collected dataset


Fig. 10 KS density function for Segments
Fig. 12 KS density function for SectionsTEXT

Fig. 11 KS density function for SectionsData
Fig. 13 KS density function for lib-weighting
Fig. 14 Application call library statistics for malign and benign applications

While the SMOTE datasets are significantly bigger than the original dataset, the proposed model obtained a lower false alarm rate on the original dataset with almost the same accuracy as on the SMOTE datasets.

A comparison of the low-ranked features (i.e. Segments, SectionsTEXT, SectionsData) using Kernel Smooth (KS) density estimation shows a significant overlap between the low-ranked features of malware and benign applications (see Fig. 10); hence, these features are not suitable for classification. The experiments on KS density estimation also suggested that the data and text sections had the most overlap in comparison to the other extracted features; see Figs. 11 and 12. According to Fig. 13, the KS density estimate of the library weighting provides a distinction between malware and benign samples, since the two curves (malware and benign) are almost orthonormal: the peak of one curve follows the opposite trend of the other. Therefore, it can be said that this feature is highly appropriate for classification.

As shown in Fig. 14, the CoreGraphics, CoreLocation, CoreServices and WebKit libraries were called far more often by benign applications, while libc and libsqlite3 were called significantly more often by malware. Statistical analysis of the library calls revealed that applications that call audio- and video-related libraries (AudioToolbox and CoreGraphics) are mostly benign, while most malicious apps more frequently call system libraries (i.e. libSystem) and SQLite libraries.

6 Conclusion and future work

In this paper, we developed four OS X malware datasets and a novel measure based on library calls for the classification of OS X malware and benign applications. We obtained an accuracy of 91% and a false alarm rate of 3.9% using the weighted RBF–SVM algorithm against our original dataset.

Moreover, using Decision Tree-J48 we obtained 96.62% accuracy on the SMOTE_5X dataset, with a slightly higher false alarm rate (4%). The synthetic datasets were generated using the SMOTE technique and assessed with the same supervised algorithms. This experiment was conducted to show the effect of sample size on detection accuracy. Our results indicate that increasing the sample size may increase detection accuracy but adversely affects the false alarm rate. OS X malware detection and analysis utilising dynamic analysis techniques is a potential future extension of this research. Extending classification using other techniques such as fuzzy classification, applying deep learning to OS X malware detection, and using a combination of our suggested features for OS X malware detection are also interesting future works of this study.

Acknowledgements We thank VirusTotal for providing us a private API key to access their data for constructing our dataset. This work is partially supported by the European Council International Incoming Fellowship (FP7-PEOPLE-2013-IIF) grant.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Daryabar, F., Dehghantanha, A., Udzir, N.I.: Investigation of bypassing malware defences and malware detections. In: 2011 7th International Conference on Information Assurance and Security (IAS), pp. 173–178 (2011)
2. Bisio, F., Gastaldo, P., Meda, C., Nasta, S., Zunino, R.: Machine learning-based system for detecting unseen malicious software. In: De Gloria, A. (ed.) Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering, pp. 9–15. Springer International Publishing (2016). http://link.springer.com/chapter/10.1007/978-3-319-20227-3_2
3. Kaspersky Lab: Overall statistics for 2015. Kaspersky Lab, Russia (2016). https://securelist.com/files/2015/12/KSB_2015_Statistics_FINAL_EN.pdf
4. Panda Lab: PandaLabs annual report 2015, p. 30, Report No. 4 (2016). http://www.pandasecurity.com/mediacenter/src/uploads/2014/07/Pandalabs-2015-anual-EN.pdf
5. Beek, C., Frosst, D., Greve, P., Gund, Y., Moreno, F., Peterson, E., Schmugar, C., Simon, R., Sommer, D., Sun, B., Tiwari, R., Weafer, V.: McAfee Labs Threats Report, p. 49. McAfee Lab (April 2017). https://www.mcafee.com/us/resources/reports/rp-quarterly-threats-mar-2017.pdf
6. Stack Overflow Developer Survey 2016 Results. Stack Overflow. http://stackoverflow.com/research/developer-survey-2016
7. Aquilino, B.I.: Flashback OS X malware. In: Proceedings of the Virus Bulletin Conference, pp. 102–114 (2012). https://pdfs.semanticscholar.org/6b7b/d026676c5e30b42b40f50ed8076b81eb2764.pdf
8. Gardiner, J., Nagaraja, S.: On the security of machine learning in malware C&C detection: a survey. ACM Comput. Surv. 49(3), 1–39 (2016)
9. Sun, M., Li, X., Lui, J.C.S., Ma, R.T.B., Liang, Z.: Monet: a user-oriented behavior-based malware variants detection system for Android. IEEE Trans. Inf. Forensics Secur. 12(5), 1103–1112 (2017)
10. Nissim, N., Cohen, A., Elovici, Y.: ALDOCX: detection of unknown malicious Microsoft Office documents using designated active learning methods based on new structural feature extraction methodology. IEEE Trans. Inf. Forensics Secur. 12(3), 631–646 (2017)
11. Nauman, M., Azam, N., Yao, J.: A three-way decision making approach to malware analysis using probabilistic rough sets. Inf. Sci. 374, 193–209 (2016)
12. Fattori, A., Lanzi, A., Balzarotti, D., Kirda, E.: Hypervisor-based malware protection with AccessMiner. Comput. Secur. 52, 33–50 (2015)
13. Mohaisen, A., Alrawi, O., Mohaisen, M.: AMAL: high-fidelity, behavior-based automated malware analysis and classification. Comput. Secur. 52, 251–266 (2015)
14. Huda, S., Abawajy, J., Alazab, M., Abdollalihian, M., Islam, R., Yearwood, J.: Hybrids of support vector machine wrapper and filter based framework for malware detection. Future Gener. Comput. Syst. 55, 376–390 (2016)
15. Nissim, N., Moskovitch, R., Rokach, L., Elovici, Y.: Novel active learning methods for enhanced PC malware detection in Windows OS. Expert Syst. Appl. 41(13), 5843–5857 (2014)
16. Damodaran, A., Troia, F.D., Visaggio, C.A., Austin, T.H., Stamp, M.: A comparison of static, dynamic, and hybrid analysis for malware detection. J. Comput. Virol. Hacking Tech. (2015). http://link.springer.com/10.1007/s11416-015-0261-z
17. Mangialardo, R.J., Duarte, J.C.: Integrating static and dynamic malware analysis using machine learning. IEEE Lat. Am. Trans. 13(9), 3080–3087 (2015)
18. Shaerpour, K., Dehghantanha, A., Mahmod, R.: Trends in Android malware detection. J. Digit. Forensics Secur. Law 8(3), 21–40 (2013)
19. Faruki, P., Bharmal, A., Laxmi, V., Ganmoor, V., Gaur, M.S., Conti, M., et al.: Android security: a survey of issues, malware penetration, and defenses. IEEE Commun. Surv. Tutor. 17(2), 998–1022 (2015)
20. Feizollah, A., Anuar, N.B., Salleh, R., Wahab, A.W.A.: A review on feature selection in mobile malware detection. Digit. Investig. 13, 22–37 (2015)
21. Suarez-Tangil, G., Tapiador, J.E., Lombardi, F., Pietro, R.D.: ALTERDROID: differential fault analysis of obfuscated smartphone malware. IEEE Trans. Mob. Comput. 15(4), 789–802 (2016)
22. Yerima, S.Y., Sezer, S., Muttik, I.: High accuracy Android malware detection using ensemble learning. IET Inf. Secur. 9(6), 313–320 (2015)
23. Saracino, A., Sgandurra, D., Dini, G., Martinelli, F.: MADAM: effective and efficient behavior-based Android malware detection and prevention. IEEE Trans. Dependable Secure Comput. (2016)
24. O'Brien, D.: The Apple threat landscape, Security Response, Report No. 1.02, p. 31. Symantec (Feb 2016). https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/apple-threat-landscape-16-en.pdf
25. Europe key target for cybercrime. Comput. Fraud Secur. 2011(1), 3, 20 (2011)
26. Richard III, G.G., Case, A.: In lieu of swap: analyzing compressed RAM in Mac OS X and Linux. Digit. Investig. 11(2), S3–S12 (2014)
27. Case, A., Richard, G.G.: Advancing Mac OS X rootkit detection. Digit. Investig. 14, S25–S33 (2015)
28. Walkup, E.: Mac malware detection via static file structure analysis. Stanford University (2014). http://cs229.stanford.edu/proj2014/Elizabeth%20Walkup,%20MacMalware.pdf


29. VirusTotal: Free online virus, malware and URL scanner. https://www.virustotal.com/
30. Objective-See. https://objective-see.com
31. Contagio Malware Dump: Mila. http://contagiodump.blogspot.com/. Accessed 28 Jun 2016
32. Masud, M.M., Khan, L., Thuraisingham, B.: A hybrid model to detect malicious executables. In: 2007 IEEE International Conference on Communications, pp. 1443–1448 (2007)
33. Bundle types. Apple Developer Documentation. https://developer.apple.com/library/content/documentation/CoreFoundation/Conceptual/CFBundles/BundleTypes/BundleTypes.html#apple_ref/doc/uid/10000123i-CH101-SW1
34. Downloads on iTunes. https://itunes.apple.com/us/genre/mac/id39?mt=12
35. KDD Cup 1999 Data (2000). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed 17 Sept 2017
36. Garcia, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Secur. 45, 100–123 (2014)
37. Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D., Nakao, K.: Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation. In: Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (2011)
38. Executing Mach-O files. Apple Developer Documentation. https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html#apple_ref/doc/uid/TP40001829-SW1
39. HNSX/OSXMalware. GitHub. https://github.com/HNSX/OSXMalware
40. Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P., Botstein, D.: Imputing missing data for gene expression arrays. Stanford University Statistics Department Technical Report (1999)
41. Shabtai, A., Kanonov, U., Elovici, Y., Glezer, C., Weiss, Y.: "Andromaly": a behavioral malware detection framework for Android devices. J. Intell. Inf. Syst. 38(1), 161–190 (2012)
42. Shabtai, A., Fledel, Y., Elovici, Y.: Automated static code analysis for classifying Android applications using machine learning. In: 2010 International Conference on Computational Intelligence and Security (CIS), IEEE, pp. 329–333 (2010)
43. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: European Conference on Machine Learning, pp. 137–142 (1998)
44. Zhu, Z., Ong, Y.-S., Dash, M.: Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans. Syst. Man Cybern. Part B Cybern. 37(1), 70–76 (2007)
45. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
46. Vapnik, V.: The Nature of Statistical Learning Theory. Springer. http://www.springer.com/gp/book/9780387987804
47. Scholkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
48. Shashua, A.: Introduction to machine learning: class notes 67577. arXiv preprint arXiv:0904.3664 (2009)
49. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998)
50. Kavzoglu, T., Colkesen, I.: The effects of training set size for performance of support vector machines and decision trees. In: Proceedings of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, pp. 10–13 (July 2012)
