Spyware Detection Using Data Mining for Windows Portable Executable Files
Islamic University of Gaza
Deanery of Higher Studies
Information Technology Program

Spyware Detection Using Data Mining for Windows Portable Executable Files

By: Fadel Omar Shaban
120091437

Supervised by: Dr. Tawfiq S. Barhoom

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Technology

2013 - 1434H

"Say: Indeed, my prayer, my rites of sacrifice, my living, and my dying are for Allah, Lord of the worlds. He has no partner. This I have been commanded, and I am the first of the Muslims." (Al-An'am, 162-163)

ACKNOWLEDGMENTS

First and foremost, I am very grateful to Almighty Allah, whose blessings have always been a source of encouragement for me and who gave me the ability to complete this task.

This thesis would not exist without the help, advice, support, guidance, and encouragement of many people. In particular, I wish to express my sincere appreciation to my supervisor, Dr. Tawfiq S. Barhoom; without his help, guidance, and continuous follow-up, this research would never have been completed.

I would also like to extend my thanks to the academic staff of the Faculty of Information Technology, who taught me different courses and helped me during my Master's study.

Special greetings to my family, especially my parents, who have always kept me in their prayers and who have suffered a lot to make me happy.

Last but not least, I wish to express my sincere thanks to all those who have, in one way or another, helped me make this study a success.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
Abstract
Abstract (Arabic)
1 CHAPTER 1: Introduction
    1.1 Introduction
    1.2 Statement of the problem
    1.3 Objectives
        1.3.1 Main objective
        1.3.2 Specific objectives
    1.4 Scope and Limitation
    1.5 Importance of the research
    1.6 Thesis Organization
2 CHAPTER 2: Literature Review
    2.1 Malware Detection Techniques
        2.1.1 Anomaly-based detection
        2.1.2 Signature-based detection
    2.2 Portable Executable File
        2.2.1 The PE File Headers and Sections
        2.2.2 Importing Functions
    2.3 Packers and Unpacking
    2.4 Data Mining
        2.4.1 Data Reduction
        2.4.2 Classification
        2.4.3 Classification algorithms
        2.4.4 Classification Performance
3 CHAPTER 3: Related Work
    3.1 Malware detection
    3.2 Spyware detection
    3.3 Discussion and summary
4 CHAPTER 4: Data Collection and Preprocessing
    4.1 Data Collection
    4.2 Data Preprocessing
    4.3 Step 1: Unpack the spyware
    4.4 Step 2: Disassemble the binary executable and feature extraction
    4.5 Feature Extraction
    4.6 Feature Selection
5 CHAPTER 5: Experiments and Results analysis
    5.1 Experimental Environment and Tools
    5.2 Performance Evaluation Metrics
    5.3 Algorithm Configuration
    5.4 Experimental Results
        5.4.1 Experiment on features set 2 "list of DLLs used by the PE file"
        5.4.2 Experiment on features set 1 "number of different API calls the PE file has imported from the corresponding DLL"
        5.4.3 Experiment on features set 1 "The number of different API function calls the PE file has used from API call categories"
        5.4.4 Experiment on features set 4 "The list of selected API function calls used by the PE file"
        5.4.5 Experiment on features set 5 "A combination of Selected API calls categories and a list of selected API function calls"
    5.5 Discussion and summary
6 CHAPTER 6: Results Comparison and Summary
    6.1 Results Comparison
    6.2 Summary
    6.3 Future Work
7 References

LIST