Spyware Detection Using Data Mining for Windows Portable Executable

Islamic University of Gaza
Deanery of Higher Studies
Information Technology Program

Spyware Detection Using Data Mining for Windows Portable Executable Files

By: Fadel Omar Shaban (120091437)
Supervised by: Dr. Tawfiq S. Barhoom

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Technology

2013 (1434 H)

"Say: Indeed, my prayer, my rites of sacrifice, my living and my dying are for Allah, Lord of the worlds. No partner has He. And this I have been commanded, and I am the first of the Muslims." (Al-An'am, 162-163)

ACKNOWLEDGMENTS

First and foremost, I am very grateful to Almighty Allah, whose blessings have always been a source of encouragement for me and who gave me the ability to complete this task.

This thesis would not exist without the help, advice, support, guidance, and encouragement of many people. In particular, I wish to express my sincere appreciation to my supervisor, Dr. Tawfiq S. Barhoom; without his help, guidance, and continuous follow-up, this research would never have been completed.

I would also like to extend my thanks to the academic staff of the Faculty of Information Technology, who taught me different courses and helped me during my Master’s study.

Special greetings to my family, especially my parents, who have always kept me in their prayers and who have suffered a lot to make me happy.

Last but not least, I wish to express my sincere thanks to all those who have, in one way or another, helped me make this study a success.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
Abstract
Abstract (Arabic)
CHAPTER 1: Introduction
  1.1 Introduction
  1.2 Statement of the problem
  1.3 Objectives
    1.3.1 Main objective
    1.3.2 Specific objectives
  1.4 Scope and Limitation
  1.5 Importance of the research
  1.6 Thesis Organization
CHAPTER 2: Literature Review
  2.1 Malware Detection Techniques
    2.1.1 Anomaly-based detection
    2.1.2 Signature-based detection
  2.2 Portable Executable File
    2.2.1 The PE File Headers and Sections
    2.2.2 Importing Functions
  2.3 Packers and Unpacking
  2.4 Data Mining
    2.4.1 Data Reduction
    2.4.2 Classification
    2.4.3 Classification algorithms
    2.4.4 Classification Performance
CHAPTER 3: Related Work
  3.1 Malware detection
  3.2 Spyware detection
  3.3 Discussion and summary
CHAPTER 4: Data Collection and Preprocessing
  4.1 Data Collection
  4.2 Data Preprocessing
  4.3 Step 1: Unpack the spyware
  4.4 Step 2: Disassemble the binary executable and feature extraction
  4.5 Feature Extraction
  4.6 Feature Selection
CHAPTER 5: Experiments and Results analysis
  5.1 Experimental Environment and Tools
  5.2 Performance Evaluation Metrics
  5.3 Algorithm Configuration
  5.4 Experimental Results
    5.4.1 Experiment on features set 2 “list of DLLs used by the PE file”
    5.4.2 Experiment on features set 1 “number of different API calls the PE file has imported from the corresponding DLL”
    5.4.3 Experiment on features set 1 “The number of different API function calls the PE file has used from API call categories”
    5.4.4 Experiment on features set 4 “The list of selected API function calls used by the PE file”
    5.4.5 Experiment on features set 5 “A combination of Selected API calls categories and a list of selected API function calls”
  5.5 Discussion and summary
CHAPTER 6: Results Comparison and Summary
  6.1 Results Comparison
  6.2 Summary
  6.3 Future Work
References

LIST
Recommended publications
  • Reverse Software Engineering As a Project-Based Learning Tool
    Paper ID #33764 Reverse Software Engineering as a Project-Based Learning Tool Ms. Cynthia C. Fry, Baylor University CYNTHIA C. FRY is currently a Senior Lecturer of Computer Science at Baylor University. She worked at NASA’s Marshall Space Flight Center as a Senior Project Engineer, a Crew Training Manager, and the Science Operations Director for STS-46. She was an Engineering Duty Officer in the U.S. Navy (IRR), and worked with the Naval Maritime Intelligence Center as a Scientific/Technical Intelligence Analyst. She was the owner and chief systems engineer for Systems Engineering Services (SES), a computer systems design, development, and consultation firm. She joined the faculty of the School of Engineering and Computer Science at Baylor University in 1997, where she teaches a variety of engineering and computer science classes. She is the Faculty Advisor for the Women in Computer Science (WiCS), the Director of the Computer Science Fellows program, and a KEEN Fellow. She has authored and co-authored over fifty peer-reviewed papers. Mr. Zachary Michael Steudel Zachary Steudel is a 2021 graduate of Baylor University’s computer science department. In his time at Baylor, he worked as a Teaching Assistant under Ms. Cynthia C. Fry. As part of the Teaching Assistant role, Zachary designed and created the group project for the Computer Systems course. Zachary Steudel worked as a Software Developer Intern at Amazon in the Summer of 2019, a Software Engineer Intern at Microsoft in the Summer of 2020, and begins his full-time career with Amazon in the summer of 2021 as a software engineer.
  • Cyber Threat Metrics
    SANDIA REPORT SAND2012-2427 Unlimited Release Printed March 2012 Cyber Threat Metrics Mark Mateski, Cassandra M. Trevino, Cynthia K. Veitch, John Michalski, J. Mark Harris, Scott Maruoka, Jason Frye Prepared by Sandia National Laboratories Albuquerque, New Mexico 87185 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. Approved for public release; further dissemination unlimited Issued by Sandia National Laboratories, operated for the United States Department of Energy by Sandia Corporation. NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any of their contractors, subcontractors, or their employees, make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government, any agency thereof, or any of their contractors or subcontractors. The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors. Printed in the United States of America. This report has been reproduced from the best available copy.
  • Reverse Engineering Digital Forensics Rodrigo Lopes October 22, 2006
    Reverse Engineering Digital Forensics Rodrigo Lopes October 22, 2006 Introduction Engineering is often described as making practical application of the knowledge of pure sciences in the solution of a problem or the application of scientific and mathematical principles to develop economical solutions to technical problems, creating products, facilities, and structures that are useful to people. What if the opposite occurs? There is some product that may be a solution to some problem but the inner workings of the solution or even the problem it addresses may be unknown. Reverse engineering is the process of analyzing and understanding a product whose functioning and purpose are unknown. In Computer Science in particular, reverse engineering may be defined as the process of analyzing a system's code, documentation, and behavior to identify its current components and their dependencies to extract and create system abstractions and design information. The subject system is not altered; however, additional knowledge about the system is produced. The definition of Reverse Engineering is contested, though, especially when it comes to courts and lawsuits. The Reverse Engineering of products protected by copyright may be a crime, even if no code is copied. From the software companies’ point of view, Reverse Engineering is often defined as “Analyzing a product or other output of a process in order to determine how to duplicate the know-how which has been used to create a product or process”. Scope and Goals In the Digital Forensics’ scope, reverse engineering can directly be applied to analyzing unknown and suspicious code in the system, to understand both its goal and inner functioning.
  • Containing Conficker to Tame a Malware
    Know Your Enemy: Containing Conficker To Tame A Malware The Honeynet Project http://honeynet.org Felix Leder, Tillmann Werner Last Modified: 30th March 2009 (rev1) The Conficker worm has infected several million computers since it first started spreading in late 2008 but attempts to mitigate Conficker have not yet proved very successful. In this paper we present several potential methods to repel Conficker. The approaches presented take advantage of the way Conficker patches infected systems, which can be used to remotely detect a compromised system. Furthermore, we demonstrate various methods to detect and remove Conficker locally and a potential vaccination tool is presented. Finally, the domain name generation mechanism for all three Conficker variants is discussed in detail and an overview of the potential for upcoming domain collisions in version .C is provided. Tools for all the ideas presented here are freely available for download from [9], including source code.
    1. Introduction
    The big years of wide-area network spreading worms were 2003 and 2004, the years of Blaster [1] and Sasser [2]. About four years later, in late 2008, we witnessed a similar worm that exploits the MS08-067 server service vulnerability in Windows [3]: Conficker. Like its forerunners, Conficker exploits a stack corruption vulnerability to introduce and execute shellcode on affected Windows systems, download a copy of itself, infect the host and continue spreading. SRI has published an excellent and detailed analysis of the malware [4]. The scope of this paper is different: we propose ideas on how to identify, mitigate and remove Conficker bots.
  • X86 Disassembly Exploring the Relationship Between C, X86 Assembly, and Machine Code
    x86 Disassembly Exploring the relationship between C, x86 Assembly, and Machine Code PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 07 Sep 2013 05:04:59 UTC
    Contents
    Articles: Wikibooks:Collections Preface; X86 Disassembly/Cover; X86 Disassembly/Introduction
    Tools: X86 Disassembly/Assemblers and Compilers; X86 Disassembly/Disassemblers and Decompilers; X86 Disassembly/Disassembly Examples; X86 Disassembly/Analysis Tools
    Platforms: X86 Disassembly/Microsoft Windows; X86 Disassembly/Windows Executable Files; X86 Disassembly/Linux; X86 Disassembly/Linux Executable Files
    Code Patterns: X86 Disassembly/The Stack; X86 Disassembly/Functions and Stack Frames; X86 Disassembly/Functions and Stack Frame Examples; X86 Disassembly/Calling Conventions; X86 Disassembly/Calling Convention Examples; X86 Disassembly/Branches; X86 Disassembly/Branch Examples; X86 Disassembly/Loops; X86 Disassembly/Loop Examples
    Data Patterns: X86 Disassembly/Variables; X86 Disassembly/Variable Examples; X86 Disassembly/Data Structures; X86 Disassembly/Objects and Classes; X86 Disassembly/Floating Point Numbers; X86 Disassembly/Floating Point Examples
    Difficulties: X86 Disassembly/Code Optimization; X86 Disassembly/Optimization Examples; X86 Disassembly/Code Obfuscation; X86 Disassembly/Debugger Detectors
    Resources and Licensing: X86 Disassembly/Resources; X86 Disassembly/Licensing; X86 Disassembly/Manual of Style
    References: Article Sources and Contributors; Image Sources, Licenses and Contributors
    Article Licenses: License
    Wikibooks:Collections Preface
    This book was created by volunteers at Wikibooks (http:/ / en.
  • Binary Disassembly Block Coverage by Symbolic Execution Vs
    Air Force Institute of Technology AFIT Scholar Theses and Dissertations Student Graduate Works 3-22-2012 Binary Disassembly Block Coverage by Symbolic Execution vs. Recursive Descent Jonathan D. Miller Follow this and additional works at: https://scholar.afit.edu/etd Part of the Information Security Commons Recommended Citation Miller, Jonathan D., "Binary Disassembly Block Coverage by Symbolic Execution vs. Recursive Descent" (2012). Theses and Dissertations. 1138. https://scholar.afit.edu/etd/1138 This Thesis is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact [email protected]. BINARY DISASSEMBLY BLOCK COVERAGE BY SYMBOLIC EXECUTION VS. RECURSIVE DESCENT THESIS Jonathan D. Miller, Second Lieutenant, USAF AFIT/GCO/ENG/12-09 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED The views expressed in this thesis are those of the author and do not reflect the official policy or position of the United States Air Force, the Department of Defense, or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. AFIT/GCO/ENG/12-09 BINARY DISASSEMBLY BLOCK COVERAGE BY SYMBOLIC EXECUTION VS. RECURSIVE DESCENT THESIS Presented to the Faculty Department of Electrical and Computer Engineering Graduate School of Engineering and Management Air Force Institute of Technology Air University Air Education and Training Command in Partial Fulfillment of the Requirements for the Degree of Master of Science Jonathan D.
  • Reverse Engineering of a Malware
    REVERSE ENGINEERING OF A MALWARE EYEING THE FUTURE OF SECURITY A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for the Degree Master of Science Supreeth Burji August, 2009
    Thesis Approved: Advisor: Dr. Kathy J. Liszka; Faculty Reader: Dr. Timothy W. O'Neil; Faculty Reader: Dr. Wolfgang Pelz
    Accepted: Department Chair: Dr. Chien-Chung Chan; Dean of the College: Dr. Chand Midha; Dean of the Graduate School: Dr. George R. Newkome
    ABSTRACT Reverse engineering malware has been an integral part of the world of security. At best it has been employed for signature logging malware until now. Since the evolution of new age technologies, this is now being researched as a robust methodology which can lead to more reactive and proactive solutions to the modern security threats that are growing stronger and more sophisticated. This research in its entirety has been an attempt to understand the ins and outs of reverse engineering pertaining to malware analysis, with an eye to the future trends in security. Reverse engineering of malware was done with Nugache P2P malware as the target, showing that signature-based malware identification is ineffective. Developing a proactive approach to quickly identifying malware was the objective that guided this research work. Innovative malware analysis techniques with data mining and rough sets methodologies have been employed in this research work in the quest for a proactive and feasible security solution.
  • Techniques of Adware and Spyware
    Techniques of Adware and Spyware Eric Chien, Symantec Security Response From the proceedings of the VB2005 Conference. Used with permission of the author. WHITE PAPER: SYMANTEC SECURITY RESPONSE
    Contents: Abstract; Background; Delivery vectors; Social engineering banner ads; Drive by Downloads; Automatic refresh; Active X; Continual Prompting; Bundled and chained installs; Peer to peer installation
  • Windows Malware Analysis & Static Analysis Blocking CYS5120 - Malware Analysis Bahcesehir University Cyber Security Msc Program
    04 - Code Analysis & Windows Malware Analysis & Static Analysis Blocking CYS5120 - Malware Analysis Bahcesehir University Cyber Security Msc Program Dr. Ferhat Ozgur Catak, Mehmet Can Doslu [email protected] [email protected] 2017-2018 Fall
    Table of Contents
    1 Code Analysis: Stack Operations; Disassembler & Debugger; IDA Pro; The IDA Pro Interface; Useful Windows for Analysis; Lab
    2 Analyzing Malicious Windows Programs: Introduction; The Windows API; File System Functions; Special Files; The Windows Registry; Networking APIs; Lab
    3 Static Analysis Blocking Methods: Packers & Unpacking; Packer Anatomy; Identifying Packed Programs; Automated Unpacking; Manual Unpacking; Anti-disassembly; Jump Instructions with the Same Target; A Jump Instruction with a Constant Condition; Impossible Disassembly; The Function Pointer Problem; Return Pointer Abuse; Misusing Structured Exception Handlers; Thwarting Stack-Frame Analysis
  • Unpacking Framework for Packed Malicious Executables
    FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO Unpacking Framework for Packed Malicious Executables Gaspar Furtado For Jury Evaluation Mestrado Integrado em Engenharia Informática e Computação Supervisor: José Manuel De Magalhães Cruz Second Supervisor: Jürgen Eckel June 19, 2013 Abstract Malware is a growing concern in the modern connected and machine-dependent world. A common approach to fighting malware is early detection. This is the approach used by most antivirus products. On the other side, malware authors try to keep their software undetected as long as possible in order to achieve their goals. One technique used for this is the use of packers. The ease of use and the protections against detection and analysis that packers provide have made packing malware very popular. An unavoidable fact is that the large majority of malware is packed. The varying complexity of packers, from simple compressors to extremely advanced virtual machines, has forced the IT security industry to address the problem seriously. The reduced effectiveness of detection on packed binaries is a known problem that the industry tries to solve using different techniques. Static unpacking provides an extremely efficient way of addressing the problem of packed executables. This approach relies on reversing the changes done by the packer to the binary, without executing it. The goal of this project was to implement a static unpacking framework that would allow the unpacking of packed executables. The occurrence of a multitude of different packer families and versions meant that such a tool should allow the incremental addition of support for different packers.
  • Metasploit Framework - Guide for Pentesters Ii
    Metasploit Framework - guide for pentesters Copyright © 2012 Software Media Sp. z o.o. SK Editor in Chief: Ewa Dudzic [email protected] Managing Editor: Aleksandra Cacko [email protected] DTP: Andrzej Kuca, Lalit Agarwal, Aleksandra Cacko Art Director: Andrzej Kuca [email protected] Graphics and cover: Ireneusz Pogroszewski Proofreaders: Edward Werzyn, Gareth Watters Top Betatesters: Stefanus Natahusada, Steven Wierckx Special Thanks to the Beta testers and Proofreaders who helped us with this issue. Without their assistance there would not be a PenTest e-book. Senior Consultant/Publisher: Pawel Marciniak Production Director: Andrzej Kuca Publisher: Software Media 02-682 Warszawa, ul. Bokserska 1 http://pentestmag.com/ First edition Issue 2/2012 (2) ISSN 2084-1116 Whilst every effort has been made to ensure the high quality of the e-book, the editors make no warranty, express or implied, concerning the results of content usage. All trademarks presented in the magazine were used only for informative purposes. All rights to trade marks presented in the magazine are reserved by the companies which own them. DISCLAIMER! The techniques described in our articles may only be used in private, local networks. The editors hold no responsibility for misuse of the presented techniques or consequent data loss.
    Contents
    1 Metasploit: An Introduction
      What is Metasploit?
      Architecture of Metasploit:
      Platform Used for demonstration
      Metasploit Interfaces:
      Good Practices for using Metasploit:
      Updating via Msfupdate
      Port scanning via Nmap
      Meterpreter: Metasploit’s Payload
      What typically payloads allow you to do after execution of exploit?
  • The Ghost in the Browser Analysis of Web-Based Malware
    The Ghost In The Browser Analysis of Web-based Malware Niels Provos, Dean McNamee, Panayiotis Mavrommatis, Ke Wang and Nagendra Modadugu Google, Inc. {niels, deanm, panayiotis, kewang, ngm}@google.com
    Abstract As more users are connected to the Internet and conduct their daily activities electronically, computer users have become the target of an underground economy that infects hosts with malware or adware for financial gain. Unfortunately, even a single visit to an infected web site enables the attacker to detect vulnerabilities in the user’s applications and force the download of a multitude of malware binaries. Frequently, this malware allows the adversary to gain full control of the compromised systems leading to the ex-filtration of sensitive information or installation of utilities that facilitate remote control of the host. We believe that such behavior is similar to our traditional understanding of botnets. However, the main difference is that web-based malware infections are
    [...]tions of exploits against any user who visits the infected page. In most cases, a successful exploit results in the automatic installation of a malware binary, also called drive-by-download. The installed malware often enables an adversary to gain remote control over the compromised computer system and can be used to steal sensitive information such as banking passwords, to send out spam or to install more malicious executables over time. Unlike traditional botnets [4] that use push-based infection to increase their population, web-based malware infection follows a pull-based model and usually provides a looser feedback loop. However, the population of potential victims is much larger as web proxies and NAT-devices pose no barrier to infection [1].