DLLMiner: Structural Mining for Malware Detection

Masoud Narouei1, Mansour Ahmadi2*, Giorgio Giacinto2, Hassan Takabi1 and Ashkan Sami3


SECURITY AND COMMUNICATION NETWORKS
Security Comm. Networks 2015; 8:3311–3322
Published online 22 April 2015 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/sec.1255

RESEARCH ARTICLE

1 Department of Computer Science and Engineering, University of North Texas, Denton, TX, U.S.A.
2 Department of Electrical and Electronic Engineering, University of Cagliari, Italy
3 CSE and IT Department, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran

ABSTRACT

Existing anti-malware products usually use signature-based techniques as their main detection engine. Although these methods are very fast, they are unable to provide effective protection against newly discovered malware or mutated variants of old malware. Heuristic approaches are the next generation of detection techniques to mitigate the problem. These approaches aim to improve the detection rate by extracting more behavioral characteristics of malware. Although these approaches cover the disadvantages of signature-based techniques, they usually have a high false positive rate, and evading them is still possible. In this paper, we propose an effective and efficient heuristic technique based on static analysis that not only detects malware with very high accuracy but is also robust against common evasion techniques such as junk injection and packing. Our proposed system is able to extract behavioral features from a unique structure in the portable executable, called the dynamic-link library dependency tree, without actually executing the application. Copyright © 2015 John Wiley & Sons, Ltd.

KEYWORDS
malware analysis; dependency tree; closed frequent tree; evasion

*Correspondence
Mansour Ahmadi, University of Cagliari, Italy. E-mail: [email protected]

1. INTRODUCTION

Malware is the short form for malicious software that consists of different hostile code combinations. Malware is mainly designed to gain unauthorized access, steal users' critical information, and sometimes cause damage to computers. Malware is categorized into viruses, worms, Trojan horses, spyware, rootkits, scareware, adware, riskware, and so on. According to [1], "As much malware was produced in 2007 as in the previous 20 years altogether." In 2011, Kaspersky [2] stated that about 413 billion infections had been detected by host-based anti-viruses. Symantec [3] encountered more than 286 million unique variants of malware; Web-attack toolkits and a phenomenon called polymorphism drive the number of malware variants in common circulation.

Improving the intelligence of anti-malware software is critical. A lot of effort has been put into the area of malware detection, which can be broadly categorized into static [4] and behavioral-based [5] malware detection techniques.

Static detection techniques, such as anomaly, heuristic, or signature-based techniques, mostly operate on disassembled instructions or Application Programming Interface (API) calls. These techniques use a specific signature of each malware, such as byte sequences or strings, for detection. The key advantage of these approaches is their high detection rate and low false alarms. Another benefit of these approaches is their speed in extracting features, because they do not need to run each executable file, which is a time-consuming task. New "anti-anti-malware" techniques such as code packing and control-flow and entry-point obfuscation change the specific signature of discovered malware, making static analyzers toothless in detecting previously detected malware [6]. Although there are many learning-based detection techniques such as [7,8], or distance-based signature matching [9], evading them is still possible for new malware variants. Therefore, a malware detection system is actually a trade-off between detection rate and robustness that should be considered during development.

Because of the weaknesses of static detection techniques, dynamic analysis techniques have made a lot of progress in recent years. In comparison to static techniques, dynamic analysis techniques consider the behavior of malware during run time. They mostly monitor the behavior of malware using API calls [10,11]. Dynamic techniques act almost robustly against polymorphic malware because obfuscation techniques only change the static signature of malware and do not affect its behavior. Despite their advantages, dynamic techniques have a large preprocessing overhead during run time and monitoring, which lowers system performance. In fact, dynamic analysis is usually performed when anti-malware companies want to examine a new sample found in the wild, and they then use its static features in the anti-malware products. Therefore, implementing these techniques in commercial host-based anti-malware products is not practical. Another drawback of such approaches is that monitoring the behavior of dynamic libraries, such as OCX and dynamic-link library (DLL) files, is difficult to perform. According to [12], about 60% of the malware collected at the KingSoft anti-malware lab consists of DLL files, which cannot be run and analyzed dynamically.

A portable executable (PE) depends on some DLLs for execution, and each DLL has a relationship with other DLLs for completing the task. This hierarchy of dependencies between a PE and its DLLs is known as the DLL dependency tree. In this paper, we propose a hybrid system that, by statically extracting the DLL dependency tree from the PE, has the advantages of both static and dynamic techniques. Although the PE is not executed, the extracted tree can reveal coarse-grain behaviors of a PE because it is created based on the interactions of the PE with the operating system, which lead to relationships among DLLs. The novelty of our method is the extraction of the coarse-grain behaviors from the PE's import address table (IAT), which was not considered in previous methods. This means that the analysis system is also very fast because it only considers the header of the PE, and the extracted behaviors are semantic and influential in discriminating malicious from benign programs. In addition to the efficiency of static methods and the effectiveness of dynamic approaches, our method cannot simply be circumvented. The evaluations show its effectiveness against several kinds of packed malware as well as its robustness against junk injection techniques. The rest of the paper is organized as follows: related work is reviewed in Section 2; definitions are presented in Section 3, and Section 4 presents the proposed method. Results of the experiments are discussed in Section 5, and conclusions and future work wrap up the paper.

2. RELATED WORK

Analysis techniques [...]

[...] different structures like opcode [18] from the output, or directly focus on the byte code [19,20]. Sung et al. [21] proposed SAVE, which considers a sequence of API calls as the signature of malware. Kumar and Spafford [22] proposed a method that detects malware based on regular expression matching. Christodorescu and Jha [9] proposed SAFE, which takes patterns of malicious behavior and then creates an automaton from them. Some other approaches [7,8] considered the IAT of the PE and its content. Another work [23] also considered some metadata, such as the number of bitmaps and the size of the import and export address tables, besides the IAT content. Only one study [24] considered the DLL dependencies of a PE, but it did not extract semantic behaviors from all DLLs. It considered single APIs and the DLL dependency paths of the tree as features, which can be circumvented by junk injection techniques. In contrast, our method is more resilient against evasion techniques, and the behaviors extracted by our method are more semantic because we consider the relationships between all DLLs, not only the paths of the DLL dependency tree. The last four approaches are the most similar to ours but, unlike our approach, they did not consider any semantic relation between all DLLs.

Besides static analysis, researchers have put a lot of effort into proposing behavior-based malware detection methods that monitor programs during run time. Different approaches and techniques can be applied to perform such dynamic analysis. In these techniques, analyzers usually consider system calls or instruction sequences as discriminating features between malware and benign programs. Some approaches [11,25,26] monitored the program's behavior by analyzing API calls. In another work [27], the authors extracted sequences of instructions from both malicious and benign programs and used some combination of frequent assembly instructions to build the dataset. In some techniques [28,29], a behavioral graph is extracted from each malware sample based on API call parameters and the understood behavior of the malware. Instead of program-centric detection approaches, some approaches try to generalize the detection method to the whole system. Lanzi et al. [30] proposed an access activity model that captures the generalized interactions of benign applications with operating system resources such as files and the registry, and the model can detect malware well with a very low false positive rate. Nappa et al. [31] also proposed a technique to detect exploited servers managed by the same organization. They collected information about how exploited servers [...]
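To make the notion of a DLL dependency tree from the Introduction more concrete, the following is a minimal sketch of how such a tree could be assembled once each module's imported DLLs are known. It is not the authors' implementation, which this excerpt does not show. The import map `IMPORTS`, the file name `sample.exe`, and the helpers `build_tree`, `edges`, and `render` are all hypothetical names introduced for illustration; a real tool would read each module's imports from the PE import table rather than from a hard-coded dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical import map: module name -> DLLs listed in its import table.
# Hard-coded here for illustration only; a real analyzer would parse these
# entries from each PE file on disk.
IMPORTS: Dict[str, List[str]] = {
    "sample.exe": ["KERNEL32.dll", "USER32.dll"],
    "USER32.dll": ["GDI32.dll", "KERNEL32.dll"],
    "GDI32.dll": ["KERNEL32.dll"],
    "KERNEL32.dll": ["ntdll.dll"],
    "ntdll.dll": [],
}


def build_tree(module: str,
               imports: Dict[str, List[str]],
               max_depth: int = 3,
               _ancestors: Tuple[str, ...] = ()) -> dict:
    """Build a nested {"name", "children"} tree of DLL dependencies.

    Recursion stops at max_depth, and a DLL that already appears among
    the current node's ancestors is skipped so that circular imports
    cannot cause infinite recursion.
    """
    node = {"name": module, "children": []}
    if max_depth <= 0:
        return node
    for dep in imports.get(module, []):
        if dep == module or dep in _ancestors:
            continue  # circular dependency: do not descend again
        node["children"].append(
            build_tree(dep, imports, max_depth - 1, _ancestors + (module,))
        )
    return node


def edges(node: dict) -> Set[Tuple[str, str]]:
    """Collect every (parent, child) dependency pair found in the tree."""
    found: Set[Tuple[str, str]] = set()
    for child in node["children"]:
        found.add((node["name"], child["name"]))
        found |= edges(child)
    return found


def render(node: dict, indent: int = 0) -> List[str]:
    """Return the tree as indented text lines, one module per line."""
    lines = ["  " * indent + node["name"]]
    for child in node["children"]:
        lines.extend(render(child, indent + 1))
    return lines


if __name__ == "__main__":
    tree = build_tree("sample.exe", IMPORTS)
    print("\n".join(render(tree)))
```

The set returned by `edges` is only one simple way to turn the tree into comparable features. The paper's keywords point to closed frequent tree mining, which this sketch does not attempt.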