Effective and Efficient Malware Detection at the End Host

Total Page:16

File Type:pdf, Size:1020Kb

Effective and Efficient Malware Detection at the End Host Effective and Efficient Malware Detection at the End Host Clemens Kolbitsch∗, Paolo Milani Comparetti∗, Christopher Kruegel‡, Engin Kirda§, Xiaoyong Zhou†, and XiaoFeng Wang† ∗ Secure Systems Lab, TU Vienna ‡ UC Santa Barbara {ck,pmilani}@seclab.tuwien.ac.at [email protected] § Institute Eurecom, Sophia Antipolis † Indiana University at Bloomington [email protected] {zhou,xw7}@indiana.edu Abstract sible for such information flows. For detection, we exe- Malware is one of the most serious security threats on cute these slices to match our models against the runtime the Internet today. In fact, most Internet problems such behavior of an unknownprogram. Our experimentsshow as spam e-mails and denial of service attacks have mal- that our approach can effectively detect running mali- ware as their underlying cause. That is, computers that cious code on an end user’s host with a small overhead. are compromised with malware are often networked to- gether to form botnets, and many attacks are launched 1 Introduction using these malicious, attacker-controlled networks. With the increasing significance of malware in Inter- Malicious code, or malware, is one of the most press- net attacks, much research has concentrated on develop- ing security problems on the Internet. Today, millions ing techniques to collect, study, and mitigate malicious of compromised web sites launch drive-by download ex- code. Without doubt, it is important to collect and study ploits against vulnerable hosts [35]. As part of the ex- malware found on the Internet. However, it is even more ploit, the victim machine is typically used to download important to develop mitigation and detection techniques and execute malware programs. These programs are of- based on the insights gained from the analysis work. ten bots that join forces and turn into a botnet. Bot- Unfortunately, current host-based detection approaches nets [14] are then used by miscreants to launch denial (i.e., anti-virus software) suffer from ineffective detec- of service attacks, send spam mails, or host scam pages. tion models. These models concentrate on the features Given the malware threat and its prevalence, it is not of a specific malware instance, and are often easily evad- surprising that a significant amount of previous research able by obfuscation or polymorphism. Also, detectors has focused on developing techniques to collect, study, that check for the presence of a sequence of system calls and mitigate malicious code. For example, there have exhibited by a malware instance are often evadable by been studies that measure the size of botnets [37], the system call reordering. In order to address the shortcom- prevalence of malicious web sites [35], and the infes- ings of ineffective models, several dynamic detection ap- tation of executables with spyware [31]. Also, a num- proaches have been proposed that aim to identify the be- ber of server-side [4, 43] and client-side honeypots [51] havior exhibited by a malware family. Although promis- were introduced that allow analysts and researchers to ing, these approaches are unfortunately too slow to be gather malware samples in the wild. In addition, there used as real-time detectors on the end host, and they of- exist tools that can execute unknown samples and mon- ten require cumbersome virtual machine technology. itor their behavior [6, 28, 54, 55]. Some tools [6, 54] In this paper, we propose a novel malware detection provide reports that summarize the activities of unknown approach that is both effective and efficient, and thus, can programs at the level of Windows API or system calls. be used to replace or complement traditional anti-virus Such reports can be evaluated to find clusters of samples software at the end host. Our approach first analyzes a that behave similarly [5, 7] or to classify the type of ob- malware program in a controlled environment to build a served, malicious activity [39]. Other tools [55] incorpo- model that characterizes its behavior. Such models de- rate data flow into the analysis, which results in a more scribe the information flows between the system calls es- comprehensive view of a program’s activity in the form sential to the malware’s mission, and therefore, cannot of taint graphs. be easily evaded by simple obfuscation or polymorphic While it is important to collect and study malware, techniques. Then, we extract the program slices respon- this is only a means to an end. In fact, it is crucial that the insight obtained through malware analysis is trans- sample to use obfuscation and polymorphictechniques to lated into detection and mitigation capabilities that al- alter its appearance. The problem with static techniques low one to eliminate malicious code running on infected is that static binary analysis is difficult [30]. This diffi- machines. Considerable research effort was dedicated to culty is further exacerbated by runtime packing and self- the extraction of network-based detection models. Such modifying code. Moreover, the analysis is costly, and models are often manually-crafted signatures loaded into thus, not suitable for replacing AV scanners that need to intrusion detection systems [33] or bot detectors [20]. quickly scan large numbers of files. Dynamic analysis Other models are generated automatically by finding is an alternative approach to model malware behavior. In common tokens in network streams produced by mal- particular, several systems [22, 55] rely on the tracking of ware programs (typically, worms) [32, 41]. Finally, mal- dynamic data flows (tainting) to characterize malicious ware activity can be detected by spotting anomalous traf- activity in a generic fashion. While detection results are fic. For example, several systems try to identify bots by promising, these systems incur a significant performance looking for similar connection patterns [19, 38]. While overhead. Also, a special infrastructure (virtual machine network-based detectors are useful in practice, they suf- with shadow memory) is required to keep track of the fer from a number of limitations. First, a malware pro- taint information. As a result, static and dynamic anal- gram has many options to render network-based detec- ysis approaches are often employed in automated mal- tion very difficult. The reason is that such detectors can- ware analysis environments (for example, at anti-virus not observe the activity of a malicious program directly companies or by security researchers), but they are too but have to rely on artifacts (the traffic) that this program inefficient to be deployed as detectors on end hosts. produces. For example, encryption can be used to thwart In this paper, we propose a malware detection ap- content-based techniques, and blending attacks [17] can proach that is both effective and efficient, and thus, can change the properties of network traffic to make it ap- be used to replace or complement traditional AV soft- pear legitimate. Second, network-based detectors cannot ware at the end host. For this, we first generate effective identify malicious code that does not send or receive any models that cannot be easily evaded by simple obfusca- traffic. tion or polymorphic techniques. More precisely, we exe- Host-based malware detectors have the advantage that cute a malware program in a controlled environment and they can observe the complete set of actions that a mal- observe its interactions with the operating system. Based ware program performs. It is even possible to identify on these observations, we generate fine-grained models malicious code before it is executed at all. Unfortunately, that capture the characteristic, malicious behavior of this current host-based detection approaches have significant program. This analysiscan be expensive,as it needsto be shortcomings. An important problem is that many tech- run only once for a group of similar (or related) malware niques rely on ineffective models. Ineffective models are executables. The key of the proposed approach is that models that do not capture intrinsic properties of a mali- our models can be efficiently matched against the run- cious program and its actions but merely pick up artifacts time behavior of an unknown program. This allows us of a specific malware instance. As a result, they can be to detect malicious code that exhibits behavior that has easily evaded. For example, traditional anti-virus (AV) been previously associated with the activity of a certain programs mostly rely on file hashes and byte (or instruc- malware strain. tion) signatures [46]. Unfortunately, obfuscation tech- The main contributions of this paper are as follows: niques and code polymorphism make it straightforward to modify these features without changing the actual se- • We automatically generate fine-grained (effective) mantics (the behavior) of the program [10]. Another ex- models that capture detailed information about the ample are models that capture the sequence of system behavior exhibited by instances of a malware fam- calls that a specific malware program executes. When ily. These models are built by observing a malware these system calls are independent, it is easy to change sample in a controlled environment. their order or add irrelevant calls, thus invalidating the • We have developed a scanner that can efficiently captured sequence. match the activity of an unknown program against In an effort to overcome the limitations of ineffective our behavior models. To achieve this, we track de- models, researchers have sought ways to capture the ma- pendencies between system calls without requiring licious activity that is characteristic of a malware pro- expensive taint analysis or special support at the end gram (or a family). On one hand, this has led to de- host. tectors [9, 12, 25] that use sophisticated static analysis to identify code that is semantically equivalent to a mal- • We present experimental evidence that demon- ware template. Since these techniquesfocus on the actual strates that our approach is feasible and usable in semantics of a program, it is not enough for a malware practice.
Recommended publications
  • Statistical Structures: Fingerprinting Malware for Classification and Analysis
    Statistical Structures: Fingerprinting Malware for Classification and Analysis Daniel Bilar Wellesley College (Wellesley, MA) Colby College (Waterville, ME) bilar <at> alum dot dartmouth dot org Why Structural Fingerprinting? Goal: Identifying and classifying malware Problem: For any single fingerprint, balance between over-fitting (type II error) and under- fitting (type I error) hard to achieve Approach: View binaries simultaneously from different structural perspectives and perform statistical analysis on these ‘structural fingerprints’ Different Perspectives Idea: Multiple perspectives may increase likelihood of correct identification and classification Structural Description Statistical static / Perspective Fingerprint dynamic? Assembly Count different Opcode Primarily instruction instructions frequency static distribution Win 32 API Observe API calls API call vector Primarily call made dynamic System Explore graph- Graph structural Primarily Dependence modeled control and properties static Graph data dependencies Fingerprint: Opcode frequency distribution Synopsis: Statically disassemble the binary, tabulate the opcode frequencies and construct a statistical fingerprint with a subset of said opcodes. Goal: Compare opcode fingerprint across non- malicious software and malware classes for quick identification and classification purposes. Main result: ‘Rare’ opcodes explain more data variation then common ones Goodware: Opcode Distribution 1, 2 ---------.exe Procedure: -------.exe 1. Inventoried PEs (EXE, DLL, ---------.exe etc) on XP box with Advanced Disk Catalog 2. Chose random EXE samples size: 122880 with MS Excel and Index totalopcodes: 10680 3, 4 your Files compiler: MS Visual C++ 6.0 3. Ran IDA with modified class: utility (process) InstructionCounter plugin on sample PEs 0001. 002145 20.08% mov 4. Augmented IDA output files 0002. 001859 17.41% push with PEID results (compiler) 0003. 000760 7.12% call and general ‘functionality 0004.
    [Show full text]
  • Systems Administrator
    SYSTEMS ADMINISTRATOR SUMMARY The System Administrator’s role is to manage in-house computer software systems, servers, storage devices and network connections to ensure high availability and security of the supported business applications. This individual also participates in the planning and implementation of policies and procedures to ensure system provisioning and maintenance that is consistent with company goals, industry best practices, and regulatory requirements. ESSENTIAL DUTIES AND RESPONSIBILITIES: The Authority’s Senior Systems Manager may designate various other activities. The following statements are intended to describe the general nature and level of work being performed. They are not intended to be construed, as an exhaustive list of all responsibilities, duties and skills required of personnel so classified. Nothing in this job description restricts management's right to assign or reassign duties and responsibilities to this job at any time for any reason, including reasonable accommodation. Operational Management Manage virtual and physical servers with Windows Server 2003 – 2012 R2 and RHEL operating systems Manage Active Directory, Microsoft Office 365, and server and workstation patching with SCCM Manage the physical and virtual environment (VMware) of 300 plus servers Have familiarity with MS SQL server, windows clustering, domain controller setup, and group policy Ensure the security of the server infrastructure by implementing industry best-practices regarding privacy, security, and regulatory compliance. Develop and maintain documentation about current environment setup, standard operating procedures, and best practices. Manage end user accounts, permissions, access rights, and storage allocations in accordance with best- practices Perform and test routine system backups and restores. Anticipate, mitigate, identify, troubleshoot, and correct hardware and software issues on servers, and workstations.
    [Show full text]
  • Common Threats to Cyber Security Part 1 of 2
    Common Threats to Cyber Security Part 1 of 2 Table of Contents Malware .......................................................................................................................................... 2 Viruses ............................................................................................................................................. 3 Worms ............................................................................................................................................. 4 Downloaders ................................................................................................................................... 6 Attack Scripts .................................................................................................................................. 8 Botnet ........................................................................................................................................... 10 IRCBotnet Example ....................................................................................................................... 12 Trojans (Backdoor) ........................................................................................................................ 14 Denial of Service ........................................................................................................................... 18 Rootkits ......................................................................................................................................... 20 Notices .........................................................................................................................................
    [Show full text]
  • Emerging Threats and Attack Trends
    Emerging Threats and Attack Trends Paul Oxman Cisco Security Research and Operations PSIRT_2009 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 1 Agenda What? Where? Why? Trends 2008/2009 - Year in Review Case Studies Threats on the Horizon Threat Containment PSIRT_2009 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 2 What? Where? Why? PSIRT_2009 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 3 What? Where? Why? What is a Threat? A warning sign of possible trouble Where are Threats? Everywhere you can, and more importantly cannot, think of Why are there Threats? The almighty dollar (or euro, etc.), the underground cyber crime industry is growing with each year PSIRT_2009 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 4 Examples of Threats Targeted Hacking Vulnerability Exploitation Malware Outbreaks Economic Espionage Intellectual Property Theft or Loss Network Access Abuse Theft of IT Resources PSIRT_2009 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 5 Areas of Opportunity Users Applications Network Services Operating Systems PSIRT_2009 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 6 Why? Fame Not so much anymore (more on this with Trends) Money The root of all evil… (more on this with the Year in Review) War A battlefront just as real as the air, land, and sea PSIRT_2009 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 7 Operational Evolution of Threats Emerging Threat Nuisance Threat Threat Evolution Unresolved Threat Policy and Process Reactive Process Socialized Process Formalized Process Definition Reaction Mitigation Technology Manual Process Human “In the Automated Loop” Response Evolution Burden Operational End-User “Help-Desk” Aware—Know End-User No End-User Increasingly Self- Awareness Knowledge Enough to Call Burden Reliant Support PSIRT_2009 © 2009 Cisco Systems, Inc.
    [Show full text]
  • Designing a Framework for End User Applications
    Lincoln University Digital Thesis Copyright Statement The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). This thesis may be consulted by you, provided you comply with the provisions of the Act and the following conditions of use: you will use the copy only for the purposes of research or private study you will recognise the author's right to be identified as the author of the thesis and due acknowledgement will be made to the author where appropriate you will obtain the author's permission before publishing any material from the thesis. Designing a Framework for End User Applications A thesis submitted in partial fulfilment of the requirements for Degree of Doctor of Philosophy in Software and Information Technology at Lincoln University By Yanbo Deng Lincoln University 2013 Abstract of a thesis submitted in partial fulfilment of the requirements for the Degree of Doctor of Philosophy in Software and Information Technology Designing a Framework for End User Applications by Yanbo Deng End user developers (i.e. non-professional developers) often create database applications to meet their immediate needs. However, these applications can often be difficult to generalise or adapt when requirements inevitably change. As part of this thesis, we visited several research institutions to investigate the issues of end user developed databases. We found that different user groups in the same organisation might require similar, but different, data management applications. However, the very specific designs used in most of these systems meant it was difficult to adapt them for other similar uses. In this thesis we propose a set of guidelines for supporting end user developers to create more flexible and adaptable data management applications.
    [Show full text]
  • Modeling of Computer Virus Spread and Its Application to Defense
    University of Aizu, Graduation Thesis. March, 2005 s1090109 1 Modeling of Computer Virus Spread and Its Application to Defense Jun Shitozawa s1090109 Supervised by Hiroshi Toyoizumi Abstract 2 Two Systems The purpose of this paper is to model a computer virus 2.1 Content Filtering spread and evaluate content filtering and IP address blacklisting with a key parameter of the reaction time R. Content filtering is a containment system that has a We model the Sasser worm by using the Pure Birth pro- database of content signatures known to represent par- cess in this paper. Although our results require a short ticular worms. Packets containing one of these signa- reaction time, this paper is useful to obviate the outbreak tures are dropped when a containment system member of the new worms having high reproduction rate λ. receives the packets. This containment system is able to stop computer worm outbreaks immediately when the systems obtain information of content signatures. How- 1 Introduction ever, it takes too much time to create content signatures, and this system has no effect on polymorphic worms In recent years, new computer worms are being created at a rapid pace with around 5 new computer worms per a [10]. A polymorphic worm is one whose code is trans- day. Furthermore, the speed at which the new computer formed regularly, so no single signature identifies it. worms spread is amazing. For example, Symantec [5] 2.2 The IP Address Blacklisting received 12041 notifications of an infection by Sasser.B in 7 days. IP address blacklisting is a containment system that has Computer worms are a kind of computer virus.
    [Show full text]
  • Computer Viruses and Related Threats
    NIST Special Publication 500-166 Allia3 imfiB? NATTL INST OF STANDARDS & TECH R.I.C. A11 103109837 Computer Viruses and Technolo^ Related Threats: U.S. DEPARTMENT OF COMMERCE National Institute of A Management Guide Standards and Technology John P. Wack Lisa J. Camahan NIST PUBLICATIONS HP Jli m m QC - — 100 U57 500-166 1989 C.2 rhe National Institute of Standards and Technology^ was established by an act of Congress on March 3, 1901. The Institute's overall goal is to strengthen and advance the Nation's science and technology and facilitate their effective application for public benefit. To this end, the Institute conducts research to assure interna- tional competitiveness and leadership of U.S. industry, science and technology. NIST work involves development and transfer of measurements, standards and related science and technology, in support of continually improving U.S. productivity, product quality and reliability, innovation and underlying science and engineering. The Institute's technical work is performed by the National Measurement Laboratory, the National Engineering Laboratory, the National Computer Systems Laboratory, and the Institute for Materials Science and Engineering. The National Measurement Laboratory Provides the national system of physical and chemical measurement; Basic Standards^ coordinates the system with measurement systems of other nations Radiation Research and furnishes essential services leading to accurate and imiform Chemical Physics physical and chemical measurement throughout the Nation's scientific Analytical Chemistry community, industry, and commerce; provides advisory and research services to other Government agencies; conducts physical and chemical research; develops, produces, and distributes Standard Reference Materials; provides calibration services; and manages the National Standard Reference Data System.
    [Show full text]
  • Software Assurance: an Overview of Current Industry Best Practices
    Software Assurance: An Overview of Current Industry Best Practices February 2008 Executive Summary Software Assurance: An Overview of Current Industry Best Practices Software underpins the information infrastructure that govern- ments, critical infrastructure providers and businesses worldwide depend upon for daily operations and business processes. These organizations widely and increasingly use commercial off-the- shelf software (“COTS”) to automate processes with information technology. At the same time, cyber attacks are becoming more stealthy and sophisticated, creating a complex and dynamic risk environment for IT-based operations that users are working to better understand and manage. As such, users have become in- creasingly concerned about the integrity, security and reliability of commercial software. To address these concerns and meet customer requirements, vendors have undertaken significant efforts to reduce vulner- abilities, improve resistance to attack and protect the integrity of the products they sell. These efforts are often referred to as “software assurance.” Software assurance is especially impor- tant for organizations critical to public safety and economic and national security. These users require a high level of confidence that commercial software is as secure as possible, something only achieved when software is created using best practices for secure software development. This white paper provides an overview of how SAFECode mem- bers approach software assurance, and how the use of best practices for software
    [Show full text]
  • End User Device Strategy: Security Framework & Controls
    End User Device Strategy: Security Framework & Controls v1.2 February 2013 End User Device Strategy: Security Framework & Controls This document presents the security framework for End User Devices working with OFFICIAL information, and defines the control for mobile laptops to be used for both OFFICIAL and OFFICIAL­SENSITIVE. The selected standards aim to optimise for technology and security assurance, low cost and complexity, minimal 3rd party software, good user experience, wider competition and choice, and improved alignment to the consumer and commodity IT market. This document defines standards for a mobile laptop with a “thick” operating system installed, such as Linux, Windows or MacOS X. Forthcoming guidance will cover thin client devices, smartphones and tablets but is not expected to vary significantly from the key principles of this guidance. The scope of this document is central government departments, their agencies and related bodies. Wider public sector organisations can use this security framework as part of their broader compliance with the Public Services Network (PSN) codes of connection. Security IT Reform strategic goals for a security framework are to: • make optimum use of native security functions, avoiding third party products wherever possible • make better use of controls around the data and services where they can often be more effective, rather than adding additional complexity to devices • allow greater user responsibility to reduce security complexity, maintaining user experience for the majority of responsible
    [Show full text]
  • 2020 State of the Phish: an In-Depth Look at User Awareness, Vulnerability and Resilience
    ANNUAL REPORT 2020 State of the Phish An in-depth look at user awareness, vulnerability and resilience proofpoint.com 2020 STATE OF THE PHISH | ANNUAL REPORT INTRODUCTION Do you have a good sense of how well users understand cybersecurity terms and best practices? Do you know the top issues infosec teams are dealing with as a result of phishing attacks? How about the ways organizations are fi ghting phishing attacks and the successes (and struggles) they’re experiencing? Our sixth annual State of the Phish report again brings you • The impacts information security professionals are critical, actionable insights into the current state of the phishing experiencing because of phishing attacks and the ways threat. You’ll learn about: they’re trying to combat these threats • How Proofpoint customers are approaching phishing • The end-user awareness and knowledge gaps that could awareness training, and the ways we’re helping them be hurting your cybersecurity defenses measure program success This year’s report includes analysis of data from a variety of sources, including the following: A survey of more than A survey of more than Nearly More than 3,500 600 50M 9M working adults across IT security professionals simulated phishing attacks suspicious emails seven countries across the same seven sent by our customers reported by our (the United States, countries over a 12-month period customers’ end users Australia, France, Germany, Japan, Spain and the United Kingdom) “Phishing” can mean different things to different people, but we use the term in a general sense. In the context of this report, phishing encompasses all socially engineered emails, regardless of the specifi c malicious intent (such as directing users to dangerous websites, distributing malware, collecting credentials, and so on).
    [Show full text]
  • ITSS 14 IT Security Standard – End User Protection
    ITSS_14 IT Security Standard – End User Protection Version Approved by Approval date Effective date Next review date 1.0 Vice-President, Finance and Operations 7 June 2016 7 June 2016 7 June 2017 Standard Statement End user computing (EUC) Hardware (e.g. desktop, notebooks workstations or tablets) are the primary gateway to the organisation’s sensitive information and business applications. Implementation of appropriate information security controls for EUC hardware can mitigate the risk to UNSW data and IT systems. Consequently end user protection is critical to ensuring a robust, Purpose reliable and secure UNSW IT environment. The purpose of this standard is to set out the rules for effective security measures enforced on EUC hardware as well as any future potential requirements for portable storage and mobile device management. This standard applies to all users of UNSW Information and Communication Technology resources – including (but not limited to) staff (including casuals), students, consultants and contractors, third parties, agency staff, alumni, associates and honoraries, conjoint appointments and visitors to UNSW. The Standard recognises the various types of user SOE (Standard Operating Environment) and non SOE builds deployed throughout UNSW. As such the depth of control that can be applied will differ depending on the level of management in place verses the end user functionality, which must be Scope considered from a risk perspective to achieve a balance. Build Type Owner / Managed by SOE UNSW IT SOE Faculty IT Non-SOE Research, School, Institute Non-SOE (BYOD) Individual Note: BYOD devices are primarily applicable to the ITSS_13 BYOD Guidelines. Are Local Documents on ☐ Yes ☐ Yes, subject to any areas specifically ☐ No this subject permitted? restricted within this Document Standard 1.
    [Show full text]
  • Measuring User Confidence in Smartphone Security and Privacy
    Measuring User Confidence in Smartphone Security and Privacy Erika Chin∗, Adrienne Porter Felt∗, Vyas Sekary, David Wagner∗ ∗University of California, Berkeley yIntel Labs {emc,apf,daw}@cs.berkeley.edu, [email protected] ABSTRACT Despite the popularity of smartphones, there are reasons to be- In order to direct and build an effective, secure mobile ecosys- lieve that privacy and security concerns might be inhibiting users tem, we must first understand user attitudes toward security and from realizing the full potential of their mobile devices. Although privacy for smartphones and how they may differ from attitudes to- half of U.S. adults own smartphones [5], mobile online shopping ward more traditional computing systems. What are users’ comfort is only 3% of overall shopping revenues [7], suggesting that users levels in performing different tasks? How do users select appli- are hesitant to perform these tasks on their smartphones. A re- cations? What are their overall perceptions of the platform? This cent commercial study also found that 60% of smartphone users understanding will help inform the design of more secure smart- are concerned that using mobile payments could put their financial phones that will enable users to safely and confidently benefit from and personal security at risk [4]. the potential and convenience offered by mobile platforms. Our goal is to help smartphone users confidently and securely To gain insight into user perceptions of smartphone security and harness the power of mobile platforms. In order to improve the se- installation habits, we conduct a user study involving 60 smart- curity of mobile systems, we must understand the challenges and phone users.
    [Show full text]