Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers

Total Page:16

File Type:pdf, Size:1020Kb

Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers Giorgio Severi Jim Meyer∗ Scott Coull Northeastern University Xailient Inc. FireEye Inc. [email protected] [email protected] scott.coull@fireeye.com Alina Oprea Northeastern University [email protected] Abstract goal is to alter the data point at inference time in order to Training pipelines for machine learning (ML) based malware induce a misclassification. However, in this paper, we focus classification often rely on crowdsourced threat feeds, expos- on the insidious problem of poisoning attacks [14], which ing a natural attack injection point. In this paper, we study attempt to influence the ML training process, and in partic- the susceptibility of feature-based ML malware classifiers to ular backdoor [28] poisoning attacks, where the adversary backdoor poisoning attacks, specifically focusing on challeng- places a carefully chosen pattern into the feature space such ing “clean label” attacks where attackers do not control the that the victim model learns to associate its presence with sample labeling process. We propose the use of techniques a class of the attacker’s choice. While evasion attacks have from explainable machine learning to guide the selection previously been demonstrated against both open-source [4] of relevant features and values to create effective backdoor and commercial malware classifiers [7], backdoor poisoning triggers in a model-agnostic fashion. Using multiple refer- offers attackers an attractive alternative that requires more ence datasets for malware classification, including Windows computational effort at the outset, but which can result in PE files, PDFs, and Android applications, we demonstrate a generic evasion capability for a variety of malware sam- effective attacks against a diverse set of machine learning ples and target classifiers. These backdoor attacks have been models and evaluate the effect of various constraints imposed shown to be extremely effective when applied to computer on the attacker. To demonstrate the feasibility of our backdoor vision models [21, 38] without requiring a large number of attacks in practice, we create a watermarking utility for Win- poisoned examples, but their applicability to the malware clas- dows PE files that preserves the binary’s functionality, and sification domain, and feature-based models in general, has we leverage similar behavior-preserving alteration method- not yet been investigated. ologies for Android and PDF files. Finally, we experiment Poisoning attacks are a danger in any situation where a with potential defensive strategies and show the difficulties of possibly malicious third party has the ability to tamper with a completely defending against these attacks, especially when subset of the training data. For this reason, they have come the attacks blend in with the legitimate sample distribution. to be considered as one of the most relevant threats to pro- duction deployed ML models [35]. We argue that the current training pipeline of many security vendors provides a natural 1 Introduction injection point for such attacks. Security companies, in fact, often rely on crowd-sourced threat feeds [1,6,8,9] to provide The endpoint security industry has increasingly adopted ma- them with a large, diverse stream of user-submitted binaries to chine learning (ML) based tools as integral components of train their classifiers. This is chiefly due to the sheer quantity their defense-in-depth strategies. In particular, classifiers us- of labeled binaries needed to achieve satisfactory detection ing features derived from static analysis of binaries are com- performance (tens to hundreds of millions of samples), and monly used to perform fast, pre-execution detection and pre- specifically the difficulty in adequately covering the diverse vention on the endpoint, and often act as the first line of de- set of goodware observed in practice (e.g., custom binaries, fense for end users [2,3,5]. Concurrently, we are witnessing multiple versions of popular software, software compiled with a corresponding increase in the attention dedicated to adver- different compilers, etc.). sarial attacks against malicious software (malware) detection One complication in this scenario, however, is that the models. The primary focus in this area has been the develop- labels for these crowd-sourced samples are often gener- ment of evasion attacks [13, 25, 62], where the adversary’s ated by applying several independent malware detection en- ∗The author contributed to this work while at FireEye Inc. gines [30], which would be impossible for an attacker to con- Outsourced data Proprietary data Gathering ML malware & Labeling classifier Preprocessing & The platforms collect data Feature Extraction and assign labels. Model training Users submit binaries to Attacker can now submit malware crowdsourced threat intelligence containing the same backdoor. The platforms for evaluation. model will be fooled into recognizing it Attacker submits poisoned The company obtains the outsourced data and as benign. benign files. uses it in the training of a ML malware classifier. Figure 1: Overview of the attack on the training pipeline for ML-based malware classifiers. trol. Therefore, in this paper, we study clean-label backdoor 2 Background attacks [55,65] against ML-based malware classifiers by de- 1 veloping a new, model-agnostic backdoor methodology. Our Malware Detection Systems. We can separate automated attack injects backdoored benign samples in the training set of malware detection approaches into two broad classes based a malware detector, with the goal of changing the prediction on their use of static or dynamic analysis. Dynamic analysis of malicious software samples watermarked with the same systems execute binary files in a virtualized environment, and pattern at inference time. To decouple the attack strategy from record the behavior of the sample looking for indicators of the specifics of the ML model, our main insight is to lever- malicious activities [10, 31, 41, 54, 63]. Meanwhile, static ana- age tools from ML explainability, namely SHapley Additive lyzers process executable files without running them, extract- exPlanations (SHAP) [40], to select a small set of highly ef- ing the features used for classification directly from the binary fective features and their values for creating the watermark. and its meta-data. With the shift towards ML based classifiers, We evaluate our attack against a variety of machine learning this second class can be further divided into two additional models trained on widely-used malware datasets, including subcategories: feature-based detectors [11, 42, 52, 53,56], and EMBER (Windows executables) [11], Contagio (PDFs) [57], raw-binary analyzers [22, 34, 48]. We focus our attacks on and Drebin (Android executables) [12]. Additionally, we ex- classifiers based on static features due to their prevalence in plore the impact of various real-world constraints on the ad- providing pre-execution detection and prevention for many versary’s success, and the viability of defensive mechanisms commercial endpoint protection solutions [2,3,5]. to detect the attack. Overall, our results show that the attack achieves high success rates across a number of scenarios and Adversarial Attacks. Adversarial attacks against machine that it can be difficult to detect due to the natural diversity learning models can also be broadly split into two main cate- present in the goodware samples. Our contributions are: gories: evasion attacks, where the goal of the adversary is to (i) We highlight a natural attack point which, if left add a small perturbation to a testing sample to get it misclassi- unguarded, may be used to compromise the training of fied; poisoning attacks, where the adversary tampers with the commercial, feature-based malware classifiers. training data, either injecting new data points, or modifying (ii) We propose the first general, model-agnostic method- existing ones, to cause misclassifications at inference time. ology for generating backdoors for feature-based The former has been extensively explored in the context of classifiers using explainable machine learning tech- computer vision [17], and previous research efforts have also niques. investigated the applicability of such techniques to malware classification [13, 27, 33, 59, 70]. The latter has been itself (iii) We demonstrate that explanation-guided backdoor divided into different subcategories. Availability poisoning attacks are feasible in practice by developing a back- attacks aim at degrading the overall model accuracy [14,29]. dooring utility for Windows PE files, and using similar Targeted poisoning attacks induce the model to misclassify functionality-preserving methods for Android and PDF a single instance at inference time [55,60]. Finally, in Back- files. We show that these methods can satisfy multiple, door attacks, the adversary’s goal is to inject a backdoor (or realistic adversarial constraints. watermark) pattern in the learned representation of the model, (iv) Finally, we evaluate mitigation techniques and demon- which can be exploited to control the classification results. In strate the challenges of fully defending against stealthy this context, a backdoor is a specific combination of features poisoning attacks. and selected values that the victim model is induced, during 1We will refer to the combination of features and values used to induce training, to associate with a target class. The same watermark, the misclassification,
Recommended publications
  • Backdoors in Software: a Cycle of Fear and Uncertainty
    BACKDOORS IN SOFTWARE: A CYCLE OF FEAR AND UNCERTAINTY Leah Holden, Tufts University, Medford, MA Abstract: With the advent of cryptography and an increasingly online world, an antagonist to the security provided by encryption has emerged: backdoors. Backdoors are slated as insidious but often they aren’t intended to be. Backdoors can just be abused maintenance accounts, exploited vulnerabilities, or created by another agent via infecting host servers with malware. We cannot ignore that they may also be intentionally programmed into the source code of an application. Like with the cyber attrition problem, it may be difficult to determine the root cause of a backdoor and challenging to prove its very existence. These characteristics of backdoors ultimately lead to a state of being that I liken to a cycle of fear where governments or companies throw accusations around, the public roasts the accused, and other companies live in fear of being next. Notably missing from this cycle are any concrete solutions for preventing backdoors. A pervasive problem underlies this cycle: backdoors could be mitigated if software developers and their managers fully embraced the concept that no one should ever be able to bypass authentication. From the Clipper chip to the FBI and Apple’s standoff, we will examine requested, suspected, and proven backdoors up through the end of 2017. We will also look at the aftermath of these incidents and their effect on the relationship between software and government. Introduction: A backdoor has been defined as both “an undocumented portal that allows an administrator to enter the system to troubleshoot or do upkeep” and “ a secret portal that hackers and intelligence agencies used to gain illicit access” (Zetter).
    [Show full text]
  • Veracode.Com
    Joe Brady Senior Solutions Architect [email protected] 1 Detecting Software Austin OWASP Backdoors August 30, 2011 Joe Brady Senior Solutions Architect Software Security Simplified [email protected] About • Veracode provides automated, SaaS-based, application security assessment and remediation capabilities for Internal , external and 3rd party Applications . • Automated techniques include static binary analysis and dynamic analysis . • Founded in 2006, includes application security experts from L 0pht, @stake, Guardent, Symantec, VeriSign and SPI Dynamics/Hewlett Packard 3 Now is a good time to think about software backdoors • Unverified and untested software is everywhere • It’s in your computer, house, car, phone, TV, printer and even refrigerator • Most of that software was developed by people you don’t trust or don’t know very well • You clicked on that link someone sent you didn’t you? What we will cover today (three things to worry think about) • Application Backdoors ‣ Backdoors in the applications you own, are buying or have built ‣ Do you know where your source code was last night? • System Backdoors ‣ Vulnerabilities in the software you use everyday that can be used to implant a system backdoor ‣ E.g. Aurora (CVE-2010-0249) • Mobile Backdoors ‣ Your phone just might be spying on you Attacker Motivation • Most practical method of compromise for many systems ‣ Let the users install your backdoor on systems you have no access to ‣ Looks like legitimate software so may bypass AV • Retrieve and manipulate valuable private data ‣ Looks like legitimate application traffic so little risk of detection by IDS and DLP • For high value targets it becomes cost effective and more reliable.
    [Show full text]
  • A Survey on Smartphones Security: Software Vulnerabilities, Malware, and Attacks
    (IJACSA) International Journal of Advanced Computer Science and Applications Vol. 8, No. 10, 2017 A Survey on Smartphones Security: Software Vulnerabilities, Malware, and Attacks Milad Taleby Ahvanooey*, Prof. Qianmu Li*, Mahdi Rabbani, Ahmed Raza Rajput School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, P.O. Box 210094 P.R. China. Abstract—Nowadays, the usage of smartphones and their desktop usage (desktop usage, web usage, overall is down to applications have become rapidly popular in people’s daily life. 44.9% in the first quarter of 2017). Further, based on the latest Over the last decade, availability of mobile money services such report released by Kaspersky on December 2016 [3], 36% of as mobile-payment systems and app markets have significantly online banking attacks have targeted Android devices and increased due to the different forms of apps and connectivity increased 8% compared to the year 2015. In all online banking provided by mobile devices, such as 3G, 4G, GPRS, and Wi-Fi, attacks in 2016, have been stolen more than $100 million etc. In the same trend, the number of vulnerabilities targeting around the world. Although Android OS becomes very popular these services and communication networks has raised as well. today, it is exposing more and more vulnerable encounter Therefore, smartphones have become ideal target devices for attacks due to having open-source software, thus everybody malicious programmers. With increasing the number of can develop apps freely. A malware writer (or developer) can vulnerabilities and attacks, there has been a corresponding ascent of the security countermeasures presented by the take advantage of these features to develop malicious apps.
    [Show full text]
  • Backdoor Attacks and Countermeasures on Deep Learning
    1 Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review Yansong Gao, Bao Gia Doan, Zhi Zhang, Siqi Ma, Jiliang Zhang, Anmin Fu, Surya Nepal, and Hyoungshick Kim Abstract—Backdoor attacks insert hidden associations or trig- cases, an attacker can intelligently bypass existing defenses with gers to the deep learning models to override correct inference an adaptive attack. Drawing the insights from the systematic such as classification and make the system perform maliciously review, we also present key areas for future research on the according to the attacker-chosen target while behaving normally backdoor, such as empirical security evaluations from physical in the absence of the trigger. As a new and rapidly evolving trigger attacks, and in particular, more efficient and practical realistic attack, it could result in dire consequences, especially countermeasures are solicited. considering that the backdoor attack surfaces are broad. In 2019, the U.S. Army Research Office started soliciting countermeasures Index Terms—Backdoor Attacks, Backdoor Countermeasures, and launching TrojAI project, the National Institute of Standards Deep Learning (DL), Deep Neural Network (DNN) and Technology has initialized a corresponding online competi- tion accordingly. CONTENTS However, there is still no systematic and comprehensive review of this emerging area. Firstly, there is currently no systematic taxonomy of backdoor attack surfaces according to I Introduction 2 the attacker’s capabilities. In this context, attacks are diverse and not combed. Secondly, there is also a lack of analysis and II A Taxonomy of Adversarial Attacks on Deep comparison of various nascent backdoor countermeasures. In Learning 3 this context, it is uneasy to follow the latest trend to develop II-A Adversarial Example Attack .
    [Show full text]
  • Ransomware Attack: What’S Your Data Recovery Plan?
    RANSOMWARE ATTACK: WHAT’S YOUR DATA RECOVERY PLAN? Learn more about how KeepItSafe can help to reduce costs, save time, and provide compliance for online 888 965 9988 backup, disaster recovery-as-a-Service, mobile data www.keepitsafe.com protection, and cloud SaaS backup — contact us today. [email protected] Understanding and Preventing Ransomware Attacks TABLE OF CONTENTS What is Ransomware and Why is it On the Rise? pg. 03 The Most Common Types of Ransomware pg. 04 How Ransomware Spreads pg. 06 What You Can Do: 10 Steps pg. 08 to Defend Against Ransomware How KeepItSafe® Helps Combat Ransomware pg. 11 2 www.keepitsafe.com Understanding and Preventing Ransomware Attacks WHAT IS RANSOMWARE AND WHY IS IT ON THE RISE? New Ransomware 1,400,000 1,200,000 1,000,000 800,000 600,000 400,000 200,000 0 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 2013 2014 2015 McAfee Labs Threats Report, August 2015 Imagine sitting down at your office computer, logging in to your corporate network, and being greeted by the following onscreen message: “We have locked you out of access to all of your company’s systems, files and other data. To have access restored, please deposit $100,000 in the following bitcoin account.” This is “ransomware,” one of the most prevalent forms of malicious cyber attacks facing businesses today. With a ransomware attack, a cyber hacker infects your network or device with malicious software, usually through code attached to an email or contained within seemingly legitimate software you download from a website. Once the malicious software propagates through your systems, the hacker can then encrypt your data — and contact you with an offer to pay a ransom to get the data back.
    [Show full text]
  • Back Door and Remote Administration Programs
    By:By:XÇzXÇzAA TÅÅtÜTÅÅtÜ ]A]A `t{ÅÉÉw`t{ÅÉÉw SupervisedSupervised By:DrBy:Dr.. LoLo’’aiai TawalbehTawalbeh New York Institute of Technology (NYIT)-Jordan’s Campus 1 Eng. Ammar Mahmood 11/2/2006 Introduction A backdoor in a computer system (or cryptosystem or algorithm) is a method of bypassing normal authentication or securing remote access to a computer, while attempting to remain hidden from casual inspection.(unauthorized persons/systems) Most backdoors are autonomic malicious programs that must be somehow installed to a computer. Some parasites do not require the installation, as their parts are already integrated into particular SW running on a remote host. 11/2/2006 Eng. Ammar Mahmood 2 Introduction The backdoor may take the form of an installed program (e.g., Back Orifice or the Sony/BMG rootkit backdoor installed when any of millions of Sony music CDs were played on a Windows computer), or could be a modification to a legitimate program. 11/2/2006 Eng. Ammar Mahmood 3 Ways of Infection Typical backdoors can be accidentally installed by unaware users. Some backdoors come attached to e- mail messages or are downloaded from the Internet using file sharing programs. Their authors give them unsuspicious names and trick users into opening or executing such files (Trojan horse ). Backdoors often are installed by other parasites like viruses, worms or even spyware (even antispyware e.g. AdWare SpyWare SE ). They get into the system without user knowledge and consent and affect everybody who uses a compromised computer. Some threats can be manually installed by malicious local users who have sufficient privileges for the software installation.
    [Show full text]
  • Backdoors in Crypto the Problem with Castles
    Backdoors in Crypto The Problem with Castles. Treason Doors. Nottingham Castle. ● Roger Mortimer, 1st Earl of March ruled from Nottingham Castle. Nottingham Castle. ● Roger Mortimer, 1st Earl of March ruled from Nottingham Castle. ● Nottingham Castle had a secret tunnel that bypassed all the defenses. Nottingham Castle. ● Roger Mortimer, 1st Earl of March ruled from Nottingham Castle. ● Nottingham Castle had a secret tunnel that bypassed all the defenses. ● 1330 AD - Secret passage isn’t so secret. ○ Overseer of the castle betrays Mortimer. ○ Leads raid through secret tunnel. ○ kills Mortimer’s guards, arrests Mortimer. ○ Mortimer is executed soon after. Maginot Line Fortifications. ● French WW2 fortifications. ● Claim: Designed so that if captured, they would be easier to recapture. What does this have to do with Crypto? ● Can you design a cipher that would keep you secure but you could break if other people used it? ● How? ● What properties should such a backdoor have? Backdoors Definition: ● Backdoors are built-in methods of bypassing the security of a system. Dual EC DRBG Backdoor: Overview ● Dual Elliptic Curve Deterministic Random Bit Generator. ● Backdoored by the NSA. ● Deployed in commercial systems. ● Exposed by Academic Cryptographers and Snowden. Dual EC DRBG Backdoor: How ● Dual Elliptic Curve Deterministic Random Bit Generator. ● Like Blum-Micali, but generates many bits at a time. Dual EC DRBG Backdoor: The players ● NSA - National Security Agency ○ Offensive/Defensive mission: Makes & breaks codes/ciphers, ○ Captures, listens to & analyzes communications. ● NIST - National Institute of Standards and Technology ○ Creates & evaluates national technology standards. ○ Trusted as a fair player nationally & internationally. ● RSA ○ Technology company, created to commercialize public key encryption.
    [Show full text]
  • Hunting Trojan Horses
    NUCAR Technical Report TR-01 January 2006 Hunting Trojan Horses Version 1.0 Micha Moffie and David Kaeli 1 Hunting Trojan Horses Micha Moffie and David Kaeli Computer Architecture Research Laboratory Northeastern University, Boston, MA {mmoffie,kaeli}@ece.neu.edu Abstract In this report we present HTH (Hunting Trojan Horses), a security framework for detecting Trojan Horses and Backdoors. The framework is composed of two main parts: 1) Harrier – an application security monitor that performs run-time monitoring to dynamically collect execution-related data, and 2) Secpert – a security-specific Expert System based on CLIPS, which analyzes the events collected by Harrier. Our main contributions to the security research are three-fold. First we identify common malicious behaviors, patterns, and characteristics of Trojan Horses and Backdoors. Second we develop a security policy that can identify such malicious behavior and open the door for effectively using expert systems to implement complex security policies. Third, we construct a prototype that successfully detects Trojan Horses and Backdoors. 1 Introduction Computer attacks grew at an alarming rate in 2004 [26] and this rate is expected to rise. Additionally, zero-day attack exploits are already being sold on the black market. Cases in which malicious code was used for financial gain were reported in the 2005 Symantec Internet Security Threat Report [31] (for the first half of 2005). Symantec also reports a rise in the occurrence of malicious code that exposes confidential information. They further report that this class of malicious code attack represents 74% of the top 50 code samples reported to Symantec in 2005 [31].
    [Show full text]
  • Defending Against Backdoor Techniques Used in Targeted Attacks
    TrendLabs Although the motivation behind targeted attack campaigns may vary, threat actors continue to go after the ‘crown jewels’ or confidential company data of enterprises. Based on a Harvard Business Review study,1 there are four types of data commonly stolen in targeted attacks: personally identifiable information (PII) (28%), authentication credentials (21%), intellectual property (20%), and other sensitive corporate/organizational data (16%). Before getting to the crown jewels, though, attackers need to gather a whole lot of other information about their target in order to infiltrate the network without being detected. This may involve gathering publicly available information about the target, as well as information about the target’s network infrastructure. The latter is often done with the use of malware, such as remote access tools or backdoors. Backdoors play a critical role in targeted attacks. Besides being the primary tool for stealing data, it is also through backdoors that attackers are able to go deeper into the target network without being detected. For this reason, threat actors often employ a wide-array of backdoor techniques to evade detection. For instance, attackers will launch a first-line backdoor to execute commands and establish command- and-control (C&C) communications. This will download the second-line backdoor that will steal information from compromised systems that attackers can use to be able to go deeper into their target networks. In the data exfiltration stage, backdoors are used for uploading files and employing certain ports to hide attackers’ malicious tracks. Figure 1. Backdoor techniques in targeted attacks 1 Scott Shackelford. (April 16, 2014).
    [Show full text]
  • Example of Trojan Horse
    Example Of Trojan Horse Hassan fled his motherliness jumps saliently or next after Gerhardt premises and clabber heroically, unicolor and catenate. Maury usually mangling flirtingly or clamour unmeaningly when genteel Kurtis brutify lest and languishingly. Lex influencing kitty-cornered if xerophilous Lucio hocus or centrifuging. And according to experts, it remains so. As with protecting against most common cybersecurity threats, effective cybersecurity software should be your front line of protection. But what if you need to form an allegiance with this person? Another type of the virus, Mydoom. Wonder Friends to read. When first developed, Gozi used rootkit components to hide its processes. What appearsto have been correctly uninstalled or malware that i get the date, or following paper describes the horse of! Please enter your password! Adware is often known for being an aggressive advertising software that puts unwanted advertising on your computer screen. In as far as events are concerned; the central Intelligence Agency has been conducting searches for people who engage in activities such as drug trafficking and other criminal activities. If someone tries to use your computer, they have to know your password. No matter whether a company favors innovation or not, today innovation is key not only to high productivity and growth, but to the mere survival in the highly competitive environment. The fields may be disguised as added security questions that could give the criminal needed information to gain access to the account later on. It is surprising how far hackers have come to attack people, eh? Run script if the backdoor is found, it will disconnect you from the server, and write to the console the name of the backdoor that you can use later.
    [Show full text]
  • The Trojan Horse Defense in Cybercrime Cases, 21 Santa Clara High Tech
    Santa Clara High Technology Law Journal Volume 21 | Issue 1 Article 1 2004 The rT ojan Horse Defense in Cybercrime Cases Susan W. Brenner Brian Carrier Jef Henninger Follow this and additional works at: http://digitalcommons.law.scu.edu/chtlj Part of the Law Commons Recommended Citation Susan W. Brenner, Brian Carrier, and Jef Henninger, The Trojan Horse Defense in Cybercrime Cases, 21 Santa Clara High Tech. L.J. 1 (2004). Available at: http://digitalcommons.law.scu.edu/chtlj/vol21/iss1/1 This Article is brought to you for free and open access by the Journals at Santa Clara Law Digital Commons. It has been accepted for inclusion in Santa Clara High Technology Law Journal by an authorized administrator of Santa Clara Law Digital Commons. For more information, please contact [email protected]. ARTICLES THE TROJAN HORSE DEFENSE IN CYBERCRIME CASES Susan W. Brennert & Brian Carrier with Jef Henninger* TABLE OF CONTENTS I. INTRODUCTION ............................................................. 3 II. LEGAL ISSUES ............................................................ 14 A. How the Trojan Horse Defense Is Used ...................... 16 1. Raise Reasonable Doubt ............................................ 16 2. Negate Mens Rea ..................................................... 18 3. Establishing the Defense .......................................... 18 B. How Can the Prosecution Respond? ............. .......... 21 1. Establish Defendant's Computer Expertise .............. 22 2. "Character" Evidence ..............................................
    [Show full text]
  • An Enhanced Fuzzing Framework for Physical SOHO Router Devices To
    Zhang et al. Cybersecurity (2021) 4:24 Cybersecurity https://doi.org/10.1186/s42400-021-00091-9 RESEARCH Open Access ESRFuzzer: an enhanced fuzzing framework for physical SOHO router devices to discover multi-Type vulnerabilities Yu Zhang1,2,3,4, Wei Huo1,2,3,4, Kunpeng Jian1,2,3,4, Ji Shi1,2,3,4,LongquanLiu1,2,3,4,YanyanZou1,2,3,4* , Chao Zhang5,6 and Baoxu Liu1,2,3,4 Abstract SOHO (small office/home office) routers provide services for end devices to connect to the Internet, playing an important role in cyberspace. Unfortunately, security vulnerabilities pervasively exist in these routers, especially in the web server modules, greatly endangering end users. To discover these vulnerabilities, fuzzing web server modules of SOHO routers is the most popular solution. However, its effectiveness is limited due to the lack of input specification, lack of routers’ internal running states, and lack of testing environment recovery mechanisms. Moreover, existing worksfordevicefuzzingaremorelikelytodetectmemorycorruptionvulnerabilities. In this paper, we propose a solution ESRFuzzer to address these issues. It is a fully automated fuzzing framework for testing physical SOHO devices. It continuously and effectively generates test cases by leveraging two input semantic models, i.e., KEY-VALUE data model and CONF-READ communication model, and automatically recovers the testing environment with power management. It also coordinates diversified mutation rules with multiple monitoring mechanisms to trigger multi-type vulnerabilities. With the guidance of the two semantic models, ESRFuzzer can work in two ways: general mode fuzzing and D-CONF mode fuzzing. General mode fuzzing can discover both issues which occur in the CONF and READ operation, while D-CONF mode fuzzing focus on the READ-op issues especially missed by general mode fuzzing.
    [Show full text]