WIDESPREAD INTERNET ATTACKS: DEFENSE-ORIENTED EVOLUTION AND COUNTERMEASURES

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the

Graduate School of The Ohio State University

By

Xun Wang, B.E., M.S.

*****

The Ohio State University

2007

Dissertation Committee:

Dong Xuan, Adviser
Ten H. Lai
Ming T. Liu

Approved by

Adviser
Graduate Program in Computer Science and Engineering

© Copyright by

Xun Wang

2007

ABSTRACT

Widespread Internet attacks, such as Distributed Denial of Service (DDoS) attacks and active worm attacks, have been major threats to the Internet in the recent past. Although tremendous research effort has focused on this domain, the defense against these attacks remains challenging for one reason: the attacks are evolving intelligently based on their knowledge of defense mechanisms. In other words, the attacks are becoming more intelligent and effective through defense-oriented evolution in order to defeat existing defense systems. The objectives of this dissertation are to obtain deep insight about these defense-oriented attacks and to address the challenges in defending against them.

While multiple elements define a specific defense system, the most important ones are the system infrastructure and algorithms. The evolving defense-oriented attacks can exploit and leverage knowledge of the infrastructure and algorithms of defense systems in order to counteract them. Hence, we can classify defense-oriented widespread Internet attacks into infrastructure-oriented and algorithm-oriented attacks. In this dissertation, we investigate a variety of such attacks and design new and more effective countermeasures against them.

For infrastructure-oriented attacks, we study two classes of new attacks that target different aspects of the defense system infrastructure. First, we investigate intelligent DDoS attacks, which aim to infer the architectures of the DDoS-defending Secure Overlay Forwarding Systems (SOFS) in order to launch attacks more efficiently than ordinary random DDoS attacks. Second, we study the invisible LOCalization attack, which can obtain location information of Internet Threat Monitoring (ITM) systems. In order to defend against these new attacks, we provide enhancements for SOFS and ITM systems.

For algorithm-oriented attacks, we first study a class of new active worms, the Varying Scan Rate worm, which deliberately varies its port scan rate during propagation to evade detection by existing network-based worm detection algorithms. Second, we focus on polymorphic worms, which change or possess new signatures to defeat existing host-based worm detection algorithms. Furthermore, we provide new and more effective detection approaches against these new worms.

The war between attackers and defenders is never ending. We believe this dissertation lays a foundation for deeply understanding the evolution of widespread Internet attacks and for enhancing defenses against them.

To my family.

ACKNOWLEDGMENTS

To reach this stage in my Ph.D. study and this point in my life, I am indebted to many great people for their wisdom, support, and love.

It was my great fortune to become the first Ph.D. student of Dr. Dong Xuan in September 2002. It is he who gave me the opportunity to conduct focused research over the past years, and it is he who showed me the road to high-quality work. While enjoying the freedom of independent thinking, I have greatly appreciated his insightful advice on my research as well as on life. I also greatly appreciate his patience in guiding me through my Ph.D. study. I still remember how much help and encouragement he gave me when I delivered my first formal academic presentation in English.

As a Ph.D. student in the Department of Computer Science and Engineering, I have also enjoyed and appreciated the advice and help of many other professors both within and outside of The Ohio State University, including Dr. Ming T. Liu, Dr. Ten H. Lai, Dr. David Lee, and Dr. Srinivasan Parthasarathy in the CSE department, as well as Dr. Xinwen Fu at Dakota State University and Dr. Wei Zhao at Rensselaer Polytechnic Institute. They have made my stay at The Ohio State University fun and fruitful.

During my Ph.D. study, I have enjoyed working with many fellow graduate students in the CSE department. I have had the chance to work with Sriram Chellappan, Thang Nam Le, Sandeep Reddy, Wenjun Gu, Corey Boyer, Kurt Schosek, Xiaole Bai, Boxuan Gu, and Adam Champion on shared projects or research problems, and it was a wonderful experience. I have also interacted extensively with a few other graduate colleagues, including Wei Yu at Texas A&M University, my research partner, who has given me the most help as a fellow student, and Prasad Calyam at the Ohio Supercomputer Center. Their help and laughter have greatly enriched my life at The Ohio State University. I would also like to thank my many other friends for their continued support during my life and study at OSU.

I am indebted to my parents, Shihong Wang and Fanding Zhang, my dearest sister Xu Wang, her husband Jason Chang, and their two lovely sons for their unconditional love and support. I would like to take this opportunity to express my deepest gratitude to my wonderful aunt, Xiaoman Duan, her husband Shengqi Wang, and my wonderful cousin, Yuan Wang. I would also like to thank the rest of my family, who are too numerous to name individually, for their love and help. It is this family that has made me strong and courageous enough to become who I am today. My family is my love, my inspiration, and my life.

VITA

January 29, 1977 ...... Born - Weinan, China

1999 ...... B.E. in Computer Engineering, East China Normal University, China

2002 ...... M.E. in Computer Engineering, East China Normal University, China

2006 ...... M.S. in Computer Science and Engineering, The Ohio State University

2002-present ...... Graduate Research and Teaching Associate, The Ohio State University

PUBLICATIONS

Research Publications

Wei Yu, Xun Wang, Dong Xuan and Wei Zhao. “On Detecting Camouflaging Worm”. in Proceedings of 23rd Annual Computer Security Applications Conference (ACSAC), December 2006.

Wei Yu, Xun Wang, Dong Xuan and David Lee. “Effective Detection of Active Smart Worms with Varying Scan Rate”. in Proceedings of 2nd IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), August 2006.

Xun Wang, Sriram Chellappan, Corey Boyer and Dong Xuan. “On the Effectiveness of Secure Overlay Forwarding Systems under Intelligent Distributed DoS Attacks”. IEEE Transactions on Parallel and Distributed Systems (TPDS), 17(7):619-632, July 2006.

Xun Wang, Sriram Chellappan, Wenjun Gu, Wei Yu and Dong Xuan. “Policy-driven Physical Attacks in Sensor Networks: Modeling and Measurement”. in Proceedings of IEEE Wireless Communications and Networking Conference (WCNC), April 2006.

Xun Wang, Wenjun Gu, Kurt Schosek, Sriram Chellappan and Dong Xuan. “Sensor Network Configuration under Physical Attacks”. International Journal of Ad Hoc and Ubiquitous Computing (IJAHUC), Inderscience, January 2006.

Wenjun Gu, Xun Wang, Sriram Chellappan, Dong Xuan and Ten H. Lai. “Defending against Search-based Physical Attacks in Sensor Networks”. in Proceedings of 2nd IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS), November 2005.

Wei Yu, Sriram Chellappan, Xun Wang and Dong Xuan. “On Defending Peer-to-Peer System-based Active Worm Attacks”. in Proceedings of 48th IEEE Global Telecommunications Conference (Globecom), November 2005.

Xun Wang, Sriram Chellappan, Wenjun Gu, Wei Yu and Dong Xuan. “Search-based Physical Attacks in Sensor Networks”. in Proceedings of 14th IEEE International Conference on Computer Communications and Networks (ICCCN), October 2005.

Xun Wang, Wenjun Gu, Kurt Schosek, Sriram Chellappan and Dong Xuan. “Sensor Network Configuration under Physical Attacks”. in Proceedings of 3rd International Conference on Computer Network and Mobile Computing (ICCNMC), August 2005.

Xun Wang, Wenjun Gu, Sriram Chellappan, Kurt Schosek and Dong Xuan. “Lifetime Optimization of Sensor Networks under Physical Attacks”. in Proceedings of IEEE International Conference on Communications (ICC), May 2005.

Dong Xuan, Sriram Chellappan, Xun Wang and Shengquan Wang. “Analyzing the Secure Overlay Services Architecture under Intelligent DDoS Attacks”. in Proceedings of 24th IEEE International Conference on Distributed Computing Systems (ICDCS), March 2004.

Dong Xuan, Sriram Chellappan and Xun Wang. “Resilience of Structured Peer-to-Peer Systems: Analysis and Enhancement”. Handbook on Theoretical and Algorithmic Aspects of Sensor, Ad Hoc Wireless and Peer-to-Peer Networks, CRC Press, 2004.

FIELDS OF STUDY

Major Field: Computer Science and Engineering

Studies in:

Computer Networking: Prof. Dong Xuan, Prof. Ten H. Lai, Prof. Ming T. Liu
Software Engineering: Prof. Atanas Rountev
Computer Architecture: Prof. Mario Lauria

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
   1.1 Widespread Internet Attacks are Major Threats to the Internet
   1.2 Widespread Internet Attacks are Evolving
   1.3 Contributions of the Dissertation: Defense-Oriented Evolution and Countermeasures
       1.3.1 Infrastructure-oriented Attacks
       1.3.2 Algorithm-oriented Attacks
   1.4 Organization of the Dissertation

2. Intelligent DDoS Attacks against Secure Overlay Forwarding Systems and Countermeasures
   2.1 Motivations
   2.2 Background
   2.3 Intelligent DDoS Attacks
   2.4 Analysis of Intelligent DDoS Attacks against SOFS Systems
       2.4.1 Analysis of Round based Intelligent DDoS Attacks
       2.4.2 Analysis of Continuous Intelligent DDoS Attacks
   2.5 Countermeasures
       2.5.1 Optimization of SOFS System Performance Under Round based Attacks
       2.5.2 General Design Guidelines to Enhance SOFS System Performance
   2.6 Related Work
   2.7 Summary

3. Localization Attack against Internet Threat Monitoring Systems and Countermeasures
   3.1 Motivations
   3.2 Background
       3.2.1 Internet Threat Monitoring Systems
       3.2.2 Localization Attacks against ITM Systems
   3.3 iLOC Attack
       3.3.1 Overview
       3.3.2 Attack Traffic Generation Stage
       3.3.3 Attack Traffic Decoding Stage
       3.3.4 Discussions
   3.4 Analysis
       3.4.1 Accuracy Analysis
       3.4.2 Invisibility Analysis
       3.4.3 Determination of Attack Parameters
   3.5 Implementation and Validation
       3.5.1 Implementation of the iLOC Attack
       3.5.2 Validation of the iLOC Attack
   3.6 Performance Evaluation
       3.6.1 Evaluation Methodology
       3.6.2 Results
   3.7 Guidelines of Countermeasures
   3.8 Related Work
   3.9 Summary

4. Varying Scan Rate Worms against Network-based Worm Detections and Countermeasures
   4.1 Motivations
   4.2 Background
       4.2.1 The Propagation Model of Traditional Worms
       4.2.2 Network-based Worm Detection
   4.3 The Active Worm with Varying Scan Rate
       4.3.1 The VSR Worm Model
       4.3.2 Analysis of the VSR Worm
   4.4 DEC Worm Detection
       4.4.1 Design Rationale
       4.4.2 DEC Worm Detection
       4.4.3 Space of Worm Detection
   4.5 Performance Evaluation
       4.5.1 Evaluation Methodology
       4.5.2 Detection Performance
   4.6 Related Work
   4.7 Summary

5. Polymorphic Worms against Host-based Worm Detection and Countermeasures
   5.1 Motivations
   5.2 Background
       5.2.1 Worm Detection
       5.2.2 Program Analysis
       5.2.3 Data Mining
   5.3 Polymorphic Worms
   5.4 Worm Detection via Mining Dynamic Program Execution
       5.4.1 Framework
       5.4.2 Dataset Collection
       5.4.3 Feature Extraction
       5.4.4 Classifier Learning and Worm Detection
   5.5 Experiments
       5.5.1 Experiment Setup and Metrics
       5.5.2 Experiment Results
   5.6 Discussions
   5.7 Related Work
   5.8 Summary

6. Concluding remarks

Bibliography

LIST OF TABLES

1.1 Defense-oriented attacks and countermeasures studied in this dissertation.
2.1 Optimal mapping degree with different NT
2.2 Optimal node distribution under 1 to 2 mapping with different NT
3.1 Defender Detection Rate PDD (Port 135)
4.1 Detection Time of Some Existing Detection Schemes
4.2 Maximal Infection Ratio of Some Existing Detection Schemes
4.3 DEC Performance Sensitivity to Parameter α
5.1 Detection results for the Naive Bayes based detection
5.2 Detection results for the SVM based detection

LIST OF FIGURES

2.1 The generalized SOFS architecture.
2.2 A Snapshot of the generalized SOFS architecture under the intelligent DDoS attacks.
2.3 Sensitivity of PS to L and mi under different attack intensities.
2.4 Node demarcation in our successive attack at the end of Round j.
2.5 Sensitivity of PS to NT under different L, mi and N.
2.6 Sensitivity of PS to L, mi and node distribution.
2.7 Sensitivity of PS to R (a) and PE (b).
2.8 Sensitivity of PS to L under different m (a), and to NC under different L and r (b).
2.9 Sensitivity of PS to NT under different r and L (a), and different r and m (b).
3.1 Workflow of the iLOC Attack
3.2 PN-code and Encoded Attack Traffic
3.3 Experiment Setup
3.4 Background Traffic vs. Traffic Mixed with iLOC Attack
3.5 PSD for Background Traffic vs. Traffic Mixed with iLOC Attack
3.6 Attack Successful Rate (Port 135)
3.7 Attack Successful Rate vs. Code Length
3.8 Attack Successful Rate vs. Number of Parallel Attack Sessions
3.9 Attack Successful Rate vs. Number of Parallel Attack Sessions
4.1 Infection ratio of different VSR worms.
4.2 The observed worm instance count of different VSR worms.
4.3 Bayes decision rule for normal and worm traffic features
4.4 Space of worm detection
4.5 Detection time of detection schemes on VSR worms
4.6 Maximal infection ratio of detection schemes on VSR worms
4.7 Detection time of detection schemes on the traditional PRS worms
4.8 Maximal infection ratio of detection schemes on the traditional PRS worms
5.1 Workflow of the off-line classifier learning
5.2 Workflow of the on-line worm detection
5.3 Basic idea of kernel function in SVM.

CHAPTER 1

INTRODUCTION

1.1 Widespread Internet Attacks are Major Threats to the Internet

Widespread Internet attacks are large-scale attacks whose attack sources spread widely over the Internet [23, 68]. They have been major threats to the Internet in the recent past, with many well-known examples such as Distributed Denial of Service (DDoS) attacks, active worm attacks, spam, and spyware. In July 2001, an active worm called “Code-Red” infected more than 350,000 Microsoft IIS servers. In less than 14 hours, this active worm caused more than 1.2 billion dollars in economic damage [87]. In October 2002, a DDoS attack lasted for only an hour but was able to shut down 7 of the 13 Internet DNS root servers [7]. In January 2003, another active worm called “Slammer” infected nearly 75,000 Microsoft SQL servers in less than 10 minutes and consequently caused large-scale disruptions in production systems worldwide [86]. In March 2004, active worms called “Witty” and “” infected many hosts in a short time and made them unusable [24]. This list of attacks keeps growing, with no apparent end in sight.

Furthermore, a recent trend has emerged in which different types of attacks are combined to increase attack sophistication and efficiency. For example, worms have launched DDoS attacks against the White House’s website (www.whitehouse.gov) at the final stage of their propagation [88]. More recently, in February 2004, a worm propagated rapidly to many hosts, which then flooded the websites www.sco.com and www.microsoft.com, thereby preventing legitimate users from accessing them [4]. The combination of different attack types is not limited to active worms and DDoS attacks. Many active worms are used to infect a large number of hosts and recruit them as bots or zombies, which are networked together to form botnets [99] [102] [92]. These botnets can be used to: (i) launch massive DDoS attacks that disrupt Internet utility [4], (ii) access confidential information that can be abused [5] through large-scale traffic sniffing, key logging, identity theft, etc., (iii) distribute large-scale unsolicited advertisement emails (as spam) or software (as adware), (iv) spread new malware by installing Trojan horses or other backdoor software, and (v) destroy data that has a high monetary value [6]. There is even evidence that botnets are being rented out for attacks on Internet e-businesses [102].

1.2 Widespread Internet Attacks are Evolving

Due to the massive damage caused by widespread Internet attacks, a significant amount of research effort has focused on developing effective methods to model, detect, and defend against them. Among these attacks, DDoS attacks and active worms are the most dominant and dangerous threats to the Internet. Research on understanding and defending against them is the most important and imperative, and it receives the greatest attention and effort.

Although much effort and progress have been made in this direction, effective defense against these attacks remains a challenge today due to one fact: widespread Internet attacks have evolved and are continuously evolving.

1. Evolution of DDoS Attacks

During its evolution, the DDoS attack has not only equipped itself with new attack approaches, it has also added new types of entities to its list of attack targets. Generally, a Denial of Service (DoS) attack is characterized by an explicit attempt to prevent the legitimate use of a system service. A Distributed Denial of Service (DDoS) attack deploys multiple attacking computers to attain this goal. In the early generations of DDoS attacks, the attacker sent a stream of packets to a victim to consume some key resource such as network bandwidth, computation capacity (CPU cycles), or memory, thereby rendering the resource unavailable to the victim's legitimate clients. In later DDoS attacks, the attacker sent a few malformed packets to confuse a vulnerable application or protocol on the victim machine and force it to freeze or reboot. Both of these approaches set specific hosts or networks as the victim, but the former targets network or computation resources, whereas the latter targets the protocol or application. Furthermore, new DDoS attacks target the Internet infrastructure (such as DNS systems) rather than specific victims [7].

DDoS attacks also attempt to evade detection. In the above attacks, the attacker continuously sends a large number of packets to a victim to exhaust its key resource, overload it to disable communication, crash its service, or block its network link. Typical examples are the TCP SYN attack, TCP and UDP flood attacks, the ICMP echo attack, and the Smurf attack [81]. These attack methods share one common feature: a large number of compromised machines or agents are involved in the attack and transmit packets at a high rate to the victim, which makes the DDoS attacks easy to detect. While potentially quite harmful, the high-rate nature of such attacks presents a statistical anomaly to network monitors, such that the attack can be detected, the attacker identified, and the attack effects mitigated. However, recent work shows that smart DDoS attackers can use maliciously chosen low-rate DoS traffic patterns that exploit TCP's retransmission time-out mechanism and throttle TCP flows to a small fraction of their ideal rate while eluding detection [66] [74].
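As a back-of-the-envelope illustration of why such pulsed patterns elude rate-based monitors, consider an attacker whose bursts are spaced at TCP's minimum retransmission timeout. The numbers below are illustrative assumptions, not measurements from [66] [74]:

```python
# Toy numbers for a low-rate ("shrew"-style) DoS pulse train.
# All values here are hypothetical, chosen only to show the arithmetic.
min_rto = 1.0       # seconds: assumed minimum TCP retransmission timeout
burst_len = 0.05    # seconds: burst just long enough to fill the bottleneck queue
burst_rate = 100.0  # Mbps: burst must briefly saturate the bottleneck link

# Each burst, timed to coincide with the victim flow's retransmissions,
# forces the flow back into timeout; yet the attacker's *average* rate is:
avg_rate = burst_rate * burst_len / min_rto
print(avg_rate)     # 5.0 Mbps, a small fraction of the 100 Mbps burst rate
```

Because a volume-based monitor sees only the 5 Mbps average, the attack hides below typical flooding thresholds while still throttling TCP flows.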

2. Evolution of Active Worms

Active worms also use multiple evolutionary principles to propagate themselves more efficiently. First, while the first-generation worms only used port scanning to propagate themselves, current worms propagate themselves more effectively using various methods, e.g., network port scanning, email, file sharing, Peer-to-Peer (P2P) networks, and Instant Messaging (IM). Second, worms use different scan strategies during different stages of propagation. For example, instead of using pure random scans to find victims, they use a hitlist to infect previously identified vulnerable hosts at the initial stage of propagation in order to increase propagation efficiency [132, 81]. Once they propagate to a new local network, they scan all IP addresses in this local network first in order to increase the chance of hitting victims. They use DNS, network topology, and routing information to identify active hosts instead of randomly scanning IP addresses [132, 81]. They split the target IP address space during propagation in order to avoid duplicate scans.

They also become more modular and organized in order to carry other attack payloads and launch any kind of organized and synchronized large-scale attack [51, 99]. Furthermore, in order to evade numerous worm detection systems, they are becoming stealthy. For example, the "Atak" worm [144] is a recently discovered active worm that attempts to remain hidden by sleeping (stopping scans) when it suspects it is under detection. Worms that adopt attack strategies similar to those of the "Atak" worm could yield overall scan traffic patterns different from those of traditional worms. Therefore, the existing network-based detection schemes with scan traffic monitoring will not be able to detect them effectively.

Unlike the above network-based worm detection systems, host-based worm detection systems search inbound binary code content for known patterns, or signatures, that correspond to worms. To date, in order to detect and/or block active worms, these worm detection systems use signatures that match bytes from a worm's payload, using techniques such as string matching at arbitrary payload offsets [3, 111] and regular expression matching within a payload [3].

However, newly evolved active worms tend to be polymorphic [20, 32, 63]. Polymorphic worms are able to change their binary representation or signature as part of the spreading process. This can be achieved with self-encryption mechanisms or semantics-preserving code manipulation techniques. Consequently, copies of a polymorphic worm may no longer share a common invariant substring of sufficient length, and existing detection systems will not recognize network streams that contain copies of worms or executables as manifestations of a worm outbreak. This worm evolution trend also requires us to enhance content-based worm detection schemes.
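The defeat of substring signatures by self-encryption can be sketched with a toy example. The payload, the "signature", and the single-byte XOR scheme below are invented purely for illustration; real polymorphic engines also mutate their decryptor code:

```python
# Invented worm payload and byte signature, for illustration only.
PAYLOAD = b"JMP exploit_shellcode CALL infect_host"
SIGNATURE = b"exploit_shellcode"

def xor_encrypt(data: bytes, key: int) -> bytes:
    """Naive self-encryption: each worm copy picks a fresh one-byte key."""
    return bytes(b ^ key for b in data)

# Two copies of the "same" worm, encrypted under different keys.
copy_a = xor_encrypt(PAYLOAD, 0x5A)
copy_b = xor_encrypt(PAYLOAD, 0xC3)

# String matching at arbitrary offsets catches the plain payload...
print(SIGNATURE in PAYLOAD)                      # True
# ...but neither encrypted copy contains the byte signature,
print(SIGNATURE in copy_a, SIGNATURE in copy_b)  # False False
# and the two copies share no invariant representation to extract one from.
print(copy_a == copy_b)                          # False
```

Even this trivial scheme is enough to break exact substring and regular expression matching on the payload, which is why content-based detectors must look beyond fixed byte patterns.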

1.3 Contributions of the Dissertation: Defense-Oriented Evolution and Countermeasures

Among the above evolutions of widespread Internet attacks, some aim to evade detection based on their knowledge of detection schemes, such as the low-rate DDoS attacks

Defense Element the Attack is Oriented | Attack | Countermeasure
Infrastructure: Architecture | Intelligent DDoS attacks against SOFS systems | Optimal configuration of SOFS systems
Infrastructure: Location | invisible LOCalization (iLOC) attacks against ITM systems | Enhancement of ITM systems
Algorithm: Network-based algorithm | Varying Scan Rate worms | attack target Distribution Entropy based dynamiC (DEC) worm detection
Algorithm: Network-based algorithm | Camouflaging worms | Spectrum analysis based worm detection
Algorithm: Host-based algorithm | Polymorphic worms | Worm detection through mining dynamic program Execution

Table 1.1: Defense-oriented attacks and countermeasures studied in this dissertation.

and polymorphic worms discussed previously. We notice that this kind of deliberate attack evolution and the resulting attacks are more effective (for the attackers) and more dangerous (for the defenses and victims) than random and ad hoc evolution. We carry out systematic and comprehensive investigations following a more general evolution trend: defense-oriented evolution. Defense-oriented evolution results in various enhanced or new attacks that take advantage of knowledge of existing defense systems to counteract these systems. In this dissertation, we investigate a variety of such potentially defense-oriented, evolved attacks in order to obtain deep insights about them and find vulnerabilities in existing defense systems. Consequently, we further enhance these defense systems or design new and more effective defense schemes.

While multiple elements define a specific defense system, the most important ones are its infrastructure and its algorithms. Defense-oriented (evolved) attacks can exploit and leverage knowledge of defense system infrastructure and algorithms in order to counteract them and make new attacks more effective and dangerous. Thus, we can classify the defense-oriented attacks into defense-infrastructure-oriented (or just infrastructure-oriented) and defense-algorithm-oriented (or just algorithm-oriented) attacks. We investigate different instances of each class of defense-oriented widespread Internet attacks, as shown in Table 1.1.

1.3.1 Infrastructure-oriented Attacks

Infrastructure is simple for single-site systems, such as host-based or single-device-based systems, but it can be sophisticated for distributed systems. There are several important elements in the infrastructure of a system, such as its architecture, topology, and components. In order to counteract defense systems, attackers desire to obtain information about their infrastructure elements. While some high-level infrastructure information (such as the constituent components and topology type) may be public, detailed information (such as the identities, roles, and locations of the components, and the relations and connections between components within the architecture) is not accessible outside of the system. However, it is this detailed infrastructure information that attackers can use to counteract the defense system. Therefore, attackers need specific attack approaches to obtain this information.

In our research, we focus on infrastructure-oriented attacks that target the architecture and location information of defense systems. We systematically investigate instances of these attacks, based on which we propose methods to enhance defense systems.

1. Architecture-oriented Attacks and Countermeasures

— Intelligent DDoS Attacks against SOFS Systems and Optimization of SOFS Systems

A recent approach to protecting communications from DDoS attacks involves the use of overlay systems. Although such systems perform well under random DDoS attacks, it is questionable whether they are resilient to intelligent DDoS attacks, which aim to infer the architectures of the systems in order to launch more efficient attacks. We define several intelligent DDoS attack models and develop analysis and simulations to study the impacts of intelligent DDoS attacks on system performance in terms of the path availability between clients and the server [127, 135]. We generalize such systems as Secure Overlay Forwarding Systems (SOFS). There are certain standard architectural features of such systems, i.e., layering, mapping degree, and node distribution.

We analyzed the SOFS system under discrete-round-based and continuous attacks using a general analytical approach and simulations, respectively. We observed that the system design features, attack strategies, attack intensities, prior knowledge about the system, and system recovery significantly impact system performance. Even under sophisticated attack strategies and intensities, we showed that smart design of system features and recovery can significantly reduce attack impacts. We provide a method to obtain optimal system configurations under given attack strategies and intensities. Furthermore, we propose a set of design guidelines to enhance SOFS performance under all general scenarios.
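The kind of tradeoff these design features create can be illustrated with a deliberately simplified model: suppose nodes fail independently and a client-server path survives only if, in every layer, at least one of the m mapped next-layer nodes is still up. This is a sketch under strong independence assumptions, not the dissertation's actual P_S analysis:

```python
def path_availability(p_down: float, m: int, L: int) -> float:
    """Toy model of path availability through an L-layer overlay.

    Assumes each node is disabled independently with probability p_down,
    and each hop can reach m alternative nodes in the next layer. This is
    only an illustration of how layering (L) and mapping degree (m)
    interact, not the exact formula analyzed in this dissertation.
    """
    per_layer = 1.0 - p_down ** m   # at least one of the m nodes survives
    return per_layer ** L           # every one of the L layers must survive

# Raising the mapping degree m improves availability under a fixed attack
# intensity, while adding layers L multiplies the chances of a broken hop.
print(path_availability(0.3, 1, 3))
print(path_availability(0.3, 2, 3))
```

Even this toy model shows why configuration matters: the same attack intensity yields very different path availability depending on how L and m are chosen, which motivates searching for optimal configurations.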

2. Location-oriented Attacks and Countermeasures

— invisible LOCalization Attack against Internet Threat Monitoring Systems

Internet threat monitoring (ITM) systems have been deployed in recent years to detect widely spreading threats and attacks on the Internet. However, the integrity and functionality of these systems largely depend on the location anonymity of their monitors. If the locations of monitors are disclosed, the attacker can bypass the monitors or even abuse them, significantly jeopardizing the performance of ITM systems. In this work, we study a new class of attack, the invisible LOCalization (iLOC) attack [130]. The iLOC attack can accurately and invisibly localize monitors of ITM systems. In the iLOC attack, the attacker launches low-rate scan traffic, encoded with a selected pseudo-noise code (PN-code), toward targeted networks. While the secret PN-code is invisible to others, the attacker can accurately determine the existence of monitors in the targeted networks based on whether the PN-code is embedded in the report data queried from the data center of the ITM system. We implement the iLOC attack and conduct experiments on a real-world ITM system to validate the feasibility of such attacks. We also conduct extensive simulations of the iLOC attack using real-world traces. Our data demonstrate that the iLOC attack can accurately identify monitors while remaining invisible to ITM systems. Finally, we present a set of guidelines to counteract the iLOC attack. Notably, the iLOC attack does not directly harm the Internet or defense systems by itself, but it can aid other widespread Internet attacks by defeating ITM systems, thereby increasing attack damage.
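The PN-code idea can be sketched as follows: the attacker modulates its scan rate with a secret +1/-1 code, then correlates the queried report traffic against that same code. Everything below, including the code, rates, and noise level, is a hypothetical toy; the real attack's encoding and decoding stages are considerably more involved:

```python
import random

PN_CODE = [1, -1, 1, 1, -1, 1, -1, -1]   # secret spreading code (assumed)

def encode_scan_rates(code, base=100.0, amp=20.0):
    """Scan-rate time series: base rate shifted up or down by each code chip."""
    return [base + amp * chip for chip in code]

def correlate(report, code):
    """Normalized correlation between a (noisy) report series and a code."""
    mean = sum(report) / len(report)
    centered = [r - mean for r in report]
    num = sum(c * chip for c, chip in zip(centered, code))
    den = (sum(c * c for c in centered) ** 0.5) * (len(code) ** 0.5)
    return num / den if den else 0.0

random.seed(1)
# Report queried from the ITM data center when a monitor sits in the
# targeted network: the attacker's encoded traffic plus background noise.
report_with_monitor = [r + random.gauss(0, 2) for r in encode_scan_rates(PN_CODE)]
# Report for a network with no monitor: background noise only.
report_background = [100 + random.gauss(0, 2) for _ in PN_CODE]

print(correlate(report_with_monitor, PN_CODE))   # high (near 1): monitor present
print(correlate(report_background, PN_CODE))     # much smaller in magnitude
```

Because the per-chip rate shift is small relative to background traffic, the pattern stays statistically inconspicuous to observers who do not know the code, while correlation against the secret code makes it stand out clearly to the attacker.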

1.3.2 Algorithm-oriented Attacks

Algorithms in a defense system deal with flow control, data processing, and decision-making in detection and response. Similar to the infrastructure information, the high-level algorithms are not unknown to attackers. Moreover, attackers do not need the detailed algorithm information, which would be difficult to obtain: knowledge of the high-level algorithms suffices for attackers to evolve their attacks to render current defense system algorithms ineffective, and thus to defeat the defense systems.

In this dissertation, we focus on algorithm-oriented worm attacks, which attempt to evolve based on knowledge of existing worm detection algorithms in order to circumvent detection. Since worm detection can be classified into network-based and host-based categories, we study algorithm-oriented worm attack instances in each category.

1. Network-based Algorithm-oriented Attacks and Countermeasures

— VSR Worm, C-Worm and Countermeasures

It has been observed that the number of infected hosts and overall port scan traffic

volume increase exponentially over time when traditional worms propagate in the

Internet [86][27][148]. Based on these observations, many network-based worm de-

tection algorithms associated with global scan traffic monitoring systems, such as

threshold-based detection and trend-based detection, have been developed to detect

large scale propagation of worms in the Internet [147][105][132][123]. However,

worm writers know that these detection algorithms expect exponentially increasing

port scan traffic or a large volume of port scan traffic during worm propagation.

Consequently, worm writers can evolve their worms accordingly so that existing worm detection algorithms will not observe abnormal traffic or generate an alarm during the propagation of the new worms. Hence the evolved worms can evade existing worm detection systems.
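The contrast between these detection styles and an evasive worm can be sketched with two toy detectors (illustrative code only, not the algorithms of the cited systems; all names and thresholds are invented):

```python
# Sketch (not the cited detection systems): two simple network-based
# detectors over a per-interval scan-count series.

def threshold_detect(scan_counts, threshold):
    """Alarm when scan volume in any interval exceeds a fixed threshold."""
    return any(c > threshold for c in scan_counts)

def trend_detect(scan_counts, growth_factor=1.5, consecutive=3):
    """Alarm when scan volume grows by at least `growth_factor` for
    `consecutive` successive intervals (an exponential-trend heuristic)."""
    run = 0
    for prev, cur in zip(scan_counts, scan_counts[1:]):
        run = run + 1 if prev > 0 and cur >= growth_factor * prev else 0
        if run >= consecutive:
            return True
    return False

# A traditional worm's scan volume grows exponentially over time:
worm = [10, 16, 25, 40, 64, 102, 163]
print(threshold_detect(worm, threshold=100))  # True
print(trend_detect(worm))                     # True
```

A worm that keeps its scan volume flat and low, as discussed next, would trigger neither detector.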

In this dissertation, we model a new class of active worms called the Varying Scan

Rate worm (the VSR worm in short) [141]. The VSR worm deliberately varies

its scan rate and is able to avoid effective detection by existing worm detection

schemes. As a countermeasure against the VSR worm, we design a new worm detec-

tion scheme called attack target Distribution Entropy based dynamiC worm detection

(DEC detection in short). DEC detection utilizes the attack target distribution and its

statistical entropy in conjunction with dynamic decision rules to distinguish worm

scan traffic from non-worm scan traffic. We conduct extensive performance eval-

uations on the DEC detection scheme using real-world traces as background scan

traffic. Our data clearly demonstrate the effectiveness of the DEC detection scheme

in detecting VSR worms as well as traditional worms.
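The intuition behind using the attack-target distribution and its entropy can be illustrated with a small sketch (hypothetical code and data, not the actual DEC detection scheme): a scanning worm probes targets near-uniformly across the address space, so the empirical entropy of its destination distribution is high relative to typical background scans.

```python
# Illustrative sketch of entropy over the scan-target distribution
# (not the dissertation's DEC algorithm; addresses are invented).
import math
from collections import Counter

def target_entropy(dest_addrs):
    """Shannon entropy (bits) of the empirical destination distribution."""
    counts = Counter(dest_addrs)
    total = len(dest_addrs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Benign background: a few hosts probed repeatedly -> low entropy.
background = ["10.0.0.1"] * 50 + ["10.0.0.2"] * 30 + ["10.0.0.3"] * 20
# Worm scan: 100 distinct, uniformly chosen targets -> high entropy.
worm_scan = [f"10.0.{i}.{i}" for i in range(100)]

print(round(target_entropy(background), 2))  # 1.49
print(round(target_entropy(worm_scan), 2))   # 6.64
```

Because this statistic depends on where scans go rather than how many there are, it remains informative even when the worm varies its scan rate.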

In our research, we also investigate another new class of active worms, i.e., the Cam-

ouflaging Worm (C-Worm), which has the ability to camouflage its propagation from

worm detection systems [142] through timely manipulation of scan traffic. In order

to detect C-Worms, we design a novel spectrum-based scheme. Our performance

results demonstrate that our scheme can detect the C-Worm more rapidly and accu-

rately in comparison with existing worm detection schemes.
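The spectral intuition can be sketched as follows (a toy example, not the dissertation's detection scheme; the traffic series and its period are invented): a worm that rhythmically raises and lowers its scan rate to stay under volume thresholds leaves a strong periodic component in the scan-traffic time series, visible as a sharp peak in its power spectrum.

```python
# Toy sketch of spectrum-based detection (not the actual C-Worm detector).
import cmath
import math

def power_spectrum(x):
    """Naive DFT power spectrum of a real series (stdlib only)."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]          # remove the DC component
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2)]

n = 64
# Camouflaged scan traffic: oscillates with period 8 around a modest mean.
cworm = [100 + 80 * math.sin(2 * math.pi * t / 8) for t in range(n)]
spec = power_spectrum(cworm)
peak = max(range(1, len(spec)), key=lambda k: spec[k])
print(peak)  # 8  (n / period = 64 / 8: the oscillation's frequency bin)
```

A flat or slowly trending series, by contrast, concentrates its energy near frequency zero, so a dominant non-zero peak is a distinctive signature of manipulated scan traffic.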

2. Host-based Algorithm-oriented Attacks and Countermeasures

— Polymorphic Worm Detection via Mining Dynamic Program Execution

As discussed in Section 1.2, host-based worm detection systems commonly use a col-

lection of worm signatures to determine whether an incoming executable is a worm

or not. However, worm writers know that the detection algorithms in these detection systems expect signatures of known worms. Consequently, they can generate

new worms that have different binary representations or signatures. They can even generate polymorphic worms, which change their binary representations or signatures as part of the propagation process. Thus, existing host-based worm detection algorithms will not observe the expected worm signatures during the propagation of these evolved worms and cannot detect them effectively.

In order to detect these polymorphic worms or worms whose signatures are unknown

to the host-based worm detection systems, we propose a new worm detection ap-

proach based on mining dynamic program executions [129]. This approach can cap-

ture the dynamic behavior of executables to provide accurate and efficient detection

against both seen and unseen worms. In particular, we execute a large number of real-world worms and benign executables and trace their system calls. To mine the large number of features extracted from the system call traces, we apply two classifier learning algorithms (Naive Bayes and Support Vector Machine). The learned

classifiers are further used to carry out rapid worm detection with low overhead on

the end-host. Our experimental results clearly demonstrate the effectiveness of our

approach to detect new and polymorphic worms in terms of very high detection rate

and low false positive rate.
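The flavor of this classification step can be sketched with a toy multinomial Naive Bayes classifier over bags of system-call names (all traces and labels below are invented for illustration; this is not the dissertation's feature set, training data, or implementation):

```python
# Toy Naive Bayes over system-call "bags" with Laplace smoothing
# (hypothetical data; illustrative only).
import math
from collections import Counter

def train_nb(traces, labels):
    vocab = {c for t in traces for c in t}
    counts = {l: Counter() for l in set(labels)}
    prior = Counter(labels)
    for t, l in zip(traces, labels):
        counts[l].update(t)
    return vocab, counts, prior, len(labels)

def classify(model, trace):
    vocab, counts, prior, n = model
    def logp(l):
        total = sum(counts[l].values())
        return (math.log(prior[l] / n)
                + sum(math.log((counts[l][c] + 1) / (total + len(vocab)))
                      for c in trace))
    return max(prior, key=logp)

worm_traces = [["socket", "connect", "send", "fork"],
               ["socket", "connect", "connect", "send"]]
benign_traces = [["open", "read", "write", "close"],
                 ["open", "mmap", "read", "close"]]
model = train_nb(worm_traces + benign_traces, ["worm"] * 2 + ["benign"] * 2)
print(classify(model, ["socket", "connect", "send"]))  # worm
print(classify(model, ["open", "read", "close"]))      # benign
```

Because classification only requires summing precomputed log-probabilities over the observed calls, the trained model can run on the end-host with low overhead.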

Biological evolution is based on genetic mutation and natural selection, i.e., it relies on a brute-force, ad hoc search for better genetic adaptation to the environment [19]. Man-made entities do not follow this random and slow evolutionary principle; instead, humans control the evolution of their products and adapt them purposefully and effectively. We believe the defense-oriented attack evolution studied in this dissertation is among the most dangerous and efficient types of evolution, since attackers deliberately

evolve their malware in order to defeat their adversaries, i.e., defense systems. However,

our purpose is not to encourage attacks, but to obtain deep insights about potential new

Internet threats and vulnerabilities of current defense systems in order to enhance current

defense systems and design more effective defenses against evolving widespread Internet

attacks.

1.4 Organization of the Dissertation

The rest of this dissertation is organized as follows. We present our investigation of infrastructure-oriented evolving attacks and countermeasures in Chapters 2 and 3, and then discuss algorithm-oriented evolving attacks and countermeasures in Chapters 4 and 5. Specifically, we discuss intelligent DDoS attacks against SOFS systems and countermeasures in Chapter 2, and the iLOC attack against ITM systems and countermeasures in Chapter 3. Afterwards, we detail the VSR worm attack and our DEC detection countermeasure in Chapter 4, and introduce worm detection against polymorphic worms through mining dynamic program execution in Chapter 5. Finally, we conclude this dissertation in Chapter 6.

CHAPTER 2

INTELLIGENT DDOS ATTACKS AGAINST SECURE OVERLAY FORWARDING SYSTEMS AND COUNTERMEASURES

In this chapter, we discuss the first type of infrastructure-oriented attack in this dissertation, which infers architecture information of a DDoS (Distributed Denial of Service) defense system in order to facilitate DDoS attacks. In particular, we define intelligent

DDoS attacks and generalize DDoS-defending overlay intermediate forwarding systems as

Secure Overlay Forwarding Systems (SOFS). The intelligent DDoS attacks we study here are evolved DDoS attacks that use knowledge of SOFS architectures to launch more efficient

DDoS attacks against SOFS. We analyze the SOFS system under discrete round based attacks using a general analytical approach, and under continuous attacks using simulations. We also provide optimal system architecture configurations for the SOFS system under expected attack strategies and intensities. Furthermore, we propose a set of design guidelines to enhance SOFS performance under general attack scenarios.

2.1 Motivations

DDoS attacks are currently major threats to communication in the Internet [83]. The current level of system resilience to DDoS attacks is far from adequate, and a tremendous amount of research is being done to improve system security under DDoS attacks [104, 76, 95, 66, 60, 10, 115]. For many applications, reliability of communication

over the Internet is not only important but mandatory. Typical examples of such applica-

tions are emergency, medical, and other related services. The system needs to be resilient

to attacks from malicious users within and outside of the system that aim to disrupt com-

munications.

A recent body of work in the realm of protecting communications between a set of

clients and a server against DDoS attacks employs proactive defense mechanisms using

overlay-based architectures [60, 10, 115]. Typically, in such overlay-based architectures, a

set of system deployed nodes on the Internet form a communication bridge between clients

and a critical server. The deployed nodes are intermediate forwarders of communication

from clients to the server. These nodes are arranged into overlay-based architectures (or

structures) that provide attack-resistant features to the overall communication. For exam-

ple, the architecture in the SOS system [60] is a set of overlay nodes arranged in three layers between clients and the server, through which traffic is authenticated and then routed.

These layers are SOAP (Secure Overlay Access Point), Beacons and Secret Servlets. A client that wishes to communicate with a server first contacts a node in the SOAP layer.

The node in the SOAP layer forwards the message to a node in the beacon layer, which then forwards the message to a node in the secret servlet layer, which routes the message to the server. In the Mayday system [10], the authors extend the work on SOS [60], primarily by relaxing the restriction on the number of layers (unlike in SOS, where it is fixed at three). In the Internet Indirection Infrastructure (I3) [115], one or more indirection points are introduced as intermediaries for communication between senders and receivers.

The design rationale in all these systems is to ensure, using proactive architectures, (i)

that the server and intermediate communication mechanisms are hidden from outsiders, (ii)

the presence of multiple/alternate paths to improve reliability, and (iii) access control to prevent illegitimate users from being serviced and to drop attack traffic far away from the server. The overall objective, though, is to ensure high degrees of path availability from clients to the server even when attackers try to compromise communication using random congestion-based DDoS attacks, bombarding randomly chosen nodes in the system with huge amounts of traffic.

While the above systems provide high degrees of path availability under random congestion-based DDoS attacks, they can be targeted by intelligent attackers that can break into the system structure in addition to congesting nodes. By break-in attacks, we mean attacks that break into a node and disclose its neighbors in the communication chain. By combining break-in attacks with congestion attacks, attackers can cause significantly worse damage than with pure random congestion. In fact, attackers can use the results of break-in attacks (disclosed nodes) to guide subsequent congestion attacks on those nodes. Under intense break-in attacks, the attacker can traverse the communication chain between the forwarder nodes, and can even disclose the server to eventually congest it and completely annul services.

We believe that such intelligent DDoS attacks, which combine break-in attacks with congestion attacks, are representative and potent threats to overlay-based systems, such as [60, 10, 115], that protect communications between clients and servers. However, existing work does not study system performance under these intelligent attacks. In this chapter, we extensively study the performance of such overlay-based systems when targeted by intelligent DDoS attacks that combine break-in and congestion attacks. We also subsequently study how the design features of such systems impact performance under intelligent attacks. As a first step, we generalize such systems as Secure Overlay Forwarding Systems

(SOFS). We also capture three standard architectural features of such systems 1: layering (the number of layers between the client and server), mapping degree (the number of next-layer neighbors a node can communicate with), and node distribution (the number of nodes per layer).

Our objective is to study the impacts of the design features of SOFS system on its performance under intelligent DDoS attacks, and to provide guidelines to design SOFS systems highly resilient to intelligent DDoS attacks.

2.2 Background

The SOFS Architecture

In its most basic version, the SOFS architecture consists of a set of overlay nodes arranged in layers of a hierarchy, as shown in Fig. 2.1. The nodes in these layers serve as intermediaries between the clients and the critical target 2. Such a system has three distinguishable design features: Layering, Mapping (Connectivity) Degree, and Node Distribution across layers. Each feature is described below.


Figure 2.1: The generalized SOFS architecture.

1We use the terms architectural features and design features interchangeably in this chapter.
2We use the terms target and server interchangeably in this chapter.

• Number of Layers (Layering): The number of layers in the architecture quantifies

the depth of control during access to the target. If the number of layers is L, then

clients must pass through these L layers before communicating with the target. The

importance of layering is that a larger number of layers implicitly means that the target is better hidden from external clients.

• Mapping (Connectivity) Degree: Each node in Layer i routes to node(s) in Layer

i + 1 towards the target to complete the communication chain. The mapping degree

in the SOFS architecture is a measure of the number of neighbors a node in Layer i

has in Layer i + 1. Typically, the larger the mapping degree, the more reliable the communication is, due to the availability of more paths. The largest mapping degree is 1 to all, where each node in Layer i has all nodes in Layer i + 1 as its neighbors.

• Node Distribution: Node distribution is a measure of the number of nodes in each

layer. Intuitively, it may seem that a uniform node distribution across layers is preferred, to ensure a degree of load balancing in the system. However, for a fixed number of nodes to be distributed across a fixed number of layers, it may be advisable

to deploy more nodes at layers closer to the target to increase defenses in sensitive

layers nearer the target.

A client that wishes to communicate with the target first contacts node(s) in the first layer, which contact node(s) in the second layer, and so on until the traffic reaches the target. In this architecture, each node is aware only of neighbors in its adjacent layers. A set of filters acts as a firewall surrounding the target, through which only legitimate traffic is allowed.
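The structure described above can be sketched as a small simulation (illustrative only; the builder, parameters, and the path-availability check are invented for this sketch, not a deployed system):

```python
# Minimal sketch of a generalized SOFS: L layers of forwarders, each node
# knowing `mapping_degree` next-layer neighbors; a message is relayed layer
# by layer toward the target while compromised nodes are skipped.
import random

def build_sofs(nodes_per_layer, mapping_degree):
    """layers[i][j] -> list of neighbor indices in layer i+1."""
    layers = []
    for i in range(len(nodes_per_layer) - 1):
        nxt = nodes_per_layer[i + 1]
        m = min(mapping_degree, nxt)
        layers.append([random.sample(range(nxt), m)
                       for _ in range(nodes_per_layer[i])])
    return layers

def path_exists(layers, nodes_per_layer, bad):
    """Can some first-layer entry node still reach through the last layer?
    `bad` is a set of (layer, index) pairs for congested/broken-in nodes."""
    alive = {j for j in range(nodes_per_layer[0]) if (0, j) not in bad}
    for i, table in enumerate(layers):
        alive = {k for j in alive for k in table[j] if (i + 1, k) not in bad}
        if not alive:
            return False
    return bool(alive)

random.seed(1)
layers = build_sofs([4, 4, 4], mapping_degree=2)
print(path_exists(layers, [4, 4, 4], bad=set()))       # True
print(path_exists(layers, [4, 4, 4],
                  bad={(1, k) for k in range(4)}))     # False: middle layer gone
```

Compromising an entire layer severs every path, which is exactly the vulnerability the intelligent attacks of the next section try to exploit.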

2.3 Intelligent DDoS Attacks

We now discuss how an intelligent attacker can compromise the SOFS system. The attacker has the ability to break into nodes to disclose the victim nodes' next-layer neighbors, and also the ability to congest nodes to prevent them from servicing legitimate clients. We formally define these two attacks below.

• Break-in Attacks: The attacker has the ability to attempt to break into nodes in the SOFS system. A successful break-in disables the victim node and discloses the victim node's neighbors.

• Congestion Attacks: The attacker has the ability to congest nodes in the SOFS system. By congestion-based DDoS attacks, or simply congestion attacks, we mean any of the distributed attack methods that prevent a victim machine from providing services.

Our work focuses on the theoretical analysis of the impacts of intelligent DDoS attacks on the SOFS system, rather than on the actual attack methods. However, we believe that both the break-in attack and congestion attack models we present are practical. Break-in attacks can be executed through intrusion attacks, or through malicious code hidden in messages sent by malicious clients, as in Trojan horse or active worm attacks [83]. When received by the victim node, the malicious code can execute on the victim node to disable it and retrieve the victim node's neighbor list. The malicious code can even self-propagate to the disclosed neighbors. The execution of congestion attacks on a victim machine results in the victim being prevented from servicing requests, or being disconnected from the system. This can be due to exhausting a key resource, overloading the machine to disable communication, crashing its service, or blocking its network link. Typical examples are the TCP SYN attack, TCP and UDP flood attacks, the ICMP echo attack, and the Smurf attack [83]. The above two attacks can be conducted in several possible ways. However, keeping in mind the above attack types, and with the intention of maximizing attack impact, the attacker will usually first conduct break-in attacks to disclose the identities of many nodes; congestion attacks on the disclosed nodes then follow. In this realm, we define two attack models below.

• A discrete round based attack model: In our attack models, the attacker can launch break-in attacks on a limited number of nodes only. In the round based attack model, the attacker launches break-in attacks in a round-by-round fashion, with part of the attempts made in each round. The rationale is that, by successively breaking into nodes and locating their neighbors, the attacker can disclose more nodes. We call this model discrete because the attacker starts a fresh round only after the results of all attempted break-ins in the current round are available to it. Congestion attacks follow next and are conducted in one round.

• A continuous attack model: In this model, the attacker first attempts to disclose some nodes using part of its break-in attack resources. The attacker then continuously keeps breaking into disclosed nodes as and when they are identified. Congestion attacks follow in a similar fashion.

The attack models are described in more detail in Sections 2.4.1 and 2.4.2. We wish to emphasize that the SOFS system also has recovery abilities to defend against attacks. However, any meaningful execution of the recovery mechanism is contingent on the attacks. In some cases, the system may not be able to conduct any effective recovery if the attacker can conduct its attack rapidly, disrupting system performance for a short duration. However, if the attack is slow, the system can attempt effective recovery actions to restore performance. More details on system recovery are given in Section 2.4.2.

In this chapter, we study the SOFS system performance under discrete round based

attacks and continuous attacks. We demonstrate that system performance is sensitive to

design features and attacks, and the architecture needs to be flexible in order to achieve

better performance under different attacks.

2.4 Analysis of Intelligent DDoS Attacks against SOFS Systems

2.4.1 Analysis of Round based Intelligent DDoS Attacks

In this section we conduct an extensive mathematical analysis on the SOFS architecture

under the discrete round based intelligent DDoS attack model with no system recovery.

The system we study consists of a total of N overlay nodes that can be active or dormant.

By active, we mean that the nodes are currently in the SOFS architecture and ready to

serve legitimate requests 3. A dormant overlay node is one that is a part of the system but

currently is not in the SOFS architecture and is not serving requests. In this chapter, when

we use the term overlay node, it could mean either an active or a dormant node. We denote

the number of active nodes in the SOFS architecture (also called SOFS nodes) by n (n ≤ N); these nodes are distributed across L layers. Layer i has ni nodes, and $\sum_{i=1}^{L} n_i = n$. Each node in Layer i has one or more neighbors in its next higher layer to complete the communication chain. We define the number of next-layer (Layer i) neighbors that a Layer i − 1 node has as mi.

3In the remainder of this chapter, if the context is clear, we will simply use node or SOFS node to refer to an active node.

In this chapter, we assume that attack resources are limited. By attack resources, we mean the attack capacity, which depends on the amount of attack facilities; for instance, this can be the number of slave machines recruited by the attacker to launch DDoS attacks [83]. We denote the break-in attack and congestion attack resources as NT and NC, respectively. Thus NT and NC are the maximum numbers of nodes on which the attacker can launch break-in and congestion attacks, and NT + NC ≤ N. With probability PB, the attacker can successfully break into a node and disclose its neighbors in a break-in attempt.

In the SOFS system we study in this section, the system does not perform any recovery to counter attacks. The significance of our analysis and its results lies in obtaining a fundamental understanding of attack impacts on the system (and its features). Nevertheless, our analysis is still practical: in some cases, the speed of attacks may be quite high, preventing the system from performing recoveries, and our analysis then provides insights into the damage caused by such rapid/burst attacks. With the SOFS system and attack specifics in place, we now formally define our performance metric, PS, as the probability that a client can find a path to communicate with the target under ongoing attacks.

Under a One-burst Round Based Attack Model

1. Attack Model

The model we define here is an instance of the discrete round based attack model

where the number of rounds is 1. The attacker will spend all the break-in attack

resources randomly and instantly in one round and then launch the congestion attack.

Even though this model may appear simple, in reality such an attack is possible when, say, the system is in a high state of alert anticipating imminent attacks, which the attacker is aware of, and the attacker still wishes to proceed. Here we assume

the attacker has no prior knowledge about the identities of the SOFS nodes, i.e.,

which overlay nodes are currently SOFS nodes.

2. Analysis


Figure 2.2: A snapshot of the generalized SOFS architecture under intelligent DDoS attacks.

Our goal is to determine PS, the probability that a client can find a path to commu-

nicate with the target under attacks. This is directly related to the number of nodes

compromised due to attacks (both break-in and congestion attacks). Thus, the key

defining feature of our analysis is in determining the set 4 of attacked SOFS nodes

in each layer. An intuitive way to analyze the system is to list all possible combinations of attacked nodes in each layer, then calculate and summarize PS over all combinations. It is easy to see that there can be many such combinations: for a system with L layers and n nodes evenly distributed, the number of combinations is in $\theta((n/L)^{2L})$. For a system with 3 layers and 100 SOFS nodes evenly distributed, we have about $1.0 \times 10^{10}$ combinations. This is a very large number, and it is not practical

4We use the terms set and number of nodes in a set interchangeably.

to analyze the system in this fashion. To circumvent this scalability problem, we take an alternate approach: based on the weak law of large numbers, we use average-case analysis. We calculate the average number of attacked SOFS nodes in each layer to obtain PS. In the following, we first derive PS, which depends on the SOFS architecture and the number of attacked SOFS nodes in each layer. We then discuss how to calculate the number of attacked SOFS nodes in each layer (including nodes broken into and congested).

1) Derivation of PS

Recall that PS is the probability that a client can successfully communicate with the target under attacks, which depends on the SOFS architecture and number of attacked

SOFS nodes. In the SOFS architecture, each SOFS node maintains a neighbor/routing table consisting of a number (determined by the mapping degree) of SOFS nodes in its next higher layer with which it can communicate. Upon receiving a message, a node in Layer i contacts a node in Layer i + 1 from its neighbor table and forwards the received message to that node. This process repeats until the target is reached via the nodes in successive higher layers. The routing thus takes place through active

SOFS nodes in a distributed fashion. We call a node bad or compromised if it has either been broken into or is congested, and thus cannot route a message; the other overlay nodes are good or alive nodes. During break-in or congestion attacks, a neighbor table may contain entries pointing to bad neighbors, which can cause a message delivery to fail. A snapshot of the system under an ongoing attack is shown in Fig. 2.2.

To compute PS, we should first know the probability Pi that a message can be successfully forwarded from Layer i − 1 to Layer i (1 ≤ i ≤ L + 1). Here Layer L + 1 refers to the set of filters that surround the target, which are also intermediate forwarders. We include this layer in our analysis because filter identities can be disclosed during a successful break-in at Layer L. By the property of the distributed routing algorithm, we can obtain PS as the direct product of all the Pi's, i.e., $P_S = \prod_{i=1}^{L+1} P_i$. Obviously, Pi depends on the availability of good nodes in Layer i that are in the routing tables of nodes in Layer i − 1. To this end, we define P(x, y, z) as the probability that a set of y nodes selected at random from x > y nodes contains a specific subset of z nodes. Then

$$P(x, y, z) = \binom{y}{z} \Big/ \binom{x}{z} \ \text{ if } y \ge z, \text{ and } 0 \text{ otherwise}.$$

We denote by si the number of bad SOFS nodes in Layer i. Recall that each SOFS node in Layer i − 1 has mi neighbors in Layer i. Then, on average, P(ni, si, mi) is the probability that all next-hop neighbors in Layer i of a node in Layer i − 1 are bad nodes. Hence Pi = 1 − P(ni, si, mi), and the probability PS that a message will be successfully received by the target can be expressed as

$$P_S = \prod_{i=1}^{L+1} P_i = \prod_{i=1}^{L+1} \bigl(1 - P(n_i, s_i, m_i)\bigr). \quad (2.1)$$

In (2.1), only si (the number of bad nodes) is undetermined. If we define bi and ci as the numbers of broken-into and congested nodes, respectively, in Layer i, then si = bi + ci. In the following we derive bi and ci.
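As a sanity check, the binomial ratio P(x, y, z) and Eq. (2.1) can be transcribed directly (the parameter values below are illustrative, not taken from the dissertation's evaluation):

```python
# Direct transcription of P(x, y, z) and Eq. (2.1).
from math import comb

def P(x, y, z):
    """Probability that y nodes drawn from x contain a given subset of z."""
    return comb(y, z) / comb(x, z) if y >= z else 0.0

def Ps(n, s, m):
    """Eq. (2.1): path availability over layers 1..L+1, given per-layer
    node counts n[i], bad-node counts s[i], and mapping degrees m[i]."""
    out = 1.0
    for ni, si, mi in zip(n, s, m):
        out *= 1.0 - P(ni, si, mi)
    return out

# Three layers of 30 nodes plus 10 filters; 5 bad nodes per layer,
# mapping degree 3 between layers (illustrative numbers).
print(round(Ps(n=[30, 30, 30, 10], s=[5, 5, 5, 0], m=[3, 3, 3, 3]), 4))  # 0.9926
```

Note that a layer with fewer bad nodes than the mapping degree contributes a factor of exactly 1, since no neighbor table can then consist entirely of bad nodes.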

2) Derivation of bi

In the one-burst round based attack model, bi depends on the break-in resource NT and the break-in probability PB. Since the attacker launches its break-in attacks randomly, the NT break-in attempts are uniformly distributed over the overlay nodes in the SOFS system. Thus the average number of broken-in SOFS nodes is $N_B = P_B \frac{n}{N} N_T$, and hence

$$b_i = P_B \left(\frac{n_i}{N}\right) N_T, \quad i = 1, \dots, L. \quad (2.2)$$

We assume here that the filters are well protected and cannot be broken into. Filters are special: they are not among the N overlay nodes, and thus are not targets of random attacks. Hence bL+1 = 0.

3) Derivation of ci

We now discuss the derivation of ci (the number of congested nodes in Layer i). Unlike bi, ci depends on the result of the break-in attacks and the congestion capacity NC. Thus, we first need to know the set of SOFS nodes in Layer i disclosed during the break-in attack phase. We divide the disclosed nodes in Layer i into three sets: (i) the set of nodes on which break-in attempts have not been made (denoted $d_i^N$), (ii) the set of nodes that have been unsuccessfully broken into (denoted $d_i^A$), and (iii) the set of nodes that were successfully broken into (which we need not consider here). The nodes in sets $d_i^N$ and $d_i^A$ will now be targeted by congestion attacks. We calculate $d_i^N$ and $d_i^A$ as follows. Let $Y_{i,j}$ be a random variable whose value is 1 when the jth node in Layer i is either a disclosed node or one on which a break-in attempt has been made. Let zi denote the average number of nodes that have been disclosed or on which break-in attempts have been made. Thus,

$$z_i = E\left(\sum_{j=1}^{n_i} Y_{i,j}\right) = \sum_{j=1}^{n_i} E(Y_{i,j}) = \sum_{j=1}^{n_i} \Pr\{Y_{i,j} = 1\}, \quad i = 1, \dots, L+1. \quad (2.3)$$

Denoting by hi the number of nodes in Layer i on which break-in attempts have been made, we have $h_i = N_T \left(\frac{n_i}{N}\right)$ for $i = 1, \dots, L$, and $h_{L+1} = 0$, because filters are not targets of break-in attacks, as discussed above. Thus, the probability that the jth node in Layer i is neither a disclosed node nor one on which a break-in attempt has been made is $(1 - \frac{m_i}{n_i})^{b_{i-1}} (1 - \frac{h_i}{n_i})$. The same node can be disclosed by more than one node in the previous layer; the factor $(1 - \frac{m_i}{n_i})^{b_{i-1}}$ excludes such overlaps. We now have

$$\Pr\{Y_{i,j} = 1\} = 1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right), \quad i = 1, \dots, L+1, \; j = 1, \dots, n_i. \quad (2.4)$$

$$z_i = \sum_{j=1}^{n_i} \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right)\right) = n_i \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right)\right), \quad i = 1, \dots, L+1. \quad (2.5)$$

We hence have,

$$d_i^N = z_i - h_i = n_i \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right)\right) - h_i, \quad i = 2, \dots, L+1. \quad (2.6)$$

$$d_i^A = \sum_{j=1}^{h_i - b_i} \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}}\right) = (h_i - b_i)\left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}}\right), \quad i = 2, \dots, L+1. \quad (2.7)$$

Note that nodes in the first layer cannot be disclosed by a break-in attack, and so $d_1^N = d_1^A = 0$.

The attacker will now congest the SOFS nodes in the sets $d_i^N$ and $d_i^A$, as their identities have been disclosed and they have not been successfully broken into. We denote by ND the average number of SOFS nodes that are disclosed but not successfully broken into; it is given by $N_D = \sum_{i=1}^{L+1} (d_i^N + d_i^A)$. We now proceed to derive ci,

the number of congested nodes in Layer i. Recall that NC is the overall number of

overlay nodes that the adversary can congest, and the congestion attacks follow after

break-in attacks. There are two cases here:

• NC ≥ ND: In this case, all ND disclosed SOFS nodes will be congested. Since the attacker still has capacity to congest NC − ND overlay nodes, it will expend its spare resources randomly. The extra congested nodes will be chosen uniformly at random from the remaining $N - N_B - (N_D - d_{L+1}^N - d_{L+1}^A)$ good overlay nodes, among which only a part are SOFS nodes. Here $d_{L+1}^N$ and $d_{L+1}^A$ are parts of the filters and hence are excluded from ND when determining the remaining overlay nodes that are targets for random congestion attacks 5. Therefore,

$$c_i = \begin{cases} d_i^N + d_i^A + (N_C - N_D) \cdot \dfrac{n_i - b_i - d_i^N - d_i^A}{N - N_B - (N_D - d_{L+1}^N - d_{L+1}^A)}, & i = 1, \dots, L, \\[2mm] d_{L+1}^N, & i = L+1. \end{cases} \quad (2.8)$$

• NC < ND: The attacker randomly congests NC nodes among the ND disclosed nodes. In this case,

$$c_i = \frac{N_C}{N_D} \left(d_i^N + d_i^A\right), \quad i = 1, 2, \dots, L+1. \quad (2.9)$$

Recall that si = bi + ci is the number of bad nodes in Layer i. Having thus computed bi and ci, we obtain PS from (2.1).
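The full one-burst computation, Eqs. (2.2)-(2.9) followed by Eq. (2.1), can be transcribed numerically as below. This is a sketch with illustrative parameters, not the dissertation's evaluation code; in particular, the binomial ratio is evaluated as a falling-factorial product so that the fractional average node counts produced by the equations are accepted.

```python
# Numerical sketch of the one-burst analysis, Eqs. (2.2)-(2.9) then (2.1).

def P(x, y, z):
    """Continuous relaxation of C(y, z) / C(x, z) for average (fractional) y."""
    if y < z:
        return 0.0
    p = 1.0
    for k in range(z):
        p *= (y - k) / (x - k)
    return p

def one_burst_Ps(N, n_layers, m, n_filters, NT, NC, PB):
    """P_S under the one-burst attack model; n_layers[i] is the number of
    SOFS nodes in layer i+1, m a uniform mapping degree across layers."""
    L = len(n_layers)
    n = n_layers + [n_filters]                    # index L = filter layer
    b = [PB * (n[i] / N) * NT for i in range(L)] + [0.0]      # Eq. (2.2)
    h = [NT * (n[i] / N) for i in range(L)] + [0.0]           # attempts
    dN, dA = [0.0] * (L + 1), [0.0] * (L + 1)     # first layer stays 0
    for i in range(1, L + 1):
        hidden = (1 - m / n[i]) ** b[i - 1]       # overlap-excluding factor
        z = n[i] * (1 - hidden * (1 - h[i] / n[i]))           # Eq. (2.5)
        dN[i] = z - h[i]                                      # Eq. (2.6)
        dA[i] = (h[i] - b[i]) * (1 - hidden)                  # Eq. (2.7)
    ND = sum(dN) + sum(dA)
    NB = PB * (sum(n_layers) / N) * NT
    c = [0.0] * (L + 1)
    if NC >= ND:                                              # Eq. (2.8)
        pool = N - NB - (ND - dN[L] - dA[L])
        for i in range(L):
            c[i] = dN[i] + dA[i] + (NC - ND) * (n[i] - b[i] - dN[i] - dA[i]) / pool
        c[L] = dN[L]
    else:                                                     # Eq. (2.9)
        c = [(NC / ND) * (dN[i] + dA[i]) for i in range(L + 1)]
    Ps = 1.0                                                  # Eq. (2.1)
    for i in range(L + 1):
        Ps *= 1.0 - P(n[i], min(n[i], b[i] + c[i]), min(m, n[i]))
    return Ps

# Three layers of 33/33/34 nodes plus 10 filters out of N = 10000.
moderate = one_burst_Ps(10000, [33, 33, 34], 3, 10, NT=200, NC=2000, PB=0.5)
heavy = one_burst_Ps(10000, [33, 33, 34], 3, 10, NT=200, NC=6000, PB=0.5)
print(round(moderate, 3), round(heavy, 3))  # heavier congestion lowers P_S
```

Running the sketch with varying L, m, NT, and NC reproduces the qualitative trends discussed below: path availability falls as attack intensity grows, and a large mapping degree helps under pure congestion but hurts once break-ins disclose many neighbors.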

3. Numerical Results and Discussion

We now present numerical results based on our analysis above. We specifically

highlight the overall sensitivity of system performance to attacks and the impacts of

5In our model, the filters' identities are hidden from attackers, and they can be congested only upon disclosure by break-in attacks.

specific SOFS design features (layering and mapping degree) on performance under attacks. Impacts of node distribution per layer are discussed in the successive round based attack model in Section 2.4.1.

Fig. 2.3 shows the relationship between PS and the layering and mapping degree under different attack intensities. The mapping degrees (referred to as m in the figures) used here are: 1 to 1 mapping, in which each SOFS node has only one neighbor in the next layer; 1 to half mapping, in which each node has half of the nodes in the next layer as its neighbors; and 1 to all mapping, in which each node has all the nodes in the next layer as its neighbors. The other system and attack configuration parameters are: N = 10000, n = 100, PB = 0.5, SOFS nodes evenly distributed among the layers, and 10 filters. In Fig. 2.3 (a), NT is set to 0 and we evaluate performance under two congestion intensities, NC = 2000 and NC = 6000, representing moderate and heavy congestion attacks. In Fig. 2.3 (b), we fix NC = 2000 and analyze two break-in intensities, NT = 200 and NT = 2000. We make the following observations.


Figure 2.3: Sensitivity of PS to L and mi under different attack intensities.

Fig. 2.3 (a) shows that under the same attack intensities, different layer numbers result in different PS. When NT = 0 (pure random congestion attack), PS goes down as L increases. This is because there are fewer nodes per layer, so under random congestion fewer nodes per layer are left unaffected. This behavior is more pronounced when the mapping degree is small. We remind the reader of the SOS architecture [60], where, for defending against random congestion-based DDoS attacks (the same attack model as in this instance), the number of layers is fixed at 3 and the mapping degree is 1 to all. From Fig. 2.3 (a), we can see that fixing the number of layers at 3 is not always the best way to defend against such attacks. Instead, 1 layer is the best configuration against pure congestion-based attacks.

For any L, a larger mapping degree (more neighbors for each node) means more paths from nodes in one layer to nodes in the next layer, thus increasing PS, as seen in Fig. 2.3 (a) in the absence of break-in attacks. Under break-in attacks, a high mapping degree is not always good, as more nodes are disclosed due to break-ins. For instance, when the mapping is 1 to all, PS = 0 in Fig. 2.3 (b). Thus the effect of mapping depends on the attack intensities in the break-in and congestion phases. Finally, we see that an increase in NC and NT (attack intensities) leads to a decrease in PS, as more nodes can be congested or broken into, reducing path availability.

Under a Successive Round Based Attack

1. Attack Model

In the following, we extend our one-burst attack model significantly in order to study

performance under a highly sophisticated attack model called the successive round based attack model (successive attack for short). The successive attack model is representative of sophisticated attacks targeting the SOFS system and extends the one-burst attack model in two ways: (i) the attacker exploits prior knowledge about the

first layer SOFS nodes. Let PE represent the percentage of nodes in the first layer

known to the attacker prior to the attack (typically, these are first layer nodes advertised

to clients), (ii) the break-in attack phase is conducted in R rounds (R > 1), i.e.,

the attacker will launch its break-in attacks successively rather than in one burst. In

this attack model, more SOFS nodes are disclosed in a round by round fashion thus

accentuating the effect of break-in attacks.

The strategy of the successive attack is shown in Procedure 1. We denote β to be

the available break-in attack resources at the start of each round, and β = NT at the

start of round 1. For each round, the attacker will try to break into a minimum of α nodes, where α is fixed as NT/R. If the number of disclosed nodes is more than α, the attacker borrows resources from β to attack all disclosed nodes. Otherwise, it attacks the disclosed nodes plus some other randomly chosen nodes so as to expend α resources

for that round. The break-in attack capacity available (β) keeps decreasing till the

attacker has exhausted all of its NT resources. At any round, if the attacker has

discovered more nodes than its available capacity (β), it tries to break into a subset

(β) of the disclosed nodes and then starts the congestion phase. The attacker will congest

Procedure 1: Pseudocode of the successive attack strategy

System parameters: N, n, L, PB; Attack parameters: NT, NC, R, X1, β, α
Phase 1, Break-in attack:
1: β = NT, α = NT/R;
2: for j = 1 to R do
3:   if Xj < α < β then
4:     launch break-in attack on all Xj nodes, randomly launch break-in attack on α − Xj more nodes, and calculate the set of Xj+1 disclosed nodes; update β = β − α;
5:   end if
6:   if Xj < β ≤ α then
7:     launch break-in attack on all Xj nodes, randomly launch break-in attack on β − Xj more nodes, and calculate the set of Xj+1 disclosed nodes; break;
8:   end if
9:   if α ≤ Xj < β then
10:    launch break-in attack on all Xj nodes and calculate the set of Xj+1 disclosed nodes; update β = β − Xj;
11:  end if
12:  if Xj ≥ β then
13:    launch break-in attack on β nodes among the Xj nodes and calculate the set of Xj+1 disclosed nodes; break;
14:  end if
15: end for
16: calculate ND;
Phase 2, Congestion attack:
1: if NC > ND then
2:   congest the ND nodes and randomly congest (NC − ND) more nodes;
3: else
4:   congest NC nodes randomly chosen among the ND nodes;
5: end if
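The resource accounting in Procedure 1's break-in phase can be sketched in executable form. This is an illustrative Python rendering under simplifying assumptions (integer budgets, callback-based disclosure and break-in models); none of the names come from the dissertation.

```python
import random

def successive_breakin(x1_ids, all_ids, n_t, rounds, disclose, try_break):
    """Sketch of Procedure 1's break-in phase (an illustration, not the exact
    pseudocode). `disclose(v)` returns node ids revealed by compromising v;
    `try_break(v)` returns True on a successful break-in. Both callbacks,
    and all names here, are hypothetical."""
    beta = n_t                       # remaining break-in budget
    alpha = n_t // rounds            # per-round minimum effort (N_T / R)
    known = set(x1_ids)              # X_j: disclosed but not yet attacked
    attacked, broken = set(), set()
    for _ in range(rounds):
        if len(known) >= beta:       # case X_j >= beta: spend what is left
            targets = set(random.sample(sorted(known), beta))
            beta = 0
        else:                        # attack all of X_j, pad randomly to alpha
            budget = min(max(alpha, len(known)), beta)
            pool = [v for v in all_ids if v not in attacked and v not in known]
            targets = known | set(random.sample(pool, budget - len(known)))
            beta -= budget
        newly = set()
        for v in sorted(targets):
            attacked.add(v)
            if try_break(v):
                broken.add(v)
                newly.update(disclose(v))
        known = newly - attacked     # feeds X_{j+1} and, finally, N_D
        if beta == 0:
            break
    return broken, known
```

The returned `known` set corresponds to the disclosed-but-unattacked nodes that the congestion phase targets first.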

all disclosed nodes and more, or only a subset of the disclosed nodes, depending on its congestion capacity NC. Here we assume the attacker will not attempt to break into a node twice, and that a node already broken into will not be targeted by the congestion attack. Although there can be other variations of such successive attacks, we believe ours is a sufficiently representative model of sophisticated attacks.

2. Analysis

We again take the average case analysis approach and use a method similar to the one used to derive PS in (2.1). In calculating bi and ci in the one-burst attack model we analyzed before,

we had to take care of two possible overlap scenarios (i) a disclosed node could have

been already broken-into, (ii) the same node being disclosed by multiple lower layer

nodes. The complexity in overlap is accentuated here due to the nature of successive

attacks. This is because there are multiple rounds of break-in attacks before conges-

tion. We thus have to consider the above overlaps in the case of multiple rounds as

well. In the following, we will first introduce a concept of SOFS node demarcation

in order to deal with the above overlaps, and then derive bi and ci in each

round.

1) Node demarcation

In order to preserve the information about a node per round and across layers, we

introduce subscript j for round information, and subscript i for layer information.

We define Xj as the number of nodes whose identities are known to the attacker at

the start of round j. In order to deal with overlaps within and between rounds, we

need to separate the SOFS nodes into multiple sets as follows. At the beginning of

each round j, the attacker will base its break-in attack on the set of nodes disclosed

at the completion of round j − 1. We denote the set of nodes which are disclosed at

at round j−1 and on which break-in attempts are made in round j, as h^D_{i,j}. Depending on its spare capacity for that round, the attacker can also select more nodes to randomly break into. We denote this set as h^A_{i,j}. We define h_{i,j} = h^D_{i,j} + h^A_{i,j}, which is the number of nodes on which break-in attempts (successful or not) have been made

at Layer i in round j. Once the attacker has launched its break-in attacks on these h_{i,j} nodes, it will successfully break into a set of nodes. We denote b^D_{i,j} and b^A_{i,j} as the sets of nodes successfully broken into, and u^D_{i,j} and u^A_{i,j} as the sets of nodes unsuccessfully broken into, after the attacker launches its break-in attacks on the h^D_{i,j} and h^A_{i,j} sets of nodes respectively. We have,

b^D_{i,j} = PB * h^D_{i,j}, and b^A_{i,j} = PB * h^A_{i,j}, i = 1, ..., L, (2.10)

u^D_{i,j} = (1 − PB) * h^D_{i,j}, and u^A_{i,j} = (1 − PB) * h^A_{i,j}, i = 1, ..., L. (2.11)
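The expected-value bookkeeping of (2.10) and (2.11) amounts to scaling the per-layer attempt counts by PB. A minimal sketch, with our own illustrative names:

```python
# Average-case sketch of (2.10)-(2.11): each attempted node is broken into
# independently with probability P_B, so the expected successful (b) and
# unsuccessful (u) counts are plain scalings of the guided (h^D) and random
# (h^A) attempt counts per layer.

def breakin_outcome(h_D, h_A, p_b):
    b_D = [p_b * h for h in h_D]          # expected b^D_{i,j}
    b_A = [p_b * h for h in h_A]          # expected b^A_{i,j}
    u_D = [(1 - p_b) * h for h in h_D]    # expected u^D_{i,j}
    u_A = [(1 - p_b) * h for h in h_A]    # expected u^A_{i,j}
    return b_D, b_A, u_D, u_A

b_D, b_A, u_D, u_A = breakin_outcome([8, 4, 2], [10, 10, 10], p_b=0.5)
```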


Figure 2.4: Node demarcation in our successive attack at the end of Round j.

Breaking into nodes in the sets b^D_{i,j} and b^A_{i,j} will disclose a set of nodes denoted by d^W_{i,j}. This set d^W_{i,j} will overlap with (i) the nodes attacked in all previous rounds, given by Σ_{k=1}^{j−1} h_{i,k}, (ii) the nodes in set b^A_{i,j}, (iii) the nodes in sets b^D_{i,j} and u^D_{i,j}, and (iv) the nodes in set u^A_{i,j}, where we denote the set of nodes in d^W_{i,j} overlapping with u^A_{i,j} as d^A_{i,j}. Fig. 2.4 shows such overlaps at the end of round j. After discounting all the above overlaps from d^W_{i,j}, we get the set of disclosed nodes which have not been attacked till the end of round j, denoted as d^N_{i,j}. Based on the definitions of h^D_{i,j} and d^N_{i,j}, and the fact that the filters are not targets of break-in attacks, we have

h^D_{i,j} = d^N_{i,j−1}, i = 1, ..., L. (2.12)

Note that d^N_{i,j−1} and d^A_{i,j−1} are 0 for i = 1. This is because the nodes at the first layer cannot be disclosed by means of a break-in attack in any round j. Recall that Xj is the set of disclosed nodes whose identities are known to the attacker before round j and on which break-in attacks will be made at round j. Thus it can be calculated as Xj = Σ_{i=1}^{L} d^N_{i,j−1}. In the following, we proceed to derive the number of broken-into nodes (bi) and then compute the number of congested nodes (ci) for each round.

2) Derivation of bi

To derive bi, we first need to calculate the sets defined above. For ease of elucidation, we take the representative case Xj < α < β in Procedure 1 as an example to explain our analysis. Recall that β is the amount of break-in attack resource available at the current round. This is the most representative case among the ones possible; we briefly discuss the other possible cases after analyzing it. In this case, at the beginning of round j of its break-in attack phase, the attacker has resources to break into more nodes than those already disclosed prior to that round (d^N_{i,j−1}), and has attack resources left (α − Xj) to conduct random break-ins on other overlay nodes. Now there are N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k} unattacked overlay nodes, and among them n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k} are at Layer i. Thus, the number of nodes (h^A_{i,j}) on which random break-in attempts are made on Layer i in round j is

h^A_{i,j} = [(n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k}) / (N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k})] * (α − Xj), i = 1, 2, ..., L. (2.13)

We define b_{i,j} as the number of nodes broken into on Layer i in round j, which is the summation of b^A_{i,j} and b^D_{i,j}.^6 Based on (2.10), (2.12) and (2.13), we have,

b_{i,j} = PB * [(n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k}) / (N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k})] * (α − Xj) + PB * d^N_{i,j−1}, i = 1, 2, ..., L. (2.14)

We can now obtain bi as,

bi = Σ_{k=1}^{J} b_{i,k}, i = 1, 2, ..., L, (2.15)

where J is the number of rounds the attacker takes to exhaust all of its break-in resources (NT). Note that J ≤ R.
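For the representative case, (2.14) can be coded directly as a per-layer expectation. This is a hedged sketch with our own names; the filter layer is omitted and the case Xj < α < β is assumed.

```python
def broken_in_round(n_i, d_prev, h_hist, N, x_j, alpha, p_b):
    """Expected b_{i,j} per layer, following (2.14) for the X_j < alpha < beta
    case. d_prev[i] stands for d^N_{i,j-1}; h_hist[i] for the sum of h_{i,k}
    over rounds k < j. Illustrative names; filters (Layer L+1) are excluded."""
    unattacked = N - x_j - sum(h_hist)           # overlay nodes never attacked
    b = []
    for ni, dp, hh in zip(n_i, d_prev, h_hist):
        random_part = p_b * (ni - dp - hh) / unattacked * (alpha - x_j)
        guided_part = p_b * dp                   # break-ins on disclosed nodes
        b.append(random_part + guided_part)
    return b

# Toy numbers: 4 layers of 25 nodes, 4 nodes already disclosed on layer 2.
b = broken_in_round([25, 25, 25, 25], [0, 4, 0, 0], [0, 0, 0, 0],
                    N=100, x_j=4, alpha=24, p_b=0.5)
```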

To obtain bi, we need to compute the set of nodes d^N_{i,j}, which is used in (2.14). As discussed above, we have to extract the set d^N_{i,j} from d^W_{i,j}. Similar to the discussion in the one-burst attack model, we can derive d^N_{i,j} and d^A_{i,j} as follows. We first calculate the set of nodes that have been either disclosed or attacked. This is given by,

z_{i,j} = n_i * (1 − (1 − m_i/n_i)^{b_{i−1,j}} * (1 − Σ_{k=1}^{j} h_{i,k} / n_i)), for b_{i−1,j} > 0 and i = 2, ..., L + 1. (2.16)

Note that in our attack model, the attacker will not try to break into a node twice. Hence, to calculate d^N_{i,j} from z_{i,j}, we subtract the nodes on which break-in attempts have been made (Σ_{k=1}^{j} h_{i,k}). Thus, we have,

d^N_{i,j} = z_{i,j} − Σ_{k=1}^{j} h_{i,k}, for b_{i−1,j} > 0 and i = 2, ..., L + 1. (2.17)

Having computed d^N_{i,j}, we can use (2.14) and (2.15) to obtain bi. Now, d^A_{i,j} (which will be used to compute ci) is given by,

d^A_{i,j} = (h^A_{i,j} − b^A_{i,j}) * (1 − (1 − m_i/n_i)^{b_{i−1,j}}), for b_{i−1,j} > 0 and i = 2, ..., L + 1. (2.18)

6: Recall that h^D_{L+1,j}, h^A_{L+1,j} and b_{L+1,j} are all 0 because filters are not targets of break-in attacks.
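Equations (2.16)-(2.18) for a single layer can be sketched as follows; `b_prev`, `h_cum` and the function name are our own shorthand, not the dissertation's notation.

```python
def disclosed_after_round(n_i, m_i, b_prev, h_cum, h_A, b_A):
    """One-layer sketch of (2.16)-(2.18). b_prev = b_{i-1,j} nodes broken into
    in the previous layer, h_cum = sum of h_{i,k} up to round j. Returns the
    fresh disclosures d^N_{i,j} and the overlap set d^A_{i,j}. Names ours."""
    miss = (1 - m_i / n_i) ** b_prev          # node escapes every neighbor list
    z = n_i * (1 - miss * (1 - h_cum / n_i))  # disclosed or attacked, (2.16)
    d_N = z - h_cum                           # disclosed, not attacked, (2.17)
    d_A = (h_A - b_A) * (1 - miss)            # overlap with u^A set, (2.18)
    return d_N, d_A

# Toy layer: 100 nodes, 1-to-2 mapping, 1 break-in below, 10 attempted so far.
d_N, d_A = disclosed_after_round(100, 2, b_prev=1, h_cum=10, h_A=5, b_A=2)
```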

Having discussed the necessary derivations for the representative case above in detail, we now address the remaining cases of the successive attack. Apart from the representative case we have just discussed, there are three other cases: (i) Xj < β ≤ α, (ii) α ≤ Xj < β, and (iii) β ≤ Xj. For case (i), all the formulas derived for the above case can be directly applied, except that α has to be replaced by β. For case (ii), all the formulas in the above case can be applied except that h^A_{i,j} = 0. For case (iii), we have h^A_{i,j} = 0, and the formulas derived in the representative case have to be suitably modified. In this case, there are some disclosed nodes that the attacker does not try to break into because it has consumed all of its break-in resources. Such nodes will be attacked during the congestion phase. We denote this set of nodes in Layer i after round j as f_{i,j}. Note that f_{i,j} has relevance (f_{i,j} > 0) only when the attacker completes its break-in attack phase at round j; in this case, there is no resource left for random break-in attacks. Only β disclosed nodes, uniformly randomly distributed across the layers, will be subjected to break-in attempts. Then we have,

f_{i,j} = d^N_{i,j−1} − d^N_{i,j−1} * (β / Xj), h^A_{i,j} = 0, h^D_{i,j} = d^N_{i,j−1} − f_{i,j}, for i = 1, 2, ..., L, and (2.19)

d^N_{i,j} = n_i * (1 − (1 − m_i/n_i)^{b_{i−1,j}} * (1 − (Σ_{k=1}^{j} h_{i,k} + Σ_{k=1}^{j} f_{i,k}) / n_i)) − Σ_{k=1}^{j} h_{i,k} − Σ_{k=1}^{j} f_{i,k}, i = 2, ..., L + 1, (2.20)

where b_{i−1,j} > 0. Here, d^A_{i,j} is the same as in (2.18), and f_{L+1,j} = 0 because filters are not targets of break-in attacks. With the above derivations for this case, we can now use (2.14) and (2.15) to calculate bi.

3) Derivation of ci

Recall that in the congestion attack phase, the attacker will first congest the SOFS nodes disclosed during the break-in attack phase. Let the final round of the break-in attack be J (J ≤ R). Denoting ND as the number of nodes disclosed but not broken into, based on the definitions of u^D_{i,j}, d^N_{i,j}, d^A_{i,j} and f_{i,j}, we have,

ND = Σ_{i=1}^{L} Σ_{k=1}^{J} u^D_{i,k} + Σ_{k=1}^{J} d^N_{L+1,k} + Σ_{i=2}^{L} d^N_{i,J} + Σ_{i=1}^{L} f_{i,J} + Σ_{i=1}^{L} Σ_{k=1}^{J} d^A_{i,k}. (2.21)

The total number of broken-into nodes is NB = Σ_{i=1}^{L} Σ_{k=1}^{J} b_{i,k}. If NC ≥ ND, then, similar to (2.8), the number of congested nodes per layer, ci, is

ci = [Σ_{k=1}^{J} u^D_{i,k} + d^N_{i,J} + Σ_{k=1}^{J} d^A_{i,k} + f_{i,J}] + (NC − ND) * (n_i − Σ_{k=1}^{J} b_{i,k} − Σ_{k=1}^{J} u^D_{i,k} − d^N_{i,J} − Σ_{k=1}^{J} d^A_{i,k} − f_{i,J}) / (N − NB − (ND − Σ_{k=1}^{J} d^N_{L+1,k})), for i = 1, ..., L;
ci = Σ_{k=1}^{J} d^N_{L+1,k}, for i = L + 1. (2.22)

If NC < ND, then, similar to (2.9), we have

ci = (NC / ND) * (Σ_{k=1}^{J} u^D_{i,k} + d^N_{i,J} + f_{i,J} + Σ_{k=1}^{J} d^A_{i,k}), for i = 1, ..., L;
ci = (NC / ND) * Σ_{k=1}^{J} d^N_{L+1,k}, for i = L + 1. (2.23)

Recall that si = bi +ci is the set of bad nodes in Layer i. We can now obtain PS from

(2.1).
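The congestion allocation above can be illustrated with a simplified per-layer sketch of the NC ≥ ND branch; it ignores the separate filter-layer term of (2.22), and all names are ours.

```python
def congested_when_budget_exceeds(disclosed, good, n_c):
    """Sketch of the N_C >= N_D branch of (2.22): congest every disclosed-but-
    not-broken node (disclosed[i] per layer), then spread the leftover budget
    uniformly over the remaining good nodes (good[i] per layer). The filter
    layer's separate bookkeeping is omitted; all names are illustrative."""
    n_d = sum(disclosed)
    leftover = n_c - n_d                  # must be >= 0 in this branch
    total_good = sum(good)
    return [d + leftover * g / total_good for d, g in zip(disclosed, good)]

# 3 toy layers, budget 30 against 20 disclosed nodes.
c = congested_when_budget_exceeds([4, 6, 10], [20, 30, 50], n_c=30)
```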

Note that prior knowledge about the identities of the first layer SOFS nodes (PE) determines X1, i.e., X1 = n1 ∗ PE. In fact, we can consider this information as that obtained from a break-in attack at Round 0. The number of nodes "disclosed" at Round 0 is thus n1 ∗ PE, all of which are distributed at the first layer. At round 1, the attacker will launch its break-in attack based on this information. Thus b_{i,j}, d^N_{i,j}, ci, etc., can be calculated by applying Formulas (2.10) to (2.23). We point out that if we set PE = 0 and R = 1, the successive attack model degenerates into the one-burst attack model, and the formulas to compute b_{i,j}, d^N_{i,j}, ci, etc., simplify to the corresponding ones derived in the previous sub-section.

3. Numerical Results

In the following, we discuss the system performance (PS) under successive attacks.

Unless otherwise specified, the default system and attack parameters are N = 10000,

n = 100, L = 4, NC = 2000, NT = 200, R = 3, PB = 0.5, PE = 0.2 and the

SOFS nodes are evenly distributed among the layers. We introduce two new mapping

degrees here, namely 1 to 2 mapping, meaning each SOFS node has 2 neighbors in

the next layer; and 1 to 5 mapping, meaning each node has 5 neighbors in the next

layer.


Figure 2.5: Sensitivity of PS to NT under different L, mi and N.

In Fig. 2.5 we show how system performance, PS, changes with NT as the other

SOFS system parameters change. Fig. 2.5 (a) shows how the mapping degree and

total number of overlay nodes influence the relation between NT and PS. In this configuration, we set NC = 2000 and use an even SOFS node distribution. Fig. 2.5 (b) shows the sensitivity of PS to NT under different numbers of layers, L, and different mapping degrees. We make the following observations. First, PS is sensitive to NT: a larger NT results in a smaller PS. For higher mapping degrees, PS is more sensitive to changes in NT. The reason, as discussed previously, is that a higher mapping degree discloses more nodes under break-in attacks. Second, in Fig. 2.5 there is a portion of each curve where PS remains almost unchanged as NT increases. This stable part is due to the protection the layering in the SOFS architecture offers against break-in attacks guided by prior disclosure of SOFS nodes. The fall in PS beyond this stable part is due to the effect of random break-in attacks in addition to break-in attacks guided by prior disclosure.


Figure 2.6: Sensitivity of PS to L, mi and node distribution.

Fig. 2.6 (a) shows the impact of the layer number, L, on system performance, PS, under different mapping degrees. Similar to Fig. 2.3 (a) and (b), PS is sensitive to L and the mapping degree, even under multiple rounds of break-in attacks, i.e., when NT > 0 and R > 1. An increase in the number of layers can always slow down penetration of the break-in attacks towards the target. However, if the system deploys too many layers, the number of nodes on each layer decreases and the number of paths between layers decreases correspondingly, which causes a decrease in PS (recall that in our evaluation, the total number of SOFS nodes is fixed). Among the configurations we tested, the one with L = 4 and mapping degree 1 to 2 provides the best overall performance.

Fig. 2.6 (b) shows the impact of node distribution on PS when L and the mapping degree change. With the other parameters unchanged, we show the sensitivity of performance to three different node distributions per layer. The first is the even node distribution, wherein all layers have the same number of nodes (given by n/L). The second is the increasing node distribution, wherein the number of nodes in the first layer is fixed at n/L (to maintain a degree of load balancing with the clients) and the other layers have nodes in the increasing ratio 1 : 2 : ... : L − 1. The third is the decreasing node distribution, where the number of nodes in the first layer is again fixed at n/L and those in the other layers are in the decreasing ratio L − 1 : L − 2 : ... : 1. There can, of course, be other node distributions; we believe the above ones are representative enough to study the impact of node distribution.

We make the following observations. The node distribution does impact system performance. The sensitivity of PS to the node distribution is more pronounced for higher mapping degrees (more neighbors per node). A very interesting observation is that the increasing node distribution performs best among the tested distributions. This is because when the mapping degree is larger than 1 to 1, breaking into one node leads to multiple nodes being disclosed at the next layer; hence the layers closer to the target have more nodes disclosed and are more vulnerable. More nodes at these layers can compensate for the damage of disclosure. Also, we observe that as the number of layers increases, the sensitivity to node distribution gradually reduces. This is because as L increases, the difference in the number of nodes per layer becomes smaller across the different node distributions.


Figure 2.7: Sensitivity of PS to R (a) and PE (b).

Fig. 2.7 (a) shows the impact of R (the number of rounds) on PS under different L with mapping degree 1 to 5. The nodes are evenly distributed among the layers in this case. Overall, PS is sensitive to R and decreases as R increases. For larger values of L, PS is less sensitive to R because more layers provide more protection from break-in attacks even for large round numbers. We also observe that PS is sensitive to

PE in Fig. 2.7 (b). For higher mapping degrees, PS is more sensitive to changing PE.

The reason follows from previous discussions that a higher mapping degree discloses more nodes. For smaller L, PS is more sensitive to changing PE because a smaller L increases the attacker’s chance to penetrate the system, layer by layer.

2.4.2 Analysis of Continuous Intelligent DDoS Attacks

In this section, we study the performance of the SOFS system in the presence of another type of intelligent DDoS attack called continuous attacks. We also study the impacts of recovery mechanisms that the SOFS system can incorporate. The performance metric here is still PS.

Attack Model and System Recovery

The continuous attack model is different from the discrete round based attack model

proposed above in the sense that the attacker continuously breaks into SOFS nodes as and

when their identities are revealed to the attacker (and not in rounds). We define NT and NC

to be the maximum number of overlay nodes that can be simultaneously under break-in or

congestion attacks. Furthermore, here the attacker reuses its resources (NT and NC ) in a

more sophisticated way as follows. During system recovery (discussed next), the attacker

will know that a compromised node is recovered (it is replaced with a good node). If the

attacker attacks a non-SOFS node,^7 it will also know that it is a non-SOFS node. In either case, the attacker will redirect the attack to a new node in time Tred, which is referred to as the attack redirection delay.

Under on-going congestion attack, the attacker will keep attacking a victim node as

long as it is an SOFS node. During break-in attacks, once a break-in attempt is completed

on a node (irrespective of the result), the attacker will redirect the break-in attack to another

node also in time Tred. When the attacker redirects the attack, it will use the disclosed node list if there is any node in that list, otherwise it will randomly pick a node from all the overlay nodes except ones currently under attack. Obviously, the disclosed nodes are all

7: Recall that an SOFS node is one that is currently active in the SOFS structure, while a non-SOFS node is one that is part of the overlay system but not currently part of the SOFS structure.

SOFS nodes, so they will be targeted first by break-in attacks if there are enough resources.

Otherwise, the nodes are attacked by congestion attacks.

In our analysis here, the SOFS system employs recovery to defend against attacks.

While there can be many potential recovery mechanisms, the one we employ is proactive

recovery, where a proactive reset mechanism periodically resets every SOFS node. When

a proactive reset event happens on a SOFS node, the SOFS system immediately replaces

that node with a new SOFS node chosen from the set of non-SOFS nodes. We denote the

interval between two successive proactive resets on a SOFS node as Tp, which is called

system recovery delay. In this study, we mainly focus our discussion on proactive recov-

ery. Interested readers can refer to [128] for our discussion and analysis on other recovery

mechanisms.

Analysis

The goal of our analysis here is to study the impacts of system design features on sys-

tem performance under continuous attacks with system recovery. An analytical approach for this case, similar to the one conducted under discrete round based attacks, is too complicated. We instead use simulations to study system performance under continuous attacks in

the presence of system recovery.

In order to analyze the system, we implement a discrete event driven simulation tool

to simulate the attack model and system recovery. The simulated system consists of 5000 overlay nodes among which there are 40 SOFS nodes, and 10 filters. Each client is con- nected to 5 first layer SOFS nodes. In our simulations below, the attack redirection delay

(Tred) and the system recovery delay (Tp) follow exponential distributions. The system performance is sensitive to the ratio of the mean value of Tp to the mean value of Tred, denoted as r, rather than to the individual mean values of Tred or Tp. Thus r measures the competition between attacks and system recovery in terms of speed. A smaller value of r implies faster recovery, which is beneficial for the system.

In the simulations below we only use r to discuss the impacts of continuous attacks and
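The claim that only the ratio r matters under exponential delays can be checked with a tiny Monte Carlo sketch of a single attack-vs-recovery race; the normalization and all names are ours, not the dissertation's simulator.

```python
import random

def recovery_win_rate(r, trials=20000, seed=1):
    """Monte-Carlo sketch of one attack-vs-recovery race: with exponential
    delays, only the ratio r = mean(T_p) / mean(T_red) matters, as argued in
    the text. We normalize mean(T_red) = 1, so mean(T_p) = r. Returns the
    fraction of races the proactive reset wins. All names are illustrative."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t_p = rng.expovariate(1 / r)      # time until the next proactive reset
        t_red = rng.expovariate(1.0)      # time until the attack is redirected
        if t_p < t_red:
            wins += 1
    return wins / trials
```

For two independent exponentials, the reset wins with probability 1/(1 + r), so a smaller r (faster recovery) wins the race more often, matching the discussion above.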

system recovery.

Numerical Results and Discussions

In the following simulations, the default system and attack parameters are L = 4, PB = 0.5,

PE = 0.2, NT = 200 and NC = 200. Fig. 2.8 (a) shows the impact of layer number, L,


Figure 2.8: Sensitivity of PS to L under different m (a), and to NC under different L and r (b).

on PS under different mapping degrees when both NT and NC are fixed as 200, and r = 5.

Similar to Fig. 2.6 (a), PS is sensitive to L and the mapping degree. The sensitivity of PS to

L and the mapping degree is lower here than in the discrete round based attack model. The reason is the presence of system recovery: because the system replaces compromised and disclosed SOFS nodes, attack impacts are reduced. Fig. 2.8 (b) shows how L and r

influence PS when NT = 200, mapping degree is 1 to 2, and NC changes. Here L = 4 is

always better than L = 7. This is because, when NT is fixed and NC increases, random

congestion attacks dominate, and hence fewer layers improve performance, as discussed

in the round based attack model.

45 1 1 r=5, m=1to2 r=5, m=1tohalf r=5, L=4 r=5, L=7 r=10, m=1to2 r=10, m=1tohalf r=10, L=4 r=10, L=7 0.8 0.8 r=20, m=1to2 r=20, m=1tohalf r=20, L=4 r=20, L=7 0.6 0.6 Ps Ps 0.4 0.4

0.2 0.2

0 0 0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500

NT NT (a) (b)

Figure 2.9: Sensitivity of PS to NT under different r and L (a), and different r and m (b).

Fig. 2.9 (a) shows how PS changes with NT under different L and r. The mapping degree is fixed as 1 to 2. In most of the cases, L = 4 performs better than L = 7. This is because, in our simulations the total number of SOFS nodes is fixed. Under this situation, deploying more layers decreases the number of nodes on each layer, and so decreases the number of paths from clients to the target. However, there is one exception to this claim.

When r = 20 and NT = 50, L = 7 performs better than L = 4, which shows that more layers can be beneficial. The reason is, when NT is very small, there are few nodes disclosed and compromised at each layer. In this situation, decrease in PS is mainly due to disclosure and compromise of filters which are at the last layer. Here, slow recovery

(large r) cannot recover the compromised filters effectively. In this case, more layers can slow down the penetration of break-in attacks towards the filters and helps achieve better performance. The data also demonstrate that faster system recovery (smaller r) improves system performance more effectively.

Fig. 2.9 (b) shows how PS changes with NT under different mapping degrees and r, when L = 4. When NT is small, smaller mapping is better, especially when r is large.

But when NT is large, larger mapping performs better. This is because, when NT is not

large and the mapping degree is small, fewer nodes are disclosed. Hence fewer nodes are

attacked, resulting in high PS. However, when NT is very large, many SOFS nodes are disclosed and compromised and it is the system recovery that maintains a certain (possibly small) number of nodes alive, which guarantees PS > 0. The number of alive nodes here is mainly determined by r, which is not related to mapping degree. But mapping degree decides the number of available paths. Given a number of alive nodes, a larger mapping degree means more paths. Hence, PS increases with larger mapping degree, especially when r is small (fast system recovery).

From the above, we see that attack intensities and system design features have signifi-

cant impacts on system performance under continuous attacks with system recovery. We

also find that recovery plays a significant role in reducing impacts caused by even intense

attacks, by still sustaining a certain level of system performance. Large mapping degrees

help achieve better system performance in this circumstance.

2.5 Countermeasures

2.5.1 Optimization of SOFS System Performance Under Round based Attacks

In the above analysis, we have made important observations on SOFS system perfor-

mance, and the impacts of the design features on system performance under round based

intelligent DDoS attacks, using extensive analytical derivations. Based on this deep insight into the attacks, we can provide countermeasures against them in order to optimize SOFS

system performance under attacks. In the following, we address this issue. For brevity, we only present the methods to obtain the optimal mapping degree and node distribution as examples. Optimal configurations for other design features can be obtained

similarly.

The performance of the SOFS system, i.e., PS, is a function of system design features

and attack parameters, as seen in (2.24) below, where m[] and n[] are the mapping degree

and the number of nodes on each layer. The function F takes all the parameters used to calculate PS and summarizes the formulas given above.

PS = F(N, n, NC, NT, L, m[], n[], PB, PE, R) (2.24)

If all other system and attack parameters are fixed and we keep m[] as variables, then

we can use existing mathematical tools such as MATLAB to get the optimal mapping degree

under given system and attack parameters. Table 2.1 shows optimal mapping degrees under

default system and attack parameters that were used in Section 3. We can see that the

optimal mapping degree changes from 1 to all to 1 to 2 when NT changes from 0 to 2000.^8

This matches our previous observation that smaller mapping degrees improve the resilience of the system to break-in attacks. In Table 2.2, we get the optimal node distributions under

two different NT values when L = 4, n1 is fixed at n/L, i.e., 25, the mapping degree is 1 to 2, and all other parameters are set to the default configuration values. The results match our previous observation that the increasing node distribution performs better than the other node distributions.
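The MATLAB-style search described above reduces, in its simplest form, to evaluating (2.24) over candidate mapping degrees and keeping the best. A toy Python sketch with a purely hypothetical stand-in for F:

```python
def optimal_mapping(candidates, ps_of):
    """Plain-Python analog of the optimization described above: fix all other
    parameters of (2.24) and pick the mapping degree that maximizes P_S.
    `ps_of` stands in for the full function F and is purely hypothetical."""
    return max(candidates, key=ps_of)

# Toy stand-in for F: a trade-off where a moderate degree wins, echoing the
# observation that 1-to-all is best only when N_T = 0. Values are made up.
toy_ps = {1: 0.40, 2: 0.55, 3: 0.60, 4: 0.52, 100: 0.20}
best = optimal_mapping(list(toy_ps), toy_ps.get)
```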

While the above approach is useful in some cases, the real problem is how to optimize

multiple structure parameters simultaneously to achieve optimum performance. To complicate this further, some of the parameters, especially attack-related ones (such as NC, NT), may

be unknown to the system designer at design time. In some cases some parameters can be

8: While obtaining optimal mapping degrees, we constrain the mapping degrees to be equal across layers for consistency of workload across nodes in the system. However, this constraint can be relaxed if need be.

NT                     | NT = 0   | NT = 20 | NT = 200 | NT = 2000
Optimal mapping degree | 1 to all | 1 to 4  | 1 to 3   | 1 to 2

Table 2.1: Optimal mapping degree with different NT

NT  | n1 | n2 | n3 | n4
200 | 25 | 20 | 21 | 34
600 | 25 | 22 | 22 | 31

Table 2.2: Optimal node distribution under 1 to 2 mapping with different NT

estimated to be within ranges. Also, the system may have other constraints, such as latency and workload per node, that impact the choices of the number of layers, mapping degree, and other features. The optimization of design features needs to take these issues into consideration as well, so solving the overall optimization problem is far from easy. Nevertheless,

we do provide some discussions on how to obtain optimal configurations under reasonable

assumptions on system and attack generalities.

Consider an instance, where intensities can be predicted within some interval, i.e., we

know the ranges and the distributions of NC and NT values. Then, a reasonable approach

to address this problem is to obtain configurations to optimize the expected value of the

path availabilities, denoted as E(PS). It is formally defined in (2.25), where Pr(NC', NT') is the probability that NC and NT have values of NC' and NT' respectively.

E(PS) = Σ_{NC', NT'} Pr(NC', NT') × F(N, n, NC', NT', L, m[], n[], PB, PE, R).   (2.25)

Based on (2.25), we can use optimization tools such as those in MATLAB to get the

optimal mapping degree (m[]) and node distribution (n[]) to achieve overall optimal performance under certain ranges of NC and NT. In reality, the range and distribution of NC and NT, and even other attack parameters, can be obtained from historical experience and run-time measurement. Other attack parameters that can be estimated within ranges can be handled in the same way we deal with NC and NT.
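The expected-value optimization of (2.25) can be sketched as follows. The performance function `toy_ps` below is a hypothetical stand-in (the real F aggregates the path-availability formulas of this chapter), but the structure of the search is the same: enumerate candidate mapping degrees, weight performance by the probability of each attack intensity, and keep the configuration with the best expectation.

```python
def toy_ps(mapping_degree, n_t):
    """Hypothetical stand-in for F in (2.24): larger mapping degrees add
    routing choices but expose more nodes when break-in intensity n_t grows.
    The real F aggregates the path-availability formulas of this chapter."""
    break_in_penalty = 0.0002 * n_t * (mapping_degree - 1)
    routing_benefit = 0.05 * mapping_degree
    return max(0.0, min(1.0, 0.8 + routing_benefit - break_in_penalty))

def expected_ps(mapping_degree, nt_distribution):
    """E(PS) in the spirit of (2.25): weight performance by Pr(NT')."""
    return sum(p * toy_ps(mapping_degree, nt)
               for nt, p in nt_distribution.items())

def optimal_mapping(candidates, nt_distribution):
    """Pick the candidate mapping degree maximizing expected performance."""
    return max(candidates, key=lambda m: expected_ps(m, nt_distribution))
```

With no break-in attacks expected ({0: 1.0}) the search favors the largest candidate mapping degree, while a mixture that includes intense break-in attacks pushes it toward smaller degrees, mirroring the trend of Table 2.1.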

To summarize, the attack strategies, intensities, and prior knowledge about the system significantly impact system performance. However, the impacts are deeply influenced by the system design features. Larger values of L and smaller mapping degrees improve system resilience to break-in attacks, while the reverse is true for congestion-based attacks. An increasing node distribution performs better than other node distributions. These design features interact with each other to determine system performance under intelligent DDoS attacks.

2.5.2 General Design Guidelines to Enhance SOFS System Performance

Although we could not derive the optimal configuration of the SOFS system under continuous attacks, we are still able to obtain the impacts of the SOFS design features on performance under continuous attacks, which match our observations under round-based attacks.

Based on our findings for all the attack models in this chapter, we propose the following set of design guidelines to enhance performance under general scenarios.

• The design feature configurations should be flexible and adaptive to achieve high

performance under different intensities of attacks.

• When attack information is unknown, a moderate number of layers, a moderate mapping degree, and an increasing node distribution are recommended to sustain a more than acceptable level of performance.

• When break-in attacks dominate, more layers and smaller mapping degrees are recommended. When congestion-based attacks dominate, fewer layers and larger mapping degrees are better.

• System recovery always helps to improve system performance under attacks. Under intense break-in attacks, system recovery with large mapping degrees can sustain a more than acceptable level of performance.

2.6 Related Work

The main scope of this work is in the realm of overlay systems (organized into definite

structures) for defending against Distributed DoS attacks. The surveys in [83, 104, 76] on

DDoS attacks and defense are exhaustive, and interested readers can refer to those papers.

In the following, we focus on work using overlay systems in general to defend against

DDoS attacks.

Recently, several works have proposed solutions based on overlay networks to enhance

security of communication systems like [60, 10, 115, 116, 11, 26, 14, 125]. An overlay

solution to track DDoS floods has been proposed in [116]. [11] proposes an overlay routing infrastructure to enhance the resilience of the Internet. Chen and Chow designed a

random Peer-to-Peer network that connects the registered client networks with the regis-

tered servers to defend against DDoS attacks in [26]. Badishi et al. present a systematic

study of the vulnerabilities of gossip-based multicast protocols to DoS attacks and propose

a simple gossip-based multicast protocol that eliminates such vulnerabilities in [14]. The

effectiveness of location-hiding of proxy-network based overlays is discussed in [125].

Anonymity systems share some features with our SOFS system. Anonymity systems usually use intermediate forwarding to achieve anonymity. However, there are significant differences between SOFS and anonymity systems. The goal of SOFS is to ensure paths from clients to the server by placing multiple connections between nodes in successive layers. Many anonymity systems depend on one or more third-party nodes to generate a path [101, 134], which is not suitable for SOFS: SOFS cannot rely on a centralized node to achieve receiver anonymity, since the centralized node can itself become the target of a DDoS attack.

2.7 Summary

In this chapter, we have studied the impacts of architectural design features on SOFS, a generalized overlay intermediate-forwarding system, under intelligent DDoS attacks. We analyzed our SOFS system under discrete round-based attacks using a general analytical approach, and analyzed the system under continuous attacks using simulations. We observed that the system design features, attack strategies, intensities, prior knowledge about the system, and system recovery significantly impact system performance. Even under sophisticated attack strategies and intensities, we showed that with smart design of system features and recovery, attack impacts can be significantly reduced. As we discussed in

Section 2.4.1, we showed how to obtain optimal system configurations under expected at- tack strategies and intensities. Based on our findings in the chapter, we further proposed a set of design guidelines to enhance SOFS system performance under all general scenarios.

As a future direction for this research topic, we propose to design an SOFS system that is resilient to attacks while maintaining QoS. For instance, an increase in the number of layers, while improving resilience to break-in attacks, increases the latency of communication. An increase in the mapping degree has the opposite effect of decreasing latency due to more choices for routing. We are in the process of designing an SOFS system that is highly resilient to attacks while still attempting to achieve a desired level of QoS. Also, the impacts of our work extend beyond DDoS attack defense. There are several other applications where an existing structure enables better service delivery, including multicasting, real-time delivery, and file-sharing systems. As modeled in this chapter, attackers can cause significant damage to performance by exploiting knowledge of the structure already present in these systems. We believe that our work is a first step towards designing the features of resilient overlay architectures under intelligent attacks.

Analyzing the resilience of such systems under intelligent attacks will also be a part of our future work.

CHAPTER 3

LOCALIZATION ATTACK AGAINST INTERNET THREAT MONITORING SYSTEMS AND COUNTERMEASURES

In this chapter, we study a new class of attacks, the invisible LOCalization (iLOC) attack. The iLOC attack is another infrastructure-oriented attack discussed in this dissertation. Different from the architecture-oriented attack discussed in the previous chapter, it targets the location information of defense-system infrastructure. More particularly, the iLOC attack can accurately and invisibly obtain the locations of the monitors in Internet Threat Monitoring (ITM) systems, which are well-accepted defense systems against widespread Internet attacks. The task of iLOC is not to directly harm the Internet or ITM systems. Instead, its goal is to obtain the location information of the key components, i.e., monitors, in ITM systems, so that other widespread attacks can evade ITM systems and become more effective. We also provide countermeasures against this potential threat to the Internet.

3.1 Motivations

In recent years, widespread attacks, such as active worms [87, 86, 4] and Distributed

Denial of Service (DDoS) attacks [82, 2], have been major threats to the Internet. Due to the widely-spreading nature of these attacks, large scale traffic monitoring across the Internet

has become necessary in order to effectively detect and defend against them. Developing

and deploying Internet threat monitoring (ITM) systems (or motion sensor networks) is one of the major efforts in this realm.

However, the integrity and functionality of ITM systems largely depend on the anonymity

of the IP addresses covered by their monitors, i.e., the locations of monitors. If the locations

of monitors are identified, the attacker can deliberately avoid these monitors and directly

attack the uncovered IP address space. It is a known fact that the number of sub-networks

covered by monitors is much smaller than the total number of sub-networks in the Internet

[103, 138, 85]. In other words, the IP address space covered by monitors represents a very

small portion of the whole IP address space. For example, the SANS ISC covers around 1 million IP addresses, which is about 0.023% of the IPv4 address space. Hence, bypassing IP address spaces covered by monitors will significantly degrade the accuracy of the traffic data collected by the ITM system in reflecting the real situation of attack traffic. Furthermore, the attacker may also poison ITM systems by manipulating the traffic towards, and captured by, disclosed monitors. For example, the attacker can launch high-rate port-scan traffic to disclosed monitors and feign a large-scale worm propagation. The attackers may even launch retaliation attacks (e.g., DDoS) against participants (i.e., monitor contributors) of ITM systems, thereby discouraging them from contributing to ITM systems. In summary, the attacker can significantly compromise ITM system performance if he is able to disclose the locations of monitors. It is important to have a thorough understanding of such attacks, in order to design efficient countermeasures enabling the protection of ITM systems.

In this chapter, we investigate a new class of attacks called invisible LOCalization

(iLOC) attack, which can accurately and invisibly localize the monitors in ITM systems.

We further present a set of guidelines to counteract this potential threat to ITM systems.

3.2 Background

3.2.1 Internet Threat Monitoring Systems

Generally, an ITM system consists of a number of monitors and a data center. The monitors are distributed across the Internet and can be deployed at hosts, routers, firewalls, etc. Each monitor is responsible for monitoring and collecting traffic targeted at a range of IP addresses within a sub-network. The range of IP addresses covered by a monitor is also referred to as the location of the monitor. Periodically, the monitors send traffic logs to the data center. The data center analyzes the traffic logs and publishes reports to the public9. The reports provide critical insights into widespread Internet threats and attacks, and are used in detecting and defending against such attacks. ITM systems have been successfully used to detect the outbreaks of worms [103] and DDoS attacks [89]. There have been many real-world developments and deployments of such systems. Examples include

DOMINO (Distributed Overlay for Monitoring InterNet Outbreaks) [137], SANS ISC (Internet Storm Center) [103], Internet sink [138], network telescope [85], CAIDA [21], and myNetWatchMan [90].

9In order to maximize the usage of such reports, most existing ITM systems publish the reports online and make them accessible to public.

3.2.2 Localization Attacks against ITM Systems

A few works have been conducted on monitor localization attacks [18, 110] against

ITM systems. In this kind of attack, accuracy is very important for an attacker in identifying monitor locations. Meanwhile, invisibility is vital to the attacker as well. If the attack

attempts are identified by the defender (such as the ITM administrators), countermeasures

can be applied by the defender to reduce or eliminate the attack effect by filtering suspi-

cious traffic [120], decoying attackers [113], and even tracing back to attack origins for

accountability of their malicious acts [108]. Invisibility is critical for the attacker to evade

the above countermeasures.

However, it is challenging for the attacker to achieve these two objectives simultane-

ously. Intuitively, the attacker can use the high-rate attack traffic, as in [18, 110], to easily

achieve high attack accuracy as follows. The attacker can launch high-rate port-scan traffic

to a target network. The attacker then queries the data center for the report on recent port-

scan activities. If there is a traffic spike in the report data reflecting the high-rate port-scan

traffic sent by the attacker, the attacker can determine that the target network is deployed

with monitor(s) which sends traffic report to the data center. However, it is hard for this

scheme to achieve invisibility, since large spikes caused by the attack traffic make the attack

easily detectable. Our work is the first to address an attack aiming to achieve the objectives

of accuracy and invisibility.

3.3 iLOC Attack

In this section, we will discuss the iLOC attack in detail. We will first give an overview

of the iLOC attack, and then present the detailed stages of the attack, followed by additional discussions on its mechanisms.

57 3.3.1 Overview

[Figure omitted: in stage 1 the attacker selects a code, encodes it into attack traffic, and launches the traffic towards target networks; in stage 2 the attacker queries the data center for the traffic report, decodes it, and recognizes the monitors. Monitors in the target networks log both background and attack traffic and update the data center.]

Figure 3.1: Workflow of the iLOC Attack. (a) Attack stage 1: attack traffic generation; (b) attack stage 2: attack traffic decoding.

Fig. 3.1 shows the basic workflow of the iLOC attack. This figure also illustrates the basic idea of the ITM system. In the ITM system, the monitors deployed at various net- works record their observed port-scan traffic and continuously update their traffic logs to the data center. The data center first summarizes the volume of port-scan traffic towards

(and reported by) all monitors, and then publishes the report data to the public in a timely fashion.

As shown in Fig. 3.1 (a) and (b) respectively, the iLOC attack consists of the following

two stages:

1. Attack Traffic Generation: In this stage, as shown in Fig. 3.1 (a), the attacker first

selects a code. Then, he encodes the attack traffic by embedding the selected code

into the traffic. Lastly, the attacker launches the attack traffic towards a target network

(e.g., network A in Fig. 3.1 (a)). We denote such an embedded code pattern in the

attack traffic as the attack mark of the iLOC attack, and denote the attack traffic

encoded by the code as attack mark traffic.

2. Attack Traffic Decoding: In this stage, as shown in Fig. 3.1 (b), the attacker first

queries the data center for the traffic report data. Such report data consist of both

attack traffic and background traffic. After getting the report data, the attacker tries

to recognize the attack mark (i.e., the code embedded in the iLOC attack traffic)

by decoding the report data. If the attack mark is recognized, the report data must

include the attack traffic, which means the target network is deployed with monitors

and the monitors are sending traffic reports to the ITM data center.

Code-based Attack: The iLOC attack adopts a code-based approach to generate the attack traffic. Coding techniques have been widely used in secure communication; Morse code is one example: without knowledge of Morse code, the receiver would find it impossible to interpret the carried information [31]. In the iLOC attack, we use a pseudo-noise code (PN-code) based attack approach, which has three advantages. First, the code is embedded in traffic and can be correctly recognized by the attacker even under interference from background traffic, ensuring the accuracy of the attack. Second, the code (of sufficient length) itself provides enough privacy. That is, the code is known only to the attacker, so the code pattern embedded in attack traffic can only be recognized by the attacker. Furthermore, the code is able to carry information. A longer code is more immune to interference and requires comparatively lower-rate attack traffic as the carrier, which is harder to detect. All these characteristics help to achieve the objectives of attack accuracy and invisibility.

Parallel Attack Capacity: The iLOC attack can not only attack one target network to determine the deployment of monitors in one network at one time, but it can also attack

multiple networks simultaneously. Intuitively, one simple way to achieve this parallel attack is to launch port-scan/attack traffic towards multiple target networks simultaneously,

by scanning a different port number for each different target network. For example, if

the data center publishes traffic reports of 1000 (TCP/UDP) ports, then the attacker can launch attacks towards 1000 networks simultaneously, attacking each network with a different port number. Since attack traffic on different ports is summarized separately at the data center, the attacker can still separate and thus decode his traffic towards different targets. Hence the attacker can localize monitors in multiple networks simultaneously and accurately. However, can the attacker further improve the attack efficiency? Assume the data center still only publishes reports of 1000 ports; can the attacker attack 10,000 target

networks simultaneously, for example, attacking 10 different networks using one same port

number? Using high-rate of port-scan traffic cannot achieve this, because it is indiscernible

whether a spike in the traffic report is caused by traffic logs from one network or the other

9 networks. In order to achieve this goal in the code-based attack, the selected code and

corresponding encoded attack traffic towards multiple networks for the same port should

not interfere with each other (i.e., each of them can be decoded individually and accurately

by the attacker, although they are integrated/summarized in the traffic report from the ITM

data center). The PN-code selected in the iLOC attack has this feature, giving it the unique

capacity to carry out parallel attack sessions towards multiple target networks using one

same port. The details of the PN-code selection will be discussed in the following sections.
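The separation effect behind parallel sessions can be sketched as follows; the random ±1 codes, code length, and traffic rate below are illustrative stand-ins (the actual attack uses M-sequences, but any codes with low cross-correlation behave the same way). Two encoded sessions on the same port are summed in the report, yet each attacker's code still correlates to roughly V/2 with the sum, while an untransmitted code correlates to roughly zero.

```python
import random

random.seed(7)
L, V = 1024, 10.0   # code length and mark traffic rate (illustrative values)

def random_pn(length):
    # Random +/-1 code; a stand-in for the M-sequence PN-code.
    return [random.choice((-1, 1)) for _ in range(length)]

def encode(code):
    # Rate V while the chip is +1, silence while it is -1.
    return [V / 2 * c + V / 2 for c in code]

def correlate(code, series):
    # Correlation degree of the code with the mean-shifted series.
    mean = sum(series) / len(series)
    return sum(c * (x - mean) for c, x in zip(code, series)) / len(code)

c1, c2 = random_pn(L), random_pn(L)
# Two parallel sessions on the same port: the report carries only the sum.
summed = [a + b for a, b in zip(encode(c1), encode(c2))]
# correlate(c1, summed) and correlate(c2, summed) both land near V/2 = 5;
# a code that was never transmitted correlates near 0.
```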

3.3.2 Attack Traffic Generation Stage

In this attack stage, the attacker: (1) selects the code, a PN-code in our case; (2) encodes the attack traffic using the selected PN-code; and (3) launches the encoded attack traffic

towards the target network. For the third step, the attacker can coordinate a large number

of compromised bots to launch the traffic [92]. However, this is not the focus of this

chapter. In the following, we will present detailed discussion on the first and second steps,

respectively.

Code Selection

To evade detection by others, the attack traffic should be similar to the background

traffic. From a large set of real-world background traffic traces obtained from SANS ISC

[103, 39], we conclude that the background traffic shows random patterns in both time

and frequency domains. The attack objectives of both accuracy and invisibility, and an

attacker’s desire for parallel attacks require that: (1) the encoded attack traffic should blend

in with background traffic, i.e., be random in both time and frequency domains, (2) the

code embedded in the attack traffic should be easily recognizable to the attacker himself,

and (3) the code should support parallel attack.

To meet the above requirements, we choose the PN-code to encode the attack traffic.

The PN-code in the iLOC attack is a sequence of −1 or +1 with the following features

[97, 35, 38].

• The PN-code is random and “balanced”. The −1 and +1 are randomly distributed

and the occurrence frequencies of −1 and +1 are nearly equal. This feature con-

tributes to good spectral density properties (i.e., equally spreading the energy over

the whole frequency-band). It makes the attack traffic appear as noise and blend in

with background traffic in both time and frequency domains.

• The PN-code has a high correlation to itself and a low correlation to others (such

as random noise), where the correlation is a mathematical tool for finding repeating

patterns in a signal [38]. This feature makes it feasible for the attacker to accurately

recognize attack traffic (encoded by the PN-code) from the traffic report data even

under the interference of background traffic.

• The PN-code has a low cross-correlation value among different PN-code instances.

The lower this cross-correlation, the less interference among multiple attack sessions

in parallel attack. This feature makes it feasible for the attacker to conduct parallel

localization attacks towards multiple target networks on the same port.

The Walsh-Hadamard code and M-sequence code [97, 35] are two popular types of

PN-code. The Walsh-Hadamard code has some limitations. Since its energy spreads into only a limited number of discrete frequency components, which differs from background traffic, it would compromise the invisibility of the attack traffic if used in the iLOC attack. In addition, the Walsh-Hadamard code strongly depends on global synchronization [35]. Since the M-sequence code does not have these shortcomings, we adopt M-sequence codes in the iLOC attack. We use the feedback shift register to repeatedly generate the M-sequence PN-code due to its popularity and ease of implementation

[97, 41]. In particular, a feedback shift register consists of two parts. One is an ordinary shift register consisting of a number of flip-flops (two-state memory elements). The other is a feedback module that forms a multi-loop feedback logic.
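The feedback shift register described above can be sketched as follows. The tap choice (7, 6), i.e., the primitive polynomial x^7 + x^6 + 1, and the ±1 mapping are illustrative, not the dissertation's specific configuration; any primitive polynomial yields a maximal-length sequence with the balance and autocorrelation properties listed earlier.

```python
def m_sequence(taps, nbits, seed=1):
    """Generate one period (2**nbits - 1 chips) of an M-sequence with a
    Fibonacci LFSR, mapped to the +/-1 alphabet used by the PN-code.
    Tap position p corresponds to the bit shifted by (nbits - p)."""
    state = seed
    out = []
    for _ in range((1 << nbits) - 1):
        out.append(1 if state & 1 else -1)   # emit LSB as a +1/-1 chip
        fb = 0
        for t in taps:                        # XOR of the tapped stages
            fb ^= (state >> (nbits - t)) & 1
        state = (state >> 1) | (fb << (nbits - 1))
    return out

# Taps (7, 6) <-> x^7 + x^6 + 1 (primitive), giving a period of 127 chips.
code = m_sequence(taps=(7, 6), nbits=7)
```

One period is "balanced" (64 occurrences of +1 versus 63 of −1), and its unnormalized periodic autocorrelation at any nonzero shift is exactly −1, which is what makes the code easy for the attacker to recognize yet noise-like to everyone else.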

Attack Traffic Encoding

During the attack traffic encoding process, each bit in the selected PN-code is mapped to a unit time period Ts, denoted as mark bit duration. The entire duration of launched traffic (referred to as traffic launch session) is Ts · L, where L is the length of the PN-code.

[Figure omitted: a PN-code [+1, -1, +1, -1, +1] of length 5 and the corresponding encoded attack traffic, which scans at rate V during each +1 chip and is silent during each -1 chip; each chip lasts Ts, so the traffic launch session lasts 5 · Ts.]

Figure 3.2: PN-code and Encoded Attack Traffic

The encoding is carried out according to the following rules: each bit in the PN-code

maps to a mark bit duration (Ts); when the PN-code bit is +1, port-scan traffic with a high

rate, denoted as mark traffic rate V , is generated in the corresponding mark bit duration; when the code bit is −1, no port-scan traffic is generated in the corresponding mark bit duration. Thus, the attacker embeds the attack traffic with a special pattern, i.e., the original

PN-code. Recall that, after this encoding process, the PN-code pattern embedded in traffic is denoted as attack mark. If we use Ci = < Ci,1, Ci,2, ..., Ci,L > ∈ {−1, +1}^L to represent the PN-code and use ηi = < ηi,1, ηi,2, ..., ηi,L > to represent the attack traffic, then we have ηi,j = (V/2) · Ci,j + V/2 (j = 1, ..., L). Fig. 3.2 shows an example of the PN-code and the corresponding attack traffic encoded with the PN-code.
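The encoding rule η_{i,j} = (V/2)·C_{i,j} + V/2 can be sketched directly with the 5-chip example of Fig. 3.2 (the rate V = 10 here is illustrative):

```python
V, Ts = 10.0, 1.0                        # mark traffic rate and bit duration
pn_code = [+1, -1, +1, -1, +1]           # the 5-chip example of Fig. 3.2
rates = [V / 2 * c + V / 2 for c in pn_code]  # eta_{i,j} = V/2*C_{i,j} + V/2
# Scan at rate V while the chip is +1, stay silent while it is -1;
# the traffic launch session lasts len(pn_code) * Ts time units.
```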

3.3.3 Attack Traffic Decoding Stage

In this stage, the attacker takes the following two steps: (1) The attacker queries the data

center for the traffic report data, which consists of both attack traffic and background traffic.

(2) From the report data, the attacker attempts to recognize the embedded attack mark. The existence of the attack mark determines the deployment of monitors in the attack targeted network. As the query of traffic report data is relatively straightforward, here we only detail the second step, i.e., attack mark recognition, as follows.

In the report data queried from the data center, the attack traffic encoded with the attack

mark is mixed with background traffic. It is critical for the iLOC attack to accurately

recognize the attack mark from the traffic report data. To address this, we develop the

correlation-based scheme. This scheme is motivated by the fact that the original PN-code

(used to encode attack traffic) and its corresponding attack mark (embedded in the traffic

report data) are highly correlated; in fact, they are identical.

The attack mark in the traffic report data is the embedded form of the original PN-

code. The attack mark is similar to its original PN-code, although the background traffic

may introduce interference and distortion into the attack mark. We adopt the following

correlation degree to measure their similarity. Mathematically, the correlation degree is defined as the normalized inner product of two vectors. For two vectors X = < X1, X2, ..., XL > and Y = < Y1, Y2, ..., YL > of length L, the correlation degree of X and Y is

Γ(X, Y) = X ⊙ Y = (Σ_{i=1}^{L} Xi · Yi) / L,   (3.1)

where ⊙ represents the operator for the normalized inner product of two vectors. Based on the above definition, we have Γ(X, X) = Γ(Y, Y) = 1, ∀ X, Y ∈ {−1, +1}^L.

We use two vectors, ηi = < ηi,1, ηi,2, ..., ηi,L > and ωi = < ωi,1, ωi,2, ..., ωi,L >, to represent attack traffic (embedded with attack mark) and background traffic, respectively. We shift the above two vectors by subtracting the mean value from the original data, resulting in two new vectors, ηi' = < ηi,1', ηi,2', ..., ηi,L' > and ωi' = < ωi,1', ωi,2', ..., ωi,L' >. We still use a vector Ci = < Ci,1, Ci,2, ..., Ci,L > ∈ {−1, +1}^L to represent the PN-code. Thus,

the correlation degree between the PN-code and the (shifted) attack traffic can be obtained.

Similarly, we can also obtain the correlation degree between the PN-code and the (shifted)

background traffic as follows.

According to the rules of encoding attack traffic discussed in Section 3.3.2, ηi = (V/2) · Ci + V/2. Thus, ηi' = ηi − E(ηi) = ηi − V/2 = (V/2) · Ci. Hence, the correlation degree between the original PN-code and the (shifted) attack traffic is Γ(Ci, ηi') = (V/2) · Γ(Ci, Ci) = V/2. Furthermore, we can also derive the correlation degree between the PN-code and the (shifted) background traffic, i.e., Γ(Ci, ωi'). The mean of this correlation degree is close to 0, since the PN-code has low correlation with the (shifted) background traffic (i.e., E[Γ(Ci, ωi')] = (1/L) · E[Σ_{j=1}^{L} (ωi,j' · Ci,j)] ≈ 0). If the standard deviation of the background traffic rate is σx, the variance of this correlation degree is

Var[Γ(Ci, ωi')] = E[(Γ(Ci, ωi') − 0)^2]   (3.2)
                = (1/L^2) · E[Σ_{j=1}^{L} Ci,j^2 · ωi,j'^2]   (3.3)
                ≈ (1/L^2) · E[Σ_{j=1}^{L} ωi,j'^2] = σx^2 / L.   (3.4)

Thus, the average correlation degree between the PN-code and the (shifted) background traffic is Γ(Ci, ωi') ≈ σx/√L. Based on the above discussion, the attacker can set appropriate attack parameters (e.g., PN-code length L and mark traffic rate V) to make the correlation degree (V/2) between the PN-code and the attack mark traffic much larger than the correlation degree (σx/√L) between the PN-code and the background traffic. As such, the attacker can accurately distinguish the attack mark traffic from the background traffic.

In the practice of attack mark recognition, vector λi is used to represent the queried report data, and vector λi' is used to represent the shifted report data (obtained by subtracting E(λi,j) from λi). The attacker uses the correlation degree between λi' and his PN-code Ci, i.e., Γ(Ci, λi'), to determine the existence of the PN-code in the report data. If Γ(Ci, λi') is larger than a threshold Ta, referred to as the mark decoding threshold, then the attacker determines that the report contains attack traffic as well as the PN-code Ci, and concludes that the target network is deployed with monitors. The accuracy of this correlation-degree-based PN-code recognition is analyzed and demonstrated in Sections 3.4, 3.5 and 4.5.
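The threshold decision can be sketched as follows; the Gaussian background model, random ±1 code, and parameter values (L = 511, V = 6, σx = 2, Ta = V/4) are illustrative assumptions, chosen so that V/2 dominates σx/√L as the analysis above requires.

```python
import random

random.seed(42)
L, V, sigma = 511, 6.0, 2.0
Ta = V / 4      # threshold between V/2 (mark present) and ~sigma/sqrt(L)

code = [random.choice((-1, 1)) for _ in range(L)]
background = [abs(random.gauss(20.0, sigma)) for _ in range(L)]
# Monitors present: report = background + encoded attack traffic.
with_mark = [b + (V / 2 * c + V / 2) for b, c in zip(background, code)]

def decode(series, code, threshold):
    """Mark recognition: shift by the mean, correlate, compare with Ta."""
    mean = sum(series) / len(series)
    corr = sum(c * (x - mean) for c, x in zip(code, series)) / len(code)
    return corr > threshold

# With the mark, the correlation lands near V/2 = 3 and exceeds Ta;
# with background only, it stays near 0 and falls below Ta.
```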

3.3.4 Discussions

In order to accurately and effectively recognize the attack mark (PN-code) from the

report data, we need to find the segment of the report data containing the PN-code (i.e.,

we need to fulfill the synchronization between the port-scan traffic report data and the PN-

code). For this purpose, we introduce an iterative sliding window based scheme. The

basic idea is to let the attacker obtain enough report data at a small granularity. Then, a

sliding window iteratively moves forward to capture a segment of the report data. For each

segment, we apply the correlation-based scheme discussed in Section 3.3.3 to recognize

whether or not the attack mark exists. The details of this synchronization are presented as

follows.

The attacker first sends a sequence of queries to the data center and each query requests

a part of the report data which lasts for a given unit time, known as the query duration Tq. To guarantee good synchronization and capture of each bit in the PN-code, Tq should be smaller than the mark bit duration Ts. Also, the attacker needs to send enough queries to ensure that the queried report data contains the whole attack mark and attack mark traffic, whose length is L · Ts. With the report data, the attacker iteratively conducts a correlation test on the report data, using a sliding window. For example, in the i-th round, the attacker selects ti as the starting time for the sliding window. In the (i + 1)-th round, the attacker moves the sliding window one step (Tq) forward, so the start time of the sliding window becomes ti + Tq, and so on. In the i-th round, a sequence of data (of length L) is obtained in the sliding window. The first data point in the sequence is the traffic data in time duration [ti, ti + Ts], the second data point in the sequence is the traffic data in time duration [ti + Ts, ti + 2 · Ts],

66 and so on. With these data, the attacker conducts the attack mark recognition discussed in

Section 3.3.3. The attacker repeats the attack mark recognition after each time he moves forward the sliding window, until the attack mark is recognized from the report data in the current sliding window, or the sliding window has gone through all the report data.
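The sliding-window synchronization can be sketched as follows. For brevity this sketch assumes Tq = Ts (one report sample per code chip), uses a random ±1 stand-in for the PN-code, and scans all windows keeping the best-correlated one rather than stopping at the first window that crosses Ta; when a single mark is present the two stopping rules find the same window.

```python
import random

random.seed(3)
L, V, sigma, Ta = 255, 8.0, 1.5, 2.0

# Hypothetical random +/-1 code standing in for the attacker's PN-code.
code = [random.choice((-1, 1)) for _ in range(L)]

def correlate(code, segment):
    # Correlation degree of the code with a mean-shifted report segment.
    mean = sum(segment) / len(segment)
    return sum(c * (x - mean) for c, x in zip(code, segment)) / len(code)

# Report data: background noise everywhere; the attack mark is embedded
# at an offset the decoder does not know in advance.
offset = 100
report = [random.gauss(10.0, sigma) for _ in range(offset + L + 100)]
for j, c in enumerate(code):
    report[offset + j] += V / 2 * c + V / 2

def find_mark(report, code):
    """Slide a code-length window one step at a time and return the start
    position with the highest correlation degree, plus that degree."""
    best_start, best_corr = 0, float("-inf")
    for start in range(len(report) - len(code) + 1):
        corr = correlate(code, report[start:start + len(code)])
        if corr > best_corr:
            best_start, best_corr = start, corr
    return best_start, best_corr
```

Only the window aligned with the embedded mark correlates near V/2 = 4; every misaligned window stays near 0, so the search both recognizes the mark (correlation above Ta) and recovers its starting position.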

3.4 Analysis

In this section, we first present our formal analysis of the impacts of different attack parameters on attack accuracy and invisibility. Then based on analytical results, we discuss how to determine attack parameters.

Before starting the analysis, we need to clarify the two parties in the attack process: the iLOC attacker and its adversary, the defender. The term defender generalizes the benign parties who maintain the ITM system and/or exploit the reports from the data center to identify widespread Internet attacks. Based on the reports, the defender not only attempts to determine whether there are anomalies in traffic, but also takes appropriate actions should any anomalies be identified.

3.4.1 Accuracy Analysis

In order to measure attack accuracy, we introduce the following two metrics. The first one is the attack successful rate PAD, which is the probability that an attacker correctly recognizes the fact that a selected target network is deployed with monitors. The higher PAD, the higher the attack accuracy. The second metric is the attack false positive rate PAF, which is the probability that the attacker mistakenly declares a target network as one deployed with monitors. The lower PAF, the higher the attack accuracy.

Recall that Ta is the mark decoding threshold, V is the mark traffic rate, vector λi represents the queried report data, and vector λ'i represents the shifted report data (obtained by subtracting E(λi,j) from λi). Assume that the random variables ω'i,1, . . . , ω'i,L (i.e., the shifted background traffic) are independent and identically distributed (i.i.d.) and follow a Gaussian distribution with standard deviation σx. Then we have the following theorem for the attack accuracy of the iLOC attack.

Theorem 1 In the iLOC attack, the attack successful rate PAD is

PAD = 1 − Pr[Γ(λ'i, Ci) ≤ Ta | λ'i = η'i + ω'i]    (3.5)

    = 1 − (1/√π) ∫_{(V/2 − Ta)·√L/(√2·σx)}^{∞} e^{−y²} dy.    (3.6)

The attack false positive rate PAF is

PAF = Pr[Γ(λ'i, Ci) ≥ Ta | λ'i = ω'i]    (3.7)

    = (1/√π) ∫_{√L·Ta/(√2·σx)}^{∞} e^{−y²} dy.    (3.8)

Proof 1 i) Derivation of attack successful rate PAD.

According to the definition of PAD, we have

PAD = 1 − Pr[Γ(λ'i, Ci) ≤ Ta | λ'i = η'i + ω'i].    (3.9)

Considering that Γ(Ci, η'i) = (V/2) · Γ(Ci, Ci) = V/2, Equation (3.9) can be rewritten as

PAD = 1 − Pr[Γ(λ'i, Ci) ≤ Ta − V/2 | λ'i = ω'i].    (3.10)

Based on the mean and variance of the correlation degree determined in Section 3.3.3, PAD can be represented by

PAD = 1 − (√L/(√(2π)·σx)) ∫_{−∞}^{Ta − V/2} e^{−x²L/(2σx²)} dx.    (3.11)

Let y² = x²L/(2σx²), i.e., y = x·√L/(√2·σx). Then we have

PAD = 1 − (√L/(√(2π)·σx)) · (√2·σx/√L) ∫_{−∞}^{(Ta − V/2)·√L/(√2·σx)} e^{−y²} dy    (3.12)

    = 1 − (1/√π) ∫_{−∞}^{(Ta − V/2)·√L/(√2·σx)} e^{−y²} dy    (3.13)

    = 1 − (1/√π) ∫_{(V/2 − Ta)·√L/(√2·σx)}^{∞} e^{−y²} dy.    (3.14)

ii) Derivation of attack false positive rate PAF .

Recall the definition of the correlation degree Γ(λ'i, Ci), and note that λ'i = ω'i when no iLOC attack traffic exists. Assuming that Γ(λ'i, Ci) follows a Gaussian distribution N(0, σx²/L) (as discussed in Section 3.3.3), we have

PAF = Pr[Γ(λ'i, Ci) ≥ Ta | λ'i = ω'i].    (3.15)

Thus PAF can be represented by

PAF = (√L/(√(2π)·σx)) ∫_{Ta}^{∞} e^{−x²L/(2σx²)} dx.    (3.16)

Let y² = x²L/(2σx²), i.e., y = x·√L/(√2·σx). Then we have

PAF = (√L/(√(2π)·σx)) · (√2·σx/√L) ∫_{√L·Ta/(√2·σx)}^{∞} e^{−y²} dy = (1/√π) ∫_{√L·Ta/(√2·σx)}^{∞} e^{−y²} dy.    (3.17)

Remarks: We make a few observations based on the theorem presented above. First, the

attack successful rate PAD increases and the attack false positive rate PAF decreases with

increasing PN-code length L. That is, attack accuracy increases when L increases.

Second, with the increasing mark traffic rate V , attack accuracy also increases.
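Since (1/√π) ∫_{a}^{∞} e^{−y²} dy = erfc(a)/2, the closed forms of Theorem 1 can be evaluated with the standard complementary error function. The sketch below (hypothetical helper names) illustrates the two observations numerically:

```python
import math

def attack_success_rate(V, Ta, L, sigma_x):
    # Eq. (3.6): P_AD = 1 - (1/sqrt(pi)) * integral from
    # (V/2 - Ta)*sqrt(L)/(sqrt(2)*sigma_x) to infinity of e^(-y^2) dy.
    a = (V / 2 - Ta) * math.sqrt(L) / (math.sqrt(2) * sigma_x)
    return 1 - 0.5 * math.erfc(a)

def attack_false_positive_rate(Ta, L, sigma_x):
    # Eq. (3.8): P_AF = (1/sqrt(pi)) * integral from
    # sqrt(L)*Ta/(sqrt(2)*sigma_x) to infinity of e^(-y^2) dy.
    a = math.sqrt(L) * Ta / (math.sqrt(2) * sigma_x)
    return 0.5 * math.erfc(a)
```

Increasing L raises the lower limit of both tail integrals, so PAD grows and PAF shrinks with the code length, and increasing V likewise raises PAD, matching the remarks above.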

3.4.2 Invisibility Analysis

Here, invisibility refers to how well the iLOC attack evades detection by the defender. In order to analyze invisibility, we need to consider the detection algorithms. While

there have been many different algorithms proposed to detect anomalies in port-scan traffic, here we use a representative and generic algorithm which has no specific requirements on detection systems. This threshold-based detection algorithm is widely adopted by many

systems [86, 103, 123, 114]. In this algorithm, if the traffic rate (volume in a given time

duration) is larger than a pre-determined threshold Td (referred to as the defender detection

threshold), the defender issues threat alerts and initiates reactions [103]. Such a detection
threshold is usually obtained through statistical analysis of the background traffic. Note

that the threshold Td must be carefully chosen for anomaly detection: it must maintain both high detection rate (i.e., the probability that an ongoing attack is detected) and low false positive rate (i.e., the probability that an alarm is triggered when no attack is occurring).
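A minimal sketch of this threshold-based algorithm (the helper names and the 3σ choice of k are illustrative, not a specification of any particular deployed system):

```python
import statistics

def detection_threshold(history, k=3.0):
    """Derive Td from statistical analysis of background traffic.

    Td = mean + k * stdev; the trade-off between detection rate and
    false positive rate is tuned through the multiplier k.
    """
    mu = statistics.fmean(history)
    sigma = statistics.stdev(history)
    return mu + k * sigma

def detect(rate, Td):
    """Issue an alert when the observed traffic rate exceeds Td."""
    return rate > Td
```

Given a window of past background traffic rates, the defender computes Td once and then compares each new rate observation against it.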

To measure attack invisibility in terms of how well the iLOC attack can evade the detec-

tion by the defender, we use the following two metrics. The first is the defender detection

rate PDD, the probability that the defender correctly detects the attack traffic introduced

by the iLOC attack. The other one is the defender false positive rate PDF , the probability that the defender mistakenly identifies normal background traffic as attack traffic.

Similar to our approach in Section 3.3.2, we use the random variable ω' to represent the shifted background traffic, and the random variable λ' to represent the shifted traffic data reported by the ITM system. Note that if no iLOC attack exists, λ' = ω'. Assume that the values of ω' at different time units are independent and identically distributed (i.i.d.) and follow a Gaussian distribution with standard deviation σx (i.e., ω' follows N(0, σx²)). Then we have the following theorem for attack invisibility.

Theorem 2 In the iLOC attack, the defender detection rate PDD is

PDD = 1 − Pr[λ' ≤ Td | λ' = V + ω']    (3.18)

    = 1 − (1/√π) ∫_{(V − Td)/(√2·σx)}^{∞} e^{−y²} dy.    (3.19)

The defender false positive rate PDF is

PDF = Pr[λ' ≥ Td | λ' = ω']    (3.20)

    = (1/√π) ∫_{Td/(√2·σx)}^{∞} e^{−y²} dy.    (3.21)

The proof of Theorem 2 is similar to that of Theorem 1; therefore, we skip it here due to space limitations.

Remarks: From Theorem 2, we make the following observations. First, with the increase of the mark traffic rate V , the defender detection rate PDD increases, and thus the attack

invisibility decreases. Second, the mark traffic rate V does not affect the defender false

positive rate PDF , which is only determined by the threshold Td configured by the de-

fender.
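Theorem 2 likewise reduces to complementary error functions; a small numeric sketch of these observations (hypothetical helper names; note that V does not appear in the false positive rate at all):

```python
import math

def defender_detection_rate(V, Td, sigma_x):
    # Eq. (3.19): P_DD = 1 - (1/sqrt(pi)) * integral from
    # (V - Td)/(sqrt(2)*sigma_x) to infinity of e^(-y^2) dy.
    return 1 - 0.5 * math.erfc((V - Td) / (math.sqrt(2) * sigma_x))

def defender_false_positive_rate(Td, sigma_x):
    # Eq. (3.21): depends only on Td and the background deviation sigma_x.
    return 0.5 * math.erfc(Td / (math.sqrt(2) * sigma_x))
```

Raising V pushes PDD toward 1 (less invisibility), while PDF is fixed once the defender chooses Td, exactly as the remarks state.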

3.4.3 Determination of Attack Parameters

Determination of V , Ta and L

The attacker can determine the values of attack parameters based on the above analysis.

First, the attacker can determine the mark traffic rate V . This is because V is only related

to the attack invisibility metrics (the defender detection rate PDD), and it impacts the determination of the other parameters. After determining V and given the expected false positive

rate, the attacker can further determine the mark decoding threshold Ta and PN-code length

L. Note that the values of other attack parameters, such as the standard deviation of the background traffic σx, can be determined through analyzing historical background traffic data published by the data center of the ITM system.

(i) Mark traffic rate V : Using Equation (3.21), the attacker can first estimate the defender detection threshold Td based on a reasonable upper bound of the defender false positive rate PDF . For example, using the central limit theorem, we know that Td = 3 · σx achieves a reasonable defender false positive rate PDF (1.7%). Thus, the attacker can use 3 · σx as a reasonable estimate of Td. After that, given the defender detection rate PDD, which can be estimated similarly, and the background traffic deviation σx, the attacker can determine the mark traffic rate V by resolving Equation (3.19) in Theorem 2.

(ii) Mark recognition threshold Ta: Given the mark traffic rate V (determined previously) and the desired attack false positive rate PAF , the attacker can further determine the mark decoding threshold Ta by resolving Equation (3.8) in Theorem 1.

(iii) Length of PN-code L: Given the mark traffic rate V , mark decoding threshold Ta, and desired attack successful rate PAD, the attacker can further determine the length of

PN-code L by resolving Equation (3.6) in Theorem 1.
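Under the Gaussian model, steps (i)-(iii) can be carried out in closed form with the inverse normal CDF. Because Equations (3.6) and (3.8) couple Ta and L, the sketch below solves the two jointly; the function name, the joint arrangement of steps (ii) and (iii), and the Td = 3·σx estimate are our own illustrative choices:

```python
from math import ceil, sqrt
from statistics import NormalDist

def plan_parameters(P_DD, P_AF, P_AD, sigma_x):
    """Solve steps (i)-(iii) under the Gaussian model of Section 3.4.

    Returns (V, Ta, L) achieving defender detection rate <= P_DD,
    attack false positive rate <= P_AF, and attack successful rate >= P_AD.
    """
    Phi_inv = NormalDist().inv_cdf
    Td = 3 * sigma_x                      # step (i): estimated defender threshold
    V = Td + sigma_x * Phi_inv(P_DD)      # step (i): mark rate from Eq. (3.19)
    # Steps (ii)+(iii): Eqs. (3.6) and (3.8) give two equations in Ta and L:
    #   (V/2 - Ta) * sqrt(L) / sigma_x = z_ad,  Ta * sqrt(L) / sigma_x = z_af.
    z_ad = Phi_inv(P_AD)
    z_af = Phi_inv(1 - P_AF)
    L = ceil((2 * sigma_x * (z_ad + z_af) / V) ** 2)   # add the two equations
    Ta = sigma_x * z_af / sqrt(L)
    return V, Ta, L
```

For example, with PDD = 3%, PAF = 1%, PAD = 95%, and σx = 1, this arrangement yields a mark rate of roughly 1.12·σx and a code length on the order of 50 bits.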

Determination of Ts

To determine the mark bit duration Ts, the attacker needs to estimate the possible delay

from the moment when the attack traffic is first reported by monitors, to the moment when

such attack traffic is published by the data center. To make the iLOC attack effective, the mark bit duration needs to be at least as large as this delay. Otherwise, the traffic in different bit durations (each lasting Ts) may be published by the data center at the same moment, mixing the bits and thereby rendering them inseparable.

Several possible methods can be used to obtain such delay information. Some ITM

systems may publish such information on their websites. The attacker may also actively

conduct experiments on ITM systems and measure such delay. For example, the attacker

may deploy monitors in his controlled (small) network and connect them to the targeted

ITM system. The attacker can simply use such monitors to report logs embedded with

special patterns (e.g., PN-code) and keep querying the data center until the embedded traffic

patterns are recognized. After repeating the above process several times, the attacker
is able to obtain a statistical profile of the delay information, and then determine the mark

bit duration Ts. We use this method in our implementation of the iLOC attack, which is

presented in the next section.

3.5 Implementation and Validation

In this section, we first introduce our implementation of the iLOC attack. Then, we

report the validation results of our iLOC attack design and experiments against a real-world

ITM system.

3.5.1 Implementation of the iLOC Attack

We implement an iLOC attack prototype based on the design in Section 3.3. This

prototype works against any ITM system with the data center having a web-based user

interface. Particularly, there are five independent and important components in our iLOC

implementation: Data Center Querist, Background Traffic Analyzer, PN-code Generator,

Attack Traffic Generator and Attack Mark Decoder.

In particular, Data Center Querist is a component that interacts with the data center of

the targeted ITM system. Its main tasks consist of sending queries to the data center for

port-scan traffic reports and retrieving the response (i.e., the report) from the data center.

The inputs to this component are the URL, or IP address, of the data center and the port

number of the port-scan traffic to be queried. From the traffic report data, Background

Traffic Analyzer can obtain the statistical profile of the background traffic and determine attack parameters for the other components. PN-code Generator is a component that generates and

73 Network B Internet Data Center of Threat Monitoring System Campus Network R1

Network A

Attacker Monitors

Figure 3.3: Experiment Setup

stores the PN-code. The PN-code length is determined according to the attacker's objectives and the background traffic profile, as described in Section 3.4.3. Attack Traffic Generator

is a component that generates attack traffic based on the PN-code and the background statistics profile; the PN-code-encoded traffic is generated as discussed in Section 3.3.2. Inputs to this component are the IP address range of the target network, the port number, and the transport protocol (TCP or UDP). Attack Mark Decoder is a component

that obtains the port-scan report data through Data Center Querist, and decides whether the

attack mark exists in the way discussed in Section 3.3.3. The PN-code used in the decoding process is the same as the one used in encoding the attack traffic and is stored in the PN-code

Generator.
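As an illustration of how Attack Traffic Generator and Attack Mark Decoder fit together, the following sketch encodes a PN-code as on-off mark traffic and decodes it by correlation. The on-off mapping (rate V during "+1" bit durations, none during "−1") is our assumption, chosen to be consistent with the Γ(Ci, η'i) = V/2 property used in Section 3.4; the real components of course operate on live port-scan traffic and data-center reports:

```python
def encode(code, V, background):
    """Overlay mark traffic on background traffic.

    Assumed mapping: send mark traffic at rate V during bit durations
    whose PN-code chip is +1, and none during -1 chips.
    """
    return [b + (V if c > 0 else 0) for b, c in zip(background, code)]

def decode(report, code, Ta):
    """Attack Mark Decoder: correlate the shifted report data with the
    PN-code and compare the correlation degree against the threshold Ta."""
    mean = sum(report) / len(report)
    shifted = [x - mean for x in report]
    gamma = sum(s * c for s, c in zip(shifted, code)) / len(code)
    return gamma > Ta
```

With a balanced code, the mark contributes a correlation of roughly V/2 on top of the background's contribution, so a threshold Ta between 0 and V/2 separates marked from unmarked reports.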

These components may be integrated into one program running on one machine. The

attack can also be carried out in a more flexible way if the tasks of the above components

are performed by processes on different machines. Our iLOC prototype is implemented

using Microsoft MFC and Matlab on the Windows XP operating system.

3.5.2 Validation of the iLOC Attack

In order to validate our iLOC implementation, we deployed it to identify a set of monitors associated with a real-world ITM system.

[Plot omitted: traffic rate vs. time (hour), comparing background traffic with traffic mixed with the iLOC attack]

Figure 3.4: Background Traffic vs. Traffic Mixed with iLOC Attack

[Plot omitted: power spectral density vs. frequency (Hz), comparing background traffic with traffic mixed with the iLOC attack]

Figure 3.5: PSD for Background Traffic vs. Traffic Mixed with iLOC Attack

Fig. 3.3 illustrates our experiment setup. For the purposes of this research, we requested

information about locations of a set of monitors in the ITM system. We were provided

with the identities of two network sets A and B. There are some monitors deployed within

network set A and there is no monitor in network set B. All monitors in network set

A monitor a set of IP addresses and record the port-scan logs. We then (acting as the attacker) executed the iLOC attack to determine whether monitors exist in network sets A and B, respectively.

In our experiment, we use a PN-code of length 15. The mark bit duration is set to 2

hours and the query duration is 20 minutes. With the queried report data, we can correctly

determine that all networks in set A are deployed with monitors and networks in B are not

deployed with monitors. Fig. 3.4 shows the traffic rate in the time domain. Fig. 3.5 shows the traffic rate in the frequency domain in terms of Power Spectrum Density (PSD). The PSD describes how the power of time-series data is distributed in the frequency domain. Mathematically, it is equal to the Fourier transform of the auto-correlation of the time-series data [9].

From these two figures, we observe that it is hard for others, without knowing the content of the PN-code, to detect the iLOC attack, since the overall traffic with the iLOC attack embedded is very similar to the traffic without it. That is, these experiments demonstrate that the iLOC attack can accurately and invisibly localize the monitors of ITM systems in practice.
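The PSD comparison of Fig. 3.5 can be reproduced for any traffic trace with a simple periodogram, which by the Wiener-Khinchin theorem is equivalent to the Fourier transform of the auto-correlation. A minimal sketch (direct DFT, adequate for short traces):

```python
import cmath

def psd(series):
    """Periodogram estimate of the Power Spectral Density.

    Computes |DFT|^2 / N of the mean-removed series for the
    non-negative frequency bins 0 .. N/2.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        out.append(abs(s) ** 2 / n)
    return out
```

A periodic component (such as the frequency-based attack's pattern) shows up as a sharp peak in this spectrum, while PN-code-encoded traffic spreads its power across frequencies.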

3.6 Performance Evaluation

3.6.1 Evaluation Methodology

In our evaluation, we use the real-world port-scan traces from the SANS ISC (Internet
Storm Center), including the detailed logs from 01/01/2005 to 01/15/2005 [103, 39].10 The traces used in our study contain over 80 million records and the overall data volume exceeds 80 GB. We use these real-world traces as the background traffic. We merge records of simulated iLOC attack traffic into these traces and replay the merged data to emulate the iLOC attack traffic. We evaluate different attack scenarios by varying attack parameters. Here, we only show the data on port 135; experiments on other ports result in similar observations.

We explore both attack accuracy and invisibility to evaluate attack performance. For attack accuracy, we use two metrics: one is the attack successful rate PAD and the other is

10We thank the ISC for providing us valuable traces in this research.

[Plot omitted: attack successful rate PAD vs. attack traffic rate P for iLOC (experiment and theory), volume-based, and frequency-based schemes]

Figure 3.6: Attack Successful Rate (Port 135)

the attack false positive rate PAF , which are defined in Section 3.4.1. For attack invisibility, we use two metrics: one is the defender detection rate PDD and the other is the defender false positive rate PDF , which are defined in Section 3.4.2.

We evaluate the iLOC attack in comparison with two other baseline attack schemes.

The first one is the localization attack that launches significantly high-rate port-scan traffic to target networks, as introduced in [18, 110]. We denote this attack as the volume-based

attack. The second baseline scheme embeds the attack traffic with a unique frequency pattern. In this attack, the attack traffic rate changes periodically. The attacker then expects the report data from the data center to show this unique frequency pattern if the selected target network is deployed with monitors. We denote this attack scheme as the frequency-based

attack. For fairness, we adjust the detection thresholds in all schemes so that reasonable attack false positive rates PAF and defender false positive rates PDF (below 1%) are achieved.

For the iLOC attack, we generate different attack traffic based on varying PN-code lengths L (i.e., 15, 30, 45). The default PN-code length is set to 30. To better quantify the attack traffic rate for the iLOC attack and the other attack schemes, we use the normalized attack traffic rate P , defined as P = V/σx for the iLOC attack, where σx is the standard deviation of the background traffic rate. The default value of Tq is 0.1 · Ts.

3.6.2 Results

Attack Accuracy

To compare the attack accuracy of the iLOC attack with that of the volume-based and frequency-based attack schemes, we plot the attack successful rate PAD under different attack traffic rates (i.e., P ∈ [0.01, 3]) as shown in Fig. 3.6. From this figure, we observe that both the iLOC and frequency-based attacks consistently achieve a much higher attack successful rate PAD than the volume-based scheme. This difference in PAD is more significant when the attack traffic rate is lower, which can be explained as follows. For the iLOC scheme, the PN-code based encoding/decoding makes the recognition of attack marks robust to interference from the background traffic. For the frequency-based scheme, the invariant frequency in the attack traffic is also robust to the interference of the background traffic. Both of them can distinguish their attack traffic accurately even when the attack traffic rate (i.e., P ) is small.

Nevertheless, the volume-based scheme relies on the high rate of attack traffic (i.e., large

P ), and thus, is very sensitive to the interference of the background traffic.

Attack Invisibility

To compare the attack invisibility performance of the iLOC attack with the other two attack schemes, we show the defender detection rate PDD on port 135 in Table 3.1. This table shows the attacker-achieved defender detection rate PDD, given different localization successful rates PAD (90%, 95%, and 98%). Recall that the defender sets the detection threshold to make the defender false positive rate PDF below 1%. In the table, “(Time)” and “(Freq)” mean that the defender adopts the time-domain and frequency-domain analytical techniques to detect attacks. It is observed that our iLOC scheme consistently achieves a much lower defender detection rate PDD than the other two schemes, which means the iLOC

attack achieves the best attack invisibility performance. As expected, the defender can easily detect the frequency-based attack with the frequency-domain analytical technique, as there is a unique frequency pattern in its attack traffic.

PAD    iLOC (Time)    iLOC (Freq)    Volume-based (Time)    Frequency-based (Freq)    Frequency-based (Time)
90%    2.5%           2.2%           90%                    90%                       2.9%
95%    2.8%           2.4%           95%                    95%                       3.1%
98%    3.1%           2.8%           98%                    98%                       3.3%

Table 3.1: Defender Detection Rate PDD (Port 135)

Impact of the Length of PN-code

To investigate the impact of the PN-code length on the performance of the iLOC attack, we plot the attack successful rate PAD for PN-code of different lengths (15, 30, 45) in

Fig. 3.7. In the legend, iLOC(L = x) means that the PN-code length is x. Data in this

figure are also collected for various attack traffic rates. This figure shows that the attack successful rate PAD increases with increasing PN-code length. This is because a longer

PN-code can more significantly reduce the interference impact from the background traffic on recognizing the attack mark, thereby achieving higher attack accuracy.

Impact of the Number of Parallel Localization Attacks

To evaluate the impact of parallel localization capability on attack accuracy, we show the attack successful rate PAD for different numbers of parallel attack

sessions on the same port in Fig. 3.8. In the legend, iLOC(N = x) means that there are

x parallel attack sessions. This figure shows that, in terms of the attack successful rate PAD,

the iLOC attack scheme is not sensitive to the number of parallel attack sessions. The attack successful rate PAD only slightly decreases with the increasing number of parallel attack sessions. The reason is that the traffic for different attack sessions is encoded by PN-codes which have low cross-correlation with each other, as described in Section 3.3.2, and thereby cause little interference among the sessions. Fig. 3.9 shows the impact of the number of parallel attack sessions on attack invisibility. It can be observed that an increasing number of parallel attack sessions results in a slight increase of the defender detection rate

PDD. Therefore, parallel localization capability can improve the attack efficiency without significantly compromising either accuracy or invisibility.

The iLOC attack achieves invisibility by using the PN-code, at the cost of a longer period over which the attack must be carried out. Nevertheless, parallel capability can significantly improve the attack efficiency. For example, take the case of attacking a system consisting of 1200 networks. Using one port, the volume-based attack needs 1200 units of time to perform the attack task. A single iLOC attack session with a code length of 15 needs 1200 × 15 = 18000 units of time while achieving higher accuracy and invisibility. To fulfill the same localization task, parallel iLOC with 8 attack sessions and the same code length can achieve similarly high accuracy and invisibility, and the total time is only 1200 × 15/8 = 2250 units of time, which is comparable to that of the volume-based attack.

3.7 Guidelines of Countermeasures

We have demonstrated the threat of the iLOC attack against ITM systems above. Now, we discuss possible countermeasures to such attacks. It is relatively easy to defend against volume-based and frequency-based localization attacks, which embed either a spike (using high-rate scan traffic) [18, 110] or an invariable frequency (using a certain frequency

[Plot omitted: attack successful rate PAD vs. attack traffic rate P for PN-code lengths L = 15, 30, 45 (experiment and theory), with frequency-based and volume-based baselines]

Figure 3.7: Attack Successful Rate vs. Code Length

[Plot omitted: attack successful rate PAD vs. attack traffic rate P for N = 2, 4, 8 parallel attack sessions, with frequency-based and volume-based baselines]

Figure 3.8: Attack Successful Rate vs. Number of Parallel Attack Sessions

[Plot omitted: defender detection rate PDD vs. attack traffic rate P for N = 2, 4, 8 parallel attack sessions, in the time and frequency domains]

Figure 3.9: Defender Detection Rate vs. Number of Parallel Attack Sessions

pattern), since these two attack schemes show strong signatures in the attack traffic (either in the time domain or frequency domain). However, in order to defend against the iLOC attack, the defender needs deep insight into the design of the attack. We give some general guidelines for counteracting the iLOC attack from the following aspects.

1) Limiting the Information Access Rate: Recall that in the iLOC attack, the attacker must generate a significant number of queries to the data center of ITM systems in order to accurately recognize the encoded attack traffic. We may exploit this knowledge to reduce the effectiveness of the iLOC attack. To do so, the data center may throttle the query request rate. One possible way is to enforce human/system interaction for each query, and thereby eliminate the automated queries of the iLOC attack. This can be conducted through authenticated registration, e.g., one authenticated registration is only valid for a certain number of queries. However, these limitations on the information access rate may also reduce the functionality and usability of ITM systems.
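A minimal sketch of such a per-registration query budget (the class name and policy details are illustrative, not a mechanism of any deployed ITM system):

```python
class QueryThrottle:
    """Grant each authenticated registration a fixed budget of
    data-center queries; exhausted registrations must re-register
    (forcing human/system interaction)."""

    def __init__(self, queries_per_registration):
        self.budget = queries_per_registration
        self.used = {}

    def allow(self, registration_id):
        """Return True and charge the budget, or False when exhausted."""
        n = self.used.get(registration_id, 0)
        if n >= self.budget:
            return False
        self.used[registration_id] = n + 1
        return True
```

Because the iLOC decoder needs one query per sliding-window step, even a coarse budget like this multiplies the attacker's cost in registrations and human interaction.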

2) Perturbing the Information: Recall that in the iLOC attack, the attacker needs to recognize the encoded attack traffic. Thus, the quality of reports plays an important role in this recognition process. To reduce the effectiveness of the iLOC attack, we may perturb the published report data by adding some random noise or even randomizing the data publishing delay. This principle is similar to data perturbation in the private data sharing realm

[145, 8]. By perturbing report data, the attack accuracy of iLOC attack can be degraded.

On the other hand, adding random noise and randomizing the delay in publishing report data will impact the data accuracy and usage of ITM systems. Studying such a trade-off will be one aspect of our future work.
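A sketch of such report perturbation (the noise level, the shift-based delay model, and the function name are illustrative assumptions):

```python
import random

def perturb_report(report, noise_sigma, max_delay, rng=None):
    """Perturb a report series before publication, per guideline 2:
    additive Gaussian noise plus a random publishing delay, modeled
    here as a right-shift of the series."""
    rng = rng or random.Random()
    delay = rng.randrange(max_delay + 1)
    # Additive noise; clamp at zero since traffic rates are non-negative.
    noisy = [max(0.0, v + rng.gauss(0, noise_sigma)) for v in report]
    # Delay publication: the first `delay` slots repeat the earliest value.
    return [noisy[0]] * delay + noisy[:len(noisy) - delay]
```

Noise lowers the attacker's correlation degree while delay randomization misaligns the PN-code bit boundaries, both of which degrade the decoding accuracy at some cost to report fidelity.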

3) Investigating Advanced Detection Schemes: Recall that in the iLOC attack, in order to effectively evade detection of monitors in ITM systems, the attacker has to continuously

launch attack/port-scan traffic to different new target networks to localize as many monitors as possible. Consequently, the target IP addresses of attack traffic may exhibit a widely dispersed distribution [67]. Thus, analyzing the distribution of IP addresses may provide one possible method of detection. Furthermore, since the iLOC attack uses the PN-code to encode the attack traffic, we plan to study potential ways of detecting PN-code in our future work.

3.8 Related Work

Many ITM systems have been developed and deployed since CAIDA initiated the network telescope project to monitor background traffic in 2001 [22]. Although the IP addresses of monitors themselves can be protected by mechanisms such as encryption and Bloom filters [52], the public data reported by these ITM systems could be used to disclose the IP address space covered by monitors. Existing attack approaches achieve this by launching high-rate port-scan traffic [18, 110]. However, these kinds of attacks do not consider the invisibility of the attack, since the high-rate attack traffic increases the chance of being detected.

The invisibility technique in our work borrows the camouflage principle, as illustrated in nature and the military. In nature, an animal can disguise itself as the object on which it stands in order to fool its predators or prey [12]. In the military, soldiers wear camouflage clothing designed to blend into the surrounding terrain [94]. As an invisibility technique, our work leverages PN-code technology and extends it to a new Internet cyber-security realm. The PN-code was initially used in military communication systems to provide anti-jamming and secure communication [97]. In wireless communication, the PN-code has been widely used to improve communication efficiency [35]. In addition, the PN-code has other

broad applications, such as cryptography [17], secure data storage and retrieval [126], and image processing [133].

In this chapter, we study techniques for applying the PN-code in the iLOC attack. The work in [139] also studied how to use the PN-code to effectively track anonymous flows through mix networks. Since it is applied to a different problem domain, the solution in [139] is significantly different from the one in this chapter, including the use of the PN-code, the designed algorithms, the decision rule, and the theoretical analysis.

3.9 Summary

In this chapter, we investigated a new class of attacks, i.e., the invisible LOCalization (iLOC) attack. It can accurately and invisibly localize monitors of ITM systems. Its effectiveness is demonstrated by theoretical analysis and experiments with an implemented prototype. We believe that our work lays the foundation for ongoing studies of attacks that intelligently adapt attack traffic to defenses. Our study is critical for securing and improving ITM systems.

As part of our future work, we will continue studying other invisible localization attack

schemes. While the PN-code used in this chapter is effective in achieving attack objectives

of accuracy and invisibility, other attack patterns embedded in attack traffic may also be

accurately distinguished only by the attacker. Detection of such invisible attacks and the design of corresponding countermeasures are still challenging tasks and will be conducted

in our future research. Investigation of proactive methods to protect the location privacy

of monitors is also a part of our future work. Also, we believe that other vulnerabilities

exist in ITM systems and we plan to conduct a thorough investigation of them and develop

corresponding countermeasures.

CHAPTER 4

VARYING SCAN RATE WORMS AGAINST NETWORK-BASED WORM DETECTIONS AND COUNTERMEASURES

In this chapter and the next, we study the other type of defense-oriented widespread Internet attacks, i.e., algorithm-oriented attacks. In this chapter, we study an evolved worm called the Varying Scan Rate worm (the VSR worm for short), which deliberately varies its scan rate and is able to avoid detection by existing network-based worm detection systems. We also present our new worm detection scheme, called the attack target Distribution Entropy based dynamiC detection scheme (DEC detection for short), which can effectively detect both VSR and traditional worms.

4.1 Motivations

In traditional active worms, each worm instance11 takes part in spreading worm attack

by scanning and infecting other vulnerable hosts in the Internet. The basic form of active

worms is the Pure Random Scan (PRS) worm, where a worm infected host continuously

scans a set of random Internet IP addresses to find new vulnerable hosts. There are several

variants of the PRS worm such as local subnet scan worm [27] and hit-list scan worm [114].

Both of these worms attempt to speed up their propagation by increasing the probability

11In this chapter, we use worm infected host, worm victim, and worm instance interchangeably.

85 of successful scanning. It has been observed that there are exponentially increasing trends in the number of infected hosts and overall scanning traffic volume over time when any of the above worms propagates in the Internet [86][27][148]. Based on these observations, many worm detection schemes associated with global scan traffic monitoring systems, such as threshold-based detection and trend-based detection have been developed to detect the large scale propagation of worms in the Internet [147][105][132][123].

However, active worms continue evolving. The ”Atak” worm [144] is a new active worm found in the recent past that attempts to remain hidden by going to sleep (stop scan- ning) when it suspects to be under detection. The worms which adopt similar attack strate- gies to that of the ”Atak” worm could result in an overall scan traffic pattern different from that of traditional worms. Therefore, the existing detection schemes based on global scan traffic monitoring will not be able to effectively detect them. Therefore, it is very important to understand such smart-worms in order to defend against them. In the following, we study two new classes of such worms and design new detection schemes which can effectively detect them and traditional worms.

4.2 Background

In this section, we first introduce the propagation model of traditional active worms.

We then discuss the worm detection framework and corresponding detection schemes as- sociated with it.

4.2.1 The Propagation Model of Traditional Worms

Active worms are similar to biological viruses in terms of their infection and self-propagating nature. Worms identify vulnerable hosts in their neighborhood and other networks and then infect them. The newly infected hosts propagate the worm infection

further to other vulnerable hosts and so on.

As discussed in Section 4.1, the Pure Random Scan (PRS) worm is the basic form of

traditional active worms. We use the PRS approach as our baseline attack scheme to study
the VSR worm and C-Worm. In our analysis, we adopt the epidemic dynamic model

for disease propagation [13] to characterize the worm propagation. Modeling active worm

propagation using the epidemic dynamic model has been done in [86][27][148]. This model

assumes that each host is in one of the following states: immune, vulnerable or infected.

An immune host is one that cannot be infected by a worm. A vulnerable host becomes an

infected host after being successfully infected by a worm. We use an average case analysis

approach to analyze the worm propagation. The analysis is conducted in the discrete time
domain.

We define M(i) and N(i) as the number of overall infected hosts and the number of un-infected vulnerable hosts at time tick i respectively, and E(i+1) as the number of newly

infected hosts from time tick i to time tick i + 1. We define T as the total number of IP addresses, i.e., 2^32 for IPv4. Thus N(0) = T · P1 · P2 is the number of vulnerable hosts

on the Internet before the worm attack starts, where P1 is the ratio of the number of IP

addresses assigned to existing hosts in the Internet to the entire Internet IP address space

T , P2 is the ratio of vulnerable host number to the total number of the existing hosts in

the Internet. We also assume E(0) = 0 and M(0) = M0. For the PRS worm, we have

propagation model as follows:

E(i + 1) = N(i)[1 − (1 − 1/T)^{S·M(i)}],   (4.1)

M(i + 1) = M(i) + E(i + 1),   (4.2)

N(i + 1) = N(i) − E(i + 1),   (4.3)

where S is the scan rate of a worm instance, i.e., the number of scans it sends per time tick.

The details of this model are in [27][132].
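As a concrete illustration, Formulas (4.1)-(4.3) can be iterated directly; the following Python sketch uses assumed values for T, P1, P2, S and M0 (they are not the dissertation's experimental settings).

```python
# Iterating the PRS propagation model of Formulas (4.1)-(4.3).
# All parameter values are illustrative assumptions.

def prs_propagation(T=2**32, P1=0.25, P2=0.01, S=100, M0=10, ticks=500):
    """Return the infected-host count M(i) for i = 0..ticks."""
    N = T * P1 * P2            # N(0): un-infected vulnerable hosts
    M = float(M0)              # M(0): initially infected hosts
    history = [M]
    for _ in range(ticks):
        # Formula (4.1): expected number of newly infected hosts
        E = N * (1 - (1 - 1 / T) ** (S * M))
        M += E                 # Formula (4.2)
        N -= E                 # Formula (4.3)
        history.append(M)
    return history

curve = prs_propagation()
# M(i) rises exponentially at first and saturates once N(i) is exhausted.
```

This exponential-then-saturating shape is the pattern that the trend-based detection schemes discussed below rely on.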

4.2.2 Network-based Worm Detection

Worm Detection System Framework

As we stated before, this chapter focuses on designing schemes to detect the Internet-wide, large-scale propagation of active worms. In order to achieve rapid and accurate detection, it is imperative to monitor and analyze the traffic at multiple locations over the Internet to detect suspicious traffic generated by worms. The generic worm detection framework that we use in this chapter consists of multiple distributed monitors and a worm detection center that controls them. This framework is widely adopted, and is similar to other existing worm detection systems, such as the Cyber Center for Disease Control [114], honeypots [113] and the Internet sink [138]. The distributed monitors are located at gateways, firewalls, and border routers of local networks. They passively record and report any unusual scan traffic data, such as connection attempts to unavailable IP addresses and restricted service ports, to the worm detection center. The report to the worm detection center generally includes scan information in the form of a tuple, i.e.,

(Source IP, Destination IP, Time, Signature). The worm detection center processes such reports and deploys various worm detection schemes to check whether there are suspicious, large-scale scans to restricted ports or connection attempts to unassigned IP addresses. If such excessive and uncommon scans are detected, the worm detection center concludes that there is a large-scale worm propagation over the monitored network.

Worm Detection Schemes

Various worm detection schemes based on global scan traffic monitoring can be integrated into the above detection framework. Similar to traditional intrusion detection systems, most worm detection schemes take three steps in worm detection: (i) collecting the data for detection purposes; (ii) analyzing the detection data; (iii) applying certain decision rules to obtain the detection result.

Most existing worm detection schemes use the count of monitored worm instances as the detection data. As for the statistical property of the detection data, the various detection schemes differ in format and usage. Most of them use the individual sampled detection data of each sampling window to decide on the presence of a wide-spreading worm [123][131]. Some of them use the variance of the sampled detection data to set the detection criterion, such as the threshold of the detection decision [132]. Based on the detection decision rule, the existing detection schemes can be classified into two groups, namely count-threshold-based and count-trend-based detection. The count-threshold-based scheme is a popular scheme to detect wide-spreading worms [132][123][131]. In this scheme, if the amount of observed worm instances/scan traffic exceeds a pre-configured threshold, large-scale worm spreading is identified as being in existence. On the other hand, the count-trend-based scheme [147] is based on the knowledge of existing worm attack epidemic dynamic models and uses the principle of "detecting dynamic traffic trends". That is, it is based on the observation that although the scan rate of a worm instance might be limited by the network bandwidth and CPU capacity, the worm instance does not change the scan rate on purpose. Thus, the existing worm attacks cause the number of worm instances to increase at a positive exponential rate, which can be monitored for detection purposes. In Section 4.3, we define a new active worm attack strategy and observe that the above two classes of detection schemes are not able to detect this new active worm effectively.

4.3 The Active Worm with Varying Scan Rate

In this section, we first define a new form of active worms, i.e., the active worm with

Varying Scan Rate (the VSR worm in short). We then analyze the effectiveness of the

VSR worm in changing the worm scan behavior and hence evading the existing global scan

traffic monitoring based worm detection schemes.

4.3.1 The VSR Worm Model

For the traditional worm, each worm instance scans and infects other hosts on the Internet. For the VSR worm, each worm-infected victim (worm instance) adopts a varying scan rate (S(t)) and a varying attack probability (Pa(t)). S(t) can be totally randomized or determined by a certain function depending on the worm attack strategy. The attack probability Pa(t) is the probability that a worm instance takes part in the worm attack (i.e., scans other hosts on the Internet) at time t. In practice, a worm attacker may divide the time into a sequence of discrete time units. Accordingly, S(t) and Pa(t) become discrete functions, S(i) and Pa(i). In the i-th time unit, worm instances calculate S(i) and Pa(i) and carry out the scan based on these two values. Procedure 2 shows the procedure of the VSR worm attack.

The scan rate function S(i) and the attack probability function Pa(i) are predetermined by the worm attacker. For example, a VSR worm may use S(i) = max(Sn / (1 + i), 0.02), which is a decreasing function of time (indexed by i). Here, Sn is a parameter that controls the overall scan rate during the attack. This VSR worm can use a periodic function for the attack

Procedure 2 The Pseudocode of the VSR Worm Attack

Require: This host is a worm-infected host
1: for all i = 0 to ∞ do
2:   /* current time is i */
3:   Calculate the random scan rate S(i);
4:   Calculate the attack probability Pa(i);
5:   Conduct the scan based on S(i) and Pa(i);
6: end for

probability,

Pa(i) = max(|cos(2π i / 5000)|, 0.08).   (4.4)

Our VSR worm model is generic. The "Atak" worm is one of its special cases, where

S(t) is close to a constant value and Pa(t) is determined by a time varying function. The traditional PRS worm is also a special case of the VSR worm, where S(t) is close to a constant value and Pa(t) is equal to 1. Other forms of worms such as local subnet scan worm [81][27] have propagation formulas different from Formula (4.1) for the PRS worm.

However, the varying S(t) and Pa(t) functions can be easily applied to those propagation formulas to model the combination of those worms with the VSR worm.
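The example control functions above can be written directly in code; this is a minimal sketch, with Sn as an assumed parameter and the floors 0.02 and 0.08 taken from the text's examples.

```python
import math

def scan_rate(i, Sn=100.0):
    """Decreasing scan rate from the text: S(i) = max(Sn / (1 + i), 0.02)."""
    return max(Sn / (1 + i), 0.02)

def attack_probability(i):
    """Periodic attack probability of Formula (4.4):
    Pa(i) = max(|cos(2*pi*i / 5000)|, 0.08)."""
    return max(abs(math.cos(2 * math.pi * i / 5000)), 0.08)

# Per Procedure 2, in each time unit a worm instance computes S(i) and
# Pa(i), then scans with probability Pa(i), issuing S(i) scans if it does.
```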

4.3.2 Analysis of the VSR Worm

In this section, we first develop the propagation model of the VSR worm. Following this, we investigate the performance of the existing worm detection schemes in detecting

VSR worms, in order to determine the effectiveness of the VSR worm in evading these detection schemes.

Propagation Model of the VSR Worm

For the VSR worm, Formula (4.2) needs to be modified to take S(i) and Pa(i) into consideration. For analysis purposes, we assume all the worm instances use the same scan rate function S(i) and attack probability function Pa(i). Then, we have

M(i + 1) = M(i) + N(i)(1 − (1 − 1/T)^{S(i)·Pa(i)·M(i)}).   (4.5)

Now we derive the number of worm instances observed by the detection system from time tick i − 1 to time tick i, referred to as M̂(i). The number of observed worm instances is affected by the percentage of the IP address space monitored by the detection system, referred to as Pm. Pm determines the average probability that a worm scan can be monitored by the detection system.

Assume that at time tick i, there are M(i) worm instances. Based on the VSR worm attack strategy, there are M(i) · Pa(i) worm instances actively conducting the scan, and each active instance generates S(i) scans between time tick i − 1 and time tick i. The probability that at least one of the S(i) scans generated by a worm instance will be detected by the detection system is 1 − (1 − Pm)^{S(i)}. Thus, the number of worm instances observed by the detection system from time tick i − 1 to time tick i is

M̂(i) = M(i) · Pa(i) · [1 − (1 − Pm)^{S(i)}].   (4.6)
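Formulas (4.5) and (4.6) can be combined into a small simulation of what the monitors would see; the numeric defaults below (N0, M0, Sn, K, Pm) are illustrative assumptions, with S(t) and Pa(t) in the shape of Formulas (4.7) and (4.4).

```python
import math

def simulate_vsr(ticks=500, T=2**32, N0=360_000, M0=100,
                 Sn=1000.0, K=200.0, Pm=1 / 2**12):
    """Return (M, M_hat): true and observed instance counts per tick."""
    def S(t):          # varying scan rate, as in Formula (4.7)
        return max(Sn / (1 + t / K), 0.08)

    def Pa(t):         # varying attack probability, Formula (4.4)
        return max(abs(math.cos(2 * math.pi * t / 5000)), 0.08)

    N, M = float(N0), float(M0)
    true_counts, observed = [], []
    for i in range(ticks):
        # Formula (4.5): propagation with time-varying S(i), Pa(i)
        E = N * (1 - (1 - 1 / T) ** (S(i) * Pa(i) * M))
        M, N = M + E, N - E
        true_counts.append(M)
        # Formula (4.6): instances visible to monitors covering Pm
        observed.append(M * Pa(i) * (1 - (1 - Pm) ** S(i)))
    return true_counts, observed
```

The oscillating Pa(t) and decaying S(t) make the observed series deviate from the monotone growth of M(i), which is the evasion effect discussed below.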

In the following, we show the simulation data on the propagation of the VSR worm. In order to show the performance of different VSR worms, we select S(i) to be

S(t, K) = max(Sn / (1 + t/K), 0.08),   (4.7)

and use the same Pa(t) as in Formula (4.4). We select the parameter K defined in Formula (4.7) to be 200, 250 and 300 respectively. Thus, three corresponding VSR worms

are generated, referred to as VSR(K = 200), VSR(K = 250) and VSR(K = 300), respectively.

Figure 4.1: Infection ratio of different VSR worms.

Figure 4.2: The observed worm instance count of different VSR worms.

Fig. 4.1 shows the pattern of worm-infected instances of the traditional PRS worm and the three VSR worms defined previously. For the PRS worm, the scan rates of the worm-infected hosts follow a normal distribution N(100, 100). It clearly demonstrates that the PRS worm has an exponentially increasing pattern of worm instance count during its propagation, and that the VSR worm can change this pattern. As shown in Fig. 4.2, the observed worm instance counts of VSR worms are also very different from that of the traditional PRS worm. This demonstrates that the VSR worm can successfully hide the real worm instance count (M(i)), change its pattern over time, and thus evade being effectively detected by the existing worm detection systems. In Fig. 4.2, the worm detection system covers a 2^20 IPv4 address space (Pm = 1/2^12), similar to that of the SANS ISC [103].

Effectiveness of the VSR Worm

In the following, we evaluate the effectiveness of the VSR worm in defeating some representative worm detection schemes. We define two metrics here. The first one is the Detection Time (DT), which is defined as the time taken to successfully detect a wide-spreading worm from the moment the worm spreading starts. The second metric is the Maximal

Infection Ratio (MIR), which is defined as the ratio of the number of infected hosts over the total number of vulnerable hosts up to the moment when the worm spreading is detected. This metric fundamentally quantifies the damage of the worm, i.e., how many hosts are infected by the time the worm spreading is detected. The importance of MIR is that it can distinguish the performance of two worms in the case that the two worms are detected at the same time (same DT), but have infected different numbers of hosts at the moment of detection.

The fundamental purpose of the detection schemes is to minimize the damage caused by

the worm. Hence MIR also quantifies the effectiveness of the worm detection schemes.

The higher this value is, the better is the worm attack performance and consequently, the

worse is the detection performance.
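The two metrics reduce to simple computations over an infection time series; the curve below is a toy example for illustration, not simulation output.

```python
def detection_time(start_tick, detect_tick):
    """DT: time from the start of worm spreading until detection."""
    return detect_tick - start_tick

def maximal_infection_ratio(infected, total_vulnerable, detect_tick):
    """MIR: fraction of vulnerable hosts infected when the alarm fires."""
    return infected[detect_tick] / total_vulnerable

M = [10, 40, 160, 640, 2560]       # toy exponential infection curve
print(detection_time(0, 3))        # DT = 3 time ticks
print(maximal_infection_ratio(M, 360_000, 3))
```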

In our simulations, the parameters of the traditional worm and the VSR worms are the same as those in Section 4.3.2. By analyzing the background non-worm scan traffic reported by the SANS Internet Storm Center [103] and the Goldsmith data [49], we are able to set the detection system parameters of all detection schemes to achieve a reasonable detection false alarm rate (below 5 × 10^−4). The detection false alarm rate is defined as the probability that a

worm spreading alarm is reported as the detection result when there is no wide-spreading

worm. We obtain the results for three existing detection schemes. The first one is the

generalization of the detection schemes based on the comparison of the observed victim

count and a predefined threshold [123]. We refer to this detection scheme as CISH. The second detection scheme is proposed in [132] and is based on the observed victim count variance and a dynamic threshold. We refer to this detection scheme as CVDH. The third detection scheme is proposed in [147] and is based on observing a predetermined trend of the victim

count during the worm propagation. We refer to this detection scheme as CISR. We also run

the Kalman filter following the design in [147] to perform CISR detection on VSR worms.

Tables 4.1 and 4.2 show the results of the above three detection schemes in detecting the

traditional PRS worm and VSR worms. Although all detection schemes are effective in

detecting the traditional PRS worm, they are not effective in detecting VSR worms. For

example, when K is 200, CISH and CISR schemes fail to detect the VSR worm while

CVDH is ineffective, i.e., 41% of the vulnerable hosts are infected at the moment when the worm is detected. Comparatively, CISR is less effective than the other detection schemes for the following reason: it tries to detect an exponentially increasing trend of the worm scan traffic, but the trend of the VSR worm's scan traffic does not follow the exponentially increasing pattern. This causes the Kalman filter to oscillate without convergence.

Our simulation results show that the above threshold-based and trend-based worm detection schemes are not able to effectively detect the VSR worm. In the following section, we develop a new detection scheme which aims to effectively detect the VSR worm.

Detection   Traditional Worm   VSR(200)   VSR(250)   VSR(300)
CISH        862                ∞          17700      7600
CVDH        879                33400      12011      9234
CISR        760                ∞          ∞          ∞

Table 4.1: Detection Time of Some Existing Detection Schemes

Detection   Traditional Worm   VSR(200)   VSR(250)   VSR(300)
CISH        3%                 100%       52%        12%
CVDH        4%                 41%        20%        23%
CISR        2%                 100%       100%       100%

Table 4.2: Maximal Infection Ratio of Some Existing Detection Schemes

4.4 DEC Worm Detection

4.4.1 Design Rationale

As we discussed in Section 4.3, the VSR worm can adopt intelligence in its attack such that it can behave differently from traditional worms. Consequently, the existing worm detection schemes are not effective in detecting the VSR worm.

In this section, we develop a new detection scheme called attack target Distribution Entropy based dynamiC (DEC) detection, which captures a key inherent worm propagation feature and is thus able to effectively detect the VSR worm. The VSR worm can manipulate the scan traffic volume over time so that it is undetectable by existing worm detection schemes based on global scan traffic monitoring. However, its fundamental goal is to propagate itself to as many vulnerable hosts as possible. Hence, to be effective, the

VSR worm still has to continuously propagate itself to new targets and cause large-scale infection across the Internet. Thus, the VSR worm scan traffic must show a widely dispersed distribution of scan target addresses, and this widely dispersed distribution of attack/scan targets can be used to distinguish VSR worm scan attacks from other occasional non-worm port scan activities, e.g., port scans due to software faults or occasional port scans by benign software. Motivated by this observation, our DEC detection uses the attack target distribution as the basic attack data for worm detection.

Recall our discussion in Section 4.2.2, where we mentioned that there are three important steps/elements associated with worm detection. DEC has special features in these three elements compared with the existing threshold-based and trend-based worm detection schemes: (i) Detection data of worm attacks: the distribution of the attack targets can be observed by the detection monitors in the detection system, and DEC treats this distribution as the basic detection data. While a distribution can be described and quantified in different formats, we use the entropy to capture the distribution of the attack targets. From an information theory perspective, a smaller entropy indicates a smaller number of anomalies in the detection data. (ii) Statistical property of worm detection data: while processing and analyzing the detection data sequences obtained from continuous detection sampling windows, we observe that the sample entropy can distinguish non-worm scan traffic from worm scan traffic more effectively than other statistical measures, e.g., the sample mean and variance. Hence, we use the entropy to process the sampled detection data and capture the abnormality during worm propagation. (iii) Detection decision rule: since the VSR worm can behave differently over time, we adopt run-time dynamic adaptations in the DEC detection.

4.4.2 DEC Worm Detection

DEC has specific features which improve the detection performance not only for the detection of VSR worm, but also for the detection of traditional worms. DEC obtains this improvement through the new features of three elements in worm detection as follows:

Attack Target Distribution

To deal with the VSR worm, DEC detection uses the distribution of the attack targets

in worm attacks as the detection data. The distribution of the attack targets is captured in

the form of entropy. In the following, we describe how to use the entropy to measure the

distribution of the worm attack targets.

The Shannon entropy H(X) of a data set X = (x1, x2, . . . , xn) is defined as

H(X) = − ∑_{i=1}^{n} p_i log(p_i),   (4.8)

where n = |X|, p_i = P[X = x_i] for i ∈ {1, . . . , n}, and the logarithm is taken in base 2. Entropy

quantifies "the amount of uncertainty" contained in data [109], where "uncertainty" means randomness. Entropy is frequently used to quantify the "randomness" of a distribution, or more precisely, the degree of dispersal or concentration of a distribution. The reason we use entropy is the following: when a worm is propagating, the destination IP addresses of the port scanning traffic (the worm attack target IP addresses) are widely dispersed (more random), and entropy naturally quantifies this distribution of worm attack targets, enabling the detection of worm propagation.

In the worm detection system, for each detection sampling window, the detection center collects reports of connection attempts targeting unused IP addresses or restricted service ports from the detection monitors located at different locations. Recall that, as mentioned in Section 4.2.2, each entry in a report table from the detection monitors has the following

format (Src, Dest, time, signature). Src represents the IP address of the worm instance;

Dest represents the destination IP address as the worm scan target; time represents the time stamp of scan; and signature represents some feature of scans, such as port number.

With all reports collected in one sampling window Ws (time unit), an attack/scan target

table is integrated for the entropy calculation with the following format: (Dest, sn). Here, Dest is the unique key representing the scan target IP address and sn is the number of distinct sources trying to scan Dest. For example, if an attack target table has M entries,

we have a set of data, Z1 = ((Dest1, sn1),..., (DestM , snM )). Mapping this case to the

Formula (4.8), we derive the entropy of worm attack target distribution,

H̄(Z1) = − ∑_{i=1}^{M} (sn_i / Y) log(sn_i / Y),   (4.9)

where Y = ∑_{i=1}^{M} sn_i.
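As an illustration, the aggregation of monitor reports into the (Dest, sn) table and the entropy of Formula (4.9) might look as follows in Python, using base-2 logarithms as in Formula (4.8); the report tuples are made-up sample data.

```python
import math
from collections import defaultdict

def target_entropy(reports):
    """H(Z1) = -sum_i (sn_i / Y) * log2(sn_i / Y) over scan targets."""
    sources = defaultdict(set)               # Dest -> distinct Src set
    for src, dest, _time, _signature in reports:
        sources[dest].add(src)
    sn = [len(s) for s in sources.values()]  # distinct-source counts
    Y = sum(sn)
    return -sum((c / Y) * math.log2(c / Y) for c in sn)

reports = [("10.0.0.1", "192.0.2.7", 0, "p445"),
           ("10.0.0.2", "192.0.2.7", 1, "p445"),
           ("10.0.0.1", "192.0.2.9", 2, "p445"),
           ("10.0.0.3", "192.0.2.11", 3, "p445")]
# Targets have sn = (2, 1, 1), so Y = 4 and the entropy is 1.5 bits.
print(target_entropy(reports))
```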

Statistical Property of Worm Detection Data

To deal with the VSR worm properties such as the time varying scan rate and the time

varying scan traffic pattern, we use a statistical methodology to process the detection data

sequence obtained from continuous detection sampling windows to improve the detection

accuracy. During worm detection, the detection center configures a detection sliding window Wd (time unit), which includes q (> 1) continuous detection sampling windows (recall that the size of each sampling window is Ws). As discussed before, the detection center calculates the target distribution in terms of entropy in each sampling window by Formula (4.9) as the detection data. Thus, there is one detection datum in each sampling window. Within the sliding window Wd, there are q target distribution entropy values, denoted by Z2 = (H̄_{i−q+1}, H̄_{i−q+2}, . . . , H̄_i), recorded at time i.

The detection center can use the sample mean and the sample entropy as the statistical property to process the above q detection data in a detection sliding window. The sample mean Ê(H̄(Z2)) and sample entropy Ĥ(Z2) of the q target distribution entropy series Z2 are defined below:

Ê(H̄(Z2)) = (1/q) ∑_{j=1}^{q} H̄_{i−q+j},   (4.10)

Ĥ(Z2) = − ∑_{j} (o_j / q) log(o_j / q) + log(B).   (4.11)

In Ĥ(Z2), we use the histogram-based entropy estimation in [84]. Here, o_j is the number of sample points in the j-th bin and B is the histogram's bin size. Note that these two parameters are obtained from the q target distribution entropy values denoted by Z2 [84]. Based on the mean square error of the entropy estimation, we can obtain the optimal bin size for the entropy estimation [84].
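A minimal sketch of the histogram-based estimate of Formula (4.11), assuming a fixed bin width B chosen in advance (the cited work [84] instead derives an optimal B from the mean square error of the estimate):

```python
import math

def sample_entropy(values, B):
    """H_hat = -sum_j (o_j/q) log2(o_j/q) + log2(B), with o_j the number
    of the q sample points falling in histogram bin j of width B."""
    q = len(values)
    lo = min(values)
    bins = {}
    for v in values:
        j = int((v - lo) / B)        # index of the bin containing v
        bins[j] = bins.get(j, 0) + 1
    return (-sum((o / q) * math.log2(o / q) for o in bins.values())
            + math.log2(B))
```

With B = 1, a window of identical entropy values gives a sample entropy of 0, while dispersed values yield a larger estimate.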

Decision Rule Adaptation

With the above statistical property of the detection data, the last step in worm detection is to apply detection decision rules. The threshold-based scheme has been widely used in the anomaly detection field [132][36]. Worm detection is performed by comparing statistical features of the background non-worm traffic and the detection data traffic. As the VSR worm adopts a varying scan rate and its spreading follows time-varying dynamics, we adopt statistical pattern recognition as a fundamental principle and apply a dynamic threshold to deal with the VSR worm.

Our dynamic threshold adaptation is inspired by the following observations. Assume that Xn is a random variable representing the detection data in the normal system without worm spreading, and that Xw is the random variable representing the detection data in the network system under worm attack. With statistical pattern recognition as a principle, Fig. 4.3 shows the two probability density functions of normal non-worm traffic and worm attack traffic. From this figure, we obtain two observations: (i) For the threshold selection, there is a trade-off between the false alarm rate and the detection rate. The detection rate is the probability that a wide-spreading worm is detected successfully. When the threshold is set larger, it causes a smaller false alarm rate and a smaller detection rate. The smaller detection rate causes a longer detection time. (ii) When the variance of the attack traffic Xw increases, the threshold should be dynamically adjusted to be smaller in order to maintain a certain detection rate.

Figure 4.3: Bayes decision rule for normal and worm traffic features (distributions of Xn and Xw, the threshold Tr, and the resulting detection and false alarm rates).

The above two observations provide guidelines for determining the dynamic threshold. The threshold needs to consider the normal non-worm traffic property to achieve a low false alarm rate. It also needs to consider the run-time detection traffic property, i.e., its variance. When the run-time traffic variance becomes larger, the threshold needs to take a relatively smaller value in order to achieve a high detection rate. If the normal non-worm traffic and worm attack traffic follow normal distributions, it is possible to obtain a closed-form formula for the optimal threshold. We present the method to obtain the optimal threshold in the case that the sample entropy is used as the statistical property of the detection data in [140]. For the optimal threshold in the cases of sample mean and sample variance, please refer to [118].

Based on the above observations, we conduct dynamic threshold adaptation. At the initialization stage of worm detection, there is an initial threshold value, Tr0, which is obtained through an off-line training process with a large amount of normal non-worm Internet scan traffic traces [103]. As a result, we can obtain the initial Tr0 that achieves a reasonable false

alarm rate. With the run-time detection data, threshold Tr is dynamically adjusted based on

the run-time detection traffic variance σ(H̄) as

Tr = [1 − α · min(σ(H̄)/Vm, 1)] · Tr0,   (4.12)

where Vm is a constant. The min(·) term normalizes the detection data variance σ(H̄). The parameter α ∈ (0, 1) sets the maximal threshold adjustment. Clearly, there is a trade-off in selecting α: a larger value of α will improve the detection rate but potentially increase the false alarm rate.

In Formula (4.12), σ(H̄) can be calculated as follows. At time tick i, there are q target distribution entropy values, denoted by Z2 = (H̄_{i−q+1}, H̄_{i−q+2}, . . . , H̄_i), recorded in the sliding window Wd. The sample variance σ(H̄) of Z2 is defined by

σ(H̄) = [ (1/q) ∑_{j=1}^{q} (H̄_{i−q+j} − Ê(H̄))² ]^{1/2},   (4.13)

where Eˆ(H¯ ) is calculated by Formula (4.10).

With the dynamic threshold Tr obtained by Formula (4.12) and the sample entropy of the detection data by Formula (4.11), the DEC detection scheme conducts the detection by comparing the sample entropy with the threshold Tr. If the sample entropy is larger than Tr, an alarm of a wide-spreading worm is generated.
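Putting Formulas (4.10), (4.12) and (4.13) together, the DEC decision step might be sketched as follows; the defaults alpha = 0.04 and Vm = 200 follow the configuration in Section 4.5.1, while Tr0 is an assumed trained value supplied by the caller.

```python
import math

def dynamic_threshold(entropies, Tr0, alpha=0.04, Vm=200.0):
    """Tr = [1 - alpha * min(sigma/Vm, 1)] * Tr0 (Formula 4.12), where
    sigma is computed over the sliding window (Formulas 4.10, 4.13)."""
    q = len(entropies)
    mean = sum(entropies) / q                                       # (4.10)
    sigma = math.sqrt(sum((h - mean) ** 2 for h in entropies) / q)  # (4.13)
    return (1 - alpha * min(sigma / Vm, 1.0)) * Tr0

def dec_alarm(sample_entropy, window_entropies, Tr0):
    """Raise a worm alarm when the sample entropy exceeds the dynamic Tr."""
    return sample_entropy > dynamic_threshold(window_entropies, Tr0)
```

A larger run-time variance lowers Tr and makes the alarm more sensitive, which is the adaptation argued for in observation (ii) above.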

4.4.3 Space of Worm Detection

As we discussed in Section 4.2.2 and Section 4.4.1, there are three important elements/steps in worm detection. Fig. 4.4 shows the space of schemes, in which the three elements are shown as three different dimensions. We can use a tuple to represent a detection scheme subspace: (Detection Data, Statistical Processing, Decision Rule). We then have 32 possible combinations, and the whole detection scheme space is divided into 32 subspaces, as shown in Fig. 4.4. Each subspace represents a different type of detection scheme.

Figure 4.4: Space of worm detection. The three dimensions are detection data (1: counter, 2: distribution), statistical property (1: individual, 2: mean, 3: variance, 4: entropy), and decision rule (1: static threshold, 2: dynamic threshold, 3: static trend, 4: dynamic trend); CISH, CISR, CVDH, DVDH and DEC occupy different subspaces.

The traditional count-threshold-based detection schemes are in the subspace modeled by the tuple (Count, Individual, Static tHreshold). We refer to this group of detection schemes as CISH. The traditional count-trend-based detection schemes (referred to as CISR) are in the subspace modeled as (Count, Individual, Static Trend) [147]. The detection scheme of [132] is in the subspace modeled as (Count, Variance, Dynamic tHreshold), and we refer to it as CVDH. The extension of CVDH obtained by replacing the worm instance count with the worm attack target distribution generates another detection scheme, referred to as DVDH in this chapter. DVDH is in the subspace modeled as (Distribution, Variance, Dynamic tHreshold). Our DEC detection scheme is in the subspace modeled as (Distribution, Entropy, Dynamic tHreshold). We refer to the DEC scheme as DEDH in the following section in order to emphasize the comparison with other schemes.

With the space of detection schemes, we can comprehensively compare our DEC detection

scheme with other existing schemes. This detection scheme space can also inspire the study

of potential new worm detection schemes.

4.5 Performance Evaluation

In this section, we report the results of our simulations to show the detection perfor-

mance of various worm detection schemes under different VSR and traditional PRS worm

attacks. We also investigate the sensitivity of detection performance to the worm attack

parameters.

4.5.1 Evaluation Methodology

In this chapter, we evaluate our proposed detection scheme (DEC or DEDH) by comparing its performance with the representative schemes discussed in Section 4.4.3, i.e., CISH, CISR, CVDH and DVDH.

Evaluation Metrics

The first two metrics we use are the Detection Time (DT) and the Maximal Infection Ratio (MIR), defined in Section 4.3.2. We also use the detection false alarm rate defined in Section 4.3.2, which shows the accuracy of the detection scheme.

Simulation Setup

We use real-world Internet traffic traces as the background non-worm scan traffic in

our simulations. To do so, we analyzed scan traffic reported by the SANS Internet Storm Center [103] and the Goldsmith data [49]. The default parameters in our simulation are set as follows. The total number of vulnerable hosts on the Internet is 360,000, which is similar to the number of total vulnerable hosts in the Code-Red worm incident [148]. The unit of

the scan rate is the number of scans per time unit. For the traditional worm attack, we

assume that the different infected hosts (worm instances) have different scan rates, but

each worm instance has a scan rate S (> 0) which is determined by a normal distribution S = N(Sm, Sσ²).¹² In our simulations, we select Sσ as 100 and change Sm from 50 to 350 to evaluate different traditional PRS worms [147].

We simulate the VSR worms as follows. Each worm instance adopts a varying scan

rate (S(t)) and a varying attack probability (Pa(t)), both functions of time t. The S(t) function in our simulation is

S(t) = max(C1 / (1 + t/K), C2),   (4.14)

which is a decreasing function of time t (C1 and C2 are constants). Note that S(t) is the

varying scan rate adopted by the VSR worm instance defined in Section 4.3.1. The attack

probability Pa(t) in our simulation is

Pa(t) = max(|cos(2π t / 5000)|, C3),   (4.15)

where C3 is a constant. Different values of C1, C2 and C3 correspond to different S(t) and

Pa(t) functions, hence representing different VSR worms. Due to space limitation, we only

present a limited number of cases here. However, we found that the conclusions we draw

hold for all other cases we have evaluated.

We assume that the detection system has distributed monitors which cover 2^20 IP addresses. We select 2^20 as the detection system coverage size to simulate the coverage of

¹²Each worm instance may have a different out-going link bandwidth and computing capacity, and thus may have a different scan rate.

the SANS ISC [103]. The detection sampling window Ws is set to 5 time units, and the detection sliding window Wd is set to be incremental from 100 units to 600 units. The incremental selection of Wd can be adaptive, to reflect the worm scan traffic dynamics caused by VSR worms with various speeds. For fair comparison, the thresholds of the detection schemes in the following evaluations are consistently configured with values achieving a similar false alarm rate (below 5 × 10^−4). For this purpose, the maximal adjustable ratio of the detection threshold, α, and the parameter Vm (defined in Formula (4.12)) are 0.04 and 200, respectively. Note that our DEC scheme uses Formula (4.12) to dynamically adjust the threshold.

4.5.2 Detection Performance

In this section, we first compare the performance of our DEC (or DEDH) detection

scheme with other detection schemes under different VSR worm attacks. Then we report

the comparison between our DEC detection and other detection schemes under different

traditional PRS worm attacks.


Figure 4.5: Detection time of detection schemes on VSR worms


Figure 4.6: Maximal infection ratio of detection schemes on VSR worms

Detection Performance on VSR Worms

As shown in Tables 4.1 and 4.2, CISR is not effective in detecting VSR worms. Hence,

we do not report the performance of CISR in this subsection. Fig. 4.5 shows the Detec-

tion Time (DT ) of various detection schemes under VSR worm attacks with different K

(defined in Formula (4.14)). Fig. 4.6 shows the corresponding Maximal Infection Ratio

(MIR). From these two figures, we make the following observations: a) Our DEC (or

DEDH) detection scheme consistently achieves the best detection performance in terms

of both DT and MIR. For example, when K = 400, the detection time of DEDH is 820

units, which is only 25% − 50% of that of the other detection schemes. This means that DEC is more

robust and can detect the VSR worm significantly faster than other detection schemes. With

the same K, the MIR achieved by our DEDH is 0.008, which is only 14% − 40% of the

other detection schemes. This means that DEDH can prevent many more vulnerable hosts

from being infected by the VSR worm, compared with other detection schemes. b) The detection performance of DVDH is better than that of CVDH, in terms of both DT and MIR. This

result consistently confirms that the worm attack target distribution is more effective than

victim count as the detection data. c) When the VSR scan rate parameter K increases, all detection schemes achieve faster worm detection (smaller DT) and result in a smaller MIR.

This is because a larger K value implies faster VSR worm scanning. Thus, the VSR worm is detected earlier (smaller DT), and the damage caused by the VSR worm (MIR) is also smaller.

As discussed, there is a trade-off in selecting dynamic threshold parameter α in For-

mula (4.12). A larger value of α will achieve faster detection (smaller detection time) but

worse detection accuracy (larger false alarm rate). Table 4.3 shows the detection time and

detection false alarm rate with different values of α for our DEDH detection scheme. This

table shows that the false alarm rate is sensitive to the dynamic threshold parameter α.

When the value of α is larger, the false alarm rate is also larger.

Parameter α            0         0.04      0.08     0.16
False alarm rate       0.00001   0.00003   0.006    0.015
Detection time (DEDH)  1103      890       850      814
MIR (DEDH)             0.0025    0.0016    0.0013   0.0011

Table 4.3: DEC Performance Sensitivity of Parameter α

Detection Performance on Traditional PRS Worms

For the detection of the traditional PRS worm, we evaluate all five detection schemes

listed in Fig. 4.4. Fig. 4.7 shows the Detection Time (DT ) of various detection schemes with different mean values of the scan rate Sm. Fig. 4.8 shows the corresponding Maximal

Infection Rate (MIR). From these two figures, we can make the following observations:

a) Our DEC (or DEDH) detection scheme consistently achieves the best detection performance in terms of both DT and MIR among the five detection schemes. For example,


Figure 4.7: Detection time of detection schemes on the traditional PRS worms


Figure 4.8: Maximal infection ratio of detection schemes on the traditional PRS worms

when the mean scan rate is 150 per unit time, DEDH achieves a detection time of 240 time units, which is only 30% − 50% of the detection time of the other detection schemes. With the same mean scan rate, DEDH achieves an MIR of 0.004, which is only 15% − 35% of that of the other detection schemes. b) When the mean scan rate increases, all detection schemes achieve a shorter detection time. All the detection schemes except

CVDH and DVDH also achieve a smaller MIR. This highlights that, for most detection schemes, relatively small scan rates allow the worm to cause greater damage before

the worm spreading is identified. The reason is that a faster worm increases the

worm instance count more quickly, and thus the detection system detects it much earlier. However,

extremely slow worm propagation contradicts the goal of active worms. c) When the

scan rate increases, the MIR of CVDH and DVDH shows a different trend compared with the other detection schemes. This observation matches the result in [132]. Fig. 4.8 also shows that CVDH and DVDH can achieve better performance in terms of MIR compared

with CISH and CISR when the worm scan rate is relatively low. This confirms that dynamically adjusting the detection threshold in DEDH is a good way to improve the detection

performance, especially for detecting stealthy worms with varying scan rates.

To summarize our observations, we can see that our DEC-based detection scheme is

highly effective in detecting both VSR worms and traditional PRS worms.

4.6 Related Work

The significant damage caused by active worms has drawn substantial attention to their study. Much effort has been devoted to worm modeling, analysis, detection, and

defense. In the area of modeling and analyzing active worms, Staniford et al. studied

various active worms and modeled the propagation of them [114]. Chen et al. analyzed the

propagation of active worms with a discrete time model and also considered the impact of

patching during the worm spreading [27]. Moore et al. analyzed and modeled Slammer

worm spreading in [86]. There are some other works such as malware spreading dynamics

by Garetto et al. in [48] and Code-Red worm modeling by Zou et al. in [148]. We modeled

a new form of worms, namely the Varying Scan Rate worm, which generalizes worms

that deliberately change the scan rates to evade the existing global scan traffic monitoring

based worm detection schemes. Examples of such worms are the “Atak” worm [144] and

the “self-stopping” worm [75].

The most important component of worm defense systems is worm detection, which is the foundation of defense against worms. In worm detection, many schemes leverage intrusion detection results [36]. Besides the detection schemes based on global scan traffic monitoring discussed in Section 4.2.2, there are other types of worm detection schemes. For example, using sequential hypothesis testing, Jung et al. developed a Threshold Random Walk online detection algorithm to identify worm-infected hosts [105].

Gu et al. developed DSC (Destination-Source Correlation) for detecting worms in local networks, which is based on the worm infection factor, i.e., the victim host is first scanned and then sends out scans destined for the same port [53]. Kim and Karp proposed a scheme to automatically generate worm signatures [61].

Our DEC detection scheme is different from the above detection schemes in the sense that

DEC uses attack target distribution as the detection data to capture worm propagation.

Furthermore, it uses the statistical property of entropy as a mining utility to synthesize the detection data, and adopts a dynamic detection decision rule to improve the detection performance. A recent similar work [67] also discussed using traffic distributions (summarized by entropy), such as the destination IP address distribution, to detect various anomalies. However, the work in [67] differs from ours in the following regards: its detection scheme was not tuned specifically for worm detection, it did not address the impact of various statistical properties (sample mean/variance/entropy) on detection performance, and it did not compare representative worm detection schemes. Our work covers a thorough study of worm detection. In particular, we defined a three-dimensional worm detection space to leverage

existing schemes, and applied various statistical properties and a dynamic threshold to

enhance the accuracy in detecting both VSR and traditional worms.

4.7 Summary

In this chapter, we modeled a new form of worms called Varying Scan Rate worm

(the VSR worm in short). The VSR worm is generic and simple to launch. Our results

showed that the VSR worm can significantly degrade the effectiveness of existing worm

detection schemes based on global traffic monitoring. To counteract the VSR worm, we

developed a new worm detection scheme called the attack target Distribution Entropy based dynamiC detection scheme (the DEC detection). The DEC detection utilizes the attack target distribution and its statistical entropy in conjunction with dynamic decision rules to distinguish worm scan traffic from non-worm scan traffic. Our data clearly demonstrated the effectiveness of the DEC detection scheme in detecting VSR worms as well as traditional PRS worms. To the best of our knowledge, our work is the first to systematically study active worms with deliberately varying scan rates and to develop an effective detection scheme against them.
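As an illustration of the entropy-based detection data (not the dissertation's exact formulation; the function name is ours), the Shannon entropy of the empirical destination distribution in a detection window can be computed as follows. Random worm scanning spreads targets widely and drives this entropy up relative to typical non-worm background traffic.

```python
# Illustrative sketch: Shannon entropy of the attack-target (destination)
# distribution observed in one detection window. Not the dissertation's
# exact DEC formulation.
import math
from collections import Counter

def target_distribution_entropy(destinations):
    """Shannon entropy (bits) of the empirical destination distribution."""
    counts = Counter(destinations)
    total = len(destinations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```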

In our research, in addition to the above VSR worm, we have also investigated another new class

of active worms, i.e., Camouflaging Worm (C-Worm), which has the ability to camouflage

its propagation from worm detection systems [142]. The C-Worm intelligently manipulates

its scan traffic volume dynamically over time so that its propagation may not be detected

by existing network-based worm detection algorithms. We analyzed characteristics of the

C-Worm and compared the traffic of both the C-Worm and normal non-worm scans. We observed that they are barely distinguishable in the time domain. However, in the frequency

domain, the distinction between them is clear due to the recurring manipulative nature of

the C-Worm. Motivated by our observations, we designed a novel spectrum-based scheme to detect the C-Worm. Our scheme uses the Power Spectral Density (PSD) distribution of the scan traffic volume and its corresponding Spectral Flatness Measure (SFM) to distinguish the C-Worm traffic from non-worm traffic. We conducted extensive performance evaluations of the C-Worm through simulation using real-world traces as background scan traffic. The performance results demonstrate that our spectrum-based scheme can more rapidly and accurately detect the C-Worm in comparison with existing worm detection schemes.
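The SFM test described above can be sketched as follows, under our own simplifying assumptions (periodogram PSD estimate; the function name is ours). The SFM is the ratio of the geometric to the arithmetic mean of the PSD: values near 1 indicate a noise-like flat spectrum, while the C-Worm's recurring traffic manipulation concentrates power at a few frequencies and yields a low SFM.

```python
# Sketch of a Spectral Flatness Measure computed from scan-traffic volume
# samples x[t]. Simplified stand-in, not the dissertation's exact scheme.
import numpy as np

def spectral_flatness(x):
    psd = np.abs(np.fft.rfft(x)) ** 2   # periodogram estimate of the PSD
    psd = psd[1:]                       # drop the DC component
    psd = np.maximum(psd, 1e-12)        # avoid log(0)
    geo_mean = np.exp(np.mean(np.log(psd)))
    return geo_mean / np.mean(psd)      # near 1: flat; near 0: peaky spectrum
```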

CHAPTER 5

POLYMORPHIC WORMS AGAINST HOST-BASED WORM DETECTION AND COUNTERMEASURES

In this chapter, we do not define any new attack model, because the evolving worms we study here already exist. Worm writers know that most host-based worm detection algorithms use binary representations or signatures of previously seen worms as references to distinguish worms from benign executables; hence, they generate polymorphic worms that attempt to change their signatures during propagation. In order to fight this real-world threat, we propose a new worm detection approach based on mining dynamic program executions. This approach captures the dynamic behavior of executables to provide accurate and efficient detection of both seen and unseen worms, including polymorphic worms.

5.1 Motivations

In general, there are two types of worm detection systems: network-based detection and host-based detection. Network-based detection systems detect worms primarily by monitoring, collecting, and analyzing the scan traffic (messages to identify vulnerable computers) generated by worm attacks. Many detection schemes fall into this category

[132, 123, 112, 62, 147, 67, 141]. Nevertheless, because of their reliance on scan traffic,

these schemes are not very effective in detecting worms that spread via email systems,

instant messenger (IM) or peer-to-peer (P2P) applications.

On the other hand, host-based detection systems detect worms by monitoring, collect-

ing, and analyzing worm behaviors on end-hosts. Since worms are malicious programs that

execute on these machines, analyzing the behavior of worm executables13 plays an impor-

tant role in host-based detection systems. Many detection schemes fall into this category

[1, 64, 106]. Considering that a large number of real-world worm executables are accessi-

ble over the Internet, they provide an opportunity for researchers to directly analyze them

to understand their behavior and, consequently, develop more effective detection schemes.

Therefore, the focus of this chapter is to use this large number of real-world worm executa-

bles to develop a host-based detection scheme which can efficiently and accurately detect

new worms.

Within this category, most existing schemes have been focusing on static properties

of executables [64, 106]. In particular, the list of called Dynamic Link Libraries (DLLs),

functions and specific ASCII strings extracted from the executable headers, hexadecimal

sequences extracted from the executable bodies, and other static properties are used to

distinguish malicious and benign executables. However, using static properties without

program execution might not accurately distinguish between these executables due to the

following two reasons.

• First, two different executables (e.g., one worm and one benign) can have the same static

properties, i.e., they can call the same set of DLLs and even call the same set of

functions.

13In this chapter, an executable means a binary that can be executed, which is different from program source code.

• Second, these static properties can be changed by the worm writers by inserting

“dummy” functions in the worm executable that will not be called during program

execution, or by inserting benign-looking strings [32, 50, 63, 20].

Hence, the static properties of programs, or how they look, are not the keys to distinguishing worm and benign executables. Instead, we believe the keys are what programs do, i.e., their run-time behaviors or dynamic properties. Therefore, our study adopts dynamic program analysis to profile the run-time behavior of executables to efficiently and accurately detect new worm executables. However, dynamic program analysis poses three challenges. First, in order to capture the run-time behavior of executables (both worm and benign ones), we have to execute a large number of malicious worms, which might damage our host and network systems. Second, given the large number of executables, manually executing and analyzing them is infeasible in practice. Hence, we need to find an efficient way to automatically capture programs’ run-time behavior from their execution.

Third, from the execution of a large set of various worms and benign executables, we need to find some constant and fundamental behavior differences between the worms and the be- nign executables in order to accurately determine whether an unseen executable is a worm or a benign executable.

In order to address the above issues, we propose an effective worm detection approach

based on mining system-call traces of a large amount of real-world worms and benign

executables. Our goal is to use large volume of existing worms to capture the common

dynamic behavior features of worms and use them to detect new worms.

5.2 Background

In this section, we give an overview of worm detection and then introduce program analysis and data mining techniques.

5.2.1 Worm Detection

Generally, worm detection can be classified into network-based and host-based schemes.

Network-based schemes detect worm attacks by monitoring, collecting, and analyzing worm-generated traffic. For this purpose, Internet Threat Monitoring (ITM) systems have now been developed and deployed [103, 15]. An ITM system usually consists of a number of monitors and a data center. Each monitor of an ITM system is responsible for monitoring traffic targeted to a range of unused, yet routable, IP address space and periodically reports the collected traffic logs to the data center. The data center analyzes the logs and posts summarized reports to warn of Internet worm attacks. Based on data reported by ITM systems, many detection schemes have been proposed [132, 123, 146, 59]. Nevertheless, as we mentioned in Section 5.1, these detection schemes have limitations in detecting worms that spread via e-mail systems, instant messenger (IM), or peer-to-peer (P2P) applications, since their traffic is difficult for ITM systems to observe [77].

Host-based schemes detect worm attacks by monitoring, collecting, and analyzing the worm behavior on end-hosts. In particular, when a worm executes on an infected computer, it may take control of the system with high privileges, modify the system as needed, and continue to infect other computers. These acts expose some anomalies on the infected com- puters, such as writing or modifying registry keys and system binaries or opening network connections to transfer worm executables to other vulnerable computers. For example,

the “Blaster” worm changes a registry entry, downloads a file named “msblast.exe”, and

executes it [25].

Traditional host-based detection focuses primarily on detecting worms by signature

matching. In particular, these detection systems have a database of distinctive patterns

(signatures) of malicious code for which they scan in possibly-infected systems. This approach is fast and, until recently, quite effective in detecting known worms. However, it is

not effective in detecting new worms, as they have signatures unknown to these detection systems during the worms’ early propagation stage. Furthermore, worm writers can

use the clear worm signatures generated or used by these detection systems to change the

signatures in order to evade detection. For example, worms such as MetaPHOR [79] and

Zmist [45] intensively metamorphose to hide themselves from detection, thereby illustrat-

ing the feasibility and the efficiency of mutation techniques. Recent data show that current

commercial worm scanners can be easily circumvented by the use of simple mutation tech-

niques [29, 30].

Since attackers always want to hide their malicious actions, they do not make their at-

tack source code publicly available. However, the attack executables are publicly available

after the attacks are captured. Unlike classical host-based detection, our intention is to

use a large number of real-world worm executables and further develop a generic detec-

tion scheme to detect new worms. For this purpose, dynamic program analysis plays an

important role and is introduced in the following subsection.

5.2.2 Program Analysis

Unlike static program analysis, dynamic program analysis does not require the exe- cutable’s source code, but dynamic analysis must be performed by executing the program

[42, 72]. Most dynamic program analysis methods, such as debugging, simulation, binary instrumentation, execution tracing, and stack status tracking, are primarily used for software-engineering and compiler-optimization purposes. Recently, interest in dynamic program analysis has arisen for vulnerability and “security hole” detection purposes. However, some dynamic-analysis approaches are only suitable for analyzing individual executables with human expertise, such as debugging, or only fit specific attacks

[44, 100]. For our work, we need an appropriate dynamic program analysis method to investigate the run-time behavior of worm and benign executables to detect worms. The method we adopt here is to trace system calls during the program execution, which is a type of execution tracing. In particular, we trace the operating system calls invoked by the programs during their execution. This method can be used to automatically record interest- ing information during execution to further investigate executables’ behavior in the course of worm detection.

5.2.3 Data Mining

Data mining refers to the process of extracting “knowledge,” or meaningful and useful information, from large volumes of data [40, 54]. This is achieved by analyzing data from different perspectives to find inherent hidden patterns, models, relationships, or any other information that can be applied to new datasets. It includes algorithms for classification, clustering, association-rule mining, pattern recognition, regression, and prediction, among others. Data-mining algorithms and tools are widely adopted in a range of applications as well as in the computer-security field. In particular, various data-mining technologies are adopted in different threat-detection approaches as described in Section 5.7. In our work,

we use classification algorithms to differentiate between worm and benign program execution in order to provide accurate worm detection against both seen and unseen worms.

5.3 Polymorphic Worms

Although numerous efforts have been made to detect active worms, evolved worms are using metamorphism to circumvent existing worm detection schemes. While most existing host-based worm detection algorithms use the signatures of seen worms to determine whether an encountered executable is a worm, polymorphic worms attempt to change their binary representations or signatures during propagation, so that they always appear unseen to worm detectors and are thus able to evade detection.

In fact, the above polymorphic techniques are not new in viruses [79, 91, 117]. Recently, active worms have also shown a trend toward utilizing them [63]. Furthermore, technologies for mutating worm code have become publicly available, even as open-source toolkits or libraries

[37, 65, 107]. Attackers can easily use them to make their worms polymorphic and hard to detect with signature-based worm detection. The use of automatic encryption and decryption further makes worm polymorphism more feasible and efficient.

Polymorphic worms are dangerous and potent evolving widespread Internet attacks, which pose a serious threat to the Internet due to their effectiveness in evading existing host-based worm detection. The worm detection approach we propose in this chapter aims to address this threat by using the dynamic properties of executables, instead of static signatures, to capture worm executables. We do not use the binary representation as the feature to distinguish worms from benign executables; thus, the mutation techniques used by polymorphic worms have no impact on our worm detection approach. As shown in Section 5.5, our

dynamic program analysis based approach is effective in detecting unseen worms, including brand new worms and polymorphic worms.

5.4 Worm Detection via Mining Dynamic Program Execution

5.4.1 Framework Overview

Recall that the focus of this chapter is to use a large number of real-world worm exe-

cutables and subsequently develop an approach to detect new worms. In this section, we

introduce the framework of our system for dynamic program analysis that detects worm

executables based on mining system-call traces of a large amount of real-world worm and

benign executables. In general, this mining process is referred to as the off-line classifier

learning process. Its purpose is to learn (or train) a generic classifier that can be used to distinguish worm executables from benign ones based on system-call traces. Then we use the learned classifier with appropriate classification algorithms to determine, with high accuracy, whether unknown executables belong to the worm class or the benign class. This process is referred to as the on-line worm detection process. The basic workflow is illustrated in Fig. 5.1 and Fig. 5.2 and is explained below.

(1) Collect executables as data source → (2) Collect dataset by tracing system calls → (3) Extract feature from system call trace → (4) Learn the classifier

Figure 5.1: Workflow of the off-line classifier learning

(1) Trace system call of a new executable → (2) Extract feature from its system call trace → (3) Classify the executable with learned classifier

Figure 5.2: Workflow of the on-line worm detection

Off-line Classifier Learning

1. Data Source Preparation

Before we can begin dynamic program analysis and profile the behavior of worm

and benign executables, we need to collect a large number of such executables as

the data source for our study. These executables are labeled into two classes: worm

executables and benign executables. The worms are obtained from the Web site VX

Heavens (http://vx.netlux.org).

2. Collection Dataset - Dynamic Properties of Executables

With the prepared data source, we discuss how to collect the dataset, which we refer

to as the dynamic properties of executables. Recall that in order to accurately distin-

guish worm executables from benign ones, we need to collect data that can capture

the fundamental behavior differences between them—the dynamic properties. One

feasible and efficient method we choose is to run the executables and trace the run-

time system-call sequences during their execution. However, executing worms might

damage the host operating systems or even the computer hardware. In order to solve

this problem in our experiments, we set up virtual machines as the testbed. Then we

launch each executable in our data source and record its system-call trace during the

execution on the virtual machine. We refer to the collection of the system-call traces

for each executable in our data source as the dataset. We split the dataset into two

122 parts: the training set and the test set. With the training set, we will apply classifica-

tion learning algorithms to learn the classifier. The concrete format and content of the

classifier is determined by the learning algorithms adopted. With the test set, we will

further evaluate the accuracy of the learned classifier with respect to the classification

of new and unidentified executables.

3. Feature Extraction

With the collection dataset comprising system-call traces of different executables, we

extract all the system-call sequence segments with a certain length. These segments

are referred to as n-grams, where n is the length of the sequence, i.e., the number of

system calls in one segment. These n-grams can map to relatively independent and

meaningful actions taken during the program execution, or the executables’ program

blocks. We intend to use these n-grams to capture the behaviors of common worms

and benign executables. Hence these n-grams are the features for classifying worms

and benign executables, and each distinct n-gram represents a particular feature in

our classification.

4. Classifier Learning

From the features we extract from the training dataset, we need to learn a classifier

to distinguish between worms and benign executables. When we select the clas-

sification algorithm, we need to consider the learned classifier’s accuracy as well

as its interpretability. Some classifiers are easy to interpret and the classification

(i.e., the decision rule of worm detection) can be easily extracted from the classifier

[106]. Then worm writers can use the rules to change their worms’ behavior and

consequently evade detection, similar to self-mutating worms that metamorphose

to defeat signature-based detection [20]. Thus, we need classifiers with very low

interpretability. In our case, we consider two algorithms, the Naive Bayes-based

algorithm and the Support Vector Machine (SVM) algorithm, and compare their per-

formance. While the Naive Bayes-based algorithm is simple and efficient in classifier

learning, SVM is more accurate. More importantly, SVM learns a black-box classi-

fier that is hard for worm writers to interpret.
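As a rough illustration of the classifier-learning step, the following is a tiny Bernoulli Naive Bayes over n-gram presence features. It is a simplified stand-in for the Naive Bayes variant discussed above (the SVM alternative is not shown); the class name, the Laplace smoothing choice, and the toy n-grams in the usage are all ours.

```python
# Minimal Bernoulli Naive Bayes over n-gram presence features. Illustrative
# stand-in for the chapter's classifier-learning step, not its exact algorithm.
import math

class NGramNaiveBayes:
    def fit(self, traces, labels):            # traces: list of sets of n-grams
        self.classes = sorted(set(labels))
        self.prior, self.cond = {}, {}
        self.vocab = set().union(*traces)     # every n-gram seen in training
        for c in self.classes:
            docs = [t for t, y in zip(traces, labels) if y == c]
            self.prior[c] = len(docs) / len(traces)
            # Laplace-smoothed probability that each n-gram appears in class c
            self.cond[c] = {g: (sum(g in d for d in docs) + 1) / (len(docs) + 2)
                            for g in self.vocab}
        return self

    def predict(self, trace):                 # trace: set of n-grams
        def log_post(c):
            s = math.log(self.prior[c])
            for g in self.vocab:
                p = self.cond[c][g]
                s += math.log(p) if g in trace else math.log(1 - p)
            return s
        return max(self.classes, key=log_post)
```

In practice, an SVM trained on the same presence vectors yields a less interpretable (and, per the comparison below, more accurate) decision boundary, which is harder for worm writers to reverse-engineer.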

On-line Worm Detection

Having learned the classifier in the off-line process, we now describe how it is used to carry out on-line worm detection. In this process, we intend to automatically detect a new and unseen executable. In particular, we follow the same procedure as in the off-line pro- cess, in which system-call traces of an unknown executable are recorded and classification features (i.e., system-call sequence segments with certain lengths) are extracted during its execution. Then the classification algorithm is applied with the learned classifier to classify the new executable as a worm or a benign program.

The aforementioned worm detection depends on the accuracy of the classifier. In order to evaluate it, we use the learned classifier to classify the executables in the test set. Since we already know the class labels of these executables, we can simply compare the classification results from the learned classifier with the known labels. In this way, the accuracy of our classifier can be measured.

In the following sections, we present the major steps above, i.e., dataset collection, feature extraction, classifier learning, and on-line worm detection, in detail, followed by experimental results.

5.4.2 Dataset Collection

In this section, we present the details on how we obtain the dataset, i.e., the dynamic program properties of executables in the form of system call traces.

Worm Execution with Virtual Machine

In order to study the run-time behavior of worms and benign executables for worm detection, we need to execute the benign executables as well as the worms. However, worms might damage the operating system and even the hardware of the hosts. In order to solve this problem, we set up virtual machines (VMs) [55, 80] as the testbed. The VM we choose is VMware [55].

Even with VMs, two difficulties can still arise during data collection because of the worm execution. First, since worms can crash the operating system (OS) in the VM, we might have to repeatedly re-install the OS. In order to avoid these tedious re-installations, we install all necessary software for our experiments and store all our worm executables on the VM, and then save the image file for that VM. Whenever the VM OS crashes, we can clone the identical VM from the image file to continue our experiment. Second, it is difficult to obtain the system-call traces from the VM after it crashes. In order to solve this problem, we set the physical machine on which a VM is installed as the network neighbor of the VM through the virtual network. Thus, during worm execution, the VM automatically outputs the system-call trace to the physical machine. Although the physical machine can be attacked by the worms on the VM because of this virtual network, we protect the physical machine with anti-virus and other security software and impose very restrictive access controls.

System-Call Trace

Recall that we choose dynamic properties of executables to capture their behavior and, more accurately, distinguish worms from benign executables. There are multiple dynamic program analysis methods [42, 72] that can be used to investigate the dynamic properties of executables. The most popular methods are debugging and simulation. However, they must be used manually with human expertise to study program behavior. In our case, their human-intervention requirement makes them unsuitable for automatic analysis. Still, execution tracing is a good method for automatic analysis, as it can automatically record run-time behavior of executables. In addition, it is easy to analyze the system-call trace using automatic analysis algorithms.

There are several different ways to carry out execution tracing. In our case, we choose to trace system calls of worm and benign executables and use the traces to perform classification (and hence worm detection). The reasons for doing this are straightforward. Tracing all Microsoft Windows Application Programming Interface (API) functions can capture more details about the run-time behavior of executables. However, in comparison with tracing only system calls, API tracing increases OS resource consumption and interferes with the execution of other programs. This is because there are far fewer system calls (311 for all Windows versions together [73], 293 for the Linux 2.6 kernel [56]) than there are

APIs (over 76,000 for Windows versions before Vista [119] and over 1,000 for Linux [98]).

Hence, we choose to trace only system calls to facilitate “light-weight” worm detection.

5.4.3 Feature Extraction

Features are key elements of any anomaly-based detection or classification. In this section, we describe the method to extract and process the features that are used to learn the classifier and carry out worm detection.

N-gram from System-Call Trace

System-call traces of executables are the system-call sequences (time series) of the execution, which contain temporal information about the program execution and thus the respective dynamic behavior information. In our system, we need to extract appropriate features that can capture common or similar temporal information “hidden” in the system-call sequences of all worm executables, which is different from the temporal information hidden in the system-call sequences of all benign executables.

The n-gram is a well-accepted and frequently adopted temporal feature in various areas of (statistical) natural language processing and genetic sequence analysis [69]. It also fits our temporal analysis requirement. An n-gram is a subsequence of n items from a given sequence. For example, if a system-call sequence is {NtReplyWaitReceivePortEx, NtOpenKey, NtReadVirtualMemory, NtCreateEvent, NtQuerySystemInformation}, then the 3-grams from this sequence are {NtReplyWaitReceivePortEx, NtOpenKey, NtReadVirtualMemory}, {NtOpenKey, NtReadVirtualMemory, NtCreateEvent}, and {NtReadVirtualMemory, NtCreateEvent, NtQuerySystemInformation}.
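As an illustrative sketch (a hypothetical Python helper, not part of the system described in this chapter), the n-grams of a system-call trace can be extracted with a sliding window:

```python
def extract_ngrams(trace, n):
    """Slide a window of length n over the trace and return each
    n-gram as a tuple of n consecutive system-call names."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

trace = ["NtReplyWaitReceivePortEx", "NtOpenKey", "NtReadVirtualMemory",
         "NtCreateEvent", "NtQuerySystemInformation"]
grams = extract_ngrams(trace, 3)
# A trace of length 5 yields 5 - 3 + 1 = 3 three-grams, matching the
# example above.
```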

We use n-grams as the features in our system for the following reasons. Imagine the difference between one line of source code and one block of source code in a program. The line of code provides little meaningful information about a program, but the block of code usually represents a meaningful and self-contained small task in a program, which is the logical unit of programming. Similarly, one system call only provides very limited information about the behavior of an executable, whereas a segment of system calls might represent a meaningful and self-contained action taken during the program execution. Worm and benign executables have different behaviors, and this difference can be represented as the difference between their source-code blocks, or the segments (i.e., n-grams) of their system calls. Hence, we use these system-call segments, or the n-grams, as the features to classify worm and benign executables, which proves to be very effective throughout our experiments as described in Section 5.5.

Length of N-gram

A natural question is: what n-gram length is best for distinguishing worms from benign executables? On one hand, in order to capture the dynamic program behavior, n should be greater than 1. Otherwise, the extracted 1-gram list is simply the list of system calls invoked by the executables. This special case is the same as the method used by static program analysis to detect worms, which has no dynamic run-time information about executables.

On the other hand, n should not be very large, for the following two reasons. First, if n is too large, it is very unlikely that we will find common or similar n-grams among different worm executables. In one extreme case, when n becomes very large, the n-grams are no longer small tasks. Instead, they encompass the entire execution of the programs. Because different worms cannot have the exact same sequence of system-call invocations (otherwise they would be the same worm), the classifier learning algorithms cannot find a common feature (i.e., the same system-call invocations) among them, and the algorithms cannot be used to define a class in which all the worms are included. In this case, the classification will not work. Second, if n is too large, the number of possible distinct n-grams (311^n for MS Windows, as Windows has 311 system calls; 293^n for Linux, which has 293) will be too large to be analyzed in practice. We investigate the impact of n-gram length on worm detection in our experiments and report the results in Section 5.5.

5.4.4 Classifier Learning and Worm Detection

In this section, we describe the details of the last step in the off-line classifier learning process (i.e., how to apply the classifier learning algorithm to learn the classifier after extracting the features). In particular, we use two classification algorithms: the Naive Bayes algorithm, which is a simple but popular learning algorithm, and the Support Vector Machine (SVM) algorithm, which is a more powerful but more computationally expensive learning algorithm. We also discuss how to conduct on-line worm detection with each of the algorithms in detail.

Naive Bayes based Classification and Worm Detection

The Naive Bayes classifier (also known as the Simple Bayes classifier) is a simple probabilistic classifier based on applying Bayes’ Theorem [54]. In spite of its naive design, the Naive Bayes classifier may perform better than more sophisticated classifiers in some cases, and it can be trained very efficiently with a labeled training dataset. Nevertheless, in order to use the Naive Bayes classifier, one must make the assumption that the features used in the classification occur independently.

In our case, we use the Naive Bayes classifier to calculate the likelihood that an executable is a worm executable (i.e., in the worm class) and the likelihood that it is a benign executable (i.e., in the benign class). Then, based on which of the two classes has the larger likelihood, the detection decision is made.

1. Off-line Classifier Learning

We represent each executable by an m-dimensional feature vector, X = (x1, x2, . . . , xm), where m is the number of distinct n-grams in the dataset and xi (i = 1, . . . , m) corresponds to the i-th distinct n-gram: xi = 1 if that n-gram appears in the executable's system-call trace and xi = 0 otherwise. We have two classes: the worm class Cw and the benign class Cb. Given the feature vector X of an unknown executable, we need to predict which class X belongs to. The prediction is done as follows. First, for each class, we calculate the likelihood that the executable belongs to that class. Second, we make a decision based on the larger likelihood value, i.e., the executable belongs to the class that has the larger likelihood.
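The binary feature vector X can be built from a trace as sketched below (a hypothetical helper; in practice the n-gram vocabulary would be collected from the whole training set):

```python
def build_feature_vector(trace, vocabulary, n):
    """vocabulary: ordered list of all distinct n-grams in the dataset.
    Returns X with x_i = 1 iff the i-th n-gram occurs in the trace."""
    present = {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}
    return [1 if gram in present else 0 for gram in vocabulary]

vocab = [("NtOpenKey", "NtReadVirtualMemory"),
         ("NtCreateEvent", "NtClose"),
         ("NtReadVirtualMemory", "NtCreateEvent")]
x = build_feature_vector(
    ["NtOpenKey", "NtReadVirtualMemory", "NtCreateEvent"], vocab, 2)
# The first and third vocabulary 2-grams occur in the trace,
# the second does not, so X = [1, 0, 1].
```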

The off-line “classifier” learning process of the Naive Bayes algorithm is the preparation for the calculation of the above two likelihoods. In particular, this preparation is the calculation of some statistical probabilities based on the training data: the conditional probability of each n-gram, say xi, given each class, Cw and Cb. Hence, the off-line “classifier” learning process in our Naive Bayes classification is actually the calculation of P(xi|Cj), i = 1, . . . , m, j = w or b, based on the training dataset.14

2. On-line Worm Detection

During the on-line worm detection, for each unknown executable, the feature vector X for that executable is built first. Then we predict that X belongs to the class that has a higher posterior probability, conditioned on X. That is, the Naive Bayes classifier

14In some implementations, the classifier learning based on the Naive Bayes algorithm may conduct extra procedures, such as selection of features and cross-validation, but they are not the core procedures for the Naive Bayes algorithm.

assigns an unknown sample X to the class Cj if and only if

P(Cj|X) > P(Ck|X), where j, k = w or b, j ≠ k. (5.1)

Based on Bayes’ Theorem, P (Cj|X) can be calculated by

P(Cj|X) = P(X|Cj)P(Cj) / P(X). (5.2)

In order to predict the class of X, we will calculate P(X|Cj)P(Cj) for j = w or b and consequently compare P(Cw|X) to P(Cb|X). Now we discuss how to calculate P(X|Cj)P(Cj). First, if the class prior probabilities P(Cw) and P(Cb) are unknown, then it is commonly assumed that the classes are equally likely, i.e., P(Cw) = P(Cb). Otherwise, P(Cj) can be estimated by the proportion of class Cj in the dataset. Second, as we assume the features are independent, P(X|Cj) can be calculated by

P(X|Cj) = ∏_{i=1}^{m} P(xi|Cj), (5.3)

where P (xi|Cj) can be calculated during the off-line classifier learning process.
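The computation in Equations (5.1)–(5.3) can be sketched as follows; the toy feature vectors and the add-one (Laplace) smoothing are illustrative assumptions, not part of the system described above:

```python
import math

def train_naive_bayes(vectors, labels):
    """Estimate the class priors P(C_j) and the conditional
    probabilities P(x_i = 1 | C_j) with add-one smoothing."""
    model = {}
    m = len(vectors[0])
    for c in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        probs = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                 for i in range(m)]
        model[c] = (len(rows) / len(vectors), probs)
    return model

def classify(model, x):
    """Pick the class maximizing log P(C_j) + sum_i log P(x_i | C_j)."""
    def score(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if xi else 1 - p) for xi, p in zip(x, probs))
    return max(model, key=score)

# Toy data: x_k = 1 iff the k-th n-gram appears in the trace.
worms = [[1, 1, 0], [1, 1, 1]]
benign = [[0, 0, 1], [0, 1, 1]]
model = train_naive_bayes(worms + benign, ["worm"] * 2 + ["benign"] * 2)
```

Working in log-space avoids numerical underflow when m is large, as it is for our n-gram feature vectors.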

3. Discussion

The Naive Bayes classifier is effective and efficient in many applications. The theoretical time complexity for learning a Naive Bayes classifier is O(Nd), where N is the number of training examples and d is the dimensionality of the feature vectors. The complexity of classification for an unknown example (an unknown executable in our case) is only O(d).

However, the Naive Bayes classifier has two limitations in our case. First, worm writers can use it to make worm detection less effective for new worms. In our approach, the classifier includes a set of probabilities that the n-grams appear in each class. Worm writers can directly use this information to make new worms similar to benign executables by either using or avoiding certain n-grams (system-call sequences). Second, the high accuracy of the Naive Bayes classifier rests on the assumption that the features are independent of each other. However, in reality, the n-grams in the system-call trace of an executable may not be independent. In order to address these problems of the Naive Bayes classifier, we use the Support Vector Machine (SVM) in our worm detection, as described in the following subsection.

Support Vector Machines based Classification and Worm Detection

The Support Vector Machine (SVM) is a type of learning machine based on statistical learning theories [121]. SVM-based classification includes two processes: classifier learning and classification. Classifier learning is the learning of a classifier or model using the training dataset. The learned classifier is used to determine or predict the class “label” of instances that are not contained in the training dataset. The SVM is a sophisticated and accurate classification algorithm; it is computationally expensive, and its trained classifier is difficult to interpret. This outstanding accuracy and low interpretability match our two requirements: accurate worm detection, and difficulty of interpretation for worm writers.

1. Off-line Classifier Learning

A typical SVM classifier-learning problem is to label (classify) N training data samples {x1, . . . , xN} into positive and negative classes,15 where xi ∈ R^d (i = 1, . . . , N) and d is the dimensionality of the samples. Thus, the classification result is {(x1, y1), . . . , (xN, yN)}, yi ∈ {−1, +1}. In our case, xi is the feature vector built for the i-th executable in our dataset. That is, xi = {xi,1, . . . , xi,d}, where d is the number of distinct n-grams, xi,j (j = 1, . . . , d) corresponds to the j-th n-gram, and xi,j = 1 if that n-gram appears in the i-th executable's system-call trace and xi,j = 0 otherwise. yi = −1 means that xi belongs to the worm class and yi = +1 means that xi belongs to the benign-executable class. As we have a large number of features (n-grams), the dimensionality of the Euclidean space in our classification problem is very large (upper bounded by 311^n, depending on the n-gram length n).

15The SVM algorithm can be extended to classification for more than two classes, but two classes are the typical and basic case. Our problem is a two-class classification problem.

There are two cases for the SVM classifier-learning problem: (1) the samples in the two classes are linearly separable; (2) the samples in the two classes are not linearly separable. Case (2) holds for most real-world problems. In the SVM, in order to achieve an optimal classifier, the non-linearly separable problem of case (2) must first be transformed into a linearly separable problem as in case (1). Then the optimal classifier can be learned through linear optimization [121, 122]. In the following, we first present the algorithm for the simple case (case (1)), followed by the algorithm for case (2).

1) Classes are linearly separable

If the two classes are linearly separable, then we can find a hyperplane to separate the examples in two classes as shown in the right side of Fig. 5.3. Examples that belong to different classes should be located on different sides of the hyperplane. The intent of the classifier learning process is to obtain a hyperplane which can maximally separate the two classes.

Mathematically, if the two classes are linearly separable, then we can find a hyperplane w · x + b = 0, with a vector w and an intercept b, that satisfies the following constraints:

w · xi + b ≥ +1 for yi = +1 and (5.4)

w · xi + b ≤ −1 for yi = −1, (5.5)

or, equivalently,

yi(w · xi + b) − 1 ≥ 0 ∀i. (5.6)

Training examples that satisfy the above constraints with equality are referred to as support vectors. The support vectors define two hyperplanes: one goes through the support vectors of the positive class, and the other goes through the support vectors of the negative class. The distance between these two hyperplanes defines a margin, and this margin is maximized when the norm of the vector w, ‖w‖, is minimized.

When the margin is maximized, the hyperplane w · x + b = 0 separates the two classes maximally, which is the optimal classifier in the SVM algorithm. The dual form of Equation (5.6) reveals that the above optimization actually maximizes the following function:

W(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj (xi · xj) yi yj, (5.7)

subject to the constraint that αi ≥ 0. The SVM algorithm achieves the optimal classifier by finding αi ≥ 0 for each training sample xi so as to maximize W(α).

2) Classes are not linearly separable

In the above case, the optimal classifier for linearly separable classes can be found through linear optimization. However, most real-world classification problems cannot be solved by the linear optimization algorithm. This case is illustrated on the left side of Fig. 5.3, in which there is no linear hyperplane (here, a straight line in 2-dimensional space) that can separate the examples of the two classes (shown with different colors). In other words, the required classifier must be a curve, which is difficult to optimize.


Figure 5.3: Basic idea of kernel function in SVM.

The SVM provides a solution to this problem by transforming the original feature space into some other, potentially high-dimensional, Euclidean space. The mapped examples in the training set can then be linearly separable in the new space, as demonstrated by the right side of Fig. 5.3. This space transformation in Equation (5.7) can be implemented by a kernel function,

K(xi, xj) = Φ(xi) · Φ(xj), (5.8)

where Φ is the mapping from the original feature space to the new Euclidean space. We only need to use K in the classifier training process with Equation (5.7), and never need to know explicitly what Φ is. The SVM kernel function can be linear or non-linear. Common non-linear kernel functions include the

Polynomial Function, Radial Basis Function (RBF), and Sigmoid Function, among

others.

2. On-line Worm Detection

On-line worm detection is the classification of new executables using the SVM classification algorithm along with the optimal SVM classifier learned during the previously-discussed off-line learning process.

For an unknown executable (a worm or benign executable), its feature vector xk must be built first. The method is the same as the aforementioned process on the executables in the training set, i.e., the system-call trace during the execution is recorded, then the n-grams with a certain value of n are extracted. Afterwards, the feature vector xk is formed from the trace of the executable using the same method as in the off-line classifier learning process.

Recall that during the classifier learning process, the optimal hyperplane is found. Then, for a new example xk, shown as the white circle in Fig. 5.3, the on-line classification checks on which side of the optimal hyperplane xk lies. Mathematically, the classification is conducted by assigning a class to the executable via

C(xk) = sign(w · xk + b), (5.9)

where

w = Σ_{i=1}^{N} αi yi xi. (5.10)

If C(xk) is negative (recall that yi = −1 labels the worm class), we predict that the executable is a worm. Otherwise, we predict that it is benign.
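A minimal sketch of the decision in Equations (5.9) and (5.10), assuming a hypothetical already-trained classifier (the toy support vectors, multipliers αi, labels yi, and intercept b below are illustrative, not real trained values); per the labeling above, yi = −1 marks the worm class:

```python
def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def svm_decision(x, support_vectors, alphas, ys, b, kernel=linear_kernel):
    """Evaluate sign(sum_i alpha_i y_i K(x_i, x) + b); with y = -1
    labeling the worm class, a negative value predicts a worm."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, ys, support_vectors)) + b
    return 1 if s >= 0 else -1

# Toy 1-D classifier: a worm-class support vector at 0.0 and a
# benign-class support vector at 2.0; b places the boundary at x = 1.
svs, alphas, ys, b = [[0.0], [2.0]], [1.0, 1.0], [-1, 1], -2.0
```

Writing the decision in terms of K(xi, x) rather than an explicit w is what allows a non-linear kernel to be substituted without changing the detection code.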

3. Complexity of SVM

The classifier learning process of SVM is relatively time-consuming because of the large volume of the training set, the high dimensionality of our feature space, and the complexity of classifier calculation and optimization. No matter which kernel function is used, for N training examples with feature vectors of dimensionality d and NS support vectors, the SVM classifier learning algorithm has complexity O(NS^3 + NS^2 N + NS dN). However, the SVM classification process for each new executable is fast and involves only limited calculation. Its complexity is O(M NS), where M is the complexity of the kernel function operation. For Radial Basis Function (RBF) kernel functions, M is O(d).

4. Black Box Characteristics of the SVM Classifier

The classifier learned by the SVM can be easily used to carry out worm detection. However, the SVM classifier is hard to interpret. The SVM classifier learning algorithm generates black-box models (classifiers) in the sense that they cannot explain their decisions in an understandable form [93, 16, 46]. Thus, from the SVM classifier, it is hard to extract decision rules comprehensible in the original problem domain, especially for the non-linear SVM, due to the feature-space transformation introduced by kernel functions.

The above characteristic of the SVM is a well-known limitation for applications in which one needs decision rules that can be mapped back to the physical entities in the original problem domain. However, this characteristic helps us prevent worm writers from interpreting and learning from the classifier. We want to prevent a worm writer from obtaining the signature of his worms or of any benign executable. Otherwise, the worm writer could disguise new worms as benign executables accordingly.

Besides the optimization algorithm used in the SVM, the learned classifier also depends on the definition of the input feature space, the selection of the kernel function, the parameters of the kernel function, etc., all of which are unknown to worm writers. The worm writer does not know:

• the value of n of the n-gram used in the classifier,

• the mapping between n-grams and feature indices in the feature vector,

• the definition of the kernel function,

• the parameters of the kernel function,

• the space transformation introduced by the kernel function.

Hence, even if the worm writer knows that we use the SVM and is able to obtain the classifier, it is hard for him to interpret the classifier and discover the decision rules we use to distinguish between worms and benign executables. Thus, it is hard for him to change the worm behavior accordingly to evade our detection. Furthermore, we can protect the classifier with mechanisms such as encryption.

5.5 Experiments

In this section, we first present our experimental setup and metrics, and then we report

the results of our experiments.

5.5.1 Experiment Setup and Metrics

In our experiments, we use 722 benign executables and 1589 worms in Microsoft Windows or DOS Portable Executable (PE) format as the data source, though our approach works for worm detection on other operating systems as well. We use this data source to learn the generic worm classifier and further evaluate the trained classifier to detect worms. The executables are divided into two classes: worm and benign executables. The worms are obtained from the Web site VX Heavens (http://vx.netlux.org); they include e-mail worms, peer-to-peer (P2P) worms, Instant Messenger (IM) worms, Internet Relay Chat (IRC) worms, and other, non-classified worms. The benign executables in our experiments include Microsoft software, commercial software from other companies, and free, “open source” software. This diversity of executables enables us to comprehensively learn classifiers that capture the behavior of both worm and benign executables. We use 80% of each class (worm and benign) as the training set to learn the classifiers. We use the remaining 20% as the test set to evaluate the accuracy of the classifiers, i.e., the performance of our detection approach.

We install MS Windows 2000 Professional with Service Pack 4 on our virtual machines (VMs). On these VMs, we launch each collected executable and use strace for Windows NT [33] to trace its system calls for 10 seconds.16 From the trace file of each executable, we extract the system-call name sequence in temporal order. Then we obtain the segments of system calls (i.e., the n-grams), given different values of n, for each executable. Afterwards, we build the vector inputs for the classification learning algorithms.

16We launch the executables in the dataset for a longer time and then use a sliding window to capture traces of a certain length for the classifier training. We found that a 10-second trace suffices to provide high detection accuracy.

Recall that the classification in our worm detection problem is in a high-dimensional space. There are a large number of dimensions and features that cannot be handled, or cannot be handled efficiently, by many data-mining tools. We choose the following data-mining tools: the Naive Bayes classification tools from the University of Magdeburg in Germany [28] and SVMlight [57]. Both tools are implemented in the C language and thus have efficient performance, especially for high-dimensional classification problems. When we apply the SVM algorithm with SVMlight, we choose the Gaussian Radial Basis Function (Gaussian RBF), which has been proven to be an effective kernel function when the feature distribution is Gaussian [54]. The Gaussian RBF is of the form

K(xi, xj) = e^(−γ ‖xi − xj‖²), (5.11)

which means Equation (5.8) is replaced by Equation (5.11) in the classifier learning process and the on-line worm detection process. The value of γ is optimized through experiments and comparison.
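Evaluated directly, Equation (5.11) is only a few lines of code (a generic sketch of the kernel computation, independent of the particular SVM tool used in our experiments):

```python
import math

def rbf_kernel(u, v, gamma):
    """Gaussian RBF: K(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

# K(x, x) = 1 for any x, and K decays toward 0 as the two
# feature vectors move farther apart.
```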

In order to evaluate the performance of our classification for new worm detection, we use two metrics: Detection Rate (PD) and False Positive Rate (PF). In particular, the detection rate is defined as the probability that a worm is correctly classified. The false positive rate is defined as the probability that a benign executable is mistakenly classified as a worm.
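Concretely, given a labeled test set, the two metrics can be computed as follows (a hypothetical helper with illustrative counts, not our experimental data):

```python
def detection_metrics(true_labels, predicted_labels):
    """P_D = fraction of worms classified as worms;
       P_F = fraction of benign executables classified as worms."""
    pairs = list(zip(true_labels, predicted_labels))
    worms = [(t, p) for t, p in pairs if t == "worm"]
    benign = [(t, p) for t, p in pairs if t == "benign"]
    p_d = sum(1 for _, p in worms if p == "worm") / len(worms)
    p_f = sum(1 for _, p in benign if p == "worm") / len(benign)
    return p_d, p_f

truth = ["worm", "worm", "worm", "worm",
         "benign", "benign", "benign", "benign"]
pred = ["worm", "worm", "worm", "benign",
        "benign", "worm", "benign", "benign"]
p_d, p_f = detection_metrics(truth, pred)
# 3 of 4 worms detected (P_D = 0.75); 1 of 4 benign flagged (P_F = 0.25).
```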

5.5.2 Experiment Results

In this subsection, we report the performance of our worm detection approaches. The results of the Naive Bayes- and SVM-based worm detection in terms of Detection Rate and False Positive Rate under different n-gram lengths (n) are shown in Tables 5.1 and 5.2, respectively.

n-gram length              1       2       3       4       5       6
Detection Rate (PD)        69.8%   81.4%   85.0%   90.9%   93.6%   96.4%
False Positive Rate (PF)   33.2%   18.6%   11.5%   8.89%   6.67%   6.67%

Table 5.1: Detection results for the Naive Bayes based detection

n-gram length              1       2       3       4       5       6
Detection Rate (PD)        89.7%   96.0%   98.75%  99.5%   99.5%   99.5%
False Positive Rate (PF)   33.3%   18.75%  7.14%   4.44%   2.22%   2.22%

Table 5.2: Detection results for the SVM based detection

Effectiveness of Our Approaches

We conclude that both our Naive Bayes-based and SVM-based approaches detect worms with a high detection rate and a low false positive rate when the length of the n-grams is reasonably large. For example, when the length of the n-grams is 6, the detection based on the SVM algorithm achieves a 99.5% detection rate and a 2.22% false positive rate, and the detection based on the Naive Bayes algorithm achieves a 96.4% detection rate and a 6.67% false positive rate, respectively.

We also conclude that SVM-based detection performs better than Naive Bayes-based detection in terms of both detection rate and false positive rate. There are two reasons for this. First, the Naive Bayes classification assumes that features are independent, which may not always be the case in reality. Second, the Naive Bayes-based classification calculates the likelihood for classifying a new executable based on the vectors of the training-set executables in the feature space, and then simply predicts the class of the new executable based on the likelihood comparison. In contrast, the SVM attempts to optimize the classifier (hyperplane) by finding the hyperplane that maximally separates the two classes in the training set.

Impacts of N-gram Length

Another important observation is that the length of the n-gram, i.e., the value of n, impacts the detection performance. When n increases from 1 to 4, the performance keeps increasing.

When n further increases beyond that, the performance does not increase or increases only marginally. The reason can be explained as follows. First, when n = 1, each n-gram contains only one system call and thus contains neither dynamic system-call sequence information nor executable behavior information. Actually, this special case is that of static program analysis, which only investigates the list of system calls used by the executables.

Second, when n is larger, the n-grams contain longer system-call sequences, thereby capturing more of the dynamic behavior of the traced executables and hence increasing the detection performance. This also demonstrates that our dynamic program analysis approach outperforms traditional static program analysis-based approaches.

From the above observation on the length of the n-gram, we conclude that a certain n-gram length is sufficiently effective for worm detection. This length (the value of n) can be learned through experiments: when increasing n no longer yields a significant detection performance gain, that value of n is good enough and can be used in practice. This method is actually used in other n-gram based data mining applications. Furthermore, with respect to the efficiency of worm detection, the n value should not be very large, as we discuss in Section 5.4.3.

5.6 Discussions

In this chapter, we develop a worm detection approach that allows us to mine program execution, thereby detecting new and unseen worms. There are a number of possibilities for extending this work. A detailed discussion follows.

1. Classification for Different Worms

First, we discuss how to generalize our approach to support classification of different worms. Recall that our work in this chapter only focuses on distinguishing two classes of executables: worm and benign. In practice, knowledge of the specific types of worms (e.g., e-mail worms, P2P worms) can provide better ways to defend against them. In order to classify different worms, the approach studied in Section 5.4.4 can be extended as follows. First, we collect a training dataset including a large number of worms labeled by their types. Second, we use the same approach discussed in Section 5.4.4 to train the classifiers, which are capable of profiling multiple classes according to the types. Third, the trained classifiers are used to determine the type (class) of an unlabeled new worm.

2. Detection of Smart Worms

Now, we discuss how to extend our work to detect smart worms, or worms that behave “intelligently.” Based on mining the execution of a large number of program executables, our approach has proven effective at detecting new, unseen worms. In preliminary investigations, we even found that our approach detected “smart worms” we created. These worms are “benign” executables that were infected by worms and hence have behaviors characteristic of both worm and benign executables. We use them to simulate smart worms that can camouflage themselves as benign executables. Our initial finding is promising, as we found that our approach can detect these smart worms that also exhibit benign-executable behaviors. This ability of our approach awaits further proof of its efficacy with a larger set of various types of smart worms.

Nevertheless, the detection of smart worms is still an open issue, and the answers to some questions remain unclear. For instance, how intelligent can worms be, i.e., to what extent can worms behave similarly to benign executables while still “spreading” themselves like “normal” worms? While this is an open problem, researchers have begun to discuss the efficiency of worms and detection evasion [96, 143]. Worms can be very intelligent in order to evade detection, but their “hiding” mechanisms may diminish their efficiency. More importantly, so long as they are worms, they must manifest behavior different from that of benign executables. This provides some opportunities for us to further investigate smart worm detection.

3. Integration of Network-based and Host-based Detection

In this chapter, our focus is the study of host-based detection, and we did not consider information about the traffic generated by the executables during worm detection. As we know, a worm executable exposes multiple behaviors, such as generating scan traffic (i.e., messages that intend to identify vulnerable computers) and conducting malicious acts on the infected computers. Since these worm behaviors are exposed from different perspectives, consideration of multiple behaviors could provide more accurate worm detection. In fact, traffic generated by worms can also be classified and used to distinguish them from normal traffic. For instance, the distribution of destination IP addresses in network traffic can provide accurate worm detection through traffic analysis [141]. Hence, one ongoing work is to combine the traffic logs and system calls generated by the worms and benign executables. The integration of traffic and system calls can yield more reliable classifiers to detect worms.

5.7 Related Work

In this section, we review some existing work related to our study, including worm detection and applications of data mining to security research.

Since worm attacks have always been very dangerous threats to the Internet, much effort has gone into studying, analyzing, and modeling worms. For example, Staniford et al. in [114] studied various worms and modeled their propagation using a continuous-time epidemiology model. There has also been extensive work on the propagation of specific worms [86, 148]. These analysis and modeling results help researchers better understand worm behaviors and further develop detection schemes.

As we mentioned, there are two types of worm detection systems: network-based detection and host-based detection. For network-based worm detection, many schemes have been proposed in the literature. For example, payload signature-based detection examines specific byte-sequence segments in the payload of worm scan traffic [112]. Traditionally, these payload signatures are manually identified by security experts through careful analysis of byte sequences from captured network traffic. Some efforts have been made to automatically generate payload signatures [112, 62]. There are other network-based detection schemes based on network traffic analysis. For example, Jung et al. in [59] developed a threshold-based detection algorithm to identify the anomaly of scan traffic generated by a computer. Venkataraman et al. and Weaver et al. in [123, 132] proposed schemes to examine statistics of scan traffic volume, Zou et al. presented a trend-based detection scheme to examine the exponential increase pattern of scan traffic [147], and Lakhina et al. in [67] proposed schemes to examine other features of scan traffic, such as destination-address distribution. There is also other work studying worms that attempt to present new traffic patterns in order to avoid detection [141, 96].

For the host-based detection, many schemes have been proposed in the literature. For example, a binary text scan program was developed to extract the human-readable strings from the binary, which reveal information about the function of the executable binary [1].

Wagner et al. in [124] proposed an approach that analyzes program executables and generates a non-deterministic finite automaton (NDFA) or a non-deterministic pushdown automaton (NDPDA) from the global control-flow graph of the program. The automaton was then used to monitor the program execution on-line. Gao et al. in [47] presented an approach for detecting anomalous behavior of an executing process. The basic idea of their approach is that processes potentially running the same executable should behave similarly in response to a common input. Feng et al. [43] proposed a formal analysis framework for pushdown automata (PDA) models. Based on this framework, they studied program analysis techniques incorporating system calls or stack activities. There are other schemes that detect anomalous behavior of executables through call stack information. For example,

Cowan et al. in [34] proposed a method, called StackGuard, to detect buffer overflow attacks. The difference that distinguishes our work from theirs is that we attempt to capture the common dynamic behavioral features of worms by mining the execution of a large number of worms.

Many articles have examined the use of data mining for security research. Lee et al. in [70] applied machine learning to system call sequences of normal and abnormal executions of the Unix sendmail program. Lee et al. in [71] described a data mining framework for adaptively building intrusion detection models. The main idea of their work is to utilize auditing programs (e.g., network logs of telnet sessions, shell command logs) to extract an extensive set of features that describe each network connection or host session, and to apply data mining techniques to learn rules that capture the behavior of intrusions and normal activities. Martin et al. in [78] proposed an approach that learns statistical patterns of outgoing emails from local hosts. Yang et al. in [136] proposed applying machine learning to automatically fingerprint polymorphic worms, which are capable of changing their appearance across every instance of their executables. Kolter et al. in [64] applied data mining techniques to extract byte sequences directly from program executables, converted these sequences into n-grams, and constructed classifiers. Julisch et al. in [58] proposed an approach to learn from historical alarms generated by intrusion detection systems. In our work, we use data mining to obtain the dynamic behavioral differences between worms and benign executables.
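To make the n-gram idea concrete, the sketch below counts byte n-grams in an executable and maps it onto a fixed-length feature vector, in the spirit of Kolter et al. [64]. The sample bytes and the two-entry vocabulary are hypothetical; a real system selects the vocabulary (e.g., by information gain) over a large training corpus:

```python
from collections import Counter

def byte_ngrams(data, n=4):
    """Count every contiguous n-byte sequence in a binary; across a training
    set, the most informative n-grams become the classifier's features."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def feature_vector(data, vocabulary, n=4):
    """Map one binary onto a fixed-length 0/1 vector over a chosen n-gram vocabulary."""
    grams = byte_ngrams(data, n)
    return [1 if g in grams else 0 for g in vocabulary]

sample = b"\x90\x90\x90\x31\xc0\x50\x68"            # hypothetical code bytes
vocab = [b"\x90\x90\x90\x31", b"\x00\x00\x00\x00"]  # hypothetical vocabulary
assert feature_vector(sample, vocab) == [1, 0]
```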

5.8 Summary

In this chapter, we proposed a new worm detection approach based on mining the dynamic execution of programs. Our approach captures the dynamic behavior of executables to provide efficient and accurate detection of both seen and unseen worms.

Using a large number of real-world worms and benign executables, we ran executables on virtual machines and recorded their system call traces. We applied two data mining classification algorithms to learn classifiers off-line, which are subsequently used to carry out on-line worm detection. Our data clearly showed the effectiveness of our proposed approach in detecting worms, achieving both a very high detection rate and a low false positive rate.
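The off-line learning and on-line detection steps above can be sketched as follows. This is a minimal illustration only: it uses a multinomial Naive Bayes classifier over system-call bigrams, and the toy traces stand in for the real system call traces recorded on virtual machines:

```python
import math
from collections import Counter

def call_ngrams(trace, n=2):
    """Short overlapping windows of a system call trace, used as features."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

class NaiveBayes:
    """Minimal multinomial Naive Bayes over system-call n-grams."""

    def fit(self, traces, labels, n=2):
        self.n = n
        self.counts = {0: Counter(), 1: Counter()}
        self.totals = {0: 0, 1: 0}
        self.priors = {0: 0, 1: 0}
        for trace, y in zip(traces, labels):
            self.priors[y] += 1
            for g in call_ngrams(trace, n):
                self.counts[y][g] += 1
                self.totals[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, trace):
        scores = {}
        for y in (0, 1):
            s = math.log(self.priors[y] / sum(self.priors.values()))
            for g in call_ngrams(trace, self.n):
                # Laplace smoothing so unseen n-grams do not zero out a class
                p = (self.counts[y][g] + 1) / (self.totals[y] + len(self.vocab))
                s += math.log(p)
            scores[y] = s
        return max(scores, key=scores.get)

# Toy traces: label 1 = worm-like (scan/connect loops), 0 = benign file I/O.
benign = [["open", "read", "write", "close"]] * 3
worm = [["socket", "connect", "send", "socket", "connect"]] * 3
clf = NaiveBayes().fit(benign + worm, [0] * 3 + [1] * 3)
assert clf.predict(["socket", "connect", "send"]) == 1
```

In practice the learned classifier is applied on-line to the trace of each running executable, flagging it as worm-like when the worm class scores higher.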

Our proposed approach has the following advantages. It is practical, with low overhead during both classifier learning and run-time detection. It does not rely on investigating individual executables; rather, it examines the common dynamic properties of executables. Therefore, it can automatically detect new worms. Furthermore, our approach attempts to build a “black-box” classifier, which makes it difficult for worm writers to interpret our detection.

CHAPTER 6

CONCLUDING REMARKS

In this dissertation, we studied defense-oriented evolution, a novel evolutionary trend among widespread Internet attacks, in which defense-oriented attacks leverage knowledge of defense systems to circumvent them and increase attack effectiveness. As the most important characteristics of a defense system are its infrastructure and algorithms, we classify these attacks into two groups: infrastructure-oriented attacks and algorithm-oriented attacks.

For infrastructure-oriented widespread Internet attacks, we studied intelligent DDoS

attacks which aim to infer architectural information about the infrastructure of DDoS-defending Secure Overlay Forwarding Systems (SOFSes) to launch more efficient DDoS

attacks. We further provided optimal structural configuration for SOFS systems and guide-

lines to enhance SOFS system performance under intelligent DDoS attacks. Additionally,

we investigated another infrastructure-oriented attack, the invisible LOCalization attack

(the iLOC attack for short). The iLOC attack can accurately and invisibly obtain the monitor locations in Internet Threat Monitoring (ITM) systems, enabling other attacks to evade

disclosed monitors or even to abuse ITM systems to degrade their integrity and functional-

ity.

For algorithm-oriented attacks, we studied Varying Scan Rate (VSR) worms, which

deliberately vary their port scan rate during propagation to defeat existing ITM-based worm

detection schemes. We also designed the attack target Distribution Entropy based dynamiC (DEC) detection scheme to effectively detect VSR and traditional worms. Furthermore, in order to detect new worms, including polymorphic worms (which have different signatures or purposely change signatures in order to evade host-based worm detection), we proposed a new worm detection scheme that mines dynamic program execution.

While there are other forms of evolution among widespread Internet attacks, we believe the defense-oriented ones studied in this dissertation are among the most dangerous, since they deliberately and effectively counteract defense systems. As shown in the dissertation, they are also feasible and potent threats to the Internet. Our purpose is not to encourage attacks, but to obtain deep insights about potential new Internet threats and vulnerabilities within current defense systems, in order to enhance these systems and design new defenses against evolving widespread Internet attacks. We believe that the results in this dissertation lay a foundation for further research in this field.

BIBLIOGRAPHY

[1] Binary Text Scan. http://netninja.com/files/bintxtscan.zip.

[2] Internet Security News. http://www.landfield.com/isn/mail-archive/2001/Feb/0037.html.

[3] Snort, the open-source network intrusion detection system. http://www.snort.org/.

[4] W32/MyDoom.B Virus. http://www.us-cert.gov/cas/techalerts/TA04-028A.html.

[5] W32.Sircam.Worm@mm. http://www.symantec.com/avcenter/venc/data/[email protected].

[6] Worm.ExploreZip. http://www.symantec.com/avcenter/venc/data/worm.explore.zip.html.

[7] Powerful Attack Cripples Internet. Associated Press for Fox News, http://www.foxnews.com/story/0,2933,66438,00.html, October 2002.

[8] R. Agrawal, A. Evfimievski, and R. Srikant. Information sharing across private databases. In Proceedings of the 22-th SIGMOD International Conference on Management of Data, San Diego, CA, July 2003.

[9] R. L. Allen and D. W. Mills. Signal Analysis: Time, Frequency, Scale, and Structure. Wiley and Sons, 2004.

[10] D. Andersen. Mayday: Distributed filtering for internet services. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Seattle, WA, March 2003.

[11] D. Andersen, H. Balakrishnan, M. Kaashoek, and R. Morris. Resilient overlay networks. In Proceedings of 18th ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada, October 2001.

[12] A. Anderson, A. Johnston, and P. McOwan. Motion Illusions and Active Camouflaging. http://www.ucl.ac.uk/ucbplrd/motion/motion middle.html.

[13] R. M. Anderson and R. M. May. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford, 1991.

[14] G. Badishi, I. Keidar, and A. Sasson. Exposing and eliminating vulnerabilities to denial of service attacks in secure gossip-based multicast. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), Florence, Italy, June 2004.

[15] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, and D. Watson. The internet motion sensor: A distributed blackhole monitoring system. In Proceedings of the 12-th IEEE Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, February 2005.

[16] N. Barakat and J. Diederich. Eclectic rule-extraction from support vector machines. In Int. Journal Computational Intelligence, volume 1, pages 59–62, 2005.

[17] M. Bellare, S. Goldwasser, and D. Micciancio. Pseudo-random number generation within cryptographic algorithms: the dss case. In Proceedings of Advances in Cryptology '97, Lecture Notes in Computer Science, Springer-Verlag, May 1997.

[18] J. Bethencourt, J. Franklin, and M. Vernon. Mapping internet sensors with probe response attacks. In Proceedings of the 14-th USENIX Security Symposium, Baltimore, MD, July-August 2005.

[19] J. Blazquez, A. Oliver, and J. M. Gomez-Gomez. Mutation and Evolution of Antibiotic Resistance: Antibiotics as Promoters of Antibiotic Resistance, volume 3. Current Drug Targets, August 2002.

[20] D. Bruschi, L. Martignoni, and M. Monga. Detecting self-mutating malware using control flow graph matching. In Proceedings of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), Berlin, Germany, July 2006.

[21] CAIDA. The Cooperative Association for Internet Data Analysis. http://www.caida.org.

[22] CAIDA. Telescope Analysis. http://www.caida.org/analysis/security/telescope.

[23] CERT. Advisory CA-1995-18 Widespread Attacks on Internet Sites. http://www.cert.org/advisories/CA-1995-18.html.

[24] CERT. CERT/CC advisories. http://www.cert.org/advisories/.

[25] CERT. Advisory CA-2003-20 W32/Blaster worm. http://www.cert.org/advisories/CA-2003-20.html, 2003.

[26] S. Chen and R. Chow. A new perspective in defending against ddos. In Proceedings of 10th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS), Suzhou, China, May 2004.

[27] Z. S. Chen, L.X. Gao, and K. Kwiat. Modeling the spread of active worms. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, March 2003.

[28] B. Christian. Full and Naive Bayes Classifiers. http://fuzzy.cs.uni-magdeburg.de/ borgelt/doc/bayes/bayes.html.

[29] M. Christodorescu and S. Jha. Static analysis of executables to detect malicious patterns. In Proceedings of the 12-th USENIX Security Symposium (SECURITY), Washington, DC, August 2003.

[30] M. Christodorescu and S. Jha. Testing malware detectors. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), Boston, MA, July 2004.

[31] L. Y. Chuang, C. H. Yang, C. H. Yang, and S. L. Lin. An interactive training system for morse code users. In Proceedings of Internet and Multimedia Systems and Applications, Honolulu, Hawaii, August 2002.

[32] M. Ciubotariu. : a conflict starter? Virus Bulletin, http://www.virusbtn.com, 2004.

[33] BindView Corporation. Strace for NT. http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace readme.cfm.

[34] C. Cowan, C. Pu, D. Maier, H. Hinton, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In Proceedings of 7th USENIX Security Symposium (SECURITY), San Antonio, TX, August 1998.

[35] E. J. Crusellers, M. Soriano, and J. L. Melus. Spreading codes generator for wireless cdma network. International Journal of Wireless Personal Communications, 7(1), 1998.

[36] D. E. Denning. An intrusion detection model. IEEE Transactions on Software Engineering, 13(2):222–232, February 1987.

[37] T. Detristan, T. Ulenspiegel, Y. Malcom, and M. Underduk. Polymorphic shellcode engine using spectrum analysis. http://www.phrack.org/, 2003.

[38] Robert Dixon. Spread Spectrum Systems, 2nd Edition. John Wiley & Sons, 1984.

[39] Dshield. Distributed Intrusion Detection System. http://www.dshield.org/.

[40] M. H. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall, 1 edition, 2002.

[41] Nova Engineering. Linear Feedback Shift Register. http://www.sss-mag.com/pdf/lfsr.pdf.

[42] M. Ernst. Static and dynamic analysis: Synergy and duality. Portland, Oregon, May 2003.

[43] H. H. Feng, J. T. Giffin, Y. Huang, S. Jha, W. Lee, and B. P. Miller. Formalizing sensitivity in static analysis for intrusion detection. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2004.

[44] H. H. Feng, O. M. Kolesnikov, P. Fogla, W. Lee, and W. Gong. Anomaly detection using call stack information. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2003.

[45] P. Ferrie and P. Ször. Zmist opportunities. Virus Bulletin, http://www.virusbtn.com.

[46] G. Fung, S. Sandilya, and R. Rao. Rule extraction from linear support vector machines. In Proceedings of the 11-th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, Illinois, August 2005.

[47] D. Gao, M. Reiter, and D. Song. Behavioral distance for intrusion detection. In Proceedings of the Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, September 2005.

[48] M. Garetto, W. B. Gong, and D. Towsley. Modeling malware spreading dynamics. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, March 2003.

[49] D. Goldsmith. Incidents Maillist: Possible Code-Red Connection Attempts. http://lists.jammed.com/incidents/2001/07/0149.html.

[50] J. Gordon. Lessons from virus developers: The beagle worm history through april 24. http://www.securityfocus.com/guest/24228, 2004.

[51] V. S. Grichenko. Modular Worms. http://blogs.plotinka.ru/gritzko/modular.pdf, PC World.

[52] P. Gross, J. Parekh, and G. Kaiser. Secure selecticast for collaborative intrusion detection systems. In Proceedings of the 3-th International Workshop on Distributed Event-based Systems (DEBS), Edinburgh, UK, May 2004.

[53] G. F. Gu, M. I. Sharif, X. Z. Qin, D. Dagon, W. Lee, and G. F. Riley. Worm detection, early warning and response based on local victim information. In Proceedings of the 20-th Annual Computer Security Applications Conference (ACSAC 2004), Tucson, Arizona, December 2004.

[54] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2 edition, 2006.

[55] VMWare Inc. www.vmware.com/virtual-machine.

[56] Operating System Inside. Linux System Call Table. http://osinside.net/syscall/system call table.htm, 2006.

[57] T. Joachims. Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, Massachusetts, 1998.

[58] K. Julisch and M. Dacier. Mining intrusion detection alarms for actionable knowledge. In Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Edmonton, Alberta, July 2002.

[59] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast portscan detection using sequential hypothesis testing. In Proceedings of the 25-th IEEE Symposium on Security and Privacy, Oakland, CA, May 2004.

[60] A. Keromytis, V. Misra, and D. Rubenstein. SOS: Secure overlay services. In Proceedings of ACM SIGCOMM, Pittsburgh, PA, August 2002.

[61] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13-th USENIX Security Symposium, San Diego, CA, August 2004.

[62] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13-th USENIX Security Symposium (SECURITY), San Diego, CA, August 2004.

[63] O. Kolesnikov and W. Lee. Advanced Polymorphic Worms: Evading IDS by Blending in with Normal Traffic. Technical report, Georgia Institute of Technology, 2004.

[64] J. Z. Kolter and M. A. Maloof. Learning to detect malicious executables in the wild. In Proceedings of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Seattle, WA, August 2004.

[65] Ktwo. Admmutate v0.8.4: Shellcode mutation engine. http://www.ktwo.ca /ADMmutate-0.8.4.tar.gz, 2001.

[66] Aleksandar Kuzmanovic and Edward W. Knightly. Low-rate tcp-targeted denial of service attacks (the shrew vs. the mice and elephants). In Proceedings of ACM SIGCOMM, Karlsruhe, Germany, August 2003.

[67] A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of ACM SIGCOMM'05, Philadelphia, PA, August 2005.

[68] E. Larkin. Widespread Internet Attack Cripples Computers with Spyware. http://www.pcworld.com/article/id,120448-page,1/article.html.

[69] K. F. Lee and S. Mahajan. Automatic Speech Recognition: the Development of the SPHINX System. Springer, 1988.

[70] W. Lee, S. Stolfo, and Phil Chan. Learning patterns from unix process execution traces for intrusion detection. In Proceedings of AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, Menlo Park, CA, June 1997.

[71] W. Lee, S. J. Stolfo, and W. Mok. A data mining framework for building intrusion detection models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 1999.

[72] Shengying Li. A Survey on Tools for Binary Code Analysis. Department of Computer Science, Stony Brook University, http://www.cs.sunysb.edu/ lshengyi/papers/rpe/RPE.htm, 2004.

[73] Metasploit LLC. Windows System Call Table. http://www.metasploit.com/users/opcode/syscalls.html.

[74] X. Luo and R. K. C. Chang. On a new class of pulsing denial-of-service attacks and the defense. In Proceedings of 13th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2005.

[75] J. Ma, G. M. Voelker, and S. Savage. Self-stopping worms. In Proceedings of the ACM Workshop on Rapid Malcode (WORM), November 2005.

[76] R. Mahajan, S. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and S. Shenker. Controlling high bandwidth aggregates in the network. In ACM SIGCOMM Computer Communication Review (CCR), July 2002.

[77] M. Mannan and P. C. van Oorschot. Instant messaging worms, analysis and countermeasures. In Proceedings of the 3-th Workshop on Rapid Malcode (WORM), Fairfax, VA, July 2005.

[78] S. Martin, A. Sewani, B. Nelson, K. Chen, and A. Joseph. Analyzing behavioral features for email classification. In Proceedings of the 2nd International Conference on Email and Anti-Spam (CEAS), Mountain View, CA, August 2003.

[79] MetaPHOR. http://securityresponse.symantec.com/avcenter/venc/data/w32.simile.html.

[80] Microsoft. Microsoft Virtual PC. http://www.microsoft.com/windows/virtualpc/default.mspx.

[81] J. Mirkovic and P. Reiher. A taxonomy of ddos attack and ddos defense mechanisms. In ACM SIGCOMM Computer Communication Review, April 2004.

[82] J. Mirkovic and P. Reiher. A taxonomy of ddos attack and ddos defense mechanisms. ACM SIGCOMM Computer Communication Review, 34(2):39–54, 2004.

[83] J. Mirkovic and P. Reiher. A taxonomy of ddos attacks and defense mechanisms. ACM SIGCOMM Computer Communications Review, 34(2):39–54, April 2004.

[84] R. Moddemeijer. On estimation of entropy and mutual information of continuous distributions. Signal Processing, 16(3):233–246, 1989.

[85] D. Moore. Network telescopes: Observing small or distant security events. In Invited Presentation at the 11-th USENIX Security Symposium (SEC), San Francisco, CA, August 2002.

[86] D. Moore, V. Paxson, and S. Savage. Inside the slammer worm. IEEE Magazine of Security and Privacy, 1(4):33–39, 2003.

[87] D. Moore, C. Shannon, and J. Brown. Code-red: a case study on the spread and victims of an internet worm. In Proceedings of the 2-th Internet Measurement Workshop (IMW), Marseille, France, November 2002.

[88] D. Moore, C. Shannon, and K. Claffy. Code-red: A case study on the spread and victims of an internet worm. In Proceedings of the 2-th ACM SIGCOMM Workshop on Internet Measurment, Marseille, France, November 2002.

[89] D. Moore, G. M. Voelker, and S. Savage. Inferring internet denial-of-service activity. In Proceedings of the 10-th USENIX Security Symposium, Washington, DC, August 2001.

[90] myNetWatchman. myNetWatchman Project. http://www.mynetwatchman.com.

[91] C. Nachenberg. Computer virus-antivirus coevolution. Communications of the ACM, 40(1):46–51, January 1997.

[92] R. Naraine. Botnet Hunters Search for Command and Control Servers. http://www.eweek.com/article2/0,1759,1829347,00.asp.

[93] H. Nunez, C. Angulo, and A. Catala. Rule extraction from support vector machines. In Proceedings of European Symposium on Artificial Neural Networks, Bruges, Belgium, August 2002.

[94] Chief of Engineers. United States Army: Army facilities components system user guide. http://www.usace.army.mil/inet/usace-docs/armytm/tm5-304/, October 1990.

[95] K. Park and H. Lee. On the effectiveness of route-based packet filtering for distributed dos attack prevention in power-law internets. In Proceedings of ACM SIGCOMM, San Diego, CA, August 2001.

[96] R. Perdisci, O. Kolesnikov, P. Fogla, M. Sharif, and W. Lee. Polymorphic blending attacks. In Proceedings of the 15-th USENIX Security Symposium (SECURITY), Vancouver, B.C., August 2006.

[97] R. K. Pickholtz, D. L. Schilling, and L. B. Milstein. Theory of spread-spectrum communications - a tutorial. IEEE Transactions on Communications, 30(5):855–884, 1982.

[98] GNU Project. Linux Function and Macro Index. http://www.gnu.org/software/libc/manual/html node/Function-Index.html#Function-Index.

[99] The Honeynet Project and Research Alliance. Know your enemy: Tracking botnets. http://www.honeynet.org/papers/bots/, 2005.

[100] F. Qin, C. Wang, Z. Li, H. Kim, Y. Zhou, and Y. Wu. Lift: A low-overhead practical information flow tracking system for detecting security attacks. Orlando, Florida, December 2006.

[101] M. Reiter and A. Rubin. Crowds: Anonymity for web transactions. ACM Transactions on Information and System Security, 1(1):66–92, November 1998.

[102] P. R. Roberts. Zotob Arrest Breaks Credit Card Fraud Ring. http://www.eweek.com/article2/0,1895,1854162,00.asp.

[103] SANS. Internet Storm Center. http://isc.sans.org/.

[104] S. Savage, D. Wetherall, A. R. Karlin, and T. Anderson. Practical network support for ip traceback. In Proceedings of ACM SIGCOMM, Stockholm, Sweden, August 2000.

[105] Stuart Schechter, Jaeyeon Jung, and Arthur W. Berger. Fast Detection of Scanning Worm Infections. In Proceedings of the 7-th International Symposium on Recent Advances in Intrusion Detection (RAID), French Riviera, France, September 2004.

[106] M. G. Schultz, E. Eskin, E. Zadok, and S. J. Stolfo. Data mining methods for detection of new malicious executables. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2001.

[107] M. Sedalo. Jempiscodes: Polymorphic shellcode generator. http://securitylab.ru/, 2003.

[108] V. Sekar, Y. Xie, D. Maltz, M. Reiter, and H. Zhang. Toward a framework for internet forensic analysis. In Proceedings of the 3-th Workshop on Hot Topics in Networks (HotNets-III), San Diego, CA, November 2004.

[109] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, 1949.

[110] Y. Shinoda, K. Ikai, and M. Itoh. Vulnerabilities of passive internet threat monitors. In Proceedings of the 14-th USENIX Security Symposium, Baltimore, MD, July-August 2005.

[111] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In the 6th ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), Fairfax, Virginia, December 2004.

[112] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation, December 2004.

[113] L. Spitzner. Know Your Enemy: Honeynets. Honeynet Project, http://project.honeynet.org/papers/honeynet.

[114] S. Staniford, V. Paxson, and N. Weaver. How to own the internet in your spare time. In Proceedings of the 11-th USENIX Security Symposium, San Francisco, CA, August 2002.

[115] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana. Internet indirection infrastructure. In Proceedings of ACM SIGCOMM Conference, Pittsburgh, PA, August 2002.

[116] R. Stone. Centertrack: An ip overlay network for tracking dos floods. In 9th USENIX Security Symposium, San Francisco, CA, August 2000.

[117] P. Szor and P. Ferrie. Hunting for metamorphic. In Proceedings of Virus Bulletin Conference, September 2001.

[118] S. Theodoridis and K. Koutroumbas. Pattern Recognition, Second Edition. Elsevier Science, 2003.

[119] Paul Thurrott. Windows “Longhorn” FAQ. http://www.winsupersite.com/faq/longhorn.asp.

[120] J. Twycross and M. M. Williamson. Implementing and testing a virus throttle. In Proceedings of the 12-th USENIX Security Symposium, Washington, DC, August 2003.

[121] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.

[122] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.

[123] S. Venkataraman, D. Song, P. Gibbons, and A. Blum. New streaming algorithms for superspreader detection. In Proceedings of the 12-th IEEE Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, February 2005.

[124] D. Wagner and D. Dean. Intrusion detection via static analysis. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2001.

[125] J. Wang, L. Lu, and A. A. Chien. Tolerating denial-of-service attacks using overlay networks – impact of overlay network topology. In Proceedings of ACM Workshop on Survivable and Self-Regenerative Systems, Fairfax, Virginia, October 2003.

[126] L. Wang and B. B. Hirsbrunner. Pn-based security design for data storage. In Proceedings of Databases and Applications, Innsbruck, Austria, February 2004.

[127] X. Wang, S. Chellappan, C. Boyer, and D. Xuan. On the effectiveness of secure over- lay forwarding systems under intelligent distributed dos attacks. IEEE Transactions on Parallel and Distributed Systems (TPDS), 17(7):619–632, July 2006.

[128] X. Wang, S. Chellappan, P. Boyer, and D. Xuan. Analyzing secure overlay forwarding systems under intelligent ddos attacks. Technical Report, OSU-CISRC-12/04-TR71, Dept. of Computer Science and Engineering, The Ohio State University, June 2004.

[129] X. Wang, W. Yu, A. Champion, X. Fu, and D. Xuan. Detecting Worms via Mining Dynamic Program Execution. To appear in IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), Nice, France, September 2007.

[130] X. Wang, W. Yu, X. Fu, D. Xuan, and W. Zhao. iLOC: An invisible LOCalization Attack to Internet Threat Monitoring Systems. Submitted to the IEEE Conference on Computer Communications (INFOCOM), July 2007.

160 [131] N. Weaver, S. Staniford, and V. Paxson. Very fast containment of scanning worms. In Proceedings of the 13-th USENIX Security Symposium, San Diego, CA, August 2004.

[132] J. Wu, S. Vangala, and L. X. Gao. An effective architecture and algorithm for detect- ing worms with various scan techniques. In Proceedings of the 11-th IEEE Network and Distributed System Security Symposium (NDSS), San Diego, CA, Febrary 2004.

[133] X. G. Xia, C. G. Boncele, and G. R. Arce. A multiresolution watermark for digital images. In Proceedings of International Conference on Image Processing (ICIP’97), Washington, DC, October 1997.

[134] L. Xiao, Z. Xu, and X. Zhang. Mutual anonymity protocols for hybrid peer-to-peer systems. In Proceedings of IEEE International Conference on Distributed Comput- ing Systems (ICDCS), Providence, RI, May 2003.

[135] D. Xuan, S. Chellappan, X. Wang, and S. Wang. Analyzing the secure overlay services architecture under intelligent ddos attacks. In Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), Tokyo, Japan, March 2004.

[136] S. Yang, J. P. Song, H. Rajamani, T. W. Cho, Y. Zhang, and R. Mooney. Fast and effective worm fingerprinting via machine learning. In Proceedings of the 3rd IEEE International Conference on Autonomic Computing (ICAC), Dublin, Ireland, June 2006.

[137] V. Yegneswaran, P. Barford, and S. Jha. Global intrusion detection in the domino overlay system. In Proceedings of the 11-th IEEE Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2004.

[138] V. Yegneswaran, P. Barford, and D. Plonka. On the design and utility of internet sinks for network abuse monitoring. In Proceedings of the Symposium on Recent Advances in Intrusion Detection (RAID), Pittsburgh, PA, September 2003.

[139] W. Yu, X. Fu, S. Graham, D. Xuan, and W. Zhao. Dsss-based flow marking technique for invisible traceback. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2007.

[140] W. Yu, X. Wang, D. Xuan, and D. Lee. Effective detection of active worms with varying scan rate. Technical report, Department of Computer Science and Engineering, The Ohio State University, April 2005.

[141] W. Yu, X. Wang, D. Xuan, and D. Lee. Effective detection of active worms with varying scan rate. In Proceedings of IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), Baltimore, MD, August 2006.

[142] W. Yu, X. Wang, D. Xuan, and W. Zhao. On detecting camouflaging worm. In Annual Computer Security Applications Conference (ACSAC), Miami, FL, December 2006.

[143] W. Yu, N. Zhang, and W. Zhao. Self-adaptive worms and countermeasures. In Proceedings of Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), Dallas, TX, November 2006.

[144] Zdnet. Smart worm lies low to evade detection. http://news.zdnet.co.uk/internet/security/0,39020375,39160285,00.htm.

[145] N. Zhang, S. Wang, and W. Zhao. A new scheme on privacy preserving association rule mining. In Proceedings of the 8-th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Pisa, Italy, September 2004.

[146] C. Zou, L. Gao, W. Gong, and D. Towsley. Monitoring and early warning for internet worms. In Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS), Washington D.C., October 2003.

[147] C. Zou, W. B. Gong, D. Towsley, and L. X. Gao. Monitoring and early detection for internet worms. In Proceedings of the 10-th ACM Conference on Computer and Communication Security (CCS), Washington DC, October 2003.

[148] C. C. Zou, W. Gong, and D. Towsley. Code red worm propagation modeling and analysis. In Proceedings of the 9-th ACM Conference on Computer and Communication Security (CCS), Washington, DC, November 2002.
