WIDESPREAD INTERNET ATTACKS: DEFENSE-ORIENTED EVOLUTION AND COUNTERMEASURES

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the

Graduate School of The Ohio State University

By

Xun Wang, B.E., M.S.

*****

The Ohio State University

2007

Dissertation Committee:

Dong Xuan, Adviser
Ten H. Lai
Ming T. Liu

Approved by

Adviser
Graduate Program in Computer Science and Engineering

© Copyright by

Xun Wang

2007

ABSTRACT

Widespread Internet attacks, such as Distributed Denial of Service (DDoS) attacks and active worm attacks, have been major threats to the Internet in the recent past. Although tremendous research effort has focused on this domain, the defense against these attacks remains challenging for one reason: the attacks are evolving intelligently based on their knowledge of defense mechanisms. In other words, the attacks are becoming more intelligent and effective through defense-oriented evolution in order to defeat existing defense systems. The objectives of this dissertation are to obtain deep insight about these defense-oriented attacks and to address the challenges in defending against them.

While multiple elements define a specific defense system, the most important ones are the system infrastructure and algorithms. The evolving defense-oriented attacks can exploit and leverage knowledge of the infrastructure and algorithms of defense systems in order to counteract them. Hence, we can classify defense-oriented widespread Internet attacks into infrastructure-oriented and algorithm-oriented attacks. In this dissertation, we investigate a variety of such attacks and design new and more effective countermeasures against them.

For infrastructure-oriented attacks, we study two classes of new attacks that target different aspects of the defense system infrastructure. First, we investigate intelligent DDoS attacks, which aim to infer the architectures of the DDoS-defending Secure Overlay Forwarding Systems (SOFS) in order to launch attacks more efficiently than ordinary random DDoS attacks. Second, we study the invisible LOCalization attack, which can obtain location information of Internet Threat Monitoring (ITM) systems. In order to defend against these new attacks, we provide enhancements for SOFS and ITM systems.

For algorithm-oriented attacks, we first study a class of new active worms, the Varying Scan Rate worm, which deliberately varies its port scan rate during propagation to evade detection by existing network-based worm detection algorithms. Second, we focus on polymorphic worms, which change or possess new signatures to defeat existing host-based worm detection algorithms. Furthermore, we provide new and more effective detection approaches against these new worms.

The war between attackers and defenders is never ending. We believe this dissertation lays a foundation for deeply understanding the evolution of widespread Internet attacks and for enhancing defenses against them.

To my family.

ACKNOWLEDGMENTS

To reach this stage in my Ph.D. study and this point in my life, I am indebted to many great people for their wisdom, support, and love.

It was my great fortune to become the first Ph.D. student of Dr. Dong Xuan in September 2002. It is he who gave me the opportunity to conduct focused research over the past years, and it is he who showed me the road to high-quality work. While enjoying the freedom of independent thinking, I have greatly appreciated his insightful advice on my research as well as on life. I also greatly appreciate his patience in guiding me through my Ph.D. study. I still remember how much help and encouragement he gave me when I delivered my first formal academic presentation in English.

As a Ph.D. student in the Department of Computer Science and Engineering, I have also enjoyed and appreciated the advice and help of many other professors both within and outside of The Ohio State University, including Dr. Ming T. Liu, Dr. Ten H. Lai, Dr. David Lee, and Dr. Srinivasan Parthasarathy in the CSE department, as well as Dr. Xinwen Fu at Dakota State University and Dr. Wei Zhao at Rensselaer Polytechnic Institute. They have made my stay at The Ohio State University fun and fruitful.

During my Ph.D. study, I have enjoyed working with many fellow graduate students in the CSE department. I have had the chance to work with Sriram Chellappan, Thang Nam Le, Sandeep Reddy, Wenjun Gu, Corey Boyer, Kurt Schosek, Xiaole Bai, Boxuan Gu, and Adam Champion on shared projects or research problems, and it was a wonderful experience. I have also interacted extensively with a few other graduate colleagues, including Wei Yu at Texas A&M University, my research partner, who has given me the most help as a fellow student, and Prasad Calyam at the Ohio Supercomputer Center. Their help and laughter have greatly enriched my life at The Ohio State University. I would also like to thank my many other friends for their continued support during my life and study at OSU.

I am indebted to my parents, Shihong Wang and Fanding Zhang, my dearest sister Xu Wang, her husband Jason Chang, and their two lovely sons for their unconditional love and support. I would like to take this opportunity to express my deepest gratitude to my wonderful aunt, Xiaoman Duan, her husband Shengqi Wang, and my wonderful cousin, Yuan Wang. I would also like to thank the rest of my family, who are too numerous to name individually, for their love and help. It is this family that has made me strong and courageous enough to become who I am today. My family is my love, my inspiration, and my life.

VITA

January 29, 1977 ...... Born - Weinan, China

1999 ...... B.E. in Computer Engineering, East China Normal University, China

2002 ...... M.E. in Computer Engineering, East China Normal University, China

2006 ...... M.S. in Computer Science and Engineering, The Ohio State University

2002-present ...... Graduate Research and Teaching Associate, The Ohio State University

PUBLICATIONS

Research Publications

Wei Yu, Xun Wang, Dong Xuan and Wei Zhao. “On Detecting Camouflaging Worm”. in Proceedings of 23rd Annual Computer Security Applications Conference (ACSAC), December 2006.

Wei Yu, Xun Wang, Dong Xuan and David Lee. “Effective Detection of Active Smart Worms with Varying Scan Rate”. in Proceedings of 2nd IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), August 2006.

Xun Wang, Sriram Chellappan, Corey Boyer and Dong Xuan. “On the Effectiveness of Secure Overlay Forwarding Systems under Intelligent Distributed DoS Attacks”. IEEE Transactions on Parallel and Distributed Systems (TPDS), 17(7):619-632, July 2006.

Xun Wang, Sriram Chellappan, Wenjun Gu, Wei Yu and Dong Xuan. “Policy-driven Physical Attacks in Sensor Networks: Modeling and Measurement”. in Proceedings of IEEE Wireless Communications and Networking Conference (WCNC), April 2006.

Xun Wang, Wenjun Gu, Kurt Schosek, Sriram Chellappan and Dong Xuan. “Sensor Network Configuration under Physical Attacks”. International Journal of Ad Hoc and Ubiquitous Computing (IJAHUC), Inderscience, January 2006.

Wenjun Gu, Xun Wang, Sriram Chellappan, Dong Xuan and Ten H. Lai. “Defending against Search-based Physical Attacks in Sensor Networks”. in Proceedings of 2nd IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS), November 2005.

Wei Yu, Sriram Chellappan, Xun Wang and Dong Xuan. “On Defending Peer-to-Peer System-based Active Worm Attacks”. in Proceedings of 48th IEEE Global Telecommunications Conference (Globecom), November 2005.

Xun Wang, Sriram Chellappan, Wenjun Gu, Wei Yu and Dong Xuan. “Search-based Physical Attacks in Sensor Networks”. in Proceedings of 14th IEEE International Conference on Computer Communications and Networks (ICCCN), October 2005.

Xun Wang, Wenjun Gu, Kurt Schosek, Sriram Chellappan and Dong Xuan. “Sensor Network Configuration under Physical Attacks”. in Proceedings of 3rd International Conference on Computer Network and Mobile Computing (ICCNMC), August 2005.

Xun Wang, Wenjun Gu, Sriram Chellappan, Kurt Schosek and Dong Xuan. “Lifetime Optimization of Sensor Networks under Physical Attacks”. in Proceedings of IEEE International Conference on Communications (ICC), May 2005.

Dong Xuan, Sriram Chellappan, Xun Wang and Shengquan Wang. “Analyzing the Secure Overlay Services Architecture under Intelligent DDoS Attacks”. in Proceedings of 24th IEEE International Conference on Distributed Computing Systems (ICDCS), March 2004.

Dong Xuan, Sriram Chellappan and Xun Wang. “Resilience of Structured Peer-to-Peer Systems: Analysis and Enhancement”. Handbook on Theoretical and Algorithmic Aspects of Sensor, Ad Hoc Wireless and Peer-to-Peer Networks, CRC Press, 2004.

FIELDS OF STUDY

Major Field: Computer Science and Engineering

Studies in:

Computer Networking: Prof. Dong Xuan, Prof. Ten H. Lai, Prof. Ming T. Liu
Software Engineering: Prof. Atanas Rountev
Computer Architecture: Prof. Mario Lauria

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
   1.1 Widespread Internet Attacks are Major Threats to the Internet
   1.2 Widespread Internet Attacks are Evolving
   1.3 Contributions of the Dissertation: Defense-Oriented Evolution and Countermeasures
       1.3.1 Infrastructure-oriented Attacks
       1.3.2 Algorithm-oriented Attacks
   1.4 Organization of the Dissertation

2. Intelligent DDoS Attacks against Secure Overlay Forwarding Systems and Countermeasures
   2.1 Motivations
   2.2 Background
   2.3 Intelligent DDoS Attacks
   2.4 Analysis of Intelligent DDoS Attacks against SOFS Systems
       2.4.1 Analysis of Round based Intelligent DDoS Attacks
       2.4.2 Analysis of Continuous Intelligent DDoS Attacks
   2.5 Countermeasures
       2.5.1 Optimization of SOFS System Performance Under Round based Attacks
       2.5.2 General Design Guidelines to Enhance SOFS System Performance
   2.6 Related Work
   2.7 Summary

3. Localization Attack against Internet Threat Monitoring Systems and Countermeasures
   3.1 Motivations
   3.2 Background
       3.2.1 Internet Threat Monitoring Systems
       3.2.2 Localization Attacks against ITM Systems
   3.3 iLOC Attack
       3.3.1 Overview
       3.3.2 Attack Traffic Generation Stage
       3.3.3 Attack Traffic Decoding Stage
       3.3.4 Discussions
   3.4 Analysis
       3.4.1 Accuracy Analysis
       3.4.2 Invisibility Analysis
       3.4.3 Determination of Attack Parameters
   3.5 Implementation and Validation
       3.5.1 Implementation of the iLOC Attack
       3.5.2 Validation of the iLOC Attack
   3.6 Performance Evaluation
       3.6.1 Evaluation Methodology
       3.6.2 Results
   3.7 Guidelines of Countermeasures
   3.8 Related Work
   3.9 Summary

4. Varying Scan Rate Worms against Network-based Worm Detections and Countermeasures
   4.1 Motivations
   4.2 Background
       4.2.1 The Propagation Model of Traditional Worms
       4.2.2 Network-based Worm Detection
   4.3 The Active Worm with Varying Scan Rate
       4.3.1 The VSR Worm Model
       4.3.2 Analysis of the VSR Worm
   4.4 DEC Worm Detection
       4.4.1 Design Rationale
       4.4.2 DEC Worm Detection
       4.4.3 Space of Worm Detection
   4.5 Performance Evaluation
       4.5.1 Evaluation Methodology
       4.5.2 Detection Performance
   4.6 Related Work
   4.7 Summary

5. Polymorphic Worms against Host-based Worm Detection and Countermeasures
   5.1 Motivations
   5.2 Background
       5.2.1 Worm Detection
       5.2.2 Program Analysis
       5.2.3 Data Mining
   5.3 Polymorphic Worms
   5.4 Worm Detection via Mining Dynamic Program Execution
       5.4.1 Framework
       5.4.2 Dataset Collection
       5.4.3 Feature Extraction
       5.4.4 Classifier Learning and Worm Detection
   5.5 Experiments
       5.5.1 Experiment Setup and Metrics
       5.5.2 Experiment Results
   5.6 Discussions
   5.7 Related Work
   5.8 Summary

6. Concluding remarks

Bibliography

LIST OF TABLES

1.1 Defense-oriented attacks and countermeasures studied in this dissertation.
2.1 Optimal mapping degree with different NT
2.2 Optimal node distribution under 1 to 2 mapping with different NT
3.1 Defender Detection Rate PDD (Port 135)
4.1 Detection Time of Some Existing Detection Schemes
4.2 Maximal Infection Ratio of Some Existing Detection Schemes
4.3 DEC Performance Sensitivity to Parameter α
5.1 Detection results for the Naive Bayes based detection
5.2 Detection results for the SVM based detection

LIST OF FIGURES

2.1 The generalized SOFS architecture.
2.2 A Snapshot of the generalized SOFS architecture under the intelligent DDoS attacks.
2.3 Sensitivity of PS to L and mi under different attack intensities.
2.4 Node demarcation in our successive attack at the end of Round j.
2.5 Sensitivity of PS to NT under different L, mi and N.
2.6 Sensitivity of PS to L, mi and node distribution.
2.7 Sensitivity of PS to R (a) and PE (b).
2.8 Sensitivity of PS to L under different m (a), and to NC under different L and r (b).
2.9 Sensitivity of PS to NT under different r and L (a), and different r and m (b).
3.1 Workflow of the iLOC Attack
3.2 PN-code and Encoded Attack Traffic
3.3 Experiment Setup
3.4 Background Traffic vs. Traffic Mixed with iLOC Attack
3.5 PSD for Background Traffic vs. Traffic Mixed with iLOC Attack
3.6 Attack Successful Rate (Port 135)
3.7 Attack Successful Rate vs. Code Length
3.8 Attack Successful Rate vs. Number of Parallel Attack Sessions
3.9 Attack Successful Rate vs. Number of Parallel Attack Sessions
4.1 Infection ratio of different VSR worms.
4.2 The observed worm instance count of different VSR worms.
4.3 Bayes decision rule for normal and worm traffic features
4.4 Space of worm detection
4.5 Detection time of detection schemes on VSR worms
4.6 Maximal infection ratio of detection schemes on VSR worms
4.7 Detection time of detection schemes on the traditional PRS worms
4.8 Maximal infection ratio of detection schemes on the traditional PRS worms
5.1 Workflow of the off-line classifier learning
5.2 Workflow of the on-line worm detection
5.3 Basic idea of kernel function in SVM.

CHAPTER 1

INTRODUCTION

1.1 Widespread Internet Attacks are Major Threats to the Internet

Widespread Internet attacks are large-scale attacks whose attack sources spread widely over the Internet [23, 68]. They have been major threats to the Internet in the recent past, with many well-known examples such as Distributed Denial of Service (DDoS) attacks, active worm attacks, spam, and spyware. In July 2001, an active worm called “Code-Red” infected more than 350,000 Microsoft IIS servers. In less than 14 hours, this active worm caused more than 1.2 billion dollars in economic damage [87]. In October 2002, a DDoS attack lasted for only an hour but was able to shut down 7 of the 13 Internet DNS root servers [7]. In January 2003, another active worm called “Slammer” infected nearly 75,000 Microsoft SQL servers in less than 10 minutes and consequently caused large-scale disruptions in production systems worldwide [86]. In March 2004, active worms called “Witty” and “” infected many hosts in a short time and made them unusable [24]. This list of attacks keeps growing, with no apparent end in sight.

Furthermore, a recent trend has emerged in which different types of attacks are combined to increase attack sophistication and efficiency. For example, worms have launched DDoS attacks against the White House’s website (www.whitehouse.gov) at the final stage of their propagation [88]. More recently, in February 2004, a worm propagated rapidly to many hosts, which then flooded the websites www.sco.com and www.microsoft.com, thereby preventing legitimate users from accessing them [4]. The combination of different attack types is not limited to active worms and DDoS attacks. Many active worms are used to infect a large number of hosts and recruit them as bots or zombies, which are networked together to form botnets [99] [102] [92]. These botnets can be used to: (i) launch massive DDoS attacks that disrupt Internet utility [4], (ii) access confidential information that can be abused [5] through large-scale traffic sniffing, key logging, identity theft, etc., (iii) distribute large-scale unsolicited advertisement emails (as spam) or software (as adware), (iv) spread new malware by installing Trojan horses or other backdoor software, and (v) destroy data that has a high monetary value [6]. There is even evidence that botnets are being rented out for attacks on Internet e-businesses [102].

1.2 Widespread Internet Attacks are Evolving

Due to the massive damage caused by widespread Internet attacks, a significant amount of research effort has focused on developing effective methods to model, detect, and defend against them. Among these attacks, DDoS attacks and active worms are the most dominant and dangerous threats to the Internet. Research on understanding and defending against them is the most important and imperative, and it receives the greatest attention and effort.

Although much effort and progress have been made in this direction, effective defense against these attacks remains a challenge today due to one fact: widespread Internet attacks have evolved and are continuously evolving.

1. Evolution of DDoS Attacks

During its evolution, the DDoS attack has not only equipped itself with new attack approaches, it has also added new types of entities to its list of attack targets. Generally, a Denial of Service (DoS) attack is characterized by an explicit attempt to prevent the legitimate use of a system service. A Distributed Denial of Service (DDoS) attack deploys multiple attacking computers to attain this goal. In the early generations of DDoS attacks, the attacker sent a stream of packets to a victim to consume some key resource such as network bandwidth, computation capacity (CPU cycles), or memory, thereby rendering the resource unavailable to the victim's legitimate clients. In later DDoS attacks, the attacker sent a few malformed packets to confuse a vulnerable application or protocol on the victim machine and force it to freeze or reboot. Both of these approaches set specific hosts or networks as the victim, but the former targets network or computation resources, whereas the latter targets the protocol or application. Furthermore, new DDoS attacks target the Internet infrastructure (such as DNS systems) rather than specific victims [7].

DDoS attacks also attempt to evade detection. In the above attacks, the attacker continuously sends a large number of packets to a victim to exhaust its key resource, overload it to disable communication, crash its service, or block its network link. Typical examples are the TCP SYN attack, TCP and UDP flood attacks, the ICMP echo attack, and the Smurf attack [81]. These attack methods share one common feature: a large number of compromised machines or agents are involved in the attack and transmit packets at a high rate to the victim, which makes the DDoS attacks easy to detect. While potentially quite harmful, the high-rate nature of such attacks presents a statistical anomaly to network monitors, such that the attack can be detected, the attacker identified, and the attack effects mitigated. However, recent work shows that smart DDoS attackers can use maliciously chosen low-rate DoS traffic patterns that exploit TCP's retransmission time-out mechanism and throttle TCP flows to a small fraction of their ideal rate while eluding detection [66] [74].
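As a back-of-the-envelope illustration of why such pulsed patterns elude rate-based monitors, consider an attacker whose bursts are spaced at TCP's minimum retransmission timeout. The numbers below are illustrative assumptions, not measurements from [66] [74]:

```python
# Toy numbers for a low-rate ("shrew"-style) DoS pulse train.
# All values here are hypothetical, chosen only to show the arithmetic.
min_rto = 1.0       # seconds: assumed minimum TCP retransmission timeout
burst_len = 0.05    # seconds: burst just long enough to fill the bottleneck queue
burst_rate = 100.0  # Mbps: burst must briefly saturate the bottleneck link

# Each burst, timed to coincide with the victim flow's retransmissions,
# forces the flow back into timeout; yet the attacker's *average* rate is:
avg_rate = burst_rate * burst_len / min_rto
print(avg_rate)     # 5.0 Mbps, a small fraction of the 100 Mbps burst rate
```

Because a volume-based monitor sees only the 5 Mbps average, the attack hides below typical flooding thresholds while still throttling TCP flows.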

2. Evolution of Active Worms

Active worms also use multiple evolutionary principles to propagate themselves more efficiently. First, while the first-generation worms only used port scanning to propagate themselves, current worms propagate themselves more effectively using various methods, e.g., network port scanning, email, file sharing, Peer-to-Peer (P2P) networks, and Instant Messaging (IM). Second, worms use different scan strategies during different stages of propagation. For example, instead of using pure random scans to find victims, they use a hitlist to infect previously identified vulnerable hosts at the initial stage of propagation in order to increase propagation efficiency [132, 81]. Once they propagate to a new local network, they scan all IP addresses in this local network first in order to increase the chance of hitting victims. They use DNS, network topology, and routing information to identify active hosts instead of randomly scanning IP addresses [132, 81]. They split the target IP address space during propagation in order to avoid duplicate scans.

They also become more modular and organized in order to carry other attack payloads and launch any kind of organized and synchronized large-scale attack [51, 99]. Furthermore, in order to evade numerous worm detection systems, they are becoming stealthy. For example, the "Atak" worm [144] is a recently discovered active worm that attempts to remain hidden by sleeping (stopping scans) when it suspects it is under detection. Worms that adopt attack strategies similar to those of the "Atak" worm could yield overall scan traffic patterns different from those of traditional worms. Therefore, the existing network-based detection schemes with scan traffic monitoring will not be able to detect them effectively.

Unlike the above network-based worm detection systems, host-based worm detection systems search inbound binary code content for known patterns, or signatures, that correspond to worms. To date, in order to detect and/or block active worms, these worm detection systems use signatures that match bytes from a worm's payload, using techniques such as string matching at arbitrary payload offsets [3, 111] and regular expression matching within a payload [3].

However, newly evolved active worms tend to be polymorphic [20, 32, 63]. Polymorphic worms are able to change their binary representation or signature as part of the spreading process. This can be achieved with self-encryption mechanisms or semantics-preserving code manipulation techniques. Consequently, copies of a polymorphic worm may no longer share a common invariant substring of sufficient length, and existing detection systems will not recognize network streams that contain copies of worms or executables as manifestations of a worm outbreak. This worm evolution trend also requires us to enhance content-based worm detection schemes.
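The defeat of substring signatures by self-encryption can be sketched with a toy example. The payload, the "signature", and the single-byte XOR scheme below are invented purely for illustration; real polymorphic engines also mutate their decryptor code:

```python
# Invented worm payload and byte signature, for illustration only.
PAYLOAD = b"JMP exploit_shellcode CALL infect_host"
SIGNATURE = b"exploit_shellcode"

def xor_encrypt(data: bytes, key: int) -> bytes:
    """Naive self-encryption: each worm copy picks a fresh one-byte key."""
    return bytes(b ^ key for b in data)

# Two copies of the "same" worm, encrypted under different keys.
copy_a = xor_encrypt(PAYLOAD, 0x5A)
copy_b = xor_encrypt(PAYLOAD, 0xC3)

# String matching at arbitrary offsets catches the plain payload...
print(SIGNATURE in PAYLOAD)                      # True
# ...but neither encrypted copy contains the byte signature,
print(SIGNATURE in copy_a, SIGNATURE in copy_b)  # False False
# and the two copies share no invariant representation to extract one from.
print(copy_a == copy_b)                          # False
```

Even this trivial scheme is enough to break exact substring and regular expression matching on the payload, which is why content-based detectors must look beyond fixed byte patterns.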

1.3 Contributions of the Dissertation: Defense-Oriented Evolution and Countermeasures

Among the above evolutions of widespread Internet attacks, some aim to evade detection based on their knowledge of detection schemes, such as the low-rate DDoS attacks

Defense Element the Attack is Oriented | Attack | Countermeasure
Infrastructure: Architecture | Intelligent DDoS attacks against SOFS systems | Optimal configuration of SOFS systems
Infrastructure: Location | invisible LOCalization (iLOC) attacks against ITM systems | Enhancement of ITM systems
Algorithm: Network-based algorithm | Varying Scan Rate worms | attack target Distribution Entropy based dynamiC (DEC) worm detection
Algorithm: Network-based algorithm | Camouflaging worms | Spectrum analysis based worm detection
Algorithm: Host-based algorithm | Polymorphic worms | Worm detection through mining dynamic program Execution

Table 1.1: Defense-oriented attacks and countermeasures studied in this dissertation.

and polymorphic worms discussed previously. We notice that this kind of deliberate attack evolution and the resulting attacks are more effective (for the attackers) and more dangerous (for the defenses and victims) than random and ad hoc evolution. We carry out systematic and comprehensive investigations following a more general evolution trend: defense-oriented evolution. Defense-oriented evolution results in various enhanced or new attacks that take advantage of knowledge of existing defense systems to counteract these systems. In this dissertation, we investigate a variety of such potentially defense-oriented, evolved attacks in order to obtain deep insights about them and find vulnerabilities in existing defense systems. Consequently, we further enhance these defense systems or design new and more effective defense schemes.

While multiple elements define a specific defense system, the most important ones are its infrastructure and its algorithms. Defense-oriented (evolved) attacks can exploit and leverage knowledge of defense system infrastructure and algorithms in order to counteract them and make new attacks more effective and dangerous. Thus, we can classify the defense-oriented attacks into defense-infrastructure-oriented (or just infrastructure-oriented) and defense-algorithm-oriented (or just algorithm-oriented) attacks. We investigate different instances of each class of defense-oriented widespread Internet attacks, as shown in Table 1.1.

1.3.1 Infrastructure-oriented Attacks

Infrastructure is simple for single-site systems, such as host-based or single-device-based systems, but it can be sophisticated for distributed systems. There are several important elements in the infrastructure of a system, such as its architecture, topology, and components. In order to counteract defense systems, attackers desire to obtain information about their infrastructure elements. While some high-level infrastructure information (such as the constituent components and topology type) may be public, detailed information (such as the identities, roles, and locations of the components, and the relations and connections between components within the architecture) is not accessible outside of the system. However, it is this detailed infrastructure information that attackers can use to counteract the defense system. Therefore, attackers need specific attack approaches to obtain this information.

In our research, we focus on infrastructure-oriented attacks that target the architecture and location information of defense systems. We systematically investigate instances of these attacks, based on which we propose methods to enhance defense systems.

1. Architecture-oriented Attacks and Countermeasures

— Intelligent DDoS Attacks against SOFS Systems and Optimization of SOFS Systems

A recent approach to protecting communications from DDoS attacks involves the use of overlay systems. Although such systems perform well under random DDoS attacks, it is questionable whether they are resilient to intelligent DDoS attacks, which aim to infer the architectures of the systems in order to launch more efficient attacks. We define several intelligent DDoS attack models and develop analysis and simulations to study the impacts of intelligent DDoS attacks on system performance in terms of the path availability between clients and the server [127, 135]. We generalize such systems as Secure Overlay Forwarding Systems (SOFS). There are certain standard architectural features of such systems, i.e., layering, mapping degree, and node distribution.

We analyzed the SOFS system under discrete-round-based and continuous attacks using a general analytical approach and simulations, respectively. We observed that the system design features, attack strategies, attack intensities, prior knowledge about the system, and system recovery significantly impact system performance. Even under sophisticated attack strategies and intensities, we showed that smart design of system features and recovery can significantly reduce attack impacts. We provide a method to obtain optimal system configurations under given attack strategies and intensities. Furthermore, we propose a set of design guidelines to enhance SOFS performance under all general scenarios.
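The kind of tradeoff these design features create can be illustrated with a deliberately simplified model: suppose nodes fail independently and a client-server path survives only if, in every layer, at least one of the m mapped next-layer nodes is still up. This is a sketch under strong independence assumptions, not the dissertation's actual P_S analysis:

```python
def path_availability(p_down: float, m: int, L: int) -> float:
    """Toy model of path availability through an L-layer overlay.

    Assumes each node is disabled independently with probability p_down,
    and each hop can reach m alternative nodes in the next layer. This is
    only an illustration of how layering (L) and mapping degree (m)
    interact, not the exact formula analyzed in this dissertation.
    """
    per_layer = 1.0 - p_down ** m   # at least one of the m nodes survives
    return per_layer ** L           # every one of the L layers must survive

# Raising the mapping degree m improves availability under a fixed attack
# intensity, while adding layers L multiplies the chances of a broken hop.
print(path_availability(0.3, 1, 3))
print(path_availability(0.3, 2, 3))
```

Even this toy model shows why configuration matters: the same attack intensity yields very different path availability depending on how L and m are chosen, which motivates searching for optimal configurations.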

2. Location-oriented Attacks and Countermeasures

— invisible LOCalization Attack against Internet Threat Monitoring Systems

Internet threat monitoring (ITM) systems have been deployed in recent years to detect widely spreading threats and attacks on the Internet. However, the integrity and functionality of these systems largely depend on the location anonymity of their monitors. If the locations of monitors are disclosed, the attacker can bypass the monitors or even abuse them, significantly jeopardizing the performance of ITM systems. In this work, we study a new class of attack, the invisible LOCalization (iLOC) attack [130]. The iLOC attack can accurately and invisibly localize monitors of ITM systems. In the iLOC attack, the attacker launches low-rate scan traffic, encoded with a selected pseudo-noise code (PN-code), toward targeted networks. While the secret PN-code is invisible to others, the attacker can accurately determine the existence of monitors in the targeted networks based on whether the PN-code is embedded in the report data queried from the data center of the ITM system. We implement the iLOC attack and conduct experiments on a real-world ITM system to validate the feasibility of such attacks. We also conduct extensive simulations of the iLOC attack using real-world traces. Our data demonstrate that the iLOC attack can accurately identify monitors while remaining invisible to ITM systems. Finally, we present a set of guidelines to counteract the iLOC attack. Notably, the iLOC attack does not directly harm the Internet or defense systems by itself, but it can aid other widespread Internet attacks by defeating ITM systems, thereby increasing attack damage.
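The PN-code idea can be sketched as follows: the attacker modulates its scan rate with a secret +1/-1 code, then correlates the queried report traffic against that same code. Everything below, including the code, rates, and noise level, is a hypothetical toy; the real attack's encoding and decoding stages are considerably more involved:

```python
import random

PN_CODE = [1, -1, 1, 1, -1, 1, -1, -1]   # secret spreading code (assumed)

def encode_scan_rates(code, base=100.0, amp=20.0):
    """Scan-rate time series: base rate shifted up or down by each code chip."""
    return [base + amp * chip for chip in code]

def correlate(report, code):
    """Normalized correlation between a (noisy) report series and a code."""
    mean = sum(report) / len(report)
    centered = [r - mean for r in report]
    num = sum(c * chip for c, chip in zip(centered, code))
    den = (sum(c * c for c in centered) ** 0.5) * (len(code) ** 0.5)
    return num / den if den else 0.0

random.seed(1)
# Report queried from the ITM data center when a monitor sits in the
# targeted network: the attacker's encoded traffic plus background noise.
report_with_monitor = [r + random.gauss(0, 2) for r in encode_scan_rates(PN_CODE)]
# Report for a network with no monitor: background noise only.
report_background = [100 + random.gauss(0, 2) for _ in PN_CODE]

print(correlate(report_with_monitor, PN_CODE))   # high (near 1): monitor present
print(correlate(report_background, PN_CODE))     # much smaller in magnitude
```

Because the per-chip rate shift is small relative to background traffic, the pattern stays statistically inconspicuous to observers who do not know the code, while correlation against the secret code makes it stand out clearly to the attacker.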

1.3.2 Algorithm-oriented Attacks

Algorithms in a defense system deal with flow control, data processing, and decision-making in detection and response. Similar to the infrastructure information, the high-level algorithms are not unknown to attackers. Moreover, attackers do not need the detailed algorithm information, which would be difficult to obtain: knowledge of the high-level algorithms suffices for attackers to evolve their attacks to render current defense system algorithms ineffective, and thus to defeat the defense systems.

In this dissertation, we focus on algorithm-oriented worm attacks, which attempt to evolve based on knowledge of existing worm detection algorithms in order to circumvent detection. Since worm detection can be classified into network-based and host-based categories, we study algorithm-oriented worm attack instances in each category.

1. Network-based Algorithm-oriented Attacks and Countermeasures

— VSR Worm, C-Worm and Countermeasures

It has been observed that the number of infected hosts and overall port scan traffic

volume increase exponentially over time when traditional worms propagate in the

Internet [86][27][148]. Based on these observations, many network-based worm de-

tection algorithms associated with global scan traffic monitoring systems, such as

threshold-based detection and trend-based detection, have been developed to detect

large scale propagation of worms in the Internet [147][105][132][123]. However,

worm writers know that these detection algorithms expect exponentially increasing

port scan traffic or a large volume of port scan traffic during worm propagation.

Consequently, worm writers can evolve their worms accordingly so that existing worm detection algorithms will not observe abnormal traffic or generate an alarm during the propagation of the new worms. Hence the evolved worms can evade existing worm detection systems.
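The contrast between these detection styles and an evasive worm can be sketched with two toy detectors (illustrative code only, not the algorithms of the cited systems; all names and thresholds are invented):

```python
# Sketch (not the cited detection systems): two simple network-based
# detectors over a per-interval scan-count series.

def threshold_detect(scan_counts, threshold):
    """Alarm when scan volume in any interval exceeds a fixed threshold."""
    return any(c > threshold for c in scan_counts)

def trend_detect(scan_counts, growth_factor=1.5, consecutive=3):
    """Alarm when scan volume grows by at least `growth_factor` for
    `consecutive` successive intervals (an exponential-trend heuristic)."""
    run = 0
    for prev, cur in zip(scan_counts, scan_counts[1:]):
        run = run + 1 if prev > 0 and cur >= growth_factor * prev else 0
        if run >= consecutive:
            return True
    return False

# A traditional worm's scan volume grows exponentially over time:
worm = [10, 16, 25, 40, 64, 102, 163]
print(threshold_detect(worm, threshold=100))  # True
print(trend_detect(worm))                     # True
```

A worm that keeps its scan volume flat and low, as discussed next, would trigger neither detector.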

In this dissertation, we model a new class of active worms called the Varying Scan

Rate worm (the VSR worm in short) [141]. The VSR worm deliberately varies

its scan rate and is able to avoid effective detection by existing worm detection

schemes. As a countermeasure against the VSR worm, we design a new worm detec-

tion scheme called attack target Distribution Entropy based dynamiC worm detection

(DEC detection in short). DEC detection utilizes the attack target distribution and its

statistical entropy in conjunction with dynamic decision rules to distinguish worm

scan traffic from non-worm scan traffic. We conduct extensive performance eval-

uations on the DEC detection scheme using real-world traces as background scan

traffic. Our data clearly demonstrate the effectiveness of the DEC detection scheme

in detecting VSR worms as well as traditional worms.
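The intuition behind using the attack-target distribution and its entropy can be illustrated with a small sketch (hypothetical code and data, not the actual DEC detection scheme): a scanning worm probes targets near-uniformly across the address space, so the empirical entropy of its destination distribution is high relative to typical background scans.

```python
# Illustrative sketch of entropy over the scan-target distribution
# (not the dissertation's DEC algorithm; addresses are invented).
import math
from collections import Counter

def target_entropy(dest_addrs):
    """Shannon entropy (bits) of the empirical destination distribution."""
    counts = Counter(dest_addrs)
    total = len(dest_addrs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Benign background: a few hosts probed repeatedly -> low entropy.
background = ["10.0.0.1"] * 50 + ["10.0.0.2"] * 30 + ["10.0.0.3"] * 20
# Worm scan: 100 distinct, uniformly chosen targets -> high entropy.
worm_scan = [f"10.0.{i}.{i}" for i in range(100)]

print(round(target_entropy(background), 2))  # 1.49
print(round(target_entropy(worm_scan), 2))   # 6.64
```

Because this statistic depends on where scans go rather than how many there are, it remains informative even when the worm varies its scan rate.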

In our research, we also investigate another new class of active worms, i.e., the Cam-

ouflaging Worm (C-Worm), which has the ability to camouflage its propagation from

worm detection systems [142] through timely manipulation of scan traffic. In order

to detect C-Worms, we design a novel spectrum-based scheme. Our performance

results demonstrate that our scheme can detect the C-Worm more rapidly and accu-

rately in comparison with existing worm detection schemes.
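The spectral intuition can be sketched as follows (a toy example, not the dissertation's detection scheme; the traffic series and its period are invented): a worm that rhythmically raises and lowers its scan rate to stay under volume thresholds leaves a strong periodic component in the scan-traffic time series, visible as a sharp peak in its power spectrum.

```python
# Toy sketch of spectrum-based detection (not the actual C-Worm detector).
import cmath
import math

def power_spectrum(x):
    """Naive DFT power spectrum of a real series (stdlib only)."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]          # remove the DC component
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2)]

n = 64
# Camouflaged scan traffic: oscillates with period 8 around a modest mean.
cworm = [100 + 80 * math.sin(2 * math.pi * t / 8) for t in range(n)]
spec = power_spectrum(cworm)
peak = max(range(1, len(spec)), key=lambda k: spec[k])
print(peak)  # 8  (n / period = 64 / 8: the oscillation's frequency bin)
```

A flat or slowly trending series, by contrast, concentrates its energy near frequency zero, so a dominant non-zero peak is a distinctive signature of manipulated scan traffic.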

2. Host-based Algorithm-oriented Attacks and Countermeasures

— Polymorphic Worm Detection via Mining Dynamic Program Execution

As discussed in Section 1.2, host-based worm detection systems commonly use a col-

lection of worm signatures to determine whether an incoming executable is a worm

or not. However, worm writers know that the detection algorithms in these detection systems expect signatures of known worms. Consequently, they can generate

new worms that have different binary representations or signatures. They can even generate polymorphic worms, which change their binary representations or signatures as part of the propagation process. Thus, existing host-based worm detection algorithms will not observe the expected worm signatures during the propagation of these evolved worms and cannot detect them effectively.

In order to detect these polymorphic worms or worms whose signatures are unknown

to the host-based worm detection systems, we propose a new worm detection ap-

proach based on mining dynamic program executions [129]. This approach can cap-

ture the dynamic behavior of executables to provide accurate and efficient detection

against both seen and unseen worms. In particular, we execute a large number of real-world worms and benign executables and trace their system calls. To mine the large number of features extracted from the system call traces, we apply two classifier learning algorithms (Naive Bayes and Support Vector Machine). The learned

classifiers are further used to carry out rapid worm detection with low overhead on

the end-host. Our experimental results clearly demonstrate the effectiveness of our

approach to detect new and polymorphic worms in terms of very high detection rate

and low false positive rate.
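The flavor of this classification step can be sketched with a toy multinomial Naive Bayes classifier over bags of system-call names (all traces and labels below are invented for illustration; this is not the dissertation's feature set, training data, or implementation):

```python
# Toy Naive Bayes over system-call "bags" with Laplace smoothing
# (hypothetical data; illustrative only).
import math
from collections import Counter

def train_nb(traces, labels):
    vocab = {c for t in traces for c in t}
    counts = {l: Counter() for l in set(labels)}
    prior = Counter(labels)
    for t, l in zip(traces, labels):
        counts[l].update(t)
    return vocab, counts, prior, len(labels)

def classify(model, trace):
    vocab, counts, prior, n = model
    def logp(l):
        total = sum(counts[l].values())
        return (math.log(prior[l] / n)
                + sum(math.log((counts[l][c] + 1) / (total + len(vocab)))
                      for c in trace))
    return max(prior, key=logp)

worm_traces = [["socket", "connect", "send", "fork"],
               ["socket", "connect", "connect", "send"]]
benign_traces = [["open", "read", "write", "close"],
                 ["open", "mmap", "read", "close"]]
model = train_nb(worm_traces + benign_traces, ["worm"] * 2 + ["benign"] * 2)
print(classify(model, ["socket", "connect", "send"]))  # worm
print(classify(model, ["open", "read", "close"]))      # benign
```

Because classification only requires summing precomputed log-probabilities over the observed calls, the trained model can run on the end-host with low overhead.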

Biological evolution is based on genetic mutation and natural selection, i.e., it relies on a brute-force, ad hoc search for better genetic adaptation to the environment [19]. Man-made entities do not follow this random and slow evolutionary principle; instead, humans control the evolution of their products and adapt them purposefully and effectively. We believe the defense-oriented attack evolution studied in this dissertation is among the most dangerous and efficient types of evolution, since attackers deliberately

evolve their malware in order to defeat their adversaries, i.e., defense systems. However,

our purpose is not to encourage attacks, but to obtain deep insights about potential new

Internet threats and vulnerabilities of current defense systems in order to enhance current

defense systems and design more effective defenses against evolving widespread Internet

attacks.

1.4 Organization of the Dissertation

The rest of this dissertation is organized as follows. We present our investigation of infrastructure-oriented evolving attacks and countermeasures in Chapters 2 and 3, and then discuss algorithm-oriented evolving attacks and countermeasures in Chapters 4 and 5. Specifically, we discuss intelligent DDoS attacks against SOFS systems and countermeasures in Chapter 2, and the iLOC attack against ITM systems and countermeasures in Chapter 3. Afterwards, we detail the VSR worm attack and our DEC detection countermeasure in Chapter 4, and introduce worm detection against polymorphic worms through mining dynamic program execution in Chapter 5. Finally, we conclude this dissertation in Chapter 6.

CHAPTER 2

INTELLIGENT DDOS ATTACKS AGAINST SECURE OVERLAY FORWARDING SYSTEMS AND COUNTERMEASURES

In this chapter, we discuss the first type of infrastructure-oriented attack in this dissertation, which infers architecture information of a DDoS (Distributed Denial of Service) defense system in order to facilitate DDoS attacks. In particular, we define intelligent

DDoS attacks and generalize DDoS-defending overlay intermediate forwarding systems as

Secure Overlay Forwarding Systems (SOFS). The intelligent DDoS attacks we study here are evolved DDoS attacks that use knowledge of SOFS architectures to launch more efficient

DDoS attacks against SOFS. We analyze the SOFS system under discrete round based attacks using a general analytical approach, and under continuous attacks using simulations. We also provide optimal system architecture configurations for the SOFS system under expected attack strategies and intensities. Furthermore, we propose a set of design guidelines to enhance SOFS performance under general attack scenarios.

2.1 Motivations

DDoS attacks are currently major threats to communication in the Internet [83]. The current level of system resilience to DDoS attacks is far from adequate, and a tremendous amount of research is being done to improve system security under DDoS attacks [104, 76, 95, 66, 60, 10, 115]. For many applications, reliability of communication

over the Internet is not only important but mandatory. Typical examples of such applica-

tions are emergency, medical, and other related services. The system needs to be resilient

to attacks from malicious users within and outside of the system that aim to disrupt com-

munications.

A recent body of work in the realm of protecting communications between a set of

clients and a server against DDoS attacks employs proactive defense mechanisms using

overlay-based architectures [60, 10, 115]. Typically, in such overlay-based architectures, a

set of system deployed nodes on the Internet form a communication bridge between clients

and a critical server. The deployed nodes are intermediate forwarders of communication

from clients to the server. These nodes are arranged into overlay-based architectures (or

structures) that provide attack-resistant features to the overall communication. For exam-

ple, the architecture in the SOS system [60] is a set of overlay nodes arranged in three layers between clients and the server, through which traffic is authenticated and then routed.

These layers are SOAP (Secure Overlay Access Point), Beacons and Secret Servlets. A client that wishes to communicate with a server first contacts a node in the SOAP layer.

The node in the SOAP layer forwards the message to a node in the beacon layer, which then forwards the message to a node in the secret servlet layer, which routes the message to the server. In the Mayday system [10], the authors extend the work on SOS [60], primarily by relaxing the restriction on the number of layers (unlike in SOS, where it is fixed at three). In the Internet Indirection Infrastructure (I3) [115], one or more indirection points are introduced as intermediaries for communication between senders and receivers.

The design rationale in all these systems is to ensure, using proactive architectures, (i)

that the server and intermediate communication mechanisms are hidden from outsiders, (ii)

the presence of multiple/alternate paths to improve reliability, and (iii) access control to prevent illegitimate users from being serviced and to drop attack traffic far away from the server. The overall objective, though, is to ensure high degrees of path availability from clients to the server even when attackers try to compromise communication using random congestion-based DDoS attacks, bombarding randomly chosen nodes in the system with huge amounts of traffic.

While the above systems provide high degrees of path availability under random congestion-based DDoS attacks, they can be targeted by intelligent attackers that can break into the system structure in addition to congesting nodes. By break-in attacks, we mean attacks that break into a node and disclose its neighbors in the communication chain. By combining break-in attacks with congestion attacks, attackers can cause significantly worse damage than with pure random congestion. In fact, attackers can use the results of break-in attacks (disclosed nodes) to guide subsequent congestion attacks on those nodes. Under intense break-in attacks, the attacker can traverse the communication chain between the forwarder nodes, and can even disclose the server to eventually congest it and completely annul services.

We believe that such intelligent DDoS attacks, which combine break-in attacks with congestion attacks, are representative and potent threats to overlay-based systems, such as [60, 10, 115], that protect communications between clients and servers. However, existing work does not study system performance under these intelligent attacks. In this chapter, we extensively study the performance of such overlay-based systems when targeted by intelligent DDoS attacks that combine break-in and congestion attacks. We also subsequently study how the design features of such systems impact performance under intelligent attacks. As a first step, we generalize such systems as Secure Overlay Forwarding Systems

(SOFS). We also capture three standard architectural features of such systems 1: layering (the number of layers between the client and server), mapping degree (the number of next-layer neighbors a node can communicate with), and node distribution (the number of nodes per layer).

Our objective is to study the impacts of the design features of SOFS system on its performance under intelligent DDoS attacks, and to provide guidelines to design SOFS systems highly resilient to intelligent DDoS attacks.

2.2 Background

The SOFS Architecture

In its most basic version, the SOFS architecture consists of a set of overlay nodes arranged in layers of a hierarchy, as shown in Fig. 2.1. The nodes in these layers serve as intermediaries between the clients and the critical target 2. Such a system has three distinguishable design features: Layering, Mapping (Connectivity) Degree, and Node Distribution across layers. Each feature is described below.


Figure 2.1: The generalized SOFS architecture.

1We use the terms architectural features and design features interchangeably in this chapter.
2We use the terms target and server interchangeably in this chapter.

• Number of Layers (Layering): The number of layers in the architecture quantifies

the depth of control during access to the target. If the number of layers is L, then

clients must pass through these L layers before communicating with the target. The

importance of layering is that a larger number of layers implicitly means that the target is better hidden from external clients.

• Mapping (Connectivity) Degree: Each node in Layer i routes to node(s) in Layer

i + 1 towards the target to complete the communication chain. The mapping degree

in the SOFS architecture is a measure of the number of neighbors a node in Layer i

has in Layer i + 1. Typically, the larger the mapping degree, the more reliable the communication is, due to the availability of more paths. The largest mapping degree is 1 to all, where each node in Layer i has all nodes in Layer i + 1 as its neighbors.

• Node Distribution: Node distribution is a measure of the number of nodes in each

layer. Intuitively, it may seem that a uniform node distribution across layers is preferred, to ensure a degree of load balancing in the system. However, for a fixed number of nodes to be distributed across a fixed number of layers, it may be advisable

to deploy more nodes at layers closer to the target to increase defenses in sensitive

layers nearer the target.

A client that wishes to communicate with the target first contacts node(s) in the first layer, which contact node(s) in the second layer, and so on until the traffic reaches the target. In this architecture, each node is aware only of neighbors in its adjacent layers. A set of filters acts as a firewall surrounding the target, through which only legitimate traffic is allowed.
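The structure described above can be sketched as a small simulation (illustrative only; the builder, parameters, and the path-availability check are invented for this sketch, not a deployed system):

```python
# Minimal sketch of a generalized SOFS: L layers of forwarders, each node
# knowing `mapping_degree` next-layer neighbors; a message is relayed layer
# by layer toward the target while compromised nodes are skipped.
import random

def build_sofs(nodes_per_layer, mapping_degree):
    """layers[i][j] -> list of neighbor indices in layer i+1."""
    layers = []
    for i in range(len(nodes_per_layer) - 1):
        nxt = nodes_per_layer[i + 1]
        m = min(mapping_degree, nxt)
        layers.append([random.sample(range(nxt), m)
                       for _ in range(nodes_per_layer[i])])
    return layers

def path_exists(layers, nodes_per_layer, bad):
    """Can some first-layer entry node still reach through the last layer?
    `bad` is a set of (layer, index) pairs for congested/broken-in nodes."""
    alive = {j for j in range(nodes_per_layer[0]) if (0, j) not in bad}
    for i, table in enumerate(layers):
        alive = {k for j in alive for k in table[j] if (i + 1, k) not in bad}
        if not alive:
            return False
    return bool(alive)

random.seed(1)
layers = build_sofs([4, 4, 4], mapping_degree=2)
print(path_exists(layers, [4, 4, 4], bad=set()))       # True
print(path_exists(layers, [4, 4, 4],
                  bad={(1, k) for k in range(4)}))     # False: middle layer gone
```

Compromising an entire layer severs every path, which is exactly the vulnerability the intelligent attacks of the next section try to exploit.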

2.3 Intelligent DDoS Attacks

We now discuss how an intelligent attacker can compromise the SOFS system. The attacker has the ability to break into nodes to disclose the victim nodes' next-layer neighbors, and also the ability to congest nodes to prevent them from servicing legitimate clients. We formally define these two attacks below.

• Break-in Attacks: The attacker has the ability to attempt to break into nodes in the SOFS system. A successful break-in disables the victim node and discloses the victim node's neighbors.

• Congestion Attacks: The attacker has the ability to congest nodes in the SOFS system. By congestion-based DDoS attacks, or simply congestion attacks, we mean any of the distributed attack methods that prevent a victim machine from providing services.

Our work focuses on the theoretical analysis of the impacts of intelligent DDoS attacks on the SOFS system, rather than on the actual attack methods. However, we believe that both the break-in attack and congestion attack models we present are practical. Break-in attacks can be executed through intrusion attacks, or through malicious code hidden in messages sent by malicious clients, as in Trojan horse or active worm attacks [83]. When received by the victim node, the malicious code can execute on the victim node to disable it and retrieve the victim node's neighbor list. The malicious code can even self-propagate to the disclosed neighbors. The execution of congestion attacks on a victim machine results in the victim being prevented from servicing requests, or being disconnected from the system. This can be due to exhausting a key resource, overloading the machine to disable communication, crashing its service, or blocking its network link. Typical examples are the TCP SYN attack, TCP and UDP flood attacks, the ICMP echo attack, and the Smurf attack [83]. The above two attacks can be conducted in several possible ways. However, keeping in mind the above attack types, and with the intention of maximizing attack impact, the attacker will usually first conduct break-in attacks to disclose the identities of many nodes; congestion attacks on the disclosed nodes then follow. In this realm, we define two attack models below.

• A discrete round based attack model: In our attack models, the attacker can launch break-in attacks on a limited number of nodes only. In the round based attack model, the attacker launches break-in attacks in a round-by-round fashion, with part of the attempts made in each round. The rationale is that, by successively breaking into nodes and locating their neighbors, the attacker can disclose more nodes. We call this model discrete because the attacker starts a fresh round only after the results of all attempted break-ins in the current round are available to it. Congestion attacks follow next and are conducted in one round.

• A continuous attack model: In this model, the attacker first attempts to disclose some nodes using part of its break-in attack resources. The attacker then continuously keeps breaking into disclosed nodes as and when they are identified. Congestion attacks follow in a similar fashion.

The attack models are described in more detail in Sections 2.4.1 and 2.4.2. We wish to emphasize that the SOFS system also has recovery abilities to defend against attacks. However, any meaningful execution of the recovery mechanism is contingent on the attacks. In some cases, the system may not be able to conduct any effective recovery if the attacker can conduct its attack rapidly, disrupting system performance for a short duration. However, if the attack is slow, the system can attempt effective recovery actions to restore performance. More details on system recovery are given in Section 2.4.2.

In this chapter, we study the SOFS system performance under discrete round based

attacks and continuous attacks. We demonstrate that system performance is sensitive to

design features and attacks, and the architecture needs to be flexible in order to achieve

better performance under different attacks.

2.4 Analysis of Intelligent DDoS Attacks against SOFS Systems

2.4.1 Analysis of Round based Intelligent DDoS Attacks

In this section we conduct an extensive mathematical analysis on the SOFS architecture

under the discrete round based intelligent DDoS attack model with no system recovery.

The system we study consists of a total of N overlay nodes that can be active or dormant.

By active, we mean that the nodes are currently in the SOFS architecture and ready to

serve legitimate requests 3. A dormant overlay node is one that is a part of the system but

currently is not in the SOFS architecture and is not serving requests. In this chapter, when

we use the term overlay node, it could mean either an active or a dormant node. We denote

the number of active nodes in the SOFS architecture (also called SOFS nodes) by n (n ≤ N); these nodes are distributed across L layers. Layer i has ni nodes, and $\sum_{i=1}^{L} n_i = n$. Each node in Layer i has one or more neighbors in its next higher layer to complete the communication chain. We define the number of next-layer (Layer i) neighbors that a Layer i − 1 node has as mi.

3In the remainder of this chapter, if the context is clear, we will simply use node or SOFS node to refer to an active node.

In this chapter, we assume that attack resources are limited. By attack resources, we mean the attack capacity, which depends on the amount of attack facilities; for instance, this can be the number of slave machines recruited by the attacker to launch DDoS attacks [83]. We denote the break-in attack and congestion attack resources as NT and NC, respectively. Thus NT and NC are the maximum numbers of nodes on which the attacker can launch break-in and congestion attacks, and NT + NC ≤ N. With probability PB, the attacker can successfully break into a node and disclose its neighbors in a break-in attempt.

In the SOFS system we study in this section, the system does not perform any recovery to counter attacks. The significance of our analysis and its results lies in obtaining a fundamental understanding of attack impacts on the system (and its features). Nevertheless, our analysis is still practical: in some cases, the speed of attacks may be quite high, preventing the system from performing recoveries, and our analysis then provides insights into the damage caused by such rapid/burst attacks. With the SOFS system and attack specifics in place, we now formally define our performance metric, PS, as the probability that a client can find a path to communicate with the target under ongoing attacks.

Under a One-burst Round Based Attack Model

1. Attack Model

The model we define here is an instance of the discrete round based attack model

where the number of rounds is 1. The attacker will spend all the break-in attack

resources randomly and instantly in one round and then launch the congestion attack.

Even though this model may appear simple, in reality such an attack is possible when, say, the system is in a high state of alert anticipating imminent attacks, which the attacker is aware of, and the attacker still wishes to proceed. Here we assume

the attacker has no prior knowledge about the identities of the SOFS nodes, i.e.,

which overlay nodes are currently SOFS nodes.

2. Analysis


Figure 2.2: A snapshot of the generalized SOFS architecture under intelligent DDoS attacks.

Our goal is to determine PS, the probability that a client can find a path to commu-

nicate with the target under attacks. This is directly related to the number of nodes

compromised due to attacks (both break-in and congestion attacks). Thus, the key

defining feature of our analysis is in determining the set 4 of attacked SOFS nodes

in each layer. An intuitive way to analyze the system is to list all possible combinations of attacked nodes in each layer, then calculate and summarize PS over all combinations. It is easy to see that there can be many such combinations: for a system with L layers and n nodes evenly distributed, the number of combinations is in $\theta((n/L)^{2L})$. For a system with 3 layers and 100 SOFS nodes evenly distributed, we have about $1.0 \times 10^{10}$ combinations. This is a very large number, and it is not practical

4We use the terms set and number of nodes in a set interchangeably.

to analyze the system in this fashion. To circumvent this scalability problem, we take an alternate approach: based on the weak law of large numbers, we use average-case analysis. We calculate the average number of attacked SOFS nodes in each layer to obtain PS. In the following, we first derive PS, which depends on the SOFS architecture and the number of attacked SOFS nodes in each layer. We then discuss how to calculate the number of attacked SOFS nodes in each layer (including nodes broken into and congested).

1) Derivation of PS

Recall that PS is the probability that a client can successfully communicate with the target under attacks, which depends on the SOFS architecture and number of attacked

SOFS nodes. In the SOFS architecture, each SOFS node maintains a neighbor/routing table consisting of a number (determined by the mapping degree) of SOFS nodes in its next higher layer with which it can communicate. Upon receiving a message, a node in Layer i contacts a node in Layer i + 1 from its neighbor table and forwards the received message to that node. This process repeats until the target is reached via the nodes in successive higher layers. The routing thus takes place through active

SOFS nodes in a distributed fashion. We call a node bad or compromised if it has either been broken into or is congested, and thus cannot route a message; the other overlay nodes are good or alive nodes. During break-in or congestion attacks, a neighbor table may contain entries pointing to bad neighbors, which can cause a message delivery to fail. A snapshot of the system under an ongoing attack is shown in Fig. 2.2.

To compute PS, we should first know the probability Pi that a message can be successfully forwarded from Layer i − 1 to Layer i (1 ≤ i ≤ L + 1). Here Layer L + 1 refers to the set of filters that surround the target, which are also intermediate forwarders. We include this layer in our analysis because filter identities can be disclosed during a successful break-in at Layer L. By the property of the distributed routing algorithm, we can obtain PS as the direct product of all the Pi's, i.e., $P_S = \prod_{i=1}^{L+1} P_i$. Obviously, Pi depends on the availability of good nodes in Layer i that are in the routing tables of nodes in Layer i − 1. To this end, we define P(x, y, z) as the probability that a set of y nodes selected at random from x > y nodes contains a specific subset of z nodes. Then

$$P(x, y, z) = \binom{y}{z} \Big/ \binom{x}{z} \ \text{ if } y \ge z, \text{ and } 0 \text{ otherwise}.$$

We denote by si the number of bad SOFS nodes in Layer i. Recall that each SOFS node in Layer i − 1 has mi neighbors in Layer i. Then, on average, P(ni, si, mi) is the probability that all next-hop neighbors in Layer i of a node in Layer i − 1 are bad nodes. Hence Pi = 1 − P(ni, si, mi), and the probability PS that a message will be successfully received by the target can be expressed as

$$P_S = \prod_{i=1}^{L+1} P_i = \prod_{i=1}^{L+1} \bigl(1 - P(n_i, s_i, m_i)\bigr). \quad (2.1)$$

In (2.1), only si (the number of bad nodes) is undetermined. If we define bi and ci as the numbers of broken-into and congested nodes, respectively, in Layer i, then si = bi + ci. In the following we derive bi and ci.
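As a sanity check, the binomial ratio P(x, y, z) and Eq. (2.1) can be transcribed directly (the parameter values below are illustrative, not taken from the dissertation's evaluation):

```python
# Direct transcription of P(x, y, z) and Eq. (2.1).
from math import comb

def P(x, y, z):
    """Probability that y nodes drawn from x contain a given subset of z."""
    return comb(y, z) / comb(x, z) if y >= z else 0.0

def Ps(n, s, m):
    """Eq. (2.1): path availability over layers 1..L+1, given per-layer
    node counts n[i], bad-node counts s[i], and mapping degrees m[i]."""
    out = 1.0
    for ni, si, mi in zip(n, s, m):
        out *= 1.0 - P(ni, si, mi)
    return out

# Three layers of 30 nodes plus 10 filters; 5 bad nodes per layer,
# mapping degree 3 between layers (illustrative numbers).
print(round(Ps(n=[30, 30, 30, 10], s=[5, 5, 5, 0], m=[3, 3, 3, 3]), 4))  # 0.9926
```

Note that a layer with fewer bad nodes than the mapping degree contributes a factor of exactly 1, since no neighbor table can then consist entirely of bad nodes.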

2) Derivation of bi

In the one-burst round based attack model, bi depends on the break-in resource NT and the break-in probability PB. Since the attacker launches its break-in attacks randomly, the NT break-in attempts are uniformly distributed over the overlay nodes in the SOFS system. Thus the average number of broken-in SOFS nodes is $N_B = P_B \frac{n}{N} N_T$, and hence

$$b_i = P_B \left(\frac{n_i}{N}\right) N_T, \quad i = 1, \dots, L. \quad (2.2)$$

We assume here that the filters are well protected and cannot be broken into. Filters are special: they are not among the N overlay nodes, and thus are not targets of random attacks. Hence bL+1 = 0.

3) Derivation of ci

We now discuss the derivation of ci (the number of congested nodes in Layer i). Unlike bi, ci depends on the result of the break-in attacks and the congestion capacity NC. Thus, we first need to know the set of SOFS nodes in Layer i disclosed during the break-in attack phase. We divide the disclosed nodes in Layer i into three sets: (i) the set of nodes on which break-in attempts have not been made (denoted $d_i^N$), (ii) the set of nodes that have been unsuccessfully broken into (denoted $d_i^A$), and (iii) the set of nodes that were successfully broken into (which we need not consider here). The nodes in sets $d_i^N$ and $d_i^A$ will now be targeted by congestion attacks. We calculate $d_i^N$ and $d_i^A$ as follows. Let $Y_{i,j}$ be a random variable whose value is 1 when the jth node in Layer i is either a disclosed node or one on which a break-in attempt has been made. Let zi denote the average number of nodes that have been disclosed or on which break-in attempts have been made. Thus,

$$z_i = E\left(\sum_{j=1}^{n_i} Y_{i,j}\right) = \sum_{j=1}^{n_i} E(Y_{i,j}) = \sum_{j=1}^{n_i} \Pr\{Y_{i,j} = 1\}, \quad i = 1, \dots, L+1. \quad (2.3)$$

Denoting by hi the number of nodes in Layer i on which break-in attempts have been made, we have $h_i = N_T \left(\frac{n_i}{N}\right)$ for $i = 1, \dots, L$, and $h_{L+1} = 0$, because filters are not targets of break-in attacks, as discussed above. Thus, the probability that the jth node in Layer i is neither a disclosed node nor one on which a break-in attempt has been made is $(1 - \frac{m_i}{n_i})^{b_{i-1}} (1 - \frac{h_i}{n_i})$. The same node can be disclosed by more than one node in the previous layer; the factor $(1 - \frac{m_i}{n_i})^{b_{i-1}}$ excludes such overlaps. We now have

$$\Pr\{Y_{i,j} = 1\} = 1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right), \quad i = 1, \dots, L+1, \; j = 1, \dots, n_i. \quad (2.4)$$

$$z_i = \sum_{j=1}^{n_i} \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right)\right) = n_i \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right)\right), \quad i = 1, \dots, L+1. \quad (2.5)$$

We hence have,

$$d_i^N = z_i - h_i = n_i \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}} \left(1 - \frac{h_i}{n_i}\right)\right) - h_i, \quad i = 2, \dots, L+1. \quad (2.6)$$

$$d_i^A = \sum_{j=1}^{h_i - b_i} \left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}}\right) = (h_i - b_i)\left(1 - \left(1 - \frac{m_i}{n_i}\right)^{b_{i-1}}\right), \quad i = 2, \dots, L+1. \quad (2.7)$$

Note that nodes in the first layer cannot be disclosed by a break-in attack, and so $d_1^N = d_1^A = 0$.

The attacker will now congest the SOFS nodes in the sets $d_i^N$ and $d_i^A$, as their identities have been disclosed and they have not been successfully broken into. We denote by ND the average number of SOFS nodes that are disclosed but not successfully broken into; it is given by $N_D = \sum_{i=1}^{L+1} (d_i^N + d_i^A)$. We now proceed to derive ci,

the number of congested nodes in Layer i. Recall that NC is the overall number of

overlay nodes that the adversary can congest, and the congestion attacks follow after

break-in attacks. There are two cases here:

• NC ≥ ND: In this case, all ND disclosed SOFS nodes will be congested. Since the attacker still has capacity to congest NC − ND overlay nodes, it will expend its spare resources randomly. The extra congested nodes will be chosen uniformly at random from the remaining $N - N_B - (N_D - d_{L+1}^N - d_{L+1}^A)$ good overlay nodes, among which only a part are SOFS nodes. Here $d_{L+1}^N$ and $d_{L+1}^A$ are parts of the filters and hence are excluded from ND when determining the remaining overlay nodes that are targets for random congestion attacks 5. Therefore,

$$c_i = \begin{cases} d_i^N + d_i^A + (N_C - N_D) \cdot \dfrac{n_i - b_i - d_i^N - d_i^A}{N - N_B - (N_D - d_{L+1}^N - d_{L+1}^A)}, & i = 1, \dots, L, \\[2mm] d_{L+1}^N, & i = L+1. \end{cases} \quad (2.8)$$

• NC < ND: The attacker randomly congests NC nodes among the ND disclosed nodes. In this case,

$$c_i = \frac{N_C}{N_D} \left(d_i^N + d_i^A\right), \quad i = 1, 2, \dots, L+1. \quad (2.9)$$

Recall that si = bi + ci is the number of bad nodes in Layer i. Having thus computed bi and ci, we obtain PS from (2.1).
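The full one-burst computation, Eqs. (2.2)-(2.9) followed by Eq. (2.1), can be transcribed numerically as below. This is a sketch with illustrative parameters, not the dissertation's evaluation code; in particular, the binomial ratio is evaluated as a falling-factorial product so that the fractional average node counts produced by the equations are accepted.

```python
# Numerical sketch of the one-burst analysis, Eqs. (2.2)-(2.9) then (2.1).

def P(x, y, z):
    """Continuous relaxation of C(y, z) / C(x, z) for average (fractional) y."""
    if y < z:
        return 0.0
    p = 1.0
    for k in range(z):
        p *= (y - k) / (x - k)
    return p

def one_burst_Ps(N, n_layers, m, n_filters, NT, NC, PB):
    """P_S under the one-burst attack model; n_layers[i] is the number of
    SOFS nodes in layer i+1, m a uniform mapping degree across layers."""
    L = len(n_layers)
    n = n_layers + [n_filters]                    # index L = filter layer
    b = [PB * (n[i] / N) * NT for i in range(L)] + [0.0]      # Eq. (2.2)
    h = [NT * (n[i] / N) for i in range(L)] + [0.0]           # attempts
    dN, dA = [0.0] * (L + 1), [0.0] * (L + 1)     # first layer stays 0
    for i in range(1, L + 1):
        hidden = (1 - m / n[i]) ** b[i - 1]       # overlap-excluding factor
        z = n[i] * (1 - hidden * (1 - h[i] / n[i]))           # Eq. (2.5)
        dN[i] = z - h[i]                                      # Eq. (2.6)
        dA[i] = (h[i] - b[i]) * (1 - hidden)                  # Eq. (2.7)
    ND = sum(dN) + sum(dA)
    NB = PB * (sum(n_layers) / N) * NT
    c = [0.0] * (L + 1)
    if NC >= ND:                                              # Eq. (2.8)
        pool = N - NB - (ND - dN[L] - dA[L])
        for i in range(L):
            c[i] = dN[i] + dA[i] + (NC - ND) * (n[i] - b[i] - dN[i] - dA[i]) / pool
        c[L] = dN[L]
    else:                                                     # Eq. (2.9)
        c = [(NC / ND) * (dN[i] + dA[i]) for i in range(L + 1)]
    Ps = 1.0                                                  # Eq. (2.1)
    for i in range(L + 1):
        Ps *= 1.0 - P(n[i], min(n[i], b[i] + c[i]), min(m, n[i]))
    return Ps

# Three layers of 33/33/34 nodes plus 10 filters out of N = 10000.
moderate = one_burst_Ps(10000, [33, 33, 34], 3, 10, NT=200, NC=2000, PB=0.5)
heavy = one_burst_Ps(10000, [33, 33, 34], 3, 10, NT=200, NC=6000, PB=0.5)
print(round(moderate, 3), round(heavy, 3))  # heavier congestion lowers P_S
```

Running the sketch with varying L, m, NT, and NC reproduces the qualitative trends discussed below: path availability falls as attack intensity grows, and a large mapping degree helps under pure congestion but hurts once break-ins disclose many neighbors.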

3. Numerical Results and Discussion

We now present numerical results based on our analysis above. We specifically

highlight the overall sensitivity of system performance to attacks and the impacts of

5In our model, the filters' identities are hidden from attackers, and they can be congested only upon disclosure by break-in attacks.

specific SOFS design features (layering and mapping degree) on performance under attacks. Impacts of node distribution per layer are discussed in the successive round based attack model in Section 2.4.1.

Fig. 2.3 shows the relationship between PS and the layering and mapping degree under different attack intensities. The mapping degrees (referred to as m in the figures) used here are: 1 to 1 mapping, in which each SOFS node has only one neighbor in the next layer; 1 to half mapping, in which each node has half of the nodes in the next layer as its neighbors; and 1 to all mapping, in which each node has all the nodes in the next layer as its neighbors. The other system and attack configuration parameters are: N = 10000, n = 100, PB = 0.5, SOFS nodes evenly distributed among the layers, and 10 filters. In Fig. 2.3 (a), NT is set to 0 and we evaluate performance under two congestion intensities, NC = 2000 and NC = 6000, representing moderate and heavy congestion attacks. In Fig. 2.3 (b), we fix NC = 2000 and analyze two break-in intensities, NT = 200 and NT = 2000. We make the following observations.


Figure 2.3: Sensitivity of PS to L and mi under different attack intensities.

Fig. 2.3 (a) shows that under the same attack intensities, different layer numbers result in different PS. When NT = 0 (pure random congestion attack), PS goes down as L increases. This is because there are fewer nodes per layer, so under random congestion fewer nodes per layer are left unaffected. This behavior is more pronounced when the mapping degree is small. We remind the reader of the SOS architecture [60], where, for defending against random congestion-based DDoS attacks (the same attack model as in this instance), the number of layers is fixed at 3 and the mapping degree is 1 to all. From Fig. 2.3 (a), we can see that fixing the number of layers at 3 is not always the best way to defend against such attacks. Instead, 1 layer is the best configuration against pure congestion-based attacks.

For any L, a larger mapping degree (more neighbors for each node) means more paths from nodes in one layer to nodes in the next layer, thus increasing PS, as seen in Fig. 2.3 (a) in the absence of break-in attacks. Under break-in attacks, a high mapping degree is not always good, as more nodes are disclosed due to break-ins. For instance, when the mapping is 1 to all, PS = 0 in Fig. 2.3 (b). Thus the effect of mapping depends on the attack intensities in the break-in and congestion phases. Finally, we see that an increase in NC and NT (attack intensities) leads to a decrease in PS, as more nodes can be congested or broken into, reducing path availability.

Under a Successive Round Based Attack

1. Attack Model

In the following, we extend our one-burst attack model significantly in order to study

performance under a highly sophisticated attack model called the successive round based attack model (successive attack for short). The successive attack model is representative of sophisticated attacks targeting the SOFS system and extends the one-burst attack model in two ways: (i) the attacker exploits prior knowledge about the

first layer SOFS nodes. Let PE represent the percentage of nodes in the first layer

known to the attacker prior to the attack (typically, these are first layer nodes advertised

to clients), (ii) the break-in attack phase is conducted in R rounds (R > 1), i.e.,

the attacker will launch its break-in attacks successively rather than in one burst. In

this attack model, more SOFS nodes are disclosed in a round by round fashion thus

accentuating the effect of break-in attacks.

The strategy of the successive attack is shown in Procedure 1. We denote β to be

the available break-in attack resources at the start of each round, and β = NT at the

start of round 1. For each round, the attacker will try to break into a minimum of α nodes, where α is fixed as NT/R. If the number of disclosed nodes is more than α, the attacker borrows resources from β to attack all disclosed nodes. Otherwise, it attacks the disclosed nodes plus some other randomly chosen nodes so as to expend α resources

for that round. The break-in attack capacity available (β) keeps decreasing till the

attacker has exhausted all of its NT resources. At any round, if the attacker has

discovered more nodes than its available capacity (β), it tries to break into a subset

(β) of the disclosed nodes and then starts the congestion phase. The attacker will congest

Procedure 1: Pseudocode of the successive attack strategy

System parameters: N, n, L, PB; Attack parameters: NT, NC, R, X1, β, α
Phase 1, Break-in attack:
1: β = NT, α = NT/R;
2: for j = 1 to R do
3:   if Xj < α < β then
4:     launch break-in attack on all Xj nodes, randomly launch break-in attack on α − Xj more nodes, and calculate the set of Xj+1 disclosed nodes; update β = β − α;
5:   end if
6:   if Xj < β ≤ α then
7:     launch break-in attack on all Xj nodes, randomly launch break-in attack on β − Xj more nodes, and calculate the set of Xj+1 disclosed nodes; break;
8:   end if
9:   if α ≤ Xj < β then
10:    launch break-in attack on all Xj nodes and calculate the set of Xj+1 disclosed nodes; update β = β − Xj;
11:  end if
12:  if Xj ≥ β then
13:    launch break-in attack on β nodes among the Xj nodes and calculate the set of Xj+1 disclosed nodes; break;
14:  end if
15: end for
16: calculate ND;
Phase 2, Congestion attack:
1: if NC > ND then
2:   congest the ND nodes and randomly congest (NC − ND) more nodes;
3: else
4:   congest NC nodes randomly chosen among the ND nodes;
5: end if
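The resource accounting in Procedure 1's break-in phase can be sketched in executable form. This is an illustrative Python rendering under simplifying assumptions (integer budgets, callback-based disclosure and break-in models); none of the names come from the dissertation.

```python
import random

def successive_breakin(x1_ids, all_ids, n_t, rounds, disclose, try_break):
    """Sketch of Procedure 1's break-in phase (an illustration, not the exact
    pseudocode). `disclose(v)` returns node ids revealed by compromising v;
    `try_break(v)` returns True on a successful break-in. Both callbacks,
    and all names here, are hypothetical."""
    beta = n_t                       # remaining break-in budget
    alpha = n_t // rounds            # per-round minimum effort (N_T / R)
    known = set(x1_ids)              # X_j: disclosed but not yet attacked
    attacked, broken = set(), set()
    for _ in range(rounds):
        if len(known) >= beta:       # case X_j >= beta: spend what is left
            targets = set(random.sample(sorted(known), beta))
            beta = 0
        else:                        # attack all of X_j, pad randomly to alpha
            budget = min(max(alpha, len(known)), beta)
            pool = [v for v in all_ids if v not in attacked and v not in known]
            targets = known | set(random.sample(pool, budget - len(known)))
            beta -= budget
        newly = set()
        for v in sorted(targets):
            attacked.add(v)
            if try_break(v):
                broken.add(v)
                newly.update(disclose(v))
        known = newly - attacked     # feeds X_{j+1} and, finally, N_D
        if beta == 0:
            break
    return broken, known
```

The returned `known` set corresponds to the disclosed-but-unattacked nodes that the congestion phase targets first.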

all disclosed nodes and more, or only a subset of the disclosed nodes, depending on its congestion capacity NC. Here we assume the attacker will not attempt to break into a node twice, and that a node already broken into will not be targeted by the congestion attack. Although there can be other variations of such successive attacks, we believe ours is a sufficiently representative model of sophisticated attacks.

2. Analysis

We again take the average case analysis approach and use a method similar to the one used to derive PS in (2.1). In calculating bi and ci in the one-burst attack model we analyzed before,

we had to take care of two possible overlap scenarios (i) a disclosed node could have

been already broken-into, (ii) the same node being disclosed by multiple lower layer

nodes. The complexity in overlap is accentuated here due to the nature of successive

attacks. This is because there are multiple rounds of break-in attacks before conges-

tion. We thus have to consider the above overlaps in the case of multiple rounds as

well. In the following, we will first introduce a concept of SOFS node demarcation

in order to deal with the above overlaps, and then derive bi and ci in each

round.

1) Node demarcation

In order to preserve the information about a node per round and across layers, we

introduce subscript j for round information, and subscript i for layer information.

We define Xj as the number of nodes whose identities are known to the attacker at

the start of round j. In order to deal with overlaps within and between rounds, we

need to separate the SOFS nodes into multiple sets as follows. At the beginning of

each round j, the attacker will base its break-in attack on the set of nodes disclosed

at the completion of round j − 1. We denote the set of nodes which are disclosed at

at round j−1 and on which break-in attempts are made in round j, as h^D_{i,j}. Depending on its spare capacity for that round, the attacker can also select more nodes to randomly break into. We denote this set as h^A_{i,j}. We define h_{i,j} = h^D_{i,j} + h^A_{i,j}, which is the number of nodes on which break-in attempts (successful or not) have been made

at Layer i in round j. Once the attacker has launched its break-in attacks on these h_{i,j} nodes, it will successfully break into a set of nodes. We denote b^D_{i,j} and b^A_{i,j} as the sets of nodes successfully broken into, and u^D_{i,j} and u^A_{i,j} as the sets of nodes unsuccessfully broken into, after the attacker launches its break-in attacks on the h^D_{i,j} and h^A_{i,j} sets of nodes respectively. We have,

b^D_{i,j} = PB * h^D_{i,j}, and b^A_{i,j} = PB * h^A_{i,j}, i = 1, ..., L, (2.10)

u^D_{i,j} = (1 − PB) * h^D_{i,j}, and u^A_{i,j} = (1 − PB) * h^A_{i,j}, i = 1, ..., L. (2.11)
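The expected-value bookkeeping of (2.10) and (2.11) amounts to scaling the per-layer attempt counts by PB. A minimal sketch, with our own illustrative names:

```python
# Average-case sketch of (2.10)-(2.11): each attempted node is broken into
# independently with probability P_B, so the expected successful (b) and
# unsuccessful (u) counts are plain scalings of the guided (h^D) and random
# (h^A) attempt counts per layer.

def breakin_outcome(h_D, h_A, p_b):
    b_D = [p_b * h for h in h_D]          # expected b^D_{i,j}
    b_A = [p_b * h for h in h_A]          # expected b^A_{i,j}
    u_D = [(1 - p_b) * h for h in h_D]    # expected u^D_{i,j}
    u_A = [(1 - p_b) * h for h in h_A]    # expected u^A_{i,j}
    return b_D, b_A, u_D, u_A

b_D, b_A, u_D, u_A = breakin_outcome([8, 4, 2], [10, 10, 10], p_b=0.5)
```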


Figure 2.4: Node demarcation in our successive attack at the end of Round j.

Breaking into nodes in the sets b^D_{i,j} and b^A_{i,j} will disclose a set of nodes denoted by d^W_{i,j}. This set d^W_{i,j} will overlap with (i) the nodes attacked in all previous rounds, given by Σ_{k=1}^{j−1} h_{i,k}, (ii) the nodes in set b^A_{i,j}, (iii) the nodes in sets b^D_{i,j} and u^D_{i,j}, and (iv) the nodes in set u^A_{i,j}, where we denote the set of nodes in d^W_{i,j} overlapping with u^A_{i,j} as d^A_{i,j}. Fig. 2.4 shows such overlaps at the end of round j. After discounting all the above overlaps from d^W_{i,j}, we get the set of disclosed nodes which have not been attacked till the end of round j, denoted as d^N_{i,j}. Based on the definitions of h^D_{i,j} and d^N_{i,j}, and the fact that the filters are not targets of break-in attacks, we have

h^D_{i,j} = d^N_{i,j−1}, i = 1, ..., L. (2.12)

Note that d^N_{i,j−1} and d^A_{i,j−1} are 0 for i = 1. This is because the nodes at the first layer cannot be disclosed by means of a break-in attack in any round j. Recall that Xj is the set of disclosed nodes whose identities are known to the attacker before round j and on which break-in attacks will be made at round j. Thus it can be calculated as Xj = Σ_{i=1}^{L} d^N_{i,j−1}. In the following, we proceed to derive the number of broken-into nodes (bi) and then compute the number of congested nodes (ci) for each round.

2) Derivation of bi

To derive bi, we first need to calculate the sets defined above. For ease of elucidation, we take the representative case Xj < α < β in Procedure 1 as an example to explain our analysis. Recall that β is the amount of break-in attack resource available at the current round. This is the most representative case among the ones possible; we briefly discuss the other possible cases after analyzing it. In this case, at the beginning of round j of its break-in attack phase, the attacker has resources to break into more nodes than those already disclosed prior to that round (d^N_{i,j−1}), and has attack resources left (α − Xj) to conduct random break-ins on other overlay nodes. Now there are N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k} unattacked overlay nodes, and among them n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k} are at Layer i. Thus, the number of nodes (h^A_{i,j}) on which random break-in attempts are made on Layer i in round j is

h^A_{i,j} = [(n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k}) / (N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k})] * (α − Xj), i = 1, 2, ..., L. (2.13)

We define b_{i,j} as the number of nodes broken into on Layer i in round j, which is the summation of b^A_{i,j} and b^D_{i,j}.^6 Based on (2.10), (2.12) and (2.13), we have,

b_{i,j} = PB * [(n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k}) / (N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k})] * (α − Xj) + PB * d^N_{i,j−1}, i = 1, 2, ..., L. (2.14)

We can now obtain bi as,

bi = Σ_{k=1}^{J} b_{i,k}, i = 1, 2, ..., L, (2.15)

where J is the number of rounds the attacker takes to exhaust all of its break-in resources (NT). Note that J ≤ R.
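For the representative case, (2.14) can be coded directly as a per-layer expectation. This is a hedged sketch with our own names; the filter layer is omitted and the case Xj < α < β is assumed.

```python
def broken_in_round(n_i, d_prev, h_hist, N, x_j, alpha, p_b):
    """Expected b_{i,j} per layer, following (2.14) for the X_j < alpha < beta
    case. d_prev[i] stands for d^N_{i,j-1}; h_hist[i] for the sum of h_{i,k}
    over rounds k < j. Illustrative names; filters (Layer L+1) are excluded."""
    unattacked = N - x_j - sum(h_hist)           # overlay nodes never attacked
    b = []
    for ni, dp, hh in zip(n_i, d_prev, h_hist):
        random_part = p_b * (ni - dp - hh) / unattacked * (alpha - x_j)
        guided_part = p_b * dp                   # break-ins on disclosed nodes
        b.append(random_part + guided_part)
    return b

# Toy numbers: 4 layers of 25 nodes, 4 nodes already disclosed on layer 2.
b = broken_in_round([25, 25, 25, 25], [0, 4, 0, 0], [0, 0, 0, 0],
                    N=100, x_j=4, alpha=24, p_b=0.5)
```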

To obtain bi, we need to compute the set of nodes d^N_{i,j}, which is used in (2.14). As discussed above, we have to extract the set d^N_{i,j} from d^W_{i,j}. Similar to the discussion in the one-burst attack model, we can derive d^N_{i,j} and d^A_{i,j} as follows. We first calculate the set of nodes that have been either disclosed or attacked. This is given by,

z_{i,j} = n_i * (1 − (1 − m_i/n_i)^{b_{i−1,j}} * (1 − Σ_{k=1}^{j} h_{i,k} / n_i)), for b_{i−1,j} > 0 and i = 2, ..., L + 1. (2.16)

Note that in our attack model, the attacker will not try to break into a node twice. Hence, to calculate d^N_{i,j} from z_{i,j}, we subtract the nodes on which break-in attempts have been made (Σ_{k=1}^{j} h_{i,k}). Thus, we have,

d^N_{i,j} = z_{i,j} − Σ_{k=1}^{j} h_{i,k}, for b_{i−1,j} > 0 and i = 2, ..., L + 1. (2.17)

Having computed d^N_{i,j}, we can use (2.14) and (2.15) to obtain bi. Now, d^A_{i,j} (which will be used to compute ci) is given by,

d^A_{i,j} = (h^A_{i,j} − b^A_{i,j}) * (1 − (1 − m_i/n_i)^{b_{i−1,j}}), for b_{i−1,j} > 0 and i = 2, ..., L + 1. (2.18)

6: Recall that h^D_{L+1,j}, h^A_{L+1,j} and b_{L+1,j} are all 0 because filters are not targets of break-in attacks.
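Equations (2.16)-(2.18) for a single layer can be sketched as follows; `b_prev`, `h_cum` and the function name are our own shorthand, not the dissertation's notation.

```python
def disclosed_after_round(n_i, m_i, b_prev, h_cum, h_A, b_A):
    """One-layer sketch of (2.16)-(2.18). b_prev = b_{i-1,j} nodes broken into
    in the previous layer, h_cum = sum of h_{i,k} up to round j. Returns the
    fresh disclosures d^N_{i,j} and the overlap set d^A_{i,j}. Names ours."""
    miss = (1 - m_i / n_i) ** b_prev          # node escapes every neighbor list
    z = n_i * (1 - miss * (1 - h_cum / n_i))  # disclosed or attacked, (2.16)
    d_N = z - h_cum                           # disclosed, not attacked, (2.17)
    d_A = (h_A - b_A) * (1 - miss)            # overlap with u^A set, (2.18)
    return d_N, d_A

# Toy layer: 100 nodes, 1-to-2 mapping, 1 break-in below, 10 attempted so far.
d_N, d_A = disclosed_after_round(100, 2, b_prev=1, h_cum=10, h_A=5, b_A=2)
```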

Having discussed the necessary derivations for the representative case above in detail, we now address the remaining cases of the successive attack. Apart from the representative case we have just discussed, there are three other cases: (i) Xj < β ≤ α, (ii) α ≤ Xj < β, and (iii) β ≤ Xj. For case (i), all the formulas derived for the above case can be directly applied, except that α has to be replaced by β. For case (ii), all the formulas in the above case can be applied except that h^A_{i,j} = 0. For case (iii), we have h^A_{i,j} = 0, and the formulas derived in the representative case have to be suitably modified. In this case, there are some disclosed nodes that the attacker does not try to break into because it has consumed all of its break-in resources. Such nodes will be attacked during the congestion phase. We denote this set of nodes in Layer i after round j as f_{i,j}. Note that f_{i,j} has relevance (f_{i,j} > 0) only when the attacker completes its break-in attack phase at round j; in this case, there is no resource left for random break-in attacks. Only β disclosed nodes, uniformly randomly distributed across the layers, will be subjected to break-in attempts. Then we have,

f_{i,j} = d^N_{i,j−1} − d^N_{i,j−1} * (β / Xj), h^A_{i,j} = 0, h^D_{i,j} = d^N_{i,j−1} − f_{i,j}, for i = 1, 2, ..., L, and (2.19)

d^N_{i,j} = n_i * (1 − (1 − m_i/n_i)^{b_{i−1,j}} * (1 − (Σ_{k=1}^{j} h_{i,k} + Σ_{k=1}^{j} f_{i,k}) / n_i)) − Σ_{k=1}^{j} h_{i,k} − Σ_{k=1}^{j} f_{i,k}, i = 2, ..., L + 1, (2.20)

where b_{i−1,j} > 0. Here, d^A_{i,j} is the same as in (2.18), and f_{L+1,j} = 0 because filters are not targets of break-in attacks. With the above derivations for this case, we can now use (2.14) and (2.15) to calculate bi.

3) Derivation of ci

Recall that in the congestion attack phase, the attacker will first congest the SOFS nodes disclosed during the break-in attack phase. Let the final round of the break-in attack be J (J ≤ R). Denoting ND as the number of nodes disclosed but not broken into, based on the definitions of u^D_{i,j}, d^N_{i,j}, d^A_{i,j} and f_{i,j}, we have,

ND = Σ_{i=1}^{L} Σ_{k=1}^{J} u^D_{i,k} + Σ_{k=1}^{J} d^N_{L+1,k} + Σ_{i=2}^{L} d^N_{i,J} + Σ_{i=1}^{L} f_{i,J} + Σ_{i=1}^{L} Σ_{k=1}^{J} d^A_{i,k}. (2.21)

The total number of broken-into nodes is NB = Σ_{i=1}^{L} Σ_{k=1}^{J} b_{i,k}. If NC ≥ ND, then, similar to (2.8), the number of congested nodes per layer, ci, is

ci = [Σ_{k=1}^{J} u^D_{i,k} + d^N_{i,J} + Σ_{k=1}^{J} d^A_{i,k} + f_{i,J}] + (NC − ND) * (n_i − Σ_{k=1}^{J} b_{i,k} − Σ_{k=1}^{J} u^D_{i,k} − d^N_{i,J} − Σ_{k=1}^{J} d^A_{i,k} − f_{i,J}) / (N − NB − (ND − Σ_{k=1}^{J} d^N_{L+1,k})), for i = 1, ..., L;
ci = Σ_{k=1}^{J} d^N_{L+1,k}, for i = L + 1. (2.22)

If NC < ND, then, similar to (2.9), we have

ci = (NC / ND) * (Σ_{k=1}^{J} u^D_{i,k} + d^N_{i,J} + f_{i,J} + Σ_{k=1}^{J} d^A_{i,k}), for i = 1, ..., L;
ci = (NC / ND) * Σ_{k=1}^{J} d^N_{L+1,k}, for i = L + 1. (2.23)

Recall that si = bi +ci is the set of bad nodes in Layer i. We can now obtain PS from

(2.1).
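The congestion allocation above can be illustrated with a simplified per-layer sketch of the NC ≥ ND branch; it ignores the separate filter-layer term of (2.22), and all names are ours.

```python
def congested_when_budget_exceeds(disclosed, good, n_c):
    """Sketch of the N_C >= N_D branch of (2.22): congest every disclosed-but-
    not-broken node (disclosed[i] per layer), then spread the leftover budget
    uniformly over the remaining good nodes (good[i] per layer). The filter
    layer's separate bookkeeping is omitted; all names are illustrative."""
    n_d = sum(disclosed)
    leftover = n_c - n_d                  # must be >= 0 in this branch
    total_good = sum(good)
    return [d + leftover * g / total_good for d, g in zip(disclosed, good)]

# 3 toy layers, budget 30 against 20 disclosed nodes.
c = congested_when_budget_exceeds([4, 6, 10], [20, 30, 50], n_c=30)
```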

Note that prior knowledge about the identities of the first layer SOFS nodes (PE) determines X1, i.e., X1 = n1 ∗ PE. In fact, we can consider this information as that obtained from a break-in attack at Round 0. The number of nodes "disclosed" at Round 0 is thus n1 ∗ PE, all of which are distributed at the first layer. At round 1, the attacker will launch its break-in attack based on this information. Thus b_{i,j}, d^N_{i,j}, ci, etc., can be calculated by applying Formulas (2.10) to (2.23). We point out that if we set PE = 0 and R = 1, the successive attack model degenerates into the one-burst attack model, and the formulas to compute b_{i,j}, d^N_{i,j}, ci, etc., simplify to the corresponding ones derived in the previous sub-section.

3. Numerical Results

In the following, we discuss the system performance (PS) under successive attacks.

Unless otherwise specified, the default system and attack parameters are N = 10000,

n = 100, L = 4, NC = 2000, NT = 200, R = 3, PB = 0.5, PE = 0.2 and the

SOFS nodes are evenly distributed among the layers. We introduce two new mapping

degrees here, namely 1 to 2 mapping, meaning each SOFS node has 2 neighbors in

the next layer; and 1 to 5 mapping, meaning each node has 5 neighbors in the next

layer.


Figure 2.5: Sensitivity of PS to NT under different L, mi and N.

In Fig. 2.5 we show how system performance, PS, changes with NT as the other

SOFS system parameters change. Fig. 2.5 (a) shows how the mapping degree and

total number of overlay nodes influence the relation between NT and PS. In this configuration, we set NC = 2000 and use an even SOFS node distribution. Fig. 2.5 (b) shows the sensitivity of PS to NT under different numbers of layers, L, and different mapping degrees. We make the following observations. First, PS is sensitive to NT: a larger NT results in a smaller PS. For higher mapping degrees, PS is more sensitive to changes in NT. The reason, as discussed previously, is that a higher mapping degree discloses more nodes under break-in attacks. Second, in Fig. 2.5 there is a portion of each curve where PS remains almost unchanged as NT increases. This stable part is due to the protection the layering in the SOFS architecture offers against break-in attacks guided by prior disclosure of SOFS nodes. The fall in PS beyond this stable part is due to the effect of random break-in attacks in addition to break-in attacks guided by prior disclosure.


Figure 2.6: Sensitivity of PS to L, mi and node distribution.

Fig. 2.6 (a) shows the impact of the layer number, L, on system performance, PS, under different mapping degrees. Similar to Fig. 2.3 (a) and (b), PS is sensitive to L and the mapping degree, even under multiple rounds of break-in attacks, i.e., when NT > 0 and R > 1. An increase in the number of layers can always slow down penetration of the break-in attacks towards the target. However, if the system deploys too many layers, the number of nodes on each layer decreases and the number of paths between layers decreases correspondingly, which causes a decrease in PS (recall that in our evaluation, the total number of SOFS nodes is fixed). Among the configurations we tested, the one with L = 4 and mapping degree 1 to 2 provides the best overall performance.

Fig. 2.6 (b) shows the impact of node distribution on PS when L and the mapping degree change. With the other parameters unchanged, we show the sensitivity of performance to three different node distributions per layer. The first is the even node distribution, wherein all layers have the same number of nodes (given by n/L). The second is the increasing node distribution, wherein the number of nodes in the first layer is fixed at n/L (to maintain a degree of load balancing with the clients) and the other layers have nodes in the increasing ratio 1 : 2 : ... : L − 1. The third is the decreasing node distribution, where the number of nodes in the first layer is again fixed at n/L and those in the other layers are in the decreasing ratio L − 1 : L − 2 : ... : 1. There can, of course, be other node distributions; we believe the above ones are representative enough to study the impact of node distribution.

We make the following observations. The node distribution does impact system performance. The sensitivity of PS to the node distribution is more pronounced for higher mapping degrees (more neighbors per node). A very interesting observation is that the increasing node distribution performs best among the tested distributions. This is because when the mapping degree is larger than 1 to 1, breaking into one node leads to multiple nodes being disclosed at the next layer; hence the layers closer to the target have more nodes disclosed and are more vulnerable. More nodes at these layers can compensate for the damage of disclosure. Also, we observe that as the number of layers increases, the sensitivity to node distribution gradually reduces. This is because as L increases, the difference in the number of nodes per layer becomes smaller across the different node distributions.


Figure 2.7: Sensitivity of PS to R (a) and PE (b).

Fig. 2.7 (a) shows the impact of R (the number of rounds) on PS under different L with mapping degree 1 to 5. The nodes are evenly distributed among the layers in this case. Overall, PS is sensitive to R and decreases as R increases. For larger values of L, PS is less sensitive to R because more layers provide more protection from break-in attacks even for large round numbers. We also observe that PS is sensitive to

PE in Fig. 2.7 (b). For higher mapping degrees, PS is more sensitive to changing PE.

The reason follows from previous discussions that a higher mapping degree discloses more nodes. For smaller L, PS is more sensitive to changing PE because a smaller L increases the attacker’s chance to penetrate the system, layer by layer.

2.4.2 Analysis of Continuous Intelligent DDoS Attacks

In this section, we study the performance of the SOFS system in the presence of another type of intelligent DDoS attack called continuous attacks. We also study the impacts of recovery mechanisms that the SOFS system can incorporate. The performance metric here is still PS.

Attack Model and System Recovery

The continuous attack model is different from the discrete round based attack model

proposed above in the sense that the attacker continuously breaks into SOFS nodes as and

when their identities are revealed to the attacker (and not in rounds). We define NT and NC

to be the maximum number of overlay nodes that can be simultaneously under break-in or

congestion attacks. Furthermore, here the attacker reuses its resources (NT and NC ) in a

more sophisticated way as follows. During system recovery (discussed next), the attacker

will know that a compromised node is recovered (it is replaced with a good node). If the

attacker attacks a non-SOFS node,^7 it will also know that it is a non-SOFS node. In either case, the attacker will redirect the attack to a new node in time Tred, which is referred to as the attack redirection delay.

Under on-going congestion attack, the attacker will keep attacking a victim node as

long as it is an SOFS node. During break-in attacks, once a break-in attempt is completed

on a node (irrespective of the result), the attacker will redirect the break-in attack to another

node also in time Tred. When the attacker redirects the attack, it will use the disclosed node list if there is any node in that list, otherwise it will randomly pick a node from all the overlay nodes except ones currently under attack. Obviously, the disclosed nodes are all

7: Recall that an SOFS node is one that is currently active in the SOFS structure, while a non-SOFS node is one that is part of the overlay system but not currently part of the SOFS structure.

SOFS nodes, so they will be targeted first by break-in attacks if there are enough resources.

Otherwise, the nodes are attacked by congestion attacks.

In our analysis here, the SOFS system employs recovery to defend against attacks.

While there can be many potential recovery mechanisms, the one we employ is proactive

recovery, where a proactive reset mechanism periodically resets every SOFS node. When

a proactive reset event happens on a SOFS node, the SOFS system immediately replaces

that node with a new SOFS node chosen from the set of non-SOFS nodes. We denote the

interval between two successive proactive resets on a SOFS node as Tp, which is called

system recovery delay. In this study, we mainly focus our discussion on proactive recov-

ery. Interested readers can refer to [128] for our discussion and analysis on other recovery

mechanisms.

Analysis

The goal of our analysis here is to study the impacts of system design features on sys-

tem performance under continuous attacks with system recovery. An analytical approach for this case, similar to the one conducted under discrete round based attacks, is too complicated. We instead use simulations to study system performance under continuous attacks in

the presence of system recovery.

In order to analyze the system, we implement a discrete event driven simulation tool

to simulate the attack model and system recovery. The simulated system consists of 5000 overlay nodes among which there are 40 SOFS nodes, and 10 filters. Each client is con- nected to 5 first layer SOFS nodes. In our simulations below, the attack redirection delay

(Tred) and the system recovery delay (Tp) follow exponential distributions. The system performance is sensitive to the ratio of the mean value of Tp to the mean value of Tred, denoted as r, rather than to the individual mean values of Tred or Tp. Thus r measures the competition between attacks and system recovery in terms of speed. A smaller value of r implies faster recovery, which is beneficial for the system.

In the simulations below we only use r to discuss the impacts of continuous attacks and
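The claim that only the ratio r matters under exponential delays can be checked with a tiny Monte Carlo sketch of a single attack-vs-recovery race; the normalization and all names are ours, not the dissertation's simulator.

```python
import random

def recovery_win_rate(r, trials=20000, seed=1):
    """Monte-Carlo sketch of one attack-vs-recovery race: with exponential
    delays, only the ratio r = mean(T_p) / mean(T_red) matters, as argued in
    the text. We normalize mean(T_red) = 1, so mean(T_p) = r. Returns the
    fraction of races the proactive reset wins. All names are illustrative."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t_p = rng.expovariate(1 / r)      # time until the next proactive reset
        t_red = rng.expovariate(1.0)      # time until the attack is redirected
        if t_p < t_red:
            wins += 1
    return wins / trials
```

For two independent exponentials, the reset wins with probability 1/(1 + r), so a smaller r (faster recovery) wins the race more often, matching the discussion above.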

system recovery.

Numerical Results and Discussions

In the following simulations, the default system and attack parameters are L = 4, PB = 0.5,

PE = 0.2, NT = 200 and NC = 200. Fig. 2.8 (a) shows the impact of layer number, L,


Figure 2.8: Sensitivity of PS to L under different m (a), and to NC under different L and r (b).

on PS under different mapping degrees when both NT and NC are fixed as 200, and r = 5.

Similar to Fig. 2.6 (a), PS is sensitive to L and the mapping degree. The sensitivity of PS to

L and the mapping degree is lower here than in the discrete round based attack model. The reason is the presence of system recovery: because the system replaces compromised and disclosed SOFS nodes, attack impacts are reduced. Fig. 2.8 (b) shows how L and r

influence PS when NT = 200, mapping degree is 1 to 2, and NC changes. Here L = 4 is

always better than L = 7. This is because, when NT is fixed and NC increases, random

congestion attacks dominate, and hence fewer layers improve performance, as discussed

in the round based attack model.

45 1 1 r=5, m=1to2 r=5, m=1tohalf r=5, L=4 r=5, L=7 r=10, m=1to2 r=10, m=1tohalf r=10, L=4 r=10, L=7 0.8 0.8 r=20, m=1to2 r=20, m=1tohalf r=20, L=4 r=20, L=7 0.6 0.6 Ps Ps 0.4 0.4

0.2 0.2

0 0 0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500

NT NT (a) (b)

Figure 2.9: Sensitivity of PS to NT under different r and L (a), and different r and m (b).

Fig. 2.9 (a) shows how PS changes with NT under different L and r. The mapping degree is fixed as 1 to 2. In most of the cases, L = 4 performs better than L = 7. This is because, in our simulations the total number of SOFS nodes is fixed. Under this situation, deploying more layers decreases the number of nodes on each layer, and so decreases the number of paths from clients to the target. However, there is one exception to this claim.

When r = 20 and NT = 50, L = 7 performs better than L = 4, which shows that more layers can be beneficial. The reason is, when NT is very small, there are few nodes disclosed and compromised at each layer. In this situation, decrease in PS is mainly due to disclosure and compromise of filters which are at the last layer. Here, slow recovery

(large r) cannot recover the compromised filters effectively. In this case, more layers can slow down the penetration of break-in attacks towards the filters and helps achieve better performance. The data also demonstrate that faster system recovery (smaller r) improves system performance more effectively.

Fig. 2.9 (b) shows how PS changes with NT under different mapping degrees and r, when L = 4. When NT is small, smaller mapping is better, especially when r is large.

But when NT is large, larger mapping performs better. This is because, when NT is not

large and the mapping degree is small, fewer nodes are disclosed. Hence fewer nodes are

attacked, resulting in high PS. However, when NT is very large, many SOFS nodes are disclosed and compromised and it is the system recovery that maintains a certain (possibly small) number of nodes alive, which guarantees PS > 0. The number of alive nodes here is mainly determined by r, which is not related to mapping degree. But mapping degree decides the number of available paths. Given a number of alive nodes, a larger mapping degree means more paths. Hence, PS increases with larger mapping degree, especially when r is small (fast system recovery).

From the above, we see that attack intensities and system design features have signifi-

cant impacts on system performance under continuous attacks with system recovery. We

also find that recovery plays a significant role in reducing impacts caused by even intense

attacks, by still sustaining a certain level of system performance. Large mapping degrees

help achieve better system performance in this circumstance.

2.5 Countermeasures

2.5.1 Optimization of SOFS System Performance Under Round based Attacks

In the above analysis, we have made important observations on SOFS system perfor-

mance, and the impacts of the design features on system performance under round based

intelligent DDoS attacks, using extensive analytical derivations. Based on this deep insight into the attacks, we can provide countermeasures against them in order to optimize SOFS

system performance under attacks. In the following, we address this issue. For brevity, we only present the methods to obtain the optimal mapping degree and node distribution as examples. Optimal configurations for other design features can be obtained

similarly.

The performance of the SOFS system, i.e., PS, is a function of system design features

and attack parameters, as seen in (2.24) below, where m[] and n[] are the mapping degree

and the number of nodes on each layer. The function F takes all the parameters used to calculate PS and summarizes the formulas given above.

PS = F(N, n, NC, NT, L, m[], n[], PB, PE, R) (2.24)

If all other system and attack parameters are fixed and we keep m[] as variables, then

we can use existing mathematical tools such as MATLAB to get the optimal mapping degree

under given system and attack parameters. Table 2.1 shows optimal mapping degrees under

default system and attack parameters that were used in Section 3. We can see that the

optimal mapping degree changes from 1 to all to 1 to 2 when NT changes from 0 to 2000.^8

This matches our previous observation that smaller mapping degrees improve the resilience of the system to break-in attacks. In Table 2.2, we get the optimal node distributions under

two different NT values when L = 4, n1 is fixed at n/L, i.e., 25, the mapping degree is 1 to 2, and all other parameters are set to the default configuration values. The results match our previous observation that the increasing node distribution performs better than the other node distributions.
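The MATLAB-style search described above reduces, in its simplest form, to evaluating (2.24) over candidate mapping degrees and keeping the best. A toy Python sketch with a purely hypothetical stand-in for F:

```python
def optimal_mapping(candidates, ps_of):
    """Plain-Python analog of the optimization described above: fix all other
    parameters of (2.24) and pick the mapping degree that maximizes P_S.
    `ps_of` stands in for the full function F and is purely hypothetical."""
    return max(candidates, key=ps_of)

# Toy stand-in for F: a trade-off where a moderate degree wins, echoing the
# observation that 1-to-all is best only when N_T = 0. Values are made up.
toy_ps = {1: 0.40, 2: 0.55, 3: 0.60, 4: 0.52, 100: 0.20}
best = optimal_mapping(list(toy_ps), toy_ps.get)
```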

While the above approach is useful in some cases, the real problem is how to optimize

multiple structure parameters simultaneously to achieve optimum performance. To complicate this further, some of the parameters, especially attack-related ones (such as NC, NT), may

be unknown to the system designer at design time. In some cases some parameters can be

8: While obtaining optimal mapping degrees, we constrain the mapping degrees to be equal across layers for consistency of workload across nodes in the system. However, this constraint can be relaxed if need be.

NT                     | NT = 0   | NT = 20 | NT = 200 | NT = 2000
Optimal mapping degree | 1 to all | 1 to 4  | 1 to 3   | 1 to 2

Table 2.1: Optimal mapping degree with different NT

NT  | n1 | n2 | n3 | n4
200 | 25 | 20 | 21 | 34
600 | 25 | 22 | 22 | 31

Table 2.2: Optimal node distribution under 1 to 2 mapping with different NT

estimated to be within ranges. Also, the system may have other constraints, such as latency and workload per node, that impact the choices of the number of layers, mapping degree, and other features. The optimization of design features needs to take these issues into consideration as well, so solving the overall optimization problem is far from easy. Nevertheless,

we do provide some discussions on how to obtain optimal configurations under reasonable

assumptions on system and attack generalities.

Consider an instance, where intensities can be predicted within some interval, i.e., we

know the ranges and the distributions of NC and NT values. Then, a reasonable approach

to address this problem is to obtain configurations to optimize the expected value of the

path availabilities, denoted as E(PS). It is formally defined in (2.25), where Pr(NC', NT') is the probability that NC and NT have values of NC' and NT' respectively.

E(PS) = Σ_{NC', NT'} Pr(NC', NT') × F(N, n, NC', NT', L, m[], n[], PB, PE, R).   (2.25)

Based on (2.25), we can use optimization tools such as those in MATLAB to get the

optimal mapping degree (m[]) and node distribution (n[]) to achieve overall optimal performance under certain ranges of NC and NT. In reality, the range and distribution of NC and NT, and even other attack parameters, can be obtained from historical experience and run-time measurement. Other attack parameters that can be estimated within ranges can be handled in the same way we deal with NC and NT.
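The expected-value optimization of (2.25) can be sketched as follows. The performance function `toy_ps` below is a hypothetical stand-in (the real F aggregates the path-availability formulas of this chapter), but the structure of the search is the same: enumerate candidate mapping degrees, weight performance by the probability of each attack intensity, and keep the configuration with the best expectation.

```python
def toy_ps(mapping_degree, n_t):
    """Hypothetical stand-in for F in (2.24): larger mapping degrees add
    routing choices but expose more nodes when break-in intensity n_t grows.
    The real F aggregates the path-availability formulas of this chapter."""
    break_in_penalty = 0.0002 * n_t * (mapping_degree - 1)
    routing_benefit = 0.05 * mapping_degree
    return max(0.0, min(1.0, 0.8 + routing_benefit - break_in_penalty))

def expected_ps(mapping_degree, nt_distribution):
    """E(PS) in the spirit of (2.25): weight performance by Pr(NT')."""
    return sum(p * toy_ps(mapping_degree, nt)
               for nt, p in nt_distribution.items())

def optimal_mapping(candidates, nt_distribution):
    """Pick the candidate mapping degree maximizing expected performance."""
    return max(candidates, key=lambda m: expected_ps(m, nt_distribution))
```

With no break-in attacks expected ({0: 1.0}) the search favors the largest candidate mapping degree, while a mixture that includes intense break-in attacks pushes it toward smaller degrees, mirroring the trend of Table 2.1.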

To summarize, the attack strategies, intensities, and prior knowledge about the system significantly impact system performance. However, the impacts are deeply influenced by the system design features. Larger values of L and smaller mapping degrees improve system resilience to break-in attacks, while the reverse is true for congestion-based attacks. An increasing node distribution performs better than other node distributions. These design features interact with each other to determine system performance under intelligent DDoS attacks.

2.5.2 General Design Guidelines to Enhance SOFS System Performance

Although we could not derive the optimal configuration of the SOFS system under continuous attacks, we are still able to obtain the impacts of the SOFS design features on performance under continuous attacks, which match our observations under round-based attacks.

Based on our findings for all the attack models in this chapter, we propose the following set of design guidelines to enhance performance under general scenarios.

• The design feature configurations should be flexible and adaptive to achieve high

performance under different intensities of attacks.

• When attack information is unknown, a moderate number of layers, a moderate mapping degree, and an increasing node distribution are recommended to sustain a more than acceptable level of performance.

• When break-in attacks dominate, more layers and smaller mapping degrees are recommended. When congestion-based attacks dominate, fewer layers and larger mapping degrees are better.

• System recovery always helps to improve system performance under attacks. Under intense break-in attacks, system recovery with large mapping degrees can sustain a more than acceptable level of performance.

2.6 Related Work

The main scope of this work is in the realm of overlay systems (organized into definite

structures) for defending against Distributed DoS attacks. The surveys in [83, 104, 76] on

DDoS attacks and defense are exhaustive, and interested readers can refer to those papers.

In the following, we focus on work using overlay systems in general to defend against

DDoS attacks.

Recently, several works have proposed solutions based on overlay networks to enhance

security of communication systems like [60, 10, 115, 116, 11, 26, 14, 125]. An overlay

solution to track DDoS floods has been proposed in [116]. [11] proposes an overlay routing infrastructure to enhance the resilience of the Internet. Chen and Chow designed a

random Peer-to-Peer network that connects the registered client networks with the regis-

tered servers to defend against DDoS attacks in [26]. Badishi et al. present a systematic

study of the vulnerabilities of gossip-based multicast protocols to DoS attacks and propose

a simple gossip-based multicast protocol that eliminates such vulnerabilities in [14]. The

effectiveness of location-hiding of proxy-network based overlays is discussed in [125].

Anonymity systems share some features with our SOFS system. Anonymity systems usually use intermediate forwarding to achieve anonymity. However, there are significant differences between SOFS and anonymity systems. The goal of SOFS is to ensure paths from clients to the server by placing multiple connections between nodes in successive layers. Many anonymity systems depend on one or more third-party nodes to generate a path [101, 134], which is not suitable for SOFS: SOFS cannot rely on a centralized node to achieve receiver anonymity, since the centralized node can itself become the target of a DDoS attack.

2.7 Summary

In this chapter, we have studied the impacts of architectural design features on SOFS, a generalized overlay intermediate-forwarding system, under intelligent DDoS attacks. We analyzed our SOFS system under discrete round-based attacks using a general analytical approach, and analyzed the system under continuous attacks using simulations. We observed that the system design features, attack strategies, intensities, prior knowledge about the system, and system recovery significantly impact system performance. Even under sophisticated attack strategies and intensities, we showed that with smart design of system features and recovery, attack impacts can be significantly reduced. As we discussed in

Section 2.4.1, we showed how to obtain optimal system configurations under expected at- tack strategies and intensities. Based on our findings in the chapter, we further proposed a set of design guidelines to enhance SOFS system performance under all general scenarios.

As a future direction for this research topic, we propose to design an SOFS system that is resilient to attacks while maintaining QoS. For instance, an increase in the number of layers, while improving resilience to break-in attacks, increases the latency of communication. An increase in the mapping degree has the opposite effect of decreasing latency due to more choices for routing. We are in the process of designing an SOFS system that is highly resilient to attacks while still attempting to achieve a desired level of QoS. Also, the impacts of our work extend beyond DDoS attack defense. There are several other applications where an existing structure enables better service delivery, including multicasting, real-time delivery, and file-sharing systems. As modeled in this chapter, attackers can cause significant damage to performance by exploiting knowledge of the structure already present in these systems. We believe that our work is a first step towards designing the features of resilient overlay architectures under intelligent attacks.

Analyzing the resilience of such systems under intelligent attacks will also be a part of our future work.

CHAPTER 3

LOCALIZATION ATTACK AGAINST INTERNET THREAT MONITORING SYSTEMS AND COUNTERMEASURES

In this chapter, we study a new class of attacks, the invisible LOCalization (iLOC) attack. The iLOC attack is another infrastructure-oriented attack discussed in this dissertation. Different from the architecture-oriented attack discussed in the previous chapter, it targets the location information of defense-system infrastructure. More particularly, the iLOC attack can accurately and invisibly obtain the locations of the monitors in Internet Threat Monitoring (ITM) systems, which are well-accepted defense systems against widespread Internet attacks. The task of iLOC is not to directly harm the Internet or ITM systems. Instead, its goal is to obtain the location information of the key components, i.e., monitors, in ITM systems, so that other widespread attacks can evade ITM systems and become more effective. We also provide countermeasures against this potential threat to the Internet.

3.1 Motivations

In recent years, widespread attacks, such as active worms [87, 86, 4] and Distributed

Denial of Service (DDoS) attacks [82, 2], have been major threats to the Internet. Due to the widely-spreading nature of these attacks, large scale traffic monitoring across the Internet

has become necessary in order to effectively detect and defend against them. Developing

and deploying Internet threat monitoring (ITM) systems (or motion sensor networks) is one of the major efforts in this realm.

However, the integrity and functionality of ITM systems largely depend on the anonymity

of the IP addresses covered by their monitors, i.e., the locations of monitors. If the locations

of monitors are identified, the attacker can deliberately avoid these monitors and directly

attack the uncovered IP address space. It is a known fact that the number of sub-networks

covered by monitors is much smaller than the total number of sub-networks in the Internet

[103, 138, 85]. In other words, the IP address space covered by monitors represents a very

small portion of the whole IP address space. For example, the SANS ISC covers around 1 million IP addresses, which is about 0.023% of the IPv4 address space. Hence, bypassing IP address spaces covered by monitors will significantly degrade the accuracy of the traffic data collected by the ITM system in reflecting the real situation of attack traffic. Furthermore, the attacker may also poison ITM systems by manipulating the traffic towards, and captured by, disclosed monitors. For example, the attacker can launch high-rate port-scan traffic to disclosed monitors and feign a large-scale worm propagation. The attackers may even launch retaliation attacks (e.g., DDoS) against participants (i.e., monitor contributors) of ITM systems, thereby discouraging them from contributing to ITM systems. In summary, the attacker can significantly compromise ITM system performance if he is able to disclose the locations of monitors. It is important to have a thorough understanding of such attacks, in order to design efficient countermeasures enabling the protection of ITM systems.

In this chapter, we investigate a new class of attacks called invisible LOCalization

(iLOC) attack, which can accurately and invisibly localize the monitors in ITM systems.

We further present a set of guidelines to counteract this potential threat to ITM systems.

3.2 Background

3.2.1 Internet Threat Monitoring Systems

Generally, an ITM system consists of a number of monitors and a data center. The monitors are distributed across the Internet and can be deployed at hosts, routers, firewalls, etc. Each monitor is responsible for monitoring and collecting traffic targeted at a range of IP addresses within a sub-network. The range of IP addresses covered by a monitor is also referred to as the location of the monitor. Periodically, the monitors send traffic logs to the data center. The data center analyzes the traffic logs and publishes reports to the public9. The reports provide critical insights into widespread Internet threats and attacks, and are used in detecting and defending against such attacks. ITM systems have been successfully used to detect the outbreaks of worms [103] and DDoS attacks [89]. There have been many real-world developments and deployments of such systems. Examples include

DOMINO (Distributed Overlay for Monitoring InterNet Outbreaks) [137], SANS ISC (Internet Storm Center) [103], Internet sink [138], network telescope [85], CAIDA [21], and myNetWatchMan [90].

9In order to maximize the usage of such reports, most existing ITM systems publish the reports online and make them accessible to public.

3.2.2 Localization Attacks against ITM Systems

A few works have been conducted on monitor localization attacks [18, 110] against

ITM systems. In this kind of attack, accuracy is very important for an attacker in identifying monitor locations. Meanwhile, invisibility is vital to the attacker as well. If the attack

attempts are identified by the defender (such as the ITM administrators), countermeasures

can be applied by the defender to reduce or eliminate the attack effect by filtering suspi-

cious traffic [120], decoying attackers [113], and even tracing back to attack origins for

accountability of their malicious acts [108]. Invisibility is critical for the attacker to evade

the above countermeasures.

However, it is challenging for the attacker to achieve these two objectives simultane-

ously. Intuitively, the attacker can use the high-rate attack traffic, as in [18, 110], to easily

achieve high attack accuracy as follows. The attacker can launch high-rate port-scan traffic

to a target network. The attacker then queries the data center for the report on recent port-

scan activities. If there is a traffic spike in the report data reflecting the high-rate port-scan

traffic sent by the attacker, the attacker can determine that the target network is deployed

with monitor(s) which sends traffic report to the data center. However, it is hard for this

scheme to achieve invisibility, since large spikes caused by the attack traffic make the attack

easily detectable. Our work is the first to address an attack aiming to achieve the objectives

of accuracy and invisibility.

3.3 iLOC Attack

In this section, we will discuss the iLOC attack in detail. We will first give an overview

of the iLOC attack, and then present the detailed stages of the attack, followed by additional discussions on its mechanisms.

57 3.3.1 Overview

[Figure omitted: in stage 1 the attacker selects a code, encodes it into attack traffic, and launches the traffic towards target networks; in stage 2 the attacker queries the data center for the traffic report, decodes it, and recognizes the monitors. Monitors in the target networks log both background and attack traffic and update the data center.]

Figure 3.1: Workflow of the iLOC Attack. (a) Attack stage 1: attack traffic generation; (b) attack stage 2: attack traffic decoding.

Fig. 3.1 shows the basic workflow of the iLOC attack. This figure also illustrates the basic idea of the ITM system. In the ITM system, the monitors deployed at various net- works record their observed port-scan traffic and continuously update their traffic logs to the data center. The data center first summarizes the volume of port-scan traffic towards

(and reported by) all monitors, and then publishes the report data to the public in a timely fashion.

As shown in Fig. 3.1 (a) and (b) respectively, the iLOC attack consists of the following

two stages:

1. Attack Traffic Generation: In this stage, as shown in Fig. 3.1 (a), the attacker first

selects a code. Then, he encodes the attack traffic by embedding the selected code

into the traffic. Lastly, the attacker launches the attack traffic towards a target network

(e.g., network A in Fig. 3.1 (a)). We denote such an embedded code pattern in the

attack traffic as the attack mark of the iLOC attack, and denote the attack traffic

encoded by the code as attack mark traffic.

2. Attack Traffic Decoding: In this stage, as shown in Fig. 3.1 (b), the attacker first

queries the data center for the traffic report data. Such report data consist of both

attack traffic and background traffic. After getting the report data, the attacker tries

to recognize the attack mark (i.e., the code embedded in the iLOC attack traffic)

by decoding the report data. If the attack mark is recognized, the report data must

include the attack traffic, which means the target network is deployed with monitors

and the monitors are sending traffic reports to the ITM data center.

Code-based Attack: The iLOC attack adopts a code-based approach to generate the attack traffic. Coding techniques have been widely used in secure communication; Morse code is one example: without knowledge of Morse code, the receiver would find it impossible to interpret the carried information [31]. In the iLOC attack, we use a pseudo-noise code (PN-code) based attack approach, which has three advantages. First, the code is embedded in traffic and can be correctly recognized by the attacker even under interference from background traffic, ensuring the accuracy of the attack. Second, the code (of sufficient length) itself provides enough privacy. That is, the code is known only to the attacker, so the code pattern embedded in attack traffic can only be recognized by the attacker. Furthermore, the code is able to carry information. A longer code is more immune to interference and requires comparatively lower-rate attack traffic as the carrier, which is harder to detect. All these characteristics help to achieve the objectives of attack accuracy and invisibility.

Parallel Attack Capacity: The iLOC attack can not only attack one target network to determine the deployment of monitors in one network at one time, but it can also attack

multiple networks simultaneously. Intuitively, one simple way to achieve this parallel attack is to launch port-scan/attack traffic towards multiple target networks simultaneously,

by scanning a different port number for each different target network. For example, if

the data center publishes traffic reports of 1000 (TCP/UDP) ports, then the attacker can launch attacks towards 1000 networks simultaneously, attacking each network with a different port number. Since attack traffic on different ports is summarized separately at the data center, the attacker can still separate and thus decode his traffic towards different targets. Hence the attacker can localize monitors in multiple networks simultaneously and accurately. However, can the attacker further improve the attack efficiency? Assume the data center still only publishes reports of 1000 ports; can the attacker attack 10,000 target

networks simultaneously, for example, attacking 10 different networks using one same port

number? Using high-rate of port-scan traffic cannot achieve this, because it is indiscernible

whether a spike in the traffic report is caused by traffic logs from one network or the other

9 networks. In order to achieve this goal in the code-based attack, the selected code and

corresponding encoded attack traffic towards multiple networks for the same port should

not interfere with each other (i.e., each of them can be decoded individually and accurately

by the attacker, although they are integrated/summarized in the traffic report from the ITM

data center). The PN-code selected in the iLOC attack has this feature, giving it the unique

capacity to carry out parallel attack sessions towards multiple target networks using one

same port. The details of the PN-code selection will be discussed in the following sections.
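The separation effect behind parallel sessions can be sketched as follows; the random ±1 codes, code length, and traffic rate below are illustrative stand-ins (the actual attack uses M-sequences, but any codes with low cross-correlation behave the same way). Two encoded sessions on the same port are summed in the report, yet each attacker's code still correlates to roughly V/2 with the sum, while an untransmitted code correlates to roughly zero.

```python
import random

random.seed(7)
L, V = 1024, 10.0   # code length and mark traffic rate (illustrative values)

def random_pn(length):
    # Random +/-1 code; a stand-in for the M-sequence PN-code.
    return [random.choice((-1, 1)) for _ in range(length)]

def encode(code):
    # Rate V while the chip is +1, silence while it is -1.
    return [V / 2 * c + V / 2 for c in code]

def correlate(code, series):
    # Correlation degree of the code with the mean-shifted series.
    mean = sum(series) / len(series)
    return sum(c * (x - mean) for c, x in zip(code, series)) / len(code)

c1, c2 = random_pn(L), random_pn(L)
# Two parallel sessions on the same port: the report carries only the sum.
summed = [a + b for a, b in zip(encode(c1), encode(c2))]
# correlate(c1, summed) and correlate(c2, summed) both land near V/2 = 5;
# a code that was never transmitted correlates near 0.
```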

3.3.2 Attack Traffic Generation Stage

In this attack stage, the attacker: (1) selects the code, a PN-code in our case; (2) encodes the attack traffic using the selected PN-code; and (3) launches the encoded attack traffic

towards the target network. For the third step, the attacker can coordinate a large number

of compromised bots to launch the traffic [92]. However, this is not the focus of this

chapter. In the following, we will present detailed discussion on the first and second steps,

respectively.

Code Selection

To evade detection by others, the attack traffic should be similar to the background

traffic. From a large set of real-world background traffic traces obtained from SANS ISC

[103, 39], we conclude that the background traffic shows random patterns in both time

and frequency domains. The attack objectives of both accuracy and invisibility, and an

attacker’s desire for parallel attacks require that: (1) the encoded attack traffic should blend

in with background traffic, i.e., be random in both time and frequency domains, (2) the

code embedded in the attack traffic should be easily recognizable to the attacker himself,

and (3) the code should support parallel attack.

To meet the above requirements, we choose the PN-code to encode the attack traffic.

The PN-code in the iLOC attack is a sequence of −1 or +1 with the following features

[97, 35, 38].

• The PN-code is random and “balanced”. The −1 and +1 are randomly distributed

and the occurrence frequencies of −1 and +1 are nearly equal. This feature con-

tributes to good spectral density properties (i.e., equally spreading the energy over

the whole frequency-band). It makes the attack traffic appear as noise and blend in

with background traffic in both time and frequency domains.

• The PN-code has a high correlation to itself and a low correlation to others (such

as random noise), where the correlation is a mathematical tool for finding repeating

patterns in a signal [38]. This feature makes it feasible for the attacker to accurately

recognize attack traffic (encoded by the PN-code) from the traffic report data even

under the interference of background traffic.

• The PN-code has a low cross-correlation value among different PN-code instances.

The lower this cross-correlation, the less interference among multiple attack sessions

in parallel attack. This feature makes it feasible for the attacker to conduct parallel

localization attacks towards multiple target networks on the same port.

The Walsh-Hadamard code and M-sequence code [97, 35] are two popular types of

PN-code. The Walsh-Hadamard code has some limitations. Since its energy spreads into only a limited number of discrete frequency components, which differs from background traffic, it would compromise the invisibility of the attack traffic if used in the iLOC attack. In addition, the Walsh-Hadamard code strongly depends on global synchronization [35]. Since the M-sequence code does not have these shortcomings, we adopt M-sequence codes in the iLOC attack. We use the feedback shift register to repeatedly generate the M-sequence PN-code due to its popularity and ease of implementation

[97, 41]. In particular, a feedback shift register consists of two parts. One is an ordinary shift register consisting of a number of flip-flops (two-state memory elements). The other is a feedback module that forms a multi-loop feedback logic.
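The feedback shift register described above can be sketched as follows. The tap choice (7, 6), i.e., the primitive polynomial x^7 + x^6 + 1, and the ±1 mapping are illustrative, not the dissertation's specific configuration; any primitive polynomial yields a maximal-length sequence with the balance and autocorrelation properties listed earlier.

```python
def m_sequence(taps, nbits, seed=1):
    """Generate one period (2**nbits - 1 chips) of an M-sequence with a
    Fibonacci LFSR, mapped to the +/-1 alphabet used by the PN-code.
    Tap position p corresponds to the bit shifted by (nbits - p)."""
    state = seed
    out = []
    for _ in range((1 << nbits) - 1):
        out.append(1 if state & 1 else -1)   # emit LSB as a +1/-1 chip
        fb = 0
        for t in taps:                        # XOR of the tapped stages
            fb ^= (state >> (nbits - t)) & 1
        state = (state >> 1) | (fb << (nbits - 1))
    return out

# Taps (7, 6) <-> x^7 + x^6 + 1 (primitive), giving a period of 127 chips.
code = m_sequence(taps=(7, 6), nbits=7)
```

One period is "balanced" (64 occurrences of +1 versus 63 of −1), and its unnormalized periodic autocorrelation at any nonzero shift is exactly −1, which is what makes the code easy for the attacker to recognize yet noise-like to everyone else.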

Attack Traffic Encoding

During the attack traffic encoding process, each bit in the selected PN-code is mapped to a unit time period Ts, denoted as mark bit duration. The entire duration of launched traffic (referred to as traffic launch session) is Ts · L, where L is the length of the PN-code.

[Figure omitted: a PN-code [+1, -1, +1, -1, +1] of length 5 and the corresponding encoded attack traffic, which scans at rate V during each +1 chip and is silent during each -1 chip; each chip lasts Ts, so the traffic launch session lasts 5 · Ts.]

Figure 3.2: PN-code and Encoded Attack Traffic

The encoding is carried out according to the following rules: each bit in the PN-code

maps to a mark bit duration (Ts); when the PN-code bit is +1, port-scan traffic with a high

rate, denoted as mark traffic rate V , is generated in the corresponding mark bit duration; when the code bit is −1, no port-scan traffic is generated in the corresponding mark bit duration. Thus, the attacker embeds the attack traffic with a special pattern, i.e., the original

PN-code. Recall that, after this encoding process, the PN-code pattern embedded in traffic is denoted as attack mark. If we use Ci = < Ci,1, Ci,2, ..., Ci,L > ∈ {−1, +1}^L to represent the PN-code and use ηi = < ηi,1, ηi,2, ..., ηi,L > to represent the attack traffic, then we have ηi,j = (V/2) · Ci,j + V/2 (j = 1, ..., L). Fig. 3.2 shows an example of the PN-code and the corresponding attack traffic encoded with the PN-code.
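The encoding rule η_{i,j} = (V/2)·C_{i,j} + V/2 can be sketched directly with the 5-chip example of Fig. 3.2 (the rate V = 10 here is illustrative):

```python
V, Ts = 10.0, 1.0                        # mark traffic rate and bit duration
pn_code = [+1, -1, +1, -1, +1]           # the 5-chip example of Fig. 3.2
rates = [V / 2 * c + V / 2 for c in pn_code]  # eta_{i,j} = V/2*C_{i,j} + V/2
# Scan at rate V while the chip is +1, stay silent while it is -1;
# the traffic launch session lasts len(pn_code) * Ts time units.
```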

3.3.3 Attack Traffic Decoding Stage

In this stage, the attacker takes the following two steps: (1) The attacker queries the data

center for the traffic report data, which consists of both attack traffic and background traffic.

(2) From the report data, the attacker attempts to recognize the embedded attack mark. The existence of the attack mark determines the deployment of monitors in the attack targeted network. As the query of traffic report data is relatively straightforward, here we only detail the second step, i.e., attack mark recognition, as follows.

In the report data queried from the data center, the attack traffic encoded with the attack

mark is mixed with background traffic. It is critical for the iLOC attack to accurately

recognize the attack mark from the traffic report data. To address this, we develop the

correlation-based scheme. This scheme is motivated by the fact that the original PN-code

(used to encode attack traffic) and its corresponding attack mark (embedded in the traffic

report data) are highly correlated; in fact, they are identical.

The attack mark in the traffic report data is the embedded form of the original PN-

code. The attack mark is similar to its original PN-code, although the background traffic

may introduce interference and distortion into the attack mark. We adopt the following

correlation degree to measure their similarity. Mathematically, the correlation degree is defined as the normalized inner product of two vectors. For two vectors X = < X1, X2, ..., XL > and Y = < Y1, Y2, ..., YL > of length L, the correlation degree of X and Y is

Γ(X, Y) = X ⊙ Y = (Σ_{i=1}^{L} Xi · Yi) / L,   (3.1)

where ⊙ represents the operator for the normalized inner product of two vectors. Based on the above definition, we have Γ(X, X) = Γ(Y, Y) = 1, ∀ X, Y ∈ {−1, +1}^L.

We use two vectors, ηi = < ηi,1, ηi,2, ..., ηi,L > and ωi = < ωi,1, ωi,2, ..., ωi,L >, to represent attack traffic (embedded with attack mark) and background traffic, respectively. We shift the above two vectors by subtracting the mean value from the original data, resulting in two new vectors, ηi' = < ηi,1', ηi,2', ..., ηi,L' > and ωi' = < ωi,1', ωi,2', ..., ωi,L' >. We still use a vector Ci = < Ci,1, Ci,2, ..., Ci,L > ∈ {−1, +1}^L to represent the PN-code. Thus,

the correlation degree between the PN-code and the (shifted) attack traffic can be obtained.

Similarly, we can also obtain the correlation degree between the PN-code and the (shifted)

background traffic as follows.

According to the rules of encoding attack traffic discussed in Section 3.3.2, ηi = (V/2) · Ci + V/2. Thus, ηi' = ηi − E(ηi) = ηi − V/2 = (V/2) · Ci. Hence, the correlation degree between the original PN-code and the (shifted) attack traffic is Γ(Ci, ηi') = (V/2) · Γ(Ci, Ci) = V/2. Furthermore, we can also derive the correlation degree between the PN-code and the (shifted) background traffic, i.e., Γ(Ci, ωi'). The mean of this correlation degree is close to 0, since the PN-code has low correlation with the (shifted) background traffic (i.e., E[Γ(Ci, ωi')] = (1/L) · E[Σ_{j=1}^{L} (ωi,j' · Ci,j)] ≈ 0). If the standard deviation of the background traffic rate is σx, the variance of this correlation degree is

Var[Γ(Ci, ωi')] = E[(Γ(Ci, ωi') − 0)^2]   (3.2)
                = (1/L^2) · E[Σ_{j=1}^{L} Ci,j^2 · ωi,j'^2]   (3.3)
                ≈ (1/L^2) · E[Σ_{j=1}^{L} ωi,j'^2] = σx^2 / L.   (3.4)

Thus, the average correlation degree between the PN-code and the (shifted) background traffic is Γ(Ci, ωi') ≈ σx/√L. Based on the above discussion, the attacker can set appropriate attack parameters (e.g., PN-code length L and mark traffic rate V) to make the correlation degree (V/2) between the PN-code and the attack mark traffic much larger than the correlation degree (σx/√L) between the PN-code and the background traffic. As such, the attacker can accurately distinguish the attack mark traffic from the background traffic.

In the practice of attack mark recognition, vector λi is used to represent the queried report data, and vector λi' is used to represent the shifted report data (obtained by subtracting E(λi,j) from λi). The attacker uses the correlation degree between λi' and his PN-code Ci, i.e., Γ(Ci, λi'), to determine the existence of the PN-code in the report data. If Γ(Ci, λi') is larger than a threshold Ta, referred to as the mark decoding threshold, then the attacker determines that the report contains attack traffic as well as the PN-code Ci, and concludes that the target network is deployed with monitors. The accuracy of this correlation-degree-based PN-code recognition is analyzed and demonstrated in Sections 3.4, 3.5 and 4.5.
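The threshold decision can be sketched as follows; the Gaussian background model, random ±1 code, and parameter values (L = 511, V = 6, σx = 2, Ta = V/4) are illustrative assumptions, chosen so that V/2 dominates σx/√L as the analysis above requires.

```python
import random

random.seed(42)
L, V, sigma = 511, 6.0, 2.0
Ta = V / 4      # threshold between V/2 (mark present) and ~sigma/sqrt(L)

code = [random.choice((-1, 1)) for _ in range(L)]
background = [abs(random.gauss(20.0, sigma)) for _ in range(L)]
# Monitors present: report = background + encoded attack traffic.
with_mark = [b + (V / 2 * c + V / 2) for b, c in zip(background, code)]

def decode(series, code, threshold):
    """Mark recognition: shift by the mean, correlate, compare with Ta."""
    mean = sum(series) / len(series)
    corr = sum(c * (x - mean) for c, x in zip(code, series)) / len(code)
    return corr > threshold

# With the mark, the correlation lands near V/2 = 3 and exceeds Ta;
# with background only, it stays near 0 and falls below Ta.
```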

3.3.4 Discussions

In order to accurately and effectively recognize the attack mark (PN-code) from the

report data, we need to find the segment of the report data containing the PN-code (i.e.,

we need to fulfill the synchronization between the port-scan traffic report data and the PN-

code). For this purpose, we introduce an iterative sliding window based scheme. The

basic idea is to let the attacker obtain enough report data at a small granularity. Then, a

sliding window iteratively moves forward to capture a segment of the report data. For each

segment, we apply the correlation-based scheme discussed in Section 3.3.3 to recognize

whether or not the attack mark exists. The details of this synchronization are presented as

follows.

The attacker first sends a sequence of queries to the data center and each query requests

a part of the report data which lasts for a given unit time, known as the query duration Tq. To guarantee good synchronization and capture of each bit in the PN-code, Tq should be smaller than the mark bit duration Ts. Also, the attacker needs to send enough queries to ensure that the queried report data contains the whole attack mark and attack mark traffic, whose length is L · Ts. With the report data, the attacker iteratively conducts a correlation test on the report data, using a sliding window. For example, in the i-th round, the attacker selects ti as the starting time for the sliding window. In the (i + 1)-th round, the attacker moves the sliding window one step (Tq) forward, so the start time of the sliding window becomes ti + Tq, and so on. In the i-th round, a sequence of data (of length L) is obtained in the sliding window. The first data point in the sequence is the traffic data in time duration [ti, ti + Ts], the second data point in the sequence is the traffic data in time duration [ti + Ts, ti + 2 · Ts],

66 and so on. With these data, the attacker conducts the attack mark recognition discussed in

Section 3.3.3. The attacker repeats the attack mark recognition after each time he moves forward the sliding window, until the attack mark is recognized from the report data in the current sliding window, or the sliding window has gone through all the report data.
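The sliding-window synchronization can be sketched as follows. For brevity this sketch assumes Tq = Ts (one report sample per code chip), uses a random ±1 stand-in for the PN-code, and scans all windows keeping the best-correlated one rather than stopping at the first window that crosses Ta; when a single mark is present the two stopping rules find the same window.

```python
import random

random.seed(3)
L, V, sigma, Ta = 255, 8.0, 1.5, 2.0

# Hypothetical random +/-1 code standing in for the attacker's PN-code.
code = [random.choice((-1, 1)) for _ in range(L)]

def correlate(code, segment):
    # Correlation degree of the code with a mean-shifted report segment.
    mean = sum(segment) / len(segment)
    return sum(c * (x - mean) for c, x in zip(code, segment)) / len(code)

# Report data: background noise everywhere; the attack mark is embedded
# at an offset the decoder does not know in advance.
offset = 100
report = [random.gauss(10.0, sigma) for _ in range(offset + L + 100)]
for j, c in enumerate(code):
    report[offset + j] += V / 2 * c + V / 2

def find_mark(report, code):
    """Slide a code-length window one step at a time and return the start
    position with the highest correlation degree, plus that degree."""
    best_start, best_corr = 0, float("-inf")
    for start in range(len(report) - len(code) + 1):
        corr = correlate(code, report[start:start + len(code)])
        if corr > best_corr:
            best_start, best_corr = start, corr
    return best_start, best_corr
```

Only the window aligned with the embedded mark correlates near V/2 = 4; every misaligned window stays near 0, so the search both recognizes the mark (correlation above Ta) and recovers its starting position.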

3.4 Analysis

In this section, we first present our formal analysis of the impacts of different attack parameters on attack accuracy and invisibility. Then based on analytical results, we discuss how to determine attack parameters.

Before starting the analysis, we need to clarify the two parties in the attack process: the iLOC attacker and its adversary, the defender. The term defender generalizes the benign parties who maintain the ITM system and/or exploit the reports from the data center to identify widespread Internet attacks. Based on the reports, the defender not only attempts to determine whether there are anomalies in traffic, but also takes appropriate actions should any anomalies be identified.

3.4.1 Accuracy Analysis

In order to measure attack accuracy, we introduce the following two metrics. The first one is the attack successful rate PAD, which is the probability that an attacker correctly recognizes the fact that a selected target network is deployed with monitors. The higher PAD, the higher the attack accuracy. The second metric is the attack false positive rate PAF, which is the probability that the attacker mistakenly declares a target network as one deployed with monitors. The lower PAF, the higher the attack accuracy.

Recall that Ta is the mark decoding threshold, V is the mark traffic rate, vector λi represents the queried report data, and vector λ'i represents the shifted report data (obtained by subtracting E(λi,j) from λi). Assume that the random variables ω'i,1, . . . , ω'i,L (i.e., the shifted background traffic) are independent and identically distributed (i.i.d.) and follow a Gaussian distribution with standard deviation σx. Then we have the following theorem for the attack accuracy of the iLOC attack.

Theorem 1 In the iLOC attack, the attack successful rate PAD is

PAD = 1 − Pr[Γ(λ'i, Ci) ≤ Ta | λ'i = η'i + ω'i]    (3.5)

    = 1 − (1/√π) ∫_{(V/2 − Ta)·√L/(√2·σx)}^{∞} e^{−y²} dy.    (3.6)

The attack false positive rate PAF is

PAF = Pr[Γ(λ'i, Ci) ≥ Ta | λ'i = ω'i]    (3.7)

    = (1/√π) ∫_{√L·Ta/(√2·σx)}^{∞} e^{−y²} dy.    (3.8)

Proof 1 i) Derivation of attack successful rate PAD.

According to the definition of PAD, we have

PAD = 1 − Pr[Γ(λ'i, Ci) ≤ Ta | λ'i = η'i + ω'i].    (3.9)

Considering that Γ(Ci, η'i) = (V/2) · Γ(Ci, Ci) = V/2, Equation (3.9) can be rewritten as

PAD = 1 − Pr[Γ(λ'i, Ci) ≤ Ta − V/2 | λ'i = ω'i].    (3.10)

Based on the mean and variance of the correlation degree determined in Section 3.3.3, PAD can be represented by

PAD = 1 − (√L/(√(2π)·σx)) ∫_{−∞}^{Ta − V/2} e^{−x²L/(2σx²)} dx.    (3.11)

Let y² = x²L/(2σx²), i.e., y = x·√L/(√2·σx). Then we have

PAD = 1 − (√L/(√(2π)·σx)) · (√2·σx/√L) ∫_{−∞}^{(Ta − V/2)·√L/(√2·σx)} e^{−y²} dy    (3.12)

    = 1 − (1/√π) ∫_{−∞}^{(Ta − V/2)·√L/(√2·σx)} e^{−y²} dy    (3.13)

    = 1 − (1/√π) ∫_{(V/2 − Ta)·√L/(√2·σx)}^{∞} e^{−y²} dy.    (3.14)

ii) Derivation of attack false positive rate PAF .

Recall the definition of the correlation degree Γ(λ'i, Ci), and note that λ'i = ω'i when no iLOC attack traffic exists. Assuming that Γ(λ'i, Ci) follows a Gaussian distribution N(0, σx²/L) (as discussed in Section 3.3.3), we have

PAF = Pr[Γ(λ'i, Ci) ≥ Ta | λ'i = ω'i].    (3.15)

Thus PAF can be represented by

PAF = (√L/(√(2π)·σx)) ∫_{Ta}^{∞} e^{−x²L/(2σx²)} dx.    (3.16)

Let y² = x²L/(2σx²), i.e., y = x·√L/(√2·σx). Then we have

PAF = (√L/(√(2π)·σx)) · (√2·σx/√L) ∫_{√L·Ta/(√2·σx)}^{∞} e^{−y²} dy = (1/√π) ∫_{√L·Ta/(√2·σx)}^{∞} e^{−y²} dy.    (3.17)

Remarks: We make a few observations based on the theorem presented above. First, the

attack successful rate PAD increases and the attack false positive rate PAF decreases with

increasing PN-code length L. That is, attack accuracy increases when L increases.

Second, with the increasing mark traffic rate V , attack accuracy also increases.
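Since (1/√π) ∫_{a}^{∞} e^{−y²} dy = erfc(a)/2, the closed forms of Theorem 1 can be evaluated with the standard complementary error function. The sketch below (hypothetical helper names) illustrates the two observations numerically:

```python
import math

def attack_success_rate(V, Ta, L, sigma_x):
    # Eq. (3.6): P_AD = 1 - (1/sqrt(pi)) * integral from
    # (V/2 - Ta)*sqrt(L)/(sqrt(2)*sigma_x) to infinity of e^(-y^2) dy.
    a = (V / 2 - Ta) * math.sqrt(L) / (math.sqrt(2) * sigma_x)
    return 1 - 0.5 * math.erfc(a)

def attack_false_positive_rate(Ta, L, sigma_x):
    # Eq. (3.8): P_AF = (1/sqrt(pi)) * integral from
    # sqrt(L)*Ta/(sqrt(2)*sigma_x) to infinity of e^(-y^2) dy.
    a = math.sqrt(L) * Ta / (math.sqrt(2) * sigma_x)
    return 0.5 * math.erfc(a)
```

Increasing L raises the lower limit of both tail integrals, so PAD grows and PAF shrinks with the code length, and increasing V likewise raises PAD, matching the remarks above.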

3.4.2 Invisibility Analysis

Here, invisibility refers to how well the iLOC attack evades detection by the defender. In order to analyze invisibility, we need to consider the detection algorithms. While

there have been many different algorithms proposed to detect anomalies in port-scan traffic, here we use a representative and generic algorithm which has no specific requirements on detection systems. This threshold-based detection algorithm is widely adopted by many

systems [86, 103, 123, 114]. In this algorithm, if the traffic rate (volume in a given time

duration) is larger than a pre-determined threshold Td (referred to as the defender detection

threshold), the defender issues threat alerts and initiates reactions [103]. Such a detection
threshold is usually obtained through statistical analysis of the background traffic. Note

that the threshold Td must be carefully chosen for anomaly detection: it must maintain both high detection rate (i.e., the probability that an ongoing attack is detected) and low false positive rate (i.e., the probability that an alarm is triggered when no attack is occurring).
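A minimal sketch of this threshold-based algorithm (the helper names and the 3σ choice of k are illustrative, not a specification of any particular deployed system):

```python
import statistics

def detection_threshold(history, k=3.0):
    """Derive Td from statistical analysis of background traffic.

    Td = mean + k * stdev; the trade-off between detection rate and
    false positive rate is tuned through the multiplier k.
    """
    mu = statistics.fmean(history)
    sigma = statistics.stdev(history)
    return mu + k * sigma

def detect(rate, Td):
    """Issue an alert when the observed traffic rate exceeds Td."""
    return rate > Td
```

Given a window of past background traffic rates, the defender computes Td once and then compares each new rate observation against it.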

To measure attack invisibility in terms of how well the iLOC attack can evade the detec-

tion by the defender, we use the following two metrics. The first is the defender detection

rate PDD, the probability that the defender correctly detects the attack traffic introduced

by the iLOC attack. The other one is the defender false positive rate PDF , the probability that the defender mistakenly identifies normal background traffic as attack traffic.

Similar to our approach in Section 3.3.2, we use the random variable ω' to represent the shifted background traffic, and the random variable λ' to represent the shifted traffic data reported by the ITM system. Note that if no iLOC attack exists, λ' = ω'. Assume that the values of ω' at different time units are independent and identically distributed (i.i.d.) and follow a Gaussian distribution with standard deviation σx (i.e., ω' follows N(0, σx²)). Then we have the following theorem for attack invisibility.

Theorem 2 In the iLOC attack, the defender detection rate PDD is

PDD = 1 − Pr[λ' ≤ Td | λ' = V + ω']    (3.18)

    = 1 − (1/√π) ∫_{(V − Td)/(√2·σx)}^{∞} e^{−y²} dy.    (3.19)

The defender false positive rate PDF is

PDF = Pr[λ' ≥ Td | λ' = ω']    (3.20)

    = (1/√π) ∫_{Td/(√2·σx)}^{∞} e^{−y²} dy.    (3.21)

The proof of Theorem 2 is similar to that of Theorem 1; therefore, we skip it here due to space limitations.

Remarks: From Theorem 2, we make the following observations. First, with the increase of the mark traffic rate V , the defender detection rate PDD increases, and thus the attack

invisibility decreases. Second, the mark traffic rate V does not affect the defender false

positive rate PDF , which is only determined by the threshold Td configured by the de-

fender.
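Theorem 2 likewise reduces to complementary error functions; a small numeric sketch of these observations (hypothetical helper names; note that V does not appear in the false positive rate at all):

```python
import math

def defender_detection_rate(V, Td, sigma_x):
    # Eq. (3.19): P_DD = 1 - (1/sqrt(pi)) * integral from
    # (V - Td)/(sqrt(2)*sigma_x) to infinity of e^(-y^2) dy.
    return 1 - 0.5 * math.erfc((V - Td) / (math.sqrt(2) * sigma_x))

def defender_false_positive_rate(Td, sigma_x):
    # Eq. (3.21): depends only on Td and the background deviation sigma_x.
    return 0.5 * math.erfc(Td / (math.sqrt(2) * sigma_x))
```

Raising V pushes PDD toward 1 (less invisibility), while PDF is fixed once the defender chooses Td, exactly as the remarks state.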

3.4.3 Determination of Attack Parameters

Determination of V , Ta and L

The attacker can determine the values of attack parameters based on the above analysis.

First, the attacker can determine the mark traffic rate V . This is because V is only related

to the attack invisibility metrics (the defender detection rate PDD), and it impacts the determination of the other parameters. After determining V and given the expected false positive

rate, the attacker can further determine the mark decoding threshold Ta and PN-code length

L. Note that the values of other attack parameters, such as the standard deviation of the background traffic σx, can be determined through analyzing historical background traffic data published by the data center of the ITM system.

(i) Mark traffic rate V : Using Equation (3.21), the attacker can first estimate the defender detection threshold Td based on a reasonable upper bound of the defender false positive rate PDF . For example, using the central limit theorem, we know that Td = 3 · σx achieves a reasonable defender false positive rate PDF (1.7%). Thus, the attacker can use 3 · σx as a reasonable estimate of Td. After that, given the defender detection rate PDD, which can be estimated similarly, and the background traffic deviation σx, the attacker can determine the mark traffic rate V by resolving Equation (3.19) in Theorem 2.

(ii) Mark recognition threshold Ta: Given the mark traffic rate V (determined previously) and the desired attack false positive rate PAF , the attacker can further determine the mark decoding threshold Ta by resolving Equation (3.8) in Theorem 1.

(iii) Length of PN-code L: Given the mark traffic rate V , mark decoding threshold Ta, and desired attack successful rate PAD, the attacker can further determine the length of

PN-code L by resolving Equation (3.6) in Theorem 1.
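Under the Gaussian model, steps (i)-(iii) can be carried out in closed form with the inverse normal CDF. Because Equations (3.6) and (3.8) couple Ta and L, the sketch below solves the two jointly; the function name, the joint arrangement of steps (ii) and (iii), and the Td = 3·σx estimate are our own illustrative choices:

```python
from math import ceil, sqrt
from statistics import NormalDist

def plan_parameters(P_DD, P_AF, P_AD, sigma_x):
    """Solve steps (i)-(iii) under the Gaussian model of Section 3.4.

    Returns (V, Ta, L) achieving defender detection rate <= P_DD,
    attack false positive rate <= P_AF, and attack successful rate >= P_AD.
    """
    Phi_inv = NormalDist().inv_cdf
    Td = 3 * sigma_x                      # step (i): estimated defender threshold
    V = Td + sigma_x * Phi_inv(P_DD)      # step (i): mark rate from Eq. (3.19)
    # Steps (ii)+(iii): Eqs. (3.6) and (3.8) give two equations in Ta and L:
    #   (V/2 - Ta) * sqrt(L) / sigma_x = z_ad,  Ta * sqrt(L) / sigma_x = z_af.
    z_ad = Phi_inv(P_AD)
    z_af = Phi_inv(1 - P_AF)
    L = ceil((2 * sigma_x * (z_ad + z_af) / V) ** 2)   # add the two equations
    Ta = sigma_x * z_af / sqrt(L)
    return V, Ta, L
```

For example, with PDD = 3%, PAF = 1%, PAD = 95%, and σx = 1, this arrangement yields a mark rate of roughly 1.12·σx and a code length on the order of 50 bits.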

Determination of Ts

To determine the mark bit duration Ts, the attacker needs to estimate the possible delay

from the moment when the attack traffic is first reported by monitors, to the moment when

such attack traffic is published by the data center. To make the iLOC attack effective, the mark bit duration needs to be at least as large as this delay. Otherwise, the traffic in different bit durations (each lasting Ts) may be published by the data center at the same moment, mixing the bits and thereby rendering them inseparable.

Several possible methods can be used to obtain such delay information. Some ITM

systems may publish such information on their websites. The attacker may also actively

conduct experiments on ITM systems and measure such delay. For example, the attacker

may deploy monitors in his controlled (small) network and connect them to the targeted

ITM system. The attacker can simply use such monitors to report logs embedded with

special patterns (e.g., PN-code) and keep querying the data center until the embedded traffic

patterns are recognized. After repeating the above process several times, the attacker
is able to obtain a statistical profile of the delay information, and then determine the mark

bit duration Ts. We use this method in our implementation of the iLOC attack, which is

presented in the next section.

3.5 Implementation and Validation

In this section, we first introduce our implementation of the iLOC attack. Then, we

report the validation results of our iLOC attack design and experiments against a real-world

ITM system.

3.5.1 Implementation of the iLOC Attack

We implement an iLOC attack prototype based on the design in Section 3.3. This

prototype works against any ITM system with the data center having a web-based user

interface. Particularly, there are five independent and important components in our iLOC

implementation: Data Center Querist, Background Traffic Analyzer, PN-code Generator,

Attack Traffic Generator and Attack Mark Decoder.

In particular, Data Center Querist is a component that interacts with the data center of

the targeted ITM system. Its main tasks consist of sending queries to the data center for

port-scan traffic reports and retrieving the response (i.e., the report) from the data center.

The inputs to this component are the URL, or IP address, of the data center and the port

number of the port-scan traffic to be queried. From the traffic report data, Background

Traffic Analyzer can obtain the statistical profile of the background traffic and determine attack parameters for the other components. PN-code Generator is a component that generates and

73 Network B Internet Data Center of Threat Monitoring System Campus Network R1

Network A

Attacker Monitors

Figure 3.3: Experiment Setup

stores the PN-code. The PN-code length is determined according to the attacker's objectives and the background traffic profile, as described in Section 3.4.3. Attack Traffic Generator

is a component that generates attack traffic based on the PN-code and the background statistics profile; the PN-code-encoded traffic is generated as discussed in Section 3.3.2. Inputs to this component are the IP address range of the target network, the port number, and the transport protocol (TCP or UDP). Attack Mark Decoder is a component

that obtains the port-scan report data through Data Center Querist, and decides whether the

attack mark exists in the way discussed in Section 3.3.3. The PN-code used in the decoding process is the same as the one used in encoding the attack traffic and is stored in the PN-code

Generator.
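As an illustration of how Attack Traffic Generator and Attack Mark Decoder fit together, the following sketch encodes a PN-code as on-off mark traffic and decodes it by correlation. The on-off mapping (rate V during "+1" bit durations, none during "−1") is our assumption, chosen to be consistent with the Γ(Ci, η'i) = V/2 property used in Section 3.4; the real components of course operate on live port-scan traffic and data-center reports:

```python
def encode(code, V, background):
    """Overlay mark traffic on background traffic.

    Assumed mapping: send mark traffic at rate V during bit durations
    whose PN-code chip is +1, and none during -1 chips.
    """
    return [b + (V if c > 0 else 0) for b, c in zip(background, code)]

def decode(report, code, Ta):
    """Attack Mark Decoder: correlate the shifted report data with the
    PN-code and compare the correlation degree against the threshold Ta."""
    mean = sum(report) / len(report)
    shifted = [x - mean for x in report]
    gamma = sum(s * c for s, c in zip(shifted, code)) / len(code)
    return gamma > Ta
```

With a balanced code, the mark contributes a correlation of roughly V/2 on top of the background's contribution, so a threshold Ta between 0 and V/2 separates marked from unmarked reports.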

These components may be integrated into one program running on one machine. The

attack can also be carried out in a more flexible way if the tasks of the above components

are performed by processes on different machines. Our iLOC prototype is implemented

using Microsoft MFC and Matlab on the Windows XP operating system.

3.5.2 Validation of the iLOC Attack

In order to validate our iLOC implementation, we deployed it to identify a set of monitors associated with a real-world ITM system.

[Plot omitted: traffic rate vs. time (hour), comparing background traffic with traffic mixed with the iLOC attack]

Figure 3.4: Background Traffic vs. Traffic Mixed with iLOC Attack

[Plot omitted: power spectral density vs. frequency (Hz), comparing background traffic with traffic mixed with the iLOC attack]

Figure 3.5: PSD for Background Traffic vs. Traffic Mixed with iLOC Attack

Fig. 3.3 illustrates our experiment setup. For the purposes of this research, we requested

information about locations of a set of monitors in the ITM system. We were provided

with the identities of two network sets A and B. There are some monitors deployed within

network set A and there is no monitor in network set B. All monitors in network set

A monitor a set of IP addresses and record the port-scan logs. We then (acting as the attacker) executed the iLOC attack to determine whether monitors exist in network sets A and B, respectively.

In our experiment, we use a PN-code of length 15. The mark bit duration is set to 2

hours and the query duration is 20 minutes. With the queried report data, we can correctly

determine that all networks in set A are deployed with monitors and networks in B are not

deployed with monitors. Fig. 3.4 shows the traffic rate in the time domain. Fig. 3.5 shows the traffic rate in the frequency domain in terms of Power Spectrum Density (PSD). The PSD describes how the power of time-series data is distributed in the frequency domain. Mathematically, it is equal to the Fourier transform of the auto-correlation of the time-series data [9].

From these two figures, we observe that it is hard for others, without knowing the content of the PN-code, to detect the iLOC attack, since the overall traffic with the iLOC attack embedded is very similar to the traffic without it. That is, these experiments demonstrate that the iLOC attack can accurately and invisibly localize the monitors of ITM systems in practice.
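The PSD comparison of Fig. 3.5 can be reproduced for any traffic trace with a simple periodogram, which by the Wiener-Khinchin theorem is equivalent to the Fourier transform of the auto-correlation. A minimal sketch (direct DFT, adequate for short traces):

```python
import cmath

def psd(series):
    """Periodogram estimate of the Power Spectral Density.

    Computes |DFT|^2 / N of the mean-removed series for the
    non-negative frequency bins 0 .. N/2.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        out.append(abs(s) ** 2 / n)
    return out
```

A periodic component (such as the frequency-based attack's pattern) shows up as a sharp peak in this spectrum, while PN-code-encoded traffic spreads its power across frequencies.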

3.6 Performance Evaluation

3.6.1 Evaluation Methodology

In our evaluation, we use the real-world port-scan traces from the SANS ISC (Internet
Storm Center), including the detailed logs from 01/01/2005 to 01/15/2005 [103, 39].10 The traces used in our study contain over 80 million records and the overall data volume exceeds 80 GB. We use these real-world traces as the background traffic. We merge records of simulated iLOC attack traffic into these traces and replay the merged data to emulate the iLOC attack traffic. We evaluate different attack scenarios by varying attack parameters. Here, we only show the data on port 135; experiments on other ports result in similar observations.

We explore both attack accuracy and invisibility to evaluate attack performance. For attack accuracy, we use two metrics: one is the attack successful rate PAD and the other is

10We thank the ISC for providing us valuable traces in this research.

[Plot omitted: attack successful rate PAD vs. attack traffic rate P for iLOC (experiment and theory), volume-based, and frequency-based schemes]

Figure 3.6: Attack Successful Rate (Port 135)

the attack false positive rate PAF , which are defined in Section 3.4.1. For attack invisibility, we use two metrics: one is the defender detection rate PDD and the other is the defender false positive rate PDF , which are defined in Section 3.4.2.

We evaluate the iLOC attack in comparison with two other baseline attack schemes.

The first one is the localization attack that launches significantly high-rate port-scan traffic to target networks, as introduced in [18, 110]. We denote this attack as the volume-based

attack. The second baseline scheme embeds the attack traffic with a unique frequency pattern. In this attack, the attack traffic rate changes periodically. The attacker then expects the report data from the data center to show this unique frequency pattern if the selected target network is deployed with monitors. We denote this attack scheme as the frequency-based

attack. For fairness, we adjust the detection thresholds in all schemes so that reasonable attack false positive rates PAF and defender false positive rates PDF (below 1%) are achieved.

For the iLOC attack, we generate different attack traffic based on varying PN-code lengths L (i.e., 15, 30, 45). The default PN-code length is set to 30. To better quantify the attack traffic rate for the iLOC attack and the other attack schemes, we use the normalized attack traffic rate P , defined as P = V/σx for the iLOC attack, where σx is the standard deviation of the background traffic rate. The default value of Tq is 0.1 · Ts.

3.6.2 Results

Attack Accuracy

To compare the attack accuracy of the iLOC attack with that of the volume-based and frequency-based attack schemes, we plot the attack successful rate PAD under different attack traffic rates (i.e., P ∈ [0.01, 3]) as shown in Fig. 3.6. From this figure, we observe that both the iLOC and frequency-based attacks consistently achieve a much higher attack successful rate PAD than the volume-based scheme. This difference in PAD is more significant when the attack traffic rate is lower, which can be explained as follows. For the iLOC scheme, the PN-code based encoding/decoding makes the recognition of attack marks robust to interference from the background traffic. For the frequency-based scheme, the invariant frequency in the attack traffic is also robust to the interference of the background traffic. Both of them can distinguish their attack traffic accurately even when the attack traffic rate (i.e., P ) is small.

Nevertheless, the volume-based scheme relies on the high rate of attack traffic (i.e., large

P ), and thus, is very sensitive to the interference of the background traffic.

Attack Invisibility

To compare the attack invisibility performance of the iLOC attack with the other two attack schemes, we show the defender detection rate PDD on port 135 in Table 3.1. This table shows the attacker-achieved defender detection rate PDD, given different localization successful rates PAD (90%, 95%, and 98%). Recall that the defender sets the detection threshold to make the defender false positive rate PDF below 1%. In the table, “(Time)” and “(Freq)” mean that the defender adopts the time-domain and frequency-domain analytical techniques to detect attacks. It is observed that our iLOC scheme consistently achieves a much lower defender detection rate PDD than the other two schemes, which means the iLOC

attack achieves the best attack invisibility performance. As expected, the defender can easily detect the frequency-based attack with the frequency-domain analytical technique, as there is a unique frequency pattern in its attack traffic.

PAD    iLOC (Time)    iLOC (Freq)    Volume-based (Time)    Frequency-based (Freq)    Frequency-based (Time)
90%    2.5%           2.2%           90%                    90%                       2.9%
95%    2.8%           2.4%           95%                    95%                       3.1%
98%    3.1%           2.8%           98%                    98%                       3.3%

Table 3.1: Defender Detection Rate PDD (Port 135)

Impact of the Length of PN-code

To investigate the impact of the PN-code length on the performance of the iLOC attack, we plot the attack successful rate PAD for PN-code of different lengths (15, 30, 45) in

Fig. 3.7. In the legend, iLOC(L = x) means that the PN-code length is x. Data in this

figure are also collected for various attack traffic rates. This figure shows that the attack successful rate PAD increases with increasing PN-code length. This is because a longer

PN-code can more significantly reduce the interference impact from the background traffic on recognizing the attack mark, thereby achieving higher attack accuracy.

Impact of the Number of Parallel Localization Attacks

To evaluate the impact of parallel localization capability on attack accuracy, we show the attack successful rate PAD for different numbers of parallel attack

sessions on the same port in Fig. 3.8. In the legend, iLOC(N = x) means that there are

x parallel attack sessions. This figure shows that, in terms of the attack successful rate PAD,

the iLOC attack scheme is not sensitive to the number of parallel attack sessions. The attack successful rate PAD only slightly decreases with the increasing number of parallel attack sessions. The reason is that the traffic for different attack sessions is encoded by PN-codes which have low cross-correlation with each other, as described in Section 3.3.2, and thereby cause little interference among the sessions. Fig. 3.9 shows the impact of the number of parallel attack sessions on attack invisibility. It can be observed that an increasing number of parallel attack sessions results in a slight increase of the defender detection rate

PDD. Therefore, parallel localization capability can improve the attack efficiency without significantly compromising either accuracy or invisibility.

The iLOC attack achieves invisibility by using the PN-code, at the cost of a longer period over which the attack must be carried out. Nevertheless, parallel capability can significantly improve the attack efficiency. For example, take the case of attacking a system consisting of 1200 networks. Using one port, the volume-based attack needs 1200 units of time to perform the attack task. A single iLOC attack session with a code length of 15 needs 1200 × 15 = 18000 units of time while achieving higher accuracy and invisibility. To fulfill the same localization task, parallel iLOC with 8 attack sessions and the same code length can achieve similarly high accuracy and invisibility, and the total time is only 1200 × 15/8 = 2250 units of time, which is comparable to that of the volume-based attack.

3.7 Guidelines of Countermeasures

We have demonstrated the threat of the iLOC attack against ITM systems above. Now, we discuss possible countermeasures to such attacks. It is relatively easy to defend against volume-based and frequency-based localization attacks, which embed either a spike (using high-rate scan traffic) [18, 110] or an invariable frequency (using a certain frequency

[Plot omitted: attack successful rate PAD vs. attack traffic rate P for PN-code lengths L = 15, 30, 45 (experiment and theory), with frequency-based and volume-based baselines]

Figure 3.7: Attack Successful Rate vs. Code Length

[Plot omitted: attack successful rate PAD vs. attack traffic rate P for N = 2, 4, 8 parallel attack sessions, with frequency-based and volume-based baselines]

Figure 3.8: Attack Successful Rate vs. Number of Parallel Attack Sessions

[Plot omitted: defender detection rate PDD vs. attack traffic rate P for N = 2, 4, 8 parallel attack sessions, in the time and frequency domains]

Figure 3.9: Defender Detection Rate vs. Number of Parallel Attack Sessions

pattern), since these two attack schemes show strong signatures in the attack traffic (either in the time domain or frequency domain). However, in order to defend against the iLOC attack, the defender needs deep insight into the design of the attack. We give some general guidelines for counteracting the iLOC attack from the following aspects.

1) Limiting the Information Access Rate: Recall that in the iLOC attack, the attacker must generate a significant number of queries to the data center of ITM systems in order to accurately recognize the encoded attack traffic. We may exploit this knowledge to reduce the effectiveness of the iLOC attack. To do so, the data center may throttle the query request rate. One possible way is to enforce human/system interaction for each query, and thereby eliminate the automated queries of the iLOC attack. This can be conducted through authenticated registration, e.g., one authenticated registration is only valid for a certain number of queries. However, these limitations on the information access rate may also reduce the functionality and usability of ITM systems.
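A minimal sketch of such a per-registration query budget (the class name and policy details are illustrative, not a mechanism of any deployed ITM system):

```python
class QueryThrottle:
    """Grant each authenticated registration a fixed budget of
    data-center queries; exhausted registrations must re-register
    (forcing human/system interaction)."""

    def __init__(self, queries_per_registration):
        self.budget = queries_per_registration
        self.used = {}

    def allow(self, registration_id):
        """Return True and charge the budget, or False when exhausted."""
        n = self.used.get(registration_id, 0)
        if n >= self.budget:
            return False
        self.used[registration_id] = n + 1
        return True
```

Because the iLOC decoder needs one query per sliding-window step, even a coarse budget like this multiplies the attacker's cost in registrations and human interaction.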

2) Perturbing the Information: Recall that in the iLOC attack, the attacker needs to recognize the encoded attack traffic. Thus, the quality of reports plays an important role in this recognition process. To reduce the effectiveness of the iLOC attack, we may perturb the published report data by adding some random noise or even randomizing the data publishing delay. This principle is similar to data perturbation in the private data sharing realm

[145, 8]. By perturbing report data, the attack accuracy of iLOC attack can be degraded.

On the other hand, adding random noise and randomizing the delay in publishing report data will impact the data accuracy and usage of ITM systems. Studying such a trade-off will be one aspect of our future work.
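A sketch of such report perturbation (the noise level, the shift-based delay model, and the function name are illustrative assumptions):

```python
import random

def perturb_report(report, noise_sigma, max_delay, rng=None):
    """Perturb a report series before publication, per guideline 2:
    additive Gaussian noise plus a random publishing delay, modeled
    here as a right-shift of the series."""
    rng = rng or random.Random()
    delay = rng.randrange(max_delay + 1)
    # Additive noise; clamp at zero since traffic rates are non-negative.
    noisy = [max(0.0, v + rng.gauss(0, noise_sigma)) for v in report]
    # Delay publication: the first `delay` slots repeat the earliest value.
    return [noisy[0]] * delay + noisy[:len(noisy) - delay]
```

Noise lowers the attacker's correlation degree while delay randomization misaligns the PN-code bit boundaries, both of which degrade the decoding accuracy at some cost to report fidelity.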

3) Investigating Advanced Detection Schemes: Recall that in the iLOC attack, in order to effectively evade detection of monitors in ITM systems, the attacker has to continuously

launch attack/port-scan traffic to different new target networks to localize as many monitors as possible. Consequently, the target IP addresses of attack traffic may exhibit a widely dispersed distribution [67]. Thus, analyzing the distribution of IP addresses may provide one possible method of detection. Furthermore, since the iLOC attack uses the PN-code to encode the attack traffic, we plan to study potential ways of detecting PN-code in our future work.

3.8 Related Work

Many ITM systems have been developed and deployed since CAIDA initiated the network telescope project to monitor background traffic in 2001 [22]. Although the IP addresses of monitors themselves can be protected by mechanisms such as encryption and Bloom filters [52], the public data reported by these ITM systems could be used to disclose the IP address space covered by monitors. Existing attack approaches achieve this by launching high-rate port-scan traffic [18, 110]. However, these kinds of attacks do not consider the invisibility of the attack, since the high-rate attack traffic increases the chance of being detected.

The invisibility technique in our work borrows the camouflage principle, as illustrated in nature and the military. In nature, an animal can disguise itself as the object on which it stands in order to fool its predators or prey [12]. In the military, soldiers wear camouflage clothing designed to blend into the surrounding terrain [94]. As an invisibility technique, our work leverages PN-code technology and extends it to a new Internet cyber-security realm. The PN-code was initially used in military communication systems to provide anti-jamming and secure communication [97]. In wireless communication, the PN-code has been widely used to improve communication efficiency [35]. In addition, the PN-code has other

broad applications, such as cryptography [17], secure data storage and retrieval [126], and image processing [133].

In this chapter, we study techniques for applying the PN-code in the iLOC attack. The work in [139] also studied how to use the PN-code to effectively track anonymous flows through mix networks. Since it is applied to a different problem domain, the solution in [139] is significantly different from the one in this chapter, including the use of the PN-code, the designed algorithms, the decision rule, and the theoretical analysis.

3.9 Summary

In this chapter, we investigated a new class of attacks, i.e., the invisible LOCalization (iLOC) attack. It can accurately and invisibly localize monitors of ITM systems. Its effectiveness is demonstrated by theoretical analysis and experiments with an implemented prototype. We believe that our work lays the foundation for ongoing studies of attacks that intelligently adapt attack traffic to defenses. Our study is critical for securing and improving ITM systems.

As part of our future work, we will continue studying other invisible localization attack

schemes. While the PN-code used in this chapter is effective in achieving attack objectives

of accuracy and invisibility, other attack patterns embedded in attack traffic may also be

accurately distinguished only by the attacker. Detection of such invisible attacks and the design of corresponding countermeasures are still challenging tasks and will be conducted

in our future research. Investigation of proactive methods to protect the location privacy

of monitors is also a part of our future work. Also, we believe that other vulnerabilities

exist in ITM systems and we plan to conduct a thorough investigation of them and develop

corresponding countermeasures.

CHAPTER 4

VARYING SCAN RATE WORMS AGAINST NETWORK-BASED WORM DETECTIONS AND COUNTERMEASURES

In this chapter and the next, we study the other type of defense-oriented widespread Internet attacks, i.e., algorithm-oriented attacks. In this chapter, we study an evolved worm called the Varying Scan Rate worm (the VSR worm for short), which deliberately varies its scan rate and is able to avoid detection by existing network-based worm detection systems. We also present our new worm detection scheme, called the attack target Distribution Entropy based dynamiC detection scheme (DEC detection for short), which can effectively detect both VSR and traditional worms.

4.1 Motivations

In traditional active worms, each worm instance11 takes part in spreading worm attack

by scanning and infecting other vulnerable hosts in the Internet. The basic form of active

worms is the Pure Random Scan (PRS) worm, where a worm infected host continuously

scans a set of random Internet IP addresses to find new vulnerable hosts. There are several

variants of the PRS worm such as local subnet scan worm [27] and hit-list scan worm [114].

Both of these worms attempt to speed up their propagation by increasing the probability

11In this chapter, we use worm infected host, worm victim, and worm instance interchangeably.

85 of successful scanning. It has been observed that there are exponentially increasing trends in the number of infected hosts and overall scanning traffic volume over time when any of the above worms propagates in the Internet [86][27][148]. Based on these observations, many worm detection schemes associated with global scan traffic monitoring systems, such as threshold-based detection and trend-based detection have been developed to detect the large scale propagation of worms in the Internet [147][105][132][123].

However, active worms continue evolving. The ”Atak” worm [144] is a new active worm found in the recent past that attempts to remain hidden by going to sleep (stop scan- ning) when it suspects to be under detection. The worms which adopt similar attack strate- gies to that of the ”Atak” worm could result in an overall scan traffic pattern different from that of traditional worms. Therefore, the existing detection schemes based on global scan traffic monitoring will not be able to effectively detect them. Therefore, it is very important to understand such smart-worms in order to defend against them. In the following, we study two new classes of such worms and design new detection schemes which can effectively detect them and traditional worms.

4.2 Background

In this section, we first introduce the propagation model of traditional active worms.

We then discuss the worm detection framework and corresponding detection schemes as- sociated with it.

4.2.1 The Propagation Model of Traditional Worms

Active worms are similar to biological viruses in terms of their infection and self-propagating nature. Worms identify vulnerable hosts in their neighborhood and other networks and then infect them. The newly infected hosts propagate the worm infection

further to other vulnerable hosts and so on.

As discussed in Section 4.1, the Pure Random Scan (PRS) worm is the basic form of

traditional active worms. We use the PRS approach as our baseline attack scheme to study
the VSR worm and C-Worm. In our analysis, we adopt the epidemic dynamic model

for disease propagation [13] to characterize the worm propagation. Modeling active worm

propagation using the epidemic dynamic model has been done in [86][27][148]. This model

assumes that each host is in one of the following states: immune, vulnerable or infected.

An immune host is one that cannot be infected by a worm. A vulnerable host becomes an

infected host after being successfully infected by a worm. We use an average case analysis

approach to analyze the worm propagation. The analysis is conducted in the discrete time
domain.

We define M(i) and N(i) as the number of overall infected hosts and the number of un-infected vulnerable hosts at time tick i respectively, and E(i+1) as the number of newly

infected hosts from time tick i to time tick i + 1. We define T as the total number of IP addresses, i.e., 2^32 for IPv4. Thus N(0) = T · P1 · P2 is the number of vulnerable hosts

on the Internet before the worm attack starts, where P1 is the ratio of the number of IP

addresses assigned to existing hosts in the Internet to the entire Internet IP address space

T , P2 is the ratio of vulnerable host number to the total number of the existing hosts in

the Internet. We also assume E(0) = 0 and M(0) = M0. For the PRS worm, we have

propagation model as follows:

E(i + 1) = N(i)[1 − (1 − 1/T)^{S·M(i)}],   (4.1)

M(i + 1) = M(i) + E(i + 1),   (4.2)

N(i + 1) = N(i) − E(i + 1),   (4.3)

where S is the scan rate of a worm instance, i.e., the number of scans it sends per time tick.

The details of this model are in [27][132].
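As a concrete illustration, Formulas (4.1)-(4.3) can be iterated directly; the following Python sketch uses assumed values for T, P1, P2, S and M0 (they are not the dissertation's experimental settings).

```python
# Iterating the PRS propagation model of Formulas (4.1)-(4.3).
# All parameter values are illustrative assumptions.

def prs_propagation(T=2**32, P1=0.25, P2=0.01, S=100, M0=10, ticks=500):
    """Return the infected-host count M(i) for i = 0..ticks."""
    N = T * P1 * P2            # N(0): un-infected vulnerable hosts
    M = float(M0)              # M(0): initially infected hosts
    history = [M]
    for _ in range(ticks):
        # Formula (4.1): expected number of newly infected hosts
        E = N * (1 - (1 - 1 / T) ** (S * M))
        M += E                 # Formula (4.2)
        N -= E                 # Formula (4.3)
        history.append(M)
    return history

curve = prs_propagation()
# M(i) rises exponentially at first and saturates once N(i) is exhausted.
```

This exponential-then-saturating shape is the pattern that the trend-based detection schemes discussed below rely on.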

4.2.2 Network-based Worm Detection

Worm Detection System Framework

As we stated before, this chapter focuses on designing schemes to detect the Internet-wide, large-scale propagation of active worms. In order to achieve rapid and accurate detection, it is imperative to monitor and analyze the traffic at multiple locations over the Internet to detect suspicious traffic generated by worms. The generic worm detection framework that we use in this chapter consists of multiple distributed monitors and a worm detection center that controls them. This framework is widely adopted, and is similar to other existing worm detection systems, such as the Cyber Center for Disease Control [114], honeypots [113] and the Internet sink [138]. The distributed monitors are located at gateways, firewalls, and border routers of local networks. They passively record and report any unusual scan traffic data, such as connection attempts to unavailable IP addresses and restricted service ports, to the worm detection center. The report to the worm detection center generally includes scan information in the form of a tuple, i.e.,

(Source IP, Destination IP, Time, Signature). The worm detection center processes such reports and deploys various worm detection schemes to check whether there are suspicious, large-scale scans to restricted ports or connection attempts to unassigned IP addresses. If such excessive and uncommon scans are detected, the worm detection center concludes that there is a large-scale worm propagation over the monitored network.

Worm Detection Schemes

Various worm detection schemes based on global scan traffic monitoring can be integrated into the above detection framework. Similar to traditional intrusion detection systems, most worm detection schemes take three steps in worm detection: (i) collecting the data for detection purposes; (ii) analyzing the detection data; (iii) applying certain decision rules to obtain the detection result.

Most existing worm detection schemes use the count of monitored worm instances as the detection data. As for the statistical property of the detection data, the various detection schemes differ in format and usage. Most of them use the individual sampled detection data of each sampling window to decide on the presence of a wide-spreading worm [123][131]. Some of them use the variance of the sampled detection data to set the detection criterion, such as the threshold of the detection decision [132]. Based on the detection decision rule, the existing detection schemes can be classified into two groups, namely count-threshold-based and count-trend-based detection. The count-threshold-based scheme is a popular scheme to detect wide-spreading worms [132][123][131]. In this scheme, if the amount of observed worm instances/scan traffic exceeds a pre-configured threshold, large-scale worm spreading is identified as being in existence. On the other hand, the count-trend-based scheme [147] is based on the knowledge of existing worm attack epidemic dynamic models and uses the principle of "detecting dynamic traffic trends". That is, it is based on the observation that although the scan rate of a worm instance might be limited by the network bandwidth and CPU capacity, the worm instance does not change the scan rate on purpose. Thus, the existing worm attacks cause the number of worm instances to increase at a positive exponential rate, which can be monitored for detection purposes. In Section 4.3, we define a new active worm attack strategy and observe that the above two classes of detection schemes are not able to detect this new active worm effectively.

4.3 The Active Worm with Varying Scan Rate

In this section, we first define a new form of active worms, i.e., the active worm with

Varying Scan Rate (the VSR worm in short). We then analyze the effectiveness of the

VSR worm in changing the worm scan behavior and hence evading the existing global scan

traffic monitoring based worm detection schemes.

4.3.1 The VSR Worm Model

For the traditional worm, each worm instance scans and infects other hosts on the Internet. For the VSR worm, each worm-infected victim (worm instance) adopts a varying scan rate (S(t)) and a varying attack probability (Pa(t)). S(t) can be totally randomized or determined by a certain function depending on the worm attack strategy. The attack probability Pa(t) is the probability that a worm instance takes part in the worm attack (i.e., scans other hosts on the Internet) at time t. In practice, a worm attacker may divide the time into a sequence of discrete time units. Accordingly, S(t) and Pa(t) become discrete functions, S(i) and Pa(i). In the i-th time unit, worm instances calculate S(i) and Pa(i) and carry out the scan based on these two values. Procedure 2 shows the procedure of the VSR worm attack.

The scan rate function S(i) and the attack probability function Pa(i) are predetermined by the worm attacker. For example, a VSR worm may use S(i) = max(Sn / (1 + i), 0.02), which is a decreasing function of time (indexed by i). Here, Sn is a parameter that controls the overall scan rate during the attack. This VSR worm can use a periodic function for the attack

Procedure 2 The Pseudocode of the VSR Worm Attack

Require: This host is a worm-infected host
1: for all i = 0 to ∞ do
2:   /* current time is i */
3:   Calculate the random scan rate S(i);
4:   Calculate the attack probability Pa(i);
5:   Conduct the scan based on S(i) and Pa(i);
6: end for

probability,

Pa(i) = max(|cos(2π i / 5000)|, 0.08).   (4.4)

Our VSR worm model is generic. The "Atak" worm is one of its special cases, where

S(t) is close to a constant value and Pa(t) is determined by a time varying function. The traditional PRS worm is also a special case of the VSR worm, where S(t) is close to a constant value and Pa(t) is equal to 1. Other forms of worms such as local subnet scan worm [81][27] have propagation formulas different from Formula (4.1) for the PRS worm.

However, the varying S(t) and Pa(t) functions can be easily applied to those propagation formulas to model the combination of those worms with the VSR worm.
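The example control functions above can be written directly in code; this is a minimal sketch, with Sn as an assumed parameter and the floors 0.02 and 0.08 taken from the text's examples.

```python
import math

def scan_rate(i, Sn=100.0):
    """Decreasing scan rate from the text: S(i) = max(Sn / (1 + i), 0.02)."""
    return max(Sn / (1 + i), 0.02)

def attack_probability(i):
    """Periodic attack probability of Formula (4.4):
    Pa(i) = max(|cos(2*pi*i / 5000)|, 0.08)."""
    return max(abs(math.cos(2 * math.pi * i / 5000)), 0.08)

# Per Procedure 2, in each time unit a worm instance computes S(i) and
# Pa(i), then scans with probability Pa(i), issuing S(i) scans if it does.
```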

4.3.2 Analysis of the VSR Worm

In this section, we first develop the propagation model of the VSR worm. Following this, we investigate the performance of the existing worm detection schemes in detecting

VSR worms, in order to determine the effectiveness of the VSR worm in evading these detection schemes.

Propagation Model of the VSR Worm

For the VSR worm, Formula (4.2) needs to be modified to take S(i) and Pa(i) into consideration. For analysis purposes, we assume all the worm instances use the same scan rate function S(i) and attack probability function Pa(i). Then, we have

M(i + 1) = M(i) + N(i)(1 − (1 − 1/T)^{S(i)·Pa(i)·M(i)}).   (4.5)

Now we derive the number of worm instances observed by the detection system from time tick i − 1 to time tick i, referred to as M̂(i). The number of observed worm instances is affected by the percentage of the IP address space monitored by the detection system, referred to as Pm. Pm determines the average probability that a worm scan can be monitored by the detection system.

Assume that at time tick i, there are M(i) worm instances. Based on the VSR worm attack strategy, there are M(i) · Pa(i) worm instances actively conducting the scan, and each active instance generates S(i) scans between time tick i − 1 and time tick i. The probability that at least one of the S(i) scans generated by a worm instance will be detected by the detection system is 1 − (1 − Pm)^{S(i)}. Thus, the number of worm instances observed by the detection system from time tick i − 1 to time tick i is

M̂(i) = M(i) · Pa(i) · [1 − (1 − Pm)^{S(i)}].   (4.6)
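Formulas (4.5) and (4.6) can be combined into a small simulation of what the monitors would see; the numeric defaults below (N0, M0, Sn, K, Pm) are illustrative assumptions, with S(t) and Pa(t) in the shape of Formulas (4.7) and (4.4).

```python
import math

def simulate_vsr(ticks=500, T=2**32, N0=360_000, M0=100,
                 Sn=1000.0, K=200.0, Pm=1 / 2**12):
    """Return (M, M_hat): true and observed instance counts per tick."""
    def S(t):          # varying scan rate, as in Formula (4.7)
        return max(Sn / (1 + t / K), 0.08)

    def Pa(t):         # varying attack probability, Formula (4.4)
        return max(abs(math.cos(2 * math.pi * t / 5000)), 0.08)

    N, M = float(N0), float(M0)
    true_counts, observed = [], []
    for i in range(ticks):
        # Formula (4.5): propagation with time-varying S(i), Pa(i)
        E = N * (1 - (1 - 1 / T) ** (S(i) * Pa(i) * M))
        M, N = M + E, N - E
        true_counts.append(M)
        # Formula (4.6): instances visible to monitors covering Pm
        observed.append(M * Pa(i) * (1 - (1 - Pm) ** S(i)))
    return true_counts, observed
```

The oscillating Pa(t) and decaying S(t) make the observed series deviate from the monotone growth of M(i), which is the evasion effect discussed below.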

In the following, we show the simulation data on the propagation of the VSR worm. In order to show the performance of different VSR worms, we select S(i) to be

S(t, K) = max(Sn / (1 + t/K), 0.08),   (4.7)

and use the same Pa(t) as in Formula (4.4). We select the parameter K defined in Formula (4.7) to be 200, 250 and 300 respectively. Thus, three corresponding VSR worms

are generated, referred to as VSR(K = 200), VSR(K = 250) and VSR(K = 300), respectively.

Figure 4.1: Infection ratio of different VSR worms.

Figure 4.2: The observed worm instance count of different VSR worms.

Fig. 4.1 shows the pattern of worm-infected instances of the traditional PRS worm and the three VSR worms defined previously. For the PRS worm, the scan rates of the worm-infected hosts follow a normal distribution N(100, 100). It clearly demonstrates that the PRS worm has an exponentially increasing pattern of worm instance count during its propagation, and that the VSR worm can change this pattern. As shown in Fig. 4.2, the observed worm instance counts of VSR worms are also very different from that of the traditional PRS worm. This demonstrates that the VSR worm can successfully hide the real worm instance count (M(i)), change its pattern over time, and thus evade being effectively detected by the existing worm detection systems. In Fig. 4.2, the worm detection system covers a 2^20 IPv4 address space (Pm = 1/2^12), similar to that of the SANS ISC [103].

Effectiveness of the VSR Worm

In the following, we evaluate the effectiveness of the VSR worm in defeating some representative worm detection schemes. We define two metrics here. The first one is the Detection Time (DT), which is defined as the time taken to successfully detect a wide-spreading worm from the moment the worm spreading starts. The second metric is the Maximal

Infection Ratio (MIR), which is defined as the ratio of the number of infected hosts over the total number of vulnerable hosts up to the moment when the worm spreading is detected. This metric fundamentally quantifies the damage of the worm, i.e., how many hosts are infected by the time the worm spreading is detected. The importance of MIR is that it can distinguish the performance of two worms in the case that the two worms are detected at the same time (same DT), but have infected different numbers of hosts at the moment of detection.

The fundamental purpose of the detection schemes is to minimize the damage caused by

the worm. Hence MIR also quantifies the effectiveness of the worm detection schemes.

The higher this value is, the better is the worm attack performance and consequently, the

worse is the detection performance.
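The two metrics reduce to simple computations over an infection time series; the curve below is a toy example for illustration, not simulation output.

```python
def detection_time(start_tick, detect_tick):
    """DT: time from the start of worm spreading until detection."""
    return detect_tick - start_tick

def maximal_infection_ratio(infected, total_vulnerable, detect_tick):
    """MIR: fraction of vulnerable hosts infected when the alarm fires."""
    return infected[detect_tick] / total_vulnerable

M = [10, 40, 160, 640, 2560]       # toy exponential infection curve
print(detection_time(0, 3))        # DT = 3 time ticks
print(maximal_infection_ratio(M, 360_000, 3))
```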

In our simulations, the parameters of the traditional worm and the VSR worms are the same as those in Section 4.3.2. By analyzing the background non-worm scan traffic reported by the SANS Internet Storm Center [103] and the Goldsmith data [49], we are able to set the detection system parameters of all detection schemes to achieve a reasonable detection false alarm rate (below 5 × 10^−4). The detection false alarm rate is defined as the probability that a

worm spreading alarm is reported as the detection result when there is no wide-spreading

worm. We obtain the results for three existing detection schemes. The first one is the

generalization of the detection schemes based on the comparison of the observed victim

count and a predefined threshold [123]. We refer to this detection scheme as CISH. The second detection scheme is proposed in [132] and is based on the observed victim count variance and a dynamic threshold. We refer to this detection scheme as CVDH. The third detection scheme is proposed in [147] and is based on observing a predetermined trend of the victim

count during the worm propagation. We refer to this detection scheme as CISR. We also run

the Kalman filter following the design in [147] to perform CISR detection on VSR worms.

Tables 4.1 and 4.2 show the results of the above three detection schemes in detecting the

traditional PRS worm and VSR worms. Although all detection schemes are effective in

detecting the traditional PRS worm, they are not effective in detecting VSR worms. For

example, when K is 200, CISH and CISR schemes fail to detect the VSR worm while

CVDH is ineffective, i.e., 41% of the vulnerable hosts are infected at the moment when the worm is detected. Comparatively, CISR is less effective than the other detection schemes for the following reason: it tries to detect an exponentially increasing trend of the worm scan traffic, but the trend of the VSR worm's scan traffic does not follow the exponentially increasing pattern. This causes the Kalman filter to oscillate without convergence.

Our simulation results show that the above threshold-based and trend-based worm detection schemes are not able to effectively detect the VSR worm. In the following section, we develop a new detection scheme which aims to effectively detect the VSR worm.

Detection   Traditional Worm   VSR(200)   VSR(250)   VSR(300)
CISH        862                ∞          17700      7600
CVDH        879                33400      12011      9234
CISR        760                ∞          ∞          ∞

Table 4.1: Detection Time of Some Existing Detection Schemes

Detection   Traditional Worm   VSR(200)   VSR(250)   VSR(300)
CISH        3%                 100%       52%        12%
CVDH        4%                 41%        20%        23%
CISR        2%                 100%       100%       100%

Table 4.2: Maximal Infection Ratio of Some Existing Detection Schemes

4.4 DEC Worm Detection

4.4.1 Design Rationale

As we discussed in Section 4.3, the VSR worm can adopt intelligence in its attack such that it can behave differently from traditional worms. Consequently, the existing worm detection schemes are not effective in detecting the VSR worm.

In this section, we develop a new detection scheme called attack target Distribution Entropy based dynamiC (DEC) detection, which captures a key inherent worm propagation feature and is thus able to effectively detect the VSR worm. The VSR worm can manipulate the scan traffic volume over time so that it is undetectable by existing worm detection schemes based on global scan traffic monitoring. However, its fundamental goal is to propagate itself to as many vulnerable hosts as possible. Hence, to be effective, the

VSR worm still has to continuously propagate itself to new targets and cause large-scale infection across the Internet. Thus, the VSR worm scan traffic must show a widely dispersed distribution of scan target addresses, and this widely dispersed distribution of attack/scan targets can be used to distinguish VSR worm scan attacks from other occasional non-worm port scan activities, e.g., port scans due to software faults or occasional port scans by benign software. Motivated by this observation, our DEC detection uses the attack target distribution as the basic attack data for worm detection.

Recall our discussion in Section 4.2.2, where we mentioned that there are three important steps/elements associated with worm detection. DEC has special features in these three elements compared with the existing threshold-based and trend-based worm detection schemes: (i) Detection data of worm attacks: the distribution of the attack targets can be observed by the detection monitors in the detection system, and DEC treats this distribution as the basic detection data. While a distribution can be described and quantified in different formats, we use the entropy to capture the distribution of the attack targets. From an information theory perspective, a smaller entropy indicates a smaller number of anomalies in the detection data. (ii) Statistical property of worm detection data: while processing and analyzing the detection data sequences obtained from continuous detection sampling windows, we observe that the sample entropy can distinguish non-worm scan traffic from worm scan traffic more effectively than other statistical measures, e.g., the sample mean and variance. Hence, we use the entropy to process the sampled detection data and capture the abnormality during worm propagation. (iii) Detection decision rule: since the VSR worm can behave differently over time, we adopt run-time dynamic adaptations in the DEC detection.

4.4.2 DEC Worm Detection

DEC has specific features which improve the detection performance not only for the detection of VSR worm, but also for the detection of traditional worms. DEC obtains this improvement through the new features of three elements in worm detection as follows:

Attack Target Distribution

To deal with the VSR worm, DEC detection uses the distribution of the attack targets

in worm attacks as the detection data. The distribution of the attack targets is captured in

the form of entropy. In the following, we describe how to use the entropy to measure the

distribution of the worm attack targets.

The Shannon entropy H(X) of a data set X = (x1, x2, . . . , xn) is defined as

H(X) = − ∑_{i=1}^{n} p_i log(p_i),   (4.8)

where n = |X|, p_i = P[X = x_i] for i ∈ {1, . . . , n}, and the logarithm is taken in base 2. Entropy

quantifies "the amount of uncertainty" contained in data [109], where "uncertainty" means randomness. Entropy is frequently used to quantify the "randomness" of a distribution, or more precisely, the degree of dispersal or concentration of a distribution. The reason we use entropy is the following: when a worm is propagating, the destination IP addresses of the port scanning traffic (the worm attack target IP addresses) are widely dispersed (more random), and entropy naturally quantifies this distribution of worm attack targets, enabling the detection of worm propagation.

In the worm detection system, for each detection sampling window, the detection center collects reports of connection attempts targeting unused IP addresses or restricted service ports from the detection monitors located at different locations. Recall that, as mentioned in Section 4.2.2, each entry in a report table from the detection monitors has the following

format (Src, Dest, time, signature). Src represents the IP address of the worm instance;

Dest represents the destination IP address as the worm scan target; time represents the time stamp of scan; and signature represents some feature of scans, such as port number.

With all reports collected in one sampling window Ws (time unit), an attack/scan target

table is integrated for the entropy calculation with the following format: (Dest, sn). Here, Dest is the unique key representing the scan target IP address and sn is the number of distinct sources trying to scan Dest. For example, if an attack target table has M entries,

we have a set of data, Z1 = ((Dest1, sn1),..., (DestM , snM )). Mapping this case to the

Formula (4.8), we derive the entropy of worm attack target distribution,

H̄(Z1) = − ∑_{i=1}^{M} (sn_i / Y) log(sn_i / Y),   (4.9)

where Y = ∑_{i=1}^{M} sn_i.
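As an illustration, the aggregation of monitor reports into the (Dest, sn) table and the entropy of Formula (4.9) might look as follows in Python, using base-2 logarithms as in Formula (4.8); the report tuples are made-up sample data.

```python
import math
from collections import defaultdict

def target_entropy(reports):
    """H(Z1) = -sum_i (sn_i / Y) * log2(sn_i / Y) over scan targets."""
    sources = defaultdict(set)               # Dest -> distinct Src set
    for src, dest, _time, _signature in reports:
        sources[dest].add(src)
    sn = [len(s) for s in sources.values()]  # distinct-source counts
    Y = sum(sn)
    return -sum((c / Y) * math.log2(c / Y) for c in sn)

reports = [("10.0.0.1", "192.0.2.7", 0, "p445"),
           ("10.0.0.2", "192.0.2.7", 1, "p445"),
           ("10.0.0.1", "192.0.2.9", 2, "p445"),
           ("10.0.0.3", "192.0.2.11", 3, "p445")]
# Targets have sn = (2, 1, 1), so Y = 4 and the entropy is 1.5 bits.
print(target_entropy(reports))
```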

Statistical Property of Worm Detection Data

To deal with the VSR worm properties such as the time varying scan rate and the time

varying scan traffic pattern, we use a statistical methodology to process the detection data

sequence obtained from continuous detection sampling windows to improve the detection

accuracy. During worm detection, the detection center configures a detection sliding window Wd (time unit), which includes q (> 1) continuous detection sampling windows (recall that the size of each sampling window is Ws). As discussed before, the detection center calculates the target distribution in terms of entropy in each sampling window by Formula (4.9) as the detection data. Thus, there is one detection datum in each sampling window. Within the sliding window Wd, there are q target distribution entropy values, denoted by Z2 = (H̄_{i−q+1}, H̄_{i−q+2}, . . . , H̄_i), recorded at time i.

The detection center can use the sample mean and the sample entropy as the statistical property to process the above q detection data in a detection sliding window. The sample mean Ê(H̄(Z2)) and sample entropy Ĥ(Z2) of the q target distribution entropy series Z2 are defined below:

Ê(H̄(Z2)) = (1/q) ∑_{j=1}^{q} H̄_{i−q+j},   (4.10)

Ĥ(Z2) = − ∑_{j} (o_j / q) log(o_j / q) + log(B).   (4.11)

In Ĥ(Z2), we use the histogram-based entropy estimation in [84]. Here, o_j is the number of sample points in the j-th bin and B is the histogram's bin size. Note that these two parameters are obtained from the q target distribution entropy values denoted by Z2 [84]. Based on the mean square error of the entropy estimation, we can obtain the optimal bin size for the entropy estimation [84].
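A minimal sketch of the histogram-based estimate of Formula (4.11), assuming a fixed bin width B chosen in advance (the cited work [84] instead derives an optimal B from the mean square error of the estimate):

```python
import math

def sample_entropy(values, B):
    """H_hat = -sum_j (o_j/q) log2(o_j/q) + log2(B), with o_j the number
    of the q sample points falling in histogram bin j of width B."""
    q = len(values)
    lo = min(values)
    bins = {}
    for v in values:
        j = int((v - lo) / B)        # index of the bin containing v
        bins[j] = bins.get(j, 0) + 1
    return (-sum((o / q) * math.log2(o / q) for o in bins.values())
            + math.log2(B))
```

With B = 1, a window of identical entropy values gives a sample entropy of 0, while dispersed values yield a larger estimate.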

Decision Rule Adaptation

With the above statistical property of the detection data, the last step in worm detection is to apply detection decision rules. The threshold-based scheme has been widely used in the anomaly detection field [132][36]. Worm detection is performed by comparing statistical features of the background non-worm traffic and the detection data traffic. As the VSR worm adopts a varying scan rate and its spreading follows time-varying dynamics, we adopt statistical pattern recognition as a fundamental principle and apply a dynamic threshold to deal with the VSR worm.

Our dynamic threshold adaptation is inspired by the following observations. Assume that Xn is a random variable representing the detection data in the normal system without worm spreading, and that Xw is the random variable representing the detection data in the network system under worm attack. With statistical pattern recognition as a principle, Fig. 4.3 shows the two probability density functions of normal non-worm traffic and worm attack traffic. From this figure, we obtain two observations: (i) For the threshold selection, there is a trade-off between the false alarm rate and the detection rate. The detection rate is the probability that a wide-spreading worm is detected successfully. When the threshold is set larger, it causes a smaller false alarm rate and a smaller detection rate. The smaller detection rate causes a longer detection time. (ii) When the variance of the attack traffic Xw increases, the threshold should be dynamically adjusted to be smaller in order to maintain a certain detection rate.

Figure 4.3: Bayes decision rule for normal and worm traffic features (distributions of Xn and Xw, the threshold Tr, and the resulting detection and false alarm rates).

The above two observations provide guidelines for determining the dynamic threshold. The threshold needs to consider the normal non-worm traffic property to achieve a low false alarm rate. It also needs to consider the run-time detection traffic property, i.e., its variance. When the run-time traffic variance becomes larger, the threshold needs to take a relatively smaller value in order to achieve a high detection rate. If the normal non-worm traffic and worm attack traffic follow normal distributions, it is possible to obtain a closed-form formula for the optimal threshold. We present the method to obtain the optimal threshold in the case that the sample entropy is used as the statistical property of the detection data in [140]. For the optimal threshold in the cases of sample mean and sample variance, please refer to [118].

Based on the above observations, we conduct dynamic threshold adaptation. At the initialization stage of worm detection, there is an initial threshold value, Tr0, which is obtained through an off-line training process with a large amount of normal non-worm Internet scan traffic traces [103]. As a result, we can obtain the initial Tr0 that achieves a reasonable false

alarm rate. With the run-time detection data, threshold Tr is dynamically adjusted based on

the run-time detection traffic variance σ(H̄) as

Tr = [1 − α · min(σ(H̄)/Vm, 1)] · Tr0,   (4.12)

where Vm is a constant. The min(·) term normalizes the detection data variance σ(H̄). The parameter α ∈ (0, 1) sets the maximal threshold adjustment. Clearly, there is a trade-off in selecting α: a larger value of α will improve the detection rate but potentially increase the false alarm rate.

In Formula (4.12), σ(H̄) can be calculated as follows. At time tick i, there are q target distribution entropy values, denoted by Z2 = (H̄_{i−q+1}, H̄_{i−q+2}, . . . , H̄_i), recorded in the sliding window Wd. The sample variance σ(H̄) of Z2 is defined by

σ(H̄) = [ (1/q) ∑_{j=1}^{q} (H̄_{i−q+j} − Ê(H̄))² ]^{1/2},   (4.13)

where Eˆ(H¯ ) is calculated by Formula (4.10).

With the dynamic threshold Tr obtained by Formula (4.12) and the sample entropy of the detection data by Formula (4.11), the DEC detection scheme conducts the detection by comparing the sample entropy with the threshold Tr. If the sample entropy is larger than Tr, an alarm of a wide-spreading worm is generated.
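Putting Formulas (4.10), (4.12) and (4.13) together, the DEC decision step might be sketched as follows; the defaults alpha = 0.04 and Vm = 200 follow the configuration in Section 4.5.1, while Tr0 is an assumed trained value supplied by the caller.

```python
import math

def dynamic_threshold(entropies, Tr0, alpha=0.04, Vm=200.0):
    """Tr = [1 - alpha * min(sigma/Vm, 1)] * Tr0 (Formula 4.12), where
    sigma is computed over the sliding window (Formulas 4.10, 4.13)."""
    q = len(entropies)
    mean = sum(entropies) / q                                       # (4.10)
    sigma = math.sqrt(sum((h - mean) ** 2 for h in entropies) / q)  # (4.13)
    return (1 - alpha * min(sigma / Vm, 1.0)) * Tr0

def dec_alarm(sample_entropy, window_entropies, Tr0):
    """Raise a worm alarm when the sample entropy exceeds the dynamic Tr."""
    return sample_entropy > dynamic_threshold(window_entropies, Tr0)
```

A larger run-time variance lowers Tr and makes the alarm more sensitive, which is the adaptation argued for in observation (ii) above.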

4.4.3 Space of Worm Detection

As we discussed in Section 4.2.2 and Section 4.4.1, there are three important elements/steps in worm detection. Fig. 4.4 shows the space of schemes, in which the three elements are shown as three different dimensions. We can use a tuple to represent a detection scheme subspace: (Detection Data, Statistical Processing, Decision Rule). We then have 32 possible combinations, and the whole detection scheme space is divided into 32 subspaces, as shown in Fig. 4.4. Each subspace represents a different type of detection scheme.

Figure 4.4: Space of worm detection. The three dimensions are detection data (1: counter, 2: distribution), statistical property (1: individual, 2: mean, 3: variance, 4: entropy), and decision rule (1: static threshold, 2: dynamic threshold, 3: static trend, 4: dynamic trend); CISH, CISR, CVDH, DVDH and DEC occupy different subspaces.

The traditional count-threshold-based detection schemes are in the subspace modeled by the tuple (Count, Individual, Static tHreshold). We refer to this group of detection schemes as CISH. The traditional count-trend-based detection schemes (referred to as CISR) are in the subspace modeled as (Count, Individual, Static Trend) [147]. The detection scheme of [132] is in the subspace modeled as (Count, Variance, Dynamic tHreshold), and we refer to it as CVDH. The extension of CVDH obtained by replacing the worm instance count with the worm attack target distribution generates another detection scheme, referred to as DVDH in this chapter. DVDH is in the subspace modeled as (Distribution, Variance, Dynamic tHreshold). Our DEC detection scheme is in the subspace modeled as (Distribution, Entropy, Dynamic tHreshold). We refer to the DEC scheme as DEDH in the following section in order to emphasize the comparison with other schemes.

With the space of detection schemes, we can comprehensively compare our DEC detection

scheme with other existing schemes. This detection scheme space can also inspire the study

of potential new worm detection schemes.

4.5 Performance Evaluation

In this section, we report the results of our simulations to show the detection perfor-

mance of various worm detection schemes under different VSR and traditional PRS worm

attacks. We also investigate the sensitivity of detection performance to the worm attack

parameters.

4.5.1 Evaluation Methodology

In this chapter, we evaluate our proposed detection scheme (DEC or DEDH) by comparing its performance with the representative schemes discussed in Section 4.4.3, i.e., CISH, CISR, CVDH and DVDH.

Evaluation Metrics

The first two metrics we use are the Detection Time (DT) and the Maximal Infection Ratio (MIR), defined in Section 4.3.2. We also use the detection false alarm rate defined in Section 4.3.2, which shows the accuracy of the detection scheme.

Simulation Setup

We use real-world Internet traffic traces as the background non-worm scan traffic in

our simulations. To do so, we analyzed scan traffic reported by the SANS Internet Storm Center [103] and the Goldsmith data [49]. The default parameters in our simulation are set as follows. The total number of vulnerable hosts on the Internet is 360,000, which is similar to the number of total vulnerable hosts in the Code-Red worm incident [148]. The unit of

the scan rate is the number of scans per time unit. For the traditional worm attack, we

assume that the different infected hosts (worm instances) have different scan rates, but

each worm instance has a scan rate S (> 0) which is determined by a normal distribution S = N(Sm, Sσ²).¹² In our simulations, we select Sσ as 100 and change Sm from 50 to 350 to evaluate different traditional PRS worms [147].

We simulate the VSR worms as follows. Each worm instance adopts a varying scan

rate (S(t)) and a varying attack probability (Pa(t)), both functions of time t. The S(t) function in our simulation is

S(t) = max(C1 / (1 + t/K), C2),   (4.14)

which is a decreasing function of time t (C1 and C2 are constants). Note that S(t) is the

varying scan rate adopted by the VSR worm instance defined in Section 4.3.1. The attack

probability Pa(t) in our simulation is

Pa(t) = max(|cos(2π t / 5000)|, C3),   (4.15)

where C3 is a constant. Different values of C1, C2 and C3 correspond to different S(t) and

Pa(t) functions, hence representing different VSR worms. Due to space limitation, we only

present a limited number of cases here. However, we found that the conclusions we draw

hold for all other cases we have evaluated.

We assume that the detection system has distributed monitors which cover 2^20 IP addresses. We select 2^20 as the detection system coverage size to simulate the coverage of

¹²Each worm instance may have a different out-going link bandwidth and computing capacity, and thus may have a different scan rate.

the SANS ISC [103]. The detection sampling window Ws is set to 5 time units, and the detection sliding window Wd is set to be incremental from 100 units to 600 units. The incremental selection of Wd can be adaptive, to reflect the worm scan traffic dynamics caused by VSR worms with various speeds. For fair comparison, the thresholds of the detection schemes in the following evaluations are consistently configured with values achieving a similar false alarm rate (below 5 × 10^−4). For this purpose, the maximal adjustable ratio of the detection threshold, α, and the parameter Vm (defined in Formula (4.12)) are 0.04 and 200, respectively. Note that our DEC scheme uses Formula (4.12) to dynamically adjust the threshold.

4.5.2 Detection Performance

In this section, we first compare the performance of our DEC (or DEDH) detection

scheme with other detection schemes under different VSR worm attacks. Then we report

the comparison between our DEC detection and other detection schemes under different

traditional PRS worm attacks.


Figure 4.5: Detection time of detection schemes on VSR worms


Figure 4.6: Maximal infection ratio of detection schemes on VSR worms

Detection Performance on VSR Worms

As shown in Tables 4.1 and 4.2, CISR is not effective in detecting VSR worms. Hence,

we do not report the performance of CISR in this subsection. Fig. 4.5 shows the Detec-

tion Time (DT ) of various detection schemes under VSR worm attacks with different K

(defined in Formula (4.14)). Fig. 4.6 shows the corresponding Maximal Infection Ratio

(MIR). From these two figures, we make the following observations: a) Our DEC (or

DEDH) detection scheme consistently achieves the best detection performance in terms

of both DT and MIR. For example, when K = 400, the detection time of DEDH is 820

units, which is only 25% − 50% of that of the other detection schemes. This means that DEC is more

robust and can detect the VSR worm significantly faster than other detection schemes. With

the same K, the MIR achieved by our DEDH is 0.008, which is only 14% − 40% of the

other detection schemes. This means that DEDH can prevent many more vulnerable hosts

from being infected by the VSR worm, compared with other detection schemes. b) The detection performance of DVDH is better than that of CVDH, in terms of both DT and MIR. This

result consistently confirms that the worm attack target distribution is more effective than

victim count as the detection data. c) When the VSR scan rate parameter K increases, all detection schemes achieve faster worm detection (smaller DT) and result in a smaller MIR.

This is because a larger K value implies faster VSR worm scanning. Thus, the VSR worm is detected earlier (smaller DT), and the damage caused by the VSR worm (MIR) is also smaller.

As discussed, there is a trade-off in selecting dynamic threshold parameter α in For-

mula (4.12). A larger value of α will achieve faster detection (smaller detection time) but

worse detection accuracy (larger false alarm rate). Table 4.3 shows the detection time and

detection false alarm rate with different values of α for our DEDH detection scheme. This

table shows that the false alarm rate is sensitive to the dynamic threshold parameter α.

When the value of α is larger, the false alarm rate is also larger.

Parameter α            0         0.04      0.08     0.16
False alarm rate       0.00001   0.00003   0.006    0.015
Detection time (DEDH)  1103      890       850      814
MIR (DEDH)             0.0025    0.0016    0.0013   0.0011

Table 4.3: DEC Performance Sensitivity of Parameter α

Detection Performance on Traditional PRS Worms

For the detection of the traditional PRS worm, we evaluate all five detection schemes

listed in Fig. 4.4. Fig. 4.7 shows the Detection Time (DT ) of various detection schemes with different mean values of the scan rate Sm. Fig. 4.8 shows the corresponding Maximal

Infection Rate (MIR). From these two figures, we can make the following observations:

a) Our DEC (or DEDH) detection scheme consistently achieves the best detection performance in terms of both DT and MIR among the five detection schemes. For example,


Figure 4.7: Detection time of detection schemes on the traditional PRS worms


Figure 4.8: Maximal infection ratio of detection schemes on the traditional PRS worms

when the mean scan rate is 150 per unit time, DEDH achieves a detection time of 240 time units, which is only 30% − 50% of the detection time of the other detection schemes. With the same mean scan rate, DEDH achieves an MIR of 0.004, which is only 15% − 35% of that of the other detection schemes. b) When the mean scan rate increases, all detection schemes achieve a shorter detection time. All the detection schemes except

CVDH and DVDH also achieve a smaller MIR. This highlights that, for most detection schemes, relatively small scan rates allow the worm to cause greater damage before

the worm spreading is identified. The reason is that a faster worm increases the

worm instance count more quickly, and thus the detection system detects it much earlier. However,

extremely slow worm propagation contradicts the goal of active worms. c) When the

scan rate increases, the MIR of CVDH and DVDH shows a different trend compared with the other detection schemes. This observation matches the result in [132]. Fig. 4.8 also shows that CVDH and DVDH can achieve better performance in terms of MIR compared

with CISH and CISR when the worm scan rate is relatively low. This confirms that dynamically adjusting the detection threshold in DEDH is a good way to improve the detection

performance, especially for detecting stealthy worms with varying scan rates.

To summarize our observations, we can see that our DEC-based detection scheme is

highly effective in detecting both VSR worms and traditional PRS worms.

4.6 Related Work

The significant damage caused by active worms has drawn substantial attention to their study. Much effort has been devoted to worm modeling, analysis, detection, and

defense. In the area of modeling and analyzing active worms, Staniford et al. studied

various active worms and modeled the propagation of them [114]. Chen et al. analyzed the

propagation of active worms with a discrete time model and also considered the impact of

patching during the worm spreading [27]. Moore et al. analyzed and modeled Slammer

worm spreading in [86]. There are some other works such as malware spreading dynamics

by Garetto et al. in [48] and Code-Red worm modeling by Zou et al. in [148]. We modeled

a new form of worms, namely the Varying Scan Rate worm, which generalizes worms

that deliberately change the scan rates to evade the existing global scan traffic monitoring

based worm detection schemes. Examples of such worms are the “Atak” worm [144] and

the “self-stopping” worm [75].

The most important component of worm defense systems is worm detection, which is the foundation of defense against worms. In worm detection, many schemes leverage intrusion detection results [36]. Besides the detection schemes based on global scan traffic monitoring discussed in Section 4.2.2, there are other types of worm detection schemes. For example, using sequential hypothesis testing, Jung et al. developed a Threshold Random Walk online detection algorithm to identify worm-infected hosts [105].

Gu et al. developed DSC (Destination-Source Correlation) for detecting worms in local networks, which is based on the worm infection factor, i.e., the victim host is first scanned and then sends out scans destined for the same port [53]. Kim and Karp proposed a scheme to automatically generate worm signatures [61].

Our DEC detection scheme is different from the above detection schemes in the sense that

DEC uses attack target distribution as the detection data to capture worm propagation.

Furthermore, it uses the statistical property of entropy as a mining utility to synthesize the detection data, and adopts a dynamic detection decision rule to improve the detection performance. A recent similar work [67] also discussed using traffic distributions (summarized by entropy), such as the destination IP address distribution, to detect various anomalies. However, the work in [67] differs from ours in the following regards: its detection scheme was not tuned specifically for worm detection, it did not address the impact of various statistical properties (sample mean/variance/entropy) on detection performance, and it did not compare representative worm detection schemes. Our work covers a thorough study of worm detection. In particular, we defined a three-dimensional worm detection space to leverage

existing schemes, and applied various statistical properties and a dynamic threshold to

enhance the accuracy in detecting both VSR and traditional worms.

4.7 Summary

In this chapter, we modeled a new form of worms called Varying Scan Rate worm

(the VSR worm in short). The VSR worm is generic and simple to launch. Our results

showed that the VSR worm can significantly degrade the effectiveness of existing worm

detection schemes based on global traffic monitoring. To counteract the VSR worm, we

developed a new worm detection scheme called the attack target Distribution Entropy based dynamiC detection scheme (the DEC detection). The DEC detection utilizes the attack target distribution and its statistical entropy in conjunction with dynamic decision rules to distinguish worm scan traffic from non-worm scan traffic. Our data clearly demonstrated the effectiveness of the DEC detection scheme in detecting VSR worms as well as traditional PRS worms. To the best of our knowledge, our work is the first to systematically study active worms with deliberately varying scan rates and to develop an effective detection scheme against them.
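As an illustration of the entropy-based detection data (not the dissertation's exact formulation; the function name is ours), the Shannon entropy of the empirical destination distribution in a detection window can be computed as follows. Random worm scanning spreads targets widely and drives this entropy up relative to typical non-worm background traffic.

```python
# Illustrative sketch: Shannon entropy of the attack-target (destination)
# distribution observed in one detection window. Not the dissertation's
# exact DEC formulation.
import math
from collections import Counter

def target_distribution_entropy(destinations):
    """Shannon entropy (bits) of the empirical destination distribution."""
    counts = Counter(destinations)
    total = len(destinations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```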

In our research, in addition to the above VSR worm, we have also investigated another new class

of active worms, i.e., Camouflaging Worm (C-Worm), which has the ability to camouflage

its propagation from worm detection systems [142]. The C-Worm intelligently manipulates

its scan traffic volume dynamically over time so that its propagation may not be detected

by existing network-based worm detection algorithms. We analyzed characteristics of the

C-Worm and compared the traffic of both the C-Worm and normal non-worm scans. We observed that they are barely distinguishable in the time domain. However, in the frequency

domain, the distinction between them is clear due to the recurring manipulative nature of

the C-Worm. Motivated by our observations, we designed a novel spectrum-based scheme to detect the C-Worm. Our scheme uses the Power Spectral Density (PSD) distribution of the scan traffic volume and its corresponding Spectral Flatness Measure (SFM) to distinguish the C-Worm traffic from non-worm traffic. We conducted extensive performance evaluations of the C-Worm through simulation using real-world traces as background scan traffic. The performance results demonstrate that our spectrum-based scheme can more rapidly and accurately detect the C-Worm in comparison with existing worm detection schemes.
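The SFM test described above can be sketched as follows, under our own simplifying assumptions (periodogram PSD estimate; the function name is ours). The SFM is the ratio of the geometric to the arithmetic mean of the PSD: values near 1 indicate a noise-like flat spectrum, while the C-Worm's recurring traffic manipulation concentrates power at a few frequencies and yields a low SFM.

```python
# Sketch of a Spectral Flatness Measure computed from scan-traffic volume
# samples x[t]. Simplified stand-in, not the dissertation's exact scheme.
import numpy as np

def spectral_flatness(x):
    psd = np.abs(np.fft.rfft(x)) ** 2   # periodogram estimate of the PSD
    psd = psd[1:]                       # drop the DC component
    psd = np.maximum(psd, 1e-12)        # avoid log(0)
    geo_mean = np.exp(np.mean(np.log(psd)))
    return geo_mean / np.mean(psd)      # near 1: flat; near 0: peaky spectrum
```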

CHAPTER 5

POLYMORPHIC WORMS AGAINST HOST-BASED WORM DETECTION AND COUNTERMEASURES

In this chapter, we do not define any new attack model, because the evolving worms we study here already exist. Worm writers know that most host-based worm detection algorithms use binary representations or signatures of previously seen worms as references to distinguish worms from benign executables; hence, they generate polymorphic worms that attempt to change their signatures during propagation. In order to fight this real-world threat, we propose a new worm detection approach based on mining dynamic program executions. This approach captures the dynamic behavior of executables to provide accurate and efficient detection of both seen and unseen worms, including polymorphic worms.

5.1 Motivations

In general, there are two types of worm detection systems: network-based detection and host-based detection. Network-based detection systems detect worms primarily by monitoring, collecting, and analyzing the scan traffic (messages to identify vulnerable computers) generated by worm attacks. Many detection schemes fall into this category

[132, 123, 112, 62, 147, 67, 141]. Nevertheless, because of their reliance on scan traffic,

these schemes are not very effective in detecting worms that spread via email systems,

instant messenger (IM) or peer-to-peer (P2P) applications.

On the other hand, host-based detection systems detect worms by monitoring, collect-

ing, and analyzing worm behaviors on end-hosts. Since worms are malicious programs that

execute on these machines, analyzing the behavior of worm executables13 plays an impor-

tant role in host-based detection systems. Many detection schemes fall into this category

[1, 64, 106]. Considering that a large number of real-world worm executables are accessi-

ble over the Internet, they provide an opportunity for researchers to directly analyze them

to understand their behavior and, consequently, develop more effective detection schemes.

Therefore, the focus of this chapter is to use this large number of real-world worm executa-

bles to develop a host-based detection scheme which can efficiently and accurately detect

new worms.

Within this category, most existing schemes have been focusing on static properties

of executables [64, 106]. In particular, the list of called Dynamic Link Libraries (DLLs),

functions and specific ASCII strings extracted from the executable headers, hexadecimal

sequences extracted from the executable bodies, and other static properties are used to

distinguish malicious and benign executables. However, using static properties without

program execution might not accurately distinguish between these executables due to the

following two reasons.

• First, two different executables (e.g., one worm and one benign) can have the same static

properties, i.e., they can call the same set of DLLs and even call the same set of

functions.

13In this chapter, an executable means a binary that can be executed, which is different from program source code.

• Second, these static properties can be changed by the worm writers by inserting

“dummy” functions in the worm executable that will not be called during program

execution, or by inserting benign-looking strings [32, 50, 63, 20].

Hence, the static properties of programs, or how they look, are not the keys to distinguishing worm and benign executables. Instead, we believe the keys are what programs do, i.e., their run-time behaviors or dynamic properties. Therefore, our study adopts dynamic program analysis to profile the run-time behavior of executables to efficiently and accurately detect new worm executables. However, dynamic program analysis poses three challenges. First, in order to capture the run-time behavior of executables (both worm and benign ones), we have to execute a large number of malicious worms, which might damage our host and network systems. Second, given the large number of executables, manually executing and analyzing them is infeasible in practice. Hence, we need to find an efficient way to automatically capture programs’ run-time behavior from their execution.

Third, from the execution of a large set of various worms and benign executables, we need to find some constant and fundamental behavior differences between the worms and the be- nign executables in order to accurately determine whether an unseen executable is a worm or a benign executable.

In order to address the above issues, we propose an effective worm detection approach

based on mining system-call traces of a large amount of real-world worms and benign

executables. Our goal is to use large volume of existing worms to capture the common

dynamic behavior features of worms and use them to detect new worms.

5.2 Background

In this section, we give an overview of worm detection and then introduce program analysis and data mining techniques.

5.2.1 Worm Detection

Generally, worm detection can be classified into network-based and host-based schemes.

Network-based schemes detect worm attacks by monitoring, collecting, and analyzing worm-generated traffic. For this purpose, Internet Threat Monitoring (ITM) systems have now been developed and deployed [103, 15]. An ITM system usually consists of a number of monitors and a data center. Each monitor of an ITM system is responsible for monitoring traffic targeted to a range of unused, yet routable, IP address space and periodically reports the collected traffic logs to the data center. The data center analyzes the logs and posts summarized reports to warn of Internet worm attacks. Based on data reported by ITM systems, many detection schemes have been proposed [132, 123, 146, 59]. Nevertheless, as we mentioned in Section 5.1, these detection schemes have limitations in detecting worms that spread via e-mail systems, instant messenger (IM), or peer-to-peer (P2P) applications, since their traffic is difficult for ITM systems to observe [77].

Host-based schemes detect worm attacks by monitoring, collecting, and analyzing the worm behavior on end-hosts. In particular, when a worm executes on an infected computer, it may take control of the system with high privileges, modify the system as needed, and continue to infect other computers. These acts expose some anomalies on the infected com- puters, such as writing or modifying registry keys and system binaries or opening network connections to transfer worm executables to other vulnerable computers. For example,

the “Blaster” worm changes a registry entry, downloads a file named “msblast.exe”, and

executes it [25].

Traditional host-based detection focuses primarily on detecting worms by signature

matching. In particular, these detection systems have a database of distinctive patterns

(signatures) of malicious code for which they scan in possibly-infected systems. This approach is fast and, until recently, quite effective in detecting known worms. However, it is

not effective in detecting new worms, as they have signatures unknown to these detection systems during the worms’ early propagation stage. Furthermore, worm writers can

use the clear worm signatures generated or used by these detection systems to change the

signatures in order to evade detection. For example, worms such as MetaPHOR [79] and

Zmist [45] intensively metamorphose to hide themselves from detection, thereby illustrat-

ing the feasibility and the efficiency of mutation techniques. Recent data show that current

commercial worm scanners can be easily circumvented by the use of simple mutation tech-

niques [29, 30].

Since attackers always want to hide their malicious actions, they do not make their at-

tack source code publicly available. However, the attack executables are publicly available

after the attacks are captured. Unlike classical host-based detection, our intention is to

use a large number of real-world worm executables and further develop a generic detec-

tion scheme to detect new worms. For this purpose, dynamic program analysis plays an

important role and is introduced in the following subsection.

5.2.2 Program Analysis

Unlike static program analysis, dynamic program analysis does not require the exe- cutable’s source code, but dynamic analysis must be performed by executing the program

[42, 72]. Most dynamic program analysis methods, such as debugging, simulation, binary instrumentation, execution tracing, and stack status tracking, are primarily used for software-engineering and compiler-optimization purposes. Recently, interest in dynamic program analysis has arisen for vulnerability and “security hole” detection purposes. However, some dynamic-analysis approaches are only suitable for analyzing individual executables with human expertise, such as debugging, or only fit specific attacks

[44, 100]. For our work, we need an appropriate dynamic program analysis method to investigate the run-time behavior of worm and benign executables to detect worms. The method we adopt here is to trace system calls during the program execution, which is a type of execution tracing. In particular, we trace the operating system calls invoked by the programs during their execution. This method can be used to automatically record interest- ing information during execution to further investigate executables’ behavior in the course of worm detection.

5.2.3 Data Mining

Data mining refers to the process of extracting “knowledge,” or meaningful and useful information, from large volumes of data [40, 54]. This is achieved by analyzing data from different perspectives to find inherent hidden patterns, models, relationships, or any other information that can be applied to new datasets. It includes algorithms for classification, clustering, association-rule mining, pattern recognition, regression, and prediction, among others. Data-mining algorithms and tools are widely adopted in a range of applications as well as in the computer-security field. In particular, various data-mining technologies are adopted in different threat-detection approaches as described in Section 5.7. In our work,

we use classification algorithms to differentiate between worm and benign program execution in order to provide accurate worm detection against both seen and unseen worms.

5.3 Polymorphic Worms

Although numerous efforts have been made to detect active worms, evolved worms are using metamorphism to circumvent existing worm detection schemes. While most existing host-based worm detection algorithms use the signatures of seen worms to determine whether an encountered executable is a worm, polymorphic worms attempt to change their binary representations or signatures during propagation, so that they always appear unseen to worm detectors and are thus able to evade detection.

In fact, the above polymorphic techniques are not new in viruses [79, 91, 117]. Recently, active worms have also shown a trend toward utilizing them [63]. Furthermore, technologies for mutating worm code have become publicly available, even as open-source toolkits or libraries

[37, 65, 107]. Attackers can easily use them to make their worms polymorphic and hard to detect with signature-based worm detection. The use of automatic encryption and decryption further makes worm polymorphism more feasible and efficient.

Polymorphic worms are dangerous and potent evolving widespread Internet attacks, which pose a serious threat to the Internet due to their effectiveness in evading existing host-based worm detection. The worm detection approach we propose in this chapter aims to address this threat by using the dynamic properties of executables, instead of static signatures, to capture worm executables. We do not use the binary representation as the feature to distinguish worms from benign executables; thus, the mutation techniques used by polymorphic worms have no impact on our worm detection approach. As shown in Section 5.5, our

dynamic program analysis based approach is effective in detecting unseen worms, including brand new worms and polymorphic worms.

5.4 Worm Detection via Mining Dynamic Program Execution

5.4.1 Framework Overview

Recall that the focus of this chapter is to use a large number of real-world worm exe-

cutables and subsequently develop an approach to detect new worms. In this section, we

introduce the framework of our system for dynamic program analysis that detects worm

executables based on mining system-call traces of a large amount of real-world worm and

benign executables. In general, this mining process is referred to as the off-line classifier

learning process. Its purpose is to learn (or train) a generic classifier that can be used to distinguish worm executables from benign ones based on system-call traces. Then we use the learned classifier with appropriate classification algorithms to determine, with high accuracy, whether unknown executables belong to the worm class or the benign class. This process is referred to as the on-line worm detection process. The basic workflow is illustrated in Fig. 5.1 and Fig. 5.2 and is explained below.

(1) Collect executables as data source → (2) Collect dataset by tracing system calls → (3) Extract feature from system call trace → (4) Learn the classifier

Figure 5.1: Workflow of the off-line classifier learning

(1) Trace system call of a new executable → (2) Extract feature from its system call trace → (3) Classify the executable with learned classifier

Figure 5.2: Workflow of the on-line worm detection

Off-line Classifier Learning

1. Data Source Preparation

Before we can begin dynamic program analysis and profile the behavior of worm

and benign executables, we need to collect a large number of such executables as

the data source for our study. These executables are labeled into two classes: worm

executables and benign executables. The worms are obtained from the Web site VX

Heavens (http://vx.netlux.org).

2. Collection Dataset - Dynamic Properties of Executables

With the prepared data source, we discuss how to collect the dataset, which we refer

to as the dynamic properties of executables. Recall that in order to accurately distin-

guish worm executables from benign ones, we need to collect data that can capture

the fundamental behavior differences between them—the dynamic properties. One

feasible and efficient method we choose is to run the executables and trace the run-

time system-call sequences during their execution. However, executing worms might

damage the host operating systems or even the computer hardware. In order to solve

this problem in our experiments, we set up virtual machines as the testbed. Then we

launch each executable in our data source and record its system-call trace during the

execution on the virtual machine. We refer to the collection of the system-call traces

for each executable in our data source as the dataset. We split the dataset into two

122 parts: the training set and the test set. With the training set, we will apply classifica-

tion learning algorithms to learn the classifier. The concrete format and content of the

classifier is determined by the learning algorithms adopted. With the test set, we will

further evaluate the accuracy of the learned classifier with respect to the classification

of new and unidentified executables.

3. Feature Extraction

With the collection dataset comprising system-call traces of different executables, we

extract all the system-call sequence segments with a certain length. These segments

are referred to as n-grams, where n is the length of the sequence, i.e., the number of

system calls in one segment. These n-grams can map to relatively independent and

meaningful actions taken during the program execution, or the executables’ program

blocks. We intend to use these n-grams to capture the behaviors of common worms

and benign executables. Hence these n-grams are the features for classifying worms

and benign executables, and each distinct n-gram represents a particular feature in

our classification.

4. Classifier Learning

From the features we extract from the training dataset, we need to learn a classifier

to distinguish between worms and benign executables. When we select the clas-

sification algorithm, we need to consider the learned classifier’s accuracy as well

as its interpretability. Some classifiers are easy to interpret and the classification

(i.e., the decision rule of worm detection) can be easily extracted from the classifier

[106]. Then worm writers can use the rules to change their worms’ behavior and

consequently evade detection, similar to self-mutating worms that metamorphose

to defeat signature-based detection [20]. Thus, we need classifiers with very low

interpretability. In our case, we consider two algorithms, the Naive Bayes-based

algorithm and the Support Vector Machine (SVM) algorithm, and compare their per-

formance. While the Naive Bayes-based algorithm is simple and efficient in classifier

learning, SVM is more accurate. More importantly, SVM learns a black-box classi-

fier that is hard for worm writers to interpret.
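As a rough illustration of the classifier-learning step, the following is a tiny Bernoulli Naive Bayes over n-gram presence features. It is a simplified stand-in for the Naive Bayes variant discussed above (the SVM alternative is not shown); the class name, the Laplace smoothing choice, and the toy n-grams in the usage are all ours.

```python
# Minimal Bernoulli Naive Bayes over n-gram presence features. Illustrative
# stand-in for the chapter's classifier-learning step, not its exact algorithm.
import math

class NGramNaiveBayes:
    def fit(self, traces, labels):            # traces: list of sets of n-grams
        self.classes = sorted(set(labels))
        self.prior, self.cond = {}, {}
        self.vocab = set().union(*traces)     # every n-gram seen in training
        for c in self.classes:
            docs = [t for t, y in zip(traces, labels) if y == c]
            self.prior[c] = len(docs) / len(traces)
            # Laplace-smoothed probability that each n-gram appears in class c
            self.cond[c] = {g: (sum(g in d for d in docs) + 1) / (len(docs) + 2)
                            for g in self.vocab}
        return self

    def predict(self, trace):                 # trace: set of n-grams
        def log_post(c):
            s = math.log(self.prior[c])
            for g in self.vocab:
                p = self.cond[c][g]
                s += math.log(p) if g in trace else math.log(1 - p)
            return s
        return max(self.classes, key=log_post)
```

In practice, an SVM trained on the same presence vectors yields a less interpretable (and, per the comparison below, more accurate) decision boundary, which is harder for worm writers to reverse-engineer.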

On-line Worm Detection

Having learned the classifier in the off-line process, we now describe how it is used to carry out on-line worm detection. In this process, we intend to automatically detect a new and unseen executable. In particular, we follow the same procedure as in the off-line pro- cess, in which system-call traces of an unknown executable are recorded and classification features (i.e., system-call sequence segments with certain lengths) are extracted during its execution. Then the classification algorithm is applied with the learned classifier to classify the new executable as a worm or a benign program.

The aforementioned worm detection depends on the accuracy of the classifier. In order to evaluate it, we use the learned classifier to classify the executables in the test set. Since we already know the class labels of these executables, we can simply compare the classification results from the learned classifier with the known labels. In this way, the accuracy of our classifier can be measured.

In the following sections, we present the major steps above, i.e., dataset collection, feature extraction, classifier learning, and on-line worm detection, in detail, followed by experimental results.

5.4.2 Dataset Collection

In this section, we present the details on how we obtain the dataset, i.e., the dynamic program properties of executables in the form of system call traces.

Worm Execution with Virtual Machine

In order to study the run-time behavior of worms and benign executables for worm detection, we need to execute the benign executables as well as the worms. However, worms might damage the operating system and even the hardware of the hosts. In order to solve this problem, we set up virtual machines (VMs) [55, 80] as the testbed. The VM we choose is VMware [55].

Even with VMs, two difficulties can still arise during data collection because of the worm execution. First, since worms can crash the operating system (OS) in the VM, we might have to repeatedly re-install the OS. In order to avoid these tedious re-installations, we install all necessary software for our experiments and store all our worm executables on the VM, and then save the image file for that VM. Whenever the VM OS crashes, we can clone the identical VM from the image file to continue our experiment. Second, it is difficult to obtain the system-call traces from the VM after it crashes. In order to solve this problem, we set the physical machine on which a VM is installed as the network neighbor of the VM through the virtual network. Thus, during worm execution, the VM automatically outputs the system-call trace to the physical machine. Although the physical machine can be attacked by the worms on the VM because of this virtual network, we protect the physical machine with anti-virus and other security software and impose very restrictive access controls.

System-Call Trace

Recall that we choose dynamic properties of executables to capture their behavior and, more accurately, distinguish worms from benign executables. There are multiple dynamic program analysis methods [42, 72] that can be used to investigate the dynamic properties of executables. The most popular methods are debugging and simulation. However, they must be used manually with human expertise to study program behavior. In our case, their human-intervention requirement makes them unsuitable for automatic analysis. Still, execution tracing is a good method for automatic analysis, as it can automatically record run-time behavior of executables. In addition, it is easy to analyze the system-call trace using automatic analysis algorithms.

There are several different ways to carry out execution tracing. In our case, we choose to trace system calls of worm and benign executables and use the traces to perform classification (and hence worm detection). The reasons for doing this are straightforward. Tracing all Microsoft Windows Application Programming Interface (API) functions can capture more details about the run-time behavior of executables. However, in comparison with tracing only system calls, API tracing increases OS resource consumption and interferes with the execution of other programs. This is because there are far fewer system calls (311 for all Windows versions together [73], 293 for the Linux 2.6 kernel [56]) than there are

APIs (over 76,000 for Windows versions before Vista [119] and over 1,000 for Linux [98]).

Hence, we choose to trace only system calls to facilitate “light-weight” worm detection.

5.4.3 Feature Extraction

Features are key elements of any anomaly-based detection or classification. In this section, we describe the method to extract and process the features that are used to learn the classifier and carry out worm detection.

N-gram from System-Call Trace

System-call traces of executables are the system-call sequences (time series) of the execution, which contain temporal information about the program execution and thus the respective dynamic behavior information. In our system, we need to extract appropriate features that can capture common or similar temporal information “hidden” in the system-call sequences of all worm executables, which is different from the temporal information hidden in the system-call sequences of all benign executables.

The n-gram is a well-accepted and frequently adopted temporal feature in various areas of (statistical) natural language processing and genetic sequence analysis [69]. It also fits our temporal analysis requirement. An n-gram is a subsequence of n items from a given sequence. For example, if a system-call sequence is {NtReplyWaitReceivePortEx, NtOpenKey, NtReadVirtualMemory, NtCreateEvent, NtQuerySystemInformation}, then the 3-grams from this sequence are {NtReplyWaitReceivePortEx, NtOpenKey, NtReadVirtualMemory}, {NtOpenKey, NtReadVirtualMemory, NtCreateEvent}, and {NtReadVirtualMemory, NtCreateEvent, NtQuerySystemInformation}.
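As an illustrative sketch (a hypothetical Python helper, not part of the system described in this chapter), the n-grams of a system-call trace can be extracted with a sliding window:

```python
def extract_ngrams(trace, n):
    """Slide a window of length n over the trace and return each
    n-gram as a tuple of n consecutive system-call names."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

trace = ["NtReplyWaitReceivePortEx", "NtOpenKey", "NtReadVirtualMemory",
         "NtCreateEvent", "NtQuerySystemInformation"]
grams = extract_ngrams(trace, 3)
# A trace of length 5 yields 5 - 3 + 1 = 3 three-grams, matching the
# example above.
```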

We use n-grams as the features in our system for the following reasons. Imagine the difference between one line of source code and one block of source code in a program. The line of code provides little meaningful information about a program, but the block of code usually represents a meaningful and self-contained small task in a program, which is the logical unit of programming. Similarly, one system call only provides very limited information about the behavior of an executable, whereas a segment of system calls might represent a meaningful and self-contained action taken during the program execution. Worm and benign executables have different behaviors, and this difference can be represented as the difference between their source-code blocks, or the segments (i.e., n-grams) of their system calls. Hence, we use these system-call segments, or the n-grams, as the features to classify worm and benign executables, which proves to be very effective throughout our experiments as described in Section 5.5.

Length of N-gram

A natural question is: what n-gram length is best for distinguishing worms from benign executables? On one hand, in order to capture the dynamic program behavior, n should be greater than 1. Otherwise, the extracted 1-gram list is simply the list of system calls invoked by the executables. This special case is the same as the method used by static program analysis to detect worms, which has no dynamic run-time information about executables.

On the other hand, n should not be very large, for the following two reasons. First, if n is too large, it is very unlikely that we will find common or similar n-grams among different worm executables. In one extreme case, when n becomes very large, the n-grams are no longer small tasks. Instead, they encompass the entire execution of the programs. Because different worms cannot have the exact same sequence of system-call invocations (otherwise they would be the same worm), the classifier learning algorithms cannot find a common feature (i.e., the same system-call invocations) among them, and the algorithms cannot be used to define a class in which all the worms are included. In this case, the classification will not work. Second, if n is too large, the number of possible distinct n-grams (311^n for MS Windows, as Windows has 311 system calls; 293^n for Linux, which has 293) will be too large to be analyzed in practice. We investigate the impact of n-gram length on worm detection in our experiments and report the results in Section 5.5.

5.4.4 Classifier Learning and Worm Detection

In this section, we describe the details of the last step in the off-line classifier learning process (i.e., how to apply the classifier learning algorithm to learn the classifier after extracting the features). In particular, we use two classification algorithms: the Naive Bayes algorithm, which is a simple but popular learning algorithm, and the Support Vector Machine (SVM) algorithm, which is a more powerful but more computationally expensive learning algorithm. We also discuss how to conduct on-line worm detection with each of the algorithms in detail.

Naive Bayes based Classification and Worm Detection

The Naive Bayes classifier (also known as the Simple Bayes classifier) is a simple probabilistic classifier based on applying Bayes’ Theorem [54]. In spite of its naive design, the Naive Bayes classifier may perform better than more sophisticated classifiers in some cases, and it can be trained very efficiently with a labeled training dataset. Nevertheless, in order to use the Naive Bayes classifier, one must make the assumption that the features used in the classification occur independently.

In our case, we use the Naive Bayes classifier to calculate the likelihood that an executable is a worm executable (i.e., in the worm class) and the likelihood that it is a benign executable (i.e., in the benign class). Then, based on which of the two classes has the larger likelihood, the detection decision is made.

1. Off-line Classifier Learning

We represent each executable by an m-dimensional feature vector, X = (x1, x2, . . . , xm), where m is the number of distinct n-grams in the dataset and xi (i = 1, . . . , m) corresponds to the i-th distinct n-gram: xi = 1 if that n-gram appears in the executable's system-call trace and xi = 0 otherwise. We have two classes: the worm class Cw and the benign class Cb. Given the feature vector X of an unknown executable, we need to predict which class X belongs to. The prediction is done as follows. First, for each class, we calculate the likelihood that the executable belongs to that class. Second, we make a decision based on the larger likelihood value, i.e., the executable belongs to the class that has the larger likelihood.
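The binary feature vector X can be built from a trace as sketched below (a hypothetical helper; in practice the n-gram vocabulary would be collected from the whole training set):

```python
def build_feature_vector(trace, vocabulary, n):
    """vocabulary: ordered list of all distinct n-grams in the dataset.
    Returns X with x_i = 1 iff the i-th n-gram occurs in the trace."""
    present = {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}
    return [1 if gram in present else 0 for gram in vocabulary]

vocab = [("NtOpenKey", "NtReadVirtualMemory"),
         ("NtCreateEvent", "NtClose"),
         ("NtReadVirtualMemory", "NtCreateEvent")]
x = build_feature_vector(
    ["NtOpenKey", "NtReadVirtualMemory", "NtCreateEvent"], vocab, 2)
# The first and third vocabulary 2-grams occur in the trace,
# the second does not, so X = [1, 0, 1].
```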

The off-line “classifier” learning process of the Naive Bayes algorithm is the preparation for the calculation of the above two likelihoods. In particular, this preparation is the calculation of some statistical probabilities based on the training data: the conditional probability of each n-gram, say xi, given each class, Cw and Cb. Hence, the off-line “classifier” learning process in our Naive Bayes classification is actually the calculation of P(xi|Cj), i = 1, . . . , m, j = w or b, based on the training dataset.14

2. On-line Worm Detection

During the on-line worm detection, for each unknown executable, the feature vector X for that executable is built first. Then we predict that X belongs to the class that has a higher posterior probability, conditioned on X. That is, the Naive Bayes classifier

14In some implementations, the classifier learning based on the Naive Bayes algorithm may conduct extra procedures, such as selection of features and cross-validation, but they are not the core procedures for the Naive Bayes algorithm.

assigns an unknown sample X to the class Cj if and only if

P(Cj|X) > P(Ck|X), where j, k = w or b, j ≠ k. (5.1)

Based on Bayes’ Theorem, P (Cj|X) can be calculated by

P(Cj|X) = P(X|Cj)P(Cj) / P(X). (5.2)

In order to predict the class of X, we will calculate P(X|Cj)P(Cj) for j = w or b and consequently compare P(Cw|X) to P(Cb|X). Now we discuss how to calculate P(X|Cj)P(Cj). First, if the class prior probabilities P(Cw) and P(Cb) are unknown, then it is commonly assumed that the classes are equally likely, i.e., P(Cw) = P(Cb). Otherwise, P(Cj) can be estimated by the proportion of class Cj in the dataset. Second, as we assume the features are independent, P(X|Cj) can be calculated by

P(X|Cj) = ∏_{i=1}^{m} P(xi|Cj), (5.3)

where P (xi|Cj) can be calculated during the off-line classifier learning process.
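The computation in Equations (5.1)–(5.3) can be sketched as follows; the toy feature vectors and the add-one (Laplace) smoothing are illustrative assumptions, not part of the system described above:

```python
import math

def train_naive_bayes(vectors, labels):
    """Estimate the class priors P(C_j) and the conditional
    probabilities P(x_i = 1 | C_j) with add-one smoothing."""
    model = {}
    m = len(vectors[0])
    for c in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        probs = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                 for i in range(m)]
        model[c] = (len(rows) / len(vectors), probs)
    return model

def classify(model, x):
    """Pick the class maximizing log P(C_j) + sum_i log P(x_i | C_j)."""
    def score(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if xi else 1 - p) for xi, p in zip(x, probs))
    return max(model, key=score)

# Toy data: x_k = 1 iff the k-th n-gram appears in the trace.
worms = [[1, 1, 0], [1, 1, 1]]
benign = [[0, 0, 1], [0, 1, 1]]
model = train_naive_bayes(worms + benign, ["worm"] * 2 + ["benign"] * 2)
```

Working in log-space avoids numerical underflow when m is large, as it is for our n-gram feature vectors.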

3. Discussion

The Naive Bayes classifier is effective and efficient in many applications. The theoretical time complexity for learning a Naive Bayes classifier is O(Nd), where N is the number of training examples and d is the dimensionality of the feature vectors. The complexity of classification for an unknown example (an unknown executable in our case) is only O(d).

However, the Naive Bayes classifier has two limitations in our case. First, worm writers can use it to make worm detection less effective for new worms. In our approach, the classifier includes a set of probabilities that the n-grams appear in each class. Worm writers can directly use this information to make new worms similar to benign executables by either using or avoiding certain n-grams (system-call sequences). Second, the high accuracy of the Naive Bayes classifier rests on the assumption that the features are independent of each other. However, in reality, the n-grams in the system-call trace of an executable may not be independent. In order to address these problems of the Naive Bayes classifier, we use the Support Vector Machine (SVM) in our worm detection, as described in the following subsection.

Support Vector Machines based Classification and Worm Detection

The Support Vector Machine (SVM) is a type of learning machine based on statistical learning theories [121]. SVM-based classification includes two processes: classifier learning and classification. Classifier learning is the learning of a classifier or model using the training dataset. The learned classifier is used to determine or predict the class “label” of instances that are not contained in the training dataset. The SVM is a sophisticated and accurate classification algorithm; it is computationally expensive, and its trained classifier is difficult to interpret. This outstanding accuracy and low interpretability match our two requirements: accurate worm detection, and difficulty of interpretation for worm writers.

1. Off-line Classifier Learning

A typical SVM classifier-learning problem is to label (classify) N training data samples {x1, . . . , xN} into positive and negative classes,15 where xi ∈ R^d (i = 1, . . . , N) and d is the dimensionality of the samples. Thus, the classification result is {(x1, y1), . . . , (xN, yN)}, yi ∈ {−1, +1}. In our case, xi is the feature vector built for the i-th executable in our dataset. That is, xi = {xi,1, . . . , xi,d}, where d is the number of distinct n-grams, xi,j (j = 1, . . . , d) corresponds to the j-th n-gram, and xi,j = 1 if that n-gram appears in the i-th executable's system-call trace and xi,j = 0 otherwise. yi = −1 means that xi belongs to the worm class and yi = +1 means that xi belongs to the benign-executable class. As we have a large number of features (n-grams), the dimensionality of the Euclidean space in our classification problem is very large (upper bounded by 311^n, depending on the n-gram length n).

15The SVM algorithm can be extended to classification for more than two classes, but two classes are the typical and basic case. Our problem is a two-class classification problem.

There are two cases for the SVM classifier-learning problem: (1) the samples in the two classes are linearly separable; (2) the samples in the two classes are not linearly separable. Case (2) holds for most real-world problems. In the SVM, in order to achieve an optimal classifier, the non-linearly separable problem of case (2) must first be transformed into a linearly separable problem as in case (1). Then the optimal classifier can be learned through linear optimization [121, 122]. In the following, we first present the algorithm for the simple case (case (1)), followed by the algorithm for case (2).

1) Classes are linearly separable

If the two classes are linearly separable, then we can find a hyperplane to separate the examples in two classes as shown in the right side of Fig. 5.3. Examples that belong to different classes should be located on different sides of the hyperplane. The intent of the classifier learning process is to obtain a hyperplane which can maximally separate the two classes.

Mathematically, if the two classes are linearly separable, then we can find a hyperplane w · x + b = 0, with a vector w and an intercept b, that satisfies the following constraints:

w · xi + b ≥ +1 for yi = +1 and (5.4)

w · xi + b ≤ −1 for yi = −1, (5.5)

or, equivalently,

yi(w · xi + b) − 1 ≥ 0 ∀i. (5.6)

Training examples that satisfy the above constraints with equality are referred to as support vectors. The support vectors define two hyperplanes: one goes through the support vectors of the positive class, and the other goes through the support vectors of the negative class. The distance between these two hyperplanes defines a margin, and this margin is maximized when the norm of the vector w, ‖w‖, is minimized.

When the margin is maximized, the hyperplane w · x + b = 0 separates the two classes maximally, which is the optimal classifier in the SVM algorithm. The dual form of Equation (5.6) reveals that the above optimization actually maximizes the following function:

W(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj (xi · xj) yi yj, (5.7)

subject to the constraint that αi ≥ 0. The SVM algorithm achieves the optimal classifier by finding αi ≥ 0 for each training sample xi so as to maximize W(α).

2) Classes are not linearly separable

In the above case, the optimal classifier for linearly separable classes can be found through linear optimization. However, most real-world classification problems cannot be solved by the linear optimization algorithm. This case is illustrated on the left side of Fig. 5.3, in which there is no linear hyperplane (here, a straight line in 2-dimensional space) that can separate the examples of the two classes (shown with different colors). In other words, the required classifier must be a curve, which is difficult to optimize.


Figure 5.3: Basic idea of kernel function in SVM.

The SVM provides a solution to this problem by transforming the original feature space into some other, potentially high-dimensional, Euclidean space. The mapped examples in the training set can then be linearly separable in the new space, as demonstrated by the right side of Fig. 5.3. This space transformation in Equation (5.7) can be implemented by a kernel function,

K(xi, xj) = Φ(xi) · Φ(xj), (5.8)

where Φ is the mapping from the original feature space to the new Euclidean space. We only need to use K in the classifier training process with Equation (5.7), and never need to know explicitly what Φ is. The SVM kernel function can be linear or non-linear. Common non-linear kernel functions include the

Polynomial Function, Radial Basis Function (RBF), and Sigmoid Function, among

others.

2. On-line Worm Detection

On-line worm detection is the classification of new executables using the SVM classification algorithm along with the optimal SVM classifier learned during the previously-discussed off-line learning process.

For an unknown executable (a worm or benign executable), its feature vector xk must be built first. The method is the same as the aforementioned process on the executables in the training set, i.e., the system-call trace during the execution is recorded, then the n-grams with a certain value of n are extracted. Afterwards, the feature vector xk is formed from the trace of the executable using the same method as in the off-line classifier learning process.

Recall that during the classifier learning process, the optimal hyperplane is found. Then, for a new example xk, shown as the white circle in Fig. 5.3, the on-line classification checks on which side of the optimal hyperplane xk lies. Mathematically, the classification is conducted by assigning a class to the executable via

C(xk) = sign(w · xk + b), (5.9)

where

w = Σ_{i=1}^{N} αi yi xi. (5.10)

If C(xk) is negative (recall that yi = −1 labels the worm class), we predict that the executable is a worm. Otherwise, we predict that it is benign.
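A minimal sketch of the decision in Equations (5.9) and (5.10), assuming a hypothetical already-trained classifier (the toy support vectors, multipliers αi, labels yi, and intercept b below are illustrative, not real trained values); per the labeling above, yi = −1 marks the worm class:

```python
def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def svm_decision(x, support_vectors, alphas, ys, b, kernel=linear_kernel):
    """Evaluate sign(sum_i alpha_i y_i K(x_i, x) + b); with y = -1
    labeling the worm class, a negative value predicts a worm."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, ys, support_vectors)) + b
    return 1 if s >= 0 else -1

# Toy 1-D classifier: a worm-class support vector at 0.0 and a
# benign-class support vector at 2.0; b places the boundary at x = 1.
svs, alphas, ys, b = [[0.0], [2.0]], [1.0, 1.0], [-1, 1], -2.0
```

Writing the decision in terms of K(xi, x) rather than an explicit w is what allows a non-linear kernel to be substituted without changing the detection code.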

3. Complexity of SVM

The classifier learning process of SVM is relatively time-consuming because of the large volume of the training set, the high dimensionality of our feature space, and the complexity of classifier calculation and optimization. No matter which kernel function is used, for N training examples with feature vectors of dimensionality d and NS support vectors, the SVM classifier learning algorithm has complexity O(NS^3 + NS^2 N + NS dN). However, the SVM classification process for each new executable is fast and involves only limited calculation. Its complexity is O(M NS), where M is the complexity of the kernel function operation. For Radial Basis Function (RBF) kernel functions, M is O(d).

4. Black Box Characteristics of the SVM Classifier

The classifier learned by the SVM can be easily used to carry out worm detection. However, the SVM classifier is hard to interpret. The SVM classifier learning algorithm generates black-box models (classifiers) in the sense that they cannot explain their decisions in an understandable form [93, 16, 46]. Thus, from the SVM classifier, it is hard to extract decision rules comprehensible in the original problem domain, especially for the non-linear SVM, due to the feature-space transformation introduced by kernel functions.

The above characteristic of the SVM is a well-known limitation for applications in which one needs decision rules that can be mapped back to the physical entities in the original problem domain. However, this characteristic helps us prevent worm writers from interpreting and learning from the classifier. We want to prevent a worm writer from obtaining the signature of his worms or of any benign executable. Otherwise, the worm writer could disguise new worms as benign executables accordingly.

Besides the optimization algorithm used in the SVM, the learned classifier also depends on the definition of the input feature space, the selection of the kernel function, the parameters of the kernel function, etc., all of which are unknown to worm writers. The worm writer does not know:

• the value of n of the n-gram used in the classifier,

• the mapping between n-grams and feature indices in the feature vector,

• the definition of the kernel function,

• the parameters of the kernel function,

• the space transformation introduced by the kernel function.

Hence, even if the worm writer knows that we use the SVM and is able to obtain the classifier, it is hard for him to interpret the classifier and discover the decision rules we use to distinguish between worms and benign executables. Thus, it is hard for him to change the worm behavior accordingly to evade our detection. Furthermore, we can protect the classifier with mechanisms such as encryption.

5.5 Experiments

In this section, we first present our experimental setup and metrics, and then we report

the results of our experiments.

5.5.1 Experiment Setup and Metrics

In our experiments, we use 722 benign executables and 1589 worms in Microsoft Windows or DOS Portable Executable (PE) format as the data source, though our approach works for worm detection on other operating systems as well. We use this data source to learn the generic worm classifier and further evaluate the trained classifier to detect worms. The executables are divided into two classes: worm and benign executables. The worms are obtained from the Web site VX Heavens (http://vx.netlux.org); they include e-mail worms, peer-to-peer (P2P) worms, Instant Messenger (IM) worms, Internet Relay Chat (IRC) worms, and other, non-classified worms. The benign executables in our experiments include Microsoft software, commercial software from other companies, and free, “open source” software. This diversity of executables enables us to comprehensively learn classifiers that capture the behavior of both worm and benign executables. We use 80% of each class (worm and benign) as the training set to learn the classifiers. We use the remaining 20% as the test set to evaluate the accuracy of the classifiers, i.e., the performance of our detection approach.

We install MS Windows 2000 Professional with Service Pack 4 on our virtual machines (VMs). On these VMs, we launch each collected executable and use strace for Windows NT [33] to trace its system calls for 10 seconds.16 From the trace file of each executable, we extract the system-call name sequence in temporal order. Then we obtain the segments of system calls (i.e., the n-grams), given different values of n, for each executable. Afterwards, we build the vector inputs for the classification learning algorithms.

16We launch the executables in the dataset for a longer time and then use a sliding window to capture traces of a certain length for the classifier training. We found that a 10-second trace suffices to provide high detection accuracy.

Recall that the classification in our worm detection problem is in a high-dimensional space. There are a large number of dimensions and features that cannot be handled, or cannot be handled efficiently, by many data-mining tools. We choose the following data-mining tools: the Naive Bayes classification tools from the University of Magdeburg in Germany [28] and SVMlight [57]. Both tools are implemented in the C language and thus have efficient performance, especially for high-dimensional classification problems. When we apply the SVM algorithm with SVMlight, we choose the Gaussian Radial Basis Function (Gaussian RBF), which has been proven to be an effective kernel function when the feature distribution is Gaussian [54]. The Gaussian RBF is of the form

K(xi, xj) = e^(−γ ‖xi − xj‖²), (5.11)

which means Equation (5.8) is replaced by Equation (5.11) in the classifier learning process and the on-line worm detection process. The value of γ is optimized through experiments and comparison.
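Evaluated directly, Equation (5.11) is only a few lines of code (a generic sketch of the kernel computation, independent of the particular SVM tool used in our experiments):

```python
import math

def rbf_kernel(u, v, gamma):
    """Gaussian RBF: K(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

# K(x, x) = 1 for any x, and K decays toward 0 as the two
# feature vectors move farther apart.
```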

In order to evaluate the performance of our classification for new worm detection, we use two metrics: Detection Rate (PD) and False Positive Rate (PF). In particular, the detection rate is defined as the probability that a worm is correctly classified. The false positive rate is defined as the probability that a benign executable is mistakenly classified as a worm.
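Concretely, given a labeled test set, the two metrics can be computed as follows (a hypothetical helper with illustrative counts, not our experimental data):

```python
def detection_metrics(true_labels, predicted_labels):
    """P_D = fraction of worms classified as worms;
       P_F = fraction of benign executables classified as worms."""
    pairs = list(zip(true_labels, predicted_labels))
    worms = [(t, p) for t, p in pairs if t == "worm"]
    benign = [(t, p) for t, p in pairs if t == "benign"]
    p_d = sum(1 for _, p in worms if p == "worm") / len(worms)
    p_f = sum(1 for _, p in benign if p == "worm") / len(benign)
    return p_d, p_f

truth = ["worm", "worm", "worm", "worm",
         "benign", "benign", "benign", "benign"]
pred = ["worm", "worm", "worm", "benign",
        "benign", "worm", "benign", "benign"]
p_d, p_f = detection_metrics(truth, pred)
# 3 of 4 worms detected (P_D = 0.75); 1 of 4 benign flagged (P_F = 0.25).
```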

5.5.2 Experiment Results

In this subsection, we report the performance of our worm detection approaches. The results of the Naive Bayes- and SVM-based worm detection in terms of Detection Rate and False Positive Rate under different n-gram lengths (n) are shown in Tables 5.1 and 5.2, respectively.

n-gram length              1       2       3       4       5       6
Detection Rate (PD)        69.8%   81.4%   85.0%   90.9%   93.6%   96.4%
False Positive Rate (PF)   33.2%   18.6%   11.5%   8.89%   6.67%   6.67%

Table 5.1: Detection results for the Naive Bayes based detection

n-gram length              1       2       3       4       5       6
Detection Rate (PD)        89.7%   96.0%   98.75%  99.5%   99.5%   99.5%
False Positive Rate (PF)   33.3%   18.75%  7.14%   4.44%   2.22%   2.22%

Table 5.2: Detection results for the SVM based detection

Effectiveness of Our Approaches

We conclude that both our Naive Bayes-based and SVM-based approaches detect worms with a high detection rate and a low false positive rate when the length of the n-grams is reasonably large. For example, when the length of the n-grams is 6, the detection based on the SVM algorithm achieves a 99.5% detection rate and a 2.22% false positive rate, and the detection based on the Naive Bayes algorithm achieves a 96.4% detection rate and a 6.67% false positive rate, respectively.

We also conclude that SVM-based detection performs better than Naive Bayes-based detection in terms of both detection rate and false positive rate. There are two reasons for this. First, the Naive Bayes classification assumes that features are independent, which may not always be the case in reality. Second, the Naive Bayes-based classification calculates the likelihood for classifying a new executable based on the vectors of the training-set executables in the feature space, and then simply predicts the class of the new executable based on the likelihood comparison. In contrast, the SVM attempts to optimize the classifier (hyperplane) by finding the hyperplane that maximally separates the two classes in the training set.

Impacts of N-gram Length

Another important observation is that the length of the n-gram, i.e., the value of n, impacts the detection performance. When n increases from 1 to 4, the performance keeps increasing.

When n further increases beyond that, the performance does not increase or increases only marginally. The reason can be explained as follows. First, when n = 1, each n-gram contains only one system call and thus contains neither dynamic system-call sequence information nor executable behavior information. Actually, this special case is that of static program analysis, which only investigates the list of system calls used by the executables.

Second, when n is larger, the n-grams contain longer system-call sequences, thereby capturing more of the dynamic behavior of the traced executables and hence increasing the detection performance. This also demonstrates that our dynamic program analysis approach outperforms traditional static program analysis-based approaches.

From the above observation on the length of the n-gram, we conclude that a certain n-gram length is sufficiently effective for worm detection. This length (the value of n) can be learned through experiments: when increasing n no longer yields a significant detection performance gain, that value of n is good enough and can be used in practice. This method is actually used in other n-gram based data mining applications. Furthermore, with respect to the efficiency of worm detection, the n value should not be very large, as we discuss in Section 5.4.3.

5.6 Discussions

In this chapter, we develop a worm detection approach that allows us to mine program execution, thereby detecting new and unseen worms. There are a number of possibilities for extending this work. A detailed discussion follows.

1. Classification for Different Worms

First, we discuss how to generalize our approach to support classification of different worms. Recall that our work in this chapter only focuses on distinguishing two classes of executables: worm and benign. In practice, knowledge of the specific types of worms (e.g., e-mail worms, P2P worms) can provide better ways to defend against them. In order to classify different worms, the approach studied in Section 5.4.4 can be extended as follows. First, we collect a training dataset including a large number of worms labeled by their types. Second, we use the same approach discussed in Section 5.4.4 to train the classifiers, which are capable of profiling multiple classes according to the types. Third, the trained classifiers are used to determine the type (class) of an unlabeled new worm.

2. Detection of Smart Worms

Now, we discuss how to extend our work to detect smart worms, or worms that behave “intelligently.” Based on mining the execution of a large number of program executables, our approach has proven effective at detecting new, unseen worms. In preliminary investigations, we even found that our approach detected “smart worms” we created. These worms are “benign” executables that were infected by worms and hence have behaviors characteristic of both worm and benign executables. We use them to simulate smart worms that can camouflage themselves as benign executables. Our initial finding is promising, as we found that our approach can detect these smart worms that also exhibit benign-executable behaviors. This ability of our approach awaits further proof of its efficacy with a larger set of various types of smart worms.

Nevertheless, the detection of smart worms is still an open issue, and the answers to some questions remain unclear. For instance, how intelligent can worms be, i.e., to what extent can worms behave similarly to benign executables while still “spreading” themselves like “normal” worms? While this is an open problem, researchers have begun to discuss the efficiency of worms and detection evasion [96, 143]. Worms can be very intelligent in order to evade detection, but their “hiding” mechanisms may diminish their efficiency. More importantly, so long as they are worms, they must manifest behavior different from that of benign executables. This provides some opportunities for us to further investigate smart worm detection.

3. Integration of Network-based and Host-based Detection

In this chapter, our focus is the study of host-based detection, and we did not consider information about the traffic generated by the executables during worm detection. As we know, a worm executable exposes multiple behaviors, such as generating scan traffic (i.e., messages that intend to identify vulnerable computers) and conducting malicious acts on the infected computers. Since these worm behaviors are exposed from different perspectives, consideration of multiple behaviors could provide more accurate worm detection. In fact, traffic generated by worms can also be classified and used to distinguish them from normal traffic. For instance, the distribution of destination IP addresses in network traffic can provide accurate worm detection through traffic analysis [141]. Hence, one ongoing work is to combine the traffic logs and system calls generated by the worms and benign executables. The integration of traffic and system calls can yield more reliable classifiers to detect worms.

5.7 Related Work

In this section, we review some existing work related to our study, including worm detection and applications of data mining to security research.

Since worm attacks have always been very dangerous threats to the Internet, much effort has gone into studying, analyzing, and modeling worms. For example, Staniford et al. in [114] studied various worms and modeled their propagation using a continuous-time epidemiology model. There has also been extensive work on the propagation of specific worms [86, 148]. These analysis and modeling results help researchers better understand worm behaviors and further develop detection schemes.

As we mentioned, there are two types of worm detection systems: network-based detection and host-based detection. For network-based worm detection, many schemes have been proposed in the literature. For example, payload signature-based detection examines specific byte-sequence segments in the payload of worm scan traffic [112]. Traditionally, these payload signatures are manually identified by security experts through careful analysis of byte sequences from captured network traffic. Some efforts have been made to automatically generate payload signatures [112, 62]. There are other network-based detection schemes based on network traffic analysis. For example, Jung et al. in [59] developed a threshold-based detection algorithm to identify the anomaly of scan traffic generated by a computer. Venkataraman et al. and Weaver et al. in [123, 132] proposed schemes to examine statistics of scan traffic volume, Zou et al. presented a trend-based detection scheme to examine the exponential increase pattern of scan traffic [147], and Lakhina et al. in [67] proposed schemes to examine other features of scan traffic, such as destination-address distribution. There is also other work studying worms that attempt to present new traffic patterns in order to avoid detection [141, 96].

For the host-based detection, many schemes have been proposed in the literature. For example, a binary text scan program was developed to extract the human-readable strings from the binary, which reveal information about the function of the executable binary [1].

Wagner et al. in [124] proposed an approach that analyzes program executables and generates a non-deterministic finite automaton (NDFA) or a non-deterministic pushdown automaton (NDPDA) from the global control-flow graph of the program. The automaton was then used to monitor the program execution on-line. Gao et al. in [47] presented an approach for detecting anomalous behavior of an executing process. The basic idea of their approach is that processes potentially running the same executable should behave similarly in response to a common input. Feng et al. [43] proposed a formal analysis framework for pushdown automata (PDA) models. Based on this framework, they studied program analysis techniques incorporating system calls or stack activities. There are other schemes that detect anomalous behavior of executables through call stack information. For example,

Cowan et al. in [34] proposed a method, called StackGuard, to detect buffer overflow attacks. The difference that distinguishes our work from theirs is that we attempt to capture the common dynamic behavioral features of worms by mining the execution of a large number of worms.

Many articles have examined the use of data mining for security research. Lee et al. in [70] applied machine learning to system call sequences of normal and abnormal executions of the Unix sendmail program. Lee et al. in [71] described a data mining framework for adaptively building intrusion detection models. The main idea of their work is to utilize auditing programs (e.g., network logs of telnet sessions, shell command logs) to extract an extensive set of features that describe each network connection or host session, and to apply data mining techniques to learn rules that capture the behavior of intrusions and normal activities. Martin et al. in [78] proposed an approach that learns statistical patterns of outgoing emails from local hosts. Yang et al. in [136] proposed applying machine learning to automatically fingerprint polymorphic worms, which are capable of changing their appearance across every instance of their executables. Kolter et al. in [64] applied data mining techniques to extract byte sequences directly from program executables, converted these sequences into n-grams, and constructed classifiers. Julisch et al. in [58] proposed an approach to learn from historical alarms generated by intrusion detection systems. In our work, we use data mining to obtain the dynamic behavioral differences between worms and benign executables.
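To make the n-gram idea concrete, the sketch below counts byte n-grams in an executable and maps it onto a fixed-length feature vector, in the spirit of Kolter et al. [64]. The sample bytes and the two-entry vocabulary are hypothetical; a real system selects the vocabulary (e.g., by information gain) over a large training corpus:

```python
from collections import Counter

def byte_ngrams(data, n=4):
    """Count every contiguous n-byte sequence in a binary; across a training
    set, the most informative n-grams become the classifier's features."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def feature_vector(data, vocabulary, n=4):
    """Map one binary onto a fixed-length 0/1 vector over a chosen n-gram vocabulary."""
    grams = byte_ngrams(data, n)
    return [1 if g in grams else 0 for g in vocabulary]

sample = b"\x90\x90\x90\x31\xc0\x50\x68"            # hypothetical code bytes
vocab = [b"\x90\x90\x90\x31", b"\x00\x00\x00\x00"]  # hypothetical vocabulary
assert feature_vector(sample, vocab) == [1, 0]
```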

5.8 Summary

In this chapter, we proposed a new worm detection approach based on mining the dynamic execution of programs. Our approach captures the dynamic behavior of executables to provide efficient and accurate detection of both seen and unseen worms.

Using a large number of real-world worms and benign executables, we ran executables on virtual machines and recorded their system call traces. We applied two data mining classification algorithms to learn classifiers off-line, which are subsequently used to carry out on-line worm detection. Our data clearly showed the effectiveness of our proposed approach in detecting worms, achieving both a very high detection rate and a low false positive rate.
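The off-line learning and on-line detection steps above can be sketched as follows. This is a minimal illustration only: it uses a multinomial Naive Bayes classifier over system-call bigrams, and the toy traces stand in for the real system call traces recorded on virtual machines:

```python
import math
from collections import Counter

def call_ngrams(trace, n=2):
    """Short overlapping windows of a system call trace, used as features."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

class NaiveBayes:
    """Minimal multinomial Naive Bayes over system-call n-grams."""

    def fit(self, traces, labels, n=2):
        self.n = n
        self.counts = {0: Counter(), 1: Counter()}
        self.totals = {0: 0, 1: 0}
        self.priors = {0: 0, 1: 0}
        for trace, y in zip(traces, labels):
            self.priors[y] += 1
            for g in call_ngrams(trace, n):
                self.counts[y][g] += 1
                self.totals[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, trace):
        scores = {}
        for y in (0, 1):
            s = math.log(self.priors[y] / sum(self.priors.values()))
            for g in call_ngrams(trace, self.n):
                # Laplace smoothing so unseen n-grams do not zero out a class
                p = (self.counts[y][g] + 1) / (self.totals[y] + len(self.vocab))
                s += math.log(p)
            scores[y] = s
        return max(scores, key=scores.get)

# Toy traces: label 1 = worm-like (scan/connect loops), 0 = benign file I/O.
benign = [["open", "read", "write", "close"]] * 3
worm = [["socket", "connect", "send", "socket", "connect"]] * 3
clf = NaiveBayes().fit(benign + worm, [0] * 3 + [1] * 3)
assert clf.predict(["socket", "connect", "send"]) == 1
```

In practice the learned classifier is applied on-line to the trace of each running executable, flagging it as worm-like when the worm class scores higher.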

Our proposed approach has the following advantages. It is practical, with low overhead during both classifier learning and run-time detection. It does not rely on investigating individual executables; rather, it examines the common dynamic properties of executables. Therefore, it can automatically detect new worms. Furthermore, our approach attempts to build a “black-box” classifier, which makes it difficult for worm writers to interpret our detection.

CHAPTER 6

CONCLUDING REMARKS

In this dissertation, we studied defense-oriented evolution, a novel evolutionary trend among widespread Internet attacks, in which defense-oriented attacks leverage knowledge of defense systems to circumvent them and increase attack effectiveness. As the most important characteristics of a defense system are its infrastructure and algorithms, we classify these attacks into two groups: infrastructure-oriented attacks and algorithm-oriented attacks.

For infrastructure-oriented widespread Internet attacks, we studied intelligent DDoS

attacks which aim to infer architectural information about the infrastructure of DDoS-defending Secure Overlay Forwarding Systems (SOFSes) to launch more efficient DDoS

attacks. We further provided optimal structural configuration for SOFS systems and guide-

lines to enhance SOFS system performance under intelligent DDoS attacks. Additionally,

we investigated another infrastructure-oriented attack, the invisible LOCalization attack

(the iLOC attack for short). The iLOC attack can accurately and invisibly obtain the monitor locations in Internet Threat Monitoring (ITM) systems, enabling other attacks to evade

disclosed monitors or even to abuse ITM systems to degrade their integrity and functional-

ity.

For algorithm-oriented attacks, we studied Varying Scan Rate (VSR) worms, which

deliberately vary their port scan rate during propagation to defeat existing ITM-based worm

detection schemes. We also designed the attack target Distribution Entropy based dynamiC (DEC) detection scheme to effectively detect VSR and traditional worms. Furthermore, in order to detect new worms, including polymorphic worms (which have different signatures or purposely change signatures in order to evade host-based worm detection), we proposed a new worm detection scheme that mines dynamic program execution.

While there are other forms of evolution among widespread Internet attacks, we believe the defense-oriented ones studied in this dissertation are among the most dangerous, since they deliberately and effectively counteract defense systems. As shown in the dissertation, they are also feasible and potent threats to the Internet. Our purpose is not to encourage attacks, but to obtain deep insights about potential new Internet threats and vulnerabilities within current defense systems, in order to enhance these systems and design new defenses against evolving widespread Internet attacks. We believe that the results in this dissertation lay a foundation for further research in this field.

BIBLIOGRAPHY

[1] Binary Text Scan. http://netninja.com/files/bintxtscan.zip.

[2] Internet Security News. http://www.landfield.com/isn/mail-archive/2001/Feb/0037.html.

[3] Snort, the open-source network intrusion detection system. http://www.snort.org/.

[4] W32/MyDoom.B Virus. http://www.us-cert.gov/cas/techalerts/TA04-028A.html.

[5] W32.Sircam.Worm@mm. http://www.symantec.com/avcenter/venc/data/[email protected].

[6] Worm.ExploreZip. http://www.symantec.com/avcenter/venc/data/worm.explore.zip.html.

[7] Powerful Attack Cripples Internet. Associated Press for Fox News, http://www.foxnews.com/story/0,2933,66438,00.html, October 2002.

[8] R. Agrawal, A. Evfimievski, and R. Srikant. Information sharing across private databases. In Proceedings of the 22-th SIGMOD International Conference on Management of Data, San Diego, CA, July 2003.

[9] R. L. Allen and D. W. Mills. Signal Analysis: Time, Frequency, Scale, and Structure. Wiley and Sons, 2004.

[10] D. Andersen. Mayday: Distributed filtering for internet services. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Seattle, WA, March 2003.

[11] D. Andersen, H. Balakrishnan, M. Kaashoek, and R. Morris. Resilient overlay networks. In Proceedings of 18th ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada, October 2001.

[12] A. Anderson, A. Johnston, and P. McOwan. Motion Illusions and Active Camouflaging. http://www.ucl.ac.uk/ucbplrd/motion/motion middle.html.

[13] R. M. Anderson and R. M. May. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford, 1991.

[14] G. Badishi, I. Keidar, and A. Sasson. Exposing and eliminating vulnerabilities to denial of service attacks in secure gossip-based multicast. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), Florence, Italy, June 2004.

[15] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, and D. Watson. The internet motion sensor: A distributed blackhole monitoring system. In Proceedings of the 12-th IEEE Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, February 2005.

[16] N. Barakat and J. Diederich. Eclectic rule-extraction from support vector machines. In Int. Journal Computational Intelligence, volume 1, pages 59–62, 2005.

[17] M. Bellare, S. Goldwasser, and D. Micciancio. Pseudo-random number generation within cryptographic algorithms: the dss case. In Proceedings of Advances in Cryptology '97, Lecture Notes in Computer Science, Springer-Verlag, May 1997.

[18] J. Bethencourt, J. Franklin, and M. Vernon. Mapping internet sensors with probe response attacks. In Proceedings of the 14-th USENIX Security Symposium, Baltimore, MD, July-August 2005.

[19] J. Blazquez, A. Oliver, and J. M. Gomez-Gomez. Mutation and Evolution of Antibiotic Resistance: Antibiotics as Promoters of Antibiotic Resistance, volume 3. Current Drug Targets, August 2002.

[20] D. Bruschi, L. Martignoni, and M. Monga. Detecting self-mutating malware using control flow graph matching. In Proceedings of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), Berlin, Germany, July 2006.

[21] CAIDA. The Cooperative Association for Internet Data Analysis. http://www.caida.org.

[22] CAIDA. Telescope Analysis. http://www.caida.org/analysis/security/telescope.

[23] CERT. Advisory CA-1995-18 Widespread Attacks on Internet Sites. http://www.cert.org/advisories/CA-1995-18.html.

[24] CERT. CERT/CC advisories. http://www.cert.org/advisories/.

[25] CERT. Advisory CA-2003-20 W32/Blaster worm. http://www.cert.org/advisories/CA-2003-20.html, 2003.

[26] S. Chen and R. Chow. A new perspective in defending against ddos. In Proceedings of 10th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS), Suzhou, China, May 2004.

[27] Z. S. Chen, L.X. Gao, and K. Kwiat. Modeling the spread of active worms. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, March 2003.

[28] B. Christian. Full and Naive Bayes Classifiers. http://fuzzy.cs.uni-magdeburg.de/ borgelt/doc/bayes/bayes.html.

[29] M. Christodorescu and S. Jha. Static analysis of executables to detect malicious patterns. In Proceedings of the 12-th USENIX Security Symposium (SECURITY), Washington, DC, August 2003.

[30] M. Christodorescu and S. Jha. Testing malware detectors. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), Boston, MA, July 2004.

[31] L. Y. Chuang, C. H. Yang, C. H. Yang, and S. L. Lin. An interactive training system for morse code users. In Proceedings of Internet and Multimedia Systems and Applications, Honolulu, Hawaii, August 2002.

[32] M. Ciubotariu. : a conflict starter? Virus Bulletin, http://www.virusbtn.com, 2004.

[33] BindView Corporation. Strace for NT. http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace readme.cfm.

[34] C. Cowan, C. Pu, D. Maier, H. Hinton, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In Proceedings of 7th USENIX Security Symposium (SECURITY), San Antonio, TX, August 1998.

[35] E. J. Crusellers, M. Soriano, and J. L. Melus. Spreading codes generator for wireless cdma network. International Journal of Wireless Personal Communications, 7(1), 1998.

[36] D. E. Denning. An intrusion detection model. IEEE Transactions on Software Engineering, 13(2):222–232, February 1987.

[37] T. Detristan, T. Ulenspiegel, Y. Malcom, and M. Underduk. Polymorphic shellcode engine using spectrum analysis. http://www.phrack.org/, 2003.

[38] Robert Dixon. Spread Spectrum Systems, 2nd Edition. John Wiley & Sons, 1984.

[39] Dshield. Distributed Intrusion Detection System. http://www.dshield.org/.

[40] M. H. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall, 1 edition, 2002.

[41] Nova Engineering. Linear Feedback Shift Register. http://www.sss-mag.com/pdf/lfsr.pdf.

[42] M. Ernst. Static and dynamic analysis: Synergy and duality. Portland, Oregon, May 2003.

[43] H. H. Feng, J. T. Giffin, Y. Huang, S. Jha, W. Lee, and B. P. Miller. Formalizing sensitivity in static analysis for intrusion detection. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2004.

[44] H. H. Feng, O. M. Kolesnikov, P. Fogla, W. Lee, and W. Gong. Anomaly detection using call stack information. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2003.

[45] P. Ferrie and P. Ször. Zmist opportunities. Virus Bulletin, http://www.virusbtn.com.

[46] G. Fung, S. Sandilya, and R. Rao. Rule extraction from linear support vector machines. In Proceedings of the 11-th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, Illinois, August 2005.

[47] D. Gao, M. Reiter, and D. Song. Behavioral distance for intrusion detection. In Proceedings of the Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, September 2005.

[48] M. Garetto, W. B. Gong, and D. Towsley. Modeling malware spreading dynamics. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, March 2003.

[49] D. Goldsmith. Incidents Maillist: Possible Code-Red Connection Attempts. http://lists.jammed.com/incidents/2001/07/0149.html.

[50] J. Gordon. Lessons from virus developers: The beagle worm history through april 24. http://www.securityfocus.com/guest/24228, 2004.

[51] V. S. Grichenko. Modular Worms. http://blogs.plotinka.ru/gritzko/modular.pdf, PC World.

[52] P. Gross, J. Parekh, and G. Kaiser. Secure selecticast for collaborative intrusion detection systems. In Proceedings of the 3-th International Workshop on Distributed Event-based Systems (DEBS), Edinburgh, UK, May 2004.

[53] G. F. Gu, M. I. Sharif, X. Z. Qin, D. Dagon, W. Lee, and G. F. Riley. Worm detection, early warning and response based on local victim information. In Proceedings of the 20-th Annual Computer Security Applications Conference (ACSAC 2004), Tucson, Arizona, December 2004.

[54] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2 edition, 2006.

[55] VMWare Inc. www.vmware.com/virtual-machine.

[56] Operating System Inside. Linux System Call Table. http://osinside.net/syscall/system call table.htm, 2006.

[57] T. Joachims. Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, Massachusetts, 1998.

[58] K. Julisch and M. Dacier. Mining intrusion detection alarms for actionable knowledge. In Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Edmonton, Alberta, July 2002.

[59] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast portscan detection using sequential hypothesis testing. In Proceedings of the 25-th IEEE Symposium on Security and Privacy, Oakland, CA, May 2004.

[60] A. Keromytis, V. Misra, and D. Rubenstein. SOS: Secure overlay services. In Proceedings of ACM SIGCOMM, Pittsburgh, PA, August 2002.

[61] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13-th USENIX Security Symposium, San Diego, CA, August 2004.

[62] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13-th USENIX Security Symposium (SECURITY), San Diego, CA, August 2004.

[63] O. Kolesnikov and W. Lee. Advanced Polymorphic Worms: Evading IDS by Blending in with Normal Traffic. Technical report, Georgia Institute of Technology, 2004.

[64] J. Z. Kolter and M. A. Maloof. Learning to detect malicious executables in the wild. In Proceedings of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Seattle, WA, August 2004.

[65] Ktwo. Admmutate v0.8.4: Shellcode mutation engine. http://www.ktwo.ca /ADMmutate-0.8.4.tar.gz, 2001.

[66] Aleksandar Kuzmanovic and Edward W. Knightly. Low-rate tcp-targeted denial of service attacks (the shrew vs. the mice and elephants). In Proceedings of ACM SIGCOMM, Karlsruhe, Germany, August 2003.

[67] A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of ACM SIGCOMM'05, Philadelphia, PA, August 2005.

[68] E. Larkin. Widespread Internet Attack Cripples Computers with Spyware. http://www.pcworld.com/article/id,120448-page,1/article.html.

[69] K. F. Lee and S. Mahajan. Automatic Speech Recognition: the Development of the SPHINX System. Springer, 1988.

[70] W. Lee, S. Stolfo, and Phil Chan. Learning patterns from unix process execution traces for intrusion detection. In Proceedings of AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, Menlo Park, CA, June 1997.

[71] W. Lee, S. J. Stolfo, and W. Mok. A data mining framework for building intrusion detection models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 1999.

[72] Shengying Li. A Survey on Tools for Binary Code Analysis. Department of Computer Science, Stony Brook University, http://www.cs.sunysb.edu/ lshengyi/papers/rpe/RPE.htm, 2004.

[73] Metasploit LLC. Windows System Call Table. http://www.metasploit.com/users/opcode/syscalls.html.

[74] X. Luo and R. K. C. Chang. On a new class of pulsing denial-of-service attacks and the defense. In Proceedings of 13th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2005.

[75] J. Ma, G. M. Voelker, and S. Savage. Self-stopping worms. In Proceedings of the ACM Workshop on Rapid Malcode (WORM), November 2005.

[76] R. Mahajan, S. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and S. Shenker. Controlling high bandwidth aggregates in the network. In ACM SIGCOMM Computer Communication Review (CCR), July 2002.

[77] M. Mannan and P. C. van Oorschot. Instant messaging worms, analysis and countermeasures. In Proceedings of the 3-th Workshop on Rapid Malcode (WORM), Fairfax, VA, July 2005.

[78] S. Martin, A. Sewani, B. Nelson, K. Chen, and A. Joseph. Analyzing behavioral features for email classification. In Proceedings of the 2nd International Conference on Email and Anti-Spam (CEAS), Mountain View, CA, August 2003.

[79] MetaPHOR. http://securityresponse.symantec.com/avcenter/venc/data/w32.simile.html.

[80] Microsoft. Microsoft Virtual PC. http://www.microsoft.com/windows/virtualpc/default.mspx.

[81] J. Mirkovic and P. Reiher. A taxonomy of ddos attack and ddos defense mechanisms. In ACM SIGCOMM Computer Communication Review, April 2004.

[82] J. Mirkovic and P. Reiher. A taxonomy of ddos attack and ddos defense mechanisms. ACM SIGCOMM Computer Communication Review, 34(2):39–54, 2004.

[83] J. Mirkovic and P. Reiher. A taxonomy of ddos attacks and defense mechanisms. ACM SIGCOMM Computer Communications Review, 34(2):39–54, April 2004.

[84] R. Moddemeijer. On estimation of entropy and mutual information of continuous distributions. Signal Processing, 16(3):233–246, 1989.

[85] D. Moore. Network telescopes: Observing small or distant security events. In Invited Presentation at the 11-th USENIX Security Symposium (SEC), San Francisco, CA, August 2002.

[86] D. Moore, V. Paxson, and S. Savage. Inside the slammer worm. IEEE Magazine of Security and Privacy, 1(4):33–39, 2003.

[87] D. Moore, C. Shannon, and J. Brown. Code-red: a case study on the spread and victims of an internet worm. In Proceedings of the 2-th Internet Measurement Workshop (IMW), Marseille, France, November 2002.

[88] D. Moore, C. Shannon, and K. Claffy. Code-red: A case study on the spread and victims of an internet worm. In Proceedings of the 2-th ACM SIGCOMM Workshop on Internet Measurment, Marseille, France, November 2002.

[89] D. Moore, G. M. Voelker, and S. Savage. Inferring internet denial-of-service activity. In Proceedings of the 10-th USENIX Security Symposium, Washington, DC, August 2001.

[90] myNetWatchman. myNetWatchman Project. http://www.mynetwatchman.com.

[91] C. Nachenberg. Computer virus-antivirus coevolution. Communications of the ACM, 40(1):46–51, January 1997.

[92] R. Naraine. Botnet Hunters Search for Command and Control Servers. http://www.eweek.com/article2/0,1759,1829347,00.asp.

[93] H. Nunez, C. Angulo, and A. Catala. Rule extraction from support vector machines. In Proceedings of European Symposium on Artificial Neural Networks, Bruges, Belgium, August 2002.

[94] Chief of Engineers. United States Army: Army facilities components system user guide. http://www.usace.army.mil/inet/usace-docs/armytm/tm5-304/, October 1990.

[95] K. Park and H. Lee. On the effectiveness of route-based packet filtering for distributed dos attack prevention in power-law internets. In Proceedings of ACM SIGCOMM, San Diego, CA, August 2001.

[96] R. Perdisci, O. Kolesnikov, P. Fogla, M. Sharif, and W. Lee. Polymorphic blending attacks. In Proceedings of the 15-th USENIX Security Symposium (SECURITY), Vancouver, B.C., August 2006.

[97] R. K. Pickholtz, D. L. Schilling, and L. B. Milstein. Theory of spread-spectrum communications - a tutorial. IEEE Transactions on Communications, 30(5):855–884, 1982.

[98] GNU Project. Linux Function and Macro Index. http://www.gnu.org/software/libc/manual/html node/Function-Index.html#Function-Index.

[99] The Honeynet Project and Research Alliance. Know your enemy: Tracking botnets. http://www.honeynet.org/papers/bots/, 2005.

[100] F. Qin, C. Wang, Z. Li, H. Kim, Y. Zhou, and Y. Wu. Lift: A low-overhead practical information flow tracking system for detecting security attacks. Orlando, Florida, December 2006.

[101] M. Reiter and A. Rubin. Crowds: Anonymity for web transactions. ACM Transactions on Information and System Security, 1(1):66–92, November 1998.

[102] P. R. Roberts. Zotob Arrest Breaks Credit Card Fraud Ring. http://www.eweek.com/article2/0,1895,1854162,00.asp.

[103] SANS. Internet Storm Center. http://isc.sans.org/.

[104] S. Savage, D. Wetherall, A. R. Karlin, and T. Anderson. Practical network support for ip traceback. In Proceedings of ACM SIGCOMM, Stockholm, Sweden, August 2000.

[105] Stuart Schechter, Jaeyeon Jung, and Arthur W. Berger. Fast Detection of Scanning Worm Infections. In Proceedings of the 7-th International Symposium on Recent Advances in Intrusion Detection (RAID), French Riviera, France, September 2004.

[106] M. G. Schultz, E. Eskin, E. Zadok, and S. J. Stolfo. Data mining methods for detection of new malicious executables. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2001.

[107] M. Sedalo. Jempiscodes: Polymorphic shellcode generator. http://securitylab.ru/, 2003.

[108] V. Sekar, Y. Xie, D. Maltz, M. Reiter, and H. Zhang. Toward a framework for internet forensic analysis. In Proceedings of the 3-th Workshop on Hot Topics in Networks (HotNets-III), San Diego, CA, November 2004.

[109] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, 1949.

[110] Y. Shinoda, K. Ikai, and M. Itoh. Vulnerabilities of passive internet threat monitors. In Proceedings of the 14-th USENIX Security Symposium, Baltimore, MD, July-August 2005.

[111] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In the 6th ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), Fairfax, Virginia, December 2004.

[112] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation, December 2004.

[113] L. Spitzner. Know Your Enemy: Honeynets. Honeynet Project, http://project.honeynet.org/papers/honeynet.

[114] S. Staniford, V. Paxson, and N. Weaver. How to own the internet in your spare time. In Proceedings of the 11-th USENIX Security Symposium, San Francisco, CA, August 2002.

[115] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana. Internet indirection infrastructure. In Proceedings of ACM SIGCOMM Conference, Pittsburgh, PA, August 2002.

[116] R. Stone. Centertrack: An ip overlay network for tracking dos floods. In 9th USENIX Security Symposium, San Francisco, CA, August 2000.

[117] P. Szor and P. Ferrie. Hunting for metamorphic. In Proceedings of Virus Bulletin Conference, September 2001.

[118] S. Theodoridis and K. Koutroumbas. Pattern Recognition, Second Edition. Elsevier Science, 2003.

[119] Paul Thurrott. Windows “Longhorn” FAQ. http://www.winsupersite.com/faq/longhorn.asp.

[120] J. Twycross and M. M. Williamson. Implementing and testing a virus throttle. In Proceedings of the 12-th USENIX Security Symposium, Washington, DC, August 2003.

[121] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.

[122] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.

[123] S. Venkataraman, D. Song, P. Gibbons, and A. Blum. New streaming algorithms for superspreader detection. In Proceedings of the 12-th IEEE Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, February 2005.

[124] D. Wagner and D. Dean. Intrusion detection via static analysis. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2001.

[125] J. Wang, L. Lu, and A. A. Chien. Tolerating denial-of-service attacks using overlay networks – impact of overlay network topology. In Proceedings of ACM Workshop on Survivable and Self-Regenerative Systems, Fairfax, Virginia, October 2003.

[126] L. Wang and B. B. Hirsbrunner. Pn-based security design for data storage. In Proceedings of Databases and Applications, Innsbruck, Austria, February 2004.

[127] X. Wang, S. Chellappan, C. Boyer, and D. Xuan. On the effectiveness of secure over- lay forwarding systems under intelligent distributed dos attacks. IEEE Transactions on Parallel and Distributed Systems (TPDS), 17(7):619–632, July 2006.

[128] X. Wang, S. Chellappan, P. Boyer, and D. Xuan. Analyzing secure overlay forwarding systems under intelligent ddos attacks. Technical Report, OSU-CISRC-12/04-TR71, Dept. of Computer Science and Engineering, The Ohio State University, June 2004.

[129] X. Wang, W. Yu, A. Champion, X. Fu, and D. Xuan. Detecting Worms via Mining Dynamic Program Execution. To appear in IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), Nice, France, September 2007.

[130] X. Wang, W. Yu, X. Fu, D. Xuan, and W. Zhao. iLOC: An invisible LOCalization Attack to Internet Threat Monitoring Systems. Submitted to the IEEE Conference on Computer Communications (INFOCOM), July 2007.

160 [131] N. Weaver, S. Staniford, and V. Paxson. Very fast containment of scanning worms. In Proceedings of the 13-th USENIX Security Symposium, San Diego, CA, August 2004.

[132] J. Wu, S. Vangala, and L. X. Gao. An effective architecture and algorithm for detect- ing worms with various scan techniques. In Proceedings of the 11-th IEEE Network and Distributed System Security Symposium (NDSS), San Diego, CA, Febrary 2004.

[133] X. G. Xia, C. G. Boncele, and G. R. Arce. A multiresolution watermark for digital images. In Proceedings of International Conference on Image Processing (ICIP’97), Washington, DC, October 1997.

[134] L. Xiao, Z. Xu, and X. Zhang. Mutual anonymity protocols for hybrid peer-to-peer systems. In Proceedings of IEEE International Conference on Distributed Comput- ing Systems (ICDCS), Providence, RI, May 2003.

[135] D. Xuan, S. Chellappan, X. Wang, and S. Wang. Analyzing the secure overlay services architecture under intelligent ddos attacks. In Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), Tokyo, Japan, March 2004.

[136] S. Yang, J. P. Song, H. Rajamani, T. W. Cho, Y. Zhang, and R. Mooney. Fast and effective worm fingerprinting via machine learning. In Proceedings of the 3rd IEEE International Conference on Autonomic Computing (ICAC), Dublin, Ireland, June 2006.

[137] V. Yegneswaran, P. Barford, and S. Jha. Global intrusion detection in the domino overlay system. In Proceedings of the 11-th IEEE Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2004.

[138] V. Yegneswaran, P. Barford, and D. Plonka. On the design and utility of internet sinks for network abuse monitoring. In Proceedings of the Symposium on Recent Advances in Intrusion Detection (RAID), Pittsburgh, PA, September 2003.

[139] W. Yu, X. Fu, S. Graham, D. Xuan, and W. Zhao. Dsss-based flow marking technique for invisible traceback. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2007.

[140] W. Yu, X. Wang, D. Xuan, and D. Lee. Effective detection of active worms with varying scan rate. Technical report, Department of Computer Science and Engineering, The Ohio State University, April 2005.

[141] W. Yu, X. Wang, D. Xuan, and D. Lee. Effective detection of active worms with varying scan rate. In Proceedings of IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), Baltimore, MD, August 2006.

[142] W. Yu, X. Wang, D. Xuan, and W. Zhao. On detecting camouflaging worm. In Annual Computer Security Applications Conference (ACSAC), Miami, FL, December 2006.

[143] W. Yu, N. Zhang, and W. Zhao. Self-adaptive worms and countermeasures. In Proceedings of Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), Dallas, TX, November 2006.

[144] Zdnet. Smart worm lies low to evade detection. http://news.zdnet.co.uk/internet/security/0,39020375,39160285,00.htm.

[145] N. Zhang, S. Wang, and W. Zhao. A new scheme on privacy preserving association rule mining. In Proceedings of the 8-th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Pisa, Italy, September 2004.

[146] C. Zou, L. Gao, W. Gong, and D. Towsley. Monitoring and early warning for internet worms. In Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS), Washington D.C., October 2003.

[147] C. Zou, W. B. Gong, D. Towsley, and L. X. Gao. Monitoring and early detection for internet worms. In Proceedings of the 10-th ACM Conference on Computer and Communication Security (CCS), Washington DC, October 2003.

[148] C. C. Zou, W. Gong, and D. Towsley. Code red worm propagation modeling and analysis. In Proceedings of the 9-th ACM Conference on Computer and Communication Security (CCS), Washington, DC, November 2002.
