WIDESPREAD INTERNET ATTACKS: DEFENSE-ORIENTED EVOLUTION AND COUNTERMEASURES
DISSERTATION
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the
Graduate School of The Ohio State University
By
Xun Wang, B.E., M.S.
*****
The Ohio State University
2007
Dissertation Committee:

    Dong Xuan, Adviser
    Ten H. Lai
    Ming T. Liu

Approved by:

    Adviser
    Graduate Program in Computer Science and Engineering

© Copyright by

Xun Wang

2007

ABSTRACT
Widespread Internet attacks, such as Distributed Denial of Service (DDoS) attacks and
active worm attacks, have been major threats to the Internet in the recent past. Although
tremendous research effort has focused on this domain, the defense against these attacks
remains challenging for one reason: the attacks are evolving intelligently based on their
knowledge of defense mechanisms. In other words, the attacks are becoming more intel-
ligent and effective through defense-oriented evolution in order to defeat existing defense
systems. The objectives of this dissertation are to obtain deep insight about these defense-
oriented attacks and to address the challenges in defense against them.
While multiple elements define a specific defense system, the most important ones
are the system infrastructure and algorithms. The evolving defense-oriented attacks can exploit and leverage the knowledge of infrastructure and algorithms in the defense systems in order to counteract them. Hence we can classify defense-oriented widespread Internet attacks into infrastructure-oriented and algorithm-oriented attacks. In this dissertation, we
investigate a variety of such attacks and design new and more effective countermeasures
against them.
For infrastructure-oriented attacks, we study two classes of new attacks that target dif-
ferent aspects of the defense system infrastructure. First, we investigate intelligent DDoS
attacks which aim to infer architectures of the DDoS-defending Secure Overlay Forward-
ing Systems (SOFS) to launch attacks more efficiently than ordinary random DDoS attacks.
Second, we study the invisible LOCalization attack which can obtain location information
of Internet Threat Monitoring (ITM) systems. In order to defend against these new attacks,
we provide enhancements for SOFS and ITM systems.
For algorithm-oriented attacks, first we study a class of new active worms, the Varying
Scan Rate Worm, which deliberately varies its port scan rate during propagation to evade detection by existing network-based worm detection algorithms. Second, we focus on polymorphic worms which change or possess new signatures to defeat existing host-based
worm detection algorithms. Furthermore, we provide new and more effective detection
approaches against these new worms.
The war between attackers and defenders is never ending. We believe this dissertation
lays a foundation to deeply understand the evolution of widespread Internet attacks and to
enhance defenses against them.
To my family.
ACKNOWLEDGMENTS
To reach this stage in my Ph.D. study and this point in my life, I am indebted to many great people for their wisdom, support, and love.
It was a great fortune for me to become the first Ph.D. student of Dr. Dong Xuan in September 2002. It is he who gave me the opportunity to conduct focused research over the past years, and it is he who showed me the road to high-quality work. While enjoying the freedom of independent thinking, I have greatly appreciated his insightful advice on my research as well as my life. I also greatly appreciate his patience in guiding me through my Ph.D. study. I still remember how much help and encouragement he gave me when I delivered my first formal academic presentation in English.
As a Ph.D. student in the Department of Computer Science and Engineering, I have also enjoyed and appreciated the advice and help from many other professors both within and outside of The Ohio State University, including Dr. Ming T. Liu, Dr. Ten H. Lai, Dr.
David Lee and Dr. Srinivasan Parthasarathy in the CSE department, as well as Dr. Xinwen Fu at Dakota State University and Dr. Wei Zhao at Rensselaer Polytechnic Institute. They have made my stay at The Ohio State University fun and fruitful.
During my Ph.D. study, I have enjoyed working with many other fellow graduate students in the CSE department. I have had the chance to work with Sriram Chellappan, Thang
Nam Le, Sandeep Reddy, Wenjun Gu, Corey Boyer, Kurt Schosek, Xiaole Bai, Boxuan
Gu and Adam Champion on shared projects or research problems, and it was a wonderful
experience. I have also interacted extensively with a few other graduate colleagues, including Wei Yu at Texas A&M University, my research partner who has given me the most help as a fellow student, and Prasad Calyam at the Ohio Supercomputer Center. Their help and laughter have greatly enriched my life at The Ohio State University. I would also like to thank my many other friends for their continued support during my life and study at OSU.
I am indebted to my parents Shihong Wang and Fanding Zhang, my dearest sister Xu
Wang, her husband Jason Chang and their two lovely sons for their unconditional love and support. I would like to take this opportunity to express my greatest gratitude to my greatest aunt, Xiaoman Duan, her husband Shengqi Wang and my wonderful cousin sister,
Yuan Wang. I would also like to thank the rest of my family — who are too numerous to name individually — for their love and help. It is this family that has made me strong and courageous to be the one I am today. It is my love, my inspiration, and my life.
VITA
January 29, 1977 ...... Born - Weinan, China
1999 ...... B.E. in Computer Engineering, East China Normal University, China
2002 ...... M.E. in Computer Engineering, East China Normal University, China
2006 ...... M.S. in Computer Science and Engineering, The Ohio State University
2002-present ...... Graduate Research and Teaching Associate, The Ohio State University
PUBLICATIONS
Research Publications
Wei Yu, Xun Wang, Dong Xuan and Wei Zhao. “On Detecting Camouflaging Worm”. in Proceedings of 23rd Annual Computer Security Applications Conference (ACSAC), December 2006.
Wei Yu, Xun Wang, Dong Xuan and David Lee. “Effective Detection of Active Smart Worms with Varying Scan Rate”. in Proceedings of 2nd IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), August 2006.
Xun Wang, Sriram Chellappan, Corey Boyer and Dong Xuan. “On the Effectiveness of Secure Overlay Forwarding Systems under Intelligent Distributed DoS Attacks”. IEEE Transactions on Parallel and Distributed Systems (TPDS), 17(7):619 - 632, July 2006.
Xun Wang, Sriram Chellappan, Wenjun Gu, Wei Yu, and Dong Xuan. “Policy-driven Physical Attacks in Sensor Networks: Modeling and Measurement”. in Proceedings of IEEE Wireless Communications and Networking Conference (WCNC), April 2006.
Xun Wang, Wenjun Gu, Kurt Schosek, Sriram Chellappan, Dong Xuan. “Sensor Network Configuration under Physical Attacks”. International Journal of Ad Hoc and Ubiquitous Computing (IJAHUC), Lecture Notes in Computer Science, Inderscience, January 2006.
Wenjun Gu, Xun Wang, Sriram Chellappan, D. Xuan and Ten-Hwang Steve Lai. “Defending against Search-based Physical Attacks in Sensor Networks”. in Proceedings of 2nd IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS), November 2005.
Wei Yu, Sriram Chellappan, Xun Wang, and D. Xuan. “On Defending Peer-to-Peer System-based Active Worm Attacks”. in Proceedings of 48th IEEE Global Telecommunications Conference (Globecom), November 2005.
Xun Wang, Sriram Chellappan, Wenjun Gu, Wei Yu and D. Xuan. “Search-based Physical Attacks in Sensor Networks”. in Proceedings of 14th IEEE International Conference on Computer Communication and Networks (ICCCN), October 2005.
Xun Wang, Wenjun Gu, Kurt Schosek, Sriram Chellappan, D. Xuan. “Sensor Network Configuration under Physical Attacks”. in Proceedings of 3rd International Conference on Computer Network and Mobile Computing (ICCNMC), August 2005.
Xun Wang, Wenjun Gu, Sriram Chellappan, K. Schosek and D. Xuan. “Lifetime Optimiza- tion of Sensor Networks under Physical Attacks”. in Proceedings of IEEE International Conference on Communications (ICC), May 2005.
D. Xuan, Sriram Chellappan, Xun Wang and Shengquan Wang. “Analyzing the Secure Overlay Services Architecture under Intelligent DDoS Attacks”. in Proceedings of 24th IEEE International Conference on Distributed Computing Systems (ICDCS), March 2004.
D. Xuan, Sriram Chellappan and Xun Wang. “Resilience of Structured Peer-to-Peer Systems: Analysis and Enhancement”. Handbook on Theoretical and Algorithmic Aspects of Sensor, Ad Hoc Wireless and Peer-to-Peer Networks, CRC press LLC, 2004.
FIELDS OF STUDY
Major Field: Computer Science and Engineering
Studies in:
    Computer Networking        Prof. Dong Xuan, Prof. Ten H. Lai, Prof. Ming T. Liu
    Software Engineering       Prof. Atanas Rountev
    Computer Architecture      Prof. Mario Lauria
TABLE OF CONTENTS
Page
Abstract ...... ii
Dedication ...... iv
Acknowledgments ...... v
Vita ...... vii
List of Tables ...... xiii
List of Figures ...... xiv
Chapters:
1. Introduction ...... 1
   1.1 Widespread Internet Attacks are Major Threats to the Internet ...... 1
   1.2 Widespread Internet Attacks are Evolving ...... 2
   1.3 Contributions of the Dissertation: Defense-Oriented Evolution and Countermeasures ...... 5
       1.3.1 Infrastructure-oriented Attacks ...... 7
       1.3.2 Algorithm-oriented Attacks ...... 10
   1.4 Organization of the Dissertation ...... 13
2. Intelligent DDoS Attacks against Secure Overlay Forwarding Systems and Countermeasures ...... 14
   2.1 Motivations ...... 14
   2.2 Background ...... 17
   2.3 Intelligent DDoS Attacks ...... 19
   2.4 Analysis of Intelligent DDoS Attacks against SOFS Systems ...... 21
       2.4.1 Analysis of Round based Intelligent DDoS Attacks ...... 21
       2.4.2 Analysis of Continuous Intelligent DDoS Attacks ...... 43
   2.5 Countermeasures ...... 47
       2.5.1 Optimization of SOFS System Performance Under Round based Attacks ...... 47
       2.5.2 General Design Guidelines to Enhance SOFS System Performance ...... 50
   2.6 Related Work ...... 51
   2.7 Summary ...... 52
3. Localization Attack against Internet Threat Monitoring Systems and Countermeasures ...... 54
   3.1 Motivations ...... 54
   3.2 Background ...... 56
       3.2.1 Internet Threat Monitoring Systems ...... 56
       3.2.2 Localization attacks against ITM Systems ...... 57
   3.3 iLOC Attack ...... 57
       3.3.1 Overview ...... 58
       3.3.2 Attack Traffic Generation Stage ...... 60
       3.3.3 Attack Traffic Decoding Stage ...... 63
       3.3.4 Discussions ...... 66
   3.4 Analysis ...... 67
       3.4.1 Accuracy Analysis ...... 67
       3.4.2 Invisibility Analysis ...... 69
       3.4.3 Determination of Attack Parameters ...... 71
   3.5 Implementation and Validation ...... 73
       3.5.1 Implementation of the iLOC Attack ...... 73
       3.5.2 Validation of the iLOC Attack ...... 74
   3.6 Performance Evaluation ...... 76
       3.6.1 Evaluation Methodology ...... 76
       3.6.2 Results ...... 78
   3.7 Guidelines of Countermeasures ...... 80
   3.8 Related Work ...... 83
   3.9 Summary ...... 84
4. Varying Scan Rate Worms against Network-based Worm Detection and Countermeasures ...... 85
   4.1 Motivations ...... 85
   4.2 Background ...... 86
       4.2.1 The Propagation Model of Traditional Worms ...... 86
       4.2.2 Network-based Worm Detection ...... 88
   4.3 The Active Worm with Varying Scan Rate ...... 90
       4.3.1 The VSR Worm Model ...... 90
       4.3.2 Analysis of the VSR Worm ...... 91
   4.4 DEC Worm Detection ...... 96
       4.4.1 Design Rationale ...... 96
       4.4.2 DEC Worm Detection ...... 97
       4.4.3 Space of Worm Detection ...... 102
   4.5 Performance Evaluation ...... 104
       4.5.1 Evaluation Methodology ...... 104
       4.5.2 Detection Performance ...... 106
   4.6 Related Work ...... 110
   4.7 Summary ...... 112
5. Polymorphic Worms against Host-based Worm Detection and Countermeasures 114
   5.1 Motivations ...... 114
   5.2 Background ...... 117
       5.2.1 Worm Detection ...... 117
       5.2.2 Program Analysis ...... 118
       5.2.3 Data Mining ...... 119
   5.3 Polymorphic Worms ...... 120
   5.4 Worm Detection via Mining Dynamic Program Execution ...... 121
       5.4.1 Framework ...... 121
       5.4.2 Dataset Collection ...... 125
       5.4.3 Feature Extraction ...... 127
       5.4.4 Classifier Learning and Worm Detection ...... 129
   5.5 Experiments ...... 138
       5.5.1 Experiment Setup and Metrics ...... 139
       5.5.2 Experiment Results ...... 140
   5.6 Discussions ...... 143
   5.7 Related Work ...... 145
   5.8 Summary ...... 147
6. Concluding remarks ...... 149
Bibliography ...... 151
LIST OF TABLES
Table Page
1.1 Defense-oriented attacks and countermeasures studied in this dissertation. . 6
2.1 Optimal mapping degree with different NT ...... 49
2.2 Optimal node distribution under 1 to 2 mapping with different NT . . . . . 49
3.1 Defender Detection Rate PDD (Port 135) ...... 79
4.1 Detection Time of Some Existing Detection Schemes ...... 95
4.2 Maximal Infection Ratio of Some Existing Detection Schemes ...... 96
4.3 DEC Performance Sensitivity of Parameter α ...... 108
5.1 Detection results for the Naive Bayes based detection ...... 141
5.2 Detection results for the SVM based detection ...... 141
LIST OF FIGURES
Figure Page
2.1 The generalized SOFS architecture...... 17
2.2 A Snapshot of the generalized SOFS architecture under the intelligent DDoS attacks...... 23
2.3 Sensitivity of PS to L and mi under different attack intensities...... 29
2.4 Node demarcation in our successive attack at the end of Round j...... 34
2.5 Sensitivity of PS to NT under different L, mi and N...... 39
2.6 Sensitivity of PS to L, mi and node distribution...... 40
2.7 Sensitivity of PS to R (a) and PE (b)...... 42
2.8 Sensitivity of PS to L under different m (a), and to NC under different L and r (b)...... 45
2.9 Sensitivity of PS to NT under different r and L (a), and different r and m (b). 46
3.1 Workflow of the iLOC Attack ...... 58
3.2 PN-code and Encoded Attack Traffic ...... 63
3.3 Experiment Setup ...... 74
3.4 Background Traffic vs. Traffic Mixed with iLOC Attack ...... 75
3.5 PSD for Background Traffic vs. Traffic Mixed with iLOC Attack ...... 75
3.6 Attack Successful Rate (Port 135) ...... 77
3.7 Attack Successful Rate vs. Code Length ...... 81
3.8 Attack Successful Rate vs. Number of Parallel Attack Sessions ...... 81
3.9 Attack Successful Rate vs. Number of Parallel Attack Sessions ...... 81
4.1 Infection ratio of different VSR worms...... 93
4.2 The observed worm instance count of different VSR worms...... 93
4.3 Bayes decision rule for normal and worm traffic features ...... 101
4.4 Space of worm detection ...... 103
4.5 Detection time of detection schemes on VSR worms ...... 106
4.6 Maximal infection ratio of detection schemes on VSR worms ...... 107
4.7 Detection time of detection schemes on the traditional PRS worms ...... 109
4.8 Maximal infection ratio of detection schemes on the traditional PRS worms ...... 109
5.1 Workflow of the off-line classifier learning ...... 121
5.2 Workflow of the on-line worm detection ...... 122
5.3 Basic idea of kernel function in SVM...... 135
CHAPTER 1
INTRODUCTION
1.1 Widespread Internet Attacks are Major Threats to the Internet
Widespread Internet attacks are large-scale attacks on the Internet whose attack sources spread widely over the Internet [23, 68]. They have been major threats to the Internet in the recent past and there are many well-known examples such as Distributed Denial of
Service (DDoS) attacks, active worm attacks, spam, spyware etc. In July 2001, an active worm called “Code-Red” infected more than 350,000 Microsoft IIS servers. In less than
14 hours, this active worm caused more than 1.2 billion dollars in economic damages [87].
In October 2002, a DDoS attack lasted for only an hour but was able to shut down 7 of the
13 Internet DNS root servers [7]. In January 2003, another active worm called “Slammer” infected nearly 75,000 Microsoft SQL servers in less than 10 minutes and consequently caused large scale disruptions in production systems worldwide [86]. In March 2004, active worms called “Witty” and “Sasser” infected many hosts in a short time and made them unusable [24]. This list of attacks is increasing and there is no apparent end in sight.
Furthermore, a recent trend has emerged in which different types of attacks combine to increase attack sophistication and efficiency. For example, Code Red worms launched a DDoS attack against the White House’s website (www.whitehouse.gov) at the final stage of their propagation [88]. More recently, in February 2004, the MyDoom worm propagated rapidly to many hosts, which then flooded the websites www.sco.com and www.microsoft.com, thereby preventing legitimate users from accessing them [4]. Combinations of different types of attacks are not limited to active worms and DDoS attacks. Many active worms are used to infect a large number of hosts and recruit them as bots or zombies, which are networked together to form botnets [99] [102] [92]. These botnets can be used to: (i) launch massive
DDoS attacks that disrupt Internet utility [4], (ii) access confidential information that can be abused [5] through large scale traffic sniffing, key logging, identity theft etc., (iii) distribute large-scale unsolicited advertisement emails (as spam) or software (as adware), (iv) spread new malware by installing Trojan Horses or other backdoor software, and (v) destroy data that has a high monetary value [6]. There is even evidence showing that botnets are being rented out for attacks on Internet e-businesses [102].
1.2 Widespread Internet Attacks are Evolving
Due to the massive damage caused by widespread Internet attacks, a significant amount of research effort has been focused on developing effective methods to model, detect, and defend against them. Among these attacks, DDoS attacks and active worms are the most dominant and dangerous threats to the Internet. Research on understanding and defending against them is therefore the most important and imperative, and it receives the greatest attention and effort.
Although much effort and progress have been made in this direction, effective defense against these attacks is still a challenge today due to one fact: widespread Internet attacks have evolved and are continuously evolving.
1. Evolution of DDoS Attacks
During its evolution, the DDoS attack has not only equipped itself with new attack approaches, but has also added new types of entities to its list of attack targets.
Generally, a Denial of Service (DoS) attack is characterized by an explicit attempt
to prevent the legitimate use of a system service. A Distributed Denial of Service
(DDoS) attack deploys multiple attacking computers to attain this goal. In the early
generations of DDoS attacks, the attacker sent a stream of packets to a victim that consumed some key resource, such as network bandwidth, computation capacity (CPU cycles), or memory, thereby rendering the resource unavailable to the victim’s legitimate clients. In later DDoS attacks, the attacker sent a few malformed packets to confuse a vulnerable application or a vulnerable protocol on the victim machine and force it to freeze or reboot. Both approaches set specific hosts or networks as the victim, but the former targets network or computation resources, whereas the latter targets the protocol or application. Furthermore, new DDoS attacks target the Internet infrastructure (such as DNS systems) rather than specific victims [7].
DDoS attacks also attempt to evade detection. In the above attacks, the attacker con-
tinuously sends a large amount of packets to a victim to exhaust its key resource,
overload it to disable communication, crash its service, or block its network link.
Typical examples are the TCP SYN attack, TCP and UDP flood attacks, the ICMP echo attack, and the Smurf attack [81]. These attack methods share one common feature: a large number of compromised machines or agents are involved in the attack and transmit packets at a high rate to the victim, which makes the DDoS attacks easy to detect. While potentially quite harmful, the high-rate nature of such attacks presents a statistical anomaly to network monitors, such that the attack can be detected, the attacker identified, and the attack effects mitigated. However, recent work shows that smart DDoS attackers can use maliciously chosen low-rate DoS traffic patterns that exploit TCP’s retransmission time-out mechanism and throttle TCP flows to a small fraction of their ideal rate while eluding detection [66] [74].
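The low-rate evasion described above can be made concrete with a small sketch (a simplified illustration with made-up numbers, not taken from [66] or [74]): a periodic burst pattern whose period is near TCP's minimum retransmission time-out can repeatedly force flows into time-out, yet its long-term average rate stays far below the peak rate that a simple rate-threshold monitor would flag.

```python
# Sketch: why a periodic low-rate burst pattern can evade a simple
# rate-threshold monitor. All numbers here are illustrative assumptions.

def average_rate(burst_rate_pps, burst_len_s, period_s):
    """Long-term average packet rate of a periodic on/off attack stream."""
    return burst_rate_pps * burst_len_s / period_s

# A sustained flood at 50,000 packets/s is an obvious statistical anomaly.
flood_rate = 50_000

# A low-rate stream: short 50 ms bursts at the same peak rate, repeated
# once per second (near TCP's minimum retransmission time-out).
shrew_avg = average_rate(burst_rate_pps=50_000, burst_len_s=0.05, period_s=1.0)

THRESHOLD = 10_000  # hypothetical detector threshold, packets/s
print(f"flood average: {flood_rate} pps -> flagged: {flood_rate > THRESHOLD}")
print(f"burst average: {shrew_avg:.0f} pps -> flagged: {shrew_avg > THRESHOLD}")
```

The point is not the exact numbers but the ratio: a 5% duty cycle cuts the average rate twenty-fold, so a detector keyed only to sustained volume never fires.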
2. Evolution of Active Worms
Active worms also use multiple evolutionary principles to propagate themselves more efficiently. First, while first-generation worms only used port scanning to propagate themselves, current worms propagate more effectively using various methods, e.g., network port scanning, email, file sharing, Peer-to-Peer (P2P) networks, and Instant Messaging (IM). Second, worms use different scan strategies during different stages of propagation. For example, instead of using pure random scans
to find victims, they use a hitlist to infect previously identified vulnerable hosts at
the initial stage of propagation in order to increase propagation efficiency [132, 81].
Once they propagate to a new local network, they scan all IP addresses in this local
network first in order to increase the chances to hit victims. They use DNS, network
topology and routing information to identify active hosts instead of randomly scanning
IP addresses [132, 81]. They split the target IP address space during propagation in
order to avoid duplicate scans.
They also become more modular and organized in order to carry other attack pay-
loads and launch any kind of organized and synchronized large scale attack [51, 99].
Furthermore, in order to evade numerous worm detection systems, they are becom-
ing stealthy. For example, the “Atak” worm [144] is a recently discovered active worm that attempts to remain hidden by sleeping (stopping scans) when it suspects it is under detection. Worms that adopt similar attack strategies to those of the “Atak” worm
could yield overall scan traffic patterns different from those of traditional worms.
Therefore, the existing network-based detection schemes with scan traffic monitor-
ing will not be able to effectively detect them.
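The detection gap described above can be illustrated with a toy threshold monitor (a hypothetical simplification, not any deployed detection system): it flags the first monitoring window whose scan count exceeds a fixed threshold, which catches steady exponential growth but is blind to a worm that sleeps whenever its scan volume approaches the threshold.

```python
# Toy illustration: a fixed-threshold scan monitor vs. a worm that
# pauses its scanning. All parameters are hypothetical.

def threshold_monitor(scans_per_window, threshold):
    """Return the first window index whose scan count exceeds the threshold."""
    for i, count in enumerate(scans_per_window):
        if count > threshold:
            return i
    return None  # never detected

# Traditional worm: observed scan volume roughly doubles each window.
traditional = [100 * 2**i for i in range(10)]

# Stealthy worm: same early growth, but it sleeps (scans drop to zero)
# whenever its volume approaches the detection threshold.
stealthy = [100, 200, 400, 800, 0, 0, 800, 0, 0, 800]

THRESHOLD = 1000
print(threshold_monitor(traditional, THRESHOLD))  # detected in window 4
print(threshold_monitor(stealthy, THRESHOLD))     # None: never exceeds threshold
```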
Unlike the above network-based worm detection systems, host-based worm detection
systems search inbound binary code content for known patterns, or signatures, that
correspond to worms. To date, in order to detect and/or block active worms, these
worm detection systems use signatures that match bytes from a worm’s payload using
techniques such as string matching at arbitrary payload offsets [3, 111] and regular expression matching within a payload [3].
However, newly evolved active worms tend to be polymorphic [20, 32, 63]. Poly-
morphic worms are able to change their binary representation or signature as part
of the spreading process. This can be achieved with self-encryption mechanisms or
semantics-preserving code manipulation techniques. Consequently, copies of a poly-
morphic worm may no longer share a common invariant substring of sufficient length
and existing detection systems will not recognize network streams that contain copies
of worms or executables as manifestations of a worm outbreak. This worm evolution
trend also requires us to enhance content-based worm detection schemes.
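A minimal sketch (hypothetical payload and keys, not code from any real worm) shows why fixed byte signatures fail against self-encryption: XOR-encoding the same payload under two different keys yields copies that share no long common substring, so a signature extracted from one copy never matches the other.

```python
# Minimal sketch of why byte-signature matching fails against
# self-encrypting polymorphism. Payload and keys are made up.

def xor_encode(payload: bytes, key: int) -> bytes:
    """Encode a payload with a single-byte XOR key (a trivial stand-in
    for the self-encryption a polymorphic worm might use)."""
    return bytes(b ^ key for b in payload)

payload = b"EXAMPLE-MALICIOUS-PAYLOAD-BYTES"
variant_a = xor_encode(payload, 0x41)
variant_b = xor_encode(payload, 0x7F)

# A signature extracted from variant A never appears in variant B.
signature = variant_a[5:15]
print(signature in variant_a)  # True
print(signature in variant_b)  # False

# Decoding with the right key recovers the identical payload, so both
# variants still execute the same logic once unpacked.
print(xor_encode(variant_b, 0x7F) == payload)  # True
```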
1.3 Contributions of the Dissertation: Defense-Oriented Evolution and Countermeasures
Among the above evolutions of widespread Internet attacks, some aim to evade detection based on their knowledge of detection schemes, such as the slow-rate DDoS attacks
Defense Element the Attack        Attack                            Countermeasure
is Oriented
------------------------------------------------------------------------------------------
Infrastructure   Architecture     Intelligent DDoS attacks          Optimal configuration of
                                  against SOFS systems              SOFS systems
                 Location         invisible LOCalization (iLOC)     Enhancement of ITM
                                  attacks against ITM systems       systems
Algorithm        Network-based    Varying Scan Rate worms           attack target Distribution
                 algorithm                                          Entropy based dynamiC
                                                                    (DEC) worm detection
                                  Camouflaging worms                Spectrum analysis based
                                                                    worm detection
                 Host-based       Polymorphic worms                 Worm detection through
                 algorithm                                          mining dynamic program
                                                                    Execution
Table 1.1: Defense-oriented attacks and countermeasures studied in this dissertation.
and polymorphic worms discussed previously. We notice that this kind of deliberate attack evolution and the resulting attacks are more effective (for the attackers) and more dangerous (for the defenses and victims) than random and ad hoc evolution. We carry out systematic and comprehensive investigations following a more general evolution trend: defense-oriented evolution. Defense-oriented evolution results in various enhanced or new attacks that take advantage of knowledge of existing defense systems to counteract these systems. In this dissertation, we investigate a variety of potentially defense-oriented, evolved attacks in order to obtain deep insights about them and find vulnerabilities in existing defense systems. Consequently, we further enhance these defense systems or design new and more effective defense schemes.
While multiple elements define a specific defense system, the most important ones are the infrastructure and algorithm parts. Defense-oriented (evolved) attacks can exploit and leverage knowledge of defense system infrastructure and algorithms in order to coun- teract them and make new attacks more effective and dangerous. Thus, we can classify
the defense-oriented attacks into defense-infrastructure-oriented (or just infrastructure-
oriented) and defense-algorithm-oriented (or just algorithm-oriented) attacks. We inves-
tigate different instances of each class of defense-oriented widespread Internet attacks as
shown in Table 1.1.
1.3.1 Infrastructure-oriented Attacks
Infrastructure is simple for single-site systems, such as host-based or single-device-
based systems, but it can be sophisticated for distributed systems. There are several im-
portant elements in the infrastructure of a system, such as its architecture, topology and
components. In order to counteract defense systems, attackers desire to obtain information
about the infrastructure elements thereof. While some high-level infrastructure information (such as the constituent components and topology type) may be public, detailed information (such as the identities, roles, and locations of the components, and the relations and connections between components within the architecture) is not accessible outside of the system. However, it is this detailed infrastructure information that attackers can use to counteract the defense system. Therefore, attackers need to use specific attack approaches to obtain this information.
In our research, we focus on infrastructure-oriented attacks that target the architecture and location information of defense systems. We systematically investigate instances of these attacks, based on which we propose methods to enhance defense systems.
1. Architecture-oriented Attacks and Countermeasures
— Intelligent DDoS Attacks to SOFS Systems and Optimization of SOFS Sys-
tems
A recent approach to protect communications from DDoS attacks involves the use of
overlay systems. Although such systems perform well under random DDoS attacks,
it is questionable whether they are resilient to intelligent DDoS attacks which aim
to infer the architectures of the systems to launch more efficient attacks. We define
several intelligent DDoS attack models and develop analysis and simulations to study
the impacts of intelligent DDoS attacks on system performance in terms of the path
availability between clients and the server [127, 135]. We generalize such systems as
Secure Overlay Forwarding Systems (SOFS). There are certain standard architectural
features of such systems, i.e., layering, mapping degree and node distribution.
We analyzed our SOFS system under discrete-round-based and continuous attacks
using a general analytical approach and simulations, respectively. We observed that
the system design features, attack strategies, intensities, prior knowledge about the
system, and system recovery significantly impact system performance. Even under
sophisticated attack strategies and intensities, we showed that smart design of system
features and recoveries can significantly reduce attack impacts. We provide a method
to obtain optimal system configurations under given attack strategies and intensities.
Furthermore, we propose a set of design guidelines to enhance SOFS performance
under all general scenarios.
2. Location-oriented Attacks and Countermeasures
— invisible LOCalization Attack against Internet Threat Monitoring Systems
Internet threat monitoring (ITM) systems have been deployed to detect widely spread-
ing threats and attacks on the Internet in recent years. However, the integrity and
functionality of these systems largely depend on the location anonymity of their mon-
itors. If the locations of monitors are disclosed, the attacker can bypass the monitors
or even abuse them, significantly jeopardizing the performance of ITM systems. In
this work, we study a new class of attack, the invisible LOCalization (iLOC) attack
[130]. The iLOC attack can accurately and invisibly localize monitors of ITM sys-
tems. In the iLOC attack, the attacker launches low-rate scan traffic, encoded with
a selected pseudo-noise code (PN-code), to targeted networks. While the secret PN-
code is invisible to others, the attacker can accurately determine the existence of
monitors in the targeted networks based on whether the PN-code is embedded in the
report data queried from the data center of the ITM system. We implement the iLOC
attack and conduct experiments on a real-world ITM system to validate the feasibil-
ity of such attacks. We also conduct extensive simulations on the iLOC attack using
real-world traces. Our data demonstrate that the iLOC attack can accurately iden-
tify monitors while remaining invisible to ITM systems. Finally, we present a set
of guidelines to counteract the iLOC attack. Particularly, the iLOC attack does not
directly harm the Internet or defense systems by itself, but it can aid other widespread
Internet attacks by defeating ITM systems to increase attack damage.
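The PN-code correlation idea can be sketched as follows (illustrative code length, amplitudes, and noise levels; the actual iLOC design is detailed in Chapter 3): the attacker modulates its scan rate with a ±1 pseudo-noise code, and correlating a monitored traffic series with the same code yields a large score only when the code is embedded, even under background noise.

```python
import random

# Toy sketch of PN-code embedding and correlation decoding.
# Code, amplitudes, and noise levels are illustrative assumptions.

random.seed(7)
pn_code = [1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1]

def traffic(embed: bool, amplitude=50.0, base=1000.0, noise=20.0):
    """Per-interval traffic volume observed at a monitored network."""
    return [base
            + (amplitude * c if embed else 0.0)
            + random.gauss(0.0, noise)
            for c in pn_code]

def correlate(volumes):
    """Normalized correlation of de-meaned traffic with the PN-code."""
    mean = sum(volumes) / len(volumes)
    return sum((v - mean) * c for v, c in zip(volumes, pn_code)) / len(volumes)

with_code = correlate(traffic(embed=True))      # large: code is present
without_code = correlate(traffic(embed=False))  # near zero: only noise
print(f"{with_code:.1f} vs {without_code:.1f}")
```

Because the code is balanced (equal numbers of +1 and -1 chips), background traffic with no embedded code correlates to roughly zero, which is what lets the attacker distinguish monitored from unmonitored networks.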
1.3.2 Algorithm-oriented Attacks
Algorithms in a defense system deal with flow control, data processing and decision-
making in detection and response. Similar to the infrastructure information, the high-level
algorithms are generally known to attackers. Moreover, attackers do not need to know the detailed algorithm information, which would be difficult to obtain. Knowledge of high-level al-
gorithm information suffices for attackers to evolve their attacks to render current defense
system algorithms ineffective, and thus to defeat the defense systems.
In this dissertation, we focus on algorithm-oriented worm attacks, which attempt to
evolve themselves based on their knowledge of existing worm detection algorithms in order
to circumvent detection. Since worm detection can be classified into network-based and host-based categories, we study algorithm-oriented worm attack instances in each category.
1. Network-based Algorithm-oriented Attacks and Countermeasures
— VSR Worm, C-Worm and Countermeasures
It has been observed that the number of infected hosts and overall port scan traffic
volume increase exponentially over time when traditional worms propagate in the
Internet [86][27][148]. Based on these observations, many network-based worm de-
tection algorithms associated with global scan traffic monitoring systems, such as
threshold-based detection and trend-based detection, have been developed to detect
large scale propagation of worms in the Internet [147][105][132][123]. However,
worm writers know that these detection algorithms expect exponentially increasing
port scan traffic or a large volume of port scan traffic during worm propagation.
Consequently, worm writers can evolve their worms accordingly so existing worm
detection algorithms will not observe abnormal traffic or generate an alarm during
the propagation of new worms. Hence the evolved worms can evade existing worm
detection systems.
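To make the class of detectors that such evolved worms evade concrete, the following Python sketch (our illustration, not from the cited detection systems) shows a minimal threshold-based detector: it raises an alarm once monitored scan-traffic volume exceeds a fixed threshold, which catches exponentially growing traditional worms but misses a worm that caps its scan rate.

```python
def threshold_detect(scan_counts, threshold):
    """Return the first time index at which scan volume exceeds the
    threshold, or None if no alarm is ever raised."""
    for t, count in enumerate(scan_counts):
        if count > threshold:
            return t
    return None

# A traditional worm produces exponentially growing scan traffic and is
# caught quickly; a worm that caps its scan rate below the threshold is not.
exponential = [2 ** t for t in range(12)]        # 1, 2, 4, ..., 2048
capped      = [min(2 ** t, 100) for t in range(12)]
print(threshold_detect(exponential, 1000))       # 10
print(threshold_detect(capped, 1000))            # None
```

The detection threshold and window here are arbitrary illustrative choices; real trend-based detectors also smooth and fit the traffic curve rather than comparing raw counts.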
In this dissertation, we model a new class of active worms called the Varying Scan
Rate worm (the VSR worm in short) [141]. The VSR worm deliberately varies
its scan rate and is able to avoid effective detection by existing worm detection
schemes. As a countermeasure against the VSR worm, we design a new worm detec-
tion scheme called attack target Distribution Entropy based dynamiC worm detection
(DEC detection in short). DEC detection utilizes the attack target distribution and its
statistical entropy in conjunction with dynamic decision rules to distinguish worm
scan traffic from non-worm scan traffic. We conduct extensive performance eval-
uations on the DEC detection scheme using real-world traces as background scan
traffic. Our data clearly demonstrate the effectiveness of the DEC detection scheme
in detecting VSR worms as well as traditional worms.
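The statistic at the heart of DEC detection, the entropy of the attack target distribution, can be sketched as follows. This is only an illustration of the Shannon-entropy computation, not the DEC decision rules themselves; the trace format and threshold choices are our assumptions.

```python
import math
from collections import Counter

def target_entropy(scanned_targets):
    """Shannon entropy (in bits) of the empirical distribution of scan
    targets. Worm scans tend to spread over the address space (high
    entropy), while much benign scan traffic concentrates on few targets."""
    counts = Counter(scanned_targets)
    total = len(scanned_targets)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Scans concentrated on one target have zero entropy; scans spread
# uniformly over k targets have entropy log2(k).
print(target_entropy(["10.0.0.1"] * 8))
print(target_entropy([f"10.0.0.{i}" for i in range(8)]))  # 3.0
```

A dynamic decision rule would track this entropy over successive windows and flag sustained high-entropy scan traffic regardless of the (possibly varying) scan rate.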
In our research, we also investigate another new class of active worms, i.e., the Cam-
ouflaging Worm (C-Worm), which has the ability to camouflage its propagation from
worm detection systems [142] through timely manipulation of scan traffic. In order
to detect C-Worms, we design a novel spectrum-based scheme. Our performance
results demonstrate that our scheme can detect the C-Worm more rapidly and accu-
rately in comparison with existing worm detection schemes.
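The intuition behind spectrum-based detection can be illustrated with a small discrete Fourier transform sketch (ours, not the dissertation's implementation): a C-Worm that periodically throttles its scan rate concentrates spectral power at its manipulation frequency, whereas noise-like background scan traffic does not.

```python
import cmath
import math

def power_spectrum(signal):
    """Power spectrum of a scan-traffic time series via a direct DFT
    (O(n^2), fine for a sketch). The DC component is removed so the
    mean traffic level does not dominate the spectrum."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

# A periodically camouflaged scan series shows a sharp spectral peak at
# its manipulation frequency (bin 8 for this synthetic series).
series = [100 + 80 * math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
psd = power_spectrum(series)
print(psd.index(max(psd)))   # 8
```

A detector would compare the power concentrated in low-frequency bins against that of known-benign traffic rather than simply locating the peak.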
2. Host-based Algorithm-oriented Attacks and Countermeasures
— Polymorphic Worm Detection via Mining Dynamic Program Execution
As discussed in Section 1.2, host-based worm detection systems commonly use a col-
lection of worm signatures to determine whether an incoming executable is a worm
or not. However, the worm writers know that the detection algorithms in these de-
tection systems expect signatures of known worms. Consequently, they can generate
new worms which have different binary representations or signatures. They even can
generate polymorphic worms which change their binary representations or signatures
as part of the propagation process. Thus, the existing host-based worm detection
algorithm will not observe the expected worm signatures during the propagation of
these evolved worms and cannot detect them effectively.
In order to detect these polymorphic worms or worms whose signatures are unknown
to the host-based worm detection systems, we propose a new worm detection ap-
proach based on mining dynamic program executions [129]. This approach can cap-
ture the dynamic behavior of executables to provide accurate and efficient detection
against both seen and unseen worms. In particular, we execute a large number of real-
world worms and benign executables and trace their system calls. For mining from
a large amount of features extracted from the system call traces, we apply two clas-
sifier learning algorithms (Naive Bayes and Support Vector Machine). The learned
classifiers are further used to carry out rapid worm detection with low overhead on
the end-host. Our experimental results clearly demonstrate the effectiveness of our
approach to detect new and polymorphic worms in terms of very high detection rate
and low false positive rate.
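The classifier-learning step described above can be sketched in miniature. The dissertation uses full-fledged Naive Bayes and Support Vector Machine learners over features mined from system call traces; the toy Bernoulli Naive Bayes below, over system-call bigram features and entirely hypothetical traces, only illustrates the shape of that pipeline.

```python
import math
from collections import Counter

def bigrams(trace):
    """System-call bigram features extracted from one execution trace."""
    return set(zip(trace, trace[1:]))

class NaiveBayes:
    """Minimal Bernoulli Naive Bayes with Laplace smoothing over
    presence/absence of system-call bigrams (illustrative only)."""
    def fit(self, traces, labels):
        self.classes = sorted(set(labels))
        feats = [bigrams(t) for t in traces]
        self.vocab = set().union(*feats)
        self.prior, self.cond = {}, {}
        for c in self.classes:
            idx = [i for i, y in enumerate(labels) if y == c]
            self.prior[c] = len(idx) / len(labels)
            counts = Counter(g for i in idx for g in feats[i])
            # Laplace-smoothed P(feature present | class)
            self.cond[c] = {g: (counts[g] + 1) / (len(idx) + 2)
                            for g in self.vocab}
        return self

    def predict(self, trace):
        f = bigrams(trace)
        scores = {}
        for c in self.classes:
            s = math.log(self.prior[c])
            for g in self.vocab:
                p = self.cond[c][g]
                s += math.log(p if g in f else 1 - p)
            scores[c] = s
        return max(scores, key=scores.get)

# Hypothetical toy traces: worms touch the network, then write executables.
worms  = [["socket", "connect", "send", "open", "write", "exec"],
          ["socket", "connect", "recv", "open", "write", "exec"]]
benign = [["open", "read", "close", "stat", "open", "read"],
          ["stat", "open", "read", "read", "close", "exit"]]
clf = NaiveBayes().fit(worms + benign, ["worm"] * 2 + ["benign"] * 2)
print(clf.predict(["socket", "connect", "send", "open", "write", "exec"]))
```

In the real system, the feature extraction and classifier training happen offline; only the cheap `predict` step runs on the end-host, which is what keeps the detection overhead low.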
Biological evolution is based on fast genetic mutation and natural selection, i.e., it relies on a brute-force and ad hoc search for better genetic adaptation to the environment [19].
Man-made entities, in contrast, do not follow this random and slow evolutionary principle; instead, humans control the evolution of their products and adapt them purposefully and effectively. We believe the defense-oriented attack evolution studied in this dissertation
is among the most dangerous and efficient types of evolution, since attackers deliberately
evolve their malware in order to defeat their adversaries, i.e., defense systems. However,
our purpose is not to encourage attacks, but to obtain deep insights about potential new
Internet threats and vulnerabilities of current defense systems in order to enhance current
defense systems and design more effective defenses against evolving widespread Internet
attacks.
1.4 Organization of the Dissertation
The rest of this dissertation is organized as follows. We present our investigation
of infrastructure-oriented evolving attacks and countermeasures in Chapters 2 and 3, then
we discuss algorithm-oriented evolving attacks and countermeasures in Chapters 4 and 5.
Specifically, we discuss the intelligent DDoS attacks against SOFS systems and countermeasures in Chapter 2, then we present the iLOC attacks against ITM systems and countermeasures in Chapter 3. Afterwards, we detail the VSR worm attacks with our DEC detection countermeasure in Chapter 4, and we introduce worm detection against polymorphic worms through mining dynamic program execution in Chapter 5. Finally, we conclude this dissertation in Chapter 6.
CHAPTER 2
INTELLIGENT DDOS ATTACKS AGAINST SECURE OVERLAY FORWARDING SYSTEMS AND COUNTERMEASURES
In this chapter, we discuss the first type of infrastructure-oriented attack in this dissertation, which infers the architecture information of a DDoS (Distributed Denial of Service) defense system to facilitate DDoS attacks. Particularly, we define intelligent
DDoS attacks and generalize DDoS-defending overlay intermediate forwarding systems as
Secure Overlay Forwarding Systems (SOFS). The intelligent DDoS attacks we study here are evolved DDoS attacks which use the architectures of SOFS to launch more efficient
DDoS attacks against SOFS. We analyze the SOFS system under discrete round based attacks using a general analytical approach, and analyze the system under continuous attacks using simulations. We also provide optimal system architecture configurations for the SOFS system under expected attack strategies and intensities. Furthermore, we propose a set of design guidelines to enhance SOFS performance under all general attack scenarios.
2.1 Motivations
DDoS attacks are currently major threats to communications in the Internet [83]. The current level of system resilience to DDoS attacks is far from adequate, and a tremendous amount of research is being done to improve system security under DDoS
attacks [104, 76, 95, 66, 60, 10, 115]. For many applications, reliability of communication
over the Internet is not only important but mandatory. Typical examples of such applica-
tions are emergency, medical, and other related services. The system needs to be resilient
to attacks from malicious users within and outside of the system that aim to disrupt com-
munications.
A recent body of work in the realm of protecting communications between a set of
clients and a server against DDoS attacks employs proactive defense mechanisms using
overlay-based architectures [60, 10, 115]. Typically, in such overlay-based architectures, a
set of system deployed nodes on the Internet form a communication bridge between clients
and a critical server. The deployed nodes are intermediate forwarders of communication
from clients to the server. These nodes are arranged into overlay-based architectures (or
structures) that provide attack-resistant features to the overall communication. For exam-
ple, the architecture in the SOS system [60] is a set of overlay nodes arranged in three layers between clients and the server through which traffic is authenticated and then routed.
These layers are SOAP (Secure Overlay Access Point), Beacons and Secret Servlets. A client that wishes to communicate with a server first contacts a node in the SOAP layer.
The node in the SOAP layer forwards the message to a node in the beacon layer, which then forwards the message to a node in the secret servlet layer, which routes the message to the server. In the Mayday system [10], the authors extend the work on SOS [60] primarily by relaxing the restriction on the number of layers (unlike in SOS, where it is fixed at three). In the Internet Indirection Infrastructure (I3) [115], one or more indirection points are introduced as intermediaries for communication between senders and receivers.
The design rationale in all these systems is to ensure, using proactive architectures, (i)
that the server and intermediate communication mechanisms are hidden from outsiders, (ii)
the presence of multiple/alternate paths to improve reliability, and (iii) access control to prevent illegitimate users from being serviced and to drop attack traffic far away from the server. The overall objective is to ensure high degrees of path availability from clients to the server even when attackers try to compromise communication using random congestion-based DDoS attacks, bombarding randomly chosen nodes in the system with huge amounts of traffic.
While the above systems provide high degrees of path availability under random congestion-based DDoS attacks, they can be targeted by intelligent attackers that can break-into the system structure apart from congesting nodes. By break-in attacks, we mean attacks that break-into a node and disclose its neighbors in the communication chain. By combining break-in attacks with congestion attacks, attackers can cause significantly greater damage than with pure random congestion. In fact, attackers can use the results of break-in attacks (disclosed nodes) to guide subsequent congestion attacks on the disclosed nodes. Under intense break-in attacks, the attacker can traverse the communication chain between the forwarder nodes, and can even disclose the server to eventually congest it and completely annul services.
We believe that such intelligent DDoS attacks combining break-in attacks with congestion attacks are representative and potent threats to overlay-based systems, such as [60, 10, 115], that protect communications between clients and servers. However, existing work does not study system performance under these intelligent attacks. In this chapter, we extensively study the performance of such overlay-based systems when targeted by intelligent DDoS attacks that combine break-in and congestion attacks. We also subsequently study how design features of such systems impact performance under intelligent attacks. As a first step, we generalize such systems as Secure Overlay Forwarding Systems
(SOFS). We also capture three standard architectural features of such systems 1: layering (the number of layers between the client and server), mapping degree (the number of next-layer neighbors a node can communicate with), and node distribution (the number of nodes per layer).
Our objective is to study the impacts of the design features of SOFS system on its performance under intelligent DDoS attacks, and to provide guidelines to design SOFS systems highly resilient to intelligent DDoS attacks.
2.2 Background
The SOFS Architecture
In its most basic version, the SOFS architecture consists of a set of overlay nodes arranged in layers of a hierarchy, as shown in Fig. 2.1. The nodes in these layers serve as intermediaries between the clients and the critical target 2. Such a system has three distinguishable design features: Layering, Mapping (Connectivity) Degree, and Node
Distribution across layers. A clearer description for each feature is given below.
Figure 2.1: The generalized SOFS architecture.
1 We use the terms architectural features and design features interchangeably in this chapter.
2 We use the terms target and server interchangeably in this chapter.
• Number of Layers (Layering): The number of layers in the architecture quantifies
the depth of control during access to the target. If the number of layers is L, then
clients must pass through these L layers before communicating with the target. The
importance of layering is that if the number of layers is larger, implicitly it means
that the target is better hidden against external clients.
• Mapping (Connectivity) Degree: Each node in Layer i routes to node(s) in Layer
i + 1 towards the target to complete the communication chain. The mapping degree
in the SOFS architecture is a measure of the number of neighbors a node in Layer i
has in Layer i + 1. Typically, the larger the mapping degree is, the more reliable is
the communication due to the availability of more paths. The largest is actually 1 to
all, where each node in Layer i has all nodes in Layer i + 1 as its neighbors.
• Node Distribution: Node distribution is a measure of the number of nodes in each
layer. Intuitively it may seem that the uniform node distribution across layers is
preferred to ensure a degree of load balancing in the system. However, for a fixed
amount of nodes to be distributed across a fixed number of layers, it may be advisable
to deploy more nodes at layers closer to the target to increase defenses in sensitive
layers nearer the target.
A client that wishes to communicate with the target first contacts node(s) in the first layer which contact node(s) in the second layer and so on till the traffic reaches the target. In this architecture each node is only aware of neighbors in its neighboring layer. A set of filters acts as a firewall surrounding the target through which only legitimate traffic is allowed.
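To make the three design features concrete, the following small sketch (our illustration; the class and method names are not from the dissertation) bundles a SOFS configuration and counts the layer-by-layer forwarding sequences available to a message, which is one simple way the mapping degree translates into path redundancy.

```python
from dataclasses import dataclass
from math import prod

@dataclass
class SOFSConfig:
    """Hypothetical container for the three SOFS design features."""
    layers: int            # L: number of layers
    nodes_per_layer: list  # n_i: node distribution across layers
    mapping_degree: list   # m_i: next-layer neighbors per node, per hop

    def route_choices(self):
        """Number of distinct layer-by-layer forwarding sequences from a
        given first-layer node (paths need not be node-disjoint; the final
        hop through the filters is ignored in this sketch)."""
        return prod(self.mapping_degree)

cfg = SOFSConfig(layers=3, nodes_per_layer=[30, 30, 40],
                 mapping_degree=[3, 3])   # hops into layers 2 and 3
print(cfg.route_choices())  # 9
```

A larger mapping degree multiplies the available routes, but, as the following sections show, it also multiplies what a single break-in discloses.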
2.3 Intelligent DDoS Attacks
We now discuss how an intelligent attacker can compromise the SOFS system. The attacker has the ability to break-into nodes to disclose the victim nodes' next-layer neighbors. The attacker also has the ability to congest nodes to prevent them from servicing legitimate clients. We formally define these two attacks below.
• Break-in Attacks: The attacker has the ability to attempt to break-into nodes in the
SOFS system. A successful break-in results in dysfunction of the victim node and
disclosure of the neighbors of the victim node.
• Congestion Attacks: The attacker has the ability to congest nodes in the SOFS sys-
tem. By congest, congestion-based DDoS attacks or simply congestion attacks, we
mean any of the distributed attack methods that prevent a victim machine from pro-
viding services.
Our work focuses on the theoretical analysis of the impacts of intelligent DDoS attacks on the SOFS system, rather than on the actual attack methods. However, we believe that both the break-in attack and congestion attack models we present are practical. Break-in attacks can be executed through intrusion attacks, or through malicious code hidden in messages sent by malicious clients, as in Trojan horse or active worm attacks [83].
When received by the victim node, the malicious code can execute on the victim node to disable it and retrieve the victim node's neighbor list. The malicious code can even self-propagate to the disclosed neighbors. The execution of congestion attacks on a victim machine results in the victim being prevented from servicing requests or being disconnected from the system. This can be due to exhausting its key resources,
overloading the machine to disable communication, crashing its service, or blocking its network link. Typical examples are the TCP SYN attack, TCP and UDP flood attacks, the ICMP echo attack, and the Smurf attack [83]. These two attacks can be conducted in several possible ways. However, keeping in mind the above attack types, and with the intention of maximizing attack impact, the attacker will usually first conduct break-in attacks to disclose the identities of many nodes. Congestion attacks on the disclosed nodes then follow the break-in attacks. In this realm, we define two attack models below.
• A discrete round based attack model: In our attack models, the attacker can launch
break-in attacks on only a limited number of nodes. In the round based attack model,
the attacker launches break-in attacks in a round-by-round fashion, with part of the
attempts made in each round. The rationale is that, by successively breaking-into
nodes and locating their neighbors, the attacker can disclose more nodes. We call
this model discrete because the attacker starts a fresh round only after the
results of all attempted break-ins in the current round are available to it. Congestion
attacks follow next, and are conducted in one round.
• A continuous attack model: In this model, the attacker attempts to disclose some
nodes first, using part of its break-in attack resources. However in this model, the at-
tacker continuously keeps breaking-into disclosed nodes as and when they are iden-
tified. Congestion attacks follow next in a similar fashion.
The attack models are described in more detail in Sections 2.4.1 and 2.4.2. We wish to emphasize here that the SOFS system also has the recovery ability to defend against attacks.
However, any meaningful execution of the recovery mechanism is contingent on the attacks.
In some cases, the system may not be able to conduct any effective recovery if the attacker
can speedily conduct its attack, disrupting system performance for some short duration of
time. However, if the attack is slow, the system can attempt to take effective recovery action
to restore performance. More details on system recovery are given in Section 2.4.2.
In this chapter, we study the SOFS system performance under discrete round based
attacks and continuous attacks. We demonstrate that system performance is sensitive to
design features and attacks, and the architecture needs to be flexible in order to achieve
better performance under different attacks.
2.4 Analysis of Intelligent DDoS Attacks against SOFS Systems
2.4.1 Analysis of Round based Intelligent DDoS Attacks
In this section we conduct an extensive mathematical analysis on the SOFS architecture
under the discrete round based intelligent DDoS attack model with no system recovery.
The system we study consists of a total of N overlay nodes that can be active or dormant.
By active, we mean that the nodes are currently in the SOFS architecture and ready to
serve legitimate requests 3. A dormant overlay node is one that is part of the system but
currently is not in the SOFS architecture and is not serving requests. In this chapter, when
we use the term overlay node, it can mean either an active or a dormant node. We denote
the number of active nodes in the SOFS architecture (also called SOFS nodes) by n (n ≤ N); these n nodes are distributed across L layers. Layer i has ni nodes, and Σ_{i=1}^{L} ni = n. Each node in Layer i has one or more neighbors in its next higher layer to complete the communication chain. We define the number of next-layer (Layer i) neighbors that a Layer i − 1 node has as mi.
3 In the remainder of the chapter, if the context is clear, we will simply use node or SOFS node to refer to an active node.
In this chapter, we assume that the attack resources are limited. By attack resources, we mean the attack capacity, which depends on the amount of attack facilities. For instance, this can be the number of slave machines recruited by the attacker to launch DDoS attacks
[83]. We denote the break-in attack and congestion attack resources as NT and NC,
respectively. Thus NT and NC are the maximum numbers of nodes on which the attacker can launch
break-in and congestion attacks. Here NT + NC ≤ N. With probability PB, the
attacker can successfully break-into a node and disclose its neighbors in a break-in attempt.
In the SOFS system we study in this section, the system does not do any recovery
to counter attacks. The significance of our analysis and the results we observe
here lies in obtaining a fundamental understanding of attack impacts on the system (and its
features). Nevertheless, our analysis here is still practical as in some cases, the speed of
attacks may be quite high, preventing the system from performing recoveries. In such cases,
our analysis here provides insights into damages that are caused under such rapid/burst
attacks. With the SOFS system and attack specifics in place, we now formally define our
performance metric, PS, as the probability that a client can find a path to communicate with the target under on-going attacks.
Under a One-burst Round Based Attack Model
1. Attack Model
The model we define here is an instance of the discrete round based attack model
where the number of rounds is 1. The attacker will spend all the break-in attack
resources randomly and instantly in one round and then launch the congestion attack.
Even though this model may appear simple, in reality such a type of attack is possible
when say, the system is in a high state of alert anticipating imminent attacks, which
the attacker is aware of and still wishes to proceed with the attack. Here we assume
the attacker has no prior knowledge about the identities of the SOFS nodes, i.e.,
which overlay nodes are currently SOFS nodes.
2. Analysis
Figure 2.2: A Snapshot of the generalized SOFS architecture under the intelligent DDoS attacks.
Our goal is to determine PS, the probability that a client can find a path to commu-
nicate with the target under attacks. This is directly related to the number of nodes
compromised due to attacks (both break-in and congestion attacks). Thus, the key
defining feature of our analysis is in determining the set 4 of attacked SOFS nodes
in each layer. An intuitive way to analyze the system is to list all possible combi-
nations of attacked nodes in each layer, then calculate and sum PS over all
combinations. It is easy to see that there could be many such possible combinations.
For a system with L layers and n nodes evenly distributed, the number of such combinations
is in θ((n/L)^{2L}). For a system with 3 layers and 100 SOFS nodes evenly distributed, we have about 1.0 × 10^{10} combinations. This is so large a number that it is not practical
4We use the terms set and number of nodes in a set interchangeably.
to analyze the system in this fashion. To circumvent the scalability problem, we take an alternate approach: based on the weak law of large numbers, we use average-case analysis. We calculate the average number of attacked SOFS nodes in each layer to obtain PS. In the following, we first derive PS, which depends on the SOFS architecture and the number of attacked SOFS nodes in each layer. We will then discuss how to calculate the number of attacked SOFS nodes in each layer (including nodes broken-into and congested).
1) Derivation of PS
Recall that PS is the probability that a client can successfully communicate with the target under attacks, which depends on the SOFS architecture and number of attacked
SOFS nodes. In the SOFS architecture, a SOFS node maintains a neighbor/routing table consisting of a number of SOFS nodes (determined by the mapping degree) in its next higher layer with which it can communicate. Upon receiving a message, a node in Layer i contacts a node in Layer i + 1 from its neighbor table and forwards the received message to that node. This process repeats until the target is reached via the nodes in successive higher layers. The routing thus takes place through active
SOFS nodes in a distributed fashion. We call a node bad or compromised if it has either been broken-into or is congested and thus cannot route a message. The other overlay nodes are good or alive nodes. During break-in or congestion attacks, the neighbor table may contain entries pointing to bad neighbors, which can cause a message delivery to fail. A snapshot of the system under an on-going attack is shown in Fig. 2.2.
To compute PS, we should first know the probability Pi that a message can be successfully forwarded from Layer i − 1 to Layer i (1 ≤ i ≤ L + 1). Here Layer L + 1 refers to the set of filters that surround the target, which are also intermediate forwarders. We consider this layer in our analysis because filter identities can be disclosed during a successful break-in at Layer L. By the property of the distributed routing algorithm, we can obtain PS as the direct product of all Pi's, i.e.,
PS = Π_{i=1}^{L+1} Pi. Obviously, Pi depends on the availability of good nodes in Layer i that are in the routing tables of nodes in Layer i − 1. To this extent, we define
P(x, y, z) as the probability that a set of y nodes selected at random from x (x > y) nodes contains a specific subset of z nodes. Then P(x, y, z) = C(y, z)/C(x, z) if y ≥ z, and 0 otherwise, where C(·, ·) denotes the binomial coefficient. We denote si as the number of bad SOFS nodes in Layer i. Recall that each SOFS node in Layer i − 1 has mi neighbors in Layer i. Then, on average,
P(ni, si, mi) is the probability that all next-hop neighbors in Layer i of a node in
Layer i − 1 are bad nodes. Hence Pi = 1 − P(ni, si, mi), and the probability PS that a message will be successfully received by the target can be expressed as

PS = Π_{i=1}^{L+1} Pi = Π_{i=1}^{L+1} (1 − P(ni, si, mi)). (2.1)
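The per-layer survival probability and the product in (2.1) translate directly into code. The following sketch is our illustration of the formulas, not code from the dissertation:

```python
from math import comb

def P(x, y, z):
    """Probability that a random y-subset of x nodes contains a specific
    z-subset: C(y, z) / C(x, z) when y >= z, else 0."""
    return comb(y, z) / comb(x, z) if y >= z else 0.0

def Ps(n, s, m):
    """Equation (2.1): probability a client reaches the target, given
    per-layer node counts n[i], bad-node counts s[i], and mapping
    degrees m[i] for layers 1..L+1."""
    result = 1.0
    for ni, si, mi in zip(n, s, m):
        result *= 1 - P(ni, si, mi)
    return result

# Sanity check: with 1-to-all mapping (m_i = n_i), a layer fails only if
# ALL of its nodes are bad, so one good node per layer keeps PS = 1.
print(Ps(n=[10, 10, 10], s=[9, 9, 9], m=[10, 10, 10]))  # 1.0
```

With smaller mapping degrees the same bad-node counts yield PS < 1, which is the layering/mapping trade-off explored numerically later in this section.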
In (2.1), only si (number of bad nodes) is undetermined. If we define bi and ci as the number of nodes that have been broken-into and the number of congested nodes respectively in Layer i, we have si = bi + ci. In the following we will derive bi and ci.
2) Derivation of bi
In the one-burst round based attack model, bi depends on the break-in resource NT and the probability of break-in PB. Since the attacker launches its break-in attacks randomly, the NT break-in attempts are uniformly distributed over the overlay nodes in the SOFS system. Thus the average number of broken-in SOFS nodes is NB = PB (n/N) NT, and hence,

bi = PB (ni/N) NT, i = 1, . . . , L. (2.2)
We assume here that the filters are well-protected and cannot be broken-into. Filters are special and they are not among the N overlay nodes, and thus not the targets of random attacks. Hence bL+1 = 0.
3) Derivation of ci
We now discuss the derivation of ci (the number of congested nodes in Layer i). Unlike bi, ci depends on the result of the break-in attacks and the congestion capacity NC. Thus, we
first need to know the set of SOFS nodes in Layer i that are disclosed in the break-in attack phase. We divide the disclosed nodes in Layer i into three sets: (i) the set of nodes on which break-in attempts have not been made (denoted d_i^N), (ii) the set of nodes that have been unsuccessfully broken-into (denoted d_i^A), and (iii) the set of nodes that were successfully broken-into (which we do not need to consider here).
The nodes in sets d_i^N and d_i^A will now be targeted by congestion attacks. We calculate
d_i^N and d_i^A as follows. Let Y_{i,j} be a random variable whose value is 1 when the j-th node in Layer i is either a disclosed node or one on which a break-in attempt has been made. Let zi denote the average number of nodes that have been disclosed or on which break-in attempts have been made. Thus,
zi = E(Σ_{j=1}^{ni} Y_{i,j}) = Σ_{j=1}^{ni} E(Y_{i,j}) = Σ_{j=1}^{ni} Pr{Y_{i,j} = 1}, i = 1, . . . , L + 1. (2.3)
Denoting by hi the number of nodes in Layer i on which break-in attempts have been made, we have hi = NT (ni/N) for i = 1, . . . , L, and h_{L+1} = 0 because filters are not targets of break-in attacks, as discussed above. Thus, the probability that the j-th node in Layer i is neither a disclosed node nor one on which a break-in attempt has been made is given by (1 − mi/ni)^{b_{i−1}} (1 − hi/ni). The same node can be disclosed by more than one node in the previous layer; the factor (1 − mi/ni)^{b_{i−1}} accounts for such overlaps. We now have,
Pr{Y_{i,j} = 1} = 1 − (1 − mi/ni)^{b_{i−1}} (1 − hi/ni), i = 1, . . . , L + 1, j = 1, . . . , ni. (2.4)
zi = Σ_{j=1}^{ni} (1 − (1 − mi/ni)^{b_{i−1}} (1 − hi/ni)) = ni (1 − (1 − mi/ni)^{b_{i−1}} (1 − hi/ni)), i = 1, . . . , L + 1. (2.5)
We hence have,
d_i^N = zi − hi = ni (1 − (1 − mi/ni)^{b_{i−1}} (1 − hi/ni)) − hi, i = 2, . . . , L + 1. (2.6)
d_i^A = Σ_{j=1}^{hi − bi} (1 − (1 − mi/ni)^{b_{i−1}}) = (hi − bi)(1 − (1 − mi/ni)^{b_{i−1}}), i = 2, . . . , L + 1. (2.7)
Note that nodes in the first layer cannot be disclosed through break-in attacks, and so d_1^N = d_1^A = 0.
The attacker will now congest the SOFS nodes in the sets d_i^N and d_i^A, as their identities have been disclosed and they have not been successfully broken-into. We denote
by ND the average number of SOFS nodes that are disclosed but not broken-into
successfully. It is given by ND = Σ_{i=1}^{L+1} (d_i^N + d_i^A). Now, we proceed to derive ci,
the number of congested nodes in Layer i. Recall that NC is the overall number of
overlay nodes that the adversary can congest, and the congestion attacks follow after
break-in attacks. There are two cases here;
• NC ≥ ND: In this case, all ND disclosed SOFS nodes will be congested. Since
the attacker still has the capacity to congest NC − ND overlay nodes, it will expend
its spare resources randomly. The extra congested nodes will be chosen uniformly at
random from the remaining N − NB − (ND − d_{L+1}^N − d_{L+1}^A) good overlay
nodes, among which only a part are SOFS nodes. Here d_{L+1}^N and d_{L+1}^A are
parts of the filters and hence are excluded from ND when determining the remaining
overlay nodes that are targets for random congestion attacks 5. Therefore,

ci = d_i^N + d_i^A + (NC − ND) · (ni − bi − d_i^N − d_i^A) / (N − NB − (ND − d_{L+1}^N − d_{L+1}^A)), i = 1, . . . , L,
c_{L+1} = d_{L+1}^N. (2.8)
• NC < ND: The attacker randomly congests NC nodes among the ND disclosed
nodes. In this case,

ci = (NC/ND) (d_i^N + d_i^A), i = 1, 2, . . . , L + 1. (2.9)
Recall that si = bi + ci is the set of bad nodes in Layer i. Having thus computed bi
and ci, we obtain PS from (2.1).
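The whole one-burst derivation, equations (2.2) through (2.9) feeding into (2.1), can be assembled into one short routine. This is our illustrative implementation of the average-case analysis (function and variable names are ours; `P` generalizes C(y, z)/C(x, z) to the fractional average bad-node counts the analysis produces, and NT + NC ≤ N is assumed as in the text):

```python
def P(x, y, z):
    """C(y, z)/C(x, z), generalized to a fractional y (average-case si)."""
    if y < z:
        return 0.0
    p = 1.0
    for k in range(z):
        p *= (y - k) / (x - k)
    return p

def one_burst_Ps(N, n, m, n_filters, N_T, N_C, P_B):
    """PS under the one-burst attack. n[i], m[i] cover layers 1..L; the
    filters form layer L+1 and cannot be broken into."""
    L = len(n)
    ns = n + [n_filters]
    b = [P_B * ni * N_T / N for ni in n] + [0.0]    # eq (2.2); b_{L+1} = 0
    h = [N_T * ni / N for ni in n] + [0.0]          # break-in attempts per layer
    dN, dA = [0.0] * (L + 1), [0.0] * (L + 1)       # layer 1: no disclosure
    for i in range(1, L + 1):                       # layers 2..L+1
        hide = (1 - m[i] / ns[i]) ** b[i - 1]
        z = ns[i] * (1 - hide * (1 - h[i] / ns[i])) # eq (2.5)
        dN[i] = z - h[i]                            # eq (2.6)
        dA[i] = (h[i] - b[i]) * (1 - hide)          # eq (2.7)
    N_B = P_B * sum(n) * N_T / N
    N_D = sum(dN) + sum(dA)
    c = [0.0] * (L + 1)
    if N_C >= N_D:                                  # eq (2.8)
        pool = N - N_B - (N_D - dN[L] - dA[L])
        for i in range(L):
            c[i] = dN[i] + dA[i] + (N_C - N_D) * (ns[i] - b[i] - dN[i] - dA[i]) / pool
        c[L] = dN[L]
    else:                                           # eq (2.9)
        c = [(N_C / N_D) * (dN[i] + dA[i]) for i in range(L + 1)]
    Ps = 1.0
    for i in range(L + 1):                          # eq (2.1)
        Ps *= 1 - P(ns[i], b[i] + c[i], m[i])
    return Ps

# Parameters echoing the numerical study below: N = 10000, 100 SOFS nodes
# split over 3 layers, 10 filters, P_B = 0.5 (the m values are our choice).
print(one_burst_Ps(N=10000, n=[34, 33, 33], m=[3, 3, 3, 3],
                   n_filters=10, N_T=2000, N_C=2000, P_B=0.5))
```

Sweeping L, the mapping degrees, and the attack intensities in such a routine is one way to reproduce the qualitative trends discussed next.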
3. Numerical Results and Discussion
We now present here numerical results based on our analysis above. We specifically
highlight the overall sensitivity of system performance to attacks and the impacts of
5 In our model, the filters' identities are hidden from attackers and they can be congested only upon disclosure by break-in attacks.
specific SOFS design features (layering and mapping degree) on performance under
Fig. 2.3 shows the relationship between PS and the layering and mapping degree under different attack intensities. The mapping degrees (referred to as m in the figures) used here are: 1-to-1 mapping, in which each SOFS node has only one neighbor in the next layer; 1-to-half mapping, in which each node has half of all the nodes in the next layer as its neighbors; and 1-to-all mapping, in which each node has all the nodes in the next layer as its neighbors. The other system and attack configuration parameters are: N = 10000, n = 100, PB = 0.5, SOFS nodes evenly distributed among the layers, and 10 filters. In Fig. 2.3 (a), NT is set to 0 and we evaluate performance under two congestion intensities, NC = 2000 and NC = 6000, representing moderate and heavy congestion attacks. In Fig. 2.3 (b), we fix NC = 2000 and analyze two break-in intensities, NT = 200 and NT = 2000. We make the following observations.
Figure 2.3: Sensitivity of PS to L and mi under different attack intensities.
Fig. 2.3 (a) shows that under the same attack intensities, different numbers of layers result in different PS. When NT = 0 (a pure random congestion attack), PS decreases as L increases. This is because there are fewer nodes per layer, which means that under random congestion, fewer nodes per layer are left uncompromised. This behavior is more pronounced when the mapping degree is small. We remind the reader of the SOS architecture [60], where, for defending against random congestion-based DDoS attacks (the same attack model as in this instance), the number of layers is fixed at 3 and the mapping degree is 1-to-all. From Fig. 2.3 (a), we can see that fixing the number of layers at 3 is not always the best solution to defend against such attacks. Instead, 1 layer is the best configuration for defending against pure congestion-based attacks.
For any L, a larger mapping degree (more neighbors for each node) means more paths from nodes in one layer to nodes in the next layer, thus increasing PS as seen in Fig. 2.3 (a) under the absence of break-in attacks. Under break-in attacks, a high mapping degree is not always good as more nodes are disclosed due to break-ins.
For instance when the mapping is 1 to all, PS = 0 in Fig. 2.3 (b). Thus the effect of mapping typically depends on the attack intensities in the break-in and congestion phase. Finally, we see that an increase in NC and NT (attack intensities) leads to a decrease in PS as more nodes could be congested or broken-into, leading to a reduction in path availabilities.
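To make the layering/mapping tradeoff concrete, the following Python sketch estimates PS by Monte Carlo for a pure random congestion attack (NT = 0). It is a simplified stand-in for the analytical derivation, not our exact model: the function name, the even layer split, the per-trial redrawing of neighbor links, and the treatment of filters as reachable from any surviving last-layer node are all simplifying assumptions.

```python
import random

def estimate_ps(N=10000, n=100, L=4, m=2, n_c=2000, trials=400, seed=1):
    """Monte Carlo sketch of path availability P_S under a pure random
    congestion attack (N_T = 0).  Simplifying assumptions: n SOFS nodes are
    spread evenly over L layers, each node's m next-layer neighbors are
    redrawn per trial (so we also average over topologies), and filters are
    treated as reachable from any surviving last-layer node."""
    per_layer = n // L
    ok = 0
    for t in range(trials):
        rng = random.Random(seed + t)
        # Each SOFS node independently escapes congestion with probability
        # (N - n_c) / N, since n_c of the N overlay nodes are congested.
        alive = [[rng.random() < (N - n_c) / N for _ in range(per_layer)]
                 for _ in range(L)]
        reachable = list(alive[0])     # the client knows all first-layer nodes
        for i in range(1, L):
            nxt = [False] * per_layer
            for u in range(per_layer):
                if reachable[u]:
                    for v in rng.sample(range(per_layer), m):  # m neighbors
                        nxt[v] = nxt[v] or alive[i][v]
            reachable = nxt
        ok += any(reachable)           # at least one uncongested path survives
    return ok / trials
```

Varying L and m in this sketch lets one explore the qualitative trend above: under pure congestion, fewer layers (more surviving nodes per layer) tend to yield a higher PS, while a larger m adds paths.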
Under a Successive Round Based Attack
1. Attack Model
In the following, we significantly extend our one-burst attack model in order to study performance under a highly sophisticated attack model called the successive round based attack model (successive attack for short). The successive attack model is representative of sophisticated attacks targeting the SOFS system and extends the one-burst attack model in two ways: (i) the attacker exploits prior knowledge about the first layer SOFS nodes; we let PE represent the percentage of nodes in the first layer known to the attacker prior to the attack (typically, these are first layer nodes advertised to clients); (ii) the break-in attack phase is conducted in R rounds (R > 1), i.e., the attacker launches its break-in attacks successively rather than in one burst. In this attack model, more SOFS nodes are disclosed in a round by round fashion, thus accentuating the effect of break-in attacks.
The strategy of the successive attack is shown in Procedure 1. We denote by β the break-in attack resources available at the start of each round, with β = NT at the start of round 1. In each round, the attacker tries to break into a minimum of α nodes, where α is fixed as NT/R. If the number of disclosed nodes is more than α, the attacker borrows resources from β to attack all disclosed nodes. Otherwise it attacks the disclosed nodes plus some other randomly chosen nodes so as to expend α resources for that round. The available break-in attack capacity (β) keeps decreasing until the attacker has exhausted all of its NT resources. In any round, if the attacker has discovered more nodes than its available capacity (β), it tries to break into a β-subset of the disclosed nodes and then starts the congestion phase. The attacker will congest
Procedure 1 Pseudocode of the successive attack strategy
System parameters: N, n, L, PB; attack parameters: NT, NC, R, X1, β, α.
Phase 1 Break-in attack:
 1: β = NT, α = NT/R;
 2: for j = 1 to R do
 3:   if Xj < α < β then
 4:     launch break-in attacks on all Xj nodes and randomly launch break-in attacks on α − Xj nodes; calculate the set Xj+1 of disclosed nodes; update β = β − α;
 5:   end if
 6:   if Xj < β ≤ α then
 7:     launch break-in attacks on all Xj nodes and randomly launch break-in attacks on β − Xj nodes; calculate the set Xj+1 of disclosed nodes; break;
 8:   end if
 9:   if α ≤ Xj < β then
10:     launch break-in attacks on all Xj nodes; calculate the set Xj+1 of disclosed nodes; update β = β − Xj;
11:   end if
12:   if Xj ≥ β then
13:     launch break-in attacks on β nodes among the Xj nodes; calculate the set Xj+1 of disclosed nodes; break;
14:   end if
15: end for
16: calculate ND;
Phase 2 Congestion attack:
 1: if NC > ND then
 2:   congest the ND nodes and randomly congest NC − ND more nodes;
 3: else
 4:   congest NC nodes chosen randomly among the ND nodes;
 5: end if
all disclosed nodes and more, or only a subset of the disclosed nodes, depending on its congestion capacity NC. Here we assume the attacker will not attempt to break into a node twice, and that a node broken into will not be targeted by the congestion attack. Although there can be other variations of such successive attacks, we believe ours is a sufficiently representative model of sophisticated attacks.
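The per-round resource bookkeeping of Procedure 1 can be sketched in Python as follows. The `disclosures` callback and the function name are hypothetical placeholders of our own: the callback stands in for the disclosure process that the analysis below derives, and the function mirrors only the four branching cases on Xj, α and β, not the full per-layer analysis.

```python
def successive_breakin_rounds(n_t, r, disclosures):
    """Sketch of the break-in resource bookkeeping in Procedure 1.
    `disclosures(spent)` is a hypothetical callback: given the number of
    nodes attacked in a round, it returns how many not-yet-attacked nodes
    become disclosed for the next round.  Returns the number of break-in
    attempts made per round."""
    beta = n_t           # remaining break-in resources (beta)
    alpha = n_t // r     # per-round budget (alpha = N_T / R)
    x = 0                # disclosed, not-yet-attacked nodes (X_j)
    attempts = []
    for _ in range(r):
        if x < alpha < beta:        # attack all X_j plus alpha - X_j random nodes
            spent = alpha
        elif x < beta <= alpha:     # final round: spend all remaining resources
            spent = beta
        elif alpha <= x < beta:     # disclosed nodes alone exceed the budget
            spent = x
        else:                       # x >= beta: attack a beta-subset, then stop
            spent = beta
        beta -= spent
        attempts.append(spent)
        if beta <= 0:               # resources exhausted: congestion phase next
            break
        x = disclosures(spent)
    return attempts
```

For example, with NT = 200 and R = 3 and no disclosures, the attacker simply spends α = 66 nodes per round; a flood of disclosures instead drains β early and shortens the break-in phase.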
2. Analysis
We again use the average case analysis approach and a similar method to derive PS as in (2.1). In calculating bi and ci in the one-burst attack model analyzed before, we had to take care of two possible overlap scenarios: (i) a disclosed node could have already been broken into; (ii) the same node could be disclosed by multiple lower layer nodes. The complexity of overlap is accentuated here due to the nature of successive attacks, since there are multiple rounds of break-in attacks before congestion. We thus have to consider the above overlaps across multiple rounds as well. In the following, we first introduce the concept of SOFS node demarcation in order to deal with these overlaps, and then derive bi and ci in each round.
1) Node demarcation
In order to preserve the information about a node per round and across layers, we introduce subscript j for round information and subscript i for layer information. We define Xj as the number of nodes whose identities are known to the attacker at the start of round j. In order to deal with overlaps within and between rounds, we need to separate the SOFS nodes into multiple sets as follows. At the beginning of each round j, the attacker bases its break-in attack on the set of nodes disclosed at the completion of round j − 1. We denote the set of nodes which are disclosed at round j − 1 and on which break-in attempts are made in round j as h^D_{i,j}. Depending on its spare capacity for that round, the attacker can also select more nodes to randomly break into; we denote this set as h^A_{i,j}. We define h_{i,j} = h^D_{i,j} + h^A_{i,j}, which is the number of nodes on which break-in attempts (successful or unsuccessful) have been made at Layer i in round j. Once the attacker has launched its break-in attacks on these h_{i,j} nodes, it will successfully break into a set of nodes. We denote by b^D_{i,j} and b^A_{i,j} the sets of nodes successfully broken into, and by u^D_{i,j} and u^A_{i,j} the sets of nodes unsuccessfully attacked, after the attacker launches its break-in attacks on the h^D_{i,j} and h^A_{i,j} sets of nodes respectively. We have,
b^D_{i,j} = PB · h^D_{i,j}, and b^A_{i,j} = PB · h^A_{i,j},   i = 1, . . . , L,   (2.10)

u^D_{i,j} = (1 − PB) · h^D_{i,j}, and u^A_{i,j} = (1 − PB) · h^A_{i,j},   i = 1, . . . , L.   (2.11)
Figure 2.4: Node demarcation in our successive attack at the end of Round j.
Breaking into nodes in the sets b^D_{i,j} and b^A_{i,j} will disclose a set of nodes denoted by d^W_{i,j}. This set d^W_{i,j} will overlap with (i) the nodes attacked in all previous rounds, given by Σ_{k=1}^{j−1} h_{i,k}; (ii) the nodes in set b^A_{i,j}; (iii) the nodes in sets b^D_{i,j} and u^D_{i,j}; and (iv) the nodes in set u^A_{i,j}, where we denote the subset of d^W_{i,j} overlapping with u^A_{i,j} as d^A_{i,j}. Fig. 2.4 shows these overlaps at the end of round j. After discounting all the above overlaps from d^W_{i,j}, we obtain the set of disclosed nodes which have not been attacked by the end of round j, denoted as d^N_{i,j}. Based on the definitions of h^D_{i,j} and d^N_{i,j}, and the fact that the filters are not targets of break-in attacks, we have

h^D_{i,j} = d^N_{i,j−1},   i = 1, . . . , L.   (2.12)

Note that d^N_{i,j−1} and d^A_{i,j−1} are 0 for i = 1, because the nodes at the first layer cannot be disclosed by means of a break-in attack in any round j. Recall that Xj is the set of disclosed nodes whose identities are known to the attacker before round j and on which break-in attacks will be made in round j. Thus it can be calculated as Xj = Σ_{i=1}^{L} d^N_{i,j−1}. In the following, we proceed to derive the number of broken-into nodes (bi) and then compute the number of congested nodes (ci) for each round.
2) Derivation of bi
To derive bi, we first need to calculate the sets defined above. For ease of elucidation, we take the representative case Xj < α < β in Procedure 1 as an example to explain our analysis. Recall that β is the amount of break-in attack resource available in the current round. This is the most representative case among the ones possible; we briefly discuss the other possible cases after analyzing this one. In this case, at the beginning of round j of its break-in attack phase, the attacker has resources to break into more nodes than those already disclosed prior to that round (d^N_{i,j−1}), and has attack resources left (α − Xj) to conduct random break-ins on other overlay nodes. Now there are N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k} unattacked overlay nodes, and among them n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k} are at Layer i. Thus, we can get the number of nodes (h^A_{i,j}) on which random break-in attempts are made on Layer i in round j as

h^A_{i,j} = [(n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k}) / (N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k})] · (α − Xj),   i = 1, 2, . . . , L.   (2.13)

We define b_{i,j} as the number of nodes broken into on Layer i in round j, which is the summation of b^A_{i,j} and b^D_{i,j}. Based on (2.10), (2.12) and (2.13), we have,^6

b_{i,j} = PB · [(n_i − d^N_{i,j−1} − Σ_{k=1}^{j−1} h_{i,k}) / (N − Xj − Σ_{q=1}^{L} Σ_{k=1}^{j−1} h_{q,k})] · (α − Xj) + PB · d^N_{i,j−1},   i = 1, 2, . . . , L.   (2.14)

We can now obtain bi as

b_i = Σ_{k=1}^{J} b_{i,k},   i = 1, 2, . . . , L,   (2.15)

where J is the number of rounds the attacker takes to exhaust all break-in resources (NT). Note that J ≤ R.
To obtain bi, we need to compute the set of nodes d^N_{i,j}, which is used in (2.14). As discussed above, we have to extract the set d^N_{i,j} from d^W_{i,j}. Similar to the discussion in the one-burst attack model, we can derive d^N_{i,j} and d^A_{i,j} as follows. We first calculate the set of nodes that have been either disclosed or attacked. This is given by

z_{i,j} = n_i · (1 − (1 − m_i/n_i)^{b_{i−1,j}} · (1 − Σ_{k=1}^{j} h_{i,k} / n_i)),   for b_{i−1,j} > 0 and i = 2, . . . , L + 1.   (2.16)

Note that in our attack model, the attacker will not try to break into a node twice. Hence, to calculate d^N_{i,j} from z_{i,j}, we subtract the nodes on which break-in attempts have been made (Σ_{k=1}^{j} h_{i,k}). Thus, we have,

d^N_{i,j} = z_{i,j} − Σ_{k=1}^{j} h_{i,k},   for b_{i−1,j} > 0 and i = 2, . . . , L + 1.   (2.17)

Having computed d^N_{i,j}, we can use (2.14) and (2.15) to obtain bi. Now, d^A_{i,j} (which will be used to compute ci) is given by,

d^A_{i,j} = (h^A_{i,j} − b^A_{i,j}) · (1 − (1 − m_i/n_i)^{b_{i−1,j}}),   for b_{i−1,j} > 0 and i = 2, . . . , L + 1.   (2.18)
6 Recall that h^D_{L+1,j}, h^A_{L+1,j} and b_{L+1,j} are all 0 because filters are not targets of break-in attacks.
Having discussed the necessary derivations for the representative case above in detail, we now clarify the situations involving the other cases of the successive attack. Apart from the representative case just discussed, there are three other cases: (i) Xj < β ≤ α, (ii) α ≤ Xj < β, and (iii) β ≤ Xj. For case (i), all the formulas derived for the above case can be directly applied, except that α has to be replaced by β. For case (ii), all the formulas in the above case can be applied except that h^A_{i,j} = 0. For case (iii), we have h^A_{i,j} = 0, and the formulas derived in the representative case have to be suitably modified. In this case, there are some disclosed nodes that the attacker does not try to break into, having consumed all of its break-in resources; such nodes will be attacked during the congestion phase. We denote this set of nodes in Layer i after round j as f_{i,j}. We note that f_{i,j} has relevance (f_{i,j} > 0) only when the attacker completes its break-in attack phase in round j. Thus in this case there are no resources left for random break-in attacks. Only β disclosed nodes, distributed uniformly at random across the layers, will be subjected to break-in attempts. Then we have

f_{i,j} = d^N_{i,j−1} − d^N_{i,j−1} · (β / Xj),   h^A_{i,j} = 0,   h^D_{i,j} = d^N_{i,j−1} − f_{i,j},   for i = 1, 2, . . . , L,   (2.19)

and

d^N_{i,j} = n_i · (1 − (1 − m_i/n_i)^{b_{i−1,j}} · (1 − (Σ_{k=1}^{j} h_{i,k} + Σ_{k=1}^{j} f_{i,k}) / n_i)) − Σ_{k=1}^{j} h_{i,k} − Σ_{k=1}^{j} f_{i,k},   i = 2, . . . , L + 1,   (2.20)

where b_{i−1,j} > 0. Here, d^A_{i,j} is the same as in (2.18), and f_{L+1,j} = 0 because filters are not targets of break-in attacks. With the above derivations for this case, we can now use (2.14) and (2.15) to calculate bi.
3) Derivation of ci
Recall that in the congestion attack phase, the attacker will first congest the SOFS nodes disclosed in the break-in attack phase. Let the final round of the break-in attack be J (J ≤ R). Denoting by ND the number of nodes that are disclosed but not broken into, based on the definitions of u^D_{i,j}, d^N_{i,j}, d^A_{i,j} and f_{i,j}, we have,

ND = Σ_{i=1}^{L} Σ_{k=1}^{J} u^D_{i,k} + Σ_{k=1}^{J} d^N_{L+1,k} + Σ_{i=2}^{L} d^N_{i,J} + Σ_{i=1}^{L} f_{i,J} + Σ_{i=1}^{L} Σ_{k=1}^{J} d^A_{i,k}.   (2.21)

The total number of broken-into nodes is NB = Σ_{i=1}^{L} Σ_{k=1}^{J} b_{i,k}. If NC ≥ ND, similar to (2.8), we have the number of congested nodes per layer, ci, as

c_i = (Σ_{k=1}^{J} u^D_{i,k} + d^N_{i,J} + Σ_{k=1}^{J} d^A_{i,k} + f_{i,J}) + (NC − ND) · (n_i − Σ_{k=1}^{J} b_{i,k} − Σ_{k=1}^{J} u^D_{i,k} − d^N_{i,J} − Σ_{k=1}^{J} d^A_{i,k} − f_{i,J}) / (N − NB − (ND − Σ_{k=1}^{J} d^N_{L+1,k})),   i = 1, . . . , L,
c_i = Σ_{k=1}^{J} d^N_{L+1,k},   i = L + 1.   (2.22)

If NC < ND, similar to (2.9), we have

c_i = (NC / ND) · (Σ_{k=1}^{J} u^D_{i,k} + d^N_{i,J} + Σ_{k=1}^{J} d^A_{i,k} + f_{i,J}),   i = 1, . . . , L,
c_i = (NC / ND) · Σ_{k=1}^{J} d^N_{L+1,k},   i = L + 1.   (2.23)

Recall that s_i = b_i + c_i is the number of bad nodes in Layer i. We can now obtain PS from (2.1).
Note that prior knowledge about the identities of the first layer SOFS nodes (PE) determines X1, i.e., X1 = n1 · PE. In fact, we can consider this information as that obtained from a break-in attack at Round 0. The number of nodes “disclosed” at Round 0 is thus n1 · PE, all of which are located at the first layer. At round 1, the attacker launches its break-in attack based on this information. Thus b_{i,j}, d^N_{i,j}, c_i, etc., can be calculated by applying Formulas (2.10) to (2.23). We point out that if we set PE = 0 and R = 1, the successive attack model degenerates into the one-burst attack model, and the formulas to compute b_{i,j}, d^N_{i,j}, c_i, etc., simplify to the corresponding ones derived in the previous sub-section.
3. Numerical Results
In the following, we discuss the system performance (PS) under successive attacks.
Unless otherwise specified, the default system and attack parameters are N = 10000,
n = 100, L = 4, NC = 2000, NT = 200, R = 3, PB = 0.5, PE = 0.2 and the
SOFS nodes are evenly distributed among the layers. We introduce two new mapping
degrees here, namely 1 to 2 mapping, meaning each SOFS node has 2 neighbors in
the next layer; and 1 to 5 mapping, meaning each node has 5 neighbors in the next
layer.
Figure 2.5: Sensitivity of PS to NT under different L, mi and N.
In Fig. 2.5 we show how the system performance, PS, changes with NT as the other SOFS system parameters change. Fig. 2.5 (a) shows how the mapping degree and the total number of overlay nodes influence the relation between NT and PS. In this configuration, we set NC = 2000 with an even SOFS node distribution. Fig. 2.5 (b) shows the sensitivity of PS to NT under different numbers of layers, L, and different mapping degrees. We make the following observations. First, PS is sensitive to NT: a larger NT results in a smaller PS. For higher mapping degrees, PS is more sensitive to changing NT; the reason, following from previous discussions, is that a higher mapping degree discloses more nodes under break-in attacks. Second, in Fig. 2.5, there is a portion of each curve where PS remains almost unchanged as NT increases. This stable part is due to the advantage offered by the layering in the SOFS architecture against break-in attacks guided by prior disclosure of SOFS nodes. The fall in PS beyond this stable part is due to the effect of random break-in attacks in addition to break-in attacks guided by prior disclosure of SOFS nodes.
Figure 2.6: Sensitivity of PS to L, mi and node distribution.
Fig. 2.6 (a) shows the impact of the number of layers, L, on system performance, PS, under different mapping degrees. Similar to Fig. 2.3 (a) and (b), PS is sensitive to L and the mapping degree, even under multiple rounds of break-in attacks, i.e., when NT > 0 and R > 1. An increase in the number of layers can always slow down the penetration of break-in attacks towards the target. However, if the system deploys too many layers, the number of nodes on each layer decreases and the number of paths between layers decreases correspondingly, which causes a decrease in PS (recall that in our evaluation, the total number of SOFS nodes is fixed). Among the configurations we tested, the one with L = 4 and mapping degree 1 to 2 provides better overall performance than the others.

Fig. 2.6 (b) shows the impact of the node distribution on PS as L and the mapping degree change. With the other parameters unchanged, here we show the sensitivity of performance to three different per-layer node distributions. The first is the even node distribution, wherein the number of nodes in each layer is the same (given by n/L). The second is the increasing node distribution, wherein the number of nodes in the first layer is fixed (n/L) to maintain a degree of load balancing with the clients, and the other layers have nodes in the increasing ratio 1 : 2 : . . . : L − 1. The third is the decreasing node distribution, where the number of nodes in the first layer is fixed (n/L) and those in the other layers are in the decreasing ratio L − 1 : L − 2 : . . . : 1. There can be other node distributions, but we believe the above ones are representative for studying the impact of node distributions.

We make the following observations. The node distribution does impact system performance. The sensitivity of PS to the node distribution is more pronounced for higher mapping degrees (more neighbors per node). A very interesting observation is that the increasing node distribution performs best among the tested node distributions. This is because when the mapping degree is larger than 1 to 1, breaking into one node leads to multiple nodes being disclosed at the next layer; hence the layers closer to the target have more nodes disclosed and are more vulnerable, and more nodes at these layers can compensate for the damage of disclosure. Also, we observe that as the number of layers increases, the sensitivity to the node distribution gradually reduces. This is because as L increases, the difference in the number of nodes per layer becomes smaller across the different node distributions.
Figure 2.7: Sensitivity of PS to R (a) and PE (b).
Fig. 2.7 (a) shows the impact of R (the number of rounds) on PS under different L with mapping degree 1 to 5. The nodes are evenly distributed among the layers in this case. Overall, PS is sensitive to R and decreases as R increases. For larger values of L, PS is less sensitive to R, because more layers provide more protection from break-in attacks even for large numbers of rounds. We also observe in Fig. 2.7 (b) that PS is sensitive to PE. For higher mapping degrees, PS is more sensitive to changing PE; the reason, following from previous discussions, is that a higher mapping degree discloses more nodes. For smaller L, PS is more sensitive to changing PE, because a smaller L increases the attacker’s chance to penetrate the system layer by layer.
2.4.2 Analysis of Continuous Intelligent DDoS Attacks
In this section, we study the performance of the SOFS system in the presence of another type of intelligent DDoS attack, called the continuous attack. We also study the impacts of recovery mechanisms that the SOFS can incorporate. The performance metric here is still PS.
Attack Model and System Recovery
The continuous attack model differs from the discrete round based attack model proposed above in that the attacker continuously breaks into SOFS nodes as and when their identities are revealed (rather than in rounds). We define NT and NC to be the maximum numbers of overlay nodes that can simultaneously be under break-in or congestion attacks, respectively. Furthermore, here the attacker reuses its resources (NT and NC) in a more sophisticated way, as follows. During system recovery (discussed next), the attacker will know when a compromised node is recovered (i.e., replaced with a good node). If the attacker attacks a non-SOFS node,7 it will also know that it is a non-SOFS node. In either case, the attacker will redirect the attack to a new node within time Tred, referred to as the attack redirection delay.

Under an on-going congestion attack, the attacker keeps attacking a victim node as long as it is an SOFS node. During break-in attacks, once a break-in attempt is completed on a node (irrespective of the result), the attacker redirects the break-in attack to another node, also within time Tred. When the attacker redirects an attack, it uses the disclosed node list if there is any node in that list; otherwise it randomly picks a node from all the overlay nodes except those currently under attack. Obviously, the disclosed nodes are all
7 Recall that an SOFS node is one that is currently active in the SOFS structure, while a non-SOFS node is part of the overlay system but not currently part of the SOFS structure.
43 SOFS nodes, so they will be targeted first by break-in attacks if there are enough resources.
Otherwise, the nodes are attacked by congestion attacks.
In our analysis here, the SOFS system employs recovery to defend against attacks. While there can be many potential recovery mechanisms, the one we employ is proactive recovery, where a proactive reset mechanism periodically resets every SOFS node. When a proactive reset event happens on an SOFS node, the SOFS system immediately replaces that node with a new SOFS node chosen from the set of non-SOFS nodes. We denote the interval between two successive proactive resets on an SOFS node as Tp, called the system recovery delay. In this study, we mainly focus our discussion on proactive recovery; interested readers can refer to [128] for our discussion and analysis of other recovery mechanisms.
Analysis
The goal of our analysis here is to study the impacts of system design features on system performance under continuous attacks with system recovery. An analytical approach for this case, similar to the one conducted under discrete round based attacks, is too complicated. We therefore use simulations to study system performance under continuous attacks in the presence of system recovery.
In order to analyze the system, we implement a discrete event driven simulation tool that simulates the attack model and system recovery. The simulated system consists of 5000 overlay nodes, among which there are 40 SOFS nodes and 10 filters. Each client is connected to 5 first layer SOFS nodes. In our simulations below, the attack redirection delay (Tred) and the system recovery delay (Tp) follow exponential distributions. The system performance is sensitive to the ratio of the mean value of Tp to the mean value of Tred, denoted as r, rather than to the individual mean values of Tred or Tp. Thus r measures the competition between attacks and system recovery in terms of speed. A smaller value of r implies faster recovery, which is beneficial for the system. In the simulations below we use only r to discuss the impacts of continuous attacks and system recovery.

Numerical Results and Discussions
In the following simulations, the default system and attack parameters are L = 4, PB = 0.5, PE = 0.2, NT = 200 and NC = 200. Fig. 2.8 (a) shows the impact of the number of layers, L,
Figure 2.8: Sensitivity of PS to L under different m (a), and to NC under different L and r (b).
on PS under different mapping degrees when both NT and NC are fixed at 200 and r = 5. Similar to Fig. 2.6 (a), PS is sensitive to L and the mapping degree. The sensitivity of PS to L and the mapping degree is lower than in the discrete round based attack model. The reason is the presence of system recovery: since the system replaces compromised and disclosed SOFS nodes, attack impacts are reduced. Fig. 2.8 (b) shows how L and r influence PS when NT = 200, the mapping degree is 1 to 2, and NC changes. Here L = 4 is always better than L = 7. This is because, when NT is fixed and NC increases, random congestion attacks dominate, and hence fewer layers improve performance, as discussed in the round based attack model.
Figure 2.9: Sensitivity of PS to NT under different r and L (a), and different r and m (b).
Fig. 2.9 (a) shows how PS changes with NT under different L and r. The mapping degree is fixed at 1 to 2. In most cases, L = 4 performs better than L = 7. This is because, in our simulations, the total number of SOFS nodes is fixed; deploying more layers decreases the number of nodes on each layer, and so decreases the number of paths from clients to the target. However, there is one exception to this claim. When r = 20 and NT = 50, L = 7 performs better than L = 4, which shows that more layers can be beneficial. The reason is that when NT is very small, few nodes are disclosed and compromised at each layer. In this situation, the decrease in PS is mainly due to disclosure and compromise of the filters, which are at the last layer. Here, slow recovery (large r) cannot recover the compromised filters effectively, so more layers can slow down the penetration of break-in attacks towards the filters and help achieve better performance. The data also demonstrate that faster system recovery (smaller r) improves system performance more effectively.

Fig. 2.9 (b) shows how PS changes with NT under different mapping degrees and r, when L = 4. When NT is small, a smaller mapping degree is better, especially when r is large. But when NT is large, a larger mapping degree performs better. This is because, when NT is not large and the mapping degree is small, fewer nodes are disclosed. Hence fewer nodes are attacked, resulting in a high PS. However, when NT is very large, many SOFS nodes are disclosed and compromised, and it is the system recovery that maintains a certain (possibly small) number of nodes alive, which guarantees PS > 0. The number of alive nodes here is mainly determined by r, which is not related to the mapping degree. But the mapping degree decides the number of available paths: given a number of alive nodes, a larger mapping degree means more paths. Hence, PS increases with a larger mapping degree, especially when r is small (fast system recovery).
From the above, we see that attack intensities and system design features have significant impacts on system performance under continuous attacks with system recovery. We also find that recovery plays a significant role in reducing the impacts of even intense attacks, sustaining a certain level of system performance. Large mapping degrees help achieve better system performance in this circumstance.
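As an illustration of the kind of discrete event simulation used above, the following toy Python sketch races the continuous congestion attack against proactive recovery. It is a deliberately simplified stand-in for our simulator, not the tool itself: it tracks only congestion slots (no break-in phase, no layering, no filters), and the function name and parameters are our own illustrative choices.

```python
import heapq
import itertools
import random

def congested_fraction(n_sofs=40, n_overlay=5000, n_c=200,
                       r=5.0, t_red=1.0, horizon=500.0, seed=7):
    """Toy discrete-event sketch of the continuous congestion attack racing
    proactive recovery.  Each of the n_c attack slots aims at a random overlay
    node; a miss (non-SOFS node) costs an Exp(t_red) redirection delay, while
    a hit stays congested until an Exp(r * t_red) proactive reset, so that
    r = mean(T_p) / mean(T_red).  Returns the time-averaged fraction of SOFS
    nodes under congestion -- a quantity inversely related to P_S."""
    rng = random.Random(seed)
    exp = lambda mean: rng.expovariate(1.0 / mean)
    seq = itertools.count()                  # tie-breaker for the event heap
    events = [(exp(t_red), next(seq), "redirect") for _ in range(n_c)]
    heapq.heapify(events)
    congested, acc, last_t = 0, 0.0, 0.0
    while events and events[0][0] < horizon:
        t, _, kind = heapq.heappop(events)
        acc += congested * (t - last_t)      # time-weighted congestion count
        last_t = t
        if kind == "redirect":
            hit = rng.random() < n_sofs / n_overlay
            if hit and congested < n_sofs:   # latch onto an SOFS node
                congested += 1
                heapq.heappush(events, (t + exp(r * t_red), next(seq), "reset"))
            else:                            # miss: keep redirecting
                heapq.heappush(events, (t + exp(t_red), next(seq), "redirect"))
        else:                                # proactive reset frees the node;
            congested -= 1                   # the attacker must redirect
            heapq.heappush(events, (t + exp(t_red), next(seq), "redirect"))
    return acc / (last_t * n_sofs) if last_t > 0 else 0.0
```

Even this toy reproduces the qualitative finding above: a small r (fast recovery) keeps the congested fraction low, while a large r lets the attacker pin down most of the SOFS nodes.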
2.5 Countermeasures
2.5.1 Optimization of SOFS System Performance Under Round based Attacks
In the above analysis, using extensive analytical derivations, we have made important observations on SOFS system performance and on the impacts of the design features on system performance under round based intelligent DDoS attacks. Based on this deep insight into the attacks, we can provide countermeasures against them in order to optimize SOFS system performance under attack. In the following, we address this issue. For brevity, we only present the methods to obtain the optimal mapping degree and node distribution as examples; optimal configurations for other design features can be obtained similarly.
The performance of the SOFS system, i.e., PS, is a function of the system design features and attack parameters, as seen in (2.24) below, where m[] and n[] are the mapping degree and the number of nodes on each layer. The function F contains all parameters used to calculate PS and summarizes the above formulas for calculating PS:

PS = F(N, n, NC, NT, L, m[], n[], PB, PE, R).   (2.24)
If all other system and attack parameters are fixed and we keep m[] as variables, then we can use existing mathematical tools such as MATLAB to obtain the optimal mapping degree under the given system and attack parameters. Table 2.1 shows the optimal mapping degrees under the default system and attack parameters used in Section 3. We can see that the optimal mapping degree changes from 1 to all to 1 to 2 as NT changes from 0 to 2000.^8 This matches our previous observation that smaller mapping degrees improve the resilience of the system to break-in attacks. In Table 2.2, we obtain the optimal node distributions under two different NT values when L = 4, n1 is fixed at n/L, i.e., 25, the mapping degree is 1 to 2, and all other parameters are set to the default configuration values. The results match our previous observation that the increasing node distribution performs better than the other node distributions.
While the above approach is useful in some cases, the real problem is how to optimize multiple structure parameters simultaneously to achieve optimum performance. To complicate this further, some of the parameters, especially attack-related ones (such as NC, NT), may be unknown to the system designer at design time. In some cases some parameters can be
8 While obtaining optimal mapping degrees, we constrain the mapping degrees to be equal across layers for consistency in workload across nodes in the system. However, this constraint can be relaxed if need be.
NT                       NT = 0     NT = 20    NT = 200    NT = 2000
Optimal mapping degree   1 to all   1 to 4     1 to 3      1 to 2

Table 2.1: Optimal mapping degree with different NT
NT    n1   n2   n3   n4        NT    n1   n2   n3   n4
200   25   20   21   34        600   25   22   22   31

Table 2.2: Optimal node distribution under 1 to 2 mapping with different NT
estimated to be within ranges. Also, the system may have other constraints, such as latency and workload per node, that impact the choices of the number of layers, mapping degree, etc. The optimization of design features needs to take these issues into consideration too. It can easily be seen that solving the overall optimization problem is thus not easy. Nevertheless, we provide some discussion on how to obtain optimal configurations under reasonable assumptions about the system and attacks.
Consider an instance where the attack intensities can be predicted within some interval, i.e., we know the ranges and the distributions of the NC and NT values. Then, a reasonable approach to address this problem is to obtain configurations that optimize the expected value of the path availability, denoted as E(PS). It is formally defined in (2.25), where Pr(NC′, NT′) is the probability that NC and NT take the values NC′ and NT′, respectively:

E(PS) = Σ_{NC′, NT′} Pr(NC′, NT′) × F(N, n, NC′, NT′, L, m[], n[], PB, PE, R).   (2.25)

Based on (2.25), we can use optimization tools such as those in MATLAB to obtain the optimal mapping degree (m[]) and node distribution (n[]) that achieve overall optimal performance under given ranges of NC and NT. In reality, the range and distribution of NC and NT, and even other attack parameters, can be obtained from historical experience and run-time measurement. Other attack parameters that can be estimated within ranges can be handled in the same way we handle NC and NT.
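To make (2.25) concrete, the following sketch brute-forces the mapping degree and node distribution that maximize the expected path availability over a given attack-intensity distribution. The availability function F_toy and the attack distribution are illustrative placeholders; the dissertation's actual function F is not reproduced here.

```python
import itertools

def expected_availability(F, attack_dist, L, m, n):
    """E(P_S) per (2.25): sum over (NC', NT') of Pr(NC', NT') * F(...)."""
    return sum(p * F(nc, nt, L, m, n) for (nc, nt), p in attack_dist.items())

def optimize(F, attack_dist, N, L, degrees, step=5):
    """Exhaustive search for the mapping degree and node distribution that
    maximize E(P_S); `degrees` lists candidate per-layer mapping degrees,
    assumed equal across layers as in footnote 8."""
    best_m, best_n, best_e = None, None, -1.0
    # enumerate node distributions n_1..n_L summing to N, in steps of `step`
    for parts in itertools.product(range(step, N, step), repeat=L - 1):
        if sum(parts) >= N:
            continue
        n = list(parts) + [N - sum(parts)]
        for m in degrees:
            e = expected_availability(F, attack_dist, L, m, n)
            if e > best_e:
                best_m, best_n, best_e = m, n, e
    return best_m, best_n, best_e

# Toy stand-in for the availability function F: availability falls with
# attack intensity and grows with the mapping degree and the number of
# nodes in layers 2..L. Purely illustrative, NOT the dissertation's F.
def F_toy(nc, nt, L, m, n):
    return max(0.0, 1.0 - (nc + 0.01 * nt) / (m * sum(n[1:])))

attack_dist = {(2, 200): 0.5, (4, 600): 0.3, (8, 2000): 0.2}  # Pr(NC', NT')
m_opt, n_opt, e_opt = optimize(F_toy, attack_dist, N=100, L=4,
                               degrees=[1, 2, 3, 4], step=25)
print(m_opt, n_opt, round(e_opt, 3))  # -> 4 [25, 25, 25, 25] 0.965
```

In practice the search space is larger and F itself comes from the analysis of Section 2.4, so a solver (e.g., MATLAB's optimization toolbox, as the text suggests) would replace the brute-force loop.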
To summarize: the attack strategies, the attack intensities, and the attacker's prior knowledge about the system significantly impact system performance. However, these impacts are deeply influenced by the system design features. Larger values of L and smaller mapping degrees improve system resilience to break-in attacks, while the reverse is true for congestion-based attacks. An increasing node distribution performs better than other node distributions. These design features interact with each other in determining system performance under intelligent DDoS attacks.
2.5.2 General Design Guidelines to Enhance SOFS System Performance
Although we could not derive a performance optimization for the SOFS system under continuous attacks, we are still able to characterize the impacts of the SOFS design features on performance under continuous attacks, and these impacts match our observations under round-based attacks.
Based on our findings for all the attack models in this chapter, we propose the following set of design guidelines to enhance performance under general scenarios.
• The design feature configurations should be flexible and adaptive to achieve high
performance under different intensities of attacks.
• When attack information is unknown, a moderate number of layers, a moderate mapping degree, and an increasing node distribution are recommended to sustain a more than acceptable level of performance.
• When break-in attacks dominate, more layers and smaller mapping degrees are recommended. When congestion-based attacks dominate, fewer layers and larger mapping degrees are better.
• System recovery always helps improve system performance under attacks. Under intense break-in attacks, system recovery with a large mapping degree can sustain a more than acceptable level of performance.
2.6 Related Work
The main scope of this work is in the realm of overlay systems (organized into definite
structures) for defending against Distributed DoS attacks. The surveys in [83, 104, 76] on
DDoS attacks and defense are exhaustive, and interested readers can refer to those papers.
In the following, we focus on work using overlay systems in general to defend against
DDoS attacks.
Recently, several works have proposed solutions based on overlay networks to enhance
security of communication systems, e.g., [60, 10, 115, 116, 11, 26, 14, 125]. An overlay solution to track DDoS floods has been proposed in [116]. [11] proposes an overlay routing infrastructure to enhance the resilience of the Internet. Chen and Chow designed a
random Peer-to-Peer network that connects the registered client networks with the regis-
tered servers to defend against DDoS attacks in [26]. Badishi et al. present a systematic
study of the vulnerabilities of gossip-based multicast protocols to DoS attacks and propose
a simple gossip-based multicast protocol that eliminates such vulnerabilities in [14]. The
effectiveness of location-hiding of proxy-network based overlays is discussed in [125].
Anonymity systems share some features with our SOFS system. Anonymity systems usually use intermediate forwarding to achieve anonymity. However, there are some significant differences between SOFS and anonymity systems. The goal of SOFS is to ensure paths from clients to the server by providing multiple connections between nodes in successive layers. Many anonymity systems depend on one or more third-party nodes to generate an anonymous path [101, 134], which is not suitable for SOFS. SOFS cannot rely on a centralized node to achieve receiver anonymity, since the centralized node can itself be the target of a DDoS attack.
2.7 Summary
In this chapter, we have studied the impacts of architectural design features on SOFS, a generalized overlay intermediate forwarding system, under intelligent DDoS attacks. We analyzed the SOFS system under discrete round-based attacks using a general analytical approach, and analyzed the system under continuous attacks using simulations. We observed that the system design features, attack strategies and intensities, the attacker's prior knowledge about the system, and system recovery significantly impact system performance. Even under sophisticated attack strategies and intensities, we showed that with careful design of system features and recovery, attack impacts can be significantly reduced. As we discussed in Section 2.4.1, we showed how to obtain optimal system configurations under expected attack strategies and intensities. Based on our findings in this chapter, we further proposed a set of design guidelines to enhance SOFS system performance under general scenarios.
As part of future directions for this research topic, we propose to design an SOFS system that is resilient to attacks while maintaining QoS. An increase in the number of layers, while improving resilience to break-in attacks, increases the latency of communication. An increase in the mapping degree has the opposite effect of decreasing latency due to more choices for routing. We are in the process of designing an SOFS system that is highly resilient to attacks while still attempting to achieve a desired level of QoS. Also, the impacts of our work extend beyond DDoS attack defense. There are several other applications where a pre-existing structure enables better service delivery, including multicasting, real-time delivery, and file-sharing systems. As modeled in this chapter, attackers can cause significant damage to performance by exploiting knowledge of the structure already present in these systems. We believe that our work is a first step towards designing the features of resilient overlay architectures under intelligent attacks.
Analyzing the resilience of such systems under intelligent attacks will also be a part of our future work.
CHAPTER 3
LOCALIZATION ATTACK AGAINST INTERNET THREAT MONITORING SYSTEMS AND COUNTERMEASURES
In this chapter, we study a new class of attacks, the invisible LOCalization (iLOC) attack. The iLOC attack is another infrastructure-oriented attack discussed in this dissertation. Different from the architecture-oriented attack discussed in the previous chapter, it targets the location information of the defense system's infrastructure. More specifically, the iLOC attack can accurately and invisibly obtain the locations of the monitors in Internet Threat Monitoring (ITM) systems, which are widely accepted defense systems against widespread Internet attacks. The goal of iLOC is not to directly harm the Internet or ITM systems. Instead, it aims to obtain the location information of the key components of ITM systems, i.e., the monitors, so that other widespread attacks can evade ITM systems and thus become more effective. We also provide countermeasures against this potential threat to the Internet.
3.1 Motivations
In recent years, widespread attacks, such as active worms [87, 86, 4] and Distributed
Denial of Service (DDoS) attacks [82, 2], have been major threats to the Internet. Due to the widely-spreading nature of these attacks, large scale traffic monitoring across the Internet
has become necessary in order to effectively detect and defend against them. Developing
and deploying Internet threat monitoring (ITM) systems (or motion sensor networks) is one of the major efforts in this realm.
However, the integrity and functionality of ITM systems largely depend on the anonymity
of the IP addresses covered by their monitors, i.e., the locations of monitors. If the locations
of monitors are identified, the attacker can deliberately avoid these monitors and directly
attack the uncovered IP address space. It is a known fact that the number of sub-networks
covered by monitors is much smaller than the total number of sub-networks in the Internet
[103, 138, 85]. In other words, the IP address space covered by monitors represents a very
small portion of the whole IP address space. For example, the SANS ISC covers around 1 million IP addresses, which is about 0.023% of the IPv4 address space. Hence, bypassing the IP address space covered by monitors will significantly degrade the accuracy of the traffic data collected by the ITM system in reflecting the real situation of attack traffic. Furthermore, the attacker may also poison ITM systems by manipulating the traffic towards, and captured by, disclosed monitors. For example, the attacker can launch high-rate port-scan traffic at disclosed monitors and feign a large-scale worm propagation. The attacker may even launch retaliation attacks (e.g., DDoS) against participants (i.e., monitor contributors) of ITM systems, thereby discouraging them from contributing to ITM systems. In summary, the attacker can significantly compromise ITM system performance if he is able to disclose the locations of monitors. It is important to have a thorough understanding of such attacks in order to design efficient countermeasures for protecting ITM systems.
In this chapter, we investigate a new class of attacks called the invisible LOCalization (iLOC) attack, which can accurately and invisibly localize the monitors in ITM systems. We further present a set of guidelines to counteract this potential threat to ITM systems.
3.2 Background
3.2.1 Internet Threat Monitoring Systems
Generally, an ITM system consists of a number of monitors and a data center. The monitors are distributed across the Internet and can be deployed at hosts, routers, firewalls, etc. Each monitor is responsible for monitoring and collecting traffic targeting a range of IP addresses within a sub-network. The range of IP addresses covered by a monitor is also referred to as the location of the monitor. Periodically, the monitors send traffic logs to the data center. The data center analyzes the traffic logs and publishes reports to the public9. The reports provide critical insights into widespread Internet threats and attacks, and are used in detecting and defending against such attacks. ITM systems have been successfully used to detect the outbreaks of worms [103] and DDoS attacks [89]. There have been many real-world developments and deployments of such systems. Examples include DOMINO (Distributed Overlay for Monitoring InterNet Outbreaks) [137], SANS ISC (Internet Storm Center) [103], Internet Sink [138], network telescopes [85], CAIDA [21], and myNetWatchMan [90].
9 In order to maximize the usage of such reports, most existing ITM systems publish the reports online and make them accessible to the public.
3.2.2 Localization Attacks against ITM Systems
A few works have been conducted on monitor localization attacks [18, 110] against
ITM systems. In this kind of attack, accuracy is very important for the attacker in identifying monitor locations. Meanwhile, invisibility is vital to the attacker as well. If the attack
attempts are identified by the defender (such as the ITM administrators), countermeasures
can be applied by the defender to reduce or eliminate the attack effect by filtering suspi-
cious traffic [120], decoying attackers [113], and even tracing back to attack origins for
accountability of their malicious acts [108]. Invisibility is critical for the attacker to evade
the above countermeasures.
However, it is challenging for the attacker to achieve these two objectives simultane-
ously. Intuitively, the attacker can use the high-rate attack traffic, as in [18, 110], to easily
achieve high attack accuracy as follows. The attacker can launch high-rate port-scan traffic
to a target network. The attacker then queries the data center for the report on recent port-
scan activities. If there is a traffic spike in the report data reflecting the high-rate port-scan
traffic sent by the attacker, the attacker can determine that the target network is deployed
with monitor(s) which send traffic reports to the data center. However, it is hard for this
scheme to achieve invisibility, since large spikes caused by the attack traffic make the attack
easily detectable. Our work is the first to address an attack aiming to achieve the objectives
of accuracy and invisibility.
3.3 iLOC Attack
In this section, we will discuss the iLOC attack in detail. We will first give an overview
of the iLOC attack, and then present the detailed stages of the attack, followed by additional discussions on its mechanisms.
3.3.1 Overview
[Figure 3.1 shows two panels: (a) attack stage 1, attack traffic generating — the attacker (1) selects a code, (2) encodes the attack traffic, and (3) launches it towards target networks, while monitors log traffic and update the data center; (b) attack stage 2, attack traffic decoding — the attacker (4) queries the data center for the traffic report and (5) recognizes the attack mark to identify monitors.]
Figure 3.1: Workflow of the iLOC Attack
Fig. 3.1 shows the basic workflow of the iLOC attack. This figure also illustrates the basic idea of the ITM system. In the ITM system, the monitors deployed at various net- works record their observed port-scan traffic and continuously update their traffic logs to the data center. The data center first summarizes the volume of port-scan traffic towards
(and reported by) all monitors, and then publishes the report data to the public in a timely fashion.
As shown in Fig. 3.1 (a) and (b) respectively, the iLOC attack consists of the following
two stages:
1. Attack Traffic Generation: In this stage, as shown in Fig. 3.1 (a), the attacker first
selects a code. Then, he encodes the attack traffic by embedding the selected code
into the traffic. Lastly, the attacker launches the attack traffic towards a target network
(e.g., network A in Fig. 3.1 (a)). We denote such an embedded code pattern in the
attack traffic as the attack mark of the iLOC attack, and denote the attack traffic
encoded by the code as attack mark traffic.
2. Attack Traffic Decoding: In this stage, as shown in Fig. 3.1 (b), the attacker first
queries the data center for the traffic report data. Such report data consist of both
attack traffic and background traffic. After getting the report data, the attacker tries
to recognize the attack mark (i.e., the code embedded in the iLOC attack traffic)
by decoding the report data. If the attack mark is recognized, the report data must
include the attack traffic, which means the target network is deployed with monitors
and the monitors are sending traffic reports to the ITM data center.
Code-based Attack: The iLOC attack adopts a code-based approach to generate the attack traffic. Coding techniques have been widely used in secure communication; Morse code is one such example. Without knowledge of Morse code, a receiver would find it impossible to interpret the carried information [31]. In the iLOC attack, we use a pseudo-noise code (PN-code) based approach, which has three advantages. First, the code is embedded in traffic and can be correctly recognized by the attacker even under interference from background traffic, ensuring the accuracy of the attack. Second, the code (of sufficient length) itself provides enough privacy. That is, the code is known only to the attacker, so the code pattern embedded in the attack traffic can be recognized only by the attacker. Furthermore, the code is able to carry information. A longer code is more immune to interference and requires comparatively lower-rate attack traffic as the carrier, which is harder to detect. All these characteristics help to achieve the objectives of attack accuracy and invisibility.
Parallel Attack Capacity: The iLOC attack can not only attack one target network at a time to determine the deployment of monitors in that network, but can also attack multiple networks simultaneously. Intuitively, one simple way to achieve this parallel attack is to launch port-scan/attack traffic towards multiple target networks simultaneously,
by scanning a different port number for each different target network. For example, if
the data center publishes traffic reports for 1000 (TCP/UDP) ports, then the attacker can launch attacks towards 1000 networks simultaneously, attacking each network on a different port number. Since attack traffic on different ports is summarized separately at the data center, the attacker can still separate and thus decode his traffic towards different targets. Hence, the attacker can localize monitors in multiple networks simultaneously and accurately. However, can the attacker further improve the attack efficiency? Assuming the data center still publishes reports for only 1000 ports, can the attacker attack 10,000 target networks simultaneously, for example, attacking 10 different networks using the same port number? High-rate port-scan traffic cannot achieve this, because it is indiscernible
whether a spike in the traffic report is caused by traffic logs from one network or the other
9 networks. In order to achieve this goal in the code-based attack, the selected code and
corresponding encoded attack traffic towards multiple networks for the same port should
not interfere with each other (i.e., each of them can be decoded individually and accurately
by the attacker, although they are integrated/summarized in the traffic report from the ITM
data center). The PN-code selected in the iLOC attack has this feature, giving it the unique
capacity to carry out parallel attack sessions towards multiple target networks using the same port. The details of the PN-code selection will be discussed in the following sections.
3.3.2 Attack Traffic Generation Stage
In this attack stage, the attacker: (1) selects the code, a PN-code in our case; (2) encodes the attack traffic using the selected PN-code; and (3) launches the encoded attack traffic
towards the target network. For the third step, the attacker can coordinate a large number
of compromised bots to launch the traffic [92]. However, this is not the focus of this
chapter. In the following, we will present detailed discussion on the first and second steps,
respectively.
Code Selection
To evade detection by others, the attack traffic should be similar to the background
traffic. From a large set of real-world background traffic traces obtained from SANS ISC
[103, 39], we conclude that the background traffic shows random patterns in both time
and frequency domains. The attack objectives of both accuracy and invisibility, and an
attacker’s desire for parallel attacks require that: (1) the encoded attack traffic should blend
in with background traffic, i.e., be random in both time and frequency domains, (2) the
code embedded in the attack traffic should be easily recognizable to the attacker himself,
and (3) the code should support parallel attack.
To meet the above requirements, we choose the PN-code to encode the attack traffic.
The PN-code in the iLOC attack is a sequence of −1 or +1 with the following features
[97, 35, 38].
• The PN-code is random and “balanced”. The −1 and +1 are randomly distributed
and the occurrence frequencies of −1 and +1 are nearly equal. This feature con-
tributes to good spectral density properties (i.e., equally spreading the energy over
the whole frequency-band). It makes the attack traffic appear as noise and blend in
with background traffic in both time and frequency domains.
• The PN-code has a high correlation to itself and a low correlation to others (such
as random noise), where the correlation is a mathematical tool for finding repeating
patterns in a signal [38]. This feature makes it feasible for the attacker to accurately
recognize attack traffic (encoded by the PN-code) from the traffic report data even
under the interference of background traffic.
• The PN-code has a low cross-correlation value among different PN-code instances.
The lower this cross-correlation, the less interference among multiple attack sessions
in parallel attack. This feature makes it feasible for the attacker to conduct parallel
localization attacks towards multiple target networks on the same port.
The Walsh-Hadamard code and the M-sequence code [97, 35] are two popular types of PN-code. The Walsh-Hadamard code has some limitations. Since its frequency spectrum spreads into only a limited number of discrete frequency components, which differs from background traffic, it would compromise the invisibility of the attack traffic if used in the iLOC attack. In addition, the Walsh-Hadamard code strongly depends on global synchronization [35]. Since M-sequence codes do not have these shortcomings, we adopt M-sequence codes in the iLOC attack. We use the feedback shift register to repeatedly generate the M-sequence PN-code due to its popularity and ease of implementation [97, 41]. In particular, a feedback shift register consists of two parts: an ordinary shift register consisting of a number of flip-flops (two-state memory elements), and a feedback module that forms multi-loop feedback logic.
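As a sketch of this generation process, a Fibonacci feedback shift register producing an M-sequence, and exhibiting the balance and two-valued autocorrelation properties described above, might look as follows. The 4-bit register, tap positions, and seed are illustrative choices, not parameters of the attack.

```python
def m_sequence(taps, nbits, seed=1):
    """One period (2**nbits - 1) of an M-sequence from a Fibonacci linear
    feedback shift register, mapped to the {+1, -1} chips of a PN-code.
    `taps` must come from a primitive polynomial, otherwise the period
    is shorter than maximal."""
    state, out = seed, []
    for _ in range(2 ** nbits - 1):
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1            # XOR of the tapped bits
        out.append(1 if state & 1 else -1)    # output LSB as a +/-1 chip
        state = (state >> 1) | (fb << (nbits - 1))
    return out

# x^4 + x^3 + 1 is primitive; with a 4-bit register this gives taps (0, 3)
code = m_sequence(taps=(0, 3), nbits=4)
print(len(code), sum(code))        # period 15; "balanced": sum is +/-1
corr0 = sum(a * b for a, b in zip(code, code)) / len(code)
shifted = code[3:] + code[:3]
corr3 = sum(a * b for a, b in zip(code, shifted)) / len(code)
print(corr0, round(corr3, 4))      # 1.0 at zero lag, -1/15 at any other lag
```

The two-valued autocorrelation (1 at zero lag, −1/L elsewhere) is exactly the property the decoding stage of Section 3.3.3 relies on.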
Attack Traffic Encoding
During the attack traffic encoding process, each bit in the selected PN-code is mapped to a unit time period Ts, denoted as mark bit duration. The entire duration of launched traffic (referred to as traffic launch session) is Ts · L, where L is the length of the PN-code.
[Figure omitted: a PN-code [+1, −1, +1, −1, +1] of length 5, and the corresponding encoded attack traffic alternating between mark traffic rate V and 0 over a traffic launch session of duration 5 · Ts.]
Figure 3.2: PN-code and Encoded Attack Traffic
The encoding is carried out according to the following rules: each bit in the PN-code
maps to a mark bit duration (Ts); when the PN-code bit is +1, port-scan traffic with a high
rate, denoted as mark traffic rate V , is generated in the corresponding mark bit duration; when the code bit is −1, no port-scan traffic is generated in the corresponding mark bit duration. Thus, the attacker embeds the attack traffic with a special pattern, i.e., the original
PN-code. Recall that, after this encoding process, the PN-code pattern embedded in the traffic is denoted as the attack mark. If we use Ci = <Ci,1, Ci,2, ..., Ci,L> ∈ {−1, +1}^L to represent the PN-code and ηi = <ηi,1, ηi,2, ..., ηi,L> to represent the attack traffic, then we have ηi,j = (V/2)·Ci,j + V/2 (j = 1, ..., L). Fig. 3.2 shows an example of the PN-code and the corresponding attack traffic encoded with the PN-code.
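The encoding rule ηi,j = (V/2)·Ci,j + V/2 can be sketched directly; the length-5 code below is the example of Fig. 3.2, and the mark traffic rate V = 40 is an arbitrary illustrative value.

```python
def encode_attack_traffic(code, V):
    """Per-bit-duration port-scan rates: eta_j = (V/2)*C_j + V/2,
    i.e. rate V when C_j = +1 and rate 0 when C_j = -1."""
    return [(V / 2) * c + V / 2 for c in code]

code = [+1, -1, +1, -1, +1]                  # the length-5 code of Fig. 3.2
print(encode_attack_traffic(code, V=40.0))   # [40.0, 0.0, 40.0, 0.0, 40.0]
```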
3.3.3 Attack Traffic Decoding Stage
In this stage, the attacker takes the following two steps: (1) The attacker queries the data
center for the traffic report data, which consists of both attack traffic and background traffic.
(2) From the report data, the attacker attempts to recognize the embedded attack mark. The existence of the attack mark indicates the deployment of monitors in the targeted network. As the query of traffic report data is relatively straightforward, here we only detail the second step, i.e., attack mark recognition, as follows.
In the report data queried from the data center, the attack traffic encoded with the attack
mark is mixed with background traffic. It is critical for the iLOC attack to accurately
recognize the attack mark from the traffic report data. To address this, we develop the
correlation-based scheme. This scheme is motivated by the fact that the original PN-code
(used to encode attack traffic) and its corresponding attack mark (embedded in the traffic
report data) are highly correlated; in fact, they are the same.
The attack mark in the traffic report data is the embedded form of the original PN-
code. The attack mark is similar to its original PN-code, although the background traffic
may introduce interference and distortion into the attack mark. We adopt the following
correlation degree to measure their similarity. Mathematically, the correlation degree is defined as the normalized inner product of two vectors. For two vectors X = <X1, X2, ..., XL> and Y = <Y1, Y2, ..., YL> of length L, the correlation degree of X and Y is

Γ(X, Y) = X ⊙ Y = (Σ_{i=1}^{L} Xi · Yi) / L,   (3.1)

where ⊙ represents the operator for the normalized inner product of two vectors. Based on the above definition, we have Γ(X, X) = Γ(Y, Y) = 1, ∀ X, Y ∈ {−1, +1}^L.
We use two vectors, ηi = <ηi,1, ηi,2, ..., ηi,L> and ωi = <ωi,1, ωi,2, ..., ωi,L>, to represent the attack traffic (embedded with the attack mark) and the background traffic, respectively. We shift the above two vectors by subtracting the mean value from the original data, resulting in two new vectors, ηi′ = <ηi,1′, ηi,2′, ..., ηi,L′> and ωi′ = <ωi,1′, ωi,2′, ..., ωi,L′>. We still use a vector Ci = <Ci,1, Ci,2, ..., Ci,L> ∈ {−1, +1}^L to represent the PN-code. Thus, the correlation degree between the PN-code and the (shifted) attack traffic can be obtained. Similarly, we can also obtain the correlation degree between the PN-code and the (shifted) background traffic, as follows.
According to the rules of encoding attack traffic discussed in Section 3.3.2, ηi = (V/2)·Ci + V/2. Thus, ηi′ = ηi − E(ηi) = ηi − V/2 = (V/2)·Ci. Hence, the correlation degree between the original PN-code and the (shifted) attack traffic is Γ(Ci, ηi′) = (V/2)·Γ(Ci, Ci) = V/2. Furthermore, we can also derive the correlation degree between the PN-code and the (shifted) background traffic, i.e., Γ(Ci, ωi′). The mean of this correlation degree is close to 0, since the PN-code has low correlation with the (shifted) background traffic (i.e., E[Γ(Ci, ωi′)] = (1/L)·E[Σ_{j=1}^{L} ωi,j′·Ci,j] ≈ 0). If the standard deviation of the background traffic rate is σx, the variance of this correlation degree is

Var[Γ(Ci, ωi′)] = E[(Γ(Ci, ωi′) − 0)²]   (3.2)
               = (1/L²)·E[Σ_{j=1}^{L} Ci,j²·ωi,j′²]   (3.3)
               ≈ (1/L²)·E[Σ_{j=1}^{L} ωi,j′²] = σx²/L.   (3.4)

Thus, the magnitude of the correlation degree between the PN-code and the (shifted) background traffic is on average about σx/√L. Based on the above discussion, the attacker can set appropriate attack parameters (e.g., the PN-code length L and the mark traffic rate V) to make the correlation degree (V/2) between the PN-code and the attack mark traffic much larger than the correlation degree (σx/√L) between the PN-code and the background traffic. As such, the attacker can accurately distinguish the attack mark traffic from the background traffic.
In the practice of attack mark recognition, a vector λi is used to represent the queried report data, and a vector λi′ is used to represent the shifted report data (obtained by subtracting E(λi,j) from λi). The attacker uses the correlation degree between λi′ and his PN-code Ci, i.e., Γ(Ci, λi′), to determine the existence of the PN-code in the report data. If Γ(Ci, λi′) is larger than a threshold Ta, referred to as the mark decoding threshold, then the attacker determines that the report contains attack traffic encoded with the PN-code Ci, and hence that the target network is deployed with monitors. The accuracy of this correlation-degree-based PN-code recognition is analyzed and demonstrated in Sections 3.4, 3.5, and 4.5.
3.3.4 Discussions
In order to accurately and effectively recognize the attack mark (PN-code) from the report data, we need to find the segment of the report data containing the PN-code (i.e., we need to achieve synchronization between the port-scan traffic report data and the PN-code). For this purpose, we introduce an iterative sliding-window-based scheme. The basic idea is to let the attacker obtain enough report data at a small granularity. Then, a sliding window iteratively moves forward to capture a segment of the report data. For each segment, we apply the correlation-based scheme discussed in Section 3.3.3 to recognize whether or not the attack mark exists. The details of this synchronization are presented as follows.
The attacker first sends a sequence of queries to the data center; each query requests a part of the report data lasting for a given unit time, known as the query duration Tq. To guarantee good synchronization and capture of each bit in the PN-code, Tq should be smaller than the mark bit duration Ts. Also, the attacker needs to send enough queries to ensure that the queried report data contain the whole attack mark and attack mark traffic, whose length is L · Ts. With the report data, the attacker iteratively conducts a correlation test on the report data using a sliding window. For example, in the i-th round, the attacker selects ti as the starting time of the sliding window. In the (i + 1)-th round, the attacker moves the sliding window one step (Tq) forward, so the start time of the sliding window becomes ti + Tq, and so on. In the i-th round, a sequence of data (of length L) is obtained in the sliding window: the first data point in the sequence is the traffic data in the time duration [ti, ti + Ts], the second data point is the traffic data in the time duration [ti + Ts, ti + 2 · Ts], and so on. With these data, the attacker conducts the attack mark recognition discussed in Section 3.3.3. The attacker repeats the attack mark recognition each time he moves the sliding window forward, until the attack mark is recognized from the report data in the current sliding window, or the sliding window has gone through all the report data.
3.4 Analysis
In this section, we first present a formal analysis of the impacts of different attack parameters on attack accuracy and invisibility. Then, based on the analytical results, we discuss how to determine the attack parameters.

Before starting the analysis, we need to clarify the two parties in the attack process: the iLOC attacker and its adversary, the defender. The term defender generalizes the benign parties who maintain the ITM system and/or exploit the reports from the data center to identify widespread Internet attacks. Based on the reports, the defender not only attempts to determine whether there are anomalies in the traffic, but also takes appropriate actions should any anomalies be identified.
3.4.1 Accuracy Analysis
In order to measure attack accuracy, we introduce the following two metrics. The first is the attack successful rate PAD, which is the probability that the attacker correctly recognizes that a selected target network is deployed with monitors. The higher PAD, the higher the attack accuracy. The second metric is the attack false positive rate PAF, which is the probability that the attacker mistakenly declares a target network to be deployed with monitors. The lower PAF, the higher the attack accuracy.
Recall that Ta is the mark decoding threshold, V is the mark traffic rate, the vector λi represents the queried report data, and the vector λi′ represents the shifted report data (obtained by subtracting E(λi,j) from λi). Assume that the random variables ωi,1′, ..., ωi,L′ (i.e., the shifted background traffic) are independent and identically distributed (i.i.d.) and follow a Gaussian distribution with standard deviation σx; then we have the following theorem for the attack accuracy of the iLOC attack.
Theorem 1 In the iLOC attack, the attack successful rate PAD is

PAD = 1 − Pr[Γ(λi′, Ci) ≤ Ta | λi′ = ηi′ + ωi′]   (3.5)
    = 1 − (1/√π) ∫_{(V/2 − Ta)·√L/(√2·σx)}^{∞} e^{−y²} dy.   (3.6)

The attack false positive rate PAF is

PAF = Pr[Γ(λi′, Ci) ≥ Ta | λi′ = ωi′]   (3.7)
    = (1/√π) ∫_{√L·Ta/(√2·σx)}^{∞} e^{−y²} dy.   (3.8)
Proof 1 i) Derivation of the attack successful rate P_AD.

According to the definition of P_AD, we have

P_AD = 1 − Pr[Γ(λ'_i, C_i) ≤ T_a | λ'_i = η'_i + ω'_i].                       (3.9)

Considering that Γ(C_i, η'_i) = (V/2) · Γ(C_i, C_i) = V/2, Equation (3.9) can be rewritten as

P_AD = 1 − Pr[Γ(λ'_i, C_i) ≤ T_a − V/2 | λ'_i = ω'_i].                        (3.10)

Based on the mean and variance of the correlation degree determined in Section 3.3.3, P_AD can be represented by

P_AD = 1 − (√L / (√(2π) σ_x)) ∫_{−∞}^{T_a − V/2} e^{−x²L/(2σ_x²)} dx.        (3.11)

Let y = x√L / (√2 σ_x), so that y² = x²L/(2σ_x²); then

P_AD = 1 − (√L / (√(2π) σ_x)) ∫_{−∞}^{(T_a − V/2)√L/(√2 σ_x)} e^{−y²} (√2 σ_x/√L) dy   (3.12)
     = 1 − (1/√π) ∫_{−∞}^{(T_a − V/2)√L/(√2 σ_x)} e^{−y²} dy                  (3.13)
     = 1 − (1/√π) ∫_{(V/2 − T_a)√L/(√2 σ_x)}^{∞} e^{−y²} dy.                  (3.14)
ii) Derivation of the attack false positive rate P_AF.

We know that Γ(λ'_i, C_i) is the correlation degree of λ'_i and C_i, where λ'_i = ω'_i when no iLOC attack traffic exists. Since Γ(λ'_i, C_i) follows the Gaussian distribution N(0, σ_x²/L) (as discussed in Section 3.3.3), we have

P_AF = Pr[Γ(λ'_i, C_i) ≥ T_a | λ'_i = ω'_i].                                  (3.15)

Thus P_AF can be represented by

P_AF = (√L / (√(2π) σ_x)) ∫_{T_a}^{∞} e^{−x²L/(2σ_x²)} dx.                    (3.16)

Let y = x√L / (√2 σ_x); then

P_AF = (√L / (√(2π) σ_x)) ∫_{√L T_a/(√2 σ_x)}^{∞} e^{−y²} (√2 σ_x/√L) dy = (1/√π) ∫_{√L T_a/(√2 σ_x)}^{∞} e^{−y²} dy.   (3.17)

Remarks: We make a few observations based on the theorem presented above. First, the attack successful rate P_AD increases and the attack false positive rate P_AF decreases as the PN-code length L increases; that is, attack accuracy improves with longer PN-codes. Second, attack accuracy also increases with the mark traffic rate V.
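The two closed-form rates above can be evaluated directly, since (1/√π) ∫_a^∞ e^{−y²} dy = erfc(a)/2. The following sketch evaluates Equations (3.6) and (3.8); the parameter values used below are illustrative, not taken from the dissertation's experiments.

```python
import math

def p_ad(V, Ta, L, sigma_x):
    """Attack successful rate, Eq. (3.6): 1 - erfc(a)/2 with
    a = (V/2 - Ta) * sqrt(L) / (sqrt(2) * sigma_x)."""
    a = (V / 2 - Ta) * math.sqrt(L) / (math.sqrt(2) * sigma_x)
    return 1 - 0.5 * math.erfc(a)

def p_af(Ta, L, sigma_x):
    """Attack false positive rate, Eq. (3.8): erfc(a)/2 with
    a = Ta * sqrt(L) / (sqrt(2) * sigma_x)."""
    a = Ta * math.sqrt(L) / (math.sqrt(2) * sigma_x)
    return 0.5 * math.erfc(a)
```

Consistent with the remarks, increasing L raises p_ad while lowering p_af, and increasing V raises p_ad.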
3.4.2 Invisibility Analysis
Here, invisibility refers to how well the iLOC attack evades detection by the defender. In order to analyze invisibility, we need to consider the detection algorithms. While there have been many different algorithms proposed to detect anomalies in port-scan traffic, here we use a representative and generic algorithm which imposes no specific requirement on detection systems. This threshold-based detection algorithm is widely adopted by many systems [86, 103, 123, 114]. In this algorithm, if the traffic rate (volume in a given time duration) is larger than a pre-determined threshold T_d (referred to as the defender detection threshold), the defender issues threat alerts and initiates reactions [103]. Such a detection threshold is usually obtained through statistical analysis of the background traffic. Note that the threshold T_d must be carefully chosen for anomaly detection: it must maintain both a high detection rate (i.e., the probability that an ongoing attack is detected) and a low false positive rate (i.e., the probability that an alarm is triggered when no attack is occurring).
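A minimal sketch of this threshold-based rule follows. The choice T_d = mean + k·σ of the background traffic, with k = 3, is one common statistical setting used here for illustration, not one prescribed by the cited systems.

```python
def detection_threshold(background_rates, k=3.0):
    """Derive T_d from historical background traffic: mean + k * std."""
    n = len(background_rates)
    mean = sum(background_rates) / n
    var = sum((r - mean) ** 2 for r in background_rates) / n
    return mean + k * var ** 0.5

def detect(observed_rates, Td):
    """Raise an alert for every time unit whose traffic rate exceeds T_d."""
    return [rate > Td for rate in observed_rates]
```

Larger k lowers the false positive rate at the cost of missing low-rate attacks, which is exactly the trade-off the iLOC attacker exploits by keeping V small.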
To measure attack invisibility, in terms of how well the iLOC attack can evade detection by the defender, we use the following two metrics. The first is the defender detection rate P_DD, the probability that the defender correctly detects the attack traffic introduced by the iLOC attack. The other is the defender false positive rate P_DF, the probability that the defender mistakenly reports attack traffic when none exists.
Similar to our approach in Section 3.3.2, we use the random variable ω' to represent the shifted background traffic, and the random variable λ' to represent the shifted traffic data reported by the ITM system. Note that if no iLOC attack exists, λ' = ω'. Assume that the values of ω' at different time units are independent and identically distributed (i.i.d.) and follow a Gaussian distribution with standard deviation σ_x (i.e., ω' follows N(0, σ_x²)). Then we have the following theorem for attack invisibility.
Theorem 2 In the iLOC attack, the defender detection rate P_DD is

P_DD = 1 − Pr[λ' ≤ T_d | λ' = V + ω']                                         (3.18)
     = 1 − (1/√π) ∫_{(V − T_d)/(√2 σ_x)}^{∞} e^{−y²} dy.                      (3.19)

The defender false positive rate P_DF is

P_DF = Pr[λ' ≥ T_d | λ' = ω']                                                 (3.20)
     = (1/√π) ∫_{T_d/(√2 σ_x)}^{∞} e^{−y²} dy.                                (3.21)
The proof of Theorem 2 is similar to that of Theorem 1 and is therefore omitted here.
Remarks: From Theorem 2, we make the following observations. First, as the mark traffic rate V increases, the defender detection rate P_DD increases, and thus the attack invisibility decreases. Second, the mark traffic rate V does not affect the defender false positive rate P_DF, which is determined only by the threshold T_d configured by the defender.
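Like Theorem 1, Theorem 2 reduces to complementary error functions. The sketch below (with illustrative values) makes both remarks checkable and exposes a boundary consistency: at V = 0 the "detection rate" degenerates to the false positive rate.

```python
import math

def p_dd(V, Td, sigma_x):
    """Defender detection rate, Eq. (3.19)."""
    return 1 - 0.5 * math.erfc((V - Td) / (math.sqrt(2) * sigma_x))

def p_df(Td, sigma_x):
    """Defender false positive rate, Eq. (3.21); independent of V."""
    return 0.5 * math.erfc(Td / (math.sqrt(2) * sigma_x))
```

The attacker's goal is to pick V small enough that p_dd stays close to p_df, i.e., the marked traffic is statistically indistinguishable from background noise at the defender's threshold.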
3.4.3 Determination of Attack Parameters
Determination of V, T_a and L
The attacker can determine the values of the attack parameters based on the above analysis. First, the attacker determines the mark traffic rate V, because V relates only to the attack invisibility metric (the defender detection rate P_DD) and it impacts the determination of the other parameters. After determining V, and given the expected false positive rates, the attacker can further determine the mark decoding threshold T_a and the PN-code length L. Note that the values of other attack parameters, such as the standard deviation σ_x of the background traffic, can be determined by analyzing historical background traffic data published by the data center of the ITM system.
(i) Mark traffic rate V: Using Equation (3.21), the attacker can first estimate the defender detection threshold T_d based on a reasonable upper bound on the defender false positive rate P_DF. For example, by the central limit theorem, T_d = 3 · σ_x achieves a reasonable defender false positive rate P_DF (1.7%). Thus, the attacker can use 3 · σ_x as a reasonable estimate of T_d. After that, given the defender detection rate P_DD, which can be estimated similarly, and the background traffic deviation σ_x, the attacker can determine the mark traffic rate V by solving Equation (3.19) in Theorem 2.

(ii) Mark decoding threshold T_a: Given the mark traffic rate V (determined previously) and the desired attack false positive rate P_AF, the attacker can further determine the mark decoding threshold T_a by solving Equation (3.8) in Theorem 1.

(iii) Length of PN-code L: Given the mark traffic rate V, the mark decoding threshold T_a, and the desired attack successful rate P_AD, the attacker can further determine the PN-code length L by solving Equation (3.6) in Theorem 1.
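Steps (i)-(iii) amount to inverting the erfc-form expressions of Theorems 1 and 2. The sketch below uses a bisection inverse of erfc as a stand-in for any numerical solver; the target rates in the signature are illustrative, and the search folds steps (ii) and (iii) together because T_a itself depends on L through Equation (3.8).

```python
import math

def inv_erfc(p):
    """Invert the (monotonically decreasing) complementary error function."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if math.erfc(mid) > p else (lo, mid)
    return (lo + hi) / 2

def choose_parameters(sigma_x, P_DF=0.01, P_DD=0.05, P_AF=0.01, P_AD=0.95):
    """Derive Td and V, then search for the smallest feasible (Ta, L)."""
    # (i) estimate Td from the defender false positive bound, Eq. (3.21),
    #     then pick V so the defender detection rate stays at P_DD, Eq. (3.19)
    Td = inv_erfc(2 * P_DF) * math.sqrt(2) * sigma_x
    V = Td + inv_erfc(2 * (1 - P_DD)) * math.sqrt(2) * sigma_x
    # (ii)+(iii) for each candidate L, Ta follows from Eq. (3.8); accept the
    #     first L whose resulting attack successful rate, Eq. (3.6), suffices
    for L in range(1, 10000):
        Ta = inv_erfc(2 * P_AF) * math.sqrt(2) * sigma_x / math.sqrt(L)
        p_ad = 1 - 0.5 * math.erfc((V / 2 - Ta) * math.sqrt(L)
                                   / (math.sqrt(2) * sigma_x))
        if p_ad >= P_AD:
            return Td, V, Ta, L
    raise ValueError("no feasible code length found")
```

Note that V comes out below T_d, matching the invisibility requirement: the per-bit mark traffic alone never trips the defender's threshold, and only the PN-code correlation over L bits recovers it.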
Determination of Ts
To determine the mark bit duration T_s, the attacker needs to estimate the possible delay from the moment when the attack traffic is first reported by monitors to the moment when such attack traffic is published by the data center. To make the iLOC attack effective, the mark bit duration needs to be at least as large as this delay. Otherwise, the traffic in different bit durations (each lasting T_s) may be published at the same moment by the data center, mixing the bits and thereby rendering them inseparable.
Several possible methods can be used to obtain such delay information. Some ITM systems may publish it on their websites. The attacker may also actively conduct experiments on ITM systems and measure the delay. For example, the attacker may deploy monitors in a (small) network under his control and connect them to the targeted ITM system. The attacker can simply use these monitors to report logs embedded with special patterns (e.g., a PN-code) and keep querying the data center until the embedded traffic patterns are recognized. After repeating this process several times, the attacker is able to obtain a statistical profile of the delay, and then determine the mark bit duration T_s. We use this method in our implementation of the iLOC attack, which is presented in the next section.
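The probe-and-poll procedure above can be sketched as follows; report_fn, query_fn, and the substring-style mark matching are hypothetical stand-ins for the prototype's monitor-reporting and data-center-query components.

```python
import time

def measure_publish_delays(report_fn, query_fn, mark, trials=5,
                           timeout=86400.0, poll=60.0):
    """For each trial: inject a marked log via the controlled monitor,
    then poll the published reports until the mark shows up, recording
    the elapsed time. Returns the list of observed delays (seconds)."""
    delays = []
    for _ in range(trials):
        start = time.time()
        report_fn(mark)                      # monitor reports the marked log
        while time.time() - start < timeout:
            if mark in query_fn():           # mark visible in published data?
                delays.append(time.time() - start)
                break
            time.sleep(poll)
    return delays
```

The mark bit duration T_s would then be set to, e.g., the maximum or a high percentile of the returned delays.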
3.5 Implementation and Validation
In this section, we first introduce our implementation of the iLOC attack. Then, we report the results of validating our iLOC attack design through experiments against a real-world ITM system.

3.5.1 Implementation of the iLOC Attack
We implement an iLOC attack prototype based on the design in Section 3.3. This prototype works against any ITM system whose data center has a web-based user interface. There are five independent components in our iLOC implementation: the Data Center Querist, Background Traffic Analyzer, PN-code Generator, Attack Traffic Generator, and Attack Mark Decoder.

The Data Center Querist is the component that interacts with the data center of the targeted ITM system. Its main tasks are sending queries to the data center for port-scan traffic reports and retrieving the responses (i.e., the reports) from the data center. The inputs to this component are the URL, or IP address, of the data center and the port number of the port-scan traffic to be queried. From the traffic report data, the Background Traffic Analyzer obtains the statistical profile of the background traffic and determines attack parameters for the other components. The PN-code Generator is the component that generates and
Figure 3.3: Experiment Setup (the attacker, networks A and B with monitors, a campus network with router R1, and the data center of the threat monitoring system, connected via the Internet)
stores the PN-code. The PN-code length is determined according to the attacker's objectives and the background traffic profile, as described in Section 3.4.3. The Attack Traffic Generator is the component that generates attack traffic based on the PN-code and the background statistical profile; the PN-code-encoded traffic is generated as discussed in Section 3.3.2. Inputs to this component are the IP address range of the target network, the port number, and the transport protocol (TCP or UDP). The Attack Mark Decoder is the component that obtains the port-scan report data through the Data Center Querist and decides whether the attack mark exists, as discussed in Section 3.3.3. The PN-code used in the decoding process is the same as the one used in encoding the attack traffic and is stored by the PN-code Generator.
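As a concrete illustration of what the PN-code Generator produces, the sketch below derives a ±1 m-sequence from a linear-feedback shift register; the register size, taps, and seed are illustrative choices, since the dissertation does not specify the prototype's generator internals.

```python
def pn_sequence(taps, state, length):
    """Generate a +/-1 PN sequence from a Fibonacci LFSR.

    `state` is a list of bits; each step outputs the last bit and shifts
    in the XOR of the tapped positions."""
    bits = []
    for _ in range(length):
        bits.append(1 if state[-1] else -1)
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return bits

# A 4-stage register with taps (0, 3) yields a maximal-length code of 15 chips.
code = pn_sequence(taps=(0, 3), state=[1, 0, 0, 0], length=15)
```

An m-sequence of length 2^n − 1 is nearly balanced (here eight +1 chips and seven −1 chips), which keeps the mean of the encoded mark close to the background level.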
These components may be integrated into one program running on one machine. The attack can also be carried out in a more flexible way, with the tasks of the above components performed by processes on different machines. Our iLOC prototype is implemented using Microsoft MFC and Matlab on the Windows XP operating system.

3.5.2 Validation of the iLOC Attack

In order to validate our iLOC implementation, we deployed it to identify a set of monitors associated with a real-world ITM system.
Figure 3.4: Background Traffic vs. Traffic Mixed with iLOC Attack (y-axis: traffic rate; x-axis: time in hours)
Figure 3.5: PSD for Background Traffic vs. Traffic Mixed with iLOC Attack (y-axis: power; x-axis: frequency in Hz)
Fig. 3.3 illustrates our experiment setup. For the purposes of this research, we requested information about the locations of a set of monitors in the ITM system. We were provided with the identities of two network sets, A and B: monitors are deployed within network set A, while no monitor is deployed in network set B. All monitors in network set A monitor a set of IP addresses and record port-scan logs. We (acting as the attacker) then executed the iLOC attack to decide whether monitors exist in network sets A and B, respectively.
In our experiment, we use a PN-code of length 15. The mark bit duration is set to 2 hours and the query duration to 20 minutes. With the queried report data, we correctly determine that all networks in set A are deployed with monitors and that the networks in set B are not. Fig. 3.4 shows the traffic rate in the time domain. Fig. 3.5 shows the traffic rate in the frequency domain in terms of the Power Spectrum Density (PSD). The PSD describes how the power of time-series data is distributed in the frequency domain; mathematically, it equals the Fourier transform of the auto-correlation of the time series [9].

From these two figures, we observe that it is hard for anyone who does not know the content of the PN-code to detect the iLOC attack, since the overall traffic with the iLOC attack is very similar to the traffic without the iLOC attack traffic embedded. These experiments demonstrate that the iLOC attack can accurately and invisibly localize the monitors of ITM systems in practice.
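For reference, the PSD plotted in Fig. 3.5 can be computed as the squared magnitude of the discrete Fourier transform of the mean-removed series, which by the Wiener-Khinchin relation matches the Fourier transform of its autocorrelation. A naive O(N²) sketch:

```python
import cmath

def psd(series):
    """Power spectral density estimate: |DFT of the mean-removed series|^2 / N."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]           # remove the DC component
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        out.append(abs(s) ** 2 / n)
    return out
```

A flat background series yields near-zero power everywhere, while a strictly periodic rate (as in the frequency-based baseline of Section 3.6) concentrates power at one frequency bin; PN-modulated traffic instead spreads its power across bins, which is the spread-spectrum effect the iLOC design relies on.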
3.6 Performance Evaluation

3.6.1 Evaluation Methodology
In our evaluation, we use real-world port-scan traces from the SANS ISC (Internet Storm Center), including the detailed logs from 01/01/2005 to 01/15/2005 [103, 39].10 The traces used in our study contain over 80 million records, and the overall data volume exceeds 80 GB. We use these real-world traces as the background traffic, merge records of simulated iLOC attack traffic into them, and replay the merged data to emulate the iLOC attack. We evaluate different attack scenarios by varying the attack parameters. Here, we only show the data for port 135; experiments on other ports yield similar observations.
We explore both attack accuracy and invisibility to evaluate attack performance. For attack accuracy, we use two metrics: one is the attack successful rate PAD and the other is
10We thank the ISC for providing us valuable traces in this research.
76 1
0.9
0.8 D 0.7 iLOC Experiment iLOC Theory 0.6 Volume−based Frequency−based 0.5
0.4
0.3 Attack Successful Rate − PA 0.2
0.1
0 0.5 1 1.5 2 2.5 3 Attack Traffic Rate − P
Figure 3.6: Attack Successful Rate (Port 135)
the attack false positive rate P_AF, both defined in Section 3.4.1. For attack invisibility, we use two metrics: one is the defender detection rate P_DD and the other is the defender false positive rate P_DF, both defined in Section 3.4.2.
We evaluate the iLOC attack in comparison with two other baseline attack schemes. The first is the localization attack that launches significantly high-rate port-scan traffic toward target networks, as introduced in [18, 110]; we denote it the volume-based attack. The second baseline scheme embeds the attack traffic with a unique frequency pattern: the attack traffic rate changes periodically, and the attacker expects the report data from the data center to show this unique frequency pattern if the selected target network is deployed with monitors. We denote this scheme the frequency-based attack. For fairness, we adjust the detection thresholds in all schemes so that reasonable attack false positive rates P_AF and defender false positive rates P_DF (below 1%) are achieved. For the iLOC attack, we generate attack traffic with different PN-code lengths L (i.e., 15, 30, 45); the default PN-code length is 30. To quantify the attack traffic rate uniformly across the iLOC attack and the other schemes, we use the normalized attack traffic rate P, defined as P = V/σ_x for the iLOC attack, where σ_x is the standard deviation of the background traffic rate. The default query duration is T_q = 0.1 · T_s.
3.6.2 Results

Attack Accuracy
To compare the attack accuracy of the iLOC attack with that of the volume-based and frequency-based schemes, we plot the attack successful rate P_AD under different attack traffic rates (i.e., P ∈ [0.01, 3]) in Fig. 3.6. From this figure, we observe that both the iLOC and frequency-based attacks consistently achieve a much higher attack successful rate P_AD than the volume-based scheme. The difference is more significant when the attack traffic rate is lower, which can be explained as follows. For the iLOC scheme, the PN-code based encoding/decoding makes the recognition of attack marks robust to interference from the background traffic. For the frequency-based scheme, the invariant frequency in the attack traffic is likewise robust to this interference. Both can distinguish their attack traffic accurately even when the attack traffic rate (i.e., P) is small. The volume-based scheme, in contrast, relies on a high rate of attack traffic (i.e., large P) and is thus very sensitive to interference from the background traffic.
Attack Invisibility

To compare the attack invisibility of the iLOC attack with that of the other two attack schemes, we show the defender detection rate P_DD on port 135 in Table 3.1. The table shows the defender detection rate P_DD achieved for different localization successful rates P_AD (90%, 95%, and 98%). Recall that the defender sets the detection threshold to keep the defender false positive rate P_DF below 1%. In the table, "(Time)" and "(Freq)" mean that the defender adopts time-domain and frequency-domain analytical techniques, respectively, to detect attacks. We observe that our iLOC scheme consistently achieves a much lower defender detection rate P_DD than the other two schemes; that is, the iLOC attack achieves the best invisibility. As expected, the defender can easily detect the frequency-based attack with frequency-domain analysis, since there is a unique frequency pattern in its attack traffic.
P_AD | iLOC (Time) | iLOC (Freq) | Volume-based (Time) | Frequency-based (Freq) | Frequency-based (Time)
90%  | 2.5%        | 2.2%        | 90%                 | 90%                    | 2.9%
95%  | 2.8%        | 2.4%        | 95%                 | 95%                    | 3.1%
98%  | 3.1%        | 2.8%        | 98%                 | 98%                    | 3.3%

Table 3.1: Defender Detection Rate P_DD (Port 135)
Impact of the Length of PN-code
To investigate the impact of the PN-code length on the performance of the iLOC attack, we plot the attack successful rate P_AD for PN-codes of different lengths (15, 30, 45) in Fig. 3.7; in the legend, iLOC(L = x) means that the PN-code length is x. Data in this figure are again collected for various attack traffic rates. The figure shows that the attack successful rate P_AD increases with the PN-code length, because a longer PN-code more effectively suppresses the interference of the background traffic on recognizing the attack mark, thereby achieving higher attack accuracy.
Impact of the Number of Parallel Localization Attacks
To evaluate the impact of the number of parallel localization sessions on attack accuracy, we show the attack successful rate P_AD for different numbers of parallel attack sessions on the same port in Fig. 3.8; in the legend, iLOC(N = x) means that there are x parallel attack sessions. The figure shows that, in terms of the attack successful rate P_AD, the iLOC attack is not sensitive to the number of parallel attack sessions: P_AD decreases only slightly as the number of sessions increases. The reason is that the traffic of different attack sessions is encoded with PN-codes that have low cross-correlation with each other, as described in Section 3.3.2, and thereby interfere little with one another. Fig. 3.9 shows the impact of the number of parallel attack sessions on attack invisibility. We observe that an increasing number of parallel attack sessions results in only a slight increase of the defender detection rate P_DD. Therefore, parallel localization can improve attack efficiency without significantly compromising either accuracy or invisibility.

The iLOC attack achieves invisibility by using the PN-code, at the cost of a longer period over which the attack must be carried out. Parallelism, however, can significantly improve attack efficiency. For example, consider attacking a system consisting of 1200 networks. Using one port, the volume-based attack needs 1200 unit times to perform the attack task. A single iLOC attack session with a code length of 15 needs 1200 × 15 = 18000 unit times, while achieving higher accuracy and invisibility. To fulfill the same localization task, a parallel iLOC attack with 8 sessions and the same code length achieves similarly high accuracy and invisibility, and the total time is only 1200 × 15/8 = 2250 unit times, which is comparable to that of the volume-based attack.
3.7 Guidelines of Countermeasures
We have demonstrated the threat of the iLOC attack against ITM systems above. Now, we discuss possible countermeasures to such attacks. It is relatively easy to defend against volume-based and frequency-based localization attacks which either embed a spike (using a high-rate scan traffic) [18, 110] or an invariable frequency (using a certain frequency
80 1
0.9
0.8 D 0.7
0.6
0.5
0.4 C−Probe Experiment (L=15) 0.3 C−Probe Theory (L=15) C−Probe Experiment (L=30) Attack Successful Rate − PA C−Probe Theory (L=30) 0.2 C−Probe Experiment (L=45) C−Probe Theory (L=45) 0.1 Frequency−based Probe Volume−based Probe 0 0.5 1 1.5 2 2.5 3 Attack Traffic Rate − P
Figure 3.7: Attack Successful Rate vs. Code Length
Figure 3.8: Attack Successful Rate vs. Number of Parallel Attack Sessions (P_AD vs. attack traffic rate P for N = 2, 4, 8 sessions, compared with the volume-based and frequency-based schemes)
Figure 3.9: Defender Detection Rate vs. Number of Parallel Attack Sessions (P_DD vs. attack traffic rate P for N = 2, 4, 8 sessions, under time-domain and frequency-domain detection)
pattern), since these two attack schemes show strong signatures in the attack traffic (in the time domain or the frequency domain, respectively). However, in order to defend against the iLOC attack, the defender needs deep insight into the design of the attack. We give some general guidelines for counteracting the iLOC attack from the following aspects.
1) Limiting the Information Access Rate: Recall that in the iLOC attack, the attacker must issue a significant number of queries to the data center of the ITM system in order to accurately recognize the encoded attack traffic. We may exploit this knowledge to reduce the effectiveness of the iLOC attack by having the data center throttle the query rate. One possible way is to enforce human/system interaction for each query, thereby eliminating the automated queries the iLOC attack relies on. This can be implemented through authenticated registration, e.g., one authenticated registration is valid only for a certain number of queries. However, such limitations on the information access rate may also reduce the functionality and usability of ITM systems.
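One way to realize such throttling is a per-account token bucket; the rate and burst values, and the account-keyed design, are illustrative assumptions about how a data center might implement the limit rather than a scheme from the dissertation.

```python
import time

class QueryThrottle:
    """Per-account token bucket limiting data-center queries."""

    def __init__(self, rate_per_hour=6, burst=3):
        self.rate = rate_per_hour / 3600.0   # tokens replenished per second
        self.burst = burst
        self.buckets = {}                    # account -> (tokens, last_time)

    def allow(self, account, now=None):
        now = time.time() if now is None else now
        tokens, last = self.buckets.get(account, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[account] = (tokens - 1, now)
            return True
        self.buckets[account] = (tokens, now)
        return False
```

A low replenishment rate directly stretches the time an iLOC attacker needs to collect the L report samples per target, at some cost in convenience for legitimate users.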
2) Perturbing the Information: Recall that in the iLOC attack, the attacker needs to recognize the encoded attack traffic, so the quality of the reports plays an important role in the recognition process. To reduce the effectiveness of the iLOC attack, we may perturb the published report data by adding random noise and even randomizing the data publishing delay. This principle is similar to data perturbation in the private data sharing realm [145, 8]. Perturbing the report data can degrade the attack accuracy of the iLOC attack. On the other hand, adding random noise and randomizing the publishing delay will also impact the data accuracy and usability of ITM systems. Studying this trade-off will be one aspect of our future work.
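A sketch of the two perturbations suggested above, applied to a report series before publication; the noise level and delay bound are illustrative knobs whose settings embody the accuracy/usability trade-off just discussed.

```python
import random

def perturb_report(report, noise_std, max_extra_delay):
    """Add zero-mean Gaussian noise to each published value (clamped at 0,
    since traffic counts cannot be negative) and draw a randomized extra
    publishing delay in seconds."""
    noisy = [max(0.0, v + random.gauss(0.0, noise_std)) for v in report]
    extra_delay = random.uniform(0.0, max_extra_delay)
    return noisy, extra_delay
```

The added noise lowers the effective signal-to-noise ratio of the embedded mark, forcing the attacker toward a larger V or longer L, both of which hurt invisibility or efficiency.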
3) Investigating Advanced Detection Schemes: Recall that in the iLOC attack, in order to localize as many monitors as possible while evading detection, the attacker has to continuously launch attack/port-scan traffic toward different new target networks. Consequently, the target IP addresses of the attack traffic may exhibit a widely dispersed distribution [67]. Thus, analyzing the distribution of target IP addresses may provide one possible method of detection. Furthermore, since the iLOC attack uses the PN-code to encode the attack traffic, we plan to study potential ways of detecting PN-codes in our future work.
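One way to operationalize this distribution analysis is the Shannon entropy of scan destinations grouped by address prefix; the /8 grouping below is an illustrative granularity, and this is only a sketch of the idea, not the DEC detection scheme developed in a later chapter.

```python
import math
from collections import Counter

def dest_prefix_entropy(dest_ips):
    """Shannon entropy (bits) of scan destinations grouped by /8 prefix.

    Widely dispersed targets (as continuous localization probing produces)
    push the entropy toward log2 of the number of prefixes; scans hammering
    a single subnet stay near zero."""
    counts = Counter(ip.split(".")[0] for ip in dest_ips)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A defender could track this entropy over a sliding window and flag sustained values far above the background baseline.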
3.8 Related Work
Many ITM systems have been developed and deployed since CAIDA initiated the network telescope project to monitor background traffic in 2001 [22]. Although the IP addresses of the monitors themselves can be protected by mechanisms such as encryption and Bloom filters [52], the public data reported by these ITM systems can be used to disclose the IP address space covered by the monitors. Existing attack approaches achieve this by launching high-rate port-scan traffic [18, 110]. However, these attacks do not consider invisibility, since the high-rate attack traffic increases the chance of being detected.
The invisibility technique in our work borrows the camouflage principle, as illustrated in nature and the military. In nature, an animal can disguise itself as the object on which it stands in order to fool its predators or prey [12]. In the military, soldiers wear camouflage clothing designed to blend into the surrounding terrain [94]. As an invisibility technique, our work leverages PN-code technology and extends it to a new Internet cyber-security realm. The PN-code was initially used in military communication systems to provide anti-jamming and secured communication [97]. In wireless communication, the PN-code has been widely used to improve communication efficiency [35]. In addition, the PN-code has other broad applications, such as cryptography [17], secured data storage and retrieval [126], and image processing [133].
In this chapter, we study techniques for applying the PN-code in the iLOC attack. The work in [139] also studied how to use the PN-code, to effectively track anonymous flows through mix networks. Since it addresses a different problem domain, the solution in [139] differs significantly from the one in this chapter, including the use of the PN-code, the designed algorithms, the decision rule, and the theoretical analysis.
3.9 Summary
In this chapter, we investigated a new class of attacks, the invisible LOCalization (iLOC) attack, which can accurately and invisibly localize the monitors of ITM systems. Its effectiveness is demonstrated by theoretical analysis and by experiments with an implemented prototype. We believe that our work lays the foundation for ongoing studies of attacks that intelligently adapt attack traffic to defenses. Our study is critical for securing and improving ITM systems.
As part of our future work, we will continue studying other invisible localization attack schemes. While the PN-code used in this chapter is effective in achieving the attack objectives of accuracy and invisibility, other attack patterns embedded in attack traffic may also be accurately distinguishable only by the attacker. Detection of such invisible attacks and design of corresponding countermeasures remain challenging tasks and will be pursued in our future research. Investigation of proactive methods to protect the location privacy of monitors is also part of our future work. Finally, we believe that other vulnerabilities exist in ITM systems, and we plan to conduct a thorough investigation of them and develop corresponding countermeasures.
CHAPTER 4
VARYING SCAN RATE WORMS AGAINST NETWORK-BASED WORM DETECTIONS AND COUNTERMEASURES
In this and the next chapter, we study the other type of defense-oriented widespread Internet attacks, i.e., algorithm-oriented attacks. In this chapter, we study an evolved worm called the Varying Scan Rate worm (VSR worm for short), which deliberately varies its scan rate and is thereby able to avoid detection by existing network-based worm detection systems. We also present our new worm detection scheme, called the attack target Distribution Entropy based dynamiC detection scheme (DEC detection for short), which can effectively detect both VSR worms and traditional worms.
4.1 Motivations
In traditional active worms, each worm instance11 takes part in spreading the worm attack by scanning and infecting other vulnerable hosts in the Internet. The basic form of active worm is the Pure Random Scan (PRS) worm, in which a worm-infected host continuously scans a set of random Internet IP addresses to find new vulnerable hosts. There are several variants of the PRS worm, such as the local subnet scan worm [27] and the hit-list scan worm [114]. Both of these worms attempt to speed up their propagation by increasing the probability
11In this chapter, we use worm infected host, worm victim, and worm instance interchangeably.
of successful scanning. It has been observed that the number of infected hosts and the overall scanning traffic volume show exponentially increasing trends over time when any of the above worms propagates in the Internet [86][27][148]. Based on these observations, many worm detection schemes associated with global scan traffic monitoring systems, such as threshold-based detection and trend-based detection, have been developed to detect the large-scale propagation of worms in the Internet [147][105][132][123].
However, active worms continue to evolve. The "Atak" worm [144] is a recently discovered active worm that attempts to remain hidden by going to sleep (i.e., stopping its scanning) when it suspects it is under detection. Worms that adopt attack strategies similar to the "Atak" worm's could produce an overall scan traffic pattern different from that of traditional worms, so the existing detection schemes based on global scan traffic monitoring will not be able to detect them effectively. It is therefore very important to understand such smart worms in order to defend against them. In the following, we study two new classes of such worms and design new detection schemes which can effectively detect both them and traditional worms.
4.2 Background
In this section, we first introduce the propagation model of traditional active worms. We then discuss the worm detection framework and the corresponding detection schemes associated with it.
4.2.1 The Propagation Model of Traditional Worms
Active computer worms are similar to biological viruses in their infection and self-propagating nature: worms identify vulnerable hosts in their neighborhood and in other networks and then infect them, and the newly infected hosts propagate the infection further to other vulnerable hosts, and so on.

As discussed in Section 4.1, the Pure Random Scan (PRS) worm is the basic form of traditional active worm. We use the PRS approach as our baseline attack scheme in studying the VSR worm and the C-Worm. In our analysis, we adopt the epidemic dynamic model for disease propagation [13] to characterize worm propagation; modeling active worm propagation with this model has been done in [86][27][148]. The model assumes that each host is in one of the following states: immune, vulnerable, or infected. An immune host is one that cannot be infected by a worm; a vulnerable host becomes an infected host after being successfully infected by a worm. We use an average-case analysis approach to analyze the worm propagation, conducted in the discrete time domain.
We define M(i) and N(i) as the total number of infected hosts and the number of uninfected vulnerable hosts at time tick i, respectively, and E(i+1) as the number of newly infected hosts from time tick i to time tick i+1. We define T as the total number of IP addresses, i.e., 2^32 for IPv4. Thus N(0) = T · P_1 · P_2 is the number of vulnerable hosts on the Internet before the worm attack starts, where P_1 is the ratio of the number of IP addresses assigned to existing hosts in the Internet to the entire Internet IP address space T, and P_2 is the ratio of the number of vulnerable hosts to the total number of existing hosts in the Internet. We also assume E(0) = 0 and M(0) = M_0. For the PRS worm with scan rate S, the propagation model is as follows:

E(i+1) = N(i) · [1 − (1 − 1/T)^(S·M(i))],                                     (4.1)
M(i+1) = M(i) + E(i+1),                                                       (4.2)
N(i+1) = N(i) − E(i+1).                                                       (4.3)

The details of this model are given in [27][132].
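Equations (4.1)-(4.3) can be iterated directly. In the sketch below, S is the scan rate per worm instance per time tick; the values of T, N0, M0, and S are illustrative, not parameters from the cited studies.

```python
def prs_propagation(T=2**32, N0=500_000, M0=10, S=100, steps=300):
    """Iterate the PRS propagation model:
    E(i+1) = N(i) * (1 - (1 - 1/T)^(S*M(i)))   # newly infected hosts
    M(i+1) = M(i) + E(i+1);  N(i+1) = N(i) - E(i+1).
    Returns the trajectory of M(i)."""
    M, N = float(M0), float(N0)
    history = [M]
    for _ in range(steps):
        E = N * (1 - (1 - 1 / T) ** (S * M))
        M, N = M + E, N - E
        history.append(M)
    return history
```

The infected population grows exponentially at first, the trend that the count-trend-based detectors of Section 4.2.2 exploit, and eventually saturates at N(0) + M(0).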
4.2.2 Network-based Worm Detection

Worm Detection System Framework

As stated before, this chapter focuses on designing schemes to detect the Internet-wide, large-scale propagation of active worms. To achieve rapid and accurate detection, it is imperative to monitor and analyze traffic at multiple locations over the Internet in order to detect suspicious traffic generated by worms. The generic worm detection framework we use in this chapter consists of multiple distributed monitors and a worm detection center that controls them. This framework is widely adopted and is similar to other existing worm detection systems, such as the cyber Center for Disease Control [114], honeypots [113], and the Internet sink [138]. The distributed monitors are located at gateways, firewalls, and border routers of local networks. They passively record and report unusual scan traffic, such as connection attempts to unavailable IP addresses and restricted service ports, to the worm detection center. A report to the worm detection center generally contains scan information in the form of a tuple, i.e., (Source IP, Destination IP, Time, Signature). The worm detection center processes such reports and deploys various worm detection schemes to check whether there are suspicious, large-scale scans to restricted ports or connection attempts to unassigned IP addresses. If such excessive and uncommon scans are detected, the worm detection center concludes that there is a large-scale worm propagation over the monitored network.
Worm Detection Schemes
Various worm detection schemes based on global scan traffic monitoring can be integrated into the above detection framework. Similar to traditional intrusion detection systems, most worm detection schemes take three steps in worm detection: (i) collecting the data for detection purposes; (ii) analyzing the detection data; (iii) applying certain decision rules to obtain the detection result.
Most existing worm detection schemes use the count of monitored worm instances as the detection data. Regarding the statistical properties of the detection data, the various detection schemes differ in format and usage. Most of them use the individually sampled detection data of each sampling window to decide the presence of a wide-spreading worm [123][131]. Some of them use the variance of the sampled detection data to set the detection criterion, such as the threshold of the detection decision [132]. Based on the detection decision rule, the existing detection schemes can be classified into two groups, namely the count-threshold-based and count-trend-based detections. The count-threshold-based scheme is a popular scheme to detect wide-spreading worms [132][123][131]. In this scheme, if the amount of observed worm instances/scan traffic exceeds a pre-configured threshold, large-scale worm spreading is identified as being in existence. On the other hand, the count-trend-based scheme [147] is based on the knowledge of existing worm attack epidemic dynamic models and uses the principle of "detecting dynamic traffic trends". That is, it is based on the observation that although the scan rate of a worm instance might be limited by the network bandwidth and CPU capacity, the worm instance does not change the scan rate on purpose. Thus, the existing worm attacks cause the number of worm instances to increase at a positive exponential rate which can be monitored for detection purposes. In Section 4.3, we define a new active worm attack strategy and observe that the above two classes of detection schemes are not able to detect this new active worm effectively.
4.3 The Active Worm with Varying Scan Rate
In this section, we first define a new form of active worms, i.e., the active worm with a Varying Scan Rate (the VSR worm for short). We then analyze the effectiveness of the VSR worm in changing the worm scan behavior and hence evading the existing worm detection schemes based on global scan traffic monitoring.
4.3.1 The VSR Worm Model
For the traditional worm, each worm instance scans and infects other hosts on the Internet. For the VSR worm, each worm-infected victim (worm instance) adopts a varying scan rate (S(t)) and a varying attack probability (Pa(t)). S(t) can be totally randomized or determined by a certain function, depending on the worm attack strategy. The attack probability Pa(t) is the probability that a worm instance takes part in the worm attack (i.e., scans other hosts on the Internet) at time t. In practice, a worm attacker may divide the time into a sequence of discrete time units. Accordingly, S(t) and Pa(t) become discrete functions, S(i) and Pa(i). In the i-th time unit, worm instances calculate S(i) and Pa(i) and carry out scans based on these two values. Algorithm 2 shows the procedure of the VSR worm attack.
The scan rate function S(i) and the attack probability function Pa(i) are predetermined by the worm attacker. For example, a VSR worm may use S(i) = max(Sn/(1 + i), 0.02), which is a function decreasing in time (indexed by i). Here, Sn is a parameter to control the overall scan rate during the attack. This VSR worm can use a periodic function for the attack
Procedure 2 The Pseudocode of the VSR Worm Attack

Require: This host is a worm-infected host
1: for i = 0 to ∞ do
2:   /* current time is i */
3:   Calculate random scan rate S(i);
4:   Calculate attack probability Pa(i);
5:   Conduct the scan based on S(i) and Pa(i);
6: end for
probability,

Pa(i) = max(|cos(2πi/5000)|, 0.08).    (4.4)
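Procedure 2 with the example S(i) and Pa(i) functions above can be sketched as follows. The value of Sn and the `scan_target` callback are illustrative assumptions; the dissertation does not fix them at this point.

```python
import math
import random

SN = 20_000  # assumed value of the attacker's overall-rate parameter Sn


def scan_rate(i):
    """Example decreasing scan-rate function S(i) = max(Sn/(1+i), 0.02)."""
    return max(SN / (1 + i), 0.02)


def attack_prob(i):
    """Formula (4.4): Pa(i) = max(|cos(2*pi*i/5000)|, 0.08)."""
    return max(abs(math.cos(2 * math.pi * i / 5000)), 0.08)


def vsr_step(i, scan_target):
    """One time unit of Procedure 2 on a single worm-infected host."""
    if random.random() < attack_prob(i):        # participate this time unit?
        for _ in range(int(scan_rate(i))):      # issue S(i) scans this unit
            scan_target(random.getrandbits(32))  # uniformly random IPv4 target
```

Note that `attack_prob` oscillates with period 5000 time units, so the aggregate scan volume rises and falls even when the infected population grows monotonically.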
Our VSR worm model is generic. The "Atak" worm is one of its special cases, where S(t) is close to a constant value and Pa(t) is determined by a time-varying function. The traditional PRS worm is also a special case of the VSR worm, where S(t) is close to a constant value and Pa(t) is equal to 1. Other forms of worms, such as the local subnet scan worm [81][27], have propagation formulas different from Formula (4.1) for the PRS worm. However, the varying S(t) and Pa(t) functions can be easily applied to those propagation formulas to model the combination of those worms with the VSR worm.
4.3.2 Analysis of the VSR Worm
In this section, we first develop the propagation model of the VSR worm. Following this, we investigate the performance of the existing worm detection schemes in detecting VSR worms, in order to determine the effectiveness of the VSR worm in evading these detection schemes.
Propagation model of the VSR Worm
For the VSR worm, Formula (4.2) needs to be modified to take S(i) and Pa(i) into consideration. For analysis purposes, we assume all the worm instances use the same scan rate function S(i) and attack probability function Pa(i). Then, we have

M(i + 1) = M(i) + N(i)[1 − (1 − 1/T)^(S(i)·Pa(i)·M(i))].    (4.5)
Now we derive the number of worm instances observed by the detection system from time tick i − 1 to time tick i, referred to as M̂(i). The number of observed worm instances is impacted by the percentage of the IP address space, referred to as Pm, monitored by the detection system. Pm determines the average probability that a worm scan can be monitored by the detection system.

Assume that at time tick i there are M(i) worm instances. Based on the VSR worm attack strategy, there are M(i)·Pa(i) worm instances actively conducting the scan, and each active instance generates S(i) scans between time tick i − 1 and time tick i. The probability that at least one of the S(i) scans generated by a worm instance will be detected by the detection system is 1 − (1 − Pm)^S(i). Thus, the number of worm instances observed by the detection system from time tick i − 1 to time tick i is

M̂(i) = M(i) · Pa(i)[1 − (1 − Pm)^S(i)].    (4.6)
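Formulas (4.5) and (4.6) can be iterated together to obtain the observed-instance curve for a VSR worm, as the following sketch shows. The parameter values (Sn, K, the monitored fraction Pm) are illustrative assumptions in the spirit of the simulations described below.

```python
import math


def simulate_vsr(ticks, T=2**32, N0=360_000, M0=10, Pm=2**-12,
                 Sn=200.0, K=250):
    """Iterate Formula (4.5) with S(t) from Formula (4.7) and Pa(t)
    from Formula (4.4); return the observed counts of Formula (4.6)."""
    S = lambda i: max(Sn / (1 + i / K), 0.08)                        # (4.7)
    Pa = lambda i: max(abs(math.cos(2 * math.pi * i / 5000)), 0.08)  # (4.4)
    M, N = float(M0), float(N0)
    observed = []
    for i in range(ticks):
        # Formula (4.6): instances visible to monitors covering fraction Pm
        observed.append(M * Pa(i) * (1 - (1 - Pm) ** S(i)))
        # Formula (4.5): propagation under varying S(i) and Pa(i)
        E = N * (1 - (1 - 1 / T) ** (S(i) * Pa(i) * M))
        M, N = M + E, N - E
    return observed
```

Because Pa(i) oscillates and S(i) decays, the observed curve lacks the clean exponential shape that count-trend-based detectors expect.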
In the following, we show the simulation data on the propagation of the VSR worm. In order to show the performance of different VSR worms, we select S(i) to be

S(t, K) = max(Sn/(1 + t/K), 0.08),    (4.7)

and use the same Pa(t) as in Formula (4.4). We select the parameter K defined in Formula (4.7) to be 200, 250 and 300, respectively. Thus, three corresponding VSR worms
Figure 4.1: Infection ratio of different VSR worms.

Figure 4.2: The observed worm instance count of different VSR worms.
are generated, referred to as VSR(K = 200), VSR(K = 250) and VSR(K = 300), respectively.
Fig. 4.1 shows the pattern of worm-infected instances of the traditional PRS worm and the three VSR worms determined previously. For the PRS worm, the scan rates of the worm-infected hosts follow a normal distribution N(100, 100). It clearly demonstrates that the PRS worm has an exponentially increasing pattern of worm instance numbers during its propagation, and that the VSR worm can change this pattern. As shown in Fig. 4.2, the observed worm instance numbers of VSR worms are also very different from the traditional PRS worm. This demonstrates that the VSR worm can successfully hide the real worm instance count (M(i)), change its pattern over time and thus evade being effectively detected by the existing worm detection systems. In Fig. 4.2, the worm detection system covers a 2^20 IPv4 address space (Pm = 1/2^12), similar to that of the SANS ISC [103].
Effectiveness of the VSR Worm
In the following, we evaluate the effectiveness of the VSR worm in defeating some representative worm detection schemes. We define two metrics here. The first one is the Detection Time (DT), which is defined as the time taken to successfully detect a wide-spreading worm from the moment when the worm spreading starts. The second metric is the Maximal Infected Ratio (MIR), which is defined as the ratio of the number of infected hosts over the total number of vulnerable hosts up to the moment when the worm spreading is detected. This metric fundamentally quantifies the damage of the worm, i.e., how many hosts are infected by the time the worm spreading is detected. The importance of MIR is that it can distinguish the performance of two worms in the case that they are detected at the same time (same DT) but have infected different numbers of hosts at the moment of detection. The fundamental purpose of the detection schemes is to minimize the damage caused by the worm. Hence MIR also quantifies the effectiveness of the worm detection schemes. The higher this value is, the better the worm attack performance and, consequently, the worse the detection performance.
In our simulations, the parameters of the traditional worm and the VSR worms are the same as those in Section 4.3.2. By analyzing the background non-worm scan traffic reported by the Internet Storm Center of SANS [103] and the Goldsmith data [49], we are able to set the detection system parameters of all detection schemes to achieve a reasonable detection false alarm rate (below 5 × 10^-4). The detection false alarm rate is defined as the probability that a worm spreading alarm is reported as the detection result when there is no wide-spreading worm. We obtain the results for three existing detection schemes. The first one is the generalization of the detection schemes based on the comparison of the observed victim count and a predefined threshold [123]. We refer to this detection scheme as CISH. The second detection scheme is proposed in [132], which is based on the observed victim count variance and a dynamic threshold. We refer to this detection scheme as CVDH. The third detection scheme is proposed in [147], which is based on observing a predetermined trend of the victim count during the worm propagation. We refer to this detection scheme as CISR. We also run the Kalman filter following the design in [147] to perform CISR detection on VSR worms.
Tables 4.1 and 4.2 show the results of the above three detection schemes in detecting the traditional PRS worm and the VSR worms. Although all detection schemes are effective in detecting the traditional PRS worm, they are not effective in detecting VSR worms. For example, when K is 200, the CISH and CISR schemes fail to detect the VSR worm, while CVDH is ineffective, i.e., 41% of the vulnerable hosts are infected at the moment when the worm is detected. Comparatively, CISR is less effective than the other detection schemes for the following reason: it tries to detect an exponentially increasing trend of the worm scan traffic, but the trend of the VSR worm's scan traffic does not follow the exponentially increasing pattern. This causes the Kalman filter to oscillate without convergence. Our simulation results show that the above threshold-based and trend-based worm detection schemes are not able to effectively detect the VSR worm. In the following section, we develop a new detection scheme which aims to effectively detect the VSR worm.
Detection   Traditional Worm   VSR(200)   VSR(250)   VSR(300)
CISH        862                ∞          17700      7600
CVDH        879                33400      12011      9234
CISR        760                ∞          ∞          ∞

Table 4.1: Detection Time of Some Existing Detection Schemes
Detection   Traditional Worm   VSR(200)   VSR(250)   VSR(300)
CISH        3%                 100%       52%        12%
CVDH        4%                 41%        20%        23%
CISR        2%                 100%       100%       100%

Table 4.2: Maximal Infection Ratio of Some Existing Detection Schemes
4.4 DEC Worm Detection
4.4.1 Design Rationale
As we discussed in Section 4.3, the VSR worm can adopt intelligence in its attack such that it behaves differently from traditional worms. Consequently, the existing worm detection schemes are not effective in detecting the VSR worm.
In this section, we develop a new detection scheme called attack target Distribution Entropy based dynamiC (DEC) detection, which captures the key inherent worm propagation feature and is thus able to effectively detect the VSR worm. The VSR worm can manipulate the scan traffic volume over time so that it is undetectable by existing worm detection schemes based on global scan traffic monitoring. However, its fundamental goal is to propagate itself to as many vulnerable hosts as possible. Hence, to be effective, the VSR worm still has to continuously propagate itself to new targets and cause large-scale infection on the Internet. Thus, the VSR worm scan traffic must show a widely dispersed distribution of scan target addresses, and this widely dispersed distribution of attack/scan targets in the worm scan traffic can be used to distinguish the VSR worm scan attack from other occasional non-worm port scan activities, e.g., port scans due to software faults or occasional port scans by benign software. Motivated by this observation, our DEC detection uses the attack target distribution as the basic attack data for worm detection.
Recall our discussion in Section 4.2.2, where we mentioned that there are three important steps/elements associated with worm detection. DEC has special features in these three elements compared with the existing threshold-based and trend-based worm detection schemes: (i) Detection data of worm attacks: The distribution of the attack targets can be observed by the detection monitors in the detection system, and DEC treats this distribution as the basic detection data. While a distribution can be described and quantified in different formats, we use the entropy to capture the distribution of the attack targets. From an information theory perspective, a smaller entropy indicates fewer anomalies in the detection data. (ii) Statistical property of worm detection data: While processing and analyzing the detection data sequences obtained from continuous detection sampling windows, we observe that the sample entropy can distinguish non-worm scan traffic and worm scan traffic more effectively than other statistical measures, e.g., the sample mean and variance. Hence, we use the entropy to process the sampled detection data and capture the abnormality during worm propagation. (iii) Detection decision rule: Since the VSR worm can behave differently over time, we adopt run-time dynamic adaptations in the DEC detection.
4.4.2 DEC Worm Detection
DEC has specific features which improve the detection performance not only for the detection of the VSR worm, but also for the detection of traditional worms. DEC obtains this improvement through the new features of the three elements in worm detection as follows:
Attack Target Distribution
To deal with the VSR worm, DEC detection uses the distribution of the attack targets
in worm attacks as the detection data. The distribution of the attack targets is captured in
the form of entropy. In the following, we describe how to use the entropy to measure the
distribution of the worm attack targets.
The Shannon entropy H(X) of a data set X = (x1, x2, . . . , xn) is defined as

H(X) = − Σ_{i=1}^{n} p_i log(p_i),    (4.8)

where n = |X|, p_i = P[X = x_i] for i ∈ {1, . . . , n}, and the logarithm is taken in base 2. Entropy
quantifies "the amount of uncertainty" contained in data [109], where "uncertainty" means randomness. Entropy is frequently used to quantify the "randomness" of a distribution, or more precisely, the degree of dispersal or concentration of a distribution. The reason we use entropy is that when a worm is propagating, the destination IP addresses of the port scanning traffic (worm attack target IP addresses) have a widely dispersed (more random) distribution, and entropy naturally quantifies the distribution of worm attack targets and can detect the existence of worm propagation.
In the worm detection system, for each detection sampling window, the detection center collects the reports of connection attempts targeting unused IP addresses or restricted service ports from the detection monitors located at different locations. Recall that, as mentioned in Section 4.2.2, each entry in a report table from the detection monitors has the following format: (Src, Dest, time, signature). Src represents the IP address of the worm instance; Dest represents the destination IP address as the worm scan target; time represents the time stamp of the scan; and signature represents some feature of the scans, such as the port number.
With all reports collected in one sampling window Ws (time units), an attack/scan target table is integrated for the entropy calculation with the following format: (Dest, sn). Here, Dest is the unique key representing the scan target IP address and sn is the number of distinct sources trying to scan Dest. For example, if an attack target table has M entries, we have a set of data Z1 = ((Dest1, sn1), ..., (DestM, snM)). Mapping this case to Formula (4.8), we derive the entropy of the worm attack target distribution,

H̄(Z1) = − Σ_{i=1}^{M} (sn_i/Y) log(sn_i/Y),    (4.9)

where Y = Σ_{i=1}^{M} sn_i.
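The aggregation of monitor reports into the (Dest, sn) table and the entropy of Formula (4.9) can be sketched as follows. The report-tuple values in the usage example are hypothetical.

```python
import math
from collections import defaultdict


def target_entropy(reports):
    """Formula (4.9): entropy of the attack-target distribution
    within one sampling window.

    `reports` is a list of (src, dest, time, signature) tuples
    as reported by the distributed monitors.
    """
    # Build the (Dest, sn) table: sn = number of distinct sources per target
    sources = defaultdict(set)
    for src, dest, _, _ in reports:
        sources[dest].add(src)
    sn = [len(s) for s in sources.values()]
    Y = sum(sn)
    # H-bar(Z1) = -sum (sn_i/Y) log2(sn_i/Y)
    return -sum((x / Y) * math.log2(x / Y) for x in sn)


# A scan concentrated on one target has entropy 0; a dispersed scan has more.
one_target = [("a", "10.0.0.1", 0, "p80"), ("b", "10.0.0.1", 1, "p80")]
dispersed = [("a", "10.0.0.1", 0, "p80"), ("b", "10.0.0.2", 1, "p80")]
```

This is exactly the property DEC relies on: worm traffic scatters across targets, raising the entropy above what occasional benign scans produce.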
Statistical Property of Worm Detection Data
To deal with the VSR worm properties, such as the time-varying scan rate and the time-varying scan traffic pattern, we use a statistical methodology to process the detection data sequence obtained from continuous detection sampling windows to improve the detection accuracy. During worm detection, the detection center configures a detection sliding window Wd (time units), which includes q (> 1) continuous detection sampling windows (recall that the size of each sampling window is Ws). As discussed before, the detection center calculates the target distribution in terms of entropy in each sampling window by Formula (4.9) as the detection data. Thus, there is one detection datum in each sampling window. Within the sliding window Wd, there are q target distribution entropy values, denoted by Z2 = (H̄_{i−q+1}, H̄_{i−q+2}, ..., H̄_i), recorded at time i.
The detection center can use the sample mean and sample entropy as the statistical properties to process the above q detection data in a detection sliding window. The sample mean Ê(H̄) and sample entropy Ĥ(Z2) of the q target distribution entropy series Z2 are defined below:

Ê(H̄(Z2)) = (1/q) Σ_{j=1}^{q} H̄_{i−q+j},    (4.10)

Ĥ(Z2) = − Σ_j (o_j/q) log(o_j/q) + log(B).    (4.11)

In Ĥ(Z2), we use the histogram-based entropy estimation in [84]. Here, o_j is the number of sample points in the j-th bin and B is the histogram's bin size. Note that these two parameters are obtained from the q target distribution entropy values denoted by Z2 [84]. Based on the mean square error of the entropy estimation, we can obtain the optimal bin size of the entropy estimation [84].
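A minimal sketch of the histogram-based estimator of Formula (4.11) follows; the MSE-optimal bin-size selection of [84] is outside this sketch, so the bin width B is passed in as a parameter.

```python
import math


def sample_entropy(values, B):
    """Histogram-based sample entropy estimate, Formula (4.11).

    `values` are the q target-distribution entropy values in a sliding
    window; `B` is the histogram bin width (assumed given here).
    """
    q, lo = len(values), min(values)
    counts = {}
    for v in values:
        j = int((v - lo) / B)           # bin index of this sample
        counts[j] = counts.get(j, 0) + 1
    # -sum (oj/q) log2(oj/q) + log2(B), oj = count in bin j
    return (-sum((o / q) * math.log2(o / q) for o in counts.values())
            + math.log2(B))
```

Intuitively, a window of nearly constant entropy values falls into one bin and yields a low estimate, while worm-induced fluctuation spreads the samples across bins and raises it.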
Decision Rule Adaptation
With the above statistical properties of the detection data, the last step in worm detection is to apply detection decision rules. The threshold-based scheme has been widely used in the anomaly detection field [132][36]. Worm detection is performed by comparing statistical features of the background non-worm traffic and the detection data traffic. As the VSR worm adopts a varying scan rate and the worm spreading follows time-varying dynamics, we adopt statistical pattern recognition as a fundamental principle and apply a dynamic threshold to deal with the VSR worm.

Our dynamic threshold adaptation is inspired by the following observations. Assume that Xn is a random variable which represents the detection data in the normal system without worm spreading. We also assume that Xw is the random variable representing the detection data in the network system under worm attack. With statistical pattern recognition as a principle, Fig. 4.3 shows the two probability density functions of normal non-worm traffic and worm attack traffic. From this figure, we obtain two observations: (i) For the threshold selection, there is a trade-off between the false alarm rate and the detection rate. The detection rate is the probability that a wide-spreading worm is detected successfully. When the threshold is set larger, it causes a smaller false alarm rate and a smaller detection rate. The smaller detection rate causes a longer detection time. (ii) When the variance of the attack traffic Xw increases, the threshold should be dynamically adjusted to be smaller in order to maintain a certain detection rate.
Figure 4.3: Bayes decision rule for normal and worm traffic features
The above two observations provide the guidelines for determining the dynamic threshold. The threshold needs to consider the normal non-worm traffic property to achieve a low false alarm rate. It also needs to consider the run-time detection traffic property, i.e., the variance. When the run-time traffic variance becomes larger, the threshold needs to be set to a relatively smaller value in order to achieve a high detection rate. If the normal non-worm traffic and worm attack traffic follow normal distributions, it is possible to obtain a closed-form formula for the optimal threshold. We present the method to obtain the optimal threshold in the case that the sample entropy is used as the statistical property of the detection data in [140]. For the optimal threshold in the cases of the sample mean and sample variance, please refer to [118].
Based on the above observations, we conduct dynamic threshold adaptation. At the initialization stage of worm detection, there is an initial threshold value, Tr0, which is obtained through an off-line training process with a large amount of normal non-worm Internet scan traffic traces [103]. As a result, we can obtain the initial Tr0 to achieve a reasonable false alarm rate. With the run-time detection data, the threshold Tr is dynamically adjusted based on the run-time detection traffic variance σ(H̄) as

Tr = [1 − α · min(σ(H̄)/Vm, 1)] · Tr0,    (4.12)

where Vm is a constant value. The min() term provides the normalization operator for the detection data variance σ(H̄). The parameter α ∈ (0, 1) sets the maximal threshold adjustment. Clearly, there is a trade-off in selecting α, i.e., a larger value of α will improve the detection rate but potentially increase the false alarm rate.
In Formula (4.12), σ(H̄) can be calculated as follows. At time tick i, there are q target distribution entropy values, denoted by Z2 = (H̄_{i−q+1}, H̄_{i−q+2}, ..., H̄_i), recorded in the sliding window Wd. The sample standard deviation σ(H̄) of Z2 is defined by

σ(H̄) = sqrt( (1/q) Σ_{j=1}^{q} (H̄_{i−q+j} − Ê(H̄))^2 ),    (4.13)

where Ê(H̄) is calculated by Formula (4.10).
With the dynamic threshold Tr obtained by Formula (4.12) and the sample entropy of the detection data by Formula (4.11), the DEC detection scheme conducts the detection by comparing the sample entropy with the threshold Tr. If the sample entropy is larger than Tr, the alarm of a wide-spreading worm is generated.
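One complete DEC decision step, combining Formulas (4.11)-(4.13), can be sketched as below. The parameter values (Tr0, α, Vm, the bin width B) are illustrative assumptions, and `hist_entropy` is a minimal stand-in for the histogram estimator of Formula (4.11).

```python
import math
import statistics


def hist_entropy(values, B):
    """Histogram-based sample entropy, Formula (4.11), with bin width B."""
    q, lo = len(values), min(values)
    counts = {}
    for v in values:
        j = int((v - lo) / B)
        counts[j] = counts.get(j, 0) + 1
    return (-sum((o / q) * math.log2(o / q) for o in counts.values())
            + math.log2(B))


def dec_alarm(window, Tr0, alpha=0.04, Vm=200.0, B=0.5):
    """One DEC decision over the q entropy values in the sliding window.

    Returns True if the sample entropy exceeds the dynamically
    adjusted threshold, i.e. a wide-spreading-worm alarm is raised.
    """
    sigma = statistics.pstdev(window)                # Formula (4.13)
    Tr = (1 - alpha * min(sigma / Vm, 1.0)) * Tr0    # Formula (4.12)
    return hist_entropy(window, B) > Tr
```

Note how a larger run-time deviation `sigma` shrinks Tr toward (1 − α)·Tr0, implementing observation (ii) above.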
4.4.3 Space of Worm Detection
As we discussed in Section 4.2.2 and Section 4.4.1, there are three important elements/steps in worm detection. Fig. 4.4 shows the space of schemes, in which the three elements are shown as three different dimensions. We can use a tuple to represent a detection scheme subspace: (Detection Data, Statistical Processing, Decision Rule). We then have 32 possible combinations, and the whole detection scheme space is divided into 32 subspaces, as shown in Fig. 4.4. Each subspace represents a different type of detection scheme.
Figure 4.4: Space of worm detection. The three dimensions are Detection Data (1: Counter, 2: Distribution), Statistic Property (1: Individual, 2: Mean, 3: Variance, 4: Entropy) and Decision Rule (1: Static Threshold, 2: Dynamic Threshold, 3: Static Trend, 4: Dynamic Trend); CISH, CISR, CVDH, DVDH and DEC occupy different subspaces.
The traditional count-threshold-based detection schemes are in the subspace modeled by the tuple (Count, Individual, Static tHreshold). We refer to this group of detection schemes as CISH. The traditional count-trend-based detection schemes (referred to as CISR) are in the subspace modeled as (Count, Individual, Static tRend) [147]. The detection scheme in [132] is in the subspace modeled as (Count, Variance, Dynamic tHreshold), and we refer to it as CVDH. The extension of CVDH obtained by replacing the worm instance count with the worm attack target distribution generates another detection scheme, referred to as DVDH in this chapter. DVDH is in the subspace modeled as (Distribution, Variance, Dynamic tHreshold). Our DEC detection scheme is in the subspace modeled as (Distribution, Entropy, Dynamic tHreshold). We refer to the DEC scheme as DEDH in the following section in order to emphasize the comparison with other schemes. With this space of detection schemes, we can comprehensively compare our DEC detection scheme with other existing schemes. This detection scheme space can also inspire the study of potential new worm detection schemes.
4.5 Performance Evaluation
In this section, we report the results of our simulations to show the detection perfor-
mance of various worm detection schemes under different VSR and traditional PRS worm
attacks. We also investigate the sensitivity of detection performance to the worm attack
parameters.
4.5.1 Evaluation Methodology
In this chapter, we evaluate our proposed detection scheme (DEC or DEDH) by comparing its performance with some representative schemes discussed in Section 4.4.3, i.e., CISH, CISR, CVDH and DVDH.
Evaluation Metrics
The first two metrics we use are the Detection Time (DT) and the Maximal Infection Ratio (MIR), which are defined in Section 4.3.2. We also use the detection false alarm rate defined in Section 4.3.2, which shows the accuracy of the detection scheme.
Simulation Setup
We use real-world Internet traffic traces as the background non-worm scan traffic in our simulations. To do so, we analyzed scan traffic reported by the Internet Storm Center of SANS [103] and the Goldsmith data [49]. The default parameters in our simulation are set as follows. The total number of vulnerable hosts on the Internet is 360,000, which is similar to the total number of vulnerable hosts in the Code-Red worm incident [148]. The unit of the scan rate is the number of scans per time unit. For the traditional worm attack, we assume that different infected hosts (worm instances) have different scan rates, but each worm instance has a scan rate S (> 0) which is determined by a normal distribution S = N(Sm, Sσ²).¹² In our simulations, we select Sσ as 100 and change Sm from 50 to 350 to evaluate different traditional PRS worms [147].
We simulate the VSR worms as follows. Each worm instance adopts a varying scan rate (S(t)) and a varying attack probability (Pa(t)), both functions of time t. The S(t) function in our simulation is

S(t) = max(C1/(1 + t/K), C2),    (4.14)
which is a decreasing function of time t (C1 and C2 are constants). Note that S(t) is the
varying scan rate adopted by the VSR worm instance defined in Section 4.3.1. The attack
probability Pa(t) in our simulation is
2π P (t) = max(| cos( t)|,C ), (4.15) a 5000 3
where C3 is a constant. Different values of C1, C2 and C3 correspond to different S(t) and Pa(t) functions, hence representing different VSR worms. Due to space limitations, we only present a limited number of cases here. However, we found that the conclusions we draw hold for all other cases we have evaluated.
We assume that the detection system has distributed monitors which cover 2^20 IP addresses. We select 2^20 as the detection system coverage size to simulate the coverage of
¹²Each worm instance may have a different out-going link bandwidth and computing capacity, and thus may have a different scan rate.
the SANS ISC [103]. The detection sampling window Ws is set to 5 time units and the detection sliding window Wd is set to be incremental from 100 units to 600 units. The incremental selection of Wd can be adaptive, reflecting the worm scan traffic dynamics caused by VSR worms with various speeds. For fair comparison, the thresholds of the detection schemes in the following evaluations are consistently configured with values achieving a similar false alarm rate (below 5 × 10^-4). For this purpose, the maximal adjustable ratio of the detection threshold α and the parameter Vm (defined in Formula (4.12)) are 0.04 and 200, respectively. Note that our DEC scheme uses Formula (4.12) to dynamically adjust the threshold.
4.5.2 Detection Performance
In this section, we first compare the performance of our DEC (or DEDH) detection
scheme with other detection schemes under different VSR worm attacks. Then we report
the comparison between our DEC detection and other detection schemes under different
traditional PRS worm attacks.
Figure 4.5: Detection time of detection schemes on VSR worms
Figure 4.6: Maximal infection ratio of detection schemes on VSR worms
Detection Performance to VSR Worms
As shown in Tables 4.1 and 4.2, CISR is not effective in detecting VSR worms. Hence, we do not report the performance of CISR in this subsection. Fig. 4.5 shows the Detection Time (DT) of the various detection schemes under VSR worm attacks with different K (defined in Formula (4.14)). Fig. 4.6 shows the corresponding Maximal Infection Ratio (MIR). From these two figures, we make the following observations: a) Our DEC (or DEDH) detection scheme consistently achieves the best detection performance in terms of both DT and MIR. For example, when K = 400, the detection time of DEDH is 820 units, which is only 25%−50% of that of the other detection schemes. This means that DEC is more robust and can detect the VSR worm significantly faster than the other detection schemes. With the same K, the MIR achieved by our DEDH is 0.008, which is only 14%−40% of that of the other detection schemes. This means that DEDH can prevent many more vulnerable hosts from being infected by the VSR worm, compared with the other detection schemes. b) The detection performance of DVDH is better than that of CVDH, in terms of both DT and MIR. This result consistently confirms that the worm attack target distribution is more effective than the victim count as the detection data. c) When the VSR scan rate parameter K increases, all detection schemes achieve faster worm detection (smaller DT) and result in a smaller MIR. This is because a larger K value implies faster VSR worm scanning. Thus, the VSR worm is detected earlier (smaller DT) and the damage caused by the VSR worm (MIR) is also smaller.
As discussed, there is a trade-off in selecting the dynamic threshold parameter α in Formula (4.12). A larger value of α will achieve faster detection (smaller detection time) but
worse detection accuracy (larger false alarm rate). Table 4.3 shows the detection time and
detection false alarm rate with different values of α for our DEDH detection scheme. This
table shows that the false alarm rate is sensitive to the dynamic threshold parameter α.
When the value of α is larger, the false alarm rate is also larger.
Parameter α              0        0.04     0.08     0.16
False alarm rate         0.00001  0.00003  0.006    0.015
Detection time (DEDH)    1103     890      850      814
MIR (DEDH)               0.0025   0.0016   0.0013   0.0011

Table 4.3: Sensitivity of DEC detection performance to the parameter α
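Formula (4.12) is not reproduced here, so the following is only a hedged sketch of one plausible dynamic decision rule exhibiting the trade-off in Table 4.3: the threshold is relaxed over successive windows at a rate controlled by α, so a larger α yields earlier alarms (shorter detection time) at the cost of more false alarms. The geometric relaxation is our own assumption, not the dissertation's formula.

```python
def dynamic_threshold(t0, alpha, window_index):
    # Hypothetical rule: start from a conservative initial threshold t0 and
    # relax it geometrically window by window. Larger alpha relaxes faster,
    # trading false alarms for detection speed (cf. Table 4.3).
    return t0 * (1.0 - alpha) ** window_index

def alarm(statistic, t0, alpha, window_index):
    # Raise an alarm once the monitored detection statistic (e.g., an
    # entropy deviation) exceeds the current threshold.
    return statistic > dynamic_threshold(t0, alpha, window_index)
```

With α = 0 the threshold stays fixed (the "constant threshold" schemes); any α > 0 makes the rule dynamic.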
Detection Performance on Traditional PRS Worms
For the detection of the traditional PRS worm, we evaluate all five detection schemes listed in Fig. 4.4. Fig. 4.7 shows the Detection Time (DT) of the detection schemes for different mean values of the scan rate Sm, and Fig. 4.8 shows the corresponding Maximal Infection Ratio (MIR). From these two figures, we make the following observations:
a) Our DEC (or DEDH) detection scheme consistently achieves the best detection performance, in terms of both DT and MIR, among the five detection schemes.

Figure 4.7: Detection time of the detection schemes on the traditional PRS worms

Figure 4.8: Maximal infection ratio of the detection schemes on the traditional PRS worms
For example, when the mean scan rate is 150 per unit time, DEDH achieves a detection time of 240 time units, only 30%-50% of the detection time of the other detection schemes. With the same mean scan rate, DEDH achieves an MIR of 0.004, only 15%-35% of that of the other schemes. b) As the mean scan rate increases, all detection schemes achieve a shorter detection time, and all schemes except CVDH and DVDH also achieve a smaller MIR. This highlights that, for most detection schemes, a relatively small scan rate allows the worm to cause more damage before its spreading is identified: a faster worm increases the worm instance count more quickly and is therefore detected much earlier. However, extremely slow propagation contradicts the goal of active worms. c) As the scan rate increases, the MIR of CVDH and DVDH shows a trend different from that of the other detection schemes. This observation matches the result in [132]. Fig. 4.8 also shows that CVDH and DVDH achieve a better MIR than CISH and CISR when the worm scan rate is relatively low. This confirms that dynamically adjusting the detection threshold, as in DEDH, is a good way to improve detection performance, especially for detecting stealthy worms with varying scan rates.
To summarize, our DEC-based detection scheme is highly effective in detecting both VSR worms and traditional PRS worms.
4.6 Related Work
The significant damage caused by active worms has drawn substantial attention to their
study. Much effort has been devoted to worm modeling, analysis, detection, and
defense. In the area of modeling and analyzing active worms, Staniford et al. studied
various active worms and modeled their propagation [114]. Chen et al. analyzed the
propagation of active worms with a discrete-time model and also considered the impact of
patching during worm spreading [27]. Moore et al. analyzed and modeled the spread of the Slammer
worm [86]. Other work includes the malware-spreading dynamics studied
by Garetto et al. [48] and the Code-Red worm model by Zou et al. [148]. We modeled
a new form of worm, the Varying Scan Rate worm, which generalizes worms
that deliberately change their scan rates to evade existing detection schemes based on global scan-traffic
monitoring. Examples of such worms are the "Atak" worm [144] and
the "self-stopping" worm [75].
The most important component of worm defense systems is worm detection, which is the foundation of defending against worms. Many detection schemes leverage intrusion detection results [36]. Besides the schemes based on global scan-traffic monitoring discussed in Section 4.2.2, there are other types of worm detection schemes. For example, using sequential hypothesis testing, Jung et al. developed a Threshold Random Walk online detection algorithm to identify worm-infected hosts [105].
Gu et al. developed DSC (Destination-Source Correlation) for detecting worms in local networks, based on the worm infection pattern: a victim host is first scanned and then sends out scans destined for the same port [53]. Kim and Karp proposed a scheme to automatically generate worm signatures [61].
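The Threshold Random Walk idea can be sketched as a likelihood-ratio walk over connection outcomes; the probabilities and thresholds below are illustrative placeholders, not the parameters published by Jung et al.

```python
def trw_update(lam, success,
               p0_success=0.8, p1_success=0.2,
               eta0=0.01, eta1=100.0):
    """One step of a Threshold Random Walk in the spirit of Jung et al.:
    each observed connection attempt multiplies a running likelihood ratio.
    Failed attempts are more likely under the 'scanner' hypothesis, so they
    push the ratio toward the upper (infected) threshold; successes push it
    toward the lower (benign) threshold. All parameters here are
    hypothetical placeholders."""
    if success:
        lam *= p1_success / p0_success          # evidence of benign behavior
    else:
        lam *= (1 - p1_success) / (1 - p0_success)  # evidence of scanning
    if lam >= eta1:
        return lam, "infected"
    if lam <= eta0:
        return lam, "benign"
    return lam, "pending"
```

A host that repeatedly fails connection attempts crosses the upper threshold after only a few observations.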
Our DEC detection scheme differs from the above schemes in that DEC uses the attack target distribution as the detection data to capture worm propagation.
Furthermore, it uses the statistical property of entropy to synthesize the detection data, and adopts a dynamic detection decision rule to improve detection performance. A recent related work [67] also discussed using traffic distributions (summarized by entropy), such as the destination IP address distribution, to detect various anomalies. However, the work in [67] differs from ours in the following respects: its detection scheme was not tuned specifically for worm detection, it did not address the impact of various statistical properties (sample mean, variance, entropy) on detection performance, and it did not compare representative worm detection schemes. Our work covers a thorough study of worm detection. In particular, we defined a three-dimensional worm detection space to position existing schemes, and applied various statistical properties and a dynamic threshold to
enhance the accuracy of detecting both VSR and traditional worms.
4.7 Summary
In this chapter, we modeled a new form of worm called the Varying Scan Rate worm
(the VSR worm for short). The VSR worm is generic and simple to launch. Our results
showed that the VSR worm can significantly degrade the effectiveness of existing worm
detection schemes based on global traffic monitoring. To counteract the VSR worm, we
developed a new worm detection scheme called the attack target Distribution Entropy based dynamiC detection scheme (the DEC detection). The DEC detection utilizes the attack target distribution and its statistical entropy, in conjunction with dynamic decision rules, to distinguish worm scan traffic from non-worm scan traffic. Our data clearly demonstrated the effectiveness of the DEC detection scheme in detecting VSR worms as well as traditional PRS worms. To the best of our knowledge, ours is the first work to systematically study active worms with deliberately varying scan rates and to develop an effective detection scheme against them.
In our research, besides the above VSR worm, we have also investigated another new class
of active worms, the Camouflaging Worm (C-Worm), which has the ability to camouflage
its propagation from worm detection systems [142]. The C-Worm intelligently manipulates
its scan traffic volume dynamically so that its propagation may not be detected
by existing network-based worm detection algorithms. We analyzed characteristics of the
C-Worm and compared the traffic of the C-Worm with normal non-worm scans. We observed
that they are barely distinguishable in the time domain. However, in the frequency
domain, the distinction between them is clear due to the recurring manipulative nature of
the C-Worm. Motivated by these observations, we designed a novel spectrum-based scheme to detect the C-Worm. Our scheme uses the Power Spectral Density (PSD) distribution of the scan traffic volume and its corresponding Spectral Flatness Measure (SFM) to distinguish C-Worm traffic from non-worm traffic. We conducted extensive performance evaluations of the C-Worm through simulation, using real-world traces as background scan traffic. The performance results demonstrate that our spectrum-based scheme can detect the C-Worm more rapidly and accurately than existing worm detection schemes.
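The SFM used in that spectrum-based scheme is the ratio of the geometric mean to the arithmetic mean of the PSD; a minimal sketch follows (PSD estimation itself, e.g. via an FFT of the scan-traffic time series, is omitted, and the function name is our own).

```python
import math

def spectral_flatness(psd):
    """Spectral Flatness Measure: geometric mean over arithmetic mean of the
    Power Spectral Density (all bins assumed positive). Values near 1 mean a
    flat, noise-like spectrum (normal background scan traffic); values near 0
    mean energy concentrated at a few frequencies, as the C-Worm's recurring
    traffic manipulation would produce."""
    n = len(psd)
    geometric = math.exp(sum(math.log(p) for p in psd) / n)
    arithmetic = sum(psd) / n
    return geometric / arithmetic
```

A detector would flag traffic whose SFM falls below a calibrated cutoff.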
CHAPTER 5
POLYMORPHIC WORMS AGAINST HOST-BASED WORM DETECTION AND COUNTERMEASURES
In this chapter, we do not define a new attack model, because the evolving worms we study here already exist. Worm writers know that most host-based worm detection algorithms use binary representations or signatures of previously seen worms as references to distinguish worms from benign executables, so they generate polymorphic worms that attempt to change their signatures during propagation. To fight this new threat in the real world, we propose a new worm detection approach based on mining dynamic program executions. This approach captures the dynamic behavior of executables to provide accurate and efficient detection of both seen and unseen worms, including polymorphic worms.
5.1 Motivations
In general, there are two types of worm detection systems: network-based detection and host-based detection. Network-based detection systems detect worms primarily by monitoring, collecting, and analyzing the scan traffic (messages to identify vulnerable computers) generated by worm attacks. Many detection schemes fall into this category
[132, 123, 112, 62, 147, 67, 141]. Nevertheless, because of their reliance on scan traffic,
these schemes are not very effective in detecting worms that spread via email systems,
instant messenger (IM) or peer-to-peer (P2P) applications.
On the other hand, host-based detection systems detect worms by monitoring, collect-
ing, and analyzing worm behaviors on end-hosts. Since worms are malicious programs that
execute on these machines, analyzing the behavior of worm executables13 plays an impor-
tant role in host-based detection systems. Many detection schemes fall into this category
[1, 64, 106]. Considering that a large number of real-world worm executables are accessi-
ble over the Internet, they provide an opportunity for researchers to directly analyze them
to understand their behavior and, consequently, develop more effective detection schemes.
Therefore, the focus of this chapter is to use this large number of real-world worm executa-
bles to develop a host-based detection scheme which can efficiently and accurately detect
new worms.
Within this category, most existing schemes have been focusing on static properties
of executables [64, 106]. In particular, the list of called Dynamic Link Libraries (DLLs),
functions and specific ASCII strings extracted from the executable headers, hexadecimal
sequences extracted from the executable bodies, and other static properties are used to
distinguish malicious and benign executables. However, static properties alone, without
program execution, might not accurately distinguish between these executables, for the
following two reasons.
• First, two different executables (e.g., one worm and one benign) can have same static
properties, i.e., they can call the same set of DLLs and even call the same set of
functions.
13In this chapter, an executable means a binary that can be executed, which is different from program source code.
• Second, these static properties can be changed by the worm writers by inserting
“dummy” functions in the worm executable that will not be called during program
execution, or by inserting benign-looking strings [32, 50, 63, 20].
Hence, the static properties of programs, or how they look, are not the keys to distinguishing worm and benign executables. Instead, we believe the keys are what programs do, i.e., their run-time behaviors or dynamic properties. Therefore, our study adopts dynamic program analysis to profile the run-time behavior of executables in order to efficiently and accurately detect new worm executables. However, dynamic program analysis poses three challenges. First, in order to capture the run-time behavior of executables (both worm and benign ones), we have to execute a large number of malicious worms, which might damage our host and network systems. Second, given the large number of executables, manually executing and analyzing them is infeasible in practice. Hence, we need an efficient way to automatically capture programs' run-time behavior from their execution.
Third, from the executions of a large set of worm and benign executables, we need to find consistent and fundamental behavioral differences between worms and benign executables in order to accurately determine whether an unseen executable is a worm or a benign program.
In order to address the above issues, we propose an effective worm detection approach
based on mining system-call traces of a large number of real-world worm and benign
executables. Our goal is to use the large volume of existing worms to capture the common
dynamic behavior features of worms and use these features to detect new worms.
5.2 Background
In this section, we give an overview of worm detection and then introduce program analysis and data mining techniques.
5.2.1 Worm Detection
Generally, worm detection can be classified into network-based and host-based schemes.
Network-based schemes detect worm attacks by monitoring, collecting, and analyzing worm-generated traffic. For this purpose, Internet Threat Monitoring (ITM) systems have been developed and deployed [103, 15]. An ITM system usually consists of a number of monitors and a data center. Each monitor is responsible for monitoring traffic targeted at a range of unused, yet routable, IP address space and periodically reports the collected traffic logs to the data center. The data center analyzes the logs and posts summarized reports to warn of Internet worm attacks. Based on data reported by ITM systems, many detection schemes have been proposed [132, 123, 146, 59]. Nevertheless, as mentioned in Section 5.1, these detection schemes have limitations in detecting worms that spread via e-mail systems, instant messenger (IM), or peer-to-peer (P2P) applications, since such traffic is difficult for ITM systems to observe [77].
Host-based schemes detect worm attacks by monitoring, collecting, and analyzing the worm behavior on end-hosts. In particular, when a worm executes on an infected computer, it may take control of the system with high privileges, modify the system as needed, and continue to infect other computers. These acts expose some anomalies on the infected com- puters, such as writing or modifying registry keys and system binaries or opening network connections to transfer worm executables to other vulnerable computers. For example,
the “Blaster” worm changes a registry entry, downloads a file named “msblast.exe”, and
executes it [25].
Traditional host-based detection focuses primarily on detecting worms by signature
matching. In particular, these detection systems maintain a database of distinctive patterns
(signatures) of malicious code, which they scan for in possibly infected systems. This
approach is fast and, until recently, was quite effective at detecting known worms. However, it is
not effective at detecting new worms, whose signatures are unknown to these detection
systems during the worms' early propagation stage. Furthermore, worm writers can
learn the signatures generated or used by these detection systems and change their worms'
signatures in order to evade detection. For example, worms such as MetaPHOR [79] and
Zmist [45] metamorphose intensively to hide themselves from detection, illustrating
the feasibility and efficiency of mutation techniques. Recent data show that current
commercial worm scanners can be easily circumvented by simple mutation techniques
[29, 30].
Since attackers always want to hide their malicious actions, they do not make their at-
tack source code publicly available. However, the attack executables are publicly available
after the attacks are captured. Unlike classical host-based detection, our intention is to
use a large number of real-world worm executables and further develop a generic detec-
tion scheme to detect new worms. For this purpose, dynamic program analysis plays an
important role and is introduced in the following subsection.
5.2.2 Program Analysis
Unlike static program analysis, dynamic program analysis does not require the exe- cutable’s source code, but dynamic analysis must be performed by executing the program
[42, 72]. Most dynamic program analysis methods, such as debugging, simulation, binary instrumentation, execution tracing, and stack status tracking, are primarily used for software-engineering and compiler-optimization purposes. Recently, interest in dynamic program analysis has also arisen for vulnerability and “security hole” detection purposes. However, some dynamic-analysis approaches, such as debugging, are only suitable for analyzing individual executables with human expertise, or are only fit for specific attacks
[44, 100]. For our work, we need an appropriate dynamic program analysis method to investigate the run-time behavior of worm and benign executables in order to detect worms. The method we adopt is to trace system calls during program execution, which is a type of execution tracing. In particular, we trace the operating system calls invoked by programs during their execution. This method can automatically record the information of interest during execution, so that we can further investigate executables’ behavior in the course of worm detection.
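As an illustration only: the dissertation traces Windows system calls inside a VMware guest, while the closest commodity analogue on Linux is strace. A sketch of the tracing and trace-parsing step might look as follows; the helper names are our own, not the dissertation's tooling.

```python
import subprocess

def run_strace(command, log_path):
    # Run one executable under strace: -f follows child processes,
    # -o writes the trace to a file (Linux analogue of tracing a guest).
    subprocess.run(["strace", "-f", "-o", log_path] + command, check=False)

def parse_trace(lines):
    """Extract bare system-call names from strace output lines such as
    '1234  openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY) = 3'.
    Lines without a '(' (e.g. exit notices) are skipped."""
    calls = []
    for line in lines:
        head = line.split("(", 1)[0]
        if "(" in line and head.strip():
            calls.append(head.split()[-1])
    return calls
```

The resulting call-name sequence is exactly the input the later n-gram feature extraction expects.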
5.2.3 Data Mining
Data mining refers to the process of extracting “knowledge,” or meaningful and useful information, from large volumes of data [40, 54]. This is achieved by analyzing data from different perspectives to find inherent hidden patterns, models, relationships, or any other information that can be applied to new datasets. It includes algorithms for classification, clustering, association-rule mining, pattern recognition, regression, and prediction, among others. Data-mining algorithms and tools are widely adopted in a range of applications as well as in the computer-security field. In particular, various data-mining technologies are adopted in different threat-detection approaches as described in Section 5.7. In our work,
we use classification algorithms to differentiate between worm and benign program executions in order to provide accurate worm detection against both seen and unseen worms.
5.3 Polymorphic Worms
Although numerous efforts have been made to detect active worms, evolving worms are using metamorphism to circumvent existing worm detection schemes. While most existing host-based worm detection algorithms use the signatures of previously seen worms to determine whether an encountered executable is a worm, polymorphic worms attempt to change their binary representations, and thus their signatures, during propagation, so that they always appear unseen to worm detectors and are able to evade detection.
In fact, these polymorphic techniques are not new; they have long been used in viruses [79, 91, 117]. Recently, active worms have also shown the trend of utilizing them [63]. Furthermore, technologies for mutating worm code are publicly available, even as open-source toolkits or libraries
[37, 65, 107]. Attackers can easily use them to make their worms polymorphic and hard to detect with signature-based worm detection. The use of automatic encryption and decryption makes worm polymorphism even more feasible and efficient.
Polymorphic worms are dangerous and potent evolving widespread Internet attacks that pose a serious threat to the Internet due to their effectiveness in evading existing host-based worm detection. The worm detection approach proposed in this chapter aims to address this threat by using the dynamic properties of executables, instead of static signatures, to capture worm executables. Because we do not use the binary representation as the feature to distinguish worms from benign executables, the mutation techniques used by polymorphic worms have no impact on our approach. As shown in Section 5.5, our
dynamic program analysis based approach is effective in detecting unseen worms, including brand-new worms and polymorphic worms.
5.4 Worm Detection via Mining Dynamic Program Execution
5.4.1 Framework Overview
Recall that the focus of this chapter is to use a large number of real-world worm exe-
cutables and subsequently develop an approach to detect new worms. In this section, we
introduce the framework of our system for dynamic program analysis that detects worm
executables based on mining system-call traces of a large number of real-world worm and
benign executables. In general, this mining process is referred to as the off-line classifier
learning process. Its purpose is to learn (or train) a generic classifier that can distinguish worm executables from benign ones based on system-call traces. We then use the learned classifier, with appropriate classification algorithms, to determine with high accuracy whether an unknown executable belongs to the worm class or the benign class. This process is referred to as the on-line worm detection process. The basic workflow is illustrated in Fig. 5.1 and Fig. 5.2 and explained below.
(1) Collect executables as data source → (2) Collect dataset by tracing system calls → (3) Extract features from system-call traces → (4) Learn the classifier

Figure 5.1: Workflow of the off-line classifier learning
(1) Trace system calls of a new executable → (2) Extract features from its system-call trace → (3) Classify the executable with the learned classifier

Figure 5.2: Workflow of the on-line worm detection
Off-line Classifier Learning
1. Data Source Preparation
Before we can begin dynamic program analysis and profile the behavior of worm
and benign executables, we need to collect a large number of such executables as
the data source for our study. These executables are labeled into two classes: worm
executables and benign executables. The worms are obtained from the Web site VX
Heavens (http://vx.netlux.org).
2. Dataset Collection - Dynamic Properties of Executables
With the prepared data source, we discuss how to collect the dataset, which we refer
to as the dynamic properties of executables. Recall that in order to accurately distin-
guish worm executables from benign ones, we need to collect data that can capture
the fundamental behavior differences between them—the dynamic properties. One
feasible and efficient method we choose is to run the executables and trace the run-
time system-call sequences during their execution. However, executing worms might
damage the host operating systems or even the computer hardware. In order to solve
this problem in our experiments, we set up virtual machines as the testbed. Then we
launch each executable in our data source and record its system-call trace during the
execution on the virtual machine. We refer to the collection of the system-call traces
for each executable in our data source as the dataset. We split the dataset into two
parts: the training set and the test set. With the training set, we will apply classifica-
tion learning algorithms to learn the classifier. The concrete format and content of the
classifier is determined by the learning algorithms adopted. With the test set, we will
further evaluate the accuracy of the learned classifier with respect to the classification
of new and unidentified executables.
3. Feature Extraction
With the collection dataset comprising system-call traces of different executables, we
extract all the system-call sequence segments with a certain length. These segments
are referred to as n-grams, where n is the length of the sequence, i.e., the number of
system calls in one segment. These n-grams can map to relatively independent and
meaningful actions taken during the program execution, or the executables’ program
blocks. We intend to use these n-grams to capture the behaviors of common worms
and benign executables. Hence these n-grams are the features for classifying worms
and benign executables, and each distinct n-gram represents a particular feature in
our classification.
4. Classifier Learning
From the features we extract from the training dataset, we need to learn a classifier
to distinguish between worms and benign executables. When we select the clas-
sification algorithm, we need to consider the learned classifier’s accuracy as well
as its interpretability. Some classifiers are easy to interpret and the classification
(i.e., the decision rule of worm detection) can be easily extracted from the classifier
[106]. Then worm writers can use the rules to change their worms’ behavior and
consequently evade detection, similar to self-mutating worms that metamorphose
to defeat signature-based detection [20]. Thus, we need classifiers with very low
interpretability. In our case, we consider two algorithms, the Naive Bayes-based
algorithm and the Support Vector Machine (SVM) algorithm, and compare their per-
formance. While the Naive Bayes-based algorithm is simple and efficient in classifier
learning, SVM is more accurate. More importantly, SVM learns a black-box classi-
fier that is hard for worm writers to interpret.
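To make the learning step concrete, here is a minimal, self-contained sketch: binary bag-of-n-gram feature vectors plus a hand-rolled Bernoulli Naive Bayes with Laplace smoothing. This stands in for the dissertation's Naive Bayes variant only; all names are ours, and the SVM alternative would come from an off-the-shelf library and is not reproduced here.

```python
import math

def feature_vector(trace, n, vocabulary):
    # Map one system-call trace to a binary bag-of-n-grams vector over a
    # fixed vocabulary of n-grams collected from the training set.
    grams = {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}
    return [1 if g in grams else 0 for g in vocabulary]

class NaiveBayes:
    """Minimal Bernoulli Naive Bayes over binary n-gram features."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        self.cond = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # Laplace smoothing so unseen features never zero out a class.
            self.cond[c] = [(sum(col) + 1) / (len(rows) + 2)
                            for col in zip(*rows)]
        return self

    def predict(self, x):
        # Pick the class with the highest log-posterior.
        def log_post(c):
            return math.log(self.prior[c]) + sum(
                math.log(p if bit else 1 - p)
                for bit, p in zip(x, self.cond[c]))
        return max(self.classes, key=log_post)
```

On-line detection then amounts to tracing a new executable, building its feature vector over the same vocabulary, and calling `predict`.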
On-line Worm Detection
Having learned the classifier in the off-line process, we now describe how it is used to carry out on-line worm detection. In this process, we intend to automatically detect a new and unseen executable. In particular, we follow the same procedure as in the off-line pro- cess, in which system-call traces of an unknown executable are recorded and classification features (i.e., system-call sequence segments with certain lengths) are extracted during its execution. Then the classification algorithm is applied with the learned classifier to classify the new executable as a worm or a benign program.
In fact, the aforementioned worm detection actually depends on the accuracy of the classifier. In order to evaluate it, we use it to classify the executables in the test set. Since we already know the class label of these executables, we can simply compare the classification results from the learned classifier with the pre-known labels. As such, the accuracy of our classifier can be measured.
In the following sections, we will present the major steps above, i.e., dataset collection, feature extraction, classifier learning and on-line worm detection in detail, followed by experiment results.
5.4.2 Dataset Collection
In this section, we present the details on how we obtain the dataset, i.e., the dynamic program properties of executables in the form of system call traces.
Worm Execution with Virtual Machine
In order to study the run-time behavior of worms and benign executables for worm detection, we need to execute the benign executables as well as the worms. However, worms might damage the operating system and even the hardware of the hosts. In order to solve this problem, we set up virtual machines (VMs) [55, 80] as the testbed. The VM we choose is VMware [55].
Even with VMs, two difficulties can still arise during data collection because of the worm execution. First, since worms can crash the operating system (OS) in the VM, we might have to repeatedly re-install the OS. In order to avoid these tedious re-installations, we install all necessary software for our experiments and store all our worm executables on the VM, and then save the image file for that VM. Whenever the VM OS crashes, we can clone the identical VM from the image file to continue our experiment. Second, it is difficult to obtain the system-call traces from the VM after it crashes. In order to solve this problem, we set the physical machine on which a VM is installed as the network neighbor of the VM through the virtual network. Thus, during worm execution, the VM automatically outputs the system-call trace to the physical machine. Although the physical machine can be attacked by the worms on the VM because of this virtual network, we protect the physical machine with anti-virus and other security software and impose very restrictive access controls.
System-Call Trace
Recall that we choose dynamic properties of executables to capture their behavior and, more accurately, distinguish worms from benign executables. There are multiple dynamic program analysis methods [42, 72] that can be used to investigate the dynamic properties of executables. The most popular methods are debugging and simulation. However, they must be used manually with human expertise to study program behavior. In our case, their human-intervention requirement makes them unsuitable for automatic analysis. Still, execution tracing is a good method for automatic analysis, as it can automatically record run-time behavior of executables. In addition, it is easy to analyze the system-call trace using automatic analysis algorithms.
There are several different ways to carry out execution tracing. In our case, we choose to trace the system calls of worm and benign executables and use the traces to perform classification (and hence worm detection). The reasons are straightforward. Tracing all Microsoft Windows Application Programming Interface (API) functions can capture more details about the run-time behavior of executables. However, in comparison with tracing only system calls, API tracing increases OS resource consumption and interferes with the execution of other programs. This is because there are far fewer system calls (311 for all Windows versions together [73], 293 for the Linux 2.6 kernel [56]) than there are
APIs (over 76,000 for Windows versions before Vista [119] and over 1,000 for Linux [98]).
Hence, we choose to trace only system calls to facilitate “light-weight” worm detection.
5.4.3 Feature Extraction
Features are key elements of any anomaly-based detection or classification. In this
section, we describe the method to extract and process the features that are used to learn
the classifier and carry out worm detection.
N-gram from System-Call Trace
System-call traces of executables are the system-call sequences (time series) of the
execution, which contain temporal information about the program execution and thus the
respective dynamic behavior information. In our system, we need to extract appropriate
features that can capture common or similar temporal information “hidden” in the system-
call sequences of all worm executables, which is different from the temporal information
hidden in the system-call sequences of all benign executables.
The n-gram is a well-accepted and frequently adopted temporal feature in various areas
of (statistical) natural language processing and genetic sequence analysis [69]. It also fits
our temporal analysis requirement. An n-gram is a subsequence of n items from a given
sequence. For example, if a system-call sequence is {NtReplyWaitReceivePortEx, NtOpenKey, NtReadVirtualMemory, NtCreateEvent, NtQuerySystemInformation}, then the 3-grams from this sequence are {NtReplyWaitReceivePortEx, NtOpenKey, NtReadVirtualMemory}, {NtOpenKey, NtReadVirtualMemory, NtCreateEvent}, and {NtReadVirtualMemory, NtCreateEvent, NtQuerySystemInformation}.
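The sliding-window extraction illustrated above can be sketched in a few lines (an illustrative helper, not the dissertation's implementation):

```python
def extract_ngrams(syscall_trace, n):
    """Slide a window of length n over a system-call trace and return every
    n-gram (contiguous subsequence of n calls)."""
    return [tuple(syscall_trace[i:i + n])
            for i in range(len(syscall_trace) - n + 1)]
```

Applied to the five-call example above with n = 3, this yields exactly the three 3-grams listed in the text.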
We use n-grams as the features in our system for the following reasons. Imagine the
difference between one line of source code and one block of source code in a program. The
line of code provides little meaningful information about a program, but the block of code
usually represents a meaningful and self-contained small task in a program, which is the
logical unit of programming. Similarly, one system call only provides very limited infor-
mation about the behavior of an executable, whereas a segment of system calls might rep-
resent a meaningful and self-contained action taken during the program execution. Worm
and benign executables have different behaviors, and this difference can be represented as
the difference between their source code blocks, or the segments (i.e., n-grams) of their
system calls. Hence, we use these system-call segments, or the n-grams, as the features to
classify worm and benign executables, which proves to be very effective throughout our
experiments as described in Section 5.5.
Length of N-gram
A natural question is: what n-gram length is best for distinguishing worms from benign
executables? On one hand, in order to capture the dynamic program behavior, n should
be greater than 1. Otherwise, the extracted 1-gram list is actually the list of system calls
invoked by the executables. This special case is the same as the method used by static pro-
gram analysis to detect worms, which has no dynamic run-time information of executables.
On the other hand, n should not be very large for the following two reasons. First, if n
is too large, it is very unlikely that we will find common or similar n-grams among different
worm executables. In one extreme case, when n becomes very large, the n-grams are no
longer small tasks; instead, they encompass the entire execution of the programs. Because different worms cannot share the exact same sequence of system-call invocations (otherwise they would be the same worm), the classifier learning algorithms cannot find a common feature
(i.e., the same system-call invocations) among them and thus cannot define a class that includes all the worms. In this case, the classification will not work. Second, if n is too large, the number of possible distinct n-grams—311^n for MS Windows, which has 311 system calls, and 293^n for Linux, which has 293 system calls—will be too large to analyze in practice. We investigate the impact of n-gram length on worm detection in our experiments and report the results in Section 5.5.
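To make this growth concrete, a quick check (our own arithmetic, using the system-call counts cited above) shows how quickly the space of possible n-grams explodes:

```python
# Number of possible distinct n-grams for n = 1, 2, 3,
# using 311 system calls for Windows and 293 for Linux.
windows_ngrams = [311 ** n for n in range(1, 4)]
linux_ngrams = [293 ** n for n in range(1, 4)]
# Already at n = 3 there are over 30 million possible Windows 3-grams.
```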
5.4.4 Classifier Learning and Worm Detection
In this section, we describe the details of the last step in the off-line classifier learning process (i.e., how to apply the classifier learning algorithm to learn the classifier after extracting the features). In particular, we use two classification algorithms: the Naive Bayes algorithm, which is a simple but popular learning algorithm, and the Support Vector Machine (SVM) algorithm, which is a more powerful but more computationally expensive learning algorithm. We also discuss how to conduct on-line worm detection with each of the algorithms in detail.
Naive Bayes based Classification and Worm Detection
The Naive Bayes classifier (also known as the Simple Bayes classifier) is a simple probabilistic classifier based on applying Bayes’ Theorem [54]. In spite of its naive design, the Naive Bayes classifier may perform better than more sophisticated classifiers in some cases, and it can be trained very efficiently with a labeled training dataset. Nevertheless, in order to use the Naive Bayes classifier, one must make the assumption that the features used in the classification occur independently.
In our case, we use the Naive Bayes classifier to calculate the likelihood that an executable is a worm executable (i.e., in the worm class) and the likelihood that it is a benign executable (i.e., in the benign class). Then, based on which of the two classes has the larger likelihood, the detection decision is made.
1. Off-line Classifier Learning
We represent each executable by an m-dimensional feature vector, X = (x1, x2, . . . , xm),
where m is the number of distinct n-grams in the dataset and xi (i = 1, . . . , m) indicates
whether the i-th distinct n-gram appears in the executable's system-call trace: xi = 1 if it
appears and xi = 0 otherwise. We have two classes: the worm class Cw and the benign
class Cb. Given the feature vector X of an unknown executable, we need to predict
which class X belongs to. The prediction is done as follows. First, for each class, we
calculate the likelihood that the executable belongs to that class. Second, we make a
decision based on the larger likelihood value, i.e., the executable belongs to the class
that has the larger likelihood.
Actually, the off-line “classifier” learning process of the Naive Bayes algorithm is
the preparation for the calculation of the above two likelihoods. In particular, this
preparation is the calculation of some statistical probabilities based on the training
data. These probabilities are the conditional probabilities of each n-gram—say,
xi—given each class, Cw and Cb. Hence, the off-line “classifier” learning
process in our Naive Bayes classification is actually the calculation of P(xi|Cj), i =
1, . . . , m, j = w or b, based on the training dataset.14
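The off-line step can be sketched as follows. This is an illustrative Python sketch, not the dissertation's implementation; in particular, the Laplace smoothing (alpha) is our addition, made so that unseen n-grams do not produce zero probabilities:

```python
from collections import defaultdict

def learn_naive_bayes(vectors, labels, num_features, alpha=1.0):
    """Estimate P(x_i = 1 | C_j) for each feature i and class j from
    binary feature vectors, with Laplace smoothing (our addition)."""
    counts = {c: [0] * num_features for c in set(labels)}
    totals = defaultdict(int)
    for vec, c in zip(vectors, labels):
        totals[c] += 1
        for i, bit in enumerate(vec):
            counts[c][i] += bit
    # Smoothed relative frequency of each n-gram within each class.
    return {c: [(counts[c][i] + alpha) / (totals[c] + 2 * alpha)
                for i in range(num_features)]
            for c in counts}
```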
2. On-line Worm Detection
During the on-line worm detection, for each unknown executable, the feature vector
X for that executable is built first. Then we predict that X belongs to the class that has
a higher posterior probability, conditioned on X. That is, the Naive Bayes classifier
14In some implementations, the classifier learning based on the Naive Bayes algorithm may conduct extra procedures, such as selection of features and cross-validation, but they are not the core procedures for the Naive Bayes algorithm.
assigns an unknown sample X to the class Cj if and only if
P(Cj|X) > P(Ck|X), where j, k = w or b, j ≠ k. (5.1)
Based on Bayes’ Theorem, P (Cj|X) can be calculated by
P(Cj|X) = P(X|Cj)P(Cj) / P(X). (5.2)
In order to predict the class of X, we will calculate P (X|Cj)P (Cj) for j = w or b
and consequently compare P (Cw|X) to P (Cb|X). Now we discuss how to calculate
P (X|Cj)P (Cj). First, if the class prior probabilities P (Cw) and P (Cb) are unknown,
then it is commonly assumed that the classes are equally likely, i.e., P (Cw) = P (Cb).
Otherwise, P (Cj) can be estimated by the proportion of class Cj in the dataset. Sec-
ond, as we assume the features are independent, P (X|Cj) can be calculated by
P(X|Cj) = Π_{i=1}^{m} P(xi|Cj), (5.3)
where P (xi|Cj) can be calculated during the off-line classifier learning process.
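A minimal sketch of the on-line decision rule of Equations (5.1)–(5.3). Working in log space is our addition for numerical stability; it does not change which class maximizes the product:

```python
import math

def nb_predict(x, probs, priors):
    """Pick the class maximizing P(C_j) * prod_i P(x_i | C_j),
    comparing log-likelihoods instead of raw products."""
    best, best_score = None, -math.inf
    for c, p in probs.items():
        score = math.log(priors[c])
        for bit, pi in zip(x, p):
            # P(x_i = 1 | C_j) = pi, so P(x_i = 0 | C_j) = 1 - pi.
            score += math.log(pi if bit else 1.0 - pi)
        if score > best_score:
            best, best_score = c, score
    return best
```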
3. Discussion
The Naive Bayes classifier is effective and efficient in many applications. The theo-
retical time complexity for learning a Naive Bayes classifier is O(Nd), where N is
the number of training examples and d is the dimensionality of the feature vectors.
The complexity of classification for an unknown example (an unknown executable
in our case) is only O(d).
However, the Naive Bayes classifier has two limitations in our case. First, worm writ-
ers can use it to make worm detection less effective for new worms. In our approach,
the learned classifier consists of a set of probabilities that the n-grams appear in each
class. Worm writers can directly use this information to make new worms similar to benign executables
by either using or avoiding certain n-grams (system-call sequences). Second, high
accuracy of the Naive Bayes classifier is based on the assumption that the features
are independent of each other. However, in reality, the n-grams in the system-call
trace of an executable may not be independent. In order to address these problems
of Naive Bayes classifier, we use the Support Vector Machine (SVM) in our worm
detection as described in the following subsection.
Support Vector Machines based Classification and Worm Detection
The Support Vector Machine (SVM) is a type of learning machine based on statistical
learning theories [121]. SVM-based classification includes two processes: classifier learn-
ing and classification. Classifier learning is the learning of a classifier or model using the
training dataset. The learned classifier is used to determine or predict the class “label” of
instances that are not contained in the training dataset. The SVM is a sophisticated and
accurate classification algorithm. It is computationally expensive and its trained classifier
is difficult to interpret. Its outstanding accuracy and low interpretability match our requirements: accurate worm detection, and difficulty of interpretation for worm writers.
1. Off-line Classifier Learning
A typical SVM classifier-learning problem is to label (classify) N training data
{x1, . . . , xN}15 as positive or negative, where xi ∈ R^d (i = 1, . . . , N) and d
is the dimensionality of the samples. Thus, the classification result is {(x1, y1), . . . , (xN, yN)},
15The SVM algorithm can be extended to classification for more than two classes, but the two classes are the typical and basic cases. Our problem is a two-class classification problem.
yi ∈ {−1, +1}. In our case, xi is the feature vector built for the i-th executable in our dataset. That is, xi = (xi,1, . . . , xi,d), where d is the number of distinct n-grams and xi,j
(j = 1, . . . , d) indicates the j-th n-gram: xi,j = 1 if the j-th n-gram appears in the i-th executable's system-call trace and xi,j = 0 otherwise. yi = −1 means that xi belongs to the worm class and yi = +1 means that xi belongs to the benign-executable class. As we have a large number of features (n-grams), the dimensionality of the Euclidean space in our classification problem is very large (upper bounded by 311^n, depending on the n-gram length n).
There are two cases for SVM classifier-learning problems: (1) the samples in the two classes are linearly separable; (2) the samples in the two classes are not linearly separable. Case (2) holds for most real-world problems. In the SVM, in order to achieve an optimal classifier, the non-linearly-separable problem of case (2) first needs to be transformed into a linearly-separable problem as in case (1). Then the optimal classifier can be learned through linear optimization [121, 122]. In the following, we first present the algorithm for the simple case (case (1)), followed by the algorithm for case (2).
1) Classes are linearly separable
If the two classes are linearly separable, then we can find a hyperplane to separate the examples in two classes as shown in the right side of Fig. 5.3. Examples that belong to different classes should be located on different sides of the hyperplane. The intent of the classifier learning process is to obtain a hyperplane which can maximally separate the two classes.
Mathematically, if the two classes are linearly separable, then we can find a hyperplane w · x + b = 0, with a normal vector w and an intercept b, that satisfies the following constraints:
w · xi + b ≥ +1 for yi = +1 and (5.4)
w · xi + b ≤ −1 for yi = −1, (5.5)
or, equivalently,
yi(w · xi + b) − 1 ≥ 0 ∀i. (5.6)
Examples in the training set that satisfy the above constraint with equality are referred to as support vectors. The support vectors define two hyperplanes: one goes through the support vectors of the positive class, and the other goes through the support vectors of the negative class. The distance between these two hyperplanes defines a margin, and this margin is maximized when the norm of the vector w, ‖w‖, is minimized.
When the margin is maximized, the hyperplane w · x + b = 0 separates the two classes maximally, which is the optimal classifier in the SVM algorithm. The dual form of Equation 5.6 reveals that the above optimization actually is to maximize the following function:
W(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj (xi · xj) yi yj, (5.7)

subject to the constraint that αi ≥ 0. The SVM algorithm achieves the optimal classifier by finding the αi ≥ 0 for each training sample xi that maximize W(α).
2) Classes are not linearly separable
In the above case, linearly separable classes can be handled by linear optimization. However, most real-world classification problems cannot be solved by the linear optimization algorithm.
This case is illustrated on the left side of Fig. 5.3, in which no linear hyperplane (in this case, a straight line in 2-dimensional space) can separate the examples of the two classes (shown here with different colors). In other words, the required classifier must be a curve, which is difficult to optimize.
[Figure: two scatter plots, with axes “feature 1”/“feature 2” mapped by a feature mapping to “new feature 1”/“new feature 2”.]
Figure 5.3: Basic idea of kernel function in SVM.
The SVM provides a solution to this problem by transforming the original feature space into another, potentially high-dimensional, Euclidean space. Then the mapped examples in the training set can be linearly separable in the new space, as demonstrated by the right side of Fig. 5.3. This space transformation in Equation (5.7) can be implemented by a kernel function,
K(xi, xj) = Φ(xi) · Φ(xj), (5.8) where Φ is the mapping from the original feature space to the new Euclidean space.
We only need to use K in the classifier training process with Equation (5.7), and never need to know explicitly what Φ is. The SVM kernel function can be linear or non-linear. Common non-linear kernel functions include the
135 Polynomial Function, Radial Basis Function (RBF), and Sigmoid Function, among
others.
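As a concrete illustration of Equation (5.8): for the homogeneous quadratic kernel K(x, y) = (x · y)^2 in two dimensions, the explicit map is Φ(x) = (x1^2, √2·x1·x2, x2^2). The sketch below (our example, not from the dissertation) verifies the identity K(x, y) = Φ(x) · Φ(y) numerically:

```python
import math

def quad_kernel(x, y):
    # Homogeneous quadratic kernel in 2-D: (x . y)^2.
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    # Explicit feature map corresponding to the quadratic kernel.
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
k_val = quad_kernel(x, y)        # computed directly in the original space
mapped = dot(phi(x), phi(y))     # computed via the explicit mapping
```

The kernel evaluates the inner product in the mapped space without ever materializing Φ, which is exactly why high-dimensional (even infinite-dimensional) mappings remain tractable.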
2. On-line Worm Detection
On-line worm detection is the classification of new executables using the SVM classification algorithm along with the optimal SVM classifier learned during the previously discussed off-line learning process.
For an unknown executable (a worm or benign executable), its feature vector xk must
be built first. The method is the same as the aforementioned process for the executables
bles in the training set, i.e., the system-call trace during the execution is recorded,
then the n-grams with a certain value of n are extracted. Afterwards, the feature
vector xk is formed from the trace of the executable using the same method as in the
off-line classifier learning process.
Recall that during the classifier learning process, the optimal hyperplane is found.
Then, for a new example xk, shown as the white circle in Fig. 5.3, the on-line classification checks on which side of the optimal hyperplane xk lies. Mathematically, the classification is conducted by assigning a class to the executable via
C(xk) = sign(w · xk + b), (5.9)
where
w = Σ_{i=1}^{N} αi yi xi. (5.10)
If C(xk) is negative, we predict that the executable is a worm (recall that yi = −1 labels the worm class). Otherwise, we predict that it is benign.
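Equations (5.9) and (5.10) can be sketched as follows, assuming the dual coefficients αi, labels yi, training vectors xi, and intercept b come from an already-solved SVM training problem (this sketch does not perform that training):

```python
def svm_decide(x_new, alphas, ys, xs, b):
    """Classify via C(x) = sign(w . x + b), with
    w = sum_i alpha_i * y_i * x_i built from the training set."""
    d = len(x_new)
    w = [sum(a * y * x[j] for a, y, x in zip(alphas, ys, xs))
         for j in range(d)]
    score = sum(wj * xj for wj, xj in zip(w, x_new)) + b
    return 1 if score >= 0 else -1
```

In practice only the support vectors have αi > 0, so the sum runs over a small subset of the training set.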
3. Complexity of SVM
The classifier learning process of SVM is relatively time-consuming because of the
large volume of the training set, the high-dimensionality of our feature space, and
the complexity of classifier calculation and optimization. No matter which kernel
function is used, for N training examples with feature vectors of dimensionality d and
NS support vectors, the SVM classifier learning algorithm has complexity O(NS^3 +
NS^2·N + NS·d·N). However, the SVM classification process for each new executable
is fast and involves only limited calculations. Its complexity is O(MNS), where M
is the complexity of the kernel function operation. For Radial Basis Function (RBF)
kernel functions, M is O(d).
4. Black Box Characteristics of the SVM Classifier
The classifier learned by the SVM can be easily used to carry out worm detection.
However, the SVM classifier is hard to interpret. The SVM classifier learning
algorithm generates black-box models (classifiers) in the sense that they cannot
explain their decisions in an understandable form [93, 16, 46]. Thus, from the SVM
classifier, it is hard to extract decision rules comprehensible in the original problem
domain, especially for the non-linear SVM, due to the feature-space transformation
introduced by kernel functions.
The above characteristic of the SVM is a well-known limitation for applications in
which one needs to know the decision rules and map them back to the physical
entities in the original problem domain. However, this characteristic helps us
prevent worm writers from interpreting and learning from the classifier. We want
to prevent a worm writer from obtaining the signature of his worms or of any benign
executable. Otherwise, the worm writer could disguise new worms as benign
executables.
Besides the optimization algorithm used in the SVM, the learned classifier also depends
on the definition of the input feature space, the selection of the kernel function, the
parameters of the kernel function, and so on, all of which are unknown to worm writers.
The worm writer does not know:
• the value of n of the n-gram used in the classifier,
• the mapping between n-grams and feature indices in the feature vector,
• the definition of the kernel function,
• the parameters of the kernel function,
• the space transformation introduced by the kernel function.
Hence, even if the worm writer knows that we use the SVM and is able to obtain the
classifier, it is hard for him to interpret the classifier and discover the decision rules
we use to distinguish between worms and benign executables. Thus, it is hard for him
to change the worm behavior accordingly to evade our detection. Furthermore, we
can protect the classifier with mechanisms such as encryption.
5.5 Experiments
In this section, we first present our experimental setup and metrics, and then we report
the results of our experiments.
5.5.1 Experiment Setup and Metrics
In our experiments, we use 722 benign executables and 1589 worms in Microsoft Win-
dows or DOS Portable Executable (PE) format as the data source, though our approach
works for worm detection on other operating systems as well. We use this data source to
learn the generic worm classifier and further evaluate the trained classifier to detect worms.
The executables are divided into two classes: worm and benign executables. The worms
are obtained from the Web site VX Heavens (http://vx.netlux.org); they include e-mail worms, peer-to-peer (P2P) worms, Instant Messenger (IM) worms, Internet Relay
Chat (IRC) worms and other, non-classified worms. The benign executables in our exper- iments include Microsoft software, commercial software from other companies and free,
“open source” software. This diversity of executables enables us to comprehensively learn classifiers that capture the behavior of both worm and benign executables. We use 80% of each class (worm and benign) as the training set to learn the classifiers. We use the remaining 20% as the test set to evaluate the accuracy of the classifiers, i.e., the performance of our detection approach.
We install MS Windows 2000 Professional with Service Pack 4 on our virtual machines
(VMs). On these VMs, we launch each collected executable and use strace for Windows
NT [33] to trace their system calls for 10 seconds.16 From the trace file of each executable,
we extract the system-call name sequences in temporal order. Then we obtain the
segments of system calls (i.e., the n-grams) for different values of n for each executable.
Afterwards, we build the vector inputs for the classification learning algorithms.
16We launch the executables in the dataset for a longer time and then use a sliding window to capture traces of a certain length for the classifier training. We found that a 10-second trace suffices to provide high detection accuracy.
Recall that the classification in our worm detection problem is in a high-dimensional
space. The large number of dimensions and features cannot be handled, or cannot be
handled efficiently, by many data-mining tools. We choose the following data-mining
tools: Naive Bayes classification tools from University of Magdeburg in Germany [28]
and svm light [57]. Both tools are implemented in the C language and thus have ef-
ficient performance, especially for high-dimensional classification problems. When we apply the SVM algorithm with svm_light, we choose the Gaussian Radial Basis Function
(Gaussian RBF), which has been proven to be an effective kernel function [54]. The feature
distribution is a Gaussian distribution. The Gaussian RBF is of the form
K(xi, xj) = e^{−γ‖xi − xj‖²}, (5.11)
which means Equation (5.8) must be replaced by Equation (5.11) in the classifier learning
process and the on-line worm detection process. The value of γ is optimized through
experiments and comparison.
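Equation (5.11) is straightforward to evaluate; a small sketch (illustrative only):

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel of Equation (5.11):
    K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

Note that K(x, x) = 1 for any x and the kernel is symmetric; γ controls how quickly similarity decays with distance.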
In order to evaluate the performance of our classification for new worm detection, we
use two metrics: Detection Rate (PD) and False Positive Rate (PF). In particular, the detection rate is defined as the probability that a worm is correctly classified. The false positive rate is defined as the probability that a benign executable is mistakenly classified as a worm.
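The two metrics can be computed as in the following sketch (the label strings are our choice for illustration):

```python
def detection_metrics(y_true, y_pred):
    """Detection rate P_D: fraction of worms classified as worms.
    False positive rate P_F: fraction of benign executables
    mistakenly classified as worms."""
    worm_preds = [p for t, p in zip(y_true, y_pred) if t == "worm"]
    benign_preds = [p for t, p in zip(y_true, y_pred) if t == "benign"]
    p_d = sum(p == "worm" for p in worm_preds) / len(worm_preds)
    p_f = sum(p == "worm" for p in benign_preds) / len(benign_preds)
    return p_d, p_f
```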
5.5.2 Experiment Results
In this subsection, we report the performance of our worm detection approaches. The
results of Naive Bayes- and SVM-based worm detection in terms of Detection Rate and
False Positive Rate under different n-gram lengths (n) are shown in Tables 5.1 and 5.2, respectively.
n-gram length (n)          1       2       3       4       5       6
Detection Rate (PD)        69.8%   81.4%   85.0%   90.9%   93.6%   96.4%
False Positive Rate (PF)   33.2%   18.6%   11.5%   8.89%   6.67%   6.67%

Table 5.1: Detection results for the Naive Bayes based detection
n-gram length (n)          1       2       3       4       5       6
Detection Rate (PD)        89.7%   96.0%   98.75%  99.5%   99.5%   99.5%
False Positive Rate (PF)   33.3%   18.75%  7.14%   4.44%   2.22%   2.22%

Table 5.2: Detection results for the SVM based detection
Effectiveness of Our Approaches
We conclude that our approaches, using both the Naive Bayes and SVM algorithms,
detect worms with a high detection rate and a low false positive rate when
the length of the n-grams is reasonably large. For example, when the length of the n-grams is
5, the detection based on the SVM algorithm achieves a 99.5% detection rate and a 2.22%
false positive rate, and the detection based on the Naive Bayes algorithm achieves a 93.6% detection rate and a 6.67% false positive rate, respectively.
We also conclude that SVM-based detection performs better than Naive Bayes-based
detection in terms of both detection rate and false positive rate. There are two reasons for
this. First, the Naive Bayes classification assumes that features are independent, which may
not always be the case in reality. Second, the Naive Bayes-based classification calculates
the likelihood for classifying a new executable based on the vectors of the training set
executables in the feature space. Then it simply predicts the class of the new executable
based on the likelihood comparison. In contrast, the SVM attempts to optimize the classifier
(hyperplane) by finding the hyperplane that can maximally separate the two classes in the
training set.
Impacts of N-gram Length
Another important observation is that the length of the n-gram, i.e., the value of n, impacts the
detection performance. When n increases from 1 to 4, the performance keeps increasing.
When n increases further after that, the performance does not increase or increases only marginally. The reason can be explained as follows. First, when n = 1, each n-gram contains only one system call and thus contains neither dynamic system-call sequence information nor executable behavior information. Actually, this special case is that of static program analysis, which only investigates the list of system calls used by the executables.
Second, when n is larger, the n-grams contain longer system-call sequences, thereby capturing more dynamic behavior of the traced executables and hence increasing the detection performance. This also demonstrates that our dynamic program analysis approach outperforms the traditional static program analysis-based approaches.
From the above observation on the length of the n-gram, we conclude that a certain n-gram length is sufficiently effective for worm detection. This length (value of n) can be learned through experiments: when an increase of n does not yield a large detection performance gain, that value of n is good enough and can be used in practice. This method is also used in other n-gram based data mining applications. Furthermore, with respect to the efficiency of worm detection, the value of n should not be very large, as we discuss in
Section 5.4.4.
5.6 Discussions
In this chapter, we develop a worm detection approach that allows us to mine program execution, thereby detecting new and unseen worms. There are a number of possibilities for extending this work. A detailed discussion follows.
1. Classification for Different Worms
First, we discuss how to generalize our approach to support classification for dif-
ferent worms. Recall that our work in this chapter only focuses on distinguishing two
classes of executables: worm and benign. In practice, knowledge of the specific types
of worms (e.g., e-mail worms, P2P worms) can provide better ways to defend against
them. In order to classify different worms, the approach studied in Section 5.4.4 can
be extended as follows. First, we collect a training dataset including a large number
of worms labeled by their types. Second, we use the same approach discussed in
Section 5.4.4 to train the classifiers, which are capable of profiling multiple classes
according to the types. Third, trained classifiers are used to determine the type (class)
of an un-labeled new worm.
2. Detection of Smart Worms
Now, we discuss how to extend our work to detect smart worms, or worms that be-
have “intelligently.” Based on mining the execution of a large number of program
executables, our approach has proven effective at detecting new, unseen worms. In
preliminary investigations, we even found that our approach detected “smart worms”
we created. These worms are “benign” executables that were infected by worms
and hence have behaviors characteristic of both worm and benign executables. We
use them to simulate smart worms that can camouflage themselves as benign exe-
cutables. Our initial finding is promising, as we found that our approach can detect
these smart worms that possess both worm and benign behaviors. This ability of our approach
awaits further proof of its efficacy with a larger set of various types of smart worms.
Nevertheless, the detection of smart worms is still an open issue, and answers for
some questions are still unclear. For instance, how intelligent can worms be, i.e., to
what extent can worms behave similarly to benign executables while still “spreading”
themselves like “normal” worms? While this is an open problem, researchers have
begun to discuss the efficiency of worms and detection evasion [96, 143]. Worms can
be very intelligent in order to evade detection, but their “hiding” mechanisms may
diminish their efficiency. More importantly, so long as they are worms, they must
manifest behavior different from that of benign executables. This provides
opportunities for us to further investigate smart worm detection.
3. Integration of Network-based and Host-based Detection
In this chapter, our focus is the study of host-based detection and we did not consider
information about the traffic generated by the executables during the worm detection.
As we know, a worm executable will expose multiple behaviors, such as generating
scan traffic (i.e., messages that intend to identify vulnerable computers) and conduct-
ing malicious acts on the infected computers. Since these worm behaviors are ex-
posed from different perspectives, consideration of multiple behaviors could provide
more accurate worm detection. In fact, traffic generated by worms can also be clas-
sified and used to distinguish them from normal traffic. For instance, the distribution
of destination IP addresses in network traffic can provide accurate worm detection
through traffic analysis [141]. Hence one ongoing work is to combine the traffic logs
and system calls generated by the worms and benign executables. The integration of
traffic and system calls can yield more reliable classifiers to detect worms.
5.7 Related Work
In this section, we review existing work related to our study, including worm
detection and the application of data mining to security research.
Since worm attacks have always been very dangerous threats to the Internet, much ef-
fort has gone into studying, analyzing, and modeling worms. For example, Staniford et
al. in [114] studied various worms and modeled their propagation using a continuous-time
epidemiology model. There has also been extensive work on the propagation of specific
worms [86, 148]. These analysis and modeling results help researchers better understand
worm behaviors and further develop detection schemes.
As we mentioned, there are two types of worm detection systems: network-based de-
tection and host-based detection. For network-based worm detection, many schemes have
been proposed in the literature. For example, payload signature-based detection
examines specific byte-sequence segments in the payload of worm scan traffic [112]. Tradi-
tionally, these payload signatures are manually identified by security experts through care-
ful analysis of the byte sequences from captured network traffic. Some efforts have been made
to automatically generate payload signatures [112, 62]. There are other network-based
detection schemes based on network traffic analysis. For example, Jung et al. in [59] developed a
threshold-based detection algorithm to identify the anomaly of scan traffic generated by a
computer. Venkataraman et al. and Weaver et al. in [123, 132] proposed schemes to ex-
amine statistics of scan traffic volume; Zou et al. presented a trend-based detection scheme
to examine the exponential increase pattern of scan traffic [147]; and Lakhina et al. in [67]
proposed schemes to examine other features of scan traffic, such as the destination address distribution. There is also other work studying worms that attempt to pose new traffic patterns to avoid detection [141, 96].
For the host-based detection, many schemes have been proposed in the literature. For example, a binary text scan program was developed to extract the human-readable strings from the binary, which reveal information about the function of the executable binary [1].
Wagner et al. in [124] proposed an approach that analyzes program executables and gen- erates a non-deterministic finite automaton (NDFA) or a non-deterministic pushdown au- tomaton (NDPDA) from the global control-flow graph of the program. The automaton was then used to monitor the program execution on-line. Gao et al. in [47] presented an ap- proach for detecting anomalous behavior of an executing process. The basic idea of their approach is that processes potentially running the same executable should behave similarly in response to a common input. Feng et al. [43] proposed a formal analysis framework for pushdown automata (PDA) models. Based on this framework, they studied program analysis techniques, incorporating system calls or stack activities. There are other schemes that detect anomaly behavior of executables through call stack information. For example,
Cowan et al. in [34] proposed a method, called StackGuard, to detect buffer overflow attacks. The difference that distinguishes our work from theirs is that we attempt to capture the common dynamic behavioral features of worms by mining the execution of a large number of worms.
Many articles have examined the use of data mining for security research. Lee et al. in
[70] formulated a machine learning scheme based on system-call sequences of normal and abnormal executions of the Unix sendmail program. Lee et al. in [71] described a data mining framework for adaptively building intrusion detection models. The main idea of their work
is to utilize auditing programs (e.g., network logs of telnet sessions, shell command logs) to extract an extensive set of features that describe each network connection or host session, and to apply data mining techniques to learn rules that capture the behavior of intrusions and normal activities. Martin et al. in [78] proposed an approach based on learning the statistical patterns of outgoing e-mails from local hosts. Yang et al. in [136] proposed an approach that applies machine learning to automatically fingerprint polymorphic worms, which are capable of changing their appearance across every instance of their executables. Kolter et al. in [64] applied data mining techniques to extract byte sequences directly from program executables, converted these sequences into n-grams, and constructed classifiers. Julisch et al. in [58] proposed an approach to learn from historical alarms generated by intrusion detection systems. In our work, we use data mining to obtain the dynamic behavioral differences between worms and benign executables.
5.8 Summary
We proposed a new worm detection approach that is based on mining the dynamic execution of programs. Our approach is capable of capturing the dynamic behavior of executables to provide efficient and accurate detection against both seen and unseen worms.
Using a large number of real-world worms and benign executables, we ran executables on virtual machines and recorded system call traces. We applied two data mining classification algorithms to learn classifiers off-line, which are subsequently used to carry out on-line worm detection. Our data clearly showed the effectiveness of our proposed approach in detecting worms, achieving both a very high detection rate and a low false positive rate.
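The offline-learn/online-detect pipeline above can be sketched in miniature as follows. This is a toy naive Bayes over system-call bigrams; the call names, traces, and add-one smoothing are assumptions for exposition, not the actual feature set or the two classifiers used in our experiments:

```python
# Toy offline-learn / online-detect pipeline: naive Bayes over
# system-call bigrams. All traces and call names are illustrative.
from collections import Counter
from math import log

def bigrams(trace):
    """System-call bigrams of one execution trace."""
    return list(zip(trace, trace[1:]))

def train(traces):
    """Offline step: bigram counts and total for one class."""
    counts = Counter()
    for t in traces:
        counts.update(bigrams(t))
    return counts, sum(counts.values())

def loglik(model, trace, vocab_size):
    """Add-one smoothed log-likelihood of a trace under a class model."""
    counts, total = model
    return sum(log((counts[g] + 1) / (total + vocab_size))
               for g in bigrams(trace))

# Toy recorded traces (illustrative, not real recorded data).
worm_traces = [["socket", "connect", "send", "socket", "connect", "send"]]
benign_traces = [["open", "read", "write", "close", "open", "read"]]
vocab = {g for t in worm_traces + benign_traces for g in bigrams(t)}

worm_model, benign_model = train(worm_traces), train(benign_traces)

def detect(trace):
    """Online step: label a new trace by the higher class log-likelihood."""
    w = loglik(worm_model, trace, len(vocab))
    b = loglik(benign_model, trace, len(vocab))
    return "worm" if w > b else "benign"

print(detect(["socket", "connect", "send"]))  # → worm
```

The offline step is run once over the recorded traces; only the cheap per-trace scoring in `detect` happens at run time.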
Our proposed approach has the following advantages. It is practical, with low overhead during both classifier learning and run-time detection. It does not rely on investigating individual executables; rather, it examines the common dynamic properties of executables. Therefore, it can automatically detect new worms. Furthermore, our approach attempts to build a “black-box” classifier, which makes it difficult for worm writers to interpret our detection.
CHAPTER 6
CONCLUDING REMARKS
In this dissertation, we studied defense-oriented evolution, a novel evolutionary trend among widespread Internet attacks, in which defense-oriented attacks leverage knowledge of defense systems to circumvent them and increase attack effectiveness. As the most important characteristics of a defense system are its infrastructure and algorithms, we classify these attacks into two groups: infrastructure-oriented attacks and algorithm-oriented attacks.
For infrastructure-oriented widespread Internet attacks, we studied intelligent DDoS attacks which aim to infer architectural information about the infrastructure of DDoS-defending Secure Overlay Forwarding Systems (SOFS) in order to launch more efficient DDoS attacks. We further provided optimal structural configurations for SOFS systems and guidelines to enhance SOFS system performance under intelligent DDoS attacks. Additionally, we investigated another infrastructure-oriented attack, the invisible LOCalization attack (the iLOC attack for short). The iLOC attack can accurately and invisibly obtain the monitor locations in Internet Threat Monitoring (ITM) systems, enabling other attacks to evade disclosed monitors or even to abuse ITM systems and degrade their integrity and functionality.
For algorithm-oriented attacks, we studied Varying Scan Rate (VSR) worms, which deliberately vary their port-scan rate during propagation to defeat existing ITM-based worm detection schemes. We also designed the attack target Distribution Entropy based dynamiC (DEC) detection scheme to effectively detect both VSR and traditional worms. Furthermore, in order to detect new worms, including polymorphic worms (which have different signatures or purposely change signatures in order to evade host-based worm detection), we proposed a new worm detection scheme that mines dynamic program execution.
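The intuition behind entropy-based detection can be sketched as follows: a random-scanning worm spreads its connection attempts nearly uniformly over many targets, while benign traffic concentrates on a few popular destinations, so the Shannon entropy of the observed target distribution separates the two regimes. This toy computation is illustrative only; the DEC scheme's actual statistic and dynamic decision procedure are more involved, and the addresses and window sizes below are assumptions:

```python
# Toy Shannon entropy of the scan-target distribution in one observation
# window. Addresses and window contents are illustrative assumptions.
from collections import Counter
from math import log2

def target_entropy(dest_addrs):
    """Shannon entropy (in bits) of the destination-address distribution."""
    counts = Counter(dest_addrs)
    n = len(dest_addrs)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A random-scanning worm hits many distinct targets roughly uniformly...
worm_window = [f"10.0.{i}.{i}" for i in range(64)]       # 64 distinct targets
# ...while benign traffic concentrates on a few popular servers.
benign_window = ["10.0.0.1"] * 60 + ["10.0.0.2"] * 4

print(target_entropy(worm_window))    # 6.0 bits (uniform over 64 targets)
print(target_entropy(benign_window))  # well under 1 bit
```

A detector can then flag a window whose target entropy stays anomalously high regardless of how the worm modulates its scan rate, which is what defeats rate-only detection.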
While there are other forms of evolution among widespread Internet attacks, we believe the defense-oriented ones studied in this dissertation are among the most dangerous, since they deliberately and effectively counteract defense systems. As shown in the dissertation, they are also feasible and potent threats to the Internet. Our purpose is not to encourage attacks, but to obtain deep insights about potential new Internet threats and vulnerabilities within current defense systems in order to enhance defense systems and design new defenses against evolving widespread Internet attacks. We believe that the results in this dissertation lay a foundation for further research in this field.
BIBLIOGRAPHY
[1] Binary Text Scan. http://netninja.com/files/bintxtscan.zip.
[2] Internet Security News. http://www.landfield.com/isn/mail-archive/2001/Feb/0037.html.
[3] Snort, the open-source network intrusion detection system. http://www.snort.org/.
[4] W32/MyDoom.B Virus. http://www.us-cert.gov/cas/techalerts/TA04-028A.html.
[5] W32.Sircam.Worm@mm. http://www.symantec.com/avcenter/venc/data/[email protected].
[6] Worm.ExploreZip. http://www.symantec.com/avcenter/venc/data/worm.explore.zip.html.
[7] Powerful Attack Cripples Internet. Associated Press for Fox News, http://www.foxnews.com/story/0,2933,66438,00.html, October 2002.
[8] R. Agrawal, A. Evfimievski, and R. Srikant. Information sharing across private databases. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA, July 2003.
[9] R. L. Allen and D. W. Mills. Signal Analysis: Time, Frequency, Scale, and Structure. Wiley and Sons, 2004.
[10] D. Andersen. Mayday: Distributed filtering for internet services. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Seattle, WA, March 2003.
[11] D. Andersen, H. Balakrishnan, M. Kaashoek, and R. Morris. Resilient overlay networks. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada, October 2001.
[12] A. Anderson, A. Johnston, and P. McOwan. Motion Illusions and Active Camouflaging. http://www.ucl.ac.uk/ucbplrd/motion/motion middle.html.
[13] R. M. Anderson and R. M. May. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford, 1991.
[14] G. Badishi, I. Keidar, and A. Sasson. Exposing and eliminating vulnerabilities to denial of service attacks in secure gossip-based multicast. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), Florence, Italy, June 2004.
[15] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, and D. Watson. The internet motion sensor: A distributed blackhole monitoring system. In Proceedings of the 12-th IEEE Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, February 2005.
[16] N. Barakat and J. Diederich. Eclectic rule-extraction from support vector machines. In Int. Journal Computational Intelligence, volume 1, pages 59–62, 2005.
[17] M. Bellare, S. Goldwasser, and D. Micciancio. Pseudo-random number generation within cryptographic algorithms: the DSS case. In Proceedings of Advances in Cryptology (CRYPTO '97), Lecture Notes in Computer Science, Springer-Verlag, May 1997.
[18] J. Bethencourt, J. Franklin, and M. Vernon. Mapping internet sensors with probe response attacks. In Proceedings of the 14-th USENIX Security Symposium, Baltimore, MD, July-August 2005.
[19] J. Blazquez, A. Oliver, and J. M. Gomez-Gomez. Mutation and Evolution of Antibiotic Resistance: Antibiotics as Promoters of Antibiotic Resistance, volume 3. Current Drug Targets, August 2002.
[20] D. Bruschi, L. Martignoni, and M. Monga. Detecting self-mutating malware using control flow graph matching. In Proceedings of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), Berlin, Germany, July 2006.
[21] CAIDA. The Cooperative Association for Internet Data Analysis. http://www.caida.org.
[22] CAIDA. Telescope Analysis. http://www.caida.org/analysis/security/telescope.
[23] CERT. Advisory CA-1995-18 Widespread Attacks on Internet Sites. http://www.cert.org/advisories/CA-1995-18.html.
[24] CERT. CERT/CC advisories. http://www.cert.org/advisories/.
[25] CERT. Advisory CA-2003-20 W32/Blaster worm. http://www.cert.org/advisories/CA-2003-20.html, 2003.
[26] S. Chen and R. Chow. A new perspective in defending against DDoS. In Proceedings of the 10th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS), Suzhou, China, May 2004.
[27] Z. S. Chen, L.X. Gao, and K. Kwiat. Modeling the spread of active worms. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, March 2003.
[28] C. Borgelt. Full and Naive Bayes Classifiers. http://fuzzy.cs.uni-magdeburg.de/ borgelt/doc/bayes/bayes.html.
[29] M. Christodorescu and S. Jha. Static analysis of executables to detect malicious patterns. In Proceedings of the 12-th USENIX Security Symposium (SECURITY), Washington, DC, August 2003.
[30] M. Christodorescu and S. Jha. Testing malware detectors. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), Boston, MA, July 2004.
[31] L. Y. Chuang, C. H. Yang, C. H. Yang, and S. L. Lin. An interactive training system for morse code users. In Proceedings of Internet and Multimedia Systems and Applications, Honolulu, Hawaii, August 2002.
[32] M. Ciubotariu. Netsky: a conflict starter? Virus Bulletin, http://www.virusbtn.com, 2004.
[33] BindView Corporation. Strace for NT. http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace readme.cfm.
[34] C. Cowan, C. Pu, D. Maier, H. Hinton, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In Proceedings of the 7th USENIX Security Symposium (SECURITY), San Antonio, TX, August 1998.
[35] E. J. Crusellers, M. Soriano, and J. L. Melus. Spreading codes generator for wireless CDMA networks. International Journal of Wireless Personal Communications, 7(1), 1998.
[36] D. E. Denning. An intrusion detection model. IEEE Transactions on Software Engineering, 13(2):222–232, February 1987.
[37] T. Detristan, T. Ulenspiegel, Y. Malcom, and M. Underduk. Polymorphic shellcode engine using spectrum analysis. http://www.phrack.org/, 2003.
[38] Robert Dixon. Spread Spectrum Systems, 2nd Edition. John Wiley & Sons, 1984.
[39] Dshield. Distributed Intrusion Detection System. http://www.dshield.org/.
[40] M. H. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall, 1 edition, 2002.
[41] Nova Engineering. Linear Feedback Shift Register. http://www.sss-mag.com/pdf/lfsr.pdf.
[42] M. Ernst. Static and dynamic analysis: Synergy and duality. In Proceedings of the ICSE Workshop on Dynamic Analysis (WODA), Portland, Oregon, May 2003.
[43] H. H. Feng, J. T. Giffin, Y. Huang, S. Jha, W. Lee, and B. P. Miller. Formalizing sensitivity in static analysis for intrusion detection. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2004.
[44] H. H. Feng, O. M. Kolesnikov, P. Fogla, W. Lee, and W. Gong. Anomaly detection using call stack information. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2003.
[45] P. Ferrie and P. Ször. Zmist opportunities. Virus Bulletin, http://www.virusbtn.com.
[46] G. Fung, S. Sandilya, and R. Rao. Rule extraction from linear support vector ma- chines. In Proceedings of the 11-th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, Illinois, August 2005.
[47] D. Gao, M. Reiter, and D. Song. Behavioral distance for intrusion detection. In Proceedings of the Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, September 2005.
[48] M. Garetto, W. B. Gong, and D. Towsley. Modeling malware spreading dynamics. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, March 2003.
[49] D. Goldsmith. Incidents Maillist: Possible Code-Red Connection Attempts. http://lists.jammed.com/incidents/2001/07/0149.html.
[50] J. Gordon. Lessons from virus developers: The beagle worm history through april 24. http://www.securityfocus.com/guest/24228, 2004.
[51] V. S. Grichenko. Modular Worms. http://blogs.plotinka.ru/gritzko/modular.pdf, PC World.
[52] P. Gross, J. Parekh, and G. Kaiser. Secure selecticast for collaborative intrusion detection systems. In Proceedings of the 3rd International Workshop on Distributed Event-based Systems (DEBS), Edinburgh, UK, May 2004.
[53] G. F. Gu, M. I. Sharif, X. Z. Qin, D. Dagon, W. Lee, and G. F. Riley. Worm detection, early warning and response based on local victim information. In Proceedings of the 20-th Annual Computer Security Applications Conference (ACSAC 2004), Tucson, Arizona, December 2004.
[54] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2 edition, 2006.
[55] VMWare Inc. www.vmware.com/virtual-machine.
[56] Operating System Inside. Linux System Call Table. http://osinside.net/syscall/system call table.htm, 2006.
[57] T. Joachims. Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, Massachusetts, 1998.
[58] K. Julisch and M. Dacier. Mining intrusion detection alarms for actionable knowledge. In Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Edmonton, Alberta, July 2002.
[59] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast portscan detection using sequential hypothesis testing. In Proceedings of the 25-th IEEE Symposium on Security and Privacy, Oakland, CA, May 2004.
[60] A. Keromytis, V. Misra, and D. Rubenstein. SOS: Secure overlay services. In Proceedings of ACM SIGCOMM, Pittsburgh, PA, August 2002.
[61] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13-th USENIX Security Symposium, San Diego, CA, August 2004.
[62] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13-th USENIX Security Symposium (SECURITY), San Diego, CA, August 2004.
[63] O. Kolesnikov and W. Lee. Advanced Polymorphic Worms: Evading IDS by Blending in with Normal Traffic. Technical report, Georgia Institute of Technology, 2004.
[64] J. Z. Kolter and M. A. Maloof. Learning to detect malicious executables in the wild. In Proceedings of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Seattle, WA, August 2004.
[65] Ktwo. Admmutate v0.8.4: Shellcode mutation engine. http://www.ktwo.ca/ADMmutate-0.8.4.tar.gz, 2001.
[66] A. Kuzmanovic and E. W. Knightly. Low-rate TCP-targeted denial of service attacks (the shrew vs. the mice and elephants). In Proceedings of ACM SIGCOMM, Karlsruhe, Germany, August 2003.
[67] A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distri- bution. In Proceedings of ACM SIGCOMM’05, Philadelphia, PA, August 2005.
[68] E. Larkin. Widespread Internet Attack Cripples Computers with Spyware. http://www.pcworld.com/article/id,120448-page,1/article.html.
[69] K. F. Lee and S. Mahajan. Automatic Speech Recognition: the Development of the SPHINX System. Springer, 1988.
[70] W. Lee, S. Stolfo, and P. Chan. Learning patterns from unix process execution traces for intrusion detection. In Proceedings of the AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, Menlo Park, CA, June 1997.
[71] W. Lee, S. J. Stolfo, and W. Mok. A data mining framework for building intrusion detection models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 1999.
[72] S. Li. A Survey on Tools for Binary Code Analysis. Department of Computer Science, Stony Brook University, http://www.cs.sunysb.edu/ lshengyi/papers/rpe/RPE.htm, 2004.
[73] Metasploit LLC. Windows System Call Table. http://www.metasploit.com/users/opcode/syscalls.html.
[74] X. Luo and R. K. C. Chang. On a new class of pulsing denial-of-service attacks and the defense. In Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2005.
[75] J. Ma, G. M. Voelker, and S. Savage. Self-stopping worms. In Proceedings of the ACM Workshop on Rapid Malcode (WORM), November 2005.
[76] R. Mahajan, S. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and S. Shenker. Controlling high bandwidth aggregates in the network. ACM SIGCOMM Computer Communication Review (CCR), July 2002.
[77] M. Mannan and P. C. van Oorschot. Instant messaging worms, analysis and countermeasures. In Proceedings of the 3rd Workshop on Rapid Malcode (WORM), Fairfax, VA, November 2005.
[78] S. Martin, A. Sewani, B. Nelson, K. Chen, and A. Joseph. Analyzing behavioral features for email classification. In Proceedings of the 2nd Conference on Email and Anti-Spam (CEAS), Mountain View, CA, August 2003.
[79] MetaPHOR. http://securityresponse.symantec.com/avcenter/venc/data/w32.simile.html.
[80] Microsoft. Microsoft Virtual PC. http://www.microsoft.com/windows/virtualpc/default.mspx.
[81] J. Mirkovic and P. Reiher. A taxonomy of ddos attack and ddos defense mechanisms. In ACM SIGCOMM Computer Communication Review, April 2004.
[82] J. Mirkovic and P. Reiher. A taxonomy of ddos attack and ddos defense mechanisms. ACM SIGCOMM Computer Communication Review, 34(2):39–54, 2004.
[83] J. Mirkovic and P. Reiher. A taxonomy of ddos attacks and defense mechanisms. ACM SIGCOMM Computer Communications Review, 34(2):39–54, April 2004.
[84] R. Moddemeijer. On estimation of entropy and mutual information of continuous distributions. Signal Processing, 16(3):233–246, 1989.
[85] D. Moore. Network telescopes: Observing small or distant security events. In Invited Presentation at the 11-th USENIX Security Symposium (SEC), San Francisco, CA, August 2002.
[86] D. Moore, V. Paxson, and S. Savage. Inside the slammer worm. IEEE Security & Privacy, 1(4):33–39, 2003.
[87] D. Moore, C. Shannon, and J. Brown. Code-red: a case study on the spread and victims of an internet worm. In Proceedings of the 2nd Internet Measurement Workshop (IMW), Marseille, France, November 2002.
[88] D. Moore, C. Shannon, and K. Claffy. Code-red: A case study on the spread and victims of an internet worm. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, Marseille, France, November 2002.
[89] D. Moore, G. M. Voelker, and S. Savage. Inferring internet denial-of-service activity. In Proceedings of the 10-th USENIX Security Symposium, Washington, DC, August 2001.
[90] myNetWatchman. myNetWatchman Project. http://www.mynetwatchman.com.
[91] C. Nachenberg. Computer virus-antivirus coevolution. Communications Of The ACM, 40(1):46–51, January 1997.
[92] R. Naraine. Botnet Hunters Search for Command and Control Servers. http://www.eweek.com/article2/0,1759,1829347,00.asp.
[93] H. Nunez, C. Angulo, and A. Catala. Rule extraction from support vector machines. In Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium, August 2002.
[94] Chief of Engineers. United States Army: Army facilities components system user guide. http://www.usace.army.mil/inet/usace-docs/armytm/tm5-304/, October 1990.
[95] K. Park and H. Lee. On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law internets. In Proceedings of ACM SIGCOMM, San Diego, CA, August 2001.
[96] R. Perdisci, O. Kolesnikov, P. Fogla, M. Sharif, and W. Lee. Polymorphic blending attacks. In Proceedings of the 15-th USENIX Security Symposium (SECURITY), Vancouver, B.C., August 2006.
[97] R. K. Pickholtz, D. L. Schilling, and L. B. Milstein. Theory of spread-spectrum communications - a tutorial. IEEE Transactions on Communications, 30(5):855–884, 1982.
[98] GNU Project. Linux Function and Macro Index. http://www.gnu.org/software/libc/manual/html node/Function-Index.html#Function-Index.
[99] The Honeynet Project and Research Alliance. Know your enemy: Tracking botnets. http://www.honeynet.org/papers/bots/, 2005.
[100] F. Qin, C. Wang, Z. Li, H. Kim, Y. Zhou, and Y. Wu. Lift: A low-overhead practical information flow tracking system for detecting security attacks. In Proceedings of the 39th IEEE/ACM International Symposium on Microarchitecture (MICRO), Orlando, Florida, December 2006.
[101] M. Reiter and A. Rubin. Crowds: Anonymity for web transactions. ACM Transac- tions on Information and System Security, 1(1):66–92, November 1998.
[102] P. R. Roberts. Zotob Arrest Breaks Credit Card Fraud Ring. http://www.eweek.com /article2/0,1895,1854162,00.asp.
[103] SANS. Internet Storm Center. http://isc.sans.org/.
[104] S. Savage, D. Wetherall, A. R. Karlin, and T. Anderson. Practical network support for ip traceback. In Proceedings of ACM SIGCOMM, Stockholm, Sweden, August 2000.
[105] Stuart Schechter, Jaeyeon Jung, and Arthur W. Berger. Fast Detection of Scanning Worm Infections. In Proceedings of the 7-th International Symposium on Recent Advances in Intrusion Detection (RAID), French Riviera, France, September 2004.
[106] M. G. Schultz, E. Eskin, E. Zadok, and S. J. Stolfo. Data mining methods for detection of new malicious executables. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2001.
[107] M. Sedalo. Jempiscodes: Polymorphic shellcode generator. http://securitylab.ru/, 2003.
[108] V. Sekar, Y. Xie, D. Maltz, M. Reiter, and H. Zhang. Toward a framework for internet forensic analysis. In Proceedings of the 3rd Workshop on Hot Topics in Networks (HotNets-III), San Diego, CA, November 2004.
[109] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, 1949.
[110] Y. Shinoda, K. Ikai, and M. Itoh. Vulnerabilities of passive internet threat monitors. In Proceedings of the 14-th USENIX Security Symposium, Baltimore, MD, July-August 2005.
[111] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In the 6th ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), Fairfax, Virginia, December 2004.
[112] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation, December 2004.
[113] L. Spitzner. Know Your Enemy: Honeynets. Honeynet Project, http://project.honeynet.org/papers/honeynet.
[114] S. Staniford, V. Paxson, and N. Weaver. How to own the internet in your spare time. In Proceedings of the 11-th USENIX Security Symposium, San Francisco, CA, August 2002.
[115] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana. Internet indirection infrastructure. In Proceedings of the ACM SIGCOMM Conference, Pittsburgh, PA, August 2002.
[116] R. Stone. Centertrack: An ip overlay network for tracking dos floods. In 9th USENIX Security Symposium, San Francisco, CA, August 2000.
[117] P. Szor and P. Ferrie. Hunting for metamorphic. In Proceedings of Virus Bulletin Conference, September 2001.
[118] S. Theodoridis and K. Koutroumbas. Pattern Recognition, Second Edition. Elsevier Science, 2003.
[119] P. Thurrott. Windows "Longhorn" FAQ. http://www.winsupersite.com/faq/longhorn.asp.
[120] J. Twycross and M. M. Williamson. Implementing and testing a virus throttle. In Proceedings of the 12-th USENIX Security Symposium, Washington, DC, August 2003.
[121] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
[122] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.
[123] S. Venkataraman, D. Song, P. Gibbons, and A. Blum. New streaming algorithms for superspreader detection. In Proceedings of the 12-th IEEE Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, February 2005.
[124] D. Wagner and D. Dean. Intrusion detection via static analysis. In Proceedings of IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2001.
[125] J. Wang, L. Lu, and A. A. Chien. Tolerating denial-of-service attacks using overlay networks – impact of overlay network topology. In Proceedings of ACM Workshop on Survivable and Self-Regenerative Systems, Fairfax, Virginia, October 2003.
[126] L. Wang and B. B. Hirsbrunner. PN-based security design for data storage. In Proceedings of Databases and Applications, Innsbruck, Austria, February 2004.
[127] X. Wang, S. Chellappan, C. Boyer, and D. Xuan. On the effectiveness of secure overlay forwarding systems under intelligent distributed DoS attacks. IEEE Transactions on Parallel and Distributed Systems (TPDS), 17(7):619–632, July 2006.
[128] X. Wang, S. Chellappan, P. Boyer, and D. Xuan. Analyzing secure overlay forwarding systems under intelligent DDoS attacks. Technical Report OSU-CISRC-12/04-TR71, Dept. of Computer Science and Engineering, The Ohio State University, June 2004.
[129] X. Wang, W. Yu, A. Champion, X. Fu, and D. Xuan. Detecting worms via mining dynamic program execution. To appear in Proceedings of the IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), Nice, France, September 2007.
[130] X. Wang, W. Yu, X. Fu, D. Xuan, and W. Zhao. iLOC: An invisible LOCalization Attack to Internet Threat Monitoring Systems. submitted to the IEEE Conference on Computer Communications (INFOCOM), July 2007.
[131] N. Weaver, S. Staniford, and V. Paxson. Very fast containment of scanning worms. In Proceedings of the 13-th USENIX Security Symposium, San Diego, CA, August 2004.
[132] J. Wu, S. Vangala, and L. X. Gao. An effective architecture and algorithm for detecting worms with various scan techniques. In Proceedings of the 11-th IEEE Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2004.
[133] X. G. Xia, C. G. Boncele, and G. R. Arce. A multiresolution watermark for digital images. In Proceedings of International Conference on Image Processing (ICIP’97), Washington, DC, October 1997.
[134] L. Xiao, Z. Xu, and X. Zhang. Mutual anonymity protocols for hybrid peer-to-peer systems. In Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), Providence, RI, May 2003.
[135] D. Xuan, S. Chellappan, X. Wang, and S. Wang. Analyzing the secure overlay services architecture under intelligent ddos attacks. In Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), Tokyo, Japan, March 2004.
[136] S. Yang, J. P. Song, H. Rajamani, T. W. Cho, Y. Zhang, and R. Mooney. Fast and effective worm fingerprinting via machine learning. In Proceedings of the 3rd IEEE International Conference on Autonomic Computing (ICAC), Dublin, Ireland, June 2006.
[137] V. Yegneswaran, P. Barford, and S. Jha. Global intrusion detection in the domino overlay system. In Proceedings of the 11-th IEEE Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2004.
[138] V. Yegneswaran, P. Barford, and D. Plonka. On the design and utility of internet sinks for network abuse monitoring. In Proceedings of the Symposium on Recent Advances in Intrusion Detection (RAID), Pittsburgh, PA, September 2003.
[139] W. Yu, X. Fu, S. Graham, D. Xuan, and W. Zhao. DSSS-based flow marking technique for invisible traceback. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), Oakland, CA, May 2007.
[140] W. Yu, X. Wang, D. Xuan, and D. Lee. Effective detection of active worms with varying scan rate. Technical report, Department of Computer Science and Engineering, The Ohio State University, April 2005.
[141] W. Yu, X. Wang, D. Xuan, and D. Lee. Effective detection of active worms with varying scan rate. In Proceedings of IEEE International Conference on Security and Privacy in Communication Networks (SecureComm), Baltimore, MD, August 2006.
[142] W. Yu, X. Wang, D. Xuan, and W. Zhao. On detecting camouflaging worm. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), Miami, FL, December 2006.
[143] W. Yu, N. Zhang, and W. Zhao. Self-adaptive worms and countermeasures. In Proceedings of the Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), Dallas, TX, November 2006.
[144] Zdnet. Smart worm lies low to evade detection. http://news.zdnet.co.uk/internet/security/0,39020375,39160285,00.htm.
[145] N. Zhang, S. Wang, and W. Zhao. A new scheme on privacy preserving association rule mining. In Proceedings of the 8-th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Pisa, Italy, September 2004.
[146] C. Zou, L. Gao, W. Gong, and D. Towsley. Monitoring and early warning for internet worms. In Proceedings of the 10th ACM conference on Computer and Communica- tion Security (CCS), Washington D.C., October 2003.
[147] C. Zou, W. B. Gong, D. Towsley, and L. X. Gao. Monitoring and early detection for internet worms. In Proceedings of the 10-th ACM Conference on Computer and Communication Security (CCS), Washington DC, October 2003.
[148] C. C. Zou, W. Gong, and D. Towsley. Code red worm propagation modeling and analysis. In Proceedings of the 9-th ACM Conference on Computer and Communication Security (CCS), Washington, DC, November 2002.