Email Worm Modeling and Defense
Total Page:16
File Type:pdf, Size:1020Kb
Email Worm Modeling and Defense CliffC.Zou∗,DonTowsley†, Weibo Gong∗ ∗Department of Electrical & Computer Engineering †Department of Computer Science University of Massachusetts, Amherst, MA 01003 {czou, gong}@ecs.umass.edu, [email protected] Abstract— Email worms constitute one of the major Internet little differences in terms of the propagation dynamics of email security problems. In this paper, we present an email worm worms. model that accounts for the behaviors of email users by con- Based on the above email worm model, we study the ef- sidering email checking time and the probability of opening email attachments. Email worms spread over a logical network fectiveness of “selective immunization defense”, i.e., selecting defined by email address relationship, which plays an important some most connected email users’ computers to make them role in determining the spreading dynamics of an email worm. immune to an email worm. The results explain why selective Our observations suggest that the node degrees of an email immunization defense against email worm propagation is quite network are heavy-tailed distributed. We compare email worm effective for a power law topology but not so good for the other propagation on three topologies: power law, small world and random graph topologies; and then study how the topology two topologies. affects immunization defense on email worms. The impact of The rest of the paper is organized as follows. Section II the power law topology on the spread of email worms is mixed: surveys related work. Section III presents the email worm email worms spread more quickly on a power law topology than propagation model. In Section IV, we explains why we choose on a small world topology or a random graph topology, but immunization defense is more effective on a power law topology power-law topology to represent the logical email network. We than on the other two. conduct extensive simulation studies and present the results in Section V. Then we study selective immunization defense I. INTRODUCTION against email worm attacks in Section VI. Finally, we make some discussions and conclude the paper with Section VII. “Email worms” are malicious computer programs that prop- agate through email: when an email user clicks a worm II. RELATED WORK program in the attachments of a worm email, the worm compromises the user’s computer and then finds all email Kephart, White and Chess studied viral infection based on addresses stored on this computer to send out worm email. epidemiology models from 1991 to 1993 [2][3][4], where Email has become an indispensable communication medium they considered virus spreading on random graph networks in our life. However, email worms keep attacking us with in- and local networks. After Code Red worm incident [1] in creasing intensity and using more advanced social engineering 2001, many researchers have studied worm propagation mod- tricks. Some famous email worms include Melissa in 1999, eling [5][6][7][8]. However, these papers studied “scan-based” “Love Letter” in 2000, “W32/Sircam” in 2001, “SoBig”in worms that propagate through random scanning — models of 2003, “MyDoom”, “Bagle” and “Netsky” in this year [1]. scan-based worms do not need to consider topological issues Like earthquake modeling or tornado modeling, a good and thus not suitable for modeling email worms. email worm model gives us deep understanding of email Wang et al. [9] simulated a simple virus propagation based worms, helps us evaluate the effectiveness of defense mecha- on a clustered topology and a tree-like hierarchic topology. nisms, and provides possible early warning to help us control Newman et al. [10] collected email address book data from a worm’s potential damage. In this paper, we first present a a university and showed that the campus-level email topology realistic email worm model that accounts for the behaviors has an exponential and a stretched exponential distribution for of email users by considering users’ email checking time and in-degree and out-degree, respectively. However, they didn’t the probability of opening email attachments. Then we carry consider email lists, which can dramatically increase edges in out extensive simulation studies. Email worms spread over a an email network. In addition, most email worms send out logical network defined by email addresses; our observation worm email to all addresses existed in a computer, not just shows that the Internet-scale email network might be heavy- in email address books. Thus the Internet-scale email network tailed distributed and we model it as a power law network. may follow a completely different topology. Briesemeister et To study how topology affects an email worm’s propagation, al. [11] used simulation to study epidemic spreading on scale- we compare worm spreading on power law, small world and free networks without considering other topologies and user random graph topologies. We derive the conclusion that email interactions. Satorras et al. [12] used SIS epidemic model to worms spread more quickly on a power law topology than study epidemic spreading in scale-free networks — the SIS on the other two topologies; the other two topologies have model is not suitable for modeling the propagation of one email worm because a cured host is not likely to become of the checking time interval Ti for user i, i =1, 2, ··· , |V |. Ti susceptible to the same email worm again. may follow different distributions. For example, it is a constant From an email worm’s point of view, the connectivity of a value when a user checks email once every morning or uses partly immunized email network is a “percolation” problem. email client programs to fetch and check email at a specified Newman et al. [13] derived the analytical solution of the time interval; it is exponentially distributed (i.e., checking “percolation threshold” for arbitrary topologies: if nodes are action is a Poisson process) if a user checks email at a random removed randomly from a network, the network will be broken time. In this paper we will study how different distributions into pieces when the fraction of removed nodes is higher than of email checking time interval affect the propagation of an the network’s “percolation threshold”. However, it is more email worm. effective and reasonable to “selectively immunize” nodes in an Since the number of email users in the Internet is very large email network rather than the uniform immunization studied and users’ behaviors are independent, we assume that the mean in [13]. Albert et al. [14] showed that a power law network checking email time E[Ti] of user i is generated by a Gaussian 2 is vulnerable under selective attack, which is consistent to our distributed random variable T , i.e., T ∼ N(µT ,σT ) (E[Ti]= conclusions. 0 when T<0). We assume that when a user checks his email, he checks all new email in his mailbox. III. EMAIL WORM PROPAGATION MODEL The opening probability Pi of user i is determined by: (1) We represent the topology of the logical email network by the user’s security awareness; and (2) the social engineering an undirected graph G =<V,E>. ∀v ∈ V , v denotes an tricks deployed by an email worm (e.g., “MyDoom” infected email user; ∀e =(u, v) ∈ E, u, v ∈ V , represents that two more users than any email worm before due to its advanced users u and v have the email address of each other in their social engineering techniques [1]). Therefore, for the propaga- d own computers. “Node degree” of a node means that this tion of one email worm, we assume Pi to be constant for user d d |V | node has edges connecting to other nodes. is the total i. Similar to E[Ti], we assume Pi of user i is generated by a 2 number of email users. For a reader’s convenience, Table I Gaussian distributed random variable P , i.e., P ∼ N(µP ,σP ) lists most notations used in this paper. (Pi =0when P<0 and Pi =1when P>1). An email user is called infected once the user opens an TABLE I email worm attachment; upon opening a worm attachment, NOTATIONS USED IN THIS PAPER an infected user immediately sends out worm email to all Symbol Explanation his neighbors. Let N0 denote the number of initially infected G G =<V,E> Undirected graph representing email network, users. Let random variable Nt denote the number of infected Nt Number of infected users at time t t N ≤ N ≤ α Power law exponent of a power law topology that has users at time during email worm propagation, 0 t complementary cumulative degree distribution F (d) ∝ d−α |V |, ∀t>0. It takes time before a recipient receives an email h N∞ Number of users uninfected when a worm finishes spreading worm sent out by an infected user; but the email transmission T i i =1, 2, ··· , |V | i Email checking time interval of user , time is usually much smaller compared with a user’s email Pi Probability of user i to open email worm attachments T Gaussian-distributed random variable that generates E[Ti], checking time interval, and thus, it is ignored in our model. 2 T ∼ N(µT ,σT ) (E[Ti]=0when T<0) P Gaussian-distributed random variable that generates Pi, 2 IV. EMAIL NETWORK TOPOLOGY DISCUSSION P ∼ N(µP ,σP ) (Pi =0when P<0 P =1 P>1 and i when ) f(d) d Dt Average degree of those nodes that are healthy before time t Let be the fraction of nodes with node degree but are infected at time t, ∀t>0 in email network graph G. The complementary cumulative ∞ L(p) Remained link ratio — fraction of links remained after distribution function (ccdf) is denoted by F (d)= f(i), removal of the top p percent most connected nodes i=d C(p) Connection ratio — fraction of remained nodes that are i.e., the fraction of nodes with degree greater than or equal connected after removal of the top p most connected nodes to d.