Analyzing, Modeling, and Improving the Performance of Overlay Networks

Richard Winfried Thommes

Department of Electrical & Computer Engineering, McGill University, Montreal, Canada

November 2007

A thesis submitted to McGill University in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

© 2007 Richard Winfried Thommes

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-51007-0

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

Abstract

This thesis considers factors impacting the performance of certain overlay networks. We present models and make a number of novel contributions in the form of proposed improvements. Congestion of the underlying network affects any overlay network. Congestion control schemes rely on quick and accurate aggregation of link shadow prices calculated by individual routers. We present a deterministic packet marking algorithm which allows routers to encode these prices in the ECN field of the IP packets traversing the link, and allows the receiver to estimate the total path price. We show, through simulation, that the performance of our algorithm exceeds that of a previously proposed random marking scheme in a number of scenarios, particularly when the prices are time-varying.

Peer-to-peer (P2P) networks are extremely popular for file-sharing. Two phenomena have emerged which reduce the efficacy of P2P networks. One is pollution: the presence of corrupt or mislabeled files. The other is the presence of viruses specifically designed to spread over P2P networks. Once downloaded and executed, these viruses typically produce multiple copies of themselves in order to increase their proliferation in the network. We present models based on the field of epidemiology to examine the propagation of viruses and polluted files in the networks. A number of simulations indicate the effect various model parameters have on the extent to which pollution/viruses become prevalent in the network. We also consider the effect of object reputation schemes designed to lessen the impact of these problems, and determine their effectiveness under various conditions.

BitTorrent (BT) is an especially efficient and popular P2P protocol. It divides files into small segments, thus enabling peers to upload to others before they have completed downloading the entire file. The most significant issue with BT is a potential lack of fairness, meaning peers may upload much more data than they download.
To address this issue, we present three modifications to the protocol intended to increase fairness. We present a simplified model of BT and conduct simulations which illustrate the effectiveness of our proposed modifications.

Sommaire

Cette thèse considère divers facteurs influençant la performance de divers réseaux de recouvrement. Nous présentons quelques modèles et proposons plusieurs améliorations pour ce type d'architecture. La congestion du réseau fondamental affecte n'importe quel réseau de recouvrement. L'agrégation rapide et précise des prix virtuels de chaque lien calculés par les différents routeurs est essentielle pour mettre en place un mécanisme de contrôle de congestion. Nous présentons un algorithme d'inscription déterministe de paquet qui permet aux routeurs de coder ces prix dans le champ ECN des paquets IP traversant le lien, et permet au destinataire d'estimer le coût du chemin. Nous démontrons, à l'aide de simulations, que la performance de notre algorithme est supérieure à celle de méthodes aléatoires proposées précédemment dans un certain nombre de scénarios, en particulier quand les prix varient avec le temps.

Les réseaux Poste-à-poste (P2P) sont extrêmement populaires pour le partage de fichiers. Deux phénomènes ayant émergé réduisent l'efficacité des réseaux P2P : la pollution (la présence de fichiers corrompus ou marqués incorrectement) et la présence de virus spécifiquement conçus pour se propager à travers les réseaux P2P. Une fois téléchargés et exécutés, ces virus produisent typiquement de multiples copies d'eux-mêmes afin d'augmenter leur prolifération dans le réseau. Nous présentons des modèles basés sur le champ de l'épidémiologie pour examiner la propagation des virus et des fichiers pollués dans les réseaux. Nos simulations démontrent l'effet que les divers paramètres de ces modèles ont sur l'étendue de la pollution et des virus dans le réseau. Nous considérons également l'effet des mécanismes de réputation d'objet conçus pour diminuer l'impact de ces problèmes, et déterminons leur efficacité dans diverses conditions.

BitTorrent (BT) est un protocole particulièrement efficace et populaire de P2P. Il divise des fichiers en petits segments, ce qui permet à chaque pair d'amorcer des téléversements vers d'autres pairs avant d'avoir téléchargé le fichier en entier. Un des principaux problèmes de BT est son potentiel manque d'équité, c'est-à-dire que les pairs peuvent télécharger beaucoup plus de données qu'ils n'en téléversent aux autres. Pour aborder cette question, nous présentons trois modifications au protocole ayant pour but d'augmenter l'équité entre les utilisateurs. Nous présentons un modèle simplifié de BT et conduisons des simulations qui illustrent l'efficacité des modifications que nous proposons.

Acknowledgments

I would like to express my gratitude to my thesis supervisor, Dr. Mark Coates. I greatly appreciate his mentoring, encouragement, and unwavering support.

Financial support for this thesis was provided by Dr. Coates via a grant from NSERC and Bell Canada. I wish to thank them for their assistance.

I also thank the other members of my Ph.D. committee, Dr. Ioannis Psaromiligkos and Dr. Peter Edwin Caines.

I would like to recognize the graduate students in the McGill ECE department for their help and friendship. In particular, I thank Tuncer Can Aysal, Yvan Pointurier, and Frederic Thouin for proofreading this manuscript. Further thanks to Frederic for translating the abstract.

I would like to recognize my family for their vital contributions. My parents and my brother provided me the love, support and encouragement to complete my previous education, and greatly contributed to my success at McGill.

Contents

1 Introduction
  1.1 Measuring Network Congestion
  1.2 Modeling and Evaluating P2P Virus Propagation and Pollution
  1.3 Analyzing and Improving BitTorrent Performance
  1.4 Novel Contributions
    1.4.1 Measuring Network Congestion
    1.4.2 Modeling and Evaluating P2P Virus Propagation and Pollution
    1.4.3 Analyzing and Improving BitTorrent Performance
  1.5 Related Publications
    1.5.1 Journal Paper
    1.5.2 Major Conferences
    1.5.3 Other Conferences and Workshops

2 Network Congestion Pricing: Background and Related Work
  2.1 Background
    2.1.1 The Marking Problem
  2.2 Related Work
    2.2.1 Examining How Shadow Prices are Defined
    2.2.2 Modeling Networks Utilizing Price-Based Congestion Control
    2.2.3 Issues Related to the Practical Implementation of Price-Based Congestion Control
    2.2.4 Maximum Link Price Estimation

3 Deterministic Packet Marking for Congestion Price Estimation
  3.1 Chapter Structure
  3.2 Deterministic Packet Marking
    3.2.1 IP Standard
    3.2.2 Preliminaries
    3.2.3 The IP Identification Field and Probe Types
    3.2.4 The Router Marking Algorithm
    3.2.5 Path Price Estimation
  3.3 Error Analysis
    3.3.1 Distribution of Missing Probe Types
    3.3.2 Representing the End-to-End Error
    3.3.3 A Bound on the Expected Mean-Squared Error
  3.4 Simulation Performance
    3.4.1 Missing Probe Types
    3.4.2 Error Analysis
    3.4.3 Mean-Squared Error Analysis
  3.5 Time Varying Behaviour
  3.6 Practical Issues
    3.6.1 Receiver Feedback
    3.6.2 Security
  3.7 Maximum Link Price Estimation
    3.7.1 Algorithm Specification
    3.7.2 Performance Analysis
    3.7.3 Simulation Results
    3.7.4 Related Publications
  3.8 Summary

4 Background and Related Work: P2P Viruses and Pollution
  4.1 Background
    4.1.1 P2P Networks, Viruses and Pollution
  4.2 Related Work
    4.2.1 Internet-Related Epidemiological Modeling
    4.2.2 Modeling and Simulating Pollution in P2P Networks
    4.2.3 Measurement of P2P Pollution

5 Epidemiological Models of Peer-to-Peer Viruses and Pollution
  5.1 Chapter Structure
  5.2 S-E-I P2P Virus Model
    5.2.1 Model Equations
    5.2.2 Steady State Behaviour
  5.3 S-E-I-R Model
    5.3.1 S-E-I-R Model Equations
    5.3.2 S-E-I-R Model Extensions
  5.4 P2P Pollution Model
    5.4.1 The Impact of Object Reputation Schemes
  5.5 P2P Measurements
  5.6 Simulation Results
    5.6.1 S-E-I Model
    5.6.2 S-E-I Simulations with Varying Peer Behaviour
    5.6.3 S-E-I-R Model
    5.6.4 Pollution Model Behaviour and Simulations
    5.6.5 Impact of Object Reputation Schemes on P2P Virus Propagation
    5.6.6 Impact of Object Reputation Schemes on P2P Pollution
  5.7 Summary

6 Background and Related Work: BitTorrent
  6.1 Background
    6.1.1 The BitTorrent Protocol
  6.2 Related Work
    6.2.1 BitTorrent Measurement Studies
    6.2.2 Modeling BitTorrent
    6.2.3 Proposed Improvements to BitTorrent

7 BitTorrent Fairness: Analysis and Improvements
  7.1 Chapter Structure
  7.2 Model Description
    7.2.1 Simulator Description
  7.3 Theoretical Analysis
  7.4 Proposed BitTorrent Modifications
    7.4.1 Conditional Optimistic Unchoke
    7.4.2 Multiple Connection Chokes
    7.4.3 Variable Number of Outgoing Connections
  7.5 Results
  7.6 VOC Analysis
    7.6.1 New Peer Joining the Network: Time Spent Free-Riding
    7.6.2 New Peer Joining the Network: Time to Achieve an Equal Number of Incoming and Outgoing Connections
    7.6.3 Achieving Perfect Fairness
  7.7 Summary

8 Conclusion
  8.1 Congestion Price Estimation
  8.2 Modeling P2P Viruses and Pollution
  8.3 Improving BitTorrent Fairness

A
  A.1 Identified P2P Viruses
  A.2 Rates at Which Number of Infected Files Changes
  A.3 Rate at Which Proportion of Infected Files Changes

B

References

List of Figures

2.1 High-level illustration of network model with price signals
2.2 Model of congestion control feedback loop

3.1 Marking Algorithm I example
3.2 Marking Algorithm II operation
3.3 Analysis of the distribution of probe types
3.4 The survival function for the number of missing probe types
3.5 Error probability vs. path length
3.6 Error probability and mean-squared error of marking algorithms
3.7 Estimating a time-varying price
3.8 Error probability of marking algorithms
3.9 Maximum link price estimation example
3.10 CDF of outdated estimate and correct estimate intervals
3.11 RMS error of MLP vs. delay between link price changes

4.1 Observed Code Red data compared with behaviour predicted by model
4.2 J(t), I(t) and Q(t) as predicted by model, and J(t) according to the classic K-M model
4.3 Measured Code Red Worm behaviour vs. AAWP model prediction
4.4 Small-world graph example
4.5 Reaching probability of virus on a ring
4.6 Steady-state pollution level
4.7 Effect of poisoning techniques on temporal stability
4.8 Effect of poisoning techniques on content replication
4.9 Percentage of polluted versions and copies of songs for the top 500 versions in the network
4.10 Measured poisoning and pollution levels in Fastrack and

5.1 Transition diagram for peer states
5.2 Empirical CDFs based on eDonkey measurement data
5.3 Dynamic P2P network behaviour
5.4 The effect of the initial infection on the evolution of the number of infected peers
5.5 The effect of varying model parameters on the analytical steady-state proportion of infected files
5.6 The impact of variability in individual peer download rates
5.7 The impact of variability in individual peer recovery rates
5.8 Examining the effect of the initial number of Exposed peers
5.9 The effect of the recovery rate on the level of Infected peers
5.10 Examining the presence of oblivious peers
5.11 Examining the behaviour of the pollution model
5.12 The effect of varying model parameters on the analytical steady-state proportion of polluted files
5.13 The impact of using an object reputation scheme on the residual proportion of infected files
5.14 Comparing the epidemiological model and a discrete time simulation
5.15 The impact of using an object reputation scheme on the steady-state proportion of polluted files

6.1 Measured BitTorrent data
6.2 Distribution of time peers remain connected to a tracker upon completing their download
6.3 Average number of unchoked connections vs. peer set size
6.4 Performance of suggested BitTorrent improvements

7.1 Empirical cumulative distribution function of Average Fairness Ratio
7.2 Scatterplots of Average Fairness Ratio versus upload capacity
7.3 Average Instantaneous Fairness Ratio versus time
7.4 Average Ranking Difference vs. time

List of Tables

3.1 The breakdown of observed IP id behaviour
3.2 Observed Server Operating Systems
3.3 Mean-squared error of estimates, and empirically determined optimal block length for RAM
3.4 Root mean-squared error of estimates during OE/CRE and CE phases

4.1 Number of available versions and copies of seven popular songs
4.2 Statistics for legitimate clients and attackers in the Fastrack network

5.1 SEIR P2P model variables potentially affected by each event
5.2 SEI model variables potentially affected by each possible event
5.3 Observed eDonkey2000 peer download behaviour

A.1 Summary of known P2P viruses

Chapter 1

Introduction

Overlay networks are constructed on top of other, existing networks. They are conceptually virtual in the sense that they introduce a level of abstraction where a single logical link in the overlay network may correspond to a path spanning multiple links in the underlying network. Overlay networks take advantage of the addressing, routing, and other services offered by the underlying network in order to offer a new service. The advantage they offer is that they are flexible and quickly deployed, because they do not require the addition of new hardware or the redesign of existing network protocols.

A number of existing overlay networks are built upon the TCP/IP Internet protocol suite [1]. For instance, MBONE [2] adds multicast capabilities to the Internet for applications such as videoconferencing by establishing a set of unicast connections between multicast routers in order to create a virtual network. Another example is the Resilient Overlay Networks project [3], which improves the reliability of network connections between nodes by allowing them to probe multiple paths and choose the best one. The class of peer-to-peer (P2P) networks may also be described as overlay networks. P2P networks are characterized by interconnected nodes which serve as both clients and servers. The processing power and upload/download capacities of all participating nodes are utilized, rather than those of centralized servers. Contemporary examples include Kazaa [4], [5], and BitTorrent [6].

In this thesis we examine three factors affecting the performance of overlay networks. We consider a general technique applicable to all overlay networks - accurate congestion measurement; a phenomenon affecting the subset of P2P networks - pollution and virus

propagation; and a factor of importance to a particular P2P network - the fairness of BitTorrent. We identify ways to measure the impact of these factors, model them, and present novel techniques to alter their impact in order to increase network performance.

1.1 Measuring Network Congestion

The performance of an overlay network is closely tied to the functioning of the underlying network. Thus, it is critical to combat network congestion, a situation that results in longer delays in packets reaching their destination, and, in some cases, being dropped entirely. Congestion arises when network resources (in particular link bandwidth, and the memory and processing capabilities of network elements such as routers and switches) become over-utilized. A number of price-based congestion control algorithms have been proposed. For these to work, routers must calculate a congestion price which depends on the current level of congestion at the router. Each router must send its price to the source so that the source's rate can be changed as required. The Explicit Congestion Notification (ECN) field in the IP header can be used to encode and send this information. Two proposed marking techniques [7,8] are based on routers modifying the ECN field probabilistically, and enable receivers to estimate the sum of the link prices by counting the fraction of marked packets.

In Chapter 2 we present background and related work in the field of network pricing and congestion control. In Chapter 3 we introduce and evaluate a novel deterministic marking scheme for encoding and estimating path prices, allowing sources in a network to react quickly to congestion so that any overlay services may continue to operate satisfactorily. We believe that the two random marking algorithms have weaknesses, and that our deterministic quantized marking (DQM) algorithm offers advantages. We evaluate its performance in comparison to probabilistic marking, for both static and time-varying path prices. We also introduce a variant of the scheme which allows the maximum link price along any path to be estimated.
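To illustrate how a receiver can recover a path price from probabilistically marked packets, the sketch below implements a REM-style scheme: each router leaves a packet unmarked with probability φ^(−price), so the end-to-end unmarked fraction is approximately φ^(−sum of prices) and can be inverted. The value of φ, the per-link prices, and the block size are illustrative choices, not the parameters of the protocols discussed later.

```python
import math
import random

def rem_mark(path_prices, phi=1.05):
    """Pass one packet along the path; each router independently marks the
    packet with probability 1 - phi**(-price)."""
    return any(random.random() < 1.0 - phi ** (-p) for p in path_prices)

def estimate_path_price(path_prices, n_packets=20000, phi=1.05):
    """Receiver-side estimate: the end-to-end unmarked fraction is roughly
    phi**(-sum of prices), so invert it to recover the path price."""
    marked = sum(rem_mark(path_prices, phi) for _ in range(n_packets))
    unmarked_frac = 1.0 - marked / n_packets
    return -math.log(unmarked_frac, phi)

random.seed(1)
prices = [4.0, 2.5, 1.5]     # illustrative per-link congestion prices
est = estimate_path_price(prices)
print(est)                   # close to sum(prices) = 8.0
```

Note that the estimate is built from a block of packets, which is exactly why estimator variance (discussed in Chapter 2) matters when prices vary over time.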

1.2 Modeling and Evaluating P2P Virus Propagation and Pollution

P2P networks are vulnerable to malicious behaviour, including the dissemination of polluted versions of files and the release of P2P viruses. Pollution refers to the presence of corrupted or mislabeled versions of files. It has resulted in a significant portion of the files on popular P2P networks becoming unusable. A number of viruses specifically targeting P2P networks have surfaced. Typically, when a user downloads and executes an infected file, it creates multiple copies of itself which are then shared with other users on the network.

Chapter 4 provides background information and related work in measuring and modeling pollution and virus propagation. In Chapter 5, we begin by considering the behaviour of viruses and pollution in P2P networks, and then develop dynamic models based on epidemiology to describe the evolution of infection/pollution. These models are deterministic and focus on the expected behaviour of the system. However, we illustrate, through simulations, that they are sufficiently accurate to capture the random behaviour of P2P networks.

These models are useful both for studying the impact of malicious code on P2P networks and for considering the effectiveness of mitigation techniques. We look at object reputation schemes (such as Credence [9]) and other techniques for reducing infected files in the network. We provide approximations of how effective these schemes are in achieving a reduction in polluted or infected files. We validate these approximations through more accurate simulation of the networks.
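The flavour of the epidemiological approach can be sketched with a minimal S-E-I compartment model integrated by Euler's method. The transition rates and parameter values below are hypothetical placeholders for illustration only; the actual model equations are developed in Chapter 5.

```python
def simulate_sei(beta=0.3, sigma=0.2, n_steps=2000, dt=0.1):
    """Euler integration of a generic S-E-I compartment model: Susceptible
    peers download an infected file (becoming Exposed) at rate beta*S*I,
    and Exposed peers execute it (becoming Infected) at rate sigma*E."""
    S, E, I = 0.99, 0.0, 0.01     # proportions of the peer population
    for _ in range(n_steps):
        new_exposed = beta * S * I * dt
        new_infected = sigma * E * dt
        S -= new_exposed
        E += new_exposed - new_infected
        I += new_infected
    return S, E, I

S, E, I = simulate_sei()
print(S, E, I)   # with no recovery state, nearly all peers end up Infected
```

With no recovery transition, the infection always saturates the population; adding a Recovered compartment (as in the S-E-I-R model of Chapter 5) changes the steady state.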

1.3 Analyzing and Improving BitTorrent Performance

The most widely used and efficient currently deployed P2P protocol is BitTorrent (BT). It is useful for sharing large files, by breaking them into multiple small blocks. A peer can download different blocks at the same time from various peers, and can immediately begin uploading blocks it has downloaded. BT also includes mechanisms to discourage free-riding. Measurement studies [10,11] and detailed simulation studies [12] have shown that the only substantial problem with BT is fairness. It has been observed that peers with high upload bandwidth frequently upload much more data than they download, with the opposite being the case for peers with low upload bandwidths. In this section, we focus on the peer selection and search method and conduct analysis and simulations to explore the fairness properties of these mechanisms. Our results illustrate that these techniques can induce substantial unfairness under certain conditions.

We propose several modifications to BitTorrent to improve the fairness and examine their impact through simulation.
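One simple way to quantify the unfairness described above is the ratio of a peer's uploaded bytes to its downloaded bytes. The peer totals below are invented for illustration; the formal fairness metrics used in Chapter 7 are defined there.

```python
def fairness_ratio(uploaded, downloaded):
    """Bytes uploaded divided by bytes downloaded for one peer; 1.0 is
    perfectly fair, and values above 1.0 mean the peer contributes more
    than it receives."""
    return uploaded / downloaded if downloaded else float("inf")

# Hypothetical (uploaded, downloaded) totals in MB for three peers:
peers = {"fast": (900, 300), "medium": (300, 300), "slow": (100, 500)}
ratios = {name: fairness_ratio(u, d) for name, (u, d) in peers.items()}
print(ratios)   # {'fast': 3.0, 'medium': 1.0, 'slow': 0.2}
```

The pattern shown (high-capacity peers well above 1.0, low-capacity peers well below) is the kind of imbalance the proposed modifications aim to reduce.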

1.4 Novel Contributions

1.4.1 Measuring Network Congestion

• Detailed specification of a deterministic packet marking algorithm which utilizes the ECN field in the IP header in order to allow a receiver to estimate the congestion price of the path between it and the sender.

• Error analysis of the marking algorithm, including probability distributions and a bound on the total expected mean-squared error.

• Simulation results comparing the performance of the proposed marking algorithm with previously proposed probabilistic ones, in estimating both static and time-varying prices.

• Suggested ways of dealing with security issues related to the algorithm

1.4.2 Modeling and Evaluating P2P Virus Propagation and Pollution

• Presentation of an epidemiological model of the spread of a conventional virus through a peer-to-peer network, and a steady-state analysis of this model.

• Derivation of an epidemiological model of a hypothetical peer-to-peer virus more vicious than those which have been observed, and an examination of three conceptual modifications to this model.

• Presentation of a model of the spread of pollution in peer-to-peer networks.

• Results of a measurement study of the eDonkey2000 peer-to-peer network examining download and file-sharing behaviour.

1.4.3 Analyzing and Improving BitTorrent Performance

• Presentation of three potential modifications to BitTorrent aimed at increasing fairness.

• Simulation results illustrating the effectiveness of the modifications, relative to two proposed fairness metrics.

• Theoretical analysis of the best-performing modification, including a proposition showing it can achieve perfect fairness under certain conditions.

1.5 Related Publications

1.5.1 Journal Paper

• R.W. Thommes and M.J. Coates, Deterministic packet marking for time-varying congestion price estimation, IEEE/ACM Transactions on Networking, vol. 14, no. 3, June 2006, pp. 592-602.

1.5.2 Major Conferences

• R.W. Thommes and M.J. Coates, Epidemiological modelling of peer-to-peer viruses and pollution, in Proc. IEEE Infocom, Barcelona, Spain, April 2006.

• R.W. Thommes and M.J. Coates, Deterministic packet marking for congestion price estimation, in Proc. IEEE Infocom, Hong Kong, March 2004.

1.5.3 Other Conferences and Workshops

• R.W. Thommes and M.J. Coates, Deterministic packet marking for maximum link price estimation, in Proc. Canadian Workshop on Information Theory, Montreal, QC, Canada, June 2005.

• R.W. Thommes and M.J. Coates, BitTorrent fairness: analysis and improvements, in Proc. Workshop Internet, Telecom, and Signal Proc, Noosa, Australia, Dec. 2005.

• R.W. Thommes and M.J. Coates, Modeling virus propagation in peer-to-peer networks, in Proc. IEEE Int. Conf. on Information, Comm. and Signal Proc., Bangkok, Thailand, Dec. 2005.

Chapter 2

Network Congestion Pricing: Background and Related Work

2.1 Background

Recently, a number of optimization-based network congestion control schemes have been proposed [7,13-18]. Most of these are price-based congestion control protocols: they require routers to maintain a congestion price which is a function of the arrival rate of incoming traffic and the capacity of the outgoing link. Each router must (indirectly) convey information about its calculated congestion price to the source so that its rate can be changed as required. The proposed two-bit Explicit Congestion Notification (ECN) field addition to the IP header facilitates a mechanism for conveying price information [19]. Several of the proposed congestion control protocols separate the tasks of calculating the link price and communicating it (through packet marking) [7,17,18]. Two probabilistic marking proposals have emerged to carry out the latter task [7,8]. Both proposals allow receivers to estimate the path price - the sum of the link prices - by examining the proportion of marked packets.

2.1.1 The Marking Problem

The problem we consider is that of determining the sum of the prices of a set of links making up a path between a source and receiver. We now formalize the problem as in [8], deviating slightly on occasion. Consider a set of links $1, \dots, n$ constituting an end-to-end path from a source to a receiver. Each link $i$ has a non-negative price $s_i$. Define $z_n = \sum_{i=1}^{n} s_i$ as

the sum of the prices along the path. The routers pass price information to the receiver by encoding it in data packets traversing the path. Routers may modify the two congestion notification (CN) bits present in the header of every packet. We denote the value of these bits after undergoing the marking action of link/router $i$ by $X_i$. $X_i$ can take on four possible values: 00, 01, 10, 11. As per the proposal of RFC 3168 [19], the (00) state is reserved to communicate that ECN is not being used. This leaves three codepoints for passing information.

After calculating $z_n$ based on the receipt of marked data packets, the receiver must communicate its estimate to the sender. Our primary focus is on the problem of routers conveying congestion price information to the receiver. Our objective is to design a marking algorithm that routers apply to the IP ECN field of all packets traversing the path in order to provide congestion information to the receiver. According to Adler, Cai, Shapiro, and Towsley, the authors of [8], a marking algorithm must obey some design constraints. It has to be fully distributed, meaning that a router may only use local information - the current link price and the value of the ECN field in the IP packet header - when making a marking decision. Furthermore, the marking algorithm should not keep any per-flow state information, or retain a history of how previous packets were marked. Adler et al. [8] claim that "there is no deterministic marking algorithm under these conditions". They present Random Additive Marking (RAM), a probabilistic algorithm, and compare it to Random Exponential Marking (REM), another probabilistic algorithm, proposed by Athuraliya, Li, Low, and Yin in [7]. Section 2.2.3 provides a detailed summary of both of these algorithms.

A binomial distribution lurks beneath any probabilistic marking scheme, meaning one can at best achieve a linear decay in variance and mean-squared error with respect to the size of the block of data packets upon which the estimate is based. Thus, substantial variance in price estimates may persist even when estimates are based on a significant number of data packets. Because path prices are dynamic, accurate estimates have to be obtained using a relatively small number of packets. Furthermore, the actual value of the link prices has a significant impact on estimation error: in order to obtain an accurate estimate with RAM, link prices must be close to 0.5 (assuming they are bounded between zero and one).
In the case of REM, performance is heavily affected by the choice of its parameters and how well they match the given path lengths and prices. We believe that Adler's design constraints are unnecessarily restrictive. Since these constraints allow routers to read a packet's ECN field, there is no increase in the fundamental difficulty of practically implementing a marking algorithm which allows routers to read additional fields in the IP header. Specifically, if we expand the list of locally available information to include the value of the time-to-live (TTL) and IP Identification (IPid) fields in each packet's IP header, it is possible to define a deterministic marking algorithm¹. In the following chapter we outline a deterministic quantized marking (DQM) algorithm and evaluate its performance. In order to make deployment of our algorithm more attainable, we have endeavoured to introduce as few changes to the current protocols as possible. The scheme does not require any modification to the TCP or IP headers.
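The binomial limitation discussed above is easy to demonstrate directly: estimating a single mark probability $p$ from a block of $k$ one-bit observations has mean-squared error $p(1-p)/k$, so accuracy improves only linearly in the block size. The sketch below measures this empirically with arbitrary illustrative parameters.

```python
import random
import statistics

def empirical_mse(p, k, trials=2000):
    """Empirical mean-squared error of estimating a mark probability p from
    the fraction of marks observed in a block of k one-bit samples."""
    errs = []
    for _ in range(trials):
        est = sum(random.random() < p for _ in range(k)) / k
        errs.append((est - p) ** 2)
    return statistics.fmean(errs)

random.seed(0)
mse = {k: empirical_mse(0.5, k) for k in (50, 200, 800)}
print(mse)   # tracks the binomial variance p(1-p)/k: 0.005, 0.00125, 0.0003125
```

Quadrupling the block size only quarters the error, which is why estimates over short blocks (as required when prices are time-varying) remain noisy for any scheme of this type.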

2.2 Related Work

2.2.1 Examining How Shadow Prices are Defined

There are a variety of ways in which a shadow price may specifically be defined. The theory behind these approaches lies beyond the scope of our thesis, but below we summarize existing research in the area. Gibbens and Kelly present a simple slotted model in [14] to illustrate the properties of the shadow price in the ideal case. The scarce resource in the network has the capacity to transmit $N$ packets. The number of packets generated per slot by each user $i$ is a Poisson random variable $X_i$ with mean $x_i$, for $i = 1, 2, \dots, m$. The total load on the resource is given by the random variable $Y = \sum_{i=1}^{m} X_i$, which is Poisson with mean $y = \sum_{i=1}^{m} x_i$. The cost is defined as the number of packets lost per slot. The expected cost, $C(y)$, is

$$C(y) = E(Y - N)^+ = \sum_{n > N} (n - N)\, e^{-y} \frac{y^n}{n!} \quad (2.1)$$

Note: the notation $E(R)^+$ denotes the expectation of a modified version of a discrete random variable $R$ in which any negative values $R$ takes on with non-zero probability are replaced with zero. The shadow price $p(y)$ is defined as the marginal increment in expected cost due to a marginal increment in the load. Thus, $p(y) = \frac{d}{dy} C(y)$. This, in turn, means that

¹Adler also assumes only one bit in the packet header is available for marking. While our algorithm makes use of both bits in the ECN field, it could be simply modified to function using only one bit.

$$p(y) = \sum_{n \ge N} e^{-y} \frac{y^n}{n!} = \text{Prob}\{Y \ge N\} \quad (2.2)$$

The marking procedure is as follows: when more than $N$ packets arrive during a slot, all of them are marked to indicate congestion. Given that the number of packets arriving in the slot is $n$, the number of packets contributed by user $r$ is a binomial random variable with mean $n \frac{x_r}{y}$. The number of marks, per slot, on packets originating from user $r$ is given by the random variable $X_r I\{Y > N\}$, where $I\{\cdot\}$ is the indicator function. Therefore, the expected number of marks is

E(X_r I{Y > N}) = Σ_{n>N} E(X_r | Y = n) Prob(Y = n)
               = Σ_{n>N} (n x_r / y) e^{−y} y^n / n!
               = x_r Σ_{n≥N} e^{−y} y^n / n!
               = x_r p(y)    (2.3)

Therefore, under this marking procedure, the expected charge per packet or, equivalently, the expected proportion of marked packets from flow r, is equal to the shadow price p(y). For the general case where packet flows cannot be modeled as a Poisson process, an analogous identity does not necessarily exist. However, it can be shown that if the same marking algorithm is used, the relationship between the incremental increase in system load and the expected charge to the flow causing the increase still holds. Assuming the same slotted model, and an initial load represented by the random variable Y, one wishes to determine the impact of an additional load represented by the non-negative random variable X. The additional amount of over-utilization of the resource, due to X, is:

[X + Y − N]^+ − [Y − N]^+ = X I{X + Y > N} − (N − Y) I{X + Y > N > Y}    (2.4)

The first term on the r.h.s. of the equation represents the case where Y already exceeds N, and thus all X packets contribute to the over-utilization of the resource. The second r.h.s. term is a correction factor for the case Y < N, in which only X − (N − Y) packets from the additional load contribute to the over-utilization. In the case where N, Y and X are integers, and the incremental increase X ≤ 1, the correction term is not necessary because it represents an impossible case. Thus, eliminating this term and taking expectations gives:

E[X + Y − N]^+ − E[Y − N]^+ = E(X I{X + Y > N})    (2.5)

The r.h.s. of this equation is, by definition, the expected number of marked X packets per unit time, i.e. the charge due to the additional load. Thus, for small increments, the equality between increased system cost and charge to the additional load holds in the general case. Gibbens and Kelly also examine how to implement an approximation to the idealized pricing model in a queue with a finite-sized buffer. The size of the queue at time t is given by Q_t, and the number of packets arriving in the interval (t − 1, t] is given by Y_{t−1}. If the queue is of size N and can clear out one packet per unit time, the following recursion holds:

Q_t = min{N, Q_{t−1} − I{Q_{t−1} > 0} + Y_{t−1}}    (2.6)

The number of packets lost during the interval (t − 1, t] is given by:

[Q_{t−1} − I{Q_{t−1} > 0} + Y_{t−1} − N]^+    (2.7)
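The recursion (2.6) and the loss count (2.7) are straightforward to simulate. The sketch below runs them on a hypothetical arrival sequence (the buffer size N and the arrival values are illustrative assumptions, not values from [14]):

```python
def simulate_queue(arrivals, N, Q0=0):
    """Iterate the buffer recursion (2.6) and the per-slot losses (2.7).

    arrivals[t] is Y_t, the number of packets arriving in slot t.
    Returns the queue trajectory and the per-slot loss counts.
    """
    Q = Q0
    trajectory, losses = [], []
    for Y in arrivals:
        served = 1 if Q > 0 else 0           # I{Q_{t-1} > 0}: one departure per slot
        pre_clip = Q - served + Y            # queue content before the buffer limit
        losses.append(max(0, pre_clip - N))  # overflow, equation (2.7)
        Q = min(N, pre_clip)                 # equation (2.6)
        trajectory.append(Q)
    return trajectory, losses

# Example: buffer of N = 3 packets, bursty arrivals.
traj, loss = simulate_queue([2, 0, 5, 0], N=3)
```

With these arrivals, the burst of 5 packets in the third slot overflows the buffer and two packets are lost, illustrating how losses occur only at the end of a busy period.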

A busy period is defined as follows: it begins at time t if Q_{t−1} < 1 and Q_t ≥ 1; it ends at time t if Q_{t−1} = 1. A critical congestion period is defined as the time between the start of a busy period and the loss of a packet during that busy period. Thus, an additional packet increases, by one, the number of lost packets iff it arrives within a critical congestion period. All packets arriving during such a period should ideally be marked, since critical congestion periods are analogous to overloaded slots in the idealized model. The difficulty lies in the fact that, when a packet arrives, it is often impossible to discern whether or not a current busy period will become a critical congestion period. Two approximations to the ideal marking mechanism are provided. In the first one, all packets in the queue when a loss occurs are marked. A certain number of packets arriving subsequently are also marked. This number is chosen such that the proportion of marked packets corresponds to the probability that, if a randomly chosen packet had not been sent, the number of lost packets would have been one less. The second mechanism is a simplified version of the first: after a packet is lost, every packet departing the queue is marked until the queue is empty. This mechanism is simpler to implement because the queue does not have to calculate how many packets to mark. Numerical simulations indicate that it produces results nearly identical to the first mechanism. Another view of defining shadow prices is presented by MacKie-Mason and Varian in [20]. A user i uses x_i of a network resource, and the total usage is given by X = Σ_{i=1}^n x_i. The total capacity of the resource is K, so the utilization Y is given by Y = X/K. Each user has a utility function u_i(x_i, Y), which is concave and differentiable with respect to x_i, and decreasing and concave in Y.
The authors consider the problem of efficient network usage by examining the total utility, over all users, W(K):

W(K) = Σ_{j=1}^n u_j(x_j, Y)    (2.8)

The marginal change in utility, summed over all users, due to a marginal increase in user i's usage is given by ∂W(K)/∂x_i:

∂W(K)/∂x_i = Σ_{j=1}^n (∂u_j(x_j, Y)/∂Y)(∂Y/∂x_i) = (1/K) Σ_{j=1}^n ∂u_j(x_j, Y)/∂Y    (2.9)

Defining p_e as the shadow price equal to the reduction in utility imposed on the users by user i's usage provides:

p_e = −(1/K) Σ_{j=1}^n ∂u_j(x_j, Y)/∂Y    (2.10)

The expression for p_e requires a correction factor: the i-th component in the sum must be subtracted, since the change in user i's own utility is not a cost imposed on the rest of the system.

Thus, for optimality, user i solves the following problem:

∂u_i(x_i, Y)/∂x_i + (1/K) ∂u_i(x_i, Y)/∂Y = p_e    (2.11)

In [21], Key, Massoulié, and Shapiro examine a pricing scheme for which the cost depends specifically on the delay at a link in the network. Each user sends at a rate x_r, and the aggregate rate is given by x = Σ_r x_r. If the delay at the link is given by D(x), a simple congestion cost metric suggested is C(x) = x D(x). As in [14], the packet price is the derivative of the cost: p(x) = C'(x) = x D'(x) + D(x). If each user has a strictly concave utility function V_r(x_r) (not directly dependent on link delay) and tries to maximize its net benefit V_r(x_r) − x_r p, the system will reach a global maximum for the total welfare Σ_r V_r(x_r) − C(x).

In a more general setting, the utility function U_r(x_r, D) depends on both transmission rate and average delay. As in [20], to achieve optimal total welfare, a user increases its rate until its marginal increase in utility equals the marginal reduction of the utilities over all other users. Thus, the price p must be defined as:

p = ∂(Σ_r U_r(x_r, D))/∂x = (∂(Σ_r U_r(x_r, D))/∂D)(dD/dx) = (dD/dx) Σ_r ∂U_r(x_r, D)/∂D    (2.12)

Key et al. also adapt their pricing scheme for a network with multiple classes of service. Packets belonging to service class i arrive at an aggregate rate x_i, and experience delay D_i. The delay-based congestion cost is C(x_1, x_2, ..., x_i) = x_1 D_1 + x_2 D_2 + ... + x_i D_i. The congestion price for class i is ∂C(x_1, x_2, ..., x_i)/∂x_i. The utility U_r for user r is defined as

U_r(x_{r1}, x_{r2}, ..., x_{ri}, D_1, D_2, ..., D_i) = V_r(x_{r1}, x_{r2}, ..., x_{ri}) − x_{r1} D_1 − x_{r2} D_2 − ... − x_{ri} D_i,

where x_{ri} is the rate at which user r sends traffic of class i. In practice, a user may only transmit in one class of service. With the given utility function, the network will achieve maximum total welfare.

2.2.2 Modeling Networks Utilizing Price-Based Congestion Control

Since our price estimation algorithm is designed to ultimately operate in conjunction with a congestion control algorithm, we now present a summary of the theory behind such algorithms from existing publications.

In [17] Low and Lapsley present a model for optimizing utilization of a network shared by a number of sources S = {1, ..., S}. The network has a set of unidirectional links L = {1, ..., L}. Each link l ∈ L has a capacity c_l. Each source s uses a set of links L(s) ⊆ L, and has an increasing, strictly concave utility function U_s(x_s) when transmitting at a rate x_s, where 0 ≤ m_s ≤ x_s ≤ M_s < ∞. M_s and m_s are, respectively, the maximum and minimum transmission rates that source s requires. The range for x_s may be represented by I_s = [m_s, M_s]. For every link l, S(l) = {s ∈ S | l ∈ L(s)} is the set of sources using link l. The objective of the optimization problem is to choose the rates x = (x_s, s ∈ S) to maximize the sum of the utility functions, while not exceeding the capacity of any link:

P:  max_{x_s ∈ I_s} Σ_s U_s(x_s)    (2.13)

subject to the constraint:

Σ_{s ∈ S(l)} x_s ≤ c_l,  for all l ∈ L    (2.14)

The corresponding Lagrangian is:

L(x, p) = Σ_s U_s(x_s) − Σ_l p_l (Σ_{s ∈ S(l)} x_s − c_l)    (2.15)

Here the p_l are Lagrange multipliers, and may also be interpreted as the shadow price (per unit bandwidth) of link l. After some manipulation of the Lagrangian, the objective function of the dual problem becomes:

D(p) = max_{x_s ∈ I_s} L(x, p) = Σ_s B_s(p^s) + Σ_l c_l p_l    (2.16)

where

B_s(p^s) = max_{x_s ∈ I_s} U_s(x_s) − x_s p^s    (2.17)

p^s = Σ_{l ∈ L(s)} p_l    (2.18)

The value p^s is the sum of the prices of all the links in the path of s, and therefore x_s p^s is the cost to source s when sending at a rate x_s. B_s(p^s) is the maximum benefit, utility minus cost, that s can achieve at the given price. A key implication is that, given a price p^s, source s can solve for its individual optimal rate x_s, the maximization in (2.17), without the need to communicate with any other sources. If a number of subtle assumptions on U_s hold [17], the maximizer x_s*(p^s) is found, by the Kuhn-Tucker theorem, to be:

x_s*(p^s) = [U_s'^{−1}(p^s)]_{m_s}^{M_s}    (2.19)

Here [z]_a^b = min{max{z, a}, b}. In order to determine the optimal price adjustment rule for any link l, Low and Lapsley employ the gradient projection method, which adjusts prices in the direction opposite to the gradient of the objective function ∇D(p):

p_l(t + 1) = [p_l(t) − γ ∂D(p(t))/∂p_l]^+    (2.20)

where γ > 0 is the stepsize. The partial derivatives ∂D/∂p_l are given by:

∂D(p)/∂p_l = c_l − Σ_{s ∈ S(l)} x_s(p)    (2.21)

Therefore, the adjustment rule for the price p_l of each link l is:

p_l(t + 1) = [p_l(t) + γ(Σ_{s ∈ S(l)} x_s(p) − c_l)]^+    (2.22)

Thus, Low and Lapsley have specified a distributed optimization algorithm in which sources calculate their transmission rates x_s based only on the current price p^s of their transmission path, and links update their prices p_l based on the current total traffic Σ_{s ∈ S(l)} x_s(p) and their capacity c_l. In [22], Kelly presents a model similar to the one suggested in [17]. The main difference is that a source/sink pair may have more than one possible unique route through the network, and that traffic may be divided among two or more routes. There is a set of J links, each with a capacity c_j. A route r is a non-empty subset of J, and the set of these routes is designated R. The matrix A = (A_{jr}, j ∈ J, r ∈ R) defines the links associated with each route: A_{jr} = 1 if link j is a part of route r and A_{jr} = 0 otherwise. A source s ∈ S is able to use a subset of the routes in R in order to send data to its associated sink. The matrix H = (H_{sr}, s ∈ S, r ∈ R) defines which routes each source may use: H_{sr} = 1 if source s may use route r, and H_{sr} = 0 otherwise. Each route r serves exactly one source; this source is designated s(r).
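These matrix definitions can be made concrete with a small example. The topology, matrices and flows below are our own hypothetical illustration: three links, three routes, and two sources, where source 1 may split its traffic over two routes.

```python
# Hypothetical network: links J = {1,2,3}, routes R = {r1,r2,r3}, sources S = {1,2}.
# Route r1 = {1,2} and route r2 = {3} both serve source 1; r3 = {2,3} serves source 2.
A = [[1, 0, 0],   # link 1 lies on route r1 only
     [1, 0, 1],   # link 2 lies on routes r1 and r3
     [0, 1, 1]]   # link 3 lies on routes r2 and r3
H = [[1, 1, 0],   # source 1 may use routes r1 and r2
     [0, 0, 1]]   # source 2 may use route r3
C = [5, 5, 5]     # link capacities
y = [2, 1, 3]     # a candidate flow pattern (y_r, r in R)

x = [sum(H[s][r] * y[r] for r in range(3)) for s in range(2)]     # x = Hy
load = [sum(A[j][r] * y[r] for r in range(3)) for j in range(3)]  # Ay
feasible = all(load[j] <= C[j] for j in range(3))                 # Ay <= C
```

Here source 1 achieves rate x_1 = 3 by splitting traffic over its two routes, and the pattern is feasible because every link load stays below capacity.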

Each source has a utility Us(xs), a function of its transmission rate xs. This function has the same properties as identified in [17].

The flow pattern y_r, r ∈ R specifies how much traffic flows over each route. In order for every given source s to transmit at its chosen rate x_s, it must be the case that Hy = x, where x is the vector of rates (x_s, s ∈ S) and y is the vector of route flows (y_r, r ∈ R). In order for the flow pattern to be feasible, no link capacities may be exceeded. This condition is met if Ay ≤ C, where C = (C_j, j ∈ J). Therefore, the optimization problem SYSTEM(U,H,A,C) may be expressed as follows:

max Σ_{s ∈ S} U_s(x_s)    (2.23)

subject to the constraints:

Hy = x,  Ay ≤ C,  x, y ≥ 0    (2.24)

The Lagrangian form of SYSTEM(U,H,A,C) is:

L(x, y, z; λ, μ) = Σ_{s ∈ S} U_s(x_s) − λ^T(x − Hy) + μ^T(C − Ay − z)    (2.25)

Here λ = (λ_s, s ∈ S) and μ = (μ_j, j ∈ J) are Lagrange multipliers, and z = (z_j, j ∈ J) are slack variables. After differentiating L with respect to its first three arguments, the following conditions are found to hold in order to maximize L:

λ_s = U_s'(x_s)  if x_s > 0;   λ_s ≥ U_s'(x_s)  if x_s = 0

λ_{s(r)} = Σ_{j ∈ r} μ_j  if y_r > 0;   λ_{s(r)} ≤ Σ_{j ∈ r} μ_j  if y_r = 0    (2.26)

μ_j = 0  if z_j > 0;   μ_j ≥ 0  if z_j = 0

Kelly then uses constrained optimization theory to demonstrate the existence of a quadruple (λ, μ, x, y) that solves the optimization problem SYSTEM(U,H,A,C). It satisfies the following conditions:

λ ≥ U'(x),  Hy = x,  (λ − U'(x))^T x = 0

μ ≥ 0,  Ay ≤ C,  μ^T(C − Ay) = 0    (2.27)

λ^T H ≤ μ^T A,  (μ^T A − λ^T H) y = 0

Another optimization problem USER_s(U_s; λ_s) is presented, in which each user s independently adjusts its rate x_s to maximize its net benefit U_s(x_s) − λ_s x_s, given it is charged a price λ_s per unit flow. A third optimization problem NETWORK(H,A,C; λ) requires the network to vary the flows in order to maximize its total revenue:

max Σ_{s ∈ S} λ_s x_s    (2.28)

subject to the constraints:

Hy = x,  Ay ≤ C,  x, y ≥ 0    (2.29)

Kelly then proves that, given the vector of optimal flows x comprised of the solutions to the S instances of USER_s(U_s; λ_s), there exists a price vector λ = (λ_s, s ∈ S) such that this same vector x solves both NETWORK(H,A,C; λ) and the original optimization problem SYSTEM(U,H,A,C). This fact mirrors a conclusion from [17]: the global optimization problem may be solved in a distributed manner, among the individual sources and links. In the special case where only a single route exists for each source-sink pair (meaning H = I), and the utility function is U_s(x_s) = m_s log x_s, the conditions from (2.27) imply:

λ_s = m_s / x_s,  λ_s = Σ_{j ∈ s} μ_j,  x_s = m_s / Σ_{j ∈ s} μ_j    (2.30)

such that x_s (s ∈ S) and μ_j (j ∈ J) solve:

μ ≥ 0,  Ax ≤ C,  μ^T(C − Ax) = 0    (2.31)

If m_s = 1 for all s ∈ S, the rates x = (x_s, s ∈ S) derived from (2.30) and (2.31) are proportionally fair. This means that the rates are feasible, x ≥ 0 and Ax ≤ C, and that for any other feasible vector x* the total of the proportional changes is non-positive:

Σ_{s ∈ S} (x_s* − x_s) / x_s ≤ 0    (2.32)

Indeed, the rates x found by solving (2.30) and (2.31) with m_s = 1 are the unique set which is proportionally fair. In [23], Paganini, Doyle, and Low consider price-based congestion control using a flow model and analysis from the field of control theory. This approach allows the modeling of feedback delays in the network. The model consists of L links utilized by S sources. The associated routing matrix R is of dimension L × S. Entry R_{li} = 1 if source i uses link l, and R_{li} = 0 if the link is not used. Each link l has a capacity c_l, a total rate of flow through it denoted by y_l, and generates a price signal p_l. If y_l < c_l, then p_l = 0 (i.e. only bottleneck links have a non-zero price).

Each source i sends a flow of rate x_i, and is aware of the price q_i, which is the sum of all the prices of the links used by i. The relationship between these variables in the Laplace-transform domain is:

y(s) = R_f(s) x(s)    (2.33)
q(s) = R_b(s)^T p(s)

R_f(s) contains information about the forward delay τ_{li}^f with which source i's flow reaches link l, and R_b(s) captures the backward delay τ_{li}^b with which link price signals reach the source. The matrices are defined as follows:

[R_f(s)]_{li} = I{l ∈ L(i)} e^{−τ_{li}^f s}    (2.34)
[R_b(s)]_{li} = I{l ∈ L(i)} e^{−τ_{li}^b s}

where I{·} is the indicator function. The network model is summarized in Figure 2.1.

Fig. 2.1 High-level illustration of network model with price signals (reproduced from [23])

Next, Paganini et al. examine the stability of the network when operating at an equilibrium point (x_0, y_0, p_0, q_0). The value of the equilibrium point depends on the design implementation of the network: specifically, how links set prices depending on their utilization, and how sources set their rates depending on the path price. The authors propose a linear control law, in which there is a gain K_i between q_i and x_i (implemented by the source), and the link controller integrates the flow y_l. The control law is defined as follows:

K_i = α_i x_{0i} / (M_i τ_i),    dp_l/dt = y_l / c_l    (2.35)

Here α_i < 1 is a free parameter, τ_i = τ_{li}^f + τ_{li}^b is the round-trip delay between source i and link l, x_{0i} is source i's equilibrium rate, and M_i is the number of bottleneck links in the path of source i. In order to guarantee stability, two assumptions are necessary: for any link, the target rate c_l must fall below its actual capacity by a small amount, and the matrices R_f(0) and R_b(0) must be of full rank. Paganini et al. provide a detailed proof that, with these control laws and assumptions, the system is linearly stable for any delay values, routing topologies and link capacities. In order to implement the network with nonlinear laws, the following price dynamic is suggested:

dp_l/dt = (y_l − c_l)/c_l          for p_l > 0,
dp_l/dt = max(0, (y_l − c_l)/c_l)  for p_l = 0.    (2.36)

Under this relationship, at equilibrium all non-bottleneck links will have zero prices, and bottleneck links will have y_l = c_l and a nonzero price. The control implementation mapping the price information q_i to the source rate x_i may be set as:

x_i = f_i(q_i) = x_{max,i} e^{−α_i q_i / M_i}    (2.37)

where x_{max,i} is the maximum transmission rate of source i.
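As a sanity check on the nonlinear laws (2.36)-(2.37), the following sketch simulates a single source on a single link, so that M_i = 1 and q_i = p_l. All numerical values, as well as the forward-Euler discretization with delays ignored, are our own illustrative assumptions:

```python
import math

# Single bottleneck link of capacity c shared by one source (M = 1, q = p).
c, x_max, alpha = 10.0, 40.0, 1.0
p, dt = 0.0, 0.01                      # initial price and Euler step size

for _ in range(20000):
    x = x_max * math.exp(-alpha * p)   # source law (2.37) with M = 1
    rate = (x - c) / c                 # price dynamic (2.36) ...
    p = max(0.0, p + dt * rate)        # ... projected so the price stays non-negative

# At equilibrium the link becomes a bottleneck: x -> c, so the price settles at
# the value that makes the source law emit exactly the capacity.
p_star = math.log(x_max / c) / alpha
```

Because the price integrates the normalized excess rate, it rises until the exponential source backoff brings the rate down to capacity, matching the claim that bottleneck links end up with y_l = c_l and a nonzero price.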

2.2.3 Issues Related to the Practical Implementation of Price-based Congestion Control

In this section we examine two probabilistic marking schemes for encoding pricing information which serve as alternatives to our deterministic algorithm. In [7] Athuraliya et al. present the Random Exponential Marking (REM) scheme. Each link l calculates its congestion price p_l, and encodes this information using the two-bit Explicit Congestion Notification (ECN) field in the IP packet header [19]. The link price p_l is calculated recursively, based on the total input rate x^l(t) at link l, its buffer backlog b_l(t), and its capacity c_l:

p_l(t + 1) = [p_l(t) + γ(α_l b_l(t) + x^l(t) − c_l)]^+    (2.38)

As intuition would suggest, the price is an increasing function of the backlog and input rate. The constants γ and α_l are small and strictly positive. Thus, p_l(t) is non-negative and unbounded. A link l marks unmarked packets arriving during time period t with probability m_l(t), which is a function of the current price p_l(t):

m_l(t) = 1 − φ^{−p_l(t)}    (2.39)

Here φ > 1 is fixed for a given implementation of the algorithm. The choice of φ impacts the error performance of REM, as discussed in the summary of [8] below. The end-to-end marking probability m^s(t) is therefore given by:

m^s(t) = 1 − Π_l (1 − m_l(t)) = 1 − φ^{−p^s(t)}    (2.40)

where p^s(t) = Σ_l p_l(t) is the total path price. To estimate the path price, the receiver logs the proportion of marked packets m̂^s(t) during a certain time period. It then forms an estimate p̂^s(t) of the path price by inverting (2.40):

p̂^s(t) = −log_φ(1 − m̂^s(t))    (2.41)
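The encode/decode pair (2.39)-(2.41) can be checked numerically. The sketch below fixes hypothetical link prices and a hypothetical φ (our own illustrative values), computes the end-to-end marking probability of (2.40) from the per-link probabilities, and verifies that the inversion (2.41) recovers the exact path price when the observed marking fraction equals its expectation:

```python
import math

phi = 2.0                      # REM parameter, phi > 1 (illustrative choice)
prices = [0.3, 0.8, 0.4]       # hypothetical link prices p_l(t)

# Per-link marking probabilities, equation (2.39).
m_links = [1.0 - phi ** (-p) for p in prices]

# End-to-end marking probability, equation (2.40): 1 - prod(1 - m_l),
# which collapses algebraically to 1 - phi^(-sum of link prices).
unmarked = 1.0
for m in m_links:
    unmarked *= (1.0 - m)
m_path = 1.0 - unmarked

# Receiver-side inversion, equation (2.41), applied to the exact marking
# probability (i.e., the limit of an infinitely long estimation window).
p_est = -math.log(1.0 - m_path, phi)
p_true = sum(prices)
```

The exponential form is what makes REM composable: independent per-link marking multiplies the unmarked probabilities, so the exponents (prices) add along the path.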

In order to implement the algorithm, the ECN codepoint X_0 is initialized to 01 (10 could also be used). If X_{i−1} = 01, the i-th router sets the ECN codepoint to 11 with probability 1 − φ^{−p_i(t)}. If X_{i−1} is already 11, the i-th router makes no change. The value of the codepoint at the receiver is X_n: X_n = 01 with probability φ^{−Σ_{i=1}^n p_i(t)}, and X_n = 11 otherwise. After collecting N packets the receiver may estimate the total price by computing X̄ = (1/N) Σ_{i=1}^N 1_i[X_n = 11], where 1_i[X_n = 11] is the indicator function for packet i, and forming the estimate p̂^s(t) = −log_φ(1 − X̄). Adler et al. present Random Additive Marking (RAM), an alternative probabilistic marking scheme, in [8]. The paper does not specify how each link calculates its price, but requires that each link's price p_i(t) fall in the range bounded by 0 and 1. It is also necessary that each link know its position i within the path. Links may estimate their position using a method based on the TTL field in the IP header [8]. Adler et al. demonstrate that price estimation performance does not significantly decline when this method is applied.

RAM is also implemented using the ECN field. Initially X_0 = 01, and the i-th router in the path leaves the ECN field X_{i−1} unchanged with probability (i − 1)/i, sets it to 11 with probability p_i(t)/i, and sets it to 01 otherwise. The expectation of the indicator function 1[X_n = 11] is then equal to Σ_{i=1}^n p_i(t)/n. Upon receiving N packets, the receiver may produce an unbiased estimate of the path price by multiplying the proportion of marked packets X̄ = (1/N) Σ_{i=1}^N 1_i[X_n = 11] by the path length:

p̂^s(t) = X̄ n    (2.42)
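The claim that RAM's marking indicator has expectation (1/n) Σ_i p_i(t) can be verified by propagating the exact codepoint distribution through the per-router rule; the sketch below does this for hypothetical prices, with no sampling involved:

```python
def ram_mark_probability(prices):
    """Exact Pr[X_n = 11] under the RAM rule: router i keeps X_{i-1}
    w.p. (i-1)/i, sets 11 w.p. p_i/i, and sets 01 otherwise."""
    pr11 = 0.0                        # X_0 = 01, so Pr[X_0 = 11] = 0
    for i, p in enumerate(prices, start=1):
        assert 0.0 <= p <= 1.0        # RAM requires link prices in [0, 1]
        pr11 = (i - 1) / i * pr11 + p / i
    return pr11

prices = [0.2, 0.9, 0.1, 0.6]         # hypothetical link prices
n = len(prices)
pr11 = ram_mark_probability(prices)

# The receiver's estimate (2.42) is (marked fraction) * n; its expectation
# is therefore exactly the path price, confirming unbiasedness.
expected_estimate = pr11 * n
path_price = sum(prices)
```

A short induction mirrors the loop: if Pr[X_{i−1} = 11] = (1/(i−1)) Σ_{k<i} p_k, then the update yields (1/i) Σ_{k≤i} p_k, which is what the recursion computes numerically.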

Adler et al. also analyze the accuracy of RAM and REM in estimating the path price. If the true path price is p^s(t), and the estimate is p̂^s(t), the error probability err(ε) is defined as the probability that the estimate does not fall within a certain range of the true price:

err(ε) = 1 − Pr[(1 − ε)p̂^s(t) ≤ p^s(t) ≤ (1 + ε)p̂^s(t)]    (2.43)

For RAM, the relationship between the price estimate p̂^s(t) and the proportion of marked packets m̂^s(t) is

p̂^s(t) = m̂^s(t) n    (2.44)

where n is the number of links in the path. For REM, the relation is:

p̂^s(t) = −log_φ(1 − m̂^s(t))    (2.45)

The error probability err(ε) may also be expressed in terms of m̂^s(t) and the true marking probability m^s(t):

err(ε) = 1 − Pr[(1 − δ⁻)m^s(t) ≤ m̂^s(t) ≤ (1 + δ⁺)m^s(t)]    (2.46)

It must hold that

p̂^s(t) = (1 − ε)p^s(t)  ⇔  m̂^s(t) = (1 − δ⁻)m^s(t)    (2.47)
p̂^s(t) = (1 + ε)p^s(t)  ⇔  m̂^s(t) = (1 + δ⁺)m^s(t)

Thus, for RAM:

p̂^s(t) = (1 − ε)p^s(t) = (1 − ε)n m^s(t) = (1 − δ⁻)n m^s(t)    (2.48)

which implies δ⁻ = ε, and, analogously, δ⁺ = ε. For REM:

p̂^s(t) = (1 − ε)p^s(t) = −log_φ(1 − m̂^s(t)) = −log_φ(1 − (1 − δ⁻)(1 − φ^{−p^s(t)}))    (2.49)

meaning

δ⁻ = (φ^{−(1−ε)p^s(t)} − φ^{−p^s(t)}) / (1 − φ^{−p^s(t)})    (2.50)

and, analogously,

δ⁺ = (φ^{−p^s(t)} − φ^{−(1+ε)p^s(t)}) / (1 − φ^{−p^s(t)})    (2.51)

Given the marking probability m^s(t) and the number of samples N, the number of marked packets B is binomially distributed, with probability mass function r(N, B, m^s(t)) = C(N, B) (m^s(t))^B (1 − m^s(t))^{N−B}. This allows the error probabilities to be calculated as follows:

Pr[p̂^s(t) > (1 + ε)p^s(t)] = Σ_{B ≥ (1+δ⁺)m^s(t)N} r(N, B, m^s(t))    (2.52)
Pr[p̂^s(t) < (1 − ε)p^s(t)] = Σ_{B ≤ (1−δ⁻)m^s(t)N} r(N, B, m^s(t))

Thus, err(ε) is the sum of the two preceding probabilities, with δ⁻ = δ⁺ = ε for RAM; for REM the relation is given by equations (2.50) and (2.51). Adler et al. compare the error probability versus path length for RAM and two versions of REM with different values of φ. They show that RAM provides a lower error probability over the entire range of path lengths, and that its performance has little dependence on the path length. For REM, err(ε) is a function of the parameter φ, and a poor choice of φ can result in poor estimation performance; the REM estimate is biased, but converges to p^s(t) almost surely. Under the assumption of independent, uniformly distributed link prices normalized to have a mean price of approximately 0.5, Adler et al. show that RAM always exhibits a lower error probability than REM. This is true even when the optimal value of φ is used. In addition, the performance of RAM, unlike that of REM, is not affected by path length. However, it suffers severely when the average link price strays significantly from 0.5. When a marking scheme is used for price estimation in a time-varying network model, there is the question of how many packets should be received before a new estimate is formed. If the price is constant, the accuracy of both RAM and REM continues to increase as the block of packets comprising the estimation window grows. However, if the price is time-varying, a longer estimation window increases the probability that the estimate will be partially based on outdated information. Shapiro, Hollot, and Towsley examine this trade-off by modeling a single-link network with price-based congestion control as a feedback loop [24]. A high-level illustration of the model is given in Figure 2.2.
In this model, the link price q(x), a function of the link rate x, is bounded in [0, 1] and is calculated using a scheme called adaptive virtual queue, proposed in [25]:

q(x) = (x − c̃)⁺ / x    (2.53)

where c̃ is the virtual capacity, which is adapted so that the rate stabilizes at a link rate less than or equal to the true capacity c. The source has a logarithmic utility function u(x):

u(x) = w log x    (2.54)

With a round-trip delay of τ, the source rate is adjusted according to the primal gradient descent algorithm proposed in [26]:

dx/dt = κ(w − x(t − τ) q̂(x(t − τ)))    (2.55)

Here κ is a scale factor, and q̂ is the price estimate calculated by the source, based on an estimation window of N packets. The round-trip delay τ is comprised of a fixed propagation delay τ_0 and the effective delay in receiving the packets in the estimation window. This effective delay is the time necessary to receive N/2 packets. If each packet is of size P, the total delay τ is:

τ = τ_0 + NP/(2x_0)    (2.56)

The price estimate q̂(x) is the sum of the true price and an error term η:

q̂(x) = q(x) + η    (2.57)

The estimation error η is modeled as Gaussian white noise. In the case of RAM, which is an unbiased estimator, the mean value of η is zero. The variance σ_η²(N, q) depends on the marking probability p = F(q) at the link. The proportion of marked packets in the estimation window of size N is binomially distributed with variance σ_D² = F(q)(1 − F(q))/N. The binomial distribution is approximated as a normal distribution with variance σ_D². Finally, the arrival of data packets must be approximated by a continuous-time process. Packets arrive once every t_s seconds, t_s = P/x_0, where x_0 is the average bit rate. The variance of the continuous-time process is given by σ_η²(N, q) = t_s² σ_D²:

σ_η²(N, q) = (P²/x_0²) F(q)(1 − F(q))/N ≤ P²/(4 x_0² N)    (2.58)

In order to calculate the variance (equivalently, the mean-squared error) of the source rate under this control model, Shapiro et al. linearize the system equations about the point (x_0, q_0) = (w + c̃, w/(w + c̃)). They derive an expression for the loop transfer function and then compute its 2-norm, which yields the mean-squared error σ_x² of the source rate (equation (2.59) of [24]). Shapiro et al. consider the effect that the window size N has on the MSE. They show that the optimal MSE is achieved with only approximately 100 samples per estimate, and that there is a wide range of sample sizes which result in an MSE close to the minimum.
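Ignoring the feedback delay and the estimation noise, the closed loop of (2.53)-(2.55) can be sketched with a forward-Euler iteration; the equilibrium rate should approach x_0 = w + c̃. All numerical values below are our own illustrative assumptions:

```python
# Single link: AVQ-style price q(x) = (x - c_virt)^+ / x, as in (2.53), and
# primal source dynamics dx/dt = kappa * (w - x * q(x)), as in (2.55), taken
# here with delay tau = 0 and a noise-free price estimate (eta = 0).
w, c_virt, kappa, dt = 2.0, 8.0, 0.5, 0.01
x = 1.0                                   # initial source rate

for _ in range(10000):
    q = max(0.0, x - c_virt) / x          # link price (2.53)
    x += dt * kappa * (w - x * q)         # gradient-descent rate update (2.55)

x_star = w + c_virt                       # predicted equilibrium rate
```

Below the virtual capacity the price is zero and the rate ramps up at rate κw; once x exceeds c̃, the loop settles where w = x q(x), i.e. x = w + c̃, which matches the linearization point used by Shapiro et al.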

2.2.4 Maximum Link Price Estimation

In Chapter 3 we describe a deterministic marking algorithm for estimating the total congestion price along a path, as an alternative to the probabilistic REM and RAM techniques described above. These results were first presented by us in a 2004 paper [27]. A proposed congestion control technique called MaxNet [28], which requires sources to only have information about the most congested link in a path, motivated us to publish a proposed modification to our algorithm in 2005 [29]. Independently, Ryu and Chong published their proposal for a similar algorithm [30], citing [27]. Subsequently, Andrew, Hanly, Chan, and Cui proposed an improved maximum link price estimation protocol, citing both [30] and [27]. We discuss these two papers in Section 3.7.4 after first describing our algorithm.

Fig. 2.2 Model of congestion control feedback loop. Source: [24]

Chapter 3

Deterministic Packet Marking for Congestion Price Estimation

3.1 Chapter Structure

The chapter is organized as follows. In Section 3.2 we review the current IP standard for packet marking, then propose a new deterministic marking algorithm and describe its structure. In Section 3.3 we quantify the inherent sources of error of our algorithm and analyze its performance in terms of mean-squared error and comparison with RAM. In Section 3.4, we examine the performance of our algorithm based on Internet trace-driven simulations. Finally, in Section 3.5 we present what we consider to be the most significant results in this chapter: we consider several scenarios with time-varying link prices and explore how well our algorithm and RAM track the path price.

3.2 Deterministic Packet Marking

3.2.1 IP Standard

ECN provides intermediate network routers with an alternative to dropping packets in order to signal congestion to end systems. RFC 3168 provides details of the proposed ECN addition to the IP protocol [19]. It specifies that the two-bit ECN field, comprising four codepoints, may be used to indicate the presence of congestion. In order to ensure backwards compatibility, one codepoint (00) is used to indicate that a sent packet does not support ECN marking. RFC 3168 indicates that when ECN is used, the sender initializes the ECN field to one of two codepoints (01, 10). This indicates ECT (ECN-Capable Transport), implying that the sender and receiver support the use of ECN. An intermediate node experiencing congestion will set the ECN field of a received packet with an ECT codepoint to the Congestion Experienced (CE) codepoint (11) to indicate its congestion to the receiver. When an end system receives a single packet with the CE codepoint set, the RFC requires that the end system's transport layer protocol respond in essentially the same manner as when it detects a dropped packet. RFC 3168 does not specify how routers are to calculate congestion nor how they shall decide whether to mark a given packet.
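The RFC 3168 codepoints and the router-side marking decision can be summarized in a few lines. This is a schematic sketch only; a real router combines it with a queue-management policy (e.g. RED) to decide when congestion exists, and drops rather than marks Not-ECT packets:

```python
# The two-bit ECN codepoints defined by RFC 3168.
NOT_ECT = 0b00   # endpoint does not support ECN: cannot be marked
ECT_1   = 0b01   # ECN-Capable Transport, codepoint ECT(1)
ECT_0   = 0b10   # ECN-Capable Transport, codepoint ECT(0)
CE      = 0b11   # Congestion Experienced

def mark_on_congestion(codepoint):
    """Return the codepoint a congested router should emit (schematic)."""
    if codepoint in (ECT_0, ECT_1, CE):
        return CE            # set the mark, or leave an existing mark in place
    return codepoint         # Not-ECT traffic passes through unmarked

marked = [mark_on_congestion(cp) for cp in (NOT_ECT, ECT_1, ECT_0, CE)]
```

Both probabilistic schemes above (REM, RAM) and our deterministic scheme below reuse these same two bits, but reinterpret the transitions between codepoints.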

3.2.2 Preliminaries

We now specify a marking mechanism which allows the routers on a path between transmitter and receiver to convey the sum of their quantized prices to the receiver. Our proposed algorithm makes use of only the two existing ECN bits in the IP header, and retains the RFC allocation of the 00-codepoint [19]. However, we modify both the router marking algorithm and the manner in which the receiver interprets the marking information. Unlike REM and RAM, our marking scheme is deterministic in the sense that every packet is marked, and the marking is performed according to a deterministic function applied to quantized link prices. As with RAM, our scheme requires that every link price s_i is bounded: 0 ≤ s_i ≤ 1. Every router calculates a b-bit uniform quantization of its congestion price. We use b = 4 for our empirical analysis and examples in this chapter, as this value provides a good trade-off between quantization error and the number of bits needed to describe the price estimates. The key idea behind our scheme is that each data packet encodes the sum of a small subset of link price bits. The receiver can reconstruct the sum of the quantized prices (and hence form an estimate of the path price) by combining all the partial sums. All data packets are mapped into so-called probe types, and each probe type may be modified by two routers so that it carries the sum of two bits of equal significance of the quantized prices of the two links outgoing from the routers. Figure 3.1(a) provides an example: probe type 7 carries the sum of the most significant bits (MSBs) of the quantized prices of links 3 and 4. Figure 3.1(b) indicates the output of the sum.

Fig. 3.1 Example of the operation of Deterministic Marking Algorithm I. (a) The nature of the probe types - each performs a sum of two bits, as indicated by the boxes. (b) The state output of the sum.

In order to initialize the summation procedure, X_0 is set to 01. Subsequently, only designated routers can change the ECN field; in this example, routers 3 and 4. These routers modify the ECN field if the price bit of their outgoing link to which the probe type corresponds (the MSB in our example) is equal to 1. In this case, the ECN field is updated according to Figure 3.1(b); i.e. state 01 is changed to 10, and 10 to 11. Thus, the data packet indicates the sum of the two bits when it arrives at the receiver. It is important to restrict the number of probe types to a reasonable value. If we limit the maximum path length to a value n_max, the number of probe types required is b⌈n_max/2⌉.

We use nmax = 30 in this chapter, to facilitate a concrete description of our technique. This implies a need for 60 probe types. We choose nmax = 30 due to the results of Begtasevic et al. [31], which indicate that paths of length greater than 30 are exceptionally rare in the Internet. In the unlikely event that a path does contain n > 30 (but fewer than 60) links, 2(n − 30) of the probe types are susceptible to overflow error because their ECN field may be incremented by three routers. An overflow will only occur if all three routers attempt to encode a price bit of 1. Therefore, the resulting estimation error is unlikely to be catastrophic unless the path length is significantly longer than 30 links. In order for our marking procedure to function correctly, we must define a mapping that labels each data packet as a certain probe type. This mapping makes use of the IP identification field. In addition, the routers must have a way to determine whether to modify a given probe type. They make this decision by comparing the time-to-live (TTL) field with the probe type label.

3.2.3 The IP Identification Field and Probe Types

The IPid field provides a mechanism for fragment reassembly. According to RFC 791 [32], it "is used to distinguish the fragments of one datagram from another" and, for each datagram, the identification field must be set to "a value that must be unique for the source-destination pair and protocol for the time the datagram will be active in the internet system." There are no additional requirements placed on the actual value of the IPid field. A technique for counting NATted hosts (hosts connected to the Internet by way of a Network Address Translator), introduced by Bellovin [33], exploits the manner in which many hosts implement the IPid. Bellovin makes the observation, also noted in [34], that many hosts implement the IPid field using a simple counter. This means successive data packets emitted by a host carry sequential IPid values. However, there are other, less common, approaches. Some hosts use pseudo-random number generators, while others use byte-swapped counters [33]. Some versions of Solaris implement separate sequence number spaces for each (source, destination, protocol) triple [33]. Since the 16-bit IPid field can take on significantly more unique values than the required

b⌈nmax/2⌉ (60 in our example) probe types, we require a mapping function. The function we choose sets the probe type identifier to IPid mod m, where m is a prime number slightly larger than our required number of probe types. The majority of the resulting m probe types carry price information, while the remaining probe types are reserved for other forms of communication. One of the major goals of this mapping function is that each probe type appears as regularly as possible (ideally once every 60 packets). The counter implementation and the Solaris mechanism are of substantial benefit here. In the case where the host implements the IPid field as a counter, the sequence of data packets sent between a given host and receiver cycles through the probe types, skipping some values whenever packets belonging to other streams intervene. Each probe type appears regularly with this implementation. In the ideal scenario - where a separate space is dedicated to each (source, destination, protocol) triple, the IPid field is generated by a counter, and every data packet belonging to that triple is used to calculate the path price - each probe type appears once every m data packets. In the next section, we analyse the performance of our marking algorithm assuming random IPid generation. The empirical survival function of the spacing between probe types of the same kind, using data collected from the Internet, is shown in Figure 3.3(a) in

Section 3.4. It illustrates that in most cases the spacing is slightly larger than m, which suggests the prevalence of counting implementations in the Internet. A strictly random IPid generation results in a substantially heavier tail.
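The mapping and its interaction with counter-based IPid generation can be illustrated with a short sketch (the modulus value is the one used later in this chapter; the function names are ours):

```python
# Illustrative sketch: map the 16-bit IPid to a probe type via IPid mod m.
# With a counter-based IPid, probes of the same type recur every m packets.
M = 63  # modulus used in the chapter's experiments

def probe_type(ipid):
    return ipid % M

# Counter-based host: successive IPids increase by 1 (modulo 2**16).
ipids = [(1000 + k) % 2 ** 16 for k in range(3 * M)]
types = [probe_type(i) for i in ipids]

# Each probe type reappears exactly M packets later.
first = types.index(5)
assert types[first + M] == 5 and types[first + 2 * M] == 5
```

With a pseudo-random IPid implementation this regularity disappears, which is why the spacing distribution acquires the heavier tail noted above.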

3.2.4 The Router Marking Algorithm

When a data packet arrives at a router, the router calculates its Link ID as LinkID = TTL mod nmax. Secondly, the router calculates the probe type as ProbeType = IPid mod m. The router then uses the pair (Link ID, Probe Type) to determine whether it should perform any action and, if so, what price bit it should encode. The router makes its decision by consulting an nmax × m lookup-table in its memory. This table, which is static and identical in all routers, returns an entry of "perform no action", or "increment ECN field if bit j = 1", where j ∈ {1, 2, ..., b}. Every column of the table contains two entries of the latter variety, corresponding to the two routers which may modify a given probe type. Each row of the table contains b such entries, one for each quantization bit. If the table informs a router to modify an incoming data packet, and the indicated price bit is 1, the router increments the state X_{i-1}, from 01 to 10 or from 10 to 11. In order for this marking algorithm to work properly, we must make two mild assumptions. First, to ensure that at most two routers modify the ECN field of a probe type, every router must decrement the TTL field by 1. Second, to ensure that disparate data packets corresponding to the same probe type are marked by the same routers, the TTL field for every data packet of a given stream must be initialized to a constant value by the sender. We summarize the marking procedure with pseudocode presented in Appendix B. A slight modification of the above technique, which we call Algorithm II, performs the bit summation in a different manner. Figure 3.2 depicts its operation. Each probe focuses on six bits of the same significance (examples are shown in Figure 3.2(d)). However, in this case the mechanism for determining a state transition is slightly more complicated. Each probe type is associated with one of two categories, A or B.
If the (Link ID, Probe Type) pair indicates that router i should participate, it checks the indicated price bit. If the price bit is zero, it takes no action, setting X_i = X_{i-1}. If the price bit is one, the router sets a new value of X_i. This value is determined by the category of the probe and the associated state transition diagrams, either that of Figure 3.2(a) or that of Figure 3.2(b). Two probe types of different categories focus on the same six bits. The output state of the two probe


Fig. 3.2 Algorithm II operation. (a) and (b): State transition diagrams for probe categories A and B, respectively. (c) The output of probe categories A and B depending on the sum of the bits. (d) Examples of probe types.

types uniquely determines the sum of the six bits, as depicted in Figure 3.2(c). As two probe types are sufficient to identify the sum of six bits, the number of probe types required to cover all bits is reduced by a factor of 2/3 to 2b⌈nmax/6⌉. In our example of b = 4 and nmax = 30, we require 40 probe types and use m = 43 as the modulus generating the probe types. Although Algorithm II requires fewer probe types than our primary algorithm to encode the same amount of information, it exhibits a significant increase in the expected error when the receiver performs a price estimate from an incomplete set of probe types. Thus, for subsequent discussion and simulations we only consider our primary algorithm.
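The router-side decision rule of the primary algorithm can be sketched as follows. The exact table layout is given in the thesis's Appendix B; the grouping used here (probe type j·15 + k covers bit j of links 2k and 2k+1) is only an illustrative assumption, and the function names are ours.

```python
# Sketch of the static (LinkID, ProbeType) lookup table for Algorithm I.
# Layout assumption (ours): probe type j*15 + k encodes bit j of links 2k, 2k+1.
N_MAX, B = 30, 4
N_TYPES = B * ((N_MAX + 1) // 2)     # 60 probe types

def table_entry(link_id, ptype):
    """Return the bit index this router should encode, or None (no action)."""
    if ptype >= N_TYPES:
        return None                  # reserved probe types
    bit, pair = divmod(ptype, N_MAX // 2)
    if link_id in (2 * pair, 2 * pair + 1):
        return bit
    return None

def mark(state, link_id, ptype, quantized_price):
    """Router action: increment the ECN state if the indicated price bit is 1."""
    bit = table_entry(link_id, ptype)
    if bit is not None and (quantized_price >> (B - 1 - bit)) & 1:
        state = {0b01: 0b10, 0b10: 0b11}[state]
    return state
```

Note that this layout satisfies the two structural properties stated above: each column (probe type) has exactly two marking entries, and each row (link) has b of them, one per quantization bit.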

3.2.5 Path Price Estimation

When every probe type has arrived at the receiver at least once, it can determine the sum of quantized prices exactly and the estimation error will be equal to the quantization error (as detailed in the following section). However, in any practical scenario, the path price can only be assumed to remain fixed for short durations. We specify two procedures for estimation in such a scenario:

1. Block-based estimation: After receiving a block of K data packets, form the path price estimate based on the available probe types. If one or more probe types are missing, insert values that minimize the expected error under the assumption of a uniform link price distribution.

2. Time-varying estimation: Upon the reception of a data packet, form a new estimate based on the values of the most recently received instances of each probe type.

In the following section, we provide bounds on the mean-squared error under the block-based estimation approach. In Section 3.4 we analyse performance empirically using probe type sequences derived from Internet traces.

3.3 Error Analysis

In this section, we analyse the performance of our marking algorithm based on fixed block lengths, with probe types generated independently and uniformly over the possible m values. Under the assumption of uniformly distributed link prices over the range [0,1], we derive a bound on the mean-squared error. In Section 3.4 we perform an empirical analysis of mean-squared error for comparison. There are two potential sources of error when a receiver calculates the end-to-end price using the deterministic marking algorithm. One is due to the quantization, and the other is due to the possibility of one or more unique probe types not being observed. We assume that the path is of length n links, each link price is quantized using b bits, a block size of K packets is examined by the receiver to determine the end-to-end price, and the number of unique probe types is m = b⌈n/2⌉. We commence with some background discussion.

Quantization Error

Under the assumption that the price of each link is uniformly distributed and normalized to fall in the interval [0,1), the obvious choice for the b-bit quantizer is a uniform one. That is to say, its representation points will be of the form

\frac{1}{2^{b+1}}, \frac{3}{2^{b+1}}, \frac{5}{2^{b+1}}, \ldots, \frac{2^{b+1}-1}{2^{b+1}}    (3.1)

Since link prices are uniform, the quantization error e_q is uniformly distributed between the two extremes: f(e_q) \sim U\left(-\frac{1}{2^{b+1}}, \frac{1}{2^{b+1}}\right), and it has a variance of Var(e_q) = \frac{1}{12 \cdot 2^{2b}}.
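The stated variance can be checked numerically with a quick Monte Carlo sketch (illustrative; function names are ours):

```python
# Numerical check: for a uniform b-bit quantizer on [0, 1), the error is
# uniform on (-1/2**(b+1), 1/2**(b+1)) with variance 1/(12 * 2**(2b)).
import random

def quantize(x, b):
    """Map x in [0, 1) to the midpoint of its 2**-b-wide cell."""
    cell = int(x * 2 ** b)
    return (2 * cell + 1) / 2 ** (b + 1)

b = 4
random.seed(1)
errors = [quantize(x, b) - x for x in (random.random() for _ in range(100_000))]
var = sum(e * e for e in errors) / len(errors)
print(abs(var - 1 / (12 * 2 ** (2 * b))))   # close to zero
```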

Error due to Missing Bits

In order to examine the error in representing the price of a link due to missing one or more of the b bits, we consider the decimal representation q_d of a binary quantization value (a_0, a_1, \ldots, a_{b-1}), where a_0 is the MSB (most significant bit). It is given by:

q_d = \frac{1}{2^{b+1}} + \sum_{i=0}^{b-1} \frac{a_i}{2^{i+1}}    (3.2)

For the purposes of estimating q_d we adopt the approach of replacing a missing bit a_i by its expected value under the assumption of a uniform distribution. Since any bit is equally likely to be 0 or 1, a missing bit will always be replaced by a value of 1/2. Clearly this results in a lower error variance than if a missing bit is randomly chosen to be 0 or 1. The

error due to a missing bit a_i is a discrete uniform random variable (denoted by U_D) taking on one of two values with equal probability:

e_{a_i} \sim U_D\left\{-\frac{1}{2^{i+2}}, \frac{1}{2^{i+2}}\right\}    (3.3)

The expected value of the error e_{a_i} is 0, and its variance is given by:

Var(e_{a_i}) = \frac{1}{2^{2(i+2)}}    (3.4)
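The arithmetic behind the missing-bit error can be verified directly: under the quantizer above, bit a_i carries weight 1/2^(i+1), so replacing a missing bit by 1/2 leaves an error of ±1/2^(i+2). A quick check (illustrative; names are ours):

```python
# Error from substituting 1/2 for a missing bit of weight 1/2**(i+1).
def missing_bit_error(i, actual_bit):
    return (0.5 - actual_bit) / 2 ** (i + 1)

i = 0  # MSB
errs = [missing_bit_error(i, bit) for bit in (0, 1)]
variance = sum(e * e for e in errs) / 2
print(errs, variance)   # -> [0.25, -0.25] 0.0625
```

For the MSB this gives the values ±1/4 and variance 1/16 used in the bound of Section 3.3.3.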

3.3.1 Distribution of Missing Probe Types

The distribution of the number of probes not observed in a block of size K is strongly dependent on how the IP identifier field increments between contiguously transmitted packets. Ideally, the identifier increases by one every time, in which case any block of K ≥ m is guaranteed to include at least one packet mapping to every probe type. However, as verified by our measurements in Section 3.4, some hosts on the Internet generate IP identifiers that change in a random manner. In this section, we consider the case where identifiers, and hence probe types, are randomly selected based on a uniform distribution. We note that there exist theoretical cases worse than randomly distributed identifiers. For instance, if all identifiers equivalent to a certain value s mod m are skipped by the host, then the probe type corresponding to this value will never be observed for any sequence length. However, our observation of Internet traffic does not provide evidence that such pathological behaviour is likely to occur. Due to our assumption of all probe types being equally likely, the analysis of the number of missing probes is an instance of the classical occupancy problem [35, 36]. Briefly, this problem considers the number of empty bins resulting from a random allocation of K balls into m different bins.
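The occupancy setting is easy to explore by simulation; the expected number of empty bins after K uniform throws into m bins is the standard result m(1 − 1/m)^K (a sketch, not part of the thesis's derivation):

```python
# Monte Carlo check of the occupancy problem: K balls into m bins uniformly
# at random; the expected number of empty bins is m * (1 - 1/m)**K.
import random

def missing_count(m, k, rng):
    seen = set(rng.randrange(m) for _ in range(k))
    return m - len(seen)

m, k = 63, 100
rng = random.Random(0)
mean_missing = sum(missing_count(m, k, rng) for _ in range(20_000)) / 20_000
expected = m * (1 - 1 / m) ** k
print(mean_missing, expected)   # both near 12.7
```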

3.3.2 Representing the End-to-End Error

Consider an estimation block of K packets in a scenario where there are m unique probe types. Define the random variable W, max(0, m − K) ≤ W ≤ m, as the number of probe types not observed in the block, and let w be a vector of the form (w_0, w_1, \ldots, w_{b-1}), where w_i, 0 ≤ w_i ≤ ⌈n/2⌉, is the number of probe types encoding information about the i-th price bit-column that have not been observed.

Theorem 1. If quantization is performed on each link price using b bits and there are n links, a block of K probes uniformly generated from m possible probe types is used to perform path-price estimation, and the quantization error is uniformly distributed between extrema, then the probability density of the estimation error e_t is:

p(e_t | n, b, m, K) = \sum_{W=\max(0, m-K)}^{m} p(W | m, K) \sum_{\mathbf{w}} p(\mathbf{w} | W, n, b) \, p(e_t | \mathbf{w}, n, b)    (3.5)

Detailed expressions for p(W | m, K), p(\mathbf{w} | W, n, b), and p(e_t | \mathbf{w}, n, b) are given by, respectively, (3.6), (3.7), and (3.8).

The decomposition of (3.5) is readily verifiable using the Law of Total Probability. We complete our derivation of the density function for the end-to-end error by considering each of the three component conditional distributions defined in Theorem 1 in turn. The result of interest arising from the Occupancy Problem is the distribution of the number of missing entities when drawing a certain sample size [35]. This result may be directly applied to establish the distribution for W:

p(W | m, K) = \binom{m}{W} \sum_{j=0}^{m-W} (-1)^j \binom{m-W}{j} \left(1 - \frac{W+j}{m}\right)^K    (3.6)

Our next objective is to determine the distribution of the components of the vector w given that W total probes are not observed. Each of the b price bit-columns has an equal proportion of the total number of probes m assigned to encode it, so a missing probe is equally likely to be from any one of the b columns. Thus, the distribution of the vector w is multivariate hypergeometric, with w_i ≤ m/b and \sum_{i=0}^{b-1} w_i = W:

p(\mathbf{w} | W, n, b) = \frac{\prod_{i=0}^{b-1} \binom{m/b}{w_i}}{\binom{m}{W}}    (3.7)

Finally, we consider the distribution of the error in the end-to-end price given w. The error is the combined sum of quantization errors of the price of each link and the errors due to missing bits. The distribution is comprised of a discrete component due to missing bits, and a continuous component arising from the quantization errors. The distribution of the total quantization error is the n-fold convolution over n IID uniform random variables. It has a mean of 0, is distributed in the interval \left[-\frac{n}{2^{b+1}}, \frac{n}{2^{b+1}}\right], and has a variance of \frac{n}{12 \cdot 2^{2b}}. We will denote a shifted version of this distribution centered at \mu as g_\mu. The distribution due to the 2W missing bits may be obtained by evaluating the error arising from every one of the 2^{2W} cases. Each case has probability \frac{1}{2^{2W}}, meaning that it is simply a matter of identifying all the possible error values and determining how many different cases map to each value. Let the total number of possible error values be v, and construct a (2 × v) probability matrix P in which entry P_{1,i} identifies the i-th error value and P_{2,i} is its probability. The distribution of the total error due to quantization and missing bits is then given by the following sum:

p(e_t | \mathbf{w}, n, b) = \sum_{i=1}^{v} P_{2,i} \, g_{P_{1,i}}(e_t)    (3.8)

3.3.3 A Bound on the Expected Mean-Squared Error

We now derive a bound on the expected mean-squared error of the end-to-end price estimate assuming uniform generation of probe types. We make use of another result related to the Occupancy Problem - an upper bound H(K, m, W) on the probability p(W | m, K) of missing W probe types when K probes are generated and there are m unique types (from Theorem III in [37]):

H(K, m, W) = \begin{cases} \exp\left\{-\left(W \ln\frac{W}{E[W]} + E[W] - W\right)\right\}, & W \geq E[W] \\ \exp\left\{-\frac{(E[W] - W)^2}{2 E[W]}\right\}, & W < E[W] \end{cases}    (3.9)

where E[W] = m(1 − 1/m)^K is the expected number of probe types missing from the block.

Recall that a missing bit is replaced by the value 1/2 for purposes of calculating the quantized price for a given link. Given that an MSB a_0 contributes a_0/2 to the quantized price, a missing MSB results in a discrete error taking on the values ±(1 − 1/2)(1/2) = ±1/4 with equal probability. We can thus upper-bound the error due to a missing MSB (and hence any bit) at ±1/4 and the variance at 1/16. Finally, due to independence, the variance in the error of the path price estimate is just the sum of the individual variances, which is at most \frac{n}{12 \cdot 2^{2b}} + \frac{2W}{16}. Multiplying the missing-bit variance by (3.9) and summing over all possible values of W then gives the following result:

Theorem 2. For an estimation block of K probes uniformly drawn from m unique probe types, with n links in the path and b bits used to quantize each link price, the expected mean-squared error in the path-price estimation is bounded as:

MSE(m, b, K, n) \leq \sum_{W=0}^{m} \min(H(K, m, W), 1) \frac{W}{8} + \frac{n}{12 \cdot 2^{2b}}    (3.10)

where H(K, m, W) is determined from (3.9).

3.4 Simulation Performance

In this section of the chapter, we analyse the performance of the estimation algorithm using trace-driven simulations. During the last week of June 2003, we collected 100 traces of 2000 packets with the program TCPDump by downloading files from 100 different servers. Of these servers, 50 were based in USA/Canada, 25 in Europe, 15 in Asia and Oceania, and the remaining ten in other locations worldwide, including South America and Africa. Using a simple script, we extracted the IP identifier field for each packet. Next, we mapped the field into a probe type using m = 63. We perform our analysis assuming a maximum path length of 30 (the TTL is taken modulo 30) and 4-bit quantization. These settings imply that the algorithm requires 60 probe types for estimation. The remaining three are reserved for protocol communication.

(a) The survival function of the spacing between probes of the same type. (b) The mean number of missing probes as a function of block length.

Fig. 3.3 Analysis of the distribution of probe types. In each plot, the solid curve corresponds to the Internet traces and the dashed curve to uniform random allocation of probe type.

Since the performance of the algorithm is dependent on the properties of the sequence of IP identifiers in received packets, a closer examination of the traces is in order. A total of 44 traces exhibited a consistent increase of 1 in the IP identifier of successive packets. This is the ideal case for our algorithm, as it ensures that any block of size m contains all probe types. Another 53 of the traces exhibited a general sequentially increasing trend, but on occasion the IP identifiers of contiguous packets did not increase by 1 (i.e., there were "skips"). The behaviour of all servers with sequentially increasing IP identifiers is summarized in Table 3.1. The remaining three traces had randomly changing IP identifiers. We conducted a simple statistical autocorrelation test on each of these sequences, which suggested that they were truly random. For most servers, we were able to determine the operating system being used. The distribution is given in Table 3.2.
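The per-trace statistic tallied in Table 3.1 can be sketched as follows (an illustrative reconstruction of the post-processing, with our own function names; note the modulo-2^16 comparison, which treats the 16-bit counter wrap as an increment of 1):

```python
# Percentage of successive packets whose IPid increases by exactly 1 (mod 2**16).
def pct_increment_by_one(ipids):
    hits = sum((b - a) % 2 ** 16 == 1 for a, b in zip(ipids, ipids[1:]))
    return 100.0 * hits / (len(ipids) - 1)

# A counter with one "skip": 9 of 10 transitions increase by 1.
trace = [100, 101, 102, 103, 104, 105, 108, 109, 110, 111, 112]
print(pct_increment_by_one(trace))  # -> 90.0
```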

Percentage of Received Packets         Number of Servers
with IP Identifier Increasing by 1     Displaying this Behaviour
100                                    44
99+                                    16
90-98                                  18
80-89                                   5
70-79                                   3
60-69                                   1
50-59                                   6
less than 50                            4

Table 3.1 The breakdown of observed server IPid behaviour

Operating System    Number of Servers
(illegible)         53
FreeBSD             23
Solaris             15
Windows              6
Unknown              3

Table 3.2 The breakdown of observed server operating systems.

3.4.1 Missing Probe Types

First, we examined the nature of the distribution of probe types in the empirical data sequences. It is important that every probe type appears regularly. Figure 3.3(a) shows the survival function of the spacing between probes of the same type over the 100 traces. The function generated from a uniform allocation of probe types is shown for comparison. If the IP identifier were implemented as a strict counter, with no intervening packets from the host, probe types of the same kind would always be 63 packets apart. The empirical survival function indicates that the counter implementation is widespread; more than 90 percent of the time, the spacing is 63 packets. Sometimes the probe types are spaced much further apart; this occurs due to pseudorandom implementations and busy servers where the counter frequently skips. 99 percent of the time, the spacing between probes of the same type is less than 150 packets. Clearly, the uniform generation of probe types results in a substantially heavier tail in the spacing distribution (although the decay rate is approximately the same). The second feature that interests us for the purposes of block-based path price estimation is the number of probe types missing from a block of size K. Figure 3.3(b) shows the mean number of missing probes as a function of block length for the empirical traces and for uniform random allocation. For a block length of 63, the mean number of missing probe types is slightly less than 4 in the empirical case (as compared to 24 for uniform allocation). The effects of the IP identifier counter implementation are again evident. The mean number of missing probe types is substantially less than 1 when the block size is 140. Figure 3.4 shows the empirical survival functions for the number of missing probes in a block for the Internet traces.
In the case of a block size of 100 packets, 80 percent of the time there are no missing types, and approximately 95 percent of the time there are fewer than 10 missing types.

3.4.2 Error Analysis

We performed an examination of the error probability of the deterministic estimation to enable comparison with the probabilistic price estimation of RAM [8]. The error probability err(e) is defined as the probability that the path-price estimate falls outside of a range (defined

Fig. 3.4 The survival function for the number of missing probe types. Dashed line corresponds to uniform random probe type allocation for a block length of 100. Solid lines correspond to Internet traces; diamond marker is block length 80, no marker block length 100, circular marker block length 120.

by e) about the true price:

err(e) = 1 - \Pr[(1 - e)z \leq \hat{z} \leq (1 + e)z]    (3.11)

where z is the true path price and \hat{z} the estimate.
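The metric can be computed directly from a set of (true price, estimate) pairs; a minimal sketch (illustrative, names are ours):

```python
# Direct implementation of the error-probability metric err(e) of (3.11).
def err_probability(pairs, eps):
    """Fraction of estimates falling outside (1 - eps)z .. (1 + eps)z."""
    outside = sum(not ((1 - eps) * z <= z_hat <= (1 + eps) * z)
                  for z, z_hat in pairs)
    return outside / len(pairs)

pairs = [(1.0, 1.05), (1.0, 1.2), (2.0, 1.9), (2.0, 2.5)]
print(err_probability(pairs, 0.1))  # -> 0.5
```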

We examined two cases, each with a path consisting of 10 links. In case 1, all link prices were uniformly distributed over [0, 0.2]. In case 2, all link prices were uniformly distributed over [0.4, 0.6]. For both cases, one hundred realizations of link prices were generated, and the error probability was evaluated for each realization using the Internet traces to generate probe types. The results were averaged to determine an average error probability. Figure 3.5 displays the resulting averages for e = 0.1 as a function of block length. The performance of RAM is shown for comparison (with results derived from expressions in [8]). Both algorithms show improved performance when the marking range is centered around 0.5. When the Internet traces are used to generate probe types for these two scenarios, the deterministic algorithm outperforms RAM for block lengths greater than approximately 60. However, RAM performs better for block sizes below 240 when probe types are generated from a uniform random distribution. We note that as the level of congestion is lowered, RAM will eventually provide a lower

(a) Link prices uniformly distributed over the range 0 to 0.2
(b) Link prices uniformly distributed over the range 0.4 to 0.6

Fig. 3.5 Error probability as a function of block length for a path of 10 links and e = 0.1. The solid line with diamond markers is the Internet trace deterministic algorithm. The solid line with circular markers is the RAM algorithm. The dashed line is the deterministic algorithm with uniform random probe type generation. The thin, solid line in figure (a) represents the lower bound on the error probability due to quantization. The lower bound is not shown in the other figure, as it is too close to zero to be distinguishable.

(a) The error probability as a function of e for a block length of 100.
(b) Normalized mean-squared error as a function of block length. Error is normalized by the number of links (20).
(c) Comparing the MSE bound to simulated MSE behaviour. The dashed line is the deterministic algorithm, and the solid line is the bound.

Fig. 3.6 Error Analysis - error probability and mean-squared error. The solid line with diamond markers is the deterministic algorithm with probe types generated from Internet traces. The solid line with circular markers is the RAM algorithm. The path length is 20 links and prices are uniformly distributed between 0 and 1. The dashed line is the deterministic algorithm with uniform random probe type generation. The thin, solid line in figure (a) represents the lower bound on the error probability due to quantization.

error probability than the deterministic algorithm for any block length. Since the expected quantization error inherent to our algorithm is constant regardless of path price, the relative error grows as the path price decreases. Thus, if a network is expected to operate under congestion levels consistently near zero, RAM may be a better choice for estimating path prices. However, in such a situation, one may consider renormalizing the congestion metric so that prices are distributed over a wider range of values and the deterministic algorithm performs better.

3.4.3 Mean-Squared Error Analysis

We performed an analysis of the mean-squared error of the deterministic algorithm for the case of uniformly distributed link prices over [0,1]. We generated 100 realizations of link prices for a path of length 20 links, and then evaluated the mean-squared error for each realization. The probe types were generated both from the Internet traces and from a uniform random allocation. Figure 3.6(a) shows the error probability as a function of e. Figure 3.6(b) shows the normalized mean-squared error (the error is normalized by the number of links in the path). The mean-squared error performance of RAM is shown for comparison. As in error probability, the deterministic algorithm outperforms RAM for small block lengths when probe types are generated from the Internet traces. As the block length becomes large (> 1000), RAM begins to outperform the deterministic algorithm; the performance of the latter is bounded by quantization error. We also examined the accuracy of the MSE bound derived in Section 3.3.3. Figure 3.6(c) shows the bound, along with the simulated MSE for a path length of 20 links whose price is estimated with random IPids. The figure illustrates the range of block lengths over which our bound is somewhat tight. The bound offers little insight for block lengths under 250.

3.5 Time Varying Behaviour

We have, to this point, examined the performance of our algorithm and RAM in static price scenarios. A more realistic model of a network will include time-varying levels of congestion and hence time-varying link prices. In this section we consider how well the algorithms perform in a time-varying scenario. The deterministic algorithm is naturally suited to estimating time-varying prices due to the periodic nature in which instances of a given probe type arrive. At any point in time,

(a) Sample time-varying link price
(b) Path price and estimate generated by deterministic algorithm with Internet trace-based IPids
(c) Path price and estimate generated by deterministic algorithm with random IPids
(d) Path price and estimate generated by RAM

Fig. 3.7 Estimating a Time-Varying Price. Figure (a) is an example of how the price of one link varies with time. Figures (b)-(d) illustrate a sample path price as the heavy curve and the given algorithm's current estimate as the dashed line.

(a) Error probability for Scenario 1: 20 links. (b) Error probability for Scenario 2: 30 links.
(c) Error probability for Scenario 3: 20 links. (d) Error probability for Scenario 4: 30 links.

Fig. 3.8 Error probability err(e). The solid line with diamond markers is the deterministic algorithm with probe types generated from Internet traces. The solid line with circular markers is the RAM algorithm. The dashed line is the deterministic algorithm with uniform random probe type generation.

the receiver estimates the current path price by considering the most recent instance of every probe type received. When a packet arrives, the ECN value of the associated probe becomes the new current value stored by the receiver. In order to adapt RAM to estimate a time-varying path price, one must choose an appropriate block length. The best choice is not apparent. On the one hand, the block length should be as short as possible, so that the algorithm can accurately detect price variations; if the block is so long that the path price varies significantly while the packets making up the block are sent, the estimate will be an average path price rather than an instantaneous value. On the other hand, the expected estimation error of RAM decays linearly with block length, meaning a long block length is necessary to avoid significant estimation errors. The block length that garners the best result is dependent on the specific dynamics of the link prices. For each time-varying scenario, we experimentally chose a block length that minimized the sample mean-squared-error over all estimates. These block lengths are provided in Table 3.3.

Scenario    MSE: Deterministic    MSE: Deterministic      MSE: RAM    RAM Optimal
            w/ random IPids       w/ trace-based IPids                Block Length
1           0.630                 0.270                   1.078       450
2           0.786                 0.223                   1.314       380
3           1.296                 0.904                   1.325       500
4           0.845                 0.224                   1.782       370

Table 3.3 Mean-squared error of estimates, and empirically determined optimal block length for RAM

We consider 4 scenarios. In our first two scenarios link prices change after every block of 50 packets is transmitted. While the interval between price changes is the same for all links, the price transitions are not synchronized. This is accomplished by having the first price transition occur after a random number of probes, uniformly distributed between 1 and 50. Each link price is initialized to a random value uniformly distributed in the range [0.25, 0.75]. Every price change is normally distributed with mean 0 and standard deviation 0.1. In addition, there are reflective boundaries at 0.85 and 0.15 which define the range of values the prices may take on. The behaviour of each link price may be considered a random walk bounded in the range [0.15, 0.85]. The two scenarios model paths with 20 and 30 links. Figure 3.7(a) depicts the time-varying price behaviour of a sample link over the interval in which 6000 data packets are sent. Figures 3.7(b)-(d) show a sample path price for the 30-link scenario, and the instantaneous estimates generated by RAM and the deterministic algorithm when exposed to data packets with random IPids as well as Internet trace-based IPids. The sample estimation plots suggest that RAM tends to experience longer runs of over/under-estimation than the deterministic algorithm. In order to draw more meaningful conclusions, we consider err(e) as defined in (3.11). We ran 100 iterations of each of the time-varying scenarios. Figures 3.8(a) and 3.8(b) depict the results. For every value of e, both instances of the deterministic algorithm perform better than RAM. As with the simulations involving static prices, the deterministic algorithm performs better when subjected to Internet trace-based IPids than to random ones. Our final two scenarios involve link prices that change less often - once every 100 packets.
However, the standard deviation of each change is 0.2, and the reflective boundaries are at 0 and 1. The resulting err(e) performance is shown in Figures 3.8(c) and 3.8(d). The relative performance of the algorithms remains the same. Another metric of interest in comparing the algorithms is the mean-squared error over all estimates. Table 3.3 summarizes this data. The deterministic algorithm exhibits the lowest MSE in all scenarios.
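As a concrete illustration, the bounded random walk that each link price follows in the simulations can be sketched as follows (a Python sketch; the function name and structure are our illustration, not part of the thesis simulator):

```python
import random

def simulate_link_price(n_blocks, step_sigma=0.1, lo=0.15, hi=0.85):
    """Bounded random walk of one link price, as in the first two scenarios:
    a normal step (mean 0, std step_sigma) after each block of packets, with
    reflective boundaries at lo and hi."""
    price = random.uniform(0.25, 0.75)   # initial price
    trace = [price]
    for _ in range(n_blocks):
        price += random.gauss(0.0, step_sigma)
        # reflect off the boundaries until the price is back in range
        while price < lo or price > hi:
            if price < lo:
                price = 2 * lo - price
            if price > hi:
                price = 2 * hi - price
        trace.append(price)
    return trace
```

The final two scenarios correspond to calling this with `step_sigma=0.2, lo=0.0, hi=1.0` and a change every 100 packets instead of 50.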

3.6 Practical Issues

3.6.1 Receiver Feedback

Any price-based congestion control protocol requires that the source be able to determine the price along its path to the receiver. As is the case for the REM and RAM proposals, we have not addressed the mechanism by which a receiver informs the sender of its path price estimate (the performance of REM in conjunction with RFC 3168-style feedback has been explored via simulation in [7]). RFC 3168 suggests the addition of a 1-bit ECN-Echo (ECE) flag in the TCP header [19]. The RFC proposes that after a client receives a packet with the ECN field in the CE state, it set the ECE bit to 1 in every acknowledgment it sends until it receives a packet from the sender with the Congestion Window Reduced (CWR) flag set. The CWR flag, another new TCP flag suggested in [19], is set by the sender to acknowledge the receipt of an ECE packet. Unfortunately, this mechanism cannot be readily adapted to provide explicit feedback of the path price to the sender. The fundamental problem is that there is the potential for multiple acknowledgments to be sent with the ECE field set in response to a single received packet with the ECN field in the CE state. This means that the sender is unable to accurately determine what proportion of its sent packets were marked. Thus, a novel approach is necessary in order to provide more informative price feedback.

3.6.2 Security

Although the specific method used by the receiver to convey its path price estimate back to the sender lies outside of the scope of this chapter, we will examine the associated security concerns. The primary concern is that a malicious receiver stands to receive a disproportionate share of a congested link's capacity if it conceals the presence of congestion from the sender [38]. Under our proposed algorithm, the receiver forms an estimate of the path price based on the received probes. Using our example of 4-bit quantization and a maximum of 30 links, the estimated path price will fall between 0 and 30 and can be represented - to the full precision afforded by the chosen quantization - using only 9 bits. In the simplest possible model of a feedback mechanism, the receiver periodically sends the 9-bit estimate to the sender (possibly embedded in TCP acknowledgment packets). This approach generates the smallest amount of traffic along the feedback path, but provides no security. It is trivial for a receiver to send any arbitrary path price estimate it wishes, and the sender has no way to detect any possible deceit. We now present a modification to our algorithm which guards against a malicious receiver falsifying its path estimate. Rather than always initializing the ECN field to 01, the sender chooses, while setting up the connection, a random ECN initialization for all probes corresponding to each bit-position. For every data packet subsequently sent, the sender determines the probe type so that it knows what bit position the probe is encoding. Based on this information, it initializes the ECN field to the (now) fixed value it randomly chose during setup. Routers increment the field as before, but may now also "wrap around" from 11 to 01. The receiver is not aware of the b initial ECN fields, rendering it incapable of calculating the estimated path price itself.
For each of the b categories of probe types, the receiver keeps a tally of how many of the probe types have their ECN field set to each of

the 3 possible codepoints. Once it has seen every probe type, the receiver provides its tally as feedback to the sender. This is enough information for the sender to calculate the path price estimate. For example, if the sender set the initial ECN field of probe types encoding MSB sums to 11, and the receiver informs it that of the 15 MSB probe types it received five were 11, four were 01, and six were 10, the sender would determine that five probes encoded a bit-sum of zero, four encoded a one, and six encoded a two. Thus it would deduce that 16 of the 30 MSBs were 1. With this approach, the total amount of data the receiver has to transmit to the sender under our standard assumptions is 32 bits: for each of the four bit-positions of the quantized link price, a value of 0 to 15 for the number of probe types with an ECN value of 01 (requiring 4 bits to represent), and a value from 0 to 15 for the number of probe types with an ECN value of 10 (again, requiring 4 bits). We note that the number of probe types taking on the remaining possible value of 11 is uniquely determined by the other two values, and hence does not have to be explicitly encoded. With this approach, we achieve a level of security because the receiver does not know the mapping between codepoints and bit-sums for any of the probes. However, due to the fact that the codepoint initialization remains constant for a subset of the probes, the receiver could conceivably keep a long-term count of how many of the three codepoints it observes for each of the sets and attempt to deduce the codepoint initialization. It would take considerable effort to configure a receiver to carry out this attack and the results achieved may not be reliable.
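The worked example above can be captured in a short sketch (Python; the function and the string representation of codepoints are our illustration, assuming the 01 → 10 → 11 → 01 wrap-around described earlier):

```python
CYCLE = ["01", "10", "11"]  # ECN codepoint cycle; routers wrap 11 -> 01

def decode_tally(init_codepoint, tally):
    """Recover the total bit-sum for one bit position from the receiver's
    tally of observed codepoints, given the secret initial codepoint the
    sender chose at connection setup. Each probe covers two links, so its
    bit-sum is 0, 1 or 2 increments past the initial codepoint."""
    base = CYCLE.index(init_codepoint)
    total = 0
    for codepoint, count in tally.items():
        bit_sum = (CYCLE.index(codepoint) - base) % 3
        total += bit_sum * count
    return total

# Worked example from the text: initial codepoint 11 for MSB probes;
# five 11s, four 01s and six 10s imply 16 of the 30 MSBs are 1.
assert decode_tally("11", {"11": 5, "01": 4, "10": 6}) == 16
```

The receiver cannot run this computation itself because `init_codepoint` is known only to the sender.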

3.7 Maximum Link Price Estimation

A newly proposed congestion control algorithm, MaxNet [28], only requires sources to have information about the most congested link in a path. In this section we present a modified version of our deterministic packet marking algorithm designed to be used with MaxNet. MaxNet is an attractive alternative because, as the authors show, it achieves MaxMin fairness for a wide range of utility functions. A utility function is defined by the relationship between source transmission rate and the congestion price. MaxMin fairness indicates that every source in a network is transmitting at the maximum rate possible without lowering the rate of another source transmitting at an equal or lower rate. In order to convey the maximum link price (MLP), the authors indicate that a packet "must include bits to communicate the complete congestion price", but the details lie outside the scope of their paper. This serves as the primary motivation for this section of the chapter. We specify a modified deterministic marking algorithm which estimates the MLP along an end-to-end path from source to receiver in a TCP/IP network, and conveys the information back to the source. Given the restriction the TCP protocol places on one's ability to embed or feed back information, the approach we describe is a very efficient mechanism for conveying MLP information to the source.

3.7.1 Algorithm Specification

As before, our algorithm requires every link price s_i to be bounded and normalized: 0 < s_i < 1. Each router calculates a b-bit quantization of the congestion price of outgoing links. The key idea of the algorithm is that there is a finite number of possible quantized prices a link may take on, and each packet ascertains whether any links currently take on 1 of 2 particular prices. The specific link prices a given packet is concerned with depend on its probe type. A packet's probe type is again determined by its IPid field. There are m = 2^(b-1) unique probe types, one for each of the possible values taken on by the b-1 most-significant bits (MSBs) of a link price. For example, if b = 4, probe type 0 is concerned with the two prices whose MSBs are 000. It ascertains if there is a link price (or link prices) on the path in the range [0000, 0001], and, if so, what the maximum price is in this range. Similarly, probe type 7 is concerned with prices in the range [1110, 1111]. When a packet arrives at a router, the router calculates the packet's probe type using a modulo operation: ProbeType = IPid mod m. The router accesses a static look-up table mapping probe types to the link prices with which they are concerned. If a router determines a match between the link price and the probe type, it modifies the packet's ECN field based on that field's current value and the least significant bit (LSB) of the link price. If the ECN field is 01 and the LSB is 0, it sets the ECN field to 10. If the LSB is 1, it sets the ECN field to 11. Once the receiver has received at least one instance of each probe type, it can inspect the ECN fields to determine the maximum quantized price of any link. It scans through the probe types, in order of highest to lowest associated price ranges. Proceeding in this manner, the first probe the receiver finds with an ECN field not equal to 01 determines the MLP.
In order to notify the source of the current MLP, the receiver makes use of the ECE bit in packets sent back to the source. Each such packet encodes one of the b bits of the currently estimated MLP. Packets are mapped to a different set of probe types based on the IPid field, in this case using a modulo b operation. Probe type 0 carries the LSB of the price, while probe type b-1 carries the MSB. Once the source has received one instance of each receiver probe type, it can discern the MLP.
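A minimal sketch of the marking and receiver logic just described (Python; the names are ours, and the rule that codepoint 11 dominates 10 for a matching probe type is our reading of the maximum-price semantics):

```python
def probe_type(ipid, b):
    """Probe type from the IPid field: m = 2^(b-1) types, one per value of
    the b-1 most-significant bits of a quantized price."""
    return ipid % (1 << (b - 1))

def mark(ecn, quantized_prices, ptype):
    """Router-side rule (a sketch of Section 3.7.1): if an outgoing-link
    price's b-1 MSBs match the probe type, encode its LSB in the ECN field:
    01 -> 10 for LSB 0, 11 for LSB 1. 11 dominates 10 so that the field
    reflects the maximum matching price in the two-price range."""
    for q in quantized_prices:
        if q >> 1 == ptype:
            if q & 1:
                ecn = "11"
            elif ecn == "01":
                ecn = "10"
    return ecn

def max_link_price(latest_ecn_by_type, b):
    """Receiver side: scan probe types from highest to lowest price range;
    the first field not equal to 01 determines the quantized MLP."""
    for ptype in range((1 << (b - 1)) - 1, -1, -1):
        ecn = latest_ecn_by_type.get(ptype, "01")
        if ecn != "01":
            return (ptype << 1) | (1 if ecn == "11" else 0)
    return None
```

For b = 4 and quantized link prices {0101, 1101, 0011}, scanning the probe types from 7 down finds probe type 6 marked 11, giving the correct MLP of 1101.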

3.7.2 Performance Analysis

As the MLP estimation algorithm proceeds, it will always be in one of three phases. The first phase, Outdated Estimate (OE), begins immediately after a change in the MLP. During this interval, neither the receiver nor source estimates are correct. We note that a correct estimate e_c is defined as differing from the true MLP p_m by no more than the maximum possible quantization error: p_m - 2^-(b+1) <= e_c <= p_m + 2^-(b+1). Upon receiving instances of the downstream probe types reflecting the new MLP, the receiver corrects its estimate. At this point the Correct Receiver Estimate (CRE) phase begins. It lasts until the source receives the upstream probe types necessary to mirror the correct estimate of the receiver. Thus begins the Correct Estimate (CE) phase, which lasts until the next MLP change. Two price changes in quick succession may result in either the CE or both the CE and CRE phases being of length zero. Next we will examine the characteristics of these phases, including the distribution of their lengths and the estimation errors at the source. Lengths are defined according to the number of downstream packets received, and in the case of fixed-rate packet arrivals may be converted to time lengths by multiplying by the rate parameter. The length of an OE interval depends on whether the associated MLP change was an increase or decrease. In the case of an increase, the receiver must only receive a probe type carrying information about the interval in which the new MLP lies. With randomly generated IPids, the number of packets that the receiver sees before receiving the one necessary probe type is geometrically distributed with parameter 1/m. With sequentially increasing IPids, the distribution is uniform over the interval [1, m]. If the MLP decreases, the receiver must receive a probe type carrying information on the interval of the previous MLP so that the receiver can deduce that no links lie in that price range anymore.
It may also require a probe type with information on the new MLP interval. However, in some cases it will already have this information and can form a correct estimate after receiving a probe type "clearing" the previous MLP (recall that the receiver stores the value of the most recently received instance of each probe type). The proportion of instances in which a price decrease requires the receiver to see only one specific probe type (as opposed to two or more) before forming a correct estimate depends on the number of links making up the path, the frequency at which individual link prices change, and the possible magnitudes of price changes. The distributions of the estimation error during the OE and CRE phases are also dependent on these factors. The length of the CRE depends on the number of bits that have changed between the receiver's current correct MLP estimate and its previous estimate. If 1 <= q <= b bits have changed, the source has to receive q unique upstream probe types to correct its estimate. In the case of random IPids, the corresponding number of packets the source has to receive is distributed according to a sum of geometric distributions with parameters q/b, (q-1)/b, ..., 1/b. With sequential IPids, the number of required packets is distributed in the interval [q, b]. The distribution of the CE phase length again depends on the frequency at which link prices change, and the distributions of the other two phases. The only source of error during this interval in the MLP estimate is due to quantization. If the MLP is uniformly distributed, the error will be uniformly distributed in the interval [-2^-(b+1), 2^-(b+1)], and the expected mean-squared error is 1/(12 * 2^2b).
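With random IPids, collecting the q required upstream probe types out of b is a partial coupon-collector problem: while k of the needed types are still missing, each upstream packet is useful with probability k/b, costing b/k packets on average. The expected CRE length can then be sketched as (Python; a sketch under this reading of the distribution):

```python
def expected_cre_length(q, b):
    """Expected number of upstream packets before the source has received
    all q changed probe types, out of b types assigned uniformly at random
    (random IPids): a sum of geometric expectations b/q + b/(q-1) + ... + b/1."""
    return sum(b / k for k in range(1, q + 1))
```

For example, with b = 4 and all q = 4 bits changed, this gives the classic coupon-collector value 4 * (1 + 1/2 + 1/3 + 1/4), while a single changed bit costs b packets on average.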
The preceding discussion has alluded to the two sources of error in our MLP estimation algorithm: quantization noise, and outdated estimates due to delay between changes in the MLP and arrivals of probe types conveying the updated information. There is an inherent trade-off between these two errors. A larger value of b reduces the expected quantization noise but increases the estimation delay, i.e. the lengths of OE and CRE. Given a set of network parameters - the rates and magnitudes of link price changes, the upstream and downstream packet rates, and the IPid behaviour - an obvious question is how to choose b to minimize the root mean-squared error (RMSE) of the estimate. Solving this problem analytically is not feasible since not all error and interval length distributions are known. Instead, we will approach this problem using simulation.

3.7.3 Simulation Results

Our simulation models a TCP connection over a path comprised of 20 individual links. The source sends data packets to the receiver, and the receiver sends only pure ACK packets.


Fig. 3.9 Maximum link price estimation example. The solid line indicates the actual MLP, and the dotted line the estimate at the receiver. The price of each link changes after every 100 packets, and the change is normally distributed with mean 0 and standard deviation 0.2.

Fig. 3.10 Empirical cumulative distribution functions of the lengths of the Outdated Estimate and Correct Receiver Estimate intervals: (a) empirical CDF of OE interval length; (b) empirical CDF of CRE interval length. The thin solid, dashed, and thick solid lines represent, respectively, 3-, 4-, and 5-bit price quantization.

(a) Price changes normally distributed with μ = 0, σ = 0.1; (b) price changes normally distributed with μ = 0, σ = 0.2; (c) price changes normally distributed with μ = 0, σ = 0.3.

Fig. 3.11 Root mean-squared error of MLP estimate as a function of the delay between link price changes. The thin solid, dashed, and thick solid lines represent, respectively, 3-, 4-, and 5-bit price quantization.

Since many TCP implementations send one ACK for every two data packets received [39], we fix the upstream packet rate at one half the downstream rate. The price of each link is initially uniformly distributed in the interval [0,1]. Subsequently, each link price changes independently after a fixed number of downstream packets have been sent. The magnitude of each price change is normally distributed, and there are reflective boundaries at 0 and 1. The IPids are randomly generated. Figure 3.9 provides an example of how a 4-bit version of our algorithm estimates the changing MLP under these conditions. We are interested in examining the effect that b has on the lengths of OE and CRE. The empirical CDFs of the interval lengths for b set to 3, 4, and 5 are presented in Figure 3.10. In all cases, the standard deviation of link price changes is set to 0.1, and prices change after every 100 downstream packets. The mean OE lengths, in order of increasing b, are 4.88, 9.78, and 22.23 packets. For the CRE lengths, the means are 6.41, 8.53, and 10.60. As expected, the average lengths of both intervals increase with b. Next, we examine the RMSE of the source estimate during each phase with the same simulation parameters as above. The results are compiled in Table 3.4. From the perspective of the source, the distinction between the OE and CRE phases is largely irrelevant, and therefore these two intervals were combined in the simulation. The RMSE during the CE is decreasing in b and significantly smaller than during the OE/CRE phase in all cases. The OE/CRE RMSE is essentially independent of b. It is also worth noting that the CE RMSE values all lie within 16% of the theoretical RMSE for uniform quantization. In order to explore the problem of minimizing the estimation RMSE, we consider simulations over a range of link price change-rates (LPCR) and magnitude distributions. The results are illustrated in Figure 3.11.
In all cases, 5-bit quantization results in the largest RMSE for the higher LPCRs, and eventually achieves the best performance as the rate decreases. This can be explained by the following observations: As the LPCR decreases, the expected lengths of the OE and CRE intervals are essentially unaffected, but the length of the CE phase tends to grow. Furthermore, the lengths of the OE and CRE intervals grow with b. Since the RMSE is much higher in the OE and CRE intervals than during the CE phase, it is advantageous to limit the lengths of these intervals by choosing a smaller b when the LPCR is high. However, since increasing b results in a lower RMSE during the CE phase, as the LPCR is lowered, the CE phase eventually becomes long enough to warrant the choice of a larger b.

Phase       3-bit quantization   4-bit quantization   5-bit quantization
OE and CRE  0.1030               0.0924               0.0995
CE          0.0317               0.0158               0.0076

Table 3.4 Root mean-squared error of estimates during OE/CRE and CE phases
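The comparison with the theoretical quantization RMSE can be checked directly (a sketch; the 16% bound is the one quoted in the text, and the table values are taken from Table 3.4):

```python
import math

def quantization_rmse(b):
    """Theoretical RMSE of a b-bit uniform quantizer over [0, 1]: the error
    is uniform on [-2^-(b+1), 2^-(b+1)], so the RMSE is 2^-b / sqrt(12)."""
    return 2 ** (-b) / math.sqrt(12)

# The measured CE RMSE values from Table 3.4 all lie within 16% of theory.
measured = {3: 0.0317, 4: 0.0158, 5: 0.0076}
for b, rmse in measured.items():
    assert abs(rmse - quantization_rmse(b)) / quantization_rmse(b) < 0.16
```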

3.7.4 Related Publications

In [30] Ryu and Chong cite our first publication describing the deterministic marking algorithm [27] and modify it in order to devise a maximum link price estimation scheme similar to the one presented in this section and published by us in [29]. In their approach, each link price is quantized to one of N uniform levels, and each of the M = ⌈N/3⌉ probe types is responsible for encoding whether any links take on one of 3 associated quantized prices, and, if so, what the maximum link price in the range is. Ryu and Chong's simulation results include considering the relationship between block length and estimation error, and the effect of packet loss rate. In [40], Andrew, Hanly, Chan, and Cui present another approach to maximum link price estimation. Their publication also cites our paper [27] as the basis for the concept of deterministic packet marking using the ECN field. Their approach does not use explicit link price quantization, but instead maps probe types to price thresholds, and requires routers to mark a packet if the associated link price p exceeds the packet's threshold. The receiver maintains an estimate p̂ of the maximum link price, and changes it to threshold i if it receives a marked packet of a probe type with threshold i > p̂, or receives an unmarked packet of a type with threshold i < p̂. Andrew et al. suggest two approaches to mapping probe types to threshold prices. One possibility is a simple mapping to random thresholds distributed uniformly over the range of possible prices. The more advanced approach is to have probe types that partition the space of possible prices into increasingly smaller intervals. Under the authors' bit-reversed counting assignment of probe types, after 2^J - 1 sequential probe types have been received, the receiver can estimate the maximum link price to fall in an interval of proportion 2^-J of the total price range. Simulation results show that the scheme proposed by Andrew et al. provides an improved mean-squared error over a modified version of our algorithm presented in [27].
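The bit-reversed counting assignment of Andrew et al. can be illustrated as follows (Python; the function names are ours, and thresholds are taken as fractions of a unit price range purely for illustration):

```python
def bit_reverse(k, bits):
    """Reverse the low `bits` bits of k (bit-reversed counting)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r

def thresholds_in_order(bits):
    """Thresholds visited by sequential probe types 1 .. 2^bits - 1 under a
    bit-reversed assignment: early probes bisect the price range coarsely,
    later ones refine it, so the first 2^J - 1 types localize the maximum
    price to an interval of width 2^-J of the range."""
    n = 1 << bits
    return [bit_reverse(k, bits) / n for k in range(1, n)]
```

With 3 bits, the sequence of thresholds begins 1/2, then 1/4 and 3/4: the first probe type halves the range, the next two quarter it, and so on.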

3.8 Summary

In this chapter we have presented a novel deterministic packet marking technique which allows the estimation of the total amount of congestion along a network path. We provided a theoretical analysis of its behaviour, and included a number of simulations comparing its performance to random marking techniques - in both static and time-varying price scenarios. Finally, we examined some of the implementation issues associated with the algorithm, and presented a modified version suitable for estimating the maximum amount of congestion on any link along a path.

Chapter 4

Background and Related Work: P2P Viruses and Pollution

4.1 Background

Peer-to-peer (P2P) networks have become increasingly vulnerable to malicious behaviour, including the dissemination of polluted versions of files and the release of P2P viruses. Early P2P networks focused exclusively on media files, so propagation of viruses was difficult to achieve [41]. Contemporary P2P networks such as Kazaa/FastTrack [4] and eDonkey2000 [42] can be used to disseminate executable files and are hence much more susceptible. The mainstream adoption of P2P file exchange - the eDonkey2000 network alone typically has over 2 million users connected at any given time [43] - means that a significant fraction of users lack the technical knowledge to detect suspicious files or scan for viruses. The phenomenon of pollution, the presence of corrupted (or "bad") versions of items (songs, movies or multimedia files) in P2P networks, has become increasingly prevalent. Some of these versions are made available by accident, as users make errors in file generation. But the dominant cause is deliberate dissemination of decoy files, termed item poisoning in [44], a technological mechanism employed by copyright holders and their agents to impede the distribution of content. These decoy files have names and metadata matching those of the genuine item, but contain corrupted, unreadable or inappropriate data. Whether accidental or deliberate, pollution has rendered a substantial portion of the files on popular


P2P networks unusable. In this thesis we examine the behaviour of viruses and pollution in P2P networks. We adopt an epidemiological approach, developing dynamic models to describe the evolution of infection/pollution. We consider the stochastic nature of the system during our development of the models, but our models are deterministic and focus on the expected behaviour of the system. We illustrate that these deterministic models are sufficiently accurate to capture the behaviour of P2P networks, by comparison with more realistic simulations that model individual peers. Our initial purpose is to model the impact of malicious code on a P2P network, but a primary motivation is to examine how effective the introduction of mitigation techniques might be. In particular, we focus on object reputation schemes (such as Credence [9]) and methods that increase the rate of elimination of infected files. Our model provides an analytical method for determining (at least approximately) how widespread the adoption of such schemes must be, and how effective they must be, in order that specific targets of residual pollution or infection be achieved. We validate these specifications through more accurate simulation of the networks.

4.1.1 P2P Networks, Viruses and Pollution

This section highlights the key features shared by popular P2P networks, including Kazaa, eDonkey2000, and Gnutella [5]. Every peer connected to the network has a shared folder containing all the files the user wishes to make publicly available for download by others on the network. When a user wants to download a file, he begins by sending out a search request. In response he receives a list of files matching the search criteria. The specific manner in which this list is generated varies among the various P2P networks, but in all cases the query response is the result of the examination of the shared folders of a subset of all peers connected to the network. Once the user elects to download one of the files from the list, his client attempts to set up a connection to a peer sharing the file and begins receiving the file. Depending on the specific network, the client may attempt to simultaneously download different parts of the file from a number of peers in order to expedite the operation. P2P clients typically save newly downloaded files in the shared folder - making them immediately available to other users. A number of worms and viruses that exploit P2P networks have already surfaced. Table A.1 in the Appendix summarizes the known P2P malware, culled from the public databases of several companies providing anti-virus software [45-52]. The majority of these behave in a similar fashion. Specifically, when a user downloads a file containing the virus and executes it, a number of new files containing the virus are created and placed in the client's shared directory. Some types of viruses, including Achar [53] and Gotorm [54], generate a fixed list of filenames when executed. More advanced viruses, such as Bare [55] and Krepper [56], randomly pick the list of filenames from a large pool of candidates. Pollution is a more widespread phenomenon, as indicated by the empirical study performed in [57].
The study shows that the number of versions of relatively popular items is generally substantial (on the order of tens or hundreds). It was also observed that the pollution level (the fraction of bad versions) for a specific item remained approximately constant over time.

4.2 Related Work

4.2.1 Internet-Related Epidemiological Modeling

Epidemiology has traditionally been used to analyze and model the spread of diseases in a biological population. However, a number of authors have published papers in which they applied this field to the area of computer networks. In particular, it has been used to model the spread of worms and email viruses over the Internet. In [58] Zou, Gong, and Towsley consider the Code Red worm, which exploited a vulnerability in Windows IIS web servers. After a largely unsuccessful first version, a second, effective version was launched on July 13th, 2001. Once a computer becomes infected, the worm generates 100 threads simultaneously, each of which randomly chooses an IP address to try to connect to and infect. After a 21-second time-out a thread tries another IP address. A significant amount of measurement data was collected on the worm, including a sample of the number of port scans and unique source addresses each hour. In the generic Kermack-McKendrick (K-M) epidemiology model [59], hosts are classified as infected, susceptible, or removed. The number of hosts in each class at time t is denoted by, respectively, I(t), S(t), and R(t). The total number of hosts is fixed as N = I(t) + S(t) + R(t). The rate at which infected hosts infect susceptible ones is given by β, and the rate at which infected hosts are removed from circulation is γ. The model equations are given by:

dJ(t)/dt = βJ(t)[N - J(t)]
dR(t)/dt = γI(t)                                (4.1)
J(t) = I(t) + R(t) = N - S(t)

Zou et al. identify two shortcomings of the K-M model in its suitability for modeling the Code Red worm. First, in the Internet, susceptible (not just infected) hosts can be removed from circulation. The reason for this is that users may patch a susceptible host, disconnect it, or implement some other preemptive countermeasure that renders it immune to infection. These removed susceptible hosts are represented by the variable Q(t), and the authors argue that its rate of change is given by:

dQ(t)/dt = μS(t)J(t)                            (4.2)

Zou et al. justify the dependency of (4.2) on J(t) by explaining that, as the worm first appears and few hosts are infected, few users will be aware of it and hence immunization is slow. As the level of infection increases, so does user awareness and the rate of immunization. The value μ is a constant. A second identified shortcoming is that the constant infection rate (β) assumed by the classic model is not appropriate for modeling the spread of worms. In actuality, Zou et al. argue, β changes with time. As exhibited by Code Red, worms cause Internet congestion due to the enormous amount of scanning packets, and may also cause some routers to reboot because the large volume of unique IP destination addresses fills up their memories. Thus, the authors model β(t) as:

β(t) = β₀[1 - I(t)/N]^η                         (4.3)

Here β₀ is a constant determining the initial infection rate, and η models how sensitive the rate is to I(t). Thus, the modified K-M model presented by Zou et al. is defined by the following equations:

dS(t)/dt = -β(t)S(t)I(t) - dQ(t)/dt
dR(t)/dt = γI(t)
dQ(t)/dt = μS(t)J(t)
β(t) = β₀[1 - I(t)/N]^η                         (4.4)
N = S(t) + I(t) + R(t) + Q(t)
I(0) = I₀ ≪ N;  S(0) = N - I₀;  R(0) = Q(0) = 0

Here S(t), I(t), R(t), and J(t) are as defined in the K-M model; C(t) = R(t) + Q(t), and N = I(t) + R(t) + Q(t) + S(t). The parameters in these equations are set so that the model matches the observed

behaviour of Code Red: η = 3, μ = 0.06/N, γ = 0.05, and β₀ = 0.8/N. A comparison between the evolution of infection predicted by this model and the measured Code Red behaviour is given in Figure 4.1. Indeed, the model tracks the true Code Red behaviour well. Figure 4.2 shows the evolution of J(t), Q(t), and I(t), and the evolution of I(t) according to the K-M model. It illustrates that the number of infectious hosts eventually decreases, due to the removal of susceptible hosts. In the classic model this decrease is never observed.
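A forward-Euler integration of the two-factor model (4.4) reproduces this qualitative behaviour (a sketch; the step size, horizon, and initial population are our illustrative choices rather than values from [58], and we read the second garbled parameter in the fit above as γ = 0.05):

```python
def two_factor_model(N=1_000_000, I0=1, eta=3, steps=2000, dt=0.05):
    """Forward-Euler integration of the two-factor worm model (4.4), with
    mu = 0.06/N, gamma = 0.05, beta0 = 0.8/N as in the quoted Code Red fit.
    Returns the trajectory of infectious hosts I(t)."""
    mu, gamma, beta0 = 0.06 / N, 0.05, 0.8 / N
    S, I, R, Q = N - I0, float(I0), 0.0, 0.0
    history = []
    for _ in range(steps):
        beta = beta0 * (1 - I / N) ** eta
        dQ = mu * S * (I + R)              # J(t) = I(t) + R(t)
        dS = -beta * S * I - dQ
        dR = gamma * I
        dI = beta * S * I - dR
        S += dS * dt; I += dI * dt; R += dR * dt; Q += dQ * dt
        history.append(I)
    return history
```

Running this shows the characteristic rise and eventual decline of I(t) that the classic K-M model cannot produce, since the decline is driven by the removal of susceptible hosts through Q(t).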

Fig. 4.1 Observed Code Red data compared with behaviour predicted by model. Source: [58]


Fig. 4.2 J(t), I(t) and Q(t) as predicted by model, and J(t) according to the classic K-M model. Source: [58]

In [60] Chen, Gao, and Kwiat present their Analytical Active Worm Propagation (AAWP) model. It is a discrete-time model of how the expected number of infected nodes evolves. The model is characterized by the following equation, a recursion for the number of infected machines n_{i+1} at time step i+1, i >= 0, with n_0 = h, where h is the size of a "hitlist" of known vulnerable machines which the worm immediately infects once deployed:

n_{i+1} = (1 - d - p)n_i + [(1 - p)^i N - n_i][1 - (1 - 1/2^32)^{s·n_i}]                (4.5)

where N is the number of susceptible hosts, s is the average rate at which infected machines scan hosts, p is the patching rate - the rate at which susceptible or infected machines are made invulnerable - and d is the death rate - how fast infected machines are detected and eliminated without patching. The first term accounts for a reduction in infected hosts due to patching and elimination, the first factor in the second term is the number of susceptible nodes remaining after i time steps, and the second factor is the probability that any given susceptible host is infected by an infected host during the step. The 1/2^32 element of the equation is due to the random nature of the scanning worm: each of the 2^32

possible IP addresses is scanned with equal likelihood. Similarly to what Zou et al. did in [58], Chen et al. use their model to simulate the Code Red worm in [60]. As illustrated in Figure 4.3, they are able to choose parameters which lead to an accurate representation of the measured Code Red behaviour. Data on the Code Red worm were collected by CAIDA [61] by recording how many scanning packets were sent on one /8 network (representing 2^24 IP addresses) and two /16 networks. Chen et al. apply their model to determine how large an IP address space must be monitored, using the same technique as employed by CAIDA, in order to accurately estimate the propagation of an active worm. They conclude that 2^24 addresses are indeed sufficient, and that address spaces smaller than 2^20 are inadequate.
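The recursion is straightforward to iterate directly. The sketch below uses the Code Red-like parameters from the caption of Figure 4.3 and a hitlist of one machine; it illustrates the recursion under our reading of equation (4.5), and is not a reproduction of the experiments in [60]:

```python
# Direct iteration of the AAWP recursion (4.5), with Code Red-like
# parameters (Figure 4.3 caption) and a hitlist of one machine.
N = 500_000             # vulnerable machines
s = 2.0                 # scans per second per infected machine
d = 0.00002             # death rate (per second)
p = 0.000002            # patching rate (per second)
n = 1.0                 # n_0 = h = 1 (single starting machine)

scan_space = 2 ** 32    # IPv4 addresses scanned uniformly at random
counts = [n]
for i in range(200_000):                      # one-second time steps
    susceptible = (1 - p) ** (i + 1) * N - n  # unpatched, uninfected pool
    hit_prob = 1 - (1 - 1 / scan_space) ** (s * n)
    n = (1 - d - p) * n + max(susceptible, 0.0) * hit_prob
    counts.append(n)

# The worm saturates most of the vulnerable population within ~a day.
assert max(counts) > 0.5 * N
```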


(a) Measurement of the Code Red v2 worm spread using real data from CAIDA. (b) A simulation of the spread of the Code Red v2 worm (500,000 vulnerable machines, starting on a single machine, a scanning rate of 2 scans/second, a death rate of 0.00002/second, a patching rate of 0.000002/second, and a time period of 1 second to complete infection).

Fig. 4.3 Measured Code Red worm behaviour vs. AAWP model prediction. Source: [60]

In [62] Garetto, Gong, and Towsley model the spreading of a virus via an email attachment. They present a discrete-time Markov chain known as an influence model. The email virus model is represented by a directed graph in which each node represents a user, and the edges represent connections to other users with whom a relationship, and thus the possibility of receiving an email, exists. Each edge coming into a node i from another

node j has a weight w_{i,j}, and these are normalized so that the sum over all incoming edges of any node is one. The weight represents the probability that, during a time-step of the Markov chain, user i checks for an email sent from j. Each user i has a probability c_i (the "click probability") of opening an attachment containing the virus, regardless of how many copies they receive. All users fall into one of three categories: susceptible (S), infected (I), or immune (M). An S user has not been infected, but has the potential to be; I indicates a user whose computer has been infected because he received and opened the email attachment; an M user can never be infected, due to a variety of possible reasons. The model is defined by the following system of recursive equations:

P_{I_j}[k+1] = P_{I_j}[k] + Σ_{i=1}^{N} w_{i,j} c_j P_{I_i S_j}[k]

P_{M_j}[k+1] = P_{M_j}[k] + Σ_{i=1}^{N} w_{i,j} (1 - c_j) P_{I_i S_j}[k]                (4.6)

P_{S_j}[k+1] = 1 - P_{I_j}[k+1] - P_{M_j}[k+1]

where the w_{i,j} are the edge weights and P_{I_i S_j}[k] is the probability that i is infected at time k while j is susceptible. The above equations cannot generally be solved, but explicit expressions for the expected steady-state behaviour of email viruses exist in special cases. Specifically, Garetto et al. examine the spreading of these viruses in the small-world graph proposed in [63]. The version they consider is a ring graph with N nodes, with each node connected to all neighbours within a range of k, and S additional "shortcut" links between randomly selected nodes; the shortcut ratio is defined as φ = S/(kN). An example of such a small-world graph is given in Figure 4.4. Results from previous work [64] allow a number of closed-form expressions, such as the expected number of sites, E[I_∞], that eventually become infected (equation (4.7) of [62]).

Here q = 1 - (1 - c)^k. When a virus is injected into the lattice at a given node, it reaches a node at distance k with probability P_R = c(1 - (1 - c)^k) = cq. This reaching probability decays geometrically with a parameter α.

The decay parameter α is given explicitly as a function of c, k, and q in equation (4.8) of [62]. In a ring lattice with no shortcuts, a node may be reached by the virus from either side. If the probability of being reached from one side is P_R^(1) and from the other side P_R^(2), the total probability of the node being reached is:

P_R = 1 - (1 - P_R^(1))(1 - P_R^(2))                (4.9)

Figure 4.5 shows the reaching probability on a ring lattice with N = 100 for two different sets of parameters: k = 3, c = 0.6; and k = 5, c = 0.4. In the case of a ring lattice with shortcuts, Garetto et al. do not present exact solutions.

Instead, they derive upper and lower bounds on the reaching probability P_R(i).
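The two-sided combination in (4.9) can be illustrated numerically. In the sketch below, the per-side reaching probabilities are modeled as decaying geometrically with distance, as described in the text; the decay parameter and the distance-1 probability are illustrative assumptions, not values from [62]:

```python
# Illustration of the two-sided combination (4.9) on a ring with no
# shortcuts. Per-side reaching probabilities are assumed to decay
# geometrically with hop distance (illustrative parameters).
N = 100
decay = 0.6          # illustrative geometric decay per hop
p0 = 0.9             # illustrative reaching probability at distance 1

def reach_prob(node, source=0):
    d_cw = (node - source) % N            # clockwise distance
    d_ccw = (source - node) % N           # counter-clockwise distance
    p1 = p0 * decay ** (d_cw - 1)         # reached going clockwise
    p2 = p0 * decay ** (d_ccw - 1)        # reached going counter-clockwise
    return 1 - (1 - p1) * (1 - p2)        # equation (4.9)

probs = [reach_prob(i) for i in range(1, N)]
assert probs[0] >= 0.9          # neighbours of the source are almost surely hit
assert probs[N // 2] < 1e-6     # the antipodal node almost never is
```

As in Figure 4.5, the probability is highest next to the source and decays symmetrically toward the opposite side of the ring.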

Fig. 4.4 Small-world graph example with N = 24, k = 3, and S = 4. Source: [62]

An explicit solution to (4.6) would be necessary in order to determine the transient behaviour of the network rather than the steady-state condition. Unfortunately this is not possible, even numerically, because the joint probabilities P_{I_i S_j} are unknown. However, upper and lower bounds on these joint probabilities exist for all graphs in the case where c_i = 1 for all i, and approximations exist for the small-world graph. Garetto et al. show that the approximations and bounds predict an evolution of the number of infected nodes over time that closely matches simulation results. In the case where

Fig. 4.5 Reaching probability of virus on a ring for two different parameters. Source: [62]

c_i < 1 for some or all users, looser approximations are necessary. The resulting predictions of transient behaviour remain reasonably close to the simulation results, although the model tends to overestimate the number of infected hosts, especially early in the simulated scenario.
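The transient recursion can be iterated numerically once the joint probabilities are approximated. The sketch below uses the independence approximation P_{I_i S_j}[k] ≈ P_I[i]·P_S[j] on a random weighted graph; the graph, the uniform click probabilities, and the approximation choice are all illustrative assumptions rather than the exact bounds of [62]:

```python
import random

# Numerical sketch of recursion (4.6) on a random weighted contact graph,
# with the joint probability P_{I_i S_j} approximated by P_I[i] * P_S[j].
random.seed(1)
N = 50
c = [0.5] * N                            # click probability of every user

# Random edge weights, normalized so incoming weights sum to one per node.
w = [[random.random() if i != j else 0.0 for i in range(N)] for j in range(N)]
for j in range(N):
    total = sum(w[j])
    w[j] = [x / total for x in w[j]]

P_I = [0.0] * N
P_I[0] = 1.0                             # virus injected at node 0
P_M = [0.0] * N
P_S = [1.0 - P_I[j] - P_M[j] for j in range(N)]

for k in range(200):
    new_I, new_M = [], []
    for j in range(N):
        flow = sum(w[j][i] * P_I[i] * P_S[j] for i in range(N))
        new_I.append(P_I[j] + c[j] * flow)          # infected share
        new_M.append(P_M[j] + (1 - c[j]) * flow)    # immune share
    P_I, P_M = new_I, new_M
    P_S = [1.0 - P_I[j] - P_M[j] for j in range(N)]

# With c_j = 0.5 everywhere, reachable users end up split roughly evenly
# between infected and immune, and (almost) nobody stays susceptible.
assert all(s_ < 1e-3 for s_ in P_S[1:])
assert 0.3 < sum(P_I) / N < 0.7
```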

4.2.2 Modeling and Simulating Pollution in P2P Networks

The presence of mislabeled or corrupt files in P2P networks is known as pollution. We model and analyze this phenomenon in the following chapter, and below present a summary of three recent papers considering the same issue. In [65] Kumar, Yao, Bagchi, Ross, and Rubenstein present a fluid model of pollution proliferation in P2P networks. In this model, there are initially N polluted versions of the file, shared by malicious peers in the network. There are M benign peers which want to obtain a non-polluted version of the file. Each peer inspects files at an exponentially distributed rate with parameter μ, which accounts for the time to download a file and assess whether it is clean or polluted. If a file is found to be polluted, the user deletes it and initiates a new download. In the simple copy-centric case, the probability p of downloading a polluted version of the file is given by the proportion of polluted files to the total number of files. Thus, given x(t) and y(t) - the number of benign peers with, respectively, a clean and a polluted file at time t - p(t) is given by:

p(t) = (y(t) + N)/(x(t) + y(t) + N)                (4.10)

The differential equations governing the model are given by:

dx(t)/dt = [M - x(t) - y(t)]μ(1 - p(t)) + y(t)μ(1 - p(t))                (4.11)

dy(t)/dt = [M - x(t) - y(t)]μp(t) - y(t)μ(1 - p(t))

The first term in the first equation accounts for peers with no copy downloading a clean copy, and the second term models peers with polluted files downloading a clean copy. Analogous reasoning applies to the second equation. Kumar et al. are able to solve this system of equations in closed form. They show that the solution is given by:

x(t) = M / (1 + c_2[(M + N)e^{μt} - c_1]^{-M/(M+N)})                (4.12)

y(t) = M - c_1 e^{-μt} - x(t),  where c_1 = M - x(0) - y(0)                (4.13)

and c_2 = [(M - x(0))/x(0)] · [N + x(0) + y(0)]^{M/(M+N)}.

In order to validate their fluid model, they compare their solutions to results from a discrete-event simulator of the P2P network. They find that the results mirror each other very closely, especially in the case of a large network (M = 100,000 and N = 10,000). Kumar et al. also examine the effect of K = (y(0) + N)/x(0), the initial ratio of polluted to good copies in the network. They consider the limiting case of pollution in a very large network, by defining

x̃(t) = lim_{M→∞} x(t)/M = 1/(1 + Ke^{-μt})                (4.14)

The equation illustrates that, even when the number of users wishing to download a file tends to infinity, the initial proportion of infected files has a significant effect on the proliferation of polluted files in the network. The paper shows that once attackers stop seeding their polluted copies - meaning N ≈ 0 - the time until M - ε of the peers obtain a clean copy of the file is O((1/μ)ln(M/ε)). Kumar et al. also study more advanced user behaviour, including a version-centric model in which there are multiple versions of a title in the network and users choose a version to download with equal probability, regardless of how many copies are available. This model also has an explicit solution. Other modifications considered are: modeling of peer abandonment and freeloading, a more advanced version-centric model in which users have a varying amount of bias towards selecting versions with more copies, and blacklisting behaviour whereby users never download the same polluted version more than once. All these modifications are solved numerically only.

In [44] Dumitriu, Knightly, Kuzmanovic, Stoica, and Zwaenepoel model two types of P2P pollution attack: file-targeted and network-targeted. A file-targeted attack is the traditional approach, where malicious nodes inject corrupt copies of a file into the network. Network-targeted attacks are theoretical in nature, but potentially much more effective.
They entail malicious nodes responding to every search request with false information.

To model file-targeted attacks, Dumitriu et al. present a model with M users, with g_i and b_i being the number of peers that have a good and a corrupt copy of the file at hour i, respectively. The remaining M - g_i - b_i peers have no copy, and s_i denotes the per-peer rate at which such peers attempt downloads at hour i. A downloaded file is good with probability equal to the fraction of good copies, so

g_{i+1} = g_i + (M - g_i - b_i)s_i · g_i/(g_i + b_i)                (4.15)

Under the assumption that peers with an infected file delete it after at most L hours, and after j ≤ L hours with probability p_j, where Σ_{j=1}^{L} p_j = 1, the number of peers with polluted files at time i + 1 is given by the recursion

b_{i+1} = b_i + R_i - Σ_{j=1}^{L} p_j R_{i+1-j}                (4.16)

where R_i is the number of peers which newly downloaded a polluted file at time i:

R_i = (M - g_i - b_i)s_i · b_i/(g_i + b_i)                (4.17)

Under these model assumptions, all peers eventually download a valid copy of the file. However, since measurement studies of P2P networks indicate that the proportion of polluted files stays fairly constant with time [57], Dumitriu et al. present a modification to their model to account for this behaviour. They introduce a probability p_s that a user does not share a legitimate copy of a file after downloading it, and denote by g'_i the number of peers at time i sharing a good copy of the file. This leads to a modified version of equation (4.15) in which the probability of obtaining a good copy depends on the shared pool g'_i rather than g_i, and only a fraction 1 - p_s of peers that obtain a good copy add it to that pool.

The authors also argue that general peer interest in downloading a file decreases with time, due to frustration from pollution. Thus, s_i decreases with time in the modified version.

Setting s_i to decrease by 15% per day, with s_0 = 1/24, p_s = 0.4, M = 15,000, b_0 = 1,500, g_0 = 10, and L = 48, they obtain Figure 4.6. Under these conditions, the proportion of polluted files remains high, and eventually becomes essentially constant.
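The basic file-targeted recursions can be iterated directly. The sketch below uses the parameter values quoted above but omits p_s and the interest decay, so, as stated for the unmodified model, all peers eventually obtain a good copy. Two illustrative assumptions: deletion delays are uniform (p_j = 1/L), and the initial polluted pool is likewise drained uniformly over the first L hours:

```python
# The basic file-targeted recursions (4.15)-(4.17) iterated directly
# (no sharing probability p_s, no interest decay). Uniform deletion
# delays p_j = 1/L and uniform drain of the initial polluted pool are
# illustrative assumptions.
M = 15_000
L = 48
s = 1.0 / 24                    # per-peer download rate, held constant here
B0 = 1_500.0                    # initial peers with a polluted copy
g = [10.0]                      # peers with a good copy
b = [B0]                        # peers with a polluted copy
R = []                          # newly polluted downloads per hour
p = [1.0 / L] * L               # deletion-delay distribution

for i in range(500):
    no_copy = M - g[i] - b[i]
    R.append(no_copy * s * b[i] / (g[i] + b[i]))              # (4.17)
    g.append(g[i] + no_copy * s * g[i] / (g[i] + b[i]))       # (4.15)
    deleted = sum(p[j - 1] * R[i + 1 - j]
                  for j in range(1, L + 1) if i + 1 - j >= 0)
    if i < L:
        deleted += B0 / L       # drain the initial polluted pool
    b.append(b[i] + R[i] - deleted)                           # (4.16)

# In the basic model every peer eventually obtains a good copy.
assert g[-1] > 0.99 * M
assert b[-1] < 0.01 * M
```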


Fig. 4.6 Steady-state pollution level. Source: [44]

In network-targeted attacks, the second type of attack proposed by Dumitriu et al., malicious nodes respond to all searches directed to them, and are able to modify responses from other nodes. In P2P networks which incorporate supernodes, such as Kazaa, searches and responses are routed via a mesh-connected network of the supernodes, and supernodes respond to the searches originating from their attached leaf nodes. The authors define a model with N peers and S supernodes, of which s are malicious (s < S < N). A false response to a search results when one or more supernodes involved in the search is malicious. Therefore, if a search traverses a path involving h supernodes, the probability of a false response is

P(false|h) = s/S + (1 - (1 - s/S)^h)(1 - s/S)                (4.19)

Here the first term is the probability that a leaf node is connected to a malicious supernode, and the second term is the probability of encountering a malicious supernode along the path, given that the leaf node is not directly attached to one. An analogous result holds for proposed structured P2P networks such as Chord [66] and Pastry [67], which can be approximated by regular graphs. These do not implement supernodes, so the instances of s/S in (4.19) are replaced by the ratio of the number of malicious nodes n to the total number of nodes N. Next, Dumitriu et al. consider P2P networks with a power-law structure. They assume node degrees follow a Pareto distribution, meaning the number of nodes with degree greater than x is given by P(X > x) = kx^{-a}, where k and a are constants. The authors show that if there are h hops per path and malicious nodes are placed at the f most connected nodes in the network, then the probability of receiving a false response to a query is given by:

P(false) = 1 - (1 - D(f)/D)^h = 1 - [(N^{(a-1)/a} - f^{(a-1)/a})/(N^{(a-1)/a} - 1)]^h                (4.20)

where D(f) is the aggregate degree of the f most connected nodes and D is the aggregate degree of all nodes. Dumitriu et al. examine client strategies to counteract network-targeted attacks. In all cases, the assumption is that the peer receives N replies to a search, and that each reply is independently false with probability P(false). If malicious nodes always provide responses indicating they offer the best performance (for instance, in terms of available upload rate), and a user always selects the "best" source to download from, the probability of success is only P(succ) = (1 - P(false))^N. If a user selects a source at

random, then P(succ) = 1 - P(false). If a user selects C copies to download concurrently, the probability of at least one successful download is P(succ) = 1 - P(false)^C. Simulation results show that in hierarchical P2P networks, a redundant version of the "best" approach - downloading in parallel from the two or more nodes advertising the best performance - reduces the network goodput (the average aggregate rate of useful data transmitted in the network) when no malicious nodes are present, and has no beneficial effect when malicious nodes are present. Random redundant downloads reduce goodput when no malicious nodes are present, but increase it in the presence of malicious nodes. A combination of random redundant downloads and an object reputation system fares best when a significant proportion of supernodes are malicious, but greatly reduces goodput when fewer than 2% of supernodes are malicious.

In [68] Christin, Weigend, and Chuang perform a measurement study of four P2P networks: Gnutella, eDonkey, Overnet/eDonkey, and Fastrack/Kazaa. Based on the measurement data, they perform simulations to assess the effectiveness of poisoning the networks (they define pollution as the accidental injection of unusable files into a P2P network, and poisoning as the intentional injection of decoy files - often by copyright holders seeking to decrease the ease with which pirated files may be downloaded). The two metrics Christin et al. consider in examining the effects of content poisoning are temporal stability and content replication. Temporal stability is defined by a function χ(τ), which gives the average probability that a file available at a given time T is also available at time T + τ. This empirically derived function is obtained by averaging measured data both over time and over all P2P clients used in the experiment (located at over 50 nodes around the world, including the Americas, Europe, Asia, and Oceania).
Content replication measures the number of copies available for each of the (up to) 1000 most prevalent files matching the 15 search criteria used. Of the 15 search strings, 6 were for movies, 6 for songs, and 3 for software titles. The titles are not given, but are described as "popular" at the time. The first poisoning technique Christin et al. consider is Random Decoy Injection, under which a set of hosts inject distinct decoy files with random, frequently changing content. The authors argue that this is not an effective strategy unless an extremely large number of decoys is injected into the network - on the order of 100 times the number of legitimate files returned by a query. At lower levels, users can easily filter out random decoys by ranking matches according to the number of available copies. A second suggested alternative is Replicated Decoy Injection (RDI), where numerous

Fig. 4.7 Effect of poisoning techniques on temporal stability. Source: [68]

Fig. 4.8 Effect of poisoning techniques on content replication. Source: [68]

copies of the same decoy are injected into the network. In this case, the decoy would rank highly when matches are ordered by availability. While a single decoy version would be easily detected, a more advanced variant of this technique with multiple decoys would be more effective. Christin et al. suggest that 30-40 versions of the decoy, each with 10 copies injected per hub, would significantly shift the popularity rankings in favour of the decoys. However, this approach is easily thwarted by an external file reputation system. To combat this countermeasure, the authors suggest a more advanced poisoning technique called Replicated Transient Decoy Injection (RTDCI). With this approach, the injected replicated decoys are frequently replaced. The results of the three approaches are illustrated in Figures 4.7 and 4.8. One of the characteristics that makes RTDCI attractive is that, as shown in the plots, it has little effect on temporal stability and is therefore difficult to detect and counteract. Both RTDCI and RDI significantly affect content replication (in an identical fashion, skewing it toward the decoy files). Christin et al. contend that the only currently deployed countermeasure with any level of effectiveness against RTDCI is deliberate misspelling in file metadata.

4.2.3 Measurement of P2P Pollution

The two papers discussed in this section both present significant measurement studies of the amount of pollution in contemporary P2P networks. In [57] Liang, Kumar, Xi, and Ross measure the amount of pollution in the Kazaa/Fastrack network. The Kazaa network is a two-tiered system in which peers are divided into two classes: Super Nodes (SNs) and Ordinary Nodes (ONs). SNs tend to have higher-bandwidth connections and more processing power. Each SN has approximately 1000 ONs connected to it, all of whose shared files it indexes. When a user initiates a search, it is forwarded to a subset of SNs, which respond with matching results. Liang et al. created the "Kazaa Crawling Platform", a multi-threaded program which connects to essentially all of Kazaa's approximately 30,000 supernodes in under one hour. They used this program to measure the pollution level of seven songs which were popular at the time of the study. Their data indicate that every song had a significant number of different versions available in the Kazaa network. A unique version is identified by its ContentHash, a proprietary hash signature employed in the Kazaa network. Table 4.1 summarizes the number of versions and copies. These results were obtained by querying all the supernodes with keyword searches corresponding to the song titles. Two files are considered copies if, and only if, they have the same ContentHash value. As it is very time-consuming to listen to each version of a song to determine whether it is polluted, the authors devised an automated scheme based on the observation that most polluted MP3 files do not adhere correctly to the standard and are hence non-decodable. Each copy was therefore tested in an automated fashion, and classified as polluted if it was non-decodable and/or its length was not within 10% of the true length of the song on the original CD. Figure 4.9 shows the percentage of polluted copies and versions for the top 500 versions of each of the songs.
For 4 of the songs, over 50% of copies were polluted. To illustrate that these levels of pollution among popular songs were likely due to intentional action on the part of record companies to thwart copyright infringement, the authors examined the pollution level of five older songs. These exhibited significantly lower levels of pollution (3 had less than 2% pollution). In [69] Liang, Naoumov, and Ross identify and measure the prevalence of a relatively new variant of P2P pollution: index poisoning. They examine its presence in both the

Song Title              # of Versions   # of Copies
Naughty Girl            26,715          631,387
Ocean Avenue            8,000           174,106
Where is the Love?      48,613          448,987
Hey Ya                  46,926          734,108
Toxic                   38,992          650,529
Tipsy                   32,893          853,688
My Band                 49,447          1,816,663

Table 4.1 Number of available versions and copies of seven popular songs (collected May 1, 2004). Source: [57]


Fig. 4.9 Percentage of polluted versions and copies of songs for the top 500 versions in the Kazaa network (May 1, 2004). Source: [57]


Fig. 4.10 Measured poisoning and pollution levels in Fastrack and Overnet. Source: [69]

Fastrack/Kazaa and Overnet networks. Rather than sharing corrupted or unusable files as decoys, malicious attackers create entries in their index for nonexistent files. These advertisements include a hash value, metadata, and the IP address and port number of the client. When a user attempts to download one of these would-be files, the P2P client will continuously attempt (and fail) to locate and download a copy. The strategy is another attempt by copyright holders to decrease piracy. The advantage of index poisoning, from the perspective of the attacker, is that it requires far fewer server resources and much less upload bandwidth than injecting actual decoy files into the network. No files are transferred - only records for files which do not exist. A single malicious node can host hundreds of bogus index entries with metadata keywords corresponding to a particular copyrighted file. Liang et al. point out that measuring the level of poisoning in P2P networks by downloading a large number of files and determining how many are polluted or correspond to poisoned index entries is difficult, time-consuming, and requires very large amounts of bandwidth and storage space. Instead, they employed a simpler, novel technique based on the observation that attackers typically share a large number of versions of one song. Liang et al. considered the 10 most popular songs according to iTunes at the time of the study, in June 2005, and harvested the number of versions and copies available in Overnet and Kazaa. They then classified users as either "heavy" or "light" depending on how many versions of a song they offered. Next, Liang et al. present the heuristic that a heavy user - one sharing a large number of versions of the song - is likely to be an attacker, while a light user - sharing one or only a few versions - is likely to be a legitimate user.
Thus, files advertised only by heavy users are likely to be decoys implemented by index poisoning, those shared only by light users are likely to be legitimate, and those shared by both are likely to be polluted decoys (since legitimate users are able to download these from the attackers). To evaluate their heuristic, Liang et al. downloaded approximately 50,000 versions of the songs and analyzed the binary data through an automated process to check whether each was likely to be legitimate. They show that their heuristic was correct approximately 96% of the time. Table 4.2 illustrates the differences between attackers and legitimate users in the Fastrack network. Attackers tend to have numerous clients per IP address (13.9 on average, compared to 1.4 for legitimate users), have many more copies on average (136.3 vs 2.9), and account for the majority of available copies and versions in the network (77% and 72% respectively), although they make up only 7% of clients. Figure 4.10 illustrates the measured poisoning and pollution levels. The Fastrack network exhibited a significant amount of both attacks, with less than 20% of the available copies being legitimate for 7 of the 10 songs. There was also significant poisoning in the Overnet network, but virtually no pollution. To combat poisoning, Liang et al. suggest a distributed rating system for blocks of IP addresses. The rationale is that attackers tend to control a small block of IP addresses, and it may be inefficient to block individual IP addresses, since attackers can change them easily within their assigned range.

           # of IPs   # of Users   Copies      Versions
Attacker   624        8,683        1,183,622   443,102
Ordinary   82,015     117,673      347,939     167,103

Table 4.2 Statistics for legitimate clients and attackers in the Fastrack network. Source: [69]
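The averages and shares quoted in the discussion above can be re-derived directly from the raw counts in Table 4.2:

```python
# Re-deriving the summary statistics quoted in the text from Table 4.2.
attacker = {"ips": 624, "users": 8_683, "copies": 1_183_622, "versions": 443_102}
ordinary = {"ips": 82_015, "users": 117_673, "copies": 347_939, "versions": 167_103}

users_per_ip_att = attacker["users"] / attacker["ips"]        # clients per IP
users_per_ip_ord = ordinary["users"] / ordinary["ips"]
copies_per_user_att = attacker["copies"] / attacker["users"]  # copies per client
copies_per_user_ord = ordinary["copies"] / ordinary["users"]

copy_share = attacker["copies"] / (attacker["copies"] + ordinary["copies"])
version_share = attacker["versions"] / (attacker["versions"] + ordinary["versions"])
client_share = attacker["users"] / (attacker["users"] + ordinary["users"])

assert round(users_per_ip_att, 1) == 13.9     # vs 1.4 for ordinary users
assert round(users_per_ip_ord, 1) == 1.4
assert round(copies_per_user_att, 1) == 136.3
assert 2.9 < copies_per_user_ord < 3.0
assert round(copy_share, 2) == 0.77           # 77% of copies
assert abs(version_share - 0.72) < 0.01       # ~72% of versions
assert round(client_share, 2) == 0.07         # only 7% of clients
```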

Chapter 5

Epidemiological Models of Peer-to-Peer Viruses and Pollution

5.1 Chapter Structure

The chapter is structured as follows. Section 5.2 presents a model for the expected evolution of a virus in the system. We derive several differential equations that govern the expected evolution of the network over time. We also include an analysis of the steady-state behaviour of our P2P virus model. Section 5.3 introduces a second model of P2P virus propagation, which deals with a more malicious type of virus. We present equations governing this model and also consider some extensions. Section 5.4 introduces an epidemiological model for the proliferation of pollution. We also consider the impact of object reputation schemes. Section 5.5 reports on an empirical study of the e-Donkey network, which we conducted to identify suitable parameters for the examination of our models. Finally, Section 5.6 reports on simulations of a P2P network, based on the various models presented in this chapter.

5.2 S-E-I P2P Virus Model

The intent of our S-E-I (Susceptible-Exposed-Infected) model is to predict the expected behaviour of a virus which spreads through a P2P network in the form of malicious code embedded in executable files shared by peers. We make the simplifying assumption that all users download files to their shared folder. We are not concerned with the transfer of media files which cannot contain malicious code, and do not model them. Note that we use

the term user in this chapter to refer to a person using a P2P client program. The term peer is used to refer collectively to a P2P client and the user directing its behaviour. This model classifies all peers as falling into one of three classes: Susceptible, Exposed, or Infected:

Susceptible - Peers that are not sharing any infected files, but are at risk of downloading infected files. The number of peers in this category at time t is denoted by S(t).

Exposed - Peers that have downloaded one or more infected files, but have not executed them. The number of peers in this category at time t is denoted by E(t). The Exposed category is included in the model to allow for a delay between download of an infected file and execution.

Infected - Peers that have executed an infected file. Upon execution, a total of c infected files reside in the peer's shared folder. The number of peers in this category at time t is denoted by I(t).

An Infected client may be detected by the user, who will then proceed to remove all the infected files, thereby returning the peer to the Susceptible state. At all times, every one of the N peers making up the network falls into one of the three categories. Thus, for all values of t, N = S(t) + E(t) + I(t). We assume that the total number of uninfected files in the network is fixed at M. The total number of infected files at time t is given by K(t). The expected proportion of infected files in the network is therefore q(t) = K(t)/(K(t) + M). When a user downloads a file, we assume the probability of choosing an infected file depends on the prevalence of infected files in the network. This probability will vary to some degree across peers, according to whether the peer has updated virus-detection software or is aware of the common characteristics of virus files (such files are often much smaller than genuine versions of the item). In our model, we are interested in the average probability of choosing an infected file, which we denote by h(t). When we examine steady-state behaviour later in this section, we set h(t) = αq(t), for some constant α, to reflect the fact that this probability is closely tied to virus prevalence. There are three distinct events that may occur in the network which affect one or more of the time-varying variables described above: one peer downloading a file from another, a peer executing a shared file, and an Infected peer recovering. Although individual peers conduct these activities at (potentially very) different rates, we develop our model based on average behaviour. Our simulation results in Section 5.6.2 indicate that this modeling choice does not produce substantially erroneous behaviour. The average rates at which these events occur are governed by three parameters:

λ_S: Average rate, in files per minute, at which each peer downloads new files (this includes time spent searching and setting up the connection to another peer).

$\lambda_E$: Average rate, in files per minute, at which each peer executes shared files. We assume that a peer executes files in the order in which they are downloaded.

$\lambda_R$: Average rate, in "recoveries per minute", at which Infected peers recover. A recovery occurs when all infected files are removed, returning the peer state to Susceptible.

Table 5.1 summarizes which time-varying variables are affected by each of the three events that may occur in the network. The state progression for all peers in our model is $S \rightarrow E \rightarrow I \rightarrow S \ldots$. We now derive the differential equations that govern the evolution of our P2P model.

Event            Variables Affected
File downloaded  $q(t)$, $S(t)$, $E(t)$
File executed    $q(t)$, $E(t)$, $I(t)$
Peer recovers    $q(t)$, $I(t)$, $S(t)$

Table 5.1 P2P Virus model variables that are potentially affected by each possible event in the network.

5.2.1 Model Equations

Rate at Which the Number of Infected Peers Changes

When an Infected peer recovers, the number of Infected peers decreases by one. Recoveries occur at rate $\lambda_R I(t)$. When an Exposed peer executes an infected file, the number of Infected peers increases by one. Since we assume files are executed in order of download, the file executed by an Exposed peer will always be the infected file which it had downloaded to become Exposed. This occurs at a rate of $\lambda_E E(t)$. Therefore,

$\frac{dI(t)}{dt} = -\lambda_R I(t) + \lambda_E E(t)$ (5.1)

Rate at Which the Number of Exposed Peers Changes

The rate at which the number of Exposed peers decreases due to infection is given by the negative of the second term in (5.1). The rate at which previously Susceptible peers become Exposed is dependent on the aggregate rate at which they download files, $\lambda_S S(t)$, multiplied by the probability that a downloaded file is infected, $h(t)$. The overall rate is therefore:

$\frac{dE(t)}{dt} = -\lambda_E E(t) + \lambda_S S(t)h(t)$ (5.2)

Rate at Which the Number of Susceptible Peers Changes

This is governed by the negatives of the first term in (5.1) and the second term in (5.2):

$\frac{dS(t)}{dt} = -\lambda_S S(t)h(t) + \lambda_R I(t)$ (5.3)

Rate at Which the Number of Infected Files in the Network Changes

There are three events which result in a change in the number of infected files in the network: a peer downloads an infected file, an Exposed peer becomes Infected (this only changes the number of infected files in some scenarios), and an Infected peer recovers. In some scenarios, there is a fourth case: an Infected peer executes an infected file. This is addressed below. The rate at which these events occur is dependent on three model assumptions. Two of these are related to peer behaviour and one to the behaviour of the virus. The first assumption is whether downloaded files are always eventually executed, or if some proportion are never executed. Second, if a downloaded file is executed, users may or may not be able to download additional files before execution. The final assumption relates to the creation of new infected files upon execution of an infected file. Some viruses generate $c$ infected files with file names randomly generated from a pool significantly larger than $c$, while others generate the same list of $c$ file names every time. In all cases we assume that each peer can only have one instance of a particular file name. These three binary choices give rise to eight possible models. We consider only the simplest case here: all downloads executed, no additional downloads before execution, $c$ unique file names. All eight models are presented in Section A.2 in the Appendix. In the simple case only Susceptible peers can download infected files. Exposed peers do not download any additional files before becoming Infected, and Infected peers are sharing all $c$ possible infected files. Thus, the rate of change due to downloads is $S(t)\lambda_S h(t)$. An Exposed peer always has one infected file before becoming Infected, meaning in all cases $c-1$ new infected files are created when an Exposed peer becomes Infected. The rate of change is thus $E(t)\lambda_E(c-1)$.
An Infected peer will always share $c$ infected files, so a recovery results in a reduction of $c$ infected files. The rate is therefore $-I(t)\lambda_R c$. The overall rate of change of $K$ is therefore:

$\frac{dK(t)}{dt} = S(t)\lambda_S h(t) + E(t)\lambda_E(c-1) - I(t)\lambda_R c$ (5.4)
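As a concrete illustration, the coupled system (5.1)-(5.4) can be integrated numerically. The following is a minimal sketch, assuming $h(t) = \alpha q(t)$ and $q(t) = K(t)/(M + K(t))$; all parameter values are illustrative rather than the values used in Section 5.6.

```python
# Forward-Euler integration of the S-E-I model (5.1)-(5.4).
# A minimal sketch: h(t) = alpha*q(t), q(t) = K(t)/(M + K(t)).
# Parameter values are illustrative, not those of Section 5.6.

def simulate_sei(N=10000.0, M=100000.0, c=10, lam_S=0.01,
                 lam_E=0.01, lam_R=0.005, alpha=1.0,
                 dt=1.0, steps=50000):
    S, E, I = N - 1.0, 0.0, 1.0   # seed a single Infected peer,
    K = c * 1.0                   # which shares c infected files
    for _ in range(steps):
        q = K / (M + K)           # proportion of infected files
        h = alpha * q             # probability a download is infected
        dI = -lam_R * I + lam_E * E                         # (5.1)
        dE = -lam_E * E + lam_S * S * h                     # (5.2)
        dS = -lam_S * S * h + lam_R * I                     # (5.3)
        dK = S * lam_S * h + E * lam_E * (c - 1) - I * lam_R * c  # (5.4)
        S, E, I, K = S + dt * dS, E + dt * dE, I + dt * dI, K + dt * dK
    return S, E, I, K

S, E, I, K = simulate_sei()
```

Note that (5.1)-(5.3) conserve $N = S + E + I$, and, in this simple model, the bookkeeping identity $K = E + cI$ (one infected file per Exposed peer, $c$ per Infected peer) is preserved exactly by the dynamics; both serve as sanity checks on the integration.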

5.2.2 Steady State Behaviour

If the P2P network reaches a steady-state equilibrium by some time $t = T$, then $\frac{dS(t)}{dt} = \frac{dE(t)}{dt} = \frac{dI(t)}{dt} = 0$. In this section, we assume that the probability of downloading an infected file is a function of the proportion of infected files, i.e., $h(t) = f(q(t))$. Defining $E$, $I$, $S$ as the steady-state values of, respectively, $E(t)$, $I(t)$, and $S(t)$, Equation (5.1) implies that:

$I = E\frac{\lambda_E}{\lambda_R}$ (5.5)

If we define $\tau$ and $\mu$ as, respectively, the expected number of infected files each Exposed and Infected peer is sharing in steady-state (the actual expressions for $\tau$ and $\mu$ depend on which of the models from Section A.2 is used), then $q$, the proportion of infected files in steady-state, may be expressed as:

$q = \frac{E\tau + I\mu}{M + E\tau + I\mu}$ (5.6)

Substituting (5.5) into (5.6) provides:

$q = \frac{E(\tau\lambda_R + \mu\lambda_E)}{M\lambda_R + E(\tau\lambda_R + \mu\lambda_E)}$ (5.7)

If $f(q) > 0$, equation (5.2) implies that, in steady state:

$S = E\frac{\lambda_E}{\lambda_S f(q)}$ (5.8)

Since $S = N - I - E$, equation (5.5) can be utilized to express $S$ as:

$S = N - E\left(1 + \frac{\lambda_E}{\lambda_R}\right)$ (5.9)

If $h(t)$ is proportional to $q(t)$, $h(t) = \alpha q(t)$, we can obtain a closed-form expression for $E$ by substituting (5.7) into (5.8), equating with (5.9), and solving for $E$:

$E = \frac{\lambda_R(\alpha N\lambda_S(\mu\lambda_E + \tau\lambda_R) - M\lambda_E\lambda_R)}{(\tau\lambda_R + \mu\lambda_E)(\lambda_S\alpha(\lambda_R + \lambda_E) + \lambda_E\lambda_R)}$ (5.10)

The expression for $I$ follows trivially from (5.10) and (5.5):

$I = \frac{\lambda_E(\alpha N\lambda_S(\mu\lambda_E + \tau\lambda_R) - M\lambda_E\lambda_R)}{(\tau\lambda_R + \mu\lambda_E)(\lambda_S\alpha(\lambda_R + \lambda_E) + \lambda_E\lambda_R)}$ (5.11)

If $q = 0$, it follows from (5.6) that $E = I = 0$. It is of interest to consider Equation (5.11) as $q$ approaches 0. In the limiting case, approached from above, we have the equality

$\alpha N\lambda_S(\mu\lambda_E + \tau\lambda_R) = M\lambda_E\lambda_R$ (5.12)

If we assume that all downloaded files are eventually executed - corresponding to cases 1) - 4) in Appendix A.2 - it follows that we can reasonably equate the rates of download and execution, $\lambda_E = \lambda_S$. Under this assumption, (5.12) provides the following minimum average recovery rate, $\lambda_R^{min}$, in order for all infected files to eventually be removed from a P2P network:

$\lambda_R^{min} = \frac{\alpha N\mu\lambda_S}{M - \alpha N\tau}, \quad M > \alpha N\tau$ (5.13)

This equation indicates that, if $h(t) = \alpha q(t)$, then $\lambda_R^{min}$ is a linearly increasing function of $\lambda_S$.
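The steady-state expressions can be checked numerically. A sketch, using $\tau = 1$ and $\mu = c$ (the values implied by the simple model of Section 5.2.1) and illustrative parameters: with $E$ taken from (5.10), the right-hand sides of (5.1)-(5.3) should vanish.

```python
# Numeric check of (5.5), (5.7), (5.10): at the closed-form steady
# state, the derivatives (5.1)-(5.3) vanish. tau = 1, mu = c for the
# simple model; parameter values are illustrative.

N, M, c = 10000.0, 100000.0, 10
lam_S, lam_E, lam_R, alpha = 0.01, 0.01, 0.005, 1.0
tau, mu = 1.0, float(c)

D = tau * lam_R + mu * lam_E
E = lam_R * (alpha * N * lam_S * D - M * lam_E * lam_R) / (
    D * (lam_S * alpha * (lam_R + lam_E) + lam_E * lam_R))  # (5.10)
I = E * lam_E / lam_R                                       # (5.5)
S = N - E - I
q = E * D / (M * lam_R + E * D)                             # (5.7)
h = alpha * q

dI = -lam_R * I + lam_E * E        # (5.1), should vanish
dE = -lam_E * E + lam_S * S * h    # (5.2), should vanish
dS = -lam_S * S * h + lam_R * I    # (5.3), should vanish
```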

5.3 S-E-I-R Model

We now present another model of virus propagation in P2P networks. The purpose of the model is to consider the effects of a theoretical P2P virus which is more malicious than the ones that have been observed. In particular, we consider a virus that, when executed by a user, infects all other executable files in his P2P client's shared folder. An infected file is defined as an executable file containing malicious code. Unless otherwise specified, the assumptions for the model are the same as those indicated in Section 5.2. Our P2P model is based on the S-E-I-R model used in epidemiology to study the spread of diseases in a population [70]. This model classifies all individuals in a population as falling into one of four classes: Susceptible, Exposed, Infected, or Recovered. The definitions of the first three classes are the same as in Section 5.2, and Recovered peers are defined as ones that have no infected files and are not at risk of downloading any infected files. The number of peers in this category at time $t$ is denoted by $R(t)$. An Infected client may be detected by the user, who will then proceed to remove the malicious code from all shared files, thereby changing the state of the peer to Recovered. Once a peer is Recovered, it will remain in that state for all time. In a practical setting, this corresponds to a situation where the user installs a virus scanner that removes the infections from all shared files, and checks and cleans, if necessary, all subsequently downloaded files. At all times, every one of the $N$ peers making up the network falls into one of these four categories. Thus, for all values of $t$, $N = S(t) + E(t) + I(t) + R(t)$. The total number of files in the network at time $t$ is given by $M(t)$. The expected proportion of infected files in the network is given by $q(t)$. We make the simplifying assumption that when a user downloads a file, he chooses one with equal probability from among all available files.
Therefore, the probability of choosing an infected file is just $q(t)$. It follows from the definition of the four peer classes that only Infected and Exposed peers can share infected files. We denote by $q^E(t)$ the expected proportion of files shared by Exposed peers which are infected, and express the expected number of files each peer is sharing as $\kappa(t) = \frac{M(t)}{N}$. Then, the following relation holds:

$q(t) = \frac{\kappa(t)[I(t) + q^E(t)E(t)]}{M(t)} = \frac{I(t) + q^E(t)E(t)}{N}$ (5.14)

In this equation, the numerator is the total number of infected files in the network, and the denominator is the total number of files. Conversely,

$q^E(t) = \frac{Nq(t) - I(t)}{E(t)}$ (5.15)
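A quick numeric check that (5.15) inverts (5.14); the values below are arbitrary, chosen only for illustration.

```python
# (5.15) recovers q^E from q: a consistency check with made-up values.
N = 10000.0
E, I = 1500.0, 2500.0
qE = 0.012
q = (I + qE * E) / N           # (5.14)
qE_back = (N * q - I) / E      # (5.15)
```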

5.3.1 S-E-I-R Model Equations

Table 5.2 summarizes which time-varying variables are affected by each of the three events that may occur in the network.

Event            Variables Affected
File downloaded  $q(t)$, $q^E(t)$, $S(t)$, $E(t)$
File executed    $q(t)$, $q^E(t)$, $E(t)$, $I(t)$
Peer recovers    $q(t)$, $I(t)$, $R(t)$

Table 5.2 Model variables potentially affected by each possible event.

The state progression for all peers in our model is $S \rightarrow E \rightarrow I \rightarrow R$. We now derive the differential equations that govern the evolution of our P2P model. We wish to emphasize that all our equations predict the expected behaviour of the network. Depending on the variances of peer download, execution, and recovery rates and the number of files shared, the behaviour of a particular realization of a P2P network may vary significantly from the expected results predicted by our model.

Rate at Which the Number of Recovered Peers Changes

Each Infected peer recovers at a rate $\lambda_R$, meaning that the total expected rate at which newly Recovered peers arise is given by:

$\frac{dR(t)}{dt} = \lambda_R I(t)$ (5.16)

Rate at Which the Number of Infected Peers Changes

When an Infected peer recovers, the number of Infected peers decreases by one. This occurs at a rate which is the negative of (5.16). When an Exposed peer executes an infected file, the number of Infected peers increases by one. This occurs at the aggregate rate at which Exposed peers execute files, $\lambda_E E(t)$, multiplied by the probability that the file chosen is infected. Since every peer executes each one of its shared files with equal likelihood, this probability is simply $q^E(t)$. Therefore,

$\frac{dI(t)}{dt} = -\lambda_R I(t) + \lambda_E E(t)q^E(t)$ (5.17)

Rate at Which the Number of Exposed Peers Changes

The rate at which the number of Exposed peers decreases due to infection is given by the negative of the second term in (5.17). The rate at which previously Susceptible peers become Exposed is dependent on the aggregate rate at which they download files, $\lambda_S S(t)$, multiplied by the probability that a downloaded file is infected, $q(t)$. The overall rate is therefore:

$\frac{dE(t)}{dt} = -\lambda_E E(t)q^E(t) + \lambda_S S(t)q(t)$ (5.18)

Rate at Which the Number of Susceptible Peers Changes

This is governed by the negative of the second term in (5.18):

$\frac{dS(t)}{dt} = -\lambda_S S(t)q(t)$ (5.19)
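Once $q(t)$ is obtained from (5.14) and $q^E(t)$ from its rate equation (5.24), equations (5.16)-(5.19) can be integrated numerically. A minimal sketch with illustrative parameters (these are not the values used in the thesis experiments):

```python
# Forward-Euler integration of the S-E-I-R model (5.16)-(5.19),
# with q(t) from (5.14) and dq^E/dt from (5.24). kappa = M/N is
# constant per (5.20); all parameter values are illustrative.

def simulate_seir(N=10000.0, kappa=100.0, lam_S=0.01, lam_E=0.01,
                  lam_R=0.005, dt=1.0, steps=20000):
    S, E, I, R = N - 1.0, 0.0, 1.0, 0.0   # seed one Infected peer
    qE = 0.0
    for _ in range(steps):
        q = (I + qE * E) / N                               # (5.14)
        dR = lam_R * I                                     # (5.16)
        dI = -lam_R * I + lam_E * E * qE                   # (5.17)
        dE = -lam_E * E * qE + lam_S * S * q               # (5.18)
        dS = -lam_S * S * q                                # (5.19)
        dqE = (E * lam_S * (q - qE) / (1 + kappa * E)
               + (1 - kappa * qE) * lam_S * S * q
               / (kappa * (1 + E)))                        # (5.24)
        S, E, I, R = S + dt*dS, E + dt*dE, I + dt*dI, R + dt*dR
        qE += dt * dqE
    return S, E, I, R, qE

S, E, I, R, qE = simulate_seir()
```

As in the S-E-I case, $N = S + E + I + R$ is conserved exactly by the dynamics, which provides a basic sanity check on the integration.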

Rate at Which the Total Number of Files Changes

According to the model description to this point, the total number of files in the network grows unbounded at a constant rate $\frac{dM(t)}{dt} = N\lambda_S$. Obviously, this is unrealistic. As we argue in Section 5.5, the total number of files in an actual P2P network tends to remain fairly constant - at least over a time interval on the order of a month. This equilibrium can be attributed to users removing files from their shared folder at approximately the same rate at which they download new files. For the purposes of our model, we assume users randomly choose the files they remove from the network, and hence this activity has no effect on the expected value of $q^E(t)$. Furthermore, we also assume that the removal rate is $N\lambda_S$, which means

$\frac{dM(t)}{dt} = 0$ (5.20)

Rate at Which $q^E(t)$ Changes

There are three distinct events which cause a change in $q^E(t)$: an Exposed peer downloads an infected file, an Exposed peer downloads a clean file, or a Susceptible peer becomes Exposed. We note that when an Exposed peer becomes Infected, this does not contribute to an expected change in $q^E(t)$ because the files removed from the pool shared by Exposed peers as a result of this transition have an expected infection proportion of $q^E(t)$. For each of the three events described, we now derive the resulting rate of change in $q^E(t)$.

A. Exposed Peer Downloads Infected File

This increases by one both the total number of files shared by Exposed peers, and the number of infected files. The expected number of total files shared by Exposed peers prior to the download was $\kappa E(t)$, and the number of infected files was $\kappa q^E(t)E(t)$. The resulting change is

$\frac{1 + q^E(t)\kappa E(t)}{1 + \kappa E(t)} - q^E(t) = \frac{1 - q^E(t)}{1 + \kappa E(t)}$

Since Exposed peers download infected files at an aggregate rate of $E(t)\lambda_S q(t)$, the contribution to the rate of change in $q^E(t)$ is

$\frac{1 - q^E(t)}{1 + \kappa E(t)}E(t)\lambda_S q(t)$ (5.21)

B. Exposed Peer Downloads Clean File

In this case, the total number of files shared by Exposed peers increases by one, while the number of infected files is unaffected. The rate at which Exposed peers download clean files is $E(t)\lambda_S(1 - q(t))$. The contribution to the rate of change of $q^E(t)$, derived analogously to (5.21), is:

$\frac{-q^E(t)}{1 + \kappa E(t)}E(t)\lambda_S(1 - q(t))$ (5.22)

C. Susceptible Peer Becomes Exposed

A previously Susceptible peer becomes Exposed as soon as it shares one infected file. Thus, a new Exposed peer contributes one new infected file and an expected total of $\kappa$ files to the pool shared by Exposed peers. This means the change in $q^E(t)$ is:

$\frac{1 + q^E(t)\kappa E(t)}{\kappa + \kappa E(t)} - q^E(t) = \frac{1 - \kappa q^E(t)}{\kappa(1 + E(t))}$

Susceptible peers become Exposed at a rate $\lambda_S S(t)q(t)$, so the resulting contribution to the rate of change of $q^E(t)$ is:

$\frac{1 - \kappa q^E(t)}{\kappa(1 + E(t))}\lambda_S S(t)q(t)$ (5.23)

Combining the three rates (5.21), (5.22), and (5.23) provides the following expression for $\frac{dq^E(t)}{dt}$:

$\frac{dq^E(t)}{dt} = \frac{E(t)\lambda_S(q(t) - q^E(t))}{1 + \kappa E(t)} + \frac{1 - \kappa q^E(t)}{\kappa(1 + E(t))}\lambda_S S(t)q(t)$ (5.24)

5.3.2 S-E-I-R Model Extensions

Modeling On-line/Off-line Behaviour

In a real P2P network, individual peers are only on-line for limited durations. In order to capture this behavior, we present an extension of our model that includes both on-line and off-line users. Each of the four variables specifying how many peers are in each category - $S$, $E$, $I$, $R$ - is partitioned into two variables to account for how many peers in the category are on and off-line. So, for instance, $I(t) = I_N(t) + I_F(t)$, where $I_N(t)$ is the number of Infected peers on-line, and $I_F(t)$ is the number of Infected peers off-line. Peers that are off-line go on-line at a certain rate $\lambda_N$, and on-line peers go off-line at rate $\lambda_F$. The differential equation governing the change in the number of on-line Infected peers at time $t$ is:

$\frac{dI_N(t)}{dt} = I_F(t)\lambda_N - I_N(t)\lambda_F$ (5.25)

The equations governing the rates of change in $S_N(t)$, $E_N(t)$, and $R_N(t)$ are analogous. We assume here that peers go on and off-line at the same rate regardless of their state. It would also be simple to expand the model to include different rates for each state. To complete the specification of the extended model, all the previously defined differential equations are changed as follows: every instance of $S(t)$, $E(t)$, $I(t)$, and $R(t)$ is replaced, respectively, by $S_N(t)$, $E_N(t)$, $I_N(t)$, and $R_N(t)$.
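Taken in isolation (with the epidemic transitions frozen), (5.25) drives the on-line fraction of any class toward $\lambda_N/(\lambda_N + \lambda_F)$. A small sketch with illustrative rates and counts:

```python
# Relaxation of the on/off-line split (5.25) to its equilibrium:
# the on-line fraction tends to lam_N / (lam_N + lam_F).
# Rates and counts are illustrative.

lam_N, lam_F = 0.1, 0.05     # go-on-line / go-off-line rates
I_N, I_F = 0.0, 3000.0       # all Infected peers start off-line
dt = 0.1
for _ in range(20000):
    dI_N = I_F * lam_N - I_N * lam_F     # (5.25)
    I_N += dt * dI_N
    I_F -= dt * dI_N

online_frac = I_N / (I_N + I_F)
```

The total $I_N + I_F$ is conserved exactly, and after the transient the on-line fraction settles at $0.1/0.15 = 2/3$.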

Modeling Peers that Remain Infected

One can argue that a certain proportion of P2P users, when their client becomes Infected, will never detect that this has occurred and will not take any action to remedy the problem. In order to include this behaviour in our model, we classify all peers as "aware" or "oblivious". Aware peers behave as those in our basic model, while oblivious peers progress $S \rightarrow E \rightarrow I$ and then remain Infected. The number of peers in each group is fixed: $N = N_A + N_O$, where

$N_A$ is the number of aware peers, and $N_O$ is the number of oblivious peers. As in Extension A, the number of peers falling into each of the four categories at time $t$ is partitioned into two groups; in this case the number of aware users in category $X$ at time $t$, where $X \in \{S, E, I, R\}$, is denoted by $X_A(t)$ and the number of oblivious users in each category is denoted by $X_O(t)$. The behaviour of aware users is determined by equations (5.16), (5.17), (5.18), and (5.19), with $X_A(t)$ replacing $X(t)$ for all $X \in \{S, E, I, R\}$.

Oblivious users are governed by (5.17), (5.18), and (5.19), with $X_O(t)$ replacing $X(t)$, and

$\lambda_R$ set to zero (reflecting the fact that oblivious peers never recover). Finally, $\frac{dq^E(t)}{dt}$ is governed by a modified version of (5.24), with $S(t)$ replaced by $S_A(t) + S_O(t)$ and $E(t)$ replaced by $E_A(t) + E_O(t)$.

Modeling Individual File Types

In addition to predicting how many clients fall in each category, one may also be interested in differentiating among the different file types in the network and predicting how the popularity and infection rate of each file type varies over time. The definition of file type is left general - all files matching a certain search criteria could define a file type, as could all files with the same value under some hash function. The revised model tracks $T$ unique file types, indexed by $1, 2, \ldots, T$. Every peer connected to the network may be sharing an infected instance of a given file type, a clean version, or not have a copy of the file. The probability that a peer is sharing a copy of file type $j$ at time $t$ - or, equivalently, the expected proportion of peers sharing file type $j$ at time $t$ - is given by $p_j(t)$. The probability that an Exposed peer sharing file type $j$ has an infected instance of the file at time $t$ is governed by the variable $q_j^E(t)$. Peers only download file types they do not have. In our model, when a peer initiates a download, it chooses a given file type with a probability proportional to the prevalence of files on the network. As a simple example, if the only file types a peer does not have are those indexed by 1 and 2, and the relative popularities are, respectively, 0.1 and 0.3, the peer would choose to download file type 1 with probability 0.25 ($= \frac{0.1}{0.1 + 0.3}$) and file type 2 with probability 0.75. In general, given that a peer has initiated the download of a file at time $t$, the expected probability that it has chosen file $j$ is given by

$r_j(t) = \frac{p_j(t)(1 - p_j(t))}{\sum_{i=1}^{T} p_i(t)(1 - p_i(t))}$

In the numerator, $1 - p_j(t)$ is the probability that the peer doesn't have a copy of the file, while the remainder of the expression is the expected probability that the peer chooses to download that file given that it does not have it. The $(1 - p_i(t))$ factors in the denominator sum account for the fact that only file types the peer does not have are included. We assume that a peer is equally likely to download the chosen file from any peer sharing the file, and is unaware of which peers are sharing infected instances of the file. Thus, the probability that the peer downloads an infected version of file $j$, $q_j(t)$, is the expected proportion of shared instances of file $j$ which are infected. When a peer chooses a file to execute, it picks one of its shared files with equal probability. The expected number of files that it is sharing at time $t$ is $\sum_{i=1}^{T} p_i(t)$. With probability $p_j(t)$ the peer is sharing file $j$, and if the peer does indeed have the file, it chooses to execute it with probability inversely proportional to the number of files it shares. Therefore, the probability a peer executes file $j$ is given by

$u_j(t) = \frac{p_j(t)}{\sum_{i=1}^{T} p_i(t)}$

We now consider how the differential equations defining the state evolution change to reflect this modified model. Equation (5.16) is unchanged. For equation (5.17), $q^E(t)$ in the second term is replaced by the expected probability that an Exposed peer chooses an infected file to execute: $\sum_{j=1}^{T} u_j(t)q_j^E(t)$. For equation (5.18), in addition to the change just indicated, the value of $q(t)$ is replaced by the weighted probability $\sum_{j=1}^{T} r_j(t)q_j(t)$. This same change is also made in (5.19). The four state equations for the modified model are therefore:

$\frac{dR(t)}{dt} = \lambda_R I(t)$ (5.26)

$\frac{dI(t)}{dt} = -\lambda_R I(t) + \lambda_E E(t)\sum_{j=1}^{T} u_j(t)q_j^E(t)$ (5.27)


$\frac{dE(t)}{dt} = -\lambda_E E(t)\sum_{j=1}^{T} u_j(t)q_j^E(t) + \lambda_S S(t)\sum_{j=1}^{T} r_j(t)q_j(t)$ (5.28)

$\frac{dS(t)}{dt} = -\lambda_S S(t)\sum_{j=1}^{T} r_j(t)q_j(t)$ (5.29)

The values of $p_j(t)$ are also time varying. A given file $j$ is downloaded by every peer with average rate $\lambda_S r_j(t)$, for an aggregate average rate of $N\lambda_S r_j(t)$. When a copy of the file is downloaded, the value of $p_j$ increases by $\frac{1}{N}$. Therefore, the contribution to the rate of change in $p_j$ is $N\lambda_S r_j(t)(\frac{1}{N}) = \lambda_S r_j(t)$. It follows from our assumption concerning file removal from Section 5.3.1 that each file type is removed from the network at an aggregate rate of $N\lambda_S p_j(t)$. The resulting change in $p_j$ is $-\frac{1}{N}$, and the contribution to the rate of change in $p_j$ is $-\lambda_S p_j(t)$. Therefore, the net rate of change in file popularity is

$\frac{dp_j(t)}{dt} = \lambda_S(r_j(t) - p_j(t))$ (5.30)
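The selection probabilities are easy to sanity-check in code. A sketch reproducing the worked example above (a peer lacking only types 1 and 2, with popularities 0.1 and 0.3), together with the execution probabilities $u_j$; the helper names are ours, for illustration only.

```python
# Download-choice and execute-choice probabilities for file types.
# download_choice conditions on the set of types a peer lacks, as in
# the worked example; execute_choice implements u_j = p_j / sum_i p_i.

def download_choice(popularity, missing):
    """Probability of downloading each missing type, proportional to
    its relative popularity among the types the peer does not have."""
    total = sum(popularity[j] for j in missing)
    return {j: popularity[j] / total for j in missing}

def execute_choice(popularity):
    """u_j: probability that an executed file is of type j."""
    total = sum(popularity)
    return [p / total for p in popularity]

p = [0.1, 0.3, 0.6]                      # p_j(t) at some fixed t
r = download_choice(p, missing=[0, 1])   # peer lacks types 1 and 2
u = execute_choice(p)
```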

The final component of the model is an expression for $\frac{dq_j^E(t)}{dt}$. Since the derivation is somewhat involved, we include it in the Appendix (Section A.3), and indicate the final result below:

$\frac{dq_j^E(t)}{dt} = \frac{\Delta n_j(t)d_j(t) - n_j(t)\Delta d_j(t)}{d_j(t)^2}$ (5.31)

where $n_j(t) = q_j^E(t)p_j(t)E(t)$, $d_j(t) = p_j(t)E(t)$, $\Delta n_j(t) = \frac{dn_j(t)}{dt}$, and $\Delta d_j(t) = \frac{dd_j(t)}{dt}$. Explicit expressions for $\Delta n_j(t)$ and $\Delta d_j(t)$ are provided by equations (A.10) and (A.11) in the Appendix.

5.4 P2P Pollution Model


Fig. 5.1 The transition diagram for peers, indicating the actions that trigger movement between the three classes of Susceptible (S), Infected (I), and Recovered (R).

In this section we present a model of pollution - the presence of corrupt or mislabeled files - in P2P networks. We assume that $M_i$ peers are interested in item $i$, and that there are a multitude of versions of the item, classified as "good" or "bad". Initially the P2P network is seeded with $N_g(0)$ good files and $N_b(0)$ bad files. The peers who provided these seed files do not number among the $M_i$ peers we consider in our model. We model the peers as belonging to three classes: Susceptible, Infected, and Recovered. $S(t)$ is the number of susceptible peers at time $t$; this class includes all peers who will attempt to download another version of the file in the future. Initially $S(0) = M_i$, as all interested peers are susceptible. $I(0) = 0$ and $R(0) = 0$, because no files have been downloaded from the seeds. A peer transitions between the three states as depicted in the transition diagram in Figure 5.1. Each peer is susceptible when it intends to download a file. When a susceptible peer downloads a file, it joins the Infected class if the file is bad and the Recovered class if the file is good. A peer may leave the Infected class by testing the downloaded file and electing to retry at some stage in the future. In this case, the peer rejoins the Susceptible class. Alternatively, an infected peer may decide to give up and join the Recovered class, despite not being successful in acquiring a good version of the item. A peer may dwell in the infected state for some period of time before choosing to give up or to retry. This represents the period of time before an infected peer tests a downloaded file.

Eventually all peers will belong to the Recovered class. We label this class "recovered" primarily to highlight the parallels with standard epidemiological models. In our model the distinguishing feature of a recovered peer is that it is no longer actively seeking the item of interest. Note that in our model, any susceptible or infected peer may be sharing none or several polluted files, but cannot be sharing a good file. A recovered peer may share at most one good file and may share several polluted files. The number of good shared versions of the item varies over time, as does the number of bad. When a peer transitions from the susceptible to recovered state by downloading a good version, it shares the file with probability $p_{sg}$. When a peer transitions from the susceptible to infected state by downloading a bad file, it shares the file with probability $p_{sb}$. When a peer transitions from the infected to susceptible state or recovered state, it removes the polluted file with probability $p_{db}$. We model the probability of downloading a polluted file at time $t$, $p_b(t)$, as being equal to the fraction of polluted files. This probability is the same for a peer irrespective of how many times it has been infected. This is a reasonable approximation because the number of versions of an item is anticipated to be much larger than the number of re-tries. We model the expected behaviour of a large group of peers. At time $t$, a fraction $\lambda_s$ of the susceptible peers download a file. This is effectively the download rate. A fraction $\lambda_r$ of the infected peers decide to retry and hence rejoin the susceptible pool. A fraction $\lambda_x$ of the infected peers choose to give up and enter the recovered state. We make the simplifying assumption that the download rate, and the rates of trying again and giving up ($\lambda_r$ and $\lambda_x$), do not vary over time. A constant value of $\lambda_s$ produces the approximately exponential decay in the number of downloads of an item as time elapses and its popularity declines. It is reasonable to assume that the rates of trying again or giving up do not change substantially over time. With these modeling choices, we arrive at the following set of equations that describe the evolution of pollution in the system:

$\frac{dS(t)}{dt} = -\lambda_s S(t) + \lambda_r I(t)$ (5.33)

$\frac{dI(t)}{dt} = p_b(t)\lambda_s S(t) - (\lambda_r + \lambda_x)I(t)$ (5.34)

$\frac{dR(t)}{dt} = (1 - p_b(t))\lambda_s S(t) + \lambda_x I(t)$ (5.35)

$\frac{dN_b(t)}{dt} = \lambda_s p_b(t)p_{sb}S(t) - (\lambda_r + \lambda_x)p_{db}p_{sb}I(t)$ (5.36)

$\frac{dN_g(t)}{dt} = \lambda_s(1 - p_b(t))p_{sg}S(t)$ (5.37)

As with the P2P virus model, these equations are derived under the assumption that all peers have common behaviour; variability in individual behaviour means that this will not be a completely accurate model of the system. In addition, the model does not address any notion of memory in user behaviour; it is probable that a peer's downloading behaviour would change substantially if it has already received several bad versions of an item. In simulations in Section 5.6.4, we account for variability in peer behaviour and a limited notion of memory; our results indicate that the deterministic model described above, despite its limitations and assumptions, provides a good indication of the evolution of the extent of pollution in the P2P network (for a specific item).
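The system (5.33)-(5.37) is straightforward to integrate numerically. A minimal sketch, with $p_b(t) = N_b(t)/(N_b(t) + N_g(t))$ and all parameter values and seed counts chosen for illustration:

```python
# Forward-Euler integration of the pollution model (5.33)-(5.37).
# p_b(t) is the current fraction of polluted versions; parameters
# and seed counts are illustrative.

def simulate_pollution(M_i=10000.0, Ng0=50.0, Nb0=50.0,
                       lam_s=0.01, lam_r=0.005, lam_x=0.002,
                       p_sg=0.9, p_sb=0.5, p_db=0.8,
                       dt=1.0, steps=100000):
    S, I, R = M_i, 0.0, 0.0
    Nb, Ng = Nb0, Ng0
    for _ in range(steps):
        pb = Nb / (Nb + Ng)
        dS = -lam_s * S + lam_r * I                        # (5.33)
        dI = pb * lam_s * S - (lam_r + lam_x) * I          # (5.34)
        dR = (1 - pb) * lam_s * S + lam_x * I              # (5.35)
        dNb = (lam_s * pb * p_sb * S
               - (lam_r + lam_x) * p_db * p_sb * I)        # (5.36)
        dNg = lam_s * (1 - pb) * p_sg * S                  # (5.37)
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
        Nb, Ng = Nb + dt * dNb, Ng + dt * dNg
    return S, I, R, Nb, Ng

S, I, R, Nb, Ng = simulate_pollution()
```

Since every peer eventually succeeds or gives up, $S(t) + I(t) + R(t) = M_i$ throughout and $R(t) \rightarrow M_i$, matching the observation above that eventually all peers belong to the Recovered class.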

5.4.1 The Impact of Object Reputation Schemes

The possibility of downloading an infected or polluted file may be reduced through the use of an object reputation scheme which allows P2P users to rate individual files and share this information with others in the network. The standard Kazaa client [4] includes such a feature, allowing users to assign one of four possible rankings to each file. However, this simplistic implementation has been ineffective in combating the number of polluted files in the network [57]. A recently introduced object-reputation scheme for the Gnutella network named Credence [9] appears promising because of its robustness in the face of malicious peers which intentionally give high ratings to polluted or Infected files. In this section we model the effect that an effective object-reputation scheme such as Credence has on virus propagation in a P2P network.

Effect on P2P Virus Propagation

As in Extension B in Section 5.3.2, peers are divided into two groups: "smart" peers, which utilize an object-reputation system, and "regular" peers, which do not. The number of regular peers falling in a category $X$ at time $t$ is denoted by $X_R(t)$ and the number of smart users in each category is denoted by $X_S(t)$. Regular peer behaviour is governed by (5.1), (5.2), and (5.3). Smart peer behaviour is determined by (5.1) and modified versions of (5.2) and (5.3) with $h(t)$ replaced by $g(t)$. In order to reflect the fact that smart users are less likely to download infected files, we require that $g(t) < h(t)$ $\forall t$. In the case of a perfect object-reputation system, in which smart peers never download infected files, $g(t) = 0$ $\forall t$ and hence $S_S(t) = N_S$ $\forall t$. Finally, equation (5.4) is replaced by

$\frac{dK(t)}{dt} = S_R(t)\lambda_S h(t) + S_S(t)\lambda_S g(t) + (E_R(t) + E_S(t))\lambda_E(c - 1) - (I_R(t) + I_S(t))\lambda_R c$ (5.38)

Effect on Pollution Dissemination

We model the effect on pollution dissemination in a similar fashion, decomposing the set of interested peers into the two groups of "smart" and "regular" peers. The object reputation scheme is assumed to reduce the probability of downloading a bad version of a file by a fixed proportion. Smart peers now download a bad version with probability

$p_{b,S}(t) = \beta\frac{N_b(t)}{N_b(t) + N_g(t)}$ (5.39)

for some constant $\beta < 1$. Regular peers download bad versions with the same probability as before (proportional to the extent of pollution). The modified epidemiological model now keeps track of the number of smart and regular peers in each class and can hence determine the rates of change of the number of good and bad files in the network. We have:

$\frac{dN_b(t)}{dt} = \lambda_s p_{sb}(p_{b,S}(t)S_S(t) + p_{b,R}(t)S_R(t)) - (\lambda_r + \lambda_x)p_{db}p_{sb}I(t)$ (5.40)

$\frac{dN_g(t)}{dt} = \lambda_s p_{sg}((1 - p_{b,S}(t))S_S(t) + (1 - p_{b,R}(t))S_R(t))$ (5.41)
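To see the effect, the two-class pollution model can be simulated: smart peers use $p_{b,S}(t) = \beta p_b(t)$ per (5.39), while regular peers use $p_b(t)$. This sketch assumes an even smart/regular split and tracks the infected subpopulations of each class separately (state names and parameter values are illustrative); setting $\beta = 1$ recovers the base model.

```python
# Two-class pollution model with an object-reputation scheme,
# following (5.39)-(5.41). Smart peers download bad versions with
# probability beta * p_b(t). Parameters are illustrative.

def simulate_reputation(beta, M_i=10000.0, smart_frac=0.5,
                        Ng0=50.0, Nb0=50.0, lam_s=0.01,
                        lam_r=0.005, lam_x=0.002, p_sg=0.9,
                        p_sb=0.5, p_db=0.8, dt=1.0, steps=100000):
    Ss, Is = smart_frac * M_i, 0.0          # smart peers
    Sr, Ir = (1 - smart_frac) * M_i, 0.0    # regular peers
    Nb, Ng = Nb0, Ng0
    for _ in range(steps):
        pb = Nb / (Nb + Ng)
        pbS, pbR = beta * pb, pb                             # (5.39)
        dSs = -lam_s * Ss + lam_r * Is
        dIs = pbS * lam_s * Ss - (lam_r + lam_x) * Is
        dSr = -lam_s * Sr + lam_r * Ir
        dIr = pbR * lam_s * Sr - (lam_r + lam_x) * Ir
        dNb = (lam_s * p_sb * (pbS * Ss + pbR * Sr)
               - (lam_r + lam_x) * p_db * p_sb * (Is + Ir))  # (5.40)
        dNg = lam_s * p_sg * ((1 - pbS) * Ss + (1 - pbR) * Sr)  # (5.41)
        Ss, Is = Ss + dt * dSs, Is + dt * dIs
        Sr, Ir = Sr + dt * dSr, Ir + dt * dIr
        Nb, Ng = Nb + dt * dNb, Ng + dt * dNg
    return Nb / (Nb + Ng)   # final extent of pollution

frac_rep = simulate_reputation(beta=0.2)
frac_base = simulate_reputation(beta=1.0)
```

As expected, the reputation scheme lowers the long-run fraction of polluted versions relative to the base model.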

5.5 P2P Measurements

Fig. 5.2 Empirical CDFs based on eDonkey measurement data: (a) CDF of number of files shared per peer; (b) CDF of number of downloads per peer per 48-hour interval; (c) CDF of net change in shared files per peer per 48-hour interval.

In order to choose a realistic value of $\lambda_S$ for simulation experiments with our model, we sought to acquire appropriate measurement data from an actual P2P network. A number of previous empirical studies have explored the behaviour of the Gnutella network [71-74], the Kazaa network [57,74], and the eDonkey network [75]. The statistics presented have included the number of files shared by peers, latency between peers, the amount of time spent on and off-line, the degree of peer connectivity, and mean bandwidth usage. However, we are not aware of any previous work directly analyzing the rate at which peers download files. We chose to conduct our measurements on the eDonkey2000 network because of its popularity and the apparently limited amount of research conducted on the network. BayTSP [76], a company which monitors Internet file-trading, indicates that, as of September 2004, the eDonkey2000 network had, on average, the most users of any P2P network [77]. The eDonkey2000 network is comprised of a number of servers [43] to which a peer can connect. Each server keeps a list of all the files shared by connected peers, and uses this information to respond to keyword-based search queries. The search results returned by the server include a 16-byte MD4 hash [78] value for each file in order to uniquely identify it. When the user elects to download a specific file, his client sends the hash value of the desired file to the server, and the server responds with a list of IP addresses and ports of peers sharing the file. Our experiment consisted of two phases. In the first part, we collected a list of eDonkey2000 peer IP addresses/ports. We achieved this by first conducting searches for keywords likely to return a significant number of results, for example ".exe" and ".iso", and then initiating the download of files shared by a large number of peers.
Next, we made use of the Ethereal network protocol analyzer [79] to capture and analyze the packets returned by the server containing the peer IP addresses. We initiated the download of approximately 500 files to harvest over 20,000 peer addresses. For the next phase of the experiment, we developed a scanner program which attempts to connect to every peer and retrieve its list of shared files. We made use of previous work carried out to reverse-engineer the eDonkey2000 protocol [80], and conducted further analysis using Ethereal. Users of eDonkey2000 have the option of configuring their clients to block requests by other peers to view their list of shared files. Our work was complicated by the fact that approximately 95% of peers to which we attempted to connect did not permit viewing of their shared files. There are two obvious factors that contribute to this high percentage: eMule [81], the most popular eDonkey2000 client, has the blocking option enabled by default, and the advent of RIAA (Recording Industry Association of America) lawsuits directed against P2P users [82] based on the scanning of shared directories has likely motivated many users to actively disallow the viewing of their files. Nevertheless, we managed to connect to 1000 peers and retrieve their lists of shared files. We repeated this procedure

(a) The number of peers in each group. (b) The proportion of infected files.

Fig. 5.3 Example of the dynamic behaviour of a P2P network exposed to a virus (with model parameters set to the values described in Section 5.6.1). The network reaches steady-state after about 600 hours, at which point approximately twenty percent of the peers are infected.

three more times, in 48-hour intervals. Each scan required approximately two hours to carry out. In order to deduce the rate at which users were downloading files, we tracked the addition of any new shared files every time the scanner connected to a peer. We assume that any new file is the result of a download. Admittedly, the possibility exists that a new shared file was not downloaded, but instead added to the shared directory by the user from a source outside the eDonkey2000 network. However, we are unable to distinguish such files, and therefore our calculated download rate may be a slight over-estimate. Table 5.3 provides the results of our measurements. The overall average download rate is 37.7 files per 48-hour period. Figure 5.2 provides the empirical cumulative distribution functions (CDFs) of the number of files shared per peer, the number of downloads per peer per 48-hour interval, and the net change in the number of files shared by each peer per 48-hour interval. All three plots suggest heavy-tailed distributions, indicating that a small percentage of "power-peers" are much more active and share many more files. This phenomenon has been observed in other empirical studies conducted on P2P networks [74,75]. We calculated the rate at which peers removed files from their shared folder by counting all files peers had made available during a given run of our scanner program which were no longer present during a subsequent scan. The average removal rate is 29.1 files per 48-hour period.
Although this does not entirely validate our Section 5.2 assumption of a zero net increase in the total number of files, it indicates that files are removed from the network at a rate similar to that at which new ones are downloaded. Furthermore, a website [43] tracking eDonkey2000 server statistics over one-month intervals indicates that, while there are significant daily fluctuations in the number of files available, the month-long trend is fairly constant.
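The scan-diff procedure described above (new shared files counted as downloads, vanished files counted as removals) can be sketched as follows. All names and the toy data here are illustrative; the thesis's actual scanner and data formats are not shown in the text.

```python
# Sketch: estimating per-peer download and removal counts from two
# successive scans. Each scan maps a peer ID to the set of file hashes
# it shares. All identifiers are hypothetical.

def scan_deltas(scan_prev, scan_next):
    """Return per-peer (downloads, removals) for peers seen in both scans."""
    deltas = {}
    for peer, files_prev in scan_prev.items():
        if peer not in scan_next:
            continue  # peer unreachable during the later scan
        files_next = scan_next[peer]
        new_files = files_next - files_prev    # treated as downloads
        gone_files = files_prev - files_next   # treated as removals
        deltas[peer] = (len(new_files), len(gone_files))
    return deltas

def average_rates(deltas):
    """Mean downloads and removals per peer per scan interval."""
    n = len(deltas)
    dl = sum(d for d, _ in deltas.values()) / n
    rm = sum(r for _, r in deltas.values()) / n
    return dl, rm

scan1 = {"peerA": {"h1", "h2"}, "peerB": {"h3"}}
scan2 = {"peerA": {"h1", "h4", "h5"}, "peerB": {"h3", "h6"}}
print(average_rates(scan_deltas(scan1, scan2)))  # (1.5, 0.5)
```

Note that, as in the measurement study, a file added from outside the network is indistinguishable from a download, so the download estimate is an upper bound.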

Interval   0 new files (%)   1-10 new files (%)   10-100 new files (%)   > 100 new files (%)   Avg. download rate (files / 48 hrs)
1          11                50                   33                     6                     41.2
2          12                47                   33                     8                     35.8
3          7                 39                   48                     6                     36.0

Table 5.3 Observed eDonkey2000 peer download behaviour over three disjoint 48-hour intervals.

As stated in Section 5.2, we are only concerned with modeling executable files in P2P networks. To estimate the proportion of these files in the eDonkey2000 network, we analyzed the aggregate list of approximately 230,000 files initially shared by the one thousand peers we tracked. From this list, we removed all files with extensions known to indicate a media file (e.g., ".avi"). This left just over 55,000 files that were likely to be executable. Therefore, we estimate that the proportion of files on the eDonkey2000 network that can potentially contain malicious code is approximately 24%. We note that this value may be a slight over-estimate, due to the fact that some of the shared files were compressed (".zip" or ".rar"), and therefore we could not identify them as executable with total certainty.
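The extension-based filtering can be sketched as follows. The media-extension list here is illustrative, not the exact filter used in the thesis.

```python
# Sketch: estimating the fraction of potentially executable shared files
# by filtering out known media extensions. The extension list below is
# an illustrative assumption.

MEDIA_EXTS = {".mp3", ".avi", ".mpg", ".jpg", ".wav", ".mov"}

def executable_fraction(filenames):
    """Fraction of files whose extension is not a known media type."""
    likely_exec = [f for f in filenames
                   if not any(f.lower().endswith(e) for e in MEDIA_EXTS)]
    return len(likely_exec) / len(filenames)

files = ["setup.exe", "song.mp3", "movie.avi", "tool.zip"]
print(executable_fraction(files))  # 0.5: .exe and .zip pass the filter
```

As in the thesis, compressed archives pass the filter even though they may not contain executable code, which biases the estimate upward.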

5.6 Simulation Results

5.6.1 S-E-I Model

In this section we provide some examples of virus behaviour in P2P networks predicted by our S-E-I model. Results are based on the version of our model described in Section 5.2.1


Fig. 5.4 The effect of the initial infection on the evolution of the number of infected peers. The light solid line corresponds to 10,000 infected files initially in the network, the dashed line to 100,000 initial infected files, and the heavy solid line to 1,000,000 initial infected files.

with c = 10. Figure 5.3 illustrates how the number of peers falling into each of the three categories evolves over time and eventually reaches a steady state. Based on our results from Section 5.5, we choose the following value for the download rate λ_S: (37.7 / (48 × 60)) × 0.24 = 3.14 × 10^-3 downloads per minute. We assume that the proportion of downloaded files which are executable also lies at 0.24, and hence include this factor in the expression for λ_S. For M and N we make use of global eDonkey2000 network data provided by [43] at the time of writing: N = 2.2 million and M = 0.24 × 260 million = 62.4 million. We set the average time for a peer to recover to 24 hours, meaning λ_R is 1/(24 × 60) = 6.94 × 10^-4. Finally, h(t) = 0.5 q(t). Initially, there are 10,000 Exposed peers, each sharing one infected file. Unless otherwise noted, all the parameter values mentioned in this paragraph remain constant for subsequent examples. In Figure 5.4 we examine the effect of varying the initial extent of infection on the evolution of the number of infected peers in the network. For a high initial infection (1 million files), there is an initial overshoot in the number of infected peers beyond the steady state. The medium initial infection case converges most quickly to the steady-state value since, out of the three cases, its number of initially infected peers is closest to the eventual steady-state value. After about 700 hours, the three networks reach the same




Fig. 5.5 The effect of varying model parameters on the analytical steady-state proportion of infected files. (a) The effect of varying c, the number of virus files created upon infection. (b) The effect of varying a, the constant determining the probability of downloading an infected file. (c) The effect of varying the download/execution rate λ_E.

steady-state. This is also the behaviour implicitly predicted by equation (5.11), since it is independent of any initial condition (as long as at least one infected file initially exists in the network). Figure 5.5 examines how the steady-state proportion of infected files is affected as model parameters are varied. The panels in the figure display the effect of changing (i) c, the number of virus files inserted into the shared directory upon infection; (ii) a, the constant that governs the probability of downloading an infected file; and (iii) λ_S, the download rate of peers in the network. These plots indicate that increasing a and the download rate has a limited effect on the infection level of the network, whereas an increase in the number of files created by a virus can significantly raise the steady-state infection level of the network. However, in a practical setting, the more new files a virus creates, the more likely a user is to notice and delete them. Thus, in reality, the recovery rate would likely be an increasing function of c, and the high level of infection for viruses creating 50 or more new files upon execution would be unlikely to occur.
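The download and recovery rates chosen in Section 5.6.1 can be verified with a few lines of arithmetic (rates are in events per minute):

```python
# Checking the parameter arithmetic from Section 5.6.1:
# 37.7 files per 48-hour interval, of which 24% are executable;
# recovery after 24 hours on average; 24% of 260 million files.
downloads_per_interval = 37.7
lambda_S = downloads_per_interval / (48 * 60) * 0.24  # downloads/minute
lambda_R = 1 / (24 * 60)                              # recoveries/minute
M = 0.24 * 260e6                                      # executable files
print(f"{lambda_S:.5f} {lambda_R:.6f} {M:.0f}")
# prints 0.00314 0.000694 62400000
```

These reproduce the quoted values λ_S = 3.14 × 10^-3, λ_R = 6.94 × 10^-4, and M = 62.4 million.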

5.6.2 S-E-I Simulations with Varying Peer Behaviour

The propagation of a virus in a P2P network predicted by our model is based only on the expected values of the peer recovery rate λ_R, the peer download rate λ_S, and the peer execution rate λ_E. Realistically, one may expect these values to differ significantly among peers. Since our equations do not incorporate the notion of a random distribution of these parameters for each peer, we are essentially modeling a P2P network in which all peers take on the same deterministic parameter values. Therefore, it is of interest to consider how closely the results predicted by our model mirror those which would be seen in a P2P network in which individual peer parameters are randomly distributed. To this end, we present a number of discrete-time simulation results for a peer-to-peer network in which individual recovery and download/execution rates are chosen according to several different probability distributions. All figures illustrate the evolution of the number of Infected, Exposed, and Susceptible peers over time. The non-hashed lines are the values predicted by our model, and the hashed lines represent the values obtained via our simulations. We consider 20,000 users sharing 600,000 clean files. Parameters not explicitly mentioned below are set to the same values as in Section 5.6.1. In Figure 5.6(a), the download/execution rate is uniformly


Fig. 5.6 The impact of variability in individual peer download rates. The dark solid, light solid and dashed lines show the predicted behaviour according to the dynamic model of infected, susceptible and exposed peers, respectively. The hashed lines show the results achieved in discrete-time simulations. (a) Download rate drawn from a uniform distribution; (b) download rate drawn from a normal distribution; (c) interval between downloads drawn from a normal distribution.


Fig. 5.7 The impact of variability in individual peer recovery rates. The dark solid, light solid and dashed lines show the predicted behaviour according to the dynamic model of infected, susceptible and exposed peers, respectively. The hashed lines show the results achieved in discrete-time simulations. (a) Recovery rate drawn from a normal distribution; (b) interval between recoveries drawn from a normal distribution; (c) recovery rate, download rate and susceptibility to infection (a) drawn from normal distributions.

(a) E(0) = 100, q(0) = 100/(62.4 × 10^6). (b) E(0) = 1000, q(0) = 1000/(62.4 × 10^6). (c) E(0) = 10,000, q(0) = 10,000/(62.4 × 10^6). (d) E(0) = 100,000, q(0) = 100,000/(62.4 × 10^6).

Fig. 5.8 Examining the effect that the initial number of Exposed peers has on the dynamics of virus infection in a P2P network. In each plot, the dark solid line represents the number of Susceptible peers, the dark dashed line corresponds to Exposed peers, the light dashed line indicates Infected peers, and the light solid line represents Recovered peers.

(a) λ_R = 1/(12 × 60). (b) λ_R = 1/(48 × 60). (c) λ_R = 1/(96 × 60). (d) λ_R = 1/(168 × 60).

Fig. 5.9 Examining the effect of the recovery rate λ_R. In each plot, the dark solid line represents the number of Susceptible peers, the dark dashed line corresponds to Exposed peers, the light dashed line indicates Infected peers, and the light solid line represents Recovered peers.


(a) S_O(0) = 100,000, S_A(0) = 2,090,000. (b) S_O(0) = S_A(0) = 1,095,000.

Fig. 5.10 Examining the presence of oblivious peers. S_O(0) and S_A(0) are the initial numbers of, respectively, Susceptible oblivious peers and Susceptible aware peers. In each plot, the dark solid line represents the number of Susceptible peers, the dark dashed line corresponds to Exposed peers, the light dashed line indicates Infected peers, and the light solid line represents Recovered peers.

distributed about the mean value λ_S, with individual rates varying from 0 to 2λ_S. Figure 5.6(b) illustrates the case where the download rate is normally distributed with mean λ_S and standard deviation 0.05. Finally, in Figure 5.6(c), the average length of time between downloads is normally distributed, with mean 1/λ_S and standard deviation 5. In Figure 5.7(a) the recovery rate is normally distributed with mean 1/24 recoveries per hour and standard deviation 0.1. In Figure 5.7(b) the length of the interval between recoveries is normally distributed with mean 24 hours and standard deviation 5. In Figure 5.7(c) both the download and recovery intervals are normally distributed. The key observation from these figures is that the simulation results converge to steady-state values, and that these values are within 10% of the values predicted by our model. Given these results, we assert that our model provides a good approximation of a P2P network in which individual peer behaviour may vary significantly from the mean.
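The per-peer randomization used in these discrete-time simulations can be sketched as follows: each peer draws its own download rate, and successive events are separated by exponential delays at that rate. The uniform bounds here are an assumption for illustration, not the exact thesis settings.

```python
# Sketch: heterogeneous per-peer rates with exponential inter-event
# delays. MEAN_DL is the Section 5.6.1 download rate (per minute);
# the distribution bounds are illustrative assumptions.
import random

random.seed(1)
MEAN_DL = 3.14e-3  # downloads per minute

def peer_download_rate_uniform():
    # uniform about the mean, on [0, 2 * mean]
    return random.uniform(0.0, 2 * MEAN_DL)

def next_event_delay(rate):
    # exponential waiting time until this peer's next download event
    return random.expovariate(rate) if rate > 0 else float("inf")

rates = [peer_download_rate_uniform() for _ in range(10000)]
mean_rate = sum(rates) / len(rates)
print(abs(mean_rate - MEAN_DL) / MEAN_DL < 0.05)  # sample mean near MEAN_DL
```

Averaged over many peers, the heterogeneous population behaves like one with the deterministic mean rate, which is consistent with the simulation results tracking the model's predictions.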


Fig. 5.11 Examining the behaviour of the pollution model. Hashed lines are simulation results. (a) The evolution of susceptible (dotted), infected (solid) and recovered (dashed) peers. (b) The percentage of polluted files versus time. (c) The steady-state percentage of polluted files as a function of the number of initially "bad" files (with 100 good files).


Fig. 5.12 The effect of varying model parameters on the analytical steady-state proportion of polluted files. (a) The effect of varying λ_S, the download rate. (b) The effect of varying λ_r, the rate at which Infected peers retry downloading a file. (c) The effect of varying the give-up rate λ_x.


Fig. 5.13 The impact of using an object reputation scheme such as Credence on the residual proportion of infected files. The proportion of infected files as (a) a function of β, the parameter determining the effectiveness of Credence, and (b) a function of the fraction of peers using Credence.


Fig. 5.14 A comparison between the predicted behaviour according to the epidemiological model and a discrete time simulation. The hashed lines are the results of the simulator (number of susceptible, infected and exposed peers from top to bottom). These lines cover the predicted results for most of the display.


Fig. 5.15 The impact of using an object reputation scheme such as Credence on the steady-state proportion of polluted files. The proportion of polluted files as (a) a function of β, the parameter determining the effectiveness of Credence, and (b) a function of the fraction of peers using Credence.

5.6.3 S-E-I-R Model

We now present a number of P2P network scenarios, and examine the virus propagation dynamics predicted by our S-E-I-R model. We use the same parameters as for the S-E-I model in Section 5.6.1. In the first scenario, a relatively small number of peers are initially Exposed and each is sharing one infected file. The remaining peers are all Susceptible. We consider four values for E(0): 100, 1000, 10,000, and 100,000. Figure 5.8 provides the results. The figure indicates that the value of E(0) has little impact on the peak infection level, which ranges from 415,940 peers for E(0) = 100 to 421,035 for E(0) = 100,000. However, the choice of E(0) has a significant effect on the time of peak infection: E(0) = 100,000 results in a peak infection at 168 hours (1 week), whereas when E(0) = 100 the peak is not reached until 390 hours (2.3 weeks).

Next, we examine the effects of varying λ_R. E(0) is fixed at ten thousand, with each of these peers sharing one infected file. The remaining peers are Susceptible. The recovery rates we consider are 1/(12 × 60), 1/(48 × 60), 1/(96 × 60), and 1/(168 × 60) recoveries per minute. Figure 5.9 depicts the results.

Clearly λ_R has a significant effect on the peak infection value: it ranges from 129,653 Infected peers for the 12-hour recovery time to 1,591,488 for the 168-hour recovery time. Its impact on peak infection time is less significant; this value ranges from 322 hours for the 12-hour case to 203 hours for the 168-hour case. The next scenario considers the effect of oblivious peers in the network by making use of Model Extension B from Section III. We fix E(0) at ten thousand, and λ_R at 1/(24 × 60).

We consider three cases: S_O(0) = 0, S_A(0) = 2,190,000; S_O(0) = 100,000, S_A(0) = 2,090,000; and S_O(0) = S_A(0) = 1,095,000. The first case corresponds to the previously discussed Figure 5.8(c), and the other two are illustrated in Figure 5.10. The presence of oblivious peers has limited impact on peak infection time: it lies at 243 hours when all peers are aware, and at 227 hours when half the initially Susceptible peers are oblivious. The peak infection levels, in order of increasing numbers of oblivious peers, are 416,430, 489,852, and 1,250,412. It is noteworthy that even in the case where S_O(0) = 1,095,000, the system reaches equilibrium, with all aware peers recovering within 60 hours of the peak infection. Finally, when modeling peers going off-line for periods of time, as described in Extension A in Section III, the result is merely a slower evolution of the system, i.e., a uniform stretch of the S-E-I-R plot. All peak values remain the same. As one would expect, the smaller the ratio of average on-line to average off-line time, the slower the system evolves.
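For readers unfamiliar with compartment dynamics, the following is a generic forward-Euler S-E-I-R integrator with textbook mass-action infection. It is NOT the thesis's model (which drives infection through the proportion of infected files rather than direct peer contact); all rates here are illustrative, and it is included only to show the compartment bookkeeping behind plots like Figures 5.8-5.10.

```python
# Generic S-E-I-R forward-Euler sketch (illustrative parameters only).
# beta: contact/infection rate, sigma: E -> I rate, gamma: I -> R rate.

def seir_step(s, e, i, r, beta, sigma, gamma, n, dt):
    new_exposed = beta * s * i / n * dt    # S -> E
    new_infected = sigma * e * dt          # E -> I
    new_recovered = gamma * i * dt         # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infected,
            i + new_infected - new_recovered,
            r + new_recovered)

n = 2.2e6                                  # population size (peers)
s, e, i, r = n - 10_000, 10_000.0, 0.0, 0.0
for _ in range(1000):                      # 1000 steps of dt = 1 hour
    s, e, i, r = seir_step(s, e, i, r, 0.5, 0.1, 1 / 24, n, 1.0)
print(round(s + e + i + r) == round(n))    # population is conserved
```

Each term moves mass between two compartments, so the total population is conserved exactly; in this parameterization the epidemic runs its course well before the 1000-hour horizon, mirroring the qualitative S-E-I-R shapes in the figures.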

5.6.4 Pollution Model Behaviour and Simulations

In order to verify our pollution model, we conducted a discrete-time simulation of a P2P network with polluted files, and compared it to the results predicted by our model. As with our other simulations, we used exponentially distributed delays between the various events governed by rate parameters. We set p_sg = p_sb = p_db = 0.3, N_g(0) = 100, N_b(0) = 10, M = 20,000, and fixed values for the rates λ_S, λ_x, and λ_r. Figure 5.11(a) shows the number of Susceptible, Infected and Recovered peers versus time for both the simulation and the model. Figure 5.11(b) shows how p_b varies with time and reaches a steady state. The model and the simulation track each other well, with the steady-state p_b differing by less than 10%. In Figure 5.11(c), we examine the impact that the initial number of seeded polluted files has on the steady-state value of p_b. All other parameters are as described above. This plot indicates that the initial number of polluted files seeded does have a significant effect on the long-term pollution level of the network. In Figure 5.12, we consider how the steady-state proportion of polluted files is affected by three of the model parameters. The panels illustrate the effect of changing (i) λ_S, the download rate; (ii) λ_r, the rate at which Infected peers try again to download a non-polluted copy of the file; and (iii) λ_x, the rate at which Infected peers give up on downloading the file. These plots indicate that variations in λ_x and λ_r have a similar effect on the ultimate level of pollution in the network.

5.6.5 Impact of Object Reputation Schemes on P2P Virus Propagation

We now report on simulation and model results for the impact of an object reputation scheme such as Credence on the evolution of P2P viruses. Figure 5.13(a) illustrates how the steady-state proportion of infected files changes as the effectiveness of Credence (as reflected by β, the factor by which the probability of downloading an infected file is reduced) increases. Figure 5.13(b) depicts the reduction in residual infection as the number of peers using Credence increases. These results are obtained for the model parameters described in Section 5.6.1. The results indicate that if Credence reduces the probability of downloading an infected file by a factor of 0.7 and fifty percent of the peers use Credence, then the residual infection is halved. Figure 5.14 compares the behaviour of the deterministic model with a discrete-time simulation of the propagation of a virus in a P2P network consisting of 20,000 peers. Fifty percent of the users employ Credence and it has an effectiveness of β = 0.7. The figure illustrates that there is a good match between the expected behaviour and that of the simulated system.

5.6.6 Impact of Object Reputation Schemes on P2P Pollution

In this section we consider the effectiveness of a Credence-like scheme in reducing pollution in a P2P network. The plots generated are similar to those from Section 5.6.5, with steady-state pollution levels replacing infection levels. Figure 5.15(a) illustrates the impact of β, while Figure 5.15(b) shows the impact of the proportion of peers using Credence. The other parameters are the same as in Section 5.6.4. The results indicate that a scheme like Credence can indeed be effective in reducing pollution, assuming a significant proportion of peers utilize it. For instance, if forty percent of peers use Credence and it reduces the probability of downloading a polluted file by 0.7, the steady-state proportion of polluted files is approximately half of what it would be without Credence.

5.7 Summary

In this chapter we have examined two of the principal problems currently facing content-distribution P2P networks: the spreading of malware and polluted files. We provided a total of three epidemiological models for these phenomena, and included several modifications extending their ability to model additional factors. We also presented measurement results from a P2P network, along with a number of simulations based on these results and our models.

Chapter 6

Background and Related Work: BitTorrent

6.1 Background

BitTorrent is an extremely popular peer-to-peer application for sharing large files, based on the principle of decomposing a file into multiple small pieces. A peer can then download different pieces concurrently from multiple peers, and during the downloading process provide other peers with the pieces it has already retrieved. BitTorrent employs a rate-based "tit-for-tat" policy, whereby a peer chooses to upload to a small set of neighbouring peers which are providing it with the best download rates. This mechanism is intended to discourage free-riding (downloading without uploading) and promote fairness among the peers. Measurement studies have indicated that BitTorrent displays excellent scalability and achieves high utilization of the available upload capacity of the network [10,11]. These same studies, and detailed simulation studies [12], have, however, called into question the fairness properties of the BitTorrent protocol. It has been observed that peers with high upload bandwidth frequently upload much more data than they download, with the opposite being the case for peers with low upload bandwidths.


6.1.1 The BitTorrent Protocol

BitTorrent is a peer-to-peer application that aims to enable the fast and efficient distribution of large files [6]. Here we provide a brief overview; see [6,11,12,83] for more detailed descriptions. The primary difference between BitTorrent and other file-sharing applications operating on peer-to-peer networks, such as eDonkey [42] and Gnutella [5], is that files are split into equal-sized pieces and peers download these pieces concurrently from multiple peers. For each torrent (file) available for download, there is a centralized tracker that keeps track of the peers currently in the system. When a peer wishes to download the torrent, it notifies the tracker and receives a list containing a random subset of the other peers. The peer attempts to establish connections to these other peers, which become its neighbours upon success. The group of neighbours is called the peerset of a peer, and in practice numbers about 40. Peers in the system are either seeds or leechers. Seeds have a complete copy of the file and remain in the system to provide pieces to others. Leechers are in the process of downloading the file, and can only upload the pieces they have already retrieved. Each peer strives to download pieces from other peers. Initially, when a peer needs to quickly acquire pieces to exchange, it accepts whatever pieces are made available, but later it chooses the pieces that are rarest amongst its neighbours, following a local rarest-first policy. BitTorrent attempts to induce fairness and guard against free-riding through a rate-based tit-for-tat policy. Each peer maintains a small, constant number of concurrent uploads (usually 5), preserving the balance through a process called choking. At any moment a peer has a set of 5 unchoked neighbours (those to which it is uploading) and a set of choked neighbours. Every ten seconds the peer evaluates the download rates it is receiving from its neighbours.
If the lowest download rate provided by an unchoked neighbour is less than the highest provided by a choked neighbour, then the peer chokes the former and unchokes the latter. This peer selection policy attempts to establish the "fair" scenario where peers upload to and download from peers with similar bandwidths. In addition to this peer selection policy, BitTorrent incorporates optimistic unchoking. Every thirty seconds a peer randomly chooses a neighbour and uploads to it. This is both a search procedure, allowing peers to discover neighbours with better upload capability, and a bootstrap mechanism for peers that have just joined, providing them with an initial set of pieces to exchange.
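The periodic choking decision described above can be sketched as follows; the data structures are simplifications of a real client's state, and the rates are illustrative.

```python
# Sketch of BitTorrent's rechoke rule: if the slowest unchoked
# neighbour provides a lower download rate than the fastest choked
# one, swap them. Simplified, illustrative state representation.

def rechoke(unchoked, choked):
    """unchoked/choked: dicts mapping neighbour -> download rate (KB/s).
    Returns updated (unchoked, choked) after one evaluation round."""
    if not unchoked or not choked:
        return unchoked, choked
    worst_unchoked = min(unchoked, key=unchoked.get)
    best_choked = max(choked, key=choked.get)
    if unchoked[worst_unchoked] < choked[best_choked]:
        # choke the slow peer, unchoke the fast one
        unchoked[best_choked] = choked.pop(best_choked)
        choked[worst_unchoked] = unchoked.pop(worst_unchoked)
    return unchoked, choked

u = {"p1": 50, "p2": 10, "p3": 80}
c = {"p4": 30, "p5": 5}
u, c = rechoke(u, c)
print(sorted(u))  # ['p1', 'p3', 'p4']: p2 was choked in favour of p4
```

A real client runs this every ten seconds over its five upload slots, with optimistic unchoking periodically overriding the rate comparison for one randomly chosen neighbour.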

6.2 Related Work

6.2.1 BitTorrent Measurement Studies

Two significant measurement studies of BitTorrent provide some interesting insights. Izal, Urvoy-Keller, Biersack, Felber, Hamra et al. analyze data recorded, over a 5-month period, on a torrent for a 1.77 GB Linux distribution [10]. The information was obtained from the tracker log, and from a modified client which interacted with the swarm, the set of all leechers and seeds.

Fig. 6.1 Measured BitTorrent data. Source: [10]. (a) Number of peers connected to the tracker during the first five days. (b) Total amount of data uploaded by seeds and leechers over five months. (c) Percentage of seeds and leechers connected to the tracker over five months. (d) Distribution of download rates.

Figure 6.1(a) shows the number of peers participating in the torrent during the first five days of its availability. There is a flash-crowd effect during this time: an immediate influx of peers connecting when the file becomes available, who quickly disperse, resulting in a lower, fairly consistent number of connected peers. Figure 6.1(b) presents the cumulative amount of data uploaded by seeds and leechers. After the initial flash crowd, the ratio of data uploaded by seeds vs. leechers remains fairly constant at a little above 2. This indicates the critical role that seeds play in sustaining a torrent. Figure 6.1(c) illustrates the proportion of seeds and leechers comprising the swarm. While there is initially a downward trend in the number of seeds, the percentage stabilizes somewhat at around 20%. Izal et al. also examine the performance of individual peers. A surprising observation is that a significant majority of leechers (81%) which connect to the torrent never complete the download. Leechers that do not finish downloading tend to leave the swarm early: 90% stay for less than 10,000 seconds and 60% for less than 1000 seconds. The speculation is that in many of these cases the users experience disappointing download rates and therefore become discouraged and decide to disconnect. Among the peers that do successfully complete the download of the file, the average individual download rate is 500 Kbits/s for an average download time of 29,000 seconds. Figure 6.1(d) shows the distribution of download rates, illustrating that there is significant variability. The peak around 400 Kb/s is thought to result from the fact that this is the download capacity of many ADSL connections. Pouwelse, Garbacki, Epema, and Sips present a second major BitTorrent measurement study [11]. Much of the data is collected from the website Supernova.org, which indexed available content and the associated torrent files.
While this site has been shut down since the paper was published, the results still provide relevant insight into various aspects of BitTorrent and the numerous BitTorrent indexing websites that have since emerged. Pouwelse et al. tracked the total number of peers connected to trackers indexed by Supernova.org over a one-month period during the end of 2003 and the beginning of 2004. They found significant variability in the number of daily peers, reaching a low of around 230,000 on Christmas Day and a high of 575,000 early in the new year. They also recorded several instances of failures, of both the Supernova.org server and the servers hosting the trackers, and found that these had a significant impact on peer levels. This clearly illustrates that the centralized repository of trackers and torrent data necessary for BitTorrent is a weakness, since it presents the possibility of catastrophic failures. Pouwelse et al. also examine the average download rate of over 50,000 peers, spread over 108 different trackers. The average download rate is 240 Kb/s. While this is only about half the rate determined in [10], the results presented in [11] presumably cover all downloaders, including those who do not complete the download (who tend to have lower download rates, and are omitted in [10]). The nature of BitTorrent dictates that a file becomes unavailable for others to download as soon as the last seed departs from the swarm. Therefore, the lifetime of content availability is unpredictable. In order to examine this issue, Pouwelse et al. measured the uptime of all peers connecting to a tracker for a pirated PC game, over the entire 3-month period it was available. Figure 6.2 shows the distribution of peer uptime upon finishing the download and becoming a seed.
The majority of users disconnect quickly, with only 17% staying connected for more than 1 hour, 3% staying for more than 10 hours, and 0.3% staying on for longer than 100 hours. This illustrates that long-term availability of content may be dependent on a very small number of reliable seeds which stay connected for an extremely long period of time. In this case, one seed remained connected for 84 days, almost the entire lifetime of the torrent.

[Figure: log-log plot of peer uptime versus peer uptime ranking]
Fig. 6.2 Distribution of time peers remain connected to a tracker upon completing their download. Source: [11]

6.2.2 Modeling BitTorrent

In the following chapter we present a model of BitTorrent which we use to analyze the performance of our suggested modifications. Other approaches to modeling the protocol are summarized in this section. In [84], Qiu and Srikant present a fluid model of BitTorrent and derive a number of results about steady-state performance. The following variables are defined for the model:

x(t): number of leechers in the network at time t.
y(t): number of seeds in the network at time t.
λ: rate at which new requests arrive, according to a Poisson process.
μ: the (identical) bandwidth each peer has for uploading.
c: the (identical) bandwidth each peer has for downloading.
θ: average rate at which leechers cease downloading, according to an exponential distribution.
γ: average rate at which seeds leave the network, according to an exponential distribution.
η: the effectiveness of file sharing; 0 ≤ η ≤ 1. It is the probability that any given leecher i has pieces which at least one other leecher j to which i is connected does not have, and hence is able to upload to j.

The value of η determines the average rate at which leechers upload to one another, relative to their maximum rate μ. The total uploading rate across all peers in the system is given by min{cx(t), μ(ηx(t) + y(t))}. The first term corresponds to the condition where, in all cases, the bottleneck for transfers is the download bandwidth. The second term is for the case where the upload bandwidth is the limiting constraint. The number of leechers becoming seeds during an interval of length δ is assumed to be min{cx(t), μ(ηx(t) + y(t))}δ. Thus, the total rate at which leechers depart is given as:

min{cx(t), μ(ηx(t) + y(t))} + θx(t)    (6.1)

It follows that the rates at which the number of leechers and seeds in the system change are:

dx/dt = λ − θx(t) − min{cx(t), μ(ηx(t) + y(t))}    (6.2)

dy/dt = min{cx(t), μ(ηx(t) + y(t))} − γy(t)

In order to determine the steady-state performance, the equations (6.2) are equated to zero. The steady-state values of x(t) and y(t) are denoted, respectively, x̄ and ȳ. In the case where cx̄ ≤ μ(ηx̄ + ȳ), (6.2) implies that ȳ = cx̄/γ. Substituting this expression for ȳ into the inequality provides the following equivalent inequality:

1/c ≥ (1/η)(1/μ − 1/γ)    (6.3)

Likewise, when cx̄ > μ(ηx̄ + ȳ), 1/c < (1/η)(1/μ − 1/γ). Defining 1/β = max{1/c, (1/η)(1/μ − 1/γ)} allows the steady-state solution to (6.2) to be expressed as:

x̄ = λ/(θ + β)    (6.4)

ȳ = λ/(γ(1 + θ/β))

Little's law provides further insight into the steady-state behavior of this BitTorrent model. Little's law states that, in steady-state, the average number of users in a system is equal to their average arrival rate multiplied by the average time, T, they spend in the system. Since the average number of leechers remains constant at steady-state, the rate at which they complete downloads is equal to the arrival rate minus the rate of aborted downloads: λ − θx̄. The average fraction of leechers that become seeds is (λ − θx̄)/λ. Therefore, by Little's law,

((λ − θx̄)/λ) x̄ = (λ − θx̄)T    (6.5)

Combining this result with (6.4) provides the following expression for T:

T = 1/(θ + β)    (6.6)

Qiu and Srikant present a number of observations that follow from these equations:

• (6.6) implies that the download time T is independent of λ, the rate at which new leechers enter the system. This indicates excellent scalability.

• (6.6), together with the definition of β, indicates that T decreases as the effectiveness of file sharing, characterized by η, increases. Also, T increases as the departure rate of seeds, γ, increases. Both these observations are intuitively obvious and lend credence to the model.

• T decreases as the download capacity c increases, up until the point where 1/c < (1/η)(1/μ − 1/γ). At this point the upload capacity μ becomes the bottleneck. Likewise, T decreases with an increasing μ, until 1/c > (1/η)(1/μ − 1/γ).
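The fluid model can be checked numerically. The sketch below (our own illustration; the normalized parameter values are assumptions, not taken from [84]) integrates the differential equations (6.2) with a forward-Euler scheme and compares the result against the closed-form steady state and download time:

```python
# Numerical check of the Qiu-Srikant fluid model; parameter values are
# illustrative assumptions in normalized units, not from the thesis.
lam = 1.0                    # leecher arrival rate (lambda)
mu, c = 0.002, 0.005         # per-peer upload and download bandwidth
theta, gamma = 0.001, 0.01   # leecher abort rate, seed departure rate
eta = 1.0                    # effectiveness of file sharing

x, y, dt = 0.0, 0.0, 0.1
for _ in range(500_000):     # integrate (6.2) out to t = 50,000 s
    rate = min(c * x, mu * (eta * x + y))   # total download completion rate
    x += (lam - theta * x - rate) * dt
    y += (rate - gamma * y) * dt

# closed-form steady state: 1/beta = max{1/c, (1/eta)(1/mu - 1/gamma)}
beta = 1.0 / max(1.0 / c, (1.0 / eta) * (1.0 / mu - 1.0 / gamma))
x_bar = lam / (theta + beta)                 # (6.4)
y_bar = lam / (gamma * (1.0 + theta / beta))
T = 1.0 / (theta + beta)                     # download time (6.6)
```

With these values the upload bandwidth is the limiting constraint (1/c = 200 is smaller than (1/η)(1/μ − 1/γ) = 400), and the integration settles onto the closed-form values x̄ and ȳ, independent of λ as the scalability observation predicts.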

The total number of leechers in the system is given by x̄, the maximum number of other leechers a leecher can connect to is K, and each is connected to k = min{x̄ − 1, K} other leechers. Therefore η = 1 − P(leecher i has no pieces j requires)^k. The number of unique pieces making up the file is N. Qiu and Srikant assume each leecher i has n_i pieces of the file, where n_i is a random variable uniformly distributed in 0, ..., N − 1. Assuming these n_i pieces are randomly chosen from the set of all pieces, then P(leecher i has no pieces j requires) is:

P(leecher i has no pieces j requires)
    = Σ_{n_j=1}^{N} Σ_{n_i=0}^{n_j} P{j has n_j pieces and i has n_i pieces} · P{j has all of the pieces of i | n_i, n_j}
    = (1/N²) Σ_{n_j=1}^{N} Σ_{n_i=0}^{n_j} C(n_j, n_i)/C(N, n_i)    (6.7)
    = ((N + 1)/N²) Σ_{n=1}^{N} (1/n) ≈ (log N)/N

Note that we have omitted some of the steps of the derivation. Finally, the approximate expression for η is:

η ≈ 1 − ((log N)/N)^k    (6.8)
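As a numerical sanity check (our own sketch; the value N = 200 and the k values are arbitrary, and we use one common reading of the sum limits in (6.7)), the double sum can be evaluated exactly and compared with its closed form and the log N/N approximation:

```python
import math

def p_no_useful(N):
    """Exact double sum from (6.7): P(leecher i has no pieces j requires)."""
    total = 0.0
    for nj in range(1, N + 1):
        for ni in range(0, nj + 1):
            total += math.comb(nj, ni) / math.comb(N, ni)
    return total / N**2

N = 200                                    # number of pieces (assumed)
p = p_no_useful(N)
closed = (N + 1) / N**2 * sum(1.0 / n for n in range(1, N + 1))
approx = math.log(N) / N                   # the further approximation
eta = {k: 1 - p**k for k in (1, 2, 4)}     # effectiveness for small k
```

Even with only k = 4 connected leechers and N = 200 pieces, η exceeds 0.99999, illustrating the claim that η is close to 1 for reasonably large files.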

This shows that even for small values of k, a reasonably large file will result in a value of η close to 1. Thus, BitTorrent exhibits high efficiency - it is likely that any given leecher is connected to at least one other leecher with which it can exchange pieces. In [85] Barbera, Lombardo, Schembra, and Tribastone consider the issue of a freerider in BitTorrent. A freerider is defined as a leecher that does not upload any pieces to other leechers. Barbera et al. present a Markov model of a freerider, whose state is defined by the number of other leechers and seeds it is connected to, and the number of peers in each category which are either choking the freerider or optimistically unchoking it. The model also includes parameters specifying the arrival rate of seeds and leechers, and their average lifetimes. Finally, the rate at which peers choke and optimistically unchoke leechers is also incorporated into the model. After deriving the transition probabilities of the Markov model, Barbera et al. provide some numerical results. The most interesting insight is presented in Figure 6.3, which illustrates the time-averaged number of unchoked connections a freerider maintains as a function of the peerset size. The average number of leechers that the freerider is downloading from is less than one. This is due to the fact that leechers, in accordance with the tit-for-tat paradigm, tend to choke the freerider immediately (i.e., during the subsequent unchoke interval), since they receive no pieces from the freerider. However, since seeds base their choice of unchoked leechers on their download rates, independent of how much data the leechers upload, the freerider is able to continuously download from a number of seeds if it has a sufficiently high download bandwidth.
The authors point out that this is contrary to one of the chief objectives of BitTorrent - to discourage users from only downloading and uploading nothing in return. To counteract this problem, Barbera et al. suggest that the seeds' unchoking algorithm be modified so they would make their decisions not just on the download capacity of the leechers, but also on how much the leechers are uploading to others. However, such a modification would require a significant change to the BitTorrent protocol, since in the current implementation there is no mechanism by which a seed may discern the upload rate of a leecher.

[Figure: two curves, labelled TOTAL and LEECHERS, versus peer set size N]
Fig. 6.3 Average number of unchoked connections vs. peer set size. Source: [85]

6.2.3 Proposed Improvements to BitTorrent

Other authors have also published suggestions for improving BitTorrent performance. We now present an overview of two such papers. In [86], Koo, Lee, and Kannan examine the problem of BitTorrent neighbour selection. In the current implementation, the tracker provides each peer with a random list of neighbours. Koo et al. describe a more advanced algorithm, in which the tracker assigns neighbours in a fashion that maximizes the number of unique pieces available for each peer to download. The model used is as follows: there are N peers, and if peers i and j are connected (i.e. they are neighbours) then the variable e_ij = 1; otherwise e_ij = 0. There is a set C of |C| unique file pieces, and the set of pieces peer i has is denoted c_i ⊆ C. Each peer i can be connected to a maximum of d_i ≤ N neighbours. The disjointness of peers i and j, defined as the pieces i has and j does not, is denoted by c_i\c_j:

c_i\c_j = c_i − (c_i ∩ c_j)    (6.9)
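In Python set notation this disjointness is ordinary set difference (a tiny illustration of our own; the piece numbers are arbitrary):

```python
c_i = {1, 2, 3, 5}                 # pieces peer i holds
c_j = {2, 3, 4}                    # pieces peer j holds
disjointness = c_i - (c_i & c_j)   # pieces i can offer j
assert disjointness == c_i - c_j   # identical to plain set difference
```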

The objective is to determine the set of connections E = {e_ij ∈ {0,1} : i, j = 1, ..., N} in order to maximize the disjointness of content pieces, which is equivalent to maximizing the number of pieces each peer can upload to its neighbours. The optimization problem is expressed as follows:

max_E Σ_{j=1}^{N} |∪_{i=1}^{N} {(c_i\c_j) e_ij}|    (6.10)

constrained by:

Σ_{j=1}^{N} e_ij ≤ d_i,  i = 1, ..., N    (6.11)

The function g(p_i, p_j, e_ij) = p_j + {p_i(1 − p_j)}e_ij provides the proportion of pieces shared between the two peers i and j, taking into consideration whether or not they are neighbours. Defining the recursive function G_j(n) = g(p_n, G_j(n − 1), e_jn), with G_j(0) = p_j, allows the proportion of pieces shared between c_j and all c_k, k = 1, ..., N, to which it is connected to be expressed as G_j(N). After some manipulation, it can be shown that the optimization problem (6.10) is approximately equivalent to:

max_E Σ_{j=1}^{N} {G_j(N) − p_j}|C|    (6.13)

In [12], Bharambe, Herley, and Padmanabhan conduct simulations of BitTorrent in order to examine its performance and suggest some improvements. The default scenario they consider is a 100 MB file divided into 400 pieces of size 256 KB, with one initial seed having an upload bandwidth of 6000 Kb/s, and 1000 leechers immediately connecting to the torrent in the form of a flash crowd and leaving immediately upon completing their download. Each leecher has a download/upload capacity of 1500/400 Kbps, has a neighbourhood of 7 peers, and maintains 5 outgoing connections. In order to examine the scalability of BitTorrent, Bharambe et al. vary the number of leechers from 50 to 8000 and record the aggregate network uplink utilization (the percentage of the total available upload bandwidth, over all peers, used to upload data). They find that uplink utilization remains close to 100% and is mostly unaffected by the number of leechers. This illustrates that BitTorrent exhibits excellent scalability and near optimality in terms of utilizing the upload bandwidth, which is typically the bottleneck in P2P applications. Bharambe et al. also consider how the load on the initial seed varies with the number of leechers. The total number of copies the seed has to serve remains fairly constant, again indicating that the system scales well. One improvement suggested by Bharambe et al. is termed the smartseed policy. This modification entails two changes to the behaviour of seeds: uploads are never choked until a complete piece is transferred, and instead of using the Local Rarest First (LRF) policy for choosing which piece to upload, seeds choose to upload the pieces which they have previously served the least. The motivation behind the smartseed modification is to improve piece diversity in the system.
Simulation results show that this objective is indeed met, and that the upload utilization significantly improves - especially in cases where the initial seed has a low uplink capacity.

Bharambe et al. also consider the importance of the LRF policy used by leechers in determining upload utilization. Figure 6.4(a) illustrates the results for scenarios in which three parameters are varied: random piece selection versus LRF; seed uplink bandwidth (400 Kbps vs. 6000 Kbps); and the neighbourhood size of each leecher. As shown, the importance of LRF is most significant when the seed has low uplink bandwidth and it takes a significant amount of time before a new piece is made available to leechers. However, if the size of the neighbourhood is very small (i.e. d = 4 in the plot), leechers have limited local information about piece availability and therefore piece diversity on a global level is poor. At high seed uplink bandwidth, new pieces are introduced to leechers quickly enough that there is little difference between the LRF and random policies. Bharambe et al. next consider the issue of fairness in BitTorrent in a scenario with heterogeneous peer bandwidths. They propose two modifications to improve fairness. The first is called Quick Bandwidth Estimation (QBE), which assumes nodes are able to use an (unspecified) algorithm to measure the uplink bandwidth of their neighbours. Leechers use this information to unchoke those peers with the highest upload rates. The other proposal is termed Pairwise Block-Level Tit-for-Tat. Under this proposal, leechers choose whether to unchoke a given peer based on the discrepancy in the total number of pieces uploaded/downloaded rather than the instantaneous rate. Under this approach there is a parameter Δ, defined as the unfairness threshold: a node A will only upload to another node B if the number of pieces A has downloaded from B, D_AB, and the number of pieces A has uploaded to B, U_AB, satisfy U_AB − D_AB < Δ. While this approach improves fairness, it can reduce utilization. For instance, if a peer has a download deficit with every neighbour, it will cease all uploading until that situation changes. The simulation scenario for the results depicted in Figures 6.4(b) and 6.4(c) includes an equal number of leechers in each of three categories defined by the following download/upload capacities: 6/3 Mbps, 1.4/0.4 Mbps, and 0.784/0.128 Mbps. The low utilization of Pairwise TFT when peers have a small degree (size of neighbourhood) shown in Figure 6.4(b) is mostly due to cases of high bandwidth peers being grouped with only low bandwidth neighbours. In this case, the nature of Pairwise TFT dictates that the high bandwidth node only upload at a rate approximately equal to the upload rate of its low bandwidth peers. The measure of fairness presented in Figure 6.4(c) is the maximum number of copies of the file uploaded by any peer. Both proposed modifications do indeed significantly improve fairness.
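The block-level tit-for-tat rule can be written as a one-line predicate (a sketch in our own notation; the function name is ours):

```python
def will_upload(u_ab, d_ab, delta=2):
    """Pairwise Block-Level Tit-for-Tat: node A uploads to node B only
    while U_AB - D_AB < delta, the unfairness threshold."""
    return u_ab - d_ab < delta

# A has uploaded 3 pieces to B and downloaded 2: deficit 1 < 2, keep sending
assert will_upload(3, 2)
# A is 3 pieces ahead of B: stop uploading until B reciprocates
assert not will_upload(5, 2)
```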

[Figure 6.4: three panels comparing vanilla BitTorrent, Quick BW Estimation, and Pairwise TFT (Delta=2)]
(a) Average upload utilization for LRF and random piece selection strategies vs. neighbour size.
(b) Average upload utilization for regular BitTorrent and two suggested modifications.
(c) Maximum number of file copies uploaded by any peer, for BitTorrent and two suggested modifications.
Fig. 6.4 Performance of suggested BitTorrent improvements. Source: [12]

Chapter 7

BitTorrent Fairness: Analysis and Improvements

7.1 Chapter Structure

The chapter is structured as follows. Section 7.2 describes the simplified and abstracted BitTorrent model that we analyse and simulate. In Section 7.3 we identify an equilibrium state for the system when the optimistic unchoke procedure is idealized, and demonstrate that this state provides a form of fairness. Section 7.4 describes the modifications we propose to enhance fairness, and Section 7.5 analyses the simulation results. In Section 7.6 we provide an in-depth analysis of the proposed modification that provides the most promising results.

7.2 Model Description

In this section we describe the details of the BitTorrent model used in our analysis and simulations. We make a number of simplifying assumptions:

• Every peer is always able to provide any other peer a desired piece of the file.

• In all cases upload capacities rather than download capacities are the bottlenecks in data transfers.

• For each peer, its peerset - the set of other peers it is aware of and able to connect to - includes all the peers in the system.

These three assumptions imply that at any point in time a peer i is able to download from any other peer j if j wishes to upload to i. Limitations on download rates, restricted peersets and uneven piece availability serve to reduce the number of possible connections that may exist between peers and hence interfere with the (un-)choking procedure. Since we are interested in assessing the inherent fairness of the BitTorrent protocol's peer selection and (un-)choking, we do not model these constraints. In addition, we idealize the network behaviour, assuming that:

• A peer always utilizes its full upload capacity, and is always sending data to five other peers. The peer utilizes exactly one-fifth of its upload capacity to upload to each of these five peers.

• Peers are able to measure download rates with perfect accuracy.

• Peers always send at full rate, i.e., the ramp-up time of a connection is negligible.

7.2.1 Simulator Description

We implemented the above BitTorrent abstraction as a discrete-time simulator in Matlab. Each of the N simulated peers has a fixed upload rate, normalized to fall in the interval (0.05, 1] (this might correspond to the range of 50 kbps to 1 Mbps). Initially, upload rates are randomly chosen according to a uniform distribution and each peer randomly chooses the 5 peers to which it uploads. The peer selection (choking and unchoking) procedure occurs as in the BitTorrent protocol described in Section 6.1.1, with peers calculating download rates every 10 seconds. Optimistic unchoking of a random peer is performed every 30 seconds. The simulation proceeds in 1-second time steps, with each peer's initial unchoke uniformly distributed between 0 and 9 seconds from the beginning of the simulation.
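Since the Matlab code itself is not reproduced in the thesis, the following Python sketch (our own reconstruction; time is collapsed into 10-second rounds, updates are synchronous, and tie-breaking details are assumptions) illustrates the core of the abstraction: fixed upload rates split over five slots, rate-based unchoking every round, and a random optimistic unchoke every third round:

```python
import random

random.seed(1)
N, ROUNDS = 100, 60                      # 100 peers, 60 rounds of 10 s
up = [random.uniform(0.05, 1.0) for _ in range(N)]   # fixed upload rates
peers = list(range(N))
# each peer initially uploads to 5 random peers at up[i]/5 each
unchoked = [random.sample([j for j in peers if j != i], 5) for i in peers]

for t in range(ROUNDS):
    # rate peer i currently receives from each peer j
    recv = [[up[j] / 5 if i in unchoked[j] else 0.0 for j in peers]
            for i in peers]
    new_unchoked = []
    for i in peers:
        # regular unchoke: keep the four peers sending us data the fastest
        keep = sorted((j for j in peers if j != i),
                      key=lambda j: recv[i][j], reverse=True)[:4]
        others = [j for j in peers if j != i and j not in keep]
        if t % 3 == 0:                   # optimistic unchoke every 30 s
            fifth = random.choice(others)
        else:                            # otherwise retain the old fifth slot
            held = [j for j in unchoked[i] if j not in keep]
            fifth = held[0] if held else random.choice(others)
        new_unchoked.append(keep + [fifth])
    unchoked = new_unchoked

# instantaneous fairness ratio (upload/download) after the final round
down = [sum(up[j] / 5 for j in peers if i in unchoked[j]) for i in peers]
ifr = [up[i] / down[i] for i in peers if down[i] > 0]
```

The staggering of initial unchokes over 0-9 seconds and the 1-second time steps of the real simulator are omitted here for brevity.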

7.3 Theoretical Analysis

In this section we identify an equilibrium state for a system operating according to the model specified above, except that optimistic unchoking is replaced by an idealized mechanism where peers exchange truthful information about upload capabilities and establish a connection if both peers agree. Consider N peers downloading from one another. Every peer has a fixed upload capacity, and no two peers have exactly the same capacity. Every peer acts in a greedy manner to maximize its download rate. A peer can upload to five different peers at one time, sending data to each at one fifth of its upload capacity. All connections are bidirectional: peer i uploads to peer j if and only if peer j uploads to peer i. To initiate a new connection, peer i sends out a request to peer j specifying the upload rate it can provide. Peer j responds with its offered upload rate. A connection is established only if both peers agree. At any point, either peer may close the connection. The following proposition identifies the equilibrium state and is useful for quantifying the performance of our suggested improvements to the BitTorrent protocol; we will utilize it in Section 7.5. The corollary follows directly from the proof of the proposition.

Proposition 1. The system outlined above achieves an equilibrium point where peers form into ⌊N/6⌋ disjoint groups of six and one group comprising the remaining peers. Group members upload to and download from each of the other members. Once this unique set of groups is established, no pair of peers will agree to form a new connection. Each peer is downloading at its maximum rate according to the system rules.

Corollary 1. The equilibrium point achieves a form of fairness: the download rate of a peer i cannot be increased without decreasing the download rate of a peer j with higher upload bandwidth.

Proof. Order the peers from 1 to N in ascending order of their upload capacity. The peer with the highest upload capacity will henceforth be referred to as peer N, the one with the second highest upload capacity as peer N − 1, etc. Consider peer N. It achieves its highest download rate when it is downloading from peers N − 1, N − 2, ..., N − 5. Thus, if peer N establishes a connection with each one of these peers, it will not agree to any subsequent connection requests. Since N offers the highest upload rate in the entire network, none of the five peers connected to it will drop the connection due to any new requests. Next, peer N − 1 will achieve the highest download rate if it is downloading from N, N − 2, ..., N − 5, and thus it will not agree to any new connections once it has these five established. This argument continues up to and including peer N − 5.

Now consider peer N − 6: if the first group is formed, it is unable to "convince" any of the five higher ranked peers to form a connection. This means the highest download rate it can achieve is by establishing a connection with the five peers below it in ranking: N − 7, ..., N − 11. These peers, in turn, are also unable to join the first group and thus maximize their download rate if they form connections among each other. This argument can be continued inductively to all other peers, except peers 1, 2, ..., K, where K = N mod 6. These K lowest ranked peers must form a group among each other, and each will only have K − 1 outgoing/incoming connections. □
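The equilibrium is easy to construct directly: sort the peers by upload capacity and, starting from the top, cut the ordering into groups of six. A small sketch of our own (the capacity values are arbitrary but distinct, as the proposition requires):

```python
def equilibrium_groups(capacities):
    """Groups of Proposition 1: peers sorted by descending upload capacity,
    chunked into floor(N/6) groups of six plus one remainder group."""
    order = sorted(range(len(capacities)),
                   key=lambda i: capacities[i], reverse=True)
    return [order[i:i + 6] for i in range(0, len(order), 6)]

caps = [0.05 + 0.0095 * i for i in range(100)]   # 100 distinct capacities
groups = equilibrium_groups(caps)

# every member of a full group downloads from its five group-mates,
# so its download rate is the sum of their capacities divided by five
top = groups[0]
rate_top_peer = sum(caps[j] for j in top[1:]) / 5
```

With 100 peers this yields 16 disjoint groups of six plus one remainder group of K = 100 mod 6 = 4 peers, each of which maintains only K − 1 = 3 connections.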

7.4 Proposed BitTorrent Modifications

In this section we propose three approaches for improving BitTorrent fairness. We treat each modification separately, as they cannot be combined. We define the Instantaneous Fairness Ratio (IFR) for an individual peer as the ratio of data uploaded to data downloaded during the last 10 seconds. Therefore, an IFR less than 1 indicates a peer is downloading an excessive amount (relative to perfect fairness), and an IFR greater than 1 indicates a peer is downloading an insufficient amount.

7.4.1 Conditional Optimistic Unchoke

The Conditional Optimistic Unchoke modification represents a minor change to the BitTorrent protocol. A peer performs an optimistic unchoke only if its IFR is greater than 1. Essentially, peers operate in a more cautious manner: if a peer has an IFR less than 1, it is already downloading more than its fair share of data. Choking an outgoing connection is likely to change the set of peers from which it is downloading, and hence the peer risks eliminating or reducing its download surplus. Peers do not take this risk, thereby also forgoing some opportunities to potentially further reduce their IFR.

7.4.2 Multiple Connection Chokes

The Multiple Connection Chokes modification allows peers to choke/unchoke multiple connections each round. A peer calculates the Connection Fairness for each of the five peers to which it is uploading. This is simply the ratio of the peer's upload rate to a specific peer to the download rate from that peer. If the other peer is not sending any data, the Connection Fairness is defined as infinity. There are two parameters in the modification: the Threshold Ratio, which is the largest value a Connection Fairness can assume before the corresponding upload may be choked, and the Maximum Chokes (MC), which is the largest number of uploads a peer can choke per round. It initially appears tempting to set the Threshold Ratio to 1. However, unless two peers have exactly the same upload capacity, one of them will always see a Connection Fairness greater than 1. Thus, if the Threshold Ratio is not greater than 1, few connections persist. If during a given round the number of Connection Fairness values exceeding the Threshold Ratio is less than or equal to MC, the peer chokes all the unfair connections. Otherwise, it chokes only MC of them, chosen at random. For every choked connection, the peer considers the set of other peers currently uploading to it, to which it is not uploading in return. If it finds one that is uploading at a rate higher than the peer it just choked, it will unchoke it. Otherwise, it performs an optimistic unchoke.
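One round of the choke decision for a single peer can be sketched as follows (our own rendering; the function and variable names are ours):

```python
import random

def connections_to_choke(up_rates, down_rates, threshold=1.1, max_chokes=3):
    """Return indices of unchoked connections this peer will choke.
    Connection Fairness = rate sent to a peer / rate received from it
    (infinity if the other peer sends nothing)."""
    fairness = [u / d if d > 0 else float("inf")
                for u, d in zip(up_rates, down_rates)]
    unfair = [i for i, f in enumerate(fairness) if f > threshold]
    if len(unfair) <= max_chokes:
        return unfair                            # choke every unfair one
    return random.sample(unfair, max_chokes)     # otherwise MC at random

# peer uploads 0.2 to each of five peers; connections 0 and 3 are unfair
chokes = connections_to_choke([0.2] * 5, [0.10, 0.25, 0.20, 0.0, 0.19])
```

In the example, connection 0 has fairness 2.0 and connection 3 has fairness infinity, so both exceed the 1.1 threshold and are choked; connection 4 (fairness ≈ 1.05) survives.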

7.4.3 Variable Number of Outgoing Connections

This modification, denoted VOC, is a more significant departure from the BitTorrent protocol. Instead of all peers having a fixed number of outgoing connections, the number of connections a peer attempts to maintain depends on its upload capacity. A simple approach is to set the upload rate for each connection to the same value for all peers, fixing it at some rate η. Therefore, if a peer has an upload capacity of r_c, it establishes k = ⌊r_c/η⌋ connections. However, with this approach a peer wastes r_c mod η of its capacity. Thus, a better choice is to have any given peer upload on each connection at a rate of η + (r_c mod η)/k. This means there will be some variability in the upload rates of different peers, but each rate is assured to be at least η. The basic idea behind this approach is that any pair of peers can establish a connection between one another in which the individual upload rates are nearly identical, irrespective of the discrepancy between peer upload capacities. For example, a high capacity peer might establish connections to twenty low capacity peers, and exchange data with each in a fair manner, whereas a low capacity peer might only maintain two connections. A pair of peers is allowed to have multiple connections between each other. This is particularly important for enabling pairs of high capacity peers to transmit data to one another at high rates. We propose that each peer evaluate its set of outgoing and incoming connections every

10 seconds. At each iteration, it makes a list L_nd of peers to which it is currently uploading, but from which it is not receiving any data. It immediately chokes all of these peers. Next, it makes a list L_nu of peers from which it is downloading, but to which it is not uploading.

If |L_nu| ≥ |L_nd|, it begins uploading to a random set of |L_nd| peers in L_nu. If |L_nu| < |L_nd|, it begins uploading to all peers in L_nu, and optimistically unchokes |L_nd| − |L_nu| additional peers chosen at random.
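The VOC rate assignment can be sketched in a few lines (our own helper; η = 0.025 as used in Section 7.5, and the peer capacities in the example are arbitrary):

```python
def voc_connections(rc, eta=0.025):
    """Number of outgoing connections and per-connection upload rate under
    VOC: k = floor(rc/eta) connections at eta + (rc mod eta)/k each, so no
    capacity is wasted and every rate is at least eta. Assumes rc >= eta."""
    k = int(rc // eta)
    rate = eta + (rc % eta) / k
    return k, rate

k_hi, rate_hi = voc_connections(0.51)   # high capacity peer: 20 connections
k_lo, rate_lo = voc_connections(0.06)   # low capacity peer: 2 connections
```

Note that k · rate equals r_c exactly, so the residual capacity r_c mod η that a fixed per-connection rate would waste is instead spread evenly across the k connections.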

7.5 Results

In this section we present the results generated via our simulator. In all cases, we consider a network with N = 100 peers over a 1-hour interval. For the Multiple Connection Chokes modification we set the Threshold Ratio to 1.1, and MC to 3. We determined experimentally that these values appear to provide the best performance (although the results for Threshold Ratios in the range 1.1-1.3 and MC values of 2-3 are similar). We do not claim that these two values are always the optimal choice, which probably depends on the distribution of peers' upload capacities. For the VOC modification, we set η to a normalized upload rate of 0.025, as this ensures that, with the chosen upload capacity distribution, each peer will have at least 2 outgoing connections. Theoretically, using an extremely small value of η produces the best fairness because it results in negligible differences between different peers' upload rates. However, there is overhead associated with each connection and it is impractical to set η to an excessively small value. We define the Time-Averaged Fairness Ratio (TAFR) for a particular peer as the ratio of data uploaded to data downloaded, averaged over the entire hour. We also introduce the Average Ranking Difference (ARD): peers are ranked from lowest to highest upload capacity, and the Ranking Difference (RD) for any current connection is the absolute value of the difference between the rank of the uploading peer and that of the downloading peer. The ARD at any point in time is then defined as the average RD of the 500 current upload sessions. It is easy to verify that the ARD of the equilibrium state described in Section 7.3 is 31 (if one ignores the peers in the lowest ranked group). Thus, we assert that the difference between the steady-state ARD of a scenario and the theoretical lower limit of approximately 4 gives a good indication as to how close to the "ideal" case the current set of peer connections is.
Figure 7.1 presents the empirical Cumulative Distribution Function of the TAFR for

regular BitTorrent and the three proposed modifications.

[Figure 7.1]
Fig. 7.1 Empirical cumulative distribution function of average fairness ratio. The solid curve is for regular BitTorrent, the alternating dashed and dotted curve represents BitTorrent with Conditional Optimistic Unchoke, the dashed line is for BitTorrent with Multiple Connection Chokes, and the line with circle-markers corresponds to BitTorrent with VOC.

Figure 7.2 includes scatterplots of the TAFR versus upload capacity for the 100 peers for the four cases. For regular BitTorrent, peers with low upload capacities tend to download disproportionately more data than they provide to other peers. This is attributable to the BitTorrent optimistic unchoke mechanism: probabilistically, most peers that randomly choose to upload to a low capacity peer will have a higher upload capacity. Although these peers will typically choke this new upload session quickly, after determining that the low capacity peer cannot offer a comparable upload rate in return, the BitTorrent protocol ensures that data is transferred for at least 10 seconds. Figure 7.2 illustrates that low bandwidth peers are randomly chosen by other peers at a high enough average rate to enable them to download more data than they upload. Conversely, high capacity peers tend to upload more than they download. Again, this can be attributed to the optimistic unchoke mechanism: when a high capacity peer chooses another peer at random, the majority of the time this will be a peer with a significantly lower upload rate. The Conditional Optimistic Unchoke modification introduces a marginal improvement in the TAFR distribution, as is best illustrated by Figure 7.1. The Multiple Connection Chokes modification significantly reduces the number of peers with a TAFR of less than

[Figure 7.2: four scatterplots]
(a) Regular BitTorrent. (b) BitTorrent with Conditional Optimistic Unchoke. (c) BitTorrent with Multiple Connection Chokes. (d) BitTorrent with VOC.
Fig. 7.2 Scatterplots of Average Fairness Ratio versus upload capacity.

0.85, indicating that this modification reduces the unfair advantage that peers with low upload capacities enjoy under regular BitTorrent. This is because the proposed modification allows a high capacity peer to terminate a connection to a low capacity peer earlier. Finally, the VOC modification provides excellent fairness: approximately 90% of peers have a TAFR between 0.95 and 1.05. Figure 7.3 shows the average Instantaneous Fairness Ratio averaged over all the peers in the network. The two curves correspond to peers with an IFR of less than 1 and more than 1. For regular BitTorrent, there is a slight trend toward improvement for approximately the first 600 seconds, at which point the system appears to fluctuate about a steady state. The Conditional Optimistic Unchoke modification displays improvement for approximately 1000 seconds, and at a higher rate. The Multiple Connection Choke modification continues to show an improvement for about 1200 seconds, and achieves even better fairness. Finally, with VOC the system rapidly converges and shows the best steady-state fairness. We note that in steady state, some of the upper IFR curves take on larger values than the maximum TAFR any peer attains in Figure 7.2. The reason for this is that the IFR of any given peer may vary significantly over time: a peer included in the upper IFR curve at a certain point in time may quickly lower its IFR and be included in the lower IFR curve shortly thereafter. Figure 7.4 illustrates the Average Ranking Differences. We note that the relative steady-state ARD rankings mirror those of the three protocols' IFR and TAFR. Furthermore, the amount of time during which the ARD decreases for each case corresponds approximately to the duration during which the IFR improves. This provides evidence that ARD is indeed a relevant measure of performance.

7.6 VOC Analysis

As shown in the preceding section, the proposed VOC algorithm provides the most promising results. Thus, in this section we provide some theoretical analysis of VOC. We make one additional simplification: we ignore the small control-overhead term in the upload rate of each peer and assume that all peers upload on each connection at exactly the rate $r$. Since $r$ will typically be much larger than the overhead term, this simplification does not significantly impact the accuracy of the analysis.

[Figure 7.3: four panels versus Time (Seconds) — (a) Regular BitTorrent, (b) BitTorrent with Conditional Optimistic Unchoke, (c) BitTorrent with Multiple Connection Choke, (d) BitTorrent with VOC.]

Fig. 7.3 Average Instantaneous Fairness Ratio versus time, over all peers. For each plot the upper curve is the average IFR over all peers with an IFR greater than 1, and the lower curve is the average IFR for peers with an IFR less than 1.

Fig. 7.4 Average Ranking Difference (ARD) vs. time. From top to bottom, the three curves are for regular BitTorrent, BitTorrent with Conditional Optimistic Unchoke, and BitTorrent with Multiple Connection Choke. The ARD for BitTorrent with VOC is not shown, as it is not a relevant measure of performance for this case.

7.6.1 New Peer Joining the Network: Time spent Free-Riding

If a peer has sufficient upload capacity to maintain $k$ outgoing connections, and at a certain point also has $k$ incoming connections, it actually has no incentive to perform an optimistic unchoke. However, in order for new peers joining the system to have the opportunity to bootstrap themselves by downloading at least one complete piece, it is necessary that peers do continue to occasionally perform optimistic unchokes. Let $T_u$ be the time between optimistic unchokes. Assume peers unchoke only one of their outgoing connections at a time, and that the peer to be unchoked is chosen at random. Consider a network with $N$ peers in equilibrium: each peer has as many outgoing connections as incoming ones. Now consider a new peer with no complete piece joining the system. In order to bootstrap itself, it will request to download the most prevalent piece in the network. Assume that $M < N$ peers have this piece. Each one of these peers unchokes the new one with probability $\frac{1}{N}$ every $T_u$ seconds. Once an existing peer unchokes the new one, it will continue to upload at a rate $r$ for $T_u$ seconds. At this point it will perform the tit-for-tat analysis, recognize that the new peer is not uploading in return, and choke the

connection. Thus, the new peer is unchoked by an existing peer offering the most prevalent

piece an average of $\frac{M}{N}$ times every $T_u$ seconds, each time for a duration of $T_u$ seconds. Thus, the average rate at which it downloads data is $\frac{M}{N}r$. A new peer will continue to free-ride until it has downloaded the complete piece. If the piece is of size $S$, the expected time for this is $\frac{SN}{Mr}$.
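The expected bootstrap time derived above can be sanity-checked with a small Monte Carlo sketch. This is not the thesis simulator: it implements only the assumptions of the derivation (each of the $M$ piece holders unchokes the new peer with probability $1/N$ every $T_u$ seconds, each upload lasting $T_u$ seconds), and all parameter values are illustrative.

```python
import random

def bootstrap_time(N=200, M=50, T_u=10.0, r=1.0, S=250.0):
    """Seconds until a free-riding new peer completes one piece of size S."""
    downloaded, t = 0.0, 0.0
    while downloaded < S:
        # how many of the M piece holders optimistically unchoke the new peer
        unchokers = sum(1 for _ in range(M) if random.random() < 1.0 / N)
        downloaded += unchokers * r * T_u   # each unchoke uploads for T_u seconds
        t += T_u
    return t

random.seed(1)
trials = [bootstrap_time() for _ in range(500)]
empirical = sum(trials) / len(trials)
predicted = 250.0 * 200 / (50 * 1.0)        # S*N/(M*r) = 1000 seconds
```

With these parameters the empirical mean lands close to the predicted $SN/(Mr)$, with a small upward bias caused by the discreteness of the $T_u$-second rounds.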

7.6.2 New Peer Joining the Network: Time to Achieve An Equal Number of Incoming and Outgoing Connections

Consider the new peer after it has ceased free-riding and is ready to establish bidirectional connections with any other peer in the system. Let the number of outgoing connections that the new peer can maintain be $c$. The new peer has to unchoke, and be unchoked by, $c$ other peers (we assume that all peers respond positively to unchokes; if all their outgoing connections are utilized, they will drop a random one in order to upload to the peer that unchoked them). The new peer unchokes other peers at a rate of one every $T_u$ seconds, and is unchoked by any given existing peer with probability $\frac{1}{N+1}$ every $T_u$ seconds. Thus, the average rate at which the new peer establishes new connections is $\frac{2N+1}{(N+1)T_u} \approx \frac{2}{T_u}$ (if $N$ is reasonably large). Thus, the new peer will achieve full utilization and perfect fairness after an expected time of approximately $\frac{cT_u}{2}$.
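The $\approx 2/T_u$ establishment rate can likewise be checked by simulation. Again this is a sketch of the derivation's assumptions, not of the full protocol, and the parameter values are illustrative.

```python
import random

def time_to_full_connections(N=200, c=40, T_u=10.0):
    """Seconds until the new peer holds c bidirectional connections."""
    conns, t = 0, 0.0
    while conns < c:
        new = 1                             # the new peer's own unchoke
        # each of the N existing peers unchokes it w.p. 1/(N+1) this round
        new += sum(1 for _ in range(N) if random.random() < 1.0 / (N + 1))
        conns += new
        t += T_u
    return t

random.seed(7)
avg = sum(time_to_full_connections() for _ in range(1000)) / 1000
predicted = 40 * 10.0 / 2                   # c*T_u/2 = 200 seconds
```

The simulated mean comes out slightly above $cT_u/2$ because the last round typically overshoots $c$, but it agrees with the approximation to within a few percent.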

7.6.3 Achieving Perfect Fairness

We consider a network without any newly joined peers, and assume the following two conditions are met:

1. The sum of available outgoing connections (each of rate $r$) over all peers is an even number.

2. If the peers are ranked from largest to smallest number of available outgoing connections, the number of connections offered by the first-ranked peer is less than or equal to the sum of connections offered by all other peers.

We consider $n$ peers, denoted $p_1, p_2, \ldots, p_n$, and define $C(\cdot)$ to be a function that returns the maximum number of outgoing connections supported by a peer; i.e., $C(p_i)$ is the number of outgoing connections supported by peer $p_i$. Let $\eta = \arg\max_i C(p_i)$ be the index of the peer with the most outgoing connections.

Proposition 2. Under the above assumptions and definitions, if $\sum_{i \neq \eta} C(p_i) \geq C(p_\eta)$ and $\mathrm{mod}\left(\sum_i C(p_i), 2\right) = 0$, it is possible for every peer to have all its supported connections utilized bidirectionally, and hence for the network to achieve perfect fairness and utilization. This means each peer is uploading on all its outgoing connections, and downloading from the same number of incoming connections. Note: a bidirectional connection uses one outgoing connection from each of the two peers involved.

Proof. Proof outline: This is a proof by induction. Starting with a set of $n$ peers meeting the above conditions, we will show that it is possible for peer $p_\eta$ to utilize all its connections so that the remaining $n-1$ (or fewer) peers which still have free connections available form a set which again meets the two conditions specified above. If we can show this, the proof of perfect utilization and perfect fairness follows: the original set of $n$ peers is reduced to a new set of $n-1$ peers, since $p_\eta$ has utilized all its connections. This new set has a new highest-ranked peer, $p_{\eta'}$, which again is able to utilize all its connections to create a new set of $n-2$ peers. This cycle continues until the last set contains only two peers. Since it still holds that the highest-ranked peer has as many or fewer free connections as all other peers (i.e., the one other peer), it must be the case that the two peers have the same number of remaining outgoing connections, and thus can also achieve perfect utilization and fairness.

Part 1: Even number of remaining connections

Here we show, trivially, that once $p_\eta$ has used up all its connections, the total number of connections summed over all remaining peers is still an even number. Let $\sum_i C(p_i) = \Psi$. After $p_\eta$ establishes its $C(p_\eta)$ bidirectional connections, $\Delta = \Psi - 2C(p_\eta)$ total connections remain. Since $\Psi$ is even by assumption, this difference must also be even.

Part 2: Connection inequality

Here we show that it is possible for $p_\eta$ to choose its connections in such a way that the peer with the most remaining free connections has fewer than the sum of free connections over all other remaining peers. In order for this condition to be true, no peer may have more than $\Delta/2$ of the $\Delta$ total remaining connections. We will begin by considering all peers (except $p_\eta$) with more than $\Delta/2$ available connections prior to $p_\eta$ choosing any connections. We define $k \leq n-1$ to be the number of peers meeting this criterion. We express the number of connections that each one of these $k$ peers can support in the form $a_i + \Delta/2$, $i = 1, \ldots, k$, $a_i \geq 1$ (the order of these peers is irrelevant). If we can show it is possible for $p_\eta$ to establish at least $a_i$ bidirectional connections with each peer $i$ in this group, we have completed the second part of the proof. We will do this via a contradiction.

Case 1: Let $k = 1$. Assume $p_\eta$ does not have enough free connections to establish $a_1$ connections with that one peer. This implies that $p_\eta$ has fewer maximum connections than that peer, which leads to a contradiction since, by assumption, $p_\eta$ has as many or more connections available as any other peer.

Case 2: $k > 1$. Assume that $p_\eta$ has attempted to assign all its connections to these $k$ peers in an attempt to limit each peer's number of outgoing connections to $\Delta/2$ or less, but has failed because it has too few outgoing connections. This would mean that $\sum_{i=1}^{k} a_i > C(p_\eta)$, and hence these peers have at least $k\Delta/2 + 1$ total free connections still available. Since $k > 1$, this means there are at least $\Delta + 1$ free connections available after $p_\eta$ assigned all its connections. However, this contradicts the assumption that there are only $\Delta$ remaining free connections. □

Corollary 2. If an existing network has achieved perfect fairness and utilization, and a new peer joins with an even number of available outgoing connections, it is always possible for the new peer to become a member of the network and ultimately have the network return to its state of perfect fairness and utilization.
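The inductive argument in Proposition 2 is constructive, and a simple greedy variant of it can be sketched in code: repeatedly connect the two peers with the most free connections. Under the proposition's two conditions, every connection slot is consumed, i.e. perfect utilization is reached. (This sketch permits repeated connections between the same pair of peers, a simplification beyond what the proof itself requires.)

```python
import heapq

def realize_connections(counts):
    """Greedily pair the two peers with the most free outgoing connections."""
    heap = [(-c, i) for i, c in enumerate(counts) if c > 0]  # max-heap
    heapq.heapify(heap)
    edges = []
    while len(heap) >= 2:
        c_i, i = heapq.heappop(heap)       # most free connections
        c_j, j = heapq.heappop(heap)       # second most
        edges.append((i, j))               # one bidirectional connection
        if c_i + 1 < 0:
            heapq.heappush(heap, (c_i + 1, i))
        if c_j + 1 < 0:
            heapq.heappush(heap, (c_j + 1, j))
    return edges if not heap else None     # None: a peer was left stranded

# Even sum (12) and the largest count (4) <= the sum of the rest (8):
edges = realize_connections([4, 3, 3, 2])  # all 12 connection slots get used
```

If either condition is violated (e.g. counts [5, 1], where one peer exceeds the sum of the rest), a peer is left with unmatched connections and the function returns None.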

7.7 Summary

In the preceding section we showed that under our simplified model, which does not place any limitations on which peer may exchange useful data with any other peer, the proposed Variable Number of Outgoing Connections modification allows all peers in the network to achieve perfect fairness by maintaining an equal number of outgoing and incoming connections, all of the same bandwidth. We also provided an analysis of the behaviour of a new peer joining a network utilizing VOC. Earlier in the chapter we examined the fairness of the standard BitTorrent protocol. Using our simplified model, we showed that if peers have perfect global information, they will form into disjoint groups of six. Each will exchange data with every other peer in the group, and have no incentive to leave the group. We also proposed two other changes to BT, and provided simulation results showing that all three modifications do increase fairness in our model.


Chapter 8

Conclusion

In this thesis we have considered the impact of three factors on the performance of certain overlay networks. We investigated congestion measurement through the use of packet marking to encode path prices. We also examined the problem of virus propagation and pollution in P2P networks, and the fairness of BitTorrent. We identified ways to measure the impact of these factors, modeled them, and presented novel techniques to alter the impact of these factors to increase network performance.

8.1 Congestion Price Estimation

The first factor, how accurately and quickly the congestion of the links of an underlying network is measured, affects the performance of a broad range of overlay networks. We specified a novel packet marking algorithm that allows a host to deduce the sum of the prices over the links traversed to a client. Two probabilistic marking techniques have been proposed previously, but our algorithm is, we believe, the first deterministic marking algorithm. We introduced the idea that each packet is a probe which encodes part of the price of one of the links in the path. By reading a packet's IPid field to uniquely identify the probe type, and the TTL field to identify the LinkID, every router is able to determine whether it should modify the ECN fields based on the quantized price of its outgoing link. Based on empirical data, the sequential manner in which the majority of Internet hosts increment the IPid of transmitted packets is conducive to observing all probe types in a relatively short block of packets with high probability. This is vital to the performance of our algorithm, because the chief source of error in estimating the path price is

failing to observe one or more probe types in an estimation block. Quantization, the other and generally less critical source of error, is independent of the block length. We presented an analysis of these error sources, their underlying probability distributions, and an upper bound on the total mean-squared error. Our simulation results indicated that our algorithm performs better than RAM up to certain block lengths in estimating static prices. Since the levels of congestion in networks tend to vary dynamically, the most significant feature of our deterministic algorithm is its performance in estimating time-varying prices. In all the scenarios considered, our algorithm exhibited a lower mean-squared error than RAM and had a greater proportion of estimates falling within any given error bound up to 30%. The improvement over RAM was especially pronounced when using Internet trace-based IPid behaviour. We also presented a novel algorithm allowing a source in a TCP/IP network to determine the maximum level of congestion of any link along the path to the receiver, and examined the problem of choosing an optimal number of quantization bits to minimize the mean-squared estimation error of the algorithm.

8.2 Modeling P2P Viruses and Pollution

The second factor we examined is related to file-sharing P2P networks. Viruses designed to propagate via popular networks such as Kazaa and Gnutella have a detrimental effect on the efficacy of these networks. The same holds true for the proliferation of pollution: the spreading of mislabeled or corrupt files. We presented two deterministic epidemiological models of how a virus spreads infection in a P2P network. One models the spread of the type of virus which has been observed in numerous incarnations in P2P networks: when a file infected with the virus is downloaded and executed, it creates multiple new files containing the virus. We used an S-E-I model to represent this form of P2P virus, presenting differential equations for how the numbers of Susceptible, Exposed, and Infected peers change with time. The rate at which the number of infected files in the network changes depends on particular model assumptions, and we provided equations for 8 different cases. We also derived expressions for the steady-state behaviour in the case where the probability of a peer downloading an infected file is proportional to the prevalence of infection. The other model presented is an S-E-I-R one, which was used to examine a particularly malicious virus which infects all the executable files a peer is sharing. We included several model extensions, including the representation of individual file types. We also presented an epidemiological model for the evolution of pollution in a P2P network. Finally, we modeled the impact that object reputation systems have on both the level of pollution and the propagation of viruses. Our models are all deterministic, assuming a homogeneous P2P population. In order to address the impact of this simplification, we included discrete-time simulations with varying individual peer behaviour. These simulations indicated that the models are sufficiently accurate to provide insight into system dynamics despite being based on average behaviour. We included data from a measurement study of the eDonkey network, examining peer download behaviour, as well as the distribution of shared files and how quickly files are added to and removed from the network. Finally, we provided the results of extensive simulations using our models in order to examine the effect of various model parameters, as well as the effect of object reputation systems.

8.3 Improving BitTorrent Fairness

We considered the popular BitTorrent P2P protocol and presented a simplified model useful for studying BT's inherent fairness. We included a lemma about a stable equilibrium point of this system. Next, we specified three modifications intended to improve the fairness of BT. We provided simulation results indicating that all three provide some level of improvement. The rankings, in order of increasing improvement to fairness, are Conditional Optimistic Unchoke, Multiple Connection Choke, and Variable Number of Outgoing Connections. This order also corresponds to how radically each proposal modifies the BT protocol, and thus likely the degree of difficulty of practical implementation. We considered the VOC algorithm and provided detailed analysis to deduce the expected behaviour of a new peer joining a network. We also proved that, under certain assumptions, a BT network utilizing VOC can achieve essentially perfect fairness.

Appendix A

A.1 Identified P2P Viruses

Table A.1 summarizes the available information about known P2P viruses.

A.2 Rates at Which Number of Infected Files Changes

This section considers the 8 cases mentioned in Section 5.2.1.

1) All downloads executed, no additional downloads before execution, c unique file names

In this case only Susceptible peers can download infected files: Exposed peers do not download any additional files before becoming Infected, and Infected peers are sharing all $c$ possible infected files. Thus, the rate of change due to downloads is $S(t)\lambda_S h(t)$. An Exposed peer always has one infected file before becoming Infected, meaning that in all cases $c-1$ new infected files are created when an Exposed peer becomes Infected. The rate of change is thus $E(t)\lambda_E(c-1)$. An Infected peer will always share $c$ infected files, so a recovery results in a reduction of $c$ infected files. The rate is therefore $-I(t)\lambda_R c$. The overall rate of change of $K$ is therefore:

$\frac{dK(t)}{dt} = S(t)\lambda_S h(t) + E(t)\lambda_E (c-1) - I(t)\lambda_R c$   (A.1)
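Equation (A.1) can be explored numerically with a minimal Euler-integration sketch. The S/E/I transition rates and the closure $h(t) = K(t)/F$ (a fixed pool of $F$ files, with the infected fraction acting as the download-infection probability) are assumptions made here for illustration; the thesis defines the actual population dynamics in Chapter 5, and all parameter values below are invented.

```python
# Assumed dynamics (illustrative): S->E at rate lam_S*h(t)*S,
# E->I at rate lam_E*E, I->S at rate lam_R*I, with h(t) = K(t)/F.
lam_S, lam_E, lam_R = 0.10, 0.05, 0.02   # download / execution / recovery rates
c, F, dt = 4, 10_000.0, 0.1              # c file names, file pool, Euler step

S, E, I, K = 990.0, 5.0, 5.0, 50.0       # 1000 peers, 50 infected files
for _ in range(2_000):                   # integrate 200 time units
    h = K / F
    new_exposed = lam_S * h * S          # S -> E: downloaded an infected file
    new_infected = lam_E * E             # E -> I: executed it
    recovered = lam_R * I                # I -> S: cleaned up
    dK = S * lam_S * h + E * lam_E * (c - 1) - I * lam_R * c   # (A.1)
    S += dt * (recovered - new_exposed)
    E += dt * (new_exposed - new_infected)
    I += dt * (new_infected - recovered)
    K = max(0.0, K + dt * dK)
```

Note that when the three population flows balance ($\lambda_S h S = \lambda_E E = \lambda_R I$), the right-hand side of (A.1) is exactly zero, so $K$ settles once the compartments reach equilibrium.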

2) All downloads executed, no additional downloads before execution, possible file names $\gg c$

Susceptible and Infected peers can both download infected files. Thus, the rate of change


Name | First Detected | P2P Network(s) | Size | Language
Achar | 02/13/03 | Kazaa | 8 Kb | Assembly
Alcan | 05/12/05 | Kazaa, eMule, Limewire, Gnucleus, Shareaza | Various |
Aozo | 01/09/04 | Kazaa | |
Bare | 08/28/02 | Kazaa, Morpheus, Bearshare, eDonkey2000 | 7.6 Kb | Visual Basic
Benjamin | 05/20/03 | Kazaa | 216 Kb | Borland Delphi
Bodiru | 12/08/03 | Kazaa, eDonkey2000 | 65 Kb | Visual Basic
Caspid | 11/09/03 | Bearshare, eDonkey2000, FileNavigator, Gnucleus, iMesh, Kazaa, Morpheus, LimeWire, Overnet | 39 Kb | Visual Basic
Darby | 08/29/03 | Bearshare, Kazaa, eDonkey2000, Filetopia, FileNavigator, Gnucleus, Grokster, iMesh, Morpheus, LimeWire, Soulseek | 108 Kb | Visual Basic
Duload.b | 06/22/02 | Kazaa | 7.6 Kb | Visual Basic
Duload.a | 10/31/02 | Kazaa | 18 Kb | Visual Basic
Fizzer | 11/12/02 | Kazaa | |
Franvir | 09/20/05 | Kazaa | 1274 Kb |
Gotorm | 08/01/03 | Kazaa | 192 Kb | Visual C
Harex.a | 08/29/03 | Kazaa, iMesh | 15 Kb |
Harex.b / Genky | 08/29/03 | Kazaa, iMesh | 4 Kb |
Harex.c / Exebat | 04/04 | Kazaa, iMesh | 2 Kb |
Hofox | 01/14/04 | Various | 49 Kb | Visual Basic
Igloo | 02/13/03 | Kazaa | |
Irkaz | 09/24/03 | Kazaa | 6 Kb | C
Kazmor | 06/23/02 | Kazaa, Morpheus | 52 Kb | Borland Delphi
Krepper | 06/15/04 | Altnet, Kazaa, Morpheus, iMesh, eDonkey2000, LimeWire | 17 Kb | C
Kwbot | 04/30/03 | Kazaa, iMesh | 245 Kb |
Lolol | 12/20/02 | Kazaa | 60 Kb | Visual C++
Mandragore / Gnuman | 02/23/01 | Gnutella | 8 Kb |
Mareta | 01/15/04 | Kazaa | 42 Kb | Delphi
Nocano | 08/01/03 | KMD, Kazaa, Morpheus, Grokster, BearShare, eDonkey2000, Limewire | | Visual Basic
Relmony | 08/26/02 | Kazaa | 29 Kb | Visual Basic
Scranor | 10/19/04 | Kazaa, iMesh | 12 Kb |
SdDrop | 10/12/04 | Kazaa, iMesh | 25 Kb |
Slideshow | 12/04/03 | Kazaa | | Visual Basic
Spear | 08/29/02 | Kazaa, Morpheus, BearShare, eDonkey2000 | 40-70 Kb | Delphi
Spybot | 04/16/03 | Kazaa | 16-146 Kb | Visual C++
Stewon | 08/02/04 | Kazaa, Overnet, DC++ | |
Surnova | 07/19/02 | Kazaa | 40 Kb | Visual Basic
Tanked | 05/27/02 | Kazaa | 100 Kb | Visual C++
Tibick | 01/17/05 | Kazaa, Morpheus, iMesh, eMule, wareo, DC++ | 14 Kb |
Togod | 11/12/02 | Kazaa | 100 Kb | Delphi
Vb.bh / Vb.ar / SillyP2P | 07/29/04 | Kazaa | 32 Kb | Visual Basic
Vb.dg | 07/15/05 | Kazaa, Morpheus, Xolox, Gnucleus, eDonkey2000, Limewire, WinMX, BearShare | 260 | Visual Basic
Zafi | 12/14/04 | All | 12 Kb |

Table A.1 Summary of known P2P viruses. Sources: [45-52]

due to downloads is $(S(t) + I(t))\lambda_S h(t)$. When an Exposed peer becomes Infected, $c$ new infected files are created. The rate of change due to infections is thus $E(t)\lambda_E c$. Infected peers can continue to execute newly downloaded infected files. Since the number of possible file names is $\gg c$, we assume that each infected file executed generates $c$ new, additional infected files. The rate of change due to Infected peers' executions is thus $I(t)\lambda_E c$. Prior to recovery, an Infected peer will have the $c$ infected files created during the transition from Exposed to Infected, the one infected file downloaded while Susceptible, and the additional infected files downloaded and created due to execution while Infected. The expected time spent in the Infected phase is $\frac{1}{\lambda_R}$, so the expected number of infected files downloaded and generated during this period is $(c+1)\int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau$. We note that the preceding integral may be approximated by $\frac{\lambda_S h(t)}{\lambda_R}$; the accuracy of this approximation depends on how much the function $h(\tau)$ fluctuates over the range from $t - \frac{1}{\lambda_R}$ to $t$. Thus, the rate of change of $K$ is:

$\frac{dK(t)}{dt} = \{S(t) + I(t)\}\lambda_S h(t) + \{E(t) + I(t)\}\lambda_E c - I(t)\lambda_R (c+1)\left(1 + \int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau\right)$   (A.2)

3) All downloads executed, additional downloads before execution possible, c unique file names

Susceptible and Exposed peers can both download infected files, but Infected peers cannot, since they already have all $c$ possible infected files. Thus, the rate of change due to downloads is $\{S(t) + E(t)\}\lambda_S h(t)$. The expected time from when a peer becomes Exposed until it becomes Infected is $\frac{1}{\lambda_E}$. Thus, while a peer is Exposed, the expected number of additional infected files it downloads is $\int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau$. When it becomes Infected, the remaining infected files are generated. The rate of change in the number of infected files due to Exposed peers becoming Infected is thus $E(t)\lambda_E\left(c - 1 - \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau\right)$. An Infected peer will always share $c$ infected files, so a recovery results in a reduction of $c$ infected files. The rate is therefore $-I(t)\lambda_R c$. This means the overall rate of change of $K$ is:

$\frac{dK(t)}{dt} = \{S(t) + E(t)\}\lambda_S h(t) + E(t)\lambda_E\left(c - 1 - \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau\right) - I(t)\lambda_R c$   (A.3)

4) All downloads executed, additional downloads before execution possible, possible file names $\gg c$

All peers can continue to download, so the rate of change due to downloads is $N\lambda_S h(t)$. When an Exposed peer becomes Infected or an Infected peer executes a newly downloaded infected file, $c$ new infected files are created in addition to the ones it is already sharing. So, the rate of change due to executions of infected files is $(I(t) + E(t))\lambda_E c$. The expected number of infected files downloaded while Exposed and Infected is $\int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau + \int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau$. Since all these infected files are eventually executed, the rate of change from recoveries is $-I(t)\lambda_R(c+1)\left(1 + \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau + \int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau\right)$.

Therefore, K undergoes the following rate of change:

$\frac{dK(t)}{dt} = N\lambda_S h(t) + \{I(t) + E(t)\}\lambda_E c - I(t)\lambda_R(c+1)\left(1 + \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau + \int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau\right)$   (A.4)

5) Some downloads not executed; if file is executed: no additional downloads before execution, c unique file names

Since not all files are executed, $\lambda_E < \lambda_S$, and the probability of a downloaded file being executed is $\frac{\lambda_E}{\lambda_S}$. The expected proportion of Exposed peers which have not yet executed the infected file they downloaded, and hence keep downloading, is $\frac{\lambda_E}{\lambda_S}$. Thus, the rate of change due to downloads is $\{S(t) + E(t)\frac{\lambda_E}{\lambda_S}\}h(t)\lambda_S$. An expected proportion $1 - \frac{\lambda_E}{\lambda_S}$ of Exposed peers will execute the downloaded infected file. So, the expected number of Exposed peers becoming Infected upon the next execution is $(1 - \frac{\lambda_E}{\lambda_S})E$. The number of infected files an Exposed peer downloads prior to executing one is geometrically distributed with parameter $\frac{\lambda_E}{\lambda_S}$, so the expected number of such downloads is $\frac{\lambda_S}{\lambda_E}$. This means that upon Infection, an additional $c - 1 - \frac{\lambda_S}{\lambda_E}$ infected files will be created, and hence the rate of change due to new infections is $(1 - \frac{\lambda_E}{\lambda_S})E(t)\lambda_E(c - 1 - \frac{\lambda_S}{\lambda_E})$. Infected peers will always share $c$ infected files, so the rate of change due to recoveries is $-I(t)\lambda_R c$. The overall rate of change of $K$ is therefore:

$\frac{dK(t)}{dt} = \left\{S(t) + E(t)\frac{\lambda_E}{\lambda_S}\right\}h(t)\lambda_S + \left(1 - \frac{\lambda_E}{\lambda_S}\right)E(t)\lambda_E\left\{c - 1 - \frac{\lambda_S}{\lambda_E}\right\} - I(t)\lambda_R c$   (A.5)

6) Some downloads not executed; if file is executed: no additional downloads before execution, possible file names $\gg c$

Susceptible and Exposed peers download in the same fashion as in case 5), but in this case Infected peers can also continue to download infected files. Therefore, the rate of change in infected files due to downloads is $\{S(t) + I(t) + E(t)\frac{\lambda_E}{\lambda_S}\}h(t)\lambda_S$. Upon infection, $c$ additional infected file types are created, meaning the rate of change due to infections is $(1 - \frac{\lambda_E}{\lambda_S})E(t)\lambda_E c$. Prior to recovery, Infected peers are expected to have downloaded $\int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau$ infected files, and executed $\frac{\lambda_E}{\lambda_S}$ of them. Prior to infection, Exposed peers are expected to have downloaded $\frac{\lambda_S}{\lambda_E}$ infected files. Therefore, the rate of change in the number of infected files due to recoveries is $-I(t)\lambda_R\left(c + 1 + \left(1 + c\frac{\lambda_E}{\lambda_S}\right)\int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau + \frac{\lambda_S}{\lambda_E}\right)$. Combining these rates provides the total rate at which $K$ changes:

$\frac{dK(t)}{dt} = \left\{S(t) + I(t) + E(t)\frac{\lambda_E}{\lambda_S}\right\}h(t)\lambda_S + \left(1 - \frac{\lambda_E}{\lambda_S}\right)E(t)\lambda_E c - I(t)\lambda_R\left(c + 1 + \left(1 + c\frac{\lambda_E}{\lambda_S}\right)\int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau + \frac{\lambda_S}{\lambda_E}\right)$   (A.6)

7) Some downloads not executed, additional downloads before execution possible, c unique file names

Susceptible and Exposed peers can both download infected files, but Infected peers cannot, since they already have all $c$ possible infected files. Thus, the rate of change due to downloads is $\{S(t) + E(t)\}\lambda_S h(t)$. The expected number of infected files downloaded by an Exposed peer prior to deciding to execute one is $\frac{\lambda_S}{\lambda_E}$. Before this file is executed, $\int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau$ additional infected files are expected to be downloaded. Hence the rate of change due to new infections is $\left(1 - \frac{\lambda_E}{\lambda_S}\right)E(t)\lambda_E\left(c - 1 - \frac{\lambda_S}{\lambda_E} - \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau\right)$. Infected peers will always share $c$ infected files, so the rate of change due to recoveries is $-I(t)\lambda_R c$. Combining these rates provides the overall rate of change of $K$:

$\frac{dK(t)}{dt} = \{S(t) + E(t)\}\lambda_S h(t) - I(t)\lambda_R c + \left(1 - \frac{\lambda_E}{\lambda_S}\right)E(t)\lambda_E\left(c - 1 - \frac{\lambda_S}{\lambda_E} - \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau\right)$   (A.7)

8) Some downloads not executed, additional downloads before execution possible, possible file names $\gg c$

All peers can download infected files at all times, so the rate of change in infected files due to downloads is $N\lambda_S h(t)$. Upon infection, $c$ additional infected file types are created, meaning the rate of change due to infections is $\left(1 - \frac{\lambda_E}{\lambda_S}\right)E(t)\lambda_E c$.

Prior to recovery, Infected peers are expected to have downloaded $\int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau$ infected files and executed $\frac{\lambda_E}{\lambda_S}$ of them. Prior to infection, Exposed peers are expected to have downloaded $\int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau + \frac{\lambda_S}{\lambda_E}$ infected files. Therefore, the rate of change in the number of infected files due to recoveries is $-I(t)\lambda_R\left(c + 1 + \left(1 + c\frac{\lambda_E}{\lambda_S}\right)\int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau + \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau + \frac{\lambda_S}{\lambda_E}\right)$. Therefore, $K$ changes at the following rate:

$\frac{dK(t)}{dt} = N\lambda_S h(t) + \left(1 - \frac{\lambda_E}{\lambda_S}\right)E(t)\lambda_E c - I(t)\lambda_R\left(c + 1 + \left(1 + c\frac{\lambda_E}{\lambda_S}\right)\int_{\tau=t-\frac{1}{\lambda_R}}^{t} \lambda_S h(\tau)\,d\tau + \int_{\tau=t-\frac{1}{\lambda_E}}^{t} \lambda_S h(\tau)\,d\tau + \frac{\lambda_S}{\lambda_E}\right)$   (A.8)

A.3 Rate at Which Proportion of Infected Files Changes

This discussion expands on (5.31) in Section 5.3.2. Define $q_j^E(t) = \frac{n_j(t)}{d_j(t)}$, where the numerator $n_j(t)$ is the expected number of Exposed peers that have an infected copy of file $j$ and the denominator $d_j(t)$ is the expected number of Exposed peers that have an (infected or clean) copy of file $j$. Let $\Delta d_j(t)$ be the rate of change of $d_j(t)$ and let $\Delta n_j(t)$ be the rate of change of $n_j(t)$. Then, using the quotient rule for differentiation, $\frac{dq_j^E(t)}{dt}$ is of the form:

$\frac{dq_j^E(t)}{dt} = \frac{d_j(t)\,\Delta n_j(t) - n_j(t)\,\Delta d_j(t)}{(d_j(t))^2}$   (A.9)

Derivation of $\Delta n_j(t)$: The number of infected files in the set of Exposed peers increases due to the download of an infected version of file type $j$, at a rate of $(S + E)\lambda_S r_j(t) q_j(t)$. It is reduced when an Exposed peer with file $j$ becomes Infected, which occurs at a rate of $\lambda_E E u_j(t) q_j^E(t) + p_j(t) q_j^E(t) \lambda_E E \sum_{i=1, i \neq j}^{M} u_i(t) q_i^E(t)$. The first term is due to the execution of infected copies of file $j$; the second term is due to the execution of any other infected file whose peer also has an infected copy of $j$. Thus,

$\Delta n_j(t) = (S + E)\lambda_S r_j(t) q_j(t) - \lambda_E E u_j(t) q_j^E(t) - q_j^E(t) p_j(t) \lambda_E E \sum_{i=1, i \neq j}^{M} u_i(t) q_i^E(t)$   (A.10)

Derivation of $\Delta d_j(t)$: The total number of type $j$ files shared by Exposed peers increases by one when a Susceptible peer already sharing a clean copy of file $j$ downloads an infected file. This happens at a rate of $p_j S \lambda_S \sum_{i=1}^{M} r_i(t) q_i(t)$. It also increases due to Susceptible peers downloading an infected instance of the file, which occurs at a rate of $S \lambda_S r_j(t) q_j(t)$. Finally, the number of copies of type $j$ files in $E$ also increases if an Exposed peer that does not have file $j$ downloads an (infected or uninfected) copy of it; this occurs at a rate of $E \lambda_S r_j(t)$. It decreases by the fraction of new infections that included file $j$, at a rate of $\lambda_E E u_j(t) q_j^E(t) + p_j(t) \lambda_E E \sum_{i=1, i \neq j}^{M} u_i(t) q_i^E(t)$. The first term is due to the execution of infected copies of file $j$; the second term accounts for the execution of any other infected file when the peer also has a copy of $j$. Thus,

$\Delta d_j(t) = S\lambda_S r_j(t) q_j(t) + E\lambda_S r_j(t) + p_j(t) S \lambda_S \sum_{i=1}^{M} r_i(t) q_i(t) - \lambda_E E u_j(t) q_j^E(t) - p_j(t) \lambda_E E \sum_{i=1, i \neq j}^{M} u_i(t) q_i^E(t)$   (A.11)

Substituting (A.10), (A.11), $n_j(t) = q_j^E(t) p_j(t) E$, and $d_j(t) = p_j(t) E$ into (A.9) provides the complete expression for $\frac{dq_j^E(t)}{dt}$.

Appendix B

This section expands on Section 3.2.4, summarizing our proposed marking algorithm with pseudocode. A procedure MarkingDecision takes, as its parameters, pointers to three fields in the header of the currently buffered IP packet. Each router is aware of the constants $n_{max}$, $m$, and MarkingTable, the lookup table described above. MarkingTable has entries of zero to indicate that no marking action is to be performed, or positive integers corresponding to the price bit on which to base the marking decision. Routers store their current price estimate in the bit array CurrentPrice of length $b$.

procedure MarkingDecision(&ttl, &ipid, &ecn)
  LinkId ← *ttl mod n_max
  ProbeType ← *ipid mod m
  BitToMark ← MarkingTable(LinkId, ProbeType)
  if BitToMark ≠ 0 then
    PriceBitValue ← CurrentPrice(BitToMark)
    if PriceBitValue == 1 then
      if *ecn == 01 then
        *ecn ← 10
      else if *ecn == 10 then
        *ecn ← 11
      end if
    end if
  end if
end procedure
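For concreteness, the pseudocode above can be transcribed into a short Python sketch. The constants and the contents of MarkingTable below are illustrative placeholders, not the values or table from Section 3.2.4; a real deployment would run this logic in the router's forwarding path.

```python
N_MAX, M, B = 16, 8, 4     # illustrative: links per path, probe types, price bits

# marking_table[link_id][probe_type] = 0 (no action) or a 1-based price-bit
# index; the entries below are placeholders, not the table from the thesis.
marking_table = [[(link + probe) % (B + 1) for probe in range(M)]
                 for link in range(N_MAX)]

def marking_decision(ttl, ipid, ecn, current_price_bits):
    """Return the (possibly updated) 2-bit ECN field for one buffered packet."""
    link_id = ttl % N_MAX          # which link's price this packet probes
    probe_type = ipid % M          # which probe this packet is
    bit_to_mark = marking_table[link_id][probe_type]
    if bit_to_mark == 0:
        return ecn                 # this router takes no marking action
    if current_price_bits[bit_to_mark - 1] == 1:
        if ecn == 0b01:            # first mark along the path
            return 0b10
        if ecn == 0b10:            # second mark
            return 0b11
    return ecn
```

The two-stage 01 → 10 → 11 transition mirrors the pseudocode: a router only advances the ECN codepoint when the probed price bit of its outgoing link is 1.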

References

[1] R. W. Stevens, TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, 1994.

[2] H. Eriksson, "Mbone: the multicast backbone," Commun. ACM, vol. 37, August 1994.

[3] D. G. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris, "Resilient Overlay Networks," in 18th ACM SOSP, (Banff, Canada), October 2001.

[4] "Kazaa." http://www.kazaa.com.

[5] "Gnutella protocol development." http://rfc-gnutella.sourceforge.net/.

[6] "BitTorrent." http://bittorrent.com.

[7] S. Athuraliya, V. Li, S. Low, and Q. Yin, "REM: Active queue management," IEEE Network, vol. 15, pp. 48-53, May 2001.

[8] M. Adler, J.-Y. Cai, J. Shapiro, and D. Towsley, "Estimation of congestion price using probabilistic packet marking," in Proc. IEEE INFOCOM, (San Francisco, CA), Apr. 2003.

[9] K. Walsh and E. Sirer, "Thwarting p2p pollution using object reputation," Tech. Rep. TR2005-1980, Computer Science Department, Cornell University, Feb. 2005.

[10] M. Izal, G. Urvoy-Keller, E. Biersack, P. Felber, A. A. Hamra, and L. Garces-Erice, "Dissecting BitTorrent: Five months in a torrent's lifetime," in Proc. Passive and Active Measurement Workshop, (Antibes Juan-les-Pins, France), April 2004.

[11] J. Pouwelse, P. Garbacki, D. Epema, and H. Sips, "The BitTorrent P2P file-sharing system: Measurements and analysis," in Proc. Int'l Workshop on Peer-to-Peer Systems, (Ithaca, NY), Feb. 2005.

[12] A. Bharambe, C. Herley, and V. Padmanabhan, "Analyzing and improving BitTorrent performance," Tech. Rep. MSR-TR-2005-03, Microsoft Research, 2005.

S. Athuraliya and S. Low, "Optimization flow control II: Implementation," tech. rep., Netlab, California Institute of technology, 2000.

R. Gibbens and F. Kelly, "Resource pricing and the evolution of congestion control," Automatica, vol. 35, pp. 1969-1985, 1999.

[15] D. Katabi, M. Handley, and C. Rohrs, "Internet congestion control for future high bandwidth-delay product environments," in Proc. ACM Sigcomm 2002, (Pittsburgh, PA), Aug. 2002.

[16] S. Kunniyur and R. Srikant, "A time scale decomposition approach to adaptive ECN marking," in Proc. IEEE INFOCOM, (Anchorage, AK), pp. 1330-1339, 2001.

[17] S. Low and D. Lapsley, "Optimization flow control I: Basic algorithm and convergence," IEEE/ACM Trans. Networking, vol. 7, pp. 861-875, Dec. 1999.

[18] F. Paganini, Z. Wang, S. Low, and J. Doyle, "A new TCP/AQM for stable operation in fast networks," in Proc. IEEE INFOCOM, (San Francisco, CA), Apr. 2003.

[19] K. Ramakrishnan, S. Floyd, and D. Black, "The addition of explicit congestion notification (ECN) to IP," Sept. 2001. IETF RFC 3168.

[20] J. MacKie-Mason and H. Varian, "Pricing congestible network resources," IEEE Journal on Selected Areas in Communications, vol. 13, pp. 1141-1149, Sept. 1995.

[21] P. Key, L. Massoulie, and J. Shapiro, "Service differentiation for delay sensitive applications: An optimisation-based approach," Tech. Rep. MSR-TR-2001-115, Microsoft Research, 2001.

[22] F. Kelly, "Charging and rate control for elastic traffic," European Transactions on Telecommunications, vol. 8, pp. 33-37, Sept. 1997.

[23] F. Paganini, J. Doyle, and S. Low, "Scalable laws for stable network congestion control," in Proc. IEEE Conf. Decision and Control, (Orlando, FL), Dec. 2001.

[24] J. Shapiro, C. V. Hollot, and D. Towsley, "Trading precision for stability in congestion control with probabilistic packet marking," in Proc. IEEE ICNP, (Boston, MA), Nov. 2005.

[25] S. Kunniyur and R. Srikant, "End-to-end congestion control: Utility functions, random losses and ECN marks," in Proc. IEEE INFOCOM, (Tel Aviv, Israel), 2000.

[26] S. Low and R. Srikant, "A mathematical framework for designing a low-loss low-delay internet," Networks and Spatial Economics, Jan. 2003.

[27] R. Thommes and M. Coates, "Deterministic packet marking for congestion price estimation," in IEEE Infocom 2004, March 2004.

[28] B. Wydrowski and M. Zukerman, "MaxNet: A congestion control architecture for MaxMin fairness," IEEE Communications Letters, vol. 6, pp. 512-514, November 2002.

[29] R. Thommes and M. Coates, "Deterministic packet marking for maximum link price estimation," in Proc. Canadian Workshop on Information Theory, (Montreal, QC, Canada), June 2005. http://www.tsp.ece.mcgill.ca/Networks/publications.html.

[30] H.-K. Ryu and S. Chong, "Deterministic packet marking for max-min flow control," IEEE Communications Letters, vol. 9, pp. 856-858, September 2005.

[31] F. Begtasevic and P. Van Mieghem, "Measurements of the hopcount in the Internet," in Proc. Passive and Active Measurement, (Amsterdam, The Netherlands), Apr. 2001.

[32] J. Postel, "Internet protocol," Sept. 1981. IETF RFC 791.

[33] S. Bellovin, "A technique for counting NATted hosts," in Proc. Internet Measurement Workshop, (Marseille, France), Nov. 2002.

[34] R. Mahajan, N. Spring, and D. Wetherall, "Measuring ISP topologies with Rocketfuel," in Proc. ACM Sigcomm 2002, (Pittsburgh, PA), Aug. 2002.

[35] N. Johnson and S. Kotz, Urn Models and Their Applications. John Wiley & Sons, 1977.

[36] V. Kolchin, B. Sevastyanov, and V. Chistyakov, Random Allocations. John Wiley & Sons, 1978.

[37] A. Kamath, R. Motwani, K. Palem, and P. Spirakis, "Tail bounds for occupancy and the satisfiability threshold conjecture," Random Structures and Algorithms, vol. 7, no. 1, pp. 59-80, 1995.

[38] D. Wetherall, D. Ely, N. Spring, S. Savage, and T. Anderson, "Robust congestion signaling," in IEEE Conference on Network Protocols, pp. 332-341, November 2001.

[39] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP throughput: a simple model and its empirical validation," in ACM SIGCOMM, Sept. 1998.

[40] L. L. H. Andrew, S. V. Hanly, S. Chan, and T. Cui, "Adaptive deterministic packet marking," IEEE Communications Letters, vol. 10, pp. 790-792, Nov. 2006.

[41] F-Secure, "F-secure hoax information pages: Mp3 virus." http://www.f-secure.com/hoaxes/mp3.shtml, 1998.

[42] "eDonkey2000." http://www. .com.

[43] "eDonkey2000 server list." http://ocbmaurice.no-ip.org/slist/serverlist.html.

[44] D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica, and W. Zwaenepoel, "Denial-of-service resilience in peer-to-peer file-sharing systems," in Proc. ACM Sigmetrics, (Banff, Canada), June 2005.

[45] "Viruslist.com." http://www.viruslist.com.

[46] "Zone Labs SmartDefense research center." http://vie.zonelabs.com.

[47] "McAfee Avert Labs threat library." http://vil.nai.com/.

[48] "Symantec security response." http://securityresponse.symantec.com.

[49] "Sophos threat analyses." http://www.sophos.com/security/analyses/.

[50] "Frisk Software International F-Prot antivirus virus information." http://www.f-prot.com/virusinfo/.

[51] "CA virus information center." http://www3.ca.com/securityadvisor/virusinfo/.

[52] "F-Secure virus description database." http://www.f-secure.com/v-descs/.

[53] Viruslist.com, "P2p-worm.win32.achar.a." http://www.viruslist.com/en/viruses/encyclopedia?virusid=23893, May 2003.

[54] Symantec, "W32.hllw.gotorm." http://securityresponse.symantec.com/avcenter/venc/data/w32.hllw.gotorm.html, August 2003.

[55] Viruscan, "W32/bare.worm." http://www.virus-scan-software.com/latest-virus-software/latest-viruses/w32bare-worm.shtml.

[56] Sophos, "Sophos virus analysis: Troj/krepper-g." http://www.sophos.com/virusinfo/analyses/trojkrepperg.html, July 2004.

[57] J. Liang, R. Kumar, Y. Xi, and K. W. Ross, "Pollution in P2P file sharing systems," in Proc. IEEE Infocom, (Miami, FL), Mar. 2005.

[58] C. Zou, W. Gong, and D. Towsley, "Code Red worm propagation modeling and analysis," in Proc. ACM Conf. Computer and Communications Security, (Washington, DC), Nov. 2002.

[59] J. Frauenthal, Mathematical Modeling in Epidemiology. New York, NY: Springer-Verlag, 1980.

[60] Z. Chen, L. Gao, and K. Kwiat, "Modeling the spread of active worms," in Proc. IEEE Infocom, (San Francisco, CA), Mar. 2003.

[61] D. Moore, "The spread of the Code Red worm." http://www.caida.org/research/security/code-red/coderedv2_analysis.xml.

[62] M. Garetto, W. Gong, and D. Towsley, "Modeling malware spreading dynamics," in Proc. IEEE Infocom, (San Francisco, CA), Mar. 2003.

[63] D. Watts and S. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, pp. 440-442, 1998.

[64] C. Moore and M. Newman, "Exact solutions of site and bond percolation on small-world networks," Phys. Rev., vol. 62, 2000.

[65] R. Kumar, D. D. Yao, A. Bagchi, K. W. Ross, and D. Rubenstein, "Fluid modeling of pollution proliferation in p2p networks," in SIGMETRICS '06/Performance '06: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, (Saint Malo, France), June 2006.

[66] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, "Chord: A scalable Peer-To-Peer lookup service for internet applications," in Proceedings of the 2001 ACM SIGCOMM Conference, 2001.

[67] A. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems," in IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), (Heidelberg, Germany), Nov. 2001.

[68] N. Christin, A. Weigend, and J. Chuang, "Content availability, pollution, and poisoning in file sharing peer-to-peer networks," in Proc. ACM Conference on Electronic Commerce, (Vancouver, Canada), June 2005.

[69] J. Liang, N. Naoumov, and K. W. Ross, "The index poisoning attack in P2P file sharing systems," in Proc. IEEE Infocom, (Barcelona, Spain), April 2006.

[70] R. Anderson and R. May, "Population biology of infectious diseases I," Nature, vol. 180, pp. 361-367, 1979.

[71] E. Adar and B. A. Huberman, "Free riding on Gnutella," First Monday, vol. 5, October 2000.

[72] M. Ripeanu, A. Iamnitchi, and I. Foster, "Mapping the Gnutella network," IEEE Internet Computing, pp. 50-57, January/February 2002.

[73] S. Saroiu, P. Gummadi, and S. Gribble, "A measurement study of peer-to-peer file sharing systems," in Proceedings of Multimedia Computing and Networking, 2002.

[74] S. Sen and J. Wang, "Analyzing peer-to-peer traffic across large networks," IEEE/ACM Transactions on Networking, vol. 12, pp. 219-232, April 2004.

[75] K. Tutschku, "A measurement-based traffic profile of the eDonkey filesharing service," in Proc. Passive and Active Measurement Workshop, (Juan-les-Pins, France), Apr. 2004.

[76] "BayTSP." http://www.baytsp.com.

[77] John Borland, CNET News, "Kazaa loses P2P crown." http://news.com.com/Kazaa+loses+P2P+crown/2100-1038_3-5406278.html, October 2004.

[78] R. Rivest, "RFC 1320: The MD4 message-digest algorithm." http://www.rfc-editor.org/rfc/rfc1320.txt, April 1992.

[79] "Ethereal network protocol analyzer." http://www.ethereal.com/.

[80] O. Heckmann and A. Bock, "The eDonkey2000 protocol," Tech. Rep. Version 0.8, Darmstadt University of Technology, December 2002.

[81] "eMule." http://www.emule-project.net.

[82] Electronic Frontier Foundation, "RIAA vs. the people." http://www.eff.org/IP/P2P/?f=riaa-v-thepeople.html.

[83] B. Cohen, "Incentives build robustness in BitTorrent." http://bittorrent.org/bittorrentecon.pdf, 2003.

[84] D. Qiu and R. Srikant, "Modeling and performance analysis of BitTorrent-like peer-to-peer networks," in Proc. ACM Sigcomm, (Portland, OR), Aug. 2004.

[85] M. Barbera, A. Lombardo, G. Schembra, and M. Tribastone, "A Markov model of a freerider in a BitTorrent p2p network," in Proc. Globecom, (St. Louis, Missouri, USA), November 2005.

[86] S. Koo, C. Lee, and K. Kannan, "A genetic algorithm-based neighbor-selection strategy for hybrid peer-to-peer networks," in Proc. International Conference on Computer Communications and Networks, (Chicago, Illinois), October 2004.