University of Nevada, Reno

Virtual Direction Multicast for Overlay Networks

A dissertation submitted in partial fulfillment of the

requirements for the degree of Doctor of Philosophy in

Computer Science and Engineering

by

Suat Mercan

Dr. Murat Yuksel, Dissertation Advisor

August, 2011

THE GRADUATE SCHOOL

We recommend that the dissertation prepared under our supervision by

SUAT MERCAN

entitled

Virtual Direction Multicast for Overlay Networks

be accepted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Murat Yuksel, Ph.D., Advisor

Mehmet H. Gunes, Ph.D., Committee Member

Monica Nicolescu, Ph.D., Committee Member

Sergiu Dascalu, Ph.D., Committee Member

Gokhan Pekcan, Ph.D., Graduate School Representative

Marsha H. Read, Ph.D., Associate Dean, Graduate School

August, 2011


Virtual Direction Multicast for Overlay Networks

Suat Mercan

University of Nevada, Reno, 2011

Advisor: Dr. Murat Yuksel

Abstract

Recently emerged Internet applications such as Internet TV, tele-conferencing and online education require group communication, also known as “multicast”. Many researchers have focused on achieving a robust and efficient way of sending traffic via multicast. Network layer multicast attracted attention for years but, because of deployment issues, it has not been widely deployed. Overlay multicast (i.e., application layer multicast) is a promising alternative in which multicast functionalities are implemented in the application layer.

We propose Virtual Direction Multicast (VDM), which aims to minimize network usage and increase user-perceived quality for video multicast applications on peer-to-peer overlay networks. VDM locates the end hosts relative to each other based on a virtualized orientation scheme, and it builds its multicast tree by connecting the nodes that are estimated to be in the same virtual direction, which is determined by virtual distances among the nodes. By using the concept of directionality, we aim to consume minimal resources of the underlying network and increase user satisfaction. Another aim of using virtual distance is to adapt the overlay multicast tree to different performance targets desired by applications. Applications differ in their sensitivity to various network performance metrics; e.g., delay is crucial for video conferencing, which is interactive, while it is less significant for video streaming, which is highly sensitive to loss. Calculating virtual distances based on given application-specific preferences is a key capability of VDM; it enables the system to accommodate specific performance targets while providing a generalized framework to establish the overlay trees.

We perform an extensive evaluation of VDM and compare it against the Host Multicast Tree Protocol (HMTP), which connects nearby nodes to construct the multicast tree. Our simulation results and PlanetLab implementation show that VDM consistently outperforms HMTP under different churn rates for key measures such as network usage and user-perceived quality.

To my Family

Acknowledgments

First, I would like to thank my advisor, Dr. Murat Yuksel, for his invaluable teaching, motivation and patience throughout my graduate study. His guidance helped me in my research and in writing my dissertation. Besides my advisor, I want to thank the other committee members, Dr. Mehmet H. Gunes, Dr. Monica Nicolescu, Dr. Sergiu Dascalu and Dr. Gokhan Pekcan, for their time and comments. I would also like to thank all my teachers, from first grade up to now, for their contributions to my knowledge. I really appreciate the understanding of my wife and mother, with whom I could not spend much time. My special thanks go to the people who supported me financially in my hard times and never asked for anything back until I could repay them. I should also mention my lab mates in the Computer Networking Lab. Our lab has a very friendly atmosphere thanks to their warmth, and our insightful discussions have contributed to our understanding.

Suat Mercan

University of Nevada, Reno August 2011

Contents

Abstract
Acknowledgments
List of Figures

Chapter 1 Introduction
1.1 Contributions
1.2 Organization of Dissertation

Chapter 2 Related Work
2.1 Types of Data Delivery
2.1.1 Unicast
2.1.2 Broadcast
2.1.3 Multicast
2.2 Content Delivery Networks (CDNs)
2.3 Network Layer Multicast
2.4 Application Layer Multicast
2.4.1 Overlay Network
2.4.2 Multicast in Overlay
2.4.3 Application Layer Multicast Considerations
2.4.4 Previous Studies on Peer-to-Peer Multicast
2.4.5 Available P2PTV Applications
2.4.6 Banana Tree Protocol (BTP)
2.4.7 Host Multicast Tree Protocol (HMTP)
2.4.8 SplitStream
2.4.9 NICE
2.4.10 Narada
2.4.11 Topology-Aware Node Selection
2.4.12 Content Addressable Network (CAN)

Chapter 3 Virtual Direction Multicast
3.1 Protocol Description
3.1.1 Key Design Considerations
3.1.2 Virtual Directionality on a Line
3.2 Join Process
3.2.1 Join Examples
3.2.2 More Complex Join Examples
3.2.3 Complexity Analysis
3.3 Reconnection
3.4 Refinement
3.5 VDM versus HMTP
3.6 Simulation Experiments
3.6.1 Simulation Basics
3.6.2 Simulation Setup
3.6.3 Performance Metrics
3.6.4 Simulation Results

Chapter 4 Generalizing Virtual Directions for Various Metrics
4.1 Generalization
4.2 VDM-D versus VDM-L

Chapter 5 PlanetLab Implementation
5.1 PlanetLab Environment
5.2 Virtual Direction Multicast on PlanetLab
5.2.1 Node Selection
5.2.2 Implementation
5.3 Performance Metrics
5.4 Results
5.4.1 Sample Tree
5.4.2 VDM versus HMTP
5.4.3 Performance versus Number of Nodes
5.4.4 Performance versus Node Degree
5.4.5 Refinement Component
5.4.6 Comparison with Minimum Spanning Tree

Chapter 6 Summary and Future Work
6.1 Summary
6.2 Future Work

Bibliography

List of Figures

2.1 Unicast
2.2 Broadcast
2.3 Multicast
2.4 Network Layer Multicast
2.5 Overlay Network
2.6 Application Layer Multicast
2.7 Sibling switch in BTP
2.8 Join Procedure for HMTP
2.9 SplitStream
2.10 NICE
2.11 Narada
2.12 Join in CAN

3.1 Directionality Concept
3.2 Case I
3.3 Case II
3.4 Case III
3.5 Another illustration for Case III
3.6 Join procedure
3.7 A sample overlay tree with source and some children
3.8 A join example illustrating Case I
3.9 A join example illustrating Case III and Case I sequentially
3.10 A join example illustrating Case III and Case II sequentially
3.11 A join example illustrating Case III and Case II
3.12 Different join scenarios
3.13 Case II is existing with two different children in the same iteration
3.14 Case III is existing with two different children in the same iteration
3.15 Case II is existing with one child and Case III is with another child in the same iteration
3.16 C3 is not able to find C2 directionality divergence
3.17 N needs to contact grandchildren of P to find C2
3.18 Reconnection Procedure
3.19 Orphan nodes starts reconnection at grandparent
3.20 Source Refinement
3.21 Scenario I showing the difference between VDM and HMTP
3.22 Scenario II showing the difference between VDM and HMTP
3.23 Stress values of links are shown
3.24 Stretch values of links are shown
3.25 Stress vs. Churn
3.26 Stretch vs. Churn
3.27 Loss rate vs. Churn
3.28 Overhead vs. Churn
3.29 Stress vs. Number of Nodes
3.30 Stretch vs. Number of Nodes
3.31 Loss rate vs. Number of Nodes
3.32 Overhead vs. Number of Nodes
3.33 Stress vs. Node Degree
3.34 Stretch vs. Node Degree
3.35 Loss rate vs. Node Degree
3.36 Overhead vs. Node Degree

4.1 Delay and Loss rate values of a triangle among San Francisco, Boston and Dallas
4.2 Delay and Loss rate values of a triangle among Chicago, Tokyo and Johannesburg
4.3 A sample topology
4.4 Relative virtual distances
4.5 Differently formed overlay trees
4.6 Stress vs. Time
4.7 Stretch vs. Time
4.8 Loss rate vs. Time
4.9 Overhead vs. Time

5.1 PlanetLab nodes around the world [45]
5.2 Filtering process to select nodes
5.3 Main components of the system
5.4 Control Messages between Main Controller and VDMAgent
5.5 Sample tree shows virtual links among nodes
5.6 Sample tree includes nodes from United States and Europe
5.7 Startup Time vs. Churn Rate
5.8 Reconnection Time vs. Churn Rate
5.9 Stretch vs. Churn Rate
5.10 Hopcount vs. Churn Rate
5.11 Resource usage vs. Churn Rate
5.12 Loss Rate vs. Churn Rate
5.13 Overhead vs. Churn Rate
5.14 Startup Time vs. Number of Nodes
5.15 Reconnection Time vs. Number of Nodes
5.16 Stretch vs. Number of Nodes
5.17 Hopcount vs. Number of Nodes
5.18 Resource Usage vs. Number of Nodes
5.19 Loss Rate vs. Number of Nodes
5.20 Overhead vs. Number of Nodes
5.21 Startup Time vs. Node Degree
5.22 Reconnection Time vs. Node Degree
5.23 Stretch vs. Node Degree
5.24 Hopcount vs. Node Degree
5.25 Resource Usage vs. Node Degree
5.26 Loss Rate vs. Node Degree
5.27 Overhead vs. Node Degree
5.28 Improvement in Stretch in use of Refinement
5.29 Improvement in Hopcount in use of Refinement
5.30 Cost of Refinement in Overhead
5.31 Comparison with MST

Chapter 1

Introduction

With the improvement of bandwidth capacity and the expansion of Internet usage, content providers are aiming to deliver multimedia content over the Internet through both backbone (e.g., IPTV [2,26,35]) and overlay (e.g., P2PTV [17,43]) distribution applications. Average Internet download speed jumped from 127 Kbps in 2000 to 4.4 Mbps in 2010 [6]. This growth has attracted Internet users to watch high-quality videos, on demand or live, without downloading them. High demand for such applications, which constitute a large share of today’s Internet traffic, increases server load and network bandwidth consumption. For example, AOL [5] broadcast the Live8 concert in July 2005. It had 175,000 concurrent users at peak, which requires almost 100 Gbps of bandwidth for 500 Kbps streaming quality. According to [10], online video was the largest contributor to Internet traffic with a 26% share in 2010, with a tendency to grow further.
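As a quick check of the Live8 figure, the aggregate bandwidth a server needs when every viewer receives a separate stream is the number of concurrent viewers multiplied by the stream bitrate. The sketch below is illustrative only: the function name is ours, and decimal units (1 Gbps = 1,000,000 Kbps) are assumed.

```python
def aggregate_bandwidth_gbps(concurrent_users: int, stream_kbps: float) -> float:
    """Total server bandwidth when each viewer gets a separate unicast stream."""
    # Decimal units are an assumption of this sketch: 1 Gbps = 1,000,000 Kbps.
    return concurrent_users * stream_kbps / 1_000_000


# Live8 numbers quoted in the text: 175,000 viewers at 500 Kbps each.
live8_gbps = aggregate_bandwidth_gbps(175_000, 500)
print(f"{live8_gbps:.1f} Gbps")
```

The product comes to 87.5 Gbps, which is consistent with the "almost 100 Gbps" quoted above.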

Given this growing interest and the increase in video/multimedia traffic, video streaming applications are considered to be the killer applications for the Internet in the near future. This recent trend of moving video distribution onto the Internet calls for mechanisms to efficiently and scalably transfer video content from a single source to many receivers. Such content delivery is desired to be seamless across the multi-provider operation of the Internet and capable of handling a high degree of churn from the network as well as the receiver population.

The demand for group communication (i.e., multicast) on the Internet has urged people to seek efficient solutions from both network and end-user perspectives. For this purpose, IP multicast was first proposed by [14]. IP multicast is implemented in the network layer by integrating additional algorithms and multicast tables into routers. Even though IP multicast provides bandwidth efficiency, ISPs are reluctant to support it since it introduces extra workload and complicates network management.

Thus, IP multicast has not been successfully deployed over the Internet due to scalability issues, deployment difficulties and limited support for high-level functionality. Since IP multicast could not get much support from network operators, application layer multicast (ALM) [13,27,29,34,36,52,61,62,66] has emerged as an alternative way to achieve multicast functionality. The idea differs from the traditional client/server model, where a server is dedicated to serving clients. In ALM, a virtual network is established among end-hosts, each of which not only receives the stream but also forwards it to other end-hosts. ALM is attracting more attention since it does not require support from network layer routers, and thus from network operators. The end-hosts constitute the multicast group, which moves functionality from the network layer to the application layer. This makes ALMs easy to deploy.

A similar technology that uses an end-host-based communication architecture is peer-to-peer systems. The emergence of this end-to-end communication concept, called peer-to-peer (P2P), has led to different applications. One of the most common applications of peer-to-peer systems is file sharing between participants. Each participant can download or distribute a file, which may sometimes violate copyright. File sharing systems such as BitTorrent [9], Gnutella [40] and OpenNap [21] have become so popular that they have millions of users.

Another P2P application used by many people is voice over IP (VoIP), in systems such as Skype [56], which showed the applicability of P2P-based architectures to problems like multimedia distribution in addition to file sharing. There is no fixed infrastructure within such P2P multimedia systems, and they can scale up to thousands of users. They provide cheap communication for their users.

One of the most recent and complex applications of peer-to-peer systems is real-time video delivery on the Internet (also known as P2PTV), which we study in this dissertation. P2P video broadcast poses different challenges than other applications: it requires high bandwidth capacity and low latency. These problems have attracted a significant amount of interest from the networking research community. There are also many implemented and commercialized systems for P2P audio and video streaming, such as SOPCast [57], PPLive [47], TVAnts [58], PPStream [49], QQLive [50], PPMate [48], TVUNetworks [59], [28] and UUSee [60]. Multicasting over P2P systems is essentially an ALM approach, but the fact that the underlying P2P system can be highly dynamic poses a set of complex research issues.

The main goal of ALM is obtaining an efficient and robust overlay multicast tree. Different techniques can be defined to achieve this goal. There are several issues that need to be considered in designing a technique for P2P multicast.

Live multimedia streaming is a real-time application that requires minimal delay. The end-to-end delay might be too long due to a high number of intermediate nodes and an inappropriate overlay structure. Also, a large volume of data is transmitted in these applications, which requires avoiding redundant transmission of the multicast traffic. So it is important to construct the overlay congruent to the underlying physical network. One of the drawbacks of ALMs is their lack of knowledge about the underlying network structure, which makes it hard to set up efficient multicast data paths. Distance measurements or topology maps are used to obtain the best overlay network. ALM is naturally incapable of reaching IP multicast performance in terms of end-to-end delay and link usage efficiency. In some cases a packet inevitably has to be transmitted multiple times on the same link, and the fact that a packet has to traverse other peers increases the time to reach the destination.

Another challenge to be addressed is that peers might join and leave at any time. This behavior, known as churn, makes tree maintenance harder. The ungraceful departure of a node causes data interruption for its descendants, and the orphaned nodes have to recover quickly. Long and frequent data outages are not acceptable for real-time applications, and thus robustness against churn is of crucial importance for overlay multicasting.

Nodes on the overlay exchange messages to set up and maintain the virtual network. Since network conditions change over time, the overlay should be updated. These messages are considered overhead. Minimizing the control overhead is crucial to the scalability of a system.

In an overlay network, each node might have a different uplink bandwidth capacity. This affects the number of children that a node can feed. There might even be some users, called free riders, who do not want to contribute to the system. The heterogeneity in node bandwidths adds another problem. Different techniques, such as incentive mechanisms or contribution obligations, have been developed to overcome this issue.

1.1 Contributions

This dissertation focuses on real-time video delivery using a P2P scheme. It has two main contributions: a new overlay multicast tree design and a generic distance calculation method. More specifically, our contributions include:

• We propose a new P2P multicast streaming technique called Virtual Direction Multicast (VDM). VDM focuses on a tree construction method that reduces redundant data transmission and on failure recovery that decreases data reception outage under churn. We aim to find the most appropriate parent for a peer so that data travels the minimum possible path. To converge on a tree with minimal source-receiver delay, we use round-trip times (RTTs) to measure the virtual distances between peers. VDM uses an iterative approach, selecting at each step a child that is in the same virtual direction. This iterative process continues until the best potential parent is found. The idea is to connect the nodes that are in the same virtual direction so as to minimize the source-destination path length for the overall structure.

• A key property of our protocol is its capability of virtualizing the underlying network in different ways. It is possible to establish “virtual directions” based on other performance metrics such as loss or bandwidth. Different values of these metrics may produce different virtual distances and thus different overlay trees in our protocol. By generalizing and customizing virtual direction, we can establish target-specific overlay trees to improve specific performance metrics desired by applications.

• We implemented the protocol on PlanetLab to evaluate it in a more realistic environment.
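The notion of "same virtual direction" in the first contribution can be illustrated with three nodes placed on a line. The sketch below is only one plausible reading and not the protocol definition, since the actual join cases are specified in Chapter 3. It assumes that the pair with the largest virtual distance forms the two ends of the line, so that the remaining node lies between them. The function and label names are hypothetical.

```python
def middle_node(d_parent_child: float, d_parent_new: float, d_child_new: float) -> str:
    """Guess which of parent P, child C and newcomer N sits between the other two.

    Assumption of this sketch: the three nodes lie roughly on a line, so the
    pair with the largest virtual distance (e.g., RTT) forms the two ends.
    """
    longest = max(d_parent_child, d_parent_new, d_child_new)
    if longest == d_parent_new:
        return "child"      # P ... C ... N: N is beyond C, in C's direction
    if longest == d_parent_child:
        return "newcomer"   # P ... N ... C: N is between P and C
    return "parent"         # C ... P ... N: N is in a different direction


# RTTs in milliseconds (made-up values for illustration).
print(middle_node(d_parent_child=20, d_parent_new=50, d_child_new=30))
```

Under this reading, a result of "child" would suggest continuing the search at that child, which matches the iterative selection described above; the exact continuation rule is likewise an assumption here.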

1.2 Organization of Dissertation

We organize the dissertation as follows:

In Chapter 1, we give an overview of recent trends in multimedia distribution and our motivation to work on overlay multicast. We touch on the general reasons that boosted overlay multicast research and discuss the technologies that resulted from it. Then we explain our contributions to this research area.

In Chapter 2, we first explore basic concepts of video delivery. Then we look at current technologies used for multicasting. Previously proposed techniques are investigated in depth, along with their advantages and disadvantages.

Chapter 3 gives a detailed description of the VDM protocol. We elaborate on system design principles with examples. A theoretical comparison between our protocol and existing protocols is also made, and the strengths and weaknesses of the protocols are discussed. We also look at the algorithmic complexity of the protocol and present different scenarios to better understand its working principles. After that, we present simulation results gathered from a well-known network simulator. The simulation experiments compare VDM and HMTP and investigate the behavior of the protocol under different network dynamics and simulation variables.

In Chapter 4, we focus on generalizing the concept of virtual distance. Instead of delay, which is widely used as the link performance metric for overlay construction, we use loss rate. This gives us the opportunity to design target-specific overlays. We perform additional experiments to compare the results of delay-based and loss-rate-based overlay trees.

In order to test our algorithm under more realistic conditions, we developed an implementation on PlanetLab, presented in Chapter 5. We introduce the PlanetLab environment and the main components of the implementation. We also look at more metrics in this part to understand the protocol better. Furthermore, we look at the effects of different variables on the protocol.

Finally, in Chapter 6, we draw the conclusions of our research and discuss possible future work.

Chapter 2

Related Work

For multimedia content delivery to multiple users, data should be carried from the source to the receivers. Our scope is multimedia delivery on the Internet. This could be achieved via either the traditional client/server model, which seems infeasible given today’s high demand for multimedia applications, or newly emerging technologies such as IPTV [2,26,35] and P2PTV [17,43]. The high-performance transmission requirements of live applications, such as low delay and continuous stream reception without loss, add more complexity to the problem.

In this chapter, we first define the basic data delivery techniques with their pros and cons. After that, we focus on network layer multicast, which was the first proposed solution for group communication. Relevant application layer multicast techniques are also explored.

2.1 Types of Data Delivery

Multicast is defined as the delivery of a data message from a single source to multiple destinations. We first describe different methods for delivering data to multiple receivers over a network, and then we define multicasting with respect to the other delivery methods.

2.1.1 Unicast

Unicast is one-to-one communication between nodes. To deliver the same data to multiple receivers, a distinct connection must be established from the source to each receiver. As illustrated in Figure 2.1, the source sends a packet to each interested user separately. This model is very costly for the server because it has to run a process for each connection. Also, the same packet is transmitted many times on a link, which overloads the network with redundant data. This model uses resources inefficiently, degrades service quality in return, and limits the scalability of the system, since the server can accept only a limited number of connections and a link has limited transmission capacity.

Figure 2.1: Unicast

2.1.2 Broadcast

Broadcast is the transmission of a message to every computer connected to a network. For example, a radio signal is carried in every direction through the air, and any receiver tuned to the right frequency can play it. Broadcasting on the Internet differs in that data can be delivered to only a subset of hosts defined by a prefix. However, uninterested users in that subset still receive the data. In the broadcast model, the source sends a data packet only once, and the packets are replicated at the routers. As a result, a node receives the data stream even though it is not interested in it. This model is inapplicable to video delivery on the Internet since interested users may be very sparse and the data is not useful for most of the users in the receiver set.

Figure 2.2: Broadcast

2.1.3 Multicast

In this model, only a subset of nodes receives data messages. The nodes interested in the content register so that they receive the data stream. As in broadcast, the source sends a packet only once. The packets are replicated at the routers and, based on registry tables, forwarded only to the interested nodes. Multicast is the most efficient of the three methods. Its efficiency comes with extra overhead and protocol complexity, in that extra state is needed to maintain the list of subscribers and extra messaging is needed to cope with a dynamic set of subscribers.
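The difference between the three delivery models can be made concrete by counting how many times one packet crosses each link. The following sketch uses a small made-up router tree (the topology and all names are assumptions for illustration) with hosts A, B and C, of which only A and B are interested in the stream.

```python
from collections import Counter

# Hypothetical router tree: child -> parent, rooted at the source "S".
PARENT = {"R1": "S", "R2": "R1", "R3": "R1", "A": "R2", "B": "R2", "C": "R3"}


def path_links(node):
    """Links (parent, child) on the path from the source down to `node`."""
    links = []
    while node in PARENT:
        links.append((PARENT[node], node))
        node = PARENT[node]
    return links


def unicast(receivers):
    """One copy per receiver: a shared link carries the packet repeatedly."""
    return Counter(link for r in receivers for link in path_links(r))


def broadcast():
    """One copy on every link, whether or not the host is interested."""
    return Counter({(p, c): 1 for c, p in PARENT.items()})


def multicast(receivers):
    """One copy per link, only on links that lead to a registered receiver."""
    return Counter({link: 1 for r in receivers for link in path_links(r)})


interested = ["A", "B"]
for name, load in [("unicast", unicast(interested)),
                   ("broadcast", broadcast()),
                   ("multicast", multicast(interested))]:
    print(name, sum(load.values()), "transmissions, busiest link carries",
          max(load.values()))
```

In this toy topology, unicast sends two copies over the link next to the source, broadcast also reaches the uninterested host C, and multicast uses only the links that lead to A and B.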

Figure 2.3: Multicast

2.2 Content Delivery Networks (CDNs)

Content delivery networks are composed of a main server and cache servers, also called surrogate servers, around the world. For example, Akamai [3], one of the leading CDNs along with Amazon CloudFront [4], Limelight [31], Adero [1] and Digital Island [15], has 70,000 servers in more than 70 countries. CDNs serve not only live streaming applications but also web content, static data and other multimedia applications. A CDN can be viewed as a network of servers. The media on the original server is replicated to the cache servers. When a user requests content, he or she is redirected to the closest server so that a bottleneck near the original server is avoided. CDNs are mainly proposed to increase service quality for users by improving accessibility and reducing delay. They also help to reduce bandwidth consumption, network congestion and network traffic.

Even though using cache servers helps to improve overall network performance, communication between an end-user and a cache server is accomplished with unicast, which decreases local network performance around the server. For example, Akamai establishes several connections to an end-user from different surrogate servers for user satisfaction.

A CDN basically consists of the following components:

• Original and cache servers

• Distribution of content to cache servers

• Routing requests to appropriate servers

This kind of solution is expensive, since it requires placing cache servers around the globe, although CDNs are needed for web content anyway. Several design issues remain. Cache servers need to cooperate in terms of content, and different caching and mirroring systems have been developed for this purpose. Determining replica placement, i.e., where and how many cache servers should be placed, is another problem to be solved in CDN design.

2.3 Network Layer Multicast

Network layer multicast is also called IP multicast. We do not describe protocol details here, but we give an overview of its benefits and shortcomings. IP multicast is achieved in the network layer by adding multicast tables to routers. Routers keep the IP addresses of users for each specific stream and forward a stream packet based on the multicast table. End hosts use the Internet Group Management Protocol (IGMP) to communicate with edge routers when joining and leaving a multicast group. Some well-known IP multicast protocols are the Distance Vector Multicast Routing Protocol (DVMRP) [14], Multicast Open Shortest Path First (MOSPF) [38], which is an extension of OSPF, Protocol Independent Multicast - Dense Mode (PIM-DM) [53] and Protocol Independent Multicast - Sparse Mode (PIM-SM) [12].
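The per-group state that routers keep can be pictured with a toy model. The sketch below is not IGMP or any of the protocols listed above; it only mimics the bookkeeping described in the text. As a simplification, it tracks outgoing interfaces rather than receiver addresses, and the class name, group address and interface names are all made up.

```python
from collections import defaultdict


class MulticastRouter:
    """Toy router that keeps per-group state, as described in the text."""

    def __init__(self):
        # Multicast table: group address -> set of outgoing interfaces.
        self.table = defaultdict(set)

    def join(self, group, interface):
        """Handle a membership report (the role IGMP plays for end hosts)."""
        self.table[group].add(interface)

    def leave(self, group, interface):
        """Handle a leave message and drop empty groups to free state."""
        self.table[group].discard(interface)
        if not self.table[group]:
            del self.table[group]

    def forward(self, group, incoming):
        """Return the interfaces that get a copy of a packet for `group`."""
        return sorted(self.table.get(group, set()) - {incoming})


router = MulticastRouter()
router.join("239.1.1.1", "eth1")
router.join("239.1.1.1", "eth2")
print(router.forward("239.1.1.1", incoming="eth0"))
router.leave("239.1.1.1", "eth1")
print(router.forward("239.1.1.1", incoming="eth0"))
```

Even in this small model, every join and leave changes router state, which hints at why maintaining such tables becomes a burden as groups grow.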

IP multicast provides the best solution from a network efficiency perspective. A packet traverses a link only once since packets are replicated at routers, which prevents redundant data transmission. With network layer multicast, end users receive packets through the shortest path, which gives minimum delay. On the other hand, although IP multicast was proposed in 1991 by [14], it still has not been widely deployed over the Internet, for several reasons [19]. It is only used in islands of networks such as campuses or LANs.

First, IP multicast is platform dependent. It cannot work without the support of the underlying infrastructure: routing devices must implement the protocol. Routers do have this implementation, but ISPs turn it off for reasons such as security and pricing policy. Second, it is difficult to implement some high-level functionality on top of network layer multicast. For example, in case of congestion on a link, the routing path should be updated, which requires transferring all state information from one router to another and updating other routers’ multicast tables. It is also difficult to implement access control for receivers and senders of a stream, and to set up a pricing policy. Third, routers always keep a list of receivers for each multicast session. The list must be updated on joins and leaves, an obligation that prevents IP multicast from being applied to larger sessions, even though a multicast session should be able to scale up. It also increases computation time and memory consumption in routers.

Even though much work has been dedicated [19] by research groups, ISPs and companies to overcoming these issues, a satisfactory result has not yet emerged. Although most routers are now capable of multicast, it is turned off for the reasons above, and most of the proposed techniques have not been deployed yet.



Figure 2.4: Network Layer Multicast

2.4 Application Layer Multicast

2.4.1 Overlay Network

An overlay network is a virtual network built on top of an underlying network. Each host is called a peer. A logical link is set up between two peers regardless of the real links; each of these logical links is composed of one or more physical links. Nodes do not have knowledge of the physical network, so two nodes far from each other might be neighbors in the overlay network. This virtualization enables us to implement a network service that is not available in the existing network. Figure 2.5 shows the conceptual diagram of an overlay network.

Figure 2.5: Overlay Network

2.4.2 Multicast in Overlay

The slow deployment of IP multicast caused application layer multicast (ALM) to emerge as an alternative way to achieve multicast. ALM does not require support from lower layers. Multicast is achieved in the application layer, which makes ALM easy to deploy. Routers do only unicast forwarding, and the other multicast functionalities are implemented in the application layer. Figure 2.6 illustrates the ALM concept. Each host acts as a router: a peer receives a packet and forwards it to some other peers on the overlay.
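The forwarding role of peers can be sketched with a small overlay tree in which every logical link is mapped onto the physical links it uses. The topology, the mapping and all names below are assumptions chosen for illustration. Physical links are treated as undirected, so a packet entering and later leaving a peer over its access link counts twice on that link.

```python
from collections import Counter

# Hypothetical overlay tree: each peer forwards to its overlay children.
OVERLAY_CHILDREN = {"S": ["P1", "P2"], "P1": ["P3"], "P2": [], "P3": []}

# Hypothetical mapping of each overlay (logical) link to physical links.
PHYSICAL_PATH = {
    ("S", "P1"): ["S-R1", "R1-P1"],
    ("S", "P2"): ["S-R1", "R1-R2", "R2-P2"],
    ("P1", "P3"): ["R1-P1", "R1-R2", "R2-P3"],
}


def deliver(source):
    """Forward one packet down the overlay; return receivers and link loads."""
    received, link_load = [], Counter()
    pending = [source]
    while pending:
        peer = pending.pop(0)
        for child in OVERLAY_CHILDREN[peer]:
            link_load.update(PHYSICAL_PATH[(peer, child)])  # unicast hop
            received.append(child)
            pending.append(child)  # the child acts as a router in turn
    return received, link_load


received, link_load = deliver("S")
print(received)
print({link: n for link, n in link_load.items() if n > 1})
```

All peers receive the packet using only unicast hops, but some physical links carry it more than once, which is the price ALM pays for not relying on router support.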



Figure 2.6: Application Layer Multicast

2.4.3 Application Layer Multicast Considerations

Application layer multicast is flexible and easy to deploy, but its performance depends on its design, and it comes with some issues that need to be addressed. The common goal of all ALM methods is obtaining an efficient and robust overlay multicast tree. However, the criteria for the effectiveness of the overlay multicast tree can vary depending on the application goals. We focus on live multimedia streaming over a peer-to-peer (P2P) network.

Live multimedia streaming is a real-time application that requires minimal delay. Delay is defined as the time needed for a packet to reach its destination. Data should traverse the minimum path while being transferred from source to destination. The end-to-end delay might be too long due to a high number of intermediate nodes and an inappropriate overlay structure. The overlay tree design should provide the minimum possible delay for each node. In the construction phase, connecting close peers to each other is a general approach and a promising solution, but locating peers correctly is a major problem.
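To see how intermediate nodes add up, the end-to-end delay of a peer can be computed by summing the delays of the overlay links on its path from the source. The tree and the delay values in this sketch are made up for illustration.

```python
# Hypothetical overlay tree (child -> parent) and one-way link delays in ms.
PARENT = {"A": "S", "B": "A", "C": "B", "D": "S"}
LINK_DELAY_MS = {("S", "A"): 10, ("A", "B"): 15, ("B", "C"): 20, ("S", "D"): 40}


def overlay_delay(node):
    """Return (delay in ms, hop count) from the source to `node`."""
    delay, hops = 0, 0
    while node in PARENT:
        parent = PARENT[node]
        delay += LINK_DELAY_MS[(parent, node)]
        hops += 1
        node = parent
    return delay, hops


for peer in sorted(PARENT):
    print(peer, overlay_delay(peer))
```

Here peer C sits three overlay hops from the source and accumulates the delay of every hop, whereas D is one hop away but behind a slow link, so both depth and link quality matter when choosing a parent.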

Another challenge to be addressed in a P2P network is the ad hoc behavior of the members of the overlay tree. Since most P2P systems do not have membership requirements, peers might join and leave at any time. This behavior, known as churn, makes tree maintenance harder. An ungraceful exit by a node may interrupt data reception at its descendants. When such ungraceful exits happen, the orphan nodes have to recover quickly. Long and frequent data outages are not acceptable for real-time applications, and thus robustness against churn is of crucial importance for overlay multicasting. Different approaches have been used to address this problem, such as buffering, local repairs in case of churn, alternative potential parents for quick reconnection, and redundant links to the source through different paths.

Moreover, a large volume of data is transmitted in multimedia applications, which requires avoiding redundant transmission of the multicast traffic. The client/server model is not feasible for these applications because data has to be sent to each receiver separately, which consumes bandwidth and server power. IP multicast is the best solution from this perspective, if we disregard the shortcomings that hinder its deployment on the Internet, because it prevents duplicate transmissions. ALM cannot solve this problem as optimally as IP multicast: by its nature, redundant transmission has to occur on some links. However, it can be minimized with an appropriate overlay design.

One of the drawbacks of ALMs is their lack of knowledge about the underlying network structure, which makes it hard to construct efficient multicast data paths. This can be mitigated by taking measurements. The most commonly used technique is to measure the distance between peers in terms of round-trip time (RTT). Geolocation techniques, which estimate the geographical location of an IP address, and topology maps can also be used to overcome this problem.

In an overlay multicast protocol, peers have to exchange some messages. The overlay needs these messages for initialization and extension of the tree, to accommodate changes in the underlying network, and to recover from node failures. Minimizing the control messaging overhead is important to the scalability of such a system; an exponential increase in control messages constrains scalability.

A peer in the overlay serves its children while being served by its parent. A peer can serve only a limited number of children, depending on its outgoing bandwidth. Because of this limitation, it may not be possible to build the optimal overlay tree. There are even some nodes that do not contribute at all, which we call free riders. An enforcement mechanism can be developed against free riders.

Another major issue that a multicast protocol should address is the scalability of the system. One of the shortcomings of IP multicast is its inability to scale. IP multicast has to track state information at routers and update it over time, which gets harder with larger group sizes. ALM can scale better since there is no centralized management.

In an overlay tree, it is advantageous for a node to be close to the source since it experiences lower delay. Also, the reliability and consistency of nodes at higher levels of the tree are more important. An incentive mechanism can be used to move upward those nodes that are reliable in terms of consistency and contribution to the tree. This yields a more stable tree while rewarding trustworthy peers.

All of the matters mentioned in this section can be used to improve the performance of an ALM protocol. However, we focus on the tree construction mechanism and the quick recovery method, which we explain in detail later.

2.4.4 Previous Studies on Peer-to-Peer Multicast

Since the emergence of the ALM concept, numerous algorithms have been proposed that use different techniques to achieve a successful overlay structure for live video streaming. Overlay network construction techniques can be classified into two main categories according to their structure [8]: mesh-based and tree-based.

In the mesh-based approach, nodes either join multiple disjoint trees (e.g., SplitStream [33] and ChunkySpread [63]) or choose a set of neighbors to create a mesh topology (e.g., CoolStreaming [64], Narada [67]). This approach is known as a pull-based mechanism. The main advantage of the mesh-based approach is its robustness to churn, but it is more expensive to maintain due to its higher control overhead, which also limits scalability. This approach is not satisfactory in terms of network resource usage and is wasteful in its use of the underlying network bandwidth.

In the tree-based approach (e.g., BTP [24], HMTP [7] and Yoid [18]), nodes are organized in a tree structure rooted at the source. Nodes have a parent-child relationship. When a node receives a packet from its parent, it forwards the packet to its children; this is called a push-based mechanism. The tree is extended when a new node joins the group. The tree-based approach is efficient in terms of avoiding redundant data transmission, but when a node leaves, its descendants suffer from data outage. The tree must be repaired quickly in this case, so in this respect it is not robust to churn. Another disadvantage is that while interior nodes are busy forwarding data, leaf nodes stay idle, which is unfair to members. The tree-based approach is scalable and can be used for large groups.

2.4.5 Available P2PTV Applications

Before investigating previous application layer multicast protocols, we explore some commercialized P2PTV applications.

P2PTV is peer-to-peer television, which uses the overlay multicast concept to distribute TV channels or special events to receivers. Some P2PTV applications allow users to broadcast their own video, either self-produced or obtained from a video.

Example applications are PPStream [49], PeerCast [44], [68], LiveStation [32], SOPCast [57], PPLive [47], TVAnts [58] and UUSee [60]. PPStream, announced in 2005, is the first video delivery application on the Internet. It can be used as a browser plugin or a desktop application. PeerCast was announced in 2002 in Japan for .

Since these systems are designed for commercial purposes, their internal designs are unknown. Some measurement studies have been performed to understand their behavior. Delia [11] looks at PPLive and Joost [28], while Shahzad [51] compares SOPCast and PPLive. According to the findings presented in [51], there is no consistently effective method for choosing the parent: a test node in North America often downloads data from Asia. This means that P2P systems have not been able to deploy a consistent method to construct the overlay tree. Also, the systems depend crucially on a few reliable nodes, which causes unfairness in the system. The findings show that P2PTV requires more work to become more efficient from both the network's and the users' perspective. In the rest of this chapter, we investigate some previously proposed ALM methods.

2.4.6 Banana Tree Protocol (BTP)

Banana Tree Protocol is a tree-based overlay multicast protocol. It can be viewed as one of the simplest protocols. Each host in the tree is called a node. The node that created the tree is the root and has no parent. Every other node has a parent, which is the next node on its path to the root. Children of the same node are siblings of each other. To join the overlay tree, a node first connects to the root of the tree. Then it switches to a closer node that was previously its sibling. The purpose of the parent switch is to optimize the path. Figure 2.7 shows a sibling switch, in which A switches from R to B. For data dissemination, when a node receives data it forwards the data to its children. To avoid loops, a node cannot switch to one of its descendants, and a potential new parent should not attempt to change its own parent at the same time, which might also cause a loop.

Figure 2.7: Sibling switch in BTP
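The two loop-avoidance rules for a parent switch can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `BananaTree` class, its `switching` set and the method names are hypothetical and are not taken from the BTP specification.

```python
class BananaTree:
    """Toy tree that stores only parent pointers; the root maps to None."""

    def __init__(self, root):
        self.parent = {root: None}
        self.switching = set()  # nodes that are currently changing their own parent

    def add(self, node, parent):
        self.parent[node] = parent

    def is_descendant(self, node, ancestor):
        # Walk up from `node`; True if the walk passes through `ancestor`.
        current = self.parent[node]
        while current is not None:
            if current == ancestor:
                return True
            current = self.parent[current]
        return False

    def switch_parent(self, node, new_parent):
        # Rule 1: a node must not attach below itself or one of its descendants.
        if new_parent == node or self.is_descendant(new_parent, node):
            return False
        # Rule 2: the candidate parent must not be in the middle of its own switch.
        if new_parent in self.switching:
            return False
        self.parent[node] = new_parent
        return True


tree = BananaTree("R")
tree.add("A", "R")
tree.add("B", "R")
tree.add("C", "A")
print(tree.switch_parent("A", "B"))  # sibling switch as in Figure 2.7
print(tree.switch_parent("A", "C"))  # refused, because C is a descendant of A
```

How a real implementation learns that a candidate parent is mid-switch is a protocol detail this sketch does not model; here the caller would add and remove entries in `switching`.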

2.4.7 Host Multicast Tree Protocol (HMTP)

HMTP is an overlay multicast protocol that interconnects IP-multicast-enabled islands. If IP multicast is available in a subnet, one node is selected as the head to join the overlay tree, and IP multicast is used within the subnet. The key idea in HMTP is to connect nearby peers. When a new peer wants to join, it contacts the source and gets the list of its children. By probing each child, it finds the child closest to itself in terms of delay. It then repeats the same process with that closest child. This iterative process continues until the best potential parent is found. HMTP also applies a tree refinement process: each node periodically selects a random peer on its root path and checks whether a peer closer than its current parent has connected in the meantime. HMTP aims to reduce routing inefficiency. It also proposes a foster child concept to shorten startup time: a node connects to the root at the beginning to start the stream immediately, then jumps to the ideal parent once it is found. A simple join procedure for HMTP is given in Figure 2.8. H is a newcomer that is going to join the tree. It first contacts A, then finds that D is the closest among the children of A. Continuing from D, H selects F as the node to connect to. If F does not have enough outbound bandwidth to accept a new connection, H flags F, goes back to D, and looks for the next available child. HMTP also performs triangle optimization to use the shortest path between peers.

Figure 2.8: Join Procedure for HMTP
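The iterative descent with backtracking described above can be sketched as follows. This is a simplified illustration, not the HMTP implementation: the function name `hmtp_join`, the data layout, the RTT values and the stopping rule (descend only while some child is closer than the current node) are assumptions made for the example.

```python
def hmtp_join(newcomer, root, children, rtt, degree_limit):
    """Return the parent chosen for `newcomer`, or None if every candidate is full."""
    flagged = set()   # nodes that turned out to be full
    path = [root]     # nodes visited on the way down, kept for backtracking
    while path:
        current = path[-1]
        candidates = [c for c in children.get(current, []) if c not in flagged]
        closest = min(candidates, key=lambda c: rtt(newcomer, c), default=None)
        # Assumed stopping rule: keep descending while a child is closer than `current`.
        if closest is not None and rtt(newcomer, closest) < rtt(newcomer, current):
            path.append(closest)
            continue
        if len(children.get(current, [])) < degree_limit[current]:
            return current
        flagged.add(current)  # full: flag it and go back one level
        path.pop()
    return None


# Hypothetical tree loosely following Figure 2.8: A is the root, D has children E and F.
children = {"A": ["B", "C", "D"], "D": ["E", "F"]}
rtt_to_h = {"A": 50, "B": 40, "C": 45, "D": 20, "E": 25, "F": 10}  # made-up RTTs in ms


def rtt(newcomer, node):
    return rtt_to_h[node]


limits = {name: 3 for name in rtt_to_h}
print(hmtp_join("H", "A", children, rtt, limits))               # F has room
print(hmtp_join("H", "A", children, rtt, {**limits, "F": 0}))   # F is full, H falls back
```

In the second call F is flagged as full and H backs up to D, which matches the flag-and-return behaviour described for Figure 2.8; which node H picks after backtracking depends on the assumed stopping rule.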

2.4.8 SplitStream

SplitStream is a tree-based multicast protocol that uses multiple trees to disseminate data. The key idea is to divide the stream into k stripes, each delivered through a different tree. The base stream is encoded in these stripes such that the content can be reconstructed from any subset of stripes. One stripe is enough to watch the video, but receiving more stripes increases the video quality. In this way, a peer continues to receive the stream even if one tree is broken. The received video quality drops during this time, but the user does not experience an interruption. A peer may receive only a subset of the stripes, so nodes with different bandwidths can join the network. Another goal of SplitStream is to balance the load among all peers. A peer that is an interior node in one tree has to be a leaf in the other trees. No peer is a free rider, so all peers contribute. SplitStream is highly robust to failures, since a failure in one tree causes the loss of only one stripe until the tree is repaired. However, it is not scalable and imposes high overhead. An illustration of SplitStream is given in Figure 2.9. In this simple example, the main content is split into two stripes, and each stripe is transmitted through an independent multicast tree.

Figure 2.9: SplitStream

2.4.9 NICE

NICE is also a tree-based protocol for application layer multicast. The key idea is that members are arranged hierarchically. NICE assigns peers to different layers. Topologically close members constitute a cluster at each layer. A host can be in only one cluster at a given layer. One member, usually a central one, is selected as the head of the cluster. The heads of the clusters make up another layer on top of the bottom layer. Figure 2.10 shows the hierarchical structure of nodes: the cluster leaders of layer 0 form layer 1, and the cluster leaders of layer 1 form layer 2. The number of layers depends on the total number of members in the multicast tree. The cluster size is bounded above and below by constants. One goal of this organization is to decrease the number of queries when a new host joins.

The paper does not mention a degree bound for members, which is a major problem in the heterogeneous environment of overlay multicast. In particular, cluster leaders should be selected from among hosts that have higher degrees. When a node wants to join, it contacts the source. The source responds with all members of the highest layer. The newcomer contacts all of these members to find the one closest to itself. After this iterative process, the new host is mapped to a cluster in a lower layer. Neighbors exchange refresh messages among themselves to adjust to topology changes. The leaders of clusters might change over time.




Figure 2.10: NICE

2.4.10 Narada

Narada is one of the earliest proposed application layer multicast methods. Narada uses a mesh-based approach, i.e., a highly connected graph between nodes. Narada has a designated node that initiates the join process for a newly arriving peer. When a new member wants to join, it contacts this designated node and gets a list of children. It selects a random subset of this list and attempts to connect. This is repeated until at least one existing member accepts the connection request. Once the new node has joined, it starts exchanging messages with its neighbors. Narada implements a periodic refinement process to improve tree performance: some new edges are added and some links are removed over time. When a node leaves, it notifies its neighbors, and this information is propagated among neighbors. The departure of a node may cause the tree to be partitioned. Such a partition should be detected and repaired by the existing peers. Narada is robust against node failures because of the high connectivity among peers. In Narada, all nodes keep the state of the other nodes, which leads to relatively high control overhead. Therefore, Narada is effective only for small groups; it does not scale well to larger sizes.

Figure 2.11: Narada

2.4.11 Topology-Aware Node Selection

In [55], the authors propose a scheme that assumes a set of well-known nodes around the globe, called landmarks, which are known to the application nodes. They also define regions called bins, in which nodes are close to each other, and the landmarks help nodes find a bin to join. Round-trip time (RTT) is used as the distance metric. Nodes only know their distances to the landmarks; they do not need inter-landmark distances. To join, instead of issuing iterative queries as in Host Multicast, a newcomer measures its distance to these landmarks and decides where to join the tree. The authors incorporate this method into some existing overlay multicast protocols and test the binning strategy. They also analyze how the number of landmarks affects the performance of the proposed scheme.
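A minimal sketch of landmark-based binning is given below. The text above does not say how a bin is derived from the measurements, so the sketch assumes one simple rule: a node's bin is the ordering of the landmarks from nearest to farthest. The function names, landmark labels and RTT values are hypothetical.

```python
def landmark_bin(rtts_ms):
    """Assumed binning rule: landmarks ordered by increasing RTT, as a tuple."""
    return tuple(sorted(rtts_ms, key=rtts_ms.get))


def group_by_bin(measurements):
    """Group node names by the bin derived from their landmark RTTs."""
    bins = {}
    for node, rtts in measurements.items():
        bins.setdefault(landmark_bin(rtts), []).append(node)
    return bins


# Made-up RTT measurements (ms) from three nodes to three landmarks.
measurements = {
    "n1": {"L1": 10, "L2": 80, "L3": 150},
    "n2": {"L1": 15, "L2": 70, "L3": 160},
    "n3": {"L1": 140, "L2": 90, "L3": 20},
}
bins = group_by_bin(measurements)
for ordering, members in bins.items():
    print(ordering, members)
```

Each node needs only its own RTTs to the landmarks, which is why no inter-landmark or inter-node measurements appear in the sketch.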

2.4.12 Content Addressable Network (CAN)

CAN [54] is a mesh-based structured overlay multicast protocol designed to support large group sizes. CAN uses a virtual coordinate system in which each node is assigned a zone of the coordinate space using a hash-like function. Each node owns a deterministic region of the coordinate system. Nodes self-organize into an overlay structure. A node keeps track of its neighbors' IP addresses and coordinate zones. For example, in Figure 2.12, Node5 is a neighbor of Node1, but Node6 is not, because they do not share a common edge.

Two nodes are considered neighbors if their coordinate spans overlap along all but one dimension and abut along the remaining one. A node forwards a message by simply sending it to the neighbor closest to the destination. Since there are many routes between two points, if a neighbor crashes, a node can receive messages through the next best route.

To join, the new node must first find a node that is already in the overlay. Then the new node selects a point P in the space and sends a join request to the owner of this point. The join request is routed using the CAN routing mechanism. When the owner of P accepts the join request, the zone that includes P is split, and the neighbors are notified about the join. The join of Node7 is illustrated in Figure 2.12.

   

      

  

Figure 2.12: Join in CAN
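The neighbor test and the zone split performed on a join can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Zone` class, splitting along the longest side, and giving the newcomer the half that contains P are illustrative choices, not details taken from the CAN design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    lo: tuple  # lower corner of the zone
    hi: tuple  # upper corner of the zone

    def contains(self, point):
        return all(l <= p < h for l, p, h in zip(self.lo, point, self.hi))

    def split(self):
        """Halve the zone along its longest side; return (lower_half, upper_half)."""
        sides = [h - l for l, h in zip(self.lo, self.hi)]
        axis = sides.index(max(sides))
        mid = (self.lo[axis] + self.hi[axis]) / 2
        lower_hi = tuple(mid if i == axis else h for i, h in enumerate(self.hi))
        upper_lo = tuple(mid if i == axis else l for i, l in enumerate(self.lo))
        return Zone(self.lo, lower_hi), Zone(upper_lo, self.hi)


def are_neighbors(a, b):
    """True if the zones abut along exactly one dimension and overlap along the rest."""
    abutting = 0
    for a_lo, a_hi, b_lo, b_hi in zip(a.lo, a.hi, b.lo, b.hi):
        if a_hi == b_lo or b_hi == a_lo:
            abutting += 1
        elif not (a_lo < b_hi and b_lo < a_hi):
            return False
    return abutting == 1


# Node7 joins: it picks a point P, and the owner of P splits its zone.
zones = {"Node1": Zone((0.0, 0.0), (1.0, 1.0))}
p = (0.75, 0.25)
owner = next(name for name, zone in zones.items() if zone.contains(p))
lower, upper = zones[owner].split()
new_zone, kept = (lower, upper) if lower.contains(p) else (upper, lower)
zones[owner], zones["Node7"] = kept, new_zone
print(are_neighbors(zones["Node1"], zones["Node7"]))
```

Zones that touch only at a corner abut along two dimensions, so `are_neighbors` rejects them, which corresponds to the "no common edge" situation of Node1 and Node6 in the text.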

Chapter 3

Virtual Direction Multicast

As mentioned in Chapter II, there are different ways to deliver content to users. IP multicast can be implemented in the lower layers of the protocol stack with the help of Internet Service Providers (ISPs). Unicast was mentioned as a primitive solution. Leaving the pros and cons of these techniques to the previous chapter, we focus here on our proposed overlay technique for content delivery.

Overlay multicast is an application layer technique that establishes a virtual network by connecting end hosts using logical links. Different dynamics play a key role in designing an overlay tree in such a virtual network. Our goal is to build a virtual network that stays close to the underlying physical network. Even though our ultimate goal is not to find the Minimum Spanning Tree (MST) for the overlay tree, we try to converge to the MST as much as possible by using local and simple methods that are practical to implement. One reason to use overlay multicast instead of unicast is to relieve the network of redundant traffic. So, establishing a multicast tree close to the MST is important, but it is not the only or the ultimate goal. The overhead messages for constructing the tree should not overwhelm the system, which would defeat the real purpose of the overlay. On the other hand, the system design should take into account other performance factors that are important to users.

Virtual Direction Multicast (VDM) [37] is an overlay multicast algorithm. It builds a multicast tree by establishing parent-child relationships between nodes that are determined to be in the same virtual direction based on the performance of the connections between them. VDM uses a single tree for multicast. Each node has only one parent, but may have more than one child.

VDM aims to use resources more efficiently by building the multicast tree from inter-peer distances measured in a one-dimensional directional abstraction and by using fewer maintenance messages. It uses a decentralized method for tree construction: each peer contacts the source at the beginning and finds a proper node to connect to. VDM also addresses user expectations by trying to reduce startup time and reconnection time.

3.1 Protocol Description

3.1.1 Key Design Considerations

In an overlay network, converging to the Minimum Spanning Tree (MST) while observing other multicast requirements should result in better performance in terms of resource utilization and overall multicast quality. An overlay network is a degree-constrained environment: each node can support only a certain number of outgoing links, and thus the multicast tree must be constructed within this degree constraint. The Degree-Constrained Minimum Spanning Tree (DCMST) problem is known to be NP-hard [20]. Additionally, in a peer-to-peer network environment, the overlay tree changes constantly because of arriving and departing nodes. Moreover, network dynamics change the path performance between nodes and may require reconstruction of the overlay tree for better performance.

Taking all of these into account, calculating the global MST is expensive and difficult. Staying close to the MST using simple methods while satisfying the other requirements is a better choice. By using VDM, which calculates virtual distances between the overlay nodes in a 1-D space, we try to converge to the MST in general. To achieve this "directional" abstraction, we define three succinct cases that are explained later.

A key design component of VDM is directionality. We locate a newly joining peer relative to existing peers with an iterative process using this concept of directionality. We take the peers three at a time and estimate the location of the new peer relative to the existing peers by comparing inter-peer distances.

Another design component of VDM is the capability of virtualizing the underlying network in different ways. It is possible to establish "virtual directions" based on performance metrics such as delay, loss or bandwidth.

In environments like P2P networks, churn is a major issue. When peers leave or join frequently, the performance of the protocol depends heavily on being able to switch swiftly to a new tree. Re-evaluating the overall multicast tree requires a centralized approach and is typically not possible within the very short period of time available for switchover. Our directionality-based procedure is completely distributed and can quickly establish a new, well-performing tree through local repair.

3.1.2 Virtual Directionality on a Line

VDM exploits virtual directions when peers are joining or leaving the system. Suppose that there is a source (S) and an existing node (E) that are already in the overlay network, and a new node (N) that is going to join the overlay tree. We measure the distances among these three nodes. In Figure 3.1, S and E, which are already in the overlay network, keep their positions while N can be in three different locations. Depending on the position of N, the three nodes can form three combinations in a linear representation, i.e., a line.

Figure 3.1: Directionality Concept.

Distances d1, d2 and d3 are round-trip times (RTTs) measured by probing. In practice, the longest distance is generally not equal to the sum of the two shorter distances, although they appear equal in the linear representation. We look at the longest distance to determine which case the combination falls into. There are three cases.

Case I: S is between N and E. N should be connected to S. Figure 3.2 represents this case at the router level. The numbers represent relative link delays. The dashed arrow shows the new stream after N gets connected.

Figure 3.2: CaseI.

Case II: N comes in between S and E. N becomes a child of S and the parent of E. In Figure 3.3, the dotted line shows the old stream, which is removed when the new connection, shown as a dashed line, is established.

Figure 3.3: CaseII.

Case III: E is in between S and N. N is connected to E. The dashed line in Figure 3.4 is the new connection.

Figure 3.4: CaseIII.

With this technique, we aim to minimize both the transmission of multiple copies of a packet on the same link and resource usage in the network. If multiple copies have to traverse a link, we try to find the shortest possible one to minimize network usage. Figure 3.5 further illustrates the last case. N is joining the network. Either N will connect to S or E, or E will connect to N. The best solution is to connect N to E, which gives minimum stress and stretch.

Figure 3.5: Another illustration for CaseIII.
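The three cases follow directly from which of the three measured distances is the longest, since on a line the longest distance connects the two outer nodes and the third node lies between them. The sketch below illustrates this rule. The function name and the tie-breaking order are assumptions, and the parameters are named by their endpoints because the mapping of d1, d2 and d3 to node pairs is not stated in the text.

```python
def classify(d_sn, d_se, d_ne):
    """Return the VDM case for source S, existing node E and new node N.

    d_sn, d_se and d_ne are the measured distances (e.g. RTTs) S-N, S-E and N-E.
    Ties are resolved in favour of Case I, then Case II (an assumption).
    """
    longest = max(d_sn, d_se, d_ne)
    if longest == d_ne:
        return "I"    # N and E are the outer nodes, so S is in the middle
    if longest == d_se:
        return "II"   # S and E are the outer nodes, so N is in the middle
    return "III"      # S and N are the outer nodes, so E is in the middle


# Made-up RTTs in ms; note that the longest one need not equal the sum of the others.
print(classify(d_sn=10, d_se=30, d_ne=38))  # S in the middle
print(classify(d_sn=12, d_se=30, d_ne=20))  # N in the middle
print(classify(d_sn=40, d_se=15, d_ne=27))  # E in the middle
```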

3.2 Join Process

A pseudo-code for the join procedure is given in Figure 3.6. Nodes store some state information to run the protocol. Each node keeps a list of its children and the distances to them. Each node also knows its parent and grandparent. When a node wants to join the overlay tree, it sends a connection query to the source. It gets the list of the source's children and learns the RTT to each child by probing. During a join process, we first check whether Case II or Case III exists among the parent, an existing child and the new node. We may find only one of these cases, multiple instances of the same case, or both of them together. If neither CaseII nor CaseIII is found, the new node is not in the same direction as any of the existing children in this iteration. Then, it connects to the currently queried node if that node has a free degree (or outgoing interface). Otherwise, it connects to the closest free child.

If we encounter CaseIII, we proceed to the next iteration from that child and repeat the same procedure. If we have CaseIII with more than one child at the same time, we select the closest one.

If we find CaseII, which means the new node is between two existing nodes (the parent and the currently checked child), then the proper connections are made and the join process is done. In some cases we might have CaseII with more than one existing child. Then, we make connections as long as the degree of the new node allows. Since every node stores grandparent information to use in case of parent failure, the grandparent information of the existing child's children should be updated.

If we find CaseII and CaseIII together, we continue with CaseIII, selecting the closest child if more than one CaseIII exists.

    S = source
    N = new node
    Join
      Contact(S)
      while N is not connected
        N pings S and all children of S
        D(n) = find the directional nodes
        if D(n) is not empty                  (if CaseII or CaseIII exist)
          if D(1..n) is between S and N       (All directional nodes in CaseIII)
            S = closest(D(n))                 (Select closest of CaseIII)
            Contact(S)                        (Continue from closest one)
          end if
          if N is between S and D(1..n)       (All directional nodes in CaseII)
            for D(1..n)
              if N has free degree
                S becomes parent of N
                N becomes parent of D(i)
                Update grandparent of D(i)'s children
              end if
            end for
          end if
          if CaseII and CaseIII together
            S = closest(D(n))                 (Select closest of CaseIII)
            Contact(S)                        (Continue from closest one)
          end if
        else                                  (Case I)
          if S has free degree
            N connects to S
          else
            N connects to closest free child
          end if
        end if
      end while

Figure 3.6: Join procedure.

When N finds the correct node, it connects and joins the session. A node can accept connections up to its maximum degree, which we call the "degree limit". Each node has a pre-defined degree limit. We assume that the degree limit of each node is at least one. If the potential parent that the new node has decided to connect to has reached its degree limit, the new node connects to that parent's closest child that can accept a connection without exceeding its degree limit.
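The join procedure can also be read as a small runnable sketch. The Python below is an illustration under stated assumptions, not the VDM implementation: the `Node` class, the helper names, and the use of positions on a line as a stand-in for probed RTTs are all hypothetical, and the grandparent is derived from `parent.parent` instead of being stored and updated.

```python
class Node:
    def __init__(self, name, pos, degree_limit=2):
        self.name = name
        self.pos = pos                    # toy 1-D coordinate, used only by dist()
        self.degree_limit = degree_limit  # assumed to be at least one
        self.parent = None
        self.children = []

    def has_free_degree(self):
        return len(self.children) < self.degree_limit


def dist(a, b):
    """Stand-in for a probed RTT: distance between two nodes on a line."""
    return abs(a.pos - b.pos)


def direction_case(d_pn, d_pc, d_nc):
    """Classify queried node P, its child C and new node N by the longest distance."""
    longest = max(d_pn, d_pc, d_nc)
    if longest == d_nc:
        return "I"     # P lies between N and C
    if longest == d_pc:
        return "II"    # N lies between P and C
    return "III"       # C lies between P and N


def attach(parent, child):
    if child.parent is not None:
        child.parent.children.remove(child)
    child.parent = parent
    parent.children.append(child)


def join(source, new):
    """Attach `new` to the tree rooted at `source` and return the chosen parent."""
    current = source
    while True:
        kids = list(current.children)
        case = {c.name: direction_case(dist(current, new), dist(current, c), dist(new, c))
                for c in kids}
        case3 = [c for c in kids if case[c.name] == "III"]
        case2 = [c for c in kids if case[c.name] == "II"]
        if case3:  # CaseIII, also preferred when CaseII and CaseIII occur together
            current = min(case3, key=lambda c: dist(new, c))
            continue
        if case2:  # CaseII: the new node is inserted between current and its children
            for c in sorted(case2, key=lambda c: dist(new, c)):
                if new.has_free_degree():
                    attach(new, c)
            attach(current, new)
            return current
        if current.has_free_degree():  # CaseI
            attach(current, new)
            return current
        free = [c for c in kids if c.has_free_degree()]
        if not free:  # situation not covered by the text
            raise RuntimeError("no free slot near " + current.name)
        parent = min(free, key=lambda c: dist(new, c))
        attach(parent, new)
        return parent


# Tree S -> C1 -> C2 on a line; N arrives between C1 and C2 (CaseIII, then CaseII).
s, c1, c2 = Node("S", 0), Node("C1", 10), Node("C2", 20)
attach(s, c1)
attach(c1, c2)
n = Node("N", 15)
print(join(s, n).name, c2.parent.name)
```

With real RTTs the `dist` function would be replaced by probing, and ties between distances would need an explicit rule; the sketch resolves them in favour of CaseI, then CaseII.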

3.2.1 Join Examples

The use of the defined cases in the join process is presented in this section. Assume that we have an existing tree with several children, as in Figure 3.7. S denotes the source and the others are children in the tree. The children are already connected. Based on this tree, some examples are given. We try to illustrate all cases to make the join process easier to understand.

Figure 3.7: A sample overlay tree with source and some children.

Example I

N in Figure 3.8 is going to join the existing tree. N contacts the source by sending an information request message. The source replies with an information response that contains its children list and the distances to them. N queries all children to get distance information. When N starts receiving information response messages from the children, it starts to check the described cases. In this example, N is not in the same direction as any of the children. Since neither CaseII nor CaseIII holds for any child, CaseI is valid. So, N sends a connection request message to S. S replies with a connection response message, which says yes if S is able to accept a new connection. Otherwise, it returns the closest child for N to connect to.

Figure 3.8: A join example illustrating CaseI.

Example II

In this example, another node N, in a different position, wants to join the overlay tree in Figure 3.9. N contacts the source and gets information about all of its children. After contacting every child of the source, it finds that C1 is in the same direction as itself, which falls into CaseIII. It repeats the same process from C1. C1 returns its children list, which is empty, so CaseI is valid. N sends a connection request to C1.

Figure 3.9: A join example illustrating CaseIII and CaseI sequentially.

Example III

   

 



Figure 3.10: A join example illustrating CaseIII and CaseII sequentially.

N in Figure 3.10 attempts to join the multicast session. Again, it first gets the children list by sending an information request message to the source. N does not have any information about C2 at the beginning. It detects C1, which falls into CaseIII. After that, N continues its join process through C1. It gets the children list from C1 and receives distance information from C1's children. While checking these children, it sees that N is between C1 and C2, which is CaseII. So, N connects to C1 while C2 changes its parent from C1 to N. When the proper connections are made, the connection process is done. The final tree is shown in Figure 3.11.

Figure 3.11: A join example illustrating CaseIII and CaseII.

3.2.2 More Complex Join Examples

Up to this point, we have presented simple examples that include only one case at a time to explain the basics of the join process. We need some more scenarios to understand situations where the same case, or two cases (CaseII and CaseIII), arise with different children.

In Figure 3.12, we show these situations. In the first scenario, we have CaseII with two different children. A similar situation occurs in Scenario II, where CaseIII exists for both children; N can continue the join process with one of the children. In the third scenario, we have CaseIII with C1 and CaseII with C2. We look at each scenario one by one and seek the best solution.

Figure 3.12: Different join scenarios.

Scenario I

In this situation, N detects CaseII with two children. N selects C1 or C2 as a child and P becomes its parent. Then, if N has a free degree, the other child also connects to N. In this way we get the best solution in terms of the local MST. Normally, to obtain this solution, we would need to know the distances among all nodes. In this case, we know all distances except C1-C2. To know the distance between C1 and C2, we would either have to store this information (distance to sibling) and update it when necessary, or make another measurement among the siblings under P.

Figure 3.13: CaseII is existing with two different children in the same iteration.

Nodes in the overlay join at different times. We measure the distance C1-C2 when C1 or C2 is joining the tree, and use this information to make the connections. After that, however, we do not save the distance information. The first choice, storing this information, increases the amount of state that we have to maintain. This information has to be updated every time siblings arrive or leave. All the siblings of a node might change with a parent change, which occurs if a node gets connected through CaseII in between a child and its parent. This complicates maintaining this state. The second choice, a new measurement among siblings, increases the overhead of the system: for each join process, siblings would have to communicate with each other.

In our proposed system, using the 1-D abstraction, we reach the desired tree without storing extra information or using redundant messaging.

Scenario II

In this situation, shown in Figure 3.14, N detects CaseIII with two children. We select the one that is closer to N and continue the join process through that node.

Figure 3.14: CaseIII is existing with two different children in the same iteration.

Scenario III

In this situation, shown in Figure 3.15, N has CaseIII with C1 and CaseII with C2. In this case, we prefer CaseIII and continue the join process from C1.

Figure 3.15: CaseII is existing with one child and CaseIII is with another child in the same iteration.

The scenario in Figure 3.15 misses the local MST. This problem can be resolved by adding more rules to the design. If we did so, however, C3 in Figure 3.16, if there is such a node, would not be able to find C2, which might cause other problems. We do not want the directions to diverge too much from their original definition. As a result, we intentionally leave Scenario III as it is.

Figure 3.16: C3 is not able to find C2 directionality divergence.

Scenario IV

Figure 3.17: N needs to contact grandchildren of P to find C2.

Another example in which a node may not find the closest node is presented here. In Figure 3.17, the best potential parent for N is C2. But when N checks P's children, it sees C3 as a directional node and misses C2. This situation can be prevented only by contacting the grandchildren of P, which increases overhead and join time.

3.2.3 Complexity Analysis

In this part, we analyze the complexity of our join algorithm. A node that wants to join the session first contacts the source and gets information about all of its children. Based on this information, it determines a direction to follow. It picks one child and repeats the same procedure through that child if needed. This process is repeated until the best potential parent is found. For our analysis, we assume that every node has the same degree and that the tree is balanced. Let n be the node degree of a peer, N the total number of nodes and a the tree depth. The relationship between the number of nodes and the other parameters can be expressed as:

$$N = n^{a} \tag{3.1}$$

Then, the tree depth will be

$$a = \log_{n}(N) \tag{3.2}$$

which is in the order of O(log N).

In the worst case, when the node joins the tree at a leaf, the number of nodes it has to contact (A) will be

$$A = n \cdot \log_{n}(N) \tag{3.3}$$

For a fixed degree n, this is in the order of O(log N).

So, the complexity of the join algorithm is in the order of O(log N), which provides scalability and short connection times for nodes.
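To give a feel for the numbers, the short sketch below evaluates Equations 3.2 and 3.3 for a few group sizes. The function name and the chosen values of N and n are illustrative assumptions; the formulas are the ones derived above for a balanced tree with uniform degree.

```python
import math


def join_cost(total_nodes, degree):
    """Return (tree depth a, worst-case contacts A) for a balanced tree."""
    depth = math.log(total_nodes, degree)  # a = log_n(N), Eq. 3.2
    contacts = degree * depth              # A = n * log_n(N), Eq. 3.3
    return depth, contacts


# Group size grows by a factor of 64 per row, while the contact count grows by a constant.
for total in (64, 4096, 262144):
    depth, contacts = join_cost(total, degree=4)
    print(f"N={total}: depth ~ {depth:.1f}, contacts ~ {contacts:.1f}")
```

With degree 4, going from 64 to 262144 nodes triples the depth from 3 to 9, so the worst-case number of contacts rises from about 12 to about 36.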

3.3 Reconnection

An overlay network is an ad-hoc environment, particularly when it is a peer-to-peer overlay. The system depends on users who receive the stream through each other. Users are free to join and leave the system at any time. Even though some incentive and punishment mechanisms can be utilized to increase stability, ad-hocness (or churn) is still in the nature of the system. Our algorithm builds a tree to transmit the streaming traffic flow from the source to each user. In this structure, when a node leaves the session, some orphan nodes occur within the tree. These nodes have to find a new parent to continue receiving data. In VDM, a peer is required to inform its children when it is leaving. When an orphan child gets this leave message, it starts the join process at its grandparent.

O = Orphan Node
S = Source
G = Grandparent
P = Parent

Leave(P)
    Notify each child
......
Reconnect(O)
    if G is alive
        start join at G
    else
        start join at S
    end if

Figure 3.18: Reconnection Procedure

Since our algorithm uses a single tree, quick reconnection is important to avoid high loss. We start the reconnection process at the grandparent instead of the source to expedite it. In that sense, reconnection is basically the same as the join process except that it starts at the current grandparent. If both the parent and the grandparent leave at the same time, which should occur only rarely, the orphan node goes to the source for reconnection, which will still take a small amount of time due to the logarithmic complexity of the join process.

Since the reconnection starts at the grandparent, we expect it to be accomplished in a very short period of time compared to a regular join process.
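The choice of where an orphan restarts the join process can be sketched in Python as follows. The `Peer` class and its fields are hypothetical and exist only for this illustration; the sketch covers only the selection of the starting node, not the join algorithm itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    # Hypothetical minimal peer record for illustration.
    name: str
    alive: bool = True
    parent: Optional["Peer"] = None


def reconnection_start(orphan: Peer, source: Peer) -> Peer:
    """Return the node at which an orphan restarts the join process.

    The orphan still remembers its departed parent, and through it the
    grandparent. It starts at the grandparent if that node is alive,
    and falls back to the source otherwise.
    """
    old_parent = orphan.parent
    grandparent = old_parent.parent if old_parent is not None else None
    if grandparent is not None and grandparent.alive:
        return grandparent
    return source


if __name__ == "__main__":
    s = Peer("S")
    g = Peer("G", parent=s)
    p = Peer("P", parent=g)
    o = Peer("O", parent=p)
    p.alive = False  # the parent leaves the session
    print(reconnection_start(o, s).name)
```

Starting at the grandparent skips the upper levels of the tree, which is why reconnection is expected to take fewer steps than a join from the source.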





Figure 3.19: Orphan nodes start reconnection at the grandparent.

3.4 Refinement

E = Existing Node
S = Source
......
Refinement(E)
    while sleep(timeperiod)
        Join process at Source
        if found node is different
            change parent
        end if
    end while

Figure 3.20: Source Refinement.

Refinement is usually performed to adapt the tree to changing network conditions. Since the Internet is a dynamic environment, loss or delay on a path might change. In order to reduce the effect of these changes, a periodic rejoin process can be used. One advantage of overlay design is that the tree can be modified easily.

We apply a periodic refinement in our algorithm. An existing node repeats the join process, triggered by its own clock counter; if the newly found potential parent is different from the current parent, the node changes its parent. We set this refinement period to 3 minutes.
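A single refinement round can be sketched in Python as below. The join process is passed in as a callable, `find_best_parent`, which is a hypothetical stand-in for the join algorithm started at the source; the constant only records the 3-minute period stated above.

```python
from typing import Callable, Tuple

REFINEMENT_PERIOD_S = 180  # 3 minutes, as chosen in the text


def refine_once(current_parent: str,
                find_best_parent: Callable[[], str]) -> Tuple[str, bool]:
    """Run one refinement round.

    find_best_parent is an assumed placeholder for the join process started
    at the source. Returns (parent_after_round, parent_changed).
    """
    candidate = find_best_parent()
    if candidate != current_parent:
        return candidate, True
    return current_parent, False


if __name__ == "__main__":
    # Hypothetical outcome: the join process now prefers node "B".
    print(refine_once("A", lambda: "B"))
```

In a running agent this function would be called once every `REFINEMENT_PERIOD_S` seconds, for example from a timer.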

In our regular experiments, we don’t use the refinement component. In order to see the improvement that we can achieve with this component, we run additional experiments in Chapter V. We then compare the results obtained by running the algorithm with and without the refinement component.

3.5 VDM versus HMTP

Overlay multicast is a well-studied area. Researchers have developed numerous algorithms with different approaches, some of which we mentioned in Chapter II. All of these techniques have their own advantages and disadvantages. HMTP is one of these proposed techniques. Even though VDM was not inspired by HMTP, we found that the two are similar. So, it is important to make a one-to-one comparison between VDM and HMTP to see the differences and the advantages of VDM.

VDM and HMTP use different approaches when building the multicast tree. HMTP focuses on closeness while VDM utilizes the directionality concept. As we explained in Chapter II, HMTP uses an iterative process that utilizes a closeness factor. A node finds the closest node to attach to, and then the tree is tuned with a periodic tree refinement process. In our protocol, we try to detect nodes which appear in the same direction. We study some scenarios to clarify the difference further.

  

Figure 3.21: Scenario I showing the difference between VDM and HMTP.

Scenario I

Consider Figure 3.21. In phase 1, N comes to join the overlay. With HMTP, N connects to P first, then C finds N by sending a refinement message to its parent to see if there is a closer node. The overlay tree is formed as in phase 3. But, by using VDM we can directly detect the case and make the proper connections. The disadvantage of using HMTP here is that, since the tree is degree constrained, the desired connections may not be established. If P cannot accept N because of its degree limitation, the opportunity to connect these three nodes in the best manner will be missed. Another disadvantage of HMTP is that it has to use periodic tree refinement to be able to detect a closer child. Using our directionality concept, we directly connect these three nodes in the best way without any extra messaging.

Scenario II

  

  

Figure 3.22: Scenario II showing the difference between VDM and HMTP.

A similar situation exists for Scenario II. In this case, N finds C as the closest node. A U-turn problem occurs here. To solve this issue, HMTP explicitly looks at the distances among these three nodes. If the distance from N to P is longer than the distance from N to C, N connects to P so that C can find N in the refinement stage. This case also faces a degree limitation problem.

Refinement

VDM and HMTP have different refinement processes. HMTP selects one node on its path to the source, and starts the refinement process from that node. HMTP uses refinement to complete the join process: a node in HMTP has to do refinement to find a closer sibling, whereas VDM achieves the same thing without any refinement. This requirement imposes a high overhead on the system for HMTP. Our refinement aims to adapt the tree to the changing conditions of the Internet.

Overhead

As mentioned in several places, refinement is a part of the join process for HMTP. Every node has to check if there is a closer node. The time to converge to a better tree depends on the frequency of these refinement messages. On the other hand, VDM achieves a better tree without using refinement messages. So, VDM is efficient in terms of overhead when compared to HMTP.

3.6 Simulation Experiments

In this section, we evaluate the performance of VDM by implementing it in a network simulator. We present the simulation environment and the setup details. Then, the evaluation results are analyzed. More detailed evaluations performed in a more realistic environment are presented in Chapter V, which includes a PlanetLab implementation of our protocol.

3.6.1 Simulation Basics

Network simulators are typically used to simulate scenarios without configuring real networks. Simulators can help with the development and testing of a network application. Network simulators allow a developer to produce a network topology and define delay, bandwidth and traffic characteristics. Since the same simulation can be reproduced with the same setup, simulators are very suitable for comparing different protocols. There are two main types of network simulators, packet-based and flow-based. Flow-based simulators work at the application level and don’t implement the network stack. Packet-based simulators allow for a more detailed and realistic analysis.

There are some simulators, such as OverSim [41], PlanetSim [46], and P2PSim [42], that have been designed specifically for P2P protocols. But, we prefer using NS-2 [39], which is a well-established network simulator within the network research community. It has been actively developed for almost 20 years. NS-2 is an event-based simulator for packet-level simulations. It attempts to model the whole network stack. Each action in the simulation can be traced for analysis. NS-2 uses C++ and an object-oriented version of Tcl called OTcl. Simulations are written as Tcl scripts. The Tcl scripts define the nodes and the characteristics of the communication links, while protocols are implemented in C++.

We incorporated our protocol into NS-2. We had to change some internal files in the simulator and add some new files in order to add VDM as a new agent. We then created scenarios for testing purposes. A scenario file contains information that specifies time, topology, traffic, and various actions taking place during the simulation.

3.6.2 Simulation Setup

We use NS-2 [39] to conduct simulation experiments for evaluating our protocol. We generated a transit-stub model topology consisting of 792 nodes using GT-ITM [22]. One of the nodes is chosen as the source for the simulation. The source is alive during the entire simulation, and it is known by the other peers. 200 randomly selected nodes join the overlay multicast tree. We run the simulation for 10000s. We allow 2000s for the join process at the beginning. We take 400s as a time interval and define the churn based on that interval. Based on the churn rate, a number of nodes join and leave the tree. For example, if the churn rate is 10%, then 20 new nodes join and 20 existing nodes leave in each time slot, so the number of nodes in the overlay is the same at the end of each 400s interval. At the end of every time slot, we give the tree 100s to reach a steady state, and then we take the measurements. We expose the tree to churn again after the measurement. This process is repeated until the end of the simulation. For instance, under 10% churn the set of nodes is renewed almost twice over the simulation lifetime. Some nodes may join and leave several times while some never join. All nodes are considered equal; there is no supernode. Degree limits of nodes range from 2 to 5. We simulate the protocols under churn rates from 1% to 20%. We repeated the simulation experiments 32 times for each churn rate, and we report 90% confidence intervals on our results.

3.6.3 Performance Metrics

We are interested in the efficiency of the data delivery path and the service quality that end users experience. In order to quantify these two targets, we focus on four performance metrics. Stress is the major factor for data delivery efficiency, while stretch and overhead also have some impact. Service quality is basically measured with loss rate and delay.

Stress

Stress is defined as the number of identical packets transmitted on the same link. Stress is the best metric to measure how efficiently the resources are used. In IP multicast, stress is always one since a packet goes through a link only once. In overlay multicast, multiple transmissions on some links are inevitable, but stress should be kept as low as possible.

Figure 3.23: Stress values of links are shown.

We use the following equation to calculate average stress for each packet.

$$\text{avg. stress} = \frac{1}{N} \sum_{i=1}^{N} \text{stress}(i) \tag{3.4}$$

where i represents each used link in the overlay network and N is the total number of used links.
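The average stress of Equation 3.4 can be computed with a short Python sketch. The data layout is an assumption made for illustration: each overlay edge is mapped to the list of physical links on its underlying path, and the stress of a physical link is the number of overlay edges that cross it.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def link_stress(overlay_edges: Iterable[Edge],
                physical_path: Dict[Edge, List[Edge]]) -> Counter:
    """Count how many overlay edges send the same packet over each physical link."""
    stress = Counter()
    for edge in overlay_edges:
        for link in physical_path[edge]:
            stress[link] += 1
    return stress


def average_stress(overlay_edges: Iterable[Edge],
                   physical_path: Dict[Edge, List[Edge]]) -> float:
    """Average stress over the physical links that are actually used (Eq. 3.4)."""
    stress = link_stress(overlay_edges, physical_path)
    if not stress:
        raise ValueError("no physical link is used")
    return sum(stress.values()) / len(stress)


if __name__ == "__main__":
    # Hypothetical topology: S reaches A and B through the same router R1.
    paths = {
        ("S", "A"): [("S", "R1"), ("R1", "A")],
        ("S", "B"): [("S", "R1"), ("R1", "B")],
    }
    print(average_stress(paths.keys(), paths))
```

In the hypothetical topology the link from S to R1 carries the packet twice while the other two links carry it once, so three used links share a total stress of four.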

Stretch

Stretch is used to measure the delay for a packet to reach the destination. Stretch is defined as the ratio of the path length a packet travels in the overlay multicast tree to the path length it would travel in unicast. Unicast is assumed to have optimal stretch. Overlay multicast can’t achieve a stretch as low as unicast since packets have to travel through some intermediate nodes. In Figure 3.24, node B receives the stream through node A. The link propagation time would be 8s, but because of the tree structure this delay becomes 12s.
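The stretch of a single node and the average over all alive nodes can be sketched in Python as follows. The function names are assumptions; the first example pair reuses the delays of node B quoted above (12 on the overlay path versus 8 on the direct path), and the second pair is a hypothetical node attached directly to the source.

```python
from typing import Iterable, Tuple


def stretch(overlay_delay: float, unicast_delay: float) -> float:
    """Ratio of the delay along the overlay tree to the direct unicast delay."""
    if unicast_delay <= 0:
        raise ValueError("unicast delay must be positive")
    return overlay_delay / unicast_delay


def average_stretch(delays: Iterable[Tuple[float, float]]) -> float:
    """Mean stretch over (overlay_delay, unicast_delay) pairs, one pair per alive node."""
    values = [stretch(overlay, unicast) for overlay, unicast in delays]
    if not values:
        raise ValueError("no alive nodes")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(stretch(12, 8))                      # node B from the example
    print(average_stretch([(12, 8), (8, 8)]))  # B plus a node fed directly
```

A node fed directly by the source has a stretch of 1, so the average stays close to 1 only when few packets take detours through intermediate peers.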



Figure 3.24: Stretch values of links are shown.

We calculate stretch for each node then find the average. The equation is

$$\text{avg. stretch} = \frac{1}{N} \sum_{i=1}^{N} \text{stretch}(i) \tag{3.5}$$

where i represents a node that is alive at measurement time and N is the number of alive nodes.

Messaging Overhead

We define overhead as the ratio between maintenance messages and data messages. Overhead has the primary impact on the scalability of the system. Depending on the design, the overhead of a highly connected mesh could be so high that it overwhelms the network with maintenance messages, contrary to the motivation for P2P systems. Such a design could be applicable only to small networks.

We can derive the equation as

$$\text{overhead} = \frac{\text{total maintenance messages}}{\text{total data messages}} \tag{3.6}$$

The result can be normalized to get a more readable number. The trend versus network size matters more than the absolute value.
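A minimal Python sketch of the overhead ratio is given below. Expressing the result as a percentage is an assumption made here for readability; the function name and the example counts are hypothetical.

```python
def overhead_percent(maintenance_messages: int, data_messages: int) -> float:
    """Maintenance messages as a percentage of data messages (Eq. 3.6, scaled by 100)."""
    if data_messages <= 0:
        raise ValueError("at least one data message is required")
    return 100.0 * maintenance_messages / data_messages


if __name__ == "__main__":
    # Hypothetical counts: 22 maintenance messages per 1000 data messages.
    print(overhead_percent(22, 1000))
```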

Loss

Loss at a peer is the ratio of the number of lost packets to the number of packets supposed to be received in the peer’s lifetime. Loss rate is important for the service quality experienced by end users. Loss rate depends on the number of stream interruptions that stem from churn and on the path quality between nodes. Loss can be formalized as

$$\text{loss rate} = \frac{\text{total sent packets} - \text{total received packets}}{\text{total sent packets}} \tag{3.7}$$
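Equation 3.7 translates directly into a small Python function. The names are assumptions for illustration; "sent" here means the packets that should have reached the peer during its lifetime.

```python
def loss_rate(sent_packets: int, received_packets: int) -> float:
    """Fraction of packets that were sent to a peer but never received (Eq. 3.7)."""
    if sent_packets <= 0:
        raise ValueError("sent_packets must be positive")
    if not 0 <= received_packets <= sent_packets:
        raise ValueError("received_packets must lie between 0 and sent_packets")
    return (sent_packets - received_packets) / sent_packets


if __name__ == "__main__":
    # Hypothetical peer: 1000 packets expected, 980 received.
    print(loss_rate(1000, 980))
```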

3.6.4 Simulation Results

We show the results for the four previously defined metrics with 90% confidence intervals. We investigate the behavior of these metrics versus churn rate, number of nodes and node degree. Stress compares VDM to IP multicast; it shows how closely VDM approaches IP multicast. Stretch compares VDM to unicast, which provides the smallest delay for peers. Overhead and loss cannot be avoided, especially under the ad-hoc behavior of peers. But they should be kept minimal, and they shouldn’t increase exponentially with churn rate.

Comparison with HMTP

In this part, we compare the simulation results for VDM and HMTP. An analytical comparison was given earlier in this chapter (Section 3.5). Here, we run both protocols on the same topology using the same scenarios.

Figure 3.25: Stress vs. Churn. [Plot: stress versus churn (%), for VDM and HMTP.]

In Figure 3.25, we show stress vs. churn rate. Stress is one of the most important metrics for resource usage efficiency. Average stress is around 1.6 for both VDM and HMTP, with VDM giving slightly better results. Stress doesn’t change significantly as the churn rate increases for either protocol.

Figure 3.26 shows stretch vs. churn rate. Stretch is important for efficient content delivery and efficient resource usage. VDM outperforms HMTP in terms of stretch. Average stretch is around 7 for VDM while it is around 12 for HMTP. Stretch increases slightly with churn rate for both protocols.

Figure 3.26: Stretch vs. Churn. [Plot: stretch versus churn (%), for VDM and HMTP.]

Figure 3.27: Loss rate vs. Churn. [Plot: loss (%) versus churn (%), for VDM and HMTP.]

Figure 3.27 shows average loss rate for all nodes. End users are especially interested in continuity and quality of streaming. High loss rate dissatisfies end users. Average loss rate is below 2% for VDM under 10% churn. It is higher for HMTP.

In this simulation, we don’t apply link errors that would cause packets to be lost. So, all packet loss is caused by disconnections due to churn. That is why the loss rate is so small when the churn rate is low.

Figure 3.28: Overhead vs. Churn. [Plot: overhead (%) versus churn (%), for VDM and HMTP.]

Figure 3.28 shows the comparison between VDM and HMTP for overhead. Overhead should be kept small to put less load on the network. It cannot be prevented from increasing with churn rate, but the increase shouldn’t be exponential. Figure 3.28 shows that overhead increases linearly as churn rate increases. Overhead is around 2.2% for VDM when the churn rate is 10%. VDM achieves less overhead than HMTP.

VDM Performance versus Number of Nodes

We investigate how the performance of VDM changes when we increase the number of nodes in the overlay topology. We start with 100 nodes and go up to 1000 nodes. With this experiment, we basically look at the scalability of the system. Scalability is important for a system to be deployable.

Stress, Figure 3.29, goes up from 1.3 to 1.8. The density of nodes increases, which causes multiple uses of a link for the same packet transfer. In other words, the amount of used resources increases as expected since more nodes are joining the overlay network. An average stress of 1.8 is still a good value for 1000 nodes. Some stress is inevitable for an overlay network because we have to do multiple transmissions on some of the links. This value shows that we locate peers well and make good connections confined to the physical network.

Figure 3.29: Stress vs. Number of Nodes. [Plot: stress versus number of nodes, for VDM.]

Figure 3.30: Stretch vs. Number of Nodes. [Plot: stretch versus number of nodes, for VDM.]

In Figure 3.30, stretch starts at 4 and reaches 9 for 1000 nodes. An increase in the number of nodes causes the overlay tree depth to increase since node degrees stay the same. Packets have to traverse more intermediate nodes to reach their destination, especially for nodes that are leaves in the tree. When the stretch is high, the time for a packet to reach its destination increases.

Figure 3.31: Loss rate vs. Number of Nodes. [Plot: loss (%) versus number of nodes, for VDM.]

When the tree depth is higher, the failure of a node, especially one close to the source, causes more nodes to get disconnected. So the loss rate, Figure 3.31, increases with the number of nodes.

Figure 3.32: Overhead vs. Number of Nodes. [Plot: overhead (%) versus number of nodes, for VDM.]

Overhead, Figure 3.32, also goes up. A node has to communicate with more nodes when joining the overlay. The increase is diminishing since the tree depth grows with the logarithm of the number of nodes.

In general, we observe an increase in all metrics when the number of nodes is increased. But, it is important that these increases are not exponential, which enables the system to scale.

VDM Performance versus Node Degree

Nodes in an overlay network have limited degree because of the uplink capacity of users. In this part, we look at how the performance of VDM changes when we change the degrees of nodes.

Figure 3.33: Stress vs. Node Degree. [Plot: stress versus average node degree, for VDM.]

Stress, Figure 3.33, doesn’t change significantly as node degree increases. But, stretch, Figure 3.34, depends heavily on degree up to a point. When nodes don’t have enough capacity to serve multiple children, the tree has to be deeper, which causes peers to experience a long delay. Stretch stabilizes after a point and doesn’t improve much beyond it. The reason is that we don’t fully utilize the uplink capacity of a node if we don’t need to. We try to confine the overlay tree to the physical network, and we take network utilization into account by trying to converge to the MST.

Figure 3.34: Stretch vs. Node Degree. [Plot: stretch versus average node degree, for VDM.]

Figure 3.35: Loss rate vs. Node Degree. [Plot: loss (%) versus average node degree, for VDM.]

A low node degree causes long path lengths for peers. When path lengths are long, the loss rate is also high. As node degree increases, the loss rate, Figure 3.35, first decreases and then fluctuates.

Overhead, Figure 3.36, exhibits an interesting characteristic. It decreases up to a point and then starts increasing again. In the case of low node degree and therefore high tree depth, a node has to do many iterations to find an appropriate parent, which gives a high overhead. Overhead gets lower until a certain point. But, after this point, a node needs to contact more nodes in one iteration than needed.

Figure 3.36: Overhead vs. Node Degree. [Plot: overhead (%) versus average node degree, for VDM.]

Chapter 4

Generalizing Virtual Directions for Various Metrics

4.1 Generalization

Live multimedia streaming is a real-time application that requires minimal delay, where the delay is defined as the time needed for a packet to reach its receiver(s). The data packets should ideally traverse the minimum path while being transferred from source to destination. The overlay tree design should provide the minimum possible delay for each multicast receiver. However, delay and loss rate between two nodes may be uncorrelated because of background and cross traffic on routers. So, a peer might experience a high loss rate on a path that is good in terms of delay. Multimedia applications differ in their sensitivity to network performance metrics such as delay, loss, or bandwidth. This requires taking other factors into account when building the overlay tree.

A key property of VDM is the capability of virtualizing the underlying network in different ways. Though we only considered inter-peer delay (i.e., RTT) in the previous chapter, it is possible to establish “virtual directions” based on other performance metrics such as loss or bandwidth. Different values of these metrics may produce different virtual distances and thus different overlay trees in our protocol. By generalizing and customizing virtual direction, we can establish target-specific overlay trees to improve the specific performance metrics desired by applications. Calculating the virtual distance based on different criteria, but without protocol modification, lets the overlay multicast protocol satisfy different quality expectations. Our key goal in this part of the work is to automatically calculate overlay multicast trees such that they can be seamlessly customized to applications’ performance goals.

Figure 4.1: Delay and Loss rate values of a triangle among San Francisco, Boston and Dallas.

In order to corroborate the generalization method, we took simple measurement statistics from [30]. They show latency and loss rate among three cities in the United States and among cities in three different countries. Values among San Francisco, Boston and Dallas are shown in Figure 4.1. The ratios among the three values are different for latency and loss rate; thus the overlay trees constructed based on delay and on loss rate among three nodes in these cities will be different. As another example, we look at the measurements among Chicago, Tokyo and Johannesburg, shown in Figure 4.2, which also give different ratios for latency and loss rate.

Figure 4.2: Delay and Loss rate values of a triangle among Chicago, Tokyo and Johannesburg.

We also took a sample inter-PoP measurement dataset from [25], which has latency and loss rate information. From this dataset, we pick three points A, B, C among the links whose loss rate is not zero. We look at the delay of A-B (d1) and B-C (d2), and the loss rate of A-B (l1) and B-C (l2). When we compare the ratios d1/d2 and l1/l2, 44% of this dataset is inversely correlated, and the rest does not give the same ratio either.
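One possible reading of the ratio comparison is sketched below in Python. We assume here that a sample counts as "inversely correlated" when the link with the larger delay has the smaller loss rate, that is, when d1/d2 and l1/l2 fall on opposite sides of 1. This interpretation, the function names and the sample values are assumptions for illustration and are not taken from the dataset in [25].

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float, float, float]  # (d1, d2, l1, l2)


def inversely_related(d1: float, d2: float, l1: float, l2: float) -> bool:
    """True if the longer-delay link has the lower loss rate (assumed definition)."""
    return (d1 - d2) * (l1 - l2) < 0


def inverse_fraction(samples: Iterable[Sample]) -> float:
    """Fraction of samples whose delay ordering and loss ordering disagree."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples")
    hits = sum(1 for s in samples if inversely_related(*s))
    return hits / len(samples)


if __name__ == "__main__":
    # Hypothetical measurements: delays in ms, loss rates as fractions.
    data = [(10.0, 20.0, 0.02, 0.01), (10.0, 20.0, 0.01, 0.02)]
    print(inverse_fraction(data))
```

Whenever the two orderings disagree, a node that is "closer" in terms of delay is "farther" in terms of loss, so delay-based and loss-based virtual distances lead to different parent choices.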



 

Figure 4.3: A sample topology.

We illustrate a topology in Figure 4.3. Again, S is the source, E is an existing child and N is a newcomer. The relative distances among these three nodes might differ, as shown in Figure 4.4, depending on whether we measure distance in terms of delay or loss. As a result, the overlay tree will be formed in different ways, as in Figure 4.5. For this specific topology, the difference is caused by the traffic characteristics on router R4.

 

 



Figure 4.4: Relative virtual distances. [Panels: delay-based and loss-based.]

 

 

Figure 4.5: Differently formed overlay trees. [Panels: delay-based and loss-based.]

We now mention some works similar to our generalization proposal. The authors of [65] focus on quality improvement for VoIP. The paper first proposes a retransmission protocol for loss rate improvement. If the problem persists on the path, it then looks for path optimization using a combination of delay and loss rate. It also investigates the performance of different distance calculation methods such as hop count, best route, expected latency and loss rate.

iPlane nano [23], a modified version of iPlane, introduces a system for end-to-end measurement. It is a small information data set usable by other applications. The authors aim to provide a lightweight metric prediction system for large-scale P2P networks.

Real-time loss rate estimation between two points may not be as quick and easy as delay estimation. There are specific measurement studies [16,25] on this subject. These systems can be used as third-party service providers. For this work, we are more interested in showing the advantage of the generalization method than in the measurement study itself.

We propose a generalized method of calculating overlay trees to increase the user-perceived quality of performance-sensitive applications. We define and use the concept of “virtual distance” to determine “virtual direction” for constructing overlay trees. Abstracting applications’ sensitivity within the virtual distances, we aim to find the most appropriate parent for a peer according to the application’s purpose. We embed the virtual distance method in our protocol, VDM, and show that the protocol automatically calculates overlay trees based on delay (VDM-D) or loss (VDM-L), depending on which is more important for the application under consideration.
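The idea that only the distance function changes between VDM-D and VDM-L can be sketched in Python as follows. The `Metric` enum, the measurement dictionaries and the `closest` helper are hypothetical. The sketch only shows how switching the metric reorders candidate nodes; it does not implement the directional parent selection of VDM.

```python
from enum import Enum
from typing import Dict


class Metric(Enum):
    DELAY = "delay"  # VDM-D
    LOSS = "loss"    # VDM-L


def virtual_distance(measurement: Dict[str, float], metric: Metric) -> float:
    """Virtual distance to a peer under the chosen application metric."""
    return measurement[metric.value]


def closest(candidates: Dict[str, Dict[str, float]], metric: Metric) -> str:
    """Name of the candidate with the smallest virtual distance."""
    if not candidates:
        raise ValueError("no candidates")
    return min(candidates, key=lambda name: virtual_distance(candidates[name], metric))


if __name__ == "__main__":
    # Hypothetical measurements from a newcomer to two existing peers.
    peers = {
        "E1": {"delay": 20.0, "loss": 0.05},
        "E2": {"delay": 35.0, "loss": 0.01},
    }
    print(closest(peers, Metric.DELAY), closest(peers, Metric.LOSS))
```

Because the rest of the protocol only consumes distances, swapping the metric requires no change to the join, reconnection or refinement logic.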

4.2 VDM-D versus VDM-L

In this part, we evaluate the performance of VDM-D (delay-based) and VDM-L (loss-based) in order to demonstrate the effectiveness of the generalization concept. We analyze protocol behaviors versus time. In this experiment, each physical link in the topology is assigned a random error rate between 0% and 2%. Measurements are done until all the nodes join; they do not cover periods in which churn takes place. At each interval 50 nodes join, and then we do the measurement. We show average values of 32 simulations for each metric. We expect VDM-L to reduce loss rate while trading off stress and stretch. VDM-D should give better results for stress and stretch.

Figure 4.6: Stress vs. Time. [Plot: stress versus time (sec), for VDM-L and VDM-D.]

Stress, Figure 4.6, is increasing with time. After all nodes join, average stress of VDM-D is around 1.7, and around 1.9 for VDM-L.

Figure 4.7: Stretch vs. Time. [Plot: stretch versus time (sec), for VDM-L and VDM-D.]

Figure 4.7 shows average stretch vs. time. VDM-D gives a better path for peers compared to VDM-L.

Figure 4.8: Loss rate vs. Time. [Plot: loss (%) versus time (sec), for VDM-L and VDM-D.]

Figure 4.8 shows average loss rate vs time. Loss in this graph is caused by packet drops over path since churn is not included in the simulation. Packets are dropped over links according to their error rate. With VDM-L, we achieve better performance in terms of loss rate.

Figure 4.9: Overhead vs. Time. [Plot: overhead (%) versus time (sec), for VDM-L and VDM-D.]

Even though we used the same number of probing messages to measure delay and loss rate, the overhead calculated for VDM-L, Figure 4.9, is lower. The reason is that fewer packets are lost in VDM-L, which makes the denominator in the overhead definition greater.

Our major concern in this part is getting better performance for the metrics that matter more for particular applications. VDM-D, which uses delay for distance estimation, improves stress and stretch but gives a higher loss rate. It could be used for delay-sensitive applications. On the other hand, VDM-L achieves better performance in terms of loss rate. It can be chosen for applications that are more delay tolerant but loss sensitive.

Chapter 5

PlanetLab Implementation

5.1 PlanetLab Environment

PlanetLab [45] is a globally distributed platform for researchers. As of July 2011, it consists of 1068 nodes at 560 sites. In 2009, University of Nevada, Reno joined this group by contributing 2 nodes. Generally, research institutions host nodes, but some nodes are colocated at other places. All of them are connected to the Internet, and all machines run a common software package, which is a Linux-based operating system.

Since its deployment, more than 1000 researchers have developed new technologies using this platform. PlanetLab serves as a testbed for research groups. They can request a slice in which they can run their experiments for different network studies such as routing, multicast, content distribution, file sharing, and network measurement. The advantage of the PlanetLab environment is that people can test their new services under real conditions and at large scale. They can expose their protocols to a real environment in which link failures and congestion take place, and observe the reactions.



Figure 5.1: PlanetLab nodes around the world [45].

Red dots in Figure 5.1 show PlanetLab sites around the world. In order to join PlanetLab, an institution has to dedicate at least 2 machines. Each machine is called a node. A slice is created for a project and users are assigned to that slice. When the slice is created, a virtual server is created for that slice at each node. A slice expires after some time and has to be renewed by the user. The slice name for our project is unr vdm.

5.2 Virtual Direction Multicast on PlanetLab

5.2.1 Node Selection

On PlanetLab, some nodes aren’t working, and some nodes block ping messages. Using these nodes might give wrong results for our experiments since VDM needs to obtain a response to its ping messages. We need to eliminate the nodes that are not working. We apply several filtering processes to select appropriate nodes, as illustrated in Figure 5.2.

Figure 5.2: Filtering process to select nodes.

We first get all the nodes, then send ping messages to all of them. Unresponsive nodes are eliminated. Then, we try to send ping messages from inside each node to others. Again, we eliminate the nodes that don’t allow pinging. Finally, we run a small program at every node. The program declares itself to the source by sending a declaration message, so that we make sure that we can run our agent through a remote connection.
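The three filtering stages can be sketched in Python as a simple pipeline. The three predicates are hypothetical placeholders for the actual checks (ping from our machine, ping from inside the node, and receipt of the declaration message); no network access is performed in this sketch.

```python
from typing import Callable, Iterable, List

Check = Callable[[str], bool]


def select_nodes(all_nodes: Iterable[str],
                 responds_to_ping: Check,
                 can_ping_out: Check,
                 agent_declared: Check) -> List[str]:
    """Keep only the nodes that pass all three filtering stages, in order."""
    stage1 = [n for n in all_nodes if responds_to_ping(n)]  # answers our ping
    stage2 = [n for n in stage1 if can_ping_out(n)]         # can ping other nodes
    return [n for n in stage2 if agent_declared(n)]         # small program ran and reported


if __name__ == "__main__":
    # Hypothetical node names and check results.
    nodes = ["a", "b", "c", "d"]
    usable = select_nodes(nodes,
                          lambda n: n != "a",
                          lambda n: n != "b",
                          lambda n: n != "c")
    print(usable)
```

Running the cheaper checks first means the later, more expensive checks are applied only to the nodes that survived the earlier stages.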

5.2.2 Implementation

We have developed several components and written a considerable amount of code to get VDM to work on PlanetLab. The main components of the implementation are illustrated in Figure 5.3.

Scenario

We keep a list of properly working nodes on our local machine. We have a scenario generator which generates the scenario file by using the node list and provided seeds. The scenario file determines the whole scenario, i.e., when and which node will join and leave. By using different seeds for the random number generator we get different scenario files. A line in the scenario file mainly has the action type, node information and time for the action. The main controller reads this file and executes the commands in it sequentially by contacting the node, where the action can be a join or a leave.

Figure 5.3: Main components of the system. [Components: Main Controller, VDMAgent, Transceiver, Scenario, Sender.]
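A seeded scenario generator of the kind described above could look like the following Python sketch. The line format `time action node`, the function names and the node names are assumptions made for illustration; the actual scenario file format is not specified here.

```python
import random
from typing import List, Sequence, Tuple


def generate_scenario(nodes: Sequence[str], seed: int,
                      duration_s: int, num_events: int) -> List[str]:
    """Generate scenario lines of the assumed form '<time> <action> <node>'.

    The same seed always yields the same scenario. A node joins if it is
    currently outside the session and leaves if it is inside.
    """
    rng = random.Random(seed)
    times = sorted(rng.randrange(duration_s) for _ in range(num_events))
    in_session = set()
    lines = []
    for t in times:
        node = rng.choice(nodes)
        if node in in_session:
            action = "leave"
            in_session.remove(node)
        else:
            action = "join"
            in_session.add(node)
        lines.append(f"{t} {action} {node}")
    return lines


def parse_line(line: str) -> Tuple[int, str, str]:
    """Split one scenario line into (time, action, node), as a controller would."""
    time_s, action, node = line.split()
    return int(time_s), action, node


if __name__ == "__main__":
    for line in generate_scenario(["n1", "n2", "n3"], seed=7, duration_s=1000, num_events=6):
        print(parse_line(line))
```

Fixing the seed makes an experiment repeatable, while changing it produces a different join and leave pattern over the same node list.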

VDMAgent

VDMAgent is uploaded to every node on PlanetLab. It carries out the core job of the protocol. When it receives the connect message from the main controller, it contacts the source, which is known by each node. Then, VDMAgent uses its embedded algorithm to find an appropriate parent to connect to. As soon as it connects to a parent, it starts the transceiver unit, which is responsible for receiving data messages from the parent and transmitting them to children, if any exist. After a node gets connected, VDMAgent keeps running and answers incoming information messages and connection requests. It is also responsible for reconnecting the node to the tree if the parent leaves the session. At the end of the session, when it receives the termination command from the main controller, it starts the result calculator to calculate session statistics for itself.

Sender

Our system has a single source, which hosts the main data stream. The source node sends this data to its children. Other nodes transmit the data that they receive from their parents. Sender is responsible for sending the data at the origin. This component exists and runs only at the source node. When the session starts, Sender begins streaming even if there is no child in the tree. When a node connects, it starts receiving the stream from that point on.

Transceiver

Every node has this unit except the source. When a node connects, the transceiver unit is started. The main job of this unit is to catch incoming data messages and transmit them to the node’s children. The node plays the stream and sends a duplicate to its children via the transceiver unit.

Main Controller

The main controller is in charge of applying the scenario in the input scenario file by communicating with PlanetLab nodes. The scenario file specifies the time, node, and action for each event in the experiment. When the time comes, the main controller sends an appropriate message to the related node. There are three main control messages from the main controller to the nodes. The first is the connect message, which tells the node to connect to the multicast session. When a node receives this message, its VDMAgent starts the connection process. The second is the disconnect message, which tells the node to leave the session. When a node gets this message, it no longer receives a copy of the stream. VDMAgent has to notify the children when it leaves. A node that has left the session might rejoin after some time; in that case, it needs to run the join process again. The last message is terminate, which is sent to every node at the end of the session. When a node receives this message, it stops the transceiver and runs the component that calculates the statistics and other results desired for the experiment. At the end, we download the result files from all nodes to collect the aggregate statistics.
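The scenario-driven control flow described above can be sketched as follows. This is only an illustration under stated assumptions: the three-field line layout (`<time> <node> <action>`), the `ScenarioEvent` class, and the `send` callback are hypothetical names chosen for the example, not the file format or interfaces of the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# The text names three control messages; the exact spellings are assumed.
VALID_ACTIONS = {"connect", "disconnect", "terminate"}


@dataclass(frozen=True)
class ScenarioEvent:
    time: float   # seconds since the start of the session
    node: str     # identifier of the target node
    action: str   # one of VALID_ACTIONS


def parse_scenario(text: str) -> List[ScenarioEvent]:
    """Parse lines of the assumed form '<time> <node> <action>'."""
    events = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        time_s, node, action = fields
        if action not in VALID_ACTIONS:
            raise ValueError(f"line {line_no}: unknown action {action!r}")
        events.append(ScenarioEvent(float(time_s), node, action))
    # Execute in time order; sorted() is stable, so ties keep file order.
    return sorted(events, key=lambda e: e.time)


def run_scenario(events: List[ScenarioEvent],
                 send: Callable[[str, str], None]) -> None:
    """Dispatch each event by calling send(node, action) in order.

    A real controller would wait until event.time before sending; that
    wait is omitted here to keep the sketch deterministic.
    """
    for event in events:
        send(event.node, event.action)
```

Passing the transport as a callback keeps the parsing and ordering logic testable without contacting any remote node.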


Figure 5.4: Control Messages between Main Controller and VDMAgent.

Control Messages between Nodes

information request: A peer uses this message for two purposes. First, a node can learn the distance to another node. The message includes a timestamp; the sender compares this timestamp with the time at which the response message arrives and estimates the distance from the difference. Second, the response to an information request includes children information that is used in the join process. When a node receives a connect command from the main controller, it starts the join process by sending this message to the source.

information response: This message is the response to an information request. The queried node copies the timestamp that came with the information request into the information response. It also attaches its children list with the distances to them.

connection request: A node trying to connect continues to send information request messages until it finds a parent based on VDM’s join algorithm. When it finds a suitable potential parent, it sends a connection request message to it. This message states the type of connection, which is either CaseI or CaseII. For CaseI, the potential parent must have a free degree. For CaseII, this is not required.

connection response: The potential parent uses this message to respond to a connection request. It acts differently depending on the type of the request message.

parent change: If a node connects with CaseII, it is placed between two nodes that have a parent-child relationship. The child then needs to update its parent and grandparent. The newly connecting CaseII node sends this message to the child so that it updates its parent and grandparent.

grand parent change: With a CaseII connection, the grandparent of some nodes might also change, and this information should be updated. This message is sent to the related nodes to update their grandparent information.
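The timestamp echo used for distance estimation and the CaseI/CaseII admission rule can be illustrated with a minimal sketch. The class and function names below (`InformationRequest`, `InformationResponse`, `answer_information_request`, `estimate_distance`, `accepts_connection`) are assumptions made for this example; the message layout of the real implementation is not specified in the text.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InformationRequest:
    timestamp: float  # requester's clock reading when the request was sent


@dataclass
class InformationResponse:
    timestamp: float  # echoed unchanged from the request
    children: Dict[str, float] = field(default_factory=dict)  # child -> distance


def answer_information_request(request: InformationRequest,
                               children: Dict[str, float]) -> InformationResponse:
    """Queried node: echo the timestamp and attach the children list."""
    return InformationResponse(timestamp=request.timestamp, children=dict(children))


def estimate_distance(response: InformationResponse,
                      arrival_time: Optional[float] = None) -> float:
    """Requesting node: arrival time minus the echoed timestamp.

    Both values are read from the requester's own clock, so the two
    nodes do not need synchronized clocks.
    """
    if arrival_time is None:
        arrival_time = time.monotonic()
    return arrival_time - response.timestamp


def accepts_connection(case: str, free_degree: int) -> bool:
    """CaseI needs a free degree at the potential parent; CaseII does not."""
    if case == "CaseI":
        return free_degree > 0
    if case == "CaseII":
        return True
    raise ValueError(f"unknown connection type: {case!r}")
```

Because the queried node returns the timestamp untouched, the estimated distance is effectively a round-trip time that also includes the processing delay at the queried node.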

5.3 Performance Metrics

In the PlanetLab implementation, we used some additional metrics beyond those defined for the NS-2 simulation experiments.

Startup Time

When a node receives a connect command, it records the time. When it has found a parent and established the connection, it checks the time again. The difference between the two clock readings is recorded as the startup time for this node. We then calculate the average startup time over all nodes. We also look at the maximum values to see the worst-case performance. However, PlanetLab nodes are sometimes slow to answer the information request, so the maximum value may not reflect the algorithmic complexity of our protocol. To account for this, we also look at the hopcount for all nodes and for leaf nodes explicitly, as we explain later.
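As a small illustration of this measurement, the sketch below derives per-node startup times from two sets of clock readings and reports the average and maximum. The function names and the dictionary-based input are assumptions for the example; the same summary could be applied to reconnection times.

```python
from statistics import mean
from typing import Dict, Tuple


def startup_times(connect_received: Dict[str, float],
                  connection_established: Dict[str, float]) -> Dict[str, float]:
    """Per-node startup time: established time minus connect-command time."""
    return {node: connection_established[node] - t0
            for node, t0 in connect_received.items()
            if node in connection_established}  # skip nodes that never connected


def summarize(durations: Dict[str, float]) -> Tuple[float, float]:
    """Return (average, maximum) over all measured nodes."""
    values = list(durations.values())
    return mean(values), max(values)
```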

Reconnection Time

When a node receives a leave notification from its parent, it contacts its grandparent to rejoin the tree. A quick rejoin process is required. We measure the time required for reconnection for the nodes whose parents leave. Again, we look at the average and maximum values.

Hopcount

Hopcount is a metric similar to stretch; it measures the number of hops between a node and the source. However, hopcount gives a better idea of how the tree is established. We look at the average hopcount for all nodes and for leaf nodes. Hopcount statistics also indicate how balanced the tree is.
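The hopcount statistics can be computed from the parent pointers of the overlay tree. The following sketch assumes the tree is given as a dictionary mapping each node to its parent, with the source mapped to `None`; this representation and the function names are illustrative choices, not part of the implementation described here.

```python
from statistics import mean
from typing import Dict, Optional


def hopcounts(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Overlay hops from the source for every non-source node."""
    depth: Dict[str, int] = {}
    for start in parent:
        path = []
        node = start
        # Walk up until we reach the source or a node with a known depth.
        while parent[node] is not None and node not in depth:
            path.append(node)
            node = parent[node]
        base = depth.get(node, 0)  # the source itself is 0 hops away
        for n in reversed(path):
            base += 1
            depth[n] = base
    return depth


def hopcount_stats(parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Average over all nodes, average over leaf nodes, and maximum."""
    depth = hopcounts(parent)
    internal = {p for p in parent.values() if p is not None}
    leaves = [n for n in depth if n not in internal]
    return {
        "avg": mean(list(depth.values())),
        "leaf_avg": mean([depth[n] for n in leaves]),
        "max": max(depth.values()),
    }
```

A large gap between the average and the maximum hopcount is one sign that the tree is unbalanced.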

Network Usage

In our PlanetLab experiments, we use network usage instead of stress because it is easier to measure in PlanetLab. Both reflect resource utilization. We assume that latency is proportional to the path length between two peers. By adding the latencies of all used links, we get a total value that represents the resources used. With this value, we can understand how closely the tree has converged to the Minimum Spanning Tree.
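The network usage metric described above is a sum over the overlay links of the tree. A minimal sketch follows, assuming symmetric pairwise latencies keyed by `frozenset` pairs and a parent dictionary with the source mapped to `None`; both are hypothetical representations chosen for the example. The normalization applied in the reported plots is not specified in the text and is therefore not shown.

```python
from typing import Dict, FrozenSet, Optional


def network_usage(parent: Dict[str, Optional[str]],
                  latency: Dict[FrozenSet[str], float]) -> float:
    """Sum the latencies of all overlay links (child, parent) in the tree."""
    total = 0.0
    for child, par in parent.items():
        if par is None:
            continue  # the source has no upstream link
        total += latency[frozenset((child, par))]
    return total
```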

5.4 Results

5.4.1 Sample Tree

 Figure 5.5: Sample tree shows virtual links among nodes.

In Figure 5.5, we show a sample tree established using VDM. The big circle in the middle is the source of this tree. Other nodes join the tree at different times. Geographically, nodes may not establish the best tree because the Internet backbones and routing within and between ISPs may result in distances between the nodes that differ from their geographic distribution. Instantaneous background traffic may also be a reason. Even though the tree is not an exact fit, it still reflects the geographical distribution quite well. For example, two nodes in our university are connected; they are able to find each other.

Figure 5.6: Sample tree includes nodes from United States and Europe.

Another sample tree, shown in Figure 5.6, includes nodes from both the United States and Europe. Interestingly, nodes in the United States are connected with each other, as are nodes in Europe; there is clear clustering by continent. The transatlantic connection goes over only one link. In our experiments, we observed that this does not always occur: there might be several transatlantic connections in some cases. But clustering is still visible.

5.4.2 VDM versus HMTP

In this part, we compare VDM and HMTP using results gathered from PlanetLab. We filtered for working nodes in PlanetLab to get more accurate results. For this evaluation, we used only nodes in the United States. We identified a pool of around 140 working nodes. Each time, we select 100 nodes from this pool and run our experiment. We selected a node in Colorado as the source. When we ran more than one experiment on the same nodes at the same time, the performance was affected, so we ran the experiments one by one. An experiment takes 5000 seconds, which resembles a real session length, since we are actually transmitting streaming traffic over the multicast tree, unlike the abstracted environment we worked with in the NS-2 experiments in Chapter III. The first 2000 seconds are spent on join processes only. In the remaining 3000 seconds, churn takes place. Having to run each experiment separately prevented us from repeating the experiments many times. Each experiment is run 5 times with different seeds, and we show the average values of these 5 runs.

Figure 5.7: Startup Time vs. Churn Rate. [Plot: startup time (seconds) versus churn rate; series: VDM, HMTP.]

In Figure 5.7, we show the startup time versus churn rate. As expected, the startup time doesn’t depend on churn rate. We look at the first 100 nodes in the first 2000 seconds and calculate the average startup time. Startup time for HMTP is a little higher, because HMTP may require more steps since the tree depth is higher when HMTP is used. Normally, churn rate should not affect the startup time. Even though we use the same setup for different churn rates, startup time fluctuates because network conditions change, so the same tree may not be built for the same setup. Some unstable nodes in PlanetLab also contribute to this fluctuation.

We present the average reconnection time in Figure 5.8. Reconnection time also doesn’t depend on churn rate, but as churn rate increases, more reconnections have to occur. The expectation is that the reconnection time should be less than the startup time. Since the number of nodes in our setup is not very high, the difference between reconnection and startup times is small. We can see the difference better when we look at the maximum startup time later.

Figure 5.8: Reconnection Time vs. Churn Rate. [Plot: reconnection time (seconds) versus churn (%); series: VDM, HMTP.]

As we explained in Chapter III, HMTP doesn’t build an ideal multicast tree because it misses some cases. This causes longer path lengths for nodes. Average stretch is 1.6 for VDM and 1.9 for HMTP, as shown in Figure 5.9. We expect this difference to increase with the number of nodes. Node degree is also important for stretch values; we picked a node degree of 4 here.

Figure 5.9: Stretch vs. Churn Rate. [Plot: stretch versus churn (%); series: VDM, HMTP.]

Though hopcount is a metric similar to stretch, it gives a better idea about the shape of the multicast tree. In Figure 5.10, the average hopcount is 4.5 for VDM and 5.5 for HMTP. Hopcount is independent of churn rate. We will look at maximum values later to see the worst-case performance.

Figure 5.10: Hopcount vs. Churn Rate. [Plot: hopcount versus churn (%); series: VDM, HMTP.]

For the PlanetLab implementation, we use network usage instead of the stress metric used in the network simulator to measure resource utilization. Figure 5.11 shows the normalized sum of the lengths of all used links. Since VDM builds a better tree, it uses fewer resources than HMTP.

Figure 5.11: Resource usage vs. Churn Rate. [Plot: usage versus churn (%); series: VDM, HMTP.]

Figure 5.12 shows loss rate versus churn rate for both protocols. In the network simulator, we were able to control packet loss so that it happened only because of churn, by setting the link error rate to 0 in the underlying network. In PlanetLab, however, we can’t control the loss rate over links. It is also difficult to distinguish loss caused by churn from packet drops. So, the values here include loss from both churn and packet drops, if any. VDM achieves a lower loss rate. Loss rate increases with churn for both protocols, but it increases more when HMTP is used.

Figure 5.12: Loss Rate vs. Churn Rate. [Plot: loss versus churn (%); series: VDM, HMTP.]

Figure 5.13: Overhead vs. Churn Rate. [Plot: overhead versus churn (%); series: VDM, HMTP.]

In the PlanetLab implementation, the sender at the source sends 10 chunks per second. We calculated overhead as the ratio of the number of maintenance messages to the number of these data packets. HMTP has to send refinement messages periodically in order to converge to the MST; we picked this period as 30 s. As a result, the calculated overhead for HMTP is very high compared to VDM. The overhead ratio increases with churn rate for both protocols.
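The overhead ratio can be written down directly from this definition. In the sketch below, the number of data packets is assumed to be the number of chunks emitted by the source over the session (10 chunks per second); whether the reported values count packets at the source or per receiver is not stated, so this is an interpretation. The message count in the usage note is purely illustrative.

```python
def overhead_ratio(maintenance_messages: int,
                   session_seconds: float,
                   chunks_per_second: float = 10.0) -> float:
    """Maintenance messages divided by data packets sent during the session."""
    data_packets = session_seconds * chunks_per_second
    if data_packets <= 0:
        raise ValueError("the session must contain at least one data packet")
    return maintenance_messages / data_packets
```

For instance, a 5000-second session carries 50,000 chunks, so a hypothetical 5,000 maintenance messages would give an overhead ratio of 0.1.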

5.4.3 Performance versus Number of Nodes

In this part, we investigate the behavior of our protocol against the scale, i.e., the number of nodes.

Figure 5.14: Startup Time vs. Number Of Nodes. [Plot: startup time (seconds) versus number of nodes; series: avg, max.]

Figure 5.14 shows the average and maximum time needed for the join process. The number of nodes that a newly joining node has to query increases with the number of nodes, which causes startup time to increase. We calculated the average and maximum startup time. When the number of nodes is 100, the average time is around 0.5 seconds and the maximum time is 1.5 seconds. These values are reasonable for starting to receive a stream.

Reconnection time is not related to the number of nodes, since orphan nodes start reconnection at their grandparent. Figure 5.15 presents reconnection time versus number of nodes. The average time is around 0.2 seconds. Even though it seems high when the number of nodes is 100, there is no dependency on the number of nodes; the values go up and down as the number of nodes changes. This 0.2-second interruption is experienced as jitter by the user if there is no buffer. However, there is usually a buffer of a couple of seconds to tolerate such interruptions.

Figure 5.15: Reconnection Time vs. Number Of Nodes. [Plot: reconnection time (seconds) versus number of nodes; series: avg, max.]

Figure 5.16: Stretch vs. Number Of Nodes. [Plot: stretch versus number of nodes; series: min, avg, leaf-avg, max.]

In Figure 5.16, we present stretch versus number of nodes. We show four different measurements. In some cases, nodes might have a shorter path to the source when they use overlay routing. Even though their number is not high, such nodes exist. The bottom line shows the average stretch of these nodes. The value is around 0.8, which means that these nodes receive the stream with less delay than they would over a direct connection to the source.

We also show the overall average stretch for all nodes. It first increases with the number of nodes and then stabilizes around 1.5. This value is very good for a structure in which a node has to reach the source through many intermediate nodes. We also show average stretch values for leaf nodes only. Leaf positions can be considered the worst places to be in a tree structure, since leaf nodes are expected to have long path lengths. Almost half of the nodes are expected to be at a leaf position in the tree. The average stretch for these nodes is a little higher than the general average.

Since our protocol doesn’t try to balance the tree, it is a good idea to look at the worst cases, so we also show maximum stretch values. Maximum stretch goes up to 3 when the number of nodes is 100.
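The four stretch measurements discussed above can be sketched as follows. The sketch assumes that stretch is the ratio of a node’s delay from the source over the overlay tree to its delay over a direct connection, which is consistent with the interpretation of values below 1 given in the text. The function name, the input dictionaries, and the key `below_one_avg` (standing for the bottom curve) are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Optional, Set


def stretch_stats(overlay_delay: Dict[str, float],
                  direct_delay: Dict[str, float],
                  leaves: Set[str]) -> Dict[str, Optional[float]]:
    """Summaries of per-node stretch = overlay delay / direct delay."""
    stretch = {n: overlay_delay[n] / direct_delay[n] for n in overlay_delay}
    # Nodes that reach the source faster over the overlay than directly.
    below_one = [s for s in stretch.values() if s < 1.0]
    return {
        "below_one_avg": mean(below_one) if below_one else None,
        "avg": mean(list(stretch.values())),
        "leaf_avg": mean([stretch[n] for n in leaves]),
        "max": max(stretch.values()),
    }
```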

Figure 5.17: Hopcount vs. Number Of Nodes. [Plot: hopcount versus number of nodes; series: avg, leaf-avg, max.]

In Figure 5.17, we show hopcount versus number of nodes. Hopcount should increase with the number of nodes. This increase depends on node degree and is proportional to the logarithm of the number of nodes, as we explained in Chapter III. The average value over the whole tree is around 4. If we look only at leaf nodes, it is around 5 for 100 nodes. We also look at the worst cases: the maximum starts at 5 for 20 nodes and goes up to 11 for 100 nodes.

Figure 5.18 shows resource usage versus number of nodes. Resource usage increases with the number of nodes, as expected. These are normalized values that show the total length of the used links.

Figure 5.18: Resource Usage vs. Number Of Nodes. [Plot: normalized resource usage versus number of nodes; series: avg.]

Figure 5.19: Loss Rate vs. Number Of Nodes. [Plot: loss rate versus number of nodes; series: avg.]

As Figure 5.19 shows, loss rate increases with the number of nodes. In this experiment, we keep the churn rate the same while increasing the number of nodes, so the number of joins and leaves gets higher. When the number of nodes is high, even though reconnection time doesn’t change, the number of nodes affected by a disconnection is high. This causes an increase in loss rate.

Figure 5.20: Overhead vs. Number Of Nodes. [Plot: overhead versus number of nodes; series: avg.]

Overhead, shown in Figure 5.20, increases with the number of nodes. The number of nodes that a node needs to contact during the join process increases, which causes an increase in overhead.

5.4.4 Performance versus Node Degree

In this part, we analyze the effect of node degree on our protocol. We investigate each performance metric while varying the average node degree from 2 to 8.

In Figure 5.21, we see how startup time changes with node degree. When node degree is small, tree depth is high, which requires more steps for a node to complete the join process. Startup time decreases until node degree becomes 4 or 5. After that, it doesn’t change much.

Figure 5.21: Startup Time vs. Node Degree. [Plot: startup time (seconds) versus node degree; series: avg, max.]

Figure 5.22: Reconnection Time vs. Node Degree. [Plot: reconnection time (seconds) versus node degree; series: avg, max.]

We present reconnection time versus node degree in Figure 5.22. We could not find a relation between reconnection time and node degree. An orphan node contacts its grandparent and starts reconnection from there. When node degree is high, the departure of an intermediate node creates more orphan nodes, but all of these orphan nodes carry out the reconnection process by themselves. So, reconnection time is independent of node degree according to our experiment results.

We show four different statistics for stretch in Figure 5.23. All of them except the lowest curve (min) decrease until node degree becomes 5, and then they stabilize. The reason is that our protocol improves its efficiency by using more degrees up to node degree 5. Beyond that, by the definition of our protocol, it doesn’t exploit the remaining available degrees, in order to keep resource utilization low.

Figure 5.23: Stretch vs. Node Degree. [Plot: stretch versus node degree; series: min, avg, leaf-avg, max.]

Figure 5.24: Hopcount vs. Node Degree. [Plot: hopcount versus node degree; series: avg, leaf-avg, max.]

Node degree is very important for hopcount. In particular, a small node degree gives a high hopcount. When node degree is 2, the average hopcount is around 6. It becomes 4 when node degree is 5. After that point, it doesn’t improve.

Figure 5.25: Resource Usage vs. Node Degree. [Plot: normalized resource usage versus node degree; series: avg.]

When node degree is small, the ideal tree cannot be built because the desired connections cannot be established under the degree limitation, so resource utilization is high. It improves as node degree increases. If we utilized all available degrees, we could decrease stretch by sacrificing resource usage. We would like to find a good operating point that is satisfactory in terms of both resource usage and stretch.

Figure 5.26: Loss Rate vs. Node Degree. [Plot: loss (%) versus node degree; series: avg.]

When we look at loss rate versus node degree, loss rate is high when node degree is small. It gets smaller as node degree increases and then stays the same after node degree 5. The tree reaches its optimum shape when node degree is around 5. After that, the multicast tree doesn’t change much.

Figure 5.27: Overhead vs. Node Degree. [Plot: overhead (%) versus node degree; series: avg.]

Overhead, shown in Figure 5.27, is high when node degree is small. It decreases up to node degree 5, after which the protocol gives similar overhead values.

When we consider all results in this subsection together, we see that the multicast tree established using our protocol VDM reaches its optimum shape when node degree is around 5. After this point, the tree is not affected by node degree. The available degrees of some nodes are not used because of our protocol definition. VDM is designed to find an optimum point between user-perceived quality, by minimizing stretch, and resource utilization, by converging to the MST.

5.4.5 Refinement Component

We added a refinement component to our algorithm. Refinement is needed to adapt the tree to changing physical network conditions and overlay changes. For example, a node may not be able to connect to its ideal parent because that parent does not have enough capacity. After some time, a child of the ideal parent may leave, or a better parent might have joined the tree. We therefore apply a periodic refinement process, with a period of 5 minutes.

Each node repeats the join process every 5 minutes. If the newly found potential parent is different from the current parent, the node switches its parent. The refinement component adds overhead to our algorithm. We investigate the trade-off between the gain in stretch and the cost in overhead.

Figure 5.28: Improvement in Stretch in use of Refinement. [Plot: stretch versus number of nodes; series: VDM, VDM-R.]
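The periodic refinement step described above can be sketched in a few lines. Here `find_parent` stands in for the VDM join search, which is not reproduced in this section, so it is passed in as a hypothetical callback; `refine_once` and `refinement_times` are likewise illustrative names. Keeping the current parent when the search returns no candidate is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional

REFINEMENT_PERIOD_S = 5 * 60  # the 5-minute period chosen in the text


def refine_once(node: str,
                parent: Dict[str, str],
                find_parent: Callable[[str], Optional[str]]) -> bool:
    """Repeat the join search and switch parents if the result differs.

    Returns True when the node switched to a new parent.
    """
    candidate = find_parent(node)
    if candidate is None or candidate == parent[node]:
        return False  # no better parent found; keep the current one
    parent[node] = candidate
    return True


def refinement_times(join_time: float, session_end: float,
                     period: float = REFINEMENT_PERIOD_S) -> List[float]:
    """Times at which a node that joined at join_time runs refinement."""
    times = []
    t = join_time + period
    while t <= session_end:
        times.append(t)
        t += period
    return times
```

Each call to `refine_once` costs a full join search, which is the source of the additional overhead that refinement introduces.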

In Figure 5.28, VDM-R denotes VDM with the refinement component. When we use refinement, we achieve better stretch values, with an improvement of about 10%.

We also see an improvement in average hopcount (Figure 5.29). Even though the number of nodes is the same, the average hopcount decreases. This means that the refinement component helps to construct a more balanced tree.

We observe the cost of improving stretch and hopcount in Figure 5.30. The frequency of the refinement process should be chosen carefully: overly frequent refinement may not provide enough improvement to justify the overhead it causes. Additional experiments could be done to understand the effect of the frequency of refinement messages.

Figure 5.29: Improvement in Hopcount in use of Refinement. [Plot: hopcount versus number of nodes; series: VDM, VDM-R.]

Figure 5.30: Cost of Refinement in Overhead. [Plot: overhead versus number of nodes; series: VDM-R, VDM.]

5.4.6 Comparison with Minimum Spanning Tree

With our algorithm, we try to converge to the Minimum Spanning Tree (MST) while also trying to satisfy other user requirements. In this part, we test our algorithm to see how close it gets to the MST. We don’t expect VDM to find the MST, since there are some cases that it doesn’t address for the sake of simplicity. In this part, we don’t apply a degree limitation.

In Figure 5.31, we show the ratio between the tree constructed using our overlay algorithm and the MST. As the number of nodes increases, the ratio increases, but it is still not very far from the MST.

Figure 5.31: Comparison with MST. [Plot: ratio to MST versus number of nodes; series: VDM.]
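One way to compute such a ratio is to divide the total link latency of the overlay tree by that of an MST over the same nodes. The sketch below does this with Prim’s algorithm on the complete graph of pairwise latencies. It assumes that "ratio" means the ratio of total link cost, that latencies are symmetric and keyed by `frozenset` pairs, and that the tree is given as parent pointers; all function names are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional

Latency = Dict[FrozenSet[str], float]


def mst_cost(nodes: List[str], latency: Latency) -> float:
    """Total cost of a minimum spanning tree (Prim's algorithm)."""
    if not nodes:
        return 0.0
    visited = {nodes[0]}
    heap = [(latency[frozenset((nodes[0], v))], v) for v in nodes[1:]]
    heapq.heapify(heap)
    total = 0.0
    while heap and len(visited) < len(nodes):
        cost, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry for a node that was already attached
        visited.add(v)
        total += cost
        for w in nodes:
            if w not in visited:
                heapq.heappush(heap, (latency[frozenset((v, w))], w))
    return total


def tree_cost(parent: Dict[str, Optional[str]], latency: Latency) -> float:
    """Total latency of the overlay links (child, parent)."""
    return sum(latency[frozenset((c, p))]
               for c, p in parent.items() if p is not None)


def ratio_to_mst(parent: Dict[str, Optional[str]], latency: Latency) -> float:
    """Overlay tree cost divided by MST cost; 1.0 means the tree is an MST."""
    return tree_cost(parent, latency) / mst_cost(list(parent), latency)
```

No degree limit is enforced in `mst_cost`, which matches the setting of this comparison, where the degree limitation is not applied.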

Chapter 6

Summary and Future Work

6.1 Summary

Overlay networking has emerged as an alternative multicast technique in the networking area. It also finds application in file sharing, data center design, and telecommunication. We have focused on real-time content delivery using overlay networks, with the goal of overcoming some implementation issues of IP Multicast. The core idea behind the overlay network is establishing virtual links between users at the application layer. One user can transfer a video stream to another in a structured way without depending on the physical layer.

In this dissertation, we have introduced a new overlay multicast protocol, Virtual Directional Multicast (VDM). VDM exploits the concept of directionality to construct the multicast tree. Using directionality, VDM attempts to build its overlay tree congruent to the underlying network, so that network resources are utilized efficiently while end users are satisfied in terms of perceived quality. The key idea is connecting nodes that are estimated to be in the same virtual direction. The overlay tree is extended with new joins. A newcomer positions itself by measuring RTT to other nodes iteratively, starting at the source, and tries to find the best place to plug into the tree.

With VDM, we aim to reduce network cost and increase end-user satisfaction. The main motivation for multicast over unicast is to avoid overwhelming the Internet with transmissions of the same packets. At the same time, the end user should receive the stream within acceptable time limits and should be able to watch the video without experiencing stream interruptions.

We have done extensive simulation experiments to evaluate the performance of our proposed algorithm. We compared VDM with HMTP in order to demonstrate the effectiveness of the directionality concept. The two protocols are compared based on defined performance metrics. Simulation results show that VDM achieves better performance in terms of various metrics such as stretch, stress, loss, and overhead. VDM improved the network usage efficiency, which is important for the underlying network infrastructure on which the overlay multicast runs. VDM also improved the path stretch, which affects both the overlay tree participants and the physical network. Another improved metric is packet loss, which is important for applications with real-time and/or reliability requirements. Finally, we also showed that VDM causes less messaging overhead, which is a key factor for the scalability of overlay multicast applications.

We focused on the virtualization of distance calculation, and assigned a performance value using loss rate instead of delay. A multicast tree was then established using this virtual distance. This virtualization method helps us improve the desired performance metric and thus design target-specific multicast trees.

We implemented our algorithm on PlanetLab, which gave us the opportunity to test our protocol in a real environment. We first introduced the PlanetLab environment and explained our implementation. Some limitations induced by unstable nodes in PlanetLab were explained. We defined additional metrics such as startup time, reconnection time, and hopcount. We compared our protocol with others based on performance results.

6.2 Future Work

We have proposed a new overlay multicast protocol and implemented it in a simulator and on PlanetLab. So far, however, we have not sent a real video stream and watched it. The next step could be setting up several computers at different places around the country and transferring real video data. This would let us discover further limitations, if there are any, and understand the system requirements.

In our experiments, we randomly assigned each node a degree between upper and lower bounds. In practice, this degree depends on the outgoing bandwidth of the node. A system is required to measure and determine the degree of each node in a real implementation. Another problem is that even though one node has enough capacity to accept another node, a bottleneck between these two nodes may not satisfy the bandwidth requirement of the stream. These are some of the problems that require special study.

A generalization method was proposed and implemented in a simulator. However, measuring loss rate takes a long time compared to delay, which makes it infeasible for quick startup and reconnection. Third-party systems that provide statistics could be used to speed up the process.

Bibliography

[1] www.webvisions.com/.

[2] Aman Shaikh Jia Wang Jennifer Yates Yin Zhang Ajay Mahimkar, Zihui Ge and Qi Zhao. Towards automated performance diagnosis in a large iptv network. In Proc. of ACM SIGCOMM, 2009.

[3] Akamai. http://www.akamai.com/.

[4] Amazon. http://aws.amazon.com/cloudfront/.

[5] http://www.thewhir.com/web-hosting-news/aol070505.

[6] http://news.cnet.com/8301-30686_3-20006530-266.html.

[7] Sugih Jamin Beichuan Zhang and Lixia Zhang. Host multicast: A framework for delivering multicast to end users. In Proc. of IEEE INFOCOM, 2002.

[8] Michael Bishop and Sanjay Rao. Considering priority in overlay multicast pro- tocols under heterogeneous environments. In Proc. of IEEE INFOCOM, 2006.

[9] BitTorrent. http://www.bittorrent.com/.

[10] http://www.broadbandtvnews.com/2010/11/01/ online-video-surpasses-peer-to-peer-traffic/.

[11] M. Meo E. Leonardi D. Ciullo, M. Mellia. Understanding p2p-tv systems through real measurements. In Proc. of IEEE Globecom 2008, Dec. 2008.

[12] A. Helmy D. Thaler S. Deering M. Handely V. Jacobson C. Liu P. Sharma D. Estrin, D. Farinacci and L. Wei. Protocol independent multicast-sparse mode (pim-sm): Protocol specification. In RFC-1584, June 1997.

[13] D. Verma D. Pendarakis, S. Shi and M.Waldvogel. Almi: An application level multicast infrastructure. In Proceedings of the 3rd USNIX Symposium on Inter- net Technologies and Systems, Mar. 2001. 100 [14] S. Deering and D. Cheriton. Multicast routing in datagram internetworks and extended lans. In ACM Transactions on Computer Systems, volume 8, pages 85–100, May 1990.

[15] www.sandpiper.net.

[16] http://www.netdimes.org/new/.

[17] E. Leonardi M. Mellia M. Meo E. Alessandria, M. Gallo. P2p-tv systems under adverse network conditions: A measurement study. In Proceedings of IEEE INFOCOM, pages 100–108, April 2006.

[18] P. Francis. Yoid: Extending the multicast internet architecture. In White paper http://www.aciri.org/yoid/, 1999.

[19] A. Ganjam and H. Zhang. Internet multicast video delivery. In Proc. of IEEE, volume 93, pages 159–70, January 2005.

[20] M. R. Garey and D. S. Johnson. In Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, SanFrancisco, CA, 1979.

[21] Gnutella. http://www.gnutella.com/.

[22] Gt-itm: Georgia tech internetwork topology models. http://www.cc.gatech. edu/fac/Ellen.Zegura/graphs.html.

[23] T. Anderson A. Krishnamurthy H. V. Madhyastha, E. Katz-Bassett and A. Venkataramani. iplane nano: Path prediction for peer-to-peer applications. In NSDI, 2009.

[24] D. Helder and S. Jamin. End-host multicast communication using switch-trees protocols. In Proc of 2nd Workshop on Global and Peer-to-Peer Computing on Large Scale Distributed Sys., 2002.

[25] http://iplane.cs.washington.edu/.

[26] IPTV News. http://www.iptvnews.net.

[27] M. Nahas J. Liebeherr and W. Si. Application-layer multicasting with delaunay triangulation overlays. IEEE Journal on Selected Areas in Communications, 20(8):1472–1488, Oct 2002.

[28] Joost. http://www.joost.com.

[29] M. Kwon and S. Fahmy. Topology aware overlay networks for group communi- cation. In NOSSDAV02, May. 2002. 101 [30] http://www.akamai.com/html/technology/dataviz2.html.

[31] Limelight. www.limelightnetworks.com/.

[32] LiveStation. http://www.livestation.com/.

[33] A. Kermarrec A. Nandi A. Rowstron M. Castro, P. Druschel and A. Singh. Split- stream: High-bandwidth multicast in cooperative environments. In 9th ACM Symp. on Operating Systems and Principles (SOSP), pages 298–313, 2003.

[34] A.-M. Kermarrec M. Castro, P. Druschel and A. Rowstron. Scribe: A large-scale and decentralized application-level multicast infrastructure. IEEE Journal on Selected Areas in Communications, 20(8):1489–1499, Oct 2002.

[35] Z. Ge J. Yates M. Cha, W. A. Chaovalitwongse and S. Moon. Path protection routing with srlg constraints to support iptv in wdm mesh networks. In Proc. of IEEE Global Internet Symposium, 2006.

[36] B. Botev D. Xu M. Heffeeda, A. Habib and B. Bhargava. Promise: Peer-to-peer media streaming using collectcast. In Proc. ACM Multimedia, November 2003.

[37] S. Mercan and M. Yuksel. Virtual direction multicast for overlay networks. In 8th International Workshop on Hot Topics in Peer-to-Peer Systems (HOTP2P), May 2011.

[38] J. Moy. Multicast extensions to ospf. In RFC-1584, March 1994.

[39] The Network Simulator NS-2. http://www.isi.edu/nsnam/ns/.

[40] OpenNap. http://opennap.sourceforge.net/ADC.html.

[41] OverSim: The Overlay Simulation Framework. http://www.oversim.org/.

[42] P2PSim. http://pdos.csail.mit.edu/p2psim/.

[43] P2PTV. http://en.wikipedia.org/wiki/P2PTV.

[44] PeerCast. http://www.peercast.org.

[45] http://www.planet-lab.org/.

[46] PlanetSim. http://planet.urv.es/planetsim/.

[47] PPLive. http://www.pplive.com/.

[48] PPMate. http://www.ppmate.com/. 102 [49] PPStream. http://www.ppstream.com/.

[50] QQLive. http://www.qq.com/.

[51] A. Mathur S. Ali and Hui Zhang. Measurement of commercial peer-to-peer live video streaming. In Proc. 1st Workshop Recent Adv. Peer-To-Peer Streaming, Aug. 2006.

[52] C. Kommareddy S. Banerjee and B. Bhattacharjee. Scalable application layer multicast. In Proc. of IEEE ACM SIGCOMM, Aug. 2002.

[53] D. Farinacci V. Jacobson C. Liu S. Deering, D. Estrin and L.Wei. An architecture for wide-area multicast routing. In Proc. of ACM SIGCOMM, 1994.

[54] R. Karp S. Ratnasamy, M. Handley and S. Shenker. A scalable content- addressable network. In Proceeding of ACM SIGCOMM, 2001.

[55] R. Karp S. Shenker S. Ratnasamy, M. Handley. Topologically-aware overlay construction and server selection. In Proceeding of Infocom, 2002.

[56] Skype. http://www.skype.com/.

[57] SOPCast. http://www.sopcast.org/.

[58] TVAnts. http://www.tvants-ppstream.com/.

[59] TVUNetworks. http://www.tvunetworks.com/.

[60] UUSee. http://www.uusee.com/.

[61] P. Chou V. N. Padmanabhan, H. Wang. Resilient peer-to-peer streaming. In Proc. of IEEE ICNP, November 2003.

[62] V. Sambamurthy K. Kumar V. Pai, K. Tamilmani and A. Mohr. Chainsaw: Elim- inating trees from overlay multicast. In Proc. of the 4th International Workshop on Peer-to-Peer Systems (IPTPS), February 2005.

[63] P. Francis V. Venkataraman and J. Calandrino. Chunkyspread: Multi-tree un- structured peer-to-peer multicast. In 6th IPTPS, 2006.

[64] B. Li X. Zhang, J. Liu and Y. Yum. Coolstreaming/donet: A data-driven overlay network for peer-to-peer live media streaming. In IEEE INFOCOM, 2005.

[65] S. Goose D. Hedqvist Y. Amir, C. Danilov and A. Terzis. 1-800-overlays: using overlay networks to improve voip quality. In Int. Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), 2005. 103 [66] S. McCanne Y. Chawathe and E. A. Brewer. Rmx: Reliable multicast for het- erogeneous networks. In Proc. of IEEE IEEE INFOCOM, Mar. 2000.

[67] S. G. Rao Y.H. Chu and H. Zhang. A case for end system multicast. In Proc. of ACM SIGMETRICS, June 2000.

[68] Zattoo. http://zattoo.com/.