Design and Implementation of Centrally-Coordinated Peer-to-Peer Live-streaming

ROBERTO ROVERSO

Licentiate Thesis Stockholm, Sweden 2011 TRITA-ICT/ECS AVH 11:03 KTH School of Information and ISSN 1653-6363 Communication Technology ISRN KTH/ICT/ECS/AVH-11/03-SE SE-100 44 Stockholm ISBN 978-91-7415-957-8 SWEDEN

Akademisk avhandling som med tillstånd av Kungl Tekniska högskolan framlägges till offentlig granskning för avläggande av Licentiatexamen i Elektronik och Da- torsystem Torsdagen den 5 maj 2011 klockan 14.00 i Sal D, ICT School, Kungl Tekniska högskolan, Forum 105, 164 40 Kista, Stockholm.

© ROBERTO ROVERSO, May 2011

Tryck: Universitetsservice US AB i

Abstract

In this thesis, we explore the use of a centrally-coordinated peer-to-peer overlay as a possible solution to the live streaming problem. Our contribution lies in showing that such approach is indeed feasible given that a number of key challenges are met. The motivation behind exploring an alternative design is that, although a number of approaches have been investigated in the past, e.g. mesh-pull and tree-push, hybrids and best-of-both-worlds mesh-push, no consensus has been reached on the best solution for the problem of peer-to-peer live streaming, despite current deployments and reported successes. In the proposed system, we model sender/receiver peer assignments as an optimization problem. Optimized peer selection based on multiple utility factors, such as bandwidth availability, delays and connectivity compatibility, make it possible to achieve large source bandwidth savings and provide high quality of user experience. Clear benefits of our approach are observed when Network Address Translation constraints are present on the network. We have addressed key scalability issues of our platform by parallelizing the heuristic which is the core of our optimization engine and by implement- ing the resulting algorithm on commodity Graphic Processing Units (GPUs). The outcome is a Linear Sum Assignment Problem (LSAP) solver for time- constrained systems which produces near-optimal results and can be used for any instance of LSAP, i.e. not only in our system. As part of this work, we also present our experience in working with Network Address Translators (NATs) traversal in peer-to-peer systems. Our contribution in this context is threefold. First, we provide a semi-formal model of state of the art NAT behaviors. Second, we use our model to show which NAT combinations can be theoretically traversed and which not. Last, for each of the combinations, we state which traversal technique should be used. Our findings are confirmed by experimental results on a real network. Finally, we address the problem of reproducibility in testing, debugging and evaluation of our peer-to-peer application. We achieve this by providing a software framework which can be transparently integrated with any already- existing software and which is able to handle concurrency, system time and network events in a reproducible manner.

iii

Acknowledgements

I am extremely grateful to Sameh El-Ansary for the way he supported me during this work. This thesis would not have possible without his patient supervision and constant encouragement. His clear way of thinking, methodical approach to problems and enthusiasm are of great inspiration to me. He also taught me how to do research properly given extremely complex issues and, most importantly, how to make findings clear for others to understand. In particular, I admire his practical approach in solving problems in an efficient and timely fashion given the very demanding goals and strict deadlines imposed by the industrial setting we have been working in. In this years, he has treated me more as a friend than a colleague/student and I feel very privileged to have worked with him and hope to continue doing so in the future. I would like to acknowledge my supervisor Seif Haridi for giving me the oppor- tunity to work under his supervision. His vast knowledge of the field and experience was of much help to me. In particular, I take this opportunity to thank him for his understanding and guidance for the complex task which was combining the work as a student and as employee of Peerialism. I am grateful to Peerialism AB for funding my studies and to all the company’s very talented members for their help: Johan Ljungberg, Andreas Dahlström, Mo- hammed El-Beltagy, Nils Franzen, Magnus Hedbeck, Amgad Naiem, Mohammed Reda, Jonas Vasell, Christer Wik, Riccardo Reale and Alexandros Gkogkas. Each one of them has contributed to this work in his own way and has made my stay at the company very enjoyable. I would also like to acknowledge all my colleagues at the Swedish Institute of Computer Science: Cosmin Arad, Joel Höglund, Tallat Mahmood Shaffat, Ahmad Al-Shishtawy, Amir Payberah and Fatemeh Rahimian. In particular, I would like to express my gratitude to Jim Dowling for providing me with valuable feedback on my work. Special thanks go to all the people that have supported me in the last years and have made my life exciting and cherishable: Stefano Bonetti, Tiziano Pic- cardo, Christian Calgaro, Jonathan Daphy, Tatjana Schreiber to name a few and, in particular, Maren Reinecke for for her love. Finally, I would like to dedicate this work to my parents Segio and Odetta and close relatives, Paolo and Ornella, who have at all times motivated and helped me in every possible way.

To my family

Contents

Contents vii

I Thesis Overview 1

1 Introduction 3 1.1 Content Streaming ...... 4 1.2 Thesis Organization ...... 6

2 Peer-To-Peer Live Streaming 7 2.1 Tree-based ...... 9 2.1.1 Overlay maintenance and construction ...... 11 2.2 Mesh-based ...... 11 2.3 Hybrid Approaches ...... 13

3 Problem Definition 15

4 Thesis Contribution 17 4.1 List of publications ...... 17 4.2 Design and implementation of a centrally-managed peer-to-peer live streaming platform ...... 18 4.2.1 Solving Linear Sum Assignment Problems in a time-constrained environment ...... 19 4.3 NAT Traversal ...... 19 4.4 Highly reproducible emulation of P2P systems ...... 20

5 Conclusion and Future Work 21

Bibliography 25

vii viii CONTENTS

II Research Papers 33

6 On The Feasibility Of Centrally-Coordinated Peer-To-Peer Live Streaming 35

7 NATCracker: NAT Combinations Matter 43

8 GPU-Based Heuristic Solver for Linear Sum Assignment Prob- lems Under Real-time Constraints 53

9 MyP2PWorld: Highly Reproducible Application-level Emula- tion of P2P Systems 61 Part I

Thesis Overview

1

Chapter 1

Introduction

Peer-to-peer (P2P) systems have shown a significant evolution since first introduced to the world by Gnutella [41] and Kazaa [27]. Nowadays, applications which use the peer-to-peer approach vary from illegal file sharing to distributing games updates. It is safe to state that Bittorrent in particular has been a major force in driving the bandwidth demands of most consumer networks throughout the last 5 years. It is estimated that in 2009 one fourth to one third of the Internet traffic was somehow related to P2P applications [20]. The consequences of an increased popularity of P2P platforms has coincided with efforts from the academia to try to understand how such an important amount of traffic influences the Internet and what can be done to reduce its congesting impact, in particular regarding Bittorrent [39][44]. The industry as well has applied the peer-to-peer approach to a number of areas, including VoIP, with Skype [18], distributed storage, with Wuala [19], and on-demand audio streaming, with Spotify [25]. All the aforementioned are attempts to provide services which do not involve significant costs from the point of view of bandwidth consumption and infrastructure. P2P-based software amounts to only a tiny part of all Internet-based services. The bulk of the industry instead relies on expensive but reliable solutions, that are content delivery networks (CDNs) and Clouds. In particular, when considering video streaming, no commercial peer-to-peer software has been widely deployed. However, a number of free applications such as SopCast [2] and PPLive [14] have proven very effective for large-scale live streaming, mostly because of their limited requirements in terms of bandwidth at the distribution site. In fact, most of the source of the streams in those systems are users which broadcast live content for thousands of others with limited upload bandwidth. On the other hand, the aforementioned applications provide a low quality of service which would not be acceptable in a commercial system. Many solutions and different approaches have been proposed by the research community to the problem of streaming live video and audio content over the In- ternet using overlay networks. However, no consensus has been reached on which

3 4 1.1. CONTENT STREAMING one of them solves best the difficult task of guaranteeing high quality of user expe- rience while providing a large amount of bandwidth savings. In this thesis, we explore an alternative and novel approach to the aforemen- tioned problem, that is using central coordination for controlling the delivery of content on overlay networks. This in the hope of showing the viability of such approach for large-scale live content streaming. The next sections detail the types, requirements and technologies currently used in content streaming services. This will serve as a background for a better under- standing of the challenging problem that is efficiently distributing content streams over the Internet.

1.1 Content Streaming

Streaming services can be classified in two main classes: video-on-demand and live.

Video-on-Demand(VOD). VOD allows users to select and watch pre-recorded video/audio material at any point in time. Users are usually presented with a catalog of streams to choose from; once a decision has been made, the stream is sent to the player as a flow of video/audio chunks. VOD allows for the delivery to start at any point of the stream. Seeking operations are also allowed. VOD has an inherently sparse content popularity distribution. It is widely recognized that content follows a long tail, where the majority of the videos are not accessed very often, while few popular others are requested very frequently [8]. The complexity of VOD lies in guaranteeing the same quality of experience for popular and non-popular content items.

Live Streaming. The main difference of Live streaming compared to VOD is that the content is not pre-recorded. The stream instead is being created and broadcasted in real-time. As a requirement, every client receiving the live content should have the minimum possible delay from the moment the content becomes available at the distribution point, i.e. the streaming server, to the point when it gets played at the receiver’s end. A desirable feature is also to minimize the inter- client delay, i.e. the playback point should be synchronized or within a reasonable time window across all clients. Live streaming is usually implemented using stateful network control protocols, such as RTSP [52], where clients establish and manage media sessions towards the streaming server by issuing VCR-like commands, e.g. play, stop and pause. The media delivery is carried out using a transport protocol such as RTP [51], however proprietary alternatives are also common, e.g. RDT [1]. In standard live streaming, it is the server that pushes content fragments at a specific rate to the client following a single play request. At the transport level, standard live streams are delivered through UDP, while TCP is used for control messages. CHAPTER 1. INTRODUCTION 5

Recently, the industry has introduced a new technology for live streaming, called HTTP-live. In HTTP-live, the stream is split into a number of small HTTP files, i.e. fragments. The streaming server appears to the clients as a standard HTTP server. When a client first contacts the streaming server, it is presented with a descriptor file, called Manifest, which outlines the characteristics of the stream. It contains the stream’s fragments path on the HTTP server and the bitrate characteristics. As the content becomes available from an encoder or a capturing device, the streaming server creates new fragments and regenerates the Manifest accordingly. The player periodically requests a new copy of the Manifest to be aware of which fragments are available at the current time. Reasons behind the development of HTTP live protocol are the simplicity of management at server side and the use of HTTP as a transport protocol, which makes it easier to achieve good connectivity in restrictive environments such as corporate networks. Examples of HTTP live protocols are Apple’s Live streaming [16] and Microsoft’s Smooth Streaming [17].

Bandwidth Requirements. Media streams are usually quite demanding from the point of view of bandwidth consumption. You Tube for example requires a minimum bitrate of 256Kbit/s for normal quality videos encoded with the H264 codec. With the same video compression format, it is possible to achieve a quality comparable to Digital Satellite TV at 1.5Mbit/s, whereas for an HD quality stream a minimum bitrate of 4Mbit/s is required. The high bitrate requirements of video streaming raise obvious challenges. First, from the point of view of server infrastructure, since a single streaming server is typically able of handling just a few thousands of clients. And second, from the point of view of bandwidth consumption, because streaming requires a capacity of many Gbit/s towards the distribution site. Bandwidth capacity is by far the most expensive of the two aspects. Pricing as of Q4 2010 for streaming from a CDN is shown in Table 1.1.

Volume Max Price ($) Min Price ($) 50TB 0.45 0.40 100TB 0.25 0.20 250TB 0.10 0.06 500TB 0.06 0.02

Table 1.1: Table showing the highest and lowest prices per acquired volume in the CDN market as of Q4 2010. Data taken from [40].

Distribution Infrastructure. Live and VOD streams are mostly distributed using unicast towards a single content source or a CDN. Multicast is also exploited. 6 1.2. THESIS ORGANIZATION

Given that providers of steaming services are usually ISPs, the quality of the service is guaranteed by means of network resources reservation. Alternatives to the ISP approach include proprietary application-level solutions, such as Voddler, Netlix and Hulu. The delivery of audio and video streams in this case happens without any guarantee of quality of service or prevention of service interruption. Despite their best effort nature, these solutions have known a large amount of success in the last years. Internet-based services use different delivery strategies for streaming: • Unicast: Single End-to-End connectivity through either TCP or best effort UDP is implemented in this case. The load of multiple clients is usually shared among multiple locations of a Content Delivery Network. Server farms are placed in strategical geographical locations. Proximity to clients allow for lower distribution delays. CDNs are usually organized in a way to lower peering costs by placing servers with copies of the same content inside ASs and ISPs. • IP Multicast: support for efficient broadcasting in this case is implemented at Network layer. Multicast is the most efficient way to deliver streams of content, since the distribution happens using a single stream of data along a tree-like network path. However, IP Multicast setup and configuration is cumbersome and requires expensive hardware. • Peer-To-Peer: As opposed to the aforementioned strategies, the peer-to- peer approach allows to utilize hosts as relays for content streams. A client plays a double role: it receives the content data delivering it to the player, and it makes the data available to other peers. This approach allows for sharing the load of distribution among all involved hosts. Only few peers need to be receiving content from the source, whilst the others can retrieve it from them. Obviously, if the peer-to-peer delivery is organized in the right way, this can lead to significant savings in terms of bandwidth utilization at the distribution site and to improved scalability of service.

1.2 Thesis Organization

This chapter provides a general introduction to the thesis and the problem of con- tent streaming. Chapter 2 presents the state of the art of peer-to-peer live stream- ing. A definition of the problems addressed in this work is presented in Chapter 3. The contribution to the defined issues is explained in Chapter 4. Section 4.1 provides the list of publications related to this thesis. Finally, Chapter 5 concludes the thesis and gives an insight of future directions of this work. Chapter 2

Peer-To-Peer Live Streaming

Peer-to-peer content streaming can be viable solution to provide scalability and distribution savings. However, P2P live streaming is subject to a number of chal- lenges. Typically, in any live streaming architecture, the source streamlines the content into a number of data chunks of fixed or variable size which need to be de- livered to the player at fixed rate and within a strict deadline. Missing a deadline might cause loss of quality, temporary interruption or full termination of the play- back. The behaviour of the content players, when such problematic events happen, can vary significantly according to the type of transport protocol, the codec used for encryption/decryption and quality of the stream.

Challenges

The main challenge for a peer-to-peer live streaming system is to meet real-time constraints while coping with dynamicity, e.g. churn, network congestion and con- nectivity limitations. On top of that, the application should strive for efficient bandwidth utilization. The peers in the overlay network should in fact contribute with their upload bandwidth in order to offload as much as possible the source of the content. Churn, defined as the process and rate of peer joining and leaving the peer- to-peer network, is an important issue in peer-to-peer systems. This because nodes have very limited time to adapt to overlay network changes as deadlines for video/audio fragments are in the order of seconds rather than minutes. Studies on peer-to-peer file-sharing systems have shown that node lifetimes fol- low an heavy-tailed Pareto distribution [6][50], where few peers have very long lifetime while all others have a very short one. In addition to that, the population size tends to remain stable over time. In live streaming instead, churn behaviour is significantly different. In Figure 2.1(a) and Figure 2.1(b), we show the arrival and departure rates of a movie channel

7 7

3000 with each other. We will address this peer dynamics more 2500 in Section V-D3.

2000 250

1500 200 # of Peers

1000

150 500

# of Arrival 100 0 0 2 4 6 8 10 12 14 16 18 20 22 24 Time(h) 7 (a) Popular TV channel 50

3000 0 70 0 2 4 6 8 10 12 14 16 18 20 22 24 with each other. We will address this peer dynamics more Time(h) 60 2500 in Section V-D3. (a) Peer arrival rate 50 2000 250 1400 40 1500 1200 30 200 # of Peers # of Peers 1000 1000 20 150 800 7 500 10 600

# of Arrival 100 0 0 # of Departure 0 2 4 6 8 10 12 14 16 18 20 22 24 30000 2 4 6 8 10 12 14 16 18 20 22 24 Time(h) Time(h) with each400 other. We will address this peer dynamics more 50 2500 (a) Popular TV channel (b) Unpopular TV channel in Section V-D3.200

70 2000 0 0 Fig. 9. Diurnal trend of number of participating0 2 4 6 users8 on10 Oct.12 13,14 200616 18 20 22 24 250 0 2 4 6 8 10 12 14 16 18 20 22 24 Time(h) Time(h)

60 1500 (a) Peer arrival rate 200 (b) Peer departure rate # of Peers 50 1000 1400 40 Fig. 11. Peer150 arrival and departure evolution of a popular movie channel pull P2P streaming500 systems scales well, handling a flash crowd 1200 30

# of Peers in a live broadcasting.

# of Arrival 100 0 1000 0 2 4 6 8 10 12 14 16 18 20 22 24 20 Time(h) We also plot the numbers of peer arrivals and departures of 6 10 800 7 50 10 Jan.(a) 28 Popular TV channelJan. 29 the popular TV channel in every minute of one day in Figure 600 12. We observe that the arrival pattern of this TV channel 0 # of Departure 3000 0 0 2 4 6 8 10 12 14 16 18 20 22 24 70 0 2 4 6 8 10 12 14 16 18 20 22 24 Time(h) 5 8 400 10 with each other. We will address this peer dynamicsis similar more to that of the movieTime(h) channel. However, there is no 2500 60 (b) Unpopular TV channel in Section V-D3.200 periodic batch departure(a) pattern Peer arrival for rate this TV channel. 50

2000 # of peers 0 250 0 2 4 6 8 10 12 14 16 18 20 22 24 3001400 Fig. 9. Diurnal trend of number of participating users on Oct. 13, 20064 10 Time(h) 40 1500 1200 250 30 200 (b) Peer departure rate # of Peers # of Peers 1000 1000 200 203 10 4 8 Fig.12 11.16 20 Peer15024 arrival4 and8 departure12 16 20 evolution24 of a popular movie channel 800 pull P2P streaming500 systems scales well, handling a flash crowd Time(h) 10 150 in a live broadcasting. 600

# of Arrival 100 0 # of Arrival

0 # of Departure 0 2 4 6 8 10 12 14 16 18 20 Fig.22 10.24 Flash0 crowd2 4 on Chinese6 8 10 new12 year14 eve16 18 20 22 24 100 Time(h) We also plot the numbers of peer arrivals and departures of 6 Time(h) 400 10 50 Jan. 29 the popular TV channel in every minute of one day in Figure Jan.(a) 28 Popular TV channel (b) Unpopular TV channel 50200 12. We observe that the arrival pattern of this TV channel 70 0 00 0 2 4 6 8 10 12 14 16 18 20 22 24 00 22 44 66 88 1010 1212 1414 1616 1818 2020 2222 2424 5 C.Fig. User 9. Diurnal Arrivals trend and of number Departures of participating users on Oct. 13, 2006 10 is similar to that of the movieTime(h) channel. However, there is no Time(h)Time(h) 60 In this section,periodic we examine batch the departure peer(a) arrival pattern Peer(a) arrivaland for departure rate this TV channel. (b)(a) Peer Peer(b) departure arrival rate rate 50

# of peers pattern for various PPLive channels. We plot the numbers of 1400300 4 250 10 Fig. 11. Peer arrival and departure evolution of a popular movie channel 40 peerpull P2P arrivals streaming and departures systems scales of the well, popular handling movie a flash channel crowd in 1200 250 30 200 # of Peers everyin a live minute broadcasting. of one day in Figure 11. Comparing it with 1000 200 203 the evolution of the number of participating peers, we find 10 We also plot the numbers of peer arrivals and departures of 6 4 8 12 16 20 24 4 8 12 16 20 24 10 150 Time(h) that peers join and leave at800 a higher rate at peak times. We 10 Jan. 28 150 Jan. 29 the popular TV channel in every minute of one day in Figure 600 also see consecutive spikes# of Arrival with a period of about 2 hours 12. We observe100 that the arrival pattern of this TV channel 0 # of Departure Fig. 10. Flash0 crowd2 4 on Chinese6 8 10 new12 year14 eve16 18 20 22 24 100 # of Departure

Time(h) 5 400 in the departure10 rate curve in Figure 11(b). The spikes are is similar to that of the movie channel. However, there is no 50 (b) Unpopular TV channel due to many peers leaving immediately20050 and simultaneously at periodic batch departure pattern for this TV channel. the end of (roughly) two-hour programs. This batch-departure # of peers 00 0 00 22 44 66 88 1010 12 14 16 18 20 2222 2424 3000 2 4 6 8 10 12 14 16 18 20 22 24 Fig.C. User9. Diurnal Arrivals trend and of number Departures of participating users on Oct. 13, 20064 pattern in P2P10 IPTV systems is different from P2PTime(h) file sharing Time(h) In this section, we examine the peer arrivalsystems, and departure where peer departures are mostly(b)(a) Peer Peer triggered departure(c) arrival rate rate by the 250 (b) Peer(d) departure rate pattern for various PPLive channels. We plot the numbers of asynchronous completions (or, the detections of completions) 200 3 250 10 4 8 Fig.12 11.16 Peer20 24 arrival4 and8 departure12 16 20 evolution24 of a popularFig. movie 12. channel Peer arrival and departure evolution of a popular TV channel pullpeer P2P arrivals streaming and departures systems scales of the well, popular handling movieof a flashfile channel downloads. crowd in ThisFigure suggestsTime(h) 2.1: thatPeer P2P arrival IPTV (a) and systems departure (b) rate of a Movie channel and a TV channel (c) (d) 1 150 inevery a live minute broadcasting. of one day in Figure 11. Comparingexpect lower it with peer churnin PPLive rates200 in the middle of a program. Fig. 10. Flash crowd on Chinese new year eve # of Arrival the evolution of the number of participatingConsequently, peers, we find peersWe can also maintain plot the morenumbers stable of peer partnership arrivals and departuresWe define100 of the peer lifetime as the time between the arrival 6 that peers10 join and leave at a higher rate at peak times. We 150 Jan. 28 Jan. 29 the popular TV channel in every minute of one day in Figure50 also see consecutive spikes with a period of about 2 hours 12. We observe100 that the arrival pattern of this TV channel

# of Departure 1 0 broadcasted using the PPLive steaming platform0 2 during4 6 8 a 1024 12hours14 16 time18 20 span.22 24 As 5 C. User Arrivals and Departures in the departure10 rate curve in Figure 11(b). The spikes are is similar to that of the movie channel. However, there is no Time(h) we can50 see, the join rate varies significantly during the day and peaks out at noon due to many peers leaving immediately and simultaneouslyIn this section, at periodic we examine batch the departure peer arrival pattern and for departure this TV channel. (a) Peer arrival rate the end of (roughly) two-hour programs. This batch-departure and late evening. Being this a movie channel, we observe batch peer departures at # of peers pattern for various PPLive channels.0 We plot the numbers of 300 4 0 2 4 6 8 10 12 14 16 18 20 22 24 250 pattern in10 P2P IPTV systems is different from P2Ppeer file arrivals sharing and departuresthe end of the of each popular movie, movieTime(h) i.e. channel every in second hour. Figure 2.1(c) and 2.1(d) instead show systems, where peer departures are mostly triggered by the 250 every minute of one daythe in same Figure type 11.(b) of ComparingPeer statistics departure itrate but with for a TV channel.200 The arrival rate is very similar asynchronous completions (or, the detections of completions) 200 3 the evolution of the numberto the of previous participating case, peers, however, we find the departures trend is totally different. Figure 2.2 10 4 8 12 16 20 24 4 8 12 16 20 24 Fig. 12. Peer arrival and departure evolution of a popular TV channel150 of file downloads. This suggestsTime(h) that P2Pthat IPTV peers systems join and leaveshows at a the higher lifetime rate at peakdistribution times. We of the same system. As we can see, most of the expect lower peer churn rates in the middle of a program. 150

also see consecutive spikes# of Arrival with a period of about 2 hours users, both for the movie and TV channel, stay100 online for less than 1.5 hours. It is Fig. 10. Flash crowd on Chinese new year eve # of Departure Consequently, peers can maintain more stable partnership We define100 the peer lifetime as the time between the arrival in the departure rate curvetherefore in Figure clear 11(b). that The a steady spikes amount are of dynamicity should be expected in a peer- 50 due to many peers leavingto-peer immediately50 live streaming and simultaneously system at all times. In addition, flashcrowds and sudden the end of (roughly) two-hour programs. This batch-departure 0 0 C. User Arrivals and Departures departure0 2 of4 large6 8 number10 12 14 of16 peers18 20 are22 24 also observed.0 2 4 6 8 10 12 14 16 18 20 22 24 pattern in P2P IPTV systems is different from P2PTime(h) file sharing Time(h) In this section, we examine the peer arrivalsystems, and departure where peer departuresFurther are source mostly(a) Peer triggeredof arrival disruption rate by the in a peer-to-peer network(b) Peer is departure congestion. rate Although pattern for various PPLive channels. We plot theasynchronous numbers of completionsit is (or, believed the detections that the of completions) last mile segment is usually the bottleneck along a network 250 peer arrivals and departures of the popular movieof file channel downloads. in Thisroute, suggests congestion that P2P can IPTV be systemsexperiencedFig. 12. at differentPeer arrival and levels departure in evolution the network. of a popular This TV channel is every minute of one day in Figure 11. Comparingexpect lowerit with peer churncaused rates200 mainlyin the middle by the of afact program. that ISPs and ASs dimension their internal network the evolution of the number of participating peers,Consequently, we find peers canand maintain border linksmore consideringstable partnership the averageWe define traffic, the rather peer lifetime than asthe the peak. time between This leaves the arrival 150 that peers join and leave at a higher rate at peak times. We room to possible congestion scenarios when most of the users are downloading also see consecutive spikes with a period of about 2 hours 100

significant# of Departure amount of data, e.g. when accessing high definition live streaming events in the departure rate curve in Figure 11(b). The spikes are due to many peers leaving immediately and simultaneously at 50 1Figures taken from "A Measurement Study of a Large-Scale P2P IPTV System" [15] the end of (roughly) two-hour programs. This batch-departure 0 0 2 4 6 8 10 12 14 16 18 20 22 24 pattern in P2P IPTV systems is different from P2P file sharing Time(h) systems, where peer departures are mostly triggered by the (b) Peer departure rate asynchronous completions (or, the detections of completions) of file downloads. This suggests that P2P IPTV systems Fig. 12. Peer arrival and departure evolution of a popular TV channel expect lower peer churn rates in the middle of a program. Consequently, peers can maintain more stable partnership We define the peer lifetime as the time between the arrival 8 and the departure of the peer. Our analysis shows that peer largest share at about 8PM EST. Thus the behavior of users lifetimes vary from very small values up to 16 hours. There in North America is quite similar to users in Asia. are totally 34, 021 recorded peer sessions for the popular 100% channel and 2, 518 peer sessions for the unpopular channel. Others 90%

The peer lifetime distribution in Figure 13 suggests that peers 80% preferCHAPTER to stay longer 2. PEER-TO-PEER for popular programs LIVE than STREAMING for unpopular 70% 9 programs. However 90% of peers for both programs have 60% 50% lifetimes shorter than 1.5 hours. 40% ! Jan. 28 Jan. 29 Distribution of peers 30% 1 " 20% 0.9 10% Others 0.8 North America Asia

! 4 8! 12 16 20 24 4 8 12 16 20 0.7 #! $ Time(h) 0.6 0.5 Fig. 15. Evolution of geographic distribution during Chinese new year’s eve 0.4 ! ! ! ! ! CDF Probability 0.3 % & ' ( )

0.2 One lesson learned from this crawling study is that, with PopMovie 0.1 UnpopTv PopTv P2P IPTV systems, it is possible to track detailed user be- 0 ! ! ! 0 30 60 90 120 150 180 * "+! ""! Time(mins) havior. Unlike traditional! broadcast television, the operators "+! of a P2P IPTV system can track the type of programs a Figure 2.2: Peer Lifetime Distribution1 Figure 2.3: Tree-based system Fig. 13. Peer lifetime distribution user watches, the region in which the user lives, the times at which the user watches, and the users channel switching behavior. Such detailed information will likely be used in the D.of User particular Geographic interest. Distribution Effects of network congestion are:future longer for targeted, transmission user-specific delay, advertising. This research also demonstrates that an independent third-party can also track packetWe classify loss andPPLive blocking users into of new three connections. regions: users from peer and user characteristics for a P2P IPTV system. Similar Asia,An users additional from North requirement America, and of users a live from streaming the rest of platform is that the stream must to file-sharing monitoring companies (such as Big Champagne thebe world. received To accomplish with a small this classification, delay from we the map source a peer’s of the stream. Furthermore, users [17]), IPTV monitoring companies will likely emerge, and will IP address to a region by querying the free MaxMind GeoIP in the same geographical location should not experienceprovide significant content creators, difference content in distributors, and advertisers database [16]. Figure 14 shows the evolution of the geographic playback point with respect to their neighbours. Thesewith information requirements about tie user directly interests. distributioninto the meaningof the popular of live channel broadcasting. during one full Guaranteeing day. The low delay from the source is figureparticularly is divided cumbersome into three regions in by peer-to-peer two curves. The networks bottom since, in most cases, the stream region is made up of the peers from Asia, the middle region IV. PLAYBACK DELAY AND PLAYBACK LAGS AMONG has to traverse multiple hops. In order to solve this issue, it is often necessary to is for the peers from North America, and the top region is PEERS forintroduce peers from structure the rest of in the the world. overlay We can such see that that most the amount of As of theoretically hops can be demonstrated kept under in [18], appropriate buffering userscontrol. come from Asia. Again, the percentage of peers from can significantly improve video streaming quality. However, Asia reachesFinally, the a lowest peer-to-peer point around live7 streamingPM to 8PM application EST. too should much strive buffering for may keeping make the the delay performance unac- traffic inside a certain network segment, i.e. an ISP orceptable an Autonomous for a streaming System service. (AS) In this section, we report network.100% In fact, hosts which belong to a certain segmentquantitative are likelyresults to have on the low buffering com- effect of PPLive peers on munication delay between them, whereas promoting intra-segmentthe delay performance. communication In particular, we used our measurement platforms, to obtain insights into start-up delay and playback also lowers95% incurred peering costs for network operators. There exists three main approaches to peer-to-peerlags oflive peers. streaming, we detail

them in90% the next sections. Distribution of Peers A. Start-up Delay

Others Start-up delay is the time interval from when one channel is North America Asia 85% 2.1 Tree-based0 2 4 6 8 10 12 14 16 18 20 22 24 selected until actual playback starts on the screen. For stream- Time(h) ing applications in the best-effort Internet, start-up buffering Fig.The 14. Tree-based Geographic distribution approach of popular aims movie to channel recreate the samehas network always structure been a useful of IP mechanism mul- to deal with the rate ticast but with using an overlay network. The peersvariations organize of themselves streaming sessions. in a tree P2P streaming applications whoseFig 15 plots root the is evolution the broadcast of peer geographic provider. distribution The content for isadditionally then pushed have to from deal thewith root peer churn, increasing the need thealong Spring the Festival tree by Gala letting event onpeers the pastreceive Chinese the Newvideo Year’s from theirfor startup parents buffering and then and delay forward [18]. While short start-up delay Eve.it to This their figure children. has the same format as Figure 14, with three is desirable, certain amount of start-up delay is necessary for regionsThe denoting typical three structure different geographical of a system regions. based We on can the tree-basedcontinuous playback. approach Using is shown our monitored peers (see Section see that for this event, many peers from outside of Asia were V), we recorded two types of start-up delays in PPLive: the in Figure 2.3. Peers with highest upload bandwidth capacity are usually placed watching this live broadcast – in fact, a significantly higher delay from when one channel is selected until the streaming percentagenear the of the peers source were of from the outside stream, of Asia in this as compared case peersplayer number pops two up; and and the three, delay and from when the player pops up withthe Figure others 14. are The positioned geographic intodistribution new rows evolution according is con- tountil their the available playback actually capacity starts. in a For a popular channel, the sistentdecreasing with the order. observations Examples in Section of III-C: tree-based Peers from systems North aremeasured Overcast player [22], pop-up Climber delay [34] was from 5 to 10 seconds and America have the smallest share at about 7AM EST, and the the player buffering delay was 5 to 10 seconds. Therefore, 10 2.1. TREE-BASED

,! +!

*!

! ! %&! % ! ! &'! ! ! %%! ! &&! (! ! ! ! ) #! *! ! + #! ! ! ! ! ! % ,! ! ' ! )! ( %(! ! &! $! ! ! ! $! ! ! " ! %&! " ! "! &'! %,! " ! # ! # Figure 2.4: ! Multitree-based system Figure 2.5: Mesh-based system $ ! $ ! and ZigZag [54]. The main advantage of a tree-based overlay is that the distribu- tion delay is predictable and corresponds to the sum of the delays along the overlay path. There exist a number of ways to construct a peer-to-peer tree given a set of available peers. Two major aspects must be considered when building such a structure: the depth of the tree and the fan-out degree of the internal nodes. For instance, as peers from lower levels in the tree receive content from others which are closer to the source, it is necessary to keep the number of rows in the tree to a bare minimum in order to avoid delays. As a consequence, the fan out degree of the internal nodes in the tree should be as large as possible. However, the fan out degree of a node is constrained by its upload capacity and it is therefore able to provide only to a limited number of peers. In order to improve bandwidth utilization, a multi-tree structure can be used. In a single tree structure, the leaves of the tree do not contribute to the delivery. In a multi-tree configuration instead, the content server splits the stream in multiple sub- streams which are then broadcasted over separate overlay trees. As a consequence, a peer might be a provider of a certain sub-stream but only a receiver for another. Solutions using this approach are, for instance, SplitStream [7] and Orchard [30]. A study on a widely deployed system, i.e. GridMedia [61], has shown that using multi-tree based systems leads to near-optimal bandwidth utilization within a cer- tain degree of churn. Figure 2.4 shows a multi-tree overlay network structure. In the example, two sub-streams are broadcasted by the streaming server along two trees, starting at peer 2 and 1. Maintenance of the streaming tree is essential given the high dynamicity of peer- to-peer networks. When a peer abruptly leaves the overlay, all of its descendants get disconnected from the stream. In order to create room for the system to re- cover from a failure, a peer typically keeps a buffer. The buffer provides a way to compensate for disruptions in the delivery. Since the buffer itself introduces a delay from the point the data is received to the one it is provided to the player, its size is usually kept small. It is therefore very important for the system to be able to recover quickly from failures in order to avoid playback issues. CHAPTER 2. PEER-TO-PEER LIVE STREAMING 11

2.1.1 Overlay maintenance and construction A tree-based overlay construction and maintenance can be carried out either in a centralized or decentralized fashion. We describe both of the methods in the next paragraphs.

Centrally-coordinated tree construction. In this case, peers rely on a central coordinator for instructions on which is the peer they should get the stream from. The central server keeps an overview of the system which includes characteristics of the peers and configuration of the tree at a certain point in time. Using this information, the central server makes decisions upon the join of a peer or a failure. For the purpose of load balancing, the central entity might enforce an explicit reconfiguration of the overlay based on some criteria. Load balancing operations may be carried out, for instance, to lower the number of rows in the overlay three or to improve efficiency. The coordinator might become the performance bottleneck of the system, lim- iting scalability. This considering the challenging task of both providing quick reaction in case of churn and coping with overlay management complexity. To the best of our knowledge, central coordination has been used exclusively for content distribution, e.g. in Antfarm [36], but never for live streaming. In this thesis, we will explore this approach and report our experience and findings.

Decentralized tree construction. A number of distributed algorithms have been designed to construct and maintain a tree-based live streaming overlay. In this approach, peers negotiate by means of gossiping their placement in the tree using their upload bandwidth as a metric. Examples of systems using decentralized tree construction are: SplitStream [7], Orchard [30], ChunkySpread [57] and CoopNet [32].

2.2 Mesh-based

In mesh-based overlay networks no static overlay structure is enforced. Instead, peers create and lose peering relationships dynamically. Examples of systems using this kind of approach are: SopCast [2], DONet/Coolstreaming [62], Chainsaw [33], BiToS [58] and PULSE [37]. A typical structure of a mesh-based system is shown in Figure 2.5. In the pictured case, only few peers receive the stream from the source while the majority of them exchanges content chunks through overlay links, i.e. the black arrows in the picture. A mesh-based system is usually composed by three main parts: membership, partnership and chunk scheduling. The membership mechanism allows peers to discover others in the network receiving the same stream. This is usually achieved by means of a central discovery service where all peers report their address, e.g. the Bittorrent Tracker. Another way of discovering peers 12 2.2. MESH-BASED is using gossip. Gossiping algorithms in this case are usually biased towards finding neighbours that have interesting characteristics, e.g. peers with similar playback point and available upload capacity [35] or that are geographically closer to the requester [23]. Upon the discovery of peers, the partnership service is used to establish temporary peering connections with a subset of peers which is considered suitable for the receipt of the stream. A partnership between two peers is commonly established according to the fol- lowing considerations:

• The load on the peer and resource availability at both ends. Possible load metrics include: available upload and download bandwidth capacity and CPU/memory usage. • Network Connectivity. The potential quality of the link between the two peers in terms of delay, packet loss characteristics, network proximity and firewall/NAT configurations, as in [24]. • Content Availability. The available chunks at both ends, i.e. the parts of the stream which have been already downloaded and are available locally. Availability of chunks is advertised periodically.

Being the peer-to-peer network very dynamic, the partnership service continuously creates and terminates peering relationships. Control traffic generated by the part- nership service is usually significant given that peers need frequent exchange of sta- tus information. On the other hand, less frequent updates might lead to increased latencies since pieces become available later. On top of that, a larger degree of partners gives more flexibility when requesting content, as peers have more choice to quickly change from a partner to the other if complications arise during the de- livery. It is therefore important to find the right trade-off between partner set size and frequencies of updates. The chunk scheduling service is entitled with the task of requesting content chunks as the delivery progresses, making use of information about the available peers collected by the membership service. Differently from the tree-based case, content chunks are not delivered through an already established overlay network path. In fact, a peer might be downloading different chunks from different peers in parallel. Chunks may then take different paths to reach a peer and consequently be delivered in an out-of-order fashion. In order to guarantee continuous playback, a peer keeps a buffer of received chunks and re-orders them before delivering them out to the player. The content of the buffer is usually what is made available to other peers. For this purpose, a peer might keep chunks available for a longer time. Given their high peering degree and randomness, mesh-based systems are ex- tremely robust both against churn and network-related disruptions, such as con- gestion. Since no static structure is enforced on the overlay, a peer can quickly switch between different providers if a failure occurs or the the necessary streaming rate cannot be sustained. That said, since every chunk is treated as a separate delivery unit, per-packet distribution paths and delivery times are not predictable CHAPTER 2. PEER-TO-PEER LIVE STREAMING 13 and highly variable. Consequently, it is very challenging to design a mesh-based streaming platform able to guarantee playback deadlines. Another drawback of the mesh-based approach is sub-optimal bandwidth uti- lization of provider peers since chunk requests are mostly unpredictable.

2.3 Hybrid Approaches

A tree-based approach can be combined with a mesh-based to obtain better band- width utilization and lower delays. mTreebone[59] elects a subset of nodes in the system as stable and uses them to form a tree structure. The content is broadcasted from the source node along the tree structure. A second mesh overlay is then built comprising both of the peers in the tree and of the rest of the peers in the system. For content delivery, the peers rely on the few elected stable nodes but default to the auxiliary mesh nodes in case of a stable node failure. The drawback of this approach is that a few stable peers might become congested while the others are not contributing with their upload bandwidth. Thus, this solution clearly ignores the aspect of efficient bandwidth utilization. As an alternative approach, CliqueStream [3] creates clusters of peers using both delay and locality metrics. One or more peers are then elected in each cluster to form a tree-like structure to interconnect the clusters. Both in PRIME [29] and NewCoolsteaming [26], peers establish a semi-stable parent-to-child relationship to combine the best of push and pull mesh approach. Typically a child subscribes to a certain stream of chunks from the parent and the parent pushes data to the child as soon as it becomes available. It has been shown that this hybrid approach can achieve near-optimal streaming rates [38] but, in this case as well, no consideration has been given to efficient bandwidth utilization.

Chapter 3

Problem Definition

In this work, we target the design and implementation of a live streaming platform which allows for large amount of bandwidth savings at the source of the stream and high quality of user experience. In order to achieve the aforementioned goal, we argue that a number of impor- tant challenges must be addressed:

• Efficient Overlay Management. The system must be carefully crafted to enable management of peers in a way to enable high upload bandwidth utilization while coping with real-time streaming deadlines. User quality of experience must be guaranteed at all times, even if bandwidth savings need to be sacrificed for that purpose. As a consequence, low initial buffering time, small playback and distribution delays should be maintained throughout a streaming session. A further requirement on the platform is to preserve network locality of streams in order to limit peering costs for operators. • Scalability. The proposed system should scale to thousands of users. In order to achieve this, obvious bottlenecks should be resolved with targeted solutions or by providing an alternative design of the platform. • Circumventing NAT limitations. In peer-to-peer systems deployed on consumer network, it is common to ob- serve a large amount of participating hosts behind NAT, typically up to 70% [47]. Obviously, limited accessibility to peers translates in limited efficiency of the system. In order to overcome this limitation, a number of NAT traversal techniques exist in order to enable connectivity even when peers are behind two different NAT boxes [10][53][42][43]. However, in our experience, their ef- fectiveness is limited in the case direct connections must be achieved between peers behind different NAT boxes. When a direct connection fails to be estab- lished, content is simply relayed through other peers [21]. This is not feasible

15 16

when considering streaming systems where the amount of data to be relayed is extremely large. Research on NAT limitations has not known significant advancements in the last years. This, even though NAT constraints are one of the most relevant issues in peer-to-peer systems, for the simple fact that good connectivity is a precondition for any distributed algorithm to function correctly. • Deterministic Testing/Debugging/Evaluation Environment. Arguably, when designing large-scale systems, it is infeasible to cover all pos- sible runtime scenarios with pure reasoning. For that reason, prototyping of peer-to-peer systems is often conducted in a controlled environment, such as Discrete Event Simulator. One of the main advantages in using a controlled environment is that, depending on the platform, it is possible to achieve partial reproducibility of executions, i.e. the ability to execute the same ex- periment many times while preserving the exact same sequence of events, at least network-related ones. This brings obvious advantages when it comes to debugging of code and detailed inspection of the application’s execution. A number of platforms have been developed to allow reproducibility of net- work events, both at kernel, e.g. [55], and application level, e.g. WiDS [28], ProtoPeer [11] and SSFNet [60]. However, to the best of our knowledge, no solution has been developed to manipulate operating system concurrency and thus enabling a fully deterministic application execution. Chapter 4

Thesis Contribution

In this chapter, we summarize the contributions of this thesis. First, we list the publications that were produced as a result of this work. After that, we provide a small summary of each publication’s contribution.

4.1 List of publications

• Roberto Roverso, Amgad Naiem, Mohammed Reda, Mohammed El-Beltagy, Sameh El-Ansary, Nils Franzen, and Seif Haridi. On The Feasibility Of Centrally-Coordinated Peer-To-Peer Live Streaming. In Proceedings of IEEE Consumer Communications and Networking Conference 2011, Las Vegas, NV, USA, January 2011

• Roberto Roverso, Sameh El-Ansary, and Seif Haridi. NATCracker: NAT Combinations Matter. In Proceedings of 18th International Conference on Computer Communications and Networks 2009, ICCCN ’09, pages 1–7, San Francisco, CA, USA, 2009. IEEE Computer Society. ISBN 978-1-4244-4581-3

• Roberto Roverso, Amgad Naiem, Mohammed El-Beltagy, Sameh El-Ansary, and Seif Haridi. A GPU-enabled solver for time-constrained linear sum as- signment problems. In Informatics and Systems (INFOS), 2010 The 7th In- ternational Conference on, pages 1–6, Cairo, Egypt, 2010. IEEE Computer Society. ISBN 978-1-4244-5828-8

• Roberto Roverso, Mohammed Al-Aggan, Amgad Naiem, Andreas Dahlstrom, Sameh El-Ansary, Mohammed El-Beltagy, and Seif Haridi. MyP2PWorld: Highly Reproducible Application-Level Emulation of P2P Systems. In Pro- ceedings of the 2008 Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops, pages 272–277, Venice, Italy, 2008. IEEE Computer Society. ISBN 978-0-7695-3553-1

17 4.2. DESIGN AND IMPLEMENTATION OF A CENTRALLY-MANAGED 18 PEER-TO-PEER LIVE STREAMING PLATFORM List of publications of the same author but not related to this work • Roberto Roverso, Cosmin Arad, Ali Ghodsi, and Seif Haridi. DKS: Dis- tributed K-Ary System a Middleware for Building Large Scale Dynamic Dis- tributed Applications, Book Chapter. In Making Grids Work, pages 323–335. Springer US, 2007. ISBN 978-0-387-78447-2

4.2 Design and implementation of a centrally-managed peer-to-peer live streaming platform

The work describing the design and implementation of our live streaming platform has been published in a conference paper [49]. The paper appears as Chapter 6 in this thesis. In our work, we adopt the approach of a centrally coordinated overlay in order to provide significant amount of savings while coping with churn and limitations on connectivity of the underlying network. Our contribution lies in showing that the realization of a system based on the central coordination approach is indeed feasible. In this work we present a design where a central coordinator organizes the overlay network by issuing direct instructions to peers. Our results show that this approach leads to efficient delivery of streams by letting the central entity help peers in their provider’s choice, thus avoiding the time consuming trial-and-error process typical of fully decentralized solutions. The main challenge we faced was to design an efficient decision engine which can provide directions on how to organize the overlay in a very short period of time. If the population of the overlay network varies considerably due to churn while the decision engine is running, then its decisions might be of no use or even detrimental to the system performance. A decision is based on a snapshot of the overlay network status based on the information periodically sent by the peers to the central coordinator. The decision process is composed of three steps: I Row construction. Peers are placed in subsequent rows according to their available upload bandwidth. Peers with highest bandwidth capacity are placed in the row closest to the source of the stream so that they can provide to others. The rest of the rows are filled with peers with decreasing amount of upload bandwidth. During this process we make use of an heuristic based on the max-flow approach [13] in order to guarantee connectivity compatibility between peers in consecutive rows, in particular due to the presence of NAT. II Input definition. Every assignment between a peer a in an upper row and another b in a lower row is given a certain score and placed in an assignment matrix. The score depends on the following metrics: inter-peer delay, stickiness (connections that already exists are favoured), playback buffer matching, NAT compatibility probability and ISP friendliness. CHAPTER 4. THESIS CONTRIBUTION 19

III Assignment problem solving. In an effort to provide the best possible set of in- terconnections between peers in the overlay, we model peer assignment between consecutive rows as an optimization problem, namely as a Linear Sum Assign- ment Problem. We make use of an heuristic called Deep Greedy Switching (DGS) which provides fast outcomes while attaining near-optimal solutions. In practice, we have seen it deviate by less than 0.6%. The input to the opti- mization engine is the assignment matrix produced in step II.

Once an outcome of the decision has been achieved, orders are issued to the peers and the overlay re-organized accordingly. The process is repeated periodically or as need arises, i.e. when disruptions are observed on the overlay.

4.2.1 Solving Linear Sum Assignment Problems in a time-constrained environment Another contribution of this thesis is the parallelization of the Deep Greedy Switch- ing heuristic algorithm and its implementation on Graphic Processing Units (GPUs). This work is motivated by the fact that we identified the optimization engine to be the bottleneck of our peer-to-peer live streaming system. In fact, large instances of the Linear Sum Assignment problem need to be solved in order to couple peers of consecutive rows to each others. Sequential implementations of the DGS solver, although much faster than any other similar software using algorithms with opti- mal outcome, such as for the Auction algorithm [5][56][31], fell short of our needs in terms of time complexity. Given the time-constrained nature of the application, the solver must provide an outcome in a matter of seconds rather than minutes. This work was published in a conference [48] and can be found in Chapter 8 of this thesis. The conference paper details the parallelization of the original Deep Greedy Switching algorithm and the evaluation of the new heuristic’s implementa- tion on GPUs using the NVIDIA CUDA language [12]. The solver is very generic and can be used outside of our streaming platform’s context for solving any instance of the LSAP problem.

4.3 NAT Traversal

The work on NAT traversal has been published in a conference paper [47] and ap- pears in Chapter 7 of this thesis. In this work, we classify behaviours of Network Address Translators recently discovered by the community in 27 different types. Given this set of behaviors, we tackle two fundamental issues that were not pre- viously solved in the state of the art. We provide a study which comprehensively states, for each of the possible 378 combinations, whether direct connectivity, with- out relaying of traffic, is possible or not between NAT types. The outcome is the first contribution of this work. In addition, we define a formal model of NAT which helps us describe how each type behaves when allocating new ports. This is the second outcome of our work. As a third contribution, we define through 20 4.4. HIGHLY REPRODUCIBLE EMULATION OF P2P SYSTEMS an augmentation of currently-known traversal methods which techniques should be used to traverse all pairwise combinations of NAT types. This scheme is validated by reasoning using our formalism, simulation and implementation on a real test network.

4.4 Highly reproducible emulation of P2P systems

The work on application-level emulation has been published in a workshop [45] and appears as Chapter 9 of this thesis. In this work we describe our application- level emulator which has been designed to tackle the problem of reproducibility for development and testing of an existing Peer-To-Peer application. The contribution in this case is a novel platform which can provide highly transparent emulation which can be used for reproducible debugging, testing and evaluation of an already- developed peer-to-peer software. We achieve transparency by injecting custom implementations of standard/widely- used APIs, such as Apache MINA, for networking, and Java Executors, for con- currency. Calls made to those APIs are redirected to a Discrete Event Simulator adapted for the purpose of emulation. A further contribution of this work is an implementation of an accurate flow- level bandwidth model, based on the progressive filling algorithm [4], which is used to mimic transfers duration between peers according to the configured up- load/download capacity of the emulated nodes. Chapter 5

Conclusion and Future Work

In this Chapter, we present conclusions for this thesis work. Conclusion are de- scribed separately for each problem area that we addressed. Finally, we present some of the future work related to this thesis.

Design and implementation of a centrally-managed peer-to-peer live streaming platform

In this thesis, we have reported our efforts with addressing the issue of bandwidth savings and user experience in live streaming using a central coordination approach. We employed a design which models the issue of assigning peers to each others as a linear sum optimization problem (LSAP). Our main concern with scalability of the solution has been addressed using a fast LSAP heuristic called Deep Greedy Switching. We further improved on the performance of DGS by parallelizing the heuristic’s algorithm and implementing the resulting algorithm on commodity GPUs using the CUDA language. We show in our results, how the solver is able to handle 10, 000 problem instances in less than 3 seconds. The solver is generic and can used to solve any instance of LSAP given that a tolerance of 0.6%, or less than, on the outcome optimality is tolerated. The merit of our live streaming approach is that it provides high savings at the distribution site in real-world scenarios where realistic delay, packet-loss, bandwidth emulation and NAT constraints are modelled in the experiments. Our results also show that, in such difficult scenarios, the system is able to maintain low initial buffering time and playback delays. Concerns on scalability for overlays which require the assignment engine to scale over 10.000 can be addressed by either providing specialized GPU processing hardware, e.g. GPU racks, or by partitioning the overlay into multiple decision engines. An alternative approach is to manage a backbone of nodes with central coordination and let swarms of peers create along that backbone.

21 22

NAT Traversal

In this work, we proposed a comprehensive analysis of what combinations of NAT types are traversable using direct connectivity without relaying connections. We have defined a formal model for describing NAT behaviors. Using that model, we covered all possible NAT combinations describing which traversal techniques are applicable for each combination. With formal reasoning, we have shown that about 80% of all possible combina- tions are traversable. Our experimental results however, show that only 50% of all possible combinations are encountered in reality, or at least in our test network. Of those, 79.4% are shown to be traversable with various degrees of success, which are also presented in our study. Data on the time needed for the carrying out traversal strategies is also reported in our findings. It is the first time that such a comprehensive study, including all new find- ings in Network Address Translation behaviors, has been conducted to provide a clear understanding on NAT traversal both from a formal point of view and from experimental results on a real deployed network.

Highly reproducible emulation of P2P systems

We have provided our experience in developing an emulation framework which can be used for debugging, testing and evaluation of an already-existing peer-to-peer software. By inspecting the state of the art, we found that we could not find a tool which simultaneously satisfies the following requirements: minimal changes to the software under test, ease of deployment and high reproducibility. Our approach in developing the platform was to adopt application-level em- ulation but with ensuring that high reproducibility is met. We achieved this by controlling concurrency, system time and all network related events. The resulting framework called “MyP2PWorld” has been used for a number of months for testing our live streaming peer-to-peer application. It has resulted in huge improvements in software quality and bug discovery rate. It became an integral part of our development process. While we have initially developed MyP2PWorld to complement the implemen- tation of a specific software, we are now trying to make it available for other peer- to-peer software developers to use.

Future Work

As future work, we would like to investigate using central coordination or other types of peer-to-peer approaches with HTTP live streaming. HTTP-live streaming is significantly different from other streaming protocol in the way that requests for chunks of the content are completely unrelated between each others. CHAPTER 5. CONCLUSION AND FUTURE WORK 23

The streaming player at the end of the distribution chain decides which segment to request and at which bitrate. The time of the request is also not deterministic. This approach greatly simplifies the complexity on a unicast system, where the distribution site is composed of one or more standard HTTP servers. However, it poses significant issues when trying to efficiently organize the delivery of the content over a peer-to-peer overlay, since no assumptions can be made on which fragment of the content will be requested next. As far as we know, there exist no current peer-to-peer system which provides support for HTTP-live. Related to the work conducted in this thesis, we would like to extend our NAT traversal findings by deploying our solution on a global scale to conduct further tests and, if possible, improve our approach. We also would like to implement a connec- tivity library to be used in any peer-to-peer application and which transparently handles traversing of NAT boxes. As an additional feature of the connectivity library, we would like to include support of different traffic prioritization policies. This would allow peer-to-peer applications to tweak the desired level of QoS. In the case of live streaming for instance, traffic generated by the peer-to-peer application should have priority over all other transfers happening at the same time on the host. We think this is achievable using dynamic congestion control mechanism, such as MulTFRC [9] or LEDBAT [44], over UDP. We are not aware of any other attempt of including general traffic prioritization for peer-to-peer applications at application-level. We are currently in the process of transforming the platform called MyP2PWorld to add support for a component model and event driven runtime. This would allow for faster prototyping and easier testing of single part of the application. The event runtime should also provide a greater degree of scalability and better execution semantics when it comes to concurrency. We are also considering improving scalability of the bandwidth model’s imple- mentation used in MyP2PWorld.

Bibliography

[1] Real Data Transport. https://helixcommunity.org/viewcvs/server/protocol/ transport/rdt/. [2] SopCast. http://sopcast.com. [3] Shah Asaduzzaman, Ying Qiao, and Gregor von Bochmann. CliqueStream: an efficient and fault-resilient live streaming network on a clustered peer-to-peer overlay. CoRR, abs/0903.4365, 2009. [4] Dimitri Bertsekas and Robert Gallager. Data Networks. Prentice Hall, second edition, 1992. [5] D.P. Bertsekas. The auction algorithm: A distributed relaxation method for the assignment problem. Annals of Operations Research, 14(1):105–123, 1988. [6] Fabian Bustamante and Yi Qiao. Friendships that Last: Peer lifespan and its Role in P2P Protocols. Web Content Caching and Distribution, pages 233–246, 2004. [7] Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, Animesh Nandi, Antony Rowstron, and Atul Singh. Splitstream: high-bandwidth mul- ticast in cooperative environments. In SIGOPS Oper. Syst. Rev., vol- ume 37, pages 298–313, New York, NY, USA, October 2003. ACM. URL http://doi.acm.org/10.1145/1165389.945474. [8] Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. Analyzing the video popularity characteristics of large-scale user gener- ated content systems. IEEE/ACM Trans. Netw., 17:1357–1370, October 2009. ISSN 1063-6692. URL http://dx.doi.org/10.1109/TNET.2008.2011358. [9] D. Damjanovic and S. Gjessing. MulTFRC IETF draft. http://tools. ietf.org/html/draft -irtf-iccrg-multfrc-01, July 2010. [10] Bryan Ford, Pyda Srisuresh, and Dan Kegel. Peer-to-peer communication across network address translators. In ATEC ’05: Proceedings of the annual conference on USENIX Annual Technical Conference, pages 13–13, Berkeley, CA, USA, 2005. USENIX Association.

25 26 BIBLIOGRAPHY

[11] Despotovic Galuba, Aberer and Kellerer. Protopeer: A p2p toolkit bridging the gap between simulation and live deployement. 2nd International ICST Conference on Simulation Tools and Techniques, May 2009.

[12] M. Garland, S. Le Grand, J. Nickolls, J. Anderson, J. Hardwick, S. Morton, E. Phillips, Yao Zhang, and V. Volkov. Parallel Computing Experiences with CUDA. Micro, IEEE, 28(4):13–27, 2008.

[13] A V Goldberg and R E Tarjan. A new approach to the maximum flow problem. In STOC ’86: Proceedings of the eighteenth annual ACM symposium on Theory of computing, pages 136–146, New York, NY, USA, 1986. ACM. ISBN 0-89791- 193-8.

[14] X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross. Insights into PPLive: A Measurement Study of a Large-Scale P2P IPTV System. In In Proc. of IPTV Workshop, International World Wide Web Conference, 2006.

[15] Xiaojun Hei, Chao Liang, Jian Liang, Yong Liu, and Keith W. Ross. A mea- surement study of a large-scale P2P IPTV system. Multimedia, IEEE Trans- actions on, 2007.

[16] Apple Inc. HTTP Live Streaming. http://developer.apple.com/resources/http- streaming/.

[17] Microsoft Inc. Smooth Streaming. http://www.iis.net/download/SmoothStrea- ming.

[18] Skype Inc. Skype. http://www.skype.com/.

[19] Wuala Inc. Wuala. http://www.wuala.com/.

[20] Ipoque. Internet Study 2008/2009. http://www.ipoque.com/userfiles/file/ipo- que-Internet-Study-08-09.pdf.

[21] C. Huitema J. Rosenberg, R. Mahy. Traversal Using Relays around NAT (TURN): Relay extensions to session traversal util- ities for NAT (STUN). Internet draft, November 2008. URL http://tools.ietf.org/html/draft-ietf-behave-turn-14.

[22] John Jannotti, David K. Gifford, Kirk L. Johnson, M. Frans Kaashoek, and James W. O’Toole, Jr. Overcast: Reliable Multicasting with an Overlay Network. In Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4, OSDI’00, pages 14–14, Berkeley, CA, USA, 2000. USENIX Association. URL http://portal.acm.org/citation.cfm?id=1251229.1251243. BIBLIOGRAPHY 27

[23] David Kempe, Jon Kleinberg, and Alan Demers. Spatial gossip and resource location protocols. In Proceedings of the thirty-third an- nual ACM symposium on Theory of computing, STOC ’01, pages 163– 172, New York, NY, USA, 2001. ACM. ISBN 1-58113-349-9. URL http://doi.acm.org/10.1145/380752.380796. [24] Anne-Marie Kermarrec, Alessio Pace, Vivien Quema, and Valerio Schiavoni. NAT-resilient Gossip Peer Sampling. Distributed Computing Systems, Inter- national Conference on, 0:360–367, 2009. ISSN 1063-6927. [25] G. Kreitz and F. Niemela. Spotify – large scale, low latency, P2P music- on-demand streaming. In Proceedings of the Tenth IEEE International Con- ference on Peer-to-Peer Computing (P2P), pages 1–10, August 2010. URL http://dx.doi.org/10.1109/P2P.2010.5569963. [26] B. Li, Y. Qu, Y. Keung, S. Xie, C. Lin, J. Liu, and X. Zhang. Inside the New Coolstreaming: Principles, Measurements and Performance Implications. In INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008.

[27] J. Liang, R. Kumar, and K. Ross. The kazaa overlay: A measurement study. In Proceedings of the 19th IEEE Annual Computer Communications Workshop, 2004. [28] Shiding Lin, Aimin Pan, Rui Guo, and Zheng Zhang. Simulating large-scale p2p systems with the wids toolkit. In Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2005. 13th IEEE International Symposium on, pages 415 – 424, 2005. [29] N. Magharei and R. Rejaie. PRIME: Peer-to-Peer Receiver-drIven MEsh- Based Streaming. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, pages 1415–1423, May 2007. URL http://dx.doi.org/10.1109/INFCOM.2007.167. [30] J.J.D. Mol, D.H.J. Epema, and H.J. Sips. The Orchard Algorithm: P2P Mul- ticasting without Free-Riding. Peer-to-Peer Computing, IEEE International Conference on, 0:275–282, 2006. [31] A. Naiem and M. El-Beltagy. Deep greedy switching: A fast and simple ap- proach for linear assignment problems. In (To Appear in) 7th International Conference of Numerical Analysis and Applied , 2009. [32] Venkata N. Padmanabhan and Kunwadee Sripanidkulchai. The case for cooperative networking. In Revised Papers from the First Inter- national Workshop on Peer-to-Peer Systems, IPTPS ’01, pages 178– 190, London, UK, 2002. Springer-Verlag. ISBN 3-540-44179-4. URL http://portal.acm.org/citation.cfm?id=646334.758993. 28 BIBLIOGRAPHY

[33] Vinay Pai, Kapil Kumar, Karthik Tamilmani, Vinay Sambamurthy, and Alexander Mohr. Chainsaw: Eliminating trees from overlay multicast. In Peer-to-Peer Systems IV, volume Volume 3640/2005, pages 127–140. Springer Berlin / Heidelberg, 2005. URL http://dx.doi.org/10.1007/11558989_12.

[34] Kunwoo Park, Sangheon Pack, and Taekyoung Kwon. Climber: an incentive- based resilient peer-to-peer system for live streaming services. In Proceed- ings of the 7th international conference on Peer-to-peer systems, IPTPS’08, pages 10–10, Berkeley, CA, USA, 2008. USENIX Association. URL http://portal.acm.org/citation.cfm?id=1855641.1855651.

[35] Amir H. Payberah, Jim Dowling, Fatemeh Rahimian, and Seif Haridi. gra- dienTv: Market-based P2P Live Media Streaming on the Gradient Overlay. In Lecture Notes in Computer Science (DAIS 2010), pages 212–225. Springer Berlin / Heidelberg, Jan 2010. ISBN 978-3-642-13644-3.

[36] Ryan S. Peterson and Emin Gün Sirer. Antfarm: efficient con- tent distribution with managed swarms. In Proceedings of the 6th USENIX symposium on Networked systems design and implementation, pages 107–122, Berkeley, CA, USA, 2009. USENIX Association. URL http://portal.acm.org/citation.cfm?id=1558977.1558985.

[37] Fabio Pianese, Diego Perino, Joaquin Keller, and Ernst W. Biersack. Pulse: An adaptive, incentive-based, unstructured p2p live streaming system. IEEE Transactions on Multimedia, 9(8):1645–1660, December 2007. ISSN 1520-9210. URL http://dx.doi.org/10.1109/TMM.2007.907466.

[38] Fabio Picconi and Laurent Massoulié. Is there a future for mesh-based live video streaming? In Proceedings of the 2008 Eighth International Conference on Peer-to-Peer Computing, pages 289–298, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0-7695-3318-6. URL http://portal.acm.org/citation.cfm?id=1443220.1443468.

[39] Dongyu Qiu and R. Srikant. Modeling and performance analysis of BitTorrent- like peer-to-peer networks. In Proceedings of the 2004 conference on Applica- tions, technologies, architectures, and protocols for computer communications, SIGCOMM ’04, pages 367–378, New York, NY, USA, 2004. ACM. ISBN 1- 58113-862-8. URL http://dx.doi.org/10.1145/1015467.1015508.

[40] Dan Rayburn. 2010 Q4 CDN Pricing Detailed. http://blog. streamingmedia.com/the_business_of_online_vi/2010/06/datafromq1- showsvideocdnpricingstabilizingshouldbedown25fortheyear.html.

[41] M. Ripeanu. Peer-to-peer architecture case study: Gnutella network. In Peer-to-Peer Computing, 2001. Proceedings. First International Conference on, pages 99–100, August 2001. BIBLIOGRAPHY 29

[42] J. Rosenberg. Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols. Internet draft, October 2007. URL http://tools.ietf.org/html/draft-ietf-mmusic-ice-19. [43] J. Rosenberg, J. Weinberger, C. Huitema, and R. Mahy. STUN - Sim- ple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs). RFC 3489 (Proposed Standard), March 2003. URL http://www.ietf.org/rfc/rfc3489.txt. Obsoleted by RFC 5389. [44] Dario Rossi, Claudio Testa, Silvio Valenti, and Luca Muscariello. LED- BAT: The New BitTorrent Congestion Control Protocol. In Com- puter Communications and Networks (ICCCN), 2010 Proceedings of 19th International Conference on, pages 1–6, August 2010. URL http://dx.doi.org/10.1109/ICCCN.2010.5560080. [45] Roberto Roverso, Mohammed Al-Aggan, Amgad Naiem, Andreas Dahlstrom, Sameh El-Ansary, Mohammed El-Beltagy, and Seif Haridi. MyP2PWorld: Highly Reproducible Application-Level Emulation of P2P Systems. In Pro- ceedings of the 2008 Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops, pages 272–277, Venice, Italy, 2008. IEEE Computer Society. ISBN 978-0-7695-3553-1. [46] Roberto Roverso, Cosmin Arad, Ali Ghodsi, and Seif Haridi. DKS: Distributed K-Ary System a Middleware for Building Large Scale Dynamic Distributed Applications, Book Chapter. In Making Grids Work, pages 323–335. Springer US, 2007. ISBN 978-0-387-78447-2. [47] Roberto Roverso, Sameh El-Ansary, and Seif Haridi. NATCracker: NAT Com- binations Matter. In Proceedings of 18th International Conference on Com- puter Communications and Networks 2009, ICCCN ’09, pages 1–7, San Fran- cisco, CA, USA, 2009. IEEE Computer Society. ISBN 978-1-4244-4581-3. [48] Roberto Roverso, Amgad Naiem, Mohammed El-Beltagy, Sameh El-Ansary, and Seif Haridi. A GPU-enabled solver for time-constrained linear sum assign- ment problems. In Informatics and Systems (INFOS), 2010 The 7th Interna- tional Conference on, pages 1–6, Cairo, Egypt, 2010. IEEE Computer Society. ISBN 978-1-4244-5828-8. [49] Roberto Roverso, Amgad Naiem, Mohammed Reda, Mohammed El-Beltagy, Sameh El-Ansary, Nils Franzen, and Seif Haridi. On The Feasibility Of Centrally-Coordinated Peer-To-Peer Live Streaming. In Proceedings of IEEE Consumer Communications and Networking Conference 2011, Las Vegas, NV, USA, January 2011. [50] Stefan Saroiu, Krishna P. Gummadi, and Steven D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. Multimedia Computing and Net- working (MMCN), January 2002. 30 BIBLIOGRAPHY

[51] H. Schulzrinne. RTP: A Transport Protocol for Real-Time Ap- plications. RFC 3550 (Proposed Standard), July 2002. URL http://www.ietf.org/rfc/rfc3550.txt. [52] H. Schulzrinne, A. Rao, and R. Lanphier. RTSP: Real-Time Streaming Protocol. RFC 2326 (Proposed Standard), 1998. URL http://www.ietf.org/rfc/rfc2326.txt. [53] P. Srisuresh, B. Ford, and D. Kegel. State of Peer-to-Peer (P2P) Communica- tion across Network Address Translators (NATs). RFC 5128 (Informational), March 2008. URL http://www.ietf.org/rfc/rfc5128.txt. [54] D. Tran, K. Hua, and S. Sheu. ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming. In Proc. of IEEE INFOCOM, 2003.

[55] Amin Vahdat, Ken Yocum, Kevin Walsh, Priya Mahadevan, Dejan Kostic, Jeffrey S. Chase, and David Becker. Scalability and accuracy in a large-scale network emulator. In OSDI, 2002. [56] Cristina Nader Vasconcelos and Bodo Rosenhahn. Bipartite Graph Matching Computation on GPU. In Daniel Cremers, Yuri Boykov, Andrew Blake, and Frank R. Schmidt, editors, EMMCVPR, volume 5681 of Lecture Notes in Computer Science, pages 42–55. Springer, 2009. ISBN 978-3-642-03640-8. URL http://dblp.uni-trier.de/db/conf/emmcvpr/emmcvpr2009.html#Vasconc elosR09. [57] Vidhyashankar Venkataraman, Kaouru Yoshida, and Paul Francis. Chunkyspread: Heterogeneous Unstructured Tree-Based Peer-to-Peer Multicast. Proceedings of the Proceedings of the 2006 IEEE Inter- national Conference on Network Protocols, pages 2–11, 2006. URL http://portal.acm.org/citation.cfm?id=1317535.1318351. [58] A. Vlavianos, M. Iliofotou, and M. Faloutsos. Bitos: Enhancing bittorrent for supporting streaming applications. In 9th IEEE Global Internet Symposium 2006, April 2006. [59] Feng Wang, Yongqiang Xiong, and Jiangchuan Liu. mTreebone: A Hybrid Tree/Mesh Overlay for Application-Layer Live Video Multicast. In Proceed- ings of the 27th International Conference on Distributed Computing Systems, ICDCS ’07, pages 49–, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2837-3. URL http://dx.doi.org/10.1109/ICDCS.2007.122. [60] Sunghyun Yoon and Young Boo Kim. A Design of Network Simulation En- vironment Using SSFNet. In Proceedings of the 2009 First International Conference on Advances in System Simulation, pages 73–78, Washington, DC, USA, 2009. IEEE Computer Society. ISBN 978-0-7695-3773-3. URL http://portal.acm.org/citation.cfm?id=1637862.1638154. BIBLIOGRAPHY 31

[61] Meng Zhang, Yun Tang, Li Zhao, Jian-Guang Luo, and Shi-Qiang Yang. Gridmedia: A Multi-Sender Based Peer-to-Peer Multicast Sys- tem for Video Streaming. In Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, pages 614–617, 2005. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1521498. [62] Xinyan Zhang, Jiangchuan Liu, Bo Li, and Y. S. P. Yum. Cool- Streaming/DONet: a data-driven overlay network for peer-to-peer live media streaming. In INFOCOM 2005. 24th Annual Joint Confer- ence of the IEEE Computer and Communications Societies. Proceed- ings IEEE, volume 3, pages 2102–2111 vol. 3, March 2005. URL http://dx.doi.org/10.1109/INFCOM.2005.1498486.

Part II

Research Papers

33