
Live Streaming Performance of the Zattoo Network

Hyunseok Chang (Alcatel-Lucent Bell Labs, Holmdel, NJ 07733 USA, [email protected])
Sugih Jamin (University of Michigan, Ann Arbor, MI 48109 USA, [email protected])
Wenjie Wang (Zattoo Inc., Ann Arbor, MI 48105 USA, [email protected])

ABSTRACT

A number of commercial peer-to-peer systems for live streaming, such as PPLive, LiveStation, SOPCast, TVants, etc., have been introduced in recent years. The behavior of these popular systems has been extensively studied in several measurement papers. Due to the proprietary nature of these commercial systems, however, these studies have had to rely on a "black-box" approach, where packet traces are collected from a single or a limited number of measurement points, to infer various properties of traffic on the control and data planes. Although such studies are useful for comparing different systems from the end-user's perspective, it is difficult to intuitively understand the observed properties without fully reverse-engineering the underlying systems. Our paper presents a large-scale measurement study of Zattoo, one of the largest production live streaming providers in Europe, using data collected by the provider. To highlight, we found that even when the Zattoo system was heavily loaded with as high as 20,000 concurrent users on a single overlay, the median channel join delay remained less than 2 to 5 seconds, and that, for a majority of users, the streamed signal lags the over-the-air broadcast signal by no more than 3 seconds. To motivate the measurement study, we also present a description of the Zattoo network architecture.

Categories and Subject Descriptors

C.2.1 [Computer-Communication Networks]: Network Architecture and Design; C.2.2 [Computer-Communication Networks]: Network Protocols; C.4 [Performance of Systems]

General Terms

measurement, performance, design

Keywords

Peer-to-peer system, live streaming, network architecture

IMC'09, November 4-6, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-770-7/09/11.

1. INTRODUCTION

We draw a distinction between three uses of peer-to-peer (P2P) networks: delay-tolerant file download of archival material, delay-sensitive progressive download (or streaming) of archival material, and real-time live streaming. In the first case, the completion of download is elastic, depending on available bandwidth in the P2P network. The application buffer receives data as it trickles in and informs the user upon the completion of download. The user can then start playing back the file for viewing in the case of a video file. In the second case, video playback starts as soon as the application assesses that it has sufficient data buffered that, given the estimated download rate and the playback rate, it will not deplete the buffer before the end of file. If this assessment is wrong, the application would have to either pause playback and rebuffer, or slow down playback. While users would like playback to start as soon as possible, the application has some degree of freedom in trading off playback start time against estimated network capacity. The third case, real-time live streaming, has the most stringent delay requirement. While progressive download may tolerate initial buffering of tens of seconds or even minutes, live streaming generally cannot tolerate more than a few seconds of buffering. Taking into account the delay introduced by signal ingest and encoding, and network transmission and propagation, the live streaming system can introduce only a few seconds of buffering time end-to-end and still be considered "live" [1].

The Zattoo peer-to-peer live streaming system was a free-to-use network serving over 3 million registered users in eight European countries at the time of study, with a maximum of over 60,000 concurrent users on a single channel. The system delivers live streams using a receiver-based, peer-division multiplexing scheme as described in Section 2. After delving into Zattoo's architecture in detail, we study in Sections 3 and 4 large-scale measurements collected during the live broadcast of the UEFA European Football Championship, one of the most popular one-time events in Europe, in June 2008 [2]. During the course of the month of June 2008, Zattoo served more than 35 million sessions to more than one million distinct users. Drawing from these measurements, we report on the operational scalability of Zattoo's live streaming system along several key issues:

1. How does the system scale in terms of overlay size and its effectiveness in utilizing peers' uplink bandwidth?

2. How responsive is the system during channel switching, for example, when compared to the 3-second channel switch time of satellite TV?

3. How effective is the packet retransmission scheme in allowing a peer to recover from transient congestion?

4. How effective is the receiver-based peer-division multiplexing scheme in delivering synchronized sub-streams?

5. Would a peer further away from the stream source experience adversely long lag compared to a peer closer to the stream source?

6. How effective is error-correcting code in isolating packet losses on the overlay?

We also discuss in Section 5 several challenges in increasing the bandwidth contribution of Zattoo peers. Finally, we describe related work in Section 6 and conclude in Section 7.

2. SYSTEM ARCHITECTURE

Figure 1: Zattoo delivery network architecture.

The Zattoo system rebroadcasts live TV, captured from satellites, onto the Internet. The system carries each TV channel on a separate peer-to-peer delivery network and is not limited in the number of TV channels it can carry. Although a peer can freely switch from one TV channel to another, thereby departing and joining different peer-to-peer networks, it can only join one peer-to-peer network at any one time. We henceforth limit our description of the Zattoo delivery network to how it carries one TV channel. Fig. 1 shows a typical setup of a single TV channel carried on the Zattoo network. The TV signal captured from satellite is encoded into H.264/AAC streams, encrypted, and sent onto the Zattoo network. The encoding server may be physically separated from the server delivering the encoded content onto the Zattoo network. For ease of exposition, we will consider the two as logically co-located on an Encoding Server. Users are required to register themselves at the Zattoo website to download a free copy of the Zattoo player application. To receive the signal of a channel, the user first authenticates itself to the Zattoo Authentication Server. Upon authentication, the user is granted a ticket with limited lifetime. The user then presents this ticket, along with the identity of the TV channel of interest, to the Zattoo Rendezvous Server. If the ticket specifies that the user is authorized to receive the signal of the said TV channel, the Rendezvous Server returns to the user a list of peers currently joined to the P2P network carrying the channel, together with a signed channel ticket. If the user is the first peer to join a channel, the list of peers it receives contains only the Encoding Server. The user joins the channel by contacting the peers returned by the Rendezvous Server, presenting its channel ticket, and obtaining the live stream of the channel from them (see Section 2.1 for details).

Each live stream is sent out by the Encoding Server as n logical sub-streams. The signal received from satellite is encoded into a variable-bit-rate stream. During periods of source quiescence, no data is generated. During source busy periods, generated data is packetized into a packet stream, with each packet limited to a maximum size. The Encoding Server multiplexes this packet stream onto the Zattoo network as n logical sub-streams: the first packet generated is considered part of the first sub-stream, the second packet that of the second sub-stream, and the n-th packet that of the n-th sub-stream. The (n+1)-th packet cycles back to the first sub-stream, and so on, such that the i-th sub-stream carries the (mn+i)-th packets, where m ≥ 0, 1 ≤ i ≤ n, and n is a user-defined constant. We call a set of n packets with the same index multiplier m a "segment." Thus m serves as the segment index, while i serves as the packet index within a segment. Each segment is of size n packets. Being the packet index, i also serves as the sub-stream index. The number mn + i is carried in each packet as its sequence number.

Zattoo uses the Reed-Solomon (RS) error-correcting code (ECC) for forward error correction. The RS code is a systematic code: of the n packets sent per segment, k < n packets carry the live stream data while the remainder carries the redundant data [3, Section 7.3]. Due to the variable-bit-rate nature of the data stream, the time period covered by a segment is variable, and a packet may be of size less than the maximum packet size. A packet smaller than the maximum packet size is zero-padded to the maximum packet size for the purposes of computing the (shortened) RS code, but is transmitted in its original size. Once a peer has received k packets of a segment, it can reconstruct the remaining n − k packets. We do not differentiate between streaming data and redundant data in our discussion in the remainder of this paper.
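To make the sequence-number arithmetic concrete, the sketch below (in Python) maps a sequence number to its segment and sub-stream indices and tests whether a partially received segment can be reconstructed. The constants n = 16 and k = 13 match the deployment figures cited in Section 4.1; the function and variable names are ours, not Zattoo's.

```python
# Illustrative sketch of the segment/sub-stream indexing described above.
N_SUBSTREAMS = 16   # n: packets per segment == number of sub-streams
K_DATA = 13         # k: packets sufficient to reconstruct a full segment

def locate(seq_no):
    """Map a sequence number mn + i (m >= 0, 1 <= i <= n) to (segment m, sub-stream i)."""
    m, zero_based_i = divmod(seq_no - 1, N_SUBSTREAMS)
    return m, zero_based_i + 1

def segment_recoverable(received_seq_nos, segment):
    """A segment is fully reconstructible once any k of its n packets have arrived."""
    got = sum(1 for s in received_seq_nos if locate(s)[0] == segment)
    return got >= K_DATA
```

For example, locate(17) returns (1, 1): with n = 16, the 17th packet starts the second segment and belongs to the first sub-stream.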
When a new peer requests to join an existing peer, it specifies the sub-stream(s) it would like to receive from the existing peer. These sub-streams do not have to be consecutive. Contingent upon availability of bandwidth at existing peers, the receiving peer decides how to multiplex a stream onto its set of neighboring peers, giving rise to our description of the Zattoo live streaming protocol as a receiver-based, peer-division multiplexing protocol. The details of peer-division multiplexing are described in Section 2.1, while the details of how a peer manages sub-stream forwarding and stream reconstruction are described in Section 2.2. Receiver-based peer-division multiplexing has also been used by the latest version of the CoolStreaming peer-to-peer protocol, though it differs from Zattoo in its stream management (Section 2.2) and adaptive behavior (Section 2.3) [4].

2.1 Peer-Division Multiplexing

To minimize per-packet processing time of a stream, the Zattoo protocol sets up a virtual circuit with multiple fan-outs at each peer. When a peer joins a TV channel, it establishes a peer-division multiplexing (PDM) scheme amongst a set of neighboring peers, by building a virtual circuit to each of the neighboring peers. Barring departure or performance degradation of a neighbor peer, the virtual circuits are maintained until the joining peer switches to another TV channel. With the virtual circuits set up, each packet is forwarded without further per-packet handshaking between peers. We describe the PDM bootstrapping mechanism in this section and the adaptive PDM mechanism to handle peer departure and performance degradation in Section 2.3.

The PDM establishment process consists of two phases: the search phase and the join phase. In the search phase, the new, joining peer determines its set of potential neighbors. In the join phase, the joining peer requests peering relationships with a subset of its potential neighbors. Upon acceptance of a peering relationship request, the peers become neighbors and a virtual circuit is formed between them.

Search phase. To obtain a list of potential neighbors, a joining peer sends out a SEARCH message to a random subset of the existing peers returned by the Rendezvous Server. The SEARCH message contains the sub-stream indices for which this joining peer is looking for peering relationships. The sub-stream indices are usually represented as a bitmask of n bits, where n is the number of sub-streams defined for the TV channel. In the beginning, the joining peer will be looking for peering relationships for all sub-streams and have all the bits in the bitmask turned on. In response to a SEARCH message, an existing peer replies with the number of sub-streams it can forward. From the returning SEARCH replies, the joining peer constructs a set of potential neighbors that covers the full set of sub-streams comprising the live stream of the TV channel. The joining peer continues to wait for SEARCH replies until the set of potential neighbors contains at least a minimum number of peers, or until all SEARCH replies have been received. With each SEARCH reply, the existing peer also returns a random subset of its known peers. If a joining peer cannot form a set of potential neighbors that covers all of the sub-streams of the TV channel, it initiates another SEARCH round, sending SEARCH messages to peers newly learned from the previous round. The joining peer gives up if it cannot obtain the full stream after two SEARCH rounds. To help the joining peer synchronize the sub-streams it receives from multiple peers, each existing peer also indicates for each sub-stream the latest sequence number it has received for that sub-stream, and the existence of any quality problem. The joining peer can then choose sub-streams with good quality that are closely synchronized.

Join phase. Once the set of potential neighbors is established, the joining peer sends JOIN requests to each potential neighbor. The JOIN request lists the sub-streams for which the joining peer would like to construct a virtual circuit with the potential neighbor. If a joining peer has l potential neighbors, each willing to forward it the full stream of a TV channel, it would typically choose to have each forward only 1/l-th of the stream, to spread out the load amongst the peers and to speed up error recovery, as described in Section 2.3. In selecting which of the potential neighbors to peer with, the joining peer gives highest preference to topologically close-by peers, even if these peers have less capacity or carry lower quality sub-streams. The "topological" location of a peer is defined to be its subnet number, autonomous system (AS) number, and country code, in that order of precedence. A joining peer obtains its own topological location from the Zattoo Authentication Server as part of its authentication process. The lists of peers returned by both the Rendezvous Server and potential neighbors come attached with topological locations. A topology-aware overlay not only allows us to be "ISP-friendly," by minimizing inter-domain traffic and thus saving on transit bandwidth cost, but also helps reduce the number of physical links and metro hops traversed in the overlay network, potentially resulting in enhanced user-perceived stream quality.
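The search and join phases above amount to a covering problem: choose potential neighbors whose offered sub-streams jointly cover all n sub-streams, preferring topologically closer peers. The Python sketch below illustrates one greedy way to do this; the SearchReply fields and the ranking function are our own illustrative assumptions, not Zattoo's wire format.

```python
from dataclasses import dataclass

@dataclass
class SearchReply:
    peer_id: str
    offered: set          # sub-stream indices the replying peer can forward (assumed field)
    subnet: str
    asn: int
    country: str

def topo_rank(my_loc, reply):
    """Lower is topologically closer: subnet match beats AS match beats country match."""
    subnet, asn, country = my_loc
    if reply.subnet == subnet:   return 0
    if reply.asn == asn:         return 1
    if reply.country == country: return 2
    return 3

def pick_neighbors(my_loc, replies, n_substreams=16):
    """Greedily cover all sub-streams, preferring topologically close peers."""
    needed = set(range(1, n_substreams + 1))
    chosen = []
    for reply in sorted(replies, key=lambda r: topo_rank(my_loc, r)):
        gain = needed & reply.offered
        if gain:
            chosen.append(reply.peer_id)
            needed -= gain
        if not needed:
            return chosen      # full stream covered
    return None                # cannot cover all sub-streams: start another SEARCH round
```

In the real protocol the joining peer also weighs the per-sub-stream quality and synchrony indicated in the replies; the sketch keys only on coverage and topological rank.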
2.2 Stream Management

We represent a peer as a packet buffer, called the IOB, fed by sub-streams incoming from the PDM constructed as described in Section 2.1. (In the case of the Encoding Server, which we also consider a peer on the Zattoo network, the buffer is fed by the encoding process.) The IOB drains to (1) a local media player if one is running, (2) a local file if recording is supported, and (3) potentially other peers. Fig. 2 depicts a Zattoo player application with virtual circuits established to four peers. As packets from each sub-stream arrive at the peer, they are stored in the IOB for reassembly to reconstruct the full stream. Portions of the stream that have been reconstructed are then played back to the user. In addition to providing a reassembly area, the IOB also allows a peer to absorb some variability in available network bandwidth and network delay.

Figure 2: Zattoo peer with IOB.


The IOB is referenced by an input pointer, a repair pointer, and one or more output pointers. The input pointer points to the slot in the IOB where the next incoming packet with a sequence number higher than the highest sequence number received so far will be stored. The repair pointer always points one slot beyond the last packet received in order and is used to regulate packet retransmission and adaptive PDM as described later. We assign an output pointer to each forwarding destination. The output pointer of a destination indicates the destination's current forwarding horizon on the IOB. In accordance with the three types of possible forwarding destinations listed above, we have three types of output pointers: player pointer, file pointer, and peer pointer. One would typically have at most one player pointer and one file pointer but potentially multiple concurrent peer pointers referencing an IOB. The Zattoo player application does not currently support recording.

Since we maintain the IOB as a circular buffer, if the incoming packet rate is higher than the forwarding rate of a particular destination, the input pointer will overrun the output pointer of that destination. We could move the output pointer to match the input pointer so that we consistently forward the oldest packet in the IOB to the destination. Doing so, however, requires checking the input pointer against all output pointers on every packet arrival. Instead, we have implemented the IOB as a double buffer. With the double buffer, the positions of the output pointers are checked against that of the input pointer only when the input pointer moves from one sub-buffer to the other. When the input pointer moves from sub-buffer a to sub-buffer b, all the output pointers still pointing to sub-buffer b are moved to the start of sub-buffer a and sub-buffer b is flushed, ready to accept new packets. When a sub-buffer is flushed while there are still output pointers referencing it, packets that have not been forwarded to the destinations associated with those pointers are lost to them, resulting in quality degradation. To minimize packet loss due to sub-buffer flushing, we would like to use large sub-buffers. However, the real-time delay requirement of live streaming limits the usefulness of late-arriving packets and effectively puts a cap on the maximum size of the sub-buffers.

Figure 3: Packet map associated with a peer pointer.

Different peers may request different numbers of, possibly non-consecutive, sub-streams. To accommodate the different forwarding rates and regimes required by the destinations, we associate a packet map and forwarding discipline with each output pointer. Fig. 3 shows the packet map associated with an output peer pointer where the peer has requested sub-streams 1, 4, 9, and 14. Every time a peer pointer is repositioned to the beginning of a sub-buffer of the IOB, all the packet slots of the requested sub-streams are marked NEEDed and all the slots of the sub-streams not requested by the peer are marked SKIP. When a NEEDed packet arrives and is stored in the IOB, its state in the packet map is changed to READY. As the peer pointer moves along its associated packet map, READY packets are forwarded to the peer and their states changed to SENT. A slot marked NEEDed but not READY, such as slot n + 4 in Fig. 3, indicates that the packet is lost or will arrive out-of-order and is bypassed. When an out-of-order packet arrives, its slot is changed to READY and the peer pointer is reset to point to this slot. Once the out-of-order packet has been sent to the peer, the peer pointer will move forward, bypassing all SKIP, NEED, and SENT slots until it gets to the next READY slot, where it can resume sending. The player pointer behaves the same as a peer pointer except that all packets in its packet map always start out marked NEEDed.
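The packet-map bookkeeping can be summarized as a small state machine over the four slot states named above. The sketch below is a simplified illustration (it omits the pointer reset on out-of-order arrivals and the double-buffer flush); the function names are ours, not Zattoo's.

```python
from enum import Enum

class Slot(Enum):
    SKIP = 0    # sub-stream not requested by this destination
    NEED = 1    # requested, packet not yet in the IOB
    READY = 2   # packet in the IOB, not yet forwarded
    SENT = 3    # packet forwarded to the destination

def init_packet_map(requested, n_substreams, segments):
    """Called whenever a peer pointer is repositioned to the start of a sub-buffer."""
    return [Slot.NEED if (i % n_substreams) + 1 in requested else Slot.SKIP
            for i in range(n_substreams * segments)]

def on_packet_stored(pmap, slot):
    """A NEEDed packet (in order or out of order) arrived and was stored in the IOB."""
    if pmap[slot] is Slot.NEED:
        pmap[slot] = Slot.READY

def forward_ready(pmap, pointer, send):
    """Advance a peer pointer, sending READY packets and bypassing the other states."""
    while pointer < len(pmap):
        if pmap[pointer] is Slot.READY:
            send(pointer)
            pmap[pointer] = Slot.SENT
        pointer += 1   # SKIP, NEED (lost or out-of-order) and SENT slots are bypassed
    return pointer
```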

Figure 4: IOB, input/output pointers and packet maps.

Fig. 4 shows an IOB consisting of a double buffer, with an input pointer, a repair pointer, and an output file pointer, an output player pointer, and two output peer pointers referencing the IOB. Each output pointer has a packet map associated with it. For the scenario depicted in the figure, the player pointer tracks the input pointer and has skipped over some lost packets. Both peer pointers are lagging the input pointer, indicating that the forwarding rates to the peers are bandwidth limited. The file pointer is pointing at the first lost packet. Archiving a live stream to file does not impose a real-time delay bound on packet arrivals. To achieve the best quality recording possible, a recording peer always waits for retransmission of lost packets that cannot be recovered by error correction.

In addition to achieving lossless recording, we use retransmission to let a peer recover from transient network congestion. A peer sends out a retransmission request when the distance between the repair pointer and the input pointer has reached a threshold of R packet slots, usually spanning multiple segments. A retransmission request consists of an R-bit packet mask, with each bit representing a packet, and the sequence number of the packet corresponding to the first bit. Marked bits in the packet mask indicate that the corresponding packets need to be retransmitted. When a packet loss is detected, it could be caused by congestion on the virtual circuits forming the current PDM or by congestion on the path beyond the neighboring peers. In either case, the current neighbor peers will not be good sources of retransmitted packets. Hence we send our retransmission requests to r random peers that are not neighbor peers. A peer receiving a retransmission request will honor the request only if the requested packets are still in its IOB and it has sufficient left-over capacity, after serving its current peers, to transmit all the requested packets. Once a retransmission request is accepted, the peer will retransmit all the requested packets to completion.

2.3 Adaptive PDM

While we rely on packet retransmission to recover from transient congestion, we have two channel capacity adjustment mechanisms to handle longer-term bandwidth fluctuations. The first mechanism allows a forwarding peer to adapt the number of sub-streams it will forward given its current available bandwidth, while the second allows the receiving peer to switch provider at the sub-stream level.

Peers on the Zattoo network can redistribute a highly variable number of sub-streams, reflecting the high variability in uplink bandwidth of different access network technologies. For a full stream consisting of sixteen constant-bit-rate sub-streams, our prior study shows that, based on realistic peer characteristics measured from the Zattoo network, half of the peers can support less than half of a stream, 82% of peers can support less than a full stream, and the remainder can support up to ten full streams (peers that can redistribute more than a full stream are conventionally known as supernodes in the literature) [5]. With variable-bit-rate streams, the bandwidth carried by each sub-stream is also variable. To increase peer bandwidth usage without undue degradation of streaming quality, we instituted measurement-based admission control at each peer. In addition to controlling resource commitment, another goal of the measurement-based admission control module is to continually estimate the amount of available uplink bandwidth at a peer.

The amount of available uplink bandwidth at a peer is initially estimated by the peer sending a pair of probe packets to Zattoo's Bandwidth Estimation Server. Once a peer starts forwarding sub-streams to other peers, it will receive from those peers quality-of-service feedback that informs its update of the available uplink bandwidth estimate. A peer sends quality-of-service feedback only if the quality of a sub-stream drops below a certain threshold. (Depending on a peer's NAT and/or firewall configuration, Zattoo uses either UDP or TCP as the underlying transport protocol. The quality of a sub-stream is measured differently for UDP and TCP. A packet is considered lost under UDP if it doesn't arrive within a fixed threshold. The quality measure for UDP is computed as a function of both the packet loss rate and the burst error rate, i.e., the number of contiguous packet losses. The quality measure for TCP is defined to be how far behind a peer is, relative to other peers, in serving its sub-streams.) Upon receiving quality feedback from multiple peers, a peer first determines if the identified sub-streams are arriving in low quality. If so, the low quality of service may not be caused by a limit on its own available uplink bandwidth, in which case it ignores the low quality feedback. Otherwise, the peer decrements its estimate of available uplink bandwidth. If the new estimate is below the bandwidth needed to support the existing number of virtual circuits, the peer closes a virtual circuit. To reduce the instability introduced into the network, a peer closes first the virtual circuit carrying the smallest number of sub-streams.

A peer attempts to increase its available uplink bandwidth estimate periodically if it has fully utilized its current estimate of available uplink bandwidth without triggering any bad quality feedback from neighboring peers. A peer doubles the estimated available uplink bandwidth if the current estimate is below a threshold, switching to linear increase above the threshold, similar to how TCP maintains its congestion window size. A peer also increases its estimate of available uplink bandwidth if a neighbor peer departs the network without any bad quality feedback.
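The increase/decrease rules above resemble TCP's congestion window maintenance. A minimal sketch, with placeholder thresholds and step sizes (Zattoo's actual tuning is not given in the paper):

```python
class UplinkEstimator:
    """Sketch of the adaptive uplink-bandwidth estimate described above.
    All constants are illustrative placeholders."""

    def __init__(self, initial_kbps, threshold_kbps=1000.0, linear_step_kbps=64.0):
        self.estimate = initial_kbps      # seeded by probing the Bandwidth Estimation Server
        self.threshold = threshold_kbps
        self.step = linear_step_kbps

    def on_periodic_increase(self, fully_utilized, bad_feedback):
        # Only probe upward if the current estimate is saturated and no peer complained.
        if not fully_utilized or bad_feedback:
            return
        if self.estimate < self.threshold:
            self.estimate *= 2            # multiplicative increase below the threshold
        else:
            self.estimate += self.step    # linear increase above the threshold

    def on_confirmed_bad_quality(self, decrement_kbps=64.0):
        # Called after ruling out causes other than our own uplink limit.
        self.estimate = max(0.0, self.estimate - decrement_kbps)
        return self.estimate              # caller closes a circuit if this is now too low
```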
When the repair pointer lags behind the input pointer by R packet slots, in addition to initiating a retransmission request, a peer also computes a loss rate over the R packets. If the loss rate is above a threshold, the peer considers the neighbor slow and attempts to reconfigure its PDM. In reconfiguring its PDM, a peer attempts to shift half of the sub-streams currently forwarded by the slow neighbor to other existing neighbors. At the same time, it searches for new peer(s) to forward these sub-streams. If new peer(s) are found, the load will be shifted from existing neighbors to the new peer(s). If sub-streams from the slow neighbor continue to suffer after the reconfiguration of the PDM, the peer will drop the neighbor completely and initiate another reconfiguration of the PDM. When a peer loses a neighbor due to reduced available uplink bandwidth at the neighbor or due to neighbor departure, it also initiates a PDM reconfiguration. A peer may also initiate a PDM reconfiguration to switch to a topologically closer peer. Similar to the PDM establishment process, PDM reconfiguration is accomplished by peers exchanging sub-stream bitmasks in a request/response handshake, with each bit of the bitmask representing a sub-stream. During and after a PDM reconfiguration, slow neighbor detection is disabled for a short period of time to allow the system to stabilize.

Each peer on the Zattoo network is assumed to serve a user through a media player, which means that each peer must receive, and can potentially forward, all n sub-streams of the TV channel the user is watching. Given the limited redistribution capacity of peers on the Zattoo network, we added Repeater nodes whose function is to serve as bandwidth multipliers, to amplify the amount of available bandwidth in the network [5]. The Repeater nodes also receive and serve all n sub-streams of each TV channel they support. The Repeater nodes run the same PDM protocol and are treated by actual peers like any other peers on the network. The use of Repeater nodes makes the Zattoo network a hybrid P2P and content distribution network.

3. SERVER-SIDE MEASUREMENTS

In the Zattoo system, two separate centralized collector servers collect usage statistics and error reports, which we call the "stats" server and the "user-feedback" server respectively. The "stats" server periodically collects aggregated player statistics from individual peers, from which full session logs are constructed and entered into a session database. The session database gives a complete picture of all past and present sessions served by the Zattoo system. A given database entry contains statistics about a particular session, which include join time, leave time, uplink bytes, download bytes, and the channel name associated with the session. We study the sessions generated on three major TV channels from three different countries (Germany, Spain, and Switzerland), from June 1st to June 30th, 2008. Throughout the paper, we label those channels from Germany, Spain, and Switzerland as ARD, Cuatro, and SF2, respectively. Euro 2008 games were held during this period, and those three channels broadcast a majority of the Euro 2008 games including the final match. See Table 1 for information about the collected session data sets.

Table 1: Session database (6/1/2008–6/30/2008).

Channel | # sessions | # distinct users
ARD     | 2,102,638  | 298,601
Cuatro  | 1,845,843  | 268,522
SF2     | 1,425,285  | 157,639

The "user-feedback" server, on the other hand, collects users' error logs submitted asynchronously by users. The "user feedback" data here is different from the peer quality feedback used in PDM reconfiguration described in Section 2.3. The Zattoo player maintains an encrypted log file which contains, for debugging purposes, the detailed behavior of the client-side P2P engine, as well as the history of all the streaming sessions initiated by a user since player startup. When users encounter any error while using the player, such as a log-in error, join failure, bad quality streaming, etc., they can choose to report the error by clicking a "Submit Feedback" button on the player, which causes the Zattoo player to send the generated log file to the user-feedback server. Since a given feedback log not only reports on a particular error, but also describes "normal" sessions generated prior to the occurrence of the error, we can study users' viewing experience (e.g., channel join delay) from the feedback logs. Table 2 describes the feedback logs collected from June 20th to June 29th. A given feedback log can contain multiple sessions (for the same or different channels), depending on the user's viewing behavior. The second column in the table represents the number of feedback logs which contain at least one session generated on the channel listed in the corresponding entry in the first column. The numbers in the third column indicate how many distinct sessions generated on said channel are present in the feedback logs.

Table 2: Feedback logs (6/20/2008–6/29/2008).

Channel | # feedback logs | # sessions
ARD     | 871             | 1,253
Cuatro  | 2,922           | 4,568
SF2     | 656             | 1,140

3.1 Overlay Size and Sharing Ratio

We first study how many concurrent users are supported by the Zattoo system, and how much bandwidth is contributed by them. For this purpose, we use the session database presented in Table 1. By using the join/leave timestamps of the collected sessions, we calculate the number of concurrent users on a given channel at time i. Then we calculate the average sharing ratio of the given channel at the same time. The average sharing ratio is defined as the total users' uplink rate divided by their download rate on the channel. A sharing ratio of one means users contribute to other peers in the network as much traffic as they download at the time. We calculate the average sharing ratio from the total download/uplink bytes of the collected sessions. We first obtain all the sessions which are active across time i. We call the set of such sessions S_i. Then, assuming the uplink/download bytes of each session are spread uniformly throughout the entire session duration, we approximate the average sharing ratio at time i as

  ( Σ_{s∈S_i} uplink_bytes(s)/duration(s) ) / ( Σ_{s∈S_i} download_bytes(s)/duration(s) ).
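A sketch of this approximation in Python, assuming each session record carries join/leave timestamps and total uplink/download byte counts (the field names are illustrative, not the schema of Zattoo's session database):

```python
from dataclasses import dataclass

@dataclass
class Session:
    join: float          # epoch seconds
    leave: float
    uplink_bytes: int
    download_bytes: int

def sharing_ratio_at(t, sessions):
    """Approximate average sharing ratio at time t, spreading each session's
    bytes uniformly over its duration (the S_i approximation above)."""
    active = [s for s in sessions if s.join <= t <= s.leave and s.leave > s.join]
    up = sum(s.uplink_bytes / (s.leave - s.join) for s in active)
    down = sum(s.download_bytes / (s.leave - s.join) for s in active)
    return up / down if down else 0.0
```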
Fig. 5 shows the overlay size (i.e., number of concurrent users) and average sharing ratio superimposed across the month of June 2008. According to the figure, the overlay size grew to more than 20,000 (e.g., 20,059 on ARD on 6/18 and 22,152 on Cuatro on 6/9). As opposed to the overlay size, the average sharing ratio tends to stay flatter throughout the month. Occasional spikes in the sharing ratio all occurred during 2AM to 6AM (GMT) when channel usage is very low, and therefore may be considered statistically insignificant.

By segmenting a 24-hour day into two time periods, i.e., off-peak hours (0AM-6PM) and peak hours (6PM-0AM), Table 3 shows the average sharing ratio in the two time periods separately. Zattoo's usage during peak hours typically accounts for about 50% to 70% of the total usage of the day. According to the table, the average sharing ratio during peak hours is slightly lower than, but not very different from, that during off-peak hours. The Cuatro channel in Spain exhibits a relatively lower sharing ratio than the two other channels. One bandwidth test site [6] reports that the average uplink bandwidth in Spain is about 205 kbps, which is much lower than in Germany (582 kbps) and Switzerland (787 kbps). The lower sharing ratio on the Spanish channel may reflect regional differences in residential access network provisioning. The balance of the required bandwidth is provided by Zattoo's Encoding Server and Repeater nodes.

Table 3: Average sharing ratio.

Channel | Off-peak | Peak
ARD     | 0.335    | 0.313
Cuatro  | 0.242    | 0.224
SF2     | 0.277    | 0.222

3.2 Channel Switching Delay

When a user clicks on a new channel button, it takes some time (a.k.a. channel switching delay) for the user to be able to start watching the streamed video on the Zattoo player. The channel switching delay has two components. First, the Zattoo player needs to contact other available peers and retrieve all required sub-streams from them. We call the delay incurred during this stage "join delay." Once all necessary sub-streams have been negotiated successfully, the player then needs to wait and buffer a minimum amount of stream (e.g., 3 seconds) before starting to show the video to the user. We call the resulting wait time "buffering delay." The total channel switching delay experienced by users is thus the sum of join delay and buffering delay. PPLive reports channel switching delay of around 20 to 30 seconds, which can be as high as 2 minutes, of which join delay typically accounts for 10 to 15 seconds [7].


Figure 5: Overlay size and sharing ratio: (a) ARD, (b) Cuatro, (c) SF2.

We measure the join delay experienced by Zattoo users from the feedback logs described in Table 2. Debugging information contained in the feedback logs tells us when a user clicked on a particular channel, and when the player had successfully joined the P2P overlay and started to buffer content. One concern in relying on user-submitted feedback logs to infer join delay is the potential sampling bias associated with them. Users typically submit feedback logs when they encounter some kind of error, and that brings up the question of whether the captured sessions are representative samples to study. We attempt to address this concern by comparing the data from the feedback logs against those from the session database. The latter captures the complete picture of users' channel watching behavior, and therefore can serve as a reference. In our analysis, we compare the user arrival time distribution obtained from the two data sets. For a fair comparison, we used a subset of the session database which was generated during the same period when the feedback logs were collected (i.e., from June 20th to 29th).

Fig. 6 plots the CDF distribution of user arrivals per hour obtained from the feedback logs and the session database separately. The steep slope of the distributions during hours 18-20 (6-9PM) indicates the high frequency of user arrivals during those hours. On ARD and Cuatro, the user arrival distributions inferred from feedback logs are almost identical to those from the session database. On the other hand, on SF2, the distribution obtained from feedback logs tends to grow slowly during early hours, which indicates that the feedback submission rate during off-peak hours on SF2 was relatively lower than normal. Later during peak hours, however, the feedback submission rate picks up as expected, closely matching the actual user arrival rate. Based on this observation, we argue that feedback logs can serve as representative samples of daily user activities.

Fig. 7 shows the CDF distributions of channel join delay for the ARD, Cuatro and SF2 channels. We show the distributions for off-peak hours (0AM-6PM) and peak hours (6PM-0AM) separately. Median channel join delay is also presented in a similar fashion in Table 4. According to the CDF distributions, 80% of users experience less than 4 to 8 seconds of join delay, and 50% of users even less than 2 seconds of join delay. Also, Table 4 shows that even a 10-fold increase in the number of concurrent users during peak hours does not unduly lengthen the channel join delay (up to a 22% increase in median join delay).

Table 4: Median channel join delay.

Channel | Median join delay (Off-peak) | Median join delay (Peak) | Maximum overlay size (Off-peak) | Maximum overlay size (Peak)
ARD     | 2.29 sec | 1.96 sec | 2,313 | 19,223
Cuatro  | 3.67 sec | 4.48 sec | 2,357 | 8,073
SF2     | 2.49 sec | 2.67 sec | 1,126 | 11,360

3.3 Packet Retransmission

As described in Section 2.2, the Zattoo player's P2P engine performs on-demand, application-level packet retransmission for any missing or delayed packets. The objective of such packet retransmission is to mask transient packet loss or delay incurred on the P2P overlay, to help stabilize the overlay structure. The Zattoo player's P2P engine periodically records in the log file the number of packets in the local I/O buffer that are retransmitted. Thus, by studying the user feedback logs, we can estimate the amount of packet retransmission occurring on Zattoo's P2P overlay. Fig. 8(a) shows the ratio of retransmitted packets to the total incoming packets on ARD, Cuatro and SF2 per hour across the day. Fig. 8(b) plots the total amount of retransmitted traffic in a similar fashion. Looking at the figures, the amount of retransmission clearly reflects the daily usage pattern of Zattoo users (e.g., peaking at 7PM GMT), while the ratio of retransmission remains flatter, with occasional peaks spread out throughout the day. Fig. 8(a) is rather contrary to our expectation that there would be more frequent packet retransmission when the overlay is larger. One possible cause is that packet retransmission requests are handled at a lower priority than regular join requests. Consequently, even though there are more retransmission requests during peak hours, the number of successfully retransmitted packets may not be proportionally larger, due to the lack of available uplink bandwidth during these hours.

4. CLIENT-SIDE MEASUREMENTS

To further study the P2P overlay beyond details obtainable from aggregated session-level statistics, we run several modified Zattoo clients which periodically retrieve the internal states of other participating peers in the network by exchanging SEARCH/JOIN messages with them. After a given probe session is over, the monitoring client archives a log file from which we can analyze the control/data traffic exchanged and detailed protocol behavior. We ran the experiment during Zattoo's live coverage of Euro 2008 (June 7th to 29th). The monitoring clients tuned in to game channels from one of Zattoo's data centers located in Switzerland while the games were broadcast live. The data sets presented in this paper were collected during the coverage of the championship final on two separate channels: ARD in Germany and Cuatro in Spain. Soccer teams from Germany and Spain participated in the championship final.


Figure 6: CDF of user arrival time: (a) ARD, (b) Cuatro, (c) SF2.


Figure 7: CDF of channel join delay: (a) ARD, (b) Cuatro, (c) SF2.

As described in Section 2.1, Zattoo's peer discovery is guided by peers' topology information. To minimize potential sampling bias caused by our use of a single vantage point for monitoring, we assigned an "empty" AS number and country code to our monitoring clients, so that their probing is not geared towards those peers located in the same AS and country.

4.1 Sub-Stream Synchrony

The current Zattoo deployment dedicates 3 sub-streams (out of n = 16) for loss recovery purposes. That is, given a segment of 16 consecutive packets, if a peer has received at least 13 packets, it can reconstruct the remaining 3 packets from the RS error-correcting code (see Section 2). Thus if a peer can receive any 13 sub-streams out of 16 reliably, it can decode the full stream properly.

To ensure good viewing quality, a peer should not only obtain all necessary sub-streams (discounting redundant sub-streams), but also have those sub-streams delivered temporally synchronized with each other for proper online decoding. Receiving out-of-sync sub-streams typically results in a pixelated screen on the player. As described in Sections 2.1 and 2.3, Zattoo's protocol favors sub-streams that are relatively in-sync when constructing the PDM, and continually monitors the sub-streams' progression over time, replacing those sub-streams that have fallen behind and reconfiguring the PDM when necessary. In this section we measure the effectiveness of Zattoo's adaptive PDM in selecting sub-streams that are largely in-sync.

To quantify such inter-sub-stream synchrony, we measure the difference in the latest (i.e., maximum) packet sequence numbers belonging to different incoming sub-streams. When a remote peer responds to a SEARCH query message, it includes in its SEARCH reply the latest sequence numbers that it has received for all sub-streams. If some sub-streams happen to be lossy or stalled at that time, the peer marks such sub-streams in its SEARCH replies. Thus, we can inspect SEARCH replies from existing peers to study their inter-sub-stream synchrony.

In our experiment, we collected SEARCH replies from 4,420 and 6,530 distinct peers on the ARD and Cuatro channels respectively, during the 2-hour coverage of the final game. From the collected SEARCH replies, we check how many sub-streams (out of 16) are "bad" (e.g., lossy or missing) for each peer. Fig. 9(a) shows the CDF distribution of the number of bad sub-streams. According to the figure, about 99% (ARD) and 96% (Cuatro) of peers have 3 or fewer bad sub-streams. The result in Fig. 9(a) suggests that the number of bad sub-streams is low enough as to not cause quality issues in the Zattoo network.


Figure 8: Packet retransmission: (a) retransmission ratio, (b) total retransmitted traffic.


Figure 9: Sub-stream synchrony: (a) CDF of the number of bad sub-streams, (b) CDF of sub-stream synchrony.

After discounting "bad" sub-streams, we then look at the synchrony of the remaining "good" sub-streams in each peer. Fig. 9(b) shows the CDF distribution of the sub-stream synchrony on the two channels. The sub-stream synchrony of a given peer is defined as the difference between the maximum and minimum packet sequence numbers among all its sub-streams, which is obtained from the peer's SEARCH reply. For example, if some peer has a sub-stream synchrony of 100, it means that the peer has one sub-stream that is ahead of another sub-stream by 100 packets. If all the packets are received in order, the sub-stream synchrony of a peer measures at most n−1. If we received multiple SEARCH replies from the same peer, we average the sub-stream synchrony across all the replies. Given the 500 kbps average channel data rate, 60 consecutive packets roughly correspond to 1 second's worth of streaming. Thus, the figure shows that on the Cuatro channel, 20% of peers have their sub-streams completely in-sync, while more than 90% have their sub-streams lagging each other by at most 5 seconds; on the ARD channel, 30% are in-sync, and more than 90% are within 1.5 seconds. The buffer space of the Zattoo player has been dimensioned sufficiently to accommodate such a low degree of out-of-sync sub-streams.
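Computed directly from the SEARCH-reply fields described above, sub-stream synchrony is a max-minus-min over the latest sequence numbers of the good sub-streams. A sketch, with the reply laid out as a dictionary from sub-stream index to latest sequence number (our own illustrative layout):

```python
def substream_synchrony(latest_seq, bad):
    """Spread between the most- and least-advanced good sub-streams of one peer."""
    good = [seq for idx, seq in latest_seq.items() if idx not in bad]
    return max(good) - min(good) if good else 0

def average_synchrony(replies):
    """Average the per-reply synchrony when several SEARCH replies came from the same peer.
    Each reply is a (latest_seq, bad_substreams) pair."""
    values = [substream_synchrony(latest, bad) for latest, bad in replies]
    return sum(values) / len(values) if values else 0.0
```

Dividing the result by 60 gives an approximate lag in seconds at the 500 kbps average channel rate.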

Figure 10: Peer synchrony.

4.2 Peer Synchrony

While sub-stream synchrony tells us the stream quality different peers may experience, "peer synchrony" tells us how varied in time peers' viewing points are. With small-scale P2P networks, all participating peers are likely to watch live streaming roughly synchronized in time. However, as the size of the P2P overlay grows, the viewing point of edge nodes may be delayed significantly compared to those closer to the Encoding Server. In the experiment, we define the viewing point of a peer as the median of the latest sequence numbers across its sub-streams. Then we choose one peer (e.g., a Repeater node directly connected to the Encoding Server) as a reference point, and compare other peers' viewing points against the reference viewing point.
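A sketch of the viewing-point and relative peer synchrony computation used in this subsection (again, the reply layout is our own assumption):

```python
from statistics import median

def viewing_point(latest_seq):
    """Viewing point of a peer: median of the latest sequence numbers across its sub-streams."""
    return median(latest_seq.values())

def relative_peer_synchrony(peer_latest, reference_latest):
    """Negative values mean the peer lags the reference peer; -60 packets is roughly
    one second at the 500 kbps average channel rate."""
    return viewing_point(peer_latest) - viewing_point(reference_latest)
```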


Figure 11: Peer synchrony vs. peer hops from the Encoding Server: (a) ARD, (b) Cuatro.

Fig. 10 shows the CDFs of relative peer synchrony. The relative peer synchrony of peer X is obtained by subtracting the reference viewing point from the viewing point of X. So a peer synchrony of -60 means that a given peer's viewing point is delayed by 60 packets (roughly 1 second for a 500 kbps stream) compared to the reference viewing point. A positive value means that a given peer's stream gets ahead of the reference point, which could happen for peers which receive streams directly from the Encoding Server. The figure shows that about 1% of peers on ARD and 4% of peers on Cuatro experienced more than 3 seconds (i.e., 180 packets) of delay in streaming compared to the reference viewing point.

To better understand to what extent a peer's position in the overlay affects peer synchrony, we plot in Fig. 11 the CDFs of peer synchrony at different depths, i.e., distances from the Encoding Server. We look at how much delay is introduced for sub-streams traversing i peers from the Encoding Server (i.e., depth i). For this purpose, we associate per-sub-stream viewing point information from a SEARCH reply with per-sub-stream overlay depth information from a JOIN reply, where the SEARCH/JOIN replies were sent by the same peer, close in time (e.g., less than 1 second apart). If the length of the peer-hop path from the Encoding Server has an adverse effect on playback delay and viewing point, we expect peers further away from the Encoding Server to be further offset from the reference viewing point, resulting in CDFs that are shifted to the upper left corner for peers at further distances from the Encoding Server. The figure shows an absence of such a leftward shift in the CDFs. The median delay at depth 6 does not grow by more than 0.5 seconds compared to the median delay at depth 2. The figure shows that having more peers further away from the Encoding Server increases the number of peers that are 0.5 seconds behind the reference viewing point, without increasing the offset in viewing point itself. On the Zattoo network, once the virtual circuits comprising a PDM have been set up, each packet is streamed with minimal delay through each peer. Hence each peer hop from the Encoding Server introduces delay only in the tens-of-milliseconds range, attesting to the suitability of the network architecture for carrying live media streaming on large-scale P2P networks.

4.3 Effectiveness of ECC in Isolating Loss

We investigate the effects of overlay size on the performance scalability of the Zattoo system. Here we focus on client-side quality (e.g., loss rate). As described in Section 2, Zattoo-broadcast media streams are RS encoded, which allows peers to reconstruct a full stream once they obtain at least k of n sub-streams. Since the ECC-encoded stream reconstruction occurs hop by hop in the overlay, it can mask sporadic packet losses, and thus prevent packet losses from being propagated throughout the overlay at large.

To see if such packet loss containment actually occurs in the production system, we ran the following experiments. We let our monitoring client join a game channel, stay tuned for 15 seconds, and then leave. We wait for 10 seconds after the 15-second session is over. We repeat this process throughout the 2-hour game coverage and collect the logs. A given log file from each session contains a complete list of packet sequence numbers received from the connected peers during the session, from which we can detect upstream packet losses. We then associate individual packet losses with the peer-path from the Encoding Server. This gives us a rough sense of whether packets traversing longer hops would experience a higher loss rate. In our analysis, we discount packets delivered during the first 3 seconds of a session to allow the PDM to stabilize.

Fig. 12 shows how the average packet loss rate changes across different overlay depths (in all cases < 1%). On both channels, the packet loss rate does not grow with overlay depth. This result confirms our expectation that ECC helps localize packet losses on the overlay.
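The per-depth aggregation behind Fig. 12 can be expressed compactly. The sketch below assumes per-session observations of (depth, packets expected, packets lost) already extracted from the probe logs, with the first 3 seconds discounted as described; the data layout is illustrative.

```python
from collections import defaultdict

def loss_rate_by_depth(observations):
    """observations: iterable of (depth, packets_expected, packets_lost) per probe session.
    Returns the average packet loss rate (%) per peer-hop distance from the Encoding Server."""
    expected = defaultdict(int)
    lost = defaultdict(int)
    for depth, n_expected, n_lost in observations:
        expected[depth] += n_expected
        lost[depth] += n_lost
    return {d: 100.0 * lost[d] / expected[d] for d in expected if expected[d]}
```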
5. PEER-TO-PEER SHARING RATIO

The premise of a given P2P system's success in providing scalable stream distribution is sufficient bandwidth sharing from participating users [5]. Section 3.1 shows that the average sharing ratio of Zattoo users ranges from 0.2 to 0.35, which translates into bandwidth uplinks ranging from 100 kbps to 175 kbps. This is far lower than the numbers reported as typical uplink bandwidth in countries where Zattoo is available [6]. Aside from the possibility that users' bandwidth may be shared with other applications, we find factors such as users' behavior, support for variable-bit-rate encoding, and heterogeneous NAT environments contributing to sub-optimal sharing performance.

426 0.7 0.6

0.6 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 Packet Loss Rate (%) Packet Loss Rate (%) 0.1 0.1

0 0 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 Peer Hops from Encoding Server Peer Hops from Encoding Server (a) ARD (b) Cuatro

Figure 12: Packet loss rate vs. peer hops from Encoding Server: (a) ARD, (b) Cuatro.

Designers of P2P streaming systems must pay attention to these factors to achieve good bandwidth sharing. However, one must also keep in mind that improving bandwidth sharing should not come at the expense of a compromised user viewing experience, e.g., due to more frequent uplink bandwidth saturation.

User churns: It is known that frequent user churns accompanied by short sessions, as well as flash crowd behavior, pose a unique challenge in live streaming systems [7, 8]. User churns can occur in both upstream (e.g., forwarding peers) and downstream (e.g., receiver peers) connectivity on the overlay. While frequent churns of forwarding peers can cause quality problems, frequent changes of receiver peers can lead to under-utilization of a forwarding peer's uplink capacity.

To estimate receiver peers' churn behavior, we visualize in Fig. 13 the relationship between session length and the frequency of receiver-peer churns experienced during the sessions. We studied receiver-peer churn behavior for the Cuatro channel by using Zattoo's feedback logs described in Table 2. The y-value at session length x minutes denotes the average number of receiver-peer churns experienced for sessions with length ∈ [x, x + 1). According to the figure, the receiver-peer churn frequency tends to grow linearly with session length, and peers experience approximately one receiver-peer churn every minute.

Figure 13: Receiver-peer churn frequency.

Variable-bit rate streams: Variable-bit rate (VBR) encoding is commonly used in production streaming systems, including Zattoo, due to its better quality-to-bandwidth ratio compared to constant-bit rate encoding. However, VBR streams may put additional strain on peers' bandwidth contribution and quality optimization. As described in Section 2.3, the presence of VBR streams requires peers to perform measurement-based admission control when allocating resources to set up a virtual circuit. To avoid degradation of service due to overloading of the uplink bandwidth, the measurement-based admission control module must be conservative both in its reaction to increases in available bandwidth and in its allocation of available bandwidth to newly joining peers. This conservativeness necessarily leads to under-utilization of resources.

NAT reachability: Asymmetric reachability imposed by prevalent NAT boxes adds to the difficulty in achieving full utilization of users' uplink capacity. The Zattoo delivery system supports 6 different NAT configurations: open host, full cone, IP-restricted, port-restricted, symmetric, and UDP-disabled, listed in increasing degree of restrictiveness. Not every pairwise communication can occur among different NAT types. For example, peers behind a port-restricted NAT box cannot communicate with those of symmetric NAT type (we documented a NAT reachability matrix in an earlier work [5]).

To examine how the varied NAT reachability comes into play as far as sharing performance is concerned, we performed the following two controlled experiments. In both experiments, we run four Zattoo clients, each with a distinct NAT type, tuned to the same channel concurrently for 30 minutes. In the first experiment, we fixed the maximum allowable uplink capacity (denoted "max_cp") of those clients to the same constant value. (The experiments were performed from one of Zattoo's data centers in Europe, and there was sufficient uplink bandwidth available to support 4 × max_cp.) In the second experiment, we let the max_cp of those clients self-adjust over time (which is the default setting of the Zattoo player). In both experiments, we monitored how the uplink bandwidth utilization ratio (i.e., the ratio between actively used uplink bandwidth and the maximum allowable bandwidth max_cp) changes over time. The experiments were repeated three times for reliability.
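The average utilization ratio reported for Fig. 14(b) below is a ratio of time integrals. A sketch, assuming periodic samples of the used uplink bandwidth and the max_cp in force at each sampling instant (the sample format is our own assumption):

```python
def average_utilization(samples):
    """samples: list of (timestamp_sec, used_uplink_kbps, max_cp_kbps), in time order.
    Approximates (integral of used dt) / (integral of max_cp dt) with a step-wise sum."""
    used_area = cap_area = 0.0
    for (t0, used, cap), (t1, _, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        used_area += used * dt
        cap_area += cap * dt
    return used_area / cap_area if cap_area else 0.0
```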

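Before turning to the results, the following is a minimal sketch of how the monitored utilization ratios could be computed from sampled measurements. The (timestamp, used bandwidth, max cp) sampling format is an assumption; the average ratio approximates the ∫ current capacity(t) dt / ∫ max cp(t) dt metric plotted in Fig. 14(b) with a discrete sum.

```python
def utilization_ratios(samples):
    """Instantaneous and time-averaged uplink utilization ratios.

    `samples` is a time-ordered list of (t_sec, used_bw, max_cp) triples, an
    assumed monitoring format. The average ratio approximates
    integral(used_bw dt) / integral(max_cp dt) with a left Riemann sum.
    """
    instantaneous = [(t, used / cap) for t, used, cap in samples if cap > 0]
    used_area = 0.0
    cap_area = 0.0
    for (t0, used, cap), (t1, _, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        used_area += used * dt
        cap_area += cap * dt
    average = used_area / cap_area if cap_area > 0 else 0.0
    return instantaneous, average

# Example: a client sharing 200 kbps for the first half and 400 kbps for the
# second half of a session capped at 400 kbps averages 0.75 utilization.
inst, avg = utilization_ratios([(0, 200, 400), (900, 400, 400), (1800, 400, 400)])
print(avg)  # 0.75
```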
Fig. 14 shows the capacity utilization ratio from these two experiments for four different NAT types. Fig. 14(a) visualizes how fast the capacity utilization ratio converges to 1.0 over time after the client joins the network. Fig. 14(b) plots the average utilization ratio (i.e., ∫ current capacity(t) dt / ∫ max cp(t) dt). In both the constant and adjustable cases, the results clearly exemplify the varied sharing performance across different NAT types. In particular, the "symmetric" NAT type, which is the most restrictive among the four, shows an inferior sharing ratio compared to the rest. Unfortunately, the "symmetric" NAT type is the second most popular NAT type among the Zattoo population [5], and therefore can seriously affect Zattoo's overall sharing performance. There is a relatively small number of peers running behind the "IP-restricted" NAT type, hence its sharing performance has not been fine-tuned at Zattoo.

Figure 14: Uplink capacity utilization ratio. (a) Constant maximum capacity: current capacity utilization ratio vs. time since joining (min) for full-cone, IP-restricted, port-restricted, and symmetric NAT types. (b) Adjustable maximum capacity: average capacity utilization ratio for each NAT type.

Reliability of NAT detection: To allow communications among clients behind a NAT gateway (i.e., NATed clients), each client in Zattoo performs a NAT detection procedure upon player startup to identify its NAT configuration and advertises it to the rest of the Zattoo network. Zattoo's NAT detection procedure implements the UDP-based standard STUN protocol, which involves communicating with external STUN servers to discover the presence/type of a NAT gateway [9]. The communication occurs over UDP with no reliable transport guarantee. Lost or delayed STUN UDP packets may lead to inaccurate NAT detection, preventing NATed clients from contacting each other, and therefore adversely affecting their sharing performance.

To understand the reliability and accuracy of the STUN-based NAT detection procedure, we utilize Zattoo's session database (Table 1), which stores, among other things, the NAT configuration, public IP address, and private IP address for each session. We assume that all the sessions associated with the same public/private IP address pair would be generated under the same NAT configuration. If sessions with the same public/private IP address pair report inconsistent NAT detection results, we consider those sessions as having experienced failed NAT detection, and apply the majority rule to determine the correct result for that configuration.

Figure 15: NAT detection failure rate (failure rate and number of NAT detections vs. hour of day).

Fig. 15 plots the daily trend of the NAT detection failure rate derived from ARD, Cuatro and SF2 during the month of June 2008. The NAT detection failure rate at hour x is the number of bogus NAT detection results divided by the total number of NAT detections occurring in that hour. The total number of NAT detections indicates how busy the Zattoo system was throughout the day. The NAT detection failure rate grows from around 1.5% at 2-3AM to a peak of almost 6% at 8PM. This means that at the busiest times, the NAT type of about 6% of clients is not determinable, leading to them not contributing any bandwidth to the P2P network.
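As an illustration of the analysis described above, the sketch below groups sessions by their public/private IP address pair, applies the majority rule within each group, and computes an hourly failure rate. The record fields are assumed for illustration and do not reflect the actual session-database schema.

```python
from collections import Counter, defaultdict

def nat_detection_failure_rate(sessions):
    """Hourly NAT detection failure rate via the majority rule.

    `sessions` is an iterable of dicts with assumed fields:
      'public_ip', 'private_ip', 'nat_type', 'hour' (0-23 session start hour).
    Sessions sharing a (public_ip, private_ip) pair are assumed to sit behind
    the same NAT configuration; the most common reported type in each group is
    treated as correct and every disagreeing session as a failed detection.
    Returns {hour: failed_detections / total_detections}.
    """
    groups = defaultdict(list)
    for s in sessions:
        groups[(s['public_ip'], s['private_ip'])].append(s)

    failed = Counter()
    total = Counter()
    for sess_list in groups.values():
        majority_type, _ = Counter(s['nat_type'] for s in sess_list).most_common(1)[0]
        for s in sess_list:
            total[s['hour']] += 1
            if s['nat_type'] != majority_type:
                failed[s['hour']] += 1
    return {h: failed[h] / total[h] for h in total}
```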
6. RELATED WORKS
Aside from Zattoo, several commercial peer-to-peer systems intended for live streaming have been introduced since 2005, notably PPLive, PPStream, SopCast, TVAnts, and UUSee from China, and Joost, Livestation, Octoshape, and RawFlow from the EU. A large number of measurement studies have been done on one or the other of these systems [7, 10, 11, 12, 13, 14, 15, 16]. Many research prototypes and improvements to existing P2P systems have also been proposed and evaluated [17, 18, 19, 20, 21, 4, 22, 23, 24]. Our study is unique in that we are able to collect network core data from a large production system with over 3 million registered users, with intimate knowledge of the underlying network architecture and protocol.

P2P systems are usually classified as either tree-based push or mesh-based swarming [25]. In tree-based push schemes, peers organize themselves into multiple distribution trees [26, 27, 28]. In mesh-based swarming, peers form a randomly connected mesh, and content is swarmed via dynamically constructed directed acyclic paths [29, 30, 21]. Zattoo is not only a hybrid between these two, similar to the latest version of CoolStreaming [4]; its dependence on Repeater nodes also makes it a hybrid of a P2P network and a content-distribution network (CDN), similar to PPLive [7].

7. CONCLUSION
We have presented a receiver-based, peer-division multiplexing engine to deliver live streaming content on a peer-to-peer network. The same engine can be used to transparently build a hybrid P2P/CDN delivery network by adding Repeater nodes to the network. By analyzing a large amount of usage data collected on the network during one of the largest viewing events in Europe, we have shown that the resulting network can scale to a large number of users and can take good advantage of available uplink bandwidth at peers. We have also shown that error-correcting codes and packet retransmission can help improve network stability by isolating packet losses and preventing transient congestion from resulting in PDM reconfigurations. We have further shown that the PDM and adaptive PDM schemes presented have small enough overhead to make our system competitive with digital satellite TV in terms of channel switch time, stream synchronization, and signal lag.

8. REFERENCES
[1] Rolf auf der Maur, "Die Weiterverbreitung von TV- und Radioprogrammen über IP-basierte Netze," in Entertainment Law (f. d. Schweiz). Stämpfli Verlag, 1st edition, 2006.
[2] UEFA, "Euro2008," http://www1.uefa.com/.
[3] S. Lin and D. J. Costello, Jr., Error Control Coding, Pearson Prentice-Hall, 2nd edition, 2004.
[4] S. Xie, B. Li, G. Y. Keung, and X. Zhang, "CoolStreaming: Design, Theory, and Practice," IEEE Trans. on Multimedia, vol. 9, no. 8, December 2007.

[5] K. Shami et al., "Impacts of Peer Characteristics on P2PTV Networks Scalability," in IEEE INFOCOM Mini Conference, April 2009.
[6] Bandwidth-test.net, "Bandwidth test statistics across different countries," http://www.bandwidth-test.net/stats/country/.
[7] X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross, "Insights into PPLive: A Measurement Study of a Large-Scale P2P IPTV System," in IPTV Workshop, International World Wide Web Conference, 2006.
[8] B. Li et al., "An Empirical Study of Flash Crowd Dynamics in a P2P-Based Live Video Streaming System," in IEEE Global Telecommunications Conference, 2008.
[9] J. Rosenberg et al., "STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)," RFC 3489, March 2003.
[10] A. Ali, A. Mathur, and H. Zhang, "Measurement of Commercial Peer-to-Peer Live Video Streaming," in Workshop on Recent Advances in Peer-to-Peer Streaming, August 2006.
[11] L. Vu et al., "Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming," in The 4th Int'l Conf. on Heterogeneous Networking for Quality, Reliability, Security and Robustness, 2007.
[12] T. Silverston and O. Fourmaux, "Measuring P2P IPTV Systems," in ACM NOSSDAV, November 2008.
[13] C. Wu, B. Li, and S. Zhao, "Magellan: Charting Large-Scale Peer-to-Peer Live Streaming Topologies," in Proc. of ICDCS'07, 2007, p. 62.
[14] M. Cha et al., "On Next-Generation Telco-Managed P2P TV Architectures," in Proc. of the 7th Int'l Workshop on Peer-to-Peer Systems, 2008.
[15] D. Ciullo et al., "Understanding P2P-TV Systems Through Real Measurements," in IEEE GLOBECOM, November 2008.
[16] E. Alessandria et al., "P2P-TV Systems under Adverse Network Conditions: a Measurement Study," in IEEE INFOCOM, April 2009.
[17] S. Banerjee et al., "Construction of an Efficient Overlay Multicast Infrastructure for Real-time Applications," in IEEE INFOCOM, March 2003.
[18] D. Tran, K. Hua, and T. Do, "ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming," in Proc. of the IEEE INFOCOM, 2003.
[19] D. Kostic et al., "Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh," in Proc. of the 19th ACM SOSP, Bolton Landing, NY, USA, October 2003.
[20] R. Rejaie and S. Stafford, "A Framework for Architecting Peer-to-Peer Receiver-driven Overlays," in Proc. of the ACM NOSSDAV, 2004, pp. 42-47.
[21] X. Liao et al., "AnySee: Peer-to-Peer Live Streaming," in IEEE INFOCOM, April 2006.
[22] F. Pianese et al., "PULSE: An Adaptive, Incentive-Based, Unstructured P2P Live Streaming System," IEEE Trans. on Multimedia, vol. 9, no. 6, 2007.
[23] J. Douceur, J. Lorch, and T. Moscibroda, "Maximizing Total Upload in Latency-Sensitive P2P Applications," in Proc. of the 19th ACM SPAA, 2007, pp. 270-279.
[24] Y.-W. Sung, M. Bishop, and S. Rao, "Enabling Contribution Awareness in an Overlay Broadcasting System," in Proc. of the ACM SIGCOMM, 2006.
[25] N. Magharei, R. Rejaie, and Y. Guo, "Mesh or Multiple-Tree: A Comparative Study of Live P2P Streaming Approaches," in IEEE INFOCOM, May 2007.
[26] V. N. Padmanabhan, H. J. Wang, and P. A. Chou, "Resilient Peer-to-Peer Streaming," in IEEE ICNP, November 2003.
[27] M. Castro et al., "SplitStream: High-Bandwidth Multicast in Cooperative Environments," in ACM SOSP, October 2003.
[28] J. Liang and K. Nahrstedt, "DagStream: Locality Aware and Failure Resilient Peer-to-Peer Streaming," in SPIE Multimedia Computing and Networking, January 2006.
[29] N. Magharei and R. Rejaie, "PRIME: Peer-to-Peer Receiver-drIven MEsh-based Streaming," in IEEE INFOCOM, May 2007.
[30] M. Hefeeda et al., "PROMISE: Peer-to-Peer Media Streaming Using CollectCast," in ACM Multimedia, November 2003.
