P2P Content Distribution
Pablo Rodriguez
Christos Gkantsidis

Traditional Content Distribution
Often, large content needs to be distributed to millions of clients.
• Currently:
  – Huge server farms
  – Infrastructure-based solutions (e.g. Akamai)
• Slow, expensive, non-scalable
[Diagram: Server Farm]
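Content Distribution Evolution
[Figure: hype-cycle timeline of content distribution technologies, 1999 to 2004, with phases labeled Hype, Disappointment, Realism, and Growth. Technologies shown: Caching, IP Multicast, Layer-7 Switches, Satellite CDNs, CDNs (Akamai), Enterprise CDNs, P2P.]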
P2P Content Distribution

Desktop PCs can help each other!
• Clients become new servers
• Capacity increases with the number of clients
• Limitless scalability and fast speeds at extremely low cost!!
[Diagram: Server Farm]
[Figure: number of clients served (log scale, 10 to 10,000,000) versus time (0 to 57 sec), comparing Cooperative and Client/Server distribution. 4 MB file. Server 100 Mbps. Client 1 Mbps.]
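The scaling argument can be made concrete with the standard idealized "fluid" lower bound on distribution time. The sketch below is not the simulation behind the plot. It assumes that the quoted client rate of 1 Mbps applies to both upload and download, that 4 MB means 32 Mbit, and the function names are hypothetical.

```python
# Idealized lower bounds on the time to deliver one file to n clients.
# Assumptions (not from the slides): symmetric 1 Mbps client links,
# 4 MB = 32 Mbit, perfect scheduling, no protocol overhead.

def client_server_time(n, file_mbit, server_mbps, down_mbps):
    """The server alone must push n copies; no client can beat its own link."""
    return max(file_mbit / down_mbps, n * file_mbit / server_mbps)

def cooperative_time(n, file_mbit, server_mbps, down_mbps, up_mbps):
    """Clients also upload, so total upload capacity grows with n."""
    total_upload = server_mbps + n * up_mbps
    return max(
        file_mbit / down_mbps,         # slowest download link
        file_mbit / server_mbps,       # server must send one full copy
        n * file_mbit / total_upload,  # n copies over all upload capacity
    )

FILE_MBIT = 4 * 8  # 4 MB file
for n in (1_000, 1_000_000):
    cs = client_server_time(n, FILE_MBIT, 100, 1)
    p2p = cooperative_time(n, FILE_MBIT, 100, 1, 1)
    print(f"n={n}: client/server >= {cs:.0f} s, cooperative >= {p2p:.0f} s")
```

Under these assumptions the client/server bound grows linearly with the number of clients, while the cooperative bound stays close to the time one client needs to download the file.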
Examples
• Updates/Critical Patches (combat virus/worm propagation)
  – Adding more servers and egress capacity to absorb peak load is quite expensive
  – Alternative solution is to artificially delay clients
    » Patches do not arrive on time
• Software Distribution
  – BitTorrent: successfully distributed 1.77 GB Red Hat 9
• PodCasting
• Group Information Sharing
• Enterprise content distribution
P2P Content Distribution
• Benefits:
  – Dramatically improves speed
  – Limitless scalability
  – Minimum server requirements
  – Very cheap
• Challenges:
  – Requires incentives for cooperation
  – Hard to ensure end-to-end full connectivity
  – Security
  – Manageability
  – Lack of locality increases transit costs for ISPs
  – Asymmetric links (traffic engineering)
  – Variable bandwidth, peers come and go
  – Need for more sophisticated distribution algorithms
P2P Swarming
• File is divided into many small pieces for distribution
• Clients request different pieces from the server or from other clients
• Clients become servers for those pieces downloaded
• When all pieces are downloaded, clients can reconstruct the whole file
[Diagram: a server holding pieces 1 to 6 and clients that hold the same pieces, received in different orders.]
[Rodriguez, Biersack, Infocom’00]
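The swarming steps above can be sketched as a toy simulation. This is a minimal illustration rather than any real protocol: each client fetches one random missing piece per round, there are no bandwidth limits, and all names (`split`, `swarm`) are hypothetical.

```python
import random

def split(data, piece_size):
    """Divide the file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def swarm(data, piece_size, num_clients, seed=0):
    rng = random.Random(seed)
    pieces = split(data, piece_size)
    server = dict(enumerate(pieces))            # piece index -> piece bytes
    clients = [dict() for _ in range(num_clients)]
    rounds = 0
    while any(len(c) < len(pieces) for c in clients):
        rounds += 1
        for me in clients:
            missing = [i for i in range(len(pieces)) if i not in me]
            if not missing:
                continue
            want = rng.choice(missing)
            # Prefer another client that already holds the piece;
            # fall back to the server when nobody has it yet.
            holders = [c for c in clients if c is not me and want in c]
            source = rng.choice(holders) if holders else server
            me[want] = source[want]
    # Every client reassembles the file from its pieces, in index order.
    files = [b"".join(c[i] for i in range(len(pieces))) for c in clients]
    return files, rounds

original = b"abcdefghijkl"
files, rounds = swarm(original, piece_size=2, num_clients=4)
print(all(f == original for f in files), rounds)
```

Because every client obtains exactly one piece per round in this toy model, the number of rounds equals the number of pieces. What changes with cooperation is how many of those transfers the server itself has to serve.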
The Challenge
• If there are many users, deciding which is the best piece to download can be very hard!!
• Solutions that require full knowledge of who has what are not scalable
⇒ Incorrect decisions result in low throughput, nodes not able to finish, bandwidth wasted, etc.
[Diagram: a server holding pieces 1 to 6 and clients that hold the same pieces, received in different orders.]
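One common way to decide without global knowledge is a local rarest-first rule, as used by BitTorrent-like systems: a client looks only at what its direct neighbors advertise. The sketch below is an assumption added for illustration, not something specified in these slides, and the function name and data layout are hypothetical.

```python
from collections import Counter

def pick_local_rarest(my_pieces, neighbor_pieces):
    """Pick the missing piece held by the fewest neighbors.

    my_pieces: set of piece indices this client already has.
    neighbor_pieces: dict mapping neighbor name -> set of piece indices.
    Returns (piece, neighbor) or None when no neighbor has anything new.
    """
    counts = Counter()
    for pieces in neighbor_pieces.values():
        counts.update(pieces - my_pieces)   # count only pieces we still miss
    if not counts:
        return None
    rarest = min(counts.values())
    # Break ties deterministically by lowest piece index, then neighbor name.
    piece = min(p for p, c in counts.items() if c == rarest)
    holders = sorted(n for n, pieces in neighbor_pieces.items() if piece in pieces)
    return piece, holders[0]

neighbors = {"A": {1, 2, 3}, "B": {2, 5}, "C": {2, 3}}
print(pick_local_rarest({1}, neighbors))
```

The choice is only as good as the local view: a piece that looks common among neighbors may still be rare in the swarm as a whole, which is the kind of incorrect decision the slide warns about.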
Avalanche: Improving BitTorrent through Coding Techniques
Goal
• Provide a very fast and robust P2P file distribution solution
• Current problems in existing P2P solutions:
  – Rare blocks are hard to obtain
  – Tit-for-tat incentive mechanisms decrease speeds
  – Arrival of new users slows down old users
  – Heterogeneous nodes do not interact well
  – Same information travels repeatedly over bottleneck links
  – Too much dependence on seeds
  – Sudden departures can prevent peers from finishing
The Problem of Efficient Scheduling of Information
[Diagram: a Source and Nodes A, B, and C exchanging Block 1 and Block 2. Which block should be sent next: Block 1, Block 2, or Block 1 ⊕ Block 2?]
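The appeal of the coded option can be checked with a few lines of code: a node that holds either block recovers the other one from the XOR, so the sender does not need to know which block the receiver is missing. This is a minimal sketch with made-up block contents and a hypothetical helper name.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

block1 = b"PIECE-01"
block2 = b"PIECE-02"
coded = xor_blocks(block1, block2)          # Block 1 XOR Block 2

# A node holding only Block 1 recovers Block 2, and vice versa.
from_block1 = xor_blocks(coded, block1)
from_block2 = xor_blocks(coded, block2)
print(from_block1 == block2, from_block2 == block1)
```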
The Avalanche Magic
• To solve problems of existing P2P file distribution solutions, Avalanche uses special encoding algorithms
• Each encoded piece has the “DNA” of all pieces in the file. => A given encoded piece can be used by any peer in place of any piece
• Encoded pieces are created using linear equations that involve all pieces in the file
• Reconstructing the file requires collecting enough encoded pieces and solving the set of mathematical equations
Coding in general
• Assume file: $F = [x_1 \; x_2]$, where $x_i$ is a block.
• Define code $E_i(a_{i,1}, a_{i,2}) = a_{i,1} x_1 + a_{i,2} x_2$, where $a_{i,1}$, $a_{i,2}$ are numbers.
• "Infinite" number of $E_i$'s.
• Any two linearly independent $E_i(a_{i,1}, a_{i,2})$ can recover $[x_1 \; x_2]$.
  – Similar to solving a system of linear equations.
• Operations in finite fields [such as $GF(2^{16})$].
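The two-block code above can be sketched in a few lines. To keep the example short it works bytewise in $GF(2^8)$ with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ rather than the $GF(2^{16})$ mentioned in the slide. The field choice, the coefficient values, and all function names are assumptions made for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^k) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce back to 8 bits
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Inverse as a^254, since a^255 = 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def encode(coeffs, x1, x2):
    """E = a1*x1 + a2*x2, applied byte by byte."""
    a1, a2 = coeffs
    return bytes(gf_mul(a1, p) ^ gf_mul(a2, q) for p, q in zip(x1, x2))

def decode(c1, e1, c2, e2):
    """Solve the 2x2 system by Cramer's rule (subtraction is also XOR)."""
    (a11, a12), (a21, a22) = c1, c2
    det = gf_mul(a11, a22) ^ gf_mul(a12, a21)
    if det == 0:
        raise ValueError("coefficient vectors are linearly dependent")
    d = gf_inv(det)
    x1 = bytes(gf_mul(d, gf_mul(a22, p) ^ gf_mul(a12, q)) for p, q in zip(e1, e2))
    x2 = bytes(gf_mul(d, gf_mul(a21, p) ^ gf_mul(a11, q)) for p, q in zip(e1, e2))
    return x1, x2

x1, x2 = b"hello wo", b"rld 1234"      # the two blocks of the file
c1, c2 = (3, 7), (5, 2)                # two arbitrary coefficient pairs
e1, e2 = encode(c1, x1, x2), encode(c2, x1, x2)
recovered = decode(c1, e1, c2, e2)
print(recovered == (x1, x2))
```

Any other pair of coefficient vectors with a nonzero determinant would work equally well, which is the sense in which the number of usable $E_i$ is effectively unlimited.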
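Avalanche Coding
[Diagram: the Server holds the file as blocks B1, B2, ..., Bn. It sends Client A the encoded blocks E1 (coefficients α1, ..., αn) and E2 (coefficients β1, ..., βn). Client A sends Client B the encoded block E3, built from E1 and E2 with coefficients ω1 and ω2.]
• Content is encoded at the server
• Clients can produce new encoded packets out of partial files [Chou et al., ’03]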
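The re-encoding step can be sketched as follows: a client mixes the coded packets it already holds, along with their coefficient vectors, without decoding anything first. This is an illustrative sketch, not the Avalanche implementation. It assumes bytewise arithmetic in $GF(2^8)$, random nonzero mixing weights, and hypothetical names throughout.

```python
import random

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Inverse of a nonzero element as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def combine(packets, rng):
    """Random linear combination of (coefficient vector, payload) packets."""
    coeffs = [0] * len(packets[0][0])
    payload = [0] * len(packets[0][1])
    for vec, data in packets:
        w = rng.randrange(1, 256)            # random nonzero weight
        coeffs = [c ^ gf_mul(w, v) for c, v in zip(coeffs, vec)]
        payload = [p ^ gf_mul(w, d) for p, d in zip(payload, data)]
    return coeffs, payload

def decode(packets, n):
    """Gauss-Jordan elimination; returns the n blocks, or None if rank < n."""
    rows = [list(vec) + list(data) for vec, data in packets]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None                      # not enough independent packets
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, p) for v, p in zip(rows[r], rows[col])]
    return [bytes(rows[i][n:]) for i in range(n)]

rng = random.Random(0)
blocks = [b"AAAA", b"BBBB", b"CCCC"]         # B1, B2, B3
n = len(blocks)
# The server's view: each original block with a unit coefficient vector.
source = [([int(i == j) for j in range(n)], list(b)) for i, b in enumerate(blocks)]

e1, e2 = combine(source, rng), combine(source, rng)   # server -> Client A
e3 = combine([e1, e2], rng)                           # Client A -> Client B
held = [e3]
decoded = decode(held, n)
while decoded is None:                                # Client B keeps collecting
    held.append(combine(source, rng))                 # more packets from the server
    decoded = decode(held, n)
print(decoded == blocks, len(held))
```

Client B decodes as soon as the coefficient vectors it holds span all n dimensions, regardless of which peer produced each packet.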
Avalanche Robustness
[Figure: number of peers finished (0 to 500) versus time (100 to 300), for the series NC, FEC, and LR.]
If the server suddenly goes down (after serving the full file once), all Avalanche users are able to complete the download. Only 10% of BitTorrent-like users are able to complete.

Avalanche Download Time
[Figure: finish times (0 to 300) versus nodes sorted by order of arrival (0 to 400). Legend: "AvalancheNC" and "BitTorrentRandom". Annotation: BitTorrent peers not yet finished.]
=> Much lower and more predictable download times
No need for nodes to stay around…
[Figure: finish times versus nodes (sorted by order of arrival), in two panels: "Nodes stay forever" and "Nodes leave immediately".]
• With Avalanche, there is no need for nodes to stay after they finish the download to help other nodes (the performance remains unchanged)
Minimum Server Requirements
[Figure: server load (0 to 140) versus time (50 to 200), for the series NC, FEC, and Simple.]
Less than half the server requirements of BitTorrent-like systems
Decoding Performance
Avalanche trades more processing power at each node for better speeds and lower server load.

File Size (MB)    Blocks    Time
10                100       5 sec
50                100       37 sec
100               100       2 min 21 sec
200               100       3 min 38 sec
Note: Pentium III, 650MHz, 512MB RAM.
Decoding time is less than 4% of the total download
Summary
• Adding resources in an arbitrary fashion is not efficient or cost effective
• We are witnessing a new revolution
  – P2P can be used to provide hugely scalable, fast content distribution at very low cost