An Analysis of Internet Content Delivery Systems
S. Saroiu, K. P. Gummadi, R. J. Dunn, S. D. Gribble, and H. M. Levy
Department of Computer Science & Engineering, University of Washington
2008. 11. 27
Kyusik Kim
Wireless Networks Lab
Contents
Introduction
Overview of Content Delivery Systems
WWW, content delivery networks (CDNs), peer-to-peer systems (P2P)
Measurement Methodology
High-Level Data Characteristics
Detailed Content Delivery Characteristics
objects, clients, servers, scalability of P2P systems
The Potential Role of Caching in CDNs and P2P
Conclusion
Introduction (1)
Purpose
Examining content delivery traffic
HTTP web
Akamai content delivery network
p2p file sharing systems
Gnutella, Kazaa
Providing a detailed characterization and comparison of content delivery systems
analyzing a nine-day trace (incoming and outgoing Internet traffic at the University of Washington)
over 500 million transactions and over 20 terabytes of HTTP data
Introduction (2)
Results quantify
the extent to which p2p traffic has overwhelmed web traffic as a leading consumer of Internet bandwidth
the differences in the characteristics of objects being transferred
the impact of the two-way nature of p2p communication
the ways in which p2p systems are not scaling, despite their explicitly scalable design
Content Delivery Systems - WWW
The world-wide web (WWW)
Simple client-server architecture
Uses the HTTP protocol
Most web objects are small (5-10 KB)
The number of web objects is enormous and rapidly growing
Content Delivery Systems - CDNs
Content delivery networks (CDNs)
Dedicated collection of servers located strategically across the wide-area Internet
Content providers contract with commercial CDNs to host and distribute content
Content is replicated across the wide area
highly available
Clients can access topologically nearby replicas with low latency
DNS redirection causes overhead
CDN example - Akamai
Content Delivery Systems - P2P
Peer-to-peer systems (P2P)
Peers collaborate to form a distributed system
exchanging content
Peers behave as both servers and clients
Architecture types of P2P systems
logically centralized architecture
Napster
fully distributed architecture
Gnutella, Freenet
hybrid architecture
some peers are elected as supernodes
Kazaa
P2P example
(Figure panels: Distributed, Centralized, Hybrid)
Measurement Methodology
Passive network monitoring
collects traces of traffic between the University of Washington(UW) and the Internet
UW connects to its ISPs via two border routers
one for outbound traffic, the other for inbound traffic
both are fully connected to four switches
each switch has a monitoring port
sending copies of incoming and outgoing packets to the monitoring hosts
Traffic types
HTTP traffic
WWW, Akamai, Kazaa, Gnutella
non-HTTP TCP traffic
Kazaa and Gnutella search traffic
Classifying traffic types
Akamai: HTTP traffic on ports 80, 8080, 443
WWW: HTTP traffic on ports 80, 8080, 443
Gnutella: HTTP traffic on ports 6346, 6347
including file transfers
excluding search and control traffic
Kazaa: HTTP traffic on port 1214
including file transfers
excluding search and control traffic
P2P: Gnutella + Kazaa
Non-HTTP TCP traffic: any other TCP traffic
NNTP, SMTP, HTTP traffic to other ports
traffic from other P2P systems
control and search traffic on Gnutella and Kazaa
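The port-based rules on this slide can be sketched as a small classifier. This is only an illustration, not the authors' tooling: the Flow record and its fields are hypothetical, and the from_akamai flag is an assumption, since the slide lists the same ports for Akamai and WWW and does not say how the two are told apart.

```python
from dataclasses import dataclass

WEB_PORTS = {80, 8080, 443}
GNUTELLA_PORTS = {6346, 6347}
KAZAA_PORTS = {1214}


@dataclass(frozen=True)
class Flow:
    """Hypothetical per-connection record (not from the slides)."""
    port: int                  # TCP port of the connection
    is_http: bool              # True if the payload is HTTP
    from_akamai: bool = False  # assumption: server was identified as Akamai


def classify(flow: Flow) -> str:
    """Return the traffic category for one TCP flow."""
    if flow.is_http:
        if flow.port in WEB_PORTS:
            return "Akamai" if flow.from_akamai else "WWW"
        if flow.port in GNUTELLA_PORTS:
            return "Gnutella"
        if flow.port in KAZAA_PORTS:
            return "Kazaa"
    # NNTP, SMTP, HTTP on other ports, other P2P systems, and
    # Gnutella/Kazaa search and control traffic all end up here.
    return "non-HTTP TCP"


def is_p2p(category: str) -> bool:
    """P2P is defined on the slide as Gnutella + Kazaa."""
    return category in {"Gnutella", "Kazaa"}
```

For example, classify(Flow(port=1214, is_http=True)) falls into the Kazaa category, while the same port without HTTP payload is counted as non-HTTP TCP traffic.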
High-Level Data Characteristics
HTTP trace summary
Exporting 16.65 TB, importing 3.44 TB
UW is a net provider rather than consumer of HTTP data
P2P systems account for a large percentage of the bytes exported and the total bytes transferred
TCP Bandwidth
All systems show a typical diurnal cycle
Bandwidth consumption (share of TCP bytes)
Akamai - 0.2%
Gnutella - 6.04%
WWW - 14.3%
Kazaa - 36.99%
other TCP-based protocols - 43%
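Shares like these come from summing bytes per traffic category and dividing by the total. A minimal sketch, assuming a hypothetical input of (category, bytes) pairs rather than the actual trace format:

```python
from collections import defaultdict


def byte_shares(transfers):
    """Map each category to its percentage of all bytes.

    transfers: iterable of (category, byte_count) pairs
    (a hypothetical record layout chosen for illustration).
    """
    totals = defaultdict(int)
    for category, nbytes in transfers:
        totals[category] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {c: 100.0 * b / grand_total for c, b in totals.items()}
```

With made-up input such as [("WWW", 30), ("Kazaa", 50), ("WWW", 20)], each category receives a 50% share.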
UW Client and Server TCP Bandwidth
Figure (a) - Inbound Data BW (web and P2P downloads by UW clients)
WWW peaking in the middle of the day
Kazaa peaking late at night
Figure (b) - Outbound Data BW (web and P2P uploads from UW servers)
Peak Kazaa BW dominates WWW by a factor of 3
External Kazaa clients consume 7.6 times more BW than UW Kazaa clients
Content Types Downloaded by UW Clients
GIF & JPEG images
42% of requests, only 16.3% of the bytes transferred
AVI & MPEG
0.41% of requests, 29.3% of the bytes transferred
Comparison with measurements from a 1999 study
HTML traffic: -43%, GIF & JPG traffic: -59%
AVI & MPG traffic: +400%, MP3 traffic: +300%
Summary
The balance of HTTP traffic has changed dramatically over the last several years
P2P traffic has overtaken WWW traffic as the largest contributor to HTTP bytes transferred
Although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data
The mixture of object types downloaded by UW clients has changed
video and audio account for a substantially larger fraction of traffic than three years ago
Detailed Content Delivery Characteristics
Objects
Object size: P2P (median: 4 MB) > WWW (median: 2 KB) and Akamai
Top bandwidth-consuming objects
Gnutella
relatively large number of objects account for a large portion of the transferred bytes
Top 10 bandwidth-consuming objects
WWW - The top 10 objects are a mix of extremely small objects
Akamai - 8 out of the top 10 objects are larger and unpopular
Kazaa - Export objects are larger than import objects
Downloaded bytes by object type
Top UW bandwidth-consuming clients
Figure (a) - Top Bandwidth-Consuming UW Clients
WWW: the top 200 clients (0.5%) account for 13% of WWW traffic
Kazaa: the top 200 clients (4%) account for 50% of Kazaa traffic
Figure (b)
Kazaa: the top 200 clients (the worst offenders) account for 20% of the total HTTP bytes downloaded
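Figures of this kind are cumulative shares: rank clients by bytes transferred and ask what fraction the heaviest k account for. A minimal sketch with a hypothetical helper name and made-up per-client totals, not the trace data:

```python
def top_k_share(bytes_per_client, k):
    """Fraction of all bytes transferred by the k heaviest clients."""
    total = sum(bytes_per_client)
    if total == 0:
        return 0.0
    heaviest = sorted(bytes_per_client, reverse=True)[:k]
    return sum(heaviest) / total


# Made-up per-client byte counts, for illustration only.
example = [500, 300, 100, 50, 50]
share = top_k_share(example, 2)  # the two heaviest clients carry 800 of 1000 bytes
```

A steeply skewed population, as reported for Kazaa, shows up as a large share for a small k.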
Clients - Request rates over time
Figure (a) - WWW + Akamai Request Rates
inbound request rate peaks at 1100 requests per second
outbound request rate peaks under 200 requests per second
Figure (b) - Kazaa Request Rates
request rate: two orders of magnitude lower than the web
median object size: three orders of magnitude higher than the web
as a result, Kazaa consumes more bandwidth
Clients - Concurrent HTTP transactions
Despite the order-of-magnitude request-rate advantage of WWW over Kazaa
the number of simultaneous open Kazaa connections is about twice the number of simultaneous open WWW + Akamai connections
Tue 0:00
Kazaa generates only 23 requests per second
up to almost 1000 open requests at a time due to its long transfers
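The link between a low request rate and many open connections can be illustrated with Little's law (open requests = arrival rate x mean duration). This is a back-of-the-envelope sketch, not a calculation from the slides: it assumes steady state and treats the quoted figures as averages, which they may not be.

```python
def mean_open_requests(arrival_rate_per_s, mean_duration_s):
    """Little's law: L = lambda * W (steady-state assumption)."""
    return arrival_rate_per_s * mean_duration_s


def implied_mean_duration(open_requests, arrival_rate_per_s):
    """Rearranged Little's law: W = L / lambda."""
    return open_requests / arrival_rate_per_s


# Using the slide's figures as if they were steady-state averages:
# about 1000 open requests at 23 requests per second implies a mean
# request lifetime of roughly 43 seconds.
duration_s = implied_mean_duration(1000, 23)
```

Under the same assumption, a web workload with a much higher request rate but sub-second transfers keeps fewer requests open at once, which is consistent with the comparison on this slide.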
Top UW-internal servers to external clients
Figure (a) - Top Bandwidth-Consuming UW Servers
Gnutella: the first 10 servers account for all of the bytes
WWW: steep curve; several major servers provide documents to the web
Kazaa: the top 334 servers account for 80% of the bytes
Figure (b)
WWW: 20 servers account for 20% of all HTTP bytes output
Kazaa: 170 servers account for 50% of all HTTP bytes output
The UW-external servers to internal clients
Figure (a)
WWW: 938 external servers account for 50% of the bytes
Kazaa: 600 external servers account for 26% of the bytes
Figure (b)
Kazaa: the top 500 external Kazaa peers account for 10% of the bytes
WWW: the top 500 servers account for 22% of the bytes
Scalability of P2P Systems
Can P2P systems like Kazaa scale in environments such as UW?
Every peer in a P2P system consumes bandwidth in both directions
Each new P2P client added becomes a server for the entire P2P structure
Kazaa objects are huge, so a small number of peers can consume an enormous amount of total network bandwidth
The bandwidth cost of each P2P peer is 90 times that of a web client
It seems questionable whether any organization can support a service with these characteristics
Summary
Peer-to-peer now accounts for over three quarters of HTTP traffic
A small number of P2P users are consuming a disproportionately high fraction of bandwidth
While the P2P request rate is quite low, the transfers last a long time
While the design of P2P overlay structures focuses on spreading the workload for scalability, our measurements show that a small number of servers are taking the majority of the burden
The Potential Role of Caching in CDNs
Akamai requests achieve an 88% ideal hit rate and a 50% practical hit rate, noticeably higher than WWW requests (77% and 36%)
Our analysis shows that Akamai requests are more skewed towards the most popular documents than are WWW requests
We know that most bytes fetched from Akamai are from images and videos
This implies that much of Akamai's content is in fact static and could be cached
We would expect that widely deployed proxy caches would significantly reduce the need for a separate content delivery network
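The difference between an ideal and a practical hit rate can be sketched with an infinite-cache replay of a request trace. The definitions used here are assumptions made for illustration: "ideal" counts every repeat request as a hit, while "practical" counts a repeat as a hit only if the object is marked cacheable. The tuple layout and the cacheable flag are hypothetical, not the study's trace format.

```python
def hit_rates(requests):
    """Replay a trace against an infinite cache.

    requests: iterable of (object_id, cacheable) pairs in trace order
    (a hypothetical layout). Returns (ideal, practical) request hit rates.
    """
    seen = set()
    ideal_hits = practical_hits = total = 0
    for object_id, cacheable in requests:
        total += 1
        if object_id in seen:
            ideal_hits += 1          # ideal: every repeat request hits
            if cacheable:
                practical_hits += 1  # practical: only cacheable repeats hit
        seen.add(object_id)
    if total == 0:
        return 0.0, 0.0
    return ideal_hits / total, practical_hits / total


# Made-up trace: "a" is cacheable, "b" is not.
trace = [("a", True), ("b", False), ("a", True), ("b", False), ("c", True)]
ideal, practical = hit_rates(trace)  # 2 of 5 and 1 of 5 requests hit
```

A workload skewed toward a few popular, static objects pushes both rates up, which matches the slide's argument about Akamai content.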
The Potential Role of Caching in P2P
The potential impact of caching in P2P systems may exceed the benefits seen in the web
Inbound cache byte hit rate = 35%, outbound cache byte hit rate = 85%
Hit rate increases with client population size for outbound traffic (1,000 clients: 40%; 500,000 clients: 85%)
Reverse P2P cache saves the most bandwidth
Conclusion
P2P traffic now accounts for the majority of HTTP bytes transferred
P2P documents are three orders of magnitude larger than web objects, leading to a 1000-fold increase in transfer time
A small number of extremely large objects account for an enormous fraction of observed P2P traffic
A small number of clients and servers are responsible for the majority of the traffic we saw in the P2P systems
Each P2P client creates a significant bandwidth load in both directions
Q & A
Thank you!!!!!