
15-441 / 15-641 Computer Networking
Lecture 16-17: Delivering Content, Part 1
Peter Steenkiste
Fall 2013
www.cs.cmu.edu/~prs/15-441-F13

Overview

• Web
  • Protocol interactions
  • Caching
  • Cookies
• Consistent hashing
• Peer-to-peer
• CDN
• Video


Web history

• 1945: Vannevar Bush, "As we may think", Atlantic Monthly, July, 1945.
  • Describes the idea of a distributed hypertext system.
  • A "memex" that mimics the "web of trails" in our minds.
• 1989: Tim Berners-Lee (CERN) writes internal proposal to develop a distributed hypertext system.
  • Connects "a web of notes with links".
  • Intended to help CERN physicists manage information in large projects.
• 1990: Tim BL writes graphical browser for NeXT machines.

Web history (cont)

• 1992
  • NCSA server released
  • 26 WWW servers worldwide
• 1993
  • Marc Andreessen releases first version of NCSA Mosaic
  • Mosaic version released for Windows, Mac, Unix
  • Web (port 80) traffic at 1% of NSFNET backbone traffic
  • Over 200 WWW servers worldwide
• 1994
  • Andreessen and colleagues leave NCSA to form "Mosaic Communications Corp" (Netscape)


Typical Workload (Web Pages)

• Multiple (typically small) objects per page
• File sizes
  • Heavy-tailed
  • Pareto distribution for tail
  • Lognormal for body of distribution
• Embedded references
  • Number of embedded objects also Pareto: Pr(X > x) = (x/x_m)^(-k)
• This plays havoc with performance. Why?

HTTP 0.9/1.0

• One request/response per TCP connection
  • Simple to implement
• Disadvantages
  • Multiple connection setups → three-way handshake each time
  • Several extra round trips added to transfer
  • Multiple slow starts
• Lots of small objects & TCP
  • 3-way handshake
  • Lots of slow starts
  • Extra connection state
• Solutions?


Must Consider Network Layer - Example: TCP

• 1: TCP connections need to be set up
  • "Three Way Handshake":
    • SYN (Synchronize)
    • SYN/ACK (Synchronize + Acknowledgement)
    • ACK
• 2: TCP transfers start slowly and then ramp up the bandwidth used – bandwidth is variable

Single Transfer Example

Client/server timeline for fetching an HTML page plus one embedded image over non-persistent connections:
• 0 RTT: Client opens TCP connection (SYN, SYN/ACK)
• 1 RTT: Client sends HTTP request for HTML; server reads from disk and starts sending data
• 2 RTT: Client parses HTML, closes the connection (FIN, ACK), and opens a new TCP connection for the image
• 3 RTT: Client sends HTTP request for image; server reads from disk
• 4 RTT: Image begins to arrive


Performance Issues

• Short transfers are hard on TCP
  • Stuck in slow start
  • Loss recovery is poor when windows are small
  • TCP loss recovery ends up being timeout dominated because windows are small
• Lots of extra connections
  • Increases server state/processing
• Servers also hang on to connection state after the connection is closed
  • Why must server keep these?
  • Tends to be an order of magnitude greater than # of active connections, why?

Netscape Solution

• Mosaic (original popular Web browser) fetched one object at a time!
• Netscape uses multiple concurrent connections to improve response time
  • Different parts of Web page arrive independently
  • Can grab more of the network bandwidth than other users
  • Does not necessarily improve response time


Persistent Connection Solution

• Multiplex multiple transfers onto one TCP connection
• Also allow pipelined transfers
• How to identify requests/responses?
  • Delimiter → server must examine response for delimiter string
  • Content-length and delimiter → must know size of transfer in advance
  • Block-based transmission → send in multiple length-delimited blocks
  • Store-and-forward → wait for entire response and then use content-length
  • Solution → use existing methods and close connection otherwise
• Resulting timeline:
  • 0 RTT: Client sends HTTP request for HTML; server reads from disk and replies
  • 1 RTT: Client parses HTML and sends HTTP request for image over the same connection; server reads from disk
  • 2 RTT: Image begins to arrive
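
To make the benefit concrete, here is a minimal sketch (not from the slides) using Python's standard http.client module, which keeps one TCP connection open across sequential requests; the host and paths are placeholders and error handling is omitted.

# Minimal sketch: HTTP/1.1 persistent connection with Python's http.client.
# Both requests reuse the same TCP connection (one handshake, one slow start).
import http.client

conn = http.client.HTTPConnection("example.org", 80, timeout=10)

# First request: fetch the HTML page.
conn.request("GET", "/index.html", headers={"Connection": "keep-alive"})
resp = conn.getresponse()
html = resp.read()          # must drain the body before reusing the connection
print(resp.status, len(html), "bytes of HTML")

# Second request: fetch an embedded object over the SAME TCP connection.
conn.request("GET", "/logo.png", headers={"Connection": "keep-alive"})
resp = conn.getresponse()
img = resp.read()
print(resp.status, len(img), "bytes of image")

conn.close()

Pipelining (sending the second request before the first response returns) is also allowed by HTTP/1.1, but is not expressed through this simple API.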


Other Problems

• Serialized transmission
  • Much of the useful information in first few bytes
  • May be better to get the 1st 1/4 of all images than one complete image (e.g., progressive JPEG)
  • Can "packetize" transfer over TCP, e.g., range requests
• Application-specific solution to transport protocol problems. :(
  • Could fix TCP so it works well with multiple simultaneous connections
  • More difficult to deploy
• Persistent connections can significantly increase state on the server
  • Allow server to close HTTP session
  • Client can no longer send requests

Web Proxy Caches

• User configures browser: Web accesses via cache
• Browser sends all HTTP requests to cache
  • Object in cache: cache returns object
  • Else cache requests object from origin server, then returns object to client
• (Figure: clients send requests to a proxy server, which fetches misses from origin servers.)


No Caching Example (1)

Assumptions
• Average object size = 100,000 bits
• Avg. request rate from institution's browsers to origin servers = 15/sec
• Delay from institutional router to any origin server and back to router = 2 sec
• Institutional network: 10 Mbps LAN; access link to the public Internet: 1.5 Mbps

Consequences
• Utilization on LAN = 15%
• Utilization on access link = 100%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds

No Caching Example (2)

Possible solution
• Increase bandwidth of access link to, say, 10 Mbps
• Often a costly upgrade

Consequences
• Utilization on LAN = 15%
• Utilization on access link = 15%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs


With Caching Example (3)

Install cache
• Suppose hit rate is .4

Consequences
• 40% of requests will be satisfied almost immediately (say 10 msec)
• 60% of requests satisfied by origin server
• Utilization of access link reduced to 60%, resulting in negligible delays
• Weighted average of delays = .6*2 sec + .4*10 msec < 1.3 secs (see the worked numbers in the sketch below)

HTTP Caching

• Clients often cache documents
  • Challenge: update of documents
  • If-Modified-Since requests to check
    • HTTP 0.9/1.0 used just date
    • HTTP 1.1 has an opaque "entity tag" (could be a file signature, etc.) as well
• When/how often should the original be checked for changes?
  • Check every time?
  • Check each session? Day? Etc?
  • Use Expires header
    • If no Expires, often use Last-Modified as estimate
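
As a quick check of the three examples above, the arithmetic can be written out directly. Python is used here only as a calculator; the 100,000-bit objects, 15 req/sec, 2 sec Internet delay, 0.4 hit rate, and 10 ms hit latency are the slides' assumptions.

# Worked numbers for the caching example (values taken from the slides).
request_rate = 15            # requests/sec from the institution
object_size  = 100_000       # bits per object
internet_rtt = 2.0           # sec, institutional router <-> origin server

traffic = request_rate * object_size            # bits/sec offered to the access link

for label, link_bps in [("1.5 Mbps access link", 1.5e6),
                        ("10 Mbps access link", 10e6)]:
    print(label, "utilization =", traffic / link_bps)
# -> 1.0 (100%, queueing delay explodes) and 0.15 (15%)

# With a cache and hit rate 0.4, only misses cross the access link:
hit_rate = 0.4
miss_traffic = (1 - hit_rate) * traffic
print("access link utilization with cache =", miss_traffic / 1.5e6)   # 0.6
avg_delay = hit_rate * 0.010 + (1 - hit_rate) * internet_rtt
print("weighted average delay =", avg_delay, "sec")                   # about 1.2 s (< 1.3 s)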


Example Cache Check Request

GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
If-None-Match: "7a11f-10ed-3a75ae4a"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: www.intel-iris.net
Connection: Keep-Alive

Example Cache Check Response

HTTP/1.1 304 Not Modified
Date: Tue, 27 Mar 2001 03:50:51 GMT
Server: Apache/1.3.14 (Unix) (Red-Hat/) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
Connection: Keep-Alive
Keep-Alive: timeout=15, max=100
ETag: "7a11f-10ed-3a75ae4a"
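
A conditional GET like the one above can be issued from Python's standard http.client; the host, date, and ETag below are placeholders standing in for values saved from an earlier response.

# Minimal sketch of a conditional GET (If-Modified-Since / If-None-Match).
import http.client

conn = http.client.HTTPConnection("www.example.org", 80, timeout=10)
conn.request("GET", "/", headers={
    "If-Modified-Since": "Mon, 29 Jan 2001 17:54:18 GMT",
    "If-None-Match": '"7a11f-10ed-3a75ae4a"',
})
resp = conn.getresponse()

if resp.status == 304:
    # Not Modified: serve the cached copy; no body is sent.
    print("cached copy still valid")
else:
    body = resp.read()          # 200 OK: update the cache with the new body
    print("fetched", len(body), "bytes; new ETag:", resp.getheader("ETag"))
conn.close()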


Problems

• Well over 50% of all HTTP objects are not cacheable
  • Why?
  • Major exception?
• This problem will not go away
  • Dynamic data → stock prices, scores, web cams
  • CGI scripts → results based on passed parameters
• Other less obvious examples
  • SSL → encrypted data is not cacheable
  • Most web clients don't handle mixed pages well → many generic objects transferred with SSL
  • Cookies → results may be based on past data
  • Hit metering → owner wants to measure # of hits for revenue, etc.
• What will be the end result?

Caching Proxies – Sources for Misses

• Capacity
  • How large a cache is necessary or equivalent to infinite
  • On disk vs. in memory → typically on disk
• Compulsory
  • First time access to document
  • Non-cacheable documents
    • CGI-scripts
    • Personalized documents (cookies, etc.)
    • Encrypted data (SSL)
• Consistency
  • Document has been updated/expired before reuse
• Conflict
  • No such misses


Cookies: Keeping "State"

• Many major Web sites use cookies
• Four components:
  1) Cookie header line in the HTTP response message
  2) Cookie header line in HTTP request message
  3) Cookie file kept on user's host and managed by user's browser
  4) Back-end database at Web site
• Example:
  • Susan accesses the Internet always from the same PC
  • She visits a specific e-commerce site for the first time
  • When the initial HTTP request arrives at the site, the site creates a unique ID and creates an entry in a backend database for that ID

Example exchange (client and Amazon server):
• Client's cookie file initially holds: ebay: 8734
• Usual HTTP request msg → server creates ID 1678 for user; response carries Set-cookie: 1678
• Cookie file now holds: amazon: 1678, ebay: 8734
• Later requests carry cookie: 1678 → server takes cookie-specific action
• One week later: request still carries cookie: 1678 → cookie-specific action


Overview

• Web
• Consistent hashing
• Peer-to-peer
• CDN
• Video

Distributing Load across Servers

• Given document XYZ, we need to choose a server to use
• Suppose we use simple hashing: modulo of a hash of the name of the document
  • Number servers from 1…n
  • Place document XYZ on server (XYZ mod n)
• What happens when a server fails? n → n-1
  • Same if different people have different measures of n
• Why might this be bad?
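
A few lines of Python make the problem concrete: with plain "hash mod n" placement, losing one of n servers remaps almost every document, not just the failed server's share. The document names here are synthetic.

# Quick illustration of why "hash mod n" is fragile: count how many documents
# change servers when one of n servers fails (n -> n-1).
import hashlib

def server_for(doc: str, n: int) -> int:
    h = int(hashlib.sha1(doc.encode()).hexdigest(), 16)
    return h % n

docs = [f"doc-{i}" for i in range(10_000)]
n = 10
moved = sum(server_for(d, n) != server_for(d, n - 1) for d in docs)
print(f"{moved / len(docs):.0%} of documents moved")   # roughly 90%, not 1/n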


Consistent Hash: Goals

• "View" = subset of all hash buckets that are visible
  • Correspond to a real server
• Desired features
  • Smoothness – little impact on hash bucket contents when buckets are added/removed
  • Spread – small set of hash buckets that may hold an object regardless of views
  • Load – all hash buckets have a similar number of objects assigned to them

Consistent Hash – Example

• Construction
  • Assign each of C hash buckets to random points on a mod 2^n circle, where hash key size = n
  • Map object to random position on unit interval (the circle)
  • Hash of object = closest bucket
  • (Figure: circle with buckets at points 0, 4, 8, 12; an object hashed to 14 goes to the closest bucket.)
• Monotone → addition of a bucket does not cause movement between existing buckets
• Spread & Load → small set of buckets that lie near the object
• Balance → no bucket is responsible for a large number of objects
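
A compact sketch of this construction in Python: each bucket (server) is hashed to points on the circle, and an object is assigned to the first bucket at or after its own point, wrapping around. The multiple points per bucket ("replicas") are a common refinement for balance, not something the slide requires.

# A minimal consistent-hash ring.
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, buckets, replicas=100):
        self.ring = []                       # sorted list of (point, bucket)
        for b in buckets:
            for r in range(replicas):
                self.ring.append((self._h(f"{b}#{r}"), b))
        self.ring.sort()

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        point = self._h(key)
        i = bisect.bisect(self.ring, (point,)) % len(self.ring)  # wrap around
        return self.ring[i][1]

ring = ConsistentHash(["server-A", "server-B", "server-C"])
print(ring.lookup("doc-42"))

Removing one server and rebuilding the ring only reassigns the keys that previously mapped to that server's points, which is exactly the smoothness property listed above.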

Consistent Hashing

• Main idea:
  • Map both keys and nodes to the same (metric) identifier space
  • Find a "rule" for how to assign keys to nodes
  • A ring is one option
• The consistent hash function assigns each node and key an m-bit identifier, using SHA-1 as a base hash function
  • Node identifier: SHA-1 hash of IP address
  • Key identifier: SHA-1 hash of key


Identifiers

• m-bit identifier space for both keys and nodes
• Key identifier: SHA-1(key)
  • e.g., Key = "LetItBe" → SHA-1 → ID = 60
• Node identifier: SHA-1(IP address)
  • e.g., IP = "198.10.10.1" → SHA-1 → ID = 123
• How to map key IDs to node IDs?

Consistent Hashing Example

• Rule: a key is stored at its successor: the node with the next higher or equal ID
• (Figure: circular 7-bit ID space with nodes N32, N90, N123 and keys K5, K20, K60, K101; K5 and K20 are stored at N32, K60 at N90, K101 at N123.)


Consistent Hashing Properties

• Load balance: all nodes receive roughly the same number of keys
• For N nodes and K keys, with high probability
  • Each node holds at most (1+ε)K/N keys
  • Provided that K is large compared to N
• When a server is added, it receives its initial work load from "neighbors" on the ring
  • No other servers are affected
• Similar property when a server is removed

Finer Grain Load Balance

• Redirector knows all server IDs
  • It can also track approximate load (or delay)
• To balance load:
  • W_i = Hash(URL, IP of s_i) for all i
  • Sort W_i from high to low
  • Find first server with low enough load
• Benefits?
• How should "load" be measured?
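
A sketch of this redirector logic (often called rendezvous or highest-random-weight hashing); the server IPs, load numbers, and the 0.8 threshold are made-up inputs.

# Compute W_i = Hash(URL, ip_i) for every server, sort high to low, and take
# the first server whose load is acceptable.
import hashlib

def weight(url: str, server_ip: str) -> int:
    return int(hashlib.sha1(f"{url}|{server_ip}".encode()).hexdigest(), 16)

def pick_server(url, servers, load, max_load=0.8):
    # servers: list of IPs; load: dict ip -> current load in [0, 1]
    ranked = sorted(servers, key=lambda ip: weight(url, ip), reverse=True)
    for ip in ranked:
        if load[ip] <= max_load:        # first server that is not overloaded
            return ip
    return ranked[0]                    # everyone is busy: fall back to the top choice

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
load = {"10.0.0.1": 0.95, "10.0.0.2": 0.30, "10.0.0.3": 0.50}
print(pick_server("http://example.org/video.mp4", servers, load))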


Consistent Hashing Used in Many Contexts

• Distribute load across servers in a data center
  • The redirector sits in the data center
• Finding the storage cluster for an object in a CDN uses centralized knowledge
  • Why?
  • Can use consistent hashing in the cluster
• Consistent hashing can also be used in a distributed setting
  • P2P systems can use it to find files

Overview

• Web
• Consistent hashing
• Peer-to-peer
  • Motivation
  • Architectures
  • Skype
• CDN
• Video


Scaling Problem

• Millions of clients → server and network meltdown

P2P System

• Leverage the resources of client machines (peers)
  • Computation, storage, bandwidth


Why p2p?

• Harness lots of spare capacity
  • 1 Big Fast Server: 1 Gbit/s, $10k/month++
  • 2,000 cable modems: 1 Gbit/s, $ ??
  • 1M end-hosts: Uh, wow.
  • Capacity grows with the number of users!
• Build self-managing systems / Deal with huge scale
  • Same techniques attractive for both companies / servers / p2p
  • E.g., Akamai's 14,000 nodes
  • Google's 100,000+ nodes

Outline

• p2p file sharing techniques
  • Downloading: Whole-file vs. chunks
  • Searching
    • Centralized index (Napster, etc.)
    • Flooding (Gnutella, etc.)
    • Smarter flooding (KaZaA, …)
    • Routing (Freenet, etc.)
• Uses of p2p - what works well, what doesn't?
  • Servers vs. arbitrary nodes
  • Hard state (backups!) vs. soft-state (caches)
• Challenges
  • Fairness, freeloading, security, …


Framework

• Common primitives (see the interface sketch below):
  • Join: how do I begin participating?
  • Publish: how do I advertise my file?
  • Search: how do I find a file?
  • Fetch: how do I retrieve a file?

Searching

• (Figure: nodes N1…N6 connected across the Internet; a publisher stores Key="title", Value=MP3 data, and a client issues Lookup("title") - how does the lookup reach the publisher?)
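
The four primitives above can be written down as a minimal interface; every system that follows (Napster, Gnutella, KaZaA, BitTorrent, DHTs, Freenet) is essentially a different implementation of it. This is only an illustrative sketch.

# The common p2p primitives as an abstract Python interface.
from abc import ABC, abstractmethod

class P2PNode(ABC):
    @abstractmethod
    def join(self, bootstrap_addr: str) -> None:
        """Begin participating: contact some existing node(s)."""

    @abstractmethod
    def publish(self, filename: str, data: bytes) -> None:
        """Advertise (or place) a file so others can find it."""

    @abstractmethod
    def search(self, filename: str) -> list[str]:
        """Return addresses of peers that store the file."""

    @abstractmethod
    def fetch(self, filename: str, peer_addr: str) -> bytes:
        """Retrieve the file from a peer."""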


Searching – How?

• Needles vs. Haystacks
  • Searching for top 40, or an obscure punk track from 1981 that nobody ever heard of?
• Search expressiveness
  • Whole word? Regular expressions? File names? Attributes? Whole-text search?
  • E.g., p2p gnutella or p2p google?

What's out there?

• Whole-file: Napster (central), Gnutella (flood), Freenet (route)
• Chunk based: KaZaA (super-node flood; bytes, not chunks), eDonkey 2000, DHTs


The Solution Space

• Centralized Database
  • Napster
• Query Flooding
  • Gnutella
• Intelligent Query Flooding
  • KaZaA
• Swarming
  • BitTorrent
• Structured Overlay Routing
  • Distributed Hash Tables
• Unstructured Overlay Routing
  • Freenet

Napster: History

• 1999: Shawn Fanning launches Napster
• Peaked at 1.5 million simultaneous users
• Jul 2001: Napster shuts down


Napster: Overview

• Based on a centralized database:
  • Join: on startup, client contacts central server
  • Publish: reports list of files to central server
  • Search: query the server => return someone that stores the requested file
  • Fetch: get the file directly from peer

Napster: Publish

• (Figure: the peer at 123.2.21.23 tells the central server "I have X, Y, and Z!"; the server records insert(X, 123.2.21.23), ….)


Napster: Search

• (Figure: a client asks the central server "Where is file A?"; the reply search(A) --> 123.2.0.18 points at a peer, and the client fetches the file directly from 123.2.0.18.)

Napster: Discussion

• Pros:
  • Simple
  • Search scope is O(1)
  • Controllable (pro or con?)
• Cons:
  • Server maintains O(N) state
  • Server does all processing
  • Single point of failure
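
A toy version of the centralized-index idea (not Napster's actual protocol): peers publish their file lists, the server answers searches in O(1), and the client then fetches directly from a peer.

class CentralIndex:
    def __init__(self):
        self.index = {}                      # filename -> set of peer addresses

    def publish(self, peer_addr, filenames):
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, filename):
        # O(1) lookup for the client; the server holds O(N) state.
        return sorted(self.index.get(filename, set()))

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
server.publish("123.2.0.18", ["A"])
print(server.search("A"))    # ['123.2.0.18'] -> client fetches directly from that peer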


Next Topic...

• Centralized Database
  • Napster
• Query Flooding
  • Gnutella
• Intelligent Query Flooding
  • KaZaA
• Swarming
  • BitTorrent
• Structured Overlay Routing
  • Distributed Hash Tables
• Unstructured Overlay Routing
  • Freenet

Gnutella: History

• In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
• Soon many other clients: Bearshare, LimeWire, etc.
• In 2001, many protocol enhancements including "ultrapeers"


Gnutella: Overview

• Query Flooding:
  • Join: on startup, client contacts a few other nodes; these become its "neighbors"
  • Publish: no need
  • Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender
    • TTL limits propagation
  • Fetch: get the file directly from peer

Gnutella: Search

• (Figure: the query "Where is file A?" is flooded hop by hop through the neighbor graph; nodes that have file A send a Reply back toward the requester.)
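
A toy illustration of TTL-limited flooding over a neighbor graph; the graph and file placement below are made up, and the visited set stands in for the duplicate suppression a real client performs.

def flood_search(graph, files, start, filename, ttl=3):
    """graph: node -> list of neighbors; files: node -> set of filenames."""
    hits, visited = [], {start}
    frontier = [(start, ttl)]
    while frontier:
        node, t = frontier.pop(0)
        if filename in files.get(node, set()):
            hits.append(node)                    # "reply" routed back to the requester
        if t == 0:
            continue                             # TTL exhausted: stop propagating
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, t - 1))
    return hits

graph = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n5"], "n4": ["n6"], "n5": [], "n6": []}
files = {"n6": {"A"}}
print(flood_search(graph, files, "n1", "A", ttl=3))   # ['n6'] reachable within 3 hops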


Gnutella: Discussion

• Pros:
  • Fully de-centralized
  • Search cost distributed
  • Processing @ each node permits powerful search semantics
• Cons:
  • Search scope is O(N)
  • Search time is O(???)
  • Nodes leave often, network unstable
• TTL-limited search works well for haystacks
  • For scalability, does NOT search every node
  • May have to re-issue query later

KaZaA: History

• In 2001, KaZaA created by Dutch company Kazaa BV
• Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.
• Eventually protocol changed so other clients could no longer talk to it
• Very popular network today with >10 million users (number varies)


KaZaA: Overview

• "Smart" Query Flooding:
  • Join: on startup, client contacts a "supernode" ... may at some point become one itself
  • Publish: send list of files to supernode
  • Search: send query to supernode; supernodes flood query amongst themselves
  • Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers

KaZaA: Network Design

• (Figure: ordinary nodes attach to "Super Nodes", which form the flooding overlay among themselves.)


KaZaA: File Insert

• (Figure: the peer at 123.2.21.23 sends "I have X!" to its supernode, which records insert(X, 123.2.21.23).)

KaZaA: File Search

• (Figure: a client asks its supernode "Where is file A?"; the query is flooded among supernodes, and replies such as search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50 come back.)


KaZaA: Fetching

• More than one node may have the requested file...
• How to tell?
  • Must be able to distinguish identical files
  • Not necessarily same filename
  • Same filename not necessarily same file...
  • Use hash of file
    • KaZaA uses UUHash: fast, but not secure
    • Alternatives: MD5, SHA-1
• How to fetch?
  • Get bytes [0..1000] from A, [1001...2000] from B
  • Alternative: Erasure Codes

KaZaA: Discussion

• Pros:
  • Tries to take into account node heterogeneity:
    • Bandwidth
    • Host computational resources
    • Host availability (?)
  • Rumored to take into account network locality
• Cons:
  • Mechanisms easy to circumvent
  • Still no real guarantees on search scope or search time
• Similar behavior to Gnutella, but better
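
A sketch of the fetch-and-verify idea described above: pull different byte ranges from different peers, reassemble, and check a whole-file hash (SHA-1 here, one of the alternatives named above; KaZaA's own UUHash works differently). fetch_range is a hypothetical helper standing in for a range request to a peer.

import hashlib

def fetch_range(peer, filename, start, end) -> bytes:
    """Placeholder: ask `peer` for bytes [start, end) of `filename`."""
    raise NotImplementedError

def parallel_fetch(peers, filename, size, expected_sha1, chunk=1000):
    parts = []
    for i, start in enumerate(range(0, size, chunk)):
        peer = peers[i % len(peers)]                 # round-robin chunks over peers
        parts.append(fetch_range(peer, filename, start, min(start + chunk, size)))
    data = b"".join(parts)
    if hashlib.sha1(data).hexdigest() != expected_sha1:
        raise ValueError("hash mismatch: same filename is not necessarily same file")
    return data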


Stability and Superpeers

• Why superpeers?
  • Query consolidation
    • Many connected nodes may have only a few files
    • Propagating a query to a sub-node would take more b/w than answering it yourself
  • Caching effect
    • Requires network stability
• Superpeer selection is time-based
  • How long you have been on is a good predictor of how long you will be around

BitTorrent: History

• In 2002, B. Cohen debuted BitTorrent
• Key motivation:
  • Popularity exhibits temporal locality (flash crowds)
  • E.g., Slashdot effect, CNN on 9/11, new movie/game release
• Focused on efficient fetching, not searching:
  • Distribute the same file to all peers
  • Single publisher, multiple downloaders
• Has some "real" publishers:
  • Blizzard Entertainment using it to distribute the beta of their new game


BitTorrent: Overview

• Swarming:
  • Join: contact centralized "tracker" server, get a list of peers
  • Publish: run a tracker server
  • Search: out-of-band, e.g., use Google to find a tracker for the file you want
  • Fetch: download chunks of the file from your peers; upload chunks you have to them
• Big differences from Napster:
  • Chunk-based downloading (sound familiar? :)
  • "Few large files" focus
  • Anti-freeloading mechanisms

BitTorrent: Publish/Join

• (Figure: peers register with the tracker and learn about each other from it.)


BitTorrent: Fetch

• (Figure: peers exchange chunks of the file directly with one another.)

BitTorrent: Sharing Strategy

• Employ "tit-for-tat" sharing strategy
  • A is downloading from some other people
  • A will let the fastest N of those download from him
  • Be optimistic: occasionally let freeloaders download
    • Otherwise no one would ever start!
    • Also allows you to discover better peers to download from when they reciprocate
• Goal: Pareto efficiency
  • Game theory: "No change can make anyone better off without making others worse off"
  • Does it work? (don't know!)
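
The tit-for-tat rule can be sketched in a few lines: periodically unchoke the N peers that have recently uploaded to us fastest, plus one random optimistic unchoke so newcomers with nothing to trade can get started. This is an illustration of the idea, not BitTorrent's exact choking algorithm.

import random

def choose_unchoked(download_rate, n=4):
    """download_rate: peer -> bytes/sec we have recently received from that peer."""
    by_rate = sorted(download_rate, key=download_rate.get, reverse=True)
    unchoked = set(by_rate[:n])                      # reciprocate with the fastest N
    others = [p for p in download_rate if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))          # optimistic unchoke
    return unchoked

rates = {"peerA": 120_000, "peerB": 80_000, "peerC": 0, "peerD": 50_000, "peerE": 10_000}
print(choose_unchoked(rates, n=2))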


BitTorrent: Summary

• Pros:
  • Works reasonably well in practice
  • Gives peers incentive to share resources; avoids freeloaders
• Cons:
  • Pareto efficiency is a relatively weak condition
  • Central tracker server needed to bootstrap swarm
  • (Tracker is a design choice, not a requirement, as you know from your projects. Could easily combine with other approaches.)

Next Topic...

• Centralized Database
  • Napster
• Query Flooding
  • Gnutella
• Intelligent Query Flooding
  • KaZaA
• Swarming
  • BitTorrent
• Structured Overlay Routing
  • Distributed Hash Tables (DHT)
• Unstructured Overlay Routing
  • Freenet


Distributed Hash Tables

• Academic answer to p2p
• Goals
  • Guaranteed lookup success
  • Provable bounds on search time
  • Provable scalability
• Makes some things harder
  • Fuzzy queries / full-text search / etc.
• Read-write, not read-only
• Hot topic in networking since introduction in the 2000-2005 time frame

DHT: Overview

• Abstraction: a distributed "hash-table" (DHT) data structure:
  • put(id, item);
  • item = get(id);
• Implementation: nodes in system form a distributed data structure
  • Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
• Distributed version of the consistent hashing concept


DHT Operations

• Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
• Publish: route publication for file id toward a close node id along the data structure
• Search: route a query for file id toward a close node id; the data structure guarantees that the query will meet the publication
• Fetch: two options:
  • Publication contains actual file => fetch from where query stops
  • Publication says "I have file X" => query tells you 128.2.1.3 has X, use IP routing to get X from 128.2.1.3

DHT: Example - Chord

• Associate to each node and file a unique id in a uni-dimensional space (a ring)
  • E.g., pick from the range [0...2^m]
  • Usually the hash of the file or IP address
• Properties:
  • Routing table size is O(log N), where N is the total number of nodes
  • Guarantees that a file is found in O(log N) hops


DHT: Consistent Hashing

• A key is stored at its successor: the node with the next higher ID
• (Figure: circular ID space with nodes N10, N32, N60, N90, N105, N120; e.g., Key 5 (K5) is stored at N10, K20 at N32, K80 at N90.)

DHT: Chord Basic Lookup

• (Figure: a node asks "Where is key 80?"; the query is forwarded around the ring node by node until the answer "N90 has K80" comes back.)
• How do you speed up lookup?


DHT: Chord "Finger Table"

• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the i-th finger points 1/2^(n-i) of the way around the ring
• (Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring.)

DHT: Chord Join

• Assume an identifier space [0..8]
• Node n1 joins
  • n1's succ. table: (i=0, id+2^i=2 → n1), (i=1, 3 → n1), (i=2, 5 → n1)


DHT: Chord Join

• Node n2 joins
  • n1's succ. table: (i=0, 2 → n2), (i=1, 3 → n1), (i=2, 5 → n1)
  • n2's succ. table: (i=0, 3 → n1), (i=1, 4 → n1), (i=2, 6 → n1)
• Nodes n0, n6 join
  • n0's succ. table: (i=0, 1 → n1), (i=1, 2 → n2), (i=2, 4 → n6)
  • n1's succ. table: (i=0, 2 → n2), (i=1, 3 → n6), (i=2, 5 → n6)
  • n2's succ. table: (i=0, 3 → n6), (i=1, 4 → n6), (i=2, 6 → n6)
  • n6's succ. table: (i=0, 7 → n0), (i=1, 0 → n0), (i=2, 2 → n2)


DHT: Chord Join

• Nodes: n1, n2, n0, n6
• Items: f7, f2
  • Each item is stored at the successor of its id: f7 at n0, f2 at n2
• (Figure: the ring with each node's succ. table and the items it stores; a query(7) is shown being issued.)

DHT: Chord Routing

• Upon receiving a query for item id, a node:
  • Checks whether it stores the item locally
  • If not, forwards the query to the largest node in its successor table that does not exceed id
• (Figure: query(7) routed around the ring to the node that stores f7.)
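
The routing rule can be simulated on the 3-bit example above (nodes n0, n1, n2, n6): finger i of node n points at the successor of n + 2^i, and a query is forwarded to the largest finger that does not overshoot the key. A sketch only; real Chord also handles joins, stabilization, and failures.

M = 3                                   # 3-bit identifier space: ids 0..7
RING = 2 ** M
nodes = sorted([0, 1, 2, 6])            # n0, n1, n2, n6 from the join example

def successor(ident):
    """First node clockwise at or after `ident`."""
    ident %= RING
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]                     # wrap around the ring

def fingers(n):
    """Finger table of node n: entry i is successor(n + 2^i)."""
    return [successor(n + 2 ** i) for i in range(M)]

def between(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def route(start, key):
    """Forward a query for `key` until it reaches the key's successor."""
    path, node = [start], start
    while node != successor(key):
        cands = [f for f in fingers(node) if between(f, node, key)]
        # if no finger precedes the key, the key lies just past us: go to finger 0
        node = cands[-1] if cands else fingers(node)[0]
        path.append(node)
    return path

print(route(1, 7))    # query(7) issued at n1: [1, 6, 0] -> item f7 lives at n0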


DHT: Chord Summary

• Routing table size?
  • Log N fingers
• Routing time?
  • Each hop is expected to halve the distance to the desired id => expect O(log N) hops

DHT: Discussion

• Pros:
  • Guaranteed lookup
  • O(log N) per-node state and search scope
• Cons:
  • No one uses them? (only one file sharing app)
  • Supporting non-exact match search is hard


Freenet: History

• In 1999, I. Clarke started the Freenet project
• Basic idea:
  • Employ Internet-like routing on the overlay network to publish and locate files
• Additional goals:
  • Provide anonymity and security
  • Make censorship difficult

Freenet: Overview

• Routed queries:
  • Join: on startup, client contacts a few other nodes it knows about; gets a unique node id
  • Publish: route file contents toward the file id; the file is stored at the node with id closest to the file id
  • Search: route query for file id toward the closest node id
  • Fetch: when query reaches a node containing file id, it returns the file to the sender


Freenet: Routing Tables

• Each routing table entry has:
  • id – file identifier (e.g., hash of file)
  • next_hop – another node that stores the file id
  • file – file identified by id being stored on the local node
• Forwarding of query for file id:
  • If file id stored locally, then stop
    • Forward data back to upstream requestor
  • If not, search for the "closest" id in the table, and forward the message to the corresponding next_hop
  • If data is not found, failure is reported back
    • Requestor then tries next closest match in routing table

Freenet: Routing

• (Figure: query(10) is forwarded across nodes n1…n6, each consulting its own (id, next_hop, file) table, until it reaches the node whose table holds f10.)
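
A toy version of this routing rule: each node keeps a small table of id -> next_hop entries plus locally stored files; a query is forwarded toward the entry whose id is closest to the requested id, and the requestor falls back to the next closest match when a branch fails. The tables below are made up and only loosely follow the figure.

def freenet_route(tables, stored, node, key, visited=None):
    """tables: node -> {id: next_hop}; stored: node -> {id: file contents}."""
    visited = visited or set()
    visited.add(node)
    if key in stored.get(node, {}):
        return stored[node][key], node                 # found: data flows back upstream
    # try next hops in order of how close their id is to the key
    for entry_id in sorted(tables.get(node, {}), key=lambda i: abs(i - key)):
        nxt = tables[node][entry_id]
        if nxt in visited:
            continue                                   # avoid loops
        result = freenet_route(tables, stored, nxt, key, visited)
        if result is not None:
            return result
    return None                                        # failure reported back

tables = {"n1": {9: "n3", 4: "n2"}, "n3": {14: "n5", 8: "n6"}, "n2": {}, "n5": {}, "n6": {}}
stored = {"n6": {10: b"file f10"}}
print(freenet_route(tables, stored, "n1", 10))         # (b'file f10', 'n6')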


Freenet: Routing Properties

• "Close" file ids tend to be stored on the same node
  • Why? Publications of similar file ids route toward the same place
• Network tends to be a "small world"
  • Small number of nodes have a large number of neighbors (i.e., ~ "six degrees of separation")
• Consequence:
  • Most queries only traverse a small number of hops to find the file

Freenet: Anonymity & Security

• Anonymity
  • Randomly modify source of packet as it traverses the network
  • Can use "mix-nets" or onion-routing
• Security & censorship resistance
  • Data is encrypted using a key that is derived from the file
  • Two types of keys:
    • Content Hash Key: hash of the (encrypted) file – easy to check correctness of a returned file (by router or client)
    • Signed Subspace Key: documents are signed by inserter (based on public key cryptography)
      • Document can be modified by the inserter
      • Can create multiple ids


Freenet: Discussion

• Pros:
  • Intelligent routing makes queries relatively short
  • Search scope small (only nodes along search path involved); no flooding
  • Anonymity properties may give you "plausible deniability"
• Cons:
  • Still no provable guarantees!
  • Anonymity features make it hard to measure, debug

When are p2p / DHTs useful?

• Caching and "soft-state" data
  • Works well! BitTorrent, KaZaA, etc., all use peers as caches for hot data
• Finding read-only data
  • Limited flooding finds hay
  • DHTs find needles
• BUT…


A Peer-to-peer Google?

• Complex intersection queries ("the" + "who")
  • Billions of hits for each term alone
• Sophisticated ranking
  • Must compare many results before returning a subset to user
• Very, very hard for a DHT / p2p system
  • Need high inter-node bandwidth
  • (This is exactly what Google does - massive clusters)

Writable, Persistent p2p

• Do you trust your data to 100,000 monkeys?
• May be ok for a "free" song, but not for information critical to a company
  • E.g., how about DNS based on a DHT?
• Node availability hurts
  • Ex: store 5 copies of data on different nodes
  • When someone goes away, you must replicate the data they held
  • Hard drives are *huge*, but cable modem upload bandwidth is tiny - perhaps 10 GBytes/day
  • Takes many days to upload the contents of a 200 GB hard drive. Very expensive leave/replication situation!

