15-441 Computer Networking Lecture 16: Delivering Content: Web and Peer-Peer
15-441/15-641 Computer Networking
Peter Steenkiste, Fall 2014
www.cs.cmu.edu/~prs/15-441-F14

Overview
• Web
  • Protocol interactions
  • Caching
  • Cookies
• Consistent hashing
• Peer-to-peer
• CDN
• Video

Web history
• 1945: Vannevar Bush, “As we may think”, Atlantic Monthly, July, 1945.
  • Describes the idea of a distributed hypertext system.
  • A “memex” that mimics the “web of trails” in our minds.
• 1989: Tim Berners-Lee (CERN) writes internal proposal to develop a distributed hypertext system
  • Connects “a web of notes with links”.
  • Intended to help CERN physicists in large projects share and manage information
• 1990: TBL writes graphical browser for NeXT machines
• 1992-1994: NCSA/Mosaic/Netscape browser release

Internet Traffic History
[Figure: Internet traffic in PByte/month (log scale) by year, with curves for All, Fixed, and Mobile traffic]

Typical Workload (Web Pages)
• Multiple (typically small) objects per page
• File sizes
  • Heavy-tailed
    • Pareto distribution for tail
    • Lognormal for body of distribution
• Embedded references
  • Number of embedded objects also Pareto: $\Pr(X > x) = (x / x_m)^{-k}$
• This plays havoc with performance. Why?
• Solutions?
• Lots of small objects & TCP
  • 3-way handshake
  • Lots of slow starts
  • Extra connection state

HTTP 0.9/1.0
• One request/response per TCP connection
  • Simple to implement
• Short transfers are very hard on TCP
  • Multiple connection setups: three-way handshake each time
    • Several extra round trips added to transfer
  • Many slow starts – low throughput because of small window
    • Never leave slow start for short transfers
    • Loss recovery is poor when windows are small
  • Lots of extra connections
    • Increases server state/processing

Single Transfer Example
[Figure: client/server timeline, one TCP connection per object]
• 0 RTT: Client opens TCP connection (SYN, SYN, ACK)
• 1 RTT: Client sends HTTP request for HTML (DAT, ACK); server reads from disk and replies (DAT)
• 2 RTT: Connection is closed (FIN, ACK, FIN, ACK); client parses HTML and opens a new TCP connection (SYN, SYN, ACK)
• 3 RTT: Client sends HTTP request for image (DAT, ACK); server reads from disk
• 4 RTT: Image begins to arrive (DAT)

Improving Performance
• Multiple concurrent connections (Netscape)
  • Benefits are mixed: more state, timeouts, …
• Multiplex multiple transfers onto one TCP connection (HTTP 1.1)
  • Also allow pipelined transfers, i.e., multiple outstanding requests
• How to identify requests/responses
  • Delimiter: server must examine response for delimiter string
  • Content-length and delimiter: must know size of transfer in advance
  • Block-based transmission: send in multiple length-delimited blocks
  • Store-and-forward: wait for entire response and then use content-length
  • Solution: use existing methods and close connection otherwise

Persistent Connection Solution
[Figure: client/server timeline, one persistent TCP connection]
• 0 RTT: Client sends HTTP request for HTML (DAT, ACK); server reads from disk and replies (DAT)
• 1 RTT: Client parses HTML (ACK) and sends HTTP request for image (DAT, ACK); server reads from disk and replies (DAT)
• 2 RTT: Image begins to arrive

Other Problems
• Serialized transmission, but first bytes may be most useful
  • May be better to get the 1st 1/4 of all images than one complete image (e.g., progressive JPEG)
  • Can “packetize” transfer over TCP, e.g., range requests
• Application specific solution to transport protocol problems. :(
  • Could fix TCP so it works well with multiple simultaneous connections
  • More difficult to deploy
• Persistent connections can significantly increase state on the server
  • Allow server to close HTTP session
  • Client can no longer send requests

Web Proxy Caches
• User configures browser: Web accesses via cache
• Browser sends all HTTP requests to cache
  • Object in cache: cache returns object
  • Else cache requests object from origin server, then returns object to client
[Figure: clients send requests to a proxy server, which contacts the origin servers]

No Caching Example (1)
Assumptions
• Average object size = 100,000 bits
• Avg. request rate from institution’s browsers to origin servers = 15/sec
• Delay from institutional router to any origin server and back to router = 2 sec
Consequences
• Utilization on LAN = 15%
• Utilization on access link = 100%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds
[Figure: institutional network (10 Mbps LAN) connected through a 1.5 Mbps access link to the public Internet and the origin servers]

No Caching Example (2)
Possible solution
• Increase bandwidth of access link to, say, 10 Mbps
• Often a costly upgrade
Consequences
• Utilization on LAN = 15%
• Utilization on access link = 15%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs
[Figure: institutional network (10 Mbps LAN) connected through a 10 Mbps access link to the public Internet and the origin servers]

With Caching Example (3)
Install cache
• Suppose hit rate is .4
Consequence
• 40% of requests will be satisfied almost immediately (say 10 msec)
• 60% of requests satisfied by origin server
• Utilization of access link reduced to 60%, resulting in negligible delays
• Weighted average of delays = .6 * 2 sec + .4 * 10 msecs < 1.3 secs
[Figure: institutional network (10 Mbps LAN) with an institutional cache, connected through a 1.5 Mbps access link to the public Internet and the origin servers]

HTTP Caching
• Clients often cache documents
  • Challenge: update of documents
  • If-Modified-Since requests to check
    • HTTP 0.9/1.0 used just date
    • HTTP 1.1 has an opaque “entity tag” (could be a file signature, etc.) as well
• When/how often should the original be checked for changes?
  • Check every time?
  • Check each session? Day? Etc?
  • Use Expires header
  • If no Expires, often use Last-Modified as estimate

Problems
• Fraction of HTTP objects that are cacheable is dropping
  • Why?
  • Major exception?
• This problem will not go away
  • Dynamic data: stock prices, scores, web cams
  • CGI scripts: results based on passed parameters
• Other less obvious examples
  • SSL: encrypted data is not cacheable
    • Most web clients don’t handle mixed pages well, so many generic objects are transferred with SSL
  • Cookies: results may be based on past data
  • Hit metering: owner wants to measure # of hits for revenue, etc.
• What will be the end result?

Cookies: Keeping “state”
Many major Web sites use cookies
Four components:
1) Cookie header line in the HTTP response message
2) Cookie header line in HTTP request message
3) Cookie file kept on user’s host and managed by user’s browser
4) Back-end database at Web site
Example:
• Susan accesses Internet always from the same PC
• She visits a specific e-commerce site for the first time
• When the initial HTTP request arrives at the site, the site creates a unique ID and creates an entry in a backend database for that ID

Cookies: Keeping “State”
[Figure: message exchange between client and Amazon server]
• First visit: client cookie file holds “ebay: 8734”; client sends usual http request msg; server creates ID 1678 for user and returns usual http response + Set-cookie: 1678
• Next request: client cookie file holds “amazon: 1678, ebay: 8734”; client sends usual http request msg with cookie: 1678; server takes cookie-specific action and returns usual http response msg
• One week later: client again sends usual http request msg with cookie: 1678; server takes cookie-specific action and returns usual http response msg

Overview
• Web
• Consistent hashing
• Peer-to-peer
• CDN
• Video

Distributing Load across Servers
• Given document XYZ, we need to choose a server to use
  • E.g., in a data center
• Suppose we use simple hashing: modulo of a hash of the name of the document
  • Number servers from 1…n
  • Place document XYZ on server (XYZ mod n)
  • What happens when a server fails? n → n-1
  • Same if different people have different measures of n
  • Why might this be bad?
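One way to see why simple modulo hashing is fragile is to count how many documents change servers when n drops to n-1. The sketch below is illustrative only: the document names, the use of SHA-1, the helper names `server_for` and `fraction_moved`, and numbering servers from 0 (rather than 1…n as on the slide) are all assumptions, not part of the lecture.

```python
import hashlib

def server_for(name: str, n: int) -> int:
    """Simple hashing: hash the document name, then reduce it modulo n.

    Servers are numbered 0..n-1 here (an assumption for convenience).
    """
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n

def fraction_moved(names, n_before: int, n_after: int) -> float:
    """Fraction of documents whose server changes when n changes."""
    moved = sum(
        1 for name in names
        if server_for(name, n_before) != server_for(name, n_after)
    )
    return moved / len(names)

# Hypothetical workload: 10,000 document names, one of 10 servers fails.
docs = [f"doc-{i}" for i in range(10_000)]
moved = fraction_moved(docs, 10, 9)
print(f"{moved:.1%} of documents map to a different server")
```

With a well-mixed hash, a document keeps its server only when its hash gives the same remainder modulo 10 and modulo 9, so roughly nine out of ten documents should be remapped even though only one server failed.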
Consistent Hash: Goals
• “view” = subset of all hash buckets that are candidate locations
  • Correspond to a real server
• Desired features
  • Load – all hash buckets have a similar number of objects assigned to them
  • Smoothness – little impact on hash bucket contents when buckets are added/removed
  • Spread – small set of hash buckets that may hold an object regardless of views

Consistent Hash – Example
• Construction
  • Assign each of C hash buckets to random points on mod $2^n$ circle, where hash key size = n
  • Map object to random position on unit interval
  • Hash of object = closest bucket
• Monotone: addition of bucket does not cause movement between existing buckets
• Spread & Load: small set of buckets that lie near object
• Balance: no bucket is responsible for large number of objects
[Figure: circle with positions 0, 4, 8, 12, 14 marked and buckets placed on it]

Consistent Hashing: Ring
• Use consistent hash to map both keys and nodes to an m-bit identifier in the same (metric) identifier space
  • For example, use SHA-1 hashes
  • Node identifier: SHA-1 hash of IP address (IP=“198.10.10.1” → SHA-1 → ID=123)
  • Key identifier: SHA-1 hash of key (Key=“LetItBe” → SHA-1 → ID=60)
• Also need “rule” for assigning keys to nodes
  • For example: “closest”, higher, lower, ..

Consistent Hashing Example
Rule: A key is stored at its successor: node with next higher or equal ID
[Figure: circular 7-bit ID space starting at 0, with nodes N32, N90, N123 (IP=“198.10.10.1”) and keys K5, K20, K60 (Key=“LetItBe”), K101]
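The successor rule can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function names `ident` and `successor` are hypothetical, and the node and key IDs are copied from the slide's 7-bit example rather than computed, since the slide's IDs are illustrative and a real SHA-1 hash of those strings would not produce them.

```python
import bisect
import hashlib

M_BITS = 7  # the slide's example uses a circular 7-bit ID space

def ident(text: str, m_bits: int = M_BITS) -> int:
    """Map a string (IP address or key) to an m-bit ID via SHA-1.

    Reducing modulo 2**m is an assumption about how to shorten the hash.
    """
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << m_bits)

def successor(node_ids, key_id: int) -> int:
    """Return the node with the next higher or equal ID, wrapping at the top."""
    ring = sorted(node_ids)
    idx = bisect.bisect_left(ring, key_id)
    return ring[idx % len(ring)]

# IDs taken from the slide's figure.
nodes = [32, 90, 123]
keys = [5, 20, 60, 101]

placement = {k: successor(nodes, k) for k in keys}
print(placement)  # {5: 32, 20: 32, 60: 90, 101: 123}

# Removing node 90 only moves the key that node 90 was holding.
after = {k: successor([32, 123], k) for k in keys}
print(after)      # {5: 32, 20: 32, 60: 123, 101: 123}
```

Only key 60 changes owner when node 90 leaves, which is the locality that simple modulo hashing lacks.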
Consistent Hashing Properties
• Load balance: all nodes receive roughly the same number of keys
  • For N nodes and K keys, with high probability
    • Each node holds at most $(1+\epsilon)K/N$ keys
    • Provided that K is large compared to N
• When server is added, it receives its initial work load from “neighbors” on the ring
  • “Local” operation: no other servers are affected
  • Similar property when a server is removed

Finer Grain Load Balancing
• Redirector knows all server IDs
• It can also track approximate “load” for more precise load balancing
  • Need to define load and be able to track it
• To balance load:
  • $W_i$ = Hash(URL, ip of $s_i$) for all i
  • Sort $W_i$ from high to low
  • Find first server with low enough load
• Benefits and drawbacks?

Consistent