15-441 Computer Networking Lecture 16: Delivering Content Web And

Overview • Web 15-441 • Protocol interactions 15-441 Computer Networking 15-641 • Caching • Cookies • Consistent hashing Lecture 16: Delivering Content • Peer-to-peer Web and Peer-Peer • CDN Peter Steenkiste • Video Fall 2014 www.cs.cmu.edu/~prs/15-441-F14 2 Web history Internet Traffic History • 1945: Vannevar Bush, “As we may think”, Atlantic 100000 Monthly, July, 1945. • Describes the idea of a distributed hypertext system. 10000 All Fixed 1000 • A “memex” that mimics the “web of trails” in our minds. Mobile • 1989: Tim Berners-Lee (CERN) writes internal proposal 100 to develop a distributed hypertext system 10 • Connects “a web of notes with links”. PByte/month 1 • Intended to help CERN physicists in large projects share and manage information 0.1 • 1990: TBL writes graphical browser for Next machines 0.01 0.001 • 1992-1994: NCSA/Mosaic/Netscape browser release Year 3 4 1 Typical Workload (Web Pages) HTTP 0.9/1.0 • Multiple (typically small) objects per page • One request/response per TCP connection • File sizes •Lots of small objects • Simple to implement • Heavy-tailed means & TCP • Short transfers are very hard on TCP • Pareto distribution for tail • 3-way handshake • Multiple connection setups three-way handshake • Lognormal for body of distribution • Lots of slow starts each time • Embedded references • Extra connection state • Several extra round trips added to transfer • Number of embedded objects also Pareto • Many slow starts – low throughput because of small -k Pr(X>x) = (x/xm) window • This plays havoc with performance. Why? • Never leave slow start for short transfers • Loss recovery is poor when windows are small • Solutions? • Lots of extra connections • Increases server state/processing 5 6 Single Transfer Example Improving Performance Client Server • Multiple concurrent connections (Netscape) 0 RTT SYN • Benefits are mixed: more state, timeouts, … Client opens TCP SYN 1 RTT • Multiplex multiple transfers onto one TCP connection connection ACK Client sends HTTP request DAT (HTTP 1.1) ACK Server reads from for HTML DAT disk • Also allow pipelined transfers, i.e., multiple outstanding requests 2 RTT FIN ACK • How to identify requests/responses Client parses HTML FIN • Delimiter Server must examine response for delimiter string Client opens TCP ACK connection SYN • Content-length and delimiter Must know size of transfer in SYN advance 3 RTT ACK • Block-based transmission send in multiple length delimited Client sends HTTP request DAT blocks for image Server reads from ACK disk • Store-and-forward wait for entire response and then use 4 RTT content-length DAT Image begins to arrive • Solution use existing methods and close connection otherwise 7 8 2 Persistent Connection Solution Other Problems Server • Serialized transmission but first bytes may be most useful • May be better to get the 1st 1/4 of all images than one Client complete image (e.g., progressive JPEG) 0 RTT DAT • Can “packetize” transfer over TCP, e.g., range requests Client sends HTTP request ACK Server reads from for HTML DAT disk • Application specific solution to transport protocol 1 RTT ACK problems. :( Client parses HTML DAT • Could fix TCP so it works well with multiple Server reads from Client sends HTTP request simultaneous connections ACK disk for image DAT • More difficult to deploy 2 RTT • Persistent connections can significantly increase state on the server Image begins to arrive • Allow server to close HTTP session • Client can no longer send requests 9 10 Web Proxy Caches No Caching Example (1) Assumptions • Average object size = 100,000 bits origin • User configures browser: Web origin • Avg. request rate from institution’s servers browser to origin servers = 15/sec accesses via cache server public • Delay from institutional router to • Browser sends all HTTP Proxy Internet requests to cache any origin server and back to router server = 2 sec • Object in cache: cache client returns object Consequences • Else cache requests object • Utilization on LAN = 15% 1.5 Mbps access link from origin server, then • Utilization on access link = 100% returns object to client • Total delay = Internet delay + access institutional network delay + LAN delay 10 Mbps LAN = 2 sec + minutes + milliseconds client origin server 11 12 3 No Caching Example (2) With Caching Example (3) Possible solution Install cache • Increase bandwidth of access link origin origin • Suppose hit rate is .4 to, say, 10 Mbps servers servers Consequence • Often a costly upgrade public • 40% requests will be satisfied almost public Internet immediately (say 10 msec) Internet Consequences • 60% requests satisfied by origin server • Utilization on LAN = 15% • Utilization of access link reduced to 60%, • Utilization on access link = 15% resulting in negligible delays • Total delay = Internet delay + access 10 Mbps • Weighted average of delays 1.5 Mbps delay + LAN delay access link = .6*2 sec + .4*10msecs < 1.3 secs access link = 2 sec + msecs + msecs institutional institutional network network 10 Mbps LAN 10 Mbps LAN institutional cache 13 14 HTTP Caching Problems • Clients often cache documents • Fraction of HTTP objects that are cacheable is dropping • Challenge: update of documents • Why? • If-Modified-Since requests to check • Major exception? • HTTP 0.9/1.0 used just date • This problem will not go away • HTTP 1.1 has an opaque “entity tag” (could be a file signature, • Dynamic data stock prices, scores, web cams etc.) as well • CGI scripts results based on passed parameters • When/how often should the original be checked for • Other less obvious examples changes? • SSL encrypted data is not cacheable • Check every time? • Most web clients don’t handle mixed pages well many generic objects transferred with SSL • Check each session? Day? Etc? • Cookies results may be based on past data • Use Expires header • Hit metering owner wants to measure # of hits for revenue, etc. • If no Expires, often use Last-Modified as estimate • What will be the end result? 15 16 4 Cookies: Keeping “state” Cookies: Keeping “State” client Amazon server Many major Web sites use Cookie file usual http request msg server cookies usual http response + creates ID Four components: Example: ebay: 8734 Set-cookie: 1678 1678 for user 1) Cookie header line in the • Susan accesses Internet HTTP response message always from the same PC Cookie file usual http request msg 2) Cookie header line in HTTP • She visits a specific e- amazon: 1678 cookie- request message cookie: 1678 commerce site for the first time ebay: 8734 specific 3) Cookie file kept on user’s • When initial HTTP requests usual http response msg action host and managed by user’s arrives at the site, the site one week later: browser creates a unique ID and creates 4) Back-end database at Web an entry in a backend database usual http request msg Cookie file cookie- site for that ID cookie: 1678 amazon: 1678 specific ebay: 8734 usual http response msg action 17 18 Overview Distributing Load across Servers • Web • Given document XYZ, we need to choose a • Consistent hashing server to use • Peer-to-peer • E.g., in a data center • CDN • Suppose we use simple hashing: modulo of a • Video hash of the name of the document • Number servers from 1…n • Place document XYZ on server (XYZ mod n) • What happens when a servers fails? n n-1 • Same if different people have different measures of n • Why might this be bad? 19 20 5 Consistent Hash: Goals Consistent Hash – Example • Construction • “view” = subset of all hash buckets that are 0 • Assign each of C hash buckets to 14 candidate locations random points on mod 2n circle, where, hash key size = n. Bucket • Correspond to a real server 12 4 • Map object to random position on • Desired features unit interval • Load – all hash buckets have a similar number • Hash of object = closest bucket 8 of objects assigned to them • Monotone addition of bucket does not cause • Smoothness – little impact on hash bucket movement between existing buckets contents when buckets are added/removed • Spread & Load small set of buckets that lie • Spread – small set of hash buckets that may near object hold an object regardless of views • Balance no bucket is responsible for large number of objects 21 22 Consistent Hashing: Ring Consistent Hashing Example • Use consistent has to map both keys and nodes to an m-bit identifier Rule: A key is stored at its successor: node with next higher or equal ID in the same (metric) identifier space • For example, use SHA-1 hashes IP=“198.10.10.1” 0 K5 • Node identifier: SHA-1 hash of IP address IP=“198.10.10.1” SHA-1 ID=123 N123 K20 • Key identifier: SHA-1 hash of key Circular 7-bit SHA-1 ID=60 Key=“LetItBe” K101 ID space N32 • Also need “rule” for assigning keys to nodes • For example: “closest”, higher, lower, .. N90 Key=“LetItBe” K60 24 23 6 Consistent Hashing Properties Finer Grain Load Balancing • Load balance: all nodes receive roughly the same • Redirector knows all server IDs number of keys • It can also track approximate “load” for more • For N nodes and K keys, with high probability precise load balancing • Each node holds at most (1+)K/N keys • Provided that K is large compared to N • Need to define load and be able to track it • When server is added, it receives its initial work load from • To balance load: “neighbors” on the ring • W = Hash(URL, ip of s ) for all i • “Local” operation: no other servers are affected i i • Sort W from high to low • Similar property when a server is removed i • Find first server with low enough load • Benefits and drawbacks? 25 26 Consistent

15-441 Computer Networking Lecture 16: Delivering Content Web And

IPFS and Friends: a Qualitative Comparison of Next Generation Peer-To-Peer Data Networks Erik Daniel and Florian Tschorsch

A Fog Storage Software Architecture for the Internet of Things Bastien Confais, Adrien Lebre, Benoît Parrein

Decentralized File Sharing Application for Mobile Devices -. | Davide Taibi

Forensic Investigation of P2P Cloud Storage Services and Backbone For

Download-Emule-Kad-Server-List.Pdf

Content Addressed, Versioned, P2P File System (DRAFT 3)

African Development Bank Group Mali

A Distributed Content Addressable Network for Open Educational Resources

An Efficient Holistic Data Distribution and Storage Solution for Online Social Networks Guoxin Liu Clemson University

Bittorrent App Change Download Location How to Seed Moved Or Renamed Files in Bittorrent

Solid Over the Interplanetary File System

Intro to Crypto Sci-Club Notes