Internet Technology 2/15/2016 Paul Krzyzanowski 1
Total Page:16
File Type:pdf, Size:1020Kb
Internet Technology 2/15/2016 Peer-to-Peer (P2P) Application Architectures • No reliance on a central server • Machines (peers) communicate with each other Internet Technology • Pools of machines (peers) provide the 04. Peer-to-Peer Applications service • Goals client server – Robustness • Expect that some systems may be down – Self-scalability Paul Krzyzanowski • The system can handle greater workloads as Rutgers University more peers are added Spring 2016 peers February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 1 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 2 Peer-to-Peer networking Peer to Peer applications “If a million people use a web site simultaneously, • P2P targets diverse solutions doesn’t that mean that we must have a heavy-duty – Cooperative computation remote server to keep them all happy? – Communications (e.g., Skype) – Exchanges, digital currency (bitcoin) – DNS (including multicast DNS) No; we could move the site onto a million desktops – Content distribution (e.g., BitTorrent) and use the Internet for coordination. – Storage distribution Could amazon.com be an itinerant horde instead of • P2P can be a distributed server a fixed central command post? Yes.” – Lots of machines spread across multiple datacenters – David Gelernter The Second Coming – A Manifesto Today, we’ll focus on file distribution See http://edge.org/conversation/the-second-coming-a-manifesto February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 3 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 4 Four key primitives Example: Napster • Join/Leave • Background – How do you join a P2P system? – Started in 1999 by 19-year-old college dropout Shawn Fanning – How do you leave it? – Built only for sharing MP3 files – Who can join? – Stirred up legal battles with $15B recording industry • Publish – Before it was shut down in 2001: • 2.2M users/day, 28 TB data, 122 servers – How do you advertise content? Strategies: - Central server • Access to contents could be slow or unreliable - Flood the query • Search - Route the query – How do you find a file? • Big idea: Central directory, distributed contents • Fetch – Users register files in a directory for sharing – How do you download the file? – Search in the directory to find files to copy February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 5 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 6 Paul Krzyzanowski 1 Internet Technology 2/15/2016 Napster: Overview Napster: Discussion Napster is based on a central directory • Pros • Join – Super simple – Search is handled by a single server – On startup, a client contacts the central server – The directory server is a single point of control • Publish • Provides definitive answers to a query – Upload a list of files to the central server – These are the files you are sharing and are on your system • Cons • Search – Server has to maintain state of all peers – Query the sever – Server gets all the queries – Get back one or more peers that have the file – The directory server is a single point of control • Fetch • No directory server, no Napster! – Connect to the peer and download the file February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 7 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 8 Example: Gnutella Gnutella: Overview • Background Gnutella is based on query flooding – Created by Justin Frankel and Tom Pepper (authors of Winamp) • Join – AOL acquired their company, Nullsoft in 1999 – On startup, a node (peer) contacts at least one node – In 2000, accidentally released gnutella • Asks who its friends are – AOL shut down the project but the code was released – These become its “connected nodes” • Publish – No need to publish • Big idea: create fully distributed file sharing • Search – Unlike Napster, you cannot shut down gnutella – Ask connected nodes. If they don’t know, they will ask their connected nodes, and so on… – Once/if the reply is found, it is returned to the sender • Fetch – The reply identifies the peer; connect to the peer via HTTP & download February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 9 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 10 Overlay network Overlay network An overlay network is a virtual network formed by peer connections An overlay network is a virtual network formed by peer connections – Any node might know about a small set of machines – Any node might know about a small set of machines – “Neighbors” might not be physically close to you – they’re just who you know – “Neighbors” might not be physically close to you – they’re just who you know Underlying IP Network Overlay Network February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 11 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 12 Paul Krzyzanowski 2 Internet Technology 2/15/2016 Gnutella: Search Gnutella: Search Initial query sent to neighbors (“connected nodes”) If a node does not have the answer, it forwards the query Query: where is file X? Query: where is file X? Query: Query: Query: where is file X? where is file X? where is file X? Query: where is file X? Query: Query: where is file X? Query: where is file X? Query: where is file X? where is file X? Queries have a hop count (time to live) – so we avoid forwarding loops February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 13 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 14 Gnutella: Search Gnutella: Search If a node has the answer, it replies – replies get forwarded • Original protocol I have X! Query: – Anonymous: you didn’t know if the request you’re getting is from the where is file X? Query: originator or the forwarder where is file X? – Replies went through the same query path Reply Query: Query: where is file X? where is file X? • Downloads Reply Reply – Node connects to the server identified in the reply Query: – If a connection is not possible due to firewalls, the requesting node can where is file X? Reply send a push request for the remote client to send it the file Query: where is file X? Query: where is file X? February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 15 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 16 Peers do not have equal capabilities Gnutella: Enhancements • Network upstream and downstream bandwidth • Optimizations • Connectivity costs (willingness to participate) – Requester’s IP address sent in query to optimize reply – Every node is no longer equal • Availability • Leaf nodes & Ultrapeers • Compute capabilities • Leaf nodes connect to a small number of ultrapeers • Ultrapeers are connected to ≥ 32 other ultrapeers • Route search requests through ultrapeers • Downloads – Node connects to the server identified in the reply – If a connection is not possible due to firewalls, the requesting node can send a push request for the remote client to send it the file February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 17 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 18 Paul Krzyzanowski 3 Internet Technology 2/15/2016 Gnutella: Summary Example: FastTrack/Kazaa • Pros • Background – Fully decentralized design – Kazaa & FastTrack protocol created in 2001 – Team of Estonian programmers – same team that will later create Skype – Searching is distributed – Post-Napster and a year after Gnutella was released – No control node – cannot be shut down – FastTrack: used by others (Grokster, iMesh, Morpheus) – Open protocol • Proprietary protocol; Several incompatible versions • Cons • Big idea: Some nodes are better than others – A subset of client nodes have fast connectivity, lots of storage, and fast – Flooding is inefficient: processors • Searching may require contacting a lot of systems; limit hop count – These will be used as supernodes (similar to gnutella’s ultrapeers) – Well-known nodes can become highly congested – Supernodes: – In the classic design, if nodes leave the service, the system is • Serve as indexing servers for slower clients crippled • Know other supernodes February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 19 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 20 Kazaa: Supernodes Kazaa: publish a file I have X February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 21 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 22 Kazaa: search Kazaa: Discussion Supernodes answer for all their peers (ordinary nodes) Selective flooding of queries • Join – A peer contacts a supernode query • Publish query query Query X – Peer sends a list of files to a supernode query • Search query – Send a query to the supernode – Supernodes flood the query to other supernodes • Fetch – Download the file from the peer with the content Reply February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 23 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 24 Paul Krzyzanowski 4 Internet Technology 2/15/2016 Kazaa: Summary BitTorrent • Pros • Background – Similar to improved Gnutella – Introduced in 2002 by Bram Cohen – Efficient searching via supernodes – Motivation – Flooding restricted to supernodes • Popular content exhibits temporal locality: flash crowds – E.g., slashdot effect, CNN on 9/11, new movies, new OS releases • Cons • Big idea: allow others to download from you while you are – Can still miss files downloading – Well-known supernodes provide opportunity to stop service – Efficient fetching, not searching – Single publisher, many downloaders February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 25 February 15, 2016 CS 352 © 2013-2016 Paul Krzyzanowski 26 BitTorrent: Overview BitTorrent: Publishing & Fetching Enable downloads from peers • To distribute a file • Join – Create a .torrent