
Peer to Peer Networking
CS 4720 – Web & Mobile Systems

History of File Sharing
• P2P is the solution to a problem: how to get very large files to a lot of people in a timely fashion
• How did the question arise?
• "Freedom!" "Internet Media!" "Take it to the man!"
• … or sharing copies of copyrighted files…

Bulletin Board Systems
• Ah, the good old days…
• Let's take a look at one now!
  – bbsmates.com / http://renegadebbs.info/telnet
• We can consider it the earliest form of a web service
• Usenet is a form of a BBS… kinda…
  – No central server, fully distributed, evolving mesh
  – "servers" copy info between themselves

Napster
• The tech story of my undergrad years
• Debuted in Summer of 1999
  – That fall I was starting my Sophomore year
  – I was taking:
    • CSC 112 (and lab) Fundamentals of Comp Science – B
    • MTH 112 Calculus II – B
    • MTH 117 Discrete Mathematics – A
    • THE 112 Introduction to the Theatre – A-
    • HMN 396 Individual Study (Medieval Themes in Modern Video Games) – A

Napster Protocol
• Napster ran central servers that maintained:
  – User authentication
  – Logging
  – Chat functionality
  – Making connections between clients
• A user would log in to Napster and the program would then populate their profile with all the songs/files they had available

Napster Protocol
• <nick> "<filename>" <md5> <size> <bitrate> <frequency> <time>
  – <nick> is the user contributing the file
  – <filename> is the mp3 file contributed
  – <md5> is the hash of the mp3 file
  – <size> is the file size in bytes
  – <bitrate> is the mp3 bitrate in kbps
  – <frequency> is the sampling frequency in Hz
  – <time> is the play time in seconds
  – Example: foouser "generic band - generic song.mp3" b92870e0d41bc8e698cf2f0a1ddfeac7 443332 128 44100 60

Napster Protocol
• When a user did a search, it was just a DB lookup on Napster's servers
• Then Napster would establish a client-client connection to make the transfer happen
• Usually, this would be a simple TCP connection to that user's Napster data port (effectively like an FTP server)
  – If a firewall was involved, the sender would initiate the connection with the requester

Why did this work?
• Universities and Colleges were pushing HARD about how "connected" their campus was
• Many students got their first email address when they went to school starting in 1996 or so
• The speed was an INCREDIBLE jump over 14.4, 28.8, 56k
• The direct connection made it really easy to send the files

Why did this work?
• Napster (theoretically at the time) was in the clear
  – They didn't host any of the files… they just made them available
  – Feb 2001: 26.4 million users!
  – Eventually, the connection part of the whole deal was "enabling technology" and that was the end of that in the Summer of 2001
• One of the big problems was there was an identifiable target: Napster and Shawn Fanning

The Solution
• Decentralize the network
• Truly create "the cloud" where the data would live
• Even though Napster was going strong until Summer of 2001, the foundation was already being laid for the next generation of sharing technology

Kazaa, Morpheus, eDonkey, et al.
• Continuing the trend of odd names for programs comes the first set of decentralized sharing services

The Link Between Them All
• The network was actually called "FastTrack"
• Was the most popular file sharing network in 2003
  – estimates say that it even eclipsed Napster at its height
• FastTrack was an intentionally designed, corporate funded de-centralized distribution network

Nodes and Supernodes
• When you connected to the FastTrack network (through whatever program you used), you started as a node
• Nodes provide file information and download requests to supernodes
• Supernodes are responsible for indexing users' shares, performing queries, and keeping statistics
• When a connection is made, HTTP is used

But you just said it was decentralized…
• And it is!
• Supernodes are regular nodes that are "promoted" to supernode status by other supernodes on the network
• As supernodes see that their ranks are diminishing, or if the bandwidth is hurting, they find an unsuspecting node and assimilate… I mean, promote it to supernode
• Supernodes could also self-announce

Guess which nodes got promoted!
• Supernodes liked nodes with:
  – Lots of files
  – Lots of bandwidth
  – Lots of uptime
  – Low latency
  – Lots of computing power
• So… where do you find one of these machines?
• College students who leave their machines on overnight!

University Response
• After all of that with Napster traffic, now the university machines themselves are the supernodes!
• IT staff did their best to throttle traffic, block ports, etc.
• In the end, the RIAA came knocking…

Hashing and RIAA Response
• Some weaknesses in the FastTrack protocol left it susceptible to attacks from the RIAA
• The hashing algorithm used to verify if a file was indeed a particular file was written to be fast and efficient… but not terribly accurate
• The RIAA seeded a ton of dummy files to drop the value of the network

Other Problems
• Remember how I said this was corporately funded?
• How would they make their money back?
• Malware and spyware!

Kazaa Malware
• Cydoor (spyware): Collects information on the PC's surfing habits and passes it on to the company which created Cydoor.
• B3D (adware): An add-on which causes advertising popups if the PC accesses a website which triggers the B3D code.
• Altnet (adware): A distribution network for paid "gold" files.
• The Best Offers (adware): Tracks your browsing habits and internet usage to display advertisements similar to your interests.
• InstaFinder (hijacker): Redirects your URL typing errors to InstaFinder's web page instead of the standard search page.
• TopSearch (adware): Displays paid songs and media related to your search in Kazaa.
• RX Toolbar (spyware): The toolbar monitors all the sites you visit with Microsoft Internet Explorer and provides links to competitors' websites.
• New.net (hijacker): A browser plugin that lets you access several of its own unofficial Top Level Domain names, e.g., .chat and .shop. Its main purpose is to sell domain names such as www.record.shop, which is actually www.record.shop.new.net.

Today
• The FastTrack network is still out there, but comparatively few people use it anymore
• The inventors of FastTrack?
• They're doing just fine.
• They created Skype.

So what replaced FastTrack?
• BitTorrent
• Introduced by Bram Cohen in Summer of 2001
• Just a few months after FastTrack went online, actually
• At the time, though, there weren't any groups to host the trackers that were needed for the protocol to work
• That wouldn't change until early 2003

What exactly is BitTorrent?
• From bittorrent.org:
  – BitTorrent is a free speech tool.
  – BitTorrent gives you the same freedom to publish previously enjoyed by only a select few with special equipment and lots of money.
  – You have something terrific to publish -- a large music or video file, software, a game or anything else that many people would like to have.
  – But the more popular your file becomes, the more you are punished by soaring bandwidth costs.
  – If your file becomes phenomenally successful and a flash crowd of hundreds or thousands try to get it at once, your server simply crashes and no one gets it.
  – There is a solution to this vicious cycle: BitTorrent
  – With BitTorrent free speech no longer has a high price.

But does it cure the common cold?
• So… that was nice and all… but what is BitTorrent?
• Simply put, BT is a P2P file sharing protocol for sharing large amounts of data in a way where all nodes share not only demand but also supply
• There is no one node a file is downloaded from, and thus the load is theoretically evenly balanced

The Basics
• It starts with one node who publishes a file
• A .torrent file is created, which simply has the connection information, file size, etc.
• Files are split into (usually) 256KB pieces
• They are hashed (of course) to verify the contents after transmission
• The .torrent file is hosted on a tracker site
• The tracker HOSTS NO FILES other than the .torrent files
• But it does monitor traffic to connect nodes

The Basics
• The file originator is called the seed
• The seed pushes the .torrent file to the tracker
• The tracker provides the .torrent file for others to download
• When a peer downloads the .torrent to start downloading the file, it announces itself to everyone else that is downloading that file

The Basics
• In general, BT works on a rarest piece first algorithm
• A peer will ask for the rarest piece for a given file from the seeder and will receive it
• The peer will then start hosting that "rarest piece," which theoretically is now NOT the rarest, and again asks for the rarest

The Basics
• This "rarest first" approach makes downloading different from any other downloading you've done before
• A standard HTTP request is a straight flow of data… sort of
• HTTP packets are out of order and put back together… so why not whole files?
• It works out fairly well to get data distributed

The Basics
• Also, BT doesn't (necessarily) use a single port
• Multiple TCP connections can be opened randomly to keep the network strong
• The problem with this:
  – Speed of download is a bell curve, not constant
  – Partial seeding is possible
  – Streaming is pretty hard (although Bram says "it's coming")

The Protocol
• All non "keep alive" messages start with one of the following:
  – 0 - choke
  – 1 - unchoke
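
To make the message IDs above concrete, here is a minimal Python sketch of how a choke or unchoke message might be framed on the wire. It assumes the length-prefixed framing from the published BitTorrent protocol specification (a 4-byte big-endian length, then a 1-byte ID, with keep-alive as a bare zero length), which the slides themselves do not spell out. The function names are hypothetical, and only the two IDs shown on the slide are defined.

```python
import struct

# Message IDs taken from the slide; the real protocol defines more.
CHOKE = 0
UNCHOKE = 1


def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Frame a message as <4-byte big-endian length><1-byte id><payload>.

    The length counts the ID byte plus the payload (assumed framing).
    """
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def encode_keep_alive() -> bytes:
    """A keep-alive is a length prefix of zero: no ID and no payload."""
    return struct.pack(">I", 0)


def decode_message(frame: bytes):
    """Return (msg_id, payload), or (None, b"") for a keep-alive."""
    (length,) = struct.unpack(">I", frame[:4])
    if length == 0:
        return None, b""
    body = frame[4:4 + length]
    if len(body) != length:
        raise ValueError("truncated frame")
    return body[0], body[1:]
```

Under these assumptions, `encode_message(CHOKE)` produces the five bytes `00 00 00 01 00`: a length of 1 followed by the ID 0. This is why the slide singles out keep-alive: it is the one message with no ID byte at all.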