Peer to Peer Networking

CS 4720 – Web & Mobile Systems

History of File Sharing
• P2P is the solution to a problem: how to get very large files to a lot of people in a timely fashion
• How did the question arise?
• "Freedom!" "Internet Media!" "Take it to the man!"
• … or sharing copies of copyrighted files…

Bulletin Board Systems
• Ah, the good old days…
• Let's take a look at one now!
  – bbsmates.com / http://renegadebbs.info/telnet
• We can consider it the earliest form of a web service
• Usenet is a form of a BBS… kinda…
  – No central server, fully distributed, evolving mesh
  – "servers" copy info between themselves

Napster
• The tech story of my undergrad years
• Debuted in Summer of 1999
  – That fall I was starting my Sophomore year
  – I was taking:
    • CSC 112 (and lab) Fundamentals of Comp Science – B
    • MTH 112 Calculus II – B
    • MTH 117 Discrete Mathematics – A
    • THE 112 Introduction to the Theatre – A-
    • HMN 396 Individual Study (Medieval Themes in Modern Video Games) – A

Napster Protocol
• Napster ran central servers that maintained:
  – User authentication
  – Logging
  – Chat functionality
  – Making connections between clients
• A user would log in to Napster and the program would then populate their profile with all the songs/files they had available

Napster Protocol
• <nick> "<filename>" <md5> <size> <bitrate> <frequency> <time>
  – <nick> is the user contributing the file
  – <filename> is the mp3 file contributed
  – <md5> is the hash of the mp3 file
  – <size> is the file size in bytes
  – <bitrate> is the mp3 bitrate in kbps
  – <frequency> is the sampling frequency in Hz
  – <time> is the play time in seconds
  – Example: foouser "generic band - generic song.mp3" b92870e0d41bc8e698cf2f0a1ddfeac7 443332 128 44100 60
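A minimal sketch of how a client might parse one of these records. The class name, the function name, and the field labels (nick, md5, size, bitrate, frequency) are assumptions made for illustration; only the field order and the example line come from the slide.

```python
import shlex
from dataclasses import dataclass


@dataclass
class SharedFile:
    nick: str        # user contributing the file
    filename: str    # mp3 file contributed
    md5: str         # hash of the mp3 file
    size: int        # file size in bytes
    bitrate: int     # kbps
    frequency: int   # sampling frequency in Hz
    time: int        # play time in seconds


def parse_record(line: str) -> SharedFile:
    # shlex.split honours the double quotes around the filename,
    # so spaces inside the name do not break the field count.
    fields = shlex.split(line)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    nick, filename, md5, size, bitrate, frequency, time = fields
    return SharedFile(nick, filename, md5,
                      int(size), int(bitrate), int(frequency), int(time))


record = parse_record(
    'foouser "generic band - generic song.mp3" '
    'b92870e0d41bc8e698cf2f0a1ddfeac7 443332 128 44100 60'
)
print(record.filename, record.size)
```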

Napster Protocol
• When a user did a search, it was just a DB lookup on Napster's servers
• Then Napster would establish a client-to-client connection to make the transfer happen
• Usually, this would be a simple TCP connection to that user's Napster data port (effectively like an FTP server)
  – If a firewall was involved, the sender would initiate the connection with the requester
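The "search is just a DB lookup" idea can be sketched with an in-memory index. Everything here (class name, method names, addresses, port) is hypothetical; it only illustrates that the server answers searches from its own table and then hands back the address of the client that holds the file.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for the server-side song database."""

    def __init__(self):
        # filename (lowercased) -> list of (user, host, data_port)
        self._files = defaultdict(list)

    def login(self, user, host, port, filenames):
        # On login the client reports everything it is sharing.
        for name in filenames:
            self._files[name.lower()].append((user, host, port))

    def search(self, term):
        # A search never touches another client; it is only a lookup here.
        term = term.lower()
        return [
            (name, owner)
            for name, owners in self._files.items()
            if term in name
            for owner in owners
        ]


index = CentralIndex()
index.login("alice", "10.0.0.5", 6699, ["Generic Band - Generic Song.mp3"])
index.login("bob", "10.0.0.9", 6699, ["Other Band - Other Song.mp3"])

hits = index.search("generic song")
print(hits)
```

The requester would then open a TCP connection straight to the returned host and port; the server never carries the file itself.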

Why did this work?
• Universities and Colleges were pushing HARD about how "connected" their campus was
• Many students got their first email address when they went to school starting in 1996 or so
• The speed was an INCREDIBLE jump over 14.4, 28.8, 56k
• The direct connection made it really easy to send the files

Why did this work?
• Napster (theoretically at the time) was in the clear
  – They didn't host any of the files… they just made them available
  – Feb 2001: 26.4 million users!
  – Eventually, the connection part of the whole deal was deemed "enabling technology" and that was the end of that in the Summer of 2001
• One of the big problems was there was an identifiable target: Napster and Shawn Fanning

The Solution
• Decentralize the network
• Truly create "the cloud" where the data would live
• Even though Napster was going strong until Summer of 2001, the foundation was already being laid for the next generation of sharing technology

, , eDonkey, et al.
• Continuing the trend of odd names for programs comes the first set of decentralized sharing services

The Link Between Them All
• The network was actually called "FastTrack"
• Was the most popular file sharing network in 2003
  – Estimates say that it even eclipsed Napster at its height
• FastTrack was an intentionally designed, corporate-funded, decentralized distribution network

Nodes and Supernodes
• When you connected to the FastTrack network (through whatever program you used), you started as a node
• Nodes provide file information and download requests to supernodes
• Supernodes are responsible for indexing users' shares, performing queries, and keeping statistics
• When a connection is made, HTTP is used

But you just said it was decentralized…
• And it is!
• Supernodes are regular nodes that are "promoted" to supernode status by other supernodes on the network
• As supernodes see that their ranks are diminishing, or if the bandwidth is hurting, they find an unsuspecting node and assimilate… I mean, promote it to supernode
• Supernodes could also self-announce

Guess which nodes got promoted!
• Supernodes liked nodes with:
  – Lots of files
  – Lots of bandwidth
  – Lots of uptime
  – Low latency
  – Lots of computing power
• So… where do you find one of these machines?
• College students who leave their machines on overnight!
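One way to picture the promotion decision is a scoring function over those five traits. The weights, caps, and field names below are purely hypothetical (the slides do not say how FastTrack actually ranked candidates); the sketch only shows why an always-on dorm machine would win.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    files: int             # number of shared files
    bandwidth_kbps: int
    uptime_hours: float
    latency_ms: float
    cpu_score: float       # arbitrary benchmark figure


def promotion_score(n: Node) -> float:
    # Hypothetical equal weighting; each term is capped at 1.0 so that
    # no single trait dominates. Not the real network's formula.
    return (
        min(n.files / 1000, 1.0)
        + min(n.bandwidth_kbps / 10_000, 1.0)
        + min(n.uptime_hours / 24, 1.0)
        + min(50 / max(n.latency_ms, 1.0), 1.0)   # lower latency scores higher
        + min(n.cpu_score / 100, 1.0)
    )


def pick_supernode(candidates):
    # Promote whichever candidate scores highest.
    return max(candidates, key=promotion_score)


dorm_pc = Node("dorm-pc", files=2500, bandwidth_kbps=10_000,
               uptime_hours=72, latency_ms=20, cpu_score=80)
dialup = Node("dialup", files=40, bandwidth_kbps=56,
              uptime_hours=1, latency_ms=300, cpu_score=30)

print(pick_supernode([dialup, dorm_pc]).name)
```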

University Response
• After all of that with Napster traffic, now the university machines themselves are the supernodes!
• IT staff did their best to throttle traffic, block ports, etc.
• In the end, the RIAA came knocking…

Hashing and RIAA Response
• Some weaknesses in the FastTrack protocol left it susceptible to attacks from the RIAA
• The hashing algorithm used to verify if a file was indeed a particular file was written to be fast and efficient… but not terribly accurate
• The RIAA seeded a ton of dummy files to drop the value of the network
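To see why a "fast but not terribly accurate" hash invites dummy files, consider a toy hash that reads only the start of a file. This is an illustration under that assumption, not the real FastTrack algorithm; the function names and the sample size are made up.

```python
import hashlib

SAMPLE = 1024  # bytes read by the toy hash (illustrative value)


def fast_hash(data: bytes) -> str:
    # Fast because it looks only at the first SAMPLE bytes plus the length,
    # never at the rest of the file.
    head = data[:SAMPLE] + len(data).to_bytes(8, "big")
    return hashlib.sha1(head).hexdigest()


def full_hash(data: bytes) -> str:
    # Slower, but covers every byte.
    return hashlib.sha256(data).hexdigest()


real = b"H" * SAMPLE + b"actual song data" * 1000
fake = b"H" * SAMPLE + b"noise noise nois" * 1000   # same header, same length

print(fast_hash(real) == fast_hash(fake))   # True: the dummy passes the fast check
print(full_hash(real) == full_hash(fake))   # False: a full hash tells them apart
```

A dummy file that copies the sampled region and matches the length is indistinguishable from the real thing until someone plays it.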

Other Problems
• Remember how I said this was corporately funded?
• How would they make their money back?
• Malware and spyware!

Kazaa Malware
• Cydoor (spyware): Collects information on the PC's surfing habits and passes it on to the company which created Cydoor.
• B3D (adware): An add-on which causes advertising popups if the PC accesses a website which triggers the B3D code.
• Altnet (adware): A distribution network for paid "gold" files.
• The Best Offers (adware): Tracks your browsing habits and internet usage to display advertisements similar to your interests.
• InstaFinder (hijacker): Redirects your URL typing errors to InstaFinder's web page instead of the standard search page.
• TopSearch (adware): Displays paid songs and media related to your search in Kazaa.
• RX Toolbar (spyware): The toolbar monitors all the sites you visit with Microsoft Internet Explorer and provides links to competitors' websites.
• New.net (hijacker): A browser plugin that lets you access several of its own unofficial Top Level Domain names, e.g., .chat and .shop. Its main purpose is to sell domain names such as www.record.shop, which is actually www.record.shop.new.net.

Today
• The FastTrack network is still out there, but many people (comparatively) don't use it anymore
• The inventors of FastTrack?
• They're doing just fine.
• They created Skype.

So what replaced FastTrack?
• BitTorrent
• Introduced by Bram Cohen in Summer of 2001
• Just a few months after FastTrack goes online, actually
• At the time, though, there weren't any groups to host the trackers that were needed for the protocol to work
• That wouldn't change until early 2003

What exactly is BitTorrent?
• From bittorrent.org:
  – BitTorrent is a free speech tool.
  – BitTorrent gives you the same freedom to publish previously enjoyed by only a select few with special equipment and lots of money.
  – You have something terrific to publish -- a large music or video file, software, a game or anything else that many people would like to have.
  – But the more popular your file becomes, the more you are punished by soaring bandwidth costs.
  – If your file becomes phenomenally successful and a flash crowd of hundreds or thousands try to get it at once, your server simply crashes and no one gets it.
  – There is a solution to this vicious cycle: BitTorrent
  – With BitTorrent free speech no longer has a high price.

But does it cure the common cold?
• So… that was nice and all… but what is BitTorrent?
• Simply put, BT is a P2P file sharing protocol for sharing large amounts of data in which every node not only demands data but also supplies it
• There is no one node a file is downloaded from, and thus the load is theoretically evenly balanced

The Basics
• It starts with one node who publishes a file
• A .torrent file is created, which simply has the connection information, file size, etc.
• Files are split into (usually) 256KB pieces
• They are hashed (of course) to verify the contents after transmission
• The .torrent file is hosted on a tracker site
• The tracker HOSTS NO FILES other than the .torrent files
• But it does monitor traffic to connect nodes
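The split-and-hash step can be sketched in a few lines. BitTorrent uses SHA-1 for piece hashes; the function names and the 600 KB dummy payload are assumptions for illustration.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # the (usual) 256 KB piece size


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH):
    # One 20-byte SHA-1 digest per piece; the last piece may be shorter.
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes) -> bool:
    # A downloader recomputes the digest and compares it with the one
    # published in the .torrent file.
    return hashlib.sha1(piece).digest() == hashes[index]


payload = bytes(600 * 1024)          # stand-in for the published file
hashes = piece_hashes(payload)

first = payload[:PIECE_LENGTH]
print(len(hashes))
print(verify_piece(0, first, hashes))
print(verify_piece(0, b"corrupted" + first[9:], hashes))
```

Because each piece is checked on its own, a bad piece can be thrown away and fetched again from a different peer without restarting the whole download.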

The Basics
• The file originator is called the seed
• The seed pushes the .torrent file to the tracker
• The tracker provides the .torrent file for others to download
• When a peer downloads the .torrent to start downloading the file, it announces itself to everyone else that is downloading that file
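In the protocol as specified, the announcement goes to the tracker as an HTTP GET, and the tracker answers with a list of other peers. A rough sketch of building that request follows; the tracker URL, the peer ID, and the helper name are hypothetical, while the query parameter names follow the BitTorrent specification.

```python
from urllib.parse import urlencode


def announce_url(tracker: str, info_hash: bytes, peer_id: bytes,
                 port: int, left: int) -> str:
    params = {
        "info_hash": info_hash,   # 20-byte SHA-1 identifying the torrent
        "peer_id": peer_id,       # 20-byte ID this client chose for itself
        "port": port,             # where this peer accepts connections
        "uploaded": 0,
        "downloaded": 0,
        "left": left,             # bytes still needed
        "event": "started",
    }
    # urlencode percent-escapes the raw bytes values for us.
    return tracker + "?" + urlencode(params)


url = announce_url(
    "http://tracker.example/announce",   # hypothetical tracker
    info_hash=b"\x12" * 20,
    peer_id=b"-XX0001-abcdefghijkl",     # made-up client ID
    port=6881,
    left=614400,
)
print(url)
```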

The Basics
• In general, BT works on a rarest piece first algorithm
• A peer will ask for the rarest piece for a given file from the seeder and will receive it
• The peer will then start hosting that "rarest piece," which theoretically is now NOT the rarest, and again asks for the rarest
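Rarest-first selection can be sketched as a count over what each known peer holds. The function name and data layout are assumptions; real clients add refinements that are not covered here.

```python
import random
from collections import Counter


def rarest_first(my_pieces, peer_pieces, rng=random):
    """Pick the next piece index to request, or None if nothing is needed.

    peer_pieces maps a peer name to the set of piece indices it holds.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)

    wanted = [p for p in availability if p not in my_pieces]
    if not wanted:
        return None

    fewest = min(availability[p] for p in wanted)
    # Break ties randomly so peers do not all chase the same piece.
    return rng.choice(sorted(p for p in wanted if availability[p] == fewest))


peers = {
    "seed": {0, 1, 2, 3},
    "a": {0, 1},
    "b": {0, 2},
}
print(rarest_first(set(), peers))   # piece 3: only the seed has it
print(rarest_first({3}, peers))     # piece 1 or 2: each held by two peers
```

Once this peer holds piece 3 and starts serving it, the piece is no longer the rarest, which is the effect the slide describes.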

The Basics
• This "rarest first" approach makes downloading different from any other downloading you've done before
• A standard HTTP request is a straight flow of data… sort of
• The packets underneath can arrive out of order and are put back together… so why not whole files?
• It works out fairly well to get data distributed

The Basics
• Also, BT doesn't (necessarily) use a single port
• Multiple TCP connections can be opened randomly to keep the network strong
• The problem with this:
  – Speed of download is a bell curve, not constant
  – Partial seeding is possible
  – Streaming is pretty hard (although Bram says "it's coming")

The Protocol
• All non "keep alive" messages start with one of the following:
  – 0 - choke
  – 1 - unchoke
  – 2 - interested
  – 3 - not interested
  – 4 - have
  – 5 - bitfield
  – 6 - request
  – 7 - piece
  – 8 - cancel
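On the wire, each message is framed by a four-byte big-endian length, followed by the one-byte ID from the list above and any payload; a keep-alive is simply a length of zero. A minimal encode/decode sketch (the enum and function names are assumptions):

```python
import struct
from enum import IntEnum


class MsgId(IntEnum):
    CHOKE = 0
    UNCHOKE = 1
    INTERESTED = 2
    NOT_INTERESTED = 3
    HAVE = 4
    BITFIELD = 5
    REQUEST = 6
    PIECE = 7
    CANCEL = 8


def encode(msg_id=None, payload: bytes = b"") -> bytes:
    if msg_id is None:
        return struct.pack(">I", 0)            # keep-alive: length 0, no ID
    # Length counts the ID byte plus the payload.
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def decode(frame: bytes):
    (length,) = struct.unpack(">I", frame[:4])
    if length == 0:
        return None, b""                        # keep-alive
    return MsgId(frame[4]), frame[5:4 + length]


print(encode(MsgId.UNCHOKE))
print(decode(encode(MsgId.HAVE, struct.pack(">I", 7))))
```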

The Protocol
• bitfield – set of indices indicating what the peer has
• have – successful download and check of a piece
• request – send index and offset of data wanted
• piece – index of and actual data of a piece
• cancel – stop transmission
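Two of these payloads are easy to sketch. In the bitfield, the high bit of the first byte stands for piece 0; a request carries three big-endian 32-bit integers (piece index, byte offset within the piece, and length). The function names and the 16 KB default block size are assumptions for illustration.

```python
import struct


def decode_bitfield(payload: bytes, num_pieces: int) -> set:
    # Bit 7 of byte 0 is piece 0, bit 6 is piece 1, and so on.
    have = set()
    for index in range(num_pieces):
        if payload[index // 8] & (0x80 >> (index % 8)):
            have.add(index)
    return have


def encode_request(index: int, begin: int, length: int = 16 * 1024) -> bytes:
    # index: which piece; begin: byte offset inside it; length: bytes wanted.
    return struct.pack(">III", index, begin, length)


peer_has = decode_bitfield(bytes([0b10100000, 0b01000000]), num_pieces=10)
print(sorted(peer_has))

request = encode_request(index=3, begin=0)
print(struct.unpack(">III", request))
```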

The Protocol
• Interested / not interested – indicates whether a peer wants to start communicating with another peer
• Choke / unchoke – response from peer to interested party as to whether the connection will continue
• Used to manage the number of connections at any one time
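A rough sketch of the state these four messages drive. Connections start out choked and not interested; the slot count and the "fastest uploaders first" ranking below are illustrative assumptions rather than a description of any particular client.

```python
from dataclasses import dataclass


@dataclass
class PeerLink:
    peer_choking: bool = True     # the remote peer is choking us
    am_interested: bool = False   # we want something the remote peer has

    def can_request(self) -> bool:
        # Requests only make sense once we are interested AND unchoked.
        return self.am_interested and not self.peer_choking


def choose_unchoked(upload_rates, interested, slots=4):
    """Pick which interested peers to unchoke; everyone else stays choked.

    upload_rates maps a peer to the bytes/s it has been sending us.
    """
    ranked = sorted(interested, key=lambda p: upload_rates.get(p, 0), reverse=True)
    return set(ranked[:slots])


link = PeerLink()
link.am_interested = True     # we sent "interested"
link.peer_choking = False     # they answered "unchoke"
print(link.can_request())

rates = {"a": 900, "b": 50, "c": 400, "d": 700, "e": 10}
unchoked = choose_unchoked(rates, interested={"a", "b", "c", "e"}, slots=2)
print(sorted(unchoked))
```

Capping the number of unchoked peers is what keeps the number of active transfers manageable at any one time.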

Snark
• Build your own BT client
• Or build it into your own app
• How might you use BT in an app?
• How might you use it in:
  – An enterprise architecture?
  – A service-oriented architecture?
