
Peer-to-Peer Networks

14-740: Fundamentals of Networks

Credit to Bill Nace, 14-740, Fall 2017

Material from Computer Networking: A Top-Down Approach, 6th edition, by J.F. Kurose and K.W. Ross

traceroute

• P2P Overview
• Architecture components
• Napster (Centralized)
• Gnutella (Distributed)
• Skype and KaZaA (Hybrid, Hierarchical)
• KaZaA Reverse Engineering Study

14-740: Spring 2018

What is P2P?

• Client/Server interaction
  • Client: any end-host
  • Server: specific end-host
• P2P: Peer-to-peer
  • Any end-host
• Aim to leverage resources available on “clients” (peers)
  • Hard drive space
  • Bandwidth (especially upload)
  • Computational power
  • Anonymity (e.g. zombie botnets)
  • “Edge-ness” (i.e. being distributed at network edges)
• Clients are particularly fickle
  • Users have not agreed to provide any particular level of service
  • Users are not altruistic -- the algorithm must force participation without allowing cheating
  • Clients are not trusted
  • Client code may be modified
• And yet, availability of resources must be assured

P2P History

• Proto-P2P systems exist
  • DNS, Netnews/Usenet
  • Xerox Grapevine (~1982): name, mail delivery service
• Kicked into high gear in 1999
  • Many users had “always-on” broadband net connections
• 1st Generation: Napster (music exchange)
• 2nd Generation: Freenet, Gnutella, KaZaA, BitTorrent
  • More scalable, designed for anonymity, fault-tolerant
• 3rd Generation: Middleware -- Pastry, Chord
  • Provide for overlay routing to place/find resources

P2P Architecture

• Content Directory
  • “Database” of content
  • Structured? Unstructured?
  • Which peer has what files?
  • Metadata: Other info about files
• Signaling protocol
  • How do peers exchange coordination messages?
  • Proprietary? Encrypted?

Architecture (2)

• File transfer
  • How does a peer retrieve a file from another peer?
  • HTTP or HTTP-like
  • Any peer must be able to send reply messages

Overlay network is not the network

• Overlay networks are formed on top of the network graph
• Connect peers via abstract links in the overlay
• Transport accomplished on network edges
• Overlay algorithms abstract particulars of the network
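A minimal sketch of the distinction above, assuming an invented topology: two peers that are direct neighbors in the overlay are still several physical links apart in the underlying network. The node names (`A`, `B`, `R1`..`R3`) and the helper `underlay_path` are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical underlay: physical links between hosts (A, B) and routers (R1..R3).
UNDERLAY = {
    "A": ["R1"],
    "R1": ["A", "R2"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "B"],
    "B": ["R3"],
}

# Hypothetical overlay: peers A and B are direct neighbors (one abstract link).
OVERLAY = {"A": ["B"], "B": ["A"]}


def underlay_path(src, dst, graph=UNDERLAY):
    """Breadth-first search for a shortest path in the underlay graph."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


path = underlay_path("A", "B")
overlay_hops = 1                # A and B are adjacent in the overlay
underlay_hops = len(path) - 1   # physical links actually crossed
```

The overlay algorithm only ever sees the single A-B link; the routers in between are invisible to it.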

[Figure: overlay network drawn on top of the network graph; annotations: "perhaps even built on HTTP", "one edge for transport!"]

Napster

• Original “centralized” design
1. When a peer connects, it informs the central server of its
  • IP address
  • content
2. Marcia queries for “Believer”
  • Server looks through index
  • Reply: “Daichi has Believer”
3. Marcia requests file from Daichi

Problems?

• File transfer is decentralized, but locating content is highly centralized
  • Single point of failure
  • Performance bottleneck
  • Single point of lawsuit
• Result: the original service was shut down; the Napster brand was later owned by Best Buy
  • Now it’s a rebranded Rhapsody music streaming service
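The centralized lookup and its single point of failure can be sketched as follows. This is a toy model, not Napster's actual protocol: the class `CentralDirectory`, its methods, and the peer address `daichi:6699` are all assumptions made for illustration.

```python
class CentralDirectory:
    """Toy central index: maps a title to the set of peers sharing it."""

    def __init__(self):
        self.index = {}      # title -> set of peer addresses
        self.online = True   # set to False to model a server outage

    def register(self, peer_addr, titles):
        """Step 1: a connecting peer reports its address and content."""
        for title in titles:
            self.index.setdefault(title, set()).add(peer_addr)

    def query(self, title):
        """Step 2: look the title up in the index."""
        if not self.online:
            raise ConnectionError("central directory unreachable")
        return sorted(self.index.get(title, set()))


server = CentralDirectory()
server.register("daichi:6699", ["Believer"])
hits = server.query("Believer")   # Marcia's lookup; step 3 (transfer) is peer-to-peer

server.online = False             # the single point of failure
try:
    server.query("Believer")
    lookup_survived = True
except ConnectionError:
    lookup_survived = False
```

Once the directory is down, no peer can locate content, even though every file is still sitting on some peer.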


Gnutella

• Created in response to Napster problems
• Fully decentralized
  • Does not depend on central directory
  • Participants arrange themselves in overlay
  • Queries flood network to find file
• Fully anonymous
• Public domain protocol
  • Various Gnutella clients

Bootstrapping

1. New peer X must find some member of the Gnutella network
  • Use a list of candidate peers
2. X sequentially attempts to make a TCP connection with peers on the list until successful with peer Y
3. X sends a ping message to Y; Y forwards the ping message
4. All peers receiving a ping message respond to X with a pong message
5. X receives many pong messages and can set up additional TCP connections
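The steps above can be sketched as a small simulation. The overlay, the peer names, and the function `bootstrap` are hypothetical; ping forwarding is left unbounded here for simplicity, and no real sockets are opened.

```python
from collections import deque

# Hypothetical existing overlay (adjacency lists) and the set of reachable peers.
OVERLAY = {
    "Y": ["P1", "P2"],
    "P1": ["Y", "P3"],
    "P2": ["Y"],
    "P3": ["P1"],
}
ONLINE = {"Y", "P1", "P2", "P3"}


def bootstrap(candidates, overlay=OVERLAY, online=ONLINE):
    """Return (first reachable candidate, peers that answered with a pong)."""
    # Step 2: try candidates in order until one accepts the connection.
    entry = next((c for c in candidates if c in online), None)
    if entry is None:
        return None, []
    # Steps 3-4: the ping is forwarded through the overlay; every peer
    # that receives it answers with a pong.
    seen = {entry}
    queue = deque([entry])
    while queue:
        peer = queue.popleft()
        for neighbor in overlay[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return entry, sorted(seen)


# "Z" is offline, so X ends up connecting to "Y".
entry, pongs = bootstrap(["Z", "Y"])
```

Step 5 would then pick some of the peers in `pongs` and open further TCP connections to them.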

Query Flooding

• Query messages sent over existing TCP connections
• Peers forward query message
• QueryHit messages sent over reverse path
• File transfer arranged over HTTP

[Figure labels: Query, QueryHit, File transfer (HTTP)]

Limited Scope Query Flooding

• Original design not scalable
  • Exponential increase in signaling traffic
• Solution is to limit scope of query
  • Include peer-count field in query message, e.g. peer-count = 4
  • This field gets decremented by 1 at each hop
  • Message stops propagating when peer-count hits zero

[Figure: a query forwarded hop by hop, labeled Query (peer-count = 3), then Query (peer-count = 2)]
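A minimal sketch of limited-scope flooding, under the assumption that a query starting with peer-count = k can travel at most k overlay hops and that peers drop queries they have already seen. The function `flood` and the chain-shaped overlay `LINE` are invented for illustration.

```python
def flood(overlay, origin, peer_count):
    """Return (peers reached, Query messages sent) for a scoped flood."""
    seen = {origin}              # peers that already handled this query
    frontier = [(origin, None)]  # (peer about to send, peer it heard from)
    messages = 0
    while frontier and peer_count > 0:
        next_frontier = []
        for sender, came_from in frontier:
            for neighbor in overlay[sender]:
                if neighbor == came_from:
                    continue                  # do not echo back to the sender
                messages += 1                 # one Query per overlay link used
                if neighbor not in seen:      # duplicates are dropped
                    seen.add(neighbor)
                    next_frontier.append((neighbor, sender))
        frontier = next_frontier
        peer_count -= 1                       # decremented at each hop
    return seen - {origin}, messages


# Hypothetical overlay: a chain A - B - C - D - E - F.
LINE = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}
reached, messages = flood(LINE, "A", 4)
```

On this chain the query stops at E, so a file held only by F is never found even though F is part of the overlay.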

Question

• If peer-count = 4 at the start, how many peers would the query message eventually reach?
  • It depends on the number of neighbors each peer has!
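To make the dependence concrete, suppose (an idealization not stated in the slides) that every peer has exactly $d$ neighbors, that no two forwarding paths ever meet, and that a query with peer-count = 4 travels four hops. The number of peers reached is then bounded by:

```latex
N_{\text{reached}} \;\le\; d + d(d-1) + d(d-1)^2 + d(d-1)^3 \;=\; d \sum_{i=0}^{3} (d-1)^i
```

For example, $d = 3$ gives at most $3 + 6 + 12 + 24 = 45$ peers, and $d = 4$ gives at most $4 + 12 + 36 + 108 = 160$. Real overlays contain cycles, so the actual count is usually lower.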

More Questions

• Is limited scope query flooding scalable? (i.e. How does the number of nodes affect message counts?)
  • Not scalable
  • Number of messages grows with number of nodes
  • Desire: constant time search

Even more questions

• Are we guaranteed to find an object? (Assume the object exists somewhere in the overlay network)
  • No guarantee
  • Query stops after peer-count hits zero
  • Gnutella uses an unstructured graph


KaZaA: Exploiting Heterogeneity

• Each peer is either a Super Node (SN) or an Ordinary Node (ON) assigned to an SN
  • TCP connection between ON and its SN
  • TCP connections between some pairs of SNs
• SN tracks the content in all its children

KaZaA Queries

• Each file has a hash and a descriptor
• Client sends keyword query to its SN
• SN responds with matches:
  • For each match: metadata, hash, IP address
• If SN forwards query to other SNs, they respond with matches
• Client then selects files for downloading
  • HTTP requests using hash as identifier sent to peers holding desired file
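The query path above can be sketched roughly as follows. This is not KaZaA's actual protocol: the class `SuperNode`, its methods, the addresses, and the use of SHA-1 as the file hash are all assumptions made for illustration.

```python
import hashlib


class SuperNode:
    """Toy supernode that indexes the files of its ordinary-node children."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []   # other supernodes this SN is connected to
        self.index = []       # (descriptor, content hash, ON address) per file

    def register(self, on_addr, descriptor, data):
        # SHA-1 is a stand-in; the real hash function is not given in the slides.
        digest = hashlib.sha1(data).hexdigest()
        self.index.append((descriptor, digest, on_addr))

    def local_matches(self, keyword):
        kw = keyword.lower()
        return [entry for entry in self.index if kw in entry[0].lower()]

    def query(self, keyword):
        """Answer from the local index, then ask neighboring supernodes."""
        matches = self.local_matches(keyword)
        for sn in self.neighbors:
            matches.extend(sn.local_matches(keyword))
        return matches


sn1, sn2 = SuperNode("sn1"), SuperNode("sn2")
sn1.neighbors.append(sn2)
sn1.register("10.0.0.5", "Believer live", b"aaa")
sn2.register("10.0.0.9", "believer remix", b"bbb")

results = sn1.query("believer")
holders = [addr for _, _, addr in results]
```

The client would then send an HTTP request, identified by the returned hash, to one of the addresses in `holders`.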

Measurement Study

• Developed tools to reverse engineer KaZaA
• Attempt to answer the following questions:
  • What is the ratio of SNs to ONs?
  • What is the fraction of SNs overall?
  • How are SNs connected, sparsely or densely?
  • How does an ON pick the best SN?
  • Random port numbers and NATs?

Structural Properties

• Deployed apparatus in Polytechnic campus and broadband residential network
• SN connects to 40-50 other SNs (dynamic)
• SN has 100-160 ONs at Polytechnic, 55-70 at access network
• Given 3 million peers, 25,000 to 40,000 SNs
• SN is connected to ~0.1% of other SNs
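As a quick consistency check on the figures above (plain arithmetic on the quoted numbers, not an additional measurement), the fraction of SNs that one SN connects to, and the average number of peers per SN, work out to:

```latex
\frac{40}{40\,000} = 0.1\% \quad\text{to}\quad \frac{50}{25\,000} = 0.2\%,
\qquad
\frac{3\,000\,000}{40\,000} = 75 \quad\text{to}\quad \frac{3\,000\,000}{25\,000} = 120 \ \text{peers per SN}
```

The first range agrees with the "~0.1%" figure, and the second falls between the per-SN child counts observed at the two measurement sites.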

Unanswered Questions...

• Details about the residential access network?
  • Where is it? What is it?
  • What is the uplink/downlink bandwidth?
• How long was the measurement study?
  • 6 hours on 2 days? Aug 22 '03, Oct 24 '03
  • How are these time periods representative samples?
• Where did the 3 million peers number come from?
  • From KaZaA?

Overlay Dynamics

• Connection lifetimes are short
  • Average for ON-SN is 34 mins, SN-SN is 11 mins
  • 38% of ON-SN and 32% of SN-SN lasted < 30 secs
• Why so short?
  • SN searching for other SNs with small workload
  • Long-term connection shuffling, so larger set of SNs can be explored
  • Exchange of SN lists

Unanswered Questions ...

• Big jump from overlay dynamic numbers to conjectures of what SNs are doing…
• How can we interpret these numbers better?
  • “Staircases” in the cumulative distribution?
  • Different distinct groups of connection times
  • Compare these times to conjectures

Parent Selection

• Workload
  • Exact algorithm to calculate workload is unknown
  • Tied to the number of connections an SN is currently supporting
• Locality
  • RTT measurements
    • 60% of SN-SN connections < 50 msec
    • 40% of ON-SN < 5 msecs
    • Transatlantic traffic ~ 100 msecs
    • Transpacific traffic ~ 180 msecs
  • Topological closeness (Prefix matching)
    • SNs in SN list close to ON
• Issues with this methodology?
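One plausible reading of "prefix matching" is to count the leading bits two IPv4 addresses share and treat a longer shared prefix as topologically closer. The slides do not specify how the study computed it, so the helpers `common_prefix_len` and `closest_by_prefix` below are assumptions; the addresses come from the reserved documentation ranges.

```python
import ipaddress


def common_prefix_len(addr_a, addr_b):
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    # The highest set bit of the XOR marks the first position that differs.
    return 32 - (a ^ b).bit_length()


def closest_by_prefix(on_addr, sn_addrs):
    """Pick the supernode address sharing the longest prefix with the ON."""
    return max(sn_addrs, key=lambda sn: common_prefix_len(on_addr, sn))


sn_list = ["198.51.100.7", "192.0.2.200", "203.0.113.5"]
best = closest_by_prefix("192.0.2.10", sn_list)
```

A long shared prefix only suggests that two hosts sit in the same address block; it says nothing about load or actual RTT, which is one reason to question the methodology.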

Skype

• P2P Voice-over-IP (VoIP)
  • pc-to-pc, pc-to-phone, phone-to-pc
  • also IM, video
• Proprietary application-layer protocol (inferred via reverse engineering)
• Hierarchical overlay

[Figure label: Skype login server]

Making a Call

• User starts Skype
• Client registers with SN
  • list of bootstrap SNs
• Client logs in (authenticates)
• Call: client queries SN with callee ID
  • SN contacts other SNs (how? unknown) to find addr of callee
  • SN returns address to client
• Client directly contacts callee (TCP)

[Figure label: Skype login server]

Lesson Objectives

• Now, you should be able to:
  • list reasons that led to the creation of P2P networks
  • describe what an overlay network is and how it is different from the underlying network
  • use historical P2P networks to describe centralized P2P networks, fully distributed P2P networks, and hierarchical P2P networks
  • describe search techniques in the various P2P forms, and analyze search efficiencies
