DISTRIBUTED ARCHITECTURES AND P2P NETWORKS (PART 1)

Advanced Solutions for Networked Systems

Angelo Furno
Lectures for the course "Networks 2"
INRIA, INSA de Lyon – France

Context and Objectives

• Context: peer-to-peer approaches and technologies
  • key components for building large-scale, effective and efficient distributed systems

• Goal: introduction to distributed systems and P2P systems
  • general properties, architectures and applications

• The course will cover the following topics:
  • Basic definitions for distributed systems, P2P networks and their relation to the ISO/OSI model
  • Different kinds of P2P networks
  • Different topologies and properties
  • Currently deployed peer-to-peer systems and how they work

Nov. 2015, Angelo Furno – Networks 2

Teaching Material

• Lesson slides
  • Download from http://perso.citi.insa-lyon.fr/afurno/#teaching

• Books for further insights
  • Peer-to-Peer Computing: Applications, Architecture, Protocols and Challenges – Yu-Kwong Ricky Kwok – CRC Press, 2012
  • Peer-to-Peer Computing: Principles and Applications – Q. Hieu Vu, M. Lupu, B. Chin Ooi – Springer Verlag, 2010
  • Distributed Systems: Concepts and Design – G. Coulouris, J. Dollimore, T. Kindberg, G. Blair – Pearson, 5th Edition, 2009
  • Overlay Networks: Toward Information Networking – Sasu Tarkoma – CRC Press, 2010
  • P2P Networking and Applications – Buford, Yu, Lua – Morgan Kaufmann, 2009
  • Peer-to-Peer Systems and Applications – R. Steinmetz, K. Wehrle – LNCS 3485, Springer Verlag, 2005

Outline

1. Network protocol stack
  • the ISO-OSI model

2. Introduction to Distributed Systems
  • Fundamentals, Architectures and Properties

3. Introduction to P2P Systems and Overlays
  • Basic kinds of overlay networks
  • P2P application examples
    • From … to …
    • …, …, Skype, etc.

1. THE ISO-OSI PROTOCOL STACK
A quick refresher…

Reference: Ian Stoica

What is Layering?

• A design technique to organize a network system into a succession of logically distinct entities
  • such that the service provided by one entity is based solely on the service provided by the previous (lower-level) entity

Why layering?

• Each layer corresponds to a different level of abstraction
  • with a well-defined function
• The functions of each layer should promote standardization
• The information flow across the interfaces should be minimized
• The number of layers should be
  • large enough to separate functionality
  • small enough to keep the architecture under control
• Without layering: each new application would have to be re-implemented for every network technology!

[Figure: without layering, each application (Telnet, FTP, NFS, HTTP) would need a separate implementation for every transmission medium (coaxial cable, optical fiber, packet radio)]

Layering: the good and the bad

• Advantages
  • Modularity – protocols are easier to manage and maintain
  • Abstract functionality – a lower layer can be changed without affecting the upper layers
  • Reuse – an upper layer can reuse the functionality provided by a lower layer

• Disadvantages
  • Possibly inefficient implementations

ISO OSI Reference Model

• ISO – International Organization for Standardization
• OSI – Open Systems Interconnection

• Work started in the late 70s; first published in 1983
  • ARPANET started in 1969; TCP/IP protocols were ready by 1974

• Goal: a general open standard
  • Allow vendors to enter the market by using their own implementations and protocols

ISO OSI Reference Model

• Seven layers
  • Our focus, distributed and P2P systems, sits at the Application layer

[Figure: the seven-layer stack on two communicating hosts, with intermediate network nodes in between. Host layers: Application, Presentation, Session, Transport. Media layers: Network, Datalink, Physical. Intermediate nodes implement only the media layers and are connected by the physical medium.]

Data Transmission

• Each layer can use only the service provided by the layer immediately below it
• Each layer may change the data packet and add a header to it

[Figure: encapsulation — as data descends the stack, each layer prepends its own header; the headers are stripped in reverse order on the receiving host]
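The encapsulation idea can be sketched in a few lines of Python. This is a toy model, not a real protocol stack: the layer names are the OSI ones, but the header contents are invented for illustration.

```python
# Toy illustration of OSI-style encapsulation: on the way down, each layer
# wraps the payload from the layer above with its own (fake) header; on the
# way up, the headers are stripped in reverse order.

LAYERS = ["application", "transport", "network", "datalink"]

def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload with one header per layer, top to bottom."""
    for layer in LAYERS[1:]:               # the application produced the payload
        payload = f"[{layer}-hdr]".encode() + payload
    return payload

def decapsulate(frame: bytes) -> bytes:
    """Strip the headers outermost-first on the receiving side."""
    for layer in reversed(LAYERS[1:]):
        header = f"[{layer}-hdr]".encode()
        assert frame.startswith(header), f"missing {layer} header"
        frame = frame[len(header):]
    return frame

frame = encapsulate(b"hello")
assert frame.startswith(b"[datalink-hdr]")  # the datalink header is outermost
assert decapsulate(frame) == b"hello"
```

The round trip mirrors the figure: the sender adds headers top-down, the receiver removes them bottom-up.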

OSI vs TCP/IP

• OSI: conceptually defines service, interface, protocol
• Internet: provides a successful implementation

OSI (theoretical reference model) vs TCP/IP (real-world Internet implementation):

  OSI layer(s)                        TCP/IP layer     Example protocols
  Application/Presentation/Session    Application      Telnet, FTP, DNS
  Transport                           Transport        TCP, UDP
  Network                             Internet         IP
  Datalink/Physical                   Host-to-network  LAN, packet radio

Key Concepts in the OSI Model

• Service: says what a layer does

• Interface: says how to access the service

• Protocol: says how the service is implemented
  • a set of rules and formats that govern the communication between two peers

Physical Layer

• Service: move information between two systems connected by a physical link
  • Examples: coaxial cable, optical fiber links; transmitters, receivers

• Interface: specifies how to send a bit

• Protocol: coding scheme used to represent a bit, voltage levels, duration of a bit

Datalink Layer

• Service: send data frames between peers attached to the same physical media
  • Others (optional):
    • Arbitrate access to the common physical media
    • Ensure reliable transmission
    • Provide flow control

• Interface: send a data unit (frame) to a machine connected to the same physical media

• Protocol: layer addresses, implement Medium Access Control (MAC) (e.g., CSMA/CD)

Network Layer

• Service:
  • Deliver a packet to a specified destination
  • Perform segmentation/reassembly (fragmentation/defragmentation)
  • Others: packet scheduling; buffer management

• Interface: send a packet to a specified destination

• Protocol: define globally unique addresses; construct routing tables

• Data plane
  • concerned with packet forwarding, buffer management, packet scheduling
• Control plane
  • concerned with installing and maintaining the state used by the data plane

Transport Layer

• Service:
  • Provide an error-free and flow-controlled end-to-end connection between remote processes/nodes
  • Multiplex multiple transport connections onto one network connection
  • Split one network connection into multiple transport connections

• Interface: send a packet to specific destination

• Protocol: implement reliability and flow control
  • Examples: TCP, UDP

Session Layer

• Service:
  • Opening, closing and managing a “long-term” connection between end-user application processes
  • Full-duplex communication
  • Access management, e.g., token control
  • Synchronization, e.g., providing checkpoints for long transfers
  • For Internet applications, each session is related to a particular port
    • an HTTP server or daemon always has port number 80
    • the port numbers associated with the main Internet applications are referred to as well-known port numbers
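The well-known port mapping mentioned above can be queried programmatically. A minimal sketch in Python, assuming a standard services database (the table shipped with the OS, derived from the IANA registry) is installed, as it is on most systems:

```python
import socket

# getservbyname looks a service name up in the OS services database
# (e.g. /etc/services on Unix-like systems).
assert socket.getservbyname("http", "tcp") == 80    # well-known HTTP port
assert socket.getservbyname("https", "tcp") == 443
assert socket.getservbyname("ftp", "tcp") == 21
```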

• Interface: varies according to the service

• Protocol: token management; insert checkpoints, implement roll-back functions • Examples: Remote Procedure Calls (RPC), AppleTalk Session Protocol (ASP)

Presentation Layer

• Service: convert data between various representations
  • translation of data between a networking service and an application
  • character encoding (e.g., ASCII/EBCDIC), data compression, encryption/decryption
• Interface: depends on the service

• Protocol: define data formats, and rules to convert from one format to another
  • Examples: the Multipurpose Internet Mail Extensions (MIME) protocol, the Secure Sockets Layer (SSL) protocol

Application Layer

• Service: any service provided to the end user
• Interface: depends on the application
• Protocol: depends on the application
• Examples: FTP, Telnet, WWW browser

Distributed systems and P2P overlay networks are typically designed at this level, depending on the services they have to provide. Assumptions and choices are made about the lower levels of the ISO/OSI model.

2. DISTRIBUTED SYSTEMS
Fundamentals, Overlay Networks

Definitions

• Def 1: “a distributed system is a collection of independent computers that appears to its users as a single coherent system” [Andrew Tanenbaum]

• Def 2: “a distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages” [George Coulouris]


(Desired) Key Properties

• Heterogeneity
  • networks and protocols, hardware/OSs, programming languages, implementations
• Concurrency
  • shared limited resources/services, competition, multiple users
• Openness
  • extensibility, re-implementation
• Scalability
  • working well (effectiveness) at different usage scales (number of required resources/users)
• Failure handling, security, transparency, quality of service, etc.

A System View

• The entities that communicate in a distributed system are typically processes
  • a distributed system can be described as processes coupled with appropriate inter-process communication paradigms
  • Caveats:
    • if the underlying operating system (e.g., in sensor networks) does not support processes, the interacting entities are nodes
    • processes can be supplemented by threads as the endpoints of communication

Architectural Models

• Logical description of a distributed system
  • Roles and responsibilities of the communicating entities (processes, threads, nodes)
• Communication paradigms
  • Direct communication among processes (i.e., interprocess communication)
    • completely based on low-level message-passing primitives (sockets, multicast, etc.)
  • Remote invocation
    • Request-reply protocols (i.e., client/server protocols)
      • a basic enhancement to message-passing primitives
      • support for specifying the remote operation to perform
      • HTTP protocol (GET, POST, HEAD, …)
    • Remote procedure calls
    • Remote method invocation
  • Indirect communication
    • Group communication; publish-subscribe (one-to-many); message queues (one-to-one)
    • Shared memory and tuple spaces
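The request-reply idea (a client naming the remote operation on top of raw message passing) can be sketched over a local socket pair. The "GET" operation and the reply codes here are illustrative, loosely modeled on HTTP, not a real protocol implementation:

```python
import socket

def request_reply(request: bytes) -> bytes:
    """Run one request-reply round trip over a local socket pair.

    The request names the operation to perform (an enhancement over plain
    message passing); the 'server' end parses it and answers.
    """
    client, server = socket.socketpair()
    try:
        client.sendall(request)
        op, _, resource = server.recv(1024).decode().strip().partition(" ")
        # Reply depends on the requested operation (codes are illustrative).
        server.sendall(f"200 {resource}".encode() if op == "GET" else b"400 bad request")
        return client.recv(1024)
    finally:
        client.close()
        server.close()

assert request_reply(b"GET /index.html\n") == b"200 /index.html"
assert request_reply(b"FROB /x\n") == b"400 bad request"
```

Real request-reply protocols (HTTP, RPC) add framing, error handling and marshalling on top of the same basic pattern.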

Architectural Models: C/S

• Client/Server model
  • asymmetric model

• Two well-distinguished and independent roles:
  • Client
    • client processes interact with individual server processes to access some shared resources
  • Server
    • provides services over shared resources and manages client accesses
    • can in turn be a client of other servers

• Examples:
  • DNS service
  • Search engines
    • a server component to answer user queries
    • many web crawlers that are clients of other servers

Architectural Models: P2P

• P2P model
  • symmetric model that exploits data/hardware resources on a large number of participating computers to fulfil a global task or activity

• All of the processes involved in a task or activity play similar roles
  • they all run the same program
  • and offer the same set of interfaces to each other
  • each peer holds only a small part of the application database
  • storage, processing and communication load is distributed across peers
  • no central point of failure/bottleneck as in C/S

• Examples:
  • File services (e.g., torrent applications)
  • Audio/video streaming applications (e.g., Spotify, Skype, IPTV)
  • Distributed data storage systems

P2P Overlay Networks (ONs)

• Focus at the application level
• Overlay network: a network of peers built on top of an existing network
  • Virtual (logical) network
    • made up of nodes and logical links
    • nodes are application processes/threads
    • logical links can span many physical links
  • Built to provide a service not available in the underlying network
    • the overlay relies on the underlay for basic networking (i.e., routing/forwarding)
    • it can offer new routing and forwarding features without changing the routers

• Today, most P2P overlay networks are built at the application layer, on top of the TCP/IP networking suite
  • This means: P2P overlay networks are typically application-level abstractions!

3. P2P SYSTEMS AND OVERLAY TOPOLOGIES
An introduction

References: Laura Ricci, Peer to Peer Systems Sasu Tarkoma, Overlay and P2P

P2P Systems: definitions

• Def. 1: a peer-to-peer system is an overlay network of autonomous entities (processes/nodes) which is able to self-organize and share a set of distributed resources, such as CPU cycles, memory, and bandwidth. The system exploits such resources to provide one or more services in a completely or partially decentralized way

• Def. 2: a way of structuring distributed applications such that the individual nodes have symmetric roles, rather than being divided into clients and servers with quite distinct roles. A key concept for P2P systems is therefore to permit any two peers to communicate with one another in such a way that either ought to be able to initiate the contact

Ref: IRTF Peer-to-Peer Research Group (P2PRG)

P2P Systems: sharing means power!

• P2P is about giving to and receiving from a community
  • each peer shares a set of resources
  • and obtains in return a set of computational/data resources
    • e.g., share music (audio files) and obtain music in return (Napster, Gnutella, …)
  • each peer behaves both like a client and like a server
    • Servent: SERVer + cliENT
• the shared resources are at “the border” of the Internet
  • directly shared by the peers
  • possibly without “special-purpose nodes” for their management

Benefits of P2P Systems

• Self-organization
  • Lack of central coordination
• Efficiency through resource sharing
  • Bandwidth, storage, and processing power at the edge of the network → reduced costs to implement large-scale distributed systems
• Scalability
  • Aggregate resources grow naturally as more peers join
• A large number of peers means reliability
  • Replicas
  • Geographic distribution
  • No single point of failure
  • Resilient to certain kinds of attacks
    • but vulnerable to others

Challenges of P2P Systems

• Heterogeneity
  • network peers have very different features
    • memory, disk space, processor speed, bandwidth, uptime, idle time
• Single points of failure (still)
  • most P2P networks still rely on a centralized bootstrap and update server
    • may be critical for filtering, blocking, and recovery in case of system-wide failure
  • fully decentralized solutions exist
• NAT traversal
  • PCs are connected to the Internet via routers using NAT (Network Address Translation) and firewalling
  • NAT routers prevent incoming connections by default
  • NAT traversal is required for a peer to become an active part of the P2P infrastructure
• Security
  • parts of the P2P infrastructure are accessible to everyone
  • need for protection against data and routing manipulation
• Privacy
  • protect privacy despite the fact that data is stored on unreliable nodes

P2P Systems: churn

• Peers’ presence is transient
  • connections and disconnections happen at a high rate
• Resources offered by the peers are dynamically added and removed
• Each peer is paired with a different IP address for each connection to the system
  • a resource cannot be located by a static IP
  • special addressing mechanisms are required at the application level
    • not at the IP level

Evolution of P2P (1)

• (Back to the 60s) ARPANET had P2P-like qualities
  • End-to-end communication, FTP, USENET, …
  • Today’s BGP is P2P
• (2000s) P2P boom!
  • (1999) First generation started from centralized servers
    • Napster is launched
    • Centralized directory, single point of failure
  • (2001) Second generation used flooding (Gnutella v0.4)
    • Local directory at each peer
    • High cost, worst-case O(N) messages per lookup
  • (2002) Third generation used more structured approaches (Gnutella 0.6) to improve performance
    • Ultrapeers/leaf peers

Evolution of P2P (2)

• (2003)
  • eDonkey/eMule dominates P2P file sharing in Europe, KaZaA/FastTrack in the USA

• (2005):
  • BitTorrent becomes the most popular P2P file-sharing network
  • Skype dominates VoIP
  • eDonkey is replaced by eMule, which exploits a similar protocol

• (2008)
  • P2P music-on-demand streaming: Spotify is launched in Sweden
    • in 2011, roughly 80% of the tracks not served from the local cache were delivered through the P2P network, meaning reduced server resources and associated costs

• (2009):
  • Wuala, a P2P-based storage service (now cloud-based)
    • users can share the free space on their hard disks with the community
  • PPLive, a P2P-based video streaming platform, spreads in Asia
  • Vuze (an Azureus evolution): P2P-based video-on-demand
  • First version of Bitcoin

Volumes of P2P traffic

• P2P volume
  • Estimates range from 20–40% of Internet traffic

• Latest estimates from Cisco suggest that P2P video delivery is growing, while file-sharing traffic is shrinking
  • the same holds for voice
    • Skype, P2PSIP

• Hundreds of millions of people use P2P technology today

P2P Overlay Classification: Unstructured P2P

• Unstructured P2P systems (Gnutella, Kazaa, …)
  • A new peer randomly connects to a set of active peers in the P2P network
  • The resulting logical network (overlay network) is unstructured
  • Peers interact directly, without the presence of a centralized server

• Lookup algorithms:
  • centralized servers (Napster)
  • flooding (Gnutella)

• Lookup cost
  • linear in N, where N is the number of nodes in the network
  • Problem: scalability

P2P Overlay Classification: Hybrid Unstructured P2P

• To increase system performance, hybrid solutions have been defined (Gnutella v0.6, Kazaa, Skype)
  • Infrastructure nodes (i.e., superpeers) are allowed to exist and often act as a kind of central directory server
  • A set of superpeers indexing the resources shared by the peers is defined dynamically
  • Superpeers can be special peers with high-performance hardware that manage the other peers in the network

• Hierarchical structure is introduced

• Resources can be directly exchanged among the peers

P2P Overlay Classification: Structured P2P

• Neighbors are selected according to a given criterion
  • The resulting overlay network is structured
  • Goal: guaranteeing bounded lookup latency

• The network structure guarantees that the lookup of a resource has bounded complexity
  • CAN, Chord, Pastry, eMule
  • e.g., O(log N) in Chord
  • complexity guarantees also hold for peer joins and leaves

• Churn handling becomes costly • Need for maintaining the structure

• Most popular kind of structured network: distributed hash tables (DHTs)
  • Hash functions assign logical identifiers to peers and data
    • the same identifier space is used for both peers and data
    • requires the definition of a data-to-peer mapping function
  • Key-based search
    • routing algorithm definition
    • each routing hop on the overlay is guided by the key identifying the searched data
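The shared identifier space and the data-to-peer mapping can be sketched in a few lines. This illustrates Chord-style successor placement only (the identifier size, peer names, and use of SHA-1 are illustrative; the O(log N) routing tables are omitted):

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16                                   # small space, for illustration

def ident(name: str) -> int:
    """Hash a peer or data name into the shared identifier space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)

# Peers and data keys live in the same circular identifier space.
peers = sorted(ident(f"peer-{i}") for i in range(8))

def successor(key: str) -> int:
    """Data-to-peer mapping: the first peer id >= key id, wrapping around."""
    i = bisect_left(peers, ident(key))
    return peers[i % len(peers)]

# Every key is the responsibility of exactly one (deterministic) peer:
assert all(successor(f"file-{i}") in peers for i in range(100))
assert successor("file-0") == successor("file-0")
```

Because placement is determined by the hash alone, any peer can compute where a key lives; the routing structure then guarantees it can be reached in a bounded number of overlay hops.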

P2P Overlays: recap

Source: Eberspächer, Jörg, et al. "Structured P2P networks in mobile and fixed environments"

3.1 UNSTRUCTURED P2P OVERLAYS
Different Lookup Strategies, Napster, Gnutella

Unstructured P2P with centralized lookup

• Overlay links are established arbitrarily
  • to retrieve the searched resources

• Two distinguished roles for resource lookup
  • Central server
    • Manages the file index
    • Registers new entries in the index
    • Handles client queries
  • Clients
    • Join: register with the central server
    • Publish: report shared files to the central server
    • Lookup: query the server for a list of peers holding the required resource

• Peer role for resource usage/retrieval
  • resource usage/transfer is directly managed by the peers
  • i.e., files are directly exchanged among the peers
  • the server is not involved
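The central-index design above can be sketched as a tiny class (class and method names, and the peer addresses, are illustrative, not any real Napster API): the server only maps file names to peer addresses; the transfer itself happens peer-to-peer.

```python
class CentralIndex:
    """Napster-style central directory: lookup is centralized, transfer is not."""

    def __init__(self):
        self.index = {}                  # filename -> set of peer addresses

    def join(self, peer, files):
        """Register a peer and publish its shared files."""
        for f in files:
            self.index.setdefault(f, set()).add(peer)

    def leave(self, peer):
        """Remove a departing peer from every index entry."""
        for holders in self.index.values():
            holders.discard(peer)

    def lookup(self, filename):
        """Return the peers holding the file; download is their business."""
        return sorted(self.index.get(filename, ()))

server = CentralIndex()
server.join("10.0.0.1:6699", ["song.mp3", "talk.ogg"])
server.join("10.0.0.2:6699", ["song.mp3"])
assert server.lookup("song.mp3") == ["10.0.0.1:6699", "10.0.0.2:6699"]
server.leave("10.0.0.1:6699")
assert server.lookup("song.mp3") == ["10.0.0.2:6699"]
```

Note how all three client operations (join, publish, lookup) touch the single server: this is exactly the bottleneck and single point of failure discussed on the next slide.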

Unstructured P2P with centralized lookup: Napster Case Study

• Prerequisite: users install the client software
  • Register name, password, local directory, etc.

1. The installed client contacts Napster
   • via TCP
   • provides a list of music files it will share
   • … and Napster’s central server updates the directory

2. The client searches on a title or performer
   • Napster identifies online clients with the file
   • … and provides their IP addresses

3. The client requests the transfer of the file from the chosen supplier
   • The supplier transmits the file to the client
   • Both client and supplier report status to Napster

Unstructured P2P with centralized lookup: Limitations

• The good
  • Consistent view of the P2P network
  • Search is guaranteed to find all files in the network

• Limitations of this design are severe
  • The centralized server is the weakest point of the system
    • file transfer is decentralized, but locating content is highly centralized
  • Single point of failure
    • attacks, network partitions, …
  • Bottleneck
    • limited scalability
    • the central server requires more and more resources as the number of requests increases

• Need for more decentralization for better scalability and performance

Pure Unstructured P2P

• Join: overlay links are established arbitrarily
  • each peer has to retrieve at least one/a few neighbors
  • a new peer that wants to join the network can copy existing links of another node, then form its own links over time
  • No structure has to be created to make the network work
    • No need to maintain a structure whenever something changes

• Lookup: no server, decentralized database
  • the entire network consists solely of equipotent peers

• Publish: each peer provides a set of resources (e.g., files)
  • items are published locally (by replying to queries) for sharing

• Leave: no need to notify a server or restructure the network
  • neighbors eventually remove the link locally

Pure Unstructured P2P: lookup by flooding

• Flooding
  • the query message is propagated to all the peers in the network
    • the sender asks its neighbors, who ask their neighbors, who ask their neighbors, and so on… when/if the item is found, a reply is sent back to the sender
  • Enhanced by a TTL (Time To Live)
    • associated with queries
    • to limit propagation in the network
  • allows finding any answer to the query (within the TTL horizon)
  • search is now distributed but…

• Many popular P2P networks are unstructured and use flooding and its variations!
  • Gnutella v0.4
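TTL-limited flooding can be simulated on a toy overlay. The graph, the resource holders, and the TTL values below are illustrative; duplicate suppression stands in for the GUID check that real Gnutella peers perform:

```python
from collections import deque

overlay = {                       # adjacency list of the unstructured overlay
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
holders = {"E"}                   # peers that have the searched resource

def flood(src: str, ttl: int) -> set:
    """Return the set of resource holders reached within `ttl` hops of src."""
    seen, hits = {src}, set()
    frontier = deque([(src, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if t == 0:                # TTL exhausted: stop propagating
            continue
        for nb in overlay[node]:
            if nb in seen:        # duplicate suppression (GUIDs in Gnutella)
                continue
            seen.add(nb)
            if nb in holders:
                hits.add(nb)
            frontier.append((nb, t - 1))
    return hits

assert flood("A", 2) == set()     # E is 3 hops from A: out of the horizon
assert flood("A", 3) == {"E"}     # a larger TTL widens the horizon
```

The two assertions show the TTL trade-off directly: a small TTL saves messages but may miss existing results; in the worst case the query still visits O(N) peers.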

Pure Unstructured P2P: Gnutella Client implementations

Pure Unstructured P2P: Gnutella 0.4 Case Study

• Gnutella 0.4: one of the first proposals of a completely decentralized overlay
  • http://rfc-gnutella.sourceforge.net/developer/index.html

• History: developed in about a month by Frankel and Pepper at Nullsoft, later acquired by AOL
  • Open protocol specification
  • Gnutella ≈ GNU Project + Nutella
  • user base in the millions for peer-to-peer file sharing in 2010

• Several definitions of the protocol
  • Specifications 0.4, 0.6, 0.7 and Gnutella 2
  • many implementations, with several optimizations
  • Clients:
    • LimeWire (shut down in 2010); LimeWire Pirate Edition; WireShare

Gnutella 0.4: Lookup

• No Centralized Database

• Example: peer P searches for «Nirvana»; peer Q has «Nirvana»

• Message types:
  • PING
  • PONG
  • QUERY
  • QUERYHIT
  • PUSH

• Message header fields:
  • Message identifier: set by the source, uniquely identifies the message
  • Message type
  • TTL: set by the source (7 by default), decreased by neighbors
  • Hops: number of times the descriptor has been forwarded
  • Payload length: length of the actual message content
  • Invariant: TTL(i) + Hops(i) = TTL(0)

Gnutella 0.4: Protocol Phases

• Bootstrap
  • a peer joining the network needs to discover the address of a peer who is already a member of the network and open a connection with it (GNUTELLA CONNECT messages)
  • contact known beacon (bootstrap) servers
    • they store a list of peers often active on the network
  • keep a local and persistent peer cache
    • it stores the list of peers contacted in previous sessions
  • the bootstrap procedure evolved in recent versions of the Gnutella protocol

• Network exploration
  • the network is explored through PING/PONG messages to find new neighbors
    • PING messages are sent to discover peers
    • PONG messages are received, including data about peers
      • they follow the reverse path of the PINGs
  • new connections may be established with the newly discovered peers

Gnutella 0.4: Protocol Phases (2)

• Breadth-first search for lookups
  • queries are sent from the initiator to its neighbors (QUERY message)
  • neighbors propagate the query through TTL-enhanced flooding
  • replies are received from the neighbors
  • the “best” reply is selected (a QUERYHIT message is sent)
  • a PUSH request message can be used to circumvent firewalls
    • the servent sends the file to the requesting node after receiving the request
    • HTTP push proxy: a proxy sends the push request

• Download
  • direct connection with the peer offering the content, and download of the selected file
  • data exchange through the HTTP protocol

Gnutella 0.4: Bootstrap (1)

• WebCaches (GWebCaches)
  • Web servers able to interact with the bootstrapping servent
  • store the IP addresses of a set of servents and of further WebCaches
  • dynamically and automatically updated by the servents
  • define a further overlay network, “above” the Gnutella network

• The servent
  • at startup, selects one random WebCache from the list of known caches
  • stores the information received from the GWebCache in its local cache
  • periodically notifies the GWebCache about its presence in the network
    • only after the servent has been active in the overlay for a given time interval
      • e.g., an hour
    • after this time interval, the updates are sent periodically
    • not if it is NATted or behind a firewall

Gnutella 0.4: Bootstrap (2)

• Internal cache (local cache)
  • each servent locally stores the IP addresses of servents contacted in the current and previous sessions
    • to avoid contacting the GWebCache too many times
  • this is a permanent cache
    • loaded into RAM when the client joins the overlay
    • dynamically updated in RAM
      • from the PONG, QUERYHIT and X-TRY messages
      • depending on uptime, number and size of shared files, and the last time the remote servent was contacted
    • saved to disk when the servent disconnects
      • only a portion of the RAM references is made permanent
  • an initial list of hosts and GWebCaches is distributed with the application

Gnutella 0.4: Bootstrap (3)

• Bootstrap procedure
  1. Try to join the Gnutella overlay by exploiting data from the local cache
     • to keep the load on the GWebCaches low
  2. If after X seconds no connection has been established, send a query to a randomly selected GWebCache
  3. If after Y seconds no reply has been received, try to contact a further GWebCache, and so on, until a contact is established
     • Do not contact the same GWebCache twice during the same session

• Standard values: X=5 seconds, Y=10 seconds
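The fallback logic of this procedure can be sketched as follows. This is a simplification, not client code: the X/Y timeouts are replaced by a pluggable `try_connect` callback that simply succeeds or fails, and all names are illustrative.

```python
import random

def bootstrap(local_cache, gwebcaches, try_connect):
    """Return the first reachable peer, preferring the local cache.

    local_cache: list of peer addresses from previous sessions.
    gwebcaches:  list of dicts like {"hosts": [...]} standing in for caches.
    try_connect: callable(peer) -> bool, abstracting the timed connect.
    """
    for peer in local_cache:               # step 1: keep GWebCache load low
        if try_connect(peer):
            return peer
    caches = list(gwebcaches)
    random.shuffle(caches)                 # step 2: random GWebCache order
    contacted = set()
    for i, cache in enumerate(caches):     # step 3: fall through until contact
        if i in contacted:                 # never the same cache twice/session
            continue
        contacted.add(i)
        for peer in cache.get("hosts", []):
            if try_connect(peer):
                return peer
    return None                            # bootstrap failed entirely

up = {"p3"}                                # pretend only p3 is reachable
assert bootstrap(["p1", "p2"], [{"hosts": ["p3"]}], lambda p: p in up) == "p3"
```

In a real client each `try_connect` would be a GNUTELLA CONNECT attempt bounded by the X=5 s / Y=10 s timers from the slide.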

Gnutella 0.4: Exploring the network

• How can a new peer enlarge its neighborhood after the bootstrap?
  • The new peer connects to (at least) one known host (found by bootstrapping), then:
    1. Sends a PING message to the network through this host
    2. Receives a set of PONGs through the existing connections
    3. Chooses a subset S of the PONGs
    4. Establishes a TCP connection with the nodes in S

Gnutella 0.4: TTL-enhanced flooding in PING/PONG messages

• When a peer receives a PING message (the same holds for QUERY messages), it:
  • increments the HOP counter and decrements the TTL (Time-to-Live)
  • discards the message if one of the following conditions is true:
    1. TTL = 0
    2. it has already transmitted the same message (exploiting unique identifiers)
  • otherwise, it sends the message to all its neighbors except the peer from which it received the message
    • or to a subset of k neighbors
  • replies with a PONG to the received PING
  • stores the relation between received and sent messages in a hashtable

• Standard values: TTL=7, k=4
• When a peer receives a PONG (or a QUERYHIT), it:
  • forwards the message to the peer from which it received the corresponding PING (incrementing HOP and decrementing TTL)
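The per-message forwarding rule can be written as a small function (the function and variable names are ours, not from the specification): decrement the TTL, increment the hop count, and drop the message on TTL exhaustion or a duplicate GUID.

```python
seen = set()   # GUIDs of messages this peer has already forwarded

def on_ping(guid, ttl, hops, from_peer, neighbors):
    """Return the (peer, ttl, hops) forwards for a received PING.

    An empty list means the message is discarded (TTL expired or duplicate).
    """
    ttl, hops = ttl - 1, hops + 1          # update counters first
    if ttl == 0 or guid in seen:           # discard rules from the slide
        return []
    seen.add(guid)
    # Forward to every neighbor except the one the PING arrived from.
    return [(n, ttl, hops) for n in neighbors if n != from_peer]

assert on_ping("g1", 7, 0, "B", ["A", "B", "C"]) == [("A", 6, 1), ("C", 6, 1)]
assert on_ping("g1", 7, 0, "D", ["A"]) == []   # duplicate GUID: discarded
assert on_ping("g2", 1, 6, "A", ["B"]) == []   # TTL exhausted: discarded
```

Note that every forward preserves the invariant TTL(i) + Hops(i) = TTL(0): 6 + 1 = 7 in the first assertion.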

Gnutella 0.4: Exploring the overlay

[Figure: peer 1 sends PING(1) to its known hosts; the PING is flooded across peers 2–8, and the corresponding PONGs (Pong 2 … Pong 8) travel back along the reverse paths, so peer 1 learns about all reachable peers. Query/Response works analogously.]


Gnutella: Handshaking for establishing connections

• Once servent A has obtained the address (IP + port number) of another servent B on the network, it:
  • opens a TCP connection with B and asks B whether it is available for a Gnutella connection
  • sends a message (GNUTELLA CONNECT) to B, consisting of text lines (ASCII-encoded)

• B may decide to accept the connection from A or to refuse it
  • on the basis of its connection-management strategy
  • a list of (IP address, port) pairs of Gnutella servents known to B is sent to A
    • even if B decides not to accept the connection
    • X-Try header: a sort of “connection pongs”

[Figure: handshake message exchange between servent A and servent B]


Identifiers in Unstructured P2P

• Identifiers are exploited by the routing protocol
  • to allow responding to query and ping messages
    • backward routing is used for
      • QUERYHIT
      • PONG
      • PUSH (connection reversal for NAT traversal)
  • to avoid message duplications

• Globally Unique Identifier (GUID)
  • uniquely identifies a message; it is included in each Gnutella message
• Servent identifier
  • included in QUERYHIT and PUSH messages

Identifiers for QUERYHIT and PONG

• Peer P receives a QUERY/PING message
  • stores the GUID of the request message (Query/Ping)
  • waits for the reply (QueryHit/Pong)
  • routes the reply to the connection over which the request was received

• Exploit a hashtable
  • {key = GUID, value = connection from which the message was received}
  • For each received QUERY/PING, insert the pair GUID:connection_ID in the table
    • if the message was generated by the local host, set the connection reference to null
  • Match the GUID of a received QUERYHIT/PONG with the corresponding QUERY/PING and forward the reply over that connection
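This backward-routing hashtable is tiny to sketch (the connection identifiers and function names are illustrative): requests record the connection they arrived on, and replies with a matching GUID follow it back.

```python
route = {}                                  # GUID -> incoming connection id

def on_request(guid, conn):
    """A QUERY or PING was received over `conn` (None if generated locally)."""
    route[guid] = conn

def on_reply(guid):
    """A QUERYHIT or PONG arrived: where should it be forwarded?

    None means the reply is for this very peer (locally generated request).
    """
    return route.get(guid)

on_request("q-42", conn="conn-3")           # query came in over connection 3
on_request("q-99", conn=None)               # locally generated query
assert on_reply("q-42") == "conn-3"         # reply goes back the same way
assert on_reply("q-99") is None             # reply terminates at this peer
```

Chaining this per-hop lookup across the overlay is exactly what makes PONGs and QUERYHITs retrace the reverse path of the PING/QUERY flood.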

Gnutella: TCP overlays

• Unstructured overlay networks
  • made of servent nodes
  • with TCP connections as edges
    • each logical edge may correspond to several physical hops
  • used to exchange binary data

Gnutella: pong caching

• Problem: PING/PONG broadcasting causes too much traffic

Frequency of different types of messages in Gnutella 0.4: 41.7% of the total amount of messages is exploited for network exploration

• Solution: pong caching
  • Servents keep information about nearby nodes and send it when pinged, to reduce the number of messages

• When a host A receives a PING message from a neighbor N1

• the first time, A propagates the PING received from N1 to all the neighbors and stores the received PONGs

• afterwards, when A receives a PING from another neighbor N2, it does not propagate it but answers using its local cache of PONGs

Nov. 2015 Angelo Furno – Networks 2 66 Gnutella: pong caching (2)

• Problems • peers may disconnect from the network • Peer churn… • shared data changes • e.g., new files available on some peers • Cache has to be periodically refreshed

• Different pong caching implementations exist in Gnutella clients • A good PONG caching strategy should: • cache PONG messages from hosts which are online with high probability • guarantee a uniform choice among the network peers • optimize the bandwidth used by PING/PONG messages

• Any implementation of the PING/PONG protocol must satisfy the following properties: • each PONG is associated with the same GUID as the received PING • TTL(PONG) ≥ HOPS(PING) • a PONG will never be received from a peer which is more than TTL hops away from the peer who sent the PING

Nov. 2015 Angelo Furno – Networks 2 67 Gnutella: pong caching (3)

• Node A • has received a PONG(D) with HOPS = 3 in reply to a previous PING and stored it in its cache as Pc = PONG(D) • receives a PING(F) Pi with HOPS = 2 • builds a PONG(F) Po starting from the cached PONG Pc: • GUID(Po) = GUID(Pi) • HOPS(Po) = HOPS(Pc) + 1 • TTL(Po) = 7 − HOPS(Pc) • checks whether TTL(Po) < HOPS(Pi) and does not send Po if TRUE • here TTL(PONG(F)) = 7 − 3 = 4 • since 4 ≥ 2 = HOPS(PING(F)) • the PONG is sent to F

Nov. 2015 Angelo Furno – Networks 2 68 Gnutella: pong caching (4)

• Node A • stores in its cache PONG(D) with HOPS = 3 • receives a PING(I) with HOPS = 5 • builds a PONG(I) starting from the cached PONG(D) • TTL(PONG(I)) = 7 − 3 = 4 • since 4 < 5 = HOPS(PING(I)) • the PONG is discarded

Nov. 2015 Angelo Furno – Networks 2 69 Gnutella: queries

• Queries work very much like the original Ping/Pong scheme • Send a QUERY message to neighbors • with TTL=7 and Hops=0 • This defines the default “horizon” for reaching servents • As TTL increases, so does the horizon, and the number of hosts that can be reached, improving odds of a hit

• Decrement TTL and increment Hops on each received QUERY • Continue forwarding while TTL != 0 • If you have the requested file, return a QUERYHIT message
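The forwarding rule above can be sketched as follows (field and connection names are illustrative, not the actual message format):

```python
# Sketch of the QUERY forwarding rule (illustrative message representation).

def handle_query(msg, neighbors, from_conn, local_files):
    """Forward a QUERY per the TTL/Hops rules; answer with a QUERYHIT on match."""
    if msg["keyword"] in local_files:
        return "QUERYHIT"  # answer is backward-routed via the GUID table
    # decrement TTL and increment Hops on receipt
    msg = dict(msg, ttl=msg["ttl"] - 1, hops=msg["hops"] + 1)
    if msg["ttl"] == 0:
        return []          # horizon reached: stop forwarding
    # flood to all neighbors except the sender
    return [n for n in neighbors if n != from_conn]

q = {"keyword": "song", "ttl": 7, "hops": 0}
handle_query(q, ["c1", "c2"], "c1", {"other"})  # forwards to ["c2"]
```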

Nov. 2015 Angelo Furno – Networks 2 70 Query Example

(figure: a guitar song file is requested; QUERY messages flood the overlay and a QUERYHIT message is routed back)

Nov. 2015 Angelo Furno – Networks 2 71 Transferring Data

• When a file is located using the Gnutella protocol, the QUERYHIT message returns a host/port to contact to obtain the file • The connection to get data is made using HTTP • Usually (but not always) same host/port as used for Gnutella traffic

• Problems occur when servent is behind firewall • Firewall’s filter rules are usually designed to block incoming connections that are not a response to a previous outgoing request

NATed peer Non-NATed peer

Nov. 2015 72 NAT

• NAT (Network Address Translation) • router functionality: IP addresses (and possibly port numbers) of IP datagrams are replaced at the boundary of a private network • a method that enables hosts on private networks to communicate with hosts on the Internet • NAT runs on routers that connect private networks to the public Internet, replacing the IP address-port pair of an IP packet with another IP address-port pair

Nov. 2015 Angelo Furno – Networks 2 73 NAT Connection Reversal

• Scenario: peer A queries for something, B has the answer to the query, but servent B is behind a NAT • It means that external hosts cannot open connections to B • if A does not belong to the same private network as B • B can still receive queries/pings through the TCP overlay connections it has opened towards other peers on the overlay

• What happens? 1. The QUERYHIT message is sent from B to A, possibly traversing a chain of servents, including the following information • B's ServentID • firewalled flag • meaning that the host who sent the message is unable to accept incoming connections 2. When A receives the QUERYHIT from a peer X of the chain (peer B has the answer), it sends a PUSH message to X, including • B's ServentID • an index uniquely identifying the file to be pushed from the target servent • IP and port of peer A

Nov. 2015 Angelo Furno – Networks 2 74 NAT Connection Reversal (2)

3. The Push message is backward routed through the chain to B • During query resolution, each time a peer receives a QUERYHIT over a TCP connection C, it inserts in a PUSH table the pair (ServentID, C) for routing possible future PUSH messages • When a PUSH message is received • it is matched against the servent identifiers of the previous QUERYHITs • the push table is used to perform backward routing over the connection of the corresponding QUERYHIT 4. When B receives the Push message, it opens a connection to A and sends the required file

• Works only if A has a public IP address • If A is also behind a NAT/firewall, more advanced solutions are required • e.g. Universal Plug and Play; TCP/UDP hole punching; STUN

Nov. 2015 Angelo Furno – Networks 2 75 Can we improve flooding in unstructured P2P?

Thampi, Sabu M. "Survey of search and replication schemes in unstructured P2P networks." arXiv preprint arXiv:1008.1629 (2010).

Nov. 2015 Angelo Furno – Networks 2 76 Further readings

• Read more on optimized flooding in unstructured P2P: • Shiping Chen, Zhan Zhang, Shigang Chen and Baile Shi, “Efficient File Search in Non-DHT P2P Networks”, in Computer Communications 2008 • Gkantsidis, C.; Mihail, M.; Saberi, A., "Hybrid search schemes for unstructured peer-to-peer networks," in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE , vol.3, no., pp.1526-1537 vol. 3, 13-17 March 2005 • Thampi, Sabu M. "Survey of search and replication schemes in unstructured P2P networks." arXiv preprint arXiv:1008.1629 (2010) • Kai-Hsiang Yang, Chi-Jen Wu and Jan-Ming Ho " Antsearch: An Ant Search Algorithm in Unstructured Peer-to-Peer Networks " in IEICE Transactions on Communications 2007 • Read more on the Gnutella protocol: • http://rfc-gnutella.sourceforge.net/developer/stable/ (v 0.4) • http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html (v 0.6) • Eberspächer J; Schollmeier R., “First and Second Generation of Peer-to-Peer Systems,” in Peer-to-Peer Systems and Applications, vol. 3485, Chapter. Lecture Notes in Computer Science pp 35-56 • Applications based on Unstructured P2P • Pianese, F.; Perino, D.; Keller, J.; Biersack, E.W., "PULSE: An Adaptive, Incentive-Based, Unstructured P2P Live Streaming System," in Multimedia, IEEE Transactions on , vol.9, no.8, pp.1645-1660, Dec. 2007 • Details on NAT traversal techniques: • Carboni, D. “P2P and NAT”, PPT slides available online: http://www.cc.gatech.edu/classes/AY2013/cs4270_fall/nat-traversal-carboni.ppt

Nov. 2015 Angelo Furno – Networks 2 77 3.2

HYBRID-UNSTRUCTURED P2P OVERLAYS Superpeers, Gnutella 0.6, Kazaa

78 Beyond Pure Unstructured P2P

• The good • Simple and fully decentralized lookup • search cost is distributed among peers • search (without TTL limits) is guaranteed to find all query answers in the network • processing local to each node permits powerful search

• Limitations coming from flooding are severe • Search scope is O(N) • N = number of peers in the network • Message complexity is O(E) per query • E = number of overlay links • bandwidth is wasted • Time complexity equals network diameter D • What if number of peers is high and many queries are generated at the same time?

• TTL-limited search mitigates but does not solve the problem • false negatives become possible • For scalability, it is better NOT to query every node

Nov. 2015 Angelo Furno – Networks 2 79 Hybrid Unstructured P2P

• Hybrid of centralized unstructured and pure unstructured overlays

• Two Roles • Superpeers act as local search hubs • similar to a Napster server for a small portion of the network • automatically chosen by the system • based on their capacity (storage, bandwidth, etc.) and availability (connection time) • dynamically define a hierarchical level in the network • periodically exchange information on peers' resources • take much of the load off slower nodes • Peers • upload their resource descriptions to a superpeer • query the superpeers • are involved in resource transfers

• Famous P2P-hybrid overlays • Gnutella (starting from v.0.6) • Kazaa & (proprietary systems)

Nov. 2015 Angelo Furno – Networks 2 80 Hybrid Unstructured P2P: Gnutella 0.6 Case Study

• Gnutella 0.6: improvement with respect to Gnutella 0.4 • UltraPeers • execute all the functions of the Gnutella protocol • PING, PONG, QUERYHIT, QUERY, PUSH • define a superpeer overlay network • Gnutella overlay • act as proxies for the leaf nodes they manage • must have suitable characteristics • high-speed connections, not shielded by a NAT or a firewall

• LeafNodes • connect to an UltraPeer to access the Gnutella network • do not participate in the PING/PONG and QUERY/QUERYHIT protocols and should not accept GNUTELLA CONNECT requests • open only a few connections towards UltraPeers • typically to 1 superpeer only • are involved in file transfers

Nov. 2015 Angelo Furno – Networks 2 81 Hybrid Unstructured P2P: Gnutella 0.6 Case Study

• Peers autonomously detect if they have the characteristics to become UltraPeers

• Criteria for being UltraPeer capable • Firewall/NAT absence • Bandwidth: approximated by computing download/upload throughput • e.g. LimeWire, a famous client implementation, requires 10 kB/s upload and 20 kB/s download to become an UltraPeer • RAM available to store the routing tables • Operating system • some operating systems can manage a higher number of sockets than others • e.g. LimeWire forbids a node from becoming an UltraPeer if it runs an old version of the operating system • Computational power to manage the incoming queries • Future uptime • a heuristic: the future uptime is assumed proportional to the past one

Nov. 2015 Angelo Furno – Networks 2 82 Hybrid Unstructured P2P: Gnutella 0.6 Electing UltraPeers

• Superpeer election is dynamic and demand-driven • demand is estimated, e.g., by considering the total number of UltraPeers currently available in the overlay • gossip aggregation algorithms

• Simplified solution for Gnutella 0.6 based on handshake messages • periodically exchanged • exploit new headers introduced by the Gnutella 0.6 protocol • X-UltraPeer • tells whether a host plans on acting as an ultrapeer (if true) or a shielded node (if false) • X-UltraPeerNeeded • used to balance the number of UltraPeers • X-TryUltraPeer • addresses of ultrapeers to try if the connection is lost • X-Degree • number of leaves managed by an UltraPeer • X-QueryRouting • support for the Query Routing Protocol (QRP)

Nov. 2015 Angelo Furno – Networks 2 83 Hybrid Unstructured P2P: Gnutella 0.6 Leaf connecting to UltraPeer

At the end of the handshake example NODE A: • is a shielded node of the UltraPeer • should drop any incoming connection request • sends a QRP routing table to the UltraPeer

Nov. 2015 Angelo Furno – Networks 2 84 Hybrid Unstructured P2P: Gnutella 0.6 Leaf A connecting to Shielded Leaf B

• Leaf A may try to connect to Leaf B • B could have been an UltraPeer before • now it is a shielded leaf and its current state has not yet been updated on the bootstrapping nodes • B has to refuse the incoming connection

• Sometimes nodes are ultrapeer-incapable and unable to find an ultrapeer • in this case, they behave exactly like old, unrouted Gnutella 0.4 nodes (they connect to unshielded peers)

Nov. 2015 Angelo Furno – Networks 2 85 Hybrid Unstructured P2P: Gnutella 0.6 UltraPeer meeting UltraPeer (case 1)

• New connections emerge among ultrapeers • PING/PONG mechanism is performed by ultrapeers

Ultrapeer A Ultrapeer B

GNUTELLA CONNECT/0.6 X-Ultrapeer: True

GNUTELLA/0.6 200 OK X-Ultrapeer: True

GNUTELLA/0.6 200 OK

• A new connection is established between A and B

Nov. 2015 Angelo Furno – Networks 2 86 Hybrid Unstructured P2P: Gnutella 0.6 UltraPeer meeting UltraPeer (case 2) Ultrapeer A Ultrapeer B

GNUTELLA CONNECT/0.6 X-Ultrapeer: True

GNUTELLA/0.6 503 Service Unavailable X-Ultrapeer: True

• B already manages a large number of connections • B refuses A’s connection request

Nov. 2015 Angelo Furno – Networks 2 87 Hybrid Unstructured P2P: Gnutella 0.6 Leaf Guidance

• Sometimes there can be too many UltraPeer-capable nodes on the network • Leaf guidance mechanism • goal: to balance leaf nodes and UltraPeers • what: forces the choice of the role for a new peer connecting to the overlay • how: local mechanism, if executed by all the UltraPeers, the whole network tends to balance

• Implemented through the X-Ultrapeer-Needed Gnutella 0.6 header • X-Ultrapeer-Needed: True, more UltraPeers are needed and the UltraPeer role is accepted • X-Ultrapeer-Needed: False, the responding node has a low number of leaf nodes and is asking: “do you accept to become my leaf node?”

Nov. 2015 Angelo Furno – Networks 2 88 Hybrid Unstructured P2P: Gnutella 0.6 Leaf Guidance

• Each UltraPeer has • a set of k slots available for connection with further UltraPeers • n slots for connections with leaf nodes

• If a leaf node asks for a connection • this is accepted if there is room • otherwise it is refused

• If an UltraPeer A asks for a connection to a UltraPeer B • if B has just a few leaf nodes, leaf guidance is activated to accept A as a shielded leaf, instead of leaving an additional UltraPeer in the overlay • otherwise, A remains UltraPeer and a new connection towards it is accepted by B only if there is room

Nov. 2015 Angelo Furno – Networks 2 89 Hybrid Unstructured P2P: Gnutella 0.6 Leaf Guidance Node A Ultrapeer B

GNUTELLA CONNECT/0.6 X-Ultrapeer: True

GNUTELLA/0.6 200 ok X-Ultrapeer: True X-Ultrapeer-Needed: False

GNUTELLA/0.6 200 OK X-Ultrapeer: False

• B has few leaves and directs A to become a leaf node • If A has no leaf connections, it stops fetching new connections, drops any Gnutella 0.4 connections, and sends its QRP table to B • Then B will shield A from all traffic • If A has leaves, the leaf guidance is refused, A remains UltraPeer and establishes a new connection to B

Nov. 2015 Angelo Furno – Networks 2 90 Hybrid Unstructured P2P: Gnutella 0.6 Exploring the network

• Message Types: • Content Request: QUERY/QUERYHIT • Keep Alive: PING/PONG

• The ping-pong messages are exchanged only among UltraPeers • UltraPeers never propagate ping messages to their leaves • UltraPeers “shield” leaves from keep-alive traffic • An UltraPeer may receive a ping from its LeafNodes • it replies with a pong containing the information received from other SuperPeers • A LeafNode may exploit this information when its UltraPeer disconnects from the overlay • or in case it wants to open connections with further UltraPeers

Nov. 2015 Angelo Furno – Networks 2 91 Hybrid Unstructured P2P: Gnutella 0.6 Query Routing Protocol

• QRP governs how Ultrapeers filter queries by only forwarding them to the leaf nodes most likely to have a match • Goal: avoid forwarding a query that cannot match • done without even knowing the resource names • by looking at the query words through the Query Routing Table (QRT) • sent by each leaf node to its Ultrapeer • How: requires combined action of Leaf and Ultra peers

Nov. 2015 Angelo Furno – Networks 2 92 Query Routing Protocol: File Indexing on a leaf

1. Whenever a new file is available on a leaf, break the file name into individual words • word = consecutive sequence of letters and digits (ASCII characters only) • words are separated by spaces in the query 2. Hash each word with a well-known hash function and insert a "present" flag in the corresponding hash table slot • a big boolean array more than a real hashtable • it only stores the fact that a key ended up filling some slot • only words with at least 3 letters are retained 3. Re-hash words after removing their trailing 1, 2, or 3 letters • store the re-hashed word only if it is at least 3 letters long after trimming • a simple attempt to remove plurals from words 4. Optionally, chop off more letters from the end 5. The "boolean vector" is • optionally compressed • broken up into small messages • sent mixed with regular Gnutella traffic from a peer to its Ultrapeer
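Steps 1-3 can be sketched as follows (the hash function here is a deliberately simple stand-in, not the real QRP hash):

```python
import re

def qrp_words(filename):
    """Step 1: break a file name into indexable words per the QRP rules."""
    words = re.findall(r"[A-Za-z0-9]+", filename)      # letters and digits only
    return [w.lower() for w in words if len(w) >= 3]   # keep words of 3+ chars

def qrp_index(filename, table_size, hash_fn):
    """Steps 2-3: return the set of QRT slots to flag for a newly shared file."""
    slots = set()
    for w in qrp_words(filename):
        slots.add(hash_fn(w, table_size))        # hash the full word
        for cut in (1, 2, 3):                    # re-hash with 1-3 trailing letters cut
            trimmed = w[:-cut]
            if len(trimmed) >= 3:                # keep only if still 3+ letters long
                slots.add(hash_fn(trimmed, table_size))
    return slots

# illustrative hash, NOT the real QRP hash function
simple_hash = lambda w, m: sum(w.encode()) % m
slots = qrp_index("Bohemian Rhapsody - Queen", 64, simple_hash)
```

The resulting slot set is the leaf's contribution to its boolean vector: the flagged positions are set to 1 before the vector is (optionally compressed and) sent to the Ultrapeer.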

Nov. 2015 Angelo Furno – Networks 2 93 Query Routing Protocol: Indexing a new file on a leaf, example

• Indexing a new song on peer A: “Bohemian Rhapsody - Queen” • break it into 4 words “bohemian”, “rhapsody”, “-”, “queen” • discard “-” • apply the hash function h(k, m) to each word k • m = the QRT size • the hash function should hash keys of arbitrary length reasonably uniformly • should be easy to implement and efficient to compute on most platforms • h(‘bohemian’, 8) = 0, h(‘rhapsody’, 8) = 2, h(‘queen’, 8) = 5 • Peer A's routing table = bitmap 00100101 • Query Routing Table: a boolean vector R, including information on the content shared by the leaves • size from 64 kbit up to 2 Mbit

Hash function example • n = k[3]k[2]k[1]k[0] ^ k[7]k[6]k[5]k[4] ^ ... • a 32-bit integer value; k[i] = i-th byte of k • h’(n, m) = ⌊m · ((n · A) mod 1)⌋, where A = (√5−1)/2 ≈ 0.6180339887... (multiplicative hashing)

Query Routing Table example
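A sketch of the multiplicative hash described above, folding the key bytes into a 32-bit integer n and then applying h'(n, m). This follows the slide's formula directly and is not guaranteed to reproduce the example slot values (h('bohemian', 8) = 0, etc.), which depend on the exact client implementation:

```python
import math

A = (math.sqrt(5) - 1) / 2  # golden-ratio constant, ~0.6180339887

def fold_key(key):
    """XOR the key's bytes, 4 at a time (little-endian), into a 32-bit integer n."""
    data = key.encode()
    n = 0
    for i in range(0, len(data), 4):
        n ^= int.from_bytes(data[i:i + 4], "little")
    return n & 0xFFFFFFFF

def h(key, m):
    """Multiplicative hashing: h'(n, m) = floor(m * ((n * A) mod 1))."""
    n = fold_key(key)
    return int(m * ((n * A) % 1))

h("bohemian", 8)  # some slot in [0, 7]
```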

Nov. 2015 Angelo Furno – Networks 2 94 Query Routing Protocol: Query match on UltraPeers

• Store the updated boolean vectors as the routing table for each leaf node • Query resolution • queries are broken into individual words; all accented letters are removed • for each leaf node with a Query Routing Table: • each word is hashed and looked up in the Query Routing Table • using the same hashing function used by the leaf nodes • to declare a Query Routing hit, depending on the policy, either 1. all the words have to be found in the Query Routing Table, or 2. only some of them • only queries that were declared a hit at the previous stage are forwarded to a given leaf node

• NO false negatives • if any position of the bit vector corresponding to the hash of a key in Q is 0, the leaf does not possess the resource • false positives are possible

Nov. 2015 Angelo Furno – Networks 2 95 Query Routing Protocol: Query Forwarding

• QUERY message: content is requested by one leaf/UltraPeer • the leaf sends the request to its known UltraPeer • the UltraPeer looks up in its routing tables whether the content is offered by one of its leaf nodes • if so, the request is forwarded to this node • additionally, the UltraPeer increases the hop counter, decreases the TTL and forwards the request to its UltraPeer neighbours • if an UltraPeer receives such a request from another UltraPeer, the request is handled the same way as if it had been received from one of its leaf nodes • when the TTL reaches 0, the message is no longer forwarded

• NOTE: TTL-flooding limited to UltraPeers only

Nov. 2015 Angelo Furno – Networks 2 96 Query Routing Protocol: Query Response

• When a leaf node receives a request • it double-checks whether it shares the file • it may not have the content, due to the approximation of the QRT • if successful • the leaf node sends a content reply (QUERYHIT) back to the requesting peer by means of backward routing • by sending it back to the node (UltraPeer) it received the message from • hop by hop the message can thus be routed back to the requesting node

• Content exchange • directly between the leaf nodes, via HTTP connections

Nov. 2015 Angelo Furno – Networks 2 97 Query Routing Protocol: Routing the indexes

• Table Merge • UltraPeers combine their own QRT and those of their Leaf Nodes • by computing the bitwise OR of the corresponding positions in the different QRTs • Index routing • each UltraPeer sends the combined routing table to its neighbours • by exploiting regular query forwarding traffic • In some Gnutella extensions • propagate the routing tables to all the hosts within a given radius, defined by the TTL • modify the content of the QRT such that every position points out the distance from the host owning the content • Last Hop Saving • do not send the query to a neighbour if TTL = 1 and the neighbour does not have a Leaf Node owning the content • i.e., bit = 0 in all the positions corresponding to the keywords which identify the content
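The table merge is a bitwise OR over the bit vectors, and Last Hop Saving is a lookup against a neighbour's merged table; a minimal sketch (bit vectors modeled as lists of 0/1):

```python
# Sketch of QRT merging and Last Hop Saving (QRTs modeled as 0/1 lists).

def merge_qrts(own_qrt, leaf_qrts):
    """Combine the UltraPeer's own QRT with its leaves' tables via bitwise OR."""
    merged = list(own_qrt)
    for qrt in leaf_qrts:
        merged = [a | b for a, b in zip(merged, qrt)]
    return merged

def should_forward(neighbor_qrt, word_slots, ttl):
    """Last Hop Saving: skip a TTL=1 neighbour whose table cannot match."""
    if ttl == 1 and all(neighbor_qrt[s] == 0 for s in word_slots):
        return False  # no leaf behind this neighbour can own the content
    return True

merged = merge_qrts([0, 0, 1, 0], [[0, 1, 0, 0], [1, 0, 0, 0]])  # [1, 1, 1, 0]
should_forward(merged, [3], ttl=1)   # False: slot 3 is empty, query is saved
should_forward(merged, [0], ttl=1)   # True: slot 0 is set, forward the query
```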

Nov. 2015 Angelo Furno – Networks 2 98 Case Study 2: KAZAA

• Created in March 2001 by Estonian programmers from BlueMoon Interactive (the same team that later developed Skype) • FastTrack protocol • uses encryption and is not documented by its creators; closed-source software clients • client-supernode communication has been reverse-engineered • but the supernode-supernode communication protocol remains largely unknown • Two-tier architecture (similar to Gnutella 0.6) • Ordinary Nodes (ON) • notify the metadata describing the shared content to the SN • Super Nodes (SN) • connect the ONs to the Kazaa overlay; maintain the overlay; track the metadata of the ONs; query management • Load balancing • choice of SN on the basis of their workload • Connections Shuffling

• the connections defining the overlay are continuously “shuffled” using a gossip approach • Read more on Kazaa in: • Liang, Jian, Rakesh Kumar, and Keith Ross. "The Kazaa overlay: A measurement study." Proceedings of the 19th IEEE Annual Computer Communications Workshop. IEEE, 2004

Nov. 2015 Angelo Furno – Networks 2 99 Case Study 3: Skype

• Skype architecture is derived from Kazaa • developed by the makers of Kazaa, now owned by Microsoft • exploits a proprietary protocol using encryption • no documentation; some analysis based on sniffers • centralized server for logging and billing • SuperPeers responsible for: • looking for the peers with whom the call has to be established • forwarding calls and call traffic • acting as relays for NATed peers • about 60% of users are behind a NAT • Supports VoIP multi-user sessions • algorithm for merging different voice flows

Nov. 2015 Angelo Furno – Networks 2 100 Case Study 3: Skype Login

1. Login procedure performed through super nodes • find super nodes to perform login with by sending UDP packets to bootstrap (default) super nodes and waiting for their responses 2. Establish TCP connections with the retrieved super nodes (SNs) based on the responses 3. Acquire the address of a login server through the connected SNs, and authenticate the user 4. Send UDP packets to a pre-set number of nodes to advertise presence (a backup connectivity list)

• Host Cache (HC) is a list of super node IP address and port pairs that Skype Client maintains

Nov. 2015 Angelo Furno – Networks 2 101 Case Study 3: Skype Call

• Case 1: caller & callee have public IP addresses • Caller establishes TCP connection with callee Skype client; UDP for media transfer

• Case 2: Caller is behind port-restricted NAT, callee has public IP • Caller uses online Skype super node to forward signalling packets over TCP • SN acts as a relay only for establishing the call • UDP direct communication is used for media content streaming

• Case 3: Both caller and callee behind port-restricted NAT and UDP-restricted firewall • exchange signalling info with a Skype super node using TCP • caller sends media over TCP to an online node which forwards it to the callee via TCP, and vice-versa • the SN acts as a full relay between NATed nodes A and B • A and B can exchange traffic because the relay SN can bounce data back and forth; this solution is robust and fully functional, but has the big disadvantage of placing a large workload on the SN

• Read more on Skype in: • Baset, S.A.; Schulzrinne, H.G., "An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol," in INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings , vol., no., pp.1-11, April 2006: doi: 10.1109/INFOCOM.2006.312

Nov. 2015 Angelo Furno – Networks 2 102 Recap on using Hybrid P2P overlays

• Advantages • The presence of superpeers makes the network scale better • by reducing the number of network nodes involved in message handling and routing, and reducing the actual traffic among them • Low performance peers are not involved in query routing / keep alive messages • No single point of failure • Can provide anonymity • Can be adapted to special interest groups

• Disadvantages • still high signaling traffic, because of decentralization • no definitive statement possible on whether content is unavailable or simply not found • Super Nodes may become bottlenecks • overlay topology not optimal, as • no complete view available • no coordinator • if not adapted to the physical structure, delay and total network load increase • loops • cannot be fully adapted to the physical network because of the hub structure • asymmetric load (superpeers have to bear a significantly higher load)

Nov. 2015 Angelo Furno – Networks 2 103 DISTRIBUTED ARCHITECTURES AND P2P NETWORKS (PART 2)

Advanced Solutions for Networked Systems

ANGELO FURNO Lectures for the Course: INRIA, INSA de Lyon - France Networks 2 P2P Overlays: recap

Source: Eberspächer, Jörg, et al. "Structured P2P networks in mobile and fixed environments"

Nov. 2015 Angelo Furno – Networks 2 105 Outline

1. Applications (partially) relying on unstructured P2P • Spotify • Quick Introduction • BitTorrent • Quick Introduction

2. Structured P2P • Hash Tables • Distributed Hash Tables (DHTs) • CHORD

Nov. 2015 Angelo Furno – Networks 2 106 Spotify: History (references: Stefano Ferretti, Gunnar Kreitz)

• P2P-assisted on-demand music streaming • Launched in October 2008 by Swedish startup Spotify AB • Large catalog of music • over 15 million tracks • 24 million active users streamed over 4.5 billion hours of music • in 2013 • Available in more than 32 countries • USA, Europe, Asia • More than 75 million active users • as of June 2015 • including about 20 million paid users • Fast • median playback latency of 265 ms • Legal • ad-funded (and free), or monthly subscription

“We’re now at a stage where we can power music delivery through our growing number of servers and ensure our users continue to receive a best-in-class service” (Alison Bonny at TorrentFreak)

Read more on Spotify: Gunnar Kreitz and Fredrik Niemela, “Spotify - Large Scale, Low Latency, P2P Music-on-demand Streaming” (P2P 2010)

Nov. 2015 Angelo Furno – Networks 2 107 Spotify: Technical Overview

• Spotify supports media streaming • media content is distributed on the network and played out while downloading • on-demand media streaming proprietary protocol with encryption • content stored at server and peers and sent upon request • pause, fast forward, fast rewind are allowed • download rate should be higher than production/playback rate • no need for a complete download to begin the playback • Can also play locally stored and cached music files

• Client Applications • desktop clients on , OS X and Windows • smartphone clients on Android, iOS, Palm, Symbian, Windows Phone • also a Web based application • mostly developed in C++, some Objective-C++ and Java

Nov. 2015 Angelo Furno – Networks 2 108 Read more on BitTorrent: 1. Cohen Bram, "Incentives build robustness in BitTorrent." Workshop on Economics of Peer-to-Peer systems. Vol. 6. 2003. BitTorrent: History 2. Legout, Arnaud et al. "Rarest first and choke algorithms are enough." Proceedings of the 6th ACM SIGCOMM conference on Internet measurement. ACM, 2006. • In 2002, Bran Cohen debuted BitTorrent • Key Idea • Popularity exhibits temporal locality (Flash Crowds) • E.g. CNN on 9/11, new movie/game release • Efficient content distribution system using file swarming • Does not perform all the functions of a typical p2p system, like searching • Focused on Efficient Fetching, not Searching • Distribute the same file to all peers • Throughput increases with the number of downloaders via the efficient use of network bandwidth • Single publisher, multiple downloaders • Has many “legal” publishers • Blizzard Entertainment using it to distribute the beta of their new game

Nov. 2015 Angelo Furno – Networks 2 116 BitTorrent: Sharing a File

• To share a file or group of files, the initiator first creates a .torrent file, a small file that contains • Metadata about the files to be shared, and information about the tracker • address of the tracker • the SHA-1 hashes of all pieces • piece size • length of the file

• Downloaders first obtain a .torrent file, then connect to the specified tracker, which tells them from which other peers to download file pieces • the tracker is a central server keeping a list of all peers participating in the swarm • a swarm is the set of peers that are participating in distributing the same files • a peer joins a swarm by asking the tracker for a peer list and connects to those peers • Some terminology • Seeder = a peer that provides the complete file • Initial seeder = a peer that provides the initial copy of the file and informs the tracker • Leech = any peer that is downloading the file • Lurker = a peer downloading but not adding new content (it can be a seeder anyway)
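The per-piece SHA-1 hashes stored in the .torrent metadata can be computed as sketched below. This is a simplified model: real .torrent files are bencoded and carry binary (not hex) digests, and the tracker URL here is a hypothetical placeholder.

```python
import hashlib

def piece_hashes(data, piece_size):
    """Split the payload into fixed-size pieces and SHA-1 hash each one.
    Downloaders verify every received piece against these hashes."""
    hashes = []
    for off in range(0, len(data), piece_size):
        piece = data[off:off + piece_size]
        hashes.append(hashlib.sha1(piece).hexdigest())
    return hashes

payload = b"x" * 1000
meta = {  # simplified stand-in for the .torrent contents
    "tracker": "http://tracker.example/announce",  # hypothetical address
    "piece_size": 256,
    "length": len(payload),
    "pieces": piece_hashes(payload, 256),
}
len(meta["pieces"])  # 4 pieces: 256 + 256 + 256 + 232 bytes
```

Because each piece is verified independently, a peer can safely download different pieces from different, untrusted swarm members.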

Nov. 2015 Angelo Furno – Networks 2 117 3.3

STRUCTURED P2P OVERLAYS DHT, Chord, CAN, Pastry

References: Laura Ricci, Peer to Peer Systems

129 Content Publishing/Searching in Structured P2P

• Main problem in P2P overlays is content search • In unstructured P2P • content is stored (published) at the peer sharing it • due to the lack of network structuring, searching becomes very costly • Can we impose some constraints (rules, i.e. structure) on the way peers publish their resources to optimize search? • Search techniques will exploit the constraints on content publishing to optimize lookup • Addressing: associate a unique identifier ID to the content (sometimes called content key) and exploit the ID to address (i.e. retrieve) the content

Nov. 2015 Angelo Furno – Networks 2 130 Structured P2P: Motivations

• Centralized Approach: a server indexing the data • Search message/time complexity: O(1) • content index is stored in a centralized server • Memory complexity (on the server): O(N) • N = amount of shared content • Bandwidth complexity (connection server/overlay): O(N) • Complex queries may be easily managed

• Fully Distributed Approach: unstructured network • Search message complexity: O(N²) • “each node contacts each of its neighbours” • possible optimizations (TTL, identifiers to avoid cyclic paths) • Search time complexity: O(N) • Memory complexity (on each peer): O(1) • does not depend on the number of nodes in the system • no data structure to route queries (flooding) • Complex queries may be easily managed

Nov. 2015 Angelo Furno – Networks 2 131 Structured P2P: Motivations (2)

Hybrid P2P (with TTL) + flooding optimization

Nov. 2015 Angelo Furno – Networks 2 132 Structured P2P: Motivations (2)


Nov. 2015 Angelo Furno – Networks 2 133 Hash Table

• A hash table (hash map) is a data structure • used to implement associative arrays (i.e., maps) • an abstract data type that maps keys to values • when implementing a map with a hash table, the goal is • to store an item (k, o) at index i = h(k) • to retrieve an item by knowing its key k and index i = h(k) • uses a hash function h to compute an index into an array of N buckets or slots • should guarantee a uniform distribution of the content • better than using keys directly • h maps keys of a given type T to integers in a fixed interval [0, N - 1] • T can be a very large domain • If N < |T| => collisions • Simple example • values to insert: [0, 1, 4, 9, 16, 25] • array of size 10 • key = value • hash(k) = k mod 10
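The simple example works out as follows (for these particular values, no two keys happen to collide):

```python
# Hash table example from the slide: keys equal values, hash(k) = k mod 10.
N = 10
table = [None] * N            # array of N buckets

def h(k):
    return k % N              # hash function: k mod 10

for k in [0, 1, 4, 9, 16, 25]:
    table[h(k)] = k           # store item k at index h(k)

print(table)  # [0, 1, None, None, 4, 25, 16, None, None, 9]
```

Retrieval is symmetric: to look up key 16, compute h(16) = 6 and read slot 6 directly, without scanning the array.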

Nov. 2015 Angelo Furno – Networks 2 134 Hashtables (Cont’d) and DHT

• Collisions occur when different keys are mapped to the same cell • Separate chaining • let each cell in the table point to a linked list of the entries that map there • The interface includes • insert(key, value) • value = lookup(key) • value = remove(key)

• Distributed Hash Tables • Map hash-buckets to nodes • Requirements • Uniform distribution of buckets • Cost of insert and lookup should scale well • Amount of local state (routing table size) should scale well

Nov. 2015 Angelo Furno – Networks 2 135 Distributed Hash Tables

• In a DHT, each node is responsible for the management of one or more buckets • when a node enters/leaves, the buckets are partially re-assigned • the nodes communicate among themselves to find, by key, the node managing a bucket • requires a scalable and efficient communication mechanism • all the operations of a classical hash table are supported

• The typical behaviour:
  • a node knows the key of the content it is looking for
  • routing leads to the node responsible for the bucket where the key is located
  • that node directly sends back the content, or a pointer to the content (if any)

DHT: Data Management

• Basic idea: the same address space is used for mapping both nodes and data
• How?
  • a hash function pairs a unique identifier
    • with each node – hashing the IP address + port number, URL, etc.
    • with each data item stored in the P2P network, computed from its key – for files, typically the whole filename is hashed
  • each node is responsible for managing a portion of the logical address space (one or more buckets) used for the data
    • a global ordering is defined over the address space (e.g. via a modulo operation)
    • some replication (redundancy) is often introduced
    • the correspondence between data and nodes may vary in time, due to nodes joining/leaving
• Data search
  • each node maintains a routing table, which gives a partial view of the system
    • typically only a small number of connections, chosen so that the maximum number of hops to traverse is minimized
  • key-based routing towards the node responsible for the data, guided by the unique identifier (key) of the data being looked up
  • NO false negatives
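The key-to-node assignment can be sketched with a small helper: the node in charge of a key is the first node ID greater than or equal to the key on the ring, wrapping around at the top of the address space. The 12-bit space and the node IDs below mirror the figure on the next slide and are purely illustrative.

```python
from bisect import bisect_left

def responsible_node(key_id, node_ids, space=2 ** 12):
    """First node whose ID is >= key_id on the ring, wrapping
    around modulo the address-space size."""
    ring = sorted(n % space for n in node_ids)
    i = bisect_left(ring, key_id % space)
    return ring[i % len(ring)]   # past the highest ID, wrap to the smallest
```

With nodes 1007, 1621 and 3042, node 1621 manages every bucket from 1008 up to 1621, as in the figure.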

Structured P2P: DHT

• The address space is a sequence of integers (e.g. 12 bits -> integer values in [0, 4095]) with a global ordering defined over it

[Figure: ring of buckets – the node with ID 1621 manages the buckets with index from 1008 to 1621; the preceding node manages 611 to 1007]

[Figure: lookup message/time complexity (per query) and memory complexity (per peer) of the DHT approach]

DHT: Data Storage

• Direct storage
  • the DHT stores (key, value) pairs: when a data item is inserted into the DHT, it is stored as the value on the responsible node
    • that node is not, in general, the node which inserted the data
  • Example: key = H("Data") = 3107 – the string "Data" is stored on the node managing the address portion that includes ID 3107

• Indirect storage
  • the value is a reference to the data, e.g. the physical address of the node holding it
  • the node holding the content is the same node which requested its insertion into the DHT
  • a more flexible solution, but it needs a further step to access the data
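Both storage modes can be mimicked with a toy model. Everything here is an assumption made for illustration: the three node IDs, the 12-bit hash, and the address string; real DHTs differ in all these details.

```python
import hashlib

SPACE = 2 ** 12
nodes = {1007: {}, 1621: {}, 3042: {}}   # node ID -> local (key_id, value) store

def h(key: str) -> int:
    """Illustrative 12-bit hash of a string key."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % SPACE

def responsible(key_id: int) -> int:
    ring = sorted(nodes)
    return next((n for n in ring if n >= key_id), ring[0])  # wrap around

def put_direct(key: str, data: bytes):
    nodes[responsible(h(key))][h(key)] = data               # store the bytes themselves

def put_indirect(key: str, holder_addr: str):
    nodes[responsible(h(key))][h(key)] = ("ref", holder_addr)  # store only a pointer

def get(key: str):
    return nodes[responsible(h(key))].get(h(key))
```

With direct storage, `get` returns the content immediately; with indirect storage it returns a reference, and a further download step from the holder is needed.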

DHT: Routing

• The search for data "my_data" starts at an arbitrary node of the DHT and is guided by key = hash("my_data")
  • each node has a partial view of the other nodes (routing table)
    • its successor on the address-space ring, plus a few additional connections to "far" nodes (e.g. node 611 has 2011 as a neighbour)
  • the next hop is computed by a routing algorithm over the address space managed by the nodes
    • e.g. it can exploit the closeness between the searched key and the node IDs of the neighbours in the routing table
  • termination:
    • the routing algorithm reaches the node in charge of the key associated with the requested resource
    • that node checks whether a (key, value) pair was previously stored on it
    • it replies to the initiator with the value or Null

DHT: Data Retrieval

• Content download
  • upon termination, the responsible node sends its IP address and port back to the requesting peer through the reverse path
  • if direct storage was used, the requestor opens a connection to that node and downloads the content
  • if indirect storage was used, the requesting peer downloads the content from a third peer (the one holding the data)

DHT: Node JOIN

• On startup, contact a bootstrap node and integrate yourself into the distributed data structure; get a node ID
• Churn: peers join and leave the network frequently
  • the address space must be maintained among the living peers
• Initial configuration:
  • peers (colored circles) are hashed to an address space
  • data items (gray squares) are hashed, using their associated keys, to the same address space
  • each peer is assigned a share of the address space according to its ID, and stores the (references to) data hashed to that portion
• Whenever a new peer joins:
  • the peer is hashed and its neighbours are detected
  • the new peer is assigned a share of the address space, taken from its neighbours

[Figure: peers and data items on the address space – initial configuration vs. configuration after a join]
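The hand-over of an address-space share on a join can be sketched as follows. This is a simplification made for the example: it ignores ring wrap-around, and the node IDs and stored keys are made up.

```python
def join(nodes, new_id):
    """A joining peer takes, from its successor on the ring, the keys
    whose IDs now fall into the new peer's share of the address space
    (wrap-around at the top of the ring is ignored for brevity)."""
    ring = sorted(nodes)
    succ = next((n for n in ring if n >= new_id), ring[0])
    moved = {k: v for k, v in nodes[succ].items() if k <= new_id}
    for k in moved:
        del nodes[succ][k]       # the successor gives up part of its share
    nodes[new_id] = moved        # the new peer now stores those keys
    return succ
```

After `join`, the successor keeps only the keys above the new peer's ID; everything at or below it has moved.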

DHT: Node LEAVE

• When a peer leaves the DHT, its neighbours inherit responsibility for the data it stored and the address space it managed
  • which neighbours depends on the DHT
• Voluntary leave of a node
  • its address space is partitioned among the neighbour nodes
  • key/value pairs are copied to the corresponding nodes
  • the node is deleted from the routing tables of the other nodes
• Node failure: the node suddenly disconnects from the network
  • all data stored on it is lost unless it is also stored on other nodes
    • need for some redundancy (data replication) and periodic refresh
  • exploit alternative/redundant routing paths
  • periodically probe the neighbour nodes to detect their activity; when a fault is detected, update the routing tables
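A voluntary leave is the mirror image of a join, and can be sketched the same way (again with illustrative IDs, handing everything to the single successor for simplicity):

```python
def leave(nodes, leaving_id):
    """Voluntary leave: copy the departing peer's (key, value) pairs to
    its successor on the ring, then drop the peer from the node set."""
    ring = sorted(nodes)
    succ = next((n for n in ring if n > leaving_id), ring[0])
    nodes[succ].update(nodes.pop(leaving_id))   # successor inherits the keys
    return succ
```

A failure, by contrast, skips the copy step entirely, which is why replication and periodic probing are needed.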

DHT: API and Applications

• DHTs offer a generic distributed service for information storing and indexing
• The value paired with a key may be
  • files
  • IP addresses
  • or any other data…
• Applications exploiting a DHT
  • DNS implementation – key: host name, value: list of corresponding IP addresses
  • P2P storage systems – e.g. PAST
  • distributed middleware for higher-level services
• Some existing implementations:
  • Chord – UC Berkeley, MIT
  • Pastry – Microsoft Research, Rice University
  • Tapestry – UC Berkeley
  • CAN – UC Berkeley, ICSI
  • P-Grid – EPFL Lausanne
  • the Kademlia protocol and its KAD network implementation in eMule
  • Symphony, Viceroy, …

DHT: RECAP

• Structured Overlay Routing: • Join: On startup, contact a “bootstrap” node and integrate yourself into the distributed data structure; get a node id • Publish: Route publication for file id toward a close node id along the data structure • Search: Route a query for file id toward a close node id. Data structure guarantees that query will meet the publication. • Fetch: Two options: • Publication contains actual file => fetch from where query stops • Publication says “I have file X” => query tells you 128.2.1.3 has X, use IP routing to get X from 128.2.1.3

DHT: CHORD Case Study

• Chord
  • Stoica, I.; Morris, R.; Karger, D.; Kaashoek, M. F.; Balakrishnan, H. (2001). "Chord: A scalable peer-to-peer lookup service for internet applications". ACM SIGCOMM Computer Communication Review 31 (4): 149
  • opened the body of research on distributed hash tables and structured P2P
  • solves the problem of locating a data item in a collection of distributed nodes, taking into account frequent node arrivals and departures
  • the core operation in most P2P systems is the efficient location of data items
    • given a key (e.g. a filename), Chord maps the key (and the corresponding object/reference) onto a node
• The protocol specifies
  • how to find the locations of keys
  • how new nodes join the system
  • how to recover from the failure or planned departure of existing nodes

Chord: Consistent Hashing

• Assign unique m-bit identifiers to both nodes (peers holding the buckets) and objects (e.g. files)
  • the hash key space is represented as a ring
  • the global ordering of the ID set X follows the numerical ordering modulo |X|
• Use a hash function that evenly distributes items over the hash space, with a low collision probability
  • SHA-1 (m = 160 bits), a cryptographic hash function designed by the United States National Security Agency
    • SHA-1 produces a 160-bit (20-byte) hash value known as a message digest
    • for any set of inputs, the outputs are uniformly distributed over the output space
      • one half of the output values have 1 as their first bit, one half of those have 1 as their second bit as well, etc.

[Figure: ID space represented as a ring on binary keys of length m (IDs from 0 to 2^m - 1); consistent hashing of peers N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56 on a 6-bit hash space; ID(node) = hash(IP, port), ID(key) = hash(key)]

Chord: Consistent Hashing (2)
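The ID assignment can be sketched directly from the definitions above: hash the node's IP and port (or the data's key) with SHA-1, then truncate to m bits. The 6-bit space matches the slides' example ring; the IP address below is made up.

```python
import hashlib

M = 6  # identifier bits, matching the slides' 6-bit example ring

def chord_id(data: str, m: int = M) -> int:
    """Map a string (IP:port for a node, a key for data) to an m-bit
    Chord identifier via SHA-1."""
    digest = hashlib.sha1(data.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

node_id = chord_id("192.168.0.7:4000")   # ID(node) = hash(IP, port) -- address is illustrative
key_id = chord_id("A beautiful Day")     # ID(key) = hash(filename)
```

Because SHA-1's outputs are uniformly distributed, taking them modulo 2^m keeps node and key IDs spread evenly over the ring.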

• For any set of N nodes and K keys, consistent hashing guarantees:
  • load balancing: each node is responsible for O(K/N) keys – a limited, even number of keys per node
  • when a node joins (or leaves) the network, only an O(1/N) fraction of the keys is moved to a different location – few changes occur when a node joins/leaves the DHT

[Figure: consistent hashing on keys with a 6-bit hash space – keys K5 (hash of the filename "A beautiful Day"), K10 and K38 placed on the ring; a new key is stored at the node with the next higher or equal ID (i.e., the key's successor)]
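The "next higher or equal ID" rule can be sketched with the node IDs from the figure (a subset of the example ring; a linear scan is used for clarity rather than efficiency):

```python
RING = [1, 8, 14, 21, 30, 38, 42, 48, 51, 56]  # node IDs from the slides' figure

def successor(key_id, ring=RING, m=6):
    """The key's successor: the first node with ID >= key_id,
    wrapping around modulo 2**m past the highest node ID."""
    for n in sorted(ring):
        if n >= key_id % (2 ** m):
            return n
    return min(ring)   # wrapped past the highest ID: smallest node takes it
```

So K5 is stored at N8, K10 at N14, and K38 at N38 itself, matching the figure.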

Chord: Node Joins and Leaves

• Node 7 joins: successor(6) = 7, so key 6 is now stored at the new node
• Node 1 leaves: successor(1) = 3, so key 1 is now handled by node 3

[Figure: 3-bit ring (IDs 0 to 7) with keys 1, 2 and 6, before and after the changes; Ref: Markus Böhning]

Chord: Basic Lookup

• Each node knows only its successor
• Routing proceeds around the circle, one node at a time

[Figure: the query "Where is the object with key 4?" is forwarded successor by successor around the ring until it reaches the node responsible for key 4]

Chord: Efficient Lookup

• The basic lookup scheme is correct, BUT inefficient: it may require traversing all N nodes
• Lookups are accelerated by maintaining additional routing information
  • each node maintains a routing table with (at most) m entries (for an m-bit identifier space of 2^m IDs), called the finger table
  • the i-th entry in the table at node n contains the ID (and IP + port) of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (i in range [1, m])
    • s = successor(n + 2^(i-1))
    • remember: all arithmetic is mod 2^m
    • s is called the i-th finger of node n, denoted n.finger(i).node
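The finger-table rule can be checked against the 3-bit example ring {0, 1, 3} used on the next slides (a sketch: entries here carry only the successor ID, not IP and port):

```python
M = 3              # 3-bit identifier space, as in the slides' example
NODES = [0, 1, 3]  # the example ring

def node_successor(i):
    """First node with ID >= i on the ring, wrapping mod 2**M."""
    i %= 2 ** M
    return next((n for n in sorted(NODES) if n >= i), min(NODES))

def finger_table(n):
    """Entry i (1-based) holds successor(n + 2**(i-1)); arithmetic is mod 2**M."""
    return [node_successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]
```

The computed tables reproduce the figure: node 0's fingers are 1, 3, 0; node 1's are 3, 3, 0; node 3's are 0, 0, 0.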

Chord: Finger Table

• Starting value at entry i: start_i = (n + 2^(i-1)) mod 2^m
• ID interval (used for lookup): [start_i, start_(i+1))
• Successor at entry i: successor(start_i) = successor(n + 2^(i-1))

[Figure: 3-bit ring with nodes 0, 1, 3 and keys 1, 2, 6 (stored at nodes 1, 3 and 0 respectively), showing each node's finger table:
  node 0: start 1, 2, 4; interval [1,2), [2,4), [4,0); successor 1, 3, 0
  node 1: start 2, 3, 5; interval [2,3), [3,5), [5,1); successor 3, 3, 0
  node 3: start 4, 5, 7; interval [4,5), [5,7), [7,3); successor 0, 0, 0]

Chord: Finger Table (2)

• Each node stores information about only a small number of other nodes, and knows more about the nodes closely following it on the ring than about nodes farther away
• A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k
• Repeated queries to nodes that immediately precede the given key eventually lead to the key's successor

Chord: Node Join with Finger Table

[Figure: node 6 joins the ring {0, 1, 3}; it builds its own finger table (start 7, 0, 2; interval [7,0), [0,2), [2,6); successor 0, 0, 3), and existing entries are updated: node 0's finger for start 4 changes from 0 to 6, node 1's finger for start 5 changes from 0 to 6, and node 3's fingers for starts 4 and 5 change from 0 to 6]

Chord: Node Leave with Finger Table

[Figure: node 1 leaves the ring {0, 1, 3, 6}; entries pointing to node 1 are updated to its successor: node 0's finger for start 1 changes from 1 to 3, and key 1 (previously stored at node 1) is inherited by node 3]

Chord: Lookup with Finger Tables

• Upon receiving a query for the item with key k, a node:
  • checks whether it stores the item locally
  • if not, forwards the query to the farthest node in its finger table whose ID most immediately precedes k

[Figure: query(0) is routed via finger tables on the ring {0, 1, 3, 6} until it reaches node 0]
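The lookup rule above can be sketched on the same 3-bit ring {0, 1, 3, 6} (a simplified model: fingers are recomputed on the fly instead of stored, and only IDs are routed, not real messages):

```python
M = 3
NODES = [0, 1, 3, 6]   # the ring after node 6 joins, as in the slides

def successor(i):
    i %= 2 ** M
    return next((n for n in sorted(NODES) if n >= i), min(NODES))

def in_open(x, a, b):
    """x strictly inside the ring interval (a, b), wrapping mod 2**M."""
    return a < x < b if a < b else x > a or x < b

def closest_preceding_finger(n, key):
    fingers = [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]
    for f in reversed(fingers):        # try the farthest finger first
        if in_open(f, n, key):
            return f
    return n

def find_successor(n, key):
    """Route a query for `key` starting at node `n`; returns the node
    responsible for the key."""
    while True:
        s = successor(n + 1)                      # n's immediate successor
        if in_open(key, n, s) or key == s:        # key in (n, s]: found it
            return s
        nxt = closest_preceding_finger(n, key)
        if nxt == n:                              # no closer finger exists
            return s
        n = nxt
```

For example, query(0) started at node 3 first hops to node 6 (node 3's farthest finger preceding 0) and terminates at node 0.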

Chord: Stabilization protocol

• The basic stabilization protocol keeps nodes' successor pointers up to date, which is sufficient to guarantee the correctness of lookups

• Those successor pointers can then be used to verify the finger table entries

• Every node runs stabilize periodically to find newly joined nodes

Chord: Stabilization protocol (2)

• Scenario: node n joins the ring between np and np's current successor ns
  • n joins
    • n.predecessor = nil
    • n acquires ns as its successor via some node n'
    • n notifies ns, being its new predecessor
    • ns acquires n as its predecessor
  • np runs stabilize
    • np asks ns for its predecessor (now n)
    • np acquires n as its successor
    • np notifies n
    • n acquires np as its predecessor
  • all predecessor and successor pointers are now correct
  • fingers still need to be fixed, but old fingers will still work
    • the fix_fingers protocol is called periodically
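The stabilize/notify pair can be sketched with minimal nodes holding only the two pointers (a sketch of the protocol's pointer logic, not of the RPC machinery; IDs are illustrative):

```python
def in_open(x, a, b):
    """x strictly inside the ring interval (a, b), wrapping around."""
    return a < x < b if a < b else x > a or x < b

class Node:
    """Minimal Chord node keeping only successor/predecessor pointers."""

    def __init__(self, nid):
        self.id = nid
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # ask the successor for its predecessor; adopt it if it lies between us
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, n):
        # n claims to be our predecessor; accept if it is closer than the current one
        if self.predecessor is None or in_open(n.id, self.predecessor.id, self.id):
            self.predecessor = n
```

Replaying the scenario above (np = 1, ns = 5, n = 3 joins between them), one round of stabilize on n and then on np leaves every pointer correct.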

Chord: Stabilization protocol (3)

• The key step in failure recovery is maintaining correct successor pointers
  • to help achieve this, each node maintains a successor-list of its r nearest successors on the ring
  • if node n notices that its successor has failed, it replaces it with the first live entry in the list
  • the fix_fingers and stabilize protocols then correct finger table entries and successor-list entries pointing to the failed node

• Performance is sensitive to the frequency of node joins and leaves versus the frequency at which the stabilization protocol is invoked

Chord: Recap

• Memory complexity: every node is responsible for about K/N keys (N nodes, K keys)
• When a node joins or leaves an N-node network, only O(K/N) keys change hands
  • and only to and from the joining or leaving node
• Lookups need O(log N) messages
• Churn complexity: re-establishing the routing invariants and finger tables after a node joins or leaves requires O(log^2 N) messages

• Pros
  • guaranteed lookup
  • O(log N) per-node state and search scope
• Cons
  • the stabilization protocol becomes very costly in highly dynamic environments
  • supporting non-exact key-match search is hard
  • peers are coupled to their location in the key space (derived from their global IP)
    • they lack autonomy in deciding which data they want to store
  • the network can only evolve from a single origin
    • either an agreed-upon entry point exists, or a global identification needs to be maintained

Other DHT implementations: CAN

• CAN (Content Addressable Network)
  • based on hashing keys and nodes into a d-dimensional Cartesian space (a d-torus)
  • each peer is responsible for the keys of a sub-volume of the space (a zone)
  • each peer stores the addresses of the peers responsible for the neighbouring zones, for routing
  • search requests are greedily forwarded to the peers in the closest zones
  • the assignment of peers to zones depends on a random selection made by the peer

DHTs: Summary
(d = number of dimensions, k = number of keys, n = number of nodes)

                    # Links per node         Routing hops
  Pastry/Tapestry   O(2^b * log_{2^b} n)     O(log_{2^b} n)
  Chord             log n                    O(log n)
  CAN               d                        d * n^(1/d)
  SkipNet           O(log n)                 O(log n)
  Symphony          k                        O((1/k) * log^2 n)
  Koorde            d                        log_d n
  Viceroy           7                        O(log n)

Koorde and Viceroy achieve the optimal (= lower bound) trade-off.

Applications using DHT

• Databases, file systems, storage, archival
  • CFS [Chord] – block-based read-only storage
  • PAST [Pastry] – file-based read-only storage
  • Ivy [Chord] – block-based read-write storage
• Web serving, caching
  • BitTorrent, KaZaA, etc., all use peers as caches for hot data
• Content distribution
• Query & indexing
• Naming systems
• Communication primitives
• Chat services
• Application-layer multicasting
• Event notification services

Recap on P2P overlay networks

• Many different styles of P2P overlay
  • useful to build scalable distributed systems – not only file sharing
  • remember the pros and cons of each: centralized, flooding, swarming, unstructured and structured routing

• Take away messages • Single points of failure are very bad • Flooding messages to everyone is bad • Underlying network topology is important • Not all nodes are equal • Need incentives to discourage freeloading • Privacy and security are important • Structure can provide theoretical bounds and guarantees

THANK YOU FOR YOUR ATTENTION!

Angelo Furno INRIA, INSA de Lyon, Lyon – France

E-mail: [email protected]
