University of Wollongong Thesis Collection

A reputation system for BitTorrent peer-to-peer file-sharing networks

Lan Yu University of Wollongong

Yu, Lan, A reputation system for BitTorrent peer-to-peer file-sharing networks, M.Comp.Sc. thesis, Information Technology and Computer Science, University of Wollongong, 2006. http://ro.uow.edu.au/theses/637

This paper is posted at Research Online. http://ro.uow.edu.au/theses/637

UNIVERSITY OF WOLLONGONG

A Reputation System for BitTorrent Peer-to-Peer File-sharing Networks

A thesis submitted in fulfillment of the requirements for the award of the degree

Master of Computer Science by Research

from

UNIVERSITY OF WOLLONGONG

by

Lan YU

Faculty of Informatics

August 2006

© Copyright 2006

by

Lan YU

All Rights Reserved

Dedicated to

My family

Declaration

This is to certify that the work reported in this thesis was done by the author, unless specified otherwise, and that no part of it has been submitted in a thesis to any other university or similar institution.

Lan YU August 31, 2006

Abstract

Over the past few years, Peer-to-Peer (P2P) networks have grown extensively and dramatically changed large-scale file transfer. One of the most popular P2P networks is the BitTorrent system. BitTorrent can efficiently distribute large files by optimizing the use of network bandwidth and providing scalability. Due to the open and anonymous nature of P2P systems, BitTorrent also provides an ideal environment for the distribution of malicious, low-quality, or doctored information. A number of reputation systems, including P2PRep with its successors XRep and X2Rep, have been proposed to address the security weaknesses of P2P file-sharing networks. Although it has been claimed that these methods are also applicable to other file-sharing networks, it is not clear how this can be achieved. Moreover, some shortcomings of these reputation systems, such as online-polling only and cold-start, may be exploited by malicious attackers. In this thesis, we propose a reputation system, called X2BTRep, which extends X2Rep to BitTorrent networks. We show that the proposed system improves the security and the quality of information distributed over P2P networks.

Acknowledgements

First of all, I would like to gratefully acknowledge Rei Safavi-Naini, my supervisor, for pointing me in the right direction in my research, Network Security in Peer to Peer System. During this research, she provided me with much valuable advice and constant support. My sincere appreciation also goes to Willy Susilo, my co-supervisor, for his guidance through the past two years of chaos and confusion in my study and life.

Moreover, I am thankful to Juliet Richardson, International Study Program Coordinator, for carefully reading the draft and offering grammar corrections and writing suggestions.

Additionally, my thanks go to all staff in my Faculty, School of Information Technology and Computer Science, for offering many facilities, including space, software and hardware, telephone, etc., to support my research.

Furthermore, I thank my family and friends for their encouragement, which helped me to finish this research.

Publications

L. Yu. X2BTRep trusted reputation system: A robust mechanism for P2P networks. HDR Student Conference, University of Wollongong, Australia, August 2006.

L. Yu, W. Susilo, and R. Safavi-Naini. X2BTRep trusted reputation system: A robust mechanism for P2P networks. The 5th International Conference on Cryptology and Network Security, China, August 2006.

Contents

Abstract
Acknowledgements
Publications

1 Introduction
1.1 Aims and Objectives
1.2 Structure of the Thesis
1.3 Our Contribution
1.4 Glossary
1.5 Notations

2 Peer-to-Peer: Now and the Future
2.1 What is a P2P Network?
2.2 Generational Evolution of P2P Networks
2.2.1 First Generation – Centralised P2P
2.2.2 Second Generation – Decentralised P2P
2.2.3 Third Generation – Hybrid P2P
2.2.4 Current Developments
2.3 Advantages and Weaknesses of P2P Networks
2.4 Summary

3 Previous Studies of Reputation Systems
3.1 Introduction
3.2 eBay
3.3 P2PRep
3.4 XRep
3.5 X2Rep
3.6 Credence
3.7 Summary

4 Overview of BitTorrent Network
4.1 Introduction
4.2 Architecture
4.3 Protocol Specification
4.3.1 Bencoding
4.3.2 Metainfo File (.torrent file)
4.3.3 Tracker HTTP protocol
4.3.4 Peer TCP Protocol
4.4 Strengths of BitTorrent
4.5 Shortcomings of BitTorrent
4.5.1 Distribution of false information
4.5.2 Man-in-middle attack
4.5.3 IP harvesting
4.6 Summary

5 A Robust Reputation Management System – X2BTRep
5.1 Introduction
5.2 Principle of our Management Model
5.2.1 Assumptions
5.2.2 Distributed Repositories
5.2.3 Voting
5.2.4 Centralised Repositories
5.2.5 Secure Sockets Layer
5.3 Further Discussion of Two Novel Algorithms
5.3.1 Credibility Award Algorithm
5.3.2 Credibility Chain Exchange
5.4 Protocol Designs
5.4.1 Initialisation Phase: Enhanced Torrent Creation
5.4.2 Phase 1: Torrent Search
5.4.3 Phase 2: Exchange of Votes
5.4.4 Phase 3: Votes Evaluation
5.4.5 Phase 4: Tracker Queries
5.4.6 Phase 5: Pieces and Credibility Chain Exchange
5.4.7 Phase 6: Updating and Voting Reputation Value
5.5 Summary

6 Critical Evaluation of X2BTRep
6.1 Introduction
6.2 Assessment of X2BTRep
6.3 Attacks on BitTorrent-like systems
6.3.1 Defection attack
6.3.2 Poisoning attack & Insertion of viruses in carried data
6.3.3 Denial of service attack
6.3.4 Malware software in BitTorrent networks
6.3.5 Identity attack
6.3.6 Spamming
6.4 Attacks on Reputation-based Systems
6.4.1 Pseudospoofing
6.4.2 Reputation spoofing
6.4.3 Whitewashing attack
6.4.4 Reputation attack by collectives
6.4.5 Referral attack
6.5 Summary

7 Implementation and Interpretation of Experiment
7.1 Introduction
7.2 Implementation of X2BTRep in the BitTorrent Environment
7.2.1 X2BTRep Extensions to BitTorrent
7.2.2 Repository Schema
7.2.3 Additional Protocol Messages
7.3 Experiment
7.3.1 I. Intention of Experiment
7.3.2 II. Experiment Setting
7.3.3 III. Working Principle
7.3.4 IV. Malicious Strategies
7.3.5 V. Simulation Result
7.3.6 VI. Conclusion of Our Experiment
7.4 Comparison with other Reputation Systems
7.5 Summary

8 Conclusions and Future Work

Bibliography

List of Tables

2.1 The definition of five descriptors in the Gnutella protocol
3.1 The basic polling protocol in P2PRep system
3.2 The enhanced polling protocol in P2PRep system
3.3 The X2Rep reputation system protocol
3.4 The summary of various trusted reputation mechanisms
7.1 The summary of a comparison between X2BTRep and other trust reputation mechanisms

List of Figures

2.1 First Generation – Centralised P2P Architecture
2.2 Sequence of operations in the Napster protocol
2.3 Second Generation – Decentralised P2P Architecture
2.4 Sequence of operations in the Gnutella v0.4 protocol
2.5 Third Generation – Hybrid P2P Architecture
4.1 BitTorrent Architecture
4.2 A glance at torrent files of from mininova.org webpage
5.1 Sequence of messages in enhanced torrent creation phase
5.2 Number of active peers over time (sourced from [26])
5.3 Sequence of messages and operations in X2BTRep protocol
7.1 BitTorrent’s Information Flow with X2BTRep protocol extensions
7.2 Basic working principle based on a genuine resource and honest clients
7.3 Working principle based on a malicious resource and honest clients
7.4 Working principle based on a malicious resource and several dishonest pollers
7.5 Simulation results in different situations

Chapter 1

Introduction

Peer-to-peer (P2P) file sharing is one of the most significant technical models of the internet. Unlike the traditional client-server architecture, P2P networks equip each node with equivalent capabilities and responsibilities, and nodes can share computer resources and services via direct connections. Over the last few years, a series of P2P networks and channels, such as Napster [38], Gnutella [45], Kazaa [33], eDonkey [19] and BitTorrent [53], have been developed. Among them, BitTorrent is the most influential and innovative protocol [34, 41], designed to be a large-scale file distribution tool. Recent studies [40, 46, 43] indicate that BitTorrent consumes more than a third of the internet’s bandwidth and is rapidly emerging as the preferred means for many content providers to distribute legitimate content, such as the free computer operating system Linux.

One of the primary advantages of BitTorrent is that resources and services can be easily contributed, searched for and obtained. It also utilises the unused upload capacity of downloaders, which overcomes the problem of free-riding (i.e. users prefer to download but refuse to upload [3]) that occurs in other P2P networks. As a result, network bandwidth can be used as efficiently as possible. Moreover, several other significant features, such as anonymity, scalability, fault tolerance and diversity in service, provide BitTorrent with sufficient potential for future growth.

Along with the aforementioned advantages, inherent risks and threats in BitTorrent have become a stumbling block to further progress. First, due to anonymity, misbehaving users can arbitrarily distribute low-quality or even malicious content, such as Trojan horses and viruses, over the network without witnesses. Second, it provides a good environment for malicious attackers to subvert systems while hidden, because no rules are enforced on joining, leaving or staying in the system. Third, in a BitTorrent network, users can easily expose private information, such as their IP address and port, when joining the system, and hence their privacy can be violated by adversaries.

Previous evidence and studies [2, 6, 42] show that reputation systems are a robust solution to protect P2P networks from malicious attacks. In a reputation system, the quality of a given resource/peer is determined by a user, based on historical information from other users [47]. The main advantage of such reputation systems is that they protect against most known attacks and vulnerabilities while retaining anonymity and keeping overheads minimal. During the past few years, research has been conducted to develop several reputation system protocols to protect P2P networks, such as P2PRep, XRep, and X2Rep. Unfortunately, these protocols are only designed for Gnutella-like P2P networks, which have a different architecture from BitTorrent. Therefore, none of them can be integrated into the BitTorrent protocol, even though some proposals claim that they can be adapted to any P2P system. Moreover, they have shortcomings such as online-polling only, cold-start and performance bottleneck problems.

1.1 Aims and Objectives

Below, we identify a set of key questions that address the aims and objectives of the thesis.

• It is well-known that P2P architecture is one of the key technical concepts of the internet. Over the last few years, P2P technology has seen an explosion of various applications. So, what is P2P and its characteristics? How has P2P been developed during recent years? What advantages and disadvantages does P2P have? And what is the future of P2P?

• Reputation systems are able to establish trust between genuine people, encourage honest behaviour, and prevent fraudulent behaviour. What is the nature of reputation systems? What is the world’s biggest online marketplace, eBay, and how does it work? What are the characteristics of other reputation systems? And what are the security issues of these protocols?

• BitTorrent is now the dominant P2P application in use. What are the main features of BitTorrent that have helped it to become the most influential and innovative P2P internet application? How does BitTorrent work? And what are the weaknesses of BitTorrent?

• What is a trusted reputation system? How does it work? Why is it able to protect against weaknesses associated with P2P systems? And what are the main features of current applications of reputation management systems for P2P systems?

• What is the purpose of our proposal? How does our proposal work and how is it integrated into BitTorrent? And what security improvements does our proposal make in BitTorrent-like networks?

• How does our proposal become reality? And how can our experiment demonstrate that this application is convincing?

1.2 Structure of the Thesis

The rest of this thesis is structured as follows: Chapter 2 presents a generational review of P2P systems. Chapter 3 analyses several reputation systems including predecessors to ours such as XRep and X2Rep, and discusses their weaknesses. Chapter 4 gives a detailed overview of the BitTorrent network and demonstrates its advantages and disadvantages. Chapter 5 proposes our protocol X2BTRep by describing its assumptions, several enhanced approaches and the protocol. Chapter 6 evaluates the security issues of our proposal. Chapter 7 shows the performance results of the proposed scheme based on our simulation and summarises the comparison between our scheme and other reputation systems. Finally, Chapter 8 concludes the thesis.

1.3 Our Contribution

In this thesis, we propose the first robust reputation system, X2BTRep, devoted to BitTorrent networks. It is an extension of the X2Rep trust semantics algorithm. The outstanding advantage of X2BTRep is that it prevents all known attacks on BitTorrent-like networks. Other major contributions of X2BTRep are credibility award and credibility chain exchange algorithms. Credibility award is a method to overcome the cold-start problem, so that newcomers can participate in the system as quickly as possible. The method of credibility chain exchange improves trusted ratings sharing among all the peers, which avoids the limits of ratings sharing between peers with few interests in previous reputation systems. Moreover, our approach can be implemented on other centralised P2P networks as well, such as Napster.

1.4 Glossary

AOL: American On-Line, an American-based online service provider.

ElGamal: an asymmetric key encryption algorithm based on Diffie-Hellman key agreement.

Free-riding: users in P2P networks prefer to download but refuse to upload.

Leecher: peers in BitTorrent networks that have a partial resource.

MD5: a cryptographic hash algorithm, which can be used to protect the integrity of messages and detect the modification of messages.

NAT: Network Address Translation, which involves re-writing the source and/or destination addresses of IP packets as they pass through a router or firewall.

NIST: National Institute of Standards and Technology.

P2P: peer-to-peer, a communications model which allows all participants to communicate and share resources as equals.

RIAA: Recording Industry Association of America, the United States trade group that represents the recording industry.

RSA: a public key cryptographic algorithm published by Rivest, Shamir, and Adleman.

Seeder: peers in BitTorrent networks that possess a complete copy of the resource.

Servent: a P2P network node, combining the functionalities of a server and a client over the network.

SHA1: a message digest function proposed by NIST.

SSL: Secure Sockets Layer, a cryptographic protocol which provides secure communications on the Internet for client/server applications.

TTL: Time To Live, the number of times that a query message can be forwarded before being removed from the network.

VoIP: Voice over Internet Protocol, a technology that allows clients to make telephone calls using an Internet connection instead of a regular phone line.

1.5 Notations

We use $PK_i$ and $SK_i$ to denote a pair of public and private keys belonging to the owner $i$. We denote public key encryption with curly brackets, subscripted by the key with which something was encrypted, as in $\{M\}_{PK_i}$, which means message $M$ encrypted with public key $PK_i$. Signing is denoted with square brackets, with the symbol of the key subscripting the closing bracket, as in $[M]_{SK_i}$. A cryptographic certificate is denoted by the combination of the signature and the public key, as in $([M]_{SK_i}, PK_i)$. Notations used in this thesis are summarised as follows.

• Let $(PK_i, SK_i)$ denote peer $i$’s public and private key, respectively.

• Let $\{M\}_{PK_i}$ be a message $M$ encrypted with a public key $PK_i$.

• Let $([M]_{SK_i}, PK_i)$ denote a message $M$ signed by a private key $SK_i$, associated with a public key $PK_i$.

Chapter 2

Peer-to-Peer: Now and the Future

In this chapter, we give a comprehensive introduction to peer-to-peer (P2P) networks. First of all, we describe the basic nature of P2P networks; then we illustrate three generational evolutions and current developments of P2P networks; and finally, we discuss the strengths and weaknesses of P2P networks in general. In Chapter 3, we then demonstrate a series of techniques to overcome the weaknesses introduced in this chapter.

2.1 What is a P2P Network?

In recent years, a new technology trend has emerged to win the hearts of pundits and technology vendors: peer-to-peer computing, often referred to simply as P2P. The Gartner Group said that P2P will “radically change business models” [49]. Pat Gelsinger, Chief Technology Officer of Intel, also said, “Peer-to-peer computing could usher in the next generation of the internet, much as we saw Mosaic usher in the last era,” referring to the pioneering Mosaic Internet browser. Vendors such as IBM, Microsoft, Google and Yahoo are rushing to build P2P applications.

P2P computing is one of the key technical concepts of the internet, with a history spanning several decades. However, only in recent years has P2P grown extensively and dramatically changed the way resources and services are distributed. P2P is a type of network in which each node has equivalent capabilities and responsibilities, and each participant can initiate a communication to share huge amounts of resources. Unlike the traditional client/server model and master/slave model, P2P networks allow nodes to act as both “clients” and “servers” to other nodes.

Resources and services shared in P2P systems include disk storage and information and content files, such as music, videos, images, games and other software. Moreover, P2P networks can be used for sharing processing cycles and instant messaging. According to the network and application, P2P systems are generally identified in three groups:

• Collaborative Computing, also referred to as distributed computing, is a concept that combines idle CPU capacities from multiple machines to work jointly on solving a problem that would require a huge number of CPU cycles if a single machine were to do the calculation by itself [27]. It is popular in science, research and biotechnology organisations, which need large computational capabilities. SETI@home [44] is the biggest and most successful example of collaborative computing, with over five million participants worldwide to date. It makes use of the wasted processor power of individual computers around the world for analysis of data from the Arecibo radio telescope.

• Instant Messaging is another widely-used form of P2P network and includes applications such as MSN Messenger, Yahoo Messenger, Google Gtalk and ICQ. Instant Messaging applications allow users to swap media data including text messages, audio and video in real time. These applications rely on a central server which authenticates a client to the network, stores contact lists and coordinates communication between peers. However, two clients can talk with each other directly without the central server. With the help of Instant Messaging software, many features, such as the transfer of files, whiteboarding, real-time conferences and Digital Phone/VoIP, can be employed [48].

• Affinity Communities are direct file-swap P2P applications, a category that includes a large amount of software such as Napster, Gnutella, KaZaA and BitTorrent. Participating peers in these systems provide shared space and available bandwidth to each other. These systems help peers bootstrap into networks, provide resource searching mechanisms, locate requested resources, and enable peers to find each other and exchange information and content. In short, affinity communities are based on users collaborating and searching for information and files on each other’s computers [1].

A number of characteristics conventionally attributed to peer-to-peer systems are summarised below. Not every peer-to-peer system exhibits all of these general features, but they are used to identify P2P networks [4].

• Nodes can act as clients, requesting services from others; as servers, providing locally held resources; and as routers, relaying queries between nodes.

• P2P provides a way of decentralising administration. In other words, nodes are independent of one another.

• Resources and services are shared by direct exchange.

• Another characteristic of P2P systems is their self-organising capacity. Nodes may leave or enter networks unpredictably. Moreover, the capabilities of nodes, such as connectivity and performance, are highly capricious.

• The topology of P2P networks is internet-wide. Resources and services are geographically distributed.

2.2 Generational Evolution of P2P Networks

P2P is one of the most significant technologies on the internet, first emerging in 1999. During the last few years, it has achieved wide predominance in the distribution of resources such as audio, video and real-time data. The evolution of P2P networks over the past few years is classified into three major generations, and a fourth generation is currently under development.

2.2.1 First Generation – Centralised P2P

Background and History

In May 1999, the first truly global P2P file-sharing infrastructure – Napster [38] – was created by Shawn Fanning, and it achieved wide popularity within a few months. It reached its highest point in February 2001, when 29.4 million registered users shared 2.79 billion files in the same month [8]. However, files shared on Napster networks were copies of copyrighted popular music and movies. These illegal acts were considered unacceptable by most music industry and media companies. Hence, Napster was shut down in 2001 following legal action by the RIAA (Recording Industry Association of America).

Architecture and Protocol

Napster is based on a centralised P2P network. There is a central server that keeps track of peers and helps them to find each other. Peers maintain shared resources, and inform the central server about the description information (file name, size of file, time of creation etc.) of the shared resources. When the central server receives a query from Peer A, it checks its local index database to find matches, then returns the search results to Peer A. The results indicate, for example, that Peer B has a copy of a requested file. Peer A finally makes a direct connection with Peer B and downloads the file (see Figure 2.1).

Figure 2.1: First Generation – Centralised P2P Architecture
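The lookup flow described above can be illustrated with a minimal Python sketch. It is not Napster's implementation: the class name `CentralIndex`, its methods and the peer names are hypothetical, and matching is reduced to a case-insensitive substring test. The point it shows is that the central server stores only descriptions of shared files, while the files themselves stay on the peers.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a centralised P2P index server (hypothetical API)."""

    def __init__(self):
        # Maps a lower-cased file name to the set of peers sharing it.
        self._index = defaultdict(set)

    def publish(self, peer, filename):
        """Record that `peer` shares `filename`; the file stays on the peer."""
        self._index[filename.lower()].add(peer)

    def search(self, keyword):
        """Return {filename: sorted peers} for file names containing keyword."""
        keyword = keyword.lower()
        return {
            name: sorted(peers)
            for name, peers in self._index.items()
            if keyword in name
        }


index = CentralIndex()
index.publish("PeerB", "song.mp3")
index.publish("PeerC", "Song.mp3")
index.publish("PeerC", "lecture.avi")

# Peer A queries the server, then would connect directly to a listed peer.
results = index.search("song")
print(results)
```

Because every query passes through this single index, the sketch also makes the architecture's weakness visible: if the index disappears, no peer can locate any resource.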

The Napster protocol, illustrated in Figure 2.2, consists of the following basic phases: user joins, publish shared file, search query, download query, and data fetch.

Phase 1: A client Peer i logs into the Napster central server with its nickname, password, port, client information (client version info), and link type (e.g. 14.4 kbps, 64K ISDN, Cable, etc.). A port value of 0 means that the client is behind a firewall and can only push files outward. If the login is successful, the server sends a login ack message back to Peer i.

Phase 2: In this phase, Peer i notifies the central server about the information in the shared files, which includes filename (the mp3 file contributed), md5 (md5 is a cryptographic hash algorithm, which can be used to protect the integrity of messages and detect the modification of messages), file size in bytes, bitrate (the mp3 bitrate in kbps), frequency (the sampling frequency in Hertz) and time (the play time in seconds). Once it has received these messages, the server adds them into a shared library.

Figure 2.2: Sequence of operations in the Napster protocol

Phase 3: To look for a file, the initiator Peer i issues a query message to the server. The message can be either a search or a browse command:

• If Peer i requests a search command, several parameters such as artist name, song name, link-type, bitrate and so on can be applied. Then, the server identifies match results according to the matching conditions, such as “max results”, “at least”, “at best” or “equal to”, and returns response messages to Peer i. The response messages contain the parameters of filename, md5, file size and so on.

• Peer i sends a browse message when it wants to gain a list of the files shared by a specific user. The browse message includes the nickname of the specific user. Then, the server forwards Peer i the browse response message, which has the attributes of nickname, filename, md5, size and so on.

Phase 4: As Peer i receives feedback from the server, it may choose some clients from which it would like to download, according to speed, quality of MP3 and size. Afterwards, Peer i sends the nickname, together with the filename that it chooses, to the centralised server for downloading. The server passes the nickname, filename, IP, port and linespeed back to Peer i for downloading.

Phase 5: At this point Peer i decides from which peer (for example Peer j ) to download the file. The download connections are divided into two categories, depending on the port value from Peer j.

• If the port value is not zero, it means the remote Peer j is not firewalled. Then Peer i makes a direct TCP connection with Peer j. After the establishment of the connection, Peer i can send a request to Peer j for the file to download.

• If Peer i finds that the port value is 0, the remote client Peer j is behind a firewall and cannot accept incoming connections, so the file must be pushed from Peer j. Peer i asks the server to tell Peer j to make an outgoing connection to Peer i for the given filename. The server sends the firewalled Peer j the connection information, including Peer i’s name, IP, port and so on. Once Peer j receives this message from the server, it makes a TCP connection to Peer i’s IP/port and begins transferring the file.

During the process of downloading, both Peer i and Peer j need to give their status to the server when they are transferring files or have completed the job.
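The two download cases in Phase 5 amount to a small decision rule on the port values registered at login. The following Python sketch is illustrative only: the function name and the returned labels are hypothetical, and the third branch (both peers firewalled) is an assumption added for completeness, since the text above does not discuss it.

```python
def choose_transfer_mode(requester_port, provider_port):
    """Decide how a download is set up from the ports registered at login.

    A port value of 0 marks a peer as firewalled, i.e. unable to accept
    incoming connections (see Phases 1 and 5).
    """
    if provider_port != 0:
        # Provider is reachable: the requester opens the TCP connection.
        return "direct"
    if requester_port != 0:
        # Provider is firewalled: the server asks it to connect outward
        # to the requester and push the file.
        return "push"
    # Assumption, not covered by the text: neither side can accept
    # an incoming connection, so no transfer can be set up.
    return "unreachable"


print(choose_transfer_mode(6699, 6699))
print(choose_transfer_mode(6699, 0))
```

In both reachable cases the server only relays connection details; the file itself always travels over the direct TCP connection between the two peers.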

Evaluation of Advantages and Disadvantages

Centralised P2P architecture is simple and provides a fast and efficient way to locate resources. Nevertheless, a centralised P2P network, which relies heavily on a central point to maintain information on all the peers and resources, is susceptible to being shut down. Furthermore, this type of network is not scalable, because performance deteriorates sharply as the number of clients increases.

2.2.2 Second Generation – Decentralised P2P

Background and History

The second generation of P2P networks, Gnutella [45], was developed by Nullsoft, a division of AOL (American On-Line). The article describing the protocol was divulged prematurely, by accident, on 14 March 2000. Although AOL removed the article on the following day and restrained Nullsoft from doing any further work on Gnutella, the blueprint of the protocol had already propagated over the internet, and development of the Gnutella protocol has been continued by different groups up to the present.

Architecture and Protocol

Gnutella is a fully decentralised peer-to-peer system in which there is no central point, and nodes function as both clients and servers. On first use, a node A must find at least one other node to which to connect. Once a connection has been made, that node (e.g. B) conveys its own list of active nodes to node A. Node A then tries to establish connections with the nodes on B’s list, as well as with nodes learned from other neighbours, and repeats this acquisition of nodes until it reaches a certain quota.

Figure 2.3: Second Generation – Decentralised P2P Architecture

To locate a resource, node A issues a query to all its neighbouring nodes, and they forward the query to their neighbours, and so forth. Each node checks its local resources to find a match for the query. If node C holds a match for the requested resource, it returns a query response along the reverse path of the query. After the response has been received, node A can download the resource directly from node C. To limit the traffic generated by query messages, each query includes a TTL (Time To Live) field, which gives the number of times that a query message can be forwarded before being removed from the network. Each node must decrease the TTL before forwarding the query to other nodes, and must discard the query if its TTL becomes zero.
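The TTL-limited flooding described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Gnutella implementation: the overlay is a hypothetical adjacency dictionary, the function name `flood_query` is invented for illustration, duplicate queries are suppressed with a `seen` set, and the query response travelling back along the reverse path is not modelled.

```python
from collections import deque


def flood_query(neighbours, resources, start, wanted, ttl):
    """Flood a query from `start`; return the nodes reached that hold `wanted`.

    neighbours: dict mapping node -> list of adjacent nodes (toy overlay)
    resources:  dict mapping node -> set of resource names held locally
    ttl:        number of times the query may be forwarded
    """
    if ttl <= 0:
        return []
    hits = []
    seen = {start, *neighbours[start]}
    queue = deque((node, ttl) for node in neighbours[start])
    while queue:
        node, remaining = queue.popleft()
        if wanted in resources.get(node, set()):
            hits.append(node)
        remaining -= 1  # each node decreases the TTL before forwarding
        if remaining <= 0:
            continue  # the query is discarded once its TTL reaches zero
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining))
    return hits


# Toy overlay: a chain A - B - C - D, with the file held by C and D.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
held = {"C": {"file.mp3"}, "D": {"file.mp3"}}

print(flood_query(overlay, held, "A", "file.mp3", ttl=2))
print(flood_query(overlay, held, "A", "file.mp3", ttl=3))
```

With this chain, a TTL of 2 lets the query reach C but not D, which shows the trade-off: a small TTL bounds traffic but can miss copies that exist further away in the overlay.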

The Gnutella version 0.4 protocol [50] employs five different types of descriptors to control communication between servents (a servent is a P2P network node combining the functionalities of a server and a client) over the network. Table 2.1 defines the descriptors:

| Descriptor | Description |
|---|---|
| Ping | A request to discover hosts on the network. |
| Pong | The reply to a Ping. It includes the IP and port of the connected host and information on files shared. |
| Query | A request to search for a file in the distributed network. It contains a search string and minimum speed requirements from the requestor. |
| QueryHit | The reply to a Query. This descriptor states the information matching the corresponding Query. |
| Push | A way to establish communication with firewalled servents. |

Table 2.1: The definition of five descriptors in the Gnutella protocol

The Gnutella protocol, demonstrated in Figure 2.4, describes how a servent joins the network, interacts with other servents, locates a file and fetches the file.

Figure 2.4: Sequence of operations in the Gnutella v0.4 protocol (step 6 is labelled GET /filename.mp3 HTTP/1.1)

Phase 1: Before connecting to the Gnutella network, Servent i should acquire the addresses of other servents currently on the network. Host-cache servers are the main and most prevalent means by which servents acquire Gnutella servent addresses.

Phase 2: Servent i establishes a TCP/IP connection with other servents using the addresses from Phase 1. A request string “GNUTELLA CONNECT/0.4\n\n” may be sent by Servent i. A response string “GNUTELLA OK\n\n” is sent back by a servent willing to accept the connection.

Phase 3: Once Servent i has successfully joined the network, it interacts with other servents by sending and receiving PING and PONG descriptors. Servent i broadcasts a PING message to announce its existence. A back-propagated PONG message, returned by its neighbour Servent j, includes information about the IP, the port, the number of shared files and the number of kilobytes shared.

Phase 4: QUERY and QUERYHIT messages are exchanged between Servents i and j when Servent i attempts to discover a file. A QUERY message contains a minimum speed requirement and a string of search criteria specified by Servent i. A QUERYHIT message is sent in reply to the QUERY message and consists of the IP, the port and the speed of the responding Servent j, together with information on the files matching the query.

Phase 5: If Servent j, which sends the QUERYHIT message, is firewalled, it cannot establish an incoming connection. Then Servent i may use a PUSH descriptor to obtain the file from Servent j. The PUSH descriptor contains the IP and port of Servent i and the identifier of the target file. Upon receipt of the PUSH descriptor, firewalled Servent j makes a new TCP/IP connection with the requesting Servent i.

Phase 6: Servent i downloads the file directly from Servent j. This transfer is done with the HTTP protocol.
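As a minimal sketch, the connection handshake of Phase 2 and the choice between a direct download and a PUSH-initiated download (Phases 5 and 6) can be modelled as follows. The two handshake strings come from the description above; the function names and the step labels are hypothetical and serve only as an illustration, not as a Gnutella implementation.

```python
from typing import List, Optional

# Handshake strings as quoted in Phase 2.
CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"
OK_RESPONSE = b"GNUTELLA OK\n\n"


def handle_connect(request: bytes, willing: bool = True) -> Optional[bytes]:
    """Return the acceptance string for a valid connect request, else None."""
    if willing and request == CONNECT_REQUEST:
        return OK_RESPONSE
    return None


def download_steps(responder_firewalled: bool) -> List[str]:
    """List the steps Servent i and Servent j follow to transfer a file.

    The labels are illustrative only (an assumption of this sketch).
    """
    if responder_firewalled:
        # Phase 5: j cannot accept incoming connections, so i asks j to call back.
        return [
            "i sends PUSH to j",
            "j opens TCP connection to i",
            "file transferred over HTTP",
        ]
    # Phase 6 without a firewall: i connects to j directly.
    return ["i opens TCP connection to j", "file transferred over HTTP"]
```

A servent that is unwilling to accept the connection simply returns no acceptance string, which the sketch represents as None.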

Evaluation of Advantages and Disadvantages

The second P2P generation is based on a decentralised architecture. Without any central server, it effectively avoids a central point of failure or control, and distributes the search costs to servents. As a result, however, search/response times for locating resources become slower. Also, routing on Gnutella is unreliable because the connection status of nodes (online, off-line, access speed) is uncertain. Moreover, the excess volume of network traffic degrades the operational performance of the network.

2.2.3 Third Generation – Hybrid P2P

Background and History

As a result of the publication of Napster and Gnutella, P2P systems began to gain significant acceptance. Although their popularity increased dramatically, both centralised P2P and decentralised P2P suffered from their own weaknesses. Therefore, after 2001, several new P2P systems, including Kazaa [33], Grokster [24] and the recently evolving Gnutella [22], were released, which combined the characteristics and structure of both centralised P2P and decentralised P2P. This hybrid P2P solution takes advantage of SuperNodes (or UltraPeers) to build up a backbone network in the system. SuperNodes are peers with high bandwidth connectivity, processing power and memory, which act as central index servers. This hierarchical structure improves routing performance over the network and reduces search/query times during resource searches.

Architecture and Protocol

Kazaa is one of the most successful applications in terms of participating users and traffic volumes. According to Sharman Networks, the publishers and distributors of Kazaa, more than 85 million copies have been downloaded from its network and it has an average of two million users online at any given time [23]. Also, sandvine.com reported that in 2002, 76% of P2P file sharing traffic was Kazaa/FastTrack traffic whilst only 8% was consumed by Gnutella networks. In the following paragraphs, we use Kazaa as an example to describe the basic architecture and protocol of the third generation.

When an ordinary peer A joins the network, it bootstraps from a SuperNode, which collects and maintains the information about the ordinary peer and shared content (shown in Figure 2.5). Each SuperNode maintains a file index, which maps file identifiers into the IP addresses of ordinary peers.

Figure 2.5: Third Generation – Hybrid P2P Architecture (diagram labels: Node, Supernode, peers [A] and [B], QUERY, REPLY, Download)

After the ordinary peer A establishes a semi-permanent TCP connection, it sends metadata about its shared files to its parent SuperNode. The metadata includes the file name, the file size, the content hash and the file descriptors. The file descriptors contain a set of keywords, such as artist name, album name and text specified by users, to be used to match queries. The content hash is associated with each resource and is computed by applying a secure hash function to the resource content. The content hash can be used not only to verify a file after a download request, but also to resume the transfer of a file from other peers if a download from a specific peer fails.
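The metadata record and the use of the content hash for verification can be sketched as below. This is an illustration under stated assumptions: SHA-1 stands in for the secure hash function, which the description above does not name, and the field and function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Union

Metadata = Dict[str, Union[str, int, List[str]]]


def content_hash(data: bytes) -> str:
    """Hash the resource content (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(data).hexdigest()


def make_metadata(file_name: str, data: bytes, keywords: List[str]) -> Metadata:
    """Build the record an ordinary peer would send to its parent SuperNode."""
    return {
        "file_name": file_name,
        "file_size": len(data),
        "content_hash": content_hash(data),
        # File descriptors: normalised keywords used to match queries.
        "descriptors": sorted({k.lower() for k in keywords}),
    }


def verify_download(metadata: Metadata, received: bytes) -> bool:
    """Check a downloaded file against the advertised size and content hash."""
    return (
        len(received) == metadata["file_size"]
        and content_hash(received) == metadata["content_hash"]
    )
```

Because the hash depends only on the content, two peers offering the same bytes under different file names produce the same content hash, which is what allows a failed transfer to be resumed from another peer.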

If the ordinary peer A needs to locate a resource, it forwards a search query to its local SuperNode. The SuperNode then broadcasts Gnutella-type query messages to its neighbouring SuperNodes, and so on. The query response with a match (e.g. peer B has a copy of the requested resource) is returned to peer A via its local SuperNode. From this point onward, peer A downloads the file directly from peer B. It should be noted that this kind of P2P system is able to operate without SuperNodes, although query latency becomes worse in that case.
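A simplified model of the SuperNode file index and of query forwarding between SuperNodes is sketched below. The class, its method names and the hop limit (ttl) are assumptions made for illustration; the text above does not specify how far a query is propagated.

```python
from typing import Dict, List, Optional, Set


class SuperNode:
    """Holds a file index mapping file identifiers to ordinary peers' IPs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[str, Set[str]] = {}
        self.neighbours: List["SuperNode"] = []

    def register(self, peer_ip: str, file_ids: List[str]) -> None:
        """Record the files shared by an ordinary peer attached to this node."""
        for file_id in file_ids:
            self.index.setdefault(file_id, set()).add(peer_ip)

    def query(self, file_id: str, ttl: int = 2,
              seen: Optional[Set[str]] = None) -> Set[str]:
        """Answer from the local index, then forward to neighbouring SuperNodes.

        `seen` prevents a query from visiting the same SuperNode twice;
        `ttl` (hypothetical) limits how many hops the query travels.
        """
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()
        seen.add(self.name)
        hits = set(self.index.get(file_id, set()))
        if ttl > 0:
            for neighbour in self.neighbours:
                hits |= neighbour.query(file_id, ttl - 1, seen)
        return hits
```

In this model peer A would call query on its local SuperNode and then contact one of the returned addresses directly for the download.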

A Kazaa peer contains several software components [32]:

• SuperNode List Cache (SLC) is stored in the Windows Registry. The list cache includes the IP addresses and port numbers of up to 200 SuperNodes.

• DBB files hold metadata information for the files shared by peers. Files added and deleted are instantly reflected in the DBB file.

• DAT files are incomplete files, stored with the DAT file extension. After download completion, the DAT file is renamed with its original file extension.

The Kazaa network protocol employs four different packet types of TCP traffic, namely:

1. Signalling traffic includes a handshaking protocol for connection establishment between peers; metadata information updated from ordinary peers to SuperNodes; query search and response messages; and SuperNode list exchange. It should be noted that all signalling messages are encrypted.

2. File transfer traffic refers to plaintext data message transfer between peers over the HTTP protocol.

3. Commercial advertisements, over the HTTP protocol.

4. Instant messaging traffic, encoded in BASE64.

Evaluation of Advantages and Disadvantages

This hybrid P2P architecture overcomes the weaknesses in the first and second P2P generations, and provides a high degree of performance and resilience. On the one hand, there is no central point of failure as in the centralised P2P system. On the other hand, the employment of SuperNodes (or UltraPeers) offers better search/response times, with less traffic generation than in decentralised networks.

2.2.4 Current Developments

Over the last few years, P2P technology has seen an explosion of various systems, such as centralised P2P, decentralised P2P and hybrid P2P, each with different characteristics. Recently, the development of P2P has moved to a series of enticing features rather than the evolution of structures. These features include the notion of cryptography, bi-directional and multiple connections, dynamic IP address/NAT and firewall, plus tit-for-tat policy.

Cryptography

One of the most innovative and popular features in the current development of P2P is the utilisation of cryptography. The basic service provided by cryptography to the P2P world is confidentiality. It also supplies other services, including integrity assurance and authentication. These three functions are described below:

• Confidentiality allows information to be sent between participants in a way that prevents others from reading it [30]. Confidentiality is achieved through data encryption, e.g. RSA, AES, Twofish or Blowfish. A P2P internet telephony (VoIP) network called Skype [54] features strong AES encryption to encode the data stream of each telephone call. Another P2P system, the instant messaging network WASTE [52], encrypts traffic data using the Blowfish encryption algorithm in order to prevent snooping. Furthermore, a Java-based P2P client, Qnext [13], utilises 512-bit encryption for handshaking and 192-bit encryption for all other sensitive traffic.

• Integrity Checking ensures information will not be accidentally or maliciously altered or destroyed. Integrity checking not only guarantees that modifications to data are detectable but also identifies identical files with different filenames or distinguishes differing files with identical filenames. Several P2P systems, such as eDonkey [19] and BitTorrent [53], allow users to identify correct files or manage their own namespaces for resources based on MD4 and SHA1 hash algorithms, respectively.

• Authentication provides a way to verify a user’s identity. Anonymity is one of the major characteristics of P2P networks and most P2P systems require no proof of the user’s identity at all. However, some P2P systems such as Skype permit users to reveal the identities of other users in the system. Moreover, the WASTE P2P system employs the RSA public key system for session key exchange and authentication.

Bi-directional and Multiple Connections

The architectures of eDonkey and BitTorrent have introduced features of bi-directional data streaming of download and concurrent download of a file from multiple peers. In order to enable multiple, simultaneous downloads/uploads of a single file, these architectures require data from the file to be divided into small pieces with an ability to detect file corruption using hashing. Bi-directional and multiple download/upload connections overcome the shortcomings in traditional P2P networks, where files cannot be uploaded until a whole copy of the files has been downloaded. This facilitates the rapid distribution of resources as more fragments of resources are available in the network.
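The idea of dividing a file into pieces and detecting corruption by hashing can be sketched as follows. The piece size and function names are hypothetical, and SHA-1 is used here only because it is the hash mentioned above for BitTorrent; the sketch is not the wire format of either system.

```python
import hashlib
from typing import List


def split_into_pieces(data: bytes, piece_size: int) -> List[bytes]:
    """Cut the file content into fixed-size pieces (the last may be shorter)."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: List[bytes]) -> List[bytes]:
    """Compute one digest per piece; this list is published with the file."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def verify_piece(piece: bytes, index: int, hashes: List[bytes]) -> bool:
    """Check a piece received from any peer against the published digest."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

Since each piece is verified independently, a peer can accept pieces from several sources at once and start uploading a verified piece before the whole file has arrived.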

Dynamic IP address/NAT and Firewall

In the early days of the internet, all the hosts were considered as equal. Every host could act as a server and a client, and could accept both inbound and outbound connections. However, with the explosive rise of users connecting to the internet after the mid-1990s, security risks increasingly troubled users and fixed public IP addresses were running out. To overcome these problems, a series of technologies, including dynamic port selection and Firewall/NAT, was deployed.

A firewall acts as a gate between the intranet network and the internet, which filters packets and chooses which traffic to allow through and which to deny [39]. A typical firewall allows hosts inside an intranet network to establish a connection to hosts in the internet, but it rejects connections from random hosts in the internet. Also, outbound connections may be limited to common applications such as FTP and HTTP at certain ports. This way, hosts in the intranet network can function only as a client, but not easily as a server. Therefore, a firewall is a useful security tool for network administrators to control access to their hosts, but it becomes a serious obstruction to P2P communication.

With the increase in users joining the internet, fewer address-blocks can be allocated per capita. The use of Dynamic IP address and Network Address Translation (NAT) has proven particularly popular to solve the shortage of IP addresses. Dynamic IP address assignment allows an individual host’s IP address to change every single day. This is a useful solution for Broadband providers to serve clients who employ dial-up to the internet. As a result, it is difficult to reach hosts with dynamic IP address assignment.

Network Address Translation (NAT) rewrites the source and/or destination addresses of IP packets when they pass through a router. This is an easy way to give multiple hosts in a private network access to the internet using a single public IP address. When a traffic packet passes from the private network to the internet, the source address in the packet is translated by the router from the private address to the public address. The router maintains an index database of source/destination addresses for each active connection. When a reply packet is returned, the router decides to which internal host it should be forwarded according to the index database. However, NAT means that hosts behind a NAT-enabled router do not have true end-to-end connectivity and are thus difficult to reach.
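The router's index database can be sketched as a pair of lookup tables. Note the assumptions: this sketch distinguishes connections by allocating a public port per private endpoint (port translation), which is one common way to share a single public address but is not stated in the text above, and the class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class NatRouter:
    """Toy model of a NAT router sharing one public IP address."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound: Dict[Endpoint, int] = {}  # private endpoint -> public port
        self.inbound: Dict[int, Endpoint] = {}   # public port -> private endpoint

    def translate_outgoing(self, private_src: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet and remember the mapping."""
        if private_src not in self.outbound:
            self.outbound[private_src] = self.next_port
            self.inbound[self.next_port] = private_src
            self.next_port += 1
        return (self.public_ip, self.outbound[private_src])

    def translate_incoming(self, public_port: int) -> Optional[Endpoint]:
        """Find the internal host for a reply; None means no mapping exists."""
        return self.inbound.get(public_port)
```

The None result for an unknown port illustrates why such hosts are difficult to reach: an inbound packet is only deliverable after the internal host has first sent traffic outwards.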

Dynamic IP address/NAT and firewalls enhance the reliability and security of private networks, and provide scalability by allowing millions of computers to connect to the internet. However, these technologies have some drawbacks, namely, the instability of the IP address of a host and the unreachability of a host [39]. Therefore, it is becoming increasingly necessary for P2P applications to work with dynamic IP address/NAT and firewalls and to address the corresponding problems.

Tit-for-tat Policy

Until recently, free-riding has been considered a severe problem in most P2P systems. However, Kazaa introduced a participation level (PL) that is tracked during each user’s usage time. PL varies between 0 and 1000 depending on the recent downloads/uploads ratio. A newcomer starts with a default PL of 100, the medium level. Sharing of files by a user may increase its PL in the system.

When more than one user requests the download of a file, the user with a high participation level will usually be given the highest priority to receive the file. The user with a low PL will have to wait until users with a higher PL finish their download. The period of time spent waiting is uncertain and depends on the circumstances. In order to improve a user’s download performance, the user is required to upload more attractive files to others.
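The effect of the participation level on the waiting order can be sketched as a simple ordering of pending requests. The bounds and the default come from the description above; the strict highest-PL-first rule and the function names are simplifying assumptions, since the text only says that a high PL is usually given priority.

```python
from typing import List, Tuple

MIN_PL, MAX_PL = 0, 1000
DEFAULT_PL = 100  # starting level for a newcomer


def clamp_pl(value: int) -> int:
    """Keep a participation level inside the 0..1000 range."""
    return max(MIN_PL, min(MAX_PL, value))


def service_order(requests: List[Tuple[str, int]]) -> List[str]:
    """Order requesting users by descending PL.

    `requests` holds (user, pl) pairs in arrival order. The sort is stable,
    so users with equal PL keep their arrival order.
    """
    ranked = sorted(requests, key=lambda item: -clamp_pl(item[1]))
    return [user for user, _ in ranked]
```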

The author of the BitTorrent network, Bram Cohen, was inspired by the PL of Kazaa and developed a much more robust method – tit-for-tat policy – to discourage free-riders. Peers receive data from other peers by uploading what they have. Hence, peers with a high upload speed will probably maximise their own download speed and achieve a high utilisation of bandwidth. On the other hand, peers unwilling to upload will be punished with a decrease in download speed. This therefore facilitates the rapid dissemination of content among peers with high scalability. Chapter 4 gives details of the BitTorrent protocol.
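The reciprocity at the heart of tit-for-tat can be illustrated with a small selection function: a peer uploads to those peers from which it has recently received the most data. This is only a sketch of the principle; the number of upload slots and the function name are hypothetical, and it deliberately omits any mechanism for giving newly arrived peers a first chance.

```python
from typing import Dict, List


def choose_upload_targets(received_bytes: Dict[str, int], slots: int = 4) -> List[str]:
    """Pick the peers that uploaded the most to us.

    Peers that contributed nothing are skipped, which is how a free-rider
    ends up with a reduced download speed. Ties are broken by peer name so
    that the result is deterministic.
    """
    contributors = [(peer, n) for peer, n in received_bytes.items() if n > 0]
    contributors.sort(key=lambda item: (-item[1], item[0]))
    return [peer for peer, _ in contributors[:slots]]
```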

2.3 Advantages and Weaknesses of P2P Networks

The advantages of P2P networks can be summarised as follows: [35] [21]

• Resource aggregation – in P2P networks, all participants contribute resources, including bandwidth, storage space and computing power. Therefore, resource-demanding tasks can be fulfilled.

• Improved scalability – as the number of clients placing demands on the system increases, the total capability of the system increases as well. By contrast, the download performance in traditional client/server file sharing networks deteriorates sharply when more clients join the system.

• No single point of failure – in P2P networks, especially in pure P2P systems, a single point of failure or control is effectively avoided.

• Autonomy – nodes in the system can decide what resources should be shared with others.

• Cost reduction – in the traditional client/server model, there is a fixed set of servers. But in the P2P world, there are configurations where each node can serve as a “server” for the purposes of being a resource provider. It is possible to reduce the demand for costly centralised servers.

• Anonymity – nodes in P2P networks are pseudonymous by default. These systems allow for the unfettered free flow of information, legal or otherwise.

• Dynamism – there are no enforcement rules for joining, leaving and staying for participants in the system.

The weaknesses of many peer-to-peer networks can generally be summarised as one or more of the following:

• Defection attacks – users make use of the network while not providing any resources to other peers in the system.

• Poisoning attacks – peers share files whose contents are different from their description.

• Insertion of viruses in carried data – users provide data infected with viruses or other malware.

• Denial of service attacks – attackers attempt to degrade the performance of peer-to-peer systems by purposely issuing excessive amounts of queries.

• Malware in peer-to-peer software – a P2P system may carry malware, such as spyware, a virus or a Trojan horse.

• Identity attacks – attackers keep track of information on users in the network, and then harass them or launch a massive attack on them.

• Spamming – P2P networks may be abused to send unsolicited messages in bulk.

Many attacks can be defeated or resolved by careful design of the P2P network and through the use of trusted reputation systems. A reputation system takes feedback from users in the form of certifications and provides a mechanism to accumulate these and determine the quality (or reputation) of a given resource based on this feedback. In Chapter 3 we will review and discuss several reputation systems.

2.4 Summary

To be able to effectively improve peer-to-peer networks, it is important to understand the characteristics of the protocols and architectures used by the various peer-to-peer networks. This chapter discussed these subjects historically and has illustrated the advantages and weaknesses of P2P systems. Due to the continuous emergence of many protocols and architectures, it is difficult to predict the evolution of P2P networks in the future. However, we believe that several key issues, including security and privacy, capabilities to handle dynamic IP address/NAT, firewall and multiple connections, together with free-riding avoidance, should be taken into account when designing new peer-to-peer applications.

Chapter 3

Previous Studies of Reputation Systems

3.1 Introduction

The internet has produced vast new opportunities to interact with strangers. These interactions can be fun, informative and even profitable [47]. Along with the benefits, however, they also involve risks. Data from a provider may be different from the description. Products may be low quality or be shipped with inappropriate packaging.

In real life, similar risks of interaction with strangers can be reduced through reputations. A stranger’s reliability can be traced through an authority, such as a bank or government, or can be judged via past personal experience. Also, people can make friends with strangers through the introduction of their trusted friends.

In the same way, reputation systems play an important role in internet services, helping to foster trust and elicit cooperation among loosely connected and geographically dispersed economic agents [17]. In reputation systems, the quality of a given resource or peer can be determined by a user based on historical information from other users. These systems encourage good behaviour and deter dishonest participation. In the following sections, we present more detailed analyses of different reputation systems, including eBay, P2PRep, XRep, X2Rep and Credence.

3.2 eBay

eBay is the largest online auction house, founded by Pierre Omidyar and Jeff Skoll in September 1995. By means of eBay, millions of antiques, appliances, clothing, collectables, computers, vehicles and other miscellaneous items are being bought and sold moment by moment. Apart from tangible items, services or other intangibles can be traded over eBay as well. Nowadays, eBay has expanded to more than 20 countries.

eBay uses a complex fee system to generate revenue. The fee system charges $0.25 to $80 per listing and 2-5% of the final transaction price.

The model of eBay relies on a binary reputation system mechanism for quality signalling and quality control. Buyers and sellers can report feedback at the end of transactions. The format of the solicited feedback can be a designation of “positive”, “negative” or “neutral”, together with a short text comment. eBay publishes the sum of positive, negative and neutral ratings plus all comments to all its users.
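The published summary of feedback can be sketched as a simple tally of the three designations. The function name is hypothetical and the sketch ignores the text comments; it only illustrates the kind of aggregate that is shown to users.

```python
from collections import Counter
from typing import Dict, Iterable

DESIGNATIONS = ("positive", "neutral", "negative")


def feedback_summary(ratings: Iterable[str]) -> Dict[str, int]:
    """Count how many ratings of each designation a user has received.

    Values outside the three designations are ignored.
    """
    counts = Counter(r for r in ratings if r in DESIGNATIONS)
    return {kind: counts[kind] for kind in DESIGNATIONS}
```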

The binary reputation system mechanism used by eBay is able to build trusted relationships between strangers. First, buyers and sellers can determine the other parties’ abilities and manners via historical information on past interactions. Second, the hope of positive feedback, or the fear of gaining a negative reputation in the future, leads to trustworthy behaviour in both parties.

Nonetheless, unfair ratings exist in eBay-like reputation systems. The following sections study a number of unfair rating scenarios and analyse their effects on an online reputation system [18]:

Unfair Ratings for Seller

• Unfairly high ratings – a seller knows several buyers, and they may attempt to inflate the seller’s reputation by acting as a group. This high reputation enables the seller to attract more orders from buyers and higher prices.

• Unfairly low ratings – a seller may collude with buyers in order to debase other sellers’ reputations and expel them from the marketplace. The seller can rig the market and manipulate it dishonestly for personal gain.

Discriminatory Seller Behaviour

• Sellers usually provide a good service to others, but occasionally they supply a bad service to a small number of buyers. Therefore, it is possible for the sellers to maintain their good reputation rating, while committing fraud towards victimised buyers.

3.3 P2PRep

P2PRep [12] is designed to globally solve current security problems and minimise some well-known weaknesses in decentralised P2P systems, such as Gnutella. P2PRep allows a peer, before deciding from where to download a resource, to enquire about the reputation of offerers by polling its peers. The protocol can easily be piggybacked on existing P2P networks and has limited impact on current implementations. In addition, it preserves the anonymity of participants whilst allowing them to share their views on other parties.

Security Improvements

P2PRep can resolve and minimise the following weaknesses of P2P systems like Gnutella:

• Distribution of tampered-with information – P2PRep is helpful here, since there is otherwise no way to verify the resources or the content of messages. For instance, there might be a malicious node Bob providing a fake resource with the same name as the one Alice wants. After Alice has received the hostile resource, she will update her reputation record for Bob, thus preventing further interaction with him. Also, Alice will become a witness against Bob in all polling processes initiated by others.

• Man-in-the-middle attack – this attack refers to an intermediate node placing itself between two honest nodes, where it acts as a message relay and actively changes, garbles or drops messages. Before a resource provider’s QueryHit message reaches the receiver, the attacker rewrites the provider’s QueryHit message with the attacker’s IP and port. The attacker then modifies the content and passes it to the receiver. The P2PRep protocol addresses this problem by including a challenge-response phase just before downloading. In order to impersonate the provider in this phase, the attacker would need to know the provider’s private key and be able to design a public key. Therefore, this attack is prevented by P2PRep.

Basics of P2PRep

P2PRep requires that each servent generates a pair of public and private keys through public key algorithms such as RSA and ElGamal, both of which can perform encryption, digital signatures and authentication. Each servent is associated with a servent_id (a digest of its public key, obtained using a secure hash function) used for interaction with others. P2PRep assumes that each servent maintains reputation information with respect to resources and votes. Moreover, it provides a method for each servent to translate local reputation information in order to express its votes and assess votes from others.

• Representing Reputations – each servent is required to maintain an experience repository, which has three attributes: servent_id, num_plus and num_minus. num_plus represents the number of successful experiences associated with the servent_id, while num_minus stands for the number of unsuccessful experiences.

• Vote Translation – in order to translate its experience into a vote for an offerer, each servent checks its experience repository. The vote can be expressed either on an ordinal scale or as a continuous number. The vote can be calculated as num_plus divided by the sum of num_plus and num_minus.

• Representing Credibilities – each servent maintains a credibility repository containing three attributes (servent_id, num_agree and num_disagree). The servent_id is associated with the voter who has submitted votes. num_agree represents the number of votes made for servent_id. Conversely, num_disagree expresses the number of votes made against servent_id.
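The experience repository and the vote translation rule described above can be sketched as follows. The attribute names follow the text; the class and method names are hypothetical, and returning None for an unknown servent is an assumption of this sketch rather than part of P2PRep.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Experience:
    num_plus: int = 0   # successful experiences with the servent
    num_minus: int = 0  # unsuccessful experiences with the servent


class ExperienceRepository:
    """Local record of past downloads, keyed by servent_id."""

    def __init__(self) -> None:
        self.records: Dict[str, Experience] = {}

    def record(self, servent_id: str, success: bool) -> None:
        """Update the counters after a download from servent_id."""
        entry = self.records.setdefault(servent_id, Experience())
        if success:
            entry.num_plus += 1
        else:
            entry.num_minus += 1

    def vote(self, servent_id: str) -> Optional[float]:
        """Translate experience into a continuous vote in [0, 1].

        vote = num_plus / (num_plus + num_minus); None if there is no history.
        """
        entry = self.records.get(servent_id)
        if entry is None or entry.num_plus + entry.num_minus == 0:
            return None
        return entry.num_plus / (entry.num_plus + entry.num_minus)
```

For example, three successful downloads and one unsuccessful download from the same servent translate into a vote of 3 / (3 + 1) = 0.75.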

Protocol of P2PRep

The P2PRep reputation approach can be divided into two types:

• Basic Polling – servents respond with votes without providing their servent_id. This is obviously vulnerable in that fake peers may respond with positive votes to other fake peers and with negative votes to good peers. Table 3.1 describes the message sequence of basic polling.

• Enhanced Polling – voters need to declare their servent_id when polling so that the peer can take this into account in weighting the votes received. The enhanced polling in P2PRep protocol is summarised in Table 3.2.

Phase  Description
1      Resource searching
         Initiator p to Network: Query(search_string, min_speed)
         Offerers to Initiator p: QueryHit(num_hits, IP, port, speed, Result, servent_id_i)
2      Resource selection & vote polling
         Initiator p to Network: Poll({Offerer_1, Offerer_2, ..., Offerer_n}, PKp)
         Voters to Initiator p: PollReply({IP, port, Votes}_PKp)
3      Vote evaluation
         Actions:
         – Remove suspicious votes from poll
         – Randomly select a group of voters from the elected votes
         – Initiator p to voters: TrueVote(Votes_j)
           Voters to initiator p: TrueVoteReply(response)
         – If response is negative, discard the vote
4      Resource download
         Actions:
         – Generate a random number r
         – Initiator p to servent s: challenge(r)
         – Servent s to initiator p: response([r]_SKs, PKs)
         – If H(PKs) = servent_id_s ∧ {[r]_SKs}_PKs = r, download
5      Repository update
         Update experience repositories

Table 3.1: The basic polling protocol in P2PRep system

We note the following about the basic polling protocol.

Phase 1: To look for a resource, an initiator p broadcasts a search QUERY message to others. The QUERY message includes a search string and a minimum speed specified by the initiator p. After receiving the QUERY message, servents with a match result send back a QUERYHIT response. The QUERYHIT message provides information about the number of query hits, the speed, the offerer’s IP/port and servent_id_i.

Phase 2: The servent p can select a servent or a group of servents based on the connection speed and the quality of the offer after it receives a response from the enquiry. Then, it requests the reputation opinion of selected resource providers through broadcasting a reputation request POLL message to other peers. The POLL message includes the servent_ids of the selected offerers and the public key of the servent p. The servents receiving the POLL message feed back a POLLREPLY message on any of the offerers in the list. The POLLREPLY message contains the IP/port and their votes, and is encrypted with p’s public key before being sent back.

Phase 3: After obtaining a set of votes, the servent p employs a series of mechanisms to remove suspicious votes and calculate the reputation. First, the servent p decrypts the votes and removes the tampered-with votes. Second, p discards suspicious votes that come from the same IP address. Third, p randomly selects a list of voters from the elected votes, and sends them a corresponding TRUEVOTE message. The selected voters are expected to send back a TRUEVOTEREPLY message which declares the validity of the votes. Therefore, p can judge the correctness of the votes and decide from which offerer to download.

Phase 4: Before initiating the download, p sends a random number r to the best offerer s and expects s to return a self-certified message including the number r. As a result, p can decrypt the self-certified message to check that the r returned by s equals the original r, and p can also verify that the servent_id_s given in the earlier QUERYHIT message equals the digest of PKs. If both verifications are valid, p initiates the download.

Phase 5: After the completion of the download, the servent p updates the experience with the offerer s, which may be translated into votes for other peers’ requests.
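The two checks of Phase 4 can be illustrated with a toy example. Everything cryptographic here is an assumption made for demonstration: the key pair is a tiny textbook-RSA pair (p = 61, q = 53) that is far too small for real use, SHA-1 stands in for the secure hash function H, and the function names are hypothetical.

```python
import hashlib
from typing import Tuple

PublicKey = Tuple[int, int]  # (modulus n, public exponent e)

# Toy textbook-RSA key pair: n = 61 * 53, e * d = 1 (mod 3120). Illustration only.
TOY_PUBLIC: PublicKey = (3233, 17)
TOY_PRIVATE_EXPONENT = 2753


def digest_of_key(public_key: PublicKey) -> str:
    """H(PKs): the servent_id is a digest of the public key."""
    n, e = public_key
    return hashlib.sha1(f"{n}:{e}".encode()).hexdigest()


def sign(r: int, n: int, d: int) -> int:
    """[r]_SKs: the offerer signs the challenge with its private key."""
    return pow(r, d, n)


def verify_offerer(servent_id: str, r: int, signed_r: int,
                   public_key: PublicKey) -> bool:
    """Initiator-side check: H(PKs) = servent_id_s and the signature opens to r."""
    n, e = public_key
    key_matches_id = digest_of_key(public_key) == servent_id
    signature_matches = pow(signed_r, e, n) == r
    return key_matches_id and signature_matches
```

In this sketch the challenge r must be smaller than the modulus. An attacker who substitutes its own public key fails the first check, and one who reuses the provider's public key without the private key fails the second.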

Phase  Description
1      Resource searching
         Initiator p to Network: Query(search_string, min_speed)
         Offerers to Initiator p: QueryHit(num_hits, IP, port, speed, Result, servent_id_i)
2      Resource selection & vote polling
         Initiator p to Network: Poll({Offerer_1, Offerer_2, ..., Offerer_n}, PKp)
         Voters to Initiator p: PollReply({([IP, port, Votes, servent_id_i]_SKi, PKi)}_PKp)
3      Vote evaluation
         Actions:
         – Remove suspicious votes from poll
         – Randomly select a group of voters from the elected votes
         – Initiator p to voters: AreYou(servent_id_j)
           Voters to initiator p: AreYouReply(response)
         – If response is negative, discard the vote
4      Resource download
         Actions:
         – Generate a random number r
         – Initiator p to servent s: challenge(r)
         – Servent s to initiator p: response([r]_SKs, PKs)
         – If H(PKs) = servent_id_s ∧ {[r]_SKs}_PKs = r, download
5      Repository update
         Update experience and credibility repositories

Table 3.2: The enhanced polling protocol in P2PRep system

The difference between basic polling and enhanced polling is as follows:

• The major difference between the enhanced polling protocol and the basic polling protocol is the deployment of credibility to weight votes during the vote evaluation process.

• It requires that voters provide not only IP/port and votes, but also their servent_id in a POLLREPLY message. Moreover, the vote in the POLLREPLY message is signed with the voter’s private key and sent together with the corresponding public key. The use of public key encryption provides integrity and confidentiality for messages.

• Selected voters are contacted with an AREYOU message which reports the corresponding servent_id. Upon receiving this contact, the selected voters return an AREYOUREPLY message to confirm the validity of the votes.

• After a successful download, the downloading initiator updates not only the experience repository, but also the credibility repository.

Security Weaknesses

Nevertheless, P2PRep suffers from some security weaknesses. The main problem of P2PRep is that the protocol only incorporates servent-based reputations and ignores reputations associated with resources. Therefore, it provides inadequate security guarantees to P2P systems. Moreover, P2PRep requires peers to be online in order to express votes to others. This restricts the flexibility of servents collecting reputations. In addition, voters reveal their IP addresses in POLLREPLY messages, which weakens anonymity. Malicious attackers may take advantage of IP addresses to harass owners or launch a massive attack on them.

3.4 XRep

The XRep [15] reputation system is a distributed polling algorithm designed for the Gnutella P2P sharing protocol, where each peer keeps track of reputation ratings and shares them with other peers. The reputation ratings are based on the experience of both resources and peers, and are used to minimise the potential downloading risks of a resource offered by a peer. Thus, XRep preserves the characteristic of anonymity in P2P networks while providing reputation management.

Basic Assumptions

To be able to identify resources and peers, XRep makes two assumptions. Each servent keeps a consistent pair of public (PKi) and private keys (SKi) for multiple use. There is a peer_id (Pi) associated with the peer which is the digest of the public key obtained using a secure hash function. On the other hand, each resource is related to an identifier (Resource_id) computed from the hash of the resource’s content.

There are two types of votes in the XRep protocol. One is the resource vote, which is a binary value describing whether the resource is good (1) or bad (0). The other is the servent vote, which is the same as the one in the P2PRep protocol. The servent vote can be expressed on an ordinal scale (e.g. from A to D or from ⋆ to ⋆⋆⋆⋆⋆) or a continuous one (e.g. 80%).

Approach

The XRep protocol includes five phases as follows:

• Phase 1 – Resource searching: the initiator Pi broadcasts a Query message, including a search keyword and min-speed, to its neighbours. The receivers with matching resources respond with QueryHit messages which contain IP, port, peer id, etc.

• Phase 2 – Resource selection & vote collection: next, Pi selects the resource which seems best to satisfy the request. Pi then asks other peers’ opinions on either the resource or the offerer by Poll messages. The Poll messages include Resourceid, several offerers’ peer ids and Pi’s public key PKi. Upon receiving the Poll message, each peer sends back a PollReply message encrypted by Pi’s public key. The PollReply message includes a vote, together with the peer’s peer id, IP and port signed by its private key and combined with its public key.

• Phase 3 – Evaluation of votes: after Pi receives the PollReply messages, a series of mechanisms is used to remove suspicious votes and assess the rest of the votes. Hence, the final adjusted value of the votes is presented to Pi to select the most reliable offerer.

• Phase 4 – Best servent check: before deciding to download, Pi sends a challenge message to authenticate the best offerer and the offerer returns a response to Pi.

• Phase 5 – Resource downloading: at this point, Pi can establish a direct connection with the chosen offerer and request downloading. After the completion of the download, Pi checks the resource and updates the experience of this transaction.

In general, XRep utilises three mechanisms to assess votes. First, votes are all encrypted with the initiator’s public key. This allows the initiator to detect tampered votes after decryption. Second, in order to prevent a clique of dummy or controlled votes, the initiator calculates votes from the same network Id in an average value, or only selects a single vote as the final value. Third, the initiator randomly chooses several peers in the voting list and sends messages requesting confirmation on votes to ensure the existence of the peers. The peers who receive the request must return a response message; otherwise their votes will be discarded.

Security Considerations

We nevertheless find the mechanisms provided by XRep cannot evaluate votes and remove those from suspicious attackers effectively; furthermore, they may have negative impacts. First of all, XRep requires that PollReply messages include not only votes but also peers’ IP and port. This enables attackers to track the IP addresses of other peers and launch a massive attack against those peers. Second, XRep employs a simple cluster computation to evaluate votes, so that the votes are either entirely accepted or blindly discarded. Hence, the download decision may be based on an inadequate evaluation. Finally, requesting confirmation can generate much additional overhead if a large number of peers have been chosen.

3.5 X2Rep

X2Rep [14] extends XRep and is designed to protect against the weaknesses of XRep. As in XRep, a distributed polling algorithm is employed in X2Rep to manage the reputations based on resources and servents. However, it is an improvement on XRep and has the ability to compute the weight of a peer based on past voting experiences.

Basic Assumptions

The weight of a peer introduced by the X2Rep protocol is the most critical part of the voting evaluation process. It is measured in terms of a credibility rating. Each peer builds and maintains a table of credibility ratings. The table contains the peer ids and the credibility values of other peers who have previously submitted votes. The credibility value cij given by a peer Pi for another peer Pj is a real number in the interval [0, 1]. For unknown peers or recognised malicious attackers, the credibility value is set to zero.

Approach

In X2Rep, the first two phases are similar to XRep’s. In the third (vote evaluation) phase, however, X2Rep employs a different algorithm: only votes from peers whose credibility values are greater than zero are included in the evaluation. The computation multiplies each vote value by the credibility value of its voter, and the final results are given to the user to decide on the download, whilst votes from unknown peers or recognised malicious attackers are temporarily stored. After download completion, the user updates their opinions on the resource and the offerer. The credibility ratings of the peers who have submitted votes are then refreshed.

If the votes are the same as Pi’s opinions, the corresponding submitters’ credibility ratings increase by 0.05. Otherwise, the credibility ratings are reduced to zero.
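The vote evaluation and credibility update described above can be sketched as follows. This is only an illustrative sketch, not the X2Rep implementation: the names `Vote`, `evaluate_votes` and `update_credibility` are hypothetical, and combining the weighted votes into a normalised average is an assumption, since the text only states that each vote is multiplied by the credibility of its voter.

```python
from dataclasses import dataclass

CREDIBILITY_STEP = 0.05  # increment stated in the text


@dataclass
class Vote:
    voter_id: str  # peer id of the voter
    value: int     # binary resource vote: 1 = good, 0 = bad


def evaluate_votes(votes, credibility):
    """Return (adjusted_value, deferred_votes).

    Only votes from peers with credibility > 0 are counted. Votes from
    unknown or zero-credibility peers are set aside. The normalised
    weighted average is an assumption made for this sketch.
    """
    used, deferred = [], []
    for vote in votes:
        weight = credibility.get(vote.voter_id, 0.0)
        if weight > 0.0:
            used.append((vote, weight))
        else:
            deferred.append(vote)
    total_weight = sum(weight for _, weight in used)
    if total_weight == 0.0:
        return None, deferred  # no credible voter: no adjusted value
    adjusted = sum(vote.value * weight for vote, weight in used) / total_weight
    return adjusted, deferred


def update_credibility(votes, credibility, own_opinion):
    """After the download, reward agreeing voters and reset the others."""
    for vote in votes:
        if vote.value == own_opinion:
            raised = credibility.get(vote.voter_id, 0.0) + CREDIBILITY_STEP
            credibility[vote.voter_id] = min(1.0, raised)  # stay in [0, 1]
        else:
            credibility[vote.voter_id] = 0.0
```

Under these assumptions, a newcomer whose vote agrees with the initiator’s opinion enters the table at 0.05, whereas a single disagreement resets an established voter to zero.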

The X2Rep protocol, an extension of XRep, is illustrated in Table 3.3.

Phase 1: Resource searching
    Initiator p to Network: Query(search string, min speed)
    Offerers to Initiator p: QueryHit(num hits, IP, port, speed, Result, servent idi)

Phase 2: Resource selection & vote polling
    Initiator p to Network: Poll(resource id, {Offerer1, Offerer2, ..., Offerern}, PKp)
    Voters to Initiator p: PollReply({([Votes, servent idi]SKi, PKi)}PKp)

Phase 3: Vote evaluation
    Actions:
    – Find credibility ratings for voters and adjust votes accordingly
    – Create adjusted resource and offerer votes, then decide on download compared with the trust threshold values

Phase 4: Resource download
    Actions:
    – After successful download, check the resource’s digest
    – Update experience and credibility repositories

Table 3.3: The X2Rep reputation system protocol

Security Considerations

X2Rep effectively improves on XRep’s security shortcomings. Credibility rating provides a more reliable assessment of the resources and the offerer before initiating a download. The protocol replaces the IP and port of a peer with a self-signed certificate within PollReply messages, which avoids exposing personal information. Furthermore, it reduces additional overheads by discarding several unnecessary XRep network communication phases.

Nonetheless, there are several shortcomings of X2Rep reputation systems as follows:

• Online polling only – both XRep and X2Rep protocols require peers to be online during the polling process. However, it is impractical for peers to run the reputation systems around the clock simply in order to distribute votes.

• Cold-start – in X2Rep, newcomers with inadequate credibility ratings may take a risk to download resources. They also have to struggle for credibility ratings in other peers so as to continuously participate in polling distribution. However, establishing sufficient credibility ratings in either newcomers’ or in other peers’ repositories takes a long time for newcomers.

• Performance bottlenecks – X2Rep suffers from a problem in that peers can only generate a limited number of credibility ratings. Final adjusted trust values presented to users will be restricted to the limited number of credibility ratings. Hence, users may make download selections based on inadequate reputation values.

3.6 Credence

Walsh and Sirer [51] introduced Credence, a robust reputation system designed for the Gnutella P2P system, which provides not only a basic voting scheme and a weight of correlation to evaluate votes, but also a selective information sharing mechanism to find correlations between peers without common interests. The following list details these three mechanisms:

• Employs a simple voting scheme based on objects, where objects are associated with positive or negative evaluations. Any client is able to give a positive or negative vote on any object.

• Enables clients to weigh votes according to the statistical correlation between peers. Statistical correlation is calculated by the historical relationship between a pair of peers. Statistical correlation can be positive or negative. A positive correlation indicates that a pair of peers tend to agree on votes, whereas a negative correlation signifies they tend to disagree. A pair of peers without historical correlation would discard votes from each other.

• Allows clients to extend the scope of correlation via selective information sharing. This is an effective method to discover relationships between peers without common interests. To transfer relationships, the statistical correlations between A and B, and between B and C should be positive. Then, the correlation between A and C can be computed by multiplying the above two correlations.

Basic Assumptions

The Credence system assumes that each resource consists of data and a descriptor, and each servent generates a pair of public and private keys for the use of cryptography. A resource identifier associated with the resource includes a hash of the resource data and the descriptor. Several methods illustrated below are used to manage reputations in Credence:

• Representing reputation – each servent maintains a resource repository which consists of a resource identifier and a binary value indicating whether the corresponding resource is good (1) or bad (0).

• Vote translation – the information in the resource repository is used to generate a vote. It adopts the following format: ([resourceID, value]K, certK). The vote includes a signature under the servent’s key K, together with a certificate for K.

• Local state – each servent maintains a vote database of votes during the vote polling process. The database contains four attributes: a resource identifier, a timestamp, a vote from the database owner and a vector of votes from others. The database is used for poll responses and statistical correlation calculation.

• Representing correlation – each Credence servent computes a coefficient of correlation θ for each vote. The coefficient is the Phi correlation: $\theta = (p - ab) / \sqrt{a(1 - a)\,b(1 - b)}$. Given a set of resources, let a and b be the fractions of resources on which servents A and B, respectively, voted positively, and let p be the fraction on which both A and B voted positively. In the Credence system, only coefficients with |θ| ≥ 0.5 are used to weigh votes. The coefficient of correlation for a peer is stored in a correlation table, which is reused later and refreshed periodically.
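The Phi correlation and the |θ| ≥ 0.5 rule above can be illustrated with a short sketch. The function names are hypothetical, and two choices are assumptions rather than part of the description: the fractions are computed over the resources both peers have voted on, and the coefficient is treated as undefined when either peer voted uniformly (the denominator is then zero). The last function follows the earlier remark that correlations are transferred by multiplying two positive coefficients.

```python
import math


def phi_correlation(votes_a, votes_b):
    """Phi correlation between two peers' voting histories.

    votes_a and votes_b map resource identifiers to True (positive vote)
    or False (negative vote). Only shared resources are compared, which
    is an assumption of this sketch.
    """
    shared = votes_a.keys() & votes_b.keys()
    n = len(shared)
    if n == 0:
        return None  # no common history
    a = sum(votes_a[r] for r in shared) / n
    b = sum(votes_b[r] for r in shared) / n
    p = sum(votes_a[r] and votes_b[r] for r in shared) / n
    denominator = math.sqrt(a * (1 - a) * b * (1 - b))
    if denominator == 0:
        return None  # undefined when a peer voted the same way everywhere
    return (p - a * b) / denominator


def usable(theta, threshold=0.5):
    """Apply the rule that only |theta| >= 0.5 is used to weigh votes."""
    return theta is not None and abs(theta) >= threshold


def transitive(theta_ab, theta_bc):
    """Correlation between A and C derived through B.

    Both input correlations must be positive; otherwise nothing is inferred.
    """
    if theta_ab is None or theta_bc is None:
        return None
    if theta_ab <= 0 or theta_bc <= 0:
        return None
    return theta_ab * theta_bc
```

Two peers with identical histories obtain θ = 1 and two peers who always disagree obtain θ = −1, so a negative coefficient is as informative as a positive one.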

Approach

The Credence protocol can be divided into four phases and described as follows:

• Phase 1 – Resource searching: an initiator p broadcasts a query message to others to look for a resource. The receivers send a response message back to p.

• Phase 2 – Vote collection: p issues a vote-gather query on a resource to others. Upon receiving the query, servents respond with matching votes in the vote databases.

• Phase 3 – Evaluation of votes: after obtaining a set of votes on the resource, p generates the correlations for voters to weigh the votes, by using the aforementioned formula. Hence, p can decide whether to download the resource or not.

• Phase 4 – Resource downloading: once p finishes the downloading, p updates the local repositories, including the resource table, the vote database and the correlation table.

Security Considerations

In general, the Credence system provides a powerful computation of the correlation between two peers based on their past voting histories. However, it is difficult for the Credence system to detect malicious peers who submit accurate votes over an extended period, but occasionally provide inaccurate votes. In addition, this system maintains resource-based reputations only.

3.7 Summary

Table 3.4 summarises the comparisons of various trusted reputation mechanisms. Furthermore, much evidence and several studies [2, 6, 42] show that reputation systems are a robust solution to protecting P2P networks from malicious attacks. The main advantage of such reputation systems is to protect against most known attacks and vulnerabilities, and simultaneously retain the characteristic of anonymity together with maintaining minimal overheads. During the past few years, applications and research have been conducted to develop several protocols of reputation systems to protect P2P networks. In this chapter, we have demonstrated and discussed several trusted reputation systems, including eBay, P2PRep, XRep, X2Rep and Credence.

Name: eBay
    Category: Online auction web site
    Summary of reputation mechanism: Buyers and sellers rate one another after transactions
    Format of solicited feedback: Positive, negative or neutral rating, plus a short comment
    Format of published feedback: Sums of positive, negative and neutral ratings during last six months

Name: P2PRep
    Category: Online information services for Gnutella
    Summary of reputation mechanism: A reputation system based on servents only; data receivers rate offerers after download
    Format of solicited feedback: Num plus and num minus for each servent; also num agree and num disagree for each voter
    Format of published feedback: An ordinal scale (e.g. from A to D) or a continuous number (e.g. 80%) for a servent

Name: XRep
    Category: Online information services for Gnutella
    Summary of reputation mechanism: A reputation system based on servents and resources; data receivers rate both offerers and resources after download completion
    Format of solicited feedback: Num plus and num minus for each servent; and a binary value for each resource
    Format of published feedback: An ordinal scale (e.g. from A to D) or a continuous number (e.g. 80%) for a servent; also a binary value for a resource

Name: X2Rep
    Category: The extension of XRep
    Summary of reputation mechanism: A reputation system based on servents and resources, plus a weight to assess votes; data receivers rate offerers, resources and voters after download completion
    Format of solicited feedback: A vector expressing reputation value with each offerer; a binary value for each resource; and a real number to weigh votes from voters
    Format of published feedback: A continuous number (e.g. 80%) for a servent; and a binary value for a resource

Name: Credence
    Category: An object-based reputation system
    Summary of reputation mechanism: A reputation system based on resources only, plus a coefficient of correlation to weigh votes; data receivers rate resources, refresh vote database and table of correlation after downloading
    Format of solicited feedback: A binary value for each resource; a correlation table stored in a set of coefficients for each peer
    Format of published feedback: A self-certificate containing a binary value for a resource; a sharable positive correlation for an unknown peer

Table 3.4: The summary of various trusted reputation mechanisms

Chapter 4

Overview of BitTorrent Network

4.1 Introduction

BitTorrent is a P2P file-sharing protocol [11], the brainchild of Bram Cohen, designed to distribute large resources by utilising the unused upload capacity of downloaders. The philosophy of BitTorrent is that peers should upload while they are downloading. In conventional client/server downloading, a high demand by clients leads to degradation of the server performance. On the other hand, there is a significant number of selfish clients in traditional P2P networks, such as Gnutella. With BitTorrent, high bandwidth utilisation can be achieved and free-riding can be prevented. Recently, BitTorrent has grown into an option for content providers to efficiently distribute data (e.g. BitTorrent helped Redhat to distribute 1.77GB Linux Redhat 9 to more than 180,000 clients in the first five months [26]).

4.2 Architecture

The basic idea of BitTorrent is to split a file into several equal-sized pieces (except for the last piece, which may be smaller). Downloaders of the file receive pieces from multiple peers and, in return, upload the pieces which they already possess. Once the download finishes, the downloaders can reassemble all the pieces into the original file.

Since BitTorrent does not provide a search mechanism, it relies on search engines hosted on web servers. The web server maintains a list of torrent files. Each torrent file refers to a resource with a common tracker entry. The file also includes the number of pieces and the SHA1 hash digest of each piece, which is used to verify the piece’s integrity. The tracker is the central point in the BitTorrent network, and keeps track of all the active peers in the process of uploading/downloading the corresponding resource. Peers in the network are either seeders (peers with a complete copy of the resource) or leechers (peers with a partial resource).

Figure 4.1: BitTorrent Architecture. Diagram labels: Web Server, Tracker, Link, .Torrent, Point to, Track, Peers, Peer0, Leecher1, Leecher2, Seeder1.

1) Before initialising the download, the initiator (Peer0) needs to obtain a torrent file from the web server. 2) After fetching the torrent file, Peer0 contacts the tracker using a simple protocol layered on top of HTTP and sends information about its IP address and the file it is currently downloading. The tracker responds with a random list of other peers’ information, including IP address/port. 3) From this point onwards, Peer0 establishes connections with other peers and begins to download. Figure 4.1 shows the basic architecture of BitTorrent.

In order to maximise download speed and guard against free-riding, a tit-for-tat policy is utilised. In simple terms, each peer uploads the data it has already downloaded to others while it continues downloading. Peers willing to upload are rewarded with the best download rates by their neighbours. Choking, which is a peer’s temporary refusal to upload to another peer, is the mechanism that enforces the tit-for-tat policy. In BitTorrent, each peer unchokes a fixed number (the default is four) of other peers, chosen according to the current download rate of each connection. Each peer decides which connections to choke or unchoke by recalculating the download rates once every 10 seconds. Once a peer completes downloading and becomes a seeder, it no longer uses download rates to decide which peers should be unchoked. Instead, it uploads to preferred peers (up to five) with the best upload rates.

Figure 4.2: A glance at torrent files of Linux from mininova.org web page

To avoid wasting network traffic through rapid switching between choking and unchoking, and to find potentially unused bandwidth, each peer performs an optimistic unchoke every 30 seconds, which unchokes a randomly chosen neighbour regardless of its download rate.
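The choking decisions described above can be sketched as two small selection functions. The function names and the `download_rates` dictionary are hypothetical; the sketch only covers the selection itself and assumes that some surrounding client code calls `select_unchoked` every 10 seconds and `optimistic_unchoke` every 30 seconds.

```python
import random

REGULAR_UNCHOKE_SLOTS = 4        # default number of unchoked peers in the text
RECALCULATION_INTERVAL = 10      # seconds between choke recalculations
OPTIMISTIC_UNCHOKE_INTERVAL = 30  # seconds between optimistic unchokes


def select_unchoked(download_rates, slots=REGULAR_UNCHOKE_SLOTS):
    """Return the peers to unchoke: those with the best download rates.

    download_rates maps a peer identifier to the rate (bytes per second)
    at which we are currently downloading from that peer.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    return set(ranked[:slots])


def optimistic_unchoke(download_rates, unchoked, rng=random):
    """Pick one currently choked peer at random, regardless of its rate."""
    candidates = sorted(p for p in download_rates if p not in unchoked)
    if not candidates:
        return None
    return rng.choice(candidates)
```

Because the optimistic choice ignores the download rate, a newcomer that has not yet uploaded anything still has a chance of receiving its first pieces.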

In order to make the number of copies of each piece as equal as possible in the network and to reduce the chances of a peer having nothing to upload, a good piece selection algorithm is required. When a download initialises, the first piece is randomly selected by a peer. After the first piece has been downloaded, the piece selection strategy changes to the rarest first policy. Using this strategy, the peer picks a piece which is least possessed by other peers to download.
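A minimal sketch of this piece selection strategy is given below. The name `pick_piece` and the representation of peers’ holdings as sets of piece indices are assumptions made for illustration; ties between equally rare pieces are broken at random, which the text does not specify.

```python
import random
from collections import Counter


def pick_piece(have, peer_pieces, rng=random):
    """Choose the next piece index to request, or None if nothing is needed.

    have is the set of piece indices already downloaded; peer_pieces maps
    each connected peer to the set of piece indices that peer possesses.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)  # each peer counts once per piece
    wanted = sorted(i for i in availability if i not in have)
    if not wanted:
        return None
    if not have:
        # The very first piece is selected at random.
        return rng.choice(wanted)
    # Afterwards, request a piece held by the fewest peers (rarest first).
    rarest = min(availability[i] for i in wanted)
    return rng.choice([i for i in wanted if availability[i] == rarest])
```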

4.3 Protocol Specification

4.3.1 Bencoding

Bencoding organises and specifies data in a terse, efficient and extensible format. It currently supports the following types: strings, integers, lists and dictionaries.

• Strings are encoded as follows: <string length encoded in base ten ASCII>:<string data>. The length prefix is a base-ten number. Example: 4:spam stands for the string ‘spam’.

• Integers are encoded as follows: i<integer encoded in base ten ASCII>e. Example: i3e corresponds to 3, and i-3e corresponds to -3. However, i03e and i-0e are invalid.

• Lists are encoded as follows: l<bencoded values>e. Lists may contain any bencoded elements, including integers, strings, dictionaries and other lists. Example: l4:spam4:eggse represents a list of two strings: [‘spam’, ‘eggs’].

• Dictionaries are encoded as follows: d<bencoded string><bencoded element>e. The keys must be bencoded strings and must appear sorted in raw string order. The values may be any bencoded type, including strings, integers, lists and other dictionaries. Example: d3:cow3:moo4:spam4:eggse represents the dictionary {‘cow’: ‘moo’, ‘spam’: ‘eggs’}. Example: d4:spaml1:a1:bee represents the dictionary {‘spam’: [‘a’, ‘b’]}.
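The four bencoding rules above are small enough to be captured in a short encoder and decoder. The following is a minimal sketch rather than a reference implementation; the function names `bencode` and `bdecode` are chosen for illustration, and decoded strings are returned as raw bytes because bencoding itself does not define a character encoding.

```python
def bencode(value):
    """Encode ints, bytes/str, lists and dicts into bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("booleans are not a bencoding type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are strings and are emitted in sorted raw-byte order.
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data):
    """Decode one complete bencoded value from bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after bencoded value")
    return value


def _decode(data, pos):
    kind = data[pos:pos + 1]
    if kind == b"i":
        end = data.index(b"e", pos)
        text = data[pos + 1:end]
        digits = text[1:] if text.startswith(b"-") else text
        # Reject empty numbers, leading zeros (i03e) and negative zero (i-0e).
        if (not digits.isdigit() or text == b"-0"
                or (len(digits) > 1 and digits.startswith(b"0"))):
            raise ValueError("invalid bencoded integer")
        return int(text), end + 1
    if kind in (b"l", b"d"):
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        if kind == b"l":
            return items, pos + 1
        if len(items) % 2 != 0:
            raise ValueError("dictionary key without a value")
        return dict(zip(items[0::2], items[1::2])), pos + 1
    # Otherwise the value is a length-prefixed string.
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length
```

Applied to the examples in the list above, `bencode(["spam", "eggs"])` yields `l4:spam4:eggse`, and decoding `i03e` raises an error.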

4.3.2 Metainfo File (.torrent file)

A metainfo file contains information describing the source files and URL of trackers. All the contents in metainfo files are bencoded. The specification of bencoding is defined above. The data of a metainfo file is bencoded in dictionaries with the following keys:

• announce: The URL of the tracker (string).

• announce-list: This is an optional field which is an extension to the official specification. The key is used to supply a group of trackers.

• info: There are two types of formats: one is the case of a single file with no directory structure; the other is the case of multiple files with subdirectory trees. In the single file case, the info field maps to a dictionary with the following structure:

– length: the length of the file in bytes (integer).
– name: the name of the file (string).
– piece length: the number of bytes in each piece (integer).
– pieces: a string whose length is a multiple of 20, formed by concatenating the 20-byte SHA1 hash of each piece (raw binary encoded).

In the case of multiple files, the info field maps to a dictionary with the keys described below:

– files: a list of dictionaries, one for each file, with the following keys:
    ∗ length: the length of the file in bytes (integer).
    ∗ path: a list of strings representing the path and actual file name. A zero-length list is an error.
– name: the name of the top directory in the structure.
– piece length: the number of bytes in each piece (integer).
– pieces: a string whose length is a multiple of 20, which corresponds to the list of SHA1 hashes of the pieces in index order (raw binary encoded).

• created by: This is also an optional field, which describes the name and version of the program used to create the metainfo file. (string)

• creation date: This is an optional field specifying the creation time of the torrent in standard UNIX epoch format.

• comment: An optional field which describes comments made by the creator. (string)

Note: The piece length is usually a power of two, most commonly 256KB (2^18), 512KB (2^19) and 1MB (2^20). The choice of piece size is not standardised and depends on the provider’s decision. However, too small a piece size produces a large metainfo file, and too large a piece size may result in inefficiency. Furthermore, in the case of multiple files, the data is considered as one long continuous stream, comprising the concatenation of each file in the order listed in the files list.
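To make the pieces field concrete, the sketch below computes it for a block of data and checks a downloaded piece against it. The function names are hypothetical, and the data is held in memory for simplicity; a real client would read the file or the concatenated files as a stream.

```python
import hashlib

SHA1_LENGTH = 20  # bytes per SHA1 digest


def piece_hashes(data, piece_length):
    """Return the value of the pieces field for the given content.

    The content is split into pieces of piece_length bytes (the last one
    may be shorter) and the SHA1 digests are concatenated in index order.
    """
    digests = [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]
    return b"".join(digests)


def verify_piece(pieces, index, piece_data):
    """Check downloaded piece data against the digest stored at index."""
    expected = pieces[index * SHA1_LENGTH:(index + 1) * SHA1_LENGTH]
    return hashlib.sha1(piece_data).digest() == expected
```

For 1MB of content and a 256KB piece length this produces four digests, that is, a pieces string of 80 bytes. This illustrates the trade-off noted above, since halving the piece length doubles the size of this field.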

4.3.3 Tracker HTTP protocol

Peers communicate with a tracker via HTTP. Peers request information from the tracker via HTTP GET parameters and the tracker returns a bencoded response. Peers notify the tracker of the current status of their download of the resource described by the metainfo file. The response message includes a list of peers in the process of downloading the corresponding resource.

The following keys are used in Tracker GET requests:

• info hash: A 20-byte SHA1 hash of the bencoded value of the info key from the metainfo file. The info key is a dictionary as a substring of the metainfo file.

• peer id: A 20-byte string generated as a unique ID for clients. Each client randomly generates its own ID at the startup of a new download.

• ip: This is an optional parameter, which identifies the true IP address of a client. In general, this parameter is not required as the IP address of a client can be determined via the HTTP request from the client. However, it is necessary when the IP address from where the request comes is not the IP address of the client. For instance, clients are located behind a Firewall/NAT gateway, or communication between clients and a tracker passes through a proxy/cache server.

• port: This is the port on which the peer is listening. Default port numbers reserved by BitTorrent range between 6881 and 6889.

• uploaded: The total amount uploaded so far, encoded in base ten ascii.

• downloaded: The total amount downloaded so far, encoded in base ten ascii.

• left: The number of bytes remaining to be downloaded by this client, encoded in base ten ascii.

• event: This is an optional parameter specifying either started, completed or stopped. An announcement of started is sent when a download starts. An event key with completed is provided when a download finishes, and an announcement of stopped is forwarded when a client ceases downloading.

• num want: This is an optional key, which describes the number of peers that the client wants to receive from the tracker.

The tracker sends back a bencoded dictionary with the following keys:

• failure reason: This maps to a human-readable string which explains the reason for the failure of the request. If this key is present, no other keys are required.

• interval & peers: The interval and peers keys are presented together. interval maps to the number of seconds that the peer should wait between regular requests to the tracker. The peers key contains a list of dictionaries, each of which has the following keys:

– peer id: represents the peer’s self-selected ID.
– ip: maps to the IP address or DNS name of the peer.
– port: represents the peer’s port number.
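The request side of this exchange can be sketched by assembling the GET parameters into an announce URL. The tracker address below is a placeholder and the function name is hypothetical. The parameter names are spelled as on the wire in the BitTorrent specification (info_hash, peer_id, numwant), which differs slightly from the spacing used in the list above.

```python
from urllib.parse import urlencode


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded, downloaded, left,
                       event=None, numwant=None):
    """Build a tracker GET request URL from the announce parameters.

    info_hash and peer_id are raw 20-byte values; urlencode
    percent-escapes every byte that is not URL-safe.
    """
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes long")
    params = {
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,      # base ten ASCII once encoded
        "downloaded": downloaded,
        "left": left,
    }
    if event is not None:
        if event not in ("started", "completed", "stopped"):
            raise ValueError("unknown event: %r" % event)
        params["event"] = event
    if numwant is not None:
        params["numwant"] = numwant
    return announce + "?" + urlencode(params)


# Hypothetical values, for illustration only.
example_url = build_announce_url(
    "http://tracker.example.org/announce",
    info_hash=bytes(range(20)),
    peer_id=b"-XX0001-abcdefghijkl",
    port=6881, uploaded=0, downloaded=0, left=1024,
    event="started", numwant=50,
)
```

The optional ip parameter is omitted in this sketch, since the tracker can usually determine the address from the HTTP request itself.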

4.3.4 Peer TCP Protocol

Peer connections are implemented over a TCP protocol and are symmetrical. This performs efficiently for exchanging pieces as described in the metainfo file. Each client maintains two types of states on connections with remote peers.

• choked: Choked is a notification that the remote peer will accept no requests until the client is unchoked. All requests from the client will be discarded by the remote peer.

• interested: This is a notification that the remote peer is interested in pieces that the client is offering.

Data exchange happens only if one side is interested and the other side is not choking. However, it should be noted that downloaders need to notify remote peers of their interest state at all times, and keep track of whether or not they have been choked by the remote peers.

Message Flow

The peer wire protocol begins with a handshake followed by a stream of length-prefixed messages. Note that all integers in the peer wire protocol are encoded as four bytes big-endian, including those in all messages which come after the handshake.

1. Handshake

The handshake exchanges control information between peers to verify that they are ready to receive data before any is sent. The format of a handshake message is as follows:

• prefixed length: In version 1.0 of the BitTorrent protocol, this is a single byte with the decimal value 19.

• prefixed string: In version 1.0 of the BitTorrent protocol, this is the string ‘BitTorrent protocol’.

• reserved: After the fixed headers, this comes in 8 reserved bytes, which are all zero in current implementations. The purpose of this field is to be used for further extension.

• info hash: 20-byte SHA1 hash of the info value from the metainfo file. This is the same info hash that is transmitted in tracker requests. If they are different, the connection between peers would be severed. However, there is an exception whereby clients accept multiple downloads over a single port and maintain a list of info hashes.

• peer id: 20-byte unique ID for the client which is the same peer id that is transmitted in tracker requests. If they do not match each other, then the client would drop the connection.
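The handshake layout above amounts to a fixed 68-byte message (1 + 19 + 8 + 20 + 20 bytes), which can be packed and checked with Python’s struct module. This is a sketch under the assumption of a single active download; the function names are hypothetical.

```python
import struct

PROTOCOL_STRING = b"BitTorrent protocol"
HANDSHAKE_FORMAT = "!B19s8s20s20s"  # length, string, reserved, info hash, peer id
HANDSHAKE_LENGTH = struct.calcsize(HANDSHAKE_FORMAT)


def build_handshake(info_hash, peer_id):
    """Pack a version 1.0 handshake with all reserved bytes set to zero."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes long")
    return struct.pack(HANDSHAKE_FORMAT, len(PROTOCOL_STRING),
                       PROTOCOL_STRING, bytes(8), info_hash, peer_id)


def parse_handshake(data, expected_info_hash):
    """Validate a received handshake and return the remote peer id.

    Raises ValueError where the text says the connection would be
    severed: wrong protocol header or a different info hash.
    """
    if len(data) != HANDSHAKE_LENGTH:
        raise ValueError("handshake must be %d bytes" % HANDSHAKE_LENGTH)
    length, protocol, _reserved, info_hash, peer_id = struct.unpack(
        HANDSHAKE_FORMAT, data)
    if length != len(PROTOCOL_STRING) or protocol != PROTOCOL_STRING:
        raise ValueError("unknown protocol header")
    if info_hash != expected_info_hash:
        raise ValueError("info hash mismatch")
    return peer_id
```

A client serving several downloads over one port would instead look the received info hash up in its list of active info hashes, as noted in the info hash item above.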

2. Message

After a successful handshake, there is an alternating stream of length prefixes and messages in the form of <length prefix><message ID><payload>. The length prefix is a four-byte big-endian value and the message ID is a single decimal byte. The payload depends on the message ID. The following list describes all the types of messages:

Keepalive : The ‘keepalive’ message is a message of zero length with no message ID and no payload. Keepalive messages are usually transmitted once every two minutes to maintain connections between peers.

Choke : The ‘choke’ message is a message with a fixed length and no payload.

Unchoke : The ‘unchoke’ message is a message with a fixed length and no payload.

Interested : The ‘interested’ message is a message with a fixed length and no payload.

Not interested : The ‘not interested’ message is a message with a fixed length and no payload.

Have : The payload of the ‘have’ message is a single index number, which identifies a piece that has just been successfully downloaded.

Bitfield : The ‘bitfield’ message is the first message after the handshaking sequence and it is optional if a client has no pieces. The length of the ‘bitfield’ message represents the length of the bitfield. The bitfield is the payload of the message which corresponds to pieces owned in order. The first byte of the bitfield maps to piece indices 0 ∼ 7, from high bit to low bit, the second byte corresponds to piece indices 8 ∼ 15, and so on. Bits set to one indicate a piece that has been downloaded, while bits of zero represent a missing piece.

Request : The ‘request’ message contains a payload with an index, begin and length. Index corresponds to the zero-based piece index. Begin corresponds to the zero-based byte offset within the piece. Length is generally a power of two (typically 2^15 and no greater than 2^17), except at the end of the file.

Piece : The ‘piece’ message is related to the ‘request’ message. It contains a payload with an index, begin and piece data. Index corresponds to the zero-based piece index. Begin corresponds to the zero-based byte offset within the piece. Piece data refers to the data of a portion of the piece.

Cancel : The ‘cancel’ message has the same payload as the ‘request’ message. Typically, it is employed to notify other peers of end game mode (described below) during the end of a download.

A comment on end game mode: when a download is almost complete, it is possible that the last few pieces may only trickle slowly. To speed this up, the client sends requests for all the missing pieces to all the peers from which it is downloading. However, it must send cancel messages to everyone once a piece arrives in order to avoid great inefficiency.
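The message framing described above can be sketched with a few helpers. The numeric message IDs (0 for choke through 8 for cancel) are not listed in the text; they are taken from the BitTorrent protocol specification. The function names are hypothetical, and the decoder assumes the caller accumulates received bytes in a buffer.

```python
import struct

# Message IDs as assigned by the BitTorrent protocol specification.
(CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED,
 HAVE, BITFIELD, REQUEST, PIECE, CANCEL) = range(9)

KEEPALIVE = struct.pack("!I", 0)  # zero length, no ID, no payload


def encode_message(message_id, payload=b""):
    """Frame a message as <length prefix><message ID><payload>."""
    return struct.pack("!IB", 1 + len(payload), message_id) + payload


def encode_request(index, begin, length):
    """Build a 'request' message for a block inside a piece."""
    return encode_message(REQUEST, struct.pack("!III", index, begin, length))


def decode_message(buffer):
    """Split one message off the front of a receive buffer.

    Returns (message_id, payload, remaining_bytes), with message_id set
    to None for a keepalive, or None if the message is not yet complete.
    """
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack("!I", buffer[:4])
    if len(buffer) < 4 + length:
        return None
    if length == 0:
        return None, b"", buffer[4:]
    return buffer[4], buffer[5:4 + length], buffer[4 + length:]


def bitfield_pieces(bitfield, piece_count):
    """Return the piece indices set in a bitfield, high bit first."""
    return {
        index for index in range(piece_count)
        if bitfield[index // 8] & (0x80 >> (index % 8))
    }
```

Since ‘cancel’ has the same payload as ‘request’, a cancel for end game mode can be built by passing the same three packed integers to `encode_message` with the `CANCEL` ID.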

4.4 Strengths of BitTorrent

Several advantages of BitTorrent networks can be outlined as follows [5]:

• Download Performance – BitTorrent is robust and scalable in large file distribution. It effectively utilises unused upload bandwidth to achieve high download performance. It also scales well as the number of nodes increases.

• Local Rarest Policy – the rarest-first policy provides a way for peers to pick pieces during downloading. It helps to keep the number of copies of each piece as even as possible across the network.

• Tit-for-tat Policy – the BitTorrent TFT policy mechanism is used to encourage cooperation in terms of data served by nodes and to guard against free-riding and unfairness in P2P systems.

• Rapid Participation – BitTorrent enables newcomers to participate in the network as soon as possible and become productive nodes.

• Other advantages include the fact that spurious pieces/files cannot be propagated due to the verification of a SHA1 hash, and that BitTorrent has the ability to resume a download on account of the partition of contents.

4.5 Shortcomings of BitTorrent

Recently, several studies and surveys [20, 29] have indicated that BitTorrent has become a “king” of P2P internet traffic. Several studies point out that BitTorrent can efficiently distribute large content with optimisation of the use of network bandwidth, and keep desirable scalability properties. However, we believe BitTorrent has a number of weaknesses.

4.5.1 Distribution of false information

This is the main shortcoming of BitTorrent and is caused by the lack of a way of verifying either resources or peers. The BitTorrent system manages its own namespace, associating identifiers called info hashes with torrent files for resources, and allowing peers to have temporary identities on the system. In fact, a malicious file can easily be distributed by publishing several different torrent files or by combining it with an acceptable resource. For instance, a three MB file can be divided into 12 pieces of 256 KB, or it can be divided into six pieces of 512 KB. Because the info hash is the SHA1 digest of the info value, including name, length, piece length and pieces’ digests in a torrent file, these two different divisions will generate different info hashes with a high probability. Additionally, the identities of peers are not persistent, and may be randomly produced. With the protection of anonymity, a malicious attacker could distribute a doctored file with an attractive name to honest peers for download and further spread.
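The effect of the piece length on the info hash can be checked with a short experiment. The sketch below builds a single-file info dictionary for the same three MB of content with two piece lengths and compares the resulting digests. The file name and the zero-filled content are placeholders, and the small bencoder handles only the types needed here.

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, strings and dictionaries with str keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, dict):
        body = b"".join(bencode(k) + bencode(value[k]) for k in sorted(value))
        return b"d" + body + b"e"
    raise TypeError("unsupported type: %r" % type(value))


def info_hash(data, name, piece_length):
    """Return (info hash, number of pieces) for a single-file torrent."""
    pieces = b"".join(
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    )
    info = {
        "length": len(data),
        "name": name,
        "piece length": piece_length,
        "pieces": pieces,
    }
    return hashlib.sha1(bencode(info)).digest(), len(pieces) // 20


content = bytes(3 * 2 ** 20)  # three MB of placeholder data
hash_256k, count_256k = info_hash(content, "example.bin", 256 * 1024)
hash_512k, count_512k = info_hash(content, "example.bin", 512 * 1024)
```

The two torrents describe byte-for-byte identical content, yet they are announced under different info hashes, so a tracker or peer comparing info hashes has no means of recognising them as the same resource.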

4.5.2 Man-in-middle attack

A man-in-middle attack can be launched in a P2P network where an intermediate node, which acts as a message router, actively changes the messages, garbles them or drops them [36]. The application-level routing of P2P networking in BitTorrent makes this attack easier. Assume that A is a client, T is a trusted tracker which helps A to find other active peers, and B is a malicious peer.

• A sends the tracker T a request message including its IP address and port, and then T responds with a list of IP addresses of other peers.

• B intercepts the tracker’s response message and modifies the IP address and port fields, directing it to a sacrificial host. The doctored response message from B is sent back to A.

• A begins downloading from the sacrificial host. Although the host does not provide a service, a flood of messages may shut down the sacrificial host.

4.5.3 IP harvesting

IP harvesting is a type of indirect attack, in which a harmful user actively collects a list of IP addresses, and then launches massive attacks against the owners of these IP addresses by discovering their security flaws. The user’s IP address is generally unknown to anyone other than the internet server which the user contacts. However, the IP address needs to be broadcast to other peers due to the open nature of the P2P network. The current BitTorrent network might facilitate IP harvesting because of its open nature and the carelessness of trackers. In fact, each peer in the process of downloading is required to notify a tracker of its IP address, so that the tracker collects thousands of peers’ IP addresses. Therefore, reading data on a tracker is fairly easy for a malicious agent that can sniff¹ the traffic.

Malicious attackers continuously scanning a tracker can find many IP addresses in the BitTorrent network and may take advantage of these IP addresses. If the IP addresses change after some time or are assigned by DHCP, it may not cause any harm. However, users that have static IP addresses and run BitTorrent clients all the time face bigger problems. A static IP address may belong to a specific internet server. A malicious attacker can either hack into the server through a poorly written BitTorrent client or break into the server by force.

¹ Sniff refers to the action of a program or device that monitors data traveling over a network. Unauthorised sniffers can be extremely dangerous to a network’s security because they are virtually impossible to detect and can be inserted almost anywhere.

4.6 Summary

Recently, BitTorrent has quickly gained popularity and is becoming one of the most promising P2P systems. BitTorrent is a robust and scalable application targeted at large file distribution. In this chapter, we have described the architecture and the protocol specification of BitTorrent networks. Moreover, we have analysed the characteristics of BitTorrent and summarised its advantages and disadvantages. Along with its remarkable advantages, the inherent risks and threats with BitTorrent have become a stumbling block against further development. Therefore, it is necessary to design a self-regulating system to address these problems. In the next chapter, we will propose a new reputation system, which can be straightforwardly piggybacked on BitTorrent-like protocols, and will illustrate this reputation system in detail.

Chapter 5

A Robust Reputation Management System – X2BTRep

5.1 Introduction

X2BTRep [56, 55] is designed to overcome the weaknesses of BitTorrent networks, especially the distribution of false contents. It can be seamlessly integrated into the current BitTorrent protocol, whilst at the same time retaining the protocol’s characteristics of openness, scalability and minimum additional overheads. Moreover, our proposal addresses the shortcomings of other reputation systems, including XRep and X2Rep. In this chapter, we discuss the basic requirements of X2BTRep, describe its advanced functionalities and demonstrate the protocol.

5.2 Principle of our Management Model

5.2.1 Assumptions

Our solution is a system that combines both resources’ and peers’ reputations. We make the following two assumptions with respect to resources and peers. First, each peer i keeps a consistent pair of public (PKi) and private keys (SKi) for multiple use.

Also, there is a peer id (Pi) associated with the peer, which is the digest of the public key obtained using a secure hash function. Second, each torrent file is associated with an identifier (info hash), which is the SHA1 hash of the “info” section in the torrent file.
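As a minimal sketch of the first assumption, a peer id can be derived by hashing the serialised public key. SHA-1 is used here only as an assumption (the text requires "a secure hash function" without naming one); the function names and the placeholder key bytes are hypothetical.

```python
import hashlib


def peer_id_from_public_key(public_key_bytes: bytes) -> bytes:
    """Derive a 20-byte peer id as the digest of the serialised public key.

    SHA-1 is an assumption made for this sketch; any secure hash would fit the text.
    """
    return hashlib.sha1(public_key_bytes).digest()


def id_matches_key(claimed_peer_id: bytes, public_key_bytes: bytes) -> bool:
    """Check that a claimed peer id really belongs to the presented public key."""
    return claimed_peer_id == peer_id_from_public_key(public_key_bytes)


# Placeholder bytes standing in for two serialised public keys (not real keys).
pk_alice = b"hypothetical-public-key-of-alice"
pk_mallory = b"hypothetical-public-key-of-mallory"

alice_id = peer_id_from_public_key(pk_alice)
print(alice_id.hex())
```

Because the identifier is bound to the key, a peer that presents someone else's id together with its own public key fails the `id_matches_key` check.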

Additionally, we require a tracker that can be trusted by all the peers with respect to BitTorrent’s centralised architecture. The tracker is a central location in BitTorrent that keeps track of all the peers and helps peers to find each other. If the tracker fails or is a malicious point, the whole system will break down or even lead well-behaved users to a dangerous network. Therefore, a good tracker is the key to ensuring that the system operates. This is a reasonable assumption so as to enable the BitTorrent network to operate correctly. In our solution, we assume that a tracker is trusted by all the peers, and plays a significant role in the verification of torrent files and the distribution of votes submitted by the peers. The trusted tracker T is associated with a consistent pair of public (PKT ) and private keys (SKT ).

5.2.2 Distributed Repositories

Each peer records its own experiences on torrent files, the offerers of each torrent file and the pollers who provide votes on either torrent files or offerers. Each peer is required to store data about its experiences in three local experience repositories:

• a torrent repository, which stores a pair (info hash, value), where a value describes whether an info hash that the peer has experienced is good (1) or poor (0) in the peer’s opinion.

• an offerer repository, which stores a vector (offerer id, v1, v2, ..., vk, m) that records the k most recent experiences and the total number m of experiences in which a peer has interacted with an offerer whose identifier is offerer id (offerer id is the same as peer id). v1, v2, ..., vk are values that are either good (1) or poor (0). During the initialisation for a new offerer, all the values are set to zero. If a new experience is added, the oldest one is removed and the value m is increased by one.

• a credibility repository, which is a pair (peer id, value). The credibility parameter is a weight that describes the reliability of other peers and measures the trustworthiness of votes from other peers in the voting system. The value, a real number in the interval from 0 to 1, is set to zero during the initialisation for a new poller.
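The three local repositories described above can be sketched as simple in-memory structures. The class names, the window size `K = 5` and the method names are assumptions made for illustration; the text fixes only the stored fields and their initial values.

```python
from collections import deque
from dataclasses import dataclass, field

K = 5  # hypothetical window size k; the text does not fix a value


@dataclass
class OffererRecord:
    # The k most recent experience values, all zero for a new offerer.
    recent: deque = field(default_factory=lambda: deque([0] * K, maxlen=K))
    total: int = 0  # m, the total number of experiences

    def add(self, value: int) -> None:
        if value not in (0, 1):
            raise ValueError("experience value must be 0 (poor) or 1 (good)")
        self.recent.append(value)  # a full deque drops its oldest entry
        self.total += 1


@dataclass
class LocalRepositories:
    torrents: dict = field(default_factory=dict)     # info hash -> 0/1
    offerers: dict = field(default_factory=dict)     # offerer id -> OffererRecord
    credibility: dict = field(default_factory=dict)  # peer id -> value in [0, 1]

    def record_torrent(self, info_hash, good: bool) -> None:
        self.torrents[info_hash] = int(good)

    def record_offerer(self, offerer_id, good: bool) -> None:
        self.offerers.setdefault(offerer_id, OffererRecord()).add(int(good))

    def credibility_of(self, peer_id) -> float:
        # An unknown poller starts with credibility zero.
        return self.credibility.setdefault(peer_id, 0.0)


repo = LocalRepositories()
repo.record_torrent("hash-1", good=True)
for _ in range(3):
    repo.record_offerer("offerer-1", good=True)
print(list(repo.offerers["offerer-1"].recent), repo.offerers["offerer-1"].total)
```

A bounded `deque` is one convenient way to realise the "remove the oldest, add the newest" rule without explicit bookkeeping.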

5.2.3 Voting

There are two types of votes: the torrent vote and the offerer vote. In order to generate a vote on either a torrent file or an offerer, each peer checks its local torrent or offerer repositories and uses the following calculation:

• Torrent Vote – the vote for a torrent file is a Boolean value 0 or 1 associated with info hash, which is denoted as vH = 0/1;

• Offerer Vote – the vote from a peer Pi for an offerer Po is calculated as follows:

$$v_{oi} = \left( \frac{1}{k} \sum_{x=1}^{k} v_x,\; m \right) \qquad (5.1)$$

It should be noted that a vote for an offerer contains not only a continuous value expressing the recent behaviour of Po, but also the total number of experiences with Po.
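A minimal sketch of the offerer vote of equation (5.1), assuming the k recent values and the total m are passed in directly; the function name and the sample values are hypothetical.

```python
def offerer_vote(recent_values, total_experiences):
    """Return (mean of the k most recent 0/1 values, total number of experiences m)."""
    k = len(recent_values)
    if k == 0:
        raise ValueError("at least one recent value is required")
    return (sum(recent_values) / k, total_experiences)


# Example: k = 4 recent experiences, three of them good, out of m = 9 in total.
vote = offerer_vote([1, 1, 0, 1], 9)
print(vote)
```

Carrying m alongside the mean lets a receiver distinguish a ratio built on many interactions from one built on very few.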

To protect authenticity and integrity, each poller generates a self-signed certificate called a vote certificate, which contains an identifier of a torrent, an offerer’s id, a poller’s id, opinions on either the torrent or the offerer and a time stamp, and then combines its public key for verification before submitting it to a tracker. The format is shown as follows:

$$\left( [\mathit{info\_hash},\; ID_o,\; ID_i,\; v_H / v_{oi},\; TS]_{SK_i},\; PK_i \right) \qquad (5.2)$$

5.2.4 Centralised Repositories

A trusted tracker manages the votes submitted by pollers. After receiving votes from a poller in the above-mentioned format, the tracker verifies the votes using the public key of the pollers, removes suspicious ones, extracts the id of the poller, as well as opinions on either the torrent or the offerer and the time stamp and then inserts this information (IDi, vH /voi,TS) into corresponding info hash and offerer groups.
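The tracker-side handling described above can be sketched as follows. Signature checking is passed in as a callable because the text does not fix a signature scheme; the certificate field names, the class name and the rule that a vote is "suspicious" when its signature is invalid or when it is not newer than the stored vote from the same poller are all assumptions of this sketch.

```python
from collections import defaultdict


class TrackerVoteStore:
    """Keeps simplified votes (ID_i, vote, TS) grouped by info hash and by offerer."""

    def __init__(self, verify_signature):
        # verify_signature: hypothetical callable(certificate) -> bool
        self._verify = verify_signature
        self.torrent_groups = defaultdict(dict)  # info hash -> {poller id: (vote, ts)}
        self.offerer_groups = defaultdict(dict)  # offerer id -> {poller id: (vote, ts)}

    def submit(self, cert) -> bool:
        """Insert the votes carried by a certificate; return True if any was stored."""
        if not self._verify(cert):
            return False  # invalid signature: treated as suspicious
        poller, ts = cert["poller_id"], cert["timestamp"]
        accepted = False
        for groups, key, vote in (
            (self.torrent_groups, cert["info_hash"], cert.get("torrent_vote")),
            (self.offerer_groups, cert["offerer_id"], cert.get("offerer_vote")),
        ):
            if vote is None:
                continue
            previous = groups[key].get(poller)
            if previous is not None and previous[1] >= ts:
                continue  # duplicate or stale vote from the same poller
            groups[key][poller] = (vote, ts)
            accepted = True
        return accepted


store = TrackerVoteStore(lambda cert: cert.get("sig") == "ok")  # stand-in verifier
cert = {
    "info_hash": "h1", "offerer_id": "o1", "poller_id": "p1",
    "torrent_vote": 1, "timestamp": 10, "sig": "ok",
}
first = store.submit(cert)
second = store.submit(cert)                       # same timestamp: rejected
forged = store.submit(dict(cert, sig="bad", timestamp=11))
print(first, second, forged)
```

Only the simplified triple is stored, which keeps the tracker's per-vote storage small.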

5.2.5 Secure Sockets Layer

SSL is an encryption protocol invoked on a secure Web Server to provide a reliable, encrypted and integrity-protected stream to the application. One of the best-known BitTorrent clients, Azureus [9], provides SSL to support tracker security. In our solution, we recommend that HTTPS is used in order to protect the tracker/peer communication. HTTPS is the HTTP protocol carried over a Secure Sockets Layer (SSL) connection, and it can be easily integrated into the existing BitTorrent architecture. The HTTPS protocol offers several advantages including protection of the confidentiality of messages as well as authentication of a secure server.

5.3 Further Discussion of Two Novel Algorithms

In order to address the shortcomings of XRep and X2Rep, we introduce two algorithms: the Credibility Award Algorithm and the Credibility Chain Exchange.

5.3.1 Credibility Award Algorithm

This is an algorithm to enable newcomers to be involved in the system as soon as possible by means of encouraging them to participate actively. It is designed to counteract the cold-start problem of X2Rep.

To implement the credibility award algorithm, a client first needs to have a full copy of the downloaded file and to have submitted a vote notifying other peers that the file is genuine. It is also required that the client keeps uploading for the benefit of other downloaders. Suppose that a downloader receives a vote and several pieces from the client, and finds that the file is genuine after downloading is complete. The downloader will then give some bonus credibility values according to the client’s uploading contribution. This algorithm can help newcomers to increase their credibility ratings with other peers. Moreover, it is an incentive mechanism to promote each peer’s contribution to the community.

5.3.2 Credibility Chain Exchange

This algorithm aims to exchange confidence information based on a reliability parameter, i.e. the credibility rating. The goal of this algorithm is to relieve the performance bottleneck in the X2Rep protocol. A credibility chain exchange algorithm can effectively help a peer to establish a relationship of trust with an unknown peer via an intermediary node. In our proposal, credibility is regarded as an indication of trust. If the credibility of peer B given by peer A is reliable (that is, the credibility value cAB is higher than a threshold), it means that peer B has provided accurate votes in the past and can be trusted by A. If A happens to meet peer C, and C is a friend of B, then B can introduce C to A. In the end, A quickly establishes a relationship with C. The relationship, denoted as cAC, is calculated by multiplying cAB and cBC.

However, malicious peers may pretend to be trustworthy and then attempt to introduce a clique of dummy or controlled “friends” to honest users (e.g. referral attacks, cf. Chapter 6). In order to repulse this kind of attack and establish reliable relationships with peers, users may make a selection by themselves instead of readily accepting referrals from other peers. Furthermore, users can maintain a database to keep track of the reliability of friends introduced by other peers and adjust the credibility ratings of those introducers accordingly. For instance, the credibility rating of a peer will be punished by being decreased when the user disagrees with votes from friends introduced by that peer.

With the help of a credibility chain exchange algorithm, peers can build up a comprehensive and reliable database of credibility ratings for pollers. As a result, peers have greater capability to assess resources and offerers, and hence, the likelihood of content pollution can be reduced.

Offerer Po: generates a standard torrent for a file, selects a trusted tracker T and makes an SSL connection

Po → T: CertRequest([Po, info hash]SKo, PKo)

Tracker T: checks for duplicates of info hash and offerer

T → Po: CertResponse([[Po, info hash]SKo, PKo, TS]SKT, PKT)

Offerer Po: verifies the certificate and appends it to the standard torrent

Figure 5.1: Sequence of messages in enhanced torrent creation phase

5.4 Protocol Designs

The X2BTRep protocol consists of several phases: (1) torrent search, (2) exchange of votes, (3) evaluation of votes, (4) tracker queries, (5) pieces and credibility chain exchange, and (6) updating and voting reputation value; plus an enhanced torrent creation phase.

5.4.1 Initialisation Phase: Enhanced Torrent Creation

A torrent file is the starting point in the BitTorrent architecture. A torrent file should be unique and authentic in our reputation system. This allows downloaders to detect whether it has been tampered with, and to trust that it is truly associated with the offerer. To achieve these features, we provide a certificate relating to each torrent. The certificate contains information on the offerer’s ID, the identifier of the torrent file and a time stamp. If a peer Po wants to generate a torrent file with a certificate, it first creates a standard torrent file. Then it establishes an SSL connection with a trusted tracker T and sends a self-signed certificate, including Po and info hash, with its public key PKo. The tracker checks whether the certificate has been used. If it is unique, the tracker returns a certificate, including the above information, together with the timestamp signed by the tracker’s private key. After Po receives the result from the tracker, it verifies the certificate and appends it to the original torrent file. It can then publish the enhanced torrent file on a web site to be downloaded by other peers. Figure 5.1 shows the sequence of messages.

5.4.2 Phase 1: Torrent Search

As with the original BitTorrent protocol, an initiator Pi first searches for a torrent file from a web site W before downloading resources. However, there is a difference between this file and the standard torrent file, in that it contains specific information: a certificate of a torrent file. The certificate, issued by a trusted tracker, contains the identifier of the torrent file, the peer id of the offerer and a creation time stamp. When initiating a download, Pi can verify the torrent file using the public key PKT of the trusted tracker included in the certificate.

5.4.3 Phase 2: Exchange of Votes

If the certificate in the torrent file is valid, Pi establishes an SSL connection with the trusted tracker via the entry point in the torrent file. Then it compares the PKT in the torrent certificate with the one presented on the SSL connection. If they are the same, Pi begins to exchange votes on either the torrent or the offerer with the tracker via Poll and PollReply messages.

The Poll message contains the vote certificate relating to the offerer. The vote certificate includes info hash, Po, Pi, the vote voi on the offerer and a time stamp. Upon receiving the Poll message, the trusted tracker checks its centralised repositories and responds with a group of votes corresponding to the info hash and the offerer via PollReply messages. The PollReply messages, signed with SKT and then encrypted with PKi, preserve the confidentiality and integrity of the votes delivered to Pi.

As observed in [26], 90% of the leechers remain in the network for less than 10,000 seconds during the beginning of the torrent usage. It can be observed that the number of votes on the torrent given by seeders is very low at the beginning of downloading. Moreover, the tracker log illustrated in Figure 5.2 (a) and (b) traces the torrent of the 1.77GB Linux Redhat 9 over a period of five months from April to August 2003. This figure clearly demonstrates that seeds reach their peak more than 12 hours after the peak of leechers is achieved. In order to avoid a lack of information, each downloader is expected to provide its opinion on the offerer for other peers’ reference before initiating downloading, if possible.

Figure 5.2: Number of active peers over time (sourced from [26])

5.4.4 Phase 3: Votes Evaluation

To make a decision on a possible transaction, Pi needs to convert the votes from the previous phase into evaluation values. However, a received vote may be spoofed or generated by a group of colluding peers. In order to reduce these malicious activities, an additional factor called the credibility rating, introduced in the X2Rep protocol [14], is used to remove malicious votes and produce adjusted values on the resource and the offerer.

To present trust values to the user for decision-making regarding downloading, two formulas are introduced below to evaluate the trustworthiness of the torrent and the offerer. It is recommended that the number of valid votes be combined with the trust values to make the decision.

• Adjusted torrent trust value:

$$V_H = \frac{\sum_{j} v_H \, c_{ij}}{\sum_{j} c_{ij}} \quad (\text{where } c_{ij} \neq 0); \qquad (5.3)$$

• Adjusted offerer trust value:

$$V_o = \frac{\sum_{j} v_{oj} \, c_{ij}}{\sum_{j} c_{ij}} \quad (\text{where } c_{ij} \neq 0); \qquad (5.4)$$
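The two adjusted trust values are credibility-weighted averages, which can be sketched with one function. The dictionary layout and the function name are assumptions; for offerer votes the sketch assumes that the ratio component of each vote is the value being averaged. The count of valid votes is returned as well, since the text recommends combining it with the trust value.

```python
def adjusted_trust(votes, credibility):
    """Credibility-weighted mean of votes, skipping pollers with zero credibility.

    votes:       {poller id: vote value in [0, 1]}
    credibility: {poller id: c_ij in [0, 1]} held by the evaluating peer P_i
    Returns (trust value or None if no valid vote, number of valid votes).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    valid_votes = 0
    for poller, value in votes.items():
        c_ij = credibility.get(poller, 0.0)
        if c_ij == 0:
            continue  # votes from pollers without credibility are ignored
        weighted_sum += value * c_ij
        weight_total += c_ij
        valid_votes += 1
    if weight_total == 0:
        return None, 0
    return weighted_sum / weight_total, valid_votes


# Hypothetical votes on one torrent and P_i's credibility ratings of the pollers.
torrent_votes = {"a": 1, "b": 0, "c": 1}
cred = {"a": 0.5, "b": 0.25, "c": 0.0}
trust, n_valid = adjusted_trust(torrent_votes, cred)
print(trust, n_valid)
```

Returning `None` when no poller has non-zero credibility keeps "no evidence" distinct from "evidence of a poor resource".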

Participants: Initiator Pi; Web Server W, Tracker T, Peers P∗

[phase 1]
Pi → W: Search(keywords)
W → Pi: Download(torrent file)
Pi: verifies validity of the certificate in the torrent file

[phase 2]
Pi: contacts the tracker T, verifies PKT′ and PKT; produces a vote on the offerer
Pi → T: Poll(Votei)
T → Pi: PollReply({[info hash, Po, Votes]SKT}PKi)

[phase 3]
Pi: evaluates valid votes and makes a download decision
T: removes a suspicious Votei and adds a simplified one into the vote group

[phase 4]
Pi → T: TrackerGet(info hash, Pi, my IP/Port)
T → Pi: TrackerResponse(interval, peers/failure)

[phase 5]
Pi: establishes peer connections
Pi → P∗, P∗ → Pi: Exchange of Pieces
Pi: counts the number µj of pieces from Pj (Pj ∈ seeders)
Pi: selects unknown IDs, where Cik ≥ δ
Pi → Pk: CredRequest([Pi, Pk, IDs, flag]SKi, PKi)
Pk → Pi: CredResponse({[IDs′, values]SKk, PKk}PKi) (∀IDs′ ∈ IDs)
Pi: updates new credibility values of IDs′ into credibility repository

[phase 6]
Pi: after downloading completion, updates local repositories; generates a new vote on both torrent and offerer
Pi → T: FinalPoll(new Votei)
T: checks the Votei from Pi, removes the old vote and adds a new one

Figure 5.3: Sequence of messages and operations in X2BTRep protocol

After this evaluation process, Pi can trust its assessment of either the resource or the offerer, and can therefore make a decision whether or not to download the resource. If the evidence on the reputation is not sufficient, downloading may be made by accepting a level of risk.

On the tracker side, the new vote from Pi is added into the vote group after being checked and simplified by the tracker T. A suspicious vote, such as a duplicate or invalid vote, will be removed by the tracker.

5.4.5 Phase 4: Tracker Queries

After deciding to download a resource, the peer Pi queries the tracker in order to obtain a list of peers in the process of downloading, and at the same time submits its IP address and port. The tracker returns a random list of the keys containing: peer id, IP and port, which can be used to establish connections with peers in the next phase.

5.4.6 Phase 5: Pieces and Credibility Chain Exchange

In this phase, Pi contacts peers in the list one by one and then exchanges pieces of the file by index as described in the torrent file. Furthermore, the two algorithms, including the credibility award algorithm and the credibility chain exchange, are implemented in this phase.

In the implementation of the credibility award algorithm, the peer Pi needs to maintain a temporary parameter µj, which indicates how many pieces have been successfully downloaded from a seeder Pj during the process of downloading. After completing a download, a credibility award rate ∆ is calculated as shown below for pieces provided by the seeder Pj. In phase 6, the rate will be used to calculate a bonus for the seeder according to its contribution.

$$\Delta = \mu_j / n \quad (n \text{ is the total number of pieces}) \qquad (5.5)$$

The following describes the credibility chain exchange algorithm. If the credibility Cik given by the peer Pi for peer Pk reaches or exceeds a threshold value δ, Pi sends a request message CredRequest to peer Pk. The request message includes IDi, IDk, IDs and flag, which denote Pi’s ID, peer Pk’s ID, a collection of peer ids whose credibility values Pi intends to receive from peer Pk, and the tag assigned to distinguish the request message from pieces exchange messages. Peer Pk checks its credibility repository and returns the credibility values of peer ids with matching IDs. The message, signed with Pk’s private key and combined with its public key, is encrypted with the PKi of Pi before being sent back. After Pi receives the returned message CredResponse, it calculates a new credibility value for each peer id in the IDs (if duplicate credibilities are found, the peer with the highest credibility Cik is chosen), and inserts it into the credibility repository.
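The processing of CredResponse messages can be sketched as below. The derived value is the product of the introducer's credibility and the value it reports, as in Section 5.3.2. Reading "the peer with the highest credibility Cik is chosen" as "prefer the introducer that Pi trusts most" is an interpretation made for this sketch, and the function and parameter names are hypothetical.

```python
def merge_chain_responses(own_credibility, responses, threshold):
    """Derive credibility values for unknown peers from introducers' responses.

    own_credibility: {peer id: C_ik} already held by P_i
    responses:       {introducer id k: {unknown peer id j: C_kj}}
    threshold:       delta; introducers with C_ik < delta are ignored
    Returns {unknown peer id j: C_ik * C_kj} using the most credible introducer.
    """
    best = {}  # j -> (C_ik of the chosen introducer, derived credibility)
    for introducer, reported in responses.items():
        c_ik = own_credibility.get(introducer, 0.0)
        if c_ik < threshold:
            continue
        for peer, c_kj in reported.items():
            if peer in own_credibility:
                continue  # only peers unknown to P_i are filled in
            if peer not in best or c_ik > best[peer][0]:
                best[peer] = (c_ik, c_ik * c_kj)
    return {peer: derived for peer, (_, derived) in best.items()}


own = {"B": 0.8, "D": 0.9, "E": 0.1}
responses = {
    "B": {"C": 0.5},
    "D": {"C": 0.4, "F": 1.0},
    "E": {"G": 1.0},  # E is below the threshold, so G stays unknown
}
derived = merge_chain_responses(own, responses, threshold=0.5)
print(derived)
```

Because both factors lie between 0 and 1, a derived value can never exceed the credibility of the introducer, which limits what a single referral can confer.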

5.4.7 Phase 6: Updating and Voting Reputation Value

After completing resource downloading, peer Pi assesses the quality of the resource, updates local repositories and submits a new vote on both the torrent and the offerer to the trusted tracker.

Updating the state on a Peer:

1. The torrent repository: if the quality of the resource is good, the value in the torrent repository is 1, otherwise it is 0.

2. The offerer repository: the value recorded for the above resource is added to the table, and the oldest value is removed from the table. However, if a malicious offerer generates the torrent but does not upload at all, this value will be zero. For example, if no peer finishes the download after the common life cycle of a torrent file (for example, 20 days), the peer sets the reputation value to 0 at that time.

3. The credibility repository: for each peer which provided a vote:

• If the voting peer Pj provides an accurate vote on either the torrent or the offerer:

– For a seeder, Cij = Cij + 0.05(1 + ∆), only if the quality of the resource is good.

– For others, Cij = Cij + 0.05.

• If the voting peer Pj provides an inaccurate vote on the torrent: Cij = 0.
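The credibility update rules above, together with the award rate ∆ = µj/n, can be sketched as follows. Two points are assumptions of this sketch rather than statements of the protocol: the result is clamped to 1 so that it stays within the stated interval from 0 to 1, and a seeder whose resource turns out not to be good is treated like any other accurate voter.

```python
STEP = 0.05  # increment used in the update rules


def award_rate(pieces_from_seeder: int, total_pieces: int) -> float:
    """Credibility award rate: fraction of all pieces supplied by seeder P_j."""
    return pieces_from_seeder / total_pieces


def update_credibility(c_ij, vote_accurate, is_seeder=False,
                       resource_good=False, delta=0.0):
    """Return the new credibility C_ij that P_i holds for the voting peer P_j."""
    if not vote_accurate:
        return 0.0  # an inaccurate vote resets the credibility
    if is_seeder and resource_good:
        c_ij += STEP * (1 + delta)  # bonus proportional to the upload contribution
    else:
        c_ij += STEP
    return min(c_ij, 1.0)  # clamp: assumption to keep the value within [0, 1]


delta = award_rate(30, 120)  # the seeder supplied 30 of 120 pieces
seeder_new = update_credibility(0.2, True, is_seeder=True,
                                resource_good=True, delta=delta)
other_new = update_credibility(0.2, True)
liar_new = update_credibility(0.9, False)
print(delta, seeder_new, other_new, liar_new)
```

The asymmetry is deliberate in the rules themselves: credibility grows in small steps but is lost completely after a single inaccurate vote.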

Submitting a final vote to the Tracker:

• After updating the state of the local repositories, the peer Pi generates a new vote certificate on both the torrent and the offerer, then submits the vote via a FinalPoll message to the trusted tracker. After receiving the vote, the tracker deletes the old vote and records the new vote (Pi, vH /voi) in the corresponding torrent and offerer groups.

5.5 Summary

In this chapter, we have presented X2BTRep, a robust trust management protocol based on BitTorrent-like networks. We have defined the protocol through its requirements and protocol phases. We have outlined our protocol by describing the basic assumptions, the local storage and centralised repository of experience information, the generation of votes, the evaluation of trust and the message sequence in the X2BTRep protocol. Moreover, we have introduced two advanced algorithms, namely, the credibility award algorithm and the credibility chain exchange, to improve on several weaknesses in our predecessors, such as P2PRep, XRep, and X2Rep.

Chapter 6

Critical Evaluation of X2BTRep

6.1 Introduction

Several problems are associated with peer-to-peer networks. These problems stem from the open nature of P2P networks. First of all, this feature allows an unlimited number of peers to join, leave or stay in the system without particular enforcement of any rules. Second, together with the distribution of good resources and services to peers in P2P networks, malicious peers can propagate inauthentic data or viruses. Moreover, free-riding is an inherent problem in P2P networks. No system compels its peers to share resources. As a result, the quality and availability of the system rely on a few peers sharing data. Finally, the open nature also leads to other problems, such as anonymity, which creates opportunities for malicious peers to subvert the system while hidden, and for peers’ private information to be broadcast arbitrarily.

Furthermore, several research studies [16, 39, 31] indicate the possibility of attacks, such as pseudospoofing, reputation spoofing, collective attack etc., within reputation-based systems. In this chapter, we assess our reputation system in terms of the essential characteristics of reputation systems [6] and assert that our solution can prevent these attacks upon both BitTorrent networks and reputation-based systems.

6.2 Assessment of X2BTRep

It is essential for reputation management systems within P2P networks to have certain characteristics in the design of solutions. These characteristics can be listed as follows [28]:

• Self-policing systems – each node must be able to apply a solution for the management of trust without supervision or help from a central authority.


In our solution, each peer can apply the trust management system by itself. Each peer maintains three local repositories, and gives trust to other peers without any help from a central authority, such as a trusted tracker.

• Anonymity – the solution provided must not reveal the identity of the host or the user on the network, and must use an indirect approach such as a pseudonym. There is a unique identifier associated with each peer in our proposal. The identifier is neither the identity of the host nor that of the user on the network. It is the digest of a public key, obtained by using a secure hash function. The public key can be kept consistent and used to build up credit for peers.

• No profit to newcomers – it is essential that newcomers joining a system must not be automatically provided with any value of trust. Newcomers must start off with no trust at all, but must be provided with a chance to increase their trust values. Our solution has this essential characteristic. The values for the offerer and the credibility repositories for each peer begin with zero for an unknown peer. However, the values can be changed according to the peer’s behaviour.

• Minimal overheads – this is an essential characteristic of the solution. As P2P systems operate with large numbers of users, the overheads associated with the solution must not be so great that they cripple the performance of the system as it scales to larger numbers.

– Overheads in Clients (1000 peers for example)

Resource Repository: [20 (bytes) info hash + 1 (byte) value] ∗ 1000 ≈ 20KB

Offerer Repository: [20 (bytes) offerer id + 50 (bytes) values] ∗ 1000 ≈ 70KB

Credibility Repository: [20 (bytes) peer id + 1 (byte) value] ∗ 1000 ≈ 20KB

Therefore, each client uses about 110KB in repositories for 1000 peers. This is negligible in terms of current computer storage capacity.

– Overheads in a Tracker (3000 [metainfo files] ∗ 3000 [votes] for example)

3000 ∗ 3000 ∗ [20 (bytes) peer id + 3 (bytes) vote values + 2 (bytes) timestamp] ≈ 225MB

– Overheads in Messages

Poll [1KB]

PollReply (1000 peers) [30KB]

Tracker GET/Response [as standard]

Pieces exchange [as standard]

Credibility chain exchange (100 peers) [6-10KB]

• Robust to attack by collectives – malicious users may know each other and can form collectives of all kinds to destroy a system. Hence, the solution must involve protection against this type of attack (explained below).
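The storage estimates in the "Minimal overheads" item above can be rechecked with a few lines of arithmetic. The per-entry byte sizes are the ones stated in the list; the variable names are hypothetical, and the results are exact products before any rounding.

```python
PEERS = 1000

# Per-client repositories: bytes per entry multiplied by the number of entries.
client_bytes = {
    "torrent (resource) repository": (20 + 1) * PEERS,   # identifier + 1-byte value
    "offerer repository": (20 + 50) * PEERS,             # identifier + 50 bytes of values
    "credibility repository": (20 + 1) * PEERS,          # identifier + 1-byte value
}
client_total = sum(client_bytes.values())

# Tracker: 3000 metainfo files, 3000 votes each, 25 bytes per simplified vote.
tracker_bytes = 3000 * 3000 * (20 + 3 + 2)

for name, size in client_bytes.items():
    print(f"{name}: {size / 1000:.0f} KB")
print(f"client total: {client_total / 1000:.0f} KB")
print(f"tracker total: {tracker_bytes / 1_000_000:.0f} MB")
```

With decimal units the client total comes to 112 KB and the tracker total to 225 MB, so both stay small relative to ordinary disk capacities.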

6.3 Attacks on BitTorrent-like systems

6.3.1 Defection attack

A defection attack can also be referred to as the free-riding problem in P2P networks. In this attack, peers are willing to download but refuse to upload. However, an inherent characteristic of BitTorrent can effectively overcome this problem. In BitTorrent, peers download data from other users by uploading what they have. Therefore, peers with a high upload speed can achieve download optimization. On the other hand, peers will get a low download speed as a result of refusing to upload.

6.3.2 Poisoning attack & Insertion of viruses in carried data

These attacks exploit the fact that there is no way of verifying resources or peers in a BitTorrent network. In our solution, if a malicious user generates a torrent file to distribute a false resource, the offerer id of the user and the info hash of the torrent file will be associated with a bad reputation, and subsequent downloading of the resource will therefore be prevented.

6.3.3 Denial of service attack

A denial of service attack on BitTorrent networks aims to prevent peers from accessing central servers by using a flood of data. The central servers in BitTorrent networks hold critical information that pinpoints users’ IP addresses and the various fragments of resources they possess. Once the central trackers have failed, peers can no longer find each other. Our proposal utilises SSL in the communication between trackers and peers to prevent this type of attack to a certain extent. SSL provides secure communications between central servers and peers over BitTorrent networks.

6.3.4 Malware software in BitTorrent networks

Malware is malicious code planted on computer systems, and it can allow the attacker to take control of the systems, networks, and private data. Most BitTorrent client applications, such as Azureus, BitComet [10] and uTorrent [37], guarantee that they are 100% clean, which means they do not contain any form of malware, spyware, viruses, trojans or backdoors. Moreover, BitTorrent clients released under open source licences can help to prevent malware from being introduced into BitTorrent networks.

6.3.5 Identity attack

Our proposal uses HTTPS and hence can efficiently protect against these types of attacks. First, encrypted messages between peers and a trusted tracker make data reading harder. Information such as peers’ IP addresses cannot be exposed due to the high security on the tracker side. Second, each peer can only find the limited number of other peers returned by the tracker according to its current download. To find more IP addresses, attackers have to keep changing torrent files and download multiple times. Thus, it is expensive for attackers to obtain an adequate number of IP addresses.

6.3.6 Spamming

Peers may send unsolicited, unwanted messages to others without their permission. These messages may be distributed over BitTorrent networks together with needed resources. This type of attack can be prevented by means of the bad reputation associated with the corresponding torrent files and their offerers.

6.4 Attacks on Reputation-based Systems

6.4.1 Pseudospoofing

The simplest attack is undertaken by exploiting “cheap pseudonyms” in the reputation system. A malicious user registers a pseudonym, behaves in a corrupt manner for a while and then switches to a new pseudonym after earning a bad reputation. Our solution makes this attack ineffective because a newcomer has only limited influence on the voting process.

However, we also provide several mechanisms to help newcomers to participate in the system. First, newcomers can acquire a high reputation value by actively sharing genuine resources and casting accurate votes. Second, newcomers can help to distribute genuine resources among peers to be granted greater credibility values. More uploading time gives more credibility. Third, newcomers can generate a large reliable credibility repository for themselves using the credibility chain exchange algorithm.

6.4.2 Reputation spoofing

In this attack, malicious users try to find a vulnerability in the reputation algorithm and spoof the reputation scores. In our solution, the credibility parameter is the key factor that affects the performance of the system. The credibility value from A to B increases only if the votes from B are the same as A’s opinion. Attackers may therefore try to exploit the credibility award algorithm or the credibility chain exchange to increase their credibility as quickly as possible. Our reputation algorithms guard against this type of attack. The credibility award formula makes the attack expensive, because the attacker must keep uploading over a long period and the award increment is bounded rather than unlimited. The credibility chain exchange equation also protects against this attack, because it multiplies by the sender’s credibility.

6.4.3 Whitewashing attack

This is a common attack in the eBay online transaction system. It refers to an attacker who actively participates in the system by providing genuine items, but occasionally sells a small number of inferior goods to others. In our scheme, an attacker may distribute bona fide resources most of the time while sometimes spreading malicious resources; or an attacker may vote accurately on most offerers and torrent files, but give a small number of incorrect votes. In the first case, although it is difficult to identify the attacker, our torrent reputation-based system can block the distribution of malicious resources by associating the torrent files with a bad reputation. In the second case, the occasional malicious behaviour still makes the attacker’s credibility with other peers drop sharply. This was confirmed in our experiments, which are presented in the appendix.

6.4.4 Reputation attack by collectives

When malicious users know each other, they may try to harm the system by acting as a group. Shilling is an attack of this kind. Instead of creating multiple phony identities as in a pseudospoofing attack, this attack maintains several ‘real’ identities in order to influence the voting process on a doctored resource and a malicious peer. It can be effectively guarded against through our solution, due to the existence of the credibility parameter. Although malicious attackers may submit false votes to gain high credibility values for each peer in the same group, the false votes have a negative effect on their standing with well-behaved users. Our mechanism deals with shilling by trying to ensure there is a great number of well-behaved users, making it expensive for malicious users to create and maintain a sufficient number of shills. Also, once shills provide a false vote, their credibility values will be set to zero.

6.4.5 Referral attack

A referral attack takes advantage of the opportunity to discover potential relationships with peers by introduction. It is a new attack, in which a malicious attacker forges a group of dummy peers under his control, tries to introduce them to other peers and then uses the forged peers to influence the judgment of other peers. The attacker always behaves well in the polling process in order to acquire high credibility ratings with other peers. The high ratings give the attacker the capability to introduce false peers to other peers. Using a digest-based mechanism to build cliques of false peers is easy in most reputation systems. Once the false peers have been accepted by other peers, the malicious peer can generate spoofed votes using the false IDs, and the decisions of other peers may then be influenced by those votes. Even though the credibility ratings of false peers may become zero due to negative votes, the attacker is able to continuously introduce new forged peers. However, the self-reliant friend selection in our protocol means that malicious attackers are incapable of unconstrained introduction of peers. Also, users can establish a database to keep track of the reliability of friends introduced by other peers. If a malicious attacker introduces a false peer and uses the false ID to behave badly, his behaviour is recorded in the database and he is punished through his credibility ratings.

6.5 Summary

The purpose of a trust reputation system is to improve security and alleviate weaknesses. X2BTRep is designed to minimise or solve current security problems of P2P systems like BitTorrent. In this chapter, we have evaluated our proposal with respect to the essential characteristics of the designed reputation system. Furthermore, we have discussed our protocol in terms of known attacks on both P2P systems and reputation systems. We have drawn the conclusion that our proposal can prevent malicious behaviours in P2P networks and reputation systems.

Chapter 7

Implementation and Interpretation of Experiment

7.1 Introduction

In this chapter, we describe how the X2BTRep protocol is implemented and what modifications it requires to conform to a standard BitTorrent architecture. We demonstrate a typical component-based structure of BitTorrent, and present the reputation information manager and additional protocol messages in our implementation. We also describe a series of experiments aimed at analysing and understanding the performance of BitTorrent under a range of strategies. Furthermore, we attempt to verify whether our proposal can protect against several known attacks, including whitewashing and collective attacks, through the simulation of malicious strategies. Finally, we analyse several trusted reputation systems, including eBay, P2PRep, XRep, X2Rep and X2BTRep, and draw a comparison between these systems.

7.2 Implementation of X2BTRep in the BitTorrent Environment

7.2.1 X2BTRep Extensions to BitTorrent

We adopt component-based techniques for designing and implementing our proposal. Figure 7.1 shows the information flow structure with X2BTRep protocol extensions; the dotted line encloses three components which do not exist in a common BitTorrent network, but are required to sustain the X2BTRep protocol.

• Reputation Manager – this component handles the retrieval, refreshing and computation of all reputation values. It also calls on the Crypto Agent for the cryptographic functions used in the protocol.


[Figure 7.1 diagram. Components shown: Tracker, Locator Agent, Torrent Manager, Download Manager, Disk Manager and Shared Resources, together with the Reputation Manager (voting and trust computation), the Crypto Agent and the Experience & Credibility Repositories. Labelled links: HTTP GET/Response and Poll/PollReply (tracker/client); TCP pieces exchange and CredRequest/CredResponse (client/client).]

Figure 7.1: BitTorrent’s Information Flow with X2BTRep protocol extensions

• Experience Repository – this is a repository which stores past accumulated experiences with respect to reputation and credibility.

• Crypto Agent – this offers all the cryptography functions used in the protocol. The component generates a pair of keys and provides the hash algorithm used in the asymmetric protocol.1

In addition to the above auxiliary components designed for our protocol, other components are introduced as follows:

• Locator Agent – This communicates with the reputation manager and the torrent manager, and routes query messages and reputation-related messages between a tracker and peers.

• Torrent Manager – This component controls information on metainfo files to be further used by the locator agent and the download manager.

1 In our protocol, we employ an RSA public key algorithm with 1024/2048-bit keys. The SHA1 digest algorithm produces 160 bits to generate the peer id or the identifier for content.

• Download Manager – the download manager organises the routing of packets between the client and peers. The packets include the data of pieces indexed in the corresponding metainfo file and credibility information about unknown peers.

• Disk Manager – this is responsible for reading data from and writing data to the hard disk.
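The footnote to the Crypto Agent states that SHA1 yields a 160-bit digest used for the peer id or the content identifier. A minimal sketch of that step is given here using Python's standard hashlib. The function name make_identifier and the choice of input (placeholder bytes standing in for a serialised public key) are assumptions for illustration only, since this section does not state what is hashed.

```python
import hashlib


def make_identifier(material: bytes) -> bytes:
    """Return the 20-byte (160-bit) SHA-1 digest of `material`."""
    return hashlib.sha1(material).digest()


# Hypothetical input: placeholder bytes standing in for a serialised RSA public key.
public_key_bytes = b"example-public-key-bytes"
peer_id = make_identifier(public_key_bytes)

# The digest length matches the 20-byte identifiers used in the repository schema.
print(len(peer_id), "bytes,", len(peer_id) * 8, "bits")
```

A 20-byte digest fits the "20:" length prefix that the bencoded repositories and messages in this chapter use for identifiers.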

7.2.2 Repository Schema

In our solution, reputation and credibility information is managed by the experience and credibility repositories. The experience repositories consist of two tables, namely, the torrent repository and the offerer repository. The three repositories are summarised as follows:

• Torrent Repository (d7:torrentld9:torrentid20:sInfo_hash9:offererid20:sOffererid7:qualityiiQualityee. . .ee) All the data in the torrent repository is bencoded in a dictionary that holds a list of dictionaries with the following entries: sInfo_hash is the identifier of a metainfo file provided by the offerer with the identifier sOffererid, and iQuality specifies the quality of the metainfo file.

• Offerer Repository (d8:providerld9:offererid20:sOffererid7:history100:1|2|. . .k| . . .eee) All the information in the offerer table is bencoded in a dictionary that holds a list of dictionaries, each with several keys. An offerer is identified by sOffererid and is associated with the past k experience values 1, 2, 3 ....

• Credibility Repository (d6:pollerld8:pollerid20:sPollerid11:credibilityiiValueeeee) This repository keeps track of the credibility value iValue with respect to all relevant pollers with the ID sPollerid. The actual credibility value is iValue percent, which is used for the reputation calculation.

In practice, each entry in each experience table is no more than 50 bytes long. For 10,000 peers, this means that the total amount of disk storage is less than 1.5MB, which is negligible on modern computer systems.
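The repository layouts above are ordinary bencoded structures, so they can be produced with a very small encoder. The sketch below is illustrative only: the bencode function is a hypothetical helper written for this example, and the identifier values are placeholders. It sorts dictionary keys as the BitTorrent bencoding rules require, so the key order in the output differs from the order printed in the schema above.

```python
def bencode(value):
    """Encode ints, str/bytes, lists and dicts using BitTorrent bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Bencoding requires dictionary keys to appear in sorted order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


# Placeholder 20-byte identifiers (assumed values for illustration).
info_hash = b"A" * 20
offerer_id = b"B" * 20

torrent_repository = {
    "torrent": [
        {"torrentid": info_hash, "offererid": offerer_id, "quality": 1},
    ]
}
encoded = bencode(torrent_repository)
print(encoded)
```

The offerer and credibility repositories can be encoded in the same way by changing the outer key and the fields of each list entry.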

7.2.3 Additional Protocol Messages

To implement the X2BTRep protocol, four messages are required: Poll, PollReply, CredRequest and CredResponse. The new messages introduced by our protocol should be interoperable with the current BitTorrent network. The additional messages, Poll and PollReply, are based on the traditional tracker HTTP protocol, while the messages CredRequest and CredResponse are piggybacked onto the peer TCP protocol. The additional messages are described below:

Poll

• Syntax: http(s)://sDomain/announce?info_hash=sInfo_hash&poll=sVote

• Token: sDomain specifies the fully qualified tracker name or IP address of the site. sInfo_hash specifies the info_hash string of the metainfo file. sVote specifies a string containing a certified vote on either the offerer or the metainfo file.
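As an illustration of the Poll syntax, the following sketch assembles such a request URL with Python's standard urllib. The function name build_poll_url, the tracker host name and the vote string are hypothetical; the actual format of a certified vote is not given in this section, so an opaque placeholder string is used.

```python
import hashlib
from urllib.parse import urlencode, urlsplit, parse_qs


def build_poll_url(domain, info_hash, vote, secure=True):
    """Build a Poll request of the form http(s)://domain/announce?info_hash=...&poll=..."""
    scheme = "https" if secure else "http"
    # urlencode percent-escapes the raw info_hash bytes and the vote string.
    query = urlencode({"info_hash": info_hash, "poll": vote})
    return "%s://%s/announce?%s" % (scheme, domain, query)


# Assumed example values: a placeholder metainfo digest and an opaque vote string.
info_hash = hashlib.sha1(b"example metainfo").digest()
vote = "certified-vote-placeholder"
url = build_poll_url("tracker.example.org", info_hash, vote)
print(url)
```

Because the poll parameter rides on the ordinary announce request, a tracker that does not understand it could simply ignore it, which is consistent with the interoperability goal stated above.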

PollReply

• Syntax: d5:pollsld6:peerid20:sPeerid4:voteX:sVotee. . .d6:peerid20:sPeerid4:voteX:sVoteeee

• Token: The PollReply returned by the tracker consists of a bencoded dictionary containing a list. The list contains several votes bencoded into corresponding dictionaries with the following keys: sPeerid is a 20-byte string of a peer’s self-generated ID and sVote describes a string of votes from the peer.
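A client has to decode the PollReply before it can use the votes. The sketch below is a minimal bencoding decoder applied to a hand-built reply with two votes. The function bdecode is a hypothetical helper written for this example, and the peer IDs and single-character vote strings are placeholder values rather than the thesis's certified vote format.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at `pos`; return (value, next_pos)."""
    marker = data[pos:pos + 1]
    if marker == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if marker == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if marker == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if marker.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("malformed bencoded data at offset %d" % pos)


# Placeholder peer IDs and votes (assumed values for illustration).
peer_a, peer_b = b"a" * 20, b"b" * 20
reply = (b"d5:pollsl"
         b"d6:peerid20:" + peer_a + b"4:vote1:1e"
         b"d6:peerid20:" + peer_b + b"4:vote1:0e"
         b"ee")

decoded, _ = bdecode(reply)
votes = {entry[b"peerid"]: entry[b"vote"] for entry in decoded[b"polls"]}
print(votes)
```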

CredRequest

• Syntax:

• Token: The CredRequest message has a payload with a peerid, length and data which map the peerid of the requestor, the length of the data and a self-signed request message respectively.

CredResponse

• Syntax:

• Token: The CredResponse message is related to the CredRequest message. It includes a payload with a peerid, length and data which map the peerid of the responsor, the length of the data, and a self-signed response message respectively.

7.3 Experiment

7.3.1 I. Intention of Experiment

This experiment is designed to show the effectiveness, reliability and robustness of our proposed system. First, it aims to verify that the system is able to distinguish between genuine and malicious resources in a complex environment; that is to say, that it can help clients to make a correct decision on downloading. Second, it focuses on verifying that our system has a strong capability to prevent malicious clients from spreading false content by various means, such as pseudospoofing and collective attacks, as well as combining these two strategies.

Based on the above attacks, in the following sections we define two malicious strategies.

Strategy One This strategy is based on a whitewashing attack. A harmful offerer tries to distribute polluted resources. The offerer actively publishes genuine resources most of the time in order to acquire a good reputation. However, he may sometimes spread malicious resources under the cover of this good reputation.

Strategy Two This strategy is an extension of the first strategy, which combines a whitewashing attack and a collective attack. As well as a harmful offerer such as in Strategy One, there is a group of malicious pollers. All of them, including both the offerer and the pollers, usually behave well in order to achieve a good reputation. However, the pollers will jointly cover up for the offerer when the offerer distributes false resources.

In our experimental implementation, we evaluate our protocol using these two strategies.

7.3.2 II. Experiment Setting

Our model is based on a Kazaa file-sharing network [25] with some modifications. The experimental network consists of 1000 clients, each of which can initiate four download queries at a time. Queries are drawn from the objects that exist in the system over an interval of 30 time slices. New objects are introduced into the system at a rate of 10-30 per time slice, where 90% of the objects are assumed to be genuine resources and the remaining 10% are false content. To make the simulation more realistic, we randomly assign ownership of the objects to 200 peers out of the 1000 clients, so that we can concentrate on evaluating the reputations of this small group of peers.
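To make the setting concrete, the following sketch generates a workload with the parameters stated above: 1000 clients, 200 owners, 30 time slices, 10-30 new objects per slice and 90% genuine objects. It is not the simulator used in the thesis. The function build_workload, its field names and the use of a seeded pseudo-random generator are all assumptions for illustration, and the query behaviour of the clients is not modelled.

```python
import random


def build_workload(seed=0, n_clients=1000, n_owners=200, n_steps=30,
                   min_new=10, max_new=30, genuine_ratio=0.9):
    """Generate owners and objects for a hypothetical run of the experiment."""
    rng = random.Random(seed)
    # Ownership is restricted to a random subset of 200 of the 1000 clients.
    owners = rng.sample(range(n_clients), n_owners)
    objects = []
    for step in range(1, n_steps + 1):
        # Between 10 and 30 new objects enter the system in each time slice.
        for _ in range(rng.randint(min_new, max_new)):
            objects.append({
                "id": len(objects),
                "introduced_at": step,
                "owner": rng.choice(owners),
                # Each object is genuine with probability 0.9, otherwise false.
                "genuine": rng.random() < genuine_ratio,
            })
    return owners, objects


owners, objects = build_workload(seed=1)
genuine_share = sum(obj["genuine"] for obj in objects) / len(objects)
print(len(owners), "owners,", len(objects), "objects,",
      "genuine share %.2f" % genuine_share)
```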

7.3.3 III. Working Principle

The basic model of a system’s working principle to estimate the reputation of both a given object and its owner (the offerer) is illustrated in Figure 7.2 below, which is based on the assumption of a perfect situation in X2BTRep. That is, it is supposed that Po distributes a resource R with good quality and P1, P2 and P3 are honest clients who download the resource in sequence.

[Figure 7.2 diagram: the offerer Po publishes resource R through the tracker server, which holds the centralised repository; peers P1, P2 and P3 download R in sequence.]

Centralised repository:

Peer ID | Offerer Vote | Torrent Vote
P1 | 62% | 1
P2 | 72% | 1
P3 | 82% | 1

Calculation table:

Peer ID | Po Experience | Credibility Rating | Adjusted Torrent Value | Adjusted Offerer Value | Download Probability | Description
P1 | 60% | n/a | n/a | n/a | 50% | There is no vote.
P2 | 70% | C21 = 0.6 | 0.6*.62 / 0.6 = .62 | 0.6*1 / 0.6 = 1 | (.62+1) / 2 = 81% | P1 has decided to download
P3 | 80% | C31 = 0.7, C32 = 0.8 | (.7*.62+.8*.72)/(.7+.8) = .67 | (.7*1+.8*1)/(.7+.8) = 1 | (.67+1) / 2 = 84% | P1 & P2 have decided to download

Figure 7.2: Basic working principle based on a genuine resource and honest clients

As the above figure shows, the downloading probability for P1 is 50%: he is the first downloader, so no votes on resource R are available for reference. P1 does, however, hold an experience value for the offerer Po (P1’s current reputation value for Po) of 60%. In this case, it is assumed that P1 queries R, votes 1 on R and subsequently votes 62% (60% + 2%) on Po. Then it is P2’s turn to download.

P2 obtains P1’s votes on both R and Po and holds a credibility rating of 0.6 for P1, which means that P2 regards P1 as a fairly reliable voter. The downloading probability is therefore calculated as 81% based on the formula in Chapter 5.

As the downloading probability is quite high, P2 decides to download. Once P2 completes his query, he votes 1 on R and raises his vote on Po to 72% (70% + 2%). Next, P3 decides on a query and votes on both R and Po in the same way as P2, as illustrated in the calculation table in the above figure.

If there is sufficient evidence that the object is genuine, the downloading probability is set directly to 100% without calculation. For example, if the total number of positive votes on R divided by the total number of negative votes on R is more than 10, the object is taken to be genuine. In this way, clients can choose genuine resources to download according to the downloading probability.
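The arithmetic in the calculation table of Figure 7.2 can be reproduced with a short sketch. It computes a credibility-weighted average of the votes on the offerer, a credibility-weighted average of the votes on the torrent, and takes the mean of the two. This mirrors the numbers in the figure only; the full formula is defined in Chapter 5, and the function names, the dictionary layout and the 50% default for the case with no votes are assumptions made for this illustration.

```python
def weighted_vote(votes, credibility):
    """Credibility-weighted average of votes; None if there is no usable weight."""
    total_weight = sum(credibility[peer] for peer in votes)
    if total_weight == 0:
        return None
    return sum(credibility[peer] * vote for peer, vote in votes.items()) / total_weight


def download_probability(offerer_votes, torrent_votes, credibility, default=0.5):
    """Mean of the weighted offerer vote and the weighted torrent vote."""
    offerer_value = weighted_vote(offerer_votes, credibility)
    torrent_value = weighted_vote(torrent_votes, credibility)
    if offerer_value is None or torrent_value is None:
        return default  # no votes yet, as for P1 in Figure 7.2
    return (offerer_value + torrent_value) / 2


# P2 sees only P1's votes and rates P1's credibility at 0.6.
p2 = download_probability({"P1": 0.62}, {"P1": 1}, {"P1": 0.6})

# P3 sees the votes of P1 and P2, with credibilities 0.7 and 0.8.
p3 = download_probability({"P1": 0.62, "P2": 0.72},
                          {"P1": 1, "P2": 1},
                          {"P1": 0.7, "P2": 0.8})

print(round(p2 * 100), round(p3 * 100))
```

Run on the values from the figure, the sketch gives probabilities that round to the 81% and 84% shown for P2 and P3.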

7.3.4 IV. Malicious Strategies

The basic working principle described above is effective in a perfect situation. We now relax the restrictive conditions and introduce malicious strategies in order to test our system. We divide the malicious strategies into the following two categories:

• Strategy 1

The first category of malicious strategy is based on a whitewashing attack. Such attackers usually participate in the system by distributing genuine objects, so that they achieve a high reputation. However, they occasionally distribute false objects under the shield of that reputation. Hence, they can spread malicious content in the network while maintaining their good reputation. In our experiment, the simulated attacker distributes 4-5 genuine objects but just 1-2 false objects within 20-30 time slices. The downloading probability in this simulation is calculated in the same way as in the basic model of the system’s working principle described above.

It is also assumed that P1, P2 and P3 are honest clients who download the resource in sequence. However, the difference now is that Po acts as a malicious attacker, distributing a resource R of poor quality.

[Figure 7.3 diagram: the offerer Po publishes resource R through the tracker server, which holds the centralised repository; peers P1, P2 and P3 download R in sequence.]

Centralised repository:

Peer ID | Offerer Vote | Torrent Vote
P1 | 58% | 0
P2 | 68% | 0
P3 | 78% | 0

Calculation table:

Peer ID | Po Experience | Credibility Rating | Adjusted Torrent Value | Adjusted Offerer Value | Download Probability | Description
P1 | 60% | n/a | n/a | n/a | 50% | There is no vote.
P2 | 70% | C21 = 0.6 | 0.6*.58 / 0.6 = .58 | 0.6*0 / 0.6 = 0 | (.58+0) / 2 = 29% | P1 has determined to download
P3 | 80% | C31 = 0.7, C32 = 0.8 | (.7*.58+.8*.68)/(.7+.8) = .63 | (.7*0+.8*0)/(.7+.8) = 0 | (.63+0) / 2 = 32% | P1 & P2 have decided to download

Figure 7.3: Working principle based on a malicious resource and honest clients

The above figure shows that our system can distinguish the quality of the resource, which is reflected in a drop in download probability. In particular, a peer’s vote on the object becomes zero once the peer finds out that the object is malicious. Meanwhile, each zero vote on the object is accompanied by a 2% decrease in that peer’s vote on Po.

If there is sufficient evidence that the object is false, the downloading probability is set directly to zero without any calculation. For instance, if the total number of positive votes on R divided by the total number of negative votes on R is less than 0.1, the object is taken to be false.

In this way, our system can prevent such a malicious attack: the malicious content cannot spread in the network because no client will choose to download the object.
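The two evidence thresholds (a positive-to-negative ratio above 10 for a genuine object, below 0.1 for a false one) can be sketched as a check that runs before the weighted calculation. The function name evidence_shortcut and the treatment of zero counts are assumptions: the text does not say what happens when either count is zero, so this sketch treats the ratio as undefined in that case and returns None, leaving the decision to the weighted calculation.

```python
def evidence_shortcut(positive, negative, upper=10, lower=0.1):
    """Return 1.0 or 0.0 when the vote ratio is decisive, otherwise None."""
    if positive == 0 or negative == 0:
        # Assumption: with a zero count the ratio is treated as undefined.
        return None
    ratio = positive / negative
    if ratio > upper:
        return 1.0   # overwhelming positive evidence: object taken as genuine
    if ratio < lower:
        return 0.0   # overwhelming negative evidence: object taken as false
    return None      # not decisive: fall back to the weighted calculation


for counts in [(22, 2), (1, 20), (10, 2), (3, 0)]:
    print(counts, evidence_shortcut(*counts))
```

Returning None rather than a probability keeps the shortcut separate from the normal path, so a caller can apply it first and compute the weighted probability only when the evidence is not decisive.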

• Strategy 2

The distinctive feature of the second category of malicious strategy is a group of malicious attackers who collectively attempt to harm the system. Like the malicious peer in the first category, one peer in this dangerous group pretends to be a good participant in the network with a high reputation, usually offering genuine resources but occasionally distributing false objects. When he offers a false object, the other malicious pollers attempt to cover up his misdemeanours by voting inaccurately on both the object and the malicious peer. Also, in order to raise their credibility ratings and so convince other clients, these pollers always participate actively and honestly in voting on popular objects, and lie only when helping the malicious peer with his attack.

[Figure 7.4 diagram: the offerer Po publishes resource R through the tracker server, which holds the centralised repository; the malicious poller Pm votes first, and peers P1 and P2 then download R in sequence.]

Centralised repository:

Peer ID | Offerer Vote | Torrent Vote
Pm | 100% | 1
P1 | 68% | 0
P2 | 78% | 0

Calculation table:

Peer ID | Po Experience | Credibility Rating | Adjusted Torrent Value | Adjusted Offerer Value | Download Probability | Description
Pm | 100% | n/a | n/a | n/a | n/a | Pm is an accomplice of Po
P1 | 70% | C1m = 0.6 | 0.6*1 / 0.6 = 1 | 0.6*1 / 0.6 = 1 | (1+1) / 2 = 100% | Pm has given his vote
P2 | 80% | C2m = 0.7, C21 = 0.8 | (.7*1+.8*.68)/(.7+.8) = .83 | (.7*1+.8*0)/(.7+.8) = .47 | (.83+.47) / 2 = 65% | P1 has finished downloading

Figure 7.4: Working principle based on a malicious resource and several dishonest pollers

Figure 7.4 above illustrates that Pm attempts to cover up the malicious offerer’s attack and to attract more clients to download the false resource by voting 1 on the false object and fraudulently voting 100% on Po. Although some peers at the beginning of the download cycle might be misled by the deceitful vote from Pm, other peers would soon recognise the real quality of the object through the calculation of downloading probability in our system.

7.3.5 V. Simulation Result

The detailed simulation results for each situation are shown in Figure 7.5 below. Two variables are plotted in the two-dimensional charts, namely, the number of peers and the object lifecycle (Time). The legend “inquirers” represents all the peers that requested the object during its lifecycle, whilst the legend “downloaders” represents all the peers that actually downloaded the object in the experiment.

[Figure 7.5 consists of four charts. Each plots the number of peers, with separate series for inquirers and downloaders, against the object lifecycle (Time).]

a). Basic working principle: the distribution of a genuine object under 10% noise on voting

b). Whitewashing attack with only a malicious offerer: the distribution of a false object based on malicious Strategy 1

c). Collective attack: the distribution of a false object with a malicious offerer and 10 malicious pollers

d). Collective attack: the distribution of a false object with a malicious offerer and 50 malicious pollers

Figure 7.5: Simulation results in different situations

1). Result one: Part (a) of Figure 7.5 indicates the simulation of a perfect situation, with 10% noise caused by users’ mistakes on voting or non-deterministic decisions. As shown in the figure, 44 peers requested the object the first time, of which 38 actually downloaded the object. The second time, 35 out of 36 peers drew queries. The 10th time, 17 peers requested objects while 27 peers downloaded. Therefore, it is revealed that the downloading trend is in line with the request trend.

2). Result two: Part (b) of Figure 7.5 shows the distribution of a false object based on malicious Strategy 1. The first time, 36 peers requested the object and five of them downloaded it. In the following times, although more than 30 peers requested the object, only a few peers actually drew queries.

3). Result three: Parts (c) and (d) in Figure 7.5 illustrate the distribution of a false object based on malicious Strategy 2. It is supposed that 10 and 50 malicious pollers made an alliance with the malicious offerer to spread a false resource in part c and part d respectively.

As indicated in part (c), 11 out of 37 peers drew queries on the false objects the first time. However, the number of peers that downloaded the object dropped dramatically for the following times, and only a few peers wanted to query the object after two time slices no matter how many peers requested the resources.

Since there were more malicious pollers in part (d), more peers downloaded the false objects at the beginning of object lifecycle. Thirty-two out of 41 peers drew queries on false resources the first time. However, the number of peers that actually downloaded the object decreased to zero the fourth time. After that, only a few peers drew queries, regardless of the number of peers that requested the object.

7.3.6 VI. Conclusion of Our Experiment

We have evaluated our system and shown that it is reliable and effective. We have conducted experiments on a large number of peers under varied conditions, covering different numbers of objects, the introduction of polluted resources, noise on voting and different malicious strategies. An analysis of the results of these experiments gives a good understanding of how our reputation system behaves. We summarise our major findings as follows:

• Peers can make a correct downloading decision in a complex environment with the help of our system.

• Resources of low quality can easily be recognised and kept from spreading over the network.

• Our reputation system offers a strong capability to resist two different malicious strategies, namely whitewashing attacks and collective attacks.

7.4 Comparison with other Reputation Systems

This section consists of a comparison between X2BTRep and other reputation systems, including eBay, P2PRep, XRep, X2Rep and Credence. We analyse these trust reputation systems with respect to several features: system category, target network, distinctive features, management of reputation information, publishing of reputation information, and security considerations.

eBay is a typical example of a reputation system in the electronic market over the internet. It works on a binary feedback mechanism for each transaction. Reputation information provided by buyers and sellers is maintained in the eBay centralised server and can be accessed by any clients in the system. Although eBay is able to establish trust relationships between buyers and sellers, it nevertheless tolerates fraud, as well as unfair and discriminatory behaviour.

The other system, X2Rep, and its predecessors, including P2PRep and XRep, are all decentralised reputation systems designed for the Gnutella protocol. Among these systems, P2PRep is the only protocol based on servent reputation information, whilst the rest of the systems are associated with both resources’ and peers’ reputations. However, all these systems require peers to be online during the voting transmission, and do not share trust information among peers.

Another robust reputation system, Credence, provides not only a basic voting scheme and a weight of correlation to evaluate votes, but also a selected information sharing mechanism to find the correlation between peers without common interests. The Credence system provides a powerful computation of the correlation between two peers based on past voting histories. However, it is difficult for the Credence system to detect malicious peers that submit accurate votes over an extended period, but then provide inaccurate votes occasionally. In addition, this system is a resource-based reputation system only.

Our proposal, X2BTRep, designed for BitTorrent networks, is an extension of X2Rep. X2BTRep deals with trust information based on a center and self management. It also allows peers to submit votes offline and to share trust ratings with other clients. Moreover, it can prevent known attacks related to P2P systems and reputation systems. Table 7.1 draws a comparison between the various reputation systems.

Table 7.1: The summary of a comparison between X2BTRep and other trust reputation mechanisms

eBay
  System Category: A trust management system in electronic markets
  Target Networks: Internet
  Distinctive Features: A binary feedback mechanism for each transaction between buyers and sellers
  Information Management: Center management – feedback rated by buyers and sellers is stored in the eBay centralised server
  Publishing of Information: Sums of positive, negative and neutral ratings during the last half year
  Security Considerations: Builds trust between most buyers and sellers. Fraud haunting every eBay transaction; discriminatory and unfair behaviours

P2PRep
  System Category: A reputation management system
  Target Networks: Designed for a Gnutella-like protocol
  Distinctive Features: A feedback mechanism based on servents only; a weight to estimate votes from servents
  Information Management: Self management – feedback rated by downloaders for offerers or pollers is reserved in each downloader’s local storage
  Publishing of Information: Requires peers online to collect votes. The votes can be either an ordinal scale or continuous number for an offerer
  Security Considerations: Against distribution of polluted content and man-in-middle attack. Ignores reputation related to resources

XRep
  System Category: A trust reputation management system
  Target Networks: Designed for a Gnutella-like protocol
  Distinctive Features: A feedback mechanism based on servent and resource; no weight to evaluate votes from servents
  Information Management: Self management – feedback rated by downloaders for offerers or resources is stored in each downloader’s local repository
  Publishing of Information: Requires peers online to collect votes. The votes can be either an ordinal scale or continuous number for an offerer; a binary value for a resource
  Security Considerations: Against self-replication, ID stealth, man-in-middle attack, pseudo-spoofing and shilling. But suffers from privacy exposure, inadequate reputation calculation and additional overheads

X2Rep
  System Category: A reputation management system
  Target Networks: Proposed for a Gnutella-like protocol
  Distinctive Features: Designed to improve the XRep protocol; a weight provided to assess votes from servents
  Information Management: Self management – feedback rated by downloaders for resources, offerers or pollers is reserved in each downloader’s local storage
  Publishing of Information: Requires peers online to collect votes. The votes can be either an ordinal scale or continuous number for an offerer; a binary value for a resource
  Security Considerations: Guards against known attacks in the XRep protocol and improves on weaknesses in the predecessor. Several shortcomings including cold-start and performance bottlenecks

Credence
  System Category: A reputation management system
  Target Networks: Designed for a Gnutella-like protocol
  Distinctive Features: A feedback mechanism based on resources only; plus a sharable weight to assess votes from servents
  Information Management: Self management – feedback rated by downloaders for resources and weight correlations for pollers are reserved in each downloader’s local storage
  Publishing of Information: A self-certificate including a binary value for a resource; and a positive correlation shared for other peers
  Security Considerations: Guards against distribution of polluted content and shilling attacks; but suffers from whitewashing attacks

X2BTRep
  System Category: A trust reputation management system
  Target Networks: Designed for a BitTorrent-like protocol
  Distinctive Features: A feedback mechanism based on peer and resources; a weight to evaluate votes from peers
  Information Management: Center and self management – feedback rated by downloaders for offerers or resources is stored in a centralised tracker. Weights are maintained in each downloader’s local storage
  Publishing of Information: Allows peers offline to collect votes. The votes can be either an ordinal scale or continuous number for an offerer; a binary value for a resource
  Security Considerations: Improves on the weaknesses of the X2Rep protocol and is able to prevent known attacks.

7.5 Summary

In this chapter, we have presented the design and implementation of BitTorrent with X2BTRep extensions. We have discussed the implementation based on component techniques and introduced several additional components used for reputation management. The additional load of managing and exchanging reputation information has only a limited impact on the performance of the existing BitTorrent protocol. Also, our simulation shows that the X2BTRep protocol allows P2P systems to guard against a series of attacks, such as the distribution of false information, whitewashing and collective attacks, while preserving the characteristics of anonymity and openness in P2P systems. Moreover, we have drawn up a table that compares our proposal with several other reputation systems with respect to a group of features.

Chapter 8

Conclusions and Future Work

Nowadays, the internet has become a cooperative and information-sharing network with millions of hosts all over the world. In the early period of the development of the internet, the traditional client/server model dominated the field of information exchange. However, as an increasing number of new users joined the internet every day, the bandwidth, processing and storage resources of this model could no longer be used as effectively and efficiently as before. The P2P technique emerged to address this problem with several enhanced characteristics, including cost reduction, anonymity, scalability, dynamism, resource aggregation and no single point of failure.

In a P2P network, each peer has equivalent capabilities and responsibilities, and each participant is also able to initiate a communication in order to share huge amounts of resources. The difference between the traditional client/server model and P2P networks is that P2P computing allows peers to act as both “clients” and “servers” to other peers.

The P2P network has evolved over three generations, namely centralised P2P, decentralised P2P and hybrid P2P, since Shawn Fanning created Napster, the first true global P2P file-sharing infrastructure, in May 1999. Currently, P2P is entering a new generation with a series of enticing features, including the introduction of cryptography, bi-directional and multiple connections, dynamic IP address/NAT and firewall, plus a tit-for-tat policy. These features are leading to fresh applications in P2P networks, such as eDonkey, BitTorrent and Skype, of which BitTorrent is the most popular and promising P2P file-sharing protocol.

BitTorrent, the brainchild of Bram Cohen, was designed to distribute large resources by taking advantage of the unused upload capacity of downloaders. With BitTorrent, high bandwidth utilisation can be achieved, and free-riding can be prevented. Several other significant features, such as scalability and diversity in service, provide BitTorrent with sufficient potential for future growth. Along with its remarkable advantages, however, inherent risks and threats, such as the distribution of false information, man-in-the-middle attacks, and IP harvesting, have become stumbling blocks against further development.

Previous studies have shown that reputation systems, such as P2PRep, XRep, and X2Rep, are robust solutions for protecting Gnutella-like P2P networks from such malicious attacks. In these reputation systems, the quality of a given resource/peer is determined by a user on the basis of historical information from other users. The significant advantage of such reputation systems is that they protect against most known attacks and vulnerabilities while retaining the characteristic of anonymity and keeping overheads minimal. However, little research has been conducted to develop a reputation system protocol to protect P2P networks with the BitTorrent architecture. Also, it is evident from past research that none of these Gnutella-like P2P network solutions can be applied or integrated directly into the BitTorrent protocol. Moreover, they have shortcomings, such as online-polling only, cold-start and performance bottleneck problems.

Our proposal, X2BTRep, is intended to fill this gap, as it is designed to overcome the weaknesses of BitTorrent networks, particularly the distribution of false contents. It can be seamlessly integrated into the current BitTorrent protocol, whilst also maintaining the protocol’s characteristics of openness, scalability and minimum additional overheads. Moreover, our proposal addresses the pitfalls of other reputation systems, including XRep and X2Rep.

There is strong evidence from a series of theoretical analyses and simulation results that our proposal is able to prevent all known attacks on BitTorrent-like systems and reputation-based systems. Furthermore, experiments demonstrate that X2BTRep is capable of guarding against more complicated attack strategies that combine two different malicious attacks, such as whitewashing attacks and collective attacks.

The design of reputation systems in peer-to-peer networks presents various challenges, as each P2P architecture has specific characteristics and its own set of application problems. Our X2BTRep is designed on the basis of the centralised structure of BitTorrent. It may therefore not be effective and efficient if it is applied to a decentralised or hybrid structure. Our proposal also needs some modification to accommodate other centralised structures, such as Napster and eDonkey, due to their different protocols.

Illegal activities involving BitTorrent and other P2P applications may cause P2P to flourish only briefly. Some people use these networks to distribute illegal copies of films, music, software, etc. [7]. However, the actual program and technique of BitTorrent have many legal and legitimate uses, even though many people choose to use this medium to distribute illegal content. This issue is therefore outside the scope of this thesis, and we will consider it in future work.

In the future, there is no doubt that applications based on P2P technology will be developed rapidly, since P2P networks have demonstrated various enticing features. Reputation systems will thus be developed and modified to meet the requirements of P2P networks as well. However, the P2P network is a very heterogeneous field of work and research. Our future research will continue to analyse and develop reputation systems to improve P2P networks, so that they can be applied more effectively and efficiently. Our future research topics will include the eXeem file-sharing P2P protocol; the application of the current X2BTRep to other centralised structures, such as eDonkey; and the application of X2BTRep to decentralised and hybrid structures.

Bibliography

[1] All about peer-to-peer (p2p). Technical report, May 20, 2005, Available at http://www.Webopedia.com (Accessed at February 2006).

[2] Karl Aberer and Zoran Despotovic. Managing trust in a peer-2-peer information system, 2001.

[3] E. Adar and B.A. Huberman. Free-riding on gnutella. First Monday, (5(10)), October 2000.

[4] Stephanos Androutsellis-Theotokis. A survey of peer-to-peer file sharing technologies. Technical report, ELTRUN, Athens University of Economics and Business, Greece, 2002, Available at http://www.mimuw.edu.pl/ alx/ask/androutsellis-theoto02survey.pdf (Accessed at February 2005).

[5] Ashwin R. Bharambe, Cormac Herley, and Venkata N. Padmanabhan. Analyzing and improving bittorrent performance. Microsoft Research, Redmond, WA 98052, March 2005.

[6] Vishwas V. Bhat. Reputation management in peer-to-peer systems. Technical report, Department of Computer Sciences, University of Texas at Austin, December 2004, Available at http://www.cs.utexas.edu/ vishwas/documents/Reputation.pdf (Accessed at January 2006).

[7] John Borland. Hollywood, bittorrent creator strike deal. Available: http://news.com.com/Hollywood,+BitTorrent+creator+strike+deal/2100-1032 3-5967750.html, 2005.

[8] CacheLogic. [Online]. Available: http://www.CacheLogic.com/, 2004.

[9] Azureus-Java BitTorrent Client. Available: http://azureus.sourceforge.net/.

[10] BitComet-A C++ BitTorrent Client. Available: http://www.bitcomet.net/.

[11] Bram Cohen. Incentives build robustness in bittorrent. In Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, USA, May 2003.

[12] F. Cornelli, E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, and P. Samarati. Choosing reputable servents in a p2p network. Eleventh International World Wide Web Conference, Honolulu, Hawaii, May 2002.

[13] Qnext Corp. [Online]. Available: http://www.qnext.com/.

[14] Nathan Curtis, R. Safavi-Naini, and W. Susilo. X2rep: Enhanced trust semantics for the xrep protocol. Applied Cryptography and Network Security. Second International Conference ACNS2004. Proceedings. LNCS3089, pages 205–219, 2004.

[15] E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, P. Samarati, and F. Violante. A reputation-based approach for choosing reliable resources in peer-to-peer networks. Proceedings of the 9th ACM conference on Computer and Communications Security, pages 207–216, November 2002.

[16] Chrysanthos Dellarocas. Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior. 2nd ACM Conference on Electronic Commerce, pages 150–157, 2000.

[17] Chrysanthos Dellarocas. The digitization of word-of-mouth: Promise and challenges of online feedback mechanisms. Management Science 49 (10), pages 1407–1424, 2003.

[18] Chrysanthos Dellarocas. Building trust online: The design of robust reputation reporting mechanisms for online trading communities. Technical report, Massachusetts Institute of Technology, USA, 2004, Available at http://ebusiness.mit.edu/research/papers/101 Dellarocas TrustManagement.pdf (Accessed at November 2005).

[19] eDonkey protocol description. [Online]. http://kent.dl.sourceforge.net/pdonkey/eDonkey-protocol-0.6.2.html.

[20] Nelson D Eubanks. Bittorrent: Digital river of the hacker culture. School of Information and Library Science, University of North Carolina at Chapel Hill, April, 2005.

[21] Robert Flenner, Michael Abbott, Toufic Boubez, Frank Cohen, Navaneeth Krishnan, Alan Moffet, Rajam Ramamurti, Bilal Siddiqui, and Frank Sommers. Java P2P Unleashed. Sams, 2003.

[22] Gnutella. Gnutella ultrapeers. Available: http://rfc-gnutella.sourceforge.net/Protocols/Ultrapeer/Ultrapeers.htm/, 2002.

[23] Nathaniel S. Good and Aaron Krekelberg. Usability and privacy: a study of kazaa p2p file-sharing. Available at: http://www.hpl.hp.com/techreports/2002/HPL-2002-163.html, 13 June 2002.

[24] Grokster. [Online]. Available: http://www.grokster.com/.

[25] Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, and John Zahorjan. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. ACM Symposium on Operating Systems Principles, October 2003.

[26] Mikel Izal, Guillaume Urvoy-Keller, Ernst W Biersack, Pascal A Felber, Anwar Al Amra, and Luis Garces-Erice. Dissecting bittorrent: five months in a torrent’s lifetime. PAM’2004, 5th Annual Passive & Active Measurement Workshop, April 19-20, 2004, Antibes Juan-les-Pins, France, April 2004.

[27] P2P Journal. Available: http://www.P2PJournal.com/.

[28] Sepandar D. Kamvar, Mario T. Schlosser, and Hector Garcia-Molina. The eigentrust algorithm for reputation management in p2p networks. In the Proceedings of the Twelfth International Conference on World Wide Web, Budapest, Hungary, 2003.

[29] T. Karagiannis, A. Broido, N. Brownlee, K. Claffy, and M Faloutsos. Is p2p dying or just hiding? Presented at Globecom 2004, Dallas, TX, November/December, 2004.

[30] Charlie Kaufman, Radia Perlman, and Mike Speciner. Network Security: Private Communication in a Public World. Prentice Hall PTR, Upper Saddle River, NJ 07458, second edition, 2002.

[31] Jiejun Kong. Formal notations of anonymity for peer-to-peer networks. Technical report, UCLA Computer Science Technical Report CSD-TR050016, May 2005.

[32] Jian Liang, Rakesh Kumar, and Keith W. Ross. The kazaa overlay: A measurement study. Technical report, Department of Computer and Information Science, Polytechnic University, Brooklyn, NY, USA 11201, September 2004.

[33] KaZaA media desktop. [Online]. Available: http://www.kazaa.com/.

[34] Thomas Mennecke. Bittorrent remains powerhouse network. Slyck News, Available at http://www.slyck.com/news.php?story=649 (Accessed at February 2006), January 31, 2005.

[35] Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, and Zhichen Xu. Peer-to-peer computing. Technical report, Hewlett-Packard Technical Report, 2002, Available at http://www.hpl.hp.com/techreports/2002/HPL-2002-57R1.pdf (Accessed at January 2006).

[36] Mayank Mishra. Cascade: an attack resistant peer-to-peer system. The 3rd New York Metro Area Networking Workshop, 2003.

[37] µTorrent A BitTorrent Client. Available: http://www.utorrent.com/.

[38] Napster. [Online]. Available: http://www.Napster.com/.

[39] Andy Oram. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O’Reilly & Associates, Inc., March 2001.

[40] A. Parker. The true picture of peer-to-peer filesharing. [Online]. Available: http://www.cachelogic.com, 2004.

[41] Adam Pasick. Livewire - file-sharing network thrives beneath the radar. Yahoo Technology News, Available at http://in.tech.yahoo.com/041103/137/2ho4i.html (Accessed at February 2006), November 4, 2004.

[42] Joseph O. Patterson. A matter of trust: Reputation management in peer-to-peer networks. Technical report, 2003, Available at http://csci.mrs.umn.edu/Personal/pub/Patterson/SeminarIIPaperDevelopment/sem2 draft.doc (Accessed at November 2005).

[43] J.A. Pouwelse, P. Garbacki, D.H.J. Epema, and H.J. Sips. A measurement study of the bittorrent peer-to-peer file-sharing system. Elsevier Science, May 2004.

[44] SETI@home project website. [Online]. http://setiathome.berkeley.edu/.

[45] Gnutella protocol development. [Online]. Available: http://rfc-gnutella.sf.net/.

[46] Dongyu Qiu and R. Srikant. Modeling and performance analysis of bittorrent-like peer-to-peer networks. SIGCOMM, September 2004.

[47] Paul Resnick, Richard Zeckhauser, Eric Friedman, and Ko Kuwabara. Reputation systems: Facilitating trust in internet interactions. Communications of the ACM 43(12), 2000.

[48] Ashish Sharma. P2p networks classifieds: Three types of p2p networks are collaborative computing, messaging and affinity groups. Technical report, September 12, 2002, Available at http://www.pcquest.com/content/p2p/default.asp (Accessed at February 2006).

[49] Heather A. Smith, John Clippinger, and Benn Konsynski. Riding the wave: Discovering the value of p2p technologies. Communications of AIS, 2002.

[50] Gnutella Protocol Specification. The gnutella protocol specification v0.4 document revision 1.2. Technical report, May 2003, Available at http://www9.limewire.com/developer/gnutella protocol 0.4.pdf.

[51] Kevin Walsh and Emin Gün Sirer. Fighting peer-to-peer spam and decoys with object reputation. SIGCOMM’05 Workshops, Philadelphia, PA, USA, August 22-26, 2005.

[52] WASTE. [Online]. Available: http://waste.sourceforge.net/index.php/.

[53] Official BitTorrent website. [Online]. Available: http://www.BitTorrent.com/.

[54] Official Skype website. [Online]. Available: http://www.Skype.com/.

[55] L. Yu. X2btrep trusted reputation system: A robust mechanism for p2p networks. HDR Student Conference, University of Wollongong, Australia, August 2006.

[56] L. Yu, W. Susilo, and R. Safavi-Naini. X2btrep trusted reputation system: A robust mechanism for p2p networks. The 5th International Conference on Cryptology and Network Security, China, August 2006.