The Popularity Parameter in Unstructured P2P File Sharing Networks

JAIME LLORET, JUAN R. DIAZ, JOSE M. JIMÉNEZ, MANUEL ESTEVE
Department of Communications
Polytechnic University of Valencia
Camino de Vera s/n, 46022 Valencia
SPAIN

Abstract: - Since P2P became extremely popular among Internet users, many researchers have tried to model P2P networks. One of the parameters used in these models is the popularity of a file. Some articles show that the more popular a file is, the higher the probability of finding it inside the P2P file-sharing network. This article deals with the popularity parameter in P2P file-sharing networks. To study it, the unstructured public domain peer-to-peer networks Gnutella, FastTrack, OpenNap, eDonkey, Soulseek and MP2P have been measured. The authors have established a relationship between some films, songs, programs and documents found in web search engines and the same files found in public domain P2P file-sharing networks. If all the analyzed peer-to-peer file-sharing networks were interconnected, the probability of finding a desired file would increase. On the other hand, the analyzed P2P networks seem to be specialized in different types of files, as shown in the paper.

Key-Words: - Peer to peer, File Popularity, File Search, Peer-To-Peer Interconnection.

1 Introduction

Since the Internet became accessible to the world, one of the first concerns of users has been finding the file or information they are looking for. A measurement study [1] of the deep Web reveals that it contains nearly 550 billion pages and that it is doubling each year. On the other hand, the surface Web contains an estimated 2.5 billion documents, growing at a rate of 7.5 million documents per day, and the deep Web is approximately 500 times greater than that visible to conventional search engines. Nowadays there are many web search engines [2], and many of them have billions of textual documents indexed [3]. Web search engines can be classified into three types:

- Crawler-Based Search Engines, such as Google, which create their listings automatically. They "crawl" or "spider" documents by following one hypertext link to another; people then search through what they have found.
- Human-Powered Directories, such as Open Directory, which depend on humans for their listings. People have to submit a short description of their entire site to the directory. A search looks for matches only in the descriptions submitted.
- Hybrid Search Engines, such as MSN search, which are maintained by a combination of the previous types and present both kinds of results.

Web search engines employ some kind of centralized algorithm. In order to return the best results, these algorithms use the location/frequency method (search engines check whether the search keywords appear near the top of a web page and how often the keywords appear in relation to other words in the page) and off-the-page factors (such as clickthrough measurement). These are the major factors in how search engines determine the popularity of a document. Search results are usually sorted in order of popularity.

Currently there are many P2P file-sharing networks in existence, and many of them have millions of on-line users and millions of shared data items [4]. In this type of network, what a user really wants is to find the file they are looking for and download it. The probability of finding a desired file in the network where a user is searching is associated with the popularity of the file. Other parameters, such as the type of file that can be shared, the availability of the file and its replication, are also considered. In order to obtain real search measurements for some films, songs, programs and documents, we have selected some of the most popular public domain P2P file-sharing networks: Gnutella [5], FastTrack [6], OpenNap [7], eDonkey [8], Soulseek [9] and MP2P [10]. Although there are other networks [11], we have selected these because they are very popular among Internet users.

On the other hand, we have selected two crawler-based search engines, Google and Altavista, and one search directory, Yahoo!, in order to find the same files in Web search engines.

A relationship is then established between the results obtained in the web search engines and those obtained in the aforementioned peer-to-peer file-sharing networks. This relationship gives us the popularity of those files.

This paper is structured as follows. Section 2 discusses the search techniques used in peer-to-peer file-sharing networks. Section 3 describes the popularity parameter. Section 4 shows the measurements taken in the selected peer-to-peer file-sharing networks and Web search engines, together with the relationship between them. Section 5 discusses how the probability of finding a desired file in peer-to-peer file-sharing networks can be increased. Finally, Section 6 presents conclusions and future work.

2 Search Techniques in Peer to Peer File Sharing Networks

In order to find a file in a P2P network, a search is needed. The search algorithm implemented in each network depends on the type of network (centralized P2P, decentralized P2P or partially centralized). There are several types of search algorithms, and they can be classified as follows:

2.1 Loosely controlled P2P search algorithms.

They are used in decentralized peer-to-peer networks. The data placement is not defined because the nodes of the network decide which files they want to share. There are four kinds of loosely controlled P2P search algorithms:

2.1.1 Broadcast search technique

The query is sent to all directly connected neighbours, and they forward the query to all their neighbours. The query is propagated until enough nodes have been reached to match the entry or until a TTL value expires. If a neighbour has the content, it replies; otherwise it floods the query to its own neighbours. This type of search is used by the Gnutella network.

2.1.2 Selective search technique

The query is sent to some nodes called supernodes, which act as central nodes. These supernodes forward the search to other supernodes in order to find the requested file. Clients with higher bandwidth and processing capacity are automatically considered supernodes. Clients with less bandwidth become clients of a supernode. This type of system uses a flow control algorithm for sending queries and replies. It also has a priority scheme used to discard some messages. This type of search is used by FastTrack and Gnutella 2 [12].

2.1.3 Random search technique

The query is sent to k randomly selected neighbours. Each of these neighbours forwards the query to randomly selected neighbours of its own. The query is propagated until enough nodes have been reached to match the entry or until a TTL value expires. This technique is described in [13].

2.1.4 Probabilistic search technique

In this case, the queries are sent to specific clients that are considered to have the greatest probability of finding the requested file. Each node maintains a probability value for each neighbour, which defines the chances that a query will be forwarded to that neighbour. An example of this type of search is APS [14].

2.2 Strongly controlled P2P search algorithms.

In structured P2P networks, data placement and topology within the P2P file-sharing network are tightly controlled. These networks are based on Distributed Hash Tables (DHT), and the nodes do not decide what they store and share with other peers in the network. The data placement is defined by the algorithm. When a document is published, it is routed to the client whose ID is the most similar to the document's ID. In order to find a file, the queries are sent to the client whose ID is the most similar to the document's ID. The process is repeated until a close match is found. The main search problem in this type of network is that it is not very efficient for keyword-based search. This type of search is used by Freenet [15], CAN [16], Chord [17], Pastry [18] and Tapestry [19].

2.3 Server-centrally controlled P2P search algorithms.

They are used in peer-to-peer networks where there is a server or a group of servers. This type of search is very simple and has a short query time. There are two kinds of server-centrally controlled P2P search algorithms:

2.3.1 Single-Server search technique

Initially, P2P clients connect to a central server where they publish their shared files (the files' names, their sizes, etc.). When a search query is sent to the server, the server looks it up in its index database. If there is a matching entry, the IP address of the node that shares the file is sent to the node that requested it, and then the direct connection and download take place. This technique is used by the Soulseek network.

2.3.2 Farm-of-Servers search technique

In this type of P2P network, there is a group of available servers called "brokers".

[...] other peers. The popularity of a file governs how long it stays in the network and how often it is replicated.

In peer-to-peer file-sharing networks, the popularity can be expressed mathematically as follows. Objects in a peer-to-peer file-sharing network do not all have the same popularity. Assuming that there are m files of interest in one P2P network and that q_i represents their normalized relative popularity (number of queries issued for file i), the following holds:

$$\sum_{i=1}^{m} q_i = 1 \qquad (1)$$
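The normalization condition in equation (1) can be illustrated with a minimal sketch. It assumes that the raw popularity of file i is its query count n_i and that q_i = n_i / (n_1 + ... + n_m). The function name `normalized_popularity` and the query counts below are hypothetical and chosen only for illustration; they are not measurements from this paper.

```python
from fractions import Fraction


def normalized_popularity(query_counts):
    """Map each file to q_i = n_i / sum_j n_j, where n_i is its query count.

    Hypothetical helper: exact fractions are used so that the sum of all
    q_i can be checked against 1 without floating-point rounding.
    """
    total = sum(query_counts.values())
    if total <= 0:
        raise ValueError("at least one query is needed to normalize")
    return {name: Fraction(count, total) for name, count in query_counts.items()}


# Hypothetical query counts for m = 4 files (illustrative, not measured data).
counts = {"film_a": 50, "song_b": 30, "program_c": 15, "document_d": 5}
q = normalized_popularity(counts)

# Equation (1): the normalized popularities sum to 1.
total_popularity = sum(q.values())
```

Exact fractions are a design choice for the check only; with floating-point values the sum would have to be compared to 1 within a small tolerance.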
