The Robustness of Content-Based Search in Hierarchical Peer to Peer Networks

Total Page:16

File Type:pdf, Size:1020Kb

The Robustness of Content-Based Search in Hierarchical Peer to Peer Networks The Robustness of Content-Based Search in Hierarchical Peer to Peer Networks ∗ M. Elena Renda Jamie Callan I.S.T.I. – C.N.R. Language Technologies Institute and School of Computer Science Scuola Superiore Sant’Anna Carnegie Mellon University I-56100 Pisa, Italy Pittsburgh, PA 15213, US [email protected] [email protected] ABSTRACT Keywords Hierarchical peer to peer networks with multiple directory services Peer to Peer, Hierarchical, Search, Retrieval, Content-based, Ro- are an important architecture for large-scale file sharing due to their bustness effectiveness and efficiency. Recent research argues that they are also an effective method of providing large-scale content-based federated search of text-based digital libraries. In both cases the 1. INTRODUCTION directory services are critical resources that are subject to attack Peer to peer (P2P) networks have emerged in recent years as a or failure, but the latter architecture may be particularly vulnera- popular method of sharing information, primarily popular music, ble because content is less likely to be replicated throughout the movies, and software, among large numbers of networked com- network. puters. Computers (‘nodes’) in P2P networks can be information This paper studies the robustness, effectiveness and efficiency providers and information consumers, and may provide network of content-based federated search in hierarchical peer to peer net- services such as message routing and regional directory services. works when directory services fail unexpectedly. Several recovery A single node may play several roles at once, for example, posting methods are studied using simulations with varying failure rates. queries, sharing files, and relaying messages. Experimental results show that quality of service and efficiency Information providers in P2P networks can be viewed as digital degrade gracefully as the number of directory service failures in- libraries. In many of the well-known operational P2P networks, for creases. Furthermore, they show that content-based search mech- example, Gnutella [5], KaZaA [10], or LimeWire [13], the digital anisms are more resilient to failures than the match-based search libraries are relatively small, ranging from a few dozen to a few techniques. thousand files. The search services they offer tend to be based on exact or partial match of file names that are known or easily guessed Categories and Subject Descriptors (e.g., names of popular songs, movies, or software) and/or small amounts of manually-assigned metadata. H.3.3 [Information Storage and Retrieval]: Information Search Search for information in P2P networks is essentially federated and Retrieval—retrieval models, search process, selection process; search (or ‘distributed information retrieval’ [22]) across a possibly H.3.4 [Information Storage and Retrieval]: Systems and Soft- large set of digital libraries. Recent P2P research has focused on ware—distributed systems, performance evaluation (efficiency and how to efficiently and effectively determine which digital libraries effectiveness); H.3.5 [Information Storage and Retrieval]: On- have desired information. The two major approaches are highly- line Information Services—data sharing organized networks (e.g., Tapestry [8], OceanStore [12], Pond [20], Pastry [21], Chord [24]), in which content location can be inferred easily due to well-defined data placement policies; and hierarchical General Terms P2P networks with local directory services. The latter approach Algorithms, Design, Experimentation, Performance has became the most popular architecture for widely-deployed P2P applications (e.g., Gnutella 0.6 [5], KaZaA [10], Morpheus [15], Napster [16], Project JXTA [17]). ∗This work was done when the author was a Visiting Graduate Re- Our research interest is in using P2P networks to provide fed- search Assistant at the Language Technologies Institute, School of erated search of text-based digital libraries. For example, P2P Computer Science, Carnegie Mellon University. protocols could be an effective and convenient method of provid- ing federated search across the various document repositories of a medium-sized company. A P2P architecture is appealing in this environment because there is no need to collect and maintain cen- Permission to make digital or hard copies of all or part of this work for tralized repositories or indexes. P2P protocols could also be an personal or classroom use is granted without fee provided that copies are effective method of providing federated search of digital libraries not made or distributed for profit or commercial advantage and that copies that can be searched but not crawled by a Web crawler (so called bear this notice and the full citation on the first page. To copy otherwise, to “Hidden Web” repositories). republish, to post on servers or to redistribute to lists, requires prior specific Prior research demonstrated that a hierarchical P2P network ar- permission and/or a fee. CIKM’04, November 8–13, 2004, Washington, DC, USA. chitecture can provide effective and efficient access to several thou- Copyright 2004 ACM 1-58113-874-1/04/0011 ...$5.00. sand small text-based digital libraries [14]. In a hierarchical P2P Figure 1: Pure P2P Network. network some nodes (‘hub’ nodes) provide regional or topical di- rectory services that improve the accuracy and efficiency of the net- work by directing messages appropriately. Our interest in this pa- per is to study the stability and robustness of such networks when Figure 2: Hierarchical P2P Network. directory services fail. There is ample ‘real world’ evidence that hi- erarchical P2P networks such as KaZaA and Morpheus are stable as directory services (‘ultra peers’) enter and exit the network, but Napster was highly influential, most later systems adopted decen- their stability relies on massive sharing of information. Our goal tralized search. is to discover whether an architecture that gives digital libraries greater control over their intellectual property is as stable in the 2.2 Decentralized Search Without Directory presence of directory service failure. Services The next section of this paper describes some prior work related One reaction to the vulnerability of P2P networks with a sin- to our research. Section 3 provides more details about the motiva- gle directory service was P2P networks with no directory ser- tion for our research. Section 4 describes the search mechanisms vices, sometimes called ‘pure’ or ‘flat’ P2P networks (Figure 1). we used. Section 5 presents the behaviour of the search proto- Gnutella 0.4 [5] is an early example. It adopted unstructured search col under dynamic conditions. Section 6 describes the data and to find content: A peer sends a query to its neighbors (‘flooding’), methodology adopted for the experiments. Section 7 reports and which attempt to satisfy it, and which also relay it to their neigh- discusses the results obtained. Finally, Section 8 concludes. bors, until the request has traveled a maximum distance from the initiating node (TTL, Time to Live). The Gnutella 0.4 protocol is 2. PEER-TO-PEER SYSTEMS simple, robust, and easy to apply to any type of content, but it is not P2P applications for file sharing have become widespread in re- scalable because the message overhead is high, and because mes- cent years, which has focused considerable research attention on sages do not propagate to distant parts of the network (which may this area, particularly on how to improve P2P search. P2P search have the best matching files). methods can be classified into three broad categories, based on how A more focused interest-based locality approach [23] is based they manage the search process. We discuss these categories, and on collaborative filtering: If peer A is looking for document d and related prior research, below, with an emphasis on well-known so- peer B owns it, A creates an interest shortcut to B because B might lutions and solutions likely to be useful for federated search of text- have other documents A is interested in. The shortcuts provide a based digital libraries. loose overlay structure on top of Gnutella 0.4. To avoid flooding, the protocol first looks for content through shortcuts; when all of a 2.1 Centralized Search (One Directory node’s shortcuts fail, it reverts to Gnutella 0.4. Shortcuts seem to Service) have good load distribution properties, and help reduce the number of peers touched by each query. Prior experimental results were Napster, one of the first P2P systems, was essentially a sin- 1 based on relatively short simulations (one hour), where the system gle, network-wide directory service. When a new digital library was always running in optimal conditions [23], so the generality of (‘node’) wanted to join the network, it provided a catalog of its cur- the experimental results are not yet clear. rent holdings to Napster. Napster handled search requests, return- P-Grid [1] is a self-organizing and completely decentralized P2P ing lists of matching files and the addresses of the digital libraries lookup system based on a virtual distributed search tree. It has providing them. some similarities with Chord [24] (see below), but it exploits lo- The Napster architecture was efficient and effective, but not ro- cal knowledge and does not require central coordination or global bust because of the central and unique point of failure. Although knowledge, as Chord does. The P-Grid’s search has good ro- 1For efficiency reasons, this single, logical directory service was bustness and scalability characteristics, due to the replication dis- actually implemented as several distinct, slightly different directory tributed over all the peers. services. Other approaches [9, 26] select which nodes to search based on specific heuristics (past behaviour, past number of results, past ciency, DHTs have several disadvantages as a solution for federated query response time, etc.). These methods are generally efficient, search of text-based digital libraries. Name-based retrieval is gen- but the heuristic-based selection of nodes can fail to find relevant erally not effective for text retrieval.
Recommended publications
  • What Is Peer-To-Peer File Transfer? Bandwidth It Can Use
    sharing, with no cap on the amount of commonly used to trade copyrighted music What is Peer-to-Peer file transfer? bandwidth it can use. Thus, a single NSF PC and software. connected to NSF’s LAN with a standard The Recording Industry Association of A peer-to-peer, or “P2P,” file transfer 100Mbps network card could, with KaZaA’s America tracks users of this software and has service allows the user to share computer files default settings, conceivably saturate NSF’s begun initiating lawsuits against individuals through the Internet. Examples of P2P T3 (45Mbps) internet connection. who use P2P systems to steal copyrighted services include KaZaA, Grokster, Gnutella, The KaZaA software assesses the quality of material or to provide copyrighted software to Morpheus, and BearShare. the PC’s internet connection and designates others to download freely. These services are set up to allow users to computers with high-speed connections as search for and download files to their “Supernodes,” meaning that they provide a How does use of these services computers, and to enable users to make files hub between various users, a source of available for others to download from their information about files available on other create security issues at NSF? computers. users’ PCs. This uses much more of the When configuring these services, it is computer’s resources, including bandwidth possible to designate as “shared” not only the and processing capability. How do these services function? one folder KaZaA sets up by default, but also The free version of KaZaA is supported by the entire contents of the user’s computer as Peer to peer file transfer services are highly advertising, which appears on the user well as any NSF network drives to which the decentralized, creating a network of linked interface of the program and also causes pop- user has access, to be searchable and users.
    [Show full text]
  • The Effects of Digital Music Distribution" (2012)
    Southern Illinois University Carbondale OpenSIUC Research Papers Graduate School Spring 4-5-2012 The ffecE ts of Digital Music Distribution Rama A. Dechsakda [email protected] Follow this and additional works at: http://opensiuc.lib.siu.edu/gs_rp The er search paper was a study of how digital music distribution has affected the music industry by researching different views and aspects. I believe this topic was vital to research because it give us insight on were the music industry is headed in the future. Two main research questions proposed were; “How is digital music distribution affecting the music industry?” and “In what way does the piracy industry affect the digital music industry?” The methodology used for this research was performing case studies, researching prospective and retrospective data, and analyzing sales figures and graphs. Case studies were performed on one independent artist and two major artists whom changed the digital music industry in different ways. Another pair of case studies were performed on an independent label and a major label on how changes of the digital music industry effected their business model and how piracy effected those new business models as well. I analyzed sales figures and graphs of digital music sales and physical sales to show the differences in the formats. I researched prospective data on how consumers adjusted to the digital music advancements and how piracy industry has affected them. Last I concluded all the data found during this research to show that digital music distribution is growing and could possibly be the dominant format for obtaining music, and the battle with piracy will be an ongoing process that will be hard to end anytime soon.
    [Show full text]
  • Piratez Are Just Disgruntled Consumers Reach Global Theaters That They Overlap the Domestic USA Blu-Ray Release
    Moviegoers - or perhaps more accurately, lovers of cinema - are frustrated. Their frustrations begin with the discrepancies in film release strategies and timing. For example, audiences that saw Quentin Tarantino’s1 2 Django Unchained in the United States enjoyed its opening on Christmas day 2012; however, in Europe and other markets, viewers could not pay to see the movie until after the 17th of January 2013. Three weeks may not seem like a lot, but some movies can take months to reach an international audience. Some take so long to Piratez Are Just Disgruntled Consumers reach global theaters that they overlap the domestic USA Blu-Ray release. This delay can seem like an eternity for ultiscreen is at the top of the entertainment a desperate fan. This frustrated enthusiasm, combined industry’s agenda for delivering digital video. This with a lack of timely availability, leads to the feeling of M is discussed in the context of four main screens: being treated as a second class citizen - and may lead TVs, PCs, tablets and mobile phones. The premise being the over-anxious fan to engage in piracy. that multiscreen enables portability, usability and flexibility for consumers. But, there is a fifth screen which There has been some evolution in this practice, with is often overlooked – the cornerstone of the certain films being released simultaneously to a domestic and global audience. For example, Avatar3 was released entertainment industry - cinema. This digital video th th ecosystem is not complete without including cinema, and in theaters on the 10 and 17 of December in most it certainly should be part of the multiscreen discussion.
    [Show full text]
  • Online Piracy
    Online Piracy Christian Helmers 16 April 2015 1 / 31 Online piracy • Digital technologies: • Create, transform, copy data • Digital technologies: transmission, display and analysis of data through copying • Enabled large-scale online piracy • Piracy: illegal copying of copyrighted material online • Music • Movies • Ebooks/audiobooks • Software • Note: [Piracy] is not an offense of timeless character, universally definable by a priori criteria. It is far richer and trickier than that. (Johns, 2009:4) • Beware: often very ideological debate 2 / 31 P2P: Napster 3 / 31 P2P: The Pirate Bay 4 / 31 P2P: LimeWire Pirate Edition (\Not an equivalent, not a replacement, but the exact same thing") 5 / 31 P2P: More BitTorrent sites 6 / 31 Direct downloads: Megaupload 7 / 31 Direct downloads: many anonymous options 8 / 31 Cracking devices: Amazon's MOBI 9 / 31 Online piracy • Digital technologies: • P2P music sharing software: Napster, KaZaa, Morpheus, Grokster, BitTorrent sites, etc. • Direct downloads, file repositories • Cracking software • Enabled large-scale online piracy • Importance of digital platforms for sharing • File sharing market organized in private groups with tightly controlled access (non-profit) • Filehosters often for-profit 10 / 31 The economics of piracy (Klein et al., 2002) • Negative effect on copyright owner: • Pirated copies substitutes for legal downloads (but lower quality files may limit substitution) • Inability to price-discriminate • Inability to control prices • Positive effect on copyright owner: • Sampling (learning) • Space
    [Show full text]
  • Title: P2P Networks for Content Sharing
    Title: P2P Networks for Content Sharing Authors: Choon Hoong Ding, Sarana Nutanong, and Rajkumar Buyya Grid Computing and Distributed Systems Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Australia (chd, sarana, raj)@cs.mu.oz.au ABSTRACT Peer-to-peer (P2P) technologies have been widely used for content sharing, popularly called “file-swapping” networks. This chapter gives a broad overview of content sharing P2P technologies. It starts with the fundamental concept of P2P computing followed by the analysis of network topologies used in peer-to-peer systems. Next, three milestone peer-to-peer technologies: Napster, Gnutella, and Fasttrack are explored in details, and they are finally concluded with the comparison table in the last section. 1. INTRODUCTION Peer-to-peer (P2P) content sharing has been an astonishingly successful P2P application on the Internet. P2P has gained tremendous public attention from Napster, the system supporting music sharing on the Web. It is a new emerging, interesting research technology and a promising product base. Intel P2P working group gave the definition of P2P as "The sharing of computer resources and services by direct exchange between systems". This thus gives P2P systems two main key characteristics: • Scalability: there is no algorithmic, or technical limitation of the size of the system, e.g. the complexity of the system should be somewhat constant regardless of number of nodes in the system. • Reliability: The malfunction on any given node will not effect the whole system (or maybe even any other nodes). File sharing network like Gnutella is a good example of scalability and reliability.
    [Show full text]
  • Scaling Content Delivery Outline P2P System
    3/17/2019 Problem: Scaling Content Delivery • Millions of clients server and network meltdown 15-441/641: Content Delivery Peer-to-Peer 15-441 Spring 2019 Profs Peter Steenkiste & Justine Sherry Fall 2019 https://computer-networks.github.io/sp19/ Outline P2P System • Peer-to-peer • Overlays: naming, addressing, and routing • CDNs • Leverage the resources of client machines (peers) • Computation, storage, bandwidth 3 4 1 3/17/2019 P2P Definition Why p2p? • Harness lots of spare capacity Distributed systems consisting of interconnected • 1 Big Fast Server: $10k/month++ versus 1000s .. 1000000s clients: $ ?? nodes able to self-organize into network topologies with the purpose of sharing resources such as • Capacity grows with the number of users! content, CPU cycles, storage and bandwidth, • Build very large-scale, self-managing systems capable of adapting to failures and accommodating • Same techniques useful for companies, transient populations of nodes while maintaining • E.g. Akamai’s 14,000+ nodes, Google’s 100,000+ nodes acceptable connectivity and performance, without requiring the intermediation or support of a global • But: servers vs. arbitrary nodes, hard vs. soft state (backups vs caches), …. centralized server or authority. • Also: security, fairness, freeloading, .. • No single point of failure – A Survey of Peer-To-Peer Content Distribution Technologies, • Some nodes go down – others take over Androutsellis-Theotokis and Spinellis • … government shuts down nodes – peers in other countries are available 6 P2P Construction Key Idea: Network Overlay P2P Overlay Network • A network overlay is a network that is layered on top of the Internet • Simplified picture: overlays use IP as their datalink layer • Overlays need the equivalent of all the functions IP networks need: Clients • Naming and addressing Servers • Routing SPRINT • Bootstrapping Verizon • Security, error recovery, etc.
    [Show full text]
  • Three Case Studies on Mp3.Com, Napster and Kazaa
    University of Wollongong Research Online Faculty of Engineering & Information Sciences - Honours Theses University of Wollongong Thesis Collections 25-10-2004 Digital distribution music services and the demise of the traditional music industry: three case studies on mp3.com, Napster and Kazaa K. Alves University of Wollongong Follow this and additional works at: https://ro.uow.edu.au/thesesinfo University of Wollongong Copyright Warning You may print or download ONE copy of this document for the purpose of your own research or study. The University does not authorise you to copy, communicate or otherwise make available electronically to any other person any copyright material contained on this site. You are reminded of the following: This work is copyright. Apart from any use permitted under the Copyright Act 1968, no part of this work may be reproduced by any process, nor may any other exclusive right be exercised, without the permission of the author. Copyright owners are entitled to take legal action against persons who infringe their copyright. A reproduction of material that is protected by copyright may be a copyright infringement. A court may impose penalties and award damages in relation to offences and infringements relating to copyright material. Higher penalties may apply, and higher damages may be awarded, for offences and infringements involving the conversion of material into digital or electronic form. Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong. Recommended Citation Alves, K., "Digital distribution music services and the demise of the traditional music industry: three case studies on mp3.com, Napster and Kazaa" (2004).
    [Show full text]
  • A Safe and Efficient Storage Scheme Based on Blockchain and IPFS for Agricultural Products Tracking
    Journal of Computers Vol. 29 No. 6, 2018, pp. 158-167 doi:10.3966/199115992018122906015 A Safe and Efficient Storage Scheme Based on BlockChain and IPFS for Agricultural Products Tracking JinTao Hao*, Yan Sun, Hong Luo Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China {haojintao, Sunyan, luoh}@bupt.edu.cn Received 9 January 2018; Revised 22 January 2018; Accepted 22 January 2018 Abstract. The adoption of agricultural products traceability management based on Internet of Things (IoT) technology provides excellent benefits for the current food safety issues. The provenance data can demonstrate agricultural products movement process from the countryside to the dining table. However, the massive provenance data incurs an inefficient query. Meanwhile, the provenance data can be tampered deliberately which affect food safety. There are seldom reported approaches that can solve the above problem effectively. In this paper, we propose a data storage model based on Inter-Planetary File System (IPFS) and blockchain. First, IPFS is used to store video, images, and real-time monitoring data reported from the sensors. Then, in order to avoid a malicious user in case of data faking attack, we exploit the blockchain to store the IPFS hash address of the provenance data. Based on that, we design an authentication mechanism based on blockchain. It can verify the data and ensures effective data security. The experimental results show that the proposed approach can outperforms the existing methods. Keywords: agricultural products tracking, blockchain, IPFS, storage scheme 1 Introduction For agricultural products movement process from the countryside to the dining table, it needs to go through the cultivation, processing, transportation and sales.
    [Show full text]
  • An Analysis of Internet Content Delivery Systems
    An Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy Department of Computer Science & Engineering University of Washington {tzoompy,gummadi,rdunn,gribble,levy}@cs.washington.edu Abstract versity of Washington, a large university with over 60,000 students, faculty, and staff. For this paper, we analyze a In the span of only a few years, the Internet has expe- nine day trace that saw over 500 million transactions and rienced an astronomical increase in the use of specialized over 20 terabytes of HTTP data. From this data, we pro- content delivery systems, such as content delivery networks vide a detailed characterization and comparison of content and peer-to-peer file sharing systems. Therefore, an under- delivery systems, and in particular, the latest peer-to-peer standing of content delivery on the Internet now requires workloads. Our results quantify: (1) the extent to which a detailed understanding of how these systems are used in peer-to-peer traffic has overwhelmed web traffic as a lead- practice. ing consumer of Internet bandwidth, (2) the dramatic differ- This paper examines content delivery from the point of ences in the characteristics of objects being transferred as a view of four content delivery systems: HTTP web traffic, the result, (3) the impact of the two-way nature of peer-to-peer Akamai content delivery network, and Kazaa and Gnutella communication, and (4) the ways in which peer-to-peer sys- peer-to-peer file sharing traffic. We collected a trace of all tems are not scaling, despite their explicitly scalable design.
    [Show full text]
  • Radical Tactics of the Offline Library
    Radical Tactics of the Offline Radical Tactics of the Offline Library Radical Library Tactics of theHENRY WARWICK Offline Library Radical Tactics of the Offline07 Library Radical Radical Tactics of the Offline Library 07 COLOPHON Network Notebooks editors: Geert Lovink and Miriam Rasch Design: Medamo, Rotterdam http://www.medamo.nl ePub development: André Castro Printer: Printvisie Publisher: Institute of Network Cultures, Amsterdam Supported by: Amsterdam University of Applied Sciences (Hogeschool van Amsterdam), Amsterdam Creative Industries Publishing, Stichting Democratie en Media If you want to order copies please contact: Institute of Network Cultures Hogeschool van Amsterdam Rhijnspoorplein 1 1091 GC Amsterdam The Netherlands http://www.networkcultures.org [email protected] t: +31 (0)20 59 51 865 ePub and PDF editions of this publication are freely downloadable from: http://www.networkcultures.org/publications This publication is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) To view a copy of this license, visit creativecommons.org/licenses/by-nc-sa/3.0/. Amsterdam, June 2014 ISBN 978-90-818575-9-8 (print) ISBN 978-90-822345-0-3 (ePub) NETWORK NOTEBOOK SERIES The Network Notebooks series presents new media research commissioned by the INC. PREVIOUSLY PUBLISHED NETWORK NOTEBOOKS: Network Notebooks 06 Andreas Treske, The Inner Life of Video Spheres: Theory for the YouTube Generation, 2013. ISBN: 978-90-818575-3-6. Network Notebooks 05 Eric Kluitenberg, Legacies of Tactical Media, 2011. ISBN: 978-90-816021-8-1. Network Notebooks 04 Rosa Menkman, The Glitch Momentum, 2011. ISBN: 978-90-816021-6-7. Network Notebooks 03 Dymtri Kleiner, The Telekommunist Manifesto, 2010.
    [Show full text]
  • Printmgr File
    Warner Music Group Annual Report 2007 NYSE: WMG www.wmg.com A Message from Chairman and CEO Edgar Bronfman, Jr. Dear Fellow Shareholders: Industry transitions are times of great opportunity, and there is no doubt that the recorded music industry is in the midst of a significant and fundamental transformation. Of course, opportunity does not come without its challenges and this year, Warner Music Group continued to meet and address the challenges of this transition head on, laying the groundwork for future growth. The transformation of the recorded music industry will require not only a clear vision and a progressive strategy to radically alter traditional business models, but also leadership, continued financial discipline and resilience. At Warner Music Group, we are creating the foundation for future growth and believe we have the right strategy in place not only to effectively navigate through these difficult times, but also to reap the rewards of the recorded music industry’s evolution. Looking back on our fiscal 2007, we made meaningful progress in many different areas. Sustaining Digital Leadership During fiscal 2007, digital revenue grew 30% to $460 million or 14% of total revenue and, in the U.S. we once again reported digital SoundScan album share substantially above our physical share. Building upon our digital leadership is essential as the recorded music business rapidly shifts from physical to digital. We entered into selective partnerships across business models that are aimed at driving expanded monetization of our content through new products in the digital space. These partnerships included the acquisition of a majority stake in Zebralution, a German digital distribution company, as well as deals with social networking site, imeem, and online retailer and music discovery site, Lala.com.
    [Show full text]
  • Malware Prevalence in the Kazaa File-Sharing Network
    Malware Prevalence in the KaZaA File-Sharing Network Seungwon Shin Jaeyeon Jung Hari Balakrishnan ETRI Mazu Networks MIT CSAIL [email protected] [email protected] [email protected] ABSTRACT With few defense mechanisms in place, peer-to-peer (P2P) file- In recent years, more than 200 viruses have been reported to use sharing networks have been known to be vulnerable to many secu- a peer-to-peer (P2P) file-sharing network as a propagation vector. rity attacks. Recent papers discuss the threat of malware spread in Disguised as files that are frequently exchanged over P2P networks, P2P networks [30, 26]. One experimental study reports that 44% these malicious programs infect the user’s host if downloaded and of the 4,778 executable files downloaded through a KaZaA client opened, leaving their copies in the user’s sharing folder for fur- application contain malicious code [28]. In this study, we perform ther propagation. Using a light-weight crawler built for the KaZaA a large scale measurement study on the prevalence of malicious file-sharing network, we study the prevalence of malware in this programs in the KaZaA file-sharing network. popular P2P network, the malware’s propagation behavior in the In recent years, KaZaA has been one of most popular P2P net- P2P network environment and the characteristics of infected hosts. works and the number of active users far outnumbers that of Over- We gathered information about more than 500,000 files returned net and Gnutella. Despite a number of lawsuits [6], KaZaA still by the KaZaA network in response to 24 common query strings.
    [Show full text]