The Robustness of Content-Based Search in Hierarchical Peer to Peer Networks ∗ M. Elena Renda Jamie Callan I.S.T.I. – C.N.R. Language Technologies Institute and School of Computer Science Scuola Superiore Sant’Anna Carnegie Mellon University I-56100 Pisa, Italy Pittsburgh, PA 15213, US
[email protected] [email protected] ABSTRACT Keywords Hierarchical peer to peer networks with multiple directory services Peer to Peer, Hierarchical, Search, Retrieval, Content-based, Ro- are an important architecture for large-scale file sharing due to their bustness effectiveness and efficiency. Recent research argues that they are also an effective method of providing large-scale content-based federated search of text-based digital libraries. In both cases the 1. INTRODUCTION directory services are critical resources that are subject to attack Peer to peer (P2P) networks have emerged in recent years as a or failure, but the latter architecture may be particularly vulnera- popular method of sharing information, primarily popular music, ble because content is less likely to be replicated throughout the movies, and software, among large numbers of networked com- network. puters. Computers (‘nodes’) in P2P networks can be information This paper studies the robustness, effectiveness and efficiency providers and information consumers, and may provide network of content-based federated search in hierarchical peer to peer net- services such as message routing and regional directory services. works when directory services fail unexpectedly. Several recovery A single node may play several roles at once, for example, posting methods are studied using simulations with varying failure rates. queries, sharing files, and relaying messages.