Sharing Files on Peer-To-Peer Networks Based on Social Network

Total Page:16

File Type:pdf, Size:1020Kb

Sharing Files on Peer-To-Peer Networks Based on Social Network Sharing Files on Peer-to-Peer Networks based on Social Network Fabrice Le Fessant INRIA Saclay-Île de France 23 Février 2009 1 / 14 Social networks are less vulnerable to attacks by spies ) peer-to-peer network + social network ) less vulnerable to censorship... maybe Fighting Censorship Standard peer-to-peer networks are vulnerable ! Spies can locate content providers States and Big Companies already have spies on P2P networks 2 / 14 Fighting Censorship Standard peer-to-peer networks are vulnerable ! Spies can locate content providers States and Big Companies already have spies on P2P networks Social networks are less vulnerable to attacks by spies ) peer-to-peer network + social network ) less vulnerable to censorship... maybe 3 / 14 Peer-to-Peer + Social Network Each user allows his computer to connect only to his friends' computers What can we do with such a network ? 4 / 14 My main requirement: • Content providers should remain unknown • while Links (friends) are not hidden • ISP must share such information with spies File-Sharing Three main operations: Resource discovery: nd which resources exist Resource localization: nd which resources are available and where Resource access: download the resources 5 / 14 File-Sharing Three main operations: Resource discovery: nd which resources exist Resource localization: nd which resources are available and where Resource access: download the resources My main requirement: • Content providers should remain unknown • while Links (friends) are not hidden • ISP must share such information with spies 6 / 14 Related Work Restrict unstructured networks to social based networks Gnutella Simple ooding (Turtle) FreeNet Local Routing with replication GnuNet Flooding with probabilistic shortcuts 7 / 14 The Turtle Network • Gnutella on a Social Network • Search by ooding the neighbours ! Tree Structure ! Reply from leaves to root • Too expensive in bandwidth ! But interesting simulations on Orkut 8 / 14 Popularity of Users in Orkut trace: 106,481 users in 2004 9 / 14 Connectivity Users linked to the main component 10 / 14 Size of largest Component 11 / 14 Distribution of Path Lengths 12 / 14 Orkut Crawl (2004) Users by Popularity 100 Orkut 2004 10 100 10 1 Number of neighbours 0.1 0 20000 40000 60000 80000 100000 120000 CDF of users 13 / 14 Orkut Crawl (2007) Users by Popularity 100000 Orkut 2007/10/01 10 10000 100 500 1000 100 10 Number of neighbours 1 0.1 0m 0.5m 1m 1.5m 2m 2.5m 3m 3.5m CDF of users 14 / 14 Orkut Crawl (2009) Users by Popularity 100000 Orkut 2009/02/04 10 10000 100 500 32000 1000 100 10 Number of neighbours 1 0.1 0m 0.5m 1m 1.5m 2m 2.5m 3m 3.5m 4m 4.5m 5m CDF of users 15 / 14 Orkut Crawl (2009) Crawler Performance 5m 100k 4.5m 90k 4m 80k 3.5m 70k 3m 60k 2.5m 50k 2m 40k 1.5m 30k Number of Queued Users 1m 20k Number of Crawled Users 0.5m Queued 10k Crawled 0m 0k 0 50 100 150 200 250 Hours 16 / 14 Reliability of Orkut Crawls Real topology of a social network ? • First crawl (2004): crawled users and queued users probably mixed • Second crawl (2007): only links between crawled users were kept • Third crawl (2009): uses advertized popularity for queued users 17 / 14 FreeNet Anti-censorship network for small documents • Can be limited to friends • Documents are encrypted • Routing: • Depth First Search • Closest Neighbour rst • Access: • Document Replicated on way back 18 / 14 GnuNet Same goal as FreeNet, but: • Routing: • Limited-Breadth First Search • Shortcut system (not with friends !) • Credit System (anti-ooding) • Optimizations are weaknesses 19 / 14 Can we do better ? • Designed for small les • FreeNet and GnuNet: • Expect stronger attackers: • Trac analysis • Moving connectivity • Turtle: • Unstructured 20 / 14 Queries or Publications • Current Systems focus on whole-system queries • My position: • Try to structure the network • Half query/half publication (like DHT...) • Whole-system publications • File-type optimizations/issues • New problems: • Real network must be a sub-network of social one • Use source-coding for ecient big downloads ) Requires replica localization 21 / 14.
Recommended publications
  • The Edonkey File-Sharing Network
    The eDonkey File-Sharing Network Oliver Heckmann, Axel Bock, Andreas Mauthe, Ralf Steinmetz Multimedia Kommunikation (KOM) Technische Universitat¨ Darmstadt Merckstr. 25, 64293 Darmstadt (heckmann, bock, mauthe, steinmetz)@kom.tu-darmstadt.de Abstract: The eDonkey 2000 file-sharing network is one of the most successful peer- to-peer file-sharing applications, especially in Germany. The network itself is a hybrid peer-to-peer network with client applications running on the end-system that are con- nected to a distributed network of dedicated servers. In this paper we describe the eDonkey protocol and measurement results on network/transport layer and application layer that were made with the client software and with an open-source eDonkey server we extended for these measurements. 1 Motivation and Introduction Most of the traffic in the network of access and backbone Internet service providers (ISPs) is generated by peer-to-peer (P2P) file-sharing applications [San03]. These applications are typically bandwidth greedy and generate more long-lived TCP flows than the WWW traffic that was dominating the Internet traffic before the P2P applications. To understand the influence of these applications and the characteristics of the traffic they produce and their impact on network design, capacity expansion, traffic engineering and shaping, it is important to empirically analyse the dominant file-sharing applications. The eDonkey file-sharing protocol is one of these file-sharing protocols. It is imple- mented by the original eDonkey2000 client [eDonkey] and additionally by some open- source clients like mldonkey [mlDonkey] and eMule [eMule]. According to [San03] it is with 52% of the generated file-sharing traffic the most successful P2P file-sharing net- work in Germany, even more successful than the FastTrack protocol used by the P2P client KaZaa [KaZaa] that comes to 44% of the traffic.
    [Show full text]
  • IPFS and Friends: a Qualitative Comparison of Next Generation Peer-To-Peer Data Networks Erik Daniel and Florian Tschorsch
    1 IPFS and Friends: A Qualitative Comparison of Next Generation Peer-to-Peer Data Networks Erik Daniel and Florian Tschorsch Abstract—Decentralized, distributed storage offers a way to types of files [1]. Napster and Gnutella marked the beginning reduce the impact of data silos as often fostered by centralized and were followed by many other P2P networks focusing on cloud storage. While the intentions of this trend are not new, the specialized application areas or novel network structures. For topic gained traction due to technological advancements, most notably blockchain networks. As a consequence, we observe that example, Freenet [2] realizes anonymous storage and retrieval. a new generation of peer-to-peer data networks emerges. In this Chord [3], CAN [4], and Pastry [5] provide protocols to survey paper, we therefore provide a technical overview of the maintain a structured overlay network topology. In particular, next generation data networks. We use select data networks to BitTorrent [6] received a lot of attention from both users and introduce general concepts and to emphasize new developments. the research community. BitTorrent introduced an incentive Specifically, we provide a deeper outline of the Interplanetary File System and a general overview of Swarm, the Hypercore Pro- mechanism to achieve Pareto efficiency, trying to improve tocol, SAFE, Storj, and Arweave. We identify common building network utilization achieving a higher level of robustness. We blocks and provide a qualitative comparison. From the overview, consider networks such as Napster, Gnutella, Freenet, BitTor- we derive future challenges and research goals concerning data rent, and many more as first generation P2P data networks, networks.
    [Show full text]
  • Privacy Enhancing Technologies 2003 an Analysis of Gnunet And
    Privacy Enhancing Technologies 2003 An Analysis of GNUnet and the Implications for Anonymous, Censorship-Resistant Networks Dennis Kügler Federal Office for Information Security, Germany [email protected] 1 Anonymous, Censorship-Resistant Networks • Anonymous Peer-to-Peer Networks – Gnutella • Searching is relatively anonymous • Downloading is not anonymous • Censorship-Resistant Networks – Eternity Service • Distributed storage medium • Attack resistant • Anonymous, Censorship-Resistant Networks – Freenet – GNUnet 2 GNUnet: Obfuscated, Distributed Filesystem Content Hash Key: [H(B),H(E (B))] • H(B) – Content encryption: H(B) – Unambiguous filename: H(E (B)) H(B) • Content replication – Caching while delivering – Based on unambiguous filename • Searchability – Keywords 3 GNUnet: Peer-to-Peer MIX Network • Initiating node – Downloads content • Supplying nodes – Store content unencrypted • Intermediary nodes – Forward and cache encrypted content – Plausible deniability due to encryption • Economic model – Based on credit Query A Priority=20 B – Charge for queries c =c -20 B B - – Pay for responses 4 GNUnet Encoding • DBlocks DBlock DBlock ... DBlock – 1KB of the content – Content hash encrypted • IBlocks IBlock ... IBlock – CHKs of 25 DBlocks – Organized as tree – Content hash encrypted IBlock • RBlock – Description of the content – CHK of the root IBlock RBlock – Keyword encrypted 5 The Attacker Model • Attacker – Controls malicious nodes that behave correctly – Prepares dictionary of interesting keywords – Observes queries and
    [Show full text]
  • Will Sci-Hub Kill the Open Access Citation Advantage and (At Least for Now) Save Toll Access Journals?
    Will Sci-Hub Kill the Open Access Citation Advantage and (at least for now) Save Toll Access Journals? David W. Lewis October 2016 © 2016 David W. Lewis. This work is licensed under a Creative Commons Attribution 4.0 International license. Introduction It is a generally accepted fact that open access journal articles enjoy a citation advantage.1 This citation advantage results from the fact that open access journal articles are available to everyone in the word with an Internet collection. Thus, anyone with an interest in the work can find it and use it easily with no out-of-pocket cost. This use leads to citations. Articles in toll access journals on the other hand, are locked behind paywalls and are only available to those associated with institutions who can afford the subscription costs, or who are willing and able to purchase individual articles for $30 or more. There has always been some slippage in the toll access journal system because of informal sharing of articles. Authors will usually send copies of their work to those who ask and sometime post them on their websites even when this is not allowable under publisher’s agreements. Stevan Harnad and his colleagues proposed making this type of author sharing a standard semi-automated feature for closed articles in institutional repositories.2 The hashtag #ICanHazPDF can be used to broadcast a request for an article that an individual does not have access to.3 Increasingly, toll access articles are required by funder mandates to be made publically available, though usually after an embargo period.
    [Show full text]
  • Scalable Supernode Selection in Peer-To-Peer Overlay Networks∗
    Scalable Supernode Selection in Peer-to-Peer Overlay Networks∗ Virginia Lo, Dayi Zhou, Yuhong Liu, Chris GauthierDickey, and Jun Li {lo|dayizhou|liuyh|chrisg|lijun} @cs.uoregon.edu Network Research Group – University of Oregon Abstract ically fulfill additional requirements such as load bal- ance, resources, access, and fault tolerance. We define a problem called the supernode selection The supernode selection problem is highly challeng- problem which has emerged across a variety of peer-to- ing because in the peer-to-peer environment, a large peer applications. Supernode selection involves selec- number of supernodes must be selected from a huge tion of a subset of the peers to serve a special role. The and dynamically changing network in which neither the supernodes must be well-dispersed throughout the peer- node characteristics nor the network topology are known to-peer overlay network, and must fulfill additional re- a priori. Thus, simple strategies such as random selec- quirements such as load balance, resource needs, adapt- tion don’t work. Supernode selection is more complex ability to churn, and heterogeneity. While similar to than classic dominating set and p-centers from graph dominating set and p-centers problems, the supernode theory, known to be NP-hard problems, because it must selection problem must meet the additional challenge respond to dynamic joins and leaves (churn), operate of operating within a huge, unknown and dynamically among potentially malicious nodes, and function in an changing network. We describe three generic super- environment that is highly heterogeneous. node selection protocols we have developed for peer-to- Supernode selection shows up in many peer-to-peer peer environments: a label-based scheme for structured and networking applications.
    [Show full text]
  • CS505: Distributed Systems
    Cristina Nita-Rotaru CS505: Distributed Systems Lookup services. Chord. CAN. Pastry. Kademlia. Required Reading } I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, SIGCOMM 2001. } A Scalable Content-Addressable Network S.a Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, SIGCOMM 2001 } A. Rowstron and P. Druschel. "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems". IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), 2001 } Kademlia: A Peer-to-peer Information System Based on the XOR Metric. P. Maymounkov and D. Mazieres, IPTPS '02 2 DHTs 1: Lookup services Peer-to-Peer (P2P) Systems } Applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. } Characteristics: } System consists of clients connected through Internet and acting as peers } System is designed to work in the presence of variable connectivity } Nodes at the edges of the network have significant autonomy; no centralized control } Nodes are symmetric in function 4 DHTs Benefits of P2P and Applications } High capacity: all clients provide resources (bandwidth, storage space, and computing power). The capacity of the system increases as more nodes become part of the system. } Increased reliability: achieved by replicating data over multiple peers, and by enabling peers to find the data without relying on a centralized index server. } Applications:
    [Show full text]
  • Piracy of Scientific Papers in Latin America: an Analysis of Sci-Hub Usage Data
    Developing Latin America Piracy of scientific papers in Latin America: An analysis of Sci-Hub usage data Juan D. Machin-Mastromatteo Alejandro Uribe-Tirado Maria E. Romero-Ortiz This article was originally published as: Machin-Mastromatteo, J.D., Uribe-Tirado, A., and Romero-Ortiz, M. E. (2016). Piracy of scientific papers in Latin America: An analysis of Sci-Hub usage data. Information Development, 32(5), 1806–1814. http://dx.doi.org/10.1177/0266666916671080 Abstract Sci-Hub hosts pirated copies of 51 million scientific papers from commercial publishers. This article presents the site’s characteristics, it criticizes that it might be perceived as a de-facto component of the Open Access movement, it replicates an analysis published in Science using its available usage data, but limiting it to Latin America, and presents implications caused by this site for information professionals, universities and libraries. Keywords: Sci-Hub, piracy, open access, scientific articles, academic databases, serials crisis Scientific articles are vital for students, professors and researchers in universities, research centers and other knowledge institutions worldwide. When academic publishing started, academies, institutions and professional associations gathered articles, assessed their quality, collected them in journals, printed and distributed its copies; with the added difficulty of not having digital technologies. Producing journals became unsustainable for some professional societies, so commercial scientific publishers started appearing and assumed printing, sales and distribution on their behalf, while academics retained the intellectual tasks. Elsevier, among the first publishers, emerged to cover operations costs and profit from sales, now it is part of an industry that grew from the process of scientific communication; a 10 billion US dollar business (Murphy, 2016).
    [Show full text]
  • Practical Anonymous Networking?
    gap – practical anonymous networking? Krista Bennett Christian Grothoff S3 lab and CERIAS, Department of Computer Sciences, Purdue University [email protected], [email protected] http://www.gnu.org/software/GNUnet/ Abstract. This paper describes how anonymity is achieved in gnunet, a framework for anonymous distributed and secure networking. The main focus of this work is gap, a simple protocol for anonymous transfer of data which can achieve better anonymity guarantees than many traditional indirection schemes and is additionally more efficient. gap is based on a new perspective on how to achieve anonymity. Based on this new perspective it is possible to relax the requirements stated in traditional indirection schemes, allowing individual nodes to balance anonymity with efficiency according to their specific needs. 1 Introduction In this paper, we present the anonymity aspect of gnunet, a framework for secure peer-to-peer networking. The gnunet framework provides peer discovery, link encryption and message-batching. At present, gnunet’s primary application is anonymous file-sharing. The anonymous file-sharing application uses a content encoding scheme that breaks files into 1k blocks as described in [1]. The 1k blocks are transmitted using gnunet’s anonymity protocol, gap. This paper describes gap and how it attempts to achieve privacy and scalability in an environment with malicious peers and actively participating adversaries. The gnunet core API offers node discovery, authentication and encryption services. All communication between nodes in the network is confidential; no host outside the network can observe the actual contents of the data that flows through the network. Even the type of the data cannot be observed, as all packets are padded to have identical size.
    [Show full text]
  • Zeronet Presentation
    ZeroNet Decentralized web platform using Bitcoin cryptography and BitTorrent network. ABOUT ZERONET Why? Current features We believe in open, free, and ◦ Real-time updated sites uncensored network and communication. ◦ Namecoin .bit domain support ◦ No hosting costs ◦ Multi-user sites Sites are served by visitors. ◦ Password less, Bitcoin's BIP32- ◦ Impossible to shut down based authorization It's nowhere because it's ◦ Built-in SQL server with P2P data everywhere. synchronization ◦ No single point of failure ◦ Tor network support Site remains online so long as at least 1 peer serving it. ◦ Works in any browser/OS ◦ Fast and works offline You can access the site even if your internet is unavailable. HOW DOES IT WORK? THE BASICS OF ASYMMETRIC CRYPTOGRAPHY When you create a new site you get two keys: Private key Public key 5JNiiGspzqt8sC8FM54FMr53U9XvLVh8Waz6YYDK69gG6hso9xu 16YsjZK9nweXyy3vNQQPKT8tfjCNjEX9JM ◦ Only you have it ◦ This is your site address ◦ Allows you to sign new content for ◦ Using this anyone can verify if the your site. file is created by the site owner. ◦ No central registry ◦ Every downloaded file is verified, It never leaves your computer. makes it safe from malicious code inserts or any modifications. ◦ Impossible to modify your site without it. MORE INFO ABOUT CRYPTOGRAPHY OF ZERONET ◦ ZeroNet uses the same elliptic curve based encryption as in your Bitcoin wallet. ◦ You can accept payments directly to your site address. ◦ Using the current fastest supercomputer, it would take around 1 billion years to "hack" a private key. WHAT HAPPENS WHEN YOU VISIT A ZERONET SITE? WHAT HAPPENS WHEN YOU VISIT A ZERONET SITE? (1/2) 1 Gathering visitors IP addresses: Please send some IP addresses for site 1EU1tbG9oC1A8jz2ouVwGZyQ5asrNsE4Vr OK, Here are some: 12.34.56.78:13433, 42.42.42.42:13411, ..
    [Show full text]
  • CS 552 Peer 2 Peer Networking
    CS 552 Peer 2 Peer Networking R. Martin Credit slides from B. Richardson, I. Stoica, M. Cuenca Peer to Peer • Outline • Overview • Systems: – Gnutella – Freenet – Chord – PlanetP Why Study P2P • Huge fraction of traffic on networks today – >=50%! • Exciting new applications • Next level of resource sharing – Vs. timesharing, client-server, P2P – E.g. Access 10’s-100’s of TB at low cost. P2P usage • CMU network (external to world), 2003 • 47% of all traffic was easily classifiable as P2P • 18% of traffic was HTTP • Other traffic: 35% – Believe ~28% is port- hopping P2P • Other sites have a similar distribution Big Picture • Gnutella – Focus is simple sharing – Using simple flooding • Bit torrent – Designed for high bandwidth • PlanetP – Focus on search and retrieval – Creates global index on each node via controlled, randomized flooding • Cord – Focus on building a distributed hash table (DHT) – Finger tables Other P2P systems • Freenet: – Focus privacy and anonymity – Builds internal routing tables • KaaZa • eDonkey • Napster – Success started the whole craze Key issues for P2P systems • Join/leave – How do nodes join/leave? Who is allowed? • Search and retrieval – How to find content? – How are metadata indexes built, stored, distributed? • Content Distribution – Where is content stored? How is it downloaded and retrieved? Search and Retrieval • Basic strategies: – Flooding the query – Flooding the index – Routing the query • Different tradeoffs depending on application – Robustness, scalability, legal issues Flooding the Query (Gnutella) N3 Lookup(“title”) N1 N2 N4 N5 Key=title N8 N6 Value=mp3 N7 Pros: highly robust. Cons: Huge network traffic Flooding the Index (PlanetP) Key1=title1 N3 N1 Key2=title2 N2 N4 N5 Lookup(“title4”) Key1=title3 N8 N6 Key2=title4 N7 Pros: Robust.
    [Show full text]
  • Free Riding on Gnutella
    Free Riding on Gnutella Eytan Adar and Bernardo A. Huberman Internet Ecologies Area Xerox Palo Alto Research Center Palo Alto, CA 94304 Abstract An extensive analysis of user traffic on Gnutella shows a significant amount of free riding in the system. By sampling messages on the Gnutella network over a 24-hour period, we established that nearly 70% of Gnutella users share no files, and nearly 50% of all responses are returned by the top 1% of sharing hosts. Furthermore, we found out that free riding is distributed evenly between domains, so that no one group contributes significantly more than others, and that peers that volunteer to share files are not necessarily those who have desirable ones. We argue that free riding leads to degradation of the system performance and adds vulnerability to the system. If this trend continues copyright issues might become moot compared to the possible collapse of such systems. 1 1. Introduction The sudden appearance of new forms of network applications such as Gnutella [Gn00a] and FreeNet [Fr00], holds promise for the emergence of fully distributed information sharing systems. These systems, inspired by Napster [Na00], will allow users worldwide access and provision of information while enjoying a level of privacy not possible in the present client-server architecture of the web. While a lot of attention has been focused on the issue of free access to music and the violation of copyright laws through these systems, there remains an additional problem of securing enough cooperation in such large and anonymous systems so they become truly useful. Since users are not monitored as to who makes their files available to the rest of the network (produce) or downloads remote files (consume), nor are statistics maintained, the possibility exist that as the user community in such networks gets large, users will stop producing and only consume.
    [Show full text]
  • Title: P2P Networks for Content Sharing
    Title: P2P Networks for Content Sharing Authors: Choon Hoong Ding, Sarana Nutanong, and Rajkumar Buyya Grid Computing and Distributed Systems Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Australia (chd, sarana, raj)@cs.mu.oz.au ABSTRACT Peer-to-peer (P2P) technologies have been widely used for content sharing, popularly called “file-swapping” networks. This chapter gives a broad overview of content sharing P2P technologies. It starts with the fundamental concept of P2P computing followed by the analysis of network topologies used in peer-to-peer systems. Next, three milestone peer-to-peer technologies: Napster, Gnutella, and Fasttrack are explored in details, and they are finally concluded with the comparison table in the last section. 1. INTRODUCTION Peer-to-peer (P2P) content sharing has been an astonishingly successful P2P application on the Internet. P2P has gained tremendous public attention from Napster, the system supporting music sharing on the Web. It is a new emerging, interesting research technology and a promising product base. Intel P2P working group gave the definition of P2P as "The sharing of computer resources and services by direct exchange between systems". This thus gives P2P systems two main key characteristics: • Scalability: there is no algorithmic, or technical limitation of the size of the system, e.g. the complexity of the system should be somewhat constant regardless of number of nodes in the system. • Reliability: The malfunction on any given node will not effect the whole system (or maybe even any other nodes). File sharing network like Gnutella is a good example of scalability and reliability.
    [Show full text]