CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. (2015)
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.3658

SPECIAL ISSUE PAPER

Performance evaluation of a distributed storage service in community network clouds

Mennan Selimi1,*,†, Felix Freitag1, Llorenç Cerdà-Alabern1 and Luís Veiga2

1Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Spain
2Instituto Superior Técnico (IST), INESC-ID Lisboa, Lisbon, Portugal

SUMMARY

Community networks are self-organized and decentralized communication networks built and operated by citizens, for citizens. The consolidation of today’s cloud technologies now offers community networks the possibility to collectively develop community clouds, building upon user-provided networks and extending toward cloud services. Cloud storage, and in particular secure and reliable cloud storage, could become a key community cloud service to enable end-user applications. In this paper, we evaluate in a real deployment the performance of the Tahoe least-authority file system (Tahoe-LAFS), a decentralized storage system with provider-independent security that guarantees privacy to the users. We evaluate how the Tahoe-LAFS storage system performs when it is deployed over distributed community cloud nodes in a real community network such as Guifi.net. Furthermore, we evaluate Tahoe-LAFS in the Microsoft Azure commercial cloud platform, to compare and understand the impact of homogeneous network and hardware resources on the performance of Tahoe-LAFS. We observed that the write operation of Tahoe-LAFS resulted in similar performance when using either the community network cloud or the commercial cloud. However, the read operation achieved better performance in the Azure cloud, where the reading from multiple nodes of Tahoe-LAFS benefited from the homogeneity of the network and nodes. Our results suggest that Tahoe-LAFS can run on community network clouds with suitable performance for the needed end-user experience. Copyright © 2015 John Wiley & Sons, Ltd.

Received 28 February 2015; Revised 22 June 2015; Accepted 8 August 2015

KEY WORDS: community networks; community cloud; cloud storage

1. INTRODUCTION

Wireless community networks [1] encompass an emergent model of infrastructure that aims to satisfy a community’s demand for Internet access and information and communications technology services. Most community networks originated in rural areas, which commercial telecommunication operators left behind when deploying the broadband access infrastructure for urban areas. Different stakeholders of such a geographic area teamed up to invest, create, and run a community network as an open telecommunication infrastructure based on self-service and self-management by the users. Since the first community networks started more than 10 years ago, they have become rather successful. Several large community networks in Europe have from 500 to 30,000 nodes; these include Guifi.net [2], FunkFeuer [3], Athens Wireless Metropolitan Network (AWMN) [4], and many more that exist worldwide. Most of them are based on Wi-Fi technology (ad-hoc networks, IEEE 802.11a/b/g/n access points in the first hop, long-distance point-to-point Wi-Fi links for the trunk network), and a growing number of optic fiber links have also started to be deployed.

*Correspondence to: Mennan Selimi, Department of Computer Architecture, UPC-BarcelonaTech, Jordi Girona, 1-3, 08034, Barcelona, Spain. †E-mail: [email protected]


Figure 1. Guifi.net and Athens Wireless Metropolitan Network nodes and links in the area around Barcelona and Athens.

Figure 1 shows as an example the wireless links and nodes of the Guifi.net and AWMN community networks in the area of Barcelona and Athens.

Community networks are a successful case of resource sharing among a collective, where the resources shared are not only the networking hardware but also the time, effort, and knowledge contributed by its members that are required for maintaining the network. Resource sharing in community networks from the equipment perspective refers, in practice, to the sharing of the nodes’ network bandwidth. This sharing enables the traffic from other nodes to be routed over the nodes of different node owners. This is carried out in a reciprocal manner, which allows community networks to successfully operate as IP networks. The sharing of other computing resources such as storage, which is now common practice in today’s Internet through cloud computing, hardly exists in community networks, but it can be made possible through clouds in community networks, that is, community network clouds [5].

The concept of community network clouds has been introduced in its generic form before [e.g., 6, 7], as a cloud deployment model in which a cloud infrastructure is built and provisioned for exclusive use by a specific community of consumers with shared concerns and interests. Besides the public, private, and hybrid models of cloud computing, a community cloud presents another option, differing from the others in that it is designed with a specific community in mind and with costs and responsibility shared among community members. Security concerns, government legislation, the need for sophisticated control, or enhanced performance requirements make it difficult in many scenarios to rely on public clouds, and private clouds are not always the most cost-effective option. We refer here to a specific kind of community cloud in which the computing resources shared come from within community networks [8, 9], using the application models of cloud computing in general. Establishing a community cloud not only involves many challenges, in both the technological and the socio-economic context, but also promises an interesting value proposition for communities in terms of local services and applications.

To conduct our evaluation of applications in a realistic scenario, our approach is to leverage the cloud infrastructure provided by an ongoing cloud deployment in the Guifi.net community network [10] and a testbed deployed in community networks [1, 11]. The approach for performance assessment of applications in community networks, which we propose in this paper, is to set the experimental conditions as seen from the end user: experiment in production community networks, focus on metrics that are of interest to end users, and deploy applications on real nodes integrated in community networks.

We specifically look at a distributed secure storage application, the Tahoe least-authority file system (Tahoe-LAFS) [12], to be used within these wireless community-owned networks. After basic connectivity, storage is the most general and fundamental resource to support a distributed system, for example, to store users’ files, as in services like Dropbox, and is thus fundamental for cloud take-up in community network scenarios. Tahoe-LAFS is an open-source distributed cloud storage system. Features of Tahoe-LAFS that are relevant for community networks are that it encrypts data at the client side, erasure codes it (data are broken into fragments, expanded, and encoded with redundant data pieces), and then disperses it among a set of storage nodes. This approach results in high availability; for example, even if some of the storage nodes are down or taken over by an attacker, the entire file system continues to function correctly while preserving privacy and security. This is key for users’ adoption. We host the Tahoe-LAFS distributed application on nodes of a community cloud spread inside the production community network. Furthermore, we compare the performance of Tahoe-LAFS when it is deployed in community clouds and in a commercial cloud such as Microsoft Azure.

The contributions of this paper are the following: (1) verify the correct operation and adequate performance of the Tahoe-LAFS distributed file system in a real cloud system in a wireless community network; (2) characterize the write and read performance of Tahoe-LAFS with different workloads generated by the IOzone storage benchmark; and (3) present and show the feasibility of a case for storage sharing with a performance evaluation conducted in a real setting of a deployed cloud system over a community network. We expect that these results will open the door to further application deployments and take-up of community clouds. The operational distributed file system, which we investigated in this paper, should encourage application developers to create innovative cloud-based community network services and users to contribute resources that enable the hosting and usage of these services.

The rest of the paper is organized as follows. Section 2 defines community clouds and discusses the requirements for such clouds. Section 3 explains the characteristics of the Tahoe-LAFS distributed file system that we use. Section 4 explains the experiment environment for the community cloud and the Azure cloud. Section 5 shows and discusses the results from our cloud deployment. Section 6 discusses related work, and Section 7 concludes and discusses future research directions.

2. CLOUDS IN COMMUNITY NETWORKS

2.1. Community cloud scenarios

Our proposition is to deploy a distributed storage service in community networks using cloud-based resources. In this section, we analyze the different components of this approach, which when brought together could enable the foreseen scenarios. We justify why the successful performance of such a storage service, as reported in this paper, is key to motivate further steps. We therefore first review the conditions of community networks that must be considered when proposing scenarios and applications for community clouds. We then describe the scenarios we foresee and the objective of bringing applications into community networks.

A community network typically has two kinds of nodes, super nodes and client nodes, and each type plays a different role in the network. For example, Guifi.net [2] includes these two main types of nodes, which are, according to Vega et al. [13]: terminal nodes corresponding to end-user nodes (client nodes) and hubs that serve traffic to end users (super nodes). Each terminal node has a unique connection to a hub that routes traffic, and hubs can have many connected terminal nodes.

We consider Figure 2 to derive the scenarios for clouds in community networks. The picture shows typical community nodes with a router and servers or clients attached to it. Some client nodes are shown that are connected to the access point of a super node. In addition, however, these community nodes have hosts, cloud resources, and even small local cloud data centers attached to them, illustrating the vision of the community cloud. We therefore define our community cloud as a cloud that is formed by small data centers spread in community networks (a peer-to-peer network of nodes). Following a real ongoing deployment that we undertake in the Guifi community network [10], we identify the following community cloud scenarios for these resources.

Local community clouds: In the local community cloud, a super node is responsible for the management of a set of attached nodes that as hosts contribute cloud resources. From the perspective of the attached nodes, this super node acts as a centralized unit to manage the cloud services. The super node connects physically with other super nodes through wireless links and logically in an overlay network to other local cloud managing super nodes.


Figure 2. A community network with local clouds.

Figure 3. Cloud-based services in the distributed community cloud.

A local community cloud is not necessarily built with high-end machines; we rather see small data centers built with commodity desktop machines and managed by popular cloud operating systems (OS) such as Proxmox [14], OpenStack [15], or OpenNebula [16].

Federated community clouds: Given a set of local community clouds, multiple super nodes in a community network will connect and form federated clouds [17, 18]. Cloud federation enables the allocation of resources for a request on several local clouds. Distributed applications for community networks will need to be deployed over federated clouds [19, 20], similar to content-distribution networks, to attend local requests from remote community network nodes.

Micro-clouds: The concept of micro-clouds is introduced to split the deployed community cloud nodes into different groups [21]. A micro-cloud refers to the cloud nodes that are within the same service announcement and discovery domain. Different criteria can be applied to determine to which micro-cloud a node belongs. Applying technical criteria (e.g., round-trip time (RTT), bandwidth, number of hops, and resource characteristics) for micro-cloud assignment is one possibility to optimize the performance of several applications, as sketched below. In addition, social criteria may be used; for example, bringing into a micro-cloud resources from users who are socially close may improve acceptance and the willingness to share resources and to maintain the infrastructure.
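To make the technical assignment criteria concrete, the following minimal Python sketch groups nodes into micro-clouds by an RTT threshold. The node names, RTT values, and the 15-ms limit are hypothetical placeholders rather than values from the deployment, and a real assignment would likely combine several criteria rather than RTT alone.

```python
# Minimal sketch of RTT-based micro-cloud assignment (hypothetical data;
# a deployed system may also weigh bandwidth, hop count, and social criteria).

# Measured RTT in ms from each node to a set of candidate micro-cloud heads.
rtt_to_heads = {
    "node-sants-01": {"head-sants": 4.1, "head-upc": 21.7},
    "node-sants-02": {"head-sants": 6.3, "head-upc": 25.0},
    "node-upc-01":   {"head-sants": 19.8, "head-upc": 3.2},
}

RTT_LIMIT_MS = 15.0  # assumed upper bound for "same discovery domain" membership

def assign_micro_clouds(rtt):
    """Assign each node to the head with the lowest RTT below the limit."""
    clouds = {}
    for node, heads in rtt.items():
        head, best = min(heads.items(), key=lambda kv: kv[1])
        if best <= RTT_LIMIT_MS:
            clouds.setdefault(head, []).append(node)
        else:
            clouds.setdefault("unassigned", []).append(node)
    return clouds

print(assign_micro_clouds(rtt_to_heads))
# {'head-sants': ['node-sants-01', 'node-sants-02'], 'head-upc': ['node-upc-01']}
```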


2.2. Cloud-based services in community networks

To deploy applications and services in the community network, our approach is to provide a community network distribution, that is, an OS image that is prepared to be placed either as the OS of the community cloud node or in virtual machines of it. We have already developed such a distribution, which we call Cloudy. Cloudy [22] is a Debian-based distribution, which has been equipped with a set of basic platform services and applications.

Figure 3 indicates some of the already integrated and foreseen applications of the Cloudy community cloud distribution. The Tahoe-LAFS distributed storage service, which we consider in this paper, and other services such as TincVPN [23], Serf [24], and Avahi [25] have already been added to the Cloudy distribution. The Serf and Avahi services are particularly important for the cloud system, because they allow the search for services, and with additional software even outside of the LAN. This way, services can be found from any node of the cloud.

Figure 3 also shows that the community services provided by the Cloudy distribution aim to run on different cloud resources provided by cloud management systems such as OpenStack or OpenNebula and on other low-resource devices such as those provided by the CONFINE Community-Lab testbed [1].

3. TAHOE-LAFS DISTRIBUTED FILE SYSTEM

Tahoe-LAFS [12] is a decentralized storage system with provider-independent security. This feature means that the user is the only one who can view or modify the data. The data and metadata in the cluster are distributed among servers using erasure coding and cryptography. The erasure coding parameters determine how many servers are used to store each file, denoted as N, and how many of them are necessary for the file to be available, denoted as K. The default parameters used in Tahoe-LAFS are K = 3 and N = 10 (3 of 10).

The Tahoe-LAFS cluster consists of a set of storage nodes, client nodes, and a single coordinator node called the introducer. The storage nodes connect to the introducer and announce their presence, and the client nodes connect to the introducer to obtain the list of all connected storage nodes. The introducer does not transfer data between clients and storage nodes; the transfer is carried out directly between them. The introducer is the first point of contact for new clients or new storage peers, because they need it to join the storage grid.

When the client uploads a file to the storage cluster, a unique public/private key pair is generated for that file, and the file is encrypted (on the client side prior to upload), erasure-coded, and distributed across storage nodes (with enough storage space) [26]. The location of the erasure-coded shares is decided by a server selection algorithm that hashes the private key of the file to generate a distinct server permutation. Then, servers without enough storage space are removed from the permutation, and the rest of the servers are contacted in sequence and asked to hold one share of the file. To download a file, the client asks all known storage nodes to list the number of shares of that file they hold, and in the subsequent round (second round trip), the client chooses which shares to request based on various heuristics such as latency and node load.

Figure 4 shows how an immutable file (created once, read repeatedly) is created. To create an immutable file, a client chooses a symmetric encryption key, uses that key to encrypt the file, and chooses erasure coding parameters K and N to erasure code the ciphertext into N shares, writing each share to a different server.
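To illustrate the K-of-N scheme and the hashed server permutation, the following Python sketch computes the storage expansion implied by the default 3-of-10 parameters and derives a per-file server ordering. It mirrors the idea of the permuted server list described above but is a simplified illustration, not the production Tahoe-LAFS algorithm; the server names and file key are hypothetical.

```python
import hashlib

K, N = 3, 10  # default Tahoe-LAFS erasure coding parameters (3-of-10)

def expansion(file_size_bytes, k=K, n=N):
    """Each of the n shares holds ~1/k of the file, so bytes stored = n/k * size."""
    share_size = file_size_bytes / k
    return n * share_size  # total bytes stored across the grid

print(expansion(100 * 2**20) / 2**20)  # a 100-MB file occupies ~333 MB of raw storage

def permuted_servers(file_key: bytes, servers, n=N):
    """Sketch of per-file server selection: hashing the file key together with
    each server id and sorting by digest yields a distinct permutation per
    file; the first n willing servers each hold one share."""
    ranked = sorted(servers,
                    key=lambda s: hashlib.sha256(file_key + s.encode()).digest())
    return ranked[:n]

servers = [f"storage-{i:02d}" for i in range(12)]  # hypothetical server ids
print(permuted_servers(b"file-private-key", servers))
```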

Bringing Tahoe-LAFS into Community-Lab. For our experiments, we use some nodes from the Community-Lab [27] testbed. The nodes run a custom OS (based on OpenWrt [28]) provided by the CONFINE project, which allows several slivers, implemented as containers [29], to run simultaneously on one node. A sliver is defined as the partition of the resources of a node assigned to a specific slice (group of slivers); we can think of slivers as virtual machines inside a node. To deploy Tahoe-LAFS on the Community-Lab nodes, we use the Cloudy distribution, which contains Tahoe-LAFS, and place the introducer, storage, and client nodes of Tahoe-LAFS inside the slivers of the testbed. Figure 5 shows the resulting Tahoe-LAFS architecture used by our experiments in the Community-Lab testbed.
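For reference, standing up such a grid with the stock Tahoe-LAFS command-line tools looks roughly as follows. This is a sketch assuming a Tahoe-LAFS 1.x CLI, with placeholder paths and introducer FURL; exact flag spellings may vary across versions, and in our deployment Cloudy performs the equivalent configuration.

```python
# Sketch: creating a minimal Tahoe-LAFS grid with the stock CLI, driven from
# Python for illustration only. Paths and the introducer FURL are placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Create and start the introducer, the first point of contact for the grid.
run(["tahoe", "create-introducer", "/srv/tahoe/introducer"])
run(["tahoe", "start", "/srv/tahoe/introducer"])

# Once started, the introducer writes its contact FURL into its node directory.
INTRODUCER_FURL = "pb://<tubid>@<host>:<port>/introducer"  # placeholder

# 2. Create and start a storage node that announces itself to the introducer.
run(["tahoe", "create-node", "--introducer", INTRODUCER_FURL, "/srv/tahoe/node1"])
run(["tahoe", "start", "/srv/tahoe/node1"])

# The erasure coding parameters live in each client's tahoe.cfg
# ([client] section): shares.needed = 3 (K) and shares.total = 10 (N).
```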


Figure 4. Immutable file creation in Tahoe-LAFS.

Figure 5. Tahoe-LAFS deployed in the Community-Lab testbed.

4. EXPERIMENTAL ASSESSMENT

Our objective is to evaluate the Tahoe-LAFS storage service in community network clouds and compare its performance with the performance obtained in a commercial cloud environment. The community network cloud is deployed in Guifi.net, and the commercial cloud used is the Microsoft Azure cloud.

4.1. Wireless community network environment

We characterize wireless community networks by using experimental measurements in a production wireless mesh network. The network we consider began deployment in 2009 in a quarter of the city of Barcelona, Spain, called Sants, as part of the Quick Mesh Project (QMP) [30]. In 2012, nodes from Universitat Politècnica de Catalunya (UPC) joined the network, supported by the EU CONFINE project [31]. We shall refer to this network as QMPSU (from QMP at Sants–UPC). QMPSU is part of Guifi.net, a larger community network started in 2004 that has more than 30,000 operational nodes. At the time of writing, QMPSU has around 54 nodes, 16 at UPC and 34 at Sants. There are two gateways, one at the UPC Campus and another in Sants, that connect QMPSU to the rest of Guifi.net (Figure 6). A detailed description of QMPSU can be found in [32], and a live monitoring page updated hourly is available on the Internet [33].

Typically, QMPSU users have an outdoor router (OR) with a Wi-Fi interface on the roof, connected through the ethernet to an indoor access point as a premises network.


Figure 6. Quick Mesh Project at Sants–Universitat Politècnica de Catalunya (UPC) topology. Gateways are underlined.

Figure 7. Outdoor routers.

The most common OR in QMPSU is the NanoStation M5, which integrates a sectorial antenna with a router furnished with a wireless 802.11an interface. Some strategic locations have several NanoStations, which provide larger coverage (Figure 7). In addition, some links of several kilometers are set up with parabolic antennas (NanoBridges). ORs in QMPSU are flashed with the Linux distribution developed inside the QMP project. This distribution is a branch of OpenWRT [28] and uses BMX6 as the mesh routing protocol [34].

Measurements have been obtained by connecting via SSH to each QMPSU OR and running basic system commands. Connections have been made hourly during the entire month of January 2015. Performance measurements of the throughput, RTT to the Internet and between ORs, and number of hops between ORs have been taken. To limit the impact of the experiments on the users, measurements have not been carried out between all possible pairs of ORs but only between each OR and its gateway. However, the measurements to the gateways can be used as an estimation of measurements between an arbitrary OR pair. The numbers of hops to the gateways have been derived from the routing tables, the throughput using netperf, and the RTT using pings.
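A simplified version of this hourly measurement loop is sketched below; the OR and gateway addresses are placeholders, and the parsing of the command output is omitted.

```python
# Sketch of the hourly measurement loop (hypothetical OR addresses; the real
# campaign derived hop counts from routing tables and ran netperf and ping
# from each OR against its gateway).
import subprocess

OUTDOOR_ROUTERS = ["10.1.0.1", "10.1.0.2"]   # placeholder OR addresses
GATEWAY = "10.1.255.1"                        # placeholder gateway address

def ssh(host, command):
    """Run a command on an OR over SSH and return its stdout."""
    out = subprocess.run(["ssh", f"root@{host}", command],
                         capture_output=True, text=True, timeout=120)
    return out.stdout

for router in OUTDOOR_ROUTERS:
    # RTT to the gateway: a handful of pings, averaged offline.
    rtt_raw = ssh(router, f"ping -c 5 {GATEWAY}")
    # Throughput to the gateway: netperf against a netserver on the gateway.
    tput_raw = ssh(router, f"netperf -H {GATEWAY} -l 10")
    # Hop count: taken from the routing table on the OR.
    route_raw = ssh(router, "ip route show")
    # ... parse and store rtt_raw / tput_raw / route_raw ...
```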


To deploy Tahoe-LAFS in a realistic, community-cloud-like setting, we use the Community-Lab testbed, a distributed infrastructure provided by the CONFINE project, where researchers can deploy experimental services, perform experiments, or access open-data traces. Some of the nodes of this testbed are connected directly to the ORs of the QMP network. We use a total of 10 nodes, geographically distributed and connected to the 54 OR nodes. These nodes run a custom OS (based on OpenWRT), which allows several slivers, implemented as Linux containers, to run simultaneously on one node. The slivers use the Cloudy OS image, which contains some of the applications, for example, Tahoe-LAFS, by default. In terms of hardware, the nodes are Jetway devices equipped with an Intel Atom N2600 CPU, 4 GB of RAM, and a 120-GB solid-state drive (SSD).

4.1.1. Characterizing the network performance of Guifi.net

Figure 8 shows the average throughput and RTT to the gateway and the Internet and the number of hops to the gateway obtained for every OR. The values are sorted by the throughput to the gateway. Standard deviation error bars are also shown. Internet values are measured using a server located outside of Guifi.net. Figure 8 reveals that the throughput to the Internet and the gateway are not linearly correlated. This is because the gateway located at UPC has a much better connection to the Internet. Thus, even if the throughput to the gateway is high, those nodes using the gateway in Sants have a low throughput to the Internet. Figure 8 also demonstrates that the RTT has a stronger correlation with the number of hops than the throughput does. Error bars in Figure 8 show that some nodes have an average number of hops with noticeable deviations. This variability has two causes: changes in the routes and selection of a different gateway.

Figure 9 shows the empirical cumulative distribution function of the average throughput to the gateway and the Internet (for the values depicted in Figure 8). At the top of the figure, the minimum/mean/maximum values are given. As shown in Figure 9, the overall mean throughput to the gateway is 17.4 Mbps with a lower quartile of 4.7 Mbps, which reduces to a mean and lower quartile of 6.3 and 2.2 Mbps to the Internet. This reveals that the bottleneck is inside the portion of Guifi.net connecting the QMPSU gateways with the Internet. Similarly, Figures 10 and 11 present the empirical cumulative distribution function of the average RTT to the gateway and the Internet and the number of hops to the gateway. Figure 10 indicates an overall average RTT to the gateway of 9.2 ms, with an upper quartile of 14.3 ms, which increase respectively to 56.3 and 65 ms to the Internet. Finally, Figure 11 shows an overall average number of hops of 2.53, with an upper quartile of 3.4 hops.
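For reproducibility, the ECDF and quartile statistics of Figures 9-11 can be computed from the per-OR averages with a few lines of Python; the sample values below are illustrative, not the measured data set.

```python
# Computing an ECDF and quartiles from per-OR average throughput samples
# (values here are illustrative, not the measured data set).
def ecdf(samples):
    """Return (sorted values, cumulative probabilities) for plotting."""
    xs = sorted(samples)
    n = len(xs)
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps

throughput_mbps = [2.1, 4.7, 9.8, 14.2, 17.4, 22.9, 31.0, 36.5]  # fake samples
xs, ps = ecdf(throughput_mbps)

# Lower quartile: smallest value whose cumulative probability reaches 0.25.
q1 = next(x for x, p in zip(xs, ps) if p >= 0.25)
mean = sum(xs) / len(xs)
print(f"mean={mean:.1f} Mbps, lower quartile={q1} Mbps")
```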

Figure 8. Average throughput and round-trip time (RTT) to the gateway/Internet and number of hops to the gateway.


Figure 9. Empirical cumulative distribution function (ECDF) of the average throughput to the gateway/Internet.

Figure 10. Empirical cumulative distribution function (ECDF) of the average round-trip time (RTT) to the gateway/Internet.

Figure 11. Empirical cumulative distribution function (ECDF) of the average number of hops to the gateway.


Figure 12. Average user traffic in the three busiest links.

Figure 13. Average throughput in the three busiest links.

We now investigate the impact of the traffic generated by the users on the network performance. Figure 12 illustrates the average traffic in both directions (referred to as upload and download) of the three busiest links over each hour of the day. These measurements have been obtained using the transmitted-bytes counters of the interfaces. Figure 13 shows the throughput averages in both directions on the same links as in Figure 12. These figures demonstrate a clear correlation between user traffic and link throughput. For instance, the upload link shown in the middle of Figure 13 is the most sensitive to user traffic. Figure 12 reveals that the average user traffic over the four lowest and highest busy hours of this link is 0.15 and 0.44 Mbps, respectively. Conversely, Figure 13 shows that the average throughput measured in this link during the same hours drops from around 32 to 24 Mbps.

4.2. Microsoft Azure cloud environment

The Microsoft Azure platform is the primary component of Microsoft’s cloud computing services. It relies on a global network of data centers managed by Microsoft to provide a collection of services that facilitate the development, deployment, and management of scalable cloud-based applications and services [35, 36]. Microsoft maintains data centers in five regions in North America, two regions in Europe (West and East), two regions in Asia, and one region in Japan.


Table I. Azure nodes characteristics.

Number of virtual machines (VMs): 20
Size: A1 (1 core, 1.75-GB memory)
Region: West Europe
Throughput between VMs: 220–230 Mbits/s
Round-trip time between VMs: 1–2 ms
Average throughput from the community network client: 7–8 Mbits/s

Each region contains one or more data centers. In turn, each data center holds from tens to hundreds of thousands of servers. When we created the virtual machines in the Azure cloud, we selected a data center region in West Europe. An alternative is to use the same service in multiple regions concurrently and direct users to the region closest to them. We created a Cloudy OS image and then used that image as a template to create virtual machines managed by Microsoft Azure. Table I shows the Azure node characteristics and some network metrics obtained from them.

5. RESULTS

All tests were conducted using the IOzone cloud storage benchmark [37]. IOzone is a file system benchmark tool, which generates the cloud storage workload and measures various file operations. The benchmark tests file input/output (I/O) performance of many important storage-benchmarking operations, such as read, write, re-read, re-write, and random read/write. We run all 13 IOzone tests and vary the file size from 64 KB to 128 MB, while maintaining a record length of 128 KB. The -a flag allows us to run all 13 tests, and we add the -b flag to write the test output in binary format to a spreadsheet. We use the file system in userspace (FUSE) kernel module in combination with the SSH file system (SSHFS), an SFTP client that allows file system access via FUSE, to mount a Tahoe-LAFS directory on the local disk of the client. Tahoe’s SFTP front end includes several workarounds and extensions to make it function correctly with SSHFS. When mounting with SSHFS, we disable the cache and use direct I/O and synchronous writes and reads, using the parameters -o cache=no, big_writes, direct_io, and sshfs_sync. We observe that the -o big_writes option to SSHFS improves write performance without affecting the read operations [38]. The distributed storage experiment is composed of 15 runs of writing and reading files, where each run consists of 10 repetitions. Performance results presented in this paper are averaged over all the successful runs and are measured in MB/s, referred to as operation speed. Tests with concurrent reading and writing were not conducted.
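Concretely, one benchmark run amounts to mounting the Tahoe-LAFS SFTP front end with SSHFS and invoking IOzone on the mount point. The sketch below reconstructs a plausible invocation; the mount point, SSH target, and the exact flags used to bound the file sizes are our assumptions, while the SSHFS options are the ones named above.

```python
# Sketch of one benchmark run: mount Tahoe-LAFS via SSHFS, then run IOzone.
# Mount point and host are placeholders; the IOzone size bounds reflect the
# 64-KB to 128-MB files with a fixed 128-KB record length.
import subprocess

MOUNT = "/mnt/tahoe"
subprocess.run([
    "sshfs", "user@tahoe-gateway:/", MOUNT,
    "-o", "cache=no", "-o", "direct_io", "-o", "big_writes", "-o", "sshfs_sync",
], check=True)

subprocess.run([
    "iozone",
    "-a",                 # run the full automatic set of tests
    "-n", "64k",          # minimum file size
    "-g", "128m",         # maximum file size
    "-y", "128k",         # minimum record length
    "-q", "128k",         # maximum record length (fixed at 128 KB)
    "-b", "results.xls",  # write results as a binary spreadsheet
    "-f", f"{MOUNT}/iozone.tmp",  # test file on the Tahoe-LAFS mount
], check=True)
```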

5.1. Community networks results

To better understand the impact that the network imposes on a community network environment, we established a Tahoe-LAFS cluster of 10 nodes that are geographically distributed and connected to the ORs, as explained in Section 4.1. Two sets of tests were conducted: one where the reads and writes are initiated from a client that has the best connectivity in the network, such as the best RTT to other nodes and best throughput (this is our baseline case), and the other where they are initiated from a client node that is the farthest node in the network (in terms of number of hops, RTT, and throughput to other nodes); this is referred to as the set 1 case in the graphs. Figures 14 and 15 show the best and worst client write/read performances. Figure 16 depicts the summary of all tests performed with the IOzone benchmark. Median, first, and third quartile values are plotted for each data point. A few observations are noted as follows.

- In terms of network connectivity, the two clients in community networks perform differently. This is related to the fact that the two clients are not connected in the same way to other nodes. The client in the baseline is better connected and is much closer in terms of RTT to the other nodes than the client in set 1.

- In terms of write performance, the baseline client performs better. Write performance for the baseline is higher and more stable than the read performance.


Figure 14. Performance of write operation in community networks.

Figure 15. Performance of read operation in community networks.

Figure 16. Summary of all storage benchmark operations for different tests in the Guifi.net community network.


As the file size increases, the write performance of the baseline client slightly decreases (the minimum throughput achieved is 1.15 MB/s when writing a 4-MB file). The highest throughput (1.28 MB/s) is achieved when writing a small file (a 128-KB file). It is interesting to note that when writing smaller files, Tahoe-LAFS performs better. This is because the default stripe size of Tahoe-LAFS is well optimized for writing small objects (the stripe size determines the granularity at which data are encrypted and erasure-coded). The write performance of the set 1 client represents the average write performance of all community network nodes. The maximum write throughput achieved is 0.86 MB/s when writing a 128-KB file, and the minimum write throughput is 0.72 MB/s when writing a 2-MB file. The throughput values achieved for the set 1 client are in the range of the measured values in Figure 9. Furthermore, write performance is affected by another factor: when writing new objects, Tahoe-LAFS generates a new public/private key pair, which is a computationally expensive operation.

- Read operations are accomplished by submitting the request to all storage nodes simultaneously; hence, the relevant peers are found with one round trip to every node. The second round trip occurs after choosing the peers from which to read the file. The intention of the second round trip is to select which peers to read from, after the initial negotiation phase, based on certain heuristics. When reading from the storage nodes, the performance of both clients drops significantly as the file size increases, as shown in Figure 15. This is because when reading a file of 128 MB, a client must contact more Tahoe-LAFS storage peers to complete the shares of the file. In addition, reading the file system meta-object (i.e., the mutable directory objects) every time an object is accessed results in overhead, thus influencing the results.

- Figure 16 shows the summary of all tests performed with the IOzone benchmark. The benchmark tested file I/O performance for the 13 operations presented in Figure 16. As indicated, the baseline client performs better than the set 1 client, reaching an average operation speed of 0.74 MB/s for all 13 tests performed.

Figure 17 shows the CPU consumed by the Tahoe-LAFS client and the Tinc overlay network [23] during the experiment hours. The Tinc overlay network allows nodes to be connected to each other through a virtual private network (VPN). The TincVPN daemon uses tunneling and encryption to create a secure private network between cloud nodes, and this impacts the CPU performance of the node. Tahoe-LAFS uses TincVPN to propagate the content to other nodes.

Figure 17. CPU consumption of the Tahoe-LAFS client. The blue color shows the CPU usage by Tahoe-LAFS and the Tinc overlay network.

Expensive and frequent hash read/write seeks that are needed to reconstruct shares on the storage peers, and the generation of a new public/private key every time a new object is created, can also impact the CPU usage.

5.2. Microsoft Azure results

Two sets of tests were conducted: one where the reads/writes are initiated from a client node located in the same Azure cluster, referred to as local read/write (baseline), and the other where they are initiated from a client node located in a different geo-location, referred to as remote read/write (set 1). The results are presented in Figures 18 and 19. Figure 20 shows the summary of all tests performed with the IOzone benchmark. In the results, throughput is given as a function of the file size. Median, first, and third quartile values are plotted for each data point. A few observations should be noted:

- In terms of local read/write performance (Figures 18 and 19), the Azure cloud system performs better. Write performance for the baseline is better and more stable than read performance. As the size of the file increases, the read performance of the local client decreases by 0.7 MB/s when reading a 128-MB file. This is because when reading a bigger file (128 MB), a client has to contact more Tahoe-LAFS storage peers to complete the shares of the file.

- When accessing the Azure cluster from a remote location (set 1 client), the performance drops significantly. The client is one of the community network nodes. The location of the Azure data center and the link from the client to the gateway can impact the results. The network from the client to the gateway (inside a community network) is a bottleneck.

Figure 18. Performance of write operation in Azure cloud.

Figure 19. Performance of read operation in Azure cloud.


Figure 20. Summary of all benchmark operations for different tests in the Azure cloud. The baseline refers to a client located inside the Azure cluster. Set 1 refers to a client located outside the Azure cluster (in a community network).

- Figure 20 shows the summary of all tests performed with the IOzone benchmark in the Microsoft Azure cloud. The IOzone benchmark tested file I/O performance for the 13 operations. As expected, the baseline client performs better than the set 1 client, reaching an average operation speed of 1.4 MB/s for all 13 tests performed. The average operation speed of the set 1 client for all tests is 0.7 MB/s.

6. RELATED WORK

In terms of providing cloud storage services with Tahoe-LAFS in WAN settings, the paper of Chen et al. [39] is the most relevant to our work. The authors deployed Tahoe-LAFS, the Quantcast File System (QFS), and Swift in a multi-site environment and measured the impact of WAN characteristics on these storage systems. However, the authors deployed their experiments on a multi-site data center with very different characteristics from our scenario. Our approach is to assess the Tahoe-LAFS performance in the real context of community networks, and we use heterogeneous and less powerful machines as storage nodes.

The authors in [40] present a measurement study of a few Personal Cloud solutions such as DropBox, Box, and SugarSync. The authors examine central aspects of these Personal Cloud storage services to characterize their performance, with emphasis on the data transfers. They report interesting insights such as the high variability in transfer performance depending on the geographic location; the type of traffic, namely inbound or outbound; the file size; and the hour of the day. Their findings regarding the impact of location on performance are relevant for our work to better understand the network dependence of distributed storage services.

Another work [41] implements a distributed file system for Hadoop. The original Hadoop distributed file system is replaced with the Tahoe-LAFS cloud storage. The authors investigated the total transmission rate and download time with two different file sizes. Their experiment showed that the file system accomplishes a fault-tolerant cloud storage system even when parts of the storage nodes have failed. However, in the experiments, only three storage nodes and one introducer node of Tahoe-LAFS were used, and their experiments were run in a local context, which is an unrealistic setting for our scenario.

Another paper [42] evaluates XtreemFS, GlusterFS, and SheepDog, using them as virtual disk image stores in a large-scale virtual machine hosting environment. The StarBED testbed with powerful machines is used for their experiments. Differently, we target a distributed and heterogeneous set of storage nodes.


The paper of Talyansky et al. [43] evaluates the performance of XtreemFS under the I/O load produced by enterprise applications. They suggest that XtreemFS has good potential to support transactional I/O load in distributed environments, demonstrating good performance of read operations and scalability in general. XtreemFS is an alternative candidate upon which to implement a storage service. Tahoe-LAFS, however, more strongly addresses fault-tolerance, privacy, and security requirements.

From the review of the related work, it can be seen that these experimental studies were not conducted in the context of community networks. In our work, we emphasized the usage of Tahoe-LAFS in a real deployment within community network clouds, to understand its performance and operational feasibility under real network conditions.

7. CONCLUSION AND OUTLOOK

Community networks would greatly benefit from the additional value of applications and services deployed inside the network through community clouds. The performance of such services running in community clouds, in comparison with running them in commercial clouds, however, was not yet shown by related work, but it is a needed step to encourage further application deployments in community clouds.

This paper evaluated the Tahoe-LAFS storage service performance on a community cloud and a commercial cloud infrastructure. Tahoe-LAFS is a relevant application for community networks, because it offers privacy and security, as it encrypts data already on the client side, and it offers fault-tolerance regarding storage node failures due to erasure coding (replication factors). For the evaluation of Tahoe-LAFS, a real deployment in the community network cloud was conducted. Experiments assessed the read and write performance of the Tahoe-LAFS storage service on distributed and heterogeneous community network cloud nodes. In addition, Tahoe-LAFS was deployed in the Microsoft Azure commercial cloud, to compare and understand the impact of homogeneous network and hardware resources on the performance of the Tahoe-LAFS storage service.

We observed that the write operation of Tahoe-LAFS resulted in similar performance when using either the community network cloud or the Azure commercial cloud, but the read operation achieved better performance in the Azure cloud, where reading from multiple nodes of Tahoe-LAFS benefited from the homogeneity of the network and nodes. The results of our experiments suggest that Tahoe-LAFS can run on community network clouds with suitable performance for the needed end-user experience. Tahoe-LAFS could therefore be used as a back-end storage service for the operation of new application services using the resources of a community network.

Based on the observed successful performance and operation of the Tahoe-LAFS storage service in community network clouds, our experimental deployment will now become a permanent storage service open to real users. As end users become more involved, we will be able to naturally extend our experiments with more storage nodes, more files coming from real data, and real usage. In addition, we anticipate that this storage service will effectively complement commercial offers and trigger innovative applications in areas in which end-user involvement is required.

ACKNOWLEDGEMENTS

This work was supported by the European Framework Programme 7 FIRE Initiative projects CONFINE (FP7-288535) and CLOMMUNITY (FP7-317879), by the Universitat Politècnica de Catalunya BarcelonaTech, and by the Spanish government under contract TIN2013-47245-C2-1-R. This work was also supported by national funds through Fundação para a Ciência e a Tecnologia with reference UID/CEC/50021/2013. [Correction added on 16 October 2015, after first online publication: the previous sentence was added.]

REFERENCES

1. Braem B, Blondia C, Barz C, Rogge H, Freitag F, Navarro L, Bonicioli J, Papathanasiou S, Escrich P, Baig Viñas R, Kaplan AL, Neumann A, Vilata i Balaguer I, Tatum B, Matson M. A case for research with and on community networks. SIGCOMM Computer Communication Review 2013; 43:68–73.
2. Guifi.net. Open, free and neutral network internet for everybody. (Available from: http://guifi.net) [Accessed on 1 February 2015].


3. Funkfeuer. Free net. (Available from: http://www.funkfeuer.at/) [Accessed on 1 February 2015].
4. AWMN. Athens wireless metropolitan network. (Available from: http://www.awmn.net/content.php) [Accessed on 1 February 2015].
5. Selimi M, Freitag F, Centelles R, Moll A. Distributed storage and service discovery for heterogeneous community network clouds. In 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC14), IEEE/ACM: London, UK, 2014; 204–212.
6. Mell P, Grance T. The NIST Definition of Cloud Computing, Vol. 800. NIST Special Publication: Gaithersburg, MD, 2011.
7. Marinos A, Briscoe G. Community cloud computing. Cloud Computing 2009; 5931:472–484.
8. Khan AM, Büyükşahin UC, Freitag F. Prototyping incentive based resource assignment for clouds in community networks. In 28th International Conference on Advanced Information Networking and Applications (AINA 2014), IEEE: Victoria, Canada, 2014; 719–726.
9. Jiménez J, Baig R, Freitag F, Navarro L, Escrich P. Deploying PaaS for accelerating cloud uptake in the Guifi.net community network. In International Workshop on the Future of PaaS 2014, within IEEE IC2E, IEEE: Boston, Massachusetts, USA, 2014.
10. Jiménez J, Baig R, Escrich P, Khan AM, Freitag F, Navarro L, Pietrosemoli E, Zennaro M, Payberah AH, Vlassov V. Supporting cloud deployment in the Guifi.net community network. In 5th Global Information Infrastructure and Networking Symposium (GIIS 2013), Trento, Italy, 2013.
11. Selimi M, Freitag F, Pueyo R, Escrich P, Marti D, Baig R. Experiences with distributed heterogeneous clouds over community networks. In SIGCOMM Workshop on Distributed Cloud Computing (DCC14) within ACM SIGCOMM, ACM: Chicago, USA, 2014.
12. Tahoe-LAFS. The least-authority file store. (Available from: https://www.tahoe-lafs.org/trac/tahoe-lafs) [Accessed on 1 February 2015].
13. Vega D, Cerdà-Alabern L, Navarro L, Meseguer R. Topology patterns of a community network: Guifi.net. In 1st International Workshop on Community Networks and Bottom-up-Broadband (CNBuB 2012), within IEEE WiMob, Barcelona, Spain, 2012; 612–619.
14. Proxmox. Virtual environment. (Available from: https://www.proxmox.com/en/) [Accessed on 1 February 2015].
15. OpenStack. The open source cloud operating system. (Available from: https://www.openstack.org/) [Accessed on 1 February 2015].
16. OpenNebula. Cloud management platform. (Available from: http://opennebula.org/) [Accessed on 1 February 2015].
17. Moreno-Vozmediano R, Montero RS, Llorente IM. IaaS cloud architecture: from virtualized datacenters to federated cloud infrastructures. Computer 2012; 45:65–72.
18. EGI: European grid infrastructure, federated cloud. (Available from: https://www.egi.eu/infrastructure/cloud/) [Accessed on 1 February 2015].
19. Buyya R, Ranjan R, Calheiros RN. InterCloud: utility-oriented federation of cloud computing environments for scaling of application services. Algorithms and Architectures for Parallel Processing 2010; 6081:20–31.
20. Gall M, Schneider A, Fallenbeck N. An architecture for community clouds using concepts of the intercloud. In 27th International Conference on Advanced Information Networking and Applications (AINA13), IEEE: Barcelona, Spain, 2013; 74–81.
21. Selimi M, Freitag F, Centelles RP, Moll A, Veiga L. TROBADOR: service discovery for distributed community network micro-clouds. In 29th IEEE International Conference on Advanced Information Networking and Applications (AINA15), 2015; 642–649.
22. Cloudy. Community network distribution based on Debian GNU/Linux. (Available from: http://cloudy.community/).
23. TincVPN. Tinc virtual private network. (Available from: http://www.tinc-vpn.org/) [Accessed on 1 February 2015].
24. Serf. (Available from: https://www.serfdom.io/) [Accessed on 1 February 2015].
25. Avahi. Service discovery. (Available from: http://avahi.org/) [Accessed on 1 February 2015].
26. Wilcox-O'Hearn Z, Warner B. Tahoe: the least-authority filesystem. In Proceedings of the 4th ACM International Workshop on Storage Security and Survivability, ser. StorageSS '08, ACM: New York, NY, USA, 2008; 21–26.
27. Community-Lab. Community networks testbed by the CONFINE project. (Available from: http://community-lab.net/) [Accessed on 1 February 2015].
28. OpenWrt. Linux distribution for embedded devices. (Available from: https://openwrt.org) [Accessed on 1 February 2015].
29. LXC. Linux containers. (Available from: https://linuxcontainers.org/) [Accessed on 1 February 2015].
30. qMP. Quick mesh project. (Available from: http://qmp.cat/) [Accessed on 1 February 2015].
31. CONFINE Project. Community networks testbed for the future internet. (Available from: http://confine-project.eu/), FP7 European Project 288535 [Accessed on 1 February 2015].
32. Cerdà-Alabern L, Neumann A, Escrich P. Experimental evaluation of a wireless community mesh network. In Proceedings of the 16th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems, ser. MSWiM '13, ACM: New York, NY, USA, 2013; 23–30.
33. QMPSU. Sants–UPC monitoring web page. (Available from: http://dsg.ac.upc.edu/qmpsu) [Accessed on 1 February 2015].
34. Neumann A, López E, Navarro L. An evaluation of BMX6 for community wireless networks. In 1st International Workshop on Community Networks and Bottom-up-Broadband (CNBuB'2012), Barcelona, Spain, 2012.
35. Azure. Microsoft's cloud platform. (Available from: http://azure.microsoft.com/en-us/overview/what-is-azure/) [Accessed on 1 February 2015].


36. Calder B, Wang J, Ogus A, Nilakantan N, Skjolsvold A, McKelvie S, Xu Y, Srivastav S, Wu J, Simitci H, Haridas J, Uddaraju C, Khatri H, Edwards A, Bedekar V, Mainali S, Abbasi R, Agarwal A, Haq MF, Haq MI, Bhardwaj D, Dayanand S, Adusumilli A, McNett M, Sankaran S, Manivannan K, Rigas L. Windows Azure storage: a highly available cloud storage service with strong consistency. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP'11), ACM: New York, NY, USA, 2011; 143–157.
37. IOzone. Filesystem benchmark. (Available from: http://www.iozone.org/) [Accessed on 1 February 2015].
38. Rajgarhia A, Gehani A. Performance and extension of user space file systems. In Proceedings of the 2010 ACM Symposium on Applied Computing, SAC'10, 2010.
39. Chen YF, Daniels S, Hadjieleftheriou M, Liu P, Tian C, Vaishampayan V. Distributed storage evaluation on a three-wide inter-data center deployment. In 2013 IEEE International Conference on Big Data, 2013; 17–22.
40. Gracia-Tinedo R, Sanchez Artigas M, Moreno-Martinez A, Cotes C, Garcia Lopez P. Actively measuring personal cloud storage. In 2013 IEEE Sixth International Conference on Cloud Computing (CLOUD), 2013; 301–308.
41. Tseng FH, Chen CY, Chou LD, Chao HC. Implement a reliable and secure cloud distributed file system. In 2012 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), New Taipei, 2012; 227–232.
42. Shima K, Dang N. Indexes for distributed file/storage systems as a large scale virtual machine disk image storage in a wide area network.
43. Talyansky R, Hohl A, Scheuermann B, Kolbeck B, Focht E. Towards transactional load over XtreemFS, Vol. abs/1001.2931. CoRR, 2010.
