Indexes for Distributed File/Storage Systems As a Large Scale Virtual Machine Disk Image Storage in a Wide Area Network
Total Pages: 16, File Type: PDF, Size: 1020 KB
Recommended publications
-
Performance Evaluation of a Distributed Storage Service in Community Network Clouds
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE, Concurrency Computat.: Pract. Exper. (2015). Published online in Wiley Online Library (wileyonlinelibrary.com), DOI: 10.1002/cpe.3658. Special issue paper by Mennan Selimi, Felix Freitag, Llorenç Cerdà-Alabern (Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Spain) and Luís Veiga (Instituto Superior Técnico (IST), INESC-ID Lisboa, Lisbon, Portugal). SUMMARY: Community networks are self-organized and decentralized communication networks built and operated by citizens, for citizens. The consolidation of today's cloud technologies now offers community networks the possibility to collectively develop community clouds, building upon user-provided networks and extending toward cloud services. Cloud storage, and in particular secure and reliable cloud storage, could become a key community cloud service enabling end-user applications. In this paper, we evaluate, in a real deployment, the performance of the Tahoe Least-Authority File System (Tahoe-LAFS), a decentralized storage system with provider-independent security that guarantees privacy to its users. We evaluate how the Tahoe-LAFS storage system performs when deployed over distributed community cloud nodes in a real community network such as Guifi.net. Furthermore, we evaluate Tahoe-LAFS on the Microsoft Azure commercial cloud platform to compare and understand the impact of homogeneous network and hardware resources on its performance. We observed that the write operation of Tahoe-LAFS resulted in similar performance on either the community network cloud or the commercial cloud. However, the read operation achieved better performance in the Azure cloud, where reading from multiple Tahoe-LAFS nodes benefited from the homogeneity of the network and nodes.
-
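The evaluation methodology the summary describes, timing write and read operations against a deployed storage service, can be sketched in a few lines. The harness below is a hypothetical, minimal stand-in: it benchmarks PUT/GET-style operations against a local directory, whereas the paper's measurements targeted an actual Tahoe-LAFS deployment.

```python
# Minimal sketch of a write/read benchmark in the spirit of the Tahoe-LAFS
# evaluation described above. It times PUT/GET-style operations against a
# local directory; a real harness would target the Tahoe-LAFS grid instead.
import os
import tempfile
import time


def benchmark(storage_dir: str, payload_size: int = 1 << 20, runs: int = 5):
    """Return mean write and read times (seconds) for a payload of given size."""
    payload = os.urandom(payload_size)
    write_times, read_times = [], []
    for i in range(runs):
        path = os.path.join(storage_dir, f"object-{i}")

        t0 = time.perf_counter()
        with open(path, "wb") as f:       # stand-in for a storage PUT
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())          # ensure the write reached the device
        write_times.append(time.perf_counter() - t0)

        t0 = time.perf_counter()
        with open(path, "rb") as f:       # stand-in for a storage GET
            data = f.read()
        read_times.append(time.perf_counter() - t0)
        assert data == payload            # sanity: read back what was written
    return sum(write_times) / runs, sum(read_times) / runs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w, r = benchmark(d)
        print(f"mean write: {w:.4f}s  mean read: {r:.4f}s")
```

Running the same loop once against community-network nodes and once against homogeneous cloud nodes is what lets the paper compare the two deployments on equal footing.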
SNIA MSFS Tutorial
Massively Scalable File Storage. Philippe Nicolas, OpenIO. Approved SNIA Tutorial © 2016 Storage Networking Industry Association. All Rights Reserved. SNIA Legal Notice: the material in this tutorial is copyrighted by the SNIA unless otherwise noted; member companies and individual members may use it in presentations and literature only if slides are reproduced in their entirety without modification and the SNIA is acknowledged as the source. Nothing in the presentation constitutes legal advice; it represents the author's personal opinion and is provided with no warranties, express or implied. Abstract: The Internet changed the world and continues to revolutionize how people connect, exchange data, and do business. This radical change is one of the causes of the rapid explosion of data volume, which requires a new approach to data storage design. One common element is that unstructured data now rules the IT world.
-
Experiences on File Systems Which Is the Best File System for You?
Experiences on File Systems: Which is the best file system for you? Jakob Blomer, CERN PH/SFT. CHEP 2015, Okinawa, Japan. We like file systems for their rich and standardized interface; we struggle to find an optimal implementation of that interface. Why distributed file systems? Physics experiments store their files in a variety of file systems for a good reason: the file system interface is portable, so we can take our local analysis application and run it anywhere on a big data set, and the file system as a storage abstraction is a sweet spot between data flexibility and data organization. Shortlist of distributed file systems: Quantcast File System, CernVM-FS. Agenda: 1. What do we want from a distributed file system? 2. Sorting and searching the file system landscape. 3. Technology trends and future challenges. [Slide figure, "Can one size fit all?": data classes (home folders; physics data, recorded, simulated, and analysis results; software binaries; scratch area) compared along change frequency, mean file size, throughput (MB/s), IOPS, data value, volume, confidentiality, cache hit rate, and redundancy.]
-
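The portability argument above is the heart of the talk: the same open/read code runs against any mounted file system, so an analysis application needs no change when its data moves. A minimal sketch, using a temporary local file purely as a stand-in for a file on a network mount:

```python
# The same open/read code works whether `path` lives on a local ext4 disk,
# an NFS mount, or a FUSE-mounted distributed file system such as CernVM-FS:
# the file system interface hides the storage implementation entirely.
import tempfile


def first_bytes(path: str, n: int = 16) -> bytes:
    """Read the first n bytes of a file through the portable file interface."""
    with open(path, "rb") as f:
        return f.read(n)


# Demo against a temporary local file; pointing `path` at a file on a
# network mount would require no code change at all.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"physics event data")
    path = f.name

print(first_bytes(path, 7))  # → b'physics'
```

That interface stability is exactly why the hard part is not the API but, as the talk puts it, finding an optimal implementation behind it.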
HPC Storage, Part 1&2
Linux Clusters Institute: HPC Storage, Parts 1 & 2. Rutgers University, 19-23 August 2019. Garrett McGrath, Princeton Neuroscience Institute, [email protected]. HPC storage concepts, planning, and implementation. Target audience for session #1: those involved in designing, implementing, or managing HPC storage systems. Outline: concepts and terminology; goals and requirements; storage hardware; file systems; wrap-up. Concepts and terminology. What is storage? A place to store data, either temporarily or permanently, arranged in a hierarchy: processor cache (L1, L2, L3; fastest access, closest to the CPU, temporary); system memory (DRAM, HBM; very fast access, close to the CPU but not on it, temporary); solid-state storage (SATA SSD, M.2 module, PCIe card; fast access, system-internal or part of an external storage system, capable of high densities at correspondingly high cost); spinning disk (PMR, SMR, HAMR/MAMR; slow, with performance tied to access behavior, system-internal or external, capable of extremely high densities); and tape (DLT-S, DAT, AIT, LTO, QIC; extremely slow, typically found only in libraries). Moving down this hierarchy, latency and capacity increase while bandwidth decreases. IOPS: input/output operations per second. RAID: redundant array of inexpensive disks. JBOD: just a bunch of disks. RAS: reliability, availability, serviceability. Storage server: provides direct access to storage devices and functions as a data manager for that
-
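The hierarchy above can be made concrete with rough order-of-magnitude numbers. The figures below are illustrative assumptions, not vendor specifications; the point is that a spinning disk's random-IOPS ceiling follows directly from its per-access latency:

```python
# Illustrative latency hierarchy and a back-of-the-envelope IOPS estimate.
# All numbers are rough order-of-magnitude assumptions, not measurements.

latency_ns = {
    "CPU register":  1,
    "L1 cache":      1,               # ~1 ns
    "DRAM":          100,             # ~100 ns
    "NVMe SSD read": 100_000,         # ~100 us
    "Spinning disk": 10_000_000,      # ~10 ms (seek + rotational delay)
    "Tape seek":     30_000_000_000,  # tens of seconds to wind into position
}

for tier, ns in latency_ns.items():
    print(f"{tier:14s} ~{ns:>14,} ns")

# A disk that needs ~10 ms per random access can serve at most:
seek_plus_rotation_s = 0.010
max_random_iops = 1 / seek_plus_rotation_s
print(f"max random IOPS for one spindle: ~{max_random_iops:.0f}")
```

This ~100-IOPS-per-spindle ceiling is why "performance is tied to access behavior" for spinning disk: sequential streaming avoids most seeks, while random small I/O pays the full latency on every operation.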
Tools for Big Data Analysis
Masaryk University, Faculty of Informatics. Tools for Big Data Analysis. Master's Thesis, Bc. Martin Macák. Brno, Spring 2018. Advisor: doc. Ing. RNDr. Barbora Bühnová, Ph.D. Declaration: Hereby I declare that this thesis is my original authorial work, which I have worked out on my own; all sources, references, and literature used or excerpted during its elaboration are properly cited and listed with complete reference to their source. Acknowledgements: I would like to thank my supervisor, doc. Ing. RNDr. Barbora Bühnová, Ph.D., for offering me this thesis topic; her support, guidance, and patience greatly helped me finish it. I would also like to thank her for introducing me to the great team of people in the CERIT-SC Big Data project. From this team, I would especially like to thank RNDr. Tomáš Rebok, Ph.D., who many times found time to give me useful advice, and Bruno Rossi, PhD, who gave me the opportunity to present the results of this thesis at the LaSArIS seminar. I would also like to express my gratitude for the support of my family: my parents, Jana and Alexander, and the best sister, Nina. My thanks also belong to my supportive friends, mainly Bc. Tomáš Milo, Bc. Peter Kelemen, Bc. Jaroslav Davídek, Bc. Štefan Bojnák, and Mgr. Ondřej Gasior. Lastly, I would like to thank my girlfriend, Bc. Iveta Vidová, for her patience and support.
-
A Flexible and Adaptable Distributed File System
A Flexible and Adaptable Distributed File System. S. E. N. Fernandes, R. S. Lobato, A. Manacero (Dept. of Computer Science and Statistics, Univ. Estadual Paulista UNESP, São José do Rio Preto, São Paulo, Brazil), R. Spolon, and M. A. Cavenaghi (Dept. of Computing, Univ. Estadual Paulista UNESP, Bauru, São Paulo, Brazil). Abstract: This work describes the development of a flexible and adaptable distributed file system model in which the main concepts of distributed computing are intrinsically incorporated. The file system incorporates characteristics such as transparency, scalability, fault tolerance, cryptography, support for low-cost hardware, and easy configuration and file manipulation. Keywords: distributed file systems, fault tolerance, data storage. 1. Introduction: The amount of stored data increases at an impressive rate, demanding more storage space and compatible processing speeds. To avoid complete data loss from failures or system overloads, it became usual to adopt the distributed files model [1] [2]. Therefore, a distributed file system (DFS) is a system 2. Related work: Among the several existing DFSs, this work focuses on exploring the key features of some DFS models based on traditional designs and some newer systems, allowing the extraction of features for the development of a DFS with characteristics such as high performance, fault tolerance, and ease of use. 2.1 Network File System: The Network File System (NFS) [2] [3] is a DFS based on remote procedure calls (RPC), providing a convenient medium to applications through a virtual layer (Virtual File System, VFS) that enables transparent access to NFS components [5] [6] [7]. 2.2 Andrew File System: The Andrew File System (AFS) was designed with scalability to many users in mind.
-
Contrail White Paper: Overview of the Contrail System, Components and Usage
Bringing data centre style versatility to the Cloud. Second edition, January 2014. © 2014 Contrail project. Version II 1.5, 2014-01-20. This White Paper is produced by the Contrail Consortium to give insight into the technology and use cases developed by the project in the cloud computing context. Although we made every effort to provide correct information, the actual working of the Contrail software and use cases could deviate from the descriptions given here. Copyright of images belongs to their respective owners. http://contrail-project.eu. The Contrail project is managed by the Contrail consortium and is partially funded by the FP7 Programme of the European Commission under Grant Agreement FP7-ICT-257438. [Preface] Contrail: bringing data centre versatility to the Cloud. Cloud computing has developed fast over the past years; however, what exactly it is and how it can help your business is not always obvious. This White Paper provides a global overview of Contrail and is intended for IT managers who want to quickly get an idea of how they can take advantage of federated clouds at an Infrastructure-as-a-Service or Platform-as-a-Service level. In this White Paper we describe how the software being developed by the Contrail project can bring data centre style versatility to the Cloud, including scalability, flexibility, security, and reliability. In this context a Cloud can be a private, public, or hybrid Cloud; it can be a set of infrastructures (Infrastructure-as-a-Service) or platforms (Platform-as-a-Service) managed as one single entity by system management software, commonly referred to as Cloud middleware (OpenNebula, OpenStack, etc.).
-
File Systems Unfit As Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution
File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution. Abutalib Aghayev (Carnegie Mellon University), Sage Weil (Red Hat, Inc.), Michael Kuchnik (Carnegie Mellon University), Mark Nelson (Red Hat, Inc.), Gregory R. Ganger (Carnegie Mellon University), George Amvrosiadis (Carnegie Mellon University). Abstract: For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is a preferred choice for most distributed file systems today because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph's experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging storage hardware is painstakingly slow. Ceph addressed these issues with BlueStore, a new backend designed to run directly on raw storage devices. 1. Introduction: Distributed file systems operate on a cluster of machines, each assigned one or more roles such as cluster state monitor, metadata server, and storage server. Storage servers, which form the bulk of the machines in the cluster, receive I/O requests over the network and serve them from locally attached storage devices using storage backend software. Sitting in the I/O path, the storage backend plays a key role in the performance of the overall system. Traditionally, distributed file systems have used local file systems, such as ext4 or XFS, directly or through middleware, as the storage backend [29, 34, 37, 41, 74, 84, 93, 98, 101, 102]. This approach has delivered reasonable performance, pre-
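The "zero-overhead transaction mechanism" challenge mentioned in the abstract arises because local file systems offer no multi-operation atomicity, so a backend built on them typically fakes it with a write-temp-then-rename pattern plus fsync, paying extra I/O for every update. A minimal sketch of that generic pattern (an illustration, not Ceph's actual code):

```python
# Sketch of the atomic-update workaround a file-system-backed storage
# backend must use: write to a temporary file, fsync it, then rename it
# into place. POSIX rename is atomic, but the extra fsync round trips are
# exactly the kind of overhead a raw-device backend like BlueStore avoids.
import os
import tempfile


def atomic_put(directory: str, name: str, data: bytes) -> str:
    """Atomically create or replace an object file inside `directory`."""
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())           # durability: data on disk first
        final_path = os.path.join(directory, name)
        os.replace(tmp_path, final_path)   # atomic visibility of the update
        return final_path
    except BaseException:
        os.unlink(tmp_path)                # never leave a partial object
        raise


with tempfile.TemporaryDirectory() as d:
    p = atomic_put(d, "obj-1", b"payload")
    with open(p, "rb") as f:
        print(f.read())  # → b'payload'
```

Readers either see the old object or the new one, never a torn write; the price is a full data copy and forced flush per update, which is the overhead the paper argues cannot be engineered away on top of a general-purpose local file system.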