The Quantcast File System
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Globalfs: a Strongly Consistent Multi-Site File System
GlobalFS: A Strongly Consistent Multi-Site File System Leandro Pacheco Raluca Halalai Valerio Schiavoni University of Lugano University of Neuchatelˆ University of Neuchatelˆ Fernando Pedone Etienne Riviere` Pascal Felber University of Lugano University of Neuchatelˆ University of Neuchatelˆ Abstract consistency, availability, and tolerance to partitions. Our goal is to ensure strongly consistent file system operations This paper introduces GlobalFS, a POSIX-compliant despite node failures, at the price of possibly reduced geographically distributed file system. GlobalFS builds availability in the event of a network partition. Weak on two fundamental building blocks, an atomic multicast consistency is suitable for domain-specific applications group communication abstraction and multiple instances of where programmers can anticipate and provide resolution a single-site data store. We define four execution modes and methods for conflicts, or work with last-writer-wins show how all file system operations can be implemented resolution methods. Our rationale is that for general-purpose with these modes while ensuring strong consistency and services such as a file system, strong consistency is more tolerating failures. We describe the GlobalFS prototype in appropriate as it is both more intuitive for the users and detail and report on an extensive performance assessment. does not require human intervention in case of conflicts. We have deployed GlobalFS across all EC2 regions and Strong consistency requires ordering commands across show that the system scales geographically, providing replicas, which needs coordination among nodes at performance comparable to other state-of-the-art distributed geographically distributed sites (i.e., regions). Designing file systems for local commands and allowing for strongly strongly consistent distributed systems that provide good consistent operations over the whole system. -
Big Data Storage Workload Characterization, Modeling and Synthetic Generation
BIG DATA STORAGE WORKLOAD CHARACTERIZATION, MODELING AND SYNTHETIC GENERATION BY CRISTINA LUCIA ABAD DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2014 Urbana, Illinois Doctoral Committee: Professor Roy H. Campbell, Chair Professor Klara Nahrstedt Associate Professor Indranil Gupta Assistant Professor Yi Lu Dr. Ludmila Cherkasova, HP Labs Abstract A huge increase in data storage and processing requirements has lead to Big Data, for which next generation storage systems are being designed and implemented. As Big Data stresses the storage layer in new ways, a better understanding of these workloads and the availability of flexible workload generators are increas- ingly important to facilitate the proper design and performance tuning of storage subsystems like data replication, metadata management, and caching. Our hypothesis is that the autonomic modeling of Big Data storage system workloads through a combination of measurement, and statistical and machine learning techniques is feasible, novel, and useful. We consider the case of one common type of Big Data storage cluster: A cluster dedicated to supporting a mix of MapReduce jobs. We analyze 6-month traces from two large clusters at Yahoo and identify interesting properties of the workloads. We present a novel model for capturing popularity and short-term temporal correlations in object re- quest streams, and show how unsupervised statistical clustering can be used to enable autonomic type-aware workload generation that is suitable for emerging workloads. We extend this model to include other relevant properties of stor- age systems (file creation and deletion, pre-existing namespaces and hierarchical namespaces) and use the extended model to implement MimesisBench, a realistic namespace metadata benchmark for next-generation storage systems. -
A Decentralized Cloud Storage Network Framework
Storj: A Decentralized Cloud Storage Network Framework Storj Labs, Inc. October 30, 2018 v3.0 https://github.com/storj/whitepaper 2 Copyright © 2018 Storj Labs, Inc. and Subsidiaries This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 license (CC BY-SA 3.0). All product names, logos, and brands used or cited in this document are property of their respective own- ers. All company, product, and service names used herein are for identification purposes only. Use of these names, logos, and brands does not imply endorsement. Contents 0.1 Abstract 6 0.2 Contributors 6 1 Introduction ...................................................7 2 Storj design constraints .......................................9 2.1 Security and privacy 9 2.2 Decentralization 9 2.3 Marketplace and economics 10 2.4 Amazon S3 compatibility 12 2.5 Durability, device failure, and churn 12 2.6 Latency 13 2.7 Bandwidth 14 2.8 Object size 15 2.9 Byzantine fault tolerance 15 2.10 Coordination avoidance 16 3 Framework ................................................... 18 3.1 Framework overview 18 3.2 Storage nodes 19 3.3 Peer-to-peer communication and discovery 19 3.4 Redundancy 19 3.5 Metadata 23 3.6 Encryption 24 3.7 Audits and reputation 25 3.8 Data repair 25 3.9 Payments 26 4 4 Concrete implementation .................................... 27 4.1 Definitions 27 4.2 Peer classes 30 4.3 Storage node 31 4.4 Node identity 32 4.5 Peer-to-peer communication 33 4.6 Node discovery 33 4.7 Redundancy 35 4.8 Structured file storage 36 4.9 Metadata 39 4.10 Satellite 41 4.11 Encryption 42 4.12 Authorization 43 4.13 Audits 44 4.14 Data repair 45 4.15 Storage node reputation 47 4.16 Payments 49 4.17 Bandwidth allocation 50 4.18 Satellite reputation 53 4.19 Garbage collection 53 4.20 Uplink 54 4.21 Quality control and branding 55 5 Walkthroughs ............................................... -
Hierarchical Storage Management with SAM-FS
Hierarchical Storage Management with SAM-FS Achim Neumann CeBiTec / Bielefeld University Agenda ● Information Lifecycle Management ● Backup – Technology of the past ● Filesystems, SAM-FS, QFS and SAM-QFS ● Main Functions of SAM ● SAM and Backup ● Rating, Perspective and Literature June 13, 2006 Achim Neumann 2 ILM – Information Lifecylce Management ● 5 Exabyte data per year (92 % digital, 8 % analog) ● 20% fixed media (disk), 80% removable media ● Managing data from Creation to Cremation (and everything that is between (by courtesy of Sun Microsystems Inc). June 13, 2006 Achim Neumann 3 Different classes of Storagesystems (by courtesy of Sun Microsystems Inc). June 13, 2006 Achim Neumann 4 Backup – Technology of the past ● Backup – protect data from data loss ● Backup is not the problem ● Ability to restore data when needed − Disaster − Hardware failure − User error − Old versions from a file June 13, 2006 Achim Neumann 5 Backup – Technology of the past (cont.) ● Triple data in 2 – 3 years ● Smaller backup window ☞ more data in less time Stream for # Drives for # Drives for #TB 24 h 24 h 6 h 1 12,1 MB/s 2 5 10 121 MB/s 13 49 30 364 MB/s 37 146 June 13, 2006 Achim Neumann 6 Backup – Technology of the past (cont.) ● Limited number of tape drives in library ● Limited number of I/O Channels in server ● Limited scalability of backup software ● Inability to keep all tape drives streaming at one time ● At least every full backup has a copy of the same file ● It's not unusual to have 8 or 10 copies of the same file ☞ excessive media usage and -
Petascale Data Management: Guided by Measurement
Petascale Data Management: Guided by Measurement petascale data storage institute www.pdsi-scidac.org/ MPP2 www.pdsi-scidac.org MEMBER ORGANIZATIONS & Lustre • Los Alamos National Laboratory – institute.lanl.gov/pdsi/ • Parallel Data Lab, Carnegie Mellon University – www.pdl.cmu.edu/ • Oak Ridge National Laboratory – www.csm.ornl.gov/ • Sandia National Laboratories – www.sandia.gov/ • National Energy Research Scientific Computing Center • Center for Information Technology Integration, U. of Michigan pdsi.nersc.gov/ www.citi.umich.edu/projects/pdsi/ • Pacific Northwest National Laboratory – www.pnl.gov/ • University of California at Santa Cruz – www.pdsi.ucsc.edu/ The Computer Failure Data Repository Filesystems Statistics Survey • Goal: to collect and make available failure data from a large variety of sites GOALS • Better understanding of the characteristics of failures in the real world • Gather & build large DB of static filetree summary • Now maintained by USENIX at cfdr.usenix.org/ • Build small, non-invasive, anonymizing stats gather tool • Distribute fsstats tool via easily used web site Red Storm NAME SYSTEM TYPE SYSTEM SIZE TIME PERIOD TYPE OF DATA • Encourage contributions (output of tool) from many FSs Any node • Offer uploaded statistics & summaries to public & Lustre 22 HPC clusters 5000 nodes 9 years outage . Label Date Type File Total Size Total Space # files # dirs max size max space max dir max name avg file avg dir . 765 nodes (2008) System TB TB M K GB GB ents bytes MB ents . 1 HPC cluster 5 years PITTSBURGH 3,400 disks -
Kratka Povijest Unixa Od Unicsa Do Freebsda I Linuxa
Kratka povijest UNIXa Od UNICSa do FreeBSDa i Linuxa 1 Autor: Hrvoje Horvat Naslov: Kratka povijest UNIXa - Od UNICSa do FreeBSDa i Linuxa Licenca i prava korištenja: Svi imaju pravo koristiti, mijenjati, kopirati i štampati (printati) knjigu, prema pravilima GNU GPL licence. Mjesto i godina izdavanja: Osijek, 2017 ISBN: 978-953-59438-0-8 (PDF-online) URL publikacije (PDF): https://www.opensource-osijek.org/knjige/Kratka povijest UNIXa - Od UNICSa do FreeBSDa i Linuxa.pdf ISBN: 978-953- 59438-1- 5 (HTML-online) DokuWiki URL (HTML): https://www.opensource-osijek.org/dokuwiki/wiki:knjige:kratka-povijest- unixa Verzija publikacije : 1.0 Nakalada : Vlastita naklada Uz pravo svakoga na vlastito štampanje (printanje), prema pravilima GNU GPL licence. Ova knjiga je napisana unutar inicijative Open Source Osijek: https://www.opensource-osijek.org Inicijativa Open Source Osijek je član udruge Osijek Software City: http://softwarecity.hr/ UNIX je registrirano i zaštićeno ime od strane tvrtke X/Open (Open Group). FreeBSD i FreeBSD logo su registrirani i zaštićeni od strane FreeBSD Foundation. Imena i logo : Apple, Mac, Macintosh, iOS i Mac OS su registrirani i zaštićeni od strane tvrtke Apple Computer. Ime i logo IBM i AIX su registrirani i zaštićeni od strane tvrtke International Business Machines Corporation. IEEE, POSIX i 802 registrirani i zaštićeni od strane instituta Institute of Electrical and Electronics Engineers. Ime Linux je registrirano i zaštićeno od strane Linusa Torvaldsa u Sjedinjenim Američkim Državama. Ime i logo : Sun, Sun Microsystems, SunOS, Solaris i Java su registrirani i zaštićeni od strane tvrtke Sun Microsystems, sada u vlasništvu tvrtke Oracle. Ime i logo Oracle su u vlasništvu tvrtke Oracle. -
LSF ’07: 2007 Linux Storage & Filesystem Workshop Namically Adjusted According to the Current Popularity of Its Hot Zone
June07login1summaries_press.qxd:login summaries 5/27/07 10:27 AM Page 84 PRO: A Popularity-Based Multi-Threaded Reconstruction overhead of PRO is O(n), although if a priority queue is Optimization for RAID-Structured Storage Systems used in the PRO algorithm the computation overhead can Lei Tian and Dan Feng, Huazhong University of Science and be reduced to O(log n). The entire PRO implementation in Technology; Hong Jiang, University of Nebraska—Lincoln; Ke the RAIDFrame software only added 686 lines of code. Zhou, Lingfang Zeng, Jianxi Chen, and Zhikun Wang, Hua- Work on PRO is ongoing. Future work includes optimiz - zhong University of Science and Technology and Wuhan ing the time slice, scheduling strategies, and hot zone National Laboratory for Optoelectronics; Zhenlei Song, length. Currently, PRO is being ported into the Linux soft - Huazhong University of Science and Technology ware RAID. Finally, the authors plan on further investigat - ing use of access patterns to help predict user accesses and Hong Jiang began his talk by discussing the importance of of filesystem semantic knowledge to explore accurate re - data recovery. Disk failures have become more common in construction. RAID-structured storage systems. The improvement in disk capacity has far outpaced improvements in disk band - The first questioner asked about the average rate of recov - width, lengthening the overall RAID recovery time. Also, ery for PRO. Hong answered that the average reconstruc - disk drive reliability has improved slowly, resulting in a tion time is several hundred seconds in the experimental very high overall failure rate in a large-scale RAID storage setup. -
Lustrefs and Its Ongoing Evolution for High Performance Computing and Data Analysis Solutions
LustreFS and its ongoing Evolution for High Performance Computing and Data Analysis Solutions Roger Goff Senior Product Manager DataDirect Networks, Inc. 2014 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved. What is Lustre? Parallel/shared file system for clusters Aggregates many systems into one file system Hugely scalable - 100s of GB/s throughput, 10s of thousands of native clients Scales I/O throughput and capacity Widely adopted in HPC community Open source, community driven development Supports RDMA over high speed low latency interconnects and Ethernet Runs on commodity storage and purpose built HPC storage platforms Abstracts hardware view of storage from software Highly available 2 2014 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved. Lustre Node Types Management Servers (MGS) Object Storage Servers (OSS) Meta-Data Servers (MDS) Clients LNET Routers 3 2014 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved. Lustre MGS ManaGement Server Server node which manages cluster configuration database All clients, OSS and MDS need to know how to contact the MGS MGS provides all other components with information about the Lustre file system(s) it manages MGS can be separated from or co-located with MDS 4 2014 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved. Lustre MDS Meta Data Server Manages the names and directories in the Lustre file system Stores metadata (such as filenames, directories, permissions and file layout) on a MetaData Target (MDT) Usually MDSs are configured in an active/passive failover pair Lustre inode ext3 inode 5 2014 Storage Developer Conference. © Insert Your Company Name. -
Sun Storedge™ QFS and Sun Storedge SAM-FS Software Installation and Configuration Guide
Sun StorEdge™ QFS and Sun StorEdge SAM-FS Software Installation and Configuration Guide Release 4.2 Sun Microsystems, Inc. www.sun.com Part No. 817-7722-10 September 2004, Revision A Submit comments about this document at: http://www.sun.com/hwdocs/feedback Copyright 2004 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries. This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Solaris, SunOS, SunSolve, Java, JavaScript, Solstice DiskSuite, and StorEdge are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. -
View the Slides
SMB3.1.1 POSIX Protocol Extensions: Summary and Current Implementation Status Steve French Azure Storage – Microsoft Samba Team And SMB Jeremy Allison Google/Samba Team 3.1.1 Legal Statement This work represents the views of the author(s) and does not necessarily reflect the views of Microsoft or Google Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be trademarks or service marks of others. Outline Linux is a lot more than POSIX ... Why do these extensions matter? Implementation Status What works today? Some details How to handle Linux continuing to extend APIs? Wireshark and Tracing Linux > POSIX Currently huge number of syscalls! (try “git grep SYSCALL_DEFINE” well over 850 and 500+ are even documented “man syscalls” FS layer has 223). Verified today vs Only about 100 POSIX API calls 513 syscalls with man pages! +12 just since last year’s SDC! Some examples of new fs ones from past 9 months ... Syscall name Kernel Version introduced io_uring_enter 5.1 io_uring_register 5.1 io_uring_setup 5.1 move_mount 5.2 open_tree 5.2 fsconfig 5.2 fsmount 5.2 fsopen 5.2 fspick 5.2 Repeating an old slide ... Remember LINUX > POSIX And not just new syscalls … new flags ... 2 examples of richer Linux vs. simpler POSIX fallocate has 7 flags – Insert range – Unshare range – Zero range – Keep size – But POSIX fallocate has no flags Rename (renameat2) has 3 flags – noreplace, whiteout and exchange – POSIX rename has none Network File systems matter ● these extensions to most popular network fs protocol (SMB3) are important ● block devices struggle to do file system tasks: locking, security, leases, consistent metadata Linux Apps need to work over network mounts and continue to work as Linux evolves Improve common situations where customers have Linux and Windows and Mac clients Make sure extensions work with most secure, most optimal SMB3.1.1 dialect (don’t encourage less secure network file systems, or even SMB1/CIFS) Quick Overview of Status ● Linux kernel client: – 5.1 kernel or later can be used. -
Creating a Hierarchical Database Backup System Using Oracle RMAN and Oracle SAM QFS with the Sun ZFS Storage Appliance
An Oracle White Paper August 2011 Creating a Hierarchical Database Backup System using Oracle RMAN and Oracle SAM QFS with the Sun ZFS Storage Appliance Creating a Hierarchical Database Backup System using Oracle RMAN and Oracle SAM QFS with the Sun ZFS Storage Appliance Introduction ......................................................................................... 2 Overview of the Hierarchical Database Backup Solution.................... 2 Physical Architecture of a Hierarchical Database Backup System ..... 3 Hierarchical Database Backup System Logical Architecture .............. 5 SAM Server Configuration .............................................................. 6 Configuring the Oracle RMAN Server............................................. 9 Oracle RMAN Configuration ......................................................... 10 Performance Considerations ............................................................ 11 Sizing Oracle RMAN Backups .......................................................... 12 Additional Considerations ................................................................. 14 Conclusion ........................................................................................ 15 Appendix A. Reference Scripts and Configurations.......................... 16 Appendix B. Solution Component Features...................................... 19 Oracle Recovery Manager ............................................................ 19 Sun Quick File System and Storage Archive Manager................. 20 Sun ZFS -
Collocated Data Deduplication for Virtual Machine Backup in the Cloud
UNIVERSITY OF CALIFORNIA Santa Barbara Collocated Data Deduplication for Virtual Machine Backup in the Cloud A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Wei Zhang Committee in Charge: Professor Tao Yang, Chair Professor Jianwen Su Professor Rich Wolski September 2014 The Dissertation of Wei Zhang is approved: Professor Jianwen Su Professor Rich Wolski Professor Tao Yang, Committee Chairperson January 2014 Collocated Data Deduplication for Virtual Machine Backup in the Cloud Copyright c 2014 by Wei Zhang iii To my family. iv Acknowledgements First of all, I want to take this opportunity to thank my advisor Professor Tao Yang for his invaluable advice and instructions to me on selecting interesting and challenging research topics, identifying specific problems for each research phase, as well as prepar- ing for my future career. I would also like to thank my committee members, Professor Rich Wolski and Professor Jianwen Su, for their pertinent and helpful instructions on my research work. I also want to thank members and ex-members of my research group, Michael Agun, Gautham Narayanasamy, Prakash Chandrasekaran, Xiaofei Du, and Hong Tang, for their incessant support in the forms of technical discussion, system co-development, cluster administration, and other research activities. I owe my deepest gratitude to my parents, for their love and support, which give me the confidence and energy to overcome all past, current, and future difficulties. Finally and most importantly, I would like to express my gratitude beyond words to my wife, Jiayin, who has been encouraging and inspiring me since we met in love.