Scaling Implicit Parallelism Via Dynamic Control Replication
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Quantum Dxi4500 User's Guide
User’s Guide User’s Guide User’s Guide User’s Guide QuantumQuantum DXDXii45004500 DX i -Series 6-66904-01 Rev C DXi4500 User’s Guide, 6-66904-01 Rev C, August 2010, Product of USA. Quantum Corporation provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Quantum Corporation may revise this publication from time to time without notice. COPYRIGHT STATEMENT © 2010 Quantum Corporation. All rights reserved. Your right to copy this manual is limited by copyright law. Making copies or adaptations without prior written authorization of Quantum Corporation is prohibited by law and constitutes a punishable violation of the law. TRADEMARK STATEMENT Quantum, the Quantum logo, DLT, DLTtape, the DLTtape logo, Scalar, and StorNext are registered trademarks of Quantum Corporation, registered in the U.S. and other countries. Backup. Recovery. Archive. It’s What We Do., the DLT logo, DLTSage, DXi, DXi-Series, Dynamic Powerdown, FastSense, FlexLink, GoVault, MediaShield, Optyon, Pocket-sized. Well-armored, SDLT, SiteCare, SmartVerify, StorageCare, Super DLTtape, SuperLoader, and Vision are trademarks of Quantum. LTO and Ultrium are trademarks of HP, IBM, and Quantum in the U.S. and other countries. All other trademarks are the property of their respective companies. Specifications are subject to change without notice. ii Quantum DXi4500 User’s Guide Contents Preface xv Chapter 1 DXi4500 System Description 1 Overview . 1 Features and Benefits . 3 Data Reduction . 3 Data Deduplication . 3 Compression. 4 Space Reclamation . 4 Remote Replication . 5 DXi4500 System . -
Replication Using Group Communication Over a Partitioned Network
Replication Using Group Communication Over a Partitioned Network Thesis submitted for the degree “Doctor of Philosophy” Yair Amir Submitted to the Senate of the Hebrew University of Jerusalem (1995). This work was carried out under the supervision of Professor Danny Dolev ii Acknowledgments I am deeply grateful to Danny Dolev, my advisor and mentor. I thank Danny for believing in my research, for spending so many hours on it, and for giving it the theoretical touch. His warm support and patient guidance helped me through. I hope I managed to adopt some of his professional attitude and integrity. I thank Daila Malki for her help during the early stages of the Transis project. Thanks to Idit Keidar for helping me sharpen some of the issues of the replication server. I enjoyed my collaboration with Ofir Amir on developing the coloring model of the replication server. Many thanks to Roman Vitenberg for his valuable insights regarding the extended virtual synchrony model and the replication algorithm. I benefited a lot from many discussions with Ahmad Khalaila regarding distributed systems and other issues. My thanks go to David Breitgand, Gregory Chokler, Yair Gofen, Nabil Huleihel and Rimon Orni, for their contribution to the Transis project and to my research. I am grateful to Michael Melliar-Smith and Louise Moser from the Department of Electrical and Computer Engineering, University of California, Santa Barbara. During two summers, several mutual visits and extensive electronic correspondence, Louise and Michael were involved in almost every aspect of my research, and unofficially served as my co-advisors. The work with Deb Agarwal and Paul Ciarfella on the Totem protocol contributed a lot to my understanding of high-speed group communication. -
Replication Vs. Caching Strategies for Distributed Metadata Management in Cloud Storage Systems
ShardFS vs. IndexFS: Replication vs. Caching Strategies for Distributed Metadata Management in Cloud Storage Systems Lin Xiao, Kai Ren, Qing Zheng, Garth A. Gibson Carnegie Mellon University {lxiao, kair, zhengq, garth}@cs.cmu.edu Abstract 1. Introduction The rapid growth of cloud storage systems calls for fast and Modern distributed storage systems commonly use an archi- scalable namespace processing. While few commercial file tecture that decouples metadata access from file read and systems offer anything better than federating individually write operations [21, 25, 46]. In such a file system, clients non-scalable namespace servers, a recent academic file sys- acquire permission and file location information from meta- tem, IndexFS, demonstrates scalable namespace processing data servers before accessing file contents, so slower meta- based on client caching of directory entries and permissions data service can degrade performance for the data path. Pop- (directory lookup state) with no per-client state in servers. In ular new distributed file systems such as HDFS [25] and the this paper we explore explicit replication of directory lookup first generation of the Google file system [21] have used cen- state in all servers as an alternative to caching this infor- tralized single-node metadata services and focused on scal- mation in all clients. Both eliminate most repeated RPCs to ing only the data path. However, the single-node metadata different servers in order to resolve hierarchical permission server design limits the scalability of the file system in terms tests. Our realization for server replicated directory lookup of the number of objects stored and concurrent accesses to state, ShardFS, employs a novel file system specific hybrid the file system [40]. -
Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++
Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++ Christopher S. Zakian, Timothy A. K. Zakian Abhishek Kulkarni, Buddhika Chamith, and Ryan R. Newton Indiana University - Bloomington, fczakian, tzakian, adkulkar, budkahaw, [email protected] Abstract. Library and language support for scheduling non-blocking tasks has greatly improved, as have lightweight (user) threading packages. How- ever, there is a significant gap between the two developments. In previous work|and in today's software packages|lightweight thread creation incurs much larger overheads than tasking libraries, even on tasks that end up never blocking. This limitation can be removed. To that end, we describe an extension to the Intel Cilk Plus runtime system, Concurrent Cilk, where tasks are lazily promoted to threads. Concurrent Cilk removes the overhead of thread creation on threads which end up calling no blocking operations, and is the first system to do so for C/C++ with legacy support (standard calling conventions and stack representations). We demonstrate that Concurrent Cilk adds negligible overhead to existing Cilk programs, while its promoted threads remain more efficient than OS threads in terms of context-switch overhead and blocking communication. Further, it enables development of blocking data structures that create non-fork-join dependence graphs|which can expose more parallelism, and better supports data-driven computations waiting on results from remote devices. 1 Introduction Both task-parallelism [1, 11, 13, 15] and lightweight threading [20] libraries have become popular for different kinds of applications. The key difference between a task and a thread is that threads may block|for example when performing IO|and then resume again. -
Replication As a Prelude to Erasure Coding in Data-Intensive Scalable Computing
DiskReduce: Replication as a Prelude to Erasure Coding in Data-Intensive Scalable Computing Bin Fan Wittawat Tantisiriroj Lin Xiao Garth Gibson fbinfan , wtantisi , lxiao , garth.gibsong @ cs.cmu.edu CMU-PDL-11-112 October 2011 Parallel Data Laboratory Carnegie Mellon University Pittsburgh, PA 15213-3890 Acknowledgements: We would like to thank several people who made significant contributions. Robert Chansler, Raj Merchia and Hong Tang from Yahoo! provide various help. Dhruba Borthakur from Facebook provides statistics and feedback. Bin Fu and Brendan Meeder gave us their scientific applications and data-sets for experimental evaluation. The work in this paper is based on research supported in part by the Betty and Gordon Moore Foundation, by the Department of Energy, under award number DE-FC02-06ER25767, by the Los Alamos National Laboratory, by the Qatar National Research Fund, under award number NPRP 09-1116-1-172, under contract number 54515- 001-07, by the National Science Foundation under awards CCF-1019104, CCF-0964474, OCI-0852543, CNS-0546551 and SCI-0430781, and by Google and Yahoo! research awards. We also thank the members and companies of the PDL Consortium (including APC, EMC, Facebook, Google, Hewlett-Packard, Hitachi, IBM, Intel, Microsoft, NEC, NetApp, Oracle, Panasas, Riverbed, Samsung, Seagate, STEC, Symantec, and VMware) for their interest, insights, feedback, and support. Keywords: Distributed File System, RAID, Cloud Computing, DISC Abstract The first generation of Data-Intensive Scalable Computing file systems such as Google File System and Hadoop Distributed File System employed n (n ≥ 3) replications for high data reliability, therefore delivering users only about 1=n of the total storage capacity of the raw disks. -
Parallel Computing a Key to Performance
Parallel Computing A Key to Performance Dheeraj Bhardwaj Department of Computer Science & Engineering Indian Institute of Technology, Delhi –110 016 India http://www.cse.iitd.ac.in/~dheerajb Dheeraj Bhardwaj <[email protected]> August, 2002 1 Introduction • Traditional Science • Observation • Theory • Experiment -- Most expensive • Experiment can be replaced with Computers Simulation - Third Pillar of Science Dheeraj Bhardwaj <[email protected]> August, 2002 2 1 Introduction • If your Applications need more computing power than a sequential computer can provide ! ! ! ❃ Desire and prospect for greater performance • You might suggest to improve the operating speed of processors and other components. • We do not disagree with your suggestion BUT how long you can go ? Can you go beyond the speed of light, thermodynamic laws and high financial costs ? Dheeraj Bhardwaj <[email protected]> August, 2002 3 Performance Three ways to improve the performance • Work harder - Using faster hardware • Work smarter - - doing things more efficiently (algorithms and computational techniques) • Get help - Using multiple computers to solve a particular task. Dheeraj Bhardwaj <[email protected]> August, 2002 4 2 Parallel Computer Definition : A parallel computer is a “Collection of processing elements that communicate and co-operate to solve large problems fast”. Driving Forces and Enabling Factors Desire and prospect for greater performance Users have even bigger problems and designers have even more gates Dheeraj Bhardwaj <[email protected]> -
Middleware-Based Database Replication: the Gaps Between Theory and Practice
Appears in Proceedings of the ACM SIGMOD Conference, Vancouver, Canada (June 2008) Middleware-based Database Replication: The Gaps Between Theory and Practice Emmanuel Cecchet George Candea Anastasia Ailamaki EPFL EPFL & Aster Data Systems EPFL & Carnegie Mellon University Lausanne, Switzerland Lausanne, Switzerland Lausanne, Switzerland [email protected] [email protected] [email protected] ABSTRACT There exist replication “solutions” for every major DBMS, from Oracle RAC™, Streams™ and DataGuard™ to Slony-I for The need for high availability and performance in data Postgres, MySQL replication and cluster, and everything in- management systems has been fueling a long running interest in between. The naïve observer may conclude that such variety of database replication from both academia and industry. However, replication systems indicates a solved problem; the reality, academic groups often attack replication problems in isolation, however, is the exact opposite. Replication still falls short of overlooking the need for completeness in their solutions, while customer expectations, which explains the continued interest in developing new approaches, resulting in a dazzling variety of commercial teams take a holistic approach that often misses offerings. opportunities for fundamental innovation. This has created over time a gap between academic research and industrial practice. Even the “simple” cases are challenging at large scale. We deployed a replication system for a large travel ticket brokering This paper aims to characterize the gap along three axes: system at a Fortune-500 company faced with a workload where performance, availability, and administration. We build on our 95% of transactions were read-only. Still, the 5% write workload own experience developing and deploying replication systems in resulted in thousands of update requests per second, which commercial and academic settings, as well as on a large body of implied that a system using 2-phase-commit, or any other form of prior related work. -
Manufacturing Equipment Technologies for Hard Disk's
Manufacturing Equipment Technologies for Hard Disk’s Challenge of Physical Limits 222 Manufacturing Equipment Technologies for Hard Disk’s Challenge of Physical Limits Kyoichi Mori OVERVIEW: To meet the world’s growing demands for volume information, Brian Rattray not just with computers but digitalization and video etc. the HDD must Yuichi Matsui, Dr. Eng. continually increase its capacity and meet expectations for reduced power consumption and green IT. Up until now the HDD has undergone many innovative technological developments to achieve higher recording densities. To continue this increase, innovative new technology is again required and is currently being developed at a brisk pace. The key components for areal density improvements, the disk and head, require high levels of performance and reliability from production and inspection equipment for efficient manufacturing and stable quality assurance. To meet this demand, high frequency electronics, servo positioning and optical inspection technology is being developed and equipment provided. Hitachi High-Technologies Corporation is doing its part to meet market needs for increased production and the adoption of next-generation technology by developing the technology and providing disk and head manufacturing/inspection equipment (see Fig. 1). INTRODUCTION higher efficiency storage, namely higher density HDDS (hard disk drives) have long relied on the HDDs will play a major role. computer market for growth but in recent years To increase density, the performance and quality of there has been a shift towards cloud computing and the HDD’s key components, disks (media) and heads consumer electronics etc. along with a rapid expansion have most effect. Therefore further technological of data storage applications (see Fig. -
Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions
Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions Elliott Slaughter Wonchan Lee Sean Treichler Stanford University Stanford University Stanford University SLAC National Accelerator Laboratory [email protected] NVIDIA [email protected] [email protected] Wen Zhang Michael Bauer Galen Shipman Stanford University NVIDIA Los Alamos National Laboratory [email protected] [email protected] [email protected] Patrick McCormick Alex Aiken Los Alamos National Laboratory Stanford University [email protected] [email protected] ABSTRACT 1 for t = 0, T do We present control replication, a technique for generating high- 1 for i = 0, N do −− Parallel 2 for i = 0, N do −− Parallel performance and scalable SPMD code from implicitly parallel pro- 2 for t = 0, T do 3 B[i] = F(A[i]) grams. In contrast to traditional parallel programming models that 3 B[i] = F(A[i]) 4 end 4 −− Synchronization needed require the programmer to explicitly manage threads and the com- 5 for j = 0, N do −− Parallel 5 A[i] = G(B[h(i)]) munication and synchronization between them, implicitly parallel 6 A[j] = G(B[h(j)]) 6 end programs have sequential execution semantics and by their nature 7 end 7 end avoid the pitfalls of explicitly parallel programming. However, with- 8 end (b) Transposed program. out optimizations to distribute control overhead, scalability is often (a) Original program. poor. Performance on distributed-memory machines is especially sen- F(A[0]) . G(B[h(0)]) sitive to communication and synchronization in the program, and F(A[1]) . G(B[h(1)]) thus optimizations for these machines require an intimate un- . -
A Crash Course in Wide Area Data Replication
A Crash Course In Wide Area Data Replication Jacob Farmer, CTO, Cambridge Computer SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced without modification The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. A Crash Course in Wide Area Data Replication 2 © 2008 Storage Networking Industry Association. All Rights Reserved. Abstract A Crash Course in Wide Area Data Replication Replicating data over a WAN sounds pretty straight-forward, but it turns out that there are literally dozens of different approaches, each with it's own pros and cons. Which approach is the best? Well, that depends on a wide variety of factors! This class is a fast-paced crash course in the various ways in which data can be replicated with some commentary on each major approach. We trace the data path from applications to disk drives and examine all of the points along the way wherein replication logic can be inserted. We look at host based replication (application, database, file system, volume level, and hybrids), SAN replication (disk arrays, virtualization appliances, caching appliances, and storage switches), and backup system replication (block level incremental backup, CDP, and de-duplication). This class is not only the fastest way to understand replication technology it also serves as a foundation for understanding the latest storage virtualization techniques. -
Regent: a High-Productivity Programming Language for Implicit Parallelism with Logical Regions
REGENT: A HIGH-PRODUCTIVITY PROGRAMMING LANGUAGE FOR IMPLICIT PARALLELISM WITH LOGICAL REGIONS A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Elliott Slaughter August 2017 © 2017 by Elliott David Slaughter. All Rights Reserved. Re-distributed by Stanford University under license with the author. This work is licensed under a Creative Commons Attribution- Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/ This dissertation is online at: http://purl.stanford.edu/mw768zz0480 ii I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Alex Aiken, Primary Adviser I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Philip Levis I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Oyekunle Olukotun Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives. iii Abstract Modern supercomputers are dominated by distributed-memory machines. State of the art high-performance scientific applications targeting these machines are typically written in low-level, explicitly parallel programming models that enable maximal performance but expose the user to programming hazards such as data races and deadlocks. -
Replication Under Scalable Hashing: a Family of Algorithms for Scalable Decentralized Data Distribution
Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution R. J. Honicky Ethan L. Miller [email protected] [email protected] Storage Systems Research Center Jack Baskin School of Engineering University of California, Santa Cruz Abstract block management is handled by the disk as opposed to a dedicated server. Because block management on individ- Typical algorithms for decentralized data distribution ual disks requires no inter-disk communication, this redis- work best in a system that is fully built before it first used; tribution of work comes at little cost in performance or ef- adding or removing components results in either exten- ficiency, and has a huge benefit in scalability, since block sive reorganization of data or load imbalance in the sys- layout is completely parallelized. tem. We have developed a family of decentralized algo- There are differing opinions about what object-based rithms, RUSH (Replication Under Scalable Hashing), that storage is or should be, and how much intelligence belongs maps replicated objects to a scalable collection of storage in the disk. In order for a device to be an OSD, however, servers or disks. RUSH algorithms distribute objects to each disk or disk subsystem must have its own filesystem; servers according to user-specified server weighting. While an OSD manages its own allocation of blocks and disk lay- all RUSH variants support addition of servers to the sys- out. To store data on an OSD, a client providesthe data and tem, different variants have different characteristics with a key for the data. To retrieve the data, a client provides a respect to lookup time in petabyte-scale systems, perfor- key.