
The Pennsylvania State University
The Graduate School
College of Engineering

A PROTEAN ATTACK ON THE COMPUTE-STORAGE GAP IN HIGH-PERFORMANCE COMPUTING

A Dissertation in
Computer Science and Engineering
by
Ellis H. Wilson III

© 2014 Ellis H. Wilson III

Submitted in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy

August 2014

The dissertation of Ellis H. Wilson III was reviewed and approved∗ by the following:

Mahmut T. Kandemir
Professor of Computer Science and Engineering
Dissertation Advisor, Chair of Committee

Padma Raghavan
Professor of Computer Science and Engineering

Wang-Chien Lee
Professor of Computer Science and Engineering

Christopher Duffy
Professor of Civil Engineering

Lee Coraor
Associate Professor of Computer Science and Engineering
Director of Academic Affairs of Computer Science and Engineering

∗Signatures are on file in the Graduate School.

Abstract

Distributed computing systems, in particular supercomputers, have facilitated a significant acceleration in scientific progress over the last three-quarters of a century by enabling scientists to ask questions whose answers were previously intractable. Looking at historical data over the last 20 years for the top supercomputers in the world, we note that they have demonstrated an amazing doubling in performance every 13.5 months, well in excess of Moore's law. Moreover, as these machines grow in computational power, the magnetic hard disk drives (HDDs) they rely upon to store and retrieve data double in capacity roughly every 18 months. These facts considered in concert provide a foundation for the recent data-driven revolution in the way both scientists and businesses extract useful knowledge from their ever-increasing datasets. However, while the computation and capacity potential of these machines is growing at a breathless rate, a disturbing but oft-ignored reality is that the ability to access the data on a given HDD is shrinking by comparison, doubling only once every decade. In short, this means that although the capability to process and store the data scientists and businesses are so excited about is here, the ability to access that data (a prerequisite for processing it) falls further behind year in and year out. Therefore, the focus of this thesis is to find ways to limit or close the annually widening compute-to-bandwidth gap, specifically for systems at scale such as supercomputers and the cloud. Recognizing that this problem requires improvement at numerous levels of the storage stack, we take a protean approach to seeking and implementing solutions. Specifically, we attack this problem by researching ways to 1) consolidate our storage devices to maximize aggregate bandwidth while enabling best-of-breed analytic approaches, 2) determine optimal data-reduction techniques such as deduplication and compression in the face of a sea of data and a lack of existing analysis tools, and 3) design novel algorithms to overcome longevity shortcomings in state-of-the-art alternatives to magnetic storage such as flash-based solid-state disks (SSDs).

Table of Contents

List of Figures
List of Tables
Acknowledgments
Gopher Guts

Chapter 1  Introduction
  1.1 Problem Statement
  1.2 Thesis Statement
    1.2.1 Data Consolidation
    1.2.2 Data Reduction
    1.2.3 Storage Device Improvement

Chapter 2  Data Consolidation: Enabling Big Data Computation atop Traditional HPC NAS Storage
  2.1 Introduction
    2.1.1 The NAS and HPC Narrative
    2.1.2 Contributions
  2.2 Background
    2.2.1 Overview of HDFS
      2.2.1.1 Replication in HDFS
  2.3 Architectures Explored
  2.4 Reliability Analysis
    2.4.1 Failure in NAS
    2.4.2 Failure in Hadoop
    2.4.3 Combining the Architectures
    2.4.4 Why Not Just NAS?
  2.5 Data Locality and Transport
    2.5.1 Write Transport
    2.5.2 Read Transport
  2.6 RainFS
    2.6.1 Design Desiderata
    2.6.2 Implementation Overview
    2.6.3 File Operations
      2.6.3.1 Create
      2.6.3.2 Delete
      2.6.3.3 Move
    2.6.4 Failure Handling
  2.7 Evaluation
    2.7.1 Experimental Setup
    2.7.2 Benchmarks
    2.7.3 Results
  2.8 Related Works
  2.9 Conclusion

Chapter 3  Data Reduction: Scalable Deduplication and Compression Evaluation
  3.1 Introduction
  3.2 Background
  3.3 Compression
  3.4 Deduplication
  3.5 Design of TreeChunks
  3.6 Exemplary Evaluation
  3.7 Conclusion

Chapter 4  Data Storage Improvement: Extending SSD Longevity
  4.1 Introduction
  4.2 Background
    4.2.1 SSD Architecture
    4.2.2 NAND Flash Overview
    4.2.3 The Physics of Cell Wear-Out
  4.3 Wear-Unleveling for Lifetime
    4.3.1 The Basics of Wear-Leveling
    4.3.2 The Early Switching Pool
  4.4 Evaluation
    4.4.1 Simulation Framework
    4.4.2 Experimental Setup
    4.4.3 Synthetic Results
    4.4.4 Trace-Driven Results
  4.5 Related Work
  4.6 Conclusion

Chapter 5  Conclusion

Bibliography

List of Figures

1.1 Performance results on HPLinpack for the Top500 supercomputers
1.2 HDD capacity and bandwidth growth over the years
2.1 Flow of write I/Os in traditional HDFS
2.2 Write I/O flows in explored architectures
2.3 Errant pass-through I/O flows
2.4 TeraSort Suite benchmark results for 1 and 2 replicas
2.5 Throughput impact on write-intensive workloads when going from a replication level of 1 to 2
3.1 Impact of chunking on data compressibility
3.2 Compression efficacy filesystem map
3.3 Deduplication efficacy filesystem map
4.1 ZombieNAND proof-of-concept on raw flash chips
4.2 Typical SSD architecture
4.3 Lifetime and latency synthetic evaluation for TLC SSDs
4.4 Lifetime and latency synthetic evaluation for MLC SSDs
4.5 Trace address reuse CDF
4.6 Impact of ZombieNAND on lifetime for trace-driven evaluation
4.7 Impact of ZombieNAND on latency for trace-driven workloads

List of Tables

2.1 Disk and rack failure tolerance by architecture
2.2 Hardware and VM resources
4.1 Access latency based on operation type and bit-level
4.2 Experimental configurations
4.3 Trace access composition

Acknowledgments

I proceed with great caution in writing the acknowledgments for my Ph.D. below. On the one hand, it would be utterly callous not to pen down some form of thanks to the many people who have contributed in one way or another towards the successful completion of this milestone in my life. On the other hand, and perhaps what I fear more, it seems almost impossible to enumerate and thank each and every influence that has brought me to this moment in time. Therefore, to whomever I errantly exclude, please forgive the flighty memory of an academic.

This dissertation and my Ph.D. on the whole are in no small part attributable to the interactions with and inspiration from three distinct groups in my life: fellow academics, key figures in my distance running career, and my friends and family, which I address in order below.
Please note that persons addressed within each category generally appear in order of first chronological impact; no assignment of greater or lesser import should be derived from their ordering.

While my love for computing in general stretches as far back as my very early teens, my interest in high-performance distributed computing was born during my undergraduate sophomore year when I read “Engineering a Beowulf-Style Compute Cluster,” a free online book by Robert G. Brown (rgb). Although I cannot recall how I first stumbled upon it, my thanks must be given to rgb for writing (and freely providing) the text, and to those on the Beowulf mailing list for their helpful support and advice as my curiosity for distributed computing first blossomed.

This curiosity first struggled to find appropriate soil (i.e., free machines for me to build a cluster with) until I was introduced to Michael Prushan, a professor of chemistry at my undergraduate institution, La Salle University. During my junior and senior years, as I learned by trial-and-error how to build a cluster, he was an unparalleled research advisor with a lust for problem-solving and teaching I will never forget. In working with him I was offered my first exposure to interdisciplinary research, an experience every computer scientist must have to fully appreciate and understand the scope of his own field. Equally critically, he introduced me to Robert Levis, another chemistry professor and director of Temple University’s Center for Advanced Photonics Research (CAPR). It was during my undergraduate internships with Robert at CAPR that I managed my first sizable cluster, learned (again, mostly by trial-and-error) to write applications that would scale, was exposed to the basics of machine learning, and came to grips with the raw difficulty of the problems academics face (and fall in love with) on a day-to-day basis. This problem difficulty, and the corresponding progress by tiny, deliberate steps, was addictive, and a key instigator in my application to Ph.D. programs. My sincere thanks to you both for your advisement and education in what I consider my first years as a scientist.

To this day it is not entirely clear to me why my doctoral advisor, Mahmut Kandemir, called me with an acceptance to a top-thirty computer science Ph.D.