Making a Case for Distributed File Systems at Exascale

1,2Ioan Raicu, 2,3,4Ian T. Foster, 2,3,5Pete Beckman

1 Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
3 Computation Institute, University of Chicago, Chicago, IL, USA
4 Department of Computer Science, University of Chicago, Chicago, IL, USA
5 Exascale Technology and Computing Institute, Argonne National Laboratory, Argonne, IL, USA
[email protected], [email protected], [email protected]

ABSTRACT
Exascale computers will enable the unraveling of significant scientific mysteries. Predictions are that 2019 will be the year of exascale, with millions of compute nodes and billions of threads of execution. The current architecture of high-end computing systems is decades old and has persisted as we scaled from gigascale to petascale. In this architecture, storage is completely segregated from the compute resources and is connected via a network interconnect. This approach will not scale several orders of magnitude in terms of concurrency and throughput, and will thus prevent the move from petascale to exascale.
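To make the throughput gap concrete, the following back-of-envelope sketch compares checkpoint times under a segregated parallel file system versus node-local distributed storage. All figures here (node count, per-node memory, and bandwidths) are illustrative assumptions chosen for this sketch, not measurements from the paper.

```python
# Back-of-envelope comparison of checkpoint time at exascale:
# a segregated, shared parallel file system vs. distributed,
# node-local storage. All numbers are illustrative assumptions.

NODES = 1_000_000            # assumed exascale node count
MEM_PER_NODE_GB = 64         # assumed memory checkpointed per node
PFS_BANDWIDTH_GBS = 1_000    # assumed aggregate parallel file system bandwidth (1 TB/s)
LOCAL_BANDWIDTH_GBS = 1      # assumed per-node local storage bandwidth

total_data_gb = NODES * MEM_PER_NODE_GB

# Segregated storage: every node funnels data over the interconnect
# into one shared file system, so its aggregate bandwidth is the bottleneck.
t_segregated = total_data_gb / PFS_BANDWIDTH_GBS

# Distributed storage: each node writes to its own local device,
# so aggregate bandwidth grows linearly with node count.
t_distributed = MEM_PER_NODE_GB / LOCAL_BANDWIDTH_GBS

print(f"Total checkpoint data:  {total_data_gb / 1e6:.0f} PB")
print(f"Segregated file system: {t_segregated:,.0f} s")
print(f"Node-local storage:     {t_distributed:,.0f} s")
print(f"Gap: ~{t_segregated / t_distributed:,.0f}x")
```

Under these assumed numbers, the segregated design takes roughly 64,000 seconds per checkpoint while the distributed design takes about 64 seconds, a gap of three orders of magnitude, which is the kind of scaling shortfall the abstract refers to.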
1. INTRODUCTION
Today's science is generating datasets that are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. Seymour Cray once said that "a supercomputer is a device for turning compute-bound problems into I/O-bound problems," a remark that captures the fundamental shift in bottlenecks: as supercomputers gain more parallelism at exponential rates [1], the performance of the storage infrastructure is increasing at a significantly lower rate. This implies that the data management and data flow between the storage and compute resources is