Solving the HPC I/O Bottleneck: Sun Lustre Storage System
SOLVING THE HPC I/O BOTTLENECK: SUN™ LUSTRE™ STORAGE SYSTEM

Sean Cochrane, Global HPC Sales
Ken Kutzer, HPC Marketing
Lawrence McIntosh, Engineering Solutions Group

Sun BluePrints™ Online
Part No 820-7664-20
Revision 2.0, 11/12/09
Sun Microsystems, Inc.

Table of Contents

Solving the HPC I/O Bottleneck: Sun Lustre Storage System
Target Environments
The Lustre File System
  Lustre File System Design
  Sun and Open Storage
Sun Lustre Storage System Overview
  Design Considerations
  Hardware Components
    HA MDS Module
    Standard OSS Module
    HA OSS Module
  Software Components
Performance Evaluation
  HA OSS Testing and Results
    HA OSS Benchmark Configuration
    RAID and Disk Configuration
    IOzone Benchmark Runs
    Sample IOzone Benchmark Output
  Standard OSS Testing and Results
    Standard OSS Benchmark Configuration
    IOzone Benchmark Runs
    IOzone Benchmark Output
Proven Scalability
  CLUMEQ Supercomputing Consortium
  Texas Advanced Computer Center (TACC)
Summary
About the Authors
Acknowledgements
References
Ordering Sun Documents
Accessing Sun Documentation Online

Solving the HPC I/O Bottleneck: Sun™ Lustre™ Storage System

Much of the focus of high performance computing (HPC) has traditionally centered on CPU performance. However, as computing requirements have grown, HPC clusters are demanding increasingly higher rates of aggregate data throughput.
With ongoing increases in CPU performance and the availability of multiple cores per socket, many clusters can now generate I/O loads that a few years ago were observed only in very large systems. Traditional shared file systems, such as NFS, were not originally designed to scale to the required levels of performance of today’s clusters. As a parallel or clustered file system, the Lustre™ file system can aggregate I/O across a number of individual storage devices and provide parallel data access that far exceeds the performance of monolithic storage devices. By providing shared file system access for hundreds or even thousands of nodes, the Lustre file system enables the creation of a storage solution that can provide the high aggregate I/O bandwidth required by HPC applications in areas such as manufacturing, electronic design, government, and research.

Note: This Sun BluePrints™ article is an updated version of an article by the same title originally published in April 2009. Specifically, this article contains updated performance results for the High Availability Object Storage Server (HA OSS) module used in the Lustre™ file system implementation. This new HA OSS module uses two Sun Fire™ X4270 servers, each with two quad-core Intel® Xeon® 5500 series (Nehalem) processors and each configured with 24 GB RAM; Quad Data Rate (QDR) InfiniBand; and Lustre 1.8 file system software.

This paper describes the Sun™ Lustre Storage System, a simple-to-deploy storage environment based on the Lustre file system, Sun Fire™ servers and Sun Open Storage platforms:
• “Target Environments” introduces target environments for the Sun Lustre Storage System.
• “The Lustre File System” provides an overview of the Lustre file system.
• “Sun Lustre Storage System Overview” introduces the Sun Lustre Storage System, including design considerations and hardware and software components.
• “Performance Evaluation” details data obtained from a performance evaluation of the Sun Lustre Storage System.

Target Environments

High performance computing covers a diverse set of markets, including education, research, weather and climate forecasting, financial modeling, biosciences, seismic processing, computer-aided engineering, and digital content creation, to name a few. The focus of this paper is deploying very high bandwidth storage solutions with the Sun Lustre Storage System.

The Sun Lustre Storage System is a very high performance and extremely scalable storage solution for serving compute clusters or grids requiring high aggregate I/O bandwidth. The Sun Lustre Storage System combines the open source Lustre parallel file system, Sun Fire servers, and Sun Open Storage products. The result is a simple-to-deploy parallel storage solution that delivers sustained performance ranging from a few gigabytes per second to over 200 GB/sec, capacity scaling to tens of petabytes, and a compelling price to performance ratio. This storage solution is generally deployed using InfiniBand interconnects, but can also be deployed using Gigabit Ethernet or 10 Gigabit Ethernet infrastructure.

While not covered in this paper, readers should be aware of the following Sun solutions that may be well suited for data sets that are not subject to the high bandwidth I/O needs outlined later in the document.

• Sun Storage 7000 Unified Storage System
  Sun Storage 7000 Unified Storage Systems are simple-to-use storage appliances designed to deliver leading performance via traditional file sharing protocols such as NFS and CIFS at a radically new price point.
  Developed using open source software and industry standard components, the Unified Storage family of storage products installs in minutes and provides simple-to-use yet very powerful analytic capabilities that allow sophisticated performance management. These products are often accessed via NFS or CIFS but incorporate flash technology to provide a performance profile that exceeds typical NFS server products. The Unified Storage product family can be used with Gigabit Ethernet, 10 Gigabit Ethernet, or InfiniBand interconnects. For more on the Sun Storage 7000 Unified Storage Systems, see http://www.sun.com/storage/disk_systems/unified_storage.

• Sun Archive
  Many sites need to retain very large volumes of data in the most economical fashion and facilitate storing that data as well as recalling it for future projects. Sun provides a full set of solutions to address the massive data problem that many sites are facing. Sun provides archiving products to over 48% of the top 50 supercomputers as ranked by top500.org on the June 2009 listing. For more information on Sun’s archiving solutions, see http://www.sun.com/storage/hpc/ and http://www.sun.com/storage/archive.

The Lustre File System

The Lustre file system is an open source shared file system designed to address the I/O needs of compute clusters containing up to thousands of nodes. It is best known for powering the largest HPC clusters in the world, with tens of thousands of client systems, petabytes (PB) of storage, and hundreds of gigabytes per second (GB/sec) of I/O throughput. A number of HPC sites use the Lustre file system as a site-wide global file system, servicing clusters on an unprecedented scale. The Lustre file system is used by 62% of the top 50 supercomputers as ranked by top500.org on the June 2009 listing. Additionally, IDC lists the Lustre file system as the file system with the largest market share in HPC. (Source: IDC’s