An Application-Attuned Framework for Optimizing HPC Storage Systems

Arnab Kumar Paul

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Computer Science and Applications

Ali R. Butt, Chair
Ian Foster
Dongyoon Lee
Eli Tilevich
Gang Wang

July 28, 2020
Blacksburg, Virginia

Keywords: Parallel File Systems, Object-Based Storage, Data Management, Load Balancing, Indexing, Metadata Management, High Performance Computing

Copyright 2020, Arnab Kumar Paul

An Application-Attuned Framework for Optimizing HPC Storage Systems

Arnab Kumar Paul

(ABSTRACT)

High performance computing (HPC) is routinely employed in diverse domains, such as the life sciences and geology, to simulate and understand the behavior of complex phenomena. Big data driven scientific simulations are resource intensive and require both computing and I/O capabilities at scale. There is a crucial need to revisit the HPC I/O subsystem to better optimize for, and manage, the increased pressure that big data processing places on the underlying storage systems. Extant HPC storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, but they lack the flexibility to adapt to ever-changing application behaviors. The complex nature of modern HPC storage systems, along with these ever-changing application behaviors, presents unique opportunities and engineering challenges.

In this dissertation, we design and develop a framework for optimizing HPC storage systems by making them application-attuned. We select three different kinds of HPC storage systems: in-memory data analytics frameworks, parallel file systems, and object-based storage systems. We first analyze HPC application I/O behavior by studying real-world I/O traces. Next, we optimize parallelism for applications running in memory; then we design data management techniques for HPC storage systems; and finally we focus on low-level I/O load balance for improving the efficiency of modern HPC storage systems.

In the first part of this dissertation (Chapter 3), we focus on collecting and studying file system statistics from two Livermore Computing systems with 15 PiB file systems at Lawrence Livermore National Laboratory. We add to the state-of-the-art in I/O understanding by providing insight into how general HPC workloads affect the performance of large-scale HPC storage systems. In the second part of this dissertation (Chapter 4), we enable a dynamic partitioning approach to improve task parallelism for in-memory data analytics frameworks. The third part of this dissertation (Chapter 5 and Chapter 6) proposes scalable and novel techniques for data management in HPC storage systems. We design and develop a generalized and scalable storage system monitor, FSMonitor (Chapter 5), for capturing and reporting events on heterogeneous large-scale storage systems. Furthermore, we propose a metadata indexing and search tool, Brindexer (Chapter 6), specifically designed for large-scale HPC storage systems. Brindexer is mainly designed to help system administrators manage the file system effectively. In the fourth part of this dissertation (Chapter 7), we use the findings and tools from the previous parts to develop an "end-to-end control plane" that leverages real-time load information for global data placement across distributed storage servers, while our design model leverages trace-based optimization techniques to minimize I/O request latency between clients and servers.

An Application-Attuned Framework for Optimizing HPC Storage Systems

Arnab Kumar Paul

(GENERAL AUDIENCE ABSTRACT)

Clusters of multiple computers connected over a network are often deployed in industry and laboratories for large-scale data processing or computation that cannot be handled by standalone computers. In such a cluster, resources such as CPUs, memory, and disks are integrated to work together. With the increase in popularity of applications that read and write tremendous amounts of data, we need a large number of disks that can interact effectively in such clusters. These disks form part of high performance computing (HPC) storage systems. Such HPC storage systems are used by a diverse set of applications from organizations in a vast range of domains, from earth sciences, financial services, and telecommunications to the life sciences. Therefore, an HPC storage system should perform well for the different read and write (I/O) requirements of all these different sets of applications. But current HPC storage systems do not cater to such varied I/O requirements. To this end, this dissertation designs and develops a framework for HPC storage systems that is application-attuned and thus provides much better performance than state-of-the-art HPC storage systems without such optimizations.

Dedicated to my family - Ma, Baba, Mann, Didi, Tun da, Jhilik, and Aarush; and to my friends who have become a part of my life, for their support, encouragement, motivation, and love.

Acknowledgments

This dissertation would not have been possible without the support of many people who have become an integral part of my life. I would like to take this time to express my gratitude to these awesome individuals.

My journey down the path of Computer Science started in the eighth standard, when my first computer came to our house. Didi was the one who first inspired me to write computer code to solve complex problems. I also owe my interest in Computer Science hugely to Geeta ma'am and Rama Rao Sir, who instilled in me confidence in coding and digital logic throughout my high school days.

During my B.Tech. days at Netaji Subhash Engineering College, I am thankful to the entire CSE faculty, especially, Anupam Sir and Pritwish Sir, who taught me the fundamentals of and Computer Hardware, which form the base of my Ph.D. dissertation. My life’s journey would not have turned towards higher studies without the support of my roommates and friends during B.Tech. - Soumya, Sabya Da, Debu Da, Koontal Da, Golu, Subroto Da, Pritam, Krishna, Bikram, Arijit, Prithwish, Priyankar, Arka, Arjun, Pratip, Banu, Asif, Samik, Nandu, Manashi, Lopa, Such, Nimi, Animesh, Arunava, Kathi, Raj, Arindam, Aritra, Rahul Roy, Chandrani, Moulik, and Madhu.

Next, during my M.Tech. at National Institute of Technology Rourkela, I met so many awesome individuals who shaped my life and prepared me further for my Ph.D. Firstly, special thanks to Bibhu Sir, Rath Sir, and Mohapatra Sir for letting me work on an M.Tech. topic that was outside my specialization area but would lead me to pursue a Ph.D. on that topic. They were the ones who wrote recommendation letters and motivated me to pursue my Ph.D. in the U.S. My roommates and classmates at NIT Rourkela have been the best. Ravikant - I miss you a lot mere bhai! Koshy, Akshay, Sandeep, Soham, Sumanta, Rajgopal, Aditi, Sai, and Hareesh, you all have given me the best moments to cherish throughout my life.

My Ph.D. journey at Virginia Tech would not have gone as smoothly without the support and encouragement of my advisor, Dr. Ali R. Butt. He has been a mentor and an elder brother to me during the course of my Ph.D. The freedom that he gave me during my

Ph.D. to pursue my research ideas, mentor students, teach classes, and brainstorm ideas for proposals has given me the much-needed boost for my entire career. He has also shown me how to balance family and work, being successful at both. I am also thankful to Dr. Eli Tilevich, who gave me life and Ph.D. lessons during our ping-pong sessions, sometimes even late at night in CRC. I have also been inspired by the rigorous hard work that my committee members, Dr. Dongyoon Lee and Dr. Gang Wang, put in day and night to excel in their respective research areas.

I would also like to take this opportunity to thank Dr. Wu Feng who was my first point of contact at Virginia Tech. Thanks are in order for my labmates: Kwang, Umar - my savior, Jay, Safdar and Anwar - my mentors, Bharti, Kevin, Sangeetha, Arpit, Salman, Luna, Hadeel, Nannan, Redwan, Debasmita, Jingoo, Hyogi, Arjun, Jamaal, and Michael. The interactions, collaborations, and lengthy discussions about research projects and life in general with my labmates really enriched my Ph.D. experience. DSSL rocks! Subil - In addition to being an awesome labmate, we also collaborated on research. I hope we continue our relationship even in ORNL. Breno - For the past two years we have served as the punching bag for each other's stress in the ping-pong room. I hope we can have many more collaborations in the near future.

I am extremely thankful to the wonderful staff in the Computer Science department at Virginia Tech. Special mention needs to be given to Sharon, who has helped me numerous times and with whom I have built a special bond over the years. Teresa - without you none of my conference travels would have been possible. Melanie and Robert - thank you for taking care of us whenever it was needed. And finally, thanks to the tech staff for always being there whenever I needed technical assistance in managing our lab's cluster. Thanks are also in order for the graduate school - thank you for being there whenever I needed help with immigration documents and procedures.

My Ph.D. career has been largely marked by the three summer internships that I have done. Firstly, thank you Dr. Foster, Ryan, and Kyle for giving me an opportunity to work at Argonne National Laboratory in the summer of 2017. It was my first ever internship and the experience I gained from it was tremendous. The cricket sessions, meetings in Globus by the river, or the sessions at UChicago with Tyler, Yulie, Logan, Yadu, and Hong - everything was a joy. Thanks to Subhajit for providing me a home where I had wonderful roommates - Imran, Rushit, Arjun, Ojasvee, and Nehal. The next summer, in 2018, at Lawrence Livermore National Laboratory, I had a group of the most awesome mentors - Dr. Kathryn Mohror, along with Elsa and Olaf - who provided me an excellent knowledge base of large-scale HPC storage systems, which forms the core of my dissertation. I met a wonderful, intelligent group of people at the lab with whom I had deep technical discussions, and also ping-pong and board game challenges. Thank you Clayton for being an awesome roommate, and special thanks to Joanna and Teresa for providing me a home for the summer. I also ran my first half marathon in Livermore - wonderful memories!

I would like to acknowledge Aniket, Anirudh, Dung, Eric, Erin, Goutham, Irene, Jesse, Mahta, Kaushik, Kenneth, Nawrin, Sayket, Tao, Yue, and Zach for giving me the perfect partnership to explore California. The summer of 2019 was one of the best summer internships that I have ever had. First of all, thanks to Dr. Jay Lofstead, who has always been my mentor throughout my Ph.D. He introduced me to the people at Cray during one of the dinner parties at SC 2018. The interactions I had with Charlie Carroll - the VP of storage at Cray - provided me a unique opportunity to intern at Cray in the summer of 2019. Thank you Peter, Nathan, Cory, and Brian for mentoring me on the industry aspects of how HPC storage works, and Scott, Dr. Brad Settlemeyer, and Dominic for providing me insights into storage at Los Alamos National Laboratory. My summer would not have been half as enjoyable without my awesome hosts in Los Alamos - Paul and Jill, who have become family to me. Grandma, Connor, Courtney, and Patrick have become like my extended family now. The community at Los Alamos gave me a feeling of home away from home. Thank you Ayan da, Suranjana di, Dibya da, Soumya da, Subhashish, Pranay, Trideep, Santanu da, Supratim da, and the entire cricket gang in Los Alamos. It was a wonderful time.

My family at Blacksburg! Where do I even start. The countless memories will be more than enough to live with in this lifetime. Thank you Anamitra da and Paroma di for being patient with me during my first year in the U.S. Nob - one of the best persuaders in the history of ...! Prasen da - thank you for coming to Blacksburg and choosing me as your roommate. I will never ever forget the life lessons that you have given me. Ranit - your witty comments have developed a better version of me over the years. Arit - you are one of the most childish (sometimes mature) and the sweetest roommate I have ever had. GB - our friendship has only grown stronger day by day. Sreeya di - thank you for being an elder sister and taking care of us. Shuchi - one of the most creative persons I have met; give her any object and she will breathe life into it. Gablu da, Shreya di - I have never missed my elder sister and brother-in-law in Blacksburg because you were there. I will miss Gugli a lot. Anindo, Debbie - continue being awesome! Arnab da, Poorna di - thank you for providing me guidance whenever I needed it. Srijan da, Swarnali di - you entered late into the Blacksburg chapter but have left a profound impact on the memories that I have from my Ph.D. Lekha - without you many memories in Blacksburg would be incomplete. Keep shining! Esha - you showed how friendship can only keep getting stronger with every passing day. Nath - you are an awesome individual and I hope you continue to succeed further. Deba - I hope we cherish our friendship like you cherish your vehicles. Meyur - thank you for inspiring me to continue long-distance running. Subhradeep da, Debarati di - thank you for caring for me like your little brother during my initial years in Blacksburg. Thanks to my hiking friends - Khushboo, Nishtha, Sandeep, Sai, Srivats, Abhinav, Shyam, and Anirudh. I also thank my cultural brigade in Blacksburg - Donu da, Bala, Megha, Yash, Sheemantini, Poorvi, and Saikat. Thanks to all the people who have become very close to my heart over the years - Sagnik da, Ritwam, Leo, Anish, Fadikar da, Shibaji da, Atashi di, Bipasha, Jamaal, Jamie, Nisaanth, Sayantan, and Wrik da. My acknowledgement cannot be

complete without being grateful to my awesome cricket buddies in Blacksburg - Padhi, Jana, Arka, Harsh, Appy, Anshul, Parag, Rahul, Swetank, Jaggu, Keshto da, Satyaki, Abhinaba da, Abhishek, Mihir, Aman, Dibyendu da, Manik, Somya, and Kartik. Special mention to the Bangladesh community in Blacksburg for lighting up a Bengali spirit that was hidden in me - Yasin, Toyon, Anamul, Mahbub, Sajal da, and Sazzad. I am also extremely thankful to AID - Blacksburg for providing me a much-needed platform to exercise my societal duties.

Last but not least, the most important people that I owe my entire Ph.D. to are my family. Ma, Baba, I know how much you both have struggled to let me dream big and follow my ambitions. You are among the strongest people I have met and I just pray that I can be emotionally at least half as strong in my life as you are. Mann - thank you for coming into my life. In this dissertation, the technical writing is mine but the emotional writing is yours. You have kept me sane and provided me with much more emotional support than a person could ever crave. I promise that I will try to make up for all the years of us staying apart. I love you and thank you for always being there. Didi, Tun da - thank you for always removing whatever obstacles have come in my journey. I could be brave in my decisions knowing that you always have my back. Jhilik and Aarush - I am as proud as a Mama can ever be. You both are the joy in my life.

Funding Acknowledgment

My research was supported by the National Science Foundation (CNS-1615411, CNS-1565314/1838271, CCF-0746832, CNS-1016408, CNS-1016793, OAC-1835890, CCF-1919113, CNS-1405697, and CNS-1422788), the Department of Computer Science at Virginia Tech, the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357, research credits provided by Amazon Web Services, and resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the DOE under Contract DE-AC05-00OR22725. Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Many thanks to all for making this dissertation possible.

Declaration of Collaboration

In addition to my advisor Ali R. Butt, this dissertation has benefited from many collaborators, especially:

• Kathryn Mohror, Elsa Gonsiorowski, Olaf Faaland, and Adam Moody contributed to the I/O analysis from the traces in Lawrence Livermore National Laboratory in Chapter 3 of this dissertation, and have been part of multiple joint papers [154, 155, 157].

• Wenjie Zhuang, Luna Xu, Min Li, and M. Mustafa Rafique contributed to the optimization in in-memory data analytics in Chapter 4.

• Ryan Chard, Kyle Chard, Ian Foster, and Steven Tuecke contributed to the development of the file system monitor in Chapter 5, and have been part of multiple joint papers [152, 153, 156].

• Brian Wang, Nathan Rutman, and Cory Spitz contributed to the design of a metadata indexer in Chapter 6.

• Bharti Wadhwa, Sarah Neuwirth, Feiyi Wang, and Sarp Oral contributed to the load balancing work in Chapter 7, and have been part of multiple joint papers [151, 158, 196].

• Jon Bernard and Kirk W. Cameron have also contributed to the design of the load balancer in the paper [196], which is part of Chapter 7.

Contents

List of Figures

List of Tables

1 Introduction
1.1 Motivation
1.2 Application-Attuned HPC Storage Systems
1.2.1 Understanding HPC Application I/O Behavior Using System Level Statistics
1.2.2 Optimizing Data Partitioning for In-Memory Data Analytics Frameworks
1.2.3 File System Monitoring for Large Scale Storage Systems
1.2.4 Efficient Metadata Indexing for HPC Storage Systems
1.2.5 I/O Load Balancing for Big Data HPC Applications
1.3 Research Contributions
1.4 Dissertation Organization

2 Background
2.1 Analysis of I/O Behavior of HPC Workloads
2.2 Optimization in Big Data Processing Frameworks
2.3 Monitoring of Large Scale Storage Systems
2.4 Metadata Indexing in HPC Storage Systems
2.5 Load Management in HPC Storage Systems

3 Understanding HPC Application I/O Behavior Using System Level Statistics
3.1 Introduction
3.2 Background
3.2.1 Lustre Parallel File System
3.2.2 Clusters
3.3 Data Collection
3.3.1 Aggregate Job Statistics
3.3.2 Time-Series Job Statistics
3.4 Analysis
3.4.1 Aggregate Job Statistics
3.4.2 Time-Series Job Statistics
3.5 Discussion
3.5.1 Lessons for HPC administrators
3.5.2 Lessons for HPC users
3.6 Chapter Summary

4 Optimizing Data Partitioning for In-Memory Data Analytics Frameworks
4.1 Introduction
4.2 Background and Motivation
4.2.1 Spark Data Partitioning
4.2.2 Workload Study
4.3 System Design
4.3.1 Enable Auto-Partitioning
4.3.2 Determine Stage-Level Partition Scheme
4.3.3 Globally-Optimized Partition Scheme
4.4 Evaluation
4.4.1 Overall Performance of Chopper
4.4.2 Timing Breakdown of Execution Stages
4.4.3 Impact on Shuffle Stages
4.4.4 Impact on System Utilization
4.5 Chapter Summary

5 File System Monitoring for Large Scale Storage Systems
5.1 Introduction
5.2 Background
5.2.1 Parallel File System
5.2.2 Object Storage
5.2.3 Monitoring Storage System Events
5.2.4 Prior Work
5.3 FSMonitor System Design
5.3.1 Logical View of FSMonitor
5.3.2 Storage System View of FSMonitor
5.4 Evaluation
5.4.1 Experimental Setup
5.4.2 Experiment Workloads
5.4.3 Evaluating FSMonitor Using Synthetic Benchmark
5.4.4 Evaluating FSMonitor Using Metadata Benchmarks
5.4.5 Evaluating FSMonitor Using Data Benchmarks
5.5 An Illustrative Application Use Case
5.6 Chapter Summary

6 Efficient Metadata Indexing for HPC Storage Systems
6.1 Introduction
6.2 Background & Motivation
6.2.1 Partitioning Techniques
6.2.2 Metadata Attributes
6.2.3 HPC Storage System
6.2.4 Collecting Metadata Changes
6.3 System Design
6.3.1 Indexer
6.3.2 Re-Indexer
6.3.3 Metadata Query Interface
6.4 Evaluation
6.4.1 Experimental Setup
6.4.2 Workloads
6.4.3 Comparison of System Calls
6.4.4 Evaluation of Indexer
6.4.5 Evaluation of Metadata Query Interface
6.4.6 Evaluation of Re-Indexer
6.5 Chapter Summary

7 I/O Load Balancing for Big Data HPC Applications
7.1 Introduction
7.2 Background and Motivation
7.2.1 Progressive File Layout
7.2.2 I/O Performance Statistics
7.2.3 The Need for Load Balancing
7.3 System Design
7.3.1 Parallel File Access Modes for Varying Striping Layouts
7.3.2 Client-Side Components
7.3.3 Server-Side Components
7.4 Evaluation
7.4.1 Methodology
7.4.2 Comparison of Load Balancing Approaches
7.4.3 Scalability Study
7.5 Chapter Summary

8 Conclusion and Future Work
8.1 Summary
8.2 Future Directions
8.2.1 ML for I/O and I/O for ML
8.2.2 Edge Computing and Federated Learning
8.2.3 Containers in HPC

Bibliography

List of Figures

3.1 An overview of Lustre architecture
3.2 Read and write percentage of users on Cab and Quartz
3.3 Cumulative Distribution Function for Duration and #Nodes in Cab and Quartz
3.4 Relationship between Write Bytes and Metadata operations for Jobs in Cab and Quartz
3.5 #File opens and closes handled by different MDTs
3.6 I/O Heat Map for 2017 per day and per day of the week in Cab and Quartz
3.7 Contribution of Job with Maximum Duration on I/O for a day and a day of the week in 2017
3.8 Contribution of User doing maximum I/O (highest read/write bytes) with respect to total I/O on the system in 2017 on a day and a day of the week's I/O
3.9 Percent minutes 11,425 jobs performed any I/O
3.10 Percent minutes 10,443 jobs performed at least one write
3.11 Write Bytes written over time by 3 random jobs in Quartz
3.12 % I/O duration vs. write burst size as % of memory
3.13 Percent I/O duration vs. size of write bursts
3.14 Plotting I/O behavior of application for every hour throughout the day

4.1 Overview of Spark DAGScheduler
4.2 Execution time per stage under different number of partitions
4.3 Execution time of stage 0 under different partition numbers
4.4 Shuffle data per stage under different partition numbers
4.5 System architecture of Chopper
4.6 An example generated Spark workload configuration in Chopper
4.7 Execution time of Spark and Chopper
4.8 Execution time per stage breakdown of KMeans
4.9 Shuffle data per stage for SQL
4.10 Execution time per stage breakdown of SQL
4.11 CPU utilization
4.12 Memory utilization
4.13 Total transmitted and received packets per second
4.14 Transactions per second

5.1 Comparison of block-based and object-based storage models
5.2 An overview of architecture
5.3 Robinhood architecture
5.4 FSMonitor architecture, showing the interface, resolution, and DSI layers
5.5 Infrastructural view of FSMonitor. On Lustre: Collectors=MDSs, Aggregator=MGS, Consumers=Lustre Clients. On Ceph: Collectors=MDSs, Aggregator=Monitor, Consumers=Ceph Clients
5.6 Performance of FSMonitor when running MDTest on Lustre (left) and Ceph (right)
5.7 Performance of FSMonitor when running FS Mark on Lustre (left) and Ceph (right)
5.8 Overall design of file system indexer using FSMonitor

6.1 Comparison between hierarchical partitioning (left) and leveled partitioning (right) approaches using the same file structure
6.2 Overall architecture of Brindexer
6.3 Parallelism in leveled partitioning (left) and 2-level database sharding (right) of Brindexer
6.4 Design of Re-Indexer in Brindexer
6.5 Comparison of system call stack for indexing 1.2M files in hierarchical directory structure by Brindexer (left) and GUFI (right)
6.6 Time taken to index by Brindexer and GUFI
6.7 CPU utilization by Brindexer and GUFI during indexing on Client (left) and MDS (right)
6.8 Memory utilization by Brindexer and GUFI during indexing on Client (left) and MDS (right)
6.9 Time taken to query by Brindexer, GUFI, lfs find, and Robinhood
6.10 CPU utilization by Brindexer and GUFI during querying on Client (left) and MDS (right)
6.11 Memory utilization by Brindexer and GUFI during indexing on Client (left) and MDS (right)

7.1 An example PFL layout
7.2 Max/Mean OST load over time (Round-Robin)
7.3 Max/Mean OSS load over time (Round-Robin)
7.4 Capacity of all OSTs over time (Round-Robin)
7.5 Overview of the proposed architecture
7.6 Layout in FPP mode - 4 processes each writing 8GB in a non-PFL setup
7.7 Layout in FPP mode - 4 processes each writing 8GB in a PFL setup
7.8 Layout in SSF mode - 1 process creating a 32GB file in a non-PFL setup
7.9 Layout in SSF mode - 1 process creating a 32GB file in a PFL setup
7.10 Graph used in OST Allocation Algorithm
7.11 Max/Mean OST load over time in simulated setup
7.12 Max/Mean OSS load over time in simulated setup
7.13 Max CPU% on MDS over 12 hours in real setup
7.14 Capacity of all OSTs over time under MCMF in simulated setup
7.15 Capacity of all OSTs over time under MCMF in real setup
7.16 Execution time with increasing number of OSTs in simulated setup
7.17 Performance with increasing number of OSTs in simulated setup: 160 OSTs (20 OSS), 480 OSTs (60 OSS), 800 OSTs (100 OSS), 1024 OSTs (128 OSS) and 3600 OSTs (450 OSS)

List of Tables

3.1 Cluster Configurations
3.2 Overview of the Jobs run on Cab and Quartz
3.3 Statistics for duration and nodes for all jobs
3.4 Statistics for write bytes and write bytes per call for all jobs and users performing I/O on Cab and Quartz

4.1 Workloads and input data sizes
4.2 Execution time for stage 0 in KMeans
4.3 Repartition of stages using Chopper

5.1 A sample Lustre ChangeLog record showing Create File, Modify, Rename, Create Directory, and Delete File events
5.2 A sample Ceph Journal record showing Create File, Modify, Rename, Create Directory, and Delete File events
5.3 Baseline Event Generation Rates
5.4 Baseline Event Reporting Rates
5.5 FSMonitor CPU Utilization
5.6 FSMonitor Memory Utilization
5.7 FSMonitor performance vs. cache size
5.8 FSMonitor events for IOR and HACC-IO on Thor and Ceph
5.9 Impact of running FSMonitor with HACC-IO
5.10 Impact of running FSMonitor with IOR
5.11 File system workload to evaluate re-indexer
5.12 Time taken by indexer and re-indexer

6.1 Metadata Attributes
6.2 Some sample file management questions and the metadata search queries used
6.3 A sample Lustre ChangeLog record showing Create File, Modify, Rename, Create Directory, and Delete File events
6.4 Number of file system events for each metadata event in a 24-hour Lustre Changelog
6.5 Workload 1: Flat directory structure: Smaller number of directories with higher average number of files per directory
6.6 Workload 2: Hierarchical directory structure: Large number of directories with lower average number of files per directory
6.7 Event Reporting Rates by Brindexer and Robinhood
6.8 Brindexer performance and resource utilization vs. cache size

7.1 List of I/O performance statistics for relevant system components
7.2 Interaction database snapshot for IOR in FPP mode in PFL layout
7.3 Interaction database snapshot for IOR in FPP mode in non-PFL layout
7.4 Interaction database snapshot showing OST allocation for IOR in FPP mode in PFL layout
7.5 Interaction database snapshot showing OST allocation for IOR in FPP mode in non-PFL layout

Chapter 1

Introduction

High performance computing (HPC) storage systems are becoming increasingly important. From life sciences and financial services to manufacturing and telecommunications, organizations are finding that they need not just more storage, but high-performance storage to meet the demands of their data-intensive workloads. This has resulted in a massive amount of data generation (on the order of petabytes), the creation of billions of files, and thousands of users acting on large-scale distributed HPC storage systems. According to a recent report from the National Energy Research Scientific Computing Center (NERSC) [14], over the past 10 years the total volume of data stored at NERSC has grown at an annual rate of 30 percent. This massive rate of data generation has resulted in an increasing need for high performance distributed storage systems such as Apache Spark [211, 212], Ceph [201], GlusterFS [46], IBM Spectrum Scale [168] (formerly known as GPFS), and Lustre [47].

The dependency on HPC storage systems has resulted in input/output (I/O) operations becoming the bottleneck for application performance. The current trend for HPC systems is that processor performance improves at a rate of 20% per year, while disk access time improves by only 10% every year [85]. As a result, massively parallel HPC applications can suffer from an imbalance between computation and I/O performance, with I/O operations becoming a limiting factor in application efficiency [105]. To mitigate this problem, much effort is needed to implement high performance parallel file systems that support the I/O needs of HPC applications.

HPC storage systems are designed to distribute file data across multiple servers so that multiple clients can access file system data in parallel. Typically, they consist of clients that read or write data to the file system, data servers where data is stored, metadata servers that manage the metadata and placement of data on the data servers, and networks to connect these components. Data may be distributed (divided into stripes) across multiple data servers to enable parallel reads and writes. This level of parallelism is transparent to the clients, for whom it seems as though they are accessing a local file system. This entire storage system stack (clients, metadata servers, and data servers) can be exploited to optimize the performance of distributed storage systems.
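To make the striping idea concrete, the following minimal Python sketch maps a logical byte offset to a data server under a simplified round-robin striping model. The stripe size, server names, and placement logic are illustrative assumptions, not any particular file system's actual implementation.

```python
# Minimal sketch of round-robin file striping in the spirit of parallel file
# systems such as Lustre. This is an illustrative model only; stripe_size,
# stripe_count, and the OST names are made-up values.

def locate_stripe(offset, stripe_size, ost_ids):
    """Map a logical byte offset to (data server, offset within its object)."""
    stripe_index = offset // stripe_size          # which stripe the byte falls in
    ost = ost_ids[stripe_index % len(ost_ids)]    # round-robin over data servers
    # Offset inside the object held by that server:
    local_offset = (stripe_index // len(ost_ids)) * stripe_size + offset % stripe_size
    return ost, local_offset

if __name__ == "__main__":
    osts = ["OST0", "OST1", "OST2", "OST3"]       # hypothetical data servers
    stripe_size = 1 << 20                          # 1 MiB stripes
    for off in (0, 1 << 20, 5 << 20):
        print(off, "->", locate_stripe(off, stripe_size, osts))
```

Because consecutive stripes land on different servers, a large sequential read or write is spread across all of them in parallel, which is exactly the property the rest of this dissertation seeks to exploit and keep balanced.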

1.1 Motivation

Several factors affect the I/O performance of big data HPC applications. First, the number and kinds of applications that an HPC storage system supports are increasing rapidly [213], which leads to increased resource contention and the creation of hot spots, where some data or resources are consumed significantly more than others. Second, the underlying storage systems, e.g., Ceph [201], GlusterFS [46], and Lustre [47], are often distributed and adopt a hierarchical design comprising thousands of distributed components connected over complex network topologies. Managing and extracting peak performance from such resources is non-trivial. With changing application characteristics, static approaches (e.g., [68, 198]) are no longer sufficient, necessitating dynamic solutions. Third, the storage components can develop load imbalance across the I/O servers, which in turn impacts the performance and time to solution for the big data problem. Moreover, big data science relies on sophisticated workflows that encompass a wide variety of storage locations as data flows from instruments to processing resources and archival storage. Each storage location may be completely independent of the others, managed in separate administrative domains and providing heterogeneous storage interfaces such as hierarchical POSIX stores, high-performance distributed storage, persistent tape storage, and cloud-based object stores. Data may be stored for short to long periods on each, be accessible to dynamic groups of collaborators, and be acted upon by various actors. As data generation volumes and velocities continue to increase, the rate at which files are created, modified, deleted, and acted upon (e.g., permission changes) makes manual oversight and management infeasible.

There is a plethora of application-specific parameters that impact runtime performance in Apache Spark, such as task parallelism, data compression, and executor resource configuration. In typical data processing systems, the input (or intermediate) data is divided into logical subsets, called partitions. Specifically, in Spark, a partition cannot be divided between multiple compute nodes for execution, and each compute node in the cluster processes one or more partitions. Moreover, a user can configure the number of partitions and how the data should be partitioned (i.e., hash or range partitioning schemes) for each Spark job. Sub-optimal task partitioning or selecting a non-optimal partition scheme can significantly increase workload execution time. For instance, if a partition strategy launches too many tasks within a computation phase, this leads to CPU and memory resource contention and thus degrades performance. Conversely, if too few tasks are launched, the system has low resource utilization and again suffers reduced performance.

Even though HPC parallel file systems such as Lustre provide much higher bandwidth and reliability than other options, continual improvement of parallel file systems is needed to reduce the impact of the increasing I/O performance bottleneck. File system designers need a comprehensive understanding of the I/O workloads on current HPC systems so that they can design successful next-generation file systems. Additionally, HPC system designers and system administrators need to understand both file system and application behavior so that they can design and tune HPC systems to run as efficiently as possible.
Traditionally, I/O and storage analysis efforts have concentrated on analyzing application I/O behavior from application-level statistics [88, 89, 97, 122, 139, 147, 169], and there have been significant studies on the I/O workloads of large scale systems [50, 51, 52, 105, 123, 199, 200] which identify potential I/O bottlenecks in applications and suggest improvements to HPC users. However, while these studies can give clues about how particular applications utilize and stress HPC file systems, they do not give true insight into storage system performance under a general load. Thus, in order to draw meaningful conclusions about how to improve parallel file system designs for all users of an HPC system, we need to analyze statistics captured from the file system itself (from file system data on compute nodes and from data on servers that manage the I/O requests from HPC applications) independently of user applications.

1.2 Application-Attuned HPC Storage Systems

To address the above issues, this dissertation proposes, designs, and implements an application-attuned framework for optimizing HPC storage systems. This dissertation focuses on exploiting the different layers in the storage stack of HPC storage systems, namely, the clients on which applications run, the metadata servers, and the storage servers. The HPC storage systems optimized here are Apache Spark, the Lustre file system, and the Ceph object store; however, the approaches presented in this dissertation are applicable to any hierarchical HPC storage system. A wide range of applications is explored, from I/O intensive workloads to big data analytics workloads. The overarching goal of this dissertation is to improve the efficiency and flexibility of HPC storage systems by making them application-aware, and also to improve the performance of applications by providing a transparent interface for efficient interaction with the underlying storage infrastructure, while making the storage software layer fully aware of both the workload characteristics and the underlying storage heterogeneity. Next, we describe the various optimizations performed in achieving an application-attuned framework for the various HPC storage systems.

1.2.1 Understanding HPC Application I/O Behavior Using System Level Statistics

Analyzing HPC application I/O behavior is essential for understanding which optimizations to apply to HPC storage systems. Some of the earliest studies of HPC file systems utilizing system statistics were in 2010 [175, 225]. These works analyzed the deployment of Lustre file systems and lessons learned from them. However, to the best of our knowledge, there have not been efforts that analyze file system statistics in an application-agnostic manner to understand how to support a general HPC workload. Lawrence Livermore National Laboratory (LLNL) is home to a variety of clusters that utilize Lustre file systems as primary storage, and millions of jobs run on these systems. In this effort, we take advantage of the Lustre resources at LLNL and perform a study of general application behavior using file system statistics from Lustre. In the first part of this dissertation, we collect and study file system statistics from two Livermore Computing systems with 15 PiB Lustre file systems at LLNL, namely Quartz and Cab (Cab was decommissioned in June 2018) [154, 155, 157]. We collect two types of data from these systems: aggregate job statistics and time-series job statistics. We analyze these system statistics without knowledge of the types of applications running on these systems, with the goal of answering questions such as: What are the typical I/O characteristics of I/O-heavy jobs? How do I/O operations of jobs affect the metadata server? How does the size of application output files correlate with compute node memory size? Our study of application-agnostic I/O statistics from the Lustre file system contributes to the state-of-the-art in I/O behavior understanding. We provide insight into how general HPC workloads affect the performance of file systems, which can aid system architects in improving file system and storage system designs and system administrators in tuning existing systems and advising users of best practices. These improvements will help to alleviate the I/O imbalance in HPC systems and increase the overall efficiency of HPC applications.
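As an illustration of the kind of application-agnostic analysis these questions call for, the hedged sketch below uses pandas over a hypothetical per-job export of aggregate server-side statistics. The file name and column names (duration_s, read_bytes, write_bytes, metadata_ops) are assumptions for illustration, not the actual LLNL data format.

```python
# Hedged sketch: summarize per-job aggregate statistics collected on the
# file system servers, without any knowledge of the applications themselves.
# The CSV layout and column names are hypothetical.
import pandas as pd

jobs = pd.read_csv("aggregate_job_stats.csv")  # hypothetical export

# What fraction of I/O-active jobs are write-dominated?
io_jobs = jobs[(jobs.read_bytes + jobs.write_bytes) > 0]
write_heavy = io_jobs[io_jobs.write_bytes > io_jobs.read_bytes]
print("write-dominated jobs:", len(write_heavy) / len(io_jobs))

# How do write volumes relate to metadata load on the metadata server?
print(io_jobs[["write_bytes", "metadata_ops"]].corr())

# What fraction of jobs finish within an hour?
print("short jobs (<1h):", (jobs.duration_s < 3600).mean())
```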

1.2.2 Optimizing Data Partitioning for In-Memory Data Analytics Frameworks

Inferior partitioning can lead to serious data skew across tasks in Apache Spark, which eventually results in some tasks taking significantly longer to complete than others. As data processing frameworks usually employ a global barrier between computation phases, it is critical to have all the tasks in the same phase finish at approximately the same time, so as to avoid stragglers that can hold back otherwise fast-running tasks. The right scheme for data partitioning is key to extracting high performance from the underlying hardware resources. However, finding a data partitioning scheme that gives the best performance is non-trivial. This is because data analytic workflows typically involve complex algorithms, e.g., machine learning and graph processing. Thus, the resulting task execution plan can become extremely complicated with an increasing number of computation phases. Moreover, given that each computation phase is different, the optimal number of partitions for each phase can also be different, further complicating the problem.

In the second part of this dissertation, we propose Chopper [163], an auto-partitioning system for Spark that automatically determines the optimal number of partitions for each computation phase during workload execution. (We use Spark as our evaluation DAG-based data processing framework to implement and showcase the effectiveness of Chopper; the proposed system can be applied to other DAG-based data processing frameworks, such as Dryad [94].) Chopper alleviates the need for users to manually tune their workloads to mitigate data skewness and sub-optimal task parallelism. Our proposed dynamic data partitioning is challenging for several reasons. First, since Spark does not support changing task parallelism parameters for each computation phase, Chopper needs to design a new interface to enable the envisioned dynamic tuning of task parallelism. Second, the optimal data partitions differ across the different computation phases of a workload. Chopper needs to understand the application characteristics that affect task parallelism in order to select an appropriate partitioning strategy. Finally, to adjust the number of tasks, Chopper may introduce additional data re-partitioning, which may incur extra data shuffling overhead that has to be mitigated or amortized. Thus, a careful orchestration of the parameters is needed to ensure that Chopper's benefits outweigh the costs.
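For context, the snippet below shows the kind of manual, per-phase partition tuning that Chopper is designed to automate, written against the public PySpark RDD API. The input path, partition counts, and filter threshold are illustrative choices, not values prescribed by Chopper.

```python
# Illustrative PySpark snippet: choosing a partition count per computation
# phase by hand. Chopper's goal is to pick these values automatically.
from pyspark import SparkContext

sc = SparkContext(appName="partitioning-demo")
pairs = sc.textFile("hdfs:///data/input").map(lambda line: (line.split(",")[0], 1))

# Phase 1: a wide, shuffle-heavy aggregation; hash-partition keys over 128 partitions.
counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=128)

# Phase 2: a much cheaper follow-on phase; fewer partitions reduce per-task overhead.
top = counts.coalesce(16).filter(lambda kv: kv[1] > 100)

print(counts.getNumPartitions(), top.getNumPartitions())
sc.stop()
```

Choosing 128 and 16 well requires knowing the per-phase data volume and skew, which is exactly the information Chopper gathers at runtime.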

1.2.3 File System Monitoring for Large Scale Storage Systems

Many small-scale local storage systems provide mechanisms to detect and report data events, such as file creation, modification, and deletion. Tools such as inotify [124], [108], and FileSystemWatcher [131] enable developers and applications to respond to data events. However, such tools are neither scalable nor suitable for large-scale parallel file systems, such as Lustre [12] and IBM's Spectrum Scale [168] (formerly known as GPFS), and object-based storage systems, such as Ceph [201] and [79]. Such large-scale storage systems often maintain an internal metadata collection catalog, such as Lustre's ChangeLog, IBM Spectrum Scale's mmaudit, Ceph's Journal, and Gluster's libgfchangelog, which enables developers to query data events. However, each of these tools provides a unique API and event description language. For example, a file creation action may be recorded as a 01CREAT event in Lustre's ChangeLog, and an openc event in Ceph's Journal. Furthermore, the difference in the architecture of parallel file systems and object-based storage systems makes developing a generalized file system event monitoring tool for large-scale storage systems non-trivial.

In the third part of this dissertation, we present FSMonitor (File System Monitor), a generalized and scalable storage system monitor [152, 153, 156]. FSMonitor provides a common API and event representation for detecting and managing data events from different large-scale storage systems (parallel file systems and object-based storage systems). We have architected FSMonitor with a modular Data Storage Interface (DSI) layer to plug in and use custom monitoring tools and services. We leverage the internal metadata collection catalog found in such storage systems to develop a scalable monitor architecture that is capable of detecting, resolving, and reporting these events. We choose Lustre as our implementation platform for parallel file systems because Lustre is used by 60% of the top 500 supercomputers [72]. For our implementation of FSMonitor on an object-based storage system, we select Ceph, as it is one of the most popular open-source object stores, with companies such as Bloomberg, Cisco, and Deutsche Telekom using Ceph as their storage backend [5]. While our scalable monitor has so far only been applied to Lustre and Ceph storage systems, its design makes it applicable to other large-scale storage systems with metadata catalogs, such as IBM's Spectrum Scale and Gluster.
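The following sketch illustrates the event-normalization idea behind FSMonitor's common event representation: backend-specific event codes are translated into a single vocabulary before being reported to consumers. The mapping tables and record format are illustrative assumptions and do not reproduce FSMonitor's actual API; only the 01CREAT and openc codes are taken from the text above.

```python
# Sketch of normalizing backend-specific event codes into one common
# vocabulary, as a generalized monitor must do. Tables are illustrative.

LUSTRE_EVENTS = {"01CREAT": "create", "02MKDIR": "mkdir",      # assumed codes
                 "06UNLNK": "delete", "08RENME": "rename"}
CEPH_EVENTS = {"openc": "create", "mkdir": "mkdir",            # assumed codes
               "unlink": "delete", "rename": "rename"}

def normalize(backend, raw_type, path):
    """Return a backend-agnostic event record."""
    table = LUSTRE_EVENTS if backend == "lustre" else CEPH_EVENTS
    return {"backend": backend, "event": table.get(raw_type, "unknown"), "path": path}

print(normalize("lustre", "01CREAT", "/mnt/lustre/project/a.dat"))
print(normalize("ceph", "openc", "/mnt/cephfs/project/a.dat"))
```

In FSMonitor, this translation happens in the per-backend DSI layer, so the Resolution and Interface layers only ever see the common representation.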

1.2.4 Efficient Metadata Indexing for HPC Storage Systems

Metadata indexing on large-scale HPC storage systems presents a number of challenges. First, scaling metadata indexing technology from local file systems to HPC storage systems is very difficult. In a local file system, the metadata index has to cover only on the order of a million files and thus can be kept in memory; in HPC systems, the index is too large to reside in memory. Second, the metadata indexing tool should be able to gather the metadata quickly. The typical speed of file system crawlers is in the range of 600 to 1,500 files/sec [42], which translates to 18 to 36 hours of crawling for a 100 million file data set. A large-scale HPC storage system can often contain a billion files, which implies crawl times on the order of weeks [42]. Third, the resource requirements should be low. Existing HPC storage system metadata indexing tools such as LazyBase [62] and the Grand Unified File-Index (GUFI) [9] require dedicated CPU, memory, and disk hardware, making them expensive and difficult to integrate into the storage system. Fourth, metadata changes must be quickly re-indexed to prevent a search from returning inaccurate results. It is difficult to keep the metadata index consistent because collecting metadata changes is often slow [181], and therefore search applications are often inefficient to update.

In the fourth part of the dissertation, to address these issues in HPC storage system metadata indexing, we present an efficient and scalable metadata indexing and search system, Brindexer [159]. Brindexer enables a fast and scalable indexing technique by using a leveled partitioning approach to the file system. Leveled partitioning is different from, and more effective than, the hierarchical partitioning approach used in the state-of-the-art indexing techniques discussed above. Brindexer uses an in-tree indexing design and thus mitigates the issue of maintaining metadata consistency outside the file system. Brindexer also uses an RDBMS to store the index, which makes querying easier and more effective. To overcome the drawback of a slow re-indexing process, Brindexer uses a changelog-based approach to keep track of metadata changes in the file system.
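To make the RDBMS-backed index idea concrete, the sketch below builds a single-shard SQLite index over one sub-tree and answers a simple metadata query with SQL. It deliberately omits Brindexer's leveled partitioning, 2-level sharding, and changelog-driven re-indexing; the schema and paths are assumptions for illustration only.

```python
# Single-shard sketch of an RDBMS-backed metadata index: walk a sub-tree,
# record basic attributes in SQLite, then answer a search query with SQL.
import os, sqlite3, time

def index_subtree(root, db_path="index.db"):
    db = sqlite3.connect(db_path)
    db.execute("""CREATE TABLE IF NOT EXISTS entries
                  (path TEXT PRIMARY KEY, size INTEGER, mtime REAL, uid INTEGER)""")
    for dirpath, _, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            try:
                st = os.stat(p)
            except OSError:
                continue  # file vanished between listing and stat
            db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
                       (p, st.st_size, st.st_mtime, st.st_uid))
    db.commit()
    return db

if __name__ == "__main__":
    db = index_subtree("/tmp")                     # hypothetical sub-tree
    week_ago = time.time() - 7 * 24 * 3600
    # "Which files larger than 1 GiB have not been modified in a week?"
    for row in db.execute("SELECT path, size FROM entries WHERE size > ? AND mtime < ?",
                          (1 << 30, week_ago)):
        print(row)
```

Brindexer's leveled partitioning essentially runs many such indexers in parallel over disjoint sub-trees and shards the resulting tables, which is where its indexing and query speedups come from.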

1.2.5 I/O Load Balancing for Big Data HPC Applications

Load balancing for HPC storage systems is crucial and has been actively studied in recent works [71]. Extant systems typically attempt to perform load balancing either by having limited support for read shedding to redirect read requests to replicas of the primary copy, e.g., in Ceph [135], or by performing data migration. Alternatively, per-application load balancing has also been considered to balance the load of an application across the various I/O servers [198]. These existing approaches lack a global view of all the components in the hierarchical structure of the system, and mainly focus on only a small subset of metrics (e.g., only the storage capacity, and not the performance of the components). Thus, these approaches cannot guarantee that the aggregate I/O load of multiple big data applications concurrently executing atop a parallel file system (with bursty behavior) is evenly distributed.

In the final part of this dissertation, we address the load imbalance problem in Lustre by enabling a global view of the statistics of key components [151, 158, 196]. We select Lustre to showcase our approach because Lustre is deployed on 60 of the top 100 fastest supercomputers [72], and improving its performance will benefit a wide range of applications and users. We go beyond network load balancing, e.g., as in NRS [166], or per-application approaches, e.g., access frequency-based solutions [198], to ensure that the Lustre Object Storage Targets (OSTs) that actually store and serve the data, along with other I/O system components, are load balanced. We leverage the existing hierarchy of Lustre to avoid introducing additional performance bottlenecks, and co-locate the global component of our load balancer on Lustre's Metadata Server (MDS), which has a global view of all other components.
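As a simplified illustration of load-aware placement, the sketch below picks the least-loaded OSTs for a new file's stripes from per-OST load scores aggregated at the metadata server. The load metric and numbers are hypothetical, and the dissertation's actual OST allocation algorithm, which combines several statistics under a global view, is considerably richer.

```python
# Hedged sketch of load-aware stripe placement: choose the least-loaded OSTs
# for a new file instead of assigning them round-robin. Load scores here are
# made-up values standing in for the statistics a real balancer would gather.
import heapq

def pick_osts(ost_load, stripe_count):
    """Return the stripe_count OSTs with the lowest load score."""
    return heapq.nsmallest(stripe_count, ost_load, key=ost_load.get)

if __name__ == "__main__":
    # Hypothetical scores (e.g., a weighted mix of pending I/O and capacity used).
    ost_load = {"OST0": 0.82, "OST1": 0.15, "OST2": 0.40, "OST3": 0.05, "OST4": 0.61}
    print(pick_osts(ost_load, stripe_count=2))   # -> ['OST3', 'OST1']
```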

1.3 Research Contributions

Through the above five efforts, we demonstrate in this dissertation that we can optimize the performance of HPC storage systems by making them application-attuned and exploiting the different layers in the storage stack of HPC storage systems. To optimize the clients, Apache Spark is used as the storage system; for the other layers, the Lustre file system and the Ceph object store are used. The Lustre file system [142] is one of the most widely-used parallel file systems, supporting ∼60% of the top supercomputers in the latest Top-500 list (June 2020) [72]. Ceph is one of the most popular open-source object stores, with companies such as Bloomberg, Cisco, and Deutsche Telekom using Ceph as their storage backend [5]. Overall, this dissertation proposes innovative systemic and algorithmic approaches to tackle the inefficiency and inflexibility of the data management strategies in the modern HPC storage system stack. In the following, we highlight the specific research contributions of this dissertation.

• Analyzing system-level statistics to understand HPC application I/O behavior: Improving I/O performance has become the most important factor in modern I/O-bound HPC applications. Therefore, understanding the I/O behavior of HPC applications is very important for system administrators, file system developers, and HPC users. In this work, we collect Lustre file system server-level statistics from two clusters, Cab and Quartz at Lawrence Livermore National Laboratory, for a period of three years and analyze the statistics in an application-agnostic manner. Our studies show that most jobs are write-intensive, highlighting the importance of improving file system write performance. Our analysis also leads us to believe that focus should be on jobs that run for a short duration, as the majority of jobs run for less than an hour. Also, there should be efforts to educate HPC users to develop applications that perform efficient writes. This would improve I/O performance as well as help reduce I/O contention among jobs. We believe that our analysis will help HPC practitioners build better file systems and utilize them more effectively.

• Improving the performance of in-memory data analytics frameworks by optimizing data partitioning: Here, we design Chopper, a dynamic partitioning approach for in-memory data analytics platforms. Chopper determines the optimal number of partitions and the partitioner for each stage of a running workload with the goal of minimizing the stage execution time and shuffle traffic. Chopper also considers the dependencies between stages, including join and cogroup operations, to further reduce shuffle traffic. By minimizing the stage execution time and shuffle traffic, Chopper implicitly alleviates task data skew through different partitioners and improves task resource utilization through an optimal number of partitions. Experimental results demonstrate that Chopper effectively improves overall performance by up to 35.2% for representative workloads compared to standard vanilla Spark.

• Building a scalable file system monitor for large-scale storage systems: In this work, we present FSMonitor, a generic and scalable file system monitor for capturing and reporting events on heterogeneous large-scale storage systems. FSMonitor uses a three-layer approach to file system event monitoring. The lowest layer, DSI, interacts with a file system to detect events and sends them to the middle layer, Resolution. Here events are resolved to their absolute path names and aggregated to be sent to the upper layer, Interface. The Interface layer stores aggregated events, which can be accessed by clients via the FSMonitor API.

• Designing an efficient metadata indexer for HPC storage systems: In this work, we present Brindexer, a metadata indexing tool for large-scale HPC storage systems. Brindexer has an in-tree design where it uses a parallel leveled partitioning approach to partition the file system namespace into disjoint sub-trees. Brindexer maintains an internal metadata index database that uses a 2-level database sharding technique to increase indexing and querying performance. Brindexer also uses a changelog-based approach to keep track of the metadata changes and re-index the file system. Brindexer is evaluated on a 4.8 TB Lustre storage system and is compared with the state-of-the-art GUFI and Robinhood engines. Brindexer improves the indexing performance by 69% and the querying performance by 91% with optimal resource utilization.

• Developing an efficient I/O load balancer to manage HPC applications: In this work, we present the design of an “end-to-end control plane” to optimize parallel and distributed HPC I/O systems, such as Lustre, by providing efficient load balancing across storage servers. Our proposed system provides a global view of the system, enables coordination between the clients and servers, and handles performance degradation due to resource contention by considering operations on both clients and servers. Our implementation provides a balanced distribution of load over OSTs and OSSs in the Lustre file system, and is able to handle both PFL and non-PFL file layouts.

1.4 Dissertation Organization

The rest of the dissertation is organized as follows. In Chapter 2 we introduce the background technologies and state-of-the-art related work that lay the foundation of the research conducted in this dissertation. Chapter 3 presents an analysis of HPC application behavior using real-world I/O statistics. Chapter 4 presents a mechanism for optimizing data partitioning for in-memory data analytics frameworks. Chapter 5 and Chapter 6 present a scalable file system monitor and an efficient metadata indexer for large-scale HPC storage systems. Chapter 7 introduces an “end-to-end framework” to balance I/O load for HPC applications. Chapter 8 concludes and discusses the future directions.

Chapter 2

Background

In this chapter, we provide the background required for the various aspects of our dissertation. This dissertation is focused on creating an application-attuned framework for optimizing HPC storage systems by applying simple yet effective application-aware strategies and techniques to existing storage solutions, to bridge the gap between modern data-intensive applications and the storage systems. This chapter summarizes the state-of-the-art research that is closely related to the major themes described in the previous chapter. We also compare these efforts against our work by emphasizing the effectiveness, novelty, and benefits of the workload-aware techniques and algorithms proposed in this dissertation.

2.1 Analysis of I/O Behavior of HPC Workloads

One of the earliest work in analyzing I/O characteristics was done by Pasquale et al. [147] where they studied the production workload of San Diego Supercomputing Center’s Cray YMP. I/O analysis of the I/O intensive applications revealed that I/O patterns are predictive. Nieuwejaar et al. [139] also studied file access characteristics on parallel file system in 1996. Hus et al. [89] in 2001 focused on analyzing I/O behavior from the application’ perspective, whereas our work emphasizes on the system side I/O behavior in an application-agnostic manner. Hsu et al. in 2003 [88] studied the I/O traffic in personal computers and server workloads. Even on small-scale personal computers, the analysis of I/O traffic showed similar bursty patterns as our result on large-scale LLNL supercomputers. In 2004, Wang et al. [199] analyzed workloads in a LLNL cluster. However, they focused on studying application traces to understand the types and sizes of file requests sent by application, the size of file reads and writes, node-aware locality and utilization of I/O bandwidth by nodes. The first ever Common Internet File System (CIFS) workload analysis was performed by Leung et al. [110] in 2008. Thereafter in 2009, Carns et al. did three studies [50, 51, 105], all targeting applications in very large scale storage systems at Argonne


In these three works, they studied application traces from the Darshan I/O characterization tool [104], how components work together to provide I/O services to applications in the Intrepid system [74], and techniques to optimize small file accesses in parallel file systems at very large scale. The year 2009 was one of the earliest times when petascale I/O workloads were studied, but all of these studies targeted application traces. The first work to study failure logs in a large scale storage system was also done in 2009 by Taerat et al. [183], where they analyzed Blue Gene/L failure log data at both the system and application levels. In 2010, Zhao et al. [225] and Shipman et al. [175] studied the Lustre file system. While Zhao et al. [225] worked on a prediction model to accurately predict I/O bandwidth in the Lustre file system, Shipman et al. [175] described how the world's largest Lustre file system in 2010 was deployed in the Spider system at Oak Ridge National Laboratory (ORNL). Though both of these works discussed the Lustre file system, neither studied the server-side characteristics of a large scale Lustre deployment. Oral et al. [144] extended Shipman et al.'s work [175] at ORNL by discussing the lessons learned from deploying large-scale parallel file systems in the Oak Ridge Leadership Computing Facility (OLCF). Kim et al. in 2010 [97] studied the I/O needs of applications on HPC clusters. This study involved running synthetic workloads and observing system behavior, which is different from our analysis that studies more than 4 million HPC jobs. Carns et al. [52] in 2011 performed an application-biased I/O characterization through a two-month study on Intrepid. In 2012, Saini et al. [169] studied the I/O behavior of five NASA applications on the Lustre file system. In 2016, Luu et al. [128] and Gunasekaran et al. [81] discussed application-level I/O behavior at production scale at ANL and ORNL, respectively. However, they do not go into detail regarding the characteristics of the storage servers at production scale. Using application-level behavior, Liu et al. [119] and McKenna et al. [130] in 2016, and Wyatt et al. [205] in 2017, developed machine learning algorithms to coordinate I/O traffic on large scale shared storage systems. All of these works focus on managing I/O on a per-application basis without considering system-side statistics. Lim et al. [114] in 2017 studied system-side statistics, but only considered metadata operations and did so in a manner that was not application-agnostic. We take this work one step further by studying both metadata and storage server statistics without being dependent on application characteristics. Lockwood et al. [123] built TOKIO in 2018, which uses analysis results to quantify the degree of I/O contention and the benefit to users of migrating to burst buffers. The same authors in 2018 provided an extensive analysis [122] using the tools in TOKIO. But this analysis, like many past studies, depends on application knowledge from Darshan. IOMiner [200], developed by Wang et al., provides a unified interface to query I/O statistics and analyze them. Zhou et al. [227] in 2018 and Yang et al. [208] in 2019 use application-level I/O characteristics to build an in-memory computing framework and an end-to-end I/O monitoring solution, respectively.

2.2 Optimization in Big Data Processing Frameworks

A number of recent works have focused on improving the performance of big data processing frameworks by designing better built-in data partitioning. Shark [206] supports column-oriented in-memory storage, using RDDs [212], to efficiently run SQL queries and iterative machine learning functions. This is achieved by re-planning query execution mid-query if needed. Spartan [91] automatically partitions the data to improve data locality when processing large multi-dimensional arrays in a distributed setting. It transforms the user code into an expression graph based on high-level operators, namely map, fold, filter, scan and join update, to determine the communication costs of data distribution between the compute nodes.

Re-partitioning MapReduce tasks has been actively studied [67, 76, 103, 113]. PIKACHU [76] improves load balancing of MapReduce [67] workloads on clusters with heterogeneous compute capabilities. It specifically targets the reduce phase of MapReduce execution and schedules jobs on slow and fast nodes such that job completion times are evened out across all nodes. On the other hand, Stubby [113] optimizes a given MapReduce workflow by combining multiple MapReduce jobs into a single job. It searches through the plan space of a given workflow and applies multiple optimizations, such as vertical packing to combine map and reduce operations from multiple jobs for reducing network traffic. Similarly, SkewTune [103] mitigates skewness in MapReduce applications by first detecting when a node has become idle in a cluster and then re-partitioning the longest-running job onto the idle node [215, 216, 217, 218, 219, 223].

The Hardware Accelerated Range Partitioning (HARP) [203] technique leverages specialized processing elements to improve the balance between memory throughput and energy efficiency by eliminating the compute bottlenecks of the data partitioning process. HARP makes the case that using dedicated hardware for data partitioning outperforms its software counterparts and achieves higher parallelism. Several other works also propose solutions to optimize the data partitioning problem in order to improve the processing and storage performance of multi-processor systems [61, 102, 189, 226], cloud storage systems [173, 191, 209], database systems [136, 164, 167, 204], and graph processing systems [58, 80, 174, 187].

2.3 Monitoring of Large Scale Storage Systems

Event monitoring is a critical component that enables many software projects, from software-defined cyberinfrastructure [56] to auditing programs [41]. These projects all share a common requirement: file system level events must reliably raise notifications to the tool. Many tools have been developed to monitor file systems and detect events. A common example is inotify [124], a tool for Unix-based systems that can attach kernel-level listeners to directories to identify common file events. Similar tools have been developed for other platforms: FileSystemWatcher for Windows [131], kqueue in FreeBSD [108], FSEvents in macOS [37], and fanotify for Android [116]. These tools are typically limited in the number of directories that can be concurrently monitored and do not have a uniform definition of file system events. For example, inotify places a watch on each directory it monitors, and each of these watches requires 1 KB of memory.

The Python Watchdog module [165] provides a common interface for invoking multiple monitors, such as inotify, kqueue, and FSEvents, as well as a polling-based option. Although this tool does not standardize the event interface, it does provide a common API for using each tool. Facebook's Watchman [73], FSWatch [64], and FSMon [228] are similar to the Python Watchdog module in that they provide a common interface. However, all of these tools rely on operating system facilities for file system event notification, and therefore cannot be used to develop a scalable monitoring solution for distributed file systems.

Monitoring large-scale Lustre file systems requires the use of specialized tools [133]. An example for monitoring Lustre is the Robinhood Policy Engine [107], which is capable of collecting events from Lustre file systems and using them to drive a policy engine that automates data management tasks, such as purging old files. Robinhood uses an iterative approach to collect event data from metadata servers, where it queries each Metadata Server (MDS) for new events sequentially.
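As a concrete illustration of the common monitoring API mentioned above, the sketch below uses the Python Watchdog module to attach a recursive event handler to a directory tree. The watched path is a placeholder and assumed to exist; on Linux, Watchdog transparently selects the inotify backend and falls back to polling where native facilities are unavailable.

```python
# A minimal sketch using the Python Watchdog module's common API; the watched
# path is a placeholder and the handler only logs events.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class LogHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # event.event_type is e.g. 'created', 'modified', 'deleted', 'moved'
        print(f"{event.event_type}: {event.src_path}")

observer = Observer()                      # picks a platform-native backend
observer.schedule(LogHandler(), "/tmp/watched", recursive=True)
observer.start()
try:
    time.sleep(60)                         # monitor for one minute
finally:
    observer.stop()
    observer.join()
```

Such single-node monitors illustrate why a distributed design is needed: each observer only sees the directories of one client, whereas FSMonitor must aggregate events across an entire parallel file system.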

2.4 Metadata Indexing in HPC Storage Systems

Inversion [141] is one of the first systems to propose integrating indexes into the file system. It uses a general-purpose DBMS as the core file system structure, rather than traditional file system inode and data layouts. Brindexer, in contrast, uses file system inode information to build the metadata index database. BeFS [77] uses a B+tree to index file system metadata; however, it suffers from scalability issues. Recent metadata indexing techniques include Spyglass [111], SmartStore [90], Security Aware Partitioning [146], and GIGA+ [149], which use a spatial tree, such as a k-d tree [210] or an R-tree [82], to index metadata. Brindexer instead uses an RDBMS with 2-level database sharding to efficiently store metadata index information. Other recent metadata indexing tools, GUFI [9], Robinhood Policy Engine [107], and BorgFS [3], use an external database for indexing: a metadata snapshot is taken to an external node where the indexing is performed. Brindexer uses an in-tree design so that no external resources are used, which would otherwise compromise the scalability of the indexing approach. PROMES [118] is another recent approach which uses provenance to efficiently improve metadata searching performance in storage systems. However, provenance depends on building a relationship graph, which is infeasible on large-scale HPC storage systems. Therefore, this technique serves well only for single node file systems [18, 65, 66, 151, 152, 153, 154, 155, 159, 160, 161, 162, 163, 179, 196]. Dindex [214] is a distributed indexing technique which comprises hierarchical index layers, each of which is distributed across all nodes. It builds upon distributed hashing, hierarchical aggregation, and composite identification. Brindexer uses a leveled partitioning technique so that every disjoint sub-tree can be indexed in parallel. TagIt [178], Someta [185], and EMPRESS [106] are metadata management systems that enable tagging and searching. The metadata can be enriched with custom tags for filtering, pre-processing, or automatic metadata extraction. However, this functionality is built into the storage system itself. Client nodes have more compute power and faster interconnects; an indexing tool such as Brindexer can leverage that power to index metadata from the file system clients [59, 99, 192, 193, 194, 195].
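To make the 2-level sharding idea concrete, the following is a purely hypothetical sketch, not Brindexer's actual implementation: level one maps a path to the disjoint sub-tree produced by leveled partitioning, and level two hashes the path into one of several per-sub-tree SQLite databases. All names and parameters here (subtree_of, NUM_SHARDS, the schema) are illustrative assumptions.

```python
# Hypothetical two-level sharded metadata index, in the spirit of the Brindexer
# description above (not its actual implementation).
import hashlib
import sqlite3

NUM_SHARDS = 4  # level-2 shards per sub-tree (assumed)

def subtree_of(path, depth=2):
    """Level 1: identify the sub-tree by the first `depth` path components."""
    return "/".join(path.strip("/").split("/")[:depth])

def shard_for(path):
    """Level 2: hash the full path into one of NUM_SHARDS databases."""
    digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
    return digest % NUM_SHARDS

def open_shard(path):
    """Open (and lazily create) the SQLite database that owns this path."""
    db_name = f"index_{subtree_of(path).replace('/', '_')}_{shard_for(path)}.db"
    conn = sqlite3.connect(db_name)
    conn.execute("""CREATE TABLE IF NOT EXISTS meta
                    (path TEXT PRIMARY KEY, size INTEGER, mtime REAL, uid INTEGER)""")
    return conn

def index_entry(path, size, mtime, uid):
    conn = open_shard(path)
    conn.execute("INSERT OR REPLACE INTO meta VALUES (?, ?, ?, ?)",
                 (path, size, mtime, uid))
    conn.commit()
    conn.close()
```

Because each sub-tree maps to its own set of shard databases, independent sub-trees can be indexed and queried in parallel without contending on a single index.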

2.5 Load Management in HPC Storage Systems

Load management has been incorporated into a number of modern distributed storage system designs. GlusterFS [46] uses an elastic hashing algorithm that completely eliminates location metadata to reduce the risk of data loss, data corruption, and data unavailability. However, no load balancing is supported across the storage targets. Ceph [135, 201] uses dynamic load balancing based on CRUSH [202], a pseudo-random placement function. It also adds limited support for read shedding, where clients belonging to a read flash crowd are redirected to replicas of the primary copy of the data. Several recent storage systems have explored optimization techniques for load balancing. In [68], dynamic data migration is proposed to balance the load under various constraints. Such approaches add the overhead of migration, while also having to maintain availability and consistency. The VectorDot [180] algorithm is able to incorporate all of these different constraints by modeling load balancing as a multidimensional knapsack problem. It is suitable for hierarchical storage systems, as it can model hierarchical constraints [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 49, 54, 55, 60, 70, 100, 101, 117, 177, 182, 186, 197, 207, 220, 221, 222, 224]. Machine learning and data mining techniques have also been used for the more general problem of resource allocation, which includes some load balancing. Martinez et al. [129] introduce basic learning techniques for improving scheduling in hardware systems. These techniques are focused on individual hardware components and cannot be easily adapted to distributed file systems. A rule-based approach to balance load in distributed file servers using graph mining methods is proposed in [78], where the access patterns of files are used to relocate file sets among different file servers. Schaerf et al. [171] explore the problem space of adaptive load balancing using reinforcement learning techniques. Game theory has also been used for resource allocation [161], and machine learning has recently been explored to optimize various system-level metrics [132].

Chapter 3

Understanding HPC Application I/O Behavior Using System Level Statistics

3.1 Introduction

The current trend for high performance computing (HPC) systems is that processor performance improves at a rate of 20% per year, while disk access time improves by only 10% every year [85]. As a result, massively parallel HPC applications can suffer from imbalance in computation and I/O performance, with I/O operations becoming a limiting factor in application efficiency [105]. To mitigate this problem, much effort has been dedicated to implementing high performance parallel file systems to support the I/O needs of HPC applications. The Lustre file system [142] is one of the most widely-used parallel file systems, supporting seven of the top ten supercomputers in the latest Top-500 list (June, 2020) [72]. Even though HPC parallel file systems such as Lustre provide much higher bandwidth and reliability than other options, continual improvement of parallel file systems is needed to reduce the impact of the increasing I/O performance bottleneck. File system designers need a comprehensive understanding of the I/O workloads on current HPC systems so that they can design successful next-generation file systems. Additionally, HPC system designers and system administrators need to understand both file system and application behavior so that they can design and tune HPC systems to run as efficiently as possible. Traditionally, I/O and storage analysis efforts have concentrated on analyzing application I/O behavior from application-level statistics [88, 89, 97, 122, 139, 147, 169], and there have been significant studies on the I/O workloads of large scale systems [50, 51, 52, 105, 123, 199, 200] which identify potential I/O bottlenecks in applications and suggest improvements to HPC users.

However, while these studies can give clues about how particular applications utilize and stress HPC file systems, they do not give true insight into storage system performance under a general load. Thus, in order to draw meaningful conclusions about how to improve parallel file system designs for all users of an HPC system, we need to analyze statistics captured from the file system itself (from file system data on compute nodes and from data on servers that manage the I/O requests from HPC applications) independently of user applications. Some of the earliest studies of HPC file systems utilizing system statistics were in 2010 [175, 225]. These works analyzed the deployment of Lustre file systems and the lessons learned from them. However, to the best of our knowledge, there have not been efforts which analyze file system statistics in an application-agnostic manner to understand how to support a general HPC workload. Lawrence Livermore National Laboratory (LLNL) is home to a variety of clusters that utilize Lustre file systems as primary storage, and millions of jobs run on these systems. In this effort, we take advantage of the Lustre resources at LLNL and perform a study of general application behavior using file system statistics from Lustre. In this chapter, we collect and study file system statistics from two Livermore Computing systems with 15 PiB Lustre file systems at LLNL, namely Quartz and Cab1. We collect two types of data from these systems.

• Aggregate Job Statistics: This data represents aggregate statistics collected from file system daemons on compute nodes for all jobs that ran on these two systems during the logging period: for Cab April 2015 – March 2018, and for Quartz April 2017 – March 2018.

• Time-Series Job Statistics: This data represents time series data collected at 60-second intervals from the metadata server and object storage servers of Lustre for each job; i.e., for each job, we record summary statistics for every minute of the running job during which I/O operations occurred. We collected this data from Quartz for the period June 7, 2018 – July 10, 2018.

We analyze these system statistics without knowledge of the types of applications running on these systems with the goal of answering questions such as:

• What are the typical I/O characteristics of I/O-heavy jobs? For example, do the jobs perform efficient writes with large byte counts per operation? Or do they perform many small (inefficient) write requests?

• How do I/O operations of jobs affect the metadata server?

• Is I/O traffic heavier on any particular day of the month or day of the week?

• Can a single long-running job have a significant effect on the performance of the object storage servers?

1Cab was decommissioned in June 2018.

• How does the size of application output files correlate with compute node memory size?

Our study of application-agnostic I/O statistics from the Lustre file system contributes to the state-of-the-art in I/O behavior understanding. We provide insight into how general HPC workloads affect the performance of file systems, which can aid system architects in improving file system and storage system designs and system administrators in tuning existing systems and advising users of best practices. These improvements will help to alleviate the I/O imbalance in HPC systems and increase the overall efficiency of HPC applications.

3.2 Background

In this section, we first describe the architecture of the Lustre file system. Following this, we describe the LLNL computing systems we utilized in this work.

3.2.1 Lustre Parallel File System


Figure 3.1: An overview of Lustre architecture.

The architecture of the Lustre file system is shown in Figure 3.1. Lustre has a client-server network architecture and is designed for high performance and scalability. The Management Server (MGS) is responsible for storing the configuration information for the entire Lustre file system. This persistent information is stored on the Management Target (MGT). The Metadata Server (MDS) manages all the namespace operations for the file system. The namespace metadata, such as directories, file names, file layouts, and access permissions, are stored on a Metadata Target (MDT). Every Lustre file system must have a minimum of one MDT. Object Storage Servers (OSSes) provide the storage for the file contents in a Lustre file system. Each file is stored on one or more Object Storage Targets (OSTs) mounted on an OSS. Applications access the file system data via Lustre clients, which interact with OSSes directly for parallel file accesses. The internal high-speed data networking protocol for the Lustre file system is abstracted and managed by the Lustre Network (LNet) layer.
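As a small illustration of how this architecture exposes parallelism, the sketch below shows the basic round-robin (RAID-0 style) mapping from a byte offset in a striped file to one of the OST objects in its layout. The stripe size and OST indices are illustrative defaults, and real layouts (for example PFL composite layouts) can be considerably more involved.

```python
# Illustrative sketch of basic Lustre striping: a file striped over several OSTs
# places consecutive stripe_size chunks on successive OSTs in round-robin order.
def ost_for_offset(offset, stripe_size=1 << 20, ost_indices=(0, 1, 2, 3)):
    """Return the OST index (within the file's layout) holding a given byte offset."""
    stripe_number = offset // stripe_size          # which stripe the offset falls in
    return ost_indices[stripe_number % len(ost_indices)]

# Example: with 1 MiB stripes over 4 OSTs, offsets [0, 1 MiB) map to OST 0,
# [1 MiB, 2 MiB) to OST 1, and the pattern wraps around every 4 MiB.
print(ost_for_offset(3 * (1 << 20) + 5))           # -> 3
```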

3.2.2 Clusters

Table 3.1: Cluster Configurations.

                        Cab                   Quartz
Processor Architecture  Xeon 8-core E5-2670   Xeon 8-core E5-2695
Operating System        TOSS 2                TOSS 3
Processor Clock Rate    2.6 GHz               2.1 GHz
Nodes                   1,296                 2,634
Cores per node          16                    36
Total Cores             20,736                96,768
Memory per node         32 GB                 128 GB
Total Memory            41.5 TB               344.06 TB
Interconnect            QDR Infiniband        Omni-Path 100 Gb/s
Tflops                  426.0                 3,251.4

Table 3.1 gives an overview of the two compute clusters (Cab and Quartz) at LLNL used in our study. Both clusters have a 15 PiB Lustre file system as primary storage.

3.3 Data Collection

We collected two categories of file system data from Cab and Quartz.

• Aggregate Job Statistics

• Time-Series Job Statistics

3.3.1 Aggregate Job Statistics

The aggregate job statistics were collected on the client (compute) nodes by gathering Lustre counter data just before and after each job runs, and calculating the difference. The counter data are acquired from /proc/fs/lustre/llite/lustre-file-system/stats, which is exported by the Lustre client. The statistics in this file reflect the requests as they pass through the interface between the Virtual File System (VFS) and Lustre. Any file system request handled entirely by VFS is not included, nor is background activity such as re-transmission of a low-level Lustre (not user process) request after server recovery from a crash. All read- and write-related system calls result in VFS calling into Lustre code, so the read- and write-related counters we use reflect every such system call the job made. The statistics are per file system, not per Lustre OSS. These counters start at 0 when the Lustre file system is mounted, and are monotonically increasing until they wrap at 2^63 − 1. The specific statistics used are:

• starttime, endtime, duration, uid, nodes: These give the time when the job was started, when it ended, the duration of the job, the anonymized user ID of the job submitter, and the number of nodes on which the job ran.

• mkdir, mknod, open, rename, rmdir, unlink: These are the total metadata statistics recorded for the job.

• read bytes, write bytes: These represent the total number of bytes read and written by the job to the Lustre file system.

• read bytes count, write bytes count: These give the number of read and write calls made by the job to the Lustre file system.

• recv bytes, recv count, send bytes, send count: These network statistics give the number of packets received and sent as well as the number of bytes received and sent over LNet.

Both Cab and Quartz use the SLURM job scheduler [172]. SLURM was configured to run a prolog script after nodes have been allocated, but before the user's job script is run. This prolog script records the counts from the Lustre procfile (/proc/fs/lustre/llite/lustre-file-system/stats) at that time, for each node in the allocation. After the job script completes, SLURM runs an epilog script. For each node in the allocation, the epilog script extracts the counters from the procfile and calculates the difference between the "after" and "before" values for each counter. These per-node totals are then summed to obtain the total for the job. This total is stored in an RDBMS, which we queried for this data collection. The Aggregate Job Statistics were collected on Cab for three years (April 2015 – March 2018), whereas on Quartz, they were collected for one year (April 2017 – March 2018).
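The following is a minimal sketch, not the actual LLNL prolog/epilog scripts, of the before/after counter subtraction described above. It assumes the common llite stats line layout of counter name, sample count, the literal word "samples", a unit, and then min/max/sum; the mount name in the path is a placeholder, and the exact layout can vary across Lustre versions.

```python
# Sketch of per-job Lustre client counter collection: snapshot the llite stats
# file before the job, snapshot it again after, and subtract counter by counter.
STATS_FILE = "/proc/fs/lustre/llite/lustre-file-system/stats"  # placeholder mount name

def read_counters(path=STATS_FILE):
    """Snapshot every counter in the llite stats file as (call count, byte sum)."""
    counters = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2 or not fields[1].isdigit():
                continue                              # e.g. the snapshot_time header line
            name, count = fields[0], int(fields[1])
            total = int(fields[-1]) if len(fields) >= 7 else None   # byte sum, when reported
            counters[name] = (count, total)
    return counters

def job_delta(before, after):
    """Per-counter difference between post-job and pre-job snapshots on one node."""
    delta = {}
    for name, (count_after, total_after) in after.items():
        count_before, total_before = before.get(name, (0, 0))
        bytes_delta = (total_after - total_before
                       if total_after is not None and total_before is not None else None)
        delta[name] = (count_after - count_before, bytes_delta)
    return delta

# Prolog: before = read_counters(); Epilog: after = read_counters().
# The job total is the sum of job_delta(before, after) over all allocated nodes.
```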

3.3.2 Time-Series Job Statistics

Time series data on Lustre file system usage was gathered on the Lustre server nodes via the Lustre JobStats feature [126]. JobStats includes a job ID in every request the Lustre client sends to the servers. The Lustre server records statistics describing the requests received per Job ID. As a result, file system requests which are handled entirely by VFS, or which are satisfied by cached data on the client node, are not reflected in JobStats data. These statistics are gathered per-server, so the total I/O for a job is the sum of the values reported by all servers. File system read() and write() related requests include a count of bytes to be transferred. JobStats records the number of such requests received, the minimum and maximum byte count seen in requests received so far, and the sum of bytes transferred. If no requests for a given job ID are received for 600 seconds, statistics for that job ID are discarded. The absence of a job ID in the statistics on a server means the server received no requests for that job within the last 600 seconds, and so is equivalent to 0 valued counters. We collected data from the servers using Telegraf [93] and a customized lustre2 plugin which samples the statistics by reading the /proc files Lustre exports. The proc files were /proc/fs/lustre/mdt/*/job_stats on Lustre metadata servers, and /proc/fs/lustre/obdfilter/*/job_stats on Lustre object storage servers. We took one sample every 60 seconds. The data gathered by Telegraf was stored in InfluxDB [92]. The raw samples were dumped from InfluxDB as CSV for analysis. The specific statistics used are:

• MDS - jobstats_create, jobstats_mkdir, jobstats_mknod, jobstats_open, jobstats_rename, jobstats_rmdir, jobstats_unlink

• OSS - jobstats_read_bytes, jobstats_read_calls, jobstats_write_bytes, jobstats_write_calls

• jobid, time - Every statistic has a job ID and timestamp attached to it.

Time-Series Job Statistics were collected on Quartz for a period of 34 days (June 7, 2018 – July 10, 2018).
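Below is a rough, stand-alone sketch of how such per-job statistics can be read on an OSS; in our setup this sampling was actually performed by Telegraf's lustre2 plugin. The job_stats files are YAML-formatted, and the field names used here (read_bytes/write_bytes with samples and sum entries) follow common Lustre releases but may differ on other versions.

```python
# Sketch of one sampling pass over the OSS job_stats files, aggregating per-job
# read/write bytes and call counts across all OSTs served by this node.
import glob
from collections import defaultdict

import yaml  # PyYAML

def sample_oss_jobstats(pattern="/proc/fs/lustre/obdfilter/*/job_stats"):
    totals = defaultdict(lambda: {"read_bytes": 0, "read_calls": 0,
                                  "write_bytes": 0, "write_calls": 0})
    for proc_file in glob.glob(pattern):
        with open(proc_file) as f:
            doc = yaml.safe_load(f) or {}
        for entry in doc.get("job_stats") or []:
            job = str(entry.get("job_id"))
            for op in ("read", "write"):
                stats = entry.get(f"{op}_bytes", {})
                totals[job][f"{op}_bytes"] += stats.get("sum", 0)
                totals[job][f"{op}_calls"] += stats.get("samples", 0)
    return totals  # a job's cluster-wide total is the sum over all servers
```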

3.4 Analysis

We first analyze the aggregate job statistics to examine the trends of jobs submitted to both Cab and Quartz. Then we analyze the time-series job statistics to understand the details of job runs.

3.4.1 Aggregate Job Statistics

Overview of Jobs

On Cab, the aggregate job statistics were collected for a period of three years (April 2015 to March 2018). A total of 2,854,478 jobs ran during this period. The number of unique users who ran these jobs is 994.

On Quartz, the aggregate job statistics were collected for one year (April 2017 to March 2018). 1,401,897 jobs ran during that year and there were 584 unique users. We divide these jobs into two groups:

• DAT jobs: Jobs which need to run for more than 24 hours require a special Dedicated Application Time (DAT) request. Users receive expedited access to the systems, which allows them to run larger problems for longer periods of time. On Cab, the number of DAT jobs over three years was 1,868, and these were run by 48 unique users. On Quartz, in one year, the number of DAT jobs was 4,729 and the number of unique users running these jobs was 134.

• Non-DAT jobs: These were the jobs which ran for less than 24 hours. Our extended analysis focuses mostly on non-DAT jobs as these were the most common jobs submitted to the clusters.

Table 3.2: Overview of the Jobs run on Cab and Quartz.

                                                   Cab              Quartz
Duration of Data Collection                        Apr'15 - Mar'18  Apr'17 - Mar'18
Total Number of Jobs                               2,854,478        1,401,897
Total Number of Unique Users                       994              584
Number of Dedicated Application Time (DAT) Jobs    1,868            4,729
Number of Unique Users Running DAT Jobs            48               134
#Users with DAT Jobs Doing No I/O to Lustre        10               20
#Users with non-DAT Jobs Doing No I/O to Lustre    196              113

Table 3.2 summarizes the overview of the jobs. It can be seen that fewer than 0.4% of jobs receive expedited access to the systems and run for more than a day.

Overall I/O Statistics

We group both DAT and non-DAT jobs by users and analyze the read and write percentage of jobs run by them. This is shown in Figure 3.2. We find that some users run purely write-intensive workloads and some users run only read-intensive workloads.

Observation 1: We observe that read-intensive and write-intensive jobs are distributed evenly across users. Previously, a lot of work has been done to optimize performance for write-intensive workloads. However, with the increase in machine learning workloads, which are predominantly read-intensive, there has been an overall rise in the number of read-intensive workloads. Therefore, there should be an equal focus on optimizing both reads and writes in a parallel file system.


(a) Cab: Percentage I/O by user. (b) Quartz: Percentage I/O by user.

Figure 3.2: Read and write percentage of users on Cab and Quartz.

Analysis of Duration and Number of Nodes

To gain insight into the behavior of jobs in terms of their duration and the number of nodes used, we plot the Cumulative Distribution Function (CDF) for both of these metrics for Cab and Quartz in Figure 3.3.

Figure 3.3: Cumulative Distribution Function for Duration and #Nodes in Cab and Quartz. Panels: (a) Cab: DAT (CDF Duration), (b) Cab: non-DAT (CDF Duration), (c) Cab: DAT (CDF #Nodes), (d) Cab: non-DAT (CDF #Nodes), (e) Quartz: DAT (CDF Duration), (f) Quartz: non-DAT (CDF Duration), (g) Quartz: DAT (CDF #Nodes), (h) Quartz: non-DAT (CDF #Nodes).

• DAT Jobs: As can be seen from Figures 3.3a and 3.3e, more than 90% of DAT jobs take between 25 – 30 hours to run on both Cab and Quartz. Upon further analysis of the jobs which run for more than 150 hours on Cab, we found that those jobs did not perform any I/O. For the number of nodes used, Figures 3.3c and 3.3g show that 90% of the jobs use fewer than 100 nodes.

• non-DAT Jobs: For the duration of non-DAT jobs, 90% of the jobs run for less than 2 hours (Figures 3.3b and 3.3f). The CDF for node allocation of non-DAT jobs shows that 90% of the jobs were allocated fewer than 100 nodes.

Table 3.3 shows the detailed statistics for duration and number of nodes allocated.

Job Type   Metric             Cluster   min      max     mean   median
DAT        Duration (hours)   Cab       24       279.4   27.6   24
                              Quartz    24       78      24.1   24
           #Nodes             Cab       2        758     19.7   8
                              Quartz    1        1197    19.8   8
non-DAT    Duration (hours)   Cab       0.0003   23.9    0.8    0.003
                              Quartz    0.0003   23.9    0.85   0.03
           #Nodes             Cab       0        5056    8      2
                              Quartz    0        2504    3.6    1

Table 3.3: Statistics for duration and nodes for all jobs.

Observation 2: A huge effort has already been made in HPC storage systems to optimize I/O performance for long running jobs [145, 148]. However, we see that the majority of jobs on a representative real-world system are short-running jobs which do not occupy many nodes on the system. Therefore, there should be an equal effort to optimize the I/O performance of small jobs.

Jobs and Users with Efficient and Inefficient Writes

Starting from this section, we only analyze non-DAT jobs. Jobs with inefficient writes are jobs which write a large amount of data but whose number of bytes per write call is very low. This write pattern reduces I/O performance. We calculate the mean of the total bytes written across all jobs performing I/O, as well as the mean of the bytes written per write call across those jobs. Jobs with inefficient writes have total bytes written greater than the mean total write bytes across all jobs, and bytes written per call less than the mean value of writes per call across all jobs. Jobs with efficient writes have both write bytes and bytes written per write call greater than the respective mean values across all jobs.

• Jobs with inefficient writes: (bytes written > mean bytes written) and (bytes written per call < mean bytes written per call)

• Jobs with efficient writes: (bytes written > mean bytes written) and (bytes written per call > mean bytes written per call)
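A minimal sketch of this classification is shown below; it assumes each job record exposes its total write bytes and write call count (the field names are illustrative), and that the per-call mean is taken as the average of per-job bytes-per-call ratios.

```python
# Sketch of the efficient/inefficient write classification described above.
def classify_write_efficiency(jobs):
    """jobs: list of dicts with 'write_bytes' and 'write_calls' fields."""
    writers = [j for j in jobs if j["write_calls"] > 0]
    mean_bytes = sum(j["write_bytes"] for j in writers) / len(writers)
    mean_bpc = sum(j["write_bytes"] / j["write_calls"] for j in writers) / len(writers)

    efficient, inefficient = [], []
    for j in writers:
        if j["write_bytes"] <= mean_bytes:
            continue                       # only jobs writing more than the mean are classified
        bytes_per_call = j["write_bytes"] / j["write_calls"]
        (inefficient if bytes_per_call < mean_bpc else efficient).append(j)
    return efficient, inefficient
```

Grouping the same records by user ID before applying the thresholds yields the per-user classification reported below.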

We first see how many jobs had inefficient writes and then we focus on the users by grouping jobs by user IDs. The statistics for write bytes and write bytes per call on both Cab and Quartz are shown in Table 3.4.

Classification   Metric                 Cluster   min       max       mean       median
Jobs             Write bytes            Cab       81 KB     474 TB    14.4 GB    39 GB
                                        Quartz    40 Bytes  597 TB    18.1 GB    146 GB
                 Write bytes per call   Cab       1 Byte    820 MB    127 KB     390.9 KB
                                        Quartz    1.0 Byte  1.6 GB    219 KB     11.5 MB
Users            Write bytes            Cab       906 KB    4.8 PB    41.4 TB    7.4 TB
                                        Quartz    424 KB    4 PB      43.4 TB    27.1 TB
                 Write bytes per call   Cab       178 KB    30.5 GB   184.7 MB   123.9 MB
                                        Quartz    899 KB    108.9 GB  350.4 MB   86.2 MB

Table 3.4: Statistics for write bytes and write bytes per call for all jobs and users performing I/O on Cab and Quartz.

• Classification by Jobs: On Cab, out of 2,563,299 jobs which write more bytes than the mean value, 1,654,938 jobs (64.6%) had inefficient writes. On Quartz, out of 1,295,473 jobs, the number of jobs with inefficient writes was 893,462 (69%). The number of jobs with efficient writes on Cab and Quartz were 869,046 and 277,911 respectively.

• Classification by Users: On Cab, out of the 294 users whose jobs write more than the mean value of write bytes, the number of users performing inefficient writes was 138 (46.9%) and the number of users performing efficient writes was 62 (21%). On Quartz, the number of users performing inefficient and efficient writes were 111 (66%) and 57 (34%) users respectively out of 168 users who write more than the mean value of write bytes.

Observation 3: The number of jobs, and correspondingly the number of users, who perform efficient writes is very small. Therefore, HPC application developers should be trained to write an optimal number of bytes per write call so that applications can achieve better I/O performance.

Metadata Operations and Write Bytes

The metadata operations we consider are mkdir and mknod. We sum all metadata operations in a job and compare the sum with the write bytes for that job. The log-scale values are plotted in Figure 3.4. On both Cab and Quartz, it is seen that the number of metadata operations becomes larger as the number of bytes written grows. We computed the correlation coefficient, R, for each system and found that R = 0.93 for Cab and R = 0.59 for Quartz, indicating strong and moderate correlations, respectively.

Figure 3.4: Relationship between Write Bytes and Metadata operations for Jobs in Cab and Quartz. Panels: (a) Cab: Metadata vs Write, (b) Quartz: Metadata vs Write.
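A small sketch of this computation is given below, assuming per-job arrays of summed metadata operations (mkdir + mknod) and total write bytes, and assuming the correlation is taken over the log-scaled values that appear in Figure 3.4.

```python
# Sketch of the Pearson correlation between per-job metadata operations and
# write bytes, computed on log-scaled values with NumPy.
import numpy as np

def metadata_write_correlation(metadata_ops, write_bytes):
    ops = np.asarray(metadata_ops, dtype=float)
    wb = np.asarray(write_bytes, dtype=float)
    mask = (ops > 0) & (wb > 0)              # log scale requires positive values
    return np.corrcoef(np.log10(ops[mask]), np.log10(wb[mask]))[0, 1]
```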

Observation 4: There is a positive correlation between metadata operations and writes, specifically for mkdir and mknod. Therefore, to improve I/O performance, metadata operations need to be handled carefully by the metadata servers.

Behavior of Metadata Servers

To manage the increase in metadata, Lustre incorporates the Distributed Namespace (DNE) feature [125], i.e., more than one MDT in large HPC storage systems. Here, Lustre has 14 MDTs. The number of file open and close requests handled by the different MDTs is shown in Figure 3.5. It is seen in Figure 3.5 that the number of file opens being handled is consistent with the number of jobs, as expected. However, we observe that not all files which are opened are closed. Moreover, the number of file open requests handled by different MDTs differs.

Observation 5: Not all files which are opened are closed, which can lead to sub-optimal metadata performance due to dangling file pointers. Therefore, HPC users should be trained to always close open files. Additionally, there is scope for balancing the metadata load across metadata targets to improve the overall I/O performance.

Figure 3.5: #File opens and closes handled by different MDTs.

Analysis of I/O Traffic over Time

For this analysis, we asked three questions of our dataset.

1. Is there any trend in I/O activity for particular months, days of the month, or days of the week? Are more I/O intensive jobs run during the weekend? Do users run fewer jobs on holidays? To answer these questions, we plotted heat maps showing the I/O traffic over the whole data collection period. Due to space constraints, we only show the year 2017 in Figure 3.6. As seen in the heat maps, there was no particular trend in I/O traffic. Therefore, I/O traffic from jobs cannot be predicted by the job start time with respect to the calendar or holidays.

Figure 3.6: I/O Heat Map for 2017 per day and per day of the week in Cab and Quartz. Panels: (a) Cab: I/O Per Day, (b) Cab: I/O Per Day of Week, (c) Quartz: I/O Per Day, (d) Quartz: I/O Per Day of Week.

2. How much does a job which runs for the maximum duration affect the overall I/O traffic in a month, day, or a day of the week? We explored whether a job which ran for a long period tended to be responsible for a large portion of the I/O traffic in the system. We chose the jobs which ran for the longest period of time in a day, or a day of the week. We then calculated the percentage of total system I/O traffic contributed by each of those jobs. Figure 3.7 shows the contribution of the job which runs for the maximum duration to the overall traffic on a particular day and day of the week in the year 2017. There are a few days where the contribution is significant, but overall, no significant effect was seen.

Figure 3.7: Contribution of the Job with Maximum Duration on I/O for a day and a day of the week in 2017. Panels: (a) Cab: Per Day, (b) Cab: Per Day of Week, (c) Quartz: Per Day, (d) Quartz: Per Day of Week.

3. How much does a user with the highest bytes read or written in a day or a day of the week affect the overall I/O traffic? Our last question was how much one user's I/O contributes to the overall I/O traffic in the system. To help answer this question, we found the users who performed the maximum I/O (highest bytes read or written) on a particular day or a day of the week. Then, we calculated the percentage of total system I/O these users contributed on that day or day of the week. Figure 3.8 shows the contribution of users with maximum I/O to the overall I/O traffic for the year 2017. As can be seen, these users have a very significant impact on the overall I/O of the system. Therefore, job schedulers would ideally schedule other jobs with little or no I/O while these users run their jobs, to reduce I/O contention and job run time.

Observation 6: There is no particular trend of I/O corresponding to a month, day of a month, or day of a week. Therefore, I/O-intensive HPC jobs cannot avoid I/O contention simply by being submitted at a particular time. As expected, the I/O during a particular period when multiple I/O intensive jobs run on the system is dominated by the job which does the maximum I/O rather than the job which runs for the maximum duration. Therefore, I/O optimizations should not be focused on long running jobs, and there is no correlation between the duration of a job and the amount of I/O requests that the job generates.

Figure 3.8: Contribution of the User doing maximum I/O (highest read/write bytes) with respect to total I/O on the system in 2017, per day and per day of the week. Panels: (a) Cab: Per Day, (b) Cab: Per Day of Week, (c) Quartz: Per Day, (d) Quartz: Per Day of Week.

3.4.2 Time-Series Job Statistics

The next few analysis sections involve analyzing the time-series job statistics.

Temporal distribution of I/O within jobs’ runs

Our analysis shows that more than 53% of the jobs submitted to the clusters perform some I/O. But how are the I/O operations distributed across the jobs’ run time?

Figure 3.9: Percent of minutes during which 11,425 jobs performed any I/O.

Figure 3.10: Percent of minutes during which 10,443 jobs performed at least one write.

All 16,795 jobs in the time-series dataset performed some I/O (the collection method excludes jobs without I/O). Of those jobs, 11,425 performed either read, write, or both operations, and 10,443 jobs performed write operations. Since the time-series dataset records reflect I/O performed during each one-minute span, we can determine what percentage of the minutes during a job's run saw some I/O performed. We will call this percentage "I/O share". Note that during a given minute, a single byte written or read causes that minute to be included in the I/O share, even though the I/O may have taken only a small fraction of a second. Figure 3.9 shows the I/O share for jobs which performed read, write, or both operations; the mean I/O share for these jobs is 78.8%. The I/O share for jobs performing write operations is shown in Figure 3.10; the mean is 80.4%. The distributions in Figures 3.9 and 3.10 are similar, indicating that write operations dominate the I/O performed by jobs.
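A minimal sketch of the I/O share metric follows, assuming each job is summarized by the set of one-minute timestamps at which the time-series data recorded any I/O, together with its start and end times (the field layout is illustrative).

```python
# Sketch of the "I/O share" metric: the percentage of a job's runtime minutes
# during which any I/O was recorded in the 60-second time-series samples.
def io_share(io_minutes, start_time, end_time):
    """io_minutes: set of minute timestamps with recorded I/O; times in seconds."""
    runtime_minutes = max(1, int((end_time - start_time) // 60))
    return 100.0 * len(io_minutes) / runtime_minutes

def mean_io_share(jobs):
    """jobs: iterable of (io_minutes, start_time, end_time) tuples."""
    shares = [io_share(*job) for job in jobs]
    return sum(shares) / len(shares)
```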

Observation 7: On average, jobs which perform I/O spread their I/O activities across 78.8% of their runtime. Therefore, I/O optimizations cannot be focused on only certain portions of a job's runtime. We need to work on developing I/O optimizations that aim to lower I/O contention and improve the overall I/O performance for most of the duration of an application's runtime.

Relationship between write bytes and memory

Figure 3.11: Write Bytes written over time by 3 random jobs in Quartz.

Figure 3.11 shows the write pattern for 3 randomly chosen jobs. It is seen that the write bytes over time are periodic. Therefore, we assume that each burst represents writing a single file, and we add up the write bytes to get the size of the file. This single file could represent a checkpoint of the application data. In the Quartz time-series dataset, 16,795 jobs were recorded. The total memory in one node in Quartz is 128 GB. We calculate the size of memory as 128 GB × job node count. Only 170 jobs wrote in bursts larger than 50% of memory. 16,485 jobs wrote bursts which are smaller than 15% of memory, while 16,173 jobs wrote bursts smaller than 5% of memory.

Observation 8: More than 95% of applications write in bursts of size less than 5% of memory. Therefore, memory size is not a good predictor of write burst size, as is assumed in many previous I/O optimization works [130]. We need better prediction approaches, such as [96], to predict I/O bursts.

Next, we inspect the write burst patterns for all I/O jobs in Quartz. First, burst size is compared to memory size. For every job, its I/O duration is divided into five categories.

• percent LessThan1: Percent of the total I/O duration during which the job's write bursts are < 1% of memory.

• percent 1To5: Percent of the total I/O duration during which the job's write bursts are ≥ 1% and < 5% of memory.

• percent 5To10: Percent of the total I/O duration during which the job's write bursts are ≥ 5% and < 10% of memory.

• percent 10To50: Percent of the total I/O duration during which the job's write bursts are ≥ 10% and < 50% of memory.

• percent 50To100: Percent of the total I/O duration during which the job's write bursts are ≥ 50% and ≤ 100% of memory.
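The sketch below illustrates this bucketing, under the assumption that each job provides its node count and a list of per-burst byte counts, and using the count of bursts in each bucket as a proxy for the share of I/O duration (the field names and this simplification are illustrative).

```python
# Sketch of the write-burst-size bucketing relative to node memory.
NODE_MEMORY = 128 * 2**30          # 128 GB per Quartz node

BUCKETS = [(0.00, 0.01, "percent_LessThan1"),
           (0.01, 0.05, "percent_1To5"),
           (0.05, 0.10, "percent_5To10"),
           (0.10, 0.50, "percent_10To50"),
           (0.50, 1.01, "percent_50To100")]

def burst_buckets(burst_bytes, node_count):
    """Percent of a job's write bursts falling into each memory-fraction bucket."""
    memory = NODE_MEMORY * node_count
    counts = {name: 0 for _, _, name in BUCKETS}
    for burst in burst_bytes:
        fraction = burst / memory
        for low, high, name in BUCKETS:
            if low <= fraction < high:
                counts[name] += 1
                break
    total = max(1, len(burst_bytes))
    return {name: 100.0 * c / total for name, c in counts.items()}
```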

Figure 3.12 shows how much of the job I/O time is associated with different sizes of write bursts, possibly to checkpoint data. It is clear from the figure that most of the jobs spend 100% of their I/O minutes writing bursts smaller than 1% of memory, while there are a few jobs which spend all of their I/O minutes writing bursts of sizes between 10 – 50% of memory.

Observation 9: 90% of jobs never write bursts larger than 1% of memory size. Therefore, a larger portion of the memory in an HPC storage system can ideally be used to increase the prefetching capacity and improve the overall I/O performance.

Figure 3.12: % I/O duration vs. write burst size as % of memory.

Figure 3.13: Percent I/O duration vs. size of write bursts.

We also look at the absolute size of the write bursts and what percentage of the job I/O duration is associated with different sizes of bursts. Similar to Figure 3.12, Figure 3.13 shows the percentage of I/O time during which jobs issue different sizes of bursts. This is for all jobs which performed I/O. The categories of data sizes are:

• 0 Bytes

• Less than 1 KB

• Between 1 KB and 1 MB

• Between 1 MB and 1 GB

• Between 1 GB and 1 TB

• More than 1 TB

Our analysis shows that 73% of jobs spend more than 50% of their I/O time writing data bursts whose size is between 1 KB and 1 MB. There are also many jobs which spend more than 90% of their I/O time writing bursts of size 1 MB to 1 GB.

Observation 10: Most jobs write burst data in the range of a few kilobytes for the majority of their I/O duration. In the past, I/O optimizations have focused on large checkpoint data. However, we observe that a significant portion of checkpoint data consists of small writes. Therefore, I/O optimizations should also target burst write requests with a small number of bytes.

Demystifying I/O Contention

Figure 3.14a shows the total amount of data which is transferred at different hours of the day. We observe that the largest amount of I/O activity is performed by runs which start at 5AM and 11AM local time. In Figure 3.14b, we plot the percentage of I/O time for applications across different hours of the day. The percentage of I/O time of an application is plotted as a percentage of the maximum I/O time among all runs which exhibit similar I/O behavior, to normalize it across applications. We observe that runs started at 5AM and 11AM have the highest percentage of I/O time due to the high I/O activity during these hours.

Figure 3.14: I/O behavior of applications for every hour throughout the day. Panels: (a) Total I/O for every hour throughout the day, (b) Percentage of I/O time normalized for similar jobs starting at different hours throughout the day.

Observation 11: I/O contention adversely affects the I/O performance of jobs, resulting in similar applications spending more time doing I/O during periods when other applications are also performing considerable I/O. This makes I/O performance the bottleneck for improving the overall performance of an HPC application. Therefore, much effort should be devoted to handling I/O contention in HPC centers.

3.5 Discussion

The analysis of the server level statistics gave interesting insights into HPC application I/O behavior. Though the study focused on Livermore Computing resources, the insights can be extended to other HPC centers. The study is conducted on a real-world deployment of the Lustre file system, which is one of the most popular parallel file systems used in the top 100 supercomputers [72]. The jobs which are studied in this chapter are representative of those at other high performance computing centers, such as the National Energy Research Scientific Computing Center (NERSC) [148] and the Oak Ridge Leadership Computing Facility (OLCF) [145].

3.5.1 Lessons for HPC administrators

• There is an equal distribution of reads and writes per user in the HPC storage system. Therefore, equal focus needs to be given to optimizing both reads and writes in a parallel file system.

• The mean duration of jobs is less than 52 minutes, which suggests that job scheduling and I/O contention strategies should be developed for shorter duration jobs rather than jobs which run for many hours.

• This finding also seems to contradict the conventional wisdom that defensive I/O is the primary use of HPC file systems, which may explain the very small number of jobs which wrote bursts larger than 1% of memory.

• A small number of jobs generated most of the load on the file system. Focusing on improving the I/O behavior of jobs which perform maximum I/O will be more beneficial for overall I/O performance than focusing on jobs which run for long durations.

• Very few applications perform efficient writes to the file system. There are applications which write burst sizes greater than 50% of memory; these may be checkpoints.

• Effort should be made in identifying periods of I/O contention in different HPC centers and users should be educated to submit I/O intensive jobs outside of those times to improve I/O performance of the overall system.

• There should be optimizations for balancing the metadata load across the metadata servers, since an unbalanced metadata load can adversely impact I/O performance.

3.5.2 Lessons for HPC users

• Users who run write-intensive workloads on a parallel file system need to perform efficient writes. Only 22% of users perform efficient writes, which degrades the I/O performance of the entire system. Users can get better I/O performance by writing more bytes per write function call.

• Starting an I/O intensive job at particular times can reduce the I/O contention it faces, which would improve the overall performance of the application. Therefore, HPC users should know which periods of the day are poorly suited for submitting I/O intensive applications.

• A lot of memory remains unused for I/O operations. Therefore, HPC application developers can improve I/O performance by prefetching data into memory.

3.6 Chapter Summary

Improving I/O performance has become the most important factor in modern I/O bound HPC applications. Therefore, understanding the I/O behavior of HPC applications is very important for system administrators, file system developers, and HPC users. This chapter collected Lustre file system server level statistics from two clusters, Cab and Quartz, at the Lawrence Livermore National Laboratory, for a period of three years and analyzed the statistics in an application-agnostic manner. Our studies have indicated interesting results which show that, due to the increase in popularity of machine learning jobs, HPC jobs now have an even distribution of write-intensive and read-intensive jobs, showing the importance of giving equal priority to improving file system read and write performance. Our analysis also led us to believe that there should be a focus on I/O optimizations for jobs which run for a short duration. Also, there should be efforts to educate HPC users to develop applications which perform efficient writes. Moreover, the metadata spread across metadata servers is unbalanced, and there should be efforts to balance the load or migrate metadata operations to less loaded metadata servers. Much effort is also needed to mitigate the effect of I/O contention that adversely impacts the I/O performance of the entire system. We believe that our analysis will help all HPC practitioners to build better file systems and utilize them more effectively.

Chapter 4

Optimizing Data Partitioning for In-Memory Data Analytics Frameworks

4.1 Introduction

Large scale in-memory data analytics platforms, such as Spark [211, 212], are being increasingly adopted by both academia and industry for processing data for a myriad of applications and data sources. These platforms are able to greatly reduce the amount of disk I/O by caching the intermediate application data in-memory, and leverage more powerful and flexible directed acyclic graph (DAG) based task scheduling. Thus, in-memory platforms outperform the widely-used MapReduce [67]. The main advantage of a DAG-based programming paradigm is the flexibility it offers to users in expressing their application requirements. However, the downside is that complicated task scheduling makes identifying application bottlenecks and performance tuning increasingly challenging. There is a plethora of application-specific parameters that impact runtime performance in Spark, such as task parallelism, data compression, and executor resource configuration. In typical data processing systems, the input (or intermediate) data is divided into logical subsets, called partitions. Specifically, in Spark, a partition cannot be divided between multiple compute nodes for execution, and each compute node in the cluster processes one or more partitions. Moreover, a user can configure the number of partitions and how the data should be partitioned (i.e., hash or range partitioning schemes) for each Spark job. Suboptimal task partitioning or selecting a non-optimal partition scheme can significantly increase workload execution time. For instance, if a partition strategy launches too many tasks within a computation phase, this would lead to CPU and memory resource contention, and thus degrade performance.


Conversely, if too few tasks are launched, the system would have low resource utilization and would again result in reduced performance. Moreover, inferior partitioning can lead to serious data skew across tasks, which would eventually result in some tasks taking significantly longer to complete than others. As data processing frameworks usually employ a global barrier between computation phases, it is critical to have all the tasks in the same phase finish at approximately the same time, so as to avoid stragglers that can hold back otherwise fast-running tasks. The right scheme for data partitioning is the key to extracting high performance from the underlying hardware resources. However, finding a data partitioning scheme that gives the best or highest performance is non-trivial. This is because data analytic workflows typically involve complex algorithms, e.g., machine learning and graph processing. Thus, the resulting task execution plan can become extremely complicated with an increasing number of computation phases. Moreover, given that each computation phase is different, the optimal number of partitions for each phase can also be different, further complicating the problem. Spark provides two methods for users to control task parallelism. One method is to use a workload specific configuration parameter, default.parallelism, which serves as the default number of tasks to use when the number of partitions is not specified. The second method is to use repartitioning APIs, which allow users to repartition the data. Spark does not support changing data parallelism between different computation phases except via manual partitioning within a user program through the repartitioning APIs. This is problematic because the optimal number of partitions can be affected by the size of the data. Users would have to change and recompile the program every time they process a different data set. Thus, a clear opportunity for optimization is lost due to the rigid partitioning approach. In this chapter, we propose Chopper, an auto-partitioning system for Spark1 that automatically determines the optimal number of partitions for each computation phase during workload execution. Chopper alleviates the need for users to manually tune their workloads to mitigate data skewness and suboptimal task parallelism. Our proposed dynamic data partitioning is challenging for several reasons. First, since Spark does not support changing task parallelism parameters for each computation phase, Chopper needs to design a new interface to enable the envisioned dynamic tuning of task parallelism. Second, the optimal data partitions differ across the different computation phases of workloads. Chopper needs to understand the application characteristics that affect task parallelism in order to select an appropriate partitioning strategy. Finally, to adjust the number of tasks, Chopper may introduce additional data repartitioning, which may incur extra data shuffling overhead that has to be mitigated or amortized. Thus, a careful orchestration of the parameters is needed to ensure that Chopper's benefits outweigh the costs. To address the above challenges, Chopper first modifies Spark to support dynamically changing data partitions through an application specific configuration file.

1We use Spark as our evaluation DAG-based data processing framework to implement and showcase the effectiveness of Chopper. We note that the proposed system can be applied to other DAG-based data processing frameworks, such as Dryad [94].

Chopper checks different numbers of data partitions before scheduling the next computation phase. Information gathering about the application execution is achieved via several lightweight test runs, which are then analyzed to identify task profiles, data skewness, and optimization opportunities. Chopper uses this information along with a heuristic to compute a data repartitioning scheme which minimizes the data skew and determines the right task parallelism for each computation phase, while minimizing the repartitioning overhead. Specifically, this chapter makes the following contributions.

1. We enable dynamic tuning of task parallelism for each computation phase in DAG-based in-memory analytics platforms such as Spark.

2. We design a heuristic to carefully compute suitable data repartitioning schemes with low repartitioning overhead. Our approach successfully identifies the data skewness and optimization opportunities and adjusts task parallelism automatically to yield higher performance compared to the default static approach.

3. We implement Chopper on top of Spark and evaluate the system to demonstrate its effectiveness. Our experiments demonstrate that Chopper can significantly outperform vanilla Spark by up to 35.2% for the studied workloads.

4.2 Background and Motivation

In this section, we first discuss current data partitioning methodologies in Spark. Next, we present the motivation for our work by studying the performance impact of different numbers of data partitions on a representative workload, KMeans [83].

4.2.1 Spark Data Partitioning

In Spark, data is managed as an easy-to-use memory abstraction called resilient distributed datasets (RDDs) [212]. To process large data in parallel, Spark partitions an RDD into a collection of immutable partitions (blocks) across a set of machines. Each machine retains several blocks of an RDD. Spark tasks, with one-to-one relationship with the partitions, are launched on the machine that stores the partitions. Computation is done in the form of RDD actions and transformations, which can be used to capture the lineage of a dataset as a DAG of RDDs, and help in the recreation of an RDD in case of a failure. Such DAGs of RDDs are maintained in a specialized component, DAGScheduler, which schedules the tasks for execution. Fig. 4.1 shows an overview of the Spark DAGScheduler. The input to DAGScheduler are called jobs (shown as ActiveJob in the figure). Jobs are submitted to the scheduler using a Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 38 Frameworks

Figure 4.1: Overview of Spark DAGScheduler. submitJob method. Every job requires computation of multiple stages to produce the final result. Stages are created by shuffle boundaries in the dependency graph, and constitute a set of tasks where each task is a single unit of work executed on a single machine. The narrow de- pendencies, e.g., map and filter, allow operations to be executed in parallel and are combined in a single stage. The wide dependencies, e.g., reduceByKey, require results to be combined from multiple tasks and cannot be confined to a single stage. Thus, there are two types of stages: ShuffleMapStage (shown in Fig. 4.1 as ShuffleMapStage1, ShuffleMapStage2, ..., ShuffleMapStagen), which writes map output files for a shuffle operation, and ResultStage, i.e., the final stage in a job. ShuffleMapStage and ResultStage are created in the scheduler using newShuffleMapStage and newResultStage methods, respectively. In Spark, tasks are generated based on the number of partitions of an input RDD at a particular stage. The same function is executed on every partition of an RDD by each task. In Fig. 4.1, ShuffleMapStage2 is expanded to show the operations involved in a particular stage. Different operations in a stage form different RDDs (RDD1, RDD2, ... , RDDr). Each RDD consists of a number of tasks that can be operated in parallel. In the figure, Pαβ represents partitionα of RDDβ. There are m partitions for an RDD. Each RDD in a stage consists of a narrow dependency on the previous RDD, which enables parallel execution of multiple tasks. Thus, the number of partitions at each stage determins the level of parallelism. Currently, Spark automatically determines the number of partitions based on the dataset size and the cluster setups. However, the number of partitions can also be configured manually using spark.default.parallelism parameter. In order to partition the data, Spark provides two data partitioning schemes, namely hash partitioner and range partitioner. Hash parti- tioner divides RDD based on the hash values of keys in each record. Data with the same hash keys are assigned to the same partition. Conversely, range partitioner divides a dataset into approximately equal-sized partitions, each of which contains data with keys within a specific range. Range partitioner provides better workload balance, while hash partitioner ensures that the related records are in the same partition. Hash partitioner is the default partitioner, however, users can opt to use their own partitioner by extending the Partitioner interface. 4.2. Background and Motivation 39

14 Partitions=100 Partitions=400 12 Partitions=200 Partitions=500 Partitions=300 10 8 6 4

Execution Time (sec.) 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Stages Figure 4.2: Execution time per stage under different number of partitions.

Although Spark provides mechanisms to automatically determine the number of partitions for a given RDD, it lacks application related knowledge to determine the best parallelism for a specific job. Moreover, the default hash partitioner is prone to creating workload imbalance for some input data. Spark provides the flexibility to tailor the configurations for workloads, however, it is not easy for users to determine the best configurations for each stage of a workload, especially when workloads may contain hundreds of stages.

4.2.2 Workload Study

To show the impact of data partitioning on application performance, we conduct a study using KMeans workload from SparkBench [112] and the latest release of Spark (version 1.6.1), with Hadoop (version 2.6) providing the HDFS [176] storage layer. Our experiments execute on a 6-node heterogeneous cluster. Three nodes (A, B, C) have 32 cores, 2.0 GHz AMD processors, 64 GB memory, and are connected through a 10 Gbps Ethernet interconnect. Two nodes (D, E) have 8 cores, 2.3 GHz Intel processors, 48 GB memory. The the remaining node (F) has 8 cores, 2.5 GHz Intel processor, and 64 GB memory. Nodes D, E and F are connected via a 1 Gbps Ethernet interconnect. Node F is configured to be the master node, while nodes A to E are worker nodes for both HDFS and Spark. Every worker node has one executor with 40 GB memory, and the remaining memory can be used for the OS buffer and HDFS data node operations. While our cluster hardware is heterogeneous, we configure each executor with the same amount of resources, essentially providing same resources to each component to better match with Spark’s needs, and alleviating the performance impact due to the heterogeneity of the hardware. Note that, given the increasing heterogeneity in modern clusters, we do take the heterogeneity of cluster resource into account when designing Chopper. We repeated each experiment 3 times, and report the average results in the following. First, we study the performance impact of different number of partitions. For this test, we use KMeans workload with 7.3 GB input data size. KMeans has 20 stages in total, and we change the number of partitions from 100 to 500 and record the execution time for each stage. Fig. 4.2 shows the results. For every stage, the number of partitions that yields minimum Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 40 Frameworks

250 Stage 0 200

150

100

50 Execution Time (sec.)

0 100 200 300 400 500 Number of Partitions Figure 4.3: Execution time of stage 0 under different partition numbers.

1400 Partitions=100 Partitions=400 Partitions=200 Partitions=500 1200 Partitions=300 1000 800 600 400 Shuffle Data (KB) 200 0 12 13 14 15 16 17 Stage ID Figure 4.4: Shuffle data per stage under different partition numbers.

execution time varies. This shows that different stages have different characteristics and that the execution time for each stage can vary even under the same configuration. To further investigate the impact on performance, we study stage-0 in more detail. As shown in Fig. 4.3, the execution time of a stage changes with the number of partitions, and we see the worst performance when the number of partitions is set to 100. From this study, we observed that the number of partitions has an impact on the overall performance of a workload. Furthermore, different stages inside a workload can have different optimal number of partitions. In this example, specifying 100 partitions may be an optimal configuration for overall execution (Fig. 4.2), but clearly it is not optimal for stage-0 (Fig. 4.3). To better understand the above observed performance impact, we investigate the amount of shuffle data produced at each stage with different number of partitions—since shuffle has a big impact on workload performance. For this test, we record the maximum of shuffle read or write data as representative of shuffle data per stage. For KMeans, only stages 12-17 involve data shuffle. As shown in Fig. 4.4, any increase in the number of partitions also increases the shuffle data at each stage. A shuffle stage usually involves repartitioning RDD data. For an equivalent execution time, if repartitioning is not involved, then the amount of shuffle data increases from 434.83 KB for 200 partitions to 1081.6 KB for 500 partitions for stage-17, compared to 217.33 KB of shuffle data when repartitioning is done. Also, when compared to a large number of partitions, e.g., 2000, there is a significant increase in the execution time as well as increase in the amount of shuffle data. For 2000 partitions, the execution 4.3. System Design 41

Figure 4.5: System architecture of Chopper. time is 4.53 minutes and the amount of shuffle data for stage-17 is 4300.8 KB. We observe 38.8% improvement in execution time from repartitioning for similar amount of shuffle data. Also there is 46.1% improvement in execution time, and 94.9% reduction in the amount of shuffle data per stage when compared to large (i.e., 2000) number of partitions. These experiments show that the number of partitions is an important configuration param- eter in Spark and can help improve the performance of a workload. The optimal number of partitions not only varies with workload characteristics, but is also different among different stages of a workload. We leverage these findings in designing the auto-partitioning scheme of Chopper.

4.3 System Design

In this section, we present the design of Chopper, and how it achieves automatic reparti- tioning of RDDs for improved performance and system efficiency. Fig. 4.5 illustrates the overall architecture of Chopper. We design and implement Chopper as an independent component outside of Spark. As Spark is a fast evolving system, we keep the changes to Spark for enabling our dynamic partitioning to a minimum. This reduces code maintenance overhead while ensuring adoption of Chopper for real-world use cases. Chop- per consists of a partition optimizer, a configuration generator, a statistics collector, and a workload database. In addition, Chopper extends the Spark’s DAGScheduler to support dynamic partitioning configuration and employs a co-partition-aware scheduling mechanism to reduce network traffic whenever possible. Statistics collector communicates with Spark to gather runtime information and statistics of Spark applications. The collector can be easily Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 42 Frameworks

Figure 4.6: An example generated Spark workload configuration in Chopper. extended to gather additional information as needed. Workload DB stores the observed in- formation including the input and intermediate data size, the number of stages, the number of tasks per stage, and the resource utilization information. Partition optimizer retrieves ap- plication statistics, trains models, and computes an optimized partition scheme based on the current statistics and the trained models for the workload to be optimized. This information is also stored in Workload DB for future use. Partition optimizer then generates a workload specific configuration file. The extended dynamic partitioning DAGScheduler changes the number of partitions and the partition scheme per stage according to the generated Spark configuration file. Finally, the co-partitioning-aware component schedules partitions that are in the same key range on the same machine if possible to decrease the amount of shuffle data. The partition optimizer does not need to consider data locality because the input of the repartitioning phase is the local output of the previous Map phase, and the destinations of the output of the repartitioning phase depend on the designated shuffle scheme. Thus, existing locality is automatically preserved. Our system allows dynamic updates to the Spark configuration file whenever more runtime information is obtained. Chopper modifies Spark to allow applications to recognize, read, and adopt the new partition scheme.

4.3.1 Enable Auto-Partitioning

An example application configuration file produced by Chopper is shown in Fig. 4.6. It consists of multiple tuples each containing a signature; the partitioner, and the number of partitions for a particular stage. We use stage signatures to identify stages that invoke identical transformations and actions. This is helpful when the number of iterations within a machine learning workload is unknown, where we can: (a) use the same partition scheme for all the iterations; or (b) use previously trained models to dynamically determine the number of partitions to use if the intermediate data size changes across iterations. When an application is submitted to a Spark cluster, a Spark driver program is launched in which a SparkContext is instantiated. SparkContext then initiates our auto-partitioning aware DAGScheduler. The scheduler checks the Spark configuration file before a stage is executed. If the partition scheme is different from the current one, the scheduler changes the scheme based on the one specified in the configuration file. Each RDD has five internal properties, namely, partition list, function to compute for every partition’s dependency list 4.3. System Design 43 on other RDDs, partitioner, and a list of locations to compute every partition. Chopper changes the partitioner properties to enable repartitioning across stages. Chopper also supports dynamic updates to the Spark application configuration file based on the runtime information of current running workload. In particular, DAGScheduler pe- riodically checks the updated configuration file and uses the updated partitioning scheme if available. This improves the partitioning efficiency and overall performance.

4.3.2 Determine Stage-Level Partition Scheme

The partition optimizer is responsible for computing a desirable partition scheme for each stage of a workload, given the collected workload history, and current input data and size. The optimizer not only considers the execution time and shuffle data size of a stage but also the shuffle dependencies between RDDs. In the following, we describe how this is achieved for the stage-level information. The use of global DAG information is discussed in Section 4.3.3. Spark provides two types of partitioners, namely range partitioner and hash partitioner. Different data characteristics and data distributions require different partitioners to achieve optimal performance. Range partitioner creates data partitions with approximately same- sized ranges. RDD tuples that have keys in the same range are allocated to the same partition. Spark determines the ranges through sampling of the content of RDDs passed to it when creating a range partitioner. Thus, the effectiveness of a range partitioner highly depends on the data contents. A range partition scheme that distributes a RDD evenly is likely to partition another RDD into a highly-skewed distribution. In contrast, a hash partitioner allocates tuples using a hash function modulo of the number of partitions. The hash partitioner attempts to partition data evenly based on a hash function and is less sensitive to the data contents, and can produce even distributions. However, if the dataset has hot keys, a partition can become skewed in terms of load, as identical keys are mapped to the same partition. Consequently, the appropriate choice between the range partitioner or hash partitioner depends on the dataset characteristics and DAG execution patterns. To compute the stage level partition scheme, we aim to minimize both the stage execution time and the amount of shuffle data. Considering stage execution time, and shuffle data that directly affects the execution time, enables us to capture the right task granularity. This pre- vents the partitions from both growing unexpectedly large—creating resource contention— or becoming trivially small—under-utilizing resources and incurring extra task scheduling overhead. The approach also implicitly alleviates task skew by filtering out inferior partition schemes. Equations 4.1, 4.2, 4.3 and 4.4 describe the model learned and the objective function used to determine the optimal number of partitions. In particular, D denotes the size of input data for the stage, P denotes the number of partitions, texe represents the execution time of the 0 0 stage, and sshuffle is the amount of shuffle data in the stage. texe and sshuffle denote the stage Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 44 Frameworks execution time and amount of shuffle data obtained using default parallelism, respectively. Given input data size, Equation 4.4 enables Chopper to determine the optimal number of partitions minimizing both execution time and the amount of shuffle data. By normalizing the execution time and the amount of shuffle data with respect to the respective values under default parallelism, we are able to capture both of our constraints into the same objective function. Constants α and β can be used to adjust the weights between the two factors. In our implementation, we set the constants to a default value of 0.5, making them equally important. 
We model the execution time and the amount of shuffle data based on the input data size of the current stage and the number of partitions as shown in Equations 4.1 and 4.2. This is a coarse grained model, since it is independent of Spark execution details and focuses on capturing the relationship between input size, parallelism, execution time and shuffle data. In particular, we posit that the execution time increases with the input data size, obeying a combination of cube, square, linear, and sub-linear curves. The amount of shuffle data also increases or decreases with the number of partitions according to a combination of cube, square, linear, and sub-linear curves. Note that our model can capture most applications observed in the real-world use cases for Spark. However, it may not be able to model corner cases such as those with radically different behavior, e.g., workloads for which execution time grows with D4. In general, we observe that such a model is simple and computationally efficient, yet powerful enough to capture applications with different characteristics via various coefficients of the model. When the cluster resources and other configuration parameters are fixed, the model fits the actual execution time and amount of shuffle data well with varying size of input data and number of partitions. The data points needed to train the models are gathered by the statistics collector. If the collected data points are not sufficient, Chopper can initiate a few test runs by varying the sampled input data size and the number of partitions and record the execution time and the amount of shuffle data produced. Chopper also remembers the statistics from the user workload execution in a production environment, which can be further leveraged to better train and model the current application behavior.

3 2 1/2 3 2 1/2 texe = a1D + b1D + c1D + d1D + e1P + f1P + g1P + h1P , (4.1)

3 2 1/2 3 2 1/2 sshuffle = a2D + b2D + c2D + d2D + e2P + f2P + g2P + h2P , (4.2)

0 0 cost = αtexe/texe + βsshuffle/sshuffle (4.3)

min cost (4.4)

Chopper trains two models using Equations 4.1 and 4.2 for every stage of a workload, one for range partitioning and the other for hash partitioning. Algorithm1 presents how 4.3. System Design 45

Algorithm 1: Get Stage level Partition Scheme getStagePar. Input: workload w, stage s, input size d Output: (partitioner, numP ar, cost) begin rModel=getRangeParitionModel(w,s) hModel=getHashPartitionModel(w,s) (numRangeP ar,rCost)=getMinPar(rModel,d) (numHashP ar,hCost)=getMinPar(hModel,d) if rCost < hCost then return (RangeP artitioner,numRangeP ar,rCost) else return (HashP artitioner,numHashP ar,hCost)

Chopper calculates the optimized stage level partition scheme given workload w, stage s and input size d for the stage. The algorithm returns the partitioner, the optimal number of partitions used for stage s and the cost. Specifically, Chopper retrieves the trained models of stages for both range partitioning and hash partitioning from the workload database. After this, Algorithm1 computes the optimal numbers of partitions with minimal cost for both range partitioning and hash partitioning using Equation 4.4. Finally, Chopper returns the partitioner that would incur the lowest cost along with the number of partitions to use.

4.3.3 Globally-Optimized Partition Scheme

After we compute the stage level partition scheme, a naive solution is to compute the optimal partition scheme for each stage independently and generate the Spark configuration file. This is shown in Algorithm2. It gets the DAG information from workload database, iterates through the DAG, computes the desirable partition scheme for each stage, and adds to a list of partition scheme. Lastly, the algorithm returns the list of partition schemes, which is then used to generate the Spark configuration file for the current workload. Although Algorithm2 optimizes the partition scheme per stage, it misses the opportunities to reduce shuffle traffic because of the dependencies between stages and RDDs. For example, if stage-C joins the RDDs from stage-A and stage-B, the shuffle traffic introduced by join can be completely eliminated if the two use the same partition scheme and the joined partitions are allocated on the same machine. However, this cannot be achieved using Algorithm2. If the computed optimal scheme of stage-A is (Range, 100) and the optimal scheme of stage-B is (hash, 200), the shuffle data cannot be eliminated, as the partition schemes of stage-A and stage-B are different. Even though stage-A and stage-B are optimally partitioned, the shuffle data of stage-C is sub-optimal. Since join and co-group operation are two of the most commonly used operations in Spark applications, poor partitioning will typically introduce Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 46 Frameworks

significant shuffle overhead. Consequently, it is critical to optimize the join and co-group operation to decrease the amount of shuffle data as much as possible. Another issue is that since the users are allowed to tune and specify customized partition scheme on their own, Chopper leaves the user optimization intact even when the computed optimal scheme disagrees with the user specified partition scheme. However, Chopper can choose to add an additional partition operation if the benefit of introducing the partition operation significantly outweighs the overhead incurred. For instance, consider a case where stage-B blows up the number of tasks to by a power of two from its previous stage-A (i.e., 1002 tasks) due to the user-fixed partition scheme of stage-A. If Chopper coalesces the number of tasks of stage-A from 100 to 10, it would significantly reduce the number of tasks in stage-B from 10000 to 100.

Algorithm 2: Get workload partition scheme getWorkloadPar. Input: workload w, DAG dag, input size D Output: parList begin if dag == null then dag = getDAG(w) for stage s in dag do d=getStageInput(w, s, D) (partitoner, numP ar, cost) = getStagePar(w, s, d) parList.add(s,partitioner,numP ar,cost) return parList

To remedy this, Chopper determines the partition scheme by globally considering the entire DAG execution. As described in Algorithm3, Chopper first groups the DAG graph based on the stage dependencies. The grouping of DAG graph is started from the end stages of the graph and iterated towards the source stages. The grouping is based on the join operations or partition dependencies. The stages with join operations are grouped into a subgraph. The partition dependencies refer to the cases where the number of stages is determined by the previous stage, thus Chopper cannot change the partition scheme. After the DAG of the workload is regrouped, the node within the new DAG can either be a stage or a subgraph that consists of multiple stages. If the node is a stage, the optimal partition scheme is computed using Algorithm1. Otherwise, the node is a subgraph, where the optimal partition scheme is computed differently. Specifically, we iterate through all the nodes within the subgraph and get the optimal partition scheme for each node, we then compute the cost of applying each partition scheme to all the applicable nodes in the graph and return the partition scheme that has the minimal cost. Finally, after we compute the globally optimal partition scheme, we check whether the stage partition is allowed to be changed. If not, and the partition scheme is different, we then check whether it is beneficial to insert a new repartitioning phase by comparing the cost 4.4. Evaluation 47

using original partitioning to the cost of the new repartitioning phase together with the cost of optimized partition scheme. If the benefit outweighs the cost by a factor of γ, we choose to insert a new partition phase into the DAG graph. We empirically set γ to 1.5 to tolerate the model estimation error. Algorithm 3: Get globally optimized partition scheme. Input: workload w, input size D Output: parList begin dag=getReGroupedDAG(w) for node s in dag do d=getStageInput(w, s, D) if s isInstanceOf Stage then (partitoner, numP ar, cost) = getStagePar(w, s, d) else (partitoner, numP ar, cost) = getSubGraphPar(w, s, d) if s isFixed then curCost = getCost(w, s, getPartitioner(w, s), getNumPar(w, s)) optCost = cost + getRepartitionCost(w, s, partioner, numP ar) if optCost < curCost then s0 = s + “repartitionstage” parList.add(s0, partitioner, numP ar, optCost)

else parList.add(s, partitioner, numP ar, cost)

return parList Function getSubGraphPar Input: workload w, DAG dag, input size D Output: paritioner, numP ar, cost parList = getWorkloadPar(w, dag, D) min = parList(0) for s in parList do cost = getCost(w, dag, s.partitioner, s.numP ar) if cost < min.cost then min = s return (min.partitioiner, min.numP ar, min.cost) Function getCost Input: workload w,DAG dag, partitioner, numP ar Output: cost for stage s in dag do if partitioner == range then rModel = getRangePartitionModel(w, s) cost += Equation 4.3 else hModel = getHashParitionModel(w, s) cost += Equation 4.3 return cost

4.4 Evaluation

In this section, we evaluate Chopper and demonstrate its effectiveness on the cluster de- scribed earlier in Section 7.2. We use three representative workloads from SparkBench: KMeans, PCA, and SQL. KMeans [83] is a popular clustering algorithm that partitions and Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 48 Frameworks

Workload KMeans PCA SQL Input Size (GB) 21.8 27.6 34.5

Table 4.1: Workloads and input data sizes. clusters n data points into k clusters in which each data point is assigned to the nearest center point. The computation requirement of this workload change according to the num- ber of clusters, the number of data points, and the machine learning workload that exhibits different resource utilization demand for different stages during the process of iteratively calculating k clusters. PCA [95] is a commonly used technique to reduce the number of features in various data mining algorithms such as SVM [63] and logistic regression [87]. It is both computation and network-intensive machine learning workload that involves multiple iterations to compute a linearly uncorrelated set of vectors from a set of possibly correlated ones. SQL is a workload that performs typical query operations that count, aggregate, and join the data sets. Thus, SQL represents a common real world scenario. SQL is compute intensive for count and aggregation operations and shuffle intensive in the join phase. The input data is generated by the corresponding data generator within SparkBench. The input data size for each workload is shown in Table 4.1. The experiments for vanilla Spark are conducted with the default configuration, which is set to 300 partitions for all the workloads. We run all of our experiments three times, and the numbers reported here are from the av- erage of these runs. Moreoer, we clear the OS cache between runs to preclude the impact of such caching on observed times.

14 CHOPPER Spark 12

10

8

6

Execution time (min) 4

2

0 PCA KMeans SQL Figure 4.7: Execution time of Spark and Chopper.

4.4.1 Overall Performance of Chopper

Our first test evaluates the overall performance impact of Chopper. Fig. 4.7 illustrates the total execution time of three workloads comparing Chopper against standard vanilla 4.4. Evaluation 49

Spark. The reported execution time includes the overhead of repartitioning introduced by Chopper. We can see that Chopper achieves overall improvement in the execution time by 23.6%, 35.2% and 33.9% for PCA, KMeans and SQL, respectively. This is because Chopper effectively detects optimal partitioner and the number of partitions for all stages within each workload. Chopper also performs global optimization to further reduce network traffic by intelligently co-partitioning dependent RDDs and inserting repartition operations when the benefits outweigh the cost. We also observe that Chopper is effective for all types of workloads that exhibit different resource utilization characteristics. The repartition method of Chopper implicitly reduces the potential data skew and determines the right task granularity for each workload, thus improving the cluster resource utilization and workload performance. Thus, Chopper shows significant reduction in the overall execution time for all the three workloads. The model training of Chopper is conducted offline, and thus is not in the critical path of workload execution. Moreover, the overhead of repartitioning is negligible as it involves solving a simple linear programming problem.

4.4.2 Timing Breakdown of Execution Stages

60 Spark CHOPPER 50

40

30

20

Execution time (sec) 10

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Stage ID Figure 4.8: Execution time per stage breakdown of KMeans.

Chopper Spark Execution Time (sec) 250 372

Table 4.2: Execution time for stage 0 in KMeans.

To better understand how dynamic partitioning of Chopper helps to improve overall per- formance, in our next test, we examine the detailed timing breakdown of individual workload stages. Fig. 4.8 depicts the execution time per stage for KMeans. We show the execution time of stage-0 separately in Table 4.2 since the execution time of stage-0 and that of other stages differs significantly. We see that Chopper reduces execution time of each stage for KMeans compared to vanilla Spark, as Chopper is able to customize partition schemes Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 50 Frameworks

StageID 0 1 2 3 4 5 6 7 8 9 10 11 12 - 17 18 19 Chopper 210 210 300 720 300 720 300 720 300 720 300 720 210 380 210 Spark 300 300 300 300 300 300 300 300 300 300 300 300 300 300 300

Table 4.3: Repartition of stages using Chopper. for each stage according to associated history and runtime characteristics. Table 4.3 shows the number of partitions used by Chopper for different stages compared to vanilla Spark. Stages 12 to 17 are iterative, and thus are assigned the same number of partitions. We see that Chopper effectively detects and changes to the correct number of partitions for this workload rather than using a fixed (default) value throughout the execution.

4.4.3 Impact on Shuffle Stages

12 Spark CHOPPER

10

8

6

4 Shuffle Data (KB) 2

0 0 1 2 3 Stage ID Figure 4.9: Shuffle data per stage for SQL.

400 Spark CHOPPER 350 300 250 200 150 100 Execution time (sec) 50 0 0 1 2 3 4 Stage ID Figure 4.10: Execution time per stage breakdown of SQL.

In our next test, we use a shuffle-intensive workload, SQL, to study how Chopper reduces the shuffle traffic by automatically recognizing and co-partitioning dependent RDDs. Fig. 4.9 shows that the shuffle data for all four stages is less under Chopper compared to vanilla Spark. Stage-4 (not shown in the figure) has the same amount of shuffle data for SQL workload using Chopper or Spark (i.e., 4.7 GB). However, as seen in Fig. 4.10, stage-4 takes comparatively shorter time to execute using Chopper versus Spark. This is because, 4.4. Evaluation 51

stage 4 is divided into 4 sub-stages where the first two sub-stages have a shuffle write data of 1.9 GB and 2.8 GB. Chopper combines these two sub-stages for shuffle write and provides the third sub-stage for the shuffle data to be read. This greatly reduces the execution time for stage 4 as seen in Fig. 4.10. Thus, we demonstrate that Chopper can effectively detect dependent RDDs and co-partition them to reduce the shuffle traffic and improve the overall workload performance.

4.4.4 Impact on System Utilization

60 PCA-Spark KMeans-CHOPPER 50 PCA-CHOPPER SQL-Spark KMeans-Spark SQL-CHOPPER 40

30

20 CPU used (%) 10

0 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 Time Stamp Figure 4.11: CPU utilization.

80 PCA-Spark KMeans-CHOPPER 70 PCA-CHOPPER SQL-Spark 60 KMeans-Spark SQL-CHOPPER 50 40 30 20 Memory used (%) 10 0 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 Time Stamp Figure 4.12: Memory utilization.

140 PCA-Spark KMeans-CHOPPER PCA-CHOPPER SQL-Spark 120 KMeans-Spark SQL-CHOPPER 100 80 60 40 20 Total packets transmitted 0 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 Time Stamp Figure 4.13: Total transmitted and received packets per second. Chapter 4. Optimizing Data Partitioning for In-Memory Data Analytics 52 Frameworks

100 PCA-Spark KMeans-CHOPPER PCA-CHOPPER SQL-Spark 80 KMeans-Spark SQL-CHOPPER

60

40

20 Transactions per second 0 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 Time Stamp Figure 4.14: Transactions per second.

In our next experiment, we investigate how Chopper impacts the resource utilization of all the studied workloads. Fig. 4.11, 4.12, 4.13 and 4.14 depict the CPU utilization, memory utilization, total number of transmitted and received packets per second, and the total number of read and write transactions per second, respectively, during the execution of the workloads under Chopper and vanilla Spark. The numbers show the average of the statistics collected from the six nodes in our cluster setup. We observe that the performance of Chopper is either equivalent or in most of the cases better than the performance of vanilla Spark for the studied workloads. In some cases, Chopper shows improved transactions per seconds as compared to vanilla Spark because of the high throughput and improved system performance. These experiments show that the performance (computed on the basis of execution time and shuffle data) improves under Chopper compared to vanilla Spark. Also, these improvements in Chopper yield comparable or better system utilization compared to vanilla Spark.

4.5 Chapter Summary

In this chapter, we design Chopper, a dynamic partitioning approach for in-memory data analytic platforms. Chopper determines the optimal number of partitions and the parti- tioner for each stage of a running workload with the goal of minimizing the stage execution time and shuffle traffic. Chopper also considers the dependencies between stages, includ- ing join and co-group operations, to further reduce shuffle traffic. By minimizing the stage execution time and shuffle traffic, Chopper implicitly alleviates the task data skew using different partitioners and improves the task resource utilization through optimal number of partitions. Experimental results demonstrate that Chopper effectively improves over- all performance by up to 35.2% for representative workloads compared to standard vanilla Spark. Our current implementation of Chopper has to re-train its models whenever the available resources are changed. In future, we plan to explore the per-stage performance models that can work across different resource configurations, i.e., clusters. We will also explore how 4.5. Chapter Summary 53

Chopper behaves under failures. These will further improve the applicability of Chopper in a cloud environment, where compute resources are failure-prone and scaled as needed. Chapter 5

File System Monitoring for Large Scale Storage Systems

5.1 Introduction

Big data science relies on sophisticated workflows that need large storage systems to meet the demands of their data-intensive workloads. According to a recent report from National Energy Research Scientific Computing Center (NERSC) [14], the total volume of data stored at NERSC has grown at an annual rate of 30 percent over the past 10 years. Data may be stored for short to long periods, be accessible to dynamic groups of collaborators, and be operated upon by various actors. As data generation volumes and velocities continue to increase, the rate at which files are created, modified, deleted, and acted upon (e.g., permission changes) make manual oversight and management infeasible. While research automation [21, 56] offers a potential solution to these problems it is first necessary to be able to capture events in real-time and at scale across a range of large-scale storage systems. Such events may then be used to automate the data lifecycle (performing backups, purging stale data, etc.), report usage and enforce restrictions, enable program- matic management, and even autonomously manage the health of the system. Enabling scalable, reliable, and standardized event detection and reporting will also be of value to a range of infrastructures and tools, such as software-defined cyberinfrastructure (SDCI) [75], auditing [41], and automating analytical pipelines [56, 57]. Such systems enable automation by allowing programs to respond to file events and initiate tasks. Many small-scale local storage systems provide mechanisms to detect and report data events, such as file creation, modification, and deletion. Tools such as inotify [124], kqueue [108], and FileSystemWatcher [131] enable developers and applications to respond to data events. However, such tools are not scalable and suitable for large-scale parallel file systems, like, Lustre [12] and IBM’s Spectrum Scale [168] (formally known as GPFS), and object-based

54 5.2. Background 55

storage systems, like Ceph [201] and Gluster [79]. Such large-scale storage systems often maintain an internal metadata collection catalog, such as Lustre’s Changelog, IBM Spectrum Scale’s mmaudit, Ceph’s Journal, and Gluster’s libgfchangelog, which enables developers to query data events. However, each of these tools provides a unique API and event description language. For example, a file creation action may be recorded as a 01CREAT event in Lustre’s Changelog, and an openc event in Ceph’s Journal. Furthermore, the difference in the architecture of parallel file systems and object-based storage systems makes developing a generalized file system event monitoring tool for large-scale storage systems non-trivial. In this chapter, we present a generalized, and scalable, storage system monitor, called FSMonitor, for Storage Monitor. FSMonitor provides a common API and event repre- sentation for detecting and managing data events from different large-scale storage systems (parallel file systems and object-based storage systems). We have architected FSMonitor with a modular Data Storage Interface (DSI) layer to plug-in and use custom monitoring tools and services. We leverage the internal metadata collection catalog found in such storage systems to develop a scalable monitor architecture that is capable of detecting, resolving, and reporting these events. We choose Lustre as our implementation platform for parallel file systems because Lustre is used by 60% of the top 500 supercomputers [72]. For our im- plementation of FSMonitor on object-based storage system, we select Ceph as it is one of the most popular and open-source object stores, with companies such as Bloomberg, Cisco, and Deutsche-Telekom using Ceph as their storage backend [5]. While our scalable monitor has so far only been applied to Lustre and Ceph storage systems, its design makes it applicable to other large-scale storage systems with metadata catalogs, such as IBM’s Spectrum Scale and Gluster. We evaluate FSMonitor on real-world deployments of both the Lustre file system and the Ceph object store, and demonstrate its capabilities by using it to drive a research automation system. Our experiments show that FSMonitor scales well on a 897 TB Lustre store, and a 8 TB Ceph store, where it is able to process and report 37 948 events per second on Lustre, more than 2000 events per second on Ceph, while providing a standard event representation on both storage systems. We also implement and evaluate a use case for FSMonitor, in the form of a file system indexer that uses FSMonitor to improve the performance and efficiency of metadata re-indexing.

5.2 Background

In this section, we describe parallel file systems and object-based storage. We focus specifi- cally on the Lustre and Ceph storage systems supported by FSMonitor. We describe how the Lustre Changelog metadata catalog and the Ceph metadata Journal can be used to detect and report events. 56 Chapter 5. File System Monitoring for Large Scale Storage Systems

5.2.1 Parallel File System

A parallel file system is designed to stripe data blocks in parallel, across multiple storage devices on multiple servers. Multiple clients can access a file system simultaneously and can perform I/O operations on the various stripes of a file in parallel. Typically, it consists of clients that read or write data to the file system, data servers where data is stored, metadata servers that manage the metadata and placement of data on the data servers, and networks to connect these components. This level of parallelism is transparent to the clients, for whom it seems as though they are accessing a local file system. Therefore, important functions of a parallel file system include avoiding potential conflict among multiple clients, and ensuring data integrity and system redundancy. Monitoring a parallel file system is challenging as events may be generated across various components in the system and they then pass through one of several metadata servers. Generally, the larger the data store, the more metadata servers required to manage the cluster. Further, as a parallel file system aims to deliver transparent parallelism, it is crucial that even the event monitoring mechanisms appear to users as if they are the same as a local file system.

Lustre Parallel File System

As seen in the Chapter3, Lustre is designed for high performance, scalability, and high avail- ability. It employs a client-server network architecture. A Lustre deployment is comprised of one or more Object Storage Servers (OSSs) that store file contents; one or more Metadata Servers (MDSs) that provide metadata services for the file system and manage the Metadata Target (MDT) that stores the file metadata; and a single Management Server (MGS) that manages the system. The Management Server (MGS) is responsible for storing the configuration information for the entire Lustre file system. This persistent information is stored on the Management Target (MGT). Metadata Servers (MDS) manages namespace operations and are responsible for one Metadata Target (MDT). Namespace metadata, such as directories, filenames, file layouts, and access permissions are stored in the associated MDT. Every Lustre file system must have at least one MDT. A large file system may require more than one MDT to store its metadata, and therefore, Lustre versions after 2.4 support a distributed namespace. A distributed namespace means metadata can be spread across multiple MDTs. MDS0 and MDT0 act as the root of the namespace, and all other MDTs and MDSs act as their children. Lustre provides a Changelog to query the metadata stored in the MDT. Developers can create a Changelog listener and subscribe to a specific MDT, allowing them to retrieve and purge metadata records. FSMonitor uses the Lustre Changelog to keep track of file system events. 5.2. Background 57

Object Storage Servers (OSS) store file contents. Each file is stored on one or more Object Storage Targets (OST) mounted on an OSS. Applications access file system data via Lustre clients which interact with OSSs directly for parallel file accesses. The internal high-speed data networking protocol for Lustre file system is abstracted and is managed by the Lustre Network layer.

5.2.2 Object Storage

Object storage [69] runs on the principle of assigning a unique identifier and metadata to each piece of data to be stored in the system. These data units are accessed as objects rather than a stream of bytes. Figure 5.1 shows how object-based storage model differentiates from the traditional block-based model.

Figure 5.1: Comparison of block-based and object-based storage models.

In the traditional block-based storage model, the host operating system is responsible for translating the data layout and application requests into logical block addresses. In the object-based model, the responsibility of translation into block addresses is moved to the object-based disk. Therefore, data retrieval is much faster in the object-based storage model by forgoing the need to traverse through the tiered architecture of traditional file systems and instead, stor- ing all objects in a common storage. Typically, the metadata and data are segregated into different logical pools to facilitate more efficient metadata driven operations on the sys- tem. Object-based models are able to achieve high scalability and are appropriate in areas requiring high data throughput, such as Internet-of-Things (IoT), and streaming services 58 Chapter 5. File System Monitoring for Large Scale Storage Systems

like Netflix [15]. We implement a FSMonitor interface for the Ceph object store, which is explained below.

Ceph Object Storage

Ceph [201] storage is a popular object stores [4, 17]. Figure 5.2 shows its architecture.

Figure 5.2: An overview of Ceph architecture. The primary component of Ceph’s infrastructure is the distributed object store - RADOS (Reliable Autonomic Distributed Object Store). Data are stored as objects in logical groups called pools. The metadata pool is separated from the data pools to aid faster execution of metadata level operations, such as list and rename. The pools host logical namespaces known as Placement Groups (PGs), which group objects into fragments within a pool. Each placement group is mapped to a set of Object Storage Daemons (OSDs) where the final placement of incoming data objects is done. The Ceph client interacts with RADOS to store data by providing the object name and the pool name using the Ceph object gateway. The interaction between the Ceph client and the OSDs is coordinated by Metadata Servers (MDSs), placed in the metadata pool. Each MDS maintains a Journal to assimilate the series of metadata level manipulations made to the storage system. FSMonitor uses the MDS Journal to collect file system events from Ceph. All metadata operations go through the Ceph Monitor, which is used to maintain a cluster map, that is consulted by the Ceph client. The cluster map is a collection of the Monitor map, OSD map, PG map, and MDS map. To ensure that Ceph does not succumb to a single point of failure, Ceph tries to attain a decentralized architecture by allowing multiple MDSs and Monitors in the cluster. The implementation of FSMonitor is done atop the Ceph File System (CephFS), which is built on top of RADOS. It is POSIX compliant with a traditional file system interface. 5.2. Background 59

5.2.3 Monitoring Storage System Events

Most storage systems maintain a catalog for recording file system events. For example, Lustre maintains a Changelog, Ceph a Journal, IBM Spectrum Scale [168] a file audit log (mmaudit), and GlusterFS [79] a libgfchangelog to track metadata events. As FSMonitor is implemented atop Lustre and Ceph file systems, we explain Lustre Changelog and Ceph Journal below. Lustre and Ceph are representative of other parallel file systems and object stores, and thus FSMonitor can be extended to build a monitoring solution for other large- scale storage systems as well.

Lustre Changelog

ID Type Time Date Flags Target FID Parent FID Target 11 01CREAT 21:18:47.30 2020.06.20 0x0 t=[0x5716:0x626c:0x0] p=[0x5716:0xe7:0x0] hello.txt 12 17MTIME 21:18:47.32 2020.06.20 0x7 t=[0x5716:0x626c:0x0] hello.txt 13 08RENME 21:18:47.41 2020.06.20 0x1 t=[0x5716:0x17a:0x0] p=[0x5716:0xe7:0x0] hello.txt s=[0x5716:0x626b:0x0] sp=[0x5716:0x626c:0x0] hi.txt 14 02MKDIR 21:18:47.42 2020.06.20 0x0 t=[0x5716:0x626d:0x0] p=[0x5716:0xe7:0x0] okdir 15 06UNLNK 21:18:47.43 2020.06.20 0x0 t=[0x5716:0x626b:0x0] p=[0x5716:0xe7:0x0] hi.txt

Table 5.1: A sample Lustre ChangeLog record showing Create File, Modify, Rename, Create Directory, and Delete File events.

Table 5.1 shows sample records in Lustre’s Changelog. We run a script and see the events recorded in the Changelog. The script first creates and then modified a file, hello.txt and then renames the file to hi.txt. A directory named okdir is then created. Finally, it deletes the file. Each tuple in Table 5.1 represents a file system event, specified via a EventID (record number for the Changelog entry), Type (file system event type), Timestamp, Datestamp (date and time of the event occurrence), Flags (masking for the event), Target FID (file identifier of the target file/directory on which the event occurred), Parent FID (file identifier of the parent directory of the target file/directory), and Target Name (the file/directory name that triggered the event). It is evident that the Parent and Target FIDs need to be resolved to their original names before they can be sent to the client. The following event types are recorded in the Changelog:

• CREAT: Creation of a regular file.

• MKDIR: Creation of a directory.

• HLINK: .

• SLINK: Soft link. 60 Chapter 5. File System Monitoring for Large Scale Storage Systems

• MKNOD: Creation of a device file.

• MTIME: Modification of a regular file.

• UNLNK: Deletion of a regular file.

• RMDIR: Deletion of a directory.

• RENME: Rename a file or directory from.

• RNMTO: Rename a file or directory to.

• IOCTL: Input-output control on a file or directory.

• TRUNC: Truncate a regular file.

• SATTR: Attribute change.

• XATTR: Extended attribute change.

Note in Table 5.1 that Target FIDs are enclosed within t = [], and parent FIDs within p = []. MTIME event does not have a parent FID. RENME event has additional FIDs, s = [] denoting a new file identifier to which the file has been renamed, and sp = [] gives the file identifier for the original file. These features are important when resolving FIDs.

Ceph Journal

EventID Type ctime dentry Root dentry dirHash 0x8a6422d8 openc 2020-06-20 22:44:25.425743 full bits = hello.txt /mnt/mycephfs 0 0x8a6428e7 cap update 2020-06-20 22:44:28.159760 full bits = hello.txt /mnt/mycephfs 0 full bits = hi.txt 0x8a642efb rename 2020-06-20 22:44:51.226914 /mnt/mycephfs 0 null bits = hello.txt 0x8a643534 mkdir 2020-06-20 22:44:59.111970 full bits = okdir /mnt/mycephfs 2 full bits = stray3 0x8a643c5d unlink local 2020-06-20 22:45:07.665030 /mnt/mycephfs 0 null bits = hi.txt

Table 5.2: A sample Ceph Journal record showing Create File, Modify, Rename, Create Directory, and Delete File events.

Table 5.2 shows the sample records saved in a Ceph Journal when we run the same script as for generating the Lustre Changelog, shown in Table 5.1. Each row in Table 5.2 represents an event in the object store. Every row consists of an EventID (journal entry number), Type (type of file system event), ctime (date and timestamp when the event occurred), dentry (file or directory name on which the event occurred), Root dentry (parent directory name for the dentry file or directory), and dirHash (an identifier to check if the dentry is a directory). In file systems, an inode (index node) contains information about a file. A dentry (directory entry) is used to keep track of the hierarchy of files in directories. Each dentry maps an 5.2. Background 61 inode number to a file name and a parent directory. For file system events where an inode entry is removed from the system, dentry has entries for full bits and null bits. For example, when hello.txt is renamed to hi.txt, the dentry value for full bits is hi.txt, and dentry value for null bits is hello.txt. Similarly, when file hi.txt is deleted, null bits value in dentry is hi.txt. The value for dirHash is non-zero when full bits of dentry points to a directory. It should be noted from Tables 5.1 and 5.2 that there is no standard representation for a file system event in parallel file systems and object stores. A file creation event in Lustre is recorded as 01CREAT, while in Ceph, it is recorded as openc. Similarly, a file modification event is 17MTIME in Lustre and cap update in Ceph; a file or directory rename event is 08RENME in Lustre and rename in Ceph; a directory creation event is 02MKDIR in Lustre and mkdir in Ceph; and a file deletion event is 06UNLNK in Lustre and unlink local in Ceph. Thus a scalable monitor is needed that can create a standard representation of file system events.

5.2.4 Prior Work

Monitoring file system events in large scale storage systems needs specialized tools. For object storage systems, there is no such tool which is widely used. Oral et al. [143] developed an efficient object storage journaling tool for a distributed parallel file system. They however focused on a hardware solution by using external journals on solid state devices. FSMonitor focuses on providing a generic software solution that can work on all storage infrastructure. Another monitoring tool [53] was developed for the ATLAS project [1] which uses a scalable publisher-subscriber model to record metadata events. This work is however specific to the ATLAS project. FSMonitor uses the pub-sub model from this work to make the monitoring solution scalable as well as generic.

Lustre MDS

Robinhood Server MDT . . . Lustre Client

Database Lustre MDS

MDT

Figure 5.3: Robinhood architecture.

The Lustre file system has some specialized tools [133] for metadata event monitoring, of which Robinhood [107] is the most widely used. The Robinhood Policy Engine is capable of collecting events from Lustre file systems and using them to drive a policy engine to automate data management tasks, such as purging old files. Its architecture is shown in Figure 5.3. Robinhood uses an iterative approach to collect event data from metadata 62 Chapter 5. File System Monitoring for Large Scale Storage Systems

servers. A Robinhood server runs on the Lustre client and queries each MDS for events by querying the Changelogs. It then saves the events in a database on the Lustre client. For multiple MDSs, Robinhood polls the MDSs sequentially, in a round robin fashion. In contrast, FSMonitor employs a high performance message queue to concurrently collect, report, and aggregate events from all MDSs. This approach allows FSMonitor to far exceed Robinhood event collection performance, as we will show in Section 5.4.3. Summary: There is a need for a general event monitoring solution that can be applied to various storage systems including both parallel file systems (such as Lustre) as well as object-based storage systems (such as Ceph). The monitor needs to have a standard event representation. In FSMonitor, we standardize all event representations to the inotify [115] format from Linux, as this is the most widely used representation in industry [10]. The monitor also needs to be scalable so as to be able to capture all the events in a large-scale storage system. We implement FSMonitor for the Lustre file system and Ceph object store.

5.3 FSMonitor System Design

Existing monitoring solutions for local file systems (Linux, macOS) cannot be extended to distributed environments and existing solutions for parallel file systems cannot be generally applied to different storage systems. FSMonitor provides a standard event detection interface across all large scale storage systems, and extends the reliability of the underlying event detection system by providing increased resiliency. FSMonitor is designed to be applied to arbitrary large scale storage systems through a modular data storage interface which can be implemented to connect to arbitrary event interfaces (parallel file system and object stores). It provides a standardized API to collect and process file system events, and is a scalable and high-performance system that is capable of detecting many thousands of events per second.

5.3.1 Logical View of FSMonitor

Figure 5.4 illustrates the three-layer architecture of FSMonitor. The Data Storage Interface (DSI) layer provides an abstraction for interfacing with different storage systems. The res- olution layer is used to process events and standardize their format, and the interface layer enables client users and programs to communicate with FSMonitor. Here we describe each of these layers in detail, explaining their design and capabilities.

Data Storage Interface Layer

Figure 5.4: FSMonitor architecture, showing the interface, resolution, and DSI layers.

The lowest level of FSMonitor is responsible for interfacing with the underlying storage system to capture events and report them to the resolution layer for processing. We employ a modular architecture via which arbitrary monitoring interfaces can be integrated into FSMonitor. We provide a set of DSIs to interface with parallel file systems and object storage systems. Our DSI layer is responsible for selecting the appropriate catalog to monitor for the given storage system (the Changelog in Lustre and the Journal in Ceph). After the events have been collected, they are propagated to the resolution layer.
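To make the plug-in nature of the DSI layer concrete, the following is a minimal Python sketch of what such an interface could look like; the class and method names are illustrative assumptions rather than FSMonitor's actual code.

from abc import ABC, abstractmethod
from typing import Dict, List


class DataStorageInterface(ABC):
    """Hypothetical base class for a DSI plug-in: one subclass per storage system."""

    @abstractmethod
    def poll_events(self) -> List[Dict]:
        """Read new raw events from the storage system's metadata catalog."""

    @abstractmethod
    def purge_processed(self) -> None:
        """Clear catalog entries already handed to the resolution layer."""


class LustreDSI(DataStorageInterface):
    def poll_events(self) -> List[Dict]:
        # Would read the Lustre Changelog on an MDT (e.g., via 'lfs changelog').
        raise NotImplementedError

    def purge_processed(self) -> None:
        # Would clear consumed Changelog records (e.g., via 'lfs changelog_clear').
        raise NotImplementedError


class CephDSI(DataStorageInterface):
    def poll_events(self) -> List[Dict]:
        # Would read the Ceph MDS Journal of the monitored file system.
        raise NotImplementedError

    def purge_processed(self) -> None:
        raise NotImplementedError

A new storage system is supported by adding one more subclass, leaving the resolution and interface layers unchanged.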

Resolution Layer

The resolution layer is responsible for reliably recording and aggregating events from the DSIs and then reporting these events to the interface layer. This layer includes a queue to receive and manage events until they are processed. As events are received from a DSI plug-in, they are immediately placed in the processing queue. The events are then processed to resolve and dereference paths so that they can be transformed into various representations; this step is explained in Section 5.3.2. Rather than defining yet another event representation, we transform all events into the widely used inotify event representation format. Thus, FSMonitor provides a generalized event representation for all large-scale storage systems. The resolution layer also provides optimizations that improve overall event processing performance, such as caching and batching capabilities.
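As a rough illustration of the standardization step, a translation table from storage-system-specific event types to inotify-style names might look like the following; the exact mappings used by FSMonitor are not reproduced here, so treat these entries as plausible assumptions.

# Hypothetical mapping of native event types to inotify-style names.
LUSTRE_TO_INOTIFY = {
    "CREAT": "CREATE",
    "MKDIR": "CREATE,ISDIR",
    "MTIME": "MODIFY",
    "UNLNK": "DELETE",
    "RMDIR": "DELETE,ISDIR",
    "RENME": "MOVED_FROM/MOVED_TO",
}

CEPH_TO_INOTIFY = {
    "openc": "CREATE",          # assumed Ceph journal event name
    "unlink_local": "DELETE",
    "rename": "MOVED_FROM/MOVED_TO",
}


def standardize(source: str, native_type: str) -> str:
    """Return an inotify-style event name for a native Lustre/Ceph event type."""
    table = LUSTRE_TO_INOTIFY if source == "lustre" else CEPH_TO_INOTIFY
    return table.get(native_type, "UNKNOWN")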

Interface

The topmost layer provides an interface through which users and programs interact with FSMonitor. This layer is responsible for reporting events and replying to requests. Programs can use the interface to capture events as they happen. The interface layer also provides batching capabilities to report events in groups. If an event identifier is provided, FSMonitor reports only the events that have occurred since that event. To provide fault tolerance, this layer also stores all events received from the resolution layer in an event store (database). After a failed client recovers and retrieves its events from FSMonitor, those events are flagged as having been reported and can be removed from the database. The size of this database is configurable and can be adjusted depending on the resources available to FSMonitor.
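The FSMonitor client API is not listed in this chapter; a client-side interaction could plausibly look like the sketch below, in which the FSMonitorClient class, its constructor arguments, and method names are all illustrative assumptions.

# Hypothetical client-side use of the FSMonitor interface layer.
class FSMonitorClient:
    def __init__(self, aggregator_host: str, port: int = 5556):
        self.aggregator = (aggregator_host, port)
        self.last_event_id = None   # remembered so a restarted consumer can resume

    def get_events(self, since_id=None, batch_size=100):
        """Fetch a batch of standardized events from the aggregator.

        If since_id is given, the interface layer replays events after that ID
        from its reliable event store (the fault-tolerance path)."""
        # The network call to the aggregator is omitted in this sketch.
        return []


client = FSMonitorClient("mgs-node")
for path, event_type, timestamp in client.get_events(since_id=client.last_event_id):
    print(path, event_type, timestamp)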

5.3.2 Storage System View of FSMonitor

Figure 5.5 illustrates a deployment of FSMonitor on a large-scale storage system. Our monitor uses a publish-subscribe model, in which metadata is acquired from each MDS's metadata catalog via a collector, processed and cached, and then published to the aggregator. The aggregator gathers events from each collector for later consumption. The publish-subscribe model provides scalable event collection, as has been shown for monitoring Lustre server statistics [150, 152, 196]. The aggregator and collector components are part of the FSMonitor architecture. Mapping the logical view of FSMonitor onto its storage system view, the DSI layer runs on the collectors, the resolution layer comprises both the collectors and the aggregator, and the interface layer spans part of the aggregator and the consumers. For parallel file systems such as Lustre, the collector service is placed on the MDSs, the aggregator service on the MGS, and the consumers are the file system clients. The mapping is similar for object storage systems such as Ceph, where the collector service runs on the MDSs in the metadata pool, the aggregator service runs on the monitor, and the consumers are the object storage clients. We divide the overall storage system design of FSMonitor into four steps: Detection, Processing, Aggregation, and Consumption. We describe each step below.

Detection: Each collector service is responsible for extracting file system events from one MDS metadata catalog. Deploying collectors on individual MDSs enables every MDS to be monitored in parallel. The collector is responsible for detecting the type of storage system: either parallel file system or object-based storage. The collector monitors the catalog (the Changelog in Lustre and the Journal in Ceph), and every new event extracted from the local metadata catalog must be processed before it can be published to the aggregator.

Processing (Lustre): We first explain the processing step for Lustre. As seen in Section 6.2.4, every event in the Lustre Changelog is associated with a target file identifier (FID), a parent FID, or both.

Figure 5.5: Infrastructural view of FSMonitor. On Lustre: Collectors=MDSs, Aggregator=MGS, Consumers=Lustre Clients. On Ceph: Collectors=MDSs, Aggregator=Monitor, Consumers=Ceph Clients.

However, these FIDs are not necessarily interpretable by external applications and thus must be resolved to absolute path names. The Lustre fid2path tool performs this resolution. Whenever a collector detects a new file system event, we use this tool to convert raw event tuples into application-friendly file paths before the event is published. The fid2path tool is slow, however, and can delay the reporting of events; in Section 5.4.3 we show that this delay can reduce the event reporting rate by 14.9%. To minimize this overhead, we implement a Least Recently Used (LRU) cache in the collector to store mappings of FIDs to source paths. Algorithm 4 shows the collector's processing steps for a Lustre Changelog. Events are processed in batches. The cache is used to resolve FIDs to absolute paths; whenever an entry is not found in the cache, we invoke the fid2path tool to resolve the FID and then store the mapping (FID – path) in the LRU cache. In the case of UNLNK and RMDIR events, resolving the target FID will give an error because that FID has already been deleted by the file system; therefore, the parent FID is resolved instead. If resolving the parent FID also raises an error, the parent directory has been deleted as well, and FSMonitor reports the event as 'ParentDirectoryRemoved'. As seen in Section 6.2.4, RENME events carry old-path (sp = []) and new-path (s = []) FIDs; therefore, for RENME events, the old-path and new-path FIDs are resolved instead of the target FID. Finally, events are published to the aggregator. After processing a batch of file system events from the Lustre Changelog, the collector purges the Changelog: a pointer is maintained to the most recently processed event tuple and all earlier events are cleared. This helps keep the Changelog from becoming overburdened with stale events.

Algorithm 4: Processing Lustre Changelog events.
  Input: Lustre path lpath, Cache cache, MDT ID mdt
  Output: EventList
  while true do
      events = read events from mdt Changelog
      for event e in events do
          resolvedEvent = processEvent(e)
          EventList.add(resolvedEvent)
      Clear Changelog in mdt
      return (EventList)

  Function processEvent
      Input: Event e
      Output: resolvedEvent
      Extract event type, time, date from e
      try:
          path = cache.get(targetFID)
          if targetFID not found in cache then
              path = fid2path(targetFID)
              cache.set(targetFID, path)
      catch fid2pathError:
          try:
              if event type is UNLNK or RMDIR then
                  path = cache.get(parentFID)
                  if parentFID not found in cache then
                      path = fid2path(parentFID)
                      cache.set(parentFID, path)
              else if event type is RENME then
                  oldpath = cache.get(oldFID)
                  newpath = cache.get(newFID)
                  if oldFID not found in cache then
                      oldpath = fid2path(oldFID)
                      cache.set(oldFID, oldpath)
                  if newFID not found in cache then
                      newpath = fid2path(newFID)
                      cache.set(newFID, newpath)
          catch fid2pathError:
              if event type is UNLNK or RMDIR then
                  path = ParentDirectoryRemoved
      resolvedEvent.add(event type, time, date, path/oldpath&newpath)
      return (resolvedEvent)
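The LRU cache used in Algorithm 4 can be realized with a small ordered dictionary keyed by FID, wrapping Lustre's real lfs fid2path command; the sketch below (cache size and class name assumed) illustrates the idea.

import subprocess
from collections import OrderedDict


class FidPathCache:
    """Tiny LRU cache mapping Lustre FIDs to absolute paths (illustrative sketch)."""

    def __init__(self, mount_point: str, capacity: int = 5000):
        self.mount = mount_point
        self.capacity = capacity
        self.entries = OrderedDict()

    def resolve(self, fid: str) -> str:
        if fid in self.entries:
            self.entries.move_to_end(fid)       # mark as most recently used
            return self.entries[fid]
        # Cache miss: fall back to the (slow) fid2path resolution.
        path = subprocess.check_output(
            ["lfs", "fid2path", self.mount, fid], text=True
        ).strip()
        self.entries[fid] = path
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return path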

Processing (Ceph): Algorithm 5 shows the steps taken by the collector to process events from the Ceph Journal. Every event in the Journal is processed. As seen in Section 5.2.3, each event is associated with the event type, date and timestamp, the dentry full and null bits, the root dentry full and null bits, and a directory hash. The values of all of these parameters are extracted from each file system event. If the event is a rename event, the old and new paths are extracted from the null and full bits of the dentry, respectively. If the event is a delete event, the path is taken from the null bits of the dentry; if null bits exist in the root dentry, the event was a recursive deletion. The directory hash value is also checked, and the isDirectory parameter is set to True if the hash value is non-zero.

Algorithm 5: Processing Ceph Journal events.
  Input: Ceph File System Name cephfs, MDS Rank mds
  Output: EventList
  while true do
      events = read events from mds Journal in cephfs
      for event e in events do
          resolvedEvent = processEvent(e)
          EventList.add(resolvedEvent)
      Clear Journal in mds
      return (EventList)

  Function processEvent
      Input: Event e
      Output: resolvedEvent
      Extract event type, ctime, dentry, root dentry, dirHash from e
      path = root dentry.concat(dentry)
      isDirectory = False
      if event type is rename then
          oldpath = root dentry.concat(dentry nullbits)
          newpath = root dentry.concat(dentry fullbits)
      else if event type is unlink_local then
          path = root dentry.concat(dentry nullbits)
          if root dentry has nullbits then
              path = ParentDirectoryRemoved
      if dirHash != 0 then
          isDirectory = True
      resolvedEvent.add(event type, ctime, path/oldpath&newpath, isDirectory)
      return (resolvedEvent)

After events are processed by the collector, they are sent for aggregation.

Aggregation: Collectors use a publish-subscribe communication pattern (implemented with ZeroMQ [86]) to report events to an aggregator. When an event arrives at the aggregator, it is placed in a processing queue. The aggregator service is multi-threaded: one thread publishes the aggregated file system events to the subscribed consumers, and another thread stores the events in a local database to enable fault tolerance. This keeps the overhead on the system minimal. An API is provided for consumers to retrieve historic events from the database whenever a fault occurs. For our implementation, we use MySQL as the reliable event store.
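Since ZeroMQ is the transport between collectors and the aggregator, the publish-subscribe path can be sketched roughly as follows; the endpoints, JSON message layout, and bind/connect direction are assumptions for illustration, not FSMonitor's exact wire protocol.

import json
import zmq

# --- Collector side (runs on each MDS): publish processed events. ---
def publish_events(aggregator_endpoint: str, events):
    ctx = zmq.Context.instance()
    pub = ctx.socket(zmq.PUB)
    pub.connect(aggregator_endpoint)          # e.g., "tcp://mgs-node:5555" (assumed)
    for event in events:
        pub.send_string(json.dumps(event))    # one JSON-encoded event per message

# --- Aggregator side (runs on the MGS/monitor): subscribe to all collectors. ---
def collect_events(bind_endpoint: str = "tcp://*:5555"):
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.bind(bind_endpoint)
    sub.setsockopt_string(zmq.SUBSCRIBE, "")  # accept events from every collector
    while True:
        yield json.loads(sub.recv_string())   # hand events to the queue/consumers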

Consumption: Consumers act as subscribers to the aggregator, which publishes the events. The consumer service has three responsibilities. Event filtering: whenever a new event arrives at the consumer, it filters the events and passes on only those related to the files and directories requested by the application. This filtering is not done at the aggregator, to avoid the overhead that would arise if many consumers asked to monitor different files and directories. Standard event representation: before a new event is sent to the client application, event types are standardized according to the inotify format of file system event representation; this provides a generic monitoring solution for any large-scale storage system. Fault tolerance: the consumer service is also responsible for retrieving events for a particular time stamp from the aggregator's reliable event store if a consumer fails. Once retrieved, events are flagged as having been reported and can be removed from the data store when the next data purge cycle is initiated.
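Filtering at the consumer could be as simple as matching event paths against the application's watch list; the sketch below is illustrative and assumes (path, event type, timestamp) tuples and shell-style glob patterns, neither of which is mandated by FSMonitor.

import fnmatch

def filter_events(events, watch_patterns):
    """Keep only events whose path matches one of the application's watch patterns.

    'events' are assumed to be (path, event_type, timestamp) tuples as published
    by the aggregator; 'watch_patterns' are globs such as '/mnt/lustre/hacc-io/*'.
    """
    for path, event_type, timestamp in events:
        if any(fnmatch.fnmatch(path, pat) for pat in watch_patterns):
            yield (path, event_type, timestamp)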

5.4 Evaluation

To evaluate the performance and overhead of FSMonitor, we have deployed FSMonitor on multiple Lustre parallel file system and Ceph object store platforms. Here we first describe these testbeds and the workloads that we use for evaluating FSMonitor performance, and then present our evaluation results.

5.4.1 Experimental Setup

We employ various testbeds to evaluate FSMonitor on both parallel file systems and object storage systems.

Parallel File System

We evaluate FSMonitor on three testbeds running Lustre. AWS is a deployment of Lustre on five Amazon Web Services EC2 instances. We build a 20 GB Lustre file system using Lustre Intel Cloud Edition v1.4 on five t2.micro instances and an EBS volume. Our Lustre configuration for AWS includes one MDS, one MGS, one OSS with one OST, and two compute nodes. Thor is a deployment at the Distributed Systems and Storage Laboratory (DSSL) at Virginia Tech. Each node in the setup has an 8-core processor, 16 GB of memory, and 512 GB of storage. We deploy Lustre version 2.10.3 with one MGS, one MDS, seven OSSs each having five OSTs, and two Lustre clients. Every OST is a 10 GB volume, resulting in a 350 GB Lustre store in total. Iota is a production 897 TB Lustre deployment on a pre-exascale system at Argonne National Laboratory. This deployment has the same configuration and performance as the 150 PB store to be deployed on the Aurora supercomputer [2] at Argonne. The configuration includes Lustre's DNE feature with four MDSs. Iota has 44 compute nodes, each with 72 cores and 128 GB of memory.

Object-based storage System

We evaluate FSMonitor on a Ceph testbed running Ceph version 10.2.11, with one admin node, one monitor, one client, one MDS, and two OSDs, each with a 4 TB object store. The MDS is placed in a metadata pool, and the two OSDs form one data pool. The admin and monitor nodes each have an 8-core Intel Xeon processor and 48 GB of memory. The MDS and OSD nodes each have a 16-core Intel Xeon processor with 48 GB of memory, and the Ceph client node has an 8-core Intel i7-6700 processor with 32 GB of memory.

5.4.2 Experiment Workloads

We use a range of workloads to evaluate performance on the various testbeds. Specifically, we use one synthetic benchmark, Evaluate Performance Script; two real-world metadata benchmarks, MDTest and FS Mark; and two real-world data benchmarks, InterleavedOrRandom (IOR) and HACC-I/O. We run the synthetic benchmark on all three Lustre testbeds and the Ceph testbed, and the metadata and data benchmarks on the Ceph testbed and the Thor Lustre testbed. We run each experiment five times and report the mean of the five runs.

Synthetic Benchmark

We use the Evaluate Performance Script synthetic benchmark to measure baseline performance. This benchmark repeatedly creates, modifies, and deletes a file, hello.txt, in an infinite loop. The script thus exercises FSMonitor over a period of time and can be used to evaluate FSMonitor's resource utilization and event reporting rate. It also serves as the baseline for event reporting on both the Lustre file system and the Ceph object store.
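The script itself is not reproduced in this chapter; a minimal Python equivalent of the workload it generates (file name and mount point assumed) might look like this:

import os

MOUNT = "/mnt/lustre"             # assumed mount point of the monitored file system
TARGET = os.path.join(MOUNT, "hello.txt")

while True:                       # runs until interrupted, as in the benchmark
    with open(TARGET, "w") as f:  # CREATE event
        f.write("hello\n")
    with open(TARGET, "a") as f:  # MODIFY event
        f.write("world\n")
    os.remove(TARGET)             # DELETE event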

Metadata Benchmark

We use two metadata benchmarks to evaluate FSMonitor's scalability and performance when reporting metadata events. MDTest [13] measures the metadata performance of a file system. It works by creating, stating, and deleting a tree of directories and files in parallel across multiple clients of a large-scale storage system. The MDTest benchmark generally exceeds the metadata demands that applications place on file systems and is therefore an appropriate benchmark for evaluating the scalability and performance of FSMonitor. We run MDTest while varying the directory tree depth and the number of files, using five values of files-per-directory: 300, 700, 3100, 11 100, and 42 100. FS Mark [8] tests synchronous write workloads, simulating workloads such as mail servers by performing multiple creations and writes of files across different directories. We ran FS Mark for 1 million files.

Data Benchmark

To test the generality of the event representation across parallel file systems and object stores, and to evaluate the impact of FSMonitor on application I/O performance, we use two real-world data benchmarks. The InterleavedOrRandom (IOR) [121] benchmark provides a flexible way to measure a distributed file system's I/O performance. It measures performance under different parameter configurations, including I/O interfaces ranging from traditional POSIX to advanced parallel I/O interfaces like MPI-I/O. It performs reads and writes to and from files on large-scale storage systems and calculates the resulting throughput. For our evaluation of FSMonitor, IOR is executed in single-shared-file mode, writing a 32 GB file with 8 processes. HACC-I/O is based on the Hardware Accelerated Cosmology Code (HACC) [120] application, which uses N-body techniques to simulate the formation of structure in collisionless fluids under the influence of gravity in an expanding universe. HACC-I/O captures the I/O patterns of the HACC simulation code. It uses the MPI-I/O interface and differentiates between FPP (file-per-process) and SSF (single-shared-file) parallel I/O modes. We run HACC-I/O for 10 million particles in file-per-process mode with 8 processes.

5.4.3 Evaluating FSMonitor Using Synthetic Benchmark

We now explore FSMonitor’s performance when deployed on Lustre and Ceph using the synthetic benchmark Evaluate Performance Script. 5.4. Evaluation 71

Event capture rate

We first establish a baseline event generation rate for Lustre and Ceph on each of the four testbeds: AWS, Thor, Iota, and Ceph. We use Evaluate Performance Script and record the number of events generated per second. As shown in Table 5.3, for Lustre, AWS performs the worst, as its Lustre testbed is built from t2.micro instances. Thor performs better than AWS and, as expected, Iota's performance is the best of the three. For Ceph, the event generation rates are similar to those of the Thor Lustre testbed.

                        Lustre                           Ceph
                        AWS       Thor      Iota: 1 MDS
Storage Size            20 GB     350 GB    897 TB       8 TB
Create events/sec       352       746       1389         802
Modify events/sec       534       1347      2538         653
Delete events/sec       832       2104      3442         824
Total events/sec        1366      4509      9593         2086

Table 5.3: Baseline Event Generation Rates.

In our experimental setup, AWS, Thor, and Ceph have one MDS each, and Iota has four MDSs. However, for comparison purposes we present event generation rates in Table 5.3 using only a single MDS on all setups. As AWS, Thor, and Ceph have only one metadata catalog, FSMonitor extracts their events from only one MDS Changelog/Journal, processes them, and then communicates them to the aggregator for reporting to the client. On Iota, events are generated from all four MDSs and thus need to be collected from all the MDSs and then aggregated on the MGS before they are sent to the consumer.

Event Reporting Analysis

                                       Lustre                      Ceph
                                       AWS      Thor     Iota
Generated events/sec                   1366     4509     9593      2086
Reported events/sec without cache      1053     3968     8162      2079
Reported events/sec with cache         1348     4487     9487      2079

Table 5.4: Baseline Event Reporting Rates.

When FSMonitor is deployed on Lustre, the resolution layer applies a caching mechanism to cache fid2path key-value pairs. To investigate the benefits of this approach, we record the number of events captured per second with and without caching. Our results are shown in Table 5.4, together with the baseline event generation rates from our previous experiment (Table 5.3). For Lustre, when FSMonitor reports these events without caching the fid2path resolutions, the AWS-based FSMonitor can collect, process, and report only 1053 events per second. On Iota, it reports only 8162 events per second: 14.9% lower than the generation rate of ∼9500 events per second. When we analyze the FSMonitor event reporting pipeline, we find that performance is limited primarily by the processing of events after they are extracted from the Lustre Changelog on the MDS. These results confirm that fid2path is costly and that executing it for every event reduces overall throughput. In Ceph, we do not notice a difference in event reporting with or without the cache, because FSMonitor on Ceph does not have to go through the fid2path resolution process. For the implementation of FSMonitor on Lustre, we use an in-memory LRU cache to store the fid2path mappings. The size of the cache (number of fid2path mappings) is set to 5000 (we explore different cache sizes in Section 5.4.3). With the in-memory cache, FSMonitor is able to report 1348 events per second on AWS, 4487 events per second on Thor, and 9487 events per second on Iota. The above results all use one MDS. On Iota, when we use all four available MDSs, the overall event generation rate is 38 372 events per second, of which FSMonitor reports 37 948 events per second to the consumer with caching enabled. Note that besides processing and filtering events, there is no additional overhead in the collection, aggregation, and reporting of events by FSMonitor. The difference between the event reporting and event generation rates is due to the minimal processing required in FSMonitor; there is no loss of events, as events are queued and simply processed at a slower rate than they are generated. Because FSMonitor on Ceph requires much less event processing than FSMonitor on Lustre, the gap between the generation and reporting rates is much smaller for Ceph than for Lustre.

Resource Utilization

We evaluate the resource utilization of every FSMonitor component, using Evaluate Performance Script for this analysis. Tables 5.5 and 5.6 show the peak CPU and memory utilization, respectively, on each of our four testbeds. The results show that the CPU and memory costs of operating FSMonitor are low. We report the collector's resource utilization both with and without a cache. The reduction in CPU utilization for FSMonitor on Lustre when caching is enabled is due to the lower number of fid2path invocations. For FSMonitor on Ceph, we see no difference in resource utilization due to caching because fid2path invocations are not required in Ceph. Next, we modified Evaluate Performance Script to create and delete, but not modify, files continuously. The Collector service on Iota then had a CPU usage of 3.3%: a 14.2% increase over the 2.89% measured with the unmodified Evaluate Performance Script. This is because delete events caused the fid2path mapping in the cache to miss for most events, resulting in fid2path calls on the parent directory. Memory usage did not change significantly. We also changed Evaluate Performance Script to only create and modify, not delete, files. CPU usage on Iota in this case was 2.3%: a decrease of 20.4% from the 2.89% measured with the unmodified Evaluate Performance Script. This is because mappings were found in the cache more frequently. Even in this scenario, memory usage did not differ significantly. The resource utilization of the Aggregator and Consumer stays the same even when caching is enabled.

CPU%                     AWS     Thor    Iota    Ceph
Collector - No cache     9.3     7.8     6.7     1.4
Collector with cache     6.6     1.5     2.9     1.4
Aggregator               2.7     0.6     0.1     0.4
Consumer                 1.5     0.2     0.1     0.1

Table 5.5: FSMonitor CPU Utilization.

Memory (MB)              AWS     Thor    Iota    Ceph
Collector - No cache     8.2     33.7    81.6    10.5
Collector with cache     9.9     25.7    55.4    10.5
Aggregator               5.7     7.2     17.6    5.2
Consumer                 0.1     0.2     2.8     0.2

Table 5.6: FSMonitor Memory Utilization.

Therefore, deploying FSMonitor on a Lustre parallel file system (MDSs, MGS, and clients) or a Ceph object store (MDSs, Monitor, and clients) imposes a negligible overhead on overall performance.

Caching

To determine a suitable size for the in-memory LRU cache, we ran FSMonitor on Iota with Evaluate Performance Script, varying the cache size before every run. The results are shown in Table 5.7. We do not show results from Ceph, because caching has negligible impact on FSMonitor in the Ceph object store.

Cache Size       CPU%           Memory (MB)    Events/sec reported
(#fid2path)      on collector   on collector   by each collector
200              4.8            88.7           8644
500              3.5            84.3           8997
1000             3.0            75.6           9401
2000             3.0            61.3           9453
5000             2.9            55.4           9487
7500             2.9            60.7           9481

Table 5.7: FSMonitor performance vs. cache size.

The total number of events generated per second on Iota for one MDS is 9593, as seen in Table 5.3. As Table 5.7 shows, we observe improved performance with an in-memory LRU cache of size greater than or equal to 1000. For sizes 200 and 500, memory utilization on the collector is even worse than for FSMonitor without a cache on Iota. The lowest CPU and memory utilization for the Collector on Iota occurs at a cache size of 5000, which also yields the highest number of events reported per second on one MDS. Increasing the cache size further to 7500 results in worse performance. Therefore, for our evaluation, we choose a cache size of 5000.

Comparison with Robinhood

There is no scalable file system event monitoring solution for object storage systems. However, for the Lustre parallel file system, there is a widely used monitoring tool, Robinhood [133], described in Section 5.2.4. We compare FSMonitor with Robinhood on Iota with four MDSs. We implement the Robinhood approach [107] by having a subscriber on the client that polls the four publishers on the MDSs one at a time in a round-robin fashion; the MGS plays no role in this implementation. In comparison, FSMonitor has an aggregator service on the MGS that polls all MDSs concurrently and pushes all events in a single queue to the clients. We evaluate performance by comparing the number of events per second received and processed at the client side by Robinhood against the number of events collected by the client via FSMonitor, where processing takes place at the MDSs and aggregation at the MGS. Our results show that Robinhood on Iota processes an average of 7486 events per second from each MDS, versus 9487 events per second for FSMonitor. Combining all four MDSs, Robinhood processes 32 459 events per second in comparison to 37 948 events per second with FSMonitor. Therefore, with the advent of exascale supercomputers, where multiple MDSs on large-scale storage systems will be common, parallel monitoring is necessary.

5.4.4 Evaluating FSMonitor Using Metadata Benchmarks

We evaluate FSMonitor by running the real-world metadata benchmarks MDTest and FS Mark. For Lustre, we run the benchmarks on Thor. Figure 5.6 shows the MDTest event generation rate and the FSMonitor event reporting rate on Lustre and Ceph. We test the scalability of FSMonitor by running MDTest from 300 files per directory up to 42 100 files per directory. MDTest exercises four event types: create directory (MKDIR), remove directory (RMDIR), create file (CREATE), and delete file (UNLINK). We see a negligible difference in the event reporting rates, demonstrating FSMonitor's scalability. Figure 5.7 shows results for the FS Mark metadata benchmark on Lustre and Ceph. FS Mark evaluates metadata operations by creating (CREATE), writing (MODIFY), closing (CLOSE), and deleting (UNLINK) ten million files. As seen from the graphs, there is only a negligible difference between the event generation and reporting rates, caused by the processing of events from the metadata catalog on the MDSs. Thus, FSMonitor is scalable and performs well under metadata stress testing.

Figure 5.6: Performance of FSMonitor when running MDTest on Lustre (left) and Ceph (right).

Figure 5.7: Performance of FSMonitor when running FS Mark on Lustre (left) and Ceph (right).

5.4.5 Evaluating FSMonitor Using Data Benchmarks

We evaluate the impact of FSMonitor on large-scale HPC applications by running HACC-I/O and IOR on Thor and Ceph. We run both workloads simultaneously on the clients to observe the impact of FSMonitor on concurrently running applications. We first examine the event representation on Thor and Ceph, shown in Table 5.8. FSMonitor provides the same event representation as inotify for both parallel file systems and object stores. Because IOR was executed in single-shared-file mode, only single Create and Delete file events were generated by IOR. HACC-I/O, on the other hand, was run in file-per-process mode with 8 processes; therefore, 8 files were created and deleted. We did not notice any delay in event reporting by FSMonitor while the two workloads were executing simultaneously.

FSMonitor events
/mnt/lustre CREATE /hacc-io/FPP1-Part00000000-of-00000007.data
/mnt/lustre CREATE /hacc-io/FPP1-Part00000007-of-00000007.data
/mnt/lustre CLOSE  /hacc-io/FPP1-Part00000000-of-00000007.data
/mnt/lustre CLOSE  /hacc-io/FPP1-Part00000007-of-00000007.data
/mnt/lustre CREATE /ior/src/testFileSSF
/mnt/lustre CLOSE  /ior/src/testFileSSF
/mnt/lustre DELETE /hacc-io/FPP1-Part00000000-of-00000007.data
/mnt/lustre DELETE /hacc-io/FPP1-Part00000007-of-00000007.data
/mnt/lustre CLOSE  /hacc-io/FPP1-Part00000000-of-00000007.data
/mnt/lustre CLOSE  /hacc-io/FPP1-Part00000007-of-00000007.data
/mnt/lustre DELETE /ior/src/testFileSSF
/mnt/lustre CLOSE  /ior/src/testFileSSF

Table 5.8: FSMonitor events for IOR and HACC-IO on Thor and Ceph.

Next, we measure the impact of running HACC-I/O with and without FSMonitor; the results are shown in Table 5.9. The read and write performance of HACC-I/O is not noticeably impacted when FSMonitor is turned on, in either the Lustre parallel file system or the Ceph object store. We see similar results for IOR, as shown in Table 5.10.

                            without FSMonitor       with FSMonitor
HACC-IO                     Lustre      Ceph        Lustre      Ceph
Read Bandwidth (MB/s)       1069.9      2157.8      1002.5      2085.5
Write Bandwidth (MB/s)      128.0       1005.2      138.9       1033.2

Table 5.9: Impact of running FSMonitor with HACC-IO.

                            without FSMonitor       with FSMonitor
IOR                         Lustre      Ceph        Lustre      Ceph
Read Bandwidth (MB/s)       853.4       2686.4      871.1       2690.0
Write Bandwidth (MB/s)      239.4       575.3       272.4       616.2

Table 5.10: Impact of running FSMonitor with IOR.

5.5 An Illustrative Application Use Case

FSMonitor enables the development of various event-driven applications. As an illustrative example, we describe a file system indexer that uses FSMonitor to reduce re-indexing time.

Figure 5.8: Overall design of file system indexer using FSMonitor.

As storage systems evolve to manage hundreds of petabytes and even exabytes of data, the cost of indexing the data is becoming increasingly prohibitive. Improved techniques are required to index file system data as hundreds of users concurrently create, modify, move, and delete data. Event-based indexing is necessary to enable flexible management of these extreme-scale stores. Figure 5.8 shows the design of a file system indexer that uses FSMonitor to improve performance. The indexer and crawler are responsible for crawling the entire file system, collecting inode details from the metadata servers, and indexing the file system metadata. This process takes hours for a petabyte-scale file system, so an effective re-indexer is needed to update the indexed data. FSMonitor interacts with the metadata catalog to create a suspect file listing all directories on which file system events occurred, as observed from a particular client. The re-indexer periodically collects the list of suspect directories from the suspect file and indexes only these directories. We thus avoid the cost of crawling the entire file system to track changes.
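To make the data flow concrete, the following is a simplified, hypothetical sketch of how FSMonitor events could be folded into a suspect file for the re-indexer; the event tuple format and the suspect-file location are assumptions.

import os

def update_suspect_file(events, suspect_path="/var/tmp/suspect_dirs.txt"):
    """Append the parent directory of every modified file to the suspect file,
    skipping duplicates, so the re-indexer only re-crawls affected directories."""
    seen = set()
    if os.path.exists(suspect_path):
        with open(suspect_path) as f:
            seen.update(line.strip() for line in f)
    with open(suspect_path, "a") as f:
        for path, event_type, _timestamp in events:
            parent = os.path.dirname(path)
            if parent not in seen:
                f.write(parent + "\n")
                seen.add(parent)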

#Files                 #Directories    Avg #Files per Dir    Total Size (GB)
400 254 (400k)         43 189          9.3                   4.4
1 200 762 (1.2M)       129 565         9.3                   13.3
5 736 974 (5.7M)       619 029         9.3                   63.4
8 530 405 (8.5M)       815 753         10.5                  95.1
22 013 970 (22M)       2 375 341       9.3                   243.2

Table 5.11: File system workload to evaluate re-indexer.

We implement the re-indexer using FSMonitor atop Brindexer [159], the indexer developed by Cray for the ClusterStor [6] storage platform for exascale systems. The evaluation is done on the Thor Lustre file system testbed described in Section 5.4.1. We use the file system loads shown in Table 5.11, with the number of files ranging from 400 000 (4.4 GB) to 22 million (243.2 GB). Table 5.12 shows the time taken by the indexer and the re-indexer to index the file system. We run a script to modify a subset of files in random directories and then run the re-indexer to index the file system. As seen from the table, the re-indexer built atop FSMonitor improves the re-indexing time by 93% for 22 million files, without having to crawl the entire file system to track changes.

          Time taken by      Re-indexing time without    Re-indexing time
#Files    indexer (sec)      using FSMonitor (sec)       using FSMonitor (sec)
400k      155.2              18.0                        8.5
1.2M      418.6              46.4                        36.4
5.7M      1956.5             255.5                       55.5
8.5M      3405.6             322.0                       62.0
22M       9020.6             1075.2                      75.5

Table 5.12: Time taken by indexer and re-indexer.

5.6 Chapter Summary

We have presented FSMonitor, a generic and scalable file system monitor for capturing and reporting events on heterogeneous large-scale storage systems. FSMonitor uses a three-layer approach to file system event monitoring. The lowest layer, DSI, interacts with a file system to detect events and sends them to the middle layer, Resolution, where events are resolved to their absolute path names and aggregated before being sent to the upper layer, Interface. The Interface layer stores aggregated events, which clients can access via the FSMonitor API. FSMonitor implements a standard event definition process for any file system and works seamlessly for both parallel file systems and object stores. We evaluated FSMonitor on three Lustre file system testbeds and an 8 TB Ceph store. We found that on an 897 TB Lustre system, FSMonitor reported almost 38 000 events per second with low resource utilization. On the 8 TB Ceph system, FSMonitor reported more than 2000 events per second. Compared to the iterative monitoring method used by the popular Robinhood system, FSMonitor achieves a 14.5% higher event reporting rate across multiple Lustre MDSs. FSMonitor also performs well on metadata benchmarks, and we observed no performance degradation on data benchmarks when using FSMonitor. Finally, we demonstrated order-of-magnitude improvements in large-scale file system re-indexing times when using FSMonitor.

Chapter 6

Efficient Metadata Indexing for HPC Storage Systems

6.1 Introduction

From life sciences and financial services to manufacturing and telecommunications, organizations are finding that they need not just more storage, but high-performance storage to meet the demands of their data-intensive workloads. The result is a massive amount of data generation (on the order of petabytes), the creation of billions of files, and thousands of users acting on HPC storage systems. According to a recent report from the National Energy Research Scientific Computing Center (NERSC) [14], over the past 10 years the total volume of data stored at NERSC has grown at an annual rate of 30 percent. This ever-increasing rate of data generation, combined with the scale of HPC storage systems, makes efficiently organizing, finding, and managing files extremely difficult.

HPC users and system administrators need to query the properties of stored files to efficiently manage the storage system. This data management issue can be addressed by an efficient search of the file metadata in a storage system [111]. Metadata search is particularly helpful because it not only helps users locate files but also supports database-like analytic queries over important attributes. Metadata search involves indexing file metadata such as inode fields (for example, size, owner, and timestamps) and extended attributes (for example, document title, retention policy, and provenance), represented as attribute-value pairs [109]. Therefore, metadata search can help answer questions like “Which application’s files consume the most space in the file system?” or “Which files can be moved to second-tier storage?”.

Metadata indexing on large-scale HPC storage systems presents a number of challenges. First, scaling metadata indexing technology from local file systems to HPC storage systems is very difficult: in local file systems the metadata index covers only about a million files and thus can be kept in memory, whereas in HPC systems the index is too large to reside in memory. Second, the metadata indexing tool should be able to gather the metadata quickly. The typical speed for file system crawlers is in the range of 600 to 1,500 files/sec [42], which translates to 18 to 36 hours of crawling for a 100 million file data set. A large-scale HPC storage system can often contain a billion files, which implies crawl times on the order of weeks [42]. Third, the resource requirements should be low. Existing HPC storage system metadata indexing tools such as LazyBase [62] and the Grand Unified File-Index (GUFI) [9] require dedicated CPU, memory, and disk hardware, making them expensive and difficult to integrate into the storage system. Fourth, metadata changes must be quickly re-indexed to prevent a search from returning inaccurate results. It is difficult to keep the metadata index consistent because collecting metadata changes is often slow [181], and therefore search applications are often inefficient to update.

Current state-of-the-art metadata indexing techniques for HPC storage systems include Spyglass [111], SmartStore [90], Security Aware Partitioning [146], and GIGA+ [149]. All of these techniques use a spatial tree, such as a k-d tree [210] or R-tree [82], to index metadata. However, both of these trees perform poorly on high-dimensional data sets [40], handle missing values inefficiently, and do not perform well for data that have multiple values for one field [184]. These drawbacks reduce their ability to index metadata efficiently. Other metadata indexing techniques, such as GUFI [9], the Robinhood Policy Engine [107], and BorgFS [3], take the popular approach of maintaining the index in an external database outside the HPC storage system. This approach has a major consistency problem because the metadata is managed outside the file system that is being indexed.

To address these issues in HPC storage system metadata indexing, we present an efficient and scalable metadata indexing and search system, Brindexer. Brindexer enables fast and scalable indexing by using a leveled partitioning approach over the file system, which is different from, and more effective than, the hierarchical partitioning approach used in the state-of-the-art indexing techniques discussed above. Brindexer uses an in-tree indexing design and thus mitigates the problem of maintaining metadata consistency outside the file system. Brindexer also uses an RDBMS to store the index, which makes querying easier and more effective. To overcome the drawback of a slow re-indexing process, Brindexer uses a changelog-based approach to keep track of metadata changes in the file system. We present Brindexer together with the scalable metadata changelog monitor that helps track metadata changes in an HPC storage system. The HPC storage system that we choose for our implementation is Lustre; according to the latest Top 500 list [16], Lustre powers ∼60% of the top 100 supercomputers in the world. While the implementation and evaluation of Brindexer are shown in this chapter as applied to the Lustre storage system, its design makes it applicable to other HPC storage systems, such as IBM Spectrum Scale, GlusterFS, and BeeGFS. We compare the indexing and querying performance of Brindexer with the state-of-the-art GUFI indexing tool and show that Brindexer's indexing performance is better by 69% and its querying performance by 91%. Resource utilization by Brindexer is lower than that of GUFI by 46% during indexing and 58% during querying for 22 million files on a 4.8 TB Lustre store.

Figure 6.1: Comparison between hierarchical partitioning (left) and leveled partitioning (right) approaches using the same file structure.

6.2 Background & Motivation

In this section, we describe the different partitioning approaches for indexing a file system, the metadata attributes (motivated by some example file system search queries), and the architecture of HPC storage systems with an emphasis on the Lustre file system; finally, we explain the different approaches to collecting metadata changes, along with the motivation for Brindexer to use a changelog-based approach.

6.2.1 Partitioning Techniques

To exploit metadata locality and improve scalability, indexing tools for HPC storage systems partition the file system namespace into a collection of separate, smaller indexes. There are two main approaches to partitioning.

Hierarchical Partitioning

This is one of the most common approaches used in state-of-the-art metadata indexing tools. Hierarchical partitioning is based on the storage system's namespace and encapsulates separate parts of the namespace into separate partitions, thus allowing more flexible, finer-grained control of the index. An example of hierarchical partitioning is shown in Figure 6.1. As seen in the figure, the namespace is broken into partitions that represent disjoint sub-trees. However, hierarchical partitioning faces an important challenge when the disjoint sub-trees are skewed, that is, when some trees have many more files than others.

Leveled Partitioning

This approach creates index nodes at a particular level in the storage system tree. An example of leveled partitioning is shown in Figure 6.1, where the partitioning is done at level 2: the file system namespace is divided into disjoint sub-trees rooted at level 2, with an index node at the root of each sub-tree. This mitigates the problem in hierarchical partitioning where skewed trees degrade indexing performance. In the leveled approach, all directories up to the next index level are indexed at the root of the current level. Another major issue with hierarchical partitioning is that a file system crawler must run before indexing to partition the file system namespace into uniformly sized disjoint sub-trees; this consumes extra resources, whereas leveled partitioning needs no such pre-indexing crawl. Brindexer uses the leveled partitioning approach to partition the file system namespace into smaller indexes.
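As a rough illustration of leveled partitioning (not Brindexer's actual crawler), the index nodes at a chosen depth can be selected with a bounded directory walk:

import os

def index_nodes_at_level(root: str, level: int):
    """Return the directories at depth 'level' below 'root' (e.g., /mnt/lustre);
    each becomes the index node (root of a disjoint sub-tree) under leveled partitioning."""
    root = root.rstrip(os.sep)
    base_depth = root.count(os.sep)
    nodes = []
    for dirpath, dirnames, _files in os.walk(root):
        depth = dirpath.count(os.sep) - base_depth
        if depth == level:
            nodes.append(dirpath)
            dirnames[:] = []   # do not descend further; the index node owns this sub-tree
    return nodes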

6.2.2 Metadata Attributes

File metadata can be of two types.

• Inode Fields: They are generated by the storage system itself for every file, and are shown in Table 6.1.

• Extended Attributes: These are typically generated by users and applications. They may include a mime type attribute, which defines the file extension, and a permission attribute specifying the read, write, and execute permissions set by the application.

All attributes are typically represented as attribute-value pairs that describe the properties of a file. Each POSIX file has at least 10 attributes, so a large-scale HPC storage system with a billion files holds a minimum of 10^10 attribute-value pairs. The need to search this massive set of metadata attribute-value pairs effectively gives rise to metadata indexing.

Attribute   Description             Attribute   Description
ino         inode number            size        file size
mode        file or directory       blocks      blocks allocated
nlink       number of hard links    atime       access time
uid         owner of file           mtime       modification time
gid         group owner of file     ctime       status change time

Table 6.1: Metadata Attributes.

Some common metadata attributes are shown in Table 6.1. The mode attribute is used with the appropriate mask macros (S_IFREG and S_IFDIR) to determine whether a file is a regular file or a directory. The atime attribute is affected when a file is handled by the execve, mknod, pipe, utime, and read (of more than zero bytes) system calls. mtime is affected by the truncate and write calls. ctime is changed by writing or by setting inode information. Some sample file management questions and the metadata search queries used to answer them are shown in Table 6.2. These illustrate the importance of fast and scalable metadata indexing and querying for HPC storage system administrators.

Storage System Administrator Question                     Metadata Search Query
Which files should be migrated to secondary storage?      size >100 GB, atime >1 year
Which files have expired their legal compliances?         mode = file, mtime >10 years
How much storage does each user consume?                  Sum size where mode = file, group by uid

Table 6.2: Some sample file management questions and the metadata search queries used.
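Because Brindexer stores its index in an RDBMS, the administrator questions in Table 6.2 map naturally onto SQL. The snippet below runs the per-user space-consumption query against a hypothetical SQLite shard; the shard path, table name, and column layout are illustrative assumptions.

import sqlite3

# Assumed schema: one row per file with the inode fields of Table 6.1 as columns.
conn = sqlite3.connect("/lustre/.index/shard_00.db")   # hypothetical shard path
query = """
    SELECT uid, SUM(size) AS bytes_used
    FROM file_metadata
    WHERE mode = 'file'
    GROUP BY uid
    ORDER BY bytes_used DESC
"""
for uid, bytes_used in conn.execute(query):
    print(f"user {uid} consumes {bytes_used} bytes")
conn.close()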

6.2.3 HPC Storage System

HPC storage systems are designed to distribute file data across multiple servers so that multiple clients can access file system data in parallel. Typically, they consist of clients that read or write data to the file system, data servers where data is stored, metadata servers that manage the metadata and placement of data on the data servers, and networks to connect these components. Data may be distributed (divided into stripes) across multiple data servers to enable parallel reads and writes. This level of parallelism is transparent to the clients, for whom it seems as though they are accessing a local file system. Therefore, important functions of a distributed file system include avoiding potential conflicts among multiple clients and ensuring data integrity and system redundancy. The most common HPC file systems include Lustre, GlusterFS, BeeGFS, and IBM Spectrum Scale. In this chapter, we have built Brindexer on the Lustre file system.

Lustre File System

The architecture of the Lustre file system is discussed in Chapter 3 [153, 196]. Lustre has a client-server network architecture and is designed for high performance and scalability. The Management Server (MGS) stores the configuration information for the entire Lustre file system; this persistent information is stored on the Management Target (MGT). The Metadata Server (MDS) manages all namespace operations for the file system. The namespace metadata, such as directories, file names, file layouts, and access permissions, is stored on a Metadata Target (MDT); every Lustre file system must have at least one MDT. Object Storage Servers (OSSs) provide the storage for file contents. Each file is stored on one or more Object Storage Targets (OSTs) mounted on the OSSs. Applications access file system data via Lustre clients, which interact with the OSSs directly for parallel file access. The internal high-speed data networking protocol for the Lustre file system is abstracted and managed by the Lustre Network (LNet) layer.

6.2.4 Collecting Metadata Changes

After the initial metadata indexing is done, regular re-indexing must be performed so that metadata search queries do not return out-of-date results. Re-indexing can be performed by running the indexing tool at regular intervals to index the entire file system afresh; this is the approach taken by most state-of-the-art indexing techniques (GUFI [9] and BorgFS [3]), which maintain the index in an external database outside the file system. However, this is a very expensive approach for large file systems. Another approach is to keep track of metadata changes and re-index based on those changes. There are two ways to collect metadata changes: a snapshot-based approach and a changelog-based approach.

• Snapshot-Based Approach: In this approach, periodic snapshots are taken of the file system metadata. Snapshots are created by making a copy-on-write (CoW) clone of the inode file. Given two snapshots at time instants Tn and Tn+1, this approach computes the difference between them and identifies the files that changed during the interval. The metadata index crawler then crawls only the changed files to re-index them, which is much faster than periodic walks of the entire file system. However, this approach depends on a file system design that incorporates CoW metadata updates.

• Changelog-Based Approach: This approach logs metadata changes as they occur on the file system, by recording the modifying events that take place. Every HPC storage system maintains an event changelog (used for auditing purposes), for example mmaudit in IBM Spectrum Scale and the Changelog in Lustre. Thus, building a scalable monitor that watches the changelog can be a very efficient solution for collecting metadata changes: only the files on which a modifying event occurs need re-indexing. In Brindexer, we use the changelog-based approach for collecting metadata changes.

Next, we explain the Lustre changelog which is used to keep track of file system events on Lustre file system.

Lustre Changelog

Table 6.3 shows sample records in Lustre’s Changelog. We ran a simple script to see the events recorded in the Changelog. The script first creates a file, hello.txt, then the file is modified. The file is then renamed to hi.txt. A directory named okdir is then created. Finally, we delete the file. 6.2. Background & Motivation 85

ID  Type      Time         Date        Flags  Target FID              Parent FID             Target
11  01CREAT   21:18:47.30  2020.06.20  0x0    t=[0x5716:0x626c:0x0]   p=[0x5716:0xe7:0x0]    hello.txt
12  17MTIME   21:18:47.32  2020.06.20  0x7    t=[0x5716:0x626c:0x0]                          hello.txt
13  08RENME   21:18:47.41  2020.06.20  0x1    t=[0x5716:0x17a:0x0]    p=[0x5716:0xe7:0x0]    hello.txt s=[0x5716:0x626b:0x0] sp=[0x5716:0x626c:0x0] hi.txt
14  02MKDIR   21:18:47.42  2020.06.20  0x0    t=[0x5716:0x626d:0x0]   p=[0x5716:0xe7:0x0]    okdir
15  06UNLNK   21:18:47.43  2020.06.20  0x0    t=[0x5716:0x626b:0x0]   p=[0x5716:0xe7:0x0]    hi.txt

Table 6.3: A sample Lustre ChangeLog record showing Create File, Modify, Rename, Create Directory, and Delete File events.

Each tuple in Table 6.3 represents a file system event. Every row in the Changelog has an EventID – the record number of the Changelog; Type – the type of file system event that occurred; Timestamp, Datestamp – the date time of the event occurrence; Flags – masking for the event; Target FID – file identifier of the target file/directory on which the event occurred; Parent FID – file identifier of the parent directory of the target file/directory; and the Target Name – the file/directory name which triggered the event. It is evident that the Parent and Target FIDs need to be resolved to their original names before they can be processed by Brindexer. The following events are recorded in the Changelog:

• CREAT: Creation of a regular file.

• MKDIR: Creation of a directory.

• HLINK: Hard link.

• SLINK: Soft link.

• MKNOD: Creation of a device file.

• MTIME: Modification of a regular file.

• UNLNK: Deletion of a regular file.

• RMDIR: Deletion of a directory.

• RENME: Rename a file or directory.

• IOCTL: Input-output control on a file or directory.

• TRUNC: Truncate a regular file.

• SATTR: Attribute change.

• XATTR: Extended attribute change.

Note in Table 6.3 that target FIDs are enclosed within t = [] and parent FIDs within p = []. The MTIME event does not have a parent FID. The RENME event has additional FIDs: s = [] denotes the new file identifier to which the file has been renamed, and sp = [] gives the file identifier of the original file. These features are important when resolving FIDs.

Motivation for using Lustre Changelog

We analyzed a 24-hour Lustre Changelog obtained from a production petascale Lustre file system at Los Alamos National Laboratory (LANL). Some observations from the analysis are:

• There are more than 34 million file system events which occur per day in a large-scale production-level HPC storage system.

• The number of unique files that get affected in 24 hours is ∼ 10.5 million.

• The number of unique directories on which metadata events occur is ∼ 110,000.

• The number of occurrences of each event type is shown in Table 6.4.

Event Type   # Events      Event Type   # Events
CREAT        1,322,010     MKDIR        67,791
HLINK        8,841         SLINK        94,711
MTIME        10,098,485    UNLNK        750,480
RMDIR        59,841        RENME        97,227
SATTR        3,432,589     XATTR        164

Table 6.4: Number of file system events for each metadata event in a 24-hour Lustre Changelog.

The analysis shows that a snapshot-based approach to tracking metadata changes may be very expensive for large, active file systems; determining the directories of the 10.5 million affected files is also time-consuming. The Lustre Changelog, however, already reports the parent directories, which are exactly the directories that need to be re-indexed. This improves the performance of Brindexer immensely, because it does not need to track all 10.5 million files for re-indexing, but only the 110,000 unique directories. The challenge is to design an efficient and scalable changelog processing engine to obtain the parent FIDs (directories) of the more than 10 million MTIME and more than 3 million SATTR events, for which parent FIDs are not recorded in the Changelog. This is discussed in Section 6.3.2.

Figure 6.2: Overall architecture of Brindexer.

6.3 System Design

This section describes the design and implementation details of Brindexer; its overall architecture is shown in Figure 6.2. Brindexer runs on the file system clients and consists of the indexer, the crawler, and the metadata query interface. The indexer and crawler are responsible for crawling the entire file system, collecting inode details from the metadata servers, and indexing the file system metadata. Because Brindexer runs on the same file system that it indexes, the integrated metadata index database is created on the storage servers of the file system. The re-indexer is part of the indexer in Brindexer and interacts with the file system changelog to keep track of metadata changes. Users and applications interact with the metadata query interface provided by Brindexer to query the metadata index database. Next, we describe each component of Brindexer in more detail.

6.3.1 Indexer

An overview of the indexing process of Brindexer is shown in Algorithm 6. Brindexer uses the leveled partitioning technique described in Section 6.2 to partition the file system namespace, and it performs the leveled partitioning in parallel: the indexing task can be distributed across multiple client indexers for fast and scalable indexing. Each client node can be assigned a set of sub-trees and independently manages the file system indices under those sub-trees, as shown in Figure 6.3; this parallel approach improves the performance of Brindexer. The crawler is responsible for the directory walk of the file system namespace.

Figure 6.3: Parallelism in leveled partitioning (left) and 2-level database sharding (right) of Brindexer.

Algorithm 6: Indexing function in Brindexer.
  Function Indexing
      Input: Indexing Level: indexLevel
      Output: Metadata Index Database: indexdb
      for Directory dir in directoryWalk do
          if dir in Level indexLevel then
              Setup database in Index Directory
              processIndexDir(dir)
      processRootDir(indexLevel)
      return (indexdb)

  Function processIndexDir
      Input: Index Directory: dir
      for Directory subdir in recursive read of ll_readdir(dir) do
          hash = Calculate hash of subdir
          for File file in stat(subdir) do
              new_lstat(file)
              Place inode information of file in the database shard with the hash value as hash

  Function processRootDir
      Input: Indexing Level: indexLevel
      Setup database in Root Directory
      for Directory dir in directoryWalk do
          if dir < Level indexLevel then
              for Directory subdir in recursive read of ll_readdir(dir) do
                  hash = Calculate hash of subdir
                  for File file in stat(subdir) do
                      new_lstat(file)
                      Place inode information of file in the database shard with the hash value as hash

The input to the indexer is the indexing level: all directories at that level become index directories, and the root directory is responsible for indexing all directories above the indexing level. For each index directory, a recursive readdir() is performed to find all sub-directories. For every sub-directory, a stat() call is made to get the files in that directory, and a new_lstat() call is performed on each file to obtain its inode information. Each index directory uses a 2-level database sharding approach to keep the database shards to a reasonable size; this maximizes database performance by querying an optimal number of files per database, as shown in Figure 6.3. The number of databases per index node is limited to 64 (0x40); this number is based on experiments measuring the time to index and query Brindexer for 1 billion files, in which 64 databases gave the best performance with the lowest resource utilization. Within each database in the index node there are database shards, and each shard holds the metadata of one or more sub-directories of the index directory. To find the placement of a file's metadata, an MD5 hash of the file's parent directory selects the database shard, and an MD5 hash of the index directory selects the database within which that shard is placed. This 2-level sharding is done by Brindexer to maximize query performance.
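The two-level placement described above can be sketched with two MD5 hashes; the 64 databases per index node follow the text, while the number of shards per database is an assumed value for illustration.

import hashlib
import os

DATABASES_PER_INDEX_NODE = 64    # 0x40, as used by Brindexer
SHARDS_PER_DATABASE = 256        # assumed value for illustration

def shard_location(index_dir: str, file_path: str):
    """Return (database id, shard id) for a file's metadata record.

    Level 1: hash the index directory to pick one of the 64 databases.
    Level 2: hash the file's parent directory to pick the shard inside it.
    """
    parent_dir = os.path.dirname(file_path)
    db_id = int(hashlib.md5(index_dir.encode()).hexdigest(), 16) % DATABASES_PER_INDEX_NODE
    shard_id = int(hashlib.md5(parent_dir.encode()).hexdigest(), 16) % SHARDS_PER_DATABASE
    return db_id, shard_id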

6.3.2 Re-Indexer

The architecture of re-indexer is shown in Figure 6.4.

Figure 6.4: Design of Re-Indexer in Brindexer.

Brindexer's re-indexer is a multi-threaded set of processes running on the file system clients. One thread is responsible for processing the file system changelogs gathered from the metadata servers, which are processed in parallel on the clients. A fast and efficient caching mechanism is used to store the mappings of FIDs to paths to improve the performance of processing the changelogs. Another thread maintains a suspect file, which holds a collection of all suspect directories (directories that have been modified and need to be re-indexed) for a particular time period. This suspect file is given as an input to the indexer, which then performs a stat() only on the suspect directories.

Processing Changelogs

The re-indexer collects events from the changelog in batches. Each collected event needs to be processed to extract the directory name that is placed in the suspect file. In particular, FIDs are not necessarily interpretable by Brindexer, and thus must be processed and resolved to absolute path names. Lustre provides the fid2path tool to resolve FIDs to absolute path names. However, the fid2path tool is slow and can delay the reporting of events. For example, in Section 6.4.6 we show that this delay can cause a 31.7% decrease in the event reporting rate compared to the events generated in the file system. To minimize this overhead, the re-indexer implements a Least Recently Used (LRU) cache to store mappings of FIDs to source paths.

Algorithm 7: Processing Changelog events in Lustre file system.

Input: Lustre path lpath, Cache cache, MDT ID mdt
Output: SuspectFile
while true do
    events = read events from mdt Changelog
    for event e in events do
        resolvedPath = processEvent(e)
        SuspectFile.add(resolvedPath)
    Clear Changelog in mdt
return (SuspectFile)

Function processEvent
    Input: Event e
    Output: resolvedPath
    Extract event type, time, date from e
    try:
        path = cache.get(parentFID)
        if parentFID not found in cache then
            path = fid2path(parentFID)
            cache.set(parentFID, path)
    catch fid2pathError:
        path = cache.get(targetFID)
        if targetFID not found in cache then
            path = fid2path(targetFID)
        Remove file name from path
        cache.set(targetFID, path)
    return (path)

Algorithm 7 shows the processing steps of the re-indexer. Changelog events are processed in batches. An LRU cache is used to resolve parent FIDs (directories) to absolute paths. Whenever an entry is not found in the cache, we invoke the fid2path tool to resolve the FID and then store the FID-to-path mapping in the LRU cache. MTIME and SATTR events do not have a parent FID and are therefore processed in the catch block, where the target FIDs are processed. The file name is removed from the absolute path to get the directory name, and the path is then added to the cache so that the fid2path tool is not called on the file again. It should be noted that the cache only needs to track modified parent directories, so only 110,000 entries are present in a 24-hour suspect file rather than 10.5 million files. All of the resolved directory paths are added to the suspect file (without adding duplicates). After processing a batch of file system events from the Changelog, the re-indexer purges the Changelog: a pointer is maintained to the most recently processed event tuple and all previous events are cleared from the Changelog. This prevents the Changelog from being overburdened with stale events. The indexer periodically reads the suspect file and re-indexes the file system based on the suspect directories. Once the indexer acts on a suspect file, a timestamp is given to the re-indexer and a new suspect file is written to record suspect directories from that timestamp onward.
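A minimal sketch of the cached FID resolution is shown below. It assumes Python, Lustre's lfs fid2path command-line tool, and an illustrative mount point; it is a sketch of the idea, not the exact re-indexer implementation.

from collections import OrderedDict
import subprocess

class LRUCache:
    """Small LRU cache for FID -> parent-directory-path mappings."""
    def __init__(self, capacity=5000):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, fid):
        if fid not in self.entries:
            return None
        self.entries.move_to_end(fid)
        return self.entries[fid]

    def set(self, fid, path):
        self.entries[fid] = path
        self.entries.move_to_end(fid)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

def resolve_parent(fid, cache, lustre_root="/mnt/lustre"):
    """Resolve a parent FID to a path, consulting the cache before 'lfs fid2path'."""
    path = cache.get(fid)
    if path is None:
        # Fall back to the (slow) Lustre tool only on a cache miss.
        out = subprocess.run(["lfs", "fid2path", lustre_root, fid],
                             capture_output=True, text=True, check=True)
        path = out.stdout.strip()
        cache.set(fid, path)
    return path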

6.3.3 Metadata Query Interface

The metadata query interface in Brindexer interacts with the metadata index database, which is stored on the storage servers of the file system. The metadata index database uses an RDBMS to store the index information of large-scale HPC storage systems. There are a few reasons for selecting an RDBMS for our implementation. First, we are not concerned with the scalability of the database because our design of the indexer and re-indexer takes care of it: we use the parallel leveled partitioning approach for speed and 2-level database sharding in the index-level directories for scalability and optimal query performance. The RDBMS therefore serves its purpose of providing a convenient API for users to query the database. Second, an RDBMS is very efficient at handling bulk writes and appends, which are needed during the re-indexing process; bulk reads are also efficient. Third, an RDBMS is limited when it has to handle a continuous stream of inputs, but in Brindexer the metadata index database only receives periodic input, for which an RDBMS works efficiently. Fourth, RDBMS performance also degrades under contended writes because of locking issues; the 2-level sharding and the partitioning of the namespace into disjoint sub-trees in Brindexer ensure that the metadata index database does not have to handle contended writes.
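As an illustration of how such queries can be fanned out over the sharded index, the sketch below runs the evaluation query used later in Section 6.4.5 (files larger than 10 MB) against a set of per-shard databases in parallel. SQLite is used here only as a stand-in RDBMS, and the table and column names are assumptions.

import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Illustrative shard schema and query; the actual table layout used by
# Brindexer is not specified here, so these names are assumptions.
QUERY = "SELECT path, size FROM inodes WHERE size > ?"

def query_shard(db_file, min_size):
    """Run the size filter against one database shard and return matching rows."""
    with sqlite3.connect(db_file) as conn:
        return conn.execute(QUERY, (min_size,)).fetchall()

def parallel_query(shard_files, min_size=10 * 1024 * 1024):
    """Fan the query out over all shards in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda f: query_shard(f, min_size), shard_files)
    return [row for rows in results for row in rows]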

6.4 Evaluation

We evaluate the performance of Brindexer by analyzing each component in detail. In this section, we describe the experimental setup for the evaluation and the workloads that were used for analyzing the performance, and we evaluate the indexer, re-indexer, and metadata query interface.

6.4.1 Experimental Setup

To evaluate Brindexer, we use a Lustre file system cluster of 9 nodes with 4 MDSs, 3 OSSs, and 2 clients. Each node runs CentOS 7 on a machine with an AMD 64-core 2.7 GHz processor, 128 GB of RAM, and a 2.5 TB SSD. All nodes are interconnected with 10 Gbps Ethernet. Each MDS has a 128 GB MDT associated with it. Furthermore, each OSS has 3 OSTs, with each OSS supporting 1.6 TB of attached storage on its OSTs. Therefore, our analysis is done on a 4.8 TB Lustre store.

6.4.2 Workloads

We use 2 kinds of workloads to test the performance of Brindexer as shown in Tables 6.5 and 6.6.

#Files              #Directories   Avg #Files per Dir   Total Size (MB)
400,000 (400k)      1,000          400                  1.08
1,200,000 (1.2M)    1,000          1200                 43.05
5,700,000 (5.7M)    5,700          1000                 90.07
8,500,000 (8.5M)    8,500          1000                 140.78
22,000,000 (22M)    222,000        1000                 373.9

Table 6.5: Workload 1: Flat directory structure: Smaller number of directories with higher average number of files per directory.

#Files              #Directories   Avg #Files per Dir   Total Size (GB)
400,254 (400k)      43,189         9.26                 4.4
1,200,762 (1.2M)    129,565        9.27                 13.3
5,736,974 (5.7M)    619,029        9.27                 63.4
8,530,405 (8.5M)    815,753        10.46                95.1
22,013,970 (22M)    2,375,341      9.27                 243.2

Table 6.6: Workload 2: Hierarchical directory structure: Large number of directories with lower average number of files per directory.

Both workloads have 5 different numbers of files (400k, 1.2M, 5.7M, 8.5M, and 22M). However, the workloads differ in the number of directories. Workload 1 has a flat directory structure with just one level, consisting of a smaller number of directories with a higher number of files per directory. Workload 2 has a hierarchical structure (with a maximum directory depth of 17) consisting of a larger number of directories with fewer files per directory. Workload 1 therefore represents an ideal case, while workload 2 reflects a real-world use case. We compare Brindexer with the state-of-the-art indexing tool GUFI [9] for indexing. For workloads 1 and 2, Brindexer sets an indexing level of 1 and 3, respectively. This is because workload 1 has only one level, and for workload 2 the number of directories at level 3 equals the average number of directories per level.

Figure 6.5: Comparison of system call stack for indexing 1.2M files in hierarchical directory structure by Brindexer (left) and GUFI (right).

To evaluate the querying performance of Brindexer, besides GUFI, we also compare Brindexer with Lustre's default lfs find tool [11] and the Robinhood policy engine [107]. For the re-indexer, we evaluate the performance of Lustre changelog processing and compare it with Robinhood [107], as these are the only tools that use a changelog-based approach to keep track of metadata changes. All experiments are run five times and the evaluation reports the average of these runs. All caches are cleared between runs.

Next, we give a brief description of GUFI and Robinhood. GUFI stands for Grand Unified File Index. It uses breadth-first traversal to traverse the entire file system tree in parallel. GUFI uses one index database per directory so that the database has the same security permissions as the directory. This entire database tree is maintained outside the file system. Although GUFI is meant for indexing into an external database, we modify GUFI to run in-tree and perform metadata indexing inside the Lustre file system itself. This provides a much fairer comparison with Brindexer. Brindexer is not concerned with directory permissions because it is meant for system administrators. Robinhood collects information from the file system it monitors and inserts this information into an external database. It makes use of the Lustre changelog to monitor and keep track of file system events. For multiple MDSs, Robinhood uses a round-robin approach to keep track of changelogs.

6.4.3 Comparison of System Calls

Both Brindexer and GUFI were used to index 1.2 million files in the hierarchical directory structure, and the system calls were traced in a CPU flame graph [7], shown in Figure 6.5. In a flame graph, each box represents a function in the stack. The y-axis shows the depth of the stack and the x-axis spans the sample population. The width of each box shows the amount of time a system call spends on the CPU. The major observation from the flame graphs is that sys_newlstat(), which is used for getting a file's inode information, is represented as one block in Brindexer but as multiple individual stack calls in GUFI. Also, the cumulative width of the boxes for sys_newlstat() in GUFI exceeds the width in Brindexer, which means that GUFI spends more CPU time retrieving file information. This shows that Brindexer is more effective than GUFI in using the system call to retrieve file information.

6.4.4 Evaluation of Indexer

Time Taken to Index

The time taken to index both workloads by Brindexer and GUFI is shown in Figure 6.6. As seen in the figure, GUFI performs better than Brindexer for workload 1, where there is a flat directory structure. This is because the design of GUFI is optimized for a directory level of just 1 where each directory has a large number of files. However, for the real-world case in workload 2, Brindexer outperforms GUFI. As the number of files increases, the time to index with GUFI increases exponentially, with the maximum difference between Brindexer and GUFI of 69% in indexing time observed for 22M files.

Figure 6.6: Time taken to index by Brindexer and GUFI.

Resource Utilization

Figures 6.7 and 6.8 show the resource utilization of Brindexer and GUFI when indexing the file system for both workloads. The legend used in Figure 6.7 is consistent for all the graphs. We only show the resource utilization of the Lustre client and MDS; the behavior on the OSSs is similar. The CPU utilization of Brindexer during indexing is lower than that of GUFI by 46.6% on clients and 86.04% on the MDSs. GUFI is also much more CPU intensive than Brindexer on the MDS because of the multiple individual stat calls that GUFI makes to the MDS, as seen in Figure 6.5. Even for workload 1, where GUFI takes less time to index than Brindexer, its CPU utilization is much higher than that of Brindexer. Other resources such as network and disk show behavior similar to CPU. In the case of memory, however, both Brindexer and GUFI have similar memory utilization on clients and MDS, as seen in Figure 6.8.

Figure 6.7: CPU utilization by Brindexer and GUFI during indexing on Client (left) and MDS (right).

Figure 6.8: Memory utilization by Brindexer and GUFI during indexing on Client (left) and MDS (right).

6.4.5 Evaluation of Metadata Query Interface

To evaluate the metadata query interface of Brindexer, we run a query to find all files whose size is greater than 10 MB. We compare the query performance of Brindexer with GUFI, Lustre's lfs find tool, and the Robinhood policy engine.

Time Taken to Query

Figure 6.9: Time taken to query by Brindexer, GUFI, lfs find, and Robinhood.

Figure 6.9 shows the time taken to run the query and get the results back from the index database for Brindexer, GUFI, lfs find, and Robinhood. GUFI performs worse than Brindexer for both workloads. This is because Brindexer makes use of parallel search on all the index nodes; the 2-level database sharding in Brindexer helps optimize queries further. We compare Brindexer with lfs find and Robinhood using queries on workload 2 only. lfs find traverses the entire file system to get the results without using indexing and performs the worst. The difference in query performance between Brindexer and Lustre's default lfs find tool is proportional to the number of index nodes at the indexing level. The query performance of Brindexer and Robinhood is similar, even though Robinhood uses an external database to index the file system. Therefore, Brindexer reaches ideal query performance within the file system itself and improves upon the state-of-the-art GUFI by 91%.

Resource Utilization

Figures 6.10 and 6.11 show the resource utilization of Brindexer and GUFI when querying the file system for both workloads. The legend used in Figure 6.10 is consistent for all the graphs. The resource utilization during querying is shown on the OSSs instead of the MDS because the metadata index database resides on the OSSs of the file system. It is seen that GUFI's query task is much more CPU intensive than that of Brindexer, while the memory utilization is similar for both. Therefore, Brindexer helps reduce CPU utilization during queries by 91.8% on clients and 57.8% on OSSs compared to GUFI.

Figure 6.10: CPU utilization by Brindexer and GUFI during querying on Client (left) and MDS (right).

Figure 6.11: Memory utilization by Brindexer and GUFI during querying on Client (left) and MDS (right).

6.4.6 Evaluation of Re-Indexer

The analysis of the 24-hour Lustre changelog described in Section 6.2.4 shows that on a large-scale production Lustre store, more than 34 million events occur per day, which corresponds to ∼400 events per second. We write a script that operates on the 22 million file dataset in workload 2 and generates 766 random events (create, modify, and remove) per second per MDS. We then evaluate the performance of the re-indexer in reporting these events.

Event Reporting Analysis

The event reporting rates (the rate at which the suspect file is created) of Brindexer and Robinhood are shown in Table 6.7. Lustre's fid2path tool is resource intensive and slow; therefore, there is a 28.7% improvement in the event reporting rate when an LRU cache is used in Brindexer's re-indexer to store parent FID and path mappings. Brindexer performs better than Robinhood in event reporting because of Robinhood's round-robin approach to processing changelogs from the MDSs, whereas Brindexer uses a parallel and scalable approach that improves the event reporting rate.

                                             #Events per second
Events generated                             766
Events reported by Brindexer without cache   523
Events reported by Brindexer with cache      734
Events reported by Robinhood                 710

Table 6.7: Event Reporting Rates by Brindexer and Robinhood.

Resource Utilization

Table 6.8 shows the effect of varying the LRU cache size in the re-indexer. The best event reporting rate with optimal resource utilization is achieved when the cache size is set to 5000. At that size, the re-indexer does not utilize a lot of CPU (2.94%) or memory (62.4 MB) and can run continuously to keep track of metadata events in real time.

Cache Size (#fid2path)   CPU% on client   Memory (MB) on client   Events/sec reported by Brindexer
200                      4.8              88.7                    578
500                      3.5              84.3                    624
1000                     2.98             75.6                    659
2000                     2.95             61.3                    698
5000                     2.94             62.4                    734
7500                     2.92             60.7                    720

Table 6.8: Brindexer performance and resource utilization vs. cache size.

6.5 Chapter Summary

In this chapter, we have presented Brindexer, a metadata indexing tool for large-scale HPC storage systems. Brindexer has an in-tree design where it uses a parallel leveled partitioning approach to partition the file system namespace into disjoint sub-trees which can be indexed in parallel. Brindexer maintains an internal metadata index database which uses a 2-level database sharding technique to increase indexing and querying performance. Brindexer also uses a changelog-based approach to keep track of the metadata changes and re-index the file system. Brindexer is evaluated on a 4.8 TB Lustre storage system and is compared with state-of-the-art GUFI and Robinhood engines. Brindexer improves the indexing performance by 69% and the querying performance by 91% with optimal resource utilization.

In the future, we plan to implement the re-indexer within the indexer of Brindexer so that there is no overhead from reading and writing entries to suspect files. We also plan to implement Brindexer for other HPC storage systems such as BeeGFS and IBM Spectrum Scale.

Chapter 7

I/O Load Balancing for Big Data HPC Applications

7.1 Introduction

High performance computing (HPC) is routinely employed in many science domains, such as Physics and Geology, to simulate and understand the behavior of complex phenomena. Big data driven scientific simulations are resource intensive and require both computing and I/O capabilities at scale. There is a crucial need for revisiting the HPC I/O subsystem to better optimize for and manage the increased pressure on the underlying storage systems from big data processing. Several factors affect the I/O performance of big data HPC applications. First, the number and kinds of applications that an HPC storage system supports are increasing rapidly [213], which leads to increased resource contention and the creation of hot spots where some data or resources are consumed significantly more than others. Second, the underlying storage systems, e.g., Ceph [201], GlusterFS [46], and Lustre [47], are often distributed, and adopt a hierarchical design comprising thousands of distributed components connected over complex network topologies. Managing and extracting peak performance from such resources is non-trivial. With changing application characteristics, static approaches (e.g., [68, 198]) are no longer sufficient, necessitating dynamic solutions. Third, the storage components can develop load imbalance across the I/O servers, which in turn impacts the performance and time to solution for the big data problem. Consequently, achieving load balancing in the storage system is key to a sustainable solution.

Load balancing for HPC storage systems is crucial and is being actively studied in recent works [71]. Extant systems typically attempt to perform load balancing either by having limited support for read shedding to redirect read requests to replicas of the primary copy, e.g., in Ceph [135], or by performing data migration. Alternatively, per-application load balancing has also been considered to balance the load of an application across the various I/O servers [198]. These existing approaches lack a global view of all the components in the hierarchical structure of the system, and mainly focus on only a small subset of metrics (e.g., only the storage capacity, and not the performance of the components). Thus, these approaches cannot guarantee that the aggregate I/O load of multiple big data applications concurrently executing atop a parallel file system (with bursty behavior) is evenly distributed.

Consider the Lustre file system that forms the backend storage in, among other HPC systems, Oak Ridge National Laboratory's (ORNL) Titan supercomputer [43, 72]. The default strategy in Lustre is to allocate storage targets to I/O requests using a round-robin approach. Experiments show that this approach is inclined to either under- or over-utilize the resources due to the bursty nature of applications.

In this chapter, we address the load imbalance problem in Lustre by enabling a global view of the statistics of key components. We select Lustre to showcase our approach as Lustre is deployed on 60 of the top 100 fastest supercomputers [72], and improving its performance will benefit a wide range of applications and users. We go beyond just network load balancing, e.g., as in NRS [166], or per-application approaches, e.g., as in access frequency-based solutions [198], to ensure that the Lustre Object Storage Targets (OSTs) that actually store and serve the data, along with other I/O system components, are load balanced. We leverage the existing hierarchy of Lustre to avoid introducing additional performance bottlenecks, and co-locate the global component of our load balancer on Lustre's Metadata Server (MDS), which has a global view of all other components. Our goal is to improve the end-to-end performance of HPC storage systems for big data applications. Our data-driven approach learns system behavior to better manage the load across various Lustre components. Specifically, we make the following contributions.

• We design a model for the Lustre file system to incorporate a load balancing strategy that considers the global view of the system parameters.

• We utilize a scalable publisher-subscriber model to monitor and capture the load of key components in Lustre. We use a Markov chain model that learns and predicts the future behavior of the application using the monitored data, and a minimum-cost maximum-flow algorithm to assign storage targets in a global load aware fashion.

• We design a realistic trace-driven Lustre simulator that captures the load imbalance behavior. We use the simulator and a real setup to study our approach and design decisions therein.

• We also evaluate the effectiveness and scalability of our approach. Experiments show that our approach helps achieve better load-balanced storage servers, which in turn can yield improved end-to-end system performance.

7.2 Background and Motivation

In this section, we first present an overview of the existing load balancing used in Lustre. Then we present a quantitative study of the load imbalance in the HPC I/O subsystem to motivate our approach. The default Lustre implementation uses a round-robin approach coupled with a disk utilization measure to balance the load across Object Storage Targets (OSTs). The first available OST is selected to store a file, and if the OST still has available space (more than a predefined threshold), it is placed at the end of a list for the next round. This technique does not consider the load on other resources (the MDS or Object Storage Servers (OSSs)) or the I/O load on OSTs, and can lead to performance-degrading hot spots. The Network Request Scheduler (NRS) [166] aims to achieve distributed network load balancing at the Lustre server level by reordering incoming RPCs to a Lustre server (e.g., MDS or OSS) so that individual Metadata Storage Targets (MDTs) or OSTs running on the server receive a fair share of the server's network resources. Our approach differs from NRS in that we go beyond considering only the network resources to include the many factors affecting I/O performance on various Lustre components (Table 7.1), and aim to provide a globally balanced I/O subsystem that offers more stable I/O performance.

7.2.1 Progressive File Layout

Striping in the Lustre file system enables users to obtain high I/O throughput [98]. Data is divided into stripes according to a striping pattern, and the striped data is stored across OSTs in a round-robin fashion. Progressive file layout (PFL) [134] is a recent Lustre feature whereby a file can have a series of flexible striping layouts. Using PFL, a file can be created with several non-overlapping extents, each having different striping parameters.

Figure 7.1: An example PFL layout.

An example of a 4-component PFL layout is shown in Figure 7.1. The first component, with extents ranging from zero to 128 MB, has only one stripe. As the file size grows beyond 128 MB and up to 512 MB, the file is divided into three stripes; from 512 MB to 2 GB, the number of stripes is eight; and beyond 2 GB until the end of the file, the file has 16 stripes. The PFL feature is implemented using composite file layouts for regular files. The number of sub-layouts in each file and the number of stripes in each sub-layout can be specified by users using the lfs setstripe command. An example command for the PFL layout in Figure 7.1 is given below, where the parameters E and c specify the extent and stripe count, respectively.

lfs setstripe -E 128M -c 1 -E 512M -c 3 -E 2G -c 8 -E -1 -c 16
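As a concrete illustration of this layout, the short sketch below maps a file offset to the PFL component (and hence the stripe count) that covers it, using the four extents of the example configuration above. Representing the layout as (extent end, stripe count) pairs is an assumption made for illustration.

# Example PFL configuration as (extent end in bytes, stripe count); None marks end of file.
PFL_CONFIG_1 = [
    (128 * 2**20, 1),    # [0, 128 MB): 1 stripe
    (512 * 2**20, 3),    # [128 MB, 512 MB): 3 stripes
    (2 * 2**30, 8),      # [512 MB, 2 GB): 8 stripes
    (None, 16),          # [2 GB, EOF): 16 stripes
]

def stripe_count_for_offset(offset, layout=PFL_CONFIG_1):
    """Return the stripe count of the PFL component covering a given file offset."""
    for end, stripes in layout:
        if end is None or offset < end:
            return stripes
    raise ValueError("offset not covered by layout")

# A 1 GB offset falls in the third component, so it is striped over 8 OSTs.
print(stripe_count_for_offset(1 * 2**30))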

Component                     Factors                        Discussion
Metadata Server (MDS)         CPU%, Memory%                  CPU and memory utilization reflect the system load.
                              /proc/sys/lnet/stats           Load on the Lustre networking layer connected to MDS.
Metadata Target (MDT)         mdt.*.md_stats                 Overall metadata stats per MDT.
                              mdt.*.job_stats                Metadata stats per MDT per job.
Object Storage Server (OSS)   CPU%, Memory%                  Reflects the system load of the management server.
                              /proc/sys/lnet/stats           Load on the Lustre networking layer connected to OSS.
Object Storage Target (OST)   obdfilter.*.stats              Overall statistics per OST.
                              obdfilter.*.job_stats          Statistics per job per OST.
                              obdfilter.*OST*.kbytesfree     Available disk space per OST.
                              obdfilter.*OST*.brw_stats      I/O read/write time and sizes per OST.
Table 7.1: List of I/O performance statistics for relevant system components.

7.2.2 I/O Performance Statistics

Lustre is a hierarchical system. We identify the factors that affect the I/O performance in every layer of the system, as shown in Table 7.1. We utilize the files /proc/meminfo and /proc/loadavg to capture the memory and CPU utilization, and also read the values of Lustre parameters in various components of the hierarchical system, e.g., via obdfilter.*OST*.brw_stats. The performance of serving metadata and the I/O rate of serving the actual data to the clients depend on the network performance; a congested network adversely affects I/O performance. The network bandwidth information can be extracted using the lnet stats interface (/proc/sys/lnet/stats), which runs over LNET and the Lustre network drivers.
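A minimal sketch of how these statistics can be read is shown below. It simply parses /proc/loadavg and /proc/meminfo and returns the raw contents of a Lustre statistics file; the Lustre-specific parsing is left out and the helper names are illustrative.

def read_loadavg():
    """Return the 1-minute load average from /proc/loadavg."""
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])

def read_meminfo():
    """Return (MemTotal, MemAvailable) in kB from /proc/meminfo."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.strip().split()[0])
    return values["MemTotal"], values.get("MemAvailable", values["MemFree"])

def read_lustre_param(path):
    """Read a raw Lustre statistics file, e.g. /proc/sys/lnet/stats or an
    obdfilter brw_stats file; interpreting the contents is left to the caller."""
    with open(path) as f:
        return f.read()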

7.2.3 The Need for Load Balancing

We conducted a simulator-based study to demonstrate the need for balanced load placement across Lustre components. Our simulator (Section 7.4.1) faithfully implements the functionalities of various components in Lustre. In our model, we have 8 OSSes and every OSS is linked to 4 OSTs (a total of 32 OSTs). We use 24-hour traces from a combination of three application traces to drive the simulator. Two of the application traces are from the Interleaved-Or-Random (IOR) benchmark [121] and the Hardware Accelerated Cosmology Code (HACC) Application I/O kernel [120], while the third application trace is generated from an HPC Transaction Processing Application (TPA) running at a large financial institution [190]. For this test, we use the default round-robin approach of Lustre for allocating OSTs for file creation requests. All results shown are the average of three runs.

Figure 7.2: Max/Mean OST load over time (Round-Robin).

Figure 7.3: Max/Mean OSS load over time (Round-Robin).

Figure 7.4: Capacity of all OSTs over time (Round-Robin).

We measured load balance by taking the ratio of the maximum system load to the mean system load; the system load for OSTs is the disk space used, and for OSSes it is the CPU utilization. This ratio should be one for ideal load balance. For OSTs, we measure the ratio of the maximum disk space consumed, taking all OSTs into account, to the mean disk space consumed, for every hour over 24 hours, as shown in Figure 7.2. We see that the system starts from a highly unbalanced setup of OSTs and takes over 12 hours to get close to 1. Thus, round-robin is very slow in providing load balance.

In addition to load balancing OSTs, our objective is to also have a load-balanced set of OSSes. We study the ratio of the maximum CPU utilization, taking all OSSes into account, to the mean CPU utilization over a period of 24 hours at 1-second intervals, as shown in Figure 7.3. This ratio should also ideally be 1. As seen in the graph, there are huge fluctuations in the ratio over a very long window of 20 hours, again highlighting the weakness of the round-robin approach. Since the load on OSTs is a continuous function (not represented in Figure 7.2), we also plot the load of OSTs over time in a box plot. Capacity in an OST is the amount of free disk space available. As the capacity changes continuously every hour, we normalize it to the median OST capacity for that hour. This is shown for the studied 24-hour period in Figure 7.4. The box plot highlights the variation in the capacities of all OSTs combined for different hours. The round-robin policy is unable to balance the system due to its lack of consideration for numerous other factors, as discussed earlier. Since the trace file from the IOR benchmark generates a continuous stream of file write requests rather than a more complex bursty pattern, the fact that the system still becomes unbalanced shows the inability of the default approach to capture the application behavior and create a well-balanced, application-attuned load distribution.

This study highlights the need for better load balancing for the HPC I/O subsystem. Moreover, an opaque round-robin approach is incapable of accounting for workload dynamicity [198] such as that created by the regular purge of old data in HPC systems (by OLCF practice [44]). In this chapter, we address such problems by designing an OST management layer that provides a global and detailed system state view to the allocator.

Figure 7.5: Overview of the proposed architecture.

7.3 System Design

We have implemented our load balancing design for the widely-used Lustre file system. However, our design can be extended for use in other HPC parallel file and distributed storage systems that employ a similar hierarchical structure.

Figure 7.5 shows an overview of the architecture. It presents an "end-to-end" control plane for managing I/O with components on both the client side and the server side. When running applications for the first time, the client side makes use of a customized tracing tool, miniRecorder. This tool collects information about the I/O accesses, such as the number of bytes written, file name, number of stripes, and MPI rank and communicator for each file. MiniRecorder needs to collect traces only for the first run of an application, because an application's I/O behavior, once identified, does not change across multiple runs. The collected traces are fed into the parser, which then uses the information to drive the prediction model. Our predictions are based on ARIMA time series modeling [48]. The output of the time series prediction provides estimates of future application requests, which are sent to the configuration manager. The configuration manager is responsible for determining if the layout for an application is PFL. PFL Config stores all the PFL configurations added by a user and is consulted by the configuration manager. The output is then stored in an interaction database for later use. We refer to the database as the "interaction database" because it offers a point of interaction between our server-side and client-side libraries.

On the server side, OSSs collect the CPU and memory usage information, the associated OSTs' capacity (kbytestotal), and the number of bytes available on the OSTs (kbytesavail). These statistics are sent to the MDS using the statistics collector module on the OSSs. The collected information is parsed by the statistics collector placed on the MDS to generate a file containing updated statistics for the MDS, OSSs, and associated OSTs. This file is fed to the OST allocation algorithm. The input to the OST allocation algorithm is the predicted set of requests received from the clients via the interaction database and ZeroMQ message queues [86], and the output is the list of OSTs to be allocated for every request, which will yield a load-balanced distribution over the involved OSSs and OSTs. The allocated OSTs are stored in the interaction database along with the predicted requests. Next, the placement library intercepts the I/O requests from the applications, consults the interaction database, and routes the application requests to appropriate resources by creating a given file's metadata on the MDS.

7.3.1 Parallel File Access Modes for Varying Striping Layouts

Before explaining the different components, we show how the different file access modes, File-per-process (FPP) and Single-shared-file (SSF), are represented in PFL and non-PFL layouts. We show an overview of the representation in Figures 7.6, 7.7, 7.8, and 7.9. The PFL layout used in the example is Configuration 1, discussed in Section 7.2.1.

Figure 7.6: Layout in FPP mode - 4 processes each writing 8GB in a non-PFL setup.

Figure 7.7: Layout in FPP mode - 4 processes each writing 8GB in a PFL setup.

Figure 7.8: Layout in SSF mode - 1 process creating a 32GB file in a non-PFL setup.

Figure 7.9: Layout in SSF mode - 1 process creating a 32GB file in a PFL setup.

For FPP mode, we have four processes each writing 8 GB to the parallel file system. In the non-PFL layout, each 8 GB file is striped into a pre-defined number of stripes (8 in the figure), whereas in the PFL layout each 8 GB file is striped according to the PFL layout set by the user for those files or directories (Configuration 1). In SSF mode, a single process creates a single file on which all other processes perform I/O operations. This file is striped into a pre-determined number of stripes according to a non-PFL or a PFL layout. It is evident that different stripes will be of different sizes based on the file access modes as well as the striping layouts. The challenge lies in selecting OSTs to place these varying-size stripes such that all OSTs are load balanced and experience less resource contention, which improves the I/O throughput.

7.3.2 Client-Side Components

In the following, we describe the components that run on the clients.

Tracing Tool

We implement a simple and lightweight I/O tracing library, miniRecorder, based on Recorder [127]. Recorder is a multi-level I/O tracing framework that can capture I/O function calls at multiple levels of the I/O stack, including HDF5, MPI-IO, and POSIX I/O. For our end-to-end system, we limit the range of intercepted function calls to file creation and write calls, and record the bytes written, file name, stripe count, and MPI rank and communicator for each file. We focus on writes more than reads because the caching mechanisms and burst buffers that are typical in modern HPC deployments absorb most of the read requests once the file has been written. Therefore, the load imbalance is mainly due to write requests [127, 151]. The traced data is processed by the parser and converted into a readable comma-separated (.csv) file. This file is then sent to the prediction library (the ARIMA-inspired prediction algorithm discussed in the next section). The tracing tool is lightweight; our tests show that it adds negligible memory and CPU overheads of ∼0.3% and <1%, respectively, during application execution.

ARIMA-Inspired Prediction Algorithm

HPC applications have been known to show distinct I/O patterns [151]. Based on our interactions with HPC practitioners, this predictability is expected for emerging applications as well. We leverage this observation to model three key properties of HPC I/O, namely write bytes, stripe count, and MPI rank. We collect these parameters using the tracing tool on the first run of an application to train our model. Previous works [19, 38, 39] present auto-tuning approaches for MPI-IO and Lustre to learn and predict the I/O parameters for improving the read and write performance of HPC applications. These approaches use ranges of Lustre stripe counts, I/O buffer sizes, and I/O bandwidths from previous runs. We use the AutoRegressive Integrated Moving Average (ARIMA) model [48] to fit our multivariate time series data and predict future values. Our choice of ARIMA is dictated by its performance and the time-series nature of the write bytes and stripe count of the requests. We experimented with several alternatives, such as the Markov Chain Model [170], to model the data. However, ARIMA yielded better accuracy with lower memory and computing overhead. For example, we observed 99.1% accuracy on IOR data using ARIMA, while the Markov chain model yielded an accuracy of 95.5%. The CPU overhead for ARIMA was less than 1.2% compared to 4.5% for the Markov chain, while the memory usage for ARIMA is 10 MB in comparison to 90 MB for the Markov chain.

We implement our prediction model on the client side, where the calls are intercepted, rather than on the MDS. This has two advantages: (i) since each application has its own client, the model can be applied at scale without overwhelming the centralized MDS; and (ii) the approach makes the MDS application-agnostic, where the server can focus on write requests and types in a global fashion and not be concerned with individual applications. The approach also provides for a much more efficient solution when multiple applications run on (multiple) clients simultaneously.

A time series is defined as a sequential set of data points, measured typically over successive times. It is represented as a set of vectors x(t), t = 0, 1, 2, ..., where t is the elapsed time [45]. The term ARIMA involves three parts: AR denotes that the variable is regressed on its prior values; I stands for 'integrated', which means that the data values are replaced with the difference between the present and previous values; and MA represents the fact that the regression error is a linear combination of error terms occurring in the past. There are three parameters used for every ARIMA model. The parameter 'p' is the number of lag observations (lag order), 'd' denotes the number of times raw observations are differenced (degree of differencing), and 'q' represents the size of the moving average window (order of moving average).

The first step is to select the values of the parameters (p, d, q). To this end, we run the model on all combinations of the parameters (skipping the ones that fail to converge) over our dataset, which we get from the tracing tool, and select the combination with the least Root Mean Square Error (RMSE). We vary the values of p, d, and q from 0 to 5; going beyond 5 would be computationally expensive. For HACC-IO, the least RMSE was found for (5, 1, 2), and IOR gave the minimum RMSE for (2, 1, 1). The next step is to fit the ARIMA(p, d, q) model by exact maximum likelihood via Kalman filters [84]. This fitted model is then used to predict the write bytes, stripe count, and MPI rank values for future application I/Os. We use the statsmodels.tsa.arima_model package in Python for our ARIMA implementation. Our results show 98.3% accuracy on HACC-I/O data and 99.1% accuracy on IOR data.
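A minimal sketch of this grid-search-and-forecast step is shown below, assuming the newer statsmodels.tsa.arima.model interface (the implementation described above used the older statsmodels.tsa.arima_model package) and modeling only a single series (write bytes) for brevity; the toy trace values and the reduced search range are illustrative.

import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA  # newer statsmodels API, used here as a stand-in

def fit_best_arima(series, max_order=5):
    """Grid-search (p, d, q) in [0, max_order] and keep the fit with the lowest
    in-sample RMSE, skipping combinations that fail to converge."""
    best = (None, float("inf"), None)
    for p, d, q in itertools.product(range(max_order + 1), repeat=3):
        try:
            fit = ARIMA(series, order=(p, d, q)).fit()
        except Exception:
            continue  # skip non-converging combinations
        rmse = np.sqrt(np.mean((series - fit.fittedvalues) ** 2))
        if rmse < best[1]:
            best = ((p, d, q), rmse, fit)
    return best

# Toy example: predict the next three write sizes (in bytes) from a short trace.
write_bytes = np.array([8e9, 8e9, 4e9, 8e9, 8e9, 4e9, 8e9, 8e9, 4e9], dtype=float)
order, rmse, fit = fit_best_arima(write_bytes, max_order=2)
print(order, fit.forecast(steps=3))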

Configuration Manager

The configuration manager determines if an application is performing I/O in PFL or non-PFL layout. It takes as input the predicted set of requests (containing the file name, stripe count, write bytes, and MPI rank), which is given by the prediction model, and the file containing all PFL layouts for a client. If the file names or the application directory in the predicted request set match those in the PFL configuration file, the stripe size and stripe count for the file are set based on the PFL configuration. If no match is found for the file name or the directory in the PFL configuration, then the layout is non-PFL. The output of the configuration manager is the set of all predicted requests combined with the corresponding stripe size, stripe count, and MPI rank of the files. This entire set is sent to the interaction database. The stripe size for both layouts is calculated for all files in the configuration manager based on Lustre's 64k-alignment constraint, which is explained below.

File Name               File Size    Extent ID  Extent Start  Extent End   Stripe Size  Stripe Count  MPI Rank
/mnt/lustre/ior/test.0  8589934592   1          0             134217728    134217728    1             0
/mnt/lustre/ior/test.0  8589934592   2          134217728     536870912    134217728    3             0
/mnt/lustre/ior/test.0  8589934592   3          536870912     2147483648   201326592    8             0
/mnt/lustre/ior/test.0  8589934592   4          2147483648    8589934592   402653184    16            0
/mnt/lustre/ior/test.1  8589934592   1          0             134217728    134217728    1             1
/mnt/lustre/ior/test.1  8589934592   2          134217728     536870912    134217728    3             1
/mnt/lustre/ior/test.1  8589934592   3          536870912     2147483648   201326592    8             1
/mnt/lustre/ior/test.1  8589934592   4          2147483648    8589934592   402653184    16            1
Table 7.2: Interaction database snapshot for IOR in FPP mode in PFL layout.

Lustre's 64k-alignment constraint for stripe size. Stripe size is an important parameter for load balancing in a distributed file system. Every stripe needs to be assigned to an OST. Intuitively, to calculate the stripe size, we can divide the total file size by the stripe count; both of these parameters are given as part of the input request set in the configuration manager. However, calculating the stripe size is non-trivial because Lustre imposes the constraint that, in order to place stripes onto the allocated OSTs, the stripe size should be an even multiple of 64k. We term 64k, or 65536 bytes, the Alignment Parameter (AP). This constraint becomes a problem for files that are not AP aligned, for example when the total file size is 803405824 bytes, which is not an even multiple of AP. We consider two ways to work within this constraint. Method-I: This method allocates an equal number of even multiples of AP on the first (stripeCount − 1) OSTs, with the remaining even multiple of AP going to the last OST. Equation 7.1 gives the total allocation such that the stripes are AP aligned. The first part of the equation is the placement on the first (stripeCount − 1) OSTs and the second part is the number of bytes written on the last OST.

writeBytes = (AP ∗ 2 ∗ N ∗ (stripeCount − 1)) + (AP ∗ 2 ∗ X)    (7.1)

where,

N = ⌊ writeBytes / (AP ∗ 2 ∗ (stripeCount − 1)) ⌋    (7.2)

The remaining number of bytes to be written on the last OST is then given by:

remainingBytes = writeBytes − (AP ∗ 2 ∗ N ∗ (stripeCount − 1))    (7.3)

Therefore,

X = ⌈ remainingBytes / (AP ∗ 2) ⌉    (7.4)

Note that we round down the allocation on the first (stripeCount − 1) OSTs and round up the allocation on the last OST. Multiplication by 2 ensures that the stripe size is an even multiple of AP. The stripe size for the last OST is (AP ∗ 2 ∗ X), and for each of the remaining OSTs it is (AP ∗ 2 ∗ N). However, a major drawback is that this method does not place the load evenly among all the OSTs: the number of bytes written on the last OST will always be smaller than the bytes written on the other OSTs for a particular file. Method-II: This method overcomes the drawback of the previous method by allocating the same even multiple of AP on all the OSTs.

writeBytes = AP ∗ 2 ∗ N ∗ stripeCount    (7.5)

where,

N = ⌈ writeBytes / (AP ∗ 2 ∗ stripeCount) ⌉    (7.6)

The stripe size for all the OSTs is given by (AP ∗ 2 ∗ N). Both methods ensure that all stripes allocated on the stripeCount OSTs have a stripe size that is an even multiple of 64k, by allocating a slightly bigger file than was requested by the client. The second method places all the stripes equally on all the OSTs but needs a bigger file allocation than the first method. Thus, there is a trade-off between how much extra allocation we can allow and how evenly the stripes are balanced among the OSTs. Our results show that for a 766.175 MB file, we allocated an 833 KB (0.1%) bigger file using the second method and 63 KB (0.008%) more using the first method. We proceed with the second method because, in spite of allocating a little more than was requested by the client, this approach ensures equal stripes on all the OSTs. This leads to similar load accesses on all OSTs and OSSs, thereby approaching a load-balanced setup.
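The two alignment methods can be checked with the short sketch below, which computes the stripe sizes and the over-allocation for each method following Equations 7.1 to 7.6. The file size matches the earlier example in this section, while the stripe count of 8 is an assumption for illustration.

import math

AP = 65536  # Lustre alignment parameter: stripes must be even multiples of 64 KB

def method1(write_bytes, stripe_count):
    """Method-I: equal even multiples of AP on the first (stripe_count - 1) OSTs,
    the remainder (rounded up) on the last OST."""
    n = write_bytes // (AP * 2 * (stripe_count - 1))           # Eq. 7.2 (floor)
    remaining = write_bytes - AP * 2 * n * (stripe_count - 1)  # Eq. 7.3
    x = math.ceil(remaining / (AP * 2))                        # Eq. 7.4 (ceiling)
    allocated = AP * 2 * n * (stripe_count - 1) + AP * 2 * x   # Eq. 7.1
    return AP * 2 * n, AP * 2 * x, allocated

def method2(write_bytes, stripe_count):
    """Method-II: the same even multiple of AP on every OST (Eqs. 7.5 and 7.6)."""
    n = math.ceil(write_bytes / (AP * 2 * stripe_count))
    stripe_size = AP * 2 * n
    return stripe_size, stripe_size * stripe_count

# File size from the example in the text; a stripe count of 8 is assumed here.
size = 803405824
_, _, alloc1 = method1(size, 8)
_, alloc2 = method2(size, 8)
print("Method-I over-allocation (bytes):", alloc1 - size)
print("Method-II over-allocation (bytes):", alloc2 - size)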

Interaction Database

The interaction database is a SQL database located on the Lustre clients. It serves as the medium through which the MDS and clients interact with one another. First, the output set from the configuration manager is stored in the database. Different tables are used to store PFL and non-PFL layout files. Tables 7.2 and 7.3 show example snapshots of the interaction database for IOR in FPP mode with 2 processes each writing 8 GB in PFL and non-PFL layouts, respectively. For the PFL layout, we use the example PFL configuration (Configuration 1) discussed in Section 7.2.1. As seen in Table 7.2, every file is associated with all the extents specified in the PFL configuration. For each file, we store the file name, the file size in bytes, the extent ID from the PFL configuration file, the corresponding start and end range for the extent, the stripe size, the stripe count, and the MPI rank. For the non-PFL layout, shown in Table 7.3, we store the file names, the stripe size of the files, the number of stripes associated with every file, and the MPI rank. The stripe size of the files is calculated using the 64k-alignment parameter discussed in Section 7.3.2.

File Name               Stripe Size  Stripe Count  MPI Rank
/mnt/lustre/ior/test.0  1073741824   8             0
/mnt/lustre/ior/test.1  1073741824   8             1

Table 7.3: Interaction database snapshot for IOR in FPP mode in non-PFL layout.

The MDS uses a scalable publisher-subscriber model via ZeroMQ [86] to retrieve the required contents from the interaction database. The pub-sub model helps in scaling to a large number of clients [152]. The specific steps are discussed in Section 7.3.3. For our implementation, we use MySQL 8.0.12 Community Server Edition. Our results show that writing to and retrieving data from the interaction database is very efficient, using <0.3% and <0.4% of CPU and memory, respectively.
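A minimal sketch of the non-PFL portion of the interaction database is shown below. SQLite is used here only as a self-contained stand-in for the MySQL deployment described above, and the table and column names are assumptions based on Table 7.3.

import sqlite3

# Stand-in schema for the non-PFL table of the interaction database.
SCHEMA = """CREATE TABLE IF NOT EXISTS non_pfl_requests (
    file_name    TEXT PRIMARY KEY,
    stripe_size  INTEGER,
    stripe_count INTEGER,
    mpi_rank     INTEGER,
    ost_list     TEXT   -- space-separated OSTs, filled in later by the MDS
)"""

def store_prediction(conn, file_name, stripe_size, stripe_count, mpi_rank):
    """Configuration-manager side: store one predicted request."""
    conn.execute("INSERT OR REPLACE INTO non_pfl_requests VALUES (?, ?, ?, ?, NULL)",
                 (file_name, stripe_size, stripe_count, mpi_rank))
    conn.commit()

def lookup_layout(conn, file_name):
    """Placement-library side: fetch the predicted layout for a file."""
    cur = conn.execute("SELECT stripe_size, stripe_count, ost_list "
                       "FROM non_pfl_requests WHERE file_name = ?", (file_name,))
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_prediction(conn, "/mnt/lustre/ior/test.0", 1073741824, 8, 0)
print(lookup_layout(conn, "/mnt/lustre/ior/test.0"))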

Placement Library

The placement library complements the prediction model by providing a lightweight, portable, and user-friendly mechanism to access and apply the predicted file layout to an application's I/O workload (i.e., without any code modification). The placement library relies on the function interpositioning provided by the GNU dynamic linker to prioritize itself over standard system calls, and on the profiling interface to MPI (PMPI). Hence, it can be used by setting the environment variable LD_PRELOAD to the path of the shared library. Metadata operations (e.g., open()) issued by the application are intercepted and redirected to the placement library.

Following the example layouts shown in Figures 7.6, 7.7, 7.8, and 7.9, the placement library supports both non-PFL and PFL layouts for FPP and SSF. For every I/O cycle, the placement library queries the interaction database via the MySQL C API with the file name passed by the original metadata operation, fetches the matching rows, and applies the predicted striping pattern to the file if the file does not exist yet. To facilitate this, Lustre provides a user library called llapi, which allows the user to describe a specific striping pattern. However, the placement library cannot use llapi directly, since llapi internally triggers open() calls, which would result in a continuous, recursive loop due to the nature of the preloading mechanism. Therefore, the placement library mimics the behavior of llapi and communicates directly with the Lustre Logical Object Volume (LOV) client to create the file metadata on the MDS, similarly to [137, 138, 196]. If the result returned by the MySQL query contains only one row, the non-PFL layout is applied by allocating a Lustre file identifier and the corresponding Layout Extended Attributes (Layout EA) on the MDS. For both FPP and SSF, the predicted striping pattern is applied by initializing the Layout EA with the stripe count, stripe size, and list of OSTs retrieved from the interaction database. If the MySQL query returns more than one row, the PFL layout is utilized. For both FPP and SSF, every row represents one non-overlapping extent for a given file. The placement library iterates over all rows, allocates an array of sub-layout components (one for each file extent), and applies the predicted striping pattern to the file by storing it in a composite layout on the MDS. Composite layouts, unlike Layout EA, allow the specification of different striping patterns for different ranges (i.e., extents) in the same file.

Algorithm 8 presents a simplified overview of the placement library, which runs on every client. It currently supports POSIX I/O, MPI-IO, and HDF5, and the following I/O calls: open[64](), creat[64](), MPI_File_open(), and H5Fcreate(). The key advantage of this transparent approach is that applications can directly benefit from the prediction model without modifying the source code.

Algorithm 8: File layout creation on the MDS.

Input: File Name file, Access mode flags
Output: Call to real metadata operation (e.g., open())
begin
    if fileExists(file) == TRUE then
        // File exists; return.
        return realMetadataOperation(file, flags)
    flags = flags | O_LOV_DELAY_CREATE
    result = queryInteractionDatabase(file)
    if numMySQLrows(result) == 1 then
        // Non-PFL layout.
        row = fetchMySQLrow(result)
        layoutEA = allocLayoutEA(row.stripeCount, row.stripeSize, row → OSTs)
        createLayoutEAonMDS(file, flags, 0644, layoutEA)
    else if numMySQLrows(result) > 1 then
        // PFL layout.
        while row = fetchMySQLrow(result) do
            allocExtentPFL(layoutPFL, row.extentEnd, row.stripeCount, row.stripeSize, row → OSTs)
        createCompositeLayoutMDS(file, flags, 0644, layoutPFL)
    return realMetadataOperation(file, flags)

7.3.3 Server-Side Components

In the following, we describe the components that run on the servers (OSSs and MDS) and how they interact with each other.

Statistics Collector on OSS

Statistics collection is done on every OSS. The list of all the OSTs for a particular OSS is saved in a configuration file that is provided as input to the statistics collector for that specific OSS, along with the OSS ID. For every OST, we collect the total and available capacity, found in the files /proc/fs/lustre/obdfilter/<ost name>/kbytestotal and /proc/fs/lustre/obdfilter/<ost name>/kbytesavail, respectively. We also collect the CPU and memory utilization of the OSS by reading data from the files /proc/meminfo and /proc/loadavg. The statistics collection algorithm runs every 60 seconds on all the OSSs. We choose 60 seconds as our interval for statistics collection so that we get updated statistics on the MDS without overloading the OSSs. These statistics are sent to the MDS. The load monitoring (statistics collection) solution needs to be scalable; therefore, we use a publisher-subscriber model [152] for the statistics collection framework. OSSs act as publishers and the MDS as the subscriber. Statistics collected on the OSSs are sent to the MDS via a message queue. We use ZeroMQ (ØMQ) [86] as our message queue because it is lightweight and has been shown to be very efficient at large scale [151, 152, 153, 196]. We also collect the CPU and memory utilization of the MDS every 60 seconds, in the same way as on the OSSs. Our tests with the implementation show that the statistics collection framework on average has negligible CPU and 0.1% memory utilization on the OSSs, and 0.6% CPU and 0.1% memory utilization on the MDS.
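The sketch below outlines this publisher-subscriber exchange using pyzmq. The endpoint addresses, topic name, and the choice of which side binds are assumptions, and the per-OST capacity reads are elided.

import json, time, zmq

def oss_publisher(mds_address="tcp://mds-node:5556", oss_id="oss1", interval=60):
    """OSS side: publish CPU/memory statistics every `interval` seconds."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUB)
    sock.connect(mds_address)          # here the MDS binds and the OSSes connect
    while True:
        stats = {"oss": oss_id,
                 "load1": float(open("/proc/loadavg").read().split()[0]),
                 # per-OST kbytestotal / kbytesavail would be read from
                 # /proc/fs/lustre/obdfilter/<ost name>/... and added here
                 "timestamp": time.time()}
        sock.send_multipart([b"oss-stats", json.dumps(stats).encode()])
        time.sleep(interval)

def mds_subscriber(bind_address="tcp://*:5556"):
    """MDS side: subscribe to statistics published by all OSSes."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)
    sock.bind(bind_address)
    sock.setsockopt(zmq.SUBSCRIBE, b"oss-stats")
    while True:
        _topic, payload = sock.recv_multipart()
        yield json.loads(payload.decode())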

Statistics Collector on MDS

The statistics collector on the MDS is responsible for the following:

• Collecting statistics from the MDS.

• Subscribing to statistics from the OSSs via ZeroMQ.

• Parsing all the collected statistics.

The CPU and memory utilization collected on the MDS are used to determine when the OST allocation algorithm will run. The OST allocation algorithm runs only if the CPU utilization is lower than 70% and the memory utilization is lower than 50%. This is done so that the load balancing algorithm does not disrupt the normal functionality of Lustre's MDS. The collected statistics from the MDS and OSSs are parsed and sent as input to the OST allocation algorithm (Section 7.3.3). Our results show that the statistics collector on the MDS has a CPU utilization of 0.1% and negligible memory utilization.

File Name  File Size   EID  Extent Start  Extent End  Stripe Size  #Stripe  MPI Rank  OST List
test.0     8589934592  1    0             134217728   134217728    1        0         10
test.0     8589934592  2    134217728     536870912   134217728    3        0         24 29 34
test.0     8589934592  3    536870912     2147483648  201326592    8        0         15 2 28 25 11 21 7 33
test.0     8589934592  4    2147483648    8589934592  402653184    16       0         14 23 16 30 1 6 4 26 ...
test.1     8589934592  1    0             134217728   134217728    1        1         24
test.1     8589934592  2    134217728     536870912   134217728    3        1         13 9 29
test.1     8589934592  3    536870912     2147483648  201326592    8        1         22 11 35 21 5 7 17 10
test.1     8589934592  4    2147483648    8589934592  402653184    16       1         14 23 16 30 1 6 15 4 ...
Table 7.4: Interaction database snapshot showing OST allocation for IOR in FPP mode in PFL layout.

File Name               Stripe Size  Stripe Count  MPI Rank  OST List
/mnt/lustre/ior/test.0  1073741824   8             0         30 20 5 1 22 35 14 23
/mnt/lustre/ior/test.1  1073741824   8             1         20 19 1 9 29 2 33 18
Table 7.5: Interaction database snapshot showing OST allocation for IOR in FPP mode in non-PFL layout.

OST Allocation Algorithm

The input to the OST allocation algorithm is the parsed OSS and OST statistics, and the write requests (file name, stripe size and stripe count) sent by the clients.

Algorithm 9: Obtaining list of OSTs for each request.

Input: OSS statistics cpu & mem, OST statistics totalKbytes & kbytesAvail, Write Requests stripeSize & stripeCount
Output: OSTAllocationList
begin
    for OSS oss in OSSList do
        ossLoad = (cpuweight ∗ cpu) + (memweight ∗ mem)
        for OST ost in OSTList do
            ostCostToReach = ossLoad
            ostCost = (totalKbytes − kbytesAvail) / totalKbytes
            ostCapacity = kbytesAvail / stripeSize
    flowGraph = buildGraph(Requests, OSS, OST)
    OSTAllocationList = minCostMaxFlow(flowGraph)
    return (OSTAllocationList)

Function buildGraph
    Input: Requests req, StripeCount sc, OSTCostToReach ossLoad, OSTCost ostLoad, OSTCapacity ostCap
    Output: FlowGraph G
    totalDemand = sum of stripeCount for all Requests
    G.addNode('source', totalDemand)
    G.addNode('sink', -totalDemand)
    for request r in req do
        G.addEdge('source', r, cost = 0, capacity = sc)
        for OST ost in ostList do
            G.addEdge(r, ost, cost = ossLoad, capacity = 1)
    for OST ost in ostList do
        G.addEdge(ost, 'sink', cost = ostLoad, capacity = ostCap)
    return (G)

Algorithm 9 shows the OST allocation algorithm, which employs a minimum-cost maximum-flow approach [20]. The flow graph used to solve the problem is shown in Figure 7.10. We calculate the cost to reach an OST (the load of the OSS), the cost of an OST (the ratio of bytes already used in the OST to the total size of the OST), and the capacity of an OST (the number of stripes that can be handled by the OST, given by the ratio of available space in the OST to the stripe size). In order to derive the flow graph, we need to identify the source and sink nodes. The total demand for the source node is the total number of stripes requested, and the total demand for the sink node is the negative of the total number of stripes requested. We solve the minimum-cost maximum-flow problem using the Ford-Fulkerson algorithm [188]. This outputs a list of OSTs (OSTAllocationList) whose use will yield a balanced load over both OSSs and OSTs. For our implementation, we use the networkx library in Python. Our results show that the algorithm on average uses 1.58% CPU and 0.1% memory on the MDS.

The list of OSTs obtained from the OST allocation algorithm is then sent to the respective clients using the publisher-subscriber model via ZeroMQ. The complete set of requests is stored in the interaction database. Example entries for the database with the complete allocation for an IOR application in FPP mode with 2 processes each writing an 8 GB file in both PFL (Configuration 1) and non-PFL layouts are shown in Tables 7.4 and 7.5, respectively. We add a new column, OST List, to the database. The OST List is a space-separated, load-balanced list of OSTs for every write request. This example is for a setup with 7 OSSs and 35 OSTs (5 OSTs associated with every OSS); therefore, OST ids range from 1 to 35. As described earlier, the placement library then uses this information to place the requests, thus completing the load-balanced allocation of resources. If for any run of the application the placement library is unable to find more than 50% of the files in the interaction database, miniRecorder, the ARIMA prediction, and the OST allocation algorithm are executed again to update the interaction database.
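A minimal sketch of the flow-graph construction and its solution with networkx is shown below. The graph mirrors Figure 7.10, but the library's min_cost_flow solver is used in place of a hand-rolled algorithm, integer costs are assumed for the solver, and all names and the toy inputs are illustrative.

import networkx as nx

def allocate_osts(requests, oss_load, ost_cost, ost_capacity, ost_to_oss):
    """Build the request -> OST -> sink flow graph and solve it for a
    load-balanced assignment of stripes to OSTs.

    requests:     {request_id: stripe count}
    oss_load:     {oss_id: integer cost of reaching any OST on that OSS}
    ost_cost:     {ost_id: integer cost of the OST (scaled fullness)}
    ost_capacity: {ost_id: number of stripes the OST can still hold}
    ost_to_oss:   {ost_id: oss_id}
    """
    total = sum(requests.values())
    G = nx.DiGraph()
    G.add_node("source", demand=-total)   # supplies all requested stripes
    G.add_node("sink", demand=total)      # absorbs them
    for req, stripes in requests.items():
        G.add_edge("source", req, weight=0, capacity=stripes)
        for ost in ost_cost:
            # cost to reach an OST is the load of its OSS; one stripe per edge
            G.add_edge(req, ost, weight=oss_load[ost_to_oss[ost]], capacity=1)
    for ost, cost in ost_cost.items():
        G.add_edge(ost, "sink", weight=cost, capacity=ost_capacity[ost])
    flow = nx.min_cost_flow(G)
    # For every request, report the OSTs that received a stripe.
    return {req: [ost for ost, f in flow[req].items() if f > 0] for req in requests}

# Toy example: two requests, four OSTs spread over two OSSes.
print(allocate_osts(
    requests={"fileA": 2, "fileB": 1},
    oss_load={"oss1": 3, "oss2": 1},
    ost_cost={"ost1": 5, "ost2": 2, "ost3": 1, "ost4": 4},
    ost_capacity={"ost1": 10, "ost2": 10, "ost3": 10, "ost4": 10},
    ost_to_oss={"ost1": "oss1", "ost2": "oss1", "ost3": "oss2", "ost4": "oss2"}))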

Figure 7.10: Graph used in OST Allocation Algorithm.

Summary: Components run on both the client and server sides of a Lustre deployment. MiniRecorder runs on the clients during the first execution of an application (or if the system cannot find prediction information for most of the accessed files) to capture its I/O characteristics. Next, the traces are fed into the prediction model, and the predicted set of requests is provided to the configuration manager to check the file layout and calculate the stripe size of the files. The client and server libraries interact using the interaction database. Statistics collection on the OSS and MDS happens periodically. Whenever the interaction database is updated, it sends the new values to the MDS, which (if not overloaded) executes the OST allocation algorithm. The results are sent back to the interaction database. For subsequent application runs, the placement library intercepts the application write/create calls, reads from the interaction database, and writes the request to a load-balanced set of OSTs.

7.4 Evaluation

We evaluate the efficacy of our approach using both a Lustre simulator and a live setup. In the following, we first describe our simulator and experimentation methodology, then compare our MCMF-based load balancing with the default Lustre OST allocation approach.

7.4.1 Methodology

Simulator

We have developed a discrete-time simulator based on the overall system design shown in Figure 7.5 to test our approach at scale. The simulator has four key components closely mirroring Lustre's OST, OSS, MDT, and MDS, which implement the various Lustre operations and enable us to collect data about the system behavior. The MDS is also equipped with multiple strategies for OST selection, such as round-robin, random, and MCMF. We have implemented a wrapper component that enables communication between the various simulator components. The wrapper is responsible for processing the input, managing the MDS, OST, and OSS communication and data exchange, and driving the simulation. All the network components in the simulator are modeled using the Network Simulator (NS-3) [140]. The application traces collected from the client side are modeled as clients in the simulator. In our simulations, all initial conditions are the same at the start of any allocation strategy. The parameters, such as the number of OSSes and the number of OSTs under each OSS, are provided as inputs to the simulator.
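As a rough sketch of how the simulator's pluggable OST-selection strategies can be organized (the class and method names below are illustrative assumptions, not the simulator's actual interfaces):

    # Minimal sketch of pluggable OST-selection strategies for a discrete-time simulator.
    # Names and interfaces are illustrative assumptions, not the simulator's actual code.
    import random
    from itertools import cycle

    class RoundRobinStrategy:
        def __init__(self, ost_ids):
            self._cycle = cycle(ost_ids)
        def select(self, request, osts):
            return [next(self._cycle) for _ in range(request['stripe_count'])]

    class RandomStrategy:
        def select(self, request, osts):
            return random.sample(list(osts), request['stripe_count'])

    class SimulatedMDS:
        """Drives per-timestep allocation using whichever strategy is plugged in."""
        def __init__(self, strategy, osts):
            self.strategy, self.osts = strategy, osts
        def step(self, incoming_requests):
            return {r['name']: self.strategy.select(r, self.osts) for r in incoming_requests}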

Cluster Setup

We also conducted experiments on a small Lustre 2.8.0 setup to determine how the various components interact. The client node has an 8-core, 2.5 GHz Intel processor, 64 GB of memory, and a 500 GB HDD. The MDS and the two OSSes have 32-core, 2.0 GHz AMD processors, 64 GB of memory, and 1 TB HDDs. All components are connected through a 10 Gbps Ethernet interconnect. We configure 6 OSTs in each OSS, each with 170 GB of available disk space, and 1 MDT in the MDS with a disk space of 100 GB. For the real-setup tests, we repeated each experiment three times and report the average results.

Workloads

To drive our simulations, we collect application traces on the client side. These application traces contain two kinds of data: (a) write entries, which have the timestamp, the number of bytes to be written, and the number of OSTs to be selected (i.e., the stripe count); and (b) read entries, which have the timestamp, the number of bytes to be read, and the OST ID from which the bytes have to be read. To model the behavior of a real Lustre deployment, we run and capture traces of 3 simultaneously running big data HPC applications on a production Lustre deployment. We use the HACC I/O kernel [120], which measures the I/O performance of the system for the simulation of the Hardware Accelerated Cosmology Code (HACC) and generates around 12,000 file events per second, and the IOR benchmark [121], which is used for testing the performance of parallel file systems and generates around 20,000 events per second. The third trace is generated from a high performance computing transaction processing application running at a large financial institution [190]; it generates about 15,000 file events per second. For our tests (except the scalability study), we simulate the behavior of one MDS and eight OSSes with four OSTs per OSS, for a total of 32 OSTs. We also use our real-setup test to verify the results from the simulator.
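For clarity, the two kinds of trace entries described above can be represented as simple records; the field names below are assumptions based on the description rather than the exact trace format.

    # Illustrative trace-entry records matching the description above; field names are assumptions.
    from dataclasses import dataclass

    @dataclass
    class WriteEntry:
        timestamp: float   # when the write was issued
        nbytes: int        # number of bytes to be written
        stripe_count: int  # number of OSTs to select for this write

    @dataclass
    class ReadEntry:
        timestamp: float   # when the read was issued
        nbytes: int        # number of bytes to be read
        ost_id: int        # OST the bytes are read from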

7.4.2 Comparison of Load Balancing Approaches

We compare our MCMF-based OST load balancing with the standard Lustre round-robin approach, as well as with a weighted random allocation in which OSTs are selected at random from a subset of OSTs. Our random allocation model picks a random OST from the subset of OSTs whose ratio of available space to total disk space is greater than 0.4. The goal is to remove the OSTs with less available space from the eligible list of OSTs that can serve the request.
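The weighted random baseline can be expressed as a short filter-then-pick routine; this is an illustrative sketch in which the data layout is assumed, and the 0.4 threshold comes from the description above.

    # Sketch of the weighted random baseline: pick uniformly among OSTs that still have
    # more than 40% of their space available. `osts` maps OST id -> (avail_kb, total_kb).
    # The routine selects one OST per stripe; it is called once per requested stripe.
    import random

    def weighted_random_ost(osts, threshold=0.4):
        eligible = [ost for ost, (avail, total) in osts.items() if avail / total > threshold]
        return random.choice(eligible) if eligible else random.choice(list(osts))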

Figure 7.11: Max/Mean OST load over time in simulated setup.
Figure 7.12: Max/Mean OSS load over time in simulated setup.
Figure 7.13: Max CPU% on MDS over 12 hours in real setup.

Figure 7.14: Capacity of all OSTs over time under MCMF in simulated setup.
Figure 7.15: Capacity of all OSTs over time under MCMF in real setup.
Figure 7.16: Execution time with increasing number of OSTs in simulated setup.

We measure the load balancing in our setup by plotting the ratio of the maximum disk space used to the mean disk space used over time. The ratio should tend to 1 for a load-balanced setup. As seen from Figure 7.11, over a period of 24 hours, the random allocation gives the worst result, as the ratio starts from 125, which is much higher than the starting ratio under round-robin allocation. In contrast, we get better results with the MCMF allocation scheme: the ratio starts from 52 and declines steadily to 1 in only 7 hours, compared to 12 hours under round-robin allocation.

In addition to balancing the load across OSTs, our objective is to also load balance the OSSes' CPU utilization, which is ignored under the extant round-robin policy. Figure 7.12 shows the max/mean CPU utilization ratio of OSSes over a period of 24 hours. We see a better load balance under MCMF compared to round-robin and random allocations, with a smoother decline of the ratio to 1 in 7 hours compared to 20 hours under the round-robin strategy.

Our objective is to load balance OSTs such that every OST is in an almost similar state (in terms of the number of bytes available and load) under various file allocations. Since the capacity of OSTs continuously decreases over time, we use normalized capacity, where for every hour the capacities of all OSTs are divided by the median capacity. Figure 7.14 shows the box-plot of normalized capacity vs. time for 23 hours under our approach. When comparing this with the round-robin approach (Figure 7.4), we see that the inter-quartile ranges for MCMF are narrower than those for round-robin allocation. This shows that at any given time, MCMF is better able to balance load across OSTs.

We repeat the test on the real setup and see a similar trend of max/mean load on both OSSes (CPU utilization) and OSTs (disk usage), as shown in Figures 7.11 and 7.12. Figure 7.15 shows the normalized capacity over time. The difference in the inter-quartile ranges in the box plots and the smaller number of outliers is due to the fact that on the real setup we run two applications, IOR and the HACC I/O kernel, compared to three applications on the simulator. The experiment confirms that our approach gives better results on the real setup as well.

On the real setup, we also track on the MDS the CPU and memory utilization of our system components, i.e., the publisher-subscriber model, the Markov model, and the minimum-cost maximum-flow algorithm. Figure 7.13 shows the maximum CPU utilization on the MDS for a period of 12 hours over intervals of 30 minutes for our three components. The maximum CPU utilization is 1.3%, 3.8%, and 6.5%, and the maximum memory utilization (not shown in a graph) on the MDS is 15.2 MB, 75 MB, and 200 MB for the publisher-subscriber model, the MCMF algorithm, and the Markov model, respectively. On the OSS, the CPU and memory utilization of the publisher never exceeds 1% and 12 MB, respectively. This shows that our approach requires negligible resources and can easily coexist with Lustre at scale.
The Markov model component has the highest CPU and memory utilization. To keep that in check, we designed the system to train the model and run the MCMF algorithm only when the Lustre CPU utilization on the MDS goes below a preset threshold (70% in our tests), to avoid any impact on data-path performance.
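To make the balance metrics used in this evaluation concrete, the following sketch computes the max/mean load ratio and the hourly median-normalized capacities; the function and variable names are assumptions, not part of our tooling.

    # Sketch of the two balance metrics: max/mean load ratio and hourly median-normalized capacity.
    from statistics import mean, median

    def max_mean_ratio(loads):
        """Ratio of the most-loaded OST (or OSS) to the mean load; tends to 1 when balanced."""
        return max(loads) / mean(loads)

    def normalized_capacities(capacities):
        """Divide each OST's remaining capacity by that hour's median, as plotted in Figures 7.14 and 7.15."""
        med = median(capacities)
        return [c / med for c in capacities]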

7.4.3 Scalability Study

Figure 7.17: Performance with increasing number of OSTs in simulated setup: 160 OSTs (20 OSS), 480 OSTs (60 OSS), 800 OSTs (100 OSS), 1024 OSTs (128 OSS) and 3600 OSTs (450 OSS).

In our next experiment, we test how our approach works with a higher number of storage targets. For this purpose, we use a setup with the number of OSTs increasing from 160 to 3,600. In our simulations, in order to calculate the I/O bandwidth, all OSTs are assigned the same bandwidth at the start. The simulator also takes into account the number of applications using a particular OST at a given timestamp and calculates the read and write bandwidth accordingly. We assume that OSTs have equal read and write bandwidth. Figure 7.17 shows the overall mean I/O bandwidth for Lustre's default round-robin approach as well as for our MCMF algorithm. As the number of OSTs increases, the mean bandwidth also increases for both algorithms. Our algorithm provides better performance than the round-robin solution even for a higher number of OSTs. As seen from the figure, MCMF allocation is able to achieve up to 54.5% (with 800 OSTs) performance improvement compared to the default round-robin approach.

The overall execution time for load balancing using both the round-robin approach and our algorithm (the Markov model along with MCMF) is shown in Figure 7.16. Round-robin takes less time than our approach to allocate OSTs to incoming requests, but the difference between the two execution times keeps decreasing as we increase the number of OSTs. This is because the increase in execution time with the number of OSTs is much higher for round-robin than for MCMF, which shows that our approach is more scalable. The gain in overall performance, as shown in Figure 7.17, far outweighs the extra execution time of our approach, even for a smaller number of OSTs.

The tests show that the MCMF algorithm provides better load-balanced allocation of OSTs with improved performance compared to Lustre's default round-robin allocation. Also, the performance of our algorithm does not degrade even with a very large number of OSTs. Moreover, with a higher number of OSTs, the execution time for our approach to allocate OSTs to requests is similar to that of Lustre's default round-robin approach. This is seen in Figure 7.17, where even for a higher number of OSTs, MCMF performs better than the standard approach. Note that the difference in CPU utilization on the MDS when using the default round-robin allocation compared to when the MCMF algorithm along with the Markov chain model is executed never exceeds 7.3%. Therefore, the benefits achieved by the MCMF algorithm over standard round-robin allocation are achieved at a manageable cost, which is further amortized by the overall application I/O improvement resulting from the better load-balanced setup.

7.5 Chapter Summary

We have presented a load balancing approach for extreme-scale distributed storage systems, such as Lustre, in which we enable the system to have a global view of the hierarchical structure and thus make more informed and load-balanced resource allocation decisions. We design a global mapper located in the MDS of Lustre, which uses a publisher-subscriber model to collect runtime statistics of the various components in the I/O system by piggybacking the data on existing communication, employs a Markov chain model to predict future application requests based on past behavior, and uses a minimum-cost maximum-flow algorithm to select OSTs in a load-balanced fashion. Experiments show that our approach provides a better load-balanced solution for both OSSes and OSTs than the extant round-robin approach used in Lustre, which will lead to better end-to-end performance for HPC big data applications. In our future work, we plan to incorporate our solution into other systems such as Ceph and GlusterFS, as well as study our approach under different failure scenarios.

Chapter 8

Conclusion and Future Work

A key goal of this dissertation is to build an application-attuned framework for HPC storage systems that mitigates real-world data management problems. In this dissertation, we have successfully applied the application-awareness principle and methodology to uncover critical issues in distributed and data-intensive systems. These investigations have led to the design and development of effective and novel approaches to solve these problems. For example, to the best of our knowledge, FSMonitor [153], a scalable file system monitor for large-scale storage systems, is the first monitor that can be scaled and used on arbitrary storage systems. This dissertation is driven by the complexities of modern high performance computing and data-intensive systems, and the need for more efficient and flexible approaches to manage such complexities. The work of this dissertation targets three real-world HPC storage systems: Apache Spark [211], an in-memory data analytics framework; Lustre [47], a hugely popular parallel file system; and Ceph [201], an open-source and widely used object storage system. We use real-world applications: SparkBench [112], HACC-IO [120], FSMark [8], MDTest [13], and IOR [121]. By performing extensive and deep analysis to understand the issues [157], designing an optimization framework [163] and practical tools for monitoring [153, 156] and indexing [159] that efficiently make use of complex application behaviors, and building "end-to-end" efficient systems [151, 158, 196] to manage different tradeoffs and the massive volume of data, this dissertation demonstrates improved efficiency and usability of HPC storage systems at the system level, with a broad focus on practical and user-centric metrics.

8.1 Summary

High performance computing (HPC) storage systems are becoming increasingly important. According to a recent report from the National Energy Research Scientific Computing Center (NERSC) [14], over the past 10 years the total volume of data stored at NERSC has grown at an annual rate of 30 percent. This massive rate of data generation has resulted in an increasing need for high performance distributed storage systems like Apache Spark [211, 212], Ceph [201], and Lustre [47]. The dependency on HPC storage systems has resulted in input-output (I/O) operations becoming the bottleneck for application performance. Massively parallel HPC applications can suffer from an imbalance between computation and I/O performance, with I/O operations becoming a limiting factor in application efficiency [105]. To mitigate this problem, this dissertation takes a holistic redesign approach that cohesively combines piece-by-piece optimizations to implement an application-attuned framework for high performance storage systems that supports the I/O needs of HPC applications.

We first study the I/O behavior of HPC applications, which is very important for system administrators, file system developers, and HPC users. We collected Lustre file system server-level statistics from two clusters, Cab and Quartz, at Lawrence Livermore National Laboratory for a period of three years and analyzed the statistics in an application-agnostic manner. Our studies show that most jobs are write-intensive, underscoring the importance of improving file system write performance. Our analysis also led us to believe that the focus should be on jobs that run for a short duration, as the majority of jobs run for less than an hour. Also, there should be efforts to educate HPC users to develop applications that perform efficient writes. This would improve I/O performance as well as help in reducing I/O contention among jobs. We believe that our analysis will help HPC practitioners build better file systems and utilize them more effectively.

Second, we design a dynamic partitioning approach for in-memory data analytics platforms that determines the optimal number of partitions and the partitioner for each stage of a running workload with the goal of minimizing the stage execution time and shuffle traffic. We also consider the dependencies between stages, including join and co-group operations, to further reduce shuffle traffic. By minimizing the stage execution time and shuffle traffic, the design implicitly alleviates task data skew using different partitioners and improves task resource utilization through an optimal number of partitions. Experimental results demonstrate that Chopper effectively improves overall performance by up to 35.2% for representative workloads compared to standard vanilla Spark.

This dissertation then tackles the issue of monitoring HPC storage systems by building a generic and scalable file system monitor for capturing and reporting events on heterogeneous large-scale storage systems. The design uses a three-layer approach to file system event monitoring. It implements a standard event definition process for any file system, and works seamlessly for both parallel file systems and object stores. We evaluated our approach on three Lustre file system testbeds and an 8 TB Ceph store. We found that on a 897 TB Lustre system, it reported almost 38,000 events per second with low resource utilization. On the 8 TB Ceph system, FSMonitor reported more than 2,000 events per second. Compared to the iterative monitoring methods used by the popular Robinhood system, our design achieves a 14.5% higher event reporting rate for multiple Lustre MDSs. It also performs well for metadata benchmarks.
We also did not notice any performance degradation in data benchmarks when using our monitor. Finally, we demonstrated order-of-magnitude improvements in large-scale file system re-indexing times when using our scalable monitor.

Next, we build a metadata indexing tool for large-scale HPC storage systems, which has an in-tree design and uses a parallel leveled partitioning approach to partition the file system namespace into disjoint sub-trees that can be indexed in parallel. Our indexer maintains an internal metadata index database that uses a 2-level database sharding technique to increase indexing and querying performance. It also uses a changelog-based approach to keep track of metadata changes and re-index the file system. Our design is evaluated on a 4.8 TB Lustre storage system and is compared with the state-of-the-art GUFI and Robinhood engines. Our indexer improves indexing performance by 69% and querying performance by 91% with optimal resource utilization.

Finally, this dissertation builds an "end-to-end control plane" to optimize HPC storage systems by providing efficient load balancing across storage servers. Our proposed system provides a global view of the system, enables coordination between the clients and servers, and handles the performance degradation due to resource contention by considering operations on both clients and servers. Our implementation provides a balanced distribution of load over OSTs and OSSes in the Lustre file system, and is able to handle both PFL and non-PFL layouts for files. We evaluated our system on a real Lustre testbed using two representative benchmarks, IOR and HACC-I/O, with multiple stripe counts of files as well as SSF and FPP accesses. Compared to the default Lustre RR policy, our design provides up to 33% improvement in balancing the load. Moreover, we also observed an I/O performance improvement of up to 43% for reads without affecting the performance for writes. Finally, the transparent design of our load balancer makes it attractive for adoption in real-world deployments.

8.2 Future Directions

This dissertation is focused on practical problems that exist in HPC storage systems. We are particularly interested in designing systems with high efficiency, flexibility, and better security, and in extending our understanding of cyber-physical systems and ubiquitous computing. In the following, we discuss several future directions as extensions to this dissertation.

8.2.1 ML for I/O and I/O for ML

With the rise of machine learning (ML) frameworks, there is an increasing need to optimize ML workloads in the HPC environment. Therefore, there is a need to analyze the I/O patterns of ML workloads and optimize HPC storage systems for ML. Additionally, with the rise in popularity of deep learning and other ML frameworks, the optimizations should involve using ML techniques on the enormous amounts of I/O traces to predict I/O requests from a diverse set of applications and thereby make HPC storage systems efficient and application-attuned.

8.2.2 Edge Computing and Federated Learning

The Internet of Things (IoT) holds tremendous advantages for human society. As the computing capacity of edge devices increases, the trend of performing computation on edge devices is also gaining traction. With the rise in popularity of federated learning, existing edge computing frameworks fall short in optimizing I/O for such a huge influx of real-time data. Also, the heterogeneity of IoT resources adds an additional challenge to optimization. We need to think about novel approaches to optimize I/O with minimal resource utilization for the emerging IoT.

8.2.3 Containers in HPC

Containers are experiencing massive growth as the deployment unit of choice in a broad range of applications, from enterprise to web services. Containers offer highly desirable features: they are lightweight, comprehensively capture dependencies into easy-to-deploy images, provide application portability, and can be scaled to meet application demands. Modern DL/ML software stacks are complex and bespoke, with no two setups exactly the same. Therefore, the use of containers is being explored in the HPC environment, and has produced benefits for large-scale image processing and DL/ML workloads. In addition to meeting the computing needs of HPC applications, containers have to support dynamic I/O requirements and scale, which introduces new challenges for data storage and access [18]. Therefore, there need to be analysis and optimization frameworks for the effective use of containers in HPC.

Bibliography

[1] ATLAS project. https://iopscience.iop.org/article/10.1088/1748-0221/3/08/S08003. Accessed: June 20 2020.

[2] Aurora Supercomputer. https://aurora.alcf.anl.gov/. Accessed: August 26 2019.

[3] BorgFS. https://www.snia.org/educational-library/borgfs-file-system-metadata-index-search-2014. Accessed: December 7 2019.

[4] Ceph User Survey. https://ceph.io/ceph-blog/ceph-user-survey-2018-results/. Accessed: June 20 2020.

[5] Ceph Users. https://ceph.io/users/. Accessed: June 20 2020.

[6] Cray - Clusterstor. https://www.cray.com/products/storage/clusterstor. Ac- cessed: June 20 2020.

[7] Flame Graph. http://www.brendangregg.com/flamegraphs.html. Accessed: De- cember 7 2019.

[8] FS Mark. https://sourceforge.net/projects/fsmark/. Accessed: June 20 2020.

[9] GUFI. https://github.com/mar-file-system/GUFI. Accessed: November 30 2019.

[10] Ian Shields, IBM - Monitor Linux file system events with inotify. https://developer.ibm.com/tutorials/l-inotify/. Accessed: March 7 2019.

[11] LFS Find. http://manpages.ubuntu.com/manpages/precise/man1/lfs.1.html. Accessed: December 10 2019.

[12] OpenSFS and EOFS - Lustre file system. http://lustre.org/. Accessed: March 23 2019.

[13] MDTest. https://github.com/hpc/ior/blob/master/src/mdtest.c. Accessed: June 20 2020.


[14] NERSC Report. https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2017/new-storage-2020-report-outlines-future-hpc-storage-vision/. Accessed: Nov 30 2019.

[15] MezzFS — Mounting object storage in Netflix’s media processing platform. https://netflixtechblog.com/mezzfs-mounting-object-storage-in-netflixs-media-processing-platform-cda01c446ba. Accessed: June 20 2020.

[16] Top 500 List. https://www.top500.org/lists/2019/11/. Accessed: November 30 2019.

[17] Why Ceph? https://searchstorage.techtarget.com/feature/5-Ceph-storage-questions-answered-and-explained. Accessed: June 20 2020.

[18] Subil Abraham, Arnab K Paul, Redwan Ibne Seraj Khan, and Ali R Butt. On the use of containers in high performance computing environments. In 2020 IEEE International Conference on Cloud Computing (CLOUD), pages 1–10. IEEE, 2020.

[19] Megha Agarwal, Divyansh Singhvi, Preeti Malakar, and Suren Byna. Active learning- based automatic tuning and prediction of parallel i/o performance. In 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), pages 20–29. IEEE, 2019.

[20] Ravindra K Ahuja. Network flows: theory, algorithms, and applications. Pearson Education, 2017.

[21] Rachana Ananthakrishnan, Ben Blaiszik, Kyle Chard, Ryan Chard, Brendan McCol- lam, Jim Pruyne, Stephen Rosen, Steven Tuecke, and Ian Foster. Globus platform services for data publication. In Practice and Experience on Advanced Research Com- puting, page 14. ACM, 2018.

[22] Ali Anwar. Towards Efficient and Flexible Object Storage Using Resource and Func- tional Partitioning. PhD thesis, Virginia Tech, 2018.

[23] Ali Anwar, KR Krish, and Ali R Butt. On the use of microservers in supporting hadoop applications. In Cluster Computing (CLUSTER), 2014 IEEE International Conference on, pages 66–74. IEEE, 2014.

[24] Ali Anwar, Yue Cheng, Aayush Gupta, and Ali R Butt. Taming the cloud object storage with mos. In Proceedings of the 10th Parallel Data Storage Workshop, pages 7–12. ACM, 2015.

[25] Ali Anwar, Anca Sailer, Andrzej Kochut, and Ali R Butt. Anatomy of cloud monitoring and metering: A case study and open problems. In Proceedings of the 6th Asia-Pacific Workshop on Systems, page 6. ACM, 2015.

[26] Ali Anwar, Anca Sailer, Andrzej Kochut, Charles O Schulz, Alla Segal, and Ali R Butt. Cost-aware cloud metering with scalable service management infrastructure. In Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on, pages 285–292. IEEE, 2015.

[27] Ali Anwar, Anca Sailer, Andrzej Kochut, Charles O Schulz, Alla Segal, and Ali R Butt. Scalable metering for an affordable it cloud service management. In Cloud Engineering (IC2E), 2015 IEEE International Conference on, pages 207–212. IEEE, 2015.

[28] Ali Anwar, Salman A Baset, Andrzej P Kochut, Hui Lei, Anca Sailer, and Alla Segal. Scalable metering for cloud service management based on cost-awareness, March 31 2016. US Patent App. 14/871,443.

[29] Ali Anwar, Yue Cheng, and Ali R Butt. Towards managing variability in the cloud. In Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International, pages 1081–1084. IEEE, 2016.

[30] Ali Anwar, Yue Cheng, Aayush Gupta, and Ali R Butt. Mos: Workload-aware elasticity for cloud object stores. In Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, pages 177–188. ACM, 2016.

[31] Ali Anwar, Yue Cheng, Hai Huang, and Ali Raza Butt. Clusteron: Building highly con- figurable and reusable clustered data services using simple data nodes. In HotStorage, 2016.

[32] Ali Anwar, Andrzej Kochut, Anca Sailer, Charles O Schulz, and Alla Segal. Dynamic metering adjustment for service management of computing platform, March 31 2016. US Patent App. 14/926,384.

[33] Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, Dongyoon Lee, Fred Douglis, and Ali R Butt. Bespokv: Application tailored scale-out key-value stores. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, page 2. IEEE Press, 2018.

[34] Ali Anwar, Mohamed Mohamed, Vasily Tarasov, Michael Littley, Lukas Rupprecht, Yue Cheng, Nannan Zhao, Dimitrios Skourtis, Amit S Warke, Heiko Ludwig, and Ali. R. Butt. Improving docker registry design based on production workload analysis. In 16th USENIX Conference on File and Storage Technologies, page 265, 2018.

[35] Ali Anwar, Lukas Rupprecht, Dimitris Skourtis, and Vasily Tarasov. Challenges in storing docker images. login Usenix Mag., 44(3), 2019.

[36] Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, Dongyoon Lee, Fred Douglis, and Ali R Butt. Customizable scale-out key-value stores. IEEE Transactions on Parallel and Distributed Systems, 31(9):2081–2096, 2020.

[37] Apple. File system events. https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/UsingtheFSEventsFramework/UsingtheFSEventsFramework.html, 2012. Accessed: Sept, 2018.

[38] Ayse Bagbaba. Improving collective i/o performance with machine learning supported auto-tuning. In The Fifteenth International Workshop on Automatic Performance Tuning, 2020.

[39] Babak Behzad, Surendra Byna, and Marc Snir. Optimizing i/o performance of hpc applications with autotuning. ACM Transactions on Parallel Computing (TOPC), 5 (4):1–27, 2019.

[40] Stefan Berchtold, Christian Böhm, Daniel A. Keim, and Hans-Peter Kriegel. A cost model for nearest neighbor search in high-dimensional data space. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS ’97, pages 78–86, New York, NY, USA, 1997. ACM. ISBN 0-89791-910-6. doi: 10.1145/263661.263671. URL http://doi.acm.org/10.1145/263661.263671.

[41] Thomas William Bereiter. Software auditing mechanism for a distributed computer enterprise environment, May 19 1998. US Patent 5,754,763.

[42] Tim Bisson, Yuvraj Patel, and Shankar Pasupathy. Designing a fast file system crawler with incremental differencing. ACM SIGOPS Operating Systems Review, 46(3):11–19, 2012.

[43] Arthur S. Bland, Jack C. Wells, Otis E. Messer, Oscar R. Hernandez, and James H. Rogers. Titan: Early experience with the Cray XK6 at Oak Ridge National Laboratory. In Proceedings of Cray User Group Conference (CUG 2012), May 2012.

[44] Buddy Bland. Titan-early experience with the titan system at oak ridge national laboratory. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:, pages 2189–2211. IEEE, 2012.

[45] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.

[46] Eric B Boyer, Matthew C Broomfield, and Terrell A Perrotti. Glusterfs one storage server to rule them all. Technical report, Los Alamos National Laboratory (LANL), 2012.

[47] Peter J Braam and Rumi Zahir. Lustre: A scalable, high performance file system. Cluster File Systems, Inc, 2002.

[48] Peter J Brockwell, Richard A Davis, and Matthew V Calder. Introduction to time series and forecasting, volume 2. Springer, 2002.

[49] Kirk W Cameron, Ali Anwar, Yue Cheng, Li Xu, Bo Li, Uday Ananth, Thomas Lux, Yili Hong, Layne T Watson, and Ali R Butt. Moana: Modeling and analyzing i/o variability in parallel system experimental design. 2018.

[50] Philip Carns, Sam Lang, Robert Ross, Murali Vilayannur, Julian Kunkel, and Thomas Ludwig. Small-file access in parallel file systems. In 2009 IEEE International Sympo- sium on Parallel & Distributed Processing, pages 1–11. IEEE, 2009.

[51] Philip Carns, Robert Latham, Robert Ross, Kamil Iskra, Samuel Lang, and Katherine Riley. 24/7 characterization of petascale i/o workloads. In 2009 IEEE International Conference on Cluster Computing and Workshops, pages 1–10. IEEE, 2009.

[52] Philip Carns, Kevin Harms, William Allcock, Charles Bacon, Samuel Lang, Robert Latham, and Robert Ross. Understanding and improving computational science stor- age access through continuous characterization. ACM Transactions on Storage (TOS), 7(3):8, 2011.

[53] A Fernández Casaní, D Barberis, A Favareto, C García Montoro, S González de la Hoz, J Hrivnáč, F Prokoshin, J Salt, and J Sánchez. ATLAS EventIndex general dataflow and monitoring infrastructure. Journal of Physics Conference Series, 898(6):062010, 2017.

[54] Zheng Chai, Hannan Fayyaz, Zeshan Fayyaz, Ali Anwar, Yi Zhou, Nathalie Baracaldo, Heiko Ludwig, and Yue Cheng. Towards taming the resource and data heterogeneity in federated learning. In 2019 {USENIX} Conference on Operational Machine Learning (OpML 19), pages 19–21, 2019.

[55] Zheng Chai, Ahsan Ali, Syed Zawad, Stacey Truex, Ali Anwar, Nathalie Baracaldo, Yi Zhou, Heiko Ludwig, Feng Yan, and Yue Cheng. Tifl: A tier-based federated learning system. To appear in ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2020.

[56] Ryan Chard, Kyle Chard, Jason Alt, Dilworth Y Parkinson, Steve Tuecke, and Ian Foster. Ripple: Home automation for research data management. In 37th IEEE International Conference on Distributed Computing Systems, 2017.

[57] Ryan Chard, Yadu Babuji, Zhuozhao Li, Tyler Skluzacek, Anna Woodard, Ben Blaiszik, Ian Foster, and Kyle Chard. funcx: A federated function serving fabric for science. arXiv preprint arXiv:2005.04215, 2020.

[58] Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. ACM EuroSys, 2015.

[59] Yue Cheng. Workload-aware efficient storage systems. PhD thesis, Virginia Tech, 2017.

[60] Yue Cheng, Zheng Chai, and Ali Anwar. Characterizing co-located datacenter work- loads: An alibaba case study. arXiv preprint arXiv:1808.02919, 2018.

[61] John Cieslewicz and Kenneth A. Ross. Data partitioning on chip multiprocessors. In Proc. ACM Data Management on New Hardware, 2008.

[62] James Cipar, Greg Ganger, Kimberly Keeton, Charles B. Morrey, III, Craig A.N. Soules, and Alistair Veitch. Lazybase: Trading freshness for performance in a scalable database. In EuroSys, pages 169–182, New York, NY, USA, 2012. ACM. ISBN 978- 1-4503-1223-3. doi: 10.1145/2168836.2168854. URL http://doi.acm.org/10.1145/ 2168836.2168854.

[63] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20 (3):273–297, 1995.

[64] Enrico M. Crisostomo. fswatch. https://github.com/emcrisostomo/fswatch, 2013. Accessed: Sept, 2018.

[65] Breno Dantas Cruz, Arnab K Paul, and Eli Tilevich. Stargazer: A deep learning approach for estimating the performance of edge-based clustering applications. In 2020 IEEE International Conference on Smart Data Services (SMDS), pages 1–10. IEEE, 2020.

[66] Arjun Datta and Arnab Kumar Paul. Online compiler as a cloud service. In 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies, pages 1783–1786. IEEE, 2014.

[67] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In Proc. USENIX OSDI, 2004.

[68] Shyam C Deshmukh and Sudarshan S Deshmukh. Improved load balancing for dis- tributed file system using self acting and adaptive loading data migration process. In 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO)(Trends and Future Directions), 2015, pages 1–6. IEEE, 2015.

[69] Ananth Devulapalli, Dennis Dalessandro, Pete Wyckoff, Nawab Ali, and P Sadayap- pan. Integrating parallel file systems with object-based storage devices. In SC’07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing, pages 1–10. IEEE, 2007.

[70] Tim d’Hondt, Anna Wilbik, Paul Grefen, Heiko Ludwig, Natalie Baracaldo, and Ali Anwar. Using bpm technology to deploy and manage distributed analytics in collaborative iot-driven business scenarios. In Proceedings of the 9th International Conference on the Internet of Things, pages 1–8, 2019.

[71] Bin Dong, Xiuqiao Li, Qimeng Wu, Limin Xiao, and Li Ruan. A dynamic and adaptive load balancing strategy for parallel file system with large-scale i/o servers. Journal of Parallel and Distributed Computing, 72(10):1254–1268, 2012.

[72] Jack Dongarra, Hans Meuer, and Erich Strohmaier. Top500 supercomputing sites. http://www.top500.org, 2016.

[73] Facebook. Watchman: A file watching service. https://facebook.github.io/ watchman/, 2015. Accessed: Sept, 2018.

[74] Argonne Leadership Computing Facility. Intrepid system. URL https://www.alcf. anl.gov/intrepid. Accessed: May 12 2019.

[75] Ian Foster, Ben Blaiszik, Kyle Chard, and Ryan Chard. Software Defined Cyberin- frastructure. In The 37th IEEE International Conference on Distributed Computing Systems (ICDCS), 2017.

[76] Rohan Gandhi, Di Xie, and Y. Charlie Hu. Pikachu: How to rebalance load in opti- mizing mapreduce on heterogeneous clusters. In Proc. USENIX ATC, 2013.

[77] Dominic Giampaolo. Practical file system design with the Be file system. Morgan Kaufmann Publishers Inc., 1998.

[78] Alexandra Glagoleva and Archana Sathaye. Load balancing distributed file system servers: a rule-based approach. Web-Enabled Systems Integration: Practices and Chal- lenges: Practices and Challenges, page 274, 2002.

[79] Gluster. Gluster-cloud storage for the modern data center, 2017. http://moo.nac.uci.edu/ hjm/fs/AnIntroductionToGlusterArchitectureV7110708.pdf.

[80] Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. Graphx: Graph processing in a distributed dataflow frame- work. In Proc. USENIX OSDI, 2014.

[81] Raghul Gunasekaran, Sarp Oral, Jason Hill, Ross Miller, Feiyi Wang, and Dustin Leverman. Comparative i/o workload characterization of two leadership class storage clusters. In Proceedings of the 10th Parallel Data Storage Workshop, pages 31–36. ACM, 2015.

[82] Marios Hadjieleftheriou, Yannis Manolopoulos, Yannis Theodoridis, and Vassilis J Tso- tras. R-trees: A dynamic index structure for spatial searching. Encyclopedia of GIS, pages 1805–1817, 2017.

[83] John A Hartigan and Manchek A Wong. Algorithm as 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108, 1979.

[84] Andrew C Harvey. Forecasting, structural time series models and the Kalman filter. Cambridge university press, 1990.

[85] John L Hennessy and David A Patterson. Computer architecture: a quantitative ap- proach. Elsevier, 2011.

[86] Pieter Hintjens. ZeroMQ: messaging for many applications. O’Reilly, 2013.

[87] David W Hosmer Jr and Stanley Lemeshow. Applied logistic regression. John Wiley & Sons, 2004.

[88] Windsor W Hsu and Alan Jay Smith. Characteristics of i/o traffic in personal computer and server workloads. IBM Systems Journal, 42(2):347–372, 2003.

[89] Windsor W Hsu, Alan Jay Smith, and Honesty C Young. I/o reference behavior of production database workloads and the tpc benchmarks—an analysis at the logical level. ACM Transactions on Database Systems (TODS), 26(1):96–143, 2001.

[90] Y. Hua, H. Jiang, Y. Zhu, D. Feng, and L. Tian. Smartstore: a new metadata or- ganization paradigm with semantic-awareness for next-generation file systems. In SC, pages 1–12, Nov 2009. doi: 10.1145/1654059.1654070.

[91] Chien-Chin Huang, Qi Chen, Zhaoguo Wang, Russell Power, Jorge Ortiz, Jinyang Li, and Zhen Xiao. Spartan: A distributed array framework with smart tiling. In Proc. USENIX ATC, 2015.

[92] InfluxData. Influxdb, . URL https://github.com/influxdata/influxdb. Accessed: April 1 2019.

[93] InfluxData. Telegraf, . URL https://github.com/influxdata/telegraf. Accessed: April 1 2019.

[94] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proc. ACM Eurosys, 2007.

[95] Ian Jolliffe. Principal component analysis. Wiley Online Library, 2002.

[96] Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, and Hyeonsang Eom. Towards hpc i/o performance prediction through large-scale log analysis. In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, pages 77–88, 2020.

[97] Youngjae Kim, Raghul Gunasekaran, Galen M Shipman, David A Dillow, Zhe Zhang, and Bradley W Settlemyer. Workload characterization of a leadership class storage cluster. In 2010 5th Petascale Data Storage Workshop (PDSW’10), pages 1–5. IEEE, 2010.

[98] Donghun Koo, Jik-Soo Kim, Soonwook Hwang, Hyeonsang Eom, and Jaehwan Lee. Utilizing progressive file layout leveraging ssds in hpc cloud environments. In 2016 IEEE 1st International Workshops on Foundations and Applications of Self* Systems (FAS* W), pages 90–95. IEEE, 2016.

[99] K. R. Krish, B. Wadhwa, M. S. Iqbal, M. M. Rafique, and A. R. Butt. On efficient hierarchical storage for big data processing. In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 403–408, 2016.

[100] KR Krish, Ali Anwar, and Ali R Butt. hats: A heterogeneity-aware tiered storage for hadoop. In 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2014.

[101] KR Krish, Ali Anwar, and Ali R Butt. [phi] sched: A heterogeneity-aware hadoop workflow scheduler. In Modelling, Analysis & Simulation of Computer and Telecommu- nication Systems (MASCOTS), 2014 IEEE 22nd International Symposium on, pages 255–264. IEEE, 2014.

[102] Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala, and L. Paul Chew. Optimistic parallelism benefits from data partitioning. In Proc. ACM ASPLOS, 2008.

[103] YongChul Kwon, Magdalena Balazinska, Bill Howe, and Jerome Rolia. Skewtune: Mitigating skew in mapreduce applications. In Proc. ACM SIGMOD, 2012.

[104] Argonne National Laboratory. Darshan - hpc i/o characterization tool. URL https://www.mcs.anl.gov/research/projects/darshan/. Accessed: May 12 2019.

[105] Samuel Lang, Philip Carns, Robert Latham, Robert Ross, Kevin Harms, and William Allcock. I/o performance challenges at leadership scale. In Proceedings of the Confer- ence on High Performance Computing Networking, Storage and Analysis, pages 1–12. IEEE, 2009.

[106] Margaret Lawson and Jay Lofstead. Using a robust metadata management system to accelerate scientific discovery at extreme scales. In 2018 IEEE/ACM PDSW-DISCS, pages 13–23. IEEE, 2018.

[107] Thomas Leibovici. Taking back control of HPC file systems with Robinhood Policy Engine. arXiv preprint arXiv:1505.01448, 2015.

[108] Jonathan Lemon. Kqueue – A generic and scalable event notification facility. In USENIX Annual Technical Conference, FREENIX Track, pages 141–153, 2001.

[109] Andrew Leung, I Adams, and Ethan L Miller. Magellan: A searchable metadata architecture for large-scale file systems. University of California, Santa Cruz, Tech. Rep. UCSC-SSRC-09-07, 2009.

[110] Andrew W Leung, Shankar Pasupathy, Garth R Goodson, and Ethan L Miller. Mea- surement and analysis of large-scale network file system workloads. In USENIX annual technical conference, volume 1, pages 5–2, 2008.

[111] Andrew W Leung, Minglong Shao, Timothy Bisson, Shankar Pasupathy, and Ethan L Miller. Spyglass: Fast, scalable metadata search for large-scale storage systems. In FAST, volume 9, pages 153–166, 2009.

[112] Min Li, Jian Tan, Yandong Wang, Li Zhang, and Valentina Salapura. Sparkbench: a comprehensive benchmarking suite for in memory data analytic platform spark. In Proc. ACM International Conference on Computing Frontiers, 2015.

[113] Harold Lim, Herodotos Herodotou, and Shivnath Babu. Stubby: A transformation- based optimizer for mapreduce workflows. In Proc. VLDB, 2012.

[114] Seung-Hwan Lim, Hyogi Sim, Raghul Gunasekaran, and Sudharshan S Vazhkudai. Scientific user behavior and data-sharing trends in a petascale file system. In Proceed- ings of the International Conference for High Performance Computing, Networking, Storage and Analysis, page 46. ACM, 2017.

[115] Linux. inotify-tools. https://github.com/rvoicilas/inotify-tools/, 2010. Ac- cessed: Sept, 2018.

[116] Linux. fanotify. http://man7.org/linux/man-pages/man7/fanotify.7.html, 2017. Accessed: Sept, 2018.

[117] Michael Littley, Ali Anwar, Hannan Fayyaz, Zeshan Fayyaz, Vasily Tarasov, Lukas Rupprecht, Dimitrios Skourtis, Mohamed Mohamed, Heiko Ludwig, Yue Cheng, and Ali R Butt. Bolt: Towards a scalable docker registry via hyperconvergence. In IEEE International Conference on Cloud Computing, 2019.

[118] Jinjun Liu, Dan Feng, Yu Hua, Bin Peng, and Zhenhua Nie. Using provenance to effi- ciently improve metadata searching performance in storage systems. Future Generation Computer Systems, 50:99–110, 2015.

[119] Yang Liu, Raghul Gunasekaran, Xiaosong Ma, and Sudharshan S Vazhkudai. Server- side log data analytics for i/o workload characterization and coordination on large shared storage systems. In SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 819–829. IEEE, 2016.

[120] LLNL. Hacc i/o benchmark summary, 2017. URL https://asc.llnl.gov/CORAL-benchmarks/Summaries/HACC_IO_Summary_v1.0.pdf.

[121] LLNL. Ior benchmark, 2017. https://asc.llnl.gov/sequoia/benchmarks/IOR_summary_v1.0.pdf.

[122] Glenn K Lockwood, Shane Snyder, Teng Wang, Suren Byna, Philip Carns, and Nicholas J Wright. A year in the life of a parallel file system. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, page 74. IEEE Press, 2018.

[123] Glenn K Lockwood, Nicholas J Wright, Shane Snyder, Philip Carns, George Brown, and Kevin Harms. Tokio on clusterstor: connecting standard tools to enable holistic i/o performance analysis. In 2018 Cray User Group, 2018.

[124] Robert Love. Kernel korner: Intro to inotify. Linux Journal, 2005(139):8, 2005.

[125] Lustre. Lustre - distributed name space. URL http://wiki.lustre.org/Lustre_Metadata_Service_(MDS). Accessed: July 15 2020.

[126] Lustre. Lustre jobstats. URL http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.jobstats. Accessed: April 1 2019.

[127] Huong Luu, Babak Behzad, Ruth Aydt, and Marianne Winslett. A Multi-Level Ap- proach for Understanding I/O Activity in HPC Applications. In Proc. CLUSTER. IEEE, September 2013. doi: 10.1109/CLUSTER.2013.6702690.

[128] Huong Luu, Marianne Winslett, William Gropp, Robert Ross, Philip Carns, Kevin Harms, Mr Prabhat, Suren Byna, and Yushu Yao. A multiplatform study of i/o behavior on petascale supercomputers. In Proceedings of the 24th International Sym- posium on High-Performance Parallel and Distributed Computing, pages 33–44. ACM, 2015.

[129] J. F. Martinez and E. Ipek. Dynamic multicore resource management: A machine learning approach. IEEE Micro, 29(5):8–17, 2009.

[130] Ryan McKenna, Stephen Herbein, Adam Moody, Todd Gamblin, and Michela Taufer. Machine learning predictions of runtime and io traffic on high-end clusters. In 2016 IEEE International Conference on Cluster Computing (CLUSTER), pages 255–258. IEEE, 2016.

[131] Microsoft. FileSystemWatcher. https://docs.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?redirectedfrom=MSDN&view=netframework-4.7.2, 2010. Accessed: Sept, 2018.

[132] Rich Miller. Google using machine learning to boost data center efficiency — data cen- ter knowledge, 2014. URL http://www.datacenterknowledge.com/archives/2014/ 05/28/google-using-machine-learning-boost-data-center-efficiency/2/.

[133] Ross Miller, Jason Hill, David A Dillow, Raghul Gunasekaran, Galen M Shipman, and Don Maxwell. Monitoring tools for large scale systems. In Cray User Group Conference, 2010.

[134] Rick Mohr, Michael Brim, Sarp Oral, and Andreas Dilger. Evaluating progressive file layouts for lustre. In Cray User Group Conference (CUG 2016), 2016.

[135] Esteban Molina-Estolano, Carlos Maltzahn, and Scott Brandt. Dynamic load balancing in ceph. 2008.

[136] Rimma Nehme and Nicolas Bruno. Automated partitioning design in parallel database systems. In Proc. ACM SIGMOD, 2011.

[137] Sarah Neuwirth. Accelerating Network Communication and I/O in Scientific High Performance Computing Environments. PhD thesis, Heidelberg University, Germany, December 2018.

[138] Sarah Neuwirth, Feiyi Wang, Sarp Oral, and Ulrich Bruening. Automatic and trans- parent resource contention mitigation for improving large-scale parallel file system performance. In Proc. ICPADS. IEEE, 2017.

[139] Nils Nieuwejaar, David Kotz, Apratim Purakayastha, C Sclatter Ellis, and Michael L Best. File-access characteristics of parallel scientific workloads. IEEE Transactions on Parallel and Distributed Systems, 7(10):1075–1089, 1996.

[140] NS-3. Network simulator, 2017. http://code.nsnam.org/.

[141] Michael A Olson et al. The design and implementation of the inversion file system. In USENIX Winter, pages 205–218, 1993.

[142] OpenSFS and EOFS. Lustre file system. http://lustre.org/. Accessed: November 2 2019.

[143] Sarp Oral, Feiyi Wang, David Dillow, Galen M Shipman, Ross Miller, and Oleg Drokin. Efficient object storage journaling in a distributed parallel file system. In FAST, vol- ume 10, pages 1–12, 2010.

[144] Sarp Oral, James Simmons, Jason Hill, Dustin Leverman, Feiyi Wang, Matt Ezell, Ross Miller, Douglas Fuller, Raghul Gunasekaran, Youngjae Kim, et al. Best practices and lessons learned from deploying and operating large-scale data-centric parallel file systems. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 217–228. IEEE, 2014.

[145] Sarp Oral, Sudharshan S Vazhkudai, Feiyi Wang, Christopher Zimmer, Christopher Brumgard, Jesse Hanley, George Markomanolis, Ross Miller, Dustin Leverman, Scott Atchley, et al. End-to-end i/o portfolio for the summit supercomputing ecosystem. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–14, 2019.

[146] Aleatha Parker-Wood, Christina Strong, Ethan L Miller, and Darrell DE Long. Se- curity aware partitioning for efficient file system search. In 2010 IEEE 26th MSST, pages 1–14. IEEE, 2010.

[147] Barbara K Pasquale and George C Polyzos. A static analysis of i/o characteristics of scientific applications in a production workload. In Supercomputing’93: Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, pages 388–397. IEEE, 1993.

[148] Tirthak Patel, Suren Byna, Glenn K Lockwood, Nicholas J Wright, Philip Carns, Robert Ross, and Devesh Tiwari. Uncovering access, reuse, and sharing characteristics of i/o-intensive files on large-scale production {HPC} systems. In 18th {USENIX} Conference on File and Storage Technologies ({FAST} 20), pages 91–101, 2020.

[149] Swapnil Patil and Garth A Gibson. Scale and concurrency of giga+: File system directories with millions of files. In FAST, pages 13–13, 2011.

[150] Arnab K Paul, Arpit Goyal, Feiyi Wang, Sarp Oral, Ali R Butt, Michael J Brim, and Sangeetha B Srinivasa. I/O load balancing for big data HPC applications. In International Conference on Big Data, pages 233–242. IEEE, 2017.

[151] Arnab K Paul, Arpit Goyal, Feiyi Wang, Sarp Oral, Ali R Butt, Michael J Brim, and Sangeetha B Srinivasa. I/o load balancing for big data hpc applications. In 2017 IEEE International Conference on Big Data (Big Data), pages 233–242. IEEE, 2017.

[152] Arnab K Paul, Steven Tuecke, Ryan Chard, Ali R Butt, Kyle Chard, and Ian Foster. Toward scalable monitoring on large-scale storage for software defined cyberinfrastruc- ture. In Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, pages 49–54, 2017.

[153] Arnab K Paul, Ryan Chard, Kyle Chard, Steven Tuecke, Ali R Butt, and Ian Foster. Fsmonitor: Scalable file system monitoring for arbitrary storage systems. In 2019 IEEE International Conference on Cluster Computing (CLUSTER), pages 1–11. IEEE, 2019.

[154] Arnab K Paul, Olaf Faaland, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, and Ali R Butt. Understanding hpc application i/o behavior using system level statistics, 2019.

[155] Arnab K Paul, Olaf Faaland, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, and Ali R Butt. Improving i/o performance of hpc applications using intra-job scheduling. In 2019 Work-In-Progress Proceedings of the Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, pages 1 – 1, 2019.

[156] Arnab K Paul, Ryan Chard, Kyle Chard, Ali R Butt, and Ian Foster. Storm: File system monitoring for large scale storage systems. In In Submission, pages 1–12. IEEE, 2020.

[157] Arnab K Paul, Olaf Faaland, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, and Ali R Butt. Understanding hpc application i/o behavior using system level statistics. In In Submission, pages 1–10, 2020.

[158] Arnab K Paul, Bharti Wadhwa, Sarah Neuwirth, Feiyi Wang, Sarp Oral, and Ali R Butt. Resource contention aware load balancing for large-scale parallel file systems. In In Submission, pages 1–12, 2020.

[159] Arnab K Paul, Brian Wang, Nathan Rutman, Cory Spitz, and Ali R Butt. Efficient metadata indexing for hpc storage systems. In 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pages 162–171. IEEE, 2020.

[160] Arnab Kumar Paul. Dynamic virtual machine placement in cloud computing. PhD thesis, 2015.

[161] Arnab Kumar Paul and Bibhudatta Sahoo. Dynamic virtual machine placement in cloud computing. In Resource Management and Efficiency in Cloud Computing Environments, pages 136–167. IGI Global, 2017.

[162] Arnab Kumar Paul, Sourav Kanti Addya, Bibhudatta Sahoo, and Ashok Kumar Turuk. Application of greedy algorithms to virtual machine distribution across data centers. In 2014 Annual IEEE India Conference (INDICON), pages 1–6. IEEE, 2014.

[163] Arnab Kumar Paul, Wenjie Zhuang, Luna Xu, Min Li, M Mustafa Rafique, and Ali R Butt. Chopper: Optimizing data partitioning for in-memory data analytics frameworks. In 2016 IEEE International Conference on Cluster Computing (CLUSTER), pages 110–119. IEEE, 2016.

[164] Andrew Pavlo, Carlo Curino, and Stanley Zdonik. Skew-aware automatic database partitioning in shared-nothing, parallel oltp systems. In Proc. ACM SIGMOD, 2012.

[165] Python. Watchdog. https://pypi.org/project/watchdog/, 2010. Accessed: Sept, 2018.

[166] Yingjin Qian, Eric Barton, Tom Wang, Nirant Puntambekar, and Andreas Dilger. A novel network request scheduler for a large scale storage system. Computer Science - Research and Development, 23(3):143–148, 2009. ISSN 1865-2042. doi: 10.1007/s00450-009-0073-9. URL http://dx.doi.org/10.1007/s00450-009-0073-9.

[167] Abdul Quamar, K. Ashwin Kumar, and Amol Deshpande. Sword: Scalable workload-aware data placement for transactional workloads. In Proc. ACM International Conference on Extending Database Technology, 2013.

[168] Dino Quintero, Luis Bolinches, Puneet Chaudhary, Willard Davis, Steve Duersch, Carlos Henrique Fachim, Andrei Socoliuc, Olaf Weiser, et al. IBM Spectrum Scale (formerly GPFS). IBM Redbooks, 2017.

[169] Subhash Saini, Jason Rappleye, Johnny Chang, David Barker, Piyush Mehrotra, and Rupak Biswas. I/o performance characterization of lustre and nasa applications on pleiades. In 2012 19th International Conference on High Performance Computing, pages 1–10. IEEE, 2012.

[170] Ramesh R Sarukkai. Link prediction and path analysis using markov chains. Computer Networks, 33(1):377–386, 2000.

[171] Andrea Schaerf, Yoav Shoham, and Moshe Tennenholtz. Adaptive load balancing: A study in multi-agent learning. Journal of Artificial Intelligence Research, 2:475–500, 1995.

[172] SchedMD. Slurm workload manager. URL https://slurm.schedmd.com/overview.html. Accessed: April 1 2019.

[173] C. Selvakumar, G. J. Rathanam, and M. R. Sumalatha. Pdds - improving cloud data storage security using data partitioning technique. In Proc. IEEE International Advance Computing Conference, 2013.

[174] Bin Shao, Haixun Wang, and Yatao Li. Trinity: A distributed graph engine on a memory cloud. In Proc. ACM SIGMOD, 2013.

[175] Galen Shipman, David Dillow, Sarp Oral, Feiyi Wang, Douglas Fuller, Jason Hill, and Zhe Zhang. Lessons learned in deploying the world’s largest scale lustre file system. In The 52nd Cray user group conference, 2010.

[176] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The hadoop distributed file system. In Proc. IEEE MSST, 2010.

[177] Hyogi Sim, Youngjae Kim, Sudharshan S Vazhkudai, Devesh Tiwari, Ali Anwar, Ali R Butt, and Lavanya Ramakrishnan. Analyzethis: an analysis workflow-aware storage system. In High Performance Computing, Networking, Storage and Analysis, 2015 SC-International Conference for, pages 1–12. IEEE, 2015.

[178] Hyogi Sim, Youngjae Kim, Sudharshan S Vazhkudai, Geoffroy R Vallée, Seung-Hwan Lim, and Ali R Butt. Tagit: an integrated indexing and search service for file systems. In SC, page 5. ACM, 2017.

[179] Hyogi Sim, Arnab K Paul, Eli Tilevich, Ali R Butt, and Muhammad Shahzad. Cslim: automated extraction of iot functionalities from legacy c codebases. In Proceedings of the 20th International Conference on Distributed Computing and Networking, pages 421–426, 2019.

[180] Aameek Singh, Madhukar Korupolu, and Dushmanta Mohapatra. Server-storage virtualization: integration and load balancing in data centers. In Proceedings of ACM/IEEE SC, 2008.

[181] Craig A.N. Soules, Kimberly Keeton, and Charles B. Morrey, III. Scan-lite: Enterprise-wide analysis on the cheap. In EuroSys, New York, USA, 2009. ACM. ISBN 978-1-60558-482-9. doi: 10.1145/1519065.1519079. URL http://doi.acm.org/10.1145/1519065.1519079.

[182] Toyotaro Suzumura, Yi Zhou, Nathalie Baracaldo, Guangnan Ye, Keith Houck, Ryo Kawahara, Ali Anwar, Lucia Larise Stavarache, Daniel Klyashtorny, Heiko Ludwig, et al. Towards federated graph learning for collaborative financial crimes detection. arXiv preprint arXiv:1909.12946, 2019.

[183] Narate Taerat, Nichamon Naksinehaboon, Clayton Chandler, James Elliott, Chokchai Leangsuksun, George Ostrouchov, Stephen L Scott, and Christian Engelmann. Blue gene/l log analysis and time to interrupt estimation. In 2009 International Conference on Availability, Reliability and Security, pages 173–180. IEEE, 2009.

[184] Douglas A Talbert and Doug Fisher. An empirical analysis of techniques for constructing and searching k-dimensional trees. In Proceedings of the sixth ACM SIGKDD, pages 26–33. ACM, 2000.

[185] Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol. Someta: Scalable object-centric metadata management for high performance computing. In CLUSTER, pages 359–369. IEEE, 2017.

[186] Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pages 1–11, 2019.

[187] Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In Proc. ACM International Conference on Web Search and Data Mining, 2014.

[188] Alan Tucker. A note on convergence of the ford-fulkerson flow algorithm. Mathematics of Operations Research, 2(2):143–144, 1977.

[189] A. Turcu, R. Palmieri, B. Ravindran, and S. Hirve. Automated data partitioning for highly scalable and strongly consistent transactions. IEEE Transactions on Parallel and Distributed Systems, 27(1):106–118, 2016. ISSN 1045-9219.

[190] UMass. Umass trace repository, 2017. http://traces.cs.umass.edu/index.php/Storage/Storage.

[191] Hoang Tam Vo, Sheng Wang, Divyakant Agrawal, Gang Chen, and Beng Chin Ooi. Logbase: A scalable log-structured database system in the cloud. In Proc. ACM VLDB, 2012.

[192] B. Wadhwa and A. Verma. Carbon efficient vm placement and migration technique for green federated cloud datacenters. In 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 2297–2302, 2014.

[193] B. Wadhwa and A. Verma. Energy saving approaches for green cloud computing: A review. In 2014 Recent Advances in Engineering and Computational Sciences (RAECS), pages 1–6, 2014.

[194] B. Wadhwa and A. Verma. Energy and carbon efficient vm placement and migration technique for green cloud datacenters. In 2014 Seventh International Conference on Contemporary Computing (IC3), pages 189–193, 2014.

[195] B. Wadhwa, S. Byna, and A. R. Butt. Toward transparent data management in multi- layer storage hierarchy of hpc systems. In 2018 IEEE International Conference on Cloud Engineering (IC2E), pages 211–217, 2018.

[196] Bharti Wadhwa, Arnab K Paul, Sarah Neuwirth, Feiyi Wang, Sarp Oral, Ali R Butt, Jon Bernard, and Kirk W Cameron. iez: Resource contention aware load balancing for large-scale parallel file systems. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 610–620. IEEE, 2019.

[197] Ao Wang, Jingyuan Zhang, Xiaolong Ma, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Vasily Tarasov, Feng Yan, and Yue Cheng. Infinicache: Exploiting ephemeral serverless functions to build a cost-effective memory cache. In 18th USENIX Conference on File and Storage Technologies (FAST 20), pages 267–281, 2020.

[198] Feiyi Wang, Sarp Oral, Saurabh Gupta, Devesh Tiwari, and Sudharshan S Vazhkudai. Improving large-scale storage system performance via topology-aware and balanced data placement. In 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), pages 656–663. IEEE, 2014.

[199] Feng Wang, Qin Xin, Bo Hong, Scott A Brandt, Ethan L Miller, Darrell DE Long, and Tyce T McLarty. File system workload analysis for large scale scientific computing applications. In Proceedings of the 21st IEEE/12th NASA Goddard Conference on Mass Storage Systems and Technologies, pages 139–152, 2004.

[200] Teng Wang, Shane Snyder, Glenn Lockwood, Philip Carns, Nicholas Wright, and Suren Byna. Iominer: Large-scale analytics framework for gaining knowledge from i/o logs. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 466–476. IEEE, 2018.

[201] Sage A Weil, Scott A Brandt, Ethan L Miller, Darrell DE Long, and Carlos Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 307–320. USENIX Association, 2006.

[202] Sage A. Weil, Scott A. Brandt, Ethan L. Miller, and Carlos Maltzahn. Crush: Controlled, scalable, decentralized placement of replicated data. In Proceedings of ACM/IEEE SC, 2006.

[203] Lisa Wu, Raymond J. Barker, Martha A. Kim, and Kenneth A. Ross. Navigating big data with high-throughput, energy-efficient data partitioning. In Proc. ACM ISCA, 2013.

[204] Sai Wu, Feng Li, Sharad Mehrotra, and Beng Chin Ooi. Query optimization for massively parallel data processing. In Proc. ACM SoCC, 2011.

[205] MR Wyatt, S Herbein, T Gamblin, A Moody, A Dong, and M Taufer. From job scripts to resource predictions: Paving the path to next-generation hpc schedulers. Technical report, Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States), 2017.

[206] Reynold S Xin, Josh Rosen, Matei Zaharia, Michael J Franklin, Scott Shenker, and Ion Stoica. Shark: Sql and rich analytics at scale. In Proc. ACM SIGMOD, 2013.

[207] Runhua Xu, Nathalie Baracaldo, Yi Zhou, Ali Anwar, and Heiko Ludwig. Hybridalpha: An efficient approach for privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pages 13–23, 2019.

[208] Bin Yang, Xu Ji, Xiaosong Ma, Xiyang Wang, Tianyu Zhang, Xiupeng Zhu, Nosayba El-Sayed, Haidong Lan, Yibo Yang, Jidong Zhai, et al. End-to-end i/o monitoring on a leading supercomputer. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 379–394, 2019.

[209] Lei Yang, Jiannong Cao, Yin Yuan, Tao Li, Andy Han, and Alvin Chan. A framework for partitioning and execution of data stream applications in mobile cloud computing. ACM SIGMETRICS Perform. Eval. Rev., 40(4):23–32, 2013. ISSN 0163-5999.

[210] Xin Yang, Qi Liu, BC Yin, Qiang Zhang, DS Zhou, and XP Wei. k-d tree construction designed for motion blur. In Proceedings of the Eurographics Symposium on Rendering: Experimental Ideas & Implementations, pages 113–119. Eurographics Association, 2017.

[211] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proc. USENIX HotCloud, 2010. URL http://dl.acm.org/citation.cfm?id=1863103.1863113.

[212] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. USENIX NSDI, 2012.

[213] Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, and Ioan Raicu. Fusionfs: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems. In 2014 IEEE International Conference on Big Data (Big Data), pages 61–70. IEEE, 2014.

[214] Dongfang Zhao, Kan Qiao, Zhou Zhou, Tonglin Li, Zhihan Lu, and Xiaohua Xu. Toward efficient and flexible metadata indexing of big data systems. IEEE Transactions on Big Data, 3(1):107–117, 2017.

[215] N. Zhao, J. Wan, J. Wang, and C. Xie. Greencht: A power-proportional replication scheme for consistent hashing based key value storage systems. In 2015 31st Symposium on Mass Storage Systems and Technologies (MSST), pages 1–6, 2015.

[216] N. Zhao, V. Tarasov, H. Albahar, A. Anwar, L. Rupprecht, D. Skourtis, A. S. Warke, M. Mohamed, and A. R. Butt. Large-scale analysis of the docker hub dataset. In 2019 IEEE International Conference on Cluster Computing (CLUSTER), pages 1–10, 2019.

[217] N. Zhao, V. Tarasov, A. Anwar, L. Rupprecht, D. Skourtis, A. Warke, M. Mohamed, and A. Butt. Slimmer: Weight loss secrets for docker registries. In 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), pages 517–519, 2019.

[218] Nan-nan Zhao, Ji-guang Wan, Jun Wang, and Chang-sheng Xie. A reliable power management scheme for consistent hashing based distributed key value storage systems. Frontiers of Information Technology and Electronic Engineering, 17:994–1007, 10 2016. doi: 10.1631/FITEE.1601162.

[219] Nannan Zhao, Ali Anwar, Yue Cheng, Mohammed Salman, Daping Li, Jiguang Wan, Changsheng Xie, Xubin He, Feiyi Wang, and Ali Raza Butt. Chameleon: An adaptive wear balancer for flash clusters. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1163–1172, 2018.

[220] Nannan Zhao, Ali Anwar, Yue Cheng, Mohammed Salman, Daping Li, Jiguang Wan, Changsheng Xie, Xubin He, Feiyi Wang, and Ali Butt. Chameleon: An adaptive wear balancer for flash clusters. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1163–1172. IEEE, 2018.

[221] Nannan Zhao, Vasily Tarasov, Hadeel Albahar, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Amit S Warke, Mohamed Mohamed, and Ali R Butt. Large-scale analysis of the docker hub dataset. In 2019 IEEE International Conference on Cluster Computing (CLUSTER), pages 1–10. IEEE, 2019.

[222] Nannan Zhao, Vasily Tarasov, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Amit Warke, Mohamed Mohamed, and Ali Butt. Slimmer: Weight loss secrets for docker registries. In 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), pages 517–519. IEEE, 2019.

[223] Nannan Zhao, Hadeel Albahar, Subil Abraham, Keren Chen, Vasily Tarasov, Dimitrios Skourtis, Lukas Rupprecht, Ali Anwar, and Ali R. Butt. Duphunter: Flexible high-performance deduplication for docker registries. In 2020 USENIX Annual Technical Conference (USENIX ATC 20), pages 769–783. USENIX Association, July 2020. ISBN 978-1-939133-14-4. URL https://www.usenix.org/conference/atc20/presentation/zhao.

[224] Nannan Zhao, Hadeel Albahar, Subil Abraham, Keren Chen, Vasily Tarasov, Dimitrios Skourtis, Lukas Rupprecht, Ali Anwar, and Ali R Butt. Duphunter: Flexible high-performance deduplication for docker registries. To appear in USENIX Annual Technical Conference (ATC 20), 2020.

[225] Tiezhu Zhao, Verdi March, Shoubin Dong, and Simon See. Evaluation of a performance model of lustre file system. In 2010 Fifth Annual ChinaGrid Conference, pages 191– 196. IEEE, 2010.

[226] Z. Zhong, V. Rychkov, and A. Lastovetsky. Data partitioning on multicore and multi-gpu platforms using functional performance models. IEEE Transactions on Computers, 64(9):2506–2518, 2015. ISSN 0018-9340.

[227] Peipei Zhou, Zhenyuan Ruan, Zhenman Fang, Megan Shand, David Roazen, and Jason Cong. Doppio: I/o-aware performance analysis, modeling and optimization for in-memory computing framework. In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 22–32. IEEE, 2018.

[228] Sergi Àlvarez. Fsmon. https://github.com/nowsecure/fsmon, 2016. Accessed: Sept, 2018.