Electrical Engineering and Computer Science Department

Technical Report NU-EECS-16-12 September 8, 2016

Hybrid Runtime Systems

Kyle C. Hale

Abstract

As parallelism continues to increase in ubiquity—from mobile devices and GPUs to datacenters and supercomputers—parallel runtime systems occupy an increasingly important role in the system software stack. The needs of parallel runtimes and the increasingly sophisticated languages and compilers they support do not line up with the services provided by general-purpose OSes. Furthermore, the semantics available to the runtime are lost at the system-call boundary in such OSes. Finally, because a runtime executes at user-level in such an environment, it cannot leverage hardware features that require kernel-mode privileges—a large portion of the functionality of the machine is lost to it. These limitations warp the design, implementation, functionality, and performance of parallel runtimes. I make the case for eliminating these compromises by transforming parallel runtimes into hybrid runtimes (HRTs), runtimes that run as kernels, and that enjoy full hardware access and control over abstractions to the machine. The primary claim of this dissertation is that the hybrid runtime model can provide significant benefits to parallel runtimes and the applications that run on top of them.

I demonstrate that it is feasible to create instantiations of the hybrid runtime model by doing so for four different parallel runtimes, including Legion, NESL, NDPC (a home-grown language), and Racket. These HRTs are enabled by a kernel framework called Nautilus, which is a primary software contribution of this dissertation. A runtime ported to Nautilus that acts as an HRT enjoys significant performance gains relative to its ROS counterpart. Nautilus enables these gains by providing fast, light-weight mechanisms for runtimes. For example, with Legion running a mini-app of importance to the HPC community, we saw speedups of up to forty percent. We saw further improvements by leveraging hardware (interrupt control) that is not available to a user-space runtime.

In order to bridge Nautilus with a "regular OS" (ROS) environment, I introduce a concept we developed called the hybrid virtual machine (HVM). Such bridged operation allows an HRT to leverage existing functionality within a ROS with low overheads. This simplifies the construction of HRTs.

In addition to Nautilus and the HVM, I introduce an event system called Nemo, which allows runtimes to leverage events both with a familiar interface and with mechanisms that are much closer to the hardware. Nemo enables event notification latencies that outperform Linux user-space by several orders of magnitude.

Finally, I introduce Multiverse, a system that implements a technique called automatic hybridization. This technique allows runtime developers to more quickly adopt the HRT model by starting with a working HRT system and incrementally moving functionality from a ROS to the HRT.

This work was made possible by support from the United States National Science Foundation through grants CCF-1533560 and CNS-0709168, from the United States Department of Energy through grant number DE-SC0005343, and from Sandia National Laboratories through the Hobbes Project, which is funded by the 2013 Exascale Operating and Runtime Systems Program under the Office of Advanced Scientific Computing Research in the DoE's Office of Science.

Keywords: Hybrid Runtimes, Parallelism, Operating Systems, High-Performance Computing

NORTHWESTERN UNIVERSITY

Hybrid Runtime Systems

A DISSERTATION

SUBMITTED TO THE GRADUATE SCHOOL

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

for the degree

DOCTOR OF PHILOSOPHY

Field of Computer Science

By

Kyle C. Hale

EVANSTON, ILLINOIS

August 2016

© Copyright by Kyle C. Hale 2016

All Rights Reserved

Thesis Committee

Peter A. Dinda Northwestern University Committee Chair

Nikos Hardavellas Northwestern University Committee Member

Russ Joseph Northwestern University Committee Member

Fabián E. Bustamante Northwestern University Committee Member

Arthur B. Maccabe Oak Ridge National Laboratory External Committee Member

Abstract

Hybrid Runtime Systems

Kyle C. Hale

As parallelism continues to increase in ubiquity—from mobile devices and GPUs to

datacenters and supercomputers—parallel runtime systems occupy an increasingly important

role in the system software stack. The needs of parallel runtimes and the increasingly

sophisticated languages and compilers they support do not line up with the services provided by general-purpose OSes. Furthermore, the semantics available to the runtime

are lost at the system-call boundary in such OSes. Finally, because a runtime executes

at user-level in such an environment, it cannot leverage hardware features that require

kernel-mode privileges—a large portion of the functionality of the machine is lost to it.

These limitations warp the design, implementation, functionality, and performance of parallel runtimes. I make the case for eliminating these compromises by transforming parallel runtimes into hybrid runtimes (HRTs), runtimes that run as kernels, and that enjoy

full hardware access and control over abstractions to the machine. The primary claim

of this dissertation is that the hybrid runtime model can provide significant benefits to parallel runtimes and the applications that run on top of them.

I demonstrate that it is feasible to create instantiations of the hybrid runtime model

by doing so for four different parallel runtimes, including Legion, NESL, NDPC (a home-

grown language), and Racket. These HRTs are enabled by a kernel framework called

Nautilus, which is a primary software contribution of this dissertation. A runtime ported

to Nautilus that acts as an HRT enjoys significant performance gains relative to its ROS

counterpart. Nautilus enables these gains by providing fast, light-weight mechanisms

for runtimes. For example, with Legion running a mini-app of importance to the HPC

community, we saw speedups of up to forty percent. We saw further improvements by

leveraging hardware (interrupt control) that is not available to a user-space runtime.

In order to bridge Nautilus with a “regular OS” (ROS) environment, I introduce a

concept we developed called the hybrid virtual machine (HVM). Such bridged operation

allows an HRT to leverage existing functionality within a ROS with low overheads. This

simplifies the construction of HRTs.

In addition to Nautilus and the HVM, I introduce an event system called Nemo, which

allows runtimes to leverage events both with a familiar interface and with mechanisms

that are much closer to the hardware. Nemo enables event notification latencies that

outperform Linux user-space by several orders of magnitude.

Finally, I introduce Multiverse, a system that implements a technique called automatic

hybridization. This technique allows runtime developers to more quickly adopt the HRT

model by starting with a working HRT system and incrementally moving functionality

from a ROS to the HRT.

Acknowledgments

When I first embarked on the road to research, I considered myself an ambitious individual.

Little did I know that personal ambitions and aspirations comprise a relatively minor

fraction of the conditions required to pursue an academic career. I have found that the

incredible amount of support so generously offered by my mentors, my friends, and my

family has been far more important than anything else. I cannot hope to thank everyone who has shaped me as an individual and led me to where I am, but I will attempt here to

name a few that stand out.

I first thank my advisor, Professor Peter Dinda, for his encouragement and advice

during my studies at Northwestern. I find it inadequate to just call him my advisor, as it

understates the role he has played in guiding my development, both intellectually and

personally. Peter embodies all the qualities of a true mentor, from his seemingly inexhaustible

supply of patience to his extraordinary ability to cultivate others’ strengths. Before

meeting him, I had never encountered a teacher so genuinely dedicated to advocating for his

students and their success. I can only hope to uphold the same standards of mentorship.

I would also like to thank my other dissertation committee members. Nikos Hardavellas

has always had an open door for brainstorming sessions and advice. Russ Joseph similarly

has always been willing to take the time to help, always with an uplifting note of humor

and a smile. Fabián Bustamante has throughout been a source of candid conversation and

genuine advice. Barney Maccabe has been an indispensable colleague and has always

made me feel welcome in the HPC community.

Two very important people who gave me the benefit of the doubt when I started doing

research deserve special thanks. Steve Keckler and Boris Grot not only advised me as an

undergraduate, they also helped me discover confidence in myself that I never thought I

had. I also owe thanks to Calvin Lin and the Turing Scholars Program at The University of

Texas for planting the seeds for my interest in research in computer systems.

To FranCee Brown McClure and all of the other Ronald E. McNair scholars, I could not

have made it to this point without you.

I must also thank Kevin Pedretti and Patrick Bridges for helping make my stint in New

Mexico an enjoyable one, and for making me feel at home in the Southwest. To Jack Lange

I am grateful for being able to enjoy the solidarity of a fellow Texan who also braved the

cold Chicago winters to earn a doctorate.

Several members of the Prescience Lab, both former and present, deserve thanks for

their assistance, including Conor Hetland, Lei Xia, and Chang Bae.

While the friends that have supported me through the years are too numerous to name,

some stand out in helping me survive graduate school. I have Leonidas Spinoulas, Luis

Pestaña, Armin Kappeler, and Jaime Espinosa to thank for broadening my horizons and

bringing me out of my shell. I am ever grateful to John Rula for being a fantastic roommate,

a loyal friend, and a willing participant in what others would consider outlandish

intellectual discussions. Office life in graduate school would not have been the same without

Maciek Swiech and Marcel Flores, who were both crucial sources of lighthearted antics,

fascinating discussions, and delicious Matt Newtons. I will sincerely miss having you all

around. I have to thank Jose Rodriguez, Spencer Cook, and Trevor Cook as well for their

friendship and for expanding my intellectual horizons to radically different areas.

My family has been instrumental in their constant encouragement and unyielding

confidence in me. I am truly lucky to have such unwavering support, and no words are

enough to thank them. I am ever grateful to my mother, Mary Catherine Hale, who always

seemed to guide me in the right direction and who believed in me even when I could

not believe in myself. To my father, Chris Hale, who taught me to be curious and how to

appreciate the value of hard work and determination. To my sister Stacy Hale, who colored

my imagination and instilled in me a sense of wonder. To my grandmother Catherine Hill, who has always been a voice of comfort, and who incidentally helped me write my first

research paper. To my uncle, Mark Hill, who first introduced me to computers and likely

brought out the nerd in me.

Finally, I would like to thank Leah Pickett for her unconditional love and support, her profound kindness, and her steadfast encouragement, without which I could not have

finished the long journey towards a Ph.D.

Preface

At the time of this writing, the United States Department of Energy (DoE) had begun

to take significant steps to catalyze research and development focused on realizing a

system operating at an ExaFLOP (10^18 FLOPS), with the intention of

having the machine come online roughly in the 2020 time frame. This effort is now

colloquially referred to as “Exascale.” The work described in this dissertation largely fits within the context of that effort. The DoE determined that focus in various sub-areas in

high-performance computing would be necessary to make this goal possible. Researchers

in one such area were given the task of exploring new system software directions—mainly

at the level of the operating system and runtime system (OS/R)—for the Exascale machine.

The work on hybrid runtimes discussed in this text was primarily carried out with funding

from the Hobbes project1 (one of the large multi-institutional projects in the OS/R group)

and with funding from the National Science Foundation2. Readers aware of other lines

of investigation within Hobbes will find that this work lies on the more exploratory side

of that effort. I will note that it began as a toy project, and only started in earnest as a

1. Provided from Sandia National Laboratories and funded by the 2013 Exascale Operating and Runtime Systems Program under the Office of Advanced Scientific Computing Research in the United States Department of Energy’s Office of Science, http://hobbes.xstack.sandia.gov
2. United States National Science Foundation, grant number CCF-1533560

research project after a rather fortuitous, ad hoc brainstorming session with my advisor.

The earlier work described in the appendices was carried out within the context of the

V3VEE project3, an effort targeted at building a modern virtual machine monitor (VMM)

framework for high-performance systems. Both the GEARS and guarded module systems were developed with support from the NSF4 and the DoE5.

This dissertation will be of broad interest to those interested in the structure of system

software aimed at massive parallelism. While the supercomputing arena has faced the

significant challenges of parallelism for some time now, parallelism has already become pervasive in commodity devices, including desktop machines and mobile devices. Con-

temporary desktop and machines often employ eight or more CPU cores. High-end

servers available on the market now have more than one hundred cores. I expect the trend

of increasing levels of parallelism to continue, along with the need for exploratory efforts

in system software targeting parallelism.

This text may also be useful as a reference for those interested in low-level

benchmarking of kernel components and for those intrigued by the role of runtime systems in the

context of the larger system software stack.

The contents of Chapter 3 were published in the proceedings of the 24th International

ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC) in

June of 2015. The contents of Chapter 5 appeared originally in the proceedings of the 12th

ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments in

April of 2016. A portion of the work that appears in that chapter was carried out with the

help of Madhav Suresh and Conor Hetland, and the design of the hybrid virtual machine is primarily due to my advisor, Peter Dinda. The material in Chapter 7 was first published in the proceedings of HPDC 2016, and was co-authored with Conor Hetland. The contents of Appendix B first appeared in the proceedings of the 9th ACM International Conference on Autonomic Computing (ICAC) in September of 2012, and were carried out with the assistance of Lei Xia. The work on guarded modules, which appears in Appendix C, was first published in the proceedings of ICAC in June of 2014.

3. http://v3vee.org
4. NSF grant number CNS-0709168
5. DoE grant number DE-SC0005343

List of Abbreviations

ABI Application Binary Interface

ACPI Advanced Configuration and Power Interface

ADT Action Descriptor Table

API Application Programming Interface

APIC Advanced Programmable Interrupt Controller

AVX Advanced Vector Extensions (Intel ISA)

BAR Base Address Register

BIOS Basic Input/Output System

BSP Bootstrap Processor

CDF Cumulative Distribution Function

CMP Chip Multiprocessor

CPU Central Processing Unit

CUDA Compute Unified Device Architecture (NVIDIA)

DMA Direct Memory Access

DOE Department of Energy

DVFS Dynamic Voltage and Frequency Scaling

ELF Executable and Linkable Format

FLOPS Floating Point Operations Per Second

GDT Global Descriptor Table

GEARS Guest Examination and Revision Services

GPGPU General-Purpose Graphics Processing Unit

GPU Graphics Processing Unit

HAL Hardware Abstraction Layer

HPC High-Performance Computing

HPET High-Precision Event Timer

HRT Hybrid Runtime

HVM Hybrid Virtual Machine

ICR Interrupt Command Register

IDT Interrupt Descriptor Table

IOAPIC I/O Advanced Programmable Interrupt Controller

IOMMU Input-Output Memory Management Unit

IPC Inter-Process Communication

IPI Inter-Processor Interrupt

IRQ Interrupt Request

IST Interrupt Stack Table

JIT Just-in-Time (compilation)

LWK Lightweight Kernel

MPI Message Passing Interface

MPSS Manycore Platform Software Stack (Intel Xeon Phi)

MSI Message Signaled Interrupt

MSR Model-Specific Register

MTU Maximum Transmission Unit

NIC Network Interface Card

NPTL Native POSIX Thread Library

NUMA Non-Uniform Memory Access

OS Operating System

PC Program Counter

PCI Peripheral Component Interconnect

PDE Partial Differential Equation

PGAS Partitioned Global Address Space

POSIX Portable Operating System Interface

RAM Random Access Memory

RDMA Remote Direct Memory Access

ROS Regular Operating System

RTM Restricted Transactional Memory (Intel ISA)

SASOS Single Address Space Operating System

SIMD Single Instruction Multiple Data

SIPI Startup Inter-Processor Interrupt

TCP Transmission Control Protocol

TLB Translation Lookaside Buffer

TSC Time Stamp Counter

TSS Task State Segment

TSX Transactional Synchronization Extensions (Intel ISA)

UDP User Datagram Protocol

VM Virtual Machine

VMM Virtual Machine Monitor

Dedication

To all who inspired me.

Sine te nihil sum.

Table of Contents

Table of Contents 17

List of Figures 23

1 Introduction 27

1.1 Runtime Systems ...... 29

1.2 Regular Operating Systems ...... 31

1.3 Lightweight Kernels ...... 32

1.4 Issues with Runtimes, ROSes, and LWKs...... 33

1.5 Hybrid Runtimes ...... 35

1.6 Benefits and Drawbacks of HRTs ...... 37

1.7 Systems to Support and Evaluate the HRT Model...... 38

1.7.1 Nautilus ...... 39

1.7.2 Hybrid Virtual Machines...... 39

1.7.3 Nemo Event System ...... 40

1.7.4 Multiverse Toolchain...... 40

1.8 Results ...... 41

1.9 Outline of Dissertation...... 42

2 Regular Operating Systems 44

2.1 ROS Kernels...... 45

2.2 ROS Kernel Architecture...... 47

2.2.1 Ringed Organization...... 49

2.2.2 Architecture Independence ...... 49

2.2.3 Modules and Drivers...... 50

2.3 Kernel Abstractions...... 51

2.3.1 Processes...... 51

2.3.2 Address Spaces...... 53

2.3.3 Threads...... 54

2.3.4 Files and Directories ...... 55

2.3.5 Communication...... 56

2.3.6 Synchronization and Mutual Exclusion Primitives...... 58

2.4 Operation...... 60

2.5 Lightweight Kernels ...... 62

2.6 ...... 64

2.7 Limitations...... 66

3 Hybrid Runtime Model 69

3.1 Argument ...... 69

3.2 Nautilus ...... 73

3.3 Example: Legion...... 75

3.3.1 Evaluation...... 77

3.3.2 Legion Circuit Simulator...... 79

3.3.3 Legion HPCG...... 81

3.4 Example: NESL ...... 82

3.5 Example: NDPC...... 84

3.6 Conclusions ...... 86

4 Nautilus Aerokernel Framework 88

4.1 Nautilus ...... 89

4.2 Design ...... 90

4.2.1 Threads...... 91

4.2.2 Synchronization and Events...... 92

4.2.3 Topology and Memory Allocation ...... 93

4.2.4 Paging ...... 94

4.2.5 Timers ...... 96

4.2.6 ...... 96

4.2.7 API for Porting ...... 96

4.3 Xeon Phi ...... 97

4.4 Microbenchmarks...... 97

4.4.1 Threads...... 98

4.4.2 Events ...... 102

4.5 Conclusion...... 104

5 Hybrid Virtual Machines 105

5.1 Deployment ...... 106

5.2 Model...... 107

5.3 Protection...... 109

5.4 Merged Address Space...... 110

5.5 Communication...... 111

5.6 Boot and Reboot...... 113

5.7 Conclusion...... 116

6 Nemo Event System 117

6.1 Limits of Event Notifications...... 119

6.1.1 Runtime Events...... 120

6.1.2 Microbenchmarks...... 121

6.1.3 Discussion...... 126

6.2 Nemo Events...... 127

6.2.1 Kernel-mode Condition Variables...... 129

6.2.2 IPIs and Active Messages ...... 133

6.3 Towards Improving the Hardware Limit...... 137

6.4 Conclusions ...... 144

7 Multiverse Toolchain 146

7.1 Multiverse ...... 148

7.1.1 Perspectives...... 148

7.1.2 Techniques...... 148

7.1.3 Usage models...... 151

7.1.4 Aerokernel Overrides...... 154

7.1.5 Toolchain...... 155

7.2 Implementation...... 155

7.2.1 Multiverse Runtime Initialization...... 155

7.2.2 Execution Model ...... 157

7.2.3 Event Channels...... 160

7.2.4 Merged Address Spaces ...... 161

7.2.5 Nautilus Additions...... 162

7.2.6 Complexity ...... 164

7.3 Evaluation ...... 165

7.3.1 Racket ...... 165

7.4 Conclusions ...... 170

8 Related Work 172

8.1 Hybrid Runtimes ...... 172

8.1.1 Kernel Design...... 172

8.1.2 Revival of Ideas...... 179

8.1.3 Operating Systems Targeting HPC...... 180

8.1.4 Parallel Languages and Runtimes...... 180

8.2 Nemo Event System...... 184

8.3 HVM and Multiverse...... 185

9 Conclusion 188

9.1 Summary of Contributions...... 190

9.2 Future Work...... 194

9.2.1 Hybrid Runtimes...... 194

9.2.2 Nautilus ...... 195

9.2.3 OS Architectures ...... 196

9.2.4 Languages...... 197

References 198

Appendices 216

A OS Issues as Stated by Legion Runtime Developers 217

B GEARS 222

B.1 Guest-context Virtual Services...... 224

B.2 GEARS Design and Evaluation ...... 227

B.3 VNET/P Accelerator ...... 231

B.4 MPI Accelerator...... 234

B.5 Conclusions ...... 239

C Guarded Modules 240

C.1 Trust and Threat Models; Invariants ...... 242

C.2 Design and Implementation...... 244

C.3 Evaluation ...... 249

C.4 Selectively Privileged PCI Passthrough...... 252

C.5 Selectively Privileged MWAIT...... 256

C.6 Conclusions ...... 258

List of Figures

1.1 Typical system organization...... 34

1.2 HRT system organization ...... 35

2.1 Ringed architecture...... 48

2.2 Architecture independence...... 50

2.3 Address space of a Linux process...... 52

3.1 Overview of hybrid runtime (HRT) approach...... 72

3.2 Lines of code in Nautilus for Legion, NDPC, and NESL...... 74

3.3 Legion architecture...... 76

3.4 Running time of Legion circuit simulator ...... 78

3.5 Legion circuit simulator speedup in Nautilus...... 79

3.6 Legion circuit speedup with interrupt control...... 80

3.7 Nautilus speedup of Legion HPCG...... 82

3.8 Source lines of code for NESL...... 82

3.9 Source lines of code for NDPC...... 84

4.1 Structure of Nautilus...... 90

4.2 TLB misses for HPCG...... 95

4.3 Thread creation latency...... 98

4.4 Thread latency...... 100

4.5 Event wakeup latency ...... 101

4.6 CDF of IPI one-way latencies ...... 103

5.1 Merged address space ...... 110

5.2 Round-trip latencies of ROS↔HRT interactions (x64) ...... 112

5.3 HRT reboot latencies in context (x64)...... 114

6.1 Unicast event wakeups...... 121

6.2 Broadcast event wakeups ...... 124

6.3 Broadcast deviations vs. IPIs (x64) ...... 125

6.4 Broadcast deviations vs. IPIs (phi) ...... 126

6.5 CDF of unicast IPI latency (x64)...... 127

6.6 CDF of unicast IPI latency (phi)...... 128

6.7 Nemo unicast event mechanisms (x64)...... 130

6.8 Nemo unicast event mechanisms (phi)...... 131

6.9 Nemo broadcast events (x64) ...... 132

6.10 Nemo broadcast events (phi) ...... 133

6.11 CDF of Nemo broadcast wakeups (x64) ...... 134

6.12 CDF of Nemo broadcast wakeups (phi) ...... 135

6.13 Nemo Active Message event wakeups...... 135

6.14 CDF of Nemo unicast Active Message events (x64) ...... 137

6.15 CDF of Nemo unicast Active Message events (phi) ...... 138

6.16 Broadcast latency of Nemo Active Message events ...... 139

6.17 CDF of Nemo broadcast Active Message events (x64) ...... 139

6.18 CDF of Nemo broadcast Active Message events (phi) ...... 140

6.19 CDF of unicast IPIs vs. memory polling (x64)...... 141

6.20 IPI cost breakdown (x64)...... 141

6.21 CDF of “remote syscall” mechanism ...... 143

7.1 Split execution in Multiverse...... 150

7.2 Round-trip latencies of ROS↔HRT interactions...... 150

7.3 User code in the accelerator model ...... 152

7.4 Accelerator model code with overrides...... 154

7.5 Nautilus boot process...... 156

7.6 Interactions within an execution group...... 158

7.7 Source Lines of Code for Multiverse ...... 164

7.8 System utilization for Racket benchmarks...... 165

7.9 Performance of Racket benchmarks...... 167

7.10 Multiverse event forwarding overheads...... 169

7.11 Performance of nbody and spectral-norm benchmarks ...... 169

B.1 GEARS services in two parts...... 226

B.2 Selective exiting syscall latency (getpid)...... 229

B.3 GEARS implementation complexity ...... 230

B.4 Implementation of prototype VNET/P Accelerator ...... 232

B.5 Implementation complexity of VNET/P Accelerator ...... 233

B.6 VNET/P Accelerator results...... 233

B.7 Implementation of MPI Accelerator for co-located VMs ...... 236

B.8 Performance of MPI Accelerator...... 237

B.9 Implementation complexity of MPI Accelerator...... 238

C.1 Guarded module big picture...... 245

C.2 Guarded module operation ...... 247

C.3 Entry wrapper for valid entry point...... 248

C.4 Border Control...... 249

C.5 Privilege change cost with stack integrity checks...... 250

C.6 Selectively privileged operation of PCI device...... 253

C.7 Border crossings for NIC example...... 255

C.8 Direct access to MWAIT ...... 257

Chapter 1

Introduction

Modern parallel runtimes are systems that operate in user mode and run above the system

call interface of a general-purpose kernel. While this facilitates portability and simplifies

the creation of some functionality, it also has consequences that warp the design and

implementation of the runtime and affect its performance, efficiency, and scalability. First,

the runtime is deprived of the use of hardware features that are available only in kernel

mode. This is a large portion of the machine. The second consequence is that the runtime

must use the abstractions provided by the kernel, even if the abstractions are inadequate.

For example, the runtime might need subset barriers, and be forced to build them out of

mutexes. Finally, the kernel has minimal access to the information available to the parallel

runtime or to the language implementation it supports. For example, the runtime might

not require coherence, but get it anyway.
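As a concrete illustration of the second consequence, consider a "subset barrier" assembled from the primitives a ROS actually exposes. The sketch below is illustrative only (it is not code from Legion or any other runtime discussed here); it shows a barrier for a subset of threads built from a POSIX mutex and condition variable, where every arrival and wakeup is mediated by the kernel's generic sleep/wake machinery.

    #include <pthread.h>

    /* Illustrative sketch: a barrier for a subset of a runtime's threads,
     * built out of the mutex and condition variable a ROS provides. */
    struct subset_barrier {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        int             count;      /* threads still expected at the barrier */
        int             generation; /* distinguishes successive episodes     */
        int             size;       /* number of participating threads       */
    };

    static void subset_barrier_init(struct subset_barrier *b, int nthreads)
    {
        pthread_mutex_init(&b->lock, NULL);
        pthread_cond_init(&b->cond, NULL);
        b->count      = nthreads;
        b->size       = nthreads;
        b->generation = 0;
    }

    static void subset_barrier_wait(struct subset_barrier *b)
    {
        pthread_mutex_lock(&b->lock);
        int gen = b->generation;
        if (--b->count == 0) {              /* last arrival releases the rest */
            b->generation++;
            b->count = b->size;
            pthread_cond_broadcast(&b->cond);
        } else {
            while (gen == b->generation)    /* kernel-mediated sleep/wake     */
                pthread_cond_wait(&b->cond, &b->lock);
        }
        pthread_mutex_unlock(&b->lock);
    }

Every pthread_cond_wait() and broadcast here ultimately traps into the kernel, which has no knowledge of the runtime's synchronization semantics; this is exactly the kind of mismatch the chapters that follow quantify.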

The complexity of modern hardware is rapidly growing. In high-end computing, it is widely anticipated that Exascale machines will have at least 1000-way parallelism at the

node level. Even today’s high-end homogeneous nodes, such as the one used for several

evaluations in this dissertation, have 64 or more cores or hardware threads arranged on

top of a complex intertwined cache hierarchy that terminates in 8 or more memory zones with non-uniform access. Today’s heterogeneous nodes include accelerators, such as the

Intel Xeon Phi, that introduce additional coherence domains and memory systems. Server platforms for cloud and datacenter computing, and even desktop and mobile platforms are

seeing this simultaneous explosion of hardware complexity and the need for parallelism

to take advantage of the hardware. How to make such complex hardware programmable,

in parallel, by mere humans is an acknowledged open challenge.

Some modern runtimes, such as the Legion runtime [21, 189], already address this

challenge by creating abstractions that programmers or the compilers of high-level languages

can target. Very high-level parallel languages can let us further decouple the expression of parallelism from its implementation. Parallel runtimes such as Legion, and the runtimes

for specific parallel languages, share many properties with operating system (OS) kernels,

but suffer by not being kernels. With current developments, particularly in virtualization

and hardware partitioning (discussed in Section 8.1.1), we are in a position to remove

this limitation. In this dissertation, I make the claim that transforming parallel runtime

systems into kernels can be advantageous for performance and functionality. I report

on a new model called the hybrid runtime (HRT), which is a combination of a parallel

runtime, the applications it supports, and a kernel, which has full access to hardware and

control over abstractions to that hardware. This dissertation focuses on the claim that the

hybrid runtime model can provide significant benefits, both in terms of performance and

functionality, to parallel runtimes.

1.1 Runtime Systems

Ambiguities in the naming of system software layers warrant a clear, succinct definition for

the term runtime system, as much of this dissertation depends on not just what a runtime

system is, but also on the issues that arise in its design and implementation.

I will use the following definition throughout this text:

Definition 1.1.1. A runtime system is a user-level software component that mediates

between a user-supplied program and its execution environment (namely, the OS and the

hardware). It may or may not implement features of a high-level .

It may be invoked explicitly by a user program or implicitly via invocations inserted by a

compiler.

This definition deserves some elaboration. Note that I have inserted the term user-level

here. This is primarily for precision, and to eliminate the inclusion of operating systems

in the same category, as they share many attributes with runtimes. One simple way to

differentiate them is to realize that—barring the presence of virtualization—there is one

operating system present on the machine, but there may be many runtime systems. Examples include the Python runtime system and Java's runtime system. Both of these runtimes

mediate interactions with the operating system by issuing system calls based on

invocations in the program—for example, opening a file. Both also implement features of the programming languages they support—for example, by managing memory automatically.

Both also implement an execution model distinct from the underlying OS.

The last point of Definition 1.1.1 is instructive in understanding the confusion attached

to runtime systems, as they are, in most cases, implicitly invoked. That is, the programmer

need not be aware of the details of the runtime system’s operation, or in many cases even

its presence. This is the case for the C runtime system, which is primarily invoked via the

operating system or via invocations inserted at the binary level by the C compiler. Other

runtimes like Legion [21] execute only after being explicitly invoked by the programmer.

Using this definition, the underlying machinery that implements a framework, such as

MapReduce [63], may also qualify as a runtime system. In the case of MapReduce, the

map and reduce functions, supplied by the user, can be thought of as the program, and the

underlying machinery that manages the execution of those functions across a cluster can

be thought of as a runtime system.
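A minimal sketch makes this division of labor concrete. The interface and names below are hypothetical (this is not the MapReduce API, and a real implementation distributes the work across a cluster); the point is only that the user supplies map and reduce functions while the surrounding machinery—the runtime—drives their execution.

    #include <stddef.h>

    /* Hypothetical single-node sketch of the MapReduce division of labor:
     * user-supplied map/reduce callbacks are the "program"; the loop that
     * drives them plays the role of the runtime system. */
    typedef long (*map_fn)(long input);
    typedef long (*reduce_fn)(long accum, long mapped);

    static long toy_mapreduce(const long *inputs, size_t n,
                              map_fn map, reduce_fn reduce, long init)
    {
        long accum = init;
        for (size_t i = 0; i < n; i++) {
            accum = reduce(accum, map(inputs[i]));  /* runtime drives user code */
        }
        return accum;
    }

    /* Example user "program": sum of squares.                               */
    static long square(long x)      { return x * x; }
    static long add(long a, long b) { return a + b; }
    /* usage: long result = toy_mapreduce(data, n, square, add, 0);          */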

While runtime systems can vary widely in the services they provide, some common

examples include memory management, execution models not natively provided by the

OS (such as user-level threads), dynamic linking of program components (or modules),

event handling, Just-in-time (JIT) compilation, user-level task scheduling, and so on. The

nature of such services will become important in Chapter 3, when I discuss parallel runtime

systems.

Operating systems can provide similar services, and this can precipitate

incompatibilities, some of which I outlined earlier in this chapter. For example, a runtime system

implementing a bulk-synchronous, parallel execution model might require deterministic

timing between threads, but the OS might not guarantee such operation natively. In another

example, a runtime that has high-level semantic knowledge of memory access patterns

might be at odds with an operating system that provides an unpredictable, demand-paged system.

1.2 Regular Operating Systems

OSes have evolved rapidly over the last half century as computer systems became more programmable. I will provide a brief introduction to modern OSes in Chapter 2. In this

section, I will provide some brief background on parts of the operating system architecture

that motivate the HRT concept.

Most modern, general-purpose OSes (I will refer to these from this point forward as

regular operating systems, or ROS) separate execution into two major privilege modes,

user-mode and kernel-mode. This allows the ROS kernel to form a root of trust that can safely

manage the system and keep it running, even in the face of faulty or malicious user-space programs.

A ROS will also typically provide several abstractions that make it easier to program

and use the machine. These include virtual memory, processes (for multi-programmed

operation), threads, files, system calls, and more. Most ROSes also expose a limited API

to applications to make programming more convenient. Such APIs are typically exposed

to applications in the form of system calls. In addition to abstractions and APIs, a ROS will often include a varied assortment of device drivers to make them compatible with

many systems with diverse hardware. To simplify the base kernel, device drivers and

components that may not be necessary to a wide user base are often provided in the form

of loadable kernel modules. This is true of most ROSes, including Linux, macOS, and

Windows.

Most ROSes are designed to be highly modular, and will include hardware abstraction

layers that make them portable across many different systems. Even so, they provide

adequate performance to a wide array of applications. However, this portability does

come at a cost, and—in addition to other features—contributes to their being suboptimal

for many high-performance systems, especially those designed to exploit massive levels of parallelism.

1.3 Lightweight Kernels

In a high-performance computing environment, such as a supercomputer, the execution of

the application takes foremost precedence. Large-scale parallel applications can run on

these systems for months at a time, and overheads in their execution can become magnified

due to their scale both in time and space.

An important example is an application written in the bulk-synchronous parallel

model (BSP) [85], wherein processes within the application compute in parallel, perform

a communication step to gather results in some manner, then continue. If one node

experiences a perturbation (for example, from the OS handling an interrupt from a device),

all the other nodes must wait as well. Execution, therefore, proceeds at the pace of the

slowest compute node. If this occurs regularly (it often does), and the effect is multiplied

across thousands of nodes (it often is), the amount of wasted compute time can become

considerable [74, 75, 101].
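The shape of such an application is easy to see in code. The sketch below is illustrative only: it assumes MPI for the communication step, and do_local_work() is a hypothetical stand-in for the compute phase. Because the collective cannot complete until every rank arrives, a perturbation that delays one rank delays all of them.

    #include <mpi.h>

    /* Illustrative BSP-style loop: compute, then a global communication step. */
    static double do_local_work(int rank, int step)
    {
        return (double)(rank + step);   /* placeholder for the compute phase */
    }

    int main(int argc, char **argv)
    {
        int rank, nranks;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double local, global;
        for (int step = 0; step < 100; step++) {
            local = do_local_work(rank, step);              /* compute        */
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE,   /* gather results */
                          MPI_SUM, MPI_COMM_WORLD);
            /* every rank now proceeds at the pace of the slowest one */
        }

        MPI_Finalize();
        return 0;
    }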

Nondeterministic perturbations like the interrupt example mentioned above are a

hallmark of program execution within a ROS. To tackle this problem, OS researchers in

the HPC community answered with a simplified kernel design called a lightweight kernel

(LWK). LWKs forego much of the functionality present in a ROS that makes it portable and very easy to use. In lieu of convenience, an LWK design will favor raw performance. LWKs

do not typically have a broad range of device drivers, nor do they employ complicated hardware abstraction layers or comprehensive system call interfaces. However, they do still supply a limited system call API to the application. This design is intended to limit the kernel to providing only the minimal abstractions necessary to support the desired applications. Such a minimal design not only simplifies development, but also secondarily reduces or eliminates “OS noise” during the application’s execution, e.g. from external interrupts. Notable LWKs that have enjoyed some success include Sandia’s Kitten [129], the Catamount [118] LWK, IBM’s Compute Node Kernel (CNK) [86], and Cray’s Compute

Node Linux (CNL).

LWKs work very well for supercomputing sites, as they have a fairly small set of applications they must support, and the hardware present in the system is known ahead of time. However, there is still a lost opportunity for LWKs, as they retain the overall architecture of a ROS. That is, they still use many of the same abstractions, and they still relegate the runtime system (and application) to user-space, thus limiting its power.

1.4 Issues with Runtimes, ROSes, and LWKs

In this dissertation, I argue that for the specific case of a parallel runtime, the user/kernel abstraction itself, which dates back to Multics, is not a good one. It is important to understand the kernel/user abstraction and its implications. This abstraction is an incredibly useful technique to enforce isolation and protection for processes, both from attackers and from each other. This not only enables increased security, but also reduces the impact of bugs and errors on the part of the programmer. Instead, programmers place a higher level of trust in the kernel, which, by virtue of its smaller codebase and careful design, ensures that the machine remains uncompromised. In this model, which is depicted in


Figure 1.1: Typical system organization.

Figure 1.1, the privileged OS kernel sits directly on top of hardware, and supports runtimes,

applications, and libraries which execute at user-level. Because the kernel (lightweight

or otherwise) must be all things to all processes, it has grown dramatically over time, as

has its responsibilities within the system. This has forced kernel developers to provide a

broad range of services to an even broader range of applications. At the same time, the

model and core services have necessarily ossified in order to maintain compatibility with the widest range of hardware and software. In ROSes, and to a lesser extent LWKs,

the needs of parallelism and parallel runtimes have not been first-order concerns.

Runtime implementors often complain about the limitations imposed by ROSes. While

there are many examples of significant performance enhancements within ROSes, and

others are certainly possible to support parallel runtimes better, a parallel runtime as

a user-level component is fundamentally constrained by the kernel/user abstraction. In

contrast, as a kernel, a parallel runtime would have full access to all hardware features of

the machine, and the ability to create any abstractions that it needs using those features.

Figure 1.2: System organization in which the runtime controls the machine.

Such a model is depicted in Figure 1.2. Here, the “kernel” is a minimal layer on top of

the hardware that provides services to the runtime, essentially as a library would. The

runtime can build on those to create its own abstractions and services (to be used by

the application), or it can ignore them. If required, the runtime can access the hardware

directly. I will show in this dissertation that, in fact, breaking free from the user/kernel

abstraction with HRTs can produce measurable benefits for parallel runtimes.
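To make "full access to all hardware features" concrete, the sketch below shows the kind of operation that is simply unavailable to a user-level runtime on x86-64: masking local interrupts around a time-critical section and reading a model-specific register. This is illustrative only (it is not code from Nautilus); executed at user level, each of these instructions raises a general-protection fault, while a runtime running as a kernel can use them directly.

    #include <stdint.h>

    /* Ring-0-only operations on x86-64 (illustrative sketch). */
    static inline void disable_local_interrupts(void) { __asm__ volatile("cli" ::: "memory"); }
    static inline void enable_local_interrupts(void)  { __asm__ volatile("sti" ::: "memory"); }

    static inline uint64_t read_msr(uint32_t msr)
    {
        uint32_t lo, hi;
        __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Example: keep a latency-sensitive runtime section free of interrupt noise. */
    static void runtime_critical_section(void (*body)(void))
    {
        disable_local_interrupts();
        body();                  /* no device or timer interrupts land here */
        enable_local_interrupts();
    }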

1.5 Hybrid Runtimes

Hybrid runtimes (HRTs) can alleviate the issues faced by ROSes and LWKs by treating the

runtime system as a kernel, with full access to typically restricted hardware, and complete

control over the abstractions over that hardware. An instance of an HRT consists of a

runtime system combined with a lightweight kernel framework, called an Aerokernel, that provides services to the runtime, much like a library OS. These services can be used, 36

ignored, augmented, or replaced by the runtime developer. I will refer to a general model where a runtime and an Aerokernel framework are combined as the hybrid runtime model

(or the HRT model). The Aerokernel framework simply serves as a starting point, with

familiar interfaces, so a runtime developer can more easily adopt the hybrid runtime

model. Unlike within a ROS environment, the hybrid runtime executes in kernel-mode,

and unnecessary abstractions are avoided by default.

The primary claim of this dissertation is that the hybrid runtime model can provide

significant benefits to parallel runtimes and the applications that run on top of them. I will expand on this claim in Chapter 3 and then substantiate it in subsequent chapters by providing performance evaluations of systems that support the HRT model.

The long term vision for hybrid runtimes is for them to:

• outperform their ROS counterparts for parallel runtimes.

• support creation on demand rather than building them manually.

• promote experimentation with unconventional uses of privileged hardware.

• be easy to construct (even starting from an existing runtime system).

• operate with low overhead in virtualized environments.

• work in tandem with a ROS environment.

To the user, booting an HRT would look like starting a process from a ROS, but with

special meaning—the HRT would take over some portion of the machine and use it solely

for the runtime and application that it supports. That is, the HRT would not be used in the

same manner that a ROS is, e.g. for management of the system. Rather, it would be used

like a “software accelerator”. This environment would be used only when performance

is critical or unconventional functionality is required, and it could look very different

from the software running on the ROS. Ideally, constructing such an HRT would not be

tantamount to writing an OS kernel, and experimentation would be rather easy for the

runtime developer. HRTs will mesh well with virtualization, avoiding overheads that arise

from the heavy-weight operation of a ROS. Furthermore, an HRT would enjoy a rather

symbiotic relationship with the ROS by borrowing more complex functionality from the

latter when required. I will describe steps taken toward these goals in Chapters 3–7.

1.6 Benefits and Drawbacks of HRTs

Hybrid runtimes can enjoy considerably increased performance over their ROS-based

counterparts. I will show in subsequent chapters how we can achieve up to a 40% speedup

on the HPCG mini-app [67, 97] within the Legion runtime ported to the HRT model.

While performance is, of course, an important factor, HRTs provide other benefits as well. They allow runtime developers more flexibility in designing the codebase, avoiding

the a priori imposition of mechanisms and policies that may be at odds with the needs

of the runtime. HRTs also allow runtime developers to experiment with kernel-level privileges by taking advantage of hardware features that are usually unavailable at user-

level. The hope is that this will lead to radically different parallel runtime designs that also perform well.

While HRTs offer some important benefits, their structure has some deleterious effects

that render them unsuitable for general-purpose use. First, because HRTs by default have

no notion of privilege separation, their deployment on raw hardware can create security

Another potential drawback for HRTs is that they could be difficult to construct be- cause of their low-level nature and tight connection to kernel-level programming. While runtime developers are often sophisticated programmers, they may not be familiar with the idiosyncrasies of kernel development. I will discuss solutions to address this difficulty in Chapter4. Similarly, debugging an HRT comes with its own set of complications due to its kernel-mode operation. While we have not undergone significant work in this area, its importance cannot be underestimated. Creating production quality HRTs, even in a bleeding edge domain like supercomputing, will require sophisticated debugging facilities.

I will describe avenues that could lead to such debugging tools in Chapters5 and7.

1.7 Systems to Support and Evaluate the HRT Model

My work on HRTs has primarily focused on building real systems that realize the benefits that the HRT model can provide. These systems center around a small kernel framework named Nautilus.

1.7.1 Nautilus

Nautilus is an example of an Aerokernel, a very thin kernel framework intended to enable

hybrid runtimes. I will introduce Nautilus in detail in Chapter 3, but in this section I will

guide the reader through a cursory examination of its structure.

Two essential mechanisms allow Nautilus to enable the creation of hybrid runtimes:

1. It provides a default set of fast services that will be familiar to runtime developers.

2. It relinquishes full control of the machine to the runtime system.

Nautilus runs on commodity x64 hardware and on Intel’s Xeon Phi accelerator card. It

is a Multiboot2-based kernel that supports NUMA systems and many-core machines.
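For readers unfamiliar with Multiboot2, the handshake a compliant bootloader (such as GRUB 2) looks for is small: the kernel image embeds a header with a fixed magic value and checksum. The sketch below is illustrative only—it is not the header Nautilus actually uses, and the ".multiboot" section name is an assumption that depends on the linker script—but it shows the shape of that agreement.

    #include <stdint.h>

    /* Illustrative Multiboot2 header (not Nautilus's actual header). */
    struct multiboot2_header {
        uint32_t magic;          /* 0xE85250D6 identifies a Multiboot2 header    */
        uint32_t architecture;   /* 0 = i386 protected-mode entry                */
        uint32_t header_length;  /* total header size in bytes                   */
        uint32_t checksum;       /* magic + architecture + header_length +
                                    checksum must equal 0 (mod 2^32)             */
        /* terminating end tag */
        uint16_t end_type;       /* 0 */
        uint16_t end_flags;      /* 0 */
        uint32_t end_size;       /* 8 */
    } __attribute__((packed, aligned(8)));

    static const struct multiboot2_header mb2_header
    __attribute__((section(".multiboot"), used)) = {
        .magic         = 0xE85250D6,
        .architecture  = 0,
        .header_length = sizeof(struct multiboot2_header),
        .checksum      = (uint32_t)(0U - (0xE85250D6U + 0U +
                                          (uint32_t)sizeof(struct multiboot2_header))),
        .end_type      = 0,
        .end_flags     = 0,
        .end_size      = 8,
    };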

1.7.2 Hybrid Virtual Machines

While Nautilus can run on bare metal (for example, on a server or a supercomputer node),

and on commodity virtual machine monitors (VMMs) such as KVM, VMware, and QEMU,

it still requires that the runtime using it be ported. As will be discussed further in Chapter 7,

this porting process is no small endeavor. It would therefore be advantageous for a runtime

developer to be able to leverage ROS functionality (e.g. from Linux). A hybrid virtual

machine (HVM) enables this by partitioning a single VM among two environments. One

consists of a ROS environment, such as Linux, and the other is a hybrid runtime. With an

HVM, functionality not present in Nautilus (or another Aerokernel) can be provided by

the ROS. This is done with communication facilitated by the VMM.

Hybrid virtual machines are intended to make it easier for a runtime developer to

start adopting the HRT model. We implemented support for HVMs in our Palacios VMM.

HVMs are introduced in detail in Chapter 5.

1.7.3 Nemo Event System

One common abstraction that many runtimes support is that of an asynchronous software

event. If a runtime (or application) uses such events heavily, the inadequacy of the event

system’s implementation could be a serious issue. In Chapter 6, I will introduce a

specialized event system within Nautilus called Nemo that leverages the benefits of the

HRT model to reduce event signaling latencies by several orders of magnitude. Portions

of this speedup come from the simplicity and fast implementation of Nautilus primitives, while other portions come from the ability to leverage restricted hardware (control over

interrupts) directly.
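The following is not Nemo's implementation, but a sketch of the kind of hardware-level notification path that interrupt control makes available: rather than asking the OS to wake a sleeping thread, a kernel-mode runtime can write the local APIC's interrupt command register (ICR) to send an inter-processor interrupt (IPI) directly to another core. It assumes the xAPIC has been memory-mapped at apic_base during boot.

    #include <stdint.h>

    /* Illustrative kernel-mode event notification via an IPI (not Nemo's code). */
    #define APIC_REG_ICR_LO  0x300   /* vector and delivery mode                */
    #define APIC_REG_ICR_HI  0x310   /* destination APIC ID in bits 24..31      */

    static volatile uint32_t *apic_base;  /* assumed mapped elsewhere at boot    */

    static inline void apic_write(uint32_t reg, uint32_t val)
    {
        apic_base[reg / 4] = val;
    }

    static void send_event_ipi(uint8_t dest_apic_id, uint8_t vector)
    {
        apic_write(APIC_REG_ICR_HI, (uint32_t)dest_apic_id << 24);
        /* fixed delivery mode, physical destination; writing the low half of
         * the ICR triggers the interrupt on the destination core */
        apic_write(APIC_REG_ICR_LO, vector);
    }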

1.7.4 Multiverse Toolchain

To adopt the hybrid runtime model, runtime system developers must currently port their

system from its current OS environment to work with an Aerokernel (such as Nautilus).

While Nautilus is designed to make this easier (e.g., by providing familiar UNIX-like

interfaces), porting a complex user-space code to run with the privilege of a kernel is not

a trivial task. Debugging can be quite difficult, and errors challenging to identify. While

the HVM alleviates this issue slightly by allowing the developer to leverage existing ROS

functionality, the difficulty of a manual port still acts as an obstacle.

To address this issue, I explore a concept called automatic hybridization, whereby a Linux

user-space runtime is automatically transformed into a hybrid runtime simply by

rebuilding the codebase. We developed an initial implementation of automatic hybridization

called Multiverse, which carries out this transformation for runtimes using the Nautilus

Aerokernel. Multiverse will be introduced in detail in Chapter 7.

1.8 Results

Our work on hybrid runtimes has shown considerable promise for the HRT model. Initial

experiments with the Legion runtime coupled with an example application and the HPCG

mini-app show up to 40% performance improvement simply by porting to the HRT model;

this does not include the utilization of privileged-mode hardware features.

Extensive microbenchmarks of Nautilus facilities show significant improvements over

Linux, for example with thread creation, context switching, and concurrent programming

abstractions like condition variables.
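For orientation, the sketch below shows the style of user-space baseline such comparisons are made against: timing thread creation and join under Linux with pthreads. It is only illustrative of the measurement approach, not the actual benchmark code used in the evaluations described later.

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    /* Illustrative user-space baseline: average pthread create+join latency. */
    static void *empty_thread(void *arg) { return arg; }

    int main(void)
    {
        enum { TRIALS = 1000 };
        struct timespec start, end;
        pthread_t tid;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < TRIALS; i++) {
            pthread_create(&tid, NULL, empty_thread, NULL);
            pthread_join(tid, NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9 +
                    (end.tv_nsec - start.tv_nsec);
        printf("avg create+join latency: %.0f ns\n", ns / TRIALS);
        return 0;
    }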

The HVM deployment model of HRTs can leverage ROS functionality with very little

overhead. Interactions between a ROS and HRT can occur with latencies on the order of

hundreds of nanoseconds. This is within the realm of a typical system call in a ROS.

The Nemo event system can reduce asynchronous software event latency by several

orders of magnitude, and when leveraging restricted hardware features, can approach the

limits of the hardware capabilities.

Finally, Multiverse can transform a complex, Linux user-space runtime into a hybrid

runtime and leverage ROS functionality with little to no overheads in performance. This

makes Multiverse a good tool for a runtime developer to begin exploring the HRT model.

1.9 Outline of Dissertation

The remaining text of this dissertation will focus on the motivation for hybrid runtimes,

followed by their design, merits, and implementation. Several systems will be introduced

that demonstrate their promise.

In Chapter 2, I will provide perspectives on regular operating systems (ROSes) by

describing their design, merits, and limitations.

In Chapter 3, I give the detailed case for hybrid runtimes, I outline their benefits, and I

describe initial results showing speedups that result from porting runtimes to the HRT

model.

Chapter 4 comprises a description of the design and implementation of an

Aerokernel framework called Nautilus that is intended to support the construction of high-performance hybrid runtimes. In that chapter I will guide the reader through an extensive

series of microbenchmarks measuring the performance of the light-weight primitives that

make up Nautilus.

In Chapter 5, I introduce the concept of a hybrid virtual machine (HVM), which

simplifies the construction of hybrid runtimes by leveraging a logically partitioned virtual

machine to facilitate communication between an HRT and a legacy OS.

Chapter 6 focuses on a low-latency, asynchronous software event system called Nemo, which is implemented as a part of Nautilus.

Chapter 7 introduces the concept of automatic hybridization as a method of further

simplifying the creation of HRTs. I will describe Multiverse, a system that implements

automatic hybridization for Linux user-space runtimes.

Finally, in Chapter 9, I will summarize the research and software engineering

contributions of the work that comprises this dissertation and outline a series of next steps in the

investigation of hybrid runtimes and specialized, high-performance operating systems.

Appendix A gives a list of OS issues identified by the developers of the Legion runtime.

These served as one motivation for initiating the HRT research effort.

Appendix B contains a description of a system called Guest Examination and Revision

Services (GEARS) that enables guest-context virtual services within a virtual machine. While

unrelated to hybrid runtimes, this work runs in a similar vein to HRTs, as it blurs the lines

between software layers that are typically viewed as being quite rigid.

Appendix C comprises work on guarded modules, a type of guest-context virtual service that is protected from the higher-privileged operating system within which it executes. This work is an expansion of the GEARS system.

Chapter 2

Regular Operating Systems

The operating system (OS) is one of the most powerful and essential elements of a modern

computer system. While a “regular OS” (ROS) serves many purposes, one of its primary

objectives is to provide an execution environment that makes it easier to program the

machine. ROSes come in many flavors, and architectures vary widely. Widely used ROSes

include Linux, Windows, and Apple’s OSX (recently renamed macOS). This chapter is primarily concerned with a common feature of any ROS, the kernel. Throughout the

chapter, I will provide background in operating systems necessary to understand the

contributions of this dissertation. Section 2.1 introduces the concept of an OS kernel and

discusses its salient features. Section 2.3 discusses abstractions typically provided by ROS

kernels. Section 2.4 gives a cursory examination of how a typical ROS operates at runtime.

Section 2.5 introduces lightweight kernels (LWKs) and contrasts their design with ROSes.

Section 2.6 discusses virtualization and its relation to operating systems concepts. Finally,

Section 2.7 concludes the chapter with a discussion of the limits of ROSes and LWKs.

Readers intimately familiar with OS concepts and implementation may skip ahead to that

section.

2.1 ROS Kernels

The roles that a typical ROS kernel plays can be placed into three primary categories:

• Providing abstractions

• Virtualizing resources

• Managing resources

These three roles share the common goal of making a computer system easy to use.

In this context, the first role is probably the most organic in that any large, reasonably written program will be constructed using some of these abstractions. Some common

abstractions in modern operating systems include pages (an abstraction of memory), files

(an abstraction of stored data), and processes (an abstraction of executing instructions;

a thread plays a similar role). Given a modern CPU, it would be quite difficult (if not

impossible) to write software to operate the CPU without creating abstractions. Note that providing abstractions (and interfaces) is also within the purview of libraries, and an OS

that only provides such things can essentially be thought of as a library. Such library OSes

are the subject of considerable research, and will be discussed in Chapter8.

The second role of ROS kernel is to virtualize resources, such as memory or the CPU

itself. Virtual resources provide a logical view of the machine that is easy to reason about

from the application programmer’s perspective. For example, consider a single program

running on a machine. Excepting some esoteric hardware architectures, that program

essentially sees memory as a contiguous series of addressable bytes. When multiple programs run on the machine, however, their usage of memory must be coordinated.

Instead of a complex solution that requires the applications to be aware of this rationing 46

of memory, the ROS kernel can instead provide a virtual abstraction of memory. Each

application can then logically see memory as the expected contiguous array of bytes (using virtual addresses), while the ROS kernel partitions the physical memory behind the scenes.

The ROS kernel will also often virtualize the CPU in a similar fashion. When a CPU

is virtualized, an application can be written in a way that ignores the presence of other programs in the system. In such a multi-programmed system, the ROS kernel can run

several programs by multiplexing a single CPU between many processes, the common

abstraction that an OS uses to represent a running program. Multi-programmed systems,

and the virtualization mechanisms used to realize them, are not without drawbacks. A program that must contend for some resource with another program will necessarily be less

efficient than one that has dedicated access to that resource. The software mechanisms used

to implement virtualized resources also comes with associated overheads, though these

are often mitigated by competent software design and by efficient hardware support (e.g.

hardware paging for virtual memory). Finally, a program that requires some amount of

determinism may be stymied by the policies that a ROS kernel employs when it virtualizes

some resource. For example, a program with strict timing requirements may not be able to

operate correctly on a system where the ROS kernel allows interrupts to perturb program

execution at irregular intervals.

The third role of a ROS kernel is to manage resources in the system. This role is

also brought about by the need to support more than one running program. There are

many resource in a typical computer system that the ROS must manage. These include

external devices, memory, the CPU, files, and many more. A ROS kernel that employs

sound design principles will often manage such resources in ways that separate the

underlying mechanisms that support their management from the policies that determine how they are managed. For example, a kernel will typically have a general mechanism for scheduling processes, but will support many scheduling policies, such as simple round-robin scheduling or completely fair scheduling. In order to manage resources in the system effectively, a ROS must also provide protection. It must protect itself from applications, it must protect applications from each other, and it must protect sensitive components of the system from erroneous or malicious behavior. The mechanisms that a ROS uses to achieve this protection will be discussed in Section 2.2.

A ROS will also provide a programming interface to the application so that it can invoke OS services with system calls. System calls are essentially function calls with an associated set of semantics determined by the OS. In a ringed architecture (also discussed in

Section 2.2), system calls will typically initiate a transition of the CPU into a more privileged mode of operation, namely kernel mode or supervisor mode.

These roles will be important to understand, as hybrid runtimes (HRTs) organize them in a different way than a ROS kernel does. We will see these differences in Chapter 3.

Readers interested in a more detailed treatment of the roles of traditional OSes are directed to classic texts on the subject [9, 186, 178, 12, 179].

2.2 ROS Kernel Architecture

Probably the most common ROS kernel architecture falls into the monolithic kernel category. This architecture has most of the complexity of the operating system implemented inside the kernel itself. Components within a purely monolithic kernel tend to be highly integrated. Contrasted with monolithic kernels are microkernels. In a microkernel design, the size of the trusted kernel is minimized because it only implements basic low-level services.

Figure 2.1: A ringed OS architecture. The kernel has highest privilege and applications operate in the lowest privilege level (an intermediate ring may be used, e.g., for drivers).

More complicated functionality is relegated to modular user-space components that communicate via inter-process communication (IPC). This has several purported benefits, including reliability from the curtailment of errors that are more likely to arise in a complex kernel, flexibility from the ability to use interchangeable components, and extensibility from the modular design. Microkernels enjoy a more elegant design (and are thus more attractive from a software engineering standpoint), but they historically suffered from performance issues due to heavy-weight IPC mechanisms and architectural effects arising solely from their design (such as poor locality). Examples of both monolithic kernels and microkernels can be found in Chapter 8. Some successful ROSes, such as

Linux, take a hybrid approach that blends aspects of both kernel architectures.

2.2.1 Ringed Organization

In order to implement protection, commonly available CPUs provide facilities that allow

the hardware to be placed into different operating modes. For example, in the x86 archi-

tecture, the CPU can be placed into four different privileged rings (ring 0 through ring 3, with the lowest ring enjoying highest privileges). In such a system, the kernel operates at

ring 0 (kernel-mode), and applications execute in the highest numbered ring (ring 3 on x86). Intermediate rings may be set up by the kernel for special purposes. Such a “ringed

organization” is depicted in Figure 2.1.

Privilege changes occur on exceptional events and when the user program invokes

the OS with a system call. The elevated privilege that is given to the kernel (and only the

kernel) allows it access to various components of the system that are not available to a user program. This includes control over devices, control over virtual memory mappings, access

to system tables and registers, control over interrupts, access to I/O instructions, access

to model-specific registers (MSRs), and more. More than a third of the Intel processor

manual is devoted to instructions that require elevated privilege. The implications of this

restriction will be discussed further in Section 2.7.

A processor that supports hardware virtualization augments the ringed structure with

another set of privilege levels below ring 0. This results in a “nested” ring organization, where four rings exist for both the host OS and the guest OS.

2.2.2 Architecture Independence

A ROS that targets multiple platforms or architectures will commonly have its code

organized modularly so that parts of the kernel can plug in to platform- or architecture-specific components. These components are decided on when the kernel is compiled.

Figure 2.2: An architecture-independent kernel organization (general kernel functionality layered over x86-, SPARC-, and ARM-specific functionality). Kernel interfaces are designed to plug in components specific to a particular architecture, typically at compile-time.

This is common in, for example, low-level boot code, where details of the hardware are

important. This is an example of a hardware abstraction layer (HAL). A kernel architecture

employing such a HAL is shown in Figure 2.2.

2.2.3 Modules and Drivers

Extensibility is an important aspect of a ROS. It allows the system to adopt new policies

and for new devices to be supported without rebuilding the kernel. In addition, kernel

developers are relieved of the responsibility of keeping up with new devices as they are

created. The device vendors can instead write code that controls the devices in the form of

a driver. Device drivers and other functionality that might not be necessary for all users

can be built as loadable modules. Users that need that functionality can then insert those

modules into the kernel while using the system. The privilege level at which the module

code runs depends on the OS, but Linux, for example, executes module code at ring zero. A

system with support for such loadable modules is more flexible and customizable. Almost

all commodity ROSes support loadable modules, including Linux, MS Windows, and

macOS.
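To make the module mechanism concrete, the following is a minimal sketch of a Linux loadable kernel module; the module name and messages are illustrative. It is built against the kernel's build system, loaded and unloaded with insmod and rmmod, and, once inserted, runs at ring 0 alongside the rest of the kernel.

/* Minimal sketch of a Linux loadable kernel module ("hello world"). */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void)
{
        pr_info("hello: module loaded\n");
        return 0;               /* a nonzero return would abort the load */
}

static void __exit hello_exit(void)
{
        pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);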

Loadable modules, for all their flexibility and convenience, have drawbacks as well.

They often contain bugs, and can introduce vulnerabilities into an otherwise secure system.

To address this issue, a cautiously designed module system will limit the damage that a

module’s code can inflict on the core kernel. Linux accomplishes this by exporting only a

subset of the kernel name space to modules. While this limits errant behavior and abuse of

kernel interfaces, a module can still, in principle, invoke any part of the kernel by crafting

addresses cleverly. This is a fundamental weakness of dynamically loadable kernel code.

2.3 Kernel Abstractions

In Section 2.1, I discussed the role that a ROS plays in creating abstractions that user programs can leverage. In this section, I will go into some detail for some of the more

important abstractions, and in Section 2.7, I will give a brief overview of some of their

limitations.

2.3.1 Processes

One of the most important abstractions that a ROS provides is one that represents a running program and the state associated with it. This is known as a process, which essentially

represents an idealized machine. The process abstraction is what allows the ROS to portray

to user programs the illusion that there are more CPUs in the system than there actually

are. From the user’s perspective, the most important operations on a process are starting,

suspending, resuming, stopping, and querying its status. Such operations are invoked via

an API that the ROS kernel exposes.

Figure 2.3: An example of a process’s address space layout in Linux (from high to low addresses: kernel, stack, mapped memory for libraries and files, heap, bss, global data, and code).

A process essentially consists of a collection of information that represents the state of the running program as it executes on the CPU. Important state that the ROS must track includes the program’s register state, its view of memory, its stack, and its program counter (PC). The PC represents the program’s currently executing instruction, and the stack is

used to implement scoped variables and activation records.

Processes in a typical system are organized into a tree structure, with a single process

acting as the root process. This process is invoked by the ROS kernel after the system

completes booting, and includes the first user-space instructions executed on the machine.

On Linux systems, this process is most often named init.

The low-level mechanism that a ROS uses to implement time-sharing of the CPUs is

essentially the ability to stash all of a process’s associated state in memory. This allows

the kernel to context switch between different processes by saving one process’s state and

restoring the state of another.

2.3.2 Address Spaces

One of the most important pieces of state associated with a process is its view of memory.

Aside from explicitly shared memory and special chunks of memory exposed by the ROS

kernel, the visible memory of each process is distinct. This is accomplished with the

abstraction of an address space. An address space is just a series of memory addresses that a process can read and/or write. Portions of the address space can have special semantics

(e.g. read-only) that are either determined by the ROS kernel or by the process itself. An

example of a typical process’s address space is depicted in Figure 2.3.

When a process is loaded with a new program (e.g. with the execve() system call in Linux), that process will be provisioned with an entirely new address space. When a new process is

created by duplicating an existing process (e.g. with the fork() system call), the duplicate process will receive a copy of the existing address space. In order to avoid the overhead of

unnecessarily copying the underlying memory associated with the original address space,

the ROS kernel may mark the memory as copy-on-write (COW). Then, only when one of

the processes attempts to modify the memory is the original memory copied. Thus, any

modifications will only be visible in one of the processes.
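As a concrete illustration of the fork()/execve() pattern and copy-on-write, consider the following minimal sketch for a POSIX system; error handling is abbreviated and /bin/echo is just a placeholder program.

/* Sketch: the child begins with a copy-on-write copy of the parent's address
 * space; execve() then replaces it with an entirely new one. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int x = 42;                      /* shared via COW until someone writes it */
    pid_t pid = fork();

    if (pid == 0) {                  /* child */
        x = 0;                       /* triggers a private copy; parent still sees 42 */
        char *argv[] = { "/bin/echo", "hello from a new address space", NULL };
        execve(argv[0], argv, NULL);
        perror("execve");            /* reached only if execve() fails */
        return 1;
    }

    waitpid(pid, NULL, 0);           /* parent */
    printf("parent still sees x = %d\n", x);
    return 0;
}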

The underlying mechanism supporting address spaces is the virtual memory system.

Common hardware such as x86 chips support virtual memory with paging, a system

that organizes memory into a hierarchical tree of memory chunks called pages. Pages of

memory are allocated from the available physical memory present in the machine by the

kernel, and are managed by organizing them into a tree of tables called page tables. The

tables enable the kernel to track and modify the state of memory pages. Common states

include “not present,” “read-only,” and “non-executable.” When an instruction reads from

or writes to a page of memory, the hardware automatically “walks” the page table tree to

find the page. If the page is not present, the hardware will raise an exception (a page fault), which the ROS kernel must handle. Some systems will initially mark most memory pages

as not present, and only allocate backing pages of physical memory when the program

attempts to touch those pages. This is called demand paging, and it is used in many major

ROS kernels, including Linux. There are mechanisms which allow a program to force pages to be allocated at the beginning of its execution (this is called “pre-faulting”), but

these mechanisms are only used in special cases. The importance of the effects of demand paging will become evident in Section 2.7.
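On Linux, one way to pre-fault memory is to ask for it at map time and optionally pin it; the following sketch assumes a Linux system (MAP_POPULATE and mlock() are Linux/POSIX features, and the buffer size is arbitrary).

/* Sketch: pre-fault a buffer so later accesses do not take demand-paging
 * faults. MAP_POPULATE populates the page tables at mmap() time; mlock()
 * additionally pins the pages in physical memory. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL * 1024 * 1024;    /* 64 MB, arbitrary */

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    if (mlock(buf, len) != 0)           /* optional: keep the pages resident */
        perror("mlock");

    /* ... latency-sensitive phase uses buf without page faults ... */

    munlock(buf, len);
    munmap(buf, len);
    return 0;
}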

Because every memory reference must go through the virtual memory system, every

load or store will result in a traversal of these page tables by the CPU. This can become

expensive, especially for paging systems with more levels of page tables (e.g. on x86_64).

To alleviate this cost, many CPU architectures include a small cache for page transla-

tions called the Translation Lookaside Buffer (TLB). The TLB is hugely important for performance [20], especially for virtualization [144].

While this section gave a cursory overview of address spaces and page tables, readers

interested in more detail are advised to reference manuals from x86 CPU vendors [7, 107].

2.3.3 Threads

Operations on the various structures associated with processes can introduce overheads

that make their manipulation rather expensive. In order to alleviate this cost, many ROS

kernels provide the notion of threads. Threads can be thought of as light-weight processes.

They still represent a running program, but the amount of state is reduced. The primary

difference between a thread and a process centers around address spaces. Threads are

decoupled from the notion of address spaces. Threads may or may not be associated with processes, but when they are, all threads within a process share that process’s address space.

Some OSes lack the notion of processes altogether, and all threads in the system share a

single address space. Such OSes are called single address space OSes (SASOSes). The data

structures representing an address space are not the only ones that the ROS can omit when

structuring the representation of a running program. Other state, such as open files or

network connections can also optionally be associated with executing instructions. Thus,

there are intermediate representations of running programs that lie between a process and

a thread, but such entities defy a simple nomenclature. Linux, for example, allows the programmer to specify the state associated with a newly created process by passing special

flags to the clone() system call.
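At the library level, most programmers reach threads through pthreads rather than clone() directly; the sketch below simply shows two threads sharing one address space and coordinating updates to a shared counter.

/* Sketch: two POSIX threads in one address space updating a shared counter.
 * Because memory is shared, the update must be protected by a lock. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                       /* visible to both threads */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* 200000 */
    return 0;
}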

2.3.4 Files and Directories

It is often prudent for a programmer to have the ability to manipulate arbitrarily sized

chunks of data stored in the system in a logical manner. Most ROSes accomplish this by providing the abstractions of files and directories. These abstractions allow the programmer

(or user of the system) to access data stored in memory or on disk by using a well-

defined API. The ROS implements this API with a file system, which allows users to

open, close, read, write, and seek files and directories. Different file systems offer different

characteristics, such as performance, networked operation, or crash consistency (e.g. with write-ahead logging). File systems can be part of the core ROS kernel or can be dynamically

loaded as kernel modules. File systems, the underlying I/O subsystem in the ROS kernel,

and the characteristics of I/O devices can become very important in large-scale systems

like datacenters and supercomputers, where massive amounts of data are generated and stored in the form of files.

2.3.5 Communication

Most ROSes will provide abstractions that facilitate communication between entities in the

system and between different systems. The former fall into the category of inter-process

communication (IPC). The latter include networking facilities, which—along with the

appropriate hardware—allow systems to communicate even when separated by vast

distances. Both types of communication are discussed in the following paragraphs.

IPC Kernel-provided IPC abstractions give a process or thread a way to communicate with others in the system. This may be to notify another process of an asynchronous event,

to share data, to send messages, or to create an “assembly line” that passes data between

applications, each one completing a particular processing step.

Some common types of IPC facilities include:

• Signals: Signals allow one process to asynchronously notify another, often with a

particular type being associated with the notification. Signals can vary in their intent,

from notifying a process that it should terminate, to informing it of a window focus

change or a segmentation violation. Signals typically have a small amount of

information, and are really only used to notify state changes. That is, there is no

“payload” associated with a signal. The kernel can enforce policies which determine

which processes can send signals to which recipients.

• Message queues: Message queues allow processes to use out-of-band communica-

tion between them. They may use direct addressing (sending to a particular process)

or they may encode a type to which multiple receivers can “subscribe.” Some mes-

sage queue implementations (like POSIX message queues in Linux) support more

advanced semantics like priority encoding. Unlike signals (which may be used to

deliver these messages), the messages can include data payloads, typically with a

maximum limit on their size.

• Client-server: Some ROSes support client-server IPC facilities, where one process

acts as the server, and other processes send requests to the server to fetch data or

to send updates. Examples in Linux include UNIX domain sockets, which allow

familiar networking-style communication between processes, and Netlink sockets,

which enable client-server interactions across the user/kernel boundary.

• Pipes: Pipes allow processes to stream data to one another. A pipe has one side

dedicated to reading, and one side dedicated to writing, making it a half-duplex

communication channel. Pipes can be named, allowing explicit communication, or

anonymous, allowing them to be transparently used as standard I/O streams. Pipes

are an essential mechanism for constructing “assembly lines” of programs typified

by the UNIX philosophy.

• Shared memory: Perhaps one of the most performant IPC mechanisms, communica-

tion via shared memory allows a great degree of flexibility for applications using it,

as they can determine the sharing semantics. Furthermore, once sharing is initiated,

the kernel need not be involved in subsequent communication. Shared memory

communication is natural for threads, where address spaces are automatically shared.

For processes, however, the kernel must provide a mechanism to explicitly set up shared segments of processes’ address spaces (a minimal example follows this list).
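The sketch below shows that explicit setup step using the POSIX shared memory interface; the segment name /demo_shm and its size are illustrative, and a second process would shm_open() and mmap() the same name to communicate.

/* Sketch: create, size, and map a named POSIX shared memory segment.
 * After the mapping exists, reads and writes need no kernel involvement. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *name = "/demo_shm";                /* illustrative name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }

    if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(buf, "hello via shared memory");        /* visible to other mappers */

    munmap(buf, 4096);
    close(fd);
    shm_unlink(name);                              /* remove the name when done */
    return 0;
}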

Networking Most ROSes will provide user programs with mechanisms to facilitate com-

munication between machines. In a modern ROS, this involves implementation of several

tive, however, the most important abstraction is the one that represents a communication

channel between two processes on different machines. Most ROSes name this type of

abstraction a socket. It provides a logical representation of a persistent connection between

machines, much like plugging a device into a wall socket. It is usually just a handle which

represents the connection. Sockets can be backed by different protocols and mechanisms,

depending on the user’s preference and on the specific use case. Sockets typically support

a standard communication API that will support functions such as connect(), send(),

recv(), and close().
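A minimal TCP client using the BSD socket API illustrates that interface; the loopback address and port 9000 are placeholders.

/* Sketch: connect to a server, send a message, and read a reply. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(9000) };
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        perror("connect");
        return 1;
    }

    const char *msg = "hello\n";
    send(fd, msg, strlen(msg), 0);

    char reply[128];
    ssize_t n = recv(fd, reply, sizeof(reply) - 1, 0);
    if (n > 0) { reply[n] = '\0'; printf("got: %s", reply); }

    close(fd);
    return 0;
}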

By using kernel-provided networking abstractions like sockets, applications can build

more complicated communication facilities with special-purpose uses. One such facility is

the Message Passing Interface (MPI), widely used in parallel applications, which allows

logical, high-performance, point-to-point communication between processes on different

machines.

2.3.6 Synchronization and Mutual Exclusion Primitives

For a ROS that supports concurrent and parallel programming, care must be taken to

coordinate access to shared resources by processes or threads running concurrently. In

other words, there must be a way to guarantee the mutual exclusion of entities operating on

such resources. The piece of code within which mutual exclusion must be guaranteed is

called a critical section. The ROS will typically provide a set of mechanisms that allow the

creation of efficient abstractions to implement mutual exclusion in various forms, as outlined below.

Mutexes and locks Abstractions that provide mutual exclusion within a critical section

are often called mutexes. The simplest form of a mutex is a lock. A lock consists of a variable that can be in one of two states, locked or unlocked. On a uniprocessor system,

a lock does not even require this lock variable. Instead, mutual exclusion within the

critical section can be guaranteed by one processor or thread turning off interrupts when

it enters, and re-enabling them when it leaves. This solution, however, is inadequate for

multiprocessor systems, where different threads on different CPUs could enter the critical

section simultaneously. In this case, a lock must have a state variable that can be modified

atomically. In modern systems, the atomicity is provided by atomic read-modify-write

instructions such as test-and-set or compare-and-swap. More sophisticated locks that

require certain properties (such as fairness) can be implemented with algorithms like the

bakery algorithm (these are also called ticket locks).

The simplest form of a lock routine will attempt to acquire the lock repeatedly until it

becomes available. Such a lock is referred to as a spin lock. More advanced (and efficient)

locks can be constructed with support from the ROS kernel, where after initially spinning

for a predetermined number of times, the thread attempting to acquire the lock can go

to sleep on a queue associated with the lock. When another thread releases the lock, the

kernel will awaken the waiting thread.
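A sketch of the simplest form, a test-and-set spin lock built on C11 atomics, is shown below; a production lock would add back-off, fairness (e.g. a ticket lock), or the sleep-after-spinning behavior described above.

/* Sketch: a test-and-set spin lock using C11 atomics. */
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline void spin_lock(spinlock_t *l)
{
    /* atomically set the flag; retry while it was already set */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin */
}

static inline void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}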

Condition variables A programmer designing concurrent programs will often encounter

situations that require that a thread wait until some condition is satisfied. For example, a

thread might need to wait until a queue is populated before extracting an element from

it (e.g. a producer-consumer queue). Locks alone are not a sufficient abstraction for this

situation. Instead, the programmer can turn to condition variables. A condition variable’s

API will often have three primary routines: wait(), signal(), and broadcast(). The wait() routine will allow a thread to go to sleep until a condition is met. The signal()

routine allows another thread to signal that the condition is met, and that the waiting

thread can continue running. Finally, broadcast() is similar to signal(), but notifies

several waiting threads instead of just one. The implementation of condition variables will become important in Chapter 6, where condition variables are used by a runtime for

asynchronous software events.
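The producer-consumer situation above maps directly onto the pthread condition variable API; a minimal one-slot sketch follows.

/* Sketch: a consumer waits until a producer has deposited an item.
 * pthread_cond_wait() releases the mutex while sleeping and reacquires it
 * before returning; the while loop guards against spurious wakeups. */
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
static int item_ready = 0;
static int item;

void produce(int value)
{
    pthread_mutex_lock(&m);
    item = value;
    item_ready = 1;
    pthread_cond_signal(&nonempty);     /* wake one waiting consumer */
    pthread_mutex_unlock(&m);
}

int consume(void)
{
    pthread_mutex_lock(&m);
    while (!item_ready)
        pthread_cond_wait(&nonempty, &m);
    int v = item;
    item_ready = 0;
    pthread_mutex_unlock(&m);
    return v;
}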

Barriers Barriers enable synchronous operation among many threads by providing a

mechanism by which threads can block until they have all reached a predetermined point

in the code. There is one primary routine associated with a barrier, namely the arrive()

routine. When all threads arrive at the barrier, they can continue normal operation.

Barriers are used, for example, to implement stages of computation followed by stages of

communication. This is a common pattern encountered in parallel programs.
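That compute-then-communicate pattern looks roughly like the following sketch using a pthreads barrier; NTHREADS and the two phase functions are illustrative placeholders.

/* Sketch: alternating compute and communicate phases separated by a barrier,
 * so no thread starts communicating until every thread has finished computing. */
#include <pthread.h>

#define NTHREADS 4
static pthread_barrier_t barrier;

extern void compute_phase(int step, long tid);      /* placeholder */
extern void communicate_phase(int step, long tid);  /* placeholder */

void setup(void)
{
    pthread_barrier_init(&barrier, NULL, NTHREADS);
}

void *worker(void *arg)
{
    long tid = (long)arg;
    for (int step = 0; step < 100; step++) {
        compute_phase(step, tid);
        pthread_barrier_wait(&barrier);
        communicate_phase(step, tid);
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}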

Note that the performance of mutual exclusion and synchronization primitives can be very important for parallel and concurrent applications that use them heavily.

2.4 Operation

This section will give a brief overview of the operation of a typical ROS, both during

startup and during its normal operation.

For a ROS to boot the system, it is commonly given control of the machine by a piece

of code that resides in firmware called the BIOS (Basic Input/Output System). The BIOS, which contains the instructions first executed by the processor on startup, is charged with enumerating the devices attached to the CPU, populating system tables with their

information, and finding a bootloader to bring up the operating system. There are many

more steps that a commodity BIOS will carry out, but it is sufficient for the purposes of this

section to describe the BIOS as a management layer for platform-specific hardware present

in the system. Once the bootloader loads the operating system into memory, it jumps to its

entry point, thus relinquishing control of the machine. From that point forward, the BIOS

is only invoked for a special class of system routines that deal with the management of platform hardware.

When the ROS starts, it will begin by navigating a series of steps necessary to bring

the CPU into its final operating mode (for example, “long mode” on x86_64 chips). These

steps include initialization of the memory management unit (e.g. paging), setup of system

registers, bootup of secondary CPU cores, and population of system tables. At this point,

the ROS can handle external interrupts, manage the memory system, and invoke pieces

of code on other CPUs. It will also have mechanisms in place (e.g. a panic() routine) to provide meaningful output in the event of an error or failure.

Once the initial setup of the CPU is complete, the ROS must initialize several other

subsystems before it can run user-space programs. This includes the setup of an external

device infrastructure (e.g. to manage disks, network interfaces, and keyboards) and other supporting subsystems. It also must initialize the scheduling subsystem, which might

include threads and processes and their associated address spaces, and the ability to

context switch between them. Typically the ROS will initialize a file system layer to allow

logical access to stored data.

Finally, the ROS will initialize the necessary infrastructure on the CPU to support a

user-space layer with lower privileges, and the associated interface that will allow user-

space programs to invoke the kernel (namely, system calls). It can then invoke an initial

user-space program (init on Linux) that will form the root of a hierarchy of processes and

threads. At this point, an interactive shell can run, allowing the user to issue commands

in an interpreted language designed for the management of the system. Subsequently,

the ROS acts as a passive bystander, running only when invoked by a program or when prompted by external interrupts or exceptional conditions such as faults.

2.5 Lightweight Kernels

A lightweight kernel (LWK) is meant to provide a higher-performance environment, particularly for supercomputing applications, where system interference (discussed in

Section 2.7) can become a serious issue. LWKs prevent such interference by eliminating non-

deterministic behavior. One of the most common techniques is the elimination of regular timer ticks, which interrupt program execution. In a ROS, these timer ticks serve as a mechanism by which the kernel can preempt running tasks regularly to enforce scheduling requirements

among different processes and to load balance the system. However, in an HPC setting,

there is typically only one important application running on the machine, obviating the

necessity of the periodic timer mechanism. An LWK system will eliminate these regular

ticks by default1.

LWKs will also typically target a small subset of the hardware that a typical ROS will.

The hardware setup is often known ahead of time, which allows the kernel to be tailored

to a particular system. The overheads of abstraction layers necessary to facilitate portable

operation are thus avoided, and only a few drivers must be included in the kernel. The

1Newer ROSes such as Linux as of kernel version 3.10, the NT kernel in Windows 8, and the XNU kernel in Apple’s OSX can also be configured for such “tickless” operation.

implication of this design is that the kernel code is much simpler and more comprehensible to a

small group of developers. Because of their special-purpose nature and requirements for

high performance, LWKs are most often monolithic.

Like ROSes, most LWKs provide an API to user applications that makes it easy to program the machine. In some cases, this API is designed to be semi-compatible with ROS

APIs. While this API is less comprehensive in an LWK, it enables a target set of applications

to gain the benefits of the high-performance kernel without prohibitive porting effort.

The Kitten LWK serves as an instructive example of a lightweight kernel [129]. Kitten

supports the Linux ABI and a subset of Linux system calls, namely those necessary to

natively support OpenMP and glibc’s NPTL threading facilities. As outlined above, Kitten

has limited hardware support and operates without a regular timer tick. In addition, Kitten

avoids complex and unpredictable performance issues in the memory system by allowing

user-space to manage memory. Furthermore, Kitten uses a linear and contiguous virtual-to- physical address mapping scheme, which greatly simplifies address translation and avoids

complex interactions caused by demand paging. Kitten’s task scheduler employs per-CPU

run queues with a simple round-robin scheduling policy. This scheme is appropriate for

typical HPC applications that evenly map threads to CPUs.

While LWKs provide predictably high performance for applications, they still have

limitations that can hinder runtime performance and functionality. These limitations will

be discussed in Section 2.7.

2.6 Virtualization

While operating systems traditionally provide varying forms of virtualization, some sys-

tems go even further, introducing the concept of a virtual machine (VM). Much like a process

represents an idealized machine for an application program, a VM provides the illusion

of an idealized machine to an OS. When virtualization is used, we typically refer to the physical machine as the host and the virtual machine as the guest. VMs enjoy many benefits,

from allowing more than one OS to be used on the same physical machine, to enabling

flexible debugging in different OS environments, to consolidating many, relatively small

VMs onto a single physical machine. Such functionality is facilitated by a layer of software

that sits below the OS, with even higher privilege, called a virtual machine monitor (VMM).

A VMM can be thought of as an OS for OSes, but it operates at a much lower level. VMMs

retain control of the machine by trapping privileged operations that an OS carries out in its

regular operation (much like an OS retains control when exceptions and interrupts occur

and when system calls are invoked).

Virtual machines can be broadly separated into two categories, full system virtual

machines and process virtual machines. The latter run within the context of a process in a host

OS, and only run when that process is scheduled. These types of VMs are often used

to implement language virtual machines, which provide an abstract machine that executes

instructions in a portable instruction set that can be targeted by high-level languages. This

approach is used by, for example, Java and Python. In contrast, full system virtualization

uses a VMM to provide a logical view of an entire system to guest OSes running on top of

it. Full system virtualization can be implemented with two types of VMMs, namely Type

1 and Type 2. Type 1 VMMs sit directly on top of the hardware, and thus are also called

“bare metal” VMMs. Widely used Type 1 VMMs include VMware’s ESX [192],

Windows Hyper-V, Xen [17], and KVM [164]. Type 2 VMMs require a host OS, and they

usually fit within the context of a library that the host OS can be linked with or within a

dynamically loaded kernel module. Examples of Type 2 VMMs include VMware Fusion,

Oracle’s Virtual Box, Parallels, and our Palacios VMM [129].

While virtual machines were first used in the IBM System/370 mainframe to multiplex

between different OSes (see Section 8.1.1), their performance overheads outweighed their

benefits for some time. They experienced a resurgence in the 1990s with the introduction

of the Disco system [41], which later led to the creation of VMware. Since then, hardware vendors have introduced many features to support hardware virtualization, which elimi-

nates many prohibitive overheads. There has also been a consistent improvement in VMM

software design and VMM features that ameliorate performance issues [15, 87]. Work on

our Palacios VMM showed overheads as low as 5% on a large-scale supercomputer [128].

More recent VMM software allows more sophisticated usage models that give users

even more fine-grained control over their machines. One example is the Pisces Co-Kernel

system [157], which allows a machine to be physically partitioned among different OSes.

Another example is our hybrid virtual machine, which will be introduced in Chapter 5.

While this section gave a cursory introduction to virtualization, readers interested in a

more detailed treatment of full system virtualization are directed to more comprehensive

texts [179, 147, 3].

2.7 Limitations

While a ROS can provide a great deal of flexibility and convenience, issues can arise in the

face of large amounts of parallelism and in environments where deterministic and high performance are paramount concerns for a parallel runtime.

Restricted hardware access The first major issue that a parallel runtime can face when

built for a ROS is restricted access to hardware. While this restricted access does increase

the security and reliability of the system, it can place limits on the performance that a

runtime can achieve.

For example, a runtime implementing a garbage collector might need to keep track of

dirty elements (those that have been touched recently). The paging hardware on modern x86 chips provides a mechanism for accomplishing this directly (via the accessed and dirty bits in page table entries). However, because

the ROS does not permit user programs to access this paging hardware, the runtime must

use other methods to track dirty pages.
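One such method, sketched below under the assumption of a Linux/POSIX user-space environment, is to write-protect the heap with mprotect() and record the first write to each page from a SIGSEGV handler; the function and variable names are illustrative, and a real collector would also restore the protection when it wants to re-scan.

/* Sketch: emulate the hardware dirty bit at user level. Writes to a
 * read-only page fault; the handler records the page as dirty and re-enables
 * writes. (A real handler would also deal with faults outside the region.) */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096UL

static uint8_t *region;                 /* start of the tracked region */
static size_t   region_len;
static uint8_t  dirty[1 << 20];         /* one flag per page; sized for the sketch */

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr >= base && addr < base + region_len) {
        size_t page = (addr - base) / PAGE_SIZE;
        dirty[page] = 1;                                   /* mark dirty */
        mprotect((void *)(base + page * PAGE_SIZE), PAGE_SIZE,
                 PROT_READ | PROT_WRITE);                  /* let the write proceed */
    }
}

void start_tracking(uint8_t *buf, size_t len)
{
    struct sigaction sa;
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    region = buf;
    region_len = len;
    mprotect(buf, len, PROT_READ);      /* writes now fault until marked dirty */
}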

Mismatched abstractions The abstractions that the ROS provides to software at user-

level may not be a good fit for a runtime’s needs. For example, in a parallel runtime that

uses light-weight, short-lived tasks, a heavy-weight process abstraction is unnecessary,

and can actually hinder performance (e.g. due to overheads of context switching).

Restricted access to hardware and mismatched abstractions are two major issues that can limit both the design and the performance of parallel runtimes. Appendix A con-

tains complaints from developers of the Legion runtime specifically related to such issues with ROSes. When runtime developers run into such issues, there are often two effects.

The first is that the developers have to make compromises, either by not implementing the

desired functionality, or implementing it in an unnatural way that can limit its performance

and elegance. The second effect is that the runtime ends up with duplicated functionality.

In fact, many runtimes contain components that one can also find within the code base of a

ROS. Examples include task execution frameworks, event systems, memory management,

and communication facilities. Having such duplicates in the system is unnecessary and

counterproductive.

Another drawback of having a rigid set of kernel abstractions and an immutable system

call interface is that it can limit runtime developers’ imagination during development. A

runtime may have high-level information regarding its execution model, its memory access patterns, and the types of applications it supports. Runtime developers could leverage this

high-level information to explore tailored solutions to problems they encounter. However,

the current structure of a ROS demands that they fit their runtimes into a predetermined

template.

Non-determinism Some runtimes need the ability to guarantee deterministic behavior.

This can be very important for parallel runtimes in particular, where threads can run in parallel across many thousands of machines. It may be important in such an environment

for the parallel tasks to operate in lock step (e.g. in the BSP model of parallel computation).

When sources of non-determinism are introduced, the performance effects of a single

straggler can reverberate across the entire system. Some common sources of such non-

determinism (often called OS noise [74, 75, 101]) include external interrupts, page faults,

and background processes. It can be difficult to squash OS noise in a ROS completely.

Raw performance ROSes must be fairly general in the services and abstractions they provide so that they can enable adequate performance for a large number of applications.

This generality comes at the cost of performance. Many layers of abstraction and highly

modular systems incur overheads through control-flow redirection. Management of

structures related to the virtualization of CPUs and memory also incurs some overheads,

however highly tuned. When these heavy-weight features are not even necessary, the parallel runtime can pay a dear performance cost for the sake of convenience. These costs

can be fairly small at face value, but if a runtime uses enough ROS features that are effected

by overheads, the costs can compound.

In subsequent chapters, we will see empirically how all of these issues can conspire to

limit both performance and functionality.

Chapter 3

Hybrid Runtime Model

This chapter details the merits of hybrid runtimes (HRTs) and their potential benefits

for performance and functionality. Section 3.1 presents this argument. Section 3.2 briefly

recounts the high-level goals of the Nautilus Aerokernel. Sections 3.3–3.5 describe example

HRTs that we have created using Nautilus as the underlying framework, along with performance results. Finally, Section 3.6 concludes the chapter.

3.1 Argument

A language’s runtime is a system (typically) charged with two major responsibilities.

The first is allowing a program written in the language to interact with its environment

(at run-time). This includes access to underlying software layers (e.g., the OS) and the

machine itself. The runtime abstracts the properties of both and impedance-matches them with the language’s model. The challenges of doing so, particularly for the hardware,

depend considerably on just how high-level the language is—the larger the gap between

the language model and the hardware and OS models, the greater the challenge. At

the same time, however, a higher-level language has more freedom in implementing the

impedance-matching.

The second major responsibility of the runtime is carrying out tasks that are hidden

from the programmer but necessary to program operation. Common examples include

garbage collection in managed languages (discussed in the previous chapter), JIT compila-

tion or interpretation for compilers that target an abstract machine, exception management, profiling, instrumentation, task and memory mapping and scheduling, and even manage-

ment of multiple execution contexts or virtual processors. While some runtimes may offer

more or less in the way of features, they all provide the programmer with a much simpler view of the machine than if he or she were to program it directly.

As a runtime gains more responsibilities and features, the lines between the runtime

and the OS often become blurred. For example, the Legion runtime, a data-centric parallel

runtime aimed at HPC, manages execution contexts (an abstraction of cores or hardware

threads), regions (an abstraction of NUMA and other complex memory models), task to

execution context mapping, task scheduling with dependencies, and events. In the worst case

this means that the runtime and the OS are actually trying to provide the same functionality.

In fact, what we have found is that in some cases this duplication of functionality is

brought about by inadequacies of or grievances with the OS and the services it provides.

A common refrain of runtime developers (see Appendix A) is that they want the kernel

to simply give them a subset of the machine’s resources and then leave them alone. They

attempt to approximate this as best they can within the confines of user-space and with

the available system calls.

That this problem would arise is not entirely surprising. After all, the operating system is, prima facie, designed to provide adequate performance for a broad range of

general-purpose applications. However, when applications demand more control of the

machine, the OS can often get in the way, whether due to rigid interfaces or to mismatched priorities in the design of those interfaces. Not only may the kernel’s abstractions be at

odds with the runtime, it may also completely prevent the runtime from using hardware

features that might otherwise significantly improve performance or functionality. If it provides access to these features, it does so through a system call, which—even if it has

an appropriate interface for the runtime—nonetheless exacts a toll for use, as the system

call mechanism itself has a cost. Similarly, even outside system calls, while the kernel

might build an abstraction on top of a fast hardware mechanism, an additional toll is taken.

For example, signals are simply more expensive than interrupts, even if they are used to

abstract an interrupt.

A runtime that is a kernel will have none of these issues. It would have full access

to all hardware features of the machine, and the ability to create any abstractions that it

needs using those features. We want to support the construction of such hybrid runtimes

(HRTs). To do so, we will provide basic kernel functionality on a take-it-or-leave-it basis

to make the process easier. We also want such runtime kernels to have available the full

functionality of the ROS for components not central to runtime operation.

Figure 3.1 illustrates three different models for supporting a parallel runtime. The

current model (a) layers the parallel runtime over a general-purpose kernel (a ROS kernel).

The parallel runtime runs in user mode without access to privileged hardware features

and uses a kernel API designed for general-purpose computation. In the hybrid runtime

model (b) the parallel runtime is integrated with a specialized Aerokernel framework such

as Nautilus. The resulting HRT runs exclusively in kernel mode with full access to all

hardware features and with kernel abstractions designed specifically for it. Notice that


Figure 3.1: Overview of hybrid runtime (HRT) approach: (a) current model used by parallel runtimes, (b) proposed HRT model, and (c) proposed HRT model combined with a hybrid virtual machine (HVM).

both the runtime and the parallel application itself are now below the kernel/user line.

Figure 3.1(b) is how we run Legion, NESL, and NDPC programs. We refer to this as the

performance path.

A natural concern with the structure of Figure 3.1(b) is how to support general-purpose

OS features. For example, how do we open a file? We do not want to reinvent the wheel within an HRT or a kernel framework such as Nautilus in order to support kernel

functionality that is not performance critical. Figure 3.1(c) is our response, the hybrid virtual machine (HVM), which will be discussed in detail in Chapter 5. In an HVM, the virtual machine monitor (VMM) or other software will partition the physical resources provided to a guest, such as cores and memory, into two parts. One part will support

a general-purpose virtualization model suitable for executing full OS stacks and their

applications, while the second part will support a virtualization model specialized to the

HRT and will allow it direct hardware access. We will see that more advanced models

are also possible, where some resources are partitioned and others (namely, memory) are

shared. The specialized virtualization model enables the performance path of the HRT, while the general virtualization model and communication between the two parts of the

HVM enable a legacy path for the runtime and application that will let it leverage the

capabilities of the general-purpose ROS kernel for non-performance critical work.

3.2 Nautilus

Nautilus1 is a small prototype Aerokernel framework that we built to support the HRT

model, and is thus the first of its kind. We designed Nautilus to meet the needs of parallel

runtimes that may use it as a starting point for taking full advantage of the machine. We

chose to minimize the imposition of abstractions that support general-purpose applications in favor of flexibility and a small codebase. As I will show in Sections 3.3–3.5, this allowed us

to port three very different runtimes to Nautilus and the HRT model in a very reasonable

amount of time (in addition to one runtime that was automatically ported; see Chapter 7).

Nautilus is a Multiboot2-compliant kernel and we have tested it on several Intel and AMD

machines, as well as QEMU and our own Palacios VMM.

As Nautilus is a prototype for HRT research, we initially targeted the most popular

1Named after the submarine-like, mysterious vessel from Jules Verne’s Twenty Thousand Leagues Under the Sea.

Language    SLOC
C++         133
C           636

Figure 3.2: Lines of code added to Nautilus to support Legion, NDPC, NESL, and Racket.

architecture for high-performance and parallel computing, x86_64. Nautilus also runs on

the Intel Xeon Phi.

The design of Nautilus is, first and foremost, driven by the needs of the parallel

runtimes that use it. Nevertheless, it is complete enough to leverage the full capabilities

of a modern 64-bit x86 machine to support three runtimes, one of which (Legion) is quite

complex and is used in practice today.

More details on the design, implementation, and evaluation of Nautilus will be pre-

sented in Chapter4.

Building a kernel, however, was not our main goal. Our main focus was supporting the porting and construction of runtimes for the HRT model. The Legion runtime, discussed

at length in the next section, was the most challenging and complex of the three runtimes

to bring up in Nautilus. Legion is about double the size of Nautilus in terms of codebase

size, consisting of about 43000 lines of C++. Porting Legion and the other runtimes took a

total of about four person-months of effort. Most of this work went into understanding

Legion and its needs. The lines of code actually added to Nautilus to support all four

runtimes is shown in Figure 3.2. We only needed to add about 800 lines of code. This is

tiny considering the size of the Legion runtime.

This suggests that exploring the HRT model for existing or new parallel runtimes,

especially with a small kernel like Nautilus designed with this in mind, is a perfectly

manageable task for an experienced systems researcher or developer. We hope that these results will encourage others to similarly explore the benefits of HRTs.

3.3 Example: Legion

The Legion runtime system is designed to provide applications with a parallel program-

ming model that maps well to heterogeneous architectures [21, 189]. Whether the appli-

cation runs on a single node or across nodes—even with GPUs—the Legion runtime can

manage the underlying resources so that the application does not have to. There are several

reasons why we chose to port Legion to the HRT model. The first is that the primary focus

of the Legion developers is on the design of the runtime system. This not only allows us

to leverage their experience in designing runtimes, but also gives us access to a system

designed with experimentation in mind. Further, the codebase has reached the point where

the developers’ ability to rapidly prototype new ideas is hindered by abstractions imposed

by the OS layer. Another reason we chose Legion is that it is quickly gaining adoption

among the HPC community, including within the DoE’s Exascale effort. The third reason is

that we have corresponded directly with the Legion developers and discussed with them

issues with the OS layer that they found when developing their runtime. These issues are

outlined in Appendix A.

Under the covers, Legion bears many similarities to an operating system and concerns

itself with many issues that an OS must deal with, including task scheduling, isolation,

multiplexing of hardware resources, and synchronization. As discussed at the beginning of

this chapter, the way that a complex runtime like Legion attempts to manage the machine

to suit its own needs can often conflict with the services and abstractions provided by the

OS.


Figure 3.3: Legion system architecture.

As Legion is designed for heterogeneous hardware, including multi-node clusters and

machines with GPUs, it is designed with a multi-layer architecture. It is split up into the

high-level runtime and the low-level runtime. The high-level runtime is portable across

machines, and the low-level runtime contains all of the machine specific code. There is

a separate low-level implementation called the shared low-level runtime. This is the low-

level layer implemented for shared memory machines. The overall architecture of the

Legion runtime is depicted in Figure 3.3. As we are interested in single-node performance, we naturally focused our efforts on the shared low-level Legion runtime. All of our

modifications to Legion when porting it to Nautilus were made to the shared low-level

component. Outside of optimizations using hardware access, and understanding the needs 77 of the runtime, the port was fairly straight-forward.

Legion, in its default user-level implementation, uses pthreads as representations

of logical processors, so the low-level runtime makes fairly heavy use of the pthreads

interface. In order to transform Legion into a kernel-level HRT, we simply had to provide

a similar interface in Nautilus. The amount of code added to Nautilus was less than 800

lines, and is described in Figure 3.2. After porting Legion into Nautilus, we then began to

explore how Legion could benefit from unrestricted access to the machine.

3.3.1 Evaluation

I now present an evaluation of our transformation of the user-level Legion runtime into a

kernel using Nautilus, highlighting the realized and potential benefits of having Legion

operate as an HRT. Our port is based on Legion as of October 14, 2014, specifically commit

e22962d, which can be found via the Legion project web site.2

The Legion distribution includes numerous test codes, as well as an example parallel

application that is a circuit simulator. We used the test codes to check the correctness of

our work and the circuit simulator as our initial performance benchmark. Legion creates

an abstract machine that consists of a set of cooperating threads that execute work when it

is ready. These are essentially logical processors. The number of such threads can vary,

representing an abstract machine of a different size.

Experimental setup We took all measurements on our lab machine named leviathan. We

chose this machine for our experiment because it has a large number of cores and an

interesting organization, similar to what a supercomputer node might look like. It is

2http://legion.stanford.edu


Figure 3.4: Running time of Legion circuit simulator versus core count. The baseline Nautilus version has higher performance at 62 cores than the Linux version.

a 2.1GHz AMD Opteron 6272 (Interlagos) server machine with 64 cores and 128 GB of

memory. The cores are spread across 4 sockets, and each socket comprises two NUMA

domains. All CPUs within one of these NUMA domains share an L3 cache. Within the

domain, CPUs are organized into 4 groups of 2 hardware threads. The hardware threads

share an L1 instruction cache and a unified L2 cache. Hardware threads have their own

L1 data cache. We configured the BIOS for this machine to “Maximum performance” to

eliminate the effects of power management. This machine also has a “freerunning‘ TSC, which means that the TSC will tick at a constant rate regardless of the operating frequency

of the processor core. For Linux tests, it runs Red Hat 6.5 with stock Linux kernel version

2.6.32. For Xeon Phi tests, we use a Xeon Phi 3120A PCI accelerator along with the Intel

MPSS 3.4.2 toolchain, which uses a modified 2.6.38 Linux kernel. It is important to point

out that this is the current kernel binary shipped by Intel for use with Intel Xeon Phi


Figure 3.5: Speedup of Legion (normalized to 2 Legion processors) circuit simulator running on Linux and Nautilus as a function of Legion processor (thread) count.

hardware.

3.3.2 Legion Circuit Simulator

We ran the circuit simulator with a medium problem size (100000 steps) and varied the

number of cores Legion used to execute Legion tasks. Figure 3.4 shows the results. The x-axis shows the number of threads/logical processors. The thread count only goes up

to 62 because the Linux version would hang at higher core counts, we believe due to a

livelock situation in Legion’s interaction with Linux. Notice how closely, even with no

hardware optimizations, Nautilus tracks the performance of Linux. The difference between

the two actually increases when scaling the number of threads. They are essentially at parity. Nautilus is slightly faster at 62 cores. 80


Figure 3.6: Speedup of Legion circuit simulator comparing the baseline Nautilus version and a Nautilus version that executes Legion tasks with interrupts off.

The speedup of the circuit simulator running in Legion as a function of the number

of cores is shown in Figure 3.5. Speedups are normalized to Legion running with two

threads.

To experiment with hardware functionality in the HRT model, we wanted to take

advantage of a capability that normally is unavailable in Linux at user-level. We decided

to use the capability to disable interrupts. In the Legion HRT, there are no other threads

running besides the threads that Legion creates, and so there is no need for timer interrupts

(or device interrupts). Observing that interrupts can cause interference effects at the level

of the instruction cache and potentially in task execution latency, we inserted a call to

disable interrupts when Legion invokes a task (in this case the task to execute a function in

the circuit simulator). Figure 3.6 shows the results, where the speedup is over the baseline

case where Legion is running in Nautilus but without any change in the default interrupt policy. While this simple change involved only adding two short lines of code, we can see

a measurable benefit that scales with the thread count, up to 5% at 62 cores.
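Conceptually, the change is of the following form; the wrapper and function names here are illustrative rather than the actual Nautilus or Legion code, and cli/sti are privileged instructions that are only usable because the HRT runs in kernel mode.

/* Sketch: bracket a Legion task invocation with interrupt disable/enable.
 * Names are illustrative; cli/sti require ring 0, which an HRT has. */
static inline void interrupts_off(void) { __asm__ volatile ("cli" ::: "memory"); }
static inline void interrupts_on(void)  { __asm__ volatile ("sti" ::: "memory"); }

static void run_legion_task(void (*task)(void *), void *arg)
{
    interrupts_off();   /* no timer or device interrupts perturb the task */
    task(arg);
    interrupts_on();
}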

3.3.3 Legion HPCG

To evaluate the performance benefits of applying the HRT model to Legion using a more

complex benchmark, we used the HPCG (High Performance Conjugate Gradients) mini-

app. HPCG is an application benchmark effort from Sandia National Labs that is designed

to help rank top-500 supercomputers for suitability to scalable applications of national

interest [67, 97]. Los Alamos National Laboratory has ported HPCG to Legion, and we used this port to further evaluate our HRT variant of Legion. HPCG is a complex

benchmark (∼5100 lines of C++) that exercises many Legion features. Recall that Legion

itself comprises another ∼43,000 lines of C++.

Figure 3.7 shows the speedup of the HPCG/Legion in Nautilus over HPCG/Legion on

Linux as a function of the number of Legion processors being used. Each Legion processor

is bound to a distinct hardware thread. On the Xeon Phi, Nautilus is able to speed up

HPCG by up to 20%. On x64, Nautilus increases its performance by almost 40%. We

configured HPCG for a medium problem size which, on a standard Linux setup, runs for

roughly 2 seconds. We see similar results for other HPCG configurations.

There are many contributors to the increased performance of HPCG in Nautilus, partic-

ularly fast condition variables. An interesting one is simply due to the simplified paging

model. On x64 the Linux version exhibited almost 1.6 million TLB misses during execution.

In comparison, the Nautilus version exhibited about 100.


Figure 3.7: Nautilus speedup of Legion HPCG on our x64 machine and the Intel Xeon Phi.

            Language   SLOC
Compiler    Lisp       11005
Runtime     C          8853
            lex        230
            yacc       461

Figure 3.8: Source lines of code for NESL. The runtime consists of the VCODE interpreter and the CVL implementation we use.

3.4 Example: NESL

NESL [33] is a highly influential implementation of nested data parallelism developed at CMU in the ’90s. Very recently, it has influenced the design of parallelism in Manticore [80, 78], Data Parallel Haskell [48, 50], and arguably the nested call extensions to

CUDA [150]. NESL is a functional programming language, using an ML-style syntax that

allows the implementation of complex parallel algorithms in a very compact and high-level way, often 100s or 1000s of times more compactly than using a low-level language such

as C+OpenMP. NESL programs are compiled into abstract vector operations known as

VCODE through a process known as flattening. An abstract machine called a VCODE

interpreter then executes these programs on physical hardware. Flattening transformations

and their ability to transform nested (recursive) data parallelism into “flat” vector oper-

ations while preserving the asymptotic complexity of programs are a key contribution of

NESL [34], and very recent work on using NESL-like nested data parallelism for GPUs [29]

and multicore [28] has focused on extending flattening approaches to better match such

hardware.

As a proof of concept, we have ported NESL’s existing VCODE interpreter to Nautilus,

allowing us to run any program compiled by the out-of-the-box NESL compiler in kernel

mode on x86_64 hardware. We also ported NESL’s sequential implementation of the vector

operation library CVL, which we have started parallelizing. Currently, point-wise vector

operations execute in parallel. The combination of the core VCODE interpreter and a CVL

library forms the VCODE interpreter for a system in the NESL model.

While this effort is a work in progress, it gives some insights into the challenges of porting this form of language implementation to the kernel level. In summary, such a port

is quite tractable. A detailed breakdown of the source code in NESL as we use it is given

in Figure 3.8. Compared to the NESL source code release3, our modifications currently

comprise about 100 lines of Makefile changes and 360 lines of C source code changes.

About 220 lines of the C changes are in CVL macros that implement the point-wise vector

3http://www.cs.cmu.edu/~scandal/nesl/nesl3.1.html

            Language       SLOC
Compiler    Perl           2000
            lex            82
            yacc           236
Runtime     C              2900
            C++            93
            x86 assembly   477

Figure 3.9: Source lines of code for NDPC. The runtime counts include code both for use on Nautilus and for user-level use.

operations we have parallelized. The remainder (100 Makefile lines, 140 C lines) reflects

the amount of glue logic that was needed to bring the VCODE interpreter and the serial

CVL implementation into Nautilus. The hardest part of implementing this glue logic is

assuring that the compilation and linking models match those of Nautilus, which is reflected

in the Makefile changes. The effort took roughly ten weeks to complete.

3.5 Example: NDPC

We have created a different implementation of a subset of the NESL language which we refer to as “Nested Data Parallelism in C/C++” (NDPC). This is implemented as a

source-to-source translator whose input is the NESL subset and whose output is C++ code

(with C bindings) that uses recursive fork/join parallelism instead of NESL’s flattened vector parallelism. The C++ code is compiled directly to object code and executes without

any interpreter or JIT. Because C/C++ is the target language, the resulting compiled NDPC program can easily be directly linked into and called from C/C++ codebases. NDPC’s

collection type is defined as an abstract C++ class, which makes it feasible for the generated

code to execute over any C/C++ data structure provided it exposes or is wrapped with

the suitable interface. We made this design decision to further facilitate “dropping into

NDPC” from C/C++ when parallelism is needed. In the context of Figure 3.1(c), we plan that the runtime processing of a call to an NDPC function will include crossing the

boundary between the general-purpose and specialized portions of the hybrid virtual

machine (HVM).

Figure 3.9 breaks down the NDPC implementation in terms of the languages used

and the size of the source code in each. The NDPC compiler is written in Perl and its

code breaks down evenly between (a) parsing and name/type unification, and (b) code

generation. The code generator can produce sequential C++ or multithreaded C++. The

generated code uses a runtime that is written in C, C++, and assembly that provides preemptive threads and simple work stealing.

Code generation is greatly simplified because the runtime supports a thread_fork() primitive. The runtime guarantees that a forked thread will terminate at the point it

attempts to return from the current function. The NDPC compiler guarantees the code it

generates for the current function will only use the current caller and callee stack frames,

that it will not place pointers to the stack on the stack, and that the parent will join with any

forked children before it returns from the current function. The runtime’s implementation

of thread_fork() can thus avoid complex stack management. Furthermore, it can poten-

tially provide very fast thread creation, despite the fork() semantics, because it can avoid

most stack copying as only data on the caller and callee stack frames may be referenced by

the child. In some cases, the compiler can determine the maximum stack size (e.g., for a

leaf function), and supply this to the runtime, further speeding up thread creation.
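The sketch below illustrates the kind of code the NDPC compiler could emit against such a primitive. The names thread_fork(), thread_join(), and the example body are illustrative assumptions rather than the actual NDPC runtime interface; the key point is that the child does its share of the work and simply returns, while the parent joins before leaving the function.

    #include <stddef.h>

    typedef long thread_id_t;

    /* Assumed runtime primitives (hypothetical signatures): thread_fork()
     * returns 0 in the forked child and the child's id in the parent; the
     * child terminates when it returns from the current function. */
    thread_id_t thread_fork(void);
    void        thread_join(thread_id_t child);

    static void square_range(double *v, size_t lo, size_t hi)
    {
        for (size_t i = lo; i < hi; i++)
            v[i] *= v[i];
    }

    /* Generated function: process both halves of v in parallel. */
    void square_all(double *v, size_t n)
    {
        thread_id_t child = thread_fork();
        if (child == 0) {
            square_range(v, 0, n / 2);  /* child: first half                */
            return;                     /* returning terminates the child   */
        }
        square_range(v, n / 2, n);      /* parent: second half              */
        thread_join(child);             /* join before leaving the function */
    }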

We also note that the compiler knows exactly what parts of the stack can be written,

and it knows that the lifetime of a child thread nests within the lifetime of its parent. This

knowledge could be potentially leveraged at the kernel level by maintaining only a single

copy of read-only data on the parent stack and the child stacks.

Two user-level implementations of threading are included in the runtime, one of which

is a veneer on top of pthreads, while the other is an implementation of user-level fibers

that operates similarly to a kernel threading model. The kernel-level implementation, for

Nautilus, consists of only 150 lines of code, as Nautilus supports this threading model

internally. It is important to point out that the thread fork capability was quite natural to

add to Nautilus, but somewhat painful to graft over pthreads. Even for user-level fibers,

the thread fork capability requires somewhat ugly trampoline code which is naturally

avoided in the kernel.

As with the NESL VCODE port (Section 3.4) the primary challenges in making NDPC

operate at kernel-level within Nautilus have to do with assuring that the compilation

and linking models match those of Nautilus. An additional challenge has been dealing with C++ in the kernel, although the C++ feature set used by code generated by NDPC is

considerably simpler than that needed by Legion. Currently, we are able to compile simple

benchmarks such as nested data parallel quicksort into Nautilus kernels and run them.

NDPC is a work in progress, but the effort to bring it up in Nautilus in its present state

required about a week.

3.6 Conclusions

In this chapter, I made the case for transforming parallel runtimes into operating system

kernels, forming hybrid runtimes (HRTs). The motivations for HRTs include the increasing

complexity of hardware, the convergence of parallel runtime concerns and abstractions in

managing such hardware, and the limitations of executing the runtime at user-level, both

in terms of limited hardware access and limited control over kernel abstractions. For the

Legion runtime, we were able to exceed Linux performance with simple techniques that

can only be done in the kernel. Building Nautilus was a six person-month effort, while porting the runtimes was a four person-month effort. It is somewhat remarkable that

even with a fairly nascent kernel framework, just by dropping the runtime down to kernel

level and taking advantage of a kernel-only feature in two lines of code, we can exceed performance on Linux, an OS that has undergone far more substantial development and

tuning effort.

How this is possible will become more clear in the next chapter, where I discuss the

detailed design, implementation, and evaluation of Nautilus.

Chapter 4

Nautilus Aerokernel Framework

This chapter discusses in depth the design, implementation, and evaluation of the Nautilus

Aerokernel, an enabling tool for hybrid runtimes. As I outlined briefly in previous chapters,

Nautilus is a kernel framework specifically designed to support the creation of HRTs. It provides a basic kernel that can be booted within milliseconds after boot loader execution

on a multicore, multisocket machine, accelerator, or virtual machine. Nautilus includes

basic building blocks such as simple memory management, threads, synchronization, IPIs

and other in-kernel abstractions that a parallel runtime can be ported to or be built on top

of to become an HRT. While Nautilus provides functionality, it does not require the HRT

to use it, nor does it proscribe the implementation of other functionality. Nautilus was

developed for 64-bit x86 machines (x64) and then ported to the Intel Xeon Phi.

I will present detailed microbenchmark evaluations of Nautilus, comparing its func-

tionality and performance to that of analogous facilities available at user-level on Linux

that are typically used within parallel runtimes. Nautilus facilities such as thread

creation and events operate up to two orders of magnitude faster than Linux due to their

implementation and by virtue of the fact that there is no kernel/user boundary to cross.

They also operate with much less variation in performance, an important consideration for

many models of parallelism, particularly with scale.

Section 4.1 presents an overview of the Nautilus Aerokernel. Section 4.2 describes

the design of Nautilus and its various mechanisms. Section 4.3 describes the Intel Xeon

Phi port of Nautilus. Section 4.4 contains a detailed evaluation of the mechanisms and

abstractions that Nautilus provides. Finally, Section 4.5 closes this chapter with concluding

comments.

4.1 Nautilus

The design and implementation of Nautilus has been driven by studying parallel run-

times, including the three (Legion, NESL, NDPC) described in the previous chapter, the

SWARM data flow runtime [131], ParalleX [113], Charm++ [114], the futures and places parallelism extensions to the Racket runtime [185, 187, 184], and nested data parallelism in

Manticore [81, 79] and Haskell [48, 49]. We have studied these codebases and in the case of

Legion, NDPC, SWARM, and Racket, we also interviewed their developers to understand

their views of the limitations of existing kernel support.

Nautilus is not a general-purpose kernel. In fact, there is not even a user-space. Instead,

its design focuses specifically on helping parallel runtimes to achieve high performance

as HRTs. The non-critical path functionality of the runtime is assumed to be delegated,

for example to the host in the case of an accelerator or to the ROS portion of an HVM (as

described in Chapter 5). The abstractions Nautilus provides are based on our analysis

of the needs of the runtimes we examined. Our abstractions are optional. Because an

HRT runs entirely at kernel level, the developer can also directly leverage all hardware


Figure 4.1: Structure of Nautilus.

functionality to create new abstractions.

Our choice of abstractions was driven in part to make it feasible to port existing parallel runtimes to become Aerokernel-based HRTs. A more open-ended motivator was

to facilitate the design and implementation of new parallel runtimes that do not have a

built-in assumption of being processes.

4.2 Design

Nautilus is designed to boot the machine, discover its capabilities, devices, and topology,

and immediately hand control over to the runtime. Figure 4.1 shows the basic structure,

showing the functionality provided by Nautilus in the context of the runtime and applica-

tion. Note that Nautilus is a thin layer in the HRT model, and that in this model there is

no user space. The runtime and the application have full access to hardware and can pick

and choose which Aerokernel functionality to use. The entire assemblage of the figure is

compiled into a Multiboot2-compliant kernel.

We focus the following discussion on functionality where Nautilus differs most from

other kernels. In general, the defaults for Nautilus functionality strive to be simple and

easy to reason about from the HRT developer’s viewpoint.

4.2.1 Threads

In designing a threading model for Aerokernel, we considered the experiences of others,

including work on high-performance user-level threading techniques like scheduler activations [10] and Qthreads [194]. Ultimately, we designed our threads to be lightweight in

order to provide an efficient starting point for HRTs. Nautilus threads are kernel threads.

A context switch between Nautilus threads never involves a change of address spaces.

Nautilus threads can be configured to operate either preemptively or cooperatively, the

latter allowing for the elimination of timer interrupts and scheduling of threads exactly as

determined by the runtime.

The nature of the threads in Nautilus is determined by how the runtime uses them.

This means that we can directly map a runtime’s logical view of the machine onto the physical machine. This is not typically possible to do with any kind

of guarantees when running in user-space. In fact, this is one of the concerns that the

Legion runtime developers expressed with running Legion on Linux (see Appendix A).

The default scheduler and mapper binds a thread to a specific hardware thread as selected

by the thread creator, and schedules round-robin. The runtime developer can easily change

these policies.

Another distinctive aspect of Nautilus threads is that a thread fork (and join) mechanism

is provided in addition to the common interface of starting a new thread with a clean

new stack in a function. A forked thread has a limited lifetime and will terminate when it

returns from the current function. It is incumbent upon the runtime to manage the parent

and child stacks correctly. This capability is leveraged in our ports of NESL and NDPC.

Thread creation, context switching, and wakeup are designed to be fast and to leverage

runtime knowledge. For example, maximum stack sizes and context beyond the GPRs can

be selected at creation time. Because interrupt context uses the current thread’s stack, it is

even possible to create a single large-stacked idle thread per hardware thread and then

drive computation entirely by inter-processor interrupts (IPIs), one possible mapping of

an event-driven parallel runtime such as SWARM.
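As a rough illustration of the interface style this implies, the sketch below creates one bound, cooperatively scheduled worker per hardware thread with a caller-chosen stack size. The names here (hrt_thread_create, struct hrt_thread_attr, and the placeholder worker) are assumptions for the sake of the example and do not match the actual Nautilus API, which differs in detail.

    #include <stddef.h>

    typedef void (*hrt_thread_fn_t)(void *arg);

    struct hrt_thread_attr {
        int    cpu;          /* hardware thread to bind to, chosen by the creator */
        size_t stack_size;   /* maximum stack size, selected at creation time     */
        int    preemptible;  /* 0 = cooperative: no timer interrupts needed       */
    };

    /* Assumed creation call; returns an opaque thread handle (or NULL). */
    void *hrt_thread_create(hrt_thread_fn_t fn, void *arg,
                            const struct hrt_thread_attr *attr);

    static void worker(void *arg)
    {
        (void)arg;   /* runtime-defined work loop, e.g., draining a per-core task queue */
    }

    void spawn_workers(int ncpus)
    {
        for (int cpu = 0; cpu < ncpus; cpu++) {
            struct hrt_thread_attr attr = {
                .cpu = cpu, .stack_size = 2 * 1024 * 1024, .preemptible = 0,
            };
            (void)hrt_thread_create(worker, NULL, &attr);
        }
    }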

4.2.2 Synchronization and Events

Nautilus provides several variants of low-level spinlocks, including MCS locks and bakery

locks. These are similar to those available in other kernels, and comparable in performance.

Nautilus focuses to a large extent on asynchronous events, which are a common

abstraction that runtime systems often use to distribute work to execution units, or workers.

For example, the Legion runtime makes heavy use of them to notify logical processors

(Legion threads) when there are Legion tasks that are ready to be executed. Userspace

events require costly user/kernel interactions, which we eliminate in Nautilus.

Nautilus provides two implementations of condition variables that are compatible with those in pthreads. These implementations are tightly coupled with the scheduler,

eliminating unnecessary software interactions. When a condition is signaled, the default

Nautilus condition variable implementation will simply put the target thread on its re-

spective hardware thread’s ready queue. This, of course, is not possible from a user-mode

thread.

When a thread is signaled in Nautilus it will not run until the scheduler starts it. For preemptive threads, this means waiting until the next timer tick, or an explicit yield from

the currently running thread. Our second implementation of condition variables mitigates

this delay by having the signaling thread “kick” the appropriate core with an IPI after it

has woken up the waiting thread. The scheduler recognizes this condition on returning

from the interrupt and switches to the awakened thread.
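A minimal sketch of this “kick” variant follows. Locking is elided for brevity, and all names (struct hrt_condvar, ready_enqueue, apic_send_ipi, RESCHED_VECTOR) are illustrative stand-ins rather than the actual Nautilus symbols.

    #include <stdint.h>

    #define RESCHED_VECTOR 0xF0   /* assumed vector reserved for reschedule kicks */

    struct hrt_thread {
        int cpu;                  /* hardware thread this waiter is bound to */
        struct hrt_thread *next;
    };

    struct hrt_condvar {
        struct hrt_thread *waiters;   /* simple singly linked wait list */
    };

    /* Assumed kernel primitives. */
    void ready_enqueue(int cpu, struct hrt_thread *t);  /* put t on cpu's ready queue */
    void apic_send_ipi(int cpu, uint8_t vector);        /* raise an IPI on cpu        */

    void hrt_cond_signal(struct hrt_condvar *cv)
    {
        struct hrt_thread *t = cv->waiters;
        if (!t)
            return;
        cv->waiters = t->next;

        ready_enqueue(t->cpu, t);              /* step 1: make the waiter runnable      */
        apic_send_ipi(t->cpu, RESCHED_VECTOR); /* step 2: kick the core so its scheduler
                                                  switches to the waiter on IRQ return  */
    }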

The runtime can also make direct use of IPIs, giving it the ability to force immediate

execution of a function of its choosing on a remote destination core. Note that the IPI

mechanism is unavailable when running in user-space.

A more detailed treatment of asynchronous events will be given in Chapter 6.

4.2.3 Topology and Memory Allocation

Modern NUMA machines organize memory into separate domains according to physical

distance from a physical CPU socket, core, or hardware thread. This results in variable

latency when accessing memory in the different domains and also means achieving high

memory bandwidth requires leveraging multiple domains simultaneously. Platform

firmware typically enumerates these NUMA domains and exposes their sizes and topology

to the operating system in a way that supports both modern and legacy OSes.

Nautilus captures this topology information on boot and exposes it to the runtime.

The page and heap allocators in Nautilus allow the runtime to select which domains to

allocate from, with the default being that allocations are satisfied from the domain closest

to the current location of the thread requesting the allocation. All allocations are carried

out immediately. This is in contrast to the policy of deferred allocations whose domains

are determined on first touch, the typical default policy for general purpose kernels. A

consequence is that a runtime that implements a specific execution policy, for example

the owner-computes rule (e.g., as in HPF [98]) or inspector-executor [62], can more easily

reason about how to efficiently map a parallel operation to the memory hardware.
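The sketch below shows the kind of call pattern this enables, with hypothetical interfaces (my_cpu_id, hrt_numa_domain_of_cpu, hrt_malloc_in_domain) standing in for whatever the Aerokernel actually exposes. The point is that the runtime names a NUMA domain explicitly and the allocation is satisfied eagerly from that domain rather than on first touch.

    #include <stddef.h>

    int   my_cpu_id(void);                          /* assumed: current hardware thread   */
    int   hrt_numa_domain_of_cpu(int cpu);          /* assumed: topology query            */
    void *hrt_malloc_in_domain(size_t sz, int dom); /* assumed: eager, domain-bound alloc */

    /* Owner-computes style placement: each worker allocates its slice of the
     * data in the domain closest to the core that will compute on it. */
    double *alloc_local_slice(size_t elems)
    {
        int dom = hrt_numa_domain_of_cpu(my_cpu_id());
        return (double *)hrt_malloc_in_domain(elems * sizeof(double), dom);
    }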

A thread’s stack is allocated using identity-mapped addresses based on the initial

binding of the thread to a hardware thread, again to the closest domain. Since threads

do not by default migrate, stack accesses are low latency, even across a large stack. If the

runtime is designed so that it does not allow or can fix pointers into the stack, even the

stack can be moved to the most friendly domain if the runtime decides to move the thread

to a different hardware thread.

We saw NUMA effects that would double the execution time of a long-running parallel

application on the Legion runtime. While user-space processes do typically have access to

NUMA information and policies, runtimes executing in Nautilus have full control over the placement of threads and memory and can thus enjoy guarantees about what can affect

runtime performance.

4.2.4 Paging

Nautilus has a simple, yet high-performance paging model aimed at high-performance parallel applications. When the machine boots up, each hardware thread identity-maps

the entire physical address space using large pages (2MB and 1 GB pages currently, 512 GB pages when available in hardware) to create a single unified address space. Optionally, the

identity map can be offset into the “higher half” of the x64 address space (one use of this

is discussed in Chapter 5). Nautilus can also be linked to load anywhere in the physical

address space.
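For illustration, the following is a generic sketch of how such an identity map can be built with 1 GB pages on x64: one PML4 entry points to a PDPT whose 512 entries each map 1 GB with the page-size bit set. This is not the Nautilus paging code; it covers only the first 512 GB and assumes the kernel itself is already identity-mapped so that a pointer’s value is also its physical address.

    #include <stdint.h>

    #define PTE_PRESENT  (1ULL << 0)
    #define PTE_WRITE    (1ULL << 1)
    #define PTE_HUGE     (1ULL << 7)   /* PS bit: 1 GB page at the PDPT level */
    #define GB           (1ULL << 30)

    static uint64_t pml4[512] __attribute__((aligned(4096)));
    static uint64_t pdpt[512] __attribute__((aligned(4096)));

    void build_identity_map(void)
    {
        for (uint64_t i = 0; i < 512; i++)
            pdpt[i] = (i * GB) | PTE_PRESENT | PTE_WRITE | PTE_HUGE;

        pml4[0] = (uint64_t)(uintptr_t)pdpt | PTE_PRESENT | PTE_WRITE;

        /* Loading CR3 with the physical address of pml4 (not shown) activates
         * the map on this hardware thread. */
    }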

The static identity map eliminates expensive page faults and TLB shootdowns, and

reduces TLB misses. These events not only reduce performance, but also introduce

unpredictable OS noise [74, 75, 101] from the perspective of the runtime developer. OS

noise is well known to introduce timing variance that becomes a serious obstacle in large-

scale distributed machines running parallel applications. The same will hold true for single

Environment        TLB Misses
Linux user-space   1588314
Nautilus           100

Figure 4.2: Unified TLB misses during a run of HPCG on our x64 machine.

nodes as core counts continue to scale up. The introduction of variance by OS noise (not

just by asynchronous paging events) not only limits the performance and predictability

of existing runtimes, but also limits the kinds of runtimes that can take advantage of the

machine. For example, runtimes that need tasks to execute in synchrony (e.g., in order to

support a bulk-synchronous parallel [85] application or a runtime that uses an abstract vector model) will experience serious degradation if OS noise comes into play.

To measure the benefits of the Nautilus paging model, we ran HPCG in Legion (dis-

cussed in the previous chapter) on top of both Nautilus and Linux user-space and counted

the TLB misses during this run. Figure 4.2 shows the results. This includes dTLB load

and store misses and iTLB load misses for all page sizes. In Nautilus, we program the performance counters directly, and for Linux we use the perf stat utility.

The use of a single unified address space also allows fast communication between

threads, and eliminates much of the overhead of context switches. The only context

switches are between kernel threads, so no page table switch or kernel-triggered TLB flush

ever occurs. This is especially useful when Nautilus runs virtualized, as a large portion

of VM exits come from paging-related faults and dynamic mappings initiated by the OS, particularly using shadow paging. A shadow-paged Aerokernel exhibits the minimum possible shadow page faults, and shadow paging can be more efficient than nested paging,

except when shadow page faults are common [14].

4.2.5 Timers

Nautilus optionally enables a per-hardware thread scheduler tick mechanism based on the

Advanced Programmable Interrupt Controller (APIC) timer. This is only needed when preemption is configured.

For high resolution time measurement across hardware threads, Nautilus provides a

driver for the high-precision event timer (HPET) available on most modern x64 machines.

This is a good match for real-time measurement in the runtimes we examined. For per-hardware-thread timing, the cycle counter is typically used.

4.2.6 Interrupts

External interrupts in Nautilus work just as in any other operating system, with the excep-

tion that by default only the APIC timer interrupt is enabled at bootup (and only when preemption is configured). The runtime has complete control over interrupts, including

their mapping, assignment, and priority ordering.

4.2.7 API for Porting

A unique feature of Nautilus is its internal API meant to facilitate porting user-space

applications and runtimes to kernels. These APIs will be familiar to developers used to writing parallel code, and they include interfaces for manipulating threads, events, locks,

and others. The threading interface is partly compatible with POSIX threads (pthreads).

Nautilus also includes many familiar functions from glibc.

4.3 Xeon Phi

We have ported Nautilus to the Intel Xeon Phi. Although the Phi is technically an x64

machine, it has differences that make porting a kernel to it challenging. These include

the lack of much PC legacy hardware, a distinctive APIC addressing model, a distinctive

frequency/power/thermal model, and a bootstrap and I/O model that is closely tied to

Intel’s MPSS stack. Our port consists of two elements.

Philix is an open-source set of tools to support booting and communicating with a

third-party kernel on the Phi in compliance with Intel’s stack, while at the same time not

requiring the kernel to itself support the full functionality demanded of MPSS. Philix also

includes basic driver support for the Phi that can be incorporated into the third-party

kernel. This includes console support on both the host and Phi sides to make debugging a

new Phi kernel easier. Philix comprises 1150 lines of C.

Our changes to add Phi support to Nautilus comprised about 1350 lines of C. This

required about 1.5 person months of kernel developer effort, mostly spent in ferreting out

the idiosyncrasies of the Phi.

4.4 Microbenchmarks

I now present an evaluation of the performance of the basic primitives in Nautilus that are particularly salient to HRT creation, comparing them to Linux user-level and kernel-level primitives. The performance of basic primitives is important because runtimes build on

these mechanisms. Although they can use the mechanisms cleverly (Legion’s task model

is effectively a thread pool model, for example), making the underlying primitives and environment faster can make runtimes faster, as we shall see. The experimental setup in this section is the same as discussed in Section 3.3.1.


Figure 4.3: Thread creation latency. Nautilus thread creations are on average two orders of magnitude faster than Linux user-space (pthreads) or kernel thread creations and have at least an order of magnitude lower variance.

4.4.1 Threads

Figure 4.3 compares thread creation latency between Linux user-space (pthreads), Linux kernel threads, and Nautilus threads. We compare with pthreads because runtimes (such as Legion) build on these mechanisms. While thread creation may or may not be on the performance critical path for a particular runtime (Legion creates threads only once

at startup time and binds them to physical cores), the comparison demonstrates the

lightweight capabilities of Nautilus. The cost measured is the time for the thread creation

function to return to the creator thread. For Nautilus, this includes placing the new thread

in the run queue of its hardware thread. A thread fork in Nautilus has similar latency since

the primary difference compared to ordinary thread creation has to do with the content of the

initial stack for the new thread. The time for the new thread to begin executing is bounded

by the context switch time, which we measure below.

On both platforms, thread creation in Nautilus has two orders of magnitude lower

latency on average than both Linux options, and, equally important, the latency has little variance. Thread creation in Nautilus also scales well, as, like the others, it involves

constant work. From an HRT developer’s point of view, these performance characteris-

tics potentially make the creation of smaller units of work feasible, allow for tighter

synchronization of their execution, and allow for large numbers of threads.

Figure 4.4 illustrates the latencies of context switches between threads on the two platforms, comparing Linux and Nautilus. In both cases, no floating point or vector state

is involved—the cost of handling such state is identical across Linux and Nautilus. The

average cost of a Nautilus context switch on the x64 is about 10% lower than that of Linux,

but Nautilus exhibits a variance that’s lower by a factor of two. On the Phi, Nautilus

exhibits two orders of magnitude lower variance in latency and more than a factor of two

lower average latency. The instruction count for a thread context switch in Nautilus is

much lower than that for Linux. On the x64, this does not have much effect because the

hardware thread is superscalar. On the other hand, the hardware thread on the Phi is not

only not superscalar, but also shares its execution core with three other hardware threads, which issue round-robin, instruction by instruction. As a consequence, the lower instruction count translates into a much


Figure 4.4: Thread context switch latency. Nautilus thread context switches are similar in average performance to Linux on x64 and over two times faster on the Phi. In both cases, the variance is considerably lower.

lower average latency on the Phi.

The lower average context switch costs on the Phi translate directly into benefits for

an HRT developer because they make it feasible to partition work more finely. On both platforms, the lower variance makes finer-grained cooperation feasible. The default policies described in Section 4.2, combined with the performance characteristics shown

here are intended to provide a predictable substrate for HRT development. The HRT

developer can also readily override the default scheduling and binding model while still

leveraging the fast thread creation/fork and context switch capabilities.


Figure 4.5: Event wakeup latency. Nautilus condition variable wakeup latency is on average five times faster than Linux (pthreads), and has 3–10 times less variation.

4.4.2 Events

Figure 4.5 compares the event wakeup performance for the mechanisms discussed in

Section 4.2 on the two platforms. We measure the latency from when an event is signaled

to when the waiting thread executes. We compare the cost of condition variable wakeup

in user mode in Linux with our two implementations of them (with and without IPI) in

Nautilus. We also show the performance of the Linux fast user space mutex (“futex”) primitive, and of a one-way IPI, which is the hardware limit for an event wakeup.

For condition variables, the latency measured is from the call to pthread_cond_signal

(or equivalent) and the subsequent wakeup from pthread_cond_wait (or equivalent). The

IPI measurement is the time from when the IPI is initiated until when its interrupt handler

on the destination hardware thread has written a memory location being monitored by the

source hardware thread.

The average latency for Nautilus’s condition variables (with IPI) is five times lower

than that of Linux user-level on both platforms. It is also three to five times lower than the

futex. Equally important, the variance in this latency is much lower on both platforms, by

a factor of three to ten. From an HRT developer’s perspective, these performance results

mean that much “smaller” events or smaller units of work can feasibly be managed, and

that these events and work can be more tightly synchronized in time.

Because they operate in kernel mode, HRTs can make direct use of IPIs and thus operate

at the hardware limit of asynchronous event notification, which is one to three thousand

cycles on our hardware. Figure 4.6 illustrates the latency of IPIs, as described earlier, on

our two platforms. The specific latency depends on which two cores are involved and the

machine topology. This is reflected in the notches in the CDF curve. Note however that


Figure 4.6: CDF of IPI one-way latencies, the hardware limit of asynchronous signaling that is available to HRTs.

there is little variation overall: the 5th and 95th percentiles are within hundreds of cycles of each other.

Asynchronous event notification and IPIs will be explored in detail in Chapter 6.

4.5 Conclusion

In this chapter we saw a detailed description of the Nautilus Aerokernel framework,

one of the enabling tools for hybrid runtimes. Nautilus provides a suite of functionality

specialized to HRT development that can perform up to two orders of magnitude faster

than the general purpose functionality in the Linux kernel while also providing much less variation in performance. Nautilus functionality leads to 20-40% performance gains in

an application benchmark for the Legion runtime system on x64 and Xeon Phi, as seen in

the previous chapter. In the next chapter, we will see how the hybrid virtual machine, the

other main tool for enabling HRTs, eases their deployment.

Chapter 5

Hybrid Virtual Machines

While running an HRT on bare metal is suitable for some contexts (e.g., an accelerator or a

node of a supercomputer), we may also want to use an HRT in shared contexts or ease the porting of runtimes that have significant dependencies on an existing kernel. The hybrid virtual machine (HVM) facilitates these use cases. HVM is an extension1 to the open-source

Palacios VMM [129] that makes it possible to create a VM that is internally partitioned

between virtual cores that run a ROS and virtual cores that run an HRT. The ROS cores see

a subset of the VM’s memory and other hardware, while HRT cores see all of it and may

be granted specialized physical hardware access by the VMM. The ROS application can

invoke functions in the HRT that operate over data in the ROS application. Finally, the

HRT cores can be booted independently of the ROS cores using a model that allows an

HRT to begin executing in 10s of microseconds, and with which an Aerokernel-based HRT

(e.g. an HRT based on Nautilus) can be brought up in 10s of milliseconds. This makes

HRT startup in the HVM comparable in cost to fork()/exec() functionality in the ROS.

1Our prototype comprises about 3,500 lines of C and assembly, and we believe it could be readily implemented in other VMMs.

In effect, the HVM allows a portion of an x64 machine to act as an accelerator.

The HVM extensions to Palacios are open-source and publicly available at http://v3vee.org.

In this chapter, I first outline the deployment models for HRT and HVM in Section 5.1.

Section 5.2 gives the details of the HVM model. Section 5.3 discusses issues related to

security and protection in an HRT+HVM environment. Section 5.4 contains a description of

the merged address space mechanism for leveraging ROS functionality in an HRT. Section 5.5

gives an explanation of communication between a ROS and an HRT in the HVM model.

Section 5.6 discusses the boot and reboot process of HRTs within the HVM. Section 5.7

ends the chapter with final thoughts.

5.1 Deployment

We currently envision four deployment models for an HRT:

• Dedicated: the machine is dedicated to the HRT, for example when the machine is an

accelerator or a node of a supercomputer. This is the model used in the previous

chapters.

• Partitioned: the machine is a supercomputer node that is physically partitioned [154],

with a partition of cores dedicated to the HRT.

• VM: The machine’s hypervisor creates a VM that runs the HRT.

• HVM: The machine’s hypervisor creates a VM that is internally partitioned and runs

both the HRT and a general purpose OS we call the “regular” OS (ROS).

For the VM and HVM deployment models, the hypervisor partitions/controls resources,

and provides multiprogramming and access control. In the HVM model, the HRT can

leverage both kernels, as we describe in detail here. Note that in an HVM, the ROS memory

is vulnerable to modification by the HRT, but we think of the two as a unit; an unrelated process would be run in a separate VM (or on the host).

The purpose of the hybrid virtual machine (HVM) environment, illustrated in Figure 3.1,

is to enable the execution of an HRT in a virtual environment simultaneously with the ROS.

That is, a single VM is shared between a ROS and an HRT. The virtual environment exposed

to the HRT may be different from and indeed much lower-level than the environment

exposed to the ROS. It can also be rebooted independently of the ROS. At the same time,

the HRT has access to the memory of the ROS and can interrupt it. In some ways, the HRT

can be viewed as providing an “accelerator” for the ROS and its applications, and the HVM provides the functionality of bridging the two. Using the HVM, the performance critical

elements of the parallel runtime can be moved into the HRT, while runtime functionality

that requires the full stack remains in the ROS.

5.2 Model

The user creates a virtual machine configuration, noting cores, memory, NUMA topology,

devices, and their initial mappings to the underlying hardware. An additional configu-

ration block specifies that this VM is to act as an HVM. The configuration contains three

elements: (1) a partition of the cores of the VM into two groups: the HRT cores and the ROS

cores, which will run their respective kernels; (2) a limit on how much of the VM’s physical

memory will be visible and accessible from a ROS core; and (3) a Multiboot2-compliant

kernel, such as an HRT built on top of Aerokernel. We extend the Multiboot2 specification

to support HRTs. The existence of a special header in the Multiboot section of the ELF file

indicates that the HRT boot model described here is to be followed instead of the standard

Multiboot2 model.

The remainder of the model can be explained by considering the view of a ROS core versus that of an HRT core. The VMM maintains the following invariants for a ROS core:

1. Only the portion of the VM’s physical memory designated for ROS use is visible and

accessible.

2. Inter-processor interrupts (IPIs) can be sent only to another ROS core.

3. Broadcast, group, lowest-priority, and similar IPI destinations consider only the ROS

cores.

An HRT core, on the other hand, has no such constraints. All of the VM’s physical

memory is visible and accessible. IPIs can be sent to any core, and broadcast, group,

lowest-priority and similar IPI destinations can consider either all cores or only HRT cores.

This is set by the HRT. Generally speaking, interrupts from interrupt controllers such as

IOAPICs and physical interrupts can be delivered to both ROS and HRT cores. The default

is that they are delivered only to ROS cores, but an HRT core can request them. Broadcast,

group, lowest priority, and similar destinations are computed over the ROS cores and any

requesting HRT cores. Collectively, the ROS cores form what appears to the kernel running

on them as a smaller machine (in terms of memory and cores) than it actually is.

5.3 Protection

The invariants described above are implemented through two techniques. The memory

invariants leverage the fact that each core in the VMM we use, similar to other VMMs,

maintains its own paging state, for example shadow or nested page tables. This state is

incrementally updated due to exits on the core, most importantly due to page faults or

nested page faults. For example, a write to a previously unmapped address causes an

exit during which we validate the access and add the relevant page table entries in the

shadow or nested page tables if it is an appropriate access. A similar validation occurs for

memory-mapped I/O devices. We have extended this model so that validation simply

takes into account the type of core (ROS or HRT) on which the exit occurred. This allows

us to catch and reject attempts by a ROS core to access illegal guest physical memory

addresses.

The interrupt invariants are implemented by extensions to the VMM’s APIC device. At

the most basic level, IPI delivery from any APIC, IOAPIC, or MSI takes into account the

type of core from which the IPI originates (if any) and the type of core that it targets. IPIs

from a ROS core to an HRT core are simply dropped. External interrupts targeting an HRT

core are delivered only if previously requested. Additionally, the computations involved with broadcast and group destinations for IPIs and external interrupts are modified so

that only cores that are prospective targets are included. Similarly, the determination of

the appropriate core or cores for a lowest priority destination includes only those cores to which the interrupt could be delivered under the previous restrictions. These mechanisms

allow us to reject any attempt to send an HRT core an unwanted interrupt.
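The check itself can be compact. The sketch below gives an illustrative version, with core_is_hrt and core_wants_externals standing in for whatever the HVM configuration actually records; it is not the Palacios code.

    #include <stdbool.h>

    bool core_is_hrt(int core);           /* assumed: from the HVM configuration         */
    bool core_wants_externals(int core);  /* assumed: HRT core opted in to external IRQs */

    /* Returns true if the interrupt may be injected into dst_core.
     * src_core is negative for external (non-IPI) interrupts. */
    bool hvm_interrupt_allowed(int src_core, int dst_core)
    {
        bool dst_hrt = core_is_hrt(dst_core);

        if (src_core >= 0) {                  /* IPI */
            if (!core_is_hrt(src_core) && dst_hrt)
                return false;                 /* ROS -> HRT IPIs are dropped   */
            return true;                      /* HRT cores may target any core */
        }

        /* External interrupt: an HRT core receives it only if it asked for it. */
        return !dst_hrt || core_wants_externals(dst_core);
    }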


Figure 5.1: Merged address space.

5.4 Merged Address Space

The HRT can access the entire guest physical address space, and thus can operate directly

on any data within the ROS. However, to simplify the creation of the legacy path shown

in Figure 3.1, we provide the option to merge an address space within the ROS with the

address space of the HRT, as is shown in Figure 5.1. When a merged address space is in

effect, the HRT can use the same user-mode virtual addresses that are used in the ROS. For

example, the parallel runtime in the ROS might load files and construct a pointer-based

data structure in memory. It could then invoke a function within its counterpart in the

HRT to operate on that data.

To achieve this we leverage the canonical 64-bit address space model of x64 processors,

and its wide use within existing ROS kernels, such as Linux. In this model, the virtual

address space is split into a “lower half” and a “higher half” with a gap in between, the

size of which is implementation dependent. In a typical process model, e.g., Linux, the

lower half is used for user addresses and the higher half is used for the kernel.

For an HRT that supports it, the HVM arranges so that the physical address space is

identity-mapped into the higher half of the HRT address space. That is, within the HRT,

the physical address space mapping (including the portion of the physical address space

only the HRT can access) occupies the same portion of the virtual address space that is

occupied by the ROS kernel, the higher half. Without a merger, the lower half is unmapped

and the HRT runs purely out of the higher half. When a merger is requested, we map

the lower half of the ROS’s current process’s address space into the lower half of the HRT

address space. For an Aerokernel-based HRT (like Nautilus), this is done by copying the

first 256 entries of the PML4 from the PML4 pointed to by the ROS’s CR3 to the HRT’s

PML4 and then broadcasting a TLB shootdown to all HRT cores.
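A minimal sketch of this merge step follows, assuming an identity (or offset) mapping of guest physical memory and illustrative helper names (phys_to_virt, broadcast_tlb_shootdown, hrt_pml4).

    #include <stdint.h>
    #include <string.h>

    #define PML4_LOWER_HALF_ENTRIES 256
    #define CR3_ADDR_MASK           0x000ffffffffff000ULL

    extern uint64_t hrt_pml4[512];         /* the HRT's own top-level page table      */
    void *phys_to_virt(uint64_t pa);       /* assumed: identity/offset mapping helper */
    void broadcast_tlb_shootdown(void);    /* assumed: IPI all HRT cores              */

    void merge_ros_address_space(uint64_t ros_cr3)
    {
        uint64_t *ros_pml4 = (uint64_t *)phys_to_virt(ros_cr3 & CR3_ADDR_MASK);

        /* Map the ROS process's user (lower-half) addresses into the HRT. */
        memcpy(hrt_pml4, ros_pml4, PML4_LOWER_HALF_ENTRIES * sizeof(uint64_t));

        /* Stale lower-half translations may be cached on any HRT core. */
        broadcast_tlb_shootdown();
    }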

Because the parallel runtime in the ROS and the HRT are co-developed, the responsibil-

ity of assuring that page table mappings exist for lower half addresses used by the HRT in

a merged address space is the parallel runtime’s. For example, the parallel runtime can pin

memory before merging the address spaces, or introduce a protocol to send page faults

back to the ROS. The former is not an unreasonable expectation in a high performance

environment as we would never expect to be swapping.

5.5 Communication

The HVM model makes it possible for essentially any communication mechanism between

the ROS and HRT to be built, and most of these require no specific support in the HVM.

Item                                  Cycles   Time
Address Space Merger                  ∼33 K    15 µs
Asynchronous Call                     ∼25 K    11 µs
Synchronous Call (different socket)   ∼1060    482 ns
Synchronous Call (same socket)        ∼790     359 ns

Figure 5.2: Round-trip latencies of ROS↔HRT interactions (x64).

As a consequence, we minimally defined the basic communication between the ROS, HRT,

and the VMM using shared physical memory, hypercalls, and interrupts.

The user-level code in the ROS can use hypercalls to sequentially request HRT reboots,

address space mergers, and asynchronous sequential or parallel function calls. The VMM

handles reboots internally, and forwards the other two requests to the HRT as interrupts.

Because additional information may need to be conveyed, a data page is shared between

the VMM and the HRT. For a function call request, the page essentially contains a pointer

to the function and its arguments at the start and the return code at completion. For an

address space merger, the page contains the CR3 of the calling process. The HRT indicates

to the VMM when it is finished with the current request via a hypercall.
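One plausible layout for that shared page, and the HRT-side dispatch over it, is sketched below; the structure, field names, and hypercall wrapper are illustrative assumptions rather than the defined HVM protocol.

    #include <stdint.h>

    enum hvm_request_type {
        HVM_REQ_NONE  = 0,
        HVM_REQ_CALL  = 1,   /* invoke a function in the HRT        */
        HVM_REQ_MERGE = 2,   /* merge the calling ROS address space */
    };

    struct hvm_data_page {
        uint32_t type;       /* one of enum hvm_request_type           */
        uint64_t func;       /* HVM_REQ_CALL: address of the function  */
        uint64_t args[6];    /* HVM_REQ_CALL: its arguments            */
        uint64_t ros_cr3;    /* HVM_REQ_MERGE: CR3 of the calling proc */
        int64_t  result;     /* filled in by the HRT at completion     */
    };

    void hypercall_request_done(void);   /* assumed: tells the VMM we are finished */

    void hrt_handle_request(volatile struct hvm_data_page *page)
    {
        if (page->type == HVM_REQ_CALL) {
            uint64_t args[6];
            for (int i = 0; i < 6; i++)
                args[i] = page->args[i];
            int64_t (*fn)(uint64_t *) = (int64_t (*)(uint64_t *))(uintptr_t)page->func;
            page->result = fn(args);
        } else if (page->type == HVM_REQ_MERGE) {
            /* e.g., merge using the CR3 in page->ros_cr3 (see Section 5.4) */
            page->result = 0;
        }
        hypercall_request_done();
    }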

After an address space merger, the user-level code in the ROS can also use a single

hypercall to initiate synchronous operation with the HRT. This hypercall ultimately indi-

cates to the HRT a virtual address which will be used for future synchronization between

the HRT and ROS. A simple memory-based protocol can then be used between the two

to communicate, for example for the ROS to invoke functions in the HRT, without VMM

intervention.

Figure 5.2 shows the measured latency of each of these operations, using Aerokernel as

the HRT.

5.6 Boot and Reboot

The ROS cores follow the traditional PC bootstrap model with the exception that the ACPI

and MP tables built in memory show only the hardware deemed visible to the ROS by the

HVM configuration.

Boot on an HRT core differs from both the ROS boot sequence and from the Multiboot2

specification [155], which we leverage. Multiboot2 for x86 allows for bootstrap of a kernel

into 32-bit protected mode on the first core (the BSP) of a machine. Our extension allows

for bootstrap of a kernel in full 64-bit mode. There are two elements to HRT boot—memory

setup and core bootstrap. These elements combine to allow us to simultaneously start all

HRT cores immediately at the entry point of the HRT. At the time of this startup, each core

is running in long mode (64-bit mode) with paging and interrupt control enabled. The

HRT thus does not have much bootstrap to do itself. A special Multiboot tag within the

kernel indicates compatibility with this mode of operation and includes requests for how

the VMM should set up the kernel environment.

In memory setup, which is done only once in the lifetime of the HRT portion of the VM, we select an HRT-only portion of the guest physical address space and lay out the basic

machine data structures needed: an interrupt descriptor table (IDT) along with dummy

interrupt and exception handlers, a global descriptor table (GDT), a task state segment

(TSS), and a page table hierarchy that identity-maps physical addresses (including the

higher-half offset as shown in Figure 5.1, if desired) using the largest feasible page table

entries. We also select an initial stack location for each HRT core. A simple ELF loader

then copies the HRT ELF into memory at its desired target location. Finally, we build a

Multiboot2 information structure in memory. This structure is augmented with headers

Item                                         Cycles (and exits)    Time
HRT core boot of Aerokernel to main()        ∼135 K (7 exits)      61 µs
Linux fork()                                 ∼320 K                145 µs
Linux exec()                                 ∼1 M                  476 µs
Linux fork() + exec()                        ∼1.5 M                714 µs
HRT core boot of Aerokernel to idle thread   ∼37 M (∼2300 exits)   17 ms

Figure 5.3: HRT reboot latencies in context (x64).

that indicate our variant of Multiboot2 is in use, and provide fundamental information

about the VM, such as the number of cores, the APIC IDs, interrupt vectoring, and the

memory map, including the areas containing the memory setup. Because bootstrap occurs

on virtual hardware this information can be much simpler than that supplied via ACPI.

In core bootstrap, which may be done repeatedly over the lifetime of the HRT portion

of the HVM, the registers of the core are set. The registers that must be set include the

control registers (IDTR, GDTR, LDTR, TR, CR0, CR3, CR4, EFER), the six segment registers

including their descriptor components, and the general purpose registers RSP, RBP, RDI,

and RAX. The point is that core bootstrap simply involves setting about 20 register values.

The instruction pointer (RIP) is set to the entry point of the HRT, while RSP and RBP are

set to the initial stack for the core, and RDI points to the multiboot2 header information

and RAX contains the multiboot2 cookie.
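The register set involved is roughly what the structure below captures; it is only an illustration of the state the VMM writes before re-entering the guest, not how Palacios actually organizes its per-core context.

    #include <stdint.h>

    struct hrt_core_boot_state {
        /* control and descriptor-table state */
        uint64_t cr0, cr3, cr4, efer;
        uint64_t idtr_base, gdtr_base;
        uint16_t idtr_limit, gdtr_limit;
        uint16_t ldtr_sel, tr_sel;

        /* flat 64-bit segment selectors (descriptor caches set to match) */
        uint16_t cs, ds, es, fs, gs, ss;

        /* the only general-purpose registers that matter at entry */
        uint64_t rip;   /* HRT entry point                          */
        uint64_t rsp;   /* this core's initial stack                */
        uint64_t rbp;   /* same as rsp at entry                     */
        uint64_t rdi;   /* pointer to the Multiboot2 info structure */
        uint64_t rax;   /* the Multiboot2 magic cookie              */
    };

Core bootstrap then amounts to filling in such a structure for each HRT core and resuming the core in the guest.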

Unlike a ROS boot, all HRT cores are booted together simultaneously. The HRT is

expected to synchronize these internally. In practice this is easy as a core can quickly find

its rank by consulting its APIC ID and looking at the APIC ID list given in the extended

Multiboot2 information.

Fast HRT reboot Because core bootstrap involves changing a small set of registers and

then reentering the guest, the set of HRT cores can be rebooted very quickly. An HRT

reboot is also independent of the execution of the ROS, and an HRT can therefore be

rebooted many times over the lifetime of the HVM. We allow an HRT reboot to be initiated

from the HRT itself, from a userspace utility running on the host operating system, and via a hypercall from the ROS, as described above.

Figure 5.3 illustrates the costs of rebooting an HRT core, and compares it with the cost

of typical process operations on a Linux 2.6.32 kernel running on the same hardware. An

HRT core can be booted and execute to the first instruction of Aerokernel’s main() in ∼50%

of the time it takes to do a Linux process fork(), ∼13% of the time to do a Linux process

exec() and ∼8% of the time to do a combined fork() and exec(). The latter is the closest

analog in Linux to what the HRT reboot accomplishes. Note also that timings on Linux were done “hot”—executables were already memory resident.

A complete reboot of Aerokernel on the HRT core to the point where the idle thread is

executing takes 17 ms. This time is also blindingly fast compared to the familiar norm of

booting a physical or virtual machine. We anticipate that this time will further improve

for two reasons. First, we can in principle skip much of the general purpose startup code

in Aerokernel, which is currently executed, given that we know exactly what the virtual

hardware looks like. Second, by starting the core from a memory and register snapshot,

specifically at the point of execution we desire to start from, we should be able to even

further short-circuit startup code.

It is important to note that even at 17 ms, a complete Aerokernel reboot is 60 to 300

times faster than a typical 1-5 minute node or server boot time. It should be thought of

in those terms, similar to the MicroReboot concept [45] for cheap recovery from software failures. We can use HRT reboots to address many issues and, in the limit, treat them as being on par with process creation in a traditional OS.

5.7 Conclusion

In this chapter I introduced the hybrid virtual machine (HVM). HVM is VMM functionality that allows us to simultaneously run two kernels, an HRT and a traditional kernel, within the same VM, allowing a runtime to benefit from the performance and capabilities provided by the HRT model while retaining the non-performance-critical functionality of the traditional kernel. We will see more on HVM in Chapter 7, where it is used to support an automatically hybridized runtime. In the next chapter, I will introduce current limits on asynchronous software events and ways to overcome these limits using the HRT model.

Chapter 6

Nemo Event System

Many runtimes leverage event-based primitives as an out-of-band notification mechanism that can signal events ranging from task completions or arrivals to message deliveries or changes in state. They may occur between logical entities like processes or threads, or they may happen at the hardware level. They often provide a foundation for building low-level synchronization primitives like mutexes and wait queues. The correct operation of parallel programs written for the shared-memory model relies crucially on low-latency, microarchitectural event notifications traversing the CPU’s cache coherence network. The focus of this chapter is on asynchronous software events, namely events that a sending thread or handler can trigger without blocking or polling, and for which the receiving thread or handler can wait without polling.

Ultimately these events are just an instance of unidirectional, asynchronous communication, so one might expect little room for performance improvement. We find, however, that the opposite is true. While a cache line invalidation can occur in a handful of CPU cycles and an inter-processor interrupt (IPI) can reach the opposite edge of a many-core chip in less than one thousand cycles, commonly used event signaling mechanisms like

user-level condition variables fail to come within even three orders of magnitude of that

mark.

The crux of the work in this chapter is to determine the hardware limits for asyn-

chronous event notification on today’s hardware, particularly on x64 NUMA machines

and the Intel Xeon Phi, and then to approach those limits with software abstractions

implemented in an environment uninhibited by an underlying kernel, namely an HRT

environment.

In the limit, an asynchronous event notification is bounded from below by the signaling

latency on a hardware line. We measure and analyze inter-processor interrupts (IPIs) on

our hardware, arguing that they serve as a first approximation for this lower bound. We

consider both unicast and broadcast event notifications, which are used in extant runtimes,

and have IPI equivalents. In this chapter, I will describe the design and implementation of

Nemo1, a system for asynchronous event notifications in HRTs that builds on IPIs. Nemo presents abstractions to the runtime developer that are identical to the pthreads condition variable unicast and broadcast mechanisms and thus are friendly to use, but are much

faster. Unlike IPIs, where a thread invokes an interrupt handler on a remote core, these

abstractions allow a thread to wake another thread on a remote core. In addition, Nemo provides an interface for unconventional event notification mechanisms that operate near

the IPI limit.

As I will show through a range of microbenchmarking on both platforms, Nemo can

approach the hardware limit imposed by IPIs for the average latency and variance of

latency for asynchronous event notifications. Unicast notifications in Nemo enjoy up to

five times lower average latency than the user-level pthreads constructs and the Linux

1Nemo is available as an open-source extension of the Nautilus Aerokernel framework, which was discussed in detail in Chapter 4.

futex construct, while broadcast notifications have up to 4,000 times lower average latency.

Furthermore, the variance seen in Nemo is up to an order of magnitude lower for unicast,

and many orders of magnitude lower for broadcast. Finally, Nemo can deliver broadcast

events with much higher synchrony, exhibiting nearly identical latency to all destinations.

I will then discuss a small hardware change that would reduce the hardware limit (and

Nemo’s latency). Our measurements suggest that a large portion of IPI cost stems from the

interrupt dispatch mechanism. The syscall/sysret instructions avoid similar costs for

hardware thread-local system calls by avoiding this dispatch overhead. I will discuss our proposal that syscall be included as an IPI type. When receiving a “remote syscall” IPI,

the faster dispatch mechanism would be used, reducing IPI costs for specific asynchronous

events such as those in Nemo.

Section 6.1 gives an overview of asynchronous software events, their usage, and their

current limitations. Section 6.2 introduces the Nemo event system. Section 6.3 discusses ways in which we can improve the hardware limit for asynchronous events. Finally,

Section 6.4 concludes this chapter.

6.1 Limits of Event Notifications

Our motivation in exploring asynchronous event notifications in the HRT model stems

from the observation that many parallel runtimes use expensive, user-level software

events even though modern hardware already includes mechanisms for low-latency event

communication. However, these hardware capabilities are traditionally reserved for

kernel-only use. I discuss common uses of event notification mechanisms, particularly

for task invocations in parallel runtimes, then present measurements on modern multi-

core machines for common event-based primitives, demonstrating potential benefits of

a kernel-mode environment for low-latency events. The core question for this section is

just how fast asynchronous event notification on current x64 and Phi hardware could go.

We expect that our findings could also apply to asynchronous events in other runtime

environments that expose privileged hardware features or forego traditional privilege

separation such as Dune [25], IX [26], and Arrakis [160].

6.1.1 Runtime Events

In one common usage pattern of asynchronous event notifications, a signaling thread

notifies one or more waiting (and not polling) threads that they should continue. The

signaling thread continues executing regardless of the status of waiting threads.

In examining the usage of event notifications, we worked with Charm++ [114], SWARM [131],

and Legion [21, 189], all examples of modern parallel runtimes. They all use asynchronous

events in some way, whether explicitly through an event programming interface or implicitly by runtime design. In many cases, these runtimes use events as vehicles to notify

remote workers of available work or tasks that should execute.

Legion provides a good example. It uses an execution model in which a thread (e.g., a pthread) implements a logical processor. Each logical processor sequentially operates over

tasks. In order to notify remote logical processors of tasks ready to execute, the signaling processor broadcasts on a condition variable (e.g., a pthread_cond_t) that wakes up any

idle logical processors, all of which race to claim the task for execution. This process

bears some similarity to the schedule() interrupts used in Linux at the kernel level. Since

pthread_cond_broadcast() must involve the kernel scheduler (via a system call), it is

fairly expensive, as we will show in Section 6.1.2. Linux’s futex abstraction attempts to ameliorate this cost with mixed success.

Figure 6.1: Comparing existing unicast event wakeups in user-mode (pthreads and futexes on Linux) with IPIs on x64 and phi. Unicast IPIs are at least an order of magnitude faster and exhibit much less variation in performance on both platforms.

6.1.2 Microbenchmarks

Section 4.4 gave a cursory examination of the performance of asynchronous software events. This section presents a more detailed exploration of event performance. The experimental setup is the same as that discussed in Section 3.3.1. Time is measured with the cycle counter, and measurements are taken over at least 1,000 runs (unless otherwise noted), with results shown as box plots or CDFs and, in some cases, summary statistics overlaid.

Figure 6.1 shows the latency for event wakeups on x64 and phi. In each of these

experiments, we create a thread on a remote core. This thread goes to sleep until it receives

an event notification. We measure the time from the instant before the signaling thread

sends its notification to when the remote thread wakes up. We map threads to distinct

cores. The numbers represent statistics computed over 100 trials for each remote core

(6,300 trials on x64, 22,700 on phi).
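As a rough user-space illustration of this methodology (not the actual benchmark harness, which also pins threads to distinct cores and repeats the trial per destination), the following sketch times a single pthread condition-variable wakeup with the cycle counter: the signaler records rdtsc immediately before signaling, and the waiter records it again as soon as it returns from the wait.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>   /* __rdtsc() */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;
static uint64_t t_signal;

static void *waiter(void *arg)
{
    pthread_mutex_lock(&lock);
    while (!ready) {
        pthread_cond_wait(&cond, &lock);   /* sleep until the event is signaled */
    }
    uint64_t t_wake = __rdtsc();           /* timestamp on wakeup */
    pthread_mutex_unlock(&lock);
    printf("wakeup latency: %llu cycles\n",
           (unsigned long long)(t_wake - t_signal));
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);

    /* Crude pause so the waiter is blocked before we signal. */
    struct timespec d = { 0, 1000000 };
    nanosleep(&d, NULL);

    pthread_mutex_lock(&lock);
    ready = 1;
    t_signal = __rdtsc();                  /* instant before the notification */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return 0;
}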

I include a comparison of three mechanisms. The first two are the most commonly

used asynchronous event mechanisms in user-space: condition variables and futexes. The pthreads implementation of condition variables depicted builds on top of futexes. The

overhead of condition variables compared to futexes may be significant, but it is platform

dependent or implementation dependent—the average event wakeup latency of condition variables is nearly double that of futexes on phi, but only a small increment more on x64.

The third mechanism, denoted with “unicast IPI” on the figure, shows the unicast

latency of an inter-processor interrupt (IPI). On x64 and phi, each hardware thread has an

associated interrupt controller (an APIC). The APIC commonly receives external interrupts

and initiates their delivery to the hardware thread, but it is also exposed as a memory-

mapped I/O device to the hardware thread. From this interface, the hardware thread can program the APIC to send an interrupt (an IPI) to one or more APICs in the system. APICs

are privileged devices and are typically used only by the kernel.
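For concreteness, the sketch below shows roughly how kernel code issues a unicast IPI through the xAPIC’s memory-mapped Interrupt Command Register. It assumes the default xAPIC MMIO base with an identity mapping, and the helper names are illustrative rather than the Nautilus routines.

#include <stdint.h>

#define APIC_BASE        0xFEE00000UL   /* default xAPIC MMIO base, assumed identity-mapped */
#define APIC_REG_ICR_LO  0x300          /* ICR low dword: vector, delivery mode, ... */
#define APIC_REG_ICR_HI  0x310          /* ICR high dword: destination APIC ID */
#define APIC_ICR_PENDING (1u << 12)     /* delivery status bit */

static inline uint32_t apic_read(uint32_t reg)
{
    return *(volatile uint32_t *)(APIC_BASE + reg);
}

static inline void apic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(APIC_BASE + reg) = val;
}

/* Send a fixed-delivery IPI with the given vector to the core whose APIC ID is dest. */
void send_unicast_ipi(uint8_t dest_apic_id, uint8_t vector)
{
    while (apic_read(APIC_REG_ICR_LO) & APIC_ICR_PENDING) {
        ;   /* wait for any previous IPI to be delivered */
    }
    apic_write(APIC_REG_ICR_HI, (uint32_t)dest_apic_id << 24);
    apic_write(APIC_REG_ICR_LO, vector);   /* fixed delivery, physical destination, edge */
}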

We also considered events triggered using the MONITOR/MWAIT pair of instructions present on modern AMD and Intel chips. These instructions allow a hardware thread

to wait on a write to a particular range of memory, potentially entering a low-energy

sleep state while waiting. Because the entire hardware thread is essentially blocked when

executing the MWAIT instruction, I did not include a comparison with this technique.

Figure 6.1 shows that IPIs are, on average, much faster than either condition variables or futexes. On x64, they have roughly 16 times lower latency than either, while on phi, they have roughly 32 times lower latency than condition variables and 16 times lower latency than futexes. On phi, the average IPI latency is only 700 cycles. The wall-clock time on the two machines is similar, as x64 has roughly twice the clock rate.

IPIs are not doing the same thing as a condition variable or a futex. For IPIs, we measure the time from the instant before the signaling thread sends its notification to when the interrupt handler begins executing, not to when a waiting thread resumes. We measure the IPI time because this latency represents a lower bound for a wakeup mechanism using existing hardware functionality on commodity machines. There is significant room for an improvement of more than an order of magnitude (∼20x). We attempt to achieve this improvement in

Section 6.2 by moving towards a purely asynchronous mechanism enabled by the HRT environment.

Not only is the average time much lower for an IPI, but its variance is also diminished considerably. As I noted in the introduction, variance in performance limits parallel runtime performance and scalability. This is an OS noise problem. The hardware has fewer barriers to predictable performance.

Broadcast events are of significant interest in parallel runtimes, for example in Legion as described above. Figure 6.2 shows the wakeup latency for a broadcast event on x64 and phi, again comparing a condition variable based approach in pthreads, Linux futex, and IPIs. Measurements here operate as with the unicast events, but we additionally keep track of time of delivery on every destination so we can assess the synchronicity of the broadcasts.

Figure 6.2: Comparing existing user-space event broadcasts vs. IPIs on x64 and phi. Broadcast IPIs have over 4000 times lower latency than condition variables and almost 2000 times lower latency than futexes on phi; x64 shows 78 times lower latency than condition variables and 30 times lower latency than futexes. Variance in latency is similarly reduced.

The relative latency improvements for broadcasts with condition variables and futexes are similar to the unicast case, but the gain from using broadcast IPIs is much larger. On

phi, the average latency of a broadcast IPI received by all targets is over 4,000 times lower

than for a mechanism based on condition variables. The gain in variance is similarly

startling. On x64, this gain is 78 times. While broadcast IPIs exploit the hardware’s own parallelism, the implementations of all the other techniques are essentially serialized in a

loop that wakes up waiting threads sequentially. In part this difference between phi and

x64 is simply that the Phi has almost four times as many cores. While one could argue that

a programmer should choose a barrier over a condition variable to signal a wakeup on

multiple cores, barriers lack the asynchrony needed for these kinds of event notifications.

We should also hope that a broadcast event causes wakeups to occur across cores with synchrony; when a broadcast event is signaled, we would like all recipients to awaken as close to simultaneously as possible. However, Figures 6.3 and 6.4 show that this is clearly not the case for the condition variable or futex-based broadcasts. Recall that we measure the time of the wakeup on each destination. For one broadcast wakeup, we thus have as many measurements as there are cores, and we can compute the standard deviation (σ) among them. In these figures, we repeat this many times and plot the CDFs of these σ estimates. Note that in the figures the x-axes are on a log scale. On these platforms, there are orders of magnitude of difference between the degree of synchrony in wakeups achievable on the hardware and what is actually achieved by the user-space mechanisms.

Figure 6.3: CDF comparing the σs for various broadcast (one-to-all) wakeup mechanisms in user-space vs. IPIs on x64. Broadcast IPIs achieve a synchrony that is three orders of magnitude better than that achieved by pthread condition variables or Linux futexes.

Figure 6.4: CDF comparing the σs for various broadcast (one-to-all) wakeup mechanisms in user-space vs. IPIs on phi. Broadcast IPIs achieve a synchrony that is five orders of magnitude better than that achieved by pthread condition variables or Linux futexes.

6.1.3 Discussion

The large gap between the performance of asynchronous software events in user-mode

and the hardware capabilities should cause concern for runtime developers. Not only do

these latencies indicate that software wakeups may happen roughly on the millisecond

time-scale of a slow network packet delivery, but also that the programmer can do little to

ensure that these wakeups occur with predictable performance. The problem is worse for

broadcast events, and the problem appears to scale with increasing core count.

Recall again that we claim IPIs are a hardware limit to asynchronous event notifications,

and that it is important to understand that an IPI is not an event notification by itself. The

goal of Nemo is to achieve event notifications compatible with those expected by parallel runtimes with performance that approaches that of IPIs, as well as to offer unconventional mechanisms that trade off ease of use for performance near the IPI limit.

Figure 6.5: CDF of unicast IPI latency from the Bootstrap Processor (BSP) to all other cores on x64 (95th percentile = 1728 cycles; the knees correspond to logical core, physical core, NUMA domain, and socket boundaries). Approaching this profile is the goal of Nemo.

6.2 Nemo Events

Nemo is an asynchronous event notification system for HRTs built within the Nautilus

Aerokernel framework. Nemo addresses the performance issues of asynchronous user-

space notifications by leveraging hardware features not commonly available to runtime or

application developers. That is, they are enabled by the fact that the entire HRT runs in

kernel mode.

Figure 6.6: CDF of unicast IPI latency from the Bootstrap Processor (BSP) to all other cores on phi (95th percentile = 746 cycles).

The goal of Nemo is to approach the hardware IPI latency profile. Figure 6.5 represents

in detail the kind of profile we would like to achieve. We expect that these numbers, which were measured on x64, will tell us something about the machine, given its complex

organization. The knees in the curve (marked with black circles) indicate boundaries in the

IPI network. While we could not find reliable documentation from AMD or other parties

on the topology of the IPI network on this machine, we are confident that these inflection points correspond to distances within the chip hierarchy as indicated in the captions. As

Nemo begins to exhibit similar behavior, we will know we are near the limits of available

hardware.

Unicast IPI latencies on our phi card, shown in Figure 6.6, are smaller and show less pronounced inflection points. We suspect this stems from its use of a single-chip processor with a balanced interconnect joining the cores.

6.2.1 Kernel-mode Condition Variables

Existing runtimes, such as Legion, use pthreads features in their user-space incarnations.

Nautilus tries to simplify the porting of such runtimes to become HRTs. To support

thread creation, binding, context switching, and similar elements, Nautilus provides a pthreads-like interface for its kernel threads. Default thread scheduling (round-robin with or without preemption—preemption is not used here) and mapping policies (initial placement by creator, no migration) are intended to be simple to reason about. Similarly,

memory allocation is NUMA-aware and based on the calling thread’s location, not by first

touch.

For Nemo, the relevant event mechanism in pthreads is the condition variable, imple-

mented in the pthread_cond_X() family of functions. Nemo implements a compatible set

of these functions within Aerokernel. There are two implementations. In the first, there is

no special interaction with the scheduler. When a waiting thread goes to sleep on a condi-

tion variable, it puts itself on the condition variable’s queue and deschedules itself. When

a signaling thread invokes condvar_signal(), this function will put the waiting thread

back on the appropriate processor’s ready queue. The now signaled thread will not run

until the processor’s background thread yield()s. We would expect this implementation

to increase performance simply by eliminating user/kernel transitions from system calls,

e.g. the futex() system call.
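A minimal sketch of this first variant is shown below. The queue and scheduler helpers are hypothetical stand-ins for the corresponding Nautilus primitives, and locking is reduced to a single spinlock per condition variable.

/* Hypothetical types standing in for the Aerokernel's queue and thread primitives. */
struct nk_condvar {
    spinlock_t     lock;
    thread_queue_t waiters;           /* threads sleeping on this condition variable */
};

void condvar_wait(struct nk_condvar *cv, spinlock_t *mutex)
{
    spin_lock(&cv->lock);
    thread_queue_enqueue(&cv->waiters, current_thread());
    spin_unlock(&cv->lock);

    spin_unlock(mutex);
    deschedule_current();             /* give up the CPU; no user/kernel transition */
    spin_lock(mutex);                 /* reacquire the caller's lock once awakened */
}

void condvar_signal(struct nk_condvar *cv)
{
    spin_lock(&cv->lock);
    thread_t *t = thread_queue_dequeue(&cv->waiters);
    spin_unlock(&cv->lock);

    if (t) {
        /* Make the waiter runnable on the CPU it is bound to; it will not actually
         * run until that CPU's background thread yield()s. */
        ready_queue_enqueue(cpu_ready_queue(t->bound_cpu), t);
    }
}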

The second implementation uses a more sophisticated interaction with the scheduler in

order to better support common uses in runtimes like Legion and SWARM. In these, the

threads that are sleeping on condition variables are essentially logical processors. Ideally

each one would map to a single physical CPU and would not compete for resources on that CPU. Scheduling of tasks (Legion tasks or SWARM tasks, not kernel threads) is handled by the runtime, so kernel-level scheduling is superfluous. The condition variable in such systems is used essentially to awaken logical processors.

Figure 6.7: Nemo kernel-mode event mechanisms for single wakeups on x64. Average latency is reduced by over a factor of four, and variation is considerably reduced.

On a condvar_signal() our second implementation sends an IPI to “kick” the physical processor of the newly runnable thread. The scheduler on the physical processor can then

immediately switch to it. The kick serves to synchronize the scheduling of the sleeping

thread, reducing the effects of background threads that may be running.
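Continuing the sketch above, the second variant changes only the signal path: after making the waiter runnable, it kicks the waiter's CPU with an IPI so the scheduler there switches to the thread immediately rather than waiting for the next yield(). The kick vector and the helper names are again illustrative.

#define CONDVAR_KICK_VECTOR 0xF0          /* illustrative, otherwise-unused vector */

void condvar_signal_kick(struct nk_condvar *cv)
{
    spin_lock(&cv->lock);
    thread_t *t = thread_queue_dequeue(&cv->waiters);
    spin_unlock(&cv->lock);

    if (t) {
        ready_queue_enqueue(cpu_ready_queue(t->bound_cpu), t);
        /* Interrupt the target CPU; its handler just invokes the scheduler, which
         * synchronizes the wakeup and picks up the newly runnable thread. */
        send_unicast_ipi(apic_id_of_cpu(t->bound_cpu), CONDVAR_KICK_VECTOR);
    }
}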

The two implementations comprise about 200 lines of code within the Nautilus Aero-

kernel framework.

Figure 6.8: Nemo kernel-mode event mechanisms for single wakeups on phi. Average latency is reduced by over a factor of four, and variation is considerably reduced.

Figures 6.7 and 6.8 show the performance of these two implementations compared to the existing user-space techniques and to the unicast IPI. Our first implementation

(“Aerokernel condvar”) roughly halves the median latency of user-mode event wakeups

on both x64 and phi. This latency improvement represents a rough estimate of the speedup

achieved solely by moving the application/runtime into kernel-mode, thus avoiding

kernel/user transition overheads. The implementation does, however, exhibit considerable variance in wakeup latency. This is because the wakeup time depends on how long it takes

for the CPU’s background thread to yield() again.

Our second implementation (“Aerokernel condvar + IPI”) ameliorates this variation,

and further reduces average and median latency. The use of the IPI kick collapses the

median latency of the wakeup down to the minimum latency of the standard kernel-mode

condition variable. The variance in this case is much lower than all of the other wakeup

mechanisms.

Figure 6.9: Nemo kernel-mode event mechanisms for broadcast wakeups on x64. Average latency is reduced by a factor of 10, and variation is considerably reduced.

Figures 6.9 and 6.10 show the performance of broadcast events, where the gain is larger

(a factor of 10–16). Figures 6.11 and 6.12 show the improvement of the synchrony of broad-

cast event wakeups. This is improved by a factor of 10 on both platforms. Section 6.1.2

gives a description of the format of the latter two figures and a discussion of broadcast

IPIs.

In the current Nemo implementations for broadcast events, the signaling thread moves

each waiting thread to its processor’s run queue and then (in the second implementation)

kicks that processor with an IPI. Although there is considerable overlap between context

switches, in-flight IPIs, and moving the next thread to its run queue, we expect that

this sequential behavior of the signaling thread is a current limit on the broadcast event

mechanism both in terms of average/median latency and in terms of synchrony of the awakened threads. This is in contrast to the IPI broadcast in hardware, which is inherently parallel and exhibits significant synchrony in arrivals, as indicated in the figures and previous discussions.

Figure 6.10: Nemo kernel-mode event mechanisms for broadcast wakeups on phi. Average latency is reduced by a factor of 16, and variation is considerably reduced.

6.2.2 IPIs and Active Messages

In the previous section, I introduced Nemo events which were built to conform to the pthreads programming interface, particularly condition variables. With the inherent

limitations of this interface and the privileged hardware available to us in an HRT in

mind, we can now explore a new event mechanism with a different interface built directly

on top of IPIs. The mechanism is also informed by how pthreads condition variables

are actually used in Legion and SWARM, namely to indirectly implement behavior via user-level mechanisms that can be directly implemented in the kernel context.

Figure 6.11: CDF showing the Nemo kernel-mode event mechanisms for broadcast wakeups on x64. Nemo achieves an order of magnitude better synchrony in thread wakeups.

We claim that Active Messages [191] would better match the functional behavior

that event-based runtimes need. Active Messages enable low-latency message handling

for distributed memory supercomputers with high-performance interconnects. Since a

message delivery is ultimately just one kind of asynchronous event, we looked to Active

Messages for inspiration on how to approach the hardware limit for asynchronous software

events. In short, we use the IPI as the basis for an Active Message model within the shared

memory node.

In an Active Message system, the message payload includes a reference (a pointer) to a piece of code on the destination that should handle the message receipt. One advantage of this model is that it reduces the load on the kernel and results in a faster delivery to the user-space application.

Figure 6.12: CDF showing the Nemo kernel-mode event mechanisms for broadcast wakeups on phi.

Figure 6.13: Nemo Active Message-inspired event wakeups implemented using IPIs.

Since the HRT is the kernel, we do not need to avoid

transferring control to it on an event notification. Furthermore, since the HRT is not a

multi-programmed environment, we can be sure that the receiving thread is a participant

in the parallel runtime/application, and thus has the high-level information necessary to process the event. We can eliminate handling overhead by leveraging existing logic in

the hardware already meant for handling asynchronous events—in this case, IPIs. IPIs by

themselves, however, cannot implement a complete Active Message substrate, as there is

no payload other than the interrupt vector and state pushed on the stack by hardware.

Figure 6.13 shows the design and control flow of our Active Message-inspired event

mechanism. We reserve a slot in the IDT for a special Nemo event interrupt, which will vector to a common handler (1). If only one type of event is necessary, this handler will

be the final handler and thus no more overhead is incurred. However, it is likely that a

runtime developer will need to use more than one event. In this case, the common handler will look up an event action (a second-level handler) in an Action Lookup Table (ALT), which

is indexed by its core ID (2). From this table, we find an action ID, which serves as an index

into a second table called the Action Descriptor Table (ADT). The ADT holds actions that

correspond to events. After the top-level handler indexes this table, it then executes the

final handler (3). The IPI is used to deliver the active message, while the Action Table

effectively contains its content.

The mechanism described here comprises about 160 lines of code in Nautilus.
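The sketch below condenses that dispatch path. In this simplified version the sender itself publishes the action ID into the target core's ALT slot before sending the reserved Nemo IPI; the table sizes, the vector number, and the helper names (including the IPI helper from the earlier sketch) are assumptions for illustration rather than the actual Nautilus code.

#include <stdint.h>

#define NEMO_VECTOR  0xE0                  /* illustrative reserved IDT slot */
#define MAX_CORES    256
#define MAX_ACTIONS  64

typedef void (*nemo_action_t)(void);

static volatile uint32_t alt[MAX_CORES];   /* Action Lookup Table: per-core action ID */
static nemo_action_t     adt[MAX_ACTIONS]; /* Action Descriptor Table: final handlers */

int nemo_event_notify(uint32_t event_id, uint32_t core_id)
{
    if (event_id >= MAX_ACTIONS || core_id >= MAX_CORES) {
        return -1;
    }
    alt[core_id] = event_id;                                  /* (2) publish the action */
    send_unicast_ipi(apic_id_of_cpu(core_id), NEMO_VECTOR);   /* (1) deliver the event IPI */
    return 0;
}

/* Common handler installed in the IDT for NEMO_VECTOR. */
void nemo_event_ipi_handler(void)
{
    uint32_t      action_id = alt[my_core_id()];
    nemo_action_t action    = adt[action_id];                 /* (3) final handler lookup */
    if (action) {
        action();
    }
}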

Figures 6.14 and 6.15 show CDFs of the latency of Nemo’s Active Message-inspired

events compared to the unicast IPI. Notice that in all cases Nemo events are only roughly 40

cycles slower on phi and 100 cycles slower on x64. We are now truly close to the capabilities

of the hardware as evidenced by the performance and by the observed sensitivity to the hardware topology, which is implied by the knees in the IPI latency profile (e.g. in Figure 6.5).

Figure 6.14: CDF showing unicast latency of Nemo’s Active Message-inspired events compared to unicast IPIs on x64. Nemo lies within ∼5% of IPI performance.

Figure 6.16 shows the latency of Active Message-inspired Nemo events compared to

broadcast IPIs. The performance of the Nemo events are within tens of cycles of broadcast

IPIs, as we would expect. Figures 6.17 and 6.18 show the amount of synchrony present in

the Nemo events. I give an explanation of this type of figure in Section 6.1.2.

6.3 Towards Improving the Hardware Limit

Although I have shown how we achieve a marked improvement for asynchronous software

events using hardware features (IPIs in particular) that are not commonly available to user-space programs, it is important to calibrate this performance against another hardware capability that is critical to the performance of multicore machines, namely the cache coherence network. The question here is whether we can improve the hardware limit.

Figure 6.15: CDF showing unicast latency of Nemo’s Active Message-inspired events compared to unicast IPIs on phi. Nemo lies within ∼5% of IPI performance.

Mogul et al. lamented this issue [145] while advocating for lightweight, inter-core notifications: “Unfortunately, today IPIs are the only option.”

The coherence network in a modern CPU propagates its own form of events between chips, namely messages that implement the protocol that maintains the coherence model.

We expect the coherence network connecting the chips and the associated logic to have not only low latency but also predictable performance.

How fast is this network from the perspective of event notification in general? We implemented a small synchronous event mechanism using memory polling to assess this.

Figure 6.16: Broadcast latency of Nemo’s Active Message-inspired events compared to broadcast IPIs on x64 and phi. Performance is nearly identical in both cases.

Figure 6.17: CDF comparing Nemo’s Active Message-inspired broadcast events to broadcast IPIs on x64. Synchrony is nearly identical.

Figure 6.18: CDF comparing Nemo’s Active Message-inspired broadcast events to broadcast IPIs on phi. Synchrony is nearly identical.

In this mechanism, much like in a barrier or a spinlock, the waiting thread simply spins

on a memory location waiting for its value to change. When a signaling thread changes

this value, its core’s cache controller will send a coherence message to the waiting thread’s

core, ultimately prompting a cache fill with the newly written value, and an exit from

the spin. Figure 6.19 shows the performance of this synchronous mechanism compared

to the asynchronous mechanism of unicast IPIs on our x64 hardware. IPIs are roughly

1000 cycles more expensive until the notifications (or invalidations) have to travel further

through the chip hierarchy and off chip. The stepwise nature of the “coherence network”

curve confirms our prediction of predictable, low-latency performance.
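A minimal sketch of the polling scheme follows, with timing and core pinning elided: the waiter spins on a shared flag, and the signaler's single store is what travels through the coherence network and ends the spin.

#include <stdint.h>

static volatile uint64_t event_flag = 0;

/* Waiting core: spin until the flag changes. */
void wait_for_event(void)
{
    while (event_flag == 0) {
        __builtin_ia32_pause();   /* ease pressure on the pipeline while spinning */
    }
}

/* Signaling core: a single store generates the coherence traffic that wakes the waiter. */
void signal_event(void)
{
    event_flag = 1;
}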

These results prompted us to ask a new question: what prevents the IPI network

from achieving performance comparable to the coherence network? To address this question, we performed an analysis of IPIs from the kernel programmer’s perspective, gathering measurements for the hardware and software events necessary for their delivery. Figure 6.20 shows the results.

Figure 6.19: CDF comparing the latency of asynchronous unicast IPIs to a simple synchronous notification scheme using memory polling on x64. This represents the basic cost difference between a synchronous and an asynchronous event imposed by the hardware.

Software event            min. cycles
Source APIC write                  43
Destination handling              729
Communication delay               378
Total for unicast IPI            1150
Total for syscall                 232

Figure 6.20: Estimated IPI cost breakdown in cycles on x64.

The latency of a unicast IPI involves three components. The first, “Source APIC write”,

is the time to initiate the IPI by writing the APIC registers appropriately at the source. In

the figure, we record the minimum time we observed. The second component, denoted

“Destination handling,” is the time required at the destination to handle the interrupt, going

from message delivery to the time of the first interrupt handler instruction. To estimate

this number, we measured the minimum latency from initiating a software interrupt (via

an int instruction) to the entry of its handler on the same core. We expect that this number

is actually an underestimate since it does not include any latency that might be introduced

by processing in the destination APIC. The “Communication delay” is simply these two

numbers subtracted from the total unicast IPI cost shown in Section 6.1.2. It is likely to be

an overestimate.

Integrating the observations of Figures 6.19 and 6.20 suggests that the reason why an

asynchronous IPI has so much higher latency than a synchronous coherence event is likely

to be due, in large part, to the destination handling costs of an IPI. For asynchronous event

notification in an HRT, much of this handling is probably not needed—we would like to

simply invoke a remote function, much like a Startup IPI (SIPI) available on modern x86

machines. In particular, the privilege checks, potential stack switches, and stack pushes

involved in an IPI are unnecessary.

A similar issue was addressed a decade ago when it was shown how much overhead was involved in processing the int instruction used in system calls, especially as clock

speeds grew disproportionally to interrupt handling logic. Designers at Intel and AMD

introduced the syscall and sysret instructions to reduce this overhead considerably.

Figure 6.21: CDF of the projected latency of the proposed “remote syscall” mechanism. This comparison is made relative to the measured latencies of Figure 6.19.

Figure 6.20 notes the cost of a syscall on our x64 hardware, which is less than 1/3 of our

estimated destination handling costs for an IPI.

We believe a similar modification to the architecture could produce comparable benefits

for low-latency event delivery and handling. The essential concept is to introduce a

new class of IPIs, the “remote syscall”. This would combine the IPI generation and

transmission logic with the syscall logic on the destination core. That is, this form of IPI would act like a syscall to the remote core, avoiding privilege checks, stack switches, or

any stack accesses. To estimate the gains from this model, we made a projection of IPI performance if one could reduce destination handling to the cost of a syscall instruction.
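As a rough back-of-the-envelope reading of the Figure 6.20 estimates, replacing the 729-cycle destination handling with the 232-cycle syscall path would put a unicast notification at roughly 43 + 232 + 378 ≈ 653 cycles, a bit over half of the measured 1150-cycle minimum.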

Figure 6.21 shows the projected improvements. There is now considerable overlap in the performance of synchronous events based on the coherence network and asynchronous

events based on the new “remote syscall”.

Current Intel APICs use an Interrupt Command Register (ICR) to initiate IPIs. The

delivery mode field, which is 3 bits long, indicates what kind of interrupt to deliver. Mode

011 is currently reserved, so this is a possible candidate for a remote syscall mode. There

are, of course, numerous varieties of the APIC model between Intel and AMD, but the ICR

is a 64 bit register with numerous reserved bits in all of them. Any of these bits could

be used to encode a request for a “remote syscall”. As another example, the 2 bit wide

delivery shorthand field could be extended into the adjacent reserved field by one bit

to accommodate indicating whether delivery should happen by the traditional interrupt

mechanism or via syscall-like handling. In these delivery modes or shorthands, the vector might provide a hint to the event handling dispatch software. We expect that these

changes would be minimal, although we do not know what effort would be needed to

integrate this new functionality with instruction fetch logic. The fact that a SIPI can already vector the core to a specific instruction suggests to us that it might not introduce much

new logic. Indeed, another possible approach might be to allow SIPIs when the core is

outside of its INIT state.

6.4 Conclusions

In this chapter I have shown how the performance of asynchronous software events suffers when the application/runtime is restricted to user-space mechanisms and mismatched

event programming interfaces. The performance of these mechanisms comes nowhere near

the hardware limits of IPIs, much less cache coherence messages. By leveraging the HRT model, wherein the runtime and application can execute with fully privileged hardware access, we increased the performance of these event mechanisms considerably. We did so by designing, implementing, and evaluating the Nemo asynchronous event system within the Nautilus Aerokernel framework for building HRTs on x64 and Xeon Phi. HRTs built using Nemo primitives can enjoy event wakeup latencies that are as much as 4,000 times lower than the event mechanisms typically used in user-space. Furthermore, the variation in wakeup latencies in Nemo is much lower, allowing a greater degree of synchrony between broadcasts to multiple cores. In addition to Nemo, we also considered the design of IPIs themselves and proposed a small hardware addition that could potentially reduce their cost considerably for constrained use cases, such as asynchronous event notification.

I showed that such additions can push the performance of asynchronous software events closer to that of the hardware cache coherence network.

Chapter 7

Multiverse Toolchain

While porting a parallel runtime to the HRT model can produce the highest performance

gains, it requires an intimate familiarity with the runtime system’s functional require-

ments, which may not be obvious. These requirements must then be implemented in

the Aerokernel layer, and the Aerokernel and the runtime must then be combined. This requires a deep

understanding of kernel development. This manual process is also iterative: the developer

adds Aerokernel functionality until the runtime works correctly. The end result might

be that the Aerokernel interfaces support a small subset of POSIX, or that the runtime

developer replaces such functionality with custom interfaces.

While such a development model is tractable, and we have transformed three runtimes

to HRTs using it (see Chapter 3), it represents a substantial barrier to entry to creating

HRTs, which we seek here to lower. The manual porting method is additive in its nature.

We must add functionality until we arrive at a working system. A more expedient method would allow us to start with a working HRT produced by an automatic process, and then

incrementally extend it and specialize it to enhance its performance.

The Multiverse system described in this chapter supports just such a method using a

technique called automatic hybridization to create a working HRT from an existing, unmodi-

fied runtime and application. With Multiverse, runtime developers can take an incremental path towards adapting their systems to run in the HRT model. From the user’s perspective,

a hybridized runtime and application behaves the same as the original. It can be run

from a Linux command line and interact with the user just like any other executable. But

internally, it executes in kernel mode as an HRT.

Multiverse bridges a specialized HRT with a legacy environment by borrowing function-

ality from a legacy OS, such as Linux. Functions not provided by the existing Aerokernel

are forwarded to another core that is running the legacy OS, which handles them and

returns their results. The runtime developer can then identify hot spots in the legacy

interface and move their implementations (possibly even changing their interfaces) into

the Aerokernel. The porting process with Multiverse is subtractive in that a developer

iteratively removes dependencies on the legacy OS. At the same time, the developer can

take advantage of the kernel-level environment of the HRT.

To demonstrate the capabilities of Multiverse, we automatically hybridize the Racket

runtime system. Racket has a complex, JIT-based runtime system with garbage collection

and makes extensive use of the Linux system call interface, memory protection mech-

anisms, and external libraries. Hybridized Racket executes in kernel mode as an HRT,

and yet the user sees precisely the same interface (an interactive REPL environment, for

example) as out-of-the-box Racket.

This chapter will first introduce the Multiverse toolchain in Section 7.1. Section 7.2 de-

tails the implementation of Multiverse, while Section 7.3 presents its evaluation. Section 7.4

concludes the chapter.

7.1 Multiverse

We designed the Multiverse system to support automatic hybridization of existing runtimes

and applications that run in user-level on Linux platforms.

7.1.1 Perspectives

The goal of Multiverse is to ease the path for developers of transforming a runtime into

an HRT. We seek to make the system look like a compilation toolchain option from the

developer’s perspective. That is, to the greatest extent possible, the HRT is a compilation

target. Compiling to an HRT simply results in an executable that is a “fat binary” containing

additional code and data that enables kernel-mode execution in an environment that

supports it. An HVM-enabled virtual machine on Palacios is the first such environment.

The developer can then extend this incrementally—Multiverse facilitates a path for runtime

and application developers to explore how to specialize their HRT to the full hardware

feature set and the extensible kernel environment of the Aerokernel.

From the user’s perspective, the executable behaves just as if it were compiled for

a standard user-level Linux environment. The user sees no difference between HRT

execution and user-level execution.

7.1.2 Techniques

The Multiverse system relies on three key techniques: state superpositions, split execution,

and event channels. We now describe each of these.

Split execution In Multiverse, a runtime and its application begin their execution in

the ROS. Through a well-defined interface discussed in Section 7.1.3, the runtime on the

ROS side can spawn an execution context in the HRT. At this point, Multiverse splits

its execution into two components, each running in a different context; one executes

in the ROS and the other in the HRT. The semantics of these execution contexts differ

from traditional threads depending on their characteristics. I discuss these differences

in Section 7.2. In the current implementation, the context on the ROS side comprises a

Linux thread, the context on the HRT side comprises an Aerokernel thread, and I will refer

to them collectively as an execution group. While execution groups in our current system

consist of threads in different OSes, this need not be true in general. The context on the HRT

side executes until it triggers a fault, a system call, or other event. The execution group

then converges on this event, with each side participating in a protocol for requesting

events and receiving results. This protocol exchange occurs in the context of HVM event

channels, which I discuss below.

Figure 7.1 illustrates the split execution of Multiverse for a ROS/HRT execution group.

At this point, the ROS has already made a request to create a new context in the HRT, e.g.

through an asynchronous function invocation. When the HRT thread begins executing in

the HRT side, exceptional events, such as page faults, system calls, and other exceptions vector to stub handlers in the Aerokernel, in this case Nautilus (1). The Aerokernel then

redirects these events through an event channel (2) to request handling in the ROS. The

VMM then injects these into the originating ROS thread, which can take action on them

directly (3). For example, in the case of a page fault that occurs in the ROS portion of the virtual address space, the HVM library simply replicates the access, which will cause the

same exception to occur on the ROS core. The ROS will then handle it as it would normally.

Figure 7.1: Split execution in Multiverse.

Item                                   Cycles    Time
Address Space Merger                   ∼33 K     1.5 µs
Asynchronous Call                      ∼25 K     1.1 µs
Synchronous Call (different socket)    ∼1060     48 ns
Synchronous Call (same socket)         ∼790      36 ns

Figure 7.2: Round-trip latencies of ROS↔HRT interactions.

In the case of events that need direct handling by the ROS kernel, such as system calls, the

HVM library can simply forward them (4).

Event channels When the HRT needs functionality that the ROS implements, access to

that functionality occurs over event channels, event-based, VMM-controlled communication

channels between the two contexts. The VMM only expects that the execution group

adheres to a strict protocol for event requests and completion.

Figure 7.2 shows the measured latency of event channels with the Nautilus Aerokernel performing the role of HRT. Note that these calls are bounded from below by the latency

of hypercalls to the VMM.

State superpositions In order to forego the addition of burdensome complexity to the

Aerokernel environment, it helps to leverage functions in the ROS other than those that lie

at a system call boundary. This includes functionality implemented in libraries and more

opaque functionality like optimized system calls in the vdso and the vsyscall page. In

order to use this functionality, Multiverse can set up the HRT and ROS to share portions

of their address space, in this case the user-space portion. Aside from the address space

merger itself, Multiverse leverages other state superpositions to support a shared address

space, including superpositions of the ROS GDT and thread-local storage state.

In principle, we could superimpose any piece of state visible to the VMM. The ROS

or the runtime need not be aware of this state, but the state is nonetheless necessary for

facilitating a simple and approachable usage model.

The superposition we leverage most in Multiverse is a merged address space between

the ROS and the HRT, depicted in Figure 5.1. The merged address space allows execution

in the HRT without a need for implementing ROS-compatible functionality. When a

merged address space takes effect, the HRT can use the same user-mode virtual addresses present in the ROS. For example, the parallel runtime in the ROS might load files and

construct a complex pointer-based data structure in memory. It can then invoke a function within its counterpart in the HRT to compute over that data.

7.1.3 Usage models

The Multiverse system is designed to give maximum flexibility to application and runtime

developers in order to encourage exploration of the HRT model. While the degree to which

static void *routine(void *in) {
    void *ret = aerokernel_func();
    printf("Result=%p\n", ret);
    return ret;
}

int main(int argc, char **argv) {
    hrt_invoke_func(routine);
    return 0;
}

Figure 7.3: Example of user code adhering to the accelerator model.

a developer leverages Multiverse can vary, we can broadly categorize the usage model

into three categories, discussed below.

Native In the native model, the application/runtime is ported to operate fully within

the HRT/Aerokernel setting. That is, it does not use any functionality not exported by the

Aerokernel, such as glibc functionality or system calls like mmap(). This category allows

maximum performance, but requires more effort, especially in the compilation process.

The ROS side is essentially unnecessary for this usage model, but may be used to simplify

the initiation of HRT execution (e.g. requesting an HRT boot). The native model is also

native in another sense: it can execute on bare metal without any virtualization support.

This is the model used in Chapters 3–6.

Accelerator In this category, the app/runtime developer leverages both legacy (e.g.

Linux) functionality and Aerokernel functionality. This requires less effort, but allows the

developer to explore some of the benefits of running their code in an HRT. Linux function-

ality is enabled by the merged address space discussed previously, but the developer can

also leverage Aerokernel functions.

Figure 7.3 shows a small example of code that will create a new HRT thread and use

event channels and state superposition to execute to completion. Runtime initialization is

opaque to the user, much like C runtime initialization code. When the program invokes

the hrt_invoke_func() call, the Multiverse runtime will make a request to the HVM to

run routine() in a new thread on the HRT core. Notice how this new thread can call

an Aerokernel function directly, and then use the standard printf() routine to print its

result. This printf call relies both on a state superposition (merged address space) for the

function call linkage to be valid, and on event channels, which will be used when the C

library code invokes a system call (e.g. write()).

Incremental The application/runtime executes in the HRT context, but does not lever-

age Aerokernel functionality. Benefits are limited to aspects of the HRT environment.

However, the developer need only recompile their application to explore this model. In-

stead of raising an explicit HRT thread creation request, Multiverse will create a new

thread in the HRT corresponding to the program’s main() routine. The Incremental model

also allows parallelism, as legacy threading functionality automatically maps to the corre-

sponding Aerokernel functionality with semantics matching those used in pthreads. The

developer can then incrementally expand their usage of hardware- and Aerokernel-specific

features.

While the accelerator and incremental usage models rely on the HVM virtualized

environment of Palacios, it is important to note that they could also be built on physical partitioning [154] as well. At its core, HVM provides to Multiverse a resource partitioning,

the ability to boot multiple kernels simultaneously on distinct partitions, and the ability

for these kernels to share memory and communicate.

static void *routine(void *in) {
    void *ret = aerokernel_func();
    printf("Result=%p\n", ret);
    return ret;
}

int main(int argc, char **argv) {
    pthread_t t;
    pthread_create(&t, NULL, routine, NULL);
    pthread_join(t, NULL);
    return 0;
}

Figure 7.4: Example of user code adhering to the accelerator model with overrides.

7.1.4 Aerokernel Overrides

One way a developer can enhance a generated HRT is through function overrides. The

Aerokernel can implement functionality that conforms to the interface of, for example,

a standard library function, but that may be more efficient or better suited to the HRT

environment. This technique allows users to get some of the benefits of the accelerator

model without any explicit porting effort. However, it is up to the Aerokernel developer

to ensure that the interface semantics and any usage of global data make sense when using

these function overrides. Function overrides are specified in a simple configuration file

that is discussed in Section 7.2.

Figure 7.4 shows the same code from Figure 7.3 but using function overrides. Here the

Aerokernel developer has overridden the standard pthreads routines so that pthread_create() will create a new HRT thread in the same way that hrt_invoke_func() did in the previous

example.

7.1.5 Toolchain

The Multiverse toolchain consists of two main components, the runtime system code

and the build setup. The build setup consists of build tools, configuration files, and an

Aerokernel binary provided by the Aerokernel developer. To leverage Multiverse, a user

must simply integrate their application or runtime with the provided Makefile and rebuild

it. This will compile the Aerokernel components necessary for HRT operation and the

Multiverse runtime system, which includes function overrides, Aerokernel binary parsing

routines, exit and signal handlers, and initialization code, into the user program.

7.2 Implementation

I now discuss implementation details for the runtime components of the Multiverse system.

This includes the portion of Multiverse that is automatically compiled and linked into

the application’s address space at build time and the parts of Nautilus and the HVM that

support event channels and state superpositions. Unless otherwise stated, I assume the

incremental usage model discussed in Section 7.1.3.

7.2.1 Multiverse Runtime Initialization

As mentioned in Section 7.1, a new HRT thread must be created from the ROS side (the

originating ROS thread). This, however, requires that an Aerokernel be present on the

requested core to create that thread. The runtime component (which includes the user-level

HVM library) is in charge of booting an Aerokernel on all required HRT cores during program startup. They can either be booted on demand or at application startup. We use the latter in our current setup.

Figure 7.5: Nautilus boot process.

Our toolchain inserts program initialization hooks before the program’s main() function, which carry out runtime initialization.

Initialization tasks include the following:

• Registering ROS signal handlers and process exit handling for HRT shutdown

• Aerokernel function linkage

• Aerokernel image installation in the HRT

• Aerokernel boot

• Merging ROS and HRT address spaces

Aerokernel Boot Our toolchain embeds an Aerokernel binary into the ROS program’s

ELF binary. This is the image to be installed in the HRT. At program startup, the Multiverse

runtime component parses this embedded Aerokernel binary and sends a request to the

HVM asking that it be installed in physical memory, as shown in Figure 7.5. Multiverse

then requests the Aerokernel be booted on that core. The boot process, which I described

in detail in Chapter 4, brings the Aerokernel up into an event loop that waits for HRT

thread creation requests.
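Conceptually, the startup hook performs a short, fixed sequence of steps. The sketch below shows only the ordering; every helper name is hypothetical, standing in for the corresponding Multiverse runtime routines.

#include <stddef.h>

/* Hypothetical helpers; declarations only, to show the ordering of the steps. */
void install_ros_signal_handlers(void);
void register_exit_handler(void);
void link_aerokernel_functions(void);
void find_embedded_aerokernel(void **data, size_t *size);
void hvm_install_aerokernel(void *data, size_t size);
void hvm_boot_hrt_cores(void);
void hvm_merge_address_space(void);

/* Runs before main() via a constructor hook inserted by the toolchain. */
__attribute__((constructor))
static void multiverse_init(void)
{
    install_ros_signal_handlers();              /* HRT exit notifications, errors */
    register_exit_handler();                    /* tear the HRT down on process exit */
    link_aerokernel_functions();                /* resolve Aerokernel function linkage */

    void  *img;
    size_t img_size;
    find_embedded_aerokernel(&img, &img_size);  /* parse our own ELF for the image */
    hvm_install_aerokernel(img, img_size);      /* ask the VMM to place it in HRT memory */
    hvm_boot_hrt_cores();                       /* boot into the Aerokernel event loop */

    hvm_merge_address_space();                  /* superimpose the ROS address space */
}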

The above initialization tasks are opaque to the user, who needs (in the Accelerator

usage model) only understand the interfaces to create execution contexts within the HRT.

7.2.2 Execution Model

To implement split execution, we rely on HVM’s ability to forward requests from the ROS

core to the HRT, along with event channels and merged address spaces.

The runtime developer can leverage two mechanisms to create HRT threads, as dis-

cussed in Section 7.1.3. Furthermore, two types of threads are possible on the HRT side:

top-level threads and nested threads. Top-level threads are threads that the ROS explicitly

creates. A top-level HRT thread can create its own child threads as well; we classify these

as nested threads. The semantics of the two thread types differ slightly in their operation.

Nested threads resemble pure Aerokernel threads, but their execution can proceed in the

context of the ROS user address space. Top-level threads require extra semantics in the

HRT and in the Multiverse component linked with the ROS application.

Threads: Multiverse pairs each top-level HRT thread with a partner thread that exe-

cutes in the ROS. The purpose of this thread is two-fold. First, it allows us to preserve

join semantics. Second, it gives us the proper thread context in the ROS to initiate a

state superposition for the HRT.

Figure 7.6: Interactions within an execution group.

Figure 7.6 depicts the creation of HRT threads and their interaction with the ROS. First, in (1) the main thread is created in the ROS. This thread sets

up the runtime environment for Multiverse. When the runtime system creates a thread, e.g. with pthread_create() or with hrt_invoke_func(), Multiverse creates a corresponding partner thread that executes in the ROS (2). It is the duty of the partner thread to allocate a

ROS-side stack for a new HRT thread then invoke the HVM to request a thread creation in

the HRT using that stack (3). When the partner creates the HRT thread, it also sends over

information to initiate a state superposition that mirrors the ROS-side GDT and ROS-side

architectural state corresponding to thread-local storage (primarily the %fs register). The

HRT thread can then create as many nested HRT threads as it desires (4). Both top-level

HRT threads and nested HRT threads raise events to the ROS through event channels with

the top-level HRT thread’s corresponding partner acting as the communication end-point

(5).

As is typical in threading models, the main thread can wait for HRT threads to finish

by using join() semantics, where the joining thread blocks until the child exits. While

in theory we could implement the ability to join an HRT thread directly, it would add

complexity to both the HRT and the ROS component of Multiverse. Instead, we chose to

allow the main thread to join a partner thread directly and provide the guarantee that a partner thread will not exit until its corresponding HRT thread exits on the remote core.

When an HRT thread exits, it signals the ROS of the exit event. When Multiverse creates

an HRT thread, it keeps track of the Nautilus thread data (sent from the remote core after

creation succeeds), which it uses to build a mapping from HRT threads to partner threads.

The thread exit signal handler in the ROS flips a bit in the appropriate partner thread’s

data structure notifying it of the HRT thread completion. The partner can then initiate its

cleanup routines and exit, at which point the main thread will be unblocked from its initial

join().
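To make the pairing concrete, the sketch below illustrates the shape of the ROS-side partner thread. The helper names (hvm_create_hrt_thread, wait_for_hrt_exit_event) and the mv_partner layout are hypothetical stand-ins for the actual Multiverse internals; the code only mirrors the steps described above.

    /*
     * Hedged sketch (not the actual Multiverse source): pairing logic for a
     * top-level HRT thread.  All names below are hypothetical stand-ins.
     */
    #include <pthread.h>
    #include <stdlib.h>

    struct mv_partner {
        void  *hrt_stack;        /* ROS-allocated stack handed to the new HRT thread */
        void  *nautilus_thread;  /* Nautilus thread data returned after creation     */
        volatile int hrt_exited; /* flipped by the ROS-side exit-event handler       */
        void (*fn)(void *);      /* function the HRT thread will run                 */
        void  *arg;
    };

    /* Hypothetical HVM/Multiverse primitives. */
    extern void *hvm_create_hrt_thread(void (*fn)(void *), void *arg, void *stack);
    extern void  wait_for_hrt_exit_event(struct mv_partner *p);

    /* Body of the ROS-side partner thread (steps (2) and (3) in Figure 7.6). */
    static void *partner_main(void *arg)
    {
        struct mv_partner *p = arg;

        /* Allocate a ROS-side stack for the new HRT thread ...               */
        p->hrt_stack = malloc(64 * 1024);

        /* ... then invoke the HVM to create the thread on the remote core,
         * which also initiates the GDT/%fs state superposition.             */
        p->nautilus_thread = hvm_create_hrt_thread(p->fn, p->arg, p->hrt_stack);

        /* Preserve join() semantics: the partner does not exit until its
         * corresponding HRT thread signals that it has exited.              */
        while (!p->hrt_exited)
            wait_for_hrt_exit_event(p);

        free(p->hrt_stack);
        return NULL;  /* the main thread's pthread_join() on the partner unblocks here */
    }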

Disallowed functionality: Because of the limitations of our current Aerokernel im- plementation, we must prohibit the ROS code executing in HRT context from leveraging

certain functionality. This includes calls that create new execution contexts or rely on the

Linux execution model such as execve, clone, and futex. This functionality could, of

course, be provided in the Aerokernel, but we have not implemented it at the time of this writing.

Function overrides: In Section 7.1.3 I described how a developer can use function

overrides to select Aerokernel functionality over default ROS functionality. The Multiverse

runtime component enforces default overrides that interpose on pthread function calls.

All function overrides operate using function wrappers. For simple function wrappers,

the Aerokernel developer can simply make an addition to a configuration file included in

the Multiverse toolchain that specifies the function’s attributes and argument mappings

between the legacy function and the Aerokernel variant. This configuration file then allows

Multiverse to automatically generate function wrappers at build time.

When an overridden function is invoked, the wrapper runs instead, consults a stored

mapping to find the symbol name for the Aerokernel variant, and does a symbol lookup

to find its HRT virtual address. This symbol lookup currently occurs on every function

invocation, so incurs a non-trivial overhead. A symbol cache, much like that used in the

ELF standard, could easily be added to improve lookup times. When the address of the

Aerokernel override is resolved, the wrapper then invokes the function directly (since it is

already executing in the HRT context where it has appropriate page table mappings for

Aerokernel addresses).
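The sketch below shows what a generated wrapper might look like for a single overridden function. The lookup helper (mv_symbol_lookup) and the Aerokernel symbol name (nk_mutex_lock) are hypothetical; the structure simply follows the lookup-then-call behavior described above.

    /*
     * Hedged sketch of a Multiverse-generated override wrapper.  The names
     * mv_symbol_lookup() and "nk_mutex_lock" are hypothetical stand-ins for
     * the stored mapping and the Aerokernel variant described in the text.
     */
    #include <pthread.h>

    extern void *mv_symbol_lookup(const char *aerokernel_symbol);  /* hypothetical */

    /* Wrapper that interposes on the legacy pthread_mutex_lock(). */
    int pthread_mutex_lock(pthread_mutex_t *m)
    {
        /* Resolve the Aerokernel variant's HRT virtual address.  As noted in
         * the text, this lookup currently happens on every invocation; a
         * symbol cache would amortize the cost. */
        int (*nk_variant)(void *) =
            (int (*)(void *)) mv_symbol_lookup("nk_mutex_lock");

        /* Invoke the override directly: we are already executing in HRT
         * context, so Aerokernel addresses are mapped. */
        return nk_variant(m);
    }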

7.2.3 Event Channels

The HVM model enables the building of essentially any communication mechanism

between two contexts (in our case, the ROS and HRT), and most of these require no specific

support in the HVM. As a consequence, we minimally define the basic communication

between the ROS, HRT, and the VMM using shared physical memory, hypercalls, and

interrupts.

The user-level code in the ROS can use hypercalls to sequentially request HRT reboots,

address space mergers (state superpositions), and asynchronous sequential or parallel

function calls. The VMM handles reboots internally, and forwards the other two requests

to the HRT as special exceptions or interrupts. Because the VMM and HRT may need to

share additional information, they share a data page in memory. For a function call request,

the page contains a pointer to the function and its arguments at the start and the return

code at completion. For an address space merger, the page contains the CR3 of the calling process. The HRT indicates to the VMM when it is finished with the current request via a

hypercall.

After an address space merger, the user-level code in the ROS can also use a single hy- percall to initiate synchronous operation with the HRT. This hypercall ultimately indicates

to the HRT a virtual address which will be used for future synchronization between the

HRT and ROS. They can then use a simple memory-based protocol to communicate, for

example to allow the ROS to invoke functions in the HRT without VMM intervention.
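A sketch of one possible layout for the shared data page and the HRT-side request handler follows. The field names, request codes, and the hvm_hypercall_done() helper are hypothetical; only the protocol itself (function pointer and arguments at the start, return code at completion, CR3 for a merger, completion signaled with a hypercall) comes from the description above.

    /*
     * Hedged sketch of the VMM<->HRT shared data page and the HRT-side
     * handler for forwarded requests.  Names and layout are hypothetical.
     */
    #include <stdint.h>

    enum hvm_request { HVM_REQ_NONE, HVM_REQ_CALL, HVM_REQ_MERGE };

    struct hvm_data_page {
        uint64_t request;      /* which operation the VMM forwarded            */
        uint64_t func;         /* function pointer for an async call request   */
        uint64_t args[6];      /* arguments for the call                       */
        uint64_t ros_cr3;      /* ROS process CR3 for an address space merger  */
        int64_t  return_code;  /* filled in by the HRT at completion           */
    };

    extern void hvm_hypercall_done(void);               /* hypothetical */
    extern void hrt_merge_address_space(uint64_t cr3);  /* see Section 7.2.4 */

    /* Invoked when the VMM injects the special exception/interrupt. */
    void hrt_handle_hvm_request(struct hvm_data_page *page)
    {
        switch (page->request) {
        case HVM_REQ_CALL: {
            int64_t (*fn)(uint64_t *) = (int64_t (*)(uint64_t *)) page->func;
            page->return_code = fn(page->args);
            break;
        }
        case HVM_REQ_MERGE:
            hrt_merge_address_space(page->ros_cr3);
            break;
        }
        hvm_hypercall_done();  /* tell the VMM we are finished with this request */
    }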

7.2.4 Merged Address Spaces

To achieve a merged address space, we leverage the canonical 64-bit address space model

of x64 processors, and its wide use within existing kernels, such as Linux. In this model,

the virtual address space is split into a “lower half” and a “higher half” with a gap in

between, the size of which is implementation dependent. In a typical process model, e.g.,

Linux, the lower half is used for user addresses and the higher half is used for the kernel.

For an HRT that supports it, the HVM arranges that the physical address space is

identity-mapped into the higher half of the HRT address space. That is, within the HRT,

the physical address space mapping (including the portion of the physical address space

only the HRT can access) occupies the same portion of the virtual address space that

the ROS kernel occupies, namely the higher half. Without a merger, the lower half is

unmapped and the HRT runs purely out of the higher half. When the ROS side requests a

merger, we map the lower half of the ROS’s current process address space into the lower

half of the HRT address space. For an Aerokernel-based HRT, we achieve this by copying

the first 256 entries of the PML4 pointed to by the ROS’s CR3 to the HRT’s PML4 and then

broadcasting a TLB shootdown to all HRT cores.
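The following sketch shows the core of such a merge operation, assuming hypothetical helper names (hrt_pml4, phys_to_virt, nk_broadcast_tlb_shootdown) for the Aerokernel's paging and IPI facilities.

    /*
     * Hedged sketch of the address-space merge: copy the lower-half (first
     * 256) PML4 entries from the ROS page table into the HRT's PML4 and
     * shoot down stale TLB entries.  Helper names are hypothetical.
     */
    #include <stdint.h>
    #include <string.h>

    #define PML4_LOWER_HALF_ENTRIES 256

    extern uint64_t *hrt_pml4;                          /* HRT's top-level page table  */
    extern void     *phys_to_virt(uint64_t paddr);      /* identity map in higher half */
    extern void      nk_broadcast_tlb_shootdown(void);  /* IPI all HRT cores           */

    void hrt_merge_address_space(uint64_t ros_cr3)
    {
        /* CR3 holds the physical address of the ROS PML4 (low bits are flags). */
        uint64_t *ros_pml4 = phys_to_virt(ros_cr3 & ~0xfffULL);

        /* Map the ROS process's lower half into the HRT's lower half. */
        memcpy(hrt_pml4, ros_pml4,
               PML4_LOWER_HALF_ENTRIES * sizeof(uint64_t));

        /* Top-level mappings changed: flush TLBs on every HRT core. */
        nk_broadcast_tlb_shootdown();
    }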

Because the runtime in the ROS and the HRT are co-developed, the responsibility of

assuring that page table mappings exist for lower half addresses used by the HRT in a

merged address space is the runtime’s. For example, the runtime can pin memory before

merging the address spaces or introduce a protocol to send page faults back to the ROS.

The former is not an unreasonable expectation in a high performance environment as we would never expect a significant amount of swapping.

7.2.5 Nautilus Additions

In order to support Multiverse in the Nautilus Aerokernel, we needed to make several

additions to the codebase. Most of these focus on runtime initialization and correct

operation of event channels. When the runtime and application are executing in the HRT, page faults in the ROS portion of the virtual address space must be forwarded. We added

a check in the page fault handler to look for ROS virtual addresses and forward them

appropriately over an event channel.

One issue with our current method of copying a portion of the PML4 on an address

space merger is that we need to keep the PML4 synchronized. We must account for

situations in which the ROS changes top-level page table mappings, even though these

changes are rare. We currently handle this situation by detecting repeat page faults.

Nautilus keeps a per-core variable that tracks recent page faults and matches

duplicates. If a duplicate is found, Nautilus will re-merge the PML4 automatically. More

clever schemes to detect this condition are possible, but unnecessary since it does not lie

on the critical path.
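A sketch of what these page fault path additions might look like appears below. The helper names and the bookkeeping variable are hypothetical stand-ins (and the per-core variable is shown as a thread-local for illustration only); the logic simply combines the two checks described above, forwarding lower-half faults and re-merging on a repeated fault.

    /*
     * Hedged sketch of the Nautilus page fault additions described in the text.
     * Helper names and the per-core variable are hypothetical.
     */
    #include <stdint.h>

    #define ROS_ADDR_LIMIT 0x0000800000000000ULL  /* top of the canonical lower half */

    extern void     forward_fault_to_ros(uint64_t addr, uint64_t err);
    extern void     hrt_merge_address_space(uint64_t ros_cr3);
    extern uint64_t current_ros_cr3;

    static __thread uint64_t last_fault_addr;  /* per-core in Nautilus proper */

    void nk_page_fault_handler(uint64_t fault_addr, uint64_t err_code)
    {
        if (fault_addr < ROS_ADDR_LIMIT) {
            /* Repeat fault on the same address: the ROS likely changed a
             * top-level mapping after our merge, so re-copy the PML4. */
            if (fault_addr == last_fault_addr)
                hrt_merge_address_space(current_ros_cr3);

            last_fault_addr = fault_addr;
            forward_fault_to_ros(fault_addr, err_code);  /* over an event channel */
            return;
        }

        /* Otherwise this is a fault on an HRT (higher-half) address ... */
    }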

For correct operation, Multiverse requires that we catch all page faults and forward

them to the ROS. That is, if we collect a trace of page faults in the application running

native and under Multiverse, the traces should look identical. However, because the HRT

runs in kernel mode, some paging semantics (specifically with copy-on-write) change. In

default operation, an x86 CPU will only raise a page fault when writing a read-only page

in user-mode. Writes to pages with the read-only bit set, while running in ring 0, are allowed to proceed. This issue manifests itself in the form of mysterious memory corruption, e.g. by writing to the zero page. Luckily, the WP (write protect) bit in the cr0 control register can be set to enforce write faults in ring 0.
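For reference, CR0.WP is bit 16; a minimal sketch of enabling it is shown below.

    /*
     * Hedged sketch: set CR0.WP so that ring-0 writes to read-only pages
     * fault, preserving copy-on-write semantics for the ROS address space.
     */
    #include <stdint.h>

    #define CR0_WP (1ULL << 16)

    static inline void enable_ring0_write_protection(void)
    {
        uint64_t cr0;
        asm volatile ("mov %%cr0, %0" : "=r"(cr0));
        cr0 |= CR0_WP;
        asm volatile ("mov %0, %%cr0" :: "r"(cr0) : "memory");
    }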

Before we built Multiverse, Nautilus lacked support for system calls, as the HRT

operates entirely in kernel mode. However, a legacy application will leverage a wide

range of system calls. To support them, we added a small system call stub handler in

Nautilus that immediately forwards the system call to the ROS over an event channel.

There is an added subtlety with system calls in HRT mode, as they are now initiating a trap

from ring 0 to ring 0. This conflicts with the hardware API for the SYSCALL/SYSRET pair of

instructions. We found it interesting that SYSCALL has no problem making this idempotent

ring transition, but SYSRET will not allow it. The return to ring 3 is unconditional for

SYSRET. To work around this issue, we must emulate SYSRET and execute a direct jmp to

the saved rip stashed during the SYSCALL.
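A sketch of such a stub is shown below, written as a top-level asm block in C. The forwarding helper (forward_syscall_to_ros) is a hypothetical stand-in for the event-channel code; stack alignment, preservation of additional registers, and the remaining syscall arguments are elided.

    /*
     * Hedged sketch of the Nautilus system call stub.  SYSCALL saves the
     * caller's RIP in %rcx and RFLAGS in %r11; because SYSRET unconditionally
     * returns to ring 3, the stub returns with a plain jmp to the saved RIP.
     */
    extern long forward_syscall_to_ros(long num, long a1, long a2, long a3);  /* hypothetical */

    asm (
        ".global nk_syscall_stub            \n"
        "nk_syscall_stub:                   \n"
        "    subq  $128, %rsp               \n"  /* step past any red zone first (Sec. 7.2.5) */
        "    pushq %rcx                     \n"  /* caller RIP saved by SYSCALL                */
        "    pushq %r11                     \n"  /* caller RFLAGS saved by SYSCALL             */
        "    movq  %rdx, %rcx               \n"  /* third syscall argument                     */
        "    movq  %rsi, %rdx               \n"  /* second syscall argument                    */
        "    movq  %rdi, %rsi               \n"  /* first syscall argument                     */
        "    movq  %rax, %rdi               \n"  /* syscall number                             */
        "    call  forward_syscall_to_ros   \n"  /* forward over the event channel; result in %rax */
        "    popq  %r11                     \n"
        "    popq  %rcx                     \n"
        "    addq  $128, %rsp               \n"
        "    jmp   *%rcx                    \n"  /* emulate SYSRET: jump back to the saved RIP */
    );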

While we can build a particular runtime system with the Multiverse toolchain using

custom compilation options, this is not possible for the legacy libraries they rely on. We

are then forced into supporting the compilation model that the libraries were initially

compiled with. While arbitrary compilation does not typically present issues for user-space programs, complications arise when executing in kernel mode. One such complication is

AMD’s red zone, which newer versions of GCC use liberally. The red zone sits below the

current stack pointer on entry to leaf functions, allowing them to elide the standard function prologue for stack variable allocation. The red zone causes trouble when interrupts and exceptions operate on the same stack, as the push of the interrupt stack frame by the hardware can destroy the contents of the red zone. To avoid this, kernels are typically compiled to disable the red zone. However, since we are executing code in the ROS address space with predetermined compilation, we must use other methods.

Component              C      ASM    Perl   Total
Multiverse runtime     2232   65     0      2297
Multiverse toolchain   0      0      130    130
Nautilus additions     1670   0      0      1670
HVM additions          600    38     0      638
Total                  4502   103    130    4735

Figure 7.7: Source Lines of Code for Multiverse.

In Nautilus, we address the red zone by ensuring that interrupts and exceptions operate on a well known interrupt stack, not on the user stack. We do this by leveraging the x86

Interrupt Stack Table (IST) mechanism, which allows the kernel to assign specific stacks to particular exceptions and interrupts by writing a field in the interrupt descriptor table.

SYSCALL cannot initiate a hardware stack switch in the same way, so on entry to the

Nautilus system call stub, we pull down the stack pointer to avoid destroying any red zone contents.
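The sketch below shows the IST-based part of this arrangement, using a simplified IDT entry layout; the actual Nautilus descriptor-table code differs in its details, and the per-core interrupt stack pointed to by the TSS is assumed to be set up elsewhere.

    /*
     * Hedged sketch: steer exceptions and interrupts onto a dedicated stack
     * via the x86-64 Interrupt Stack Table, so handlers never clobber a red
     * zone on the interrupted (ROS-compiled) stack.
     */
    #include <stdint.h>

    struct idt_entry64 {
        uint16_t off_lo;
        uint16_t selector;
        uint8_t  ist;        /* bits 0-2: IST index (1-7); 0 = legacy behavior */
        uint8_t  type_attr;
        uint16_t off_mid;
        uint32_t off_hi;
        uint32_t reserved;
    } __attribute__((packed));

    extern struct idt_entry64 idt[256];

    /* Assign IST stack #1 to every vector; the TSS's IST1 field points at a
     * well-known per-core interrupt stack set up elsewhere. */
    void nk_use_ist_for_all_vectors(void)
    {
        for (int i = 0; i < 256; i++)
            idt[i].ist = 1;
    }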

7.2.6 Complexity

Multiverse development took roughly 5 person months of effort. Figure 7.7 shows the amount of code needed to support Multiverse. The entire system is compact and compartmentalized so that users can experiment with other Aerokernels or runtime systems with relative ease. While the codebase is small, much of the time went into careful design of the execution model and working out idiosyncrasies in the hybridization, specifically those dealing with operation in kernel mode.

Benchmark        System Calls   Time (User/Sys) (s)   Max Resident Set (Kb)   Page Faults   Context Switches   Forwarded Events
spectral-norm    23800          39.39/0.24            182300                  51452         1695               75252
n-body           18763          41.15/0.19            152300                  45064         1430               63827
fasta-3          35115          31.28/0.17            80492                   25418         1075               60533
fasta            29989          12.23/0.10            43568                   14956         627                44945
binary-tree-2    1260           31.98/0.10            82072                   31082         491                32342
mandelbrot-2     3667           7.76/0.05             43600                   14250         291                17917
fannkuch-redux   1279           2.73/0.01             21284                   5358          33                 6637

Figure 7.8: System utilization for Racket benchmarks. A high-level language has many low-level interactions with the OS.

7.3 Evaluation

This section presents an evaluation of Multiverse using microbenchmarks and a hybridized

Racket runtime system running a set of benchmarks from The Language Benchmark Game.

We ran all experiments on a Dell PowerEdge 415 with 8GB of RAM and an 8-core, 64-bit x86_64 AMD Opteron 4122 clocked at 2.2GHz. Each CPU core has a single hardware thread, with

four cores per socket. The host machine has stock Fedora Linux 2.6.38.6-26.rc1.fc15.x86_64

installed. Benchmark results are reported as averages of 10 runs.

Experiments in a VM were run on a guest setup which consists of a simple BusyBox

distribution running an unmodified Linux 2.6.38-rc5+ image with two cores (one core for

the HRT and one core for the ROS) and 1 GB of RAM.

7.3.1 Racket

Racket [77, 73] is the most widely used Scheme implementation and has been under con-

tinuous development for over 20 years. It is an open source codebase that is downloaded

over 300 times per day (http://racket-lang.org). Recently, support has been added to Racket for parallelism via

futures [182] and places [187].

The Racket runtime is a good candidate to test Multiverse, particularly its most complex

usage model, the incremental model, because Racket includes many of the challenging

features emblematic of modern dynamic programming languages that make extensive

use of the Linux ABI, including system calls, memory mapping, processes, threads, and

signals. These features include complex package management via the file system, shared

library-based support for native code, JIT compilation, tail-call elimination, live variable

analysis (using memory protection), and garbage collection.

Our port of Racket to the HRT model takes the form of an instance of the Racket engine

embedded into a simple C program. Racket already provides support for embedding an

instance of Racket into C, so it was straightforward to produce a Racket port under the

Multiverse framework. This port uses a conservative garbage collector, the SenoraGC, which is more portable and less performant than the default, precise garbage collector. The port was compiled with GCC 4.6.3. The C program launches a pthread that in turn starts

the engine. Combined with the incremental usage model of Multiverse, the result is that

the Racket engine executes in the HRT.
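The embedding harness is, in outline, no more than the following sketch. The engine entry point (start_racket_engine) is a hypothetical stand-in for Racket's embedding interface; under the incremental Multiverse model, the pthread_create() call is overridden so that this thread becomes a top-level HRT thread.

    /*
     * Hedged sketch of the embedding harness: a plain C program that launches
     * a pthread which starts the embedded Racket engine.
     */
    #include <pthread.h>
    #include <stdio.h>

    extern void *start_racket_engine(void *scheme_file);  /* hypothetical */

    int main(int argc, char **argv)
    {
        pthread_t tid;
        const char *file = (argc > 1) ? argv[1] : NULL;  /* batch file or NULL for a REPL */

        if (pthread_create(&tid, NULL, start_racket_engine, (void *)file)) {
            perror("pthread_create");
            return 1;
        }

        pthread_join(tid, NULL);
        return 0;
    }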

When compiled and linked for regular Linux, our port provides either a REPL interac-

tive interface through which the user can type Scheme, or a command-line batch interface

through which the user can execute a Scheme file (which can include other files). When

compiled and linked for HRT use, our port behaves identically.

To evaluate the correctness and performance of our port, we tested it on a series of

benchmarks submitted to The Computer Language Benchmarks Game [1]. We tested on

seven different benchmarks: a garbage collection benchmark (binary-tree-2), a permutation

benchmark (fannkuch), two implementations of a random DNA sequence generator (fasta and fasta-3), a generation of the mandelbrot set (mandelbrot-2), an n-body simulation (n-body), and a spectral norm algorithm.

Figure 7.9: Performance of Racket benchmarks (runtime in seconds) running Native, Virtual, and in Multiverse. With Multiverse, the existing, unmodified Racket implementation has been automatically transformed to run entirely in kernel mode, as an HRT, with little to no overhead.

Figure 7.8 characterizes these benchmarks from the

low-level perspective. Note that while this is an implementation of a high-level language,

the actual execution of Racket programs involves many interactions with the operating

system. These exercise Multiverse’s system call and fault forwarding mechanisms. The

total number of forwarded events is in the last column.

Figure 7.9 compares the performance of the Racket benchmarks run natively on our

hardware, under virtualization, and as an HRT that was created with Multiverse. Error

bars are included, but are barely visible because these workloads run in a predictable way.

The key takeaway is that Multiverse performance is on par with native and virtualized performance—Multiverse let us move, with little to no effort, the existing, unmodified

Racket runtime into kernel mode and run it as an HRT with little to no overhead.

The small overhead of the Multiverse case compared to the virtualized and native

cases is due to the frequent interactions, such as those described above, with the Linux

ABI. However, in all but two cases, the hybridized benchmarks actually outperform the

equivalent versions running without Multiverse. This is due to the nature of the accelerated

environment that the HRT provides, which ameliorates the event channel overheads. As we expect, the two benchmarks that perform worse under Multiverse (nbody and spectral-

norm) have the most overhead incurred from event channel interactions. The most frequent

interactions in both cases are due to page faults.

While Figure 7.8 tells us the number of interactions that occur, we now consider the

overhead of each one using microbenchmarks. This estimate will also apply to page faults,

since the mechanism by which they are forwarded is identical to system calls.

The most frequent system calls used in the Racket runtime (independent of any bench-

mark) are mmap() and munmap(), and so we focus on these two. Figure 7.10 shows mi-

crobenchmark results (averages over 100 runs) for these two, comparing virtualized

execution and Multiverse. Note that neither system calls nor page faults involve the VMM

in the virtualized case. For both system calls, the Multiverse event forwarding facility adds

roughly 1500 cycles of overhead. If we multiply this number by the number of forwarded

events for these benchmarks listed in Figure 7.8, we expect that Multiverse will add about

112 million cycles (51 ms) of overhead for spectral-norm, and 96 million cycles (43 ms) for

nbody. This is roughly in line with Figure 7.9.
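(As a check of this estimate: spectral-norm forwards 75,252 events, and 75,252 × 1,500 ≈ 1.13 × 10^8 cycles, which at 2.2 GHz is about 51 ms; for nbody, 63,827 × 1,500 ≈ 9.6 × 10^7 cycles, or roughly 43 ms.)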

Note that event forwarding will decrease as a runtime is incrementally extended to

take advantage of the HRT model. As an illustration of this, we modified Racket's garbage collector to prefault in pages any time that it allocates a range of memory for internal use.

Figure 7.10: Multiverse event forwarding overheads (latency in cycles) demonstrated with mmap() and munmap() system calls.

Figure 7.11: Performance (runtime in seconds) of nbody and spectral-norm benchmarks with and without a modification to the runtime system that prefaults in all mapped pages, reducing the number of forwarded events.

We accomplished this with a simple change that added the MAP_POPULATE flag to Racket’s

internal mmap() invocations. This reduces the number of page faults incurred during the

execution of a benchmark, therefore reducing the number of forwarded events. The results

are shown in Figure 7.11. While this change increases the running time for the benchmark

overall—indicating that this is not a change that one would introduce in practice—it

gives us an indication of what the relative performance would be with fewer forwarded

events. Indeed, with fewer events, the hybridized versions (Multiverse-prefault) of these

benchmarks running in a VM now outperform their counterparts (Virtual-prefault).
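The change itself is small; the sketch below shows its shape, with a hypothetical wrapper name (gc_alloc_pages) standing in for the internal allocation path that was modified.

    /*
     * Hedged sketch of the prefaulting change: the only modification is adding
     * MAP_POPULATE (a standard Linux mmap(2) flag) to the internal mmap() call
     * so that pages are faulted in eagerly at map time.
     */
    #include <sys/mman.h>
    #include <stddef.h>

    static void *gc_alloc_pages(size_t len)
    {
        return mmap(NULL, len,
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,  /* added flag */
                    -1, 0);
    }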

It is worth reflecting on what exactly has happened here: we have taken a complex

runtime system off-the-shelf, run it through Multiverse without changes, and as a

result have a version of the runtime system that correctly runs in kernel mode as an

HRT and behaves identically with virtually identical performance. To be clear, all of

the Racket runtime except Linux kernel ABI interactions is seamlessly running as

a kernel. While this codebase is the endpoint for user-level development, it represents

a starting point for HRT development in the incremental model.

7.4 Conclusions

In this chapter, I introduced Multiverse, a system that implements automatic hybridization of

runtime systems in order to transform them into hybrid runtimes (HRTs). I illustrated the

design and implementation of Multiverse and described how runtime developers can use

it for incremental porting of runtimes and applications from a legacy OS to a specialized

Aerokernel.

To demonstrate its power, we used Multiverse to automatically hybridize the Racket

runtime system, a complex, widely-used, JIT-based runtime. With automatic hybridization, we can take an existing Linux version of a runtime or application and automatically

transform it into a package that looks to the user like it runs just like any other program,

but actually executes on a remote core in kernel-mode, in the context of an HRT, and with full access to the underlying hardware. I presented an evaluation focused on the performance overheads of an unoptimized Multiverse hybridization of Racket and showed

that performance varies with the usage of legacy functionality. Runtime developers can

leverage Multiverse to start with a working system and incrementally transition heavily

utilized legacy functions to custom components within an Aerokernel.

Chapter 8

Related Work

This chapter is dedicated to a discussion of work related to this dissertation, first work that

is broadly relevant to HRTs in Section 8.1, then specific work relevant to Chapters 5–7.

8.1 Hybrid Runtimes

In this section, I will begin by recounting a short history of kernel design in Section 8.1.1.

I will then discuss more recent developments that are important for my work on HRTs

(Section 8.1.2). Next, I will discuss OS designs specifically targeted for high-performance,

large-scale machines (Section 8.1.3). Finally, I will outline recent research in parallel

languages and runtimes (Section 8.1.4).

8.1.1 Kernel Design

Microkernels Starting with some of the ideas laid out in the Nucleus of the RC 4000

system [93], researchers began to question some of the design decisions commonly asso-

ciated with monolithic operating systems. This effort eventually led to the development

of microkernels, where only minimal functionality (e.g. threads, IPC, and memory man-

agement) merits inclusion in the kernel, and less critical functionality resides in userspace.

The following excerpt from Liedtke [136] aptly summarizes the main requirement of a

microkernel:

A concept is tolerated inside the microkernel only if moving it outside the

kernel, i.e., permitting competing implementations, would prevent the imple-

mentation of the system’s required functionality.

Notable examples of first generation microkernels include Mach [31], L3 [135], and

Chorus [172]. Mach, which still has influence today in Apple’s Darwin OS and GNU

Hurd [43], was arguably the most successful of these, but the performance of microkernels, particularly for IPC, proved rather inadequate.

2nd generation microkernels Second generation microkernels mostly aimed to address

the performance issues that afflicted the previous generation. The L3 team produced a com- plete rewrite with L4 [134, 136, 95] and eschewed portability for the sake of performance.

Notable features in L4 included recursive construction of address spaces, short messages passed in registers, and lazy scheduling. SPIN [30] was based on the latest version of

Mach at the time, and allowed extensions written in a type-safe language (Modula-3) to be

dynamically loaded into the kernel, which exported minimal functionality on which these

extensions could build.

Early Microcomputer/PC OSes Early microcomputer and PC operating systems such

as CP/M and MS-DOS, like Nautilus, had a nebulous separation between the kernel and

the programs running on top of it. For example, CP/M had a logical separation between the

kernel and application programs—a set number of transient processes [168] were pre-loaded

into memory in the same address space as the kernel—but no strict privilege separation was enforced. Similarly, MS-DOS had applications run in supervisor mode along with the

kernel. Such early OSes were intended to support general-purpose processing rather than

high-performance and parallel computing.

Embedded and Real-Time OSes While the goals of embedded and real-time operating

systems (RTOSes) differ at a high-level from lightweight kernels and HRTs, there are some

similarities. One major similarity is the aim for deterministic performance. An RTOS

must, for example, provide strict bounds on interrupt handling latency. Some RTOSes

only execute a predetermined set of programs which execute with the same privilege of

the kernel. For example, in µC/OS-II [125], application programs (“tasks”) are allowed to

enable and disable external interrupts, and memory protection is not enabled by default.

In contrast to the monolithic architecture of µC/OS-II, the designers of the QNX RTOS

chose a lightweight microkernel architecture [99]. Other notable RTOSes include VxWorks

and Windows CE.

Exokernels In contrast to shifting components into and out of the privileged kernel,

Exokernels [70, 71, 69, 112] promote the idea that the kernel should avoid providing

any high-level abstractions for the hardware. For example, an exokernel, rather than

abstracting the logical structure of a disk into files and file systems, will instead expose the

disk’s notion of blocks and sectors directly. The responsibility of providing higher-level

abstractions falls on untrusted, user-mode library OSes. Nautilus is heavily influenced by

Exokernel, but my work on HRTs is more focused on providing the right mechanisms and

abstractions to parallel runtimes in particular.

Several exokernel derivatives exist that tout minimal size—even more so than microkernels—

as their primary concern. Typically referred to as picokernels or nanokernels, they comprise

a very thin kernel layer that can sometimes support other operating systems running

on top of them. Important examples include KeyKOS (a highly influential, persistent,

capability-based OS) [36], ADEOS [197], and the Stanford Cache Kernel [54]. These designs quickly become blurred with virtualization, and in many cases they overlap [180, 41].

Virtualization Virtualization has proven itself as a powerful, driving force in the design

of operating systems since the early days of the IBM System/370 [89, 162] and more

recently with the resurgence of virtualization in the commercial and scientific domains (e.g.

VMware [42], Xen [18], KVM [121], and Palacios [130]). It has even been noted that VMMs

embody many of the ideas championed by proponents of microkernels [92]. While a

serious effort is underway to make virtualization viable for the HPC community [129, 124],

it largely eschews both concerns about OS design specifically for parallel runtimes and questions of which hardware features should be exposed directly to the runtime. The

runtime system is, first and foremost, limited by the choices of the guest OS. My previous work on GEARS [91] investigates the ability of the virtualization layer to address the

shortcomings of this existing structure.

3rd generation microkernels The third, more recent, generation of microkernels focuses

on security [180] and the formal properties of the kernel API. The most notable example is

EROS [175], which is a “clean-room” implementation of KeyKOS. seL4 [68], a successor of

EROS and L4, is a recently developed microkernel that has a formally verified specification.

Other projects exist that focus on correctness, resilience, and stability, including type-

safe OSes (e.g. Singularity [103]). The concept of modularity plays a key role in the construction of operating systems, particularly from a software-engineering standpoint.

The Flux OSKit [82] stands out as a noteworthy effort lowering the barriers of kernel

design and development by providing components common to many kernel designs.

These concerns, however, are largely orthogonal to the work on HRTs in this dissertation.

Single address space OS Nautilus currently runs with all threads sharing a single, global

address space. This is, of course, not a new idea. Single Address Space Operating Sys-

tems (SASOSes) provide efficient and logically simple sharing between processes, as the

meaning of a virtual address in one process is the same in another. Protection is provided through hardware support (e.g. paging). More recent monolithic OSes such as

Linux provide a limited form of global address spaces by virtue of shared page mappings.

Notable examples of influential SASOSes include Opal (based on Mach 3.0) [53], Singular-

ity [103], Scout [146], Nemesis1 [171], Mungi [96, 64], and AmigaOS. Nautilus currently

acts as a SASOS mostly for simplicity. If it turns out that a parallel runtime needs better

logical separation between address spaces, the simplicity in the design will allow us to quickly modify it.

OSes for multiprocessors and multicores Operating systems have long touted support

for multiprocessors [57]. Computer historians cite ILLIAC IV [19] as the first example

of a large parallel system2. I will discuss systems for large-scale supercomput-

ers in more detail in Section 8.1.3. While much of the parallelism available in the large,

expensive machines of these days came in the form of SIMD, the idea of using several

units in parallel was highly influential. However, widespread use of this kind of par-

1 Nemesis also supported self-paging, in which an application makes its own paging decisions.
2 Thinking Machines' Connection Machine [100] is often cited as the first massively parallel machine.

allelism did not flourish until much later, with the end of Dennard scaling [65] and the

increasing concern over power. These challenges eventually led to the idea of the chip

multiprocessor (CMP) [156]. CMPs brought many new challenges, which the operating

systems community faced head on. Scalability to many cores became a primary concern

for OSes that had previously targeted general purpose machines3 [38, 123]. Corey [37]

addressed this issue by minimizing sharing among kernel data structures. Multikernels

such as Barrelfish [174, 22] and later Barrelfish/DC [199] took a different approach and

treated a chip multiprocessor as a distributed system of communicating cores. Tesselation

OS [138, 55] splits processors into QoS domains called Cells, which are gang scheduled [72]

on a group of hardware threads. Tesselation does this in order to enforce hard isolation

between—and multiplex resources among—applications with disparate scheduling and

resource requirements. Tesselation embodies some microkernel/nanokernel ideas in the way it defers fine-grained scheduling decisions to the cells themselves4. In contrast to this work, HRTs concentrate not on resource management for a broad range of applications,

but rather for a specific class of parallel runtimes.

The above designs still impose abstractions on the applications (and runtimes) running

on top of them. With HRTs, we can throw out these abstractions so that the runtime has

more control, and can dictate the appropriate abstractions. Of course, these abstractions

may share enough commonality to warrant inclusion in a kernel intended to support many parallel runtimes.

Rhoden and others at the UC Berkeley AMPLab are currently developing an OS called

Akaros [198, 169], which shares many design goals with Nautilus, and thus merits a deeper

discussion. They specifically claim that OS abstractions for parallel datacenter applications

3 Incidentally, a similar renaissance occurred in the databases community [110, 111].
4 This is logically similar to scheduler activations [10].

(at the node level) need rethinking. I agree. They plan to expose cores directly to the

application, much like Tesselation. They also employ gang scheduling of tasks on cores,

and allow applications to have more power over memory management, scheduling, and

transparent access to block sizes for memory, disk, and network.

There are a few important differences that distinguish HRTs from the Akaros team’s

goals. First, we concentrate on parallel runtimes, and have a unique position to co-design

the runtime with the kernel. Akaros still runs its many-core processes (MCPs) in user-mode.

I believe that the right way to create good designs for parallel runtimes and the kernels

that support them is to allow the runtime to have full control over the hardware5.

Threading The scheduling of light-weight tasks goes hand in hand with multicore pro-

gramming, and much research has focused on developing useful and efficient threading

packages. POSIX threads (pthreads) [44] have long stood as the de facto standard for shared-

memory parallel programming. OpenMP [59] builds on pthreads by using compiler

directives instead of library calls, making the process of porting sequential code signifi-

cantly easier. OpenTM [16] extends OpenMP even further to support transactional memory,

an alternative to synchronization between threads, one of the most difficult aspects of parallel programming.

Pthreads are an example of kernel threads, threads that must be created by issuing a

request to the kernel. User-level thread packages advance a—typically faster—alternative

model wherein the application or runtime maintains thread state and manages context

switches between threads. User-level threads typically enjoy performance benefits but they

suffer from a lack of control, especially in the face of blocking system calls. Many popular,

5 In other words, the runtime is built with the assumption that it has full hardware control. Abstractions can come later, once the needs and priorities of the runtime layer are identified.

managed languages employ user-threads. Scheduler activations [10] address some of the

shortcomings of user-level threads by allowing the OS to propagate events and information

to the user-level scheduler. Qthreads [194] enhance user-level threads with full/empty-bit

semantics, in which threads can wait on a location in memory to become either completely

full or empty6. Lithe [158] advocates a different approach that has the OS expose hardware

threads directly in lieu of a general-purpose scheduler7; this approach allows different parallel libraries, including pthreads, OpenMP, and Cilk to compose cleanly. Callisto [94]

achieves a similar goal, but runs on top of a stock Linux kernel.

8.1.2 Revival of Exokernel Ideas

More recently, the ideas laid out in the exokernel work have enjoyed a resurgence, particu-

larly as they relate to hardware virtualization. OSv [122] is essentially a small exokernel

designed to run a single application in the cloud. Unikernels [140] also leverage the ubiq-

uity and ease of use of virtual machines in order to compile application-specific libOSes

that run in a dedicated VM. Drawbridge [163] and Bascule [23] similarly leverage the libOS

concept specifically for Microsoft Windows libOSes, allowing them to run unmodified

Windows applications. They also employ the notion of a picoprocess, which interacts with

the underlying security monitor via a small set of stateless calls8. Dune uses hardware virtualization support to provide direct access to hardware features for an untrusted ap- plication running in a Linux host [25, 26]. Arrakis leverages virtualized I/O devices in

a similar vein in order to allow applications direct access to hardware [159]. I, however,

6 These semantics bear resemblance to data-triggered threads [190].
7 Tesselation borrows heavily from Lithe in this regard.
8 This form of lightweight virtualization has recently become popular with Linux Containers [142]. Of course, LXC was not the first of its kind. The idea is, at least in part, an obeisance to [126], FreeBSD jails [116], jails [83], Linux-VServer [66], and AIX Workload Partitions [4].

am not concerned only with hardware features that rely on the availability of hardware virtualization. I am also interested in exploring unconventional uses of other existing

hardware features, especially as they relate to parallel runtimes. As far as I am aware,

there is no work, recent or otherwise, that looks at hardware features and abstractions

specifically for parallel runtimes.

8.1.3 Operating Systems Targeting HPC

The high-performance computing (HPC) community has been studying the effects of

the OS on application performance for some time now. This work produced a general

consensus that, especially in large-scale supercomputers, the OS should not “get in the way,”

a sentiment that led the way for light-weight kernels (LWKs) such as Sandia’s Kitten [129],

Catamount [118], IBM’s Compute Node Kernel (CNK) [86], and Cray’s Compute Node

Linux (CNL). These LWKs are intended to provide minimal abstractions (only those

required by the supported applications). By design, they eliminate non-deterministic

behavior that can interfere with application performance [74, 101, 75]. Current efforts exist

to bridge LWKs with existing general-purpose OSes like Linux within the same system.

Examples of these include the Hobbes project (in which we are involved) [39], mOS [195],

Argo [24], and McKernel [177].

8.1.4 Parallel Languages and Runtimes

In this section, I will describe some of the influential research in parallel languages and

runtime systems. I have split this discussion into parts, delineated by the domain in which

each language/runtime is relevant. There is, of course, some overlap. First, I will discuss

some of the more general-purpose languages and runtimes targeted at parallel systems.

Racket [77] is the most widely used Scheme dialect, partly developed here at Northwest-

ern, that has recently integrated support for parallelism in the form of futures and places

into its runtime [185, 184, 183]. The Manticore project seeks to bring the advantages of

Concurrent ML to parallel systems without the performance impact of the synchronization

mechanism in the existing implementations of CML [167, 80, 78]. The Manticore runtime

has supported CML style constructs using, e.g. threads and continuations. Manticore is pri-

marily an answer to multicore systems from the language developer’s perspective [13, 27].

High performance computing The cornerstone of the high-performance community

since the advent of large-scale multiprocessing has been the MPI library combined with C,

C++, or Fortran. However, there has been significant effort from the language community

in developing a better way to program large-scale supercomputers.

Beginning in the early 2000s, DARPA headed the High Productivity Computing Sys-

tems (HPCS) project, aimed at creating multi-petaflop systems. Several language innova-

tions came out of this project, including Fortress, Chapel, and X10. Fortress (which is now

effectively deprecated) [5], aimed to include the power of mathematical expressions in

its syntax, and included support for implicit parallelism with work stealing. Chapel [51]

supports task-parallelism, data-parallelism, and nested parallelism, and is based on High- performance Fortran (HPF). X10 is an object-oriented language that uses the concepts of

places and asynchronous tasks, and supports message-passing style, fork-join parallelism,

and bulk-synchronous parallel operation [52, 119]. The above examples all made use of a partitioned global address space (PGAS), which aims to bring the convenience and logical

simplicity of shared memory programming to distributed-memory systems. Unified Paral-

lel C (UPC) [56, 46] and Co-array Fortran [149] are other widely used examples of PGAS

languages. The CHAOS/PARTI runtime libraries also introduced several runtime opti-

mizations to support efficient, distributed shared memory for unstructured applications in

the context of computational chemistry and aerodynamics [61, 181].

GPGPUs have brought SIMD machines to the mainstream, and the HPC community

has acted as a frontrunner of their use. NVIDIA has developed a runtime library named

CUDA [151, 153, 152] that allows programmers to vectorize their code for parallel operation

on their GPUs. Intel has followed suit with the Xeon Phi series of accelerators [105, 108, 106]

to target the same market. Here the innovations are more focused in the hardware domain.

The programming model is more similar to using a library than using a special-purpose

language. OpenACC [88] is an effort to integrate these accelerator libraries into a common

API.

Languages based on message passing Message passing languages signal events and

operations by using messages between objects or entities. Charm++ [115, 114] is one of the

most popular languages in the HPC community that uses this model. Many large-scale

scientific codes are built on top of Charm++, including NAMD, a popular molecular

dynamics suite. More recently, the Charm group has investigated runtime optimizations

specifically for multi-core clusters [141]. Erlang is an older language developed by Ericsson,

specifically developed for communication systems. Its concurrency model naturally

extends to parallel systems, and it has gained traction recently in both commercial and

scientific communities [11, 109, 47].

Data-parallel languages Data parallelism, closely related to the concept of SIMD, in- volves applying a single operation over a collection of data elements. High-performance

Fortran [143, 120] was a popular data-parallel language. Haskell also has rich support for

data parallelism [161, 50]. However, one of the main drawbacks of data parallelism is that

it does not support unstructured parallelism. For example, using a traditional data parallel

approach, it is not possible to write a parallel implementation of a recursive operation on

a tree (e.g. quicksort). Blelloch addressed this challenge with nested data-parallelism [32]

and the NESL language, which introduced the notion of flattening to a collection, allow-

ing “siblings” in the recursion tree to be coalesced into a single collection. This allows vectors to be operated on in the traditional data-parallel manner. With SIMD machines

gaining popularity again in the form of GPGPUs, there has been a revival of research in

nested data-parallel languages [176, 28, 29]. Our implementation of NDPC (discussed in

Chapter 3) is heavily influenced by this work.

Parallel runtimes This section focuses on systems in which the runtime, not the language

itself, is the primary area of concern. SWARM is a new dataflow runtime built on top

of pthreads aimed at low-latency task-parallelism [131]. HPX is a runtime system that

supports the ParalleX execution model, in which, similar to dataflow, modifications to data

can trigger events, such as thread creation, in the system [113]. As it uses communication

driven by a global address space, ParalleX bears many similarities to the execution models

of PGAS languages. Legion [21, 189] is a recently developed runtime aimed at heteroge-

neous architectures and scalability to many nodes. I discussed Legion in more detail in

Chapter 3.

Cilk [35] is another highly influential alternative to traditional threads, which is cur-

rently available in a C++ flavor in the form of CilkPlus. Cilk only adds a few keywords

to C (e.g. spawn); some of its most noteworthy aspects include its ability to extract paral-

lelism from programmer-provided hints and its work-stealing scheduler. Jade [127, 170]

is another parallel language that uses high-level information from the programmer. It is

intended to make it easy for programmers to transform an existing sequential application

for coarse-grained9 concurrency using hints about task and data organization. The runtime

can extract this information dynamically and apply machine specific optimizations trans- parent to the application10.

While there is a huge swath of research aimed at parallel programming, none of these

systems, prima facie, address runtime concerns at the kernel level; the focus on these

concerns is a primary contribution of the work in this dissertation.

8.2 Nemo Event System

The real-time OS community has studied asynchronous events in depth, focusing on

events in contexts such as predictable interrupt handling [166], priority inversion [173],

and fast vectoring to user-level code [84]. However, this work does not consider the design

of asynchronous events in a context where the application/runtime has kernel-mode privileges and full access to hardware, as in an HRT.

Thekkath and Levy introduced a mechanism [188] to implement user-level exception

handlers—instances of synchronous events in our terminology—to mitigate the costs of

the privilege transitions discussed at the beginning of Chapter 6. The motivation for this

technique mirrors motivations for RDMA-based techniques we see used in practice today.

In contrast, Nemo’s design focuses on asynchronous events.

9 That is, concurrency that involves tasks that will execute for a long enough time such that running the tasks concurrently does not introduce significant overhead.
10 It should be clear now that Legion shares many ideas with the Jade system.

Horowitz introduced a programmable hardware technique for Informing Memory Opera-

tions, which vector to a user-level handler with minimal latency on cache miss events [102].

These events bear more similarities to exceptions than to asynchronous events, especially

those originating at a remote core.

Keckler et al. introduced concurrent event handling, in which a multithreaded processor

reserves slots for event handling in order to reduce overheads incurred from thread context

switching [117]. Chaterjee discusses further details of this technique, particularly as it

applies to MIT’s J-Machine [148] and M-Machine [76]. A modern example of this technique

can be found in the MONITOR and MWAIT instruction pair. I discuss how this type of

technique differs from our goals in Section 6.1.2.

The Message Driven Processor (MDP), from which the J-Machine was built, had hard- ware contexts specifically devoted to handling message arrivals [60]. Furthermore, this processor had an instruction (EXECUTE) that could explicitly invoke an action on a remote

node. This action could be a memory dereference, a call to a function, or a read/write to

memory. This is essentially the capability that in Section 6.3 we suggested could be imple-

mented in the context of x64 hardware. It is unfortunate that useful explicit messaging

facilities like those used in the MDP—save some emerging and experimental hardware

from Tilera (née RAW [132]) and the RAMP project [193]—have not made their way into

commodity processors used in today’s supercomputers, servers, and accelerators.

8.3 HVM and Multiverse

Other approaches to realizing the split-machine model for the HVM shown in Figure 3.1

exist. Dune, described in Section 8.1.2, provides one alternative. Guarded modules [90]

could be used to give portions of a general-purpose virtualization model selective privi-

leged access to hardware, including I/O devices. Pisces [154] would enable an approach

that could eschew virtualization altogether by partitioning the hardware and booting

multiple kernels simultaneously without virtualization.

Libra [8] bears similarities to our Multiverse system in its overall architecture. A Java

Virtual Machine (JVM) runs on top of the Libra libOS, which in turn executes under virtualization. A general-purpose OS runs in a controller partition and accepts requests for

legacy functionality from the JVM/Libra partition. This system involved a manual port,

much like the work described in Chapter 3. However, the HVM gives us a more powerful

mechanism for sharing between the ROS and HRT as they share a large portion of the

address space. This allows us to leverage complex functionality in the ROS like shared

libraries and symbol resolution. Furthermore, the Libra system does not provide a way to

automatically create these specialized JVMs from their legacy counterparts.

The Blue Gene/L series of supercomputer nodes run with a Lightweight Kernel (LWK)

called the Blue Gene/L Run Time Supervisor (BLRTS) [6] that shares an address space with applications and forwards system calls to a specialized I/O node. While the bridging

mechanism between the nodes is similar, there is no mechanism for porting a legacy

application to BLRTS. Others in the HPC community have proposed similar solutions that

bridge a full-weight kernel with an LWK in a hybrid model. Examples of this approach

include mOS [195], ARGO [24], and IHK/McKernel [177]. The Pisces Co-Kernel [157] treats performance isolation as its primary goal and can partition physical hardware between

enclaves, or isolated OS/Rs that can involve different specialized OS kernels.

In contrast to the above systems, our HRT model is the only one that allows a runtime to

act as a kernel, enjoying full privileged access to the underlying hardware. Furthermore, as far as we are aware, none of these systems provide an automated mechanism for producing an initial port to the specialized OS/R environment.

Chapter 9

Conclusion

I have presented the concept of the hybrid runtime (HRT), where a parallel runtime and a

light-weight kernel framework are combined, and where the runtime has full access to privileged mode hardware and has full control over abstractions. This model addresses

issues that runtimes face when running in user-space in the context of a general-purpose

OS. I discussed the potential benefits of the HRT model applied to parallel runtimes. I also

described the design of HRTs, as well as their implementation and evaluation. I showed

how some of the benefits of HRTs can be realized simply by porting a parallel runtime to

the HRT model, which for the case of the HPCG benchmark in Legion produced up to 40%

speedup over its Linux counterpart.

At the outset of this work, the hypothesis was that building parallel runtimes as privileged kernel-mode entities can be valuable for performance and functionality. I found

this to be supported by evidence, and I made several discoveries with respect to the HRT

model along the way. Several of these discoveries led to systems that will continue to serve

as foundations for further research in high-performance operating systems.

The key enabler of HRTs is a light-weight kernel framework called an Aerokernel.

We designed and implemented the first such Aerokernel, named Nautilus, of which I am

the primary developer. Nautilus provides a starting point for parallel runtime develop-

ers by exposing several convenient and fast mechanisms familiar to user-space runtime

developers, such as threads, memory management, events, and synchronization facili-

ties. I demonstrated how these facilities perform much faster than those provided by a

general-purpose OS like Linux, in some cases by several orders of magnitude. Nautilus

currently boots on both commodity x86_64 hardware and on the Intel Xeon Phi. It is

a Multiboot2-compliant kernel, and will boot on supported systems that have GRUB2

installed. It has been tested on AMD machines, Intel machines (including Xeon Phi cards),

and on virtualization under QEMU, KVM, and our Palacios VMM.

Nautilus can also run in an environment bridged with a regular operating system (ROS),

such as Linux in what is sometimes called a multi-OS environment. This is made possible

by the hybrid virtual machine (HVM). In the HVM model, an HRT (based on Nautilus)

can be created on demand from the ROS. The HRT controls a subset of the cores on the

machine, and can be booted on the order of milliseconds (roughly the timescale of a process

creation in Linux). The HVM allows the HRT to borrow legacy functionality from a ROS by

forwarding events such as page faults and system calls to the ROS. The HVM is primarily

aimed at easing the deployment of HRTs, allowing them to be used as software accelerators when high performance is paramount.

During my investigation of parallel runtimes (especially those that leverage software

events), I discovered that the underlying mechanisms that Linux provides for such events

are significantly slower than the capabilities of the hardware. To address this shortcoming,

I implemented versions of familiar event facilities (such as condition variables) in Nautilus

that perform orders of magnitude better than those on Linux. I also implemented event

facilities with new semantics that perform at the hardware limit of IPIs, which is not possible in a ROS, where the parallel runtime is relegated to user-space operation. I

collectively refer to these event facilities in Nautilus as Nemo.

Finally, I built the Multiverse toolchain, which allows runtime developers to more quickly adopt the HRT model. This was motivated by another discovery with regard

to HRTs, namely that building them can be quite difficult. To ease the development process, Multiverse carries out a process called automatic hybridization, wherein a Linux

user-space runtime is transformed into a HRT by rebuilding the code with our toolchain.

Multiverse allows a user to execute a user-level runtime as normal at the command-line while automatically splitting the execution in an HVM between the ROS and the HRT. The

runtime then executes as a kernel, enjoying privilege mode access. Multiverse introduces

little to no overhead in the runtime’s operation, while allowing the developer to begin

experimenting with kernel-mode features. This allows the runtime developer to start with

a working system rather than undertaking the arduous process of building an Aerokernel

from scratch.

9.1 Summary of Contributions

Development of the HRT model I have developed the hybrid runtime (HRT) model for parallel runtimes, which allows such runtimes to operate with fully privileged access to

the hardware and with full control over the abstractions to the machine. An HRT consists

of a parallel runtime and a light-weight Aerokernel framework combined into one entity,

and it can enable higher performance and more flexibility for the runtime.

Nautilus I am the primary designer and developer of the Nautilus Aerokernel frame- work. Nautilus is an open source project, freely available online under the MIT license.

Philix I designed and implemented Philix, a toolkit that allows OS developers to boot

their custom OS on Intel’s Xeon Phi. Philix provides an interactive console on the host

machine so that the OS developer can easily see output of their OS running on the Phi.

Philix is built on top of Intel’s MPSS toolchain, and is freely available online.

Example HRTs The following runtimes have been ported to the HRT model, demonstrat-

ing its benefits:

• Legion: The Legion runtime ported to the HRT model outperforms its Linux coun-

terpart by up to 40% on the HPCG mini-app. We also saw speedups on the example

circuit simulator benchmark included in the Legion distribution.

• NESL: The NESL runtime ported to the HRT model showed that we could take

an existing, complete parallel language from a user-space runtime to a kernel with

reasonable effort, and with only a few hundred lines of code.

• NDPC: We co-designed the NDPC language (based on a subset of NESL) to explore

using non-traditional kernel features within a runtime context, such as thread fork.

• Hybridized Racket: With the Multiverse toolchain, I automatically transformed the

Racket runtime into an HRT. Racket has a complex runtime system which includes JIT

compilation and garbage collection. This work showed that not only is automatically

transforming such a complex runtime feasible, but also that its operation as a kernel

is both feasible and beneficial.

Evaluation of Nautilus I carried out a detailed evaluation of the various mechanisms that

Nautilus provides with extensive microbenchmarking. In most cases Nautilus outperforms

Linux by significant margins.

Evaluation of Legion I evaluated our HRT port of Legion using the HPCG mini-app

and Legion’s circuit simulator example application. Just by porting, we saw considerable

speedups for these Legion benchmarks. A simple change allowing control over interrupts

(kernel-mode only access) also allowed non-trivial performance improvement.

Nemo I designed and implemented the Nemo event system within Nautilus. Nemo provides facilities that enjoy orders of magnitude lower event notification latency than what is possible in user-space Linux.

HVM I aided in the design and development of the hybrid virtual machine, which allows

an HRT to be bridged with a ROS with low overheads.

Multiverse I designed and implemented the Multiverse toolchain, which enables the

automatic hybridization of Linux user-space runtimes. I demonstrated the feasibility of

automatic hybridization using the Racket runtime as an example.

Palacios I have contributed to various efforts to the open-source Palacios VMM, listed

below:

• Virtualized DVFS: I aided Shiva Rao in his master's thesis work on virtualizing dynamic voltage and frequency scaling (DVFS) using Palacios. This system allows fine-grained control of the DVFS hardware

during VM exits, leveraging inferred information about guests to make informed

power management decisions.

• Virtual HPET: I implemented a virtual implementation of the High-Precision Event

Timer, a fine-grained platform timer present on most contemporary high-performance

hardware. I added support for the HPET to allow us to run experimental systems

like OSv on Palacios.

• QEMU backend: QEMU provides a rich diversity of virtual devices. This is one contributor to the simplicity of, e.g., the KVM codebase, which leverages these device implementations. I wanted to be able to leverage these devices for Palacios as well. This system implements that functionality with a software bridge between Palacios and QEMU.

• VMM-emulated RTM: This was the first VMM-emulated implementation of the Restricted Transactional Memory (RTM) component of the Intel Transactional Synchronization Extensions (TSX). Its performance is roughly a 60x slowdown relative to Intel's hardware implementation. I implemented this system with Maciej Swiech.

• Palacios on the Cray XK6: I ported the Palacios VMM to run on the Cray XK6 series

of supercomputer nodes. This comprised several bug fixes and enhancements to the

Palacios codebase.

• Other contributions: I have also participated in regular development and maintenance of the Palacios codebase. This includes bug fixes, enhancements to the extension architecture, guest configuration and loading, software interrupt and system call interception, and others.

GEARS Guest Examination and Revision Services (GEARS) is a set of tools that allows

developers to create guest-context virtual services, VMM-based services that extend into

the guest. GEARS is implemented within Palacios, and is described in detail in Appendix B.

Guarded modules Guarded modules extend the concept of guest-context virtual services

by granting them privileged access to hardware and VMM state. Guarded modules protect

this privilege from the rest of the guest by maintaining a software border with compile-time

and run-time techniques. More detail on guarded modules is presented in Appendix C.

9.2 Future Work

While the work in this dissertation has shown considerable promise for the HRT model,

there remain many avenues of research yet to be explored. I outline some potential

directions below.

9.2.1 Hybrid Runtimes

We have shown that the hybrid runtime model can increase performance in a parallel runtime without major changes to the runtime's operation. These increases can be attributed

to the simplification of the underlying mechanisms in the Aerokernel and the elimination

of user/kernel transitions. However, there is still untapped potential for HRTs that I have yet to explore.

New execution models By operating in kernel mode, a parallel runtime can take full advantage of its control over the machine. This allows the runtime to implement new execution models from the ground up, rather than on top of abstractions enforced by a

ROS. I would like to explore new execution models that would not be possible in a ROS.

One example is a data-flow runtime system that eschews the notion of threads entirely,

operating on the basis of asynchronous data-flow events.
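
As a toy sketch of what such a model might look like, the fragment below fires a data-flow node as soon as all of its inputs have arrived, with delivery driven by asynchronous events (for example, interrupt handlers or other nodes) rather than by threads; the types and names are purely illustrative:

    /* A node fires (runs to completion, without blocking) once all inputs arrive. */
    struct dataflow_node {
        int    ninputs;                         /* number of inputs required to fire */
        int    arrived;                         /* inputs received so far            */
        double inputs[8];                       /* input slots                       */
        void (*fire)(struct dataflow_node *n);  /* node body                         */
    };

    /* Called from an event source, e.g., an interrupt handler or another node. */
    void dataflow_deliver(struct dataflow_node *n, int slot, double value)
    {
        n->inputs[slot] = value;
        if (++n->arrived == n->ninputs) {
            n->arrived = 0;
            n->fire(n);  /* the node becomes runnable the instant its data is ready */
        }
    }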

Unconventional hardware uses While I have explored leveraging interrupt control in

the context of Legion and IPIs for events, there are many other hardware features that

could be used in unconventional ways. These include system facilities like page tables,

IOMMUs, floating point structures, debugging mechanisms, and many more.

Comparative study with Unikernels Unikernels, which are single-application, specialized kernels based on Exokernels, have risen to prominence in the research community in parallel with my work on hybrid runtimes. While Unikernels do not focus on parallelism, hybrid runtimes may benefit from some of the lessons learned as Unikernels reach maturity.

Debugging One of the primary drawbacks of hybrid runtimes, and indeed of executing any code with elevated privilege, is the difficulty that arises in debugging. I do not believe this

is a fundamental weakness of HRTs. My work on Multiverse indicates that with the proper

mechanisms, the convenience of a conventional OS environment can be combined with

the high performance of an HRT. I would like to extend the bridging work of HVMs and

Multiverse to the area of debugging.

9.2.2 Nautilus

At this point, Nautilus is still very much a prototype codebase. In addition to stability

improvements, there are several additions to the codebase that I believe will enable further

interesting research.

Multinode While my work on HRTs until now has primarily focused on node-level parallelism (which is becoming more important), the operation of a parallel runtime across

many nodes in a large-scale machine introduces new challenges. I would like to explore ways to exploit Nautilus and the HRT model to improve the operation of parallel runtimes

at massive scales.

File systems Currently, Nautilus does not have support for file systems. File systems,

however, can play a major role in data-intensive parallel runtimes. While Nautilus can

leverage a ROS filesystem via the HVM, I am interested in exploring how the HRT model

can influence file system design. A parallel runtime’s usage of the file system may be very

different from that of applications in general, which may result in a radically different file system

design.

Porting Currently, Nautilus runs on x86_64 and the Xeon Phi. There are several archi-

tectures that, if supported, would provide foundations for further research with Nautilus.

Targets that I am particularly interested in include IBM POWER (e.g. for BlueGene/Q

nodes) and Intel’s newer Knights Landing (KNL) chips.

9.2.3 OS Architectures

There are several aspects of OS architecture that I believe warrant further research. I outline

them below.

HRT-aware ROS Our current implementations of HVM and Multiverse allow bridged

operation between an HRT and ROS. Currently, the ROS needs no modifications for

this setup to work correctly. However, much like a paravirtualized guest in a virtual

environment, it may be advantageous for the ROS to contain HRT-aware components in

order to increase bridged performance or functionality.

Data-parallel kernels Much work has focused on designing kernels to expose task-level parallelism, mainly in the form of threads. While OS kernels typically provide support

for data-parallel (SIMD) instruction set extensions, such as AVX, there are, as far as I

am aware, no kernels designed specifically for data-parallelism, for example to support

data-parallel runtimes such as NESL or data-parallel Haskell. It is an open question whether an OS kernel itself can and should be designed as a data-parallel system.

9.2.4 Languages

Finally, I believe there is considerable room for research in both systems languages and parallel languages, especially for a model like HRT, where the runtime enjoys full access to

hardware. For example, system software has largely been written in C for the past several

decades. While C is a suitable tool for writing programs that require modeling of the

underlying machine, a significant portion of OS code relies on C’s ability to include inline

assembly, especially for privileged mode components and setup of the machine. This presents an opportunity for improvement in language design, especially for languages

used by OS and runtime developers.

References

[1] The computer language benchmarks game. http://benchmarksgame.alioth.debian.org/.

[2] OSU micro-benchmarks. http://mvapich.cse.ohio-state.edu/benchmarks/.

[3] Darren Abramson, Jeff Jackson, Sridhar Muthrasanallur, Gil Neiger, Greg Regnier, Rajesh Sankaran, Ioannis Schoinas, Rich Uhlig, Balaji Vembu, and John Wiegert. Intel virtualization technology for directed I/O. Intel Technology Journal, 10(3), August 2006.

[4] Jack Alford. AIX 6.1 workload partitions. http://www.ibm.com/developerworks/aix/library/au-workload, November 2007.

[5] Eric Allen, David Chase, Joe Hallett, Victor Luchangco, Jan Willem Maessen, Sukyoung Ryu, Guy L. Steele, and Sam Tobin-Hochstadt. The Fortress language specification. Technical Report Version 1.0, Sun Microsystems, March 2008.

[6] George Almási, Ralph Bellofatto, José Brunheroto, Călin Caşcaval, José Castaños, Luis Ceze, Paul Crumley, C. Christopher Erway, Joseph Gagliano, Derek Lieber, Xavier Martorell, José E. Moreira, Alda Sanomiya, and Karin Strauss. An overview of the BlueGene/L system software organization. In Proceedings of the Euro-Par Conference on Parallel and Distributed Computing (EuroPar 2003), August 2003.

[7] AMD Corporation. AMD64 Architecture Programmer’s Manual Volume 2: Systems Programming, May 2013.

[8] Glenn Ammons, Jonathan Appavoo, Maria Butrico, Dilma Da Silva, David Grove, Kiyokuni Kawachiya, Orran Krieger, Bryan Rosenburg, Eric Van Hensbergen, and Robert W. Wisniewski. Libra: A library operating system for a JVM in a virtualized execution environment. In Proceedings of the 3rd International Conference on Virtual Execution Environments (VEE 2007), pages 44–54, June 2007.

[9] Thomas Anderson and Michael Dahlin. Operating Systems: Principles and Practice. Recursive Books, second edition, 2012.

[10] Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy. Scheduler activations: Effective kernel support for the user-level management of parallelism. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP 1991), pages 95–109, October 1991.

[11] Joe Armstrong, Robert Virding, Claes Wikström, and Mike Williams. Concurrent Programming in Erlang. Prentice Hall, second edition, January 1996.

[12] Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. Operating Systems: Three Easy Pieces. Arpaci-Dusseau Books, 0.91 edition, May 2015.

[13] Sven Auhagen, Lars Bergstrom, Matthew Fluet, and John Reppy. Garbage collection for multicore NUMA machines. In Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC 2011), pages 51–57, June 2011.

[14] Chang Bae, John Lange, and Peter Dinda. Enhancing virtualized application performance through dynamic adaptive paging mode selection. In Proceedings of the 8th International Conference on Autonomic Computing (ICAC 2011), June 2011.

[15] Chang Seok Bae, John R. Lange, and Peter A. Dinda. Enhancing virtualized application performance through dynamic adaptive paging mode selection. In Proceedings of the 8th International Conference on Autonomic Computing (ICAC 2011), pages 255–264, June 2011.

[16] Woongki Baek, Chi Cao Minh, Martin Trautmann, Christos Kozyrakis, and Kunle Olukotun. The OpenTM transactional application programming interface. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pages 376–387, September 2007.

[17] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In ACM Symposium on Operating Systems Principles (SOSP), pages 164–177, 2003.

[18] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP 2003), pages 164–177, October 2003.

[19] George H. Barnes, Richard M. Brown, Maso Kato, David J. Kuck, Daniel L. Slotnick, and Richard A. Stokes. The ILLIAC IV computer. IEEE Transactions on Computers, C-17(8):746–757, August 1968.

[20] Thomas W. Barr, Alan L. Cox, and Scott Rixner. Translation caching: Skip, don't walk (the page table). In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA 2010), pages 48–59, June 2010.

[21] Michael Bauer, Sean Treichler, Elliott Slaughter, and Alex Aiken. Legion: Expressing locality and independence with logical regions. In Proceedings of Supercomputing (SC 2012), November 2012.

[22] Andrew Baumann, Paul Barham, Pierre Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP 2009), pages 29–44, October 2009.

[23] Andrew Baumann, Dongyoon Lee, Pedro Fonseca, Lisa Glendenning, Jacob R. Lorch, Barry Bond, Reuben Olinsky, and Galen C. Hunt. Composing OS extensions safely and efficiently with Bascule. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys 2013), pages 239–252, April 2013.

[24] Pete Beckman. Argo: An exascale operating system. http://www.mcs.anl.gov/project/argo-exascale-operating-system.

[25] Adam Belay, Andrea Bittau, Ali Mashtizadeh, David Terei, David Mazières, and Christos Kozyrakis. Dune: Safe user-level access to privileged CPU features. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI 2012), pages 335–348, October 2012.

[26] Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, and Edouard Bugnion. IX: A protected dataplane operating system for high throughput and low latency. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2014), pages 49–65, October 2014.

[27] Lars Bergstrom. Parallel Functional Programming with Mutable State. PhD thesis, The University of Chicago, Chicago, IL, June 2013.

[28] Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, Stephen Rosen, and Adam Shaw. Data-only flattening for nested data parallelism. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2013), pages 81–92, February 2013.

[29] Lars Bergstrom and John Reppy. Nested data-parallelism on the GPU. In Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming (ICFP 2012), pages 247–258, September 2012.

[30] Brian N. Bershad, Stefan Savage, Przemyslaw Pardyak, Emin Gün Sirer, Marc E. Fiuczynski, David Becker, Craig Chambers, and Susan Eggers. Extensibility, safety and performance in the SPIN operating system. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP 1995), pages 267–283, December 1995.

[31] David L Black, David B Golub, Daniel P Julin, Richard F Rashid, Richard P Draves, Randall W Dean, Alessandro Forin, Joseph Barrera, Hideyuki Tokuda, Gerald Malan, et al. Microkernel operating system architecture and Mach. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages 11–30, April 1992.

[32] Guy E. Blelloch. NESL: A nested data-parallel language. Technical Report CMU-CS-92-103, School of Computer Science, Carnegie Mellon University, January 1992.

[33] Guy E. Blelloch, Siddhartha Chatterjee, Jonathan Hardwick, Jay Sipelstein, and Marco Zagha. Implementation of a portable nested data-parallel language. Journal of Parallel and Distributed Computing, 21(1):4–14, April 1994.

[34] Guy E. Blelloch and John Greiner. A provable time and space efficient implementation of NESL. In Proceedings of the International Conference on Functional Programming (ICFP), May 1996.

[35] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing, 37(1):55–69, 1996.

[36] Allen C. Bomberger, William S. Frantz, Ann C. Hardy, Norman Hardy, Charles R. Landau, and Jonathan S. Shapiro. The KeyKOS nanokernel architecture. In Proceedings of the USENIX Workshop on Micro-kernels and Other Kernel Architectures, pages 95–112, April 1992.

[37] Silas Boyd-Wickizer, Haibo Chen, Rong Chen, Yandong Mao, Frans Kaashoek, Robert Morris, Aleksey Pesterev, Lex Stein, Ming Wu, Yuehua Dai, Yang Zhang, and Zheng Zhang. Corey: An operating system for many cores. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI 2008), pages 43–57, December 2008.

[38] Silas Boyd-Wickizer, Austin T. Clements, Yandong Mao, Aleksey Pesterev, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich. An analysis of Linux scalability to many cores. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI 2010), October 2010.

[39] Ron Brightwell, Ron Oldfield, Arthur B. Maccabe, and David E. Bernholdt. Hobbes: Composition and virtualization as the foundations of an extreme-scale OS/R. In Proceedings of the International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2013), pages 2:1–2:8, June 2013.

[40] Ron Brightwell, Kevin Pedretti, and Keith D. Underwood. Initial performance evaluation of the Cray SeaStar interconnect. In Proceedings of the 13th Symposium on High Performance Interconnects (HOTI 2005), pages 51–57, August 2005.

[41] Edouard Bugnion, Scott Devine, and Mendel Rosenblum. Disco: Running commodity operating systems on scalable multiprocessors. In Proceedings of the 16th Symposium on Operating Systems Principles (SOSP 1997), pages 143–156, October 1997.

[42] Edouard Bugnion, Scott Devine, Mendel Rosenblum, Jeremy Sugerman, and Edward Y. Wang. Bringing virtualization to the x86 architecture with the original VMware Workstation. ACM Transactions on Computer Systems, 30(4):12:1–12:51, November 2012.

[43] Towards a new strategy of OS design. GNU's Bulletin, 1(16), January 1994.

[44] Dick Buttlar and Jacqueline Farrell. Pthreads programming: A POSIX standard for better multiprocessing. O'Reilly Media, Inc., 1996.

[45] George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox. Microreboot: A technique for cheap recovery. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2004), pages 31–44, December 2004.

[46] William W. Carlson, Jesse M. Draper, David E. Culler, Kathy Yelick, Eugene Brooks, and Karen Warren. Introduction to UPC and language specification. Technical Report CCS-TR-99-157, IDA Center for Computing Sciences, May 1999.

[47] Richard Carlsson. An introduction to core Erlang. In Proceedings of the PLI 2001 Workshop on Erlang, September 2001.

[48] Manuel Chakravarty, Gabriele Keller, Roman Leshchinskiy, and Wolf Pfannenstiel. Nepal—nested data-parallelism in Haskell. In Proceedings of the 7th International Euro-Par Conference (EUROPAR), August 2001.

[49] Manuel Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, Gabriele Keller, and Simon Marlow. Data parallel Haskell: A status report. In Proceedings of the Workshop on Declarative Aspects of Multicore Programming, January 2007.

[50] Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, Gabriele Keller, and Simon Marlow. Data parallel Haskell: A status report. In Proceedings of the 2007 Workshop on Declarative Aspects of Multicore Programming (DAMP 2007), pages 10–18, January 2007.

[51] Bradford L. Chamberlain, David Callahan, and Hans P. Zima. Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications, 21(3):291–312, August 2007.

[52] Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioğlu, Christoph von Praun, and Vivek Sarkar. X10: An object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2005), pages 519–538, October 2005.

[53] Jeffrey S. Chase, Henry M. Levy, Michael J. Feeley, and Edward D. Lazowska. Sharing and protection in a single address space operating system. ACM Transactions on Computer Systems, 12(4):271–307, November 1994.

[54] David R. Cheriton and Kenneth J. Duda. A caching model of operating system kernel functionality. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI 1994), pages 14:1–14:15, November 1994.

[55] Juan A. Colmenares, Gage Eads, Steven Hofmeyr, Sarah Bird, Miguel Moretó, David Chou, Brian Gluzman, Eric Roman, Davide B. Bartolini, Nitesh Mor, Krste Asanović, and John D. Kubiatowicz. Tessellation: Refactoring the OS around explicit resource containers with continuous adaptation. In Proceedings of the 50th ACM/IEEE Design Automation Conference (DAC 2013), pages 76:1–76:10, May/June 2013.

[56] UPC Consortium. UPC language and library specifications, v1.3. Technical Report LBNL-6623E, Lawrence Berkeley National Lab, November 2013.

[57] F. J. Corbató and V. A. Vyssotsky. Introduction and overview of the Multics system. In Proceedings of the 1965 Fall Joint Computer Conference, Part I (AFIPS 1965), pages 185–196, November 1965.

[58] Zheng Cui, Lei Xia, Patrick G. Bridges, Peter A. Dinda, and John R. Lange. Optimistic overlay-based virtual networking through optimistic interrupts and cut-through forwarding. In Proceedings of Supercomputing (SC 2012), November 2012.

[59] Leonardo Dagum and Ramesh Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Computational Science & Engineering, 5(1):46–55, 1998.

[60] William J. Dally, Roy Davison, J.A. Stuart Fiske, Greg Fyler, John S. Keen, Richard A. Lethin, Michael Noakes, and Peter R. Nuth. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, 12(2):23–39, April 1992.

[61] Raja Das, Joel Saltz, and Alan Sussman. Applying the CHAOS/PARTI library to irregular problems in computational chemistry and computational aerodynamics. In Proceedings of the 1993 Scalable Parallel Libraries Conference, pages 45–56, October 1993.

[62] Raja Das, Mustafa Uysal, Joel Saltz, and Yuan-Shin Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462–478, September 1994.

[63] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI 2004), December 2004.

[64] Luke Deller and Gernot Heiser. Linking programs in a single address space. In Proceedings of the 1999 USENIX Annual Technical Conference (USENIX ATC 1999), pages 283–294, June 1999.

[65] Robert H. Dennard, Fritz H. Gaensslen, Hwa-Nien Yu, V. Leo Rideout, Ernest Bassous, and Andre R. LeBlanc. Design of ion-implanted MOSFET’s with very small physical dimensions. IEEE Journal of Solid-State Circuits, 9(5):256–268, October 1974.

[66] Benoit des Ligneris. Virtualization of Linux based computers: the Linux-VServer project. In Proceedings of the 19th International Symposium on High Performance Computing Systems and Applications (HPCS 2005), pages 340–346, May 2005.

[67] Jack Dongarra and Michael A. Heroux. Toward a new metric for ranking high performance computing systems. Technical Report SAND2013-4744, University of Tennessee and Sandia National Laboratories, June 2013.

[68] Kevin Elphinstone and Gernot Heiser. From L3 to seL4—what have we learnt in 20 years of L4 microkernels? In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP 2013), pages 133–150, November 2013.

[69] Dawson R. Engler. The Exokernel Operating System Architecture. PhD thesis, Massachusetts Institute of Technology, October 1998.

[70] Dawson R. Engler and M. Frans Kaashoek. Exterminate all operating system abstractions. In Proceedings of the 5th Workshop on Hot Topics in Operating Systems (HotOS 1995), pages 78–83, May 1995.

[71] Dawson R. Engler, M. Frans Kaashoek, and James O'Toole, Jr. Exokernel: An operating system architecture for application-level resource management. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP 1995), pages 251–266, December 1995.

[72] Dror G. Feitelson and Larry Rudolph. Gang scheduling performance benefits for fine-grain synchronization. Journal of Parallel and Distributed Computing, 16(4):306–318, December 1992.

[73] Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi, Eli Barzilay, Jay McCarthy, and Sam Tobin-Hochstadt. The Racket Manifesto. In 1st Summit on Advances in Programming Languages (SNAPL 2015), volume 32 of Leibniz International Proceedings in Informatics (LIPIcs), pages 113–128, 2015.

[74] Kurt B. Ferreira, Patrick Bridges, and Ron Brightwell. Characterizing application sensitivity to OS interference using kernel-level noise injection. In Proceedings of Supercomputing (SC 2008), November 2008.

[75] Kurt B. Ferreira, Patrick G. Bridges, Ron Brightwell, and Kevin T. Pedretti. Impact of system design parameters on application noise sensitivity. Journal of Cluster Computing, 16(1), March 2013.

[76] Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, and Whay S. Lee. The M-Machine multicomputer. In Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), pages 146–156, November 1995.

[77] Matthew Flatt and PLT. The Racket reference—version 6.1. Technical Report PLT-TR-2010-1, August 2014.

[78] Matthew Fluet, Mike Rainey, John Reppy, and Adam Shaw. Implicitly-threaded parallelism in Manticore. In Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming (ICFP 2008), pages 119–130, September 2008.

[79] Matthew Fluet, Mike Rainey, John Reppy, and Adam Shaw. Implicitly threaded parallelism in Manticore. In Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming (ICFP), September 2008.

[80] Matthew Fluet, Mike Rainey, John Reppy, Adam Shaw, and Yingqi Xiao. Manticore: A heterogeneous parallel language. In Proceedings of the 2007 Workshop on Declarative Aspects of Multicore Programming (DAMP 2007), pages 37–44, January 2007.

[81] Matthew Fluet, Mike Rainey, John Reppy, Adam Shaw, and Yingqi Xiao. Manticore: A heterogeneous parallel language. In Proceedings of the Workshop on Declarative Aspects of Multicore Programming, January 2007.

[82] Bryan Ford, Godmar Back, Greg Benson, Jay Lepreau, Albert Lin, and Olin Shivers. The Flux OSKit: A substrate for kernel and language research. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP 1997), pages 14–19, October 1997.

[83] Steve Friedl. Go directly to jail: Secure untrusted applications with chroot. Linux Magazine, December 2002.

[84] Gerald Fry and Richard West. On the integration of real-time asynchronous event handling mechanisms with existing operating system services. In Proceedings of the 2007 International Conference on Embedded Systems and Applications (ESA 2007), June 2007.

[85] Alexandros V. Gerbessiotis and Leslie G. Valiant. Direct bulk-synchronous parallel algorithms. Journal of Parallel and Distributed Computing, 22(2):251–267, August 1994.

[86] Mark Giampapa, Thomas Gooding, Todd Inglett, and Robert W. Wisniewski. Experiences with a lightweight supercomputer kernel: Lessons learned from Blue Gene's CNK. In Proceedings of Supercomputing (SC 2010), November 2010.

[87] Abel Gordon, Nadav Amit, Nadav Har’El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, and Dan Tsafrir. ELI: Bare-metal performance for I/O virtualization. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012), pages 411–422, March 2012.

[88] OpenACC Working Group et al. The OpenACC application programming interface, version 1.0. http://www.openacc.org/sites/default/files/OpenACC.1.0_0.pdf, November 2011.

[89] P. H. Gum. System/370 extended architecture: Facilities for virtual machines. IBM Journal of Research and Development, 27(6):530–544, November 1983.

[90] Kyle C. Hale and Peter A. Dinda. Guarded modules: Adaptively extending the VMM’s privilege into the guest. In Proceedings of the 11th International Conference on Autonomic Computing (ICAC 2014), pages 85–96, June 2014.

[91] Kyle C. Hale, Lei Xia, and Peter A. Dinda. Shifting GEARS to enable guest-context virtual services. In Proceedings of the 9th International Conference on Autonomic Computing (ICAC 2012), pages 23–32, September 2012.

[92] Steven Hand, Andrew Warfield, Keir Fraser, Evangelos Kostovinos, and Dan Magenheimer. Are virtual machine monitors microkernels done right? In Proceedings of the 10th Workshop on Hot Topics on Operating Systems (HotOS 2005), June 2005.

[93] Per Brinch Hansen. The nucleus of a multiprogramming system. Communications of the ACM, 13(4):238–241, April 1970.

[94] Tim Harris, Martin Maas, and Virendra J. Marathe. Callisto: Co-scheduling parallel runtime systems. In Proceedings of the 9th ACM European Conference on Computer Systems (EuroSys 2014), pages 24:1–24:14, April 2014.

[95] Hermann Härtig, Michael Hohmuth, Jochen Liedtke, Jean Wolter, and Sebastian Schönberg. The performance of µ-kernel-based systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP 1997), pages 66–77, October 1997.

[96] Gernot Heiser, Kevin Elphinstone, Jerry Vochteloo, and Stephen Russell. The Mungi single-address-space operating system. Software: Practice and Experience, 28(9):901– 928, July 1998.

[97] Michael A. Heroux, Jack Dongarra, and Piotr Luszczek. HPCG technical specification. Technical Report SAND2013-8752, Sandia National Laboratories, October 2013.

[98] High Performance Fortran Forum. High Performance Fortran language specification, version 2.0. Technical report, Center for Research on Parallel Computation, Rice University, January 1996.

[99] Dan Hildebrand. An architectural overview of QNX. In Proceedings of the USENIX Workshop on Micro-kernels and Other Kernel Architectures, April 1992.

[100] William Daniel Hillis. The Connection Machine. PhD thesis, Massachusetts Institute of Technology, June 1985.

[101] Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine. Characterizing the influence of system noise on large-scale applications by simulation. In Proceedings of Supercomputing (SC 2010), November 2010.

[102] Mark Horowitz, Margaret Martonosi, Todd C. Mowry, and Michael D. Smith. Informing memory operations: Providing memory performance feedback in modern processors. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA 1996), pages 260–270, May 1996.

[103] Galen C. Hunt and James R. Larus. Singularity: Rethinking the software stack. SIGOPS Operating Systems Review, 41(2):37–49, April 2007.

[104] InfiniBand Trade Association. InfiniBand Architecture Specification: Release 1.0. InfiniBand Trade Association, 2000.

[105] Intel Corporation. Intel Xeon Phi Coprocessor Developer's Quick Start Guide Version 1.7.

[106] Intel Corporation. Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual, September 2012.

[107] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3 (3A, 3B & 3C): System Programming Guide, February 2014.

[108] Intel Corporation. Intel Xeon Phi Coprocessor System Software Developer's Guide, March 2014.

[109] Erik Johansson, Mikael Pettersson, and Konstantinos Sagonas. A high performance Erlang system. In Proceedings of the 2nd ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming (PPDP 2000), pages 32–43, September 2000.

[110] Ryan Johnson, Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos, Kivanc Sabirli, Anastassia Ailamaki, and Babak Falsafi. To share or not to share? In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007), pages 351–362, September 2007.

[111] Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, and Babak Falsafi. Shore-MT: A scalable storage manager for the multicore era. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT 2009), pages 24–35, March 2009.

[112] M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Hector M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. Application performance and flexibility on Exokernel systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP 1997), pages 52–65, October 1997.

[113] Hartmut Kaiser, Maciej Brodowicz, and Thomas Sterling. ParalleX: An advanced parallel execution model for scaling-impaired applications. In Proceedings of the 38th International Conference on Parallel Processing Workshops (ICPPW 2009), pages 394–401, September 2009.

[114] Laxmikant V. Kalé, Balkrishna Ramkumar, Amitabh Sinha, and Attila Gursoy. The Charm parallel programming language and system: Part II–the runtime system. Technical Report 95-03, Parallel Programming Laboratory, University of Illinois at Urbana-Champaign, 1994.

[115] Laxmikant V. Kalé, Balkrishna Ramkumar, Amitabh Sinha, and Attila Gursoy. The Charm parallel programming language and system: Part I–description of language features. Technical Report 95-02, Parallel Programming Laboratory, University of Illinois at Urbana-Champaign, 1995.

[116] Poul-Henning Kamp and Robert N. M. Watson. Jails: Confining the omnipotent root. In Proceedings of the 2nd International System Administration and Networking Conference (SANE 2000), May 2000.

[117] Stephen W. Keckler, Andrew Chang, Whay S. Lee, Sandeep Chatterjee, and William J. Dally. Concurrent event handling through multithreading. IEEE Transactions on Computers, 48(9):903–916, September 1999.

[118] Suzanne M. Kelly and Ron Brightwell. Software architecture of the light weight kernel, Catamount. In Proceedings of the 2005 Cray User Group Meeting (CUG 2005), May 2005.

[119] Kemal Ebcioğlu, Vijay Saraswat, and Vivek Sarkar. X10: Programming for hierarchical parallelism and non-uniform data access. In Proceedings of the 3rd International Workshop on Language Runtimes, October 2004.

[120] Ken Kennedy, Charles Koelbel, and Hans Zima. The rise and fall of High Performance Fortran: An historical object lesson. In Proceedings of the 3rd ACM SIGPLAN Conference on History of Programming Languages (HOPL 3), pages 7:1–7:22, June 2007.

[121] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. KVM: the Linux virtual machine monitor. In Proceedings of the Linux Symposium, pages 225–230, June 2007.

[122] Avi Kivity, Dor Laor, Glauber Costa, Pekka Enberg, Nadav Har’El, Don Marti, and Vlad Zolotarov. OSv—optimizing the operating system for virtual machines. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 2014), June 2014.

[123] Orran Krieger, Marc Auslander, Bryan Rosenburg, Robert W. Wisniewski, Jimi Xenidis, Dilma Da Silva, Michal Ostrowski, Jonathan Appavoo, Maria Butrico, Mark Mergen, Amos Waterland, and Volkmar Uhlig. K42: Building a complete operating system. In Proceedings of the 1st ACM European Conference on Computer Systems (EuroSys 2006), pages 133–145, April 2006.

[124] Alexander Kudryavtsev, Vladimir Koshelev, Boris Pavlovic, and Arutyun Avetisyan. Modern HPC cluster virtualization using KVM and Palacios. In Proceedings of the Workshop on Cloud Services, Federation, and the 8th Open Cirrus Summit (FederatedClouds 2012), pages 1–9, September 2012.

[125] Jean J. Labrosse. µC/OS-II the real-time kernel: User's manual. https://doc.micrium.com/display/osiidoc?preview=/10753158/20644170/100-uC-OS-II-003.pdf, 2015. Accessed: 2016-09-07.

[126] Menno Lageman. Solaris containers—what they are and how to use them. Sun BluePrints OnLine, May 2005.

[127] Monica S. Lam and Martin C. Rinard. Coarse-grain parallel programming in Jade. In Proceedings of the 3rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 1991), pages 94–105, April 1991.

[128] John Lange, Kevin Pedretti, Peter Dinda, Patrick Bridges, Chang Bae, Philip Soltero, and Alexander Merritt. Minimal overhead virtualization of a large scale supercomputer. In Proceedings of the 7th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE 2011), pages 169–180, March 2011.

[129] John Lange, Kevin Pedretti, Trammell Hudson, Peter Dinda, Zheng Cui, Lei Xia, Patrick Bridges, Andy Gocke, Steven Jaconette, Mike Levenhagen, and Ron Brightwell. Palacios and Kitten: New high performance operating systems for scalable virtualized and native supercomputing. In Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.

[130] John R. Lange, Peter Dinda, Kyle C. Hale, and Lei Xia. An introduction to the Palacios virtual machine monitor—version 1.3. Technical Report NWU-EECS-11-10, Department of Electrical Engineering and Computer Science, Northwestern University, November 2011.

[131] Christopher Lauderdale and Rishi Khan. Towards a codelet-based runtime for exascale computing. In Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (EXADAPT 2012), pages 21–26, March 2012.

[132] Walter Lee, Rajeev Barua, Matthew Frank, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, and Saman Amarasinghe. Space-time scheduling of instruction-level parallelism on a raw machine. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 1998), pages 46–57, October 1998.

[133] Bo Li, Jianxin Li, Tianyu Wo, Chunming Hu, and Liang Zhong. A VMM-based system call interposition framework for program monitoring. In Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2010), pages 706–711, December 2010.

[134] Jochen Liedtke. Improving IPC by kernel design. In Proceedings of the 14th ACM Symposium on Operating Systems Principles (SOSP 1993), pages 175–188, December 1993.

[135] Jochen Liedtke. A persistent system in real use—experiences of the first 13 years. In Proceedings of the 3rd International Workshop on Object-Orientation in Operating Systems, pages 2–11, December 1993.

[136] Jochen Liedtke. On micro-kernel construction. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP 1995), pages 237–250, December 1995.

[137] Jiuxing Liu, Wei Huang, Bulent Abali, and Dhabaleswar Panda. High performance VMM-bypass I/O in virtual machines. In Proceedings of the USENIX Annual Technical Conference, May 2006.

[138] Rose Liu, Kevin Klues, Sarah Bird, Steven Hofmeyr, Krste Asanović, and John Kubiatowicz. Tessellation: Space-time partitioning in a manycore client OS. In Proceedings of the 1st USENIX Conference on Hot Topics in Parallelism (HotPar 2009), pages 10:1–10:6, March 2009.

[139] Ewing Lusk, Nathan Doss, and Anthony Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789–828, September 1996.

[140] Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David Scott, Balraj Singh, Thomas Gazagnaire, Steven Smith, Steven Hand, and Jon Crowcroft. Unikernels: Library operating systems for the cloud. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2013), pages 461–472, March 2013.

[141] Chao Mei, Gengbin Zheng, Filippo Gioachin, and Laxmikant V. Kalé. Optimizing a parallel runtime system for multicore clusters: A case study. In Proceedings of the 2010 TeraGrid Conference (TG 2010), pages 12:1–12:8, August 2010.

[142] Paul B Menage. Adding generic process containers to the Linux kernel. In Proceedings of the Linux Symposium, pages 45–58, June 2007.

[143] John Merlin and Anthony Hey. An introduction to High Performance Fortran. Scientific Programming, 4(2):87–113, April 1995.

[144] Timothy Merrifield and H. Reza Taheri. Performance implications of extended page tables on virtualized x86 processors. In Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE 2016), pages 25–35, April 2016.

[145] Jeffrey C Mogul, Andrew Baumann, Timothy Roscoe, and Livio Soares. Mind the gap: reconnecting architecture and os research. In Proceedings of the 13th Workshop on Hot Topics in Operating Systems (HotOS 2011), May 2011.

[146] Allen B. Montz, David Mosberger, Sean W. O’Malley, Larry L. Peterson, and Todd A. Proebsting. Scout: A communications-oriented operating system. In Proceedings of the 5th Workshop on Hot Topics in Operating Systems (HotOS 1995), pages 58–61, May 1995.

[147] Gil Neiger, Amy Santoni, Felix Leung, Dion Rodgers, and Rich Uhlig. Intel virtualization technology: Hardware support for efficient processor virtualization. Intel Technology Journal, 10(3), August 2006.

[148] Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-Machine multicomputer: An architectural evaluation. In Proceedings of the 20th Annual International Symposium on Computer Architecture (ISCA 1993), pages 224–235, May 1993.

[149] Robert W. Numrich and John Reid. Co-array Fortran for parallel programming. SIGPLAN Fortran Forum, 17(2):1–31, August 1998.

[150] NVIDIA Corporation. Dynamic parallelism in CUDA, December 2012.

[151] NVIDIA Corporation. CUDA C Programming Guide—Version 6.0, February 2014.

[152] NVIDIA Corporation. CUDA Driver API—Version 6.0, February 2014.

[153] NVIDIA Corporation. CUDA Runtime API—Version 6.0, February 2014.

[154] Jiannan Ouyang, Brian Kocoloski, John Lange, and Kevin Pedretti. Achieving performance isolation with lightweight co-kernels. In Proceedings of the 24th International ACM Symposium on High Performance Parallel and Distributed Computing (HPDC 2015), June 2015.

[155] Yoshinori K. Okuji, Bryan Ford, Erich Stefan Boleyn, and Kunihiro Ishiguro. The multiboot specification—version 1.6. Technical report, Free Software Foundation, Inc., 2010.

[156] Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang. The case for a single-chip multiprocessor. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 7), pages 2–11, October 1996.

[157] Jiannan Ouyang, Brian Kocoloski, John R. Lange, and Kevin Pedretti. Achieving performance isolation with lightweight co-kernels. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2015), pages 149–160, June 2015.

[158] Heidi Pan, Benjamin Hindman, and Krste Asanović. Composing parallel software efficiently with Lithe. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2010), pages 376–387, June 2010.

[159] Simon Peter and Thomas Anderson. Arrakis: A case for the end of the empire. In Proceedings of the 14th Workshop on Hot Topics in Operating Systems (HotOS 2013), pages 26:1–26:7, May 2013.

[160] Simon Peter, Jialin Li, Irene Zhang, Dan R.K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2014), pages 1–16, October 2014.

[161] Simon Peyton Jones. Harnessing the multicores: Nested data parallelism in Haskell. In Proceedings of the 6th Asian Symposium on Programming Languages and Systems (APLAS 2008), December 2008.

[162] Gerald J. Popek and Robert P. Goldberg. Formal requirements for virtualizable third generation architectures. Communications of the ACM, 17(7):412–421, July 1974.

[163] Donald E. Porter, Silas Boyd-Wickizer, Jon Howell, Reuben Olinsky, and Galen C. Hunt. Rethinking the library OS from the top down. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2011), pages 291–304, March 2011.

[164] Qumranet Corporation. Kvm - kernel-based virtual machine. Technical report, 2006. KVM has been incorporated into the mainline Linux kernel codebase.

[165] Himanshu Raj and Karsten Schwan. High performance and scalable I/O virtualization via self-virtualized devices. In Proceedings of the 16th IEEE International Symposium on High Performance Distributed Computing (HPDC 2007), July 2007.

[166] Paul Regnier, George Lima, and Luciano Barreto. Evaluation of interrupt handling timeliness in real-time Linux operating systems. ACM SIGOPS Operating Systems Review, 42(6):52–63, October 2008.

[167] John Reppy, Claudio V. Russo, and Yingqi Xiao. Parallel concurrent ML. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP 2009), pages 257–268, August 2009.

[168] Digital Research. Concurrent CP/M operating system: System guide. http://www.cpm.z80.de/manuals/ccpmsg.pdf, 1984. Accessed: 2016-09-07.

[169] Barret Rhoden, Kevin Klues, David Zhu, and Eric Brewer. Improving per-node efficiency in the datacenter with new OS abstractions. In Proceedings of the 2nd ACM Symposium on Cloud Computing (SoCC 2011), pages 25:1–25:8, October 2011.

[170] Martin C. Rinard, Daniel J. Scales, and Monica S. Lam. Jade: A high-level, machine-independent language for parallel programming. IEEE Computer, 26(6):28–38, June 1993.

[171] Timothy Roscoe. Linkage in the Nemesis single address space operating system. ACM SIGOPS Operating Systems Review, 28(4):48–55, October 1994.

[172] M. Rozier, V. Abrossimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Herrmann, C. Kaiser, S. Langlois, P. Léonard, and W. Neuhauser. CHORUS distributed operating systems. Computing Systems, 1(4):305–370, Fall 1988.

[173] Fabian Scheler, Wanja Hofer, Benjamin Oechslein, Rudy Pfister, Wolfgang Schröder-Preikschat, and Daniel Lohmann. Parallel, hardware-supported interrupt handling in an event-triggered real-time operating system. In Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES 2009), pages 167–174, December 2009.

[174] Adrian Schüpbach, Simon Peter, Andrew Baumann, and Timothy Roscoe. Embracing diversity in the Barrelfish manycore operating system. In Proceedings of the Workshop on Managed Many-core Systems (MMCS 2008), June 2008.

[175] Jonathan S. Shapiro and Norman Hardy. EROS: A principle-driven operating system from the ground up. IEEE Software, 19(1):26–33, January 2002.

[176] Adam Michael Shaw. Implementation Techniques for Nested-Data-Parallel Languages. PhD thesis, The University of Chicago, Chicago, IL, August 2011.

[177] Taku Shimosawa, Balazs Gerofi, Masamichi Takagi, Gou Nakamura, Tomoki Shirasawa, Yuji Saeki, Masaaki Shimizu, Atsushi Hori, and Yutaka Ishikawa. Interface for heterogeneous kernels: A framework to enable hybrid OS designs targeting high performance computing on manycore architectures. In Proceedings of the IEEE International Conference on High Performance Computing (HiPC 2014), December 2014.

[178] Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. Operating System Concepts. Wiley, ninth edition, 2012.

[179] Jim Smith and Ravi Nair. Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann, first edition, 2005.

[180] Udo Steinberg and Bernhard Kauer. NOVA: A microhypervisor-based secure virtualization architecture. In Proceedings of the 5th ACM European Conference on Computer Systems (EuroSys 2010), pages 209–222, April 2010.

[181] Alan Sussman and Joel Saltz. A manual for the multiblock PARTI runtime primitives, revision 4. Technical Report UMIACS-TR-93-36, University of Maryland Institute for Advanced Computer Studies, College Park, Maryland, 1993.

[182] J. Swaine, K. Tew, P. Dinda, R. Findler, and M. Flatt. Back to the futures: Incremental parallelization of existing sequential runtime systems. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2010), October 2010.

[183] James Swaine. Incremental Parallelization of Existing Sequential Runtime Systems. PhD thesis, Northwestern University, Evanston, Illinois, June 2014.

[184] James Swaine, Burke Fetscher, Vincent St-Amour, Robert Bruce Findler, and Matthew Flatt. Seeing the futures: Profiling shared-memory parallel Racket. In Proceedings of the 1st ACM SIGPLAN Workshop on Functional High-performance Computing (FHPC 2012), September 2012.

[185] James Swaine, Kevin Tew, Peter Dinda, Robert Bruce Findler, and Matthew Flatt. Back to the futures: Incremental parallelization of existing sequential runtime systems. In Proceedings of the 2nd ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2010), pages 583–597, October 2010.

[186] Andrew S. Tanenbaum and Herbert Bos. Modern Operating Systems. Pearson, fourth edition, 2014.

[187] Kevin Tew, James Swaine, Matthew Flatt, Robert Findler, and Peter Dinda. Places: Adding message passing parallelism to Racket. In Proceedings of the 2011 Dynamic Languages Symposium (DLS 2011), October 2011.

[188] Chandramohan A. Thekkath and Henry M. Levy. Hardware and software support for efficient exception handling. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 1994), pages 110–119, October 1994.

[189] Sean Treichler, Michael Bauer, and Alex Aiken. Language support for dynamic, hierarchical data partitioning. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2013), pages 495–514, October 2013.

[190] Hung-Wei Tseng and Dean M. Tullsen. Data-triggered threads: Eliminating redundant computation. In Proceedings of the 17th International Symposium on High Performance Computer Architecture (HPCA 2011), pages 181–192, February 2011.

[191] Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser. Active messages: a mechanism for integrating communication and computation. In Proceedings of the 25th Annual International Symposium on Computer Architecture (ISCA 1998), pages 430–440, July 1998.

[192] Carl Waldspurger. Memory resource management in VMware ESX Server. In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation (OSDI), 2002.

[193] John Wawrzynek, David Patterson, Mark Oskin, Shih-Lien Lu, Christoforos Kozyrakis, James C. Hoe, Derek Chiou, and Krste Asanovi´c. Ramp: Research accelerator for multiple processors. IEEE Micro, 27(2):46–57, March 2007.

[194] Kyle B Wheeler, Richard C Murphy, and Douglas Thain. Qthreads: An API for programming with millions of lightweight threads. In Proceedings of the 22nd International Symposium on Parallel and Distributed Processing (IPDPS 2008), April 2008.

[195] Robert W. Wisniewski, Todd Inglett, Pardo Keppel, Ravi Murty, and Rolf Riesen. mOS: An architecture for extreme-scale operating systems. In Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2014), pages 2:1–2:8, June 2014.

[196] Lei Xia, Zheng Cui, John Lange, Yuan Tang, Peter Dinda, and Patrick Bridges. VNET/P: Bridging the cloud and high performance computing through fast overlay networking. In Proceedings of 21st International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2012), pages 259–270, June 2012.

[197] Karim Yaghmour. Adaptive domain environment for operating systems. http://www.opersys.com/ftp/pub/Adeos/adeos.pdf.

[198] Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. The datacenter needs an operating system. In Proceedings of the 3rd USENIX Conference on Hot Topics in Cloud Computing (HotCloud 2011), pages 17:1–17:5, August 2011.

[199] Gerd Zellweger, Simon Gerber, Kornilios Kourtis, and Timothy Roscoe. Decoupling cores, kernels, and operating systems. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2014), pages 17–31, October 2014.

Appendix A

OS Issues as Stated by Legion

Runtime Developers

Below is a list of features and requests for OSes received from the Legion runtime developers. They give a sense for some of the problems with building a runtime atop a generalized

OS interface. Each block delineates comments from a different developer.

Developer 1

Some quick thoughts:

1. All operations should come in non-blocking form (i.e. in the traditional

“non-blocking” definition used by OSes - return E_AGAIN rather than blocking), ideally with an optional semaphore/futex/whatever that the OS

can use to signal when an operation has a non-zero chance of success (i.e.

when it’s worth re-attempting the request).

2. Mechanisms for pinning memory that can be used by multiple hardware

devices need to be improved. Some of this is on the GPU/NIC/etc. drivers

but anything the OS can do to help would be nice.

3. The overhead of SIGSEGV handling is preventing us from trying any mmap()

shenanigans to implement lazy load and/or copy-on-write semantics for

instances. A lower-overhead way to trap those accesses and monkey with

page tables would be great.

4. An OS-level API for letting runtimes acquire hardware resources (e.g.

CPU cores, pinnable memory) would be the way we’d want to interact

with other runtimes. When we “own” the resource, we have the whole

thing, but we can scale our usage up/down in a fine-grained way (i.e.

better than cpuset wrappers on the command line and OMP_NUM_THREADS

environment variables). There’s a Berkeley project that worked on this

about 5 years ago that I’m blanking on the name of.¹

5. More detailed affinity information. Dynamic power information on cores/memories. Neither of these need to be perfect—higher fidelity is obviously

better.

6. For deep memory hierarchies, we just need the low-level control of allocation and data movement. For resilience, an API that: a. allows the HW/OS

to let the runtime/app know about faults (with as much information as

is available) b. allows the runtime/app to tell the HW/OS if/how it was

handled, and whether or not the OS (or the HW) should pull the fire alarm.

¹He is almost certainly referring to Akaros [198, 169], which I discussed in Section 8.1.1.

7. Just low-level control of all the physical resources in the system (processors, memories, caches, etc.). We’d rather do the virtualization ourselves

(modulo the answer to #2 above).

Developer 2

1. From a runtime developer’s perspective, the way that current operating

systems manage resources is fundamentally broken. For example, pinned

memory right now is handled by the OS in Linux, which means that it is

very hard for us in Legion to create a pinned allocation of memory that is

visible to both the CUDA and Network drivers (either IB or Cray) because

they both want to make the system call that does the pinning. Instead I

think a better model is to move resource allocation up into user space so the

runtime can make it visible with a nice abstraction (Legion mapper) using

an approach similar to how exokernels work (http://pdos.csail.mit.

edu/exo/). The OS should provide safety for resources, but management

and policy should be available to both the runtime and ultimately to the

application instead of being baked into the OS implementation the way it

is today.

2. Similar in spirit to what I mentioned above, control data structures like OS

page tables should be available to the runtime system. On the long-term

list of interesting things to do with Legion is to use something like Dune

(http://dune.scs.stanford.edu/) to manipulate the virtual to physical

mapping of pages to support things like copy-on-write instances of physical regions. It would also allow us to do things like de-duplicate copies of

data (especially for ghost cell regions), as long as we knew that regions

were being accessed with read-only privileges. Again this fits into the idea

that like the runtime system, the OS should provide mechanism and not

policy like it does today.

3. Error handling and fault detection in operating systems today are fundamentally broken for distributed systems. If there is data corruption

detected on a checksum for a packet sent over the interconnect, which

node handles that, source or destination? Furthermore, assuming error de-

tection gets significantly better, using signals to interrupt processes every

time a fault is detected just seems broken. Also, the way signal handlers

run today aren’t really designed for multi-threaded systems and so they

employ a stop-the-world approach that halts all threads within a process.

In a world where we think of memory as one big heap that is probably

the only way to do things. However, in a world where the runtime knows

about which tasks are accessing which regions, the runtime will only want

to restart tasks accessing corrupted regions without ever halting other

tasks. In this way, something like Legion could be better at hiding the

latency of handling faults as well, assuming the OS doesn’t stop the whole

world.

4. Consistent with the ideas above, when it comes to scheduling the OS

should provide mechanism and not policy. Threads should be scheduled

directly onto hardware, with maybe only very coarse grained thread

scheduling to guarantee forward progress. The application should be

able to easily specify things like priorities and hints about where to place

threads onto hardware cores. The application should also have the ability

to pin a thread to a core and never give it up. At most, the OS should

assume that it can own one core in order to handle system calls. The

application should always be able to claim and own the rest of the cores

for performance reasons.

In summary, one of the main ideas of Legion is to provide mechanism and not

policy, instead allowing application and architecture specific mappers to make

these kinds of decisions. If the OS has any policy decisions baked into it, then

Legion can’t make the full range of mapping decisions available to the mapper

and that will limit performance. Just like the runtime system, it is the OS’s job

to provide mechanism and stay out of the way of the mappers.

While some of these requests could likely be fulfilled by modifying or tweaking an existing

OS like Linux, the broad range of complaints hints at an underlying issue. Designers of a

runtime with such specific constraints on performance and functionality need more control

of the machine. Ease of use is not an issue for them; they simply need to extract everything

from the hardware that they can. If the existing interfaces do not provide this, something

needs to change. This change is what I propose to enact within Nautilus.

Appendix B

GEARS

This chapter describes the Guest Examination and Revision Services (GEARS) framework

that we developed for the Palacios VMM. This work was published in the proceedings

of the 9th International Conference on Autonomic Computing in 2012 [91]. In particular we will look at the design, implementation, and evaluation of GEARS. I will begin by

describing the notion of guest-context virtual services below. Then I will describe the design,

implementation, and evaluation of GEARS (Section B.2). I will then outline two services we built using GEARS, an accelerated overlay networking component (Section B.3), and

an MPI accelerator (Section B.4). Finally, I will draw conclusions and discuss how GEARS

fits into the broader picture of my thesis work (Section B.5).

The GEARS system is a prototype to further the argument that the implementation of

VMM-based virtual services for a guest OS should extend into the guest itself, even without

its cooperation. Placing service components directly into the guest OS or application can

reduce implementation complexity and increase performance. VMM code running directly

within the guest OS or application without its cooperation can considerably simplify the design

and implementation of services because the services can then directly manipulate aspects

of the guest from within the guest itself. Further, these kinds of services can eliminate

many overheads associated with costly exits to the VMM, improving their performance.

Finally, extending a service into a guest enables new types of services that were not previously possible or that are prohibitively difficult to implement solely in the context of the VMM. I

refer to virtual services that can span the VMM, the guest kernel, and the guest application,

as guest-context virtual services.

When a VMM utilizes its higher privilege level to enable or enhance functionality,

optimize performance, or otherwise modify the behavior of the guest in a favorable

manner, it is said to provide a service to the guest. VMM-based services are usually provided transparently to the guest without its knowledge (e.g. via a virtual device). We

now consider several example services that can profit from GEARS.

Overlay networking acceleration An important service for many virtualized comput-

ing environments is an overlay networking system that provides fast, efficient network

connectivity among a group of VMs and the outside world, regardless of where the VMs

are currently located. Such an overlay can also form the basis of an adaptive/autonomic

environment. A prominent challenge for overlay networks in this context is achieving low

latency and high throughput, even in high performance settings, such as supercomputers

and next-generation data centers. I will show in Section B.3 how GEARS can enhance an

existing overlay networking system with a guest-context component.

MPI acceleration MPI is the most widely used communication interface for distributed

memory parallel computing. In an adaptive virtualized environment, two VMs running

an application communicating using MPI may be co-located on a single host. Because

the MPI library has no way of knowing this, it will use a sub-optimal communication path between them. Our MPI acceleration service can detect such cases and automatically

convert message passing into memory copies and/or memory ownership transfers. I will

discuss the design of this service in Section B.4.

Procrustean services While administrators can install services or programs on guests

already, this task must be repeated many times. Furthermore, because the administrators

of guests and those of the provider may not be the same people, providers may execute

guests that are not secure. GEARS functionality permits the creation of services that would

automatically deploy security patches and software updates on a provider’s guests.

B.1 Guest-context Virtual Services

Services that reside within the core of the VMM have the disadvantage of relying on the

mechanism by which control is transferred to the VMM. A VMM typically does not run

until an exceptional situation arises, such as the execution of a privileged instruction (a

direct call to the VMM in the case of paravirtualization) or the triggering of external or

software interrupts. Much like in an operating system, the transition to the higher privilege

level, called an exit, introduces substantial overhead. Costly exits remain one of the most prohibitive obstacles to achieving high-performance virtualization1.

1Gordon et al., with the ELI system [87], ameliorate this problem on x86 by employing a shadow interrupt descriptor table (IDT) that vectors assigned interrupts directly to the guest.

Eliminating these exits can, thus, improve performance considerably. The motivation is similar to minimizing costly system calls to OS code in user-space processes. Modern Linux implementations, for example, provide a mechanism called virtual system calls, in which the OS maps a read-only page into every process's address space on start-up. This page contains code that implements commonly used services and obviates the need to

switch into kernel space. If the implementation of a VMM service is pushed up into the

guest in a similar manner, more time is spent in direct execution of guest code rather than

costly invocations of the VMM. This is precisely what GEARS accomplishes.
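As a small illustration of this mechanism (this is not part of GEARS, and it assumes a Linux guest with glibc), a process can observe the kernel-mapped page directly through the auxiliary vector:

#include <stdio.h>
#include <sys/auxv.h>

int main(void)
{
    /* The kernel maps the vDSO, a read-only page of kernel-provided code,
       into every process and passes its address in the auxiliary vector. */
    unsigned long vdso = getauxval(AT_SYSINFO_EHDR);

    if (vdso)
        printf("vDSO mapped at %#lx; calls such as gettimeofday() can be "
               "serviced there without a kernel entry\n", vdso);
    else
        printf("no vDSO reported on this system\n");
    return 0;
}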

Moving components of a service implementation into the guest can not only improve performance, but also enable services that would otherwise not be feasible. In particular,

guest-context services have a comprehensive view of the state of the guest kernel and

application. These services can make more informed decisions than those implemented

in a VMM core, which must make many indirect inferences about guest state. The VMM

must reconstruct high-level operations based on the limited information that the guest

exposes architecturally. Moreover, in order to manipulate the state of a guest kernel or

application, the VMM must make many transformations from the low-level operations

that the guest exposes to high-level operations that affect guest execution. While services

certainly exist that can accomplish this transformation, their implementation takes on a

more elegant design by operating at the same semantic level as the guest components they

intend to support.

GEARS employs a tiered approach, which involves both the host and the VMM, to

inject and run services in guest context. The process is outlined in Figure B.1. Users (service

developers) provide standard C code for the VMM without needing extensive knowledge

of VMM internals. This makes the procedure of implementing a service straight-forward,

enabling rapid development. The code provided is a service implementation, split into

two clearly distinguishable parts. We refer to these as the top-half and the bottom-half. The

top-half is the portion of the service that runs in the guest-context. The bottom-half, which

may not always be present, resides within a host kernel module readily accessible to the

Figure B.1: GEARS services are broken into two parts; one exists in the guest and the other in the VMM. GEARS takes both parts provided as source code and uses several utilities to register and manage execution of the service.

VMM. The top-half calls into its respective bottom-half if it requires its functionality. The

bottom-half can similarly invoke the top-half, allowing for a two-way interaction between

the service components.
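To make the two-way split concrete, the sketch below shows the general shape of a top half invoking its bottom half through a service-specific hypercall, as the services in Sections B.3 and B.4 do. The hypercall number, register convention, and fallback function here are hypothetical placeholders rather than the actual GEARS interface.

#include <stdint.h>

#define GEARS_SVC_HCALL 0x6000UL   /* hypothetical service hypercall number */

int handle_locally(uint64_t req);  /* hypothetical guest-side fallback path */

/* Issue a hypercall to the bottom half: the number goes in %rax and one
   argument in %rbx; the bottom half returns a status in %rax. */
static inline uint64_t gears_hypercall(uint64_t nr, uint64_t arg)
{
    uint64_t ret;
    asm volatile("vmmcall" : "=a"(ret) : "a"(nr), "b"(arg) : "memory");
    return ret;
}

/* Top-half entry point: ask the bottom half to handle a request, falling
   back to a purely guest-side path if it declines. */
int service_request(uint64_t req)
{
    if (gears_hypercall(GEARS_SVC_HCALL, req) == 0)
        return 0;                  /* the bottom half handled it */
    return handle_locally(req);    /* otherwise handle it in the guest */
}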

The code for the top-half must adhere to a guest-specific format. However, GEARS provides host-resident utilities that transform the code appropriately. Hence, from the

user’s perspective, writing the top-half of a guest-context service implies little more

requisite knowledge than the ability to write a normal program or kernel module for the

guest in question. GEARS simply provides the transformation utilities appropriate for that

guest. If the service requires access to, or assistance from the VMM core, the developer

can design a bottom half by writing a host kernel module that implements the relevant

interface. Detailed discussion of bottom halves can be found in [91].

The notion of leaving service implementations entirely up to the user allows a clean

separation between the framework and the services that it enables. GEARS provides

the necessary tools to create cross-layer services, and users are responsible for using this platform to design innovative guest-VMM interactions.

B.2 GEARS Design and Evaluation

I will now outline the design and evaluation of the GEARS framework, independent of

any services built on top of it. Readers interested in implementation details are directed

to [91].

To enable guest-context services, the VMM must provide a mechanism that can place

components directly within the guest. GEARS implements this mechanism with the code

injection tool. Further, the VMM must have a way to select an appropriate time at which

to perform this placement. The GEARS system call interception utility provides one way

of accomplishing this. Finally, in order for the VMM to control guest execution paths at a

higher level, e.g. for library calls, it must have the ability to modify the environment passed

to the process. GEARS achieves this with process environment modification. Because

these three conditions alone can facilitate the creation of guest-context services, we claim

that GEARS provides the necessary and sufficient tools to accomplish this task. After

describing our testbed for this work, I will describe in more detail how we satisfy these

three conditions.

Experimental setup Before proceeding, it is prudent to describe the hardware and software setup for the experiments described in the following sections. We performed all

experiments on AMD 64-bit hardware. We primarily used two physical machines for our

testbed.

• 2GHz quad-core AMD Opteron 2350 with 2GB memory, 256KB L1, 2MB L2, and 2MB

L3 caches. We refer to this machine as vtest.

• 2.3GHz 2-socket, quad-core (8 cores total) AMD Opteron 2376 with 32GB of RAM,

512KB L1, 2MB L2, and 6MB L3 caches, called lewinsky.

Both of these machines had Fedora 15 installed, with Linux kernel versions 2.6.40 and

2.6.42, respectively. The guests we used in our testbed ran Linux kernel versions 2.6.38.

All experiments were run using the Palacios VMM configured for nested paging.

System call interception System call interception allows a VMM to monitor the activity

of the guest at a fine granularity. Normally, system calls are not exceptional events

from the standpoint of the VMM. However, system calls are commonly triggered using

software interrupts, which modern hardware allows the VMM to intercept. GEARS can also use more advanced techniques to intercept system calls made via the newer SYSCALL instruction. Once the VMM can track the execution of system calls, it can provide a wide

range of services to the guest, such as sanity checking arguments to sensitive kernel code

or matching system call patterns as in [133]. Figure B.2 shows the overhead introduced by

selective system call exiting (the technique we use to intercept the SYSCALL instruction) for

the getpid system call. This particular call is one of the simpler system calls in Linux, so

these numbers represent the fixed cost introduced by system call interception. The row

labeled guest represents a standard guest with no GEARS extensions.

Strategy           Latency (µs)
guest              4.26
guest+intercept    4.51

Figure B.2: Average system call latency for getpid system call using selective exiting.

The guest+intercept

row indicates a GEARS-enabled guest using system call exiting. In this case, no system

calls are selected to exit, so this overhead is the overhead paid by all system calls. Notice

that the overhead introduced is only a small fraction of a microsecond. More details on the

selective system call exiting technique can be found in the GEARS paper.
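The fixed cost in Figure B.2 can be reproduced with a simple timing loop of the following form; this sketch is not the exact harness we used, but it measures the same quantity.

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#define ITERS 100000L

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < ITERS; i++)
        syscall(SYS_getpid);       /* force a real kernel entry each time */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9 +
                (end.tv_nsec - start.tv_nsec);
    printf("average getpid latency: %.3f us\n", ns / ITERS / 1000.0);
    return 0;
}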

Process environment modification System call interception enables the VMM to essen-

tially see the creation of every process in the guest through calls to execve, for example.

The interception is done before the system call even starts, so the VMM has the option

to modify the guest’s memory at this point. One useful thing it can do is modify the

environment variables that the parent process passes on to its child.

There are certain environment variables that are particularly useful. One is the

LD_PRELOAD variable, which indicates that a custom shared library should be given prece-

dence over the one originally indicated. This variable gives the VMM an opportunity to

directly modify the control flow of a guest application. Other interesting environment variables affecting control flow include LD_BIND_NOW and LD_LIBRARY_PATH. The very fact

that the Linux kernel itself uses the environment to pass information to the process (e.g. with the AT_SYSINFO variable) opens up a broad range of interesting possibilities.
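As an aside, the kind of interposition that LD_PRELOAD enables is easy to see in a few lines; the following sketch (the wrapped function is just an example) binds ahead of libc and forwards to the original implementation:

/* preload.c: build with   gcc -shared -fPIC -o preload.so preload.c -ldl
   and activate with       LD_PRELOAD=./preload.so <program>             */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count)
{
    /* Look up the "real" write in the next object on the search order. */
    static ssize_t (*real_write)(int, const void *, size_t);
    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))
                         dlsym(RTLD_NEXT, "write");

    /* A guest-context service could inspect or redirect the call here. */
    return real_write(fd, buf, count);
}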

Environment variables can not only be modified, but also, with careful treatment of

memory, added or removed. This allows the VMM to provide information directly to

the guest application without the need for paravirtualization. The guest OS requires no

cognizance of the underlying VMM. Instead, VMM-awareness can vary on an application-by-application basis.

Component                    Lines of Code
System Call Interception     833
Environment Modification     683
Code Injection               915
Total                        2431

Figure B.3: Implementation complexity for GEARS and its constituent components.

This allows developers to make rapid optimizations to employ VMM

services. As will become clear, these developers can even implement their own VMM

service with minimal effort. Notice the marked divergence from the usual reliance of user-space applications on the operating system's ABI.

Code injection Code injection is perhaps the most unique mechanism in the GEARS

framework. Because it allows the VMM to run arbitrary code in the context of the guest without any cooperation or knowledge on the part of the guest OS or application, it is the

core tool enabling guest-context services.

We employ two types of code injection—user-space and kernel-space. User-space

injection allows the VMM to map a piece of trusted code into the address space of a user-

space process. On exits, the VMM can invoke this code manually or redirect user-space

function calls to it by patching the process binary image.

GEARS can also inject code into a guest that will dynamically link with the libraries

mapped into the applications’ address spaces. Our current example services do not utilize

this GEARS feature.

The other type of injection is kernel-space code injection, which relies on the ability to

inject code into a user-space process. Injected kernel code must currently be implemented

in a kernel module compiled for the guest. We used user-space code injection to write the

module into the guest file system and subsequently insert it into the guest kernel.

While the existing GEARS implementation is targeted at Linux guests, it consists of

relatively few components, each of which rely on features provided almost universally

by modern OSes and architectures. This means that porting GEARS for other kinds of

guests entails no great effort. Figure B.3 shows the size of the GEARS codebase. Each

component is relatively compact. GEARS is currently implemented as a set of extensions

to the Palacios VMM, and would likely become even more compact if integrated into the

hypervisor core.

B.3 VNET/P Accelerator

VNET/P is an overlay networking system with a layer 2 abstraction implemented inside

the Palacios VMM [196]. It currently achieves near-native performance in both the 1 Gbps and 10 Gbps switched networks common in clusters today, as well as on more advanced interconnects [58] like Infiniband [104] and Cray's SeaStar [40], with support for even faster networks expected in the future. At the time we wrote the GEARS paper in 2012,

VNET/P could achieve, with a fully encapsulated data path, 75% of the native throughput with 3-5x the native latency between directly connected 10 Gbps machines2. We used

GEARS tools to further improve this performance.

The throughput and latency overheads of VNET/P are mostly due to guest/VMM

context switches, and data copies or data ownership transfers. We can reduce the number

of context switches, and the volume of copies or transfers, by shifting more of the VNET/P

data path into the guest itself. In the limit, the entire VNET/P data path can execute in the

guest with guarded privileged access to the underlying hardware.

2More recent numbers can be found in the VNET/P SC paper [58].

Figure B.4: Implementation of the prototype VNET/P Accelerator.

In this work [91], we

explored an initial step towards this goal that does not involve privileged access and aims

at reducing the latency overhead as a proof-of-concept.

Figure B.4 illustrates this initial proof-of-concept implementation of the VNET/P

Accelerator Service. In the baseline VNET/P data path, shown on the left, raw Ethernet packets sent from a VM through a virtual NIC are encapsulated and forwarded within

the VMM and sent via a physical NIC. In the VNET/P Accelerator data path, shown

on the right, the encapsulation and forwarding functionality of VNET/P resides within

the guest as part of the guest’s device driver for the virtual NIC. This augmented device

driver kernel module is uncooperatively inserted into the guest kernel using the GEARS

code injection tool. The augmented driver then delivers Ethernet frames containing the

encapsulated packets to the virtual NIC. In our implementation, the driver that has been

augmented is the Linux virtio NIC driver. The backend virtio NIC implementation in

Palacios has no changes; it is simply bridged to the physical NIC.

Component                                Lines of Code
vnet-virtio kernel module (Top Half)     329
vnet bridge (Bottom Half)                150
Total                                    479

Figure B.5: Implementation complexity of prototype VNET/P Accelerator Service. The complexity given is the total number of lines of code that were changed. The numbers indicate that few changes are necessary to port VNET/P functionality into a Linux virtio driver module.

Benchmark            Native       VNET/P       VNET/P Accel
Latency (min)        0.082 ms     0.255 ms     0.205 ms
Latency (avg)        0.204 ms     0.475 ms     0.459 ms
Latency (max)        0.403 ms     2.787 ms     2.571 ms
Throughput (UDP)     922 Mbps     901 Mbps     905 Mbps
Throughput (TCP)     920 Mbps     890 Mbps     898 Mbps

Figure B.6: VNET/P Accelerator results.

The implementation complexity of the proof-of-concept VNET/P accelerator is shown

in Figure B.5, which illustrates that few changes are needed to split VNET/P functionality

into a top half and a bottom half. The control plane of VNET/P remains in the bottom half

in the VMM; only the encapsulation and forwarding elements move into the top half that

GEARS injects into the guest.

Figure B.6 depicts the performance of the initial, proof-of-concept VNET/P Accelerator.

Here, the round-trip latency and throughput was measured between a VM running on the

vtest machine and a VM running on an adjacent machine that does not use the accelerator.

We measured latency using ping with 1000 round-trips. The throughputs were measured

using ttcp, where both TCP and UDP throughput are reported. We ran ttcp with a 6400

byte buffer, 10000 packets sent, and a standard 1500 byte MTU. We compared accelerated

VNET/P with standard VNET/P and native performance between the two host machines without virtualization or overlays.

The VNET/P Accelerator achieves the same bandwidth as VNET/P, and both are as

close to native as possible given that encapsulation is used. The VNET/P Accelerator

achieves a modest improvement in latency compared to VNET/P (20% minimum, 3%

average, 8% maximum).

B.4 MPI Accelerator

Consider an MPI application executing within a collection of VMs that may migrate due to

decisions made by an administrator, an adaptive computing system, or for other reasons.

The result of such migrations, or even initial allocation, may be that two VMs are co-located

on the same host machine. However, the MPI application and the MPI implementation

itself are oblivious to this, and will thus employ regular network communication primitives when an MPI process located in one VM communicates with an MPI process in the other.

VNET/P will happily carry this communication, but performance will be sub-optimal.

Fundamentally, the communication performance in such cases is limited to the main

memory copy bandwidth. Ideally, matching MPI send and receive calls on the two VMs would operate at this bandwidth. We assume here that the receiver touches all of the

data. If that is not the case, the performance limit could be even higher because copy-

on-write techniques might apply. Our MPI Accelerator service performs precisely this

transformation of MPI sends and receives between co-located VMs into memory copy

operations.

Building such an MPI Accelerator purely within the VMM would be extremely chal-

lenging because MPI send and receive calls are library routines that indirectly generate system calls and ultimately cause guest device driver interactions with the virtual hardware the VMM provides. It is these virtual hardware interactions that the VMM sees. In

order to implement an MPI Accelerator service, it would be necessary to reconstruct the

lost semantics of MPI operation. The ability to discern the MPI semantics from the guest

application is the key enabler of our MPI Accelerator implementation.

GEARS provides two essential tools that the MPI Accelerator service leverages: (a) user

space code injection, and (b) process environment modification (Section B.2). At any point

during VM execution, the service uses (a) to inject and run a program that creates a file

in the VM. The file contains a shared library that is an LD_PRELOAD wrapper for MPI. The

system then uses (b) to force exec()s of processes to use the LD_PRELOAD wrapper. This can

be limited to specific executables by name if desired. The wrapper installs itself between

the processes and the MPI shared library such that MPI calls bind to the wrapper. The wrapper, which constitutes the top half of the service can then decide how to process each

MPI call in coordination with the bottom half of the service that resides in the VMM. The

top and bottom halves communicate using service-specific hypercalls.

In our prototype implementation, illustrated in Figure B.7, we focused on the blocking

MPI_Send and MPI_Recv calls. The top half intercepts the appropriate MPI calls as follows:

• MPI_Init() : After normal initialization processing in MPI, this call also notifies the

bottom half of this MPI process, including its name, arguments, and other parameters.

It registers the process for consideration with the service.

• MPI_Finalize(): Before normal de-initialization in MPI, this call notifies the bottom

half that the process can be unregistered.

• MPI_Comm_rank(): After a normal ranking in MPI, this call notifies the bottom half of the process's rank.

Figure B.7: Implementation of the MPI Accelerator service for co-located VMs. This illustrates the fast path between an MPI_Send and its matching MPI_Recv.

• MPI_Send(): The wrapper checks to see if this is an MPI_Send() that the bottom half

can implement. If it is not, it hands it to the MPI library. If it is, it touches each page

of the data to assure it is faulted in, and then hands the send request to the bottom

half and waits for it to complete the work. If the bottom half asserts that it cannot,

the wrapper defaults to the MPI library call.

• MPI_Recv(): This is symmetric to MPI_Send().
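A heavily simplified sketch of such a top-half wrapper is shown below. The hypercall number, its argument convention, and the page-touching loop are hypothetical simplifications; the real wrapper also performs the registration steps listed above and passes additional matching information to the bottom half.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <mpi.h>

#define MPI_ACCEL_SEND_HCALL 0x6001UL   /* hypothetical hypercall number */

static inline uint64_t accel_hypercall(uint64_t nr, uint64_t a, uint64_t b)
{
    uint64_t ret;
    asm volatile("vmmcall" : "=a"(ret) : "a"(nr), "b"(a), "c"(b) : "memory");
    return ret;
}

int MPI_Send(const void *buf, int count, MPI_Datatype dt,
             int dest, int tag, MPI_Comm comm)
{
    static int (*real_send)(const void *, int, MPI_Datatype, int, int, MPI_Comm);
    if (!real_send)
        real_send = (int (*)(const void *, int, MPI_Datatype, int, int, MPI_Comm))
                        dlsym(RTLD_NEXT, "MPI_Send");

    int elem;
    MPI_Type_size(dt, &elem);
    size_t bytes = (size_t)elem * (size_t)count;

    /* Touch each page so the buffer is faulted in before the bottom half
       copies it (simplified; alignment handling is omitted). */
    for (size_t off = 0; off < bytes; off += 4096)
        (void)((volatile const char *)buf)[off];

    /* Ask the bottom half to take the match-copy-release fast path; fall
       back to the normal MPI library if it cannot handle this send. */
    if (accel_hypercall(MPI_ACCEL_SEND_HCALL,
                        (uint64_t)(uintptr_t)buf, (uint64_t)bytes) == 0)
        return MPI_SUCCESS;

    return real_send(buf, count, dt, dest, tag, comm);
}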

Our implementation is intended as a proof of concept demonstrating the utility of

GEARS tools and advancing the overall argument of this work. Nonetheless, it also performs quite well. We have run the OSU MPI Latency benchmark [2] (osu_latency)

between two co-located VMs using VNET/P, VNET/P with the GEARS tools enabled in

Palacios, and with the MPI Accelerator active. We used the MPICH2 MPI library [139] for our measurements. Our performance measurements were taken on the lewinsky machine, previously described.

Figure B.8: Performance of the MPI Accelerator Service on the OSU MPI Latency benchmark running in two co-located VMs on the lewinsky test machine. For small messages, it achieves a 22 µs latency, limited by system call interception and hypercall overheads. For large messages, the MPI accelerator approaches the maximum possible performance given the memory copy bandwidth of the machine (4.6 GB/s).

The results are shown in Figure B.8. It is important to note that for

larger message sizes, the message transfer time is dominated by the machine’s memory

bandwidth. According to the STREAM benchmark, the machine has a main memory copy

bandwidth of 4.6 GB/s. Our results suggest that we approach this—specifically, the MPI

latency for 4 MB messages implies a bandwidth of 4 GB/s has been achieved. For small

messages the MPI latency is approximately 22 µs (about 50,000 cycles). The small message

latency is limited by system call interception, exit/entry, and hypercall overheads. Here,

GEARS selective system call interception was not enabled. Using it further reduces the overhead for small messages.

Component                      Lines of Code
Preload Wrapper (Top Half)     345
Kernel Module (Bottom Half)    676
Total                          1021

Figure B.9: Implementation complexity of MPI Accelerator.

The figure also shows the performance of using VNET/P for this co-located VM

scenario. The “VNET/P” curve illustrates the performance of VNET/P without any

GEARS features enabled. Without system call interception overheads, we see that VNET/P

achieves a 56 µs latency for small messages and the large message latency is limited due

to a transfer bandwidth of a respectable 500 MB/s. The “VNET/P+Gears” curve depicts

VNET/P with the GEARS features enabled and illustrates the costs of non-selective system

call interception. The small message latency grows to 150 µs, while the large message

latency is limited due to a transfer bandwidth of 250 MB/s. In contrast to these, the MPI

Accelerator Service, based on GEARS, achieves 1/3 the latency and 8 times the bandwidth,

approaching the latency limits expected due to the hypercall processing and the bandwidth

limits expected due to the system’s memory copy bandwidth. Note that the impact of

GEARS system call interception on the MPI Accelerator’s small message latency is much

smaller than its impact on VNET/P. This is not a discrepancy. With the MPI Accelerator, far

fewer system calls are made per byte transferred because the injected top-half intercepts

each MPI library call before it can turn into multiple system calls.

Figure B.9 illustrates that the service implementation is quite compact. The GEARS

tools are the primary reason for the service's feasibility and compactness.

B.5 Conclusions

GEARS is a set of tools that enable the creation of guest-context virtual services which span

the VMM, the guest kernel and the guest application. I showed with an implementation within the Palacios VMM that the complexity of these tools is tractable, suggesting that

they could be implemented in other VMMs without great effort. GEARS in Palacios allows

developers to write VMM services with relatively little knowledge of VMM internals.

Further, I showed that the implementations of the services themselves can remain relatively

compact while still delivering substantial performance or functionality improvements.

GEARS is an example of changing well-established system software interfaces for reasons of performance, functionality, or reduced complexity. It is a testament to

the power of specialized runtime systems with direct support from the underlying system

software layer. In this case the support resided between the hypervisor layer and the

application/guest kernel layer. The design and implementation of the GEARS system

informed the development of my ideas regarding the drawbacks of generalized (and

restricted) operating system interfaces.

One interesting aspect of guest-context virtual services is their boundaries with the rest

of the guest. How difficult are these boundaries to identify? Can we secure them? The

next chapter presents the results of my investigation of these questions in my work on

guarded modules—the follow-up work to GEARS.

Appendix C

Guarded Modules

In this chapter, I will first give a high-level overview of guarded modules and their motivation. This work was published in the proceedings of the 11th International Conference

on Autonomic Computing in 2014 (co-located with USENIX ATC) [90]. I will then de-

scribe the trust model we assume (Section C.1) and the design and implementation of

the guarded module system (Section C.2). I will then present an evaluation of guarded

modules (Section C.3). Next, I will discuss the design and evaluation of two examples we built using the guarded module system, a selectively privileged PCI passthrough NIC

(Section C.4), and selectively privileged access to mwait (Section C.5). Finally, I will draw

conclusions and discuss connections to my thesis work (Section C.6).

By design, a VMM does not trust the guest OS and thus does not allow it access to privileged hardware or VMM state. However, such access can allow new or better services

for the guest, such as the following examples.

• Direct guest access to I/O devices can allow existing guest drivers to be used, avoid

the need for virtual devices, and accelerate access when the device could be dedicated

to the guest. In existing systems, the VMM limits the damage that a rogue guest could

inflict by only using self-virtualizing devices [137, 165] or by operating in contexts

such as HPC environments, where the guest is trusted and often runs alone [128].

• Direct guest access to the Model-Specific Registers (MSRs) that control dynamic

voltage and frequency scaling (DVFS) would allow the guest’s adaptive control

of these features to be used instead of the VMM’s whenever possible. Because

applications running on the guest enjoy access to richer information than the

VMM does, there is reason to believe that guest-based control would perform better.

• Direct guest access to instructions that can halt the processor, such as monitor and

mwait, can allow more efficient idle loops and spinlocks when the VMM determines

that such halts can be permitted given the current configuration.

Since we cannot trust the guest OS, to create such services we must be able to place

a component into the guest that is both tightly coupled with the guest and yet protected

from it. We leverage GEARS, discussed in the previous section, to do exactly this. Here, we extend GEARS to allow for injected code to be endowed with privileged access to

hardware and the VMM that the VMM selects, but only under specific conditions that preclude the rest of the guest from taking advantage of the privilege. We refer to this privileged, injected code as a guarded module, which is effectively a protected guest-context

virtual service (described in Section B.1).

Our technique leverages compile-time and link-time processing which identifies valid

entry and exit points in the module code, including function pointers. These points are

in turn “wrapped” with automatically generated stub functions that communicate with

the VMM. Our current implementation of this technique applies to Linux kernel modules.

The unmodified source code of the module is the input to the implementation, while the

output is a kernel object file that includes the original functionality of the module and the wrappers. Conceptually, a guarded module has a border, and the wrapper stubs (and their

locations) identify the valid border crossings between the guarded module, which is trusted,

and the rest of the kernel, which is not.

A wrapped module can then be injected into the guest using GEARS or added to

the guest voluntarily. The wrapper stubs and other events detected by the VMM drive

the second component of our technique, a state machine that executes in the VMM. An

initialization phase determines whether the wrapped module has been corrupted and where it has been loaded, and then protects it from further change. Attempted border

crossings, either via the wrapper functions or due to interrupt/exception injection, are

caught by the VMM and validated. Privilege is granted or revoked on a per-virtual

core basis. Components of the VMM that implement privilege changes are called back

through a standard interface, allowing the mechanism for privilege granting/revoking to

be decoupled from the mechanism for determining when privilege should change. The privilege policy is under the ultimate control of the administrator, who can determine the

binding of specific guarded modules with specific privilege mechanisms.

C.1 Trust and Threat Models; Invariants

We assume a completely untrusted guest kernel. A developer will add to the VMM selective privilege mechanisms that are endowed with the same level of trust as the rest of the core

VMM codebase. A module developer will assume that the relevant mechanism exists. The

determination of whether a particular module is allowed access to a particular selective privilege mechanism is made at run-time by an administrator. The central relationship we

are concerned with is between the untrusted guest kernel and the module. A compilation process transforms the module into a guarded module. This then interacts with run-time

components to maintain specific invariants in the face of threats from the guest kernel.

These invariants are described below.

Control-flow integrity The key invariant we provide is that the privilege on a given virtual core will be enabled if and only if that virtual core is executing within the code

of the guarded module and the guarded module was entered via one of a set of specific,

agreed-upon entry points. The privilege is disabled whenever control flow leaves the

module, including for interrupts and exceptions.

The guarded module retains the ability to interact freely with the rest of the guest kernel.

In particular, it can call other functions and access other data within the guest. A given

call stack might intertwine guarded module and kernel functions, but the system guards

against attacks on the stack as part of maintaining the invariant.

A valid entry into the guarded module is not checked further. Our system does not

guard against an attack based on function arguments or return values, namely Iago attacks.

The module author needs to validate these himself. Note, however, that the potential

damage of performing this validation incorrectly is limited to the specific privilege the

module has.

Code integrity Disguising the module’s code is not a goal of our system. The guest kernel

can read and even write the code of the guarded module. However, any modifications

of the code by any virtual core will be caught and the privilege will be disabled for the

remainder of the module’s lifetime in the kernel. The identity of the module is determined

by its content, and module insertion is initiated external to the guest with a second

identifying factor, guarding against the kernel attempting to spoof or replay a module

insertion.

Data integrity Data integrity, beyond the registers and the stack, is managed explicitly

by the module. The module can request private memory as a privilege. On a valid entry,

the memory is mapped and usable, while on departing the module, the memory is

unmapped and rendered invisible and inaccessible to the rest of the kernel.

C.2 Design and Implementation

The specific implementation of guarded modules I describe here applies to Linux kernel

modules. Our implementation fits within the context of Palacios and takes advantage

of code generation and linking features of the GCC and GNU binutils toolchains. The

VMM-based elements leverage functionality commonplace in modern VMMs, and thus

could be readily ported to other VMMs. The code generation and linking aspects of our

implementation seem to us to be feasible in any C toolchain that supports ELF or a similar

format. The technique could be applicable to other guest kernels, although we do assume

that the guest kernel provides runtime extensibility via some form of load-time linking.

In our implementation, a guarded Linux kernel module can either be voluntarily

inserted by the guest or involuntarily injected into the guest kernel using GEARS. The

developer of the module needs to target the specific kernel he wants to deploy on, exactly

as in creating a Linux kernel module in general.

Figure C.1: Guarded module big picture.

Figure C.1 illustrates the run-time structure of our guarded module implementation,

and documents some of the terminology we use. The guarded module is a kernel module within the guest Linux kernel that is allowed privileged access to the physical hardware or

to the VMM itself. The nature of this privilege, which I will describe later, depends on the

specifics of the module. I refer to the code boundary between the guarded module and the

rest of the guest kernel as the border.

Border crossings consist of control flow paths that traverse the border. A border-out is a

traversal from the module to the rest of the kernel, of which there are three kinds. The first,

a border-out call occurs when a kernel function is called by the guarded module, while the

second, a border-out ret, occurs when we return back to the rest of the kernel. The third, a

border-out interrupt occurs when an interrupt or exception is dispatched. A border-in is a

traversal from the rest of the kernel to the guarded module. There are similarly three forms

here. The first, a border-in call consists of a function call from the kernel to a function within

the guarded module, while the second, a border-in ret consists of a return from a border-out

call, and the third, a border-in rti consists of a return from a border-out interrupt. Valid

border-ins should raise privilege, while border-outs should lower privilege. Additionally,

any attempt to modify the module should lower privilege.

The VMM contains a new component, the border control state machine, that determines whether the guest has privileged access at any point in time. The state machine also

implements a registration process in which the injected guarded module identifies itself

to the VMM and is matched against validation information and desired privileges. This

allows the administrator to decide which modules, by content, are allowed which priv-

ileges. After registration, the border control state machine is driven by hypercalls from

the guarded module, exceptions that occur during the execution of the module, and by

interrupt or exception injections that the VMM is about to perform on the guest.

The VMM detects attempted border crossings jointly through its interrupt/exception

mechanisms and through hypercalls in special code added to the guarded module as part

of our compilation process. Figure C.2 illustrates how the two interact.

Compile-time Our compilation process, Christoization1, automatically wraps an existing

kernel module with new code needed to work with the rest of the system. Our toolchain

that implements this is essentially a guarded module compiler. Two kinds of wrappers are

generated. Exit wrappers are functions that interpose on the calls from the guarded module to the rest of the kernel, while entry wrappers interpose on calls from the rest of the kernel into the module. Figure C.3 shows an example entry wrapper. The important thing to note about our compilation process is that the wrappers serve to drive the guarded module runtime component, described next. More details on compilation can be found in the original paper [90].

1Named after the famed conceptual artist, Christo, who was known for wrapping large objects such as buildings and islands in fabric.

Figure C.2: Guarded modules, showing operation of wrappers and interaction of state machine on border crossings.

entry_wrapped:
    popq  %r11
    pushq %rax                      # preserve %rax across the hypercall
    movq  $border_in_call, %rax     # (a)
    vmmcall
    popq  %rax
    callq entry                     # call the wrapped entry point
    pushq %rax                      # preserve the return value across the hypercall
    movq  $border_out_ret, %rax     # (b)
    vmmcall
    popq  %rax
    pushq %r11
    ret                             # to rest of kernel

Figure C.3: An entry wrapper for a valid entry point. Exit wrappers are similar, except they invoke border out on a call, and border in after returning. In this case, (a) invokes the VMM to perform validation and raise privilege, while (b) invokes the VMM to save state for later checks, lower privilege, and return to the guest kernel.

Run-time The run-time element of our system is based around the border control state machine. As Figure C.2 illustrates, the state machine is driven by hypercalls originating from the wrappers in the guarded module, and by events that are raised elsewhere in the VMM. As a side-effect of the state machine's execution, it generates callbacks to other

components of the VMM that implement specific privilege changes, notifying them when valid privilege changes occur. The state machine also handles the initialization of a guarded

module and its binding with these other parts of the VMM. We refer to the collective effects

of these hypercalls and subsequent callback invocations as border crossings; they are transitions between the privileged and unprivileged states. As depicted in Figure C.4, there

are several types of border crossings. I will not discuss their details here, but interested

readers can refer to the guarded module paper [90].
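A minimal sketch of the state machine's core transition logic is shown below; the event names mirror Figure C.4, while the types and callback interface are hypothetical simplifications of what Palacios actually implements.

/* Events that drive the border control state machine (cf. Figure C.4). */
typedef enum {
    BORDER_IN_CALL, BORDER_IN_RET, BORDER_IN_IRET,    /* should raise privilege */
    BORDER_OUT_CALL, BORDER_OUT_RET, BORDER_OUT_INT,  /* should lower privilege */
    ILLEGAL_ACTIVITY                                  /* revoke privilege */
} border_event_t;

typedef enum { UNPRIVILEGED, PRIVILEGED } border_state_t;

/* Hypothetical callbacks registered by a privilege-implementing component
   (e.g., the selectively privileged PCI passthrough device of Section C.4). */
struct priv_ops {
    void (*raise)(void *data);
    void (*lower)(void *data);
    void *data;
};

static border_state_t
border_transition(border_state_t cur, border_event_t ev, struct priv_ops *ops)
{
    switch (ev) {
    case BORDER_IN_CALL:      /* entry point and stack validation elided */
    case BORDER_IN_RET:
    case BORDER_IN_IRET:
        if (cur == UNPRIVILEGED)
            ops->raise(ops->data);
        return PRIVILEGED;
    default:                  /* all border-outs and illegal activity */
        if (cur == PRIVILEGED)
            ops->lower(ops->data);
        return UNPRIVILEGED;
    }
}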

Each border crossing into the module has more overhead than its associated border-out.

This is because we must perform validations of the stack to ensure control flow integrity

(discussed in Section C.1). The performance implications of this validation are discussed in the next section.

Figure C.4: Border Control.

It is worth noting that the entry wrapper shown in Figure C.3 and the exit wrappers are

linked such that they are only invoked on border crossings. Calls internal to the guarded

module do not have any additional overhead. The same applies for calls internal to the

kernel.

The state machine detects suspicious activity by noting privilege changing hypercalls

at invalid locations, shadow or nested page faults indicating attempts to write the module

code, and stack validation failures. Our default behavior is simply to lower privilege when

these occur, and continue execution. Other reactions are, of course, possible.

C.3 Evaluation

We now consider the costs of the guarded module system, independent of any specific

guarded module that might drive it, and any selective privilege-enabled VMM component

it might drive. We focus on the costs of border crossings and their breakdown. The most

important contributors to the costs are VM exit/entry handling and the stack validation mechanism.

Figure C.5: Privilege change cost with stack integrity checks.

All measurements were conducted on a Dell PowerEdge R415. This is a dual-socket

machine, each socket comprising a quad-core, 2.2 GHz AMD Opteron 4122, giving a total

of 8 physical cores. The machine has 16 GB of memory. It runs Fedora 15 with a stock

Fedora 2.6.38 kernel. Our guest environment uses a single virtual core that runs a BusyBox

environment based on Linux kernel 2.6.38. The guest runs with nested paging, using 2 MB page mappings, with DVFS control disabled.

Figure C.5 illustrates the overheads in cycles incurred at runtime. All cycle counts were averaged over 1000 samples. There are five major components to the overhead.

The first is the cost of initiating a callback to lower or raise privilege. This cost is very

small at around 100 cycles. The second cost, labeled “hypercall handling”, denotes the

cycles spent inside the hypercall handler itself, not including entry validations, privilege

changes, or other processing involved with a VM exit. This cost is also quite small, and

also typically under 100 cycles. “Entry point lookup” represents the cost of a hash table

lookup, which is invoked on border-ins when the instruction pointer is checked against

the valid entry points that have been registered during guarded module initialization. The

cost for this lookup is roughly 240 cycles. “Exit handling” is the time spent in the VMM

handling the exit outside of guarded module runtime processing. This is essentially the

common overhead incurred by any VM exit. Finally, “stack checking” denotes the time

spent ensuring control-flow integrity by validating the stack. This component raises the

cost of a border crossing by 5000 cycles, mostly due to stack address translations and hash

computations. Border-in calls are less affected due to the initial translation and recording

of the entry stack pointer, while border-out rets are unaffected. Reducing the cost of this validation is the subject of on-going work.

I now consider two examples of using the guarded module functionality, drawn from

the list in the introduction. In the first example, selectively-privileged PCI passthrough,

the guarded module, and only the guarded module, is given direct access to a specific PCI

device. I illustrate the use of this capability via a guarded version of a NIC driver. In the

second example, selectively-privileged mwait, the guarded module, and only the guarded

module, is allowed to use the mwait instruction. I illustrate the use of this capability via

guarded module that adaptively replaces the kernel idle loop with a more efficient mwait

loop when it is safe to do so.

All measurements in this section are with the configuration described above.

C.4 Selectively Privileged PCI Passthrough

Like most VMMs, Palacios has hardware passthrough capabilities. Here, we use its ability

to make a hardware PCI device directly accessible to the guest. This consists of a generic

PCI front-end virtual device (“host PCI device”), an interface it can use to acquire and

release the underlying hardware PCI device on a given host OS (“host PCI interface”), and

an implementation of that interface for a Linux host.

A Palacios guest’s physical address space is contiguously allocated in the host physical

address space. Because PCI device DMA operations use host physical addresses, and

because the guest programs the DMA engine using guest physical addresses it believes

start at zero, the DMA addresses the device will actually use must be offset appropriately.

In the Linux implementation of our host PCI interface, this is accomplished using an

IOMMU: acquiring the device creates an IOMMU page table that introduces the offset. As

a consequence, any DMA transfer initiated on the device by the guest will be constrained

to that guest’s memory. A DMA can then only be initiated by programming the device, which is restricted to the guarded module. This restriction also prevents DMA attacks on

the module that might originate from the guest kernel.

A PCI device is programmed via control/status registers that are mapped into the physical memory and I/O port address spaces through standardized registers called BARs.

Each BAR contains a type, a base address, and a size. Palacios’s host PCI device virtualizes

the BARs (and other parts of the standardized PCI device configuration space). This lets

the guest map the device as it pleases. For a group of registers mapped by a BAR into

the physical memory address space, the mapping is implemented using the shadow or

nested page tables to redirect memory reads and writes.

Figure C.6: High-level view of the operation of selective privilege for a PCI device.

For a group of registers mapped into the I/O port space, there is no equivalent to these page tables, and thus the mappings

are implemented by I/O port read/write hooks. When the guest executes an IN or OUT

instruction, an exit occurs, the hook is run, and the handler simply executes an IN or OUT

to the corresponding physical I/O port. If the host and guest mappings are identical, the ports are not intercepted, allowing the guest to read/write them directly.

We extended our host PCI device to support selective privilege. In this mode of

operation, virtualization of the generic PCI configuration space of the device proceeds as

normal. However, at startup, BAR virtualization ensures that the address space regions

of memory and I/O BARs are initially hooked to stub handlers. The stub handlers

simply ignore writes and supply zeros for reads. This is the unprivileged mode. In this

mode, the guest sees the device on its PCI bus, and can even remap its BARs as desired,

but any attempt to program it will simply fail because the registers are inaccessible. In

selectively privileged operation, the host PCI device also responds to callbacks for raising

and lowering privilege. Raising privilege switches the device to privileged mode, which

is implemented by remapping the registers in the manner described earlier, resulting in

successful accesses to the registers. Lowering privilege switches back to unprivileged

mode, and remaps the registers back to the stubs. Privilege changes happen on a per-core

basis. This process is depicted in Figure C.6.
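The stub handlers behind unprivileged mode are conceptually trivial; the following sketch uses hypothetical hook signatures rather than Palacios's actual host PCI interface.

#include <stdint.h>
#include <string.h>

/* Unprivileged mode: writes to the device's register regions are dropped... */
static int stub_io_write(uint16_t port, const void *src, unsigned int len, void *dev)
{
    (void)port; (void)src; (void)dev;
    return (int)len;              /* silently ignore the write */
}

/* ...and reads return zeros, so the guest cannot program the device. */
static int stub_io_read(uint16_t port, void *dst, unsigned int len, void *dev)
{
    (void)port; (void)dev;
    memset(dst, 0, len);
    return (int)len;
}

/* Raising privilege swaps these stubs for pass-through handlers that reach
   the physical BARs; lowering privilege swaps the stubs back in. */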

While the above description is complex, it is important to note that only about 60

lines of code were needed to add selectively privileged operation to our existing PCI passthrough functionality. Combined with the rest of the guarded module system, the

selectively privileged host PCI device permits fully privileged access to the underlying

device within a guarded module, but disallows it otherwise.

Making a NIC driver into a guarded module As an example, we used the guarded

module system to generate a guarded version of an existing NIC device driver within the

Linux tree, specifically the Broadcom BCM5716 Gigabit NIC. No source code modifications were made to the driver or the guest kernel. We Christoized this driver, creating a kernel

module that is later injected into the untrusted guest. The border control state machine in

Palacios pairs this driver with the selectively privileged PCI passthrough capability.

Overheads Compared to simply allowing privilege for the entire guest, a system that

leverages guarded modules incurs additional overheads. Some of these overheads are

system-independent, and were covered in Section C.3. The most consequential component

of these overheads is the cost of executing a border-in or border-out, each of which consists

of a hypercall or exception interception (requiring a VM exit) or interrupt/exception injec-

tion detection (done in the context of an in-progress VM exit), a lookup of the hypercall’s

address, a stack check or record, conducting a lookup to find the relevant privilege callback

function, and then the cost of invoking that callback.

Packet Sends
  Border-in                            1.06
  Border-out                           1.06
  Border Crossings / Packet Send       2.12
Packet Receives
  Border-in                            4.64
  Border-out                           4.64
  Border Crossings / Packet Receive    9.28

Figure C.7: Border crossings per packet send and receive for the NIC example.

We now consider the system-dependent overhead for the NIC. There are two elements

to this overhead: the cost of changing privilege and the number of times we need to change privilege for each unit of work (packet sent or received) that the module finishes. The cost

of raising privilege for the NIC is 4800 cycles (2.2 µs), while lowering it is 4307 cycles (2.0

µs).

Combining the system-independent and system-dependent costs, we expect that a

typical border crossing overhead, assuming no stack checking, will consist of about 3000

cycles for VM exit/entry, 4000 cycles to execute the border control state machine, and

about 4500 cycles to enable/disable access to the NIC. These 11500 cycles comprise 5.2 µs

on this machine. Stack checking would add an average of about 4500 cycles, leading to

16000 cycles (7.3 µs).

To determine the number of these border crossings per packet send or receive, we

counted them while running the guarded module with a controlled traffic source (ttcp)

that allows us to also count packet sends and/or receives. Dividing the counts gives us

the average. There is variance because the NIC does interrupt coalescing.

Figure C.7 shows the results of this analysis for the NIC. Sending requires on the

order of 2 border crossings (privilege changes) per packet, while receiving requires on the

order of 9 border crossings per packet. Note that many of the functions that constitute

border crossings are actually leaf functions defined in the kernel. This indicates that we

could further reduce the overall number of border crossings per packet by pulling the

implementations of these functions into the module itself.

C.5 Selectively Privileged MWAIT

Recent x86 machines include a pair of instructions, monitor and mwait, that can be used

for efficient synchronization among processor cores. The monitor instruction indicates

an address range that should be watched. A subsequent mwait instruction then places

the core into a suspended sleep state, similar to a hlt. The core resumes executing when

an interrupt is delivered to it (like a hlt), or when another core writes into the watched

address range (unlike a hlt). The latter allows a remote core to wake up the local core without the cost of an inter-processor interrupt (IPI). One example of such use is in the

Linux kernel’s idle loop.
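For concreteness, the following minimal sketch shows the basic monitor/mwait wakeup pattern in kernel-mode C with GCC inline assembly; the wait_on_flag and wake helpers are illustrative only and are not taken from Linux, Palacios, or Nautilus.

#include <stdint.h>

/* Arm the monitor on an address range: RAX = address, ECX/EDX = 0. */
static inline void cpu_monitor(const volatile void *addr)
{
    asm volatile("monitor" :: "a"(addr), "c"(0), "d"(0));
}

/* Enter the suspended sleep state: EAX = hints, ECX = extensions. */
static inline void cpu_mwait(void)
{
    asm volatile("mwait" :: "a"(0), "c"(0));
}

/* Waiter: sleep until *flag becomes nonzero or an interrupt arrives. */
static void wait_on_flag(volatile uint64_t *flag)
{
    while (!*flag) {
        cpu_monitor(flag);
        if (*flag)      /* re-check after arming the monitor to avoid a lost wakeup */
            break;
        cpu_mwait();    /* sleeps like hlt, but a remote write to *flag wakes us */
    }
}

/* Waker (running on another core): a plain store suffices, no IPI needed. */
static void wake(volatile uint64_t *flag)
{
    *flag = 1;
}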

In Palacios, and other VMMs, we cannot allow an untrusted guest to execute hlt or

mwait because the guest runs with physical interrupts disabled. A physical interrupt is

intended to cause a VM exit followed by dispatch of the interrupt in the VMM.

If an mwait instruction were executed in the guest under uncontrolled conditions, it could

halt the core indefinitely. This precludes the guest from using the extremely fast inter-core wakeup capability that mwait offers.

Under controlled conditions, however, letting the guest run mwait may be permissible.

When no other virtual core is mapped to the physical core (so we can tolerate a long wait)

and we have a watchdog that will eventually write the memory, the guest might safely

run an mwait. Achieving these controlled conditions requires that we limit the execution

[Figure C.8 diagram: idle-loop redirection through a guarded module stub. On pcore 0, a single vcore idles with mwait_idle(); on pcore 1, two vcores share the core and fall back to default_idle().]

Figure C.8: Direct access to the MWAIT instruction provided by a guarded module.

of these instructions to code that the VMM can trust and to ensure that this code executes mwait only when the VMM deems it safe to do so. A malicious guest could use an unrestricted ability

to execute mwait to launch a denial-of-service attack on other VMs and the VMM. We

enforce this protection and adaptive execution by encapsulating the mwait functionality within the safety of a guarded module.

Adding selectively-privileged access to mwait to Palacios was straightforward, involv-

ing only a few lines of code. We then implemented a tiny kernel module that interposes

on Linux’s default idle loop, specifically modifying pm_idle, the function pointer that refers to the kernel’s idle implementation. Our module points this to a function internal to

itself that dispatches either to an mwait-based idle implementation within the module or

to the original idle implementation, based on a flag in protected memory that is shared with Palacios. Palacios sets this flag when it is safe for the module to use mwait. In these

situations, the guest kernel enjoys much faster wake-ups of the idling core.
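A minimal sketch of such an interposition module appears below, assuming an older kernel on which pm_idle is still an exported function pointer; the shared-flag setup and the mwait-based idle body (as in the earlier sketch) are elided, and all names other than pm_idle are invented for illustration.

/* Sketch of an idle-loop interposition module.  The flag layout,
 * helper names, and mwait_idle() body are illustrative only. */
#include <linux/module.h>
#include <linux/init.h>

extern void (*pm_idle)(void);            /* kernel idle-loop function pointer (older kernels) */

static void (*orig_idle)(void);          /* saved default idle routine */

static int mwait_ok_fallback;            /* 0 = not safe to use mwait */
static volatile int *mwait_ok = &mwait_ok_fallback;  /* remapped at init to a page shared with the VMM (omitted) */

static void mwait_idle(void)
{
    /* monitor/mwait-based idle, as in the previous sketch */
}

/* Dispatcher installed into pm_idle: use mwait only when the VMM says it is safe. */
static void guarded_idle(void)
{
    if (*mwait_ok)
        mwait_idle();
    else if (orig_idle)
        orig_idle();
    else
        asm volatile("sti; hlt");        /* roughly what the default idle loop does */
}

static int __init guarded_idle_init(void)
{
    orig_idle = pm_idle;
    pm_idle = guarded_idle;              /* a border-in occurs when the kernel enters the module */
    return 0;
}

static void __exit guarded_idle_exit(void)
{
    pm_idle = orig_idle;
}

module_init(guarded_idle_init);
module_exit(guarded_idle_exit);
MODULE_LICENSE("GPL");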

To ensure that only our module can execute mwait, we transform it into a guarded

module using the techniques outlined earlier. A border-in to our module occurs when

Linux calls its idle loop. If the border-in succeeds, Palacios stops intercepting the use

of mwait. When control leaves the module, a border-out occurs, and Palacios resumes

intercepting mwait. If code elsewhere in the guest attempts to execute these instructions,

it will trap to the VMM and result in an undefined opcode exception being injected into

the guest. Figure C.8 shows this process at a high level. Each of the larger boxes represents

a physical CPU core (pcore). The smaller boxes within represent virtual cores (vcores). In

the case on the right-hand side, there are two vcores assigned to the same pcore, so the

VMM cannot allow either one to use the mwait version of the idle loop. On the left-hand

side, however, there is only one vcore, so it is free to use the optimized mwait version of

idle.

This proof-of-concept illustrates how the VMM can use guarded modules to safely

adapt the execution environment of a VM to changing conditions.

C.6 Conclusions

I presented the design, implementation, and evaluation of a system for guarded modules.

The system allows the VMM to add modules to a guest kernel that have more privileged

access to physical hardware and the VMM while protecting these guarded modules and

access to their privileges from the rest of the guest kernel. This system is based on joint

compile-time and run-time techniques that bestow privilege only when control flow enters

the guarded module at verified locations. I demonstrated two example uses of the guarded

module system. The first is passthrough access to a PCI device, for example a NIC, that

is limited to a designated guarded module (a device driver). The guest kernel can use

this guarded module just like any other device driver. I further demonstrated selectively privileged use of the monitor and mwait instructions in the guest, which could wreak havoc if their use were not constrained to a guarded module that cooperates with the VMM.