SHARING GPUS FOR REAL-TIME AUTONOMOUS-DRIVING SYSTEMS

Ming Yang

A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science.

Chapel Hill 2020

Approved by:

James H. Anderson

Parasara Sridhar Duggirala

Jan-Michael Frahm

Shahriar Nirjon

F. Donelson Smith

Shige Wang

©2020 Ming Yang
ALL RIGHTS RESERVED

ABSTRACT

Ming Yang: Sharing GPUs for Real-Time Autonomous-Driving Systems (Under the direction of James H. Anderson)

Autonomous vehicles at mass-market scales are on the horizon. Cameras are the least expensive among common sensor types and can preserve features such as color and texture that other sensors cannot. Therefore, realizing full autonomy in vehicles at a reasonable cost is expected to entail computer-vision techniques. These computer-vision applications require massive parallelism provided by the underlying shared accelerators, such as graphics processing units, or GPUs, to function “in real time.” However, when computer-vision researchers and GPU vendors refer to “real time,” they usually mean “real fast”; in contrast, certifiable automotive systems must be “real time” in the sense of being predictable.

This dissertation addresses the challenging problem of how GPUs can be shared predictably and efficiently for real-time autonomous-driving systems. We tackle this challenge in four steps.

First, we investigate GPUs with respect to scheduling, synchronization, and execution. We conduct an extensive set of experiments to infer NVIDIA GPU scheduling rules, which are unfortunately undisclosed by NVIDIA and inaccessible owing to its closed-source software stack. We also expose a list of pitfalls pertaining to CPU-GPU synchronization that can result in unbounded response times of GPU-using applications. Lastly, we examine a fundamental trade-off for designing real-time tasks under different execution options. Overall, our investigation provides an essential understanding of NVIDIA GPUs, allowing us to further model and analyze GPU tasks.

Second, we develop a new model and conduct schedulability analysis for GPU tasks. We extend the well-studied sporadic task model with additional parameters that characterize the parallel execution of GPU tasks. We show that NVIDIA scheduling rules are subject to fundamental capacity loss, which implies a necessary total utilization bound. We derive response-time bounds for GPU task systems that satisfy our schedulability conditions.

Third, we address an industrial challenge: providing enough throughput in computer-vision frameworks to support the coverage and redundancy offered by an array of cameras. We re-think the design of convolutional neural network (CNN) software to better utilize hardware resources and achieve increased throughput (number of simultaneous camera streams) without any appreciable increase in per-frame latency (camera to CNN output) or reduction of per-stream accuracy.

Fourth, we apply our analysis to finer-grained graph scheduling of a computer-vision standard, OpenVX, which explicitly targets embedded and real-time systems. We evaluate both the analytical and empirical real-time performance of our approach.

To my sister, Yuan Yang.

ACKNOWLEDGEMENTS

It was serendipity that led me to UNC five years ago. As I stand at the end point of this journey now, with this dissertation being completed, I realize that anything I accomplished would not have been possible had I not received all the guidance, aid, and support from people I met. I am indebted and thankful to them.

First and foremost, I would like to express my deepest appreciation to my advisor, Jim Anderson, for his unwavering support, for his patience whenever I was slowly progressing, for his motivation and encouragement every time I was frustrated, and for his guidance that led me forward. I would also like to thank my dissertation committee: Parasara Sridhar Duggirala, Jan-Michael Frahm, Shahriar Nirjon, F. Donelson Smith, and Shige Wang, for their valuable advice.

I would also like to extend my sincere thanks to many colleagues I worked with. In particular, I am especially thankful to people who helped with the work of this dissertation: Tanya Amert, Joshua Bakita, Nathan Otterness, Thanh Vu, and Kecheng Yang. This dissertation would not have been possible without their contributions.

I am also grateful for the internships that General Motors and Aurora Innovation offered, which gave me the opportunity to sit in autonomous vehicles and experience how they work. I very much appreciate Shige Wang for his advice regarding and beyond my research and career. I am also thankful to Glenn Elliott for many inspiring discussions. I thank Joseph D'Ambrosio and Ken Conley for their support.

I am also thankful to my other co-authors: Alex Berg, Pontus Ekberg, Vance Miller, Saujas Nandi, Catherine Nemitz, Eunbyung Park, Sarah Rust; and other people I worked with at UNC: Shareef Ahmed, Akash Bapat, Lee Barnett, Micaiah Chisholm, Calvin Deutschbein, Shiwei Fang, Cheng-Yang Fu, Zhishan Guo, Clara Hobbs, Bashima Islam, Tamzeed Islam, Namhoon Kim, Seulki Lee, Yubo Luo, Mac Mollison, Sims Osborne, Abhishek Singh, Stephen Tang, Peter Tong, Sergey Voronov, and Bryan Ward.

I gratefully acknowledge the assistance I received from the staff of the UNC Computer Science Department. Special thanks to Denise Kenney, Beth Mayo, Jodie Gregoritsch, Adia Ware, and Mellisa Wood for keeping the graduate school paperwork in order. Many thanks to Murray Anderegg, Bil Hays, David Musick, Mike Stone, and other staff for their help with various hardware and system issues.

I cannot leave Chapel Hill without mentioning my friends. I owe special thanks to Weiwei Li, for preventing me from insanely giving up my PhD study. I was also very lucky to meet Shiwei Fang by chance in the roommate lottery. Thanks to them and my other friends for enriching my PhD life experience.

As I believe everyone needs a supporting system to survive the PhD process, mine is my dearest sister—in my darkest time or whenever, she was always there for me, persuasive and supportive, and finally, I share this victorious ending of this journey with her. Lastly, I thank my parents for their unwavering support, and my wife for her patience and love.

The research in this dissertation was funded by NSF grants CPS 1446631 and CPS 1837337, ARO grant W911NF-17-1-0294, and support from General Motors.

TABLE OF CONTENTS

LIST OF TABLES ...... xiii

LIST OF FIGURES ...... xiv

LIST OF ABBREVIATIONS ...... xvi

Chapter 1: Introduction ...... 1

1.1 Computer Vision in Autonomous Driving ...... 2

1.2 Graphics Processing Units ...... 4

1.3 Real-Time Systems and GPU Scheduling ...... 6

1.4 Thesis Statement ...... 8

1.5 Contributions ...... 8

1.5.1 A Study of NVIDIA GPUs beyond Official Documentation ...... 9

1.5.2 A Model of GPU Execution ...... 10

1.5.3 A Response-Time Bound Analysis for Applications Sharing GPUs ...... 10

1.5.4 A Computer-Vision Framework Providing Improved Throughput ...... 11

1.5.5 A Case Study Evaluating Analytical and Empirical Real-Time Performance ...... 12

1.6 Organization ...... 13

Chapter 2: Background ...... 14

2.1 Autonomous Driving ...... 14

2.1.1 State of the Art ...... 14

2.1.2 Autonomous-Driving System Architecture ...... 17

2.1.2.1 Architecture ...... 17

2.1.2.2 Sensor Hardware ...... 19

2.1.2.3 Perception...... 20

2.1.2.4 Planning and Control ...... 23

2.1.3 Summary ...... 24

2.2 Computer Vision for Autonomous Driving...... 24

2.2.1 Methods ...... 25

2.2.1.1 Classic Pipelines...... 25

2.2.1.2 Deep Learning Methods...... 26

2.2.2 Object-Detection Accuracy Metrics ...... 29

2.2.2.1 IoU, Precision, and Recall...... 29

2.2.2.2 Precision-Recall Curve and Mean Average Precision ...... 30

2.2.3 Frameworks and Libraries ...... 31

2.2.4 OpenVX Standard ...... 33

2.2.5 Challenges of Real-Time Certification Amenable Framework ...... 35

2.3 GPUs and Accelerators ...... 36

2.3.1 Accelerators...... 36

2.3.2 GPU Hardware Architecture ...... 39

2.3.2.1 CUDA-Enabled Devices ...... 41

2.3.3 GPU Programming Model ...... 42

2.3.4 Prior Work on Non-Real-Time GPUs ...... 46

2.3.4.1 GPU Reverse Engineering ...... 47

2.3.4.2 GPU Resource Management ...... 48

2.3.4.3 GPU Sharing ...... 49

2.3.4.4 GPU Virtualization ...... 51

2.3.4.5 GPU Scheduling ...... 53

2.3.5 Challenges for Real-Time GPUs...... 55

2.4 Real-Time Systems...... 56

2.4.1 Real-Time Task Model...... 57

2.4.2 Scheduling ...... 58

2.4.3 Response-Time Bounds ...... 60

2.4.4 Real-Time DAG Scheduling...... 61

2.4.5 Real-Time Frameworks ...... 63

2.4.6 Real-Time GPUs ...... 64

2.5 Chapter Summary ...... 67

Chapter 3: Scheduling, Synchronization, and Execution ...... 68

3.1 Basic GPU Scheduling Rules ...... 71

3.1.1 Motivating Experiment ...... 71

3.1.2 Scheduling Rules ...... 72

3.2 Additional Complexities ...... 78

3.2.1 The NULL Stream...... 78

3.2.2 Stream Priorities ...... 79

3.2.3 Multiple Processes and Other Complications ...... 84

3.3 Synchronization and Blocking ...... 86

3.3.1 Overview of GPU Synchronization ...... 87

3.3.1.1 Explicit Synchronization ...... 87

3.3.1.2 Implicit Synchronization ...... 89

3.3.2 Overcoming Synchronization-Related Pitfalls ...... 91

3.4 Concurrency and Performance ...... 92

3.4.1 Multi-Process Service (MPS) ...... 92

3.4.2 Case Study of Computer-Vision Tasks ...... 93

3.5 Perils of CUDA Programming for Real-Time Tasks ...... 97

3.5.1 Synchronous Defaults ...... 98

3.5.2 Flawed Documentation ...... 99

3.5.3 Unknown Future ...... 100

3.6 Chapter Summary ...... 101

Chapter 4: Task Model and Schedulability Analysis for Tasks Sharing GPUs ...... 103

4.1 DAG-based Task Model ...... 103

4.2 GPU Response-Time Bound ...... 106

4.2.1 NVIDIA GPU Details ...... 107

4.2.2 System Model ...... 108

4.2.3 Total Utilization Constraint ...... 110

4.2.4 Response-Time Bound ...... 114

4.3 Chapter Summary ...... 118

Chapter 5: Improve CNN Frameworks for Autonomous-Driving Applications ...... 120

5.1 Background ...... 123

5.1.1 NVIDIA DRIVE PX2 ...... 123

5.1.2 YOLO and Darknet ...... 124

5.2 Related Work ...... 125

5.3 System Design ...... 125

5.3.1 Enabling Parallelism ...... 126

5.3.2 Multi-GPU Execution ...... 130

5.3.3 Accelerating Multiple Streams with Multi-Camera Composite Images ...... 131

5.3.4 Avoiding Implicit Synchronization ...... 132

5.4 Evaluation ...... 132

5.4.1 How Many Cameras Can Be Supported?...... 133

5.4.2 Can Our CNN Implementations Be Better Configured? ...... 135

5.4.3 Are Multi-Camera Composite Images Effective? ...... 139

5.5 Chapter Summary ...... 142

Chapter 6: Case Study ...... 144

6.1 OpenVX Graph Scheduling ...... 146

6.1.1 Coarse-Grained OpenVX Graph Scheduling ...... 146

6.1.2 Fine-Grained OpenVX Graph Scheduling ...... 146

6.2 Case Study...... 151

6.2.1 Experimental Evaluation ...... 151

6.2.2 Results ...... 153

6.3 Chapter Summary ...... 156

Chapter 7: Conclusion ...... 157

7.1 Summary of Results ...... 158

7.2 Acknowledgements ...... 160

7.3 Future Work ...... 161

7.4 Closing Remarks ...... 162

BIBLIOGRAPHY ...... 164

LIST OF TABLES

2.1 Description of functions used in HOG pedestrian detection example...... 34

2.2 Comparison between accelerators...... 39

2.3 Comparison between NVIDIA GPU architecture generations ...... 41

2.4 Real-time task model notation...... 57

3.1 Details of kernels used in the experiment in Figure 3.1...... 73

3.2 Details of kernels used in the NULL-stream scheduling experiment in Figure 3.3...... 79

3.3 Details of kernels used in the priority-stream scheduling experiment in Figure 3.4...... 81

3.4 Details of kernels used in the priority-stream scheduling experiment in Figure 3.5...... 82

3.5 Details of kernels used in the priority-stream scheduling experiment in Figure 3.6...... 83

3.6 Abbreviations used for our four experimental scenarios...... 94

3.7 Per-frame response time data (in milliseconds) of VisionWorks samples...... 95

3.8 Observed vs. documented synchronization sources in CUDA...... 100

5.1 Specifications of the PX2 iGPU and dGPU...... 124

5.2 CPU and memory usage...... 138

5.3 GPU assignment strategies...... 139

5.4 mAPs for full-size and composite tests with PASCAL VOC dataset...... 141

5.5 mAPs of object classes relevant to autonomous driving...... 141

6.1 Analytical and observed response times...... 154

LIST OF FIGURES

1.1 Trends in NVIDIA GPU performance and the number of SMs over generations...... 5

2.1 HOG pedestrian detection pipeline...... 26

2.2 Traditional camera processing setup and the Tiny YOLOv2 network architecture...... 28

2.3 IoU illustrated with an image example from PASCAL VOC dataset (Everingham et al., 2010)...... 29

2.4 Precision and recall...... 30

2.5 Example Precision-Recall curves (Davis and Goadrich, 2006)...... 31

2.6 OpenVX graph example: pedestrian detection using HOG...... 34

2.7 NVIDIA GPU architecture ...... 40

2.8 Jetson TX2 Architecture (left) and GeForce GTX 1070 Architecture (right) ...... 42

2.9 Diagram illustrating the relation between CUDA programs, kernels, thread blocks, and warps...... 45

3.1 Basic GPU scheduling experiment...... 72

3.2 Detailed state information at various time points in Figure 3.1...... 74

3.3 NULL stream scheduling experiment...... 80

3.4 Experiment showing starvation of a priority-low stream...... 81

3.5 Experiment demonstrating two actual priority levels...... 82

3.6 Experiment with both priorities and resource blocking...... 83

3.7 Timelines contrasting kernels issued from processes (different address spaces) vs. tasks (shared address space)...... 85

3.8 CDFs contrasting block times from processes vs. tasks...... 86

3.9 Explicit synchronization requested before K3, observed on the Jetson TX2...... 88

3.10 Implicit synchronization caused by launching kernel K3 in the NULL stream...... 90

3.11 Implicit synchronization causing additional CPU blocking due to cudaFree...... 91

3.12 Per-frame response time CDFs for Hough...... 96

3.13 Per-frame response time KDEs for Hough...... 96

3.14 Per-frame response time CDFs for Feature Tracking...... 96

3.15 Per-frame response time KDEs for Feature Tracking...... 96

4.1 DAG G1...... 104

4.2 Example schedule of the tasks in G1 in Figure 4.1...... 106

4.3 GPU scheduling; kernel Kk's bth block is Kk:b...... 108

4.4 A possible schedule corresponding to Figure 4.3...... 110

4.5 Unbounded response time using per-task streams...... 112

4.6 Unbounded response time using per-job streams...... 113

4.7 A possible schedule for Example 4.4 with m = 2016...... 118

4.8 A possible schedule for Example 4.4 with m = 2048...... 118

5.1 NVIDIA DRIVE PX2 architecture...... 124

5.2 Sharing one CNN for multiple cameras...... 127

5.3 Scheduling of SERIAL, PIPELINE, and PARALLEL...... 128

5.4 Stage representation for pipelined execution model...... 129

5.5 Stage representation for parallel-pipeline execution model...... 130

5.6 CNN extended for multi-camera composite images...... 131

5.7 Latency of SERIAL, PIPELINE, and PARALLEL YOLO...... 134

5.8 PARALLEL YOLO latency with different numbers of cameras...... 135

5.9 PARALLEL YOLO latency with different granularities (number of layers per stage)...... 136

5.10 PARALLEL YOLO latency with different numbers of threads...... 137

5.11 Per-layer execution times...... 138

5.12 PARALLEL YOLO latency with different GPU assignments...... 139

5.13 PIPELINE YOLO latency with different GPU assignments...... 140

5.14 PARALLEL YOLO latency with multi-camera-image processing and different numbers of cameras...... 142

6.1 (a) Coarse- and (b) fine-grained representations of the same DAG G2...... 147

6.2 Example schedules of the tasks corresponding to the DAG-based tasks in G2 in Figure 6.1...... 149

6.3 (a) Monolithic/coarse-grained and (b) fine-grained HOG DAGs...... 151

6.4 CDFs of response times for varying DAG granularity...... 155

LIST OF ABBREVIATIONS

ACC Adaptive Cruise Control

ADAS Advanced Driver Assist System

AI Artificial Intelligence

AP Average Precision

ASICs Application-Specific Integrated Circuits

BAND Bandwidth-Aware Non-preemptive Device

CAN Controller Area Network

CDF Cumulative Distribution Function

CE Copy Engine

C-FL Clustered Fair-Lateness

CKE Concurrent Kernel Execution

CNN Convolutional Neural Network

CPU Central Processing Unit

DAG Directed Acyclic Graph

DARPA Defense Advanced Research Projects Agency

DNN Deep Neural Network

DPM Deformable Part-based Model

DRAM Dynamic Random-Access Memory

DSPLIB DSP LIBrary

DSP Digital Signal Processor

EDF Earliest-Deadline-First

EE Execution Engine

FCW Forward Collision Warning

FFT Fast Fourier Transform

FIFO First-In-First-Out

FP Fixed-Priority

FPGA Field Programmable Gate Arrays

FPPI False Positive Per Image

FPPW False Positive Per Window

FPS Frames Per Second

G-EDF Global Earliest-Deadline-First

GEL G-EDF-like

G-FIFO Global First-In-First-Out

G-FL Global Fair-Lateness

GPGPU General-Purpose Graphics Processing Unit

GPS Global Positioning System

GPU Graphics Processing Unit

GVA Guest Virtual Address

HDL Hardware Description Language

HOG Histogram of Oriented Gradients

HPC High-Performance Computing

HRT Hard Real-Time

HSA Heterogeneous System Architecture

ILP Integer Linear Program

IMU Inertial Measurement Unit

IPC Inter-Process Communication

ISA Instruction Set Architecture

JLFP Job-Level Fixed-Priority

KDE Kernel Density Estimation

LDW Lane Departure Warning

LITMUSRT LInux Testbed for MUltiprocessor Scheduling in Real-Time systems

LKAS Lane-Keeping Assistance Systems

LKM Loadable Kernel Module

LRU Least Recently Used

MB Megabytes

MMIO Memory-Mapped Input/Output

MPA Machine Physical Address

MPH Miles Per Hour

MPS Multi-Process Service

NHTSA National Highway Traffic Safety Administration

NPP NVIDIA Performance Primitives

OS Operating System

PCA Principal Component Analysis

PCI Peripheral Component Interconnect

PGMRT Processing Graph Methods for Real-Time

PKM Preemptive Kernel Model

PREM PRedictable Execution Model

PR Precision-Recall

PTX Parallel Thread Execution

PT Persistent Threads

RL Reinforcement Learning

RNN Recurrent Neural Network

RPC Remote Procedure Call

RTA Response-Time Analysis

RTOS Real-Time Operating System

SAE Society of Automotive Engineers

SIFT Scale-Invariant Feature Transform

SIMD Single-Instruction Multi-Data

SIMT Single-Instruction Multi-Thread

SLAM Simultaneous Localization And Mapping

SM Streaming Multiprocessor

SPT Shadow Page Table

SRT Soft Real-Time

SSD Single-Shot multibox Detector

SVM Support Vector Machine

SWaP Size, Weight, and Power

TLB Translation Lookaside Buffer

TPN Timed Petri Nets

VOC Visual Object Classes

WCET Worst-Case Execution-Time

CHAPTER 1: INTRODUCTION

After being envisioned for decades, autonomous vehicles seem to be just over the horizon. We expect this innovation to reduce traffic accidents, save lives, improve mobility for those whose disabilities prevent them from driving, protect the environment, liberate workers from commuting, and ultimately transform how our cities function. More importantly, we expect these enhancements to be achieved safely, because the fundamental goal of saving lives rests on the capabilities of the technology. The safety requirements for autonomous vehicles center on executing appropriate driving decisions in real time, imposing requirements of both logical correctness and temporal correctness. Satisfying such safety requirements is challenging because of the complex software components and hardware platforms that underpin autonomous-driving systems.

One primary software component in an autonomous-driving system is the perception system that semantically senses and understands the surrounding environment. A common approach used in the perception system is vision-based sensing because, compared to lidars and radars, cameras are less expensive and uniquely capable of perceiving colors and textures. With a proliferation of computer-vision research in the literature, many state-of-the-art techniques have been applied in today's autonomous-vehicle prototypes.

However, existing computer-vision frameworks often fail to provide enough inference performance, and what performance they do provide is unpredictable, so the requirement of temporal correctness remains unmet. While improving performance and ensuring predictability, any induced accuracy loss must also remain tolerable. We investigate solutions for providing sufficient and predictable performance in the use of the underlying hardware accelerator, with a focus on graphics processing units (GPUs).

GPUs are commonly used for accelerating computer-vision applications, as they provide the massive parallelism needed by these types of applications, which process millions of pixels per image streaming from multiple video sources to provide full coverage of a vehicle's surroundings. As applications provide disparate functionalities and advancing GPUs offer more processing capacity, the incentive to share GPUs among multiple applications by enabling predictable simultaneous execution has grown. Also, any unnecessary idleness incurred by CPU-GPU interactions in existing computer-vision frameworks should be minimized for better performance.

Motivated by these challenges, our research considers how GPUs can be shared across multiple applications to improve throughput with bounded response times and high per-stream accuracy. In the following section, we provide the details of our application focus on computer vision to enable safe autonomous driving.

1.1 Computer Vision in Autonomous Driving

With the development of autonomous driving ongoing for decades, the latest research that spurred industrial production of autonomous vehicles came from the series of Grand and Urban Challenges sponsored by the Defense Advanced Research Projects Agency (DARPA) (Buehler et al., 2007, 2009). Today, semi-autonomous advanced driver assist systems (ADASs) are commonly available in high-end mass-market vehicles. Although fully autonomous driving is yet to be realized, promising progress has been seen in many industrial prototypes, such as Waymo's fleet of cars, which have completed over 20 million test miles since 2009, achieving an average of 11,017 miles between disengagements1 in 2018 (Waymo, 2020; Crowe, 2019b).

Existing work on autonomous-driving systems can be classified as either learning-based end-to-end systems, which are considered black boxes that receive sensory inputs and generate the final control commands, or modular systems comprised of sub-components, the choice applied in many systems participating in the DARPA challenges and in industrial prototypes. We focus here on the latter class: modular autonomous-driving platforms consisting of sensors, perception systems, and planning and control systems. These components capture the environment, distill semantic information, and plan routes while controlling the vehicle, respectively.

Every component is positioned along a critical path from sensing to control, and is subject to time constraints that are just as important as producing accurate results. Vehicles currently operating on the road have been assessed by safety certification authorities, e.g., the National Highway Traffic Safety Administration (NHTSA), that follow functional safety standards such as ISO 26262 (ISO, 2011). Although these safety standards are yet to be adopted for autonomous vehicles, the process of safety certification will be crucial to making autonomous vehicles trusted modes of transportation for daily use. The urgency in achieving certified autonomous driving is exemplified by recent interest in "real-time" computer vision.

1While this metric is limited, it remains the only one used for measuring technological progress in autonomous driving. Disengagement occurs when the human driver must retake control of the vehicle during situations that the autonomous system is unable to handle.

Among common sensor types—cameras, lidars, and radars—cameras are the least expensive, and they preserve features such as color and texture that other sensors cannot. Therefore, realizing full autonomy in vehicles at a reasonable cost is expected to entail deploying computer-vision algorithms. As a result, there has been intense recent interest in real-time computer vision. Unfortunately, in interpreting the term "real time," a significant disconnect exists between computer-vision researchers and automotive system engineers.

For the former, "real time" suggests "real fast" processing on a dedicated high-end hardware platform; for the latter, it means predictable execution, notably on more impoverished embedded hardware that must be shared by many disparate computations. The "predictable" interpretation of real time is based on automobiles being safety-critical systems. When certifying such a system, showing that certain computations predictably occur within specified time bounds is more important than "real fast" performance under ideal conditions.

This disconnect between real-fast computer vision and real-time safety has led to unfortunate consequences. Several highly publicized accidents involving semi-autonomous and fully autonomous vehicles operating on open roads resulted in fatalities. Post-crash analyses of these incidents revealed a critical trade-off between time and accuracy at the nexus of computer vision and real time. As one analyst concluded, "public authorities should ... ensure that self-driving cars have the capacity to process all the needed information to adopt decisions in real time" (emphasis added) (Renda, 2018).

This observation suggests the need for real-time certification, which may cause the real-fast vs. real-time computer-vision disconnect to become more problematic. Today, computer-vision-based features are largely provided in settings where the driver is assumed to be able to retake full control of the vehicle. As such, these features are not yet subject to strict certification requirements. Moving forward, however, increased autonomy will come to mass-market vehicles, and the line between semi-autonomy and full autonomy will eventually be crossed, making strict certification critical. Certifying computer-vision applications that are merely "real fast" will be difficult, if not impossible.

On the other hand, certifiable computer-vision implementations should not diminish throughput performance, although sometimes a trade-off must be made. The throughput requirements of an autonomous system are determined by the necessity of ensuring adequate coverage and redundancy (for safety), which can be offered by a large quantity and diversity of cameras. A typical configuration in experimental automated-driving vehicles includes ten or more cameras that cover different fields-of-view, orientations, and near and far ranges. Each camera generates a stream of images at rates of 10 to 40 frames per second (FPS) depending on its function (e.g., lower rates for side-facing cameras, higher rates for forward-facing ones). All these image streams must be processed simultaneously by the onboard computer-vision applications, making sufficient throughput performance essential to support the processing of multiple data streams that fully cover a vehicle's surroundings.

To address these challenging problems of enabling real-time computer vision at the application level, it is important to achieve real-time performance and high utilization of the underlying hardware platform. Next, we discuss a key enabler for accelerating computer vision, the graphics processing unit (GPU).

1.2 Graphics Processing Units

GPUs evolved from fixed-function, special-purpose hardware pipelines for graphics rendering into programmable general-purpose hardware platforms, leading to the term general-purpose GPU (GPGPU) (Owens et al., 2008). GPUs became a key technological enabler that advanced the techniques of deep learning, which have existed for decades but have only recently seen widespread use. The massive parallelism enabled by the architecture and programming model of GPUs reduces the training time for deep neural networks (DNNs) and makes the inference process nearly instantaneous for real-world applications.

Taking NVIDIA GPUs as an example, rendering hardware pipelines are replaced with programmable units called CUDA (Compute Unified Device Architecture) cores that are organized into streaming multiprocessors (SMs). Each SM supports a large number of in-flight GPU threads, of which a subset can execute simultaneously. These numbers vary across GPU generations and are thus generation-specific parameters; in contrast, the programming model remains consistent across generations.

Deferring a more detailed discussion of the programming model to Section 2.3.3, we briefly describe the basic process of a GPU program. A GPU program is launched from a CPU (or host) process as follows: (i) allocate memory for the GPU to use, (ii) copy the input data from CPU memory to GPU memory, (iii) execute a GPU program, called a kernel,2 to process the data, (iv) copy the results from GPU memory back to CPU memory, and (v) free unneeded memory. A GPU kernel comprises GPU threads that are organized into a grid of blocks, and it accesses different types of memory, including registers, shared memory, and global memory. To manage concurrent kernel execution and algorithmic serialization, kernels are submitted to CUDA streams: kernels in the same stream are serialized by submission order, whereas kernels from different streams can execute concurrently. This workflow is sketched in the code example below.

2This terminology is not to be confused with an operating system (OS) kernel.
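To make steps (i)-(v) and the role of streams concrete, the following minimal CUDA sketch walks through the workflow using the CUDA runtime API. The kernel scale, its launch configuration, and the data size are illustrative choices, not code from this dissertation.

// Minimal sketch of a GPU program's life cycle; assumes nvcc and a CUDA-capable device.
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;      // one GPU thread per array element
  if (i < n) data[i] *= factor;
}

int main() {
  const int n = 1 << 20;
  float *host = new float[n], *dev = nullptr;
  for (int i = 0; i < n; ++i) host[i] = 1.0f;

  cudaStream_t stream;
  cudaStreamCreate(&stream);                          // kernels in one stream run in submission order

  cudaMalloc(&dev, n * sizeof(float));                // (i) allocate GPU memory
  cudaMemcpyAsync(dev, host, n * sizeof(float),
                  cudaMemcpyHostToDevice, stream);    // (ii) copy input to GPU memory
  scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n, 2.0f);  // (iii) launch the kernel
  cudaMemcpyAsync(host, dev, n * sizeof(float),
                  cudaMemcpyDeviceToHost, stream);    // (iv) copy results back
  cudaStreamSynchronize(stream);                      // CPU blocks here until the GPU work finishes
  cudaFree(dev);                                      // (v) free unneeded GPU memory
  cudaStreamDestroy(stream);
  delete[] host;
  return 0;
}

Note that the copies above are only truly asynchronous with respect to the CPU if the host buffer is pinned (e.g., allocated with cudaMallocHost); with pageable memory, as here, they still execute correctly but may block the host.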


Figure 1.1: Trends in NVIDIA GPU performance and the number of SMs over generations. Each pair of a circle and a triangle data point that share the same date represent one NVIDIA TITAN GPU, annotated with its architecture code name; the circle corresponds to the performance metric on the left, and the triangle corresponds to the number of SMs on the right.

The concurrency between kernel executions becomes more important as GPUs become increasingly powerful; the trend of increasing performance in TFLOPS (tera floating-point operations per second) is illustrated in Figure 1.1. In this figure, each data point represents an NVIDIA flagship TITAN GPU product,3 plotted by launch date, with its theoretical peak performance on the left axis and its number of SMs on the right axis. These upward trends indicate the rapidly increasing performance of GPUs and their capacity to support additional on-the-fly GPU threads. Note that each SM supports 2,048 on-the-fly threads across all GPU architecture generations. Although the number of SMs decreased from the Volta to the Turing architecture, performance continued to climb; moreover, the amount of global memory doubled, creating more opportunities for concurrency among memory-intensive GPU applications.
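For scale, consider a hypothetical GPU with 80 SMs, a round number consistent with the later data points in Figure 1.1: it can hold 80 × 2,048 = 163,840 threads on the fly, far more than the handful of hardware threads on a typical multicore CPU, which is why idle GPU capacity quickly becomes significant if kernels cannot share the device.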

Similar trends are followed by product lines of embedded GPUs, which are comparatively impoverished. Nonetheless, this trend further emphasizes the necessity of sharing GPUs: processing cycles should not be wasted, as GPUs used in embedded environments are more sensitive to size, weight, power, and cost issues. Moreover, even if a GPU kernel utilizes all GPU resources during most of its execution, it may leave the GPU partially idle once its threads start exiting.

3Because the model names are similar, we annotate the data points with the architecture code name, and the points with the same date represent the same products.

While sharing GPUs is important, doing so raises challenges for real-time systems. The design principle of GPUs is throughput-oriented, given their origin in accelerating graphics rendering and their only later extension to general-purpose computation, such as neural-network training. The former use case pursues "real-fast" performance, such as generating 30 frames per second for animation rendering, while the latter emphasizes the throughput of completing a training process over a large number of images in hours or even days. Guided by this performance requirement, the design of GPUs often sacrifices real-time predictability in favor of throughput. For example, GPU kernels from multiple processes are executed in a multi-programmed manner, resulting in unpredictable performance that depends on other co-scheduled workloads (Otterness et al., 2017b; Capodieci et al., 2018). Other forms of execution apply different scheduling rules, such as executing kernels from multiple threads in the same address space simultaneously when possible. However, these scheduling rules are undisclosed, including the conditions for concurrency and the execution order.

Closed-source software stacks and unknown hardware details have hindered research using NVIDIA GPUs in many respects. Given their need for high efficiency in utilizing the hardware platform, embedded systems often require internal, low-level software and hardware information, which for NVIDIA GPUs must be obtained through reverse-engineering efforts. The findings obtained in this way, such as scheduling rules, are rarely perfectly accurate due to the possible existence of edge cases. Even though they can be reaffirmed through large-scale experiments, they remain vulnerable to changes from product updates, so their applicability is limited to the short term. While we encourage openness that would eliminate these issues, we perform reverse engineering to move forward, as this information is essential for sharing GPUs in real-time systems, as discussed below.

1.3 Real-Time Systems and GPU Scheduling

We mentioned the required temporal correctness and predictable performance of real-time autonomous-driving systems. To achieve them, we require a solid real-time analysis foundation upon which we can derive real-time guarantees. Although this has been well studied for CPUs, little work is available for GPU scheduling, especially with shared GPUs. In this section, we first review several basic real-time concepts, which are introduced in greater detail in Chapter 2. Then, we motivate the necessity of a novel task model and analysis for shared GPUs.

Most real-time applications are embodied in recurrent tasks, such as the repetitive capture of new images. A classic abstraction for recurrent tasks is the sporadic task model (Mok, 1983), which is applied in this dissertation. A sporadic task repeatedly releases jobs with a minimum time separation called a period, and each job must finish before its deadline. Each task has a utilization that represents the fraction of processor capacity sufficient for meeting its timing constraints. With the timing parameters of the period, deadline, and utilization, a task model can express the timing requirements of a task system (or task set) comprised of a set of tasks.
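In conventional notation (a sketch using the standard symbols; the notation adopted in this dissertation is summarized in Table 2.4), a sporadic task $\tau_i$ with worst-case execution time $C_i$, period $T_i$, and relative deadline $D_i$ has utilization
\[
u_i = \frac{C_i}{T_i}, \qquad U_{\mathrm{sum}} = \sum_i u_i ,
\]
where $U_{\mathrm{sum}}$ is the total utilization of the task system, and a job of $\tau_i$ released at time $t_r$ must complete by $t_r + D_i$.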

The sporadic task model is analyzed to validate temporal correctness, i.e., to determine if a task system is schedulable—if it always satisfies all timing constraints when scheduled by a particular scheduler. These timing constraints are categorized as hard real time (HRT) or soft real time (SRT), where the former requires all deadlines to be met, and the latter tolerates deadline misses. Various definitions of SRT exist, and we focus on one that requires a bounded response time, which is the time difference between the finish time and release time of a job. The concept of schedulability is a core aspect of real-time analysis, as an important goal of real-time analysis is to provide schedulability tests that determine if a task system is schedulable. These types of tests are specific to the scheduling algorithm applied.

For a task system to be schedulable, its total utilization cannot exceed the entire capacity of the platform. However, a task system with a total utilization less than the capacity of the platform may still not be schedulable due to capacity loss incurred by non-optimal scheduling algorithms or real-world overheads.

Algorithmic non-optimality-induced capacity loss, or algorithmic capacity loss, is incurred by all previous work on schedulability analysis for real-time GPUs. This issue is partly due to the GPU programming model and the native low-level scheduler available on GPU hardware. We demonstrate this inevitable capacity loss for the current NVIDIA GPUs in Section 4.2.3, and it is more severe when GPUs are managed as resources (Elliott et al., 2013) or limited to exclusive access (Capodieci et al., 2018). Specifically, with resource-based management (Elliott et al., 2013), executing GPU code introduces task suspensions, which are typically handled with suspension-oblivious analysis (Brandenburg and Anderson, 2010) that analytically views the suspension time as CPU computation time, resulting in significant processing-capacity loss.

Additionally, these works must intervene in GPU kernel launches at runtime, at the driver or hypervisor level, to enforce customized scheduling policies. In contrast, we tackle the problem of deriving schedulability analysis differently by targeting the native NVIDIA GPU scheduling rules, which must first be revealed through reverse engineering, because we can only demonstrate the predictability of a known scheduler, i.e., one that manages GPU kernels that share GPUs. Such predictability then must be proved through real-time schedulability analysis that is conducted on models, e.g., the sporadic task model, just as with CPUs. However, the differences from the architecture and programming model of CPUs necessitate a new task model for GPU tasks. With these goals, we present our thesis statement in the following section.

1.4 Thesis Statement

Computer-vision applications in autonomous-driving systems are subjected to real-time requirements and demand high-throughput performance. Existing computer-vision frameworks that lack real-time concepts and underutilize the underlying hardware platforms must be reconsidered. We exploit pipelining and parallelism in GPU-accelerated computer vision for improved throughput with bounded response times and good accuracy.

The real-time community has investigated the benefits of GPUs. However, prior schedulability-analysis research was subject to algorithmic capacity loss because scheduling was controlled through middleware- or OS-level management. We remove this management layer and target the native GPU scheduler. Furthermore, we enable GPU sharing to improve resource utilization and derive real-time schedulability analysis based upon the revealed GPU scheduling rules to guarantee predictable performance for tasks that share GPUs.

This usage of shared GPUs is applied in our improved computer-vision framework. Together, these efforts support the thesis of this dissertation, stated as follows:

Capacity loss in GPU-using real-time task systems can be lessened through sharing GPUs. Such sharing can be guaranteed with bounded response times. Furthermore, throughput can be improved without compromising such guarantees by exploiting pipelining and parallelism in real-time tasks. These real-time guarantees of tasks on shared GPUs are enabled with a new model for GPU execution and corresponding schedulability analysis.

1.5 Contributions

The contributions of this dissertation are briefly summarized in the following subsections, which outline our efforts towards producing a study of NVIDIA GPUs, a model of GPU execution, a response-time bound analysis, a computer-vision framework, and a case study.

1.5.1 A Study of NVIDIA GPUs beyond Official Documentation

To share GPUs and derive a schedulability analysis for GPU tasks managed by the native NVIDIA GPU scheduler, the first step is to determine the details of the scheduler. In addition to undisclosed scheduling details, interactions between CPUs and GPUs, such as kernel launches, suffer from problematic synchronization issues that behave differently depending on how GPU tasks are executed. We examine these GPU execution approaches to determine which is appropriate for executing real-time tasks.

As discussed above, the public documentation from NVIDIA is incomplete in describing GPU scheduling details, and the software stack is closed-source. For the goal of sharing GPUs and enabling simultaneous execution, the GPU scheduling rules must be identified. Specific details to be resolved include the scheduling order of GPU kernels from multiple CUDA streams, how GPU resources are allocated to these kernels in ways that impact scheduling, and how other special programming features influence scheduling. Through a team effort,4 these questions are answered comprehensively through reverse engineering, with the answers included in our study results in Chapter 3.

In addition to the scheduling rules, we expose pitfalls pertaining to synchronization between CPUs and GPUs. For any task consisting of a combination of CPU and GPU computations, synchronization points exist (e.g., a CPU program must wait until a GPU produces a result). Synchronization inherently leads to blocking terms in scheduling analysis, and we found that the synchronization blocking incurred by GPU tasks is not straightforward to characterize. Further, some forms of synchronization lead to significant capacity loss for both CPUs and GPUs. We constructed experiments that expose these synchronization effects and describe them in Chapter 3, along with pitfalls the unaware programmer may encounter.

We realized a fundamental trade-off exists for designing real-time tasks that use a GPU. A conventional choice is to write and execute the task program as an operating system (OS) process in its own non-shared address space that provides cross-task memory isolation. If this choice is used, however, the NVIDIA GPU programming environment does not permit concurrent computations on the GPU, even if sufficient resources are available. Depending on how GPU programs are organized and written, this can lead to capacity loss on the GPU. An alternate choice is to write and execute a task as a schedulable thread that shares a process address space with other task threads. While cross-task memory isolation is lost, the GPU programming environment provides mechanisms that allow concurrent computations on the GPU. NVIDIA provides a third option that leverages a middleware environment and is claimed to provide the best of both approaches, offering memory isolation while enabling concurrency. We performed a case study using exemplar computer-vision tasks in autonomous vehicles to evaluate these trade-off options, with our results and guidelines presented in Chapter 3.

4Tanya Amert, Nathan Otterness, and I collaboratively published a series of papers investigating NVIDIA GPUs. Our individual contributions will be stated in Chapter 3.

1.5.2 A Model of GPU Execution

The above contributions provide a foundation for understanding GPUs, allowing us to further establish an abstract model of GPU execution. As with CPUs, a task model is created by considering the hardware architecture and programming model that reflect important aspects of GPUs. We extend the existing classic sporadic task model with additional parameters to describe GPU tasks, as presented in Chapter 4. These parameters are combined to express information about the parallel execution of tasks on GPUs.

As previously mentioned, NVIDIA GPUs employ scheduling rules that can inherently cause unnecessary idleness of their processing units, implying a fundamental capacity loss when trying to ensure response-time bounds for GPU tasks. We express such a loss by providing a total utilization bound and show that it is tight via a counterexample: a task system whose total utilization matches this bound but whose response times are unbounded. We also demonstrate that capacity loss can be extreme if intra-task parallelism is forbidden. These two constraints, on total utilization and intra-task parallelism, establish a schedulability condition, and any task system satisfying this condition has bounded response times.

1.5.3 A Response-Time Bound Analysis for Applications Sharing GPUs

Based on the revealed GPU scheduling rules and the newly proposed model of GPU execution, we present the first response-time bounds for tasks that share GPUs. We apply analysis techniques similar to those proposed previously (Liu and Anderson, 2010; Devi and Anderson, 2005, 2008; Erickson et al., 2014; Yang and Anderson, 2014a), including directed-acyclic-graph (DAG)-based analysis for heterogeneous platforms, to derive end-to-end response-time bounds for our applications. These applications are decomposed into nodes of either CPU or GPU execution. For CPU nodes and tasks, we compute response-time bounds using existing analytical results for two real-time schedulers, the global earliest-deadline-first (G-EDF) scheduler and the global fair-lateness (G-FL) scheduler, and for GPU nodes and tasks, we apply our analysis approach from Chapter 4. The end-to-end response time is then calculated by accumulating the response-time bounds of the nodes along the critical path of the DAG.
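As a rough sketch of this accumulation (using generic notation; the precise bounds and any additional overhead terms are developed in Chapters 4 and 6), if $R_v$ denotes the response-time bound computed for node $v$ of a DAG $G$, then
\[
R_{\mathrm{e2e}} \;=\; \max_{\pi \,\in\, \mathrm{paths}(G)} \ \sum_{v \in \pi} R_v ,
\]
where the maximizing source-to-sink path $\pi$ is the critical path of $G$.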

1.5.4 A Computer-Vision Framework Providing Improved Throughput

As stated above, a fundamental challenge is to design a computer-vision framework that supports independent streams of images from multiple cameras. Because the hardware for embedded systems is constrained, enabling the effective utilization of every part of the system by computer-vision applications is essential.

We focus on a demanding computer-vision application needed by autonomous cars: the timely detection and recognition of objects in the surrounding environment using convolutional neural networks (CNNs). Many CNN implementations seem to suffer similarly from limited throughput and poor GPU utilization. As explained further in Chapter 5, the root of the problem lies in viewing processing as a single run-through of all the processing layers comprising a CNN. However, in processing an image sequence from a single camera, each layer's processing of image i is independent of its processing of image i + 1. If the intermediate per-image data created at each layer is buffered and sequenced, there is no reason why successive images from the same camera cannot be processed in different layers concurrently.

This motivates us to consider the classic pipelining solution of sharing resources for increased throughput,5 as sketched below. While pipelining offers the potential for greater parallelism, concurrent executions of the same layer on different images remain precluded. Allowing this may be of limited utility with one camera, but the potential exists for substantially increasing throughput when processing streams of images from multiple independent cameras. This leads us to extend pipelining by enabling each layer to execute in parallel on multiple images.
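To make the buffering-and-sequencing idea concrete, the following host-side sketch (plain C++ with a placeholder process_stage function standing in for a group of CNN layers; it is not code from our Darknet modification) shows how per-stage worker threads and queues let successive images occupy different stages at the same time.

// Pipelined-execution sketch: stage s consumes frames from queues[s] and feeds queues[s+1].
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Frame { int id; /* image data and per-layer buffers would live here */ };

class FrameQueue {            // unbounded thread-safe FIFO between adjacent stages
  std::queue<Frame> q_;
  std::mutex m_;
  std::condition_variable cv_;
 public:
  void push(Frame f) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(f); }
    cv_.notify_one();
  }
  Frame pop() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [&] { return !q_.empty(); });
    Frame f = q_.front(); q_.pop();
    return f;
  }
};

// Placeholder for running one stage's layers (e.g., launching their GPU kernels).
void process_stage(int stage, Frame &f) {
  std::printf("stage %d processing frame %d\n", stage, f.id);
}

int main() {
  const int kStages = 3, kFrames = 6;
  std::vector<FrameQueue> queues(kStages + 1);     // queues[kStages] collects finished frames
  std::vector<std::thread> workers;

  for (int s = 0; s < kStages; ++s)
    workers.emplace_back([s, &queues] {
      for (int i = 0; i < kFrames; ++i) {          // each stage handles every frame, in order
        Frame f = queues[s].pop();
        process_stage(s, f);
        queues[s + 1].push(f);                     // hand the frame to the next stage
      }
    });

  for (int i = 0; i < kFrames; ++i) queues[0].push(Frame{i});   // camera "releases" frames
  for (auto &t : workers) t.join();
  for (int i = 0; i < kFrames; ++i) queues[kStages].pop();      // drain final results
  return 0;
}

Because frame i+1 can enter stage 0 while frame i is still in a later stage, the stages overlap in time; in the implementation described in Chapter 5, each stage's GPU work might, for example, be issued in its own CUDA stream.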

We altered the design of an existing CNN framework, called Darknet (Redmon, 2016), to enable pipelined execution in the CNN, with parallel layer execution as an option. We also implemented a technique whereby separate per-camera images are combined into a single composite image, merging the different camera image streams into one. This compositing technique can greatly increase throughput at the expense of some accuracy loss. We also enabled the balancing of GPU work across the platform's two GPUs,6 which have differing processing capabilities. The CNN variants arising from these options and various other implementation details are discussed and evaluated in Chapter 5.

5For example, instruction pipelining in processors can fully utilize data-path elements and increase throughput.

6We use the NVIDIA DRIVE PX Parker AutoChauffeur, also called 'DRIVE PX2' or just 'PX2'.

In summary, we provide an effective solution for a challenging problem faced by the autonomous-vehicle industry: designing CNN systems that provide the throughput, timeliness, and accuracy needed by computer-vision applications used in automated driving. By rethinking how the CNN is executed, we show that its throughput can improve by more than twofold with a minor increase in latency and no change in accuracy. We also show that throughput can be further improved with proper GPU load balancing. Finally, we show that the compositing technique can enable high throughputs of up to 250 FPS on the PX2. While this technique does incur some accuracy loss, we show that such loss can be mitigated by redoing the "training" CNNs require. We claim these results are sufficiently fundamental to guide CNN construction for similar automotive computer-vision applications on other frameworks or hardware.

1.5.5 A Case Study Evaluating Analytical and Empirical Real-Time Performance

We conduct a case study to evaluate analytical and empirical real-time performance with real-world applications by applying our analysis described previously. In particular, we address the limitation of coarse-grained scheduling from prior works on a computer-vision standard, OpenVX, which specifically targets computer vision in real-time embedded systems.

In prior work, our research group partially addressed the issues of enabling real-time OpenVX by proposing a new OpenVX variant in which individual graph nodes are treated as schedulable entities (Elliott et al., 2015; Yang et al., 2015). This variant allows greater parallelism and enables the computation of end-to-end graph response-time bounds. However, graph nodes remain high-level computer-vision functions, which is problematic for two reasons. First, these high-level nodes still execute sequentially, so some parallelism remains inhibited. Second, such a node will typically involve executing on both a CPU and a GPU. When a node accesses a GPU, it suspends from its assigned CPU. Suspensions are notoriously difficult to handle in schedulability analysis without inducing significant capacity loss, as discussed above. We show that these problems can be addressed through more fine-grained scheduling of OpenVX graphs.

We show how to transform the coarse-grained OpenVX graphs proposed in our group's prior work (Elliott et al., 2015; Yang et al., 2015) to fine-grained variants in which each node accesses either a CPU or a GPU, but not both. Such transformations eliminate the suspension-related analysis difficulties at the expense of minor overheads caused by the management of data sharing. Additionally, our transformation process exposes opportunities for new potential parallelism at many levels. For example, because we decompose a coarse-grained OpenVX node into finer-grained schedulable entities, portions of such a node can now execute in parallel. Also, we allow not only successive invocations of the same graph to execute in parallel but even successive invocations of the same node.

For our case-study experiments conducted to assess the efficacy of our fine-grained graph-scheduling approach, we considered six instances of an OpenVX-implemented computer-vision application called HOG (histogram of oriented gradients), used in pedestrian detection, as scheduled on a multicore and GPU platform. These instances reflect a scenario where multiple camera feeds must be supported. We compared both analytical response-time bounds and observed response times for HOG under coarse-grained vs. fine-grained graph scheduling. We found that bounded response times could be guaranteed for all six camera feeds only under fine-grained scheduling based on our analysis. In fact, under coarse-grained scheduling, just one camera could (barely) be supported. We also found that the observed response times were substantially lower under fine-grained scheduling, and the overhead introduced by converting from coarse-grained to fine-grained had a modest impact. These results demonstrate the importance of enabling fine-grained scheduling in OpenVX if real time is really a first-class concern.

1.6 Organization

For the remainder of this dissertation, we first provide a comprehensive background survey in Chapter 2 of autonomous-driving research, applications of computer vision, GPU hardware platforms, and real-time systems research with a focus on real-time GPUs and DAG scheduling. Following this background of relevant work, we present our investigation of NVIDIA GPUs in Chapter 3 and a new model of GPU execution with schedulability analysis for real-time tasks that share GPUs in Chapter 4. We next describe our computer-vision framework design in Chapter 5, along with its experimental evaluation. Finally, we present a case study that evaluates the analytical and empirical real-time performance using our proposed analysis for GPU applications in Chapter 6. We conclude and discuss future work in Chapter 7.

CHAPTER 2: BACKGROUND

In this chapter, we survey related prior works and provide background information for this dissertation.

We first review the development of autonomous-driving systems with a focus on computer-vision applications as our research context. We then describe the usage of GPUs versus other accelerators for computer- vision applications and the necessary fundamentals of GPUs, including the hardware architecture and GPU programming model. Lastly, we provide the fundamentals of real-time systems, and review the state-of-the-art literature on modeling and analyzing GPUs in real-time systems.

2.1 Autonomous Driving

In this section, we first review state-of-the-art progress in autonomous-driving systems in both industry and academia. We then summarize typical system architectures of autonomous driving and explain each primary component. To motivate our research, we focus on explaining the safety- and timing-critical certification issues that relate to our work.

2.1.1 State of the Art

Autonomous driving is expected to reduce traffic accidents and roadway fatalities, of which 94% are caused by human errors, as reported by the National Highway Traffic Safety Administration (NHTSA) in a recent technical report (National Highway Traffic Safety Administration, 2015). In addition to reducing vehicle crashes, autonomous driving can improve the mobility of people who have disabilities that relate to driving, benefit the environment by improving fuel efficiency and reducing emissions during congestion, give time back to commuters, and even reshape the layouts of our cities. With these benefits, it is projected that the nationwide adoption of autonomous driving could lead to nearly $800 billion USD in annual social and economic benefits in America by 2050 (Montgomery et al., 2018).

Although no industry players are even close to achieving fully autonomous driving (Ackerman, 2017), semi-autonomous advanced driver assist systems (ADASs) are now commonly available in high-end mass-market vehicles; examples include Tesla Autopilot, Volvo Pilot Assist, Mercedes-Benz Drive Pilot, and Cadillac Super Cruise (Hughes, 2017). These ADASs include functionalities such as adaptive cruise control (ACC), forward collision warnings (FCW), lane departure warnings (LDW), lane-keeping assistance systems (LKAS), and intelligent speed assistance (ISA). Each of these functionalities alone only qualifies as Level 1 automation, driver assistance, as defined in the taxonomy by the Society of Automotive Engineers (SAE) (SAE, 2018). When combined, these ADASs may enable Level 2, partial automation, which allows self-driving functionality in certain conditions such as on the highway, with the driver monitoring the environment and being ready to take over at any time as a fallback. With Level 0 denoting no automation, Levels 2 and below require the driver to monitor the environment at all times.

In contrast, Levels 3 and above encompass the classes of autonomous-vehicle (AV) systems. Level 3, conditional automation, allows the driver to focus on tasks other than driving, but requires the driver to be ready to quickly take over upon request. For instance, Audi’s AI Traffic Jam Pilot achieves Level 3, at speeds below 37 miles per hour (MPH) (Ross, 2017).

Levels 4 and 5 require operating autonomously without relying on the driver as a fallback. Level 4, high automation, is limited to certain operational design domains such as highways, whereas Level 5, full automation, must be operational anywhere in any condition. Level 4 and Level 5 autonomous vehicles are still unavailable in the market as of today, and are being developed by many companies. Notably, one of the most press-worthy advancements is the fleet of Waymo cars, which have been drive-tested for over 20 million miles since 2009 (Waymo, 2020); in 2018, this fleet achieved an average of 11,017 miles traveled between disengagements, where emergency overrides by a human operator were enforced (Crowe, 2019a,b). However, this metric for distance between disengagements has been criticized by autonomous-driving companies, including Waymo themselves (Fernandez, 2020), because it is misleading when used for measuring progress as miles tested in different settings are hardly comparable, e.g., empty highway vs. busy urban areas.

The commercialization and production development of autonomous driving were ignited by a series of challenges sponsored by the Defense Advanced Research Projects Agency (DARPA): the Grand Challenges in 2004 and 2005, and the Urban Challenge in 2007 (Buehler et al., 2007, 2009). Autonomous vehicles participating in the Grand Challenges were expected to drive through unrehearsed off-road terrain without manual intervention. In the first event in 2004, none of the 15 racing teams finished more than 5% of a

142-mile course on the desert within the ten-hour time limit (Thrun et al., 2006). In the second event in

2005, five out of 23 racing teams finished, and "Stanley" (from Stanford University), finishing in under seven hours, won first place (Thrun et al., 2006), six minutes ahead of the next finishers, "Sandstorm" and

"H1ghlander" (Urmson et al., 2006). These Grand Challenges spurred innovation in the areas of perception, collision avoidance, vehicle control, and others.

If driving through a desert, as required by the DARPA Grand Challenges, is not considered relevant enough to our daily life, the third event—the DARPA Urban Challenge—should be. The Urban Challenge set the goal of navigating through traffic in urban settings without violating California traffic rules (Buehler et al., 2009;

Urmson et al., 2008). Eleven of the 53 teams passed the National Qualification Event, and competed in the

Urban Challenge Final Event. In the end, six teams finished the challenge, with “Boss” of Carnegie Mellon

University winning first place.

In all of these production and prototype cars, cameras and computer-vision technologies were used alongside processing techniques for other sensors such as lidars and radars.

The DARPA Challenges revitalized interest in autonomous driving, interest that can be traced back to the late 1980s, when Carnegie Mellon University's Navigation Laboratory (Navlab) presented a series of Navlab vehicles with basic autonomy (Thorpe et al., 1988, 1991a,b). The basic system architecture developed by Thorpe et al. (1991b) is common in today's approaches. In 1988, Thorpe et al. presented a van equipped with one TV camera and one laser rangefinder to enable road-following and collision detection and avoidance (Thorpe et al., 1988). Later, Pomerleau (1989, 1992) presented the neural network ALVINN, which generates navigation-direction outputs from camera and laser-rangefinder image inputs. At the end of the Navlab project, an important milestone was achieved when Jochem and Pomerleau (1995) completed "No Hands Across America," a trip from Pittsburgh, PA to San Diego, CA, with the steering direction controlled based on the image inputs and the throttle and brake handled by a human.

During the same period, the Eureka Project PROgraMme for a European Traffic of Highest Efficiency and Unprecedented Safety (PROMETHEUS), the largest R&D autonomous-driving project at the time, was carried out and spread broad interest in Europe in active safety systems (Williams, 1988; Ibañez-Guzmán et al., 2012). Dickmanns and Zapp (1987) first demonstrated the early results of vision-based guidance for high-speed autonomous driving on a well-structured highway with one-way traffic. In the PROMETHEUS project's final demonstration, two vehicles, "VITA II" of Daimler-Benz (Ulmer, 1994) and "VaMoRs-P" of

Bundeswehr University Munich (Dickmanns et al., 1994), demonstrated autonomous driving at a speed up to

130 km per hour in public traffic on highways (Dickmanns et al., 1997).

The development of autonomous driving has been ongoing for some time, and has provided a valuable reference for autonomous-driving system architectures, which we review next.

2.1.2 Autonomous-Driving System Architecture

In this section, we review autonomous-driving system architectures that have been used in state-of-the-art proof-of-concept research projects, and then focus on modular systems, for which we provide a detailed review of each sub-component, and the connections and interfaces between them. Of these components, the perception component is highlighted in our review, given our focus on computer-vision applications.

2.1.2.1 Architecture

Many of the aforementioned autonomous systems (Section 2.1.1) use a modular system architecture, in which components that provide different functionalities are encapsulated separately and are related as the nodes in a connected graph, from the sensor input to the vehicle control (Behere and Törngren, 2016). Another approach is learning-based end-to-end driving systems, which were demonstrated very early on (Pomerleau,

1989) and emerged again recently. These end-to-end driving systems are usually neural networks that receive sensor inputs through the network to directly generate control commands.1 We briefly review the history and state of the art of end-to-end driving systems, and then discuss why we instead focus on modular systems.

End-to-end driving systems. The earliest end-to-end driving system was ALVINN, presented by Pomerleau

(1989). ALVINN is a neural network that generates directions for lane-following with inputs from a camera and a laser rangefinder. It was trained using supervised learning with simulated road images, and it demonstrates lane-following on real roads under certain conditions. Muller et al. (2006) extended its autonomous capability to off-road obstacle avoidance using stereo cameras and a convolutional neural network (CNN). The results were significantly improved in a similar approach by NVIDIA, in which a larger convolutional network and training dataset were used (Bojarski et al., 2016), as demonstrated on highways and parking lots. These approaches are categorized as imitation learning, as they accept human input as part of the training data, and the goal of the system is to drive like a human. However, they fail in situations where turning guidance is needed at intersections, thus Codevilla et al. (2018) proposed conditional imitation learning to add training signals about a driver’s intention to solve the ambiguity.

An alternative approach is reinforcement learning (RL), the goal of which is to maximize future rewards under predefined reward policies (Mnih et al., 2015). Sallab et al. (2017) demonstrated deep RL in a simulator

1Often called pixel-to-torque systems.

using recurrent neural networks (RNNs) that integrated information and handled partially observable scenarios.

Kendall et al. (2019) then demonstrated real-road lane-following trained with a few episodes2 using RL for the first time.

Although these learning-based end-to-end driving systems have shown promising progress, without the necessary understanding of the functions learned inside these neural networks, safety certification for neural networks is difficult if not impossible. Certification for safety-critical systems (e.g., autonomous-driving systems) is rigorous, and follows principles to guarantee correct specification and that the implementation satisfies the specification (Cheng et al., 2018a). Adapting this certification process to neural networks presents new challenges. The specification for a neural network is implicit in its training data, which results in difficulties for design-time analysis, reviews, and product acceptance tests. Neural networks are generally treated as no more than a black box due to their lack of interpretability, thus making it difficult to trace the implementation from requirements to code. With the common usage of non-linear, piecewise activation functions for neurons, criteria such as those for decision coverage are intractable, as the number of conditional branching options is exponential in the number of neurons (Cheng et al., 2018a). These challenges have been investigated in prior works (Bedford et al., 1996; Rodvold, 1999; Burton et al., 2017; Doshi-Velez and

Kim, 2017; Salay et al., 2017; Olah et al., 2018; Cheng et al., 2018b; Aravantinos and Diehl, 2018), but have not been fully addressed yet. Therefore, we focus on modular systems that are less affected by these certification issues in our work.

Modular systems. Modular systems are structured as a graph of connected modules/components in a system (Behere and Törngren, 2016; Yurtsever et al., 2019). Such a decomposition of functionalities, compared with end-to-end systems, provides finer-grained implementation control and better understandability of the whole system. It provides advantages such as the convenience of transferrable knowledge about each sub-component, independent problem-solving for developing each sub-component, multiple end-to-end paths for tasks of different levels of criticality (e.g., a path for emergency handling), and other advantages of a decoupled modular design. However, these advantages have a price: the disadvantages of modular systems include communication overhead between components, scheduling and synchronization issues, and the nature of error propagation in complex systems (McAllister et al., 2017).

2Episodes are independent training processes that maximize the reward, e.g., one driving trip for an autonomous vehicle.

We use a functional architecture presented by Behere and Törngren (2016) as a reference to discuss typical modular autonomous driving systems, as the core functions are common to other systems as well (Thrun et al.,

2006; Urmson et al., 2008; Montemerlo et al., 2008; Guizzo, 2011; Wei et al., 2013; Broggi et al., 2013;

Ziegler et al., 2014; Akai et al., 2017; Maddern et al., 2017; Yurtsever et al., 2019). These include perception functions such as sensor fusion, semantic understanding, and world modeling; planning and control functions such as route planning, reactive control, and fault management; and vehicle platform operation (steering and throttling) (Behere and Törngren, 2016). We review sensor hardware, perception, and planning and control in the following.

2.1.2.2 Sensor Hardware

Multiple sensors provide the necessary understanding and coverage of the surrounding environment, as well as safety redundancy, on autonomous vehicles. Typical sensors include cameras, radar, and lidar. Here we summarize the working mechanisms, advantages, and disadvantages of each.

Cameras. Cameras are widely used in autonomous-driving systems and ADASs. A typical configuration in a contemporary experimental autonomous vehicle includes ten or more cameras that cover different fields of view, orientations, and near/far ranges. Each camera generates a stream of images at rates ranging from 10 to 40 frames per second (FPS), depending on its function (e.g., lower rates for side-facing cameras, higher rates for forward-facing ones). All streams must be processed simultaneously by different computer-vision applications, such as object detection (cars, people, bicycles, signs), scene segmentation (finding roads, sidewalks, and buildings), object tracking, etc.

In addition to benefiting from a proliferation of computer-vision algorithms, cameras are more economical than radars and lidars. 2D image sensing lacks depth information, but this issue is mitigated by using stereo cameras or vision-based depth algorithms (Saxena et al., 2006; Eigen et al., 2014; Laina et al., 2016; Cheng et al., 2018c). However, the accuracy of vision-based perception is drastically limited by weather conditions (e.g., rain or snow) or illumination conditions (e.g., glaring sunlight or poor lighting at night).

The image quality under severe conditions is currently a pressing problem.

Radars. Radar (radio detection and ranging) is used to estimate the distances and relative velocities of objects. It consists of a transmitter that emits pulsed electromagnetic waves at the speed of light, and a

receiver that receives waves reflected by the object. The distance and velocity of objects can be calculated by measuring the time between the transmission and reflection. Compared with cameras or lidar, radar is lightweight and less affected by illumination or weather conditions. However, emissions generated by radars on different vehicles may interfere with each other (Brooker, 2007). Moreover, a large number of radars is required to adequately cover a range of distances and fields of view: 0.15–30 m, ±80° for short-range,

1–100 m, ±40° for medium-range, and 10–250 m, ±15° for long-range radars (Patole et al., 2017). Although radar generates less data, implying fewer processing resources required, its detection results are subject to lower angular accuracy, i.e., it is less capable of resolving two objects at the same distance but at different azimuths (Rasshofer and Gresser, 2005; Patole et al., 2017; Kocić et al., 2018).
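To make the ranging principle concrete, the following small C++ sketch (our own illustration with made-up measurement values, not any vendor's API) computes the range from the measured round-trip time and the radial velocity from the measured Doppler shift, assuming a 77 GHz automotive radar carrier.

    #include <cstdio>

    int main() {
        const double c = 299792458.0;          // speed of light (m/s)
        const double carrier_hz = 77.0e9;      // assumed automotive radar carrier frequency
        const double lambda = c / carrier_hz;  // carrier wavelength (m)

        double round_trip_s = 2.0e-7;          // measured time between transmission and reflection
        double doppler_hz = 2000.0;            // measured Doppler shift of the reflected wave

        double range_m = c * round_trip_s / 2.0;             // wave travels to the object and back
        double radial_velocity = doppler_hz * lambda / 2.0;  // two-way Doppler: f_d = 2 v / lambda

        std::printf("range = %.1f m, radial velocity = %.1f m/s\n", range_m, radial_velocity);
        return 0;
    }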

Lidars. Lidar (light detection and ranging) operates as an optical analog to radar (Warren, 2019). It also measures the time of flight of emitted light pulses reflected from objects and produces a point cloud of 3D data points of its surroundings. It is not affected by illumination conditions, as it is operational under both day and night conditions. Moreover, it achieves higher accuracy (±0.1 m) than radar within a distance of

200 m (Yurtsever et al., 2019) by using infrared light waves (mostly in the 900 nm wavelength range (Kocić et al., 2018)) instead of radio waves (in the 4–12 mm wavelength range (Shaffer, 2017)). However, this results in a sensitivity to environmental conditions such as fog, rain, snow, and dust (Rasshofer and Gresser,

2005).

GPS receiver and IMU. Other sensors also provide essential vehicle status information, such as Global

Positioning System (GPS) receivers for obtaining positioning information and inertial measurement units

(IMUs) for monitoring vehicle movement and velocity. These are useful for localization, navigation, and ego-motion (vehicle’s movement relative to the scene) estimation tasks (Sukkarieh et al., 1999).

2.1.2.3 Perception

Perception refers to the geometric and semantic understanding of the vehicle’s surroundings from the sensory inputs, which are processed or fused to build a world model that feeds into the rest of the system.

This world model consists of layers of information from the static to the dynamic (Papp et al., 2008): (i) static objects such as roads and traffic signs, (ii) static objects that contain dynamic information such as traffic lights, (iii) temporary objects such as construction zones, and (iv) dynamic objects such as vehicles, pedestrians, bicyclists, etc., which must be tracked for behavior analysis and trajectory prediction (Behere

and Törngren, 2016). In addition to building a world model, the perception system is also responsible for estimating vehicle state and ego-motion and localizing the vehicle relative to a reference map. Localization can be considered a separate component, but given its reliance on sensory inputs and its goal of positioning the vehicle in the world model, we conveniently view it as part of the perception system. We briefly describe these perception tasks.

Localization. Localizing the vehicle is a prerequisite for navigation, and it helps in other perception tasks such as locating traffic lights. The three standard methods for localization are GPS-IMU fusion, simultaneous localization and mapping (SLAM), and map-based localization (Yurtsever et al., 2019).

Inertial measurement using IMUs is used for dead reckoning, in which error accumulates and thus must be corrected by absolute positioning information provided by GPS (Zhang et al., 2012). This GPS-IMU fusion approach is efficient in open areas, but suffers from low accuracy where GPS signals are weak.

SLAM generates a map of an unknown environment through online depth sensing while localizing the vehicle within the map simultaneously. It does not require a pre-built map, but can be less efficient compared to GPS-IMU fusion due to the costly process for building the map, especially in outdoor environments (Ranganathan et al., 2013).

Map-based localization is essentially landmark-based matching using sensor data, including point cloud data (Levinson et al., 2007; Levinson and Thrun, 2010) and images (McManus et al., 2013; Wolcott and

Eustice, 2014). Coarser-grained landmark information is most efficient for matching, for example, road-marker matching using lidars (Hata and Wolf, 2014) or cameras (Suhr et al., 2016), and traffic-sign matching using cameras (Qu et al., 2015).

Object detection. Object detection aims to determine the locations and classes of objects in camera images of the vehicle's surroundings. It is crucial to be aware of other traffic participants to avoid accidents, and of signals from traffic lights, signs, and road marks to obey traffic rules. A robust approach to object detection uses sensor fusion that combines different types of sensor data so that various sensors complement each other (Enzweiler and Gavrila, 2011; Cho et al., 2014; González et al., 2016; Chen et al., 2017b). Lidar and radar inherently provide scale information about objects and enable 3D object detection (Zhou and Tuzel,

2018), which, however, remains challenging because of the change in an object's appearance relative to range and the sparsity of data points beyond a certain distance (Yurtsever et al., 2019). In contrast, vision-based object detection has multiple well-established approaches ranging from classical pipelines to end-to-end

neural networks. Given our focus on vision-based object detection in this dissertation, we provide a detailed review in Section 2.2.

Object tracking. In a dynamic environment, an autonomous-driving system must predict the future trajectory of the surrounding objects in order to avoid collision and to make appropriate driving decisions such as maneuvering. A common object-tracking approach is to associate object detections or extracted features with object trajectories, a process called data association (Shi et al., 1994; Wu and Nevatia, 2007; Huang et al.,

2008; Breitenstein et al., 2009). To improve the accuracy of data association, sensor fusion is commonly used (Mobus and Kolbe, 2004; Cho et al., 2014). Following data association, traditional filtering methods, such as Kalman filters (Giebel et al., 2004), are applied to perform physical-model-based state estimation and actual-measurement-based state correction. A recent research trend for object tracking is to apply neural networks to solve the data-association step, or to apply end-to-end learning for object tracking (Leal-Taixé et al., 2016; Tang et al., 2016; Schulter et al., 2017; Milan et al., 2017). A survey on object tracking is provided in (Janai et al., 2019).
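To illustrate the filtering step concretely, the following C++ sketch (a deliberately simplified, one-dimensional constant-velocity model of our own, not taken from any cited tracker) runs the Kalman prediction/correction cycle on a short sequence of associated detections.

    #include <cstdio>

    // Minimal 1D constant-velocity Kalman filter: state is (position p, velocity v),
    // and each measurement z is a noisy position from an associated detection.
    struct KalmanFilter1D {
        double p = 0.0, v = 0.0;                   // state estimate
        double P[2][2] = {{1, 0}, {0, 1}};         // estimate covariance
        double q = 0.01;                           // process noise (assumed diagonal)
        double r = 0.25;                           // measurement-noise variance

        void predict(double dt) {                  // physical-model-based state estimation
            p += dt * v;                           // x <- F x, with F = [[1, dt], [0, 1]]
            double p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q;
            double p01 = P[0][1] + dt * P[1][1];
            double p10 = P[1][0] + dt * P[1][1];
            double p11 = P[1][1] + q;
            P[0][0] = p00; P[0][1] = p01; P[1][0] = p10; P[1][1] = p11;
        }

        void correct(double z) {                   // actual-measurement-based state correction
            double s = P[0][0] + r;                // innovation covariance (H = [1 0])
            double k0 = P[0][0] / s, k1 = P[1][0] / s;   // Kalman gain
            double y = z - p;                      // innovation
            p += k0 * y;
            v += k1 * y;
            double p00 = (1 - k0) * P[0][0], p01 = (1 - k0) * P[0][1];
            double p10 = P[1][0] - k1 * P[0][0], p11 = P[1][1] - k1 * P[0][1];
            P[0][0] = p00; P[0][1] = p01; P[1][0] = p10; P[1][1] = p11;
        }
    };

    int main() {
        KalmanFilter1D kf;
        double detections[] = {1.0, 2.1, 2.9, 4.2, 5.0};   // hypothetical associated detections
        for (double z : detections) {
            kf.predict(0.1);                               // 100 ms between camera frames
            kf.correct(z);
        }
        std::printf("estimated position %.2f, velocity %.2f\n", kf.p, kf.v);
        return 0;
    }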

Semantic segmentation. Semantic segmentation refers to pixel-level labeling to separate regions that are not well-defined by bounding boxes, and is considered particularly useful for high-level scene understanding.

Semantic segmentation accuracy and efficiency have been significantly improved by using neural networks that are able to learn appropriate features, which are instead hand-tuned in traditional methods (Garcia-Garcia et al., 2018). The current state-of-the-art technique is the fully convolutional network (Long et al.,

2015), which modifies classification models by replacing fully connected layers with convolutional ones to obtain spatial heatmaps, which are passed through a deconvolutional layer for upsampling and pixel-level labeling (Garcia-Garcia et al., 2018). Garcia-Garcia et al. (2018) surveyed deep learning techniques for semantic segmentation. For image streams, there are methods that utilize temporal cues in frame sequences to improve the accuracy and frame-processing throughput (Shelhamer et al., 2016; Tran et al., 2015). Such methods introduce interesting real-time scheduling problems in the trade-off between accuracy, throughput, and latency.

Other perception tasks. Other than these core tasks, the perception system also includes other tasks such as depth estimation, road-shape estimation, vehicle state estimation, ego-motion compensation, etc., for which detailed descriptions and reviews are provided in (Luettel et al., 2012; Hussain and Zeadally, 2018).

2.1.2.4 Planning and Control

Given the produced vehicle state estimate and a world model of the environment, the planning and control system navigates and controls the vehicle accordingly. A planning and control system comprises several layers: (i) a global planner that computes highest-level routes to reach destinations, (ii) a behavior planner that follows the route and makes local driving decisions (e.g., lane-following, turning, and stopping),

(iii) a motion planner that continuously finds and selects motion paths to achieve local driving goals, and (iv) a low-level vehicle controller that actuates the vehicle to follow the selected motion path. We briefly review these components in this section.

Global planner. The global planner operates at the highest level in the planning system to search for routes to the destination in the route network. This route network is represented as a directed graph with nodes corresponding to waypoints, edges corresponding to road segments, and edge weights corresponding to traveling costs determined by factors such as distance and traffic conditions. With this graph representation, route planning can be reduced to a minimum-cost path search problem. However, solving this search problem in a large graph using classical approaches like Dijkstra's (Dijkstra et al., 1959) or the A* (Hart et al., 1968) algorithm may be impractical. A survey of other efficient solutions can be found in (Bast et al., 2016).
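As a concrete illustration of this reduction, the following C++ sketch (a textbook implementation over a made-up four-waypoint network, not any production planner) runs Dijkstra's algorithm with edge weights standing in for traveling costs.

    #include <cstdio>
    #include <functional>
    #include <limits>
    #include <queue>
    #include <utility>
    #include <vector>

    // Adjacency list: each edge (waypoint, cost) models a road segment and its traveling cost.
    using Graph = std::vector<std::vector<std::pair<int, double>>>;

    std::vector<double> dijkstra(const Graph& g, int source) {
        const double inf = std::numeric_limits<double>::infinity();
        std::vector<double> dist(g.size(), inf);
        using Item = std::pair<double, int>;                 // (cost so far, waypoint)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
        dist[source] = 0.0;
        pq.push({0.0, source});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d > dist[u]) continue;                       // stale queue entry
            for (auto [v, w] : g[u])
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.push({dist[v], v});
                }
        }
        return dist;
    }

    int main() {
        Graph g(4);                                          // hypothetical route network
        g[0] = {{1, 2.0}, {2, 5.0}};
        g[1] = {{3, 4.0}};
        g[2] = {{3, 1.0}};
        std::printf("minimum cost from waypoint 0 to 3: %.1f\n", dijkstra(g, 0)[3]);
        return 0;
    }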

Behavior planner. A global route cannot be followed precisely, because local conditions must be further considered. These local conditions include the behaviors or driving conventions of other vehicles, traffic signals such as stop signs, construction zones, and traffic lights. Typical scenario examples include the negotiation of unprotected left-turns and intersections. According to the perceived surrounding environment and the predicted behavior of other dynamic objects, the behavior planner makes tactical driving task decisions or selects motion goals to follow the global route in general, and may introduce dynamic workloads into the system. Surveys of state-of-the-art methods can be found in (Paden et al., 2016; Badue et al., 2019).

Motion planner. After the behavior planner issues the motion goal, the motion planner is responsible for generating motion paths or trajectories to achieve the goal. The generated motion paths need to be physically possible considering the vehicle’s kinematic and dynamic constraints while avoiding collisions.

The corresponding motion-planning problem has been well-studied in the robotics field using graph-search-based, sampling-based, interpolating-curve-based, and numerical-optimization-based approaches. A survey

on motion-planning methods can be found in (González et al., 2015; Paden et al., 2016; Yurtsever et al.,

2019).

Vehicle controller. In the end, the motion path is executed by the vehicle controller, which sends commands to the steering wheel, throttle, and brakes of the vehicle to achieve lateral acceleration, longitudinal acceleration, and deceleration, respectively. The controller is also responsible for maintaining the stability of the vehicle, rejecting infeasible motion-path requests, or correcting motion errors. A survey of vehicle control techniques can be found in (Yurtsever et al., 2019).

2.1.3 Summary

In summary, we have described the primary components that compose a basic functional architecture of an autonomous-driving system. Other components such as vehicle-to-vehicle and vehicle-to-infrastructure communication, diagnosis and fault management, and human-interface components are beyond the scope of our discussion. Interested readers can refer to (Behere and Törngren, 2016; Yurtsever et al., 2019).

We conclude this section with the remark that safety certification for autonomous-driving systems presents the challenge of guaranteeing the system's timing correctness, which is complicated by sophisticated hardware platforms such as GPUs and by complex software such as computer-vision applications, which we review next.

2.2 Computer Vision for Autonomous Driving

Computer vision plays a significant role in understanding a vehicle's surroundings using inexpensive sensor devices while preserving color and texture information. We reviewed the state-of-the-art methods for various core perception tasks in Section 2.1.2.3, from which we noted a general transition from using classic computer-vision pipelines to applying neural networks. Both methods are further reviewed herein, with a focus on object detection and its accuracy metrics. We focus on object detection because it is a well-established problem in computer vision for autonomous driving and forms the basis for many other perception tasks, such as object tracking and instance segmentation. We then review commonly used computer-vision frameworks and their pros and cons for real-time embedded systems. Lastly, we focus on a computer-vision application standard, OpenVX (Khronos Group, 2019b), that specifically targets real-time embedded systems, and discuss

its challenges in supporting real-time analysis, as we will address these challenges in our case-study chapter

(Chapter 6).

2.2.1 Methods

Object detection consists of localization and classification of multiple objects in an image. The localization operation predicts the position and dimensions (in pixel units) of a rectangle ("bounding box") in the image that contains an object of interest. The classification operation predicts with a probability what specific class (car, bicycle, sign, etc.) the object belongs to. Collectively, localization and classification are here referred to as object detection.

The development of object detection has passed several milestones, spanning various classic detection pipelines and the current state-of-the-art deep-learning neural networks (Zou et al., 2019). We describe these milestones and their representative algorithms.

2.2.1.1 Classic Pipelines

A classic object-detection pipeline consists of three stages: image preprocessing, region of interest extraction, and object classification (Janai et al., 2019). Images are preprocessed for color normalization, scaling, color space transformation (e.g., RGB to grayscale), etc. A region of interest is usually determined and extracted using a sliding window that searches over an image using all possible positions and scales. An object is classified by examining if a window contains any target objects.

Milestones. Supervised learning is a machine-learning approach that infers a function that maps an input to an output through training processes with labeled example data (Mohri et al., 2018). Papageorgiou and

Poggio (2000) presented one of the first sliding-window-based detectors that applied a supervised-learning model, the support vector machine (SVM), for classification and regression analysis. Viola and Jones (2001) later extended the detector for fast face detection with speed-up techniques: integral images for faster computation, AdaBoost for feature selection, and detection cascades to avoid computational overheads on background regions. In these early works, the Haar wavelet3 feature was commonly used for its low computing resource requirement. Later, gradient-based features were widely adopted after histogram of oriented gradients (HOG) (Dalal and Triggs, 2005a)—a landmark detector inspired by scale-invariant feature

3Haar wavelets compute the differences between pixels of two neighboring rectangle areas.

Figure 2.1: HOG pedestrian detection pipeline (preprocess the input image, calculate gradients, accumulate per-cell histograms, normalize histograms over blocks, form the final feature descriptor, and classify pedestrians using an SVM).

transform (SIFT) (Lowe, 2004)—improved detection results by more than one order of magnitude compared with the Haar wavelet-based detector (Mohan et al., 2001). Therefore, we use HOG as our detection pipeline exemplar and describe it in detail shortly. HOG was further extended as a deformable part-based model

(DPM) (Felzenszwalb et al., 2009), which breaks a target object into parts, e.g., cars into wheels, windows, and body (Zou et al., 2019), and achieved peak performance among the classic object detection pipelines.

HOG. HOG was first used for detecting pedestrians in images (Dalal and Triggs, 2005a), and it can be trained to detect other classes of objects as well. The basic principle of HOG is that the local intensity gradient distribution can characterize the appearance and shape of objects. As shown in Figure 2.1, it consists of multiple pipeline stages: it preprocesses the input image, calculates a directional gradient for each pixel, sorts gradients into histograms, normalizes lighting and contrast, and then performs classification.

The images are first divided into small regions called "cells," and each cell accumulates a histogram of gradient orientations. Each pixel contributes a vote, weighted by its gradient magnitude, to the histogram bin corresponding to its gradient orientation. These histograms of cells are then normalized over a relatively larger region, called

“blocks,” in order to smooth the gradient magnitude differences caused by local variations in illumination and contrast. In the end, the combined histograms of normalized cells in a sliding detection window form the final feature descriptor, which an SVM classifier takes as input to determine if a pedestrian exists in a window.

The entire computation is performed at multiple image-scale levels, using successively smaller versions of the original image, so as to detect objects of different sizes. The computation workload is deterministic and independent of the number of objects in the image.
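As a reference point, OpenCV ships a pretrained HOG-plus-SVM pedestrian detector that follows this pipeline; the short C++ sketch below (the input file name and parameter values are ours) runs it over a sliding window at multiple image scales.

    #include <cstdio>
    #include <vector>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/objdetect.hpp>

    int main() {
        cv::Mat frame = cv::imread("frame.png");       // hypothetical input image
        if (frame.empty()) return 1;

        cv::HOGDescriptor hog;                          // default 64x128 pedestrian window
        hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

        std::vector<cv::Rect> found;
        // Sliding-window detection over an image pyramid; 1.05 is the per-level scale step.
        hog.detectMultiScale(frame, found, 0.0, cv::Size(8, 8), cv::Size(), 1.05);

        for (const cv::Rect& r : found)
            std::printf("pedestrian at (%d, %d), %dx%d\n", r.x, r.y, r.width, r.height);
        return 0;
    }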

2.2.1.2 Deep Learning Methods

At present, most perception tasks use neural-network-based deep-learning methods for better accuracy and efficiency, rather than classic computer-vision pipelines. In contrast with the hand-crafted features used in classic pipelines, a deep-learning neural network can “learn” to localize and classify objects in images via a process called supervised training in which a large number of images labeled with “ground-truth” bounding

boxes and class identifications are input to the CNN running in training mode. Training typically results in computing a large set of parameters (or "weights") defining how the CNN will respond when given images not in the training set (e.g., images from cameras mounted on a car). A "trained" neural network can then be deployed and processes images in inference mode. The weights of the neural network are usually not updated during inference.4

The revival of deep learning was marked by AlexNet (Krizhevsky et al., 2012), a deep CNN that dominated the ImageNet challenge (a competition for object classification). Building on this, we summarize the milestones of CNN-based object detection.

Milestones. The development of object-detection CNNs has branched into two family trees: (i) the multi-stage-region-proposal-based CNNs that first generate candidate region proposals where objects potentially exist, then extract features, and finally classify and localize objects, and (ii) the one-stage regression/classification-based CNNs that localize and classify objects in one consolidated step (Zhao et al., 2019; Zou et al., 2019).

(i) Multi-stage region proposal-based. As mentioned above, the CNN AlexNet marked the success of deep learning for a classification task. Following this success, the regions-with-CNN-features (R-CNN) approach was introduced (Girshick et al., 2014), which naturally employs AlexNet to extract features from regions proposed by selective search (Uijlings et al., 2013) for object detection. These features are then classified using linear SVM5 to predict the presence of objects within each region. For positive regions, a bounding-box regressor finalizes the bounding box for the objects. R-CNN showed better accuracy than previous detectors, however, it suffered from a speed bottleneck due to the overhead of computing features repetitively for overlapping regions. To mitigate this issue, SPPNet enables one-time feature computation for the whole image (He et al., 2015). Both R-CNN and SPPNet use multi-stage pipelines. With this, the process of consolidating stages began, intended for reducing storage expense and improving processing efficiency.

A novel CNN architecture, Fast R-CNN, enabled simultaneous classification and bounding-box regression training by introducing multi-task loss (Girshick, 2015). Faster R-CNN then integrated the region proposal stage with the region proposal network (He et al., 2015), becoming the first end-to-end CNN.

(ii) Single-stage regression/classification-based. Object detection was framed as a regression/classification problem for the first time when Redmon et al. (2016) introduced YOLO, with which “you only look once”

4Online learning requires such updates but is beyond our discussion. 5An SVM that performs linear classification.

Figure 2.2: Traditional camera processing setup and the Tiny YOLOv2 network architecture (cameras 0 through C − 1, each feeding a private CNN whose convolutional, maxpool, and region layers produce detection box results). Larger boxes indicate longer execution times. Frame data flows left to right.

at an image to both localize and classify objects. We take YOLO as a representative CNN in our work, and we provide a detailed description below. Another single-stage CNN, the single-shot multibox detector

(SSD) (Liu et al., 2016), further improved detection accuracy and processing speed by handling objects of different sizes using multiple feature maps of various resolutions from the layers, and by discretizing the bounding box output space using a set of default bounding boxes with different aspect ratios.

You only look once: YOLO. Redmon and Farhadi continued improving YOLO in a series of studies for better accuracy and speed. We take a relatively simple one, Tiny YOLOv2 (simply “YOLO” in the remainder of this dissertation), as an example to describe the computation workload in each layer.

Tiny YOLOv2 implementation. Tiny YOLOv2 is a lower-accuracy, higher-speed derivative of YOLOv2 by the YOLOv2 authors (Redmon and Farhadi, 2017; Redmon, 2017). This version of YOLO runs roughly

five times faster than the standard version of YOLO (≥ 200 FPS vs. ≥ 40 FPS on a high-end GPU) (Redmon,

2017). The network architecture shown in Figure 2.2 is configured as it would be by default for C cameras

(denoted 0 through C − 1). In this default system, each camera uses a separate "private" instantiation of the network. Each instantiation contains nine convolutional layers interleaved with six maxpool layers and a region layer at the end. All YOLO networks run on the Darknet (Redmon, 2016)

CNN framework. A convolutional layer convolves over the input by computing the dot product between input and filters to produce an activation map of features. A maxpool layer applies a non-linear transform to the input by selecting the maximum values from non-overlapping regions. YOLO’s region layer uses the


activation maps of the last convolutional layer alongside precalculated "anchor" values to predict the output locations and bounding boxes of the objects of interest. These anchors characterize the common shapes and sizes of bounding boxes in the training data and help improve the prediction accuracy. We summarize the accuracy metrics in the following section.

Figure 2.3: IoU illustrated with an image example from the PASCAL VOC dataset (Everingham et al., 2010): IoU = (PredictedArea ∩ ExpectedArea) / (PredictedArea ∪ ExpectedArea).

2.2.2 Object-Detection Accuracy Metrics

For classic pipelines, the metrics of false positives per window (FPPW) and false positives per image (FPPI) were used; e.g., HOG was evaluated using FPPW on the INRIA dataset (Dalal and Triggs, 2005a). Recently, mean average precision (mAP) has become the new standard. Here we provide an intuitive understanding of mAP. For a more detailed introduction, we refer the reader to a tutorial by Hui (2018), and for further details on the precision-recall curve and mAP to Section 4.2 in Everingham et al. (2010).

2.2.2.1 IoU, Precision, and Recall

In order to determine the precision of object detection, we first need to measure the quality of object detection, which is done using the Intersection over Union (IoU) measure illustrated in Figure 2.3. This is defined as the proportion of overlapping area between the predicted and expected areas of an object over their union:

IoU = (PredictedArea ∩ ExpectedArea) / (PredictedArea ∪ ExpectedArea).

Figure 2.4: Precision and recall.

If the IoU of a prediction is greater than a chosen threshold (e.g., 0.5, as specified in the PASCAL Visual

Object Classes (VOC) dataset (Everingham et al., 2010)), we consider it to be a “correct” detection, or a true positive.
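As a minimal illustration (a helper of our own with hypothetical box coordinates), the following C++ function computes the IoU of two axis-aligned bounding boxes and applies a 0.5 threshold in the PASCAL VOC style.

    #include <algorithm>
    #include <cstdio>

    struct Box { double x, y, w, h; };       // top-left corner plus width and height, in pixels

    double iou(const Box& a, const Box& b) {
        double ix = std::max(0.0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
        double iy = std::max(0.0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
        double inter = ix * iy;                               // overlapping area
        double uni = a.w * a.h + b.w * b.h - inter;           // union of the two areas
        return uni > 0.0 ? inter / uni : 0.0;
    }

    int main() {
        Box predicted{100, 100, 80, 60};                      // hypothetical detection
        Box expected {110, 105, 80, 60};                      // hypothetical ground truth
        double v = iou(predicted, expected);
        std::printf("IoU = %.2f -> %s\n", v, v > 0.5 ? "true positive" : "no match");
        return 0;
    }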

Being able to measure correct detections for a given class (e.g., car) enables us to measure the precision and recall for that class, which are defined as follows.

Precision = TruePositive / (TruePositive + FalsePositive)

Recall = TruePositive / (TruePositive + FalseNegative)

We illustrate the metrics of precision and recall in Figure 2.4. Precision measures the proportion of correctly detected objects over all predictions, while recall measures the proportion of correctly detected objects of all test objects. A good object detector should be able to detect most of the objects in the scene

(high recall/completeness) and to do so accurately (high precision/correctness). Note that there is typically a trade-off to be made: improving precision usually reduces recall, and vice versa.

2.2.2.2 Precision-Recall Curve and Mean Average Precision

A precision-recall (PR) curve is used to evaluate these trade-offs between precision and recall or false positives and false negatives. Figure 2.5 plots the PR curves of two algorithms with recall on the x axis and precision on the y axis. In general, PR curves with higher precision represent more accurate algorithms, but as illustrated in this example, the two curves cannot be straightforwardly compared. Therefore, a numerical

value that summarizes the precision-recall curve was proposed, called Average Precision (AP). Specifically, it is calculated by taking the average precision at 11 equally spaced recall levels {0, 0.1, ..., 1} (Everingham et al., 2010). Using p_interp(r) to denote the interpolated precision with respect to the recall level r,

AP = (1/11) Σ_{r ∈ {0, 0.1, ..., 1}} p_interp(r), with p_interp(r) = max_{r̃ ≥ r} p(r̃).

An algorithm with high AP has relatively high precision at all levels of recall, and thus can perform more accurate detection. AP is a per-object-class value, and mAP is the mean AP over all classes (e.g., car, people, dog, etc.).

Figure 2.5: Example Precision-Recall curves of two algorithms (Davis and Goadrich, 2006).

2.2.3 Frameworks and Libraries

While accuracy is determined by the model design and training process, inference speed is determined by the frameworks and libraries, which are needed to provide real-time performance for safety-critical autonomous-driving systems. We review the commonly used inference frameworks and libraries: OpenCV, Caffe, Apache MXNet, TensorFlow, and PyTorch. Given our focus on their usage in onboard systems, we describe their design as inference engines; their design for training is beyond the scope of our discussion.

OpenCV. OpenCV is an open-source computer-vision library launched in 1999 (Kaehler and Bradski, 2016) that targeted real-fast applications with a design for computational efficiency. In addition to its initial support

for classic computer-vision pipelines, it evolved to include an inference engine for neural networks. Recently, a new OpenCV Graph API (G-API) has been released for dataflow-graph-based programming, which models applications as graphs of image operations and executes the graph (OpenCV, 2020). It manages the data dependencies and the data flowing between operations. The execution of a graph is triggered by a single call, in contrast to multiple calls of image functions with a conventional programming paradigm. This G-API includes basic functions such as graph construction, validation, and execution. The benefits of graph-based execution are yet to be fully explored, as the current implementation only supports pipelined execution between different types of computing devices.
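As a small sketch of the G-API programming model (a hypothetical two-operation pipeline of ours; headers and exact call names may vary across OpenCV releases), the graph below is declared once and then the whole graph is executed with a single apply() call.

    #include <opencv2/gapi.hpp>
    #include <opencv2/gapi/core.hpp>
    #include <opencv2/gapi/imgproc.hpp>
    #include <opencv2/imgcodecs.hpp>

    int main() {
        // Declare the dataflow graph: grayscale conversion followed by a small blur.
        cv::GMat in;
        cv::GMat gray = cv::gapi::BGR2Gray(in);
        cv::GMat out  = cv::gapi::blur(gray, cv::Size(3, 3));
        cv::GComputation pipeline(in, out);

        // A single call triggers the whole graph; G-API manages the data dependencies.
        cv::Mat frame = cv::imread("frame.png");   // hypothetical input image
        cv::Mat result;
        pipeline.apply(cv::gin(frame), cv::gout(result));
        return 0;
    }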

Caffe. Caffe is the first widely used open-source deep learning framework. It was developed and is maintained by the UC Berkeley team (Jia et al., 2014). Its clean separation between model representation and program implementation enables easy deployment between platforms and convenient model modifications for research experiments. Such separation is an instance of declarative programming, which requires users to construct and express the structure of a program using pre-defined components—what is to be computed. For example, the neural network model is specified in configuration files in Caffe. In addition, Caffe's modular design eases the extension of network layers, loss functions, and data formats.

Apache MXNet. Apache MXNet (or simply MXNet) is maintained as an Apache incubation project (Chen et al., 2015; The Apache Software Foundation (ASF), 2020). It combines declarative symbolic expressions and imperative tensor computations to maximize computation and memory efficiency and flexibility. In contrast to declarative programming, imperative programming describes the control flow—how something is to be computed. MXNet uses declarative programming for specifying the neural network model to apply graph optimization such as graph pruning (dead code elimination), kernel fusion, and memory-space reuse, but for tensor computation, it allows imperative programming to leverage the convenience of using the host language and to debug the program interactively.

TensorFlow. TensorFlow is an open-source machine-learning system developed by Google (Abadi et al.,

2016). It adopts declarative programming, as does MXNet, using a dataflow-graph-based programming model to scale on distributed clusters and operate in heterogeneous systems. With this uniform dataflow-graph-based programming abstraction, users define the program as a symbolic dataflow graph in the first phase. In the second phase, system-level optimization is applied before the program is executed. Such optimization can be informed by the structure of the graph, and system information such as the specifications

of available computational devices. TensorFlow applies standard optimization techniques such as graph pruning, common subexpression elimination, constant folding, kernel fusion, etc. In addition to the standard kernel functions provided, such as matrix multiplication, users can customize their own. To ease the debugging and prototyping process, TensorFlow later added Eager Execution, which enables imperative programming to evaluate operations immediately without declaring a whole computational graph (Agrawal et al., 2019).

PyTorch. In contrast, PyTorch (Paszke et al., 2019) only applies imperative programming to provide users full control and better usability, achieved by making careful trade-offs between usability and performance. Its developers prioritized researchers' needs for a familiar programming environment in the

Python ecosystem, simple and consistent interfaces, and convenient debugging and plotting tools. Paszke et al. also argued that for the fast-paced field of artificial intelligence, a simple implementation design that is more adaptive to changes is better. These principles make PyTorch the best for training, and for deployment on systems with abundant resources. In their estimation, the performance was within 17% of that of the fastest framework. Therefore, PyTorch is not ideal for safety-critical real-time embedded systems, where any waste of system resources should be avoided.

Before summarizing the challenges of implementing a real-time certification-amenable framework, we review a newly developed computer-vision standard, OpenVX.

2.2.4 OpenVX Standard

OpenVX is an open standard C API published by the Khronos Group for real-time and embedded computer-vision applications (Khronos Group, 2019b). By specifying a set of computer-vision function interfaces, which a conformant implementation should support, OpenVX aims for cross-platform acceleration for computer-vision applications. Although the interfaces are defined in C, the underlying backend can be implemented in any language, such as OpenCL or CUDA.

OpenVX uses dataflow-graph-based programming, which is widely used in the other computer-vision frameworks mentioned above (OpenCV, MXNet, and TensorFlow). In an OpenVX graph, each node represents an image operation, for which the formats of the inputs and outputs are specified. The nodes of operations are explicitly constructed into graphs. Edges are implicitly defined by sharing data objects between nodes. Similar to TensorFlow, users are allowed to customize node functions in addition to the ones the OpenVX implementation provides, as we see in the following OpenVX graph example.

vxColorConvertNode: Converts color spaces of the image (into gray in this case).
myComputeScaleLevelsNode (User): Computes scale levels and resizes the image.
vxHOGCellsNode: Computes the gradient orientation histograms of the cells (Section 2.2.1.1).
vxHOGFeatureNode: Normalizes the gradient magnitudes of the cells and produces HOG feature vectors for sliding windows.
myClassifyPedestrianNode (User): Classifies whether sliding windows contain a pedestrian.
myCollectDetectionLocationsNode (User): Collects locations (bounding boxes) where a pedestrian is detected.

Table 2.1: Description of functions used in the HOG pedestrian-detection example. Customized node functions are prefixed with my and indicated with (User).

Figure 2.6: OpenVX graph example: pedestrian detection using HOG (vxColorConvertNode, then myComputeScaleLevelsNode, followed by per-scale branches of vxHOGCellsNode, vxHOGFeatureNode, myClassifyPedestrianNode, and myCollectDetectionLocationsNode).

We illustrate an OpenVX graph for a HOG-based pedestrian-detection application in Figure 2.6, with the description of nodes in Table 2.1. As we reviewed above, HOG processes multiple scales of the image to handle objects of different sizes, so the graph branches out after myComputeScaleLevelsNode, with each branch processing a scale. Each edge has a data object, which is omitted in our illustration for simplicity.
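To make the construction of such a graph concrete, the following C-style sketch (simplified to a single scale, with error checking omitted and the user-defined my* nodes assumed to be registered elsewhere) builds, verifies, and executes a reduced version of the graph in Figure 2.6 using standard OpenVX calls.

    #include <VX/vx.h>

    int main() {
        vx_context context = vxCreateContext();
        vx_graph graph = vxCreateGraph(context);

        // Data objects shared between nodes define the graph edges implicitly.
        vx_image input = vxCreateImage(context, 1280, 720, VX_DF_IMAGE_RGB);
        vx_image gray  = vxCreateImage(context, 1280, 720, VX_DF_IMAGE_U8);

        // Standard node provided by the OpenVX implementation (first node in Figure 2.6).
        vxColorConvertNode(graph, input, gray);

        // The vxHOGCellsNode/vxHOGFeatureNode nodes and the user-defined my* nodes of
        // Table 2.1 would be appended here in the same fashion, one chain per scale level.

        // Validate the graph once; then execute it for each incoming frame.
        if (vxVerifyGraph(graph) == VX_SUCCESS)
            vxProcessGraph(graph);

        vxReleaseGraph(&graph);
        vxReleaseContext(&context);
        return 0;
    }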

Graph optimization is encouraged, e.g., aggregate function replacement (kernel fusion), inter-process communication (IPC) aggregation, tiled processing, pipelining, etc. (Rainey et al., 2014). Exercising these optimization techniques and implementing efficient kernels require an understanding of the deployment targets in regard to hardware architecture and processing and memory efficiency. Therefore, OpenVX encourages hardware vendors, its expected adopters, to provide the best-optimized implementation for their platforms, such as CPUs, GPUs, and FPGAs. Several major vendors (AMD, Intel, NVIDIA, etc.) have adopted OpenVX; however, many have discontinued their conformance testing for new standard versions.

Although OpenVX has yet to see wide adoption, it is the state-of-the-art standard where real time meets computer vision.

For application in a real-time embedded system, the OpenVX standard and the aforementioned computer-vision frameworks present challenges for deriving real-time guarantees, which we summarize next.

2.2.5 Challenges of a Real-Time Certification-Amenable Framework

Certifying that timing requirements are met in autonomous-driving systems will be important as we progress toward mass-market vehicles with greater autonomy, requiring real-time computer-vision frameworks that are amenable to certification. We next discuss four major challenges of achieving such frameworks.

Lack of real-time concepts. The major research challenge facing all computer-vision frameworks is that they lack control over timing and scheduling. The lack of timing concepts, such as deadlines and periods, makes it impossible to express any timing requirements, and no tools for measuring worst-case execution time are available. Moreover, the priorities of an application cannot be specified, leaving scheduling order and analysis ambiguous. A full-fledged framework with these real-time concepts is needed.

Coarse-grained scheduling. Frameworks such as OpenVX and Caffe execute neural networks in coarse-grained schedulable entities. For example, network graphs are executed end-to-end, and details about inter- or intra-graph parallelization or pipelining are unspecified. A real-time graph scheduler on a heterogeneous platform requires a finer-grained view, which raises many questions. For example, what should the schedulable entities be? Do we schedule individual nodes, or even lower-level components? How do we handle multiple graphs that execute in parallel and share accelerators such as GPUs? To definitively answer these questions, we need to implement and compare multiple scheduling policies. Although we noticed in TensorFlow and

MXNet that more parallelism is explored at a fine-grained level, the scheduling of applications running atop these frameworks is handled by best-effort schedulers in the operating system, rather than real-time schedulers.

Lack of system-level awareness in applications. The performance evaluation of these frameworks is conducted by running applications in isolation. However, this evaluation setup misrepresents real systems, resulting in misleading conclusions. In a deployment environment, many applications coexist and share resources—memory capacity, input-output (I/O) devices, and processors. However, none of these frameworks provides an awareness of other applications or a communication mechanism between them, leaving critical system-level optimization elusive if not inaccessible. To achieve system-level optimization, a framework may extend its control of applications to a middleware-level manager, or expose applications' timing and scheduling information to the OS.

Gap between computer-vision and real-time researchers. Autonomous-driving development hits a stone wall when the notion of "real time" is different for computer-vision and real-time researchers. Nonetheless, instead of burdening computer-vision researchers with acquiring expertise on real-time systems, we need to bridge this gap through a framework that is amenable to real-time certification, and that maintains usability for computer-vision developers. This requires declarative programming interfaces that allow computer-vision application developers to define a program without worrying about running the workload with real-time guarantees. It also requires configuration and evaluation tools that allow system engineers to protect the computer-vision system from interference and obtain real-time analysis results, such as response-time bounds, without worrying about conducting theoretical analysis.

These challenges are further complicated by the underlying hardware accelerators we review next.

2.3 GPUs and Accelerators

Autonomous driving has become more promising with advances in computer vision, which dramatically took off as deep-learning methods became more efficient with the computational power that accelerators like GPUs provide. Our research focuses on GPUs for reasons that we explain shortly. Nonetheless, we deem our research methodology to be applicable to other accelerators as well. We will briefly describe these accelerators and compare them. Then we focus on NVIDIA GPUs and provide a detailed review of their programming model, software architecture, and hardware architecture. After presenting these GPU fundamentals, we review prior work on GPU acceleration and scheduling. Lastly, we discuss the challenges of achieving real-time processing on GPUs.

2.3.1 Accelerators

Cars are manufactured by integrating components (including computing hardware and software) from a supply chain of various vendors (“Tier-1” and “Tier-2” suppliers). To meet the computational demands of computer-vision applications, supplier offerings are usually based on application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or GPUs. FPGAs and DSPs have advantages over GPUs in power and density, but require specialized knowledge to use the hardware and its tool-chain environment. Moreover, integrating computer-vision applications supplied by different sources using proprietary black-box solutions built on FPGAs or DSPs can increase product cost. Because of these complicating factors, we consider GPUs to be the primary solution for computer-vision applications. GPUs can be used with familiar development tools and a large number of open-source software choices, TensorFlow (Abadi et al., 2016) being a notable example.

We describe these accelerators in detail and compare them in terms of processing speed, power consumption, development toolchain and libraries, and development time.

GPUs. A GPU is no longer just a graphics engine as it was originally intended to be, but also a highly parallel programmable processor for general-purpose (GP) computation—a GPGPU. Among the GPUs produced by major vendors—NVIDIA, AMD, and Intel—NVIDIA dominates the market, and its GPUs are widely used in research. NVIDIA GPU families range from server and desktop products to embedded-class ones. We focus on the embedded platforms, as they fit autonomous vehicles the best, and we survey the speed performance of NVIDIA embedded GPUs first.

The performance of “integrated” and “discrete” NVIDIA embedded GPUs is significantly different.

Integrated GPUs (iGPUs) and CPUs are integrated at the silicon level on the same chip and share memory, whereas discrete GPUs (dGPUs) and CPUs are connected through a peripheral component interconnect

(PCI) bus and have separate memory. In the latest NVIDIA Drive AGX series, which has been advertised for production-level autonomous vehicles, the Xavier platform includes NVIDIA Volta-architecture iGPUs, which provide a processing speed of 20 tera one-byte-integer operations per second (TOPS (INT8)) or 1.3 tera single-precision-floating-point operations per second (TFLOPS (FP32)). In comparison, the Pegasus platform includes additional NVIDIA Turing-architecture dGPUs, which provide 130 TOPS (INT8) or 8.1

TFLOPS (FP32) (NVIDIA, 2020b). The performance difference between the two platforms is more than six-fold; however, the higher performance of dGPUs comes at the price of power consumption.

Unfortunately, the official power consumption specifications of dGPUs and iGPUs in the NVIDIA Drive

AGX series are unavailable, although the total power consumptions of the Xavier and Pegasus platforms have been published (Mujtaba, 2018). Xavier is equipped with eight-core “Carmel” CPUs (of ARM v8 instruction set architecture (ISA)) and other accelerators for deep learning, video, and image processing, in addition to one 512-core Volta iGPU. In total, Xavier consumes 30 W of power. Pegasus consists of two Xavier boards and two 3072-core Turing dGPUs. In total, Pegasus consumes 500 W (Mujtaba, 2018). We can estimate the power consumption of one dGPU to be 220 W, which is more than seven-fold that of one iGPU (less than 30 W). These numbers are imprecise, and the actual power consumption varies with the running workload, but they do give a sense of the power cost for higher performance.

There are various tools and libraries for NVIDIA GPUs. All computer-vision frameworks discussed in Section 2.2.3 support NVIDIA GPUs, and these accelerators are the only ones supported by some frameworks (e.g., Caffe). Moreover, NVIDIA provides a collection of acceleration libraries: math libraries such as cuBLAS and cuFFT, deep-learning libraries such as cuDNN and TensorRT, and image- and video-processing libraries like NVIDIA Performance Primitives (NPP). All these frameworks and libraries were built with CUDA, which includes a full-fledged development toolchain. These tools and libraries ease programming effort and shorten development time. The advantage of abundant tools and libraries alone could justify choosing GPUs over the other accelerators we discuss next.

FPGAs. Field-programmable gate arrays (FPGAs) are configurable and reprogrammable after manufacturing, as the name suggests. Optimizing an algorithm-specific configuration requires an understanding of hardware design. The configurations of FPGAs are described in hardware description languages (HDLs), which are complicated and different from high-level languages like C, hence increasing development time. Moreover, development time for FPGAs is longer than for GPUs owing to the poor availability of tools and libraries. Only a limited set is provided by major FPGA vendors such as Xilinx (e.g., Vivado, a design suite for configuring Xilinx FPGAs), and these tools are neither free nor open-source.

A configurable FPGA’s power consumption largely depends on the workload it is designed to execute. An application-specific comparison shows that FPGAs are more power-efficient than other accelerators (Nurvitadhi et al., 2016).

DSPs. Digital signal processors (DSPs) are specialized and optimized for processing signals captured by devices such as cameras and radars, and are considered a low-cost, low-latency, and power-efficient solution for signal-processing algorithms. These algorithms can be executed on CPUs or other accelerators as well, but with lower efficiency. As with programs running on CPUs, DSP programs use a high-level programming language such as C/C++ or assembly language. Using a high-level language shortens development time; moreover, free libraries are available to ease development effort. For instance, a DSP Library (DSPLIB) includes processing algorithms such as convolution and the fast Fourier transform (FFT) (Texas Instruments, 2018). However, similar to CPUs, DSPs are more suitable for serial tasks than for massively parallel ones.

                        GPUs    FPGAs   DSPs
Performance             ++      ++      +
Tools and libraries     ++      −       +
Development time        ++      −       +
Power efficiency        +       ++      ++

Table 2.2: Comparison between accelerators. The more “+”s, the better; the more “−”s, the worse.

We summarize the comparison between GPUs, FPGAs, and DSPs in Table 2.2. Note that no up-to-date comparison among state-of-the-art accelerators is available in the literature; several dated comparisons are summarized in (HajiRassouliha et al., 2018). Nonetheless, this rough comparison should be sufficient to justify the wide usage of GPUs by autonomous-driving companies and our focus on GPUs in this dissertation.

2.3.2 GPU Hardware Architecture

The work presented in this dissertation refers to the Kepler, Maxwell, Pascal, and Volta GPU architectures by NVIDIA. NVIDIA introduced these four different generations of GPU architectures, in that order, within a time span of approximately five years (2012–2017)—a pace of change more rapid than normally seen in

CPU generations. Fortunately, across the NVIDIA GPU generations, the core architecture designs remained the same. We first describe the GPU hardware architecture elements that are common across generations.

Later, we present the concrete architectures of the CUDA-enabled devices we use in detail.

Figure 2.7 shows the NVIDIA GPU architecture, while omitting subtle changes across the aforementioned generations. It comprises copy engines (CEs) for transferring data between GPUs and the host system, and an execution engine6 (EE) for computing results. Copy engines share the system memory bus with the

GPU global memory, which consists of several banks of dynamic random-access memory (DRAM). We describe the components of the EE and their interactions in the following.

The GPU workload (the GPU programs to be executed) is scheduled in a hierarchical manner. As shown at the top of Figure 2.7, a GigaThread engine dispatches the workload to the processing units, called streaming multiprocessors (SMs). The number of SMs varies between generations, with the newer architectures having a greater number, as mentioned in Section 1.2: newer hardware contains more transistors, allowing more SMs and hence greater parallelism. Each SM is further organized into multiple partitions called processing blocks, the number of which varies across generations. As we will discuss in the CUDA

6It is also called the computing engine, but we call it the execution engine to avoid confusion with the copy engine when abbreviated.

[Figure 2.7 (diagram): NVIDIA GPU architecture, depicting the GigaThread engine, copy engine(s), and streaming multiprocessors (SMs); each SM contains processing blocks with L0 instruction caches, warp schedulers, dispatch units, register files, and cores, along with an L1 instruction cache, L1/texture/constant caches, shared memory, and texture units, connected through the L2 cache and memory controllers to global memory.]

programming model section below (Section 2.3.3), the GPU program is executed by threads, which are grouped into thread arrays called warps. A warp is the smallest unit that is scheduled for execution in the processing block by the warp scheduler. In each cycle, the warp scheduler selects a warp to run, for which each dispatch unit dispatches the next instruction, and these are finally executed by the various cores.

There are different types of cores: execution units for single-precision floating-point operations (FP32), double-precision floating-point operations (FP64), integer operations (INT32), and, in the latest Volta architecture, new Tensor cores for deep-learning operations (NVIDIA, 2017a). The FP32 cores (the most frequently used ones) are also referred to as CUDA cores. The number of CUDA cores per SM has decreased with the newer

NVIDIA GPU generations, but the level of resources has been retained, thus increasing the resource quota for each core and hence each thread. There are also texture units that are dedicated to image sampling and

filtering.

These cores access data from the hierarchical memory. The memory that is nearest and fastest is the register file. The register file is reserved when the GigaThread engine dispatches a workload to the SMs.

Therefore, the thread register usage limits the number of running threads per SM. In each processing block, an L0 instruction cache is introduced with the Volta architecture, and it provides higher efficiency than the instruction buffer that existed in prior architectures. The processing blocks in an SM share the L1 instruction

                                        Kepler                  Maxwell   Pascal      Volta
Processing Blocks / SM                  1                       4         2           4
Warp Schedulers / Processing Block      4                       1         2           2
Dispatching Units / Processing Block    8                       2         2           1
FP32 Cores / SM                         192                     128       64 or 128   64
FP64 Cores / SM                         64                      4         32          32
Tensor Cores / SM                       N/A                     N/A       N/A         8
Shared Memory Size / SM                 Configurable options:   96 KB     64 KB       Configurable
                                        16, 32, or 48 KB                              up to 96 KB

Table 2.3: Comparison between NVIDIA GPU architecture generations

cache, L1 data cache, texture cache, constant cache, and shared memory. Most importantly, the shared memory allows data sharing and synchronization among the threads of a block running on the SM. Because shared memory also needs to be reserved, a GPU program’s shared-memory usage is another factor limiting the number of active threads in the SM. All SMs share an L2 cache and have access to the global memory and the read-only texture and constant memory.
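To make the role of shared memory and intra-block synchronization concrete, the following is a minimal illustrative CUDA kernel (not taken from this dissertation’s implementations) in which the threads of a block cooperatively reduce their portion of an input array; the block size of 256 is an assumption, and it must be a power of two for the loop below to be correct.

__global__ void blockSum(const float *in, float *out) {
    __shared__ float buf[256];                  // shared memory, reserved per block
    unsigned int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                            // all threads in the block see buf
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            buf[tid] += buf[tid + s];           // pairwise partial sums
        __syncthreads();                        // synchronize before the next stride
    }
    if (tid == 0)
        out[blockIdx.x] = buf[0];               // thread 0 writes the block's result
}

Both the statically declared 256-float buffer and the per-thread registers count against the SM’s reserved resources, which is why heavy shared-memory usage limits how many such blocks can be active on an SM at once.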

We noted changes in the number of several components; a summary of these changes is provided in Table 2.3. For each generation, products can be configured with different numbers of SMs and different L2 cache and global memory sizes, so these product-specific configurations are omitted here.

2.3.2.1 CUDA-Enabled Devices

We consider both dGPUs and iGPUs. An iGPU, such as the one in the NVIDIA Jetson TX2 shown on the left in Figure 2.8, is part of a system-on-chip (SoC) implementation that is combined with conventional multicore

CPUs. The SoC is packaged along with DRAM and external connectors as a small (approximately 7 inches square) single-board computer. The iGPU shares hardware resources, such as the memory controller and

DRAM, with the CPU cores. The TX2 runs the Linux OS, with additional support from closed-source binary drivers provided by NVIDIA. The TX2’s low size, weight, and power (SWaP) requirements and low price make it a good exemplar of a GPU-enabled platform designed for embedding in autonomous systems.

Figure 2.8 (left) shows the high-level architecture of the TX2. The TX2 contains six heterogeneous ARMv8 CPU cores, 8 GB of DRAM, and an integrated Pascal GPU. The TX2’s GPU consists of two SMs, each comprised of 128 GPU cores. The SMs together can be logically viewed as an EE. Additionally, there is

[Figure 2.8 (diagrams): on the left, the Jetson TX2 with its Denver and A57 CPU clusters, integrated Pascal GPU (SM 0 and SM 1, 128 cores each), copy engine, 512 KB GPU L2 cache, and a memory controller shared with the DRAM banks; on the right, the discrete GeForce GTX 1070 with SMs 0–14, two copy engines, a 2048 KB GPU L2 cache, and its own DRAM banks, attached to the host machine over the PCIe bus. DRAM bank count and size depend on the device package.]

a hardware CE that can copy data between memory regions allocated for CPU use and regions allocated for GPU use. The iGPU has fewer GPU cores than is typically found in high-end GPUs used for graphics, gaming, and high-performance computing (HPC) applications. We are interested in exploiting any potential for sharing the TX2’s GPU for multiple tasks so that its computing capacity is not unnecessarily wasted.

Shown on the right in Figure 2.8 is the architecture of the GTX 1070, an example of a dGPU. Discrete

GPUs consist only of the SMs and local device memory, typically packaged on an adapter card for mounting in a PCIe expansion slot on a computer motherboard. Like all dGPUs, the GTX 1070 does not share memory with the host CPU, instead using the PCIe bus to copy data to and from the host memory. This GPU features many more SMs than the TX2, increasing the potential benefit attainable if it is shared among multiple tasks.

It also has two CEs and a larger L2 cache.

2.3.3 GPU Programming Model

Today’s GPUs evolved from a fixed-function special-purpose hardware pipeline for graphics rendering.

Initially, the hardware pipeline was configurable but not programmable (Owens et al., 2008). Each stage in the pipeline then became more flexible for running user programs when general-purpose programs were adapted and mapped to the graphics pipelines. However, such hardware pipelines are limited by their performance bottleneck—the slowest stage of processing. The solution for mitigating this issue was unified processing units, as in the modern GPU hardware already discussed in Section 2.3.2, so that the mapping process for graphics APIs was no longer necessary. This change eventually made possible the GPGPU programming model we describe in this section.

Algorithm 1 Vector Addition Pseudocode.
 1: kernel VECADD(A: ptr to int, B: ptr to int, C: ptr to int)
 2:    i := blockDim.x * blockIdx.x + threadIdx.x          ▷ Calculate index based on built-in thread and block information
 3:    C[i] := A[i] + B[i]
 4: end kernel
 5: procedure MAIN
 6:    cudaMalloc(d_A)                                      ▷ (i) Allocate GPU memory for arrays A, B, and C
 7:    ...
 8:    cudaMemcpy(d_A, h_A)                                 ▷ (ii) Copy data from CPU to GPU memory for arrays A and B
 9:    ...
10:    vecAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C)   ▷ (iii) Launch the kernel
11:    cudaMemcpy(h_C, d_C)                                 ▷ (iv) Copy results from GPU to CPU array C
12:    cudaFree(d_A)                                        ▷ (v) Free GPU memory for arrays A, B, and C
13:    ...
14: end procedure
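For concreteness, the following is a minimal compilable sketch of the program that Algorithm 1 outlines; it is not drawn from this dissertation’s artifacts, and the element count N (assumed to be a multiple of the block size) and the uninitialized host arrays are purely illustrative.

#include <cstdlib>
#include <cuda_runtime.h>

__global__ void vecAdd(const int *A, const int *B, int *C) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;   // index from built-in variables (Line 2)
    C[i] = A[i] + B[i];
}

int main() {
    const int N = 1 << 20;                           // illustrative element count
    size_t bytes = N * sizeof(int);
    int *h_A = (int *)malloc(bytes), *h_B = (int *)malloc(bytes), *h_C = (int *)malloc(bytes);
    int *d_A, *d_B, *d_C;
    cudaMalloc((void **)&d_A, bytes);                                 // (i) allocate GPU memory
    cudaMalloc((void **)&d_B, bytes);
    cudaMalloc((void **)&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);              // (ii) copy inputs to the GPU
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);
    int threadsPerBlock = 256;
    int numBlocks = N / threadsPerBlock;
    vecAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C);            // (iii) launch the kernel
    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);              // (iv) copy results back
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);                      // (v) free GPU memory
    free(h_A); free(h_B); free(h_C);
    return 0;
}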

The GPGPU programming model follows a single-instruction multi-thread (SIMT) programming paradigm. SIMT on GPUs is equivalent to a single-instruction multi-data (SIMD) programming paradigm, as each thread usually processes a different data element. Following the SIMT programming paradigm, GPU threads process data elements on multiple processing units in the GPU, providing data parallelism in GPU computing space. To enable SIMT, NVIDIA provides CUDA programming interfaces, which we illustrate with a VECADD program as follows.

With GPUs as co-processors to CPUs, a CUDA program runs as a task (process or thread) on a CPU and relies on a GPU for some part of its computational requirements.7 The general structure of a CUDA program when it needs to interact with a GPU is as follows: (i) allocate memory for the GPU to use; (ii) copy input data from CPU memory to GPU memory; (iii) launch execution of a GPU program called a kernel to process the data; (iv) copy the results from GPU memory back to CPU memory; (v) free any unneeded memory.

CUDA extends the C/C++ language to allow the definition of GPU functions as kernels. CUDA kernels are written from the perspective of a single GPU thread. Consider the CUDA program expressed in pseudocode in Algorithm 1 as an example: it uses the kernel VECADD, which defines that each GPU thread adds a single pair of elements and stores the sum in a corresponding location in an output array. Each

7Note that both CPU and GPU computations are specified in the same CUDA program.

GPU thread is assigned an index, which can be retrieved via special global system-defined variables, as demonstrated in Line 2. In this example, thread indices are used to locate the array elements.

A kernel runs on a GPU through a set of thread blocks that can be executed in any order. Each thread block, or simply block, comprises a number of threads. As seen in Line 10 of Algorithm 1, the number of blocks and threads per block are programmer-specified and can be set at runtime when a kernel is launched.

The CUDA runtime uses these values to launch and organize GPU threads (or simply threads8).

These threads are scheduled on GPUs with block granularity: resources are reserved and released per block, rather than per thread or kernel. In other words, blocks are the schedulable entities on NVIDIA GPUs.

Threads in a block are launched as a whole when an SM becomes ready with enough resources for the entire block. All threads in a block are executed on the same SM non-preemptively until their completion. A block is assigned to the first SM that is ready. However, when multiple SMs are available, it is undocumented which

SM the block will be assigned to—the block-to-SM assignment rules have been partially dissected in recent work (Sañudo et al., 2020). Multiple blocks from the same kernel can execute in parallel. A kernel starts when at least one thread of one block starts; a kernel completes when all threads in all blocks have exited.

However, not all threads in a block execute in parallel, nor is this an efficient way to explore parallelism.

Threads in a block are further grouped into warps. It is within a warp that the same instruction is executed by all threads in lock-step; different warps from the same thread block can execute different instructions. In this way, stalls due to memory accesses and waiting for special units can be hidden by scheduling another ready warp. However, when the program diverges into different branches within a warp, the branches are serialized: threads of only one branch execute at a time, and the remaining threads are left inactive, wasting resources. Such penalties depend on the size of the warp—a smaller warp causes less waste, but increases the hardware complexity of warp scheduling and management. Throughout all NVIDIA GPU architectures, the warp size has remained 32.
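As an illustration (a hypothetical toy kernel, not from the dissertation), the two branches below are taken by alternating lanes of the same 32-thread warp, so they cannot issue together, and each branch leaves half of the warp’s lanes inactive while it executes:

__global__ void divergentKernel(int *out) {
    int i = threadIdx.x;
    if (i % 2 == 0)
        out[i] = i * 2;      // even lanes take this branch
    else
        out[i] = i + 100;    // odd lanes take this branch; the warp serializes the two paths
}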

Warp scheduling determines the low-level execution, but unfortunately, its scheduling policy is implemented in hardware, is undocumented, and is subject to change between architecture generations. For example, prior to the Volta architecture, the execution of branches was serialized, whereas since the Volta architecture, the execution of branches can be interleaved. Fortunately, at block granularity there exist higher-level scheduling and resource management, for which we conduct our real-time analysis.

8We henceforth use the term thread to mean a hardware thread on the GPU, and refer to threads that execute on CPUs explicitly as CPU threads.

[Figure 2.9: Diagram illustrating the relation between CUDA programs, kernels, thread blocks, and warps, distinguishing total time, kernel time, block time, and warp time.]

We illustrate the execution of these entities described so far in Figure 2.9. In this figure, we show the execution timeline of one kernel launch. As mentioned, parallelism is explored between warps and between blocks. The figure distinguishes time at different granularities: warp time, the time taken to execute one warp; block time, the time taken to execute a single block (from the start time of its first warp to the end time of its last warp); kernel time, the time taken to execute a kernel (from the start time of its first block to the end time of its last block); and total time, the time taken to execute the entire CUDA program (both CPU and GPU portions).

We refer to kernels and memory-copy operations collectively as GPU operations, which are submitted to a GPU in CUDA streams. Operations within a stream are executed in FIFO order. By default, the NULL stream is used, but users can submit operations to multiple user-defined streams.9 Kernels from different streams can run concurrently if sufficient internal resources are available. Concurrent kernel execution (CKE) was first introduced in the Fermi architecture and then enhanced with the Hyper-Q features introduced in the

Kepler architecture (NVIDIA, 2009, 2012). Copy operations are handled by the GPU’s CE and can proceed in parallel with kernel executions on the EE.
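The following hedged sketch shows these stream mechanics; the two trivial kernels, the launch configurations, and the buffers (which must be large enough for the launches shown) are illustrative placeholders, and whether the kernels actually overlap depends on available EE resources.

__global__ void kernelA(float *x) { x[blockIdx.x * blockDim.x + threadIdx.x] += 1.0f; }
__global__ void kernelB(float *y) { y[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f; }

void launchConcurrently(float *d_x, float *d_y, float *d_in, const float *h_in, size_t bytes) {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Operations within each stream execute in FIFO order; kernels issued to
    // different streams may run concurrently if the EE has enough free resources.
    kernelA<<<64, 256, 0, s1>>>(d_x);
    kernelB<<<64, 256, 0, s2>>>(d_y);

    // A copy issued to s2 is serviced by a CE, so it can overlap kernelA's
    // execution on the EE (within s2 it still waits for kernelB, in FIFO order).
    cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, s2);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}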

True parallelism is limited to kernels in the same CUDA context, which is usually one per OS process. It has been shown that kernels from different CUDA contexts are not truly concurrent but execute in a multiprogrammed manner (Otterness et al., 2017b), i.e., blocks from different kernels never overlap. Fortunately,

NVIDIA provides the Multi-Process Service (MPS), which funnels kernels from different OS processes into a shared CUDA context so they can execute in parallel (NVIDIA, 2019e). MPS was further improved

9CUDA documentation guarantees that operations within a stream are executed in FIFO order, but does not describe how operations from different streams are ordered.

to include quality-of-service features to manage execution resource provisioning on the Volta architecture.

Moreover, since the introduction of Volta, processes using MPS can retain individual CUDA contexts.

As mentioned above, a CUDA program makes CUDA API calls that can be synchronous or asynchronous; for many calls, both variants are available. For example, cudaMemcpy and cudaMemcpyAsync both copy data between regions of CPU memory and GPU memory, or between two regions of GPU memory, but when memory is pinned (i.e., page-locked), cudaMemcpyAsync can return control to the calling CPU task before the copy is completed, whereas cudaMemcpy blocks the CPU task until the memory copy completes.
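The contrast can be sketched as follows; this is an illustrative fragment rather than code from the dissertation, and the transfer size bytes is an assumed parameter.

void overlappedCopy(size_t bytes) {
    float *h_buf, *d_buf;
    cudaMallocHost((void **)&h_buf, bytes);    // pinned (page-locked) host memory
    cudaMalloc((void **)&d_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // With pinned host memory, this call can return before the copy finishes,
    // so the CPU task may do other work that overlaps the transfer.
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
    // ... other CPU work ...
    cudaStreamSynchronize(stream);             // block only when the result is needed

    // The synchronous variant would instead block the CPU task until completion:
    // cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    cudaStreamDestroy(stream);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
}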

Kernel launches are always supposed to be asynchronous. The CUDA documentation10 (NVIDIA, 2019c), however, uses a narrow definition of “asynchronous” that can be misleading. According to the documentation, “concurrent host execution is facilitated through asynchronous library functions that return control to the host thread before the device completes the requested task.” Notably, this definition does not imply that asynchronous API calls are nonblocking to the CPU. As noted in Section 3.3, we have found situations in which kernel launches still cause CPU blocking even if the API call returns before the requested kernel completes.

In summary, we have reviewed different accelerators and the NVIDIA GPU hardware architecture and programming model thus far, which provide the GPU fundamentals for the review of prior work on non-real-time GPUs.

2.3.4 Prior Work on Non-Real-Time GPUs

As GPUs support general-purpose computation beyond embedded and real-time systems, we dedicate this section to reviewing prior works on non-real-time GPUs. We first review reverse-engineering works that reveal details beyond what is included in the official documentation. This revealed information is important for resource management, which we review next. We then review GPU sharing techniques and

GPU virtualization. Lastly, we review non-real-time GPU scheduling frameworks.

As we will see in our real-time GPU review, many of the techniques for non-real-time systems inspired variants used in real-time ones.

10Section 3.2.5.1 of the Programming Guide for CUDA version 10.2.89.

2.3.4.1 GPU Reverse Engineering

GPU program optimization, modeling and analysis, and resource management require a deep understanding of GPU runtime and hardware details: GPU microarchitecture, memory hierarchy, scheduling policies, etc. However, owing to the limited documentation and the closed-source software and hardware stack of NVIDIA

GPUs, these details are often either unavailable or confusing.

This has led to a series of microbenchmark-based reverse-engineering studies. An early GPU architecture,

Tesla, was benchmarked by Volkov and Demmel (2008) to achieve fast fine-tuned linear algebra using measurements of the memory-system structure and performance, kernel-launch overheads, and processing-unit latency. Many related works have continued such reverse-engineering efforts for every NVIDIA GPU architecture generation (Wong et al., 2010; Meltzer et al., 2013; Mei and Chu, 2016; Jia et al., 2018; Jain et al., 2019). They noted findings that were inconsistent with the official documents, e.g., the existence of a translation lookaside buffer (TLB) (Volkov and Demmel, 2008) and specifics regarding texture and constant caches (Wong et al., 2010). Such inconsistency still persists in other respects, such as asynchronous execution, which we will discuss. These inconsistencies are problematic, as they could impose a debugging burden and even cause damage in safety-critical settings.

These reverse-engineering efforts focused on discovering the structures of memory hierarchies and measuring the performance of low-level hardware units. Their results were shown to be useful for application performance optimization (Volkov and Demmel, 2008) and interference reduction via hardware isolation (Jain et al., 2019).

To enable OS-level management of NVIDIA GPUs, the interactions between the black-box library, driver, and hardware must be understood. By tracing the events between these entities, Menychtas et al. (2013) demonstrated that an interaction state machine can be automatically inferred. This state machine was then leveraged to build an interception-based OS-level GPU scheduling management framework (Menychtas et al., 2014). The open-source NVIDIA driver Nouveau also relies on tracing memory-mapped I/O accesses to reverse-engineer NVIDIA GPU drivers (X.org Foundation, 2014); however, Nouveau does not support CUDA.

These reverse-engineering efforts reveal interactions within the black-box software and hardware stack, but not GPU scheduling policies, which are fundamental for real-time analysis. We will discuss our effort to understand GPU scheduling rules in this dissertation. After we published these scheduling rules, Sañudo et al. (2020) further investigated the block-to-SM assignment policy and noted the impacts of memory usage (shared memory and registers) on block-to-SM assignments, but as they noted, the complete policy has yet to be revealed. Nonetheless, because blocks are balanced across SMs, as they also noted, any benefit to our real-time analysis from considering the block-to-SM assignment would be subtle.

2.3.4.2 GPU Resource Management

We now review GPU resource-management techniques for cache, memory, and computing resources.

GPU cache management. The GPU cache is an effective but limited resource for reducing memory-access latency. However, the massive number of threads and the bursty memory-access patterns of GPU workloads can cause cache thrashing and interference. Prior works presented techniques to mitigate cache interference through cache bypassing and cache isolation. Special cache operators for cache-bypassing control are available in the Parallel Thread Execution (PTX) Instruction Set Architecture (ISA) provided by NVIDIA (NVIDIA,

2020). Some prior works have experimented with different cache-bypassing policies (Li et al., 2015a,b,c;

Ausavarungnirun et al., 2015), but they relied on profiling mechanisms they implemented on a GPU simulator,

GPGPU-sim (Bakhoda et al., 2009), leaving their work inapplicable on real hardware unless adopted by vendors.

Jain et al. (2019) reverse-engineered the hash function that maps physical memory addresses to cache sets. They first identified all pairs of addresses that are mapped into the same cache set and thus can evict each other. Then they found the only valid hash function by brute force, under the assumptions that (i) physical addresses are the inputs of the hash function, (ii) only bitwise operations are used in the hash function for fast hardware implementation, and (iii) caches apply Least Recently Used (LRU) as the eviction policy. They had to modify (the open-source portion of) the NVIDIA driver to obtain physical addresses. With their modification and the reverse-engineered hash function, they implemented page coloring to partition L2 cache allocation.

GPU memory management. In addition to cache isolation, Jain et al. (2019) implemented DRAM partitioning. Using the same approach, they reverse-engineered the hash function that maps physical memory addresses to DRAM banks. Based on their observations, GPU DRAM and L2 cache sets can be organized into memory modules. They showed that memory interference can be mitigated by isolating applications to different memory modules through page coloring. Other prior work experimented with different memory-management schemes on the simulator GPGPU-sim (Jog et al., 2014; Mao et al., 2016; Jog et al., 2016; Ausavarungnirun et al., 2017, 2018).

GPU computing resource management. Managing computing resources on GPUs via partitioning allows for spatial multitasking, which has been investigated on GPU simulators (Adriaens et al., 2012; Ubal et al.,

2012; Aguilera et al., 2014a,b; Ukidave et al., 2014; Tanasic et al., 2014).

For real hardware, Wu et al. (2015) presented a software approach that transforms programs to execute their workloads on assigned SMs. A transformed program determines the work of each thread based on the SM ID instead of the thread ID used in the traditional CUDA programming model. With such an SM-centric transformation, a kernel can be confined to a subset of SMs, allowing spatial scheduling of concurrent kernels. This approach was applied by Janzén et al. (2016) to improve the throughput of poorly scaling HPC tasks, and applied by Jain et al. (2019) along with memory reservations.
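The primitive underlying such SM-centric transformations can be sketched as follows; this is an illustrative fragment rather than the cited authors’ implementation, and the early-return policy shown here is a simplification (the cited work instead remaps each block’s share of the work based on the SM ID).

__device__ unsigned int smId() {
    unsigned int id;
    // %smid is a PTX special register holding the ID of the SM on which
    // the calling thread is currently running.
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

__global__ void smCentricKernel(unsigned int allowedSm, int *out) {
    // Blocks that land on a different SM simply return; only blocks placed
    // on the designated SM perform work, yielding a crude spatial partition.
    if (smId() != allowedSm)
        return;
    // ... process the work assigned to this SM ...
}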

We have reviewed prior works on GPU resource management in this section. Some of these techniques are also useful for GPU sharing, which we review next.

2.3.4.3 GPU Sharing

As GPUs evolve to increase their capacity, it is difficult for applications to scale as well, or for a single application to continue fully utilizing the resources on newer GPUs. Moreover, any application may underutilize the GPU during its final (exiting) stage of execution, when in extreme cases only one SM may be in use. Therefore,

GPU sharing is necessary. In the previous section, we reviewed resource-management techniques, many of which impact GPU sharing efficiency. For example, partitioning computing and memory resources allows spatial multitasking sharing while reducing resource contention. In this section, we review other techniques for mitigating interference and other GPU-sharing techniques, including kernel fusion, kernel slicing, and kernel preemption.

GPU interference and concurrency. A prior body of work examined how to characterize kernels and predict the interference between their concurrent execution. Many prior works regarding interference analysis or mitigation use GPU simulators (Kerr et al., 2009; Goswami et al., 2010; Sethia and Mahlke, 2014; Jog et al., 2015; Wang et al., 2016; Kayıran et al., 2013; Hu et al., 2016; Xu et al., 2016a; Dai et al., 2018).

However, these methods rely on hardware changes or metrics that are unavailable with real hardware. For example, Jog et al. (2015) focused on memory interference using a misses-per-kilo-instruction metric, which is unavailable with real hardware-tracing tools (NVIDIA, 2019d). Therefore, we focus on research work using real hardware.

Phull et al. (2012) proposed a performance-degradation-analysis model based on profiling an application’s repetitive GPU access and execution pattern—specifically, time gaps between GPU calls and average GPU execution time. They proposed two interference-prediction methods: a simulation-based one and one based on timed Petri nets (TPNs) (Ramchandani, 1973), a formal model for analyzing execution in concurrent systems. The predicted interference results are used to inform kernel-to-device assignment in a cluster environment. Because such prediction relies on a repetitive execution pattern, when one is unavailable, an online monitoring and response mechanism takes over to detect excessive interference and reassign interfering kernels.

Before concurrent kernel execution was introduced in the NVIDIA Kepler architecture, kernel fusion was a common technique that virtually enabled concurrency on GPUs by merging multiple kernels into one (Wang et al., 2010; Ravi et al., 2011; Gregg et al., 2012). Ravi et al. (2011) proposed grading the potential improvement of consolidating two kernels using an affinity score, which is determined by considering the kernels’ resource requirements: compute resources (number of threads) on SMs and shared-memory usage.

Higher contention due to conflicting resource requirements leads to a lower affinity score, indicating fewer concurrency benefits. In addition, they proposed a couple of molding policies to change the resource requirements by varying the number of threads in the kernel, enabling more concurrency opportunities.

However, their experiments were limited to kernels that could accept any number of threads. This limitation was removed in later work that supported transforming any kernel into an elastic kernel (Pai et al., 2013).

Such a transformation is also applied to divide a kernel into smaller slices that comprise fewer threads and thus require fewer resources, creating more opportunities for concurrency. To achieve this dynamically, Zhong and He (2013) proposed the Kernelet runtime, which transforms intermediate-level or assembly-level code and thus does not require source-code modification. For memory-copy operations, Gdev slices memory transactions into chunks in order to overlap bidirectional memory transfers (Kato et al., 2012).
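At the source level, the idea of kernel slicing can be sketched as follows (the cited Kernelet work instead operates on intermediate- or assembly-level code); the kernel, the stream, and the sizing parameters are hypothetical, and the total block count is assumed to divide evenly into slices.

__global__ void vecAddSlice(const int *A, const int *B, int *C, int offset) {
    int i = offset + blockDim.x * blockIdx.x + threadIdx.x;
    C[i] = A[i] + B[i];
}

// Instead of one launch with totalBlocks blocks, issue numSlices smaller launches;
// each slice occupies fewer resources at a time, leaving room on the EE for
// kernels from other streams to run concurrently.
void slicedLaunch(const int *d_A, const int *d_B, int *d_C,
                  int totalBlocks, int numSlices, int threadsPerBlock, cudaStream_t stream) {
    int blocksPerSlice = totalBlocks / numSlices;        // assumed to divide evenly
    for (int s = 0; s < numSlices; s++) {
        int offset = s * blocksPerSlice * threadsPerBlock;
        vecAddSlice<<<blocksPerSlice, threadsPerBlock, 0, stream>>>(d_A, d_B, d_C, offset);
    }
}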

Other similar work that estimates the interference or explores the benefits of concurrent kernel execution include (Ravi et al., 2011; Pai et al., 2013; Ukidave et al., 2016; Chen et al., 2017a; Carvalho et al., 2017;

Wen et al., 2018; Carvalho et al., 2020).

GPU preemption. GPU preemption helps prevent starvation and improves the fairness and responsiveness of GPU kernels. Compute preemption at the instruction level was introduced with the NVIDIA Pascal

architecture (NVIDIA, 2016b). Before that, to enable GPU preemption, Tanasic et al. (2014) proposed SM draining. During SM draining, each SM is preempted by waiting for the existing thread blocks to complete.

As we will reveal in Section 3.2.2, kernels from streams with higher priorities preempt ones with lower priorities in a similar way at the boundary of thread-block execution. This preemption mechanism relies on the GPU execution model: each thread block is executed independently and has its own state. Because the running thread blocks exit normally and ones that have not commenced yet have no context, it is unnecessary to save the context of the preempted kernel, which avoids context-switching overheads. However, the preemption latency depends on the thread blocks’ remaining execution time, which can be long, or even unbounded with a persistent kernel.
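The user-visible side of this mechanism is CUDA’s stream-priority API, sketched below; the two kernels and their launch configurations are hypothetical placeholders, and the block-boundary preemption behavior described above reflects our observations rather than documented guarantees.

__global__ void longKernel(float *x)   { /* ... many blocks of lengthy work ... */ }
__global__ void urgentKernel(float *y) { /* ... a short, urgent computation ... */ }

void prioritizedLaunch(float *d_x, float *d_y) {
    int lowPrio, highPrio;
    // Query the legal priority range; numerically lower values mean higher priority.
    cudaDeviceGetStreamPriorityRange(&lowPrio, &highPrio);

    cudaStream_t loStream, hiStream;
    cudaStreamCreateWithPriority(&loStream, cudaStreamNonBlocking, lowPrio);
    cudaStreamCreateWithPriority(&hiStream, cudaStreamNonBlocking, highPrio);

    // Blocks of urgentKernel displace longKernel only at thread-block boundaries:
    // longKernel's already-running blocks finish, and its remaining blocks wait.
    longKernel<<<1024, 256, 0, loStream>>>(d_x);
    urgentKernel<<<32, 256, 0, hiStream>>>(d_y);

    cudaStreamSynchronize(hiStream);
    cudaStreamSynchronize(loStream);
}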

To reduce preemption latency, Park et al. (2015) experimented on GPGPU-sim with another preemption technique, SM flushing, which stops thread blocks immediately if they are idempotent (De Kruijf and

Sankaralingam, 2011). A thread block is idempotent if it produces the same results even when rerun multiple times. Because not all kernels satisfy this idempotence condition, SM flushing has limited applicability. In addition, although SM flushing decreases preemption latency, it forces the rerunning of a thread block, resulting in overheads that may decrease overall throughput. To mitigate such overheads, preemption techniques—SM draining, SM flushing, or baseline context switching—are dynamically selected considering the execution progress of the thread block (Park et al., 2015).

Wang et al. (2016) and Lin et al. (2016) also experimented on GPGPU-sim to produce a finer-grained preemption mechanism.

We have discussed many GPU-sharing techniques, from interference mitigation to efficient preemption. Next, we review an important GPU-sharing scenario in cloud environments using GPU virtualization.

2.3.4.4 GPU Virtualization

As power-efficient and cost-effective accelerators, GPUs are widely used in cloud systems (Microsoft

Azure, 2020; Amazon AWS, 2020; Google Cloud, 2020) where resources are shared among clients, each of which is often assigned a virtual machine that provides the functionality of a physical computer. The virtualization of GPUs as a GPU-sharing technique, however, is a new and challenging area for which we review the implementation techniques: API intercepting and remoting, driver-level virtualization, and

hardware-supported virtualization. We focus on the virtualization solutions for GPGPU computing and omit those for graphics.

API intercepting and remoting. User-space API intercepting and remoting is the most common GPU virtualization technique, because it does not require interaction with the closed-source driver or hardware. A common approach to API intercepting and remoting is similar to that of remote procedure calls (RPCs). It takes three steps: first, the GPU calls in the guest OS are intercepted by wrapping the runtime API library; second, the intercepted calls are sent to and processed remotely by the host OS; and third, the computation results are returned to the guest OS. Adapting to library updates is affordable because the wrapper library can be generated automatically, and because the wrapper library is dynamically linked, this approach also avoids any modification or recompilation of client applications. Prior works have proposed various performance-improvement methods for this approach (Gupta et al., 2009, 2011; Duato et al., 2010; Giunta et al., 2010; Shi et al., 2011).
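The local interception half of such schemes can be sketched with a dynamic-linker wrapper; this assumes the application links the CUDA runtime dynamically, the forwarding/remoting step is elided, and the file and library names are illustrative.

// intercept.cpp — build as a shared library and preload it, e.g.:
//   g++ -shared -fPIC intercept.cpp -o libintercept.so -ldl
//   LD_PRELOAD=./libintercept.so ./cuda_app
#include <dlfcn.h>
#include <cstdio>
#include <cuda_runtime.h>

extern "C" cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim, dim3 blockDim,
                                        void **args, size_t sharedMem, cudaStream_t stream) {
    using launch_t = cudaError_t (*)(const void *, dim3, dim3, void **, size_t, cudaStream_t);
    static launch_t realLaunch = (launch_t)dlsym(RTLD_NEXT, "cudaLaunchKernel");

    // A remoting or scheduling layer would marshal and forward the call here;
    // this sketch only logs the launch before handing it to the real runtime.
    fprintf(stderr, "[intercept] launch: grid=(%u,%u,%u) block=(%u,%u,%u)\n",
            gridDim.x, gridDim.y, gridDim.z, blockDim.x, blockDim.y, blockDim.z);
    return realLaunch(func, gridDim, blockDim, args, sharedMem, stream);
}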

Driver-level virtualization. This is possible with open-source GPU device drivers made by vendors like

AMD (AMD, 2016) or third parties involved in reverse-engineering efforts (X.org Foundation, 2020).

Using an open-source Nouveau driver on NVIDIA GPUs, GPUvm creates shadow copies of GPUs to isolate GPU usage from multiple guest OSs, each of which owns shadow page tables (SPTs) and GPU command channels (Suzuki et al., 2014). In addition, GPUvm provides fair time-sharing of GPUs using a bandwidth-aware non-preemptive device (BAND) scheduler, and prioritizes guest OSs considering their utilization and their periodically replenished budgets (credits). The BAND scheduler is based on the Credit scheduler in Xen (Bensaou et al., 2001; Barham et al., 2003) with a modification that compensates for credit

(budget) accounting errors due to non-preemptive GPU execution.

Huang et al. (2016) implemented a virtualization solution for the Heterogeneous System Architecture

(HSA) (Kyriazis, 2012). The goal of HSA is to improve the communication efficiency and programmability of heterogeneous platforms. It reduces memory copies by sharing memory space between the CPU and

GPU, supports user-mode queuing for dispatching kernels that would otherwise be passed through the device driver, and uses memory-based instead of interrupt-based kernel-completion signaling to reduce communication latency. Therefore, virtualizing HSA-based GPUs must be memory-based as well. As page tables are shared between the CPU and GPU, the memory translation from guest virtual address (GVA) to machine physical address (MPA) can be accomplished by sharing the SPT between the CPU and GPU as well. Given that HSA manages GPU commands in user-space queues, the hypervisor only needs to provide the GVAs of these queues to the GPU. This memory-based management in HSA makes virtualization simpler compared with other GPU system architectures. From the GPU’s perspective, queues from guest OSs are managed and scheduled similarly to ones from processes in the host OS.

Hardware-supported virtualization. Hardware-supported virtualization for GPUs depends on the features provided by the vendors. Intel VT-d (Abramson et al., 2006) and AMD-Vi (AMD, 2016) support a single guest OS per GPU. NVIDIA GRID (Herrera, 2014) supports sharing the GPU with multiple guest OSs by separating the physical address space of the GPU and enabling independent address translation. Such hardware isolation is important for real-time systems. However, NVIDIA GRID is limited to graphics.

2.3.4.5 GPU Scheduling

GPU scheduling is managed in the runtime, driver, and hardware stack. We review work devoted to controlling GPU scheduling through code modification, runtime replacement, or system-level control.

GPU persistent threads. In contrast to traditional GPU programming, where thread blocks are completely scheduled by the hardware, a different programming style that uses persistent threads (PT) enables software scheduling, as sketched below. It uses a smaller number of threads that persist through the kernel’s lifetime and fetch and process all work items from software work queues; workloads are scheduled into these work queues, bypassing hardware scheduling. Most of the advantageous features summarized by Gupta et al. (2012) have since been natively supported in later CUDA generations, as we discuss next.
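A minimal sketch of the PT pattern follows; the queue layout, the atomic work-claiming scheme, and the launch sizing are illustrative assumptions rather than any particular published implementation.

__global__ void persistentWorker(const int *workQueue, int queueLen, int *head) {
    while (true) {
        __shared__ int item;
        if (threadIdx.x == 0)
            item = atomicAdd(head, 1);     // the block claims the next work item
        __syncthreads();
        if (item >= queueLen)
            return;                        // queue drained: this resident block exits
        // ... all threads of the block cooperate on workQueue[item] ...
        __syncthreads();                   // finish the item before claiming another
    }
}

// Launched with only as many blocks as can be simultaneously resident
// (e.g., numSMs * blocksPerSM), rather than one block per work item.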

Tzeng et al. (2010) applied PT for dynamically generated irregular-parallel workloads, which were poorly supported in GPUs at the time. In traditional non-PT GPU programming, to handle the generated workloads, the host must copy the results and then launch new kernels accordingly. In contrast, using PT, the generated workloads can be appended to the work queues for further processing. Later, dynamically generated workloads were natively supported by a feature named dynamic parallelism that supports kernel launches from the GPU (NVIDIA, 2019c).

Early on, synchronization between GPU threads was only supported within each thread block. Global synchronization across thread blocks required dividing kernels at synchronization points, and was accomplished through extra host-device communication to relaunch kernels for the portion after the synchronization points, similar to the way generated workloads are handled. Using PT, Stuart and Owens (2009) handled global synchronization in their implementation of message-passing APIs on GPUs by allowing all thread blocks

to be resident. Later, in 2017, NVIDIA supported global synchronization with a feature named cooperative groups (NVIDIA, 2019c).
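A minimal sketch of grid-wide synchronization with cooperative groups follows; the two-phase kernel is hypothetical, and such a grid can only be synchronized when the kernel is launched cooperatively (with all blocks resident), e.g., via cudaLaunchCooperativeKernel.

#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void twoPhaseKernel(float *data, int n) {
    cg::grid_group grid = cg::this_grid();
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f;        // phase 1
    grid.sync();                                // barrier across all thread blocks
    if (i < n && i > 0) data[i] += data[0];     // phase 2 sees all phase-1 results
}

// Cooperative launch (all blocks must fit on the GPU at once):
//   void *args[] = { &d_data, &n };
//   cudaLaunchCooperativeKernel((void *)twoPhaseKernel, gridDim, blockDim, args, 0, 0);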

GPU kernel reordering. Before Hyper-Q was introduced with the Kepler architecture, kernels launched in multiple streams could suffer from false serialization: all kernels were submitted to a single hardware queue, so consecutive kernels from the same stream in that queue could block independent kernels from other streams behind them. Wende et al. (2012) proposed a producer-consumer solution, in which a consumer thread pulls kernels from the work-item queues of multiple producer threads in round-robin fashion, effectively avoiding consecutive kernels from the same stream in the hardware queue. Kernel ordering is also explored in (Lázaro-Muñoz et al., 2017; Zhou et al., 2018).

GPU as a first-class device. In current system architectures, GPUs are managed as I/O devices by drivers.

However, as GPUs become capable of processing more general-purpose computation, managing GPUs as first-class devices in the OS is expected to enable system-level optimization. Meanwhile, it also enables using the GPU for OS tasks, as demonstrated by the file-system encryption tasks supported in PTask (Rossbach et al., 2011), Gdev (Kato et al., 2012), and GPUstore (Sun et al., 2012).

PTask manages GPUs as first-class devices with a set of OS abstractions (Rossbach et al., 2011).

These abstractions provide a dataflow programming model that represents applications as graphs, in which vertices represent CPU or GPU work. The work and graph are declared by application programmers and then scheduled by the OS. In this way, the OS can exploit graph scheduling optimization and eliminate unnecessary data movements with system-wide visibility. For example, PTask scheduling in the data-aware mode prioritizes scheduling a task to the device where its input data resides, and incorporates OS priority in the priority mode to improve scheduling fairness.

In a PTask graph, each vertex represents the work of one type of device. The OS tracks GPU vertices and schedules them like normal CPU processes. However, because proprietary GPU drivers do not provide kernel-mode interfaces, a GPU vertex still requires user-mode CPU portions, if not a user-mode CPU process in the PTask implementation.

This requirement is lifted with an open-source driver in another framework, Gdev, which also manages GPUs as first-class devices (Kato et al., 2012). In Gdev, runtime library support is integrated in a Linux kernel module alongside a custom GPU device driver, allowing both OS and user-space programs to use the same API. Gdev manages these API calls directly, rather than using API interception (like the GPU virtualization work above) or additional programming abstractions (like PTask). By managing GPUs as first-class devices, Gdev implements GPU resource management, such as using shared memory for IPC and using data swapping to support a virtual address space that exceeds the physical size limit.

Menychtas et al. (2014) also argued for the OS’s responsibility for GPU resource management. As mentioned above, they built an OS-level scheduling framework that intercepted hardware-software interfaces revealed in their prior reverse-engineering work (Menychtas et al., 2013). CPUs and GPUs communicate to send GPU operation requests, and receive completion signals through memory-mapped I/O (MMIO). To intercept kernel launches, the device control registers are unmapped to enforce page-fault handling, where

GPU scheduling management can be inserted. Meanwhile, GPU kernel completion signals are constantly polled to monitor execution status. This approach enforces scheduling policies for any GPU application, whereas API-driven approaches (e.g., PTask and Gdev) require applications to use the modified or wrapper libraries.

Using this approach, Menychtas et al. (2014) experimented with a disengaged timeslice scheduler that uses token-based timeslice control to ensure fairness between GPU applications, and a disengaged fair queueing scheduler that allows CKE to improve GPU utilization. However, this approach had the drawback of being unable to differentiate between kernel and copy operations, and thus the GPU had to be managed as a whole, preventing any exploitation of the parallelism between the CE and EE.

2.3.5 Challenges for Real-Time GPUs

As a commonly used accelerator for computer-vision applications in safety-critical autonomous-driving systems, the GPU must provide real-time guarantees. After reviewing GPU hardware architecture, programming models, and prior work on non-real-time GPUs, we discuss the challenges of enabling real-time

GPUs.

Understanding GPU scheduling. Existing reverse-engineering work focuses on GPU microarchitecture and memory hierarchies and structures, leaving the GPU scheduling rules to be revealed. A complete set of GPU scheduling policies is fundamental for real-time analysis. However, only limited information pertaining to the possible CKE between streams and the serialization within streams is included in the official documentation. The exact conditions for concurrent execution and the scheduling order of kernels from multiple streams remain unknown.

Synchronization pitfalls. In reviewing prior reverse-engineering works, we noticed an inconsistency between experimental observations and the official documentation regarding caches on NVIDIA GPUs. Unfortunately, a similar inconsistency persists in CPU-GPU synchronization issues. Asynchronous APIs can cause unexpected synchronization behavior, resulting in problematic timing issues.

Performance implications of execution models. GPU tasks execute differently in process-based and thread-based contexts. In a process-based context, each task is executed as an OS process with its own address space and memory isolation between tasks. In addition, the aforementioned synchronization pitfalls do not exist across processes. However, process-based execution does not permit concurrency on the GPU. In a thread-based context, all tasks run as threads in a shared address space and can have kernels running concurrently on the GPU, but memory isolation is lost. Furthermore, using the MPS tool expands the options of the execution models, raising the question of how real-time GPU tasks should be executed (process vs. thread vs. MPS).

GPU task modeling and real-time analysis. Real-time analysis is conducted using task models. While there have been extensive studies about CPU task models, prior work on GPU task models is limited. We have discussed how the programming model and execution paradigm of GPUs are different from those of

CPUs, and therefore a new task model for GPUs, incorporating the revealed GPU scheduling rules, is required to lay the groundwork for real-time schedulability analysis. The challenge is not only to analyze GPU tasks themselves, but to analyze tasks running on CPU-GPU heterogeneous platforms. We provide the fundamentals of real-time systems and review prior work on real-time GPUs next.

2.4 Real-Time Systems

The central subject of this dissertation is real-time systems using GPUs that are to be shared by computer-vision applications in autonomous-driving systems. In this section, we provide essential background on real-time systems. We first describe a commonly used real-time task model and several scheduling algorithms. In discussing real-time correctness under these algorithms, we describe two key concepts—schedulability and feasibility—followed by prior correctness results, including schedulability tests and response-time-bound analysis. In particular, we focus on reviewing real-time scheduling for directed acyclic graphs (DAGs), upon which we build an analysis of the heterogeneous CPU-GPU platforms we use. Then we transition from

Notation   Definition
τ          A task system (task set), τ = {τ1, τ2, τ3, ..., τn}
τi         The i-th task of τ
Ti         The period of τi
Ci         The worst-case execution requirement of τi
Di         The relative deadline of τi
ui         The utilization of τi, ui = Ci/Ti
Usum       The total utilization of τ, Usum = ∑ ui
Ri         The response-time bound of τi, Ri = max_j (Ri,j)
τi,j       The j-th job of τi
ri,j       The release time of τi,j
fi,j       The finish time of τi,j
di,j       The (absolute) deadline of τi,j, di,j = ri,j + Di
Ri,j       The response time of τi,j, Ri,j = fi,j − ri,j
xi,j       The tardiness of τi,j, xi,j = max(0, fi,j − di,j)

Table 2.4: Real-time task model notation.

theory to practice, and review several real-time framework implementations that are used in this dissertation.

Lastly, after reviewing all the fundamentals, we thoroughly review prior work on real-time GPUs.

2.4.1 Real-Time Task Model

Real-time autonomous-driving systems encompass many applications that are under temporal constraints due to their position on the critical path from capturing the environment to executing vehicle-control commands. Some are event-driven, such as computer-vision applications processing images that arrive continuously, and some are time-driven, such as the logging and monitoring components that check system status periodically. These real-time applications in autonomous-driving and other systems share a characteristic: they issue new units of work recurrently. Such recurrence is described formally for reasoning about these applications and validating their temporal correctness. A commonly used formalization, the one we also use, is the sporadic task model (Mok, 1983), which we describe next with terms and notation that have been widely adopted by the real-time systems community (IEEE TCRTS, 2020), as summarized in Table 2.4.

In the sporadic task model, a real-time task system (or task set) τ consists of n sequential tasks τ = {τ1, τ2, τ3, ..., τn}. Each task τi releases jobs recurrently with a minimum inter-release time separation defined by its period Ti. A task is periodic if its jobs are separated by exactly its period, or sporadic if its jobs are separated by at least its period, allowing inter-release delays. In the remainder of this dissertation, we focus on sporadic task systems unless explicitly stated otherwise.

Another parameter of relevance to a task τi is its worst-case execution requirement Ci, which defines an upper bound on the actual execution requirement per job. For a CPU task, Ci quantifies the worst-case execution time of τi on a CPU core, whereas we will define it differently for GPU tasks.

Each task has a relative deadline Di, which defines its timing constraint: each of its jobs needs to be completed within Di time units after the job’s release time (defined below). A task system is categorized to have implicit deadlines if all the tasks’ relative deadlines are equal to their periods (Di = Ti for all τi ∈ τ), constrained deadlines if all the tasks’ relative deadlines have upper bounds defined by their periods (Di ≤ Ti for all τi ∈ τ), or arbitrary deadlines otherwise.

Abstractly, a sporadic task τi can be characterized by a three-tuple (Ti,Ci,Di). These parameters are known a priori. Informally and simply put, they define how often the jobs are released, how much processing capacity each job might require, and how soon each job needs to be finished, respectively. The period and execution requirement together define a task’s need for processing capacity, i.e., its utilization, ui = Ci/Ti, and the total utilization of a task system is Usum = ∑ui.

Task τi’s j-th job, denoted by τi,j, has a release time ri,j, a finish time fi,j, and an (absolute) deadline di,j, where di,j = ri,j + Di. On many occasions, a more natural timing requirement relates to a job’s response time Ri,j = fi,j − ri,j or a task’s response time Ri = max_j(Ri,j). A related concept, tardiness, given by xi,j = max(0, fi,j − di,j), expresses the lateness of a job’s completion.

2.4.2 Scheduling

Scheduling algorithms assign and schedule jobs on processors to produce a schedule, which is a particular assignment of processor time to all jobs in the task system.

Feasibility and schedulability. Feasibility and schedulability are used to describe the temporal correctness of real-time scheduling. A schedule is feasible if it meets all timing constraints. From a different perspective, a task set is feasible if a feasible schedule can be produced for it on a particular platform. Moreover, we say a task set is schedulable by a particular scheduling algorithm if the scheduling algorithm always produces a feasible schedule for the task set. We say a scheduling algorithm is optimal if all feasible task sets are schedulable by this scheduling algorithm.

To validate the temporal correctness of real-time systems, the goal of real-time analysis is to derive procedures that examine whether a task set is feasible and whether it is schedulable by a given scheduling algorithm; these procedures are called feasibility tests and schedulability tests, respectively. Specifically, a feasibility test determines if there exists a scheduling algorithm under which a given task set is schedulable; a schedulability test determines if a given task set is schedulable under a given scheduling algorithm.

Scheduling algorithms. The modules that implement the scheduling algorithms to generate schedules are called schedulers, which can be either static or dynamic. Static schedulers produce task-set-specific schedules offline, and thus are easy to implement and convenient to certify, but are inflexible for handling workload changes. Dynamic schedulers generate schedules online, and thus can manage dynamic systems and consider task status changes while making scheduling decisions. In this dissertation, we focus on dynamic schedulers that are priority-driven.

Priority-driven schedulers assign priorities to jobs and schedule the highest-priority jobs on the available processors. Fixed-priority (FP) schedulers determine jobs’ priorities at the task level, i.e., they assign the same priority to all the jobs of a task. For example, the fixed-priority rate-monotonic (RM) scheduler gives higher priority to jobs of tasks with shorter periods (Liu and Layland, 1973). Similarly, the fixed-priority deadline-monotonic (DM) scheduler gives higher priority to jobs of tasks with shorter relative deadlines (Audsley et al., 1991). As a generalization, job-level fixed-priority (JLFP) schedulers assign per-job priorities, i.e., the jobs of a task can have different priorities. The simple first-in-first-out (FIFO) scheduler is a JLFP scheduler that gives higher priority to jobs with earlier release times. The earliest-deadline-first (EDF) scheduler is a well-studied JLFP scheduler that gives higher priority to jobs with earlier absolute deadlines (Liu and Layland, 1973). We focus on JLFP schedulers in this dissertation. Schedulers that can change jobs’ priorities at runtime are beyond the scope of our discussion.
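The differences among these schedulers reduce to which time instant serves as a job’s priority. The following sketch (a simplified illustration; the JobState type and values are hypothetical, and smaller numbers denote higher priority) contrasts the RM, FIFO, and EDF prioritizations.

#include <cstdio>

// Per-job state; all times are in arbitrary units. Smaller returned values
// denote higher priority.
struct JobState {
  double period;             // T_i of the job's task
  double release;            // r_{i,j}
  double relative_deadline;  // D_i
};

double rm_priority(const JobState& j)   { return j.period; }   // task-level: shorter period first
double fifo_priority(const JobState& j) { return j.release; }  // job-level: earlier release first
double edf_priority(const JobState& j)  { return j.release + j.relative_deadline; }  // earlier absolute deadline first

int main() {
  JobState a = {50.0, 0.0, 50.0};   // released first, long period
  JobState b = {20.0, 10.0, 20.0};  // released later, short period, earlier absolute deadline
  std::printf("RM favors %s, FIFO favors %s, EDF favors %s\n",
              rm_priority(b) < rm_priority(a) ? "b" : "a",
              fifo_priority(a) < fifo_priority(b) ? "a" : "b",
              edf_priority(b) < edf_priority(a) ? "b" : "a");
  return 0;
}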

We are specifically interested in the spectrum of JLFP schedulers that prioritize a job by a time instant within a bounded time window containing the job’s release time; such a priority is window-constrained (Leontyev and Anderson, 2010). Both FIFO and EDF use window-constrained prioritizations. We discuss these JLFP schedulers and explain why we are interested in them in Section 2.4.3.

Another categorization of schedulers is based on how they organize processors. A scheduler that can schedule any job on any one of m processors is called a global scheduler. Otherwise, a scheduler that assigns each task to one of a collection of disjoint sets of processors is called either a clustered or a partitioned scheduler, depending on the size of each set: under a partitioned scheduler, each set contains exactly one processor, whereas under a clustered scheduler, a set may contain more than one. Combined with a specific processor organization, the aforementioned schedulers have variants such as global EDF (or G-EDF), clustered EDF (or C-EDF), partitioned RM (or P-RM), etc.

2.4.3 Response-Time Bounds

The timing guarantees we derive for real-time tasks sharing GPUs are response-time bounds in soft real-time (SRT) systems, which have less rigorous validation requirements than hard real-time (HRT) systems.

We define HRT and SRT in the following.

Hard and soft real time. HRT and SRT express different timing requirements. For application-specific deadlines, HRT imposes the strict requirement that all deadlines must be met. In contrast, SRT tolerates deadline misses. Such tolerance can be defined from various perspectives; we refer readers interested in a comprehensive review of SRT definitions to prior work (Liu, 2000; Brandenburg, 2011). In the context of this dissertation, SRT requires all tasks’ response times to be bounded. For such response-time bounds, of course, the shorter the better. Whether the derived response-time bounds are acceptable is determined by system requirements. For example, for autonomous-driving systems, we say the end-to-end response-time bound, i.e., the bound on the latency from image capture to vehicle-control-command ready time, should be shorter than an alert driver’s reaction time, which is approximately 700 ms (Green, 2000).

We focus on SRT for autonomous-driving systems for three reasons. First, most components of autonomous-driving systems, such as deep-learning frameworks, only run on general-purpose OSs like Linux, which is not a hard real-time OS (RTOS) even with patches like PREEMPT_RT or an EDF-like scheduler such as SCHED_DEADLINE. Guaranteeing HRT correctness on a general-purpose OS is difficult to do without introducing inefficiencies, and it is beyond the scope of this work. Although a small part of an autonomous-driving system, such as the low-level vehicle execution control, can be contained within an RTOS like QNX (BlackBerry, 2020) for HRT, this is not the case for the computer-vision components of interest.

Second, both FP and JLFP global schedulers are subject to a utilization bound of (m + 1)/2 in the HRT case (Andersson et al., 2001), whereas JLFP global schedulers that use window-constrained prioritization are free of capacity loss in SRT cases (Leontyev and Anderson, 2010). In other words, these JLFP schedulers provide bounded response times for task sets that fully utilize multiprocessors (Leontyev and Anderson, 2010). Third, current computer-vision algorithms produce non-deterministic results with imperfect accuracy, which is compensated for by multiple sensory systems (lidar, radar, and camera). In such cases, enforcing strict HRT correctness is less beneficial than avoiding its capacity loss and instead guaranteeing that results are ready within a bounded time, as in SRT.

Prior work. We review a series of studies whose techniques for deriving response-time bounds are also used in this dissertation. Devi and Anderson (2005) derived response-time bounds under preemptive and non-preemptive G-EDF without capacity loss. Leontyev and Anderson (2007) later established response-time bounds for G-FIFO, and followed up with a generalized analysis of JLFP schedulers that use window-constrained prioritization, which ensures the important characteristic that each job’s priority will not be exceeded by that of any future job after a bounded time from the job’s release (Leontyev and Anderson, 2010). They showed that this characteristic is sufficient for a global scheduler to guarantee response-time bounds. Erickson and Anderson (2012) then introduced global fair-lateness scheduling (G-FL), which uses linear programming to generate scheduling parameters (relative deadlines) that minimize the maximum response-time bounds under window-constrained schedulers.

2.4.4 Real-Time DAG Scheduling

As we reviewed in Section 2.2.3, many frameworks for computer-vision applications apply graph-based execution, as these applications, whether classic pipelines or neural networks, are essentially graphs. In addition, task systems with precedence constraints between tasks on heterogeneous platforms can be well modeled using DAGs. Thus, we review the prior work on real-time DAG scheduling herein, and we provide a formal DAG task model definition in Section 4.1.

Speedup bounds. The problem of HRT schedulability analysis for DAG tasks was shown to be computationally intractable by Baruah et al. (2012), so they used the notion of a speedup bound as a metric of the approximation quality. An algorithm A is said to have a speedup bound b if any task system that is feasible on m unit-speed processors is schedulable by A on m processors of speed b. A series of works were produced to tighten the speedup bound (Lakshmanan et al., 2010; Andersson and de Niz, 2012; Nelissen et al., 2012;

Bonifaci et al., 2013; Li et al., 2013; Qamhieh et al., 2013; Saifullah et al., 2013; Baruah, 2014; Saifullah et al., 2014; Qamhieh et al., 2014; Jiang et al., 2016).

Both Bonifaci et al. (2013) and Li et al. (2013) derived a speedup bound of 2 − 1/m for DAG tasks under G-EDF. This result was then generalized and improved by Baruah (2014). Meanwhile, Li et al. (2013) argued that because an optimal scheduler is hypothetical, a speedup bound is insufficient for schedulability tests, for which they proposed the notion of a capacity augmentation bound. An algorithm A is said to have a capacity augmentation bound of b if it produces feasible schedules on m processors of speed b for any task system that has a total utilization of at most m and in which each task’s critical path length is shorter than its relative deadline on unit-speed processors. Note that the critical path length of a DAG task is its WCET on an infinite number of processors.
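For reference, a DAG task’s critical path length can be computed with a single longest-path pass over a topologically ordered node list, as in the following sketch (the adjacency-list representation and WCET values are illustrative only).

#include <algorithm>
#include <cstdio>
#include <vector>

// Longest path through a DAG whose nodes carry their own WCETs; edges[u] lists
// the successors of node u. Nodes are assumed to be topologically ordered
// (every edge goes from a lower index to a higher index).
double critical_path_length(const std::vector<double>& wcet,
                            const std::vector<std::vector<int>>& edges) {
  std::vector<double> longest(wcet);  // longest path ending at each node
  double result = 0.0;
  for (size_t u = 0; u < wcet.size(); ++u) {
    result = std::max(result, longest[u]);
    for (int v : edges[u])
      longest[v] = std::max(longest[v], longest[u] + wcet[v]);
  }
  return result;
}

int main() {
  // A diamond-shaped DAG: 0 -> {1, 2} -> 3, with illustrative per-node WCETs.
  std::vector<double> wcet = {2.0, 5.0, 3.0, 1.0};
  std::vector<std::vector<int>> edges = {{1, 2}, {3}, {3}, {}};
  std::printf("critical path length = %.1f\n", critical_path_length(wcet, edges));
  return 0;
}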

A decomposition approach was used in some studies to transform DAG tasks into independent sequential tasks by reassigning their scheduling parameters (relative deadlines) (Nelissen et al., 2012; Saifullah et al.,

2013, 2014; Qamhieh et al., 2014; Jiang et al., 2016). Some researchers considered synchronous parallel tasks—a special form of DAG—that were phased into serial and parallel segments with synchronization points in between (Saifullah et al., 2013; Andersson and de Niz, 2012; Nelissen et al., 2012). Lakshmanan et al. (2010) studied speedup bounds under P-DM for a fork-join model—a special case of the synchronous parallel task model—where a join task is added after each parallel segment. Kim et al. (2013) later extended the fork-join model to support an arbitrary number of threads under G-DM.

Response-time analysis for HRT. Response-time analysis (RTA) explicitly computes a response-time bound for each task, and determines HRT schedulability by checking if such bounds are within tasks’ relative deadlines. This is not to be confused with SRT in which the bounds are allowed to exceed the relative deadlines. RTA has been applied to derive schedulability tests for the fork-join task model (Axer et al., 2013), synchronous parallel tasks (Chwa et al., 2013; Maia et al., 2014), and DAG tasks (Qamhieh et al., 2013,

2014; Serrano et al., 2016, 2017; Nasri et al., 2019).

Response-time bound analysis for SRT. Similar to the decomposition method, another transformation approach was proposed for periodic and sporadic DAG tasks in SRT systems (Liu and Anderson, 2010).

These DAG tasks were transformed into ordinary sporadic tasks by enforcing precedence constraints with reassigned release times, which we further illustrate with examples in Section 4.1. Response-time bounds could then be derived for the resultant sporadic tasks under G-EDF. An early-release technique was used to retain work-conserving execution by allowing jobs to execute before their release times. They later applied this approach in a study of the communication cost for distributed systems (Liu and Anderson, 2011). This transformation was later applied by Yang et al. (2016b) for DAG tasks on heterogeneous platforms, who reduced response-time bounds by combining DAGs and optimizing deadline settings using linear programming. Later, Dong et al. (2017) further reduced response-time bounds by replacing the transformation approach with a direct analysis using techniques introduced by Devi and Anderson (2005).

Some works examined other characteristics of DAG tasks. Federated scheduling was studied; it differs from partitioned scheduling in that each task is either assigned to a single processor that it might share with other tasks, or given exclusive access to a set of processors (Li et al., 2014; Baruah, 2015a,b,c; Jiang et al., 2017; Li et al., 2017; Baruah, 2018). A DAG task may have conditional branches that need to be considered (Baruah, 2015c; Baruah et al., 2015; Fonseca et al., 2015; Melani et al., 2015, 2016; Baruah, 2018). To balance the trade-off between preemptive and non-preemptive execution, a limited-preemptive scheme was considered (Serrano et al., 2016, 2017; Nasri et al., 2019).

2.4.5 Real-Time Frameworks

We describe the real-time frameworks that we use for implementing and evaluating real-world applications and our real-time analysis.

LITMUSRT. LInux Testbed for MUltiprocessor Scheduling in Real-Time systems (LITMUSRT), an open-source real-time extension of Linux, was used in our experiments to provide G-EDF and G-FL schedulers for CPU tasks. LITMUSRT was initiated at UNC by Calandrino et al. (2006) and is developed and currently maintained by Brandenburg (2011) as the LITMUSRT Project (2018). It supports the sporadic task model and scheduler plugins for clustered, partitioned, and global scheduling. Along with the kernel extension, Feather-Trace, a tracing tool for obtaining execution-time measurements (Brandenburg and Anderson, 2007), and liblitmus, a user-space library for implementing real-time tasks, are available. A Linux-based OS is necessary for our evaluation of computer-vision workloads and GPU execution, as the computer-vision frameworks and GPU drivers are well supported on Linux.

PGMRT. We use PGMRT (Elliott et al., 2014), a real-time middleware that manages processing graphs, a general form of DAG. Our application uses PGMRT APIs to represent sporadic tasks created with liblitmus as nodes and to enforce precedence constraints with edges. In PGMRT, the readiness of the data passed along edges is managed in a producer-consumer scheme using tokens; in our DAG task model, each job produces one token, but may consume multiple tokens from multiple incoming edges.
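The token scheme can be illustrated with the following simplified sketch; it is not PGMRT’s actual API, merely a model in which a consumer blocks until each incoming edge holds a token, consumes one token per edge, and producers add a token when their jobs complete.

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// One token counter per edge, guarded by a lock: a simplified stand-in for the
// producer-consumer token management described above, not PGMRT's real code.
struct Edge {
  std::mutex m;
  std::condition_variable cv;
  int tokens = 0;
};

void produce(Edge& e) {  // called when a producer's job completes
  { std::lock_guard<std::mutex> lk(e.m); e.tokens++; }
  e.cv.notify_one();
}

void consume(Edge& e) {  // blocks the consumer until a token is available, then takes one
  std::unique_lock<std::mutex> lk(e.m);
  e.cv.wait(lk, [&] { return e.tokens > 0; });
  e.tokens--;
}

int main() {
  Edge a, b;  // two incoming edges of a consumer node
  std::thread p1([&] { std::puts("predecessor 1 finished a job"); produce(a); });
  std::thread p2([&] { std::puts("predecessor 2 finished a job"); produce(b); });
  // The consumer's job becomes eligible only once every incoming edge holds a
  // token; it then consumes one token from each edge.
  consume(a);
  consume(b);
  std::puts("consumer job released");
  p1.join();
  p2.join();
  return 0;
}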

2.4.6 Real-Time GPUs

We now review prior work related to managing GPUs in real-time systems.

Exclusive GPUs. Because the details of GPU scheduling rules are undisclosed, many works have treated GPUs as non-shared resources. We review these works herein.

TimeGraph is a device-driver-level GPU management framework that prioritizes and isolates GPU tasks in an SRT setting (Kato et al., 2011b). It is implemented on an open-source GPU driver, Nouveau (X.org Foundation, 2020), to support OpenGL, and has been ported to the PSCNV11 GPU driver to support CUDA; the latter, however, has been discontinued. TimeGraph provides two scheduling policies: one that prioritizes tasks to prevent performance interference, and another that reduces scheduling intervention to favor throughput-oriented tasks at the cost of longer response times and priority inversions. It also provides budget-control-based performance-isolation mechanisms: one that permits GPU kernel submission as long as the remaining budget is positive, and imposes punishment for any resulting overrun (into negative budget) at the next replenishment, and another that profiles and predicts the execution time of GPU tasks to strictly prevent possible overruns. The execution-time profiling is achieved with a history table that maps methods and data sizes to execution times, provided that these methods are invoked repeatedly.
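The first of these budget policies can be sketched as follows; this is our own simplified illustration of the described accounting scheme, not TimeGraph’s implementation.

#include <cstdio>

// A sketch of the posterior budget policy described above (our illustration,
// not TimeGraph's code): submission is allowed while the remaining budget is
// positive, and any overrun is carried as debt into the next replenishment.
struct Budget {
  double remaining;   // time units left in the current period
  double per_period;  // replenishment amount
};

bool may_submit(const Budget& b) { return b.remaining > 0.0; }

void charge(Budget& b, double gpu_time_used) { b.remaining -= gpu_time_used; }

void replenish(Budget& b) {
  // Overruns (negative remaining budget) reduce the budget of the next period.
  b.remaining = b.per_period + (b.remaining < 0.0 ? b.remaining : 0.0);
}

int main() {
  Budget b = {5.0, 5.0};                         // 5 time units of GPU budget per period
  if (may_submit(b)) charge(b, 7.0);             // the submitted kernel overruns by 2 units
  std::printf("after overrun: %.1f\n", b.remaining);        // prints -2.0
  replenish(b);
  std::printf("after replenishment: %.1f\n", b.remaining);  // prints 3.0
  return 0;
}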

RGEM, another work by Kato et al. (2011a), is instead a user-space CUDA runtime library, which prioritizes GPU requests before sending them to the device driver. Like TimeGraph, RGEM prioritizes

GPU kernels. Rather than treating GPUs as a whole as does TimeGraph, RGEM manages copy and kernel operations separately, allowing parallelism between CEs and EEs. Additionally, RGEM splits memory-copy operations into chunks to reduce blocking time.

GPUSync, a GPU management system with rich features, explicitly addresses GPU management as a synchronization problem (Elliott et al., 2013). Specifically, on the multi-GPU systems that GPUSync targets,

GPU allocation (task-to-GPU assignment) is managed by a k-exclusion locking protocol that arbitrates access to multiple GPUs. Similar to RGEM, CEs and EEs are managed separately. Task migration between GPUs is supported with consideration for memory-based affinity and migration costs such as bus interference.

Moreover, to prevent the misbehavior of overutilization, GPUSync applies several budget-enforcement policies. In contrast to the profiling-based overrun control in TimeGraph, GPUSync recovers from overrun with three strategies: graceful exit upon overrun signals, early release of overrunning tasks’ future jobs,

11PathScale’s fork of the Nouveau driver.

and budget inheritance. Furthermore, GPUSync handles interrupts with proper priorities by mapping GPU requests to GPUs; the handlers (worker threads) are prioritized with the highest-priority tasks assigned to the

GPU that raises the interrupt.

Kernel slicing. We mentioned that RGEM splits memory transactions into smaller chunks. Such slicing techniques, which we reviewed for non-real-time GPUs in Section 2.3.4.3, are commonly used to simulate preemptive execution for real-time systems as well. In addition to memory-copy slicing, a preemptive kernel model (PKM), implemented by Basaran and Kang (2012), splits kernels into subkernels, each of which executes a subset of thread blocks. Both RGEM and PKM require source-code modification for kernel slicing, whereas GPES (Zhou et al., 2015), a framework implemented atop Gdev (Kato et al., 2012), further enables transparent and automatic kernel-slicing techniques: compile-time source-code transformation and just-in-time code rewriting. However, kernel slicing is not yet applicable to all applications, such as ones that require global synchronization among all of a kernel’s thread blocks.
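To illustrate the basic idea behind kernel slicing (a simplified sketch of the general technique, not the implementation of any of the systems above), a kernel can be rewritten to take a block offset so that one large launch becomes several smaller sub-launches, between which other work may be scheduled:

#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Each sub-launch covers a contiguous range of "logical" blocks by adding an
// offset to blockIdx.x; together, the slices cover the original grid.
__global__ void sliced_kernel(float* data, int block_offset, int n) {
  int logical_block = blockIdx.x + block_offset;
  int i = logical_block * blockDim.x + threadIdx.x;
  if (i < n) data[i] = data[i] * 2.0f;
}

int main() {
  const int n = 1 << 20, threads = 256;
  const int total_blocks = (n + threads - 1) / threads;
  const int slice = 64;  // blocks per sub-launch; effectively the "preemption" granularity
  float* d;
  cudaMalloc(&d, n * sizeof(float));
  for (int offset = 0; offset < total_blocks; offset += slice) {
    int blocks = std::min(slice, total_blocks - offset);
    sliced_kernel<<<blocks, threads>>>(d, offset, n);
    // Other (e.g., higher-priority) work could be submitted between slices here.
  }
  cudaDeviceSynchronize();
  cudaFree(d);
  return 0;
}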

Timing analysis. Timing analysis for GPUs measures or analyzes a task’s worst-case execution time

(WCET), one of the timing parameters used in schedulability analysis. Such timing analysis for GPU tasks is challenging because of the complex architecture of GPUs, which is different from that of CPUs.

In particular, GPU execution is based on the SIMT model with massively parallel threads managed by undisclosed low-level scheduling control.

Berezovskyi et al. (2012) first formed an integer linear program (ILP) to compute the WCET of a group of threads on a single SM, and later improved the speed of this method with heuristics to produce an estimate (Berezovskyi et al., 2013). Their follow-up work switched to a measurement-based statistical approach that extended to multiple SMs (Berezovskyi et al., 2014, 2016). Horga et al. (2018) also proposed a measurement-based approach that combines symbolic execution and a genetic algorithm. Several works provided WCET analysis with information provided by a GPU simulator GPGPU-sim (Betts and Donaldson,

2013; Huangfu and Zhang, 2017b,a, 2018). None of these works considered WCET with CKE.

Resource management. Many GPU resource-management studies have been reviewed in Section 2.3.4.2.

We now review ones that specifically target real-time GPUs. The aforementioned real-time GPU management systems require OS-kernel or device-driver modification, which entails the maintenance effort of adapting the kernel-level implementation to an up-to-date codebase. To mitigate such effort, Suzuki et al. (2016) implemented loadable-kernel-module (LKM)-based real-time GPU resource management atop Gdev.

Although it avoids kernel patches, it requires CUDA applications to use the API that the LKM provides.

In contrast to managing the GPU from the CPU side, a server-based approach for GPU resource management has been demonstrated: Kim et al. (2017, 2018) proposed creating GPU server tasks to handle

GPU requests, enabling GPU task suspension and providing schedulability analysis.

Compared with GPU management that allows exclusive access only, fewer studies have investigated

GPU sharing. Jain et al. (2019) enabled both computational and memory resource reservation for CKE, with SM-centric task assignment and page-coloring techniques based on knowledge of memory structures obtained via reverse engineering. Saha et al. (2019) also applied SM allocation and provided schedulability analysis with response-time bounds. Another SM partitioning work (Xu et al., 2016b) considered sharing between real-time and best-effort tasks. Baek et al. (2020) proposed managing CKE through a middleware that improved the processing frame rate with resource monitoring and scheduling by considering application characteristics.

Other GPU scheduling. Zhou et al. (2018) studied the problem of real-time scheduling of DNN workloads with system-level performance considerations. They proposed S3DNN, which aggregates requests through data-fusion batching, applies deadline-aware scheduling, and reorders kernels using a priori knowledge of the DNN structure to improve GPU utilization. While S3DNN focuses on a single DNN, Xiang and Kim (2019) proposed a pipeline-based approach to improve the WCETs of multiple DNN applications.

Capodieci et al. (2018) explored hardware-enabled preemption support for implementing an EDF scheduler with a Constant Bandwidth Server (CBS) (Spuri and Buttazzo, 1994) that provides bandwidth reservation and isolation between tasks. Although their implementation included techniques that were not revealed due to NDA restrictions, in-depth information about GPU scheduling for work submitted from multiple CPU processes was provided. Specifically, they revealed the data structures that manage GPU commands for multiple processes, and how these commands were organized and scheduled. Their implementation managed the movement of these data structures at the software level in a virtualized environment provided by NVIDIA, and changed the NVIDIA-implemented polling mechanism to an event-driven scheduler. They tested EDF and CBS schedulers that, however, only allowed requests from one application to be submitted at a time.

Cavicchioli et al. (2019) investigated the applicability of the new GPU programming standard Vulkan, CUDA graphs, and CUDA dynamic parallelism techniques for real-time systems. Specifically, they applied these techniques to reduce CPU-to-GPU interaction overheads, which were shown to be performance bottlenecks, especially for frequent GPU command submissions.

CPU-GPU memory interference. Interactions between CPU and GPU tasks have been investigated in other works as well. Forsberg et al. (2017) presented GPUGuard, which separates the memory accesses of CPU and GPU tasks into phases, inspired by the PRedictable Execution Model (PREM). PREM was also extended to an SoC device, the NVIDIA Jetson TX1, by modifying MemGuard (Houdek et al., 2017). Similarly, Ali and Yun (2018) leveraged MemGuard to throttle best-effort CPU tasks to control memory interference. The memory interference between CPU and GPU tasks was characterized by Cavicchioli et al. (2017), and later addressed in a server-based approach (Capodieci et al., 2017).

2.5 Chapter Summary

This chapter provided the research context for autonomous driving and the computer-vision applications on which we focus, and reviewed the fundamentals of, and relevant research on, GPUs and real-time systems that will be used in subsequent chapters.

In reviewing the state-of-the-art progress pertaining to autonomous driving, we learned that full autonomy has yet to be realized, with real-time guarantees being one of the challenges. Our review of the primary system components provides a context for explaining how the computer-vision applications that we focus on function in an autonomous-driving system.

We reviewed the milestones of computer vision across the stages of classic pipelines and deep learning, using the example of object detection. We also described metrics for evaluating object-detection accuracy, computer-vision frameworks and libraries, and the computer-vision standard OpenVX, which targets embedded and real-time computer vision. We summarized the challenges of supporting computer-vision workloads in a real-time certification-amenable framework.

We presented fundamental background about GPU hardware architecture and programming models, and a brief comparison between GPUs and other accelerators (FPGAs and DSPs). We provided a comprehensive review of studies of GPUs in both real-time and non-real-time settings.

We described the basic concepts of real-time systems and scheduling, including task models, timing criteria, classic schedulers, and our focus on SRT. We also reviewed prior work on real-time DAG scheduling, the frameworks we use, and real-time GPUs.

CHAPTER 3: SCHEDULING, SYNCHRONIZATION, AND EXECUTION1

In the preceding chapters, we have presented the challenge of sharing GPUs with guaranteed bounded response times. To address this challenge, the first step we undertake is to investigate NVIDIA GPUs with respect to their scheduling, synchronization, and execution.

We first present an in-depth study of GPU scheduling on an exemplar of current GPUs targeted towards autonomous systems. This study was conducted using only black-box experimentation and publicly available documentation. It is part of our effort to develop a model describing GPU workloads and how they are scheduled, with the eventual goal of deriving real-time schedulability-analysis results for that model, as has been done for the various CPU workload and scheduling models studied for years.

The exemplar chosen for this study is the NVIDIA TX2 we reviewed in Section 2.3.2.1. The TX2 is part of the Jetson family of embedded computers, which is explicitly marketed for “autonomous everything” (NVIDIA, 2020a). Moreover, it shares a common GPU architecture with the higher-end Drive PX2, which is currently available only to automotive companies and suppliers. The TX2 has two important attributes for embedded use cases: it provides significant computing capacity, and it meets reasonable limits on monetary cost as well as size, weight, and power (SWaP). The TX2 is a very complicated device, so discerning exactly how it functions is not easy.

We have noted that NVIDIA GPU scheduling differs based on whether work is submitted to a GPU from a CPU executing (i) OS threads that share an address space without inter-thread memory protections, or (ii) OS processes that have different address spaces providing memory protection. However, to the best of

1Contents of this chapter previously appeared in the following papers: Otterness, N., Yang, M., Amert, T., Anderson, J., and Smith, F. D. (2017a). Inferring the scheduling policies of an embedded cuda gpu. In Proceedings of the 13th Annual Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pages 47–52. Amert, T., Otterness, N., Yang, M., Anderson, J. H., and Smith, F. D. (2017). GPU scheduling on the NVIDIA TX2: Hidden details revealed. In Proceedings of the 38th IEEE International Real-Time Systems Symposium, pages 104–115. Yang, M., Otterness, N., Amert, T., Bakita, J., Anderson, J. H., and Smith, F. D. (2018b). Avoiding pitfalls when using NVIDIA GPUs for real-time tasks in autonomous systems. In Proceedings of the 30th Euromicro Conference on Real-Time Systems, pages 20:1–20:21.

our knowledge, the exact manner in which GPU scheduling is done in either case has never been publicly disclosed.

For Case (i), we present a set of rules (in Section 3.1), based on experimental evidence involving benchmarking programs, that fully define how the TX2’s GPU schedules work submitted to it, assuming that certain optional GPU features that introduce additional complexities (see below) are not employed. These rules take into account these aspects: (i) the ordering within and among streams of requests for GPU operations, (ii) the means by which various internal queues of requests are handled, (iii) the ability of the GPU to co-schedule operations so they execute concurrently, (iv) the selection mechanism used to determine the order in which requests are handled, and (v) the resource limits that constrain the GPU’s ability to handle new requests. Our rules indicate that the TX2’s GPU employs a variant of hierarchical FIFO scheduling that is amenable to real-time schedulability analysis, although scheduling is not fully work-conserving, as GPU computations are subject to various blocking delays.

The optional GPU features that we initially ignore are two that cause certain request streams to be treated specially, further complicating GPU scheduling: the usage of a special stream called the NULL stream, and the usage of stream priorities. The available documentation regarding both of these features often lacks details necessary for predicting specific runtime behavior. Through further experiments, we show how each of these features affects our derived scheduling rules (in Section 3.2).

GPU programs are most commonly developed assuming that different GPU computations are requested by OS processes with different address spaces, so Case (ii) above applies. We investigated GPU scheduling in this case as well (Section 3.2) and, unfortunately, found it to be less deterministic than in Case (i). In particular, concurrent GPU computations in this case are multiprogrammed on the GPU using time slicing in a way that can add significant overhead and execution-time variation. This result calls into question the common practice today of following a process-oriented approach in developing GPU applications, as reflected in open-source frameworks such as PyTorch2 and Caffe.3 We also investigated the usage of stream priorities under Case (ii) and found that they have no effect in this case (Section 3.2).

In addition to revealing NVIDIA GPU scheduling rules, we hope to provide guidance, recommendations, and warnings about numerous pitfalls to both research and implementation practitioners. We have found that writing programs for real-time tasks that combine CPU and GPU computations is harder than we first

2http://pytorch.org 3http://caffe.berkeleyvision.org

thought. Based on several years of study, experimentation, and experience with GPU programming, we are presenting here a compendium of specific issues that are essential background for developing task systems where real-time design meets GPUs.

NVIDIA GPUs serve as an exemplar of the push for throughput over predictability in GPUs. Recent developments in the NVIDIA GPU ecosystem are focused on improving ML applications, especially those for autonomous driving. Most of these improvements center around increasing throughput or reducing execution latency, but little, if any, attention has been paid to the requirements of the real-time tasks used in autonomous systems. This lack of attention is evident in the sparse efforts by NVIDIA to improve or document GPU scheduling behavior or improve the predictability of GPU execution times. For example, for a task consisting of a combination of CPU and GPU computations, their synchronization is necessary but is not well-documented, resulting in difficulties in determining why and when synchronization blocking occurs

(Section 3.3), which impacts scheduling analysis.

Besides the aforementioned cases regarding how work is submitted to a GPU from a CPU, a third option, which leverages a middleware environment called CUDA Multi-Process Service (MPS) (NVIDIA, 2019e), is claimed to provide the best approach, offering both memory isolation and concurrent execution on GPUs. We performed a case study using exemplar computer-vision tasks in autonomous vehicles to evaluate these execution options, with our results and guidelines presented in Section 3.4.

Lastly, we present a list of perils one can encounter in programming for NVIDIA GPUs, distilled from our research experience, which has necessarily involved constructing many thousands of lines of GPU code for performing experiments. The perils span a spectrum of pain, ranging from simple documentation errors, to functions that default in strange ways, to programming “gotchas.” Descriptions and examples of the perils most likely to cause problems are presented in Section 3.5.

We believe that our investigation and findings will help bridge the gap between research and implementation in autonomous systems. For example, real-time researchers may not be familiar with GPU programming for applications of ML and other forms of AI used in real-time tasks. Likewise, programmers responsible for implementations are given little guidance about creating GPU-using task systems amenable to real-time analysis. We provide the necessary understanding required to apply GPUs in real-time tasks while avoiding numerous hidden pitfalls. We also expose GPU-related issues that must be mitigated for real-time guarantees to be possible in autonomous systems. We further believe that the fundamental issues presented herein are relevant to any real-time application using computational accelerators, and likely hold for other manufacturers’ GPUs, DSPs, and FPGAs.

3.1 Basic GPU Scheduling Rules

In this section, we present GPU scheduling rules for the TX2 assuming the GPU is accessed only by CPU tasks4 that share an address space, using only user-specified streams. (We consider the NULL stream later.

Unless stated otherwise, “stream” should henceforth be taken to mean a user-specified stream.) We inferred the scheduling rules given here by conducting extensive experiments using CUDA 8, 9, and 10. We begin by discussing one of these experiments in Section 3.1.1. We use this experiment as a continuing example to explain various nuances of the rules, which are covered in full in Section 3.1.2. Before continuing, we note that all code, scripts, and visualization tools developed for this chapter are available online.5

3.1.1 Motivating Experiment

The experiment considered here involved running instances of a synthetic benchmark that launches several kernels. Our synthetic workload allows flexibility in configuring block resource requirements, kernel durations, and copy operations. We have also experimented with a variety of real-world GPU workloads involving image-processing functions common in autonomous-driving use cases, and to our knowledge, the scheduling rules presented in this chapter are valid for such workloads as well.

In the experiment considered here, we configured each block of each benchmark kernel to spin for one second. As detailed in Table 3.1, three kernels, K1, K2, and K3, were launched by a single task to a single stream, and three additional kernels, K4, K5, and K6, were launched by a second task to two separate streams.

The kernels are numbered by launch time. Copy operations occurred after K2 and K5, before and after K3 and K6, in their respective streams.
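A minimal sketch of this kind of synthetic workload is shown below; it is simplified relative to our actual benchmark (whose full source is available online, as noted above), and the block and thread counts are illustrative. Each block busy-waits on the GPU’s globaltimer register until the requested duration has elapsed, and kernels are issued to user-specified streams.

#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Reads the GPU's nanosecond-resolution globaltimer register.
__device__ uint64_t global_timer_ns() {
  uint64_t t;
  asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
  return t;
}

// Every thread of every block busy-waits until spin_ns nanoseconds have
// elapsed, giving each block a (roughly) fixed execution duration.
__global__ void spin_kernel(uint64_t spin_ns) {
  uint64_t start = global_timer_ns();
  while (global_timer_ns() - start < spin_ns) { /* busy-wait */ }
}

int main() {
  cudaStream_t s1, s2;
  cudaStreamCreate(&s1);
  cudaStreamCreate(&s2);
  // Two kernels whose blocks each spin for one second, issued to two
  // user-specified streams (the block and thread counts are illustrative).
  spin_kernel<<<6, 768, 0, s1>>>(1000000000ULL);
  spin_kernel<<<4, 256, 0, s2>>>(1000000000ULL);
  cudaStreamSynchronize(s1);
  cudaStreamSynchronize(s2);
  cudaStreamDestroy(s1);
  cudaStreamDestroy(s2);
  return 0;
}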

Figure 3.1 depicts the GPU timeline produced by this experiment. Each rectangle represents a block: the jth block of kernel Kk is labeled “Kk: j.” The left and right boundaries of each rectangle correspond to that block’s start and end times, as measured on the GPU using the globaltimer register. The height of each rectangle is the number of threads used by the block. The y-position of each rectangle indicates the SM upon which it executed. Arrows below the x-axis indicate kernel launch times. Dashed lines correspond to time points used in the continuing example covered in Section 3.1.2.

4We use the term task to refer to OS threads that share an address space, and we use the term thread to mean a hardware thread on the GPU, as defined earlier in Section 2.3.3.
5https://github.com/yalue/cuda_scheduling_examiner_mirror.

Figure 3.1: Basic GPU scheduling experiment. (Legend: Stream 1 contains K1 through K3; Stream 2 contains K4 and K6; Stream 3 contains K5. The x-axis shows time in seconds; the y-axis shows SM 0 and SM 1.)

3.1.2 Scheduling Rules

With respect to kernels, blocks are the schedulable entities: the basic job of the GPU scheduler is to determine which thread blocks can be scheduled at any given time. These scheduling decisions are impacted by the availability of limited GPU resources that blocks utilize such as GPU shared memory, registers, and threads. Required resources are determined for the entire kernel when it is launched. All blocks in a given kernel have the same resource requirements.
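The per-kernel resource requirements that drive these decisions can be inspected at runtime. The following sketch (with a trivial kernel of our own for illustration) queries a kernel’s register, shared-memory, and thread-count attributes using cudaFuncGetAttributes.

#include <cstdio>
#include <cuda_runtime.h>

// A trivial kernel that uses some static shared memory, for illustration only.
__global__ void example_kernel(float* out) {
  __shared__ float buffer[1024];
  buffer[threadIdx.x % 1024] = (float)threadIdx.x;
  __syncthreads();
  if (threadIdx.x == 0) out[blockIdx.x] = buffer[0];
}

int main() {
  cudaFuncAttributes attr;
  // Resource requirements are fixed per kernel: every block of a launch of
  // example_kernel has the same register and shared-memory footprint.
  if (cudaFuncGetAttributes(&attr, example_kernel) != cudaSuccess) return 1;
  std::printf("registers per thread: %d\n", attr.numRegs);
  std::printf("static shared memory per block: %zu bytes\n", attr.sharedSizeBytes);
  std::printf("max threads per block: %d\n", attr.maxThreadsPerBlock);
  return 0;
}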

We say that a block is assigned to an SM when that block has been scheduled for execution on the SM.

A kernel is dispatched when at least one of its blocks is assigned, and is fully dispatched once all of its blocks have been assigned. Similarly, we say that a copy operation is assigned to a CE once it has been selected to be performed by the CE.

Example. Figure 3.2 provides additional details regarding the kernel launches in Figure 3.1. In Figure 3.2, we use additional notation to depict copies: Cki denotes an input copy operation of kernel Kk, and Cko denotes an output copy operation of kernel Kk. Each inset in Figure 3.2 corresponds to the time point in

Figure 3.1 with the same designation (e.g., inset (a) corresponds to the time point labeled “(a)”). We will

repeatedly revisit Figure 3.2 to illustrate individual scheduling rules as they are stated, and then consider the entire example in full once all rules have been stated.

Kernel  Launch Info       Start Time (s)  # Blocks  # Threads per Block  Shared Memory per Block  Copy In  Copy Out
K1      CPU 0, Stream S1  0.0             6         768                  0                        -        -
K2      CPU 0, Stream S1  0.0             2         512                  0                        -        256MB
K3      CPU 0, Stream S1  0.0             2         1,024                0                        256MB    256MB
K4      CPU 1, Stream S2  0.2             4         256                  32KB                     -        -
K5      CPU 2, Stream S3  0.4             2         256                  32KB                     -        256MB
K6      CPU 1, Stream S2  2.8             2         512                  0                        256MB    256MB

Table 3.1: Details of kernels used in the experiment in Figure 3.1. (Note that “start times” are defined relative to the end of benchmark initialization.)

General scheduling rules. To our knowledge, the actual data structures used by the TX2 to schedule copy operations and kernels are undocumented. From our experiments, we hypothesize that several queues are used: one FIFO EE queue per address space, one FIFO CE waiting queue and one FIFO CE queue that are used to order copy operations for assignment to the GPU’s CE, and one FIFO queue per CUDA stream

(including the NULL stream, which we consider later). We refer to the latter as stream queues. We begin by listing general rules that specify how copy operations and kernels are moved between queues (a small simulation sketch after the rules illustrates the hypothesized queue structure):

G1 A copy operation or kernel is enqueued on the stream queue for its stream when the associated CUDA

API function (memory transfer or kernel launch) is invoked.

G2 A kernel is enqueued on the EE queue when it reaches the head of its stream queue.

G3 A kernel at the head of the EE queue is dequeued from that queue once it becomes fully dispatched.

G4 A kernel is dequeued from its stream queue once all of its blocks complete execution.
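To summarize the hypothesized structure, the following sketch simulates the stream and EE queues for kernels only (copy operations and the CE queues are omitted), with Rules G1–G4 noted in comments; it is a simplified model of our hypothesis, not NVIDIA’s implementation.

#include <cstdio>
#include <deque>
#include <map>
#include <string>

// A simplified model of the hypothesized queues for kernels only: one FIFO
// queue per stream and one FIFO EE queue per address space (copy operations
// and the CE queues are omitted).
struct Kernel { std::string name; int blocks_left; };

struct QueueModel {
  std::map<int, std::deque<Kernel>> stream_queues;  // one FIFO queue per stream
  std::deque<int> ee_queue;                         // holds the stream whose head kernel is enqueued

  void launch(int stream, const Kernel& k) {        // Rule G1
    bool was_empty = stream_queues[stream].empty();
    stream_queues[stream].push_back(k);
    if (was_empty) ee_queue.push_back(stream);      // Rule G2: the kernel reached its stream queue's head
  }

  // Assign up to `budget` blocks of the kernel at the head of the EE queue.
  void assign_blocks(int budget) {
    if (ee_queue.empty()) return;
    Kernel& k = stream_queues[ee_queue.front()].front();
    k.blocks_left -= budget;
    if (k.blocks_left <= 0) ee_queue.pop_front();   // Rule G3: fully dispatched
  }

  void complete_head(int stream) {                  // Rule G4: all blocks of the head kernel finished
    stream_queues[stream].pop_front();
    if (!stream_queues[stream].empty())
      ee_queue.push_back(stream);                   // the next kernel reaches the head (Rule G2)
  }
};

int main() {
  QueueModel m;
  m.launch(1, {"K1", 6});
  m.launch(2, {"K4", 4});
  m.assign_blocks(4);   // K1 partially dispatched; it stays at the head of the EE queue
  m.assign_blocks(2);   // K1 fully dispatched and dequeued from the EE queue (Rule G3)
  m.assign_blocks(4);   // K4's blocks may now be assigned
  m.complete_head(1);   // K1's blocks finish; K1 leaves its stream queue (Rule G4)
  std::printf("kernels waiting in the EE queue: %zu\n", m.ee_queue.size());
  return 0;
}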

Example (cont’d). In Figure 3.2, there are two CUDA programs executing as CPU tasks τ0 and τ1 on

CPUs 0 and 1, respectively, that share an address space. τ0 uses a single stream, S1, and τ1 uses two streams, S2 and S3. The various queues, two SMs, and single CE are depicted in each inset of Figure 3.2. The start times in Table 3.1 give the time the kernel, or its input copy operation if one exists, was issued. Output copy operations, when present, immediately followed kernel completions.

In inset (a), τ0 has issued kernels K1, K2, and K3 and copy operations C2o, C3i, and C3o to stream

S1. τ1 has issued kernels K4 and K5 to streams S2 and S3, respectively, and copy operation C5o to stream S3. The operations in streams S1 and S3 were enqueued in the order the CUDA commands were executed (Rule G1). Kernels K1, K4, and K5 are at the heads of their respective stream queues, and have been placed in the EE queue (Rule G2). K1 is dispatched to the GPU, so it is shaded in its stream queue. Each SM has two blocks of K1 assigned to it. In inset (b), the remaining blocks of K1 and K4 have been assigned to the GPU, so both K1 and K4 have been removed from the EE queue (Rule G3), but remain in their respective stream queues (Rule G4).

Figure 3.2: Detailed state information at various time points in Figure 3.1. (Insets (a) through (g) show, for each time point, the contents of the stream queues, the CE waiting queue, the CE queue, the EE queue, and the blocks assigned to SM 0 and SM 1.)

Non-preemptive execution. Ignoring complications considered later in Section 3.2, the kernel at the head of the EE queue will non-preemptively block later-queued kernels:

X1 Only blocks of the kernel at the head of the EE queue are eligible to be assigned.

Example (cont’d). In Figure 3.2 (a), two blocks of K1 have been assigned to each SM. As explained in detail below, on each SM, there are enough resources available that two blocks from K4 could execute concurrently with the two blocks from K1. However, K4 is not at the head of the EE queue, so its blocks are not eligible to be assigned (Rule X1).

Rules governing thread resources. On the TX2, the total thread count of all blocks assigned to an SM is limited to 2,048 per SM,6 and the total number of threads each block can use is limited to 1,024. These resource limits can delay the kernel at the head of the EE queue:

R1 A block of the kernel at the head of the EE queue is eligible to be assigned only if its resource

constraints are met.

R2 A block of the kernel at the head of the EE queue is eligible to be assigned only if there are sufficient

thread resources available on some SM.

Example (cont’d). In Figure 3.2 (a), K1 is at the head of the EE queue, so its blocks are eligible to be assigned (Rule R1). However, only four blocks of K1 have been assigned. This is because, as seen in

Table 3.1, each block of K1 requires 768 threads, and on the TX2, each SM is limited to allocate at most

2,048 threads at once. Thus, only four of K1’s six blocks can be scheduled together, so the remaining two must wait (Rule R2). In inset (b), these remaining two blocks have been scheduled. K4, next in the EE queue, requires four blocks of only 256 threads each, so all of its blocks fit on the GPU simultaneously with the remaining two blocks of K1, and all four blocks of K4 are assigned.

6Recall from Section 2.3.2.1 that on the TX2, each SM has 128 hardware cores available. The (up to) 2,048 threads assigned to blocks currently executing on that SM are multiprogrammed on those cores.

Rules governing shared-memory resources. Another constrained resource is the amount of GPU memory shared by threads in a block. On the TX2, shared-memory usage is limited to 64KB per SM and 48KB per block. Similar to threads, a block being considered for assignment will not be assigned until all of the shared memory it requires is available.

R3 A block of the kernel at the head of the EE queue is eligible to be assigned only if there are sufficient

shared-memory resources available on some SM.

Example (cont’d). Each block of K4 or K5 requires 32KB of shared memory, so at most two of these blocks can be assigned concurrently on an SM. In Figure 3.2 (b), two blocks of K4 are assigned to each SM.

Even though there are available threads for K5 to be assigned and it is the head of the EE queue at that time, no block of K5 is assigned until the blocks of K4 complete (Rule R3), as shown in inset (c).

Register resources. Another resource constraint is the size of the register file. On the TX2, each thread can use up to 255 registers, and a block can use up to 32,768 registers (regardless of its thread count).

Additionally, there is a limit of 65,536 registers in total on each SM. Unfortunately, using synthetic kernels makes it difficult to demonstrate limits on registers because the NVIDIA compiler optimizes register usage.

However, based upon available documentation (NVIDIA, 2019b), we expect limits on register usage to have exactly the same impact as the limits on thread and shared-memory resources demonstrated above. Note also that the number of registers can be limited at compile time using the maxrregcount compiler option.

Decreasing the number of registers used by a kernel makes its blocks easier to schedule at the expense of potentially greater execution time.
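For example, register pressure can be constrained per kernel with the __launch_bounds__ qualifier, as in the sketch below (the bound values are illustrative, and the resulting register count depends on what the compiler actually allocates).

#include <cstdio>
#include <cuda_runtime.h>

// Requesting that 1,024-thread blocks be schedulable with at least two blocks
// per SM encourages the compiler to cap register usage at roughly
// 65,536 registers / 2,048 threads = 32 registers per thread on the TX2.
__global__ void __launch_bounds__(1024, 2) capped_kernel(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = data[i] * data[i] + 1.0f;
}

int main() {
  cudaFuncAttributes attr;
  cudaFuncGetAttributes(&attr, capped_kernel);
  std::printf("registers per thread after capping: %d\n", attr.numRegs);
  return 0;
}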

Copy operations. As noted earlier, the TX2 has one CE to process both host-to-device and device-to-host copies. Such copies are governed by rules similar to those above:

C1 A copy operation is enqueued on the CE waiting queue when the associated CUDA API function is

invoked.

C2 A copy operation is enqueued on the CE queue when it reaches the head of the CE waiting queue and

its stream queue.

C3 A copy operation at the head of the CE queue is eligible to be assigned to the CE.

C4 A copy operation at the head of the CE queue is dequeued from the CE queue once the copy is assigned

to the CE on the GPU.

C5 A copy operation at the head of the CE waiting queue is dequeued from the CE waiting queue once it

is enqueued on the CE queue.

C6 A copy operation is dequeued from its stream queue once the CE has completed the copy.

Example (cont’d). As shown in Figure 3.2 (a), once C2o, C3i, C3o, and C5o are issued, they are enqueued on the CE waiting queue (Rule C1). Later, upon being issued, C6i and C6o are also enqueued on the CE waiting queue (Rule C1). In inset (d), once K2 and K5 complete execution and are dequeued from S1 and S3, the copies C2o and C5o become the heads of S1 and S3, respectively. However, only C2o also reaches the head of the CE waiting queue. Thus, C2o is the only copy operation enqueued on the CE queue (Rule C2) and dequeued from the CE waiting queue (Rule C5). C2o is immediately assigned to the CE (Rule C3), so it is dequeued from the CE queue (Rule C4).

Full example. Inset (a) of Figure 3.2 corresponds to time t = 1.1s in Figure 3.1. The first five kernels have been launched, and the kernels at the heads of each stream, K1, K4, and K5, have also been added to the EE queue, in issue order. K1 remains in the EE queue as it is not yet fully dispatched, so no blocks of K4 are eligible to be assigned. The copy operations C2o, C3i, C3o, and C5o are launched and enqueued on the CE waiting queue, in issue order.

Inset (b) corresponds to time t = 2.1s. The first four blocks of K1 have finished executing. Both K1 and

K4 are now fully dispatched, so they have been removed from the EE queue, but remain at the heads of their stream queues. No blocks of K5 are able to be dispatched because their required shared-memory resources are not available.

Inset (c) corresponds to time t = 3.0s. Upon completion of K1 and K4, both of K5’s blocks are assigned, and K2 is added to the EE queue and immediately becomes fully dispatched. C2o remains blocked by K2 in its stream queue due to FIFO stream ordering. C5o is similarly blocked.

Inset (d) corresponds to time t = 3.4s. K2 and K5 both complete execution, enabling C2o to be enqueued on the CE queue. C2o is assigned to the CE and removed from the CE queue. Due to FIFO stream ordering,

both C3i and K3 are delayed behind C2o. Upon being issued, the copy operations C6i and C6o are enqueued on the CE waiting queue. Due to FIFO stream ordering, K6 is delayed behind C6i.

Inset (e) corresponds to time t = 4.0s. K3 is fully dispatched. C3o is blocked in its stream queue until

K3 completes. Due to the FIFO CE waiting queue ordering, C5o and C6i are unnecessarily delayed behind

C3o. As a result, the CE and EE queues are empty.

Inset (f) corresponds to time t = 4.43s. K3 completes execution, enabling C3o to be enqueued on the CE queue. C5o and C6i are both enqueued on the CE queue. C3o is assigned to the CE and removed from the CE queue. The CE can perform only one copy operation at a time, so C5o remains at the head of the CE queue until C3o completes.

Inset (g) corresponds to time t = 5.3s. K6 is fully dispatched. C6o is blocked in its stream queue until

K6 completes.

Summary. The basic scheduling rules above define a variant of hierarchical FIFO scheduling where work may sometimes be subject to blocking delays. As FIFO CPU scheduling has analyzable response-time bounds on multiprocessors (Leontyev and Anderson, 2007), the scheduler defined by these rules is also amenable to real-time schedulability analysis, as we will show in Chapter 4.

3.2 Additional Complexities

Unfortunately, the basic rules in Section 3.1 are not the end of the story. In this section, we consider additional features available to CUDA programmers that can impact scheduling. These include usage of the default (NULL) stream, stream priorities, and streams in independent process address spaces. We also comment on other, less commonly used features that can influence scheduling but that we do not examine in detail.

3.2.1 The NULL Stream

Available documentation makes clear that two kernels cannot run concurrently if, between their issuances, any operations are submitted to the NULL stream (NVIDIA, 2019b). However, this documentation does not explain how kernel execution order is affected. In an attempt to elucidate this behavior, we conducted experiments in which interactions between (user-specified) streams and the NULL stream were observed.

Kernel  Launch Info  Start Time (s)  Duration (s)  # Blocks  # Threads per Block
K1      Stream S1    0.0             1.0           6         768
K2      NULL Stream  0.2             1.0           1         1,024
K3      Stream S2    0.2             1.0           4         256
K4      Stream S2    0.4             1.0           4         256
K5      NULL Stream  0.6             1.0           1         1,024
K6      Stream S3    0.8             1.0           2         256

Table 3.2: Details of kernels used in the NULL-stream scheduling experiment in Figure 3.3.

We found that these interactions are governed by the following rules, which reduce to rule G2 if the NULL stream is not used:

N1 A kernel Kk at the head of the NULL stream queue is enqueued on the EE queue when, for each other

stream queue, either that queue is empty or the kernel at its head was launched after Kk.

N2 A kernel Kk at the head of a non-NULL stream queue cannot be enqueued on the EE queue unless the

NULL stream queue is either empty or the kernel at its head was launched after Kk.

The result of an experiment demonstrating these rules is given in Figure 3.3. The kernels launched in this experiment are fully specified in Table 3.2. K2 and K5 were submitted to the NULL stream. K2 did not move to the EE queue until K1 completed (Rule N1); likewise, K5 did not move to the EE queue until both K3 and K4 had completed. No other kernel could move to the EE queue while K2 was at the head of the NULL stream queue, as all but K1 were launched after K2 (Rule N2). Because of the NULL-stream kernels, K6 was unnecessarily blocked and could not execute concurrently with K1, K3, or K4, so much capacity was lost.

This result demonstrates that usage of the NULL stream is problematic if real-time predictability and efficient platform utilization are desired.
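This pitfall is easy to trigger: any kernel launched without an explicit stream argument is submitted to the NULL stream. The following sketch (with a hypothetical busy_kernel, and assuming the default legacy NULL-stream semantics, i.e., without nvcc’s per-thread default stream option) contrasts the two launch forms.

#include <cstdio>
#include <cuda_runtime.h>

// A stand-in workload; the kernel itself is irrelevant to the pitfall.
__global__ void busy_kernel(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) for (int k = 0; k < 10000; k++) data[i] += 1.0f;
}

int main() {
  const int n = 1 << 16, threads = 256, blocks = n / threads;
  float *a, *b;
  cudaMalloc(&a, n * sizeof(float));
  cudaMalloc(&b, n * sizeof(float));
  cudaStream_t s1, s2;
  cudaStreamCreate(&s1);
  cudaStreamCreate(&s2);

  // Intended concurrency: both kernels are issued to user-specified streams.
  busy_kernel<<<blocks, threads, 0, s1>>>(a, n);
  busy_kernel<<<blocks, threads, 0, s2>>>(b, n);

  // Pitfall: omitting the stream argument submits to the NULL stream, which,
  // by Rules N1 and N2, serializes against kernels in user-specified streams
  // issued around it.
  busy_kernel<<<blocks, threads>>>(a, n);

  cudaDeviceSynchronize();
  cudaStreamDestroy(s1);
  cudaStreamDestroy(s2);
  cudaFree(a);
  cudaFree(b);
  return 0;
}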

3.2.2 Stream Priorities

We now consider how the usage of prioritized streams impacts the rules defined in Section 3.1.2. CUDA programmers can prioritize some streams over others and can determine the allowable priority settings by the

API call cudaDeviceGetStreamPriorityRange. On the TX2, this call returns only two priority values:

−1 (priority-high) and 0 (priority-low). As we show below, a stream with no priority specified (priority-none) is treated as priority-low.
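Prioritized streams are created as in the following sketch; on the TX2, the queried range is [−1, 0], as noted above.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int least, greatest;  // numerically greater values denote lower priority
  cudaDeviceGetStreamPriorityRange(&least, &greatest);
  std::printf("stream priority range: low = %d, high = %d\n", least, greatest);

  cudaStream_t high_prio, low_prio, none_prio;
  // A stream created with the greatest (most negative) priority is favored
  // whenever a block completes and a new block can be assigned.
  cudaStreamCreateWithPriority(&high_prio, cudaStreamNonBlocking, greatest);
  cudaStreamCreateWithPriority(&low_prio, cudaStreamNonBlocking, least);
  // A stream created without a priority behaves like a priority-low stream.
  cudaStreamCreate(&none_prio);

  cudaStreamDestroy(high_prio);
  cudaStreamDestroy(low_prio);
  cudaStreamDestroy(none_prio);
  return 0;
}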

Figure 3.3: NULL stream scheduling experiment.

Experiments. In the rest of this subsection, we present the results of several experiments we conducted to evaluate the effects of using stream priorities. These experiments use the same synthetic benchmarking techniques applied in Section 3.1.2. Relevant kernel properties are given in per-experiment tables. After discussing these experiments, we postulate new scheduling rules that specify how stream priorities affect scheduling.

Scheduling of priority-low streams vs. priority-high streams. Given that blocks are schedulable entities, the handling of prioritized streams as described in the available CUDA documentation (NVIDIA, 2019b) is exactly as one might expect. In particular, stream priorities are considered each time a block finishes execution and a new block can be assigned, with blocks from priority-high streams always being favored for assignment if their resource requirements are met. Note that this assignment behavior can potentially lead to the starvation of priority-low streams.

We conducted an experiment to illustrate this behavior using the kernels defined in Table 3.3. Figure 3.4 shows the GPU timeline that resulted from this experiment. As seen, K1 was launched first in a priority-low stream and four of its eight blocks had already been assigned when K2 and K3 were later launched in two priority-high streams. When the four initially assigned blocks of K1 completed execution, freeing all SM threads, K2 (launched second) effectively preempted K1, preventing K1’s remaining four blocks from being assigned. K3 was later dispatched when K2 completed, continuing the “starvation” of K1 until K3 completed.

Kernel   Launch Info                Start Time (s)   Duration (s)   # Blocks   # Threads per Block
K1       Stream S1 (low priority)   0.0              0.5            8          1,024
K2       Stream S2 (high priority)  0.2              0.5            16         1,024
K3       Stream S3 (high priority)  0.5              0.5            16         1,024

Table 3.3: Details of kernels used in the priority-stream scheduling experiment in Figure 3.4.

Figure 3.4: Experiment showing starvation of a priority-low stream.

Scheduling of priority-none streams vs. prioritized streams. The available CUDA documentation lacks clarity with respect to how scheduling is done when both priority-none and prioritized streams are used.

We experimentally investigated this issue and found that priority-none streams have the same priority as priority-low streams on the TX2.

Evidence of this can be seen in an experiment we conducted using the kernels defined in Table 3.4. Figure 3.5 shows the resulting GPU timeline. In this experiment, K2 was launched in a priority-none stream, K1 and K4 were launched in priority-low streams, and K3 was launched in a priority-high stream. K1 was launched first and four of its eight blocks were immediately assigned to the GPU. Once these four blocks completed execution, K3 effectively preempted K1 because K3 was at the head of a priority-high stream. After K3 executed to completion, K1 resumed execution because K2 and K4 had equal priority and could not preempt it. When K1 completed, K2 and K4 were dispatched in launch-time order. If priority-none were higher than priority-low, then K2 would have preempted K1; if priority-none were lower than priority-low, then K4 would have run before K2, despite being released later.

Kernel   Launch Info                        Start Time (s)   Duration (s)   # Blocks   # Threads per Block
K1       Stream S1 (low priority)           0.0              0.5            8          1,024
K2       Stream S2 (unspecified priority)   0.2              0.5            8          1,024
K3       Stream S3 (high priority)          0.3              0.5            8          1,024
K4       Stream S4 (low priority)           1.2              0.5            8          1,024

Table 3.4: Details of kernels used in the priority-stream scheduling experiment in Figure 3.5.

Figure 3.5: Experiment demonstrating two actual priority levels.

The scheduling of prioritized streams when resource blocking can occur. We wondered whether it was possible for kernels in priority-high streams experiencing resource blocking to be indefinitely delayed by kernels in priority-low streams that “cut ahead” and consume available resources. Our experimental results suggest this is not possible.

Evidence of this claim can be seen in an experiment we conducted using the kernels defined in Table 3.5, which yielded the resulting GPU timeline in Figure 3.6.7 K1–K7 were launched in quick succession to seven distinct priority-low streams. Once all had been dispatched, 512 threads were available on one SM. Then K8, with one block of 1,024 threads, was launched to a priority-high stream, followed by K9, with one block of 512 threads, launched to another priority-low stream. At this time, none of the initial seven kernels had completed execution, so the priority-high K8 could not be dispatched due to a lack of available threads. When K1 ultimately completed, 512 threads were then available on each SM, but K8 still could not be dispatched because a block can run on only one SM. Note that even though K9 required only 512 threads, it could not be dispatched because K8 was of higher priority. Finally, when K2 completed, K8 was dispatched because 1,024 threads became available on one SM, and K9 was dispatched because 512 other threads were available and the priority-high K8 no longer blocked it.

7This experiment generates different results with CUDA 9 and 10 on the TX2: K5–K9 are serialized for unknown reasons. However, this issue does not occur with discrete GPUs such as the TITAN V.

Kernel   Launch Info                 Start Time (s)   Duration (s)   # Blocks   # Threads per Block
K1       Stream S1 (low priority)    0.0              1.0            1          512
K2       Stream S2 (low priority)    0.1              1.0            1          512
K3       Stream S3 (low priority)    0.2              1.0            1          512
K4       Stream S4 (low priority)    0.3              1.0            1          512
K5       Stream S5 (low priority)    0.4              1.0            1          512
K6       Stream S6 (low priority)    0.5              1.0            1          512
K7       Stream S7 (low priority)    0.6              1.0            1          512
K8       Stream S8 (high priority)   0.65             0.5            1          1,024
K9       Stream S9 (low priority)    0.7              1.0            1          512

Table 3.5: Details of kernels used in the priority-stream scheduling experiment in Figure 3.6.

Figure 3.6: Experiment with both priorities and resource blocking.

Additional scheduling rules. Based on these experiments, we hypothesize that the TX2’s GPU scheduler includes one additional EE queue for priority-high kernels. The original EE queue described in Section 3.1.2 is for priority-low (and thus priority-none) kernels. These queues are subject to the following rules:

A1 A kernel can only be enqueued on the EE queue matching the priority of its stream.

A2 A block of a kernel at the head of any EE queue is eligible to be assigned only if all higher-priority EE queues (priority-high over priority-low) are empty.
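To make the hypothesized behavior concrete, the following sketch models the dispatch logic implied by the rules of Section 3.1 together with A1 and A2. It is a schematic abstraction, not NVIDIA’s implementation: kernels and SMs are reduced to counts, and constrained resources other than G-threads are ignored.

#include <deque>
#include <vector>

struct Kernel {
  int blocks_remaining;   // blocks of this kernel not yet assigned
  int threads_per_block;  // G-threads required by each block
};

struct SM {
  int free_threads;       // currently unassigned G-threads on this SM
};

// Rule A1: one EE queue per stream priority.
std::deque<Kernel*> ee_high, ee_low;

// One dispatch step. Rule A2: the priority-low queue is considered only if
// the priority-high queue is empty. Rule X1: only the head kernel's blocks
// are eligible. Rule R1: a block is assigned only if an SM has room.
// Rule G3: a fully dispatched kernel leaves its EE queue.
void dispatch_step(std::vector<SM>& sms) {
  std::deque<Kernel*>& q = !ee_high.empty() ? ee_high : ee_low;
  if (q.empty()) return;
  Kernel* k = q.front();
  for (SM& sm : sms) {
    if (k->blocks_remaining == 0) break;
    if (sm.free_threads >= k->threads_per_block) {
      sm.free_threads -= k->threads_per_block;
      k->blocks_remaining--;
    }
  }
  if (k->blocks_remaining == 0) q.pop_front();
}

Under this model, a priority-low kernel such as K9 in Figure 3.6 is never assigned while a priority-high kernel waits at the head of its queue, matching the observed behavior.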

3.2.3 Multiple Processes and Other Complications

In this section, we describe a few complications that arise when CUDA programs are invoked by CPU processes (instead of tasks) that have distinct address spaces.

Multiprogramming on the TX2. In previous work involving the NVIDIA TX1, a less-capable predecessor of the TX2 with a similar basic layout, we found that kernels issued by multiple GPU-using processes were scheduled on the GPU using multiprogramming (Otterness et al., 2017b). On the TX1, this meant that the GPU scheduler would switch between processes at the granularity of thread blocks, with kernels from different processes never actually executing at the same time. While studying the TX2, we observed that kernels issued by different processes are still multiprogrammed, but multiprogramming is implemented using GPU preemption features introduced in NVIDIA’s Pascal GPU architecture.

This observation is supported by Figure 3.7, which gives the results of two experiments we ran in which two instances of a compute-heavy Mandelbrot-Set kernel were run together. In the first experiment, the two kernel instances were issued by two separate processes, while in the second, they were issued by two separate tasks. In these experiments, we recorded for each kernel the start and end times of each thread block on the GPU and then used these times to deduce the total number of threads that appeared to have been assigned to the kernel through its execution. These thread counts are plotted on the vertical axes of the four timelines in Figure 3.7. The top two timelines are from the first experiment, where processes were considered, and the bottom two timelines are from the second experiment, where tasks were considered.

The bottom two timelines indicate that, in the experiment involving tasks, scheduling was in accordance with the rules discussed in Section 3.1: all blocks from one kernel were fully dispatched before the second kernel began execution. In contrast, the top two timelines indicate that, in the experiment involving processes, the GPU scheduler’s behavior is very different: 4,096 threads from each kernel appear to be fairly consistently running together at the same time. On the surface, this should appear to be impossible because the TX2 is only capable of running 4,096 threads across both of its SMs at any point in time (recall Footnote 6).

This behavior, however, is easily explained if we assume that CUDA thread blocks can be preempted.

Recall that we used block-time measurements, which are recorded on the GPU itself, to calculate the number of running threads. Our GPU kernel code cannot detect being preempted itself, so the start and end times for each block can actually encompass intervals in which the block was preempted.

Figure 3.7: Timelines contrasting kernels issued from processes (different address spaces) vs. tasks (shared address space).

We therefore conclude that benchmarks from separate processes were being multiprogrammed on the TX2 using preemptions, as this explains the apparent over-provisioning of GPU resources. Additionally, we provide Figure 3.8 to further support the conclusion that concurrent kernels from different address spaces preempt one another.

In Figure 3.8, the curves labeled “Process 1” and “Process 2” show the cumulative distribution function (CDF) of block times for Mandelbrot-Set kernels issued concurrently from separate processes. Likewise, the “Task 1” and “Task 2” curves show the same data for kernels issued from separate tasks. Consistent with Figure 3.7, the worst-case block times for the “Process” curves are over twice those of the “Task” curves. This is in direct contrast to our prior work using the previous-generation TX1, where we found that block times remained nearly identical regardless of the number of concurrent processes (Otterness et al., 2017b).

Stream priorities in multiple processes. We also experimentally examined the effects of using stream priorities when processes submit GPU operations instead of tasks. We found that assigning priorities to the streams of different processes had no effect on how the operations in these streams were scheduled. Thus, stream priorities have an effect only in the context of task-based computing, not process-based computing.

Figure 3.8: CDFs contrasting block times from processes vs. tasks.

We conclude Section 3.2 by noting several less commonly used features that may potentially impact scheduling. These include: (i) the nvcc compilation option that enables per-task NULL streams (Harris, 2015); (ii) the CUDA API that enables kernels from user-specified streams to execute concurrently with the NULL stream (Harris, 2015); (iii) a mechanism introduced with the Pascal GPU architecture that allows an executing kernel to be preempted at the instruction level (NVIDIA, 2016b); and (iv) a feature called dynamic parallelism that allows kernels to dynamically submit extra work by calling other kernel functions inside the GPU (Adinets, 2014). We conjecture that these features are detrimental to use if real-time predictability is a requirement.

3.3 Synchronization and Blocking

CPU scheduling has been studied and well understood for decades; in particular, real-time scheduling analysis of task systems is based on predictable scheduler and task behaviors. Incorporating GPUs into real-time analysis (as with all coprocessors) requires different models with new sets of issues to be considered.

In this section, we discuss one set of issues that lead to a surprising number of pitfalls when NVIDIA GPUs are used: synchronization.

In Sections 3.1 and 3.2, we investigated the scheduling rules for kernels and copy operations in CUDA programs. However, this investigation focused on a limited context where few CUDA operations beyond kernel launches and memory copies were used. In most real-world CUDA software, programmers will likely encounter (both intentionally and unintentionally) the need for synchronization between CPU and GPU operations. The added complexity of synchronization can result in capacity loss, potentially leading to unbounded response times in task sets with high utilization. In this section, we explore various forms of CPU-GPU synchronization and the resulting implications for real-time systems. We limit attention for now to CPU tasks that share a single Linux address space and create user-defined streams. As covered in detail in Section 3.4, this setup allows potential concurrency among operations on the GPU.

3.3.1 Overview of GPU Synchronization

Most developers are familiar with the concepts of synchronization in a CPU-only context where two or more tasks must communicate or coordinate their actions. Synchronization becomes more complicated when a CPU task must coordinate with programs executed on the GPU. The common case is that the CPU task must determine when data in GPU memory is safe to access (e.g., copy back to CPU memory). This is accomplished using GPU synchronization, where the GPU must complete outstanding work and reach a synchronization point: a point in time when data access can safely occur. There are also other, less common, cases when GPU synchronization is necessary.

In CUDA there are multiple ways to achieve GPU synchronization. They fall into two broad categories: explicit synchronization, which is always programmer-requested, and implicit synchronization, which can occur as a side effect of CUDA API functions intended for purposes other than synchronization. Our research has uncovered some unfortunate pitfalls relating to actual GPU synchronization behavior, especially with respect to blocking. While these behaviors may be harmless in non-safety-critical applications, ignoring the effects of specific synchronization mechanisms would be perilous in a safety-critical system, where blocking must be anticipated and accounted for in analysis.

3.3.1.1 Explicit Synchronization

Explicit synchronization refers to synchronization points that the CUDA programmer explicitly requests using the CUDA API. Explicit synchronization is typically used after a program has launched one or more asynchronous CUDA kernels or memory-transfer operations and must wait for computations to complete. In contrast to implicit synchronization, the sole purpose of explicit-synchronization functions is to block the calling CPU task until the GPU reaches a synchronization point.

The CUDA documentation8 states that explicit synchronization will block the calling task until “all preceding commands” have completed. For example, if the API function cudaDeviceSynchronize is invoked, “preceding commands” may encompass all commands issued to the device from all CPU tasks.

8Section 3.2.5.5.3 of the Programming Guide for CUDA version 10.2.89.

Figure 3.9: Explicit synchronization requested before K3, observed on the Jetson TX2.

Other explicit-synchronization options, including cudaStreamSynchronize, will only block until preceding commands from a specified stream have completed.
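The sketch below (with hypothetical kernels K_a and K_b, shown together purely for contrast) illustrates the difference in scope between the two calls.

#include <cuda_runtime.h>

// Hypothetical kernels used only for illustration.
__global__ void K_a() { }
__global__ void K_b() { }

void explicit_sync_example(cudaStream_t sA, cudaStream_t sB) {
  K_a<<<1, 256, 0, sA>>>();
  K_b<<<1, 256, 0, sB>>>();

  // Blocks this CPU task until all preceding GPU commands, issued from any
  // CPU task sharing this address space, have completed.
  cudaDeviceSynchronize();

  // Blocks only until the commands previously issued to stream sA complete;
  // work in sB (or in other tasks' streams) is not waited on.
  cudaStreamSynchronize(sA);
}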

We carried out experiments using the same framework as in Section 3.1 to investigate the specific behaviors of GPU synchronization on the Jetson TX2. Figure 3.9 shows the behavior of explicit synchronization observed in one such experiment. The CUDA program executed to produce Figure 3.9 consists of four CPU tasks all sharing a single address space. Each CPU task launched one kernel in a separate user-defined stream. Kernel launches were separated by a small amount of time. Each kernel consisted of two blocks of 512 threads, and the figure shows that one block from each kernel was scheduled on each SM. Each thread performed a busy-loop for a set amount of time.

An explicit-synchronization command, cudaDeviceSynchronize, was issued at time (a) by the CPU task responsible for launching kernel K3. This caused K3’s CPU task to be blocked until the prior commands, the execution of kernels K1 and K2, had both completed at time (c). This behavior is exactly what one would expect, given the description of explicit synchronization from official documentation. However, our experiments also uncovered Pitfall 1 for the unwary:

Pitfall 1. Explicit synchronization does not block future commands issued by other tasks.

The fact that the launch of K4 by its CPU task was not blocked at time (b) is an example of this pitfall.

Implicit synchronization, which we cover next, presents even more serious pitfalls.

3.3.1.2 Implicit Synchronization

Implicit synchronization occurs as a side effect of CUDA API calls that are otherwise unrelated to synchronization. For example, implicit GPU synchronization may occur due to freeing GPU memory or launching a kernel to the default stream. Presumably, this is because some modifications to GPU device state can only occur while no kernels are executing. The CUDA documentation about implicit synchronization9 states that “two commands from different streams cannot run concurrently if any one of the following operations is issued in-between them by the host thread:

1. A page-locked host memory allocation

2. A device memory allocation

3. A device memory set

4. A memory copy between two addresses to the same device memory

5. Any CUDA command to the NULL stream.”

Unlike the relatively straightforward documentation about explicit synchronization, our experiments revealed that this list includes several operations that do not necessarily cause implicit synchronization, and fails to include some functions that do. We consider this particularly problematic for real-time systems, where the ability to accurately model blocking is critical.

Pitfall 2. Documented sources of implicit synchronization may not occur.

Pitfall 2 became apparent to us when, in all of our experiments, we never observed implicit synchronization as a result of a device-memory operation (allocation, set, or copy) or a page-locked host memory allocation. Our experiments covered the two most recent CUDA versions, 8.0 and 9.0, and the three most recent NVIDIA GPU architectures, Maxwell, Pascal, and Volta. This, of course, does not prove that implicit synchronization can never happen under such circumstances, but it does indicate that the documentation’s statement that “two commands cannot run concurrently” is not a reliable rule. The only case (from this list) in which we did observe implicit synchronization was launching GPU operations in the NULL stream.

Figure 3.10 shows a similar scenario to the one in Figure 3.9, with one key difference: the CPU task for K3 did not call cudaDeviceSynchronize before K3 was launched, but instead launched K3 in the NULL stream.

9Section 3.2.5.5.4 of the Programming Guide for CUDA version 10.2.89.

Figure 3.10: Implicit synchronization caused by launching kernel K3 in the NULL stream.

The implicit synchronization, and resulting loss of concurrency, is clearly visible in the figure. Execution of K3 must wait for the first two kernels to complete, and, in contrast to explicit synchronization, K4 is also prevented from running concurrently. Even though this loss of concurrency may be striking, it is at least explicitly documented, and it can be deliberately used (or avoided) in a careful real-time task design.

We found, however, a different source of implicit synchronization that is a far more problematic pitfall, and is not even listed in the documentation on synchronization: freeing device memory.

Pitfall 3. The CUDA documentation neglects to list some functions that cause implicit synchronization.

Pitfall 4. Some CUDA API functions will block future, unrelated, CUDA tasks on the CPU.

Figure 3.11 shows the results of an experiment identical to the one in Figure 3.9, but this time the call to cudaDeviceSynchronize at time (a) was replaced with a call to cudaFree, which was used to de-allocate memory on the GPU. Pitfalls 3 and 4 can be observed in this plot. The fact that this blocked the calling CPU thread until all prior GPU work had completed at time (c) indicates that cudaFree created implicit synchronization. Similar to the NULL-stream behavior, implicit synchronization also prevented subsequent kernels from starting to execute until cudaFree completed at time (c). We speculate that this behavior by cudaFree is necessary because alterations to memory-mapping state require a quiescent execution environment. However, the most surprising effect was not that K4 was blocked, but that K4’s task was blocked on the CPU until time (c), even though it issued an “asynchronous” kernel launch. This reveals a pitfall that can undermine real-time analysis that does not account for the fact that CPU tasks can experience blocking from GPU operations launched by unrelated tasks.

Figure 3.11: Implicit synchronization causing additional CPU blocking due to cudaFree.

3.3.2 Overcoming Synchronization-Related Pitfalls

GPU synchronization has two problematic effects—introducing indeterminate amounts of blocking and reducing GPU concurrency. This means that programmers who develop real-time systems must understand the pitfalls inherent in explicit and implicit synchronization. This is especially true if the schedulability of a real-time task system relies on minimizing blocking or on high GPU utilization. These pitfalls can be avoided through careful construction of CUDA programs, for example, by avoiding the NULL stream and by confining memory freeing to designated time intervals. A more robust method would be to adopt middleware that handles such problems transparently.
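As an illustration of such careful construction, the sketch below (a hypothetical task structure, not code from our framework) confines allocation and freeing to initialization and teardown, so that neither the NULL stream nor cudaFree is touched while real-time jobs are running.

#include <cuda_runtime.h>
#include <cstddef>

struct GpuBuffers {
  float *input;
  float *output;
  cudaStream_t stream;
};

void task_init(GpuBuffers *b, std::size_t n) {     // before real-time mode
  cudaMalloc((void**)&b->input, n * sizeof(float));
  cudaMalloc((void**)&b->output, n * sizeof(float));
  cudaStreamCreate(&b->stream);
}

// Per-job body: only asynchronous, stream-tagged operations; no allocation,
// freeing, or NULL-stream usage that could trigger implicit synchronization
// or CPU blocking. h_in and h_out are assumed to be page-locked (e.g.,
// allocated with cudaMallocHost during initialization).
void task_job(GpuBuffers *b, const float *h_in, float *h_out, std::size_t n) {
  cudaMemcpyAsync(b->input, h_in, n * sizeof(float),
                  cudaMemcpyHostToDevice, b->stream);
  // ... kernel launches into b->stream would go here ...
  cudaMemcpyAsync(h_out, b->output, n * sizeof(float),
                  cudaMemcpyDeviceToHost, b->stream);
  cudaStreamSynchronize(b->stream);
}

void task_teardown(GpuBuffers *b) {                // after real-time mode
  cudaStreamDestroy(b->stream);
  cudaFree(b->input);    // safe here: no real-time jobs remain to block
  cudaFree(b->output);
}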

Our experiments indicate that GPU synchronization does not extend across GPU-using tasks that are isolated in separate address spaces. If synchronization is the dominant limiting factor on schedulability, it may be desirable to place each task in a separate address space (OS process). As explained in the next section, this organization means that CUDA kernels from different tasks can no longer execute concurrently, but it may still be beneficial overall if synchronization-related blocking is a greater limiting factor.

It turns out that NVIDIA may be aware of this issue. Even though it is not currently available for embedded platforms such as the TX2, NVIDIA does provide useful middleware for discrete GPUs: the CUDA Multi-Process Service (MPS). MPS allows kernels from multiple processes to execute concurrently on a single GPU, while maintaining the desirable property that GPU synchronization from one process will not affect other processes. We explore the benefits of MPS further in Section 3.4.

3.4 Concurrency and Performance

In Section 3.1 and Section 3.2, we investigated different GPU scheduling behavior when running GPU-using real-time task systems in two contexts: (i) each task has its own distinct address space, i.e., it runs as an OS process, and (ii) all tasks belong to the same address space, i.e., each task runs as a schedulable thread within a process. We refer to these two contexts as process-based and thread-based tasks, respectively.

While process-based tasks have the advantage of memory protection, they do not actually execute on the GPU concurrently; instead, GPU operations are multiprogrammed in a way that makes predictable scheduling of GPU-related resources difficult if not impossible to achieve (Section 3.2). When operations are multiprogrammed on a GPU, their execution times depend on contention for shared GPU resources, making it hard to bound a task’s overall execution time. Additionally, concurrency among GPU operations may be important in order to avoid wasting GPU processing cycles, especially when a single kernel cannot fully utilize the GPU’s resources. Although this may be avoided by running tasks with user-defined streams in a shared address space, a shared address space may actually reduce concurrency in task systems where tasks regularly interfere with each other via implicit synchronization (Section 3.3). Fortunately, NVIDIA provides a third option: middleware called the Multi-Process Service (MPS) (NVIDIA, 2019e).

3.4.1 Multi-Process Service (MPS)

MPS enables concurrent execution of GPU operations launched by independent CPU address spaces. It has the potential to combine the advantages of both thread- and process-based tasks. Programs written using the CUDA API require no changes to use MPS—if MPS is running, CUDA programs transparently issue requests to MPS rather than directly to a GPU. Official documentation reports that MPS operates as a server process with its own CUDA context, and that CUDA API requests are redirected from client processes to the MPS server. Because the server’s CUDA context is effectively shared, GPU operations launched by separate processes can execute concurrently on a shared GPU, providing the benefits of thread-based tasks. However, MPS also continues to preserve the advantage of process-based tasks: separate processes will not block each other with implicit or explicit synchronization.

It is not clear from available documentation how MPS actually schedules GPU operations and whether the GPU scheduling rules revealed in Section 3.1 are followed under MPS. For example, the documentation for MPS only mentions possible overlap between kernels and copy operations.10 Given the documentation flaws discussed in Sections 3.3 and 3.5.2, one could be skeptical of the veracity of this claim, so we verified experimentally, using the methods of Section 3.1, that those scheduling rules are also followed under MPS.

Maximizing the utilization of GPU resources using streams in thread-based tasks is suggested by NVIDIA’s “Best Practices Guide” (NVIDIA, 2019a). However, it would be unwise to simply take this recommendation at face value when choosing between MPS or a process- or thread-based task organization in a safety-critical system. Additionally, MPS is not yet supported on embedded ARM platforms like the Jetson TX2, so the other task organizations remain necessary on such platforms. Therefore, we conducted a case study on computer-vision software, demonstrating the performance differences among the available configurations.

3.4.2 Case Study of Computer-Vision Tasks

Our motivation is primarily autonomous driving, so we chose to study computer-vision algorithms that provide functions commonly used in that domain. In evaluating the results from this case study, we consider that the real-time tasks that use GPUs for autonomous driving may have multiple levels of criticality. Some may be safety-critical with hard deadlines and be provisioned for worst-case execution plus a margin for safety. Others may have only bounded-tardiness requirements, or may even be background work that can be provisioned for average-case execution.

We focus here on five programs from NVIDIA’s provided sample code for VisionWorks:

• Video Stabilization. Smooths shaky video content. This is often a preprocessing step for a computer-vision pipeline.

• Feature Tracking. Tracks features between consecutive frames. This algorithm is used to track the positions of objects in a scene.

10“MPS allows kernel and memory copy operations from different processes to overlap on the GPU, achieving higher utilization and shorter running times” (NVIDIA, 2019e).

              Multiple Process-based Tasks   Multiple Thread-based Tasks
Without MPS   MP                             MT
With MPS      MP(MPS)                        MT(MPS)

Table 3.6: Abbreviations used for our four experimental scenarios.

• Motion Estimation. Estimates the direction of moving pixels, which is fundamental to calculating trajectories of moving objects, e.g., pedestrians and other vehicles.

• Hough Transform (Hough). A feature-extraction algorithm; the provided sample detects circles and lines in images.

• Stereo Matching. Uses input from two cameras to generate depth information by matching features in both frames.

Methodology. We adapted NVIDIA’s VisionWorks samples to be compatible with our open-source experimental framework.11 These samples generally only use a single CUDA stream. We ran four instances of the same sample program in each experiment. We configured each instance to process 1,000 frames from a video sequence while recording per-frame response times. Our framework allows running each program instance in a shared address space (multiple thread-based tasks, MT) or in independent address spaces (multiple process-based tasks, MP), both with and without the MPS server active. This produces experiments for each algorithm in four different scenarios as summarized in Table 3.6. Experimental results under MT(MPS) were always similar to MT with slight overheads caused by MPS, so we omit it in all of our results for clarity. We conducted these experiments on a Maxwell-architecture discrete GPU with CUDA 9.0. We briefly summarize results on other devices and different CUDA versions later.
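For reference, per-frame response-time recording amounts to timestamping each frame’s submission and completion. The sketch below (with a hypothetical process_frame callback) illustrates the idea; our actual measurements were collected by the framework cited above.

#include <chrono>
#include <functional>
#include <vector>

// process_frame is a placeholder for the per-frame work (e.g., running one
// iteration of a VisionWorks sample) and is supplied by the caller.
std::vector<double> measure_frames(int num_frames,
                                   const std::function<void(int)> &process_frame) {
  std::vector<double> response_times_ms;
  response_times_ms.reserve(num_frames);
  for (int f = 0; f < num_frames; ++f) {
    auto start = std::chrono::steady_clock::now();
    process_frame(f);   // submit the frame's GPU work and wait for its result
    auto end = std::chrono::steady_clock::now();
    response_times_ms.push_back(
        std::chrono::duration<double, std::milli>(end - start).count());
  }
  return response_times_ms;
}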

Results. We show cumulative distribution function (CDF) and kernel density estimation (KDE)12 plots of Hough and Feature Tracking as representatives in Figures 3.12–3.15. The KDE curves were produced using the Python package scipy.stats.gaussian_kde. In both the CDF and KDE plots, each curve represents the recorded response-time data in an experimental scenario. For example, the curves labeled “x4 MP” in Figures 3.12 and 3.13 represent the per-frame response-time distributions where each of four Hough instances is run in a separate process. Result data for all five algorithms is summarized in Table 3.7, which lists the maximum, 99th-percentile, 90th-percentile, mean, and median frame times for each scenario and algorithm.

11Again, https://github.com/yalue/cuda_scheduling_examiner_mirror.

12KDE is a statistical method for estimating a continuous probability density function (PDF) from a set of discrete sample values.

VisionWorks Samples    Scenarios   Max      99th%   90th%   Mean    Median
Video Stabilization    MP          17.55    12.88   5.43    3.31    2.69
                       MP (MPS)    36.73    11.12   5.37    2.81    2.06
                       MT          17.0     13.87   8.94    4.72    3.63
Feature Tracking       MP          5.64     3.87    1.45    1.08    0.96
                       MP (MPS)    14.73    6.04    1.51    1.31    1.09
                       MT          31.11    20.86   11.51   4.68    2.68
Motion Estimation      MP          28.64    21.25   17.33   16.75   17.24
                       MP (MPS)    33.05    22.66   15.75   14.3    14.89
                       MT          42.86    26.12   16.53   15.07   15.14
Hough Transform        MP          13.56    11.61   7.28    5.68    5.7
                       MP (MPS)    18.35    11.66   6.44    3.74    3.18
                       MT          58.65    22.64   15.82   9.12    8.94
Stereo Matching        MP          75.13    50.54   30.42   24.14   24.77
                       MP (MPS)    59.73    45.05   26.87   22.59   24.41
                       MT          125.96   58.82   34.36   20.75   18.95

Table 3.7: Per-frame response time data (in milliseconds) of VisionWorks samples. The fastest scenario for each time metric is indicated by bold text.

Observation 3.1. MP(MPS) exhibits good average-case performance.

Observation 3.1 is supported by the data in Table 3.7. The 90th-percentile, mean, and median performance under configuration MP(MPS) were consistently good, with the best performance for three of the five algorithms.

For Feature Tracking, MP was best in all metrics, and for Stereo Matching, MT had better mean and median performance. The results for average-case performance indicate that using MP(MPS) would likely be an attractive option for soft-real-time systems, e.g., systems that can occasionally drop a video frame without compromising safety. We conjecture that the average-case performance advantage of MP(MPS) over MP in most cases is due to improved concurrency and lower GPU context-switching overheads.

Feature Tracking was the most notable exception to Observation 3.1. In this case, MP was only slightly better than MP(MPS) when comparing the 90th-percentile, mean, and median performance. We conducted additional experiments using NVIDIA’s CUDA-profiling tool, nvprof, to gain some insight into this behavior.

We found that Feature Tracking’s overall execution time is heavily influenced by a large number of memory transfers rather than by CUDA kernel executions. This likely means that MPS provides only limited GPU-concurrency benefits to Feature Tracking, benefits that fail to outweigh other MPS-related overheads.

Observation 3.2. Worst-case and 99th-percentile runtimes were typically better under MP.

Figure 3.12: Per-frame response time CDFs for Hough.

Figure 3.13: Per-frame response time KDEs for Hough.

Figure 3.14: Per-frame response time CDFs for Feature Tracking.

Figure 3.15: Per-frame response time KDEs for Feature Tracking.

While MP(MPS) largely resulted in average-case improvements, Table 3.7 shows that three of our five applications (Feature Tracking, Motion Estimation, and Hough Transform) had the smallest worst-case and 99th-percentile execution times under MP. This indicates that MP may be a better option for certain task systems where worst-case performance is more important than average-case performance. Our results illustrate why the trade-offs between process-based and thread-based designs for tasks must be evaluated for individual algorithms.

Observation 3.3. MP and MP(MPS) exhibit more predictable execution times than MT.

Observation 3.3 is supported by Figures 3.13 and 3.15, where the KDE shows a tight unimodal distribution for MP and MP(MPS) but not MT. A unimodal distribution function with little dispersion indicates that the response times exhibit low variance. MT, in contrast, shows both bimodal (in Figure 3.13) and unimodal (in Figure 3.15) distributions with significant dispersion, indicating high variance. Even if specific “spikes” are more difficult to observe in the corresponding CDF plots, the differences in response-time ranges are also apparent from the endpoints of the CDF curves in Figures 3.12 and 3.14.

Observation 3.4. The MT configuration generally performed poorly.

Observation 3.4 is supported by Table 3.7 and the plots. The only metrics where MT outperformed the other scenarios were the mean and median times for Stereo Matching, and worst-case response time for Video Stabilization (where MT was only slightly better than MP).

Other results. In addition to the results presented above, we also conducted this case study using CUDA 8.0 on a Maxwell discrete GPU (GTX 860M) and CUDA 9.0 on Pascal discrete GPUs (GTX 1050 and GTX 1070). Even though we chose to omit tables of results from the other GPUs and CUDA versions, we made similar observations, except that the performance of all configurations was better on a Pascal GPU. Additionally, the experimental results with CUDA 8.0 on the same Maxwell GPU were nearly identical to those using CUDA 9.0.

Pitfall 5. The suggestion from NVIDIA’s documentation to exploit concurrency through user-defined streams may be of limited use for improving performance in thread-based tasks.

We assumed that enabling concurrent GPU execution was of significant importance for limiting capacity loss in real-time workloads on embedded systems, and therefore fell victim to Pitfall 5. Instead, our results show that MT rarely outperforms tasks running as multiple processes, even without MPS. Additionally, any performance improvement via fine-tuned stream organization for MT can also be achieved with MP(MPS).

That being said, even though enabling concurrency using MP(MPS) is generally beneficial, it unfortunately is not an option on ARM-based embedded platforms like the Jetson TX2. We would encourage NVIDIA to consider this shortcoming in the hope that one day it may be addressed.

3.5 Perils of CUDA Programming for Real-Time Tasks

In the previous sections we presented several specific pitfalls in correctly designing and running CUDA programs for real-time tasks. Elements of both CUDA’s design and documentation contribute to this ensemble of perils to avoid. In this section, we discuss some of the broader categories of pitfalls.

3.5.1 Synchronous Defaults

As hinted in Section 3.3, one of the primary pitfalls when designing a real-time task system that uses a GPU is that all possible blocking must be accounted for in analysis. Therefore, reducing the amount of blocking on both the CPU and GPU is essential. On the GPU, this requires issuing all CUDA operations to user-defined (non-NULL) streams, and carefully controlling the use of other API functions, like cudaFree, that cause blocking via implicit synchronization.

Even though it may seem like an easy task for a programmer to just specify a user-defined stream as opposed to the NULL stream, we note that simple mistakes in doing so may be easy to miss. This is particularly true when using the Async versions of CUDA API functions, such as cudaMemsetAsync. For example, consider the code snippets in Listings 3.1 and 3.2, which present a particular example of Pitfall 6 below.

Listing 3.1: Causes implicit synchronization.

if (!CheckCUDAError(cudaMemsetAsync(
    state->device_block_smids, 0,
    data_size))) {
  return 0;
}

Listing 3.2: Correctly asynchronous.

if (!CheckCUDAError(cudaMemsetAsync(
    state->device_block_smids, 0,
    data_size, state->stream))) {
  return 0;
}

Pitfall 6. Async CUDA functions use the GPU-synchronous NULL stream by default.

Listing 3.1’s call to cudaMemsetAsync is missing a final argument specifying a user-defined stream, which causes the NULL stream to be used by default. As pointed out in Section 3.3.1.2, NULL-stream usage causes implicit synchronization and hence blocking. This mistake is corrected in Listing 3.2. This specific mistake actually led to months of mystifyingly inconsistent results in our own experiments—despite our relatively deep experience examining the subtleties of CUDA behavior (note that these code snippets are parts of much larger listings). Would an ML application developer catch such a mistake or appreciate its impact? Note that NVIDIA’s CUDA compiler does not catch this mistake because the compiler is based on the C++ programming language, which allows default arguments to functions.
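For readers trying to reproduce these snippets, one plausible shape for the CheckCUDAError helper they rely on is sketched below; this is an assumption on our part, since the actual helper lives in the larger framework from which the listings were taken. It returns nonzero on success and zero on error, which is why the listings negate its result.

#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if the CUDA call succeeded and 0 (after printing a message)
// otherwise, matching the usage in Listings 3.1 and 3.2.
static int CheckCUDAError(cudaError_t result) {
  if (result == cudaSuccess) return 1;
  fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(result));
  return 0;
}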

Even though the examples in Listings 3.1 and 3.2 only use cudaMemsetAsync, Pitfall 6 applies to other CUDA API functions as well, such as cudaMemcpyAsync. The fact that the CUDA documentation indicates that these functions cause implicit synchronization, as discussed in Section 3.3 and Section 3.5.2, makes potential programmer errors even harder to notice in cases where synchronization is due to NULL-stream usage rather than memory operations.

To summarize this discussion, CUDA provides a brittle programming environment: difficult-to-spot mistakes can have profound consequences for real-time tasks.

3.5.2 Flawed Documentation

Another substantial danger stems from the inaccurate official documentation provided by NVIDIA. While function signatures and data structures seem to receive accurate (but often sparse) official documentation, scheduling and synchronization remain under-discussed. We have demystified some scheduling rules (Section 3.1). In our work to demystify implicit synchronization (see definition in Section 3.3.1.2), however, we came across not only missing documentation, but incorrect documentation.

Pitfall 7. Observed CUDA behavior often diverges from what the documentation states or implies.

Consider Table 3.8. In all but one of the cases we investigated, the documentation claims implicit synchronization will occur when it does not. While this absence of synchronization may benefit performance, it may also cause incorrect timing analysis. Furthermore, program logic may be broken in the (albeit unlikely) case that the program relies on a function like cudaMemsetAsync to trigger GPU synchronization.

Unfortunately, the documentation also contains less-benign flaws. Take cudaFree and cudaFreeHost as an example. Our experiments in Section 3.3 found these functions to not only cause implicit synchronization, but block other CPU tasks from proceeding while cudaFree waits on the GPU. Much to our surprise, the documentation mentions neither of these side effects, leaving the reader to assume that these functions behave similarly to other CUDA functions and have no side effects.

Our experiments also revealed that cudaMalloc and cudaMallocHost may also cause cross-task CPU blocking in a similar manner to cudaFree in certain situations, even though these functions do not trigger implicit synchronization. As we have not yet determined the specific causes for this behavior, this property is indicated by an entry of ‘?’ in certain cells in Table 3.8. In any case, we failed to find any mention of this variant of CPU blocking in the CUDA documentation, and investigating these functions remains an open topic that we plan to explore in future work.

An especially worrying pitfall is the following:

                            Observed Behavior                                       Documented Behavior
                            Blocks Other   Implicit Sync.      Caller Must     Implicit Sync.      Caller Must
Source                      CPU Tasks      (Section 3.3.1.2)   Wait for GPU    (Section 3.3.1.2)   Wait for GPU
cudaDeviceSynchronize       No             No                  Yes             No                  Yes
cudaFree                    Yes            Yes                 Yes             No (undoc.)         No (impl.)
cudaFreeHost                Yes            Yes                 Yes             No (undoc.)         No (impl.)
cudaMalloc                  ?              No                  No              Yes                 No (impl.)
cudaMallocHost              ?              No                  No              Yes                 No (impl.)
cudaMemcpyAsync D-D         No             No                  No              Yes                 No
cudaMemcpyAsync D-H         No             No                  No              Yes*                No
cudaMemcpyAsync H-D         No             No                  No              Yes*                No
cudaMemset (sync.)          No             Yes                 No              Yes                 No
cudaMemsetAsync             No             No                  No              Yes                 No
cudaStreamSynchronize       No             No                  Yes             No                  Yes

Table 3.8: Observed vs. documented synchronization sources in CUDA. For cudaMemcpyAsync we distinguish the direction of copy between device and host: (D-D) internal to GPU memory; (D-H) GPU memory to CPU memory; (H-D) CPU memory to GPU memory. *The documentation is contradictory for these instances, but the more detailed option indicates that these functions only cause synchronization if host memory is not page-locked. We were unable to observe this regardless of whether host memory was page-locked or not.

Pitfall 8. CUDA documentation can be contradictory.

In one case, namely cudaMemcpyAsync, we discovered that the CUDA documentation actively contradicts itself. Section 3.2.5.1 of the CUDA Programming Guide states “The following device operations are asynchronous with respect to the host: ... Memory copies performed by functions that are suffixed with Async,” but Section 2 of the CUDA Runtime API documentation states “For transfers from device memory to pageable host memory, [cudaMemcpyAsync] will return only once the copy has completed.” This raises further doubts about the correctness of other parts of the CUDA documentation.

We note that the CUDA API contains 146 functions that are neither deprecated nor compatibility-related, and we have tested only a small fraction of these in depth. Therefore, it is likely that our findings related to Pitfalls 7 and 8 apply to other portions of the documentation that we have yet to examine.

3.5.3 Unknown Future

All of the pitfalls discussed in this section, as well as the need to compare the alternatives considered in Section 3.4 empirically, can be attributed to a single overarching problem: the black-box nature of current GPU-enabled platforms means that developers do not have a reliable model of GPU behavior. Much of our group’s prior work has focused on developing such a model. However, this highlights what is perhaps the most important pitfall:

Pitfall 9. What we learn about current black-box GPUs may not apply in the future.

Despite the fact that we validated our experimental results on several of the most recent CUDA versions and GPU architectures, there is no guarantee that our results will hold after future GPU-architecture or CUDA-version updates.

Even though other safety-critical hardware inevitably undergoes changes and updates, future-proof programs can still be developed against a stable specification. Likewise, the only way to truly mitigate Pitfall 9 is for GPU manufacturers to release stable, accurate documentation about their GPU platforms, preferably along with giving developers greater control over GPU scheduling and synchronization. Only then can we have a reliable GPU model upon which to base real-time analysis and certification. We hope that work such as ours signals to manufacturers like NVIDIA that greater openness is a desirable feature when marketing in safety-critical domains.

Unfortunately, there is little indication that NVIDIA plans to move towards open hardware or software in the immediate future. In the meantime, one of our continuing objectives is to produce tools, such as our experimental framework, that can be quickly adapted to new GPU hardware. So far, our tools have allowed us to quickly re-validate our prior results every time NVIDIA updates its black-box hardware or software.

3.6 Chapter Summary

We presented an in-depth study of GPU scheduling behavior on the NVIDIA TX2, an exemplar of platforms marketed today for supporting autonomous systems. We found that the GPU scheduler of the TX2 has predictable FIFO-oriented properties that are amenable to real-time schedulability analysis if all work is submitted by CPU tasks that share an address space or by CPU processes that are managed by MPS. On the other hand, GPU scheduling on the TX2 becomes more unpredictable and complex when GPU computations are launched by CPU processes that have distinct address spaces without using MPS. Unfortunately, this process-oriented approach is common in GPU program development.

We acknowledge that it is not possible to confirm with certainty the scheduling behavior of any GPU through only black-box experimentation. However, we counter this drawback by noting that plans are already underway to use these devices to realize safety-critical autonomous capabilities (NVIDIA, 2017b). If NVIDIA is unwilling to release details about internal GPU scheduling policies, then the only option available to us is to attempt to discern such policies through experimentation. Moreover, when mass-market vehicles eventually evolve to become fully autonomous, certification will become a crucial concern. It is simply not possible to certify with any degree of certainty a safety-critical system built using components that have unknown behaviors.

In the next chapter, we will derive response-time bounds for the GPU scheduler specified by the rules in Sections 3.1 and 3.2.

In addition to the revealed GPU scheduling rules, this chapter contributes a list of potential pitfalls when developing CUDA applications for real-time systems. Reasons for these pitfalls include GPU synchronization, application performance, and problems with documentation. We uncovered these pitfalls via microbenchmark experiments, an examination of the performance of real-world computer-vision applications, and a careful reading of official GPU documentation. While there is no guarantee of stability in our observations as NVIDIA’s hardware and software continue to evolve, we hope that our open-source experimental system will at least make it apparent when changes do occur.

The experimental methodology employed in this chapter is of a general nature and can be applied to other NVIDIA GPUs.

Acknowledgement. The contents of this chapter previously appeared in a series of papers that Tanya Amert, Nathan Otterness, and I collaboratively produced. Nathan implemented the CUDA scheduling examiner framework for the experiments. Tanya implemented the visualization scripts that plot the scheduling timeline figures (e.g., Figure 3.1). I ported the computer-vision sample code from NVIDIA CUDA samples to the aforementioned framework for the evaluation presented in Section 3.4.

For revealing the GPU scheduling rules, I designed the initial set of experiments in (Otterness et al., 2017a), which Tanya redesigned in (Amert et al., 2017); I then modified her version to correct the scheduling rules for copy operations in Section 3.1. The experiments for the additional complexities—the NULL stream (Tanya), process-based GPU access (Nathan), and stream priorities (me)—are included in Section 3.2.

Nathan and I conducted the experiments in Section 3.3, and I conducted the experiments in Section 3.4.

Joshua Bakita and Nathan summarized the list of perils in Section 3.5.

CHAPTER 4: TASK MODEL AND SCHEDULABILITY ANALYSIS FOR TASKS SHARING GPUS1

Previous chapters established a fundamental understanding of NVIDIA GPUs, allowing us to conduct abstract analysis and derive real-time guarantees for systems that share NVIDIA GPUs. In this chapter, we introduce a task model and a schedulability analysis for GPU tasks. We analyze GPU applications using a DAG model, in which each node of a DAG contains either CPU or GPU work. We will first review how prior work on DAG scheduling analysis can be applied in Section 4.1. Then, based on the CUDA programming model and GPU background provided in Section 2.3, we introduce a task model to describe GPU tasks.

Using this task model, we abstractly analyze the NVIDIA GPU scheduler specified by the scheduling rules revealed in Section 3.1. We show that intra-task parallelism is necessary to prevent extreme capacity loss. Then we demonstrate that NVIDIA GPUs inherently suffer from a fundamental capacity loss (even with intra-task parallelism), which we express by providing a total utilization bound. We show that this bound is tight with a counterexample, a task system that has total utilization matching this bound but unbounded response times. This total utilization bound forms the schedulability condition, under which we present a response-time bound analysis for GPU tasks that are allowed to execute concurrently on NVIDIA GPUs.

4.1 DAG-based Task Model

In this section, we review prior relevant work on the real-time scheduling of DAGs. We specifically consider a system $G = \{G^1, G^2, \ldots, G^N\}$ comprised of $N$ DAGs. The DAG $G^i$ consists of $n^i$ nodes, which correspond to $n^i$ sequential tasks $\tau^i_1, \tau^i_2, \ldots, \tau^i_{n^i}$. Each task $\tau^i_v$ releases a (potentially infinite) sequence of jobs $\tau^i_{v,1}, \tau^i_{v,2}, \ldots$. The edges in $G^i$ reflect precedence relationships. A particular task $\tau^i_v$'s predecessors are those tasks with outgoing edges directed to $\tau^i_v$, and its successors are those with incoming edges directed from $\tau^i_v$.

1Contents of this chapter previously appeared in the following papers: Yang, K., Yang, M., and Anderson, J. (2016a). Reducing response-time bounds for DAG-based task systems on heterogeneous multicore platforms. In Proceedings of the 24th International Conference on Real-Time Networks and Systems, pages 349–358. Yang, M., Amert, T., Yang, K., Otterness, N., Anderson, J. H., Smith, F. D., and Wang, S. (2018a). Making OpenVX really “real time”. In Proceedings of the 39th IEEE International Real-Time Systems Symposium, pages 80–93.

103 1 1

1 1 2 3

1 4

Figure 4.1: DAG G1.

The $j$th job of task $\tau^i_v$, denoted $\tau^i_{v,j}$, cannot commence execution until the $j$th jobs of all of its predecessors finish. Such dependencies only exist for the same invocation of a DAG, not across invocations. That is, while jobs are sequential, intra-task parallelism (i.e., parallel node invocation) is possible: successive jobs of a task are allowed to execute in parallel.

Example 4.1. Consider DAG $G^1$ in Figure 4.1. Task $\tau^1_4$'s predecessors are tasks $\tau^1_2$ and $\tau^1_3$, i.e., for any $j$, job $\tau^1_{4,j}$ waits for jobs $\tau^1_{2,j}$ and $\tau^1_{3,j}$ to finish. If intra-task parallelism is allowed, then $\tau^1_{4,j}$ and $\tau^1_{4,j+1}$ could execute in parallel. ♦

For simplicity, we assume that each DAG $G^i$ has exactly one source task $\tau^i_1$, with only outgoing edges, and one sink task $\tau^i_{n^i}$, with only incoming edges. Multi-source/multi-sink DAGs can be supported with the addition of singular “virtual” sources and sinks that connect multiple sources and sinks, respectively. Virtual sources and sinks have a WCET of zero.

Source tasks are released sporadically, i.e., for the DAG $G^i$, the job releases of $\tau^i_1$ have a minimum separation time, or period, denoted $T^i$. A non-source task $\tau^i_v$ ($v > 1$) releases its $j$th job $\tau^i_{v,j}$ after the $j$th jobs of all its predecessors in $G^i$ have completed. That is, letting $r^i_{v,j}$ and $f^i_{v,j}$ denote the release and finish times of $\tau^i_{v,j}$, respectively, $r^i_{v,j} \geq \max\{f^i_{w,j} \mid \tau^i_w \text{ is a predecessor of } \tau^i_v\}$. The response time of job $\tau^i_{v,j}$ is defined as $f^i_{v,j} - r^i_{v,j}$, and the end-to-end response time of the $j$th invocation of the DAG $G^i$ as $f^i_{n^i,j} - r^i_{1,j}$.

Deriving response-time bounds. An end-to-end response-time bound, the sum of response-time bounds of the nodes on the critical path of a DAG, can be computed inductively for a DAG by scheduling its nodes in a way that allows them to be viewed as sporadic tasks and by then leveraging response-time bounds applicable to such tasks (Liu and Anderson, 2010). In considering the scheduling of DAGs on a heterogeneous hardware platform that consists of different types of devices, response-time bounds for nodes running on each type of device can be derived separately. However, for the CPU-GPU platform we consider, while we have mentioned such analysis for CPU nodes, a new analysis for GPU nodes is needed.

When viewing nodes of a DAG $G^i$ as sporadic tasks, precedence constraints must be respected. This can be ensured by assigning an offset $\Phi^i_v$ to each task $\tau^i_v$ based on the response-time bounds applicable to “up-stream” tasks in $G^i$, and by requiring the $j$th job of $\tau^i_v$ to be released exactly $\Phi^i_v$ time units after the release time of the $j$th job of the source task $\tau^i_1$, i.e., $r^i_{v,j} = r^i_{1,j} + \Phi^i_v$, where $\Phi^i_1 = 0$. With offsets so defined, every task $\tau^i_v$ in $G^i$ (not just the source) has a period of $T^i$. Also, letting $C^i_v$ denote the WCET of $\tau^i_v$, its utilization can be defined as $u^i_v = C^i_v / T^i$.
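A minimal sketch of one standard way to compute such offsets is given below; the data layout is hypothetical, and nodes are assumed to be indexed in topological order with node 0 as the source.

#include <algorithm>
#include <cstddef>
#include <vector>

struct DagNode {
  std::vector<int> predecessors;  // indices of predecessor nodes
  double R;                       // response-time bound of this node
  double Phi;                     // computed offset
};

// Phi_v = max over predecessors w of (Phi_w + R_w); the source gets offset 0.
void compute_offsets(std::vector<DagNode> &dag) {
  for (std::size_t v = 0; v < dag.size(); ++v) {
    double phi = 0.0;
    for (int w : dag[v].predecessors)
      phi = std::max(phi, dag[w].Phi + dag[w].R);
    dag[v].Phi = phi;
  }
}

For the DAG of Example 4.1, this recurrence yields $\Phi^1_2 = \Phi^1_3 = \Phi^1_1 + R^1_1 = 9$ and $\Phi^1_4 = \max(\Phi^1_2 + R^1_2, \Phi^1_3 + R^1_3) = \max(14, 16) = 16$, matching the offsets used in the continuation of the example below.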

Example 4.1 (cont’d). Figure 4.2 depicts an example schedule for the DAG $G^1$ in Figure 4.1. The first (respectively, second) job of each task has a lighter (respectively, darker) shading to make them easier to distinguish. Assume that the tasks have deadlines as shown, and response-time bounds of $R^1_1 = 9$, $R^1_2 = 5$, $R^1_3 = 7$, and $R^1_4 = 9$, respectively. Based on these bounds, we define corresponding offsets $\Phi^1_1 = 0$, $\Phi^1_2 = 9$, $\Phi^1_3 = 9$, and $\Phi^1_4 = 16$, respectively. With these response-time bounds, the end-to-end response-time bound that can be guaranteed is determined by $R^1_1$, $R^1_3$, and $R^1_4$ and is given by $R^1 = 25$. The task response-time bounds used here depend on the scheduler employed. For example, if all tasks are scheduled via the G-EDF scheduler, then per-task response-time bounds can be determined from tardiness analysis for G-EDF (Devi and Anderson, 2008; Erickson and Anderson, 2011). In fact, this statement applies to any G-EDF-like (GEL) scheduler (Erickson et al., 2014).2 Such schedulers will be our focus. Recall that, according to the DAG-based task model introduced here, successive jobs of the same task might execute in parallel. We see this with jobs $\tau^1_{4,1}$ and $\tau^1_{4,2}$ in the interval $[23, 24)$. Such jobs could even finish out of release-time order due to execution-time variations. ♦

Early releasing. Using offsets may cause non-work-conserving behavior: a given job may be unable to execute even though all of its predecessor jobs have completed. Under any GEL scheduler, work-conserving behavior can be restored in such a case without altering response-time bounds (Devi and Anderson, 2008; Erickson and Anderson, 2011; Erickson et al., 2014) via a technique called early releasing (Anderson and Srinivasan, 2000), which allows a job to execute “early,” before its “actual” release time.

2Under such a scheduler, each job has a priority point within a constant distance of its release; an earliest-priority-point-first order is assumed.

Figure 4.2: Example schedule of the tasks in $G^1$ in Figure 4.1. (Assume depicted jobs are scheduled alongside other jobs, which are not shown.)

Schedulability. For DAGs as considered here, schedulability conditions for ensuring bounded response times hinge on conditions for ensuring bounded tardiness under GEL scheduling. Assuming a CPU-only platform

with M processors, if intra-task parallelism is forbidden, then the required conditions are u_v^i ≤ 1 for each task τ_v^i and ∑ u_v^i ≤ M (Devi and Anderson, 2008; Erickson et al., 2014). On the other hand, if arbitrary intra-task parallelism is allowed, then only ∑ u_v^i ≤ M is required and per-task utilizations can exceed 1.0 (Erickson and Anderson, 2011; Erickson et al., 2014). These conditions remain unaltered if arbitrary early releasing is allowed.

As noted earlier, response-time bounds for GPU nodes are needed in order to compute end-to-end response-time bounds. We derive such bounds for NVIDIA GPUs next.

4.2 GPU Response-Time Bound

In this section, we derive a response-time bound for tasks executing on NVIDIA GPUs. We provided a comprehensive background relating to NVIDIA GPUs in Section 2.3, of which we review the essentials to facilitate establishing the task model for GPU tasks.

4.2.1 NVIDIA GPU Details

The compute units of NVIDIA GPUs are SMs, typically comprised of 64 or 128 physical GPU cores.

The SMs together can be logically viewed as an EE. Execution on these GPUs is constrained by the number of available GPU threads, which we call G-threads to distinguish from CPU threads; on current NVIDIA

GPUs, there are 2,048 G-threads per SM.

CUDA programs submit work to a GPU as kernels. A kernel is run on the GPU as a series of thread blocks. These thread blocks, or simply blocks, are each comprised of a number of G-threads. The number of blocks and G-threads per block (i.e., the block size) are set at runtime when a kernel is launched. The GPU scheduler uses these values to assign work to the GPU’s SMs. Blocks are the schedulable entities on the GPU.

All G-threads in a block are always scheduled on the same SM, and execute non-preemptively.
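As a concrete illustration of these launch parameters (a sketch of standard CUDA usage, not code from the dissertation or Darknet), the following launches a trivial kernel as 6 blocks of 512 G-threads each, which matches the block configuration of kernel K2 in Example 4.2 below; the kernel body and buffer are purely illustrative.

```cuda
#include <cuda_runtime.h>

/* Illustrative kernel: each G-thread scales one element of a buffer. */
__global__ void scale_kernel(float *data, int n, float factor) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  /* global G-thread index */
    if (idx < n)
        data[idx] *= factor;
}

void launch_example(float *d_data, int n) {
    /* The block size (G-threads per block) and block count are fixed at launch time;
     * here, 6 blocks of 512 G-threads each. Blocks, not individual G-threads, are
     * what the GPU scheduler assigns to SMs. */
    dim3 blockDim(512);
    dim3 gridDim(6);
    scale_kernel<<<gridDim, blockDim>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();  /* CPU-side wait for kernel completion */
}
```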

In Chapter 3, we documented scheduling rules used by NVIDIA GPUs when either all GPU work is submitted from the same address space or NVIDIA’s multi-process service (MPS) (NVIDIA, 2019e) is used, which we assume. For simplicity, we restate here only the rules needed for our purposes. These rules govern how kernels are enqueued on and dequeued from a FIFO EE queue, as depicted in Figure 4.3. CUDA also provides a concept called a CUDA stream that adds an additional layer of queueing in the form of stream queues prior to the EE queue. However, as explained later, we assume streams are used in a way that effectively obviates these additional queues. (Our statement of Rule G2 has been simplified from the original to reflect this assumption.) The following terminology is used in the rules below. A block is assigned when it is scheduled for execution on an SM. A kernel is dispatched when one or more of its blocks are assigned. A kernel is fully dispatched when its last block is assigned.

G2 A kernel is enqueued on the EE queue when launched.

G3 A kernel at the head of the EE queue is dequeued from that queue once it becomes fully dispatched.

X1 Only blocks of the kernel at the head of the EE queue are eligible to be assigned.

R1 A block of the kernel at the head of the EE queue is eligible to be assigned if its resource constraints are met.

Constrained resources include shared memory, registers, and (of course) G-threads. We assume that an

NVIDIA-provided CUDA compiler option limiting register usage is applied to obviate blocking for registers.

We consider techniques for handling shared-memory-induced blocking later.
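For instance (a sketch of standard CUDA mechanisms, not the dissertation's build configuration), register usage can be capped per compilation unit via nvcc's --maxrregcount option or per kernel via the __launch_bounds__ qualifier; the kernel name and the specific limits below are illustrative.

```cuda
/* Cap register usage for all kernels in a compilation unit (illustrative limit):
 *   nvcc -O2 --maxrregcount=32 -c vision_kernels.cu
 *
 * Or constrain it per kernel with __launch_bounds__: the compiler restricts register
 * allocation so that blocks of up to 256 G-threads, with at least 2 resident blocks
 * per SM, remain schedulable. */
__global__ void __launch_bounds__(256, 2) conv_kernel(const float *in, float *out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = in[idx];  /* placeholder body */
}
```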

[Figure 4.3: GPU scheduling; kernel K_k's b-th block is denoted K_k:b. Tasks τ1 and τ2, running on CPU 0 and CPU 1, submit kernels K1 and K2 to the FIFO EE queue; blocks K1:0 and K1:1 and four blocks of K2 (K2:0 through K2:3) are shown assigned across SM 0 and SM 1 of the GPU.]

Example 4.2. In the simple case depicted in Figure 4.3, the GPU is comprised of two SMs. Two tasks submit one kernel each, and these are immediately enqueued on the EE queue upon launch (Rule G2). Kernel K1 is comprised of two blocks of 1,024 G-threads each; K2 is comprised of six blocks of 512 G-threads each. K1 is fully dispatched, so it has been dequeued from the EE queue (Rule G3). The remaining two blocks of K2 do not fit on either SM, and thus are not yet assigned (Rule R1); K2 is not fully dispatched, so it is still at the head of the EE queue. Any new kernel K3 would be behind K2 in the EE queue, so its blocks would be ineligible to be assigned until K2 is fully dispatched (Rule X1). ♦

4.2.2 System Model

One of the virtues of having each node restricted to a type of device in a DAG, as we mentioned in

Section 4.1, is that concerns pertaining to CPU and GPU work can be considered separately. Therefore,

GPU kernels are just sporadic tasks that can be analyzed independently from CPU tasks. In deriving a response-time bound, we therefore restrict our attention to a set τ of n independent sporadic GPU tasks

{τ1, τ2, ..., τn}, which are scheduled via the rules in Section 4.2.1 on a single GPU with multiple SMs. Each task τi has a period Ti, defined as before. With our focus on GPU-using tasks, additional notation (to be illustrated shortly) is needed to express execution requirements. Each job of task τi consists of Bi blocks, each of which is executed in parallel by exactly Hi G-threads.3 Hi is called the block size of τi,4 and Hmax = max_i{Hi} denotes the maximum block size in the system. We denote by Ci the per-block worst-case execution workload of a block of τi, where one unit of execution workload is defined by the work completed by one G-thread in one time unit. In summary, a GPU task τi is specified as τi = (Ci, Ti, Bi, Hi).

Note that Ci corresponds to an amount of execution workload instead of execution time. As each block of task τi requires Hi threads concurrently executing in parallel, the worst-case execution time of a block of τi is given by Ci/Hi.

Definition 4.1. (block length) For each task τi, its maximum block length is defined as Li = Ci/Hi. The maximum block length of tasks in τ is defined as Lmax = maxi{Li}.

The utilization of task τi is given by ui = Ci · Bi/Ti, and the total system utilization by Usum = ∑_{τi∈τ} ui. Let τi,j denote the j-th (j ≥ 1) job of τi.5 The release time of job τi,j is denoted by ri,j, its (absolute) deadline by di,j = ri,j + Ti, and its completion time by fi,j; its response time is defined as fi,j − ri,j. A task's response time is the maximum response time of any of its jobs.

SM constraints. We consider a single GPU platform consisting of g identical SMs, each of which consists of m G-threads (for NVIDIA GPUs, m = 2048). A single block of τi must execute on Hi G-threads that belong to the same SM. That is, as long as there are fewer than Hi available G-threads on each SM, a block of

τi, j cannot commence execution even if the total number of available G-threads (from multiple SMs) in the

GPU exceeds Hi. On the other hand, different blocks may be assigned to different SMs for execution even if these blocks are from the same job.

Similar to G-thread limitations, there are per-SM and per-block limits on shared-memory usage on

NVIDIA GPUs. In experimental work involving computer-vision workloads on NVIDIA GPUs spanning several years, we have never observed blocking due to shared-memory limits on any platform for any workload.

Thus, in deriving our response-time bound in Section 4.2.4, we assume that such blocking does not occur.

3Blocks are executed in units of 32 G-threads called warps. Warp schedulers switch between warps to hide memory latency. This can create interference effects that must be factored into the timing analysis applied to blocks, which we assume is done in a measurement-based way. With warp-related interference incorporated into timing analysis, the G-threads in a block can be treated as executing simultaneously.
4Current NVIDIA GPUs require block sizes to be multiples of 32, and the CUDA runtime rounds up accordingly. Additionally, the maximum possible block size is 1,024. A task's block size is determined offline.
5For the time being, we assume that jobs of GPU tasks are not early released, but we will revisit this issue at the end of Section 4.2.4.

[Figure 4.4: A possible schedule corresponding to Figure 4.3, where m = 2048, τ1 = (3072, 5, 2, 1024), and τ2 = (512, 8, 6, 512); rectangle Ki,j:b corresponds to the b-th block of job τi,j. The schedule spans the interval [0, 8) on SM 0 and SM 1, with job releases r1,1, r2,1, r1,2, and r2,2 marked.]

After deriving the bound, we discuss ways in which shared-memory blocking can be addressed if it becomes an issue.

Example 4.3. Our GPU task model is illustrated in Figure 4.4. There are two SMs, and B1 = 2 and B2 = 6.

The height of a rectangle denoting a block is given by Hi and its length, which denotes its runtime duration, by Li; the area is bounded by Ci. ♦

Intra-task parallelism. We assume that intra-task parallelism is allowed: consecutive jobs of the same task can execute in parallel if both are pending and sufficient G-threads are available. This is often the case in computer-vision processing pipelines where each video frame is processed independently. Additionally,

Theorem 4.1 below shows that severe schedulability-related consequences exist if intra-task parallelism is forbidden. Practically speaking, intra-task parallelism can be enabled by assuming per-job streams. A stream is a FIFO queue of operations, so two kernels submitted to a single stream cannot execute concurrently. Thus, the alternative of using per-task streams would preclude intra-task parallelism. Note that, with each job issuing one kernel at a time, any actual stream queueing is obviated.

4.2.3 Total Utilization Constraint

According to the rules in Section 4.2.1, idle G-threads can exist while the kernel at the head of the EE queue has unassigned blocks. In particular, this can happen when the number of idle threads on any one

SM is insufficient for scheduling such a block. Such scenarios imply that some capacity loss is fundamental

110 when seeking to ensure response-time bounds for GPUs. We express such loss by providing a total utilization bound and proving that any system with Usum at most that bound has bounded response times. The utilization bound we present relies on the following definition.

Definition 4.2. (unit block size) The unit block size, denoted by h, is defined as the greatest common divisor (gcd) of all tasks' block sizes and m, i.e., h = gcd({Hi}_{i=1}^{n} ∪ {m}).
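As a small illustration (a sketch, not code from the dissertation), h can be computed by folding the gcd over the block sizes and m; the example values are those of Figure 4.4 (H1 = 1024, H2 = 512, m = 2048), for which h = 512.

```c
#include <stdio.h>

/* Euclid's algorithm. */
static unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Unit block size h = gcd of all block sizes H_i and the per-SM G-thread count m. */
static unsigned unit_block_size(const unsigned *H, int n, unsigned m) {
    unsigned h = m;
    for (int i = 0; i < n; i++)
        h = gcd(h, H[i]);
    return h;
}

int main(void) {
    unsigned H[] = {1024, 512};  /* block sizes of tau_1 and tau_2 in Figure 4.4 */
    printf("h = %u\n", unit_block_size(H, 2, 2048));  /* prints h = 512 */
    return 0;
}
```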

The theorem below shows that capacity loss can be extreme if intra-task parallelism is forbidden.

Theorem 4.1. With per-task streams, for any given g, m, Hmax, and h, there exists a task system τ with Usum greater than but arbitrarily close to h such that the response time of some task may increase without bound in the worst case.

Proof. For any m and Hmax, m = Z · h for some integer Z ≥ 1 and Hmax = K · h for some integer K ≥ 1, because h is a common divisor of m and Hmax. Recall that there are g SMs. Consider the following task system:

τi Ci Ti Bi Hi Li

τ1 h 1 1 h 1

τ2 2ε · Hmax 1 + ε 1 Hmax 2ε

τ3 2ε · h 1 + ε g · Z − K h 2ε

For this task system, u1 = h, u2 = (2Hmax/(1 + ε)) · ε, and u3 = (2h · (g · Z − K)/(1 + ε)) · ε, so Usum approaches h from above as ε → 0⁺.

Now consider the following job execution pattern, which is illustrated in Figure 4.5 for g = 2: τ1 releases its first job at time 0, τ2 and τ3 release their first jobs at time 1 − ε, all three tasks continue to release jobs as early as possible, and every block executes for its worst-case execution workload. Assume that, every time τ2 and τ3 simultaneously release a job, the job of τ2 is enqueued on the EE queue first. Note that in

Figure 4.5, block boundaries for τ3 are omitted when possible for clarity.

At time 0, τ1,1 is the only job in the EE queue and is therefore scheduled. Then, at time 1 − ε, τ2,1 and

(g · Z − K − 1) blocks of τ3,1 are scheduled for the interval [1 − ε, 1 + ε). As a result, all available G-threads are then occupied. Therefore, the one remaining block of τ3,1 must wait until time 1, when τ1,1 finishes. Note that, with per-task streams, a job cannot enter the EE queue until the prior job of the same task completes. τ1,2 enters the EE queue at time 1, after τ3,1, which entered at time 1 − ε. Thus, τ1,2 cannot begin execution until the last block of τ3,1 is assigned and sufficient G-threads become available at time 1 + ε.

[Figure 4.5: Unbounded response time using per-task streams; jobs of τ1, τ2, and τ3 on SM 0 and SM 1 over time.]

This pattern repeats, with task τ1 releasing a job every time unit but finishing a job every 1 + ε time units. Thus, its response time increases without bound.

For example, on the test platform considered in Section 6.2, g = 80, m = 2048, and h can be as small as 32. A utilization bound of h = 32 then corresponds to only 32/(80 · 2048) ≈ 0.02% of the platform's total capacity of g · m = 163,840 G-threads, so close to 99.98% of the hardware capacity may be wasted!

In contrast, as we show in Section 4.2.4, if intra-task parallelism is allowed, then any task set with

Usum ≤ g · (m − Hmax + h) has bounded response times. Furthermore, the following theorem shows that this utilization bound is tight (under our analysis assumptions).

Theorem 4.2. With per-job streams, for any given g, m, Hmax, and h, there exists a task system τ with Usum greater than but arbitrarily close to g · (m − Hmax + h) such that the response time of some task may increase without bound in the worst case.

Proof. For any m and Hmax, integers P and Q exist such that m = P·Hmax +Q, where P ≥ 1 and 0 ≤ Q < Hmax.

Furthermore, by the definition of h, Hmax = K · h for some integer K ≥ 1, and m = Z · h for some integer

Z ≥ 1. Thus, m = P · Hmax + Q = P · K · h + Q = Z · h. Consider the following task set (if Q = 0, then τ2 need not exist):

τi Ci Ti Bi Hi Li

τ1 ε · Hmax 1 g · P Hmax ε

τ2 ε · Q 1 g Q ε

τ3 h 1 g · (Z − K + 1) h 1

For this task system, u1 = (Hmax · g · P) · ε, u2 = (Q · g) · ε, and u3 = h · g · (Z − K + 1) = g · (m − Hmax + h), so Usum approaches g · (m − Hmax + h) from above as ε → 0⁺.

[Figure 4.6: Unbounded response time using per-job streams; jobs of τ1, τ2, and τ3 on SM 0 and SM 1 over time.]

Now consider the following job execution pattern, which is illustrated in Figure 4.6 for g = 2: all three tasks release jobs as soon as possible, i.e., at time instants 0,1,2,..., the EE enqueue order is always τ1, τ2, and then τ3, and every block executes for its worst-case execution workload.

At time 0, the g · P blocks of τ1 are scheduled first, leaving Q available G-threads in each SM. Next, the g blocks of τ2 are scheduled using the remaining Q G-threads on each SM. Thus, all G-threads are fully occupied in the time interval [0,ε). As we often see in experiments, the g · (Z − K + 1) blocks of τ3 are distributed to the g SMs evenly, and are scheduled for the time interval [ε,1 + ε). Note that, although we currently do not have sufficient evidence to guarantee this even distribution, it at least represents a potential worst case.

Notice that there are only m − (Z − K + 1) · h = (Hmax − h) G-threads available on each of the g SMs during the interval [ε,1 + ε). Therefore, none of the blocks of τ1,2, which has a block size of Hmax, will be scheduled before time 1 + ε. As a result, no blocks of τ2,2 or τ3,2 will be scheduled before time 1 + ε either, because they are enqueued after τ1,2 on the FIFO EE queue. This pattern repeats, with each of the three tasks releasing a job every time unit but finishing a job every

1 + ε time units, so the response time of each task increases without bound.

4.2.4 Response-Time Bound

In this section, we derive a response-time bound assuming per-job streams are used (i.e., intra-task parallelism is allowed) and the following holds.

Usum ≤ g · (m − Hmax + h) (4.1)

Our derivation is based on the following key definition.

Definition 4.3. (busy and non-busy) A time instant is called busy if and only if at most (Hmax −h) G-threads are idle in each of the g SMs; a time instant is called non-busy if and only if at least Hmax G-threads are idle in some of the g SMs. A time interval is called busy if and only if every time instant in that interval is busy.

By Definition 4.2, h is the minimum amount by which the number of idle G-threads can change, so

“more than (Hmax−h) G-threads are idle” is equivalent to “at least Hmax G-threads are idle.” Thus, busy and non-busy time instants are well-defined, i.e., a time instant is either busy or non-busy.

To derive response-time bounds for all tasks in the system, we bound the response time of an arbitrary job τk, j. The following two lemmas bound the unfinished workload at certain time instants. In the first lemma, t0 denotes the latest non-busy time instant at or before τk, j’s release time rk, j, i.e., t0 = rk, j or (t0,rk, j] is a busy interval.

Lemma 4.1. At time t0, the total unfinished workload from jobs released at or before t0, denoted by W(t0), satisfies W(t0) ≤ Lmax · (g · m − Hmax).

Proof. Suppose there are b blocks, β1, β2, ..., βb, that have been released but are unfinished at time t0. For each block βi, let H(βi) denote its block size and let C(βi) denote its worst-case execution workload. By definition, t0 is a non-busy time instant, so by Definition 4.3, at least Hmax G-threads are idle in some SM at time t0. Because this SM has enough available G-threads to schedule any of the b blocks, they all must be scheduled at time t0. These facts imply

∑_{i=1}^{b} H(βi) ≤ g · m − Hmax. (4.2)

Therefore,

W(t0) = ∑_{i=1}^{b} C(βi)
      = {by the definition of Li in Definition 4.1} ∑_{i=1}^{b} (L(βi) · H(βi))
      ≤ {by the definition of Lmax in Definition 4.1} ∑_{i=1}^{b} (Lmax · H(βi))
      = {rearranging} Lmax · ∑_{i=1}^{b} H(βi)
      ≤ {by (4.2)} Lmax · (g · m − Hmax).

The lemma follows.

Lemma 4.2. At time rk,j, the total unfinished workload from jobs released at or before rk,j, denoted by W(rk,j), satisfies W(rk,j) < Lmax · (g · m − Hmax) + ∑_{i=1}^{n} (Bi · Ci).

Proof. Let new(t0,rk, j) denote the workload released during the time interval (t0,rk, j], and let done(t0,rk, j) denote the workload completed during the time interval (t0,rk, j]. Then,

W(rk, j) = W(t0) + new(t0,rk, j) − done(t0,rk, j). (4.3)

As each task τi releases consecutive jobs with a minimum separation of Ti, new(t0, rk,j) can be upper bounded by

new(t0, rk,j) ≤ ∑_{i=1}^{n} ⌈(rk,j − t0)/Ti⌉ · Bi · Ci
             < {because ⌈a⌉ < a + 1} ∑_{i=1}^{n} ((rk,j − t0)/Ti + 1) · Bi · Ci
             = {rearranging} (rk,j − t0) · ∑_{i=1}^{n} (Bi · Ci)/Ti + ∑_{i=1}^{n} (Bi · Ci)
             = {by the definitions of ui and Usum} (rk,j − t0) · Usum + ∑_{i=1}^{n} (Bi · Ci). (4.4)

By Definition 4.3, (t0,rk, j] being a busy time interval implies that at most (Hmax − h) G-threads in each of the g SMs are idle at any time instant in this time interval. That is, at least g · (m − Hmax + h) G-threads are occupied executing work at any time instant in (t0,rk, j]. Therefore,

done(t0,rk, j) ≥ (rk, j −t0) · g · (m − Hmax + h). (4.5)

By (4.3), (4.4), and (4.5),

W(rk,j) < W(t0) + (rk,j − t0) · Usum + ∑_{i=1}^{n} (Bi · Ci) − (rk,j − t0) · g · (m − Hmax + h)
        = {rearranging} (rk,j − t0) · (Usum − g · (m − Hmax + h)) + W(t0) + ∑_{i=1}^{n} (Bi · Ci)
        ≤ {by (4.1) and t0 ≤ rk,j} W(t0) + ∑_{i=1}^{n} (Bi · Ci)
        ≤ {by Lemma 4.1} Lmax · (g · m − Hmax) + ∑_{i=1}^{n} (Bi · Ci).

The lemma follows.

The following theorem provides our response-time bound.

Theorem 4.3. τk,j finishes the execution of all of its blocks by time rk,j + Rk, where

Rk = (Lmax · (g · m − Hmax) + ∑_{i=1}^{n} (Bi · Ci) − Ck) / (g · (m − Hmax + h)) + Lk. (4.6)

Proof. Since the EE queue is FIFO, we omit all jobs released after rk,j in the analysis. Thus, any workload executed at or after rk,j is from W(rk,j). We also assume each block of τk,j executes for its worst-case workload Ck6 (if any of its blocks executes for less, τk,j's completion is not delayed).

Let β* denote the last-finished block of τk,j. Then, the workload from other blocks or jobs at rk,j is W(rk,j) − Ck. Let t* denote the time instant at which β* starts execution. Then, [rk,j, t*) is a busy interval (else β* would have executed before time t*). Let done(rk,j, t*) denote the workload completed during the time interval [rk,j, t*). Then, by Definition 4.3,

done(rk,j, t*) ≥ (t* − rk,j) · g · (m − Hmax + h). (4.7)

The workload Ck from β* executes beyond time t*, so done(rk,j, t*) ≤ W(rk,j) − Ck. By (4.7), this implies t* ≤ rk,j + (W(rk,j) − Ck) / (g · (m − Hmax + h)). At time t*, β* executes continuously for Lk time units. Thus, β* finishes by time rk,j + (W(rk,j) − Ck) / (g · (m − Hmax + h)) + Lk. By Lemma 4.2, the theorem follows.
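To make the bound concrete, the following C sketch (our own helper names and a plain array-of-structs task representation, not the dissertation's tooling) checks condition (4.1) and evaluates Rk per (4.6) for the two tasks and two SMs of Figure 4.4, with m = 2048 and h = 512 as computed earlier.

```c
#include <stdio.h>

/* GPU task tau_i = (C_i, T_i, B_i, H_i); see Section 4.2.2. */
struct gpu_task {
    double C;  /* per-block worst-case execution workload */
    double T;  /* period */
    int    B;  /* blocks per job */
    int    H;  /* block size (G-threads per block) */
};

/* Response-time bound R_k from (4.6), valid when Usum <= g*(m - Hmax + h), i.e., (4.1). */
double response_time_bound(const struct gpu_task *tau, int n, int k, int g, int m, int h) {
    double Usum = 0.0, Lmax = 0.0, sumBC = 0.0, Hmax = 0.0;
    for (int i = 0; i < n; i++) {
        double L = tau[i].C / tau[i].H;           /* block length L_i = C_i / H_i */
        Usum  += tau[i].C * tau[i].B / tau[i].T;  /* u_i = C_i * B_i / T_i        */
        sumBC += tau[i].B * tau[i].C;
        if (L > Lmax) Lmax = L;
        if (tau[i].H > Hmax) Hmax = tau[i].H;
    }
    double capacity = (double)g * (m - Hmax + h);
    if (Usum > capacity)
        return -1.0;  /* condition (4.1) violated: Theorem 4.3 gives no bound */
    double Lk = tau[k].C / tau[k].H;
    return (Lmax * (g * (double)m - Hmax) + sumBC - tau[k].C) / capacity + Lk;
}

int main(void) {
    struct gpu_task tau[] = { {3072, 5, 2, 1024}, {512, 8, 6, 512} };  /* Figure 4.4 */
    for (int k = 0; k < 2; k++)
        printf("R_%d = %.3f\n", k + 1, response_time_bound(tau, 2, k, 2, 2048, 512));
    return 0;
}
```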

Discussion. As noted in Section 4.2.2, the absence of shared-memory-induced blocking is assumed in the above analysis. This limitation could be eased by introducing blocking terms, but we leave this for future work. Alternatively, through offline profiling, one could restrict the per-SM G-thread count of m to some value m0 such that, if only m0 G-threads are used per SM, no shared-memory-induced blocking ever occurs.

The analysis above could then be applied with m replaced by m0. While one might expect this analysis to be sustainable in the sense that m per-SM G-threads could really be used at runtime, we found a counterexample

(Example 4.4) where increasing m0 to m causes response times to increase. Thus, the restricted G-thread count of m0 would actually have to be enforced. This could potentially be done by creating a never-ending kernel per SM that monopolizes m − m0 G-threads.

Example 4.4. A counterexample that shows this analysis is not sustainable with respect to increasing m is illustrated in Figures 4.7 and 4.8, where τ1 = (1216, 8, 1, 608), τ2 = (1920, 8, 6, 480), and τ3 = (1536, 8, 1, 768).

The response time of τ3 increases from 2 to 5 as m increases from 2016 to 2048. ♦

Early releasing (see Sections 4.1 and 6.1.2) must be restricted for GPU tasks. Because the FIFO EE queue effectively prioritizes work by actual enqueueing times, uncontrolled early releasing can change priorities.

Also, this can lead to a violation of the sporadic task model if consecutive jobs of the same task τi have

6Other jobs’ blocks may or may not execute for their worst case.

[Figure 4.7: A possible schedule for Example 4.4 with m = 2016 (blocks of τ1, τ2, and τ3 on SM 0 and SM 1, each SM with 2016 G-threads).]

[Figure 4.8: A possible schedule for Example 4.4 with m = 2048 (blocks of τ1, τ2, and τ3 on SM 0 and SM 1, each SM with 2048 G-threads).]

enqueueing times less than Ti time units apart. Thus, the early releasing of GPU tasks must be guarded to ensure that this minimum separation is maintained.

4.3 Chapter Summary

In this chapter, we presented a task model for GPU tasks. Using this model, we derived the first-ever response-time bounds under the NVIDIA GPU scheduling rules. We explained how GPU-using workloads can be modeled as DAG tasks and how prior work on DAG scheduling can be leveraged to compute end-to-end response-time bounds for these DAG tasks.

In Chapter 5, we will transition from theory to practice by first exploring pipelined and parallel execution for improving the throughput of CNN applications. In Chapter 6, we will present a case study showing that both such throughput improvements and bounded response times can be achieved for the OpenVX standard, and evaluate our analytical end-to-end response-time bounds in comparison with prior work.

Acknowledgement. The description of the DAG model in Section 4.1 draws heavily from (Yang et al.,

2016a). I defined the task model, constructed the total utilization constraints, and developed the initial response-time bound analysis, which Kecheng Yang then reworked into the current version in Section 4.2.4.

CHAPTER 5: IMPROVE CNN FRAMEWORKS FOR AUTONOMOUS-DRIVING APPLICATIONS1

In this chapter, we report on the results of an industrial automotive design study focusing on the application of convolutional neural networks (CNNs) in computer vision for autonomous or automated- driving cars.

In Chapter 2, we reviewed that CNNs are essential building blocks for computer-vision applications that process camera images in diverse uses including object detection and recognition (cars, people, bicycles, signs), scene segmentation (finding roads, sidewalks, buildings), localization (generating and following a map), etc. Computer-vision applications based on CNN technology are essential because cameras deployed on cars are much less costly than other sensors such as lidar.

Computer vision in a resource-constrained setting. While cameras are cheap, computer-vision computational costs are not. Keeping overall monetary cost low is essential for consumer-market cars and requires effective management of computing hardware. Because state-of-the-art computer-vision approaches have been shown to take over 94% of the computational capacity in a simulated vehicle (Lin et al., 2018), computer hardware costs can be significantly reduced by optimized implementations. Existing CNN frameworks include some parallelism-oriented optimizations, but they largely focus on processing a single stream of images, exploiting parallelism only when a CNN model itself contains parallel layers. The solutions considered in this study re-think the way CNN software is implemented so that effective resource management can be applied to

CNN applications. Our study focuses on one of the most demanding computer-vision applications in cars, the timely detection/recognition of objects in the surrounding environment.

Industrial challenge. The major challenge we seek to address is to resolve the inherent tensions among requirements for throughput, response time, and accuracy, which are difficult to satisfy simultaneously.

1Contents of this chapter previously appeared in the following paper: Yang, M., Wang, S., Bakita, J., Vu, T., Smith, F. D., Anderson, J. H., and Frahm, J.-M. (2019). Re-thinking CNN frameworks for time-sensitive autonomous-driving applications: Addressing an industrial challenge. In Proceedings of the 25th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 305–317.

The sheer number and diversity of cameras necessary to provide adequate coverage and redundancy (for safety) drive throughput requirements. A typical configuration in today's experimental automated-driving vehicle includes ten or more cameras that cover different fields of view, orientations, and near/far ranges.

Each camera generates a stream of images at rates ranging from 10 to 40 frames per second (FPS) depending on its function (e.g., lower rates for side-facing cameras, higher rates for forward-facing ones). All streams must be processed simultaneously by the object detection/recognition application.

For this application, response time (latency from image capture to recognition completion) is time-sensitive because of its position on the critical path to control and actuation of the vehicle. Every millisecond that elapses during object recognition is a millisecond not available for effecting safe operation. Ideally, the camera-to-recognition latency per frame should not substantially exceed the inter-frame time of the input images (e.g., 25 milliseconds for a 40 FPS camera).

Accuracy requirements for detection/recognition depend on a number of factors specific to the functions supported by a given camera. For example, accuracy may be much more important in streams from front-facing cameras than in those from side-facing cameras. For most of the design alternatives we consider, our modifications have no effect on the accuracy of the computer-vision CNN. However, we do explore one particular instance of a throughput vs. accuracy trade-off of relevance in systems with many cameras (see

Section 5.4).

To summarize, our challenge is to re-think the design of CNN software to better utilize hardware resources and achieve increased throughput (number of simultaneous camera streams) without any appreciable increase in per-frame latency (camera to CNN output) or reduction of per-stream accuracy.

Our hardware and software environment. In this study, we use the NVIDIA DRIVE PX Parker AutoChauffeur (previously called the 'DRIVE PX2' or just 'PX2') (NVIDIA, 2016a). This embedded computing platform is marketed by NVIDIA as the "first artificial-intelligence (AI) supercomputer for autonomous and semi-autonomous vehicles." The PX2 incorporates multicore processors along with multiple GPUs to accelerate computer-vision applications. With respect to such applications, we focus on a variant of the open-source CNN framework 'Darknet' configured to implement an object detection/recognition program called 'Tiny YOLO' (version 2, 'YOLO' for short) (Redmon and Farhadi, 2017; Redmon, 2016, 2017). While our study targets specific hardware and CNN software, our hardware and software choices are representative

121 of the multi-core/multi-GPU hardware and CNN designs that are used and will continue to be used for automated driving.

Our design study. As stated earlier, the fundamental challenge we consider is to design an object-detection application that can support independent streams of images from several cameras. Due to the constraining nature of our hardware, effective utilization of every part of the system is essential. Our preliminary measurements (provided in Section 5.4) revealed that GPU execution time dominates the time spent processing each image in YOLO, yet one of the PX2’s two GPUs2 was left completely idle and the second was utilized at less than 40%. This result motivated us to explore system-level optimizations of YOLO as a path to higher throughputs.

Many CNN implementations similarly seem to have limited throughput and poor GPU utilization. As explained further in Section 5.2, the root of the problem lies in viewing processing as a single run-through of all the processing layers comprising a CNN. However, in processing an image sequence from one camera, each layer's processing of image i is independent of its processing of image i + 1. If the intermediate per-image data created at each layer is buffered and sequenced, there is no reason why successive images from the same camera cannot be processed in different layers concurrently. This motivates us to consider the classic pipelining solution to sharing resources for increased throughput.3 While pipelining offers the potential for greater parallelism, concurrent executions of the same layer on different images are still precluded. While allowing this may be of limited utility with one camera, it has the potential of further increasing throughput substantially when processing streams of images from multiple cameras (all of which are also independent).

This motivates us to extend pipelining by enabling each layer to execute in parallel on multiple images.

We altered the design of Darknet to enable pipelined execution, with parallel layer execution as an option, in the YOLO CNN. We also implemented a technique whereby different camera image streams are combined into a single image stream by combining separate per-camera images into a single composite image. This compositing technique can greatly increase throughput at the expense of some accuracy loss.

We also enabled the balancing of GPU work across the PX2’s two GPUs, which have differing processing capabilities. The YOLO variants arising from these options and various other implementation details are discussed in Section 5.3. The options themselves are evaluated in Section 5.4.

2Technically, the PX2 provides four GPUs split between two systems on a chip (SoCs). We considered only one of these SoCs in our evaluation.
3For example, instruction pipelining in processors can fully utilize data-path elements and increase throughput.

Contribution. This chapter provides an effective solution for a challenging industry problem: designing

CNN systems that can provide the throughput, timeliness, and accuracy for computer-vision applications in automated and autonomous driving. Through re-thinking how the YOLO CNN is executed, we show that its throughput can be improved by more than twofold with only a minor latency increase and no change in accuracy. We also show that throughput can be further improved with proper GPU load balancing. Finally, we show that the compositing technique can enable very high throughputs of up to 250 FPS on the PX2.

While this technique does incur some accuracy loss, we show that such loss can be mitigated by redoing the

“training” CNNs require. We claim these results are sufficiently fundamental to guide CNN construction for similar automotive computer-vision applications on other frameworks or hardware.

Understanding these contributions in full requires relevant background information and related work, which we provide in the next two sections.

5.1 Background

In this section, we provide detailed descriptions of our hardware platform, the DRIVE PX2 (NVIDIA,

2016a), and the Darknet CNN framework. The object-detection application (“YOLO”) that runs using this framework has been reviewed in Section 2.2.

5.1.1 NVIDIA DRIVE PX2

The PX2 is embedded hardware specifically designed to provide high-performance computing resources for autonomous operation on a platform with small size, weight, and power characteristics.

Architecture. The PX2 contains two identical and independent “Parker” systems on a chip (SoCs), called

Tegra A and B, connected by a 1 Gbps Controller Area Network (CAN) bus. We show one of the identical halves of the PX2 in Figure 5.1. Each SoC contains six CPUs (four ARMv8 Cortex A57 cores, two ARMv8

Denver 2.0 cores), and an integrated Pascal GPU (iGPU). Each Parker SoC also connects to a discrete Pascal

GPU (dGPU) over its PCIe x4 bus. While the iGPU and CPU cores share main DRAM memory, the dGPU has its own DRAM memory. Memory transfers between CPU memory and dGPU memory use the PCIe bus. The remainder of this chapter, including the design descriptions and evaluation experiments, is based on using only one of the PX2’s two independent systems.

[Figure 5.1: NVIDIA DRIVE PX2 architecture (showing one of the PX2's two identical and independent systems): a Tegra (Parker) SoC with two Denver CPUs, four A57 CPUs, and an integrated Pascal GPU sharing 8 GB of LPDDR4 (128-bit UMA) memory over an internal bus, connected via PCIe x4 to a discrete Pascal GPU with 4 GB of GDDR5 memory.]

                                    Integrated GPU (iGPU)   Discrete GPU (dGPU)
Computing units                     256 CUDA cores          1,152 CUDA cores
Accessible memory                   6,668 MBytes            3,840 MBytes
Memory bandwidth (Sundaram, 2016)   ≈50 GBytes/s            ≈80 GBytes/s
Shared L2 cache size                512 KBytes              1,024 KBytes

Table 5.1: Specifications of the PX2 iGPU and dGPU.

iGPU vs. dGPU. As previously mentioned, the PX2 has two types of GPUs—the iGPU and the dGPU—using the same Pascal architecture. While they share an architecture, they share little else. Consider their specifications as shown in Table 5.1. The dGPU has more cores, faster memory, and a larger L2 cache. The iGPU lags behind, advantaged only by its access to the larger, albeit shared, main memory DRAM banks.

The exact performance differences between the iGPU and the dGPU depend on the characteristics of

GPU programs used in a specific application. We characterize these differences for YOLO in Section 5.4.

5.1.2 YOLO and Darknet

For most of the experiments in Section 5.4, we used the YOLO CNN as already trained on a widely used set of relevant training images (VOC 2007+12 (Everingham et al., 2010, 2015)), so we did not modify the

YOLO network architecture or training approach. This means that most of our work applies to any CNN constructed using Darknet, from YOLOv3 to AlexNet. This is extremely important as noted in (Lin et al.,

2018) because the state of the art rapidly changes for autonomous vehicles. Only one optional enhancement we consider, namely image compositing, requires retraining.

The Darknet framework is programmed in C and CUDA. All of our modifications and benchmarks are based on a proprietary fork of Darknet, so frame rates in Section 5.4 are not directly comparable to those from the open-source version.4 Unfortunately, we are unable to open source our fork due to intellectual property management issues.

4Specifically, our base version of Darknet pre-loads frames and performs frame resizing in the main thread whereas the open-source version uses a separate thread.

5.2 Related Work

In the area of parallel CNN scheduling, frameworks we reviewed in Section 2.2.3 such as TensorFlow (Abadi et al., 2016) already support intra-model CNN graphs. These graphs serve to enable "parallelism, distributed execution, compilation, and portability" in their applications (per their website). However, neither the original paper (Abadi et al., 2016) nor their survey of computer-vision CNNs (Huang et al., 2017) seems to consider anything besides parallelism inside one execution of a model that contains parallel layers. Their graphs seem mostly to serve to formalize the implicit dataflow graph present in any CNN. In contrast, our work deals with synthesizing an extra-model shared CNN by which to collapse multiple CNN instances into a single shared program. It is worth emphasizing that our efforts are designed to largely work alongside the vast body of existing CNN optimization and compression efforts such as DeepMon (Huynh et al., 2017).

One other work similar to ours, S3DNN, applied techniques such as kernel-level deadline-aware scheduling and batch processing (data fusion) that require additional middleware (Zhou et al., 2018). We focus on higher-level parallelism by applying a graph-based architecture.

Some of our work here also considers ways to optimally partition YOLO between GPUs of different capabilities. The partitioning concept is not new—it was used by Kang et al. (2017) to dynamically balance

CNN layers between a compute-impoverished system and a networked server in the Neurosurgeon system.

We, however, use the technique to optimize solely for performance without the networking constraints that dominate that work.

5.3 System Design

In this section, we discuss some of our ideas for how to implement this shared CNN. First, we describe our overarching approach, which involves generalizing the concept of a CNN layer by combining groups of consecutive layers into “stages,” examining opportunities for parallelism among stages (“pipelining”), and considering opportunities for parallelism inside each stage (“parallel-pipelining”). Second, our hardware platform provides the somewhat unique opportunity to examine GPU-work-assignment strategies across asymmetric GPUs, so we briefly consider that issue. Third, we propose a novel technique for realizing additional throughput gains in high-camera-count environments. Along the way, we also briefly touch on several other issues that naturally arise with the concept of a shared CNN. These include data handling,

125 communication issues, and implicit synchronization. Before delving too deeply into the details, however, we

first motivate why we are interested in this shared approach in the first place.

The obvious solution for supporting multiple cameras is simple: run multiple instances of the application—one process for each camera—and have them share the GPU(s), as shown in Figure 2.2. We reject this approach for two reasons. First, the high memory requirements of each network instance heavily limit the total number of instances (a maximum of six on our hardware platform). Second, running the models independently does not easily allow for fast synchronization across multiple cameras. Ideally, we would like each network and camera to process frames at the same rate, prioritizing work for networks that are lagging behind. However, this proves extremely challenging in practice as it requires modifying GPU scheduling behavior—a task still not satisfactorily solved by any research that we are aware of.

5.3.1 Enabling Parallelism

As previously noted, CNN layers and models are inherently independent and thus sharable—the only major complexities for us stem from Darknet’s limited “bookkeeping” side-effects and from the need to correctly pipe intermediate data between layers. In this section, we explore ways of enabling parallelism in such a shared computation structure.

Decomposition into stages. As a precursor to introducing parallelism, we decompose a shared CNN into a succession of computational stages. Each stage is simply a subsequence of layers from the YOLO CNN shown in Figure 2.2, as illustrated in Figure 5.2. To correctly manage both frame data and the bookkeeping data required by YOLO, we buffer indices to this data using queues between stages as shown in Figure 5.2.

Note that these queues are not necessarily index-ordered. Conceptually, we can view the data being indexed as being stored in infinite arrays, but in reality, these arrays are “wrapped” to form ring buffers. The size of the frame ring buffer is large enough to prevent any blocking when adding another frame to it. However, memory constraints limit the ring buffer for bookkeeping data to have only ten entries. This implies that only ten frames can exist concurrently in the shared CNN at any time. Note that Darknet ensures that the original layers execute correctly without further modification by us when provided with the appropriate bookkeeping data. Data buffering and synchronization in our pipelined implementation are provided by PGMRT, a library created as part of prior work at UNC (Elliott et al., 2014).

[Figure 5.2: Sharing one CNN for multiple cameras. Frames from C cameras (0 through C−1) enter the shared CNN, whose stages (Stage 0 through Stage N−1) are connected by Queue 0 through Queue N−1 and produce detection-box results; ring buffers hold the per-frame data and the per-frame bookkeeping data needed by each layer.]

Serialized execution. Before enabling parallelism, we first look at the scheduling of the default serialized execution of the unmodified YOLO through an example in Figure 5.3 (a). We will explain Figure 5.3 (b) and (c) shortly. Figure 5.3 (a) shows a single three-stage CNN that processes frames serially. At time t0, the first stage (Stage 0) starts the initial preparation work on the CPU for frame f . This initial preparation work

(for example, resizing the frame) requires Stage 0 to execute longer on the CPU than the following stages.

Stage 0 launches its kernel, indicated by S0, at time t1 into the single CUDA stream queue belonging to this

CNN instance. Stage 1 launches another kernel, S1, at time t1, and S2 is launched at time t2. These kernels

S0, S1, and S2 are executed in FIFO order and are completed at times t4, t5, and t6 respectively. Since the intermediate data produced by the first kernels S0 and S1 are not needed for CPU computations, only the last stage suspends to wait for all kernels to complete (at time t6). The use of intermediate data only by kernels running on the GPU is common to other CNN implementations. Because the processing of multiple frames is serialized by the single CNN, the GPU is often left idle when no kernel requests are in the stream queue.

Pipelined execution. We now explore different methods for enabling the parallel processing of stages. The first method we consider is to enable pipelined execution, i.e., we allow successive stages to execute in parallel on different frames. We implement pipelined execution by defining a thread per stage,5 which processes one frame by performing the computational steps illustrated in Figure 5.4 for a stage comprised of one convolution layer followed by one maxpool layer. Each stage's thread starts by waiting on the arrival of an index from the previous stage to signal that stage's completion. Once this index becomes available, the thread starts processing by calling the original CNN layer functions with the appropriate bookkeeping data. After these layer functions return, the thread notifies the next stage of completion by pushing the index it just finished into the next queue. It then returns to waiting for input data again. This approach requires no changes to the original YOLO layer functions.

5While Observation 3.4 indicates that processes with MP (MPS) are superior to threads (MT), MPS is not supported in the NVIDIA DRIVE products.

[Figure 5.3: Scheduling of SERIAL, PIPELINE, and PARALLEL. Panels: (a) Serialized Execution, (b) Pipelined Execution, and (c) Parallel-Pipeline Execution. Each panel shows, for frames f, f′, and f″, the CPU execution of Stages 0-2, the CUDA stream queue(s), kernel launches and completions, suspensions (waiting for kernels' completion), and GPU busy and idle times, including concurrent kernel execution on the GPU in (b) and (c).]

[Figure 5.4: Stage representation for the pipelined execution model. Stage n consists of Layer ℓ (convolution) followed by Layer ℓ+1 (maxpool); its steps are: (1) wait for data on Queue n; (2) execute Layer ℓ and then Layer ℓ+1; (3) write output to Queue n+1 and notify Stage n+1; (4) loop again from step 1.]
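As an illustrative sketch of a stage thread's loop (a simplified, self-contained structure of our own, not the PGMRT-based implementation), the steps in Figure 5.4 map to code roughly as follows; the queue type and the stand-in for the Darknet layer calls are hypothetical.

```c
/* Minimal sketch of per-stage pipeline threads connected by blocking index queues. */
#include <pthread.h>
#include <stdio.h>

#define QCAP 16

struct index_queue {                 /* tiny blocking FIFO of frame indices */
    int buf[QCAP], head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void q_init(struct index_queue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void q_push(struct index_queue *q, int idx) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP) pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = idx; q->tail = (q->tail + 1) % QCAP; q->count++;
    pthread_cond_signal(&q->not_empty);   /* notify the next stage */
    pthread_mutex_unlock(&q->lock);
}

static int q_pop(struct index_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0) pthread_cond_wait(&q->not_empty, &q->lock);
    int idx = q->buf[q->head]; q->head = (q->head + 1) % QCAP; q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return idx;
}

struct stage { int id; struct index_queue *in_q, *out_q; };

/* Stand-in for calling the unmodified layer functions on the frame and bookkeeping
 * data selected by idx. */
static void run_stage_layers(int stage_id, int idx) {
    printf("stage %d processed frame index %d\n", stage_id, idx);
}

static void *stage_thread(void *arg) {
    struct stage *s = (struct stage *)arg;
    for (;;) {
        int idx = q_pop(s->in_q);          /* 1. wait for data on Queue n        */
        run_stage_layers(s->id, idx);      /* 2. execute this stage's layers     */
        q_push(s->out_q, idx);             /* 3. write output, notify Stage n+1  */
    }                                      /* 4. loop again from step 1          */
    return NULL;
}

int main(void) {
    struct index_queue q0, q1, q2;         /* Queue 0 feeds Stage 0; Queue 2 holds results */
    q_init(&q0); q_init(&q1); q_init(&q2);
    struct stage s0 = {0, &q0, &q1}, s1 = {1, &q1, &q2};
    pthread_t t0, t1;
    pthread_create(&t0, NULL, stage_thread, &s0);
    pthread_create(&t1, NULL, stage_thread, &s1);
    for (int f = 0; f < 4; f++) q_push(&q0, f);   /* emulate arriving frame indices */
    for (int f = 0; f < 4; f++) q_pop(&q2);       /* drain final results */
    return 0;
}
```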

The resulting parallelism is illustrated in Figure 5.3 (b). For conciseness, we henceforth refer to this pipelined-execution variant of YOLO as simply “PIPELINE” and refer to the original unmodified YOLO as

“SERIAL.” In PIPELINE, multiple per-frame queues are created and pipeline stages can execute concurrently on CPU cores. This enables kernels from different frames to be queued and executed concurrently on the

GPU when enough of its resources are available. One example of the concurrent GPU execution of kernels is noted in Figure 5.3 (b). The threads for stages are scheduled by the Linux scheduler SCHED_NORMAL. Threads may migrate between CPU cores or block when cores are not available. The concurrent execution of threads on the CPUs and kernels on the GPU should enable PIPELINE to provide higher throughput or lower latency.

We evaluate this in Section 5.4. While the longer first stage causes some overall pipeline imbalance, we believe this is largely confined to the CPU. We further mitigate the influence of a pipeline imbalance by enabling thread parallelism within each stage as described next.

Parallel-pipeline execution. As seen in Figure 5.3, PIPELINE enables parallelism across stages but not within a stage. Given the independence properties previously mentioned, there is no reason why the same stage cannot be executed on different frames at the same time. To enable this, we consider a second parallel YOLO variant, called parallel-pipeline execution, or simply "PARALLEL" for brevity, that extends PIPELINE by allowing parallel executions of the same stage. The PARALLEL YOLO variant is realized by allocating a configurable number of "worker" threads in each stage, and a "manager" thread that is responsible for dispatching worker threads. The worker threads can process different frames concurrently. This is illustrated in Figure 5.5 (once again) for a stage comprised of one convolution layer and one maxpool layer.

[Figure 5.5: Stage representation for the parallel-pipeline execution model. Stage n consists of Layer ℓ (convolution) followed by Layer ℓ+1 (maxpool). The manager thread: (1) waits for data on Queue n; (2) launches a worker thread; (3) loops again from step 1. Each worker thread: (1) executes Layer ℓ and then Layer ℓ+1; (2) writes output to Queue n+1 and notifies Stage n+1; (3) exits.]

The difference between PIPELINE and PARALLEL is illustrated in Figure 5.3 (c). In PARALLEL, more parallelism is enabled on both CPU cores and the GPU by allowing multiple threads of the same stage to execute in parallel. Recalling the scheduling rules revealed in Section 3.1, kernels are scheduled in order of their arrival times at the heads of their stream queues. In Figure 5.3 (c), these arrival times are such that kernel executions from different frames are interleaved, as shown.

With the scheduling shown in Figure 5.3, we can see that PIPELINE and PARALLEL more fully utilize the GPU resources (with much less GPU idle time than SERIAL) while also better utilizing the multiple CPU cores. We quantify the performance in Section 5.4.

5.3.2 Multi-GPU Execution

Recall from Section 5.1 that the PX2 has an iGPU and a dGPU with different capabilities. As seen in Figure 5.11, which we cover in detail later, the iGPU tends to execute layers substantially slower than the dGPU. However, notable exceptions (maxpool layers and the region layer) do exist, so the iGPU is still a useful computational resource. Given this observation, we enhanced our PIPELINE and PARALLEL implementations to allow a configurable distribution of stages across GPUs. While utilizing both GPUs may be beneficial, additional data-transfer and synchronization overheads can cause dual-GPU configurations to sometimes under-perform single-GPU ones. We analyze this trade-off experimentally in Section 5.4.2.
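To sketch how a configurable stage-to-GPU distribution can be realized (our own illustration using standard CUDA calls, not the exact code of our implementation), each stage thread can select its assigned device before issuing any GPU work; the mapping table below is hypothetical.

```cuda
#include <cuda_runtime.h>

/* Hypothetical stage-to-GPU map: entry s gives the CUDA device each stage runs on
 * (e.g., 0 = iGPU, 1 = dGPU; the actual device numbering depends on the platform). */
static const int stage_device[] = {1, 1, 1, 0, 0};

/* Called by a stage thread before it issues any GPU work: all subsequent CUDA calls
 * made by this host thread target the selected device. */
void bind_stage_to_gpu(int stage_id) {
    cudaSetDevice(stage_device[stage_id]);
}
```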

[Figure 5.6: CNN extended for multi-camera composite images. Groups of physical cameras are combined into "virtual" cameras whose composite images are fed to the shared CNN alongside full-size images from the remaining physical cameras; all produce detection-box results.]

5.3.3 Accelerating Multiple Streams with Multi-Camera Composite Images

What if we could combine several physical cameras into a single "virtual" one by combining independent per-camera images into a single composite image, as illustrated in Figure 5.6? Such an approach could have a tremendous impact on throughput. For example, if all physical cameras were combined in groups of four, throughput could potentially be increased fourfold. Whether this is acceptable would depend on how latency and accuracy are impacted.

To evaluate this approach, we added flexible support for compositing physical cameras to the YOLO variants already described. Note that these variants themselves are not impacted by this approach: the fact that the images to be processed are obtained by compositing does not change the processing to be done. Because of this, it is possible to have both composite and full images supported in the same system, as illustrated in

Figure 5.6.

The compositing approach works by exploiting the flexibility of YOLO's neural network to process multiple frames as a single frame. This requires taking k² frames for some k, tiling them in a k × k grid, and then passing that combined image through YOLO as a single input, as shown in Figure 5.6. For best accuracy, this approach requires retraining the YOLO network on a set of suitably tiled images to produce new model weights.
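The tiling itself is straightforward; the following sketch (our own illustration with a hypothetical image struct, not the code used in the study) composites k² already-downscaled frames into one square grid image in row-major order, using a planar channel layout similar to how Darknet represents frames.

```c
#include <string.h>

/* A planar image: c channels, each h rows of w row-major float pixels. */
struct image {
    int w, h, c;
    float *data;
};

/* Copy k*k tiles (each tile_w x tile_h x c, already downscaled) into a composite of
 * size (k*tile_w) x (k*tile_h) x c; tiles[] is indexed in row-major grid order. */
void make_composite(const struct image *tiles, int k, struct image *out) {
    int tw = tiles[0].w, th = tiles[0].h, c = tiles[0].c;
    for (int t = 0; t < k * k; t++) {
        int gx = t % k, gy = t / k;                     /* tile's grid position */
        for (int ch = 0; ch < c; ch++) {
            for (int row = 0; row < th; row++) {
                const float *src = tiles[t].data + (ch * th + row) * tw;
                float *dst = out->data
                    + (ch * out->h + gy * th + row) * out->w + gx * tw;
                memcpy(dst, src, tw * sizeof(float));   /* copy one tile row */
            }
        }
    }
}
```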

While compositing should yield a throughput improvement that is linear in the number of images in the grid, there are several additional factors to consider. First, frame arrival times from different cameras must be synchronized or the combining process will block waiting for all the images required to fill the grid.

This could increase latency to the time between frames on the camera with the smallest frame rate. Second,

this approach will reduce accuracy. Because our configuration of YOLO only accepts frames of 416 by 416 pixels, each frame in the grid must be downscaled by a factor proportional to the grid size. For detecting objects that are large relative to the frame size (e.g., close vehicles), this compromise may be permissible; however, that will not be true in all cases.

Additionally, we must stress that this approach only works for CNNs that reason locally. Using this approach in global-classification systems such as those used in scene understanding will yield incorrect results. While the YOLO authors believe their network learned to use global features, we did not observe this behavior in our analysis.

5.3.4 Avoiding Implicit Synchronization

The original Darknet framework uses the single default FIFO queue in CUDA for kernel requests.

This works flawlessly for serialized execution by a single-threaded program, but falls apart when we want concurrent parallel-pipeline instances. Specifically, use of only the default queue causes destructive implicit synchronization on the GPU, which can significantly degrade performance. As suggested in Section 3.3, our approach mitigates that by using CUDA functions to create a unique FIFO queue for the kernel executions and memory transfers of each parallel-pipeline instance.
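Concretely (a sketch using standard CUDA runtime calls, not our exact modification to Darknet), each parallel-pipeline instance can be given its own non-default stream, and all of its kernel launches and asynchronous copies are issued into that stream so that they no longer synchronize implicitly with work in the default stream; the kernel below is a placeholder.

```cuda
#include <cuda_runtime.h>

/* Placeholder kernel standing in for a CNN layer's GPU work. */
__global__ void layer_kernel(float *buf, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) buf[idx] += 1.0f;
}

/* One FIFO queue (CUDA stream) per parallel-pipeline instance. */
cudaStream_t make_instance_stream(void) {
    cudaStream_t s;
    /* cudaStreamNonBlocking avoids synchronization with the legacy default stream. */
    cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
    return s;
}

void run_layer_on_stream(cudaStream_t s, float *d_buf, const float *h_in, int n) {
    cudaMemcpyAsync(d_buf, h_in, n * sizeof(float), cudaMemcpyHostToDevice, s);
    layer_kernel<<<(n + 255) / 256, 256, 0, s>>>(d_buf, n);  /* launch into this instance's stream */
    cudaStreamSynchronize(s);  /* wait only for this instance's work, not the whole GPU */
}
```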

5.4 Evaluation

Our evaluation experiments were designed to answer a number of questions related to our challenge:

How many cameras of differing frame rates (FPS) can be supported with the PX2 running a shared CNN?

Can acceptable latency be maintained for each camera supported? How can we use the configuration options in PIPELINE and PARALLEL to increase the number of cameras supported? Are multi-camera composite images a viable option for supporting significant numbers of cameras?

The evaluation of multi-camera composite images must explicitly consider the effects on object detection/recognition accuracy. In all other evaluations of PIPELINE and PARALLEL using single-frame images, we verified that accuracy remained unchanged (as expected, since the CNN layer processing programs were unchanged).

We next describe aspects of the experimental methodology common to all experiments and then proceed to answer our questions. We first evaluate the frame rates enabled by the three basic execution models on

132 single and multiple cameras. Then, we explore the effects of configurable options—stage granularity, the number of threads per stage, and the assignment of stages to GPUs. Finally, we explore the trade-off between throughput and accuracy with multi-camera composite images.

General experiment set-up. In each experiment, the CNN processes an aggregate of at least 10,000 frames.

Latency is measured as the time difference between the arrival time of a frame at the first CNN layer and the time when the object-detection result for that image is ready. Because the system runs Linux and all its services, worst-case latency outliers can occur. Occasionally dropping a small number of frames when their processing time exceeds a latency threshold is acceptable for our automated-driving platform. Therefore, reported latencies are 95th-percentile values of all measurements.

Unless otherwise stated, the default CNN configuration has one CNN layer per pipeline stage for both

PIPELINE and PARALLEL and ten threads per stage for PARALLEL. In these default configurations, only the

PX2’s dGPU is used. The choice of these default configurations is explained as part of our configuration option evaluation. All the experiments were conducted on the NVIDIA Drive PX 2 with Linux kernel

4.9.38-rt25-tegra with the PREEMPT_RT patch, NVIDIA Driver 390.00, and CUDA 9.0, V9.0.225. CPU threads were scheduled under SCHED_NORMAL.

5.4.1 How Many Cameras Can Be Supported?

To answer this question, we first consider processing one stream of images by the shared CNN. Cameras are characterized by their frame rates (FPS), and we emulate camera inputs to the CNN at rates common to cameras used in automated driving (10, 15, 30, and 40 FPS). We also consider streams with much higher frame rates (up to 100 FPS) to represent aggregate frame rates from multiple cameras sharing the CNN. Our emulated camera is a program running on a separate thread that generates input frames to the shared CNN at a defined inter-frame interval corresponding to a given FPS (e.g., 25 milliseconds for 40 FPS). Because latency is a critical metric, we present results as a plot of latency for different input camera rates and compare execution under SERIAL, PIPELINE, and PARALLEL.
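For reference, a camera emulator of this kind can be written as a periodic thread that releases a frame every inter-frame interval; the sketch below (our own illustration, with submit_frame() as a hypothetical, stubbed hook into the shared CNN's input queue, not the study's emulator) uses absolute wake-up times with clock_nanosleep() so that release times do not drift.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical hook that would hand a new frame index to the shared CNN's first queue;
 * stubbed out here. */
static void submit_frame(int camera_id) {
    printf("camera %d: frame released\n", camera_id);
}

/* Emulate one camera: release a frame every period_ns nanoseconds (e.g., 25,000,000 ns
 * for 40 FPS), using absolute wake-up times so releases do not drift. */
static void emulate_camera(int camera_id, long period_ns, int num_frames) {
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int f = 0; f < num_frames; f++) {
        submit_frame(camera_id);
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {        /* normalize the timespec */
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}

int main(void) {
    emulate_camera(0, 25000000L, 10);   /* a 40 FPS camera releasing 10 frames */
    return 0;
}
```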

PIPELINE supports 63 FPS, PARALLEL may support 71 FPS. Figure 5.7 shows the latency results. Our main criterion for supporting a frame rate is that latency does not become unbounded at that rate. Using this criterion, SERIAL (essentially the original YOLO) can support rates up to 28 FPS and PIPELINE can support up to 63 FPS. PARALLEL can also support 63 FPS with negligible latency reduction and up to 71 FPS

before latency becomes unbounded. Note that the latency at rates lower than 63 FPS is essentially constant indicating that 36 milliseconds is the minimum latency for YOLO (as seen on the y-axis of Figure 5.7).

Figure 5.7: Latency of SERIAL, PIPELINE, and PARALLEL YOLO.

Latency becomes unbounded in any of these scenarios when either the GPU becomes overloaded or the CNN execution model cannot further utilize the available GPU capacity. The latency increase of PARALLEL at 71

FPS over PIPELINE at 63 FPS is 20 milliseconds. This may be acceptable in scenarios such as maneuvering in a parking lot. However, running multiple SERIAL instances (at most six due to memory constraints) can support at most 53 FPS with a latency of 114 milliseconds before latency becomes unbounded.

We attribute these results to the inter-stage GPU dispatch parallelism of PIPELINE and the additional intra-stage GPU dispatch parallelism of PARALLEL. These simultaneous dispatches allow us to more fully utilize the GPU and achieve a higher frame rate. Although some dispatch parallelism can be achieved by running multiple SERIAL instances, GPU resource usage is still limited because kernels from different process contexts cannot run concurrently in CUDA. As we mentioned above, concurrent kernel execution from different processes can be enabled with NVIDIA's MPS (Multi-Process Service); however, NVIDIA has not implemented this on the PX2.

These results are perhaps most useful as ways to estimate how many cameras with differing rates can share the CNN. As long as the aggregate rate from all cameras is less than the supported total rate, latency should not be compromised. For example, the results show that a single, shared SERIAL YOLO could not support a single 30 FPS camera on the PX2 hardware while PIPELINE and PARALLEL should easily support two 30 FPS cameras. Similarly, PIPELINE can support up to four 15 FPS cameras, or two at 15 FPS and one at 30 FPS, etc.
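The capacity check implied by these estimates is simple; the following tiny sketch makes it explicit (63 FPS is PIPELINE's measured dGPU-only limit from above; the camera set is just an example).

```cuda
#include <numeric>
#include <vector>

// A camera set is supportable if its aggregate frame rate does not exceed
// the rate supported by the chosen execution model.
bool supportable(const std::vector<double>& camera_fps, double supported_fps) {
  double aggregate =
      std::accumulate(camera_fps.begin(), camera_fps.end(), 0.0);
  return aggregate <= supported_fps;
}

int main() {
  // Two 15 FPS cameras plus one 30 FPS camera against PIPELINE's 63 FPS:
  // aggregate 60 FPS <= 63 FPS, so this combination should be supportable.
  return supportable({15.0, 15.0, 30.0}, 63.0) ? 0 : 1;
}
```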


Figure 5.8: PARALLEL YOLO latency with different numbers of cameras.

Note that all these results are from using only one of the two independent computing systems on the PX2 board. Using both systems to run two independent CNN configurations should double the number of cameras supported by one PX2.

Multiple cameras are supported for aggregate frame rates. We verified that estimations based on the

FPS of a single image stream are valid for multiple cameras. This experiment investigated whether the additional competition from multiple input streams to the shared CNN changed latencies. The results are shown in Figure 5.8 for one, two, and four cameras generating input to PARALLEL YOLO. The supported total camera frame rate is determined by the hardware capacity and the CNN framework execution model, so it remains the same whether the frames come from one camera or several. That is, this data corroborates the estimate given above that supporting one 60 FPS camera is the equivalent of supporting two 30 FPS cameras, four 15 FPS cameras, etc. The increased latencies for two and four cameras seen on the y-axis in Figure 5.8 are an artifact of the experiment. The input images are processed in sequence, so a frame incurs a longer measured latency when it is processed after frames that arrived at the same time from other cameras.

These in-phase frame arrivals can be avoided in a real system by setting a different phasing for each camera

(a topic for future work).

5.4.2 Can Our CNN Implementations Be Better Configured?

PIPELINE can be configured with multiple CNN layers aggregated into the pipeline stages, i.e., configuring the granularity of each stage. PARALLEL can be configured in the same way, but also adds the option to adjust the number of threads assigned per stage, i.e., configure the degree of parallelism within each stage. In

addition, sequences of stages can be assigned to either the iGPU or the dGPU on the PX2 to avoid wasting any available GPU capacity. We evaluate these configuration options next.

Figure 5.9: PARALLEL YOLO latency with different granularities (number of layers per stage).

Granularity of stages improves latency under heavy loads for PARALLEL. In Figure 5.9, we compare the latency results at frame rates from a single camera stream for PARALLEL with different stage granularities.

We did not observe any impact of granularities on PIPELINE, so those results are not shown here.

Note that pipelining is effectively disabled by configuring a single stage containing all 16 layers. This approximates multiple instances of SERIAL running in one process context, a configuration not supported in the original framework.

As shown in Figure 5.9, the latencies for different granularities are very close at frame rates up to 63 FPS (the maximum load before latencies increase significantly, i.e., the maximum steady-state load). While all the granularities can support a heavier processing load up to 71 FPS with an increase in latency, the

finer-granularity stages have smaller latencies in this region. Because each stage is implemented by one manager thread and multiple worker threads, more stages with finer granularity enable more parallelism among stages on the CPU. Our design for data communication between stages introduces small overheads, which is reflected in the very small changes in the performance of different granularities when the frame rate is low. A granularity of one layer per stage is a good configuration choice.
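As a rough illustration of this stage structure (a simplified, thread-based sketch, not the framework's actual code), each stage can be viewed as a frame queue serviced by a configurable pool of worker threads; the manager, or the preceding stage, hands frames to the queue.

```cuda
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Simplified sketch of one PARALLEL stage: worker threads dequeue frames,
// so several frames can be inside the same stage at once.
struct Stage {
  std::queue<int> frames;              // frame IDs waiting at this stage
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  std::vector<std::thread> workers;

  Stage(int num_workers, std::function<void(int)> process_frame) {
    for (int w = 0; w < num_workers; ++w) {
      workers.emplace_back([this, process_frame] {
        for (;;) {
          std::unique_lock<std::mutex> lk(m);
          cv.wait(lk, [this] { return done || !frames.empty(); });
          if (frames.empty()) return;          // shutting down and drained
          int frame = frames.front(); frames.pop();
          lk.unlock();
          process_frame(frame);  // e.g., launch this stage's layer kernels
        }
      });
    }
  }

  // Called by the manager (or the previous stage) to hand over a frame.
  void enqueue(int frame) {
    { std::lock_guard<std::mutex> lk(m); frames.push(frame); }
    cv.notify_one();
  }

  void shutdown() {
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv.notify_all();
    for (auto& t : workers) t.join();
  }
};

int main() {
  Stage conv0(10, [](int f) { (void)f; /* process frame f on the GPU */ });
  for (int f = 0; f < 5; ++f) conv0.enqueue(f);
  conv0.shutdown();
}
```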

More threads for PARALLEL improves latency under heavy loads. In Figure 5.10, we compare the latency results at frame rates from a single camera stream for PARALLEL with different numbers of threads.

More parallelism is enabled with more threads per stage, so more frames can be processed in parallel by a stage. Having only one thread per stage is essentially the same as pipelined execution, so latency becomes

unbounded at 63 FPS as shown previously. Between 63 FPS and 71 FPS the latencies increase substantially but higher numbers of threads have somewhat lower latencies and a slower rate of latency increase with load. A larger number of threads, 10 or so, is a good configuration choice.

Figure 5.10: PARALLEL YOLO latency with different numbers of threads.

Overhead analysis and performance bottleneck. Enabling PIPELINE and PARALLEL certainly causes additional overheads. We evaluate the computational and spatial overheads with different configurations by comparing the CPU and main memory usage between SERIAL and PIPELINE or PARALLEL. In each experiment, process or thread instances share the six active CPU cores on the PX2. The maximum CPU capacity is 600%. As shown in Table 5.2, a single instance of SERIAL uses 92% of the CPU capacity, essentially one core’s worth. Six instances of SERIAL use 536%, most of the available CPU resources.

PIPELINE with one thread per stage uses 219% of the CPU capacity and 1,132 MB of memory. Compared to one SERIAL instance, these figures represent an increase of 127% in CPU capacity and 358 MB with respect to memory. These increases reflect added overheads for synchronization, scheduling, and data buffering.

Moving to a PARALLEL implementation with ten threads per stage causes only a slight increase in CPU usage, from 219% to 239%, and a negligible increase in memory usage. We have shown in Section 5.4.1 that a significant increase in the supported frame rate is achieved with the PARALLEL and PIPELINE configurations compared to both one and six SERIAL instances. Under SERIAL, the GPU is around 60% utilized, while under PARALLEL and PIPELINE, it is nearly 100% utilized, as shown by CUDA profiling results similar to those in Figure 5.3. This demonstrates that using more CPU resources in order to fully utilize the scarce resources of the GPU is a reasonable strategy.

                           CPUs (%)   Memory (MB)
SERIAL                     92         774
SERIAL x6                  536        4,644
PIPELINE (single thread)   219        1,132
PARALLEL (10 threads)      239        1,136

Table 5.2: CPU and memory usage.


Figure 5.11: Per-layer execution times.

Multi-GPU execution improves latency under heavy loads. We evaluate the latency performance of different GPU assignment strategies next to understand the impact of assigning some layers to the iGPU.

Layer performance on both GPUs. We first measured the performance of each layer on both GPUs. We executed SERIAL using each GPU individually and measured per-layer response times and per-layer CPU computation times. Results are shown in Figure 5.11, which consists of two parts that use a common x-axis; the top part depicts per-layer dGPU-iGPU performance ratios (obtained by dividing a layer’s iGPU response time by its dGPU response time); the bottom part depicts the per-layer data just mentioned. As seen, convolution layers typically take longer on the iGPU, some two to three times longer. Convolution layers contain heavy matrix computations that are completed much faster with more CUDA cores. The maxpool and region layers have lower performance ratios because they have lighter computation loads. These layers are good choices to be moved to the iGPU.

GPU assignment strategy. To introduce as little latency degradation as possible while using both GPUs, our strategy is to move layers with low dGPU-iGPU performance ratios to the iGPU. Given the performance ratios at the top of Figure 5.11, we evaluate the GPU assignment strategies shown in Table 5.3. Each strategy

Strategy ID   dGPU                       iGPU
S0            All                        ∅
S1            {0,1,...,13}               {14,15}
S2            {0,1,...,4,12,13,14,15}    {5,6,...,11}
S3            {0,1,...,4,12,13}          {5,6,...,11,14,15}
S4            {0,1,...,7}                {8,9,...,15}

Table 5.3: GPU assignment strategies.

is characterized by the sequence of layer numbers assigned to each GPU. For example, strategy S1 assigns layers 0 to 13 to the dGPU and layers 14 and 15 to the iGPU.

Figure 5.12: PARALLEL YOLO latency with different GPU assignments.
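A small illustrative sketch of this ratio-based assignment rule follows; the per-layer times, the threshold, and the device numbering (0 = dGPU, 1 = iGPU) are placeholders rather than measured values. In the actual framework, each stage would select its assigned GPU (e.g., via cudaSetDevice) before creating its stream and launching its kernels.

```cuda
#include <cstdio>
#include <vector>

// Layers whose dGPU-iGPU performance ratio (iGPU time / dGPU time) falls
// below a chosen threshold are candidates to move to the iGPU.
std::vector<int> assign_layers(const std::vector<double>& dgpu_ms,
                               const std::vector<double>& igpu_ms,
                               double max_ratio) {
  std::vector<int> device_of_layer(dgpu_ms.size(), 0);  // 0 = dGPU (default)
  for (size_t l = 0; l < dgpu_ms.size(); ++l) {
    if (igpu_ms[l] / dgpu_ms[l] <= max_ratio)
      device_of_layer[l] = 1;                           // 1 = iGPU
  }
  return device_of_layer;
}

int main() {
  // Placeholder per-layer response times (ms); real values would come from
  // profiling SERIAL on each GPU individually, as in Figure 5.11.
  std::vector<double> dgpu = {2.0, 0.5, 3.0, 0.5};
  std::vector<double> igpu = {5.0, 0.6, 8.0, 0.7};
  std::vector<int> dev = assign_layers(dgpu, igpu, 1.5);
  for (size_t l = 0; l < dev.size(); ++l)
    std::printf("layer %zu -> %s\n", l, dev[l] ? "iGPU" : "dGPU");
}
```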

Strategy S1 substantially reduces latency for PARALLEL under heavy loads. Figure 5.12 shows that, for

PARALLEL, moving the last two CNN layers to the iGPU under strategy S1 improves latency. S1 provides support for a frame rate of 71 FPS with latency only slightly greater than that at 63 FPS when using only the dGPU. At a frame rate of 80 FPS, the latency is approximately the same as that at 71 FPS using only the dGPU. All other strategies considered either increase latencies or have negligible effect.

Strategies S1 and S3 provide some latency reduction for PIPELINE under heavy loads. In Figure 5.13,

S1 provides support for a frame rate of 71 FPS in PIPELINE with latency only slightly greater than that at 63 FPS using only the dGPU. S3 for PIPELINE supports a frame rate of 71 FPS with reduced latency comparable to that of PARALLEL using only the dGPU. All other strategies considered either increase latencies or have negligible effect.

5.4.3 Are Multi-Camera Composite Images Effective?

Any combination of the features and execution models above fails to provide enough throughput at desired latencies for a vehicle equipped with more than four cameras at a frame rate of 30 FPS using both

systems on the PX2 board. We propose to tackle this by leveraging the technique of compositing frames from multiple cameras into one image for processing. Although compositing might introduce a small loss of accuracy because the image resolution is reduced, this can be acceptable for low-criticality cameras in certain circumstances. We devote this section to evaluating the effectiveness of this technique.

Figure 5.13: PIPELINE YOLO latency with different GPU assignments.

With a multi-camera composite, the throughput is multiplied by the number of images in the composite image. However, the frame concatenation introduces overhead. Also, because the frames are down-sampled as explained in Section 5.3.3, the object detection accuracy could be reduced. We evaluate the throughput, latency, and accuracy of the multi-camera composite image technique.

Experimental set-up. We focus on comparing YOLO's performance on full-size images and 2x2 composite images. The network is evaluated on a standard object-detection dataset called PASCAL VOC (Everingham et al., 2015), using two types of tests: one of full-size images and one of composite images. For the full-size test, the original VOC test data is used. For the composite test, we generate test images by randomly choosing four full-size VOC test images and combining them into a composite image with a 2x2 grid. Each VOC test image is used exactly once as a component of a composite image; hence, the composite test set is one-fourth the size of the full-size one. This ensures that the total numbers of objects to be detected are the same in both tests. At runtime, full-size and composite images are both resized by YOLO to 416x416 during processing.
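A minimal sketch of the 2x2 frame concatenation follows (illustrative only; the pixel format and the point at which down-sampling occurs are assumptions, and error handling is omitted). The composite itself is built at full resolution; YOLO's own resize to 416x416 is what reduces each camera's effective resolution.

```cuda
#include <algorithm>
#include <cstdint>
#include <vector>

// Builds a 2x2 composite from four equally sized RGB frames
// (row-major, 3 bytes per pixel).
std::vector<uint8_t> make_composite_2x2(const std::vector<uint8_t> frames[4],
                                        int w, int h) {
  const int ch = 3;
  std::vector<uint8_t> composite(static_cast<size_t>(2 * w) * (2 * h) * ch);
  for (int q = 0; q < 4; ++q) {                        // quadrant index
    int x_off = (q % 2) * w, y_off = (q / 2) * h;
    for (int y = 0; y < h; ++y) {
      const uint8_t* src = frames[q].data() + static_cast<size_t>(y) * w * ch;
      uint8_t* dst = composite.data() +
          (static_cast<size_t>(y + y_off) * (2 * w) + x_off) * ch;
      std::copy(src, src + static_cast<size_t>(w) * ch, dst);
    }
  }
  return composite;
}

int main() {
  const int w = 4, h = 4;
  std::vector<uint8_t> frames[4];
  for (auto& f : frames) f.assign(static_cast<size_t>(w) * h * 3, 128);
  std::vector<uint8_t> comp = make_composite_2x2(frames, w, h);
  return comp.size() == static_cast<size_t>(4) * w * h * 3 ? 0 : 1;
}
```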

Evaluation of detection results. Evaluation of the network’s output is done using mean average precision

(mAP)—a metric commonly used to evaluate object-detection algorithms (Everingham et al., 2010). In loose terms, mAP characterizes a model’s ability to have high recall and high precision. In other words, higher

mAP means more objects are detected and the detections and localizations are more accurate. A detailed explanation of mAP has been provided in Section 2.2.2.

mAP               Full-size Test   Composite Test
YOLO              55.63            36.46
Composite YOLO    57.64            48.29

Table 5.4: mAPs for full-size and composite tests with PASCAL VOC dataset.

mAP               Full-size Test   Composite Test
YOLO              63.66            44.91
Composite YOLO    66.20            56.21

Table 5.5: mAPs of object classes relevant to autonomous driving.6

Similar to other state-of-the-art detection networks, YOLO is designed to be scale-invariant, meaning it can detect objects at different scales and resolutions. However, when image resolutions decrease significantly

(in our case, by a factor of four), the network’s performance can start to degrade. As seen in Table 5.4, with a

2x2 grid setup, YOLO trained on the original VOC data can only perform at 36.46% mAP when inferring composite images, a drop of almost 20 percentage points.

To mitigate this, we retrained the network on the VOC dataset using the same number of training images for both full-size and composite data. We refer to this retrained network as Composite YOLO, and the resulting mAPs are shown in Table 5.4. Moreover, since our primary applications are to autonomous driving, we also show in Table 5.5 the detection results for relevant object classes.6 Overall, our results consistently indicate that with YOLO retrained on both composite and full-size data, the detection accuracy of composite images is improved without reducing that of full-size images. With the retrained composite network, we can provide a trade-off between a 10% accuracy drop and a fourfold increase of throughput.

Although the mAP decreases with composite images, it is important to remember the following. First, the focus of our work is on providing the option to trade-off between higher throughput and acceptably lower accuracy, and not on the magnitude of the accuracy itself. Second, YOLO represents a state-of-the-art detection network, but we are tackling a very challenging computer-vision problem with strictly limited resources. Third, real-life autonomous driving systems usually deploy tracking algorithms, which can use information of multiple frames to improve detection accuracy. Finally, as shown in (Redmon and Farhadi,

6Among 20 object classes, we deem the following eight to be irrelevant to the task of autonomous driving: airplane, boat, bottle, chair, dining table, potted plant, sofa, and TV monitor.


Figure 5.14: PARALLEL YOLO latency with multi-camera-image processing and different numbers of cameras.

2017), YOLO tends to perform better for bigger objects than for smaller ones. Thus, we expect the mAPs to be higher for nearby objects, which are usually prioritized in the case of automated driving.

A four-camera composite image supports 250 FPS (!) and a latency of 48 milliseconds. This is shown in Figure 5.14. Examining the y-axis of Figure 5.14, observe that for lower respective frame rates, latency is higher in the four-camera scenario than in the one-camera scenario (about 45 vs. 36 milliseconds). This is due to the additional overhead incurred to combine multiple frames.

5.5 Chapter Summary

In this chapter, we presented the results of an industrial design study that addresses the challenge of re-designing CNN software to support multiple cameras with better throughput for automated-driving systems.

We showed how to transform a traditional CNN framework to obtain pipelined and parallel-pipelined variants to better utilize hardware resources. We also examined the impact of properly load balancing GPU work across the GPUs of a multi-GPU hardware platform. These basic techniques resulted in an improvement in throughput of over twofold with only minor latency increases and no accuracy loss. In order to accommodate more cameras than these basic techniques allow, we proposed the technique of compositing multiple image streams from different cameras into a single stream. We showed that this technique can greatly boost throughput (up to 250 FPS) at the expense of some accuracy loss. We demonstrated that this accuracy loss can be partially mitigated through network retraining.

In the next chapter, we will apply more fine-grained graph scheduling to a computer-vision standard

OpenVX and apply our schedulability analysis provided in Chapter 4 to derive end-to-end response-time bounds for GPU-using applications.

Acknowledgement. Most of the study described in this chapter was conducted during my internship with

Shige Wang at General Motors. Thanh Vu conducted the retraining of the YOLO model with assistance from

Cheng-Yang Fu and Akash Bapat.

CHAPTER 6: CASE STUDY1

In the preceding chapters, we developed theoretical analysis to derive end-to-end response-time bounds of GPU-using applications (in Chapter 4) and proposed practical techniques for improving throughput performance of a CNN-based computer-vision workload (in Chapter 5). In this chapter, we apply our analysis to a more fine-grained graph scheduling of a computer-vision standard, OpenVX (Khronos Group, 2020), and evaluate both analytical and empirical real-time performance of a real-world application.

To facilitate the development of computer-vision techniques, the Khronos Group has put forth a ratified standard called OpenVX. Although initially released only six years ago, OpenVX has quickly emerged as the computer-vision API of choice for real-time embedded systems, which are the standard’s intended focus.

Under OpenVX, computer-vision computations are represented as directed graphs, in which graph nodes represent high-level computer-vision functions, and graph edges represent precedence and data dependencies across functions. OpenVX can be applied across a diversity of hardware platforms. In our work, we consider its use on platforms where graphics processing units (GPUs) are used to accelerate computer-vision processing.

Unfortunately, OpenVX's alleged real-time focus reveals a disconnect between computer-vision researchers and the needs of the real-time applications where their work would be applied. In particular,

OpenVX lacks concepts relevant to real-time analysis, such as priorities and graph invocation rates, so it is debatable as to whether it really does target real-time systems. More troublingly, OpenVX implicitly treats entire graphs as monolithic schedulable entities. This inhibits parallelism2 and can result in significant processing-capacity loss in settings (like autonomous vehicles) where many computations must be multiplexed onto a common hardware platform.

1Contents of this chapter previously appeared in the following paper: Yang, M., Amert, T., Yang, K., Otterness, N., Anderson, J. H., Smith, F. D., and Wang, S. (2018a). Making OpenVX really “real time”. In Proceedings of the 39th IEEE International Real-Time Systems Symposium, pages 80–93.
2As discussed below, a recently proposed extension (Khronos Group, 2019a) enables more parallelism, but this extension is directed at throughput, not real-time predictability, and is not available in any current OpenVX implementation.

In prior work, our research group partially addressed these issues by proposing a new OpenVX variant in which individual graph nodes are treated as schedulable entities (Elliott et al., 2015; Yang et al., 2015).

This variant allows greater parallelism and enables end-to-end graph response-time bounds to be computed.

However, graph nodes remain as high-level computer-vision functions, which is problematic for (at least) two reasons. First, these high-level nodes still execute sequentially, so some parallelism is still potentially inhibited. Second, such a node will typically involve executing on both a CPU and a GPU. When a node accesses a GPU, it suspends from its assigned CPU. Suspensions are notoriously difficult to handle in schedulability analysis without inducing significant capacity loss.

In this chapter, we show that these problems can be addressed through more fine-grained scheduling of

OpenVX graphs.

First, we show how to transform the coarse-grained OpenVX graphs proposed in our group’s prior work (Elliott et al., 2015; Yang et al., 2015) to fine-grained variants in which each node accesses either a

CPU or a GPU (but not both). Such transformations eliminate suspension-related analysis difficulties at the expense of (minor) overheads caused by the need to manage data sharing. Instead, the analysis in Chapter 4 can apply to our fine-grained OpenVX graphs to determine end-to-end graph response-time bounds.

Moreover, our transformation process exposes new potential parallelism at many levels. For example, because we decompose a coarse-grained OpenVX node into finer-grained schedulable entities, portions of such a node can now execute in parallel. Also, we allow not only successive invocations of the same graph to execute in parallel but even successive invocations of the same (fine-grained) node.

Next, we present the results of case-study experiments conducted to assess the efficacy of our fine-grained graph-scheduling approach. In these experiments, we considered six instances of an OpenVX-implemented computer-vision application called HOG (histogram of oriented gradients), which is used in pedestrian detection, as scheduled on a multicore+GPU platform. These instances reflect a scenario where multiple camera feeds must be supported. We compared both analytical response-time bounds and observed response times for HOG under coarse- vs. fine-grained graph scheduling. We found that bounded response times could be guaranteed for all six camera feeds only under fine-grained scheduling. In fact, under coarse-grained scheduling, just one camera could (barely) be supported. We also found that observed response times were substantially lower under fine-grained scheduling. Additionally, we found that the overhead introduced by converting from a coarse-grained graph to a fine-grained one had modest impact. These results demonstrate the importance of enabling fine-grained scheduling in OpenVX if real time is really a first-class concern.

6.1 OpenVX Graph Scheduling

In this section, we show how to transform coarse-grained OpenVX graphs to fine-grained variants.

6.1.1 Coarse-Grained OpenVX Graph Scheduling

In prior work (Elliott et al., 2015; Yang et al., 2015), coarse-grained graph scheduling techniques without intra-task parallelism were proposed for scheduling acyclic3 OpenVX graphs using G-EDF,4 with graph nodes implicitly defined by high-level OpenVX computer-vision functions. We call OpenVX graphs so scheduled coarse-grained graphs.

Given the nature of high-level computer-vision functions, the nodes of a coarse-grained graph will typically involve executing both CPU code and GPU code. Executing GPU code can introduce task suspensions, and under G-EDF schedulability analysis, suspensions are typically dealt with using suspension-oblivious analysis (Brandenburg and Anderson, 2010). This entails analytically viewing suspension time as CPU computation time and can result in significant processing-capacity loss.

6.1.2 Fine-Grained OpenVX Graph Scheduling

In this section, we propose a fine-grained scheduling approach for OpenVX graphs obtained by applying four techniques. First, to eliminate suspension-based capacity loss, we treat CPU code and GPU code as separate graph nodes. Second, to reduce response-time bounds, we allow intra-task parallelism. Third, to avoid non-work-conserving behavior and enable better observed response times, we allow early releasing.

Finally, we use a scheduler (namely, G-FL—see below) that offers advantages over G-EDF. We elaborate on these techniques in turn below.

DAG nodes as CPU or GPU nodes. To handle data dependencies, CUDA provides a set of synchronization functions (Section 3.3), which are configured on a per-device basis to wait via spinning or suspending. In this dissertation, we consider only waiting by suspending because the kernel executions in the workloads of interest are too long for spinning to be viable. In our fine-grained scheduling approach, we avoid suspension-related capacity loss due to kernel executions by more finely decomposing an OpenVX graph so that each of

3As described in these prior works, cycles can be dealt with by relaxing graph constraints or by combining certain nodes into “super-nodes.” Adapting these techniques to our context is beyond the scope of this dissertation.
4While G-EDF was the focus of (Yang et al., 2015), in experiments presented in (Elliott et al., 2015), real-time work was limited to execute on one socket of a multi-socket machine and thus was only globally scheduled within a socket.

its nodes is either a CPU node or a GPU node that executes a kernel. Additionally, we distinguish between regular CPU nodes and the necessary CPU work to launch a GPU kernel and await its results. In this chapter, we assume that copy operations are included in CPU nodes. In the workloads of interest to us, copies are short, so any resulting suspension-based capacity loss is minor. More lengthy copies could instead be handled as separate nodes, similarly to how we handle kernels.

Figure 6.1: (a) Coarse- and (b) fine-grained representations of the same DAG $G^2$. $\tau_x^2$ is simple sequential CPU code, so it is represented by one fine-grained task. $\tau_y^2$ is more complex and consists of both CPU and GPU parts, some of which can execute in parallel.

In the rest of this section, we use a continuing example to illustrate various nuances of our fine-grained approach.

Example 6.1. Figure 6.1 (a) depicts a simple coarse-grained graph comprised of two tasks, $\tau_x^2$ and $\tau_y^2$. Figure 6.1 (b) shows a fine-grained representation of this same graph. Task $\tau_x^2$ is a simple CPU-only computer-vision function and is represented by one fine-grained CPU task, $\tau_1^2$. $\tau_y^2$ is more complex, and its fine-grained representation consists of six tasks, $\tau_2^2, \cdots, \tau_7^2$, where $\tau_5^2$ is a GPU task, and $\tau_4^2$ and $\tau_6^2$ are CPU tasks that launch the GPU kernel and await its completion, respectively.5 ♦
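To make the node split concrete, the following is a minimal CUDA sketch (not the dissertation's implementation) of a launch node and an await node that share one stream, with the device configured to wait by suspending (blocking synchronization) rather than spinning. Error handling is omitted.

```cuda
#include <cuda_runtime.h>

// Stand-in for the GPU node's kernel (the role of tau_5^2 in Example 6.1).
__global__ void gpu_node_kernel(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

// CPU node that launches the kernel (the role of tau_4^2): the launch is
// asynchronous, so this node has a short, suspension-free execution time.
void launch_node(cudaStream_t stream, float* d_data, int n) {
  gpu_node_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
}

// CPU node that awaits the kernel's results (the role of tau_6^2).  Because
// of the precedence constraint, this node is typically released only after
// the kernel has (nearly) completed, so the call returns quickly (footnote 5).
void await_node(cudaStream_t stream) {
  cudaStreamSynchronize(stream);
}

int main() {
  // Wait by suspending rather than spinning, matching the waiting mode
  // assumed in the text; this must be set before the context is created.
  cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

  const int n = 1 << 20;
  float* d_data;
  cudaMalloc(&d_data, n * sizeof(float));
  cudaStream_t stream;
  cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

  launch_node(stream, d_data, n);   // fine-grained launch node
  await_node(stream);               // fine-grained await node

  cudaStreamDestroy(stream);
  cudaFree(d_data);
}
```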

An end-to-end response-time bound for a fine-grained graph can be obtained from per-node bounds as discussed in Section 4.1, with the copy operations in CPU nodes dealt with using suspension-oblivious analysis. For GPU nodes, we provided a new analysis for NVIDIA GPUs in Section 4.2.6 Note that, in work on the prior coarse-grained approach (Elliott et al., 2015; Yang et al., 2015), locking protocols were used to preclude concurrent GPU access, obviating the need for such analysis.

5The synchronization call to await results may be launched before the GPU kernel has completed, but this overhead is extremely short.
6Response times for copies, if handled as separate nodes, are trivial to bound because they are FIFO-scheduled.

Example 6.2 (cont’d). Possible schedules for the graphs in Figure 6.1 are depicted in Figure 6.2. As in

Example 4.1, successive jobs of the same task are shaded differently to make them easier to distinguish.

Recall from Section 4.1 that all tasks in a graph share the same period; in these schedules, all periods are

5 time units, shown as the time between successive job release times (up arrows).

Figure 6.2 (a) depicts the graph’s schedule as a single monolithic entity, as implied by the OpenVX standard. OpenVX lacks any notion of real-time deadlines or phases, so these are excluded here, as is a response-time bound. The depicted schedule is a bit optimistic because the competing workload does not prevent the graph from being scheduled continuously. Under monolithic scheduling, the entire graph must complete before a new invocation can begin. As a result, the second invocation does not finish until just before time 28.

Figure 6.2 (b) depicts coarse-grained scheduling as proposed in prior work (Elliott et al., 2015; Yang et al., 2015), where graph nodes correspond to high-level computer-vision functions, as in Figure 6.1 (a).

Nodes can execute in parallel. For example, $\tau_{y,1}^2$ and $\tau_{x,2}^2$ do so in the interval [5,6). However, intra-task parallelism is not allowed: $\tau_{y,2}^2$ cannot begin until $\tau_{y,1}^2$ completes, even though its predecessor ($\tau_{x,2}^2$) is finished. Note that, under coarse-grained scheduling, GPU execution time is also analytically viewed as CPU execution time using suspension-oblivious analysis. This analytical impact is not represented in the figure.

Figure 6.2 (c) depicts a fine-grained schedule for the graph in Figure 6.1 (b), but without intra-task parallelism. In comparing insets (b) and (c), the difference is that nodes are now more fine-grained, enabling

greater concurrency. As a result, $\tau_{7,2}^2$ completes earlier, at time 25. The detriments of suspension-oblivious analysis for GPU kernels are also now avoided. ♦

Intra-task parallelism. Our notion of fine-grained graph scheduling allows intra-task parallelism, i.e., consecutive jobs of the same task may execute in parallel. Such parallelism can cause successive invocations of the same graph to complete out of order. This can be rectified via buffering.7

Example 6.2 (cont’d). Lower response times are enabled by intra-task parallelism, as depicted in Figure 6.2 (d). In this schedule, $\tau_{3,1}^2$ and $\tau_{7,1}^2$ are able to execute in parallel with $\tau_{3,2}^2$ and $\tau_{7,2}^2$, respectively, reducing the completion time of $\tau_{7,2}^2$ to time 23. Observe that $\tau_{7,2}^2$ completes before $\tau_{7,1}^2$, so some output buffering would be needed here. ♦

7The buffer size can be determined based on the calculated response-time bounds.

Figure 6.2: Example schedules of the tasks corresponding to the DAG-based tasks in G2 in Figure 6.1. (a) Monolithic scheduling. (b) Coarse-grained scheduling as in prior work. Fine-grained scheduling as proposed here (c) without and (d) with intra-task parallelism. (Depicted jobs are assumed to be scheduled alongside other jobs, which are not shown; job deadlines are omitted to avoid clutter.)

Allowing intra-task parallelism has an even greater impact on analytically derived response-time bounds (Erickson and Anderson, 2011) and, as noted earlier in Section 4.1, enables task utilizations to exceed 1.0.

Early releasing. Although omitted in Figure 6.2 for clarity, early releasing can decrease observed response times without affecting analytical bounds.

Example 6.2 (cont’d). Allowing $\tau_{7,1}^2$ to be early released once $\tau_{6,1}^2$ completes in Figure 6.2 (d) reduces the overall graph’s completion time to just under 23 time units. ♦

G-FL scheduling. The approach in Section 4.1 applies to any GEL scheduler. As shown in (Erickson and

Anderson, 2012; Erickson et al., 2014), the global fair-lateness (G-FL) scheduler is the “best” GEL scheduler with respect to tardiness bounds. We therefore perform CPU scheduling using G-FL instead of G-EDF.8

Periods. An additional benefit of fine-grained scheduling is that it allows for shorter periods.

Example 6.2 (cont’d). The period used in Figure 6.2 seems reasonable in insets (c) and (d): notice that each job finishes before or close to its task’s next job release. In contrast, in insets (a) and (b), response times could easily be unbounded. ♦

Recently proposed OpenVX extensions. The Khronos Group recently released the OpenVX Graph Pipelin- ing, Streaming, and Batch Processing Extension (Khronos Group, 2019a), which enables greater parallelism in OpenVX graph executions. However, this extension is not available in any current OpenVX implementa- tion and still lacks concepts necessary for ensuring real-time schedulability. While we have not specifically targeted this extension, an ancillary contribution of our work is to provide these needed concepts. In particular, the parallelism enabled by this extension’s pipelining feature is actually subsumed by that allowed in our

fine-grained graphs. Furthermore, the batching feature allows a node to process multiple frames instead of just one, potentially increasing computation cost; this could increase the node’s utilization, possibly even exceeding 1.0. Introducing intra-task parallelism as we have done enables such nodes to be supported while still ensuring schedulability.

8As explained in Section 6.2, we actually consider two variants of G-FL, a “clustered” variant in which G-FL is applied on a per-socket basis on our test platform, and a fully global variant that is applied across all sockets.


Figure 6.3: (a) Monolithic/coarse-grained and (b) fine-grained HOG DAGs. Our experiments used 13 scale levels; three are shown here. In (b), there are also two CPU nodes per kernel that launch that kernel and await its results. These nodes have short execution times and are omitted from much of our discussion for simplicity. However, they were fully considered in our evaluation.

6.2 Case Study

In this section, we detail the case-study experiments we performed, and compare our fine-grained DAG scheduling approach to monolithic and coarse-grained DAG scheduling.

6.2.1 Experimental Evaluation

All of our case-study experiments focused on the Histogram of Oriented Gradients (HOG) algorithm, a well-known CV technique for recognizing pedestrians in input images (Dalal and Triggs, 2005b).

Why HOG? The HOG algorithm is required by the OpenVX 1.2 specification,9 ensuring its relevance in real-world graph-based CV. HOG is inherently a multi-stage technique: it calculates a directional gradient for each pixel, sorts gradients into histograms, normalizes lighting and contrast, and then performs classification.

These computations are performed at multiple image-scale levels, using successively smaller versions of the original image. These steps require both CPU and GPU computations, meaning that HOG fits naturally into a

DAG-based model.

HOG implementation and DAG variations. At the time of writing, no open-source implementation of version 1.2 of OpenVX exists, so we based our case-study experiments on the HOG implementation available in the open-source CV library OpenCV.10 OpenCV provides CV functions, but does not structure computations as DAGs. To create DAGs, we split the OpenCV HOG code into separate functions, and

9https://www.khronos.org/registry/OpenVX/specs/1.2/OpenVX_Specification_1_2.pdf, Sec. 3.53.1.
10https://docs.opencv.org/3.4.1/d5/d33/structcv_1_1HOGDescriptor.html.

designated each function as a DAG node. We compared the response times of successively finer-grained notions of DAG scheduling, corresponding to monolithic, coarse-grained, and fine-grained HOG DAGs.11

The monolithic version of HOG corresponds to the type of DAG that one might specify using OpenVX, and consists of a single DAG of six types of nodes (three are replicated per scale level), as shown in Figure 6.3

(a). Implementing this DAG required the fewest changes to OpenCV code, as monolithic execution requires only a single thread to sequentially execute the six nodes’ functions. Coarse-grained HOG uses the same

DAG as monolithic HOG, but, as discussed in Section 6.1.1, each of the six nodes is a schedulable entity, with scheduling via G-EDF with early releasing. We also used G-EDF, but without early releasing, as a monolithic DAG scheduler.

In fine-grained HOG, several of the coarse-grained nodes are refined, as shown in Figure 6.3 (b). This

DAG reflects our new fine-grained approach, where nodes are treated as tasks and the techniques (early releasing, intra-task parallelism, and G-FL scheduling) in Section 6.1.2 are applied.

Fine-grained DAG implementation. Implementing fine-grained HOG introduced a series of challenges.

For example, intra-task parallelism required multiple instances of each DAG to ensure each node can have multiple jobs executing in parallel. Other challenges included priority points that varied (for launching GPU kernels and awaiting results), handling inter-process communication (IPC) between CPU and GPU nodes, enforcing guards on early releasing for GPU nodes, and computing task offsets from response-time bounds in order to run the experiments.

As in prior work (Elliott et al., 2015), we used PGMRT (Elliott et al., 2014) to handle IPC in the coarse- and fine-grained HOG variants. PGMRT introduces producer/consumer buffers and mechanisms that enable producer nodes to write output data and consumer nodes to suspend until data is available on all inbound edges.
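The following thread-based sketch illustrates the producer/consumer behavior on a single graph edge; it is not PGMRT's actual API (PGMRT additionally operates across processes and handles waiting on multiple inbound edges).

```cuda
#include <condition_variable>
#include <deque>
#include <mutex>

// One graph edge: the producer node publishes a token describing its output,
// and the consumer node suspends until a token is available.
template <typename Token>
class EdgeBuffer {
 public:
  void produce(const Token& t) {
    { std::lock_guard<std::mutex> lk(m_); tokens_.push_back(t); }
    cv_.notify_one();
  }
  Token consume() {                  // suspends while the edge is empty
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !tokens_.empty(); });
    Token t = tokens_.front();
    tokens_.pop_front();
    return t;
  }
 private:
  std::deque<Token> tokens_;
  std::mutex m_;
  std::condition_variable cv_;
};

int main() {
  EdgeBuffer<int> edge;
  edge.produce(42);           // producer node publishes job/frame ID 42
  int job = edge.consume();   // consumer node wakes and processes job 42
  return job == 42 ? 0 : 1;
}
```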

Test platform. Our evaluation platform was selected to over-approximate current NVIDIA embedded offerings for automotive systems, such as the Drive PX2. This platform features a single NVIDIA Titan

V GPU, two eight-core Intel CPUs, and 32 GB of DRAM. Each core features a 32-KB L1 data cache, a

32-KB L1 instruction cache, and a 1-MB L2 cache, and all eight cores on a socket share an 11-MB L3 cache. The system was configured to run Ubuntu 16.04 as an OS, using version 2017.1 of the LITMUSRT kernel (LITMUSRT Project, 2018), with hardware multi-threading disabled.

11Our source code is available online at https://github.com/Yougmark/opencv/tree/rtss18.

Overall computational workload. One would expect contention for hardware resources in many real-world use cases, such as an autonomous vehicle that processes data from multiple camera feeds. We approximated a “contentious” workload by executing six HOG processes on our hardware platform—the limit based on our platform’s DRAM, CPU, and GPU capacity. This number of HOG processes makes ensuring bounded response times difficult without careful consideration of resource allocation. This scenario also reflects the very real possibility of executing at high platform utilization, as is commonly done in the automotive industry.

To ensure consistency with the GPU scheduling rules in Section 4.2.1, we conducted all of our experiments using NVIDIA’s MPS.

Video-frame-processing can potentially experience cache-affinity-loss issues under global scheduling.

We therefore considered two variants of both G-EDF and G-FL: a truly global variant where any of the six

DAGs can be scheduled on any of our platform’s 16 CPUs, and a “clustered” variant where the six DAGs are partitioned between the machine’s two sockets, with scheduling being “global” only within a socket. We refer to the latter variants as C-EDF and C-FL, respectively, where the “C” prefix stands for “clustered.”

6.2.2 Results

Our experiments were designed to examine analytical response-time bounds and observed response times under the considered scheduling approaches. We also sought to examine the overhead required to support

fine-grained scheduling.

Analytical bounds. To compute analytical response-time bounds, we first computed CPU WCETs and worst-case GPU workloads via a measurement process. All worst-case values were calculated as the 99th percentile of 30,000 samples obtained with all six DAGs executing together to cause contention. For each GPU task

τi, we used NVIDIA’s profiling tool nvprof to measure Bi and Hi, and instrumented the CUDA kernels to measure Li on the GPU using the globaltimer performance-counter register. For HOG, Hmax = 256 and h = 64. We measured CPU WCETs using Feather-Trace (Brandenburg and Anderson, 2007). For all DAGs,

Ti = 33ms. We computed fine-grained response-time bounds by using Theorem 4.3 in Section 4.2.4 for GPU nodes and Theorem 2 from (Yang and Anderson, 2014b) for CPU nodes and by then applying the techniques in

Section 6.1.2 to obtain an overall end-to-end bound. We tried computing analytical bounds for the coarse-grained (resp., monolithic) C-EDF and G-EDF variants using prior work (Yang et al., 2015) (resp., (Devi and

                                        Monolithic   Monolithic   Coarse-Grained   Coarse-Grained   Fine-Grained   Fine-Grained
                                        G-EDF        C-EDF        G-EDF            C-EDF            G-FL           C-FL
Analytical Bound (ms)                   N/A          N/A          N/A              N/A              542.39         477.25
Observed Maximum Response Time (ms)     170091.06    243745.21    427.07           428.50           125.66         131.43
Observed Average Response Time (ms)     84669.47     121748.05    136.57           121.52           65.99          66.06

Table 6.1: Analytical and observed response times. A bound of N/A indicates unschedulability, so no bound could be computed.

Anderson, 2008)), but found these variants to be unschedulable.12 These results are summarized in the first row of Table 6.1.
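As noted above, per-kernel GPU execution times (Li) were measured by instrumenting kernels to read the globaltimer register. The following is a minimal sketch of that mechanism only; the actual measurement harness and post-processing differed in detail.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Reads the per-device globaltimer register (nanoseconds) from device code.
__device__ unsigned long long read_globaltimer() {
  unsigned long long t;
  asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
  return t;
}

// Each block records when it starts and finishes; the host then takes the
// earliest start and latest end as an approximation of the kernel's span.
__global__ void timed_kernel(float* data, int n,
                             unsigned long long* block_start,
                             unsigned long long* block_end) {
  if (threadIdx.x == 0) block_start[blockIdx.x] = read_globaltimer();
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1.0f;          // the kernel's real work goes here
  __syncthreads();
  if (threadIdx.x == 0) block_end[blockIdx.x] = read_globaltimer();
}

int main() {
  const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
  float* d_data;
  unsigned long long *d_start, *d_end;
  cudaMalloc(&d_data, n * sizeof(float));
  cudaMalloc(&d_start, blocks * sizeof(unsigned long long));
  cudaMalloc(&d_end, blocks * sizeof(unsigned long long));

  timed_kernel<<<blocks, threads>>>(d_data, n, d_start, d_end);
  cudaDeviceSynchronize();

  std::vector<unsigned long long> hs(blocks), he(blocks);
  cudaMemcpy(hs.data(), d_start, blocks * sizeof(unsigned long long),
             cudaMemcpyDeviceToHost);
  cudaMemcpy(he.data(), d_end, blocks * sizeof(unsigned long long),
             cudaMemcpyDeviceToHost);
  unsigned long long first = *std::min_element(hs.begin(), hs.end());
  unsigned long long last = *std::max_element(he.begin(), he.end());
  std::printf("GPU-side kernel span: %.3f ms\n", (last - first) / 1e6);

  cudaFree(d_data); cudaFree(d_start); cudaFree(d_end);
}
```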

Observation 6.1. With respect to schedulability, the monolithic and coarse-grained variants could not even come close to supporting all six cameras (i.e., DAGs).

With respect to schedulability, the monolithic variants could not even support one camera, because the overall execution time of a single monolithic DAG far exceeds its period. The coarse-grained variants were only slightly better, being able to support just one camera (in which case the choice of variant, C-EDF vs.

G-EDF, is of little relevance). In this case, adding a second HOG DAG increased GPU response times to the point of causing a CPU utilization-constraint violation. Note that the increase in CPU utilization is due to using suspension-oblivious analysis.

Observation 6.2. With respect to schedulability, both fine-grained variants were able to support all six cameras.

The better schedulability of the fine-grained variants largely resulted from scheduling shorter tasks with intra-task parallelism, though switching to a fair-lateness-based scheduler also proved beneficial. In particular, we found that the scheduler change reduced node response times by 0.1–9.9%. While this reduction is modest, it is still useful. Nonetheless, these reductions suggest that most of the schedulability improvements stemmed from increasing parallelism between both CPU and GPU tasks.

Observed response times. Figure 6.4 plots DAG response-time distributions, which we computed for all tested variants from measurement data. Corresponding worst- and average-case times are also reported in

Table 6.1.

12In the original coarse-grained work (Elliott et al., 2015; Yang et al., 2015), locking protocols were used to preclude concurrent GPU accesses. We instead allowed concurrent accesses and used the analysis in Section 4.2, but the coarse-grained variants were still unschedulable.


Figure 6.4: CDFs of response times for varying DAG granularity.

Observation 6.3. The average (resp., worst-case) observed response time under the fine-grained variants was around 66 ms (resp., 130 ms), which is substantially lower than under the non-fine-grained variants. (For reference, the response time of an alert driver is reported to be around 700 ms (Green, 2000).)

This observation is supported by Figure 6.4 and Table 6.1. Note that the difference between clustered and global scheduling was not substantial. This is because the aggregate memory footprint of all frames concurrently being accessed under both variants tended to far exceed the size of the L3 cache.

Observation 6.4. The analytical fine-grained response-time bounds upper bounded the observed worst-case times.

This observation is supported by Figure 6.4 and Table 6.1. While the listed bounds of 477.25 ms and

542.39 ms in Table 6.1 may seem high, note that they are based on considering worst-case scenarios that may be unlikely to occur in practice (and they are still within the limit mentioned in (Green, 2000)). Moreover, the monolithic and coarse-grained variants were unable to guarantee any bounds when scheduling all six DAGs.

Observation 6.5. Observed response times exhibited lower variation under fine-grained scheduling.

This observation is supported by Figure 6.4. The fine-grained variances in this plot are several orders of magnitude less than the variances for the other variants.

Observation 6.6. Early releasing improved observed response times.

155 To verify this observation, we conducted additional experiments in which we disabled early releasing for the fine-grained G-FL variant. In these experiments, we found that early releasing reduced observed response times by 49%.

Overhead of DAG conversion. We estimated the overhead of converting from a coarse-grained DAG to a fine-grained one by comparing the computed WCET of every coarse-grained node with the sum of the computed WCETs of the fine-grained nodes that replace it. The total percentage increase across all nodes was deemed to be overhead.
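A small sketch of this computation, under the assumption that the percentage increase is taken over the summed WCETs; the numbers shown are placeholders, not the measured HOG values.

```cuda
#include <cstdio>
#include <vector>

// For each coarse-grained node i, fine_wcets[i] holds the WCETs of the
// fine-grained nodes that replace it.  The overhead is the total percentage
// increase of the fine-grained WCETs over the coarse-grained ones.
double conversion_overhead_pct(
    const std::vector<double>& coarse_wcet,
    const std::vector<std::vector<double>>& fine_wcets) {
  double coarse_total = 0.0, fine_total = 0.0;
  for (size_t i = 0; i < coarse_wcet.size(); ++i) {
    coarse_total += coarse_wcet[i];
    for (double w : fine_wcets[i]) fine_total += w;
  }
  return 100.0 * (fine_total - coarse_total) / coarse_total;
}

int main() {
  std::vector<double> coarse = {4.0, 6.0};                   // ms per node
  std::vector<std::vector<double>> fine = {{1.5, 3.0}, {2.0, 2.5, 2.3}};
  std::printf("conversion overhead: %.2f%%\n",
              conversion_overhead_pct(coarse, fine));
}
```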

Observation 6.7. The additional overhead introduced to support fine-grained scheduling had modest impact.

From our collected data, the total overhead was 14.15%.

6.3 Chapter Summary

In this chapter, we applied our analysis and implementation techniques from Chapters 4 and 5 to a fine-grained approach of decomposing and scheduling OpenVX graphs. To illustrate the efficacy of our proposed

fine-grained approach, we presented an experimental case study. We saw in this study that our fine-grained approach enabled response-time bounds to be guaranteed and observed response times to be reduced. A notable aspect of our fine-grained approach is its crucial reliance on allowing intra-task parallelism, a feature forbidden in most conventional real-time task models.

CHAPTER 7: CONCLUSION

Moving forward, increased autonomy will come to mass-market vehicles, for which computer-vision- based perception systems are essential. Compared to lidars and radars, cameras are less expensive and uniquely capable of perceiving colors and textures. In accelerating computer-vision workloads, GPUs are key enablers that provide massive parallelism. As a number of disparate applications are required to ensure adequate coverage and redundancy offered by a large quantity and diversity of cameras, it becomes necessary for these applications to share increasingly powerful GPUs. This is especially important for embedded autonomous-driving systems, where resources should be fully utilized. Meanwhile, tasks that share GPUs for better throughput performance should also have guaranteed bounded response times in a safety-critical autonomous-driving system, and any induced accuracy loss must be tolerable.

This dissertation addresses the challenging issue of how GPUs can be shared between computer-vision applications to improve throughput performance while bounding response times and ensuring high per-camera-stream accuracy. We (i) investigated NVIDIA GPUs to reveal GPU scheduling rules and synchronization pitfalls, (ii) developed a task model for GPU tasks and analyzed response-time bounds for tasks sharing GPUs, (iii) designed and implemented CNN frameworks to exploit pipelined and parallel execution for throughput improvement, and (iv) conducted a case study of fine-grained graph scheduling for the computer-vision standard OpenVX, comparing both analytical and empirical results with prior work.

The results we obtained support the following thesis we presented in Chapter 1:

Capacity loss in GPU-using real-time task systems can be lessened through sharing GPUs.

Such sharing can be guaranteed with bounded response times. Furthermore, throughput can

be improved without compromising such guarantees by exploiting pipelining and parallelism in

real-time tasks. These real-time guarantees of tasks on shared GPUs are enabled with a new

model for GPU execution and corresponding schedulability analysis.

In the following sections, we first summarize our results in Section 7.1, acknowledge others’ contributions to these results in Section 7.2, then discuss future work in Section 7.3, and finally conclude in Section 7.4.

7.1 Summary of Results

In this section, we summarize our results that support the thesis above.

Investigation of NVIDIA GPUs. We first investigated NVIDIA GPUs to reveal GPU scheduling rules.

In particular, we focused on the scheduling of kernel and memory-copy operations from multiple CUDA streams. These operations are allowed to execute concurrently when submitted from the same CUDA context, but are managed by undocumented scheduling rules. Unfortunately, the software and hardware stacks are closed-source. Therefore, we inferred the scheduling rules presented in Section 3.1 from conducting extensive experiments and examining their scheduling results. We defined the basic scheduling rules that specify how the kernel and memory-copy operations progress through a variant of a hierarchical FIFO structure and how the scheduling can be impacted by constrained resources, including GPU threads and shared memory. In addition to the basic scheduling rules, we considered additional features available to CUDA programmers that can impact scheduling, including the NULL stream, stream priorities, and streams in independent process address spaces.

In addition to the scheduling rules, we discussed one set of issues that leads to a surprising number of pitfalls when NVIDIA GPUs are used: synchronization. For example, documented sources of implicit synchronization may not occur, while some functions that cause implicit synchronization are neglected, and some functions even block future, unrelated CUDA tasks on the CPU. The resultant blocking can cause capacity loss, potentially leading to unbounded response times in task sets with high utilization. Ignoring these synchronization issues would be perilous in a safety-critical system where blocking must be anticipated and accounted for in analysis. We also noted that running CPU tasks in independent processes managed by

MPS can overcome this issue.

In addition, we examined a fundamental trade-off for designing real-time tasks that use a GPU. Specif- ically, we evaluated the performance of multiple instances of various computer-vision tasks in different settings. In each setting, applications were executed either as threads that share the same address space or as OS processes both with or without MPS. We observed that the execution times of tasks running in multiple processes with MPS are more predictable. However, despite these benefits of MPS, it is currently not available for embedded platforms, such as the TX2.

These specific issues comprise the essential background for developing task systems where real-time design meets GPUs. We summarized them to provide guidance, recommendations, and pitfall warnings to both researchers and implementation practitioners.

Task model of GPU tasks. Based on the fundamental understanding from our investigation of NVIDIA

GPUs, we developed a task model of GPU tasks, which expands on the sporadic task model (Mok, 1983) with additional parameters that characterize the parallel execution of GPU tasks under the CUDA programming model. Using this task model, we showed that NVIDIA GPU scheduling can inherently cause unnecessary idleness of its processing units, implying a fundamental capacity loss. Specifically, we showed that the capacity loss can be extreme without intra-task parallelism, which allows multiple jobs of a task to execute in parallel, but is forbidden in most conventional real-time task models. Moreover, GPUs can be left idle if the number of idle GPU threads on any one SM is insufficient for scheduling the next unassigned block.

This implies that some capacity loss is inevitable when seeking to ensure response-time bounds for GPUs.

We expressed such loss by providing a total utilization bound. We showed that this bound is tight with a counterexample, a task system that has total utilization slightly exceeding this bound but with unbounded response times.

Response-time bound analysis for tasks sharing GPUs. Our total utilization bound forms the schedulability condition for bounded response times of task systems where intra-task parallelism is allowed. Under this schedulability condition, we derived response-time bounds for tasks sharing NVIDIA GPUs under the scheduling rules we inferred. By leveraging prior work on DAG scheduling analysis, we modeled GPU-using applications as DAGs in which each node is either CPU or GPU work. We then showed that an end-to-end response-time bound can be computed inductively for each DAG by scheduling its nodes such that they can be viewed as sporadic tasks and by then leveraging response-time bounds applicable to CPU tasks (Devi and

Anderson, 2008; Liu and Anderson, 2011; Erickson and Anderson, 2012) and our response-time bounds for

GPU tasks.

Design and implementation of CNN frameworks for throughput improvement. One of our goals was to improve the throughput of computer-vision frameworks so as to support the coverage and redundancy afforded by an array of cameras. We addressed this challenge in an industrial design study by re-designing CNN software. We showed how to transform a traditional CNN framework to obtain pipelined and parallel-pipelined variants that better utilize hardware resources. By re-thinking how the YOLO CNN is executed, we showed that its throughput can be improved by more than twofold on the PX2 with only a minor latency increase and no change in accuracy. We also showed that throughput can be further improved by properly load balancing work across multiple GPUs. Finally, to accommodate more cameras than these basic methods allow, we proposed compositing multiple images from different camera streams. We showed that this approach can greatly boost throughput (up to 250 FPS) at the expense of some accuracy loss, and we demonstrated that this loss can be partially mitigated by retraining the neural network.
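
To make the compositing idea concrete, the sketch below is illustrative only (frame dimensions, layout, and function names are hypothetical, and the real pipeline performs this step as part of pre-processing before inference); it tiles four same-sized RGB frames into one 2x2 composite so that a single CNN inference covers four camera streams:

    // Illustrative sketch: tile four WxH RGB frames into one 2Wx2H composite so
    // that a single CNN inference covers four camera streams.
    #include <cstdint>
    #include <cstring>
    #include <vector>

    void composite2x2(const std::vector<const uint8_t *> &frames,  // 4 frames, W*H*3 bytes each
                      uint8_t *out, int W, int H) {
      const int rowBytes = W * 3;
      const int outRowBytes = 2 * rowBytes;
      for (int f = 0; f < 4; f++) {
        const int tileRow = f / 2, tileCol = f % 2;  // position in the 2x2 grid
        uint8_t *dst = out + tileRow * H * outRowBytes + tileCol * rowBytes;
        for (int y = 0; y < H; y++)
          std::memcpy(dst + y * outRowBytes, frames[f] + y * rowBytes, rowBytes);
      }
    }

    int main() {
      const int W = 416, H = 416;  // per-camera frame size (illustrative)
      std::vector<std::vector<uint8_t>> cams(4, std::vector<uint8_t>(W * H * 3, 0));
      std::vector<uint8_t> composite(4 * W * H * 3);
      composite2x2({cams[0].data(), cams[1].data(), cams[2].data(), cams[3].data()},
                   composite.data(), W, H);
      return 0;
    }

A real implementation would likely perform this tiling on the GPU and would also need to map detections in the composite back to their source streams.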

Evaluation of end-to-end bounds of OpenVX applications. We applied these implementation techniques and our response-time bound analysis to a computer-vision standard, OpenVX. Prior work partially addressed the issue of enabling real-time OpenVX by treating individual graph nodes as schedulable entities (Elliott et al., 2015; Yang et al., 2015). We proposed a finer-grained scheduling of OpenVX graphs in which each node accesses either a CPU or a GPU (but not both). In this way, we were able to exploit parallelism that was inhibited by the coarse-grained scheduling of prior work. Applying our response-time bound analysis avoids the capacity loss incurred when suspensions for GPU access must be accounted for. We conducted a case study using a HOG application to evaluate both analytical and empirical results in comparison with prior work. We showed that our fine-grained approach enables response-time bounds to be guaranteed and reduces observed response times.

7.2 Acknowledgements

These results previously appeared in a series of papers that were produced through team efforts. I have listed my collaborators' contributions in detail at the end of each chapter, and I also summarize them here.

The experiments that investigate NVIDIA GPUs were conducted using the framework that Nathan Otterness developed, the plotting scripts that Tanya Amert wrote, and the sample applications that I ported. Each of us conducted part of the experiments to reveal the scheduling rules and expose the list of pitfalls, with additional help from Joshua Bakita.

Kecheng Yang reworked the response-time bound analysis into the current version.

The industrial study of improving throughput performance of CNN frameworks was conducted during my internship with Shige Wang at General Motors. Thanh Vu retrained the YOLO model with assistance from Cheng-Yang Fu and Akash Bapat.

Chapters 3–6 were largely drawn from our papers, which include text drafted and polished by my co-authors—especially Tanya Amert, Jim Anderson, Nathan Otterness, Don Smith, and Kecheng Yang.

7.3 Future Work

We now discuss future work involving sharing GPUs for real-time autonomous-driving applications.

Mitigate interference between GPU tasks. In this dissertation, we focused on the native scheduler in NVIDIA GPUs. GPU tasks can interfere with one another when they access constrained resources concurrently, and mitigating this interference could lead to better response times. As we reviewed in Section 2.3.4, many techniques have been applied to reduce interference between tasks running on a GPU, such as the management of cache, memory, and computing resources. Another technique is to prevent tasks from executing concurrently if their interference, estimated through runtime profiling or GPU-kernel characterization, is high. Future research can determine whether these interference-mitigation techniques can be incorporated into our analysis when deriving response-time bounds.

Analyze the impacts of memory usage. We noted that the memory usage of GPU tasks affects scheduling in a way similar to the usage of computing resources. As we noted in Section 4.2.2, we assumed the absence of shared-memory-induced blocking because we never observed such blocking on any platform, for any workload, in our multi-year experiments with computer-vision workloads on NVIDIA GPUs. In addition, we remarked that register usage can be limited with a compiler option. Nonetheless, incorporating the memory usage of GPU tasks into our analysis remains an interesting challenge for the future.
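
For reference, register pressure can be capped either per kernel or per compilation unit; a brief sketch (the kernel name and limits below are hypothetical):

    // Cap register usage for one kernel via __launch_bounds__: the first argument
    // is the maximum threads per block, the second (optional) is the desired
    // minimum number of resident blocks per SM, which guides register allocation.
    __global__ void __launch_bounds__(256, 4) detectKernel(const float *in, float *out, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) out[i] = in[i];
    }

    // Alternatively, cap registers for every kernel in a file at compile time:
    //   nvcc --maxrregcount=32 detect.cu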

Handle memory-copy operations. In the workloads of interest to us, memory copies are short, so we assumed that copy operations are subsumed by CPU nodes, as any resulting suspension-based capacity loss is minor. However, lengthier memory copies would need to be handled as separate nodes. In addition, on platforms that have multiple CEs, memory copies in different transfer directions may need to be considered separately.
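
If such copies were modeled separately, one natural implementation (a sketch under assumed buffer names; real code would come from the application) issues host-to-device and device-to-host copies in distinct streams so that, on a GPU with multiple copy engines, the two directions can overlap:

    #include <cuda_runtime.h>

    int main() {
      const size_t bytes = 1 << 20;
      float *h_in, *h_out, *d_in, *d_out;
      cudaMallocHost(&h_in, bytes);   // pinned host buffers, required for
      cudaMallocHost(&h_out, bytes);  // truly asynchronous copies
      cudaMalloc(&d_in, bytes);
      cudaMalloc(&d_out, bytes);

      cudaStream_t h2d, d2h;
      cudaStreamCreate(&h2d);
      cudaStreamCreate(&d2h);

      // Opposite transfer directions in different streams, so a GPU with
      // multiple copy engines can perform them concurrently.
      cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, h2d);
      cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, d2h);
      cudaDeviceSynchronize();

      cudaStreamDestroy(h2d);
      cudaStreamDestroy(d2h);
      cudaFree(d_in);
      cudaFree(d_out);
      cudaFreeHost(h_in);
      cudaFreeHost(h_out);
      return 0;
    }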

Explore other features in recent NVIDIA GPUs. In our investigation of NVIDIA GPUs, we mentioned other CUDA features that are rarely used: instruction-level preemption, dynamic parallelism, and CUDA Graphs. Instruction-level preemption could enable fairer scheduling that respects task priorities, compared to the default non-preemptive execution on NVIDIA GPUs, but preemption overheads would need to be evaluated. Dynamic parallelism allows new work to be submitted from within a GPU kernel, avoiding CPU-GPU communication overheads, but the scheduling behavior of work launched from the GPU must be investigated before response-time bounds can be derived. CUDA Graphs structure a sequence of GPU kernels into a graph; a graph of multiple kernels can be launched with a single CPU function call, reducing launch overheads. However, the underlying graph-level optimizations may make it difficult to model the scheduling behavior of individual kernels, and the scheduling of multiple graphs also needs to be investigated. While these features can improve the performance of GPU applications, they all pose challenging problems for real-time systems.
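
To indicate the programming pattern in question, the sketch below (kernel and sizes hypothetical) captures a short stream sequence into a CUDA Graph and then launches the whole sequence with one CPU call per frame; how such a graph is scheduled against other GPU work is precisely the open question noted above:

    #include <cuda_runtime.h>

    __global__ void step(float *d, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) d[i] += 1.0f;
    }

    int main() {
      const int n = 1 << 20;
      float *d;
      cudaMalloc(&d, n * sizeof(float));

      cudaStream_t s;
      cudaStreamCreate(&s);

      // Capture a two-kernel sequence into a graph.
      cudaGraph_t graph;
      cudaGraphExec_t exec;
      cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
      step<<<(n + 255) / 256, 256, 0, s>>>(d, n);
      step<<<(n + 255) / 256, 256, 0, s>>>(d, n);
      cudaStreamEndCapture(s, &graph);
      cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);

      // One CPU call now launches the whole captured sequence per frame.
      for (int frame = 0; frame < 100; frame++)
        cudaGraphLaunch(exec, s);
      cudaStreamSynchronize(s);

      cudaGraphExecDestroy(exec);
      cudaGraphDestroy(graph);
      cudaStreamDestroy(s);
      cudaFree(d);
      return 0;
    }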

Extend the investigation of graph-node granularity. In our case study, we showed the benefits of fine-grained graph scheduling. However, it may not be possible to decompose every application into a DAG at such a fine-grained level, as doing so may introduce overheads that outweigh the performance gained from the exposed parallelism. Determining the conditions under which fine-grained scheduling is beneficial requires further investigation, such as an overhead-aware schedulability study. We could also consider DAGs with a mix of coarse-grained and fine-grained nodes, though performing response-time bound analysis for such DAGs would be challenging, and a computer-vision framework that supports configurable node granularities would also be required.

Investigate other GPUs. We focused on NVIDIA GPUs because they dominate the market and are widely used in research. However, we also noted that their closed-source software and hardware stacks and incomplete documentation have hindered research in many respects. To generalize our results, we should extend our analysis and evaluation to other GPUs. For example, AMD GPUs share a similar programming model and hardware architecture, and AMD provides an open-source software stack, which affords more opportunities to investigate GPU scheduling and to implement additional scheduling and resource-management options for real-time systems.

7.4 Closing Remarks

In this dissertation, we have shown how GPUs can be shared for real-time autonomous-driving systems. Such sharing is motivated by the increasing computational requirements of these systems and by the performance advancements of GPUs. Meanwhile, as GPUs are applied in safety-critical systems, real-time certification of GPUs becomes important, and such certification will also be applicable to other accelerators. In deriving real-time guarantees for GPUs, we showed how an accelerator can be investigated, modeled, analyzed, and evaluated. We hope that our work will contribute to and encourage future research on sharing GPUs and other accelerators in real-time systems.

BIBLIOGRAPHY

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. (2016). Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, pages 265–283.

Abramson, D., Jackson, J., Muthrasanallur, S., Neiger, G., Regnier, G., Sankaran, R., Schoinas, I., Uhlig, R., Vembu, B., and Wiegert, J. (2006). Intel virtualization technology for directed I/O. Intel Technology Journal, 10(3).

Ackerman, E. (2017). Toyota's Gill Pratt on self-driving cars and the reality of full autonomy. IEEE Spectrum. https://spectrum.ieee.org/cars-that-think/transportation/self-driving/toyota-gill-pratt-on-the-reality-of-full-autonomy.

Adinets, A. (2014). CUDA dynamic parallelism API and principles. https://devblogs.nvidia.com/cuda-dynamic-parallelism-api-principles/.

Adriaens, J. T., Compton, K., Kim, N. S., and Schulte, M. J. (2012). The case for GPGPU spatial multitasking. In Proceedings of the 2012 IEEE International Symposium on High-Performance Computer Architecture, pages 1–12.

Agrawal, A., Modi, A. N., Passos, A., Lavoie, A., Agarwal, A., Shankar, A., Ganichev, I., Levenberg, J., Hong, M., Monga, R., et al. (2019). Tensorflow eager: A multi-stage, python-embedded dsl for machine learning. arXiv preprint arXiv:1903.01855.

Aguilera, P., Morrow, K., and Kim, N. S. (2014a). Fair share: Allocation of GPU resources for both performance and fairness. In Proceedings of the 32nd IEEE International Conference on Computer Design, pages 440–447.

Aguilera, P., Morrow, K., and Kim, N. S. (2014b). QoS-aware dynamic resource allocation for spatial-multitasking GPUs. In Proceedings of the 19th Asia and South Pacific Design Automation Conference, pages 726–731.

Akai, N., Morales, L. Y., Yamaguchi, T., Takeuchi, E., Yoshihara, Y., Okuda, H., Suzuki, T., and Ninomiya, Y. (2017). Autonomous driving based on accurate localization using multilayer lidar and dead reckoning. In Proceedings of the 20th IEEE International Conference on Intelligent Transportation Systems, pages 1–6.

Ali, W. and Yun, H. (2018). Protecting real-time GPU kernels on integrated CPU-GPU SoC platforms. In Proceedings of the 30th Euromicro Conference on Real-Time Systems, pages 19:1–19:22.

Amazon AWS (2020). AWS and NVIDIA. https://aws.amazon.com/nvidia/.

AMD (2016). AMD I/O virtualization technology (IOMMU) specification. http://developer.amd.com/wordpress/media/2013/12/48882_IOMMU.pdf.

AMD (2016). ROCm, a new era in open GPU computing. https://rocm.github.io/.

Amert, T., Otterness, N., Yang, M., Anderson, J. H., and Smith, F. D. (2017). GPU scheduling on the NVIDIA TX2: Hidden details revealed. In Proceedings of the 38th IEEE International Real-Time Systems Symposium, pages 104–115.

Anderson, J. H. and Srinivasan, A. (2000). Early-release fair scheduling. In Proceedings of the 12th Euromicro Conference on Real-Time Systems, pages 35–43.

Andersson, B., Baruah, S., and Jonsson, J. (2001). Static-priority scheduling on multiprocessors. In Proceedings of the 22nd IEEE International Real-Time Systems Symposium, pages 193–202.

Andersson, B. and de Niz, D. (2012). Analyzing global-EDF for multiprocessor scheduling of parallel tasks. In Proceedings of the 2012 International Conference On Principles Of Distributed Systems, pages 16–30.

Aravantinos, V. and Diehl, F. (2018). Traceability of deep neural networks. arXiv preprint arXiv:1812.06744.

Audsley, N. C., Burns, A., Richardson, M. F., and Wellings, A. J. (1991). Hard real-time scheduling: The deadline-monotonic approach. IFAC Proceedings Volumes, 24(2):127–132.

Ausavarungnirun, R., Ghose, S., Kayiran, O., Loh, G. H., Das, C. R., Kandemir, M. T., and Mutlu, O. (2015). Exploiting inter-warp heterogeneity to improve gpgpu performance. In 2015 International Conference on Parallel Architecture and Compilation (PACT), pages 25–38.

Ausavarungnirun, R., Landgraf, J., Miller, V., Ghose, S., Gandhi, J., Rossbach, C. J., and Mutlu, O. (2017). Mosaic: a GPU memory manager with application-transparent support for multiple page sizes. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pages 136–150.

Ausavarungnirun, R., Miller, V., Landgraf, J., Ghose, S., Gandhi, J., Jog, A., Rossbach, C. J., and Mutlu, O. (2018). Mask: Redesigning the GPU memory hierarchy to support multi-application concurrency. ACM SIGPLAN Notices, 53(2):503–518.

Axer, P., Quinton, S., Neukirchner, M., Ernst, R., Döbel, B., and Härtig, H. (2013). Response-time analysis of parallel fork-join workloads with real-time constraints. In Proceedings of the 25th Euromicro Conference on Real-Time Systems, pages 215–224.

Badue, C., Guidolini, R., Carneiro, R. V., Azevedo, P., Cardoso, V. B., Forechi, A., Jesus, L., Berriel, R., Paixão, T., Mutz, F., et al. (2019). Self-driving cars: A survey. arXiv preprint arXiv:1901.04407.

Baek, I., Harding, M., Kanda, A., Choi, R. K., Samii, S., and Rajkumar, R. R. (2020). CARSS: Client-aware resource sharing and scheduling for heterogeneous applications. In Proceedings of the 26th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 324–335.

Bakhoda, A., Yuan, G. L., Fung, W. W., Wong, H., and Aamodt, T. M. (2009). Analyzing CUDA workloads using a detailed GPU simulator. In Proceedings of the 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pages 163–174.

Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. (2003). Xen and the art of virtualization. ACM SIGOPS operating systems review, 37(5):164–177.

Baruah, S. (2014). Improved multiprocessor global schedulability analysis of sporadic DAG task systems. In Proceedings of the 26th Euromicro Conference on Real-Time Systems, pages 97–105.

Baruah, S. (2015a). The federated scheduling of constrained-deadline sporadic DAG task systems. In Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition, pages 1323–1328.

Baruah, S. (2015b). Federated scheduling of sporadic DAG task systems. In Proceedings of the 29th IEEE International Parallel and Distributed Processing Symposium, pages 179–186.

Baruah, S. (2015c). The federated scheduling of systems of conditional sporadic DAG tasks. In Proceedings of the 2015 IEEE International Conference on Embedded Software, pages 1–10.

Baruah, S. (2018). Resource-efficient execution of conditional parallel real-time tasks. In Proceedings of the 2018 European Conference on Parallel Processing, pages 218–231.

Baruah, S., Bonifaci, V., and Marchetti-Spaccamela, A. (2015). The global EDF scheduling of systems of conditional sporadic DAG tasks. In Proceedings of the 27th Euromicro Conference on Real-Time Systems, pages 222–231.

Baruah, S., Bonifaci, V., Marchetti-Spaccamela, A., Stougie, L., and Wiese, A. (2012). A generalized parallel task model for recurrent real-time processes. In Proceedings of the 33rd IEEE International Real-Time Systems Symposium, pages 63–72.

Basaran, C. and Kang, K.-D. (2012). Supporting preemptive task executions and memory copies in GPGPUs. In Proceedings of the 24th Euromicro Conference on Real-Time Systems, pages 287–296.

Bast, H., Delling, D., Goldberg, A., Müller-Hannemann, M., Pajor, T., Sanders, P., Wagner, D., and Werneck, R. F. (2016). Route planning in transportation networks. In Algorithm Engineering, pages 19–80.

Bedford, D., Morgan, G., and Austin, J. (1996). Requirements for a standard certifying the use of artificial neural networks in safety critical applications. In Proceedings of the 1996 International Conference on Artificial Neural Networks, pages 1–6.

Behere, S. and Törngren, M. (2016). A functional reference architecture for autonomous driving. Information and Software Technology, 73:136–150.

Bensaou, B., Tsang, D. H., and Chan, K. T. (2001). Credit-based fair queueing (CBFQ): a simple service-scheduling algorithm for packet-switched networks. IEEE/ACM Transactions on Networking, 9(5):591–604.

Berezovskyi, K., Bletsas, K., and Andersson, B. (2012). Makespan computation for GPU threads running on a single streaming multiprocessor. In Proceedings of the 24th Euromicro Conference on Real-Time Systems, pages 277–286.

Berezovskyi, K., Bletsas, K., and Petters, S. M. (2013). Faster makespan estimation for GPU threads on a single streaming multiprocessor. In Proceedings of the 18th IEEE Conference on Emerging Technologies & Factory Automation, pages 1–8.

Berezovskyi, K., Guet, F., Santinelli, L., Bletsas, K., and Tovar, E. (2016). Measurement-based probabilistic timing analysis for graphics processor units. In Proceedings of the 2016 International Conference on Architecture of Computing Systems, pages 223–236.

Berezovskyi, K., Santinelli, L., Bletsas, K., and Tovar, E. (2014). WCET measurement-based and extreme value theory characterisation of CUDA kernels. In Proceedings of the 22nd International Conference on Real-Time Networks and Systems, pages 279–288.

Betts, A. and Donaldson, A. (2013). Estimating the WCET of GPU-accelerated applications using hybrid analysis. In Proceedings of the 25th Euromicro Conference on Real-Time Systems, pages 193–202.

BlackBerry (2020). QNX Neutrino realtime operating system (RTOS). https://blackberry.qnx.com/en/software-solutions/embedded-software/industrial/qnx-neutrino-rtos.

Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., et al. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.

Bonifaci, V., Marchetti-Spaccamela, A., Stiller, S., and Wiese, A. (2013). Feasibility analysis in the sporadic DAG task model. In Proceedings of the 25th Euromicro Conference on Real-Time Systems, pages 225–233.

Brandenburg, B. (2011). Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Brandenburg, B. and Anderson, J. (2007). Feather-trace: A light-weight event tracing toolkit. In Proceedings of the Third International Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pages 61–70.

Brandenburg, B. and Anderson, J. (2010). Optimality results for multiprocessor real-time locking. In Proceedings of the 31st IEEE International Real-Time Systems Symposium, pages 49–60.

Breitenstein, M. D., Reichlin, F., Leibe, B., Koller-Meier, E., and Van Gool, L. (2009). Robust tracking-by-detection using a detector confidence particle filter. In 2009 IEEE 12th International Conference on Computer Vision, pages 1515–1522.

Broggi, A., Buzzoni, M., Debattisti, S., Grisleri, P., Laghi, M. C., Medici, P., and Versari, P. (2013). Extensive tests of autonomous driving technologies. IEEE Transactions on Intelligent Transportation Systems, 14(3):1403–1415.

Brooker, G. M. (2007). Mutual interference of millimeter-wave radar systems. IEEE Transactions on Electromagnetic Compatibility, 49(1):170–181.

Buehler, M., Iagnemma, K., and Singh, S. (2007). The 2005 DARPA grand challenge: the great robot race, volume 36. Springer.

Buehler, M., Iagnemma, K., and Singh, S. (2009). The DARPA urban challenge: autonomous vehicles in city traffic, volume 56. Springer.

Burton, S., Gauerhof, L., and Heinzemann, C. (2017). Making the case for safety of machine learning in highly automated driving. In Proceedings of the 2017 International Conference on Computer Safety, Reliability, and Security, pages 5–16.

Calandrino, J. M., Leontyev, H., Block, A., Devi, U. C., and Anderson, J. H. (2006). LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In Proceedings of the 27th IEEE International Real-Time Systems Symposium, pages 111–126.

Capodieci, N., Cavicchioli, R., Bertogna, M., and Paramakuru, A. (2018). Deadline-based scheduling for GPU with preemption support. In Proceedings of the 39th IEEE International Real-Time Systems Symposium, pages 119–130.

Capodieci, N., Cavicchioli, R., Valente, P., and Bertogna, M. (2017). Sigamma: Server based integrated GPU arbitration mechanism for memory accesses. In Proceedings of the 25th International Conference on Real-Time Networks and Systems, pages 48–57.

Carvalho, P., Cruz, R., Drummond, L. M., Bentes, C., Clua, E., Cataldo, E., and Marzulo, L. A. (2020). Kernel concurrency opportunities based on GPU benchmarks characterization. Cluster Computing, 23(1):177–188.

Carvalho, P., Drummond, L. M., Bentes, C., Clua, E., Cataldo, E., and Marzulo, L. A. (2017). Analysis and characterization of GPU benchmarks for kernel concurrency efficiency. In Latin American High Performance Computing Conference, pages 71–86.

Cavicchioli, R., Capodieci, N., and Bertogna, M. (2017). Memory interference characterization between CPU cores and integrated GPUs in mixed-criticality platforms. In Proceedings of the 22nd IEEE International Conference on Emerging Technologies and Factory Automation, pages 1–10.

Cavicchioli, R., Capodieci, N., Solieri, M., and Bertogna, M. (2019). Novel methodologies for predictable CPU-to-GPU command offloading. In Proceedings of the 31st Euromicro Conference on Real-Time Systems, pages 22:1–22:22.

Chen, Q., Yang, H., Guo, M., Kannan, R. S., Mars, J., and Tang, L. (2017a). Prophet: Precise QoS prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, pages 17–32.

Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. (2015). Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.

Chen, X., Kundu, K., Zhu, Y., Ma, H., Fidler, S., and Urtasun, R. (2017b). 3D object proposals using stereo imagery for accurate object class detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(5):1259–1272.

Cheng, C.-H., Diehl, F., Hinz, G., Hamza, Y., Nührenberg, G., Rickert, M., Ruess, H., and Truong-Le, M. (2018a). Neural networks for safety-critical applications: challenges, experiments and perspectives. In Proceedings of the 2018 Design, Automation and Test in Europe, pages 1005–1006.

Cheng, C.-H., Huang, C.-H., Ruess, H., Yasuoka, H., et al. (2018b). Towards dependability metrics for neural networks. In Proceedings of the 16th ACM/IEEE International Conference on Formal Methods and Models for System Design, pages 1–4.

Cheng, X., Wang, P., and Yang, R. (2018c). Learning depth with convolutional spatial propagation network. arXiv preprint arXiv:1810.02695.

Cho, H., Seo, Y.-W., Kumar, B. V., and Rajkumar, R. R. (2014). A multi-sensor fusion system for moving object detection and tracking in urban driving environments. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation, pages 1836–1843.

Chwa, H. S., Lee, J., Phan, K.-M., Easwaran, A., and Shin, I. (2013). Global EDF schedulability analysis for synchronous parallel tasks on multicore platforms. In Proceedings of the 25th Euromicro Conference on Real-Time Systems, pages 25–34.

Codevilla, F., Müller, M., López, A., Koltun, V., and Dosovitskiy, A. (2018). End-to-end driving via conditional imitation learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation, pages 1–9.

Crowe, S. (2019a). California's self-driving car reports are imperfect, but they're better than nothing. https://www.theverge.com/2019/2/13/18223356/california-dmv-self-driving-car-disengagement-report-2018.

Crowe, S. (2019b). Waymo autonomous vehicles leave Apple in the dust. https://www.therobotreport.com/waymo-autonomous-vehicles-apple/.

Dai, H., Lin, Z., Li, C., Zhao, C., Wang, F., Zheng, N., and Zhou, H. (2018). Accelerate GPU concurrent kernel execution by mitigating memory pipeline stalls. In Proceedings of the 2018 IEEE International Symposium on High Performance Computer Architecture, pages 208–220.

Dalal, N. and Triggs, B. (2005a). Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, pages 886–893.

Dalal, N. and Triggs, B. (2005b). Histograms of oriented gradients for human detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886–893.

Davis, J. and Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, pages 233–240.

De Kruijf, M. and Sankaralingam, K. (2011). Idempotent processor architecture. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, pages 140–151.

Devi, U. and Anderson, J. (2005). Tardiness bounds for global EDF scheduling on a multiprocessor. In Proceedings of the 26th IEEE International Real-Time Systems Symposium, pages 330–341.

Devi, U. and Anderson, J. (2008). Tardiness bounds under global EDF scheduling on a multiprocessor. Real-Time Systems, 38(2):133–189.

Dickmanns, E. D., Behringer, R., Dickmanns, D., Hildebrandt, T., Maurer, M., Thomanek, F., and Schiehlen, J. (1994). The seeing passenger car “VaMoRs-P”. In Proceedings of the 1994 IEEE Intelligent Vehicles Symposium, pages 68–73.

Dickmanns, E. D. et al. (1997). Vehicles capable of dynamic vision. In Proceedings of 15th International Joint Conference on Artificial Intelligence, pages 1577–1592.

Dickmanns, E. D. and Zapp, A. (1987). Autonomous high speed road vehicle guidance by computer vision. IFAC Proceedings Volumes, 20(5):221–226.

Dijkstra, E. W. et al. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271.

Dong, Z., Liu, C., Gatherer, A., McFearin, L., Yan, P., and Anderson, J. (2017). Optimal dataflow scheduling on a heterogeneous multiprocessor with reduced response time bounds. In Proceedings of the 29th Euromicro Conference on Real-Time Systems, pages 15:1–15:22.

Doshi-Velez, F. and Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.

Duato, J., Peña, A. J., Silla, F., Mayo, R., and Quintana-Ortí, E. S. (2010). rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In Proceedings of the 2010 International Conference on High Performance Computing & Simulation, pages 224–231.

Eigen, D., Puhrsch, C., and Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems, pages 2366–2374.

Elliott, G., Kim, N., Erickson, J., Liu, C., and Anderson, J. (2014). Minimizing response times of automotive dataflows on multicore. In Proceedings of the 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 1–10.

Elliott, G., Ward, B., and Anderson, J. (2013). GPUSync: A framework for real-time GPU management. In Proceedings of the 34th IEEE International Real-Time Systems Symposium, pages 33–44.

Elliott, G., Yang, K., and Anderson, J. (2015). Supporting real-time computer vision workloads using OpenVX on multicore+GPU platforms. In Proceedings of the 36th IEEE International Real-Time Systems Symposium, pages 273–284.

Enzweiler, M. and Gavrila, D. M. (2011). A multilevel mixture-of-experts framework for pedestrian classification. IEEE Transactions on Image Processing, 20(10):2967–2979.

Erickson, J. and Anderson, J. (2011). Response time bounds for G-EDF without intra-task precedence constraints. In Proceedings of the 15th International Conference On Principles Of Distributed Systems, pages 128–142.

Erickson, J. and Anderson, J. (2012). Fair lateness scheduling: Reducing maximum lateness in G-EDF-like scheduling. In Proceedings of the 24th Euromicro Conference on Real-Time Systems, pages 3–11.

Erickson, J., Anderson, J., and Ward, B. (2014). Fair lateness scheduling: Reducing maximum lateness in G-EDF-like scheduling. Real-Time Systems, 50(1):5–47.

Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2015). The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136.

Everingham, M., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. (2009). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645.

Fernandez, R. (2020). Self-driving vehicle makers say California is looking at the wrong data to measure their progress. https://www.bizjournals.com/sanjose/news/2020/02/24/california-self-driving-car-disengagement-report.amp.html.

Fonseca, J. C., Nélis, V., Raravi, G., and Pinho, L. M. (2015). A multi-DAG model for real-time parallel applications with conditional execution. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, pages 1925–1932.

Forsberg, B., Marongiu, A., and Benini, L. (2017). GPUguard: Towards supporting a predictable execution model for heterogeneous SoC. In Proceedings of the 2017 Design, Automation & Test in Europe Conference & Exhibition, pages 318–321.

Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., Martinez-Gonzalez, P., and Garcia-Rodriguez, J. (2018). A survey on deep learning techniques for image and video semantic segmentation. Applied Soft Computing, 70:41–65.

Giebel, J., Gavrila, D. M., and Schnörr, C. (2004). A Bayesian framework for multi-cue 3D object tracking. In Proceedings of the 2004 European Conference on Computer Vision, pages 241–252.

Girshick, R. (2015). Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision, pages 1440–1448.

Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587.

Giunta, G., Montella, R., Agrillo, G., and Coviello, G. (2010). A GPGPU transparent virtualization component for high performance computing clouds. In Proceedings of the 2010 European Conference on Parallel Processing, pages 379–391.

González, A., Vázquez, D., López, A. M., and Amores, J. (2016). On-board object detection: Multi-cue, multimodal, and multiview random forest of local experts. IEEE Transactions on Cybernetics, 47(11):3980–3990.

González, D., Pérez, J., Milanés, V., and Nashashibi, F. (2015). A review of motion planning techniques for automated vehicles. IEEE Transactions on Intelligent Transportation Systems, 17(4):1135–1145.

Google Cloud (2020). Cloud GPUs. https://cloud.google.com/gpu.

Goswami, N., Shankar, R., Joshi, M., and Li, T. (2010). Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications. In Proceedings of the 2010 IEEE International Symposium on Workload Characterization, pages 1–10.

Green, M. (2000). “How long does it take to stop?” methodological analysis of driver perception-brake times. Transportation Human Factors, 2(3):195–216.

Gregg, C., Dorn, J., Hazelwood, K., and Skadron, K. (2012). Fine-grained resource sharing for concurrent GPGPU kernels. In Proceedings of the 4th USENIX Workshop on Hot Topics in Parallelism, pages 1–6.

Guizzo, E. (2011). How Google's self-driving car works. IEEE Spectrum, 18(7):1132–1141.

Gupta, K., Stuart, J. A., and Owens, J. D. (2012). A study of persistent threads style GPU programming for GPGPU workloads. In Proceedings of the 2012 Innovative Parallel Computing, pages 1–14.

Gupta, V., Gavrilovska, A., Schwan, K., Kharche, H., Tolia, N., Talwar, V., and Ranganathan, P. (2009). GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing, pages 17–24.

Gupta, V., Schwan, K., Tolia, N., Talwar, V., and Ranganathan, P. (2011). Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In Proceedings of the 2011 USENIX Annual Technical Conference, pages 3–17.

HajiRassouliha, A., Taberner, A. J., Nash, M. P., and Nielsen, P. M. (2018). Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms. Signal Processing: Image Communication, 68:101–119.

Harris, M. (2015). GPU pro tip: CUDA 7 streams simplify concurrency. https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/.

Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107.

Hata, A. and Wolf, D. (2014). Road marking detection using lidar reflective intensity data and its application to vehicle localization. In Proceedings of the 17th IEEE International Conference on Intelligent Transportation Systems, pages 584–589.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916.

Herrera, A. (2014). NVIDIA GRID: Graphics accelerated VDI with the visual performance of a workstation. Nvidia Corp, pages 1–18.

Horga, A., Chattopadhyay, S., Eles, P., and Peng, Z. (2018). Measurement based execution time analysis of GPGPU programs via SE+GA. In Proceedings of the 21st Euromicro Conference on Digital System Design, pages 30–37.

Houdek, P., Sojka, M., and Hanzálek, Z. (2017). Towards predictable execution model on ARM-based heterogeneous platforms. In Proceedings of the 26th IEEE International Symposium on Industrial Electronics, pages 1297–1302.

Hu, Q., Shu, J., Fan, J., and Lu, Y. (2016). Run-time performance estimation and fairness-oriented scheduling policy for concurrent GPGPU applications. In Proceedings of the 45th International Conference on Parallel Processing, pages 57–66.

Huang, C., Wu, B., and Nevatia, R. (2008). Robust object tracking by hierarchical association of detection responses. In European Conference on Computer Vision, pages 788–801.

Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pages 7310–7311.

Huang, Y.-J., Wu, H.-H., Chung, Y.-C., and Hsu, W.-C. (2016). Building a KVM-based hypervisor for a heterogeneous system architecture compliant system. In Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 3–15.

Huangfu, Y. and Zhang, W. (2017a). Static WCET analysis of GPUs with predictable warp scheduling. In Proceedings of the 20th IEEE International Symposium on Real-Time Distributed Computing, pages 101–108.

Huangfu, Y. and Zhang, W. (2017b). Warp-based load/store reordering to improve GPU time predictability. Journal of Computing Science and Engineering, 11(2):58–68.

Huangfu, Y. and Zhang, W. (2018). Estimating the worst-case execution time of the shared data cache in integrated CPU-GPU architectures. Journal of Computing Science and Engineering, 12(4):139–148.

Hughes, J. (2017). Car Autonomy Levels Explained. http://www.thedrive.com/sheetmetal/15724/what-are-these-levels-of-autonomy-anyway.

Hui, J. (2018). mAP (mean Average Precision) for object detection. https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173.

Hussain, R. and Zeadally, S. (2018). Autonomous cars: Research results, issues, and future challenges. IEEE Communications Surveys & Tutorials, 21(2):1275–1313.

Huynh, L. N., Lee, Y., and Balan, R. K. (2017). Deepmon: Mobile GPU-based deep learning framework for continuous vision applications. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 82–95.

Ibañez-Guzmán, J., Laugier, C., Yoder, J.-D., and Thrun, S. (2012). Autonomous driving: Context and state-of-the-art. In Handbook of Intelligent Vehicles, pages 1271–1310.

IEEE TCRTS (2020). Terminology and notation. https://site.ieee.org/tcrts/education/terminology-and-notation/.

ISO (2011). 26262: Road vehicles-functional safety. International Standard ISO/FDIS, 26262.

Jain, S., Baek, I., Wang, S., and Rajkumar, R. (2019). Fractional GPUs: Software-based compute and memory bandwidth reservation for GPUs. In Proceedings of the 25th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 29–41.

Janai, J., Güney, F., Behl, A., and Geiger, A. (2019). Computer vision for autonomous vehicles: Problems, datasets and state-of-the-art. arXiv preprint arXiv:1704.05519v2.

Janzén, J., Black-Schaffer, D., and Hugo, A. (2016). Partitioning GPUs for improved scalability. In Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, pages 42–49.

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 675–678.

Jia, Z., Maggioni, M., Staiger, B., and Scarpazza, D. P. (2018). Dissecting the NVIDIA Volta GPU architecture via microbenchmarking. arXiv preprint arXiv:1804.06826.

Jiang, X., Guan, N., Long, X., and Yi, W. (2017). Semi-federated scheduling of parallel real-time tasks on multiprocessors. In Proceedings of the 38th IEEE International Real-Time Systems Symposium, pages 80–91.

Jiang, X., Long, X., Guan, N., and Wan, H. (2016). On the decomposition-based global EDF scheduling of parallel real-time tasks. In Proceedings of the 37th IEEE International Real-Time Systems Symposium, pages 237–246.

Jochem, T. and Pomerleau, D. (1995). No hands across America. https://www.cs.cmu.edu/~tjochem/nhaa/nhaa_home_page.html.

Jog, A., Bolotin, E., Guz, Z., Parker, M., Keckler, S. W., Kandemir, M. T., and Das, C. R. (2014). Application-aware memory system for fair and efficient execution of concurrent GPGPU applications. In Proceedings of the 2014 Workshop on General Purpose Processing Using GPUs, pages 1–8.

Jog, A., Kayiran, O., Kesten, T., Pattnaik, A., Bolotin, E., Chatterjee, N., Keckler, S. W., Kandemir, M. T., and Das, C. R. (2015). Anatomy of GPU memory system for multi-application execution. In Proceedings of the 2015 International Symposium on Memory Systems, pages 223–234.

Jog, A., Kayiran, O., Pattnaik, A., Kandemir, M. T., Mutlu, O., Iyer, R., and Das, C. R. (2016). Exploiting core criticality for enhanced GPU performance. In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pages 351–363.

Kaehler, A. and Bradski, G. (2016). Learning OpenCV 3: computer vision in C++ with the OpenCV library. O'Reilly Media.

Kang, Y., Hauswald, J., Gao, C., Rovinski, A., Mudge, T., Mars, J., and Tang, L. (2017). Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. ACM SIGARCH Computer Architecture News, 45(1):615–629.

Kato, S., Lakshmanan, K., Kumar, A., Kelkar, M., Ishikawa, Y., and Rajkumar, R. (2011a). RGEM: a responsive GPGPU execution model for runtime engines. In Proceedings of the 32nd IEEE International Real-Time Systems Symposium, pages 57–66.

Kato, S., Lakshmanan, K., Rajkumar, R., and Ishikawa, Y. (2011b). TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the USENIX Annual Technical Conference, pages 17–30.

Kato, S., McThrow, M., Maltzahn, C., and Brandt, S. (2012). Gdev: First-class GPU resource management in the operating system. In Proceedings of the 2012 USENIX Annual Technical Conference, pages 401–412.

Kayıran, O., Jog, A., Kandemir, M. T., and Das, C. R. (2013). Neither more nor less: optimizing thread-level parallelism for GPGPUs. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pages 157–166.

Kendall, A., Hawke, J., Janz, D., Mazur, P., Reda, D., Allen, J.-M., Lam, V.-D., Bewley, A., and Shah, A. (2019). Learning to drive in a day. In Proceedings of the 2019 IEEE International Conference on Robotics and Automation, pages 8248–8254.

Kerr, A., Diamos, G., and Yalamanchili, S. (2009). A characterization and analysis of PTX kernels. In Proceedings of the 2009 IEEE International Symposium on Workload Characterization, pages 3–12.

Khronos Group (2019a). The OpenVX graph pipelining, streaming, and batch processing extension. https://www.khronos.org/registry/OpenVX/extensions/vx_khr_pipelining/1.1/html/vx_khr_pipelining_1_1_0.html.

Khronos Group (2019b). The OpenVX specification version 1.3. https://www.khronos.org/registry/OpenVX/specs/1.3/html/OpenVX_Specification_1_3.html.

Khronos Group (2020). OpenVX: Portable, power efficient vision processing. https://www.khronos.org/openvx/.

Kim, H., Patel, P., Wang, S., and Rajkumar, R. R. (2017). A server-based approach for predictable GPU access control. In Proceedings of the 23rd IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 1–10.

Kim, H., Patel, P., Wang, S., and Rajkumar, R. R. (2018). A server-based approach for predictable GPU access with improved analysis. Journal of Systems Architecture, 88:97–109.

Kim, J., Kim, H., Lakshmanan, K., and Rajkumar, R. (2013). Parallel scheduling for cyber-physical systems: Analysis and case study on a self-driving car. In Proceedings of the 4th ACM/IEEE International Conference on Cyber-Physical Systems, pages 31–40.

Kočić, J., Jovičić, N., and Drndarević, V. (2018). Sensors and sensor fusion in autonomous vehicles. In Proceedings of the 26th IEEE Telecommunications Forum, pages 420–425.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.

Kyriazis, G. (2012). Heterogeneous system architecture: A technical review. AMD Fusion Developer Summit, page 21.

Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., and Navab, N. (2016). Deeper depth prediction with fully convolutional residual networks. In Proceedings of the 4th International Conference on 3D Vision, pages 239–248.

Lakshmanan, K., Kato, S., and Rajkumar, R. (2010). Scheduling parallel real-time tasks on multi-core processors. In Proceedings of the 31st IEEE International Real-Time Systems Symposium, pages 259–268.

Lázaro-Muñoz, A., González-Linares, J. M., Gómez-Luna, J., and Guil, N. (2017). A tasks reordering model to reduce transfers overhead on GPUs. Journal of Parallel and Distributed Computing, 109:258–271.

Leal-Taixé, L., Canton-Ferrer, C., and Schindler, K. (2016). Learning by tracking: Siamese CNN for robust target association. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 33–40.

Leontyev, H. and Anderson, J. (2007). Tardiness bounds for FIFO scheduling on multiprocessors. In Proceedings of the 19th Euromicro Conference on Real-Time Systems, pages 71–80.

Leontyev, H. and Anderson, J. H. (2010). Generalized tardiness bounds for global multiprocessor scheduling. Real-Time Systems, 44(1-3):26–71.

Levinson, J., Montemerlo, M., and Thrun, S. (2007). Map-based precision vehicle localization in urban environments. In Robotics: Science and Systems, volume 3, pages 1–8.

Levinson, J. and Thrun, S. (2010). Robust vehicle localization in urban environments using probabilistic maps. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, pages 4372–4378.

Li, A., van den Braak, G.-J., Kumar, A., and Corporaal, H. (2015a). Adaptive and transparent cache bypassing for GPUs. In Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12.

Li, C., Song, S. L., Dai, H., Sidelnik, A., Hari, S. K. S., and Zhou, H. (2015b). Locality-driven dynamic GPU cache bypassing. In Proceedings of the 29th ACM on International Conference on Supercomputing, pages 67–77.

Li, D., Rhu, M., Johnson, D. R., O'Connor, M., Erez, M., Burger, D., Fussell, D. S., and Keckler, S. W. (2015c). Priority-based cache allocation in throughput processors. In Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, pages 89–100.

Li, J., Agrawal, K., Lu, C., and Gill, C. (2013). Analysis of global EDF for parallel tasks. In Proceedings of the 25th Euromicro Conference on Real-Time Systems, pages 3–13.

Li, J., Ferry, D., Ahuja, S., Agrawal, K., Gill, C., and Lu, C. (2017). Mixed-criticality federated scheduling for parallel real-time tasks. Real-Time Systems, 53(5):760–811.

Li, J., Saifullah, A., Agrawal, K., Gill, C., and Lu, C. (2014). Analysis of federated and global scheduling for parallel real-time tasks. In Proceedings of the 26th Euromicro Conference on Real-Time Systems, pages 85–96.

Lin, S.-C., Zhang, Y., Hsu, C.-H., Skach, M., Haque, M. E., Tang, L., and Mars, J. (2018). The architectural implications of autonomous driving: Constraints and acceleration. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, pages 751–766.

Lin, Z., Nyland, L., and Zhou, H. (2016). Enabling efficient preemption for simt architectures with lightweight context switching. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 898–908.

LITMUSRT Project (2018). http://www.litmus-rt.org/.

Liu, C. and Anderson, J. (2010). Supporting soft real-time DAG-based systems on multiprocessors with no utilization loss. In Proceedings of the 31st IEEE International Real-Time Systems Symposium, pages 3–13.

Liu, C. and Anderson, J. H. (2011). Supporting graph-based real-time applications in distributed systems. In Proceedings of the 17th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, volume 1, pages 143–152.

Liu, C. and Layland, J. (1973). Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20(1):46–61.

Liu, J. W. (2000). Real-Time Systems. Prentice Hall.

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In Proceedings of the 2016 European Conference on Computer Vision, pages 21–37.

Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.

Luettel, T., Himmelsbach, M., and Wuensche, H.-J. (2012). Autonomous ground vehicles—concepts and a path to the future. Proceedings of the IEEE, 100(Special Centennial Issue):1831–1839.

Maddern, W., Pascoe, G., Linegar, C., and Newman, P. (2017). 1 year, 1000 km: The Oxford RobotCar dataset. The International Journal of Robotics Research, 36(1):3–15.

Maia, C., Bertogna, M., Nogueira, L., and Pinho, L. M. (2014). Response-time analysis of synchronous parallel tasks in multiprocessor systems. In Proceedings of the 22nd International Conference on Real-Time Networks and Systems, pages 3–12.

Mao, M., Wen, W., Liu, X., Hu, J., Wang, D., Chen, Y., and Li, H. (2016). Temp: thread batch enabled memory partitioning for GPU. In Proceedings of the 53rd Annual Design Automation Conference, pages 1–6.

McAllister, R., Gal, Y., Kendall, A., Van Der Wilk, M., Shah, A., Cipolla, R., and Weller, A. (2017). Concrete problems for autonomous vehicle safety: Advantages of Bayesian deep learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pages 4745–4753.

McManus, C., Churchill, W., Napier, A., Davis, B., and Newman, P. (2013). Distraction suppression for vision-based pose estimation at city scales. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, pages 3762–3769.

Mei, X. and Chu, X. (2016). Dissecting GPU memory hierarchy through microbenchmarking. IEEE Transactions on Parallel and Distributed Systems, 28(1):72–86.

Melani, A., Bertogna, M., Bonifaci, V., Marchetti-Spaccamela, A., and Buttazzo, G. (2016). Schedulability analysis of conditional parallel task graphs in multicore systems. IEEE Transactions on Computers, 66(2):339–353.

Melani, A., Bertogna, M., Bonifaci, V., Marchetti-Spaccamela, A., and Buttazzo, G. C. (2015). Response-time analysis of conditional DAG tasks in multiprocessor systems. In Proceedings of the 27th Euromicro Conference on Real-Time Systems, pages 211–221.

Meltzer, R., Zeng, C., and Cecka, C. (2013). Micro-benchmarking the c2070. In GPU Technology Conference.

Menychtas, K., Shen, K., and Scott, M. L. (2013). Enabling OS research by inferring interactions in the black-box GPU stack. In Proceedings of the 2013 USENIX Annual Technical Conference, pages 291–296.

Menychtas, K., Shen, K., and Scott, M. L. (2014). Disengaged scheduling for fair, protected access to fast computational accelerators. ACM SIGARCH Computer Architecture News, 42(1):301–316.

Microsoft Azure (2020). GPU optimized virtual machine sizes. https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu.

Milan, A., Rezatofighi, S. H., Dick, A., Reid, I., and Schindler, K. (2017). Online multi-target tracking using recurrent neural networks. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 4225–4232.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.

Möbus, R. and Kolbe, U. (2004). Multi-target multi-object tracking, sensor fusion of radar and infrared. In Proceedings of the 2004 IEEE Intelligent Vehicles Symposium, pages 732–737.

Mohan, A., Papageorgiou, C., and Poggio, T. (2001). Example-based object detection in images by components. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(4):349–361.

Mohri, M., Rostamizadeh, A., and Talwalkar, A. (2018). Foundations of machine learning. MIT press.

Mok, A. K.-L. (1983). Fundamental design problems of distributed systems for the hard-real-time environment. PhD thesis, Massachusetts Institute of Technology.

Montemerlo, M., Becker, J., Bhat, S., Dahlkamp, H., Dolgov, D., Ettinger, S., Haehnel, D., Hilden, T., Hoffmann, G., Huhnke, B., et al. (2008). Junior: The Stanford entry in the urban challenge. Journal of Field Robotics, 25(9):569–597.

Montgomery, W. D., Mudge, R., Groshen, E. L., Helper, S., MacDuffie, J. P., and Carson, C. (2018). America's workforce and the self-driving future: Realizing productivity gains and spurring economic growth. https://trid.trb.org/view/1516782.

Mujtaba, H. (2018). NVIDIA Drive Xavier SoC Detailed. https://wccftech.com/nvidia-drive-xavier-soc-detailed/.

Muller, U., Ben, J., Cosatto, E., Flepp, B., and Cun, Y. L. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in Neural Information Processing Systems, pages 739–746.

Nasri, M., Nelissen, G., and Brandenburg, B. B. (2019). Response-time analysis of limited-preemptive parallel DAG tasks under global scheduling. In Proceedings of the 31st Euromicro Conference on Real-Time Systems, pages 21:1–21:23.

National Highway Traffic Safety Administration (2015). Critical reasons for crashes investigated in the national motor vehicle crash causation survey. Technical report. https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115.

Nelissen, G., Berten, V., Goossens, J., and Milojevic, D. (2012). Techniques optimizing the number of processors to schedule multi-threaded tasks. In Proceedings of the 24th Euromicro Conference on Real-Time Systems, pages 321–330.

Nurvitadhi, E., Sim, J., Sheffield, D., Mishra, A., Krishnan, S., and Marr, D. (2016). Accelerating recurrent neural networks in analytics servers: Comparison of fpga, cpu, gpu, and asic. In Proceedings of the 26th International Conference on Field Programmable Logic and Applications, pages 1–4.

NVIDIA (2009). Fermi computer architecture whitepaper, v1.1.

NVIDIA (2012). Kepler GK110/210 computer architecture whitepaper, v1.1.

NVIDIA (2016a). Get under the hood of Parker, our newest SoC for autonomous vehicles. https://blogs.nvidia.com/blog/2016/08/22/parker-for-self-driving-cars/.

NVIDIA (2016b). GP100 Pascal whitepaper, v1.1.

NVIDIA (2017a). GV100 GPU hardware architecture in-depth, v1.1.

NVIDIA (2017b). Partner innovation: Accelerating automotive breakthroughs. http://www.nvidia.com/object/automotive-partner-innovation.html.

NVIDIA (2019a). CUDA C++ best practices guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

NVIDIA (2019b). CUDA C++ programming guide, v10.2.89. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.

NVIDIA (2019c). CUDA toolkit documentation v10.2.89. http://docs.nvidia.com/cuda/.

NVIDIA (2019d). CUPTI: The CUDA profiling tools interface. https://docs.nvidia.com/cuda/cupti/index.html.

NVIDIA (2019e). Multi-process service. https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf.

NVIDIA (2020a). Embedded systems for next-generation autonomous machines. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/.

NVIDIA (2020b). NVIDIA DRIVE AGX Developer Kit. https://developer.nvidia.com/drive/drive-agx.

NVIDIA (2020c). Parallel thread execution ISA version 7.0. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html.

Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., and Mordvintsev, A. (2018). The building blocks of interpretability. https://distill.pub/2018/building-blocks/.

OpenCV (2020). OpenCV 4.0. https://opencv.org/opencv-4-0/.

Otterness, N., Yang, M., Amert, T., Anderson, J., and Smith, F. D. (2017a). Inferring the scheduling policies of an embedded CUDA GPU. In Proceedings of the 13th Annual Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pages 47–52.

Otterness, N., Yang, M., Rust, S., Park, E., Anderson, J. H., Smith, F. D., Berg, A., and Wang, S. (2017b). An evaluation of the NVIDIA TX1 for supporting real-time computer-vision workloads. In Proceedings of the 23rd IEEE Real-Time and Embedded Technology and Applications Symposium, pages 353–364.

Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., and Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5):879–899.

Paden, B., Čáp, M., Yong, S. Z., Yershov, D., and Frazzoli, E. (2016). A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent Vehicles, 1(1):33–55.

Pai, S., Thazhuthaveetil, M. J., and Govindarajan, R. (2013). Improving GPGPU concurrency with elastic kernels. ACM SIGARCH Computer Architecture News, 41(1):407–418.

Papageorgiou, C. and Poggio, T. (2000). A trainable system for object detection. International Journal of Computer Vision, 38(1):15–33.

Papp, Z., Brown, C., and Bartels, C. (2008). World modeling for cooperative intelligent vehicles. In Proceedings of the 2008 IEEE Intelligent Vehicles Symposium, pages 1050–1055.

Park, J. J. K., Park, Y., and Mahlke, S. (2015). Chimera: Collaborative preemption for multitasking on a shared gpu. ACM SIGARCH Computer Architecture News, 43(1):593–606.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035.

Patole, S. M., Torlak, M., Wang, D., and Ali, M. (2017). Automotive radars: A review of signal processing techniques. IEEE Signal Processing Magazine, 34(2):22–35.

Phull, R., Li, C.-H., Rao, K., Cadambi, H., and Chakradhar, S. (2012). Interference-driven resource management for GPU-based heterogeneous clusters. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, pages 109–120.

Pomerleau, D. A. (1989). Alvinn: An autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems, pages 305–313.

Pomerleau, D. A. (1992). Neural network perception for mobile robot guidance. PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Qamhieh, M., Fauberteau, F., George, L., and Midonnet, S. (2013). Global EDF scheduling of directed acyclic graphs on multiprocessor systems. In Proceedings of the 21st International Conference on Real-Time Networks and Systems, pages 287–296.

Qamhieh, M., George, L., and Midonnet, S. (2014). A stretching algorithm for parallel real-time DAG tasks on multiprocessor systems. In Proceedings of the 22nd International Conference on Real-Time Networks and Systems, pages 13–22.

Qu, X., Soheilian, B., and Paparoditis, N. (2015). Vehicle localization using mono-camera and geo-referenced traffic signs. In Proceedings of the 2015 IEEE Intelligent Vehicles Symposium, pages 605–610.

Rainey, E., Villarreal, J., Dedeoglu, G., Pulli, K., Lepley, T., and Brill, F. (2014). Addressing system-level optimization with OpenVX graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 658–663.

Ramchandani, C. (1973). Analysis of asynchronous concurrent systems by timed Petri nets. PhD thesis, Massachusetts Institute of Technology.

Ranganathan, A., Ilstrup, D., and Wu, T. (2013). Light-weight localization for vehicles using road markings. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 921–927.

Rasshofer, R. H. and Gresser, K. (2005). Automotive radar and lidar systems for next generation driver assistance functions. Advances in Radio Science, 3.

Ravi, V. T., Becchi, M., Agrawal, G., and Chakradhar, S. (2011). Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework. In Proceedings of the 20th International Symposium on High Performance Distributed Computing, pages 217–228.

Redmon, J. (2013–2016). Darknet: Open source neural networks in C. http://pjreddie.com/darknet/.

Redmon, J. (2017). YOLO: Real-time object detection. https://pjreddie.com/darknet/yolov2/.

Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788.

Redmon, J. and Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271.

Renda, A. (2018). No need for speed when it comes to autonomous vehicles. https://www.forbes.com/sites/washingtonbytes/2018/03/20/no-need-for-speed-when-it-comes-to-autonomous-vehicles.

Rodvold, D. M. (1999). A software development process model for artificial neural networks in critical applications. In Proceedings of the 1999 IEEE International Joint Conference on Neural Networks, volume 5, pages 3317–3322.

Ross, P. E. (2017). The Audi A8: the world’s first production car to achieve level 3 autonomy. IEEE Spectrum. https://spectrum.ieee.org/cars-that-think/transportation/self-driving/the-audi-a8-the-worlds-first-production-car-to-achieve-level-3-autonomy.

Rossbach, C. J., Currey, J., Silberstein, M., Ray, B., and Witchel, E. (2011). PTask: Operating system abstractions to manage GPUs as compute devices. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles, pages 233–248.

SAE (2018). Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. https://doi.org/10.4271/J3016_201806.

Saha, S. K., Xiang, Y., and Kim, H. (2019). STGM: Spatio-temporal GPU management for real-time tasks. In Proceedings of the 25th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 1–6.

Saifullah, A., Ferry, D., Li, J., Agrawal, K., Lu, C., and Gill, C. D. (2014). Parallel real-time scheduling of DAGs. IEEE Transactions on Parallel and Distributed Systems, 25(12):3242–3252.

Saifullah, A., Li, J., Agrawal, K., Lu, C., and Gill, C. (2013). Multi-core real-time scheduling for generalized parallel task models. Real-Time Systems, 49(4):404–435.

Salay, R., Queiroz, R., and Czarnecki, K. (2017). An analysis of ISO 26262: Using machine learning safely in automotive software. arXiv preprint arXiv:1709.02435.

Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76.

Saxena, A., Chung, S. H., and Ng, A. Y. (2006). Learning depth from single monocular images. In Advances in Neural Information Processing Systems, pages 1161–1168.

Sañudo, I., Capodieci, N., Luis, J., Garcia, M., Marongiu, A., and Bertogna, M. (2020). Dissecting the CUDA scheduling hierarchy: A performance and predictability perspective. In Proceedings of the 26th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 213–225.

Schulter, S., Vernaza, P., Choi, W., and Chandraker, M. (2017). Deep network flow for multi-object tracking. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pages 6951–6960.

Serrano, M. A., Melani, A., Bertogna, M., and Quiñones, E. (2016). Response-time analysis of DAG tasks under fixed priority scheduling with limited preemptions. In Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, pages 1066–1071.

Serrano, M. A., Melani, A., Kehr, S., Bertogna, M., and Quiñones, E. (2017). An analysis of lazy and eager limited preemption approaches under DAG-based global fixed priority scheduling. In Proceedings of the 20th IEEE International Symposium on Real-Time Distributed Computing, pages 193–202.

Sethia, A. and Mahlke, S. (2014). Equalizer: Dynamic tuning of GPU resources for efficient execution. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 647–658.

Shaffer, B. (2017). Why are automotive radar systems moving from 24GHz to 77GHz? https://e2e.ti.com/blogs_/b/behind_the_wheel/archive/2017/10/25/why-are-automotive-radar-systems-moving-from-24ghz-to-77ghz.

Shelhamer, E., Rakelly, K., Hoffman, J., and Darrell, T. (2016). Clockwork convnets for video semantic segmentation. In Proceedings of the 2016 European Conference on Computer Vision, pages 852–868.

Shi, J. and Tomasi, C. (1994). Good features to track. In Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, pages 593–600.

Shi, L., Chen, H., Sun, J., and Li, K. (2011). vCUDA: GPU-accelerated high-performance computing in virtual machines. IEEE Transactions on Computers, 61(6):804–816.

Spuri, M. and Buttazzo, G. C. (1994). Efficient aperiodic service under earliest deadline scheduling. In Proceedings of the 25th IEEE International Real-Time Systems Symposium, pages 2–11.

Stuart, J. A. and Owens, J. D. (2009). Message passing on data-parallel architectures. In Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1–12.

Suhr, J. K., Jang, J., Min, D., and Jung, H. G. (2016). Sensor fusion-based low-cost vehicle localization system for complex urban environments. IEEE Transactions on Intelligent Transportation Systems, 18(5):1078–1086.

Sukkarieh, S., Nebot, E. M., and Durrant-Whyte, H. F. (1999). A high integrity IMU/GPS navigation loop for autonomous land vehicle applications. IEEE Transactions on Robotics and Automation, 15(3):572–578.

Sun, W., Ricci, R., and Curry, M. L. (2012). GPUstore: harnessing GPU computing for storage systems in the OS kernel. In Proceedings of the 5th Annual International Systems and Storage Conference, pages 1–12.

Sundaram, S. (2016). DRIVE PX 2: Self driving car computer. https://www.slideshare.net/shri123/drive-px-2.

Suzuki, Y., Fujii, Y., Azumi, T., Nishio, N., and Kato, S. (2016). Real-time GPU resource management with loadable kernel modules. IEEE Transactions on Parallel and Distributed Systems, 28(6):1715–1727.

Suzuki, Y., Kato, S., Yamada, H., and Kono, K. (2014). GPUvm: Why not virtualizing GPUs at the hypervisor? In Proceedings of the 2014 USENIX Annual Technical Conference, pages 109–120.

Tanasic, I., Gelado, I., Cabezas, J., Ramirez, A., Navarro, N., and Valero, M. (2014). Enabling preemptive multiprogramming on GPUs. ACM SIGARCH Computer Architecture News, 42(3):193–204.

Tang, S., Andres, B., Andriluka, M., and Schiele, B. (2016). Multi-person tracking by multicut and deep matching. In Proceedings of the 2016 European Conference on Computer Vision, pages 100–111.

Texas Instruments (2018). TMS320C6000 DSP Library (DSPLIB). https://www.ti.com/tool/SPRC265.

The Apache Software Foundation (ASF) (2020). Apache MXNet: A flexible and efficient library for deep learning. https://mxnet.apache.org/.

Thorpe, C., Hebert, M. H., Kanade, T., and Shafer, S. A. (1988). Vision and navigation for the Carnegie-Mellon Navlab. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(3):362–373.

Thorpe, C., Hebert, M., Kanade, T., and Shafer, S. (1991a). Toward autonomous driving: The CMU Navlab. Part I: Perception. IEEE Expert, 6(4):31–42.

Thorpe, C., Hebert, M., Kanade, T., and Shafer, S. (1991b). Toward autonomous driving: The CMU Navlab. Part II: Architecture and systems. IEEE Expert, 6(4):44–52.

Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P., Gale, J., Halpenny, M., Hoffmann, G., et al. (2006). Stanley: The robot that won the DARPA Grand Challenge. Journal of Field Robotics, 23(9):661–692.

Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015). Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision, pages 4489–4497.

Tzeng, S., Patney, A., and Owens, J. D. (2010). Task management for irregular-parallel workloads on the GPU. In Proceedings of the 2010 Conference on High Performance Graphics, pages 29–37.

Ubal, R., Jang, B., Mistry, P., Schaa, D., and Kaeli, D. (2012). Multi2Sim: A simulation framework for CPU-GPU computing. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pages 335–344.

Uijlings, J. R., Van De Sande, K. E., Gevers, T., and Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171.

Ukidave, Y., Kalra, C., Kaeli, D., Mistry, P., and Schaa, D. (2014). Runtime support for adaptive spatial partitioning and inter-kernel communication on GPUs. In Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, pages 168–175.

Ukidave, Y., Li, X., and Kaeli, D. (2016). Mystic: Predictive scheduling for GPU-based cloud servers using machine learning. In Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, pages 353–362.

Ulmer, B. (1994). VITA II—active collision avoidance in real traffic. In Proceedings of the 1994 IEEE Intelligent Vehicles Symposium, pages 1–6.

Urmson, C., Anhalt, J., Bagnell, D., Baker, C., Bittner, R., Clark, M., Dolan, J., Duggins, D., Galatali, T., Geyer, C., et al. (2008). Autonomous driving in urban environments: Boss and the urban challenge. Journal of Field Robotics, 25(8):425–466.

Urmson, C., Ragusa, C., Ray, D., Anhalt, J., Bartz, D., Galatali, T., Gutierrez, A., Johnston, J., Harbaugh, S., “Yu” Kato, H., et al. (2006). A robust approach to high-speed navigation for unrehearsed desert terrain. Journal of Field Robotics, 23(8):467–508.

Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Conference on Computer Vision and Pattern Recognition, pages 511–518.

Volkov, V. and Demmel, J. W. (2008). Benchmarking GPUs to tune dense linear algebra. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pages 1–11.

Wang, G., Lin, Y., and Yi, W. (2010). Kernel fusion: An effective method for better power efficiency on multithreaded GPU. In Proceedings of the 2010 IEEE/ACM International Conference on Green Computing and Communications & International Conference on Cyber, Physical and Social Computing, pages 344–350.

Wang, Z., Yang, J., Melhem, R., Childers, B., Zhang, Y., and Guo, M. (2016). Simultaneous multikernel GPU: Multi-tasking throughput processors via fine-grained sharing. In Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, pages 358–369.

Warren, M. E. (2019). Automotive lidar technology. In Proceedings of the 2019 IEEE Symposium on VLSI Circuits, pages C254–C255.

Waymo (2020). Waymo safety report – on the road to fully self-driving. https://waymo.com/safety/.

Wei, J., Snider, J. M., Kim, J., Dolan, J. M., Rajkumar, R., and Litkouhi, B. (2013). Towards a viable autonomous driving research platform. In Proceedings of the 2013 IEEE Intelligent Vehicles Symposium, pages 763–770.

Wen, Y., O’Boyle, M. F., and Fensch, C. (2018). MaxPair: enhance OpenCL concurrent kernel execution by weighted maximum matching. In Proceedings of the 11th Workshop on General Purpose GPUs, pages 40–49.

Wende, F., Cordes, F., and Steinke, T. (2012). On improving the performance of multi-threaded CUDA appli- cations with concurrent kernel execution by kernel reordering. In Proceedings of the 2012 Symposium on Application Accelerators in High Performance Computing, pages 74–83.

Williams, M. (1988). PROMETHEUS—the European research programme for optimising the road transport system in Europe. In IEE Colloquium on Driver Information, pages 1/1–1/9.

Wolcott, R. W. and Eustice, R. M. (2014). Visual localization within lidar maps for automated urban driving. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 176–183.

Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., and Moshovos, A. (2010). Demystifying GPU mi- croarchitecture through microbenchmarking. In Proceedings of the 2010 IEEE International Symposium on Performance Analysis of Systems & Software, pages 235–246.

Wu, B., Chen, G., Li, D., Shen, X., and Vetter, J. (2015). Enabling and exploiting flexible task assignment on GPU through SM-centric program transformations. In Proceedings of the 29th ACM on International Conference on Supercomputing, pages 119–130.

Wu, B. and Nevatia, R. (2007). Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors. International Journal of Computer Vision, 75(2):247–266.

Xiang, Y. and Kim, H. (2019). Pipelined data-parallel CPU/GPU scheduling for multi-DNN real-time inference. In Proceedings of the 40th IEEE International Real-Time Systems Symposium, pages 392– 405.

X.org Foundation (2014). Memory mapped I/O trace. https://nouveau.freedesktop.org/wiki/MmioTrace/.

X.org Foundation (2020). Nouveau: Accelerated open source driver for nVidia cards. https://nouveau.freedesktop.org/.

Xu, Q., Jeon, H., Kim, K., Ro, W. W., and Annavaram, M. (2016a). Warped-slicer: efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming. In Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, pages 230–242.

Xu, Y., Wang, R., Li, T., Song, M., Gao, L., Luan, Z., and Qian, D. (2016b). Scheduling tasks with mixed timing constraints in GPU-powered real-time systems. In Proceedings of the 2016 International Conference on Supercomputing, pages 1–13.

184 Yang, K. and Anderson, J. (2014a). Optimal GEDF-based schedulers that allow intra-task parallelism on heterogeneous multiprocessors. In Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-Time Multimedia, pages 30–39.

Yang, K. and Anderson, J. H. (2014b). Optimal GEDF-based schedulers that allow intra-task parallelism on heterogeneous multiprocessors. In Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-time Multimedia, pages 30–39.

Yang, K., Elliott, G., and Anderson, J. (2015). Analysis for supporting real-time computer vision workloads using OpenVX on multicore+GPU platforms. In Proceedings of the 23rd International Conference on Real-Time Networks and Systems, pages 77–86.

Yang, K., Yang, M., and Anderson, J. (2016a). Reducing response-time bounds for DAG-based task systems on heterogeneous multicore platforms. In Proceedings of the 24th International Conference on Real-Time Networks and Systems, pages 349–358.

Yang, K., Yang, M., and Anderson, J. H. (2016b). Reducing response-time bounds for DAG-based task systems on heterogeneous multicore platforms. In Proceedings of the 24th International Conference on Real-Time Networks and Systems, pages 349–358.

Yang, M., Amert, T., Yang, K., Otterness, N., Anderson, J. H., Smith, F. D., and Wang, S. (2018a). Making OpenVX really “real time”. In Proceedings of the 39th IEEE International Real-Time Systems Symposium, pages 80–93.

Yang, M., Otterness, N., Amert, T., Bakita, J., Anderson, J. H., and Smith, F. D. (2018b). Avoiding pitfalls when using NVIDIA GPUs for real-time tasks in autonomous systems. In Proceedings of the 30th Euromicro Conference on Real-Time Systems, pages 20:1–20:21.

Yang, M., Wang, S., Bakita, J., Vu, T., Smith, F. D., Anderson, J. H., and Frahm, J.-M. (2019). Re- thinking CNN frameworks for time-sensitive autonomous-driving applications: Addressing an industrial challenge. In Proceedings of the 25th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 305–317.

Yurtsever, E., Lambert, J., Carballo, A., and Takeda, K. (2019). A survey of autonomous driving: common practices and emerging technologies. arXiv preprint arXiv:1906.05113.

Zhang, F., Stähle, H., Chen, G., Simon, C. C. C., Buckl, C., and Knoll, A. (2012). A sensor fusion approach for localization with cumulative error elimination. In Proceedings of the 2012 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pages 1–6.

Zhao, Z.-Q., Zheng, P., Xu, S.-t., and Wu, X. (2019). Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11):3212–3232.

Zhong, J. and He, B. (2013). Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling. IEEE Transactions on Parallel and Distributed Systems, 25(6):1522–1532.

Zhou, H., Bateni, S., and Liu, C. (2018). S3DNN: Supervised streaming and scheduling for GPU-accelerated real-time DNN workloads. In Proceedings of the 24th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 190–201.

Zhou, H., Tong, G., and Liu, C. (2015). GPES: A preemptive execution system for GPGPU computing. In Proceedings of the 21st IEEE Real-Time and Embedded Technology and Applications Symposium, pages 87–97.

Zhou, Y. and Tuzel, O. (2018). VoxelNet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, pages 4490–4499.

Ziegler, J., Bender, P., Schreiber, M., Lategahn, H., Strauss, T., Stiller, C., Dang, T., Franke, U., Appenrodt, N., Keller, C. G., et al. (2014). Making Bertha drive—an autonomous journey on a historic route. IEEE Intelligent Transportation Systems Magazine, 6(2):8–20.

Zou, Z., Shi, Z., Guo, Y., and Ye, J. (2019). Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055.
