Portland State University PDXScholar

Dissertations and Theses

2010 Scalable event tracing on high-end parallel systems

Kathryn Marie Mohror Portland State University

Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds

Part of the Systems Architecture Commons. Let us know how access to this document benefits you.

Recommended Citation: Mohror, Kathryn Marie, "Scalable event tracing on high-end parallel systems" (2010). Dissertations and Theses. Paper 2811. https://doi.org/10.15760/etd.2805

This Dissertation is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected].

DISSERTATION APPROVAL

The abstract and dissertation of Kathryn Marie Mohror for the Doctor of Philosophy in Computer Science were presented December 11, 2009, and accepted by the dissertation committee and the doctoral program.

COMMITTEE APPROVALS:

Christopher M. Monsere Representative of the Office of Graduate Studies

DOCTORAL PROGRAM APPROVAL: Wu-chi Feng, Director, Computer Science Ph.D. Program

ABSTRACT

An abstract of the dissertation of Kathryn Marie Mohror for the Doctor of Philosophy in Computer Science presented December 11, 2009.

Title: Scalable Event Tracing on High-End Parallel Systems

Accurate performance analysis of high-end systems requires event-based traces to correctly identify the root cause of a number of the complex performance problems that arise on these highly parallel systems. These high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Unfortunately, the collection of event-based data presents scalability challenges itself: the large volume of collected data increases tool overhead, and results in data files that are difficult to store and analyze. Our solution to these problems is a new measurement technique called trace profiling that collects the information needed to diagnose performance problems that traditionally require traces, but at a greatly reduced data volume. The trace profiling technique reduces the amount of data measured and stored by capitalizing on the repeated behavior of programs, and on the similarity of the behavior and performance of parallel processes in an application run. Trace profiling is a hybrid between profiling and tracing, collecting summary information about the event patterns in an application run. Because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that comprised most of the execution time.

SCALABLE EVENT TRACING ON HIGH-END PARALLEL SYSTEMS

by

KATHRYN MARIE MOHROR

A dissertation submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE

Portland State University ©2010

Acknowledgments

I thank my dissertation advisor, Karen L. Karavanic, for her extraordinary guidance, understanding, and support over these years. Because of her efforts, I am solidly prepared for my future research career. I know that at this moment I sincerely appreciate all she has done for me, as well as I know that my appreciation will only grow over time as I begin to fully realize the depth of her commitment to my success.

Thank you, Karen.

I thank my dissertation committee (Jingke Li, Suresh Singh, Bryant York, and Christopher Monsere) for taking the time to provide thoughtful feedback and advice on my research. Your expertise made my dissertation stronger.

Thank you to my fellow students in the High Performance Computing Lab at Portland State University, and most especially to Rashawn Knapp, for your camaraderie, your willingness to be a sounding board for new ideas, and your ability to sit through countless practice talks.

I thank John May and Lawrence Livermore National Laboratory for giving me opportunities for collaboration and for access to LLNL computing resources.

Last, but certainly not least, I thank my husband and family for their unwavering support during my time in school. I know I was sometimes a very distracted wife, mother, daughter, sister, but you all supported my goals regardless. Thank you for cheering me on during the good times and cheering me up during the bad. If it weren't for you, the road would have been much more difficult.

Table of Contents

Acknowledgments
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.1.1 Uses of Event Tracing
1.1.2 Summary
1.2 Dissertation Contributions
1.3 Dissertation Organization
2 Related Work
2.1 Perturbation
2.2 Trace File Size Reduction
2.2.1 Trace File Compression
2.2.2 Measuring or Writing Less Data
2.3 Analysis Tool and Visualization Scalability
3 Study of Tracing Overheads
3.1 Experiment Design
3.2 Results
3.2.1 Event Counts and Trace File Sizes
3.2.2 Execution Time
3.2.3 Execution Time vs. Event Counts
3.3 Conclusions
4 Trace Profiling
4.1 Background
4.2 Trace Profiling Technique
4.2.1 Trace Segmentation
4.2.2 Intra-process Segment Comparison
4.2.3 Inter-process Segment Comparison
4.3 Trace Profile Segment Comparison Methods
4.3.1 Distance Methods
4.3.2 Iteration-based Methods
4.4 Traditional Trace and Trace Profile Size Models
4.4.1 Traditional Trace
4.4.2 Trace Profile
4.5 Traditional Trace and Trace Profile Size Comparison Using Models
4.5.1 Traditional Trace
4.5.2 Trace Profile
4.5.3 Comparison of Traditional Trace and Trace Profile
4.6 Trace Profiling and Visualization
4.7 Summary
5 Trace Comparison Methods

5.1 Evaluation Methodology
5.1.1 Benchmarks
5.1.2 Application
5.1.3 Instrumentation
5.1.4 Evaluation Criteria
5.2 Intra-process Reduction Evaluation Studies
5.2.1 Threshold Study
5.2.2 Comparative Study
5.3 Inter-process Reduction Evaluation Studies
5.3.1 Threshold Study
5.3.2 Comparative Study
5.4 Combined Inter-process and Intra-process Reduction Evaluation
5.4.1 Size and Degree of Matching
5.4.2 Approximation Distance
5.4.3 Retention of Trends
5.4.4 Discussion
5.5 Discussion
5.5.1 Trace Similarity Methods
5.5.2 Intra- and Inter-process Matching
5.6 Summary
6 Prototype Runtime Trace Profiler
6.1 Current Prototype Implementation
6.1.1 Trace Profiler Front End
6.1.2 Trace Profiler Instrumentation Library
6.1.3 Runtime Operations
6.2 Experimental Setup
6.2.1 Application
6.2.2 Machine
6.2.3 Tool Configurations
6.3 Results
6.3.1 Execution Time
6.3.2 Total File Size
6.3.3 Flushes
6.4 Discussion
7 Conclusions
7.1 Future Work
7.1.1 Performance Tool Memory Bounds
7.1.2 Trace Profiler Measurement Overheads
8 References
Appendix: Additional Trace Similarity Study Results

List of Figures

Figure 1 Performance of Uninstrumented Executions
Figure 2 Experiment Environment
Figure 3 Performance of Instrumented Executions
Figure 4 Tracing Overhead with Maximum Event Count in a Single Rank
Figure 5 Data from Traditional Trace
Figure 6 Process Group
Figure 7 Segment Context Marking
Figure 8 Algorithm for Intra-process Segment Matching
Figure 9 Intra-process Segment Matching
Figure 10 Inter-process Segment Matching
Figure 11 Algorithm for Inter-process Matching
Figure 12 Wavelet Transform Example
Figure 13 Trace Profile Format
Figure 14 Input Data to Traditional Trace Size Model
Figure 15 Inputs to Trace Profile Size Model
Figure 16 Traditional Trace and Trace Profiling Sizes for Random-Barrier
Figure 17 Trace Profiler Visualization
Figure 18 KOJAK and Derivation of Our Performance Diagnosis Representation
Figure 19 Intra-process Reduction: Percentage File Sizes and Degree of Matching
Figure 20 Intra-process Reduction: Approximation Distance Results for All Methods at Default Thresholds
Figure 21 Intra-process Reduction: KOJAK Performance Trends for dyn_load_balance for Each Method at Default Thresholds
Figure 22 Intra-process Reduction: KOJAK Performance Trends for ltolr_1024 for Each Method at Default Thresholds
Figure 23 Inter-process Reduction: Percentage File Sizes for Methods at Default Thresholds
Figure 24 Inter-process Reduction: Degree of Matching for Methods at Default Thresholds
Figure 25 Inter-process Reduction: Approximation Distance for the Methods at Default Thresholds
Figure 26 Inter-process Reduction: KOJAK Performance Trends for early_gather for Each Method at Default Thresholds
Figure 27 Inter-process Reduction: KOJAK Performance Trends for NtoN_1024 for Each Method at Default Thresholds
Figure 28 Combined Reduction: Percentage File Sizes for Methods at Default Thresholds
Figure 29 Combined Reduction: Degree of Matching for Methods at Default Thresholds
Figure 30 Combined Reduction: Approximation Distance for Methods at Default Thresholds
Figure 31 Combined Reduction: KOJAK Performance Trends for NtoN_32 for Each Method at Default Thresholds
Figure 32 Combined Reduction: KOJAK Performance Trends for sweep3d_32p for Each Method at Default Thresholds
Figure 33 Example Segment Context Marking and Names
Figure 34 Example Instrumentation for Message Passing Function
Figure 35 Execution Time of Sweep3d Measured with TAU and TP
Figure 36 Write Overhead for Sweep3d with TAU and TP
Figure 37 Total Size of Files Generated for Sweep3d with TAU and TP
Figure 38 Average File Size Per Rank for TAU and TP
Figure 39 Total Buffer Flush Count for Sweep3d with TAU and TP
Figure 40 Average Flush Count Per Rank with TAU and TP

List of Tables

Table 1 Correlation of Total Wall Time with Maximum Event Count in a Rank
Table 2 Symbols for Full Trace and Trace Profile Models
Table 3 Sizes of Fields in Trace Profiling Data Structures
Table 4 Sizes of Trace Profile and Full Trace
Table 5 Trace Profile Instrumentation Library Interface

1 Introduction

The major contribution of this dissertation is a novel, low-overhead technique for collecting event traces on high-end computing systems. We collect the information needed to correctly diagnose certain complex performance problems at a greatly reduced data volume compared to traditional event trace collection methods. Other contributions of this dissertation include: an in-depth measurement study of the overheads of traditional event trace collection; an evaluation of methods for determining event trace equivalence; and post-mortem and runtime prototypes of the new event trace collection technique to demonstrate its viability.

1.1 Motivation

Today's high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Performance analysis is a necessary step in adapting codes to make full use of a target high-end machine. Correct diagnosis of certain complex performance problems that arise on high-end systems requires detailed event traces. An "event" is a runtime occurrence of a program activity, such as a machine instruction or basic block execution, a memory reference, a function call, or a message send or receive. Generating event traces involves writing a time-stamped record for each event into a buffer or file for later analysis. Unfortunately, the collection of event traces presents scalability challenges: the act of measurement perturbs the target application, and the large volume of collected data results in data files that are difficult, or even impossible, to store and analyze [45].
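The per-event record writing just described can be sketched as follows. The record layout (timestamp, event id, rank) and the tiny buffer capacity are illustrative assumptions, not the format of any particular tracing tool; the sketch only shows how each event append can trigger a disk flush, the operation whose cost is examined throughout this chapter.

```python
import struct
import time

# Hypothetical fixed-size event record: timestamp (uint64), event id and
# rank (uint32 each) -- 16 bytes per event, little-endian.
RECORD = struct.Struct("<QII")

class TraceBuffer:
    """Append time-stamped event records; flush to a file object when full."""

    def __init__(self, f, capacity_records=4):
        self.f = f
        self.capacity = capacity_records
        self.records = []
        self.flushes = 0

    def record_event(self, event_id, rank):
        self.records.append(RECORD.pack(time.monotonic_ns(), event_id, rank))
        if len(self.records) >= self.capacity:
            self.flush()

    def flush(self):
        # Flushing mid-run is itself a source of perturbation (see text).
        self.f.write(b"".join(self.records))
        self.records.clear()
        self.flushes += 1
```

With a capacity of four records, recording five events forces one mid-run flush of 64 bytes and leaves one record pending, illustrating why event count directly drives both file size and flush frequency.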

There are several documented cases of performance problems that appear only when the application is run at a large scale [32, 51], driving the need to collect event traces for large runs. We have a conundrum: we need traces to correctly diagnose important performance problems, but the sheer volume of data collected makes collecting full traces at the very least prohibitive, and in the worst case impossible. For this reason, solving the scaling challenges of event tracing is an important problem for high-end computing.

1.1.1 Uses of Event Tracing

Requirements for the accuracy and types of information in a trace vary based on the intended use: correctness testing and debugging, simulation, or performance analysis.

Correctness testing and debugging generally only require that the trace retain the relative ordering of events. For example, inspecting a trace of a parallel program could indicate the reason for a deadlock situation by showing the ordering of synchronization operations; a parallel program might hang because a process is waiting for a message that was never sent.

Simulation requires traces that retain the order of events and possibly some timing information. Traces for simulation can be used to predict application performance on new or theoretical hardware. The events in the trace can be replayed using either averaged or predicted timing information for the new hardware. Generally, a single time value is used for all occurrences of an event type instead of individual timing measurements for each occurrence. For example, the average time to execute a send operation could be used as the time for all send operations in the trace. This tradeoff allows acceptable accuracy with faster time to simulated results and smaller trace files.
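The replay-with-averaged-timings idea can be sketched in a few lines. The `(event_type, duration)` trace format and function names here are illustrative assumptions, not the interface of any real simulator:

```python
from collections import defaultdict

def average_durations(trace):
    """Collapse per-occurrence timings to one mean duration per event type."""
    totals, counts = defaultdict(float), defaultdict(int)
    for event_type, duration in trace:
        totals[event_type] += duration
        counts[event_type] += 1
    return {t: totals[t] / counts[t] for t in totals}

def replay_time(trace, per_type_time):
    """Predicted execution time: replay the event order, charging each
    event its averaged cost rather than its measured one."""
    return sum(per_type_time[event_type] for event_type, _ in trace)

# Three sends with varying measured times are all replayed at their average.
trace = [("send", 2.0), ("compute", 10.0), ("send", 4.0), ("send", 3.0)]
avg = average_durations(trace)        # send -> 3.0, compute -> 10.0
predicted = replay_time(trace, avg)   # 3*3.0 + 10.0 = 19.0
```

Note that only the per-type averages and the event order need to be stored, which is exactly the tradeoff described above: smaller trace files at the cost of per-occurrence accuracy.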

Performance analysis requires not only the relative ordering of events, but also the timing information for individual events. Performance problems do not necessarily occur with a high degree of regularity, e.g., in every iteration of a loop, so individual event timings are needed to show the root causes of problems. For example, trace data can show a time-varying load imbalance in a parallel job, which causes some ranks to be late to a synchronization operation at varying times during the program execution. The individual event timings can show which events are taking more time in the slower ranks and in which iterations the slowness occurs. In this dissertation, we focus solely on collecting event traces for the purpose of performance analysis.

1.1.1.1 The Necessity of Event Tracing

Event tracing is used for understanding the causality of events, understanding the interactions between program elements, and identifying behaviors from event patterns.

Although other performance measurement techniques, such as profiling, exhibit better scaling properties at the high end, the detail collected in an event trace is needed to correctly diagnose certain performance problems.

An event trace can show the causality of events, which is helpful when a specific set of events leads to a performance issue. An example of this is found in a case study showing the benefits of Stardust, a tool for collecting and retrieving end-to-end performance traces in a distributed system [62]. The researchers in this study investigated a user's reported problem with I/O performance. From an event trace of the program, they were able to see the sequence of events that caused the poor performance: a series of small requests.

Event traces are useful for showing the interactions between program elements, because interactions can sometimes be difficult or impossible to understand from static analysis. For example, understanding the interactions between program elements is useful in the realm of parallel program debugging. Kranzlmüller et al. use event graphs, generated from trace files, to discover bugs in parallel programs [36]. They use the relationships between processes revealed by the program trace to determine where race conditions due to non-deterministic execution could occur.

Event patterns in traces can be analyzed to reveal properties of programs, such as performance problems and locations of possible optimization. An example of a performance problem that event patterns can help diagnose is the "Late Sender" problem: the receiving process blocks at a receive call because the sending process hasn't yet reached the matching send call. The relative timing of events in the trace would show that the send operations started late and caused the receive operations to block. Event patterns can also be analyzed to suggest performance optimizations. Kranzlmüller et al. present a method for recognizing point-to-point communication patterns in program traces that correspond to collective communication operations [38]. Since collective communication operations are often tuned for high performance on each platform, they suggest that the user replace the recognized sequence of point-to-point operations with a collective communication operation.
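A minimal late-sender check over matched send/receive events might look like the sketch below. The event dictionaries and matching by a single `msg_id` are simplifying assumptions; real tools match messages by communicator, tag, and source/destination rank:

```python
def late_senders(events):
    """Flag matched message pairs where the receive was posted before the
    send began -- the receiver blocked waiting on a late sender.

    events: list of dicts with keys 'kind' ('send' or 'recv'), 'msg_id',
    and 'start' (a timestamp). Returns (msg_id, minimum wait time) pairs.
    """
    sends = {e["msg_id"]: e for e in events if e["kind"] == "send"}
    flagged = []
    for e in events:
        if e["kind"] == "recv":
            s = sends.get(e["msg_id"])
            if s is not None and s["start"] > e["start"]:
                # Receiver was already waiting when the send started, so it
                # blocked for at least the gap between the two timestamps.
                flagged.append((e["msg_id"], s["start"] - e["start"]))
    return flagged
```

A profile could not make this determination: it records only total time in receive operations, whereas the check above needs the per-occurrence start timestamps of both sides of each message.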

Profiling and tracing represent two ends of the spectrum in the trade-off between the level of detail and the amount of data collected in performance measurement. Profiling provides summary information and is therefore more scalable than tracing. For example, a profile can show which functions consumed the most time in an execution. This tells a performance analyst a crucial piece of information: where the program is spending most of its time, identifying candidates for performance improvements. Profiling has advantages over tracing because it causes less perturbation to the target program and produces smaller performance data files.

Tracing a program results in a sequence of time-stamped events, possibly with accompanying performance information, e.g., the start and end times of a particular routine, or details about message-passing events, such as the sending and receiving processes and the communicator used. Tracing provides more detail about the performance of the program, at the cost of greater perturbation to the target program and larger resulting data files. Although the costs of collecting event traces are higher, there are situations where the level of detail provided by tracing is required; the types of information provided by profiling are, in many cases, too limited for correct diagnosis of certain performance problems [7, 37]. An example of such a performance problem is the previously described "Late Sender" problem in a message-passing program. While a profile could indeed show that excessive time was being spent in receive operations, the data is not sufficient to distinguish between a late sender and some other root cause, such as network contention that caused the message to be received late. In contrast, an event trace captures the relative timing of events and would show that the send operations started late and caused the receive operations to block.

Several case studies indicate the need for tracing tools that can scale to large numbers of concurrent processes, because there are instances when a performance problem only arises after scaling the execution beyond a certain point. Kale et al. studied the performance of the NAMD application and found that several performance issues only appeared when the application was scaled above 1000 processors [32], and a new performance issue appeared after scaling above 2000 processors. A developer working with the ViSUS code encountered a hang in the program only after it was scaled to at least 8192 tasks [6]. A scientist working on the CCSM code reported intermittent hangs with 472 processes [6]. Several researchers examined the performance of the SAGE benchmark on ASCI Q [51] and noted a striking divergence from the performance predicted by their model when they scaled the application above 512 processors.

1.1.1.2 The Scalability Problems of Event Tracing

Although the information obtained from tracing is needed for correctly diagnosing certain types of performance problems, three key issues prevent it from being a scalable performance measurement technique: perturbation of the application, the large volume of data collected, and difficulties in analyzing the highly-detailed data.

Perturbation of the measured program is caused by the execution of added trace instrumentation instructions, the memory used by the trace buffer, and the flushing of the trace buffer to disk. These perturbations increase the execution time of the program and have the potential to alter the program's behavior [40]. For example, in one of our experiments, a traced run of Sphot [1] took roughly 50 times longer to execute than the untraced run.

Event tracing has the potential to create prohibitively large data files, especially for highly parallel, long-running programs. Several researchers have noted this problem [65, 79]. As an example, in one study we encountered event counts on the order of 10^10 for 32-process runs of Sphot that only ran for a few minutes [45]. The file size of the merged trace was 424 GB. We were fortunate, because the system we used provided a ~250 TB file system with no individual quotas for temporary storage. We wrote the traces to this file system and transferred them to tape for long-term storage. If we had not had these resources, we would not have been able to conduct many of our experiments.
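The scale of the problem follows from simple arithmetic. Assuming roughly 40 bytes per merged trace record (an illustrative figure; actual record sizes vary by tool and event type), an event count on the order of 10^10 lands in the same range as the 424 GB merged trace reported above:

```python
events = 10**10            # event count observed in the 32-process Sphot runs
bytes_per_record = 40      # assumed average merged-record size (illustrative)

trace_bytes = events * bytes_per_record
trace_gb = trace_bytes / 10**9   # 400.0 GB, the same order as 424 GB
```

At this record size, even a ~250 TB scratch file system would hold only a few hundred such traces, which is why quota-free temporary storage was essential to the experiments.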

Large trace files pose a challenge to analysis tools. They require significant amounts of memory and computation for merging, opening, and displaying the traces. Commonly, during a traced execution, each individual process writes data to its own trace file. At some point, either at the end of the execution or as a post-mortem step, the individual trace files are merged into a single file, ordering the events from different processes by their time stamps. This merging can be computationally intensive; in our experiments, we found that the merging could take orders of magnitude more time than the execution time of the application. The act of simply opening and displaying a trace file is problematic as well. Generally, a trace visualization tool needs to scan the entire trace into memory as a preprocessing step. In one of our studies, opening a 600 MB trace from a four-process run proved to be impossible for one trace-analysis tool. After 30 minutes of waiting, the tool reported that it had run out of memory and could not open the file. After a time-consuming conversion to a different trace file format, a different tool successfully opened the file, but required a parallel back-end to do so.
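The merge step described above is essentially a k-way merge of per-process streams keyed on timestamp. A sketch, where the `(timestamp, rank, event)` record layout is an assumption and the clock-skew correction that real merges must perform is ignored:

```python
import heapq

def merge_traces(per_process_traces):
    """Merge per-process event lists (each already sorted by timestamp)
    into one globally time-ordered stream.

    heapq.merge streams lazily, so the merge itself needs memory
    proportional to the number of processes, not the number of events --
    the expensive part in practice is the sheer volume of records moved.
    """
    return list(heapq.merge(*per_process_traces, key=lambda rec: rec[0]))

rank0 = [(1, 0, "send"), (4, 0, "recv")]
rank1 = [(2, 1, "recv"), (3, 1, "send")]
merged = merge_traces([rank0, rank1])   # ordered by timestamp: 1, 2, 3, 4
```

Even with a lazy merge, every record is read and rewritten once, so merge time grows linearly with total trace size, consistent with the orders-of-magnitude merge times reported above.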

The level of detail produced by tracing makes human analysis of the data a significant task. Locating performance problems by looking at a display of a trace of hundreds or thousands of processes could fairly be described as finding a needle in a haystack. Current trace visualization tools commonly present Gantt charts, showing a bar plot of event occurrences over time, left to right, with one bar per process or task. Generally, the visualization initially shows the entire timeline, and the user has the option to zoom in on portions of the timeline, and possibly on specific ranks, to see more detail. At the high end, full-scale trace visualizations become extremely difficult to read, as the tool user must scroll through thousands of processes and lengthy timelines. It becomes a matter of either being able to see the whole picture, but not being able to see enough detail to draw conclusions about patterns in the trace; or being able to see the needed details, but losing the perspective of the whole picture.

1.1.1.3 Case studies illustrating the problems of tracing

Other researchers using tracing tools for performance analysis have described the scalability problems of tracing, which, in some cases, prevented them from performing their experiments.

One researcher we corresponded with recounted a particularly "painful" experience trying to trace a communication pattern that occurred several hours after the start of his application's execution. He performed only a few experiments for comparison because he ran out of quota, despite having taken measures to reduce the amount of trace data collected: he did not start the trace until several hours into the execution of the program and stopped it immediately after the target iteration. Each reduced run still generated several GB of data, and the trace analysis tool took a couple of hours to open and display a single trace file. (John May, personal communication)

Winstead et al. used a tracing tool to study the I/O performance of an application [70]. Although their small test runs had no problems, when they scaled up to 512 processors, several problems appeared. First, their runs generated huge amounts of data, which resulted in unwieldy trace files and significant I/O overhead in the target program. The I/O overhead had the potential to seriously perturb the loosely synchronized application. In addition, the overwhelming amount of data exercised a file system bug and caused a system crash.

Chung et al. evaluated several state-of-the-art tracing tools for scalability on Blue Gene/L [13]. They found that the execution-time overhead of tracing grew faster than linearly with the number of MPI processes, and that the volume of trace data rapidly reached the order of 100 GB, which they argued was too large for efficient analysis or visualization. In their studies, they executed at most 2048 processes, only a small fraction of the 131-thousand-processor capacity of Blue Gene/L.

1.1.2 Summary

Event traces of parallel programs are an essential tool for correctly identifying the root cause of an important class of performance problems; however, the large volume of data collected creates challenges for measurement, storage, and analysis, and, in some cases, prevents measurement experiments from being conducted at all. The measured data is perturbed by the execution of measurement instructions, as well as by the movement of the collected data to store it on disk or to transfer it across the network. This perturbation increases the running time of the execution, and has the potential to alter the measurements to an unacceptable degree. The sizes of trace files can easily reach gigabytes for even short-running executions with a small degree of parallelism. This can limit what experiments are performed, given a particular user's available file system resources. Analysis of huge amounts of data is challenging for both tools and humans. Although ad hoc methods exist for reducing the amount of data collected in the trace, these methods require the user to partially analyze the problem and take extra steps before the measurement run. In addition, reducing the amount of data collected in these ways has the potential to miss the information needed for diagnosing the problem. Case studies show the need for tracing parallel programs at large scales, because performance problems do not always exhibit themselves during small-scale runs.

1.2 Dissertation Contributions

Given the need for gathering event-based trace data for larger application runs, and the scalability challenges of gathering trace data using traditional methods, our goal was to develop a low-overhead performance measurement technique for collecting event traces.

Our first task was to perform a detailed study investigating the scalability problems of gathering traces. We used the results from the study to frame our proposed approach to a scalable method for gathering trace data on high-end systems. Our study showed that the overhead of writing the trace data to disk during the execution increased with increasing numbers of writing processes, while the overhead of trace measurement excluding the writing scaled with the amount of data being measured. The results of our study suggest the need for a measurement method that collects event-based performance information while reducing the amount of performance data, and severely limiting or eliminating the need to write any data to disk during the execution.

Our solution is a new performance measurement technique, trace profiling, that is a hybrid between profiling and tracing. The technique produces a summary of the event details collected during a program run, and saves enough information to adequately describe the dominant performance behaviors of the execution. Because event trace data is compressed locally, the trace profiling method reduces the major source of perturbation in event trace collection on today's high-end supercomputing systems: periodic flushing of trace data to disk during execution. To address the problem of large data volumes, the technique identifies event patterns that are similar enough that only one copy need be retained, thereby significantly reducing the amount of data that needs to be stored. In addition, the reduced data volume decreases the memory and computation burden on analysis tools and the amount of data that needs to be rendered by a visualization tool.
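The keep-one-copy-per-similar-pattern idea can be sketched with a deliberately crude similarity test: two segments match if their event sequences are identical and their total times differ by no more than a relative threshold. This is only a stand-in; choosing a sound similarity method is the real design problem, and the evaluation of candidate methods occupies much of this dissertation.

```python
def profile_segments(segments, threshold=0.10):
    """Greedy trace-profiling sketch: retain one representative per class
    of 'similar' segments.

    segments: iterable of (event_names, total_time) pairs, where
    event_names is a tuple of event labels. Returns a list of
    [event_names, representative_time, occurrence_count] entries.
    Names and the matching rule are illustrative assumptions.
    """
    representatives = []
    for events, t in segments:
        for rep in representatives:
            if rep[0] == events and abs(t - rep[1]) <= threshold * rep[1]:
                rep[2] += 1          # fold this occurrence into the class
                break
        else:
            representatives.append([events, t, 1])
    return representatives

segs = [(("send", "recv"), 1.00),
        (("send", "recv"), 1.05),   # within 10% of the first: folded in
        (("send", "recv"), 2.00)]   # an outlier: kept as its own class
reps = profile_segments(segs)        # two classes; the first has count 2
```

Storage then scales with the number of distinct behavior classes rather than the number of event occurrences, which is the source of the data-volume reduction claimed above.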

We implemented a post-mortem prototype of the trace profiling method to illustrate the viability of the technique and to evaluate methods for deciding trace similarity. A critical piece of any implementation of the trace profiling technique is the choice of a method for deciding when traces are similar enough to be considered equivalent. Using our post-mortem implementation, we evaluated several methods for deciding trace similarity, measuring the compression achieved, the amount of error introduced into the measurements, and whether the compressed data still contained the information needed to make a correct performance diagnosis.

We implemented a prototype runtime trace profiler. We present a study of trace profiling overheads, including a comparison to traditional event trace collection.

1.3 Dissertation Organization

In Chapter 2, we present related work. We present the study of the overheads of traditional event trace collection in Chapter 3. The design of the trace profiling technique is described in Chapter 4. In Chapter 5, we demonstrate the post-mortem implementation of our technique and the evaluation of methods for deciding trace similarity. Chapter 6 describes our runtime implementation and its evaluation. Finally, we conclude in Chapter 7.

2 Related Work

Other researchers have investigated reducing or eliminating the scalability problems associated with tracing: perturbation of the application program, unmanageable file sizes, and visualization and analysis challenges.

2.1 Perturbation

Because perturbation is intrinsic to measurement [17], research focuses on techniques to lower or limit the overheads, to remove the overheads from the resulting data, and to measure and model the overheads.

Researchers have investigated methods to lower the overheads of tracing [37, 50, 55, 58, 75]. The Event Monitoring Utility (EMU) was designed to allow the user to adjust how much data was collected in each trace record, thereby altering the amount of measurement overhead [37]. The authors found the writing overhead to be the largest monitoring overhead. Falcon has several features to reduce the amount of perturbation in the target program [23]. The buffer sizes used for tracing can be adjusted at run time, and it uses double-buffering to reduce the overhead of transmitting the event data to the tool monitor threads. It also allows the type of performance measurement to be altered at run time, switching between high- and low-overhead measurement techniques on the fly. The EventSpace tool has several features designed to lower the overheads of gathering traces [8]. The trace buffer is only accessed as needed by the monitoring threads. The trace buffers have a fixed size; the oldest entries are discarded to make room for new entries. This means that trace data is not stored permanently unless it is read by the monitor threads. Also, the tool employs distributed data analysis to reduce the overhead of sending the trace data over the network.
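A fixed-size buffer of this kind can be sketched as a bounded ring that silently evicts its oldest records unless a monitor drains it first. This is an illustration of the idea only, not EventSpace's actual interface; the class and method names are ours:

```python
from collections import deque

class RingTraceBuffer:
    """Fixed-size trace buffer: the oldest records are discarded to make
    room for new ones, so data survives only if a monitor reads it in time."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest item when a new one is appended
        self._buf = deque(maxlen=capacity)

    def record(self, event):
        self._buf.append(event)  # may silently evict the oldest event

    def drain(self):
        """Called by a monitoring thread: returns and clears current contents."""
        events = list(self._buf)
        self._buf.clear()
        return events

buf = RingTraceBuffer(capacity=3)
for i in range(5):
    buf.record(("event", i))
print(buf.drain())  # only the 3 newest events remain
```

The eviction policy trades completeness for bounded memory: unmonitored bursts of events are simply lost, which is exactly why such data "is not stored permanently unless it is read by the monitor threads."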

Several researchers have developed techniques to attempt to remove overheads from the reported data [14, 20, 69, 72, 76]. Yan and Listgarten [76] specifically addressed the overhead of writing the trace buffer to disk in AIMS by generating an event marker for these write operations and removing the overhead in a post-processing step.

Several researchers have reported on the overheads of tracing. Yan and Schmidt argued that the most intrusive activities were the allocation of memory buffers to save the trace buffer and the periodic flushing of the trace buffer to disk [79]. Gu et al. reported that the most expensive operations were event buffering and transmission [23]. Chung et al. [13] evaluated several profiling and tracing tools on BG/L in terms of total overhead and write bandwidth, and noted that the overheads of tracing are high and that the resulting trace files are unmanageably large. They suggested that the execution time overhead is substantially affected by generation of trace file output, but provided no measurements for their claim.

Two research efforts have developed models of the overheads in measurement systems. Malony et al. developed a model to describe the overheads of trace data and describe the possible results of measurement perturbation [40], then extended it to cover the overheads of SPMD programs [57]. They assumed that in the case of programs that do not communicate, the perturbation effect for each processor is only due to the events that occur on that processor. However, they noted, as we do, that the execution time of traced programs was influenced by factors other than just the events in each processor independently. They did not explore this further. Waheed et al. [67] explored the overheads of trace buffer flushing and modeled two different flushing policies [66]. They found that the differences between the policies decreased with increased buffer sizes. Their model did not account for the interaction between writing processes when modeling the buffer flushing policies; instead, they assumed a constant latency for all writes of the trace buffer.

A primary difference between our results and prior work investigating tracing overheads is that we identify a previously unexplored scalability problem with tracing. To the best of our knowledge, while others have noted that the largest overhead of tracing is writing the data, none have shown how this overhead changes while increasing the scale of application runs.

2.2 Trace File Size Reduction

Several research efforts focus on reducing the size of the trace file. The efforts fall into two categories: trace file compression, and measuring or writing less trace data.

2.2.1 Trace File Compression

Several researchers have reported on efforts to reduce file sizes by compression. Researchers working with the AIMS performance tool noted compression of 40-50% in trace file sizes when using a binary representation of the trace, as opposed to an ASCII encoding of the trace [79]. They also found that introducing new trace records to represent event pairs that commonly occur together, such as function entry and exit, and message events and associated message data, resulted in trace compression of 38% in an ASCII encoding of the trace. The Pablo SDDF trace file format has both ASCII and binary representations [54]. The Pablo developers reported that, in their experience, the binary representation of traces ranged from 42-75% smaller than the ASCII representation of the files [7]. The Open Trace Format (OTF) [33] uses zlib compression [16] to compress the ASCII traces either on-the-fly or as a post-processing step. The OTF developers found that OTF compressed trace file sizes were about half the size of STF trace files for the applications they examined. While these compression methods do reduce the size of trace files, the size of the traces still scales with the number of events measured, determined by the number of concurrent processes and the length of the program run. Gamblin et al. use the CDF 9/7 wavelet transform to compress traces collected for the purpose of detecting load imbalance [18]. Knupfer developed a method called Compressed Complete Call Graphs (CCGs) that takes a trace file and compresses it based on the event stream and event measurements to ease the burden on trace analysis tools [34]. These two methods require that all data be collected before compression begins, which means that the problems of collecting and storing a large amount of data still exist.

2.2.2 Measuring or Writing Less Data

This section details methods for reducing the amount of data collected by measuring or writing less data. These techniques fall into three categories: simple methods that omit data, methods that alter the type of measurement employed based on some rule, and methods that decide when sections of traces are similar and measure or store a reduced number of pattern executions.

2.2.2.1 Simple Omission Methods

Generally, tracing tools provide API calls that give the user the option of starting and stopping tracing of the application at any point during the execution [5, 2, 71, 46, 52, 58, 60, 63, 74, 80]. This makes it possible for users to reduce the amount of data that is collected and to potentially reduce the size of the trace data files to reasonable levels. Unfortunately, this method risks omitting critical information needed for diagnosing the performance problem, and it increases the burden on the tool user, who must identify the approximate location of the problem and make code changes to control which events are captured.

TAU [58] reduces the amount of data collected by allowing users to disable instrumentation in routines that are called very frequently and have short duration. TAU also includes a tool called tau_reduce that uses profile data to discover which functions should not be instrumented in a user program, and feeds this information to the automatic source instrumentor. Here, the size of the trace file still scales with the number of concurrent processes and the length of the run.

2.2.2.2 Altering Measurement Type Methods

Three tools alter the type of performance data collected to reduce file size. Pablo [7] gives users the option of specifying an event-rate threshold. If an event occurs at a greater rate than the threshold, a less invasive method of measurement, such as event counting, is employed. Vetter presents a method for statistically sampling MPI events [65]. Each time an MPI event is encountered, it is either sampled or not. For each sampled event, the tool can record statistics, log the event to a trace file, or even ignore the data. Falcon provides a choice of measurement sensors (sampling, tracing, and extended) that can be interchanged at runtime to flexibly alter the amount of data collected [23]. An example is sampling performance until a problem is detected and then turning on tracing to get more detailed information. Although each of these methods reduces the amount of data collected, they do so at the risk that some important performance behaviors will be missed.

2.2.2.3 Trace Similarity Methods

Several researchers reduce trace file size by deciding when sections of traces are similar enough that a reduced number of copies of the section need to be retained. Methods in this category include deletion of similar trace sections, trace sampling, statistical clustering, and signal processing.

Some researchers use a combination of event names and measurements to decide when traces are similar. Knupfer and Spooner define two sections of traces as similar if the call graph context and measurements of the events are equal. Knupfer defines equality using both relative and absolute differences [34]; Spooner et al. use the relative difference in instruction counts [60].

Another approach defines similarity by event names. By ignoring event measurements, this approach has the potential to miss important performance behaviors if there is performance variability in different iterations of the same event stream. Chung et al. use a filter that detects repeated communication patterns [13]; they keep performance data for only one instance of each pattern. Freitag et al. use a periodicity detector to notice repeating sequences of events and keep a reduced number of iterations of each sequence [15]. Similarly, Yan and Schmidt detect repeating sequences of events and store the average measurements of those events [79]. Noeth and Mueller also detect repeated sequences of message-passing events and store one copy of each sequence; they optionally store summary information about the events, such as average measurements [49]. In later work, they include the ability to store more detailed timing information: statistical "delta" times, histograms, or histograms by call sequence [53].

Other efforts use trace sampling to reduce trace size. Carrington et al. use trace sampling to reduce the amount of time it takes to gather memory reference traces for the purpose of performance modeling [10]. They collect data for a reduced number of executions of the basic blocks in a program. Vetter presents a method for statistically sampling MPI events [65]. Each time an MPI event is encountered, it is either sampled or not. Gamblin et al. use statistical sampling with a user-specified confidence interval and metric [19]. Although sampling methods do reduce the amount of trace data collected, they have the potential to miss critical performance behaviors that occur during unmeasured portions of the program.

Aguilera et al. [4], Nickolayev et al. [48], and Lee et al. [39] apply statistical clustering to traces and select a representative trace for each cluster of processes. Nickolayev and Lee use the Euclidean distance for clustering, while Aguilera uses a metric based on the amount of communication between two processes. These clustering methods reduce trace data across processes, but do not reduce trace data within a process (temporal reduction). As a result, file sizes will still scale with the running time of the application.

Several groups apply methods from signal processing to traces. Casas et al. and Huffmire et al. use the Haar wavelet transform to automatically determine the phases of a program [11, 30]. Hauswirth et al. use dynamic time warping to decide when two traces are similar for aligning multiple traces [26].

Researchers have evaluated several methods for deciding the goodness of a particular trace similarity metric. Ratn et al. use aggregate statistical measures, such as total time spent in a function, to evaluate their method [53]. Gamblin et al. compute a trace confidence measure to evaluate their trace sampling results, which gives the percentage of time the mean trace of sampled processes is within a specified error bound of the mean trace of the full trace [19]. In their wavelet transform method, Gamblin et al. use a root mean square measure to estimate the error in reduced traces [18]. They also present qualitative results, showing a visualization based on a reduced trace compared with one from a complete trace. Yan et al. compare the measurements in their reduced trace against the real trace time stamp by time stamp and produce both a relative and an absolute measure of the overall differences [77]. In addition, they also present whole-program statistical measurements and visualizations for qualitative comparison.

2.3 Analysis Tool and Visualization Scalability

Three trace file formats address the scalability problems faced by trace analysis tools. The Scalable Logfile Format (SLOG) was developed to address the scalability problems encountered by visualization tools [73]. In SLOG, events are partitioned into intervals called Bounding Boxes, which are organized into a binary tree. This organization of the data allows the visualization tool to display a low-resolution representation of the trace data without reading in the entire trace file. The Structured Trace Format (STF) was designed to write the data to multiple files to allow the files to be read and written in parallel [2]. Their goals were to make the format as compact as possible and to allow for fast random access to the data and easy extraction of the data. The Open Trace Format (OTF) was designed to address the challenges that come with the ever-increasing scales of HPC platforms [33]. It was designed so that the trace could be processed by a parallel backend, which reduces the time to open and visualize very large trace files. OTF uses an ASCII encoding, which enables a tool to do a binary search on files for time intervals.

Several researchers have worked to reduce the amount of data presented to the user in order to facilitate understanding of the performance of the program. Vetter presents a method for identifying communication inefficiencies by applying machine learning techniques to trace files of MPI communication events [64]. The end result is a breakdown of the communication events that were considered "normal" and "abnormal" (e.g., late sends or late receives), and the location in the source code from which they were called. The AIMS performance tool suite computes performance indices from parallel program traces [78]. Performance indices are designed to quantify program characteristics to locate bottlenecks. An example of a performance index is the communication overhead index, which gives an indication of how much of the program's execution time was spent in communication activities. Although both of these methods greatly reduce the amount of data presented to the user and help to identify performance problems, neither method shows causal information for the problems.

The scalability of trace visualizations is not a new topic [24, 28, 27, 41, 47]; however, the continuing upward scaling of high end systems drives a continuing need for more scalable solutions. Knupfer et al. show how CCGs can be used to facilitate visual understanding of trace data [35]. Color blocks that can be interactively decomposed represent behavior patterns in the execution. The visualization scales with the number of parallel entities in the execution and with the running time of the execution. Spooner and Kerbyson present a tool that takes multiple traces as input and outputs visualizations that highlight the differences between the traces [60]. Their primary goal was to generate visualizations that indicate performance differences in multiple executions over time. They note that their tool could be used to compare iterations within a single execution; however, this is achieved by either creating a separate trace file for each iteration or extracting the iteration data from the trace as a post-mortem step. Neither of these methods addresses the problem of collecting and storing the possibly enormous amount of trace data.

3 Study of Tracing Overheads

We conducted a measurement study to discover the scalability challenges in event tracing. We used the results of this study as a guide when designing our low-overhead approach to gathering event traces.

3.1 Experiment Design

Our experiments are designed to separate runtime tracing overhead into two distinct components: the overhead of just the trace instrumentation, and the overhead of flushing the trace buffer contents to files. We performed runs with and without trace instrumentation (instr and noInstr), and with and without buffer flush to file enabled (write and noWrite), then calculated the overheads using the following metrics:

• Wall clock time: MPI_Wtime, measured after MPI_Init and before MPI_Finalize. The following are not included in this measurement: instrumentation overhead for tool setup, finalization, and function calls before/after the timer is started/stopped; and writing overhead for trace file creation, final trace buffer flushes before file closure, trace file closure, and, in the case of MPE, trace file merging.

• Write overhead: Average total wall clock time of the write runs minus average total wall clock time of the noWrite runs.

• Instrumentation overhead: Average total wall clock time of the runs that did not write the trace buffer minus average total wall clock time of the noBuff_noInstr_noWrite runs.
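The two overhead metrics reduce to differences of mean wall clock times across run sets. A minimal sketch of the arithmetic (the run-set names follow the text; the timing values are made up for illustration):

```python
def mean(xs):
    return sum(xs) / len(xs)

def write_overhead(write_times, nowrite_times):
    # Write overhead: mean wall clock time of the write runs
    # minus mean wall clock time of the noWrite runs.
    return mean(write_times) - mean(nowrite_times)

def instrumentation_overhead(nowrite_times, baseline_times):
    # Instrumentation overhead: mean wall clock time of the runs that
    # did not write the trace buffer minus mean wall clock time of the
    # noBuff_noInstr_noWrite (uninstrumented) runs.
    return mean(nowrite_times) - mean(baseline_times)

# Hypothetical timings (seconds) from three run sets:
write_runs   = [16.2, 15.9, 16.5]   # instrumented, buffer flushed to disk
nowrite_runs = [15.1, 15.0, 15.2]   # instrumented, write call disabled
baseline     = [14.6, 14.5, 14.7]   # uninstrumented

print(round(write_overhead(write_runs, nowrite_runs), 2))
print(round(instrumentation_overhead(nowrite_runs, baseline), 2))
```

Averaging over thirty identical executions per set, as described below, is what makes these differences meaningful despite run-to-run variation.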

Given our goal of pushing to the current scaling limits of tracing, we wanted to measure an application with a very high rate of communication, so that trace records for a high number of MPI communication events would be generated. We picked SMG2000 (SMG) [9] from the ASC Purple Benchmark suite. SMG is characterized by an extremely high rate of messages: in our four-process runs, SMG executed 434,272 send and receive calls in executions that took approximately 15 seconds. For comparison, we also included another ASC Purple Benchmark, SPhot (SP) [1]. SP is an embarrassingly parallel application; in a four-process, single-threaded execution of 512 runs with a total execution time of 350 seconds, the worker processes pass 642 messages, and the master process passes 1926 messages. We configured both applications with one thread per MPI process.

To vary the number of processes, we used weak scaling for the SMG runs. As we increased the number of processors, we altered the processor topology to P * 1 * 1, where P is the number of processors in the run, and kept the problem size per processor, nx * ny * nz, the same, thereby increasing the total problem size. We used both weak and strong scaling for the SP runs, referred to as SPW and SPS respectively. We configured these approaches by changing the Nruns parameter in the input file input.dat, which controls the total amount of work done in a single execution. For strong scaling, we kept Nruns constant at 512 for all processor counts; for weak scaling, we set Nruns equal to the number of MPI ranks.
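The scaling setups amount to a small configuration rule per benchmark. The sketch below merely restates the parameter choices above; the function names are ours (only Nruns and the P * 1 * 1 topology come from the benchmarks):

```python
def smg_topology(nprocs):
    # Weak scaling for SMG: processor topology P x 1 x 1, with the
    # per-process problem size (nx, ny, nz) held constant, so the
    # total problem size grows with the process count P.
    return (nprocs, 1, 1)

def sphot_nruns(nprocs, scaling):
    # SPhot work is controlled by Nruns in input.dat:
    # strong scaling (SPS) keeps Nruns fixed at 512 for all counts;
    # weak scaling (SPW) sets Nruns equal to the number of MPI ranks.
    return 512 if scaling == "strong" else nprocs

print(smg_topology(256))
print(sphot_nruns(256, "strong"), sphot_nruns(256, "weak"))
```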

Figure 1 Performance of Uninstrumented Executions (wall clock time versus number of processes).

We used SMG's built-in metrics to measure Wall Clock Time, summing the values reported for the three phases of the execution: Struct Interface, SMG Setup, and SMG Solve. We used the native SPhot wall clock time values for Wall Clock Time. Figure 1 shows the scaling behavior of the uninstrumented applications. As expected, the execution time of SPS decreases with increasing numbers of processors, since we are keeping the total problem size constant.

In some sense the choice of a particular tracing tool was irrelevant to our goals: we wanted to investigate a "typical" tracing tool. However, we wanted to avoid results that were in some way an artifact of one tool's particular optimizations. Therefore, we used two different robust and commonly used tracing tools for our experiments: TAU and MPE.

We built several versions of TAU version 2.15.1 [58]. For the noWrite versions, we commented out the one line in the trace buffer flush routine of the TAU source that actually calls the write system call. We altered the number of records stored in the trace buffer between flushes, by changing the #define for TAU_MAX_RECORDS in the TAU source for each size and rebuilding, to test four different buffer sizes: 0.75 MB (32,768 TAU events); 1.5 MB (the default size for TAU; 65,536 TAU events); 3.0 MB (131,072 TAU events); and 8.0 MB (349,526 TAU events). We used the default level of instrumentation for TAU, which instruments all function entries and exits.

MPE (the MultiProcessing Environment (MPE2) version 1.0.3p1 [80]) uses the MPI profiling interface to capture the entry and exit time of MPI functions as well as details about the messages that are passed between processes, such as the communicator used. To produce an MPE library that did not write the trace buffer to disk, we commented out three calls to write in the MPE logging source code. We also had to comment out one call to CLOG_Converge_sort because it caused a segmentation fault when there was no data in the trace files. This function is called in the MPE wrapper for MPI_Finalize, so it did not contribute to the timings reported in the SMG metrics. We altered the buffer sizes by changing the value of the environment variable CLOG_BUFFERED_BLOCKS. We also set the environment variable MPE_LOG_OVERHEAD to "no" so that MPE did not log events corresponding to the writing of the trace buffer. In MPE, each MPI process writes its own temporary trace file. During MPI_Finalize, these temporary trace files are merged into one trace file, and the temporary trace files are deleted. The temporary and merged trace files were written in CLOG2 format. We used two different buffer sizes: 1.5 MB (24 CLOG buffered blocks), and 8.0 MB (the default size for MPE; 128 CLOG buffered blocks). For SPW only, we altered the SPhot source to call MPE logging library routines to log events for all function calls, to correspond more directly to the default TAU behavior. We refer to this as "MPc" for MPE with customized logging. For the SPW MPc experiments, we disabled the trace file merge step in MPI_Finalize, because it became quite time consuming with larger trace files.

Figure 2 Experiment Environment. The MPI processes in our experiments, represented by purple circles in the diagram, ran on a subset of the 1024 compute nodes of MCR. MPI communication between the processes traveled over the Quadrics QsNet Elan3 interconnect, shown by the purple dashed line. The I/O traffic for the Lustre file system, represented by the blue dotted line, also traveled over the Quadrics interconnect. Metadata requests went to one of two metadata servers (MDS), a fail-over pair. File data requests first went through the gateway nodes to an object storage target (OST), which handled completing the request on the actual parallel file system hardware.

We collected all of our results on MCR, a 1152-node Linux cluster at LLNL running the CHAOS operating system [21] (see Figure 2). Each node comprises two 2.4 GHz Pentium Xeon processors and 4 GB of memory. All executions ran on the batch partition of MCR. The trace files, including any temporary files, were stored using the Lustre file system [61]. This platform is representative of many high-end Linux clusters in current use.

Each of our experiment sets consisted of thirty identical executions.

3.2 Results

In this section, we present results from a study of tracing overheads as we scale up

the number of application processes. These results are part of a larger investigation;

full details are available as a technical report [44]. In this study, we examined how the

overheads of tracing change as the application scales. We ran sets of experiments with

32, 64, 128, 256, and 512 processes, traced with TAU and MPE, using buffer sizes of

1.5 and 8.0 MB.

3.2.1 Event Counts and Trace File Sizes

Here we describe the event counts generated while tracing the applications. Complete details can be found in the technical report [44]. For SMG, the counts for TAU and MPE exhibit similar trends, but differ by roughly an order of magnitude. As the number of processors doubles, the per-process event counts and trace data written by each process increase slightly (in part due to increased communication), while the total number of events and resulting trace file sizes double. For SPS, there are markedly different results between TAU and MPE; the event counts differ by six orders of magnitude. This is because with TAU we are measuring all function entries and exits, whereas with MPE we measure only MPI activity. For both TAU and MPE, doubling the number of processors results in the per-process event counts decreasing by half.

For TAU only, the total event count and resulting trace file sizes remain constant, whereas for MPE, the maximum per-process event count, the total event count, and resulting trace file sizes increase slightly. For SPW, the counts for TAU and MPc are nearly identical, while the counts for MPE differ. Again, this is because of differences in what was measured by the tools. The total event count and trace file sizes for MPE are roughly six orders of magnitude less than those of TAU and MPc.

We use this information to derive an expectation for tracing overheads for the different applications and tools. For the weakly scaled SMG and SPW, we expect the overheads of tracing to remain relatively constant with increasing numbers of processors, because the amount of data collected and written per process remains relatively constant. However, for SPW with MPE, we expect to see very little overhead due to the small amount of data collected. For SPS with TAU, we expect the overheads of tracing to decrease with increasing numbers of processors, because the amount of data collected and written per process decreases with increasing processes. For SPS with MPE, we expect to see very little overhead because of the small amount of data collected.

3.2.2 Execution Time

Figure 3 shows the average wall clock times for our experiments broken down into time spent in application code, trace instrumentation, and writing the trace buffer. The graph on the left shows the measurements for SMG with TAU and MPE, and SPW with TAU and MPc. In each run set, we see the same trend: as the number of processes increases, the total execution time increases, largely due to the time spent writing the trace buffer. The time spent in the application code and in trace instrumentation remains relatively constant. The graph on the right shows the execution times of SPS with TAU. Here, as the number of processes increases, the total execution time decreases. However, even though the time spent writing the trace buffer decreases with increasing processors, it does not decrease as rapidly as the time spent in instrumentation or application code. For SPS and SPW with MPE, the differences between the write and noWrite executions were indistinguishable due to the very small amounts of data collected and written.

Figure 3 Performance of Instrumented Executions. Here we show the total execution time for SMG measured with TAU and MPE, and SPhot measured with TAU. The colors in the bars indicate the time spent in application code, time in trace instrumentation, and time writing the trace buffer. Each bar in a set represents the average behavior of executions with 32, 64, 128, 256, and 512 processes, respectively. The set labels include (top to bottom): the benchmark name, the measurement tool, and the buffer size.

Table 1 Correlation of Total Wall Time with Maximum Event Count in a Rank

                        SMG           SPS            SPW
Buffer Sz  Write?   TAU    MPE    TAU    MPE     TAU    MPE-C   MPE
1.5        yes      0.96   0.85   0.91   -0.78   0.69   0.80    0.98
8.0        yes      0.97   0.90   0.95   -0.81   0.61   0.76    0.98
1.5        no       0.98   0.98   0.99   -0.70   0.81   0.55    0.96
8.0        no       0.98   0.98   0.99   -0.79   0.74   0.77    0.95

We computed the percentage contribution to variation using three-factor ANOVA, with the buffer size, the number of processes, and whether or not the trace buffer was written to disk as the factors [44]. In general, there was quite a bit of variation in the running times of the executions that wrote the trace buffer, which explains the high contribution of the residuals. Sources of variability in writing times for the different executions include: contention for file system resources, either by competing processes in the same execution or by other users of Lustre; contention for network resources, either by other I/O operations to Lustre or by MPI communication; and operating system or daemon interference during the write. Any user of this system gathering trace data would be subject to these sources of variation in their measurements. For SMG measured with TAU and MPE, the largest contributing factor was whether or not the buffer was written, at 33% and 26%, respectively. The largest contributing factor for SPS with TAU was the number of processes in the run (19%), followed closely by whether or not the trace buffer was written (14%). SPS with MPE had the number of processes as the dominating factor, at 51%. SPW with TAU and MPc both had writing the trace buffer as the largest contributor, at 34% and 24%, while SPW with MPE had the number of processes as the largest, at 81%. The differences in the dominating factors for the SP runs with MPE are attributed to the comparatively very small amount of data collected.

Figure 4 Tracing Overhead with Maximum Event Count in a Single Rank. The groups of bars from left to right in the charts represent different processor counts: for SMG they represent 32, 64, 128, 256, and 512 processes; for SPS they represent 512, 256, 128, 64, and 32 processes; for SPW, they represent 32, 256, 128, 64, and 512 processes.

3.2.3 Execution Time vs. Event Counts

Table 1 shows the correlation of the average total wall clock time with the maximum event count over all ranks. SPS with MPE had a relatively weak negative correlation with the maximum event count: as the process count increases, the number of messages that the master process receives increases, while the execution time decreases, giving a negative correlation. In general, executions that did not write the trace buffer to disk had a higher correlation with the event count than executions that did write the trace buffer to disk.
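The values in Table 1 are ordinary Pearson correlation coefficients between mean wall clock time and the maximum per-rank event count across run sets. A self-contained sketch of the computation (the sample data is made up, not taken from our experiments):

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length samples:
    # covariance divided by the product of the standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-run-set values: max events in a rank vs. mean wall time.
max_events = [1.0e7, 1.1e7, 1.2e7, 1.3e7, 1.4e7]
wall_time  = [52.0, 60.5, 66.0, 74.2, 80.1]
print(round(pearson(max_events, wall_time), 3))
```

A coefficient near 1 means execution time tracks the event count closely (as for the noWrite runs); a negative value, as for SPS with MPE, means time falls as the event count rises.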

Figure 4 shows the overheads of writing and instrumentation as the maximum number of events in a single rank increases. For SMG with TAU and MPE, we see a clear pattern. The instrumentation overhead appears to vary linearly with the number of events, while the overhead of writing the trace increases much more rapidly, and does not appear to have a linear relationship with the event count. The behavior of SPS is different, because in this application, as the number of events increases, the number of processes decreases; however, the instrumentation overhead still appears to have a linear relationship with the event count. The write overhead is high at higher event counts, but also at the low event counts, when the number of writing processes is higher. For SPW, the instrumentation overhead is relatively constant, as expected, since the number of events does not change much between the run sets. However, the writing overhead fluctuates widely. The reason for this is that the maximum event count in a rank does not monotonically increase or decrease with increasing processors as it does for SMG or SPS.

3.3 Conclusions

In our scaling experiments, the execution times of the noWrite runs tended to scale with the maximum number of events. However, the execution times of the write runs did not scale as strongly with the number of events, and tended to scale with increasing numbers of processors, possibly due to contention caused by sharing the file system resource. Our results suggest that trace writes will increasingly dominate the overheads as the number of processes grows. They also indicate that tracing overheads are sensitive to the underlying file system.

Realization of a scalable approach to tracing will require an overall reduction in the total amount of data. Data reduction is needed not only to reduce runtime overhead, but also to address the difficulties of storing and analyzing the resulting files. We incorporated the results of our measurement studies into the design of our approach to low-overhead event tracing.

4 Trace Profiling

We have developed a novel approach to performance measurement designed to address the scalability problems of gathering event-based data. Our approach is a hybrid between profiling and tracing that we call trace profiling. The goal of trace profiling is to gather enough information to adequately describe the dominant performance behaviors of the execution, at a greatly reduced data volume compared to that gathered by a traditional tracing tool. The trace profiling technique detects event patterns, or segments, in the execution trace that have similar behavior. Segments with similar behavior are merged, so that only one copy of the segment is retained. Thus, a trace profile contains a summary of the event patterns that occurred during program execution.

In Section 4.1, we start by describing traditional event trace collection in order to provide background for explaining and evaluating trace profiling. Next, in Section 4.2, we describe the trace profiling technique. We present an overview of the technique followed by our methodology for marking segments in traces and for segment merging. In Section 4.3, we detail the methods we use for detecting segments with similar behavior. In Section 4.4, we present models for predicting the sizes of traditional traces and trace profiles; in Section 4.5, we use the models to predict the size reduction achievable by trace profiling and to compare traditional tracing and trace profiling. Finally, in Section 4.6, we illustrate the potential benefits of trace profiling for visualization and analysis tools.

4.1 Background

Traditional tracing results in an in-order listing of the events that occurred during an application run. Generally speaking, a traditional tracing tool creates a record for each event encountered during the execution and stores it in a buffer in memory.

When the buffer becomes full, the contents of the buffer are flushed to disk, and the buffer is reused. With most tracing tools, each process creates its own event trace; the individual traces can optionally be merged at the end of the execution.

A traditional event trace of a parallel program contains two types of information: a mapping of event identifiers to event names, e.g. the function main might have identifier 1; and a series of records that contain data about program events. In this document, we will call the mapping of event identifiers to event names an event map. The event map can reside in the same file as the event records or in a separate file. The event records contain data about function entries or exits, message passing data, other performance measurement data, or bookkeeping information. Examples of bookkeeping records include records that indicate the start and stop times of flushing the trace buffer. For function events, there is a separate record each for event entry and event exit. We show a diagram of example trace files for a two-process parallel application run in Figure 5.

4.2 Trace Profiling Technique

Trace profiling is different from traditional tracing because it doesn't maintain a complete, in-order list of event entry and exit records for each process.

Figure 5 Data from Traditional Trace: This figure shows the data obtained from a traditional trace. Each process outputs its own event map and event records file(s). The event records file is simply an in-order series of event records gathered from the start of the program to the end of the program. In the diagram, Eid refers to the event identifier given in the event map and Rid is the rank identifier. Each record has a timestamp and indicates the event type, e.g. entry or exit.

A trace profiler partitions the processes of the parallel program into process groups. A process group contains the performance information for one or more processes that had the "same" behavior. Each process group contains a list of segments. A segment is simply an in-order series of events and their associated information for a portion (or segment) of the execution of the program. Each process group maintains a segment execution list, which is a listing of the order of segment executions and the timestamp at which each segment execution began. Each segment maintains its time duration and a listing of the events that executed in the segment. For each event, we maintain the relative starting time of the event with respect to the start of the segment execution, the event's duration, and any other associated information, such as message passing data.

We show a representation of the data for a sample process group in Figure 6. The process group contains the data for ranks 0, 2, and 4. There were two segments in the execution that executed two times each. Segment 0 executed first at time=1 and again at time=40; Segment 1 executed at time=6 and again at time=23. We keep separate entry and exit records for each event: the entry records store the event identifier, a start time relative to the start of the segment, and a duration; the exit records simply mark the ending of the events so that the function call stack can be maintained.

Figure 6 Process Group: This figure shows an example process group and the information it contains about segments. It maintains the ranks of the processes that it represents in the rank list. The segment map gives a mapping between segment identifiers (Sid) and the name of the segment. The segment execution list keeps track of the order of segment executions and their start times (Sid:start_time). The segments for the process group are in the segment list. Each segment has a duration and event data for the events that executed during that segment. Each event has an entry record that contains an identifier, Eid, a start time, Ts, and a duration, Dur; and an exit record.

A trace profile contains the following items:

• Event Map: One event map stores the mapping of event names to event identifiers for all process groups.
• Process Groups: Each process group contains the following:
  • Rank List: A list of ranks, which tells which processes' data the process group contains.
  • Segment Map: A mapping between segment identifiers and segment names.
  • Segment Execution List: A listing of segment identifiers and timestamps, telling the start time of each segment execution in a process group.
  • Segment List: A list of segments for a single process group. Each segment has a header which gives the segment's identifier and duration, and is followed by a list of event data.
    • Event Start Record: An identifier; a timestamp which gives its start time relative to the start of the segment; a duration; and possibly one or more message data records.
      • Message Data Record: A type (send or receive), a rank identifier (source or destination), a tag, and a communicator.
    • Event End Record: A marker that indicates an event end.
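The layout above can be sketched as a small set of data classes. This is our own illustrative rendering, not the trace profiler's actual implementation; all type and field names here are our assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class MessageData:
    kind: str                  # "send" or "recv"
    rank: int                  # source or destination rank
    nbytes: int
    tag: int
    comm: int

@dataclass
class EventRecord:
    eid: int                   # identifier from the event map
    start: float               # start time relative to the segment start
    duration: float
    messages: List[MessageData] = field(default_factory=list)

@dataclass
class Segment:
    sid: int
    context: str               # e.g. "main_loop_1_1"
    duration: float
    events: List[EventRecord] = field(default_factory=list)

@dataclass
class ProcessGroup:
    ranks: List[int]                        # rank list
    segment_map: Dict[int, str]             # sid -> segment name
    segment_execs: List[Tuple[int, float]]  # (sid, start time), in order
    segments: List[Segment]

@dataclass
class TraceProfile:
    event_map: Dict[int, str]               # one map for all process groups
    groups: List[ProcessGroup]
```

The process group of Figure 6, for example, would carry ranks=[0, 2, 4] and a segment execution list of [(0, 1), (1, 6), (1, 23), (0, 40)].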

We merge segments both within and across processes. Segments are merged if they are equal, as determined by a given difference method. A trace profiler can have multiple difference methods for deciding segment equality; we describe and compare several methods in Section 4.3. We describe the criteria and algorithms for intra- and inter-process merging in Sections 4.2.2 and 4.2.3.

int main() {
    start_segment("main_0");
    MPI_Init();
    end_segment("main_0");
    for (i = 0; i < 100; ++i) {
        start_segment("main_loop_1_1");
        do_work();
        MPI_Allgather();
        end_segment("main_loop_1_1");

        for (j = 0; j < 10; ++j) {
            start_segment("main_loop_2_1");
            do_other_work();
            end_segment("main_loop_2_1");
            while (k < otherRanks) {
                start_segment("main_loop_2_1_1");
                MPI_Sendrecv();
                end_segment("main_loop_2_1_1");
            }
        }

        start_segment("main_loop_2_2");
        end_segment("main_loop_2_2");
    }

    start_segment("main_1");
    MPI_Finalize();
    end_segment("main_1");
}

Figure 7 Segment Context Marking: We show a single function, main(), with the instructions added to mark the segment contexts. We mark an initial segment at the start of main, all loops that contain at least one function event, and code regions surrounding marked loops. The segment context names are hierarchical: the second loop is marked "main_loop_2_1" and its subloop is marked "main_loop_2_1_1". Segment marking is automated using a dynamic instrumentation library.

4.2.1 Trace Segmentation

We insert segment markers into the source code or program binary. We define segments as follows: the initial segment starts at entry to main; for each program loop containing at least one measured event, we stop the current segment before the loop starts, start a new segment at the top of each loop iteration, stop the segment at the bottom of the loop iteration, and start a new segment after the last iteration of the loop completes; and the final segment ends at program termination. The segment context is the section of code, for example, the main_loop_1_1 loop in Figure 7.

4.2.2 Intra-process Segment Comparison

For intra-process trace reduction, we compare the segments for each context pairwise to determine if they are similar. If they are, we say that the segments match and retain a single representative segment. Each segment si contains an ordered list of events Ei = {e0, e1, ..., em}. We maintain a list storedSegments, which contains the segments that represent the performance behaviors in the execution, and a list segmentExecs that holds the starting times and identifier of each representative segment so that we can later recreate a full trace. Given an equivalence operator ≈ for some similarity metric, and a segment snew that has events Enew, the algorithm for comparing segments is shown in Figure 8. Note that a segment match requires that the segments have the same context and the same number of events occurring in the same order.

For i = 0 to len(Enew):
    Enew[i].start = Enew[i].start - snew.start
    Enew[i].end = Enew[i].end - snew.start
snew.end = snew.end - snew.start
match = False
For i = 0 to len(storedSegments):
    sstored = storedSegments[i]
    match = compareSegments(snew, sstored)
    If match = True:
        segmentExecs = segmentExecs ∪ (sstored.id, snew.start)
        break
If not match:
    snew.id = getNewId()
    segmentExecs = segmentExecs ∪ (snew.id, snew.start)
    snew.start = 0
    storedSegments = storedSegments ∪ snew

Boolean compareSegments(snew, sstored):
    If snew.context ≠ sstored.context: return False
    If len(Enew) ≠ len(Estored): return False
    For i = 0 to len(Enew):
        If Enew[i].id ≠ Estored[i].id: return False
    If snew ≈ sstored: return True
    Else: return False

Figure 8 Algorithm for Intra-process Segment Matching

Figure 9 Intra-process Segment Matching: Here we show a portion of an example trace and three segments to illustrate segment matching. The top bar represents a portion of a trace for the program in Figure 7. Time increases from left to right, and time values are indicated above the bar. Segment markers are shown as light gray rectangles with vertical text that indicates the context of the segment. Events are shown in white boxes. Below the trace, we show the result of segmentation. In each of the three segments, the time stamps for the events and the ending times of the segments are adjusted relative to the start time of the segment. We name the segments s0, s1, and s2. In the bottom row, we show two examples of segment matching (see Section 4.3).
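The matching loop of Figure 8 can be rendered in Python. This is our own minimal sketch, not the tool's implementation; the equivalence operator ≈ is passed in as a `similar` callback, and segments are plain dictionaries:

```python
def compare_segments(s_new, s_stored, similar):
    """A match requires the same context and the same events in the
    same order; `similar` then applies the chosen difference method."""
    if s_new["context"] != s_stored["context"]:
        return False
    if len(s_new["events"]) != len(s_stored["events"]):
        return False
    for e_new, e_old in zip(s_new["events"], s_stored["events"]):
        if e_new["id"] != e_old["id"]:
            return False
    return similar(s_new, s_stored)

def add_segment(s_new, stored_segments, segment_execs, similar, next_id):
    """Intra-process merging step for one newly completed segment."""
    # Rebase all times so they are relative to the segment start.
    for e in s_new["events"]:
        e["start"] -= s_new["start"]
        e["end"] -= s_new["start"]
    s_new["end"] -= s_new["start"]
    for s in stored_segments:
        if compare_segments(s_new, s, similar):
            # Behavior already represented: record the execution only.
            segment_execs.append((s["id"], s_new["start"]))
            return
    s_new["id"] = next_id()
    segment_execs.append((s_new["id"], s_new["start"]))
    s_new["start"] = 0
    stored_segments.append(s_new)
```

With a permissive `similar`, two executions of the same context collapse to one stored segment and two entries in the execution list.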

4.2.3 Inter-process Segment Comparison

For inter-process trace reduction, we compare the stored segment lists that were collected for each process. Initially, each trace profile contains data for a single process group, each of which only contains data for a single rank. Given two trace profiles with equal numbers of segments, we compare each pair of segments in order and determine if they are similar. If all segments in both traces are deemed similar, we say that the trace profiles match, add the new process rank identifier into the process group, and retain a single representative trace profile for the process group. After comparing all trace profiles, we end up with a set of representative trace profiles, one for each process group. We give an example of trace matching in Figure 10. In addition to comparing event measurements, we also check message passing parameters: source/target rank, bytes transferred, message tags, and communicators. All parameters save the source/target rank must be identical; the source/target rank can be either the same offset, e.g. rank+1 in a nearest-neighbor communication pattern, or the same rank, e.g. all ranks send to rank 0.

Figure 10 Inter-process Segment Matching: The top and bottom bars represent traces for different ranks of the program in Figure 7. Time values on the bars increase from left to right. Segment markers are gray rectangles with text that tells the segment context. Events are white boxes. Between the traces, we show the result of segmentation. We name the segments s0.x and s1.x; x indicates the rank that wrote the trace. In the segments, the time stamps for the events and segment end times are adjusted relative to the segment start time. To decide matching, we examine the segments pairwise in order, comparing segment start times and all event timings.

To compare two process groups P0 and P1, with respective segment execution lists SE0 and SE1, where SEi = {id0:time0, id1:time1, ..., idk:timek}, and stored segment lists S0 and S1, where Si = {s0, s1, ..., sm}, we follow the algorithm in Figure 11.

Boolean compareProcessGroups(P0, P1):
    SE0 = P0.segmentExecs
    SE1 = P1.segmentExecs
    If len(SE0) ≠ len(SE1): return False
    For i = 0; i < len(SE0); ++i:
        id0:time0 = SE0[i]
        id1:time1 = SE1[i]
        If time0 ≉ time1: return False
        S0 = P0.segments
        S1 = P1.segments
        s0 = S0[id0]
        s1 = S1[id1]
        match = compareSegments(s0, s1)
        If not match: return False
    return True

Boolean compareSegments(s0, s1):
    If s0.context ≠ s1.context: return False
    If len(E0) ≠ len(E1): return False
    For i = 0 to len(E0):
        If E0[i].id ≠ E1[i].id: return False
    If s0 ≈ s1: return True
    Else: return False

Figure 11 Algorithm for Inter-process Matching
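The Figure 11 logic, plus the greedy merging it supports, can be sketched as follows. This is our own rendering; `seg_match` and `times_close` stand in for the segment comparison and the timestamp ≈ test:

```python
def compare_process_groups(p0, p1, seg_match, times_close):
    """Two process groups match if their segment execution lists have
    the same length, pairwise-close start times, and matching segments."""
    se0, se1 = p0["segment_execs"], p1["segment_execs"]
    if len(se0) != len(se1):
        return False
    for (id0, t0), (id1, t1) in zip(se0, se1):
        if not times_close(t0, t1):
            return False
        if not seg_match(p0["segments"][id0], p1["segments"][id1]):
            return False
    return True

def merge_groups(groups, group_match):
    """Fold each group into the first representative it matches,
    keeping one copy of the performance data per behavior class."""
    reps = []
    for g in groups:
        for r in reps:
            if group_match(r, g):
                r["ranks"].extend(g["ranks"])  # absorb the rank only
                break
        else:
            reps.append(g)
    return reps
```

After merging, each representative's rank list records every process whose behavior it summarizes, as in Figure 6.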

4.3 Trace Profile Segment Comparison Methods

We used several methods to decide the similarity of segments. Each of these is described below. Our choices were inspired by methods used by other researchers to reduce traces (See Chapter 3). They fall into two categories: distance methods and iteration-based methods.

4.3.1 Distance Methods

The distance methods produce a difference measure, which is then compared against a user-supplied threshold to determine the presence or absence of a match.

Several of the distance methods are standard methods for computing distances between values and sets of values. We use the relative difference (relDiff), the absolute difference (absDiff), three variations on the Minkowski distance (Manhattan, Euclidean, Chebyshev), and wavelet transforms (avgWave, haarWave).

4.3.1.1 Relative Difference

We compare the relative difference between each pair of event measurements against a user-defined threshold; if it is greater, the events are not equal:

relDiff(x1, x2) = |x1 − x2| / max(x1, x2)    Eq. 1

To see how relDiff matches segments, we consider our example in Figure 9. We compute the relative differences between each of the paired measurements in the segments. If any are above our chosen threshold, say 0.5, then the match fails. Comparing s2 with s1, we first compare the start times of the do_work event: x1=1 and x2=1, with relative difference 0. Since the relative difference is less than 0.5, we continue computing relative differences. Next we check the end times for the do_work event. Here we compute a relative difference with x1=17 and x2=40, giving 0.58. This is above our threshold, so the segments do not match.

When we compare s2 with s0, we find that no relative difference is greater than 0.15 (x1=17, x2=20), so the segments match. The new segment is discarded since its behavior is reflected in the measurements in s0.

The relative difference function compares each measurement with its paired counterpart in isolation. The computed difference is proportional to the magnitude of the paired measurements, meaning that larger differences between larger measurements don't overshadow differences in smaller measurements. Because the difference between each measurement pair is judged in isolation, the relative difference should be one of the strictest difference criteria in our set. The choice of threshold will have a large bearing on the degree of matching, and hence on the reduction in file size.

One problem with relDiff appears when comparing time stamps in a series. For example, assume the threshold for comparing time stamps is 0.25. When we compare events that start at times 1 and 2, the relative difference is (2 − 1)/2 = 0.5. This would result in a failure to match the events even though there is a difference of only one time unit between the events. In contrast, if we compare events that start at 100 and 125, the relative difference is 0.2, which is a match even though there is a difference of 25 time units. We expect relDiff to produce reduced traces with a low amount of error, but with less file size reduction.
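Eq. 1 is direct to implement. A minimal sketch, in our own code (we guard the equal-values case, which the formula leaves undefined when both values are zero):

```python
def rel_diff(x1, x2):
    """Eq. 1: relative difference between two paired measurements."""
    if x1 == x2:
        return 0.0
    return abs(x1 - x2) / max(x1, x2)
```

On the worked example, rel_diff(17, 40) is about 0.58, above the 0.5 threshold, while rel_diff(17, 20) = 0.15, and the timestamp pitfall shows as rel_diff(1, 2) = 0.5 versus rel_diff(100, 125) = 0.2.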

4.3.1.2 Absolute Difference

As with relDiff, each measurement is compared with its counterpart. A fixed-size difference, determined by a threshold, is allowed for each measurement pair. Using our example segments in Figure 9, and a threshold of 20, we see that s2 will not match s1, because the end times of do_work are 23 time units apart. However, there are no differences larger than 3 between s2 and s0, so those two segments match. The threshold choice has an impact on file size and accuracy. We expect this method to produce fairly accurate results, especially with respect to the timing of events across processes, because unlike relDiff it will not have an unfair bias towards events that occur later in the trace.
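A minimal absDiff sketch, in our own code, with segments represented simply as aligned vectors of measurements:

```python
def abs_diff_match(seg_a, seg_b, threshold):
    """absDiff: match only if every paired measurement differs by at
    most a fixed threshold."""
    return all(abs(a - b) <= threshold for a, b in zip(seg_a, seg_b))
```

With the Figure 9 measurement vectors, (49, 1, 17, 18, 48) fails against (51, 1, 40, 41, 50) at threshold 20 (the do_work end times differ by 23), but matches (50, 1, 20, 21, 49).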

4.3.1.3 Minkowski Distance

We compute the Minkowski distance between segments using the formula in Eq. 2. If the distance is greater than a user-specified threshold multiplied by the maximum value in the event measurements, then the events are not equal. The Manhattan, Euclidean, and Chebyshev distances are special cases of the Minkowski distance, with m equal to 1, 2, and the limit as m → ∞, respectively [25]. The Chebyshev distance is defined to be the largest difference between two measurements.

dist(X, Y) = (Σ_{i=1}^{n} |x_i − y_i|^m)^(1/m)    Eq. 2

Using our example in Figure 9, to compare s2 and s1, we create a vector of the measurements for s2, (49, 1, 17, 18, 48), and one for s1, (51, 1, 40, 41, 50). The Manhattan, Euclidean, and Chebyshev distances between these vectors are 50, 32.6, and 23, respectively. The largest measurement in the pair of vectors is 51. If we choose a threshold of 0.2, then the highest the computed distance can be for a match is 10.2, so s2 and s1 will not match using any of the Minkowski distances. When we compare s0, (50, 1, 20, 21, 49), with s2, we get distances of 8, 4.5, and 3. The maximum value in the two vectors is 50, so the highest the distances can be for a match is 10. This means that s2 would match s0 for each of these distance metrics.
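Eq. 2 and the threshold test can be sketched as follows; this is our own code, with m = infinity giving the Chebyshev case:

```python
def minkowski(x, y, m):
    """Eq. 2: Minkowski distance between measurement vectors."""
    if m == float("inf"):                       # Chebyshev limit
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)

def minkowski_match(x, y, m, threshold):
    # Segments match if the distance does not exceed the threshold
    # times the largest measurement in either vector.
    return minkowski(x, y, m) <= threshold * max(max(x), max(y))
```

Running it on the Figure 9 vectors reproduces the distances quoted above: 50, about 32.6, and 23 for s2 versus s1, and 8, about 4.5, and 3 for s0 versus s2.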

There are several issues to consider for the Minkowski distances:

• As m increases in the Minkowski distance (see Eq. 2), the influence of the larger differences increases, and the influence of the smaller differences decreases. In the extreme case of the Chebyshev distance, only the maximum difference has any bearing on the distance value.
• As the number of measurements being compared increases, the values of the Manhattan and Euclidean distances increase. Given vectors of constant differences greater than 1, the Manhattan distance increases quite rapidly (linearly), and the Euclidean distance increases in the manner of √x. If the differences are all between 0 and 1, the computed distances increase more slowly.
• When time stamp values are being compared, e.g. start time and end time for events, the values are always increasing within a segment. This means that longer segments are judged less critically than shorter segments, because the maximum values that are compared with the distance measurement are larger.

Based on these trends, we expect that the Manhattan distance would give the most accurate results, because it gives larger weight to the smaller differences. The Euclidean distance would give slightly less accurate results, given its bias towards larger differences. The Chebyshev distance would be least accurate, because it only accounts for the largest difference measure.

4.3.1.4 Wavelet Transform

The discrete wavelet transform iteratively decomposes a signal of size L into two subsignals of size L/2. The first L/2 values give the trends in the original signal, and the second L/2 values give the fluctuations. Intuitively, it computes the averages and differences between pairs of numbers [31]. We give examples of transformations in Figure 12.

We use two wavelet transforms in our experiments: the average transform described in Figure 12 (avgWave), and the Haar transform (haarWave). The Haar transform is very similar to the average transform, with the only difference being that the averages and differences are multiplied by √2 [68]. For example, the trends computed in step 3 in Figure 12 would be (9√2, 24.25√2). For our implementation, we construct a vector for each of the segments to be compared. The first element of each vector is the relative start time of the segment, which is 0 in all cases. This is followed by the event entry and exit time stamps for all events in the segment. The last element is the exit time of the segment. Both transforms require an input vector with a length that is a power of two. We allocate space for the vector so that its length is the next power of two after the number of time stamps in the vector. We zero-pad the vector after the last time stamp element to the end. To compare transformed vectors, we compute the Euclidean distance between them [12] and compare it against a threshold multiplied by the largest value in the pair of transformed vectors. In Figure 12, we show an example comparison of the segments s0 and s2 from Figure 9. Because the computed Euclidean distance, 1.9, is less than the maximum allowed, 3.5, s0 and s2 match.

s0 transform: (17.625, -7.125, -10.0, 24.75, -0.5, -0.5, -0.5, 0)
s2 transform: (16.625, -7.625, -8.5, 24.25, -0.5, -0.5, -0.5, 0)
EuclideanDist = 1.9 = √(1² + 0.5² + (−1.5)² + 0.5² + 0² + 0² + 0² + 0²)
EuclideanMax = 3.5 = 0.2 × 17.625

Figure 12 Wavelet Transform Example: Here we show two example average wavelet transforms. We iteratively compute averages (shown in boxes) and differences (shown between edges) for pairs of numbers, starting with the original vector. To compare the two transforms of s0 and s2, we compute the Euclidean distance between them and compare it against a threshold (0.2) multiplied by the largest element in the vectors (17.625).
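The average transform can be sketched directly. The code below is our own; the input vector (0, 1, 20, 21, 49, 50, 0, 0), i.e. segment start, the event entry and exit stamps of s0, the segment exit time, and zero padding, is what the s0 transform in Figure 12 implies:

```python
def avg_wavelet(v):
    """avgWave sketch: full average-wavelet pyramid of a vector whose
    length is a power of two.  The coarsest trend comes first, followed
    by the fluctuations of each level, as in Figure 12."""
    out = list(v)
    n = len(out)
    while n > 1:
        half = n // 2
        trends = [(out[2 * i] + out[2 * i + 1]) / 2.0 for i in range(half)]
        diffs = [(out[2 * i] - out[2 * i + 1]) / 2.0 for i in range(half)]
        out[:n] = trends + diffs
        n = half
    return out
```

Applied to that vector, it reproduces the s0 transform (17.625, -7.125, -10.0, 24.75, -0.5, -0.5, -0.5, 0); the Haar variant additionally multiplies each level's averages and differences by √2.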

For both transforms, the values in the transformed vectors will be smaller than the values in the original vectors. The Haar transform has several properties that the average transform does not, including preservation of the Euclidean distance [12]. However, its values will be larger than those of the average transform since all values are multiplied by √2. For the Haar transform, we expect more accurate results than from the Euclidean distance because the maximum value in the transformed vector will be smaller than the maximum value in the original vector, so the threshold test will be stricter. The values in the vector from the average transform will be smaller still; however, the Euclidean distance is not preserved, so the potential exists for a less strict test than the Euclidean distance.

4.3.2 Iteration-based Methods

We include two iteration-based methods: iter_k and iter_avg.

4.3.2.1 Keep K Iterations

For iter_k, we only keep a fixed number of copies of each traced segment of code. We expect this method to produce small data files. For our example in Figure 9, if k=3, we would keep all three copies of the main.1 segment in the list of stored segments. However, if k=2, then we would keep s0 and s1 and discard s2.

4.3.2.2 Keep Average Iterations

The iter_avg method keeps the average measurements for each traced section of code. We expect this method to produce the smallest data sizes, since segments with the same context and same events will always match. To illustrate this method, we use the segments in Figure 9 and the stored segments scenario on the left. For this method, we never have more than one copy of the main.1 segment, and end up with a single copy of the main.1 segment that contains averages of the values of s0, s1, and s2.

We expect that these methods will produce fairly accurate data for applications that have little behavioral variability, but will perform poorly for applications that do have performance variability.
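Both iteration-based methods are simple to sketch in Python. This is our own illustration, with segment executions carrying a context name and a vector of measurements:

```python
def iter_k(executions, k):
    """iter_k sketch: keep at most k stored copies per segment context."""
    kept = {}
    for seg in executions:
        copies = kept.setdefault(seg["context"], [])
        if len(copies) < k:
            copies.append(seg)
    return kept

def iter_avg(executions):
    """iter_avg sketch: keep one running average per segment context."""
    avg, counts = {}, {}
    for seg in executions:
        c = seg["context"]
        n = counts.get(c, 0)
        if n == 0:
            avg[c] = list(seg["values"])
        else:
            avg[c] = [(a * n + v) / (n + 1)
                      for a, v in zip(avg[c], seg["values"])]
        counts[c] = n + 1
    return avg
```

For three executions of one context, iter_k with k=2 keeps two copies, while iter_avg keeps a single averaged copy.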

4.4 Traditional Trace and Trace Profile Size Models

In this section, we present models that predict the amount of data collected for trace profiling as well as traditional tracing. We illustrate the models with a small example and extrapolate the results to higher scales.

4.4.1 Traditional Trace

A traditional trace contains an event map and a list of event and message records for each process. We model the size of a traditional trace with the following equation:

fullTraceSize = Σ_{i=1}^{P} (EventMap + EventData + MessageData)    Eq. 3

where P is the number of processes in the run. The sizes of the EventMap, EventData, and MessageData terms are modeled by the following equations:

EventMap = En(Is + Ns)    Eq. 4
EventData = EcEs    Eq. 5
MessageData = McMs    Eq. 6

The meanings of the symbols in these equations are given in Table 2. The size of the event map is the product of the number of unique events (En) and the sum of the size of the event identifiers (Is) and the size of the event names (Ns). For simplicity, we use a single number for the length of the event names, the median length. The amount of event data for each process is the product of the number of events in the file (Ec) and the size of each event record (Es). The amount of message data is the number of messages in the file (Mc) multiplied by the size of the message record (Ms). We see that the size of a traditional trace file will be determined by the number of processes and the event and message counts for the processes.
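Eq. 3-6 are straightforward to evaluate. Below is our own sketch, using the TAU-based record sizes from the text (Es = 48 and Ms = 96 bytes) and an assumed 4-byte event identifier:

```python
def full_trace_size(e_counts, m_counts, e_n, i_s=4, n_s=39, e_s=48, m_s=96):
    """Eq. 3: per-process event map (Eq. 4), event data (Eq. 5), and
    message data (Eq. 6), summed over all processes.  Returns bytes."""
    total = 0
    for e_c, m_c in zip(e_counts, m_counts):   # one term per process
        total += e_n * (i_s + n_s)             # Eq. 4: EventMap
        total += e_c * e_s                     # Eq. 5: EventData
        total += m_c * m_s                     # Eq. 6: MessageData
    return total
```

With the random-barrier inputs used later (Ec = [550, 250, 250, 250], Mc = [200, 50, 50, 50], En = 21, Ns = 39), this evaluates to 99,612 bytes, close to the 99.4 KB the model predicts; the small gap comes from our assumed identifier size.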

Table 2 Symbols for Full Trace and Trace Profile Models

Symbol   Meaning
Ns       The size of strings representing the names of events or segments
Is       The size of the event or segment identifier
En       The number of unique events in the execution
P        The number of processes or process groups in the file
Rc       The number of ranks in a process group
Rs       Size of rank representation
Sn       The number of unique segments per process group
Se       The count of segment executions per process group
Ts       The size of a timestamp
Ec       The number of events
Es       Amount of data stored per event
Mc       The number of messages
Ms       Amount of data stored per message
Hs       Size of headers
Hsd      Size of segment data header

We use values for the sizes of event identifiers, event data, and message data based on those used by the TAU tool, configured for function entry and exit tracing and for gathering message passing data. TAU generates 24-byte records for each entry and exit event, so Es = 48. For message passing events, 96 bytes of data are generated, so Ms = 96. When we evaluate our model, we use slightly different event counts than an actual TAU trace would contain. We exclude several record types to ensure a fairer comparison between traditional tracing and trace profiling. The event records we exclude are:

• Records for any functions that are not included in trace profiling segments. For example, the function main is not included in any segment, so it is not included in the model for traditional tracing.
• Records for segment markers that we inserted into the code. Each segment marker generates entry and exit events.
• Any minor administrative records, e.g. TAU's EVINIT or FLUSH_CLOSE events.

4.4.2 Trace Profile

A trace profile contains a single event map, followed by data for one or more process groups. Each process group has a rank list, a segment map, a segment execution list, and data for segments. For parsing purposes, we added section headers to the file that indicate the type and number of records that follow the header. We show the format of a trace profile in Figure 13.

We use the following function to predict the size of a trace profile:

traceProfileSize = Hs + EventMap + Σ_{i=1}^{P} (RankList + SegmentMap + SegmentExecList + Segments)    Eq. 7

The definitions for the symbols used in the equations are in Table 2. The equation to compute the size of the event map in a trace profile is the same as for the traditional trace file (see Eq. 4). The equations for computing the sizes of the rank list, the segment map, the segment execution list, and the segments are given below:

RankList = Hs + RsRc    Eq. 8
SegmentMap = Hs + (Ns + Is)Sn    Eq. 9
SegmentExecList = Hs + (Ts + Is)Se    Eq. 10
Segments = Hs + Σ_{j=1}^{Sn} (Hs + Hsd + EventData + MessageData)    Eq. 11

SECTION HEADER (Event Map, N)
  Event Map Entries
SECTION HEADER (Process Group, N)
  SECTION HEADER (Rank List, N)
    Rank List Entries
  SECTION HEADER (Segment Map, N)
    Segment Map Entries
  SECTION HEADER (Segment Execution List, N)
    Segment Execution List Entries
  SECTION HEADER (Segment List, N)
    SECTION HEADER (Segment, N)
      Segment Header
      Event Entries

Figure 13 Trace Profile Format: This figure shows the format of a trace profile. Each section header tells the type of data that will follow it and how many entries of that type to expect (N). There is one event map per trace profile, followed by data for one or more process groups. Each process group has a rank list, a segment map, a segment execution list, and a list of the segments. The section header for each segment tells how many event entries to expect. The segment header gives the segment identifier and its duration.

The equations for computing the size of the event data and message data are the same as for the traditional trace and are given in Eq. 5 and Eq. 6. The size of a trace profile will largely depend on the degree of merging for process groups and segments, and on the amount of event and message data collected for each segment. If no process groups or segments merge, the size of the trace profile will scale with the number of processes in the run and the number of events and messages for each process, like a traditional trace. If segments merge, but process groups do not, the trace profile size will scale with the number of processes. If process groups merge, but segments do not, then the size of the trace profile will depend on the amount of event and message data collected in the segments. If there is both segment and process group merging, then the size of the trace profile will depend on the number of performance behaviors in the execution. The sizes of the fields in a trace profile are shown in Table 3. We use values from this table for evaluating our model.

Table 3 Sizes of Fields in Trace Profiling Data Structures

  Data structure           Field                               Type        Size (bits)
  Section Header           id                                  char        8
                           count                               int         32
  Event Map                id                                  int         32
                           size                                short int   16
                           name                                char array  N_s
  Rank List                id                                  int         32
  Segment Map              id                                  int         32
                           size                                char        8
                           name                                char array  N_s
  Segment Execution List   id                                  int         32
                           time stamp                          double      64
  Segment Data Header      id                                  int         32
                           duration                            double      64
  Event Data               enter and exit markers              char        8
                           id                                  int         32
                           relative start                      double      64
                           duration                            double      64
                           message data                        char        8
  Message Data             type (send/recv) and src/dest rank  int         32
                           bytes                               int         32
                           tag                                 short int   16
                           comm                                short int   16

4.5 Traditional Trace and Trace Profile Size Comparison Using Models

Our example program for size comparison is random-barrier, a simple MPI benchmark from the PPerfMark suite [43]. The random-barrier program has a single main loop. In each iteration of the main loop, a rank is chosen at random to be the bottleneck and cause the other ranks to block in MPI_Barrier. We manually

E_c = [550, 250, 250, 250]; E_n = [21, 21, 21, 21]; N_s = 39; P = 4; M_c = [200, 50, 50, 50]; fullTraceSize = 99.4 KB

Figure 14 Input Data to Traditional Trace Size Model

partitioned the program into three segments: init, mainloop, and finalize, using TAU's phase begin and end events. We ran the benchmark with four MPI processes for 50 iterations, resulting in 1 execution per process of init, 50 executions per process of mainloop, and 1 execution per process of finalize. Init has 44 function calls and 0 messages; mainloop has 10 function calls and 4 messages in rank 0, and 4 function calls and 1 message in ranks 1-3; finalize has 6 function calls and 0 messages. We generated a full trace of the execution using TAU.

4.5.1 Traditional Trace

We computed the size of a traditional trace for this four-process run of random-barrier using our model. The inputs to the model are shown in Figure 14, resulting in a predicted size of 99.4 KB, shown in Table 4. The actual size of the full amount of

N_s = 39; E_n = 23; P = 4; R_c = [1 1 1 1]; S_n = [5 5 5 5]; S_e = [52 52 52 52]

E_c = [44 10 10 10 6; 44 4 4 4 6; 44 4 4 4 6; 44 4 4 4 6]

M_c = [0 4 4 4 0; 0 1 1 1 0; 0 1 1 1 0; 0 1 1 1 0]

traceProfileSize = 10.9 KB

Figure 15 Inputs to Trace Profile Size Model

data generated was 109.7 KB. The differences arise from two sources: we estimate the size of the event files by using the median size of the strings describing the events; and we exclude administrative events, e.g. FLUSH_CLOSE, and segment marker events from both the event file size and the trace file size. The difference in the sum of the event file sizes is 2493 bytes, and the difference in the trace file sizes is 13053 bytes, with 9984 bytes accounted for by segment boundary markers. The size of the full trace is broken down by rank and portion of code in Table 4.

Table 4 Sizes of Trace Profile and Full Trace

  Program section   Data type                          Trace profile   Full trace
                                                       size (bytes)    size (bytes)
  Whole execution   Section Headers (Event Map
                    and Process Group)                 10              0
                    Event Map                          1012            5769
  Rank 0            Rank List                          9               0
                    Segment Map                        225             0
                    Segment Execution List             629             0
                    Section Header (Segment List)      5               0
    init            Segment Header                     17              0
                    Event Data                         1012            2112
                    Message Data                       0               0
    mainloop        Segment Header                     51              0
                    Event Data                         690             24000
                    Message Data                       144             19200
    finalize        Segment Header                     17              0
                    Event Data                         138             288
                    Message Data                       0               0
  Ranks 1-3         Rank List                          27              0
  combined          Segment Map                        675             0
                    Segment Execution List             1887            0
                    Section Header (Segment List)      15              0
    init            Segment Header                     51              0
                    Event Data                         3036            6336
                    Message Data                       0               0
    mainloop        Segment Header                     153             0
                    Event Data                         828             28800
                    Message Data                       108             14400
    finalize        Segment Header                     51              0
                    Event Data                         414             864
                    Message Data                       0               0
  Total (KB)                                           10.9            99.4

4.5.2 Trace Profile

Using a post-mortem prototype trace profiler, we generated a trace profile from the full TAU trace of the execution. The difference operator we used was the Euclidean distance with an event difference threshold of 0.25. The end result was a 10.9 KB file containing data for four process groups, meaning that the behavior of each of the processes was different enough that they were not combined. Each process group contained the same segment count: 1 init, 3 mainloop, and 1 finalize, indicating that of the 50 iterations of mainloop, 3 were found to be representative of the behavior of the process for that segment. In Figure 15 we show the inputs that we fed into our trace profile size model. The computed size was 11204 bytes, or 10.9 KB, shown in Table 4. The real size of the trace profile was 10318 bytes, or 10.1 KB. The differences in the actual and computed sizes of the trace profile are due to the estimation of the sizes of strings by the parameter N_s. The differences between the actual and computed event and segment map sizes were 250 and 636 bytes, respectively.
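This reduction step can be sketched as threshold-based matching against retained representatives. The vector contents and names below are illustrative assumptions, not the prototype's exact implementation; in particular, how segment measurements are normalized before comparison is elided here.

```python
# Sketch: collapse repeated segment executions to representatives using
# the Euclidean distance with a matching threshold (0.25 as in the text).
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reduce_segments(segments, threshold=0.25):
    """Keep one representative per behavior pattern, counting how many
    executions each representative stands for."""
    reps = []  # list of [vector, count]
    for seg in segments:
        for rep in reps:
            if euclidean(seg, rep[0]) <= threshold:
                rep[1] += 1
                break
        else:
            reps.append([seg, 1])
    return reps

# 50 mainloop executions with two distinct behaviors collapse to two
# representatives under the 0.25 threshold (hypothetical measurements).
runs = [[1.0, 0.5]] * 30 + [[1.3, 0.8]] * 20
print(len(reduce_segments(runs)))  # -> 2
```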

4.5.3 Comparison of Traditional Trace and Trace Profile

Now we use our model to extrapolate the sizes of the trace profile and traditional trace to 64K processes for the random-barrier example, assuming the conditions of the four-process run. We show how the sizes grow with increasing processes in Figure 16. We extrapolate three scenarios for the trace profile. In the first scenario, we assume that no segments or process groups merge ("No Merge"). In the second, there are always two process groups, one containing rank 0, and the other containing the rest of the ranks; and each process group has three segments, one init, one mainloop, and one finalize ("Total Merge"). For the last, we assume the conditions that using the Euclidean distance gave us for the four-process run: each rank is in its own process group and each process group contains five segments: 1 init, 3 mainloop, and 1 finalize ("Euclidean Distance"). The size of the traditional trace file reaches 1 GB at 64K processes, while the size of the "No Merge" trace profile is approximately 620 MB. This means that even if no processes or event patterns were found to be similar in the execution, the size of the resulting file will still be smaller than a full traditional trace. The size difference is largely due to the fact that a trace profile does not write

Figure 16 Traditional Trace and Trace Profiling Sizes for Random-Barrier. Here we show the predicted sizes of full traces and trace profiles of the random-barrier program for executions with up to 64,000 processes. The solid blue line ("Full Trace") shows the predicted size of the full trace in MB. The dotted red line ("No Merge") shows the predicted size of a trace profile if no processes and no segments were found to have behaved similarly. The dashed green line ("Euclidean Distance") shows the size of the trace profile if no process groups merged, and each process group had five segments (1 init, 3 mainloop, and 1 finalize). The solid aqua line ("Total Merge") shows the size of the trace profile if there were always two process groups and total merging of segments.

complete, separate event records for function entries and exits; instead, it records the start time and duration with the entry record and marks the event exit with a record that uses only a single byte of storage. The "Total Merge" and "Euclidean Distance" trace profiles reach 0.25 MB and 147 MB, respectively, at 64K processes.

4.6 Trace Profiling and Visualization

The trace profiling technique can also address the scalability challenges in trace visualization and analysis. First, the total amount of data has been reduced significantly, easing the memory and computation requirements of analysis tools.

Figure 17 Trace Profiler Visualization. An example trace profile visualization showing the percentage of time processes spent in temporally aligned behavior patterns (three segments at 80%, 15%, and 5%, over the functions Do_work_A, Do_work_B, Write_data, and MPI_Barrier).

Second, because the behavior patterns in the execution have already been extracted, the tool can easily

present partially analyzed information to the user, reducing the time taken to identify performance problems. A visualization of a trace profile could show the percentage of time processes in the execution spent in temporally aligned segments. Such a presentation could significantly reduce the time needed to diagnose performance problems compared to visually inspecting a full program trace for potentially very many processes. An example mock-up trace profile visualization is shown in Figure 17. This figure shows a representation of a multi-process execution. There were three groups of processes that behaved similarly: process group 0, process group 1, and process group 2. In the first segment (on the left), we see that 80% of the time, process group 0 spent more time in Do_work_B and was late to MPI_Barrier, causing the other ranks to block. In the middle segment, we see that 15% of the time, process group 0 was again late to MPI_Barrier, but was late because it executed Write_data and spent somewhat more time in Do_work_A than the other processes. In the segment on the right, we see that 5% of the time, the processes behaved roughly the same and reached MPI_Barrier at approximately the same time. Immediately, the user would be able to see that 95% of the time, process group 0 is causing the other processes to block in MPI_Barrier, and that 80% of the time it is due to a load imbalance in the Do_work_B function.

4.7 Summary

Trace profiling is a novel performance measurement technique for gathering event-based performance data. In this chapter, we first described the trace profiling technique and segment difference methods, followed by models that predict the sizes of trace profiles and traditional traces. We used the models and a simple benchmark to illustrate the possible data reduction achievable from using trace profiling over traditional traces. Our example showed that even if no merging occurs, the trace profile is still smaller than a full traditional trace, and that with the degree of merging we obtained when using our prototype implementation, our model predicted a size savings of a factor of 10 between trace profiling and a traditional full trace for 64K processes. Finally, we described how the reduced amount of data produced in a trace profile could ease the memory and computation requirements for analysis and visualization tools. We showed an example visualization of a trace profile, illustrating how it could potentially facilitate a user's understanding of the performance of programs more easily than visualizing an entire program trace.

In the next chapter, we present a study of methods for comparing traces and demonstrate that trace profiling can produce reduced traces that still retain the necessary information for correct performance diagnosis. Following this, we present our runtime implementation of a trace profiler and evaluate its overheads compared to traditional traces.

5 Trace Comparison Methods

In this chapter, we demonstrate that trace profiling can produce reduced traces that retain the information needed for correct performance diagnosis of programs. To do so, we perform a comparative study of similarity methods in current or proposed use for trace reduction. Using a post-mortem implementation of a trace profiler, we apply the similarity methods to the task of deciding segment matching and evaluate the methods for file size reduction, trace error, and retention of performance trends. Our goal is to determine a similarity method that yields adequate trace reduction and also retains the information needed for correct performance analysis. Achieving our goal required that we answer several key questions:

What metrics can we use to evaluate and compare trace similarity methods? In addition to file size reduction, we developed and used metrics for error, greatest possible file size reduction (i.e. potential for repeated patterns), and consistency of performance diagnosis.

How much error should be allowed? Values that will likely never be exactly equal need to be compared. We had to decide how much each measurement can vary, and weigh the consequences of the amount of error. If we are matching traces for the purpose of trace compression, then a larger allowed error between traces would mean a larger number of matches, and thus a smaller trace file. However, the larger error might prevent the correct performance diagnosis from being made.

How can we measure the "goodness" of each approach? Most trace compression studies report the reduction of file size achieved; but no matter how much compression is achieved, if the reduced trace no longer contains the data needed for accurate performance diagnosis, the method is not useful for our purpose. We evaluate each approach not just on amount of compression, but also on amount of error and consistency of diagnosis, and discuss the tradeoffs in weighting the different metrics.

5.1 Evaluation Methodology

In this section we detail our framework for the evaluation of similarity methods.

We investigate traces collected for a set of benchmarks with known behaviors, and for a full application, running on a Linux cluster. We apply our post-mortem trace profiler to full execution traces, varying the similarity method used to determine repeating patterns within the trace. We evaluate the methods for intra-process segment matching only, inter-process segment matching only, and combined intra- and inter-process segment matching. Our evaluation focuses on three metrics: file size reduction, amount of error in the trace, and retention of performance trends. For file size reduction we simply compare the sizes of the reduced traces to the full-sized traces from which they were derived. We calculate the trace error by recreating an approximated full-sized trace from the reduced version, then comparing it to the actual full trace. We evaluate retention of performance trends by feeding the actual and approximated full traces into a performance analysis tool and examining any differences in the results.

5.1.1 Benchmarks

We crafted our benchmarks to represent classes of performance behaviors that occur in parallel programs on high end systems. These performance behaviors can appear with a high degree of regularity, sporadically, or progressively change over the iterations in the execution. To reflect this, we created a set of regularly behaving benchmarks, a set of irregularly behaving benchmarks, and a benchmark that simulates dynamic load balancing. Because we know the behavior patterns in each benchmark, we can evaluate how well each of the methods retains the performance behaviors.

We used the APART Test Suite (ATS) to create our benchmarks. The ATS is a collection of utilities designed to create programs with known behavior for testing parallel performance tools [22]. We chose behavior patterns from the ATS that represent performance problems that require trace data for correct diagnosis. For parallel programs, these performance behaviors fall into four categories based on the communication pattern being used. We describe these communication patterns here using MPI functions as examples.

N → 1. N processes send data to 1 process. If any of the sending processes are late, then the receiving process blocks, waiting for them to execute the send operation. Example MPI functions for this pattern are MPI_Reduce and MPI_Gather, with corresponding performance behavior problems early_reduce and early_gather.

1 → N. 1 process sends data to N processes. If the sending process is late, then all N receiving processes will block until the send is executed. Example functions are MPI_Bcast and MPI_Scatter. The corresponding performance problems are late_broadcast and late_scatter.

1 → 1. 1 process sends to 1 process. There are two cases. In the case of a non-blocking send and a blocking receive, if the sending process is late, the receiving process will block. In the case of a synchronous send, the sending process will block if the receiving process is late. Example communication routines are MPI_Ssend and MPI_Recv, with corresponding performance problems late_receiver and late_sender.

N → N. N processes send to N processes. Here, all N processes depend on all other processes involved in the communication to proceed. If any of the N are late, then the rest of the processes block until all have reached the communication routine. An example is MPI_Barrier, with corresponding performance problem imbalance_at_mpi_barrier.

Benchmarks with Regular Behavior. We chose five example benchmarks provided with ATS with regular behavior: early_gather, imbalance_at_mpi_barrier, late_receiver, late_sender, and late_broadcast. Each of the benchmarks simulates a program with the given behavior problem with the same severity in each iteration. In other words, all iterations of each program will exhibit the performance problem and all iterations should be very similar. All runs had 8 processes.

We expect the similarity methods to do relatively well on this set of benchmarks since the iterations have regular behavior. They should be able to find a large number of segment matches and still retain the correct performance behaviors.

Benchmarks with Irregular Behavior. For this category, we used ATS to create new benchmarks with irregular behavior. The benchmarks simulate the system interference identified by Petrini et al. when they ran an application on ASCI Q [51]. The system interference prevented the application from scaling as predicted. The benchmarks contain iterations with work periods that last approximately 1 ms followed by a communication step, using the communication patterns described previously. The load for each process is constant in each iteration and across processes: the only performance problem comes from the interference. We simulated the system noise using timers to interrupt the processes as described by Petrini et al.

We used two simulation scenarios. The first was a 32-process run, with each of the 32 processes simulating the interrupts specific to the 32 nodes in an ASCI Q cluster. The second was also a 32-process run, but with the simulated amount of system interruptions that would occur if there were 1024 processes in the run. When we refer to the benchmarks in this category, we append either _32 or _1024 to the communication pattern, to indicate whether 32 or 1024 processes were simulated, respectively.

For these benchmarks, we expect the methods to find a high number of matches, since most iterations are very similar. However, it will be important that they don't falsely match undisturbed and disturbed iterations, as this has the potential to mask or amplify the periodic behavior changes due to the simulated interruptions.

Dynamic Load Balancing. Here, we used ATS to create a program that simulates an application that does dynamic load balancing. For this benchmark, the performance of the iterations starts at about 1 ms and gets progressively worse, with one-half of the processes doing more work in each iteration and the other half doing less work in each iteration, until the "load balancer" is triggered. The "load balancer" readjusts the amount of work on each processor to be equal. The performance problem exhibited by this program is imbalance_at_mpi_alltoall, which falls in the N-to-N communication category. This benchmark is referred to as dyn_load_balance and was run with 8 processes.

For this benchmark, we expect less overall matching since behavior changes with each iteration and very close performance behaviors reoccur only after each simulated load balance. Here it will be important that the similarity methods do not match segments with larger differences because the load imbalance may no longer be apparent in the reduced trace.

5.1.2 Application

We chose Sweep3D 2.2b, a structured mesh application that computes a 1-group time-independent discrete ordinates three-dimensional Cartesian geometry neutron transport problem [3]. Structured mesh applications have a regular partitioning of the data, where all interior data blocks have equal numbers of neighbors. It is likely that the performance will be very regular over the course of the program, which means that the reduction methods should be able to find a large number of segment matches without introducing a large amount of error. We collected traces for two runs of this application: an 8-process run with input file input.50, sweep3d_8p; and a 32-process run with input input.150, sweep3d_32p.

5.1.3 Instrumentation

We used the dynamic instrumentation library Dyninst [29] to instrument the full application for both function entry and exit tracing, as well as to insert segment begin and end markers. The simple benchmarks were marked manually. See Section 5.2.1 for details on program segmentation.

5.1.4 Evaluation Criteria

We chose four criteria to evaluate the methods: percentage of full trace file size, degree of matching, approximation distance, and retention of correct performance trends.

5.1.4.1 Percentage of Full Trace File Size

We present the savings in file size as a percentage of the full, non-reduced trace file, as a relative measure of size reduction. We expect iter_avg to perform the best in this category since it matches all segments with the same context, regardless of the measurement values in the segments.

5.1.4.2 Degree of Matching

The degree of matching metric is a measure of how many segment matches occurred. We define it to be the ratio of the number of matches to the number of possible matches. The number of possible matches is limited by the structure of the program. For example, some portions of the code may only execute one time, e.g. an initialization step, and will not match any other event sequence in the trace.
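The metric just defined can be sketched directly; the example counts below are hypothetical, not measured values from the study.

```python
def degree_of_matching(matches_found, matches_possible):
    """Ratio of segment matches found to matches possible; the possible
    count is limited by program structure (a once-only initialization
    segment can never match anything)."""
    return matches_found / matches_possible if matches_possible else 0.0

# e.g. 50 executions of one segment context allow at most 49 matches;
# reducing them to 3 representatives means 47 matches occurred.
print(degree_of_matching(47, 49))
```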

5.1.4.3 Approximation Distance

We estimate the error in the trace by recreating a full trace from the reduced trace and comparing each time stamp with its counterpart in the original full trace. The approximation distance metric gives the 90th percentile of absolute differences between paired measurements in the original and reduced traces.¹ For this metric, high values for iter_k and iter_avg mean that there is irregularity in the execution that is not being captured in the iterations that are retained. High values for absDiff give a rough indication of the absolute difference of time stamps from the true values in the full trace. High values for the Minkowski and wavelet methods mean that there are high maximum values in the set of values being compared, relative to the distance between those values.
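A sketch of this metric follows; the nearest-rank percentile handling is our assumption, since the dissertation does not specify an interpolation rule.

```python
def approximation_distance(original, recreated, pct=0.90):
    """90th percentile of absolute differences between paired time
    stamps in the original full trace and the recreated trace."""
    diffs = sorted(abs(a - b) for a, b in zip(original, recreated))
    return diffs[min(len(diffs) - 1, int(pct * len(diffs)))]

# Hypothetical time stamps for ten paired events.
orig = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
approx = [0.0, 1.1, 2.0, 3.2, 4.0, 5.0, 6.1, 7.0, 8.0, 9.5]
print(approximation_distance(orig, approx))  # -> 0.5
```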

5.1.4.4 Retains Correct Performance Trends

Arguably, the most important criterion for evaluating a trace matching metric for the purposes of performance analysis is deciding whether or not the reduced trace still indicates the same performance problems as the full trace. For example, if an analyst inspecting a full trace detects a late sender performance problem, the same problem should be detected in the reduced trace with approximately the same severity. The KOJAK tool set was developed to aid parallel performance analysts in the challenging task of performance diagnosis [71]. KOJAK's EXPERT tool reads in a trace file and produces a data file containing performance diagnoses. Each diagnosis consists of a

¹ When recreating full traces for the iter_k method, we used the last segment that executed of each pattern to fill in the segment executions that were not collected. Alternatives include using the average measurements from the k collected segments, or using the centroid of those k segments as determined by a clustering algorithm.

Figure 18 KOJAK and Derivation of Our Performance Diagnosis Representation. Here we show a screenshot of KOJAK's EXPERT tool displaying the performance diagnosis for dyn_load_balance. The color bar on the bottom shows the severity levels, with blue being low and red high, and gray indicating 0 or close to 0. The left panel shows the performance metrics; the middle panel shows the code locations; and the right panel shows the processes. The color blocks next to each metric, code location, and process show the severity for the selected combination. Above, we have selected the function MPI_Alltoall and the "Wait at N x N" metric. This combination has green or "medium-low" severity; the severity is close to 0 for ranks 4, 6, and 7 and fairly low for ranks 0-3 and 5. We represent this diagnosis by abbreviating the metric name, e.g. NN for "Wait at N x N," coloring the metric abbreviation according to the severity indicated in the code location pane, and coloring squares for each process according to their severity levels. White squares indicate negative severities. We show the abbreviations we use for selected KOJAK metrics in white rectangles next to the metric names.

metric, a code location, and a severity for each thread in the run [59]. KOJAK's CUBE tool reads in the analysis data and presents a visualization to the user, indicating the most important performance trends in the trace in a hierarchical manner.

We use the CUBE visualization tool to compare the performance diagnoses for the recreated traces against the diagnoses for the full trace (see Figure 18). We determine whether a performance analyst would come to the same conclusions about the reduced trace as the full trace. If not, then the reduced trace is not adequate for performance analysis. We admit that this is a subjective test; however, we followed a set of guidelines when deciding if the diagnoses were sufficiently similar, so all the methods were subjected to the same criteria.

5.2 Intra-process Reduction Evaluation Studies

In this section, we present the results of two studies evaluating the similarity methods for intra-process segment matching using the criteria and programs described in Sections 5.1.1 and 5.1.2. We first present a threshold study for the similarity methods from the distance metric category. From this study, we choose a threshold for each of these methods that represents the best tradeoff in terms of file size reduction, measurement error, and retention of performance trends. In the second study, we present the results of a comparative study of the similarity methods, using the thresholds found to be best for each method in the threshold study.

5.2.1 Threshold Study

We investigated the behavior of the methods in reducing the traces of the benchmarks while varying the thresholds that determine whether two given segments should match or not match. The thresholds for relDiff, the Minkowski distances, and the wavelet transforms were 0.1, 0.2, 0.4, 0.6, 0.8, and 1.0. The thresholds for iter_k were 1, 10, 50, 100, 500, and 1000, and for absDiff were powers of 10 from 10^1 to 10^6. Since no thresholds are used with the iter_avg method, it was not included in this study. The criteria we used to evaluate the methods were file size, approximation distance, and retention of performance trends (for full results, refer to the Appendix).

For each method, we chose a representative threshold to be used when comparing the methods against each other.

relDiff. The file size for each benchmark and the sweep3d runs decreased relatively steadily with increasing threshold. The approximation distance remained small until the 0.8 threshold, after which there was a large jump for many of the benchmarks and sweep3d_32p. Performance trends were correctly retained for most programs up to a threshold of 0.8. Based on the jump in approximation distance and loss of performance trends after threshold 0.8, we chose 0.8 as the best threshold for relDiff.

absDiff. Here the file sizes for the benchmarks and sweep3d dropped off fairly quickly at a threshold of 100 and continued to decrease slightly with increasing threshold. The approximation distance stayed relatively low up to a threshold of 10^4, after which there was a sharp increase for several of the benchmarks and sweep3d_32p. Performance trends were retained for most programs at thresholds of less than 10^3. Because the file sizes were relatively low and performance trends were retained at 10^3, we chose 10^3 as the representative threshold for absDiff.
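These two difference metrics can be sketched as follows. Requiring every paired measurement to pass the threshold is our reading of the matching rule, and the time unit for the absolute threshold is assumed, so the details below are illustrative rather than the study's exact definitions.

```python
def abs_diff_match(a, b, threshold=1e3):
    """Match if every paired measurement differs by at most an
    absolute threshold (e.g. the 10^3 chosen above)."""
    return all(abs(x - y) <= threshold for x, y in zip(a, b))

def rel_diff_match(a, b, threshold=0.8):
    """Match if every paired measurement differs by at most a fraction
    of the larger value (guarding against division by zero)."""
    return all(abs(x - y) <= threshold * max(abs(x), abs(y), 1e-12)
               for x, y in zip(a, b))

print(abs_diff_match([1200.0, 300.0], [1900.0, 450.0]))  # True: diffs 700, 150
print(rel_diff_match([1200.0, 300.0], [1900.0, 450.0]))  # True: ratios ~0.37, ~0.33
```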

Manhattan, Euclidean, and Chebyshev. When observing file size changes, the Manhattan and Euclidean methods behaved quite similarly; the Chebyshev method showed some differences. For the Manhattan and Euclidean methods with the regular benchmarks, the 1-to-1 irregular benchmarks, and sweep3d, file sizes decreased relatively steadily with increasing threshold; with the other irregular benchmarks, the file size decreased only slightly with increasing threshold, because a matching that was close to optimal was reached early, at a threshold of 0.1. For Chebyshev with the 1-to-1 irregular benchmarks and sweep3d, file size decreased with increasing threshold; with the regular benchmarks and remaining irregular benchmarks, file size was relatively constant with increasing threshold. For all three methods, we observed the following behavior in approximation distance: with the regular benchmarks, approximation distance was relatively constant with increasing threshold; with the 1-to-1 irregular benchmarks, approximation distance increased with increasing threshold; with the remaining benchmarks, the approximation distance remained low until after the threshold of 0.8, after which there was a large jump. For sweep3d with Manhattan and Euclidean, approximation distance increased with increasing threshold; for Chebyshev, the approximation distance was small and relatively constant until after the 0.8 threshold. For retention of performance trends, the Manhattan distance did well up to a threshold of 0.4, and the Euclidean and Chebyshev distances did well up to 0.2. We based our selection of best thresholds for these methods on the retention of performance trends metric, because we consider this metric to be the most important. We chose 0.4 as the best threshold for the Manhattan distance and 0.2 for the Euclidean and Chebyshev distances.

Wavelet Transforms. For all evaluation criteria, avgWave and haarWave performed similarly. For all programs, file sizes decreased with increasing threshold, up to the point of perfect matching, after which no further decrease in size is possible. For file size, the best threshold in this category appears to be 0.4 for both methods, because the decrease in file size levels off after this threshold. The approximation distance for both methods remained steady with increasing threshold for the regular benchmarks and the irregular N-to-1, N-to-N, and 1-to-N benchmarks. The approximation distance increased with increasing thresholds for the irregular 1-to-1 benchmarks and sweep3d. The threshold 0.2 is best for approximation distance, because of the relatively higher values for the dyn_load_balance benchmark and sweep3d after this threshold. For the majority of programs, performance trends were retained for both methods at thresholds below 0.2. For these reasons, we chose 0.2 as the best threshold for the wavelet transform methods.
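A haarWave-style comparison could build on the standard one-level Haar step below, which reduces a measurement series to coarse averages and detail coefficients; avgWave would instead use plain pairwise averaging. This is the generic transform, not the dissertation's exact implementation.

```python
# One level of the Haar wavelet transform over a measurement series.
import math

def haar_step(values):
    """Return (averages, details); len(values) must be even."""
    avgs = [(values[i] + values[i + 1]) / math.sqrt(2)
            for i in range(0, len(values), 2)]
    dets = [(values[i] - values[i + 1]) / math.sqrt(2)
            for i in range(0, len(values), 2)]
    return avgs, dets

# Hypothetical per-event durations for one segment execution.
avgs, dets = haar_step([4.0, 2.0, 5.0, 5.0])
print(avgs)  # coarse coefficients
print(dets)  # detail coefficients; 0.0 where a pair is identical
```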

iter_k. Generally speaking, there was an increase in file size and a decrease in approximation distance with increasing k. Performance trends were retained for most programs up to threshold 10. The choice for the best value of k wasn't clear, but we chose k=10 as the best because the performance trends were retained for most programs at this threshold.

5.2.2 Comparative Study

In this section, we present comparative results for the different methods using size and degree of matching, approximation distance, and retention of performance trends as the evaluation criteria. Based on the results of the threshold study in Section 5.2.1,

Figure 19 Intra-process Reduction: Percentage File Sizes and Degree of Matching.

Figure 20 Intra-process Reduction: Approximation Distance Results for All Methods at Default Thresholds.

we present results for the best performing threshold for each method: 0.8 for relDiff, 1000 for absDiff, 0.4 for Manhattan, 0.2 for Euclidean and Chebyshev, 10 iterations for iter_k, and 0.2 for avgWave and haarWave.

5.2.2.1 Size and Degree of Matching

We present the data for reduction of traces for each method in Figure 19. The iter_avg method gives the best-case values for this category, since exactly one segment is retained for each unique segment context.

The benchmark data shows that, for the most part, the degree of matching for each of the methods is greater than 0.9, meaning that greater than 90% of the segments were matched. Exceptions occur with relDiff, which had degree of matching scores as low as 0.74. RelDiff had the highest file sizes and lowest degree of matching scores.

The next largest file sizes are generated with the iter_k method; however, they are not much higher than those for the other methods. The Minkowski distances, avgWave, and haarWave all have nearly identical results, with Chebyshev having a very slight advantage over the others. AbsDiff had only slightly larger file sizes than the Minkowski distances.

For sweep3d, the results are somewhat different. Because this application has very regular behavior, we expected the results to be similar to those of the benchmarks. However, because of the program structure, there are more segments, as well as differences within the segments, e.g. message passing parameters, that cause segments not to match. We see that iter_k performed the worst, with the highest file sizes and lowest degree of matching scores. This is because iter_k needed to keep 10 copies of each individual segment, regardless of how similar in performance they actually were, whereas for the other methods the high degree of matching often results in fewer than 10 copies. The next worst performing were the Minkowski distances, again with Chebyshev having the smallest file sizes. The wavelet methods performed best, followed by absDiff and relDiff, each with very close to perfect matching and the lowest possible file sizes.

The obvious best method in this category is iter_avg, since all segments match by definition. A comparison of the average file sizes for each of the other methods yields the following ranking: avgWave, haarWave, Chebyshev, absDiff, Manhattan, Euclidean, iter_k, relDiff.

5.2.2.2 Approximation Distance

Figure 20 shows the approximation distance results for each of the methods. The methods show similar trends across the benchmarks with regular behavior. The relDiff, absDiff, iter_k, and iter_avg methods have consistently low values. The Minkowski distances, avgWave, and haarWave behave similarly, and have the highest values overall. The results for the dyn_load_balance benchmark show a different set of behavior, with absDiff having the lowest value, followed by avgWave, Euclidean, Manhattan, and haarWave. The irregular benchmarks had lower overall approximation distance values than the other benchmarks, with similar results across the benchmarks. The worst performing methods in this case were iter_avg and iter_k.

However, the approximation distance values are low in comparison to those for the other set of benchmarks.

The results for sweep3d show iter_avg performing the worst for the 8-process run, and iter_k and iter_avg the worst for the 32-process run, indicating that there are performance behaviors not being captured by those two methods.

The methods that performed the best in this category are relDiff, followed by absDiff, and then iter_avg. The rest of the methods allowed significant error into at least one of the reduced traces.

5.2.2.3 Retention of Performance Trends

We present summaries of the performance diagnoses given by KOJAK for selected benchmarks in Figure 21 and Figure 22. We show how we derive the performance diagnoses charts and abbreviations for metric names in Figure 18.

For the benchmarks with regular behavior, nearly all the methods performed quite well. For late_receiver, all methods except iter_avg performed equally well, with all performance trends retained. The results for iter_avg with late_receiver showed differences significant enough that they may lead to an inaccurate performance assessment. For early_gather, all but the Minkowski distances, avgWave, and haarWave retained the correct performance trends. The results for imbalance at barrier showed that the Minkowski distances, absDiff, iter_avg, avgWave, and haarWave retained the performance trends, while relDiff and iter_k both showed a negative value for the major performance diagnosis. The amount of error introduced into the reduced traces caused time stamps to be skewed enough that the performance diagnoses resulted in negative values.

Figure 21 Intra-process Reduction: KOJAK Performance Trends for dyn_load_balance for Each Method at Default Thresholds.

We show the major performance trends for dyn_load_balance in MPI_Alltoall and do_work as reported by the KOJAK tools for the full trace and all methods in Figure 21. The results for the no loss trace clearly indicate that the lower ranks are spending more time in MPI_Alltoall, because the upper ranks are spending more time in do_work. None of the methods gave perfect results for the dyn_load_balance benchmark; however, absDiff, Manhattan, Euclidean, avgWave, and haarWave gave

Figure 22 Intra-process Reduction: KOJAK Performance Trends for 1to1r_1024 for Each Method at Default Thresholds.

the closest performance diagnoses because for the most part they maintained the performance differences due to load imbalance between the upper and lower ranks.

Although Manhattan, Euclidean, avgWave, and haarWave lost the disparity in do_work, the diagnosis "Wait at NxN" is non-negative and maintains the disparity in behavior. AbsDiff maintained the disparity in performance in do_work, but reported that "Wait at NxN" was negative. All other methods lose the expected disparity in do_work.

For the irregular benchmarks, all methods did reasonably well on the N-to-1 and 1-to-N benchmarks, with the exception of iter_avg, which failed on three benchmarks, and Chebyshev, which failed on Nto1_1024. AbsDiff did less well on the 1-to-1 and N-to-N benchmarks. We show the data for 1to1r_1024 in Figure 22. AbsDiff picked up on the variations in the iterations due to interference, which caused some performance diagnoses to be skewed in a positive or negative direction. The best performers for these benchmarks were Manhattan, Euclidean, and avgWave, followed by relDiff and haarWave. AbsDiff and iter_avg each showed correct diagnoses for only one benchmark, 1to1r_32 and 1to1s_32, respectively.

For sweep3d_8p and sweep3d_32p, all methods but iter_avg and iter_k produced correct data. Iter_k showed a non-existent disparity in rank performance in pmpi_recv_ in sweep3d_8p and a greatly inflated severity in pmpi_recv_ in sweep3d_32p. Iter_avg showed a much lower severity in sweep_ than did the no-loss trace for both sweep3d_8p and sweep3d_32p.

The best methods in this category were Manhattan, Euclidean, and avgWave, which correctly diagnosed 17 out of the 18 execution traces. HaarWave did second best, correctly diagnosing 16. The rest of the methods in order were: relDiff (14); absDiff and Chebyshev (13); iter_k (12); and iter_avg (6). The relatively poor performance of iter_k in this category could be due to our choices in implementing this method. It is possible that the first iterations are more subject to variabilities in execution, before the processes synchronize into their regular behavior patterns, and that the last segment is not the best choice as a fill-in for missing segments.

5.2.2.4 Discussion

To determine the best method for comparing traces, we take the highest ranking methods from each category and weigh the importance of each of the categories. The best methods from the size category were iter_avg, followed by avgWave, haarWave, and Chebyshev. Those from the approximation distance category were relDiff and absDiff, followed by iter_avg. Finally, the methods that best retained performance trends were avgWave, Manhattan, Euclidean, and haarWave. One could argue that the most important criterion for judging these methods is whether or not they retain the correct performance trends, because that is the point of collecting the traces in the first place. However, almost equally important is the ability to collect, store, and analyze the trace data at all. Given that avgWave performed well in both the size and retention of performance trends categories, we choose avgWave as the best method of the ones studied for intra-process segment comparison.

5.3 Inter-process Reduction Evaluation Studies

We evaluated the similarity methods for their ability to find inter-process matches.

We first present a threshold study for the similarity methods. From this study, we choose a threshold for each of these methods that represents the best tradeoff in terms of file size reduction, measurement error, and retention of performance trends. Next, we present the results of a comparative study of the similarity methods, using the thresholds found to be best for each method in the threshold study. We did not evaluate iter_avg or iter_k in this section, because utilizing them for the purpose of inter-process matching is nonsensical.

5.3.1 Threshold Study

We investigated the behavior of the similarity methods while varying the thresholds that determine whether two given segments should match or not match. The thresholds for relDiff, the Minkowski distances, and the wavelet transforms were 0.1, 0.2, 0.4, 0.6, 0.8, and 1.0. The thresholds for absDiff were powers of 10 from 10^1 to 10^6.

The criteria we used to evaluate the methods were file size, approximation distance, and retention of performance trends (for full results, refer to the Appendix). For each method, we chose a representative threshold to be used when comparing the methods against each other.

relDiff. The relative difference method performed poorly for inter-process matching. For all benchmarks and sweep3d, relDiff only found matches when the threshold was 1.0, which means any amount of error was allowed when comparing the segments. None of the reductions produced by relDiff retained correct performance trends. For the purpose of our comparison study of inter-process matching, we chose 0.8 as the best threshold for relDiff. Because no matches were achieved, correct performance trends were retained.

absDiff. For the benchmarks, file sizes tended to start to decline and approximation distances began to increase at a threshold of 10^4. For the most part, performance trends were retained for the benchmarks at and below 10^4. AbsDiff was unable to find any matches for sweep3d_8p; absDiff did find matches for sweep3d_32p at and above 10^5, but correct performance trends were not retained. Correct performance trends were retained for the majority of the codes at thresholds at or less than 10^4. Based on these results, we chose 10^4 as the best threshold for absDiff.
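A sketch contrasting the two difference-based comparisons may help here: relDiff bounds a proportional difference, consistent with its thresholds lying in [0, 1], while absDiff bounds an absolute difference in raw measurement units, consistent with its power-of-ten thresholds. The element-wise formulation and example values are assumptions for illustration.

```python
def rel_diff_match(a, b, threshold):
    # Match if every pair of measurements differs by at most the given
    # fraction of the larger magnitude (illustrative formulation).
    for x, y in zip(a, b):
        denom = max(abs(x), abs(y))
        if denom > 0 and abs(x - y) / denom > threshold:
            return False
    return True

def abs_diff_match(a, b, threshold):
    # Match if every pair differs by at most the threshold in raw units.
    return all(abs(x - y) <= threshold for x, y in zip(a, b))

seg1 = [1000.0, 52000.0]  # hypothetical per-event timings, e.g. microseconds
seg2 = [1100.0, 50000.0]

print(rel_diff_match(seg1, seg2, 0.8))    # True: max relative diff ~0.09
print(abs_diff_match(seg1, seg2, 10**4))  # True at the 10^4 threshold
```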

Minkowski distances. The three methods performed similarly for the benchmarks.

File sizes decreased relatively steadily with increasing threshold, and approximation distances increased most sharply above thresholds of 0.4. Performance trends were retained for the majority of the benchmarks for thresholds at or above 0.4. All three methods performed the same for sweep3d, finding no matches for either sweep3d_8p or sweep3d_32p at any threshold. We chose 0.4 as the best threshold for all three methods.

Wavelet transforms. For the benchmarks, both methods performed similarly. File sizes decreased steadily with increasing threshold. The approximation distances for the most part remained low until reaching the 0.6 threshold. For the majority of the benchmarks, correct performance trends were retained for thresholds of 0.4 and higher. Neither method found matches for sweep3d_8p. Both found matches for sweep3d_32p at threshold 0.8; however, performance trends were not retained. Based on the tradeoffs of size reduction and retention of trends, we chose 0.4 as the best threshold for both avgWave and haarWave.

5.3.2 Comparative Study

Here, we present a comparative study of inter-process reductions achieved by the similarity methods at the thresholds chosen in Section 5.3.1. We evaluate the methods for file size reduction, amount of matching, and retention of correct performance trends. Although relDiff was unable to find any acceptable inter-process reductions at thresholds below 1.0, we include its results at the 0.8 threshold as a measure of the worst-case scenario for file size reduction and merging. For absDiff, we used the threshold 10^4; for Manhattan, Euclidean, Chebyshev, avgWave, and haarWave, we used 0.4.

5.3.2.1 Size and Degree of Matching

We show the percentage file size and degree of matching for all methods and benchmarks in Figure 23 and Figure 24. RelDiff performed the most poorly, since it was unable to find any matches for any of the benchmarks or sweep3d. AbsDiff performed better for the irregular benchmarks, with an average percent file size of 59.7%, than it did for the regular benchmarks, with an average of 87.5%. The Manhattan and Euclidean distances and the wavelet transforms performed similarly, with average percent file sizes close to 60% and average degrees of matching at 0.4. Chebyshev achieved the greatest amount of reduction, with average percent file size at 44.8% and average degree of matching at 0.6. Outliers in the set of benchmarks were 1to1r_1024 and 1to1s_1024. None of the methods but Chebyshev were able to find

Figure 23 Inter-process Reduction: Percentage File Sizes for All Methods at Default Thresholds.

Figure 24 Inter-process Reduction: Degree of Matching for All Methods at Default Thresholds.

Figure 25 Inter-process Reduction: Approximation Distance Results for All Methods at Default Thresholds.

acceptable matches for 1to1r_1024, which achieved only 2 matches of the 30 possible. For 1to1s_1024, all methods but Chebyshev found 2 matches of 30, while Chebyshev found 5. Additionally, none of the methods were able to find acceptable matches for sweep3d, so the percentage file sizes are all at 100% and the degrees of matching are 0 for both sweep3d_8p and sweep3d_32p. The in-order ranking of the methods for this category was Chebyshev as the best performer, followed by Euclidean, avgWave, haarWave, Manhattan, absDiff, and relDiff.

5.3.2.2 Approximation Distance

Generally speaking, the approximation distances for all codes were relatively low (see Figure 25). The approximation distances for the regular benchmarks were, on average, an order of magnitude lower than those of the irregular benchmarks. The exception was the early_gather benchmark, which had relatively high approximation distance values for all methods excluding absDiff. The approximation distances for 1to1r_1024, 1to1s_1024, sweep3d_8p, and sweep3d_32p are zero or very low, since few or no matches were found for those traces. Excluding relDiff, since it achieved no acceptable matchings, the in-order ranking of the methods in this category was absDiff, avgWave, haarWave, Manhattan, Euclidean, and Chebyshev.

5.3.2.3 Retention of Performance Trends

For this category, relDiff retained trends for all programs. However, since relDiff did not find any inter-process matches, we exclude it as a contender for the top-performing method in this category. AbsDiff and the wavelet transforms performed similarly on average for the regular and irregular benchmarks, correctly diagnosing about 80% (absDiff) and 60% (avgWave and haarWave) of the programs from each category. Manhattan and Euclidean correctly diagnosed about 40% of the regular benchmarks, and 64% (Manhattan) and 72% (Euclidean) of the irregular ones. Chebyshev retained trends for 60% of the regular benchmarks, and only 36% of the irregular benchmarks. Since none of the methods found matches for sweep3d, trends were retained by default.

We show examples of the KOJAK diagnoses produced for early_gather and NtoN_1024 in Figure 26 and Figure 27. For early_gather, we see that only absDiff was able to retain the correct performance trends. This is likely due to the low number of matches achieved by absDiff for this benchmark. None of the methods found acceptable reductions for NtoN_1024. All showed reduced severity for the "Barrier Completion" diagnosis, and introduced variation in the severities at the rank level that does not appear in the no-loss diagnosis. Additionally, all but absDiff reduced the severity of the "Wait at Barrier" diagnosis.

In this category, the absDiff method performed best, correctly diagnosing 14 out of the 18 programs. Next, all three of Euclidean, avgWave, and haarWave correctly diagnosed 12, followed by Manhattan (11) and Chebyshev (9).

5.3.2.4 Discussion

Generally speaking, we found that there were far fewer inter-process matches achieved with good results than we initially expected. We expected there to be a larger number of matches in the regularly behaving benchmarks, but discovered that, overall, a higher number of matches was found for the irregular benchmarks, with a greater level of retention of trends. Upon inspection, we found that there were no possible inter-process matches for sweep3d_8p, given its message passing behavior and our matching criteria (we require that all message passing parameters, e.g. message tags and byte counts, match for the traces to match). However, we were surprised that no acceptable matches of the 16 possible were found for sweep3d_32p.
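The matching criterion just described (message-passing parameters must be identical before measurements are even compared) can be sketched as follows; the event-record layout and the Chebyshev-style check on timings are illustrative assumptions, not the tool's actual data structures.

```python
def params_equal(events_a, events_b):
    # Each hypothetical event record: (operation, peer, tag, bytes).
    # All message-passing parameters must match exactly.
    return events_a == events_b

def inter_process_match(seg_a, seg_b, threshold):
    # Reject outright when the message parameters differ, as with sweep3d,
    # where no inter-process matches were possible at all.
    if not params_equal(seg_a["events"], seg_b["events"]):
        return False
    # Only then compare the timing measurements with a similarity method
    # (a Chebyshev-style maximum-difference check is used for illustration).
    return max(abs(x - y)
               for x, y in zip(seg_a["timings"], seg_b["timings"])) <= threshold

rank0 = {"events": [("send", 1, 7, 4096)], "timings": [0.31]}
rank1 = {"events": [("send", 1, 9, 4096)], "timings": [0.30]}
print(inter_process_match(rank0, rank1, 0.4))  # False: message tags differ
```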

In the file size reduction and degree of matching category, the top performers were Chebyshev, Euclidean, avgWave, and haarWave. For the approximation distance category, the best methods were absDiff, avgWave, haarWave, and Manhattan. In the category of retaining performance trends, the best methods were absDiff, followed by three methods in a tie: Euclidean, avgWave, and haarWave. To choose the best overall method in this category, we consider both file size reduction and retention of trends. Although Chebyshev produced the smallest data files, it produced reduced traces with the greatest amount of error and the least retention of performance trends. Because Euclidean, avgWave, and haarWave performed very similarly and relatively well in file size reduction and retention of trends, we choose all three methods as the top methods for inter-process reduction.

5.4 Combined Inter-process and Intra-process Reduction Evaluation

In this section, we compare the abilities of the similarity methods to produce reduced traces using both intra- and inter-process reduction. Excluding iter_k and iter_avg, we use two different thresholds for each method, one for intra-process matching and the other for inter-process matching. For iter_k and iter_avg, we perform intra-process matching only. The thresholds we use in this study are those that we found to be the best for each method in Sections 5.2 and 5.3. For intra- and inter-process reduction respectively, the thresholds were: relDiff (0.8, 0.8); absDiff (10^3, 10^4); Manhattan (0.4, 0.4); and Euclidean, Chebyshev, avgWave, and haarWave (0.2, 0.4). For iter_k, we used k = 10, but no threshold for inter-process reduction, since this method does not perform inter-process reduction. We evaluate the methods as we did for the intra- and inter-process only studies: for file size reduction, introduction of error into the reduced trace, and retention of correct performance trends. (For full results, refer to the Appendix.)

5.4.1 Size and Degree of Matching

For the degree of matching, we compute the sum of the intra- and inter-process matches that were found as a fraction of the total number of intra- and inter-process matches that could possibly be found (see Figure 29). For intra-process matching, iter_avg decides that all segments with the same context match. Because the possible number of intra-process matches is much higher than the possible number of inter-process matches, iter_avg has the highest degree of matching overall. The ranking of the methods in order from highest average degree of matching to lowest is: iter_avg, avgWave, haarWave, absDiff, Euclidean, Chebyshev, Manhattan, iter_k, and relDiff.
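The combined degree of matching described above reduces to a simple ratio; the example counts below are hypothetical, chosen only to show how plentiful intra-process matches dominate the combined score.

```python
def degree_of_matching(intra_found, inter_found, intra_possible, inter_possible):
    """Matches found (intra plus inter) as a fraction of matches possible."""
    total_possible = intra_possible + inter_possible
    if total_possible == 0:
        return 0.0
    return (intra_found + inter_found) / total_possible

# Hypothetical counts: many intra-process matches possible, few inter-process
# ones, so intra-process matching dominates the combined degree of matching.
print(degree_of_matching(intra_found=940, inter_found=2,
                         intra_possible=1000, inter_possible=30))  # ~0.914
```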

Unlike when considering intra- and inter-process matching in isolation, the expected file size does not directly follow the degree of matching (see Figure 28). A method such as iter_avg that achieves a high degree of intra-process matching but does not perform inter-process matching can generate larger reduced trace files, because a single inter-process match has the potential for more file size savings than multiple intra-process matches. The methods in order of smallest average file size to largest are: Chebyshev, avgWave, Euclidean, haarWave, Manhattan, absDiff, iter_avg, iter_k, and relDiff.

5.4.2 Approximation Distance

We show the results for the approximation distance in Figure 30. Overall, the Chebyshev distance introduced the most error and relDiff introduced the least error into the reduced traces. On average, more absolute error was introduced into the reduced traces of the regular benchmarks than the irregular benchmarks. The methods

Figure 28 Combined Intra- and Inter-process Reduction: Percentage File Sizes for All Methods.

Figure 29 Combined Intra- and Inter-process Reduction: Degree of Matching for All Methods.

Figure 30 Combined Intra- and Inter-process Reduction: Approximation Distance for All Methods.

in order of smallest to largest average approximation distance are: relDiff, iter_avg, absDiff, iter_k, haarWave, avgWave, Manhattan, Euclidean, and Chebyshev.

5.4.3 Retention of Trends

Overall, the methods were able to produce the most acceptable reduced traces for the regular benchmarks. An exception was the late_receiver benchmark, for which none of the methods retained performance trends. All reported reduced severity for the "Late Receiver" diagnosis and lost the correct rank-level severities for that diagnosis.

The methods did less well for the irregular benchmarks. Notably, none of the methods produced acceptable reduced traces for NtoN_32 or NtoN_1024. We show the results for NtoN_32 in Figure 31. For all of the methods, the severity of "Wait at Barrier" is under-reported and the rank-level severities do not match those of the no-loss trace. Iter_k and iter_avg did the worst for this benchmark, with severities misreported for all of the diagnoses.

Figure 31 Combined Intra- and Inter-process Reduction: KOJAK Performance Trends for NtoN_32 for Each Method.

None of the methods produced reduced traces that retained trends for sweep3d. We show the KOJAK diagnoses for sweep3d_32p in Figure 32. All methods reported reduced severity for "Execution Time" in the sweep_ function. All but iter_k showed reduced severity in "Late Sender", while iter_k showed different rank-level severities than the results from the no-loss trace. The iter_k method did the worst overall, showing increased severities for all but the "Late Sender" diagnosis in pmpi_recv_.

Figure 32 Combined Intra- and Inter-process Reduction: KOJAK Performance Trends for sweep3d_32p for Each Method.


The relDiff method produced the largest number of acceptable reduced traces, 10 out of 18. Manhattan, Euclidean, avgWave, and haarWave retained correct performance trends in 9 reduced traces. The other methods in order are absDiff (8), Chebyshev (7), iter_k (6), and iter_avg (4).

5.4.4 Discussion

The best performing methods in the file size reduction category were Chebyshev, avgWave, and Euclidean. From the approximation distance category, the best methods were relDiff, iter_avg, and absDiff. The methods that best retained performance trends were relDiff, followed by a tie between Manhattan, Euclidean, avgWave, and haarWave. Given its performance in the file size reduction and retention of performance trends categories, we choose avgWave as the best method for reducing traces using both intra- and inter-process reductions.

5.5 Discussion

Here we discuss our expectations for the similarity methods and for each matching scenario: intra-process only, inter-process only, and combined intra- and inter-process matching.

5.5.1 Trace Similarity Methods

For relDiff, we expected low error and relatively large files, which is exactly what we found to be true. For absDiff, we expected low error. We did find that absDiff had lower error when compared to most methods. We expected that the Minkowski distances would favor long segments and that error would be lowest for Manhattan, followed by Euclidean, and highest for Chebyshev. While we did see more error in the traces produced by the Chebyshev method, the differences in the results for the Manhattan and Euclidean methods were largely indistinguishable. We expected iter_k and iter_avg to produce low-error traces for programs with regular behavior, and iter_avg to have the lowest overall file sizes. We indeed found that iter_k did well for regularly behaving programs and less well for programs with varying behavior patterns. Iter_avg produced better results for the regular benchmarks than the irregular ones; the averaging of measurements tended to cause loss of information needed for diagnosis. For avgWave and haarWave, we expected stricter comparisons than Euclidean. Indeed, the wavelet transforms produced slightly larger files for the benchmark traces; however, the reduced traces of sweep3d were smaller than those produced by Euclidean.

5.5.2 Intra- and Inter-process Matching

We expected the trace similarity methods to identify high degrees of both intra- and inter-process matches, and that the number of intra-process matches would be much higher because of the higher number of possible intra-process matches. We expected that inter-process matches would yield the greatest gains in terms of file size, and a similar level of retention of performance trends across intra- and inter-process matching.

We found that the results for intra-process only matches followed our expectations, but that the results for inter-process only matches did not. While inter-process matching did achieve the highest gains in terms of file size reduction, far fewer inter-process matches retained correct performance behaviors than we expected. This is due to the larger number of measurements that must match, according to the similarity method used, for an inter-process match to be successful; to differing message passing parameters across ranks; and to slight variations in events in initialization segments.

5.6 Summary

We developed a post-mortem trace profiler and used it to demonstrate the viability of trace profiling for trace size reduction and for producing reduced traces that retain the behaviors needed for correct performance analysis. Additionally, we developed a new methodology for evaluating definitions of similarity between event traces for the purpose of performance analysis. We identified criteria for comparing the similarity methods: file size reduction, degree of matching, approximation distance, and retention of correct performance trends. We applied these criteria using benchmarks with known performance behaviors, as well as the application sweep3d. We evaluated the similarity methods for how well they reduced traces using intra-process reduction only, inter-process reduction only, and combined intra- and inter-process reductions.

For intra-process reduction, the avgWave method had the best retention of performance behaviors and good trace file size reduction. The greatest trace file reductions were achieved with the iter_avg method; however, the error in those traces led to loss of important performance trends in the data. Because of this, we found that using the avgWave method was the best trade-off in terms of error in the reduced trace and file size reduction.

In our inter-process reduction study, we discovered that less matching occurred than we expected. Euclidean, avgWave, and haarWave performed very similarly and relatively well in file size reduction and retention of trends, so we chose all three methods as the top methods for inter-process reduction. We found that Chebyshev produced the smallest data files but that it produced reduced traces with the greatest amount of error and the least retention of performance trends, so it was not chosen as the best method in this study.

For combined intra- and inter-process reduction, again Chebyshev produced small files with large amounts of error and lost trends. Based on the ability of avgWave to produce reduced traces that are relatively small, with low error and a high rate of retention of performance trends, we chose it as the best method for combined intra- and inter-process reduction.


6 Prototype Runtime Trace Profiler

We demonstrated the viability of the trace profiling technique in terms of correctness and file size reduction in our post-mortem studies in Chapter 5. Here, our goal is to demonstrate that the overhead of writing the collected trace data is lower with a runtime trace profiler than with a traditional tracing tool. In this chapter, we first describe our current implementation of a prototype runtime trace profiler.2 Then, we detail our experimental setup for evaluating the overheads and resulting files of the prototype against a state-of-the-art traditional tracing tool on a typical high-end Linux cluster. Finally, we present the results of our experiments.

6.1 Current Prototype Implementation

The current runtime trace profiler consists of a front end and a back end trace profiler instrumentation library. The front end launches the program to be measured and inserts calls in the program to the trace profiler instrumentation library. The trace profiler instrumentation library contains routines that implement the trace profiling technique. This initial implementation supports single-threaded MPI applications.

6.1.1 Trace Profiler Front End

The front end of the trace profiling tool starts and controls the execution of the target process, locates instrumentation points, and inserts calls to the trace profile instrumentation library at those points. We use the Dyninst dynamic instrumentation library for process control and instrumentation insertion [29]. We start a separate front end process for each rank in the parallel run by starting the front end as a parallel job and giving it arguments that indicate the program to start and measure, as well as other arguments that control measurement details, e.g., the distance metric to use to compare segments and comparison thresholds. For example, on a machine that uses the srun command to start parallel jobs, the command srun -n 8 ./traceProfiler -d haar_wave targetProgram would start 8 front end traceProfiler processes according to the policies of the resource manager on the machine and compare segments using the haarWave distance metric. Each front end is responsible for a single instance of targetProgram.

² In this version of the prototype, we reduce the amount of collected data with intra-process merging only. Inter-process merging is left as a post-mortem activity.

Before starting its target program, the front end sets the environment variable LD_PRELOAD to load the back end library into the measured process when it is started. By doing this, the measurement routines in the trace profile instrumentation library are available and can be called from the measured process. Next, the front end creates the process, but does not execute it until after inserting instrumentation. The front end locates the functions and loops in the program. It assigns identifiers to all functions, and context names and identifiers to all segments, and passes the names and identifiers to the trace profile instrumentation library. It instruments the entry and exit of all functions with calls into the trace profile instrumentation library. Segment markers are inserted with calls to the trace profile instrumentation library. An initial segment is started at the entry to main or MAIN_. Then, for each loop that contains a user-specified number of function calls, the current segment is stopped, and a new segment is started at the top of the loop and stopped at the bottom of the loop. At the end of the loop execution a new segment is started. At program termination, the final segment is stopped.³ Segment contexts for non-loop portions of code are named as a concatenation of the enclosing function name and an integer that makes the name unique. Segment contexts for loops are named as a concatenation of the enclosing function name, the hierarchical loop name as assigned by Dyninst, and an integer that makes the segment name unique. See Figure 33 for an example code snippet with segment marker instrumentation. Note the consequences of marking segments in this manner: some segments will contain no events (e.g., main_2), all segments are disjoint, and it is possible for a function's entry and exit to cross segment boundaries if the function or its callees contain a loop that is marked as a segment.

After inserting all instrumentation, the front end starts the execution of the measured process. At termination of the measured process, the front end process exits.

6.1.2 Trace Profiler Instrumentation Library

In this section, we describe the interface to the trace profile instrumentation library and its runtime operations.

6.1.2.1 Instrumentation Interface

    int main() {
        enterSegment("main_0");
        MPI_Init();
        exitSegment();

        for (i = 0; i < 100; ++i) {
            enterSegment("main_loop_1_1");
            do_work();
            MPI_Allgather();
            exitSegment();
        }

        enterSegment("main_1");
        exitSegment();

        for (j = 0; j < 10; ++j) {
            enterSegment("main_loop_2_1");
            do_other_work();
            exitSegment();

            while (k < otherRanks) {
                enterSegment("main_loop_2.1_1");
                MPI_Sendrecv();
                exitSegment();
            }
        }

        enterSegment("main_2");
        exitSegment();

        enterSegment("main_3");
        MPI_Finalize();
        exitSegment();
    }

Figure 33 Example Segment Context Marking and Names

³ If the target program is a Fortran application, a call to exitRoutine for MAIN_ is executed before the final segment is stopped. In some Fortran implementations, the MAIN_ function is part of the Fortran library and not part of the user code. It is responsible for executing the main program unit of the user's Fortran application. As a result, the MAIN_ function may not exit.

We show the interface to the trace profile instrumentation library in Table 5. The entry and exit of functions are recorded by calls to the enterRoutine and exitRoutine functions, respectively. Segment boundaries are marked with calls to enterSegment and exitSegment. We use the PMPI interface to selectively collect details about MPI message-passing activities. The function MPI_Init contains instrumentation to call the trace profile instrumentation library function openTrace, which performs initialization activities. The MPI function definitions for sending and receiving operations contain calls to sendMessage and recvMessage, which record details about sends and receives: source or target rank, bytes transferred, message tag, and communicator. In Figure 34, we show the instrumented definition of MPI_Send. When the user code calls MPI_Send, the instrumented function in the trace profile instrumentation library is executed, which in turn calls PMPI_Send, the actual call into the message-passing library.

    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm) {
        int returnVal;
        int typesize;

        if (dest != MPI_PROC_NULL) {
            PMPI_Type_size(datatype, &typesize);
            sendMessage(translateRankToWorld(comm, dest),
                        typesize * count, tag, comm);
        }
        returnVal = PMPI_Send(buf, count, datatype, dest, tag, comm);

        return returnVal;
    }

Figure 34 Example Instrumentation for Message Passing Function

6.1.3 Runtime Operations

The current version of the trace profiler front end takes arguments that specify the distance metric to be used for comparing segment data and the comparison thresholds. The methods implemented in the runtime prototype are those described in Section 5.3. Intra-process segment matching is performed at runtime; inter-process segment matching is a post-mortem activity.

When the measured process is started, it makes an initial call to enterSegment to create the first segment and a call to enterRoutine for the entry to main or MAIN_. Presumably, the first function call in the measured program is to MPI_Init. In the instrumentation for MPI_Init, the trace profiler performs initialization activities, such as setting up the data structures for storing the collected data and getting the rank identifier for the process. Then, all function event data is collected in the enterRoutine and exitRoutine functions until the next call to exitSegment. When exitSegment is called the first time, there are no other segments stored that could be a potential match, so the first segment is inserted into the list of stored representative segments. For all subsequent calls to enterSegment, a new segment is created. At the matching call to exitSegment, the segment is terminated and compared, using the selected distance metric, against the segments with the same context that have already been stored as representatives.

Table 5 Trace Profile Instrumentation Library Interface

defineEvents(eventList): Called before target program execution. The argument eventList is a listing of function and segment identifiers and names.
openTrace(rank): Called in MPI_Init. Performs initialization activities.
endProgram(): Called at program termination. Calls exitSegment.³
enterSegment(id): Called at a segment entry marker. Sets the current segment.
exitSegment(): Called at a segment exit marker. Exits the current segment.
enterRoutine(id): Called at function entry to record the function entry time and identifier.
exitRoutine(id): Called at function exit to record the exit time and identifier.
sendMessage(dest, bytes, tag, comm): Called from instrumented MPI calls to record details about send operations.
recvMessage(source, bytes, tag, comm): Called from instrumented MPI calls to record details about receive operations.
MPI_*: Intercept calls to the actual MPI library and call the appropriate trace profile instrumentation library function to record message-passing details. They also execute the call into the MPI library to perform the operation; e.g., MPI_Send calls sendMessage and PMPI_Send.


6.2 Experimental Setup

In this section, we report on the experiments we performed to evaluate the overheads of our current runtime prototype trace profiler (TP). We evaluate our prototype against a state-of-the-art traditional tracing tool, TAU. We evaluate both tools for instrumentation and writing overhead as defined in Chapter 4, and for resulting file size.

6.2.1 Application

We evaluated the prototype using Sweep3D [3], described in Section 6.1.2. We ran the application with a problem size of 5 x 5 x 6400, with MK=30 and MMI=2. We measured the application with 32, 64, 128, 256, 512, and 1024 processors.

6.2.2 Machine

We ran our experiments on Hera at LLNL. Hera is an 864-node Linux cluster, where each node contains 4 AMD quad-core processors, for a total of 16 CPUs per node. The nodes are connected by an Infiniband switch and are connected to a Lustre file system. For each experiment, we ran jobs that utilized all CPUs on each node, e.g. a 32-process run spanned two nodes, and wrote all trace data to the Lustre file system.

The configuration of Hera is very similar to the machine shown in Figure 2.

6.2.3 Tool Configurations

We used TAU version 2.17.1 [58] for our experiments. We configured TAU to collect entry and exit events for all functions. Note that we did not insert segment markers into the program when measuring with TAU, so the number of instrumentation points for TAU was lower than it was for our trace profiler prototype.

Our experiments here are similar to those described in Section 4.1. We performed runs without trace instrumentation (noInstr), and with and without buffer flush to file enabled (write or noWrite). Additionally, we experimented with two different buffer sizes: 1.5 MB (def, the default size for TAU) and 8.0 MB (8MB, the default size for the widely-used MPE [80]). We altered the buffer size and write/noWrite configuration of TAU as described in Section 4.1. The buffer size and write/noWrite configuration for our prototype trace profiler are runtime options. To measure execution time, we used the wall clock time reported by the application. We evaluate only overhead that occurs after MPI_Init and before MPI_Finalize; this means that initial file creation and file closing are not included in the overheads for either tool. Additionally, we do not evaluate any post-mortem activities, such as trace file merging for TAU or inter-process matching for our prototype.

When the buffer of either tool is full, the trace data is flushed to disk and the buffer is emptied to collect more data. The flushing policy of our prototype is somewhat different from TAU's. TAU simply creates a fixed-size buffer and inserts a series of fixed-size trace records into it; its buffer is flushed when the amount of data collected is exactly the maximum buffer size. Our implementation creates data structures to hold process groups and the segments contained in each process group, and flushes these data structures at the end of a segment, when the amount of data collected meets or exceeds the buffer size. The consequence is that our tool may flush less frequently than TAU.

We implemented a simple buffer flushing policy for our prototype: we flush the data for the entire process group and then reset the process group data to empty. We append the process group data at the end of the file, and additionally update the section header containing the number of process groups in the file.

We ran 30 identical runs for each combination of number of processes, tracing tool, buffer size, write or noWrite, and, for our prototype, flushing policy. For our prototype runs, we used the avgWave distance metric for segment matching with threshold 0.2, and inserted segment markers in loops that contained at least 10 function calls. We report the average timing information for each configuration. For TAU, the trace file sizes and number of flushes are deterministic, so we report exact values. For our prototype, the sizes and number of flushes vary depending on how many segment matches occur at runtime, so we report the averages of these measurements.

6.3 Results

In this section, we present the results of our experiments. We evaluate the tracing tools for instrumentation and writing overhead, size of generated files, and flush count.

6.3.1 Execution Time

We show the execution time of Sweep3d measured under the various configurations with increasing processor count in Figure 35. The execution times for the application with no instrumentation are labeled noInstr. The first four bars after noInstr in each processor-count grouping show the TAU runs; the last four bars show the results for our prototype (TP). The difference between the execution times for the noInstr and noWrite runs shows the amount of instrumentation overhead caused by each tool and buffer size. The difference between the noWrite and write runs shows the writing overhead of the tool configuration.

Figure 35 Execution Time of Sweep3d Measured with TAU and TP

We examine the writing overhead in more detail in Figure 36. At the smaller processor counts (32, 64, and 128), the writing overhead is barely detectable. At the larger processor counts, the writing overhead of both TAU and our prototype grows with increasing processor count. At the default buffer size, the writing overheads of both tools become more noticeable and increase with processor count; however, our prototype introduces less overhead than TAU. With the 8 MB buffer, the writing overhead of TAU increases dramatically with increasing processor count, while the writing overhead of our prototype increases very slowly.

Figure 36 Write Overhead for Sweep3d with TAU and TP

6.3.2 Total File Size

We show the sum of file sizes generated during the write runs in Figure 37, and the average file size per rank in Figure 38. In all cases, the amount of data written increases with increasing processor count. However, the file sizes generated using our prototype did not increase as rapidly as those of TAU. When using the default buffer size with our prototype, the total amount of data was higher than when using the 8 MB buffer.

Figure 37 Total Size of Files Generated for Sweep3d with TAU and TP

Figure 38 Average File Size Per Rank for TAU and TP


6.3.3 Flushes

The total buffer flushes over all ranks executed by each tool configuration are shown in Figure 39, and the average per-rank flushes are shown in Figure 40. TAU with the default-size buffer generated the most buffer flushes overall, followed by our prototype using the default-size buffer. The fewest flushes were executed by our prototype using the 8 MB buffer. In all cases, the number of flushes increased with increasing processor count.

6.4 Discussion

Our goal for trace profiling was to reduce the overheads of tracing by reducing the amount of trace data written to disk during runtime. The writing overhead and resulting data files from our trace profiler prototype were much smaller than those of TAU. As expected, the writing overheads still scaled with the number of ranks in the run because of contention for the shared file system resources; however, the overheads did not increase nearly as quickly as with TAU. In general, we found that the instrumentation overhead of our prototype was on the same order as that of TAU.

The choice of buffer size had much less of an impact on overheads and file size for TAU than it did for our prototype. Of course, the choice of buffer size does not affect the file sizes generated by TAU; the same number of events is written regardless. In contrast, the buffer size greatly impacted the performance of our prototype: the larger buffer allowed the tool to find a larger number of segment matches, resulting in fewer flushes, less writing overhead, and smaller data files.

Figure 39 Total Buffer Flush Count for Sweep3d with TAU and TP

Figure 40 Average Flush Count Per Rank with TAU and TP

7 Conclusions

In this dissertation, we present a novel performance measurement technique for collecting event-based performance data and demonstrate its viability for low-overhead event trace collection for the purpose of parallel performance analysis.

Our first contribution is a study of the overheads of traditional event trace collection. We demonstrate that event tracing using traditional methods on high-end parallel systems is not scalable. The act of collecting such highly detailed performance information and periodically flushing the collected data to disk unduly perturbs the measured program. Additionally, the trace files created scale with the running time and number of concurrent entities in the parallel run. Our study indicates that the major scalability problems of traditional tracing are the overhead due to periodically flushing the event data to disk and the large resulting trace files.

The second contribution of this dissertation is trace profiling, a new low-overhead measurement technique for gathering event-based performance data. Trace profiling is a hybrid of tracing and profiling that collects summary information about event patterns that occur during program execution. Trace profiling addresses the major scalability problems of traditional tracing: periodic flushing of trace data to disk, and the unmanageably large trace files that are generated. The technique detects repeated event patterns both within and across processes in a parallel run. Because intra-process event pattern matching is done at runtime, a reduced data volume is flushed to disk during execution, which results in lower tool overhead due to writing. Additionally, the sizes of the files generated by trace profiling are greatly reduced compared to traditional tracing.

The third research contribution is a study of similarity metrics for identifying patterns in event traces. We evaluate several metrics for file size reduction, introduction of error, and retention of correct performance behaviors in the reduced trace. In our study, retention of performance trends was the most important criterion for evaluating similarity metrics for trace reduction. Our study indicates that the average wavelet transform method performs best in terms of retention of performance trends and file size reduction. This study demonstrates the viability of trace profiling in terms of its ability to collect useful traces that retain the important behavior patterns at a reduced data volume.

Our fourth contribution shows the low overheads of runtime trace profiling. We implemented a prototype runtime trace profiler and evaluated it against a state-of-the-art traditional tracing tool on a typical high-end Linux cluster. We demonstrate that the overheads of collecting event-based measurement data using trace profiling are lower than those of traditional tracing, and that the resulting data files are smaller.

We conclude that trace profiling is a viable method for low-overhead collection of event-based performance data on high end systems.

7.1 Future Work

Potential directions for future work include: investigation of memory bounds for performance tools; and evaluation of the implementation choices for a trace profiler and their consequences in terms of measurement overheads.

7.1.1 Performance Tool Memory Bounds

Any performance measurement tool that is designed to measure large, long-running parallel programs must bound the amount of memory used in order to be scalable; a tool cannot simply store all collected data in memory without potentially incurring serious consequences to the application performance. In our runtime study, we discovered that the choice of memory bound for storage of event data impacted the performance of our prototype. The bound affected the amount of event pattern matching that could occur at runtime, and thus the amount of data that was flushed periodically during the run. Use of a smaller memory bound resulted in more flushes, less event pattern matching, and larger resulting data files than use of a larger memory bound. In our study, we experimented with two memory bounds, chosen because they are the default buffer sizes for two commonly-used traditional tracing tools. In our experience, performance tools either have hard-coded default memory bounds or allow the user to choose the bounds to be used. In either case, no guidance is given to the user as to what bound would be a good choice for a particular performance tool measuring a particular application class on a particular architecture. A direction for future work would be to develop a model describing the interactions between tool memory bounds, application characteristics, and architecture. The model could be used to provide guidelines for choosing the best memory bounds for a given situation.

7.1.2 Trace Profiler Measurement Overheads

Although all performance measurement techniques introduce perturbation of varying degrees into the program being measured, performance measurement with a trace profiler implementation has the potential to introduce irregular perturbation. In the best case, when measuring regularly behaving programs with a trace profiler, there will be a high degree of matching of event patterns, which greatly reduces the number of comparisons that need to be performed during runtime. However, in the worst case, when measuring programs that have irregular behavior over time, the number of intra-process event pattern matches will likely decrease, meaning there will be a larger number of comparisons to perform over time during runtime, and possibly more data to flush to honor the memory bounds of the tool. Additionally, in a parallel run, if more matches are found in some ranks than others, then the amount of time spent in comparison operations will vary across ranks.

This could introduce perturbation that could affect the behavior of other ranks, if, for example, some ranks are waiting for communication from ranks that have a larger number of event pattern comparisons to make. Future research could examine the potential consequences of the perturbation introduced by trace profiling for different classes of programs.

The choice of where to introduce segment markers into a program has the potential to impact both the amount of perturbation introduced into the program and the number of event pattern matches that can be identified. If segment markers are placed in all loops, the instrumentation overhead increases because more instrumentation instructions are executed; however, the number of segment matches is likely to increase greatly, because the amount of event data in each segment is smaller. If segment markers are placed in loops more selectively, instrumentation overhead decreases, but segment matching might decrease because of the larger amount of event data in each segment. A direction for future research would be to investigate the tradeoffs of segment marking policies.

Our evaluation of the runtime prototype included a single simple policy for honoring the memory bounds for storing event data: flush all event data when the memory bounds are reached. Although this policy was very simple, it had the disadvantage of flushing event patterns that could potentially match future event patterns, resulting in less matching and larger file sizes. One possible option for honoring the memory bounds is to never write any data to disk during the run, but instead compress or fold the data in some manner. For example, when the amount of data collected by the Paradyn performance tool reaches the memory bound, adjacent data bins are averaged and the memory size is reduced by half [42]. A trace profiler might increase the given threshold and reevaluate the stored segments for further matches to reduce memory usage. However, care would need to be taken to ensure that performance trends are not lost by allowing more error into the reduced trace.

Future studies of trace profiling could investigate variations on policies for honoring memory bounds and their impact on the scalability of the technique in terms of flush counts and resulting data file sizes.

In our evaluation of the prototype runtime trace profiler, we evaluated the prototype for overheads that occur at runtime, which excludes inter-process matching overheads. Although post-execution inter-process merging would not perturb the measured program, the computation time for the merging should still be scalable. One option for scalable inter-process merging is to explore runtime merging: segments that are flushed during runtime could be checked for inter-process matches using a tree-based data reduction infrastructure such as MRNet [56], which would reduce the overall amount of data being written to disk and, in turn, reduce the writing overhead. An avenue of future work is to investigate and evaluate scalable methods for inter-process merging in a trace profiler implementation.


8 References

[ 1] Sphot benchmark. http://www.llnl.gov/asci/purple/benchmarks/limited/sphot/. downloaded Dec. 8, 2006. [2] Intel trace collector 7 .0 user's guide. ftp://download.intel.com/support/performancetools/cluster/analyzer/sb/itcreferencegui de. pdf, January 2007. [3] The ASCI sweep3D readme file. http://www.c3.lanl.gov/pal/software/sweep3d/sweep3d_readme.html, January 2009. [4] Maria Gabriel Aguilera, Patricia J. Teller, Michela Taufer, and F. Wolf. A systematic multi-step methodology for performance analysis of communication traces of distributed applications based on hierarchical clustering. In IPDPS, 2006. [5] Allinea user's guide version 1.2. Available by request . from [email protected], February 2007. [6] Dorian Arnold, Dong H. Ahn, Bronis R. de Supinski, Gregory Lee, Barton P. Miller, and Martin Schulz. Stack trace analysis for large scale applications. In International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA, March 26-30 2007. [7] R. Aydt. The Pablo Self-Defining Data Format. ftp://ftp.renci.org/pub/archive/Pablo.Release.5/SDDF/Documentation/SDDF.ps.gz, 1992. Downloaded on March 7, 2007. [8] L. Bongo, 0. Anshus, and J. Bj0mdalen. Low overhead high performance runtime monitoring of collective communication. In Proceedings of the 2005 International Conference on Parallel Processing (ICPP), Oslo, Norway, pages 455- 464, June 14-17 2005. [9] Peter N. Brown, Robert D. Falgout, and Jim E. Jones. Semicoarsening multigrid on distributed memory machines. SIAM J. Sci. Comput., 21 ( 5): 1823-1834, 2000. [10] Laura Carrington, Allan Snavely, Xiaofeng Gao, and Nicole Wolter. A performance prediction framework for scientific applications. In Workshop on Performance Modeling-ICCS, 2003. [ 11] Marc Casas, Rosa M. Badia, and Jesus Labarta. Automatic phase detection of MPI applications. In Christian H. Bischof, H. Martin Bucker, Paul Gibbon, Gerhard R. Joubert, Thomas Lippert, Bernd Mohr, and Frans J. 
Peters, editors, PARCO, volume 15 of Advances in Parallel Computing, pages 129-136. IOS Press, 2007. [12] Kin-Pong Chan and Ada Wai-Chee Fu. Efficient time series matching by wavelets. In Data Engineering, 1999. Proceedings., 15th International Conference on, pages 126-133, Mar 1999. [13] I-Hsin Chung, Robert E. Walkup, Hui-Fang Wen, and Hao Yu. MPI performance analysis tools on Blue Gene/L. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing (SC'06), page 123, New York, NY, USA, 2006. ACM.

122 [14] A. Fagot and J. de Kergommeaux. Systematic assessment of the overhead of tracing parallel programs. In Proceedings of 4th Euromicro Workshop on Parallel and Distributed Processing, pages 179-185, 1996. [15] Felix Freitag, Julita Corbalan, and Jesus Labarta. A dynamic periodicity detector: Application to speedup computation. In Proceedings of the 15th International Parallel and Distributed Processing Symposium (JPDPS'Ol), San Francisco, CA, USA, April 23-17 2001. [16] J. Gailly and M. Alder. zlib 1.1.4 Manual. http://www.zlib.net/manual.html, March 2002. downloaded on on January 30, 2007. [ 17] Jason Gait. A probe effect in concurrent programs. Softw. Pract. Exper., 16(3):225-233, 1986. [18] Todd Gamblin, Bronis de Supinski, Martin Schulz, Rob Fowler, and Daniel Reed. Scalable load-balance measurement for SPMD codes. In SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 2008. [19] Todd Gamblin, Rob Fowler, and Daniel A. Reed. Scalable methods for monitoring and detecting behavioral equivalence classes in scientific codes. In Proceedings of the International Parallel and Distributed Processing Symposium · (IPDPS'08), Miami, FL, April 14-28 2008. [20] J. Gannon, K. Williams, M. Andersland, J. Lumpp, Jr., and T. Casavant. Using perturbation tracking to compensate for intrusiuon propagation in message passing systems. In Proceedings of the] 4th International Conference on Distributed Computing Systems, Poznan, Poland, pages 414--421, June 21-24 1994. [21] J. Garlick and C. Dunlap. Building chaos: an operating environment for livermore linux clusters. Technical Report UCRL-ID-151968, Lawrence Livermore National Laboratory, Feb. 2002. [22] Michael Gemdt, Bernd Mohr, and Jesper Larsson Traff. A test suite for parallel performance analysis tools. Concurrency and Computation: Practice and Experience, 19(11):1465-1480, August 2007. [23] Weiming Gu, Greg Eisenhauer, Karsten Schwan, and Jeffrey Vetter. 
Falcon: on-line monitoring and steering of large-scale parallel programs. Concurrency: Practice and Experience, 10(9):699-736, Dec 1998. [24] S.T. Hackstadt, A.D. Malony, and B. Mohr. Scalable performance visualization for data-parallel programs. pages 342-349, May 1994. [25] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2005. [26] Matthias Hauswirth, Amer Diwan, Peter F. Sweeny, and Michael C. Mozer. Automating vertical profiling. In Proceedings of the 20th annual ACM SJGPLAN conference on Object oriented programming, systems, languages, and applications, pages 281 - 296, October 16-20 2005. [27] Michael T. Heath and Jennifer A. Etheridge. Visualizing the performance of parallel programs. volume 8, pages 29-39, Los Alamitos, CA, USA, 1991. IEEE Computer Society Press. [28] M.T. Heath, A.D. Malony, and D.T. Rover. The visual display of parallel performance data. Computer, 28(11):21-28, Nov 1995.

[29] Jeff Hollingsworth, Barton Miller, and Jon Cargille. Dynamic program instrumentation for scalable performance tools. In Proceedings of the Scalable High Performance Computing Conference, Knoxville, TN, USA, pages 841-850, May 23-25 1994.
[30] Ted Huffmire and Tim Sherwood. Wavelet-based phase classification. In PACT '06: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 95-104, New York, NY, USA, 2006. ACM.
[31] A. Jensen and A. la Cour-Harbo. Ripples in Mathematics: The Discrete Wavelet Transform. Springer-Verlag, 2001.
[32] Laxmikant V. Kale, Sameer Kumar, Gengbin Zheng, and Chee Wai Lee. Scaling molecular dynamics to 3000 processors with projections: A performance analysis case study. In International Conference on Computational Science (ICCS 2003), Melbourne, Australia and St. Petersburg, Russia, pages 23-32, June 2-4 2003.
[33] A. Knüpfer, R. Brendel, H. Brunst, H. Mix, and W. Nagel. Introducing the Open Trace Format (OTF). In Proceedings of the International Conference on Computational Science (ICCS), Reading, UK, pages 526-533, May 28-31 2006.
[34] Andreas Knüpfer. A new data compression technique for event based program traces. In International Conference on Computational Science, pages 956-965, 2003.
[35] Andreas Knüpfer, Bernhard Voigt, Wolfgang E. Nagel, and Hartmut Mix. Visualization of repetitive patterns in event traces. In PARA, pages 430-439, 2006.
[36] Dieter Kranzlmüller, Siegfried Grabner, and Jens Volkert. Event graph visualization for debugging large applications. In SPDT '96: Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, Philadelphia, Pennsylvania, USA, pages 108-117, 1996.
[37] Dieter Kranzlmüller, Siegfried Grabner, and Jens Volkert. Monitoring strategies for hypercube systems. In PDP '96: Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), page 486, Washington, DC, USA, 1996. IEEE Computer Society.
[38] Dieter Kranzlmüller, Andreas Knüpfer, and Wolfgang E. Nagel. Pattern matching of collective MPI operations. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA '04), Las Vegas, Nevada, USA, June 21-24 2004.
[39] Chee Wai Lee, Celso Mendes, and Laxmikant V. Kale. Towards scalable performance analysis and visualization through data reduction. In 13th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2008), held in conjunction with IPDPS 2008, 2008.
[40] Allen D. Malony, Dan Reed, and Harry Wijshoff. Performance measurement intrusion and perturbation analysis. IEEE Transactions on Parallel and Distributed Systems, 3(4):433-450, July 1992.
[41] Barton P. Miller. What to draw? When to draw?: An essay on parallel program visualization. J. Parallel Distrib. Comput., 18(2):265-269, 1993.
[42] Barton P. Miller, Mark D. Callaghan, Jonathan M. Cargille, Jeffrey K. Hollingsworth, R. Bruce Irvin, Karen L. Karavanic, Krishna Kunchithapadam, and Tia

Newhall. The Paradyn parallel performance measurement tool. Computer, 28(11):37-46, 1995.
[43] Kathryn Mohror and Karen L. Karavanic. Performance tool support for MPI-2 on Linux. In SC '04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, page 28, Washington, DC, USA, 2004. IEEE Computer Society.
[44] Kathryn Mohror and Karen L. Karavanic. A study of tracing overhead on a high-performance Linux cluster. Technical Report TR-06-06, Portland State University Computer Science Department, December 2006.
[45] Kathryn Mohror and Karen L. Karavanic. Towards scalable event tracing for high-end systems. In High Performance Computing and Communications, Third International Conference (HPCC 2007), Houston, Texas, USA, pages 695-706, September 26-28 2007.
[46] W. E. Nagel, A. Arnold, M. Weber, H. C. Hoppe, and K. Solchenbach. VAMPIR: Visualization and analysis of MPI resources. Supercomputer, 12(1):69-80, 1996.
[47] Oscar Naim and Anthony J. G. Hey. Visualization of do-loop performance. In HPCN Europe, pages 878-887, 1997.
[48] O. Nickolayev, P. Roth, and D. Reed. Real-time statistical clustering for event trace reduction. International Journal of High Performance Computing Applications, 11(2):69-80, 1997.
[49] Michael Noeth, Frank Mueller, Martin Schulz, and Bronis R. de Supinski. Scalable compression and replay of communication traces in massively parallel environments. In 21st International Parallel and Distributed Processing Symposium (IPDPS'07), March 2007.
[50] D. Ogle, K. Schwan, and R. Snodgrass. Application-dependent dynamic monitoring of distributed and parallel systems. IEEE Transactions on Parallel and Distributed Systems, 4(7):762-778, July 1993.
[51] Fabrizio Petrini, Darren J. Kerbyson, and Scott Pakin. The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q. In Proceedings of the 2003 ACM/IEEE Conference on Supercomputing (SC'03), Phoenix, Arizona, USA, page 55, November 15-21 2003.
[52] V. Pillet, J. Labarta, T. Cortes, and S. Girona. PARAVER: A tool to visualise and analyze parallel code. In Proceedings of WoTUG-18: Transputer and occam Developments, volume 44, pages 17-31, Amsterdam, 1995. IOS Press.
[53] Prasun Ratn, Frank Mueller, Bronis R. de Supinski, and Martin Schulz. Preserving time in large-scale communication traces. In ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 46-55, New York, NY, USA, 2008. ACM.
[54] D. Reed, R. Olson, R. Aydt, T. Madhyastha, T. Birkett, D. Jensen, B. Nazief, and B. Totty. Scalable performance environments for parallel systems. In Proceedings of the 6th Distributed Memory Computing Conference, pages 562-569, April 28 - May 1 1991.
[55] D. Reed, P. Roth, R. Aydt, K. Shields, L. Tavera, R. Noe, and B. Schwartz. Scalable performance analysis: The Pablo performance analysis environment. In

Proceedings of the Scalable Parallel Libraries Conference, Mississippi State, MS, USA, pages 104-113, October 6-8 1993.
[56] Philip C. Roth, Dorian C. Arnold, and Barton P. Miller. MRNet: A software-based multicast/reduction network for scalable tools. In SC '03: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, page 21, Washington, DC, USA, 2003. IEEE Computer Society.
[57] S. Sarukkai and A. Malony. Perturbation analysis of high level instrumentation for SPMD programs. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, pages 44-53, 1993.
[58] Sameer Shende and Allen D. Malony. The TAU parallel performance system. International Journal of High Performance Computing Applications, 20(2):287-311, 2006.
[59] F. Song, F. Wolf, N. Bhatia, J. Dongarra, and S. Moore. An algebra for cross-experiment performance analysis. In Proceedings of the International Conference on Parallel Processing (ICPP), pages 63-72, Montreal, Canada, August 2004. IEEE Computer Society.
[60] Daniel P. Spooner and Darren J. Kerbyson. Performance feature identification by comparative trace analysis. Future Generation Comp. Syst., 22(3):369-380, 2006.
[61] Cluster File Systems. Lustre: A scalable, high-performance file system. http://www.lustre.org/docs/whitepaper.pdf, November 2002. Downloaded June 2006.
[62] Eno Thereska, Brandon Salmon, John D. Strunk, Matthew Wachs, Michael Abd-El-Malek, Julio Lopez, and Gregory R. Ganger. Stardust: Tracking activity in a distributed storage system. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2006, Saint Malo, France, pages 3-14, June 26-30 2006.
[63] Using Cray performance analysis tools. http://docs.cray.com/books/S-2376-31/S-2376-31.pdf, October 2006. Downloaded on February 23, 2007.
[64] Jeffrey Vetter. Performance analysis of distributed applications using automatic classification of communication inefficiencies. In ICS '00: Proceedings of the 14th International Conference on Supercomputing, pages 245-254, New York, NY, USA, 2000. ACM.
[65] Jeffrey Vetter. Dynamic statistical profiling of communication activity in distributed applications. In Proceedings of ACM SIGMETRICS 2002 International Conference on Measurement and Modeling of Computer Systems, Marina Del Rey, CA, USA, pages 240-250, June 15-19 2002.
[66] A. Waheed, V. F. Melfi, and D. T. Rover. A model for instrumentation system management in concurrent computer systems. In HICSS '95: Proceedings of the 28th Hawaii International Conference on System Sciences, page 432, Washington, DC, USA, 1995. IEEE Computer Society.
[67] A. Waheed, D. Rover, and J. Hollingsworth. Modeling and evaluating design alternatives for an on-line instrumentation system: A case study. IEEE Transactions on Software Engineering, 24(6):451-470, June 1998.
[68] James S. Walker. A Primer on Wavelets and Their Scientific Applications. Chapman & Hall/CRC, 2008.

[69] K. Williams, M. Andersland, J. Gannon, J. Lumpp, Jr., and T. Casavant. Perturbation tracking. In Proceedings of the 32nd IEEE Conference on Decision and Control, San Antonio, TX, pages 299-316, 1996.
[70] C. Winstead, H. Pritchard, and V. McKoy. Tuning I/O Performance on the Paragon: Fun with Pablo and Norma. IEEE Computer Society Press, 1996.
[71] F. Wolf, B. Mohr, J. Dongarra, and S. Moore. Automatic analysis of inefficiency patterns in parallel applications. Concurrency and Computation: Practice and Experience, 19:1481-1496, 2007.
[72] Felix Wolf, Allan Malony, Sameer Shende, and Alan Morris. Trace-based parallel performance overhead compensation. In Proceedings of the International Conference on High Performance Computing and Communications (HPCC), Sorrento, Italy, September 2005.
[73] C. Eric Wu, Anthony Bolmarcich, Marc Snir, David Wootton, Farid Parpia, Anthony Chan, Ewing Lusk, and William Gropp. From trace generation to visualization: A performance framework for distributed parallel systems. In Supercomputing '00: Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM), page 50, Washington, DC, USA, 2000. IEEE Computer Society.
[74] C. Eric Wu, Hubertus Franke, and Yew-Huey Liu. A unified trace environment for IBM SP systems. IEEE Parallel Distrib. Technol., 4(2):89-93, 1996.
[75] K. Yaghmour and D. Dagenais. Measuring and characterizing system behavior using kernel-level event logging. In Proceedings of the USENIX Annual 2000 Technical Conference, San Diego, CA, USA, pages 13-26, June 2000.
[76] J. Yan and S. Listgarten. Intrusion compensation for performance evaluation of parallel programs on a multicomputer. In Proceedings of the 6th International Conference on Parallel and Distributed Systems, Louisville, KY, October 14-16 1993.
[77] Jerry C. Yan, Haoqiang H. Jin, and Melisa A. Schmidt. Performance data gathering and representation from fixed-size statistical data. Technical Report NAS-98-003, NASA Ames Research Center, 1998.
[78] Jerry C. Yan and Sekhar R. Sarukkai. Analyzing parallel program performance using normalized performance indices and trace transformation techniques. Parallel Computing, 22(9):1215-1237, 1996.
[79] Jerry C. Yan and Melisa Schmidt. Constructing space-time views from fixed size trace files - getting the best of both worlds. In Parallel Computing: Fundamentals, Applications and New Directions, Proceedings of the Conference (ParCo'97), Bonn, Germany, pages 633-640, September 19-22 1997.
[80] Omer Zaki, Ewing Lusk, William Gropp, and Deborah Swider. Toward scalable performance visualization with Jumpshot. The International Journal of High Performance Computing Applications, 13(3):277-288, Fall 1999.

Appendix: Additional Trace Similarity Study Results

List of Figures

Fig. 1 Intra-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Relative Distance ...... 133
Fig. 2 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Absolute Distance ...... 134
Fig. 3 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Manhattan Distance ...... 135
Fig. 4 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Euclidean Distance ...... 136
Fig. 5 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Chebyshev Distance ...... 137
Fig. 6 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Keep k Iterations ...... 138
Fig. 7 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Average Wavelet Transform ...... 139
Fig. 8 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Haar Wavelet Transform ...... 140
Fig. 9 Intra-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and relDiff, absDiff, Manhattan ...... 141
Fig. 10 Intra-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and Euclidean, Chebyshev, iter_k ...... 142
Fig. 11 Intra-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and Wavelet Transforms ...... 143
Fig. 12 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for dyn_load_balance ...... 144
Fig. 13 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for early_gather ...... 145
Fig. 14 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for imbalance_at_mpi_barrier ...... 146
Fig. 15 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for late_broadcast ...... 147
Fig. 16 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for late_receiver ...... 148
Fig. 17 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for late_sender ...... 149
Fig. 18 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_32 ...... 150
Fig. 19 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_32 ...... 151
Fig. 20 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_32 ...... 152
Fig. 21 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_32 ...... 153
Fig. 22 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_32 ...... 154
Fig. 23 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_1024 ...... 155
Fig. 24 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_1024 ...... 156
Fig. 25 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_1024 ...... 157
Fig. 26 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_1024 ...... 158
Fig. 27 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_1024 ...... 159
Fig. 28 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_8p ...... 160
Fig. 29 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_32p ...... 161
Fig. 30 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Relative Distance ...... 162
Fig. 31 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Absolute Distance ...... 163
Fig. 32 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Manhattan Distance ...... 164
Fig. 33 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Euclidean Distance ...... 165
Fig. 34 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Chebyshev Distance ...... 166
Fig. 35 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Average Wavelet ...... 167
Fig. 36 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Haar Wavelet ...... 168
Fig. 37 Inter-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and relDiff, absDiff, and Manhattan ...... 169
Fig. 38 Inter-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and Euclidean and Chebyshev ...... 170
Fig. 39 Inter-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and avgWave and haarWave ...... 171
Fig. 40 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for dyn_load_balance ...... 172
Fig. 41 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for early_gather ...... 173
Fig. 42 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for imbalance_at_barrier ...... 174
Fig. 43 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for late_broadcast ...... 175
Fig. 44 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for late_receiver ...... 176
Fig. 45 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for late_sender ...... 177
Fig. 46 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_32 ...... 178
Fig. 47 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_32 ...... 179
Fig. 48 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_32 ...... 180
Fig. 49 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_32 ...... 181
Fig. 50 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_32 ...... 182
Fig. 51 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_1024 ...... 183
Fig. 52 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_1024 ...... 184
Fig. 53 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_1024 ...... 185
Fig. 54 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_1024 ...... 186
Fig. 55 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_1024 ...... 187
Fig. 56 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_8p ...... 188
Fig. 57 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_32p ...... 189
Fig. 58 Combined Reduction: Retention of Performance Trends with Default Thresholds for dyn_load_balance ...... 190
Fig. 59 Combined Reduction: Retention of Performance Trends with Default Thresholds for early_gather ...... 190
Fig. 60 Combined Reduction: Retention of Performance Trends with Default Thresholds for imbalance_at_barrier ...... 190
Fig. 61 Combined Reduction: Retention of Performance Trends with Default Thresholds for late_broadcast ...... 191
Fig. 62 Combined Reduction: Retention of Performance Trends with Default Thresholds for late_receiver ...... 191
Fig. 63 Combined Reduction: Retention of Performance Trends with Default Thresholds for late_sender ...... 191
Fig. 64 Combined Reduction: Retention of Performance Trends with Default Thresholds for Nto1_32 ...... 192
Fig. 65 Combined Reduction: Retention of Performance Trends with Default Thresholds for NtoN_32 ...... 192
Fig. 66 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1toN_32 ...... 192
Fig. 67 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1r_32 ...... 193
Fig. 68 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1s_32 ...... 193
Fig. 69 Combined Reduction: Retention of Performance Trends with Default Thresholds for Nto1_1024 ...... 193
Fig. 70 Combined Reduction: Retention of Performance Trends with Default Thresholds for NtoN_1024 ...... 194
Fig. 71 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1toN_1024 ...... 194
Fig. 72 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1r_1024 ...... 194
Fig. 73 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1s_1024 ...... 195
Fig. 74 Combined Reduction: Retention of Performance Trends with Default Thresholds for sweep3d_8p ...... 195
Fig. 75 Combined Reduction: Retention of Performance Trends with Default Thresholds for sweep3d_32p ...... 195
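Several of the figures below compare distance measures (relDiff, absDiff, Manhattan, Euclidean, Chebyshev) applied to vectors of event timings. For reference, the standard definitions of these measures can be sketched as follows; the function names and the per-element relative-difference formulation are illustrative assumptions, not the implementation used in this study.

```python
import math

# Standard distance measures between two equal-length vectors of event
# timings. These are textbook definitions; the thresholds and any
# normalization used in the study itself may differ.

def manhattan(a, b):
    # Sum of per-element absolute differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # Square root of the sum of squared differences (L2 distance).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    # Largest single per-element absolute difference (L-infinity).
    return max(abs(x - y) for x, y in zip(a, b))

def rel_diff(a, b):
    # One plausible reading of "relative difference": the worst-case
    # per-element gap scaled by the larger magnitude, so 0.0 means
    # identical and values near 1.0 mean maximally different.
    return max(abs(x - y) / max(abs(x), abs(y), 1e-12)
               for x, y in zip(a, b))
```

For example, `manhattan([1, 2], [2, 4])` is 3 while `chebyshev([1, 2], [2, 4])` is 2; in the figures, a larger threshold on any of these measures admits more event pairs as "similar" and so yields smaller files at the cost of a larger approximation distance.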


Fig. 1 Intra-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Relative Distance
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for the interference, early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender, and dyn_load_balance benchmarks.]

Fig. 2 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Absolute Distance
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for each benchmark.]

Fig. 3 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Manhattan Distance
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for each benchmark.]

Fig. 4 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Euclidean Distance
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for each benchmark.]

Fig. 5 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Chebyshev Distance
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for each benchmark.]

Fig. 6 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Keep k Iterations
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for each benchmark.]

Fig. 7 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Average Wavelet Transform
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for each benchmark.]

Fig. 8 Intra-process Reduction: File Size and Approximation Distance for Varying Threshold and Haar Wavelet Transform
[Chart image not recoverable from the scanned source; panels plot File Size and Approximation Distance against Thresholds for each benchmark.]

139 tO 0 0 -t - 6. 0 ,D,. 0 0 - 0 sweep3d_8p a:i°' sweep3d_8p g 6. sweep3d_32p .6. sweep3d_32p tO I 0 u 0 -t - °'c: 0 (i) 0 - tO 0 °' ~ tO ~ [::. >.°' i5 en tO c: 0 0 0 0 QJ -t - 0 - -N QJ fil 0 1:i5 """' E QJ ·x """' B: tO ~ 6. e 0 0 0.. -t - 0.. 8 <( 0 C\I ~[::. C\I 0 °' _t::._.6. 0 I! 0 0----- 0 -o-o-o 0 - .6---6---0.-.¢.-.0 -t - 0-- QJ I I I I I I 0 0.1 0.2 0.5 1.0 0.1 0.2 0.5 LO

Thresholds Thresholds Relative Difference

0 6. b.. 0 sweep3d_8p 8 0 sweep3d_8p 0 6. a:i .6. sweep3d_32p tO sweep3d_32p QJ 0 u 0 0 !:::,. / c: d; 0 tO $ 0 i (,I) tO >. tO i5 @. 0 c: 0 -t 0 0 QJ QJ 0 N fil 0 C75 """' ·xE """' ..9:l tO 0 0 u:: 0 0 -t !:::,. 0. 0 QJ 0 C\I ~ ~ C\I 0 .6.----..L::,,_-b,.-f::. 8 -t ---0-0-0-0--0 0 .6-o--.6~ QJ 0 1e+01 1e+03 1e+05 1e+01 1e+03 1e+05

Thresholds Thresholds Absolute Difference tO 0 -t 6. .6.-6.-b.. ,...... QJ 0 sweep3d_8p 0 0 .c::..-- 6. sweep3d_32p 0 .6. QJ tO u """' 0 c: -t 0 i ~ b.. ~ 0 >. LO°' i5 (I') @. c: 0 QJ tO N 0 "i 8 1:i5 -t ~ /:::,. C\I QJ E o-o QJ (I') ·x o- B: ~[::. 0 _,D,._b.. 0. 0 0 ___,o------0------tO 0 ~ -t o--o T'" 0 °' --- -o-o-o 0

0.1 0.2 0.5 1.0 0.1 0.2 0.5 1.0

Thresholds Thresholds Manhattan Distance Fig. 9 Intra-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and relDiff, absDiff, Manhattan 140 8 L::i.. in sweep3d_8p sweep3d_32p 0 1~ (]) 0 (.) c: "" --;) tO (]) 0 ~ 0 l:!,. 0 >. + - \ i5 Cl) Q.) fa. M c: (]) 0 N "i 0 Ci5 0 ~A E N o-O (]) '--.... ·x 0 -­ l::>.-....l::. o-- 0: -L::i., e tO 0. 0 0------0 0 + - ~ 0-- (])

[Figure: panels Euclidean Distance, Chebyshev Distance, iter_k (k iterations, 10–500); series sweep3d_8p, sweep3d_32p.]
Fig. 10 Intra-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and Euclidean, Chebyshev, iter_k

[Figure: panels Average wavelet, Haar wavelet; series sweep3d_8p, sweep3d_32p.]
Fig. 11 Intra-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and Wavelet Transforms

[Figure: trend-retention chart across threshold settings (no loss; 0.1–1.0; 10–100000; 500–1).]
Fig. 12 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for dyn_load_balance

[Figure: trend-retention chart across varying threshold settings.]
Fig. 13 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for early_gather

[Figure: MPI Barrier trend-retention chart across varying threshold settings.]
Fig. 14 Intra-process Reduction: Retention of Performance Trends with Varying Threshold for imbalance_at_mpi_barrier

[Figure: MPI Bcast / do work trend-retention chart across varying threshold settings.]
Fig. 15 Intra-process Reduction: Retention of Performance Trends with Varying Threshold for late_broadcast

[Figure: MPI Ssend / do work trend-retention chart across varying threshold settings.]
Fig. 16 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for late_receiver

[Figure: MPI Recv / do work trend-retention chart across varying threshold settings.]
Fig. 17 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for late_sender

[Figure: trend-retention chart across varying threshold settings.]
Fig. 18 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_32

[Figure: do work trend-retention chart across varying threshold settings.]
Fig. 19 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_32

[Figure: trend-retention chart across varying threshold settings.]
Fig. 20 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_32

[Figure: trend-retention chart across varying threshold settings.]
Fig. 21 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_32

[Figure: trend-retention chart across varying threshold settings.]
Fig. 22 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_32

[Figure: trend-retention chart across varying threshold settings.]
Fig. 23 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_1024

[Figure: trend-retention chart across varying threshold settings.]
Fig. 24 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_1024

[Figure: trend-retention chart across varying threshold settings.]
Fig. 25 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_1024

[Figure: trend-retention chart across varying threshold settings.]
Fig. 26 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_1024

[Figure: trend-retention chart across varying threshold settings.]
Fig. 27 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_1024

[Figure: trend-retention chart across varying threshold settings.]
Fig. 28 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_8p

[Figure: trend-retention chart across varying threshold settings.]
Fig. 29 Intra-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_32p

[Figure: file-size and approximation-distance plots; series: the interference_1to1* benchmarks; early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender; and dyn_load_balance with the interference_Nto1, NtoN, and 1toN benchmarks; thresholds 0.1–1.0.]
Fig. 30 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Relative Distance

[Figure: file-size and approximation-distance plots; series: the interference_1to1* benchmarks; early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender; and dyn_load_balance with the interference_Nto1, NtoN, and 1toN benchmarks; thresholds 1e+01–1e+05.]
Fig. 31 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Absolute Distance

[Figure: file-size and approximation-distance plots; series: the interference_1to1* benchmarks; early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender; and dyn_load_balance with the interference benchmarks; thresholds 0.1–1.0.]
Fig. 32 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Manhattan Distance

[Figure: file-size and approximation-distance plots; series: the interference_1to1* benchmarks; early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender; and dyn_load_balance with the interference benchmarks; thresholds 0.1–1.0.]
Fig. 33 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Euclidean Distance

[Figure: file-size and approximation-distance plots; series: the interference_1to1* benchmarks; early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender; and dyn_load_balance with the interference benchmarks; thresholds 0.1–1.0.]
Fig. 34 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Chebyshev Distance

[Figure: file-size and approximation-distance plots; series: the interference_1to1* benchmarks; early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender; and dyn_load_balance with the interference benchmarks; thresholds 0.1–1.0.]
Fig. 35 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Average Wavelet

[Figure: file-size and approximation-distance plots; series: the interference_1to1* benchmarks; early_gather, imbalance_at_barrier, late_broadcast, late_receiver, late_sender; and dyn_load_balance with the interference benchmarks; thresholds 0.1–1.0.]
Fig. 36 Inter-process Reduction: File Size and Approximation Distance for Varying Duration Thresholds and Haar Wavelet

[Figure: panels Relative Distance, Absolute Difference, Manhattan Distance; series sweep3d_8p, sweep3d_32p.]
Fig. 37 Inter-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and relDiff, absDiff, and Manhattan

[Figure: panels Euclidean Distance, Chebyshev Distance; series sweep3d_8p, sweep3d_32p.]
Fig. 38 Inter-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and Euclidean and Chebyshev

[Figure: panels Average Wavelet, Haar Wavelet; series sweep3d_8p, sweep3d_32p.]
Fig. 39 Inter-process Reduction: File Size and Approximation Distance for Varying Thresholds for Sweep3d and avgWave and haarWave

[Figure: trend-retention chart across varying threshold settings.]
Fig. 40 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for dyn_load_balance

[Figure: MPI Gather / do work trend-retention chart across varying threshold settings.]
Fig. 41 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for early_gather

[Figure: do work trend-retention chart across varying threshold settings.]
Fig. 42 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for imbalance_at_barrier

[Figure: MPI Bcast / do work trend-retention chart across varying threshold settings.]
Fig. 43 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for late_broadcast

[Figure: trend-retention chart across varying threshold settings.]
Fig. 44 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for late_receiver

[Figure: do work trend-retention chart across varying threshold settings.]
Fig. 45 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for late_sender

[Figure: trend-retention chart across varying threshold settings.]
Fig. 46 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_32

[Figure: trend-retention chart across varying threshold settings.]
Fig. 47 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_32

178 0.2 0.4 GI ~ 0.6 ~ 9! .! :! 0.8 ~ :;; LO 10 100 1000 " I 10000 )j 100000 Ill "ti 1000000 0.1 0.2 c Ill 0.4 j ~ 0.6 Iii~ 0.8 :& :;; 1.0 0.1 0.2. c"*.,. 0.4 Ill ~ 0.6 ;s .e 0.8 :I..!! w "ti 1.0 0.1 0.2 > 0.4 i§ 0.6 !.I 0.8 ;;.;I "ti 1.0 0.1 0.2 tl JI

1j 1J

::= 0.6 ii 0.8 = 1.0 Fig. 48 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_32

Fig. 49 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_32

Fig. 50 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_32

Fig. 51 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for Nto1_1024

Fig. 52 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for NtoN_1024

[Figure: trend-retention chart across varying threshold settings.]
Fig. 53 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1toN_1024

Fig. 54 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1r_1024

Fig. 55 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for 1to1s_1024

[Figure: trend-retention chart across varying threshold settings.]
Fig. 56 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_8p

[Figure: trend-retention chart across varying threshold settings.]
Fig. 57 Inter-process Reduction: Retention of Performance Trends with Varying Thresholds for sweep3d_32p

[Figure: MPI Alltoall / do work trend-retention chart; one row per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave).]
Fig. 58 Combined Reduction: Retention of Performance Trends with Default Thresholds for dyn_load_balance

[Figure: MPI Gather / do work trend-retention chart, one row per reduction method.]
Fig. 59 Combined Reduction: Retention of Performance Trends with Default Thresholds for early_gather

[Figure: MPI Barrier / do work trend-retention chart, one row per reduction method.]
Fig. 60 Combined Reduction: Retention of Performance Trends with Default Thresholds for imbalance_at_barrier

[Figure: MPI Bcast / do work trend-retention chart, one row per reduction method.]
Fig. 61 Combined Reduction: Retention of Performance Trends with Default Thresholds for late_broadcast

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); panels: MPI_Ssend, do_work; chart data not recoverable from the extracted text]
Fig. 62 Combined Reduction: Retention of Performance Trends with Default Thresholds for late_receiver

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); panels: MPI_Recv, do_work; chart data not recoverable from the extracted text]
Fig. 63 Combined Reduction: Retention of Performance Trends with Default Thresholds for late_sender

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); chart data not recoverable from the extracted text]
Fig. 64 Combined Reduction: Retention of Performance Trends with Default Thresholds [benchmark name not legible in source]

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave); chart data not recoverable from the extracted text]
Fig. 65 Combined Reduction: Retention of Performance Trends with Default Thresholds for NtoN_32

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); chart data not recoverable from the extracted text]
Fig. 66 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1toN_32

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); chart data not recoverable from the extracted text]
Fig. 67 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1r_32

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); chart data not recoverable from the extracted text]
Fig. 68 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1s_32

Fig. 69 Combined Reduction: Retention of Performance Trends with Default Thresholds for Nto1_1024

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); chart data not recoverable from the extracted text]
Fig. 70 Combined Reduction: Retention of Performance Trends with Default Thresholds [benchmark name not legible in source]

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); panel: do_work; chart data not recoverable from the extracted text]
Fig. 71 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1toN_1024

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave); chart data not recoverable from the extracted text]
Fig. 72 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1r_1024

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave); panel: MPI_Ssend; chart data not recoverable from the extracted text]
Fig. 73 Combined Reduction: Retention of Performance Trends with Default Thresholds for 1to1s_1024

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave, haarWave); panels: pmpi_recv, sweep; chart data not recoverable from the extracted text]
Fig. 74 Combined Reduction: Retention of Performance Trends with Default Thresholds for sweep3d_8p

[Figure: retention of performance trends per reduction method (no loss, relDiff, absDiff, Manhattan, Euclidean, Chebyshev, iter_k, iter_avg, avgWave); panel: sweep; chart data not recoverable from the extracted text]
Fig. 75 Combined Reduction: Retention of Performance Trends with Default Thresholds for sweep3d_32p
