PARALLEL PROGRAMMING METHODOLOGY AND ENVIRONMENT

FOR THE SHARED MEMORY PROGRAMMING MODEL

A Thesis

Submitted to the Faculty

of

Purdue University

by

Insung Park

In Partial Fulfillment of the

Requirements for the Degree

of

Doctor of Philosophy

December


To my beloved grandmother


ACKNOWLEDGMENTS

First, I'd like to thank my grandmother, whom I have not seen for more than two years and whom I will never see again. She fled to South Korea with four little daughters during the Korean War and started a new life in an unfamiliar place with her bare hands. Her courage, perseverance, and endurance have led to my existence. Over the years in graduate school, she has always been on my side, lending a sympathetic ear and doing her best to keep me sane. I wish I could see her just one more time.

I'd like to thank my advisor, Dr. Rudolf Eigenmann, for his encouragement and advice during my research. His insightful comments and constructive suggestions are greatly appreciated. I also express my gratitude to my graduate committee members, Dr. Jose A. B. Fortes, Dr. Howard J. Siegel, and Dr. Elias Houstis, for their time and advice.

My deepest love goes to my parents and two brothers, In Jun and In Kwon. I can never thank them enough for their never-ending support that has made me come through with my research. Through ups and downs in life, their love and encouragement has given me the strength to go on with my life. I am also grateful to my aunts, uncles, and cousins, who have never hidden their pride in me and concern for my well-being.

Fresh and valuable perspectives that the members of our research group have provided are greatly appreciated. Among them, Mike, Seon, Brian, and Vishal have made extra efforts to help me with my research, which I deeply acknowledge.

Mike, Natalie, and Nicholas deserve special mention for always being there for me. I cherish them as my brother, sister, and nephew. Without them, I would not have made it this far. I believe one of the reasons God led me here is to meet them. I also


value my to-be-lifelong friendship with Seon Young and their precious daughter Arden. The numerous evenings I have spent with all these friends are precious to me.

I appreciate many of my Korean friends here at Purdue. Especially, I extend my thanks to Jonghyeok and JeHo. The life here has been joyous and fun because of them. Thanks are also due to their wives, who have fed this single, hungry graduate student countless times. I'd also like to mention In Sung, Jae Hyung, Yonghee, Soon Keon, Heon, Seungmoon, Soohong, Jang Won Il, Jung Min, Hun Soo, Woon Young, Jong Sun, Se Hyun, and their families.

Lastly, I send my best regards to Joon Sook and her family. I wish them happiness.


TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

INTRODUCTION
    Motivation
        State of parallel computing
        Open issues in the shared memory programming model
        Need for parallel programming environment
    Thesis Organization

BACKGROUND
    Parallel Programming Concepts, Terminology, and Notations
    Parallelization in the Shared Memory Programming Model
        Introduction
        History of parallel shared memory directives
        Shared memory program execution
        Automatic parallelization
    Parallelization in the Message Passing Programming Model
        MPI and PVM
        HPF
        Visual parallel programming systems
    Parallel Programming and Optimization Methodology
        Shared memory programming methodology
        Message passing programming methodology
    Tools
        Program development and optimization
        Instrumentation
        Performance visualization and evaluation
        Guidance
    Utilizing Web Resources for Parallel Programming
    Conclusions

SHARED MEMORY PROGRAM OPTIMIZATION METHODOLOGY
    Introduction: Scope, Audience, and Metrics
        Scope of the proposed methodology
        Target audience
        Metrics: understanding overheads
    Parallel Program Optimization Methodology
        Instrumenting the program
        Getting serial execution time
        Running the parallelizing compiler
        Manually optimizing programs
        Getting optimized execution time
        Finding and resolving performance problems
    Conclusions

TOOL SUPPORT FOR PROGRAM OPTIMIZATION METHODOLOGY
    Design Objectives
    Ursa Minor: Performance Evaluation Tool
        Functionality
        Internal Organization of the Ursa Minor tool
        Database structure and data format
        Summary
    InterPol: Interactive Tuning Tool
        Overview
        Functionality
        Summary
    Other Tools in Our Toolset
        Polaris parallelizing compiler
        InterAct performance monitoring and steering tool
        MaxP parallelism analysis tool
    Integration with Methodology
        Tool support in each step
        Other useful utilities
    The Parallel Programming Hub and Ursa Major
        Parallel Programming Hub: globally accessible integrated tool environment
        Ursa Major: making a repository of knowledge available to the worldwide audience
    Conclusions

EVALUATION
    Methodology Evaluation: Case Studies
        Manual tuning of ARC2D
        Evaluating a parallelizing compiler on a large application
        Interactive compilation
        Performance advisor: hardware counter data analysis
        Performance advisor: simple techniques to improve performance
    Efficiency of the Tool Support
        Facilitating the tasks in parallel programming
        General comments from users
    Comparison with Other Parallel Programming Environments
    Comparison of Ursa Major and the Parallel Programming Hub
    Conclusions

CONCLUSIONS
    Summary
    Directions for Future Work

LIST OF REFERENCES

VITA


LIST OF TABLES

Overhead categories of the speedup component model
Optimization technique application criteria
A detailed breakdown of the performance improvement due to each technique
Common tasks in parallel programming
Time in seconds taken to perform the tasks without our tools
Time in seconds taken to perform the tasks with our tools
Feature comparison of parallel programming environments
Workload distribution on resources with our network-based tools


LIST OF FIGURES

The structure of an SMP
An Origin system: (a) topology and (b) structure of a single node board
Simple parallelization with OpenMP
Screenshot of the CODE visual programming system
The timeline graph from NTV
The graphs generated by AIMS
The graphs generated by Pablo
Typical parallel program development cycle
Overview of the proposed methodology
Scalar privatization: (a) the original loop and (b) the same loop after privatizing variable X
Array privatization: (a) the original loop and (b) the same loop after privatizing variable array A
Scalar reduction: (a) the original loop and (b) the same loop after recognizing reduction variable SUM
Array reduction: (a) the original loop and (b) the same loop after recognizing reduction array A
Induction variable recognition: (a) the original loop and (b) the same loop after replacing induction variable X
Scheduling modification: (a) the original loop and (b) the same loop after modifying scheduling by pushing parallel constructs inside the loop nest. In (b), the inner loop is executed in parallel, so processors access array elements that are at least a stride apart
Padding: (a) the original loop and (b) the same loop after padding extra space into the arrays
Load balancing: (a) the original loop and (b) the same loop after changing to an interleaved scheduling scheme. By changing the scheduling from static to dynamic, unbalanced load can be distributed more evenly
Blocking/tiling: (a) the original loop and (b) the same loop after applying tiling to split the matrices into smaller tiles. In (b), another loop has been added to assign smaller blocks to each processor. The data are likely to remain in the cache when they are needed again
Loop interchange: (a) a loop with poor locality and (b) the same loop with better locality after interchanging the loop nest
Software pipelining and loop unrolling: (a) the original loop, (b) the same loop with software pipelining, where instructions are interleaved across iterations and a preamble and postamble have been added, and (c) the same loop unrolled
Original loop SHALOW do in program SWIM
Parallel version of loop SHALOW do in program SWIM
Optimized version of loop SHALOW do in program SWIM
Main view of the Ursa Minor tool. The user has gathered information on program BDNA. After sorting the loops based on the execution time, the user inspects the percentage of three major loops (ACTFOR do, ACTFOR do, RESTAR do) using a pie chart generator (bottom left). Computing the speedup column with the Expression Evaluator reveals that the speedup for RESTAR do is poor, so the user is examining more detailed information on the loop
Structure view of the Ursa Minor tool. The user is looking at the Structure View generated for program BDNA. Using the Find utility, the user sets the view to subroutine ACTFOR and opens up the source view for the parallelized loop ACTFOR do
The user interface of Merlin in use. Merlin provides solutions to the detected problems. This example shows the problems addressed in loop ACTFOR DO of program BDNA. The button labeled "Ask Merlin" activates the analysis. The "View Source" button opens the source viewer for the selected code section. The "ReadMe for Map" button pulls up the ReadMe text provided by the performance map writer
The internal structure of a Merlin map. The Problem Domain corresponds to general performance problems. The Diagnostics Domain depicts possible causes of the problems, and the Solution Domain contains suggested remedies. Conditions are logical expressions representing an analysis of the data
Building blocks of the Ursa Minor tool and their interactions
The database structure of Ursa Minor
An overview of InterPol. Three main modules interact with users through a Graphical User Interface. The Program Builder handles file I/O and keeps track of the current program variant. The Compiler Builder allows users to arrange optimization modules in Polaris. The Compilation Engine combines the user selections from the other two modules and calls Polaris modules
User interface of InterPol: (a) the main window and (b) the Compiler Builder
Monitoring the example application through the InterAct interface. The main window shows the characterization data of the major loops in the SPEC SWIM benchmark
Tool support for the parallel programming methodology
Ursa Minor usage on the Parallel Programming Hub
Interaction provided by the Ursa Major tool
The (a) execution time and (b) speedup of the various versions of ARC2D. Mod 1: loop interchange; Mod 2: STEPFY do modification; Mod 3: STEPFX do modification; Mod 4: FILERX do modification; Mod 5: YPENTA do modification; Mod 6: modification on XPENTA, YPENT, and XPENT
Contents of the Program Builder during an example usage of the InterPol tool: (a) the input program and (b) the output from the default Polaris compiler configuration
Contents of the Program Builder during an example usage of the InterPol tool: (c) the output after placing an additional dead-code elimination pass prior to inlining and (d) the program after manually parallelizing subroutine two
Performance analysis of the loop STEPFX DO in program ARC2D. The graph on the left shows the overhead components in the original serial code. The graphs on the right show the speedup component model for the parallel code variants before and after loop interchanging is applied. Each component of this model represents the change in the respective overhead category relative to the serial program. Merlin is able to generate the information shown in these graphs
Speedup achieved by applying the performance map. The speedup is with respect to a one-processor run with serial code on a Sun Enterprise system. Each graph shows the cumulative speedup when applying each technique
Overall times to finish all tasks
The response time of UMApplet and UMParHub on (a) a networked PC, (b) a networked workstation, and (c) a dial-up PC
The response time of the three operations on the RETRAN database: (a) loading, (b) spreadsheet command evaluation, and (c) source searching


ABSTRACT

Park, Insung. Ph.D., Purdue University, December. Parallel Programming Methodology and Environment for the Shared Memory Programming Model. Major Professor: Rudolf Eigenmann.

The easy programming model of the shared memory paradigm possesses many attributes desirable to novice programmers. However, there has not been a good methodology with which programmers can navigate through the difficult task of program parallelization and optimization. It is becoming increasingly difficult to achieve good performance without experience and intuition. Guiding methodologies must define easy-to-follow steps for programming and tuning multiprocessor applications. In addition, a parallel programming environment must acknowledge time-consuming steps in the parallelization and tuning process and support users in their efforts.

We propose a parallel programming methodology for the shared memory model and a set of tools designed to assist users in accordance with the methodology. Our research addresses the questions of what to do in parallel program development and tuning, how to do it, and where to do it. Our main contribution is to provide a comprehensive programming environment such that both novice and advanced users can perform performance tuning in an efficient and straightforward manner. Our effort differs from other parallel programming environments in that it integrates most stages of parallel programming tasks based on a common methodology, and it addresses issues that have not been attempted in previous efforts. We have used network computing technology so that programmers worldwide can benefit from our work. Through a series of evaluation processes, we found that our programming environment provides a methodology that works well with parallel applications and that our tools provide efficient support to both novice and advanced programmers.

INTRODUCTION

Motivation

State of parallel computing

Multiprocessor machines have existed in many different architectures. Among them, shared memory machines have been getting much attention recently. This is mainly due to the fact that the shared memory architecture offers an easy programming model and that the techniques for parallelization of programs for this class of machines are well established and can be automated.

Today, new affordable multiprocessor workstations and PCs are attracting an increasing number of users; consequently, these new programmers are inexperienced and desire an easier programming model to harness the power of parallel computing. These aspects draw more attention to shared memory machines in two ways. First, most newly developed parallel computers are shared memory machines or compatible with the shared memory programming model. Second, the aforementioned easy programming model, with the help of parallelizing compilers, requires relatively little experience to develop parallel programs.

The effort in the industry toward the standardization of a programming model makes shared memory machines more appealing. The lack of a standardized parallel language had been a problem with the shared memory model. It often required programmers to learn a new set of language constructs whenever there was a need to port programs across platforms. To make matters worse, the difference among these native dialects in their ability to express parallelism was significant enough that, in many cases, a considerable change had to be made in the program code itself, going beyond direct directive translation. There have been several attempts to provide standard parallel languages, which will be discussed in Chapter 2, but they failed to get the attention of the parallel computing community in general.

The recent parallel language standard for shared memory multiprocessor machines, OpenMP, promises an attractive interface for those programmers who wish to exploit parallelism explicitly. The OpenMP standard resolves the portability problem and is expected to attract more programmers and computer vendors in the high performance computing area.

Open issues in the shared memory programming model

There are, however, open issues to be addressed. Perhaps the most serious of all is the lack of a good programming methodology for these types of machines. In contrast to several efforts to establish a methodology for other programming models, no known literature speaks of a programming and tuning methodology for the shared memory model. A programmer who is to develop a parallel program has to face a number of challenging questions. What are the known techniques for parallelizing this program? What information is available for the program at hand? How much speedup can be expected from this program? What are the limitations for the parallelization of this program? It usually takes substantial experience to find the answers to such questions. Most general programmers do not have the time and resources to acquire this experience.

We believe that the absence of a programming methodology is attributable to three reasons. First, many advanced parallel programmers are used to programming in terms of application-level parallelism. By this we mean the study of the underlying physics and algorithms to find parallelism residing at that level. It is indeed an effective method if it succeeds, because in some cases the scope of the resulting parallelism is wider than the finer grain parallelism of the directive-based programming model, resulting in less synchronization overhead. However, this approach requires significant effort to understand the underlying physics, and it is prone to human error. It is not a rare case in which a programmer realizes in a later stage of development that the algorithm that he or she thought to be parallel is actually sequential. If the person parallelizing a program is not the programmer who wrote it, the required effort doubles, as an understanding of the program has to precede parallelization. Furthermore, depending on the problem that programmers wish to solve, the underlying algorithms and physical models vary significantly, making a systematic approach to parallel application design difficult. A programmer who is used to this approach has to tackle each problem case by case, relying on intuition and experience.

In contrast to the application-level approach, there is a program-level parallelism approach. This means an effort to find parallelism based on the source code and how it is written. Focusing only on repetitive computing constructs (loops), this approach allows automatic recognition of parallelism and possible transformations. Numerous research projects have addressed the issues of identifying parallelism and applying the corresponding transformations that can be incorporated into compilers. Nevertheless, these are not parallel programming methodologies by themselves. These researchers address only one part of parallel program development: parallelization. A complete parallel programming methodology has to encompass the entire development process, including parallelization, evaluation, tuning, and so on.

The second reason for the lack of a methodology for the shared memory architecture stems from the significant aid provided by parallelizing compilers. Many inexperienced programmers expect a significant speedup after running a parallelizing compiler. Indeed, these compilers simplify the process considerably. However, running a parallelizing compiler does not necessarily achieve high performance. To achieve optimal performance from a program, many factors often have to be considered, including both machine-dependent and independent parameters, underlying algorithms, and so on. As shown in earlier studies, without proper consideration of these effects, the resulting performance may even degrade. We believe that there is room for a systematic way to provide users with guidelines and remedies that can be incorporated into a structured methodology.

Finally, there are some aspects of the shared memory model that make it hard to develop a general methodology. As mentioned above, the shared memory model offers an easy programming interface. This does not mean that obtaining good performance is easy as well. Unlike some other programming models, such as a message passing scheme where a programmer explicitly dictates synchronization and the sending and receiving of messages, important events such as multiple processors writing to a shared variable or false sharing are not readily visible to users in the shared memory model. Furthermore, these effects are hard to measure, if not impossible, without introducing significant overhead. Therefore, if the performance is not satisfactory, inexperienced programmers have difficulties finding what caused it. An increasing number of Non-Uniform Memory Access (NUMA) machines add more complexity because they introduce another variable to consider, namely memory latency. The shared memory programming model provides an easy, transparent means of expressing parallelism, but the price is that parallel performance optimization requires significant time and resources. A good methodology should be general enough to cover a variety of architectures and applications, but flexible enough to help programmers pinpoint the bottlenecks and resolve the problems in a specific situation.

Need for parallel programming environment

With the gaining momentum of the shared memory architecture, a methodology for the shared memory model is needed. The shared memory model provides a simple user interface; what we do not have now is an easier way to produce good performance. The methodology has to be a set of structured guidelines that encompass the whole process of program development, while providing useful tips with which users can navigate through difficult steps. As there are a variety of issues to deal with, it has to be general without losing its utility when applied to real environments.

A good methodology does not suffice without proper support from tools. Listing the tasks that need to be completed cannot be of much help to programmers if all those tasks are to be accomplished manually, with only basic utilities available on the target machine. During an optimization process, programmers face challenges in analysis and performance data management, incremental application of parallelization and optimization techniques, performance measurement and monitoring, and problem identification and devising remedies. Each of these tasks poses a significant burden on programmers, and without any help it can be a time-consuming task.

This leads to the need for supporting facilities for the underlying methodology. These facilities need to address the difficult and time-consuming steps specified by the methodology and provide functionality that accelerates these steps. Together, the methodology and the tools should be able to make up for the lack of experience among novice programmers wherever it is required most, such as analysis, diagnosis, and the formation of solutions. We acknowledge the many tools designed for the purpose of helping programmers, but the majority of them focus on specific aspects or environments in the program development process and are not based on a methodology. We believe that providing a more comprehensive and actively guiding toolset is possible with the current technology.

Another problem with the current tools is their accessibility. If useful tools cannot be easily found and used, the effort to develop such tools is wasted. Furthermore, as more diverse multiprocessors find their users, the compatibility issue has become an important factor in a tool's applicability. As the existing programming models converge to the standard (OpenMP), tool developers should consider this problem. With the emerging network technology and new portable languages such as Java, we already have the basic framework enabling more accessible parallel programming tools.

We present here our results on the subject of a parallel programming methodology and supporting tools. We have developed a methodology that has worked well under various environments and a set of tools that address difficult tasks in the shared memory model. Combining the methodology and the supporting tools we developed, programmers can now follow a structured approach toward optimal performance with the support of efficient tools. This optimization paradigm is available to a general audience through the Purdue University Network Computing Hub (PUNCH) and a Java Applet application, allowing our methodology and tool support to reach many users throughout the globe.

Thesis Organization

Chapter 2 gives a brief overview of the history and background of parallel programming, focusing on methodologies and programming tools. Chapter 3 presents our proposed methodology toward these issues, and the supporting tools developed for the methodology are summarized in Chapter 4. Chapter 5 discusses the evaluation process and the results. Chapter 6 concludes the thesis.

BACKGROUND

In this chapter, we examine previous efforts in developing programming methodologies and tools for parallel programming, targeted towards the two well-known programming models: the shared memory and the distributed memory models. Our research can be summarized as building a comprehensive programming environment by designing a good programming methodology, providing a toolset that supports it, and making our results available to a wide audience. From this perspective, we discuss general concepts in parallel programming, methodologies and tools proposed by other researchers, and previous efforts towards better accessible data repositories and parallel programming tools.

Parallel Programming Concepts, Terminology, and Notations

Parallelism exists in many forms. In this paper, we consider parallel processing in which multiple processors take part in executing a single program. Other parallel schemes, such as instruction-level parallelism or vector architectures, are not the target of our research. There are two major multiprocessor architecture categories: SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, Multiple Data). Among these, we focus on the MIMD architecture, which is the most commonly used architecture these days.

The MIMD architecture consists of two types of machines: shared memory machines and distributed memory machines. The physical memory of the shared memory architecture may be shared or distributed, further dividing the architecture into the Uniform Memory Access (UMA) architecture and the Non-Uniform Memory Access (NUMA) architecture. Some distinguish them by using the terms Symmetric Multi-Processor (SMP) architecture and Distributed Shared Memory (DSM) architecture, respectively. DSM machines seek to resolve the limited capacity of shared memory buses, which prevents scaling to a large number of processors on a conventional SMP architecture. The figure below shows a typical flat SMP architecture with four processors. By contrast, the figure that follows it shows the architecture of a Cray Origin system, which is a DSM machine.

CPU 1, CPU 2, ..., CPU P, each with an external cache, connected to a shared main memory.

Fig. The structure of an SMP

From the programmer's point of view, there are two main models for programming on parallel machines: the shared memory programming model and the message passing programming model. There are other programming models that target a cluster of SMP machines or parallel logic environments, but they are not widely used and will not be discussed in detail.

The shared memory model and the message passing programming model share the same basic concept: threads. A single process forks multiple threads that independently execute portions of a program. The difference between these two is how threads access memory. In the shared memory model, multiple processors share a single memory space, so processors can read or write to the shared space regardless of where they actually reside. The notion of "shared" and "private" data becomes important. Shared data are visible to all processors participating in the parallel execution. Communication between processors takes place in the form of reading and writing to shared data. Private data, on the other hand, are local to each processor

Node boards, each holding two CPUs with external caches, a Hub ASIC with XIO, and memory and directory, connected by routers (R).

Fig. An Origin system: (a) topology and (b) structure of a single node board

and cannot be accessed by other processors.

By contrast, in the message passing scheme, processors do not share memory. All data are private to the processor that owns them. The message passing scheme requires each processor to be aware of which processor owns what data; thus, if there is a need to read or write a data item that belongs to another processor, the item has to be explicitly sent and received.

These two models provide high-level constructs for easier programming. The shared memory model offers directive languages with which a user specifies whether certain loops can be executed in parallel. Also, users can program directly with threads with the help of thread libraries. In the message passing model, parallel constructs typically come in the form of a library of functions. The library includes functions for sending and receiving messages, synchronization, initialization, and grouping. The Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) are important standards implemented in such libraries. The parallel programmer's task in the message passing model is to incorporate these functions into parallel algorithms. Programmers need to devise ways to split data, communicate, and synchronize, and then write or modify the program based on the design.
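As a rough illustration of this library style, the lines below are a minimal, hypothetical sketch of the kind of calls such a program is built from; the program name, message tag, and data value are invented, and error checking is omitted.

      PROGRAM EXCHANGE
C     Minimal sketch of message passing with MPI library calls:
C     every process learns its rank, then rank 0 sends one value
C     to rank 1, which receives it.
      INCLUDE 'mpif.h'
      INTEGER IERR, RANK, STATUS(MPI_STATUS_SIZE)
      DOUBLE PRECISION X
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      IF (RANK .EQ. 0) THEN
         X = 1.0D0
         CALL MPI_SEND(X, 1, MPI_DOUBLE_PRECISION, 1, 99,
     &                 MPI_COMM_WORLD, IERR)
      ELSE IF (RANK .EQ. 1) THEN
         CALL MPI_RECV(X, 1, MPI_DOUBLE_PRECISION, 0, 99,
     &                 MPI_COMM_WORLD, STATUS, IERR)
      END IF
      CALL MPI_FINALIZE(IERR)
      END

Even in this tiny sketch, the programmer decides explicitly which process owns the data and when it is sent and received, which is the bookkeeping burden discussed later in this chapter.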

Although the shared memory programming model is basically for programming on shared memory machines and the message passing model for programming on distributed memory machines, this mapping between programming models and architectures is not binding. Many modern parallel computers are compatible with both programming models, although their hardware design takes specifically one form or the other. There is still no general agreement as to which architecture and which programming model are more effective, and it is not likely that any one of them will prevail over the other in the near future.

Here, we focus on parallelization in the shared memory model. Although we view parallel program development in terms of programming models, we will keep in mind the effects of specific hardware implementations on program performance, as various machine-dependent parameters play significant roles in program execution. We would like our approach to parallel programming to address some of these hardware-related issues.

Parallelization in the Shared Memory Programming Model

Introduction

The focus of the shared memory programming model is on loops. Loops are the most common means of expressing repetitive computing patterns in a program. The concept of thread execution does not restrict parallelism to the loop level, but the high-level directive languages provided by the shared memory programming model mainly deal with ways to specify parallel loop execution. By exploiting parallelism among loop iterations, the shared memory model often achieves a significant performance gain.

In the shared memory programming model, a programmer specifies parallel execution by annotating the source code with directives. Typically, directives consist of one or more lines indicating serial or parallel execution, variable types (shared, private, and reduction), the scheduling scheme, and a conditional construct (IF directive). Communication and synchronization among processors are implicit inside parallel sections, meaning that those operations are transparent and do not show up in the source code. Also, parallelization is localized; in other words, parallelizing one section of code has no logical effect on the rest of the program (although cache effects can affect the performance of the code outside the parallel section). Transparent synchronization and localized parallel sections make the shared memory programming model an easy scheme to work with, especially for inexperienced programmers. The figure below shows a portion of code taken from an example program that computes pi, before and after parallelization using OpenMP. Lines starting with !$OMP indicate directives. The directive PARALLEL DO indicates that the loop has no loop-carried dependences and may be executed in parallel. The directives PRIVATE and SHARED tell the compiler that the variables in the following parentheses are private or shared, respectively. The directive REDUCTION(+:SUM) indicates that the variable SUM is a summation reduction variable and requires special care for parallel execution. Examining the details of OpenMP is beyond the scope of this thesis; more information can be found in the literature.
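The scheduling and conditional clauses mentioned above follow the same directive style. The lines below are a minimal sketch on a hypothetical loop (the trip-count threshold of 1000 and the chunk size of 16 are arbitrary choices, not values from the example program):

C     Hypothetical loop: parallelize only when the trip count is large
C     enough, and hand out iterations to processors in chunks of 16.
!$OMP PARALLEL DO IF(N .GT. 1000) SCHEDULE(DYNAMIC,16)
!$OMP&  PRIVATE(T) SHARED(A,B,N)
      DO I = 1, N
         T = 2.0*A(I)
         B(I) = T + A(I)
      ENDDO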

By narrowing the main concern to loops, the shared memory model has enabled impressive advances in parallelization and optimization techniques. Well-known techniques for parallelization include advanced data dependence analysis, induction variable substitution, reduction variable recognition, privatization, and so on. In addition, there are locality enhancement techniques that specifically target the shared memory architecture, such as blocking/tiling and load balancing. Most of these techniques have been incorporated into modern parallelizers, which will be presented in the section on automatic parallelization below.
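As a small illustration of one of these techniques, the sketch below shows scalar privatization applied to a hypothetical loop (the array names and the temporary X are invented for this example; the transformations actually used in this thesis are presented in a later chapter):

C     (a) Original loop: the temporary X would be shared by default,
C         so concurrent iterations would race on it.
      DO I = 1, N
         X = A(I) + B(I)
         C(I) = X*X
      ENDDO

C     (b) After scalar privatization: each processor works on its own
C         copy of X, and the iterations become independent.
!$OMP PARALLEL DO PRIVATE(X)
      DO I = 1, N
         X = A(I) + B(I)
         C(I) = X*X
      ENDDO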

History of parallel shared memory directives

As mentioned in the introduction, until the late 1990s the shared memory model had suffered from the lack of a standard language. Computers from different vendors came with their own sets of directives for expressing parallelism, and compilers did not understand any directives other than their own. There had been a few initiatives to resolve this problem. An informal industry group called the Parallel Computing Forum (PCF) was formed to address the issue of standardizing loop parallelism

(a) Original sequential code:

      W = 1.0/N
      SUM = 0.0
      DO I = 1, N
         X = W*(I-0.5)
         SUM = SUM + F(X)
      ENDDO
      PI = W*SUM

(b) After transformation:

      W = 1.0/N
      SUM = 0.0
!$OMP PARALLEL DO PRIVATE(X), SHARED(W),
!$OMP&  REDUCTION(+:SUM)
      DO I = 1, N
         X = W*(I-0.5)
         SUM = SUM + F(X)
      ENDDO
      PI = W*SUM

Fig. Simple parallelization with OpenMP

in Fortran. The group was active for three years before publishing its final report. After PCF was dissolved, a subcommittee (X3H5) authorized by ANSI was formed to establish an independent language model for shared memory programming in Fortran and C. However, interest was eventually lost, and the proposed standards were abandoned, leaving behind the last revision of the X3H5 standards document (Revision M). There have also been commercial portable directive sets, such as the KAP/Pro directive set from Kuck and Associates (KAI). However, since native compilers only support their own directives, portability could only be achieved by transforming directives into thread-based code and compiling the resulting code with native compilers. Overall, all these efforts failed to gain attention from the general parallel computing community.

In 1997, spurred by the rekindled popularity of shared memory machines, Silicon Graphics, Inc. (SGI) and several major high performance computer vendors initiated an effort to establish a new standard directive language. The proposed directive language, named OpenMP, embraces the previous standardization efforts and adds a few new concepts for more expressiveness. Unlike previous attempts, this is an industry-wide effort to resolve a practical problem, so it is likely to result in a successful standard that is supported by the majority of new and existing high performance computers. It seems safe to say that OpenMP ensures the future of the shared memory architecture and its programming model by adding portability across platforms.

Shared memory program execution

Once an executable is generated by compiling a program with directives, programmers can run it as they would run any sequential program. In fact, an OpenMP program starts out as a sequential program and engages other processors as OpenMP parallel constructs are encountered. The user has a number of controls over parallel execution, typically in the form of environment variables. The most important one of them is the environment variable that sets the number of processors participating in the execution of parallel code sections. For programmers who are used to the message passing programming model, it is important to note that there are no configuration scripts or setups necessary to execute an OpenMP program.
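For instance, OpenMP implementations commonly take the thread count from the OMP_NUM_THREADS environment variable; the fragment below is a minimal sketch (a hypothetical program, not from this thesis) showing that the standard runtime routines can also set and query it from within the code.

      PROGRAM HELLO
C     Minimal sketch: set the thread count from the program and let
C     each thread report its identifier inside a parallel region.
      INTEGER OMP_GET_THREAD_NUM
      EXTERNAL OMP_GET_THREAD_NUM
      CALL OMP_SET_NUM_THREADS(4)
!$OMP PARALLEL
      PRINT *, 'hello from thread ', OMP_GET_THREAD_NUM()
!$OMP END PARALLEL
      END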

Automatic parallelization

As the techniques for identifying parallelism and parallelizing loops advance, it is a natural course of action to incorporate them into a compiler so that the whole process takes place without the programmer's involvement. The apparent advantage of using a parallelizing compiler is that the conversion of a given serial program into parallel form is done mechanically by the tool, relieving programmers from worrying about parallelization details. As the impact of parallelizing compilers is significant, especially for the shared memory programming model, a reasonable methodology should consider their role in parallel program development. Thus, we briefly discuss the general aspects of parallelizing compilers in this section.

The effort to automate the parallelization process started with the vectorizers of the 1970s and 1980s. The most important vectorizers among them are the Parafrase compiler from the University of Illinois, the PFC parallelizing compiler developed at Rice University, and the PTRAN compiler from IBM's T. J. Watson Research Laboratory. They laid the foundation for the modern parallelizers. Most of the general techniques for vectorizing arrays within loops remain in the parallelizing compilers of today.

Today, all shared memory multiprocessor machines are equipped with their own parallelizers, and there have been several efforts from academia to create a new generation of state-of-the-art parallelizing compilers for the shared memory programming model. Two of the noticeable recent efforts in this field are the Polaris parallelizing compiler, developed at the University of Illinois and Purdue University, and the SUIF (Stanford University Intermediate Format) parallelizing compiler from Stanford University. They were both built upon their own infrastructures (bases for Polaris and kernels for SUIF), which were designed to help researchers working on compiler technology. The focus of the SUIF compiler is on parallelizing the C language. With such techniques as global data and computation decomposition, communication optimization, array privatization, interprocedural parallelization, and pointer analysis, SUIF boasts an impressive performance gain on many programs.

Polaris, as a compiler, includes advanced capabilities for array privatization, symbolic and nonlinear data dependence testing, idiom recognition, interprocedural analysis, and symbolic program analysis. The Polaris infrastructure provides useful facilities for analyzing and manipulating Fortran programs, which can provide useful information regarding the program structure and its potential parallelism. Polaris has played a major role in our previous efforts in methodology and tool research and will continue to be a major part of our future research. The details of the role of Polaris in our research will be discussed in Chapter 4.

Parallelization in the Message Passing Programming Model

MPI and PVM

Both MPI and PVM provide message passing infrastructures for parallel programs running on distributed memory machines. Ever since the introduction of the first distributed memory machine, the Cosmic Cube from Caltech in the early 1980s, researchers and programmers who saw the potential of distributed memory computers had struggled amid conflicting supporting interfaces, until Oak Ridge National Laboratory's PVM system and a joint US-Europe initiative for a standard message passing interface, eventually named MPI, arrived on the scene. These two interfaces were accepted by the majority of people involved in parallel computing on distributed memory machines and successfully ported to a variety of multiprocessor systems, including shared memory machines.

These two systems take the form of libraries rather than separate language constructs. The libraries consist of functions and subroutines for synchronization and for sending and receiving messages across processors. Users are required to insert calls to these routines to control the parallel execution of a program. This required programmers to change their way of thinking: they had to be the masters that explicitly take care of data distribution, communication, and other parallelization details. Nevertheless, the performance on some distributed memory machines was impressive.

The message passing programming model is well suited for distributed systems with a large number of processors. By carefully controlling the interaction among processors, the performance of some applications that do not require heavy communication is able to scale well as the number of processors increases. Another nice aspect of PVM and MPI is that they enable a cluster of heterogeneous uniprocessor systems to behave like one supercomputer. Good performance of the message passing model, however, often relies on one critical factor: network latency. The time to transfer a message from one processor to another ranges from a hundred to a million clock cycles. If the application at hand requires frequent communication among participating processors, the resulting performance gain can be seriously limited even on the fastest network today, let alone on a cluster of uniprocessors connected by simple network cables. This problem has spawned numerous research efforts regarding data parallelism and work distribution on distributed memory machines, which we will not discuss any further.

Another drawback of the message passing interface is its aforementioned low-level programming style. The amount of bookkeeping for data transfer and synchronization can amount to an intolerable level, and it is all up to the programmer to ensure correct execution. Furthermore, the tricks and tweaks needed to obtain high performance may be overwhelming to inexperienced programmers. Even worse, in this programming model the effort to parallelize a program generally starts from analyzing the underlying physics, making it difficult for programmers other than the original authors to parallelize a program. Overall, learning these interfaces is not particularly difficult, but designing a parallel program to achieve good performance is.

HPF

Many people thought that the message passing programming style is at too low a level to appeal to the general audience. For this reason, a group of researchers at Rice University attempted to provide higher level constructs for programming on distributed memory machines. Their results are Fortran D and its successor, High Performance Fortran (HPF). These are sets of extensions to Fortran. The HPF programming model looks similar to the shared memory model in that it focuses on loop parallelism controlled by directives added in front of loops. In addition, it provides directives for data distribution onto distributed memory systems. HPF translators generate a message passing program based on these directives. Compared to message passing functions, these directives let programmers specify array distribution without burdening them with tedious bookkeeping details.
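As a rough illustration of this directive style, the lines below sketch HPF data distribution directives for a hypothetical array layout (the names, sizes, and processor count are invented); the translator derives the corresponding message passing code from such annotations.

C     Hypothetical arrays distributed block-wise over four abstract
C     processors; B is kept aligned with A so that B(I) and A(I)
C     always reside on the same processor.
      REAL A(1000), B(1000)
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P
!HPF$ ALIGN B(I) WITH A(I)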

However, compared to the shared memory programming model, HPF lacks important constructs such as loop-private arrays, and, most of all, the performance of HPF programs is not as good as that of programs written directly in MPI or PVM. So far, only a handful of compilers and systems fully support HPF.

Visual parallel programming systems

A different approach to simplifying the user interface of the message passing programming model is to achieve an even higher level of abstraction by adopting the visual programming model of such systems as Visual C++ and Visual BASIC. The goal of such research efforts is to develop visual programming environments in which programmers use nodes and arcs to design and implement parallel applications. They opt for a more efficient way of designing and implementing parallel programs; performance evaluation and tuning are not their main concern. Visual programming systems such as HeNCE, Enterprise, CODE, GRAPNEL, PRIO, and Visper belong to this category.

Contrary to the traditional coding model, these systems call for a different paradigm for writing parallel programs. Conventional constructs of a programming language are replaced with visual entities, although programmers are often required to provide some form of textual description to specify the details needed for the intended functionality. These systems comprise not only new programming models but also supporting tools that actually allow programmers to use them. These tools usually come with a set of templates to help programmers in designing parallel programs. A screenshot of the CODE visual parallel programming system is shown in the figure below.

The advantage of these visual parallel programming systems is an efficient representation of complex program structures and parallel constructs. Generally, programmers have less difficulty in grasping the parallel nature of programs using these tools. In addition, they reduce debugging time by providing utilities for automatic translation of parallel constructs. However, the tasks of splitting data and coordinating communication are still left to programmers.

Fig. Screenshot of the CODE visual programming system

Parallel Programming and Optimization Methodology

As explained earlier, the parallel constructs provided by the shared memory programming model and the message passing programming model take significantly different forms. Hence, the corresponding programming methodologies have taken distinct paths.

Shared memory programming methodology

In the shared memory model, parallelism is specified with directives that have no effect on program semantics. Tasks are distributed based on loop iterations, and the key aspects of parallelizing shared memory programs are to detect loop-carried data dependences and to identify shared and private data in each iteration. This can be done by static, program-level analysis. Therefore, the methodology for the shared memory programming model, at the highest level, is to examine the loops in a serial program code region, detect parallelism, and determine shared and private variables. There are publications and lecture notes addressing programming on shared memory machines. They present concepts and notations, explain directives, and discuss parallelization techniques and dependence test criteria. However, they do not offer an overall strategy or a procedural methodology for performance optimization. One exception is a document specifically aimed at optimization for Origin machines, which devotes a section to tuning parallel code for the Origin. This section consists of architecture-specific techniques that are useful in further improving parallel performance. However, compared to the detailed single-processor tuning description in the same text, parallel performance tuning only serves to complement the single-processor case. Also, the document lacks performance problem definitions and a performance evaluation description for parallel programs.
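To make the loop-level focus concrete, the sketch below contrasts a loop that carries a dependence across iterations with one whose iterations are independent; both loops are hypothetical and serve only to illustrate the dependence criteria mentioned above.

C     This loop carries a dependence: iteration I reads A(I-1), which
C     iteration I-1 writes, so its iterations cannot simply be marked
C     parallel.
      DO I = 2, N
         A(I) = A(I-1) + B(I)
      ENDDO

C     This loop has independent iterations and can be annotated as a
C     parallel loop once A, B, and C are identified as shared.
      DO I = 1, N
         A(I) = B(I) + C(I)
      ENDDO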

An alternative way of expressing programs in the shared memory model is to use threads. In this scheme, the programmer packages program sections that can execute concurrently into subroutines and spawns these subroutines as parallel threads. Thread parallelism is at a lower level than directive parallelism; in fact, compilers will translate a directive-parallel program into a thread-parallel program as an intermediate compilation step. Advanced parallel programmers sometimes prefer thread parallelism because it can offer more control over parallel program execution. Usually, this comes at the cost of a higher programming effort. A brief description of shared memory programming with multithreading is given in a lecture note.

Message passing programming methodology

Although HPF provides a directive-based programming model for the message passing model, the programming methodologies found in the literature focus on the application-level approach using library functions. General methodologies for programming with message passing libraries have been described in book form. In one such book, the authors employ the application-level approach (application-driven development, in that book's terms), in which they first categorize a given problem as one of five classes: synchronous applications, loosely synchronous applications, embarrassingly parallel applications, asynchronous problems, and metaproblems. To this end, the book provides the reader with many example algorithms common in scientific computing. Based on the category of the target problem, the book lists possible parallel algorithms and suitable parallel machines. In another, parallel program design consists of four stages: partitioning, communication, agglomeration, and mapping. Partitioning and communication are the tasks of distributing data and coordinating task execution, respectively. In the agglomeration stage, the combined parallel structures, data distribution, and communication are evaluated; if necessary, smaller tasks are combined into a larger task to improve performance or to reduce development cost. Finally, in the mapping stage, each task is assigned to a processor in a manner that attempts to satisfy the design goal. Since parallel constructs are integrated into the program source in the message passing model, program design becomes an important part of parallel programming. This book also gives a detailed description of the evaluation process for parallel performance.

There are two different approaches to abstracting parallel programming in the message passing model using mathematical notations. One is based on parallel program archetypes, or programming paradigms. These are abstract notations that combine computation structure, parallelization strategy, and templates for dataflow and communication. Programmers are given a set of parallel program archetypes or programming paradigms. They then identify an appropriate element within the set that matches the problem they are trying to solve. Finally, they implement the actual program using the parallel structure or the template stated by that element. Using this methodology, programmers can save the time and effort of designing an appropriate parallel structure for a given problem; once they identify the right parallel program archetype or programming paradigm, the implementation becomes simpler. This scheme works well in the case of scientific computing, in which a set of well-known algorithms is used across many applications. The other approach states that programmers begin with a conceptual or formal description of a given problem and find an appropriate parallel structure for the algorithm through a series of suggested analysis processes. This method is more algorithm-specific, and its applicability is even narrower.

Tools

In this section, we briefly introduce the tools that have been developed to help programmers in programming and tuning parallel programs. As the task of developing a well-performing parallel program is very challenging, numerous tools have existed to help programmers. Some have been made public for the general audience, and some were used only within small research groups. Among the public tools, only a few gained attention from the parallel computing community, and even fewer were actually used by other researchers and programmers.

We present here some of the major efforts in developing parallel programming tools. Due to the sheer number of tools, we have divided them into four categories based on their functionality: program development and optimization, instrumentation, performance visualization and evaluation, and guidance. We will examine their advantages and shortcomings and discuss possible improvements. It should be noted that in this section we do not cover tools designed to offer assistance in other aspects of developing parallel programs, such as serial program coding and parallel program debugging. There are numerous general program coding and editing tools. Some of the efforts in parallel program debugging include the portable debugger for parallel and distributed programs, Panorama, TotalView, and Assure. For the tools relevant to our research, we present a detailed comparison later in Chapter 5.

Program development and optimization

In this section, we focus on tools specifically designed for program parallelization and optimization. The objective of these tools is to improve the performance of existing programs by helping users apply various techniques. In addition to the support for manual modifications, these tools generally have automated optimization utilities to make it easy for programmers to apply the techniques to selected parts of a program. We begin with the tools for the shared memory model.

Faust is an ambitious project started at the Center for Supercomputing Research and Development (CSRD) at the University of Illinois in the late 1980s. The tool supports many aspects of programming parallel machines, providing facilities for project database management, automatic program restructuring and editing, graphic browsers for call graphs, and an event display tool for performance evaluation. It is an environment that covers a wide range of parallel programming stages, such as coding, parallelization, and performance tuning. Its emphasis on project management allows it to support a major portion of the entire program development cycle.

The Start/Pat parallel programming toolkit was developed at Georgia Tech to support programming and debugging of parallel programs. It consists of a static analyzer (Start) and an interactive parallelizer (Pat). Its main concern is parallelization; general code optimization is not supported.

ParaScope is an extension of the R^n programming environment developed at Rice University. Like Start/Pat, the focus of ParaScope is automatic or interactive restructuring of sequential programs into parallel form. It integrates an editor, a compiler, and a parallel debugger. The automatic transformation is conducted based on the data dependence information collected by their previous tool, PTool. It provides convenient facilities for parallelization and code transformation.

Faust, Start/Pat, and ParaScope are important milestones in the effort toward interactive optimization tools for parallel programs. Unfortunately, the developers have stopped maintaining these tools, and their target architectures or programming models have been abandoned. Nonetheless, their pioneering work laid the groundwork for the current generation of interactive optimizers.

PTOPP (Practical Tools for Optimizing Parallel Programs) is a set of tools for efficient optimization and parallelization developed at the Center for Supercomputing Research and Development (CSRD). It was designed based on the experience gained through the optimization of applications for the Alliant FX and the Cedar machine. The toolset stays at the UNIX operating system level and provides some interaction through facilities built upon the Emacs editor. Facilities are provided for execution time analysis, convenient database and file management of performance data, and a flexible interface with extensive configurability. The PTOPP toolset does not include an interactive parallelization utility, but the Polaris compiler can be invoked through its interface.

Our research effort actually started out by expanding the PTOPP utilities to integrate static analysis data from a parallelizing compiler as well as simulation and performance data, which were missing from the previous version. PTOPP is a set of useful tools that help make parallel programming easier, but the core need of novice programmers, namely their lack of experience, has not been addressed in this project.

SUIF Explorer is an interactive optimization tool developed at Stanford University. It utilizes the SUIF compiler infrastructure for automatic parallelization. The tool comes with a basic performance evaluation facility: based on the profile data generated from program runs, it can sort execution times to identify dominant code segments. In addition, it displays the static analysis data gathered from executing the SUIF parallelizing compiler. Perhaps the highlight of the tool is its program slicing capability. Using this technique, SUIF Explorer allows users to select certain lines in a program source and displays the sections of code that may be affected by a change made to those lines. This utility, combined with the automatic parallelization module, provides an interactive way of tackling the task of tuning parallel programs.

Visual KAP for OpenMP is a commercial interactive tool from Kuck and Associates, Inc. It performs automatic parallelization on program files, but it lacks support for manual optimization and finer grain tuning. FORGExplorer is another commercial interactive parallelization tool, from Applied Parallel Research, Inc. Like most of the tools presented in this section, FORGExplorer is capable of automatic parallelization of code sections while presenting users with static analysis data such as call graphs and control and data flow diagrams.

There are a couple of important optimization tools for the message passing programming model. The Fortran D Editor is a graphical editor for Fortran D that provides users with information on the parallelism and communications in a program. It obtains data dependence, communication, and data layout information through a direct interface to the Fortran D compiler and displays the information during editing sessions. This is useful knowledge in developing message passing programs, but the Fortran D Editor lacks support for automatic parallelization. Converting directive-based data parallel languages to message passing programs is challenging as it is, and automatic parallelization of sequential programs with data parallel directives has not been successful.

The same applies to CAPTools. CAPTools is a programming tool for the message passing model from the University of Greenwich in London. The parallelization process here is semi-automatic. Through a series of user interactions, users make their decisions on which sections should be parallelized and how to distribute work and data. CAPTools constructs a data dependence graph for the target section and uses this graph in the subsequent automatic parallelization phase. If CAPTools needs more information from users, it asks questions through the user interface. Recently, a new frontend for the shared memory model using OpenMP has been added, but the details are not available as of this writing.

Instrumentation

Instrumentation is a means to obtain performance data and is usually part of the functionality of most visualization and evaluation tools. In this section, we examine general mechanisms for instrumentation in the shared memory and message passing models and discuss a few instrumentation utilities that deserve special attention.

The main concern in parallel program instrumentation varies depending on the programming paradigm. In the shared memory model, where communication between processors is fast and frequent, reducing the instrumentation overhead is an important issue. On the message passing side, an often overwhelming amount of performance data becomes a problem. To this end, some researchers have incorporated a real-time summation utility or non-uniform instrumentation, which will be discussed later in this section. Both of these issues conflict with the ultimate goal of instrumentation: obtaining as much performance data as possible.

As mentioned in an earlier chapter, detailed instrumentation of shared memory programs is not feasible without significant perturbation. Hence, most instrumentation utilities rely on simple timing information, and the task of shared memory program instrumentation is mainly one of inserting calls to timing routines. The problem that often arises is that timing routine calls in nested code regions cause significant overhead.

At its foundation, the Polaris compiler is a parallelizing compiler, but it provides a powerful instrumentation utility for shared memory programs. Polaris offers several different strategies for instrumentation that allow users to control the amount and the targets of instrumentation. Recently, a new library that supports hardware counters has been made compatible with the Polaris instrumentation utility. Other optimization tools capable of instrumentation include SUIF Explorer, FORGExplorer, and GuideView.

In the message passing programming model, the data needed for visualization and animation are traces, and several trace formats exist: the IBM PE tracing format, the PVM tracing format, the ParaGraph format, Pablo's SDDF (Self-Defining Data Format), and the VAMPIR format are some examples. The difference between these is mainly the size of the trace files. Most visualization tools for the message passing model introduced in the next section use one of these well-known formats.

Since the parallel constructs in the message passing model are libraries of functions, instrumentation takes place by intercepting these calls. For additional information, a series of checkpoints are inserted for status feedback. Instrumenting these checkpoints is relatively simple, but the resulting trace data may be unmanageably large. AIMS tries to resolve this problem by automatically identifying important regions. Paradyn's approach is unique in that its instrumentation and monitoring utility enables dynamically adjustable instrumentation by providing an online summarization facility. VAMPIR offers more compact trace formats. More details on AIMS, Paradyn, and VAMPIR are given in the next section.

The developers of TAU at the University of Oregon chose a different approach to program instrumentation. TAU is a toolset designed for profiling, tracing, and visualizing parallel program performance. TAU's instrumentation utility can generate either timing profiles or trace files, depending on the target application. When timing profiles are generated, static viewers present the summary information; for trace files, a trace visualizer is used. The instrumentation library has been developed for multiple languages, such as C, C++, Fortran, HPF, and Java, thus significantly broadening its applicability. However, the instrumentation process is done manually: users need to specify which functions should be instrumented and associate them with a set of groups. For very large programs this can be very cumbersome, especially when users have little knowledge of the program at hand.

Performance visualization and evaluation

Performance visualization refers to the transformation of numeric performance data into meaningful graphical representations. Visualization helps users gain insight into the behavior of parallel programs so that they can better understand the programs and improve their performance. Performance visualization is often a stepping stone to performance evaluation and problem identification. Performance visualization can be either dynamic or static. Dynamic visualization tools use graphical animation to illustrate the dynamic behavior of the program under consideration; the animation can take place either during program execution or after program termination through trace simulation. Static visualization displays a summary of performance characteristics in charts and graphs.

GuideView, from the KAP/Pro toolset, is a typical static visualization tool. However, it targets the shared memory model and does not use traces; an instrumented runtime library generates and summarizes timing information. Using charts and graphs, GuideView illustrates what each processor is doing at various levels of detail using a hierarchical summary. Its intuitive color-coded displays make it easy to assess the target application's performance. However, due to the high overheads incurred by the instrumentation, the resulting graphs may not reflect accurate real-time performance. The Fortran D Editor, SUIF Explorer, FORGExplorer, and DEEP/MPI are also capable of graphical presentation of performance data, but their uses are limited to simple displays of the execution time of code blocks. DEEP/MPI targets MPI programs but does not provide a display of traces; instead, it shows resource usage and timing charts.

RACY, from the TAU project, has performance viewing utilities consisting of a tabularized text report and several static charts. The information displayed consists mostly of timing profiles. As mentioned above, the TAU instrumentation utility is capable of generating trace files for message passing programs. Instead of writing their own trace viewer, the developers decided to use VAMPIR, which is also discussed in this section.

As for the static display of traces, NTV summarizes traces from message passing program execution and presents users with summary charts and timeline graphs, as shown in the figure below. This type of graph helps users understand load distribution, stalls, and the communication structure of the program. PMA, from the Annai Tool Environment, is a graphic utility similar to NTV; Annai integrates this information with its source viewer for easier reference. XMPI, from the LAM project, offers a similar view, although its main goal is the debugging of MPI programs. TraceView is a pioneering work in timeline display, and it generates timeline graphs for both shared memory and message passing programs through different runtime libraries; in both cases, trace files are used. However, its graphics are not as refined as those listed above, and the displayed data for shared memory programs are limited due to the nature of the shared memory programming model.

Fig.: The timeline graph from NTV

ParaGraph, Upshot, AIMS, Scope, and VAMPIR are tools for animated post-mortem visualization of program behavior based on trace simulation. The advantage of trace simulation is that the speed of the graphic animation can be adjusted (with the exception of ParaGraph), so that events that are difficult to observe in real time can be slowed down for better understanding. ParaGraph was a pioneering effort in performance visualization from the University of Illinois. The tool is visually elaborate, but its practical value is limited by a few missing features, such as the ability to set the speed of replay, and by the lack of appropriate annotation. Furthermore, the target and the framework of the graphic presentation are predetermined by the developers, so users have little freedom to view other aspects of program behavior from different perspectives. Upshot has a feature to adjust the speed, but it does not have features such as a dynamic call graph or a communication diagram. AIMS is an automated instrumentation and monitoring system from NASA; it displays dynamic program behavior through animated and summary views. AIMS adds a modeling module that provides a means of estimating how the program would behave if the execution environment were modified. A screenshot of AIMS in use is shown below. The goal of Scope is extensibility: Scope allows users more freedom to arrange performance data into new displays. VAMPIR adds a zoom utility, allowing users to examine performance data at varying levels of detail. All these tools target message passing programs.

Pablo, Paradyn, XPVM, PVaniM, and Falcon can animate the behavior of a program while it is running. This monitoring capability is achieved by periodically updating graphs and charts with newly available runtime data from the executing application. However, events that occur frequently for a very short period of time cannot be traced and displayed. For this reason, XPVM and PVaniM have utilities to play back the generated traces, and the other tools generate summary statistics. Even so, visualizing important events during the execution of a shared memory program in an animated fashion is not feasible, in the sense that these events, such as writes to shared variables, happen too frequently and too many times. These tools visualize the events during message passing program execution.

Pablo, a performance evaluation tool developed at the University of Illinois, is perhaps the most successful tool currently in use. It uses adaptive instrumentation control to reduce the perturbation caused by instrumentation as it executes. The resulting trace files are used to produce graphical displays of the program performance. Pablo also has a sonification utility and 3-D support that convey more information to its users through a multimedia experience. The combined effort with the Fortran D Editor now allows Pablo to integrate performance data with a program development environment. However, the lack of appropriate annotation and a complex visual interface impose a steep learning curve on users. A snapshot of Pablo's graphical data presentation is shown below.

Fig.: The graphs generated by AIMS

The Paradyn Parallel Performance Measurement Tool, developed at the University of Wisconsin at Madison, is characterized by an instrumentation scheme that dynamically controls overheads by monitoring the cost of data collection. The basic paradigm of instrumentation, execution, and visualization is the same as that of Pablo, but due to the dynamic nature of its instrumentation scheme, the tool is particularly useful when the application at hand is very large or long-running. The tool also contains a visualization facility that generates real-time tables and histograms, although it is not as extensive as that of Pablo.

Fig.: The graphs generated by Pablo

XPVM is a graphical user interface for PVM that displays both real-time and post-mortem animations of message traffic and machine utilization by PVM applications. While an application is running, XPVM displays a space-time diagram of the parallel tasks, showing when they are computing, communicating, or idle. XPVM stores events in a trace file that can be replayed and stopped to analyze the behavior of a completed execution.

PVaniM specifically targets network computing environments. The performance factors that are unique to networked environments require careful consideration in performance visualization. PVaniM addresses these network issues, such as possible heterogeneity, low network bandwidth, and clock skew, in its design. Its playback utility also adds to its usefulness by allowing users to examine details that may have been missed during real-time monitoring.

The principal aspects of Falcon are its abstractions and accompanying tools for the analysis of application-specific program information and online steering. The term application-specific means that users choose which aspects of dynamic behavior to monitor and steer, beyond a predetermined set of parameters. In addition, Falcon provides support for the online graphical display of the information being monitored. The Falcon developers used the POLKA system for its animated and static performance views.

The metrics supported by these animation tools include CPU utilization, memory usage, floating point operations, message size, and so on. They help programmers identify the bottlenecks in the execution of message passing programs. The advantage of these types of tools lies in providing different views of program execution by visualizing the temporal behavior of the target program. When processor communication is relatively sparse and visible, as in the message passing programming model, this is particularly valuable, and bottleneck identification leads easily to well-known techniques to resolve the problems, such as a different data distribution, combining messages, algorithm modification, and so on.

The ability to monitor real-time performance presents opportunities for performance steering. To this end, the developers of Pablo, Paradyn, PVaniM, and Falcon have implemented performance steering facilities. In fact, the main focus of Falcon has been performance steering from the beginning of its development. Typically, users provide or select a set of parameters that they want to manipulate during program execution, and they are able to do so at various checkpoints inserted into the target program. Performance steering is not our concern in this research, so we will not go into any more detail.

Finally, CUMULVS takes a different approach to performance visualization. As an extension to PVM, CUMULVS is a library of functions that users can insert into programs to visualize the behavior of a parallel program in real time. The instrumentation task is shifted to programmers, but this gives users the flexibility to choose what type of data they want to view. The CUMULVS data collection utility can be used with several frontend visualization systems. CUMULVS also supports program steering through checkpoints.

Guidance

The term performance guidance is used in many different contexts in the parallel programming field. Generally, it means taking a more active role in helping programmers overcome the obstacles in tuning programs. With so many available tools for instrumentation and visualization of raw data, the task of extracting meaningful information is becoming increasingly burdensome. In this section, we discuss several tools that support this functionality. Accommodating novice programmers and automating the performance evaluation process are important issues in parallel programming, and they are among the focuses of our research. However, we found only a few efforts addressing these subjects.

SUIF Explorer's Parallelization Guru bases its analysis on two metrics: parallelism coverage and parallelism granularity. These metrics are computed and updated when programmers make changes to a program and run it. It sorts profile data in decreasing order to bring the programmer's attention to the most time-consuming sections of the program. It is also capable of analyzing data dependence information and highlighting the sections that need to be examined by its users.

Paradyn's Performance Consultant discovers performance problems by searching through the space defined by its own search model. The search process is fully automatic, but manual refinements to direct the search are possible as well. The result is presented to the users through graphical displays. DEEP/MPI features a similar advisor that gives textual information about message passing program performance. The DEEP/MPI advisor's analysis is hard-coded, and the analysis is limited to subroutines or functions.

PPA proposes a different approach to tuning message passing programs. Unlike the Parallelization Guru, the Performance Consultant, and DEEP/MPI, which base their analysis on runtime data and traces, PPA analyzes a program source and uses a deductive framework to derive the algorithmic concept from the program structure. Compared to other programming tools, the suggestions provided by PPA are more detailed and assertive; the solution for one published example was to replace an inefficient algorithm.

The Parallelization Guru, the Performance Consultant, and DEEP/MPI basically tell the user where the problem is, whereas the expert system in PPA takes on the role of a programming environment, a step toward an active guiding system. However, the knowledge base for the expert system relies on an understanding of the underlying algorithm based on pattern matching, and having an expert system that understands the full variety of parallel algorithms is nearly impossible. Because of the complexity required, problem identification is done by other tools and hand analysis, and the suggestions provided by the tool consider only parallel constructs, which also limits its usage. Because of its lack of performance evaluation and tuning support, PPA cannot be considered a programming environment, but its effort toward a performance guiding tool is worth noting.

Utilizing Web Resources for Parallel Programming

One of our objectives is to reach a general audience with our methodology, tools, and optimization study results. We have taken the Internet computing approach to address this issue. Thus, we focus our attention on previous efforts that attempted to utilize the Web to provide a programming environment and to establish online repositories.

Many of the systems and technologies that currently allow computing on the Web support a single tool or a relatively small set of tools. They include PUNCH, MOL, NetSolve, Ninf, RCS, VNC, WinFrame, Globus, and Legion. More detailed descriptions of these systems are found in the literature.

As for benchmark repositories, several Web tools offer performance numbers for various benchmarks. Typically, the presented data are timing numbers, such as overall program performance or specific timings of communication in message passing systems. Extensive characteristics of the measured programs are usually not part of the online databases; the user has to obtain from separate sources the information that is often necessary for interpreting the numbers. Furthermore, these repositories do not provide information gathered by other tools, such as compilers or simulators, and consequently they do not support the comparison or the combined presentation of performance aspects and program characteristics.

Our effort to resolve these problems with the previous research unfolds in two ways. First, we have used PUNCH, a network computing infrastructure, to construct an integrated, Web-accessible, and efficient parallel programming tool environment. PUNCH allows remote users to execute unmodified tools on its resource nodes. More detailed descriptions of PUNCH are given in a later section. Second, our results on performance enhancement with various applications have been made accessible through an Applet-based browser, which allows not only examining the raw data but also manipulating and reasoning about the information. This facility is explained in more detail in a later section.

Conclusions

Thus far, we have studied general concepts and paradigms in parallel programming. We have also looked at general trends in parallel programming models and supporting tools. We have learned that there have been numerous attempts to aid parallel programmers through various tools. However, these tools are generally not based on a programming methodology and tend to focus on one specific aspect of the optimization process. In addition, a brief discussion has been given on enhancing tool accessibility via the Web.

It seems that tools supporting the shared memory model place more emphasis on static analysis and automatic code transformation, while those supporting the message passing model mainly focus on performance visualization. This is not surprising, considering that the shared memory model enables structured, program-level parallelism but instrumentation is expensive, whereas in the message passing model events are relatively explicit and sparse but automatic parallelization is difficult.

Several tools have attempted an integration of different aspects of parallel programming. Pablo and the Fortran D Editor opt for the integration of program optimization and performance visualization, but their visualization utilities, although highly versatile, are difficult to comprehend and offer little help to programmers in deductive reasoning. The lack of automatic parallelization capability in the Fortran D Editor also limits its utilization, especially among novice programmers. SUIF Explorer and FORGExplorer have a similar goal, but their performance analysis utilities serve only the complementary purpose of directing programmers to time-consuming code regions. The KAP/Pro Toolset consists of useful tools but does not support manual tuning. The focus of the Annai Tool Project is limited to the aspects of parallelization, debugging, and performance monitoring. Faust may be the most comprehensive environment to date, encompassing code optimization and performance evaluation; however, many aspects of Faust are not suitable for modern parallel machines, and it is no longer maintained by its developers. There is also the issue of active user guidance, which none of the optimization tools supports. Apart from the missing functionality, the problems with these tools, and with most other tools discussed in this chapter, are the lack of continuous support, system compatibility, scalability, effort to add new tools or features, and accessibility (being unavailable and difficult to learn).

The quality of visualization of the performance and structure of parallel programs provided by today's tools has reached an impressive level. Almost every aspect of parallel program execution can be viewed in user-friendly displays. Parallel execution events and resource utilization summaries are presented via colorful graphs, charts, animation, and even sound effects. We believe that the next step in assisting programmers in performance evaluation should be support for comprehension and deductive reasoning about performance data. As the user base of affordable parallel machines keeps expanding, this aspect of performance evaluation becomes increasingly important.

"A lot of smart people are developing parallel tools that smart users just won't use." This sentence, quoted from the literature, summarizes well some of the problems with tool development over the years. Many tools have ended their lives unused by anyone other than their developers. Perhaps this is because the tool developers have focused their attention only on specific stages in parallel program development, disregarding the big picture. In many cases, the developers created the tool that they thought would be useful based on their experience in their own environment. Another reason could be the lack of effort by the developers in providing convenient access to their tools. The conventional approach to promoting tool usage has always been telling users what the tool can do and explaining what to do with it; furthermore, not enough consideration has been put into actually allowing users to try the tools. We advocate the importance of a programming and optimization methodology once more, because knowing exactly what must be done at each stage of parallel program development leads to an effort to understand and appreciate the tool functionality that fits users' needs. With active motivation to reach a larger audience with an integrated methodology and toolset, we may have a better chance.

SHARED MEMORY PROGRAM OPTIMIZATION METHODOLOGY

In this chapter, we outline our proposal on the issue of a methodology for the shared memory programming model. We believe that the programming style of this model allows a systematic approach to program tuning that is far more detailed and organized than the simple descriptions found in general guidelines. The programmer's task in this scheme is to follow the steps suggested by the guidelines and apply the appropriate techniques.

Introduction: Scope, Audience, and Metrics

Before presenting the methodology, we first discuss its scope and target audience, as well as the metrics used in the methodology.

Scope of the proposed methodology

The figure below shows a typical shared memory program development cycle. The software design and implementation part (inside a dashed box) has been simplified in this figure. The issues in these stages include planning, design, coding, testing, and debugging. This is a quite complex topic, and a sophisticated set of methodologies, remedies, metrics, and tools exists for helping programmers in this matter. We will not discuss general software engineering issues any further in this proposal.

In this research, we focus our view on the parallelizing and tuning process: the box enclosing parallelization/tuning, program development, program execution, and performance evaluation. We assume the programmers have a working serial program. Developing a sequential program is orthogonal to parallel processing, and we assume that most programmers follow one of the existing software engineering practices. Our effort attempts to resolve difficulties and problems associated with parallelizing and optimizing sequential programs.

Fig.: Typical parallel program development cycle (design, implementation, parallelization/program tuning, compilation, program execution, performance evaluation)

Also notice that we do not consider the application-level approach, explained in an earlier chapter, to parallel program development. Finding parallelism at the algorithm level and incorporating it while writing a program is a different subject, in that it requires a new perspective in examining algorithms, identifying parallelism, dividing and balancing tasks, and incorporating them into the source code. As pointed out in the introduction, the sheer number of variables in this approach is so large that finding a systematic programming methodology would be extremely difficult. Some tips can be found in the literature, as well as in some of the programming methodologies introduced in an earlier chapter.

Target audience

We assume that our target programmers are familiar with programming and compilation. They should be able to write, debug, compile, and run a sequential program. They should also know at least the basics of how parallel processing works in the shared memory programming model. It helps to understand the underlying shared memory architecture, because certain machine-dependent parameters have a significant impact on program performance. To follow our methodology, it is not necessary to be an experienced parallel programmer; however, even for experienced programmers, the methodology serves as an efficient strategy for parallel programming.

We divide our target audience into two groups: novice and advanced programmers. The word novice means new to parallel programming, not to programming in general. The novice programmer group consists of those given the task of parallelizing a sequential program or writing a parallel program without much prior experience in the process. They resort to a methodology mainly for the guidelines and suggestions that make up for their lack of experience. They need to get a feeling for what the available techniques are and how they can be applied. The supporting tools must take this into account to make the learning curve as smooth as possible.

The need of advanced parallel programmers lies in the supporting utilities. The methodology aids them in efficiently structuring the approach they have already been taking. They have a good idea of what tasks have to be done in each stage, and they desire effective tools that accelerate tedious tasks. They would like the tools to be flexible so that they can configure them to fit the specific tasks of their interest.

Metrics: understanding overheads

Performance evaluation is an important stage in parallel programming. The evaluation process consists of finding performance problems and possible techniques for improvement. Finding problems requires definitions of performance problems; in other words, programmers should know which phenomena constitute performance problems. Without definitions, problems cannot be found. Metrics are used to formalize performance problems.

In our methodology, the performance evaluation process begins with identifying dominant and problematic code sections. A metric system provides a means to efficiently identify bottlenecks in the presence of a possibly large amount of information. As overhead analysis is a critical part of the methodology, we introduce in this section two perspectives for looking at parallel program performance and the related metrics. The main attention of these systems goes to overhead.

One common way to view performance overhead is described well in the literature, in which a programmer identifies two factors contributing to the overall overhead: parallelization overhead and spreading overhead. Our tuning strategy in the proposed methodology is based on this overhead model.

Parallelization overhead: This refers to the overhead introduced by transforming a program into parallel form. Often it is identified by comparing the execution times of the serial version and of the parallel version run on one processor. The main reason for this overhead is that the code inevitably gets augmented for parallelization. The parallelization overhead of a parallel loop is computed as

  $T_{parallelization} = T_{1\text{-}processor\ parallel\ execution} - T_{serial\ execution}$

The factors that contribute to parallelization overhead are listed below.

- Instructions needed for parallel execution: instructions for tasks such as fork, join, and barriers are necessary for parallel execution. They increase the code size and cause unavoidable overhead.

- Instructions needed for code transformation: some parallelization techniques require code changes that may incur overhead. For instance, the reduction technique requires a separate preamble and postamble, and the induction technique may introduce a complicated expression in each iteration that was not part of the original code.

- Inefficient optimization: code-generating compilers may perform fewer optimizations on a parallel code section than on the original serial code, leading to less efficient code.

Parallelization overhead may be amortized if the loop runs significantly longer than the overhead time. On the other hand, frequent invocation of a very small parallel loop can cause serious degradation in performance.
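As a purely illustrative calculation with hypothetical numbers (not measurements from this work): if a loop takes $T_{serial\ execution} = 100$ ms and the parallel version of the same loop, run on one processor, takes $T_{1\text{-}processor\ parallel\ execution} = 104$ ms, then $T_{parallelization} = 104 - 100 = 4$ ms. A loop whose body runs for seconds amortizes such a cost easily, whereas a loop of a few hundred microseconds that is invoked thousands of times spends a substantial fraction of its time in parallelization overhead.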

Spreading overhead: The execution model of a shared memory architecture is basically such that, at the beginning of the program, a process forks multiple threads, and the master thread among them wakes the others up whenever it encounters a parallel section. The time to wake the other threads is an unavoidable overhead. Spreading overhead usually increases as more processors are used in program execution. The spreading overhead is computed as

  $T_{spreading}(P) = T_{parallel\ execution}(P) - \frac{T_{1\text{-}processor\ parallel\ execution}}{P}$

where $P$ denotes the number of processors.

Some of the reasons for spreading overhead are given below.

- Startup latency: the time to initiate parallel execution on multiple threads. Naturally, the more threads run, the larger the overhead. One way to reduce it is to merge adjacent parallel regions into one, making each parallel section as large as possible.

- Memory congestion: because data are shared in a shared memory, heavy traffic on the memory bus may cause parallel execution to slow down. One possible remedy is to increase the locality of loops to reduce bus traffic.

- Coherence traffic: sharing data also requires coordination, which adds additional overhead for legitimate data invalidation.

- False sharing: depending on the cache line size, data that are needed by only one processor may spread over other processors' caches, causing frequent unnecessary invalidations.

- Load imbalance: tasks are unevenly distributed over multiple processors. In cases where the number of iterations is small and cannot be distributed evenly, the expected speedup is limited by the remainder.

Another perspective on overhead is provided by a hardware-counter-based model proposed in the literature. Hardware counters available on most modern machines provide detailed statistics on the dynamic behavior of parallel programs, yet the measured values do not necessarily translate into parallel programming terms. The proposed model defines four overhead components, memory stalls, processor stalls, code overhead, and thread management overhead, based on the hardware counter data. Each component is clearly defined, and the possible contributing factors and remedies are also given. This model provides a more detailed insight into the overhead characteristics of parallel loops; for instance, a loop may exhibit small parallelization and spreading overheads while memory or processor stalls indicate a problem. We have just begun to explore this new system, and more work needs to be done to incorporate it into tool development. The problem with this model is that obtaining the necessary data is tedious and very time-consuming. The traditional parallelization and spreading overhead model still serves as the primary measure of performance analysis for many programmers, and it will continue to do so in the future.

Parallel Program Optimization Methodology

In the past, we have participated in several research efforts in parallelizing programs for different target architectures. At first, we belonged to the category of novice programmers. After a great deal of trial and error, we developed a structured way to parallelize programs successfully. As the number of programs that we dealt with increased, our general methodology went through several stages of adjustment and improvement. Finally, we felt the need to write it down so that a wider range of programmers could benefit from the efficiency it provides. Thus, we started the process of refining our methodology to improve both its efficiency and its practicality.

The figure below shows an overview of the parallelization and optimization steps outlined by our proposed methodology. There are two feedback loops in the diagram: the first serves as the adjusting process for instrumentation overhead, and the second is the actual optimization process, consisting of the application of new techniques and their evaluation.

Our methodology envisions the following tasks when porting an application program to a parallel machine and tuning its performance. We start by identifying the most time-consuming code section of the program, optimize its performance using several recipes, and then repeat this process with the next most important code section. The most important code blocks for parallel execution in our programming paradigm are loops. Hence, we profile the program execution time on a loop-by-loop basis. We do this by instrumenting the program with calls to timer functions. The timing profile not only allows us to identify the most important code sections but also to monitor the program's performance improvements as we convert it from a serial to a parallel program. However, as the diagram shows, programmers may need to adjust the amount of profiling because of the accompanying overhead. The first step of performance optimization is to apply a parallelizing compiler. If no such tool is available, or if we are not satisfied with the resulting performance, we can apply program transformations by hand; we will describe a number of such techniques. The following sections describe all these steps in detail.

Instrumenting program

Instrumentation is a means to obtain performance data. Typically, in the shared memory model, profiling routines that record the necessary data are inserted into the code. As a result, one or more profiles are generated at the end of the program execution. There are other methods to instrument a program, using assembly code, which we do not consider in this research. Program instrumentation is an important step in optimizing program performance: the profile results from instrumented program runs provide the basis for performance evaluation and optimization. It should be determined beforehand what type of code blocks should be instrumented.

Fig.: Overview of the proposed methodology (instrumenting the program; getting serial execution time; running the parallelizing compiler; manually optimizing the program; getting optimized execution time; speedup evaluation; finding and resolving performance problems)

In the directive-based shared memory programming model, loops are usually the basic blocks for instrumentation, because they are the basic sections considered for parallelization. The metrics for measurement can vary, but they should conform to the goal of the optimization. There are utilities for measuring various aspects of program execution; the most widely used measure is execution time.

As the first step, programmers should instrument the serial program. The purpose of this step is to understand the distribution of execution time within the program and to identify the code segments worth the optimization effort. Therefore, it is desirable to obtain as much timing data as possible throughout the target program. For instance, programmers may decide to instrument all the loops in a given program.
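As a minimal sketch of what such loop-level timing instrumentation might look like (the profile array, the loop identifier, and the loop body are illustrative placeholders rather than the interface of any particular library; OMP_GET_WTIME is the standard OpenMP wall-clock timer):

      PROGRAM INSTREX
C     Sketch of loop-level timing instrumentation using the OpenMP
C     wall-clock timer.  A real instrumentation library would normally
C     hide this bookkeeping behind inserted subroutine calls.
      INTEGER N, MAXLOOPS, I
      PARAMETER (N = 1000, MAXLOOPS = 100)
      DOUBLE PRECISION A(N), B(N), C(N)
      DOUBLE PRECISION T0, T1, LOOPTIME(MAXLOOPS)
      DOUBLE PRECISION OMP_GET_WTIME
      EXTERNAL OMP_GET_WTIME

      DO I = 1, MAXLOOPS
         LOOPTIME(I) = 0.0D0
      ENDDO
      DO I = 1, N
         B(I) = I
         C(I) = 2 * I
      ENDDO

C     Time the loop of interest (identified here, arbitrarily, as loop 17)
C     and accumulate its elapsed wall-clock time.
      T0 = OMP_GET_WTIME()
      DO I = 1, N
         A(I) = B(I) + C(I)
      ENDDO
      T1 = OMP_GET_WTIME()
      LOOPTIME(17) = LOOPTIME(17) + (T1 - T0)

      PRINT *, 'Loop 17 time (s): ', LOOPTIME(17)
      END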

Unfortunately, most instrumentation methods introduce overhead. This has to be considered very carefully, because it not only affects the program's performance but can also skew the execution profile so that the programmer targets the wrong program sections. Our methodology suggests the following remedies.

- Programmers should make sure that they run the program both with and without instrumentation. They should proceed only after they have verified that the perturbation is small.

- In order to reduce overhead, programmers should remove instrumentation from innermost loops (innermost code sections in general). They may need to find out the overhead per call of the instrumentation library. If the initial profile shows code sections whose average execution times are less than two orders of magnitude larger than the overhead, the corresponding instrumentation should be removed.

- Programmers should add instrumentation after they run the code through a parallelizing compiler. Compilers usually can apply fewer optimizations in the presence of many subroutine calls, and source-level instrumentation generally takes the form of inserted subroutine calls. If an assembly-level instrumentation tool is available, this is less of a problem.

- Programmers should be careful when adding instrumentation inside a parallel loop or region. Instrumentation libraries may assume that these function calls are made from serial program sections only.

- It is desirable that programmers make sure that the instrumented code segments in the optimized program match those instrumented in the sequential program, so that side-by-side comparisons can be made in the performance evaluation stage.

There is an obvious dilemma: if programmers remove too many instrumentation points, the profile becomes less useful. They should leave the instrumentation in place at least for all those program sections that they may later try to tune.

Getting serial execution time

Program execution may be affected by many factors: processor speed, architecture, operating system, system load, network load (such as file I/O requests), and so on. The resulting program from this optimization process may be subject to all these factors. However, to accurately measure the effect, whether positive or negative, of the techniques applied during the optimization process, it is very important to eliminate these external factors during instrumented program runs. One way to ensure an uninterrupted environment is to use single-user time, during which only one user is allowed on the system. In this way, programmers can reduce unnecessary overheads caused by context switching, external file I/O, and so on.

Running parallelizing compiler

Parallelizing compilers can analyze the input program, detect parallelism, and automatically generate appropriate directives for the detected parallel regions. Parallelizing compilers relieve parallel programmers of the task of parallelizing all loops manually. They are especially useful when the loops under consideration have complex structures for which human analysis is cumbersome. State-of-the-art parallelizing compilers include many advanced techniques for parallelization and optimization.

It is important to note that relying entirely on parallelizing compilers for optimization may not result in optimal performance. Compilers base the techniques that they apply on the static analysis of input programs, which may not accurately reflect the dynamic behavior of the programs; modeling the dynamic characteristics of programs is very difficult. For this reason, the programmer's intervention may be necessary to achieve near-optimal performance. The programmer's compensation for the compiler's lack of knowledge of the dynamic behavior of a program is the key to obtaining good performance.

Nonetheless, running a parallelizing compiler is a good starting point. It can save programmers a significant amount of time that would otherwise be spent analyzing all the loops in a program. For novice programmers, manually parallelizing loops may be cumbersome to begin with. In addition, most parallelizing compilers are capable of generating a listing of the static analysis results, which may provide programmers with valuable information on various code sections.

In our methodology, we do not assume that programmers necessarily have access to parallelizing compilers. If they do not, the first set of techniques to apply should be those for parallelization, described in the next section.

Manually optimizing programs

Manual optimization allows users to make up for the compiler's shortcomings. If a programmer has run a parallelizing compiler, the static analysis information generated by the compiler in the form of listing files can help the programmer better understand the problems at hand. Running instrumented programs offers insight into the program's dynamic behavior. Combined with the programmer's knowledge of the underlying algorithm and physics, these data provide vital clues for improving the performance.

In our methodology, we have divided various well-known techniques into four categories: parallelization techniques, parallel performance optimization techniques, serial performance optimization techniques, and other techniques. Parallelization techniques refer to techniques that parallelize code segments. Parallel performance optimization techniques are the ones that may improve the performance of already parallel sections. Serial performance optimization techniques aim to improve the performance of code sections whether they are serial or parallel; some of these techniques may result in a superlinear speedup if they are not also applied to the serial program that serves as the performance reference point. Locality enhancement techniques are typical examples. The techniques in the remaining category do not seem to have an effect on performance by themselves; however, they may enable other, previously non-applicable techniques. The benefits of the techniques described below can vary significantly with the underlying machine. The judgment about which techniques to apply to a given program should be based on accurate performance evaluation, which will be discussed in a subsequent section.

We give brief descriptions of the techniques that we have used to improve program performance. More detailed descriptions and theoretical background can be found in the literature.

Parallelization techniques

Privatization: Privatization seeks to reduce false dependences. Often, scalar variables and arrays are used as temporary storage within an iteration of a loop; if a private copy of such a variable is provided to each iteration, the loop may be parallelized. More conservatively, a single copy may be provided to each of the participating processors for its own use. For example, in the figure below, variable X is used as temporary storage within a loop. By giving each participating processor a separate copy of X, seemingly serial code can be executed in parallel. In some cases the temporary storage is an array, as shown in the second figure below.

Fig.: Scalar privatization — (a) the original loop and (b) the same loop after privatizing variable X
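A plausible reconstruction of the scalar privatization example in the figure (the loop body is illustrative; what matters is that X is written before it is read in every iteration, so each thread can safely keep its own copy):

C     (a) Original loop: X is reused as scratch storage in every
C         iteration, creating a false dependence between iterations.
      DO I = 1, N
         X = B(I) + C(I)
         A(I) = X * X
      ENDDO

C     (b) Parallel version: each thread gets a private copy of X.
!$OMP PARALLEL DO PRIVATE(X)
      DO I = 1, N
         X = B(I) + C(I)
         A(I) = X * X
      ENDDO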

Fig.: Array privatization — (a) the original loop and (b) the same loop after privatizing array A
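A plausible reconstruction of the array privatization example (the loop bodies are assumptions for illustration; the pattern is that the scratch array A is completely rewritten before it is read within each iteration of the outer loop):

C     (a) Original loop nest: A(1:M) is used as a scratch array inside
C         every iteration of the I loop.
      DO I = 1, N
         DO J = 1, M
            A(J) = B(J,I) + C(J,I)
         ENDDO
         DO J = 1, M
            D(J,I) = A(J) * 2.0
         ENDDO
      ENDDO

C     (b) Parallel version: each thread works on its own private copy
C         of the scratch array A.
!$OMP PARALLEL DO PRIVATE(A, J)
      DO I = 1, N
         DO J = 1, M
            A(J) = B(J,I) + C(J,I)
         ENDDO
         DO J = 1, M
            D(J,I) = A(J) * 2.0
         ENDDO
      ENDDO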

Reduction: Scalar reductions are recurrences of the form sum = sum + expr, where expr is a loop-variant expression and sum is a scalar variable. Loops that contain such recurrences cannot be executed in parallel without being restructured, since values are accumulated into the variable sum. One way of addressing such a situation is to calculate local sums on each processor and combine these sums at the completion of the loop. The figure below shows an example of such a scalar reduction operation and its transformed version in OpenMP. OpenMP provides a construct for identifying reduction operations of type addition, multiplication, maximum, and minimum.

Fig.: Scalar reduction — (a) the original loop and (b) the same loop after recognizing reduction variable SUM
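A reconstruction of the scalar reduction example using the OpenMP REDUCTION clause (the array name follows the figure; the clause syntax is standard OpenMP Fortran):

C     (a) Original loop: SUM accumulates a value across iterations.
      SUM = 0.0
      DO I = 1, N
         SUM = SUM + A(I)
      ENDDO

C     (b) Parallel version: each thread accumulates a private partial
C         sum, and the partial sums are combined at the end of the loop.
      SUM = 0.0
!$OMP PARALLEL DO REDUCTION(+:SUM)
      DO I = 1, N
         SUM = SUM + A(I)
      ENDDO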

In addition to scalar reductions, array reductions must be addressed, as it has been shown that array reduction recognition is one of the most important transformations in real applications. Array reductions, like scalar reductions, are summations; however, they are of the form A(ind) = A(ind) + expr, where the value of the subscript ind of A cannot be determined at compile time. Therefore, local sums must be accumulated for each element of A and combined at the completion of the loop. The figure below shows such a reduction operation. The constant No_Of_Procs holds the number of participating processors, and the function call Get_My_Id returns the identification of the processor executing that iteration. The two additional loops for initialization and final summation are called the preamble and postamble, respectively.

Fig.: Array reduction — (a) the original loop and (b) the same loop after recognizing reduction array A
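A hedged reconstruction of the expanded-array transformation sketched in the figure. The names No_Of_Procs, Get_My_Id, and Elements_In_A follow the text and figure and stand for however the runtime exposes the processor count, the identifier of the executing processor, and the extent of A; the use of a separate expanded array A2 and the exact directive are assumptions made for clarity:

C     (a) Original loop: the subscript IND(I) is not known at compile
C         time, so different iterations may update the same element of A.
      DO I = 1, N
         A(IND(I)) = A(IND(I)) + B(I)
      ENDDO

C     (b) Transformed version.  Preamble: clear one private column of
C         the expanded array per processor.
      DO I = 1, No_Of_Procs
         DO J = 1, Elements_In_A
            A2(J,I) = 0.0
         ENDDO
      ENDDO
C     Parallel accumulation: each processor adds into its own column.
!$OMP PARALLEL DO SHARED(A2, B, IND)
      DO I = 1, N
         A2(IND(I), Get_My_Id()) = A2(IND(I), Get_My_Id()) + B(I)
      ENDDO
C     Postamble: combine the partial results back into A.
      DO J = 1, Elements_In_A
         DO I = 1, No_Of_Procs
            A(J) = A(J) + A2(J,I)
         ENDDO
      ENDDO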

Induction: Induction variables are variables that form a recurrence in the enclosing loop. The figure below shows an example of a simple induction expression as well as a transformed form that has no loop-carried dependences. Induction variable substitution must first recognize variables of this form and then substitute them with a closed-form expression.

Fig.: Induction variable recognition — (a) the original loop and (b) the same loop after replacing induction variable X
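A plausible reconstruction of the induction example (the particular recurrence and its closed form are an illustration; the exact expressions of the original figure are not recoverable):

C     (a) Original loop: X forms a recurrence, which serializes the
C         iterations.
      X = 0
      DO I = 1, N
         X = X + I
         A(X) = B(I)
      ENDDO

C     (b) After induction variable substitution: X is replaced by its
C         closed form I*(I+1)/2, removing the loop-carried dependence.
!$OMP PARALLEL DO
      DO I = 1, N
         A(I*(I+1)/2) = B(I)
      ENDDO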

This transformation allows the original loop to be executed in parallel. Unfortunately, if there are many enclosing loops and complex induction variables, the closed-form induction expressions may become rather expensive to compute. If these expressions are used often, they can introduce significant overhead.

Handling I/O: If the I/O statements within a loop are necessary for program execution and the order of the I/O statements has to be preserved among loop iterations, the loop cannot be parallelized. In other cases, the loop can still be parallelized by using one of the following methods (a sketch of the second method follows this list).

- If the I/O is not absolutely necessary, it can simply be removed. For instance, if the I/O was inserted for debugging purposes or as execution status reports, deleting the I/O statements will not affect the execution.

- In cases where I/O is needed to report the status of an array, the loop may be distributed into two loops, one for computation and the other for I/O. The resulting loop containing only I/O cannot be parallelized, but the loop containing only computation may be parallelizable.
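A minimal sketch of the loop distribution option (the array names and the parallelizability of the computational part are assumptions for illustration):

C     Original loop mixes computation and status output:
C       DO I = 1, N
C          A(I) = A(I) + B(I)
C          WRITE(*,*) 'A(', I, ') = ', A(I)
C       ENDDO

C     After distribution: the computational loop can be parallelized,
C     while the I/O loop stays serial to preserve the output order.
!$OMP PARALLEL DO
      DO I = 1, N
         A(I) = A(I) + B(I)
      ENDDO
      DO I = 1, N
         WRITE(*,*) 'A(', I, ') = ', A(I)
      ENDDO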

Handling subroutine and function calls: If a loop contains a subroutine or function call, parallelizing compilers usually make the conservative decision not to parallelize it. The programmer has to make sure that the subroutine or function has no side effects before manually parallelizing the loop.

Also, depending on the implementation of the parallel constructs, parallel sections inside a function or subroutine that is already running in parallel may have unexpected effects. If a programmer decides to execute a subroutine or function within a parallel block, it is advisable to remove the parallel constructs within that subroutine or function. Another possible solution is to inline the called function or subroutine if its size is reasonably small. More details on inlining are presented later in this section.

Parallel performance optimization techniques

Parallelization introduces overhead that clearly affects execution time. Programmers must be aware that parallelization may even degrade the performance of some code sections. We presented the parallelization and spreading overhead model earlier in this chapter. The techniques listed below aim to further improve the performance of already parallel code sections; they mainly seek to reduce the overhead introduced by parallelization.

Serialization: In many cases the effect of an optimization is not entirely predictable. Furthermore, if programmers use a parallelizing compiler, the compiler may cause some code sections to perform worse. Sometimes parallelizing a code segment simply does not pay off. For instance, if the execution time of a loop is of the same order as the parallelization overhead, its parallel execution is likely to perform worse than the serial version. If there are no other eligible techniques to further improve the parallel section, simply removing the parallel directives can at least prevent the degradation.

This technique is highly machine-dependent. The benefit of parallelization rests on many machine parameters: cache and memory size, bandwidth, processor speed, I/O efficiency, and the operating system. If the target program is to be used on various architectures, programmers should make a cautious decision as to which segments should be converted back to serial, based on a study of those architectures. A useful strategy is to serialize those loops or code sections whose timing profiles show no improvement from any parallelization and tuning attempts. It is also advisable to monitor the performance of those loops whose execution time is less than an order of magnitude larger than the fork-join overhead. The fork-join overhead can be measured as the difference in execution time of an empty parallel loop between parallel and serial execution.
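A minimal sketch of how the fork-join overhead might be measured under this definition (the repetition count is arbitrary, and in practice the compiler may have to be prevented from removing the empty loops; OMP_GET_WTIME is the OpenMP wall-clock timer):

      INTEGER I, K, NREP
      PARAMETER (NREP = 1000)
      DOUBLE PRECISION T0, T1, T2, TFORKJOIN
      DOUBLE PRECISION OMP_GET_WTIME
      EXTERNAL OMP_GET_WTIME

C     Time NREP invocations of an empty parallel loop.
      T0 = OMP_GET_WTIME()
      DO K = 1, NREP
!$OMP PARALLEL DO
         DO I = 1, 1
            CONTINUE
         ENDDO
      ENDDO
      T1 = OMP_GET_WTIME()
C     Time the same empty loops executed serially.
      DO K = 1, NREP
         DO I = 1, 1
            CONTINUE
         ENDDO
      ENDDO
      T2 = OMP_GET_WTIME()
C     Average fork-join overhead per parallel loop invocation.
      TFORKJOIN = ((T1 - T0) - (T2 - T1)) / NREP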

It should be noted that serialization itself can have a negative impact. The idea of serialization is to restore a code segment back to its original state, but due to cache effects, the execution may slow down compared to the same code section in the untouched version. For instance, a small serial loop right between two large parallel loops may cause significant cache misses because of the data distribution across caches.

Handling false sharing: Depending on the cache line size, data that are needed by only one processor may spread over other processors' caches, causing frequent invalidations. This may be prevented by applying one of the two techniques described below; sketches follow the corresponding figures.

- Programmers may try to modify array access patterns by scheduling tasks that access adjacent regions on the same processor. An example is given in the scheduling modification figure below.

- Another solution is padding. By adding empty data items to a shared array, one may avoid false sharing by separating the data into individual cache lines. However, this may cause negative effects due to the increase in data size. The padding figure below shows an example. It should be noted that changing array declarations can have global and interprocedural effects: all uses of the modified arrays must be changed to use the new dimensions.

Fig.: Scheduling modification — (a) the original loop nest and (b) the same nest after pushing the parallel construct inside the loop nest; in (b) the inner loop is executed in parallel, so each processor accesses array elements that are a large stride apart
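A hedged reconstruction of the kind of change the figure describes (the array shapes and the exact directives are assumptions): moving the work-sharing construct from one loop of a nest to another changes which array elements each processor touches, and therefore how the data fall onto cache lines.

C     (a) Work-sharing on the outer I loop: in Fortran's column-major
C         layout, neighboring I values from different processors are
C         adjacent in memory, so chunk boundaries may share cache lines.
!$OMP PARALLEL DO PRIVATE(J)
      DO I = 1, N
         DO J = 1, N
            A(I,J) = B(I,J)
         ENDDO
      ENDDO

C     (b) Work-sharing pushed inside, onto the J loop: each processor's
C         elements are a full column apart, so processors are unlikely
C         to share a cache line.
!$OMP PARALLEL PRIVATE(I)
      DO I = 1, N
!$OMP DO
         DO J = 1, N
            A(I,J) = B(I,J)
         ENDDO
!$OMP END DO NOWAIT
      ENDDO
!$OMP END PARALLEL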

Fig.: Padding — (a) the original loop and (b) the same loop after padding extra space into the arrays
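A hedged reconstruction of the padding idea (the pad of 8 elements is an arbitrary illustration; an appropriate value depends on the cache line size of the target machine):

C     (a) Original declarations: columns used by different processors
C         may end up on the same cache line.
      REAL A(N,N), B(N,N)

C     (b) Padded declarations: the extra, unused rows push the data used
C         by different processors onto different cache lines.  All loops
C         keep using indices 1..N; only the declared leading dimension,
C         and hence the memory layout, changes.
      REAL A(N+8,N), B(N+8,N)

!$OMP PARALLEL DO PRIVATE(I)
      DO J = 1, N
         DO I = 1, N
            A(I,J) = B(I,J)
         ENDDO
      ENDDO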

In Fortran, if a loop iterates over a large index range, multiple processors allow many ways to split the iterations among them. Depending on the loop structure, scheduling can make a significant difference in performance. Locality and false sharing are the two most important factors affected by the choice of scheduling scheme. The OpenMP directive language provides four different options for scheduling, listed below; a short usage sketch follows the list. Some scheduling schemes incur more overhead than others due to the required bookkeeping, so programmers are advised to examine the loop structure before trying a different scheduling mechanism.

- static: Each processor is assigned a contiguous chunk of iterations. If the amount of work in each iteration is approximately the same and there are enough iterations for an equal distribution, this scheduling will do fine.

- dynamic: A processor is assigned the next iteration as the processor becomes available. This is useful if the loop has varying amounts of work per iteration. The overhead is usually higher than that of static scheduling, but if the program is to run in a multiuser environment, its better load-balancing properties can improve performance.

- guided: The same as dynamic scheduling, but a progressively decreasing number of iterations is dispatched to each processor.

- runtime: The decision for scheduling is deferred until run time. The value of the environment variable OMP_SCHEDULE determines the scheduling scheme.
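As a concrete illustration of how these options are selected, the sketch below (not taken from any benchmark discussed in this thesis; the program name, loop bounds, and work array are placeholders) applies a SCHEDULE clause to a triangular loop nest, where iteration costs vary with the outer index.

      PROGRAM SCHED
      INTEGER I, J, N
      PARAMETER (N = 1000)
      DOUBLE PRECISION WORK(N)
      DO I = 1, N
         WORK(I) = 0.0D0
      ENDDO
C     The inner loop bound depends on I, so iterations have uneven
C     cost; DYNAMIC (or GUIDED) scheduling balances the load better
C     than the default STATIC scheduling.  With SCHEDULE(RUNTIME),
C     the scheme is taken from the OMP_SCHEDULE environment variable.
!$OMP PARALLEL DO PRIVATE(J) SCHEDULE(DYNAMIC)
      DO I = 1, N
         DO J = 1, I
            WORK(I) = WORK(I) + 1.0D0
         ENDDO
      ENDDO
!$OMP END PARALLEL DO
      PRINT *, 'WORK(N) = ', WORK(N)
      END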

Load balancing: Unevenly distributed tasks cause stalls on some processors. In cases where the number of iterations is small and cannot be distributed evenly, the expected speedup is limited because the leftover iterations (the remainder of the iteration count divided by the number of processors) leave some processors idle. There is no solution for this case other than moving the parallelism to outer loops. If the imbalance is incurred by uneven work within the loop body, such as an outer parallel loop with an inner triangular loop, dynamic scheduling may result in better performance. The figure below shows an example of load balancing by changing the scheduling.

(a)
!$OMP PARALLEL DO
!$OMP+SCHEDULE(STATIC)
      DO I = 1, N
        DO J = 1, I
        ENDDO
      ENDDO

(b)
!$OMP PARALLEL DO
!$OMP+SCHEDULE(DYNAMIC)
      DO I = 1, N
        DO J = 1, I
        ENDDO
      ENDDO

Fig.  Load balancing: (a) the original loop and (b) the same loop after changing to an interleaved scheduling scheme. By changing the scheduling from static to dynamic, an unbalanced load can be distributed more evenly.

Blocking/tiling: If the data size handled by each iteration of a loop is larger than the data cache of the processor, and the data are reused within each iteration, many cache misses occur. Blocking (tiling) splits the data needed by each iteration so that they fit into one processor's cache. This technique is particularly useful in large matrix manipulation. Obviously, machine parameters must come into play for this technique to be successful: knowing the machine's cache size helps determine the right block size. Blocking and tiling are basically locality-enhancement techniques. The figure below shows how blocking/tiling can be applied.

In part (a) of the figure, the entire B array is referenced in each iteration of the I loop. If the N*N references within each iteration of the I loop exceed the cache size, then each access to a new line of array B will be a cache miss. Tiling the K and J loops allows smaller sections of B to be accessed repeatedly before moving on to another section, decreasing the references within the I loop to BLK*BLK references. If BLK is small enough, then each line of B will only see one cache miss during the execution of the entire nest.

(a)
      DO I = 1, N
        DO K = 1, N
          DO J = 1, N
            C(J,I) = A(K,I) * B(J,K) + C(J,I)
          ENDDO
        ENDDO
      ENDDO

(b)
      DO KK = 1, N, BLK
        DO JJ = 1, N, BLK
          DO I = 1, N
            DO K = KK, MIN(KK+BLK-1, N)
              DO J = JJ, MIN(JJ+BLK-1, N)
                C(J,I) = A(K,I) * B(J,K) + C(J,I)
              ENDDO
            ENDDO
          ENDDO
        ENDDO
      ENDDO

Fig.  Blocking/tiling: (a) the original loop and (b) the same loop after applying tiling to split the matrices into smaller tiles. In (b) outer blocking loops have been added to assign smaller blocks to each processor, so the data are likely to remain in the cache when they are needed again.

Serial performance optimization techniques

Sometimes programmers inadvertently write inefficient code. For those who are not familiar with performance issues, it is not unusual to add code that works against good performance. There are simple techniques that enhance the performance of a code segment, whether it is serial or parallel, without altering its intended functionality. The techniques listed below aim to enhance the locality of program data, resulting in better cache performance, or to reduce stalls. They are mostly machine-independent; enhancing locality, for instance, always helps. If the dominant code segments in the target program are inherently serial, the following techniques may be good candidates for improving performance without parallelization.

Loop interchange: Loop interchange is a simple technique that swaps the loops in a loop nest. The array access patterns determined by the loop order can have a drastic effect on the resulting performance. Of the two code segments shown in the figure below, the first one has poor locality because it has an array access stride of N. The second loop, on the other hand, performs better because of its stride-1 access.

(a)
      DO I = 1, N
        DO J = 1, M
          A(I,J) = B(I,J)
        ENDDO
      ENDDO

(b)
      DO J = 1, M
        DO I = 1, N
          A(I,J) = B(I,J)
        ENDDO
      ENDDO

Fig.  Loop interchange: (a) a loop with poor locality and (b) the same loop with better locality after interchanging the loop nest.

Loop interchange is a simple technique that may result in a large performance gain. Programmers should be aware, however, that loop interchange is not always legal. In the presence of backward data dependences (e.g., a statement such as A(I,J) = A(I-1,J+1) + B(I,J)) in a loop, interchanging the loops violates the dependence of the original code.

Loop fusion: This is the opposite of loop distribution, described below. If multiple loops have the same iteration range, they can be merged, provided doing so does not break any dependences between them. Fusion generally increases locality because it allows processors to reuse data that are already in their caches. However, fusion may cause the data accessed per iteration to exceed the cache size, which degrades performance. As a side effect, if fusion is applied to parallel loops, it decreases the number of synchronization barriers and thus reduces both parallelization and spreading overhead. Programmers should be aware that loop fusion is not always legal even when the iteration spaces match; a small sketch is given below.
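The following minimal sketch (invented for illustration; the array names and sizes are placeholders and do not come from any benchmark in this thesis) shows two loops over the same range fused into one, so that each element of A is reused while it is still in the cache.

      PROGRAM FUSION
      INTEGER I, N
      PARAMETER (N = 100000)
      REAL A(N), B(N), C(N)
      DO I = 1, N
         A(I) = REAL(I)
      ENDDO
C     Before fusion: two separate loops traverse A.
      DO I = 1, N
         B(I) = A(I) * 2.0
      ENDDO
      DO I = 1, N
         C(I) = A(I) + 1.0
      ENDDO
C     After fusion: one loop computes both results, reusing A(I)
C     while it is still in the cache (legal here because the two
C     original loops have no dependences between them).
      DO I = 1, N
         B(I) = A(I) * 2.0
         C(I) = A(I) + 1.0
      ENDDO
      PRINT *, B(N), C(N)
      END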

Software pipelining and/or loop unrolling: In some compute-intensive loops, data dependences across nearby iterations may cause pipeline stalls. This is more frequent with floating-point operations, which take a number of CPU cycles. One way to alleviate this problem is software pipelining or loop unrolling. Loop unrolling does not have a direct effect on reducing dependence stalls, but it allows the backend compiler to interleave dependent instructions. Moreover, unlike software pipelining, which may create a loop-carried dependence, an unrolled loop can still be executed in parallel if the original loop is parallel. As a side effect, unrolled loops have fewer synchronization barriers when executed in parallel. Both techniques allow more cycles between dependent instructions, so stalls are reduced. Hardware counters often have facilities to measure dependence stalls. The figure below shows a simple loop before and after applying software pipelining and unrolling.

Other performance-enhancing techniques

Loop distribution: Loop distribution refers to splitting a loop into multiple loops with smaller tasks. This technique may reduce the grain size of parallelism; however, it enables other transformations. An actual code section found in program SWIM from the SPEC benchmark suite is shown in the SHALOW loop figure later in this section.

(a)
      DO I = 1, N
        C = A(I) + B(I)
        D(I) = C
      ENDDO

(b)
      C = A(1) + B(1)
      DO I = 1, N-1
        D(I) = C
        C = A(I+1) + B(I+1)
      ENDDO
      D(N) = C

(c)
      DO I = 1, N, 2
        C = A(I) + B(I)
        D(I) = C
        C = A(I+1) + B(I+1)
        D(I+1) = C
      ENDDO

Fig.  Software pipelining and loop unrolling: (a) the original loop, (b) the same loop after software pipelining (instructions are interleaved across iterations, and a preamble and a postamble have been added), and (c) the same loop unrolled by a factor of two.

The outer loop is parallel; adding appropriate directives, we get the parallelized version shown in a later figure in this section. As mentioned above in the locality-enhancement discussion, the nested loops in this code segment would be a good candidate for loop interchange due to the column-major storage order of Fortran. However, the one line right after the nested loop prevents applying the technique. By splitting the outer loop into two and interchanging the nested loops, we get the code shown in the optimized-version figure, which performs significantly better than the previous two versions.

Subroutine inlining: Inlining replaces a call to a subroutine with the code contained within the subroutine itself. This procedure, also called inline expansion, can have several beneficial effects. The most obvious of these is the removal of the calling overhead. This is particularly true when a call is embedded within a small loop, so that the overhead would otherwise be incurred in each loop iteration. More importantly, in the context of parallelizing compilers, additional optimizations and transformations may be facilitated by this transformation.

      DO icheck = 1, mnmin
        DO jcheck = 1, mnmin
          pcheck = pcheck + ABS(pnew(icheck, jcheck))
          ucheck = ucheck + ABS(unew(icheck, jcheck))
          vcheck = vcheck + ABS(vnew(icheck, jcheck))
        CONTINUE
        ENDDO
        unew(icheck, icheck) = unew(icheck, icheck)
     *      * (MOD(icheck, 100) / 100.)
      CONTINUE
      ENDDO

Fig.  The original SHALOW do loop in program SWIM.

With procedure calls inlined, the procedure's code may be optimized within the context of the call site. With site-specific information now available, other transformations may become possible, which in turn may facilitate yet other optimizations. This may allow some instances of a procedure to be executed in parallel even if it is not parallelizable at every call site.

The downside of inlining is the increase in code size, which can be significant if full inlining is performed. This may cause many instruction cache misses. Also, with the increase in code size comes an increase in compilation time, since now each instance of the inlined code is optimized separately. Often, full inlining is not practical, and so heuristics are developed for its application.
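As a hedged illustration (the program, subroutine, and array names below are invented for this sketch and do not come from any program discussed in this thesis), inlining a small routine that is called inside a loop removes the per-iteration call overhead and exposes the loop body to further compiler analysis.

      PROGRAM INLN
      INTEGER I, N
      PARAMETER (N = 1000)
      REAL X(N)
      DO I = 1, N
         X(I) = REAL(I)
      ENDDO
C     Before inlining: HALVE is called once per iteration.
      DO I = 1, N
         CALL HALVE(X(I))
      ENDDO
C     After inlining: the body of HALVE is substituted directly,
C     removing the call overhead and letting the compiler analyze
C     (and possibly parallelize) the loop as a whole.
      DO I = 1, N
         X(I) = X(I) * 0.5
      ENDDO
      PRINT *, X(1), X(N)
      END

      SUBROUTINE HALVE(V)
      REAL V
      V = V * 0.5
      RETURN
      END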

Dead-code elimination: Dead-code elimination is an optimization technique that removes unnecessary code from a program. Its direct effect is decreased execution time: code that has no effect on the output of the program is removed, and thus the time spent executing this portion of the application is eliminated. Again, there is the additional benefit that dead-code

!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(JCHECK,ICHECK)
!$OMP DO
!$OMP+REDUCTION(+:vcheck,ucheck,pcheck)
      DO icheck = 1, mnmin
        DO jcheck = 1, mnmin
          pcheck = pcheck + ABS(pnew(icheck, jcheck))
          ucheck = ucheck + ABS(unew(icheck, jcheck))
          vcheck = vcheck + ABS(vnew(icheck, jcheck))
        CONTINUE
        ENDDO
        unew(icheck, icheck) = unew(icheck, icheck)
     *      * (MOD(icheck, 100) / 100.)
      CONTINUE
      ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL

Fig.  Parallel version of the SHALOW do loop in program SWIM.

!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(JCHECK,ICHECK)
!$OMP DO
!$OMP+REDUCTION(+:vcheck,ucheck,pcheck)
      DO icheck = 1, mnmin
        DO jcheck = 1, mnmin
          pcheck = pcheck + ABS(pnew(icheck, jcheck))
          ucheck = ucheck + ABS(unew(icheck, jcheck))
          vcheck = vcheck + ABS(vnew(icheck, jcheck))
        CONTINUE
        ENDDO
      CONTINUE
      ENDDO
!$OMP END DO
!$OMP DO
      DO icheck = 1, MIN(m, n)
        unew(icheck, icheck) = unew(icheck, icheck)
     *      * (MOD(icheck, 100) / 100.)
      ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL

Fig.  Optimized version of the SHALOW do loop in program SWIM.

elimination may enable other optimizations; for example, an imperfect loop nest can become a perfect loop nest after dead-code elimination.
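A minimal illustration of that last point (invented for this sketch, not taken from any benchmark): the assignment to T below is dead if T is not used after the loop, and removing it turns the imperfect nest into a perfect one, which in turn makes transformations such as loop interchange or tiling easier to apply.

      PROGRAM DEADCD
      INTEGER I, J, N
      PARAMETER (N = 100)
      REAL A(N,N), T
C     Imperfect nest: the assignment to T sits between the two
C     loop headers.  T is never used afterwards, so it is dead.
      DO I = 1, N
         T = 2.0 * I
         DO J = 1, N
            A(J,I) = REAL(I + J)
         ENDDO
      ENDDO
C     After dead-code elimination the nest is perfect.
      DO I = 1, N
         DO J = 1, N
            A(J,I) = REAL(I + J)
         ENDDO
      ENDDO
      PRINT *, A(N,N)
      END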

Getting optimized execution time

As described earlier, using single-user execution times is important to reduce external perturbation factors. In parallel programs these factors may cause significant inaccuracies and variations in execution time because of the unpredictable nature of other users' processes.

Finding and resolving performance problems

Finding dominant regions: Programmers should focus on dominant code segments based on the measured data. Instrumented program runs usually generate profiles containing the measured data, and programmers should find the major code blocks that consume most of the execution time from these files. With tool support, this task can be simplified.

Dominant program sections may change as a result of the program tuning process. After each iteration of this process, programmers should re-evaluate the most time-consuming (or the most problematic, depending on the metrics) code sections. Other program sections may have become the point of biggest return on further time investment.

Identifying problems and finding remedies: When dominant code sections are found, programmers should figure out any possible improvements to those segments. First, the status of the segments should be understood: "Is the code section parallel?" and "Is the speedup acceptable?" are the questions that should be answered before looking for the right remedies. Computing the overheads discussed earlier can be of significant help to this end. Performance analysis is a difficult part of performance tuning; in the next chapter we present our effort to facilitate performance analysis through tool support.

- Code not parallel: Even advanced parallelizing compilers such as the Polaris compiler cannot detect all possible parallelism. There are mainly two reasons for this. First, the target code uses algorithmic techniques that a parallelizing compiler cannot analyze. Second, the data dependences within the code cannot be determined without examining the input data, so the parallelizing compiler makes the conservative decision not to parallelize the code.

For the first case, programmers may be able to find parallelism themselves. For example, if a reduction variable is not recognized by a parallelizing compiler, programmers can parallelize the code section with the proper reduction directives (a small sketch is given after this list). Programmers may need to study the underlying algorithm for this task. Parallelization techniques are discussed in an earlier section.

For the second case, programmers may be able to make up for the lack of information about the input data. For instance, if the reason for not parallelizing a code section is that the compiler cannot determine that certain array accesses do not overlap, programmers can simply parallelize the code manually. If a conditional exit within a loop only occurs on a fatal error condition, ignoring it and parallelizing the loop will not affect a correct execution.

If the programmer cannot find any way to parallelize a given code section, replacing the algorithm with a parallel counterpart may be possible; parallel algorithms exist for some inherently serial algorithms, such as random number generation and linear recurrences.

Finally, even if none of these techniques is applicable, programmers should try enhancing the locality of the code. Some of the locality-enhancing techniques can make a drastic difference in performance; several were listed earlier in this chapter.

- Speedup not acceptable: For parallel code segments, there are several reasons for poor speedup, including poor locality and parallelization and/or spreading overhead. Spreading overhead may itself be incurred by poor locality. Programmers should try to enhance locality and reduce overhead. Problems with data locality may be detected if a hardware counter is available on the target machine: a large number of stalls or a high data cache miss ratio is a good indication of poor locality. Some of these techniques are described earlier in this chapter.
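As a hedged illustration of the reduction case mentioned in the list above (the sketch is generic; the program name, array name, and size are placeholders), a sum that a compiler fails to recognize as a reduction can be parallelized manually with an OpenMP REDUCTION clause:

      PROGRAM REDUCE
      INTEGER I, N
      PARAMETER (N = 100000)
      DOUBLE PRECISION A(N), S
      DO I = 1, N
         A(I) = 1.0D0 / DBLE(I)
      ENDDO
      S = 0.0D0
C     The REDUCTION clause gives each thread a private copy of S
C     and combines the partial sums at the end of the loop.
!$OMP PARALLEL DO REDUCTION(+:S)
      DO I = 1, N
         S = S + A(I)
      ENDDO
!$OMP END PARALLEL DO
      PRINT *, 'Sum = ', S
      END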

Conclusions

The ultimate objective of our research is to answer "what" and "how" in a parallel optimization process. The proposed methodology is designed to tell programmers what must be done. We have divided the program optimization process into several steps with feedback loops; each step defines specific tasks for programmers to accomplish. We have also listed common analyses and techniques that are needed. There is a clear goal in each stage, and the condition for its achievement is clearly defined. In this way, our methodology provides significant guidance to programmers in optimizing parallel applications.

The methodology described above has been devised empirically. All of the analyses and techniques have helped us improve the performance of scientific and engineering applications. However, figuring out exactly which technique will improve performance is still a difficult subject and requires further study; performance prediction and modeling have not been successful in the general case. In the next chapter we introduce our experience-based approach to resolving this issue. We support our methodology with a set of tools, which is our approach to answering the question "how". These supporting tools are the topic of the next chapter.

TOOL SUPPORT FOR PROGRAM OPTIMIZATION METHODOLOGY

As previously mentioned, the main advantage of a methodical approach to parallel programming is that it is efficient and easy to apply without advanced experience. The proposed methodology outlines this systematic endeavor towards good performance. However, the individual steps listed in the methodology can be time-consuming and tedious.

Parallel programmers without access to parallel programming tools have relied on text editors, shells, and compilers. Programmers write a program using text editors and generate an executable with the resident compilers. All other tasks, such as managing files, examining performance figures, searching for problems, and incorporating solutions, can be achieved with these traditional tools. However, considerable effort and good intuition are needed for file organization and performance diagnostics. Even with parallelizing compilers, these tasks remain for the users to deal with. In fact, most users end up writing small helper scripts for these tasks.

The tools designed specifically for the development and tuning of parallel programs step in where traditional tools have limits. In general, these tools provide interactivity and an adequate user interface for incorporating user knowledge to further improve program performance. The previous efforts listed in the background chapter mainly focus on two aspects of functionality: automation and visualization. Automatic utilities simplify analyzing very complex program structures. Visualization utilities allow users to view and interpret a large amount of static analysis information and performance data in an efficient manner. Still, we feel that certain functionalities that could be of great help to programmers have been largely ignored by tool developers.

Based on user feedback and the specifics of our methodology, we have set our design goals, which are listed in the next section. Then we discuss in detail the tools that we have developed and/or included in our programming environment. We also present our effort to reach a general audience with our tools through the World Wide Web. Finally, we describe how these tools fit into our methodology and help programmers in the tuning process.

Design Objectives

Consistent support for the methodology: This is the main goal of our research. We examine the steps in the methodology and find time-consuming programming chores that call for additional aid. Some tasks are tedious and may be automated; some require complex analysis and cumbersome reasoning, so assisting utilities are needed. If these are properly addressed with tool support, programmers can achieve greater performance with ease. The integration of the methodology and the tool support would significantly increase efficiency and productivity.

Support for deductive reasoning: Current performance visualization systems offer a variety of utilities for viewing a large amount of data from many different perspectives. Understanding data patterns and locating problems, however, are still left to the users. In addition to providing raw information, advanced tools must help filter and abstract a potentially very large amount of data. Instead of providing a fixed number of options for data presentation, offering the ability to freely manipulate data, and even to compute new sets of meaningful results, can serve as the basis for the user's deductive reasoning.

Active guidance system: Tuning programs requires dealing with numerous different instances of code segments. Categorizing these variants and finding the right remedies demand sufficient experience on the programmer's part. The transfer of such knowledge from experienced to novice programmers has always been a problem in the parallel programming community; it usually takes novice programmers a significant amount of time and effort to gain adequate expertise in parallel programming. We believe that it is possible to address this issue systematically using today's technology.

Program characteristics visualization and performance evaluation: The task of improving program performance starts with examining the performance and analysis data and finding room for improvement. The ability to scroll through these data and visualize what they imply is critical in this task. Tables, graphs, and charts are a common way of expressing a large data set for easy comprehension. However, one of the pitfalls that researchers easily fall into is presenting too much information in a myriad of windows without proper annotations. A good tool should be able to draw the user's attention to what is important.

Integration of static analysis with performance evaluation: Most tools published so far focus on only one of the two types of data. However, as mentioned earlier, good performance only comes from considering both aspects. It is important to identify the relationship between the data from both sides and to have them available for easy analysis. Without the consideration of performance data, static program optimization can even degrade performance; likewise, without static analysis data, optimization based only on performance data may yield only marginal gains.

Interactive and modular compilation: The usual black-box-oriented use of compiler tools has limits in efficiently incorporating the user's knowledge of program algorithms and dynamic behavior. For example, although the compiler detects a value-specific data dependence, the user may know that for every reasonable program input the values are such that the dependence does not occur. In other cases, users may know that the array sections accessed in different loop iterations do not overlap. Furthermore, certain program transformations may make a substantial performance difference but are applicable to very few programs and hence are not built into a compiler's repertoire. If a user can find the reason why a loop was not parallelized automatically, a small modification may be applied that ensures parallel execution. For these reasons, manual code modifications in addition to automatic parallelization are often necessary to achieve good performance, and tools should support a convenient mechanism for incorporating manual tuning. Another drawback of conventional compilers is their limited support for incremental tuning. The localized effect of parallel directives in the shared memory programming model allows users to focus on small portions of code for possible improvement. Hence, compiler support for incremental tuning is also an important goal in our tool design.

Data management: This is a basic need in successfully optimizing various applications. Data management refers to the task of organizing data files, maintaining the storage for the gathered data, and making it easy to retrieve the data for quick comparison and manipulation. A unified space for experimental data with clean interfaces not only helps the developers themselves but also supports combined efforts among research groups by allowing simple access to related databases.

Accessibility: Although the importance of advanced tools for all software development is evident, many available tools remain unused. A major reason is that the process of searching for tools with the needed capabilities and downloading and installing them on locally available platforms is very time-consuming. In order to evaluate and find an appropriate tool, this process may need to be repeated many times. Using today's network computing technology, tool accessibility can be greatly enhanced.

Portability: For disseminating a new tool to the user community, it is important that the tool be easy to install on new platforms. In addition, a tool has to be flexible in the data formats it can read, so that it can adapt to the tools, compilers, and performance analyzers available on the local platform.

Configurability: Satisfying the general users of a tool can only be achieved by allowing them to configure the tool to their liking. By having configurability as one of our design goals, many user preferences can be incorporated into the tool's usage without individually addressing them.

Flexibility: Flexibility is an important characteristic of general tools. We have seen many cases in which new types of performance data needed to be incorporated into the picture for a better understanding of program behavior. Furthermore, we would like to keep the applicability of the tool open for tasks beyond performance tuning.

In the next few sections we introduce the tools in our methodology-support toolbox. We present overviews of the tools as well as their detailed structure and functionality where needed. We also describe the look and feel of these tools from the end user's point of view.

Ursa Minor: Performance Evaluation Tool

Often the programmer's intervention into automatic optimization is necessary to achieve near-optimal parallel program performance. To aid programmers in this process, we have developed a performance evaluation tool, Ursa Minor (User Responsive System for the Analysis, Manipulation and Instrumentation of New Optimization Research). The main goal of Ursa Minor is performance optimization through the interactive integration of performance evaluation with static program analysis information. With this tool, performance anomalies such as poor speedup and high cache miss ratios are easily identified on a loop-by-loop basis via a graphical user interface, and overhead components are computed instantly. This information is combined with static program information, such as array access patterns or loop nest structure, to give a better understanding of the problems at hand.

Ursa Minor complements the Polaris compiler in its support for OpenMP parallel programming in that it understands the compiler's output. It collects and combines information from various sources, and its graphical interface provides selective views and combinations of the gathered data. Ursa Minor consists of a database utility, a visualization system for both performance data and program structure, a source searching and viewing tool, and a file management module. Ursa Minor also provides users with powerful utilities for manipulating and restructuring input data to serve as the basis for the user's deductive reasoning. In addition, it takes performance evaluation one step further by means of an active performance guidance system called Merlin. Ursa Minor can present to the user, and reason about, many different types of data (e.g., compilation results, timing profiles, hardware counter information), making it widely applicable to different kinds of program optimization scenarios.

Functionality

Here we describe the functionality of Ursa Minor and what it can do for programmers. A typical performance evaluation process consists of visualizing performance, identifying problems or anomalies, finding the causes, and devising the corresponding remedies. Programmers need to visualize and compare the performance data from different trials, ruminate over them, compute derivative values, examine the runtime environment for the causes of possible problems, and search for solutions. We have designed practical utilities to assist programmers in this process and integrated them into Ursa Minor.

Performance data and program structure visualization

The Ursa Minor tool presents information to the user through two main display windows: the Table View and the Structure View. The Table View shows the data as text entries that relate to Program Units, which can be subroutines, functions, loops, blocks, or any entities that a user defines. The Structure View is designed to visualize the program structure under consideration. A user interacts with the tool by choosing menu items or mouse-clicking.

The Table View displays data such as the average execution time, the number of invocations of code sections, cache misses, and a text label indicating whether loops are serial or parallel. Generally, the entries can be of type integer, floating-point number, or string. Users can manipulate the presented data through the various features this view provides. This is the main view that provides the means for modifying and augmenting the underlying database, and accesses to the other modules of Ursa Minor take place through this view. The Table View is a tabbed folder that contains one or more labeled tabs. Each tab corresponds to a program unit group, that is, a group of data of a similar type. For instance, the folder labeled LOOPS contains all the data regarding the loops in a given program. When reading predefined data inputs such as timing files and Polaris listing files, Ursa Minor generates predefined program unit groups (e.g., LOOPS, PROGRAM, CALLSTRUCTURE, etc.). Users can create their own groups from their own input files using the proper format.

A user can rearrange columns, delete columns, and sort the entries alphabetically or based on execution time. The bar graph on the right side shows an instant normalized graph of a numeric column. After each program run, the newly collected information is included as additional columns in the Table View. Users can examine these numbers side by side as they see fit. In this way, performance differences can be inspected immediately for each individual loop as well as for the overall program, and the effects of program modifications on other program sections become obvious as well. A modification may change the relative importance of loops, so that sorting them by their newest execution time yields a new most-time-consuming loop on which the programmer has to focus next. The figure below shows the Table View of Ursa Minor in use.

Various features make the Table View easier to use and more accessible. Users can set a display threshold for each column so that an item whose value is less than a certain quantity is displayed in a different color; this feature allows users to effortlessly identify code sections with poor speedup, for instance. One or more rows and columns can be selected so that they can be manipulated as a whole. Data that would not fit into a table cell, such as the compiler's explanation for why a loop is not parallel, can be displayed in a separate window with one mouse click. Finally, Ursa Minor is capable of generating pie charts and bar graphs for a selected column or row for instant visualization of numeric data.

Fig.  Main view of the Ursa Minor tool. The user has gathered information on program BDNA. After sorting the loops based on execution time, the user inspects the percentages of the three major loops (two ACTFOR do loops and one RESTAR do loop) using the pie chart generator (bottom left). Computing the speedup column with the Expression Evaluator reveals that the speedup of the RESTAR do loop is poor, so the user is examining more detailed information on that loop.

Another view of Ursa Minor presents the calling structure of a given program, which includes subroutine, function, and loop nest information, as shown in the Structure View figure. Each rectangle represents either a subroutine, a function, or a loop. The rectangles are color-coded so that more information is conveyed to the user visually; for example, parallel loops are represented by green rectangles and serial loops by red rectangles. Clicking one of these rectangles displays the corresponding source code. In the figure, the user is inspecting an ACTFOR do loop in this way. Rectangles positioned to the right are nested program units: if unit A contains unit B, the rectangle representing B is placed to the right of the rectangle for A. If one wants a wider view of the program structure, the user can zoom in and out. This display helps the user understand the program structure for tasks such as interchanging loops or finding outer or inner candidate parallel loops.

Expression Evaluator

The ability to compute derivative values from raw performance data is critical in analyzing the gathered information. For instance, the average timing value over different runs, the speedup, the parallel efficiency, and the percentage of the execution time of a code section with respect to the overall execution time of the program are common metrics used by many programmers. Instead of adding individual utilities to compute these values, we have added the Expression Evaluator for user-entered expressions. We provide a set of built-in mathematical functions for numeric, relational, and logical operations; nested operators are allowed, and any reasonable combination of these functions is supported. The Expression Evaluator also has a pattern matching capability, so the selection of a data set for evaluation becomes simple.

The Expression Evaluator also provides users with query functions that apprehend static analysis data from a parallelizing compiler. These functions can be combined with the mathematical functions, allowing queries such as "loops that are parallel and whose speedups are below a threshold" or "loops that contain I/O and whose execution time exceeds a given fraction of the overall execution time". For example, after users have identified parallel loops with poor speedup, they may want to compute the cache miss ratio for those

Fig.  Structure View of the Ursa Minor tool. The user is looking at the Structure View generated for program BDNA. Using the Find utility, the user has set the view to subroutine ACTFOR and opened the source view for a parallelized ACTFOR do loop.

loops, or the parallelization overheads. Instead of leaving the reasoning process entirely to users, Ursa Minor guides users through these deductive steps. The Expression Evaluator is a powerful utility that allows manipulating and restructuring the input data to serve as the basis for the user's deductive reasoning through a common spreadsheet-like interface.
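For reference, the common derived metrics named above can be written as follows. These are the standard definitions, stated in our own notation rather than in the tool's expression syntax; p denotes the number of processors and T_i the time spent in code section i.

\[
  \mathrm{speedup} = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}}, \qquad
  \mathrm{efficiency} = \frac{\mathrm{speedup}}{p}, \qquad
  \%\,\mathrm{time}_i = 100 \cdot \frac{T_i}{T_{\mathrm{program}}}
\]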

The Merlin performance advisor

As previously mentioned, identifying performance bottlenecks and finding the right remedies take experience and intuition, which novice programmers usually lack. Acquiring this expertise requires many trials and studies, and even for programmers who have experienced peers, the transfer of knowledge from advanced to novice programmers takes time and effort.

We believe that tools can be of considerable use in addressing this problem. We have combined the aforementioned Expression Evaluator with a knowledge database to create a framework for the easy transfer of experience. Merlin is an automatic performance data analyzer that allows experienced programmers to tell novice programmers how to diagnose and improve many types of performance problems. Its objective is to provide guidelines and suggestions to inexperienced programmers based on the accumulated knowledge of advanced programmers.

The figure below shows an instance of the Merlin user interface. Merlin is activated when a user clicks "Run Performance Advisor for This Row" in the row popup menu. The interface consists of an analysis text area, an advice text area, and buttons. The analysis text area displays the diagnosis that Merlin has performed on the selected program unit. The advice text area provides Merlin's solutions to the detected problems, with examples if any. Each diagnosis and its corresponding advice are paired by an identification number. Users can also load a different map at any time.

Merlin differs from conventional spreadsheet macros in that it is capable of comprehending static analysis data generated by a parallelizing compiler. Merlin can take into account numeric performance data as well as program information such

Fig.  The user interface of Merlin in use. Merlin provides solutions to the detected problems; this example shows the problems addressed in an ACTFOR DO loop of program BDNA. The button labeled "Ask Merlin" activates the analysis. The "View Source" button opens the source viewer for the selected code section. The "ReadMe for Map" button pulls up the ReadMe text provided by the performance map writer.

as whether loops are parallel, the existence of I/O statements or function calls within a code block, and so on. This allows a comprehensive analysis based on both the performance and static data available for the code section under consideration.

Merlin navigates through a knowledge-based database ("maps") that contains information on diagnoses and solutions for various performance symptoms. Experienced programmers write maps based on their knowledge, and novice programmers can view the suggestions made by the experienced programmers by activating Merlin. As shown in the map-structure figure, a map consists of three domains. The elements in the Problem Domain correspond to general performance problems from the viewpoint of programmers; they represent situations such as poor speedup, a large number of stalls, and non-parallel loops, depending on the performance data types targeted by Merlin. The Diagnostics Domain depicts possible causes of the problems, such as floating-point dependences and data cache overflow. Finally, the Solution Domain contains remedial techniques; typical examples are serialization, loop interchange, tiling, and loop unrolling. These elements are linked by conditions. Conditions are logical expressions representing an analysis of the data: if a condition evaluates to true, the corresponding link is taken, and the element in the next domain pointed to by the link is explored. Merlin invokes the Expression Evaluator for the evaluation of these expressions. A Merlin map is written in the Generic Data Format described later in this chapter, and it is loaded into Ursa Minor as an instance of an Ursa Minor database. A more detailed description of Merlin is available elsewhere.

Merlin enables multiple cause-effect analyses of performance and static data. It fetches the data specified by the map from the Ursa Minor tool, performs the listed operations, and follows the links whose conditions are true. There are no restrictions on the number of elements and conditions within each domain, and each link is followed independently. Hence, multiple perspectives can be easily incorporated into one map: memory stalls, for instance, may be caused by poor locality, but they could also indicate a floating-point dependence. In this way, Merlin considers all possibilities separately and presents an inclusive set of solutions to users. At the same time, the remedies

[Diagram: a Merlin map with three columns of elements, the Problem Domain (problem 1, 2, 3, ...), the Diagnostics Domain (diagnostics 1, 2, 3, ...), and the Solution Domain (solution 1, 2, 3, ...); conditions (condition 1, condition 2, ...) link problems to diagnostics and diagnostics to solutions.]

Fig.  The internal structure of a Merlin map. The Problem Domain corresponds to general performance problems, the Diagnostics Domain depicts possible causes of the problems, and the Solution Domain contains suggested remedies. Conditions are logical expressions representing an analysis of the data.

suggested by Merlin assist users in learning by example. Merlin enables users to gain expertise in an efficient manner by listing performance data analysis steps and many example solutions given by experienced programmers.

Merlin is able to work with any map, as long as the map is in the correct format. Therefore, the intended focus of the performance evaluation may shift depending on the interest of the user group. For instance, the default map that comes with Merlin focuses on the parallel optimization of programs; should a map that focuses on architecture be developed and used instead, the responses of Merlin will reflect that intention. The Ursa Minor environment thus does not limit its usage to parallel programming.

Other functionality

During the process of compiling a parallel program and measuring its performance, a considerable amount of information is gathered. For example, timing information becomes available from various program runs, structural information about the program is gathered from the code documentation, and compilers offer a large amount of program analysis information. Finding parallelism starts with looking through this information and locating potentially parallel sections of code. The bookkeeping effort accompanying this procedure is often overwhelming. Ursa Minor provides an organized solution to this problem: all the data regarding the tuning of a specific program are integrated into one compact database. Easy access to the database, supported by the tool, gives users convenient views and manipulation of the data without having to deal with numerous files.

Ursa Minor also supports inter-group logs. Sharing performance data and optimization results among team members is important. Group members can share the databases generated by others by specifying one location as a data repository. When a member decides to share a database with other members, Ursa Minor adds a log entry with the information regarding that particular database to the repository. In this way, group members do not have to ask others to send a database in order to examine the data; the repository has all the information about the databases that members want to share.

Configurability is one way to ensure that the tool adapts well to many users' environments and preferences. The Ursa Minor user interface is configurable: users can change the look of the display views and many other functionalities, and most functions can be mapped to keyboard shortcuts, allowing advanced users to speed up their tasks.

Learning how to use a new tool has always been a nuisance for many programmers. As tools become complex and versatile, reading a manual is cumbersome by itself. Some successful commercial applications in word processing or games have employed an online tutorial approach, in which an embedded module steps through some of the basic functions of the program and tells users how to use them. We have incorporated such a module into Ursa Minor. Our interactive demo session allows users to explore important features of the tool with input data prepared by the developers. In addition, this demo session automates some of the steps so that users can quickly look through them.

Internal Organization of the Ursa Minor Tool

[Diagram: static data (program structure, dependence analysis), dynamic data (performance numbers, runtime environment, hardware counter data), and results from other tools and spreadsheets feed the Database Manager and the database; the Expression Evaluator, the GUI Manager (Table View and Structure View), and the Merlin Performance Advisor operate on the database and interact with the user.]

Fig.  Building blocks of the Ursa Minor tool and their interactions.

The figure above illustrates the interaction between the Ursa Minor modules and various data files. The Database Manager handles interaction between the database and the other modules; depending upon users' requests, it fetches the required data items or creates or modifies database entities. The GUI Manager coordinates the various windows and views and controls the process of handling user actions; it also takes care of data consistency between the database and the display windows. The Expression Evaluator is a facility that allows users to perform spreadsheet-like, user-typed commands on the current database; this module parses a command, applies the operations, and updates the views accordingly. Finally, Merlin is a guidance system capable of automatically conducting performance analysis and finding solutions.

Internally, Ursa Minor stores information in an Ursa Minor/Major Database (UMD). The UMD is a storage unit that holds the collective information about a program, its execution results in a certain system environment, and any other pertinent data that users include. This database can be stored in different formats, including a plain text file, which can optionally be inspected with an editor and printed. Furthermore, a database can be saved in a format that can be read by commercial spreadsheets, providing a richer set of data manipulation functions and graphical representations.

The Ursa Minor tool is written in Java; thus, any platform on which the Java runtime environment is available can be used to run the tool. It uses the basic Java language with standard APIs, which enhances the portability of the tool. Object orientation in Java allows a relatively easy addition of new types of data to the database. The windowing toolkits and utilities provide a good environment for prototyping user interfaces, which enabled us to focus on the design of the tool's functionality. Furthermore, Java, with its network support, is a useful language for realizing another goal of this project: making the gathered program compilation and performance results available to users worldwide. This goal has been realized in the Ursa Major tool, which is discussed in a later section.

Database structure and data format

Ursa Minor maintains an organized database structure to store data. Inside the Ursa Minor database, data items are stored as one of four types: integer, floating-point number, string, and long string. For the most part, the database module does not care what kind of information it holds. This is, of course, good programming practice, but more importantly, it helps ensure the flexibility and configurability of the entire tool. Certain modules do understand data semantics, such as the Structure View and the query functions in the Expression Evaluator, but the lack of the required data does not prevent the tool's usage.

At the bottom of the structure is the Program Unit. This is the basic storage unit that maps to an entity such as a loop, a subroutine, or a code block. These units belong to a larger entry called a Program Unit Group. Usually, Program Unit Groups are labeled "loops", "subroutines", etc., depending on the Program Units that they keep. These groups are combined into a Session, which logically maps to the database for one optimization study. Sessions are managed by the Ursa Minor database manager, the module that handles database accesses. The figure below shows a design schematic of the database.

[Diagram: a Session contains Program Unit Groups (Loops, Subroutines, Functions, ...); each group contains Program Units (Loop 1, Loop 2, Loop 3, ...); each Program Unit holds typed fields such as Integer: number of invocations, Float: average execution time, Float: overall execution time, Float: number of cycles, Float: memory stalls, String: serial or parallel, and Long String: nested units.]

Fig.  The database structure of Ursa Minor.

Ursa Minor is capable of reading several different types of data files that are generated by the other tools listed in this chapter. Performance data ("sum") files are generated when Polaris-instrumented executables run. Polaris listing files are generated when Polaris attempts parallelization of a program, and they contain static analysis information. When Ursa Minor reads these files, it parses them in a predefined way and creates the appropriate program unit groups, so users of the tool do not need to concern themselves with data types or formats when loading these files. Also, Ursa Minor can read and write its database using the Java serialization utility, which stores the database in a compact data file. Adding or removing data from the loaded database is as simple as clicking a menu item.

In order to provide more flexibility, we have defined the Generic Data Format, which can handle a wide variety of data. Using this text-based format, users can input almost any type of data with any data structure. The format allows users to create program unit groups of their own and arrange data as they see fit. This feature greatly enhances the applicability of Ursa Minor and fulfills one of the design goals: flexibility.

Summary

Ursa Minor supports the methodology presented in the previous chapter by providing utilities that mitigate many tasks in the performance evaluation stage. It integrates static analysis and performance data by means of a database with structure-based entities that hold many different types of data. With its support for deductive reasoning, active guidance, and data management through configurable and flexible utilities, Ursa Minor offers significant aid to parallel programmers in need of a performance evaluation tool.

Ursa Minor has been installed on the Parallel Programming Hub, allowing access by remote users all over the world. Users can quickly evaluate the tool with ease or utilize it extensively for production use. By combining Ursa Minor with other utilities on the Hub in support of the methodology, our goal of a comprehensive programming environment is drawing near. The Parallel Programming Hub is discussed in detail in a later section.

InterPol: Interactive Tuning Tool

Good performance from a program is usually achieved by an incremental tuning and evaluation process. The term "incremental" applies to both the applied techniques and the modified code segments. Conventional batch-oriented compilers are limited in helping programmers with this task: often, selecting target regions and choosing optimization techniques are done by slicing a program and manipulating compiler options manually. The accompanying tasks of file management and learning about compiler options are often overwhelming to programmers.

Advanced parallelizing compilers provide a large list of available techniques for program parallelization and optimization. These techniques are usually controlled by switches or command line options that may not be intuitive or user-friendly. The ability to select optimization techniques, and even to reorder their application, would provide flexibility in exploring various combinations of techniques on different sections of code. In addition, it would offer a playground for those interested in studying compiler techniques.

InterPol is an interactive utility that allows users to target program segments and apply optimization techniques selectively. It allows users to build their own compiler from the numerous optimization modules available in a parallelizing compiler infrastructure. It is also capable of incorporating manual changes made by users. Meanwhile, InterPol keeps track of the entire program that the user wants to optimize, relieving programmers of file and version management tasks. In this way, programmers are free to apply selected techniques to specific regions, change code manually, and generate a working version of the entire program without exiting the tool. During the optimization process, the tool can display static analysis information generated by the underlying compiler, which can help users in further optimizing the program.

Overview

The figure below illustrates the major components of InterPol. Users select code regions using the Program Builder and arrange optimization techniques through the Compiler Builder. The Compilation Engine takes input from these builders, executes the selected compiler modules, and displays the output program. If the user wants to keep the modified code segments, the output goes back into the Program Builder. Instead of running the Compilation Engine, users may choose to make changes to the code manually. All of these actions are controlled by a graphical user interface, and users are able to store the current program variant at any point in the optimization process.

Functionality

Part (a) of the user-interface figure below shows the graphical user interface offered by InterPol. Target code segments and the corresponding transformed versions are visible in separate areas.

[Diagram: the input program feeds the Program Builder; the Compiler Builder configures calls into the Polaris infrastructure; the Compilation Engine combines both and produces the output program; all three modules are driven through a graphical user interface.]

Fig.  An overview of InterPol. Three main modules interact with users through a graphical user interface. The Program Builder handles file I/O and keeps track of the current program variant. The Compiler Builder allows users to arrange optimization modules in Polaris. The Compilation Engine combines the user selections from the other two modules and calls the Polaris modules.

Static analysis information is given in another area whenever the user activates the compiler. Finally, the Program Builder interface provides an instant view of the current version of the target program. InterPol is written in Java.

The underlying parallelization and optimization tool is the Polaris compiler infrastructure. The various Polaris modules form building blocks for a custom-designed parallelizing compiler, and InterPol is capable of stacking up these modules in any order. Polaris also comes with several different data dependence test modules, which can likewise be arranged by InterPol. Overall, a large number of modules are available for application, and users have the freedom to choose any blocks in any order. Executing this custom-built compiler is as simple as clicking a menu item, and the result is displayed immediately in the graphical user interface. Part (b) of the user-interface figure shows the Compiler Builder interface of InterPol. More detailed configuration is also possible through InterPol's Polaris switch interface, which controls the behavior of the individual passes.

Fig.  User interface of InterPol: (a) the main window and (b) the Compiler Builder.

The Program Builder keeps and displays the up-to-date version of the whole program. Users select program segments from this module, apply the automatic optimizations set up by the Compiler Builder, and/or add manual changes. The Compiler Builder is accessible at any point, so users can apply entirely different sets of techniques to different regions. The current version of the program is always shown in the Program Builder interface for easy examination. Through this continuous process of tuning optimized program segments, users always stay in the process, observing and modifying program transformations step by step.

During the optimization process, InterPol can display the program analysis results generated by running the Polaris modules, including data dependence test results, induction and reduction variables, etc. This provides a basis for further optimization: programmers can incorporate their knowledge of the underlying algorithm, compensating for the compiler's limited knowledge of the program's dynamic behavior and input data.

Summary

InterPol seeks to assist programmers by providing highly flexible utilities for both automatic and manual optimization. For those who are not familiar with the techniques available in parallelizing compilers, the tool provides greater insight into the effects of code transformations. By combining the Ursa Minor performance evaluation tool with InterPol, we hope to create a complete programming environment.

Other Tools in Our Toolset

The functionality of Ursa Minor and InterPol, combined with the Polaris instrumentation module, covers all the aspects of the methodology discussed in the previous chapter. Later, in the section on integration with the methodology, we describe how these tools provide comprehensive support for the methodology. In this section we present a set of complementary tools in our toolset that were developed in related projects. The main goals of these tools do not necessarily match the issues that we address in this research, but they provide additional information and grant control over other aspects of program development. These tools have been either developed or modified at Purdue University.

Polaris parallelizing compiler

The Polaris parallelizing compiler is a source-to-source restructurer developed at the University of Illinois and Purdue University. Polaris automatically finds parallelism and inserts appropriate parallel directives into input programs. Polaris includes advanced capabilities for array privatization, symbolic and nonlinear data dependence testing, idiom recognition, interprocedural analysis, and symbolic program analysis. In addition, the current Polaris tool is able to generate OpenMP parallel directives and apply locality optimization techniques such as loop interchange and tiling.

As demonstrated previously, the Polaris compiler has successfully improved the performance of many programs on various target machines. Polaris provides a good starting point for parallelizing and optimizing Fortran programs. For advanced programmers, it can save the substantial time that would otherwise be spent tuning loops that can be parallelized automatically; for novice programmers, manually parallelizing those loops would be cumbersome to begin with. In addition, Polaris can provide a listing file with the results of static program analysis, which may give programmers valuable information on various code sections.

InterPol, described above, provides easy interactive access to the Polaris parallelizing compiler and is even capable of restructuring the optimization modules within Polaris. If InterPol is not available, Polaris can serve as an alternative, allowing fast parallelization of the programs at hand. Polaris is installed on the Parallel Programming Hub, available to programmers all over the world.

InterAct p erformance monitoring and steering to ol

InterAct is a toolset that allows interactive instrumentation and tuning of OpenMP programs. This toolset provides a simple interface and API that allow users to quickly identify performance bottlenecks through online monitoring of program performance and to explore solutions by experimenting with user-defined tunable variables. The Polaris parallelizing compiler has been modified to annotate sequential Fortran programs with OpenMP shared-memory directives as well as to insert calls to the instrumentation library. The instrumentation library collects both timings and hardware counter events, transparently managing low-level details such as counter overflows. To manage the hardware counters, the OpenMP Performance Counter Library (OMPcl) has been developed to accurately collect events within the multithreaded OpenMP environment.

InterAct provides a graphical user interface (GUI) to monitor program behavior as well as to dynamically change instrumentation, environment settings, and critical program variables during execution. It supports visualization of collected data, dynamic instrumentation, interactive modification of the number of threads used by the application, interactive selection of the runtime library used for managing parallel threads, and interactive modification of global variables that are registered by the target application. These global variables may be compiler- or user-inserted and may be used to control the behavior and/or performance of the application. The toolset provides a socket interface between the application and the GUI that allows monitoring to be done either locally or remotely. The accompanying figure shows a screenshot of InterAct in use for studying the dynamic behavior of the SWIM benchmark.

Fig. Monitoring the example application through the InterAct interface. The main window shows the characterization data of the major loops in the SPEC SWIM benchmark.

MaxP parallelism analysis tool

A compiler can analyze the static behavior of a program; it can find characteristics that hold for all possible input data sets and target machines. In contrast, dynamic evaluation of a program can provide insight into program characteristics and predict behaviors that static analysis methods may miss. Of particular interest is understanding the dynamic behavior of parallelism, one of the most dominant factors in performance.

MaxP is a Polaris-based tool developed at Purdue University. It evaluates the inherent parallelism of a program at run time. The inherent parallelism is defined as the ratio of the total number of operations in a program (or program section) to the number of operations along the critical path. The critical path is the longest path in the program's dataflow graph, which MaxP computes during program execution. The tool can thus determine the minimum execution time of a program assuming an unlimited number of parallel processors. It reports the maximum parallelism as an upper estimate of the potential performance gain that a user can expect from aggressively optimizing the code.
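As an idealized illustration (invented numbers, not a measurement): a loop whose 1000 iterations each perform 10 mutually independent operations contains 10,000 operations in total, but its critical path is only 10 operations long, so MaxP would report an inherent parallelism of 1000 for that loop.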

Integration with Methodology

In this section we examine how we envision the combined methodology-plus-tools scenario. First, we discuss how these tools facilitate the steps listed earlier. Then we focus on other features of the tools that help programmers throughout the tuning process.

Tool support in each step

Our tools have been designed and modified with the parallel programming methodology in mind. The accompanying figure gives an overview of how these tools can be used in each step of the methodology introduced in the previous chapter. Ursa Minor mainly contributes to the performance evaluation stages; InterPol and Polaris offer aid in the parallelization and manual tuning stages. Additional help in executing target programs is available through InterAct. In the following, we revisit each step in the methodology and discuss the roles of our tools.

Instrumenting program

The Polaris tool offers an instrumentation module as one of its passes. Users can activate this module using a set of switches; in this way, users can generate instrumented versions of both parallel and serial programs. Polaris provides several switches for instrumenting the execution time of loops. These switches dictate the types of code blocks that are instrumented and how nested sections are instrumented. By carefully controlling the switches, users can add all the necessary timing functions without excessive overhead.

Fig. Tool support for the parallel programming methodology.

Combined with the OpenMP Performance Counter Library introduced above, Polaris can instrument a program so that each run generates a profile containing various performance data measured by hardware counters on the instrumented code segments. This library is available on many modern machines. Many types of measurement are available, including the number of cycles, instruction and data cache hits, the numbers of reads and writes, instruction counts, dependence stalls, and so on. The library can generate a data file that can be read by Ursa Minor for further analysis.

As noted in the methodology, it is important to record the execution time of the uninstrumented program. This serves as the basis for measuring the perturbation that instrumentation introduces. A simple UNIX command such as time can provide such a timing number.
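The actual Polaris instrumentation interface is not reproduced here; the following hand-written sketch only illustrates the general pattern of loop-level timing instrumentation, using the standard Fortran SYSTEM_CLOCK intrinsic in place of the real instrumentation library:

      SUBROUTINE RESTAR_SKETCH(N, A)
!     Hand-written sketch of loop-level timing instrumentation; the real
!     instrumentation library is replaced by the standard SYSTEM_CLOCK
!     intrinsic, and the printed label mimics a summary-file entry.
      INTEGER N, I, T0, T1, RATE
      REAL A(N)
      DOUBLE PRECISION ELAPSED
      CALL SYSTEM_CLOCK(T0, RATE)
      DO I = 1, N
         A(I) = A(I) * 2.0
      END DO
      CALL SYSTEM_CLOCK(T1)
      ELAPSED = DBLE(T1 - T0) / DBLE(RATE)
      PRINT *, 'RESTAR_do  TOT ', ELAPSED
      END SUBROUTINE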

Getting serial execution time

Running an instrumented serial version is typically done through the UNIX command line, usually with a simple command invocation. Instrumentation generates some form of record containing the timing information for the instrumented code segments. For example, an executable instrumented by the Polaris instrumentation utility generates a file whose entries look like the following:

RESTAR_do    AVE ...    MIN ...    MAX ...    TOT ...
RESTAR_do    AVE ...    MIN ...    MAX ...    TOT ...
RESTAR_do    AVE ...    MIN ...    MAX ...    TOT ...
RESTAR_do    AVE ...    MIN ...    MAX ...    TOT ...
RESTAR_do    AVE ...    MIN ...    MAX ...    TOT ...
ACTFOR_do    AVE ...    MIN ...    MAX ...    TOT ...
ACTFOR_do    AVE ...    MIN ...    MAX ...    TOT ...
OVERALL time ...

The tabular section shows the average (AVE), minimum (MIN), maximum (MAX), and cumulative total (TOT) time spent in each instrumented segment. The last line shows the overall execution time of the entire program. This file can be read directly by the Ursa Minor tool for analysis.

Running parallelizing compiler

This is the step in which users attempt parallelization by running automatic utilities. Its main goals are to let an automatic parallelizer optimize complex loops, to obtain the compiler's static analysis results, and to save time by automating the parallelization of small, inconsequential loops. The target in this case is therefore usually the entire program. Furthermore, most parallelizers with interprocedural analysis capability work best when an entire program is given as input. Polaris, as a batch-oriented program, performs well for this purpose; InterPol is also capable of handling this task.

Manually optimizing programs

Any text editor can be used to modify programs manually, and several UNIX commands are useful for manipulating programs; an example is fsplit, which splits subroutines and functions into separate files. However, InterPol is specifically designed for the process of manual tuning. InterPol allows programmers to apply selected techniques to specific regions, change code manually, and generate a working version of the entire program without exiting the tool. Some of the manual techniques that users may consider were presented in an earlier chapter.

Getting optimized execution time

In the shared-memory model, programmers can invoke a parallel program the same way they execute a serial program. Typically, certain environment variables need to be set beforehand; for example, on Solaris machines the environment variable OMP_NUM_THREADS determines the number of processors to be used. If programmers used the Polaris compiler for instrumentation, a summary file is generated after each run.

InterAct allows interactive instrumentation and tuning of OpenMP programs. Its ability to dynamically change runtime parameters (tile size, unrolling factor) provides a testbed for finding the optimal set of techniques. Monitoring and changing the hardware counter instrumentation make the instrumentation process more efficient.

Finding and resolving performance problems

Programmers need utilities for collecting and sorting data. Identifying performance problems requires a considerable amount of examination and hand analysis, and finding solutions often requires experience with program optimization studies. Ursa Minor provides tools that assist parallel programmers in evaluating performance effectively. Its graphical interface provides selective views and combinations of timing information together with program structure and static analysis data. Users can assemble a table, open a Structure View, draw charts, perform spreadsheet-type operations, and examine source code. Ursa Minor manages the information within its own database, so data management that might otherwise have required significant file and version control becomes simple.

Identifying dominant loops is very simple with Ursa Minor. Users can load timing profiles and sort the entries through the column popup menu. If a user creates a pie chart, the most time-consuming loops are displayed, with the entire circle representing the total execution time. The bar graph on the right gives an instant view of normalized numeric data.

An important task in tuning program performance is to evaluate whether an applied program modification produces an acceptable result. This involves computing various metrics, such as speedup and parallel efficiency, and examining program analysis information. The built-in mathematical functions allow users to manipulate the data. The static analysis information generated by the Polaris compiler is also managed within the Ursa Minor database; for code segments that require manual tuning, this information provides vital clues. Static analysis information, as well as the source code viewer, can be pulled up at any time with simple menu clicks, so users can make a comprehensive diagnosis of the problems at hand.

Ursa Minor does more than just present data; it is capable of actively analyzing the data and giving advice. When users run Merlin, it extracts the necessary information and applies diagnosis techniques to find appropriate solutions. As mentioned previously, the decisions that Ursa Minor makes rely on the Merlin map, which is typically provided by advanced parallel programmers. In this way, the knowledge of experienced programmers can easily be used by novice programmers. Because a map can contain a variety of functions applicable to any type of data, Merlin can be used in many different fields of study.

Other useful utilities

In addition, the toolset provides functionality for tasks that are not specifically tied to the methodology steps.

When programmers are given an application to optimize, they usually start by examining the source code. Basic knowledge about the program structure, such as the large subroutines or functions, their algorithms, and their callers and callees, helps programmers tremendously later in the tuning stage. The algorithms employed by program modules, although not necessary for following the methodology, may be important, especially when programmers need to consider replacing algorithms.

Programs written by others are generally harder to understand; different coding styles make it difficult to grasp the underlying composition of individual program modules. The Structure View of Ursa Minor alleviates this problem by presenting users with an intuitive, color-coded view of the program structure. A simple click pulls up the source view when closer examination is desired. This can save a significant amount of the user's time.

As the size and complexity of applications grow rapidly, the subject of performance steering is receiving more attention. Performance steering can be useful both during development and in production use. For instance, finding the right parameters for convergence criteria during application development can be tricky, so the ability to set or reset the relevant variables during program execution is advantageous for experimenting with different values. Also, an application may be able to simulate many different aspects of a target object while users are interested in only one aspect; in this case, performance steering can save time and resources by restricting the simulation. The interest of InterAct lies along this line. The primary use of InterAct in our study has been finding the optimal combination of optimization-related parameters (e.g., tile size, unrolling factor) for a given application. For long-running programs, InterAct allows fine control over variables such as the simulation step size and the number of iterations.

When more than one person is involved in an optimization project, communication between group members becomes problematic. The data that one person generates may not be easily accessible to, or compatible with, the tools used by others. Other members of the group may want to focus on different perspectives, but the information from one researcher may not be formatted or arranged in a compatible way. Sharing a manipulable database opens up the possibility of all members having access to a set of compatible databases relevant to their individual tasks. At the same time, group members can reason about the data gathered by others, focusing on the aspects they are interested in. Ursa Minor enables an efficient and meaningful way of sharing research results.

Finally, the growing popularity of multiprocessor workstations and high-performance PCs is leading to a substantial increase in non-expert users and programmers of this machine class. Such users need new programming paradigms; perhaps most importantly, they need good examples to learn from. We have extended our effort to support parallel programming by example through Web-accessible tools and a database repository. This is the topic of the next section.

The Parallel Programming Hub and Ursa Major

Although the importance of advanced tools for software development is evident, many available tools remain unused, mainly because of their limited accessibility. We have developed a set of tools for parallel programmers, and the Internet provided an opportunity to make our tools more accessible to parallel programmers worldwide. Here we present two separate outcomes of our effort to reach a wider audience with our tools. The Parallel Programming Hub is an ongoing project to provide a globally accessible, integrated environment that hosts parallelizing compilers, program analyzers, and interactive performance tuning tools; users can access and run these tools with common Web browsers. Ursa Major is an Applet-based application that enables visualization and manipulation of the performance and static analysis data of various parallel applications that have been studied at Purdue University; its goal is to make a repository of program information available via the World Wide Web.

Parallel Programming Hub: a globally accessible, integrated tool environment

Programming tools are of paramount importance for efficient software development. However, despite several decades of tool research and development, there is a drastic contrast between the large number of existing tools and those actually used by ordinary programmers. We believe there are two main reasons for this situation. The first is that, in order to benefit from new tools, a programmer typically has to go through one or several tedious efforts of searching, downloading, installing, and resolving platform incompatibilities before the tools can even be learned and their usefulness evaluated. The second is that, even when the value of a number of tools has been established, they often use different terminology, diverse user interfaces, and incompatible data exchange formats; hence they are not integrated.

Through the combined efforts of many researchers, we have created the Parallel Programming Hub, a new parallel programming tool environment that is accessible and executable anytime, anywhere through standard Web browsers, and integrated in that it provides tools that adhere to a common methodology for parallel programming and performance tuning. The Parallel Programming Hub addresses the two issues above. First, it makes a growing number of tools available on the Web, where they are accessible and executable through standard Web browsers. The Parallel Programming Hub places no restrictions on the type of tools that can be added: a new tool can be installed without modification, providing its original graphical user interface and, if necessary, being served directly off the home site of a proprietary provider. In all cases, authorized users can access the tool via standard Web browsers.

Our methodology is supported by the Parallel Programming Hub, which includes the Polaris parallelizing compiler, the MaxP parallelism analysis tool, and the Ursa Minor performance evaluation and visualization tool, described in previous sections. In addition, an increasing number of tools are being made available through the Parallel Programming Hub. Currently, the Trimaran environment for instruction-level parallelism (ILP) and the SUIF parallelizing compiler are accessible. Authorized users can also access a number of common support tools, such as Matlab, Mentor Graphics, GNU Octave, and StarOffice. The accompanying figure shows a screenshot of Ursa Minor in use on the Parallel Programming Hub.

On the surface, the Parallel Programming Hub is a set of Web pages through which users can run various parallel programming tools. Underneath this interface is an elaborate network computing infrastructure called the Purdue University Network Computing Hub (PUNCH). PUNCH is an infrastructure that supports network-accessible, demand-based computing. It allows users to access and run unmodified tools via standard Web browsers. PUNCH allows tools to be written in any language and does not require the source or object code of the applications it hosts; this allows a wide variety of tools to be included.

When a user invokes a tool on PUNCH, the resource management unit selects an appropriate platform from a resource pool and executes the tool on it. The resource management unit keeps resource usage at an optimal level. It also makes the system highly scalable, ensuring that PUNCH performs well under widely varying numbers of users, tools, and resource nodes.

Fig. Ursa Minor in use on the Parallel Programming Hub.

PUNCH is logically divided into discipline-specific Hubs. Currently, PUNCH consists of four Hubs containing tools from semiconductor technology, VLSI design, computer architecture, and parallel programming. These Hubs contain over thirty tools from eight universities and four vendors and serve more than five hundred users from Purdue, the rest of the US, and Europe. PUNCH has been accessed millions of times since it became operational.

Upon registering, a user is given an account and disk space that is accessible whenever the user is on PUNCH. The execution of tools via PUNCH takes place in UNIX "shadow" accounts that are managed by the network computing infrastructure. This shadow account structure allows user accounts to be added to the parallel programming Hub without requiring a UNIX system administrator to set up individual accounts. PUNCH keeps all user files in a master account and maintains a pool of shadow accounts that are allocated dynamically to users at run time. Input files for interactive programs such as Ursa Minor are transferred on demand from master to shadow accounts via a system-call tracing program based on the UFO prototype, which implements a user-level virtual file system on top of the FTP protocol. This system is transparent to users, so all file transactions appear to be normal disk I/O.

The immediate advantage of having an integrated, network-based tool environment is a substantial saving in users' effort and resources. The Parallel Programming Hub eliminates the time to search for, download, and install tools, and it greatly supports users in learning a tool through uniform documentation, online tutorials, and tools that speak a common terminology. A typical tool access time for first-time users of the ParHub is on the order of a minute, including authentication and navigating to the right tool. This contrasts with download and installation times at least an order of magnitude larger; even greater effort becomes necessary if tools need to be adapted to local platforms.

A novel aspect of the ParHub's underlying technology is that it represents not only an actual information grid but also includes the necessary portals for its end users. One vision is that future users can access software tools via any local platform, from a palmtop to a powerful workstation; compute power and file space are provided on the Web, and mobility is provided in that these resources are accessible transparently from any access point. The described infrastructure represents a significant step toward this vision.

Ursa Major: making a repository of knowledge available to a worldwide audience

A core need for advancing the state of the art of computer systems is performance evaluation and the comparison of results with those obtained by others. To this end, many test applications have been made publicly available for study and benchmarking by both researchers and industry. Although a large body of measurements obtained from these programs can be found in the literature and in public data repositories, it is usually extremely difficult to combine them into a form meaningful for new purposes. In part this is because the data are not readily available (i.e., they have to be extracted from several papers) and because they have to undergo substantial recategorization and transformation. In addressing this issue, the Ursa Major project is creating a comprehensive database of such information.

Many tools can gather raw program and performance information and present it to users, which is a starting point for answering such questions. However, in addition to providing raw information, advanced tools must help filter and abstract a potentially very large amount of data.

Ursa Major addresses these issues by providing an instrument with which application, machine, and performance information can be obtained from various sources and displayed in an interactive viewer attached to the World Wide Web. It provides a repository for this information and assists users in its abstraction and comprehension. Industrial benchmarkers may be interested in a single number for machine comparisons; programmers may be interested in transformations that can improve the performance of an application; computer architects may want to compare their cache measurements with those obtained by their peers. Ursa Major provides hooks for these needs, and it includes instruments for the underlying data mining task.

Ursa Major is an Applet-based application that enables visualization and manipulation of the performance and static analysis data of various parallel applications that have been studied at Purdue University. The goal of Ursa Major is to make a repository of program information available via the World Wide Web. Ursa Major has its origin in the Ursa Minor tool and provides almost identical functionality. Because we chose Java as the implementation language, it was natural to combine these resources with the rapidly advancing Internet technology and in this way allow users at remote sites to access our experimental data. Typically, in response to a user interaction, Ursa Major fetches from the repository a program database that represents a specific parallel programming case study and then displays it using Ursa Minor's visualization utilities. Due to Applet security constraints, local disk access is not supported by Ursa Major. The accompanying figure shows an overall view of the interactions between Ursa Major, a user, and the Ursa Major repository (UMR).

between Ursa Major a user and the Ursa Major rep ository UMR

Remote Server

Ursa Major Applet UMR (Ursa Major Repository)

Java Program Download DataBase Download

URSA MAJOR UMD (Ursa Major Database)

presentation/edit database presentation/edit database Loop Table View Call Graph View

interaction interaction

User

Fig Interaction provided bytheUrsa Major to ol

The data repository is being constructed from results gathered in various research projects. Currently, it contains the characteristics of a number of programs, the results of compiler analyses of these programs, their performance numbers on diverse architectures, and the data generated in several simulator runs. Individual databases in the repository are in the Generic Data Format described earlier. One issue in designing the repository was to define a storage scheme that makes it easy for users to find information entered by other users. To this end, the repository structure uses extensions on file and directory names indicating data such as program names, platforms, compilers, optimizations, and parallel languages. To be flexible, these extensions are not hard-coded; instead, they are described in a configuration file that Ursa Major reads at the start of a session.

Ursa Major supports a user model of "parallel programming by example," and it serves as a program and benchmark database for high-performance computing. It integrates information available from performance analysis tools, compilers, simulators, and source programs to a degree not provided by previous tools. Ursa Major can be executed on the World Wide Web, from which a growing repository of information can be viewed. Through continuous updates to the repository, we envision Ursa Major becoming the first place to look for performance data.

The emergence of the Parallel Programming Hub presents an interesting opportunity to compare these two network-based tools. Although their goals are distinct, Ursa Minor on the Parallel Programming Hub and Ursa Major provide users with the same visualization utilities for viewing performance and static analysis data. The Parallel Programming Hub enables Ursa Minor to load and manipulate user inputs from remote sites; on the other hand, it lacks support for access to a centralized repository. A detailed comparison in terms of response time is given in the next chapter.

Conclusions

Our effort to create a parallel programming environment has resulted in a parallel program development and tuning methodology and a set of tools. We have developed the tools with our design goals in mind: to provide an integrated, flexible, accessible, portable, and configurable tool environment that conforms to the underlying methodology. Our toolset integrates static program analysis with performance evaluation while supporting data visualization and interactive compilation. Data management is also simplified with our tools.

To give access to these tools to as many users as possible, and to disseminate our performance databases of various applications as widely as possible, we have used a network computing infrastructure. In addition, we are building a database repository that enables the visualization and manipulation of performance results through a Java Applet application.

Here we conclude the presentation of our methodology and tool efforts. The methodology addresses the "what" of parallel programming. The toolset described in this chapter has been designed and implemented based on our experience and design goals and aims to answer the "how." Finally, with the extra effort to promote the tools and reach a wider audience, we have attempted to answer the "where." The methodology and the tools are useless if they are not effective in actual parallel programming and performance tuning; the obvious next step is to evaluate the benefits of the tools as well as the methodology, hence answering "how well" they work. This is the topic of the next chapter.

EVALUATION

Evaluating a methodology and tools is difficult, largely due to two problems. First, the desirable characteristics of a methodology and supporting tools, such as efficiency and effectiveness, cannot be measured easily, especially in quantitative terms; it is very challenging to establish a set of metrics for such measures. Second, the goal of developing a methodology and supporting tools is to assist users, so in determining their efficiency, the users' willingness toward them and knowledge of them become critical factors. Having a large user community would help judge their value, but even then, creating controlled experiments to obtain quantitative feedback is very difficult.

These are the main reasons that many tool efforts in parallel programming have ignored the evaluation aspect. The majority of publications related to parallel programming tools do not include quantitative evaluations. Even general descriptions of user feedback, such as "response to the Sigma editor has been good," are seldom found. Some demonstrate the usage of tools via descriptive case studies. Publications focusing on programming methodology have taken the same approach, giving several examples of how their proposed scheme can be applied to actual programming practice. One notable evaluation effort is found in the SUIF Explorer publication, in which a performance improvement attempted by a user is summarized in detail. Whether it accurately reflects the efficiency of the tool is arguable, but as the only quantitative measurement for tool evaluation, their effort is noteworthy.

In this chapter, we attempt a fair and accurate evaluation as follows. First, we give a series of case studies to demonstrate the usage of our methodology and tool support, with a detailed description of each parallelization and tuning process; these case studies serve to show the applicability of the methodology and the functionality of the tools. Next, we evaluate the tool functionality by analyzing and comparing tasks accomplished with and without the tools, and we summarize comments from users. We then compare our tools with other parallel programming environments. Finally, we discuss tool accessibility as a result of adopting the network computing facilities, and we close with conclusions.

Methodology Evaluation: Case Studies

Manual tuning of ARC2D

In this section, we present a case study illustrating the manual tuning of the program ARC2D from the Perfect Benchmarks; this case study has been presented previously. In this study, a programmer tried to improve the performance of the program beyond that achieved by the Polaris parallelizing compiler. The target machine is a multiprocessor HyperSPARC workstation.

Polaris was able to parallelize almost all loops in ARC2D; however, the speedup of the resulting executable remained low. Using Ursa Minor's Structure View and sorting utility, the programmer found three loops to which loop interchange could be applied: FILERX_do, XPENTA_do, and XPENT_do. After the loop nests were interchanged, the total program execution time decreased noticeably, improving the overall speedup.

As a result of this modification, the dominant program sections changed. The programmer re-evaluated the most time-consuming loops, using the Expression Evaluator to compute the new speedups and each loop's percentage of the total execution time. The most time-consuming loop was now the STEPFY_do nest, which consumed a large fraction of the new parallel execution time. The programmer examined the nest with the source viewer and noticed two things: there were many adjacent parallel regions, and the parallel loops were not always distributing the same dimension of the work array. The programmer merged all of the adjacent parallel regions in the nest into a single parallel region. The new parallel region consisted of four consecutive parallel loops: the first two nests were single loops that distributed the work array across its innermost dimension, and the second two nests were doubly nested and distributed the work array across its second innermost dimension. The effect of these changes was twofold: first, merging the regions eliminates parallel loop fork/join overhead; second, normalizing the distributions within the subroutine improves locality. After this change, the speedup of the loop improved further.
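The STEPFY code itself is not reproduced here; the following hand-written sketch (invented array and bounds) only illustrates the general transformation of merging adjacent parallel regions into one region containing several work-sharing loops, so that threads are forked and joined once instead of once per loop:

      SUBROUTINE MERGE_SKETCH(N, WORK)
!     Hand-written sketch: originally each loop below would sit in its
!     own parallel region, paying the fork/join cost twice.  After
!     merging, one parallel region contains two work-sharing loops.
      INTEGER N, J
      REAL WORK(N,2)
!$OMP PARALLEL
!$OMP DO
      DO J = 1, N
         WORK(J,1) = WORK(J,1) + 1.0
      END DO
!$OMP END DO
!$OMP DO
      DO J = 1, N
         WORK(J,2) = WORK(J,2) + 1.0
      END DO
!$OMP END DO
!$OMP END PARALLEL
      END SUBROUTINE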

The programmer was able to apply the same techniques (fusion and normalization) to the next most time-consuming loops: STEPFX_do, FILERX_do, and YPENTA_do. These modifications yielded a further speedup gain. Finally, the programmer applied the same techniques to the next most time-consuming sections, XPENTA, YPENT, and XPENT, according to the newly computed profiles and speedups, improving the speedup again. The programmer felt that the point of diminishing returns had been reached and halted the optimization.

Fig. The (a) execution time and (b) speedup of the various versions of ARC2D. The modifications are, in order: loop interchange; the STEPFY_do modification; the STEPFX_do modification; the FILERX_do modification; the YPENTA_do modification; and the modification of XPENTA, YPENT, and XPENT.

In summary, applying loop interchange, parallel region merging, and distribution normalization increased the out-of-the-box speedup substantially, with a corresponding decrease in execution time. The figure above shows the improvement in total program performance as each optimization was applied. Ursa Minor allowed the user to quickly identify the loop structure of the program and to sort the loops to identify the most time-consuming code sections. After each modification, the user was able to add the new timing data from the modified program runs, recalculate the speedup, and see whether the improvement was worthwhile.

Evaluating a parallelizing compiler on a large application

In one research project, a user is enabling the Polaris compiler to work effectively with large codes comprising many thousands of lines. These codes have many levels of abstraction and are very modular, making it difficult to link performance and parallelization bottlenecks to their causes. Ursa Minor was used with the SPECseis application suite, a set of codes that perform seismic processing, as a basic GUI to help manage the thousands of lines of code and hundreds of loop timings, as well as to direct the compiler developer toward enabling Polaris to recognize more parallelism.

Ursa Minor allows the user to easily pick out the significant portions of the code in terms of execution time and to find their callers and callees. We found that the implementation of the finite-differencing scheme, a landmark in the history of seismic processing, takes only a small fraction of the total time, while the accompanying correction routine, which compensates for the errors that accrue with the finite-difference approximation, accounts for a much larger share of the total execution time. The correction routine performs an FFT, applies the error equations, and transforms the data back from the frequency domain.

Besides the ability to quickly and easily locate the major components of the execution time, the user found Ursa Minor helpful in analyzing the effectiveness of compilation techniques. One key benefit of using Ursa Minor for performance evaluation is the ability to apply the Expression Evaluator to both the run-time performance data and the compile-time analysis data. Polaris was able to parallelize loops that contributed only a small fraction of the execution time. The user used Ursa Minor to determine why certain key loops were not parallelized (a feature requiring one mouse click) in order to add techniques that address these issues. The SEICFT routine, for example, performs an FFT on a frequency slice; the routine contains while loops, which Polaris does not parallelize.

With Ursa Minor, the user was also able to work with the application as a whole to determine what factors influence automatic parallelization across the entire code, using the commands provided in the Ursa Minor tool. In particular, Ursa Minor revealed that inlining or interprocedural analysis is a crucial parallelism enabler for parallelizing compilers when dealing with large, modular codes: eight out of the top ten loops of the first seismic phase contain subroutine calls.

Interactive compilation

The use of a parallelizing compiler as an interactive tool can benefit users in many ways. Users can incorporate feedback from the compiler during compilation and make appropriate modifications to the source. The incremental use of such a tool also simplifies code management and debugging, because the code changes made by users are localized. In addition, the ability to build a parallelizing compiler, as described in the previous chapter, allows users to experiment with different compiler techniques so that they can learn more about the techniques and their effects.

We present a case study to demonstrate the functionality of InterPol. A user parallelized the small example program shown in part (a) of the figure below. Part (b) shows the code after simply running it through the default Polaris configuration with the inlining switch set to inline small subroutines. Two important results can be seen: subroutine one is not inlined, because the inlining pass executes prior to dead-code elimination, and the loops in subroutine two are not found to be parallel because of subscripted array subscripts, which the Polaris compiler cannot analyze. Part (c) shows the resulting program after adding a dead-code pass prior to the inlining pass in the Compiler Builder and running the main program and subroutine one from part (a) through this new compiler.

Fig. Contents of the Program Builder during an example usage of the InterPol tool: (a) the input program and (b) the output from the default Polaris compiler configuration.

Fig. Contents of the Program Builder during an example usage of the InterPol tool: (c) the output after placing an additional dead-code elimination pass prior to inlining and (d) the program after manually parallelizing subroutine two.

Finally, in part (d), the user has selected only subroutine two, parallelized it by hand, and included this modified version in the Program Builder. Through simple interactions with InterPol, the user was able to take a code for which Polaris could initially parallelize only a single innermost loop and parallelize both of its outermost loops.
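The figure contents are only partially recoverable, so the following is a hand-written sketch, not the original listing, of the kind of code involved: the subscripted subscripts C(J) and C(I) in subroutine two defeat the compiler's dependence test, but a user who knows that the index array holds distinct values can insert the directive manually, as was done for part (d):

      SUBROUTINE TWO(A, B, C, N)
!     Sketch only: array bounds and the exact statements of the original
!     example are not reproduced.  The subscripted subscripts below make
!     the write pattern opaque to automatic dependence analysis.
      INTEGER N, I, J, C(N)
      REAL A(N,N), B(N,N)
!     Manual parallelization: valid only because the user knows that the
!     entries of C are distinct, so different I iterations write
!     different columns of A and B.
!$OMP PARALLEL DO PRIVATE(J)
      DO I = 1, N
         DO J = 1, N
            A(C(J), C(I)) = REAL(I * J)
            B(C(J), C(I)) = REAL(I + J)
         END DO
      END DO
!$OMP END PARALLEL DO
      END SUBROUTINE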

Performance advisor: hardware counter data analysis

In this case study, we discuss a performance map that uses the speedup component model. The model fully accounts for the gap between the measured speedup and the ideal speedup in each parallel program section. It assumes execution on a shared-memory multiprocessor and requires that each parallel section be fully characterized using hardware performance monitors to gather detailed processor statistics; such monitors are now available on most commodity processors.

With hardware counter and timer data loaded into Ursa Minor, users can simply click on a loop in the Ursa Minor table view and activate Merlin. Merlin then lists the numbers corresponding to the various overhead components responsible for the speedup loss in each code section. The displayed values show the overhead categories in a form that allows users to easily see why a parallel region does not exhibit the ideal speedup of p on p processors. Merlin then identifies the dominant components in the loops under inspection and suggests techniques that may reduce these overheads. An overview of the speedup component model and its implementation as a Merlin map is given below.

Performance map description

The objective of our performance map is to fully account for the performance losses incurred by each parallel program section on a shared-memory multiprocessor system. We categorize the overhead factors into four main components; the table below shows the categories and their contributing factors.

Table. Overhead categories of the speedup component model

Overhead category   Contributing factor   Description                                              Measured with
Memory stalls       IC miss               Stall due to instruction-cache miss                      HW counter
                    Write stall           The store buffer cannot hold additional stores           HW counter
                    Read stall            An instruction in the execute stage depends on an        HW counter
                                          earlier load that is not yet completed
                    RAW load stall        A read needs to wait for a previously issued write       HW counter
                                          to the same address
Processor stalls    Mispredict stall      Stall caused by branch misprediction and recovery        HW counter
                    Float dep. stall      An instruction needs to wait for the result of a         HW counter
                                          floating-point operation
Code overhead       Parallelization       Added code necessary for generating parallel code        computed
                    Code generation       More conservative compiler optimizations for             computed
                                          parallel code
Thread management   Fork/join             Latencies due to creating and terminating parallel       timers
                                          sections
                    Load imbalance        Wait time at join points due to uneven workload          timers
                                          distribution

Memory stalls reflect latencies incurred due to cache misses, memory access times, and network congestion. Merlin calculates the cycles lost due to these overheads; if the percentage of time lost is large, locality-enhancing software techniques are suggested. These techniques include optimizations such as loop interchange, loop tiling, and loop unrolling; we found loop interchange and loop unrolling to be among the most important techniques.
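As a reminder of what the first of these transformations does, the following hand-written sketch (invented array and bounds) shows a loop nest after interchange; afterwards the inner loop walks down a column of the array with stride 1, which matches Fortran's column-major storage:

      SUBROUTINE INTERCHANGE_SKETCH(N, A)
!     Hand-written sketch of loop interchange.  The original ordering
!     (outer J, inner I) would touch A(J,I) with stride N, since Fortran
!     stores arrays column by column.  After interchanging the loops,
!     the inner loop walks down one column of A with stride 1.
      INTEGER N, I, J
      REAL A(N,N)
      DO I = 1, N
         DO J = 1, N
            A(J,I) = A(J,I) + 1.0
         END DO
      END DO
      END SUBROUTINE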

Processor stalls account for delays incurred inside the processor, including branch mispredictions and floating-point dependence stalls. Although it is difficult to address these stalls directly at the source level, loop unrolling and loop fusion, if properly applied, can remove branches and give the back-end compiler more freedom to schedule instructions. Therefore, if processor stalls are a dominant factor in a loop's performance, Merlin will suggest that these two techniques be considered.

Code overhead corresponds to the time taken by instructions not found in the original serial code. A positive code overhead means that the total number of cycles, excluding stalls, consumed across all processors executing the parallel code is larger than the number used by a single processor executing the equivalent serial section. These added instructions may have been introduced when parallelizing the program (e.g., by substituting an induction variable) or by a more conservative parallel code-generating compiler. If code overhead causes performance to degrade below that of the original code, Merlin will suggest serializing the code section.
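In other words, writing busy_p for the non-stall cycles spent by processor p in a parallel section and busy_serial for the non-stall cycles of the corresponding serial section (symbols introduced here only for illustration), the code overhead is (sum over all p of busy_p) minus busy_serial; a positive value indicates instructions that exist only because the code was parallelized.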

Thread management accounts for latencies incurred at the fork and join points of each parallel section. It includes the time for creating or notifying waiting threads, for passing parameters to them, and for executing barrier operations. It also includes the idle time spent waiting at barriers, which is due to unbalanced thread workloads. We measure these latencies directly through timers placed before and after each fork and each join point. Thread management latencies can be reduced through highly optimized runtime libraries and through improved balancing of threads with uneven workloads; Merlin will suggest improved load balancing if this component is large.

Ursa Minor, combined with this Merlin map, displays the measured performance of the parallel code relative to the serial version, the execution overheads of the serial code in terms of stall cycles reported by the hardware monitor, and the speedup component model for the parallel code. We discuss details of the analysis where necessary to explain effects; for the full analysis, with detailed overhead factors and a larger set of programs, we refer the reader to the corresponding publication.

Experiment

For our experiment, we translated the original source into OpenMP parallel form using the Polaris parallelizing compiler. The source program is the Perfect Benchmark ARC2D, which Polaris parallelizes to a high degree.

We performed our measurements on a Sun Enterprise system with six UltraSPARC processors, each with an on-chip L1 data cache and a unified L2 cache. Each code variant was compiled with the Sun Fortran compiler using architecture- and cache-specific flags (-xtarget, -xcache) and full optimization. For hardware performance measurements, we used the available hardware counter (TICK register).

ARC2D consists of many small loops, each with an average execution time of a few milliseconds. The figure below shows the overheads in the loop STEPFX_do of the original code and the speedup component graphs generated before and after applying a loop interchange transformation.

Fig. Performance analysis of the loop STEPFX_do in program ARC2D. The graph on the left shows the overhead components in the original serial code. The graphs on the right show the speedup component model for the parallel code variants before and after loop interchange is applied. Each component of this model represents the change in the respective overhead category relative to the serial program. Merlin is able to generate the information shown in these graphs.

Merlin calculates the speedup component model using the data collected by a hardware counter and displays the speedup component graph. Merlin applies the following map rule based on the model: if the memory stall component appears in the performance graphs of both the serial code and the Polaris-parallelized code, then apply loop interchange. Following this suggested recipe, the user tries loop interchange, which results in a significant, now superlinear, speedup. The loop-interchange graph on the right of the figure shows that the memory stall component has become negative, which means that there are fewer stalls than in the original serial program; this negative component explains the superlinear speedup.

The speedup component model further shows that the code overhead component has decreased drastically from the original parallelized program; the code is even more efficient than the serial program, further contributing to the superlinear speedup.

In this example, the use of the performance map for the speedup component model significantly reduced the time the user spent analyzing the performance of the parallel program. It helped explain both the sources of overhead and the sources of the superlinear speedup behavior.

Performance advisor: simple techniques to improve performance

In this section, we present a performance map based solely on execution timings and static compiler information. Such a map requires only program characterization data that a novice user can easily obtain. In this study, the map is designed to advise novice programmers on improving the performance achieved by a parallelizing compiler such as Polaris. We assume that novice programmers have used a parallelizing compiler as the first step in optimizing the target program and that its static analysis information is available; the performance map presented in this section aims at improving this initial performance.

Our goal in this study is to provide users with a set of simple techniques that may help enhance the performance of a parallel program, based on data that can be generated easily: timing and static program analysis data. Based on our experience with parallel programs, we have chosen techniques that are easy to apply and may yield considerable performance gain: serialization, loop interchange, and loop fusion. They are applicable to loops, which are often the focus of the shared-memory programming model. All of these techniques are present in modern compilers; however, compilers may not have enough knowledge to apply them most profitably, and some code sections may need small modifications before the techniques become applicable automatically.

Performance map description

We have devised criteria for applying these techniques, shown in the table below. If the speedup of a parallel loop falls below a threshold, we assume that the loop is too small for parallelization or that it requires extensive modification; serializing it prevents performance degradation. Loop interchange may be used to improve locality by increasing the number of stride-1 accesses in a loop nest; loop interchange is commonly applied by optimizers, but our case study shows many examples of opportunities missed by the back-end compiler. Loop fusion can likewise be used to increase both granularity and locality. The criteria shown in the table are simple heuristics and do not attempt an exact analysis of the benefit of each technique; in particular, we simply assumed a fixed speedup threshold for applying loop fusion.

Table. Optimization technique application criteria

Technique          Criterion
Serialization      speedup below a threshold
Loop interchange   more non-stride-1 accesses than stride-1 accesses
Loop fusion        speedup below a threshold
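For illustration, the following hand-written sketch (invented arrays) shows loop fusion: two adjacent loops over the same index range are combined into one parallel loop, which increases the work per fork/join and reuses values while they are still in cache:

      SUBROUTINE FUSION_SKETCH(N, A, B, C)
!     Hand-written sketch of loop fusion.  In the original form,
!     B(I) = A(I) * 2.0 and C(I) = A(I) + B(I) would be computed in two
!     separate parallel loops; the fused loop below does both per
!     iteration, increasing granularity and reusing A(I) and B(I)
!     while they are still in cache.
      INTEGER N, I
      REAL A(N), B(N), C(N)
!$OMP PARALLEL DO
      DO I = 1, N
         B(I) = A(I) * 2.0
         C(I) = A(I) + B(I)
      END DO
!$OMP END PARALLEL DO
      END SUBROUTINE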

Experiment

We applied these techniques based on the criteria presented above, using a Sun Enterprise system with six UltraSPARC processors. The OpenMP code was generated by the Polaris OpenMP backend. Results for five programs are shown: SWIM and HYDRO2D from the SPEC floating-point benchmarks, a second SWIM version from a newer SPEC suite, and ARC2D and MDG from the Perfect Benchmarks. We applied the techniques incrementally, starting with serialization. The figure below shows the speedup achieved by the techniques. The improvement in execution time ranges from modest, for fusion in ARC2D, to substantial, for loop interchange in SWIM. For HYDRO2D, applying the Merlin suggestions did not noticeably improve performance.

Fig. Speedup achieved by applying the performance map. The speedup is with respect to a one-processor run of the serial code on a Sun Enterprise system. Each graph shows the cumulative speedup as each technique is applied.

Among the codes with large improvement, SWIM benefits most from loop interchange, which was applied, at Merlin's suggestion, to the most time-consuming loop, SHALOW_do. Likewise, the main technique that improved the performance of ARC2D was loop interchange. MDG consists of two large loops and numerous small loops; serializing the small loops was the sole reason for its performance gain. The table below gives a detailed breakdown of how often each technique was applied and its corresponding benefit.

Table. A detailed breakdown of the performance improvement due to each technique

Benchmark   Technique        Number of modifications   Improvement
ARC2D       Serialization
            Interchange
            Fusion
HYDRO2D     Serialization
            Interchange
            Fusion
MDG         Serialization
            Interchange
            Fusion
SWIM        Serialization
            Interchange
            Fusion
SWIM        Serialization
            Interchange
            Fusion

Using this map, considerable speedups were achieved with relatively small effort. Novice programmers can simply run Merlin to see the suggestions made by the map. The map can be updated flexibly without modifying Merlin; thus, if new techniques show potential or the criteria need revision, expert programmers can easily incorporate the changes.

Efficiency of the Tool Support

In order to quantitatively evaluate the efficiency of the tool support, we performed an experiment with the help of actual tool users. We prepared a set of small tasks that are commonly performed by parallel programmers and asked users to accomplish these tasks with and without our tools. In addition, we asked the tool users a series of questions to gather their opinions on the tools and their usage. The questions targeted the functionality of the tools as well as general comments on the methodology. We present the results in the following sections.

Facilitating the tasks in parallel programming

Common tasks in parallel programming

The main objective of the experiment is to produce quantitative measures of the efficiency of the tools' functionality. To this end, we selected tasks that are commonly performed by parallel programmers using parallel directives; these tasks are listed in the table below.

Table. Common tasks in parallel programming

task 1   compute the speedup of the given program on a given number of processors with respect to the serial execution time
task 2   find the most time-consuming loop based on the serial execution time
task 3   find the inner and outer loops of that loop
task 4   find the callers of the subroutine containing the most time-consuming loop
task 5   compute the parallelization and spreading overhead of that loop on a given number of processors
task 6   compute the parallel efficiency of the second most time-consuming loop on a given number of processors
task 7   export profiles to a spreadsheet to create a chart of total execution time on varying numbers of processors for the most time-consuming loops
task 8   count the loops whose speedups are below a given threshold
task 9   count the loops that are parallel and whose speedups are below a given threshold
task 10  compute the parallel coverage and the expected speedup based on Amdahl's Law

Task 1: compute the speedup of the target program. The speedup of the entire program is perhaps the most frequently used metric in computational engineering. Changes made, whether parallelization or any other type of optimization, are evaluated by the speedup gain in program execution time. The instrumentation needed to measure program execution time is simple, and any calculator can be used to compute this number.
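For example (invented numbers), a program that runs in 100 seconds serially and in 28 seconds after parallelization has a speedup of 100/28, or roughly 3.6.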

Task 2: find the most time-consuming code sections. Finding the dominant code sections using profiles is the most important task in performance tuning. Most users would look at the summary files generated from the program execution with a text editor. In this case, users would have to run a text editor (menu clicking or typing a command in a shell) and find the most time-consuming loop in the file. Looking for the largest quantity among many numbers takes a significant amount of time, at best on the order of minutes. Some users suggested using the sort command available in UNIX, as in

    cat name.sum | sort -r -k <column>

This quickly produces a sorted list of the summary file entries, but users have to remember which column to sort by, and the amount of text to type is not trivial. Moreover, if multiple files need to be presented for comparison, the sorting command alone cannot do it. By contrast, using the Ursa Minor tool, the task can be accomplished by activating the tool (typing "UM"), loading the profile (menu clicking), and sorting on the column the user chooses (popup menu clicking).

Task 3: find the inner and outer loops of a specific loop. Increasing the granularity of parallel execution is an important technique for improving parallel performance, and it involves looking into the inner or outer loops of the loop under consideration. No other tools explicitly support this task; programmers would have to use a text editor to find the loop and examine the source to figure out the loop nest. The Structure View of Ursa Minor significantly simplifies this task: users only need to load the compiler listing file (menu clicking, scrolling, and mouse clicking), find the section (scrolling or using the Find feature), and look at the display.

Task 4: find the callers of a specific subroutine. The presence of function or subroutine calls may cause the parallelizing compiler to abandon optimizing loops. Users' knowledge of the target program can be of great use in such cases, and finding the callers and callees of a subroutine or a function is an essential task in optimizing nested subroutines and loops with subroutine calls. Normally, programmers would have to examine the program source to accomplish this task; UNIX utilities such as grep can be useful. The Structure View from Ursa Minor provides one-click support for finding parents and children of selected code sections.
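The manual alternative to the Structure View amounts to scanning the source for CALL statements. The sketch below illustrates that route; the file name, subroutine name, and the simple regular expressions are assumptions for illustration, not part of the tool.

import re

def find_callers(source_path, subroutine):
    # Scan a Fortran source file and report which program units call `subroutine`.
    call_re = re.compile(rf"\bcall\s+{re.escape(subroutine)}\b", re.IGNORECASE)
    unit_re = re.compile(r"^\s*(subroutine|program|function)\s+(\w+)", re.IGNORECASE)
    current, callers = "<unknown>", set()
    with open(source_path) as f:
        for line in f:
            m = unit_re.match(line)
            if m:
                current = m.group(2)   # remember the enclosing program unit
            if call_re.search(line):
                callers.add(current)   # this unit calls the subroutine of interest
    return sorted(callers)

# Example (hypothetical file and subroutine names):
# print(find_callers("program.f", "SOLVER"))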

Task 5: compute overheads. Identifying performance problems requires first defining what the problems are. Metrics such as parallelization and spreading overheads are frequently used variables in these problem definitions, so computing them is a critical step in locating performance problems. One conventional method of computing the overheads uses a calculator; when users need to compute overheads for multiple code sections, a commercial spreadsheet or special-purpose scripts provide an easier way. The mathematical functions provided by Ursa Minor also support the derivation of new metrics from the existing data. This set of functions specifically targets parallel programming, so many of the metrics commonly used in parallel programming are included in the set. In the current version, however, the parallelization and spreading overheads are not directly supported.
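As a rough illustration of scripting such a derivation instead of using a calculator, the sketch below computes two overhead quantities from loop timings. The definitions used here (overhead relative to the serial time and to the one-processor parallel time) are assumptions for illustration; the thesis' own definitions of parallelization and spreading overhead may differ.

def parallelization_overhead(t_serial, t_parallel_1):
    # Assumed definition: extra time the parallel code spends on one processor.
    return t_parallel_1 - t_serial

def spreading_overhead(t_parallel_1, t_parallel_p, p):
    # Assumed definition: cost on p processors beyond the ideal t_parallel_1 / p.
    return t_parallel_p - t_parallel_1 / p

# Example with made-up timings for one loop:
# print(parallelization_overhead(10.0, 11.5), spreading_overhead(11.5, 3.5, 4))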

Task 6: compute parallel efficiencies. Parallel efficiency is another widely used measure for evaluating parallel performance. The parallel efficiency E(P) on P processors is defined as

    E(P) = T_serial / (P * T_parallel(P))

Users can compute this number using a calculator or a spreadsheet. Ursa Minor provides a function that computes parallel efficiency.
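The calculation itself is a one-liner; the sketch below shows it, together with the speedup of task 1, as a user might script it instead of reaching for a calculator. Argument names are illustrative.

def speedup(t_serial, t_parallel_p):
    # Speedup of a code section on p processors.
    return t_serial / t_parallel_p

def parallel_efficiency(t_serial, t_parallel_p, p):
    # E(P) = T_serial / (P * T_parallel(P)), as defined above.
    return t_serial / (p * t_parallel_p)

# Example: a loop that takes 20 s serially and 6 s on 4 processors
# print(speedup(20.0, 6.0), parallel_efficiency(20.0, 6.0, 4))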

Task 7: export profiles to a spreadsheet to create charts. An integrated toolset offers the advantage that exchanging files is easier. Data files take one specific form or another, and converting them into a form that other tools understand may not be trivial. Commercial spreadsheets do a good job of importing text-based tabular data files, such as timing profiles, and of creating a variety of graphs; combining multiple summary files becomes difficult, however. Without Ursa Minor, users would have to create a comma-separated file using Awk or Sed scripts. Adding profiles and arranging data for exporting are frequently used features of Ursa Minor; often this can be done within a minute. In addition, Ursa Minor can create charts on any columns or rows that a user selects.
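The manual route mentioned above, merging several summary files into one comma-separated file that a spreadsheet can import, is sketched below. File names and the assumption of whitespace-separated input are illustrative.

import csv

def merge_summaries_to_csv(summary_paths, out_path="profiles.csv"):
    # Write all rows from each summary file into one CSV, tagging each row
    # with the file it came from so runs on different processor counts can
    # be compared side by side in the spreadsheet.
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in summary_paths:
            with open(path) as f:
                for line in f:
                    fields = line.split()
                    if fields:
                        writer.writerow([path] + fields)

# Example: merge_summaries_to_csv(["run_1cpu.sum", "run_4cpu.sum"])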

Task 8: count loops that have problems. This is another example that emphasizes the perspective on overall performance. Users should be able to view the resulting performance in terms of large blocks of code sections, and that means dealing with the multiple loops that dominate overall performance. There is no direct support for this task in either Ursa Minor or commercial spreadsheets, but a sequence of operations can accomplish it.

Task 9: count parallel loops that have problems. The combined analysis of performance data and static program data, such as compiler listings, is more efficient in locating performance problems; this question is a simple example of such a case. Depending on the focus of the optimization (parallel optimization or general locality optimization), combining the information on the parallel nature of code blocks with their performance figures is much more efficient than dealing with each aspect separately. Conventional tools do not support this approach. The query functions available in Ursa Minor are designed specifically to help users comprehend the two different kinds of data in the same context.
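The kind of combined query described above can be expressed in a few lines once parallelization flags and timings sit in the same records. The loop records, field names, and threshold below are illustrative assumptions, not the tool's database format.

loops = [
    {"name": "loop_A", "parallel": True,  "t_serial": 12.0, "t_parallel": 7.5},
    {"name": "loop_B", "parallel": False, "t_serial":  3.0, "t_parallel": 3.1},
    {"name": "loop_C", "parallel": True,  "t_serial":  5.0, "t_parallel": 1.4},
]

def problem_parallel_loops(loops, threshold=2.0):
    # Loops the compiler marked parallel whose measured speedup is below the threshold.
    return [l["name"] for l in loops
            if l["parallel"] and (l["t_serial"] / l["t_parallel"]) < threshold]

# print(problem_parallel_loops(loops))   # -> ['loop_A']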

Task 10: compute the expected speedup based on Amdahl's law. This task represents a multi-step process of performance evaluation. Amdahl's law provides a simple performance model that can be used to evaluate actual performance. Computing the expected speedup based on Amdahl's law requires computing the parallel coverage of the target program and several further steps of computation. This task was selected to test how users employ tools to accomplish a rather complex goal; users are expected to use a combination of tools for it.
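The arithmetic behind this task is sketched below: parallel coverage is the fraction of serial execution time spent in parallelized code sections, and the expected speedup on P processors follows from Amdahl's law. The input numbers in the example are illustrative.

def parallel_coverage(t_parallel_sections, t_total):
    # Fraction of serial execution time spent in parallelized code sections.
    return t_parallel_sections / t_total

def amdahl_speedup(coverage, p):
    # Expected speedup on p processors under Amdahl's law.
    return 1.0 / ((1.0 - coverage) + coverage / p)

# Example: 90% of the serial time is in parallel loops, 8 processors
# f = parallel_coverage(90.0, 100.0)
# print(amdahl_speedup(f, 8))   # about 4.7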

Task 1 is a simple calculation, so users are expected to use either a calculator or the Expression Evaluator from Ursa Minor with comparable efficiency. Task 2 evaluates the table manipulation utilities (sorting and rearranging) for performance data. Tasks 3 and 4 target the efficiency of the Structure View and the utilities it provides. The Expression Evaluator is the main target for evaluation in tasks 5 and 6. Task 7 tests the ability to rearrange tabular data and export it to other spreadsheet applications. The remaining tasks (8, 9, and 10) attempt to evaluate the combined usage of multiple utilities: sorting, the Expression Evaluator, the query functions, the static information viewer, and the display option control provided by Ursa Minor.

Experiment

We asked four users to participate in this experiment. They were asked to perform these tasks one by one. Two different datasets were prepared for the experiment; they contain timing profiles of FLO52Q from the Perfect Benchmarks under two different environments, so the number of data items is the same in both datasets, but the profile numbers differ. First, the users were asked to perform the tasks without our tools; they were allowed to use any scripts that they had written previously. Then they performed the tasks using our tools with the other dataset.

The time to activate tools (spreadsheet, Ursa Minor, and so on) and to load input files was counted separately as loading time. The reason is that when users perform these individual tasks separately, under different environments, the loading time needs to be added to the time taken to finish each task; since the users performed the tasks in one session, they needed to activate the tools only once. Time to convert data files for different tools is also included in the loading time. Hence, the loading time also reflects the level of integration of the tools.

The four users who participated represent different classes of users. User 1 is an expert performance analyst who has written many special-purpose scripts to perform various jobs (tabularizing, sorting, etc.); he does use our tools but relies more on these scripts. User 2 has also been working on performance evaluation for a while and is considered an expert as well. He uses only basic UNIX commands rather than scripts; however, his skills with these commands are very good, so he can perform a complex task without taking much time. He started using our tools only recently. User 3 is also an expert performance analyst, but his main target programs are not shared memory programs; he has been using our tools for a long time, but with distributed memory programs. Finally, user 4 is a novice parallel programmer. His experience with parallel programs is limited compared to the others. He has read our methodology and tries to use our tools in his benchmarking research.

Table
Time in seconds taken to perform the tasks without our tools
(rows: task 1 through task 10, loading, and total; columns: user 1 through user 4 and the average)

The table above shows the time for these users to perform the assigned tasks. Two of the users decided that two of the tasks could not be performed within a reasonable time, so they gave estimated times instead. All of the users used a commercial spreadsheet later in the session, but user 4, the novice programmer, started doing the tasks only after he had set up the spreadsheet and imported the input files. User 1 used his scripts for many of the tasks.

In the second part of the experiment, users were allowed to use our tools to perform the tasks; the results are shown in the table below. One user used a combination of a spreadsheet and Ursa Minor for several tasks, while the others used a spreadsheet for one task only. One user was not sure that he could finish a particular task even with our tool support, so he gave an estimated time.

Table
Time in seconds taken to perform the tasks with our tools
(rows: task 1 through task 10, loading, and total; columns: user 1 through user 4 and the average)

As can be seen from these tables, our tool support considerably improves the time to perform common parallel programming tasks. The figure below shows the overall times to finish all the tasks. As the figure shows, our tool support not only saves time but also makes the process easier for novice programmers, resulting in comparable times across the users when our tools are used. The work speedup for each user is the ratio of the totals in the two tables.

The strength of our approach lies not only in the fact that the tools offer efficient ways of performing these individual tasks, but also in that these features are provided in an integrated toolset. This is demonstrated by the savings in loading time in our experiment. Users do not have to deal with several tools and commands, and there is no need to open the same file in many different tools. For instance, users can open the Structure View to inspect the program layout and then examine and restructure the performance data from the same database. Taking this advantage into consideration, our tool support becomes even more appealing.

Fig.: Overall times to finish all tasks

General comments from users

We summarize the users' comments on various tool features in this section. Users have responded very positively to the Structure View of Ursa Minor. We have received comments such as "There is no alternative that I know of that gives as good of an overview of the program structure quickly" and "If I am looking at a new program, one that I am unfamiliar with, I almost always look at its structure with Ursa Minor to get a feel for its layout." Although not specified in the methodology, many users examine program sources before they begin working on optimization; the Structure View offers vital help to those users.

The Table View has received good reviews as well. One response was "The Table View is good. I like its ability to combine multiple types of data." In addition, users liked the bar graph at the right side of the Table View, which visualizes numeric data instantly. The Expression Evaluator also proves to be very useful, allowing users to compute different metrics on demand. One user listed the integration of tools in a manner specific to parallel performance as one of the reasons for using our tools. However, some users were not fully content with the cumbersome interface for moving, swapping, and arranging columns, and the limited graphing capabilities were pointed out as one of the weak points of Ursa Minor. Overall, the many versatile features provided by Ursa Minor are greatly appreciated by users.

InterPol is still relatively new to users and has not been used much. Furthermore, we feel that issues remain to be resolved with respect to documentation and user interface; consequently, we did not get much feedback from users. As InterPol gains more recognition, with an improved interface and documentation, we anticipate that users will actively utilize the tool and return to us with quality feedback.

As the tools evolve in a need-driven way, feedback from the user community will provide invaluable direction for the next generation of our tool family, and we expect future upgrades of the tools to incorporate users' opinions. For instance, the weakness in the GUI can be resolved with newly available Java technology. Developers need to monitor users' needs and wishes constantly to keep up with current state-of-the-art parallel programming practices. Keeping the tool design projects and the users' application characterization efforts close together will ensure the practicality of our tools in the future.

Comparison with Other Parallel Programming Environments

In Chapter 2 we listed several parallel programming environments: Pablo and the Fortran D editor, SUIF Explorer, FORGExplorer, the KAP/Pro Toolset, the Annai Tool Project, DEEP/MPI, and Faust. We present in this section a more detailed comparison of our toolset with these environments. The table below shows the availability of features in these environments. The parallelization utility available from the Pablo/Fortran D Editor is actually semi-automatic.

Other than the debugging capability, the Ursa Minor/InterPol pair covers all of the functionalities listed in the table. In addition, our environment has unique features not available from the others. Ursa Minor's ability to freely manipulate and restructure performance data is unprecedented among these programming environments. Furthermore, Ursa Minor allows performance data to be integrated with static analysis data through a set of mathematical and query functions.

Table
Feature comparison of parallel programming environments
(features compared: performance data visualization, program structure visualization, compiler analysis output, automatic parallelization, interactive compilation, support for reasoning, automatic analysis/guidance, and debugging; environments compared: Pablo/Fortran D Editor, SUIF Explorer, FORGExplorer, KAP/Pro Toolset, Annai Project, DEEP/MPI, Faust, and Ursa Minor/InterPol)

A performance guidance system such as Merlin has not been attempted in the others either. SUIF Explorer's Parallelization Guru only points to important target code sections, and DEEP/MPI's advisor is limited to hard-coded procedure-level analysis, so detailed diagnosis of smaller code blocks is not possible. InterPol allows users to build their own parallelizing compiler; no such feature is available in other tools. Overall, the Ursa Minor/InterPol toolset offers the most versatile and flexible features to date.

Perhaps the most outstanding aspect of our toolset is its accessibility. As opposed to most other environments, which have ceased to exist or are no longer supported, Ursa Minor exists in Web-accessible forms. Any user with an Internet connection can use the tool with the help of complete online documentation. Such a quality is not easily found in most tool development projects. The topic of the next section is the efficiency of our tools placed on the World Wide Web.

Comparison of Ursa Major and the Parallel Programming Hub

In an effort to reach a larger audience with our tools, we have used network computing concepts to implement an online tuning data repository, Ursa Major, and a Web-executable integrated tool environment, the Parallel Programming Hub. Ursa Major is an Applet-based data visualization and manipulation tool for a repository of optimization studies. The Parallel Programming Hub allows users to access and run tools without the hassle of searching, downloading, and installing them.

The Parallel Programming Hub contains Ursa Minor, and Ursa Major uses many components from the Ursa Minor tool and provides almost identical functionality. This presents an interesting opportunity to compare and evaluate different approaches to network computing. In this section we compare the efficiency of Ursa Minor on the Parallel Programming Hub and of Ursa Major, providing qualitative and quantitative measures. By this comparison we attempt to provide directions for the next generation of online tools. This work was presented in a previous publication.

Batch-oriented tools run as efficiently on the Parallel Programming Hub as on local platforms; in fact, thanks to the PUNCH system's powerful underlying machine resources, most users' tools have faster response times on the Hub. Interactive tools need closer inspection.

A typical tool interaction with Ursa Minor causes the tool to fetch from a repository a program database that represents a specific parallel programming case study. It then performs various operations on this database and displays the results using Ursa Minor's visualization utilities. The table below shows how server, client, and file operations are invoked by various tasks of the two tools.

In a typical interactive tool session, a user loads input files, runs computing utilities on the data, and adds more files for further manipulation. From this scenario we chose three tool operations. We measured the time taken to load a database,

Table
Workload distribution on resources with our network-based tools

tasks                   Ursa Minor                        Ursa Major
application execution   server                            client (Applet)
database load           local disk I/O (server)           network transfer, client (Applet)
display                 network transfer, client (VNC)    client (Applet)

perform a simple spreadsheet-like operation on the data, and search and display a portion of the source code. The database load is an example of loading input data, while spreadsheet command evaluation is representative of computing on the data; the source search operation requires a simple search through a source code. Interestingly, these three operations exhibit different patterns of resource usage. For Ursa Major, the database load operation requires downloading the database, parsing it, and updating the display appropriately; hence it exercises both networking and computing capabilities. The second operation, evaluation of a spreadsheet command, performs a mathematical operation on data that the Applet has already downloaded, so it only involves computing on the client machine. The search operation mainly relies on networking: a source file is not part of the database, hence it has to be downloaded separately. For Ursa Minor, data transfer over the network is replaced by file I/O; however, the response to a user action has to be updated on the display of the remote client machine.

We chose two different databases for this experiment, representing a small and a large application study, respectively. The first database contains tuning information for the program BDNA from the Perfect Benchmarks; both the database and the accompanying source file are on the order of Kbytes in size, and we consider this to be a small database. The second database contains information about the parallelization of the RETRAN code, which represents a large power plant simulation application; its source alone is on the order of Mbytes in size.

Finally, we chose three machines on which we measured the tool response times. Networked PC is a Pentium II PC running Windows NT, connected to the Internet through an Ethernet card. Dialup PC is a home PC with a Pentium II processor running Windows; its connection to the Internet is through a modem via a local ISP. The third machine, Networked Workstation, is an UltraSPARC workstation running SunOS with a direct network connection.

We measured the response time of the three operations at regular intervals over several days using a Netscape browser. We inserted timing functions for Ursa Major and used an external wall clock for Ursa Minor on the Parallel Programming Hub, and made repeated measurements for each case. The average times are shown in the figure below, which displays the response time in seconds on the three machines for the three measured tool operations: rt-load refers to the response time to load the RETRAN database, and rt-eval and rt-search refer to the time to perform spreadsheet command evaluation and source search, respectively. The data tags with prefix bd refer to the same operations on the BDNA database.
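For readers who want to reproduce this kind of measurement, the following is an illustrative analogue of the inserted timing functions: time an operation repeatedly and report the average response time. It is a sketch only; the thesis instrumented the Java Applet itself and used a wall clock for the Hub-based tool.

import time

def average_response_time(operation, repetitions=10):
    # Run the operation several times and return the mean wall-clock duration.
    total = 0.0
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()                      # e.g., load a database or run a query
        total += time.perf_counter() - start
    return total / repetitions

# Example with a dummy operation standing in for a tool action:
# print(average_response_time(lambda: sum(range(1_000_000))))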

Overall, the networked PC exhibits the shortest response time for all operations. On this machine, the response times of Ursa Minor and Ursa Major are in the same vicinity; however, downloading a large program source significantly increases the response time of the search operation, despite the Ethernet connection. In the case of Ursa Minor, files are read through file I/O within the server, so the network is not a dominating factor. The dialup PC displays adequate response times except for the search operation with Ursa Major, where the network bottleneck is even more pronounced. The networked workstation does not suffer substantially from the network connection, but its slow processor and relatively inefficient implementation of the Java Virtual Machine (JVM) make it the worst performing platform among the three.

Fig.: The response time of UMApplet and UMParHub on (a) a networked PC, (b) a networked workstation, and (c) a dialup PC

The response time on the three different machines for each operation, as shown in the next figure, offers a different perspective. We only present the data regarding the operations on the RETRAN database, because those on the BDNA database show similar trends and the characteristics are more pronounced in the RETRAN case.

The response time of Ursa Minor does not show noticeable variations across the three machines, except on the dialup PC, where the spreadsheet command evaluation takes more than twice as long as on the others; this operation is not time-consuming, so a screen update becomes a factor with the slow modem connection. For Ursa Major, the platform becomes a deciding factor: if the network is slow, the search operation degrades, and for compute-intensive operations the machine speed and the quality of the JVM determine the response time. In all cases the Hub-based tool performs better than the Applet-based version.

Fig.: The response time of the three operations on the RETRAN database: (a) loading, (b) spreadsheet command evaluation, and (c) source searching

Our experiments show that the Parallel Programming Hub offers users a fast and stable solution for interactive network computing. The network transmits only the user's actions (pressing buttons and clicking a mouse) to and from the server, so the network and processor speed had little impact on tool usage in our experiment. By contrast, Applet-based tools rely on the client machine for computation and on the network for data transfer; thus, if the amount of data is large or the client machine is slow, the resulting operations take considerably longer. The two networked machines we used are located within the Purdue network; we expect these performance characteristics to be even more pronounced on geographically distributed machines.

Although not as responsive as the Hub-based Ursa Minor, Ursa Major serves a distinct purpose. The accumulated repository of tuning studies helps users all over the world in their efforts to study the results from other researchers and to compare results on different platforms. Users with above-average machines can take advantage of quick response by running the application on them, and the slow screen updates and sluggish mouse control that may result from a slow network connection for Ursa Minor are not a problem with Ursa Major.

An increasing number of users are taking advantage of the Parallel Programming Hub, which is being accessed by users from all over the world; Ursa Minor itself has been accessed many times since it became operational. As the Hub adds more tools and gains more recognition in the worldwide parallel programming community, we expect the number of accesses to grow at a faster rate.

Conclusions

In this chapter we have evaluated the proposed methodology and the tool support. We have presented several case studies showcasing the usage of the tools in various parallelization and tuning studies. In many studies we performed at Purdue, the proposed approach to performance tuning resulted in considerable improvement in the end results. Many features provided by the tools are actively used by programmers and, most of all, they are contained within an integrated tool environment.

In addition, we have focused on small individual tasks and shown how the tools can effectively assist users by simplifying time-consuming chores and making difficult obstacles more approachable. The sample tasks we used are commonly performed in all tuning studies, and users save considerable time and effort by using our tools. The experimental results show that our tools provide efficient support for many common tasks in parallel programming. In particular, the Expression Evaluator offers significant aid in deriving new data and computing metrics, and the Merlin performance advisor, another unique feature, simplifies the task of performance analysis considerably, as shown in the case studies.

Finally, we have evaluated the efficiency of the two different frameworks that we used to broaden the user community for our tools through network computing. Overall, the Hub-based Ursa Minor exhibited fast and uniform response times, especially in cases where large data transfers are required. On the other hand, Ursa Major does not suffer from sluggish control when the network is slow, but the time to transfer the requested data depends on the size of the database. Nevertheless, the purposes of these two tools are distinct, and they offer significant aid to parallel programmers worldwide.

As mentioned at the beginning, evaluating a methodology and tools is challenging work. This chapter represents our attempt to find ways to do so in both qualitative and quantitative terms. We would like to point out that this is not the end of our work towards a comprehensive parallel programming environment; continuous feedback from its user community will help improve the tools' service to a wide range of parallel programmers.

CONCLUSIONS

Summary

When we first started out as novice parallel programmers, we had little experience in the area. Every problem that we encountered seemed formidable and impossible to resolve; we had to resort to experts for almost every task in the optimization process, and we did not know what to do or how to do it at practically every step of the way. After a long period of trial and error, we developed our own paradigm for parallelizing and tuning programs. As our methodology was refined over the years, the tasks became routine and, most of all, we were seldom puzzled or frustrated by seemingly unexpected results. The methodology gave us the confidence that we could always find the cause of unexpected anomalies and explain the phenomena.

As more members joined our group, however, another problem arose: new members of the group experienced just about the same amount of frustration and dismay as we had. There were no publications that speak of a parallel tuning methodology in terms that both expert and novice programmers could comprehend. Our experience had not yet been documented, and the tools that intimately support it were not there. Part of the motivation for this work stems from the need to address this problem.

Now, with the proposed methodology and the tools, we believe that the framework for a structured approach to parallel programming is firmly in place. With the gaining momentum of the shared memory programming model, we feel that many users could benefit from this environment. Such a comprehensive approach, covering a wide range of tasks in parallel programming, has not been attempted previously.

The specific contribution of the work presented in this thesis is a unified framework for our approach to parallel program development. This includes a parallel programming methodology and a set of tools that support this underlying practice. Our work accomplishes this by achieving the following goals that we set out earlier.

Structured Parallel Programming Methodology. The proposed methodology lists the tasks that need to be performed in each step and the detailed suggestions that users may consider. Users obtain significant guidance, as the objective is clear in each stage. Nonetheless, it is applicable regardless of the underlying platform, the algorithms applied by the target program, or even the tools that programmers use. It is well organized and easy to follow, even for novice programmers.

Integrated Use of Parallelizing Compilers and Evaluation Tools. A combined use of Ursa Minor and InterPol (or Polaris) achieves this. Code segments are labeled as Program Units that work across both of these tools. Profile data provides insight into the dynamic behavior of the program at hand, which in turn can be used to further improve performance. Through an interactive use of these tools, which speak the same terminology, programmers get a clearer understanding of the program.

Integration of Static Analysis Information and Performance Data. Ursa Minor's ability to search and display the source significantly assists users in understanding a program's structure. In addition, Ursa Minor understands the compiler's findings and combines them into the same picture. The query functions available from Ursa Minor allow users to combine static analysis data with performance data in meaningful ways.

Support for Users' Deductive Reasoning. One of the greatest strengths of the Ursa Minor tool is its support for users' deductive reasoning. The Expression Evaluator enables reasoning about the data in numerous ways: users can compute any metrics without modifying or updating the tool, and the newly created data can be manipulated and visualized like any other data, so the tool can stay with the users throughout their reasoning process.

Potential of Automatic Performance Evaluation. Merlin has shown the potential of automatic analysis of performance and static data. It makes the transfer of experience from advanced to novice programmers easier, and tedious analysis steps can be greatly simplified.

Global Accessibility. Having Ursa Minor on the Parallel Programming Hub has opened the door for programmers worldwide to evaluate and use the tool without worrying about searching, downloading, and installing; compatibility issues are nonexistent. Also, Ursa Major provides the global parallel programming community with a database of parallel programming studies that can be easily manipulated and visualized.

Directions for Future Work

Many promising directions for further work suggest themselves.

Support for Other Parallel Programming Languages and Models. As the concept of parallel programming extends to many programming languages, the ability to support other general languages such as Java or C would promote tool usage even further. The structure of the Ursa Minor database is not limited to Fortran and can support these languages; however, a few language-sensitive features have to be reworked. Besides automatic instrumentation and the accompanying tasks, the code segment naming scheme and the incorporation of compiler listings need careful consideration. Supporting other programming models can be significantly more difficult: radically different parallel constructs and programming styles call for a new methodology to begin with. It will be interesting to see whether and how the program-level approach to parallel programming can be applied to other programming models.

Support for Program Execution Traces. The shared memory programming model inherently poses problems for parallel trace generation. Processor communications are implicit and frequent, so generating accurate traces is difficult; however, selecting the right events and performing moderate summarization can make it feasible. Timeline analysis is often critical in identifying problems such as load imbalance.

Parallel Program Debugging. Parallel program debugging is an entirely different field of study, and many challenging tasks have to be planned for and accomplished. As a programming environment, the addition of debugging capability to the toolset would greatly enhance its applicability.

Online Generation of Data Files. Further integration of Ursa Minor, the Polaris parallelizing compiler, and the runtime environment would produce an even more comprehensive environment. Supporting parallelization, compilation, and execution through a single tool would provide a highly integrated perspective and make parallel programming most approachable for novice programmers. The possibility of running and monitoring parallel execution from a remote machine has been shown by InterAct; issues such as single-user time and Ursa Minor's portability need to be resolved first.

Getting More Information from Compilers. There is still plenty of information that is kept internal to a parallelizing compiler. Extracting more useful data from a compiler and presenting it to users should be the top priority for the ongoing evaluation/optimization tool project.

Visual Development of Merlin Maps. Merlin is still in its infancy and needs more feedback and refinement. Foremost is the interface for developing a map: although Merlin maps are well structured in format, programmers currently rely on conventional text editors to create them. A better, possibly graphical, user interface would make expert programmers' jobs much easier.

Global Information Exchange among Parallel Programmers. Ursa Major has demonstrated the possibility of global communication and cooperation among parallel programmers worldwide. The obvious next step is the exchange of performance data among remote parallel programming and computer systems researchers. With proper support from the Ursa Major tool, such as the ability to submit a database, this is a definite possibility. The integrated toolset on the Parallel Programming Hub will continue to promote the usage of our databases. Advances in technology are usually the result of such combined efforts.

LIST OF REFERENCES

L Dagum and R Menon Op enMP an industry standard API for shared

memory programming Computing in Science and Engineering

January

B L Massingill A structured approach to parallel programming Metho dology

and mo dels In Proc of th IPPSSPDP Workshops Held in Conjunction

with the th International Paral lel Processing Symposium and th Symposium

on Paral lel and DistributedProcessing pages

PB Hansen Mo del programs for computational science a programming

metho dology for multicomputers Concurrency Practice and Experience

August

T Raub er and G Runger Deriving structured parallel implementations for

numerical metho ds Microprocessing and Microprogramming

April

S Gorlatch From transformations to metho dology in parallel program develop

ment a case study Microprocessing and Microprogramming

April

Michael Wolfe High Performance Compilers for Paral lel Computing Addison

Wesley Publishing Company

Michael J Wolfe Optimizing Compilers for Supercomputers PhD thesis Uni

versity of Illinois at UrbanaChampaign Octob er

Uptal Bannerjee DependenceAnalysis for SupercomputingKulwer Academic

Publishers Norwell MA

Utpal Banerjee Rudolf Eigenmann Alexandru Nicolau and David Padua

Automatic program parallelization Proceedings of the IEEE

February

Dror E Maydan John L Hennessy and Monica S Lam Ecient and exact

data dep endence analysis In Proc of ACM SIGPLAN ConferenceonPro

gramming Language Design and ImplementationOntario Canada June

Paul M Petersen and David A Padua Static and dynamic evaluation of data

dep endence techniques IEEE Transactions on Paral lel and DistributedSys

tems November

Michael J Voss Portable lo oplevel parallelism for shared memory multipro ces

sor architectures Masters thesis Scho ol of ECE Purdue University Octob er

Nirav H Kapadia and JoseABFortes On the design of a demandbased

networkcomputing system The purdue universitynetwork computing hubs In

Proc of IEEE Symposium on High Performance Distributed Computing pages

Chicago IL

D A Bader and J JaJa SIMPLE a metho dology for programming high p er

formance algorithms on clusters of symmetricmultipro cessors SMPs Journal

of Paral lel and Distributed Computing July

B Buttarazzi A metho dology for parallel structured programming in logic

environments International Journal of Mini and Microcomputers

Message Passing Interface Forum MPI A messagepassing interface standard

Technical rep ort UniversityofTennessee Knoxville Tennessee May

A Beguelin J Dongarra A Geist R Manchek S Otto and J Walp ole PVM

Exp eriences current status and future direction In Proc of Supercomputing

pages November

ANSI XH Paral lel Extensions for Fortran XHSDRevision m edition

April

Kuck and Asso ciates Champaign IL Guide Reference Manualversion

edition Septemb er

David J Kuck The eects of program restructuring algorithm change and ar

chitecture choice on program p eformance In Proc of International Conference

on Paral lel Processing pages St Charles Ill August

Randy Allen and Ken Kennedy Automatic translation of Fortran programs

to vector form ACM Transactions on Programming Languages and Systems

Octob er

FAllenMBurke P Charles R Cytron and J Ferrante An overview of the

PTRAN analysis system for multipro cessing Journal of Paral lel and Distributed

Computing Octob er

William Blume Ramon Doallo Rudolf Eigenmann John Grout Jay Ho einger

Thomas Lawrence Jaejin Lee David Padua Yunheung Paek Bill Pottenger

Lawrence Rauchwerger and Peng Tu Parallel programming with Polaris IEEE

Computer December

M W Hall J M Anderson S P Amarasinghe B R Murphy SW Liao

E Bugnion and M S Lam Maximizing multipro cessor p erformance with the

SUIF compiler IEEE Computer December

AnthonyJGHey Highp erformance computingpast present and future

Computing and Control Engineering Journal February

R W Numrich J L Steidel B H Johnson B D de Dinechin G Elsesser

G Fischer and T MacDonald Denition of the F extension to Fortran

In Proc of the Workshop of Languages and Compilers for Paral lel Computing

pages SpringerVerlag August

R von Hanxleden K Kennedy and J Saltz Valuebased distributions in For

tran D In Proc of International Conference on HighPerformance Computing

and Networking pages SpringerVerlag April

High Performance Fortran Forum High Performance Fortran language sp ec

ication version Technical rep ort Rice University Houston Texas May

Microsoft Visual C httpmsdnmicrosoftcomvisualc

Microsoft Visual Basic httpmsdnmicrosoftcomvbasic

A Beguelin J Dongarra A Geist R Manchek K Mo ore R Wade and

V Sunderam HeNCE Graphical development to ols for networkbased concur

rent computing In Proc of Scalable High Performance Computing Conference

pages April

J Schaeer D Szafron G Lob e and I Parsons The Enterprise mo del for

developing distributed applications IEEE Paral lel and DistributedTechnology

JanuaryMarch

P Newton and J C Browne The CODE graphical parallel programming

language In Proc of International ConferenceonSupercomputing pages

July

P Kacsuk G Dozsa and T Fadgyas Designing parallel programs bythe

graphical language GRAPNEL Microprocessing and Microprogramming

April

O Lo ques J Leite and E V Carrera PRIO a mo dular parallelprogramming

environment IEEE Concurrency JanuaryMarch

N Stankovic and K Zhang Visual programming for messagepassing sys

tems International Journal of Software Engineering and Know ledge Engineer

ing August

Barr E Bauer Practical Paral lel Programming Academic Press

Silicon Graphics Inc PerformanceTuning Optimization for Origin

and Onyx httptechpubssgicomlibrarymanuals

htmlOTuninghtml

Boston Univeristy Introduction to Paral lel Processing on SGI Shared Memory

Computers httpscvbueduSCVTutorialsSMP

University of Illinois at UrbanaChampaign CSECSECE

httpwwwcseuiuceducse

University of California at Berkeley UC Berkeley CS Home Page Ap

plications of Paral lel Computers httpHTTPCSBerkeleyEDU dem

melcs

Georey C Fox Roy D Williams and Paul C Messina Paral lel Computing

Works Morgan Kaufmann Publishers

Ian Foster Designing and Building Paral lel Programs Addison Wesley

D Cheng and R Ho o d A p ortable debugger for parallel and distributed pro

grams In Proc of Supercomputing pages Novemb er

J May and F Berman Retargetability and extensibility in a parallel debugger

Journal of Paral lel and Distributed Computing June

Pallas TotalView httpwwwpallasdepagestotalvhtm

Kuck and Asso ciates Inc KAPProToolsethttpwwwkaicom

Vincent Guarna Jr Dennis Gannon David Jablonowski Allen Malonyand

Yogesh Gaur Faust An integrated environment for the development of parallel

programs IEEE Software July

Bill App elb e Kevin Smith and Charles McDowell StartPat A parallel

programming to olkit IEEE Software July

V Balasundaram K Kennedy U Kremer K McKinley and J Subhlok The

ParaScop e editor An interactive parallel programming to ol In Proc of Super

computing Conference pages

M W Hall T J Harvey K Kennedy N McIntosh K S McKinleyJD

Oldham M H Paleczny and G Roth Exp eriences using the ParaScop e editor

An interactive parallel programming to ol In Proc of Principles and Practices

of Paral lel Programming pages May

Rudolf Eigenmann and Patrick McClaughry Practical to ols for optimizing

parallel programs In Proc of the Simulation Multiconference on the High

Performance Computing Symposium pages March

W Liao A Diwan R PBosch Jr A Ghuloum and M S Lam SUIF explorer

An interactive and interpro cedural parallelizer In Proc of the th ACM SIG

PLAN Symposium on Principles and PracticeofParal lel Programming pages

August

Applied Parallel Research Inc Forge Explorer httpwwwapricom

Seema Hiranandani Ken Kennedy and ChauWen Tseng Compiling For

tran d for MIMD distributedmemory machines Communications of the ACM

August

V S Adve J MellorCrummey M Anderson K KennedyJCWang and

D A Reed An integrated compilation and p erformance analysis environment

for data parallel programs In Proc of Supercomputing Conference pages

S P Johnson C S Ierotheou and M Cross Automatic parallel co de genera

tion for message passing on distributed memory systems Paral lel Computing

February

S P Johnson P F Leggett C S Ierotheou E W Evans and M Cross

Computer AidedParal lelisation Tools CAPTools TutorialsParallel Pro cess

ing Research Group University of Greenwich Octob er CAPTo ols Version Beta

Central Institute for Applied Mathematics PCL The Performance Counter

Library A Common InterfacetoAccess Hardware Performance Counters on

MicroprocessorsNovember

Louis Lop ez The NAS Trace Visualizer NTV Rel Users Guide NASA

Septemb er

Michael T Heath and Jennifer A Etheridge Visualizing the p erformance of

parallel programs IEEE Software Septemb er

Universite de MarnelaVallee PGPVM httpphalanstereuniv

mlvfr svPGPVM

Daniel A Reed Exp erimental p erformance analysis of parallel systems Tech

niques and op en problems In Proc of the th Int Conf on Model ling Techniques

and Tools for Computer Performance Evaluation pages

W E Nagel A Arnold M Web er H C Hopp e and K Solchenbach VAM

PIR visualization and analysis of MPI resources Supercomputer

January

J Yan S Sarukkai and P Mehra Performance measurement visualization

and mo deling of parallel and distributed programs using the AIMS to olkit

SoftwarePractice and Experience April

Barton P Miller Mark D Callaghan Jonathan M Cargille Jerey K

Hollingsworth R Bruce Irvin Karen L Karavanic Krishna Kunchithapadam

and Tia Newhall The Paradyn parallel p erformance measurement to ol IEEE

Computer November

S Shende A D MalonyJCuny K Lindlan PBeckman and S Karmesin

Portable proling and tracing for parallel scientic applications using C In

Proc of ACM SIGMETRICS Symposium on Paral lel and DistributedTools

pages August

PacicSierra Research DEEPMPI Development Environment

for MPI Programs Paral lel Program Analysis and Debugging

mpi tophtml httpwwwpsrvcomdeep

B J N Wylie and A Endo AnnaiPMA multilevel hierarchical parallel pro

gram p erformance engineering In Proc of International Workshop on High

Level Programming Models and Supportive Environments pages

LAM Team University of North Dakota XMPI A RunDebug GUI for

MPI httpwwwmpindedulamsoftwarexmpi

A D Malony D H Hammerslag and D J Jablonowski TraceView a trace

visualization to ol IEEE Software Septemb er

Michael T Heath Performance visualization with ParaGraph In Proc of the

Second Workshop on Environments and Tools for Paral lel Scientic Computing

pages May

E Lusk Visualizing parallel program b ehavior In Proc of Simulation Mul

ticonference on the High Performance Computing Symposium pages April

Y Arrouye Scop e an extensible interactiveenvironment for the p erformance

evaluation of parallel system Microprocessing and Microprogramming

April

J A Kohl and G A Geist The PVM tracing facility and XPVM g In

Proc of the TwentyNinth Hawaii International Conference on System Sciences

pages January

B Top ol J T Stasko and V Sunderam PVaniM A to ol for visualization

in network computing environments Concurrency Practice and Experience

December

G Weiming G Eisenhauer K Schwan and J Vetter Falcon Online mon

itoring for steering parallel programs Concurrency Practice and Experience

August

J T Stasko and E Kraemer A metho dology for building applicationsp ecic

visualizations of parallel programs Journal of Paral lel and Distributed Com

puting June

G A Geist I I J A Kohl and PMPapadop oulos CUMULVS Providing

fault tolerance visualization and steering of parallel applications International

Journal of Supercomputer Applications Fall

K C Li and K Zhang Tuning parallel program through automatic program

analysis In Proc of Second International Symposium on Paral lel Architectures

Algorithms and Networks pages June

A Reinefeld R Baraglia T Decker J Gehring D Laforenza F Ramme

T Romke and J Simon The MOL pro ject An op en extensible metacomputer

In Proc of the IEEE Heterogeneous Computing Workshop pages

H Casanova and J Dongarra NetSolve a network enabled server for solv

ing computational science problems International Journal of Supercomputer

Applications Fall

M Sato H Nakada S Sekiguchi S Matsuoka U Nagashima and H Tak

agi Ninf a networkbased information library for global worldwide computing

infrastructure In Proc of HighPerformance Computing and Networking In

ternational Conference and Exhibition pages April

P Arb enz W Gander and M Oettli The Remote Computation System

Paral lel Computing Octob er

T Richardson Q StaordFraser K R Wo o d and A Hopp er Virtual network

computing IEEE Internet Computing JanuaryFebruary

Citrix ICA technical paperhttpwwwcitrixcompro ductsicaasp

I Foster and C Kesselman Globus A metacomputing infrastructure to olkit

International Journal of Supercomputer Applications Summer

A S Grimshaw and W A Wulf The Legion vision of a worldwide virtual

computer Communications of the ACM January

Insung Park Nirav H Kapadia Renato J Figueiredo Rudolf Eigenmann and

JoseABFortes Towardsanintegrated webexecutable parallel program

ming to ol environment To app ear in the Proc of SCHigh Performance

Networking and Computing

B LaRose The development and implementation of a p erformance database

server Technical Rep ort CS UniversityofTennessee August

The University of Southampton GRAPHICAL BENCHMARK INFORMA

TION SERVICE GBIS httpwwwccgecssotonacukgbispapiani

newgbishtml

Cherri M Pancake and Curtis Co ok What users need in parallel to ol supp ort

Survey results and analysis In Proc of Scalable High Performance Computing

Conference pages March

Roger S Pressman Software Engineering a Practitioners Approach McGraw

Hill Inc New York NY

Peter Pacheco Paral lel Programming with MPI Morgran Kaufman Publishers

D Culler J P Singh and A Gupta Paral lel Computer Architecture Morgran

Kaufman Publishers

Rudolf Eigenmann Toward a metho dology of optimizing programs for high

p erformance computers In Proc of ACM International ConferenceonSuper

computing pages Tokyo Japan July

Seon Wo ok Kim and Rudolf Eigenmann Detailed quantitative analysis of

sharedmemory parallel programs Technical Rep ort ECEHPCLab HP

CLAB Scho ol of ECE Purdue University

Seon Wook Kim Michael J Voss and Rudolf Eigenmann Performance analysis

of parallel compiler backends on sharedmemory multipro cessors In Proc of the

Tenth Workshop on Compilers for Paral lel Computers pages January

Rudolf Eigenmann Insung Park and Michael J Voss Are parallel workstations

the right target for parallelizing compilers In Lecture Notes in Computer

Science No Languages and Compilers for Paral lel Computing pages

March

Michael J Voss Insung Park and Rudolf Eigenmann On the machine

indep endent target language for parallelizing compilers In Proc of the Sixth

Workshop on Compilers for Paral lel ComputersAachen Germany December

Insung Park Michael J Voss and Rudolf Eigenmann Compiling for the new

generation of highp erformance SMPs Technical Rep ort ECEHPCLab

HPCLAB Scho ol of ECE Purdue UniversityNovember

Lynn Pointer Perfect Performance evaluation for costeective tranforma

tions rep ort Technical Rep ort Center for Sup ercomputing Researchand

Development University of Illinois at UrbanaChampaign March

Insung Park Michael J Voss Brian Armstrong and Rudolf Eigenmann Inter

active compilation and p erformance analysis with ursa minorInProc of the

Workshop of Languages and Compilers for Paral lel Computing pages

SpringerVerlag August

Insung Park Michael J Voss Brian Armstrong and Rudolf Eigenmann Par

allel programming and p erformance evaluation with the ursa to ol family In

ternational Journal of Paral lel Programming Novemb er

Insung Park Michael J Voss Brian Armstrong and Rudolf Eigenmann Sup

p orting users reasoning in p erformance evaluation and tuning of parallel ap

plications To app ear in Proc of the Twelth IASTED International Conference

on Paral lel and Distributed Computing and SystemsNovemb er

Seon Wo ok Kim Insung Park and Rudolf Eigenmann A p erformance advisor

to ol for novice programmers in parallel programming To app ear in the Proc

of the Workshop of Languages and Compilers for Paral lel Computing

Stefan Kortmann Insung Park Michael Voss and Rudolf Eigenmann Inter

active and mo dular optimization with interpol In Proc of the In

ternational ConferenceonParal lel and DistributedProcessing Techniques and

Applications pages June

Michael J Voss Kwok Wai Yau and Rudolf Eigenmann Interactive instru

mentation and tuning of Op enMP programs Technical Rep ort ECEHPCLab

HPCLAB

SeonWo ok Kim and Rudolf Eigenmann MaxP Detecting the Maximum Par

al lelism in a Fortran Program HPCLAB

Insung Park and Rudolf Eigenmann Ursa Major Exploring web technology

for design and evaluation of highp erformance systems In Proc of the Inter

national Conference on High Performance Computing and Networking pages

Berlin Germany April SpringerVerlag

T Nakra R Gupta and M L Soa Value prediction in VLIW machines

In Proc of the th International Symposium on Computer Architecture pages

May

Trimaran Homepage Trimaran Manual

httpwwwtrimaranorgdo cshtml

A D Alexandrov M Ib el K E Schauser and C J Scheiman UFO A p er

sonal global le system based on userlevel extensions to the op erating system

ACM Transactions on Computer Systems August

Rudolf Eigenmann and Siamak Hassanzadeh Benchmarking with real indus

trial applications The SPEC HighPerformance Group IEEE Computational

Science Engineering Spring

David L Weaver and Tom Germond The SPARCArchitecture Manual Version

SPARCInternational Inc PTR Prentice Hall Englewo o d Clis NJ

T J Downar JenYing Wu J Steill and R Janardhan Parallel and serial

applications of the RETRAN p ower plantsimulation co de using domain

decomp osition and Krylov subspace metho ds Nuclear Technology

February

VITA

Insung Park was born in February in Seoul, South Korea. He received his BS degree in control and instrumentation engineering from Seoul National University and his MS degree in Electrical Engineering from the Virginia Polytechnic Institute and State University, Blacksburg, Virginia. He successfully defended his PhD research in August at the School of Electrical and Computer Engineering at Purdue University and was awarded the PhD in December of the same year.

Insung Park served for several years as a system administrator of the electrical engineering departmental workstation laboratory. During his MS study he developed a partial scan design tool, BELLONA. As a PhD student at Purdue, he designed and implemented a parallel programming environment consisting of a programming methodology and a set of tools.

He is a member of the honor society of Phi Kappa Phi.