
Performance Optimization of a Structured CFD Code - GHOST on Commodity Cluster Architectures


University of Kentucky UKnowledge

University of Kentucky Master's Theses Graduate School

2008

PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES

Pavan K. Kristipati University of Kentucky, [email protected]


Recommended Citation Kristipati, Pavan K., "PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES" (2008). University of Kentucky Master's Theses. 567. https://uknowledge.uky.edu/gradschool_theses/567

This Thesis is brought to you for free and open access by the Graduate School at UKnowledge. It has been accepted for inclusion in University of Kentucky Master's Theses by an authorized administrator of UKnowledge. For more information, please contact [email protected].

ABSTRACT OF THESIS

PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES

This thesis focuses on optimizing the performance of an in-house, structured, 2D CFD code, GHOST, on commodity cluster architectures. The basic philosophy of the work is to optimize the cache usage of the code by implementing efficient coding techniques without changing the underlying numerical algorithm. The various optimization techniques that were implemented and the resulting changes in performance are presented. Two techniques, external and internal blocking, that were implemented earlier to tune the performance of this code are reviewed. What follows is a further tuning effort to circumvent the problems associated with using the blocking techniques. Later, to establish the universality of the optimization techniques, testing was done on a more complicated test case. All the techniques presented in this thesis were tested on steady, laminar test cases. It is shown that the optimized versions of the code achieve better performance on the variety of commodity cluster architectures chosen in this study.

KEYWORDS: Cache Optimization, Structured CFD Code Optimization, Efficient Coding Techniques, Improving Performance Without Changing Algorithm, Commodity Cluster Architectures.

Pavan K. Kristipati

12/3/2008

Copyright © Pavan K Kristipati, 2008

PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES

By

Pavan K. Kristipati

Dr. Raymond P. LeBeau
Director of Thesis

Dr. L. S. Stephens
Director of Graduate Studies

12/3/2008

RULES FOR THE USE OF THESIS

Unpublished theses submitted for the Master's degree and deposited in the University of Kentucky Library are as a rule open for inspection, but are to be used only with due regard to the rights of the authors. Bibliographical references may be noted, but quotations or summaries of parts may be published only with the permission of the author, and with the usual scholarly acknowledgements.

Extensive copying or publication of the thesis in whole or in part also requires the consent of the Dean of the Graduate School of the University of Kentucky.

A library that borrows this thesis for use by its patrons is expected to secure the signature of each user.

Name Date

THESIS

Pavan K. Kristipati

The Graduate School

University of Kentucky

2008

PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES

THESIS

A thesis submitted in partial fulfillment of the

requirements for the degree of Master of Science in Mechanical Engineering

in the College of Engineering at the University of Kentucky

By

Pavan K. Kristipati

Lexington, Kentucky

Director: Dr. Raymond P. LeBeau

Assistant Professor of Mechanical Engineering
University of Kentucky, Lexington, Kentucky
2008

To my wife Sudhira Kristipati

ACKNOWLEDGEMENTS

I would like to express my deepest appreciation and respect to my academic advisor, Dr. Raymond P. LeBeau, who has not only been a source of constant support and encouragement in many ways throughout my Masters program but also gave me the opportunity to complete my Masters program after being away from the university for nearly 4 years. He is one of the very few people I have come across who strives for excellence and creates an atmosphere for us to excel. Dr. LeBeau, I am honoured to be your student. I would like to thank Dr. Johné M. Parker and Dr. T. Michael Seigler for agreeing to be part of my thesis committee and for their valuable time in spite of their hectic schedules. Their input made this a better thesis. Sincere thanks to Dr. George Huang, for teaching me how to understand a CFD code and for letting me work on his code. A million thanks to my wife Sudhira, the backbone behind this accomplishment; it would not have been possible for me to wrap up this work after a gap of nearly 4 years without her help and her constant belief that I could do this. I want to thank my parents (Ashok and Visala) and Sudhira's parents (Prabhakara Rao and Bhanumathi) for believing in me. Dad – I finally made your life ambition come true!! My gratitude is beyond words for my uncle Satyam and aunt Bhavani, the couple who are the reason I was able to come to the USA and who constantly challenged me to complete my graduate degree. Special thanks to the Graduate School at the University of Kentucky for offering me the Kentucky Graduate Scholarship (KGS), and to my life mentors Prakash Gupta and Dr. Santhosh Kumar.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS ...... iii
LIST OF FIGURES ...... vii
LIST OF TABLES ...... x
LIST OF FILES ...... xi

1. INTRODUCTION...... 1

1.1 WHY CFD? ...... 1
1.2 SOLUTION PROCESS IN CFD ...... 2
1.3 CFD AND COMPUTING POWER ...... 4
1.4 PARALLEL COMPUTING AND BEOWULF CLUSTERS ...... 6
1.5 INTRODUCTION TO PROBLEM ...... 8
1.6 PRESENT WORK ...... 9

2. CACHE-BASED ARCHITECTURES ...... 11

2.1 INTRODUCTION ...... 11
2.2 EVOLUTION OF CACHE-BASED ARCHITECTURES ...... 11
2.3 MEMORY ARCHITECTURE ...... 12
2.4 CACHE'S ROLE IN THE PERFORMANCE OF A CODE ...... 14
2.4.1 LOCALITY OF REFERENCE ...... 14
2.4.2 CACHE HIT AND CACHE MISS ...... 16
2.4.3 HOW CACHE MEMORY WORKS ...... 18
2.5 CACHE MEMORY ORGANIZATION ...... 19
2.6 CACHE OPTIMIZATION GUIDELINES ...... 25
2.6.1 TECHNIQUES FOR OPTIMIZING MEMORY ACCESS ...... 25
2.6.1.1 OPTIMAL DATA LAYOUT ...... 26
2.6.1.2 LOOP INTERCHANGE ...... 26
2.6.1.3 USING DATA STRUCTURES INSTEAD OF ARRAYS ...... 27
2.6.1.4 LOOP BLOCKING ...... 30
2.6.2 OPTIMIZING FLOATING POINT OPERATIONS ...... 30
2.6.2.1 REMOVING FLOATING IFS ...... 31
2.6.2.2 REMOVING UNWANTED CONSTANTS INSIDE LOOPS ...... 31
2.6.2.3 AVOIDING UNNECESSARY RECALCULATIONS INSIDE LOOPS ...... 31
2.6.2.4 REDUCING DIVISION LATENCY BY USING RECIPROCALS ...... 32
2.6.2.5 SUBROUTINE INLINING ...... 33
2.6.2.7 LOOP UNWINDING ...... 34
2.6.2.8 LOOP DEFACTORIZATION ...... 35
2.6.3 OPTIMIZING FLOATING POINT OPERATIONS ...... 35
2.7 PREVIOUS WORK ...... 35

3. COMPUTATIONAL TOOLS...... 38

3.1 DESCRIPTION OF GHOST ...... 38
3.1.1 GHOST FLOW CHART ...... 40
3.1.2 GOVERNING EQUATIONS ...... 43
3.1.3 CALCULATION AT ARTIFICIAL BOUNDARIES ...... 43
3.2 COMPUTATIONAL GRID ...... 44
3.2.1 FINITE VOLUME METHOD ...... 44
3.2.2 GRID FILES ...... 46
3.2.3 DESCRIPTION OF INPUT FILE ...... 47
3.3 COMPILERS AND MPI ENVIRONMENT ...... 48


3.4 PROFILING TOOLS ...... 49
3.4.1 CACHEGRIND ...... 51
3.5 KENTUCKY FLUID CLUSTERS ...... 53
3.6 METHODS USED TO MEASURE PERFORMANCE ...... 55
3.7 EXTERNAL BLOCKING ...... 56
3.8 CHARACTERISTICS OF ORIGINAL CODE ...... 56
3.8.1 DETAILS OF CRITICAL SUBROUTINES ...... 58
3.9 SUMMARY ...... 61

4. STAGE ONE PERFORMANCE TUNING RESULTS ...... 62

4.1 TYPES OF TESTS ...... 62
4.2 TEST CASE ...... 64
4.3 PERFORMANCE BEHAVIOR OF V0 ...... 65
4.4 TUNING PROCESS – CODE VERSIONS ...... 72
4.5 CODE CHANGES AND PERFORMANCE TUNING RESULTS ...... 72
4.5.1 VERSION 1 OR V1 ...... 72
4.5.1.1 KFC3 AND KFC4 RESULTS ...... 73
4.5.1.2 KFC6 RESULTS ...... 76
4.5.2 VERSION 2 OR V2 ...... 79
4.5.2.1 AVOIDING UNNECESSARY RECALCULATIONS ...... 79
4.5.2.2 AVOIDING DIVISION INSIDE LOOPS ...... 79
4.5.2.3 MERGING LOOPS ...... 80
4.5.2.4 GETTING RID OF IF-THEN IN LOOPS ...... 82
4.5.2.5 KFC3 AND KFC4 RESULTS ...... 82
4.5.2.6 KFC6 RESULTS ...... 83
4.5.3 VERSION 3 OR V3 ...... 91
4.5.3.1 KFC3 AND KFC4 RESULTS ...... 92
4.5.3.2 KFC6 RESULTS ...... 97
4.6 SUMMARY OF OPTIMIZATION EFFORT AND RESULTS OF FURTHER EFFORTS OF TUNING GHOST ...... 100
4.6.1 OVERALL PERFORMANCE ...... 105
4.7 ACCURACY RESULTS ...... 107
4.8 SUMMARY AND FURTHER WORK ...... 107

5. STAGE TWO PERFORMANCE TUNING RESULTS ...... 108

5.1 EXTERNAL BLOCKING ...... 108
5.1.1 KFC4 AND KFC5 RESULTS ...... 108
5.2 INTERNAL BLOCKING ...... 112
5.2.1 IMPLEMENTATION OF INTERNAL BLOCKING IN GHOST ...... 114
5.2.2 KFC3 AND KFC4 RESULTS ...... 114
5.2.3 ACCURACY TEST RESULTS ...... 116
5.3 FURTHER TUNING EFFORT ON GHOST ...... 116
5.4 RESULTS OF FURTHER TUNING EFFORT ...... 117
5.4.1 VERSION 8 OR V8 ...... 117
5.4.1.1 CHANGING THE ORDER OF CONDITION CHECK IN IF STATEMENTS ...... 117
5.4.1.2 IMPLEMENTING RECIPROCAL CACHE ...... 118
5.4.1.3 LOOP MERGING ...... 119
5.4.1.4 KFC3 AND KFC4 RESULTS ...... 119
5.4.1.5 KFC6 RESULTS ...... 120
5.5 AIR FOIL TEST CASE ...... 129
5.5.1 PERFORMANCE TEST RESULTS ...... 130
5.6 SUMMARY ...... 131


6. CONCLUSIONS AND FUTURE WORK...... 133

6.1 SUMMARY AND CONCLUSIONS ...... 133
6.2 IMPACT OF CURRENT WORK ...... 136
6.3 FUTURE WORK ...... 137

REFERENCES...... 139

VITA ...... 144


LIST OF FIGURES

FIGURE 1-1 CFD ANALYSIS PROCESS ...... 3
FIGURE 1-2 CONTINUOUS DOMAIN AND DISCRETE DOMAIN ...... 4
FIGURE 1-3 1-D GRID ...... 4
FIGURE 2-1 MEMORY HIERARCHY IN A MODERN DAY COMPUTER [30] ...... 13
FIGURE 2-2 ILLUSTRATION OF WORKING OF LOCALITY OF REFERENCE ...... 16
FIGURE 2-3 512 KB L2 MEMORY [31] ...... 20
FIGURE 2-4 DIRECT MAPPING CACHE [31] ...... 21
FIGURE 2-5 4-WAY ASSOCIATIVE 512 KB L2 CACHE MEMORY [31] ...... 23
FIGURE 2-6 512 KB L2 CACHE MEMORY CONFIGURED AS 4-WAY ASSOCIATIVE [31] ...... 23
FIGURE 2-7 ILLUSTRATION OF LOOP TRANSFORMATION ...... 27
FIGURE 2-8 EXAMPLE TO ILLUSTRATE THE IMPORTANCE OF DEFINITION OF DATA STRUCTURE ...... 28
FIGURE 2-9 ARITHMETIC OPERATIONS ON ARRAY ELEMENTS ...... 28
FIGURE 2-10 USING DATA STRUCTURES INSTEAD OF ARRAYS ...... 29
FIGURE 2-11 SCHEMATIC REPRESENTATION OF MEMORY STORAGE FOR SEEMINGLY IDENTICAL STRUCTURES ...... 30
FIGURE 2-12 ILLUSTRATION OF LOOP BLOCKING ...... 30
FIGURE 2-13 EXAMPLE TO ILLUSTRATE COST OF CALLING A SUBROUTINE ...... 33
FIGURE 3-1 FLOWCHART DEPICTING THE WORKING OF GHOST [53] ...... 41
FIGURE 3-2 CONTENTS OF THE FILE MPI.IN ...... 41
FIGURE 3-3 ILLUSTRATION OF ARTIFICIAL BOUNDARIES ...... 45
FIGURE 3-4 A GRID IN GENERALIZED COORDINATE SYSTEM [64] ...... 46
FIGURE 3-5 DESCRIPTION OF INPUT FILE FOR A 4 ZONE GRID ...... 48
FIGURE 3-6 DESCRIPTION OF INPUT FILE FOR A 1 ZONE GRID ...... 48
FIGURE 3-7 SAMPLE OUTPUT FROM CACHEGRIND ...... 52
FIGURE 3-8 KENTUCKY FLUID CLUSTERS (KFC) 3, 4 AND 5 ...... 54
FIGURE 3-9 EXTERNAL BLOCKING ...... 58
FIGURE 3-10 EXAMPLE OF MISMATCH BETWEEN DATA ACCESS AND DATA STORAGE ...... 59
FIGURE 3-11 EXAMPLE OF REPETITIVE REFERENCE TO ARRAY ELEMENTS IN GHOST ...... 60
FIGURE 4-1 L2 CACHE MISS RATE FOR GHOST AS A FUNCTION OF ITERATIONS FROM A COLD START ON KFC4 ...... 64
FIGURE 4-2 SCHEMATIC DIAGRAM OF LID-DRIVEN CAVITY SHOWN WITH BOUNDARY CONDITIONS ...... 65
FIGURE 4-3 THE MIDLINE U-VELOCITY PROFILE FOR THE ORIGINAL VERSION OF GHOST ...... 65
FIGURE 4-4A ORIGINAL SPEEDUP OF GHOST ON KFC3, KFC4 FOR GRIDS OF VARYING SIZE ...... 67
FIGURE 4-4B ORIGINAL SPEEDUP OF GHOST ON KFC6A, KFC6I FOR GRIDS OF VARYING SIZE ...... 67
FIGURE 4-5 ORIGINAL WALLTIME OF GHOST ...... 68
FIGURE 4-6 WALLTIME AS A FUNCTION OF SUBGRID SIZE (OR GRID SIZE FOR A SINGLE NODE CASE) FOR GHOST (V0) ON KFC3 ...... 70
FIGURE 4-7 L2 AND L2D CACHE MISS RATE FOR GHOST (V0) ON KFC3 AND KFC4 ...... 70
FIGURE 4-8 COMPARISONS OF L2 CACHE MISS RATE ON KFC4 (BLUE AND GREEN LINES) AND THE WALLTIME/MB (LINES) VERSUS THE RAM FOOTPRINT OF THE GIVEN GRID/SUBGRID FOR GHOST (V0) ...... 71
FIGURE 4-9 COMPARISON OF WALLTIME BETWEEN V0 AND V1 ON KFC3 AND KFC4 ...... 74
FIGURE 4-10 COMPARISON OF D1, L2 AND L2D CACHE MISS RATES FOR V0 AND V1 ON KFC4 ...... 75
FIGURE 4-11 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC4 FOR GHOST (A) VERSION 0 (B) VERSION 1 ...... 76
FIGURE 4-12 COMPARISON OF WALLTIME BETWEEN V0 AND V1 ON KFC6A AND KFC6I ...... 77


FIGURE 4-13 COMPARISON OF D1, L2 AND L2D CACHE MISS RATES FOR V0 AND V1 ON KFC6I ...... 77
FIGURE 4-14 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC6I FOR GHOST (A) V0 (B) V1 ...... 78
FIGURE 4-15 USING RECIPROCAL CACHE IN SUBROUTINE CONT ...... 80
FIGURE 4-16 MERGING NESTED LOOPS IN CAL_U ...... 81
FIGURE 4-17A MERGING NESTED LOOPS IN CAL_U ...... 84
FIGURE 4-17B MERGING NESTED LOOPS IN CAL_V ...... 85
FIGURE 4-19 COMPARISON OF WALLTIME ON KFC3 AND KFC4 FOR V2 AND V1 ...... 86
FIGURE 4-18 REMOVING IF-THEN-ELSE INSIDE LOOPS IN V1 ...... 87
FIGURE 4-20 COMPARISON OF D1, L2 AND L2D CACHE MISS RATES FOR V1 AND V2 ON KFC4 ...... 88
FIGURE 4-21 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC4 FOR GHOST (A) V1 (B) V2 ...... 89
FIGURE 4-22 COMPARISON OF WALLTIME BETWEEN V1 AND V2 ON KFC6A AND KFC6I ...... 90
FIGURE 4-23 COMPARISON OF D1, L2 AND L2D CACHE MISS RATES FOR V1 AND V2 ON KFC6I ...... 90
FIGURE 4-24 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC6I FOR GHOST (A) V1 (B) V2 ...... 91
FIGURE 4-25 USING DATA STRUCTURES IN PLACE OF ARRAYS IN V3 ...... 93
FIGURE 4-26 WALLTIME AS A FUNCTION OF GRID SIZE ON KFC3 AND KFC4 FOR V1 AND V3 (A) FOR ALL GRID SIZES (B) ZOOMED PLOT ...... 95
FIGURE 4-27 COMPARISON OF D1, L2 AND L2D CACHE MISS RATES FOR V1 AND V3 ON KFC4 ...... 95
FIGURE 4-28 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC4 FOR GHOST (A) V1 (B) V3 ...... 96
FIGURE 4-29 WALLTIME AS A FUNCTION OF GRID SIZE ON KFC6I AND KFC6A FOR V2 AND V3 (A) FOR ALL GRID SIZES (B) ZOOMED PLOT TILL 100000 GRID POINTS ...... 98
FIGURE 4-30 COMPARISON OF D1, L2 AND L2D CACHE MISS RATES FOR V1 AND V3 ON KFC6I ...... 98
FIGURE 4-31 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC6I FOR GHOST (A) V1 (B) V3 ...... 99
FIGURE 4-32 OVERALL PERFORMANCE OF THE EIGHT VERSIONS OF GHOST IN TERMS OF WALLTIME/GRIDPOINT VERSUS GRID SIZE WITH (A) THE FULL RANGE AND (B) THE MORE COMPLICATED REGION FOR GRIDS SMALLER THAN 200 X 200 ...... 103
FIGURE 4-33 COMPARISONS OF L2 CACHE MISS RATE AND NORMALIZED WALLTIME VERSUS GRID SIZE ON KFC4 FOR GHOST (A) VERSION 0 AND VERSION 4, (B) VERSIONS 1-3 ...... 104
FIGURE 4-34 SPEEDUP OF GHOST ON KFC4 FOR GRIDS OF VARYING SIZE WITH (A) VERSION 2 AND (B) VERSION 3 ...... 106
FIGURE 4-35 OVERALL PERFORMANCE OF THE EIGHT VERSIONS OF GHOST RELATIVE TO THE "OPTIMAL" VALUE OF 11.8 MS/GRIDPOINT FOR 5000 ITERATIONS WITH (A) THE FULL RANGE AND (B) GRIDS SMALLER THAN 200 X 200 ...... 106
FIGURE 5-1 EXTERNAL BLOCKING - WALLTIME AS A FUNCTION OF GRID SIZE ON KFC4 FOR THE LID-DRIVEN TEST CASE. (A) V0 (B) V3 [53] ...... 111
FIGURE 5-2 EXTERNAL BLOCKING - WALLTIME AS A FUNCTION OF SUBGRID SIZE ON KFC5 FOR THE LID-DRIVEN TEST CASE. (A) V0 (B) V3 [53] ...... 112
FIGURE 5-3 ILLUSTRATION OF INTERNAL BLOCKING [53] ...... 113
FIGURE 5-4 INTERNAL BLOCKING RESULTS - WALLTIME AS A FUNCTION OF SUBGRID SIZE FOR GHOST ON KFC4 FOR THE LID-DRIVEN TEST CASE (A) V0 (B) V3 [53] ...... 115
FIGURE 5-5 CORRECTING THE ORDER OF A CONDITIONAL STATEMENT ...... 118


FIGURE 5-6 IMPLEMENTING RECIPROCAL CACHE AND LOOP MERGING IN SUBROUTINE CAL_U ...... 122
FIGURE 5-7 WALLTIME AS A FUNCTION OF GRID SIZE FOR V3 AND V8 ON (A) KFC4 (B) KFC3 ...... 122
FIGURE 5-8 COMPARISON OF D1, L2 AND L2D CACHE MISS RATES FOR V3 AND V8 ON KFC4 ...... 123
FIGURE 5-9 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC4 FOR GHOST (A) V3 (B) V8 ...... 124
FIGURE 5-10 WALLTIME AS A FUNCTION OF GRID SIZE FOR V3 AND V8 ON KFC6I AND KFC6A ...... 124
FIGURE 5-11 WALLTIME AS A FUNCTION OF GRID SIZE FOR V2, V3 AND V8 ON KFC6I AND KFC6A FOR ALL GRID SIZES (B) ZOOMED PLOT FOR GRID POINTS UP TO 250000 ...... 125
FIGURE 5-12 COMPARISONS OF NORMALIZED WALLTIME AND NORMALIZED NUMBER OF DATA CALLS (DIVIDED BY 10), D1 CACHE MISSES AND L2D CACHE MISSES ON KFC6I FOR GHOST (A) VERSION 3 (B) VERSION 8 ...... 128
FIGURE 5-13 24 X 24 INCH EXPERIMENTAL TEST SECTION ...... 129
FIGURE 5-14 GRID USED FOR 24 X 24 INCH WIND TUNNEL SECTION ...... 130


LIST OF TABLES

TABLE 2-1 DIFFERENCES BETWEEN SRAM AND DRAM ...... 14
TABLE 2-2 CHARACTERISTICS OF MEMORY TYPES ...... 15
TABLE 2-3 MAPPING TECHNIQUES AND THEIR RELATIVE PERFORMANCE [31] ...... 24
TABLE 2-4 ILLUSTRATION OF REMOVING FLOATING 'IF' ...... 31
TABLE 2-5 ILLUSTRATION OF REMOVING UNWANTED CONSTANTS INSIDE LOOPS ...... 32
TABLE 2-6 ILLUSTRATION OF REMOVING UNNECESSARY RECALCULATIONS INSIDE LOOPS ...... 32
TABLE 2-7 CYCLE TIMES FOR DIVISION AND MULTIPLICATION OPERATIONS ON LEADING MICROPROCESSORS [30] ...... 32
TABLE 2-8 ILLUSTRATION OF SUBROUTINE INLINING ...... 34
TABLE 2-9 ILLUSTRATION OF LOOP INCORPORATION ...... 34
TABLE 2-10 ILLUSTRATION OF LOOP UNWINDING ...... 34
TABLE 2-11 ILLUSTRATION OF LOOP DEFACTORIZATION ...... 35
TABLE 3-1 SUMMARY OF SUBROUTINES IN ORIGINAL VERSION OF GHOST ...... 42
TABLE 3-2 ILLUSTRATION OF VALGRIND OUTPUT ...... 49
TABLE 3-3 COMPARISON OF THE KFC6 PROCESSORS BASED ON CERTAIN PARAMETERS ...... 55
TABLE 3-4 APPROXIMATE PERCENTAGE OF TIME SPENT IN EACH SUBROUTINE IN V0 FOR A 2-D CAVITY LAMINAR FLOW ...... 57
TABLE 4-1 COMPARISON OF WALLTIME (IN SECONDS) FOR V0 AND V1 ON KFC4 ...... 76
TABLE 4-2 COMPARISON OF WALLTIME (IN SECONDS) FOR V1 AND V2 ON KFC4 ...... 86
TABLE 4-3 NORMALIZED WALLTIME VALUES (IN MICROSECONDS) FOR V1 AND V3 ON KFC3 ...... 94
TABLE 4-4 WALLTIME IN SECONDS SPENT IN KEY SUBROUTINES FOR GHOST ON FOUR GRID SIZES OVER 5000 ITERATIONS ...... 101
TABLE 5-1 EXTERNAL BLOCKING RESULTS FOR 600 X 600 GRID ON KFC4 WITH VARIOUS SUBGRIDS [53] ...... 109
TABLE 5-2 EXTERNAL BLOCKING RESULTS FOR 600 X 600 GRID ON KFC5 WITH VARIOUS SUBGRIDS [53] ...... 109
TABLE 5-3 INTERNAL BLOCKING RESULTS FOR 600 X 600 GRID ON KFC4 WITH SUBGRIDS OF VARIOUS SIZES [53] ...... 115
TABLE 5-4 ABSOLUTE WALLTIME VALUES (IN SECONDS) ON KFC4 FOR V3 AND V8 ...... 120
TABLE 5-5 ABSOLUTE WALLTIME VALUES (IN SECONDS) ON KFC3 FOR V3 AND V8 ...... 120
TABLE 5-6A ABSOLUTE WALLTIME VALUES (IN SECONDS) ON KFC6I FOR V3 AND V8 ...... 126
TABLE 5-6B ABSOLUTE WALLTIME VALUES (IN SECONDS) ON KFC6A FOR V3 AND V8 ...... 126
TABLE 5-7 BLOCK SIZES OF INDIVIDUAL ZONES ON AIRFOIL GRID ...... 130
TABLE 5-8 COMPARISON OF WALLTIME VALUES FOR VARIOUS VERSIONS OF GHOST FOR FLOW OVER A NACA 4318 AIRFOIL ON VARIOUS HARDWARE PLATFORMS ...... 131


LIST OF FILES

1. PKristipati-thesis.pdf 2-MB (file size)


CHAPTER – 1

1. INTRODUCTION

1.1 WHY CFD?

To face the demands of the market and the challenge from competitors, engineering firms consistently look for methods to reduce the time taken at any phase, whether it is a design process, a manufacturing process, or a decision-making process. Computational Fluid Dynamics (CFD) is one particular science that many companies rely on in order to achieve such time-compression. This is evident in the role CFD played in the recent development of the Aston Martin racing car DBR9 [1]. Instead of the traditional route of using wind tunnels for design development, Aston Martin went straight from a CFD program to putting the DBR9 on track. Another example of reliance on this science is the choice made by the BMW Sauber F1 team [2] to invest in performing simulations on its Intel® Xeon® dual- and quad-core processor-based supercomputer rather than investing in a second wind tunnel. Heavy hydraulic equipment manufacturer Caterpillar effectively used computational fluid dynamics to make vital adjustments [3] to some of its hydraulic systems after making changes to a CFD model and analyzing the results. The various models allowed Caterpillar to improve the design of the tank, preventing hydraulic pumps from failing before their expected service life, as had been the case previously. While computational fluid dynamics has been in use at Boeing since the mid-1970s, its most extensive application has been on their newest commercial aircraft, the 787 Dreamliner [4]. The use of CFD tools has allowed Boeing to address a wide variety of design challenges, including traditional wing design, the even distribution of cabin air, and a reduction in overall airplane noise. CFD simulations also shortened the development period of the Dreamliner by 18 months. Another example of the benefit of this science has been in wing development. In 1980, Boeing tested 77 wings in wind tunnels to arrive at the final configuration of their 767 model. Twenty-five years later, they built and tested only 11 wings for the 787, a reduction of over 80% in the number of models tested. Those 11 wings required fewer resources (viz. man-power and time), and the wind tunnel results matched the CFD predictions. These are a few of the many instances in which CFD is being effectively used to reduce the time taken from design to manufacturing of a product.


CFD has made it possible to predict many "what-ifs" under a given set of circumstances. Companies rely on this science because it makes it possible to predict how a particular design will perform before physical prototyping and testing. This reduces trial-and-error experimental testing, thereby reducing the time it takes to manufacture a product. One classic example of this is in the aircraft industry. Before a prototype flies, the aerodynamics of the aircraft, viz. lift, drag, side forces, and moments, must be determined. One option to obtain such aerodynamic data is to build many models and test them in wind tunnels under different test conditions. This is a costly process in regard to both time and money. Williams Grand Prix Engineering (Grove, England) [1] is an example of a company that, with the help of computational fluid dynamics, accelerated the development of its Williams BMW FW27. CFD crash simulations on carbon fiber structures enabled the company to create energy-absorption plots that permit them to design parts without having to run a large number of experimental crashes in order to find the optimum balance between the energy absorption and weight reduction of the material.

1.2 SOLUTION PROCESS IN CFD

As presented in the earlier section, although CFD is widely used in real-world scenarios, understanding how to use it and being able to use it involve non-trivial processes that include problem analysis and access to computing power. This starts with understanding that, fundamentally, computational fluid dynamics deals with obtaining approximate computer-based solutions to a set of governing equations (conservation of mass, momentum and energy) that describe fluid flow. Obtaining analytical solutions to these non-linear partial differential equations is not possible for most engineering problems, and that is where computational fluid dynamics fills the gap. The analysis process, shown in Figure 1-1, involves developing a computer model that performs CFD simulations. A typical approach is to first formulate the flow problem that is being simulated. The flow (computational) domain (the control volume in which the flow field is computed) is then defined. The volume occupied by the fluid is then divided into discrete cells; this process is called grid generation. The next step involves specifying the numerical conditions that are to be applied at the boundaries of the flow domain. These are called boundary conditions. Initial conditions of the flow field are then defined to give the numerical methods a starting point. The fluid problem, defined by the resulting numerical equations, is then solved iteratively until the solution converges.

Figure 1-1 CFD analysis process

In CFD, a continuous problem domain is replaced by a discrete domain (a grid). In a continuous problem, flow variables like velocity, pressure, and temperature are defined at every point in the domain. For example, pressure p in a continuous 1D domain would be defined as

p = p(x), \quad 0 < x < 1 .   (1-1)

In a discrete domain, each flow variable is defined only at grid points. For example, pressure p in a discrete 1D domain would be defined at N grid points as

p_i = p(x_i), \quad i = 1, 2, 3, \ldots, N .   (1-2)

The solution process involves solving for the relevant flow variables only at grid points. Values at other locations are calculated by interpolating between grid points. The governing partial differential equations along with boundary conditions are defined in terms of continuous variables like p and V (velocity). In the discrete domain, these can be approximated in terms of discrete variables like p_i and V_i. This is illustrated with the help of a simple 1D example with N points on a grid as shown in Figure 1-2.


The discrete system involves a set of coupled, algebraic equations in discrete variables as discussed below with an example.

Figure 1-2 Continuous Domain and Discrete Domain

1.3 CFD AND COMPUTING POWER

In order to comprehend the amount of computing power needed to solve CFD problems, it is important to understand the fundamental ideas underlying CFD. To keep the details simple, the numerics behind the solution process are illustrated with the simple 1D equation shown below [76]:

\frac{du}{dx} + u^m = 0; \quad 0 \le x \le 1; \quad u(0) = 1 .   (1-3)

When m = 1 (the linear case), we have the equation

\left( \frac{du}{dx} \right)_i + u_i = 0 ,   (1-4)

where i is any grid point. For simplicity, a 1-D grid with 4 points is considered as shown in Figure 1-3.

Figure 1-3 1-D grid

The grid has four equally-spaced grid points. The space between any two successive points is \Delta x. Taylor's series expansion in terms of u gives

u_{i-1} = u_i - \Delta x \left( \frac{du}{dx} \right)_i + O(\Delta x^2) .   (1-5)

Rearranging the above equation, we get

\left( \frac{du}{dx} \right)_i = \frac{u_i - u_{i-1}}{\Delta x} + O(\Delta x) .   (1-6)

Substituting Equation 1-4 in the above equation and neglecting the error, we get

\frac{u_i - u_{i-1}}{\Delta x} + u_i = 0 .   (1-7)

The above method of deriving a discrete equation from a differential equation using Taylor's series is termed the finite-difference method. Applying the above equation to the 4 grid points on the grid and assuming u_1 = 1 as the boundary condition, we get

u_1 = 1 \quad (i = 1),   (1-8)

-u_1 + (1 + \Delta x)\, u_2 = 0 \quad (i = 2),   (1-9)

-u_2 + (1 + \Delta x)\, u_3 = 0 \quad (i = 3),   (1-10)

-u_3 + (1 + \Delta x)\, u_4 = 0 \quad (i = 4).   (1-11)

The above system comprises four simultaneous algebraic equations, which can be written in matrix form as

\begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1+\Delta x & 0 & 0 \\ 0 & -1 & 1+\Delta x & 0 \\ 0 & 0 & -1 & 1+\Delta x \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} .

Solving for u_1, u_2, u_3 and u_4 by inverting the matrix on the left-hand side and using \Delta x = 1/3, we get

u_1 = 1, \quad u_2 = 3/4, \quad u_3 = 9/16, \quad u_4 = 27/64 .

The above example demonstrates the details of solving a simple 1D flow problem. In practical 2D/3D CFD applications, the discrete system shown above comprises tens of thousands and possibly millions of equations. Setting up and solving such a large system involves an exceedingly high number of repetitive calculations. For example, the solution to a simple 2-D cavity flow problem (described in chapter 3) on a 600x600 grid (= 360,000 grid points) essentially involves inverting a matrix of order 360,000x360,000. Although the matrix is sparse, the problem is magnified in that the process is iterative and needs to run repeatedly until the solution converges or the optimum result is achieved. For industry problems, CFD simulations are even more demanding in time and computing power, as the number of grid points is on the order of millions. For example, Baggett et al. [5] calculated that the number of grid points required for accurate Large Eddy Simulation (LES) of a turbulent boundary layer scales as N ~ Re^2, where Re is the Reynolds number and is of order 10 to 100 million for airplane simulations.
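To make the arithmetic above concrete, the following short Fortran sketch (an illustration added here, not code from the thesis) marches the discrete equation (1-7) forward from the boundary point, which amounts to solving the lower-triangular system above by forward substitution; with \Delta x = 1/3 it reproduces u = 1, 3/4, 9/16, 27/64.

```fortran
program fd_demo
  implicit none
  integer, parameter :: n = 4      ! four grid points, as in Figure 1-3
  real    :: u(n), dx
  integer :: i

  dx   = 1.0/3.0                   ! spacing between the equally spaced points
  u(1) = 1.0                       ! boundary condition u(0) = 1

  ! Discrete equation (1-7): (u_i - u_{i-1})/dx + u_i = 0,
  ! rearranged as u_i = u_{i-1} / (1 + dx); this is forward substitution
  ! on the lower-triangular matrix shown above.
  do i = 2, n
     u(i) = u(i-1) / (1.0 + dx)
  end do

  print *, u                       ! expected: 1.0, 0.75, 0.5625, 0.421875
end program fd_demo
```

A real CFD solver performs essentially this kind of update, but over hundreds of thousands of coupled unknowns and for thousands of iterations, which is where the memory traffic discussed in the next chapters comes from.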

1.4 PARALLEL COMPUTING AND BEOWULF CLUSTERS

Over the years CFD simulations have expanded from the traditional applications of aerospace engineers and meteorologists to more diverse problems. Typically, complex CFD problems are solved at a national supercomputer center or similar facility. Woodward et al. [7] and Anderson et al. [8] describe large-scale simulations done on large parallel machines located at national laboratories. For many years, such computing resources could be afforded only by heavily-funded organizations and government laboratories, with limited access for outside organizations and academia. Alternatively, shared-memory supercomputers built by IBM or HP could be purchased by universities and large corporations; however, these machines are expensive, with relatively high maintenance costs and unclear upgrade paths. One more alternative is for these organizations to pay for access to supercomputing facilities; however, the queuing system that is typical in such cases controls the timeline of projects based on CFD simulations. Thus, the high up-front cost of CFD analysis placed limitations on the size and complexity of the problems that could be solved by researchers and engineers whose applications could have benefited from the capabilities of CFD codes. As the science of CFD advanced, so did the need for ever faster computing resources. Hardware resources on single-processor workstations and serial computers became the bottleneck for performing complex simulations. In response, Thomas Sterling and Donald Becker [9] at NASA's Center of Excellence in Space Data and Information Sciences (CESDIS), in 1994, built a parallel computer from Common Off The Shelf (COTS) components. This resulted in an economical solution for high performance computing needs (Spector 2000). Their design had 16 486DX4-class workstations interconnected by a 10Base-T Ethernet network. Linux, a free UNIX clone, was the operating system. Their creation, which they called "Beowulf" [9], was an instant success. This concept of building parallel computers from COTS components quickly spread throughout NASA and the CFD research community. Advancements in computer technology, together with advantages like low cost, flexibility, and access to the latest technology, allowed Networks of Workstations (NOWs) [10] and Beowulf-style COTS clusters to make substantial gains in the High Performance Computing (HPC) market. This trend has not slowed down, as clusters that run Linux are paving the way for multi-site supercomputing systems such as NSF's Distributed Terascale Facility (DTF) [11]. Computing systems like this support research such as climate, storm, and earthquake prediction. Clusters like this scale out and challenge traditional big supercomputers.

The principle of parallel computing, which has been around for about three decades, forms the basis of cluster computing. In parallel computing, a large problem is broken into discrete parts that can be solved concurrently. Each of these parts is further broken down into a series of instructions, which are then executed simultaneously on different CPUs. These computing resources can be a single computer with multiple processors, many computers connected by a network, or a combination of both. It is also possible to take advantage of computing resources that are not local. For example, computers on a wide area network, or even ones on the Internet, can be used when local computing resources are scarce. A commodity cluster [12] uses several such off-the-shelf or customized PCs connected via Ethernet to solve problems that would otherwise need to be handled by a supercomputer. Advances in clustering technology that redefined the price-to-performance curve have led companies and organizations to embrace commodity clusters as their computing platforms. For example, PSA Peugeot Citroën [3] had been using proprietary Unix computer platforms for its CFD simulations, but in 2006 it decided to move to a more standardized 400-processor cluster of compute nodes using AMD Opteron processors interconnected by a fast Myrinet [13] network running Linux. The combination of commercially available FLUENT software and an AMD Linux cluster now provides PSA Peugeot Citroën engineers with a fluid flow modeling capability customized to the required level of performance and stability, yet less expensive to purchase and maintain than the previous approach. As shown by the above examples, clusters not only increase computational resources multifold, but also eliminate the wait time that is typical in a supercomputing environment, because a cluster can be dedicated to a specific department or to a particular project.


1.5 INTRODUCTION TO PROBLEM

The last two decades have witnessed a great increase in the amount of available computational resources. Supercomputers in the current decade have reached more than 100 TFLOPS (FLOPS, Floating Point Operations Per Second, is one of the ways to measure a computer's performance, especially in fields of scientific computing that make heavy use of floating point calculations), while Cray machines of the early 1980s were operating at just a few gigaflops. Such a rate of increase in peak performance shows no signs of slowing down. For example, the mythic petaflops (PFLOPS) mark was reached by IBM's Roadrunner in June 2008 [14], much earlier than a previous prediction of 2010 [15]. Such unprecedented growth in computational resources has become an essential key for numerical simulation in industrial design and scientific research. But such advancements in computer hardware technology are by no means a complete solution to the high computing needs of CFD in science and engineering. For example, Moin and Kim [16] report that, even with a sustained performance of 1 teraflops, simulating just one second of flight time of an airplane with a 50-meter-long fuselage and wings with a chord length of 5 meters, cruising at 250 m/s at an altitude of 10,000 meters, would take several thousand years. Spalart [6] estimated that even if computer performance continues to increase, Large Eddy Simulation (LES) for an aircraft will not be feasible until 2045 due to the high complexity of the problem.

Although the challenges caused by lack of access to suitable computing resources, or by the large amounts of money associated with accessing or owning such facilities, could be overcome with cluster computing technology, new challenges started to arise. Coordinating the concurrent operation of many processors (hundreds if not thousands), which is the basis of cluster computing, is a complex task that requires sophisticated software tools, viz. parallelized versions of CFD codes, debuggers, analysis tools, and communication libraries. Because of this, the search for new and efficient algorithms and computational techniques became the heart of CFD, and so did the role of numerical mathematics. The search for efficient algorithms is also fuelled by non-uniform growth in computer hardware. For example, the processing speed of a CPU has historically increased at a rate of about 55% per year, whereas the main memory access speed has increased at a rate of only 7% per year [17]. This means that over the years the gap between processor and memory speed has grown, and this growth has been estimated at 45% per year [18]. Because of this performance mismatch, processors rely on caches (discussed in detail in chapter 2) to reduce the effective memory access time. Even with the introduction of caches and other advancements in hardware, present-day compilers can perform certain simple code optimizations, but they are not sophisticated enough to restructure codes so that they make the best use of the memory hierarchy. Meanwhile, scientific programs tend to be particularly memory-intensive and dependent on the memory hierarchy. The key to achieving high performance is an optimal architecture-algorithm mapping; it is not uncommon to experience poor performance when such a mapping is not established. For example, in 1992, a simulation study by Mowry et al. [18] discovered that scientific programs spend from a quarter to half of their overall execution time waiting for data to be fetched from memory during sequential execution. Beyls et al. [19] noticed that the processor stalled on data memory accesses for almost 50% of the execution time for the SPEC2000 programs, even when compiled with the highest level of optimization in Intel's state-of-the-art compiler. Thus, achieving high performance on modern architectures is intimately related to coding style. This is particularly true of CFD codes, as almost all of them are numerically intensive. Otherwise, while the peak performance of workstations and parallel machines increases, the gap between the peak and actual performance of codes becomes wider when no attention is paid to optimal memory utilization. As will be seen in later chapters, this effort to make memory utilization optimal has resulted in dramatic performance gains in a CFD code.

1.6 PRESENT WORK

Over the years, although there have been many positive claims in favor of commodity clusters, achieving optimum code performance on them involves non-trivial effort, viz. carefully engineering the cluster design, using tools to improve application performance, and restructuring the code. The present work deals with optimizing the performance of a CFD code, GHOST, on commodity clusters. These clusters are essentially built from modern cache-based processor architectures. Although CFD codes are now rarely run on a single node, the present work first focuses on tuning the code on a single node. This is because the performance of a parallel code is a combination of both single-node performance and code scalability across an increasing number of nodes. Improvements in one of these two areas may be masked by lower performance in the other, complicating the already non-trivial optimization process. Also, code performance can vary unexpectedly with changes in grid size, as circumstantial choices can lead to fortuitous or detrimental memory storage and cache performance. As discussed in the previous section, the fundamental idea of the present work is to achieve optimum memory usage by mapping the algorithm to the memory architecture without modifying the underlying algorithm. Later, the code fine-tuned on a single processor is run on multiple nodes and the scalability of the code is presented. Subsequently, lessons learned on one commodity cluster have been applied on other platforms; similar or better performance gains have been observed, as explained in further chapters.

The present work presents the results of the optimization effort on a two-dimensional CFD code, GHOST, on commodity cluster architectures. Some of the techniques that have been applied to improve the performance of the code are presented in chapter 2. The GHOST code is discussed in detail in chapter 3. This code is used extensively across several commodity cluster platforms: KFC3, KFC4, KFC5 and KFC6. Details of these clusters, along with the test case, are discussed in chapter 3. The goal of the present work is to minimize the walltime of the code (viz. the amount of time that passes on a clock on the wall while the code solves a problem) while maintaining the accuracy of the code and without altering the solution produced by the code. Another aspect of the work is to apply the lessons learned in the optimization process to different architectures. Essentially, there are two stages in this tuning effort. The first stage was focused on tuning GHOST on KFC3 and KFC4; it was carried out up to December 2004. The second stage (September – November 2008) comprised testing the tuned codes on the KFC6 architectures. These results are presented in chapter 4, along with the results on KFC3 and KFC4 for comparison purposes. The External and Internal Blocking techniques that were used to tune GHOST in the interim are reviewed in chapter 5. Results of the subsequent tuning effort (part of the second stage) are then presented in chapter 5, along with results on a second test case. Conclusions and future work are presented in chapter 6.

Copyright © Pavan K Kristipati, 2008


CHAPTER – 2

2. CACHE-BASED ARCHITECTURES

2.1 INTRODUCTION

As a programmer, it would be ideal not to need to know about the details of the underlying computer architecture. However, there have been dramatic changes in the design of computer architectures in the past two decades. Memory chips and microprocessors have increased in performance exponentially over time [20]. Cache has been introduced into computer architectures to bridge the gap between processor speed and memory speed. When a scientific programmer understands these developments and exploits knowledge of the memory hierarchy, it is possible to achieve impressive speedups without modifying the underlying algorithms. This is particularly true of CFD codes, which generally apply mathematically simple but repetitive operations to a large set of data. In such codes, the number of times the data is piped in and out of the CPU from memory is often the limiting factor for performance. This correlation is discussed in detail in this chapter. Also, in the case of computers with distributed memory [21], the speed of interprocess communication plays a major role in the overall performance of a code, with the slower speed of accessing data on another processor further limiting the code's speed. Thus, the details of the computer architecture have a significant impact on the speed of a code running on a given machine.

2.2 EVOLUTION OF CACHE-BASED ARCHITECTURES

Early designs of Personal Computers (PCs) had processors running at ~8 MHz or less. It was not often that the processor would be waiting on the system memory; it did not matter much that the memory was slow, because the processor was not fast either. Within a few years of the invention of the PC, every component had increased in speed. However, some increased far faster than others. Memory and memory subsystems are now much faster than they were, by a factor of 10 or more, but a current top-of-the-line processor has a performance over 1000 times that of the original IBM PC (4.77 MHz) [22, 23]. This disparity in performance improvement has left us with processors that run much faster than everything else in the computer. These powerful processors would not give their best performance without the use of special high-bandwidth memory reserves called cache. Without cache, getting data and instructions would be bottlenecked by the relatively snail-paced capability of the system memory, or Random Access Memory (RAM). This is the reason why modern microprocessors have cache; almost all of them have multiple levels of it. As cache memory is relatively expensive (4 MB of Compaq L2 cache costs ~$400 [24]), instead of trying to make the whole of system memory (RAM) faster, a smaller piece of fast memory, typically starting at a few KB, is often the starting point. The cache is then used to hold the information most recently used by the processor, as in general the processor is more likely to need information it has recently used than a random piece of information in memory. Details about the internal working of cache memory are presented in later sections.

During 1961-62, a research group at Manchester, England [25, 26] introduced the concept of virtual memory. This gave the programmer the illusion of access to an extremely large main memory even though the computer actually had a relatively small main memory [27]. They came up with an algorithm that would move information that was not currently being used back into the secondary memory, viz. the hard drive. All this was carried out by the operating system. This concept was widely used in most of the operating systems of the 1960s. In 1965 Maurice Wilkes [28] proposed the "slave memory", a small fast-access storage device on the processor to hold a small amount of the instructions and data most recently used by the processor. This was later called "cache memory" in 1968, when IBM introduced it on the 360/85 machines [29]. Cache memory is now a standard part of memory architecture.

2.3 MEMORY ARCHITECTURE

While a cache-based CPU is a common design, the specifics of the hardware structure vary from processor to processor. In order to be able to maximize code performance on commodity clusters, it is essential to maximize the code performance on a single processor, and to do this it is important to understand the underlying memory architecture. The following section presents a brief discussion of memory architecture. Most modern-day computer systems have multiple levels of memory, as shown in Figure 2-1. Each level is of a different size and operates at a different speed. The fastest level is closest to the CPU, and each subsequent layer is slower and farther from the processor.


Figure 2-1 Memory hierarchy in a modern day computer [30]

At the top of the memory hierarchy are the Central Processing Unit's (CPU) general-purpose registers. Registers provide the fastest access to data (less than 1 clock cycle) and are the smallest memory objects in the memory hierarchy. They are also the most expensive memory locations. Processors can only work on the data available in registers. Registers are measured by the number of bits they can hold, viz. an "8-bit register" or a "32-bit register". The next highest performance subsystem in the memory hierarchy is the Level one (L1) cache [31]. Although the L1 cache size is quite small (4 KB to ~256 KB), it is much larger than the registers on the CPU. Most memory architectures have Level two (L2) cache as part of the CPU package. L2 cache is generally larger (256 KB to a few MB) than the L1 cache and is a secondary staging area that feeds the L1 cache. L2 may be built into the CPU chip, reside on a separate chip, or be a separate bank of chips on the motherboard. Generally, L1 and L2 caches are split into instruction and data caches. If the data present in the L1 cache is also present in the L2 cache, the caches are called inclusive (e.g. Intel Pentium 2, 3 and 4). If the data is present in at most one of the L1 or L2 caches, they are called exclusive (e.g., AMD Athlon). Exclusive caches can hold more data than inclusive ones, the downside being the penalty incurred while transferring data from the L2 to the L1 cache. In inclusive caches, data from L2 is written directly into L1, overwriting some of the data already present. L1 and L2 caches are made of expensive SRAM (Static RAM) memory. SRAM is distinctly different from the main memory, viz. DRAM (Dynamic RAM); a few differences are outlined in Table 2-1. SRAM uses more transistors for each bit of information, and for this reason it draws more power and takes up more space.

Table 2-1 Differences between SRAM and DRAM

SRAM                                         DRAM
Static RAM                                   Dynamic RAM
Faster and more expensive than DRAM          Slower and less expensive than SRAM
Access times of ~10 nanoseconds              Access times of ~60 nanoseconds
Does not need to be refreshed like DRAM      Needs constant refresh
Generally used for cache (viz. L1, L2)       Generally used for system memory

System Memory (RAM) [30] is another kind of data storage used in a computer and is present between the cache and the hard drive. It can be thought of as a larger and slower cache which allows random access to the data that is stored on it. Similar to the cache, RAM loses its data when the computer is switched off. It takes the form of integrated circuits (ICs) that allow the data to be accessed in any order, i.e., at random. The word random thus refers to the fact that any piece of data can be returned to the requestor in the same time regardless of its physical location and whether or not it is related to the previous piece of data.

2.4 CACHE'S ROLE IN THE PERFORMANCE OF A CODE

When compared to the size of a hard disk, the size of cache is small. Yet its effective usage helps increase the speed of program execution. This section presents cache's role in the performance of a code.

2.4.1 LOCALITY OF REFERENCE

The Principle of Locality of Reference states that "Programs tend to reuse data and instructions they have used recently. A widely held rule of thumb is that a program spends around 90% of its execution time in only about 10% of the code." [20]. Locality can be subdivided into temporal locality and spatial locality.

Temporal locality – A sequence of references exhibits temporal locality if recently accessed data/instructions are likely to be accessed again in the near future.

Spatial locality – A sequence of references exhibits spatial locality if data located close together in address space tend to be referenced close together in time.

Hence, if a code's algorithm can supply this information (the data used 90% of the time) readily to the processor, performance improvements can be achieved. This is where cache plays an important role in the performance of a code. Once data is stored in the cache, future use can be made of the cached copy rather than re-fetching or re-computing the original data. This reduces the average access time to this piece of data, since the access time increases as we move farther away from the registers towards RAM, as shown in Table 2-2. In order to demonstrate why locality of reference works, a pseudo-code example is presented in Figure 2-2.

Table 2-2 Characteristics of memory types

Type         Typical Access Speed     Latency         Size
Registers    ~2 nanoseconds           ~0 cycles       ~1 b
L1 Cache     ~10 nanoseconds          ~1 cycle        ~4 KB – 256 KB
L2 Cache     ~20–30 nanoseconds       ~10 cycles      ~128 KB – 4 MB
RAM          ~60 nanoseconds          ~100 cycles     ~128 MB – 4 GB
Hard Disk    ~10 milliseconds         –               ~20 GB – 500 GB

This program asks the user to enter a number between 1 and 1000 and reads the value entered. The program then divides every number between 1 and 1000 by the number entered by the user and checks whether the remainder is zero (integer division). If so, the program outputs "C is a multiple of A". Then the program ends. Out of the 10 lines of this program, the loop (lines 6 to 9) is executed 1000 times; the remaining lines are executed only once. Lines 6 to 9 will run significantly faster because of caching. As this program is very small, it can fit entirely in the cache memory.
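Since Figure 2-2 appears only as an image, a minimal Fortran sketch of the pseudo-code it describes is given below (the variable names A and C follow the description above; everything else is an assumption added for illustration).

```fortran
program multiples
  implicit none
  integer :: a, c

  print *, 'Enter a number between 1 and 1000:'
  read  *, a

  ! Lines 6-9 of the pseudo-code: this loop body is executed 1000 times,
  ! so its instructions and data stay resident in cache after the first pass.
  do c = 1, 1000
     if (mod(c, a) == 0) then
        print *, c, 'is a multiple of', a
     end if
  end do
end program multiples
```

The point is not the arithmetic but the shape of the program: the loop body is tiny, so it is fetched from memory once and then served from cache for the remaining iterations.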


Figure 2-2 Illustration of the working of locality of reference

Even in larger programs, a lot of processing happens inside loops. For example, a word processor spends 95% of its time waiting for user input and displaying it on the screen [31]. This part of the word-processor program is kept in the cache. In the case of a word processor, this 95%-to-5% split is what is called locality of reference. Locality of reference can be exploited when an algorithm makes the best use of cache memory, because caches are faster memory subsystems specifically designed to store recently referenced data and data near recently referenced data. This can lead to potential performance increases.

2.4.2 CACHE HIT AND CACHE MISS

During a computation, if the processor requests data, it is first searched for in the L1 cache. If it is found in the L1 cache, it is called an L1 cache hit; otherwise it is an L1 cache miss. The data is then searched for in the next lower level of the memory hierarchy, in this case the L2 cache. If the data is found in the L2 cache it is an L2 cache hit; if not, it is an L2 cache miss and the next level of memory is searched. This process continues through however many levels of cache the system has before the processor has no option other than to retrieve the data from external memory, the slowest option of them all. Assuming that the requested data is found in main memory, it is copied from main memory, along with neighboring bytes in the form of a cache block or cache line, into the L2 cache and then into the L1 cache. When the CPU requests this data again, if it is found in the L1 cache it is an L1 cache hit; otherwise it is an L1 cache miss and the process described above repeats until the data is found. Hennessy and Patterson [34] classified cache misses into three categories depending on the situation that brought about the miss:

Compulsory cache misses occur because the desired data was never in the cache and therefore must be brought in for the first time during a program's execution. These are also called cold start misses or first reference misses.

Capacity misses occur because a particular data block was moved out of the cache to accommodate other blocks of data. This is due to the fact that the cache memory is of finite size and so cannot accommodate all the blocks that are needed for the program's execution.

Conflict cache misses occur because an earlier entry was evicted. This type of miss can be further broken down into mapping misses, which are unavoidable given a particular amount of associativity, and replacement misses, which are due to the particular victim choice of the replacement policy. Conflict cache misses are discussed in detail in later sections.

As a cache miss refers to a failed attempt to read or write a piece of data in the cache, cache misses can also be classified based on whether they are instruction misses or data misses. A read miss from an instruction cache causes the most delay, because the processor has to wait until the instruction is fetched from memory. However, instruction cache misses do not have a significant impact on the performance of numerically intensive codes, viz. CFD codes, as most of their execution time is spent in small computational kernels based on loop nests which, though repetitive, do not involve complex calculations. A read miss from a data cache happens when a particular piece of data requested by the CPU is not found in the cache; this type of cache miss has the most impact on the performance of a numerically intensive code. A write miss to a data cache generally causes the least delay because the write can be queued and the processor can continue with its operation until the queue is full. But in numerically intensive codes this might not always be true, as the code might require the updated data in subsequent instructions and the processor might have to wait until the value is written to RAM.


So, ideally, for the fastest execution of any code, we need to have data stored in such a fashion that there are no cache misses. This is possible only for small grids which fit into the L1 or L2 cache, and it is impractical for large CFD calculations. So the idea is to reduce the L1 and L2 cache misses and bring them as close to zero as is reasonable. A few techniques to achieve this are described in detail in later sections of this chapter. From Table 2-2, it is evident that cache misses lead to a reduction in the efficiency of the code due to the increased delay in data or instruction access. When the processor is unable to find the necessary data in the cache, it has to look for it in the main memory, or random access memory (RAM); this incurs a latency of around 60 nanoseconds. Over the years, this latency difference between main memory and the fastest cache has become larger. For example, Clark et al. [35] in 1983 report that the time to service a cache miss to memory for the VAX 11/780 machine was 6 cycles, while Fenwick et al. [36] in 1995 report that it is 120 cycles for the AlphaServer 8400. Because of this, some processors have begun to utilize three levels of on-chip cache. For example, in 2003, the Itanium2 began shipping with a 6 MiB (1 mebibyte (MiB) = 2^20 bytes ~ 1 MB) unified Level 3 (L3) cache on chip [35]. The IBM Power 4 series has a 256 MiB L3 cache off chip, shared among several processors. The newer AMD Phenom series of chips carries a 2 MB on-die L3 cache. The presence of multiple levels of cache does not automatically mean a better cache-hit rate or better performance. Cache-hit rate often correlates with the program's locality of reference, meaning the degree to which a program's memory accesses are limited to a relatively small number of addresses. Conversely, a program that accesses a large amount of data from scattered addresses is less likely to use the cache efficiently.
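The cumulative cost of these misses can be summarized with a standard relation from the computer-architecture literature (the relation and the sample numbers below are added here for illustration and are not taken from the thesis):

AMAT = hit time + (miss rate × miss penalty).

Using the representative values of Table 2-2, a load that hits in a ~10 ns cache but misses 5% of the time and pays a ~60 ns trip to RAM has an average memory access time of roughly 10 + 0.05 × 60 = 13 ns; if the miss rate rises to 50%, the average climbs to 40 ns. This is why the tuning effort described in later chapters concentrates on driving the measured D1 and L2 miss rates toward zero.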

2.4.3 HOW CACHE MEMORY WORKS In order to have a deeper understanding of cache memory’s role in the performance of a code, it is important to understand the internal working of cache memory. When a computer is turned on, the cache memory is empty, so for the first instruction/data request, access to RAM is compulsory. Since programs usually flow in a sequential manner, the next memory position the CPU will request is probably the position immediately following the one that was just loaded.


After the first instruction/data is loaded from a certain memory position, a circuit called the memory cache controller loads a small block of data following the current position into the cache. This block of data is called a cache line and is usually 16 to 128 bytes long [31]; typically it is 64 bytes for most caches. This saves the CPU from having to go back to RAM for the subsequent accesses. Besides loading this small amount of data, the memory controller always tries to guess what the processor will ask for next. A circuit called the prefetcher loads more data located after these first 64 bytes from RAM into the cache memory. If the program continues to ask for instructions and data from memory positions in a sequential manner, they are already available in the cache memory because of this caching. The process can be summarized in a few steps:
- The CPU asks for an instruction/data stored at address ‘x’.
- Since the contents of address ‘x’ are not inside the cache memory, they are fetched from RAM.
- The cache controller loads a line (typically 64 bytes) starting at address ‘x’ into the cache. This is more data than the CPU requested; so, if the program continues to run sequentially (i.e., asks for address x+1), the next instruction/data the CPU will ask for is already in the cache.
- A circuit called the prefetcher loads more data located after this line, i.e., it starts loading the contents from address x+64 into the cache. For example, Pentium 4 processors have a 256-byte prefetcher, so the next 256 bytes after the line already loaded are brought into the cache.
However, programs do not always run in a sequential manner. The control jumps from one memory location to another, sometimes at random. The main challenge for the cache controller is to guess what address the CPU will ask for in the future, so that this address can be loaded into the cache and the CPU does not have to go to RAM to fetch it. This task is called branch predicting [30], and all modern CPUs have this feature.

2.5 CACHE MEMORY ORGANIZATION Internally, cache memory is divided into lines, each line holding from 16 to 128 bytes, depending on the CPU. What follows is a discussion of how memory cache is organized using 64-byte lines as an example. Figure 2-3 presents how a 512 KB L2 cache memory is divided into 8192 lines (512 * 1024 / 64 = 8192).


Cache memory can be classified into three types based on how it is mapped to RAM: 1) direct mapping, 2) fully associative, and 3) set associative (also called n-way set associative).

Figure 2-3 512 KB L2 memory [31] Direct Mapping: In this configuration, RAM is divided into the same number of blocks as the cache memory has lines. For example, a 1 GB RAM will be divided into 8192 blocks (assuming the cache uses the configuration shown in Figure 2-4), so each block is 128 KB. This is illustrated in Figure 2-4. The main advantage of direct mapping is that it is the easiest configuration to implement. When the CPU asks for an address from RAM (for example, address 2000), the cache controller loads a cache line (64 bytes) from RAM into cache memory, so addresses 2000 to 2063 are stored in the cache. If the CPU requests any of these addresses, they are already available in the cache. The problem surfaces when the CPU requests two addresses that are mapped to the same cache line. Since there is only one possible place that any memory location can be cached, there is nothing to search: the cache line either contains the memory information being looked for, or it does not. For example, assume the CPU requests two different addresses A and B that map to the same cache line, in alternating sequence (A, B, A, B). This could happen in a small loop. The processor will load A from memory and store it in the cache.


Then it will look for B, but B uses the same cache line as A, so it is not available. B is therefore loaded from memory and stored in the cache for future use, evicting A. But then the processor requests A again and looks for it in the cache, where it finds B instead of A.

Figure 2-4 Direct Mapping cache [31] This conflict occurs repeatedly. These are called collision or conflict misses [34]. The net result is that the hit ratio in this example is 0%. This is a worst case scenario, but in general the performance of this type of mapping is worse than that of the other two mappings. Also, if the program has a loop that is more than 64 bytes (one cache line) long, cache misses are experienced for the entire duration of the loop. For example, if the loop goes from address 1000 to address 1100, the processor will have to load the instructions/data from RAM as long as the program’s control is inside the loop (which is 90% of the time, as depicted in the example at the beginning of this chapter, especially in numerically intensive codes). If the loop is executed 1000 times, the processor will have to go to RAM repeatedly to fetch the data/instructions, with an adverse impact on the code’s performance.
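A minimal Fortran sketch of the alternating-access pattern described above is given below. It assumes a 512 KB direct-mapped cache with 4-byte reals and, crucially, assumes that the compiler places the two arrays contiguously in memory, so that a(i) and b(i) are exactly one cache size apart and compete for the same cache line on every iteration; the array names and sizes are illustrative and are not taken from GHOST.

      program conflict_demo
         implicit none
         ! ncache = number of 4-byte reals that fill a 512 KB cache (assumption)
         integer, parameter :: ncache = 512*1024/4
         real :: a(ncache), b(ncache), s
         integer :: i, it
         a = 1.0
         b = 2.0
         s = 0.0
         do it = 1, 100
            do i = 1, ncache
               ! if a(i) and b(i) map to the same line, each access evicts the other
               s = s + a(i) + b(i)
            end do
         end do
         print *, s
      end program conflict_demo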


This is why direct mapping is considered to be the least efficient cache configuration. Fully Associative: In a fully associative configuration, there is no hard linking between memory addresses in RAM and cache lines; the cache controller can store any memory block in any cache line. Thus, the problems that surface in the direct mapping configuration do not occur in this case. Although this makes the configuration the most efficient, the control circuit is far more complex, as it needs to keep track of which memory locations are loaded inside the cache memory. To mitigate the disadvantages of direct mapping while retaining some of the advantages of fully associative mapping, a hybrid solution called set associative mapping is most often used. N-way Set Associative: In this configuration, cache memory is divided into several blocks (sets), each block containing ‘n’ lines. For example, a 4-way associative cache of 8192 cache lines contains 2048 sets having 4 lines each. This is shown in Figure 2-5. In this type of cache configuration, the system’s RAM is divided into the same number of blocks as the cache memory has sets. Thus, in our example, 1 GB RAM is divided into 2048 blocks, each of 512 KB, as shown in Figure 2-6.


Figure 2-5 4-way associative 512 KB L2 cache memory [31]

Figure 2-6 512 KB L2 cache memory configured as 4-way associative [31]

As can be observed, mapping in this case is very similar to direct mapping; the difference is that for each memory block there is now more than one cache line available in the cache. Each cache line in this type of configuration can hold the contents of any address inside the mapped block of RAM. For example, in a 4-way set associative cache, each memory address inside a mapped block is assigned a set and can be cached in any one of the 4 locations within that set. In other words, within each set the cache is associative, and thus the name. With this design, the problems (viz. collision/conflict and loop misses) presented by a direct mapped cache are mitigated. Added to this, because of its relative ease of implementation, this type of cache configuration is the most used in PCs, although it provides lower performance compared to the fully associative configuration.


This design means that there are “N” possible places in the cache where a given memory location may reside. The trade-off is that there are “N” times as many memory locations competing for the same “N” lines in the set. In the example discussed above, instead of a single block of 8192 lines, there are 2048 sets with 4 lines in each set, and each set is shared by one 512 KB block of RAM. As each set has 4 cache lines in it, each address can be cached in any of 4 cache lines. This means that in the example described above for the direct mapped cache, where two addresses (A and B) that map to the same cache line were accessed alternately, they would now map to the same cache set instead. This set has 4 lines (in a 4-way set associative cache), so one line could hold A and another could hold B. This raises the hit ratio from 0% to 100%. Table 2-3 summarizes the different cache mapping techniques and their relative performance.

Table 2-3 Mapping techniques and their relative performance [31]

Cache Type                      Hit Ratio                           Search Speed

Direct Mapped                   Good                                Best

Fully Associative               Best                                Moderate

N-Way Set Associative, N>1      Very Good, Better as N Increases    Good, Worse as N Increases

The mapping between memory blocks and cache lines (which memory block goes into which cache line) and the replacement of cache line contents are decided by a replacement strategy. The most commonly used strategies for today’s microprocessor caches are random and least recently used (LRU). The random replacement strategy chooses a random cache line to be replaced. The LRU strategy replaces the block which has not been accessed for the longest time interval. This conforms to the principle of locality, in that it is more likely that data that has been used recently will be used again in the near future. Less common strategies are least frequently used (LFU) and first in, first out (FIFO). The former replaces the memory block in the cache line which has been least frequently used, whereas the latter replaces the data that has been residing in the cache for the longest time.


Finally, the theoretically optimal replacement strategy replaces the memory block that will not be accessed again for the longest time in the future. It is not practical to implement such an optimal replacement strategy in real world scenarios, as such a design would require information about future cache references.

2.6 CACHE OPTIMIZATION GUIDELINES Although the market that caters to the scientific community is diverse, there is still a certain convergence in the architectural design of machines. For example, RISC (Reduced Instruction Set Computer) architectures from many vendors (IBM, SUN, etc.) are relatively similar. Registers, cache, memory, and disk are now present in one form or another in all architectures of practical interest. Consequently, understanding how to achieve high performance on a given architecture is often of sufficient generality to allow efficient computations on architecturally similar computers. As machines evolve, new and improved numerical algorithms need to be developed that not only solve the equations but also take fuller advantage of the advances in the architecture of the computers on which they run. The key to achieving high performance is an optimal architecture-algorithm mapping. Since the effective use of cache memory is critically important to overall code performance, numerous research papers have discussed cache optimizations over the last forty years [38]. The proposed optimizations range from hardware modifications and microarchitectural enhancements, through optimizations in compilers and operating systems, to improvements at the algorithmic level. This section presents cache optimization methodologies for better code performance. The optimization techniques presented below can be classified into the following categories:
- Optimizing memory access
- Optimizing floating point calculations
- Using compiler optimizations
Each of these will be discussed in subsequent sections.

2.6.1 TECHNIQUES FOR OPTIMIZING MEMORY ACCESS The techniques that optimize memory access also reduce capacity misses. They essentially aid in holding the data in cache for a longer time, so that as many as possible of the calculations that require the data are carried out before it leaves the cache. Some of these techniques are presented below.

2.6.1.1 Optimal Data Layout The idea is to ensure that data processed in sequence are located close to each other in physical memory. This ensures data locality, meaning that data brought into the cache will be used at least once before being flushed out of the cache. For example, a Fortran 77 array containing the coordinates of a collection of n particles should be dimensioned as dimension r(5,n) instead of dimension r(n,5), since typically the five coordinates for a given particle are accessed consecutively. The same considerations apply to data structures defined in Fortran 90. The gains from other performance improvement techniques on numerically intensive codes can be negated by a poor data layout.
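As a minimal illustration of this layout rule (the variable names and the choice of five coordinates per particle are only assumptions made for the example), the inner loop below walks through memory with stride one when the array is dimensioned r(5,n):

      program layout_demo
         implicit none
         integer, parameter :: n = 10000
         real :: r(5,n), total
         integer :: i, k
         call random_number(r)
         total = 0.0
         do i = 1, n
            do k = 1, 5
               ! r(k,i) and r(k+1,i) are adjacent in memory (column-major order),
               ! so all five coordinates of particle i share one or two cache lines
               total = total + r(k,i)
            end do
         end do
         print *, total
      end program layout_demo

With the dimensions reversed, r(i,k) and r(i,k+1) would be n elements apart, and each coordinate of a particle would typically come from a different cache line.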

2.6.1.2 Loop Interchange This technique suggests reversing the order of two adjacent loops, if needed, in a nested loop [40, 41]. The idea is to optimize the inner-loop memory access. FORTRAN stores arrays column by column (column-major order), whereas C and C++ store them row by row. As the index of the inner loop changes most frequently, interchanging the loops results in a performance gain when it makes the order of data access match the order of data storage. It is critical to understand the order in which data are stored in the language in which the code is written and to design the order of the nested loops to match the data access strides. In the untuned version of the code shown in Figure 2-7, the loop accesses the arrays x, y and z row by row, while a FORTRAN program stores array elements in column-major fashion (elements from the same column, e.g. x(1,1), x(2,1), x(3,1), are stored together). This can result in heavy cache misses, as consecutively accessed array elements within the loop come from different cache lines. Loop interchange can help prevent this, as shown in the tuned version of the code. As illustrated in Figure 2-7, the data for elements that are adjacent in the leading array index are stored in the same cache line. This maximizes the possibility of re-use of data in cache memory, thereby reducing the number of data calls to main memory.
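A minimal sketch of the kind of interchange Figure 2-7 illustrates is given below; the array names follow the figure, while the array size is an arbitrary assumption. The untuned nest strides through memory, whereas the tuned nest runs the first index innermost so that successive iterations touch adjacent elements:

      program interchange_demo
         implicit none
         integer, parameter :: n = 256
         real :: x(n,n), y(n,n), z(n,n)
         integer :: i, j
         call random_number(y)
         call random_number(z)
         ! untuned: the inner loop varies the second (slower) index
         do i = 1, n
            do j = 1, n
               x(i,j) = y(i,j) + z(i,j)
            end do
         end do
         ! tuned: the inner loop varies the first (fastest) index, giving stride-1 access
         do j = 1, n
            do i = 1, n
               x(i,j) = y(i,j) + z(i,j)
            end do
         end do
         print *, x(n,n)
      end program interchange_demo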


It is to be noted that loop interchange might not always be achievable due to dependencies between statements. Sufficient analysis needs to be done to ensure that results do not change before this technique can be rolled into the final code.

Figure 2-7 Illustration of Loop Transformation

2.6.1.3 Using Data Structures instead of arrays This technique suggests replacing the use of separate arrays with data structures. While using arrays has its own advantages over using regular variables, sometimes using data structures in place of arrays proves to be highly efficient. Most CFD codes, like GHOST, involve a large number of numerical calculations, and scenarios in which a variable at a given grid point is added to, subtracted from, multiplied by or divided by a different variable at the same or surrounding grid points are ubiquitous. This is where using data structures may prove quite useful, as the variables at a grid point (when declared as a data structure) are stored in contiguous memory locations, thus improving the spatial locality of the program. Sufficient care has to be taken when data structures are used. For example, the order in which variables are declared in a data structure can make a difference. This is explained with the help of the two seemingly identical structures shown in Figure 2-8.

Figure 2-8 Example to illustrate the importance of the definition of a data structure The code with Struct A is likely to experience a decrease in performance. This is because, in memory, data is usually stored in what is called a long word, or a 32-bit word: a 4-byte unit. Longs take 4 bytes; chars take 1 byte. In Struct A, the first char will take up the first byte of a word, and the next long, being 4 bytes, will overfill that word, so it will be allocated starting on the next long word. The way the two structures look in memory is represented in Figure 2-11. In this example, while Struct B wastes only 2 bytes, Struct A wastes 6 bytes. Such a difference, caused purely by the order of variable declarations inside a structure, is a common occurrence. The total number of bytes wasted can be enormous when the structure is large and especially when the code uses arrays of such structures, which is common in numerically intensive codes. Figure 2-9 presents an example in which arrays can be replaced by data structures.

Figure 2-9 Arithmetic operations on array elements If there are many occurrences in which the arrays a, w, n, s and p are used together in calculations like the one shown above, then, as shown in later chapters, the use of data structures can prove to be beneficial. The example shown in Figure 2-9 can be re-written as shown in Figure 2-10.

Figure 2-10 Using data structures instead of arrays When the variables are declared inside a data structure and the code is modified to use an array of such data structures, the values required for a calculation can be fetched together with a much smaller chance of data cache misses. This is because when the processor requests one of the variables, a whole cache line is loaded into the cache, and since the separate arrays have been replaced by a data structure, the other variables required in the arithmetic operation are likely to be found in that same line, leading to fewer data misses.
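A minimal Fortran 90 sketch in the spirit of Figures 2-9 and 2-10 is shown below; the derived type name coeff and the grid dimensions are assumptions made for the example, and the member names follow the arrays a, w, n, s and p of Figure 2-9:

      program struct_demo
         implicit none
         type :: coeff
            real :: a, w, n, s, p
         end type coeff
         integer, parameter :: imax = 200, jmax = 200
         type(coeff), allocatable :: c(:,:)
         integer :: i, j
         allocate(c(imax,jmax))
         c = coeff(1.0, 2.0, 3.0, 4.0, 0.0)          ! initialize every record
         do j = 1, jmax
            do i = 1, imax
               ! all five operands come from one contiguous record,
               ! so they are likely to share a cache line
               c(i,j)%p = c(i,j)%a + c(i,j)%w + c(i,j)%n + c(i,j)%s
            end do
         end do
         print *, c(1,1)%p
      end program struct_demo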


Figure 2-11 Schematic representation of memory storage for seemingly identical Structures.

2.6.1.4 Loop Blocking This technique, also referred to as loop tiling, increases the depth of a loop nest of depth n by adding additional loops to the nest, so that the computation is carried out block by block on sub-regions that fit in the cache [40]. This is illustrated in Figure 2-12. As mentioned earlier, the cache works on the principle of locality of reference, so the data pertaining to the points surrounding a cell will already be in the cache. Since the code traverses the grid block by block, most of the data needed is stored in the cache in advance. This helps to improve performance by reducing cache misses. A sketch of the resulting loop structure follows Figure 2-12.

Figure 2-12 Illustration of Loop Blocking
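The sketch below is a hypothetical example rather than code from GHOST: it tiles a 2-D array transpose, with the two outer loops stepping through cache-sized tiles and the two inner loops working entirely inside one tile, so each tile of a and b is reused while it is still resident in the cache. The block size nb is an assumption that would be tuned to the cache on a given machine.

      program blocking_demo
         implicit none
         integer, parameter :: n = 1024, nb = 64
         real, allocatable :: a(:,:), b(:,:)
         integer :: i, j, ii, jj
         allocate(a(n,n), b(n,n))
         call random_number(a)
         ! blocked (tiled) traversal: the two outer loops step through
         ! cache-sized tiles, the two inner loops stay inside one tile
         do jj = 1, n, nb
            do ii = 1, n, nb
               do j = jj, min(jj+nb-1, n)
                  do i = ii, min(ii+nb-1, n)
                     b(j,i) = a(i,j)
                  end do
               end do
            end do
         end do
         print *, b(n,1)
      end program blocking_demo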

2.6.2 OPTIMIZING FLOATING POINT OPERATIONS In order to discern the effect of floating point optimizations, it is necessary to eliminate the limiting effects due to memory access issues. Some of the techniques presented in the earlier sections address memory access issues. In this section, a few techniques that can optimize floating point operations are discussed.


2.6.2.1 Removing Floating Ifs ‘IF’ statements slow down a program for several reasons. Some of them are:
- the compiler can do fewer optimizations in their presence, such as loop unrolling;
- evaluation of the conditional takes time;
- the continuous flow of data through the pipeline is interrupted when branching.
Often, the performance impact of ‘if’ statements can be significantly reduced by restructuring the program. For example, if the result of an ‘if’ statement does not change from iteration to iteration, it can be moved out of the loop. Compilers can usually do this except when loops contain calls to subroutines and when the loops are bounded by variables [40]. This is illustrated with an example in Table 2-4.

Table 2-4 Illustration of removing floating ‘IF’

!Original Code
do i = 1, lda
  do j = 1, lda
    if (a(i) .GT. 100) then
      b(i) = a(i) - 3.7
    endif
    x = x + a(j) + b(i)
  enddo
enddo

!After removing floating ‘if’
do i = 1, lda
  if (a(i) .GT. 100) then
    b(i) = a(i) - 3.7
  endif
  do j = 1, lda
    x = x + a(j) + b(i)
  enddo
enddo

2.6.2.2 Removing unwanted constants inside loops If a constant value does not need to be initialized within a loop, it is better to initialize it outside the loop. This is presented in Table 2-5.

2.6.2.3 Avoiding Unnecessary Recalculations inside Loops When iterating inside a loop, using pre-calculated values wherever possible instead of recalculating them every time can save a considerable amount of time. Table 2-6 illustrates this idea with an example.


Table 2-5 Illustration of removing unwanted constants inside loops

!Original Code
do i = 1, n
  i1 = 0
  i2 = 0
  b(i1) = i + i1
  b(i2) = i + i2
  ..........
enddo

!After removing unwanted constants
i1 = 0
i2 = 0
do i = 1, n
  b(i1) = i + i1
  b(i2) = i + i2
  ..........
end do

Table 2-6 Illustration of removing unnecessary recalculations inside loops

!Original Code
for (p = 0; p < yy; p++)
{
    z += x*y + p;
}

!After removing unnecessary recalculations
int n = x*y;
for (p = 0; p < yy; p++)
{
    z += n + p;
}

2.6.2.4 Reducing division latency by using reciprocals In general, a floating point division operation takes longer (or is said to have a higher latency) than a multiplication or addition operation. Although division operations are less frequent than multiplication and addition operations, given their high cost even a few divisions in a code can significantly degrade its performance. For example, Flynn et al. [42], with experiments on their benchmark suite, describe how a relatively small number of division operations, accounting for only 3% of all floating point operations, account for 40% of the latency, while multiplication operations that account for 37% of instructions contribute only 18% of the overall latency caused by arithmetic operations.

Table 2-7 Cycle times for division and multiplication operations on leading microprocessors [30]

Processor       Multiplication    Division
Intel 64-bit    12                161
AMD 64-bit      5                 71


Table 2-7 gives an idea of the latency, in machine cycles, of multiplication and division operations. While a multiplication requires 5 to 12 machine cycles, division latencies are considerably higher. Because of this high cost, it is strongly suggested that divisions, especially those inside loops and nested loops, be replaced by multiplication with a precomputed reciprocal, thus preventing unnecessary repeated divisions. It is important to note that multiplication by a reciprocal value might sometimes alter the result slightly, so it is highly desirable to store the reciprocal to the highest possible precision before doing any subsequent operations. Many alternatives to replacing division with a reciprocal have been presented over the years; for example, Oberman et al. [43] compare various division algorithms in terms of their efficiency.
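A minimal sketch of this idea is given below; the variable names and the time step dt are assumptions made for the example and are not taken from GHOST. The division is performed once, outside the loop, and the loop body uses only multiplications:

      program reciprocal_demo
         implicit none
         integer, parameter :: n = 100000
         real :: a(n), b(n), dt, rdt
         integer :: i
         call random_number(b)
         dt = 0.01
         rdt = 1.0/dt              ! one division, stored before the loop
         do i = 1, n
            a(i) = b(i)*rdt        ! instead of a(i) = b(i)/dt inside the loop
         end do
         print *, a(1)
      end program reciprocal_demo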

2.6.2.5 Subroutine Inlining Subroutine calls incur overheads for the same reasons as loops do. Subroutine inlining eliminates these overheads by replacing the subroutine call with the body of the subroutine itself. This is particularly useful in loops with large iteration counts that contain subroutine calls; small subroutines that are called many times are good candidates for inlining. However, explicit subroutine inlining does not always guarantee better performance, as most compilers can inline automatically, although this automatic compiler inlining might not be effective on large codes. It is advisable to avoid frequent transfers of control to a subroutine, especially in loops with many iterations, as overheads of 100 cycles or more are possible for very complex subroutine calls [40]. The measured overhead on an IBM 590 for calling the simple subroutine shown in Figure 2-13 was 10 cycles [40].

Figure 2-13 Example to illustrate cost of calling a subroutine


However, if the subroutine to be inlined contains many lines of code, inlining can considerably increase the size of the whole code and lead to storage, readability and other problems. An example of subroutine inlining is illustrated in Table 2-8.

Table 2-8 Illustration of subroutine inlining

!Original Code
do 10 I = 1, 1000
  call celss(A(I), B(I))
10 continue
...
subroutine celss(C, F)
  real C, F
  C = (F - 32.0) * 5.0/9.0
  return
end

!After subroutine inlining
do 10 I = 1, 1000
  A(I) = (B(I) - 32.0) * 5.0/9.0
10 continue

Table 2-9 Illustration of loop incorporation

!Original Code
do 10 i = 1, 1000
  call celcius(x(i), y(i))
10 continue

subroutine celcius(c, f)
  real c, f
  c = (f - 32.0) * 5.0/9.0
  return
end

!After loop incorporation
call celciusnew(n, x, y)

subroutine celciusnew(n, c, f)
  integer n
  real c(n), f(n)
  do 10 i = 1, n
    c(i) = ( f(i) - 32.0 ) * 5.0/9.0
10 continue
  return
end

2.6.2.7 Loop Unwinding The technique of eliminating the inner loop by explicitly repeating the statements comprising the loop is called Loop Unwinding. This technique is also called Loop Unrolling. The idea is to reduce the number of overhead instructions that the CPU has to execute in a loop. This is illustrated in Table 2-10.

Table 2-10 Illustration of loop unwinding

!Original Code
do 10 i = 1, 1000
  do 20 j = 1, 4
    a(i,j) = b(i,j) * 6.5
20 continue
10 continue

!After loop unwinding
do 10 i = 1, 1000
  a(i,1) = b(i,1) * 6.5
  a(i,2) = b(i,2) * 6.5
  a(i,3) = b(i,3) * 6.5
  a(i,4) = b(i,4) * 6.5
10 continue

2.6.2.8 Loop Defactorization When a loop contains a branch, the conditional is evaluated in every iteration even when the body is executed for only a few of them. The CPU therefore does work where it is not necessary, leading to poor performance. The solution is to split the loop, using a temporary array to hold the indices of the elements on which the branch actually operates (the elements qualified by the IF, as shown below), and then to perform the computation in a second, branch-free loop. This is termed loop defactorization [40]. An example is presented in Table 2-11.

Table 2-11 Illustration of Loop Defactorization

!Original Code
do i = 1, n
  if (t(i).gt.0.0) then
    a(i) = 2.0*b(i-1)
  end if
end do

!After loop defactorization
inc = 0
do i = 1, n
  if (t(i).gt.0.0) then
    inc = inc + 1
    tmp(inc) = i
  end if
end do
do i = 1, inc
  a(tmp(i)) = 2.0*b(tmp(i)-1)
end do

2.6.3 USING COMPILER OPTIMIZATIONS On many RISC architectures, programs compiled without any optimization usually run slowly. Typically, a medium optimization level (such as -O2) leads to a speedup by a factor of 2 to 3 [40] without a significant increase in compilation time. It is certainly worthwhile to try several optimization levels, and possibly some other compiler options as well, and to assess their effect on the overall program speed.

2.7 PREVIOUS WORK The growing dominance of parallel computational fluid dynamics has inspired greater interest in code performance and code optimization to achieve efficient use of large parallel systems. This section presents some previous work on performance tuning of various codes on a wide variety of computer systems. Douglas et al. [44] present optimization strategies, viz. a 2D blocking strategy and loop unrolling, for both structured and unstructured grids. Speedups in the range of 2-5 have been achieved for the Gauss-Seidel algorithm with natural or red-black ordering on a variety of platforms.


Kaushik et al. [45] discuss how performance tuning methodologies such as field interlacing, structural blocking and edge reordering yielded improvement ratios of 2.26 to 6.97 for FUN3D [46] on NASA’s IBM P2SC machine. Hauser et al. [47] discuss techniques and tools that they developed to tune their Direct Numerical Simulation (DNS) code DNSTool on a fairly inexpensive ($41,205) commodity cluster, the Kentucky Linux Athlon Testbed 2 (KLAT2). The DNS was of the flow over a single turbine blade, on a grid of roughly 16 million grid points. Their performance tuning effort involved both recoding of the application to improve cache behavior and careful engineering of the cluster design. Sustained performances of 14.99 Giga Floating Point Operations per Second (GFLOPS) ($2.75 per MFLOPS) and 22.1 GFLOPS ($1.86 per MFLOPS) were achieved for double precision and single precision operations respectively. Kadambi et al. [48] studied an algorithm for solving the compressible Euler equations with regard to temporal and spatial access of data. They optimized the code by using loop interchange, reallocation of data spaces and loop fusion, and achieved a performance improvement of 45% in their best case; the L1 cache miss rate was reduced by more than a factor of four, but the secondary cache miss rate did not show any significant change. Gropp et al. [49, 50] present optimization techniques for their CFD code FUN3D. The first was interlacing, which leads to high reuse of data brought into the cache, makes memory references closely spaced, and decreases the size of the working set of the data cache. The second was structural blocking, which led to a significant reduction in the number of integer loads and enhanced the reuse of data in the registers. The last technique was edge and node reordering, which led to a decrease in TLB misses (viz. a kind of cache miss) by a factor of two and a decrease in the L2 miss rate by a factor of 3.5. The combination of the three techniques led to an overall improvement in the execution time by a factor of 5.7. Gupta et al. [51] and LeBeau et al. [52] carried out a comprehensive study of the effects of applying various cache optimization techniques to the 3-D unstructured CFD code UNCLE. They applied a space filling curve, loop blocking and optimized data access, and an overall improvement of 50% in walltime was obtained from the application of these techniques.


Palki [53] presents the results of a performance optimization effort on the 2-D CFD code GHOST on modern commodity clusters. Performance improvements of up to 79% (compared to the original unsubblocked code) were observed with the application of a technique called internal blocking to the unoptimized version of GHOST. This technique involves breaking the grid up into smaller cache-fitting blocks, solving the governing equations on these smaller blocks, and then putting them back together before the start of the MPI communications to obtain the overall solution.

Copyright © Pavan K Kristipati, 2008


CHAPTER - 3

3. COMPUTATIONAL TOOLS This chapter presents a comprehensive description of the computational tools and platforms that have been used in this study. During the performance optimization process, it is necessary to understand the underlying algorithm in order to map it to the memory architecture of the machine on which the code runs. For this purpose, a discussion of the numerics involved in the GHOST code is presented. The Linux architectures on which the optimization effort is carried out are likewise described. The latter part of the chapter consists of a discussion of Valgrind, a profiling and cache simulation tool. The chapter concludes with a discussion of the methods used to gauge the performance of the code, followed by a discussion of the characteristics of the original version of the GHOST code.

3.1 DESCRIPTION OF GHOST This section presents a description of the original version of GHOST. It is a well-established solver and has been used to carry out a number of published analyses of transitional turbomachinery flows and active flow control. Examples of recent applications of this code include the development and testing of a new laminar-turbulent transition model for turbomachinery [54, 55], simulation of an oscillatory morphing airfoil [56], the evaluation of configurations for steady jet flow control [57, 58] and the simulation of plasma actuators [59]. GHOST is a two-dimensional incompressible finite-volume structured CFD code with chimera overset grids for parallel computing. The QUICK scheme is applied to discretize the advective terms in the momentum equations with second-order accuracy. A second-order central difference scheme is used for the diffusive terms. For the RANS (Reynolds Averaged Navier-Stokes) turbulence equations, the Total Variation Diminishing (TVD) scheme is employed for the advective terms. Interfacial fluxes are determined through interpolation of cell-centered values. Second-order upwind time discretization is employed for the temporal terms, using a delta-form subiterative scheme. GHOST is written in FORTRAN90 and has been ported to a wide variety of platforms. Its original version is just over 5300 lines broken into multiple subroutines. GHOST was also originally designed to minimize memory usage, accomplished through extensive use of the allocation and de-allocation of variables in FORTRAN90.


GHOST uses a cell-centered partitioning approach, and the internode communication protocol is MPI. GHOST has mechanisms to perform a form of automatic load balancing, but this is unnecessary for simple test geometries. Flow and geometry data in GHOST for a given grid or subgrid are stored in individual arrays, one per variable, as in $\phi_1(i,j), \phi_2(i,j), \ldots, \phi_n(i,j)$. On a given grid, GHOST performs the majority of its calculations as a series of i-j bi-directional sweeps in nested double loops; there are more than 60 such i-j nested double loops in the original version of the code. For unsteady flow, second-order upwind time discretization is employed for the temporal terms, which is made effectively implicit through the use of multiple subiterations within a given time step. The momentum equations are formulated in terms of delta variables, defined as $\Delta\phi = \phi^{n+1} - \phi^{n}$, where $\phi$ represents any variable (u, v) and n is the iteration level. The resulting form of the momentum equations is solved implicitly in delta form and is shown in Eq. (3-1) for the time discretization in one dimension:

$$\frac{3(\Delta\phi)^m}{2\Delta t} + \frac{\partial (\Delta f)^m}{\partial x} = -\frac{3(\phi^m - \phi^n)}{2\Delta t} + \frac{\phi^n - \phi^{n-1}}{2\Delta t} - \frac{\partial f^m}{\partial x}, \qquad (3\text{-}1)$$

where m is the subiteration level. The right-hand side of Eq. (3-1) is explicit and can be implemented in a straightforward manner to discretize the spatial derivative term. The left-hand side terms are evaluated based on a first-order upwind differencing scheme. The deferred iterative algorithm is strongly stable, and the solution $\phi^{n+1}$ is obtained by using inner iterations to reach the converged solution of the right-hand side of Eq. (3-1), corresponding to $\Delta\phi$ approaching zero. At least one subiteration is performed at every time step so that the method is fully implicit. For steady flows, $\Delta t$ is set to infinity and convergence is achieved through the subiteration cycle. The resulting matrices generated at each subiteration, based on the QUICK and TVD schemes as well as the evaluation of source/sink terms, are solved with an ADI-type decomposition into a pair of sweeps alternately in the i- and j-directions, which are solved sequentially as tri-diagonal matrices. This sequence may be repeated for improved accuracy. The techniques of Rhie and Chow [60] are then used to extract the pressure field from the continuity equation. Other equations, such as energy conservation and turbulence models, are computed in turn as necessary.


Then the subiteration sequence is repeated until satisfactory convergence is achieved. For clarity, most of the performance testing was conducted on the simplest form of GHOST: the steady-state version of the code, with the laminar model only (no turbulent flow), and with only a single pair (one in i, one in j) of ADI computations completed per iteration. However, the iterative core of the code is retained even in this simplified version. There are many different schemes that can be employed besides those implemented in GHOST, but many CFD codes follow similar procedures that require discretization of the conservation equations and then solution of the resulting system of equations on the grid. Therefore, the GHOST code that is analyzed here can be considered a representative sample of this common type of CFD algorithm.

3.1.1 GHOST FLOW CHART The underlying algorithm of the GHOST code was described in a fair amount of detail in the previous section. This section describes the flow chart of the GHOST code shown in Figure 3-1. The first task performed by the code is to find out the number of grid files to be read and the number of processors to be used in solving the problem. This is done in subroutine read_map, which reads this information from the file called mpi.in; the contents of mpi.in are shown in Figure 3-2. The second task performed by the code is to read the grid files. This is done in subroutine read_data. Flow field variables (viz. u, v) and boundary conditions are initialized next. In case the grids are located on different processors, the boundary conditions are communicated between the grids by using MPI broadcasts. Before the code starts the calculations, it checks whether any restart files are present. This is done by subroutine read_restart. Restart files mainly contain the u, v and p values at each of the grid points from the previous run.


[Flowchart boxes from Figure 3-1: read_map (find the number of zones); read_data (read the grid files and load the grid arrays); init_flowfield, update_real_bc, update_ghost_bc and MPI broadcasts (initialize the flow field and boundary conditions); read_restart (read restart files, if any); then, looping over iterations and zones, calc_flowfield, cal_property, cal_u, cal_v, cont, tdma and quick calculate u, v and p, the boundary conditions are updated and broadcast, and the residuals are printed; once the iteration count exceeds maxit, write_restart writes the solution and restart files (vector and contour output), and the run ends.]

Figure 3-1 Flowchart depicting the working of GHOST [53]

Figure 3-2 Contents of the file mpi.in


The flow calculations mentioned in the previous paragraph are carried out in subroutine calc_flowfield. This subroutine initially calculates the variables that are necessary to solve the momentum and continuity equations. These equations are actually solved in subroutine cal_property. The next task performed by the code is to update the u velocity by solving the x-momentum equation. This is done in subroutine cal_u. Subroutine quick first applies the QUICK scheme and calculates the required parameters, and subroutine tdma then solves the tri-diagonal matrix that is formed, using the standard TDMA method. In a similar way, the y-momentum equation is solved to get the v velocity field (subroutines cal_v, quick, tdma). Once the x and y momentum equations have been solved and the velocity field has been obtained, the pressure field is extracted from the continuity equation using the Rhie and Chow technique [60] (subroutines cont, quick, tdma). If the energy equation is switched on, an additional subroutine is used to calculate the temperature field (subroutine cal_t). If the turbulence model is switched on, two additional subroutines are used to calculate the eddy viscosity (subroutines cal_tk, cal_ed). Once all the required flow field variables have been calculated, the code broadcasts these values to all the processors, since they are required to calculate and update the values at the boundary points. After the values at the boundary points are calculated, the code once again broadcasts the newly calculated values. It then calculates the residuals and prints them. If the solution is converged, the solution is written to a file; if not, the code starts the next iteration, continuing until either the solution converges or the iteration number exceeds the maxit value in the code. A summary of what each subroutine does is shown in Table 3-1.

Table 3-1 Summary of subroutines in the original version of GHOST

Subroutine      Function
cal_u           Solves the x-momentum equation to update the u-velocity field.
cal_v           Solves the y-momentum equation to update the v-velocity field.
cont            Extracts the pressure field from the continuity equation.
tdma            Tri-diagonal matrix solver.
cal_property    Computes the values of variables necessary to solve the governing equations.
quick           Implements the QUICK scheme to determine the advective fluxes.
cal_t           Solves the energy equation to calculate the temperature field.
cal_tk          Computes the turbulent kinetic energy (k).
cal_ed          Computes the energy dissipation rate (ε).
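Since the thesis does not reproduce the tdma routine itself, the sketch below is only a minimal, self-contained Fortran illustration of the standard TDMA (Thomas) algorithm that such a routine uses; the interface, the array names (a, b, c for the sub-, main and super-diagonals, d for the right-hand side) and the tiny test system are assumptions, not GHOST's actual code.

      program tdma_demo
         implicit none
         real :: a(3), b(3), c(3), d(3), x(3)
         a = (/ 0.0, -1.0, -1.0 /)       ! sub-diagonal (a(1) unused)
         b = (/ 2.0,  2.0,  2.0 /)       ! main diagonal
         c = (/-1.0, -1.0,  0.0 /)       ! super-diagonal (c(3) unused)
         d = (/ 1.0,  0.0,  1.0 /)       ! right-hand side
         call tdma_sketch(3, a, b, c, d, x)
         print *, x                      ! expected solution: 1.0 1.0 1.0
      end program tdma_demo

      subroutine tdma_sketch(n, a, b, c, d, x)
         implicit none
         integer, intent(in) :: n
         real, intent(in)  :: a(n), b(n), c(n), d(n)
         real, intent(out) :: x(n)
         real :: cp(n), dp(n), m
         integer :: i
         cp(1) = c(1)/b(1)
         dp(1) = d(1)/b(1)
         do i = 2, n                     ! forward elimination
            m = b(i) - a(i)*cp(i-1)
            cp(i) = c(i)/m
            dp(i) = (d(i) - a(i)*dp(i-1))/m
         end do
         x(n) = dp(n)
         do i = n-1, 1, -1               ! back substitution
            x(i) = dp(i) - cp(i)*x(i+1)
         end do
      end subroutine tdma_sketch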

3.1.2 GOVERNING EQUATIONS The governing equations for unsteady incompressible viscous flow, under the assumption of no body force and no heat transfer, that are used to calculate the various flow field parameters in GHOST are as follows:

Conservation of Mass

$$\frac{\partial}{\partial t}\int_V \rho \, dV + \oint_S \rho u_i n_i \, dS = 0 ; \qquad (3\text{-}2)$$

Conservation of Momentum

$$\frac{\partial}{\partial t}\int_V \rho u_j \, dV + \oint_S \rho u_j u_i n_i \, dS = -\oint_S p\, n_j \, dS + \oint_S \tau_{ij} n_i \, dS ; \qquad (3\text{-}3)$$

Conservation of Energy

$$\frac{\partial}{\partial t}\int_V \rho E \, dV + \oint_S \rho E u_i n_i \, dS = -\oint_S p\, u_j n_j \, dS + \oint_S \tau_{ij} u_j n_i \, dS ; \qquad (3\text{-}4)$$

where $\rho$ is the density, $p$ is the pressure, $u_i$ are the components of the velocity vector, $n_i$ is the unit normal vector of the interface, $\tau_{ij}$ is the shear stress tensor, and the specific energy is $E = e + \tfrac{1}{2}(u^2 + v^2 + w^2)$.

3.1.3 CALCULATION AT ARTIFICIAL BOUNDARIES GHOST uses the chimera overset grid method [63] to carry out parallel computations. This technique handles the calculations at the artificial boundaries that are formed when the grid is split into smaller blocks for parallel computation. This section briefly explains the technique. Figure 3-3 shows a grid that has been split into two halves for the sake of performing parallel computations. A magnified view of the region of overlap is shown below the actual grid. As can be seen in Figure 3-3, four grid points from each zone are overlapped. The last two overlapped grid points of each zone are referred to as “Ghost Points”.


No calculations are carried out at the Ghost Points. The number of Ghost Points required depends upon the order of accuracy of the code. GHOST is second-order accurate, so it uses information from two grid points in each direction surrounding a grid point when calculating the diffusive and convective fluxes; hence it requires two Ghost Points at the artificial boundaries. Assume that these two zones are located on two separate nodes of a parallel computer. For the first iteration, the code performs the calculation on each of the nodes simultaneously using the initial values and the real boundary conditions. There is no transfer of data between the nodes during this stage (this capability has been used to implement internal blocking [53]). At the end of the first iteration, the boundary information is transferred between the nodes using MPI communication. For a laminar case with no heat generation, the values of the velocities (u, v) and the pressure (p) are swapped between the nodes. The values at the boundary points of zone 1, i.e. Wn-1 and Wn, are passed on to the ghost points of zone 2, i.e. E0 and E1, and the same is done between the ghost points of zone 1 and the boundary points of zone 2. During the second iteration, the ghost point values are used to calculate the values at the boundary points. The same process is followed at the end of each iteration. A sketch of this kind of exchange is shown below.
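The following is a hypothetical, minimal sketch of the boundary/ghost-point swap just described, written for a 1-D strip with two ghost points per side and exactly two MPI ranks; the variable names, the use of MPI_Sendrecv and the problem size are assumptions made for illustration and do not reproduce GHOST's actual communication routines.

      program ghost_exchange_demo
         implicit none
         include 'mpif.h'
         integer, parameter :: ni = 10, ng = 2
         real :: u(1-ng:ni+ng)
         integer :: rank, nprocs, other, ierr
         integer :: status(MPI_STATUS_SIZE)
         call MPI_Init(ierr)
         call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
         call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
         u = real(rank + 1)                  ! dummy interior data
         if (nprocs == 2) then
            other = 1 - rank
            if (rank == 0) then
               ! send my last two real points, receive the neighbour's first
               ! two real points into my right-hand ghost points
               call MPI_Sendrecv(u(ni-1), 2, MPI_REAL, other, 0,          &
                                 u(ni+1), 2, MPI_REAL, other, 0,          &
                                 MPI_COMM_WORLD, status, ierr)
            else
               ! send my first two real points, receive the neighbour's last
               ! two real points into my left-hand ghost points
               call MPI_Sendrecv(u(1),    2, MPI_REAL, other, 0,          &
                                 u(1-ng), 2, MPI_REAL, other, 0,          &
                                 MPI_COMM_WORLD, status, ierr)
            end if
         end if
         call MPI_Finalize(ierr)
      end program ghost_exchange_demo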

3.2 COMPUTATIONAL GRID

3.2.1 FINITE VOLUME METHOD GHOST uses the finite volume method to solve the governing flow equations. This section presents a brief introduction to this technique. “Finite volume” refers to the small control volume surrounding each node point on a mesh. The governing integral equations are enforced on this control volume. A typical control volume (CV), along with the notation used, is shown in Figure 3-4. The control volume consists of four faces, denoted by lower-case letters (n, e, w, s) corresponding to their location with respect to the central node C. Adjacent nodes are denoted by upper-case letters (i.e. N, E, W, S). The values of the flow variables are calculated and stored at the cell centers, i.e. the nodes. The vertices around the central node C are denoted by lower-case letters (ne, nw, sw, se). The values of the flow variables at the vertices are calculated by taking the weighted average of the values at the nodes (N, E, W and S) surrounding the vertex.


Figure 3-3 Illustration of artificial boundaries


Figure 3-4 A grid in generalized coordinate system [64]

3.2.2 GRID FILES Since GHOST works in a generalized coordinate system, it requires a lot of grid data apart from just the x, y co-ordinates of the grid points. This data is generated by the code g.f90. The contents of the grid file generated by this code for a non-moving grid are as follows:
- number of ghost points,
- grid point weight in the x and y directions,
- x and y co-ordinates of the grid points,
- volume of the cell surrounding each grid point,
- distance between the wall and the grid point,
- values of the various transformation functions such as ηx, ηy, ξx and ξy,
- a variable called “inx” which specifies whether a particular grid point is a ghost point or not; if the value of inx for a grid point is 1, that grid point is treated as a ghost point, whereas if it is zero, it is treated as a normal point,
- boundary conditions.
GHOST reads all these flow and geometry data of the computational domain from the grid files and stores them in a single structure consisting of various arrays, one for each of the above mentioned items.


3.2.3 DESCRIPTION OF INPUT FILE As mentioned in the previous section, g.f90 is used to generate the grid data required by GHOST. In order for g.f90 to generate a grid, it requires certain data regarding the size of the computational grid, the boundary conditions and the number of grid points. This data is provided using a file called “input”. An input file to generate a 200x200 grid (split into 4 blocks) is shown in Figure 3-5, with explanations inline. Under the column “Patch Zone Number”, the value in the row labeled right (in zone 1) is 2, which means that zone 2 is to the right of zone 1. Similarly, in the row labeled left in zone 2, the value specified is 1, which means that zone 1 is to the left of zone 2. The input file to generate a single-zone 100 x 100 grid is shown in Figure 3-6 and is simpler than the one for multiple zones.

/*number_of_zone 4
/* zone_number 1
quadratic 100 100 4 2   (!Type of grid, No. of points in x direction, y direction, No. of Boundary Conditions, No. of Ghost Points)
0.00 0.00               (!Center co-ordinates)
0.00 0.50 0.50 0.00     (!X co-ordinates of corners of the grid)
0.00 0.00 0.50 0.50     (!Y co-ordinates of corners of the grid)
0.00 0.00 0.00 0.00     (!Wall co-ordinates)
0.99 1.0                (!Ratio to specify the grid density.)
[ If < 1 (e.g. 0.99) then the grid density INCREASES from left to right, ]
[ If > 1 (e.g. 1.04) then the grid density DECREASES from left to right. ]
(! Boundary Conditions: Type of Boundary, Relative Position, Patch zone number)
* wall left 1 1 1 99999 0
* wall top -99999 100000 99999 99999 0
* patch right 99999 99999 1 99999 2
* wall bottom -99999 100000 1 1 0
/* zone_number 2
quadratic 100 100 4 2
0.00 0.00
0.50 1.00 1.00 0.50
0.00 0.00 0.50 0.50
0.00 0.00 0.00 0.00
1.0 1.0
* patch left 1 1 1 99999 1
* wall top -99999 100000 99999 99999 0
* wall right 99999 99999 1 99999 0
* wall bottom -99999 100000 1 1 0
/* zone_number 3
quadratic 100 100 4 2
0.00 0.00
0.50 1.00 1.00 0.50
0.50 0.50 1.00 1.00
0.00 0.00 0.00 0.00
1.0 1.0
* patch left 1 1 1 99999 4
m inlet top -99999 100000 99999 99999 0
* wall right 99999 99999 1 99999 0
* patch bottom -99999 100000 1 1 2
/* zone_number 4
quadratic 100 100 4 2
0.00 0.00
0.00 0.50 0.50 0.00
0.50 0.50 1.00 1.00
0.00 0.00 0.00 0.00
1.0 1.0
* wall left 1 1 1 99999 0
m inlet top -99999 100000 99999 99999 0
* patch right 99999 99999 1 99999 3
* patch bottom -99999 100000 1 1 1
/end

Figure 3-5 Description of input file for a 4 zone grid

/*number_of_zone 1
/* zone_number 1
quadratic 100 100 4 2
0.00 0.00
0.00 1.00 1.00 0.00
0.00 0.00 0.50 0.50
0.00 0.00 0.00 0.00
0.99 1.0
* wall left 1 1 1 99999 0
m inlet top -99999 100000 99999 99999 0
* wall right 99999 99999 1 99999 0
* wall bottom -99999 100000 1 1 0
/end

Figure 3-6 Description of input file for a 1 zone grid

3.3 COMPILERS and MPI ENVIRONMENT The Intel FORTRAN compiler and the g95 FORTRAN compiler were used to compile the code for this work. Since GHOST is an MPI-based code, an MPI environment has to be installed on the machines for it to be compiled and run; LAM/MPI has been used for this purpose. The Intel compilers used in the present work are the Intel FORTRAN Compiler Ver. 7.1 (ifc) and the Intel FORTRAN Compiler Ver. 9 (ifort).


LAM/MPI [65] was originally developed at the Ohio Supercomputing Center. It is a high-quality implementation of the Message Passing Interface (MPI) standard. LAM/MPI provides high performance on a variety of platforms, from small off-the-shelf single-CPU clusters to large SMP machines with high-speed networks. In addition to high performance, LAM provides a number of usability features key to developing large-scale MPI applications. MPICH is a freely available, portable implementation of MPI and is used on KFC6A (described later).

3.4 PROFILING TOOLS In recent years, with the advent of memory debuggers and profilers, it has become relatively easy to identify bottlenecks in a given code. These kinds of tools are particularly useful when working on the performance optimization of CFD codes. Valgrind [61] is one such memory debugger and profiler and has been used in this work. With Valgrind's tool suite, a programmer can automatically detect many memory management and threading bugs, making programs more stable. Detailed profiling can also be done to help speed up programs. Valgrind is an open source tool and does not require the user to recompile, relink, or modify the source code. On the other hand, it has the disadvantage of a slower runtime; this is usually justified by the time that is saved once the code is optimized. When tuning a code, it is advisable to optimize the largest bottleneck first. With the help of Valgrind, one can identify how much time is being spent in each of the subroutines, as shown in the example in Table 3-2 below:

Table 3-2 Illustration of Valgrind output PROCEDURE TIME

main() 13%

subroutine1 17%

subroutine2 20%

subroutine3 50%


Profiling output from Valgrind, like that shown in Table 3-2, provides the researcher with a good starting point. In the above example, if the tuning effort is begun with subroutine3, a 20% decrease in its time will yield an overall reduction in runtime of about 10%, while a 20% decrease in main() will yield only an overall 2.6% improvement; the arithmetic behind these figures is shown below. Thus, Valgrind provides programmers and scientists with a good starting point and an overall map for the tuning effort. Some of the benefits associated with Valgrind are that it:
- uses dynamic binary translation, so that modification, recompilation or relinking of the source code is not necessary;
- debugs and profiles large and complex codes;
- can be used on any kind of code written in any language;
- works with the entire code, including the libraries;
- can be used with other tools, such as GDB;
- serves as a platform for writing and testing new debugging tools.
The Valgrind suite comprises five major tools, Memcheck, Addrcheck, Cachegrind, Massif, and Helgrind, which are tightly integrated into the Valgrind core. Memcheck checks for the use of uninitialized memory and checks all memory reads and writes. All calls to malloc, free and delete are instrumented when Memcheck is run. It reports each error immediately as it happens, with the line number in the source code if possible, and the function stack trace tells how the error line was reached. Addressability is tracked at the byte level and the initialization of values at the bit level. This helps Valgrind detect the non-initialization of even a single unused bit and not report spurious errors on bitfield operations. The drawback of Memcheck is that it makes the program run 10 to 30 times slower than normal. Addrcheck is a toned-down version of Memcheck. Unlike Memcheck, it does not check for uninitialized data, which leads to Addrcheck detecting fewer errors than Memcheck. On the brighter side, it runs approximately twice as fast (5 to 20 times slower than normal) and uses less memory. This allows programs to run for a longer time and cover more test scenarios. In summary, Addrcheck should be run to locate major memory bugs, while Memcheck should be used to do a thorough analysis.
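Returning to the Table 3-2 example, the figures quoted above follow from the fraction of total runtime each routine occupies. If a routine accounts for a fraction f of the runtime and is made 20% faster, the new runtime relative to the old one is

$$\frac{T_{new}}{T_{old}} = (1 - f) + 0.8\,f .$$

For subroutine3, f = 0.5 gives (1 - 0.5) + 0.8 x 0.5 = 0.90, i.e. roughly a 10% reduction in runtime; for main(), f = 0.13 gives (1 - 0.13) + 0.8 x 0.13 = 0.974, i.e. about a 2.6% reduction.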


Massif is a heap profiler. The detailed heap profiling is done by taking snapshots of the program’s heap. It produces a graph showing heap usage over time and also provides information about the parts of the code that are responsible for the most memory allocations. The graph is complemented by a text or HTML file that includes information for determining where the most memory is being allocated. Massif makes the program run approximately 20 times slower than normal. Helgrind is a thread debugger that finds data races in multithreaded codes. It searches for memory locations which are accessed by more than one thread but for which no consistently used lock can be found. Such locations indicate a loss of synchronization between threads and could potentially cause timing-dependent problems.

3.4.1 CACHEGRIND Cachegrind is a cache profiler. It performs a detailed simulation of the I1, D1, and L2 caches in a CPU and helps in accurately pinpointing the sources of cache misses in the source code. It provides the number of cache misses, memory references, and instructions executed for each line of source code, and it also provides per-function, per-module, and whole-program summaries. Programs run approximately 20 to 100 times slower than their normal run times. With the help of the KCachegrind [62] visualization tool, these profiling results can be viewed in a graphical form which is easier to comprehend. Cachegrind has been used extensively in this study. Once the code is compiled and the executable file is generated, Cachegrind can be run on the executable file to analyze the cache behavior of the code. If, for example, tempsstnt is the executable for the GHOST code, cache analysis using Cachegrind is done at the command prompt as shown below:

$ mpirun -np n valgrind --skin=cachegrind tempsstnt

where n = number of processors. A sample output of the above command is shown in Figure 3-7. In addition to the output shown there, Cachegrind also provides the number of cache misses (both instruction and data), instruction references, and data references at the subroutine level. This information is critical in understanding how a particular change in the code translates to a change in cache behavior at the subroutine level. This is explained in detail in chapter 4.

51

Figure 3-7 Sample output from cachegrind
In Figure 3-7:
I refs = instructions executed
I1 misses = instruction read misses in L1 cache memory
L2i misses = instruction read misses in L2 cache memory
I1 miss rate = instruction miss rate on L1 cache memory
L2i miss rate = instruction miss rate on L2 cache memory
D refs = sum of data cache reads (i.e., memory reads) and data cache writes (i.e., memory writes)
D1 misses = data misses on L1 cache memory (D1 misses = sum of L1 data read and L1 data write misses)
L2d misses = data misses on L2 cache memory (L2d misses = sum of L2 data read and L2 data write misses)
L2 refs = number of references to L2 cache (L2 refs = sum of L2 data read and L2 data write references)
Valgrind's cache profiling has a number of shortcomings [61]:
- It does not account for kernel activity -- the effect of system calls on the cache contents is ignored.
- It does not account for other process activity (although this is probably desirable when considering a single program).
- It does not account for cache misses that are not visible at the instruction level, e.g. those arising from TLB misses, or speculative execution.
- Valgrind's custom threads implementation will schedule threads differently to the standard one. This could warp the results for threaded programs. This should only happen rarely.
- FPU instructions with data sizes of 28 and 108 bytes (e.g. fsave) are treated as though they only access 16 bytes. These instructions seem to be rare so hopefully this will not affect accuracy much.

3.5 KENTUCKY FLUID CLUSTERS This section gives brief technical configuration information about the different clusters on which the performance optimization effort has been carried out. These clusters are housed at the Department of Mechanical Engineering, University of Kentucky. The current work is focused on optimizing the CFD code GHOST on Kentucky Fluid Clusters 3, 4, 5, 6A and 6I. Kentucky Fluid Clusters 3, 4 and 5 (KFC3, KFC4 and KFC5) are shown in Figure 3-8. The Intel FORTRAN90 compiler (ifort) with -O3 optimization, the G95 compiler also with -O3 optimization, LAM MPI, and MPICH were used for compiling GHOST for this study. Since these clusters are controlled in-house, nodes can readily be restricted to a single job at a time; as such, the difference between the CPU time and the walltime has proven negligible, so walltime is used as the basis of the testing. Time values also exclude I/O. Kentucky Fluid Cluster 3 consists of fifteen 2.4 GHz Pentium 4 nodes and one 3.0 GHz Pentium 4 server/node linked by a 16-port commodity Gigabit switch. Each node has 512 MB of RAM (2 GB on the server) and each processor has an L2 cache of 512 KB. KFC3 nodes are off-the-shelf Dell PCs, resulting in a minimum of hardware construction in exchange for a higher per-node cost.


Figure 3-8 Kentucky Fluid Clusters (KFC) 3, 4 and 5

Kentucky Fluid Cluster 4 is constructed with AMD Athlon 2500+ 1.826 GHz 32-bit Barton processors. The current configuration is a 47-node system linked by two networks: a single Fast Ethernet (100 Mb/s) switch and a single Gigabit (1 Gb/s) switch. Each node has 512 MB of RAM and each processor has an L2 cache of 512 KB. The server is separate from the nodes and plays no direct role in the iterative computation. KFC4 is housed at the University of Kentucky. Kentucky Fluid Cluster 5 is a 64-bit architecture, constructed of 47 AMD64 2.08 GHz processors linked by a single Gigabit (1 Gb/s) switch. Each node has 512 MB of RAM and each processor has an L2 cache of 512 KB. The server is separate from the nodes and plays no direct role in the iterative computation. Like KFC4, KFC5 is housed at the University of Kentucky. Kentucky Fluid Clusters 6A and 6I are two similar clusters based on dual-core processors. Each node has 1 GB of memory and the nodes are linked by a Gigabit switch.


KFC6A has 23 AMD Athlon 64 X2 4600+ dual-core processors running at 2.4 GHz, each with 2 x 512 KB of L2 cache. KFC6I has 24 Intel Core 2 Duo E6400 processors running at 2.13 GHz with 2 MB of L2 cache. In addition, as part of the cluster design process, a single workstation with an AMD Athlon 64 X2 4200+ dual-core processor was constructed and used for testing. The critical difference between the 4200+ and the 4600+ is a lower processor speed (2.2 GHz). Details of these processors that are relevant to the current work are given in Table 3-3.

Table 3-3 Comparison of the KFC6 processors based on certain parameters

Processor       Clock Speed   L1 Cache Size   L2 Cache Size   FSB
Intel E6400     2.13 GHz      2 x 32 KB       1 x 2 MB        1066 MHz
AMD 4200+       2.2 GHz       2 x 128 KB      2 x 512 KB      2000 MHz
AMD 4600+       2.4 GHz       2 x 128 KB      2 x 512 KB      2000 MHz

3.6 METHODS USED TO MEASURE PERFORMANCE
In the computing world, two common ways of measuring the time taken by a code are referred to as "wall clock time" and "CPU time" [66]. Wall clock time is the elapsed real time, as read from a clock on the wall, from the start of a run to its completion. CPU time is the amount of time spent by the CPU in carrying out the calculations. CPU time excludes time for events such as passing data across the network, I/O, CPU interrupts, and processing TCP packets, all of which affect the total time taken to complete the job. Hence CPU time can miss critical time costs for someone doing CFD runs on parallel systems. CPU time can be obtained by calling the FORTRAN cpu_time intrinsic. In the present work, wall clock time is used to measure the code performance improvements, even though all tests were run on a single node. All tests were carried out on clusters that are controlled in-house, so care was taken to ensure that only a single job was running on the node during the timing tests. Based on the initial tests that were conducted, the difference between the walltime and the CPU time was found to be minimal under these conditions.


Walltime is the sum of the time taken by the code to read the grid files and write the solution (I/O time) and the time taken to complete the calculations (solver time). Since this work is concerned only with optimization of the solver portion of the job, only the solver time was considered when calculating the performance improvements; I/O time was not included in the walltime. A reference to walltime from this point on therefore means walltime excluding I/O. Walltime is calculated using the mpi_wtime() function, which returns a floating-point number of seconds representing elapsed wall clock time since some time in the past. This function is called at appropriate places in the code, and the walltime is computed as the difference between two consecutive calls. For a given problem and code, walltime is a function of the grid size and the number of iterations. If the number of iterations is kept constant and the grid size is increased, the walltime will increase too. In order to compare the performance improvements obtained with varying grid sizes, the walltime has been normalized by the grid size and the number of iterations. Hence the walltime used to measure the performance improvement is approximately the wall clock time of the code to perform a single iteration on a single grid point. This is further explained in detail in chapter 4.
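The timing and normalization described above can be sketched in a few lines of FORTRAN90. The minimal, self-contained program below is an illustration only: the names (timing_sketch, ni, nj, niter) are illustrative, and the dummy work loop merely stands in for the GHOST solver iterations.

   program timing_sketch
      use mpi
      implicit none
      integer, parameter :: ni = 600, nj = 600, niter = 5000
      integer :: ierr, iter
      double precision :: t_start, t_end, walltime, walltime_norm
      double precision :: cpu0, cpu1, work

      call mpi_init(ierr)
      work = 0.0d0

      call cpu_time(cpu0)               ! CPU time, for comparison only
      t_start = mpi_wtime()             ! wall clock time, taken after grid I/O
      do iter = 1, niter
         work = work + 1.0d0            ! stands in for one solver iteration
      end do
      t_end = mpi_wtime()
      call cpu_time(cpu1)

      walltime      = t_end - t_start                        ! solver walltime, excluding I/O
      walltime_norm = walltime / (dble(ni)*dble(nj)*niter)   ! per grid point, per iteration
      print *, 'walltime =', walltime, '  cpu time =', cpu1 - cpu0
      print *, 'normalized walltime =', walltime_norm, ' (work =', work, ')'
      call mpi_finalize(ierr)
   end program timing_sketch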

3.7 EXTERNAL BLOCKING
This technique has been widely used to carry out the multi-node performance tests that are described in chapter 4. External blocking involves breaking the computational grid into smaller, cache-friendly blocks (Figure 3-9) so that these blocks can be solved on more than one node instead of solving the entire grid on one node. This step is carried out during the grid generation process. The only difference between generating a grid with a single block and one with multiple blocks is that the content of the input file has to reflect the block layout in either case. This has been described in section 3.2.3.

3.8 CHARACTERISTICS OF ORIGINAL CODE
Before we delve into the details of the tuning process (presented in chapter 4), it is important to understand the characteristics of the original version of the GHOST code (V0). In this section, the performance behavior of V0 is presented with reference to the


cavity flow problem as the test case. The behavior might vary with hardware and the type of computational problem, but the relative results are consistent if a sufficiently large grid and a large number of iterations are used. As described in earlier sections of this chapter, the original version of GHOST was designed to minimize memory usage, and this is accomplished through extensive use of the allocation and de-allocation of variables in FORTRAN90. As with many numerically intensive codes, GHOST is no exception to the 80-20 rule: more than 98 percent of the computing time is spent in six subroutines for a reasonable grid size under laminar flow conditions. The approximate computing time spent in each subroutine is shown in Table 3-4. These subroutines represent the sub-iteration cycle for a two-dimensional, laminar flow.

Table 3-4 Approximate percentage of time spent in each subroutine in V0 for a 2-D cavity laminar flow

module::subroutine     % of total computing time
calc::cont             40%
calc::tdma             20%
calc::cal_property     19%
Global::quick           7%
calc::cal_v             6%
calc::cal_u             5%



Figure 3-9 External blocking

3.8.1 DETAILS OF CRITICAL SUBROUTINES
This section presents important details of the subroutines listed in Table 3-4 as they appear in V0. Although there is no direct correlation between the size of a code and the time it takes to run, details of the size of each subroutine are included in the discussion for clarity.


• Subroutine cont: Out of the 5300 lines of V0, this subroutine is no more than 240 lines long but accounts for 40% of the total computation time, as presented in Table 3-4; i.e., approximately 4.5% of the code takes up 40% of the computation time. The number of calls to this subroutine is equal to the number of iterations. This subroutine has 8 nested loops that span the dimensions of the grid. The following details have been observed in these nested loops:
o 5 out of the 8 nested loops have their sweeps in i,j order (outer loop i, inner loop j). However, this is not the physical order in memory according to the FORTRAN convention discussed in chapter 1. For example, in the nested loop presented in Figure 3-10, consecutively accessed elements of the array au are separated by an entire grid dimension in memory rather than being adjacent, so cache misses occur not only on the first loads of au; the mismatch between the order of data accesses and the order of data storage in FORTRAN leads to repeated cache misses.

Figure 3-10 Example of mismatch between data access and data storage
o Reciprocal values of the variables au and av are repeatedly calculated inside the i,j sweeps. For example, the reciprocal of au is calculated 26 times in one sweep along the y-direction. As discussed in chapter 1, division operations take considerably more cycles than any other arithmetic operation. Added to this, the fact that these reciprocals are calculated inside nested do loops whose sweep does not coincide with the physical order in memory compounds the problem and leads to unnecessary cycles and heavy cache misses.
• Subroutine tdma: Out of the 5300 lines of V0, this subroutine is no more than 110 lines long but accounts for 20% of the total computation time, as presented in Table 3-4; i.e., approximately 2.0% of the code takes up 20% of the computation time. This subroutine solves the tri-diagonal system of equations using the TDMA method [67].


The high cost of this subroutine is attributed to the fact that calculations like the one shown in Figure 3-11 compel the processor to make repetitive references to elements of the arrays ae, aw, an, as, and ap.

Figure 3-11 Example of repetitive reference to array elements in GHOST
Due to their size, these arrays cannot fit into cache and so lead to repetitive cache misses. An attempt has been made to replace the arrays with arrays of data structures with the aim of addressing this problem. This is discussed in chapter 4. This subroutine also has a nested loop in which the data access does not match the data storage in FORTRAN. The orientation of the i,j loop cannot be reversed, as such a change would alter the algorithm of this subroutine. The problem is compounded by the fact that, in every iteration, this subroutine is called by three subroutines, viz. cal_u, cal_v, and cont. This means the total number of calls to this subroutine is three times the number of iterations.
• Subroutine cal_property: Out of the 5300 lines of V0, the part of this subroutine that deals with laminar flow is approximately 150 lines long. It accounts for approximately 19% of the total computation time, as presented in Table 3-4; i.e., approximately 2.8% of the code takes up 19% of the computation time. The high cost of this subroutine can be attributed to the fact that, out of the three nested do loops that span the dimensions of the grid, two have their sweeps in i,j order. However, this is not the physical order in memory according to the FORTRAN convention discussed in chapter 1. This leads to repetitive cache misses, as explained above for subroutine cont.
• Subroutines quick, cal_u and cal_v: These subroutines are mainly plagued by the i,j loop orientation in their nested loops. Subroutines cal_u and cal_v call subroutine quick and subroutine tdma, as discussed above.


3.9 SUMMARY
GHOST is a generalized 2D incompressible structured CFD solver with the ability to perform computations in parallel using an MPI environment. The single-node performance of the code was found to be poor (as described in the next chapter) due to the high cache miss percentage when it was tested on the in-house clusters. The top six subroutines, which contribute 98% of the run time, were identified, and the starting point was to tune these subroutines. The next chapter presents various techniques that improved single-node performance as well as speedup across multiple nodes. The single-node performance was mainly improved by focusing on reducing cache misses.

Copyright © Pavan K Kristipati, 2008


CHAPTER - 4

4. STAGE ONE PERFORMANCE TUNING RESULTS

This chapter provides a detailed description of stage one of the tuning efforts carried out on the GHOST code. Stage one was carried out until December 2004 and was primarily focused on KFC3 and KFC4. Later, techniques that yielded performance improvements on KFC3 and KFC4 were tested on the KFC6 architectures. This effort started around September 2008 and falls into stage two of the tuning efforts. Although the results on the KFC6 architectures belong to stage two, they are presented here alongside those on KFC3 and KFC4 for comparison and so that the impact of applying these tuning techniques on both older and newer architectures can be understood. This chapter begins with a description of the types of tests conducted during the tuning effort. The test case is then presented. Later, results from applying some of the tuning techniques discussed in chapter 2 are presented. Changes in the cache behavior of the code are discussed along with the details of the performance improvements.

4.1 TYPES OF TESTS
This section describes the kinds of tests that were carried out. They are briefly summarized below:

• Single Node Performance Tests: In the current work, the tuning effort was carried out in stages. To assess the impact of changes to GHOST at each stage, walltime, calculated using UNIX functions, is compared between the earlier version and the most tuned version at that point. For assessment purposes, walltime is effectively the average time for a single iteration over a 5000-iteration simulation, and it is normalized to eliminate the effect of increasing walltime with increasing grid size. These tests were carried out on KFC3, KFC4, KFC5, KFC6I and KFC6A. These commodity clusters were described in chapter 3. A wide variety of commodity clusters was chosen so as to be able to analyze the effect of the optimization techniques on the code running on different architectures. Since the optimizations carried out were not focused on input/output


(I/O) operations, the time taken by I/O operations is not included in the walltime; as such, walltime here means walltime excluding I/O.

• Multiple Node Performance Tests: Speedup, defined as the ratio of the walltime on a single node to the walltime on multiple nodes, forms the basis for these tests. The baseline performance test for CFD codes on commodity clusters is the measurement of multinode speedup, which to first order is an evaluation of communication time versus computational time as the number of processor nodes is increased. This is often a critical consideration for CFD simulations on commodity clusters, which use relatively inexpensive networks that can create communication delays if the code is not effectively designed. Speedup tests are performed to understand the scalability of a code across multiple nodes.

• Cache Performance Tests: These tests were done to explore the connection between cache behavior and the overall performance of the code for various grid sizes. The cache miss rate was determined by the cachegrind cache profile simulator, which was described in chapter 3. The focus was on reducing L2 cache misses, as they are more expensive than L1 cache misses and a high L2 cache miss rate has a detrimental effect on the performance of a code. As cache profiling a code takes considerably more time (10 to 100 times longer) than the original simulation, there was a need to extrapolate from a shorter series of iterations. Figure 4-1 shows the L2 cache miss rate versus iterations on KFC4 for the original version of GHOST over a series of grid sizes. As observed for the cavity flow test case, there is an initial transient period, but for a given grid size the L2 cache miss rate begins to level off at around 200 iterations and settles into a near-constant value beyond 400 iterations. This behavior was observed on all the machines. So, to characterize the cache performance of GHOST, cache profiling was done for 500 iterations on all the machines.


Figure 4-1 L2 cache miss rate for GHOST as a function of iterations from a cold start on KFC4
• Accuracy Tests: During the tuning process, although there was no change in the underlying algorithm, there was a fair amount of coding change. To confirm that the simulation results had not been altered, tests were run to verify that the results were in agreement with the solutions obtained by Ghia et al. [66] for the cavity flow problem.

4.2 TEST CASE
The basic flow test case that has been used to carry out the tuning activity is two-dimensional incompressible driven cavity flow, also known as lid-driven cavity flow. This test case was used in the proof-of-concept stage. The cavity is square with a Reynolds number of 1000, a value chosen to ensure the flow is laminar. The test case has a non-dimensional u-velocity of unity and a v-velocity of zero at the top boundary. The walls on the sides and the bottom have no-slip boundary conditions. Stationary interior flow is taken as the initial condition. This test case was selected in part because it represents a real and reasonably complex flow from which performance characteristics can be extrapolated to more challenging cases. At the same time, the simple geometry allows for straightforward partitioning and re-gridding, simplifying the evaluation of performance as the grid density and the number of computational nodes is


varied. A schematic diagram of this test case is shown in Figure 4-2. The results from simulations run to completion with the original version of the GHOST code are in agreement with Ghia et al. [66], as shown in Figure 4-3.

Figure 4-2 Schematic diagram of lid-driven cavity shown with boundary conditions

[Plot: midline u-velocity versus y, comparing Ghia's result with GHOST on a 30 x 30 grid]

Figure 4-3 The midline u-velocity profile for the original version of GHOST

4.3 PERFORMANCE BEHAVIOR OF V0
The initial step of the optimization was to assess the basic picture of the performance of the original version (V0) of the GHOST code based on test simulations. While it is


important for a parallel code to scale across multiple nodes, an often ignored fact is that speedup on n nodes is calculated based on the performance of the code on a single node. If the code takes excessively long to run on a single node, speedup, calculated as the ratio of walltime on a single node to walltime on multiple nodes, can take on superlinear values. Such superlinear speedup was observed with GHOST, a phenomenon hardly unique in the annals of performance evaluation of parallel codes [68, 69, 70, 71]. The results of speedup tests on KFC3 and KFC4 are presented in Figure 4-4a, while the results of speedup tests on KFC6A and KFC6I are presented in Figure 4-4b. For the speedup tests, the walltime on a single node was measured for various single-block grids ranging from 200 x 200 to 1000 x 1000. Next, these grids were split into 4, 8, 12, and 16 blocks (external blocking, explained in chapter 3) and the walltime was measured by solving them on the same number of nodes as blocks. Speedup is then calculated as the ratio of the walltime on a single node to the walltime on n nodes. Ideally, the speedup should be linear. From Figures 4-4a and b, it can be observed that despite the differences in the year of construction (KFC3 in 2003 through KFC6 in 2006) and the disparities in hardware (for example, KFC6A has an AMD processor while KFC6I has an Intel processor) and networks (relatively fast on KFC3, relatively slow on KFC4), GHOST exhibits dramatically superlinear speedup across a range of problem sizes on all platforms. This behavior is less pronounced on the newer machines KFC6I and KFC6A due to their more advanced hardware. To further examine this dataset, the normalized walltime (walltime normalized per grid point and per iteration) is plotted against computational grids of varying sizes. If the code had been running efficiently on a single node, the normalized walltime would have increased as we moved from grids that fit into cache to grids considerably larger than cache, and beyond that the normalized walltime would have remained essentially the same even as the grid size increased. However, this is not what was observed. The normalized walltime for the cavity flow test case over total grid sizes ranging from 30 x 30 to 700 x 700 on five different platforms is shown in Figure 4-5.


Figure 4-4a Original speedup of GHOST on KFC3, KFC4 for grids of varying size

Figure 4-4b Original speedup of GHOST on KFC6A, KFC6I for grids of varying size


Figure 4-5 Original walltime of GHOST

In Figure 4-5, the walltime is the walltime required to run 5000 timesteps on a dedicated node, normalized by the number of grid points. For laminar, steady simulations, the smallest grid, 30 x 30, is effectively contained within the L2 cache on all the machines; the largest occupies a majority of the available RAM (512 MB) on KFC3 and KFC4. As suggested by the cavity test problem, all the grids used had an equal number of points in i and j, a convention that is used throughout the GHOST analysis. It can be observed that the normalized walltime keeps increasing with increasing grid size. This trend is observed irrespective of the architecture of the machine. On the older machines KFC3 (2003) and KFC4 (2004) the slope of the line is steeper, while on the newer machines KFC5 (2005), KFC6A (2006) and KFC6I (2006) the line appears to flatten out. This is because of faster processors, larger caches (a 2 MB L2 cache on KFC6I and 2 x 512 KB on KFC6A), and a faster Front Side Bus (FSB) that controls the speed of data transfer from RAM to cache. The increase in walltime with increasing grid size can be attributed to high cache misses, as is evident from Figure 4-6, which presents the external blocking (introduced in chapter 3) results on KFC3 for block sizes ranging from 30 x 30 (which effectively fits in


the L2 cache) to 70 x 70, along with the results for unblocked grids. The best performance improvement was obtained with the 30 x 30 block grid. As explained above, this is because a 30 x 30 grid can effectively fit into cache. In the case of the 600 x 600 grid, the walltime for the 30 x 30 block grid is 1/6 that of the unblocked code on KFC3. This is because fitting the computational problem into cache can largely hide a variety of other program flaws that cause the problem of increasing walltime with increasing grid size shown in Figure 4-5. In effect, 30 x 30 external blocking extends the normalized computational speed of the 30 x 30 single grid to much larger grids. The blocked performance is also highly scalable, remaining constant over a wide range of grid sizes. But, in general, the size of the grid on the node appears to be the dominant determinant of the walltime required for the computation in the original version of GHOST. In order to more deeply explore the connection between cache performance and the above grid dependence, the cache miss rate was measured by the 'cachegrind' cache profile simulator of the Valgrind toolkit. This tool was introduced in chapter 3. As discussed earlier (Figure 4-1), the cache miss rate begins to asymptote beyond about 400 iterations for a given grid size. Accepting this asymptotic miss rate as typical for that grid size, a plot of the miss rate versus grid size was generated. These results are displayed in Figure 4-7, which shows the data cache miss rate (L2D) and the overall (data + instruction, L2) miss rate of the L2 cache for GHOST on KFC4 and KFC3. The instruction cache miss rate was negligible for all cases, so the cache variations in performance are dominated by the L2D characteristics. There is also little difference between the Intel Pentium processor on KFC3 and the Athlon processor on KFC4. In order to explore the correlation between the size of the problem, the walltime behavior, and the cache miss rate, the memory footprint of the grid was determined by the cluster toolkit Warewulf [72]. The memory footprint corresponds to the amount of random-access memory required by the code for a particular grid size. The GHOST cache miss rate varies significantly with memory footprint, generally increasing with increasing grid size but with notable spikes at certain points. The direct correlation between the miss rate and the speedup performance of GHOST on KFC4 can be seen in Figure 4-8. The walltime is normalized by the memory footprint to effectively remove the time increase expected due to increased grid size. The resulting curve largely tracks the L2 miss rate for


GHOST. The implication is that much of the superlinear speedup in GHOST is directly traceable to L2 cache performance, as shrinking the grid size will generally improve cache performance.

Figure 4-6 Walltime as a function of subgrid size (or grid size for a single node case) for GHOST (V0) on KFC3

Figure 4-7 L2 and L2D cache miss rate for GHOST (V0) on KFC3 and KFC4


Figure 4-8 Comparisons of L2 cache miss rate on KFC4 (blue and green lines) and the walltime/MB (lines) versus the RAM footprint of the given grid/subgrid for GHOST (V0)

Although external blocking might seem to be a feasible solution to the problem of increasing walltime per grid point, the amount of time it takes to split the grid into cache-sized blocks puts the programmer at a disadvantage, especially when dealing with larger grids. For example, as presented in Figure 4-6, in order to split the grid into cache-sized blocks, a 600 x 600 grid had to be split into 400 blocks of 30 x 30 each. Although splitting larger grids into smaller blocks that fit into cache yields the best walltime, this process becomes more challenging when dealing with complicated grids with multiple zones and different boundary conditions, and thus cannot be used as a standard process to mitigate the problem of increasing walltime with increasing grid size. However, the superlinear speedup behavior of GHOST presented above, along with the correlation between high cache miss rates and increasing walltime for single-zone grids, represents an opportunity to tune the code on a single node by improving its cache behavior. This made a strong case for the tuning process to target the L2 cache miss rate


with the aims of reducing the miss rate and attempting to achieve a more uniform distribution.

4.4 TUNING PROCESS – CODE VERSIONS
The original version of the GHOST code is referred to as V0. In order to distinguish the various stages of the tuning effort, each stage has been associated with a code version, viz. Version 1 (V1), Version 2 (V2), and Version 3 (V3). The following is the list of steps taken during the tuning process:
1. Replacing the allocation/de-allocation scheme with permanent variables.
2. Correcting the orientation of the i,j sweeps to the cache-conserving form (i.e., outer loop j, inner loop i) consistent with the storage in memory (Loop Interchange).
3. Aggressive cleaning of redundant computations, unnecessary divisions, and other excessive mathematical activity.
4. Removal of unwanted if-then structures, particularly on sweeps that do not encompass the full i,j grid.
5. Restructuring the variables from the single-array form, φ1(i,j) and φ2(i,j), to an array of structures, φ(i,j) = {φ1, φ2}.
Step (1) proved ineffective, resulting in no significant change in either walltime or cache performance, and as such is left out of the subsequent analysis. Steps (2)-(5) are discussed in detail in later sections of this chapter. Based on the above steps, various versions of the GHOST code were constructed and tested. The following versions are discussed in this chapter:
V0 – original code
V1 – original + step 2
V2 – original + steps 2-4
V3 – original + steps 2, 5

4.5 CODE CHANGES AND PERFORMANCE TUNING RESULTS

4.5.1 VERSION 1 OR V1
Version 1 or V1 is the result of the first stage of the tuning effort. It was observed that in the original version of the code, V0, the order (outer loop i, inner loop j) of the majority of the nested i,j loops does not allow the compiler to take advantage of the order of data storage in memory. When dealing with large data sets, a mismatch between data storage and


data access leads to heavy cache miss rates, which in turn leads to the poor performance of V0. V1 is the version of the GHOST code in which the Loop Interchange technique, introduced in chapter 2, was applied to improve V0. The orientation of the i,j sweeps was corrected to the cache-conserving form (i.e., outer loop j, inner loop i) to be consistent with the storage in memory. The subroutines that underwent changes in this stage of tuning are quick, cal_property, cal_u, cal_v, cont, cal_t and cal_tk.
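As a minimal, self-contained illustration of the Loop Interchange step (the array names, bounds, and update statement below are illustrative, not taken verbatim from GHOST):

   program loop_interchange_sketch
      implicit none
      integer, parameter :: ni = 600, nj = 600
      real(8), allocatable :: au(:,:), res(:,:)
      real(8) :: relax
      integer :: i, j
      allocate(au(ni,nj), res(ni,nj))
      au = 0.0d0; res = 1.0d0; relax = 0.5d0

      ! V0-style sweep: outer loop i, inner loop j.  Consecutive inner-loop
      ! accesses to au are a full grid dimension apart in memory, so each
      ! access may touch a different cache line.
      do i = 1, ni
         do j = 1, nj
            au(i,j) = au(i,j) + relax*res(i,j)
         end do
      end do

      ! V1-style sweep after Loop Interchange: outer loop j, inner loop i.
      ! Consecutive accesses are now contiguous (FORTRAN column-major storage).
      do j = 1, nj
         do i = 1, ni
            au(i,j) = au(i,j) + relax*res(i,j)
         end do
      end do

      print *, 'check value:', au(ni/2, nj/2)
   end program loop_interchange_sketch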

4.5.1.1 KFC3 and KFC4 Results
The results of applying the Loop Interchange technique on KFC3 and KFC4 are presented in Figure 4-9. Despite the disparity in hardware and network, the walltime results, normalized per grid point and per iteration, remain relatively flat out to large grid sizes of at least one million grid points. For the largest grid (one million grid points), the walltime for V1 on KFC3 is 25% that of V0, while on KFC4 the performance gains are even more pronounced; for larger grids, V1 is at least 6 times faster than V0. As shown in Figure 4-10, the observed performance gains can be attributed to the reduction in cache misses in V1 when compared to V0. For example, for V1 on KFC4, D1 cache miss rates (data calls that miss the L1 cache are called D1 cache misses) vary from 0.3 (for smaller grids) to 0.5 (for larger grids) of those for V0. The L2D and L2 cache miss rates settle to near-constant values of 1.7 and 0.6, respectively, throughout the problem size range, while these values increased with increasing grid size for V0. Figures 4-11a and b plot the walltime and the number of data calls, D1 cache misses, and L2D cache misses, normalized by the number of grid points, versus grid size for V0 and V1 on KFC4. Note that the cache data is taken from Valgrind simulations ranging from 400 to 600 iterations and extrapolated to 5000 iterations based on the arguments previously presented in conjunction with Figure 4-1. As seen from Figure 4-11b, the walltime improvements can be directly traced to the decrease in D1 cache misses and L2D cache misses when compared to those in V0 (Figure 4-11a). L2D misses largely trace the walltime plot in V1 on KFC4 as in V0, the difference being that L2D misses normalized by grid points flatten out at 200000 grid points in V1 while they tend to keep increasing in V0. Also, for grid sizes beyond 200 x 200, V1 has at least 50% fewer D1 cache misses than V0, explaining the drastic improvements in walltime beyond this grid size. Absolute


walltime values for V0 and V1 for various grid sizes on KFC4 are shown in Table 4-1 for comparison.

Figure 4-9 Comparison of walltime between V0 and V1 on KFC3 and KFC4


Figure 4-10 Comparison of D1, L2 and L2D cache miss rates for V0 and V1 on KFC4

(a)

(b)


Figure 4-11 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC4 for GHOST (a) Version 0 (b) Version 1

Table 4-1 Comparison of Walltime (in seconds) for V0 and V1 on KFC4

Grid size      V0-Walltime    V1-Walltime    % improvement
30 x 30        12.09          11.37          5.96
60 x 60        91.97          72.77          20.88
80 x 80        130.47         94.59          27.50
100 x 100      232.67         232.43         0.10
200 x 200      1235.40        769.07         37.75
300 x 300      3371.63        1783.43        47.10
400 x 400      7324.08        3152.79        56.95
500 x 500      16770.23       5028.45        70.02
600 x 600      32662.18       7615.41        76.68
700 x 700      59122.85       12698.62       78.52
800 x 800      80974.31       13845.95       82.90
900 x 900      115577.10      18958.21       83.60
1000 x 1000    154196.00      24295.93       84.24

4.5.1.2 KFC6 Results
The performance gains on KFC6A and KFC6I are presented in Figure 4-12. The gains are relatively smaller on these clusters when compared to the older and slower machines KFC3 and KFC4. This is not unexpected and can be attributed to faster processors and bigger caches, along with a faster Front Side Bus (FSB), as discussed earlier. For the largest grid (one million grid points), the walltime for V1 on KFC6A is about 27% lower than that for V0, while on KFC6I the performance gains are higher; for the largest grid, V1 is almost twice as fast as V0. The improvements in walltime can again be attributed to improvements in cache behavior, with the difference that these improvements mainly come from the D1 cache. As presented in Figure 4-13, while the L2 and L2D cache miss rates remain similar, D1 cache misses are reduced by 50% throughout the problem size range in V1. Thus, the walltime improvements can largely be attributed to improvements in D1 cache behavior. This is evident from Figures 4-14a and b. For comparison purposes, the normalized walltime and the number of data calls (sum of memory reads and memory writes) normalized by grid points and divided by 10, D1 cache misses, and L2D cache misses


normalized by grid points are presented in Figures 4-14a and b. Note that the cache data is taken from Valgrind simulations for 500 iterations and extrapolated to 5000 iterations based on the arguments previously presented in conjunction with Figure 4-1. D1 misses for V1 are at least 50% less than those in V0.

Figure 4-12 Comparison of walltime between V0 and V1 on KFC6A and KFC6I

Figure 4-13 Comparison of D1, L2 and L2D cache miss rates for V0 and V1 on KFC6I


(a)

(b)

Figure 4-14 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC6I for GHOST (a) V0 (b) V1


4.5.2 VERSION 2 OR V2
Version 2 or V2 is the result of the second stage of the tuning effort. This version is an improved version of V1; in other words, V2 is V0 with the Loop Interchange technique implemented (V1) along with techniques implemented to avoid redundant operations. The focus of this tuning stage was to avoid unnecessary mathematical computations in the code and to speed up at least some of the mathematical calculations by incorporating changes without modifying the underlying algorithm. These are described in the next few paragraphs.

4.5.2.1 Avoiding Unnecessary Recalculations
When iterating inside a loop, using pre-calculated values wherever possible instead of recalculating them every time saves a considerable amount of time, especially if similar values are being calculated repeatedly. This technique was briefly discussed in chapter 2 under the 'Optimizing Floating Point Operations' section.

4.5.2.2 Avoiding Division Inside Loops
In V1 (as well as V0), there are a few subroutines (cont, cal_u, cal_v) that perform repeated divisions inside nested loops. The idea is to calculate the reciprocal value once, store it, and later reuse the result without having to repeat the calculation, a technique often termed a Reciprocal Cache [42, 43]. As the subroutine cont is the most expensive of all the subroutines in V1, the Reciprocal Cache was implemented in V2 for this subroutine only. An example is shown in Figure 4-15. In subroutine cont, there were 52 instances where repeated division operations were replaced by a pre-calculated value computed outside the nested loop. In V1, these repeated divisions (52 in total) were being performed inside 4 nested i,j loops that span the dimensions of the grid. As subroutine cont is executed in every iteration, the repeated divisions were being done in every iteration; for a 600 x 600 grid, this means that 52*600*600 divisions were being performed in subroutine cont in a given iteration. Given the high cost of division operations (details discussed in chapter 2), this was a bottleneck for the optimum performance of this subroutine in V1 (and V0). In V2, the reciprocal values, calculated only once at the beginning of subroutine cont, are used throughout its code, avoiding further division operations. This essentially means that 52 division operations per grid point per iteration have been replaced by a single set of reciprocal calculations per iteration. This reduction in the number of division operations


leads to savings in machine cycles. For example, the cycle time for a division operation on the AMD 64-bit architecture is 71 cycles while on the Intel 64-bit architecture it is 161 cycles [30]; because of this variation in the cost of division operations between architectures, the actual gains depend on the architecture of the machine on which the Reciprocal Cache is implemented.
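A minimal sketch of the Reciprocal Cache idea is shown below; the array and variable names are illustrative and are not the actual statements from subroutine cont shown in Figure 4-15.

   program reciprocal_cache_sketch
      implicit none
      integer, parameter :: ni = 600, nj = 600
      real(8), allocatable :: au(:,:), ap(:,:), an(:,:)
      real(8) :: rau
      integer :: i, j
      allocate(au(ni,nj), ap(ni,nj), an(ni,nj))
      au = 2.0d0; ap = 1.0d0; an = 1.0d0

      ! V1-style: the same denominator au(i,j) is divided into several
      ! expressions at every grid point, costing one expensive division per use.
      do j = 1, nj
         do i = 1, ni
            ap(i,j) = ap(i,j) / au(i,j)
            an(i,j) = an(i,j) / au(i,j)
         end do
      end do

      ! V2-style Reciprocal Cache: compute the reciprocal of au(i,j) once,
      ! then replace every later division by a much cheaper multiplication.
      do j = 1, nj
         do i = 1, ni
            rau     = 1.0d0 / au(i,j)
            ap(i,j) = ap(i,j) * rau
            an(i,j) = an(i,j) * rau
         end do
      end do

      print *, 'check values:', ap(1,1), an(1,1)
   end program reciprocal_cache_sketch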

4.5.2.3 Merging Loops
In V1, two adjacent nested loops (shown in Figure 4-16) were found in subroutine cal_u. A simple analysis confirmed that the algorithm is not altered when these two loops are merged. Although two new loops are introduced by this change to handle the differing upper bounds of the merged loops, a speedup was still observed because loop merging reduces the number of i,j sweeps. This technique encourages temporal locality, as the number of iterations that separate successive accesses to a given piece of reused data is reduced when the same data elements were referenced in two consecutive loops before the merge.

Figure 4-15 Using Reciprocal Cache in subroutine cont


For example, in V1, for a 600 x 600 grid and a given set of values of i and j, the same values of u(i,j) and du(i,j) would be accessed roughly 360000 loop iterations later because a second nested loop immediately follows the first. As this separation is large, and because of the limited size of the cache memory, these values will generally have been evicted from cache by the time they are needed again, which increases the number of memory references. In V2, on the other hand, for the same 600 x 600 grid, the same value of u(i,j) is used in two different operations within a single nested loop, so temporal locality improves. The two superfluous reloads present in V1 are avoided in V2 once the loops are merged into a single one, as shown in Figure 4-16.

Figure 4-16 Merging nested loops in cal_u

For a given set of values of i and j, the cache controller will load the element u(i,j) only once. The hardware also tends to hold on to this data element in cache, as the value is being


reused in the next operation. This leads to fewer memory references and improves temporal locality. Similar changes have been implemented elsewhere in the same subroutine (cal_u), as shown in Figure 4-17a, and in subroutine cal_v, as shown in Figure 4-17b.
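A minimal sketch of this kind of loop fusion is shown below; the array names, bounds, and the two update statements are illustrative rather than the actual code from cal_u or cal_v.

   program loop_fusion_sketch
      implicit none
      integer, parameter :: ni = 600, nj = 600
      real(8), allocatable :: u(:,:), du(:,:), f(:,:), g(:,:)
      integer :: i, j
      allocate(u(ni,nj), du(ni,nj), f(ni,nj), g(ni,nj))
      u = 1.0d0; du = 0.1d0

      ! V1-style: two adjacent nested loops touch the same u(i,j) and du(i,j),
      ! but by the time the second loop reaches a given (i,j) those values
      ! have long since been evicted from cache.
      do j = 1, nj
         do i = 1, ni
            f(i,j) = u(i,j) + du(i,j)
         end do
      end do
      do j = 1, nj
         do i = 1, ni
            g(i,j) = u(i,j) - du(i,j)
         end do
      end do

      ! V2-style: the loops are merged, so u(i,j) and du(i,j) are loaded once
      ! and reused immediately, improving temporal locality.
      do j = 1, nj
         do i = 1, ni
            f(i,j) = u(i,j) + du(i,j)
            g(i,j) = u(i,j) - du(i,j)
         end do
      end do

      print *, 'check values:', f(1,1), g(1,1)
   end program loop_fusion_sketch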

4.5.2.4 Getting Rid of IF-THEN in Loops
As described in chapter 2, conditional statements inside nested loops are expensive. In V1 (subroutine cont), an IF-THEN-ELSE structure was found inside nested loops. The nested loop was modified by moving the IF-THEN-ELSE structure out of the loop, as presented in Figure 4-18. The loop merging technique was then applied to this part of subroutine cont. In order to deal with the differences in the bounds of the two nested loops (shown in Figure 4-18), additional sweeps along i and j were added in V2. As the value of i ranges from 0 to ni, the IF condition is true in 4 cases: when i=0, i=1, i=nim1 (i.e., ni-1), or i=ni. So as to be able to merge this loop with the other loop in the same subroutine, these 4 cases have been handled outside the original loop. This technique is a different way of implementing Inlining, a technique described in chapter 2.
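A minimal sketch of removing such a conditional from the interior sweep is shown below; the array name and the work done in each branch are illustrative, with only the i = 0, 1, nim1, ni special-case structure taken from the description above.

   program if_removal_sketch
      implicit none
      integer, parameter :: ni = 600, nj = 600
      real(8), allocatable :: a(:,:)
      integer :: i, j, nim1
      allocate(a(0:ni,0:nj))
      a = 0.0d0
      nim1 = ni - 1

      ! V1-style: the boundary test is evaluated at every grid point.
      do j = 0, nj
         do i = 0, ni
            if (i == 0 .or. i == 1 .or. i == nim1 .or. i == ni) then
               a(i,j) = 0.0d0            ! boundary treatment
            else
               a(i,j) = a(i,j) + 1.0d0   ! interior update
            end if
         end do
      end do

      ! V2-style: the four special columns are handled outside the nested loop,
      ! and the interior sweep runs branch-free over i = 2 .. ni-2.
      do j = 0, nj
         a(0,j) = 0.0d0; a(1,j) = 0.0d0; a(nim1,j) = 0.0d0; a(ni,j) = 0.0d0
      end do
      do j = 0, nj
         do i = 2, ni-2
            a(i,j) = a(i,j) + 1.0d0
         end do
      end do

      print *, 'check value:', a(ni/2, nj/2)
   end program if_removal_sketch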

4.5.2.5 KFC3 and KFC4 Results
The results of applying the techniques discussed above on KFC3 and KFC4 are presented in Figure 4-19. We were not able to collect walltime information for problem sizes beyond 800 x 800 (640000 grid points) on KFC4 due to problems with the stack; however, it is believed that the data collected is sufficient to demonstrate the benefits of V2 over V1, because the largest problem size (700 x 700) for which walltime has been recorded is much larger than the L2 cache. The walltime for V2 is better than that for V1 by 5-10% for most of the problem sizes on both clusters. Table 4-2 presents the absolute walltime values for V1 and V2 on KFC4. The performance gains are not drastic, as only one subroutine (cont) was tuned in this stage. Later, the techniques that were implemented in subroutine cont were applied to the remaining parts of the code. These changes are presented in the next chapter, as they fell under a separate stage of tuning GHOST. Figure 4-20 presents the cache miss rates for V1 and V2 on KFC4 for grids only up to 160000 grid points, due to stack problems encountered while running Valgrind beyond


this grid size. The cache behavior in V2 does not show much improvement when compared to V1; in fact, the D1 miss rate for V2 is higher than that for V1. This is attributed to the fact that the tuning effort in this stage was focused on reducing the number of data calls (as discussed above), and this value being in the denominator when calculating the D1 miss rate can lead to a larger fraction (D1 miss rate). This is evident from Figures 4-21a and b. For comparison purposes, the normalized walltime and the number of data calls (sum of memory reads and memory writes) normalized by grid points and divided by 10, D1 cache misses, and L2D cache misses normalized by grid points are presented in Figures 4-21a and b. Note that the cache data is taken from Valgrind simulations for 500 iterations and extrapolated to 5000 iterations based on the arguments previously presented in conjunction with Figure 4-1. From Figures 4-21a and b, the data calls in V2 are at least 3% fewer than those in V1 for the grid sizes presented. It can also be observed that L2D misses largely track the walltime plot across all grid sizes. From this it is evident that, even though the walltime gains for V2 cannot be explained by improvements in cache miss rates alone (Figure 4-20), the combination of all three types of cache activity (data calls, L2D misses, and D1 misses) does provide an explanation of the walltime behavior, confirming the effect of cache efficiency on code performance.

4.5.2.6 KFC6 Results
The performance gains for V2 on KFC6A and KFC6I are presented in Figure 4-22. Again, due to faster processors and bigger caches, walltime improvements of only about 10% are observed. Even on KFC6I, the walltime improvements in V2 do not correspond to improvements in cache miss rates; the cache miss rates for V1 and V2 are presented in Figure 4-23 for comparison. However, as discussed earlier, the focus of the tuning effort in this stage was to avoid unnecessary and repeated calculations inside loops in order to reduce the number of memory references. As expected, V2 is characterized by fewer (by 5%) data references than V1 on KFC6I, as shown in Figures 4-24a and b, and this is the probable reason for the improvement in walltime in V2.


Figure 4-17a Merging nested loops in cal_u


Figure 4-17b Merging nested loops in cal_v


Table 4-2 Comparison of Walltime (in seconds) for V1 and V2 on KFC4

Grid size      V1-Walltime    V2-Walltime    % improvement
30 x 30        11.37          11.58          -1.84
60 x 60        72.77          48.98          32.69
80 x 80        94.59          93.62          1.02
100 x 100      232.43         155.72         33.00
200 x 200      769.07         717.83         6.66
300 x 300      1783.43        1576.24        11.61
400 x 400      3152.79        2936.26        6.86
500 x 500      5028.45        4736.61        5.80
600 x 600      7615.41        7131.86        6.34
700 x 700      12698.62       11584.70       8.77
800 x 800      13845.95       13619.22       1.63

Figure 4-19 Comparison of walltime on KFC3 and KFC4 for V2 and V1


Figure 4-18 Removing IF-THEN-ELSE inside loops in V1


Figure 4-20 Comparison of D1, L2 and L2D cache miss rates for V1 and V2 on KFC4

(a)


(b)

Figure 4-21 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC4 for GHOST (a) V1 (b) V2.


Figure 4-22 Comparison of walltime between V1 and V2 on KFC6A and KFC6I

Figure 4-23 Comparison of D1, L2 and L2D cache miss rates for V1 and V2 on KFC6I

(a)


(b)

Figure 4-24 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC6I for GHOST (a) V1 (b) V2

4.5.3 VERSION 3 OR V3
V3 is the result of the third stage of the tuning effort. In order to isolate the effect of using data structures, the changes made to the code from V1 to V2 were not carried over into V3. Thus, V3 is V1 with data structures implemented in place of arrays to aid data fetching. The focus of this tuning stage was to avoid the data misses that can arise from the use of separate arrays in the code, as explained in chapter 2. The subroutines that were modified to use structures are cal_u, cal_v, cont, quick and tdma. As the current work focused on optimizing the laminar part of the code, there was a need to preserve the part of the code that deals with turbulent flows. Subroutines quick and tdma were being used for both laminar and turbulent flow problems; modifying them would have also modified the part of the code that deals with turbulent flows, so two new subroutines customized for laminar flows were created: quick_struct and tdma_struct. These two subroutines essentially


have the same code as subroutines quick and tdma, respectively, except that arrays of data structures are implemented in them rather than plain arrays. In V3, all calls to quick and tdma have been replaced by calls to quick_struct and tdma_struct, respectively, and the part of the code that deals with turbulent flows is left in its original state. Subroutines cal_u, cal_v, cont, quick and tdma have several lines of code that perform arithmetic calculations on elements of various arrays at a grid point or its nearby points. An example of such a calculation is:
ap(i,j) = ae(i,j) + aw(i,j) + an(i,j) + as(i,j)
Operations like this require the processor to fetch data from multiple arrays, with the arithmetic involving data from the same grid point or surrounding grid points. This makes a strong case for using data structures in place of arrays: when such variables are declared inside a data structure and the code is modified to use an array of such data structures, the processor can fetch the values of the required variables together, with far less chance of a data cache miss. When the processor requests a set of variables, a cache line is loaded into cache, and because the separate arrays have been replaced by a data structure, all the variables required in an arithmetic operation are found within that line, leading to fewer data misses. In GHOST, a given arithmetic operation rarely needs to access values beyond a cache line, because the operations are performed on variables at a given grid point or its immediate neighbors. This feature of GHOST strengthens the case for data structures, since arithmetic operations that do not involve data elements located far from each other on the grid benefit most from this layout. Details of using data structures in place of arrays are presented in Figure 4-25.
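A minimal FORTRAN90 sketch of this kind of restructuring is shown below; the type name, component names, and initial values are illustrative and are not necessarily those used in Figure 4-25.

   program array_of_structures_sketch
      implicit none
      integer, parameter :: ni = 600, nj = 600

      ! All coefficients for one grid point are grouped in a single derived type,
      ! so they land on the same (or an adjacent) cache line when fetched.
      type coeff
         real(8) :: ae, aw, an, as, ap
      end type coeff

      type(coeff), allocatable :: c(:,:)
      integer :: i, j

      allocate(c(ni,nj))
      c = coeff(1.0d0, 1.0d0, 1.0d0, 1.0d0, 0.0d0)

      ! Array-of-structures form of  ap(i,j) = ae(i,j) + aw(i,j) + an(i,j) + as(i,j):
      ! a single fetch of c(i,j) brings all five coefficients into cache together.
      do j = 1, nj
         do i = 1, ni
            c(i,j)%ap = c(i,j)%ae + c(i,j)%aw + c(i,j)%an + c(i,j)%as
         end do
      end do

      print *, 'check value:', c(1,1)%ap
   end program array_of_structures_sketch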

4.5.3.1 KFC3 and KFC4 Results
The results of incorporating arrays of data structures in place of arrays on KFC3 and KFC4 are presented in Figures 4-26a and b. These figures present comparisons between V1 and V3 because, as discussed earlier, V3 is a modified version of V1 (leaving aside all of the changes that were incorporated in V2). Performance gains are up to 40% on KFC3, while the gains are lower on KFC4.


One distinct feature of V3 is that the normalized walltime values on KFC3 remain practically the same after reaching the 125 x 125 grid, while for V1 on KFC3 the walltime values kept increasing with grid size beyond the 125 x 125 grid. Table 4-3 presents the normalized walltime values for V1 and V3 on KFC3 for comparison. On KFC4, the techniques implemented in V3 remove the sporadic behavior (at the 150 x 150 and 700 x 700 grids) seen in V2, and the smooth behavior of the walltime plot is obvious from the zoomed plot in Figure 4-26b. Performance gains are more pronounced for larger grids (beyond 250000 grid points), and gains of up to 20% (when compared to V1) are realized in V3 on KFC4. Valgrind results on KFC4 are presented in Figure 4-27. A drastic reduction in D1 cache misses is seen for the 300 x 300 grid on KFC4, although this does not translate into a large gain in walltime. The cache miss rates for V1 and V3 do not differ greatly; however, fewer D1 misses (by as much as 50%), as shown in Figures 4-28a and b, contribute to the improvements in walltime.

Figure 4-25 Using data structures in place of arrays in V3


Table 4-3 Normalized walltime values (in microseconds) for V1 and V3 on KFC3

Grid size      Walltime - V1    Walltime - V3
30 x 30        2.96             2.86
60 x 60        3.38             3.17
80 x 80        3.60             3.54
100 x 100      3.82             4.03
125 x 125      4.45             4.26
150 x 150      4.21             4.24
200 x 200      4.21             4.23
300 x 300      4.40             4.34
400 x 400      4.58             4.29
500 x 500      4.49             4.30
600 x 600      5.30             4.22
800 x 800      6.11             ---
900 x 900      6.24             ---
1000 x 1000    6.45             4.41

(a)


(b)

Figure 4-26 Walltime as a function of grid size on KFC3 and KFC4 for V1 and V3 (a) for all grid sizes (b) zoomed plot

Figure 4-27 Comparison of D1, L2 and L2D cache miss rates for V1 and V3 on KFC4


(a)

(b)

Figure 4-28 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC4 for GHOST (a) V1 (b) V3.


4.5.3.2 KFC6 Results
The walltime results on KFC6A and KFC6I are presented in Figures 4-29a and b. Improvements on KFC6A vary from 5 to 10% for smaller grids (up to 400 x 400), while for larger grids performance gains of up to 15% are observed. As can be noticed, the improvements on KFC6I are meager, which can be attributed to faster processors and more advanced hardware. On KFC6A, the performance improvements in V3 are sporadic. As seen from Figures 4-29a and b, V3 on KFC6A performs better (gains of up to 16%) for larger grids, starting around 250000 grid points, probably because the benefits of using data structures are only realized once the data size becomes sufficiently large. Valgrind results are presented in Figure 4-30. On KFC6I, the cache miss rates are higher in V3 than in V1 because of the fewer (by 5%) data references (Figures 4-31a and b), and these fewer references might be the reason for the improvements in walltime. However, the walltime behavior cannot be simply explained with the cache miss data.

(a)


(b)

Figure 4-29 Walltime as a function of grid size on KFC6I and KFC6A for V2 and V3 (a) for all grid sizes (b) zoomed plot till 100000 grid points

Figure 4-30 Comparison of D1, L2 and L2D cache miss rates for V1 and V3 on KFC6I


(a)

(b)

Figure 4-31 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC6I for GHOST (a) V1 (b) V3


4.6 SUMMARY OF OPTIMIZATION EFFORT AND RESULTS OF FURTHER EFFORTS OF TUNING GHOST
This section provides an overall analysis of the results of the optimization effort carried out on GHOST. The work presented above was carried out until December 2004 and was incorporated into the paper "R. P. LeBeau, P. Kristipati, S. Gupta, H. Chen, P. G. Huang, 'Joint Performance Evaluation and Optimization of Two CFD Codes on Commodity Clusters', AIAA-2005-1380, January 2005" [52]. That paper included further optimization efforts on GHOST beyond the scope of this thesis, but based on the work presented so far. The analysis focused on KFC3 and KFC4. This section presents those results along with a summary of the results presented above on those clusters. In that optimization effort, the optimization steps were similar to those presented previously in section 4.4, except for the addition of step 6:
1) Replacing the allocation/de-allocation scheme with permanent variables.
2) Correcting the orientation of the i,j sweeps to the cache-conserving form (outer loop j, inner loop i) consistent with the storage in memory.
3) Aggressive cleaning of redundant computations, unnecessary divisions, and other excessive mathematical activity.
4) Removing unneeded if-then structures, particularly on sweeps that do not encompass the full i,j grid.
5) Restructuring the variables from the single-array form, φ1(i,j) and φ2(i,j), to an array of structures, φ(i,j) = {φ1, φ2}.
6) Applying a sub-blocking scheme in which a grid is divided into subgrids that effectively fit into the L2 cache.
Based on the above steps, the results presented above account for the three versions of GHOST, viz. V1, V2 and V3. In order to gain a deeper understanding of the effect of applying each of the techniques discussed above, LeBeau et al. [52] present additional optimized versions of GHOST, as shown below:
V4 – original + step 5
V5 – original + step 6
V6 – original + steps 2-4, 6
V7 – original + steps 2-6
The standard LINUX tool gprof was used to determine the relative cost of each of the six key GHOST subroutines, which collectively involve over 95% of the total


runtime. Table 4-4 presents the profile results over a 5000-iteration simulation for code versions 0-4 for four different grid sizes. As a result of the coding improvements, the three most costly routines (cont, tdma, quick) shift positions relative to one another, while the other three key subroutines generally hold their relative positions and proportions. An exception to this is the application of the array of structures (step 5), which for the largest (900 x 900) grid serves to significantly reduce the tdma cost relative to the versions without step 5 (compare V0 and V4, V2 and V3). This is not unexpected, as the array of structures was designed based on the loops in tdma; however, for smaller grids the relationship to tdma is not as apparent, leading either to across-the-board improvements (90 x 90, V2 to V3) or to a decline in tdma performance but improved performance in other subroutines (150 x 150, V2 to V3). Adding only step 5 to the original code was clearly a benefit only on the largest grid and in some cases proved detrimental, a fact reflected in the curves in Figure 4-32. This is not inconsistent with the results presented so far; it was observed that using the array of structures instead of arrays essentially did not improve performance but led to a more consistent behavior of the walltime plot, as discussed in section 4.5.3.1.

Table 4-4 Walltime in seconds spent in key subroutines for GHOST on four grid sizes over 5000 iterations

90 x 90        V0      V1      V2      V3      V4
Total          174.0   127.1   177.4   131.7   181.9
cont           53.4    42.2    54.5    41.2    53.5
quick          37.5    23.6    38.5    29.4    40.1
tdma           25.3    24.6    25.9    19.3    20.0
cal_prop       21.4    13.9    22.0    14.4    21.9
cal_v          14.9    9.2     15.2    10.4    18.3
cal_u          14.8    8.5     14.8    10.0    17.6
other          6.8     5.1     6.7     7.0     10.5

150 x 150      V0      V1      V2      V3      V4
Total          590.3   593.1   588.7   452.3   658.9
cont           170.8   170.5   170.4   129.4   166.8
quick          111.2   113.6   111.1   81.1    126.8
tdma           99.9    100.3   99.9    114.7   119.4
cal_prop       82.2    83.7    82.1    46.5    82.7
cal_v          54.0    52.9    52.5    30.7    64.5
cal_u          52.6    54.4    54.0    31.56   66.0
other          19.6    17.7    18.8    18.3    32.9


300 x 300      V0      V1      V2      V3      V4
Total          3278    1736    1656    1849    3344
cont           930     496     425     510     887
quick          664     258     261     319     702
tdma           509     494     489     502     510
cal_prop       484     201     203     202     476
cal_v          322     113     110     121     330
cal_u          308     122     117     123     334
other          61      51      52      72      104

900 x 900      V0      V1      V2      V3      V4
Total          48584   -       17239   16750   37269
cont           15339   -       4309    4808    11305
quick          6901    -       2697    3011    6108
tdma           5371    -       5673    4127    4273
cal_prop       6582    -       1980    1992    6687
cal_v          7261    -       1047    1083    4817
cal_u          6688    -       1081    1098    3559
other          443     -       452     631     520

The direct comparison between the L2D miss rate and the normalized walltime is shown in Figures 4-33a and b. The L2D miss rate largely parallels, and therefore likely explains, the overall shape of the curve for the larger grids; but for the smaller grids (less than 200 x 200), and for the scaling and the secondary variations in the larger grids, an explanation based solely on the L2 cache is inadequate.

(a)


(b)

Figure 4-32 Overall performance of the eight versions of GHOST in terms of walltime/gridpoint versus grid size with (a) the full range and (b) the more complicated region for grids smaller than 200 x 200

(a)


(b)

Figure 4-33 Comparisons of L2 cache miss rate and normalized walltime versus grid size on KFC4 for GHOST (a) Version 0 and Version 4, (b) Versions 1-3
Instead, as discussed in the section above, the overall performance is also tied to the L1 data cache miss rate (D1), which varied in an unpredictable fashion on the smaller grids, and to the number of total data calls required, which could vary significantly from code version to code version for the same grid size, particularly with the addition of the variable array structure (step 5) or sub-blocking (step 6). Several walltime features that are not clearly explained by L2D misses can at least be qualitatively explained by the other two quantities. Sharp peaks in the D1 cache misses create much of the erratic behavior on the small grids, as in the case of Figure 4-28, as well as generating the small oscillations in normalized walltime for large grids in version 2, for example near 500,000 grid points in Figure 4-22. So, the combination of all three types of cache activity does provide an explanation of most of the walltime profile features, confirming the strong effect of cache efficiency on code performance. Several conclusions are drawn from this analysis. The first is that the increase of cache miss rate with grid size was largely a function of poor loop construction and was solved for large grids (greater than 200 x 200) by step 2, and that this improvement by


itself yields dramatic gains in performance at these large grid sizes. The addition of steps 3 and 4 further decreases normalized walltime, if not as dramatically, for large grids. Step 5 also provides an incremental performance boost for the largest grids (greater than 600 x 600) but for intermediate grids it is slightly detrimental. For grids smaller than 200 x 200, the performance is strongly influenced by variations in the L1 cache miss rate as well as L2. The result is a strongly variable normalized walltime and cache miss rate, with strong peaks in both between grid sizes of 125 x 125 and 130 x 130, as well as smaller peaks elsewhere. This behavior largely eliminates the benefits of optimization steps 2-4 at many grid sizes, but the addition of Step 5 appears to reduce much of the erratic behavior in this region, yielding a much smoother set of normalized cache and walltime curves.

4.6.1 OVERALL PERFORMANCE

Revisiting the speedup plots in Figure 4-34, it is clear that the optimized versions of the code do not exhibit significant superlinear speedup but rather near-linear curves or, for smaller grids, sublinear curves, confirming that the dramatic superlinear speedup was a result of cache effects. Alternatively, one can examine the walltime data, starting with the fact that for the smallest grids (30 x 30 and smaller) for all non-sub-blocked versions the L2D cache miss rate is less than 0.1% and the D1 miss rate is typically less than 2%. This suggests that normalized walltimes for these small grids are close to the best possible cache-driven performance (where in theory the total cache miss rate would be 0%). The absolute best normalized walltime of the cases considered is for version 3 on a 30 x 30 grid at 11.8 ms/gridpoint for 5000 iterations. If all the normalized walltime values are divided by this optimal value, as in Figure 4-35, the result shows how close to ideal each code version is. The best optimized code (version 3) is within a factor of 2 of this best result across all grid sizes. This is considerably better than the original code, which can be as much as 13 times slower than the theoretical optimum. The limited sub-blocking data stays within about 30% of the best value, with the exception of the largest grid tested with the original code plus sub-blocking (version 5).


(a) (b)

Figure 4-34 Speedup of GHOST on KFC4 for grids of varying size with (a) Version 2 and (b) Version 3

(a) (b)

Figure 4-35 Overall performance of the eight versions of GHOST relative to the “optimal” value of 11.8 ms/gridpoint for 5000 iterations with (a) the full range and (b) grids smaller than 200 x 200


4.7 ACCURACY RESULTS

Accuracy tests conducted by A. Palki [53] showed that the results were unchanged by the code modifications and that the final results are in agreement with Ghia et al. [66].

4.8 SUMMARY AND FURTHER WORK

This chapter presented the results of applying various techniques to optimize the performance of GHOST on commodity cluster architectures. It was observed that the best optimized version of the code (V3) was within a factor of 2 of the estimated optimal performance over all the tested grid sizes, and the overall performance improvement for this case relative to the original code ranged from 20% faster for small grids to over 6 times faster for the largest. In the next chapter, results of external and internal blocking (techniques introduced there) are presented, along with the results of the further optimization effort.

Copyright © Pavan K Kristipati, 2008


CHAPTER - 5

5. STAGE TWO PERFORMANCE TUNING RESULTS

As discussed in chapter 4, the best tuned version of GHOST was within a factor of 2 of the estimated optimal performance over all the tested grid sizes, and the overall performance improvement for this version ranged from 20% faster for smaller grids to over 6 times faster for the largest. After the tuning effort presented in chapter 4 was carried out, Palki [53] investigated extensively and optimized the performance of GHOST by applying external and internal blocking techniques to V3. This chapter summarizes the results of the optimization effort carried out by Palki [53], showing the performance of V0 and V3 when these techniques are applied, along with the problems associated with these techniques. Later, the results of stage two of the optimization effort carried out in this work are presented.

5.1 EXTERNAL BLOCKING

As discussed earlier, external blocking involves breaking the computational grid into smaller, cache-friendly blocks. This step is carried out during the grid generation process and is done by the grid generator code g.f90 introduced in chapter 3. This section reviews the external blocking results compiled by Palki [53] on the best optimized version (V3) of GHOST for the cavity flow problem that was described in chapter 4.
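To make the decomposition step concrete, the following is a minimal sketch, in the spirit of the grid generation step described above, of splitting a single-zone nx x ny grid into roughly cache-sized blocks. The program and variable names are illustrative assumptions for this thesis discussion, not the actual g.f90 code.

! Hypothetical sketch (not the actual g.f90 code): splitting an
! nx x ny single-zone grid into roughly nblk x nblk cache-sized blocks.
program split_grid
  implicit none
  integer, parameter :: nx = 600, ny = 600      ! full grid dimensions
  integer, parameter :: nblk = 50               ! target block edge (e.g. 50 x 50)
  integer :: nbx, nby, ib, jb, ilo, ihi, jlo, jhi

  nbx = (nx + nblk - 1) / nblk                  ! number of blocks in i
  nby = (ny + nblk - 1) / nblk                  ! number of blocks in j

  do jb = 1, nby
     do ib = 1, nbx
        ilo = (ib - 1)*nblk + 1                 ! block extents within the full grid
        ihi = min(ib*nblk, nx)
        jlo = (jb - 1)*nblk + 1
        jhi = min(jb*nblk, ny)
        ! ...the real grid generator would write block (ilo:ihi, jlo:jhi)
        !    plus its ghost points to the multi-zone input file...
        write(*,'(4(a,i4))') 'i=', ilo, ':', ihi, '  j=', jlo, ':', jhi
     end do
  end do
end program split_grid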

5.1.1 KFC4 AND KFC5 RESULTS

The walltime (normalized by grid size and iterations) plot for the tuned V3 code on KFC4 is shown in Figure 5-1(b). For comparison, a similar plot is shown for the V0 code in Figure 5-1(a). In the case of V3, because of the optimization effort, the walltime for the unblocked (single zone) code does not increase with an increase in the grid size. The performance of V0 with external blocking applied is similar to the performance of V3 without external blocking. The optimization effort thus relieves the burden on the programmer of splitting the grid into cache-sized blocks in order to realize improved performance. Still, external blocking applied to V3 yields more favorable results. As presented in Table 5-1, in the case of V3, external blocking on a 600 x 600 grid yields an


improvement of approximately 30% in walltime, and this value remains essentially the same across the block sizes tested (30 x 30, 40 x 40 and 50 x 50). This is a noticeable difference between V0 Ext and V3 Ext (V0 and V3 with external blocking applied, respectively), because the walltime for V0 Ext was largely dependent on the subblock size, as shown above. This further relieves the burden on the programmer in that, while splitting the grid, a relatively larger subblock size can be chosen, leading to fewer zones, meaning less complexity in designing the input file and fewer ghost points, which in turn means fewer data calls. In the case of V3, the performance of the externally blocked grid is also highly scalable and unchanging over a wide range of grid sizes, i.e., the normalized walltime tends to stay almost constant irrespective of the grid size. Hence the actual normalized speed of a blocked computation is comparable to a single-grid computation of the same size for reasonably large computational problems (at least for the block sizes tested, up to 70 x 70). Similar walltime results on KFC5 are presented in Figure 5-2. Table 5-2 presents the improvements in walltime for a 600 x 600 grid on KFC5 due to external blocking applied to V3.

Table 5-1 External Blocking results for 600 x 600 grid on KFC4 with various subgrids [53]

Block Size    Walltime/Grid Point/Iteration [µsecs]    % Improvement compared to No Block
              V0        V3                             V0        V3
No Block      18.83     4.63                           -         -
70 x 70       4.55      3.48                           75.8%     24.8%
60 x 60       4.75      3.40                           74.7%     26.56%
50 x 50       4.12      3.33                           78.1%     28.07%
40 x 40       3.97      3.31                           78.9%     28.5%
30 x 30       3.69      3.29                           80.4%     28.9%

Table 5-2 External Blocking results for 600 x 600 grid on KFC5 with various subgrids [53]

Block Size    Walltime/Grid Point/Iteration [µsecs]    % Improvement compared to No Block
              V0        V3                             V0        V3
No Block      6.51      2.41                           -         -
70 x 70       2.32      1.89                           64.3%     21.5%
60 x 60       2.31      1.86                           64.5%     22.8%
50 x 50       2.18      1.78                           66.5%     26.1%
40 x 40       2.03      1.76                           68.8%     26.9%
30 x 30       1.98      1.74                           69.5%     27.8%

As expected, due to the better and faster processors, the overall code performance is improved for the unblocked grid on KFC5 compared to KFC4. The potential gain obtained from improving cache performance is reduced due to the greater bandwidth and better memory hierarchy structure. With the application of 30 x 30 subblocks for a 600 x 600 grid on KFC5, a decrease in walltime of 69.5% was observed for the V0 code and 27.8% for the V3 code when compared to the unsubblocked code, as much of the potential gain had already been realized in V3 through the code optimization techniques. It is important to note, however, that subblocking applied to V3 still leads to better results than subblocking applied to V0, as V3 is already an improved version of V0 (as discussed in chapter 4). Thus, in order to realize the full benefits of the external blocking technique, the code must first be optimized on a single node, the details of which are presented in chapter 4.

(a) (b)

Figure 5-1 External Blocking - Walltime as a function of grid size on KFC4 for the lid-driven test case. (a) V0 (b) V3 [53]

(a) (b)

Figure 5-2 External Blocking - Walltime as a function of subgrid size on KFC5 for the lid-driven test case. (a) V0 (b) V3 [53]

Although external blocking can yield impressive performance improvements (as shown above), a major disadvantage of this technique is the difficulty of implementing it, since it is not easy to break up a grid with a complicated geometry or boundary conditions into smaller blocks. The desire for a viable automated system to split the grid led to the idea of the internal blocking technique, which is an automated version of external blocking.

5.2 INTERNAL BLOCKING

The underlying principle behind internal blocking is quite similar to that of external blocking. It involves breaking up the grid into smaller, cache-fitting blocks, solving the governing equations on these smaller blocks, and then putting them back together before the start of the MPI communications to get the overall solution. As discussed at the beginning of this chapter, Palki [53] did extensive investigation on this


technique and developed an automated process to split the grid into smaller blocks that fit into the L2 cache of the processor on which the calculations are being carried out. This technique is illustrated in Figure 5-3. The grid presented is that of an airfoil at a certain angle of attack. In order to decrease the computation time through parallel processing, the grid had already been split into multiple blocks (external blocking). The flow field across each of the blocks can be solved on a different processor. As discussed above, internal blocking involves splitting each of these individual blocks into cache-sized blocks. This is illustrated with the help of a magnified view of the subblocks in Figure 5-3.


Figure 5-3 Illustration of Internal Blocking [53]


5.2.1 IMPLEMENTATION OF INTERNAL BLOCKING IN GHOST

In order to implement internal blocking in GHOST, four additional subroutines (Internal_block, break_velocity, scalc_flowfield, combine_velocity) were added [53]. All the information pertaining to a grid, such as the x, y coordinates of the grid points, is stored in arrays. The subroutines split these arrays into smaller arrays such that each array holds all the necessary information pertaining to a smaller block. This is equivalent to breaking up the grid into smaller, cache-fitting blocks. Since the grid parameters remain constant for a given grid, this operation needs to be performed only once. The flow variables such as u, v, and p are also stored in arrays. Unlike the grid parameters, the values of these variables are updated every iteration. These updated values are required to calculate the values across the artificial boundaries. Hence, the subroutines split up the arrays containing these flow variables at the start of the calculations and then put them back together before the beginning of the MPI communication.
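The following is a hedged sketch of the array-splitting idea behind such a routine; the interface, layout and names are illustrative assumptions and do not reproduce GHOST's actual break_velocity.

! Hedged sketch of the array-splitting idea behind internal blocking;
! names and layout are illustrative, not GHOST's actual break_velocity.
subroutine break_velocity(u, nx, ny, nblk, ublk)
  implicit none
  integer, intent(in)  :: nx, ny, nblk
  real(8), intent(in)  :: u(nx, ny)               ! global field for one zone
  real(8), intent(out) :: ublk(nblk, nblk, *)     ! one small array per sub-block
  integer :: nbx, nby, ib, jb, k, i, j, ilo, jlo

  nbx = (nx + nblk - 1) / nblk
  nby = (ny + nblk - 1) / nblk
  k = 0
  do jb = 1, nby
     do ib = 1, nbx
        k = k + 1
        ilo = (ib - 1)*nblk
        jlo = (jb - 1)*nblk
        do j = 1, min(nblk, ny - jlo)              ! copy the cache-sized patch
           do i = 1, min(nblk, nx - ilo)
              ublk(i, j, k) = u(ilo + i, jlo + j)
           end do
        end do
     end do
  end do
end subroutine break_velocity

A matching combine step would copy the updated block arrays back into the global array before the MPI exchange, as described above.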

5.2.2 KFC3 AND KFC4 RESULTS

A plot of walltime as a function of subgrid size for the untuned V0 code on KFC4 is shown in Figure 5-4(a). As mentioned in chapter 4, walltime is the average time for a single iteration over a 5000 iteration simulation, and it is normalized by the number of grid points to eliminate the effect of increasing walltime with increasing grid size. The walltime behavior with internal blocking applied closely matches that with external blocking. The walltime for “V0 Int - No Subblock” (V0 with no subblocking) increases with increasing grid size. Even with internal subblocking, the walltime for V0 increases with subblock size and largely depends on the subblock size. The best performance improvement was obtained with the 30 x 30 subblock grid. In the case of the 600 x 600 grid, the walltime with 30 x 30 subblocks is lower by a factor of 4.75 compared to the unsubblocked grid. The walltime plot for the tuned V3 code on KFC4 is shown in Figure 5-4(b). There is a decrease of 23.5% when compared to the unsubblocked code. Walltime values for the 600 x 600 grid for various subblock sizes (with internal blocking) are shown in Table 5-3.


Table 5-3 Internal Blocking results for 600 x 600 grid on KFC4 with subgrids of various sizes [53]

Block Size    Walltime/Grid Point/Iteration [µsecs]    % Improvement compared to No Block
              V0        V3                             V0        V3
No Block      18.83     4.63                           -         -
70 x 70       4.74      3.57                           74.82     22.89
60 x 60       4.43      3.50                           76.47     24.406
50 x 50       4.24      3.40                           77.48     26.56
40 x 40       3.98      3.44                           78.86     25.7
30 x 30       3.93      3.53                           79.12     23.75

(a) (b)

Figure 5-4 Internal Blocking Results - Walltime as a function of subgrid size for GHOST on KFC4 for the lid-driven test case (a) V0 (b) V3 [53]

With internal blocking, for V0 the best performance is obtained with 30 x 30 blocks, while for V3, 50 x 50 blocks showed the best performance, although for the V3 code the difference in the value of the walltime is quite small for the various block sizes. Hence the performance improvement obtained by splitting up the grid using only 30 x 30


blocks will be almost the same as for a grid broken up using blocks of sizes varying from 30 x 30 to 70 x 70. Also, the walltime values scale well throughout the problem sizes tested for all the subblocks presented in Figure 5-4.

5.2.3 ACCURACY TEST RESULTS

Although the internal blocking technique did not pose any problems in terms of the accuracy of the results for the cavity flow test case (presented in chapter 4), the results were not accurate [53] for a more complicated test case when the grid was subblocked in the same way as in the cavity flow problem. Accuracy tests were conducted for flow over a NACA 4415 airfoil with a Reynolds number of 100,000 and a time step dt = 0.0025. The inaccurate results were attributed to a probable coding error in the way the boundary conditions were implemented. In addition to the problems with the accuracy of the results, the internal blocking approach requires a fair amount of change in the underlying algorithm of the code. Also, from the results in chapter 4, it was noticed that L2D cache miss rates of 2% still persist in the most optimized version (V3) of GHOST. This presented an opportunity for further tuning effort on V3 with the aim of reducing the cache miss rates to the maximum extent possible, without relying on subblocking. The results of the further tuning effort on GHOST are presented in the next sections.

5.3 FURTHER TUNING EFFORT ON GHOST

A detailed review of V3 revealed the possibility of implementing the following steps:
1) Subroutine inlining
2) Additional cleaning of redundant computations, unnecessary divisions and other excessive mathematical activity.
In step 1, the inlining technique, discussed in chapter 2, was applied to subroutine quick_struct: each call to this subroutine was replaced by the body of the subroutine itself. This subroutine was chosen because it is called only twice in a given iteration, which makes the inlining technique fairly easy to implement. Step 1 proved ineffective, resulting in neither a significant change in walltime nor in cache performance, and as such is left out of the subsequent analysis.
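As a minimal, self-contained illustration of what manual inlining looks like (the names below are made up for the example and are not GHOST's quick_struct), both loops below compute the same face averages; the second pastes the helper's body in place of the call, removing the call overhead.

! Illustrative sketch of manual subroutine inlining (hypothetical names).
program inline_demo
  implicit none
  integer, parameter :: n = 8
  real(8) :: phi(n), flux(n-1)
  integer :: i

  phi = (/ (dble(i), i = 1, n) /)

  ! Version with a call inside the loop: call overhead paid every pass
  do i = 1, n - 1
     call face_avg(phi(i), phi(i+1), flux(i))
  end do
  print *, 'with call: ', flux

  ! Manually inlined version: the helper's body replaces the call
  do i = 1, n - 1
     flux(i) = 0.5d0*(phi(i) + phi(i+1))
  end do
  print *, 'inlined:   ', flux

contains
  subroutine face_avg(a, b, c)
    real(8), intent(in)  :: a, b
    real(8), intent(out) :: c
    c = 0.5d0*(a + b)           ! the body that gets pasted in place of the call
  end subroutine face_avg
end program inline_demo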


Step 2 applies techniques similar to those implemented in V2. As discussed in chapter 4, the tuning effort for V2 was focused on subroutine cont; the other subroutines were unchanged. It was observed that advanced architectures (like KFC6) benefited more (decrease in data calls, data misses and L2D misses) from the changes in V2 than from the changes in V3, while relatively older architectures (like KFC3 and KFC4) benefited significantly from almost all of the optimization techniques discussed in chapter 4. This observation directed the further optimization effort to focus on reducing redundant mathematical activity so as to reduce the number of data calls. The further optimization effort was carried out on the cavity flow test case, and Valgrind was used to analyze the cache behavior of the code on different machines.

5.4 RESULTS OF FURTHER TUNING EFFORT

As discussed in chapter 4, the initial performance tuning efforts primarily resulted in three major versions of the original GHOST code: V1, V2 and V3. Further optimization efforts [52] resulted in V4, V5, V6 and V7. But, as presented in chapter 4, the most optimized version of GHOST was considered to be V3, and so the further performance optimization effort on GHOST was baselined from V3; as such, V8 is an improved version of V3.

5.4.1 VERSION 8 OR V8

Version 8 or V8 is the result of stage two of the performance tuning effort. As discussed above, the focus in this stage was to carry out additional cleaning of redundant computations, unnecessary divisions and other excessive mathematical activity by incorporating changes without modifying the underlying algorithm. These changes are discussed in the next few paragraphs.

5.4.1.1 Changing the Order of Condition Check in IF Statements

As discussed in chapter 2, ‘IF’ statements slow down a program for several reasons. Some of them are:
• The compiler can do fewer optimizations in their presence, such as loop unrolling.
• Evaluation of the conditional takes time.
• The continuous flow of data through the pipeline is interrupted when branching.

Often, the performance impact of ‘IF’ statements can be reduced by restructuring the program. It was observed that in V3, subroutines cal_u, cal_v and cont have a conditional check that might be improved with a minor change in the order of the condition check. It is well known that in most programming languages the comparisons equal and not equal are faster than the comparisons less than and greater than. Although the resulting improvements might be minor, this technique was implemented in the three subroutines cal_u, cal_v and cont. This is presented in Figure 5-5.

Figure 5-5 Correcting the order of a conditional statement
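Since only the caption of Figure 5-5 is reproduced here, the following is a hedged, self-contained sketch of the kind of reordering described above; the loop, variables and condition are made-up placeholders, not the actual cal_u/cal_v/cont code.

! Illustrative sketch of reordering a compound condition so that the
! (per the discussion above, cheaper) equality test is written first.
program if_order_demo
  implicit none
  integer, parameter :: nx = 600, ny = 600
  integer :: i, j, ibc, count

  ibc = 0
  count = 0
  do j = 2, ny - 1
     do i = 2, nx - 1
        ! Original ordering: the relational test appears first
        ! if (i < nx/2 .and. ibc == 0) then ...
        ! Reordered: the equality test appears first
        if (ibc == 0 .and. i < nx/2) then
           count = count + 1
        end if
     end do
  end do
  print *, 'points flagged:', count
end program if_order_demo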

5.4.1.2 Implementing Reciprocal Cache

As discussed in chapters 2 and 4, division operations have high latency. In V3, a few more areas (subroutines cal_u, cal_v) were identified where repeated division operations could be replaced by pre-calculated reciprocal values, a technique that was implemented in subroutine cont in V2. There were 10 instances in total where repeated division operations were replaced by a pre-calculated value computed outside the nested loop. The repeated divisions in V3 were being done inside nested i, j loops that span the dimensions of the grid. As subroutines cal_u and cal_v are executed in each iteration, the repeated divisions were being done in every iteration. For example, for a 600 x 600 grid this means that, in a given iteration, 10*600*600 divisions were being done in subroutines cal_u and cal_v. With the high cost of division operations (details discussed in chapter 2), this was a bottleneck for the optimum performance of these subroutines in V3.


In V8, the reciprocal values, calculated only once at the beginning of subroutines cal_u and cal_v, are used throughout their code, avoiding further division operations. This essentially means that 10 repeated division operations have been replaced by two (one per subroutine) division operations per iteration. This reduction in the number of division operations leads to savings in machine cycles. For example, the cycle time for a division operation on AMD 64-bit architectures is 71 cycles while on Intel 64-bit architectures it is 161 cycles [30]; because of this variation in the cost of division operations between architectures, the actual gains depend on the architecture of the machine on which the reciprocal cache is implemented.
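A minimal sketch of the reciprocal cache change is shown below; the array and variable names are illustrative assumptions, not the actual cal_u/cal_v code. The division by a constant denominator is hoisted out of the nested i, j loops and reused as a multiplication.

! Hedged sketch of the "reciprocal cache" change (illustrative names).
program reciprocal_demo
  implicit none
  integer, parameter :: nx = 600, ny = 600
  real(8) :: a(nx, ny), dx, rdx
  integer :: i, j

  a  = 1.0d0
  dx = 0.01d0

  ! Before (V3-style): one divide per grid point
  ! do j = 1, ny
  !    do i = 1, nx
  !       a(i, j) = a(i, j) / dx
  !    end do
  ! end do

  ! After (V8-style): the reciprocal is computed once, then multiplied
  rdx = 1.0d0 / dx                 ! single division per iteration
  do j = 1, ny                     ! j outermost matches Fortran column-major storage
     do i = 1, nx
        a(i, j) = a(i, j) * rdx
     end do
  end do

  print *, a(1, 1)
end program reciprocal_demo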

5.4.1.3 Loop Merging

As discussed in chapters 2 and 4, loop merging encourages temporal locality, as the number of iterations that separate successive accesses to a given piece of reused data is reduced when the same data elements are referenced in two consecutive loops before their merge. Figure 5-6 presents the implementation of the reciprocal cache and loop merging techniques in subroutine cal_u. The fact that the two nested loops were separated by a call to subroutine tdma_struct did not prevent loop merging, because the data elements inside the nested loops were not used in subroutine tdma_struct; the merge therefore did not affect the integrity of the code, and the accuracy of the results was unaltered.
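The following is a hedged, self-contained sketch of loop merging with illustrative names (not the actual cal_u code): two adjacent i, j sweeps that touch the same arrays are fused so each element is reused while it is still resident in cache.

! Illustrative sketch of loop merging (fusion) of two adjacent sweeps.
program loop_merge_demo
  implicit none
  integer, parameter :: nx = 600, ny = 600
  real(8) :: ap(nx, ny), su(nx, ny), u(nx, ny)
  integer :: i, j

  ap = 2.0d0;  su = 1.0d0;  u = 0.0d0

  ! Before: two separate sweeps over the same arrays
  ! do j = 2, ny - 1
  !    do i = 2, nx - 1
  !       su(i, j) = su(i, j) + u(i, j)
  !    end do
  ! end do
  ! do j = 2, ny - 1
  !    do i = 2, nx - 1
  !       u(i, j) = su(i, j) / ap(i, j)
  !    end do
  ! end do

  ! After: one fused sweep; su(i,j) is reused immediately (temporal locality)
  do j = 2, ny - 1
     do i = 2, nx - 1
        su(i, j) = su(i, j) + u(i, j)
        u(i, j)  = su(i, j) / ap(i, j)
     end do
  end do

  print *, u(2, 2)
end program loop_merge_demo

In this simple example there is no dependence between neighboring points across the two sweeps, so the fused loop produces exactly the same result as the separated loops; that is the same condition checked before merging loops in GHOST.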

5.4.1.4 KFC3 and KFC4 Results

Walltime plots for V8 on KFC3 and KFC4 are presented in Figures 5-7a and b. The walltime values are compared with those for V3 since, as discussed earlier, V8 is an improved version of V3. Performance gains in V8 are more pronounced on KFC3 than on KFC4, although the changes in V8 introduce a spike in the walltime value at the 500 x 500 grid on KFC3. The walltime value for this particular grid is about 30% higher in V8 than in V3. Otherwise, performance gains of up to 25% are realized for the remaining grids. The cause of the spike on KFC3 is unknown, as Valgrind data was not collected on KFC3. On KFC4, as on KFC3, performance gains are larger at smaller grids, but for the remaining grids no large performance gains are realized. Absolute walltime values for V3 and V8 on KFC4 and KFC3 are presented in Table 5-4 and Table 5-5 for comparison. Valgrind


results on KFC4 are presented in Figure 5-8. The cache miss rates appear to be higher than those for V3 due to a minor decrease in the number of data calls, as presented in Figures 5-9a and b. These figures compare normalized data calls, D1 misses and L2D misses for V3 and V8 on KFC4. A major reduction in D1 cache misses is seen for the 200 x 200 grid, explaining walltime gains of up to 10% for this grid.

5.4.1.5 KFC6 Results

Performance results for V8 on the KFC6 architectures are presented in Figures 5-10 and 5-11. Figure 5-10 compares walltime values for V3 and V8 on KFC6I and KFC6A. On KFC6I, the walltime values show improvements in the range of 12% to 19%, with larger gains realized at smaller grids, as on KFC3. The relative gains in walltime on the KFC6 architectures are more pronounced than on KFC3 and KFC4.

Table 5-4 Absolute walltime values (in seconds) on KFC4 for V3 and V8

Grid          Walltime-V3    Walltime-V8    % Improvement
30 x 30       11.52          10.45          9.29
60 x 60       49.37          47.79          3.20
80 x 80       99.84          100.66         -0.82
100 x 100     171.87         174.91         -1.77
125 x 125     282.33         287.01         -1.66
150 x 150     413.06         423.86         -2.61
200 x 200     757.27         745.26         1.59
300 x 300     1755.89        1714.92        2.33
400 x 400     3154.64        3192.61        -1.20
500 x 500     4959.73        4993.15        -0.67
600 x 600     7257.02        7148.34        1.50
700 x 700     9973.19        10042.25       -0.69
800 x 800     12852.52       12866.5        -0.11

Table 5-5 Absolute walltime values (in seconds) on KFC3 for V3 and V8

Grid          Walltime-V3    Walltime-V8    % Improvement
30 x 30       12.88          9.71           24.61
60 x 60       57.04          48.55          14.88
80 x 80       113.35         100.87         11.01
100 x 100     201.34         178.41         11.39
125 x 125     332.76         299.63         9.96
150 x 150     476.77         425.66         10.72
200 x 200     846.55         765.73         9.55
300 x 300     1954.73        1753.39        10.30
400 x 400     3429.26        3097.44        9.68
500 x 500     5375.1         6956.37        -29.42
600 x 600     7603.56        7148.34        5.99
700 x 700     9973.19        10042.25       -0.69
800 x 800     12852.52       12866.5        -0.11
900 x 900     16671.13       -              -
1000 x 1000   22035          20158.89       8.51


Figure 5-6 Implementing Reciprocal Cache and loop merging in subroutine cal_u

(a)

(b)

Figure 5-7 Walltime as a function of grid size for V3 and V8 on (a) KFC4 (b) KFC3


Figure 5-8 Comparison of D1, L2 and L2D cache miss rates for V3 and V8 on KFC4

(a)


(b)

Figure 5-9 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC4 for GHOST (a) V3 (b) V8

Figure 5-10 Walltime as a function of grid size for V3 and V8 on KFC6I and KFC6A.


(a)

(b)

Figure 5-11 Walltime as a function of grid size for V2, V3 and V8 on KFC6I and KFC6A with (a) all grid sizes and (b) a zoomed plot for grid points up to 250000

This is not unexpected, as V2 was the best tuned code on the KFC6 architectures (as presented in chapter 4) and techniques similar to the ones in V2 have been applied in V8. On KFC6A, performance gains range from 7% to 15%. Similar to KFC6I, gains are larger at smaller grids. Figures 5-11a and b present walltime comparisons for V2, V3 and V8 on the KFC6 architectures. Also, absolute walltime values for these versions on the KFC6 architectures are presented in Tables 5-6a and b.


Table 5-6a Absolute walltime values (in seconds) on KFC6I for V2, V3 and V8

Grid          Walltime-V2   Walltime-V3   Walltime-V8   % Improvement V2 to V8   % Improvement V3 to V8
30 x 30       5.94          7.15          5.81          2.14                     18.70
60 x 60       24.51         29.6          23.94         2.34                     19.13
80 x 80       44.44         53.4          43.29         2.59                     18.94
100 x 100     71.11         85.39         69.41         2.39                     18.72
125 x 125     114.65        139.44        113.89        0.66                     18.32
150 x 150     167.38        204.51        174.77        -4.41                    14.54
200 x 200     314.19        385.99        331.45        -5.49                    14.13
300 x 300     729.54        909.99        785.18        -7.63                    13.72
400 x 400     1317.01       1613.48       1388.41       -5.42                    13.95
500 x 500     2079.46       2517.36       2174.18       -4.56                    13.63
600 x 600     3315.71       3804.55       3330.30       -0.44                    12.47
700 x 700     4578.25       5178.44       4538.26       0.87                     12.36
800 x 800     5926.16       6769.47       5932.76       -0.11                    12.36
900 x 900     7477.15       8599.42       7506.82       -0.40                    12.71
1000 x 1000   9308.88       10588.74      9292.79       0.17                     12.24

Table 5-6b Absolute walltime values (in seconds) on KFC6A for V2, V3 and V8

Grid          Walltime-V2   Walltime-V3   Walltime-V8   % Improvement V2 to V8   % Improvement V3 to V8
30 x 30       4.81          5.82          4.99          -3.79                    14.22
60 x 60       22.95         28.35         24.05         -4.77                    15.18
80 x 80       43.73         54.06         47.55         -8.73                    12.05
100 x 100     73.01         90.5          80.39         -10.10                   11.18
125 x 125     120.6         148.73        132.14        -9.56                    11.16
150 x 150     175.19        215.6         192.37        -9.81                    10.77
200 x 200     309.22        380.74        336.81        -8.92                    11.54
300 x 300     738.11        876.41        780.90        -5.80                    10.90
400 x 400     1312.16       1562.98       1385.17       -5.56                    11.38
500 x 500     2148.26       2474.13       2210.29       -2.89                    10.66
600 x 600     3470.18       3817.54       3449.56       0.59                     9.64
700 x 700     5749.77       5318.88       4929.94       14.26                    7.31
800 x 800     6448.75       6830.06       6197.23       3.90                     9.27
900 x 900     9126.26       9153.32       8422.71       7.71                     7.98
1000 x 1000   11191.61      10690.38      9707.16       13.26                    9.20


• V3 vs. V8 on KFC6I and KFC6A: V8 performs better than V3 for all grid sizes tested on KFC6I and KFC6A. This is not unexpected, because the performance optimization techniques implemented in V8 were similar to the ones in V2 (and V2 was the best code for KFC6I, as discussed in chapter 4).
• V2 vs. V8 on KFC6I: As seen from Table 5-6a, for small grids and large grids V8 is equal to or better than V2 on KFC6I. For moderate grids, V2 is better.
• V2 vs. V8 on KFC6A: In V8, performance gains of up to 14% are realized at larger grids on KFC6A. However, V2 tends to perform better than V8 at smaller grids. Valgrind data was not collected on KFC6A, and so no definite explanation is available.

As discussed at the beginning of this chapter, the focus in V8 was to extend the optimization techniques that were applied to subroutine cont to other subroutines (cal_u and cal_v) to test whether the gains realized in V2 could be realized in V8 as well. As presented in Figures 5-11a and b and Tables 5-6a and b, on KFC6I performance gains in V8 are realized only at larger grids (beyond 600 x 600). V8 performs better than V2 by a minor margin (2-3%) on KFC6I, while it performs better than V2 (by up to 13%) on KFC6A for larger grids. Figures 5-12a and b present normalized values of data calls, L2D misses and D1 misses on KFC6I for V3 and V8 for comparison. As seen in Figure 5-12 for V8, there is a drop in the number of data calls and L2D misses at larger grids (beyond 500 x 500), although this does not translate into improvements in walltime on KFC6I, as can be seen from Table 5-6a. From this analysis, it is clear that using data structures (in V3) proved to be beneficial for larger grids, while being detrimental for moderate grids and yielding mixed results for smaller grids. The negative effect of data structures on walltime is more pronounced on the KFC6 architectures.


(a)

(b)

Figure 5-12 Comparisons of normalized walltime and normalized number of data calls (divided by 10), D1 cache misses and L2D cache misses on KFC6I for GHOST (a) Version 3 (b) Version 8


5.5 AIRFOIL TEST CASE

In order to test the universality of the presented tuning techniques, the flow over a NACA 4318 airfoil inside a 24 x 24 inch wind tunnel section was chosen as a second test case. The experimental setup for this case is shown in Figure 5-13. The computational grid for the airfoil comprises 1031 x 120 grid points. The overall computational grid, consisting of two-dimensional multi-zonal blocks, is shown in Figure 5-14. The airfoil grid overlaps the central background grid. This background grid is surrounded by eight other rectangular grids. On the outer boundary, the left (inlet) boundary is fixed with a uniform dimensionless inlet velocity u∞ = 1.0, and the upper and lower boundary conditions are no-slip wall boundaries representing the top and bottom of the wind tunnel test section. For the airfoil blocks, the inner boundary condition is a no-slip wall boundary condition, and the outside boundary is set to “overlap”, which allows the background grid points that are overlapped by the airfoil block to interpolate values from the foreground airfoil grid points. Computational information between adjacent blocks is exchanged through two ghost points. All the parameters chosen in the computation are dimensionless. Information regarding the grid sizes of the individual blocks in the grid is presented in Table 5-7.

Figure 5-13 24 x 24 inch experimental test section


Table 5-7 Block sizes of individual zones on airfoil grid

Zone    Grid Size (i x j)    Zone    Grid Size (i x j)
1       1031 x 120           6       300 x 40
2       50 x 100             7       50 x 40
3       300 x 100            8       50 x 40
4       50 x 100             9       300 x 40
5       50 x 40              10      50 x 40

Figure 5-14 Grid used for 24 x 24 inch wind tunnel section

5.5.1 PERFORMANCE TEST RESULTS

Walltime values for the above test case were recorded on KFC3 and KFC6I. Although the airfoil grid block (zone 1) contains considerably more points than the surrounding blocks, no external/internal blocking was applied to this block, in order to test the optimization techniques presented in chapter 4. The code was run on a single node for 5000 iterations. A Reynolds number of 25,000 was used. A comparison of the absolute walltime values for 5000 iterations is presented in Table 5-8.


Table 5-8 Comparison of walltime values for various versions of GHOST for flow over a NACA 4318 airfoil on various hardware platforms (V1: % improvement compared to V0; V2, V3: % improvement compared to V1; V8: % improvement compared to the fastest version so far)

Platform   V0        V1                  V2                          V3                 V8
KFC3       6757.87   5125.51 (24.15%)    4683.14 (8.63%)             5042.54 (1.61%)    4590.09 (1.98% - fastest)
KFC6I      2899.3    2288.25 (21.07%)    1916.56 (16.24% - fastest)  2298.37 (-0.44%)   1959.03 (-2.21%)

On KFC3, V8 performs better than any other version of GHOST, while on KFC6I, V2 performs better than any other version. This is not unexpected given the results discussed at the beginning of this chapter and in chapter 4. V3 has arrays replaced by arrays of data structures, and this technique appears to be detrimental to the performance of the code for this test case on both KFC3 and KFC6I. However, the performance tuning techniques have largely been beneficial for the airfoil test case as well, showing that the code optimization techniques do result in better performance for other test cases.

5.6 SUMMARY

Results of the external and internal blocking techniques were briefly presented at the beginning of this chapter. While the performance gains attained from the application of external and internal blocking were impressive, each of these techniques has its own disadvantages. Implementing external blocking is time consuming on complicated grids, while internal blocking has problems with the accuracy of the results for complicated grids. This presented an opportunity to carry out a further tuning effort, in addition to the efforts discussed in chapter 4, with the goal of realizing further gains and circumventing the problems of the external and internal blocking techniques. Results of the further optimization effort on GHOST have been presented, with gains of up to 25% for smaller grids and up to 9% for larger grids (1 million grid points) when


compared to V3 on KFC3 and KFC4. While these gains were sporadic, performance gains of up to 20% were realized for all grid sizes tested on KFC6I and KFC6A. Performance gains in V8 when compared to V2 are meager: only about 2% gains were realized for smaller grids in V8 when compared to V2 on KFC6I, and for the other grids the performance is almost the same for both codes. On KFC6A, performance gains of up to 14% were realized in V8, but only for larger grids. To conclude, V8 is better than V2 only for larger grids on the KFC6 architectures, while its gains were more pronounced at smaller grids on KFC3 and KFC4. Later, performance tests were conducted on KFC3 and KFC6I on a NACA 4318 airfoil case to establish that the code optimization techniques result in better performance for other test cases as well.

Copyright © Pavan K Kristipati, 2008


CHAPTER-6

6. CONCLUSIONS AND FUTURE WORK

6.1 SUMMARY AND CONCLUSIONS

In the present work, results of the optimization effort on a 2D structured CFD code, GHOST, are presented. The beginning of the work consists of a discussion of why faster processors and newer computers do not necessarily translate into better performance of scientific codes. Later, background information about memory architecture is presented, followed by a description of the commodity clusters on which the tuning effort was carried out and the various code optimization techniques. Parts of the GHOST code that were detrimental to its performance were identified, and the tuning effort was carried out in stages. The optimization effort began with identifying the bottlenecks in the original version of GHOST (V0) and creating a baseline in terms of performance. Initial tests were conducted using the laminar lid-driven cavity flow test case. To measure the performance of the code, cache behavior was analyzed with the Valgrind toolkit while walltime information was captured using UNIX functions. Walltime values for 5000 iterations were normalized by the number of grid points of the computational problem and by the number of iterations so as to eliminate the effect of increasing grid size. If the code had been running efficiently on a single node, the normalized walltime would have increased as we moved from grids that fit into cache to grids that are considerably larger than cache; after that, the normalized walltime would have remained essentially the same even as the size of the grid increased. However, this was not what was observed. The performance of the code was sub-optimal on a single node and was characterized by increasing normalized walltime values with increasing grid sizes. The performance was plagued by high cache miss rates due to a mismatch between data access and data storage in memory, along with redundant mathematical activity in the code leading to unnecessary data calls. The sub-optimal performance of this code on a single node was traced to heavy cache misses. The sub-optimal performance of GHOST on a single node led to its superlinear behavior across multiple nodes, because speedup is calculated based on the performance


of the code on a single node. The fact that this superlinear behavior was observed on all machines tested, despite the differences in the year of construction (KFC3 in 2003 through KFC6 in 2006) and the disparity in hardware (for example, KFC6A has an AMD processor while KFC6I has an Intel processor) and networks (relatively fast on KFC3, relatively slow on KFC4), made a strong case for the need for tuning GHOST.

Optimization efforts on GHOST can largely be classified into two parts and have proved largely successful. The first part of the tuning effort (carried out till December 2004) primarily consisted of optimization effort on KFC3 and KFC4. Major performance problems that were identified in the original version of GHOST were addressed in this part. This part of the tuning effort essentially has three stages to it. In each stage, various code optimization techniques were implemented and the performance of the code was measured in terms of walltime values and cache behavior (L2D misses, D1 misses, data calls, L2D cache miss rate, and L2 cache miss rate). These values were compared with the values for previous versions of the code.

In the first stage, the order of the i, j sweeps was corrected to the cache-conserving form so that there is no mismatch between the order of data access and data storage. This yielded 5% (on smaller grids) to 85% gains (on larger grids) on KFC3 and KFC4, while performance gains were up to 50% on KFC6I and 28% on KFC6A. The lower gains are because of the faster processors and bigger caches (when compared to KFC3 and KFC4) along with the faster Front Side Bus (FSB). Walltime improvements were attributed to improvements in D1 cache behavior; on KFC6I, D1 misses were almost 50% lower for V1 when compared to V0.

The second stage of the tuning effort was focused on reducing redundant mathematical operations with the goal of reducing floating point operations. In this stage, the focus was on subroutine cont as it was the most expensive. Repeated divisions inside loops were replaced by pre-calculated reciprocal values, thereby reducing the frequency of division, as division operations have higher latency when compared to other mathematical operations. When nested loops were next to each other, analysis was done to determine whether they could be merged. If the merge of nested loops did not alter the algorithm of the code, loop merging (a technique discussed in chapter 2) was done to achieve temporal locality, as the number of iterations that separate successive


accesses to a given piece of reused data is reduced if the same data elements were referenced in two consecutive loops before their merge. Conditional statements inside nested loops were also removed by re-writing the code without conditional checks. In V2, performance gains on KFC3 and KFC4 were not large, as the focus was only on tuning one subroutine (cont). On the other hand, improvements of up to 10% (when compared to V1) were noticed on the KFC6 architectures due to the decrease in the number of data calls and D1 misses.

V3 is the result of the third stage of the tuning effort. In order to understand the effect of the usage of data structures, the changes made to the code from V1 to V2 were not implemented in V3. Thus, V3 is V1 with data structures implemented in place of arrays to aid in data fetch. The focus of this tuning stage was to avoid data misses that might be caused by the usage of arrays in the code, as explained in chapter 2. On KFC3 and KFC4, performance gains of up to 20% were noticed for larger grids (1 million grid points). However, there was performance degradation in V3 (when compared to V1 and V2) on the KFC6 architectures. Walltime behavior in this case could not simply be explained by cache miss data. The results of the above optimization techniques were incorporated into LeBeau et al. [52]. This paper included further optimization efforts on GHOST beyond the scope of this thesis, but based on the work presented so far. Results of that performance tuning effort were summarized at the end of chapter 4. From the performance tuning efforts in stage 1, the best optimized version of the code was within a factor of 2 of the estimated optimal performance over all the tested grid sizes, and the overall performance improvement for this case relative to the original code ranged from 20% faster for small grids to over 6 times faster for the largest.

The second part of the tuning effort comprises improvements on the performance effort carried out on GHOST between December 2004 and November 2008. Two techniques that were analyzed were external blocking and internal blocking. Results of the tuning efforts by Palki [53] for a laminar lid-driven cavity flow test case were presented in chapter 5. With the application of the external blocking technique, performance improvements of up to 28% were observed on KFC4 for all grid sizes tested (up to 360000 grid points), achieved on the previously tuned (V3) laminar version of GHOST. With internal


blocking, improvements of up to 26% were observed on KFC4. Improvements attained by using both of these techniques were smaller on the KFC6 architectures due to their more advanced hardware. While the performance gains attained from the application of the external and internal blocking techniques were impressive, each of these techniques has its own disadvantages. Implementing external blocking is time consuming on complicated grids, while the internal blocking technique has problems in terms of accuracy for complicated grids. This presented an opportunity to carry out further tuning, in addition to the efforts discussed in chapter 4, so as to realize further gains and to circumvent the problems of the external and internal blocking techniques. Results of the further optimization effort on GHOST (techniques similar to those applied in V2) have been presented, with gains of up to 25% for smaller grids and up to 9% on larger grids (1 million grid points) when compared to V3 on KFC3 and KFC4. While these gains were sporadic, performance gains of up to 20% were realized for all grid sizes tested on KFC6I and KFC6A. Later, performance tests were conducted on KFC3 and KFC6I on a NACA 4318 airfoil to establish that the code optimization techniques result in better performance for other test cases as well. From the results of the optimization effort carried out in stages 1 and 2, it can be concluded that the best optimized version of GHOST on KFC3 and KFC4 was V8, while V2 and V8 performed equally well on the KFC6 architectures. Arrays of data structures implemented in place of arrays in V3 resulted in performance gains on KFC3 and KFC4, while they were detrimental on KFC6I and KFC6A. However, the techniques implemented in V8 resulted in more gains on the KFC6 architectures, almost nullifying the loss due to the implementation of data structures in V3.

6.2 IMPACT OF CURRENT WORK

The impact of the current work is realized in multiple projects, because optimized versions of the GHOST code based on the presented techniques are being successfully used in them. Optimized versions of GHOST have saved hundreds of hours of CPU time, and the projects have been completed within a fraction of the time it would have taken if the original version of GHOST had been used for the CFD simulations. Some of these projects use an airfoil similar to the one presented in chapter 5. For example, in the studies on “Applying Genetic Algorithms (GA) to Complex Fluid Dynamics Simulations” [73] and “Application of Genetic Algorithms and Neural Networks


to Unsteady Flow Control Optimization” [74], optimized versions of the GHOST code based on the techniques presented in this work were used to perform the CFD simulations. The test problems were similar in these studies, except that the first was a steady flow over a NACA 0012 airfoil at an 18 degree angle of attack and a Reynolds number of 500,000, while the latter was an unsteady flow over the same airfoil with the same angle of attack and Reynolds number. The commodity cluster architectures KFC5, KFC6 and KFC6A discussed in this work were used as computational platforms. In the former study, 2800 simulations (50 generations) were performed using GHOST, and each generation took 29 hours and 23 hours, respectively, on 19 nodes of KFC5 and KFC6A. In a different study, “Experimental and Computational Investigation of a Modified NACA 4415 in Low-Re Flows” [75], the test case was flow over a NACA 4415 airfoil with Reynolds numbers ranging from 2.5 x 10^4 to 10 x 10^4 over a range of angles of attack. The computational grid comprised 85000 grid points. With a baseline dimensionless timestep of t = 0.0001, 10 subiterations were run on three processors on KFC4. The runtime was 12 hours per dimensionless time unit for laminar simulations and 21 hours for transitional simulations. Optimized versions of the GHOST code based on the techniques presented in this work were used to perform the CFD simulations in this study as well.

These are a few examples of the many instances in which optimized versions of the GHOST code have been used. As demonstrated in chapter 5, for a steady, laminar flow over a NACA 4318 airfoil, performance gains of up to 32% were realized on KFC3. Similar or better gains have been achieved when the presented techniques were applied to the above test cases. For turbulent flows, performance gains were up to 50%, effectively cutting the simulation times in half. Thus, the optimized versions of the GHOST code have had a major impact on projects undertaken by the UK Cluster Fluid Dynamics Group at the University of Kentucky.

6.3 FUTURE WORK

The present work has been largely successful in improving the performance of GHOST on commodity cluster architectures. It has been quite successful in attaining its main objective viz. to reduce cache miss rates to the lowest possible number. The relation


between the performance of a code and its cache behavior has been studied in detail. The unexpected behavior of the code when implementing data structures on the modern (KFC6) architectures needs to be studied in detail, as such behavior could not be traced to the cache behavior of the code. Also, based on the walltime recorded for V8 on the KFC6 architectures, the initial conclusion was that V8 without data structures would be the optimum code on the KFC6 architectures. Accordingly, a new code, V9, was constructed from V8 by re-implementing arrays instead of arrays of data structures. However, the walltime values for V9 do not prove that V9 is a better code than V8 on the KFC6 architectures. As the effects of using data structures in GHOST are not clear on the dual-core KFC6 machines, further analysis is required. A probable starting point is to explore the relation between data structures and walltime values by experimenting with a new data structure using a different set of variables.

Copyright © Pavan K Kristipati, 2008


REFERENCES

1. www.autofieldguide.com 2. http://www.intel.com/business/casestudies/BMW_Sauber_F1_case_study.pdf 3. www.cfdreview.com 4. www.boeing.com 5. Baggett, J. S. 1997 -- Some modeling requirements for wall models in large eddy simulation. Annual Research Briefs 1997, Center for Turbulence Research, NASA Ames/Stanford Univ., 123-134. 6. http://www.aps.org/units/dfd/meetings/upload/Spalart_DFD04.pdf 7. Paul r. Woodward, David h. Porter, Igor sytine, S. E. Anderson, Arthur a. Mirin, B. C. Curtis, Ronald h. Cohen, William P. Dannevik, Andris M. Dimits, Donald E. Eliason Karl-Heinz Winkler, Stephen W. Hodson -- Very high resolution simulations of compressible, turbulence on IBM-SP system, in Proceedings of the ACM/IEEE SC99 Conference, Portland, Oregan, November 13-18 1999, http://www.supercomp.org/sc99/proceedings/index.htm 8. W.K. Anderson, W.D. Gropp, D.K. Kaushik, D.E. Keyes, B.F. Smith – Achieving high sustained performance in an unstructured mesh CFD application, in Proceedings of the ACM/IEEE SC99 Conference, Portland Oregan, November 13-18 1999, http://www.supercomp.org/sc99/proceedings/index.htm 9. Donald J. Becker, Thomas Sterling, Daniel Savarese, John E. Dorband, Udaya A. Ranawak, and Charles V. Packer. Beowulf: A parallel workstation for scientific computation. In Proceedings, International Conference on Parallel Processing, 1995. 10. T. E. Anderson, D. E. Culler, and D. A. Patterson. A case for networks of workstations: NOW. IEEE Micro, Feb. 1995. 11. Distributed Terascale Facility to Commence with $53 Million NSF Award. http://www.nsf.gov/od/lpa/news/press/01/pr0167.htm. 12. T. Hauser, T. I. Mattox, R. P. LeBeau, H. G. Dietz and P. G. Huang, “High-cost CFD on a low-cost cluster”, Supercomputing 2000, November 2000. 13. http://www.myricom.com/myrinet/overview/ 14. http://www.hpcwire.com – IBM Roadrunner takes the Gold in the Petaflop Race – by Michael Feldman – June 09, 2008. 15. Thomas Sterling and Ian Foster, “Proceedings of the Petaflops Systems Workshops,” Technical Report CACR-133, California Institute of Technology, Oct. 1996. 16. P. Moin and K. Kim, “Tackling turbulence with supercomputers”, Scientific American, January 1997, vol. 276, No 1, pp 62-68 17. Markus Nordén, Malik Silva, Sverker Holmgren, Michael Thuné, Richard Wait – Implementation issues for High Performance CFD.


18. T. Mowry, M. Lam and A. Gupta – Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the Fifth International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS-V), pages 62-73, Boston, MA, October 1992. 19. K. Beyls, “Software Methods to Improve Data Locality and Cache Behavior,” - Doctoraatsproefschrift Faculteit Toegepaste Wetenschappen, Universiteit Gent, 2004. 20. D. A. Patterson and J. L. Hennesey, Computer Organization and Design, Morgan Kaufmann Publishers, San Francisco, 1998. 21. http://en.wikipedia.org/wiki/Distributed_memory 22. www.pcguide.com 23. http://en.wikipedia.org/wiki/IBM_PC 24. www.amazon.com 25. Fotheringham, J. 1961. “Dynamic storage allocation in the Atlas computer, including an automatic use of a backing store.” ACM Communications 4, 10 (October), 435- 436. 26. Kilburn, T., D. B. G. Edwards, M. J. Lanigan, and F. H. Sumner. 1962. “One-level storage system.” IRE Transactions EC-11, 2 (April), 223-235. 27. P. Mazzucco, “The Fundamentals of Cache”, Systemlogic.net, October 2000. 28. Wilkes, M.V- Slave Memories and Dynamic Storage Allocation. IEEE Trans. EC-14, 1965, pp 270-271. 29. Roger W. Hockney, C.R. Jesshope – Parallel Computers 2: Architecture, Programming, and Algorithms -- Published by CRC Press, 1988 ISBN 0852748116, 9780852748114 -- 625 pages. 30. J. Tyson – “How Computer Memory Works”, HowStuffWorks – www.howstuffworks.com 31. www.hardwaresecrets.com 32. http://en.wikipedia.org/wiki/Cache 33. http://www.webopedia.com/TERM/S/SRAM.html 34. J.L. Hennessy and D.A. Patterson, Computer Architecture – A Quantitative Approach, Morgan Kaufmann Publishers, third edition, 2002. 35. Douglas W. Clark – Cache Performance of the VAX-11/780. ACM Transactions on Computer systems, 1(1): 24-37, 1983. 36. D. Fenwick, D. Foley, W. Gist, S. VanDoren and D. Wissell – The AlphaServer 8000 Series: High end server platform development. Digital Technical Journal, 7(1): 43-65, 1995 37. http://en.wikipedia.org/wiki/CPU_cache


38. J. L. Baer, “2k papers on caches by Y2K: Do we need more?” Keynote address at the 6th International Symposium on High-Performance Computer Architecture, January 2000. 39. M. Jahed Djomehri , Rupak Biswas - Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications -- Volume 29 , Issue 11-12 (November/December 2003) Special issue: Parallel and distributed scientific and engineering computing - Pages: 1791 - 1810 Year of Publication: 2003 - ISSN:0167-8191. 40. Stefan Goedecker, Adolfy Hoisie -- Performance optimization – Numerically Intensive Codes – Publisher: The Society for Industrial and Applied Mathematics; ISBN 0-89871-484-2. 41. K. S. McKinley, S. Carr and C. –W. Tseng, “Improving data locality with loop transformations”, ACM Transactions on Programming Languages and Systems, 18(4):424-453, July 1996. 42. Michael J. Flynn, Stuart F. Oberman -- Design Issues in Floating-Point Division – Technical Report: CSL-TR-94-647; Computer Architecture and Arithmetic Group, Stanford University. 43. Stuart F. Oberman, Michael J. Flynn -- An analysis of division algorithms and implementations – Technical Report: CSL-TR-95-675; Computer Architecture and Arithmetic Group, Stanford University. 44. Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, Christian Weiss – Cache Optimization For Structured and Unstructured Grid Multigrid – Electronic Transactions on Numerical Analysis Journal – volume 10, 2000 – pp 21-40, ISSN: 1068-9613. 45. Dinesh K. Kaushik, David E. Keyes, and Barry F. Smith -- On the Interaction of Architecture and Algorithm in the Domain-based Parallelization of an Unstructured- grid Incompressible Flow Code -- Proceedings of the 10th Intl. Conf. on Domain Decomposition Methods, pages 311-319, 1998. 46. http://fun3d.larc.nasa.gov/ 47. Thomas Hauser, Timothy I. Mattox, Raymond P. LeBeau, Henry G. Dietz, P. George Huang, "High-Cost CFD on a Low-Cost Cluster," sc,pp.55, ACM/IEEE SC 2000 Conference (SC 2000), 2000. 48. S. Kadambi and J.C. Harden, “Accelerating CFD applications by improving cached data reuse”, ssst, p. 120, 27th Southeastern Symposium on System Theory (SSST’95), 1995 49. W. Gropp, D. Kaushik, D. Keyes, and B. Smith, “Performance modeling and tuning of an unstructured mesh CFD application”, Proceedings of SC2000. 50. W. Gropp, D. Kaushik, D. Keyes, and B. Smith, “High performance parallel implicit CFD”, Parallel Computing, 27(4):337—362, March 2001. 51. S. Gupta, “Performance evaluation and optimization of the unstructured CFD code UNCLE”, Thesis, University of Kentucky, May 2006.


52. R. P. LeBeau, P. Kristipati, S. Gupta, H. Chen, P. G. Huang, “Joint Performance Evaluation and Optimization of Two CFD Codes on Commodity Clusters”, AIAA – 2005 – 1380, January 2005. 53. Anand Palki – “Cache optimization and performance evaluation of a structured CFD code – GHOST”, Thesis, University of Kentucky, December 2006. 54. Y. B. Suzen and P. G. Huang, “Numerical simulations of wake passing on turbine cascades”, AIAA-2003-1256, 2003. 55. Y. B. Suzen and P. G. Huang, “Predictions of separated and transistional boundary layers under low-pressure turbine airfoil conditions using an intermittency transport equation”, Journal of Turbomachinery, Vol. 125, No.3, Jul. 2003, pp. 455-464. 56. Katam, V., LeBeau , R.P., and Jacob, J.D. “Simulation of Separation Control on a Morphing Wing with Conformal Camber”, AIAA-2005-4880, 2005. 57. L. Huang, P. G. Huang, R. P. LeBeau and Th. Hauser, “Numerical study of blowing and suction control mechanism on NACA0012 airfoil”, Journal of Aircraft, Vol. 41, 2004, pp. 1005 – 1013. 58. L. Huang, P. G. Huang, R. P. LeBeau and Th. Hauser, “Optimization of blowing and suction control on NACA 0012 airfoil using a genetic algorithm”, AIAA -2004-0423, 2004. 59. Suzen, Y. and Huang, G., “Simulations of Flow Separation Control using Plasma Acuators”, AIAA-2006-877, 2006. 60. C. M. Rhie and W. L. Chow, “Numerical study of the turbulent flow past an airfoil with tailing edge separation”, AIAA Journal, Vol. 21, 1983, pp. 1523-1532. 61. Valgrind Home – http://www.valgrind.org 62. http://kcachegrind.sourceforge.net 63. http://www.cfg-online.com/Wiki/Overset_grids 64. F. N. Felten and T. S. Lund, “Kinetic energy conservation issues associated with the collocated mesh scheme for incompressible flow”, Journal of Computational Physics, November 7, 2005. 65. Jeff Layton, “Tips and tricks for tuning CFD codes”, White Paper for Linux Networx. 66. U. Ghia, K. N. Ghia and C. T. Shin, “High-resolution for incompressible flow using the Navier-Stokes equations and a multigrid method”, Journal of Computational Physics, Vol. 48, 1982, pp. 387 – 411. 67. http://www.cfd-online.com/Wiki/Tridiagonal_matrix_algorithm_- _TDMA_(Thomas_algorithm) 68. Spinnato, P., Van Albada, G.D., and Sloot, P.M.A., “Performance Analysis of Parallel N-body Codes”, Proceedings of the sixth annual conference of the Advanced School of Computing and Imaging, ASCI Delft, Jun. 2000, pp. 213-220.


69. Ewing, A. and Henty, D., “Edinburgh Parallel Computing Centre: T3D Technical Report”, EPCC-TR95-05, Version 1.0, Oct. 1995. 70. Agarwal, G., and Saltz, J., “Interprocedural Compilation of Irregular Applications for Distributed Memory Machines”, Proceedings of IEEE/ACM SC1995, 1995. 71. Nielsen, E.J., Anderson, W.K., and Kaushik, D.K., “Implementation of a Parallel Framework for Aerodynamic Design Optimization on Unstructured Meshes”, Proceedings of Parallel Computational Fluid Dynamics ’99, May, 1999 72. URL: http://warewulf-cluster.org/. 73. Raymond P. LeBeau, Thomas Hauser, Narendra Beliganur, Daniel G. Schauerhamer, “Applying Genetic Algorithms to Complex Computational Fluid Dynamics Simulations”, 45th AIAA Aerospace Sciences Meeting and Exhibit, 8-11 January 2007, Reno, Nevada. 74. Narendra K. Beliganur, Raymond P. LeBeau, Thomas Hauser, “Application of Genetic Algorithms and Neural Networks to Unsteady Flow Control Optimization”, 18th AIAA Computational Fluid Dynamics Conference, 25 – 28 June 2007, Miami, FL. 75. Vamsidhar Katam, Raymond P. LeBeau, Jamey D. Jacob, “Experimental and Computational Investigation of a Modified NACA 4415 in Low-Re-Flows”, AIAA- 2004-4972 22nd Applied Aerodynamics Conference and Exhibit, Providence, Rhode Island, Aug. 16-19, 2004. 76. Rajesh Bhaskaran, Lance Collins “Introduction to CFD Basics”


VITA

Pavan K. Kristipati was born on April 5th, 1981 in Anantapur, India. He received a Bachelor's degree in Mechanical Engineering from Jawaharlal Nehru Technological University, Hyderabad, India in June 2002. Currently, he is completing this Master's program while working full time.

Professional Summary: 1) Programmer/Analyst – Current Marlabs, Inc. Edison, NJ. 2) Teaching / Research Assistant – Spring 2003 – Fall 2004 Department of Mechanical Engineering, University of Kentucky, Lexington, KY.

Scholastic Honors: Kentucky Graduate Scholarship, Fall 2002 – Fall 2004 University of Kentucky, Lexington, KY.

Papers and Conferences:

1. Raymond P. LeBeau, Jr., Pavan Kristipati, Saurabh Gupta, Hua Chen, George P. G. Huang – “Joint Performance Evaluation and Optimization of Two CFD Codes On Commodity Clusters” – 43rd Aerospace Sciences Meeting and Exhibit – January 10-13, 2005, Reno, Nevada. 2. Gupta S., LeBeau Jr. R.P., Kristipati P., Huang P.G., (2005) “Performance Optimization of a Structured and Unstructured Code on Commodity Clusters,” presented at 31st AIAA Dayton- Cincinnati Aerospace Sciences Symposium, Dayton, Ohio. March, 2005.
