Computer Engineering, 2010
Mekelweg 4, 2628 CD Delft, The Netherlands
http://ce.et.tudelft.nl/

MSc THESIS
Quantitative Analysis and Visualization of Memory Access Patterns

Marco Corina

Abstract

As the rate of improvement of processor performance has greatly exceeded the rate of improvement of memory performance, the communication between the (general-purpose) processor and the memory subsystem has become the main obstacle to achieving overall system performance improvements. At the same time, more and more modern applications require a considerable amount of computing power. Heterogeneous reconfigurable systems are increasingly gaining popularity due to their ability to speed up the execution of an application. However, the widespread adoption of such systems throughout industry is hindered by the shortage of tools guiding developers throughout the entire development process. Furthermore, the introduction of heterogeneous reconfigurable systems still does not solve the processor-memory dilemma. Hence, there is a compelling need for tools that facilitate the development of applications for heterogeneous platforms and that help the developer gain more insight into the memory access behaviour of an application. In this thesis, a set of sophisticated memory access analysis tools is presented, which provides detailed analysis of the memory access behaviour of an application. This toolset is developed in the context of the Delft Workbench and hArtes. The development of one of the tools in the toolset is the main contribution of this thesis. The toolset will be described and motivated, with emphasis on the tool that was developed during this thesis. Finally, a case study on a real application is conducted, showing the potential of the tool.

CE-MS-2010-15

Faculty of Electrical Engineering, Mathematics and Computer Science

Quantitative Analysis and Visualization of Memory Access Patterns

THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER ENGINEERING

by

Marco Corina
born in The Hague, The Netherlands

Computer Engineering
Department of Electrical Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology

Quantitative Analysis and Visualization of Memory Access Patterns

by Marco Corina

Abstract

As the rate of improvement of processor performance has greatly exceeded the rate of improvement of memory performance, the communication between the (general-purpose) processor and the memory subsystem has become the main obstacle to achieving overall system performance improvements. At the same time, more and more modern applications require a considerable amount of computing power. Heterogeneous reconfigurable systems are increasingly gaining popularity due to their ability to speed up the execution of an application. However, the widespread adoption of such systems throughout industry is hindered by the shortage of tools guiding developers throughout the entire development process. Furthermore, the introduction of heterogeneous reconfigurable systems still does not solve the processor-memory dilemma. Hence, there is a compelling need for tools that facilitate the development of applications for heterogeneous platforms and that help the developer gain more insight into the memory access behaviour of an application. In this thesis, a set of sophisticated memory access analysis tools is presented, which provides detailed analysis of the memory access behaviour of an application. This toolset is developed in the context of the Delft Workbench and hArtes. The development of one of the tools in the toolset is the main contribution of this thesis. The toolset will be described and motivated, with emphasis on the tool that was developed during this thesis. Finally, a case study on a real application is conducted, showing the potential of the tool.

Laboratory: Computer Engineering
Codenumber: CE-MS-2010-15

Committee Members :

Advisor: Koen Bertels, CE, TU Delft

Chairperson: Koen Bertels, CE, TU Delft

Member: Eelco Visser, ST, TU Delft

Member: Georgi Kuzmanov, CE, TU Delft

To my parents

Contents

List of Figures

List of Tables

Acknowledgements

1 Introduction
  1.1 Problem Definition
  1.2 Thesis outline

2 Background
  2.1 The Need for Heterogeneous Reconfigurable Architectures
  2.2 The MOLEN Processor and Programming Paradigm
  2.3 Memory Communication Bottleneck
    2.3.1 Hardware Exploitation Methods
    2.3.2 Software Exploitation Methods
    2.3.3 Concluding Remarks
  2.4 Motivation for Improved Memory Analysis
  2.5 Hardware/Software Co-design
    2.5.1 Design Space Exploration
    2.5.2 Hardware/Software Partitioning and Mapping
  2.6 Delft Workbench
  2.7 hArtes
  2.8 Motivation for Dynamic Memory Analysis Tools
  2.9 Conclusion

3 Analysis Techniques
  3.1 Program Analysis
  3.2 Dynamic Binary Instrumentation
    3.2.1 DynamoRIO
    3.2.2 Valgrind
    3.2.3 Pin
  3.3 Pin
    3.3.1 Evaluation of Pin
    3.3.2 Comparison of Pin against Other DBIs
    3.3.3 Pro et Contra of Pin
  3.4 Pin Instrumentation API
  3.5 Calling Pin
  3.6 Conclusion

4 QUAD: Sophisticated Memory Patterns Analysis Tools
  4.1 QUAD - Quantitative Usage Analysis of Data
    4.1.1 QUAD Objective
    4.1.2 QUAD Design and Implementation
    4.1.3 How QUAD Works
    4.1.4 QUAD's Example Outputs
  4.2 tQUAD
    4.2.1 tQUAD Objectives
    4.2.2 tQUAD Design and Implementation
    4.2.3 How tQUAD Works
    4.2.4 tQUAD's Example Output
  4.3 Concluding Remarks - The xQUAD Tool

5 The xQUAD Tool
  5.1 Introduction and overview of xQUAD
  5.2 Background Information
    5.2.1 Object File Overview
    5.2.2 DWARF Debugging Format
    5.2.3 DWARF Consumer Library
  5.3 xQUAD Architecture
    5.3.1 xQUAD Design and Implementation
    5.3.2 DWARF Debugging Information Module
    5.3.3 Pin API Calls and Analysis Routines
    5.3.4 How xQUAD Works: Functionalities and Restrictions
  5.4 xQUAD Analysis Output Examples
    5.4.1 Memory Map Report File
  5.5 Detailed Variable Report File
  5.6 Conclusion

6 Case Study: Wave Field Synthesis
  6.1 Introduction to Wave Field Synthesis
  6.2 Experimental Setup
  6.3 Memory Access Behaviour of WFS
  6.4 Ranking of Memory Intensive WFS Kernels
  6.5 Conclusion

7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work

Bibliography

List of Figures

2.1 The MOLEN machine organization
2.2 The DWB Architecture
2.3 The DWARV Toolset
2.4 The hArtes Toolchain

3.1 Pin's Architecture
3.2 Pin example of instrumentation and analysis routines

4.1 Profiling Framework of QUAD within DWB
4.2 Architectural overview of QUAD
4.3 Implementation overview of QUAD
4.4 XML format of producer/consumer binding
4.5 Profiling Framework of tQUAD within DWB
4.6 Architectural overview of tQUAD
4.7 tQUAD instruction instrumentation pseudocode
4.8 tQUAD routine instrumentation

5.1 Sections and segments of an ELF
5.2 Segments of an executable object file mapped in memory
5.3 DWARF DIEs sample output
5.4 C program and DWARF description of program
5.5 Architectural overview of xQUAD
5.6 Example of the MONITOR file passed to the xQUAD tool
5.7 xQUAD main function
5.8 DWARF depth-first tree traversal
5.9 xQUAD memory map example
5.10 Example of memory visualization

6.1 Principle of WFS
6.2 Snapshots of the heap memory usage of WFS
6.3 Snapshots of the heap memory usage of WFS
6.4 Snapshots of the stack memory usage of the WFS
6.5 Snapshots of the stack memory usage of the WFS
6.6 Snapshots of the stack memory usage of the WFS

List of Tables

4.1 Phases in the execution path of the hArtes wfs application

5.1 Example of statistics for the wav_load function from the WFS application
5.2 Example of statistics for the wav_load function from the WFS application

6.1 Memory usage statistics for the hArtes wfs application
6.2 Flat profile for the hArtes wfs application
6.3 Rank of WFS kernels

Acknowledgements

First of all, I would like to thank my advisor, Prof. Koen Bertels, for his support and guidance during this thesis. Further, I would like to thank Arash Ostadzadeh for his invaluable help; this work would have been much harder without his assistance. I would also like to thank Roel Meeuws for being there when I needed help. Last but not least, I would like to thank my friends and family for their support.

Marco Corina
Delft, The Netherlands
June 18, 2010

1 Introduction

In the past decades, the quest for ever greater processor performance has led to very complex architectures and phenomenal clock speeds¹. However, the rate of improvement of processor speed has exceeded the rate of improvement of DRAM (Dynamic Random Access Memory) speed. This processor-memory gap[21] is increasingly becoming a problem, in the sense that it is the primary obstacle to improving the overall performance of computing systems. Ideally, a perfect memory system would immediately supply any data requested by the processor. However, this is technologically as well as economically infeasible. The advent of cache memories has alleviated the bottleneck caused by the slowness of memory systems, resulting in a memory hierarchy with multiple levels of caches that exploits the principle of locality². Unfortunately, a memory hierarchy including caches still does not solve the memory-wall problem, meaning that the speed of an application remains bounded by the speed of the memory system. This is especially true for many data-intensive applications whose memory behaviour is characterized by large working sets and streaming data accesses, which cannot fit into the on-chip caches³. At the same time, modern applications require more processing power than ever. This means that even though processors run at close to the maximum speed the technology allows, it is still not enough to cope with the latest developments and demands.

To cope with this problem, there is a need for an alternative to the conventional general-purpose processor architecture. Even multi-core architectures do not deliver the expected improvements, as most applications are not parallelized, or cannot be parallelized efficiently enough to unleash the power of multi-core systems; hence, applications cannot fully benefit from them. Besides the need for increased computational power, there is today also an increasing demand for (mobile) embedded technology. This brings other aspects into the design of an architecture that must be considered, one of which is low power consumption. This aspect makes a conventional architecture an infeasible solution for embedded systems in general.

Designers have therefore increasingly used ASICs (Application Specific Integrated Circuits) along with a general-purpose processor to gain performance speed-ups for, e.g., expensive media encoding and decoding and signal processing. These heterogeneous systems permit the deployment of multiple types of processing elements within a single target platform, allowing each element to perform the task(s) to which it is best suited[35]. The elements that constitute these systems can be, for instance, ASICs, Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), and conventional commodity processors.

Each of the above mentioned components has its advantages, such as high flexibility (as in the case of general-purpose processors) or high performance (e.g. ASICs). However, reconfigurable logic in the form of FPGAs is becoming increasingly popular as a computing component. Favoured by the technological advances of recent years, FPGAs have increased tremendously in size and speed. Additionally, today's FPGAs contain built-in logic that is tailored to the needs of different application domains, including embedded processing cores, high-performance arithmetic units, specialized I/O functionalities, etc. Thanks to this, and to the fact that FPGAs can achieve high performance by exploiting both the fine-grained and coarse-grained parallelism present in applications, they have been a central topic in the research community, favouring the evolution of the concept of reconfigurable computing. Due to its potential to greatly accelerate a wide variety of applications[37, 11], this discipline is growing rapidly. Reconfigurable computing aims at approaching the performance of hardware while maintaining the flexibility and upgradability of software. This is achieved by mapping the computationally intensive tasks of an application onto the reconfigurable units, while keeping on the general-purpose processor the parts of the application (or tasks) that cannot be performed efficiently on the reconfigurable logic, such as data-dependent control and memory accesses. Furthermore, reconfigurable hardware offers advantages like reduced energy and power consumption, which are vital for mobile and embedded systems.

¹ Furthermore, as modern processor feature sizes approach atomic scales, a horizon has appeared for Moore's prediction of technological advance, which calls for new ways of increasing computing performance.
² Briefly, this principle states that a memory word that has been accessed is likely to be accessed again soon (temporal locality), and that memory words adjacent to an accessed word are likely to be accessed soon after the access in question (spatial locality).
³ Due to the low density and high cost of SRAM, L2 cache size is usually limited to between 4 and 12 MB on state-of-the-art processors, which is not large enough for many multimedia applications.

1.1 Problem Definition

While presenting various advantages over custom hardware design, reconfigurable computing also facilitates the growth of heterogeneous computing. However, the widespread adoption of such systems throughout industry is hindered by the shortage of tools guiding developers throughout the entire development process. Developing reconfigurable hardware requires knowledge of hardware design methods and tools that most system programmers do not have. A remedy to this problem would be to develop a hybrid programming model that abstracts the hardware away. Unfortunately, current programming models are still immature and do not permit full exploitation of the capabilities of hybrid general-purpose processor and FPGA designs[23, 4]. Furthermore, heterogeneous reconfigurable architectures will still suffer from the processor-memory gap[19], even though the evolution of FPGA technology will supposedly include substantial memory resources by which the memory wall may be alleviated[46].

Hence, there is a strong need for tools that facilitate the development of applications on heterogeneous platforms on the one hand, and tools that help the developer gain more insight into the memory utilization of an application and how to optimize it on the other. Inspecting the behaviour of an application in general, and the actual pattern of memory accesses in particular, is an essential aspect of carrying out effective optimizations in application development for reconfigurable systems. As a result, many research initiatives are emerging that target support tools for application behaviour analysis from different perspectives.

The Delft Workbench (DWB) project (Section 2.6) at Delft University of Technology and the hArtes project (Section 2.7) target the research and development of heterogeneous reconfigurable systems throughout the entire design cycle, from code profiling to compilation. These toolchains are designed and developed considering actual hardware/software co-execution on real, heterogeneous hardware. Hence, both DWB and hArtes should be aware of the memory access behaviour of applications during their profiling stages, which is accomplished by including dynamic profiling tools. These provide important information to developers for making critical decisions during various hardware/software co-design stages, by considering not only the computationally intensive tasks but also their memory behaviour.

For this purpose, a set of memory behaviour analyzers has been developed at Delft University of Technology, with the primary intention of delivering detailed information about the memory usage of applications, and possibly discovering memory-related bottlenecks. The results retrieved from these tools can be of invaluable importance during hardware/software partitioning steps as well as for gaining information about application code optimizations.

1.2 Thesis outline

This thesis is organized as follows. Chapter 2 discusses the issues introduced in this introduction in more detail, and presents some related research on memory optimization. It also introduces the field of hardware/software co-design, focusing on the research and tools developed at Delft University of Technology. Chapter 3 introduces analysis techniques that can be used for the development of tools; more emphasis is put on the specific analysis methods used during this thesis. Chapter 4 presents the work done at Delft University of Technology regarding dynamic memory analysis inside DWB and hArtes. Chapter 5 presents a tool which augments the analysis of the tools described in Chapter 4; this tool is the main contribution of this thesis. Chapter 6 validates the tool of Chapter 5 by investigating a real application. Finally, Chapter 7 summarizes the work done in this thesis and discusses the results and future directions of this development.

2 Background

As introduced earlier, the advent of heterogeneous systems incorporating reconfigurable hardware promises great opportunities for application speed-up. However, the ever increasing gap between processor and memory performance makes the memory subsystem the main limiting factor for performance improvement. Even worse, as this gap continues to grow, this memory wall will critically degrade the performance of such systems. The problem becomes even more significant when the target architecture couples reconfigurable hardware with a general-purpose processor. In fact, the communication between the reconfigurable unit(s), memory, and processor is very likely to constitute a communication bottleneck, which automatically leads to computational stalls, degrading the performance of the whole system.

Besides being a limiting factor in terms of performance, the memory usage of a system also characterizes its power consumption. These factors are especially important in embedded systems, as the demand for high-performance, low-power devices continues to grow. Hence, techniques must be found to prevent or limit the incurred stalls. Additionally, for heterogeneous reconfigurable systems to be widely accepted in industry, tools are needed that guide developers through the entire process. These tools can be categorized under hardware/software co-design methodology tools.

This chapter reviews the concept of heterogeneous reconfigurable computing, along with its advantages compared to traditional computing architectures. Next, related research on the memory communication bottleneck problem is reviewed, and various techniques for improving the memory access patterns of programs are introduced, both at the software and at the hardware level. Subsequently, the hardware/software co-design methodology is presented, briefly explaining two areas in this methodology which are closely related to the design process of heterogeneous reconfigurable computing, i.e. hardware/software partitioning and design space exploration. Furthermore, this chapter describes the ongoing research at Delft University of Technology related to reconfigurable computing, i.e. the MOLEN processor, the Delft Workbench, and hArtes, a European project.

2.1 The Need for Heterogeneous Reconfigurable Architectures

The ever increasing demand for high-performance systems has been tackled for a few decades now. As the complexity of applications increases, achieving overall high performance requires changing the architectural design on which these applications are deployed. This is especially true for multimedia applications.


As mentioned before, a conventional architecture containing a software-programmed general-purpose processor does not offer the desired speed-ups, as the clock speed and/or throughput of the processor is not enough to cope with these computation- and data-hungry applications. However, this class of processors offers a very flexible solution to the developer, as different operations can be performed by executing different instructions from its Instruction Set Architecture (ISA) without changing the hardware. If a certain operation cannot be executed with a specific instruction, the same operation can be executed by a sequence of instructions¹.

On the other hand, Application Specific Integrated Circuits (ASICs) are very efficient in terms of computational performance, size, and power consumption, as they are designed specifically for one (or a few) dedicated computational task(s). However, once fabricated, these devices cannot be adjusted. If the need arises, the entire system must be redesigned. This expensive process, in terms of non-recurring engineering costs and time-to-market, represents the main drawback of ASIC designs.

Reconfigurable architectures, in turn, fill the gap between hardware and software solutions by offering software-like flexibility at a performance rate comparable to hardware solutions. The growth of this field is driven largely by the latest developments in Field Programmable Gate Array (FPGA) technology, which permit the use of commodity off-the-shelf FPGAs as the main reconfigurable device, reducing the time-to-market and costs of reconfigurable systems.

FPGAs excel at implementing applications as highly parallel circuits, thus yielding fast performance, especially for streaming applications with a lot of data parallelism, where the same repetitive transformations are applied to a large amount of data. However, for applications that do not exhibit enough data parallelism, FPGAs perform poorly. Therefore, for the more sequential operations, a GPP is a better choice, as the clock rate of an FPGA is usually one order of magnitude less than the clock rate of a GPP. Hence, the intrinsic characteristics of the application determine the FPGA performance, and thus the choice of the right architecture. Applications determine how much raw parallelism exists, how exploitable it is, and how fast the clock can operate[19].

A flexible architecture that permits such performance while offering support for a wide range of applications is achieved by combining general-purpose processors and reconfigurable hardware units into a heterogeneous architecture. In such an architecture, the GPP performs the various data-dependent control and non-parallelizable operations, while the (parallel) computationally intensive segments of an application are mapped onto the reconfigurable hardware. The MOLEN polymorphic processor[42] is an example of a heterogeneous platform, incorporating both a general-purpose processor and a reconfigurable processing unit. Several architectures have been proposed in this field ([20, 18, 26, 8] are examples thereof); however, it is not the intention of this thesis to present a complete overview of architectures that augment a programmable processor with one (or more) reconfigurable hardware units. As the MOLEN polymorphic processor is the target processor platform in the work presented in this thesis, its architecture is described in the next section.

¹ This is of course peculiar to the architecture type. To some extent, an ISA may be more application specific than other ISAs (e.g. embedded computing vs. desktop computing). Nevertheless, these processors are all considered general-purpose.

2.2 The MOLEN Processor and Programming Paradigm

Reconfigurable hardware coexisting with a core general-purpose processor has been considered by several researchers as a good candidate for speeding up applications[11, 37]. The MOLEN polymorphic processor[42] is such an architecture, which defines how a general-purpose processor (GPP) interacts with one or more (reconfigurable) co-processors. Among the proposed heterogeneous architectures, the MOLEN architecture addresses various shortcomings of existing reconfigurable computing solutions. These shortcomings are summarized and described below.

Opcode Space Explosion Traditionally, a common approach in reconfigurable computing is to introduce a new instruction for each portion of the application mapped onto the FPGA. The consequence is that, to implement a broad range of applications, the application-specific instructions occupy a considerable part of the opcode space. The designer and the compiler are limited by this consumed opcode space.

No modularity Each approach has its own specific definition and implementation targeting a specific reconfigurable hardware. This makes porting an application to a new reconfigurable platform a burdensome job.

Limitation of the number of parameters In a number of approaches, only a limited number of input and output parameters is possible.

No support for parallel execution Many architectures do not take into account the possibility of a parallel execution of data-independent sequential operations, which is an important and powerful feature of FPGAs.

The MOLEN machine organization, depicted in Figure 2.1, is composed mainly of the Core Processor, i.e. a general-purpose processor, and the Reconfigurable Processor (RP). Instructions and data are fetched from memory by the Instruction Fetch and the Data Fetch units, respectively. The Arbiter performs a partial decoding of the instructions and issues them to the corresponding unit, i.e. to the general-purpose processor or the reconfigurable processor. The Memory Unit is responsible for distributing (collecting) data. The reconfigurable processor further consists of a Custom Configured Unit (CCU) and a ρµ-code processor. The CCU consists of a reconfigurable hardware unit (e.g. an FPGA) and a memory unit. An application's code runs on the GPP, except for the parts of the application that are chosen to be implemented in hardware in order to speed up program execution. Data exchange between the GPP and the RP is performed via the exchange registers (XREGs).

Figure 2.1: The MOLEN machine organization

In order to target the ρµ-code processor, the MOLEN Programming Paradigm[41] is proposed. This is a sequentially consistent paradigm for programming reconfigurable processors, possibly including a general-purpose processor. The paradigm permits the execution of hardware operations in parallel, and is (currently) intended for single program execution. To overcome the earlier mentioned opcode explosion shortcoming of other reconfigurable programming paradigms, the MOLEN programming paradigm requires only a one-time ISA (Instruction Set Architecture) extension of, at most, eight instructions (denoted as the polymorphic ISA, or πISA) to provide a large user-reconfigurable operation space. Nevertheless, the smallest set of MOLEN instructions needed to provide a working scenario for an arbitrary number of applications consists of four basic instructions, namely SET, EXECUTE, MOVTX, and MOVFX.

The SET phase, initiated by a SET instruction, provides a general approach for loading an arbitrary number of configurations onto the reconfigurable part of the reconfigurable processor. This instruction requires a single parameter, i.e. the beginning address of the configuration microcode. Upon detection of a SET instruction, the Arbiter reads every sequential memory address until it detects the end_op microinstruction, which denotes the termination of the code intended for hardware execution. The ρµ-code unit then ensures that the data fetched from memory is redirected to the reconfigurable unit memory². After completion of the SET phase, the CCU is configured to perform a specific operation. The actual operation is performed in the EXECUTE phase, initiated by the EXECUTE instruction. This instruction refers to a single parameter, i.e. the address of the microcode to execute on the CCU, which was configured earlier during the SET phase. The microcode sequence is also terminated by the end_op instruction. The MOVTX and MOVFX instructions are used for moving values to and from the general-purpose registers and the exchange registers (XREGs).

² MOLEN only uses BRAM on the FPGA for data and instruction memory. This limits the instruction size to the number of available BRAMs.
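To make the paradigm more concrete, the following hedged sketch shows how a C-level kernel call might be lowered under this model. The pragma name, the exchange register numbers, and the lowered sequence in the comment are illustrative assumptions only, not actual πISA syntax.

/* Hypothetical annotation marking a kernel for hardware execution. */
#pragma call_fpga sad
int sad(const int *a, const int *b, int n)
{
    int s = 0, i;                 /* software fallback body */
    for (i = 0; i < n; i++)
        s += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return s;
}

/* Conceptually, a call "r = sad(a, b, n);" is replaced by a sequence like:
 *   MOVTX XR0, a    ; move the parameters into the exchange registers
 *   MOVTX XR1, b
 *   MOVTX XR2, n
 *   SET <address of configuration microcode>   ; configure the CCU
 *   EXECUTE <address of execution microcode>   ; perform the operation
 *   MOVFX r, XR3    ; move the result back to a general-purpose register
 */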

2.3 Memory Communication Bottleneck

While important, increasing the computational speed of an application by mapping parts of it onto reconfigurable hardware when a certain degree of parallelism is exhibited is not the only aspect that must be taken into account. To efficiently use the parallel hardware capabilities, the system must provide enough data to the computational units. This puts a performance bound on the input/output capabilities of the system, i.e. computational power will yield no benefit unless the required data is fed to the hardware units in a timely manner. Applications that use large data sets with few or no data dependences, a peculiarity of multimedia applications, are an example of workloads that need enough memory bandwidth to exploit parallelism. On a heterogeneous reconfigurable architecture, this need becomes even more evident, as different processing elements are present on the same platform, all of which need to communicate with the main memory module. Usually, an increase in processing elements (of whatever nature) is directly proportional to the need for increased I/O capabilities. Thus, the performance of an application mapped onto a heterogeneous reconfigurable platform is largely determined by how much exploitable parallelism is available, and by the ability of the system to provide data to keep the parallel hardware operational[19]. The latter is the processor-memory communication bottleneck that can become the main obstacle to improving, if not the cause of degrading, the overall system performance.

As the exponential factor by which processors have increased their performance is much higher than the exponential factor of performance improvement of memory, solutions for the processor-memory bottleneck have been extensively studied for at least the last three decades. Hardware as well as software solutions have been applied, although these solutions only seem to alleviate the phenomenon, leaving it the main obstacle to achieving system speed-up. The ideal solution to this problem would be to discover a memory technology that has a high capacity and whose speed scales with that of the processor. Unfortunately, technology has not come that far yet. To cope with this gap, much research has been carried out on better exploitation of the memory subsystem, improving the communication between memory and the (general-purpose) processor.

In the next sections, a few methods are briefly introduced which address the amelioration of the processor-memory interface, along with an explanation of why these methods are not fully capable of closing the processor-memory gap.

2.3.1 Hardware Exploitation Methods

This section explains hardware exploitation techniques, which aim at improving the communication latency between the processor and the memory subsystem. The following examples give an overview of the best known techniques, and are by no means an exhaustive treatment of the topic.

Caches To limit the latency of data accesses, processors make use of a hierarchy of memories, consisting of on-chip and off-chip memories. Because off-chip memories stall the processor for a long time (100-200 processor cycles), creating a hierarchical memory subsystem with (multilevel) caches improves the access latency[21]. Cache parameters such as cache size, line size, and associativity can significantly influence overall system performance with respect to both power and execution time. Modern processors have an even deeper hierarchy of caches, with the consequence that cache miss latencies have become extremely large compared to processor clock cycles. Consequently, to achieve good performance, an application must exhibit a memory reference pattern that exploits caches well, i.e. the pattern must exhibit high spatial, temporal, and processor locality[24]. Hence, the benefit that an application can gain from caches depends on the intrinsic nature of the application. In this sense, the programmer can achieve better performance by restructuring the data or code of the application, thereby changing its memory reference pattern. However, even though caches can occupy up to 80% of the total logic in state-of-the-art processors, these large caches are not effective for applications with a streaming data access nature and large working sets[34].

Scratch-pad Memory The caches described above are traditionally part of a familiar target architecture consisting of a processor core, data cache(s), and external main memory. However, this target architecture is not always present, especially in the case of embedded systems. In fact, as embedded systems are often designed for a specific application, the designer may use a more unconventional architecture that suits the specific application under consideration. One such alternative is scratch-pad memory[33]. Scratch-pad is an on-chip memory, sometimes also referred to as on-chip SRAM. It refers to on-chip data memory mapped into a different memory space than the off-chip memory, but using the same address and data buses. The main difference between scratch-pad and a data cache is that the former guarantees a single-cycle access time, whereas data caches are subject to cache misses, which increase the memory latency.
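As a hedged sketch of how a developer might exploit this (the section name and the availability of such an attribute depend on the toolchain and linker script, and are assumptions here), a hot buffer can be pinned into scratch-pad memory at link time:

/* Assumes a GCC-style toolchain and a linker script that maps the
 * ".scratchpad" output section onto the on-chip SRAM address range. */
#define SCRATCHPAD __attribute__((section(".scratchpad")))

SCRATCHPAD static int coeff[256];   /* single-cycle access, by placement */

int fir_step(const int *x)
{
    int acc = 0, i;
    for (i = 0; i < 256; i++)
        acc += coeff[i] * x[i];     /* coeff[] never misses; x[] may */
    return acc;
}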

Convey Memory Controllers The Convey HC-1 computer[8], a hybrid architecture integrating an FPGA-based, reconfigurable coprocessor with an industry-standard Intel® 64 processor, implements its own high-bandwidth memory subsystem that logically shares a cache-coherent global memory space between the host general-purpose processor and the coprocessor. To support the bandwidth demands of the coprocessor, 8 memory controllers support a total of 16 DDR2 memory channels, which provide an aggregate of over 80 GB/s of bandwidth. To improve the efficiency of the memory subsystem, the memory controllers support standard DIMMs as well as Convey-designed Scatter-Gather DIMMs (SG-DIMMs). The SG-DIMMs allow access to physical memory by individual words instead of 64-byte cache lines (as the host does). Accessing by 8-byte blocks reduces the inefficiencies encountered when accessing memory with non-unit strides (or randomly) in a cache-based system.

2.3.2 Software Exploitation Methods

Besides techniques that change the hardware to achieve a better usage of the memory subsystem, there also exist software-based techniques that do the same. Even small changes to an application can result in considerable performance gains. Changing an application to optimally use the underlying memory subsystem can be especially useful in data-intensive applications where large amounts of data are accessed in a streaming fashion. Hence, applying software transformations to an application can result in better exploitation of data locality, thereby improving its memory-specific behaviour, at reasonably contained costs compared to a hardware method. Transformations can either be done manually, performed at a high level of abstraction, i.e. at the source code level, before compile time, or automatically, meaning that the compiler itself performs the optimizations at compile time. Customizing the code to efficiently exploit the memory hierarchy can deliver significant performance gains, both in terms of speed and power.

Program transformations can be divided into control transformations and data layout transformations. In control transformations, the access pattern of a program is changed in order to improve cache reuse. This is mostly applied to loops, as these usually represent the most time-consuming parts of an application. Data layout transformations, on the other hand, reorder the placement of data and/or instructions in the off-chip memory, in order to improve cache reuse without having to change the order of memory references. Both data layout transformations and control (loop) transformations have the desirable property of reducing the memory bandwidth requirements of a particular application. In the following, some widely used control and data transformations are introduced.

Control Transformations The primary emphasis is on loops, since that is generally where most of the execution time is spent. Traditionally, loop transformations have been the main technique used to improve locality, by changing the access pattern as a result of changing the order of execution of loop iterations. The effect of loop transformations is local, i.e., a loop transformation affects only the loop nest to which it is applied, and both temporal and spatial locality may improve as a result.

Loop Interchange Loop interchange modifies a nested loop by switching the place of the inner loop with the outer loop. This technique can result in maximizing the use of data in a cache block. The following example shows how this technique is applied.

/* Before */
for (j = 0; j < 100; j++)
    for (i = 0; i < 5000; i++)
        x[i][j] = 2 * x[i][j];

/* After */
for (i = 0; i < 5000; i++)
    for (j = 0; j < 100; j++)
        x[i][j] = 2 * x[i][j];

Loop Fusion This technique combines two or more loops into a single one. For this transformation to be legal, two conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop bodies must not have dependencies that would be violated if they were executed together[45]. The example below demonstrates this.

/* Before */
for (i = 0; i < 100; i++)
    x[i] = y[i];
for (i = 0; i < 100; i++)
    z[i] = y[i] + 2;

/* After */
for (i = 0; i < 100; i++) {
    x[i] = y[i];
    z[i] = y[i] + 2;
}

The main benefits of this optimization are improved data locality and reduced loop overhead. However, some architectures achieve better performance if the loop is split in two instead. This is explained next.

Loop Fission Loop fission (or loop distribution) is the opposite of loop fusion, i.e. decomposing a single loop into smaller loops. This can result in better exploitation of data locality when the loops iterate over large ranges. Also, in the case of multi-core processors, the two loops can be executed in parallel, one on each core.

/* Before */
for (i = 0; i < 100; i++) {
    x[i] = y1[i];
    z[i] = y2[i];
}

/* After */
for (i = 0; i < 100; i++)
    x[i] = y1[i];
for (i = 0; i < 100; i++)
    z[i] = y2[i];

Loop Tiling Also known as loop blocking, this loop transformation breaks up a loop into a set of nested loops, with each inner loop performing the operations on a subset of the data[45].

/* Before */
for (i = 0; i < 100; i++)
    for (j = 0; j < 100; j++)
        x[i] = y[i][j] * z[i];

/* After */
for (i = 0; i < 100; i += 2)
    for (j = 0; j < 100; j += 2)
        for (k = i; k < min(i + 2, 100); k++)
            for (h = j; h < min(j + 2, 100); h++)
                x[k] = y[k][h] * z[k];

Here, the original loop is broken into tiles of size two: each loop is split into two loops. The result is a completely different pattern of accesses across the y array: instead of walking across one entire row, the transformed code walks through rows and columns according to the tile structure. This transformation is useful when the loops iterate over ranges that exceed the size of the cache, allowing control over the behaviour of the caches during the loop execution.

Loop Unrolling This transformation technique unwinds the body of a loop at the expense of a larger binary size. It results in less loop overhead code (or no overhead code at all, in the case of a completely unrolled loop). The example demonstrates this.

/* Before */
for (i = 0; i < 100; i++)
    x[i] = y[i];

/* After */
for (i = 0; i < 100; i += 5) {
    x[i]     = y[i];
    x[i + 1] = y[i + 1];
    x[i + 2] = y[i + 2];
    x[i + 3] = y[i + 3];
    x[i + 4] = y[i + 4];
}

Function Inlining Function inlining (a technique very similar to loop unrolling) refers to replacing a function call with the function's body. This transformation has the advantage of reducing the execution time, memory accesses, and power consumption of applications by avoiding run-time management of the call stack for the inlined function, at the expense of a larger binary size and reduced readability. To inline a function, the programmer either inserts a keyword instructing the compiler to perform the inlining, or manually writes out the body of the function at the call site.
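As a minimal illustration (the function names below are ours, not from the thesis), the C snippet uses the standard inline hint; compilers may also inline automatically at higher optimization levels.

/* The inline hint asks the compiler to substitute the body of sq()
 * at each call site, removing the call overhead inside the loop. */
static inline int sq(int v) { return v * v; }

int sum_of_squares(const int *a, int n)
{
    int s = 0, i;
    for (i = 0; i < n; i++)
        s += sq(a[i]);    /* effectively s += a[i] * a[i]; once inlined */
    return s;
}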

Data Transformations The use of data layout transformations is another type of optimization, which aims at changing the memory layout of data without affecting the order of instructions. The results obtained with this transformation technique depend on the specific memory access patterns of the application as well as on how the system uses the memory subsystem to store the data. When dealing with data transformations, a distinction is usually made between row-major, column-major, and block-based data layouts. In the next paragraphs these techniques are briefly described.

Row-major Data Layout This data layout places the elements of a row in consecutive memory locations. When this layout is used (for instance, by a compiler for the C programming language), attention must be paid to how an array is accessed. The following example explains this.

/* Before */
for (j = 0; j < 100; j++)
    for (i = 0; i < 100; i++)
        x[i][j] = x[i][j] + 1;

Provided that array x is stored in row-major format, the example shows poor spatial locality: consecutive iterations of the inner loop step through rows, i.e. through elements that are not placed in consecutive memory locations. Changing the data layout to column-major format would solve the problem. Note that the same can be accomplished by using the earlier described loop interchange transformation, as shown below.
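For reference, this is the interchanged form of the loop above; with x stored in row-major order, the inner loop now walks consecutive memory locations.

/* After */
for (i = 0; i < 100; i++)
    for (j = 0; j < 100; j++)
        x[i][j] = x[i][j] + 1;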

Column-major Data Layout In programming languages where the default data layout is column-major (e.g. Fortran), the inverse of the described row-major data layout applies. In this case, the elements of a column are placed consecutively in memory. Hence, the example shown previously would exhibit good spatial locality in a column-major case.

Block Data Layout In the case of multiple arrays that have different access patterns, changing to row- or column-major formats does not solve the problem. Furthermore, in the case of a large matrix stored in memory, assuming, for instance, a row-major layout, accesses to column elements can cause cache conflicts. Instead of operating on entire rows or columns, a block data layout operates on submatrices (blocks), and the elements of each block are placed in contiguous memory locations. The example below, from [21], shows this technique.

/* Before */
for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1) {
        r = 0;
        for (k = 0; k < N; k = k + 1)
            r = r + y[i][k] * z[k][j];
        x[i][j] = r;
    };

In this example, the cache miss frequency depends on N and the size of the cache. In case the cache cannot hold the three N x N matrices, the code can be changed to operate on B x B submatrices, as shown below.

/* After */
for (jj = 0; jj < N; jj = jj + B)
    for (kk = 0; kk < N; kk = kk + B)
        for (i = 0; i < N; i = i + 1)
            for (j = jj; j < min(jj + B, N); j = j + 1) {
                r = 0;
                for (k = kk; k < min(kk + B, N); k = k + 1)
                    r = r + y[i][k] * z[k][j];
                x[i][j] = x[i][j] + r;
            };

Morton Data Layout The Morton data layout is similar in concept to the block data layout. This technique divides the original matrix into four quadrants and lays out these submatrices contiguously in memory. Each of these submatrices is further recursively divided and laid out in the same way. At the end of the recursion, the elements of a submatrix are stored contiguously. Because of the similarities with the arrangement of blocks in the block data layout, the Morton data layout can be considered a variant thereof; the only difference is in the order of the blocks. A sketch of the index computation follows.
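As a side illustration (not from the thesis), the Morton index of an element can be computed by interleaving the bits of its row and column coordinates; the C sketch below assumes coordinates of at most 16 bits.

#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2i
 * (a standard bit-interleaving step): abcd -> 0a0b0c0d. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0xFFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Morton (Z-order) index: row bits on odd positions, column bits on even. */
uint32_t morton_index(uint32_t row, uint32_t col)
{
    return (spread_bits(row) << 1) | spread_bits(col);
}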

2.3.3 Concluding Remarks

Due to the ever increasing demand for high-performance systems, techniques for speeding up the execution of applications on both conventional and embedded systems have been researched for decades. This research is still ongoing, as the complexity of applications continues to grow. Especially for data-dominated applications, like multimedia and telecommunication algorithms, finding methods to speed up execution is a central topic in the research community. The main bottleneck of these applications is the huge amount of transfers required from and to (on-chip and off-chip) memory, which results in performance degradation and excessive energy consumption.

Furthermore, processing an application on a heterogeneous architecture containing reconfigurable devices results in a different data processing flow compared to traditional architectures, as well as additional traffic routing compared to the communication traffic found in conventional architectures. Thus, the memory subsystem should preserve the inherent advantages of reconfigurable architectures, which are primarily designed for data-intensive computations. Therefore, it is important to provide the required data bandwidth to these devices, supporting parallelization and run-time adjustability of the memory accesses according to the needs of the application in question. In this scope, it is a prerequisite to optimize the memory accesses in order to provide the required data flow to the most data-hungry computational units.

2.4 Motivation for Improved Memory Analysis

The techniques discussed above, both hardware- and software-based, alleviate the communication bottleneck only to a certain degree. As the complexity of applications increases in the near future, their memory access patterns will evidently also become more difficult to predict. Hence, a more in-depth analysis of memory access patterns is required, which would help developers gain more detailed insight into how to tackle the well-known memory-wall problem. Moreover, with the advent of heterogeneous computing systems containing reconfigurable hardware, utility tools that facilitate the application development process, tuning, and optimization become of invaluable importance. Furthermore, these tools should also be able to perform a thorough analysis of the memory access behaviour of data-intensive applications, which, as already discussed, is indispensable. In addition, detailed information on the memory access behaviour inside a function can provide important information for optimization purposes at a fine granularity and for discovering possible unusual behaviour of the data objects used in such a function; performing an analysis only at function granularity would hide the internal memory access behaviour of the function.

2.5 Hardware/Software Co-design

The ongoing growth in the complexity of embedded systems, due to the increasing size of integrated circuits, increasing software complexity, and decreasing time-to-market requirements and product costs, motivates the need for a new design paradigm. This paradigm should incorporate both the hardware and software aspects of the design cycle into a high-level design methodology, with the goal of shortening the time-to-market while reducing the effort and costs of the design.

With the advent of heterogeneous reconfigurable computing, the development of an application for such a hybrid system typically starts with high-level code (like C) that specifies the application. This application is then analysed (profiled) to see which parts of the application (or parts of a task) present a bottleneck. Once the parts of the code that cause a (possible) bottleneck are discovered, these parts are implemented in hardware. Such a joint design is naturally called Hardware/Software (HW/SW) Co-design. This methodology aims at meeting system-level objectives by exploiting the synergism of hardware and software through their concurrent design[12], while considering their dependencies and overall system performance. The two primary areas on which HW/SW co-design has focused are alleviating the process of exploring the design space and partitioning an application onto a heterogeneous platform. These two topics are introduced in the next two sections.

2.5.1 Design Space Exploration

An important step in any development process is Design Space Exploration (DSE). In the context of heterogeneous reconfigurable systems design, it is essential to perform a DSE. During this phase, early decisions are made, such as how to partition an application over different computational components, like CPUs, FPGAs, ASICs, or DSPs. Exhaustive DSE is necessary in order to weigh the various design parameters, such as execution time, memory, bandwidth, etc. This process, called hardware/software partitioning, requires evaluating the cost of implementing a task on each component. As it iteratively evaluates a very large design space, fast cost evaluation is essential. DWB uses code profiling to predict hardware characteristics and identify hot-spots to speed up this process. Estimates of hardware resource consumption, for example, can be used to omit functions that are too large to fit on an FPGA, or too small to exploit any degree of parallelism, as sketched below. DSE uses prediction models for its analysis. Early and fast estimation is essential for any DSE, both for minimizing the overhead of space exploration and for the partitioning process, due to their highly iterative nature.
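As a toy illustration of such a fast estimate (the linear model and all coefficients below are hypothetical, not DWB's actual prediction models), a DSE pass might prune candidate kernels as follows:

struct kernel_stats { int adds; int muls; int mem_ops; };

/* Hypothetical linear area model; the coefficients are invented
 * purely for illustration. */
static int estimated_slices(const struct kernel_stats *k)
{
    return 8 * k->adds + 40 * k->muls + 16 * k->mem_ops;
}

/* Prune kernels that cannot fit within the available FPGA area. */
static int worth_considering(const struct kernel_stats *k, int slice_budget)
{
    return estimated_slices(k) <= slice_budget;
}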

2.5.2 Hardware/Software Partitioning and Mapping

Hardware/software partitioning is the process of dividing an application between a processor component (in software) and one or more custom co-processor components (in hardware) to achieve an implementation that best satisfies requirements of performance, size, designer effort, and other metrics. A custom co-processor is a processing circuit that is customized to execute critical application computations far faster than if those computations had been executed on a general-purpose processor. FPGA technology encourages hardware/software partitioning by simplifying the job of implementing custom co-processors, which can be done just by downloading bits onto an FPGA rather than by manufacturing a new integrated circuit or by wiring a printed-circuit board. In fact, new FPGAs even support the integration of processors within the FPGA itself, either as separate physical components alongside the FPGA fabric (hard-core processors) or as circuits mapped onto the FPGA fabric just like any other circuit (soft-core processors). High-end systems have also begun integrating microprocessors and FPGAs on boards[8], allowing application designers to make use of both resources when implementing applications.

Hardware/software partitioning is a difficult task, due to the large number of possible partitions. In its simplest form, hardware/software partitioning considers an application as comprising a set of regions and maps each region to either software or hardware such that some cost criterion (e.g., performance) is optimized while some constraints (e.g., size) are satisfied. Partitioning considers two application categories: sequential programs, where an application is a program written in a sequential programming language such as C, C++, or Java and where partitioning maps critical functions and/or loops to co-processors, and parallel programs, where an application is a set of concurrently executing tasks and where partitioning maps some of those tasks to co-processors. Doing this partitioning manually is hard work, and while this still happens widely, tools to automate the process are intensively researched and developed; a deliberately simplified sketch of the underlying optimization problem is given below. The Delft Workbench and the hArtes toolchain are (semi-)automatic tools that accomplish this, and will be described later in this chapter.
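To convey the shape of the problem only, here is a deliberately naive greedy heuristic in C: regions are moved to hardware in order of speedup gained per unit of area until an area budget is exhausted. Real partitioners (including those of DWB and hArtes) use far more sophisticated cost models and search strategies.

#include <stdlib.h>

struct region {
    double sw_time;   /* estimated execution time in software */
    double hw_time;   /* estimated execution time in hardware */
    int area;         /* estimated FPGA area cost (assumed > 0) */
    int in_hw;        /* decision: 1 if mapped to hardware */
};

/* Sort regions by descending speedup gain per unit of area. */
static int by_gain_density(const void *pa, const void *pb)
{
    const struct region *a = pa, *b = pb;
    double ga = (a->sw_time - a->hw_time) / a->area;
    double gb = (b->sw_time - b->hw_time) / b->area;
    return (gb > ga) - (gb < ga);
}

void partition_greedy(struct region *r, int n, int area_budget)
{
    int i;
    qsort(r, n, sizeof r[0], by_gain_density);
    for (i = 0; i < n; i++) {
        r[i].in_hw = 0;
        if (r[i].hw_time < r[i].sw_time && r[i].area <= area_budget) {
            r[i].in_hw = 1;
            area_budget -= r[i].area;
        }
    }
}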

2.6 Delft Workbench

The Delft Workbench[7] is a semi-automatic tool platform for integrated hardware/software co-design targeting heterogeneous computing systems containing reconfigurable components. It is based on the MOLEN Programming Paradigm[41], and it addresses the entire design cycle (a holistic process) from profiling and partitioning to synthesis and compilation of the application. This toolset is designed and developed with a view to actual hardware/software co-execution on a real hardware platform. Hence, the generated designs respect the physically available memory bandwidth and interface specification of the MOLEN polymorphic processor prototype. The design flow of the Delft Workbench is shown in Figure 2.2.

Figure 2.2: The DWB Architecture

The Delft Workbench focuses on four main steps within the entire heterogeneous system design, namely:

• Code Profiling and Cost Modeling[25].

• Graph Transformations and Optimization[31, 28, 16].

• Retargetable Compiler[32].

• VHDL Generation[39].

As a starting point, during Code Profiling and Cost Modelling in the context of DWB, an application written in a high-level language is profiled in order to identify the parts of the application that are good candidates to be mapped onto the reconfigurable hardware. This decision is based on estimating the speed-up that would be achieved if a certain part of the application were implemented in hardware, and on estimating the hardware availability itself, given the limited hardware resources. Hence, the goal is to partition the application so that some parts of the application are accelerated in hardware while the other parts remain on a conventional processor. 'Part' can refer to an entire function as well as to part of a function. Multiple parts from different functions can also be combined and implemented in hardware together.

Two different profiling options are available in DWB, viz. static and dynamic. The static part of the profiling stage in DWB is concerned with finding hardware cost and performance estimates at an early stage[25], i.e. it supports the developer in deciding whether a certain task will fit into the reconfigurable device and what speed-up can be expected. Because this profiling step is able to make estimations from high-level languages like C, it can drive early HW/SW partitioning, but it also helps designers in re-factoring code, estimating project feasibility, or driving optimization. The dynamic profiling path focuses on the run-time behaviour of applications and, therefore, requires a longer profiling time than static profiling. The dynamic profiling stage uses the gprof[17] general profiler to identify hot-spots and frequently executed functions. Section 2.8 motivates why a general profiler like gprof is not employable for discovering memory-related bottlenecks.

During Graph Transformation and Optimization, the candidate parts of the application for hardware implementation are analysed to find out whether the proposed code segments can be clustered when they share common characteristics. These clusters, if selected, will be implemented as new instructions in the reconfigurable instruction set³. This phase is the graph restructuring step. Next, an optimization step is performed to spot parallelization opportunities. Once the kernels have been clustered and turned into new instructions in the instruction set, these instructions are further investigated to spot parallelism. As loops represent an important opportunity for parallelism, loop transformation techniques are applied to loops containing the newly generated reconfigurable instructions. During this step it is investigated whether the entire loop can be implemented in hardware, according to the available hardware resources and latency, or whether the loop can only be partially implemented on the reconfigurable hardware. Particular attention must be paid when partitioning a certain application part into hardware, as this can affect the entire cycle time. Cycle time can slow down as the code segment executed on the reconfigurable device grows in size, resulting in an overall slower execution. Afterwards, the design flow forks into two parts: a retargetable compiler and VHDL generation.

After making the decision of which parts of the code to implement in hardware, this code needs to be removed from the original source code and replaced by the appropriate reconfigurable hardware call. The Retargetable Compiler then generates the new object code, which contains the calls to the reconfigurable hardware for the newly identified instructions. It is a task of the compiler to decide on the scheduling and mapping of these instructions for execution on the reconfigurable hardware. This process is denoted spatio-temporal compilation.

Where the first fork concerned the compiler scheduling of the SET and EXECUTE instructions, the second fork, i.e. the VHDL generation phase, generates hardware descriptions of the kernels. First, the hardware description logic of a kernel is searched for inside an IP-cores library and, if available, can be directly instantiated. If the IP-cores library cannot provide this implementation, there are two possibilities: automatic code generation or manual code generation. The automatic code generation is performed by the DWARV toolset[47] and is envisioned for fast prototyping and fast development. The toolset consists of two modules: the Data Flow Graph (DFG) Builder and the VHDL Generator. The input to the toolset is pragma-annotated C code, which specifies the code segments to be implemented in hardware; an illustrative sketch of such input is given at the end of this section. Figure 2.3 depicts this toolset.

³ Unlike general-purpose processors, which have a fixed instruction set, reconfigurable devices have a flexible, extendible, and application-specific instruction set, which can even change at run-time.

Figure 2.3: The DWARV Toolset

As a first step, the DFG Builder processes the input code. This module performs high-level hardware-independent optimizations on the code and transforms it into an intermediate representation (IR) suitable for hardware mapping. The IR is a hierarchical data-flow graph (HDFG), which is further processed by the VHDL Generator. This tool performs low-level hardware-dependent optimizations and generates the final VHDL code. Nevertheless, hand-crafted hardware logic is the preferred method when very high quality results and performance are required.

2.7 hArtes

The hArtes project[6] addresses research and development issues of embedded systems, aiming to lay the foundation of a new holistic approach for embedded systems design, providing a tool-chain which accepts applications coded in a multiplicity of high-level descriptions and semi-automatically produces a best fit of such an application onto a heterogeneous reconfigurable system. The hArtes project is closely related to the DWB, as it targets the same heterogeneous systems. However, it also takes into account digital signal processing hardware and provides its own heterogeneous platform. Figure 2.4 illustrates the overall hArtes toolchain flow.

Figure 2.4: The hArtes Toolchain

The hArtes tool-chain is composed of the following three toolboxes:

1. algorithm exploration and translation

2. design space exploration (DSE)

3. system synthesis (SysSyn)

The input of this tool-chain is a high-level application algorithm, described in one of several supported formats and languages, such as a graphical description, SciLab or Matlab code, or hand-crafted C. The internal representation of the application algorithms is C code, annotated with pragmas by the tools in the toolchain. The objectives of each hArtes Toolbox can be summarized as follows:

• The Algorithm exploration and translation Toolbox provides tools that assist the designers in instrumenting and possibly improving the high-level input algorithm, and in translating this algorithm into a unified internal description in C.

• The Design space exploration Toolbox provides an optimal HW/SW partitioning of the input algorithm for each reconfigurable heterogeneous system considered. The input of the DSE Toolbox is the C description of the application algorithm, anno- tated with specification directives from the algorithm exploration and translation toolbox, and models of the reconfigurable heterogeneous system platform.

• The System synthesis Toolbox processes the optimized partitioning of the appli- cation provided by the previous Toolbox and provides files required to map the application algorithm on the components of the considered reconfigurable hetero- geneous system with respect to its partitioning, i.e. program executables, configu- ration bitstreams, memory images, etc.

The profiling step inside hArtes is similar to that in the DWB, with the exception that it includes the gcov[2] tool, a code coverage tool that provides information such as how often each line of code executes. Section 2.8 will motivate why this stage in hArtes, as is the case with the DWB, needs additional tools to perform an analysis that takes into account the problem of memory-related bottlenecks. The hArtes tool-chain, like the Delft Workbench toolchain, targets the Molen machine organization and the Molen programming paradigm. In fact, the Molen co-processors are not limited to reconfigurable implementations only; they can actually be various types of augmenting hardware units. For example, in the context of hArtes, a digital signal processor (DSP) and reconfigurable hardware units are treated identically as Molen co-processors.
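To illustrate the kind of data gcov collects, a typical invocation looks as follows (a minimal sketch; app.c is a hypothetical source file):

gcc -fprofile-arcs -ftest-coverage -o app app.c
./app
gcov app.c

gcov then writes an annotated listing (app.c.gcov) in which every source line is prefixed with its execution count, which is exactly the per-line coverage information mentioned above.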

2.8 Motivation for Dynamic Memory Analysis Tools

Both the DWB and hArtes prune their design space by performing a profiling step of the application during their development process. Run-time profiling information for the DWB and hArtes is collected using general profiling tools, which analyze the application at function level and provide application statistics, like the execution time of functions, helping to identify application hot-spots. However, these general profilers provide only a cumulative execution time, without distinguishing between the time spent on computation and the memory access time. Hence, these kinds of profilers cannot be employed to discover potential memory-related bottlenecks.
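For reference, the general profiling mentioned above boils down to a workflow such as the following (commands only; app.c is a hypothetical source file):

gcc -pg -o app app.c
./app                        # writes gmon.out
gprof app gmon.out > profile.txt

The resulting flat profile lists, per function, cumulative execution times and call counts, but offers no breakdown into computation time versus memory access time.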

At Delft University of Technology, ongoing research tackles this problem by developing a set of tools that aim at providing detailed information about the memory access behaviour of applications. The Quantitative Usage Analysis of Data (QUAD)[30] tool is a sophisticated memory access pattern analyzer with the primary goal of detecting actual data communication dependencies between functions. This tool is intended to be an integral part of the profiling step in the DWB and hArtes development process, along with the mentioned gprof profiler. Besides QUAD, the tQUAD[29] tool has been developed, which aims at providing detailed timing information for a single function as well as its memory bandwidth usage during an application execution. Both tools will be described in more detail in Chapter 4. The main contribution of this thesis is the development of the xQUAD tool. This tool augments the memory analysis of the previous two tools by delivering even more detailed information. xQUAD provides information about the general memory usage of an application, both in a flat file format and with the help of a visualization feature. Furthermore, it provides detailed memory usage statistics at both function level and variable level. This tool will be described thoroughly in Chapter 5.

2.9 Conclusion

In the last decades, the rate of improvement of processor performance has greatly exceeded the rate of improvement of memory performance. This gap, which is likely to increase in the near future, is the main obstacle to improving the overall performance of applications running on both embedded and conventional systems. Conversely, applications show an increasing need for computational power. The advent of heterogeneous architectures incorporating reconfigurable devices has been demonstrated to greatly accelerate the computational parts of these applications. However, heterogeneous reconfigurable systems as such do not solve the memory-wall problem. In fact, this problem is even more evident with the introduction of these alternative architectures, as there are more computational processing units that need to communicate with the memory subsystem. If not dealt with accordingly, this communication bottleneck will degrade most, if not all, of the gained computational performance. Therefore, tools that help in the development of heterogeneous systems and, additionally, give detailed information about the memory access patterns of an application are of invaluable importance. The ongoing research at Delft University of Technology tries to tackle this problem by the development of sophisticated memory analysis tools which are able to provide detailed memory information about applications. The QUAD and tQUAD tools will be described in Chapter 4. The xQUAD tool is developed during this thesis project and therefore represents the main contribution to this thesis. Hence, it is described in greater detail in Chapter 5.

3 Analysis Techniques

As both hardware and software systems grow increasingly complex, tools that help developers analyse and improve program behaviour are invaluable. These tools implement some kind of program analysis to retrieve information about applications. For the purpose of building the memory analysis tools introduced in Chapter 2 (which will be described in Chapters 4 and 5), an analysis framework is used, which abstracts away the intricacies of the underlying hardware. Various types of analysis frameworks exist, which will be introduced in this chapter. Nevertheless, most emphasis is put on the Pin[22] Dynamic Binary Instrumentation (DBI) framework (see Section 3.3), which has been chosen for building the set of memory analysis tools.

3.1 Program Analysis

Program analyses can be categorized into two types, according to when the analysis is performed: static analysis and dynamic analysis. Static analysis techniques do not rely on the execution of the application on real hardware. They rather analyse the source code, or some form of object code, to determine the set of possible execution paths and to obtain an upper bound on the execution time in a specified hardware model. This technique is used especially when there is the need to guarantee the termination of the execution of a task, i.e. when estimating the Worst Case Execution Time (WCET) of an application[44]. Dynamic techniques, instead, analyze a program while it executes. These techniques aim at providing precise information about programs at execution time. Dynamic analysis tools instrument the program for collecting program information, i.e. they insert extra code into the program. This process can be performed at various stages: in the source code, at compile time, at post-link time, or at run-time. There are basically two instrumentation approaches, viz.

• Source instrumentation involves instrumenting the source code of programs.

• Binary instrumentation involves instrumenting executables directly.

The two instrumentation approaches can be further categorized into static and dynamic techniques. Source instrumentation is a technique that directly augments the source code of the application. The advantage of this approach is that the user can access all language-level information allowed by the programming language of the application. For instance, a developer can augment the source code to ensure that a certain range of functions outputs determined values to a report/log file. This method brings a major drawback, namely the burden of work that the programmer has to do. Furthermore, in case only the executable is available, this approach cannot be employed, as the developer has no access to the source code of the application.
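As a minimal illustration of source instrumentation (a hypothetical example, not taken from any of the tools discussed in this thesis), a developer could insert logging statements by hand around the original code:

#include <cstdio>

static FILE *log_file;                            // added for instrumentation

int compute(int x)
{
    std::fprintf(log_file, "compute(%d)\n", x);   // instrumentation
    int result = x * x + 1;                       // original code
    std::fprintf(log_file, " -> %d\n", result);   // instrumentation
    return result;
}

int main()
{
    log_file = std::fopen("trace.log", "w");
    if (!log_file) return 1;
    compute(42);
    std::fclose(log_file);
    return 0;
}

Every call to compute() is now recorded in trace.log, but only because the programmer edited the source by hand and recompiled, which illustrates both the power and the burden of this approach.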


As mentioned, binary instrumentation can be divided into static and dynamic variants. Static Binary Instrumentation (SBI) frameworks can be considered the precursors of many Dynamic Binary Instrumentation (DBI) frameworks, and many DBI frameworks have models that are similar to SBI frameworks. Analysis based on static instrumentation has many limitations compared to its dynamic variant. As SBI frameworks instrument the binary before run-time, the possibility exists that code and data of an executable get mixed, and a static tool may not be able to distinguish the two. In this thesis, emphasis is put on Dynamic Binary Instrumentation, as this is the technique employed for the memory analysis tools described in Chapters 4 and 5.

3.2 Dynamic Binary Instrumentation

Dynamic Binary Instrumentation (DBI), contrary to SBI, occurs at run-time. While the application is running, the analysis code is injected into the executable. The majority of DBI systems employ a Just-In-Time (JIT) compiler for inserting instrumentation code into the code dynamically generated from the application. DBI is advantageous as it permits, among others, discovering code at run-time, attaching to running processes, running the executable unaltered (i.e. there is no need to re-compile or re-link), and analysing all the application code. For these reasons, Dynamic Binary Instrumentation frameworks are gaining popularity as a means of constructing analysis tools (e.g. profilers), which are categorized as Dynamic Binary Analysis (DBA) tools. However, DBA tools suffer from a performance slow-down, as the cost of instrumentation is incurred at run-time. Basically, the overhead of a DBI-based tool can be divided into two parts: the overhead of the instrumentation routines and the overhead caused by the user-defined analysis routines.

Developing a DBI framework from scratch for a specific DBA tool is a cumbersome job. Therefore, over the last decade generic DBI frameworks have been developed to mitigate this problem and to speed up the process of DBA tool development. This is facilitated by the fact that instrumenting an application is the same for all DBA tools, independently of what the precise purpose of the specific DBA tool is. Furthermore, the most widely used DBI frameworks (see for instance [22, 27, 9]) have similar internal engines. Basically, a DBI framework intercepts a block of code (or code fragment, as it is sometimes referred to) and inserts instrumentation code before execution (Just-In-Time). This modified code is stored in a code cache that preserves frequently executed blocks of code for future use. When the execution of the instrumented block is finished, another block is intercepted and instrumented. Instrumentation can occur at different levels of granularity (instruction, basic block, routine, etc.), providing the developer with a broad choice for developing DBA tools according to their needs. Instrumenting at a certain level of granularity generally implies monitoring everything at that level of granularity, which explains the high slow-down penalty that DBA tools have to pay.

There are two approaches to dynamic instrumentation: probe-based and JIT-based. Probe-based instrumentation works by dynamically replacing instructions in the original program with branches (probes) that point to the instrumentation code. This method presents several drawbacks compared to JIT-based instrumentation. First, instrumenting via a probe-based method is not transparent, because the application's instructions are modified by the branches pointing to the instrumentation code. Further, on architectures with variable instruction sizes, replacing an instruction with a longer branch instruction may overwrite subsequent instructions. Finally, branching to instrumentation code may be implemented in multiple levels. This means that a branch may in turn branch to another instruction, etc., resulting in a significant performance overhead. These drawbacks make such systems unsuitable for pervasive fine-grained instrumentation (where every executed instruction is instrumented).
In contrast, JIT-based instrumentation is more suitable for fine-grained instrumentation, as it works by dynamically compiling the binary and can insert instrumentation code anywhere in the binary.

DBI frameworks basically use two fundamental ways of representing and instrumenting code. One method is disassemble-and-resynthesize, which converts machine code to an intermediate representation (IR), adds instrumentation to this IR, and then converts the IR back into machine code. The original instructions from the client application are discarded and the final code is generated exclusively from the IR. Valgrind[27] uses this method. Another method is copy-and-annotate (used by, e.g., Pin[22] and DynamoRIO[9]), where each instruction is copied as it is and annotated with a description of its effects. Exceptions are made for control-flow instructions that need to be changed. Tools use the annotations to guide their instrumentation. In the following, a brief description of some widely used and maintained DBI frameworks is given, according to recent literature.

3.2.1 DynamoRIO

DynamoRIO [9] is a Dynamic Binary Optimization and Instrumentation framework operating on unmodified native binaries, implemented for IA-32 on both Windows and Linux. DynamoRIO is capable of running large desktop applications. The goal of DynamoRIO is to observe and potentially manipulate every single application instruction prior to its execution. To achieve this, DynamoRIO caches translations of frequently executed basic blocks so they can be directly executed in the future. It works on almost unmodified x86 code: only control-flow instructions are modified.

3.2.2 Valgrind

Valgrind [27] is a DBI framework for building heavyweight DBA tools. In fact, Valgrind is designed with particular attention to heavyweight tools, with unique support for shadow-value tools. These kinds of tools involve large amounts of analysis data that are accessed and updated in irregular patterns; an example is the shadowing of every register and memory value with a meta-value. As a consequence, lightweight tools built with Valgrind run comparatively slower than the same tools built with other DBI frameworks. However, Valgrind's developers point out that Valgrind can be used for more interesting, robust, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO [27]. Valgrind is available under the GNU General Public License (GPL), and runs on x86 and AMD64 under Linux, and on PPC32/64 under Linux and AIX.

DBA tools are constructed as Valgrind plug-ins, forming a basic view of

Valgrind core + tool plug-in = Valgrind tool

The execution of a Valgrind DBA tool proceeds as follows: at start-up, the Valgrind core, the tool plug-in and the client program are loaded into the same address space; the tool then recompiles the machine code of the client one basic block at a time, in a just-in-time, execution-driven fashion; the Valgrind framework disassembles the basic block code into an intermediate representation (IR), which is instrumented by the tool plug-in and then converted back into machine code. The resulting translated code is stored in a code cache to be rerun as necessary. An important thing to know about Valgrind is that a tool cannot use anything from the C library; instead, Valgrind provides an implementation of a reasonable subset of the C library.

3.2.3 Pin

Pin[22] is a DBI framework that emphasizes ease-of-use, portability, transparency, efficiency, and robustness. Pin uses dynamic compilation for binary instrumentation and code caching for code reuse. Pin is the DBI framework used for the development of the QUAD toolset (see Chapters 4 and 5). Hence, it is explained in more detail in the next section.

3.3 Pin

Pin provides a portable and efficient DBI platform for building a variety of DBA tools (also called Pintools in Pin's jargon) that work with unmodified Linux, Windows and MacOS binaries on multiple architectures, namely IA-32, 64-bit x86, Itanium, and ARM1. Furthermore, it can instrument multi-threaded applications. This framework abstracts away the details of the target architecture (whenever possible), allowing the developer to concentrate on the development of the tools without having to be aware of the intricacies of the underlying system. Pin preserves the original application behaviour by providing instrumentation transparency. The application uses the same addresses (both instruction and data) and the same values (both register and memory) as it would in an un-instrumented execution. Furthermore, Pin does not modify the application stack, as some applications may intentionally reference memory addresses beyond the top of the stack.

Pin is designed to be easy to use. The user writes instrumentation and analysis routines. Instrumentation routines determine where to place calls to analysis routines, e.g. before an instruction; analysis routines define what to do when the instrumentation is activated. Analysis routines are customizable by the user and are called while the program executes. The arguments to analysis routines can be, among others, the instruction pointer, the effective memory address of the instruction, a memory or stack value, the address of a branch instruction, or system call values.

1The development for ARM stopped years ago, which made the instrumentation framework incompatible with the latest versions of GCC.

Figure 3.1: Pin's Architecture

Instrumentation is performed by a just-in-time (JIT) compiler. Instrumentation with Pin can be done at different levels of granularity. The finest level of granularity is instrumentation at instruction level, i.e. instrumenting the application one instruction at a time. Further, it is possible to instrument code at trace level2, at procedure level, and at entire image level.

Figure 3.1 illustrates Pin's software architecture. At the highest level, Pin consists of a virtual machine (VM), a code cache, and an instrumentation API invoked by Pintools. The VM consists of a JIT compiler, an emulator, and a dispatcher. The input to the JIT compiler is not bytecode, however, but a native executable. Furthermore, Figure 3.1 shows that Pin, the Pintool and the application are all present in the same address space. While they share the same address space, they do not share any libraries. Making these libraries private avoids undesired interaction between these three components.

Pin intercepts the execution of the first instruction of the application and re-compiles the executable, generating basic block code starting at this instruction and instrumenting the code according to the specified instrumentation type. This trace is almost identical to the original one, except that it runs under the control of Pin, which ensures that Pin regains control when a branch exits the sequence. After Pin gains control of the application, the virtual machine coordinates its components to execute the application. The JIT is responsible for compiling and instrumenting the application, while the dispatcher launches the application code modified by the JIT compiler. In case a branch exits an earlier generated basic block, the JIT fetches a new code block, which is instrumented by the Pintool (if required), and the cycle is repeated. To improve performance, the generated code and its instrumentation are saved in a code cache for future executions of the same sequence of instructions.

2A trace is defined as a straight-line sequence of instructions executed sequentially. Pin guarantees that traces only enter at the top, but may have multiple exits.

3.3.1 Evaluation of Pin

Running an instrumented application usually shows a considerable slowdown. This depends on the nature of the instrumented application, as well as on the overhead caused by the analysis routines in the Pintool. It appears that most of the slowdown is caused by the execution of the code, rather than by code compilation (which includes inserting the instrumentation code). Hence, some performance improvements are made during the compilation phase of the application. The optimization techniques used are register re-allocation, inlining, liveness analysis, and instruction scheduling. This results in instrumented code running very fast compared to other DBI frameworks.

In [22] an evaluation of Pin's performance is presented. Two cases are taken into consideration: Pin's performance with and without instrumentation code. Without instrumentation, the overhead of Pin on some typical benchmarks depends on the amount of code reuse. It is shown that for applications that have a relatively short execution time, and thus insufficient opportunity for code reuse, Pin pays a high cost for re-allocating registers compared to other tools that do not re-allocate registers. Nevertheless, register re-allocation is important, as it provides Pin and Pintools with more virtual registers than the number of physical registers. In case of enough code reuse, it can be seen that most of the time of Pin's execution is spent in the code cache. The effects of register re-allocation and indirect linking become apparent when evaluating the performance on the ARM architecture, where these two optimizations are not yet implemented. This causes the VM to resolve all indirect control transfers, making the execution time spent on register re-allocation dominant on ARM. The evaluation of Pin with instrumentation is done using a standard basic-block counting tool, which outputs the execution count of every basic block in the application. The evaluation of this tool without any optimization and with optimization techniques like inlining, liveness analysis and scheduling shows an average slowdown improvement from 10.4x to 2.5x for integer and from 3.9x to 1.4x for floating-point benchmarks3.

3.3.2 Comparison of Pin against Other DBI Frameworks

Furthermore, Pin's performance is compared against DynamoRIO and Valgrind using the standard basic-block counting tool. Without instrumentation, the performed analysis shows that both Pin and DynamoRIO outperform Valgrind significantly. Overall, DynamoRIO is faster than Pin, as this framework was primarily designed for optimization (furthermore, DynamoRIO does not perform register re-allocation). When considering the performance slowdown with instrumentation, it becomes apparent that Pin significantly outperforms both DynamoRIO and Valgrind: on average, Valgrind slows the application down by 8.3x, DynamoRIO by 5.1x, and Pin by 2.5x. In contrast to the other tools, Pin automatically inlines the calls to the block entries and performs liveness analysis, demonstrating a main advantage of Pin: it provides efficient instrumentation without shifting the burden to the Pintool writer. Regarding Valgrind, [27] confirms the above-mentioned comparisons. However, Valgrind's developers specify that the basic-block counting tool, a so-called lightweight tool, is exactly the kind of tool

3The higher overhead for the integer benchmarks is due to the many more indirect branches and returns.

Valgrind is not targeted at, and therefore it will never be as fast as Pin or DynamoRIO [27]. Furthermore, it is stressed that Valgrind targets heavyweight tools, which should take advantage of the characteristics of Valgrind for building robust tools relatively easily, providing powerful instrumentation capabilities and allowing reasonable performance.

3.3.3 Pro et Contra of Pin

Pin’s call-based model of instrumentation is simpler than other tools where the user inserts instrumentation by adding and deleting statements in an intermediate language. However, it is equally powerful in its ability to observe architectural state and it free the user from the need to understand the idiosyncrasies of an instruction set or learn an intermediate language. The inserted code may overwrite scratch registers or condition codes; Pin automatically saves and restores states around calls so these side effects do not alter the original application behaviour. The Pin model makes it possible to write efficient and architecture-independent instrumentation tools, regardless of whether the instruction set is RISC, CISC, or VLIW. As mentioned before, Pin implements register allocation, inlining, liveness analysis, and instruction scheduling to optimize jitted code. This fully automated approach distin- guishes Pin from other DBIs which requires the user’s assistance to boost performance. For example, Valgrind [27] need user intervention in order to perform inlining; similarly, DynamiRIO [9] requires the tool developer to manually inline and save/restore applica- tion’s registers. Another feature that distinguishes Pin is the ability to attach and detach to a process. Hence it is possible to attach Pin to a process any time, perform the desired analysis and, eventually, detach. The application incurs instrumentation overhead only during the period that Pin is attached. This is especially useful when instrumenting large, long-running applications. Pin is further an easy-to-use framework, which permits users to write tools in a more compact way when compared to other tools. For example, a simple tool in Pin for memory tracing requires 30 lines of code, while in Valgrind it requires 100 lines of code [27]. Pin does only provide limited access to symbol and debug information, which means that no function’s API for retrieving variable names is available. This requires an alter- native choice. For this purpose, DWARF is chosen, and will be explained in Chapter 5.2.2.

3.4 Pin Instrumentation API

As already mentioned, one of the advantages of Pin is its rich API. This API can be divided into a generic API and architecture-specific APIs. The generic API is architecture independent; it abstracts away the underlying instruction set idiosyncrasies and allows context information, such as register contents, to be passed to the injected code as parameters. Determining memory accesses (e.g. reads/writes) or control-flow changes (e.g. branch or call instructions), for example, can be achieved with this basic API.

// This function is called when the application exits
VOID Fini(INT32 code, VOID *v)
{
    cerr << "End of instrumentation" << endl;
}

// Analysis routine
static VOID RecordTrace(VOID *ip, VOID *addr)
{
    cerr << "IP and Effective Address of Memory Instruction are" << endl;
    cerr << "IP: " << ip << endl;
    cerr << "EA: " << addr << endl;
}

// Instrumentation routine
VOID Instruction(INS ins, VOID *v)
{
    if (INS_IsMemoryRead(ins))
    {
        INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordTrace,
            IARG_INST_PTR,       // IP passed to the analysis routine
            IARG_MEMORYREAD_EA,  // memory address passed to the analysis routine
            IARG_END);
    }
}

int main(int argc, char *argv[])
{
    PIN_InitSymbols();
    if (PIN_Init(argc, argv)) return Usage();

    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    // Start the program. Never returns
    PIN_StartProgram();
    return 0;
}

Figure 3.2: Pin example of instrumentation and analysis routines

The architecture-specific APIs come in two kinds: an API for the IA-32 and Intel 644 ISAs and an API for the IA-64 ISA (Itanium Processor Family). Pin's APIs are call-based, i.e. tools are built using instrumentation routines and analysis routines5. Instrumentation routines define where instrumentation is inserted, e.g. before memory instructions. Analysis routines specify what to do when the instrumentation is activated, e.g. increase a counter each time a specific memory address is encountered. Figure 3.2 demonstrates a minimal implementation using the Pin API, showing an example of instrumentation and analysis routines. Pin provides limited access to an application's symbol table. The only symbol information supported by Pin is information about function symbols. Other types of symbols (e.g. data symbols) must be retrieved using an external method other than Pin.

4Intel began using this name for 64-bit processors in late 2006. Before, Intel referred to 64-bit processors as EM64T (Extended Memory 64 Technology).
5This notion is borrowed from ATOM.

Nevertheless, to access function names through Pin, PIN_InitSymbols must be called. To initialize the Pin framework, PIN_Init is called. Pin permits instrumenting an application at different granularities, i.e. at image level, trace level, routine level, and instruction level.

Image instrumentation permits instrumenting an entire application image. The instrumentation call is registered with IMG_AddInstrumentFunction. A Pintool can analyse the sections inside an image, which in turn contain routines that the tool can inspect. Routines, in turn, consist of instructions, which are also made available to the tool. The example below shows this nested inspection.

IMG_Name(SEC_Img(RTN_Sec(rtn)))

Image instrumentation requires PIN_InitSymbols to be called before PIN_Init.

Trace instrumentation permits instrumenting an application one trace at a time. Traces are sequences of instructions, beginning at the target of a taken branch and ending with an unconditional control-flow changing instruction, including calls and returns. Pin makes sure that a trace is only entered at the top, but it may contain multiple exits. In case a branch occurs in the middle of a trace, Pin constructs a new trace that begins at the branch target and ends, again, with an unconditional branch. Pin breaks the trace into basic blocks, where a basic block is a sequence of code with a single entry and a single exit, terminating at a (conditional or unconditional) control-flow changing instruction. Instrumenting at trace level is achieved with a call to the TRACE_AddInstrumentFunction API call.

Making an instrumentation call to the RTN_AddInstrumentFunction API permits the user to analyse an application based on its functions (or routines); a sketch is given below. Pin finds routines by using the symbol table information; PIN_InitSymbols() must be called beforehand. RTN objects can be used at instrumentation time and at analysis time. Symbol objects provide information about function symbols in the application. Hence, if information about other symbols is needed (e.g. data symbols), it has to be retrieved without the help of Pin.

Finally, Pin permits analysis at instruction granularity, achieved with the INS_AddInstrumentFunction API. This lets the tool inspect one instruction at a time. It can be accessed only at instrumentation time.

Besides analyzing a program's behaviour, Pin also allows making changes to it. This may consist of adding or deleting instructions, changing register or memory values, and changing the control flow.
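To make the routine-level granularity concrete, the following sketch of a Pintool counts how often each routine is executed (a minimal, hypothetical example; error handling and report formatting are omitted):

#include "pin.H"
#include <iostream>
#include <map>
#include <string>

static std::map<std::string, UINT64> callCounts;

// Analysis routine: executed on every entry of the routine
VOID CountCall(UINT64 *counter)
{
    (*counter)++;
}

// Instrumentation routine: called once per routine
VOID Routine(RTN rtn, VOID *v)
{
    UINT64 *counter = &callCounts[RTN_Name(rtn)];   // map storage is stable
    RTN_Open(rtn);
    // Insert a call to CountCall at the entry point of the routine
    RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)CountCall,
                   IARG_PTR, counter, IARG_END);
    RTN_Close(rtn);
}

VOID Fini(INT32 code, VOID *v)
{
    for (std::map<std::string, UINT64>::iterator it = callCounts.begin();
         it != callCounts.end(); ++it)
        std::cerr << it->first << ": " << it->second << std::endl;
}

int main(int argc, char *argv[])
{
    PIN_InitSymbols();             // required to resolve routine names
    if (PIN_Init(argc, argv)) return 1;
    RTN_AddInstrumentFunction(Routine, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();            // never returns
    return 0;
}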

3.5 Calling Pin

Running a program under the supervision of Pin is done via a few (terminal-based) commands, as the following example demonstrates.

pin -t pintool -- application

The command above invokes the pin instrumentation engine with the -t parameter, which specifies the pintool, i.e. the instrumentation tool, to use. After the -- separator, the application to be analysed is specified.
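For instance, using the instruction-counting example tool that ships with the Pin kit (the obj-intel64 path is an assumption for a 64-bit build; paths differ per installation):

pin -t source/tools/ManualExamples/obj-intel64/inscount0.so -- /bin/ls

This runs /bin/ls under the control of Pin and, on exit, the tool writes the total number of executed instructions to a small report file.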

3.6 Conclusion

As the complexity of applications grows, analyzing these applications becomes proportionally complex. To cope with this complexity, developers have started building analysis frameworks, which abstract away the hardware complexities, speeding up the development of new and reliable analysis tools. The work done in this project revolves around Dynamic Binary Analysis, which is accomplished by using a Dynamic Binary Instrumentation framework. Among the available frameworks, the Pin[22] framework is used: a portable, efficient, easy-to-use, and transparent DBI framework. In the next two chapters, this framework is used as the underlying platform for building three sophisticated memory access analysis tools. These Dynamic Binary Analysis tools provide a detailed and comprehensive analysis of applications' memory access behaviour.

4 QUAD: Sophisticated Memory Patterns Analysis Tools

The main focus of the work proposed in this thesis is on the profiling process inside the Delft Workbench and hArtes toolchains. Profiling information about the memory usage of applications can deliver valuable insights when performing hardware/software partitioning of these applications. In this context, different tools have been developed that provide the developer with a detailed analysis of the memory access patterns of applications. The first described tool, QUAD, is a memory access tracing tool that provides a comprehensive quantitative analysis of the memory access patterns of an application, with the primary goal of detecting pure data dependencies at function level. The tQUAD tool is designed as a complementary profiler in the Delft Workbench dynamic profiling framework along with QUAD, to deliver detailed temporal memory bandwidth usage information for each kernel in an application. Finally, during this thesis research, the xQUAD tool was developed, which augments the analysis generated by the previous two tools by providing both a high-level memory map representation of an application and detailed memory usage information inside a certain kernel, i.e. at variable level. As this tool is an integral part of this thesis research, it is briefly introduced in this chapter, together with the other tools, for consistency reasons, and it is described in more detail in Chapter 5.

These tools are built upon the Pin[22] Dynamic Binary Instrumentation (DBI) framework and can be classified as Dynamic Binary Analysis (DBA) tools, which analyse an application at the machine code level. More information about these techniques was provided in Chapter 3. Furthermore, the tools described in this chapter are intended to be an integral part of the profiling stage of the Delft Workbench and hArtes toolchains (Sections 2.6 and 2.7). In the following, the QUAD[30] and tQUAD[29] tools are presented, describing their implementation details as well as their usage. Also, the xQUAD tool is introduced, which will be described in more detail in Chapter 5.

4.1 QUAD - Quantitative Usage Analysis of Data

The QUAD [30] (Quantitative Usage Analysis of Data) tool is a sophisticated memory access tracing analyser that provides a comprehensive quantitative analysis of memory access patterns. The primary goal of this tool is to detect actual data communication dependencies between kernels. This section describes QUAD's objectives, implementation details and usage.


4.1.1 QUAD Objective

In the previous chapters of this thesis, the need for tools that are able to provide detailed memory behaviour information of applications became clear. Especially with the introduction of heterogeneous architectures incorporating one or more reconfigurable devices, the need for these kinds of tools is even more evident. Traditionally, a general profiler like gprof [17] is employed for the detection of application hot-spots. However, these tools are not able to distinguish between computation time and memory access time, and cannot be used to discover potential memory-related bottlenecks. Furthermore, most existing memory access analysis tools focus only on detecting memory bottlenecks, memory faults, bugs and leaks, without providing detailed information regarding the data dependencies in an application's memory usage [43, 10]. One of the early simple tools developed for understanding the memory access patterns of Fortran programs is presented in [13]. Another tool similar to QUAD is Embla [15], which allows the user to discover data dependencies in sequential programs, thereby exposing opportunities for parallelization. Embla performs the analysis dynamically and records dependencies at run-time as they arise. However, QUAD aims at discovering the actual data dependency1, which is different from the conventional data dependency referred to in Embla and other similar tools. Data dependency is estimated in the sense of producer/consumer binding.

Even though QUAD can be employed to spot coarse-grained parallelism opportunities in an application, it practically provides a more general-purpose framework that can be utilized for optimizations of various reconfigurable systems, by estimating effective memory-access-related parameters, e.g. the amount of unique memory addresses used in data communication between two cooperating functions. QUAD can also be used to estimate how many memory references are executed locally compared to the amount of references that have to go to the main memory. The work conducted with QUAD revolves mainly around (dynamic) profiling. Therefore, as QUAD is intended to be part of the DWB and hArtes toolchain analysis flow, the tool is integrated accordingly, as depicted in Figure 4.1. In Sections 2.6 and 2.7 more information about this can be found. In the following, the design and implementation of QUAD are discussed.

4.1.2 QUAD Design and Implementation

QUAD is built as a Dynamic Binary Analysis (DBA) tool using the Pin Dynamic Binary Instrumentation (DBI) framework, described in Chapter 3. QUAD aims at discovering actual data dependence, which is estimated in the sense of producer/consumer binding. More precisely, QUAD reports which function is consuming the data produced by another function. The exact amount of data transferred and the number of Unique Memory Addresses (UMA) used in the transfer process are calculated. Based on the efficient Memory Access Tracing (MAT) module implemented in QUAD, which tracks every single access (read/write) to a memory location, a variety of statistics related to the memory access behavior of an application can be measured, e.g. the ratio of local to global memory accesses in a particular function call.

1By definition, a data dependency is the situation in which a program segment (instruction, block, function, etc.) refers to data produced by a preceding segment. An actual data dependency arises when a function consumes data that was produced by another function earlier.

Figure 4.1: Profiling Framework of QUAD within the DWB

Figure 4.2 illustrates the architectural overview of QUAD along with the components in Pin. At the highest level, there is a Virtual Machine (VM), a code cache, and an instrumentation API. Compare this with Figure 3.1 to see how QUAD fits into the Pin framework inside the same address space. The main component inside QUAD is the MAT module, which is responsible for building and maintaining dynamic trie data structures to provide the relevant memory access information as fast as possible. The trie data structure acts as a shadow memory for each byte accessed within the address space of an application.
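To give an impression of how such a trie-based shadow memory can work, consider the following simplified sketch (not QUAD's actual implementation; the two-level layout over 32-bit addresses and all names are assumptions made for illustration):

#include <cstdint>
#include <cstring>

// Shadow record kept for every accessed byte
struct ShadowByte {
    const char *lastWriter;   // function that last wrote this byte (or NULL)
};

// Two-level trie: the upper 16 address bits select a lazily allocated
// second-level table; the lower 16 bits select the byte's shadow record.
struct ShadowMemory {
    ShadowByte *pages[1 << 16];

    ShadowMemory() { std::memset(pages, 0, sizeof(pages)); }

    ShadowByte *lookup(uint32_t addr) {
        uint32_t hi = addr >> 16, lo = addr & 0xFFFF;
        if (!pages[hi])
            pages[hi] = new ShadowByte[1 << 16]();   // allocate on first touch
        return &pages[hi][lo];
    }
};

The lazy allocation keeps the memory overhead proportional to the address ranges the application actually touches, while a lookup costs only two array indexings.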

Figure 4.2: Architectural overview of QUAD

QUAD contains instrumentation and analysis routines implemented using the Pin API. These API calls allow, as explained in Section 3.3, an analysis based on different levels of granularity. QUAD uses two API calls, INS_AddInstrumentFunction() for instruction-level instrumentation and RTN_AddInstrumentFunction() for routine-level instrumentation, which set up calls to the instrumentation routines Instruction() and UpdateCurrentFunctionName(), respectively. In turn, these two instrumentation routines

call two main analysis routines, namely RecordMemRef() and EnterFunc(), which are responsible for updating the tracing information of memory references and for maintaining an internal call graph, respectively. Figure 4.3 gives an overview of the implementation of QUAD.

Figure 4.3: Implementation overview of QUAD

As shown in Figure 4.3, the main module consists of the initialization of the Pin DBI framework, command line parsing, call graph initialization, registration of the instrumentation functions and starting the application. Registration of the instrumentation functions is done via the Pin API interface, as shown in Figure 4.2. QUAD registers for two types of instrumentation, viz. instruction and routine instrumentation, by calling the INS_AddInstrumentFunction() and RTN_AddInstrumentFunction() API functions, respectively. In turn, these two APIs set up calls to the Instruction() and UpdateCurrentFunctionName() instrumentation routines. First, the UpdateCurrentFunctionName() routine is called, which implements an analysis routine responsible for maintaining an internal call stack representation. Subsequently, the Instruction() routine is called, which is responsible for inspecting the application at instruction granularity for memory reference instructions. Every time a memory instruction is detected, the RecordMem() analysis routine is called. This analysis routine is responsible for identifying the function that is involved with the currently detected memory instruction, and for passing the required information to the Memory Access Tracing (MAT) module.

The MAT module is involved in detecting and extracting memory reference information during the execution of an application. This module utilizes a trie data structure for fast storage and retrieval of memory accesses[30].
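Building on the shadow-memory sketch above, the producer/consumer bookkeeping enabled by MAT can be approximated as follows (again a hypothetical simplification; QUAD's real data structures are more elaborate):

#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Per (producer, consumer) pair: bytes transferred and the set of unique
// memory addresses involved (the UMA count is the size of this set).
struct Binding {
    unsigned long long bytes = 0;
    std::set<uint32_t> uniqueAddresses;
};

static std::map<std::pair<std::string, std::string>, Binding> bindings;
static std::set<std::string> internedNames;   // stable storage for names

// Called for every memory access observed by the analysis routine.
void RecordAccess(ShadowMemory &shadow, const std::string &currentFunc,
                  uint32_t addr, bool isWrite)
{
    ShadowByte *sb = shadow.lookup(addr);
    if (isWrite) {
        // currentFunc becomes the producer of this address
        sb->lastWriter = internedNames.insert(currentFunc).first->c_str();
    } else if (sb->lastWriter && currentFunc != sb->lastWriter) {
        // currentFunc consumes data produced earlier by another function
        Binding &b = bindings[std::make_pair(std::string(sb->lastWriter),
                                             currentFunc)];
        b.bytes++;
        b.uniqueAddresses.insert(addr);
    }
}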

4.1.3 How QUAD Works

To start QUAD, a similar command is used as described in Section 3.5, namely

pin -t [QUAD-path] QUAD.so [QUAD-options] -- [application-name] [application-options]

on a Linux system and

pin -t [QUAD-path] QUAD.dll [QUAD-options] -- [application-name] [application-options]

on a Windows system2. The command-line options for the QUAD tool are:

• -filter_uncommon_functions - This option filters out uncommon function names which are unlikely to be defined by the user, e.g. function names beginning with underscore(s), a question mark, etc. This is useful if the user wants to restrict the analysis to functions that are actually part of the application being analysed, excluding library functions. The default value for this option is set to true. Hence, to see all the functions in the main image file, the '-filter_uncommon_functions 0' option will allow this.

• -include_external_images - This option enables tracing of functions that are contained in external image file(s). By default, only the functions in the main image file are traced and reported. This option, together with '-filter_uncommon_functions', provides more flexibility to include/exclude required/unwanted functions in the report files. This option also has a considerable impact on the reported quantitative bindings data and the corresponding producers/consumers.

• -ignore_stack_access - This option makes it possible to exclude accesses to the local stack memory, providing a view of the data transferred via the non-stack region.

• -use_monitor_list [file name] - This option allows users to make a report file based on some predefined functions, excluding all other (uninteresting) functions. The function names to monitor are specified in a text file, with each function declared on a single line.

• -xmlfile - This option permits specifying the name of the XML output file. By default this output file is named 'dek_arch.xml'.
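As a concrete (hypothetical) example, assuming QUAD.so has been built in the current directory and that the option names use underscores as listed above, analysing an application app while ignoring stack accesses could look like:

pin -t ./QUAD.so -ignore_stack_access -- ./app input.dat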

4.1.4 QUAD’s Example Outputs The memory reference information gathered by QUAD during the execution of an ap- plication are reported in two separate formats. All the producer/consumer bindings information is stored in an XML file. This makes it easy for third-party application to import this data for further interpretation and processing. Figure 4.4 gives an example of a XML file generated with QUAD. In the above figure, a BINDING is established when a function writes to a memory location which is later read by another function. The writing function is detoted with a PRODUCER tag, while the CONSUMER tag denotes the function reading the data. DATA TRANSFER is the total amount of data being read by the CONSUMER, while the UMA tag shows the number of Unique Memory Addresses in the transferred data. Furthermore, the binding information explained above will also be stored in .dot format. This allow developers to visualize the XML file, showing the actual bindings between function in a graphical way. 2To access symbolic information on Windows, Pin uses ’dbghelp.dll’. This DLL is not distributed with the kit, and must be get separately. Using ’dbghelp.dll’ in an instrumented process is not safe and can cause deadlocks in some cases. 38 CHAPTER 4. QUAD: SOPHISTICATED MEMORY PATTERNS ANALYSIS TOOLS

<BINDING>
  <PRODUCER>x264_validate_parameters</PRODUCER>
  <CONSUMER>x264_ratecontrol_new</CONSUMER>
  <DATA_TRANSFER>72</DATA_TRANSFER>
  <UMA>72</UMA>
</BINDING>
<BINDING>
  <PRODUCER>x264_validate_parameters</PRODUCER>
  <CONSUMER>x264_macroblock_cache_load</CONSUMER>
  <DATA_TRANSFER>28800</DATA_TRANSFER>
  <UMA>464</UMA>
</BINDING>

Figure 4.4: XML format of producer/consumer binding

4.2 tQUAD

The tQUAD tool provides detailed timing information of a single kernel execution as well as its memory bandwidth usage during the execution of an application.

4.2.1 tQUAD Objectives

Similar to the reasons explained earlier for QUAD, tQUAD [29] is a tool that aims at facilitating the development of heterogeneous reconfigurable computing systems. In fact, by providing timing information and the memory bandwidth usage of a kernel during its execution, tQUAD can address the issue of efficient scheduling and mapping of tasks onto heterogeneous reconfigurable architectures. tQUAD is, like QUAD, developed in the context of the DWB and hArtes toolchains, and is intended to be part of the dynamic profiling phases of these toolchains. Figure 4.5 positions tQUAD inside the DWB process. More information about the DWB and hArtes is provided in Sections 2.6 and 2.7. As can be seen in Figure 4.5, tQUAD is designed as a complementary profiler in the DWB dynamic profiling framework along with QUAD. While the aim of QUAD is revealing the quantitative information about data communication between kernels, the purpose of tQUAD is to extract the timing information of a single kernel execution as well as its memory bandwidth usage during an application execution. This information can lead to the discovery of long-executing kernels and their data transfers. Subsequently, this information is of invaluable importance in design space exploration decisions in the task scheduling and mapping process for heterogeneous reconfigurable systems. Furthermore, this information can be used by the application developer for optimizing the application code.

Figure 4.5: Profiling Framework of tQUAD within DWB

4.2.2 tQUAD Design and Implementation

The architectural overview of tQUAD is similar to that of QUAD, as can be seen in Figure 4.6. Like QUAD, tQUAD implements calls to the instrumentation APIs that permit the tool to hook itself at run-time into the Pin framework.

Figure 4.6: Architectural overview of tQUAD

Besides the standard Pin system initialization, i.e. PIN_Init() and PIN_InitSymbols() (see Section 3.3), tQUAD implements two API calls: INS_AddInstrumentFunction() for instruction-level instrumentation and RTN_AddInstrumentFunction() for routine-level instrumentation. These two API calls are used to set up calls to the instrumentation routines Instruction() and UpdateCallStack(), respectively. Figure 4.7 shows the body of the Instruction() instrumentation routine. The Instruction() instrumentation routine sets up a call to the analysis routine IncreaseRead() every time an instruction that performs a memory read is executed. There is a similar process in the case of a memory write reference. Instruction() also monitors instructions for the return from a function, to maintain the integrity of the internal call stack. When Pin starts the execution of an application, the JIT calls Instruction() to insert new

instructions into the code cache. If an instruction references memory or signals the return from a function, tQUAD inserts a call to the corresponding analysis routine before the instruction, passing the required arguments, which can be the Instruction Pointer (IP), the number of bytes read or written, and a flag showing whether or not the instruction is a prefetch. The corresponding analysis routines return immediately upon detection of a prefetch state for an instruction. INS_InsertPredicatedCall() injects the analysis routine and ensures that the analysis routine is invoked only if the instruction is predicated true. When local stack area memory accesses have to be excluded, the Stack Pointer (REG_STACK_PTR) is also passed as an extra argument to the analysis routine for subsequent processing. Furthermore, Instruction() is responsible for initiating the time simulation and the memory bandwidth snapshot management.

VOID Instruction(INS ins, VOID *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IncTotalInstCount, IARG_END);

    if (INS_IsRet(ins))  // return from routines is monitored
        INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)Return,
            IARG_INST_PTR, IARG_END);

    if (!No_Stack_Flag)  // stack accesses ok
    {
        if (INS_IsMemoryRead(ins) || INS_IsStackRead(ins))
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)IncreaseRead,
                IARG_MEMORYREAD_SIZE, IARG_UINT32, INS_IsPrefetch(ins), IARG_END);
        if (INS_HasMemoryRead2(ins))
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)IncreaseRead,
                IARG_MEMORYREAD_SIZE, IARG_UINT32, INS_IsPrefetch(ins), IARG_END);
        if (INS_IsMemoryWrite(ins) || INS_IsStackWrite(ins))
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)IncreaseWrite,
                IARG_MEMORYWRITE_SIZE, IARG_UINT32, INS_IsPrefetch(ins), IARG_END);
    }  // end of stack is ok!
    else  // ignore stack accesses
    {
        if (INS_IsMemoryRead(ins))
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)IncreaseReadSP,
                IARG_REG_VALUE, REG_STACK_PTR, IARG_MEMORYREAD_EA,
                IARG_MEMORYREAD_SIZE, IARG_UINT32, INS_IsPrefetch(ins), IARG_END);
        // similar calls for INS_HasMemoryRead2 & INS_IsMemoryWrite(ins)
    }  // end of ignore stack

    // check for the snapshot point
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Slice_checkpoint, IARG_END);
}

Figure 4.7: tQUAD instruction instrumentation pseudocode

The code of the UpdateCallStack() instrumentation routine is presented in Figure 4.8. The UpdateCallStack() instrumentation routine sets up a call to the analysis routine EnterFC() every time a function is called during program execution. This is necessary to update the internal call stack. Since tQUAD ignores the functions which are not in the main image file of the program, a flag is used as a signal to indicate the location of the newly-called function. The name of the function, as reported by Pin, is also passed for the internal call stack update process.

VOID UpdateCallStack(RTN rtn, VOID *v)
{
    bool flag;
    char *rNtemp;
    string rName;

    flag = (!((IMG_Name(SEC_Img(RTN_Sec(rtn))).find(mainImg)) == string::npos));
    rName = RTN_Name(rtn);
    rNtemp = new char[strlen(rName.c_str()) + 1];
    strcpy(rNtemp, rName.c_str());

    RTN_Open(rtn);
    // Insert a call at the entry point of a routine to update the call stack
    RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)EnterFC,
        IARG_PTR, rNtemp, IARG_BOOL, flag, IARG_END);
    RTN_Close(rtn);
}

Figure 4.8: tQUAD routine instrumentation

4.2.3 How tQUAD Works

To start tQUAD, the same method as described in Section 3.5 and as used for the QUAD tool is employed. It is repeated here for the sake of consistency.

pin -t [tQUAD-path] tQUAD.so [tQUAD-options] -- [application-name] [application-options]

on a Linux system and

pin -t [tQUAD-path] tQUAD.dll [tQUAD-options] -- [application-name] [application-options]

on a Windows system. tQUAD supports the following command-line options:

• -filter_uncommon_functions - This option filters out uncommon function names which are unlikely to be defined by the user, e.g. function names beginning with underscore(s), a question mark, etc. This is useful if the user wants to restrict the analysis to functions that are actually part of the application being analysed, excluding library functions. The default value for this option is set to true. Hence, to see all the functions in the main image file, the '-filter_uncommon_functions 0' option will allow this.

• -slice - This option allows performing an analysis based on an interval. This interval is based on the number of instructions, i.e. the user chooses at which interval he/she wants to get a snapshot of the analysis results. An example invocation is given after this list.

• -ignore_stack_access - This option makes it possible to exclude accesses to the local stack memory, providing a view of the data transferred via the non-stack region. A hypothetical invocation combining these options is sketched below.
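The path, slice length, and application name in the following example invocation are invented for illustration, and the exact option value syntax may differ:

pin -t /opt/tQUAD/QUAD.so -filter_uncommon_functions 0 -slice 10000 -ignore_stack_access -- ./wfs wfs.cfg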

Table 4.1: Phases in the execution path of the hArtes wfs application.

phase (span, % of execution)          kernel                   activity   avg. read MBW       avg. write MBW      max. MBW (R+W)      aggregate
                                                               span       incl. / excl.       incl. / excl.       incl. / excl.       MBW
initialization (53-144, 0.007)        ffw                      92         1.8071 / 1.2422     0.4807 / 0.1811     2.4704 / 1.6376     2.6018
                                      ldint                    1          0.0798 / 0.0162     0.0516 / 0.0176     0.1314 / 0.0338
wave load (552-14660, 1.1103)         wav_load                 14109      2.0993 / 1.0358     1.0355 / 0.9929     3.1566 / 2.0664     3.1566
wave propagation                      vsmult2d                 1570       0.1799 / 0.0655     0.1182 / 0.0503     0.3996 / 0.1548
(540-274868, 21.5891)                 calculateGainPQ          1600       0.3708 / 0.0815     0.2633 / 0.0847     0.7714 / 0.2116     1.4530
                                      PrimarySource_deriveTP   235        0.0870 / 0.0240     0.0547 / 0.0208     0.2820 / 0.0980
WFS main processing                   fft1d                    278781     2.4179 / 0.3876     0.3501 / 0.1331     2.8738 / 0.6428
(14663-592803, 45.4983)               DelayLine_processChunk   115546     2.0859 / 0.2356     0.2339 / 0.1180     3.3316 / 1.7050
                                      bitrev                   116755     1.8677 / 0.2521     0.7457 / 0.1934     2.8778 / 0.4966
                                      zeroRealVec              36304      2.1529 / 0.0145     0.7233 / 0.3610     2.9386 / 0.4028
                                      AudioIo_setFrames        616        21.5553 / 21.8035   21.0646 / 21.5860   53.2686 / 52.7330
                                      perm                     116776     0.3252 / 0.0280     0.0956 / 0.0545     0.6556 / 0.1466     84.1862
                                      cadd                     41076      0.9882 / 0.3590     0.6686 / 0.2753     1.6946 / 0.6514
                                      cmult                    41073      1.1456 / 0.3590     0.7080 / 0.2753     1.8946 / 0.6594
                                      Filter_process           42583      0.7789 / 0.3609     0.1332 / 0.1141     0.9768 / 0.5064
                                      Filter_process_pre       1487       1.1113 / 1.6384     1.6267 / 1.6290     3.3862 / 3.3302
                                      zeroCplxVec              4132       1.7693 / 0.0141     0.5913 / 0.3926     2.6874 / 0.4710
                                      r2c                      2716       1.9250 / 0.1510     0.4474 / 0.2983     2.9386 / 0.5642
                                      c2r                      2318       1.9251 / 0.1774     0.3549 / 0.1769     2.9138 / 0.4658
                                      AudioIo_getFrames        502        0.8701 / 0.8296     0.8268 / 0.8137     1.7482 / 1.6866
wave save (592804-1270674, 53.3469)   wav_store                677871     1.7492 / 1.0033     0.8064 / 0.7765     2.7244 / 1.9044     2.7244

(memory bandwidth values in bytes per instruction; incl./excl. refer to stack accesses)

4.2.4 tQUAD’s Example Output As explained in the introduction to this section, tQUAD is able to provide memory bandwith information for each kernel during its execution time. Also, it finds the timing and activity span of kernels. This data is collected in text files which can be later converted into graphs and/or tables. An example output of tQUAD, taken from [29], is presented in Table 4.1. In Table 4.1 phase span indicates the starting and ending time slices for the phase; % phase span is the percentage of the phase time interval to the program whole execution time span; activity span represents the number of time slices in which the kernel is active (accesses memory); memory bandwidth usage is measured in bytes per instruction; aggregate MBW represents the summation of all kernels’ maximum memory bandwidth usages in the phase including the stack area accesses.

4.3 Concluding Remarks - The xQUAD Tool

The two tools described in this chapter allow the analysis of an application's memory behaviour. QUAD specifically addresses the detection of the actual data communication dependencies between functions, while tQUAD's goal is to find the execution time of kernels along with their memory bandwidth usage. Both tools deliver important information that can be used during critical decisions in HW/SW co-design stages, particularly for HW/SW task partitioning, mapping, and scheduling of applications onto heterogeneous reconfigurable platforms. Nevertheless, the described tools have no capabilities for recognizing low-level source code information and, therefore, it is difficult to pinpoint where inside a kernel a certain memory behaviour appears. By providing low-level source information, memory locations can be coupled with the names of the variables that occupy them. Hence, the xQUAD tool augments the analyses performed by the earlier described tools by delivering even more detailed information about an application's memory usage. xQUAD is able to provide detailed memory usage at function level, allowing to spot which memory location is occupied by which variable, how many times this variable is referenced, and the type of operation done on the memory location, i.e. read or write. Furthermore, xQUAD provides information about the complete memory map of an application, i.e. the memory locations that are used by functions over time. This can help gaining insight into the total memory usage of applications at a high level. Additionally, the tool can provide memory statistics that summarize, per kernel, the number of used memory addresses divided over the stack, heap, and data memory regions, along with the total number of accesses and the average number of accesses per memory location. As this tool is developed during this thesis project, it is described in more detail in the following chapter.

5 The xQUAD Tool

This chapter presents the xQUAD tool, a tool that provides detailed memory usage information of applications. xQUAD can be seen as an extension of the QUAD and tQUAD tools (presented in Chapter 4), as it augments the analysis provided by those tools with finer-grained memory information. Nonetheless, xQUAD can be used independently to provide a global view of the memory usage of an application, as well as detailed information about the memory usage of variables. Along with the memory usage analysis, it also provides low-level source information about variables. Pin does not provide any functionality for reading data symbols from object files; hence these data symbols are retrieved directly from the object file, i.e., without the intervention of the Pin framework. This chapter first gives a global overview of the xQUAD tool. Afterwards, background information about object files in general, and about the ELF object file format in particular, is provided. Also, the DWARF debugging information format is introduced, along with the library used to read DWARF debugging symbols. Subsequently, the xQUAD architecture is explained and a description of the usage of the tool is presented.

5.1 Introduction and Overview of xQUAD

The xQUAD tool augments the memory analysis generated by the QUAD and tQUAD tools. It provides memory access information of an application at variable granularity, as well as data about the global memory usage. These data are output in flat text files, whose purpose is twofold: they permit the developer to read them directly as well as to visualize them. Visualizing an application's memory behaviour can give a global overview of the memory access patterns of the application, possibly uncovering potential memory bottlenecks. Afterwards, the user can always turn back to the flat files, which give the precise information. Because these files are in a simple text format, the user can parse them to extract the needed information, like the number of accesses to local memory, the frequency of accesses to the heap, the ratio of usage of the different memory segments, etc. This information, along with the information provided by QUAD and tQUAD, can be used as a guide for revising the application code or, depending on the developer's needs, as a preliminary step in the HW/SW partitioning and mapping process of the application. Because one of the purposes of this tool is to give a clear understanding of the memory usage inside functions, at variable level, it is important for the tool to have some recognition of source-level information. The Pin DBI framework does not provide API functions for retrieving data symbol information. Therefore, source-level information such as variable names must be extracted from the debugging section of the object file. One prerequisite is that the application must be compiled with the debugging information


flag on, which instructs the compiler to augment the object file with a debugging section. In the case of an ELF executable compiled with GCC, the standard debugging format is DWARF version 2 [39] (see Section 5.2.2), unless otherwise specified. In the following, this chapter gives a background introduction to the format of object files and the debugging information section inside them. As this tool currently supports only ELF object files, no other object file formats are considered in the subsequent sections.

5.2 Background Information

As the Pin Dynamic Binary Instrumentation framework does not provide an interface to the data symbol information of an object file, an alternative method is implemented for retrieving low-level source code information. Provided that the application contains a debugging information section, xQUAD extracts this information through a module implemented for this purpose. The extraction of the debugging data happens in a transparent way, i.e., the developer is not aware of this process. The next section briefly explains the format of an ELF object file and the position of the debugging section inside it. Subsequently, the DWARF debugging information format is presented, along with the interface used to retrieve this debugging information. These subsections should provide the reader with enough background for the rest of the chapter.

5.2.1 Object File Overview

An object file contains the object code and the data produced by a compiler and assembler. More specifically, it contains:

• Header information - General information about the file.

• Object code - Binary instructions and data generated by a compiler or assembler.

• Relocation - Information about addresses that have to be fixed up when the linker changes the addresses of the object code.

• Symbols - Global symbols kept by the linker during linking.

• Debugging Information - Other information about the code needed for debugging purposes. This includes source file and line number information, local symbols, and other low-level source code information.

The above information is typical for object files, although an object file may contain little or no information beyond the object code. The structure of object files is determined by the target architecture on which these object files are supposed to run. In this thesis we consider the ELF object format¹.

¹ As introduced before, in this thesis report we consider only the ELF object format. Nevertheless, the information reported in this paragraph can be considered applicable, for the most part, to different object formats.

ELF object files can be of three types, namely relocatable, executable, and shared objects. Relocatable files hold code and data suitable for linking with other object files to create an executable file or a shared object file; these files are denoted by the .o extension. Executable files have all relocations done and all symbols resolved, except perhaps shared library symbols that are to be resolved at run-time. Shared object files contain code and data that are used by the linker in two ways: the linker may process them together with other relocatable and shared objects, creating another object file, or it may dynamically link them with an executable and other shared object files, creating a process image. Shared objects are denoted by the .so extension. Even though these different types of object files serve different purposes, their internal structure is in many ways similar; therefore, in this thesis focus is put only on executable object files. From a linker perspective, ELF executables are divided into sections, used by the linker for building a program. From a program execution perspective, ELF executables are divided into segments, where each segment usually groups multiple sections together [38]. Typically, object files have 7-9 segments and 35-40 sections². Typical sections contained in executable objects are the .text section containing the program code, the .data section containing initialized global and static variables, and the .bss section containing uninitialized global and static variables. When the object file is compiled with debugging information on, an additional .debug section appears in the object file. This section is, however, not present in any segment, as debugging information is not included in the memory space of the application's image. A high-level representation of the two views of an ELF file is shown in Figure 5.1; note that the .debug section in the linking view is not mapped into the execution view. Upon running an executable, the loader loads these segments into memory, creating a process image. Upon loading and running the executable, the kernel looks at the executable header, locates the .text section within the executable, loads it into the appropriate portions of memory, and marks these pages as read-only. It then locates the .data section in the executable and loads it into the user's address space, this time in read-write memory. Finally, it finds the location and size of the .bss section from the image header, and adds the appropriate pages of memory to the user's address space. Even though the user has not specified the initial values of variables placed in .bss, by convention the kernel initializes all of this memory to zero. Local variables are not stored in any section in the executable, but are created at run-time. An ELF header resides at the beginning of the file and holds a road map describing the file's organization. Sections hold the information of an object file for the linking view: instructions, data, symbol table, relocation information, and so on. A program header table, if present, tells the system how to create a process image.
Files used to build a process image (when executing a program) must have a program header table; relocatable files do not need one. A section header table contains information describing the file's sections; thus, the section header table lets one locate all the file's sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, etc. Each section occupies one contiguous (possibly empty) sequence of bytes within a file.

² Information acquired empirically.
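To make the ELF header and the two header tables concrete, the following sketch (not part of xQUAD; written for this text under the assumption of a 64-bit ELF file on a Linux system) reads the ELF header of an executable and reports where the program header and section header tables reside:

#include <elf.h>
#include <cstdio>

// Minimal sketch: read the ELF header of an executable and locate the
// program header (segment) and section header tables (64-bit ELF assumed).
int main(int argc, char *argv[])
{
    if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr ehdr;
    if (fread(&ehdr, sizeof ehdr, 1, f) != 1) { fclose(f); return 1; }

    // The file must start with the ELF magic number.
    if (ehdr.e_ident[EI_MAG0] != ELFMAG0 || ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
        ehdr.e_ident[EI_MAG2] != ELFMAG2 || ehdr.e_ident[EI_MAG3] != ELFMAG3) {
        fprintf(stderr, "not an ELF file\n"); fclose(f); return 1;
    }

    // e_phnum/e_shnum give the number of program and section headers;
    // e_phoff/e_shoff give their offsets within the file.
    printf("program headers (execution view): %u at offset %llu\n",
           (unsigned) ehdr.e_phnum, (unsigned long long) ehdr.e_phoff);
    printf("section headers (linking view):   %u at offset %llu\n",
           (unsigned) ehdr.e_shnum, (unsigned long long) ehdr.e_shoff);

    fclose(f);
    return 0;
}

Running such a program on an ordinary executable shows segment and section counts in the ranges mentioned above.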

Figure 5.1: Sections and segments of an ELF

Sections in a file may not overlap; no byte in a file resides in more than one section. Files used during linking must have a section header table; other object files may or may not have one. NB: sections and segments do not have a specified order; only the ELF header has a fixed position in the file. Hence, the organization may differ from that of Figure 5.1. Special sections hold program and control information; examples are .bss, .data, .debug, .fini, .init, .rodata, and .text. .bss holds uninitialized data that contribute to the program's memory image; by definition, the system initializes this data with zero when the program begins to run, and the section occupies no file space. .data holds initialized data that contribute to the program's memory image. .fini holds executable instructions that contribute to the termination of the application; that is, when a program exits normally, the system arranges to execute the code in this section. .init holds executable instructions that contribute to the process initialization code; that is, when a program starts to run, the system arranges to execute the code in this section before calling the main program entry point (main in C). .rodata holds read-only data that typically contribute to a non-writable segment in the process image. The .text section holds the text, or executable instructions, of a program. Hence, the loader maps each segment in the object file into memory; the segments essentially become the memory areas of an executable [40]. Figure 5.2, taken from [40], depicts the memory layout of a program that is about to begin execution. Local variables, temporaries, function parameters, spilled registers, etc. are allocated on the stack memory. Furthermore, heap space is dynamically allocated on demand, as soon as the first call to a memory allocation function, like malloc, is made.

Figure 5.2: Segments of an executable object file mapped in memory

Note that the lowest part of the virtual memory address space is unmapped; this is still part of the address space of the process, but has not been assigned to any physical address, so any reference to it is illegal. This is typically a few KB of memory from address zero up, and is used to catch references through null pointers.

5.2.2 DWARF Debugging Format

After this brief description of object file construction, this section introduces the standard debugging format associated with ELF, i.e., DWARF (Debugging With Attributed Record Formats) [39, 14], version 2³. The DWARF debugging information specification is a portable debugging information format. Originally designed for the Intel 32-bit architectures, it has been extended to other platforms, including MIPS, ARM, SPARC and PPC, and to 64-bit architectures. When an ELF executable is compiled with the debugging flag on, besides the sections describing the code, data and stack segments, the executable also includes a debugging section. Although DWARF is most commonly associated with ELF, it is independent of the object file format. DWARF provides debugging entries to define a low-level source code representation, like, among others, source file and line number information, local symbols, and descriptions of data structures.

³ The DWARF 2 format was specified in 1995, prior to the 1998 ratification of the ISO C++ standard. Thus, version 2.0 does not include specifications for some important C++ language constructs such as namespaces or mutable class attributes. The DWARF version 3.0 format, which is under review, addresses the version 2.0 language deficiencies and provides a format that sufficiently covers the debugging information required for more recently developed languages such as Java.

The DWARF section inside an object file consists in turn of different subsections, whose names are all prefixed by the .debug keyword. The debugging data is organized in three levels: a .debug section contains DIE sections, a DIE section contains DIEs, and DIEs have attributes. The most important DWARF section is the .debug_info section, which is the DWARF core data containing the actual debugging information. This debugging information is contained in data structures called Debugging Information Entries (DIEs), which are used to represent each compilation unit, variable, type, procedure, etc. The DIEs (which live in the .debug_info section) are organized in a tree structure: the root of the tree is the debug section, which contains DIE sections (or Compilation Units). The example in Figure 5.3 is an output from the dwarfdump tool. The format is modified to resemble the tree structure of the .debug_info section.

Figure 5.3: DWARF DIEs sample output

From the output in Figure 5.3, it can be noted that the root of the DWARF structure is, in this case, a .debug_info section. As mentioned before, .debug_info contains one or more DIE sections describing Compilation Units (CUs). Each DIE is described by a tag; hence, a DW_TAG_compile_unit tag describes a compilation unit DIE. Each DIE has a number of attributes, denoted in DWARF by DW_AT_<attribute>. This is the description of the first DIE, i.e. of the compilation unit DIE. Next, the DW_TAG_subprogram DIE describes the next level in the tree hierarchy. Again, this DIE is augmented with attributes, like DW_AT_name, which describe the function in question (main in this case). The next nested (child) DIE of the DW_TAG_subprogram DIE is the DW_TAG_variable DIE, with its attributes describing the variable in the source code. Then, after having described all the children of the parent DIE (i.e. DW_TAG_subprogram) in question, the next sibling

DIE is processed, which in the example above is again of type DW_TAG_subprogram. Following this depth-first algorithm, the entire tree is described. As most modern programming languages are block structured, DWARF also follows this model; hence each DIE, except the topmost root DIE representing the CU of the source file, is contained within a parent DIE and may contain child DIEs. DIEs are tree nodes which may represent types, variables, or functions, and consist of a set of attributes describing the type, name, source line number, location address, or references to other DIEs (e.g. a reference to a datatype specification). Figure 5.4 shows a sample C program and the related DWARF description of the program.

Figure 5.4: C Program and DWARF description of program

5.2.3 DWARF Consumer Library

Following the DWARF structure described before, applications can be built which read and process DWARF information by using the libdwarf consumer library interface [3]⁴. This section briefly summarizes the key functions in the library for reading DWARF information. The first step of every DWARF process is to retrieve the debug descriptor of type Dwarf_Debug, from which the DWARF data can be further obtained. The libdwarf library defines a function for this purpose, namely dwarf_init(), which returns the debug descriptor. Ending a DWARF operation is accomplished by calling the function dwarf_finish(). The debug descriptor gives access to all the needed DIE information. As explained in the previous section, every program consists of multiple CUs (even if the application consists of one file, there are other CUs originating from library files).

⁴ For creating DWARF debugging sections, a dedicated producer library is available; see [3] for more information.

The function dwarf_next_cu_header(), called repeatedly, permits processing each CU in the program. Afterwards, starting from the available CU, functions are called which process each individual DIE in the current CU by a depth-first tree traversal. The library functions used to accomplish this traversal are dwarf_child() and dwarf_siblingof(). A minimal sketch of this process is given below.
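The following sketch assumes the classic libdwarf C interface of the time; error handling is abbreviated and the printing is only illustrative, xQUAD itself fills its internal variable/offset tables instead.

#include <dwarf.h>
#include <libdwarf.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Depth-first traversal of the DIE tree of one CU, as described above.
static void traverse(Dwarf_Debug dbg, Dwarf_Die die)
{
    Dwarf_Die child, sibling;
    Dwarf_Error err;
    char *name = 0;

    if (dwarf_diename(die, &name, &err) == DW_DLV_OK)
        printf("DIE: %s\n", name);

    if (dwarf_child(die, &child, &err) == DW_DLV_OK)        // descend first
        traverse(dbg, child);
    if (dwarf_siblingof(dbg, die, &sibling, &err) == DW_DLV_OK)
        traverse(dbg, sibling);                             // then move sideways
}

int main(int argc, char *argv[])
{
    int fd = open(argv[1], O_RDONLY);
    Dwarf_Debug dbg;
    Dwarf_Error err;
    Dwarf_Unsigned cu_len, abbrev_off, next_cu;
    Dwarf_Half version, addr_size;
    Dwarf_Die cu_die;

    if (fd < 0 || dwarf_init(fd, DW_DLC_READ, 0, 0, &dbg, &err) != DW_DLV_OK)
        return 1;

    // One iteration per Compilation Unit in .debug_info.
    while (dwarf_next_cu_header(dbg, &cu_len, &version, &abbrev_off,
                                &addr_size, &next_cu, &err) == DW_DLV_OK) {
        // The CU DIE is obtained as the "sibling of NULL" in the current CU.
        if (dwarf_siblingof(dbg, 0, &cu_die, &err) == DW_DLV_OK)
            traverse(dbg, cu_die);
    }

    dwarf_finish(dbg, &err);
    close(fd);
    return 0;
}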

5.3 xQUAD Architecture

The architecture of xQUAD consists of three main parts:

• A module for retrieving DWARF Debugging Information

• Pin’s instrumentation functions

• QUAD’s analysis functions called from the instrumentation

Figure 5.5 shows the architectural overview of xQUAD and its connection with the Pin framework and the module for retrieving DWARF Debugging Information. Similar to the tools in Chapter 4, xQUAD communicates through Pin's API, which allows the tool to be hooked up with the Pin framework at run-time.

Figure 5.5: Architectural overview of xQUAD

The Pin framework is described in Section 3.3, but is summarized here for convenience. At the highest level, there is a Virtual Machine (VM), a code cache, and an instrumentation API. The VM consists of a Just-In-Time (JIT) compiler, an emulator, and a dispatcher. After Pin gains control of the application, the VM coordinates its components to execute this application. The JIT compiles and instruments the application code, which is then launched by the dispatcher. The compiled code is stored in the code cache. Entering (leaving) the VM from (to) the code cache involves saving and restoring the application register state. The emulator interprets instructions that cannot be executed directly; it is used for system calls, which require special handling by the VM. Since Pin does not reside in the kernel of the operating system, it can only capture user-level code. As Figure 5.5 shows, three binary programs are present when an instrumented program is running: the guest application, Pin, and the xQUAD tool. Pin is the engine that instruments the application. xQUAD contains the instrumentation and analysis routines and is linked with a library that allows it to communicate with Pin.

5.3.1 xQUAD Design and Implementation

The tools described in this thesis all have a similar design and implementation, as they are all based on the same dynamic analysis framework (see Chapters 3 and 4). However, each tool has its own peculiarities which distinguish it from the others. xQUAD, as mentioned before, can be used to perform two types of analysis:

• Detailed variable memory usage information inside a function

• A memory map of the application

Performing an analysis that produces variable memory information for all the functions in an application quickly results in a gigantic report file, most probably infeasible to read (assuming one can even open it, as it can be as big as several tens of GB). Therefore, the tool must be able to filter out information according to the user's needs. Hence, before running the tool, the user is required to define the functions and the variables he/she wants to analyse. These functions and variables are specified in a text file, called the MONITOR file, by following a few simple text formatting rules which are necessary for the tool to correctly read and store the variables of a function to be analysed. Figure 5.6 gives an example MONITOR file describing its format. In this example it is assumed that the analysis will be performed on two functions, namely the main and consumer_2 functions, which are prefixed with a #. In case all variables in a function need to be traced, the user should indicate this to the tool with the ##ALLVARS## keyword. If only specific variables need to be analysed, these variables need to be specified individually, below the function name, each variable on a single line. In case the user wants to analyse the behaviour of global variables, passing the #GLOBALVARS keyword to the tool and specifying these global variables, each on a single line, will accomplish the task. As tracing the memory addresses of variables can result in the production of huge files, instructing the tool to analyse the application selectively, based on specified variables, results in a more readable flat file report and a shorter analysis time. Upon starting, the tool performs some preliminary work such as report file initialization, command-line parsing, initialization of the data structures used to hold analysis data, and initialization of the Pin framework. As described in Chapter 4, initialization of the Pin framework consists of calling two API functions, i.e. PIN_InitSymbols(), which initializes the symbol table code, and PIN_Init(), for system initialization. Subsequently, xQUAD parses the MONITOR file and stores the information inside this file into a linked-list data structure, which is used during execution of the tool to steer the analysis. Figure 5.7 shows a code snippet of the main function of xQUAD. It should be noted that this code snippet is a representative portion of the actual code, although details of the file and data structure memory allocation checks are omitted for brevity. As can be seen from Figure 5.7, following the various initialization steps, xQUAD makes the call for processing the debugging information from the object file. The following section explains the implementation details of the DWARF module.

Figure 5.6: Example of the MONITOR file passed to the xQUAD tool
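Since Figure 5.6 is reproduced as an image, a hypothetical MONITOR file following the formatting rules just described (all variable names below are invented for illustration) could look like:

#main
##ALLVARS##
#consumer_2
local_buffer
frame_count
#GLOBALVARS
sample_rate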

5.3.2 DWARF Debugging Information Module

The process of retrieving DWARF Debugging Information happens, as mentioned earlier, in a transparent way, i.e., the user is not aware of this process and the only requirement is that the object file is compiled with debugging information. The module for extracting debugging information is based on dwarfdump, an open source tool which prints DWARF information from the .debug section of an object file in human readable format, using the libdwarf C library [3]. However, dwarfdump is used only as a starting point, as it provides very generic output. Hence, it has been decided to build a new module from scratch, tailored to QUAD's needs and omitting unnecessary DWARF information. The libdwarf library contains API functions for reading and writing DWARF 2 information, through the libdwarf consumer library and the libdwarf producer library [3], respectively. xQUAD only needs to read DWARF information, hence in this section only functions from the consumer library are described. The first call to the libdwarf library regards the initialization of the DWARF process, which returns a handle for accessing debugging information, associated with a file descriptor of the application. This handle is the main object from which all the reading is done. As explained in Section 5.2.2, the .debug_info section of the application consists of multiple Compilation Units (CUs), which all contain debugging information that is to be extracted. Therefore, the debugging information handle is passed to a function for the extraction of each CU. In the simplest case, where an application consists of only one source file, i.e. one .c/.cpp file, the application will have a single CU of interest, and a number of CUs corresponding to libraries. These library CUs are excluded from the debugging information parsing process. In Section 5.2.2 it is explained that each CU consists of several (sibling) Debugging Information Entries (DIEs), which in turn can contain other child DIEs. This forms a tree of DIEs, which is traversed by a depth-first algorithm. The DWARF library provides

int main(int argc, char *argv[])
{
    char temp[100];
    string var_monitorfilename;

    // Open the file for detailed analysis and initialize it
    varTrace.open("/home/marco/Desktop/QUAD_VarTraceAndTiming.txt");
    varTrace << "\n\t\t\tVariable Tracing and Timing Report\n\n";
    varTrace << "MEMORY ADDRESS" << ... ;  // column headers (remainder truncated in the original listing)

    // Initialize Pin symbol table and framework
    PIN_InitSymbols();
    if (PIN_Init(argc, argv))
        return Usage();

    // Store the slice value
    VarTrace_Snapshot_Interval = atoi(KnobSlice.Value().c_str());
    // Get the name of the MONITOR list file to use
    var_monitorfilename = KnobVarMonitorList.Value();

    // Data structures initialization: put the first record in our call stack
    // and assume 'Out of the main function scope' as the first current routine
    CallStack.push("Out of the main function scope");
    current_f_name = "Out of the main function scope";

    mem_addr_struct = new MEMADDR;
    // template arguments of 'pair' are lost in the original listing; types assumed
    mem_addr_struct->map_addr.insert(pair<string, int>("sample address", 0));
    map_func_addr_couple[current_f_name] = mem_addr_struct;

    /* Command-line parsing code omitted */

    /* Code for parsing the MONITOR file and for initialization of the
       Memory Map files omitted */

    // Start debugging information processing
    init_addr2var(temp, comp_dir_name);

    // Call Pin APIs for registering instrumentation functions
    RTN_AddInstrumentFunction(RegisterFunctionCall, 0);
    INS_AddInstrumentFunction(TraceTimingInstruction, 0);

    PIN_AddFiniFunction(Fini, 0);

    cerr << endl << "Starting Variable Tracing and Analysis Application" << endl;

    PIN_StartProgram();

    return 0;
}

Figure 5.7: xQUAD main function

functions such as dwarf_child() and dwarf_siblingof() which accomplish the task of traversing the tree. Figure 5.8 depicts this process. Next, for each CU, a table is constructed holding variable names and their offsets with respect to the current frame base address. The current

frame base address is retrieved through a Pin API call (described later), and the actual address of a variable is then calculated by adding its offset to the base pointer address retrieved via Pin.
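As a hypothetical example of this computation: if the DW_AT_location attribute of a local variable encodes an offset of -24 and Pin reports a frame base of 0x7fff3dbfb7d0 at run-time, the variable resides at 0x7fff3dbfb7d0 - 24 = 0x7fff3dbfb7b8.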

Figure 5.8: DWARF Depth-First Tree Traversal

Subsequently, for each DIE, and for each of its children, a function is called to process the DIE data. As mentioned earlier, each DIE represents a type, variable, function, etc., which is specified by a tag. The tool retrieves the tag by calling the library function dwarf_tag() and selects DIEs with a:

• DW_TAG_subprogram tag, for retrieving debugging data about functions

• DW_TAG_variable tag, for, not surprisingly, retrieving debugging data about variables

Furthermore, in the case of a DW_TAG_variable, an additional distinction is made between local and global variables. The implementation details for processing functions and (global/local) variables are described next.

Functions

When processing a subprogram DIE (i.e. a function), the tool compares the name of the function DIE (obtained by calling the DWARF library function dwarf_diename()) with the function names stored in the data structures parsed from the MONITOR file. Upon processing a function DIE in which we are interested, i.e. one predefined in the monitor list, this function name is stored in the data structure containing the information parsed from the MONITOR file. Additionally, the presence of the DW_AT_MIPS_linkage_name attribute is checked, which represents the internal name representation assigned by the compiler. In the case of a C++ compilation, function names are mangled to simplify the compiler's process⁵. This attribute, if present, is also stored in the data structures, as at a later stage of the analysis the function names stored in these data structures need to be compared with function names that are recognizable by the Pin framework, which are also mangled. After determining that the name of the function matches a function name stored in the MONITOR file, the tool needs to filter the variable names accordingly.

Local Variables

A DW_TAG_variable tag denotes a variable, either local or global. When a local variable is encountered, this variable DIE is a child DIE of a parent subprogram DIE. In this case, by using the library function dwarf_attrlist(), a list of attributes corresponding to this variable DIE is retrieved. Then, for each attribute in the list, identified by the function dwarf_whatattr(), a call is made to a function that processes the attributes of the variable DIE. Each variable DIE (global or local) has a location attribute (DW_AT_location). For a local variable, the DW_AT_location attribute, as mentioned before, contains an offset value. This value is stored in a data structure holding variable name/offset value pairs. The case of global variables requires a slightly different approach, which is described next.

Global Variables

In the case of a global variable, the variable DIE has some attributes that are not present for a local variable. Additionally, DWARF builds up different variable DIEs depending on whether an application is compiled with GCC or G++; this is taken into account by the tool. When an application is compiled with G++, the global variable is described by DWARF in two steps. First, a DIE is built with a DW_AT_declaration attribute, along with other common variable descriptions like the variable name, line number, data type, and so on. This attribute is only present in the case of a G++ compilation, which is used by the tool to discriminate between GCC- and G++-compiled binaries. In this case, the DIE does not have a location attribute; the location is stored in another DIE, containing the

⁵ As C++ provides function overloading, multiple functions may have the same name. It is therefore necessary to provide a unique name for each function. This process is called name mangling.

DW_AT_specification attribute, which appears only in DIEs in this situation. This regular structure somewhat simplifies the process, and the fixed global address from the DW_AT_location attribute can be retrieved and stored, as mentioned before, along with the name of the global variable. In the case of an application compiled with GCC, the process of retrieving the global variable addresses is very straightforward compared to the two-step method described above: the DIE describing the global variable directly contains the DW_AT_location attribute. Hence, the global variable address can be directly stored in the corresponding data structure. The process described here repeats recursively for each DIE in the current Compilation Unit (CU). When the current CU has no DIEs left, the process is repeated for each remaining CU present in the application. Finally, when every CU is processed, the DWARF process is terminated and the file descriptor of the application is closed. At this point, all the debugging information is kept in memory, where it is subsequently used during the analysis done with the Pin framework.

5.3.3 Pin API Calls and Analysis Routines

After completion of the process of reading the debugging information, the tool starts the instrumentation process. As explained in Section 3.3, Pin permits the user to instrument at different granularities. xQUAD instruments the application at two different granularities, namely at routine level and at instruction level, by calling the API functions RTN_AddInstrumentFunction() and INS_AddInstrumentFunction(), respectively. RTN_AddInstrumentFunction() registers RegisterFunctionCall, which is invoked for every routine that is executed. The purpose of this function is to keep an internal version of the call stack for routine calls. Here, the name of the currently executing routine is retrieved by calling the API function RTN_Name(); then, using RTN_Open(), the routine is opened to be processed. RTN_InsertCall() inserts a call relative to the opened routine, registering the analysis function FunctionCallStack. The RTN_InsertCall() invocation is shown below.

RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR) FunctionCallStack,
               IARG_INST_PTR, IARG_PTR, name_char, IARG_END);

Afterwards, the currently executing routine rtn is closed by calling RTN_Close(), which must be called before opening any other routine. The necessity of having an own version of the call stack arises from the fact that, during analysis, it is required to know whether a currently executing function has been called for the first time, meaning it is also producing memory accesses for the first time. The other possibility is that a function returned from a callee, meaning that the data accesses currently produced must be accounted to a previous call. In FunctionCallStack, the tool builds the actual function call stack. Of course, the function calls stored in the call stack are to be matched with the functions defined in the MONITOR file by the user. In this case, the previously stored mangled name is compared with the name of the currently executing function that Pin provides (through RTN_Name()). In case no mangled name is stored, it means that no mangled name is present and thus the comparison can be done simply based on the (unmangled) function names. Upon a match, these functions are stored in the call stack, so that a flow of function calls is available. After having built our own function call stack, the INS_AddInstrumentFunction() API function is called. This function enables instrumentation at instruction level; it registers the instrumentation function TraceTimingInstruction, which is responsible for calling analysis routines based on the currently executing instruction. Upon entering the instrumentation routine, there is a call to

INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) timingCount, IARG_END);
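The body of timingCount is not listed in the thesis; based on the description that follows, a minimal sketch with assumed counter names (only VarTrace_Snapshot_Interval appears in Figure 5.7) could look like:

// Hypothetical sketch of the timingCount analysis routine.
static UINT64 Total_Ins = 0;   // time stamp: instructions executed so far
static UINT64 Slice_Ins = 0;   // instructions executed in the current slice

VOID timingCount()
{
    Total_Ins++;   // one more instruction executed: advance the time stamp
    if (++Slice_Ins >= (UINT64) VarTrace_Snapshot_Interval)
    {
        Slice_Ins = 0;
        // ... a snapshot of the analysis results would be emitted here ...
    }
}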

The timingCount analysis function is responsible for maintaining a time stamp counter and a slice counter. The time stamp counter holds the number of instructions executed up to a certain point in time; this effectively adds a timing functionality to the tool, based on the number of instructions. The slice counter, conversely, is used as a timer in case the user wants to analyse an application based on a certain slice interval; for instance, the user may want to analyse the application every 10000 instructions. The tool also checks for return instructions (INS_IsRet(ins)). This check is done to be able to store the return address of a function, used for building the memory map. As explained above, retrieving the last address of a function with DWARF does not appear to be reliable, because the last address of the function retrieved with DWARF may be a NOP instruction, which is never executed, and Pin does not instrument instructions that are never executed. The call to the analysis routine is shown below.

INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR) Return,
                         IARG_INST_PTR, IARG_END);

An analysis routine registered with INS_InsertPredicatedCall() is called only if the instruction is predicated true, i.e. if the instruction is actually executed. The analysis routine Return finds the name of the current routine specified by the current instruction pointer, using Pin's RTN_FindNameByAddress() API. Then, for each function defined in the MONITOR file, this analysis routine stores a high program counter value using Pin's StringFromAddrint() function and pushes this function onto the internal representation of the call stack. As mentioned before, xQUAD's main purpose is to analyse the memory usage of variables in an application. Hence, each instruction operating on memory is instrumented and filtered by using Pin's INS_IsMemoryRead/Write and INS_IsStackRead/Write. In case an instruction has two memory read operations, INS_HasMemoryRead2 is used. If the above holds, the following call to the analysis routine is made in the case of a memory read operation:

INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR) RecordTrace,
                         IARG_INST_PTR,
                         IARG_UINT32, 'R',
                         IARG_MEMORYREAD_EA,
                         IARG_UINT32, INS_IsPrefetch(ins),
                         IARG_END);

INS_InsertPredicatedCall() assures that the analysis routine is only called upon an actual execution of the instruction. Depending on the type of memory operation, i.e. read or write, the character 'R' or 'W' is passed. Also, in the case of a write operation the IARG_MEMORYWRITE_EA parameter is passed, representing the effective memory address where the write operation occurs; conversely, IARG_MEMORYREAD_EA represents the effective memory address of the occurring read operation. The parameter INS_IsPrefetch(ins) is true if we are dealing with a prefetch instruction. The analysis routine RecordTrace stores this information in a structure in case INS_IsPrefetch(ins) is false, i.e. the instruction is not a prefetch instruction. xQUAD excludes memory prefetching instructions, as these can pollute the memory access pattern.
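The thesis does not list RecordTrace itself; a sketch matching the argument list of the call above, with an assumed body, is:

// Hypothetical sketch of RecordTrace: instruction pointer, operation type
// ('R' or 'W'), effective address, and the prefetch flag, in the order in
// which they are passed by INS_InsertPredicatedCall above.
VOID RecordTrace(ADDRINT ip, UINT32 op, ADDRINT ea, UINT32 isPrefetch)
{
    if (isPrefetch)   // prefetches would pollute the memory access pattern
        return;

    // Record one access of type 'op' to address 'ea' for the function on
    // top of the internal call stack, e.g. (data structures assumed):
    // map_func_addr_couple[current_f_name]->map_addr[ea] += 1;
}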

5.3.4 How xQUAD Works: Functionalities and Restrictions

Running xQUAD is done in a similar way as explained for QUAD and tQUAD in Sections 4.1 and 4.2, and stated below.

pin -t [xQUAD-path] xQUAD.so [xQUAD-options] -- [application-name] [application-options]

As mentioned before, xQUAD supports a couple of command-line options, according to the type of analysis the user wants to perform. These command-line options are:

• -memory_map - This option instructs the tool to create a memory map of the application. When this is desired, the tool produces one file in which all the memory usage inside a function is reported. This file is then post-processed to produce three files containing the stack addresses, heap addresses, and data addresses per function. Later, these files can be used for visualization purposes.

• -variable_monitor_list - This option enables the user to obtain a report file based on predefined functions and variables, excluding all other (undesired) functions/variables. The function and variable names to monitor are specified in a text file, following some simple rules which are described in Section 5.3.1.

• -slice - This option enables the user to specify the slice interval on which an analysis is based. A hypothetical invocation combining these options is sketched below.
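The path, slice length, and application name in the following example invocation are invented for illustration, and the exact value syntax of the options may differ:

pin -t /opt/xQUAD/xQUAD.so -memory_map -variable_monitor_list monitor.txt -slice 10000 -- ./wfs wfs.cfg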

One restriction of the xQUAD tool is that it is developed to work on x86 platforms running Linux. However, both Pin and DWARF are known to work on multiple platforms; therefore, porting xQUAD to other platforms should not be very difficult.

5.4 xQUAD Analysis Output Examples

As explained in the introduction of this chapter, the xQUAD tool generates different flat files that describe the memory usage of an application both at a general and at a detailed level. Basically, the outputs of the tool can be divided according to the command-line options described above, i.e., a memory map of the entire application and a detailed memory report file at function and/or variable level. As these files contain the entire memory trace of the application, various pieces of information can be parsed out of them, according to the developer's needs. In the rest of this section, both analysis outputs are explained by giving examples of the produced files as well as potential usages of these files. In Chapter 6, these files are applied to the analysis of a real-life case study.

5.4.1 Memory Map Report File

Instructing the tool to perform a memory map analysis of an application generates a flat file containing all the addresses used in a function, per function call. An example of a memory map file is shown in Figure 5.9. Furthermore, as can be seen in the figure, the frequency of the used memory addresses is also provided, giving a clear idea of the type of addresses used in a function and the intensity of accesses on a certain memory location.

wav_load
0x000000000060ed20 56
0x00002b1724b64b00 45
0x00002b17250bd89c 1
0x00002b17250c35a4 1
0x00002b17250cddfc 1
0x00007fff3dbfb408 1
0x00007fff3dbfb7c0 51
0x00007fff3dbfb7c8 195
0x00007fff3dbfb7d8 89
waveprop
0x00007fff3dbfb778 1
vmag2d
0x00007fff3dbfb530 1
vsub2d
0x00007fff3dbfb538 1
wfsClocalizeSources
0x00007fff3dbfb688 1
vmag2d
0x00007fff3dbfb500 1
r2c
0x00007fff3dbfb7cc 2
bitrev
0x00007fff3dbfb75c 3
perm
0x00007fff3dbfb788 1
bitrev
0x00007fff3dbfb758 1
0x00007fff3dbfb75c 3

Figure 5.9: xQUAD memory map example

However, such a memory map file can get quite big, and most times it becomes infeasible to inspect. In fact, as explained before, it is advised to perform a memory map analysis based on time slice snapshots, generating a more contained memory map file. The example in Figure 5.9 is taken from an analysis made with a time slice snapshot every 10000 instructions. Also, even though the file is generated from a real application (see Chapter 6), it should be mentioned that the memory map shown is adjusted to give a general idea of the file, and thus some addresses are omitted. Nevertheless, after having generated a memory map file, the developer may want to filter out addresses based on their memory region. This shows in detail the type of memory used by a certain function. Additionally, statistics can be derived from this usage, e.g., the ratio of used local memory versus main memory. Table 5.1 demonstrates this by using a function from the Wave Field Synthesis (WFS) application, which is described in more detail in the case study in Chapter 6.

Table 5.1: Example of statistics for the wav_load function from the WFS application

                  nr. of addresses   total nr. of accesses
Stack addresses   138                20597714
Heap addresses    502127             730790
Data addresses    57                 33559334
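From these numbers, the average number of accesses per address can be derived: roughly 20597714 / 138 ≈ 149,000 accesses per stack address, against 730790 / 502127 ≈ 1.5 accesses per heap address. In other words, wav_load touches few stack locations very intensively, but many heap locations only rarely.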

Besides generating flat, text-based files, users may want to visualize the generated memory map files so as to gain a faster and more abstract idea of the general memory behaviour. An example borrowed from Chapter 6 is shown in Figure 5.10; in Chapter 6, more examples are described and explained in more detail.

Figure 5.10: Example of memory visualization

The xy-axes in Figure 5.10 represent a certain memory region space of an application (heap, stack, or data region). The z-axis represents the number of accesses on a particular memory location.

5.5 Detailed Variable Report File

The other option in xQUAD is to produce a detailed flat file, describing the details of the memory usage at variable level. The file contains information such as:

• Effective Address of the variable

• Name of the variable augmented with the function name it belongs to

• Type of operation done on the specific memory location, i.e. read/write

• Number of times a specific variable is referenced

• Timestamp at which a certain variable memory address access is done

An excerpt from a detailed variable memory usage file is shown in Table 5.2. This example is also taken from the WFS application, which is described in the next chapter.

Table 5.2: Excerpt from a detailed variable memory usage report file of the WFS application

MEMORY ADDRESS   VARIABLE NAME   R/W   COUNT   TIME STAMP
0x7ffffff19638   k::fft1d        R     615     1006400000
0x7ffffff19620   u::fft1d        R     615     1006410000
0x7ffffff19620   u::fft1d        W     615     1006420000

Such a detailed file can be filtered, again according to the specific needs of the developer, so that detailed information on variables, and on the times at which these variables are active, can be retrieved. Of course, the user may want to run the xQUAD tool for a specific variable only, omitting other variables and thus producing a more contained file.

5.6 Conclusion

This chapter described the xQUAD tool, a tool that reports detailed memory usage data of applications. xQUAD augments the analysis provided by the QUAD and tQUAD tools (see Chapter 4) with both a global analysis of applications and a detailed analysis of the memory usage inside a kernel. It permits performing an analysis selectively, based on a MONITOR file that the user is required to provide before executing the tool. The data generated by xQUAD can be filtered to get specific information about the memory behaviour of an application, according to the user's needs. This information can be applied during stages of a HW/SW co-design methodology, like partitioning, mapping, and scheduling of an application onto a heterogeneous platform containing reconfigurable devices. Furthermore, the produced information can serve as a hint for optimizing applications. The architecture and implementation of xQUAD are similar to those of the tools described in Chapter 4, developed using the Pin [22] DBI framework described in Chapter 3. However, the Pin framework does not provide any API for accessing low-level source code information. Hence, to provide xQUAD with such a facility, an external module (i.e., outside the Pin framework) is implemented, using the DWARF Debugging Information library.

xQUAD is developed and tested on IA32 and Intel 64 platforms, with a Linux operating system. As future work, this tool can be ported to other operating systems and/or other architectures; in fact, the Pin framework provides support for multiple architectures and binaries (see Section 3.3). The next chapter discusses the usage of xQUAD in a case study on the Wave Field Synthesis [5] audio processing application.

6 Case Study: Wave Field Synthesis

This chapter demonstrates the usage of the tool described in Chapter 5 on the hArtes Wave Field Synthesis (WFS) audio processing application. The main goal is to present the detailed memory usage behaviour of this application. The extracted information can be applied during critical decisions in HW/SW co-design stages, especially for HW/SW partitioning, mapping and scheduling of the application onto a heterogeneous reconfigurable platform. Furthermore, the information acquired during this analysis can be used to spot memory-related bottlenecks and to revise the application code so that the data communication to/from the memory subsystem is improved. This case study shows the potentialities of the xQUAD tool by inspecting the memory access behaviour of the WFS application. In particular, the tool is used to show the global memory usage of the application, and the intensity and behaviour of the memory usage in the different memory regions (i.e. stack, heap, data). Additionally, a ranking method is presented which tries to extract the memory penalty factor of a kernel from the cumulative execution time retrieved with the gprof general profiler. This can give an indication of the memory intensity of a kernel relative to its execution time. However, this metric does not reflect any particular quantitative measurement; it only computes an index by taking the ratio of memory accesses over gprof time. This index is therefore only applicable for comparison purposes between kernels, which permits drawing up a ranking.

6.1 Introduction to Wave Field Synthesis

The Wave Field Synthesis [5] concept is a 3D audio rendering technique characterized by the creation of a virtual source and a virtual room. WFS is based on the Huygens principle, which, informally, states that each point in a wave front can be considered as a primary source for the creation of new, secondary waves, which in turn become primary sources for other waves. Hence, an advancing wave can be seen as constructed by the summation of all the secondary waves arising from previous primary source waves. This principle is reproduced by loudspeaker arrays that generate a complete sound field in the listening zone which is identical to an appropriate real sound event. By recording a sound emitted from a source (S) with an array of n microphones (M), and then reproducing these recorded microphone signals with an array of n equally arranged loudspeakers, the synthesized wave front is created and propagated through a listening area. At any place in the listening area, a listener perceives a realistic reproduction of the primary virtual sound source. In other words, as the listener moves, for instance, along the loudspeaker array, he/she hears the sound from the same direction as it was originally propagated by the virtual sound source, i.e. when the listener moves towards the location of a virtual source, the amplitude of the sound increases in a realistic way [36].


Figure 6.1, adapted from [36], depicts the described situation.

Figure 6.1: Principle of WFS. Sources S1 and S2 emit spherical waves, while source S3 propagates a plane wave. These three sources propagate their signals, which are captured and recorded by an array of microphones M. These signals are then reproduced by an equally arranged array of loudspeakers L, and propagated into a listening area. In this area, listeners X and Y perceive a realistic sound, e.g. listener X perceives the sound from virtual source S1 differently than listener Y, as the direction of the waves is reproduced accordingly.

The hArtes WFS application provided by Fraunhofer IDMT [1] implements a self-contained wave field synthesis system.

6.2 Experimental Setup

The experiments were executed on an Intel 64-bit Core 2 Quad CPU Q9550 @ 2.83GHz with a main memory of 8GB, running Linux kernel v2.6.18-164.6.1.el5. The hArtes WFS source code was compiled with gcc v4.1.2. To be able to read debugging information, the source code is compiled with the -g flag enabled. Also, to use the gprof general profiler, the program is compiled with the -pg flag on. xQUAD is executed with a time slice interval ranging from 500 to 10000 instructions per time slice. Furthermore, xQUAD is run with the -memory_map and -variable_monitor_list command-line options, which are used to enable an analysis of the general memory behaviour of the application and to retrieve detailed information about variable memory tracing, respectively. The hArtes WFS runs in off-line mode; this means that the input audio source is read from a file instead of an audio device. In all the experiments, one primary source and thirty-two secondary sources (S and L respectively, see Figure 6.1) are assumed, in an acoustical area of 50 square meters.

6.3 Memory Access Behaviour of WFS

Running xQUAD with the command line option -memory_map enables the production of the memory map file, which describes the memory access behaviour of WFS during its execution. By post-processing this file, we create three different files, each of which contains the memory accesses of a particular memory region, viz., stack, heap, and data memory addresses. As a first step, to get a first impression of the global memory usage of WFS, we visualize these distinct memory flat files. Figures 6.2 and 6.3 show snapshots of the usage of the heap memory address space taken at different times during the execution of WFS. The memory locations are mapped on the xy-axes, while the intensity of each memory address is revealed by the z-axis. It turns out that from the beginning until approximately half of the execution time there is a sparse usage of heap memory addresses, while during the second half of the execution time this usage becomes more intensive for a certain range of heap addresses.

(a) Near the beginning of the execution time (b) Near the half of the execution time

Figure 6.2: Snapshots of the heap memory usage of WFS for the first half of the execution time
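The snapshots themselves are produced with the toolset's visualization feature. Purely as an illustration of the idea, the hypothetical sketch below renders one flat file as an address grid whose colour encodes access intensity, again assuming one "hex_address count" pair per line.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_snapshot(path, grid=64):
    """Render a flat memory-map file as a 2D address grid with access
    intensity as colour, similar in spirit to Figures 6.2-6.6."""
    counts = {}
    with open(path) as f:
        for line in f:
            addr, n = line.split()
            counts[int(addr, 16)] = counts.get(int(addr, 16), 0) + int(n)
    lo, hi = min(counts), max(counts)
    z = np.zeros(grid * grid)
    for addr, n in counts.items():
        # Map the address range linearly onto the grid cells.
        cell = (addr - lo) * (grid * grid - 1) // max(hi - lo, 1)
        z[cell] += n
    plt.imshow(z.reshape(grid, grid), cmap="hot")  # x/y: address, colour: accesses
    plt.colorbar(label="number of accesses")
    plt.show()
```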

The intensive usage of heap addresses is attributable to the wav_store function, which becomes active approximately at the middle of the execution time of WFS and is the only active function until the end of the execution. This can be explained by the fact that wav_store saves the output audio signals produced during the execution from the buffers allocated in memory to an output file in the WAV format. To accomplish this, it mostly uses individual heap addresses, which explains the intensive heap usage. Table 6.1 summarizes all the memory accesses of WFS along with the number of individual memory addresses used. This table confirms the discussed behaviour of wav_store, as we can see that its number of individual memory addresses for accessing the main memory is considerably higher compared with the other functions. These results were recorded with a time slice length of 500 instructions, i.e. they describe almost completely the actual behaviour of WFS w.r.t. its memory usage. Nevertheless,

(a) Near the half of execution time (b) Near the end of execution time

Figure 6.3: Snapshots of the heap memory usage of WFS for the second half of the execution time

using a larger time slice, and thus reducing analysis time and disk space usage, can still give a reliable average of the memory usage1. Choosing a too-large time slice should be avoided, however, as too much information may then be lost.

Table 6.1: Memory usage statistics for the hArtes WFS application (addresses = number of individual addresses used; accesses = total number of accesses). The values in this table were recorded with an analysis based on a time slice of 500 instruction cycles.

                              stack                 heap                  data
kernel                  addresses  accesses   addresses  accesses   addresses  accesses
wav_store                   38     3160179      35935     291844       537     131932
fft1d                       33     2192660          6         13         9         59
DelayLine_processChunk      41      893129          3          5         9         25
bitrev                      26      922918          7      32443        10      64303
zeroRealVec                 19      324462          3        281         7        504
AudioIo_setFrames           14         665         32         32         5         12
perm                        22      126662          5         17        10         50
cadd                        24       86742          5      16155        10      32378
cmult                       24      123200          6      16173        10      32251
Filter_process              20       81054          5         19         9         47

Interesting to see in Table 6.1 is the behaviour of the AudioIo_setFrames function. This function is responsible for copying interleaved audio signal parts into the corresponding audio frames in memory. As explained earlier, this study was conducted assuming thirty-two secondary sources (i.e. an array of 32 loudspeakers). Hence, AudioIo_setFrames needs 32 distinct addresses to accomplish its job. Besides the heap, inspecting the local memory usage of an application can also give interesting insights. This information is also summarized in Table 6.1. However, the information in

1The produced memory map file for WFS with a slice of 500 instructions is almost 50 MB. In the case of storing each instruction, the produced file can grow into the GB range. It is therefore advisable to choose a larger time slice.

Table 6.1 can be misleading, as it does not take into account the number of times that a specific function is called. Therefore, WFS was also compiled with profiling information to make possible the use of the gprof general profiler. The results of executing gprof on WFS are summarized in Table 6.2. Here, wav_store and fft1d are the top two kernels: together they take approximately sixty percent of the whole execution time of the application. Furthermore, Table 6.2 does report the number of times a specific function is invoked. At first sight, in Table 6.1, the bitrev function shows a high frequency of stack usage. However, this function is called over 2 million times, which reduces the number of local memory accesses to only a few per call, as the sketch below illustrates. To gain this insight at a higher level of abstraction, i.e. without diving directly into the numeric values of memory usage, the visualization feature can again be used.
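The per-call normalization argued for above can be reproduced directly from the two tables. The sketch below transcribes the stack access counts of Table 6.1 and the call counts of Table 6.2 (the underscored function names are assumed spellings):

```python
# Stack accesses (Table 6.1) and call counts (Table 6.2) for four kernels.
stack_accesses = {"wav_store": 3160179, "fft1d": 2192660,
                  "DelayLine_processChunk": 893129, "bitrev": 922918}
calls = {"wav_store": 1, "fft1d": 984,
         "DelayLine_processChunk": 493, "bitrev": 2015232}

for kernel, accesses in stack_accesses.items():
    print(f"{kernel:24s} {accesses / calls[kernel]:12.2f} stack accesses per call")
```

With the sampled counts of Table 6.1, bitrev averages out to well under one recorded stack access per call, in sharp contrast to its dominant aggregate count.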

Table 6.2: Flat profile for the hArtes WFS application.

kernel                   %time   self seconds     calls   self ms/call   total ms/call
wav_store                31.91       0.28             1       277.25         277.25
fft1d                    28.23       0.25           984         0.25           0.25
DelayLine_processChunk   14.23       0.12           493         0.25           0.38
bitrev                    8.19       0.07       2015232         0.00           0.00
zeroRealVec               7.44       0.06         15782         0.00           0.00
AudioIo_setFrames         4.01       0.03           493         0.07           0.07
perm                      2.07       0.02           984         0.02           0.09
cadd                      0.79       0.01       1009664         0.00           0.00
cmult                     0.73       0.01       1009664         0.00           0.00
Filter_process            0.71       0.01           493         0.01           0.73
wav_load                  0.44       0.00             1         3.80           3.80
Filter_process_pre        0.35       0.00           493         0.01           0.35
zeroCplxVec               0.28       0.00           495         0.00           0.00
r2c                       0.16       0.00           490         0.00           0.00
c2r                       0.14       0.00           493         0.00           0.00
AudioIo_getFrames         0.14       0.00           489         0.00           0.00
ffw                       0.08       0.00             2         0.35           0.35
vsmult2d                  0.02       0.00          7026         0.00           0.00
calculateGainPQ           0.02       0.00          6994         0.00           0.00
PrimarySource_deriveTP    0.02       0.00           236         0.00           0.00
ldint                     0.01       0.00             1         0.10           0.10

Figure 6.4 shows a typical situation at the start of the WFS application, where most of the stack usage is accounted for by a recurring sequence of the functions bitrev, perm, fft1d, and DelayLine_processChunk. Snapshots were also taken at approximately half of the execution time for the same functions. As Figure 6.5 shows, the memory behaviour of these functions remains the same. This holds for the entire execution of the application, and also for most other functions, especially those responsible for mathematical operations or preliminary work on the audio frames, which mostly perform the same operations on each function call. The last execution phase of the discussed functions shows a memory access behaviour, depicted in Figure 6.6, which is consistent with the behaviour of these functions in the other two cases. However, as we explained before through the usage of the heap memory

(a) bitrev (b) perm

(c) fft1d (d) DelayLine_processChunk

Figure 6.4: Snapshots of the stack memory usage of WFS at the beginning of the execution time

visualization, the last active function producing interesting memory usage in the program is wav_store. In Figure 6.6 it becomes apparent that this function exhibits an intensive memory usage behaviour for the local memory as well.

6.4 Ranking of Memory Intensive WFS Kernels

We also tried to quantify the time spent on memory operations relative to the time spent on computations. The gprof profiler provides a cumulative time for each kernel, but fails to distinguish between time spent on computations and time spent on memory operations. Selecting potential candidates for hardware implementation based only on computational intensity cannot be regarded as an accurate approach in the context of reconfigurable computing systems. The reason for this is the large amount of communication data that the system must provide to the reconfigurable devices; see Chapter 2 for more information. Finding the time that is spent on a memory access is an arduous task, as this time depends extensively on the intrinsic nature of the underlying platform. This becomes even harder when operating on CISC systems. As no easy way exists for estimating the

(a) bitrev (b) perm

(c) fft1d (d) DelayLine_processChunk

Figure 6.5: Snapshots of the stack memory usage of WFS at approximately half of the execution time

time taken for a memory access, we developed a ranking system to help estimate the time spent on memory accesses relative to the time spent on calculations. Under the assumption that the proportionality relations2

\[ \text{number of memory accesses} \sim \text{time}, \qquad \text{total cumulative time} \sim \text{time} \]

hold, we can define an index that discriminates between kernels by their memory intensiveness:

\[ \text{Index} = \frac{\text{number of memory accesses}}{\text{total cumulative time}}. \]

The total cumulative time is retrieved from the gprof profile in Table 6.2, while the memory access data is listed in Table 6.1. In this case, the stack accesses of the kernels are not taken into account, as most of the time penalty is expected to come from accessing the heap and data segments of the memory subsystem. In Table 6.3, the Memory Accesses

2Given two variables y and x, y ∼ x denotes the proportionality relation between these two variables.

(a) bitrev (b) fft1d

(c) DelayLine_processChunk (d) wav_store

Figure 6.6: Snapshots of the stack memory usage of WFS near the end of the execution time

column reports the sum of the heap and data memory accesses, while the gprof Time column is the cumulative time spent executing a kernel (see Table 6.2). The Index column indicates the memory intensiveness of the kernel relative to the cumulative time spent in the kernel.
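As a sanity check, the Index column of Table 6.3 can be recomputed from the two measured quantities. The sketch below transcribes the heap and data access counts from Table 6.1 and the gprof times from Table 6.2 (the underscored names are assumed spellings):

```python
# (heap + data accesses from Table 6.1, gprof time in seconds from Table 6.2)
kernels = {
    "wav_store":              (291844 + 131932, 0.28),
    "bitrev":                 (32443 + 64303,   0.07),
    "zeroRealVec":            (281 + 504,       0.06),
    "AudioIo_setFrames":      (32 + 12,         0.03),
    "fft1d":                  (13 + 59,         0.25),
    "DelayLine_processChunk": (5 + 25,          0.12),
}

# Index = memory accesses / total cumulative time, sorted in descending order.
for index, name in sorted(((acc / t, n) for n, (acc, t) in kernels.items()),
                          reverse=True):
    print(f"{name:24s} index = {index:9.0f}")
```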

Table 6.3: Rank of WFS kernels

Kernel                   Number of Memory Accesses   gprof Time (s)     Index
wav_store                              423776              0.28       1513486
bitrev                                  96746              0.07       1382086
zeroRealVec                               785              0.06         13083
AudioIo_setFrames                          44              0.03          1467
fft1d                                      72              0.25           288
DelayLine_processChunk                     30              0.12           250

Table 6.3 sorts the kernels from the most memory-intensive to the least memory-intensive, based on the calculated Index. As expected, wav_store is still ranked as the most expensive kernel in terms of memory usage. Interestingly, fft1d and DelayLine_processChunk, which ranked as the second and third most expensive kernels according to the gprof profiling information, drop to the bottom of this list when we consider our indexing scheme.

This means that the time spent on memory accesses, relative to the total cumulative time spent in these kernels, is not dominating, which makes them good candidates for hardware implementation. This is also confirmed by Table 6.1, where fft1d and DelayLine_processChunk show an intensive behaviour on local memory and rarely access the main memory. However, to effectively take advantage of a hardware implementation of these two kernels, their input buffers should also be implemented on chip.

6.5 Conclusion

This chapter discussed the usage of the xQUAD memory analysis tool applied to the Wave Field Synthesis audio processing application. By producing a memory map of the application, interesting information can be gained about its memory usage behaviour. Visualizing this memory map revealed that towards the end of the application's execution time, a high intensity of main memory address usage occurs. By parsing the memory map file, statistical information about the memory usage of the application could be gathered, which showed that this intensive memory usage was due to a single specific function. Furthermore, it was shown how, by means of visualization, a recurrent pattern could be revealed in the usage of local memory addresses. Finally, we presented an indexing system that permitted us to rank the kernels based on their memory usage. This revealed that the most computationally intensive kernel (wav_store) also shows the most memory-intensive behaviour among all kernels, whereas the second and third most computationally intensive kernels do not, which would make them good candidates for hardware implementation. In future work, the analysis conducted here can be extended and validated through other analyses, possibly by coupling the xQUAD tool with the QUAD and tQUAD tools described in Chapter 4.

7 Conclusion and Future Work

In this chapter a conclusion is drawn about the work in this thesis. Furthermore, some suggestions for future work are presented.

7.1 Conclusion

The primary obstacle to improving overall computing system performance arises from the communication bottleneck between a (general-purpose) processor and the memory subsystem. This bottleneck is even more evident with the introduction of heterogeneous architectures containing reconfigurable devices. To alleviate this problem, a thorough analysis of the memory access patterns of an application is of vital importance. The aim of this thesis project was the development of a detailed memory access analyzer, xQUAD, which augments the analysis done by QUAD and tQUAD. QUAD, tQUAD, and xQUAD are intended to be an integral part of the dynamic profiling stage of the Delft Workbench and hArtes toolchains. These toolchains currently profile applications using a general profiler, which does not distinguish between computation time and memory access time; therefore, during this stage it is not possible to reveal possible memory-related bottlenecks of an application. Hence the need for these toolchains to include a dynamic memory access analysis tool. xQUAD is able to perform memory access analysis at two levels of detail. At the finest level, it analyzes the memory access behaviour of the individual variables inside a function. To produce more readable data, xQUAD augments the memory analysis with source-level information. Furthermore, xQUAD is able to produce a global memory map of the application, where the data access patterns of each function are output to flat (text) report files. Detailed information on the memory access behaviour inside a function can deliver important information for fine-grained optimization and for discovering possible unusual behaviour of the data objects used in such a function; performing an analysis only at function granularity would hide the internal memory access behaviour of the function. xQUAD was furthermore tested with the Wave Field Synthesis application. In this case study, the memory access patterns of WFS were inspected, both by visualizing the memory access patterns and by collecting statistics on the memory usage. Visualizing the memory behaviour of WFS revealed that near the end of its execution time, WFS shows intensive main memory usage. This intensive memory usage turned out to be produced by a single specific function. Visualization of the local memory showed a recurrent memory access behaviour. Finally, an indexing system was presented, which permitted us to rank the kernels inside WFS.


7.2 Future Work

In this section, some directions for future research and improvements are presented. Due to a lack of time, it was not possible to perform more case studies. However, performing a set of case studies on several applications could reveal common characteristics in their memory access behaviour. It would be particularly interesting to perform case studies on a set of applications in the multimedia domain. Furthermore, the metric used in the case study described in Chapter 6 could be improved. At present, we know the number of accesses to local memory and the number of accesses to main memory; retrieving the number of cycles needed for a main memory access would permit us to separate the memory access time from the computation time. Finally, the xQUAD tool could be ported to other architectures and operating systems. Currently, the xQUAD tool is developed and tested on x86 architectures running a Linux operating system. Porting xQUAD to another architecture requires a Pin framework that supports that architecture. The xQUAD application is for the most part portable to other architectures, as the Pin framework will internally handle most of the conversion necessities. Nevertheless, xQUAD (and the QUAD toolset in general) uses some architecture-specific instructions, which would need to be changed accordingly. Porting xQUAD to other operating systems depends mainly on the object file format supported by the target OS. In case ELF is not supported, a new module for reading debugging information from the object file would be necessary.

Bibliography

[1] Fraunhofer Institute for Digital Media Technology, http://www.idmt.fraunhofer.de/eng/research_topics/wave_field_synthesis.htm.

[2] gcov, http://gcc.gnu.org/onlinedocs/gcc/Gcov.html.

[3] David Anderson, A consumer library interface to dwarf, (2010).

[4] David Andrews, Douglas Niehaus, and Peter Ashenden, Programming models for hybrid cpu/fpga chips, Computer 37 (2004), no. 1, 118–120.

[5] A. J. Berkhout, D. de Vries, and P. Vogel, Acoustic control by wave field synthesis, The Journal of the Acoustical Society of America 93 (1993), no. 5, 2764–2778.

[6] K.L.M. Bertels, G.K. Kuzmanov, E. Moscu Panainte, G. N. Gaydadjiev, Y. D. Yankova, V.M. Sima, K. Sigdel, R. J. Meeuws, and S. Vassiliadis, Profiling, compilation, and hdl generation within the hartes project, FPGAs and Reconfigurable Systems: Adaptive Heterogeneous Systems-on-Chip and European Dimensions (DATE 07 Workshop), April 2007, pp. 53–62.

[7] K.L.M. Bertels, S. Vassiliadis, E. Moscu Panainte, Y. D. Yankova, C. Galuzzi, R. Chaves, and G.K. Kuzmanov, Developing applications for polymorphic processors: the delft workbench, (2006), 7.

[8] Tony M. Brewer, Instruction set innovations for the convey hc-1 computer, IEEE Micro 30 (2010), 70–79.

[9] Derek Bruening, Timothy Garnett, and Saman Amarasinghe, An infrastructure for adaptive dynamic optimization, CGO ’03: Proceedings of the international symposium on Code generation and optimization (Washington, DC, USA), IEEE Computer Society, 2003, pp. 265–275.

[10] A.N.M. Imroz Choudhury, Kristin C. Potter, and Steven G. Parker, Interactive visualization for memory reference traces, Computer Graphics Forum 27 (2008), no. 3, 815–822.

[11] Katherine Compton and Scott Hauck, Reconfigurable computing: a survey of systems and software, ACM Comput. Surv. 34 (2002), no. 2, 171–210.

[12] Giovanni De Micheli and Rajesh K. Gupta, Hardware/software co-design, (2002), 30–44.

[13] O. Brewer, J. Dongarra, and D. Sorensen, Tools to aid in the analysis of memory access patterns for fortran programs, Parallel Computing 9 (1988), no. 1, 25–35.

[14] Michael J. Eager, Introduction to the dwarf debugging format, (2007).


[15] Karl-Filip Faxén, Konstantin Popov, Sverker Jansson, and Lars Albertsson, Embla - data dependence profiling for parallel programming, CISIS ’08: Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems (Washington, DC, USA), IEEE Computer Society, 2008, pp. 780–785.

[16] C. Galuzzi and K.L.M. Bertels, A framework for the automatic generation of instruction-set extensions for reconfigurable architectures, International Workshop on Applied Reconfigurable Computing (ARC), March 2008, pp. 280–286.

[17] Susan L. Graham, Peter B. Kessler, and Marshall K. Mckusick, Gprof: A call graph execution profiler, SIGPLAN Not. 17 (1982), no. 6, 120–126.

[18] S. Hauck, T. W. Fry, M. M. Hosler, and J. P. Kao, The chimaera reconfigurable functional unit, FCCM ’97: Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines (Washington, DC, USA), IEEE Computer Society, 1997, p. 87.

[19] Scott Hauck and Andre DeHon, Reconfigurable computing: The theory and practice of fpga-based computation, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2007.

[20] John R. Hauser and John Wawrzynek, Garp: A mips processor with a reconfigurable coprocessor, 1997, pp. 12–21.

[21] John L. Hennessy and David A. Patterson, Computer architecture: a quantitative approach (the morgan kaufmann series in computer architecture and design), Morgan Kaufmann, May 2007.

[22] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood, Pin: building customized program analysis tools with dynamic instrumentation, Programming Language Design and Implementation, ACM Press, 2005, pp. 190–200.

[23] William H. Mangione-Smith, Brad Hutchings, David Andrews, André DeHon, Carl Ebeling, Reiner Hartenstein, Oskar Mencer, John Morris, Krishna Palem, Viktor K. Prasanna, and Henk A. E. Spaanenburg, Seeking solutions in configurable computing, Computer 30 (1997), no. 12, 38–43.

[24] Margaret Martonosi, Anoop Gupta, and Thomas Anderson, Memspy: analyzing memory system bottlenecks in programs, SIGMETRICS Perform. Eval. Rev. 20 (1992), no. 1, 1–12.

[25] R. J. Meeuws, K. Sigdel, Y. D. Yankova, and K.L.M. Bertels, High level quantitative interconnect estimation for early design space exploration, Proc. of ICFPT ’08, December 2008, pp. 317–320.

[26] Bingfeng Mei, Serge Vernalde, Diederik Verkest, Hugo De Man, and Rudy Lauwereins, Adres: An architecture with tightly coupled vliw processor and coarse-grained reconfigurable matrix, 2003, pp. 61–70.

[27] Nicholas Nethercote and Julian Seward, Valgrind: a framework for heavyweight dynamic binary instrumentation, PLDI ’07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation (New York, NY, USA), ACM, 2007, pp. 89–100.

[28] S. A. Ostadzadeh, R. J. Meeuws, K. Sigdel, and K.L.M. Bertels, A multipurpose clustering algorithm for task partitioning in multicore reconfigurable systems, Proc. of CISIS, 2009, pp. 663–668.

[29] S. Arash Ostadzadeh, Marco Corina, Carlo Galuzzi, and Koen Bertels, tquad - memory bandwidth usage analysis, 2010.

[30] S. Arash Ostadzadeh, Roel Meeuws, Carlo Galuzzi, and Koen Bertels, Quad - a memory access pattern analyser, Proc. of ARC 2010, 2010, pp. 269–281.

[31] S. Arash Ostadzadeh, Roel J. Meeuws, Kamana Sigdel, and Koen Bertels, A clustering framework for task partitioning based on function-level data usage analysis, Proc. of FPGA ’09, 2009, pp. 279–279.

[32] E. Moscu Panainte, K.L.M. Bertels, and S. Vassiliadis, The molen compiler for reconfigurable processors, ACM Transactions in Embedded Computing Systems (TECS) (2007).

[33] Preeti Ranjan Panda, Nikil D. Dutt, and Alexandru Nicolau, On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems, ACM Trans. Des. Autom. Electron. Syst. 5 (2000), no. 3, 682–704.

[34] Parthasarathy Ranganathan, Sarita Adve, and Norman P. Jouppi, Performance of image and video processing with general-purpose processors and media isa extensions, International Symposium on Computer Architecture (1999), 124.

[35] Amar Shan, Heterogeneous processing: a strategy for augmenting moore’s law, Linux J. 2006 (2006), no. 142, 7.

[36] G. Theile and H. Wittek, Wave field synthesis: A promising spatial audio rendering concept, Acoustical Science and Technology (2004), 393–399.

[37] T.J. Todman, G.A. Constantinides, S.J.E. Wilton, O. Mencer, W. Luk, and P.Y.K. Cheung, Reconfigurable computing: architectures and design methods, IEE Proceedings - Computers and Digital Techniques 152 (2005), no. 2, 193–207.

[38] Tools Interface Standards Committee, Executable and linkable format (ELF).

[39] Unix International, Inc., Dwarf debugging information format, version 2, (1993).

[40] Peter van der Linden, Expert c programming: deep c secrets, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1994.

[41] S. Vassiliadis, G. N. Gaydadjiev, K.L.M. Bertels, and E. Moscu Panainte, The molen programming paradigm, Proceedings of the Third International Workshop on Systems, Architectures, Modeling, and Simulation, July 2003, pp. 1–10.

[42] S. Vassiliadis, S. Wong, G. N. Gaydadjiev, K.L.M. Bertels, G.K. Kuzmanov, and E. Moscu Panainte, The molen polymorphic processor, IEEE Transactions on Computers (2004), 1363–1375.

[43] Guru Venkataramani, Ioannis Doudalis, Yan Solihin, and Milos Prvulovic, Memtracker: an accelerator for memory debugging and monitoring, ACM Trans. Archit. Code Optim. 6 (2009), no. 2, 1–33.

[44] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström, The worst-case execution-time problem—overview of methods and survey of tools, ACM Trans. Embed. Comput. Syst. 7 (2008), no. 3, 1–53.

[45] Wayne Wolf, Computers as components: principles of embedded computing system design, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.

[46] Rong Yan and Seth Copen Goldstein, Mobile memory: Improving memory locality in very large reconfigurable fabrics, FCCM, 2002, pp. 195–204.

[47] Y. D. Yankova, G.K. Kuzmanov, K.L.M. Bertels, G. N. Gaydadjiev, Y. Lu, and S. Vassiliadis, Dwarv: Delftworkbench automated reconfigurable vhdl generator, Proceedings of the 17th International Conference on Field Programmable Logic and Applications (FPL07), August 2007, pp. 697–701.