
WASHINGTON UNIVERSITY IN ST. LOUIS
Department of Computer Science & Engineering

Dissertation Examination Committee:
Roger D. Chamberlain, Chair
Kunal Agrawal
Ron K. Cytron
Viktor Gruev
Krishna Kavi
Hiro Mukai

Application-Specific Memory Subsystems by Joseph G. Wingbermuehle

A dissertation presented to the Graduate School of Arts and Sciences of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

May 2015
St. Louis, Missouri

© 2015, Joseph G. Wingbermuehle

Table of Contents

List of Figures
List of Tables
Acknowledgments
Abstract

Chapter 1: Introduction
  1.1 Research Questions
  1.2 Contributions
  1.3 Outline

Chapter 2: Background and Related Work
  2.1 On-Chip Memory
  2.2 Off-Chip Memory
    2.2.1 DRAM
    2.2.2 Phase-Change Memory
    2.2.3 Flash
    2.2.4 STT-RAM
  2.3 Memory Components
    2.3.1 Caches
    2.3.2 Scratchpads
    2.3.3 Prefetchers
    2.3.4 Splits
    2.3.5 Address Transformations
  2.4 Related Work
    2.4.1 Superoptimization
    2.4.2 Design Space Exploration
    2.4.3 Software Techniques for Improving Memory Behavior
    2.4.4 Tuning Cache Parameters
    2.4.5 Non-traditional Memory Subsystems
    2.4.6 Memory Interfaces

Chapter 3: Tools
  3.1 ScalaPipe
    3.1.1 Kernel DSL
    3.1.2 Application DSL
  3.2 Memory Simulator
  3.3 Memory Superoptimizer
  3.4 Memory Generator

Chapter 4: Superoptimization of Memory Subsystems
  4.1 Introduction
  4.2 Method
    4.2.1 Address Traces
    4.2.2 Simulation
    4.2.3 Optimization
    4.2.4 Neighborhood Generation
    4.2.5 Offset Selection Heuristic
    4.2.6 Model Validation
  4.3 Benchmarks
  4.4 Minimizing Total Access Time
    4.4.1 FPGA Results
    4.4.2 ASIC Results
    4.4.3 Memory Subsystem Specificity
  4.5 Minimizing Writes
    4.5.1 Motivation
    4.5.2 Results
  4.6 Multi-Objective Superoptimization
  4.7 Summary

Chapter 5: Memory Subsystems for Streaming Applications
  5.1 Introduction
  5.2 Method
    5.2.1 Address Traces
    5.2.2 Simulation
    5.2.3 Optimization
    5.2.4 Subsystem Generation
  5.3 Benchmarks
  5.4 Results
    5.4.1 Input Specificity
    5.4.2 Discussion
  5.5 Summary

Chapter 6: A Model for Faster Superoptimization of Streaming Applications
  6.1 Introduction
  6.2 Method
  6.3 Model Error
  6.4 Benchmarks
  6.5 Evaluation
    6.5.1 Subsystem Performance
    6.5.2 Superoptimization Run Time
  6.6 Summary

Chapter 7: Conclusion
  7.1 Future Work

References

Appendix A: ScalaPipe
  A.1 Kernel DSL
    A.1.1 Language Features
    A.1.2 Example
    A.1.3 Intermediate Representation
    A.1.4 Code Generation
    A.1.5 Optimizations
  A.2 Application DSL
    A.2.1 Overview
    A.2.2 Resource Mapping
    A.2.3 Example
    A.2.4 TimeTrial

List of Figures

2.1 Main Memory Layout
2.2 Main Memory Addressing
2.3 DRAM Cell
3.1 Simple ScalaPipe Kernel
3.2 Generic Split Kernel
3.3 Averaging Application
3.4 Example Memory Description
4.1 Working-Set Sizes
4.2 Best-case FPGA Speedup
4.3 Realized FPGA Speedup
4.4 Superoptimized Memory Subsystems for the FPGA Target
4.5 Best-case ASIC Speedup
4.6 Realized ASIC Speedup
4.7 Superoptimized Memory Subsystems for the ASIC Target
4.8 FPGA Subsystem Specificity
4.9 ASIC Subsystem Specificity
4.10 Speedup with Different Inputs
4.11 Write and Access Time Improvement
4.12 Superoptimized Memory Subsystems for bitcount
4.13 Superoptimized Memory Subsystems for dijkstra
4.14 Superoptimized Memory Subsystems for heap
4.15 Superoptimized Memory Subsystems for jpegd
4.16 Superoptimized Memory Subsystems for patricia
4.17 Superoptimized Memory Subsystems for qsort
4.18 Multi-Objective Superoptimization
4.19 Memory Subsystems for jpegd
5.1 Split-Join Topology
5.2 merge Topology
5.3 nbody Topology
5.4 laplace Topology
5.5 mm Topology
5.6 median Topology
5.7 Simulated Speedup
5.8 Actual Speedup
5.9 Subsystem for the Hash Kernel
5.10 Subsystem for the Heap Kernel
5.11 Subsystem for the Distribute Kernel
5.12 Subsystem for the Buffer Kernel
5.13 Subsystem for the Streamer Kernel
5.14 Subsystem Specificity
6.1 Simple Application
6.2 Example Topology
6.3 Simulation Algorithm
6.4 Superoptimization Algorithm
6.5 Speedup
6.6 Subsystem for the Heap Kernel
6.7 Subsystem for the Hash Kernel (Full)
6.8 Subsystem for the Hash Kernel (Model)
6.9 Subsystems for the Distribute Kernel
6.10 Subsystems for the Buffer Kernel
6.11 Subsystems for the Streamer Kernel
6.12 Simulations Required for Superoptimization
A.1 Example Kernel
A.2 Mersenne Twister Kernel
A.3 ScalaPipe Fibonacci Kernel
A.4 Intermediate Representation of the Fibonacci Kernel
A.5 Optimized Fibonacci Kernel
A.6 Example Application

List of Tables

2.1 Main Memory Parameters
3.1 Memory Subsystem Components
4.1 Main Memory Parameters
5.1 Main Memory Parameters
5.2 laplace FIFO Implementations
6.1 Model Parameters
6.2 Laplace FIFO Comparison
6.3 Matrix-Matrix Multiply FIFO Comparison

Acknowledgments

I would like to thank my adviser, Dr. Roger D. Chamberlain, for his excellent guidance. Our many discussions, his detailed feedback on manuscripts, and general advice have been invaluable to me. I am extremely grateful for his support.

I would like to thank my co-adviser Dr. Ron K. Cytron. His wealth of ideas and encouragement have proven indispensable in helping me complete my dissertation and I appreciate them greatly.

I am grateful for the financial support provided by NSF awards CNS-09095368 and CNS-0931693 as well as the financial support provided by Exegy Inc. and VelociData Inc. This support allowed me to focus exclusively on my research.

I would like to thank the members of my dissertation committee, whose valuable feedback helped define and focus my research.

I would like to thank my parents, George and Elaine Wingbermuehle, for their encouragement and support.

Finally, I would like to thank my partner, Ryan Richt, for his awesome ideas, unparalleled patience, and unwavering support throughout my time as a graduate student.

Joseph G. Wingbermuehle

Washington University in St. Louis May 2015

ABSTRACT OF THE DISSERTATION

Application-Specific Memory Subsystems
by
Joseph G. Wingbermuehle
Doctor of Philosophy in Computer Science
Washington University in St. Louis, 2015
Professor Roger D. Chamberlain, Chair

The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose in that they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for a particular application.

Although general-purpose memory subsystems are desirable when the workload is unknown and the memory subsystem must remain fixed, when this is not the case a custom memory subsystem may be beneficial. For example, in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application, a custom memory subsystem optimized for that application would be desirable. In addition, when there are tunable parameters in the memory subsystem, it may make sense to change these parameters depending on the application being run. Such a situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will begin to support greater flexibility in the memory subsystem in the future.

In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. In addition, we show a way to discover such memory subsystems automatically using a superoptimization technique on memory address traces gathered from applications. This allows one to generate a custom memory subsystem with little effort.

We next show that our memory subsystem superoptimization technique can be used to optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to the main memory, which can be useful for main memories with limited write durability, such as flash or Phase-Change Memory (PCM).

Finally, we show how to superoptimize memory subsystems for streaming applications, which are a class of parallel applications. In particular, we show that, through the use of ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we are able to demonstrate actual performance improvements using the superoptimized memory subsystem with applications implemented in hardware.

Chapter 1: Introduction

As processors become faster and more numerous, memory access time is increasingly becoming the biggest bottleneck for many applications [80, 127]. To combat this performance gap between processing engines and main memory, modern computers employ large cache hierarchies. This situation has advanced to the point where 40% to 50% of the area [12] and up to 75% of the power budget [118] of a modern processor is dedicated to caching.

By exploiting both temporal and spatial localities in memory references, cache hierarchies are able to reduce the number of accesses to main memory, and, therefore, reduce memory access time. In this way, cache hierarchies are often able to greatly improve the performance of applications, explaining their prevalence [103]. However, in many cases, the application must be modified to expose locality to the cache hierarchy [10, 36, 67, 100]. In addition, the best cache parameters for one application are not necessarily ideal for all applications [70, 79]. Finally, certain classes of applications have little or no locality to exploit.

Although cache hierarchies are ubiquitous in general-purpose computers today, other types of memory components could also be considered. Indeed, modern processors often include other components, such as prefetchers [30, 53]. Also, scratchpads [5] are common in embedded systems. This leads us to the notion of a generalized memory subsystem. A generalized memory subsystem could contain caches, prefetchers, scratchpads, and possibly other components, with the goal of providing some form of improvement over direct access to main memory. Thus, here we define a memory subsystem as an on-chip memory that sits between a computation unit (such as a CPU, GPU, or FPGA) and off-chip main memory.

To provide a motivating example, consider matrix-matrix multiplication. Performing matrix-matrix multiplication is an important step in many applications. Unfortunately, matrix-matrix multiplication is computationally intensive for large matrices. Worse, a naive implementation typically has extremely poor cache performance. For these reasons, matrix-matrix multiplication has been a popular choice in benchmarks, such as LINPACK [33], and scientific libraries, such as BLAS [32], for many years.

Due to the need for fast matrix-matrix multiplication, the problem is well-studied [37, 43] and cache-efficient algorithms exist. However, these cache-efficient algorithms are more difficult to implement than the naive algorithm. Further, the techniques used to improve the access patterns of matrix-matrix multiplication do not generalize to all problems, leaving us to start over as soon as we are presented with a new problem.

If we were to implement matrix-matrix multiplication on an FPGA without considering how it worked, a cache would be a likely memory subsystem choice. Unfortunately, due to the access patterns of a naive matrix-matrix multiplication implementation, a cache would provide only a limited benefit. One way to improve the situation would be to modify the algorithm to make better use of the cache at our disposal or tune the cache parameters to better accommodate the algorithm. However, if we extend our search to other memory subsystem components, we might arrive at a more appropriate memory subsystem without needing to change the algorithm. In addition, if we were able to perform this search automatically, such a technique could require very little effort on the part of the designer and would be applicable to a wide range of problems.

Because of the potential improvement that a custom memory subsystem may provide in terms of performance, energy, or other metrics, we propose the use of a memory subsystem tailored to a particular application. Such custom memory subsystems are already in wide use today in applications deployed on Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs) [5, 41, 94] as well as embedded systems in general [9]. Further, it is conceivable that general-purpose computer systems may one day be equipped with a more configurable memory subsystem if such reconfigurability provided enough of an advantage.

With a cache, one possible customization involves changing the cache parameters, such as the line size, associativity, or replacement policy. Selecting the optimal parameters for custom cache hierarchies is commonly done and remains an active area of research [39, 48, 56]. However, in our example application and in general, there is no reason to believe a traditional cache hierarchy would perform better than some other memory subsystem structure, such as a scratchpad.

Given our hypothetical matrix-matrix multiply application to be deployed on an FPGA, the person tasked with the design of the memory subsystem might select a small set of likely candidate designs and then perform some number of simulations to tune the designs and select the best. Unfortunately, this process is labor intensive for the designer and, worse, it is possible that the optimal design is not even considered. Ideally, this process could be automated in a way that provides a custom design beyond a fixed candidate memory structure. Therefore, our goal is to start with an empty memory subsystem, and add caches, scratchpads, address transformations, splits, and other components to the memory subsystem to arrive at an optimal memory subsystem, given the memory subsystem components at our disposal.

Using the techniques described in this work, we are able to design custom memory subsystems for applications, such as matrix-matrix multiply, that can outperform generic memory subsystems such as cache hierarchies. For matrix-matrix multiply, one of the best-performing memory subsystems discovered by the work presented here contains not only a cache, but also address transformations to “transpose” one of the matrices (described in detail in Chapter 4).

This research draws motivation from superoptimization, which was introduced with the goal of finding the smallest instruction sequence to implement a function [76]. Superoptimization differs from traditional program optimization in that superoptimization attempts to find the best sequence of instructions to implement a particular function at the expense of a potentially long search process rather than simply improving upon an existing sequence of instructions using a brief transformation process.

Traditionally, superoptimizers have used exhaustive search; however, as the search space gets larger, exhaustive search becomes prohibitively time-consuming. To address this issue, the notion of stochastic superoptimization [99] was introduced. Using stochastic superoptimization, one is able to discover larger instruction sequences; however, we lose the guarantee of finding the best instruction sequence in finite time. Fortunately, in practice stochastic superoptimization provides good results.

In this work we are concerned not with optimal instruction sequences, but instead with optimal memory subsystems. Therefore, although historically superoptimization has been defined as the search for the optimal code sequence to implement a function, here we generalize the definition as follows:

Superoptimization is the search for a near-optimal design solution with little structural restriction at the expense of substantial search time.

Thus, as an example, with traditional superoptimization all combinations of instructions are considered rather than only those sequences that a particular compiler knows how to generate. For our purposes, we consider all memory subsystem components that the superoptimizer is capable of considering rather than a fixed memory structure.

Our initial investigation focuses on the discovery of application-specific memory subsystems providing the lowest possible execution time for single-threaded applications. To that end, using a memory address trace from the application, we use a stochastic superoptimization technique to discover a suitable memory subsystem. The discovered memory subsystems can be very unusual, but always provide performance at least as good as a traditional cache and usually better.

We also show that it is possible to superoptimize a memory subsystem for objectives other than performance. In particular, we show that it is possible to reduce writes to main memory. Such an objective is important for certain types of memory technologies whose lifetime is limited by the number of writes, such as flash [11] and Phase-Change Memory (PCM) [126].

Because modern computer systems are becoming increasingly parallel, we next investigate the use of application-specific memory subsystems for parallel applications. In particular, we focus on streaming applications, which are a class of parallel applications that are particularly well-suited for implementation on ASICs, on FPGAs, and in heterogeneous hardware settings [19]. Streaming applications provide several additional challenges for memory subsystem superoptimization, including the communication between kernels and the enormous search space. Nevertheless, using heuristics and a queuing model, we are able to superoptimize the memory subsystems for streaming applications in a reasonable amount of time.

1.1 Research Questions

In this dissertation we attempt to answer the following research questions:

• Can application-specific memory subsystems provide a performance improvement over general-purpose memory subsystems?

• Is it possible to discover automatically application-specific memory subsystems?

• Can application-specific memory subsystems be beneficial for other main memory tech- nologies, such as phase-change memory?

• Can application-specific memory subsystems be discovered for parallel applications?

• What can be done to reduce the time to find application-specific memory subsystems?

1.2 Contributions

To answer these research questions, this work makes the following contributions:

• ScalaPipe, which is a tool for generating streaming applications [120, 121].

• A tool to simulate quickly address traces using arbitrarily complex memory subsystems [122, 123].

• A method for the superoptimization of memory subsystems for single-threaded applications [122, 123].

• An evaluation of memory subsystems superoptimized to minimize writes to main memory as well as memory subsystems superoptimized for multiple objectives.

• A method for extending the superoptimization process to support the superoptimization of memory subsystems for streaming applications [124].

• A comparison of application-specific memory subsystems and general-purpose memory subsystems for applications implemented on an FPGA device [124].

6 • A queuing model to reduce the number of events that need to be simulated for the superoptimization of memory subsystems for streaming applications.

1.3 Outline

The remainder of this dissertation is organized as follows: Chapter 2 introduces background and related work. Chapter 3 describes the tools built to explore this area. Chapter 4 describes our superoptimization technique and how to apply it to simple single-threaded applications implemented in ASICs and FPGAs. Chapter 5 extends the superoptimization technique to a class of parallel applications and provides an evaluation of the technique for applications implemented on an FPGA device. Chapter 6 describes and evaluates a model to allow faster superoptimization of parallel applications. Finally, Chapter 7 provides conclusions and future work.

Chapter 2: Background and Related Work

In this chapter we provide a background for some of the concepts that we will use in later chapters. In particular, we provide an overview of both on-chip and off-chip memories. We then describe the various memory subsystem components that we consider for superoptimization. Finally, we present related work.

2.1 On-Chip Memory

In this work, it is useful to make a distinction between on-chip and off-chip memories. On-chip memory is memory that is present on the same die as the processing unit, where a processing unit might be a general-purpose processor, GPU, ASIC, or FPGA. As an example, on an FPGA, on-chip memory is typically available in the form of block RAM (BRAM). Off-chip memory, on the other hand, is memory that is physically separate from the processing unit, for example, the main memory in a general-purpose computer.

Because on-chip memory is co-located with the processing unit, it is typically much smaller than off-chip memory due to space limitations. However, on-chip memory is usually much faster than off-chip memory since there is no need to access a physically distant component. In addition, access to on-chip memory happens over separate data and address lines, which allows fast access and makes the interface to on-chip memory relatively simple compared to that of off-chip memories, which will be described in the next section. Due to these advantages, memory subsystems, such as caches and scratchpads, are typically implemented in on-chip memory.

Static random-access memory (SRAM) is the most common memory technology used with on-chip memories [55]. SRAM is a volatile memory in that it requires a source of power to maintain its contents. Further, SRAM uses a relatively large area, typically using six transistors per bit, and it is power-hungry. Nevertheless, SRAM is popular because it is very fast and it uses the same fabrication process as a typical processor.

Due to the prevalence of SRAM, we assume that all on-chip memory is implemented as SRAM for our experiments. Further, we will use only on-chip memory for the implementation of memory subsystem components, such as caches and scratchpads. Off-chip memory will be used exclusively for the main backing store that stores the whole memory image, which we call main memory.

2.2 Off-Chip Memory

Off-chip memory is memory that is physically separate from the processing unit. Since off-chip memory is located on a separate physical device, potentially spanning multiple devices, the off-chip memory can be larger than is possible with on-chip memory. However, this separation limits performance due to wire delays, large , and the fact that the physical pins of the device are used for multiple purposes to reduce pin count. Here we give only a high-level overview of the operation of a typical main memory implemented as off-chip memory. See [55] for a thorough treatment.

Figure 2.1 shows a typical main memory layout. Here, memory cells are arranged into rows (also known as pages) and columns. Each memory array is known as a bank, and multiple banks are combined to form ranks. In addition, multiple channels can be provided to allow greater parallelism in the main memory.

Figure 2.1: Main Memory Layout (ranks 0..n, each containing banks 0..m of rows and columns)

Figure 2.2 shows an example of how a memory address might be divided up to access main memory. In this example, there are four ranks selected using bits 29 and 30. Within each rank there are four banks selected using bits 14 and 15. Finally, within each bank there is a memory array consisting of 8192 rows and 512 columns. Each column is 8 bytes, or 64 bits, making each page 32,768 bits.

Bits    30..29   28..16   15..14   13..3    2..0
Field   Rank     Row      Bank     Column   Offset

Figure 2.2: Main Memory Addressing
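To make the field layout concrete, the following sketch decodes an address into the fields of Figure 2.2 using shifts and masks; the object and field names are illustrative, and the widths simply follow the figure.

// Decode a physical address into the fields of Figure 2.2 (illustrative sketch).
object AddressFields {
  case class Decoded(rank: Long, row: Long, bank: Long, column: Long, offset: Long)

  def decode(addr: Long): Decoded = Decoded(
    rank   = (addr >> 29) & 0x3,     // bits 30..29: one of 4 ranks
    row    = (addr >> 16) & 0x1FFF,  // bits 28..16: one of 8192 rows
    bank   = (addr >> 14) & 0x3,     // bits 15..14: one of 4 banks
    column = (addr >> 3)  & 0x7FF,   // bits 13..3: column within the row
    offset = addr & 0x7              // bits 2..0: byte within an 8-byte column
  )
}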

In the interest of keeping pin count down, the column and row addresses as well as the data are multiplexed over the same physical pins of the device. Thus, to access a word, first the row address is sent to the appropriate bank, which loads the row into the row buffer. Next, the column address is sent to the memory, which selects the portion of the row buffer to read or write. Note that for most devices, it is necessary to precharge the bitlines before selecting a row. This is necessary to prevent the wrong value from being read if the memory cells have a weak influence on the bitlines.

In a memory arrangement such as the one shown in Figure 2.1, each rank operates in lockstep. The banks, however, each have their own row buffer, allowing multiple requests to be serviced in parallel. Further, once loaded into the row buffer, it is possible to access multiple columns of the selected row without selecting a new row or precharging the bitlines. Keeping rows open in this fashion is known as open-page mode. Open-page mode has benefits when multiple accesses hit in the same row. On the other hand, closed-page mode is when the row is not held open. In particular, with closed-page mode the bitlines are precharged immediately after an access to prepare for the next access, which allows the device to access the next row more quickly. Thus, closed-page mode is faster if multiple hits in the same row are unlikely.

Another performance improvement that is common is reading bursts of data from the memory. Once the row buffer is loaded for a read or write, multiple words can be accessed at a time. This is known as a burst. For example, if the size of a word is 16 bits and the burst size is 4, every access will consist of 64 bits.

Most modern devices are synchronous rather than asynchronous. This means that all operations on the device are managed by a fixed clock. Further, double-data rate (DDR) devices are prevalent today. A double-data rate device transfers data on both the rising and falling clock edges, allowing it to transfer two times more data than the clock would otherwise indicate.

Parameter    Description
Frequency    DRAM I/O frequency
CAS          Cycles to select a column (column-address strobe)
RCD          Cycles from row select to access (RAS-CAS delay)
RP           Cycles required for precharge (RAS precharge)
Page size    Size of a page in bytes
Page count   Number of pages per bank
Width        Channel width in bytes
Burst size   Number of columns per access
Page mode    Open or closed page mode
DDR          Double data rate

Table 2.1: Main Memory Parameters

Modeling of an off-chip memory device requires consideration of several important timing parameters. These timing parameters are summarized in Table 2.1. Note that there are more timing parameters to consider, especially for modern, high-speed devices. Such parameters are essential for correct operation when designing a memory controller, but for our purposes we consider this simplified model.

In Table 2.1, the frequency is the I/O frequency of the device (assuming synchronous operation). The CAS (column-address strobe) latency is the number of cycles required between selecting a column and reading the data. The RCD (RAS-CAS delay) latency is the number of cycles required between selecting a row and selecting a column (note that RAS stands for row-address strobe). The RP (RAS precharge) latency is the number of cycles required to precharge a row. Finally, the page size is the size of each row and the page count is the number of pages per bank.
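As a rough illustration of how these parameters interact, the sketch below estimates the latency of a single read in device cycles under the simplified model above: a row hit in open-page mode pays only the CAS latency, a closed-page access pays RCD plus CAS, and an open-page row miss must additionally precharge. The function and its parameters mirror Table 2.1 but are otherwise illustrative; burst transfer time and refresh are ignored.

// Approximate read latency (in device cycles) from the Table 2.1 parameters.
// Refresh, contention, and burst transfer time are ignored in this sketch.
object ReadLatency {
  def cycles(cas: Int, rcd: Int, rp: Int, openPage: Boolean, rowHit: Boolean): Int =
    if (openPage && rowHit) cas            // row already held open in the row buffer
    else if (openPage)      rp + rcd + cas // must precharge, select the row, then the column
    else                    rcd + cas      // closed-page: bitlines precharged after the last access
}

// For example, with CAS = RCD = RP = 5 cycles, an open-page hit costs 5 cycles,
// an open-page miss costs 15, and a closed-page access costs 10.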

Next we provide an overview of several competing technologies used to implement main memory cells.

Figure 2.3: DRAM Cell (word line and bit line)

2.2.1 DRAM

Due to its low cost and small size, dynamic random-access memory (DRAM) is the most common main memory technology in use today [98]. Because of its prevalence, for most of the experiments presented here we assume that the main memory is a DRAM device, though we note that our superoptimization technique is generic and could be used with another memory model.

DRAM works by storing charge in a capacitor. The typical DRAM cell consists of a single transistor and a capacitor, as shown in Figure 2.3. To access a word, first the bitlines are precharged to an intermediate value. This is done because the charge on the capacitors is weak; precharging to an intermediate value allows the device to detect whether the charge on the capacitor is pulling the bitline voltage up or down. Next, the row is selected, which activates all the transistors on the row, thereby connecting the capacitors in the row to the bitlines.

Selecting a row loads all the bits of the row into the sense amplifiers, which are integrated with the row buffer (note that there is a sense amplifier circuit for each bit in a row). The sense amplifiers detect if each bit is a one or a zero based on the effect of the capacitor on the precharged bitline. In addition to reading the value from the bitlines, the sense amplifier recharges the capacitor. Finally, once the requested row is loaded into the sense amplifiers, the column is selected.

13 Another aspect of a DRAM device is refresh. A DRAM device must be refreshed periodically (typically once every 64 milliseconds) to preserve the charge stored in the capacitors. The process of refreshing is accomplished by accessing every row of the device, which causes the capacitor to be recharged due to the design of the sense amplifier. Refresh can require a significant amount of time that could be used for memory accesses, especially with larger DRAM devices [109]. However, since we do not have control over refresh, we ignore it in our performance model.

DRAM has long been the technology of choice for main memories, but there is ample room for improvement. Three problems with DRAM include scaling, energy, and volatility. Because DRAM stores charge in a capacitor, there is a limit to scaling due to the ratio of the capacitance between the cell and the bitline [74]. In addition, the cell capacitor requires a continuous refresh, making DRAM devices volatile and inefficient from an energy perspective.

2.2.2 Phase-Change Memory

Phase-change memory (PCM) is a newer memory technology that is often cited as a potential replacement for DRAM [69, 93, 128, 135, 136]. Rather than storing charge in a capacitor as is the case with DRAM, PCM uses chalcogenide glass, whose state can be altered by heating it and then cooling it according to different temperature schedules [126]. When the chalcogenide glass is heated to a high temperature and cooled rapidly, it remains in an amorphous state, which has a high resistance. When heated just above the crystallization threshold and held at that temperature for a somewhat longer time, however, the chalcogenide glass stays in a crystalline state, which has a lower resistance. States between the two extremes are also possible, paving the way for multi-level cells (MLCs), which allow the storage of more than one bit per cell.

Some benefits of PCM over DRAM include the fact that PCM is non-volatile and that PCM scales better than DRAM, since PCM does not use a capacitor to store charge and it can store multiple bits per cell. Because PCM is non-volatile, it does not require a refresh, which is costly in terms of energy and performance, especially as main memories get larger. Unfortunately, both reads and writes to PCM require more energy and time than the equivalent access to DRAM. Further, PCM has a limited write endurance of around 10^8 write cycles per cell, which limits the lifetime of the part.

2.2.3 Flash

Flash memory [11, 49] is another non-volatile technology with some properties similar to PCM. Flash memory stores data by storing charge in a floating-gate transistor, whose floating gate is capable of storing charge for years. To introduce a charge to the floating gate, a high voltage is passed through the transistor, which causes some electrons to get trapped in the floating gate. To clear the charge, a high voltage in the opposite direction is applied. As with PCM, multi-level cells are possible with flash memory.

Unfortunately, like PCM, flash memory has a limited write endurance; however, whereas PCM has a write endurance of around 10^8 write cycles, flash has an endurance of only around 10^5 [69]. In addition, writing to flash is slow and requires a lot of energy.

Flash memory comes in two varieties: NAND and NOR. With NOR flash devices, each cell is connected to ground. This allows each bit to be written individually. However, the extra ground connection limits the density of NOR flash devices. Thus, NAND flash was developed, where several cells are connected in series. This makes for improved density, but requires that all transistors in the series be erased together. Despite this pitfall, NAND devices are more popular today [21].

Due to the high energy and time required to erase flash as well as its limited write endurance, flash is generally used as a disk replacement rather than a DRAM replacement. As a disk replacement, NAND flash has been extremely successful.

2.2.4 STT-RAM

Finally, we consider spin-transfer torque RAM (STT-RAM), which is an emerging memory technology. Like PCM, STT-RAM is often cited as being a replacement for DRAM [64, 128]. Unlike other technologies, STT-RAM stores data in a magnetic field. Although STT-RAM lacks the write endurance issues of PCM and flash, writes to STT-RAM are slow and consume significant energy.

2.3 Memory Components

Here we give a brief overview of some of the memory subsystem components that we will consider in later chapters. There is, of course, an endless supply of memory subsystem components that one could consider. In addition, there are more generic forms of the components that we describe here, which would provide more parameters to tune. Here we describe those components and parameters that we will consider in the superoptimization process. One could extend the superoptimizer to support a wider array of components and parameters.

2.3.1 Caches

Caches are small memories that are used to store recently used data from main memory [103]. To accomplish this, caches are typically organized into lines, such that each line can store some contiguous chunk of words from main memory. A typical line size is 64 bytes. Since the line could be associated with multiple addresses in main memory, the most significant bits of the address must also be stored with each line.

Associativity

A cache in which each line can be associated with any address in main memory is called a fully-associative cache. Unfortunately, looking up a data element in such a cache can be too time consuming and require a significant amount of hardware to implement. Therefore, caches are often set-associative. This means that some fixed number of lines are consulted for each memory reference rather than all of the lines of the cache. For example, in a cache that is 4-way set associative, each memory address can be stored in one of four possible lines in the cache. Finally, a direct-mapped cache is a cache in which each memory address can only be stored in one line in the cache. There are trade-offs between the associativity of the cache and the hit rate for the application [110]. Caches with lower associativity use fewer resources and are often faster than highly-associative caches.
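As an example of how a set-associative cache locates a line, the sketch below computes the set index and tag for an address, assuming the line size and set count are powers of two; the names are illustrative rather than taken from our tools.

// Compute the set index and tag for an address in a set-associative cache.
// Assumes lineSize and numSets are powers of two (illustrative sketch).
object CacheIndexing {
  def setAndTag(addr: Long, lineSize: Int, numSets: Int): (Long, Long) = {
    val offsetBits = Integer.numberOfTrailingZeros(lineSize)
    val indexBits  = Integer.numberOfTrailingZeros(numSets)
    val set = (addr >> offsetBits) & (numSets - 1)  // which set to search
    val tag = addr >> (offsetBits + indexBits)      // stored with the line and compared on lookup
    (set, tag)
  }
}

// In a 4-way set-associative cache, only the 4 lines of the selected set are
// checked against the tag; in a direct-mapped cache, only one line is checked.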

Replacement Policies

For caches that are either fully-associative or set-associative, we have to decide which line to evict when a new line is brought into the cache. This decision is known as the replacement policy of the cache. Here we consider the most popular cache replacement policies, but we note that there are many others [22, 57, 61, 92, 134]. The best policy depends on both the resource usage required to implement the policy and the behavior of the application [2].

Perhaps the most common replacement policy is the least-recently-used (LRU) policy. With an LRU cache, the line that has been accessed least recently is selected for replacement first.

This type of policy is intuitive since items that are often used will stay in the cache. Unfortunately, it requires ⌈lg n⌉ bits of storage per line to implement, where n is the associativity of the cache.

Similar to the LRU policy is the most-recently-used (MRU) policy. With an MRU cache, the line that has been most recently used is evicted first. This policy seems like it would rarely be beneficial, and indeed, that is often the case. Nevertheless, it is possible to construct a memory access sequence that would benefit from such a policy. As with the LRU policy, ⌈lg n⌉ bits of storage per line are required.

Another common policy is the first-in first-out (FIFO) cache policy (also known as the round-robin policy), which has been used in commercial products such as the Intel XScale [52] and ARM ARM11 processors [3]. With a FIFO policy, the oldest line in the set is replaced. Unlike the LRU policy, accesses to the line after it has been brought into the cache do not affect the replacement decision. Since a FIFO cache requires only a counter per set to determine the next line to evict, the FIFO policy requires only ⌈lg n⌉ bits of storage per set, where n is the associativity of the cache.

Finally, we consider a pseudo-LRU (PLRU) policy, which attempts to approximate the true LRU policy, but with simpler hardware. There are several ways of implementing a PLRU policy. Here we consider the method where a single bit of storage per line is used. This bit is set every time a cache line is accessed. If the bits for all the lines in a set are set, the bits are reset. Upon replacement, the first line with an unset bit is selected. Although this technique does not implement true LRU, it can allow a higher associativity than could be used with a true LRU policy due to resource constraints, and it can be faster due to simpler hardware.
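A minimal sketch of this single-bit scheme is shown below for one cache set; the class name is illustrative. Following common practice, the just-accessed line is re-marked after the bits are cleared so that it is not immediately chosen as the victim.

// One-bit pseudo-LRU state for a single set with `ways` lines (illustrative).
class PlruSet(ways: Int) {
  private val used = Array.fill(ways)(false)

  // Mark line `i` as recently used; clear all bits once every line has been marked.
  def access(i: Int): Unit = {
    used(i) = true
    if (used.forall(identity)) {
      for (j <- used.indices) used(j) = false
      used(i) = true  // keep the just-accessed line marked
    }
  }

  // The victim is the first line whose bit is not set.
  def victim: Int = used.indexWhere(b => !b)
}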

18 Write Policies

In general, reads from a cache are either serviced by the cache directly in the case of a hit, or cause a line to be replaced in the case of a miss. However, there are more options available for writes to a cache.

The first decision regarding the write policy is whether the cache should be write-through or write-back. On a write-through cache, all writes go directly to the next memory in the hierarchy, whereas with a write-back cache, writes are cached and only cache lines evicted from the cache get written to the next memory. Both write-through and write-back caches have advantages. A write-back cache can reduce the amount of write traffic to the next memory. On the other hand, a write-through cache can avoid cache pollution and, on multiprocessor systems, a write-through cache makes coherency simpler.

Another write policy decision is how lines are allocated on writes if the write is a miss. If on a cache miss a line is allocated, we say that the cache is write-allocate. Otherwise, if no line is allocated, that is, the write goes directly to the next memory without allocating a line in the cache, we say that the cache is write-over. Typically, write-through caches use write-over and write-back caches use write-allocate.
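The combinations above can be summarized in a small dispatch routine; the sketch below is illustrative only, and the three callbacks stand in for whatever cache bookkeeping and next-level memory a real model would provide.

// Illustrative dispatch of a write under the policies described above.
object WritePolicy {
  def handleWrite(hit: Boolean, writeBack: Boolean, writeAllocate: Boolean,
                  markDirty: () => Unit,       // defer the write until eviction
                  allocateLine: () => Unit,    // bring the missing line into the cache
                  writeNextLevel: () => Unit   // forward the write immediately
                 ): Unit = {
    if (!hit && !writeAllocate) {
      writeNextLevel()                  // write-over: bypass the cache
    } else {
      if (!hit) allocateLine()          // write-allocate on a miss
      if (writeBack) markDirty() else writeNextLevel()
    }
  }
}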

2.3.2 Scratchpads

A scratchpad is a small, fast memory that handles memory accesses to a fixed portion of the address space [5]. Scratchpads find most use in embedded devices since they are easy to implement and have deterministic access times (unlike caches). The use of scratchpads is somewhat limited, however, because their use typically requires either manual programmer intervention or a custom compiler [114].

19 2.3.3 Prefetchers

Prefetching provides a method to “hide” memory latency by requesting an item from memory before it is needed by the computation. There are many prefetching mechanisms that have been proposed for both hardware and software [115]. A simple hardware method that we consider here is prefetching a word n bytes away from the current word after each read. Such a prefetcher would likely cause too much cache pollution to be of use in a general-purpose setting, but could be useful in an application-specific setting in part of the memory subsystem.
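This next-n-bytes scheme amounts to issuing a second, speculative read alongside every demand read; a sketch follows, with an illustrative callback standing in for the downstream memory.

// After every demand read of `addr`, also request `addr + stride` (illustrative).
class SimplePrefetcher(stride: Long, issueRead: Long => Unit) {
  def onRead(addr: Long): Unit = {
    issueRead(addr)           // the demand access
    issueRead(addr + stride)  // the speculative prefetch
  }
}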

2.3.4 Splits

A memory subsystem can be split such that addresses below a certain threshold go to a different set of memory subsystem components than addresses above the threshold [83]. For example, it may be desirable in an application to have the stack stored in a cache separate from the heap.
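Viewed as a component, a split is just a router keyed on an address threshold, as in the sketch below; the MemoryComponent trait is a placeholder for whatever interface the surrounding subsystem uses.

// Route accesses below `threshold` to one component and the rest to another.
trait MemoryComponent { def access(addr: Long, isWrite: Boolean): Unit }

class SplitComponent(threshold: Long, low: MemoryComponent, high: MemoryComponent)
    extends MemoryComponent {
  def access(addr: Long, isWrite: Boolean): Unit =
    if (addr < threshold) low.access(addr, isWrite)   // e.g., the stack
    else                  high.access(addr, isWrite)  // e.g., the heap
}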

2.3.5 Address Transformations

Transforming the address is another technique that can be used in the interest of improving memory performance. Some transformations that could be used include adding a constant to the address, flipping one or more bits of the address, and rotating the address bits. Although it may seem that such a transformation would be unproductive, when combined with other components, such as scratchpads, address transformations could be potentially very useful. Note that it is often necessary to reverse a transformation to maintain correctness (when used within a split memory, for example).
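As an example of a reversible transformation, the sketch below rotates the low bits of an address and provides the inverse rotation that restores the original address; the names and the assumption 0 < k < bits < 64 are ours.

// Rotate the low `bits` bits of an address left by `k`, plus the inverse rotation.
// inverse(forward(a)) == a, which is what allows the transformation to be undone.
object RotateTransform {
  def forward(addr: Long, bits: Int, k: Int): Long = {
    val mask = (1L << bits) - 1
    val low  = addr & mask
    (addr & ~mask) | (((low << k) | (low >>> (bits - k))) & mask)
  }

  // Rotating left by (bits - k) undoes a rotation left by k.
  def inverse(addr: Long, bits: Int, k: Int): Long = forward(addr, bits, bits - k)
}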

20 2.4 Related Work

There is much related work spanning several broad categories. Here we evaluate the related work in each category.

2.4.1 Superoptimization

Superoptimization was originally introduced in [76]. In that work, exhaustive search was used to find the smallest sequence of instructions to implement a function. This is in contrast with traditional code optimization where pre-defined transformations are used in an attempt to improve performance. Note that traditional code optimization is not truly optimization in the classical sense, but instead simply code improvement. Superoptimization, on the other hand, does produce an optimal result when applied in this manner.

Because of its long run time, superoptimization is typically not applied to complete programs, but, rather, it is applied to a few critical functions. One of the examples in [76] is the signum function, which is defined as follows:

\[
\operatorname{signum}(x) =
\begin{cases}
1 & \text{if } x > 0 \\
-1 & \text{if } x < 0 \\
0 & \text{otherwise}
\end{cases}
\]

When run through the superoptimizer in [76], it turns out this function can be implemented for the Motorola 68020 microprocessor [73] using only four instructions (shown below), whereas a naive implementation would take eight and a clever implementation would take six.

; x in d0
add.l  d0, d0   ; Add d0 to itself
subx.l d1, d1   ; Subtract (d1 + carry) from d1
negx.l d0       ; Put (0 - d0 - carry) into d0
addx.l d1, d1   ; Add (d1 + carry) to d1
; signum(x) in d1

Unlike prior implementations by compiler backends or humans, this implementation is very unusual and much more efficient. Although it is conceivable that a human would devise such an implementation, it would likely require significant effort with no guarantee of success. Using a superoptimizer, on the other hand, requires minimal human effort.

Functions such as signum appear in many programs, and, therefore, it is advantageous for a compiler to have a fast implementation available. Since its introduction, superoptimization has been successfully used in compilers such as GCC [44], peephole optimizers [6], and binary translators [7]. However, this body of work is the first to expand the scope of superoptimization beyond the optimization of instruction sequences.

Because of the enormous search space, there have been a few attempts to reduce the number of points in the search space. Denali [59] uses a theorem prover to avoid testing incorrect instruction sequences and TOAST [13] uses answer set programming.

Another technique that has been used with superoptimizers is stochastic search [99]. This allows the superoptimizer to explore much larger search spaces than would be possible with exhaustive search. One disadvantage of such a technique is that the guarantee of optimality is lost unless the stochastic search is allowed to run for a very long time. Nevertheless, here we use a stochastic search technique to make the search for good memory subsystems tractable.
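To make the flavor of such a search concrete, the sketch below shows a generic accept/reject loop of the kind stochastic techniques use: propose a random neighbor, keep it if it is better, and occasionally accept a worse candidate to escape local minima. This is a generic illustration, not the specific algorithm developed in Chapter 4, and the cooling schedule and acceptance rule are placeholders.

import scala.util.Random

// Generic stochastic search skeleton: `neighbor` perturbs a candidate design and
// `cost` evaluates it (e.g., by simulation); lower cost is better.
object StochasticSearch {
  def search[S](initial: S, neighbor: S => S, cost: S => Double,
                iterations: Int, temperature: Double): S = {
    val rng = new Random()
    var current = initial
    var currentCost = cost(current)
    var best = current
    var bestCost = currentCost
    for (i <- 0 until iterations) {
      val candidate = neighbor(current)
      val candidateCost = cost(candidate)
      val t = math.max(temperature * (1.0 - i.toDouble / iterations), 1e-9) // placeholder cooling
      val accept = candidateCost < currentCost ||
        rng.nextDouble() < math.exp((currentCost - candidateCost) / t)
      if (accept) { current = candidate; currentCost = candidateCost }
      if (currentCost < bestCost) { best = current; bestCost = currentCost }
    }
    best
  }
}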

2.4.2 Design Space Exploration

Design space exploration for hardware and software systems is an active and wide area of research. The goal of design space exploration is to find the optimal or near-optimal parameters for a particular system.

For designs with a large number of parameters, exploring the design space via exhaustive search can be intractable. Thus there exist several techniques to improve upon exhaustive search. In [54], the design space is sampled to build a model of the interactions between parameters. Regression modeling is used in [68] to predict the performance and power of various configurations. In [86], a method is presented for decoupling certain design parameters to reduce the search space.

Design space exploration has been applied to many fields, such as system-on-chip (SoC) communication architectures [65], integrated circuit design [129], FPGA designs [104], and many others [82, 106, 132]. Although a single objective, such as performance or energy, is often used, design space exploration for multiple objectives is also common [71, 87, 88].

For streaming applications targeting FPGAs, the optimization of both the computation and communication between kernels has been considered [28]. Similarly, in [125], an approach to improving the memory behavior for FPGA applications implemented in a high-level language such as C or C++ is presented. Unlike these works, here we treat the computation as fixed, but consider a wider search space for memory subsystems.

Of particular interest to us is design space exploration applied to memory subsystems. Design space exploration has been used extensively to find optimal cache parameters [39, 48, 56]. This line of work has been extended to consider a cache and scratchpad together [23]. However, the ability to change completely the memory subsystem for a specific application and main memory subsystem distinguishes this work from previous work.

2.4.3 Software Techniques for Improving Memory Behavior

Here we are focused primarily on hardware techniques; however, there also exist software techniques for improving memory behavior. Such techniques include the use of profiling to guide the placement of variables in the virtual address space to decrease cache conflicts and improve locality [14] and compiler optimizations to improve data locality across loop iterations [15]. Other software techniques include reorganization and cache-conscious memory allocation [25] as well as the splitting and reordering of data structures [24].

At a higher level, there are approaches to application design that focus on improving cache performance. In particular, access ordering [81] and cache-aware algorithms [100] attempt to take advantage of a particular cache structure. Likewise, the performance of cache-oblivious algorithms [36] is asymptotically optimal on an ideal cache hierarchy.

Although these software methods are often successful at improving the memory behavior of an application with respect to a particular memory, in this work we treat the application as fixed. Thus, these software techniques can be considered complementary to this work.

2.4.4 Tuning Cache Parameters

Tuning cache parameters is an active research topic that is related because we, too, are tuning cache parameters. There are two broad types of work in this area: static methods and dynamic methods. With static parameter tuning, the properties of a fixed cache are selected before deploying the hardware. On the other hand, dynamic methods allow certain parameters of a cache to change at run time. Each technique has advantages and it is possible to mix them.

24 Dynamic methods include changing the size and associativity of a cache hierarchy dynam- ically [4, 111] as well as the ability to disable various levels of a multi-level cache in the interest of reducing latency and reducing power consumption [20]. Adjusting the size of cache lines dynamically to lower the cache miss rate has also been considered [117].

There are also many approaches to tuning cache parameters statically. A method for selecting cache parameters analytically has been described for single-level caches [40, 56]. In addition, heuristic methods for selecting the parameters of a two-level cache have been presented [41, 42].

Although we are mostly concerned with static parameter selection, dynamic methods could be incorporated into our superoptimizer to allow it to select such a cache. As far as the static methods are concerned, we note that, although we are considering a larger search space, it may be possible to incorporate such a technique into the optimizer to allow it to search the space of possible cache parameters more efficiently. Thus, our work is complementary to both types of cache parameter tuning.

2.4.5 Non-traditional Memory Subsystems

Many non-traditional memory subsystems have been proposed. These structures are often intended to be general-purpose in nature, but to take advantage of some aspect of application behavior that is common across many applications. However, there are also many non-traditional memory subsystems designed for particular applications, usually with much effort. Such designs are a common practice for applications deployed on FPGAs and ASICs [26, 35, 101]. Due to strict resource constraints, embedded systems in general commonly employ specialized memory subsystems [9].

25 One notable example of a general-purpose non-traditional memory subsystem is the victim cache [60], which is a small, fully-associative cache structure used to store recently evicted items from a larger cache with low associativity. Victim caches work on the assumption that some cache sets can benefit from a higher associativity than others. The use of such caches in embedded applications has been explored [133].

Another general-purpose memory subsystem is the annex cache [58]. The annex cache is similar to the victim cache, but instead of storing recently evicted items for a larger cache, the annex cache stores items that have yet to be moved into the main cache.

Although performance is perhaps the most common objective, non-traditional memory subsystems optimized for other objectives have also been considered. For example, the filter cache [62] was introduced to reduce energy consumption with a modest performance penalty. A filter cache provides a very small first-level cache in front of a second-level cache with a similar structure to a traditional first-level cache. Micro-caches [8] are similar to filter caches, but designed to provide performance/area efficiency in chip multiprocessors instead of energy efficiency on a single core. Note that the term microcache has been previously used in [78], which describes a method for reducing cache size and power consumption by allowing the compiler to allocate regions of the cache to specific objects.

The combination of multiple memory subsystem components has also been considered to various degrees. For example, the combination of a scratchpad and cache has been considered [89, 94]. Further, the combination of multiple caching techniques including split caches has been considered [83].

Unlike our work, these works present a particular memory subsystem. Our work, on the other hand, attempts to discover memory subsystems with arbitrary structure. Therefore, it is possible that our superoptimizer would discover similar structures if provided the necessary memory subsystem components.

26 2.4.6 Memory Interfaces

Finally, we consider related work in making off-chip memory easier to use. Implementing a memory-intensive application in hardware using either an FPGA or ASIC can be a difficult task. This is due to the fact that off-chip memory bandwidth is limited and on-chip memory resources are scarce. Thus, designing a good memory subsystem requires one to efficiently allocate the on-chip memory and share the off-chip memory between various compute elements. As an additional complication, the interface to off-chip memory is platform-specific.

LEAP scratchpads [1] attempt to alleviate some of the issues with sharing memory resources among kernels by providing a portable memory abstraction. This memory abstraction may contain caching and can be backed by a larger main memory. A related approach is provided by CoRAM [27], which is similar to LEAP scratchpads, but lower-level. CoRAM provides an SRAM-style interface to memory. Unlike block RAM resources embedded in the FPGA, however, CoRAMs can be backed by a larger main memory. In the interest of improving the performance of such abstractions, prefetching [131] has been considered.

Both LEAP scratchpads and CoRAM are similar to our work in that both provide an abstract interface to a potentially large memory. However, providing an interface is not our primary goal. Here, we are more interested in discovering the memory subsystem to use between the interface and the off-chip memory.

A related technology is MPack [116], which attempts to optimize the packing of data into block RAM resources. In our work we do not consider packing multiple subsystem components, though doing so could allow for a higher utilization of block RAM resources.

Chapter 3: Tools

Here we discuss the tools that we developed for memory superoptimization. This includes tools for gathering address traces, simulating memory subsystems, superoptimizing memory subsystems, and deploying applications with custom memory subsystems. With the combined tool set described here (ScalaPipe, the memory simulator, the memory superoptimizer, and the memory generator), it is possible to take a design from a high-level language to an FPGA implementation with a custom memory subsystem without the need to write HDL.

3.1 ScalaPipe

Here we provide a brief overview of ScalaPipe [120, 121], which is a streaming application generator. For a more complete description see Appendix A.

Stream processing is a parallel programming paradigm in which processing kernels communicate over fixed communication channels. The streaming paradigm is used in systems such as StreamIt [112] and many others [17, 29, 45, 46, 105]. Within the streaming paradigm, conceptually, each kernel has its own independent memory address space. Communication between kernels is performed via explicit communication channels implemented as FIFO buffers. Our interest in the streaming paradigm stems from our desire to superoptimize memory subsystems for parallel applications, as explained in Chapter 5.

28 val = new Kernel {

val x0 = input(UNSIGNED32) val x1 = input(UNSIGNED32) val y = output(UNSIGNED32)

y = x0 + x1

} Figure 3.1: Simple ScalaPipe Kernel

ScalaPipe provides a pair of domain-specific languages (DSLs) embedded in the Scala programming language [85]. By using ScalaPipe, one is able to author streaming applications that can then be deployed to a combination of CPUs and FPGAs. Using ScalaPipe to implement some of our benchmarks allows us not only to quickly implement an application that will run on an FPGA without the need to write in a hardware description language (HDL), but also to automatically extract an address trace, which we can use for superoptimization. Further, ScalaPipe allows us to automatically deploy the applications along with their superoptimized memory subsystems on an FPGA device.

To support the streaming paradigm, ScalaPipe allows one to author kernels in a kernel DSL and then describe the communication channels between kernels in the application DSL.

3.1.1 Kernel DSL

Here we describe ScalaPipe’s kernel DSL. A simple kernel to add pairs of 32-bit unsigned integers is shown in Figure 3.1. This kernel has two inputs, x0 and x1, and one output, y. Each time an input is referenced, a value is read off of the input channel associated with that input. Each time an output is referenced, a value is written to the output stream. Conceptually, ScalaPipe kernels run in an infinite loop processing data until all input is exhausted.

class GenericSplit(t: Type, n: Int) extends Kernel {
  val x = input(t)
  for (i <- Range(0, n)) {
    val y = output(t)
    y = x
  }
}

Figure 3.2: Generic Split Kernel

val app = new Application {
  val rng1 = Random()
  val rng2 = Random()
  val result = DivideBy2(Add(rng1, rng2))
  Print(result)
}

Figure 3.3: Averaging Application

Because a ScalaPipe kernel is implemented in the Scala programming language, it is possible to write generic kernels. For example, a kernel to divide an input stream of type t among n output streams is shown in Figure 3.2. Specific instances of this kernel can be created using new, just as one would create objects in Scala. Those familiar with Scala will note that we use Range explicitly in the GenericSplit kernel to force the loop to be unrolled before the code is generated. Thus, n outputs are created and the input is sent to each output in a round-robin fashion.
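For instance, a four-way split of an unsigned 32-bit stream could be instantiated as follows (GenericSplit and UNSIGNED32 come from the figures above; the variable name is ours):

val split4 = new GenericSplit(UNSIGNED32, 4)  // one input, four outputs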

3.1.2 Application DSL

To connect kernels together to form a streaming application, ScalaPipe provides an appli- cation DSL. Using the application DSL, kernels are connected together much like function application, where the arguments to the kernel are the inputs and the result of the function application contains the outputs. For example, a simple application to average two streams of random numbers is shown in Figure 3.3.

By default, ScalaPipe will map all kernels to general-purpose processing cores and generate C code for their implementation. To map kernels to another resource, we can insert map statements. For example, to map everything but the Print kernel of our example application to an FPGA device, we would use the following map statement:

map(ANY_KERNEL -> Print, FPGA2CPU())

This statement specifies that any edge entering a Print kernel will move from an FPGA resource to a CPU resource.

In addition to map statements, ScalaPipe supports various parameters that affect the way it generates code. Of particular interest to us is the trace parameter:

param('trace)

This parameter causes ScalaPipe to generate a memory address trace for each kernel when executing an application on a CPU resource. This address trace will be of use to us for the superoptimization process.

3.2 Memory Simulator

To evaluate custom memory subsystems, we developed a memory subsystem simulator. Unlike extant simulators, our simulator is capable of simulating arbitrarily complex memory subsystems and parallel address traces from streaming applications. The simulator is capable of evaluating multiple aspects of the memory system, including performance, writes to main memory, and the energy consumption of the main memory. It is also able to output compressed queue traces, which will be described in more detail in Chapter 6. As will be shown in Chapter 4, in addition to its ability to simulate complex memory subsystems, the simulator's ability to run an address trace quickly is essential.

To support complex memory subsystems, the simulator takes a machine and memory subsystem description, encoded as S-expressions [72], as input. This encoding allows arbitrarily complex memory subsystems to be expressed. For example, Figure 3.4 shows a memory system for a streaming application with two kernels.

As shown in Figure 3.4, there are three main sections in the application description. The first section, machine, describes the platform on which the application will be run. In the example, this platform is a Xilinx Spartan-6 FPGA device running at 100 MHz. The second section of the application description is the memory section. This section describes the main memory (in this case, a DRAM device) as well as the memory subsystems for each kernel and communication channel. Finally, the benchmarks section points the simulator to the address traces for each kernel.

Our simulator is capable of simulating the memory subsystem components shown in Table 3.1. When targeting an FPGA device, the latency shown in Table 3.1 is used, which matches our VHDL implementation of the memory components. The CACTI tool [113] is used to determine latencies for ASIC targets. For the main memory, the simulator assumes that there is a priority arbiter in front of a main memory with a single read/write port.

For caches, the simulator supports four replacement policies. The supported policies include least-recently used (LRU), most-recently used (MRU), first-in first-out (FIFO), and pseudo-least-recently used (PLRU). The PLRU policy approximates the LRU policy by using a single age bit per cache way rather than lg n age bits, where n is the associativity of the cache. With the PLRU policy, the first way where the age bit is not set is selected for replacement. Upon access, the age bit for the accessed way is set and when all age bits are set for a set, all but the accessed age bit are cleared.
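As an illustration of this policy, the sketch below tracks the age bits for a single cache set; this is our own code for exposition, not the simulator's implementation.

class PLRUSet(ways: Int) {
  private val age = Array.fill(ways)(false)

  // The victim is the first way whose age bit is not set.
  def victim: Int = age.indexWhere(!_) match {
    case -1 => 0   // only possible when ways == 1
    case w  => w
  }

  // On an access, set the way's age bit; once every bit is set,
  // clear all bits except the one belonging to the accessed way.
  def access(way: Int): Unit = {
    age(way) = true
    if (age.forall(identity)) {
      for (i <- age.indices if i != way) age(i) = false
    }
  }
}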

(machine (target fpga)
         (part xc6slx45)
         (max_luts 12000) (max_regs 36000) (max_cost 92)
         (frequency 100000000)
         (addr_bits 30))
(memory
  (main (memory (dram (frequency 100000000)
                      (cas_cycles 3) (rcd_cycles 3) (rp_cycles 3)
                      (page_size 1024) (page_count 8192)
                      (width 2) (burst_size 4)
                      (open_page false)
                      (ddr true))))
  (subsystem (id 1) (depth 65536)
             (memory (spm (size 8192) (memory (main)))))
  (subsystem (id 2) (depth 131072)
             (memory (split (offset 16384)
                            (bank0 (cache (line_count 1024) (line_size 8)
                                          (associativity 1)
                                          (write_back true)
                                          (access_time 3) (cycles_time 3)
                                          (memory (join))))
                            (bank1 (cache (line_count 1024) (line_size 4)
                                          (associativity 2) (policy plru)
                                          (write_back true)
                                          (access_time 3) (cycles_time 3)
                                          (memory (join))))
                            (memory (main))))))
(fifo (id 1) (depth 16) (word_size 4))
(benchmarks
  (trace (id 1) (name Kernel1))
  (trace (id 2) (name Kernel2)))

Figure 3.4: Example Memory Description

Component    Description                Parameters (n ∈ Z+)                   Latency (cycles)
Cache        Parameterizable cache      Line size (2^n), line count (2^n),    3
                                        associativity (1 ... line count),
                                        replacement policy, write policy
FIFO         FIFO implemented in BRAM   Depth (2^n)                           1
Offset       Address offset             Value (±n)                            0
Prefetch     Stride prefetcher          Stride (±n)                           0
Rotate       Rotate address transform   Value (±n)                            0
Scratchpad   Scratchpad memory          Size (2^n)                            2
Split        Split memory               Location (n)                          0
XOR          XOR address transform      Value (n)                             0

Table 3.1: Memory Subsystem Components

The offset, rotate, and xor components in Table 3.1 are address transformations. The offset component adds the specified value to the address. The rotate component rotates

the bits of the address that select the word left by the specified amount (the bits that select the byte within the word remain unchanged). Note that for a 32-bit address with a 4-byte word, 32 − lg 4 = 30 bits are used to select the word. Finally, the xor component inverts the selected bits of the address.

Other supported components include prefetch and split. The prefetch component performs an additional memory access after every memory read to do the prefetch. This additional access reads the word at the specified distance from the original word that was accessed. Finally, the split component divides memory accesses between two memory subsystems based on address: accesses with addresses above a threshold go to a separate memory subsystem from addresses below the threshold. Accesses that are not resolved within the split are sent to the next memory subsystem or main memory.
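To make the behavior of these components concrete, the following sketch expresses the address manipulations in software form, assuming 32-bit addresses and 4-byte words; the object and function names are ours, and the actual components are implemented in VHDL.

object AddressComponents {
  val WordBits  = 2    // lg 4: bits selecting the byte within a 4-byte word
  val IndexBits = 30   // 32 - lg 4: bits selecting the word

  // offset: add the specified value to the address.
  def offset(addr: Long, value: Long): Long = (addr + value) & 0xFFFFFFFFL

  // rotate: rotate the word-select bits left; byte-select bits are unchanged.
  def rotate(addr: Long, amount: Int): Long = {
    val mask  = (1L << IndexBits) - 1
    val word  = (addr >>> WordBits) & mask
    val shift = ((amount % IndexBits) + IndexBits) % IndexBits
    val rot   = ((word << shift) | (word >>> (IndexBits - shift))) & mask
    (rot << WordBits) | (addr & ((1L << WordBits) - 1))
  }

  // xor: invert the selected bits of the address.
  def xor(addr: Long, value: Long): Long = addr ^ value

  // split: route an access to bank0 or bank1 based on a threshold address.
  def split(addr: Long, threshold: Long): Int = if (addr < threshold) 0 else 1
}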

3.3 Memory Superoptimizer

Because we are interested in the superoptimization of memory subsystems, we developed a superoptimizer to generate valid memory subsystems for simulation. This superoptimizer then uses the results of each simulation to generate another memory subsystem to check.

To ensure the validity of each memory subsystem, the superoptimizer is fed a list of constraints. The superoptimizer checks the constraints using an FPGA synthesis tool (such as the Xilinx Synthesis Tool, XST [130]) when targeting an FPGA device or CACTI [113] when targeting an ASIC. Because the superoptimization process can take a very long time, intermediate results are stored in a PostgreSQL [91] database.

The memory superoptimizer is described in detail in Chapter 4. Enhancements to the superoptimizer to allow the superoptimization of streaming applications are described in Chapter 5 and Chapter 6.

3.4 Memory Generator

Once we have a superoptimized memory subsystem, we need some way to deploy it for evaluation. Although it would be possible to manually create the superoptimized memory subsystems from the high-level description emitted from the superoptimizer, doing so would be tedious and error-prone. Therefore, we implemented a tool to generate the memory system automatically.

Our memory generator takes an S-expression [72] description of the memory system as described in Section 3.2 as input. It then generates synthesizable VHDL for deployment on an FPGA device. To support this, we implemented the VHDL for each of the components described in Table 3.1. The memory generator then only needs to generate the interfaces around the memory subsystem components. For streaming applications, we implemented a priority arbiter, described in Chapter 5, which the memory generator uses to connect the various components to a single main memory.

Chapter 4: Superoptimization of Memory Subsystems

4.1 Introduction

In this chapter we present the general method for superoptimizing memory subsystems for single-threaded applications and the outcome for several benchmark applications. This chapter is based on [122] and [123].

4.2 Method

Here we describe how one generates a superoptimized memory subsystem for a single-threaded application. This process involves several steps. First we require a memory address trace from the application. This trace allows us to simulate the performance of the application with different memory subsystems. Next, we perform the superoptimization, which involves generating proposal memory subsystems and simulating them to determine their performance. Finally, we generate the memory subsystem to be used in the application.

4.2.1 Address Traces

In order to evaluate the performance of a particular memory subsystem for an application, we use address traces. There are many ways of obtaining an address trace for an application. Here we consider three distinct methods:

• traces gathered from applications run on a CPU,

• traces gathered from synthetic kernels, and

• traces generated from ScalaPipe [120] applications.

To gather the address traces for applications running on a CPU, we use a modified version of the Valgrind [84] lackey tool. This allows us to obtain concise address traces for applications that contain only data accesses (reads, writes, and modifies). We ignore instruction accesses since the instructions would likely be stored in a separate memory, such as a read-only memory (ROM) or in the FPGA or ASIC logic itself. We use this method of address trace acquisition for existing applications, such as applications in the MiBench benchmark suite [47].

A synthetic kernel is a kernel where we use an application to generate an address trace directly rather than performing the computation. For example, the address trace for matrix-matrix multiply can be generated without the need to actually perform the multiplication, as can many others. An advantage of this method is that the address trace can be computed on-the-fly as it is being simulated with little overhead, which saves us from storing large traces.
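For example, the trace for a naive n-by-n matrix-matrix multiply can be produced directly from the loop indices. The sketch below is our own illustration (base addresses, element size, and function names are assumptions), not the code used to generate our traces.

// Emit the data accesses of a naive matrix-matrix multiply without
// computing any products. A is row-major at 0, B at 4*n*n, C at 8*n*n.
def mmTrace(n: Int, emit: (Char, Long) => Unit): Unit = {
  val a = 0L
  val b = 4L * n * n
  val c = 8L * n * n
  for (i <- 0 until n; j <- 0 until n) {
    for (k <- 0 until n) {
      emit('R', a + 4L * (i * n + k))  // read A[i][k]
      emit('R', b + 4L * (k * n + j))  // read B[k][j] (column order)
    }
    emit('W', c + 4L * (i * n + j))    // write C[i][j]
  }
}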

Finally, for applications implemented in ScalaPipe, we can generate traces automatically. By extending ScalaPipe with the ability to instrument generated applications, address traces can be gathered by running the application on a CPU target. Since ScalaPipe supports high-level synthesis, the application can then be retargeted to an FPGA device.

For now, we ignore the notion of processing time in the trace for all of the address traces. This is because our focus is exclusively on memory performance. Because there is no notion of processing time, however, certain memory subsystem components, such as prefetchers, are less likely to be useful. Introducing processing time is possible, but to do so would require a specific implementation of the application, which would make the results less general.

All of the address traces contain virtual (instead of physical) addresses and are gathered for 32-bit versions of the benchmark applications. To evaluate a general-purpose memory subsystem, the physical addresses are important since some levels of cache use physical addresses to avoid flushing the whole cache when context switching. However, we note that our memory subsystems are specific to an application and, therefore, using virtual addresses is appropriate. Further, in embedded devices as well as ASICs and FPGAs, it is often the case that only a single application is executed.

4.2.2 Simulation

To evaluate the performance of the memory subsystems proposed by the superoptimizer, we use the custom memory simulator described in Section 3.2. As previously mentioned, we use a custom memory subsystem simulator for three reasons. First, we need to simulate complex memory subsystems beyond simple caches. Second, rather than the number of cache misses, we are interested in total memory access time. Note that cache misses would not provide enough information to the superoptimizer for it to decide between a single-level and a multi-level cache, for example, and simply using memory access time is insufficient for deciding how to divide up the memory resources among multiple memory subsystems. Finally, the simulator must be fast enough to simulate large traces many thousands of repetitions in a reasonable amount of time.

Parameter    Description                      Value
Frequency    DRAM I/O frequency               400 MHz
CAS          Cycles to select a column        5
RCD          Cycles from open to access       5
RP           Cycles required for precharge    5
Page size    Size of a page in bytes          1024
Page count   Number of pages per bank         65536
Width        Channel width in bytes           8
Burst size   Number of columns per access     4
Page mode    Open or closed page mode         open
DDR          Double data rate                 true

Table 4.1: Main Memory Parameters
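To make the role of these parameters concrete, the sketch below estimates the latency of a single access in DRAM clock cycles. It is a deliberately simplified model of our own (with our own names), not the simulator's timing code.

case class DramParams(cas: Int, rcd: Int, rp: Int,
                      burstSize: Int, ddr: Boolean, openPage: Boolean)

def accessCycles(p: DramParams, sameRowOpen: Boolean): Int = {
  val row =
    if (p.openPage && sameRowOpen) 0   // row-buffer hit: no activation needed
    else if (p.openPage) p.rp + p.rcd  // row-buffer miss: precharge, then activate
    else p.rcd                         // closed-page: activate on every access
  val burst = if (p.ddr) p.burstSize / 2 else p.burstSize  // two beats per cycle with DDR
  row + p.cas + burst
}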

The memory subsystem superoptimizer supports seven distinct subsystem components, described in detail in Section 3.2. However, adding additional components is simply a matter of adding a synthesizable HDL model of the component and a simulation model for the memory subsystem simulator and superoptimizer. Likewise, additional parameters can be added to the existing components. Unfortunately, adding additional components or parameters can make the superoptimization process take longer since more steps will be required to explore the search space.

The communication between each of the memory components as well as the communication between the application and main memory is performed using 4-byte words. The bytes within the word are selected using a 4-bit mask to allow byte-addressing. The address bus is 30 bits, providing a 32-bit address space.
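For illustration, one request on this interface can be modeled as the following record; the field names are ours, and the actual interface is defined in VHDL.

case class MemRequest(
  addr:  Int,      // 30-bit word address within the 32-bit byte address space
  write: Boolean,  // read or write
  mask:  Int,      // 4-bit byte-enable mask
  data:  Int       // 4-byte write data (ignored for reads)
)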

For the results presented here, the main memory is assumed to be a DRAM device. As is the case with the memory subsystems, it is possible to model main memories with other

40 properties if required. For our purposes, we consider a DDR3-800D memory, whose properties are shown in Table 4.1.

We target both an FPGA platform and an ASIC. For the FPGA platform, we target a Xilinx [130] Virtex-7 with a 250 MHz clock. We assume there are 64 BRAMs available for the deployment of our custom memory subsystems. For the ASIC, we target a 45 nm process and assume that there is 1 mm² available for the deployment of our custom memory subsystems. We assume a clock frequency of 1 GHz for the ASIC target.

4.2.3 Optimization

To guide the optimization process, we use a variant of threshold acceptance [34] called old bachelor acceptance [51]. Old bachelor acceptance is a Markov-chain Monte-Carlo (MCMC) stochastic hill-climbing technique similar to simulated annealing [63]. Old bachelor acceptance provides a compromise between search space exploration and hill climbing. Thus, although we may not get the best possible memory subsystem with this technique, we do get fairly good results in much less time than it would take to perform an exhaustive search.

We use old bachelor acceptance because of its ability to avoid local optima without the need to select an overly high initial threshold or slow cooling schedule. With simulated annealing, the guarantee of discovering the optimal result relies on a very slow or adaptive cooling schedule [38]. Thus, the use of simulated annealing is impractical even if we want to be guaranteed an optimal result. If we are willing to drop the optimality requirement, we can select a cooling schedule for simulated annealing that would allow the search to converge more quickly; however, selecting such a cooling schedule requires a trade-off between search time and the quality of the result. With old bachelor acceptance, we can use a more aggressive cooling schedule since the threshold can increase. There are other metaheuristics that could be explored to reduce the search time; however, here we consider only old bachelor acceptance.

Using stochastic hill-climbing, one typically selects an initial state, s_t = s_0, and then generates a proposal state, s^*, in the neighborhood of the current state. The state is then either accepted, becoming s_{t+1}, or rejected. With threshold acceptance, the difference in cost between the current state, s_t, and the proposal state, s^*, is compared to a threshold, T_t, to determine if the proposal state should be accepted. Thus, we get the following expression for determining the next state:

s_{t+1} = \begin{cases} s^* & \text{if } c(s^*) - c(s_t) < T_t \\ s_t & \text{otherwise} \end{cases}

For our purposes, the state is a candidate memory subsystem and the cost function, c(·), is the total access time in cycles that the application will experience from memory accesses.

With threshold acceptance, the threshold is initialized to some relatively high value, T_t = T_0. The threshold is then lowered according to a cooling schedule. The recommended schedule in [34] is T_{t+1} = T_t - \Delta T_t where \Delta \in (0, 1). Old bachelor acceptance generalizes this, allowing the threshold to be lowered when a state is accepted and raised when a state is not accepted. This allows the algorithm to escape areas of local optimality more easily. For our experiments, we used the following schedule:

T_{t+1} = \begin{cases} T_t - \Delta T_t & \text{if } c(s^*) - c(s_t) < T_t \\ T_t + \Delta T_t & \text{otherwise} \end{cases}

To reduce the time required for superoptimization, we employ two techniques to speed up the process. First, we memoize the results of each state evaluation so that when revisiting a state we do not need to simulate the memory trace again. The second improvement is that we allow multiple superoptimization processes to run simultaneously sharing results using a database, thereby allowing us to exploit multiple processor cores.

4.2.4 Neighborhood Generation

Our memory subsystem optimizer is capable of proposing candidate memory subsystems comprised of the structures shown in Table 3.1. These components can be combined in arbitrary ways leading to a huge search space limited only by the constraints.

For the FPGA target, the constraints include the minimum clock frequency and the maximum number of block RAMs (BRAMs) for the memory subsystem. BRAMs are fast on-chip memories that have a configurable aspect ratio. For the ASIC target, the constraint is the area as reported from the CACTI tool [113].

Given a state, s_t, we compute a proposal state, s^*, by performing one of the following actions (a sketch of the resulting search loop is given after the list):

1. Insert a new memory component at a random position,

2. Remove a memory component from a random position, or

3. Change a parameter of the memory component at a random position.
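Combining these actions with the threshold schedule of Section 4.2.3 gives the overall search loop sketched below. This is our own simplified rendering: the actual superoptimizer additionally memoizes simulation results, checks the resource constraints, and shares results through the database.

// Old bachelor acceptance over candidate memory subsystems (sketch).
// 'cost' simulates the address trace; 'mutate' applies one of the
// insert / remove / change-parameter actions listed above.
def superoptimize[S](s0: S, t0: Double, delta: Double, steps: Int,
                     cost: S => Double, mutate: S => S): S = {
  var current = s0
  var best = s0
  var threshold = t0
  for (_ <- 0 until steps) {
    val proposal = mutate(current)
    if (cost(proposal) - cost(current) < threshold) {
      current = proposal                 // accept and lower the threshold
      threshold -= delta * threshold
      if (cost(current) < cost(best)) best = current
    } else {
      threshold += delta * threshold     // reject and raise the threshold
    }
  }
  best
}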

With MCMC algorithms such as simulated annealing and threshold acceptance, it is necessary that the generated proposal states be ergodic. Ergodicity means that it is possible to reach every state from any given state in a finite number of steps. Obviously this property is necessary since if one hopes to reach an optimal solution, one must be able to get to that solution via some number of steps from a non-optimal solution. It is easy to see that the proposal generation process described above is ergodic as actions 1 and 2 are capable of canceling each other and action 3 can cancel itself.

To ensure that any discovered memory subsystem is valid, we reject any memory subsystem that exceeds the constraints. However, there are other ways a memory subsystem may be invalid. First, because we support splitting between memory components by address, any address transformation occurring in a split must be inverted before leaving the split. To handle this, we always insert (or remove) both the transform and its inverse when inserting (or removing) an address transformation.

Another situation that can lead to an invalid memory subsystem is when a complex memory subsystem prevents the subsystem from achieving the required clock frequency on the FPGA device. Note that for an ASIC device we instead increase the number of cycles required to access the memory component. Although we synthesize each component for the FPGA target separately to prevent this, it is still possible that a combination of components prevents the complete memory subsystem from achieving the required clock frequency.

To prevent the optimizer from generating a memory subsystem that is unable to run at the required clock frequency, the optimizer keeps a rough estimate of the longest combinational path and prevents the path from becoming too long. Nevertheless, it is still possible that a particular superoptimized memory subsystem may not achieve the required clock frequency. Therefore, for the FPGA results, we synthesize the superoptimized memory subsystems to validate them.

4.2.5 Offset Selection Heuristic

Because the search space is so large, arbitrarily selecting addresses to segment the address space in a split component can be problematic. Therefore, rather than proposing arbitrary addresses for split offsets, we restrict the set of addresses to values that actually exist in the address trace. We do this by recording the address ranges that are used during the first evaluation of the trace for the initial state. To further improve these results, the addresses we generate are weighted such that those addresses at the ends of address ranges are more likely to be selected.

Given an address range of length n that starts at a, addresses used for splits are selected according to the following algorithm:

A(a, n) = \begin{cases} a & \text{w.p. } 1/8 \\ a + n - 1 & \text{w.p. } 1/8 \\ A(a, \lceil n/2 \rceil) & \text{w.p. } 3/8 \\ A(a + \lfloor n/2 \rfloor, \lceil n/2 \rceil) & \text{w.p. } 3/8 \end{cases}

Here w.p. stands for "with probability". Thus, there is a 12.5% chance of selecting the first address in the range, a 12.5% chance of selecting the last address, and a 75% chance of selecting an address between these two extremes.
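Written out directly, this selection rule corresponds to a small recursive procedure such as the sketch below (our own code; the superoptimizer's implementation may differ in details such as termination):

import scala.util.Random

def selectOffset(a: Long, n: Long, rng: Random = new Random): Long =
  if (n <= 1) a
  else rng.nextInt(8) match {
    case 0         => a                                         // first address, p = 1/8
    case 1         => a + n - 1                                  // last address,  p = 1/8
    case 2 | 3 | 4 => selectOffset(a, (n + 1) / 2, rng)          // lower half,    p = 3/8
    case _         => selectOffset(a + n / 2, (n + 1) / 2, rng)  // upper half,    p = 3/8
  }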

4.2.6 Model Validation

To validate the simulation model used during the optimization process, our optimizer generates synthesizable VHDL that has the characteristics shown in Table 3.1. By synthesizing the VHDL, we can ensure that the discovered memory subsystem is able to run at the required frequency and fit on our target device. The synthesis targets a Xilinx [130] Virtex-7 running at 250 MHz.

4.3 Benchmarks

We use a collection of six benchmarks from the MiBench benchmark suite [47] as well as four synthetic kernels for evaluation purposes. The MiBench benchmark suite contains single-threaded benchmarks for the embedded space that target a variety of application areas. For some benchmarks, the MiBench suite contains large and small versions. We chose the large version in the interest of obtaining larger memory traces.

The locally developed synthetic kernels include a kernel that performs random lookups in a hash table (hash), a kernel that performs matrix-matrix multiply (mm), a kernel that inserts and then removes items from a binary heap (heap), and a kernel that sorts an array of integers using the Quicksort algorithm [50] (qsort). Rather than implement an application to perform these operations and use Valgrind to capture the address trace, the address traces for these kernels are generated directly during a simulation run, which allows us to avoid processing large trace files for the kernels.

Because we are superoptimizing the memory subsystem, the amount of memory accessed by each benchmark is important. If a particular benchmark accesses less memory than is available to the on-chip memory subsystem, then it should be possible to have all memory accesses occur in on-chip memory, though such a design may require clever address transformations. A graph of the total working-set size for each benchmark is shown in Figure 4.1.

In Figure 4.1, we see that there are two benchmarks, bitcount and dijkstra, that are small enough that all memory accesses could be mapped into 64 BRAMs, which is 2,359,296 bits, or 294,912 bytes. All of the other benchmarks are too large to fit completely within 64 BRAMs, which is the constraint on BRAMs we consider for the FPGA target. For the 45 nm ASIC process with an area constraint of 1 mm², we can store a total of 379,392 bytes in a scratchpad according to our CACTI model. This means that, as with the FPGA, the bitcount and dijkstra benchmarks are small enough to be mapped into a scratchpad, but all of the remaining benchmarks access too much memory.

Figure 4.1: Working-Set Sizes

4.4 Minimizing Total Access Time

Here we present the results of superoptimizing the memory subsystem to minimize total access time. To evaluate the performance of our superoptimized memory subsystems, we compare the performance of the superoptimized memory subsystems against a baseline cache. For our baseline cache, we selected a cache that closely resembles the data cache in a Raspberry Pi [95]. This is a 64 KiB, 4-way set-associative write-back cache with 32-byte lines and a PLRU replacement policy. The FPGA implementation of this cache uses 16 BRAMs and meets our 250 MHz target frequency. According to CACTI, the 45 nm ASIC implementation is 0.18 mm² with a 1-cycle access time and a 3-cycle cycle time.

Figure 4.2: Best-case FPGA Speedup

Figure 4.3: Realized FPGA Speedup

4.4.1 FPGA Results

For the first set of experiments, we target a Xilinx [130] Virtex-7 with a target frequency of 250 MHz and a constraint of 64 BRAMs maximum. On the Virtex-7 part, each BRAM has a base aspect ratio of 512 bits by 72 bits, or 36,864 bits. The main memory is assumed to be the DDR3 device whose properties are shown in Table 4.1.

The first question we attempt to answer is: how much better might we make the memory subsystem than the baseline cache? To determine this, we compare the performance of each benchmark to a “best-case” access time. For the best-case access time, we assume that all memory accesses hit in the fastest memory component available for each of our targets. For the FPGA target, this means that all accesses hit in a scratchpad and, therefore, take two cycles to complete. This best-case speedup for our benchmarks running on the FPGA target is shown in Figure 4.2.

The g-mean bar in Figure 4.2 represents the geometric mean. Assuming that we could somehow arrange for all of the memory accesses to hit in the scratchpad, we would get a 3.12× speedup over the baseline cache for the FPGA target. Note that, in reality, such a speedup is not possible since we do not have enough resources available to make all of the accesses hit in a scratchpad.

Figure 4.3 shows the speedup that the superoptimized memory subsystem provides over the baseline memory subsystem. Across the set of benchmark applications, the performance gain varies from very little to over 9× with a geometric mean speedup of 1.71×.

Although some of the results are not much better than the baseline memory subsystem, we note that for all of the benchmarks there was some improvement, though less than 1% in a few cases. There are a few benchmarks, however, that exhibit substantial performance gain. The matrix-matrix multiply kernel shows the best speedup of over 9×. Because the main memory is not much slower than the cache structures running on a 250 MHz FPGA fabric, we do not anticipate substantial gains for all of the applications (see Figure 4.2). A number of the discovered memory subsystems are, however, worth considering in more detail.

The first interesting memory subsystem we consider is the superoptimized memory subsystem for the hash benchmark, shown in Figure 4.4a. The hash benchmark performs random probes into a hash table containing 8,388,608 entries, each 4-bytes. This type of access pattern causes problems for caches due to the lack of locality. In Figure 4.4a, memory accesses enter the top and accesses to main memory come out the bottom. There are two address transformations and a 262,144-byte scratchpad. The first address transformation toggles a bit of the address. The transformed address then enters the scratchpad. The second transformation reverses the first transformation so that the addresses remain unchanged as they enter the main memory (recall that address transformations are always inserted and removed in pairs).

49 [0,∞)

split @274944 [274944,∞) [0,274944) rotate 23 xor 2097152

cache 1024x4 direct WB spm 262144

rotate -23

xor 2097152

[0,∞)

cache 8192x32 (a) hash 2-way LRU WT

(b) mm

[0,∞)

split [0,∞) @134965056 split [0,134965056) [134965056,∞) @134513324 [0,134513324) [134513324,∞) cache 1024x4 cache 4096x4 cache 16384x4 2-way LRU WB 4-way FIFO WB 8-way PLRU WB

[0,∞)

[0,∞) (d) jpegd (c) bitcount

Figure 4.4: Superoptimized Memory Subsystems for the FPGA Target

50 The reason that the address transformation is beneficial for the hash benchmark is due to the random accesses to the hash table being slightly unbalanced. Removing the address transformation results in a very slight decrease in performance. If we remove the scratchpad completely, there is again only a slight decrease in performance. Here we note that the speedup is primarily due to the removal of the cache, which serves only to cause overhead when there is no locality. The scratchpad speeds up some of the accesses, but only a small fraction.

Another interesting memory subsystem, which also provides the greatest performance improvement, is discovered for mm: the matrix-matrix multiply benchmark. This benchmark performs a matrix-matrix multiply using the naive O(n³) algorithm with 256-by-256 matrices. Each element of the matrix is 4 bytes. The superoptimized memory subsystem for this benchmark is shown in Figure 4.4b. In the superoptimized memory subsystem for the mm benchmark, memory accesses enter the top and are then split, with accesses below address 274944 going directly to a 262,144-byte cache at the bottom of Figure 4.4b and accesses to addresses above and including 274944 going to a separate memory subsystem before going to the 262,144-byte cache. For accesses to addresses above and including 274944, first the bits of the address that select the word are rotated left by 23 bits. The accesses then enter a 4,096-byte, direct-mapped cache, and finally, the address is rotated right by 23 bits before entering the larger cache.

To understand why the memory subsystem for the mm benchmark provides such good performance, we consider the way the memory is organized for the benchmark. There are 3 matrices: two sources and a destination. The first source matrix, which is accessed in row-major order, is stored in addresses 0 through 262140. The second source matrix, which is accessed in column-major order, is stored at addresses 262144 through 524284. Finally, the destination matrix is stored at addresses 524288 through 786428.

With this memory organization in mind, we note that the address split moves most accesses for the second source matrix as well as the destination matrix into a separate memory subsystem. Within this subsystem, the addresses are transformed and then routed to a cache. Given that the second source matrix is accessed in column-major order, we access 00040000₁₆, 00040400₁₆, ..., 0007FC00₁₆ for the first column, then 00040004₁₆, 00040404₁₆, ..., 0007FC04₁₆ for the second column, and so on. However, after the split and address transformation, the addresses from the perspective of the 1024-entry cache look about like this: 00000000₁₆, 00000008₁₆, ..., 00000FF8₁₆ for the first column, 01000000₁₆, 01000008₁₆, ..., 01000FF8₁₆ for the second column, and so on. The result is that each column of the matrix is cached and can be reused 256 times before the next column is required.

Note that due to the layout of the matrices, one would expect that the ideal address for the split would be 262144 instead of 274944. Indeed, changing the split address results in a 0.46% improvement in performance. Thus, running the superoptimizer longer would likely result in an even better memory subsystem. Further, this implies that there may be better ways to propose offsets for splits.

A final observation about the memory subsystem for the mm benchmark is the large cache after the split. This cache has 32-byte cache lines, which allows it to prefetch values for the source matrix. Also, the cache is write-through rather than write-back, which prevents cache pollution due to writes to the destination matrix.

The memory subsystem discovered for the bitcount benchmark is shown in Figure 4.4c. This memory subsystem only provides a small performance improvement over the baseline (a speedup of less than 1%), but it also uses fewer block RAMs than the baseline memory subsystem (9 instead of 16). This feat is accomplished by splitting the address space between two caches. The first cache handles accesses to heap allocations whereas the second cache

52 handles accesses to the stack. This type of split is common for the benchmarks that have accesses to a separate stack and heap.

Finally, we consider the memory subsystem for the jpegd benchmark, shown in Figure 4.4d. For the jpegd benchmark, the superoptimizer selected a split memory subsystem where only memory accesses to addresses 134513324 and higher go to a cache. This causes accesses to the program stack to be cached, but not accesses to heap allocations.

Of the superoptimized memory subsystems for the FPGA target, none contained only a single-level cache component. Five of the memory subsystems contained splits (bitcount, fft, jpegd, mm, and sha), five contained scratchpads (dijkstra, hash, heap, patricia, and qsort), and five contained address transformations (dijkstra, hash, mm, patricia, and qsort). Further, all of the superoptimized memory subsystems performed better than the baseline memory subsystem, even if only marginally better in some cases.

4.4.2 ASIC Results

For the next set of experiments, we target a 45 nm ASIC process running at 1 GHz. Using CACTI [113] for area and timing results, we constrain the area to 1 mm². As with the FPGA target, the main memory is assumed to be the DDR3 device whose properties are shown in Table 4.1.

The best-case speedup for the ASIC target is shown in Figure 4.5. For the ASIC target, we assume that, in the best case, all memory accesses hit a scratchpad with a 1-cycle access time and cycle time. Here we see that the geometric-mean best-case speedup is 17×. As in the FPGA case, it is not necessarily possible to achieve such a speedup.

Figure 4.5: Best-case ASIC Speedup

Figure 4.6: Realized ASIC Speedup

Figure 4.6 shows the speedup that the optimized memory subsystem provides over the baseline memory subsystem. The geometric mean speedup is 6.52×. The superoptimizer is able to get more impressive speedups for the ASIC than the FPGA for two reasons. First, the ASIC is assumed to be running at a higher clock frequency than the FPGA (1 GHz versus 250 MHz), making a miss in the memory subsystem have a greater impact. Second, there are more trade-offs for the ASIC memory components. In particular, when targeting an ASIC, the optimizer uses the access time and cycle time results from CACTI rather than using a fixed access time and cycle time as is done for the FPGA.

The greatest increase in performance is again seen for the mm benchmark, whose memory subsystem is shown in Figure 4.7b. This memory subsystem has two sets of address rotations. The rotation by 27 bits causes every eighth entry of the first source matrix for 16384 entries to be stored in the first scratchpad, which has a cycle time of 1 cycle. Another 65536 entries of the first source matrix are stored in the second scratchpad, which has a cycle time of 3 cycles. Finally, the second set of rotations causes columns of the second source matrix to be cached in a way similar to the memory subsystem for the FPGA. Although the first address rotation may seem unnecessary, by reducing conflict misses in the cache, it actually improves the performance of the memory subsystem.

54 rotate 27

spm 262144 spm 65536

rotate -27 rotate 22

spm 262144

spm 65536 rotate 22

rotate -22 cache 256x4 8-way FIFO WT

(a) hash rotate -22

(b) mm

[0,∞) split @134590036 [134590036,∞) [0,134590036) [0,∞) cache 2048x4 cache 2048x4 4-way LRU WB 4-way FIFO WB

[0,∞) cache 8192x4 split 16-way LRU WB @135020348 [135020348,∞) [0,135020348) cache 8192x4 cache 1024x4 16-way PLRU WB 16-way LRU WB cache 2048x4 16-way LRU WB

[0,∞)

[0,∞) (c) bitcount cache 2048x4 4-way LRU WT

(d) jpegd

Figure 4.7: Superoptimized Memory Subsystems for the ASIC Target

55 The memory subsystem for the hash benchmark targeting the ASIC is shown in Figure 4.7a. As is the case with the mm benchmark, the subsystem for the hash benchmark is similar to the subsystem for the FPGA. However, rather than an xor transform, this subsystem uses a rotate. In addition, this subsystem incorporates two scratchpads instead of one.

The memory subsystem discovered for the bitcount benchmark, shown in Figure 4.7c, is similar to the memory subsystem discovered for the bitcount benchmark for the FPGA target, shown in Figure 4.4c. Note that the split offset is only slightly different. However, here we have a cache before the split rather than on the left side of the split.

The last memory subsystem we consider in detail is the memory subsystem for the jpegd benchmark, shown in Figure 4.7d. This memory subsystem is one of the most complex memory subsystems discovered. The split causes access to the memory in the stack space to be mapped to a 4-level cache. Finally, accesses to both the stack and heap are backed by a smaller cache. The four levels of cache in the split each have slightly different properties and removing any one of the caches causes a decrease in performance. Having separate, smaller caches such as this can be beneficial since smaller caches are faster than larger caches.

As is the case with the FPGA target, none of the superoptimized memory subsystems for the ASIC target contained only a single-level cache component. Four of the memory subsystems contained splits (bitcount, jpegd, patricia, and sha), six contained scratchpads (dijkstra, fft, hash, heap, mm, qsort) and six contained address transformations (dijkstra, fft, hash, heap, mm, and qsort). Further, like the FPGA target, all of the superoptimized memory subsystems performed better than the baseline memory subsystem.

4.4.3 Memory Subsystem Specificity

Finally, we consider how specific each of the memory subsystems is to the application for which the subsystem was superoptimized. Figure 4.8 shows a heat map comparing the results of running each of the 10 benchmarks with each of the 10 superoptimized memory subsystems for the FPGA target. The results are computed by dividing the total access time of the benchmark running with the memory subsystem superoptimized for that benchmark by the total access time of the benchmark running with each memory subsystem. In the figure, darker colors represent better performance.

Figure 4.8: FPGA Subsystem Specificity

Figure 4.9: ASIC Subsystem Specificity

In Figure 4.8, we see that the mm and bitcount benchmarks appear to run well only on the memory subsystems that are superoptimized for them. For the mm benchmark, the performance improvement from the rotate in its memory subsystem is significant enough to prevent any of the other memory subsystems from approaching the performance of the superoptimized subsystem. The memory subsystem for the heap benchmark contains only a scratchpad, which causes accesses to the start of the heap, which are most frequent, to be fast. However, such a structure is suboptimal for the other benchmarks, though the hash benchmark performs fairly well with the memory subsystem for the heap benchmark. In all cases, the memory subsystem that was superoptimized for a particular benchmark provides the best performance for that benchmark.

Figure 4.9 shows a heat map comparing the results of running each of the 10 benchmarks with each of the 10 superoptimized memory subsystems for the ASIC target. As is the case with the FPGA results, the benchmarks all perform best with the superoptimized memory subsystem for the particular benchmark. In fact, the results are more specific for the ASIC target than for the FPGA target, which is likely due to the fact that the ASIC target runs faster and has a more complex search space.

Given that the superoptimized memory subsystems are specific to the benchmark for which they were superoptimized, we note that the memory subsystem may further be specific to a particular run of the benchmark. To investigate this, we used a different input data set of the same size for each of the benchmarks for the ASIC target. For example, for the jpegd benchmark, a different input image of the same dimensions as the original was chosen. A comparison of the speedups over the baseline memory subsystem for the original data set and the new data set is shown in Figure 4.10.

In Figure 4.10, the lighter bars (on the left) show the speedup of the superoptimized memory subsystem over the baseline memory subsystem for the original data set and the darker bars (on the right) show the speedup for the modified data set. For many of the benchmarks there is little or no difference and in one case (dijkstra), the speedup actually improved. Overall, the geometric mean dropped from 6.43× to 6.27×. Although it is impossible to draw anything conclusive from these results, it appears that the effects of over-fitting are minimal.

Figure 4.10: Speedup with Different Inputs

4.5 Minimizing Writes

Until now we have been focused on reducing total memory access time. However, as previously noted, the superoptimization technique is generic and, therefore, can be used to optimize for other objectives. Here we investigate minimizing the number of writes to main memory.

4.5.1 Motivation

Although DRAM is the most popular choice for main memory in modern computer systems today [98], there are several disadvantages to the technology, leading researchers to seek other technologies. Two such problems with DRAM include its volatility and scaling issues. Because DRAM is volatile, meaning it requires periodic refresh, DRAM can be power-hungry since it requires power just to retain information. This is particularly apparent when used in a setting with infrequent memory accesses. However, when used in a setting with frequent memory accesses, the refresh requirement for a large DRAM can greatly reduce application performance [109]. As far as DRAM scaling is concerned, there are significant challenges to scaling down DRAM cells [74] since bits are stored as charge on a capacitor, which further limits energy efficiency and performance.

Several alternative main memory technologies have been proposed, including PCM [69] and STT-RAM [64]. Although there are many possible main memory technologies that could be considered, a common theme for many proposed main memory technologies is an aversion to writes. For PCM, there is a limited write endurance, making it beneficial to avoid writes to extend the lifetime of the device. Further, on PCM devices writes are slow and energy-hungry. For STT-RAM, although writes do not limit the lifetime of the device, writes are much slower than reads and consume more energy. Therefore, avoiding writes to the main memory is a likely objective when faced with such a technology.

Because writes are often costly with respect to energy, performance, and endurance, here we seek to determine if it is possible to modify the memory subsystem to reduce the number of writes to main memory. We are particularly interested in the possibility of reducing the number of writes beyond what a memory subsystem superoptimized for performance would provide.

4.5.2 Results

To demonstrate the use of our superoptimization technique for the reduction of writes, here we present results for several of the benchmarks mentioned in Section 4.3. We target the ASIC platform described in Section 4.4.2. The cost function used to guide the superoptimization process is the total writes to main memory for the complete execution of the benchmark application. Thus, we are no longer optimizing for performance, but exclusively for a reduction in writes to main memory.

To determine if the memory subsystem superoptimized for reducing writes is actually any better at reducing writes than a memory subsystem superoptimized for total access time, we compare the memory subsystems superoptimized for each of the objectives. For each benchmark, Figure 4.11 shows two improvements. The first is the improvement in total access time that the memory subsystem superoptimized for total access time has over the memory subsystem superoptimized for writes (that is, T_w/T_t, where T_t is the total access time when using the memory subsystem superoptimized for total access time and T_w is the total access time when using the memory subsystem superoptimized for writes). The second is the improvement in writes that the memory subsystem superoptimized for writes has over the memory subsystem superoptimized for total access time (that is, W_t/W_w, where W_w is the total number of writes to main memory when using the memory subsystem superoptimized for writes and W_t is the total number of writes to main memory when using the memory subsystem superoptimized for total access time). Thus, bars above one indicate an advantage of one superoptimized memory subsystem over another.

Figure 4.11: Write and Access Time Improvement

[0,∞)

split @135020348 cache 512x64 [135020348,∞) 4-way LRU WB [0,135020348) cache 1024x4 16-way LRU WB (a) Writes

[0,∞)

(b) Access Time

Figure 4.12: Superoptimized Memory Subsystems for bitcount

Here we expect all bars to be at least one, indicating that the memory subsystem superopti- mized for a particular objective is at least as good for that objective as a memory subsystem superoptimized for the other objective. Indeed, all bars in Figure 4.11 are one or greater. In some cases, there is little or no difference between the superoptimized memory subsystems. For example, for the bitcount benchmark, both memory subsystems reduce the number of writes to zero and the memory subsystem superoptimized for total access time provides only a slight improvement in total access time over the memory subsystem superoptimized for writes. However, in most cases, a different memory subsystem is able to provide the best results for either objective.

The memory subsystem superoptimized for writes for the bitcount benchmark is shown in Figure 4.12a and the memory subsystem superoptimized for total access time for the bitcount benchmark is shown in Figure 4.12b. As previously mentioned, both memory subsystems were able to reduce the number of writes to zero for this benchmark due to the small number of distinct addresses that are written. For this benchmark, the memory

62 cache 2048x4 4-way PLRU WB

rotate 27

xor 8388608

spm 4096 cache 2048x64 2-way LRU WB xor 8388608

(a) Writes rotate -27

cache 16384x4 direct WB

cache 2048x4 4-way LRU WB

(b) Access Time

Figure 4.13: Superoptimized Memory Subsystems for dijkstra

subsystem superoptimized for total access time provides only a small advantage over the simpler memory subsystem that was discovered to reduce writes to main memory.

For the dijkstra benchmark, the memory subsystem superoptimized for writes is shown in Figure 4.13a and the memory subsystem superoptimized for total access time is shown in Figure 4.13b. As with the bitcount benchmark, both memory subsystems are able to reduce the number of writes to zero. However, here we note that the very unusual memory subsystem that was discovered when superoptimizing for total access time is able to provide

63 a much greater reduction in total access time than the memory subsystem discovered when superoptimizing for writes. Considering both subsystems were able to reduce the number of writes to zero, it is unsurprising that a simpler subsystem can be used if we do not care about total access time.

spm 131072 spm 131072

xor 131072 rotate 22

cache 1024x16 cache 1024x16 4-way LRU WB 4-way LRU WB

spm 32768 spm 32768

xor 131072 rotate -22

(a) Writes (b) Access Time

Figure 4.14: Superoptimized Memory Subsystems for heap

The next memory subsystems we consider are those superoptimized for the heap kernel. The memory subsystem superoptimized to minimize writes is shown in Figure 4.14a and the memory subsystem superoptimized to minimize total access time is shown in Figure 4.14b. Interestingly, these memory subsystems are very similar with the only difference being the address transformation. Despite the similar appearance, each of the memory subsystems is able to provide a benefit over the other.

The memory subsystem superoptimized for writes for the jpegd benchmark is shown in Figure 4.15a and the subsystem superoptimized for access time is shown in Figure 4.15b. Again, the memory subsystem that was superoptimized to minimize total access time is

64 [0,∞) split @134590036 [134590036,∞) [0,134590036) cache 2048x4 4-way LRU WB

cache 8192x4 16-way LRU WB

cache 65536x4 cache 8192x4 4-way LRU WB 16-way PLRU WB

cache 2048x4 (a) Writes 16-way LRU WB

[0,∞) cache 2048x4 4-way LRU WT

(b) Access Time

Figure 4.15: Superoptimized Memory Subsystems for jpegd

much more complex than the one to minimize writes. The memory subsystems for this benchmark exhibit the most specificity to their respective objectives of all the benchmarks attempted.

The memory subsystem superoptimized for writes for the patricia benchmark is shown in Figure 4.16a and the subsystem superoptimized for total access time is shown in Figure 4.16b. An interesting observation is the large and highly-associative caches that are used when minimizing writes is the objective. These caches are effective at eliminating writes, but they are quite slow.

Finally, we consider the memory subsystems for the qsort benchmark. Figure 4.17a shows the memory subsystem superoptimized for writes and Figure 4.17b shows the memory subsys-

65 [0,∞) cache 2048x4 4-way LRU WB

[0,∞)

cache 8192x4 split 64-way LRU WB @134971980 [0,134971980) [134971980,∞)

cache 262144x1 cache 2048x4 cache 8192x4 16-way LRU WB 4-way PLRU WB 16-way LRU WT

(a) Writes [0,∞)

(b) Access Time

Figure 4.16: Superoptimized Memory Subsystems for patricia

tem superoptimized for total access time. One notable difference between these subsystems is the presence of the prefetch component in the memory subsystem superoptimized for total access time.

Overall, memory subsystems superoptimized to minimize total access time appear to be capable of large reductions in total access time over memory subsystems superoptimized to minimize writes. On the other hand, while a memory subsystem superoptimized for writes is often able to reduce the number of writes compared to a memory subsystem superoptimized for total access time, the improvement is usually less pronounced. An explanation for this is that usually the total access time decreases as the number of reads and writes to main memory decrease.

Another observation is that the memory subsystems superoptimized for writes are usually simpler than those superoptimized for total access time. Although a large cache will typically eliminate writes, the large cache will likely be slow. This implies that a large cache may be

66 xor 16

cache 8192x16 8-way PLRU WB cache 1024x16 4-way PLRU WT prefetch 64 cache 8192x16 4-way LRU WB cache 512x16 4-way LRU WB

(a) Writes xor 16

(b) Access Time

Figure 4.17: Superoptimized Memory Subsystems for qsort

sufficient if we only care about writes, but something more exotic will likely provide better results if we want to minimize total access time.

4.6 Multi-Objective Superoptimization

Here we investigate multi-objective superoptimization. From the previous section, we note that the memory subsystems that are superoptimized to minimize total access time are fairly good at reducing the number of writes to main memory; however, the memory subsystems superoptimized to minimize writes usually do better. On the other hand, the memory subsystems that are superoptimized to minimize writes often perform poorly with respect to total access time. Thus, one might wonder if it is possible to optimize for both objectives.

We use the weighted sum method (see [75]) to combine the objective functions to minimize writes and total access time. Figure 4.18 shows the improvement possible for various objectives for the jpegd benchmark with objective weights ranging from 100%-writes, 0%-access time through 0%-writes, 100%-access time. The graph shows the performance relative to the best result for writes and total access time. For the bars on the left, the graph shows W_w/W_m, where W_m is the number of writes to the main memory when using the memory subsystem superoptimized for multiple objectives and W_w is the number of writes to the main memory when using the memory subsystem superoptimized to minimize writes. For the bars on the right, the graph shows T_t/T_m, where T_m is the total access time when using the memory subsystem superoptimized for multiple objectives and T_t is the total access time when using the memory subsystem superoptimized to minimize total access time. Thus, higher values (closer to 1) indicate better results. As can be seen in the graph, the largest differences occur when only a single objective is considered. When multiple objectives are considered, although there is some difference in how good the memory subsystems are, the result is very close to the best for all mixtures.

Figure 4.18: Multi-Objective Superoptimization

10w+90a T m W stettlacs iewe using when time access total the is w sand es nei o odtememory the good how in ence stenme fwie othe to writes of number the is xtures. 0w+100a so h et h rp shows graph the left, the on rs scniee.We multi- When considered. is e e h efrac relative performance the ses oa cestm.Thus, time. access total T omnmz rts For writes. minimize to t emmr subsystems memory he stettlacs time access total the is 0%-access , y cache 1024x128 2-way LRU WB cache 65536x4 4-way LRU WB cache 16x64 direct WT (a) 100w + 0a (b) 90w + 10a

[0,∞) split @134590036 [134590036,∞) [0,134590036) cache 2048x4 4-way LRU WB cache 1024x64 2-way LRU WB cache 8192x4 16-way LRU WB cache 1024x256 4-way LRU WB cache 8192x4 16-way PLRU WB

spm 2048 cache 2048x4 16-way LRU WB

(c) 10w + 90a [0,∞) cache 2048x4 4-way LRU WT

(d) 0w + 100a

Figure 4.19: Memory Subsystems for jpegd

Figure 4.19 shows the memory subsystems for each mixture. When minimizing writes is most important, we see that a simple cache suffices. However, when minimizing total access time is also important, the large cache is separated into two caches, which makes sense since smaller caches are faster. Finally, when writes are no longer considered, a very complex memory subsystem is discovered, which does little to minimize writes, but provides the lowest total access time of all the memory subsystems considered.
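To make the weighted-sum combination concrete, the following sketch (not taken from the superoptimizer; the normalization against a baseline and all numbers are illustrative assumptions) combines normalized writes and access time into a single cost and sweeps the mixtures shown in Figure 4.18.

```python
def weighted_cost(writes, access_time, w_writes, baseline_writes, baseline_time):
    """Combine two objectives into one cost via the weighted-sum method.

    Both objectives are normalized against a baseline so that the weights are
    comparable; w_writes is the weight on writes (0.0 to 1.0) and the
    remaining weight falls on total access time.
    """
    w_time = 1.0 - w_writes
    return (w_writes * writes / baseline_writes
            + w_time * access_time / baseline_time)

if __name__ == "__main__":
    baseline = {"writes": 1_000_000, "time": 5_000_000}   # illustrative numbers
    candidate = {"writes": 400_000, "time": 2_500_000}    # illustrative numbers
    for w in (1.0, 0.9, 0.1, 0.0):                        # 100w+0a ... 0w+100a
        cost = weighted_cost(candidate["writes"], candidate["time"], w,
                             baseline["writes"], baseline["time"])
        print(f"{int(w * 100)}w+{int((1 - w) * 100)}a: cost = {cost:.3f}")
```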

4.7 Summary

In this chapter, we have shown that it is possible to superoptimize memory subsystems for specific applications that out-perform a general-purpose memory subsystem in terms of either performance or writes. Unlike previous work, the memory subsystems that our superoptimizer discovers can be arbitrarily complex and contain components other than simple caches. To superoptimize a memory subsystem, we use old bachelor acceptance, which is a form of threshold acceptance. We are then able to improve the discovery process by using information from the address trace.

This work targets both an FPGA as well as an ASIC process. For the FPGA target, we have validated the discovered memory subsystems by generating VHDL for each of the subsystems. The VHDL was then synthesized to ensure that the discovered memory subsystems are realizable at the required frequency. For the ASIC process, we used the CACTI [113] tool to get area and time estimates for each of the memory components.

An obvious shortcoming of the superoptimization technique presented so far is that it only works on single-threaded applications. Thus, due to the inherently parallel nature of ASIC and FPGA devices, this approach has limited applicability. Nevertheless, this technique is applicable for single-threaded applications and parallel applications in which only a single

thread accesses main memory. In the next chapter we extend the superoptimization approach to a class of parallel applications.

Chapter 5: Memory Subsystems for Streaming Applications

This chapter expands on the methods presented in Chapter 4 to superoptimize memory subsystems for streaming applications. In addition, an empirical validation of the superoptimized memory subsystems is provided for applications deployed on an FPGA target. This chapter is based on [124].

5.1 Introduction

Modern computer systems are becoming increasingly parallel [90]. This is especially true of ASICs and FPGAs, which are naturally parallel devices and likely targets for superoptimized memory subsystems. Thus, here we investigate extending the notion of application-specific memory subsystems to parallel applications. In particular, we consider streaming applications, which are well-suited to heterogeneous systems consisting of general-purpose processors, FPGAs, GPUs, and other devices.

Streaming is a parallel programming paradigm where application kernels communicate over fixed communication channels. The streaming paradigm is used in systems such as ScalaPipe (described in Section 3.1) and StreamIt [112], among many others [17, 29, 45, 46, 105]. Within the streaming paradigm, conceptually, each kernel has its own independent memory address space. Communication between kernels is performed via communication channels

implemented as FIFO buffers. Unlike a single-threaded application, which has a single memory subsystem to optimize, a streaming application can potentially have a separate memory subsystem for each kernel. In addition, each communication channel or FIFO between kernels is yet another memory subsystem to be optimized.

Due to the number of memory subsystems in a streaming application, the already complex problem of superoptimizing a single-threaded address trace is compounded. This is because, in addition to a shared resource constraint, the performance of one kernel can affect another both directly, by moving the bottleneck, and indirectly, by consuming excessive main memory bandwidth. Thus, we use a heuristic to guide the search to those memory subsystems that are most likely to benefit the application.

To evaluate our superoptimized memory designs, we target an FPGA with an external LPDDR (low-power double data rate) main memory. The FPGA device is a Xilinx Spartan-6 LX45 clocked at 100 MHz. The external LPDDR is a 512 Mib device clocked at 100 MHz. All memory subsystems share access to the external LPDDR memory device. The Spartan-6 LX45 has 116 block RAMs (BRAMs), which we use to implement our custom memory subsystems. Each BRAM is 18 Kib, providing a total of 2,088 Kib on-chip memory.

By evaluating our applications on a physical device, we show that it is possible to achieve real performance improvements over a generic memory subsystem with minimal extra effort.

5.2 Method

Given a streaming application to be deployed on either an ASIC or FPGA, the process to create a custom memory subsystem consists of several steps. First, the design without a memory subsystem is evaluated to determine what ASIC or FPGA resources are not used and, therefore, available for the memory subsystem. Next, an address trace is gathered for

the application. This address trace is then fed into the memory subsystem superoptimizer, which proposes memory subsystems and simulates them to determine their performance. Finally, the memory subsystem generator is used to generate a custom HDL design for the application to use.

5.2.1 Address Traces

To superoptimize a memory subsystem, we first require an address trace. Unlike single- threaded address traces, we require a separate trace per kernel (note that each kernel has its own memory subsystem). In addition, communication between kernels must be recorded to allow us to accurately model the parallel kernels and optimize the size of the FIFOs between the kernels. Finally, some notion of the computation time between memory accesses must be recorded to accurately predict how long each kernel will run relative to other kernels.

Figure 5.1: Split-Join Topology (kernel A feeds kernels B and C over channels 1 and 2; kernels B and C feed kernel D over channels 3 and 4)

Consider the simple streaming application topology shown in Figure 5.1. The vertices of the graph represent kernels and the edges represent communication channels. Here we have four kernels (A, B, C, and D) where kernel A produces data on two channels (1 and 2) and kernel D consumes data on two channels (3 and 4).

Each kernel has a separate address trace. For example, the trace for kernel B might look something like:

Consume an element from channel 1
Read 4 bytes from address 0x1234
Perform a computation taking 8 cycles
Write 8 bytes to address 0x200
Produce an element on channel 3

Recording the interaction over the communication channels as produce and consume allows the superoptimizer to change the size of the FIFOs used for the communication channels without affecting correctness of the application provided the FIFOs are at least as large as the application requires.
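For illustration, a per-kernel trace entry of this form might be represented as follows; the event names and field layout are assumptions for this sketch and are not ScalaPipe's actual trace format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceEvent:
    """One entry in a per-kernel trace (illustrative layout)."""
    kind: str                       # "consume", "produce", "read", "write", "compute"
    channel: Optional[int] = None   # channel id for produce/consume
    address: Optional[int] = None   # byte address for read/write
    size: int = 0                   # bytes for read/write
    cycles: int = 0                 # compute cycles between memory accesses

# The example trace for kernel B from the text:
kernel_b_trace = [
    TraceEvent("consume", channel=1),
    TraceEvent("read", address=0x1234, size=4),
    TraceEvent("compute", cycles=8),
    TraceEvent("write", address=0x200, size=8),
    TraceEvent("produce", channel=3),
]
```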

Although recording an address trace for split kernels (such as kernel A in Figure 5.1) and join kernels (such as kernel D) would provide a valid trace, such a trace may not give the superoptimizer sufficient freedom to resize the communication channels. For example, if kernel A were a load balancer, it might output more items to one channel than the other depending on its ability to write to the channel. To handle such situations, our simulator is capable of modeling certain split and join kernels internally without using an address trace.

There are several ways to obtain address traces. Because our benchmarks are implemented in ScalaPipe, we modified ScalaPipe to have the ability to dump an address trace for kernels mapped to processor cores. This allows us to first run the application on general-purpose processor cores to gather the address traces. After an address trace is gathered, the application can be mapped to an FPGA device for deployment with a custom memory subsystem.

An additional benefit to using ScalaPipe to gather the traces is that, since ScalaPipe is capable of high-level synthesis, we can also record the number of cycles that the computation will take between memory accesses in the address trace. This information allows the superoptimizer to divide the memory resources among the kernels more effectively.

For benchmarks not implemented in ScalaPipe, it is possible to manually instrument the application to generate the required trace data. For example, an application implemented in a hardware description language (HDL), such as VHDL or Verilog, could be manually instrumented and then run in a simulator.

5.2.2 Simulation

To evaluate the performance of the memory subsystems proposed by the superoptimizer, we use the custom trace-based memory simulator described in Section 3.2. For the experiments presented here, we use the fact that the simulator reports the total number of cycles that the application would take to run on our target platform. For the main memory, the simulator assumes that there is a priority arbiter in front of the main memory with a single read/write port. The main memory is modeled as a DRAM device with the parameters shown in Table 5.1, which were chosen to model closely the main memory on our experimental platform.
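As a rough illustration of how the parameters in Table 5.1 translate into an access latency under a closed-page policy, consider the sketch below; it ignores refresh, arbitration, and bus turnaround, and is a simplification rather than the simulator's actual model.

```python
def closed_page_access_cycles(cas=3, rcd=3, rp=3, burst=8, ddr=True):
    """Rough per-access latency for a closed-page DRAM access (illustrative).

    activate (RCD) + column access (CAS) + burst transfer, with the precharge
    (RP) charged to the same access since the page is closed again afterwards.
    """
    transfer = burst // 2 if ddr else burst   # DDR moves two columns per cycle
    return rcd + cas + transfer + rp

if __name__ == "__main__":
    cycles = closed_page_access_cycles()
    print(f"~{cycles} cycles at 100 MHz = ~{cycles * 10} ns per 16-byte access")
```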

5.2.3 Optimization

As before, we use old bachelor acceptance [51] to guide the superoptimization process. Unfortunately, the search space is much larger when multiple kernels are considered than it is for single-threaded applications. Because of this, in addition to the address selection heuristic presented in Chapter 4, here we use a heuristic to guide the superoptimizer to spend more

effort exploring the memory subsystems that are most likely to benefit the application. To do this, the memory subsystems for each kernel and FIFO are weighted by the product of their resource usage and their total memory access time. The superoptimizer then randomly selects a memory subsystem to modify based on these weights. This causes the superoptimizer to spend more time on those memory subsystems that consume a large portion of the resources and those memory subsystems that can gain the most benefit from an improved memory subsystem.

Parameter     Description                     Value
Frequency     DRAM I/O frequency              100 MHz
CAS           Cycles to select a column       3
RCD           Cycles from open to access      3
RP            Cycles required for precharge   3
Page size     Size of a page in bytes         1024
Page count    Number of pages per bank        8192
Width         Channel width in bytes          2
Burst size    Number of columns per access    8
Page mode     Open or closed page mode        closed
DDR           Double data rate                true

Table 5.1: Main Memory Parameters
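The weighting heuristic described above can be sketched as follows; the data structure and the resource and timing numbers are illustrative assumptions.

```python
import random

def pick_subsystem(subsystems):
    """Randomly pick a memory subsystem to modify, weighted by
    resource usage x total access time, as described above."""
    weights = [s["resources"] * s["access_time"] for s in subsystems]
    return random.choices(subsystems, weights=weights, k=1)[0]

# Illustrative example: two kernel subsystems and one FIFO.
subsystems = [
    {"name": "kernelA", "resources": 12, "access_time": 4_000_000},
    {"name": "kernelB", "resources": 2,  "access_time": 300_000},
    {"name": "fifo_AB", "resources": 1,  "access_time": 50_000},
]
print(pick_subsystem(subsystems)["name"])
```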

Since our target device is an FPGA, we constrain the superoptimization process by FPGA resources. Specifically, we constrain the superoptimization process such that the final application uses no more than 80% of the slices and no more than 80% of the BRAMs available on the FPGA. By constraining the resources to 80%, we prevent the design from becoming too congested, which could prevent the design from being routed or meeting timing closure. Note that this resource constraint differs from the constraint used in Chapter 4 in that here we are constraining based on both slices and block RAMs whereas in Chapter 4 the only constraint was the block RAMs. The additional constraint on slices makes it easier to fit the application on the FPGA alongside the memory subsystem (before we were concerned only with the memory subsystem).

In addition to the resource constraints, we put a lower bound of 100 MHz on the system clock for the design. The clock constraint prevents the superoptimizer from slowing down the computation with an overly-complex memory subsystem.

To enforce the resource constraints, rather than build each proposed memory subsystem, the superoptimizer tracks the resource usage of each memory subsystem component by storing the results of synthesis runs in a database. The sum of the resources used for the components is then used in the superoptimization process. To ensure the design will run at the required frequency, memory subsystem components whose synthesis estimates are less than 100 MHz are not considered.
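A sketch of how per-component synthesis results might be combined to enforce these constraints is shown below; the database entries, slice total, and helper names are invented for illustration (only the 116 BRAMs and the 80% and 100 MHz limits come from the text).

```python
# Illustrative synthesis database: component name -> (slices, BRAMs, max MHz).
SYNTH_DB = {
    "cache 8192x16 4-way LRU WB": (420, 8, 142.0),
    "spm 16384":                  (35,  8, 210.0),
    "prefetch 32":                (60,  0, 188.0),
}

TOTAL_SLICES, TOTAL_BRAMS = 6822, 116   # BRAM count from the text; slices illustrative
LIMIT = 0.80                            # no more than 80% of slices and BRAMs
MIN_FREQ_MHZ = 100.0                    # slower components are not considered

def feasible(components, app_slices, app_brams):
    """Check a proposed memory subsystem against the resource constraints."""
    slices, brams = app_slices, app_brams
    for name in components:
        s, b, mhz = SYNTH_DB[name]
        if mhz < MIN_FREQ_MHZ:          # too slow to meet the clock constraint
            return False
        slices += s
        brams += b
    return slices <= LIMIT * TOTAL_SLICES and brams <= LIMIT * TOTAL_BRAMS

print(feasible(["spm 16384", "prefetch 32"], app_slices=3000, app_brams=40))
```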

Although the constraints on BRAMs and slices are fairly conservative, the constraint on frequency could easily be broken with too complex of a design. To address this, we maintain an estimate of the maximum path length in the superoptimizer and use the estimate as an additional constraint.

5.2.4 Subsystem Generation

Once a memory subsystem has been superoptimized, we use an automatic memory subsystem generator (see Chapter 3 for details) to generate a VHDL description of the memory subsystem. This subsystem generator is capable of generating all of the memory subsystems shown in Table 3.1. Each memory subsystem has a simple SRAM-style interface with per-byte write enables. The memory subsystems are connected to the main memory using a priority arbiter capable of allowing multiple outstanding main memory requests (one for each memory subsystem).

The word size of various components in the memory subsystem need not be equivalent. To handle this, an adapter is inserted between each component in the memory subsystem. For

example, if there is a two level cache where the first level has a word size of 8 bytes and the second level has a word size of 16 bytes, the adapter will direct the reads to the correct part of the larger word and set the byte mask appropriately for writes. If the second level cache has the smaller word size, each access from the first level cache will be turned into multiple accesses. For simplicity, the word size is restricted to a power of two. Note that this differs from the method used in Chapter 4. In Chapter 4 all levels of the memory subsystem used a fixed word size whereas this restriction has been lifted for these results.
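For the case where the downstream component has the larger word, the adapter's address and byte-mask computation can be sketched as follows (power-of-two word sizes assumed, as in the text; the function name is illustrative).

```python
def adapt_up(address, size, down_word):
    """Map an access of `size` bytes at `address` onto a downstream component
    with a larger word of `down_word` bytes (both powers of two).

    Returns the downstream word address, the byte offset within that word,
    and the byte-enable mask used for writes.
    """
    word_addr = address & ~(down_word - 1)      # align to the larger word
    offset = address & (down_word - 1)          # where the data lives in the word
    mask = ((1 << size) - 1) << offset          # one bit per enabled byte
    return word_addr, offset, mask

# Example from the text: first-level word of 8 bytes, second-level word of 16.
word_addr, offset, mask = adapt_up(address=0x1238, size=8, down_word=16)
print(hex(word_addr), offset, bin(mask))        # 0x1230, offset 8, upper-half mask
```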

5.3 Benchmarks

Following is a description of the benchmarks used to evaluate our custom memory subsystems. All of the benchmarks are implemented in ScalaPipe [120]. ScalaPipe is a streaming application generator that allows one to author an application in a high-level language and then generate code for deployment on CPUs and FPGAs.

We have enhanced ScalaPipe with the ability to generate applications that output memory address traces for kernels deployed on standard processor cores and to use our custom memory subsystems for kernels deployed on FPGAs. This allows us to first deploy the application on processor cores to generate the address traces and then generate the application on FPGA cores for deployment with our custom memory subsystems.

Merge Sort

The first benchmark we consider is the merge benchmark, which is a merge sort application capable of sorting up to one million 32-bit integers. This benchmark makes use of a generic merge kernel with a single input channel and a single output channel. The kernel is replicated ⌈lg n⌉ times to sort n elements, as shown in Figure 5.2. Each kernel in the pipeline sorts

sequences of elements 2× longer than the sequences from the preceding kernel by using an internal buffer to store half the elements. This sort algorithm is described in detail in [18].
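A batch software model of one such merge stage is sketched below; this is an illustration of the behavior, not ScalaPipe code, and a real kernel streams elements rather than materializing the whole list.

```python
def merge_stage(stream, run_len):
    """One pipeline stage: consume sorted runs of length `run_len` and emit
    sorted runs of length 2*run_len, buffering half the elements internally."""
    items = list(stream)
    out = []
    for start in range(0, len(items), 2 * run_len):
        left = items[start:start + run_len]              # the internally buffered run
        right = items[start + run_len:start + 2 * run_len]
        i = j = 0
        while i < len(left) and j < len(right):          # standard two-way merge
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:]); out.extend(right[j:])
    return out

# Sorting 8 values with ceil(lg 8) = 3 stages handling runs of 1, 2, and 4.
data = [5, 3, 8, 1, 9, 2, 7, 4]
for length in (1, 2, 4):
    data = merge_stage(data, length)
print(data)    # [1, 2, 3, 4, 5, 7, 8, 9]
```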

Due to the memory requirements of sorting one million integers, this benchmark requires off-chip memory. However, exactly how the BRAM resources of the FPGA should be divided up among the kernels and FIFOs is not immediately apparent.

Figure 5.2: merge Topology (Input feeding a pipeline of Merge-2, Merge-4, ..., Merge-m kernels, ending at Output)

Figure 5.3: nbody Topology (Input, Buffer, Streamer, Force, Accumulate, Update, and Output, with Update feeding back to Buffer)

n-Body

The next benchmark we consider is the nbody benchmark, which is an application to simulate the 3-dimensional n-body problem using the naive O(n²) algorithm. An n-body simulation predicts the positions and velocities of point masses in space at various times given their position, mass, and velocity. The naive algorithm updates each point by considering the

gravitational effect of all other points using the following equation:

F_j = G \sum_{i \neq j} \frac{m_j m_i (\vec{p}_j - \vec{p}_i)}{\lvert \vec{p}_j - \vec{p}_i \rvert^3}

where G is the gravitational constant, m_i is the mass of the i-th particle, and \vec{p}_i is the position of the i-th particle.
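A direct transcription of this force computation into code is sketched below; it is a software model of what the Force and Accumulate kernels compute, not the ScalaPipe kernel itself, and the example masses and positions are arbitrary.

```python
def total_forces(masses, positions, G=6.674e-11):
    """Naive O(n^2) evaluation of the sum above: for each particle j,
    accumulate G * m_j * m_i * (p_j - p_i) / |p_j - p_i|^3 over all i != j."""
    n = len(masses)
    forces = []
    for j in range(n):
        fx = fy = fz = 0.0
        for i in range(n):
            if i == j:
                continue
            # Difference vector (p_j - p_i), following the equation above.
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dz = positions[j][2] - positions[i][2]
            inv_r3 = (dx * dx + dy * dy + dz * dz) ** -1.5
            s = G * masses[j] * masses[i] * inv_r3
            fx += s * dx; fy += s * dy; fz += s * dz
        forces.append((fx, fy, fz))
    return forces

print(total_forces([1.0e6, 2.0e6], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```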

The topology of the nbody benchmark is shown in Figure 5.3. In the nbody benchmark, the Input kernel reads the initial positions of each particle to be simulated. The Buffer kernel buffers the points for the next iteration (or from input on the first iteration). Next, the Streamer kernel sends the particles past the Force kernel, which computes the forces on each particle. The Accumulate kernel then sums the forces on each particle. Once the total force on a particle has been computed, the Update kernel updates the particle’s position and velocity, sending the results to both the Output and Buffer kernels. Finally, the Output kernel saves the results.

In this benchmark, there are two kernels that use off-chip memory: the Buffer kernel and the Streamer kernel. In addition to the memory subsystems used by these two kernels, there are eight FIFOs to be optimized. Although it would be possible to simulate a small number of particles without using off-chip memory, larger problems necessitate the use of off-chip memory, leaving us to determine how to best use the BRAM resources.

Laplace

The laplace benchmark is an application to solve Laplace’s equation using a Monte-Carlo technique [96]. Laplace’s equation is a second-order partial differential equation (PDE) that can be used to model steady-state heat diffusion [108]:

\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0

Given the temperature at the boundaries of an object, solutions to Laplace’s equation provide the interior temperatures at equilibrium. Like the n-body simulation, the application to solve Laplace’s equation is easily decomposed into a streaming application, as shown in Figure 5.4.

Figure 5.4: laplace Topology (RNG feeds Split, which feeds two Walk kernels; their results go to Avg and then Output)

In the laplace benchmark, random numbers are generated using the Mersenne twister [77] random number generator in the RNG kernel. The Split kernel divides the random numbers between the two Walk kernels, which perform random walks from each position of interest. Next, the Avg kernel averages the results of the random walks and sends the output to the Output kernel.
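A small software model of what the Walk and Avg kernels compute is sketched below; the grid size, boundary condition, and walk count are illustrative assumptions.

```python
import random

def walk_once(x, y, n, boundary, rng):
    """Random walk on an n x n grid from (x, y) until a boundary cell is hit;
    return the boundary temperature at the exit point."""
    while 0 < x < n - 1 and 0 < y < n - 1:
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = x + dx, y + dy
    return boundary(x, y)

def estimate_temperature(x, y, n, boundary, walks=10_000, seed=1):
    """Monte-Carlo estimate of the Laplace solution at (x, y): the average of
    the boundary temperatures reached by many random walks (the Avg kernel)."""
    rng = random.Random(seed)
    return sum(walk_once(x, y, n, boundary, rng) for _ in range(walks)) / walks

# Illustrative boundary: top edge held at 100 degrees, the rest at 0.
hot_top = lambda x, y: 100.0 if y == 0 else 0.0
print(estimate_temperature(8, 8, n=17, boundary=hot_top))
```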

The only kernel in this benchmark to use a memory array is the RNG kernel, which uses 2,496 bytes of memory. So although this benchmark does not require the use of off-chip memory, off-chip memory could potentially be used effectively for the RNG kernel and one or more of the FIFOs.

Matrix-Matrix Multiply

The mm benchmark is a streaming application to perform matrix-matrix multiplication on two 256x256 matrices of 32-bit floats. The topology is shown in Figure 5.5.

Figure 5.5: mm Topology (MatrixA and MatrixB feed Distribute, which streams to Product1 through Productn, then Combine, then Output)

Figure 5.6: median Topology (Input, Hash, Heap, Output)

In the mm benchmark, the source matrices are provided by the MatrixA and MatrixB kernels. The Distribute kernel holds the matrix data and streams it past the Product kernels. Each Product kernel performs a dot product. In our experiments, we use two Product kernels. Next, the Product kernels send the dot products to the Combine kernel, which collects the results in the correct order. Finally, the Output kernel outputs the results.

With this benchmark, only the Distribute kernel uses a memory array: an array to store the matrices totaling 524,288 bytes. In addition to the memory subsystem in the Distribute kernel, there are FIFOs connecting all of the kernels, which could potentially be resized.

Median

Finally, we consider the median benchmark, which is an application to find the median of a stream of up to one million unique integers. This benchmark is a simple two-stage pipeline,

shown in Figure 5.6, where the Hash stage removes duplicates using an open-address hash table and the Heap stage uses a binary heap to recover the median value.

In the median benchmark, both the Hash and the Heap kernels require more memory than the FPGA has available. The Hash kernel uses 8 MiB and the Heap kernel uses 4 MiB. In addition to the memory subsystems for the kernels, the FIFO between the kernels is another subsystem whose size and implementation is to be optimized.

5.4 Results

As a baseline, we implement all FIFOs as registers (FIFOs that can hold a single element). All memory subsystems for kernels are connected directly to the arbiter for the main memory. This type of memory structure uses the least amount of area on the FPGA device and requires the least amount of effort to implement. Thus, although it might not be the final design for a particular application, it does represent a likely starting point.

Figure 5.7 shows the speedup of the superoptimized memory subsystems over the baseline for each benchmark as reported by the memory simulator (the g-mean bar shows the geometric mean). Because the simulator takes into account computation time as well as memory access time, the simulated speedup should be an accurate representation of the actual speedup one would expect to obtain by running the application on the physical device. However, there are two potential sources of error. The first source of error is the main memory model, which does not take all possible parameters into account (for example, refresh is not included in the simulated memory model). The other source of error is the application input and output, which is done over a USB interface.

Figure 5.8 shows the actual speedup from running each benchmark on the FPGA device described in Section 5.1. The first bar in each group shows the speedup over the baseline for

the superoptimized memory subsystem; as before, the g-mean bar shows the geometric mean. In addition to a comparison of the superoptimized memory subsystem against the baseline, we also compare a generic memory subsystem as well as a naive memory subsystem. For the generic memory subsystem, each kernel memory subsystem has an 8 KiB direct-mapped cache and each FIFO is 256 items deep, implemented in BRAM. This generic memory subsystem demonstrates the performance one might expect from a memory subsystem that was selected without considering the implementation details of the kernels. For the naive memory subsystem, no BRAM is used for the kernel memory subsystems and instead each FIFO is a 256-item buffer implemented in main memory. The naive memory subsystem demonstrates a worst-case memory subsystem where every access contends for main memory. In both cases, the plots show the speedup relative to the baseline described earlier.

Figure 5.7: Simulated Speedup (speedup over the baseline for laplace, median, merge, mm, nbody, and g-mean)

Figure 5.8: Actual Speedup (speedup over the baseline for laplace, median, merge, mm, nbody, and g-mean; bars for the superoptimized, generic, and naive subsystems)

Comparing the actual results to the simulated results, we see that in most cases the actual speedup was slightly higher than the simulated speedup. This is due to the fact that reducing the number of main memory accesses improves performance more than the simulated memory model predicts. However, for the laplace benchmark, the actual speedup is less than predicted. Again, this is due to the main memory model since, as we will see later, two of the FIFOs between kernels were moved into main memory rather than using BRAMs.

Laplace

The laplace benchmark exhibits a smaller speedup than the simulation would imply. For the laplace benchmark, the superoptimizer selected a 4,096-byte scratchpad for the RNG kernel. This moves all memory accesses from the RNG kernel into the faster BRAM, avoiding the main memory completely. In addition, several of the FIFO sizes were adjusted, as shown in Table 5.2.

FIFO            Depth   Implementation
RNG → Split     1       register
Split → Walk1   256     main memory
Split → Walk2   256     BRAM
Walk1 → Avg     64      main memory
Walk2 → Avg     8       BRAM
Avg → Output    1       register

Table 5.2: laplace FIFO Implementations

Because the superoptimizer tries to find the memory subsystem that provides the lowest execution time using as few resources as possible, several of the FIFOs are implemented in main memory rather than directly in BRAM. According to the simulation model, this does not slow down the benchmark since the computation time is able to hide the memory latency. However, since the main memory model is imprecise, there is a benefit to implementing the FIFOs in BRAM that is unknown to the superoptimizer. By implementing all of the

FIFOs in BRAM, we are able to obtain a speedup slightly better than the simulation model estimates.

For the laplace benchmark, the superoptimized memory subsystem provides more than a 3× speedup over the baseline. However, the generic memory subsystem provides a similar speedup. This is because this benchmark is very sensitive to the size of the FIFOs. Because of this, even the naive memory subsystem offers a performance improvement over the baseline memory subsystem due to the increased FIFO sizes.

Median

For the median benchmark, the superoptimized memory subsystem for the Hash kernel is shown in Figure 5.9. In the figure, memory accesses from the kernel enter the top and memory accesses to the main memory exit the bottom. In this particular memory subsystem, the address is transformed by flipping a bit (xor). The address transformation is followed by a 16,384-byte scratchpad, which is followed by a single-entry cache having a single line that is 16 bytes (the WB in Figure 5.9 stands for write-back). Finally, the last address transformation reverses the first transformation. Note that the superoptimizer automatically inserts address transformations in pairs like this to ensure the correct section of main memory is accessed.

The effect of the address transformation is to move certain parts of the hash table into the scratchpad. The cache can be helpful here since the main memory interface is 16-bytes wide and we only access 4 bytes at a time. Therefore, the cache allows us to avoid main memory accesses in the case where multiple words are requested within the same 16-byte range.
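The effect of the paired xor transformations can be sketched as follows; the assumption that the scratchpad services the lowest 16,384 transformed addresses is a simplification for illustration.

```python
XOR_MASK = 32768      # flip bit 15, as in Figure 5.9
SPM_SIZE = 16384      # 16,384-byte scratchpad

def transformed(addr):
    """Apply the xor address transformation (the transform is its own inverse,
    which is why the second xor in Figure 5.9 restores the original address)."""
    return addr ^ XOR_MASK

def served_by_scratchpad(addr):
    """Simplified model: the scratchpad holds the lowest SPM_SIZE bytes of the
    transformed address space; everything else falls through toward main memory."""
    return transformed(addr) < SPM_SIZE

# Without the transform, addresses 0..16383 would sit in the scratchpad.
# With it, the hash-table region 32768..49151 is captured instead.
for addr in (0, 16000, 32768, 40000, 65536):
    print(f"0x{addr:05x} -> scratchpad: {served_by_scratchpad(addr)}")
```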

Figure 5.9: Subsystem for the Hash Kernel (xor 32768; spm 16384; cache 1x16 direct WB; xor 32768)

Figure 5.10: Subsystem for the Heap Kernel (spm 131072; cache 2048x16 4-way PLRU WB)

The superoptimized memory subsystem for the Heap kernel is shown in Figure 5.10. Again, we have a scratchpad followed by a cache. This is logical for a binary heap structure since the early addresses are accessed much more frequently than later addresses.

Finally, the FIFO between the Hash and Heap kernels is 16 entries deep and implemented in BRAM. This allows the Hash kernel to keep running even if the Heap kernel backs up. The other FIFOs are 1 entry deep.

Merge Sort

The merge benchmark has 15 memory subsystems for the Merge kernels and 23 memory subsystems for FIFOs, giving a total of 38 memory subsystems. Although there are 20 Merge kernels, only 15 have memory subsystems since ScalaPipe does not generate memory subsystems if the size of the memory is less than 1,024 bytes.

For the Merge kernels with smaller memory subsystems that need to store fewer than 32,768 bytes, the superoptimizer selects scratchpads. However, for the larger memory subsystems, the superoptimizer selects small, direct-mapped caches. The scratchpads allow the smaller

memory subsystems to run without accessing main memory at all. The small direct-mapped caches, on the other hand, reduce the number of accesses going to main memory since the main memory is 16 bytes wide and each access is only 4 bytes.

Most of the FIFOs between kernels were selected to be a single element deep and implemented as a register. However, several of the FIFOs between the later stages are 1,024 and 2,048 elements deep implemented in BRAM. This is because the access latency between the later stages will vary since not all the accesses will hit in cache.

In terms of performance, the superoptimized memory subsystem for the merge benchmark provides over a 3× speedup over the baseline memory subsystem and closely matches what the simulation predicted. In this case, the generic memory subsystem also provides a performance improvement, but only just over 2× the performance of the baseline memory subsystem.

Matrix-Matrix Multiply

As shown in Figure 5.8, the superoptimized memory subsystem for the matrix-matrix multiply benchmark (mm) provides about a 2× speedup over the baseline benchmark. For this benchmark, only the Distribute kernel uses external memory. The superoptimized memory subsystem for the Distribute kernel is shown in Figure 5.11.

There are several interesting features of the memory subsystem shown in Figure 5.11. The first observation is the split. The split causes the memory accesses for the two source matrices to go to separate caches. The left side of the split handles the matrix that is accessed in column-major order whereas the right side handles the matrix that is accessed in row-major order. After the split, the first matrix is stored in a cache, whereas the second matrix is transposed from the memory subsystem’s perspective before entering a cache.

split @262128

xor 131072

rotate 17

cache 512x16 4-way FIFO WB

cache 512x16 direct WB prefetch 32

rotate -17

xor 131072

cache 8192x16 4-way PLRU WB

Figure 5.11: Subsystem for the Distribute Kernel

Figure 5.12: Subsystem for the Buffer Kernel (prefetch 32; cache 512x32 direct WB; prefetch 32; cache 1024x16 direct WB; spm 16384)

Figure 5.13: Subsystem for the Streamer Kernel (spm 131072; cache 2x32 2-way LRU WB)

All but four FIFOs are implemented as registers in the superoptimized memory subsystem for the mm benchmark. The two FIFOs between the Distribute kernel and the Product kernels are 256 entries and implemented in BRAM. The FIFOs between the Product kernels and Combine kernel are 128 entries deep and implemented in BRAM as well.

n-body

For the nbody benchmark, neither the simulated nor the actual speedup is very large. This is because the nbody benchmark is compute-bound. However, we note that there is a performance gain even in this case.

For this benchmark, all of the FIFOs are implemented as single-element registers. This allows all of the memory resources to be dedicated to the two kernel memory subsystems.

The superoptimized memory subsystem for the Buffer kernel is shown in Figure 5.12. This memory subsystem contains two prefetch components, two caches, and a scratchpad. The first prefetch requests the value 32 bytes after the current address, which causes the first

cache to request the next line after the current access. Likewise, the second prefetch has the same effect on the second cache. Finally, the scratchpad stores the first elements rather than storing everything in main memory.
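A toy model of this prefetch behavior is sketched below; the unbounded cache and the line size are simplifications for illustration, not the generated hardware.

```python
class PrefetchAhead:
    """Toy model of the `prefetch 32` component: every demand access also
    triggers a fill `stride` bytes ahead in the cache below it."""
    def __init__(self, stride=32, line_size=32):
        self.stride, self.line_size = stride, line_size
        self.resident = set()        # line addresses currently in the cache

    def _line(self, addr):
        return addr - addr % self.line_size

    def access(self, addr):
        """Return True on a demand hit, then prefetch the next line."""
        hit = self._line(addr) in self.resident
        self.resident.add(self._line(addr))                # demand fill
        self.resident.add(self._line(addr + self.stride))  # prefetch fill
        return hit

# Sequential 4-byte reads: after the very first access, the prefetcher stays ahead.
pf = PrefetchAhead()
hits = sum(pf.access(a) for a in range(0, 4096, 4))
print(f"{hits}/1024 demand accesses hit")
```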

The memory subsystem for the Streamer kernel, shown in Figure 5.13, is a scratchpad followed by a cache. Unlike the previous memory subsystem, in this case the scratchpad is the first part of the memory subsystem. This is likely due to the fact that placing the scratchpad after a cache, as is done for the memory subsystem for the Buffer kernel, incurs extra latency and poisons the cache. However, the prefetch components used for the Buffer kernel memory subsystem reduce this effect.

Given the way the benchmark works, it is not intuitive that the superoptimized memory subsystem for the Buffer kernel would be more complex than the memory subsystem for the Streamer kernel since the Streamer kernel streams the data past the Force kernel. However, because the Force kernel is computationally intensive, the memory delays that the Streamer kernel experiences do not contribute much to the overall run time. Instead, reducing the memory access times for the Buffer kernel provides a greater performance advantage.

5.4.1 Input Specificity

Although we are able to obtain a performance improvement for each of the benchmarks, we note that this improvement is not for the benchmark, but for a particular data set used with the benchmark. Because we are using only a single address trace for the optimization, it is possible that the memory subsystems could be over-fitted. Indeed, this appears to have happened for the Hash kernel for the median benchmark (Figure 5.9), which contains an address transformation to move certain parts of the hash table into a scratchpad.

Figure 5.14: Subsystem Specificity (speedup of the superoptimized memory subsystem over the baseline for ten input data sets per benchmark)

To determine to what extent over-fitting affects the results, we re-ran each of the benchmarks with ten separate inputs. The results are shown in Figure 5.14. Here each bar shows the speedup of the superoptimized memory subsystem over the baseline memory subsystem for a particular data set. The left-most bar in each group shows the result from the original data set presented above. The nine remaining bars show the speedup for different data sets.

For the laplace benchmark, to change the input we used a different random number seed. As shown in Figure 5.14, using a different random number seed has little effect on the speedup. For both the median and merge benchmarks, we used different data sets of the same size as the original. As with the laplace benchmark, there is little difference in the speedup provided by the superoptimized memory subsystem for both of these benchmarks. Finally, for the nbody benchmark, we used a different input size for each run (sized 1,000 to 10,000 in increments of 1,000).

As Figure 5.14 shows, there is very little difference in the performance gain with different input data sets. This implies that the superoptimized memory subsystems are not overfitted. Nevertheless, it is conceivable that some superoptimized memory subsystems could

be overly specific for a particular data set. In some cases this could be desirable, for example, if an application used a hash table and the data stored in the hash table never changed. Typically, however, this is something we would want to avoid.

5.4.2 Discussion

As the above results indicate, it is possible to superoptimize memory subsystems for streaming applications. The structure of some of the superoptimized memory subsystems is not surprising. For example, the memory subsystem for the laplace benchmark is likely very similar to what one would select manually. On the other hand, some of the memory subsystems are logical, but would likely require manual experimentation to discover. For example, the memory subsystems for the median and the merge benchmarks are fairly standard, but require the tuning of many parameters. Finally, the superoptimizer is able to discover memory subsystems that are very unusual, such as those for the mm and nbody benchmarks.

The superoptimization process can take a long time. Exactly how long the process takes is dependent on the number of memory subsystems and the length of the memory address traces. The superoptimized memory subsystems presented here were generated by running the superoptimizer for between 10,000 and 200,000 simulation runs, depending on the benchmark. Applications with only a few memory subsystems, such as the laplace benchmark, require far fewer simulation runs than those with many memory subsystems, such as the merge benchmark.

The run time of each simulation depends on the length of the address trace as well as the complexity of the memory subsystem. For the benchmarks presented here, the simulation time is in the range of 5 to 15 minutes. To reduce the total run time for the superoptimization process, we made use of multiple processing cores and stored the results from each

94 simulation in a database. This allows the superoptimizer to revisit prior results without simulation.

Note that the longer the superoptimization process runs, the better the memory subsystem it will discover. However, at all points the memory subsystem is usable. Thus, it is possible to terminate the process as soon as a satisfactory memory subsystem is discovered. For our experiments, the superoptimization process was terminated in an ad hoc fashion, but only after enough time had passed that additional performance gains were infrequent.

5.5 Summary

In this chapter, we have described a technique for creating superoptimized memory subsystems for streaming applications. We have shown that not only do these superoptimized memory subsystems perform well in simulation, but, by deploying the applications on an FPGA device, we have also shown that these memory subsystems perform well in actual hardware. Through the use of ScalaPipe with our superoptimizer, we were able to create a design implemented on an FPGA device using a customized memory subsystem with minimal effort and without writing HDL.

Although the method presented in this chapter to superoptimize streaming applications works well, we note that it is quite slow. Therefore, in the next chapter we attempt to improve the time required to superoptimize the memory subsystem for a streaming application through the use of a queuing model.

Chapter 6: A Model for Faster Superoptimization of Streaming Applications

Here we introduce a queuing model for streaming applications to reduce the amount of time required for superoptimizing memory subsystems.

6.1 Introduction

Superoptimization for a single-threaded application is a time-consuming process due to the need to simulate many different memory subsystems. Thus, the superoptimization process for a streaming application consisting of multiple kernels and communication channels can be prohibitively time-consuming. This is due to the need to simulate an address trace for all kernels of the application simultaneously, where each address trace may be quite long and contain kernel-to-kernel communication along with memory references.

Figure 6.1: Simple Application (kernel A feeds kernel B over a single communication channel)

To understand how much more computationally intensive the superoptimization process is for a streaming application than for a single-threaded application, we consider the simple streaming application shown in Figure 6.1. This application has two kernels, A and B, and

96 a single communication channel. If we assume that the address traces for A and B are of approximately equal length, this means that the simulation of the system will take twice as long for the streaming application as it would for either kernel individually.

Here we distinguish between memory access events and queue events in kernel traces. A memory access event is a memory read or write whereas a queue event is either a produce or consume. For simplicity, assume that the address traces for both kernels A and B contain M memory access events and Q queue events. Next, assume that the number of simulations required to find a suitable memory subsystem for kernel A is the same as the number of simulations required for B, and let that be S. That is, S is the number of simulations that would be required to find a suitable memory subsystem for a kernel if it were treated as a single-threaded application. This means that we can find a suitable memory subsystem for a particular kernel in the streaming application by simulating S(M + Q) events.

For the streaming application, assume that the shared resource constraint does not affect the number of simulations required to find a suitable memory subsystem for a particular kernel. This means that since there are two memory subsystems to discover and two address traces to simulate, the streaming system with two kernels takes 2×2 = 4 times more event simulations to find a suitable solution than finding a suitable solution for either kernel individually.

In addition to the memory subsystems for the individual kernels, when presented with a streaming application we are also concerned with the memory subsystems for the FIFOs between the kernels. For simplicity, we assume that the only parameter for these FIFOs is their size. This means that we must consider multiple sizes for the FIFOs, which further adds to the number of events that must be simulated.

In general, if we have K kernels in the streaming system, for each proposed memory subsystem we need to simulate K(M + Q) events. If there are F FIFOs and we need to test an average of Z sizes for each proposed memory subsystem, the number of events to simu-

late per proposed memory subsystem increases to KFZ(M + Q). Therefore, the number of events that must be simulated to find a suitable memory subsystem for one of the kernels of the streaming system is SKFZ(M + Q), which is KFZ times more events than required for an individual kernel. Thus, the number of events that must be simulated to find a suitable memory subsystem for all kernels of the streaming system is SK²FZ(M + Q), which is K²FZ times more events than for an individual kernel.

Considering how long it takes to find a suitable solution for an individual kernel (about two weeks of total CPU time for the patricia benchmark in Chapter 4), this is obviously much longer than ideal even for a modest number of kernels. Therefore, here we describe a method for reducing the number of events that must be simulated to superoptimize the memory subsystem for the streaming application from SK²FZ(M + Q) = SK²FZM + SK²FZQ down to SKM + SKFZQ.

Figure 6.2: Example Topology (kernel A feeds kernels B and C, which both feed kernel D)

The parameters of the queue model are summarized in Table 6.1.

Value   Description
K       The number of kernels in the system
F       The number of FIFOs in the system
M       The mean number of events in a memory address trace
Q       The mean number of events in a queue trace
S       The simulations required to find a suitable memory for a particular kernel
Z       The mean number of FIFO sizes to be tested per queue
t_i     A trace of queue operations for queue i
k_i     Maximum depth of queue i

Table 6.1: Model Parameters

98 As an example, suppose we have a streaming application with 4 kernels (K) and 4 queues (F ), as shown in Figure 6.2. Further, assume there are an average of 100 million memory access instructions (M) and 1 million queue instructions (Q) in the per-kernel traces. These numbers can vary significantly, but are fairly typical. Finally, assume we need 10 thousand iterations (S) to find a suitable memory subsystem and we need to try an average of 5 FIFO sizes (Z) for each queue. Using these figures, we can estimate how many fewer events would need to be simulated using the proposed model.

Using full simulation we get:

S K^2 F Z (M + Q) = 10^4 \times 4^2 \times 5 \times 5 \times (10^8 + 10^6) \approx 4 \times 10^{14}\ \text{events}

Using the model:

S K M + S K F Z Q = 10^4 \times 4 \times 10^8 + 10^4 \times 4 \times 5 \times 5 \times 10^6 \approx 5 \times 10^{12}\ \text{events}

Thus, using the model we simulate 80 times fewer events, which means a superoptimization process that took months before is now reduced to days.

In addition to the reduction in events to be simulated, our proposed method finds near- optimal queue sizes at all stages rather than leaving the queue sizes as parameters to be optimized. This means that the number of parameters to the superoptimizer is reduced, which translates into a smaller number of required iterations. It should be noted that we still need to simulate the same number of queue events, but queue events are much faster to simulate than memory access events. As a result, the model provides a substantial reduction in time required to superoptimize the memory subsystem for a streaming application.

99 6.2 Method

To reduce the number of address trace simulations required, we model the streaming system as a queuing network. Then, rather than simulate the memory accesses and queue operations of all kernels, we use a trace of the queue operations for each kernel. A trace of queue operations consists of triples specifying the queue operation (produce or consume), the queue, and the time elapsed since the last queue operation. These traces consist of less data than the full address traces since they do not contain information concerning memory accesses (Q instead of M +Q). Further, these traces tend to compress easily (we use LZ77 [137]) and are easy to simulate since simulating a queue is easier than simulating a memory subsystem.

The queue traces are obtained by simulating the full address traces for each kernel in isolation. These full simulations contain queue operations, memory accesses, and computation time. When simulating the full address traces, the simulator assumes that there is always input available for the kernel to consume and the output channels always have space available. This ensures that arrival times and service times are not affected by blocking on the communication channels. The result is a queue trace describing the queue operations that would occur if the kernel were allowed to run without blocking.

Using the queue trace determined from the address trace simulation and holding the maximum queue depths, k_i, constant, we can simulate the queuing network. By “pausing” the queue trace when there is a blocking operation, we can then determine the run time for the kernel in the full application context. The simulation is over once the final queue in the network has exhausted its trace. All but the last queue in the network restart their traces after exhausting them to allow for differences in the number of items consumed due to differing queue depths and split/join kernels. The ending simulation time then provides an approximation of the run time for the full streaming application.

Previously, we left the maximum queue depths as additional parameters to superoptimization. Although we could leave the depths a parameter or even select them a priori, using the queue traces it is easy to determine the optimal queue depths after each kernel trace simulation. To determine the queue depth, k_i, for each queue, we start all queues with depth 1 and simulate the queues. The queue that is blocked most often is the bottleneck. Therefore, the size of that queue is doubled. This process repeats until the performance of the system no longer improves or we run out of resources. Determining the queue depth in this fashion takes longer at each step, but it reduces the number of steps needed for superoptimization while sizing the queues such that they are close to optimal. Doubling the queue sizes rather than incrementing them greatly reduces the number of simulations required, though at the expense of finding the true optimal queue sizes.
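A runnable sketch of this depth-doubling search (formalized in Figure 6.3 below) is given here; the simulate_queues stub and the resource check are illustrative assumptions.

```python
def size_queues(simulate_queues, num_queues, brams_left):
    """Start every queue at depth 1 and repeatedly double the depth of the
    bottleneck queue until run time stops improving or resources run out."""
    depths = [1] * num_queues
    time, bottleneck = simulate_queues(depths)
    while brams_left(depths):
        depths[bottleneck] *= 2
        new_time, new_bottleneck = simulate_queues(depths)
        if new_time >= time:             # no improvement: undo and stop
            depths[bottleneck] //= 2
            return time, depths
        time, bottleneck = new_time, new_bottleneck
    return time, depths                  # out of resources

# Illustrative stub: queue 0 is the bottleneck until it reaches depth 8.
def fake_sim(depths):
    return max(100 - 10 * min(depths[0], 8), 20), 0

print(size_queues(fake_sim, num_queues=3, brams_left=lambda d: sum(d) < 64))
```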

In general, a communication channel could be implemented in a number of ways. However, here we assume that the communication channel will be implemented either as a register or as a FIFO implemented in BRAM. From the results in Chapter 5, this seems to be the most common case. Further, this seems to be a logical choice since a BRAM (or register) implementation will always be the fastest and other implementations would contend for main memory bandwidth. Supporting arbitrary memory subsystems for FIFOs would be possible, but would require a more complex queue simulation.

To summarize, once we have queue traces from the kernels, the process to determine the total application run time for a particular memory subsystem is shown in Figure 6.3. The process to determine the best memory subsystem for a streaming application is shown in Figure 6.4.

As can be seen from Figure 6.4, the address trace for only a single kernel needs to be simulated for each modification to a memory subsystem. Thus, for each modification, M events need to be simulated to extract an updated queue trace. Next, to simulate the queue network,

– Determine run time and queue sizes for queue network n
function GetRunTime(n)
    – Determine initial run time (t) and bottleneck queue (b)
    for all i ∈ Queues do
        k_i ← 1
    end for
    t, b ← SimulateNetwork(n, k)

    – Determine queue sizes
    while ResourcesAvailable do
        – Double the size of the bottleneck queue (b)
        k_b ← 2 k_b
        t*, b ← SimulateNetwork(n, k)
        if t* = t then
            – No improvement
            k_b ← k_b / 2
            return t, k
        end if
        t ← t*
    end while

    – Out of resources
    return t, k
end function

Figure 6.3: Simulation Algorithm

FQ events need to be simulated. The queue network is simulated an average of Z per modification, giving FQZ. This gives a total of M + FQZ events per modification. Since S modifications to the memory subsystem are required per kernel to arrive at a suitable memory subsystem, this means SM +SFQZ total events must be simulated for each kernel, giving SKM + SKFQZ total events for the application.

Of note is that with every modification to a memory subsystem, multiple queuing network simulations must be performed to determine the new queue depths and estimate application run time. Fortunately, these queuing network simulations are typically much faster than the simulation of an address trace. These multiple simulations effectively explore more of the

function Superoptimize
    – Initialize memory subsystems (m) and get queue traces (n)
    m ← {∅, . . .}
    for all i ∈ Kernels do
        n_i ← SimulateKernel(m_i, i)
    end for

    – Initialize acceptance threshold (T) and best result (b)
    T, k ← GetRunTime(n)
    b_t ← T      – Best time
    b_m ← m      – Best memory subsystem
    b_k ← k      – Best queue sizes

    – Perform optimization
    while TimeRemaining do
        – Perturb a memory subsystem and get the new run time
        i ← Random(1, |Kernels|)
        m_i* ← PerturbSubsystem(m_i)
        n_i* ← SimulateKernel(m_i*, i)
        t, k* ← GetRunTime(n*)

        – Update best
        if t < b_t then
            b_t ← t
            b_m ← m
            b_k ← k*
        end if

        if t ≤ T then
            – Accept the proposal
            m_i ← m_i*
            n_i ← n_i*
            k ← k*
        end if
        T ← UpdateThreshold(T, t)
    end while
    return b_m, b_k
end function

Figure 6.4: Superoptimization Algorithm

103 search space than would be explored via the naive algorithm without requiring the simulation of additional memory access events.

6.3 Model Error

Because the queue simulation operates using queue traces, if we assume that the kernels operate independently of each other, there is no error introduced and the result from the queue simulation should match the result from a simulation of the full streaming application. However, the kernels are not completely independent for two reasons. First, there is communication between the kernels over the queues and, second, there is a shared main memory.

As far as the communication between kernels is concerned, we note that this does not actually alter the correctness of the result since the queue simulation models this communication by pausing the traces of blocked kernels. Unfortunately, the shared main memory remains a concern. This is because each of the kernels is run independently to obtain the queue trace, which will have timings from memory accesses independent of other kernels that may contend for memory bandwidth.

Although contention for the shared main memory will likely change the absolute timing results when compared to a full simulation, we do not expect it to affect the result of superoptimization much in most cases. This is because we expect any kernel that is using up significant main memory bandwidth would be a prime candidate for a better memory subsystem. Thus, those memory subsystems will be altered by the superoptimizer to reduce main memory traffic, just as would be the case for a complete simulation. Further, kernels that do not use much main memory bandwidth will likely be affected little by additional delays when they do access main memory. Nevertheless, it is possible that given an application

with multiple kernels making excessive use of main memory bandwidth, the results from the model could vary so much from the results of a full simulation that the superoptimization process would fail to find a suitable memory subsystem.

To avoid the situation where the model and full simulation disagree, the superoptimizer periodically performs a full simulation. If the full simulation disagrees with the model by more than 1%, the superoptimizer will stop using the model and instead use full simulation for its results. This validation ensures that the superoptimizer is able to find a suitable memory subsystem even in cases where the model is inaccurate.

Obviously, a full simulation can be extremely time-consuming. Therefore, validations using full simulation are performed on the initial memory subsystem. Next, the frequency of performing validation using full simulation is backed off in an exponential fashion. If at any point the model is discovered to be inaccurate, full simulation is used for the remainder of the superoptimization process.

6.4 Benchmarks

We use a collection of applications implemented in ScalaPipe [120] for benchmarks. These benchmarks are the same benchmarks as used in Chapter 5. As mentioned in Chapter 3, ScalaPipe is a streaming application generator that allows one to author streaming applications in a high-level language and deploy them either to general-purpose processors or FPGAs. Using ScalaPipe, we can obtain the necessary memory address traces for the kernels automatically. The benchmarks used here are the laplace, median, merge, mm, and nbody benchmarks.

The topology of the laplace benchmark is shown in Figure 5.4. This benchmark finds a solution to Laplace’s equation using a Markov-Chain Monte-Carlo technique. For this partic-

ular implementation, only one kernel, RNG, uses a memory array. The memory requirement of this kernel is very small, requiring only 2,496 bytes of memory. Therefore, it is reasonable to expect that a superoptimized memory subsystem for this application may not use off-chip memory at all, though additional memory could be used for the FIFOs between the kernels.

Next, the topology of the median benchmark is shown in Figure 5.6. This benchmark is a two-stage pipeline to find the median of one million unique 32-bit integers. The first stage (the Hash kernel) performs a hash lookup to remove duplicates and the second stage (the Heap kernel) performs operations on a binary heap to insert the values and extract the median. Thus, this benchmark has two kernels that require off-chip memory.

The topology of the merge benchmark is shown in Figure 5.2. This benchmark sorts a series of one million 32-bit integers using generic merge kernels with a single input channel and a single output channel. Since ScalaPipe uses off-chip memory resources for kernels that use more than 1024 bits of memory, there are 15 kernels in this benchmark that use external memory. Note that there are 20 Merge kernels in this benchmark giving a total of 22 kernels and 21 queues.

The topology of the mm benchmark is shown in Figure 5.5. The mm benchmark is a streaming matrix-matrix multiply implementation. Like the laplace benchmark, the mm benchmark only has one kernel that accesses external memory, the Distribute kernel. However, the memory requirements of the Distribute kernel are greater than those of the RNG kernel used in the laplace benchmark and off-chip memory is required.

Finally, the topology of the nbody benchmark is shown in Figure 5.3. This benchmark performs an n-body simulation using the naive O(n²) algorithm. Like the median benchmark, the nbody benchmark has two kernels, Buffer and Streamer, that access off-chip memory and, therefore, two memory subsystems to be optimized.

106 6.5 Evaluation

To evaluate our model, we compare the results of superoptimization using the model to those obtained without the model from Chapter 5. We are interested in two objectives. First, we are interested in how the discovered memory subsystems differ. Ideally, there would be little or no difference in the performance of memory subsystems discovered via superoptimization using the model and superoptimization without the model. Second, we are interested in the run time of the superoptimization process. Since the purpose of our model is to reduce this run time, we hope to see a reduction in the amount of run time required to get a similar result.

6.5.1 Subsystem Performance

Figure 6.5: Speedup (speedup over the baseline for memory subsystems superoptimized using full simulation and using the model, for the mm, nbody, laplace, merge*, and median benchmarks)

First we consider how the memory subsystems differ when we use the model for superoptimization rather than running a full simulation at each step. As in Chapter 5, we use a baseline memory subsystem that uses a single register for all FIFOs between kernels and makes all kernel memory accesses go directly to the main memory. Figure 6.5 compares the speedup of the memory subsystems superoptimized using full simulation (from Chapter 5) against those superoptimized using our model.

Laplace

For the laplace benchmark, the performance of the memory subsystem superoptimized using the model is actually slightly better than the performance of the memory subsystem superoptimized using full simulation; however, this difference is very small (less than 1%). Such a situation can arise not only due to the stochastic nature of the search, but also because the model can often do a better job of sizing the FIFOs for each proposed memory subsystem. With the full simulation the FIFO sizes are just another parameter to be explored, and, therefore, it may take many additional simulations to discover the optimal value for the FIFO sizes. As might be expected, the memory subsystem discovered for the RNG kernel is the same for the two superoptimization techniques: because the memory footprint of the kernel is so small, a scratchpad suffices to service all accesses.

To understand why there is a difference in performance for the laplace benchmark, one must consider the implementation of the FIFOs. Using full simulation, the superoptimizer has more FIFO implementations available. In particular, the superoptimizer has the option to implement the FIFOs in main memory rather than BRAM. Because a main memory implementation uses fewer BRAM resources, the superoptimizer prefers such an implementation. Thus, the memory subsystem discovered using full simulation has several FIFOs implemented in main memory rather than BRAM, as shown in Table 5.2. The memory

subsystem discovered using the model, on the other hand, has larger FIFOs that are implemented completely in BRAM, whose sizes are shown in Table 6.2. Given enough time, it is likely that full simulation would arrive at the same solution as the model, or possibly a better one; the model's solution uses more resources in exchange for a minor performance improvement.

    FIFO             Model Depth   Model Implementation   Full Depth   Full Implementation
    RNG → Split      1             register               1            register
    Split → Walk1    2304          BRAM                   256          main memory
    Split → Walk2    1             register               256          BRAM
    Walk1 → Avg      1152          BRAM                   64           main memory
    Walk2 → Avg      1152          BRAM                   8            BRAM
    Avg → Output     1             register               1            register

Table 6.2: Laplace FIFO Comparison

Median

As with the laplace benchmark, for the median benchmark the performance of the memory subsystem superoptimized using the model is slightly better than the performance of the memory subsystem superoptimized using full simulation. Superoptimization using the model selected a depth of 13,824 for the FIFO between the Hash and Heap kernels, whereas superoptimization using full simulation selected a depth of only 16.

For the median benchmark, the memory subsystem discovered for the Heap kernel using the model, shown in Figure 6.6, is the same as the one previously discovered using full simulation. However, the memory subsystems for the Hash kernel differ. The memory subsystem discovered using the model for the Hash kernel is shown in Figure 6.8. For comparison, the memory subsystem discovered for the Hash kernel using full simulation (from Chapter 5) is shown in Figure 6.7.

Figure 6.6: Subsystem for the Heap Kernel

Figure 6.7: Subsystem for the Hash Kernel (Full)

Figure 6.8: Subsystem for the Hash Kernel (Model)

Merge Sort

The merge benchmark is an interesting case. As previously noted, the merge benchmark has 15 kernels contending for main memory access. Thus, it should come as little surprise that the model is not accurate within 1% for this benchmark. In fact, the model differs from full simulation by more than 50%; therefore, the superoptimizer does not use the model for this benchmark. Because the superoptimizer switched to full simulation for this benchmark, the results shown in Figure 6.5 come from superoptimization using full simulation instead of the model.

The question one might ask when confronted with this situation is: would the model still find a good memory subsystem? Unfortunately, in this case it would not. Although the memory subsystem that the superoptimizer discovers for the merge benchmark is better than the baseline, the superoptimizer allocates more resources to the FIFOs between the kernels than to the kernels themselves, even though, at least in this case, the kernels are the bottleneck due to main memory contention.

Matrix-Matrix Multiply

The memory subsystem superoptimized for the Distribute kernel of the mm benchmark using the model is shown in Figure 6.9b. Comparing this memory subsystem to the memory subsystem that was superoptimized using full simulation, shown in Figure 6.9a, we see that while they are different, there are several similar aspects. The differences in the memory subsystems are likely due to the stochastic nature of the search and the fact that, when using the model, the superoptimizer is able to try more distinct memory subsystems since the queue implementations are not search parameters.

In addition to the differences in the memory subsystems themselves, the sizes of the FIFOs differ between the two results. A comparison of the FIFO implementations is shown in Table 6.3. Here we see that the implementation of the FIFOs is the same other than their depths: the FIFOs generated from the model are deeper than those discovered using full simulation.

    FIFO                     Model Depth   Model Implementation   Full Depth   Full Implementation
    MatrixA → Distribute     1             register               1            register
    MatrixB → Distribute     1             register               1            register
    Distribute → Product1    1152          BRAM                   256          BRAM
    Distribute → Product2    1152          BRAM                   256          BRAM
    Product1 → Combine       1152          BRAM                   128          BRAM
    Product2 → Combine       1152          BRAM                   128          BRAM
    Combine → Output         1             register               1            register

Table 6.3: Matrix-Matrix Multiply FIFO Comparison

From Figure 6.5, we see that the memory subsystem superoptimized using the model provides a greater speedup than the memory subsystem superoptimized using full simulation. There are two possible reasons for this. First, as previously noted, the memory subsystems are different. Second, the FIFO depths are different. In this case, although the deeper FIFOs explain a small part of the performance difference, most of the difference is explained by the memory subsystem.

Figure 6.9: Subsystems for the Distribute Kernel ((a) Full, (b) Model)

Figure 6.10: Subsystems for the Buffer Kernel ((a) Full, (b) Model)

n-body

The memory subsystem superoptimized using the model for the Buffer kernel is shown in Figure 6.10b. Compared to the memory subsystem superoptimized using full simulation, shown in Figure 6.10a, the memory subsystem superoptimized using the model is much simpler. Likewise, the memory subsystems for the Streamer kernel are shown in Figure 6.11b (model) and Figure 6.11a (full simulation). Again, the memory subsystem superoptimized using the model is simpler.

Figure 6.11: Subsystems for the Streamer Kernel ((a) Full, (b) Model)

Comparing the performance of the memory subsystems in Figure 6.5, we see that there is little difference in performance between the two memory subsystems. However, from the raw data we see that the memory subsystem superoptimized using full simulation performs slightly better than the memory subsystem superoptimized using the model (though the difference is less than 1%). This is likely due to a discrepancy between the performance reported by the model and that reported by full simulation. The model indicates that the memory subsystem discovered using the model performs better, even though the other memory subsystem performs better in reality. Because the two memory subsystems are so similar in performance, however, the superoptimizer does not switch to full simulation, and we are left with a memory subsystem that performs slightly worse than the one we likely would have obtained using full simulation.

6.5.2 Superoptimization Run Time

Finally, we consider the amount of time required to superoptimize the memory subsystem for a streaming application. Recall that the model presented here was able to correctly predict the performance (within 1%) of 4 of the 5 benchmarks; full simulation was required for the merge benchmark. Here we use the median benchmark to compare superoptimization using the model with superoptimization using full simulation.

On our test system, a full simulation of the memory trace for the median benchmark with an empty memory subsystem takes about 90 seconds. In isolation, the Hash kernel takes about 4 seconds to simulate and the Heap kernel takes about 40 seconds, giving an average of 22 seconds per kernel containing a memory subsystem. However, to use our queue model we need queue traces rather than a timing result, which makes the simulation take slightly longer: 5 seconds for the Hash kernel and 51 seconds for the Heap kernel, for an average of 33 seconds.

To get the equivalent of a full simulation using our model, we need to simulate one of the memory subsystems, which takes an average of 33 seconds, and then perform a simulation of the queuing network for all queue sizes of interest. The queue sizes are doubled for the bottleneck queue, which makes the maximum number of queuing network simulations required logarithmic in the number of resources remaining after allocating resources to the memory subsystems.
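A minimal sketch of this sizing loop is shown below in Scala. The simulateQueueNetwork stub, the SimResult type, and the BRAM accounting are assumptions made for illustration; they are not the actual tools.

object QueueSizing {
    // Hypothetical queuing-network simulation: returns the simulated run time and the
    // queue that currently limits throughput (the bottleneck).
    case class SimResult(cycles: Double, bottleneck: String)
    def simulateQueueNetwork(depths: Map[String, Int]): SimResult =
        SimResult(depths.values.sum.toDouble, depths.minBy(_._2)._1)

    val entriesPerBram = 512  // 4-byte queue entries per BRAM on our platform
    def bramsFor(depths: Map[String, Int]): Int =
        depths.values.map(d => (d + entriesPerBram - 1) / entriesPerBram).sum

    // After each simulation, double the depth of the bottleneck queue; stop when the next
    // doubling would exceed the BRAM budget. Doubling keeps the number of queuing network
    // simulations logarithmic in the remaining resources.
    def sizeQueues(initial: Map[String, Int], bramBudget: Int): Map[String, Int] = {
        var depths = initial
        var done = false
        while (!done) {
            val result = simulateQueueNetwork(depths)
            val proposed = depths.updated(result.bottleneck, depths(result.bottleneck) * 2)
            if (bramsFor(proposed) <= bramBudget) depths = proposed else done = true
        }
        depths
    }
}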

For our experimental platform, we are given 92 BRAMs. Each BRAM can support up to 512 4-byte entries in a queue. If we assume that the bottleneck does not move around, this means that we are limited to ⌈lg(512 × 92)⌉ = 16 simulations per step. The situation is slightly worse if we assume that the bottleneck moves around after each simulation, since we must then multiply by the number of queues: 16 × 3 = 48 simulations. Thus, with each queuing network simulation taking about 3 seconds, in the worst case one step using the model could take 33 + 48 × 3 = 177 seconds. However, typically the bottleneck does not move around and we usually have fewer than 92 BRAMs remaining (since the memory subsystems use them). Assuming a typical case where the bottleneck does not move around and we have half of the BRAMs available, one step takes 33 + ⌈lg(512 × 92/2)⌉ × 3 = 78 seconds.

Although it may seem that a reduction from 90 seconds per iteration (using full simulation) to 78 seconds (using the model) is insignificant, it is important to note that each iteration provides much more information when using the model. In particular, the sizes of the FIFOs are determined during the simulation of the queuing network, so in practice many fewer iterations are required. How many fewer iterations are required is a function of the number of resources available and the number of queues. For the median benchmark, if we assume half of the 92 BRAM resources can be allocated to queues, given that there are three queues of interest, this is a reduction of somewhere near (92/2) × 3 = 138 times. However, the superoptimizer may explore only a fraction of these when using full simulation.

Figure 6.12: Simulations Required for Superoptimization. Best run time (in billions of cycles) as a function of the number of simulations, for full simulation and for the model.

Figure 6.12 compares the best result after each simulation for the median benchmark using both superoptimization with full simulation and superoptimization with the model. To make the results more comparable, for this experiment the superoptimization process assumes all FIFOs are implemented in BRAM rather than using the implementation as yet another parameter, as was done in Chapter 5. Here we see that not only does the model speed up the time required for each simulation, but, because the queue simulation determines near-optimal queue sizes after each simulation, the model allows the superoptimization to find good results with fewer simulations. Note that the superoptimization process with full simulation leaves the queue sizes as another superoptimization parameter. Thus, when the model works it provides results much more quickly than would otherwise be possible.

6.6 Summary

In this chapter we presented a model to reduce the amount of time required to superoptimize the memory subsystem for a streaming application. This model allows one to superoptimize memory subsystems more quickly and approach problems where it might not otherwise be reasonable to use superoptimization. We showed that although this model makes a simplifying assumption about main memory bandwidth, it still allows us to discover memory subsystems that are similar to those discovered using full simulation, and sometimes better due to the improved method of finding queue sizes. There are cases where the model breaks down when there is excessive main memory contention, but we are able to detect this easily. For most of our benchmarks we were able to use the model to realize significant reductions in the time to perform superoptimization.

Chapter 7: Conclusion

In this dissertation, we investigated the use of highly-specialized memory subsystems for applications. In particular, we investigated combinations of caches, scratchpads, address transformations, split address spaces, and other components in the memory subsystem. Combining these components, we showed that it is possible to produce memory subsystems that outperform traditional memory subsystems, such as cache hierarchies. To do this, we provided a superoptimization technique to discover custom memory subsystems for single-threaded applications to be deployed on FPGAs and ASICs. This required the development of tools including a memory subsystem simulator and a memory subsystem superoptimizer.

Although performance is our primary focus in this work, we also showed that our superoptimization technique is generic by optimizing to reduce writes to main memory. Reducing writes to main memory is an increasingly important objective since an increasing number of alternative main memory technologies, such as Phase-Change Memory (PCM) and Flash, have an aversion to writes. This aversion stems from limited write endurance, increased energy from writes, and long write latencies. Our results show that there is gain to be had by taking writes into account.

Next, we investigated the superoptimization of memory subsystems for streaming applications, which are a class of parallel applications. To this end, we extended our superoptimization technique to support streaming applications. To obtain traces for the parallel applications, we used applications developed in ScalaPipe, which is another tool we developed.

Our results revealed that impressive performance improvements are possible with streaming applications, both in simulation and when deployed on an FPGA device. Unfortunately, these results take a long time to obtain due to the lengthy search process required.

Finally, to address the long search process required for the superoptimization of streaming applications, we developed a model to allow us to reduce the number of events that need to be simulated. Using this model, we are able to significantly reduce the run time of the superoptimization process for many applications. Although the model breaks down in some cases, it appears to work well in most cases, and we are able to identify those cases where the model does not work.

7.1 Future Work

There are many possible directions for future work. Here we describe several possibilities.

Other Memory Components An obvious extension of this work is the consideration of other memory subsystem components. Unfortunately, adding additional components would likely make the superoptimization process more time-consuming. Therefore, finding the right mix of components to be able to obtain good results in a reasonable amount of time would be desirable.

Datapath Optimization Although this work considers only memory subsystem optimization and not datapath optimization or topology optimization, optimizing these simultaneously could lead to better results. For example, it is often the case that the number of kernels can be increased to increase parallelism at the expense of more resources and additional memory bandwidth contention.

Faster Superoptimization Despite the heuristics presented here and the queuing model used for streaming applications, the superoptimization process is still extremely time-consuming. There remain several techniques that could be investigated to improve the situation. For example, using some of the previous work on speeding up cache simulations could be incorporated into the superoptimizer to allow it to evaluate multiple caches simultaneously. Also, it may be possible to classify application behavior using a model rather than precisely with an address trace.

Improved Model Although the queuing model described in Chapter 6 works in many cases, it can become inaccurate when there is excessive main memory contention. In this work we simply validate the model periodically and switch to the slower method of using full simulation when the model falls apart. A better solution, however, would be to modify the model to account for main memory contention.

Application Phases In this work we assumed that each application or kernel could only use a single memory subsystem throughout its execution. However, many applications exhibit distinct phases of execution such that it could be useful to alter the memory subsystem at run time [102]. Supporting multiple superoptimized memory subsystems for different application phases represents an interesting opportunity for future work.

Multiple Application Support Although we have investigated superoptimized memory subsystems for single-threaded and parallel applications, we have not considered any form of resource sharing among multiple applications. Extending this work to support virtual memory and/or multiple running applications represents an interesting opportunity for future work. Because our superoptimized memory subsystems are application-specific, changing the subsystem for a particular application would likely have a significant overhead that would need to be considered in the superoptimization process. Further, the issue of virtual memory presents unique challenges for simulation, since it is usually not possible to determine ahead of time which physical addresses will be used.

Other Models of Parallelism Here we described the streaming paradigm for parallelism due to the explicit nature of the communication channels and independent memory subsystems for kernels. However, shared-memory parallelism is extremely common today. To work toward a completely shared-memory type of parallelism, it may be possible to support specific types of shared data structures, such as queues, hashes, and locks.

Better General-Purpose Memories Our focus has been on application-specific memory subsystems, but the technique proposed here could be used for general-purpose memories. Using superoptimization for general-purpose memories could find novel memory subsystems that perform better than traditional cache hierarchies.

References

[1] Michael Adler, Kermin E Fleming, Angshuman Parashar, Michael Pellauer, and Joel Emer. LEAP scratchpads: automatic memory and cache management for reconfigurable logic. In Proc. of 19th ACM/SIGDA Int’l Symp. on Field-Programmable Gate Arrays (FPGA), pages 25–28, February 2011.

[2] Hussein Al-Zoubi, Aleksandar Milenkovic, and Milena Milenkovic. Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite. In Proc. of 42nd Southeast Regional Conference, pages 267–272, 2004.

[3] ARM1136JF-S and ARM1136J-S technical reference manual. Technical Report 0211K, ARM Holdings plc, February 2009.

[4] Rajeev Balasubramonian, David H Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas. A dynamically tunable memory hierarchy. IEEE Trans. on Computers, 52(10):1243–1258, October 2003.

[5] Rajeshwari Banakar, Stefan Steinke, Bo-Sik Lee, M Balakrishnan, and Peter Marwedel. Scratchpad memory: design alternative for cache on-chip memory in embedded systems. In Proc. of 10th Int’l Symp. on Hardware/Software Codesign, pages 73–78, 2002.

[6] Sorav Bansal and Alex Aiken. Automatic generation of peephole superoptimizers. In ACM SIGPLAN Notices, volume 41, pages 394–403. ACM, 2006.

[7] Sorav Bansal and Alex Aiken. Binary translation using peephole superoptimizers. In Proc. of 8th USENIX Symp. on Operating Systems Design and Implementation (OSDI), volume 8, pages 177–192, December 2008.

[8] Michela Becchi, Mark Franklin, and Patrick Crowley. Performance/area efficiency in embedded chip multiprocessors with micro-caches. In Proc. of 4th ACM Intl Conf. on Computing Frontiers. ACM, May 2007.

[9] Luca Benini, Alberto Macii, and Massimo Poncino. Energy-aware design of embedded memories: A survey of technologies, architectures, and optimization techniques. ACM Transactions on Embedded Computing Systems (TECS), 2(1):5–32, 2003.

[10] Kristof Beyls and Erik H D’Hollander. Refactoring for data locality. Computer, 42(2):62–71, 2009.

[11] Roberto Bez, Emilio Camerlenghi, Alberto Modelli, and Angelo Visconti. Introduction to flash memory. Proceedings of the IEEE, 91(4):489–502, 2003.

[12] Shekhar Borkar and Andrew A Chien. The future of microprocessors. Communications of the ACM, 54(5):67–77, 2011.

[13] Martin Brain, Tom Crick, Marina De Vos, and John Fitch. TOAST: Applying answer set programming to superoptimisation. Logic Programming, page 270, 2006.

[14] Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. Cache-conscious data placement. In Proc. of 8th Int’l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 139–149, 1998.

[15] Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng. Compiler optimizations for improving data locality. In Proc. of 6th Int’l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 252–262, 1994.

[16] Hassan Chafi, Zach DeVito, Adriaan Moors, Tiark Rompf, Arvind K. Sujeeth, Pat Hanrahan, Martin Odersky, and Kunle Olukotun. Language virtualization for heterogeneous parallel computing. In Proc. of ACM Int’l Conf. on Object Oriented Programming Systems, Languages, and Applications, pages 835–847, 2010.

[17] Roger D. Chamberlain, Mark A. Franklin, Eric J. Tyson, James H. Buckley, Jeremy Buhler, Greg Galloway, Saurabh Gayen, Michael Hall, E.F. Berkley Shands, and Naveen Singla. Auto-Pipe: Streaming applications on architecturally diverse systems. Computer, 43(3):42–49, March 2010.

[18] Roger D. Chamberlain and Narayan Ganesan. Sorting on architecturally diverse computer systems. In Proc. of 3rd Int’l Workshop on High-Performance Reconfigurable Computing Technology and Applications, November 2009.

[19] Roger D Chamberlain, Joseph M Lancaster, and Ron K Cytron. Visions for application development on hybrid computing systems. Parallel Computing, 34(4):201–216, 2008.

[20] Jichuan Chang, Parthasarathy Ranganathan, David Andrew Roberts, Mehul A Shah, and John Sontag. Data storage apparatus and methods, March 2012. US Patent App. 2012/0131278.

[21] Yuan-Hao Chang, Jian-Hong Lin, Jen-Wei Hsieh, and Tei-Wei Kuo. A strategy to emulate NOR flash with NAND flash. ACM Transactions on Storage (TOS), 6(2):5, 2010.

[22] Mainak Chaudhuri. Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches. In Proc. of 42nd IEEE/ACM Int’l Symp. on Microarchitecture, pages 401–412, 2009.

[23] Yu-Ting Chen, Jason Cong, and Glenn Reinman. HC-Sim: a fast and exact L1 cache simulator with scratchpad memory co-simulation support. In Proc. of 9th Int’l Conf. on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 295–304. IEEE, 2011.

[24] Trishul M. Chilimbi, Bob Davidson, and James R. Larus. Cache-conscious structure definition. In Proc. of ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 13–24, 1999.

[25] Trishul M Chilimbi, Mark D Hill, and James R Larus. Cache-conscious structure layout. In Proc. of ACM Conf. on Programming Language Design and Implementation, pages 1–12, 1999.

[26] Young-kyu Choi, Jason Cong, and Di Wu. FPGA implementation of EM algorithm for 3D CT reconstruction. In Proc. of 22nd Symp. on Field-Programmable Custom Computing Machines (FCCM), pages 157–160. IEEE, 2014.

[27] Eric S Chung, James C Hoe, and Ken Mai. CoRAM: an in-fabric memory architecture for FPGA-based computing. In Proc. of 19th ACM/SIGDA Int’l Symp. on Field-Programmable Gate Arrays, pages 97–106, February 2011.

[28] Jason Cong, Muhuan Huang, and Peng Zhang. Combining computation and communication optimizations in system synthesis for streaming applications. In Proc. of 22nd Int’l Symp. on Field-Programmable Gate Arrays (FPGA), pages 213–222. ACM, 2014.

[29] Charles Consel, Hedi Hamdi, Laurent Réveillère, Lenin Singaravelu, Haiyan Yu, and Calton Pu. Spidle: a DSL approach to specifying streaming applications. In Generative Programming and Component Engineering, pages 1–17. Springer, 2003.

[30] Pat Conway, Nathan Kalyanasundharam, Gregg Donley, Kevin Lepak, and Bill Hughes. Cache hierarchy and memory subsystem of the AMD Opteron processor. IEEE Micro, 30(2):16–29, 2010.

[31] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems (TOPLAS), 13(4):451–490, October 1991.

[32] Jack J Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain S Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software (TOMS), 16(1):1–17, 1990.

[33] Jack J Dongarra, Piotr Luszczek, and Antoine Petitet. The LINPACK benchmark: past, present and future. Concurrency and Computation: Practice and Experience, 15(9):803–820, August 2003.

[34] Gunter Dueck and Tobias Scheuer. Threshold accepting: a general purpose optimization algorithm appearing superior to simulated annealing. Journal of Computational Physics, 90(1):161–175, 1990.

[35] James P Durbano and Fernando E Ortiz. FPGA-based acceleration of the 3D finite-difference time-domain method. In Proc. of 12th Symp. on Field-Programmable Custom Computing Machines (FCCM), pages 156–163. IEEE, 2004.

[36] Matteo Frigo, Charles E Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In Proc. of 40th Symp. on Foundations of Computer Science, pages 285–297, 1999.

[37] Kyle Gallivan, William Jalby, Ulrike Meier, and Ahmed H Sameh. Impact of hierarchical memory systems on linear algebra algorithm design. International Journal of High Performance Computing Applications, 2(1):12–48, 1988.

[38] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):721–741, 1984.

[39] Arijit Ghosh and Tony Givargis. Analytical design space exploration of caches for embedded systems. In Design, Automation and Test in Europe Conference and Exhibition, pages 650–655. IEEE, 2003.

[40] Arijit Ghosh and Tony Givargis. Cache optimization for embedded processor cores: An analytical approach. ACM Trans. on Design Automation of Electronic Systems, 9(4):419–440, October 2004.

[41] Ann Gordon-Ross, Frank Vahid, and Nikil Dutt. Automatic tuning of two-level caches to embedded applications. In Proc. of the Conf. on Design, Automation and Test in Europe, page 10208, 2004.

[42] Ann Gordon-Ross, Frank Vahid, and Nikil Dutt. Fast configurable-cache tuning with a unified second-level cache. In Proc. of Int’l Symp. on Low Power Electronics and Design, pages 323–326, 2005.

[43] Kazushige Goto and Robert A Geijn. Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software (TOMS), 34(3):12, 2008.

[44] Torbjörn Granlund and Richard Kenner. Eliminating branches using a superoptimizer and the GNU C compiler. In Proc. of ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), pages 341–352, 1992.

[45] Clemens Grelck, Sven-Bodo Scholz, and Alex Shafarenko. A gentle introduction to S-Net: Typed stream processing and declarative coordination of asynchronous components. Parallel Processing Letters, 18(2):221–237, 2008.

[46] Jayanth Gummaraju, Joel Coburn, Yoshio Turner, and Mendel Rosenblum. Streamware: programming general-purpose multicore processors using streams. In Proc. of 13th Int’l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 297–307. ACM, March 2008.

[47] Matthew R Guthaus, Jeffrey S Ringenberg, Dan Ernst, Todd M Austin, Trevor Mudge, and Richard B Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proc. of 4th Int’l Workshop on Workload Characterization, pages 3–14, 2001.

[48] Mohammad Shihabul Haque, Jorgen Peddersen, Andhi Janapsatya, and Sri Parameswaran. Dew: A fast level 1 cache simulation approach for embedded processors with FIFO replacement policy. In Proc. of Conf. on Design, Automation and Test in Europe, pages 496–501. European Design and Automation Association, 2010.

[49] Eli Harari. Flash memory-the great disruptor! In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pages 10–15. IEEE, 2012.

[50] Charles AR Hoare. Quicksort. The Computer Journal, 5(1):10–16, 1962.

[51] Te C Hu, Andrew B Kahng, and Chung-Wen Albert Tsao. Old bachelor acceptance: A new class of non-monotone threshold accepting methods. ORSA Journal on Computing, 7(4):417–425, 1995.

[52] Intel XScale® core developer’s manual. Technical Report 273473-002, Intel Corporation, January 2004.

[53] Intel® 64 and IA-32 architectures optimization reference manual. Technical Report 248966-029, Intel Corporation, March 2014.

[54] Engin İpek, Sally A McKee, Rich Caruana, Bronis R de Supinski, and Martin Schulz. Efficiently exploring architectural design spaces via predictive modeling. In Proc. of 12th Int’l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 195–206, 2006.

[55] Bruce Jacob, Spencer Ng, and David Wang. Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann, 2010.

[56] Andhi Janapsatya, Aleksandar Ignjatovic, and Sri Parameswaran. Finding optimal L1 cache configuration for embedded systems. In Proc. of Asia and South Pacific Conference on Design Automation, pages 796–801. IEEE, 2006.

[57] Song Jiang and Xiaodong Zhang. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. ACM SIGMETRICS Performance Evaluation Review, 30(1):31–42, 2002.

[58] Lizy Kurian John and Akila Subramanian. Design and performance evaluation of a cache assist to implement selective caching. In Proc. of Int’l Conf. on Computer Design (ICCD), pages 510–518. IEEE, 1997.

[59] Rajeev Joshi, Greg Nelson, and Keith Randall. Denali: a goal-directed superoptimizer. In Proc. of ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), pages 304–314. ACM, June 2002.

[60] Norman P Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proc. of 17th Int’l Symp. on Computer Architecture, pages 364–373, 1990.

[61] Martin Kampe, Per Stenstrom, and Michel Dubois. Self-correcting LRU replacement policies. In Proc. of 1st Conf. on Computing Frontiers, pages 181–191, 2004.

[62] Johnson Kin, Munish Gupta, and William H Mangione-Smith. The filter cache: an energy efficient memory structure. In Proc. of 30th ACM/IEEE Int’l Symp. on Microarchitecture, pages 184–193. IEEE, 1997.

[63] Scott Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[64] Emre Kultursay, Mahmut Kandemir, Anand Sivasubramaniam, and Onur Mutlu. Evaluating STT-RAM as an energy-efficient main memory alternative. In Proc. of IEEE Int’l Symp. on Performance Analysis of Systems and Software (ISPASS), pages 256–267. IEEE, 2013.

[65] Kanishka Lahiri, Anand Raghunathan, and Sujit Dey. Design space exploration for optimizing on-chip communication architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(6):952–961, 2004.

[66] Joseph M. Lancaster, E. F. Berkley Shands, Jeremy D. Buhler, and Roger D. Chamberlain. Timetrial: A low-impact performance profiler for streaming data applications. In Proc. of IEEE Int’l Conf. on Application-Specific Systems, Architectures, and Processors (ASAP), pages 69–76. IEEE, September 2011.

[67] Alvin R Lebeck and David A Wood. Cache profiling and the SPEC benchmarks: A case study. Computer, 27(10):15–26, 1994.

[68] Benjamin C Lee and David M Brooks. Accurate and efficient regression modeling for microarchitectural performance and power prediction. In Proc. of 12th Int’l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 185–194, 2006.

[69] Benjamin C Lee, Engin Ipek, Onur Mutlu, and Doug Burger. Architecting phase change memory as a scalable DRAM alternative. ACM SIGARCH Computer Architecture News, 37(3):2–13, 2009.

[70] Dennis C Lee, Patrick J Crowley, Jean-Loup Baer, Thomas E Anderson, and Brian N Bershad. Execution characteristics of desktop applications on Windows NT. ACM SIGARCH Computer Architecture News, 26(3):27–38, 1998.

[71] Martin Lukasiewycz, Michael Glaß, Christian Haubelt, and Jürgen Teich. Efficient symbolic multi-objective design space exploration. In Proc. of Asia and South Pacific Design Automation Conference, pages 691–696. IEEE Computer Society Press, 2008.

[72] John McCarthy. Recursive functions of symbolic expressions and their computation by machine. Communications of the ACM, 3(4):184–195, 1960.

[73] Doug MacGregor, Dave Mothersole, and Bill Moyer. The Motorola MC68020. IEEE Micro, 4(4):101–118, 1984.

[74] Jack A Mandelman, Robert H Dennard, Gary B Bronner, John K DeBrosse, Rama Divakaruni, Yujun Li, and Carl J Radens. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). IBM Journal of Research and Development, 46(2.3):187–212, 2002.

[75] R Timothy Marler and Jasbir S Arora. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26(6):369–395, 2004.

[76] Henry Massalin. Superoptimizer: a look at the smallest program. In Proc. of 2nd Int’l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 122–126, 1987.

[77] Makoto Matsumoto and Takuji Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation, 8(1):3–30, January 1998.

[78] David May, Dan Page, James Irwin, and Henk L Muller. Microcaches. In Proc. of 6th Int’l Conf. on High Performance Computing (HiPC), pages 21–27. Springer, December 1999.

[79] Ann Marie Grizzaffi Maynard, Colette M Donnelly, and Bret R Olszewski. Contrasting characteristics and cache performance of technical and multi-user commercial workloads. In ACM SIGPLAN Notices, volume 29, pages 145–156. ACM, 1994.

[80] Sally A McKee. Reflections on the memory wall. In Proc. of 1st Conf. on Computing Frontiers, page 162, 2004.

[81] Sally A McKee and William A Wulf. Access ordering and memory-conscious cache utilization. In Proc. of 1st IEEE Symp. on High-Performance Computer Architecture (HPCA), pages 253–262. IEEE, 1995.

[82] Matteo Monchiero, Ramon Canal, and Antonio González. Design space exploration for multicore architectures: a power/performance/thermal view. In Proc. of 20th Int’l Conf. on Supercomputing, pages 177–186. ACM, 2006.

[83] Afrin Naz. Split Array and Scalar Data Caches: A Comprehensive Study of Data Cache Organization. PhD thesis, Univ. of North Texas, 2007.

[84] Nicholas Nethercote and Julian Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proc. of ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 89–100, 2007.

[85] Martin Odersky, Philippe Altherr, Vincent Cremet, Burak Emir, Sebastian Maneth, Stéphane Micheloud, Nikolay Mihaylov, Michel Schinz, Erik Stenman, and Matthias Zenger. An overview of the Scala programming language. Technical Report IC/2004/64, École Polytechnique Fédérale de Lausanne, 2004.

[86] Shobana Padmanabhan, Yixin Chen, and Roger D. Chamberlain. Convexity in non-convex optimizations of streaming applications. In Proc. of IEEE 18th Int’l Conf. on Parallel and Distributed Systems (ICPADS), pages 668–675, December 2012.

[87] Gianluca Palermo, Cristina Silvano, and Vittorio Zaccaria. Discrete particle swarm optimization for multi-objective design space exploration. In Proc. of 11th Conf. on Digital System Design Architectures, Methods and Tools, pages 641–644. IEEE, 2008.

[88] Maurizio Palesi and Tony Givargis. Multi-objective design space exploration using genetic algorithms. In Proc. of 10th Int’l Symp on Hardware/Software Codesign (CODES), pages 67–72. IEEE, 2002.

[89] P.R. Panda, N.D. Dutt, and A. Nicolau. Local memory exploration and optimization in embedded systems. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 18(1):3–13, 1999.

[90] Jeff Parkhurst, John Darringer, and Bill Grundmann. From single core to multi-core: preparing for a new exponential. In Proc. of IEEE/ACM Int’l Conf. on Computer-Aided Design, pages 67–72. ACM, 2006.

[91] http://www.postgresql.org.

[92] Moinuddin K Qureshi, Aamer Jaleel, Yale N Patt, Simon C Steely, and Joel Emer. Adaptive insertion policies for high performance caching. In ACM SIGARCH Computer Architecture News, volume 35, pages 381–391, 2007.

[93] Moinuddin K Qureshi, Vijayalakshmi Srinivasan, and Jude A Rivers. Scalable high performance main memory system using phase-change memory technology. ACM SIGARCH Computer Architecture News, 37(3):24–33, 2009.

[94] P Ranjan Panda, Nikil D Dutt, Alexandru Nicolau, Francky Catthoor, Amout Vandecappelle, Erik Brockmeyer, Chidamber Kulkarni, and Eddy De Greef. Data memory organization and optimizations in application-specific systems. IEEE Design & Test of Computers, 18(3):56–68, 2001.

[95] http://www.raspberrypi.org.

[96] JF Reynolds. A proof of the random-walk method for solving Laplace’s equation in 2-D. The Mathematical Gazette, pages 416–420, 1965.

[97] Tiark Rompf and Martin Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In Proc. of 9th Int’l Conf. on Generative Programming and Component Engineering, pages 127–136, 2010.

[98] John P Scheible. A survey of storage options. Computer, 35(12):42–46, 2002.

[99] Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic superoptimization. In Proc. of 18th Int’l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 305–316, March 2013.

[100] Sandeep Sen, Siddhartha Chatterjee, and Neeraj Dumir. Towards a theory of cache-efficient algorithms. Journal of the ACM, 49(6):828–858, November 2002.

[101] Muhammad Shafiq, Miquel Pericas, Raul de la Cruz, Mauricio Araya-Polo, Nacho Navarro, and Eduard Ayguadé. Exploiting memory customization in FPGA for 3D stencil computations. In Proc. of Int’l Conf. on Field-Programmable Technology (FPT), pages 38–45. IEEE, 2009.

[102] Xipeng Shen, Yutao Zhong, and Chen Ding. Locality phase prediction. In Proc. of 11th Int’l Conf. on Architecture Support for Programming Languages and Operating Systems (ASPLOS), volume 39, pages 165–176, November 2004.

[103] Alan Jay Smith. Cache memories. ACM Computing Surveys (CSUR), 14(3):473–530, 1982.

[104] Byoungro So, Mary W Hall, and Pedro C Diniz. A compiler approach to fast hardware design space exploration in FPGA-based systems. In ACM SIGPLAN Notices, volume 37, pages 165–176. ACM, 2002.

[105] Jesper H Spring, Jean Privat, Rachid Guerraoui, and Jan Vitek. StreamFlex: high-throughput stream programming in Java. ACM SIGPLAN Notices, 42(10):211–228, 2007.

[106] Vinoo Srinivasan, Shankar Radhakrishnan, and Ranga Vemuri. Hardware software partitioning with integrated hardware design space exploration. In Proc. of Design, Automation and Test in Europe, pages 28–35. IEEE, 1998.

[107] John E. Stone, David Gohara, and Guochun Shi. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science and Engineering, 12:66–73, 2010.

[108] Walter A Strauss. Partial Differential Equations: An Introduction. Wiley, 1992.

[109] Jeffrey Stuecheli, Dimitris Kaseridis, Hillery C Hunter, and Lizy K John. Elastic refresh: Techniques to mitigate refresh penalties in high density memory. In Proc. of 43rd IEEE/ACM Int’l Symp. on Microarchitecture, pages 375–384, 2010.

[110] Ching-Long Su and Alvin M Despain. Cache design trade-offs for power and performance optimization: a case study. In Proc. of Int’l Symp. on Low Power Design, pages 63–68. ACM, 1995.

[111] Karthik T Sundararajan, Timothy M Jones, and Nigel P Topham. Smart cache: A self adaptive cache architecture for energy efficiency. In Proc. of Int’l Conf. on Embedded Computer Systems, pages 41–50, 2011.

[112] William Thies, Michal Karczmarek, and Saman Amarasinghe. StreamIt: A language for streaming applications. In Proc. of 11th Int’l Conf. on Compiler Construction, pages 179–196, 2002.

[113] Shyamkumar Thoziyoor, Naveen Muralimanohar, Jung Ho Ahn, and Norman P Jouppi. CACTI 5.1. HP Laboratories, 2, April 2008.

[114] Sumesh Udayakumaran. Compiler-Decided Dynamic Memory Allocation for Scratch-Pad Based Embedded Systems. PhD thesis, Univ. of Maryland, 2006.

[115] Steven P Vanderwiel and David J Lilja. Data prefetch mechanisms. ACM Computing Surveys (CSUR), 32(2):174–199, 2000.

[116] Jasmina Vasiljevic and Paul Chow. MPack: global memory optimization for stream applications in high-level synthesis. In Proc. of 22nd ACM/SIGDA Int’l Symp. on Field-Programmable Gate Arrays (FPGA), pages 233–236, February 2014.

[117] A.V. Veidenbaum, W. Tang, R. Gupta, A. Nicolau, and X. Ji. Adapting cache line size to application behavior. In Proc. of 13th Int’l Conf. on Supercomputing, pages 145–154, 1999.

[118] Manish Verma and Peter Marwedel. Overlay techniques for scratchpad memories in low power embedded processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(8):802–815, 2006.

[119] R.A. Walker and S. Chaudhuri. Introduction to the scheduling problem. IEEE Design & Test of Computers, 12(2):60–69, summer 1995.

[120] Joseph G. Wingbermuehle, Roger D. Chamberlain, and Ron K. Cytron. ScalaPipe: A streaming application generator. In Proc. of 2012 Symp. on Application Accelerators in High-Performance Computing, pages 244–254, July 2012.

[121] Joseph G. Wingbermuehle, Ron K. Cytron, and Roger D. Chamberlain. Compiling for power with ScalaPipe. Journal of Systems Architecture, 59(8):615–625, September 2013.

[122] Joseph G. Wingbermuehle, Ron K. Cytron, and Roger D. Chamberlain. Optimization of application-specific memories. Computer Architecture Letters, April 2013.

[123] Joseph G. Wingbermuehle, Ron K. Cytron, and Roger D. Chamberlain. Superoptimization of memory subsystems. In Proc. of 15th Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2014.

[124] Joseph G. Wingbermuehle, Ron K. Cytron, and Roger D. Chamberlain. Superoptimized memory subsystems for streaming applications. In Proc. of 23rd ACM/SIGDA Int’l Symp. on Field-Programmable Gate Arrays (FPGA), February 2015.

[125] Felix Winterstein, Samuel Bayliss, and George Constantinides. Separation logic-assisted code transformations for efficient high-level synthesis. In Proc. of 22nd Int’l Symp. on Field-Programmable Custom Computing Machines (FCCM), pages 1–8, 2014.

[126] H-SP Wong, Simone Raoux, SangBum Kim, Jiale Liang, John P Reifenberg, Bipin Rajendran, Mehdi Asheghi, and Kenneth E Goodson. Phase change memory. Proceedings of the IEEE, 98(12):2201–2227, 2010.

[127] Wm A Wulf and Sally A McKee. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News, 23(1):20–24, March 1995.

[128] Yuan Xie. Modeling, architecture, and applications for emerging memory technologies. IEEE Design and Test of Computers, 28(1):44–51, 2011.

[129] Yuan Xie, Gabriel H Loh, Bryan Black, and Kerry Bernstein. Design space exploration for 3D architectures. ACM Journal on Emerging Technologies in Computing Systems (JETC), 2(2):65–103, 2006.

[130] http://www.xilinx.com.

[131] Hsin-Jung Yang, Kermin Fleming, Michael Adler, and Joel Emer. Optimizing under abstraction: Using prefetching to improve FPGA performance. In Proc. of 23rd Int’l Conf. on Field-Programmable Logic and Applications (FPL), pages 1–8, 2013.

[132] Lin Yuan and Gang Qu. Design space exploration for energy-efficient secure sensor network. In Proc. of Int’l Conf. on Application-Specific Systems, Architectures and Processors, pages 88–97. IEEE, 2002.

[133] Chuanjun Zhang and Frank Vahid. Using a victim buffer in an application-specific memory hierarchy. In Proc. of Design, Automation and Test in Europe Conference and Exhibition, pages 220–225, 2004.

[134] Baihua Zheng, Jianliang Xu, and Dik Lun Lee. Cache invalidation and replacement strategies for location-dependent data in mobile environments. IEEE Transactions on Computers, 51(10):1141–1153, 2002.

[135] Ping Zhou, Bo Zhao, Jun Yang, and Youtao Zhang. A durable and energy efficient main memory using phase change memory technology. In ACM SIGARCH Computer Architecture News, volume 37, pages 14–23. ACM, 2009.

[136] Omer Zilberberg, Shlomo Weiss, and Sivan Toledo. Phase-change memory: an architectural perspective. ACM Computing Surveys (CSUR), 45(3):29, 2013.

[137] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337–343, 1977.

Appendix A: ScalaPipe

This appendix is an extension of Section 3.1 and is based on [120] and [121]. As mentioned in Section 3.1, ScalaPipe provides two domain-specific languages (DSLs) in the Scala programming language [85] that are used to generate streaming applications. The kernel DSL provides a way to implement processing kernels and the application DSL provides a way to connect the kernels together and map them to the target architecture. Because the streaming application is described in DSLs embedded in Scala, Scala language constructs can be used to generate potentially large and complex application topologies and resource mappings.

A.1 Kernel DSL

The ScalaPipe kernel DSL provides a simple imperative programming language that can be used to implement kernels. To use the kernel DSL, one extends the Kernel class. Within the kernel DSL, the inputs and outputs for the kernel are specified as well as the implementation. Note that it is possible to use existing kernels in target-specific languages such as Verilog or C. To use such kernels, an external statement is used to inform ScalaPipe of the implementation code. It is also possible to mix multiple external implementations for various platforms as well as an internal implementation, which is used if no matching external implementation is available for the desired target.

A.1.1 Language Features

The kernel DSL features many of the standard language constructs available in a traditional programming language for conditionals and looping. By using a version of the Scala compiler with language virtualization features [16], ScalaPipe is able to use the standard Scala control structures such as if and while. Because of this, much Scala code can be easily reused in ScalaPipe with few changes, which eases the prototyping and testing of kernels.

In addition to the basic control structures and math operators, ScalaPipe provides a rich set of data types, which includes primitive types such as integer, floating point, and fixed point types, fixed length arrays, structures, and unions. Kernels that use only these features can run on any resource that ScalaPipe supports. To allow more flexibility and the ability to interface with library code, ScalaPipe also allows function calls to external libraries and pointers. However, kernels using such features can only be mapped to traditional processors.

A.1.2 Example

val AverageU32 = new Kernel {
    val in0 = input(UNSIGNED32)
    val in1 = input(UNSIGNED32)
    val out = output(UNSIGNED32)

    out = (in0 + in1) / 2
}

Figure A.1: Example Kernel

A simple example kernel is shown in Figure A.1. This kernel inputs two values from separate input streams and outputs their average on an output stream. First, the kernel will wait for input to be available from both channels. It then adds the values together and divides by two. Finally, the kernel outputs the average, blocking on the output stream if necessary.

A more complex example for generating pseudo-random numbers using the Mersenne twister algorithm [77] is shown in Figure A.2. This kernel demonstrates several additional features of ScalaPipe, such as local variables and control flow.

In addition to the language facilities that ScalaPipe provides, the Scala language can be used as a type-safe macro pre-processor to enable generic kernel code. Such generic kernels could have compile-time types or other parameters. Further, it is possible to develop kernels that support a compile-time configurable number of ports, as demonstrated by the GenericSplit kernel in Figure 3.2.

The code within a kernel can be thought of as executing in a continuous loop. Each time an input port is referenced, a new value is expected and the kernel will wait until input is available if necessary. Likewise, each time an output port is assigned, a new value is produced for the consumer, again blocking if the output queue is full. Kernels that require no inputs run continuously until they execute a stop statement or the application terminates.

A.1.3 Intermediate Representation

Before kernel code can be generated, it is necessary to turn the kernel DSL program into an intermediate representation. Since the kernel DSL is embedded in Scala, the issue of parsing is handled by the Scala compiler. The DSL code turns into function calls in which the variables of the DSL are objects created by the input, output, and local functions. Because variables in the kernel DSL are objects, Scala code and kernel DSL code can be mixed, allowing Scala to act as a macro language. This is similar to lightweight modular staging introduced in [97], where variable types determine whether an expression is executed when the application generator runs or whether the expression is compiled into code to be executed later.

val MT19937 = new Kernel {
    val out = output(UNSIGNED32)
    val mt = local(Vector(UNSIGNED32, 624))
    val index = local(UNSIGNED32, 0)
    val configured = local(BOOL, false)
    val i = local(UNSIGNED32, 5)   // Random number seed
    val j = local(UNSIGNED32)
    val y = local(UNSIGNED32)

    if (configured) {
        if (index == 624) {
            for (i <- 0 until 624) {
                j = i + 1
                if (j == 624) {
                    j = 0
                }
                y = (mt(i) >> 31) + (mt(j) & 0x7FFFFFFF)
                j = i + 397
                if (j > 623) {
                    j -= 624
                }
                mt(i) = mt(j) ^ (y >> 1)
                if (y & 1) {
                    mt(i) ^= 0x9908b0df
                }
            }
            index = 0
        }
        y = mt(index)
        y ^= (y >> 11)
        y ^= ((y << 7) & 0x9d2c5680)
        y ^= ((y << 15) & 0xefc60000)
        y ^= (y >> 18)
        index += 1
        out = y
    } else {
        mt(index) = i
        i = 0x6c078965 * (i ^ (i >> 30))
        i += index
        index += 1
        configured = index == 624;
    }
}

Figure A.2: Mersenne Twister Kernel

Once the abstract syntax tree is built from the kernel code, it is either used directly for code generation or converted to a control flow graph. When generating code for traditional processors or GPUs, the abstract syntax tree is used directly for code generation. However, for FPGAs the abstract syntax tree is first converted to a control flow graph to enable better Verilog code generation.

A.1.4 Code Generation

For traditional processors, C code is emitted. For graphics processors, OpenCL C [107] code is generated. Finally, for FPGAs, Verilog is generated. Since the kernel language maps easily into the C programming language, the abstract syntax tree is used directly for generating code targeted for traditional processors. Unfortunately, generating code for GPUs and FPGAs requires more work.

For GPUs, it is desirable to allow multiple threads to run the same kernel on different data elements. To allow this, ScalaPipe checks the kernel to see if there is state that needs to be preserved across invocations. If there is no state to be preserved, then each element in the input queues can be processed in parallel. In this case, ScalaPipe will generate code to process each item in the input buffer in a separate thread. Note that this simple method for extracting parallelism leaves much to be desired. However, it provides a prototype for evaluating alternative resource mappings. If a higher performance implementation is sought, it is possible to substitute a custom implementation.

Like GPUs, FPGAs present a problem for automatic code generation from an imperative-style language. To generate Verilog, the abstract syntax tree is first converted into a three-address intermediate representation; that is, most operations can be represented by an operator, a destination, and two sources. These operations are then collected into blocks of operations that can execute simultaneously. This organization allows ScalaPipe to generate state machines where each variable represents a register or wire. Before any of the optimization passes, each operation occupies its own state and all variables are mapped into registers.

As an example of how ScalaPipe generates register-transfer level (RTL) code, consider the kernel in Figure A.3 for computing the nth term of the Fibonacci sequence.

val n = input(UNSIGNED32)
val result = output(UNSIGNED32)
val i = local(UNSIGNED32)
val last = local(UNSIGNED32)
val current = local(UNSIGNED32)
val temp = local(UNSIGNED32)

i = n
last = 1
current = 0
while (i > 0) {
    temp = current
    current += last
    last = temp
    i -= 1
}
result = current

Figure A.3: ScalaPipe Fibonacci Kernel

Figure A.4 shows the kernel after conversion to ScalaPipe’s intermediate representation. Each label represents a state and the goto statements indicate state transitions. In some cases, a state may take multiple cycles to complete. This can happen, for example, when waiting for an input, as is the case for state S1 in Figure A.4, or when performing a division instruction. In such cases, a guard is inserted to prevent the state machine from advancing to the next state until the operation completes.

S1:  i = n
S2:  last = 1
S3:  current = 0
S4:  t1 = i > 0
S5:  if t1 then S6 else S11
S6:  temp = current
S7:  current = current + last
S8:  last = temp
S9:  i = i - 1
S10: goto S4
S11: result = current
S12: goto S1

Figure A.4: Intermediate Representation of the Fibonacci Kernel

As can be seen from this simple example, this straightforward translation leaves quite a bit of room for improvement. The optimization passes in ScalaPipe attempt to address this issue. The kernel after optimization is shown in Figure A.5. After optimization, the number of states in the state machine has been reduced from 12 states to 4. Note that state S3 reads and writes to the same variables. This is acceptable because when converted to RTL, all sources will be evaluated before the assignment. Despite additional room for improvement, the current version of ScalaPipe is unable to improve this code any further.

A.1.5 Optimizations

ScalaPipe performs several optimization passes when generating kernels to target hardware. These passes are mostly traditional compiler optimizations; however, some are specific to the goal of generating hardware.

S1: i = n
    last = 1
    current = 0
S2: if (i > 0) then S3 else S4
S3: temp = current
    current = current + last
    last = temp
    i = i - 1
    goto S2
S4: result = current
    goto S1

Figure A.5: Optimized Fibonacci Kernel

Variable Renaming Variable renaming is an optimization performed by ScalaPipe to expose additional opportunities for other optimizations. In the variable renaming pass, ScalaPipe converts each basic block to static single assignment (SSA) form [31]. ScalaPipe currently does not convert the entire kernel to SSA form to avoid dealing with φ functions.
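For illustration (this example is not from the original), renaming the assignments in a basic block such as:

x = a + b
x = x + c
y = x

produces:

x1 = a + b
x2 = x1 + c
y1 = x2

so that each variable is assigned exactly once, which exposes the independence of the individual values to the later passes.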

Common Subexpression Elimination Common subexpression elimination (CSE) is a traditional compiler optimization that replaces expressions that are recomputed with the earlier result. Such expressions can be written directly in the source program or generated by the compiler when converting the source program into its intermediate representation (for example, the address arithmetic required for array references).
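For example, in the following sketch the sum b + c is computed twice:

x = b + c
y = (b + c) * 2

After CSE, the second occurrence reuses the earlier result:

x = b + c
y = x * 2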

Dead Store Elimination Dead store elimination (DSE) is a compiler optimization that eliminates stores to variables that are never subsequently read. Such stores are rare in hand-written code since they serve no purpose, but they can appear, especially in generic code.
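For example, in the sketch below the first assignment to a is overwritten before it is ever read, so it can be removed:

a = b + c
a = d
x = a + 1

becomes

a = d
x = a + 1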

Dead Code Elimination Dead code elimination (DCE) is an optimization that eliminates code whose results are never used. As is the case with dead store elimination, dead code is rare, but it can appear with generic code.
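As an illustrative sketch, DCE can remove an entire chain of computations whose final result never reaches an output or a later use:

t1 = x * y
t2 = t1 + 1
z = x + y

If t2 is never used, the first two operations are removed, leaving only:

z = x + y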

Strength Reduction Strength reduction is an optimization that replaces expensive operations with less expensive operations. For example, division by a power of two can be converted into a shift operation. Although it is possible to write code that already takes strength reduction into account, applying strength reduction as an optimization pass can allow code to be written in a more natural way. Moreover, the values that allow for strength reduction (for example, divisors that are powers of 2) may be present only in some uses of a particular piece of code.
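For example, an unsigned division by 8 can be replaced by a right shift by 3:

x = y / 8

becomes

x = y >> 3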

Copy Propagation Copy propagation—another traditional compiler optimization—replaces uses of variables that are the target of direct assignments with their value. For example, given the following code segment:

x = y
z = x + 1

Copy propagation (followed by dead store elimination to remove the now-unused assignment to x) would yield:

z = y + 1

Sequences such as these are common in the code generated by the front end of the compiler. Therefore, this optimization is often applicable even if the source program does not contain any such sequences.

State Space Compression State space compression is an optimization that ScalaPipe uses to collapse multiple operations into a single state. ScalaPipe does this by moving all operations into the earliest possible state. Note that this is equivalent to ASAP scheduling [119]. For example, given the following sequence:

S1: a = b + c
S2: a = a - 1
S3: x = y + z

State space compression would combine states S1 and S3 giving:

S1: a = b + c
    x = y + z
S2: a = a - 1

Note that states S1 and S2 cannot be combined because of a read-after-write data dependency on a.

State Elimination State elimination is an optimization that ScalaPipe uses to combine a chain of dependent operations into a single state. For example, given the following sequence:

S1: a = b + c
S2: a = a - 1
S3: x = y + z

State elimination would eliminate the second state by combining it with the first state:

S1: a = b + c - 1
S3: x = y + z

A.2 Application DSL

The ScalaPipe application DSL is used to connect kernels together and map them to resources. To use the application DSL, one extends the Application class.

A.2.1 Overview

The application DSL allows one to specify how kernels are connected using functional composition. Each kernel takes a list of streams for input and returns a list of streams, which can then be passed to other kernels. To allow large and complex topologies to be generated, Scala can be used as a type-safe macro language.
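As a small sketch of this (Source, Stage, and Sink are hypothetical user-defined kernels, not part of ScalaPipe), an ordinary Scala loop can construct a pipeline whose depth is a parameter rather than something written out by hand:

// Build a four-stage pipeline by repeatedly applying the Stage kernel.
var s = Source()
for (_ <- 0 until 4) {
  s = Stage(s)
}
Sink(s)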

In addition to the application topology, the application DSL is where resource mapping is described. The map function is used for this purpose. Given a stream or edge specification as a parameter, the map statement marks where data flows from one resource to another.

A.2.2 Resource Mapping

The application DSL creates a graph of kernels connected by streams. Some streams may change resources as indicated by a map statement. To map the kernels onto resources, ScalaPipe first assumes all kernels are unassigned and then processes the streams one by one until either all kernels are assigned a resource or no more changes occur. If any kernels remain unmapped at this point, they are assumed to reside on a traditional processor. Note that it is possible to specify an invalid mapping; in this case, the error is reported when the ScalaPipe application generator is run.
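The following is a minimal, self-contained sketch of this fixed-point assignment. The names and data structures are illustrative only and do not reflect ScalaPipe’s internal representation:

object MappingSketch {
  // A kernel starts with no resource assignment.
  case class Kernel(name: String, var resource: Option[String] = None)
  // 'pinned' marks an edge named in a map() statement; its endpoints
  // received their resources directly, so we do not propagate across it.
  case class Stream(src: Kernel, dst: Kernel, pinned: Boolean = false)

  def assign(kernels: Seq[Kernel], streams: Seq[Stream]): Unit = {
    var changed = true
    while (changed) {
      changed = false
      for (s <- streams if !s.pinned) {
        (s.src.resource, s.dst.resource) match {
          case (Some(r), None) => s.dst.resource = Some(r); changed = true
          case (None, Some(r)) => s.src.resource = Some(r); changed = true
          case _ =>
        }
      }
    }
    // Any kernel still unassigned defaults to a traditional processor.
    for (k <- kernels if k.resource.isEmpty) k.resource = Some("CPU")
  }
}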

Once the resource mapping is complete, ScalaPipe generates the kernels for the required resources and then creates an application with the queues and threads necessary to run the kernels. This application also includes any code necessary for routing data to other devices such as GPUs or FPGAs. For GPUs, ScalaPipe generates code that uses OpenCL [107] for communication and for compiling the kernel code. For FPGA devices, code to communicate with the driver for the FPGA device is generated on the software side. On the hardware side, a top-level file is generated to connect the kernels on the FPGA device.

A.2.3 Example

val app = new Application {
  val rand1 = Random()
  val rand2 = Random()
  Print(AverageU32(rand1, rand2))

  map(AverageU32 -> Print, FPGA2CPU())
}

Figure A.6: Example Application

A simple example application is shown in Figure A.6. This application generates two streams of random numbers by instantiating two Random kernels. The outputs of these kernels are then averaged using the AverageU32 kernel described in Section A.1.2. Finally, the output of the AverageU32 kernel is printed to the screen via the Print kernel.

The example also demonstrates a map statement. The map statement describes the edge between the AverageU32 and Print kernels. The type of edge is an FPGA2CPU edge, meaning that this edge carries data from an FPGA resource to a CPU resource. The result is that all the kernels except the Print kernel will be implemented on an FPGA device; the Print kernel will be implemented for a general-purpose processor. Each kernel implemented on a general-purpose processor is assigned its own thread. Note that it is possible to specify multiple FPGA or CPU resources using arguments to the FPGA2CPU function.

A.2.4 TimeTrial

To support performance profiling of a streaming application, ScalaPipe has TimeTrial [66] built in. TimeTrial allows one to instrument the queues between kernels. For example, to discover a bottleneck, one might want to know which queues are consistently full. The use of TimeTrial in ScalaPipe works much like resource mapping, but with measure statements instead of map statements.

An example TimeTrial statement for the application in Figure A.6 is shown below.

measure(ANY_KERNEL -> AverageU32, 'backpressure)

This statement causes ScalaPipe to instrument all edges entering the AverageU32 kernel. In this case, there are two such edges. For each of these edges, TimeTrial will monitor “backpressure”, which is the fraction of time the producer could not enqueue an item because the queue was full. This information is reported once per frame as the application runs, where a frame is typically a 1-second interval of time. Using statements such as this, it is possible to track down bottlenecks in the application.
