Dynamic Runtime Scheduler Support for SCORE by Michael Monkang
Total Page:16
File Type:pdf, Size:1020Kb
Dynamic Runtime Scheduler Support for SCORE By Michael Monkang Chu Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II. Approval for the Report and Comprehensive Examination: Committee: _____________________________________ Professor John Wawrzynek Research Advisor _____________________________________ (Date) ******* ____________________________________ Professor John Kubiatowicz Second Reader ____________________________________ (Date) Table of Contents ABSTRACT................................................................................................................................... 5 1 INTRODUCTION...................................................................................................................... 6 1.1 SCORE COMPUTE MODEL.................................................................................................................................. 6 1.1.1 Execution Model .......................................................................................................................................... 7 1.1.2 Hardware Requirements .............................................................................................................................. 9 1.2 SCHEDULER RESPONSIBILITIES.......................................................................................................................... 10 1.3 RELATED WORK................................................................................................................................................ 11 1.3.1 Multiprocessors ......................................................................................................................................... 11 1.3.2 Dataflow Systems....................................................................................................................................... 11 2 METHODOLOGY & IMPLEMENTATION....................................................................... 13 2.1 OVERVIEW OF SYSTEM FLOW............................................................................................................................ 13 2.2 SCHEDULING SCENARIOS................................................................................................................................... 14 2.2.1 Small Design, Large Array ........................................................................................................................ 14 2.2.2 Large Design, Small Array ........................................................................................................................ 15 2.2.3 Bufferlocked Design................................................................................................................................... 15 2.3 SCHEDULER FLOW............................................................................................................................................. 15 2.3.1 Operator Instantiation ............................................................................................................................... 15 2.3.2 Timeslice Iteration ..................................................................................................................................... 16 2.4 SCHEDULER IMPLEMENTATION HIGHLIGHTS ..................................................................................................... 22 2.4.1 “Frontier” Scheduling Heuristic............................................................................................................... 22 2.4.2 Pseudo-“Buddy” System Memory Management........................................................................................ 23 2.4.3 Deadlock Detection & Bufferlock Detection/Resolution ........................................................................... 25 2.4.4 Simulator Interface .................................................................................................................................... 26 3 RESULTS ................................................................................................................................. 27 3.1 APPLICATION OVERVIEW................................................................................................................................... 27 3.1.1 Wavelet Encoder/Decoder ......................................................................................................................... 27 3.1.2 JPEG Encoder ........................................................................................................................................... 27 3.2 PERFORMANCE SCALING ................................................................................................................................... 27 3.3 SCHEDULER RUNTIME COST ANALYSIS............................................................................................................. 29 4 FUTURE WORK..................................................................................................................... 31 4.1 EXPANDABLE STITCH BUFFERS ......................................................................................................................... 31 4.2 QUANTIZED PRIORITY “FRONTIER” CLUSTER LIST............................................................................................ 31 4.3 STATIC SCHEDULING IN SCORE ....................................................................................................................... 31 4.4 MIN-EDGE-CUT CLUSTERING............................................................................................................................. 32 4.5 PROCESS FAIRNESS GUARANTEES ..................................................................................................................... 33 5 CONCLUSION ........................................................................................................................ 34 6 APPENDIX – SCHEDULER DATA...................................................................................... 35 6.1 Data Types.................................................................................................................................................... 35 6.2 Data Structures............................................................................................................................................. 37 7 BIBLIOGRAPHY.................................................................................................................... 39 2 Table of Figures Figure 1 - Example of Page Decomposition: (a) original operator, (b) mapped to logic elements (LEs), (c) decomposed into fixed-size, 64-LE pages ............................................ 7 Figure 2 - Dataflow Computation Graph with both Compute Pages and Segments.............. 8 Figure 3 - Segments and Other Data mapped onto a CMB ...................................................... 8 Figure 4 - Stream Signals ............................................................................................................. 8 Figure 5 - Video Processing Operator......................................................................................... 9 Figure 6 - Fully Spatial Implementation of Video Processing Operator ................................. 9 Figure 7 - Capacity-Limited, Temporal Implementation of Video Processing Operator...... 9 Figure 8 - Hypothetical, single-chip SCORE system............................................................... 10 Figure 9 - High-level SCORE System Flow & Interaction ..................................................... 14 Figure 10 - Concurrent Operation of Processor & Array....................................................... 14 Figure 11 - Execution Flow of addOperator().......................................................................... 16 Figure 12 - Dataflow Graph with Strongly Connected Components Marked ...................... 16 Figure 13 - Execution Flow of doSchedule()............................................................................. 17 Figure 14 - Example of Explicit and Implicit Done Nodes...................................................... 18 Figure 15 - Example of trial-based scheduling. Shaded indicates speculatively scheduled and dots indicate speculative stitch buffers. ...................................................................... 21 Figure 16 - Example of Various Elements of "Frontier" Scheduling.................................... 23 Figure 17 - Example of Inter-Cluster Feedback Loop ............................................................ 24 Figure 18 - Allocation of CMB Memory Blocks in Pseudo-“Buddy” System ....................... 24 Figure 19 - Example of Dependency Cycle in Bufferlock Detection ...................................... 25 Figure 20 - Wavelet Encoder Dataflow..................................................................................... 27 Figure 21 - JPEG Encoder Dataflow......................................................................................... 28 Figure 22 - Wavelet Encoder: Makespan vs. Array Size (64-LUT CPs) ............................... 28 Figure 23 - Wavelet Decoder: Makespan vs. Array Size (64-LUT CPs)................................ 28 Figure 24 - JPEG Encoder: Makespan vs. Array Size (512-LUT CPs) ................................. 29 Figure 25 - Breakdown of Per Timeslice Scheduler Cost (Wavelet Encoder Execution)