Design and Optimization of Networks-On-Chip for Future Heterogeneous Systems-On-Chip
Total Page:16
File Type:pdf, Size:1020Kb
Design and Optimization of Networks-on-Chip for Future Heterogeneous Systems-on-Chip Young Jin Yoon Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2017 ©2017 Young Jin Yoon All Rights Reserved ABSTRACT Design and Optimization of Networks-on-Chip for Future Heterogeneous Systems-on-Chip Young Jin Yoon Due to the tight power budget and reduced time-to-market, Systems-on-Chip (SoC) have emerged as a power-efficient solution that provides the functionality required by target applications in em- bedded systems. To support a diverse set of applications such as real-time video/audio processing and sensor signal processing, SoCs consist of multiple heterogeneous components, such as software processors, digital signal processors, and application-specific hardware accelerators. These com- ponents offer different flexibility, power, and performance values so that SoCs can be designed by mix-and-matching them. With the increased amount of heterogeneous cores, however, the traditional interconnects in an SoC exhibit excessive power dissipation and poor performance scalability. As an alternative, Networks-on-Chip (NoC) have been proposed. NoCs provide modularity at design-time because communications among the cores are isolated from their computations via standard interfaces. NoCs also exploit communication parallelism at run-time because multiple data can be transferred simul- taneously. In order to construct an efficient NoC, the communication behaviors of various heterogeneous components in an SoC must be considered with the large amount of NoC design parameters. There- fore, providing an efficient NoC design and optimization framework is critical to reduce the design cycle and address the complexity of future heterogeneous SoCs. This is the thesis of my dissertation. Some existing design automation tools for NoCs support very limited degrees of automation that cannot satisfy the requirements of future heterogeneous SoCs. First, these tools only support a limited number of NoC design parameters. Second, they do not provide an integrated environment for software-hardware co-development. Thus, I propose FINDNOC, an integrated framework for the generation, optimization, and validation of NoCs for future heterogeneous SoCs. The proposed framework supports software- hardware co-development, incremental NoC design-decision model, SystemC-based NoC customiza- tion and generation, and fast system protyping with FPGA emulations. Virtual channels (VC) and multiple physical (MP) networks are the two main alternative meth- ods to obtain better performance, provide quality-of-service, and avoid protocol deadlocks in packet- switched NoC design. To examine the effect of using VCs and MPs with other NoC architectural parameters, I completed a comprehensive comparative analysis that combines an analytical model, synthesis-based designs for both FPGAs and standard-cell libraries, and system-level simulations. Based on the results of this analysis, I developed VENTTI, a design and simulation environment that combines a virtual platform (VP), a NoC synthesis tool, and four NoC models characterized at different abstraction levels. VENTTI facilitates an incremental decision-making process with four NoC abstraction models associated with different NoC parameters. The selected NoC parameters can be validated by running simulations with the corresponding model instantiated in the VP. I augmented this framework to complete FINDNOC by implementing ICON, a NoC generation and customization tool that dynamically combines and customizes synthesizable SystemC compo- nents from a predesigned library. Thanks to its flexibility and automatic network interface gener- ation capabilities, ICON can generate a rich variety of NoCs that can be then integrated into any Embedded Scalable Platform (ESP) architectures for fast prototying with FPGA emulations. I designed FINDNOC in a modular way that makes it easy to augment it with new capabilities. This, combined with the continuous progress of the ESP design methodology, will provide a seam- less SoC integration framework, where the hardware accelerators, software applications, and NoCs can be designed, validated, and integrated simultaneously, in order to reduce the design cycle of future SoC platforms. Table of Contents List of Figures v List of Tables viii Acknowledgements ix 1 Introduction 1 1.1 Trends in SoC Design Methodology . .2 1.2 NoC Design Space and Automation Tools . .5 1.3 Overview of FINDNOC................................7 1.4 Outline of the Dissertation . .9 2 Future of Heterogeneous Systems-on-Chip 11 2.1 Moore’s Law and Dennard’s CMOS Scaling . 11 2.1.1 Dark Silicon and the Four Horsemen . 14 2.1.2 Challenges of Specialization . 16 2.2 Embedded Scalable Platforms . 16 2.2.1 ESP Architecture . 17 2.2.2 ESP Design Methodology . 19 2.3 Concluding Remarks . 21 3 Networks-on-Chip 23 i 3.1 Emergence of Networks-on-Chip . 23 3.2 Basics of Network Design . 25 3.2.1 Common Considerations . 25 3.2.2 Topology . 27 3.2.3 Routing . 28 3.2.4 Flow Control . 34 3.2.5 Performance Evaluation . 42 3.2.6 Simulation Methodology . 47 3.3 Design and Optimization of Networks-on-Chip . 49 3.3.1 Characteristics of Networks-on-Chip . 50 3.3.2 NoC Router Architecture . 52 3.4 Concluding Remarks . 56 4 A Comparative Analysis of Multiple Physical Networks and Virtual Channels 57 4.1 Introduction . 58 4.2 Related Work . 61 4.3 Comparison Methodology . 63 4.3.1 Competitive Sizing . 64 4.3.2 Minimum Sizing . 65 4.4 Analytical Model . 66 4.4.1 Microarchitectural Analysis . 66 4.4.2 Latency Analysis . 68 4.5 Synthesis-Based Analysis . 69 4.5.1 Area Analysis . 70 4.5.2 Power Analysis . 72 4.5.3 Analysis with Architectural Optimizations of VC Routers . 74 4.5.4 The Effect of Channel Width . 75 4.5.5 Analysis with Minimum Sizing . 75 4.5.6 Synthesis Analysis with FPGA . 76 ii 4.5.7 Summary . 77 4.6 System-Level Performance Analysis . 78 4.6.1 Throughput . 80 4.6.2 Latency . 84 4.6.3 Impact of Minimum Sizing . 84 4.7 Case Study: Shared-Memory CMPs . 85 4.7.1 Experimental Methodology . 85 4.7.2 Experimental Results . 88 4.8 Future Work . 92 4.9 Concluding Remarks . 94 5 Vertically Integrated Framework for Simulation and Optimization of Networks-on- Chip 96 5.1 Introduction . 97 5.2 Related Work . 99 5.3 The VENTTI Integrated Framework . 100 5.4 VENTTI Network Abstraction Layers . 103 5.4.1 Transaction Verification Model . 103 5.4.2 Hop-Count-Based Network Model . 103 5.4.3 Filt-Based Network Model . 106 5.4.4 Pin-and-Cycle-Accurate Model . 107 5.5 Case Study . 108 5.6 Conclusions . 114 6 System-Level Design of Networks-on-Chip for Heterogeneous Systems-on-Chip 115 6.1 Introduction . 116 6.2 Related Work . 118 6.3 Overview of the Proposed Framework . 119 6.3.1 SystemC NoC Component Library . 120 iii 6.3.2 Input and Output Units . 122 6.3.3 Network Interfaces . 122 6.3.4 Network Generation . 125 6.4 Experimental Results . 127 6.4.1 Experiments with ASIC Design Flow . 128 6.4.2 Experiments with FPGA Designs . 130 6.5 Conclusions . 132 7 Conclusion 134 7.1 Limitations and Future Work . 136 Bibliography 151 Appendix: Instructions for VENTTI and ICON 152 1 VENTTI ........................................ 152 1.1 Software Requirements . 153 1.2 File Structure . 154 1.3 Login and Compilation of COSI ....................... 155 1.4 Initial Simulations with TVM . 156 1.5 FNM and HNM NoC Generations with COSI . 157 2 ICON ......................................... 158 2.1 Software Requirements . 158 2.2 File Structure . 159 2.3 Generation of API Reference Manual . 159 2.4 XML Generation and Front-End Compilation . 160 2.5 High-Level and Logic Syntheses . 161 2.6 SystemC, RTL, and Netlist Simulations . 161 iv List of Figures 1.1 A comparison of classical design vs. hardware-software co-design [114].......4 1.2 The organization of FINDNOC. ...........................7 2.1 The end of Moore’s law and Dennard’s CMOS scaling. 12 2.2 Four key approaches for the dark silicon [112].................... 14 2.3 Socket interfaces for the two main heterogeneous components in an ESP architec- ture: processors and accelerators [30]......................... 17 2.4 An example of the ESP accelerator [79]........................ 19 2.5 The ESP design methodology [30] .......................... 20 3.1 Probability of contention with the bus-based interconnect. 24 3.2 Topology of a 4x4 2D torus and its layout. 27 3.3 Examples of direct and indirect network topologies. 28 3.4 An example of deadlock and its dependency graph. 29 3.5 An example of routing livelock with a deflecting flow control mechanism and multi- filt packets. 30 3.6 All possible turns in 2D-Mesh, and turn restrictions of routing algorithms. 31 3.7 Header encodings for two different routing types..