PhD Dissertation
ABSTRACT

WIDIALAKSONO, RANDY HARI. Three-Dimensional Integration of Heterogeneous Multi-Core Processors. (Under the direction of Dr. Paul Franzon and Dr. W. Rhett Davis.)

This dissertation explores the advantages of and design methodology for 3D integration in the context of building heterogeneous multi-core processors. The processor features a fast thread migration and cache core decoupling scheme. First, we present empirical results in a commercial 130 nm process. We demonstrate that the 3D implementation of a heterogeneous multi-core processor consumes 31% less power and has 22% shorter average wirelength than the 2D implementation. Second, this work presents the physical design methodology used for the tape-out of a die-stacked 3D-IC processor. Finally, we propose a new algorithm and methodology for timing-driven 3D-IC via assignment. Experimental results show up to 30% improvement in total negative slack compared to a via assignment algorithm with a total-wirelength objective function.

© Copyright 2016 by Randy Hari Widialaksono

All Rights Reserved

Three-Dimensional Integration of Heterogeneous Multi-Core Processors

by
Randy Hari Widialaksono

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Computer Engineering

Raleigh, North Carolina
2016

APPROVED BY:

Dr. Eric Rotenberg
Dr. Agnes Szanto
Dr. Paul Franzon, Co-chair of Advisory Committee
Dr. W. Rhett Davis, Co-chair of Advisory Committee

DEDICATION

Dedicated to my wife and my parents, who instilled the importance of pursuing and applying knowledge.

BIOGRAPHY

Randy Widialaksono was born in Jakarta, Indonesia. He completed his Bachelor's degree in Electrical Engineering at Institut Teknologi Bandung, Indonesia, in 2009. He started his Ph.D. in Computer Engineering at North Carolina State University in 2010. His research focus is on design implementation methodologies for realizing 3D integrated circuits.
He also maintains an active interest in computer architecture, digital VLSI design, and machine learning. He has been an IEEE member since 2008.

ACKNOWLEDGEMENTS

First of all, I would like to thank both of my advisors, Dr. Paul Franzon and Dr. W. Rhett Davis, for being supportive and providing the opportunity for a rewarding research project.

I would also like to thank the following faculty: Dr. Eric Rotenberg for teaching advanced computer micro-architecture concepts. Dr. Krishnendu Chakrabarty at Duke University for collaborating on our research project and welcoming me to his DFT course and research. Dr. Agnes Szanto for feedback on the assignment problem and teaching computer algebra.

I would like to thank the following people for their contributions that made this dissertation possible: Dr. Steve Lipa for his tremendous contributions in deploying the design kit infrastructure, signing off our tapeouts, and developing numerous EDA utilities. Zhenqian Zhang for being a great colleague throughout the research project and helping in the final days of the tapeout by performing timing ECO fixes. Bagus Wibowo for collaboration on the timing-driven via assignment experiments. Wenxu Zhao for collaboration on papers and helpful technical discussions. Josh Ledford for developing customized I/O pads for the 3D-IC tapeout process. Jongbeum Park for sharing insights on device and interconnect scaling. Kirti Bhanushali and Dr. T. Robert Harris for proofreading and presentation feedback. Thor Thorollfsson for mentoring and sharing tape-out/Ph.D. experience. Elliott Forbes for conducting chip bringup of the 2D prototype. Rangeen for taking part in physical design for tapeouts and collaboration on papers. Brandon Dwiel for establishing the processor implementation. Vinesh Srinivasan for providing and verifying the 3D processor netlist. Qouniitah Fadhilah for full support throughout the doctoral program, proofreading, and assisting in the graphics and typesetting department.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1 Introduction
  1.1 Overview of the Following Chapters
  1.2 Abbreviations

Chapter 2 3D Integration
  2.1 3D Multi-core Processor
  2.2 Challenges for 3D
  2.3 Design for Test
    2.3.1 3D DFT Overhead
  2.4 On-chip Timing Measurements
  2.5 Routability Improvement
  2.6 3D Via Assignment

Chapter 3 3D-IC Physical Design Methodology
  3.1 Fabrication Process Technology
  3.2 Processor Architecture
    3.2.1 2D Prototype
    3.2.2 3D Architecture
  3.3 Design Flow
  3.4 Floorplan
  3.5 Via Assignment
    3.5.1 Visualization Tool
  3.6 Power Delivery
  3.7 Timing
    3.7.1 Timing Constraints and Analysis
    3.7.2 Inter-tier Clock Skew Balancing
  3.8 Physical Verification
    3.8.1 Design Rule Checks
    3.8.2 Connectivity Checks
  3.9 Physical Design Metrics

Chapter 4 3D-IC Benefits Case Study
  4.1 2D vs 3D Register File
    4.1.1 Experimental Framework
    4.1.2 Floorplan
    4.1.3 Area Comparison
    4.1.4 Power Analysis
    4.1.5 Face-to-face Via Pitch Analysis
    4.1.6 Routing Congestion
    4.1.7 Wirelength Analysis
  4.2 2D vs 3D Processor
    4.2.1 Floorplan
    4.2.2 Wirelength Analysis
    4.2.3 Power Analysis
    4.2.4 Path Delay Analysis
  4.3 Conclusion

Chapter 5 Timing Driven Via Assignment in 3D-IC
  5.1 Timing Metrics
  5.2 Optimal Assignment
  5.3 Nearest-Neighbor Assignment
    5.3.1 Timing-Ordered
    5.3.2 Contention Based
  5.4 Resolving Multiple Sinks
    5.4.1 Fan-In
    5.4.2 Fan-Out
  5.5 Timing Aware Cost Function
  5.6 Congestion Avoidance
  5.7 Experiment Results
    5.7.1 Framework
    5.7.2 Runtime
    5.7.3 Parameter Search
    5.7.4 Wirelength Comparison
    5.7.5 Quality of Result Comparison
  5.8 Conclusion

Chapter 6 Conclusion and Future Work
  6.1 Summary of Contributions
  6.2 Future Work

BIBLIOGRAPHY

LIST OF TABLES

Table 3.1 Process technology metrics
Table 3.2 H3 core types
Table 3.3 FabScalar processor metrics [33]
Table 3.4 Estimated maximum currents per metal width for vias and metals (mA per µm) [12]
Table 3.5 Physical design metrics of the fabricated 3D-IC processor
Table 4.1 Face-to-face via experiment parameters
Table 5.1 Assignment runtime with 2500 x 2500 problem size (seconds)
Table 5.2 Comparison of total wirelength between via assignment schemes (µm)
Table 5.3 Comparison of WNS between via assignment schemes (ns)

LIST OF FIGURES

Figure 2.1 Vernier TDC architecture for 3D on-chip timing measurements
Figure 2.2 3D on-chip timing measurement scheme
Figure 3.1 Cross-section of 3D-IC stack
Figure 3.2 Prototype fabricated in IBM-8RF 130 nm
Figure 3.3 Inter-core state transfer scheme: fast thread migration (FTM) [10]
Figure 3.4 Inter-core state transfer scheme: cache core decoupling (CCD) [10]
Figure 3.5 The 3D-IC physical design flow
Figure 3.6 Detailed 3D-IC EDA tool flow
Figure 3.7 3D-IC heterogeneous processor floorplan
Figure 3.8 Inter-tier signal to via assignment flow
Figure 3.9 F2F via visualization and analysis tool
Figure 3.10