Designing Low Power and High Performance Network-On-Chip Communication Architectures for Nanometer Socs

Designing Low Power and High Performance Network-on-Chip Communication Architectures for Nanometer SoCs DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Gursharan Reehal, B.S., M.S. Graduate Program in Electrical and Computer Engineering The Ohio State University 2012 Dissertation Committee: Prof. Mohammed Ismail El-Naggar, Advisor Prof. Steve Bibyk Prof. Joanne DeGroat c Copyright by Gursharan Reehal 2012 ABSTRACT Network-on-Chip (NoC) communication architectures have been recognized as the most scalable and efficient solution for on chip communication challenges in the multi-core era. Diverse demanding applications coupled with the ability to integrate billions of transistors on a single chip are some of the main driving forces behind ever increasing performance requirements towards the level that requires several tens to over a hundred of cores per chip, with aggregate performance exceeding one trillion operations per second. Such tera-scale many-core processors will be highly integrated System-on-Chip designs (SoC) containing a variety of on-chip storage elements, mem- ory controllers, and input/output (I/O) functional blocks. Small scale multicore processors so far have been a great commercial success and found applicability in high bandwidth, computer intensive applications including high performance, throughput oriented, scientific computing, high performance graphics and 3-D immersive visual interfaces, as well as in decision and support systems. Systems using multi-core processors are now the norm rather than the exception. As the number of cores or components integrated into a single system is keep increasing, the design of on-chip communication architecture is becoming more chal- lenging. The increasing number of components in a system translates into more ii inter-component communication that must be handled by the on-chip communication infrastructure. It's not surprising to see that leading-edge design teams search- ing for better solutions as multi-core SoCs continue to evolve. Future system-on-chip (SoC) designs require predictable, scalable and reusable on-chip communication architectures to increase reliability and productivity. Current bus-based interconnect architectures are inherently non-scalable, less adaptable for reuse and their reliability decreases with system size. NoC communication guarantees scalability, high-speed, high-bandwidth communication with minimal wiring overhead and routing issues. NoCs are layered, packet- based on-chip communication networks integrated onto a single chip and their opera- tion is based on the operating principle of macro networks. NoC consists of resources and switches that are directly connected in a way that resources are able to com- municate with each other by sending messages. The proficiency of a NoC to meet its design goals and budget requirements for the target application depends on its design. Often, these design goals conflict and trade-off with each other. The multi- dimensional pull of design constraints in addition to technology scaling complicates the process of NoC design in many aspects, as they are expected to support high performance and reliability along with low cost, smaller area, less time-to-market and lower power consumption. To aid the process, this research presents design method- ologies to achieve low power and high performance NoC communication architectures for nanometer SoCs. In NoCs, interconnects play a crucial role in the overall system performance and can have a large impact on the total power consumption, wiring area and achievable system performance. The effect of technology scaling on the NoC interconnects is iii studied and an improved design flow is presented. The influence of technology node, die size, number of components on the power consumption by the NoC interconnects is analyzed. The success of NoC heavily depends on its power budget. As CMOS technology continues to scale, power aware design is more important than ever before, especially for the designs targeted towards low power applications, however in large scale NoCs the power consumption can increase beyond acceptable limits. Designing low power NoC is therefore extremely important especially for larger SoCs designs. The elevation of power to a first-class design constraint requires that power estimations are done at the same time as the performance studies in the design flow. In NoC, one method to have a power aware design is to consider the impact of the architectural choices on the power in the early stages of design process. In this research an efficient design methodology based on the layout and power models is presented to have rough power estimates in the early stages of design cycle. The impact of die size and number of IPs on the power consumed by different NoC architectures is evaluated. Additionally, as multi-core SoCs continue to evolve, Globally Asynchronous Lo- cally Synchronous (GALS) design techniques have been suggested as a potential solution in larger and faster SoC designs to avoid the problems related to synchronization and clock skew. These multi-core SoCs will operate using GALS paradigm, where each core can operate in a separate clock domain. In this research a study on the power efficiency between Synchronous and Asynchronous NoC architecures is presented. Asynchronous NoC architecture is shown to consume less power, when activity factor of data transfers between two switches is within certain range. Asynchronous designs are more power efficient, as the need for clock distribution is eliminated. iv Dedicated to my mother, for her love, support and encouragement. v ACKNOWLEDGMENTS I would like to thank many remarkable people who have supported and encouraged me during my time at The Ohio State University. First and foremost, I would like to thank my advisor, Prof. Mohammed Ismail for his continued guidance, support and sustained encouragement throughout my graduate study. He always encouraged me to think independently and to define my own research agenda, allowing me to develop the skills necessary to do research. His significant effort in bringing the Synopsys Software, an industry standard EDA tool for research in the area of Digital VLSI and henceforth enriching the quality of education and student research experience here at The Ohio State University is highly remarkable. Without his commitment and encouragement, this dissertation would not have been possible. The experience with Prof. Ismail will always be highly regarded. I would like to thank Prof. Steve Bibyk for his early guidance, mentorship, encouragement and support to pursue the PhD program. I am thankful for his valuable discussions throughout my time in the ECE department and for being my MS advisor. It truly has been a great experience working with him. I am also very grateful to Prof. Joanne DeGroat for her support, guidance and for kindly serving on my PhD exam committees. I am thankful to her for treating me as a member of her own group. vi I am thankful for the opportunity to do some research with Mohamed Abd Ghany, who is affiliated with the German University Cairo in Cairo Egypt. He has been a very helpful friend and a great source of guidance in the work on Asynchronous NoCs. I am grateful for his insight in the asynchronous digital design and for his support. I enjoyed working with him. I am honored to call myself a member of the VLSI Lab. I want to thank my fellow colleagues Amneh Akour, Sleiman Bou Sleiman, John Hu, Sidharth Balasubramanian, Yiqiao Lin, Feiran Lei and Samantha Yoder for their friendship, discussions and valuable guidance. In particular, I would like to thank Amneh Akour, who has been a great source of sage advice in the times, when I needed it the most. It truly has been a privilege for me to work with them in the VLSI lab and I always enjoyed and cherished their company. I would also like to thank many people in the ECE department. Stephanie Muldrow, Carol Duhigg, Tricia Toothman, Vincent Juodvalkis, Aaron Aufderheide, Don Gibb, and Edwin Lim, who work very hard and diligently behind the scenes, to make this department a wonderful place for graduate students. Their friendly and helping nature ease the stress of a graduate student and who are always willing to help and guide students in our department. Life as a graduate student would not be possible without the help of family and friends. My deepest regards goes to my mother, to whom I dedicate this work. She always encouraged me and supported me to pursue higher education. She worked very hard in making sure, education is a priority. She always emphasized the importance of education, and encouraged me to pursue the PhD program. She has been a great source of inspiration in my life and education. I am deeply thankful to her for her vii endless love, care, support, wisdom...and basically everything. Your love and faith in me has made all the difference...love you Mom! My final thanks goes to the reason I am here and able to do this work, my creator. I am thankful for the talent and opportunities I have been given and the strength to undertake this task and see it to completion! viii VITA 1996 . .B.S. Electrical Engineering 1998 . .M.S. Electrical Engineering 2007- 2009 . .Graduate Teaching Associate, The Ohio State University. 2010 . .Graduate Technical Intern, Intel Corporation 2010 . .Graduate Technical Research Intern, Intel Labs FIELDS OF STUDY Major Field: Electrical and Computer Engineering ix

Designing Low Power and High Performance Network-On-Chip Communication Architectures for Nanometer Socs

A Survey of Network Performance Monitoring Tools

Development of a Dynamically Extensible Spinnaker Chip Computing Module

An Introduction

An Architecture and Compiler for Scalable On-Chip Communication

Three-Dimensional Integrated Circuit Design: EDA, Design And

On-Chip Interconnect Schemes for Reconfigurable System-On-Chip

Learning to Predict End-To-End Network Performance

NVIDIA CUDA on IBM POWER8: Technical Overview, Software Installation, and Application Development

Designing Vertical Bandwidth Reconfigurable 3D Nocs for Many

A Social Network Matrix for Implicit and Explicit Social Network Plates

Openpower AI CERN V1.Pdf

AXI Reference Guide