Power, Performance and Reliability Optimisation of On-Chip Interconnect by Adroit Use of Dark Silicon

by Haseeb Bokhari

A Thesis Submitted in Accordance with the Requirements for the Degree of Doctor of Philosophy

School of Computer Science and Engineering
The University of New South Wales

September 2015

Copyright © Haseeb Bokhari 2016. All Rights Reserved


ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed ……………………………………………......

Date ……………………………………………...... 9/06/2016

COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed ……………………………………………......

Date ……………………………………………...... 09/06/2016

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed ……………………………………………......

Date ……………………………………………...... 09/06/2016

Thesis Publications

• H. Bokhari, H. Javaid, M. Shafique, J. Henkel, S. Parameswaran, SuperNet: Multimode Interconnect Architecture for Manycore Chips. In Design Automation Conference, DAC, 2015

• H. Bokhari, H. Javaid, M. Shafique, J. Henkel, S. Parameswaran, Malleable NoC: Dark Silicon Inspired Adaptable Network on Chip. In Design, Automation and Test in Europe, DATE, 2015

• H. Bokhari, H. Javaid, M. Shafique, J. Henkel, S. Parameswaran, darkNoC: Designing Energy Efficient Network-on-Chip with Multi-Vt Cells for Dark Silicon. In Design Automation Conference, DAC, 2014

• H. Bokhari, H. Javaid, S. Parameswaran, System-Level Optimization of On-chip Communication Using Express Links for Throughput Constrained MPSoC. In IEEE Symposium on Embedded Systems for Real-Time Multimedia, ESTIMedia, 2013

Other Publications

• H. Javaid, Y. Yachide, S. M. Min, H. Bokhari, S. Parameswaran, FALCON: A Framework for HierarchicAL Computation of Metrics for CompONent-Based Parameterized SoCs. In Design Automation Conference, DAC, 2014

Acknowledgements

My journey towards a PhD has been full of excitement, frustration and gratification. I take this opportunity to acknowledge the people who helped and encouraged me on every step of my journey.

First of all, I thank Almighty Allah for His endless blessings. It was through His kindness that I was able to overcome all the challenges that I faced during my research.

I extend my deepest gratitude to my PhD supervisor, Prof. Sri Parameswaran. I had almost no research experience when I entered the PhD programme, but Prof. Parameswaran believed in me and taught me the basic skills of identifying and solving a research problem. He has always been there to listen to my problems, in either research or life. I will always be grateful to him for his supervision and support.

I thank my friend and research colleague Dr. Haris Javaid for his valuable feedback on my research. I am also very grateful to Dr. Muhammad Shafique for collaborating with me on research ideas and providing useful feedback. I acknowledge the valuable reviews of the research committee consisting of Prof. Gernot Heiser, Dr. Oliver Diesel and Dr. Annie Guo. Their timely feedback on my research was valuable for steering my research in the right direction.

I take this opportunity to acknowledge the funding support I received from the School of Computer Science and Engineering, the UNSW Faculty of Engineering and the UNSW Graduate Research School.

I am very grateful to my colleagues at the Embedded System Lab: Isuru, Dr. Tuo, Darshana, Pasindu, Lawrence, Dr. Liang, Dr. Su, Dr. Jorgen, Dr. Angelo, Dr. Joseph, Babak, Arash and Daniel. Thank you all for sitting through my research talks and giving me invaluable feedback.

I made some great friends when I came to Sydney who made my stay a wonderful experience. Thank you Murtaza and Zafar for hosting me for numerous dinners and lunches. Thank you Malik, Shoaib and Hamid for wonderful weekend meetups.

Last, I thank my family for their undaunted support throughout my life. Thank you mom and dad for encouraging me to pursue my passion and providing me all kinds of support I needed. Thank you my brothers Ammar Hassan Bokhari and Shoaib Bokhari for being there when I needed you the most.

Abstract

Continuous transistor scaling has enabled computer architects to integrate increasing numbers of cores on a chip. As the number of cores on a chip and application complexity have increased, the on-chip communication bandwidth requirement has increased as well. The packet switched Network-on-Chip (NoC) is envisioned as a scalable and cost-effective communication fabric for multicore architectures with tens and hundreds of cores. Extreme transistor scaling (45nm and beyond) has its own share of technical challenges. Traditionally, semiconductor scaling resulted in a reduction in both the area and the power of the transistor at the same time, a scaling law known as Dennard's Scaling. However, for recent technology nodes, the power per transistor is not reducing at the same rate as the area. Failed Dennard's Scaling has resulted in a situation where we have abundant transistors, but not enough power to switch on all of these transistors at the same time, a phenomenon termed Dark Silicon. Previous research on dark silicon identified the energy inefficiency of general purpose computing cores as the cause of dark silicon and hence proposed integrating application specific accelerators or cores to improve energy efficiency and reliability, completely neglecting the interplay of dark silicon and NoC architecture.

For the first time, this thesis proposes various NoC architectures that exploit dark silicon to improve the energy efficiency, performance and reliability of the on-chip interconnect. The first proposal is an on-chip interconnect, named darkNoC, that consists of multiple NoCs where each NoC is optimised at design time using multi-Vt optimisation for different voltage-frequency (VF) levels. This architecture is based on the observation that instead of applying DVFS to a router that has been designed to operate at a higher VF level, using a router that has been designed specifically for a lower VF level is more energy efficient. The architecture operates autonomously for seamless switchover between different NoCs. This architecture can provide up to 52% saving in NoC energy delay product (EDP) for certain benchmarks, whereas

a state-of-the-art DVFS scheme only saved 15% EDP. Then, the Malleable NoC architecture is proposed, which improves the energy efficiency of the NoC by a combination of multiple VF optimised routers and per-node VF selection. We exploit the heterogeneity of the application workload and the application-to-core mapping to compose a communication fabric at runtime by choosing a VF optimised router at each mesh node. In one of the benchmark applications, darkNoC saved 26.4% NoC EDP whereas Malleable NoC saved 46% EDP, showing the efficacy of the Malleable NoC architecture in the presence of workloads with heterogeneous applications.

Next, this thesis proposes the SuperNet NoC architecture, which exchanges dark silicon for optimising the energy, performance and reliability of the on-chip interconnect. SuperNet consists of two parallel NoC planes that are optimised for different VF levels, and can be configured at runtime to operate in an energy efficient mode, a performance mode or a reliability mode. Our evaluation with a diverse set of applications shows that the energy efficient mode can save on average 40% NoC power, whereas the performance mode improves the average core IPC by up to 13% for high MPKI applications. The reliability mode provides safety against soft errors in the data path through the use of strong byte-oriented error correction codes, and in the control path through dual modular redundancy and lock-step processing.

Finally, a design flow for designing custom on-chip communication for application-specific MPSoCs targeting streaming applications is proposed. Streaming applications, such as an H264 encoder, are normally realised using a set of processors logically arranged in a pipeline fashion, and data is passed between the different pipeline processors using on-chip communication. The proposed framework considers both data volume and pipeline latency to insert an optimum number of express communication links between certain cores so that communication latency is reduced and the pipeline throughput meets a certain constraint. To reduce the runtime of the framework, a heuristic with linear time complexity is proposed for exploring the exponential design space, reducing framework runtime by 27× compared to a state-of-the-art heuristic, yet producing optimal or near-optimal solutions.

Contents

Thesis Publications ...... iii
Other Publications ...... iii
Acknowledgements ...... iv
Abstract ...... vi
Table of Contents ...... vii
List of Tables ...... xv
List of Figures ...... xix

1 Introduction 1
1.1 Multicore Architecture ...... 1
1.2 On-Chip Interconnect Architecture ...... 3
1.3 Evolution of On-Chip Interconnect ...... 6
1.3.1 Buses and Crossbars ...... 6
1.3.2 Network-on-Chip ...... 7
1.4 Thesis Overview ...... 9
1.4.1 Semiconductor Scaling and Dark Silicon ...... 9
1.4.2 Thesis Motivation ...... 10
1.4.3 Research Problems ...... 11
1.4.4 Thesis Contributions: ...... 12
1.4.5 Thesis Organization ...... 14

2 Preliminaries and Literature Review 17

2.1 Reducing Power in Digital System ...... 18
2.1.1 Energy Dissipation in CMOS Circuit ...... 18
2.1.2 Power Saving Techniques ...... 20
2.2 Multiprocessor System on Chip (MPSoC) ...... 27
2.3 Bus-Based SoC Architectures ...... 28
2.4 Crossbar On-chip Interconnect ...... 29
2.5 Network-on-Chip (NoC) Interconnect ...... 30
2.5.1 Topology ...... 31
2.5.2 Routing ...... 34
2.5.3 Flow Control ...... 35
2.5.4 Router Microarchitecture ...... 37
2.6 Overview of Recent Academic and Commercial NoCs ...... 39
2.7 NoC Power Optimisation ...... 41
2.8 Dark Silicon Aware Multicore Design ...... 44
2.9 Communication-Aware Mapping ...... 48
2.10 Application-Specific Communication Architecture ...... 50

3 darkNoC 53
3.1 Introduction ...... 53
3.2 Related Work ...... 56
3.3 darkNoC Architecture ...... 57
3.3.1 darkNoC Layers ...... 59
3.3.2 darkNoC Routers Stack ...... 60
3.3.3 darkNoC Layer Switch-Over Mechanism ...... 60
3.4 Experiments and Results ...... 66
3.4.1 NoC Synthesis ...... 66
3.4.2 Experimental Setup ...... 67
3.4.3 Results and Discussion ...... 70
3.4.4 Results for Static VF Selection ...... 73

3.4.5 Results with Synthetic Traffic ...... 73
3.5 Conclusion ...... 77

4 Malleable NoC 79
4.1 Introduction ...... 79
4.1.1 Contributions: ...... 82
4.2 Related Work ...... 83
4.3 Malleable NoC Architecture ...... 84
4.3.1 NoC Architecture: ...... 84
4.3.2 Application Mapping and Profiling ...... 90
4.3.3 VF Selection Algorithm ...... 91
4.4 Experiment and Results ...... 96
4.4.1 NoC Synthesis ...... 96
4.4.2 Simulation Setup ...... 96
4.4.3 Experiment Results and Discussion ...... 98
4.5 Conclusion ...... 101

5 SuperNet 103
5.1 Introduction ...... 103
5.2 Background and Related Work ...... 107
5.3 SuperNet Architecture ...... 110
5.3.1 Architecture ...... 110
5.3.2 Mode Selection ...... 114
5.3.3 Energy Efficient Mode ...... 114
5.3.4 Performance Mode ...... 116
5.3.5 Reliability Mode ...... 117
5.4 Experiment and Results ...... 119
5.4.1 Experiment Setup ...... 119
5.4.2 Experiment Results ...... 122
5.4.3 Discussion ...... 126

5.5 Conclusion ...... 126

6 Communication Optimisation 127
6.1 Introduction ...... 127
6.1.1 Motivational Example ...... 129
6.1.2 Novel Contribution ...... 132
6.2 Related Work ...... 133
6.3 Problem Definition ...... 134
6.4 Framework Overview ...... 137
6.4.1 Express Link Insertion Algorithms ...... 138
6.5 Case Study: Crossbar NoC Based MPSoC ...... 144
6.5.1 Experiment Setup ...... 144
6.5.2 Results and Discussion ...... 147
6.6 Case Study: Mesh NoC Based MPSoC ...... 149
6.6.1 Experiment Setup ...... 149
6.6.2 Results and Discussion ...... 151
6.7 Conclusion ...... 155

7 Conclusions and Future Direction 157
7.1 Thesis Summary ...... 157
7.2 Future Directions ...... 161

Bibliography 163

List of Tables

3.1 NoC architectural details ...... 66
3.2 Synthesis results for a single router ...... 66
3.3 MPSoC architectural details ...... 67
3.4 Details of application mixes with four copies of each application ...... 67
3.5 NoC configurations used in experiments ...... 68

4.1 NoC architectural details ...... 94
4.2 Synthesis results for a single router using Multi-Vt optimisation ...... 94
4.3 MPSoC Configuration ...... 97
4.4 Details of application mixes with 16 copies of each application ...... 97

5.1 Packet Classification ...... 116
5.2 NoC architectural details ...... 120
5.3 Synthesis results for a single router using Multi-Vt optimisation ...... 120
5.4 Synthesis results for Error Detection & Correction Units ...... 120
5.5 MPSoC Configuration ...... 121

6.1 MPSoC throughput (BI: Baseline Interconnect, EL: Express Link) ...... 129
6.2 Benchmarks for Crossbar NoC based MPSoC ...... 144
6.3 Architectural Details for Crossbar NoC based MPSoC ...... 145
6.4 Detailed Experiment Results for Crossbar NoC based MPSoC ...... 146
6.5 Benchmarks for Mesh NoC based MPSoC ...... 149
6.6 Architectural Details for Mesh NoC based MPSoC ...... 149

6.7 Detailed Experiment Results for Mesh NoC based MPSoC ...... 153

List of Figures

1.1 Effect of Semiconductor Scaling on Design ...... 2
1.2 Evolution of On-Chip Interconnect ...... 7
1.3 Power breakdowns of the In-order multicore processor in (a) super-threshold (ST) and (b) near threshold (NT) regimes, and the OoO multicore processor in (c) ST and (d) NT regimes under 7nm FinFET technology [1] ...... 11

2.1 CMOS Gate ...... 18
2.2 Clock Gating ...... 21
2.3 Power Gating ...... 22
2.4 SoC with Multiple Voltage Supply Rails ...... 23
2.5 Fast DVFS using On-Chip Voltage Regulators [2] ...... 24
2.6 An example of multi-Vt circuit optimisation ...... 25
2.7 Well known direct topologies ...... 32
2.8 2-ary 3-fly Butterfly Topology ...... 33
2.9 3 Tier Fat Tree topology ...... 33
2.10 Wormhole Router Architecture ...... 37

3.1 NoC Power for Transpose Traffic for (a) (left) [500 MHz, 0.81V] VF level, (b) (right) [250 MHz, 0.72V] VF level ...... 54
3.2 darkNoC architecture: (a) (left) NoC divided into VF regions, (b) (middle) a single VF region, (c) (right) a stack of (two) VF optimized routers ...... 58

3.3 An example darkNoC architecture with four network layers ...... 59

3.4 Timing diagram for switch-over from router Ra to Rt ...... 62
3.5 Time overhead of switch-over mechanism for NoC running @ 500MHz ...... 63
3.6 Energy overhead of switch-over from [500MHz, 0.81V] to [750MHz, 0.81V] layer ...... 64
3.7 NoC Energy-Delay Product (EDP) normalised w.r.t traditional NoC operating at highest VF level ...... 70
3.8 Distribution of NoC VF cycles ...... 71
3.9 Results with Static VF Scaling. Results are normalised w.r.t Baseline NoC ...... 74
3.10 Results for UR Traffic: (a) (top) Avg. Packet Latency and Network Throughput, (b) (bottom) Network power results for different VF levels ...... 75
3.11 Network Power for TR traffic for different VF levels ...... 76
3.12 Network Power for BC traffic for different VF levels ...... 76

4.1 L1 MPKI for 2 different applications ...... 80
4.2 Application Mapping for 4×4 Mesh NoC (MC: Memory Controller) ...... 81
4.3 Malleable NoC ...... 86
4.4 Example of Run-time Adaptation of Malleable NoC ...... 93
4.5 Malleable NoC Configurations ...... 95
4.6 Efficacy of Malleable NoC ...... 99
4.7 Analysis of different Configurations of Malleable NoC ...... 100
4.8 Effect of Mapping on Malleable NoC ...... 101

5.1 Hardware Design Cost Trend (Source: Semico Research) ...... 104
5.2 Overview of MPSoC with SuperNet Interconnect ...... 109
5.3 Details of SuperNet Components ...... 112
5.4 MPKI for h264enc and mpeg2enc ...... 114
5.5 Results for Performance Mode ...... 122
5.6 Results for Energy Efficient Mode with Different Thresholds ...... 123

5.7 Results for Reliability Mode ...... 124

6.1 Two approaches for custom on-chip interconnect design ...... 128
6.2 An example KPN and the target MPSoC ...... 130
6.3 Framework overview ...... 139
6.4 Crossbar NoC based MPSoC for Experiments ...... 145
6.5 Summary of Results for Crossbar NoC based MPSoC ...... 148
6.6 Mesh NoC based MPSoC ...... 150
6.7 Experiment Tool Flow for Mesh NoC based MPSoC ...... 151
6.8 Interconnect Area for Mesh NoC + Express Links using xLink (normalized w.r.t Iterative) ...... 152
6.9 Framework Runtime Speedup for NoC based MPSoC using xLink (normalized w.r.t Iterative) ...... 152

Chapter 1

Introduction

1.1 Multicore Architecture

The semiconductor industry has gone through radical reform in the last six decades. The first working transistor was demonstrated at Bell Labs in 1947. This was such an important invention that Shockley and his colleagues were awarded a Nobel Prize for their contributions. From there on, there has been an uninterrupted effort to reduce the size of a transistor and make it cheaper in terms of both power and cost. In 1965, Moore [3] predicted that the number of transistors available on a chip will double every 18 to 24 months, a statement commonly known as Moore's Law. Invariably, Moore's Law has been the main driving force behind dense transistor integration. The quest to harvest maximum performance from silicon, together with advancements in fabrication techniques, has enabled us to produce chips such as the Intel Xeon Haswell-E5 that contains 5.5 billion transistors. This scientific development has revolutionised the whole computing paradigm.

Traditionally, single core processors have been used in most of the computing systems around us (Fig. 1.1). To fulfil the ever-increasing computation power requirements, computer architects were inclined towards increasing the operating frequency

of single core chips or introducing more complex single core designs. Reduced transistor delay and increasing transistor density enabled computer architects to do so without any radical changes in core architecture. However, about a decade ago, researchers observed two major issues with this trend. First, a large number of applications do not exhibit any significant instruction level parallelism (ILP), and thread/task level parallelism is a more energy and silicon effective solution for improving computing throughput. Second, as we approached 45nm and smaller node sizes, the CMOS transistor exhibited poor energy scaling and therefore the thermal density of chips started increasing. With a limited thermal budget, operating the chip at higher frequency can induce permanent damage. It was not very hard to realise that we were approaching a practical limit on the performance of a single core processor, and the only way forward was to shift our attention from architectures containing a complex high-frequency single core to architectures with multiple leaner cores running at lower frequency.

Multicore architectures are becoming the ubiquitous hardware architecture for almost every computing machine around us. From mobile phones to cloud computing infrastructures, multicore architectures are playing a pivotal role in technological progress. High performance multicore architectures have enabled humans to explore applications which were not realisable without the availability of such computing power. For example, gaming consoles nowadays can render life-like images at high FPS (frames per second) thanks to ever-improving multicore GPU (Graphical Processing Unit) technologies. Similarly, we are getting closer to simulating atomic forces in proteins using application-specific multicores [4]. It is predicted that the number of cores per chip will continue to climb, following Moore's Law. Researchers have predicted a 7.2× increase in computing power between now and 2024 [5]. This suggests that multicore architectures will remain an area of interest for researchers in years to come.

Figure 1.1: Effect of Semiconductor Scaling on Microprocessor Design. (The figure plots 40 years of microprocessor trend data, 1970–2020: transistor count in thousands, single-thread performance in SpecINT ×10³, frequency in MHz, typical power in watts, and number of logical cores. Original data up to 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; data for 2010–2015 collected by K. Rupp.)

1.2 On-Chip Interconnect Architecture

Every multicore chip has two major on-chip components: the processing elements (the core) and the communication and memory architecture (the uncore). Although high transistor density enables computer architects to integrate tens to hundreds of cores in a chip, the main challenge is to enable efficient communication between such a large number of on-chip components. The on-chip communication architecture is responsible for all memory transactions and I/O traffic, and provides a dependable medium for inter-processor data sharing. The performance of on-chip communication plays a pivotal role in the overall performance of the multicore architecture. The advantage of having multiple high performance on-chip processors can easily be overturned by an underperforming on-chip communication medium. Hence, providing scalable and high-performance on-chip communication is a key research area for multicore architecture designers [6]. The main challenges faced by interconnect designers are:

• Scalable communication for tens of cores: It is fair to state that the performance of processing elements in multicore chips can be communication constrained [6]. Due to the ever increasing improvement in processing capabilities, it

is quite possible to have a wide gap between the data communication rate and the data consumption rate. With tens of on-chip components, it is not possible to have single cycle communication latency between components placed at the far ends of a chip. Furthermore, with a large number of on-chip components, the on-chip interconnect is expected to support multiple parallel communication streams.

• Limited power budget: In 1974, Dennard predicted that the power density of the transistor will remain constant as we move to lower node sizes. This is known as Dennard's Scaling Law [7]. However, in the last decade or so, researchers have observed that the transistor's power cannot be reduced at the same rate as its area. Therefore, we are facing a situation where we have an abundance of on-chip transistors but do not have enough power to switch all these transistors at the same time, due to power and thermal constraints. Increasing the power efficiency of all on-chip components has therefore become the main prerequisite to continue Moore's scaling. The on-chip communication architecture can consume roughly 19% of total chip power in a modern multicore chip [8]. Therefore, it is a challenging task to design a power-efficient on-chip interconnect that can still satisfy the latency and bandwidth requirements of current and future applications.

• Heterogeneous applications: A modern multicore chip is expected to execute a large set of diverse applications. Each application can interact with the computing architecture in a unique way, hence the communication latency and bandwidth requirements can vary across different applications [9]. For example, an application with a large memory footprint is expected to regularly generate cache misses and can hence be classified as a communication bound application. The performance of such applications is highly correlated with the efficiency of the interconnect. On the other hand, an application with a smaller memory footprint is expected to be processor-bound and agnostic to on-chip

interconnect properties. On-chip interconnects are therefore often designed for worst-case scenarios (memory-bound applications in this case) and can thus be inefficient for processor-bound applications. The situation is aggravated when both memory- and processor-bound applications are executed at the same time.

• Selecting interconnect performance metrics: A major shortcoming in previous research is classifying on-chip interconnect performance in terms of application-agnostic metrics such as transaction latency and memory bandwidth, instead of application-level performance metrics such as execution time and throughput [10] [11]. Therefore, a major challenge is extracting the correct metric to evaluate different possible interconnect architecture design points for a given set of applications.

• Chip design cost: The cost of designing a multicore chip has been increasing alarmingly due to the high NRE cost¹ associated with small node sizes. A major portion of the total chip cost is reserved for design verification and testing. Therefore, designers are expected to reuse previously designed and verified on-chip interconnects for new chip designs to reduce cost. With a multitude of interconnect architectures available, it is important to incorporate time-efficient design space exploration tools in the research phase to select the most suitable interconnect for a given set of target applications.

• Interconnect Reliability: With reducing node size, the concerns about the reliability of digital circuits are on the rise. Any unexpected change in operating conditions, such as a supply voltage fluctuation, a temperature spike or a random alpha particle collision, can cause erratic behaviour in the output of a circuit. A soft error in the on-chip interconnect can result in erroneous application output, or system deadlock if the control data is corrupted. Multicores

¹NRE Cost: Non-Recurring Engineering Cost. The term is used to classify the cost associated with researching, prototyping and testing a new product.

are finding their way into reliability-critical applications such as autonomous cars and medical equipment. Therefore, designers are expected to integrate varying levels of reliability features into the on-chip interconnect under given power and area constraints.

• Co-Design of memory hierarchy and on-chip interconnect: In modern multicore architectures, the on-chip memory hierarchy is closely coupled with the on-chip interconnect architecture. In fact, for shared memory architectures, on-chip communication is the major factor in deciding the performance of the memory hierarchy (caches, DRAM controllers, etc.). Therefore, interconnect designers are often faced with the challenge of exploring the combined design space of the memory hierarchy and the on-chip interconnect.

1.3 Evolution of On-Chip Interconnect

1.3.1 Buses and Crossbars

Traditionally, System-on-Chip (SoC) designs used a very simple on-chip interconnect such as ad-hoc point-to-point connections or buses. The bus-based architecture is perhaps the oldest on-chip interconnect type in the computer industry and is still used in many SoC applications [12]. For a small number of on-chip components, a bus interconnect is easier to integrate due to its simple protocol design, and is efficient in terms of both power and silicon cost. However, poor global wire scaling prohibited building bus-based SoCs with a large number of components due to increased delays. As the length of the wires in a bus interconnect increased, the load capacitance also increased, and hence large wire buffers were required to drive the buses. This increased the on-chip interconnect's power budget. Furthermore, buses also showed performance scalability issues: as the number of on-chip components increased, the high traffic injection rates saturated the limited bus bandwidth, creating a performance bottleneck.

Figure 1.2: Evolution of On-Chip Interconnect

Crossbars were proposed as a solution to the limited bandwidth problem of buses, as they provide multiple communication paths. However, as the number of on-chip components increased, crossbars also showed poor scalability in terms of area and power. Furthermore, designing high radix crossbars increased the wire layout complexity and hence the overall chip design complexity [13].

1.3.2 Network-on-Chip

The packet switched Network-on-Chip (NoC) [14] interconnect was proposed as a solution for the scalability issues of bus and crossbar-based on-chip interconnects. Essentially, a NoC consists of routers spread across the chip (Figure 1.2). These routers are connected with point-to-point links. Data is exchanged between different SoC components by generating data packets according to a predefined format. The data packets travel across multiple routers depending on the physical locations of the source-destination pairs and the routing algorithm. The key advantages of using a NoC as the on-chip interconnect are:

• NoCs inherently support multiple communication paths through a combination of physically distributed routers and links, which greatly increases the available on-chip data bandwidth. This enables different cores to exchange data in parallel without any central arbitration. This makes the NoC an ideal candidate interconnect for supporting the increasing communication needs of multicore chips with tens and hundreds of cores. Multiple communication paths between given source and destination cores also give the NoC inherent fault tolerance. In case of a permanent error in a router or link on a given path, data can be rerouted through an alternate path between the source and destination cores.

• NoC architectures use short electrical wires that have highly predictable electrical properties. Compared to a bus interconnect, smaller drive transistors are required to switch the short wires between routers. This helps to improve the energy/bit metric of the interconnect. Moreover, due to shorter wire delays, NoCs can be switched at higher frequencies than buses and crossbars without any significant increase in power. Deep sub-micron semiconductor manufacturing introduces integrity issues in wires. Having shorter wires reduces the probability of manufacturing faults and hence improves the production yield. The predictable electrical properties of short wires also help in reducing design verification cost.

• NoCs follow a modular design paradigm by allowing reuse of existing hardware intellectual property blocks (IPs). For most designs, NoCs can be easily scaled for different numbers of cores and applications by simply instantiating multiple copies of existing designed and verified router IPs. This reduces the overall complexity of the chip design process.

• NoCs provide a clear boundary between computation and communication. Therefore, all on-chip components, irrespective of their types (memory controllers, processing cores, hardware IPs, etc.), can communicate with each other using a standard packet format without caring about the underlying

implementation of on-chip components. This is a useful feature for designing heterogeneous SoCs with hardware components sourced from different internal or external sources.

1.4 Thesis Overview

1.4.1 Semiconductor Scaling and Dark Silicon

With transistor scaling, the supply voltage is reduced to proportionally reduce the power, and the transistor threshold voltage is also reduced to maintain the transistor switching frequency [6]. However, reducing the threshold voltage increases the subthreshold current ($I_{subthreshold}$), which has a large share in the transistor leakage current. Furthermore, with the reduction in transistor size, the thickness of the gate oxide layer is also reduced. With a thinner oxide layer, the number of electrons crossing from gate to substrate increases and hence $I_{gate}$ increases. Overall, the effects of transistor scaling can be summarised as:

• The contribution of leakage power towards total chip power increases with transistor scaling [15]. To hold leakage power in check, the threshold voltage cannot be reduced aggressively, which in turn limits the switching frequency of the transistor.

• The effect of process variation has increased with transistor scaling [16, 17]. This prevents the supply voltage from being reduced beyond a certain limit without affecting the reliability of the transistors. This is a major cause of the failure of Dennard's Scaling.

The net effect of the combination of continuous transistor scaling and poor power-per-transistor scaling is that, for lower node sizes, we have more than enough transistors available on a chip but do not have enough power to switch on these transistors at the same time. Esmaeilzadeh [18] names this phenomenon Dark Silicon and defines it as:

"the fraction of chip that needs to be powered off at all times due to power constraints."

Henkel et al. [19] further expanded this definition in terms of power and thermal budgets. They argue that dark silicon results from either a limited power budget (e.g., mobile phones) or a limited thermal budget (e.g., cloud computing infrastructure). Irrespective of the criteria used to define dark silicon, it is currently a hot topic for research in the architecture and EDA communities [5, 19, 20]. It is anticipated that innovations at the circuit, architecture and software levels are required to mitigate the effect of dark silicon and rejuvenate Dennard's Scaling [21].

1.4.2 Thesis Motivation

Researchers believe that a key cause of dark silicon is the energy inefficiency of general purpose processing cores [18]. Most prior research in dark silicon has been dedicated to improving the energy efficiency of processing elements through device and architecture level heterogeneity [19]. Design flows introduced recently exploit the available dark silicon to integrate high performance, low energy, application-specific accelerators and cores into general purpose multicores [22–30].

NoCs can potentially consume a large share of total chip power [6]. For example, the NoC consumes 19% of power in the experimental 36-core SCORPIO chip [8] and 28% of total power in Intel's TeraFlop chip [31]. The share of NoC power will remain high for dark silicon chips, as a number of processing cores remain switched off yet the whole NoC needs to be functional to enable communication between the working cores. In fact, as shown in Figure 1.3, Bejestan et al. [1] estimate that NoC routers will consume 18 to 24% of power in different configurations of dark silicon multicore chips fabricated in 7nm technology.

Figure 1.3: Power breakdowns of the In-order multicore processor in (a) super-threshold (ST) and (b) near threshold (NT) regimes, and the OoO multicore processor in (c) ST and (d) NT regimes under 7nm FinFET technology [1]. (The pie charts divide chip power between the processor, L1$, L2$ and the router.)

Following our literature review, we conclude that prior research on dark silicon does not investigate the interplay of dark silicon and NoC. Therefore, the main motivation for this thesis is to study how dark silicon can be used to improve the energy efficiency and other NoC design metrics, such as reliability and performance. Furthermore, in the competitive financial market, the product design cost and time-to-market are the key factors that decide the success or failure of a product. Therefore, the NoC design flow should be time efficient and the resultant architectures should be robust enough to match different application requirements.

1.4.3 Research Problems

Failing Dennard's Scaling and continuous transistor scaling have changed the designer's perspective on the priorities of chip design. Maximising chip performance under given power and area constraints has always been the primary objective. For

45nm and higher node sizes, after performance optimisation, effective use of area was the primary design objective, whereas reducing power was a secondary design objective. However, smaller nodes are more adversely affected by the failure of Dennard's Scaling. Therefore, power has become the primary optimisation goal, whereas area is considered a cheap commodity (dark silicon). Furthermore, with increasing chip design costs, designers are interested in consolidating as many features as possible in a single chip to cut down on NRE costs. Given these varying optimisation goals and the commonplace use of NoCs in modern multicores, the following questions are investigated in this thesis:

1. Is it possible to use available dark silicon to improve the energy efficiency of existing NoC architectures?

2. Can we design a NoC that can be configured at runtime to target different optimisation goals such as power, performance and reliability?

3. What kind of cost-effective design flow is required to design a new NoC for a given set of applications by reusing existing NoC IP?

1.4.4 Thesis Contributions:

This thesis makes the following contributions:

• Energy Efficient NoC Architectures: Dynamic Voltage and Frequency Scaling (DVFS) is a well known technique for reducing the power consumption of NoCs in modern multicore architectures. During our experiments with logic synthesis of the NoC router architecture, we observed that the efficiency of the DVFS technique can be improved by instantiating at a given node multiple routers, each designed specifically for a particular voltage-frequency (VF) level using the multi-Vt circuit optimisation technique. Based on this observation, we introduce a novel NoC

architecture which consists of multiple architecturally homogeneous yet VF optimised NoC layers. We present the architectural features that are necessary for fast and seamless inter-layer switching. Furthermore, we present a scalability study of our proposed architecture. We present the procedures that must be followed for seamless handover of NoC traffic from one NoC layer to another while ensuring the lossless property of on-chip networks.

During our experiments, we observed that NoC-based multicores are expected to execute workloads with heterogeneous applications, and every application's execution time may or may not be dependent on NoC latency. We also observed that most NoC-based multicores are built on the principle of non-uniform memory access (NUMA), which essentially means that the application-to-core mapping is vital in studying the effect of NoC latency on application performance. Based on these observations, we present a runtime configurable NoC architecture that consists of routers designed for different VF levels through multi-Vt optimisation. Depending on runtime application cache profiling and the application-to-core mapping, a controller selectively connects different VF optimized NoC routers. Based on the observation that the NoC has to compete with other computing components, such as accelerators, for dark silicon, we propose an architecture where only a subset of nodes have replicated routers for lower VF. We present an analysis that shows how the amount of dark silicon affects the energy efficiency of our proposed architecture.

• Multimode NoC Architecture: To tackle growing chip design costs and to cater to the need for SoCs that can be used in multiple usage scenarios, we propose a multimode NoC architecture. Our proposed NoC architecture consists of multiple VF optimised NoC layers that can be configured at runtime to run in one of three modes: energy efficient mode, reliability mode and performance mode. In energy efficient mode, the NoC switches between NoC layers designed for low and high VF, depending on the application's communication

needs. In performance mode, multiple NoC layers are powered on at the same time and data is injected into one of the NoCs depending on how critical the data latency is for the application's performance. In reliability mode, multiple NoCs operate in lock step to ensure control path protection, whereas strong error detection and correction codes are transmitted on a secondary NoC for data path protection. We discuss the power and performance tradeoffs of the different modes and, based on experimental observation, we provide general recommendations for which mode is suited to certain applications.

• Application-Specific NoC Design: We propose a design flow for designing application-specific on-chip interconnect for streaming applications. It is becoming common to design specific multicores for performance-constrained applications such as an H264 encoder. The proposed scheme modifies a given baseline on-chip interconnect such that a certain application throughput constraint can be met. Low latency express links are added between communicating cores so that a certain fraction of data is communicated through these links rather than the baseline interconnect. A key idea of this scheme is choosing application-level metrics (the streaming application's throughput) rather than network level metrics for evaluating different design space points.

1.4.5 Thesis Organization

Chapter 2 of this thesis introduces the preliminaries of multicore design in general, and on-chip interconnect design in particular. This chapter reviews the previous research on NoC power, performance and reliability optimisation. Some of the circuit design techniques used in this thesis are also discussed briefly.

Chapter 3 describes in detail the darkNoC architecture that uses abundantly available transistors to optimize the energy efficiency of the DVFS scheme for NoCs.

Chapter 4 describes the Malleable NoC architecture that exploits application heterogeneity for connecting heterogeneous VF optimised routers at runtime.

Chapter 5 describes the multimode NoC architecture called SuperNet that can be configured at runtime for three possible goals: energy efficiency, performance or reliability.

Chapter 6 covers the framework for the design of on-chip interconnect for application specific MPSoCs.

Chapter 7 concludes the thesis and discusses future research directions.

Chapter 2

Preliminaries and Literature Review

There is a plethora of prior research on optimising Network-on-Chip design metrics such as power, performance and reliability. The purpose of this chapter is to introduce the reader to the concepts that are necessary to understand the various design techniques used in this thesis, and to provide a philosophical overview of how the techniques proposed in this thesis differ from prior research.

The chapter starts with an introduction to power consumption in CMOS digital circuits. We then discuss circuit and architecture level techniques used in digital systems to cut down dynamic and leakage power. This is followed by a study of how transistor scaling is changing different aspects of power aware design.

We then shift our focus to on-chip communication architecture design and introduce the reader to some essential concepts of Network-on-Chip design. This is followed by a discussion of the power saving techniques commonly used for NoCs and the drawbacks and limitations of these techniques.

We then concentrate on performance optimisation through intelligent mapping of applications on multicores. We conclude the chapter with a discussion of various application specific on-chip interconnect design methods.


2.1 Reducing Power in Digital System

Interest in reducing the power of digital circuits is motivated by several factors. First, a large portion of the digital chips manufactured today are used in battery powered devices such as mobile phones and tablets. These devices can store only a very limited amount of energy in batteries. Therefore, one of the basic selling points of these devices is the expected battery life. As the digital chips used in these devices claim most of the energy [32], power optimisation at various stages of chip design is common practice. Second, even for devices such as personal computers and server farms that are connected to power grids, the thermal management of the computing infrastructure is important due to the rising cost of ownership. The heat dissipation of a digital system is directly related to its electric power consumption. Therefore, the cost of cooling infrastructure can be reduced considerably by efficiently managing the chip power. Moreover, a strong motivating reason to rethink the power consumption of digital systems is failing Dennard's Scaling [5, 7]. For 45nm and smaller nodes, transistor power is not reducing in proportion to the reducing area, so the overall power density of digital chips is increasing. It is therefore anticipated that failing Dennard's Scaling will confound Moore's Law [18].

2.1.1 Energy Dissipation in CMOS Circuit

Figure 2.1: CMOS Gate

Figure 2.1 shows the structure of a generic CMOS logic gate. Each CMOS gate contains a network of PMOS transistors (the pull-up network) and a network of NMOS transistors (the pull-down network). Depending on the boolean function implemented by the gate and the inputs, either the PMOS or the NMOS network conducts current, and correspondingly the output of the logic gate is either '0' or '1'. The output is normally connected as an input to another set of CMOS gates, depending on the architecture of the digital circuit. The power consumption of a CMOS circuit is defined as:

$$P_{CMOS} = P_{dyn} + P_{leakage}$$

where $P_{dyn}$ is the power consumed whenever the output of the logic gate changes and $P_{leakage}$ is the power consumed irrespective of output switching. The dynamic power can be further defined as:

$$P_{dyn} = C_{eff} \cdot V_{DD}^2 \cdot f \cdot \alpha$$

In the equation above, $V_{DD}$ is the supply voltage, $f$ is the operating frequency and $\alpha$ is the average number of output transitions. The term $C_{eff}$ is the total effective capacitance of the CMOS gate and is calculated by:

$$C_{eff} = C_{gate} + C_{interconnect} + C_{input}$$

In the equation above, $C_{gate}$ is the sum of all internal gate capacitances, $C_{interconnect}$ is the capacitance of the interconnect wires between the outputs of the transistors and the inputs of the next stage, and $C_{input}$ is the input capacitance of all the gates connected to the output of this gate. Essentially, dynamic power is consumed whenever the capacitance $C_{eff}$ is charged. The leakage power of the CMOS gate is defined as:

$$P_{leakage} = V_{DD} \cdot I_{leakage}$$

$$I_{leakage} = I_{reverse} + I_{subthreshold} + I_{gate}$$

In the equations above, $I_{reverse}$ is the reverse current induced due to the PN junction characteristic of transistors, $I_{subthreshold}$ is the current due to carrier diffusion between the source and drain regions of a transistor, and $I_{gate}$ represents the current flowing from the gate oxide to the substrate and vice versa [33].
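To make these relations concrete, the following sketch plugs values into the equations above; every number is hypothetical, chosen only to show the arithmetic, and is not taken from the thesis or any datasheet:

```python
# Illustrative evaluation of the CMOS power equations above.
# Every parameter value below is hypothetical.

C_eff = 2e-15       # C_gate + C_interconnect + C_input (farads)
V_DD = 0.81         # supply voltage (volts)
f = 500e6           # operating frequency (hertz)
alpha = 0.15        # average number of output transitions per cycle

I_leakage = 10e-9   # I_reverse + I_subthreshold + I_gate (amperes)

P_dyn = C_eff * V_DD ** 2 * f * alpha    # dynamic power
P_leakage = V_DD * I_leakage             # leakage power
P_total = P_dyn + P_leakage

print(f"P_dyn = {P_dyn * 1e6:.3f} uW, P_leakage = {P_leakage * 1e9:.1f} nW")
# The quadratic V_DD dependence: halving the supply cuts P_dyn to a quarter.
print(f"P_dyn at V_DD/2 = {C_eff * (V_DD / 2) ** 2 * f * alpha * 1e6:.3f} uW")
```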

2.1.2 Power Saving Techniques

Having discussed the sources of power consumption in CMOS circuits, we will now provide an overview of classical and emerging power saving techniques that are relevant to the research problems explored in this thesis.

2.1.2.1 Gate Sizing

A source of dynamic power consumption is the charging and discharging of the internal gate capacitance. This capacitance can be reduced by decreasing the gate size. However, reducing the gate size negatively affects its ability to quickly charge or discharge the load capacitances, and hence the maximum operating frequency. Modern circuit synthesis tools therefore exploit the positive slack of different circuit paths to size the gates in ways that reduce the overall dynamic power of the circuit while abiding by the latency requirement [34–36].

2.1.2.2 Clock Gating

A basic building block of a synchronous digital circuit is the clock. A set of flip-flops is inserted in the digital circuit to keep it synchronous. These flip-flops are connected to a high fanout clock network. Together, the flip-flops and the clock network can consume 30-40% of the total dynamic power budget [37]. This is because a) the clock network and the flip-flops have a very high switching activity factor, which can be close to 1 for a standard circuit, and b) a high fanout clock network normally has a very high interconnect capacitance. Clock gating [37, 38] reduces the dynamic power consumption by reducing the activity factor. Essentially, in clock gating, the clock for a given set of flip-flops and clock network is disconnected depending on the usage scenario. Figure 2.2 shows an example circuit which uses the clock gating technique. The Gating Logic unit decides if this circuit is active or inactive. The output of the Gating Logic is ANDed with the original clock signal, and the output of this AND operation is fed in as the clock signal for the sequential elements (flip-flops in this case). Disabling the clock reduces the switching activity of the two flip-flops shown in Figure 2.2 and reduces the load on the global clock network. Furthermore, as the output of the flip-flops is constant, there is no switching activity in the combinational logic, and therefore the dynamic power of the combinational logic gates is also reduced.
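As a back-of-the-envelope illustration of why gating pays off (all numbers invented for this sketch), the saving follows directly from the activity factor α in the dynamic power equation of Section 2.1.1:

```python
# Rough clock-gating saving estimate, using P_dyn = C_eff * V_DD^2 * f * alpha.
# All numbers are invented for illustration.

def clock_power(c_eff, v_dd, f, alpha):
    """Dynamic power of the clock network for a given activity factor."""
    return c_eff * v_dd ** 2 * f * alpha

# Ungated: the clock network toggles every cycle, so alpha is close to 1.
ungated = clock_power(50e-12, 0.81, 500e6, 1.0)
# Gated: if the block is idle (clock disconnected) 70% of the time,
# the effective activity factor drops to 0.3.
gated = clock_power(50e-12, 0.81, 500e6, 0.3)

print(f"clock power: {ungated * 1e3:.2f} mW -> {gated * 1e3:.2f} mW "
      f"({100 * (1 - gated / ungated):.0f}% saved)")
```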

2.1.2.3 Power Gating

Power Gating is the most commonly used technique to reduce the leakage power of a CMOS digital circuit. With increasing leakage power in modern multicore chips, aggressive Power Gating techniques are being employed [6]. In this technique, the power supply to circuit blocks that are idle or not required is removed. This essentially cuts down both dynamic and leakage power, as almost zero current flows through the circuit block that is power gated [39]. The circuits in Figure 2.3 demonstrate two possible methods of applying the Power Gating technique. In the circuit on the left, an NMOS sleep transistor is added between the CMOS logic block and ground (VSS). When the control signal is at logic one, the sleep transistor is conducting and the logic block can perform normal operation. However, when the control signal is zero, the logic block is disconnected from ground.

Figure 2.2: Clock Gating

Figure 2.3: Power Gating

Similarly, for the circuit on the right in Figure 2.3, a PMOS sleep transistor is added between the CMOS power supply (VDD) and the logic block. Here, a logic one on the input of the PMOS transistor power gates the circuit, whereas a logic zero enables normal operation. The challenges in applying the Power Gating technique are:

• It takes some time to bring the sleep transistor back to normal operation. Therefore, Power Gating can impose a performance penalty if not done intelligently [40].

• Turning on and off certain parts of a chip can induce noise in the global voltage supply [41]. Mitigating supply noise requires considerable effort during the chip design phase.

• Some energy is required to switch the sleep transistors on and off. If power gating is performed frequently, this overhead can be more than the actual power saving. Therefore, when implementing a power gating technique, it is important to determine the break-even cycles: the minimum number of cycles for which the circuit should remain switched off to cover the overhead of power gating, as sketched below.
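To illustrate the break-even calculation named in the last bullet, here is a minimal sketch with hypothetical energy and power figures:

```python
# Hypothetical break-even computation for power gating.
import math

def break_even_cycles(e_overhead: float, p_saved: float, f_clk: float) -> int:
    """Minimum number of gated cycles needed to recoup the gating overhead.

    e_overhead: energy to drive the sleep transistor off and back on (joules)
    p_saved:    power saved while the block is gated (watts)
    f_clk:      clock frequency (hertz)
    """
    e_saved_per_cycle = p_saved / f_clk      # joules saved per gated cycle
    return math.ceil(e_overhead / e_saved_per_cycle)

# Example: 2 nJ switching overhead, 5 mW saved while gated, 500 MHz clock.
print(break_even_cycles(2e-9, 5e-3, 500e6))  # -> 200 cycles
```

Gating an idle block for fewer cycles than this costs more energy than it saves.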

2.1.2.4 Multiple Voltage Domains

The relation between the switching delay of a transistor and the supply voltage can be defined by: 2.1. REDUCING POWER IN DIGITAL SYSTEM 23

$$t \propto \frac{V_{DD}}{(V_{DD} - V_t)^2}$$

where $V_{DD}$ is the supply voltage and $V_t$ is the transistor threshold voltage. According to this equation, reducing the supply voltage increases the transistor delay [42]. It has been observed that SoC designs often contain elements that operate at different frequencies [43], depending on the application and microarchitecture. One such heterogeneous SoC architecture is shown in Figure 2.4, which contains a CPU, an SRAM memory and an I/O device. The CPU in this example has the most stringent operating frequency requirement and therefore must be powered by the higher supply voltage. However, using a high $V_{DD}$ for the other SoC components, which are required to operate at a lower frequency, would result in power wastage. Therefore, the SRAM and I/O in this example can be powered by a separate, lower $V_{DD}$ power supply. The efficacy of such an architecture has been shown in previous studies [44] [45] [46] [47]. The only concern about the multiple voltage supply technique is the design and verification cost of the extra voltage rails required for this scheme. Furthermore, special voltage level shifters are required for logic signals crossing from one voltage domain to another.
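Reading the delay relation numerically (a sketch; the supply and threshold voltages below are invented for illustration):

```python
# Relative transistor delay versus supply voltage, using t ∝ V_DD / (V_DD - V_t)^2.
# The voltages below are illustrative only.

def relative_delay(v_dd: float, v_t: float = 0.3) -> float:
    return v_dd / (v_dd - v_t) ** 2

d_high = relative_delay(0.9)   # nominal supply
d_low = relative_delay(0.6)    # scaled-down supply

# Dropping V_DD from 0.9 V to 0.6 V increases delay by about 2.7x here,
# while dynamic power falls by (0.6 / 0.9)^2, i.e. to roughly 44%.
print(f"delay ratio: {d_low / d_high:.2f}x")
```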

Figure 2.4: SoC with Multiple Voltage Supply Rails

2.1.2.5 Dynamic Voltage and Frequency Scaling (DVFS)

The dynamic power of a transistor can be reduced quadratically by reducing the supply voltage. However, reducing the voltage also increases the transistor delay. This energy-delay characteristic of the CMOS transistor can be used to strike a balance between power and performance. However, selecting a certain static voltage-frequency

(VF) operating point at design time can be non-optimal for systems that are expected to execute heterogeneous application workloads [48]. Therefore, a more robust technique is to monitor the workload characteristics and dynamically select the operating frequency depending on the available execution slack. Once a certain frequency is selected, an appropriate voltage can be chosen: one that is required to safely operate the digital circuit at that particular frequency [49]. It is difficult to predict the minimum supply voltage required to support a certain frequency [50]. Therefore, microprocessor manufacturers, such as Intel, characterise the voltage and frequency response of the chips after fabrication and store this information as look-up tables [51]. This information can be used at runtime by the operating system or firmware to implement DVFS schemes.
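As a toy illustration of such a table-driven scheme, the sketch below picks the lowest VF level that covers the observed load; the load model and thresholds are invented, and the frequency-voltage pairs merely echo levels mentioned elsewhere in this thesis:

```python
# Minimal sketch of a table-driven DVFS policy.
# The table entries and the load model are illustrative assumptions.

VF_TABLE = [            # (frequency in MHz, voltage in V), characterised per chip
    (250, 0.72),
    (500, 0.81),
    (750, 0.90),
]

def select_vf_level(utilisation: float) -> tuple:
    """Pick the lowest-power VF level whose frequency covers the observed load.

    `utilisation` is the fraction of cycles the unit was busy at the highest
    frequency during the last monitoring interval.
    """
    f_max = VF_TABLE[-1][0]
    required = utilisation * f_max       # MHz needed to absorb the load
    for freq, volt in VF_TABLE:
        if freq >= required:
            return freq, volt            # first level that covers the load
    return VF_TABLE[-1]

print(select_vf_level(0.4))   # -> (500, 0.81)
```

An operating system or firmware loop would periodically re-evaluate the utilisation and reprogram the voltage regulator accordingly.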

Figure 2.5: Fast DVFS using On-Chip Voltage Regulators [2]

Changing the VF level at runtime can incur a performance loss depending on the switching speed of the voltage regulator controller (VRC). VRCs normally use either off-chip or on-chip Switching Voltage Regulators (SVRs) to dynamically change the supply voltage [52]. Off-chip SVRs can perform voltage scaling with high efficiency. However, they suffer from high latency, which can often be in the range of hundreds of microseconds to a few milliseconds [51]. On-chip SVRs, on the other hand, can switch between voltage levels within a few nanoseconds [2], enabling fast fine grain DVFS. However, on-chip SVRs can suffer a considerable efficiency loss of 15 to 20% [53] [2]. To improve the conversion efficiency of SVRs, researchers propose integrating on-chip low-drop-out (LDO) voltage regulators [54]. LDOs have two advantages over on-chip SVRs: 1) LDOs consume considerably less silicon area, as they do not use complex inductor and capacitor circuits, and 2) because of their low area cost, multiple LDOs can be used for different parts of the chip, and therefore a finer grain DVFS scheme can be used [55, 56].

Figure 2.6: An example of multi-Vt circuit optimisation. (The original figure shows a small circuit before and after optimisation, with per-cell delays annotated on each logic path, a latency constraint of 500, and cells shaded as LVt, NVt or HVt.)

2.1.2.6 Multi-Vt Optimisation

Commercial fabrication foundries such as TSMC and GlobalFoundries provide libraries with various gate threshold voltages (Vt). These library packages contain cells with Normal Vt (NVt), Low Vt (LVt) and High Vt (HVt). The availability of heterogeneous Vt cells provides an opportunity for designers to optimize the energy-delay characteristics of a circuit under a given synthesis latency constraint. For example, the use of Low Vt (LVt) cells results in lower circuit latency, whereas a circuit made up of High Vt (HVt) cells has a high latency. However, the high performance of LVt cells comes at the cost of increased power consumption; LVt cells can be up to 5× leakier than their HVt counterparts. In multi-Vt optimisation, CAD tools exploit the energy-delay characteristics of the cells by inserting low Vt cells on circuit logic paths with negative slack to meet the latency constraint, and replacing normal cells with high Vt cells on paths with positive slack to save energy. For example, in Figure 2.6, the circuit on the left contains only normal Vt cells and one of the logic paths has a latency higher than the synthesis constraint. Multi-Vt circuit optimisation replaces some of the cells with low Vt and high Vt cells to ensure that the circuit meets the latency constraint while keeping the leakage power of the circuit to a minimum.
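A minimal sketch of the slack-driven swapping described above; the delay and leakage ratios, path delays and greedy ordering are all invented for illustration, and real CAD flows are considerably more sophisticated:

```python
# Greedy multi-Vt assignment sketch: speed up violating paths with LVt cells,
# then spend leftover slack on low-leakage HVt cells.
# Delay/leakage ratios and path delays are invented for illustration.

DELAY = {"LVt": 0.8, "NVt": 1.0, "HVt": 1.3}   # relative cell delay
LEAK = {"LVt": 5.0, "NVt": 1.0, "HVt": 0.2}    # relative cell leakage

def optimise_path(path, constraint):
    """path: per-cell base delays; returns the chosen Vt flavour per cell."""
    choice = ["NVt"] * len(path)

    def path_delay():
        return sum(d * DELAY[v] for d, v in zip(path, choice))

    # Swap the largest cells to LVt until the latency constraint is met.
    for i in sorted(range(len(path)), key=lambda i: -path[i]):
        if path_delay() <= constraint:
            break
        choice[i] = "LVt"

    # Swap remaining NVt cells to HVt wherever positive slack allows it.
    for i in range(len(path)):
        if choice[i] == "NVt":
            choice[i] = "HVt"
            if path_delay() > constraint:
                choice[i] = "NVt"        # revert: no slack left for this cell
    return choice

for p in ([100, 120, 180, 120], [100, 120, 180, 60]):
    vts = optimise_path(p, constraint=500)
    print(vts, "relative leakage:", sum(LEAK[v] for v in vts))
```

The first path violates the constraint and receives an LVt cell; the second has slack, which is traded for a low-leakage HVt cell.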

2.1.2.7 Unconventional Semiconductor Devices

DVFS is a useful scheme for saving dynamic and leakage power in multicore chips. However, due to the dark silicon phenomenon, it is expected that a large number of cores in future many-core chips will operate at very low frequency, both because of power or thermal constraints and to achieve energy efficiency [57]. Such a scheme is also called near-threshold computing. However, scaling the supply voltage close to the threshold voltage of a CMOS circuit creates problems. First, DVFS does not significantly affect the leakage power of a CMOS circuit, and this phenomenon becomes even worse at smaller node sizes. Second, reducing the supply voltage combined with process variation can result in unreliable operation due to uncertain transistor delays. Process variation can also introduce bit errors in the on-chip SRAM-based devices used in near-threshold computing.

Researchers argue that a key breakthrough in energy-efficient computing architecture would be to use unconventional CMOS devices for low frequency operation [58] [59] [60]. One such promising semiconductor technology is the tunnel field effect transistor (TFET) [59]. A TFET has a lower threshold voltage than a MOSFET and can therefore be switched on at a lower gate voltage [61]. Consequently, for the same supply voltage, a TFET can be operated at a higher frequency. Moreover, TFETs have lower leakage power than MOSFETs [61]. However, TFETs have higher delay at higher voltages, so they cannot be used in high frequency architectures. Based on these characteristics, Swaminathan et al. [28] proposed a heterogeneous multicore architecture where the high frequency cores are fabricated using CMOS technology, while the low frequency cores are fabricated using TFETs.

Similarly, researchers have investigated architectures with STT-RAM based memories that use Magnetic Tunnel Junction (MTJ) devices for low power architectures [62] [63]. STT-RAM has an advantage over standard SRAM because of its near-zero leakage power and non-volatility. However, the main disadvantage of STT-RAM is a higher write latency than SRAM [63].

2.2 Multiprocessor System on Chip (MPSoC)

The current advancement in fabrication technology has provided an opportunity to integrate multiple computing resources into a single chip; the term coined for this is Multiprocessor System on Chip (MPSoC) [64]. According to Wolf [64], MPSoCs have given a new direction to the field of embedded system design. MPSoCs are targeted at embedded systems, in contrast to homogeneous CMPs (chip multiprocessors), which are targeted at general computing applications. This is why the main design objective, unique to embedded systems, is to maximise the computing power while keeping the energy consumption as low as possible [64]. Nevertheless, some innovative cluster computing infrastructures have been built by integrating small-scale MPSoC-based clusters [65].

Over the last decade, many MPSoC and CMP architectures have been proposed. Unarguably, the RAW architecture [66] from MIT was the most influential work in the field of homogeneous CMPs. The 72-core GX [67] series processor from Tilera and the Intel 48-core Single Chip Cloud Computer [51] are the latest examples of many-core CMP designs. The Intel IXP network processor [68], IBM Cell BE [69], Texas Instruments OMAP architecture [70], NVIDIA Tegra chipset [71], Qualcomm Snapdragon [72] and NXP Nexperia architecture [73] are examples of widely used commercial MPSoC architectures in embedded system products. To maximise performance, most of these MPSoCs integrate heterogeneous processing elements such as domain-specific accelerators and Application Specific Instruction set Processors (ASIPs). These architectures employ proprietary or open source on-chip interconnect standards, usually tailor-made to meet the bandwidth requirements of the specific SoC components. ASIPs are being studied extensively for applications where general purpose processors fail to meet the performance requirements of embedded systems [74]. Moreover, ASIPs provide a better performance per watt ratio compared to general purpose CMPs.

2.3 Bus-Based SoC Architectures

The bus-based architecture is perhaps the oldest on-chip interconnect standard in the computer industry, and is still used in many SoC applications [12]. The simplicity of the protocol, and hence the low gate cost, is possibly the main reason that bus-based architectures have dominated all other available on-chip interconnect options. In bus-based architectures, multiple components interact using a single data and control bus, providing a simple master-slave connection. Arbitration is required when multiple masters try to communicate with a single slave, giving rise to resource contention. Hence the scalability of the bus-based architecture in terms of performance is questionable in large SoC-based designs [75]. Some classic design techniques for bus-based SoCs proposed in [76,77] use worst-case bus traffic to design an optimal architecture. Kumar et al. [78] have given a detailed study of the scalability and performance of shared-bus chip multiprocessor (CMP) architectures. They concluded that a bus-based interconnect network can significantly affect the performance of cache-coherent CMPs.

Several improvements to traditional bus-based interconnect architectures have been proposed. ARM's AMBA architecture [79], IBM's CoreConnect architecture [80] and the Tensilica PIF interface [81] are a few examples of widely used advanced bus-based communication media. All of these architectures provide several advanced functionalities such as burst-based data transfers, multi-master arbitration, multiple outstanding transactions, bus locking, and simultaneous asynchronous and synchronous communication. However, Rashid et al. [82] have analytically shown that even advanced bus-based architectures like AMBA are outperformed by modern NoC-based communication architectures in terms of performance. The same study shows, however, that designers are still inclined towards AMBA-based on-chip interconnects due to the area and energy overhead of modern NoC designs. SoC designers have a keen interest in apples-to-apples comparisons of the various commercial bus architectures; however, the performance of an on-chip interconnect greatly depends on each application's traffic pattern, the bus microarchitecture and system parameters [83].

The simplicity of the bus-based architecture's design, its predictable access latency and its low area overhead are its key selling points. However, beyond a small number of cores, the performance of a bus interconnect degrades significantly [13].

2.4 Crossbar On-chip Interconnect

A single shared bus architecture is evidently slower in the case of multiple master-slave data transactions. The prime bottleneck is the single shared medium and the latency of arbitration among many master interfaces. Therefore, the first approach to designing a scalable on-chip interconnect was the adoption of the crossbar topology. A crossbar is a matrix switch fabric that connects all the inputs with all the outputs, enabling multiple simultaneous communication connections. The idea has been borrowed from the telecommunication industry, where such architectures have been successfully used for four decades in telephony applications [84].

The same concept of multiple communication channels was implemented in the SoC design industry by combining multiple shared buses to form an all-input-all-output connection matrix. The concept is also known as a hierarchical bus or multi-layer bus. STBus [43] is perhaps the best known commercial bus architecture that inherently supports crossbar architectures. A design methodology for AMBA-based cascaded bus architectures is provided by Yoo [85]; Yoo et al. have experimented with integrating 90 IP blocks in a single crossbar-based SoC design. Similarly, the authors of [86] have provided a complete design methodology for designing an application-specific crossbar architecture using the STBus protocol. They claim significant performance improvements over standard single bus architectures. The most interesting crossbar implementation is the interconnect system of the IBM Cyclops64 architecture. Each Cyclops64 crossbar connects 80 custom processors and about 160 memory banks. With a single transaction latency of 7 cycles and bandwidth comparable to state of the art NoC architectures, the Cyclops64 interconnect is perhaps the most advanced practical crossbar design in the SoC domain.

Researchers have been arguing over the scalability of crossbar-based interconnect architectures due to the non-linear relation between the number of ports and the latency and wire cost [87]. However, recent experiments from [84] show that a 128×128 port crossbar in a 90nm technology is feasible. They benchmarked their crossbar design against a state of the art mesh-based NoC design and concluded that the crossbar matched NoC architectures in terms of latency, bandwidth and power consumption. However, the design complexity is prohibitively high due to complex wire layouts.

2.5 Network-on-Chip (NoC) Interconnect

Following Moore's Law of available on-chip transistor resources, we are looking at integrating thousands of cores on a single chip. It has been predicted that the performance of such kilo-core MPSoCs will be communication architecture dependent [88]. Traditional bus-based architectures cannot scale beyond a few tens of IP blocks, and there is a need for a more scalable and protocol invariant communication architecture.

The solution to the scalability problem of bus-based architectures was found in the form of Network-on-Chip architectures [14,89]. NoC inherently supports the general trend of highly integrated SoC design and provides a new de facto standard for on-chip communication design. The basic idea of NoC has been adapted from the well established structure of computer networks. In particular, the layered service architecture of computer networks has been adapted in NoC to provide a scalable solution. In a NoC architecture, data is converted into packets, and these packets traverse a number of hops (switches or routers) based on a predefined routing technique. NoC-based MPSoC designs have attracted the attention of researchers for the last decade. The defining features of NoC design are router design, routing algorithms, buffer sizing, flow control and network topology. We will discuss them in more detail.

2.5.1 Topology

A NoC consists of routers and communication channels. The NoC topology defines the physical layout of the routers and how they are connected with each other using the communication channels. The selection of NoC topology can have a significant effect on multicore performance, power budget, reliability and overall design complexity. For example, if the average number of hops between nodes is high, packets have to travel further and hence network latency will be high. Similarly, if a topology requires very long physical links, designers have to make an effort to ensure timing closure for the longer links. Moreover, topologies that allow diverse paths between nodes can potentially provide higher bandwidth and also prove more reliable in the presence of faulty links. Therefore, when designing NoC-based multicores, the first decision is the choice of NoC topology [13]. NoC topologies can be classified as direct and indirect topologies [13]. In direct topologies, each on-chip component, such as a core or memory, is connected to a router, and therefore every router is used to inject or eject traffic from the network. In indirect topologies, the on-chip components are connected only to terminal routers, whereas the other routers only forward the traffic [13]. The degree of a router defines the number of neighbouring routers and on-chip components to which the router has links.

Figure 2.7: Well-known direct topologies

The degree parameter defines the number of input/output ports in each router. Note that the complexity of the router microarchitecture increases with an increase in the degree of the router.

Figure 2.7 shows three commonly used direct topologies: ring, mesh and torus. The ring is the simplest topology to implement in terms of silicon area and design complexity. The degree of each router in a ring interconnect is 3 (2 neighbouring routers + 1 local resource (core, memory, etc.)). The drawback of the ring topology is performance scalability: the worst-case number of hops between two nodes is proportional to N, the number of nodes in the topology. Furthermore, rings provide limited bandwidth and are less reliable due to poor path diversity. Therefore, rings become impractical for multicore chips with more than 8-16 nodes [90].

Mesh and tori solve the scalability problems of the ring topology, albeit at the cost of higher degree routers and a possibly more complex VLSI layout. Each router in a mesh topology has a degree of 5, except for the routers on the border. A torus can be classified as an enhanced form of mesh with wrap-around links between the border routers. These links use router ports that are not required to implement the mesh topology. The wrap-around links in tori reduce the average number of hops and provide better bisection bandwidth. The worst-case number of hops is approximately √N + 1 for tori and 2√N − 1 for mesh. In a mesh, all communication links are short and equal in size. For tori, however, the wrap-around links are considerably longer and need special attention for timing closure.
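The quoted hop counts follow directly from the grid geometry; the short sketch below, for square n×n networks, is a sanity check of those approximations (the exact corner-to-corner counts differ from the √N formulas by at most one hop).

    import math

    # Worst-case hop counts for an n-by-n mesh and torus (N = n*n nodes);
    # these track the 2*sqrt(N)-1 (mesh) and sqrt(N)+1 (torus) figures above.
    def mesh_worst_hops(n):
        return 2 * (n - 1)        # opposite corners: (n-1) hops in X and in Y

    def torus_worst_hops(n):
        return 2 * (n // 2)       # wrap-around links halve each dimension

    for n in (4, 8, 16):
        N = n * n
        print(N, mesh_worst_hops(n), round(2 * math.sqrt(N) - 1),
              torus_worst_hops(n), round(math.sqrt(N) + 1))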

Figure 2.8: 2-ary 3-fly Butterfly Topology

Figure 2.9: 3 Tier Fat Tree topology

Two classical examples of indirect networks, the butterfly and the fat tree, are shown in Figure 2.8 and Figure 2.9, respectively. The important feature of the butterfly topology is that the hop distance between any source-destination node pair is fixed (3 in the topology shown in Figure 2.8). Each router has degree 2 (2 input and 2 output ports), resulting in low cost routers. However, the number of routers is greater than the number of SoC components. The two main disadvantages of butterfly topologies are the single communication path between a given source-destination pair, which results in low bandwidth and low link fault tolerance, and the more complex wire layout due to uneven link lengths. The fat tree topology provides higher bandwidth and excellent diversity of possible routing paths. However, these qualities come at a cost in silicon area (more routers) and a complex wire layout.

In addition to these regular topologies, application-specific MPSoCs are often designed on top of NoCs with customised topologies [91]. The data traffic patterns are often known at design time, and therefore a communication graph can be extracted from the application specification [92] [93]. The communication graph, combined with knowledge of the physical mapping of the SoC components, can be used to create a topology that meets certain performance, energy or area constraints [94] [95] [91].

2.5.2 Routing

The routing algorithm defines the sequence of routers between the source and destination nodes that a data packet will traverse. The quality of a routing algorithm is determined by the average packet latency and power consumption. A good routing algorithm evenly distributes the traffic across all the routers and maximises the saturation throughput of the network. Power can be optimised by keeping the routing circuit simple and keeping the number of hops travelled by data packets low [13].

Deterministic routing is the simplest routing scheme and is widely used in NoCs. For a given source and destination pair, data packets always travel through a predefined set of routers. XY Dimension Ordered Routing (DOR) for mesh is a common example of deterministic routing. In XY DOR routing for mesh, depending on the physical location of the source and destination pair, the packet always travels first in the X (horizontal) direction and then in the Y (vertical) direction. However, deterministic routing can cause traffic hotspots in the case of an asymmetrical communication pattern between nodes [96] [13].

Oblivious routing is superior to deterministic routing in terms of path selection. For a given source-destination pair, oblivious routing can select one of many possible routes. However, this decision is taken without any knowledge of the network's current traffic conditions. One example of oblivious routing is ROMM [97]. In the ROMM routing scheme for mesh, an intermediate node is selected at random on a minimal path between source and destination, and the data packet is first sent to the intermediate node and from there to the destination using a deterministic routing algorithm.

Adaptive routing is the most robust routing scheme, as it uses global knowledge about the current network traffic state to select the optimal path [96]. Adaptive routing distributes the traffic across different network routers and hence maximally utilises the network bandwidth. However, implementing adaptive routing increases the design complexity of the routers [13]. Moreover, there is always a limitation on how much global knowledge can be forwarded to each router, which limits the effectiveness of the routing scheme [96]. In addition to standard routing schemes, MPSoC designers often use application-specific routing schemes for NoCs [98] [99] [100]. Application communication graphs can be analysed to extract information about data volume and criticality. This information can be used to design routing algorithms that minimise the communication latency [100].
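For concreteness, XY DOR reduces to a few coordinate comparisons per hop. The sketch below is a minimal illustration; the port names are invented for the example.

    # Minimal sketch of XY dimension-ordered routing on a 2D mesh.
    # Port names are illustrative.
    def xy_route(cur, dst):
        """cur, dst: (x, y) router coordinates; returns the output port."""
        (cx, cy), (dx, dy) = cur, dst
        if cx != dx:                           # route in X first
            return "EAST" if dx > cx else "WEST"
        if cy != dy:                           # then in Y
            return "NORTH" if dy > cy else "SOUTH"
        return "LOCAL"                         # arrived: eject to the core

    # From (0, 0) to (2, 1) the successive hops are: EAST, EAST, NORTH, LOCAL.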

2.5.3 Flow Control

Flow control determines how data is transferred between the different routers in a NoC. Specifically, flow control dictates the buffer and link allocation schemes. The design objectives for a flow control architecture are to minimise the buffer size, and hence the silicon area and power of the routers, and to keep the network latency low. In packet-switched NoCs, a data message is broken into a predefined packet format. A network packet can be further broken down and serialised into multiple flits. The size of a flit is normally equal to the physical channel width [13]. Additional information is added to each flit to indicate whether it is a header, body or tail flit. The routing and other control information can either be added only to the header flit, or added to each flit, depending on the implementation.
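The sketch below illustrates this packetisation step for a hypothetical 32-bit channel; the flit field layout is invented for illustration and real network interfaces pack control bits far more compactly.

    # Serialise a payload into head/body/tail flits for a 32-bit channel.
    FLIT_BYTES = 4  # flit size equals the physical channel width

    def packetise(payload: bytes, dst):
        chunks = [payload[i:i + FLIT_BYTES]
                  for i in range(0, len(payload), FLIT_BYTES)]
        flits = [("HEAD", dst)]  # routing information rides in the head flit
        flits += [("BODY", c) for c in chunks[:-1]]
        if chunks:
            flits.append(("TAIL", chunks[-1]))  # tail flit releases resources
        return flits

    print(packetise(b"hello world!", dst=(2, 1)))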

In store-and-forward flow control [101], before forwarding a packet to the next node, the router waits until the whole packet has been received into its local buffer. This means that the input buffer must have enough space to store the whole packet, which can increase router area and power consumption. This scheme also increases the communication latency, as packets spend a long time at each node just waiting to be buffered, even though the output port might be free.

Virtual cut-through [102] improves on store-and-forward flow control by allowing a packet to be routed to the next router even before the whole packet has arrived at the current router. However, the packet is only forwarded if the downstream router has enough buffer space to store the complete packet. This means that the buffer size remains the same as in store-and-forward flow control, but with an improvement in per-hop latency.

Wormhole flow control [103] is a more robust scheme, as it allocates buffer space at the granularity of a flit, as opposed to the virtual cut-through and store-and-forward schemes, which allocate buffers at the granularity of a packet. As soon as a flit of a packet arrives at an input port, it can be forwarded even if only one flit of space is available in the input port of the next router (and the output channel is not allocated). The wormhole flow control scheme results in low-area routers and is therefore widely used in on-chip networks [13]. The term wormhole implies that a single packet can span multiple routers at the same time. The main downside of this scheme is that multiple links can be blocked at the same time if the header flit of a multi-flit packet is blocked in one of the routers on the communication path.

2.5.4 Router Microarchitecture

The key building block of a NoC is the router. The router's microarchitecture dictates the silicon area, power and, most importantly, the performance of the NoC. The maximum frequency at which a router can operate depends on the complexity of the logic used in the router microarchitecture, which in turn translates into higher level performance metrics such as network latency and maximum achievable bandwidth. The complexity of the router's microarchitecture depends on the network topology (degree), the flow control method and the routing algorithm. For example, a complex adaptive routing algorithm can be used to improve the worst-case network bandwidth, but it will result in increased area and power.

Figure 2.10: Wormhole Router Architecture

Figure 2.10 shows the overall architecture of a packet-switched wormhole router [104]. The basic building blocks of a wormhole router are the input buffers, route computation unit, switch allocator and crossbar switch. Input buffers store flits when they arrive in the router and keep them stored until they are forwarded to the next router.

The route computation unit calculates the output port for the head flit stored at the head of each input buffer, depending on the routing algorithm. The switch allocator arbitrates between different packets contending for the same output ports. The crossbar switch is logically a set of muxes that route flits from the input ports to the output ports. The data from the crossbar is stored in output buffers, which can be as simple as flip-flops. The input buffers also propagate their buffer occupancy status to neighbouring routers to implement flow control [104].

2.5.4.1 Progress of a Packet in a Wormhole Router

The incoming flit is first stored in the input buffer (Buffer Write (BW) stage). If the flit at the front of the input buffer is the head flit, the route computation unit calculates the output port required to send the packet to its destination and asserts the port request signal to the switch allocator (Route Computation (RC) stage). In modern routers, the switch allocator consists of multiple port arbiters, one for each output port. Each output arbiter chooses one of the multiple input requests for a given output port. When an output port is allocated to an input port, that output port is locked until the whole packet (up to the tail flit) has crossed the crossbar switch. Before doing anything further, the switch allocator checks whether the downstream router has enough space to store the outgoing flit. If the buffers at the downstream router are full, the switch allocator blocks the packet transmission. However, if buffer space is available, the switch allocator sets the proper select lines for the crossbar switch and also instructs the input buffer to remove the flit at the front. This whole process is called the Switch Allocation (SA) stage. On getting a valid signal and output port selection from the switch allocator, the crossbar switch transfers the flit from the input port to the output port (Switch Traversal (ST) stage). The flit from the output port of the router then travels over the wire links to be latched in the input buffer of the downstream router (Link Traversal (LT) stage). Note that the head flit of a packet goes through all the stages discussed here. The body and tail flits, however, skip the RC and SA stages, as the output port has already been reserved by the head flit.
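As a compact summary of this walkthrough, the sketch below tabulates the per-hop stage sequence for head versus body/tail flits, assuming a fully pipelined router with one cycle per stage, purely for illustration.

    # Per-hop pipeline stages of the wormhole router described above.
    HEAD_STAGES = ["BW", "RC", "SA", "ST", "LT"]
    BODY_STAGES = ["BW", "ST", "LT"]   # RC and SA skipped: port already reserved

    def per_hop_cycles(flit_type, cycles_per_stage=1):
        stages = HEAD_STAGES if flit_type == "HEAD" else BODY_STAGES
        return len(stages) * cycles_per_stage

    print(per_hop_cycles("HEAD"), per_hop_cycles("BODY"))  # 5 3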

2.5.4.2 Optimisation and Logic Synthesis of Routers

Executing all router stages in a single cycle forces a lower clock frequency, because the cumulative logic delay of the stages can be long. Single-cycle operation might also require a higher supply voltage, depending on the target frequency, which can increase power consumption. Therefore, most high performance routers are pipelined [104] [101]. However, increasing the number of pipeline stages increases the per-hop latency and hence the overall network latency. The number of pipeline stages also depends on the sophistication of the RC, SA and ST stages. In commonly used pipelined routers, LT and BW are done in one cycle, and RC, SA and ST are executed in the next cycle. In more complex router architectures, however, the second pipeline stage can be further divided into RC+SA and ST pipeline stages. The pipelining can affect the area and power consumption [105]. Using fewer pipeline stages results in more stringent latency constraints for logic synthesis, and hence the synthesis tool has to insert larger gates with lower delays; larger logic gates have higher dynamic and leakage power. More pipeline stages, on the other hand, can reduce the gate sizes; however, the overall area may increase due to the addition of the pipeline flip-flops [105]. Therefore, the logic synthesis of routers is a classic power-performance-area tradeoff problem [104] [105] [106].

2.6 Overview of Recent Academic and Commercial NoCs

The most interesting NoC design currently available is the Tilera iMesh on-chip interconnect network [107]. The iMesh interconnect architecture has been used in the commercial TilePro64 multicore chip, and the latest 72-core Tilera GX chip [67] uses the same NoC design. The iMesh architecture differs from other academic and commercially available on-chip architectures in the number of physical NoCs: iMesh provides five different physical NoC channels, two of which are used for memory access and management tasks while the rest are user accessible. The motivation is that future integrated circuits will have enough silicon resources to integrate more than one NoC per chip.

The next generation SPARC M7 processor combines three different NoC topologies for the on-chip interconnect [108]: a ring-based request network, a point-to-point network for response messages, and a mesh for data packets. The M7 processor uses a shared-distributed cache model. The request ring network is used to broadcast cache requests to all cores in the system. The point-to-point network is only used when adjacent cores share data. The mesh data network is built using 10-port routers and is used primarily to stream data from the memory controllers to the local caches.

Anton2 is a special-purpose supercomputer for performing molecular dynamics simulations. The building block of the supercomputer is the Anton ASIC, which contains a number of special purpose hardware units for numerical calculations [4]. The Anton2 ASIC uses a 4×4 2D mesh for connecting on-chip components, whereas the chips are connected with each other using a 3D torus. The novel feature of the communication architecture is that the same set of routers is used for both intra- and inter-chip communication. About 10% of the total ASIC area is dedicated to on-chip communication.

SMART (Single-cycle Multi-hop Asynchronous Repeated Traversal) NoC from MIT is another interesting low latency on-chip communication architecture [109]. The authors observed that a data bit can travel 9-11 hops in a single clock cycle at 1 GHz in 45nm silicon technology. Based on this observation, they propose a NoC architecture where data can be transferred across physically distant mesh nodes in a single cycle by bypassing multiple routers. This reduces the latency of multi-hop mesh NoCs. The proposed architecture is reported to improve the performance of PARSEC and SPLASH-2 benchmarks by an average of 26% for private L2 cache and 49% for shared L2 cache architectures.

Intel used a mesh-based NoC for the 80-core TeraFlop experimental chip [31]. The mesh NoC uses five-stage pipelined routers designed for a 5 GHz frequency, resulting in a 1ns per-hop latency. According to experiments conducted on the research chip, the NoC consumed about 28% of total chip power while occupying only 17% of total chip area. Intel has also introduced a 48-core mesh NoC based multicore chip called the Single Chip Cloud Computer (SCC) [110]. The target frequency for the NoC was set at 2GHz with a 1.1V supply voltage. The router is four-stage pipelined and uses virtual cut-through flow control. To mitigate the high NoC power observed in the earlier 80-core chip, Intel opted for a DVFS scheme for the NoC. The NoC was organised as a 6×6 mesh so that two compute cores share a single router. These techniques helped to reduce the share of NoC power to 10%.
There are two well known commercial NoC IP providers, Sonics [111] and Arteris [112]. Both provide various predesigned NoC IPs for commonly used processor IPs such as ARM and Xtensa. Furthermore, both of them provide design tools that can be used to optimise the NoC IP for various design objectives such as power, area, performance, Quality-of-Service (QoS) and reliability. It is anticipated that hardware developers will increasingly use third-party NoCs to reduce both design cost and development time [91].

2.7 NoC Power Optimisation

As more cores are integrated on a chip, the on-chip interconnect's complexity increases, and so does its power. Computer architects want to keep the on-chip interconnect's power low so that the processing elements and memory hierarchy can use a larger share of the power budget [6]. Although NoC provides scalable communication for multicore chips, it can consume valuable power. For example, Daya et al. [8] report that the NoC in their 36-core SCORPIO experimental chip consumes 19% of the total chip power. Similarly, the NoC consumes 28% of total power in Intel's TeraFlop chip [31]. With limited power budgets constraining multicore scaling [5], power saving techniques for NoCs are an active research topic.

Over the years, various voltage-scaling based solutions have been investigated for reducing NoC dynamic power. Shang et al. [113] presented the pioneering work on DVFS for on-chip links. The scheme uses the usage history of the links to predict their future utilisation; based on this prediction, a certain voltage and frequency (VF) level is selected for each link. Bogdan et al. [114] proposed a DVFS scheme for spatially and temporally varying workloads. The proposed scheme works at the granularity of individual routers: based on fractal modelling of the workload, it selects an appropriate VF level for each router. Mishra et al. [115] proposed per-router frequency tuning in response to changes in network workload to manage power and performance. Based on the optimisation goal, the frequency of the routers is increased to relieve network congestion and improve performance, or decreased to meet certain network power constraints. Ogras et al. [116] described a framework for fine-grain VF selection for a VF-island based NoC. The scheme first statically allocates a VF level to each NoC router and then uses runtime network information to fine tune the assigned VF level. All these works base their DVFS schemes on network metrics such as packet latency, injection rate, queue occupancy and network throughput. Researchers [117] [118] argue that by neglecting higher-level performance metrics such as application execution time, these schemes can produce non-optimal results. Therefore, the works of Chen et al. [117] and Hesse and Enright Jerger [117] based their DVFS schemes on the actual execution delay experienced by applications due to DVFS of the NoC. Zhan et al. [119] explored the problem of per-router DVFS for streaming applications. They developed an analytical model to statically predict the effect of VF scaling on application throughput. Depending on the application-to-core mapping, routers on the critical communication path are operated at a lower frequency.
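The network-metric-driven schemes above share a common control-loop shape. The sketch below shows a hypothetical per-router policy keyed to buffer occupancy; the thresholds and VF levels are invented for illustration and do not correspond to any of the cited schemes.

    # Hypothetical per-router DVFS control loop driven by a network metric
    # (average input-buffer occupancy over a control epoch).
    VF_LEVELS = [(250, 0.72), (500, 0.81), (1000, 0.90)]  # (MHz, V)

    def adjust_vf(level_idx, avg_occupancy):
        """avg_occupancy in [0, 1]; returns the new VF level index."""
        if avg_occupancy > 0.7 and level_idx < len(VF_LEVELS) - 1:
            return level_idx + 1   # congestion: raise frequency
        if avg_occupancy < 0.2 and level_idx > 0:
            return level_idx - 1   # underutilised: lower VF to save power
        return level_idx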

Since the ratio of leakage power to total chip power is increasing with transistor scaling, researchers have been exploring leakage power reduction techniques. Through experiments with realistic benchmarks, Chen [120] reported that the router's static power share is 17.9% at 65nm, 35.4% at 45nm, and 47.7% at 32nm. As discussed earlier in this chapter, power gating is often used to save the static power of on-chip components [39]. Soteriou and Peh [121] proposed power gating at the link level. They used the channel utilisation history to select links that can be switched off with minimal impact on performance. As switching off some links results in irregular topologies, they proposed an adaptive routing algorithm to avoid the switched-off links. Matsutani et al. [122] proposed adding low frequency virtual channel buffers that can be power gated at runtime to save leakage power. An ultra fine-grain power gating scheme for NoC routers is proposed in [123]. They used special cell libraries where each CMOS gate has a power gate signal; the power for each router component, such as the input buffer, switch allocator and crossbar, can therefore be individually turned on or off based on the network activity. This saved 78.9% of the leakage power. However, the main drawback of this technique is the complexity of including special power gating circuitry in each CMOS gate, and the additional circuitry also increases the router area. Kim et al. [124] proposed a fine-grain buffer management scheme for NoC routers, in which, depending on the network traffic load, the size of the input buffers is increased or decreased at runtime. The unused buffer slots are power gated to save leakage power. However, this scheme only targets buffer leakage power and neglects the other router components. Parikh et al. [125] proposed power gating at the granularity of a datapath segment, which consists of an input buffer, crossbar mux and output link. In the Router Parking [126] scheme, the routers of cores that are not in use are switched off and the traffic is routed around them through new routing paths calculated at runtime.

The main problem with the previous power gating proposals is the performance overhead due to wakeup latency. Chen and Pinkston [127] proposed overlaying a standard mesh NoC with a low cost ring network. If a packet arrives at a router that is switched off, the packet is routed through the ring network, which is always on. Das et al. [128] proposed dividing a wide mesh NoC into four smaller NoCs for efficiency. The additional NoCs are only switched on if the network load exceeds a certain limit, and the unused NoC planes remain switched off. In this way, network connectivity is always ensured.

As the buffers in routers consume a considerable share of dynamic and leakage power, researchers have also explored bufferless NoCs. The first effort in this direction was the BLESS router [129]. The idea is that instead of storing a flit in the router, the flit is routed onwards, even in a direction that does not lie on a minimal path. This means that, even with some misrouting, the flit will eventually reach its destination. This scheme is therefore applicable to routers with a number of output ports equal to or higher than the number of input ports. Deadlock and livelock are avoided by allocating a channel to the oldest flit first. To improve on some of the shortcomings of the BLESS router, CHIPPER [130] was introduced. Both BLESS and CHIPPER show potential in reducing network power. However, there are two concerns with bufferless NoCs. First, the deflection routing causes packets to take non-minimal routes, resulting in longer packet delays; the effect of these network delays is more significant for memory-intensive applications. Secondly, bufferless routing results in out-of-order arrival of flits, which increases the complexity of the network interface.

2.8 Dark Silicon Aware Multicore Design

Dark silicon is seen as an inevitable threat to Moore's Law, as an ever increasing number of transistors cannot all be switched on due to power and thermal budget constraints [5,19]. Interestingly, researchers have been approaching the dark silicon problem from different angles. One perspective is that dark silicon can be used to improve the energy efficiency of existing computing architectures [24] [18] and other system metrics such as reliability [29]. Another perspective is that improved thermal and power management schemes for multicore architectures can mitigate the effects of dark silicon to some extent [19,26,131]. We will now discuss both perspectives in more detail.

Researchers believe that dark silicon can be used to design heterogeneous computing systems with high energy efficiency. Esmaeilzadeh [18] brought the issue of dark silicon into the limelight in his PhD thesis. He used a combination of analytical and experimental data to predict that approximately 50% of the transistors in 8nm chips will remain dark due to a combination of a limited power budget and limited application parallelism [5]. He based his research on the argument that traditional computing systems are highly energy-inefficient for emerging applications such as image processing, sensor data fusion and other similar signal processing applications. To mitigate the energy inefficiency of general-purpose cores, Esmaeilzadeh et al. [30] proposed integrating application-specific neural processing units (NPUs) for signal processing applications. The NPUs work on the principle of approximate computing, exchanging output error for energy efficiency. The NPU is tightly coupled with the execution stage of a standard processor. Compared to a general purpose x64 architecture, NPU accelerators are reported to be 2.3× faster and use 2.3× less energy at the cost of a 9.3% loss in quality. Venkatesh et al. [23] proposed exchanging dark silicon for energy efficiency by using quasi-specific (QS) cores. The idea behind the QS core architecture is to identify the code hotspots in commonly used software libraries, design application specific cores (QS cores) for them, and integrate these with general-purpose cores. At runtime, whenever a general purpose core comes across a piece of code that can be run on a QS core, the general purpose core is put into a low energy mode and the QS core is activated. The reported results show that QS cores can be up to 18.4× more energy efficient than general purpose cores. A similar design philosophy underlies GreenDroid [22], which extends the idea to well known Android operating system libraries. Cong et al. [132] discuss various architectural issues regarding the integration of such application specific accelerators with general purpose cores.

Allred et al. [25] proposed designing multicores with architecturally homogeneous cores that have been designed for different VF levels. They show that running cores designed for specific VF levels is more energy efficient than applying

DVFS to cores that have been designed for higher VF levels, due to the optimised gate sizing performed at circuit synthesis time. Turakhia et al. [133] proposed a design flow, HaDeS, to design a general-purpose heterogeneous CMP for a given set of multithreaded applications under a given dark silicon budget. They start with a library of cores with different architectural parameters, such as L1 and L2 cache sizes, multi-issue dispatch width, and re-order buffer size. The proposed solution then designs a CMP containing heterogeneous cores from the library such that the average runtime of all applications is minimised while abiding by certain power and area constraints.

Recently, researchers have investigated exploiting dark silicon for system reliability. Kriebel et al. [29] departed from the general trend of exchanging dark silicon for energy efficiency by using dark silicon to design reliability-heterogeneous CMPs. At design time, cores with varying protection against soft and hard errors are integrated into the system. At runtime, depending on each application's susceptibility to soft errors and the power budget, applications are mapped to certain cores and the unused cores are switched off. Gnad et al. [134] proposed runtime management of the ageing-induced variability of dark silicon through intelligent application-to-core mapping.

Researchers have also proposed techniques to mitigate the amount of dark silicon by improving system-level thermal and power management schemes [135]. Muthukaruppan [136] proposed hierarchical power management for dark silicon inspired heterogeneous asymmetric multicores, specifically ARM's big.LITTLE architecture. The proposed framework is based on multi-agent control theory and consists of three different types of controller: a per-task resource share controller, a per-cluster DVFS controller, and a per-task QoS controller. If the chip temperature exceeds a given thermal constraint, the framework uses DVFS to selectively throttle some of the tasks in a way that degrades QoS gracefully. They reported better performance than the standard Linux task scheduler. Pagani et al. [137] proposed a framework to efficiently manage the switched-on cores in a tiled CMP to maximise performance while abiding by thermal design power (TDP) constraints. They observed that the ambient temperature of a core depends on the power dissipation of the core itself and on the power dissipation of other physically nearby active cores. They proposed a polynomial time algorithm for calculating the thermal safe power (TSP) for a given application-to-core mapping. TSP is the power budget for each core that ensures the overall chip temperature will remain below a given TDP constraint. They reported a significant reduction in the amount of dark silicon and an improvement in system performance compared to budgeting the TDP at chip level. Khdr et al. [138] also proposed a resource management scheme to maximise the performance of a TDP-constrained multicore. The proposed framework selects the optimum number of cores, the location of those cores within the chip, and the VF level for the cores under the given TDP constraints. In addition to static core allocation, the proposed scheme also dynamically adapts to changing TDP or power budgets at runtime.

Another interesting idea is to convert dark silicon into dim silicon. The term dim silicon implies that instead of turning on a few cores at high frequency and turning off the other cores (dark silicon), we can turn on all cores at a relatively lower frequency without exceeding the thermal or power budget [24,139]. Such schemes are shown to be useful for scalable multithreaded applications, as the net performance of a few high frequency cores is similar to that of a larger number of lower frequency cores [26]. Near threshold computing (NTC) is based on the principle of integrating a large number of cores such that the power supply is close to the CMOS transistor threshold voltage. Lowering the chip supply voltage close to the threshold voltage significantly decreases the maximum operating frequency, but decreases the energy consumption at the same time. The basic idea of NTC is to find the minimum of the energy-delay product (EDP) of such a multicore architecture for a given multithreaded application [26,57].
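To make the EDP argument concrete under a simplifying assumption: for a perfectly parallel workload under a fixed chip power budget, the number of active cores scales inversely with per-core power p, so delay ∝ p/(budget·f) and minimising EDP reduces to minimising the per-core energy per cycle, C·V² + P_leak/f. The sketch below sweeps a few hypothetical VF points; all constants and the crude power model are invented for illustration.

    # Find the EDP-optimal VF point for a perfectly parallel workload under a
    # fixed power budget: minimise per-core energy per cycle, C*V^2 + P_leak/f.
    C_EFF = 1e-9   # effective switched capacitance per cycle (F), hypothetical
    P_LEAK = 0.1   # per-core leakage power (W), assumed VF-independent here

    def energy_per_cycle(volt, freq_hz):
        return C_EFF * volt**2 + P_LEAK / freq_hz

    # (V, Hz) candidates; frequency collapses near the ~0.45V threshold
    points = [(0.50, 0.07e9), (0.60, 0.5e9), (0.72, 1.35e9), (0.90, 3.0e9)]
    print(min(points, key=lambda p: energy_per_cycle(*p)))  # -> near-threshold sweet spot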

It is clear from our discussion that all the prior research concentrates on the interaction of dark silicon and computational resources, completely neglecting the role of the on-chip communication architecture. A recent study shows that the NoC will consume ~18-24% of total power in 8nm dark silicon chips [1], yet there is no prior research on using dark silicon to improve NoC energy efficiency. This thesis fills this gap by proposing different techniques to improve the energy efficiency, reliability and performance of NoCs by exploiting the abundant transistors.

2.9 Communication-Aware Mapping

When designing an MPSoC based system, it is not uncommon to have fixed specifications for the underlying hardware. In such cases, it is the responsibility of the system programmer to use the available hardware resources efficiently. The situation is more complicated when the MPSoC architecture is heterogeneous. Over the years, many techniques have been proposed for effectively mapping a given application onto a target MPSoC architecture. Embedded system designers are expected to tweak the application mapping to maximise performance while meeting hard time-to-market limits [140] [141].

There is no single technique to map applications onto a given MPSoC, and the mapping problem is further complicated by the presence of many different MPSoC architectures. Therefore, the common practice is to rebuild the mapping strategy for every application-architecture combination. Application mapping was already an area of interest for researchers working in parallel computing and supercomputing [142]. Their idea was to map the tasks that share data as close together as possible to reduce the network latency. This simple idea is also applicable in the case of chip multiprocessors (CMPs), where mapping the application has to take into account the underlying on-chip interconnect architecture. Similarly, the choice between a shared memory architecture and a message passing interface heavily influences the mapping algorithms. The mapping solutions proposed in the literature can be broadly divided into two categories: 1) static mapping, and 2) dynamic runtime mapping. Each class of mapping algorithm has its advantages and disadvantages. Static mapping is useful when the applications to be executed are known at design time [143,144]. System designers can formulate an optimal or near-optimal solution for such scenarios. Runtime mapping algorithms, on the other hand, allocate resources to applications on the fly. These algorithms give a better mapping solution in cases where the application behaviour is unknown at design time [145] [146].

In the case of embedded systems, the applications to be executed are known at design time. Therefore, static mapping techniques result in better system performance [147]. Another interesting area of research is the mapping problem for NoC-based architectures. The unique data communication capabilities of NoC-based architectures demand a smart communication-aware mapping strategy [99,148–150]; it is important to analyse the network contention while estimating the runtime of a given application. Unfortunately, most mapping problems prove to be NP-complete, with only near-optimal solutions available. Heuristic-based algorithms have therefore been proposed, which reduce the time required to solve these NP-complete problems (a toy example of such a heuristic is sketched below). Time division multiple access (TDMA) based NoCs are used for predictable systems. The expected traffic on a given link can be estimated using analytical modelling or simulation. Using the proposed algorithms, it is then possible to map a given processor-to-processor communication into an allotted time slot. This not only reduces network congestion, but also makes the system predictable [151].
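The sketch below is a toy communication-aware mapping heuristic in the spirit discussed above: it greedily places the most heavily communicating task pairs on adjacent mesh tiles to reduce hop-weighted traffic. It is illustrative only; the cited works use far more elaborate cost models and search strategies.

    # Greedy communication-aware task-to-tile mapping on an n-by-n mesh.
    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def greedy_map(traffic, n):
        """traffic: {(task_a, task_b): volume}; returns task -> (x, y) tile."""
        tiles = [(x, y) for x in range(n) for y in range(n)]
        placement = {}
        # place heaviest flows first, each task next to its partner if possible
        for (a, b), vol in sorted(traffic.items(), key=lambda kv: -kv[1]):
            for task, partner in ((a, b), (b, a)):
                if task in placement:
                    continue
                free = [t for t in tiles if t not in placement.values()]
                anchor = placement.get(partner)
                free.sort(key=lambda t: manhattan(t, anchor) if anchor else 0)
                placement[task] = free[0]
        return placement

    print(greedy_map({("enc", "dec"): 90, ("dec", "out"): 40}, 2))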

Some researchers have provided solutions where the mapping and partitioning problems of NoC-based embedded systems are dealt with as a combined problem. Although these solutions require a very strict analytical model of the application, the final software and hardware are highly optimised in area footprint, power and performance [149,152]. It has been observed that a generalised MPSoC architecture might eventually fail to cover the design requirements of a particular real-time embedded system. Therefore, the option of highly customised hardware for a particular set of applications has also been widely explored.

2.10 Application Specific Communication Architecture for MPSoCs

Over the years, researchers have observed that flat communication and memory architectures are not sufficient for designing high performance application specific MPSoC architectures. Moreover, the quest for low-power, low-cost embedded systems has forced designers to optimise the whole system. It is therefore anticipated that highly optimised hardware and software designs will include customised memory and on-chip network designs in the future [141]. Customising the communication architecture at the system level and gate level not only improves system performance, it also results in a more energy efficient design: by eliminating logic overhead, the leakage and static power loss of the system can be significantly reduced [86]. It has been shown that some MPSoC applications do not actually need highly complex network architectures like NoC, and a simple bus-based architecture can essentially meet the performance requirements. Therefore, it is important to avoid overkill by intelligently designing communication architectures that give optimal solutions without incurring area overhead [153].

There have been several studies on designing application specific on-chip communication architectures for bus, crossbar and NoC based embedded MPSoCs. Some classic studies on highly optimised bus-based system synthesis are presented in [76]. Further study on the same topic was carried out by Daveau [77]. At the time of the research presented in [76,77], the amount of logic resources present on a chip was limited and the main objective was to maximise the utilisation of these resources. Another area of interest has been combined design space exploration techniques for both processor and communication optimisation [154]. Moreover, recent advancements in fabrication technology mean that we now have plenty of wires and logic resources at our disposal [5]. Therefore, design for low power has been the key objective in recent research. The selection of an optimal on-chip communication architecture is non-trivial.

Therefore, an effort has been made to introduce library-oriented design space exploration, where the designer has several configurations to choose from. Designers can then use analytical modelling or simulation to select the best communication architecture [155]. Similarly, the option of integrating different communication architectures in a single design has also been studied extensively. Murali et al. [86] proposed a complete design method for application-specific crossbar synthesis. They use the traffic pattern of the application to design a communication architecture that is a combination of crossbars and buses. Bus-based architectures are simple but offer less performance; crossbar-based architectures are more complex but provide better throughput. Therefore, Murali et al. combined the two communication architectures while keeping the gate cost low and fulfilling the performance requirements.

The idea of application specific communication architectures is even more interesting in the case of NoC. Inherently, NoC architectures are highly customisable, and designers can tweak a variety of architectural parameters to meet performance, cost and energy constraints. X-pipes is perhaps the first project that provided a library-based approach to NoC design [156]. The same idea has also been explored by Jeremy & Parameswaran [157]. Another interesting project targeted towards low-power NoC based embedded MPSoCs was the Aethereal Network on Chip [158]. The significance of these projects is the flexibility provided to the designer to quickly explore the various possible optimisations for a given application. These optimisations can provide up to a twelve times improvement in performance while reducing the area cost by 25 times when compared with flat communication architectures [153].

All the techniques referred to above improve the performance of on-chip communication in terms of aggregate on-chip bandwidth and average data latency, which are not suitable metrics for streaming applications executing on MPSoCs. In the case of streaming MPSoCs, the application-level throughput or latency of the system is the most important performance metric [159]. Thus, it is important to incorporate system level performance constraints in the design flow for an application-specific on-chip network [92].

Chapter 3

darkNoC: Energy Efficient NoC for Dark Silicon Chips

3.1 Introduction

Network-on-Chips (NoCs) provide a scalable communication fabric for connecting a large number of resources within a chip. The NoC can contribute significantly to the total chip power, e.g., up to 18% and 33% in the Intel SCC [51] and RAW [160] architectures, respectively. To address this issue, various power saving techniques for NoC at the system, architecture and circuit levels have emerged, among which Dynamic Voltage and Frequency Scaling (DVFS) is very prominent [114,115]. System-level DVFS managers apply reduced voltage and frequency (VF) levels in a NoC to save power. According to several recent studies [17,27], the energy saving potential of DVFS has been diminishing for two reasons:

• With shrinking node size, the difference between the nominal voltage and the threshold voltage is decreasing. As the nominal voltage approaches the threshold voltage, the transistor delay increases exponentially, resulting in huge performance penalties [27]. This limits the factor by which the voltage can be scaled down. Transistors with lower threshold voltages can be used in the circuit to increase the gap


between the nominal and threshold voltages, but that results in increased leakage power.

• DVFS does not influence the intrinsic physical properties of the circuit, such as gate sizes, capacitances, etc., which are fixed at design time.

With the help of the following NoC synthesis case study, we show how the problem of diminishing returns of DVFS can be alleviated.

Figure 3.1: NoC Power for Transpose Traffic for a) (left) [500 MHz, 0.81V] VF level, b) (right) [250 MHz, 0.72V] VF level

Motivational Example: We exploited the multi-Vt circuit optimisation available in CAD tools to synthesise architecturally identical NoC routers for a set of target VF levels: [1GHz, 0.9V], [750 MHz, 0.81V], [500 MHz, 0.81V] and [250 MHz, 0.72V]. Figure 3.1 reports the network power for operation at [500 MHz, 0.81V] and [250 MHz, 0.72V] (details of our experiments are presented in Section 3.4). We can observe that for operation at [500 MHz, 0.81V], the NoC designed specifically for the [500 MHz, 0.81V] VF level is on average 35% and 16% more power efficient than applying DVFS to a NoC designed for [1GHz, 0.9V] and [750 MHz, 0.81V], respectively. Similarly, for operation at [250 MHz, 0.72V], the NoC designed specifically for the [250 MHz, 0.72V] VF level is on average 42% and 22.7% more power efficient than DVFS applied to a NoC designed for [1GHz, 0.9V] and [750 MHz, 0.81V], respectively. This shows that, unlike a traditional NoC with a single layer of routers, it may be beneficial in terms of power to have multiple layers of routers in a NoC, such that each layer is optimised for a particular VF level.

The provision of multiple layers of routers in a NoC may cost a significant amount of silicon; however, we can leverage dark silicon [5] [161] to alleviate this overhead. As discussed in Chapter 2, several researchers have proposed leveraging dark silicon to provide extra application-specific accelerators [22], extra low-power processors [133], etc., for better energy efficiency. In fact, the authors of [24] state that “in eight years, we will be faced with designs that are 93.75% dark!”, and, “we will spend exponentially increasing amounts of silicon area to buy energy efficiency”. Therefore, even with the addition of extra accelerators [22] and processors [133], we believe that there will still be dark silicon available which could be exploited for the energy-efficient design of other system components such as the NoC. Given the availability of dark silicon, three research problems need to be addressed for the realisation of a NoC with multiple network layers:

• What kind of architectural support is required to integrate multiple VF-optimised network layers in a NoC?

How many network layers (and their VF levels) should be used for a set of • applications under a dark silicon budget?

How to enhance system-level DVFS managers to exploit the provision of mul- • tiple VF-optimized network layers?

In this chapter, we focus on the architectural details of our darkNoC architecture and study the interplay of application characteristics, architecture and dark silicon budget. We discuss the fine-grain details of a NoC architecture where architecturally homogeneous routers, which have been optimised specifically for particular VF ranges, are seamlessly integrated at the architecture level to complement system-level DVFS managers, and exchange dark silicon for energy efficiency in the NoC. In particular, our key contributions are:

• We propose a novel NoC architecture, named darkNoC, where multiple network layers consisting of architecturally identical routers, but optimised during synthesis to operate in different voltage and frequency ranges, are used. Only one network layer is illuminated (active) at a given time while the rest of the network layers are dark (deactivated).

• We propose an efficient hardware-based mechanism to transition from one network layer to another. Our network layer switch-over mechanism preserves the lossless communication property of a packet-switched buffered NoC, and is transparent to the software. Further, we investigate and report the factors that can potentially affect the time and energy overhead of the switch-over mechanism.

• Finally, to illustrate the potential of our darkNoC architecture, we show how a state-of-the-art DVFS manager [118] is deployed. Further, we compare our darkNoC architecture having a different number of network layers (due to the dark silicon budget) with a traditional single network layer NoC, where both architectures have the same DVFS manager (our results show savings of up to 56% in energy-delay product). We also discuss how this energy saving is correlated with application characteristics.

3.2 Related Work

Various system-level DVFS techniques have been proposed for NoC power management to strike a balance between performance and power, such as at the granularity of communication links [113], routers [114,115] and regions [118]. Run-time power gating has also been proposed to reduce the leakage power of NoCs. Matsutani et al. [123] proposed an ultra fine-grained power gating technique for routers. Similar techniques at other granularities have been explored in [124,126,128]. All of these techniques are orthogonal to our work and can be tweaked to exploit the provision of multiple network layers in a NoC. In the experimental results in this chapter, we use the DVFS technique of [118] as the state-of-the-art to compare our darkNoC architecture with the traditional single network layer NoC.

Pullini et al. [162] investigated the advantages of using multi-Vt circuit optimisation in terms of performance and power for a single layer NoC architecture. In contrast, we exploit multi-Vt circuit optimisation in the context of multiple network layers and DVFS. Various NoCs with multiple network layers have been proposed in [11,107,163]. However, all of these works customised the architecture of the network layers or the routers, in contrast to our work where we integrate architecturally homogeneous yet VF-optimised network layers.

Our work is inspired by [164,165], where the authors exploited multi-Vt circuit optimisation for designing processors in the context of dark silicon. However, one cannot infer from their work that multi-Vt circuit optimisation will be beneficial for NoCs, nor what the architecture of a NoC with multiple VF-optimised network layers should be. In summary, to the best of our knowledge, this is the first work that proposes a NoC architecture which exploits multi-Vt circuit optimisation for multiple network layers in the context of system-level DVFS and dark silicon.

3.3 darkNoC Architecture

We consider NoCs with predefined topologies as shown in Figure 3.2(a). The NoC consists of n routers, which are divided into regions. Each region has m routers, and separate voltage and frequency (VF) rails with their own VF regulators. The value of m can vary from 1 to n, depending on the granularity of the regions. For communication between VF regions, bi-synchronous links [166] must be used. However, in this work we assume that VF regions can only communicate if they are operating at the same VF level. A designer can opt to use the darkNoC architecture in any of the VF regions based upon their requirements. The following paragraphs explain the darkNoC architecture from the perspective of a single VF region.

Figure 3.2: darkNoC architecture: (a) (left) NoC divided into VF regions, (b) (middle) a single VF region, (c) (right) a stack of (two) VF optimized routers with its local port.

3.3.1 darkNoC Layers

A region contains l network layers, where a network layer consists of m routers as shown in Figure 3.2(b). At a given time, only one of the network layers is illuminated (activated), while the rest of the network layers remain dark (deactivated). When a network layer is illuminated, all of its routers are active, and thus, at any given time, only m routers are active. Figure 3.3 shows an example of transitioning from network layer 0 to 3 in a region with 4 network layers.

Each network layer is optimised at design-time to operate in a certain VF range. That is, multi-Vt circuit optimisation of CAD tools is used to optimize all the routers of a network layer for a particular VF range. All the layers in a region are managed by a hardware-based darkNoC Layer Manager (dLM) as shown in Figure 3.2(b). The function of the dLM is to switch between network layers when directed by the system-level DVFS manager.

Figure 3.3: An example darkNoC architecture with four network layers, showing a transition from network layer 0 to network layer 3 (illuminated vs. dark network layers).

3.3.2 darkNoC Router Stack

Each node in Figure 3.2(b) represents a stack of l routers to realise the l network layers of the region. For example, for the 4 network layers in Figure 3.3, each router stack has 4 routers, and all the routers marked 2 belong to network layer 2. As shown in Figure 3.2(b), all the routers in a stack share the same set of link wires with the neighbouring stacks of routers, which reduces verification costs.

Each router in a stack has separate controls for power-gating and for enabling input/output ports. For example, a stack of two routers is shown in Figure 3.2(c), where the R0_PWR_en and R0_en controls are used to power router 0 on or off and to enable or disable its input/output ports. Thus, a router in a stack is illuminated in two steps: powering it on and enabling its input/output ports. The local port of a router stack is connected to a tile (which may consist of a processor, memory, etc.) through a network interface (NI). The NI has the capability to stop injection of packets from its tile, which we exploit during the switching of network layers. To manage a router stack, we use a hardware-based darkRouter Manager (dRM) as shown in Figure 3.2(b). The function of the dRM is to switch between routers when directed by the dLM.
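The two-step illumination sequence can be sketched in software as follows; this is a minimal model, assuming the control names from Figure 3.2(c), and the settle delay is a placeholder rather than a characterised value:

    # Sketch of two-step router illumination: (1) assert the power-gating
    # control, (2) only then enable the input/output ports. Darkening
    # reverses the order so a powered-off router never drives the links.
    def wait(cycles):
        pass  # stand-in for a hardware delay counter

    class RouterStackEntry:
        def __init__(self, rid):
            self.rid = rid
            self.pwr_en = False   # R<i>_PWR_en: power-gating control
            self.port_en = False  # R<i>_en: input/output port enable

        def illuminate(self, settle_cycles=10):
            self.pwr_en = True        # step 1: power the router on
            wait(settle_cycles)       # allow the supply to settle (placeholder)
            self.port_en = True       # step 2: enable its I/O ports

        def darken(self):
            self.port_en = False      # disable ports first
            self.pwr_en = False       # then power the router off

    RouterStackEntry(0).illuminate()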

3.3.3 darkNoC Layer Switch-Over Mechanism

The switch-over between network layers is an important design requirement for our darkNoC architecture. The main challenges are: a) the lossless data communication property of a packet-switched buffered NoC should be preserved, b) the switch-over mechanism should be transparent to software, and c) the switch-over mechanism should be efficient in terms of time and energy overhead. In our solution, the darkNoC Layer Manager (dLM) and the darkNoC Router Managers (dRMs) autonomously coordinate with each other to realise a switch-over mechanism with the aforementioned requirements. At a high level, the dLM ensures correct switch-over between two network layers

Algorithm 3.1: dLM Function
1  nodes = m;  // number of routers in a region
2  WaitForSwitchOverCommand();
3  for i = nodes to 0 do
4      SendDisableInjControlFlitTodRM(i);
5  WaitForNetworkFlush();
6  for i = nodes to 0 do
7      SendSwitchRouterControlFlitTodRM(i);
8  WaitForNetworkFlush();
9  for i = nodes to 0 do
10     SendEnableInjControlFlitTodRM(i);

Algorithm 3.2: dRM Function
   // Phase 1
1  WaitForDisableInjControlFlitFromdLM();
2  SwitchOnTargetRouter();
3  DisableInjectionFromNI();
4  if RouterLocatedAtRegionBorder then
5      SetStopFlagForNeighbourRouters();
6  WaitForRouterFlush();
7  SetEmptyFlag();
   // Phase 2
8  WaitForSwitchRouterControlFlitFromdLM();
9  DisableActiveRouterPorts();
10 EnableTargetRouterPorts();
11 PowerOffActiveRouter();
12 SetEmptyFlag();
   // Phase 3
13 WaitForEnableInjControlFlitFromdLM();
14 EnableInjectionFromNI();
15 if RouterLocatedAtRegionBorder then
16     ResetStopFlagForNeighbourRouters();

in the region. The dLM uses the injection channel of the NI of the router stack it is attached to, and sends different types of control flits to the dRMs through the NoC channels. Further, the dLM gathers the buffer occupancy information from the router stacks using a single-bit AND-gate network as shown in Figure 3.2(b).

Figure 3.4: Timing diagram for switch-over from router Ra to Rt (Rt: off, powering-on, on, enabled, active; Ra: active, flushing, disabled, on, off; over times t1 to t6).

Similar AND-gate networks have been used in NoCs for congestion detection [96]. The dRM of a router stack is attached to the ejection port of the NI, and receives control flits from the dLM. Based upon the control flits from the dLM, a dRM can: (a) power a router on or off, (b) enable or disable the input/output ports of a router, or (c) enable or disable the injection of packets into the NI. Further, a dRM generates a single-bit flag by analysing the buffer occupancies of its router to indicate the presence or absence of in-transit packets, which is propagated through the AND-gate network to the dLM.

Details of dLM and dRM: Algorithms 3.1 and 3.2 report the functionalities of the dLM and dRM, respectively, during the switch-over of a network layer. Figure 3.4 illustrates the switch-over mechanism for a router stack (other stacks similarly switch over their routers in parallel). On detecting a new switch-over command (from a system-level DVFS manager), the dLM starts sending control flits to all the dRMs to initiate the switch-over mechanism (Algorithm 3.1, lines 2–4). A dRM enters its first phase on receiving the control flits from the dLM (Algorithm 3.2, lines 2–5;

Figure 3.4 @t1). The dRM: (1) powers on the target router, and (2) stops the NI from injecting any new packets. If the dRM is controlling one of the border routers, then it also sets a flag to stop the neighbouring routers of other regions from injecting any new packets. The reason behind stopping the injection of packets is to ensure that all the in-transit packets reach their destination before the actual switch-over starts. Once the router buffers are flushed and no packets are injected by the neighbouring routers, the dRM sets the empty flag, which is propagated through the AND-gate

network. It is important to note that the powering-on of the target router and the flushing of the currently active router (Figure 3.4, t1–t3) happen simultaneously to speed up the switch-over mechanism.

Once the empty flags from all the dRMs have been propagated to the dLM through the AND-gate network, the dLM concludes the flushing of the currently active network layer. Then it starts another round of control flits to all the dRMs to start the actual switch-over of routers (Algorithm 3.1, lines 5–7). On receiving the control flit, a dRM enters its second phase (Algorithm 3.2, lines 8–12), where it: (1) disables the input/output ports of the active router (Figure 3.4, @t3), (2) enables the input/output ports of the target router (Figure 3.4, @t4), and (3) powers off the (previously) active router (Figure 3.4, @t5). Note that the order of (1) and (2) is essential to avoid short-circuit conditions. At the end, the dRM sets the empty flag, which is propagated through the AND-gate network. Once the dLM detects (from the AND-gate network) that the network is empty again, it starts the last round of control flits to all the dRMs (Algorithm 3.1, lines 8–10). This control flit forces a dRM to enter its third phase (Algorithm 3.2, lines 13–16), where the injection of packets into the NI is enabled (Figure 3.4, @t6). Further, if the dRM is controlling one of the border routers, then it enables injection of packets from the neighbouring routers of other regions. With this, the network layer switch-over mechanism is complete.
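The three-phase handshake of Algorithms 3.1 and 3.2 can be summarised with the following sequential software model; it is a sketch only, with flit transport and flushing abstracted away, and the AND-gate network modelled by all():

    # Sketch of the dLM/dRM coordination across the three phases.
    # Phase boundaries are gated on the empty flags, mimicking the
    # single-bit AND-gate network described in the text.
    class DRM:
        def __init__(self, border=False):
            self.border = border
            self.empty = False

        def phase1(self):
            # power on target router, stop NI injection, flush buffers
            if self.border:
                pass  # set stop flag for neighbouring regions' routers
            self.empty = True

        def phase2(self):
            # disable active ports, enable target ports, power off active
            self.empty = True

        def phase3(self):
            pass  # re-enable NI injection (and border stop flags)

    def dlm_switch_over(drms):
        for d in drms: d.phase1()
        assert all(d.empty for d in drms)   # AND-gate: layer flushed
        for d in drms: d.empty = False
        for d in drms: d.phase2()
        assert all(d.empty for d in drms)   # AND-gate: switch-over done
        for d in drms: d.phase3()

    dlm_switch_over([DRM(border=(i % 4 == 0)) for i in range(16)])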

 "W "$V&.  "W "$V&.

  !" "  !" "

 

  

 

  !11 H.Q0V`+ 1IV+^J!_ !11 H.Q0V`+ 1IV+^J!_   888 888 !J=VH 1QJ)`: V)^`C1 /LJQRVLJ/_ !J=VH 1QJ)`: V)^`C1 /LJQRVLJ/_

Figure 3.5: Time overhead of switch-over mechanism for NoC running @ 500MHz 64 CHAPTER 3. DARKNOC

"W "$V&.    "W "$V&.         !" "    !" "  !11 H.Q0V`+VJV`$7+^J0_  !11 H.Q0V`+VJV`$7+^J0_  888888 !J=VH 1QJ)`: V)^`C1 /LJQRVLJ/_ !J=VH 1QJ)`: V)^`C1 /LJQRVLJ/_

Figure 3.6: Energy overhead of switch-over from [500MHz,0.81V] to [750MHz,0.81V] layer

Time and energy overhead. We now present an analysis of the time and energy overhead of the switch-over mechanism in the darkNoC architecture. For this analysis, we assume that the transition of the VF level is done after the switch-over from an active network layer to a target network layer. We do not include the time and energy overhead of VF scaling because this overhead is present in the traditional single layer NoC as well. The time overhead in clock cycles of switch-over can be estimated as:

Tso = in-transit-packets(m, load) + no-load(m) + control-flits(m, load)

The in-transit-packets term represents the time to flush the active network layer, which depends on the number of routers m and the network load. The no-load term captures the time spent in the switch-over of all the routers, which only depends on m, while the control-flits term represents the time spent by the dLM in sending control flits to the dRMs and is dependent on both m and the network load. Since the whole switch-over mechanism occurs at the frequency of the active network layer, this frequency can be used to convert the switch-over clock cycles to actual time.
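The structure of the estimate, and the cycles-to-time conversion at the active layer's frequency, can be sketched as below; the three term functions are placeholders standing in for measured or modelled cycle counts:

    # Sketch of the switch-over time estimate Tso; only the structure
    # and the conversion follow the text, the lambdas are placeholders.
    def t_so_cycles(m, load, in_transit, no_load, control_flits):
        return in_transit(m, load) + no_load(m) + control_flits(m, load)

    def t_so_seconds(cycles, f_active_hz):
        # the whole mechanism runs at the active layer's frequency
        return cycles / f_active_hz

    cycles = t_so_cycles(16, 0.1,
                         in_transit=lambda m, l: int(50 * m * l),
                         no_load=lambda m: 4 * m,
                         control_flits=lambda m, l: int(10 * m * (1 + l)))
    print(t_so_seconds(cycles, 500e6), "s at 500 MHz")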

The energy overhead of the switch-over mechanism can be estimated as:

Eso = power-on-energy(Lt, m) + control-flits(Lt, La, m) + leakage-power(Lt, La, m) × Tso

where La and Lt refer to the active and target network layers respectively. The first term power-on-energy is the energy overhead of powering-on a network layer, which depends on the type of the target network layer and the number of routers m in it. During a typical switch-over, the control flits from the dLM use both the active and target network layers, and their energy consumption is captured in the control-flits term. The last term includes the leakage energy of the active and target layers, since both the layers consume leakage power for almost the whole duration of the switch-over mechanism.
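A corresponding sketch of the energy estimate is shown below; again, the term functions and all numeric values are illustrative placeholders, not characterised data:

    # Sketch of the switch-over energy estimate Eso. Note the leakage
    # term charges both the active (La) and target (Lt) layers for
    # (almost) the whole switch-over time Tso.
    def e_so(Lt, La, m, t_so_s, power_on, control, leak_w):
        return (power_on(Lt, m)
                + control(Lt, La, m)
                + leak_w(Lt, La, m) * t_so_s)

    energy = e_so("L_750MHz", "L_1GHz", 16, 2e-6,
                  power_on=lambda lt, m: 1e-9 * m,       # J, placeholder
                  control=lambda lt, la, m: 0.5e-9 * m,  # J, placeholder
                  leak_w=lambda lt, la, m: 5e-3)         # W, placeholder
    print(f"{energy * 1e9:.1f} nJ")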

We experimented with 4 × 4 and 8 × 8 mesh sizes, transpose random (TR) and uniform random (UR) traffic patterns, and varying traffic injection rates (details are in Section 3.4). The results for time and energy overhead are presented in Figures 3.5 and 3.6, respectively. Three trends are evident. The time and energy overhead: (1) increases with an increase in mesh size due to a higher number of control flits and hops, and longer delays in the AND-gate network, (2) increases with an increase in the traffic injection rates due to longer flushing times of in-transit packets and delays of control flits, and (3) heavily depends on the network load, as the trends are different for the TR and UR traffic patterns. It is important to note that the time and energy overhead of our switch-over mechanism is not significant compared to the total execution time of typical applications and the energy consumption of the NoC (see Section 3.4).

Topology: 4 × 4 mesh
Router Architecture: 5-port router (4 neighbours + 1 local), 3-stage pipeline [LT+BW, RC+SA, ST]
Input Buffer: 8 flits deep (no virtual channels)
Routing: Dimension Order XY
Link Flow Control: On/Off
Link Width: 64-bit data, 8-bit control
Switch Arbiter: Matrix arbiter

Table 3.1: NoC architectural details.

Frequency     | 1 GHz | 750 MHz | 500 MHz | 250 MHz
Voltage       | 0.9 V | 0.81 V  | 0.81 V  | 0.72 V
Area [µm2]    | 49200 | 46313   | 45750   | 45225
HVt Cells [%] | 44.7  | 60.5    | 90.7    | 92.8
NVt Cells [%] | 12.5  | 13.6    | 4.3     | 3.9
LVt Cells [%] | 42.8  | 25.9    | 5.0     | 3.3

Table 3.2: Synthesis results for a single router.

3.4 Experiments and Results

3.4.1 NoC Synthesis

We use a packet-switched wormhole NoC architecture for our experiments. The details of the NoC architecture are reported in Table 3.1. The proposed router is implemented in Verilog RTL, and synthesised using Synopsys Design Compiler with commercial TSMC 45nm libraries characterized for HVt, NVt and LVt cells. We enabled leakage and dynamic power optimisation in Synopsys Design Compiler, which automatically enabled multi-Vt optimisation.

We explored the following four VF levels: [1GHz, 0.9V], [750MHz, 0.81V], [500 MHz, 0.81V] and [250 MHz, 0.72V]; the synthesis results are reported in the last four columns of Table 3.2. We can observe that the target VF level significantly affects the type and number of cells used during optimisation. For example, the router optimised for [1GHz, 0.9V] contains 42.8% LVt cells, which decreases to 5.0% and 3.3% for optimisation at [500 MHz, 0.81V] and [250 MHz, 0.72V], respectively. Moreover, the cells are sized according to the target VF level, which has resulted in different silicon areas for different target VF levels (row 2). These results corroborate our motivational example that it is worthwhile to explore multi-Vt optimisation in NoCs for energy efficiency.

Topology: 4 × 4 mesh
Processors: Tensilica in-order LX4 @ 1GHz
L1 I/D Caches: 4KB, 2-way set associative, 16 byte line size
DRAM: 2 memory controllers, 40 ns latency
NoC VF Levels: [1GHz, 0.9V], [750MHz, 0.81V], [500 MHz, 0.81V], [250 MHz, 0.72V]

Table 3.3: MPSoC architectural details.

AM-1: mpeg2enc, sha, mpeg2dec, g721enc
AM-2: h264enc, mpeg2dec, sha, jpeg dec
AM-3: jpeg enc, sha, g721enc, g721dec
AM-4: g721enc, mpeg2enc, g721dec, mpeg2dec

Table 3.4: Details of application mixes with four copies of each application.

3.4.2 Experimental Setup

MPSoC and Applications. We used a 16-node MPSoC laid out as a 4 × 4 2D-mesh, whose architectural details are in Table 3.3. Four VF levels are assumed for the NoC, whose architectural details are in Table 3.1. Note that the VF level of the processors is not changed in our experiments. We used eight applications from the MediaBench suite and created diverse multi-programmed application mixes (AMs) whose details are in Table 3.4. For an application mix, four copies of each application in that mix were used. We also used synthetic traffic injection models consisting of Uniform Random (UR), Bit Complement (BC) and Transpose (TR).

Simulation Setup. We used a two-step system simulation methodology where

Name        | Configuration                                    | Relative Chip Area
baselineNoC | Traditional NoC with [1GHz, 0.9V] layer          | 1×
darkNoC1    | [1GHz, 0.9V] + [750MHz, 0.81V]                   | 1.13×
darkNoC2    | [1GHz, 0.9V] + [500MHz, 0.81V]                   | 1.13×
darkNoC3    | [1GHz, 0.9V] + [750MHz, 0.81V] + [500MHz, 0.81V] | 1.26×

Table 3.5: NoC configurations used in experiments.

the memory access trace of each application executing on a processor is collected from the Xtensa instruction set simulator. These memory access traces are then simulated through a closed-loop cycle-accurate NoC and DRAM simulator. Our NoC simulator accurately modelled the different VF levels of the NoC, and also modelled the dLM, the dRMs and the AND-gate network. For the application mixes, each processor was executed for at least 2 million instructions. For the synthetic traffic injection models, the NoC was warmed up for 100,000 clock cycles, followed by collection of statistics for 80,000 clock cycles. The injection rates were measured in flits/node/ns.

NoC Power Estimation. We used the Synopsys PrimeTime tool to analyse the netlist generated by Synopsys Design Compiler and compute the leakage and dynamic power of a NoC. We first apply this analysis to compute power values at the nominal VF level of the NoC. Then, scaled VF conditions are applied to compute power values at VF levels lower than the nominal VF level. These computed power values are used in the NoC simulator to compute the total NoC energy consumption. This methodology is also used for all the network layers in a darkNoC.
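The scaling step can be sketched with the usual first-order models (dynamic power proportional to V²f, leakage roughly proportional to V at a fixed temperature); the exponents and baseline numbers below are illustrative assumptions, not the thesis' characterised values:

    # First-order sketch of rescaling router power from the nominal VF
    # point to a lower one: P_dyn scales with V^2 * f, leakage roughly
    # with V. Baseline numbers are illustrative only.
    def scale_power(p_dyn_nom, p_leak_nom, v_nom, f_nom, v, f):
        p_dyn = p_dyn_nom * (v / v_nom) ** 2 * (f / f_nom)
        p_leak = p_leak_nom * (v / v_nom)   # crude linear leakage model
        return p_dyn + p_leak

    p = scale_power(p_dyn_nom=20e-3, p_leak_nom=8e-3,
                    v_nom=0.9, f_nom=1e9, v=0.81, f=500e6)
    print(f"{p * 1e3:.2f} mW at [500 MHz, 0.81 V]")

Note how the leakage term shrinks far more slowly than the dynamic term under such scaling, which is the root of the diminishing DVFS returns discussed in the Introduction.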

NoC Configurations. We used different NoC configurations in our experiments, which are listed in Table 3.5. The baselineNoC represents the traditional single layer NoC designed for the highest VF level, [1GHz, 0.9V]. For fairness of comparison, we synthesised baselineNoC with multi-Vt optimisation. The darkNoC1 and darkNoC2 configurations have 2 VF-optimised network layers each, while the darkNoC3 configuration is a superset of darkNoC1 and darkNoC2. In Table 3.2, the difference in silicon area and distribution of multi-Vt cells between the [500 MHz, 0.81V] and [250 MHz, 0.72V] optimised routers (columns 4 and 5) is small. This means that two network layers optimised for [500 MHz, 0.81V] and [250 MHz, 0.72V] will have nearly identical properties, and thus similar energy efficiencies. Therefore, a designer can use synthesis results for early elimination of network layers in a darkNoC. That is why we did not include a [250 MHz, 0.72V] optimised network layer in our experiments.

The last column of Table 3.5 reports the relative chip area of the NoC configurations, measured as the sum of the areas of the processors, caches, network layers, and the dLM and dRMs. A designer can use darkNoC1 or darkNoC2 if they have a dark silicon budget of a single network layer. Likewise, they can use darkNoC3 if the dark silicon budget permits the inclusion of two network layers.

System-level DVFS Manager. We used the state-of-the-art memory-latency-based NoC DVFS technique introduced by Chen et al. [118] for the baselineNoC, with a control interval of 50K clock cycles. For the darkNoC configurations, we modified their technique as follows. If the DVFS manager decides to switch to the i-th VF level and there is a network layer optimised for the i-th VF level, then the DVFS manager requests the dLM to switch over to that particular network layer. On the other hand, if there is no network layer optimised for the i-th VF level, then the DVFS manager requests the dLM to switch over to the network layer optimised for the closest yet higher VF level, and scales the VF of the selected network layer. For example, in darkNoC1, if the system-level DVFS manager decides to operate the NoC at [250 MHz, 0.72V], then the [750 MHz, 0.81V] network layer will be scaled down to operate at [250 MHz, 0.72V] rather than the [1 GHz, 0.9V] network layer. We created two flavours of DVFS managers based upon application requirements: DVFS-1 with a target performance loss of 15% and DVFS-2 with a target performance loss of 10%. Note that our NoC simulator included the time and energy overhead of network layer switch-over for the darkNoC configurations.
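The layer selection rule can be summarised with the following sketch (our own restatement of the policy above; the tuple representation of layers is an assumption for illustration):

    # Sketch of the modified DVFS decision: prefer a layer optimised for
    # the requested VF level; otherwise pick the layer with the closest
    # higher VF and scale it down. Layers/levels are (freq_MHz, volt).
    def pick_layer(requested, layers):
        if requested in layers:
            return requested, requested          # native layer, no scaling
        higher = [l for l in layers if l[0] >= requested[0]]
        layer = min(higher, key=lambda l: l[0])  # closest higher-VF layer
        return layer, requested                  # scale this layer down

    darknoc1 = [(1000, 0.90), (750, 0.81)]
    print(pick_layer((250, 0.72), darknoc1))
    # -> ((750, 0.81), (250, 0.72)): the 750 MHz layer runs scaled down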

3.4.3 Results and Discussion

Figure 3.7: NoC Energy-Delay Product (EDP) normalised w.r.t. the traditional NoC operating at the highest VF level.

Overall Trend in NoC Energy-Delay Product (EDP) Savings. Figure 3.7 shows the savings in NoC EDP for the four application mixes of Table 3.4, the four NoC configurations of Table 3.5 and the two DVFS managers. Overall, the darkNoC configurations show an improvement in EDP over baselineNoC.

Figure 3.8: Distribution of NoC VF cycles.

For AM-2, DVFS-1 reduced EDP by only 5% in baselineNoC. On the other hand, EDP is reduced by 34% and 44% by DVFS-1 in darkNoC1 and darkNoC3, respectively.

Likewise, for AM-3, DVFS-1 in baselineNoC saved 15% EDP, whereas DVFS-1 in darkNoC1 and darkNoC3 reduced EDP by 45% and 52%, respectively. In summary, darkNoC provides up to 56% EDP savings (AM-1, darkNoC3, DVFS-2) over baselineNoC. These results show that DVFS alone can be ineffective in reducing NoC energy due to the increasing leakage-to-dynamic power ratio. Thus, our darkNoC architecture complements system-level DVFS managers.

Interplay of Applications, DVFS Manager and darkNoC Configurations. EDP savings vary across DVFS managers and darkNoC configurations. For example, in AM-2, AM-3 and AM-4, the use of DVFS-2 with darkNoC1 is far more beneficial than DVFS-2 with darkNoC2. Likewise, for AM-3 and AM-4, DVFS-1 with darkNoC2 results in more EDP savings than DVFS-1 with darkNoC1. To further explain these results, in Figure 3.8 we report the distribution of NoC VF cycles, which is the percentage of total execution cycles for which the NoC operates at a particular VF level. For AM-2, DVFS-2 operates the NoC at the [750MHz, 0.81V] and [500MHz, 0.81V] levels for 71.4% and 27.4% of the time, respectively. Thus, darkNoC1 results in better EDP savings because it has a [750MHz, 0.81V] optimised network layer. In the case of AM-4, DVFS-1 with darkNoC3 provided only 1% EDP improvement over darkNoC2, at the expense of an additional network layer. This is because the NoC is operated at the [500MHz, 0.81V] level for 95% of the time, and thus a darkNoC with a [500MHz, 0.81V] optimised network layer is more than sufficient.

Insights. From these results, we can conclude that the savings in energy-delay product depend on:

• darkNoC configuration
• System-level DVFS manager
• Applications

There is a complex interaction between these factors. For example, the energy efficiency of a darkNoC configuration (number and type of network layers) heavily depends upon the dark silicon budget and the distribution of VF cycles, which in turn depends upon the target performance loss set in the system-level DVFS manager and the executing applications. Taking it one level further, the decisions taken by the system-level DVFS manager depend on the number and type of network layers in the darkNoC configuration, which results in a cyclic dependency between these factors. Thus, the problems (highlighted in the Introduction) of: (1) determining the number and type of network layers, and (2) enhancing the system-level DVFS manager to exploit multiple network layers, are interdependent. Consequently, these problems result in a multidimensional design space exploration problem.

3.4.4 Results for Static VF Selection

In addition to using DVFS, we also explored the case where a static VF (SVFS) is applied to the NoC throughout the AM execution. These experiments were repeated with the same set of darkNoC configurations. We present our results for the following SVFS schemes:

• SVFS-250: Static VF level of [250 MHz, 0.72 V]
• SVFS-500: Static VF level of [500 MHz, 0.81 V]
• SVFS-750: Static VF level of [750 MHz, 0.81 V]

The results for the experiments with SVFS are presented in Figure 3.9 and are similar to our results for the DVFS-based scenario. As expected, running the NoC continuously at [750MHz, 0.81V] for the darkNoC2 configuration does not result in any energy saving compared to the baseline configuration, as there is no NoC plane designed for that particular VF level. For the VF levels [250 MHz, 0.72 V] and [500 MHz, 0.81 V], all darkNoC configurations give lower EDP than the baseline. However, the darkNoC2 and darkNoC3 configurations give the best result, as both configurations contain a NoC designed for [500 MHz, 0.81 V].

Figure 3.9: NoC EDP for the SVFS schemes, normalised w.r.t. baselineNoC.

3.4.5 Results with Synthetic Traffic

Figure 3.10(b) shows the network power for three VF levels for the UR traffic model. An overall trend is that it becomes power inefficient to apply DVFS on a network layer designed for a higher VF level as the difference between the VF level for which the network layer was designed and the VF level at which the network layer is being operated increases. This is the reason that, for operation at the [250 MHz, 0.72V] VF level, the NoC designed for that particular VF level can save on average 37% more power than DVFS applied to the NoC designed for [1GHz, 0.9V], and 20% more power than DVFS applied to the NoC designed for [750 MHz, 0.81V]. Results for TR and BC traffic are presented in Figures 3.11 and 3.12. Combining the results from the three synthetic traffic patterns, we can observe that the usefulness of the darkNoC architecture depends on the relative ratio of dynamic and leakage power. The UR traffic has the highest saturation throughput whereas the TR traffic has the lowest. The saturation throughput directly translates into dynamic power, as the network can sustain a higher injection rate. Therefore, we can observe that darkNoC provides better energy efficiency for TR traffic than for UR traffic.

Figure 3.10: Network power for UR traffic for different VF levels.

Figure 3.11: Network power for TR traffic for different VF levels.

Figure 3.12: Network power for BC traffic for different VF levels.

3.5 Conclusion

In this chapter we presented darkNoC, a novel NoC architecture consisting of multiple network layers to complement system-level DVFS managers. Each network layer is optimised at design-time to operate within a voltage-frequency range using multi-Vt circuit optimisation. We utilise the dark silicon that will be available in future chips to realise the extra network layers. Compared to a traditional single layer NoC, darkNoC enabled system-level DVFS to achieve up to 56% better energy-delay product. We believe that darkNoC will open new research avenues in energy-efficient NoC design for the dark silicon era.

Chapter 4

Malleable NoC: Dark Silicon Inspired Adaptable NoC

4.1 Introduction

Network-on-Chips (NoCs) are used in multi-core chips as a scalable communication fabric. As technology nodes shrink and the number of cores in chips increases, designers continue to face the challenging task of balancing NoC power and performance. The NoC, being a shared resource in a chip, has a strong role to play in application performance and energy consumption. Thus, numerous researchers have introduced power saving techniques which minimally affect application performance. Dynamic voltage and frequency scaling (DVFS) [114,115] is a prominent technique which exploits the application execution phase to reduce the voltage and frequency of the NoC.

Three aspects motivate this work:

• The first aspect is that general-purpose multi-cores execute a diverse set of applications. Each application interacts with the cache architecture uniquely, hence the NoC traffic injection rates can vary across applications. The behaviour of two such applications is presented in Fig 4.1. In this architecture, the L1 cache is

the last level cache (before the misses are transported on the NoC). It is evident from Fig 4.1 that the misses per kilo instruction (MPKI) are very different for the two applications, H264 and MPEG2ENC. On average, the MPKI for H264 is 40× that of MPEG2ENC. Due to a higher cache miss rate, H264 has a higher dependence on NoC latency. Therefore, the NoC may have to be clocked at a higher frequency. On the other hand, MPEG2ENC has a very low cache miss rate, providing an opportunity to save energy by clocking the NoC at a lower frequency without any significant system performance degradation. However, simultaneous execution of the two applications will require the NoC to be run at a higher frequency to support the worst-case application.

Figure 4.1: L1 MPKI for 2 different applications

• The second aspect is that, in addition to the applications' MPKIs, the application-to-core mapping in a mesh NoC based multi-core also has a considerable impact on performance and energy. An example 4 × 4 mesh is shown in Figure 4.2. Cache load and store misses from the cores get serviced by the memory controller. The cache load time penalty is directly proportional to the hop distance between the core and the memory controller. The two applications mentioned earlier can be mapped to either core A or core B. If H264 is mapped to core B, the NoC frequency will not have any significant effect on system performance. However, if H264 is mapped to core A, the NoC frequency could play a vital role since H264 has a higher MPKI. On the

!" "

Figure 4.2: Application Mapping for 4 4 Mesh NoC (MC: Memory Controller) ⇥ other hand, due to lower MPKI of MPEG2ENC,mappingMPEG2ENC to core A or B doesn’t matter, and irrespective of mapping, NoC can be operated at a lower frequency. Intuitively, if the two application must be mapped at the same time, H264 to core B and MPEG2ENC to core A. Given such mapping, the grey routers can be clocked at a lower frequency, while green routers can be clocked at a higher frequency. In this discussion, the voltage can be adjusted with the frequency to save power as needed. Per router DVFS (where certain routers work faster, while others run slower at the same time) makes sense in scenarios where applications with markedly di↵ering MPKIs are executed simultaneously. Although DVFS is a useful technique, its e↵ectiveness has been reduced with • shrinking node size/technology. Leakage power is a major contributor towards total NoC power (and is expected to grow in proportion in the future). Leakage power remains largely una↵ected with reduction in operating voltage. To counter this phe- nomenon, darkNoC (proposed in Chapter 3), which uses dark silicon to complement the system level DVFS by the use of muti-Vt based multiple routers instead of a single router optimised for the highest frequency. The proposed architecture consists of a multi-layer NoC, where each NoC layer is optimised for a particular voltage and frequency(VF). Using darkNoC, with the associated pre-set VF transitions, system 82 CHAPTER 4. MALLEABLE NOC switches between di↵erent NoC layers. However, the architecture proposed in Chap- ter 3 is limited to cases where all routers lie in a single VF island. Therefore, the darkNoC architecture cannot be used with per-router DVFS. Thus the third and final aspect is the need for a NoC where the routers can work simultaneously at di↵ering frequencies and utilise the dark silicon area to save energy. Based on our observations, we introduce a new and novel low power NoC for dark silicon chips. We call this architecture Malleable NoC.Ateachmeshnode (where traditionally only one router lies), we integrate architecturally homogenous, yet circuit-wise heterogeneous routers designed for di↵erent frequencies and voltages that support pre-set VF switching of each router. Given such a NoC, at runtime, depending on the dynamic profiling of the applications, we choose a di↵erent router from each node (hence the term Malleable), and connect these routers to create a low power NoC fabric. The composition of this NoC fabric adapts to the dynamic changes in the application phase. Routers that are not used in a particular phase remain switched o↵(dark). We further explore the fact that intelligent mapping of applications to cores can be exploited for ecient per-node frequency and router selection.

4.1.1 Contributions:

We make the following contributions in this chapter:

• We introduce a dark silicon-inspired NoC architecture called Malleable NoC. We discuss the technical challenges that must be overcome to realize such an architecture with per-router VF switching control.

• We introduce a novel per-router VF selection algorithm. This algorithm uses application profiling and mapping information for runtime management of Malleable NoC resources.

• We use full system simulations with diverse applications to show the efficacy of the proposed architecture. We show that our architecture is most useful when applications have diverse MPKIs.

4.2 Related Work

Recently, various heterogeneous NoCs have been proposed for general-purpose multi-core chips. Mishra et al. [11] proposed integrating two architecturally heterogeneous NoCs in a chip. Each NoC is designed with a different operating frequency, data width and Virtual Channel (VC) configuration. One network is optimised for low latency while the other is optimised for bandwidth. At runtime, processors inject traffic into one of the networks based on application profiling. In contrast, our proposed architecture creates only one NoC fabric out of the various per-node routers. Bakhoda et al. [167] proposed a NoC with heterogeneous routers using a checkerboard configuration, in which some of the regular routers are replaced by a router with limited connectivity. Mishra et al. [168] also proposed a heterogeneous NoC at router granularity by integrating big and small routers, which differ in terms of channel width and number of VCs. In contrast, we introduce heterogeneity by integrating architecturally homogeneous but VF-optimised routers at each node and switching on one router at each node depending on the application profile. This allows us to create virtual layers at runtime, making our NoC architecture malleable.

Over the years, various system-level power management techniques have been introduced. DVFS for on-chip networks has been proposed at various levels of granularity, such as link-level [113], router-level [114,115] and region-level [118,169,170]. The DVFS algorithms used in [113–115] are oblivious of the applications' mapping and performance, and use NoC queue occupancy or network congestion to select a frequency. Chen et al. [118] presented DVFS algorithms based on application profiling. However, these algorithms do not take into account scenarios where potentially heterogeneous applications are executed. Furthermore, all of these research studies assume fast on-chip voltage regulators, which are impractical for per-router DVFS due to low voltage conversion efficiency and design complexities [46]. In contrast to all these previous studies, we consider both application-to-core mapping and application profiling to decide the optimum frequency at the granularity of a router. To reduce the design complexity and improve efficiency, we explore the use of multiple supply voltage rails. Nakamura et al. [56] used multiple voltage supply rails to propose a variable pipeline router in which the number of pipeline stages is changed in response to a voltage change, while keeping the frequency constant. In contrast, in our proposed architecture we switch between the VF-optimised routers and change the operating frequency in response to voltage changes.

Pullini et al. [162] explored the usefulness of the multi-Vt optimisation technique in NoC synthesis for leakage power saving. Bokhari et al. [170] used multi-Vt optimisation and exploited dark silicon to provide a multi-layer NoC, where each NoC layer is optimised for a given VF. This scheme considerably improved the efficiency of system-level DVFS schemes. In this work we show that, in addition to multi-Vt optimisation, application profiling and mapping, and per-router VF selection can further reduce the NoC energy.

4.3 Malleable NoC Architecture

4.3.1 NoC Architecture:

Our target architecture is a general-purpose tiled multi-core chip as shown in Figure 4.3, which closely matches commercial SoC chips such as the Intel Single Chip Cloud Computer (SCC) [110] architecture. We assume a processing core with a private last level cache. Cache load and store requests are injected into the NoC by the cores. Memory controllers are connected to border routers as shown in Figure 4.3 to serve these load and store requests. We target an N × N mesh topology based NoC architecture in this work as it is commonly used by commercial and academic multi-core chips. Each core runs a different program; therefore, at a given time, up to N × N applications are executed in parallel.

4.3.1.1 Per Router VF Selection:

Normally, high-efficiency off-chip voltage regulators are used to scale a high supply voltage down to a low voltage for the purpose of DVFS. This scheme has been used for region-based DVFS for the NoC in the Intel SCC [110]. However, the voltage transition takes considerable time, which can be as high as 1 millisecond [110]. This limits the usefulness of DVFS algorithms for NoCs. Off-chip voltage regulation is also an impractical option for fine-grain per-router DVFS due to limited chip pins, and a large number of voltage regulators would be required for large mesh sizes such as 8 × 8. There are two different approaches to support fast per-router DVFS: 1) use high switching frequency on-chip voltage regulators [2] for fine-grain DVFS, or 2) use multi-voltage rails [46] for coarse-grain VF selection. On-chip voltage regulators provide very fast voltage switching. Analysis by Kim et al. [2] shows that a voltage transition can take place within 100-200 ns. However, on-chip voltage regulators suffer from a low conversion efficiency of about 75-80%.

Multi-voltage rail (MVR) design enables fast coarse-grain switching between different voltages without compromising efficiency. The voltages are generated through high-efficiency off-chip voltage regulators. Mathematical modelling performed by Miller et al. [46] shows that voltages can be switched within 7-10 ns at runtime. This can reduce the performance overhead of DVFS schemes. However, it limits the possible VF levels for routers. In our architecture, we employ MVR for each router to realise per-router voltage scaling. For frequency scaling, a global clock is generated and distributed using a normal clock tree. At each node, a scaled frequency is generated using local clock dividers. In this architecture, the master frequency is scaled down in powers of 2 only, i.e., f, f/2 and f/4. This keeps the NoC routers ratio-synchronous and reduces the design complexity and performance penalty associated with bisynchronous FIFOs [166]. We implemented per-router VF switching by combining fast voltage switching using MVR and local frequency division.

Figure 4.3: Malleable NoC (tiled multi-core with a Local Router Manager per node; e.g., node i at [V1, F] and node i+1 at [V2, F/2]).
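The pairing of power-of-two clock dividers with the shared voltage rails can be sketched as below; the rail assignments are illustrative (they follow the two VF levels used later in this chapter) rather than a fixed specification:

    # Sketch of per-router VF selection via multi-voltage rails (MVR)
    # and local clock division: the master clock f is divided by powers
    # of two, and each divided frequency is paired with a shared rail.
    MASTER_HZ = 1e9
    RAILS = {1: 0.90, 2: 0.81}   # divider -> supply rail (V1, V2)

    def local_vf(divider):
        assert divider in RAILS, "only power-of-two dividers with a rail"
        return MASTER_HZ / divider, RAILS[divider]

    print(local_vf(2))   # -> (500000000.0, 0.81)

Keeping the dividers to powers of two is what keeps neighbouring routers ratio-synchronous, avoiding fully asynchronous crossings.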

4.3.1.2 Dark Silicon Inspired Multiple Routers with Multi-Vt Optimisation:

From our discussion in previous chapters, we know that we have more than enough silicon resources available due to the dark silicon phenomenon. In Chapter 3, we proposed using dark silicon to improve the efficiency of DVFS for the NoC, enabled by a well-known circuit synthesis technique called multi-Vt optimisation. In the darkNoC architecture, all of the routers are placed in a single VF island. The Malleable NoC differs from darkNoC in three aspects: a) Malleable NoC uses per-router VF selection, b) the Malleable NoC architecture does not impose the constraint of replicating routers at every node, and c) the Malleable NoC framework integrates application-to-core mapping information into the VF selection algorithm.

1. Routers for [f,V 1] 2. Routers for [f,V 1], [f/2,V 2]

Each router is optimised for a particular VF using multi-vt optimisation. The single router in the Configuration 1 is connected to both voltage rails. Whereas, for Configuration 2 the router designed for [f,V 1] is attached to voltage rail V 1andthe router designed for [f/2,V 2] is connected to V 2. If there is no router available for a target VF, the router designed for the closest VF is used instead. Therefore, if a node needs to be operated at [f/2,V 2] and there is no router designed for [f/2,V 2] integrated at that node, the router designed for [f,V 1] is used by connecting it to the V 2rail.Multiplexersareusedtoconnectoneoftheroutersatanodetoupstream routers. These multiplexers add minimum hardware and do not a↵ect the critical path of the router.

4.3.1.3 Malleable NoC Management:

Routers at each node are managed by a Local Manager (LM). The LM controls the power gating signal and also selects the clock for each node. The LM can be addressed by sending a single flit packet over NoC. The LM also monitors whether all the bu↵ers in the routers are empty. All the LMs are managed by one Global Manager (GM) on the chip. This GM can be a part of existing NoC power con- trollers. At runtime, the LM and GM coordinate autonomously to implement the VF selection scheme as shown in Algorithm 4.1 and Algorithm 4.2. The RunV F selectAlgo() function called in Algorithm 4.1 is the actual per router DVFS selection algorithm which will be explained in details later in this section. The RunV F selectAlgo() function returns the frequency setting for each router in 4.3. Malleable NoC ARCHITECTURE 89

Algorithm 4.1: Runtime NoC Management by GM 1 nodes = m; // number of routers in chip 2 MPKI = [ ] 3 for i=nodes to 0 do 4 InstCount,CacheMiss = recvProfileData(); 5 MPKI[i] = CacheMiss / InstCount;

6 F requnecyMap = RunVFselectAlgo(MPKI[]); // get the new frequency for each router 7 for i=nodes to 0 do 8 StopNIsLM (i);

9 WaitForNoCEmpty(); 10 for i=nodes to 0 do 11 if FrequnecyMap[i] has changed then 12 SendNewFreqToLM (i);

13 WaitForNoCEmpty(); 14 for i=nodes to 0 do 15 EnableNIsLM (i);

Algorithm 4.2: Local Manager Operation 1 while (True) do 2 command =RecvCommand(); 3 if command == StopNI then 4 StopNI();

5 else if command == StartNI then 6 if Di↵erentRouterWasRequired then 7 SwitchO↵OldRouter();

8 StartNI();

9 else if command == ChangeFreq then 10 if Di↵erentRouterRequired then 11 SwitchOnNewRouter();

12 else 13 SwitchToNewVoltage();

14 ChangeFrequency();

the mesh. First, the GM stops all Network Interfaces (NIs) from injecting new traffic to ensure no data is lost when routers are switched on or off. Once the NoC is free of any traffic, the GM sends the new frequency information to each router. As mentioned earlier, depending on the architectural configuration of the routers at each node and the new frequency, the LM turns on the new router, changes the voltage and selects the appropriate frequency. Once the new frequency setting for each router has been set, the GM sends a control flit to each LM to enable normal operation.

4.3.2 Application Mapping and Profiling

As discussed in Section 4.1, we exploit application mapping to perform efficient per-router VF selection. Our mapping is based on policies introduced by Das et al. [9]. We divide our multi-core chip into memory islands based on the number of memory controllers. For example, in the architecture in Figure 4.3, there are four memory controllers and therefore we have four memory islands. Applications mapped in a memory island get their L1 cache misses served by the associated memory island. The applications to be mapped are sorted on the basis of L1 MPKI. These applications are then allocated to each memory island in a round-robin order. Within each memory island, applications with the highest MPKI are mapped closest to the memory controller. Das et al. [9] showed that such mapping significantly improves the overall system throughput. This method ensures that applications that require the least NoC service are mapped to the centre of the mesh, while applications with high communication requirements are mapped to the outer periphery of the mesh (closer to a memory controller). Moreover, this mapping scheme more or less balances the load across the NoC and on each memory controller. We assume that an average L1 MPKI for each application is known a priori.

Note that the L1 MPKI is only dependent on the ISA and cache architecture, and is independent of the NoC parameters and mapping. However, for the purpose of runtime per-router VF selection, we regularly sample the L1 MPKI after every control interval. We assume that each processing core contains an L1 cache miss counter and an instruction counter. Such counters are normally included with the core for performance measurement. The number of instructions executed and the number of cache misses are collected at each core and sent over the NoC to the GM after each control interval.
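The mapping policy can be sketched as below (our own restatement, after Das et al. [9]; the MPKI values are illustrative and the within-island list order stands in for increasing distance from the memory controller):

    # Sketch of the MPKI-driven mapping policy: sort applications by L1
    # MPKI, deal them round-robin to memory islands, and within an
    # island place higher-MPKI applications closer to the controller.
    def map_applications(app_mpki, n_islands):
        ranked = sorted(app_mpki, key=app_mpki.get, reverse=True)
        islands = [[] for _ in range(n_islands)]
        for i, app in enumerate(ranked):
            islands[i % n_islands].append(app)   # round-robin allocation
        # within an island, list order = increasing distance from the MC
        return islands

    apps = {"h264": 40.0, "dijkstra": 16.0, "mpeg2enc": 0.5, "sha": 0.5}
    print(map_applications(apps, n_islands=4))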

Algorithm 4.3: VF Selection Algorithm
1  FrequencyMap = [][];  // frequency for each router
2  cores = m;  // number of cores in chip
3  for i = 0 to cores - 1 do
4      SensitivityMetric[i] = getMPKI(i) * getDistanceMC(i);
5  for i = 0 to cores - 1 do
6      if (SensitivityMetric[i] > Threshold1) then
7          SetRoutersInPath(i, FrequencyMap, MaxFrequency);
8      else
9          SetRoutersInPath(i, FrequencyMap, LowFrequency);
10 return FrequencyMap;

4.3.3 VF Selection Algorithm

We now present our novel VF selection algorithm, based on two key observations. The first observation is that applications have different L1 MPKIs, and this value can change over time, providing an opportunity for run-time NoC VF switching. Moreover, as the MPKI increases, the application spends more time waiting for data to be fetched from DRAM, and therefore the NoC latency plays a vital role in application execution time. The second observation is that the execution time of an application increases when it is mapped further away from the memory controller, as it takes longer for cache read misses to be served. Therefore, VF selection algorithms for the NoC should take into account both the application's MPKI and its distance from the memory controller. These observations can be summarized formally as:

Execution Time ∝ L1 MPKI × Distance from MC

We base our VF selection algorithm on this relation, as shown in Algorithm 4.3. This algorithm is executed by the GM after each control interval. The applications are initially mapped according to the procedure mentioned earlier in this section. Note that, in this work, we assume that the application mapping is not changed at runtime. At runtime, each application's L1 MPKI is monitored and transferred to the GM. These MPKI values are then multiplied by the distance of the core from the memory controller node. We call the resultant product the Sensitivity Metric. Then, the value of the Sensitivity Metric is compared with preset threshold values. In Algorithm 4.3, we assume that there are two possible frequencies for VF selection. Based on the Sensitivity Metric and the threshold values, we select the new frequency. This frequency is selected for all the routers that lie on the request path from the core to the memory controller and the reply path from the memory controller to the core. We assume a static routing algorithm such as dimension order XY routing. For example, in Figure 4.2, the routers marked grey and green mark the request and reply paths for core A. These steps are repeated for all the cores in the system. The output of the VF selection algorithm is a 2D frequency map containing a frequency setting for each router in the mesh.
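A self-contained sketch of Algorithm 4.3 with explicit XY request/reply paths is shown below; the coordinates, threshold and MPKI values are illustrative, and resolving overlapping paths toward the higher frequency (via max) is our own assumption for the sketch:

    # Sketch of Algorithm 4.3 on a mesh: routers on a sensitive core's
    # XY paths to/from its memory controller get the high VF.
    def xy_path(src, dst):
        (x, y), (dx, dy) = src, dst
        path = [(x, y)]
        while x != dx:                 # route X first (dimension order)
            x += 1 if dx > x else -1
            path.append((x, y))
        while y != dy:                 # then route Y
            y += 1 if dy > y else -1
            path.append((x, y))
        return path

    def vf_select(cores, mc, threshold, f_hi, f_lo):
        fmap = {}
        for core, mpki in cores.items():
            dist = abs(core[0] - mc[0]) + abs(core[1] - mc[1])
            freq = f_hi if mpki * dist > threshold else f_lo
            for node in xy_path(core, mc) + xy_path(mc, core):
                # overlapping paths resolve to the higher frequency
                fmap[node] = max(fmap.get(node, f_lo), freq)
        return fmap

    fmap = vf_select({(0, 3): 40.0, (3, 1): 0.5}, mc=(3, 0),
                     threshold=150, f_hi=1000, f_lo=500)
    print(fmap)   # routers on the high-MPKI core's paths get 1000 MHz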

Designers can select threshold values depending on the available application performance slack. A high threshold value results in reduced performance and relatively lower NoC power, whereas a lower threshold value results in higher performance and relatively higher NoC power. We propose performing a generic system profiling run to select an appropriate threshold value.

The time complexity of the VF selection algorithm is O(n). However, it is worth mentioning that the runtime of this algorithm does not affect the application execution time, because the NoC operates normally while the new frequency map is being calculated. Normal NoC operation is only suspended once the new frequency map has been calculated.

Figure 4.4: Example of Run-time Adaptation of Malleable NoC

Topology: 8 × 8 mesh
Router Architecture: 5-port router (4 neighbours + 1 local), 2-stage pipeline [LT+BW, RC+SA+ST]
Input Buffer: 8 flits deep (no virtual channels)
Routing: Dimension Order XY
Link Flow Control: Wormhole switching with On/Off flow control
Link Width: 64-bit data, 8-bit control
Switch Arbiter: Matrix arbiter

Table 4.1: NoC architectural details.

Frequency     | 1 GHz | 500 MHz
Voltage       | 0.9 V | 0.81 V
Area [µm2]    | 33398 | 28792
HVt Cells [%] | 12.7  | 43.76
NVt Cells [%] | 3.6   | 7.28
LVt Cells [%] | 83.7  | 48.9

Table 4.2: Synthesis results for a single router using multi-Vt optimisation.

4.3.3.1 Runtime Configuration Example

Figure 4.4 shows a visual example of how the Malleable NoC is configured at runtime. In the figure, node i and node i+1 have routers for both the high and low VF, whereas node i+2 only has a high VF router. When a high MPKI application, H264 in this case, is executing, the high VF routers are switched on at all the nodes. When a low MPKI application, mpeg2 in this case, is executing, the low VF routers at nodes i and i+1 are switched on. As this example architecture does not contain a low VF router at node i+2, the router designed for the high VF is connected to the lower supply voltage.

Figure 4.5: Malleable NoC Configurations (nodes with a [1GHz, 0.90V] router only vs. nodes with both [1GHz, 0.90V] and [500MHz, 0.81V] routers).

4.4 Experiment and Results

4.4.1 NoC Synthesis

Architectural details of the NoC router used in our study are reported in Table 4.1. We use a two-stage pipelined wormhole-switched router. In the first stage, a flit travels on the link and gets written into the input buffer. The second stage consists of route computation, switch arbitration and switch traversal. The router architecture RTL is written in Verilog. For the input buffers, we changed the RTL style to ensure correct read/write pointer updates. As mentioned earlier, the frequencies are in ratios of powers of 2, which ensures the design remains ratio-synchronous, thus reducing the design complexity associated with asynchronous designs. We synthesised the RTL using Synopsys Design Compiler version H-2013.03-SP4. We used commercial TSMC 45nm libraries that are characterised for Low Vt (LVt), Normal Vt (NVt) and High Vt (HVt). We enabled the leakage and dynamic power optimisation options, which automatically invoke multi-Vt optimisation. We target two VF levels for the router design in the experiments, [1GHz, 0.9V] and [500MHz, 0.81V]. The results for the router synthesis are reported in Table 4.2. We can observe that the target frequency affected the type and number of cells used. As expected, the number of LVt cells used for [500MHz, 0.81V] is considerably lower than the number of LVt cells used for [1GHz, 0.9V].

4.4.2 Simulation Setup

MPSoC Details: We target a 64-core general-purpose MPSoC organised in an 8×8 2D-mesh. Architectural details for the MPSoC are included in Table 4.3. The processor is an in-order Tensilica Xtensa core with private L1 instruction and data caches. The processor stalls when there is an outstanding cache load miss. The frequency of the cores is fixed at 1GHz. For the baseline system, we assume that there are four memory controllers placed at the boundaries, as shown in Figure 4.3.

Topology        8×8 mesh
Processors      Tensilica in-order LX4 @ 1GHz
L1 I/D Caches   16KB, 2-way set associative, 16 byte line size
DRAM            4 memory controllers, 40 ns access latency
NoC VF Levels   [1GHz, 0.9V], [500 MHz, 0.81V]

Table 4.3: MPSoC Configuration.

Workload   Applications                              Min MPKI   Max MPKI   Avg MPKI
W-1        mpeg2enc, sha, mpeg2dec, jpeg dec         0.5        6          2.13
W-2        mpeg2enc, jpeg dec, mpeg2dec, jpeg enc    0.5        9          4.25
W-3        h264enc, sha, jpeg enc, jpeg dec          0.5        40         14
W-4        dijkstra, mpeg2enc, sha, jpeg enc         0.5        16         6.63
W-5        dijkstra, h264enc, jpeg enc, mpeg2enc     1          40         14
W-6        bzip2, lbm, dijkstra, sha                 0.5        16         10.5
W-7        bzip2, lbm, soplex, sha                   0.5        18         11
W-8        h264enc, dijkstra, soplex, mpeg2enc       1          40         18.75

Table 4.4: Details of application mixes with 16 copies of each application.

Benchmark Applications: We use a diverse set of benchmark applications from the MediaBench and SPEC2006 packages to evaluate our proposed scheme. We create multiprogrammed workloads using the applications from these two packages, as listed in Table 4.4. In a given workload, 16 copies of each application are executed. We also list the MPKI details for each workload in Table 4.4. For example, workloads W-1, W-2 and W-4 have lower MPKIs, while the other workloads have moderate to high MPKIs.

Simulator: We use an in-house cycle-accurate simulator for our study. The simulation is carried out in two steps. First, the applications are executed on the Xtensa instruction set simulator, and cache load and store memory traces are collected along with timing information. Then, in the full system simulation, these memory traces are replayed in a closed-loop simulator, which accurately models the effect of NoC latency on application execution. Our simulator accurately models the low-level hardware details of NoC VF switching, the operation of the LMs and GM, and memory controller queuing delays.

VF Switching Setup: We explore designs with a dual voltage, coarse-grain VF selection scheme. The two possible VF levels for the NoC are [1GHz, 0.9V] and [500MHz, 0.81V]. We assume that it takes 10ns to switch between the two voltages and switch on a router. We used a control interval of 50K clock cycles for the VF selection algorithm, based on the analysis presented by Chen et al. [118]. The threshold value for the VF selection algorithm is set to 150, based on prior evaluation.

For the power characterisation of the routers, we passed the netlist generated by the synthesis tool to Synopsys PrimeTime. The netlist generated for the [1GHz, 0.9V] router is evaluated at [1GHz, 0.9V] and [500MHz, 0.81V], while the netlist for the [500MHz, 0.81V] router is evaluated at the design VF only. The energy values extracted from the PrimeTime tool are fed into the simulator to calculate the NoC energy.

Malleable NoC Configurations: We explored different configurations of Malleable NoC, as shown in Figure 4.5. In all configurations, the NoC contains routers designed for [1GHz, 0.9V]. In the baseline configuration, all the nodes contain routers for both [1GHz, 0.9V] and [500MHz, 0.81V]. We then explore designs where the percentage of nodes with [500MHz, 0.81V] routers is 75%, 50% and 25%.
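A grossly simplified sketch of the replay step is shown below: it assumes a fixed per-hop latency and a flat 40-cycle memory access time (matching the 40 ns DRAM latency at 1 GHz in Table 4.3), whereas the real simulator also models contention, VF switching and queuing delays. All names and constants are illustrative.

```python
def replay_core(trace, noc_hops, mem_latency=40, hop_cycles=2):
    """trace: list of compute-cycle gaps between consecutive L1 load misses.
    Returns the core's total execution cycles, including NoC stall time."""
    now = 0
    round_trip = 2 * noc_hops * hop_cycles + mem_latency  # request + reply
    for compute_cycles in trace:
        now += compute_cycles        # in-order core runs until the next miss
        now += round_trip            # then stalls until the reply returns
    return now
```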

4.4.3 Experiment Results and Discussion

4.4.3.1 Efficacy of Malleable NoC Architecture:

Figure 4.6 shows the normalised NoC energy delay product (EDP) of Malleable NoC. We also compare our proposed architecture with the darkNoC architecture proposed in Chapter 3. All results are normalised to the EDP of the Baseline architecture, where no DVFS is used and the NoC runs at the maximum VF.

Figure 4.6: Efficacy of Malleable NoC

Overall, Malleable NoC resulted in the lowest NoC EDP for all the workloads. For workloads W-1, W-2 and W-4, Malleable NoC and darkNoC performed equally well. Referring to Table 4.4, these workloads have very low MPKIs, and it is expected that the VF selection algorithm will operate all routers in the mesh at the lower frequency. Therefore, the per-router VF selection provided by Malleable NoC performs similarly to the single VF domain switching used in darkNoC.

On the other hand, for workloads W-3, W-5, W-6, W-7 and W-8, Malleable NoC provided considerably lower EDP than darkNoC. For example, for workload W-7, Malleable NoC resulted in a relative EDP saving of 30%, while darkNoC only provided an EDP saving of 9%. Similarly, for workload W-6, Malleable NoC resulted in a relative EDP saving of 46%, whereas darkNoC only provided an EDP saving of 26.4%. From the information given in Table 4.4, it can be observed that workloads W-3, W-5, W-6, W-7 and W-8 have very high diversity in terms of MPKI. Therefore, we conclude that Malleable NoC can provide superior NoC EDP savings for chips that are expected to run applications with a mix of high and low MPKIs.

Figure 4.7: Analysis of different Configurations of Malleable NoC

4.4.3.2 Analysis of Malleable NoC Configurations:

In Figure 4.7 we report the normalized EDP of different configurations of Malleable NoC. MallNoC_Base is the Malleable NoC configuration in which every node contains routers for both [1GHz, 0.9V] and [500MHz, 0.81V]. As expected, MallNoC_Base produced the lowest EDP. For W-7 and W-8, the use of MallNoC_25% resulted in a 30% and 25% increase in EDP over MallNoC_Base, respectively, whereas for both W-1 and W-2, the use of MallNoC_25% resulted in a 52% increase in EDP over MallNoC_Base. From this result, we can deduce that workloads with diverse MPKIs (such as W-7 and W-8) can potentially use a partial Malleable NoC configuration, thus saving dark silicon area and leaving it for other low power components. The reason is that in workloads with a mix of high and low MPKI applications, the routers on the outer periphery of the mesh are expected to be operated at high VF levels, and therefore such workloads are the least affected by the absence of low VF routers on the mesh's outer periphery.

4.4.3.3 Importance of Mapping:

The graph in Figure 4.8 shows the importance of using an intelligent application-to-core mapping scheme [9]. We randomly mapped the applications to cores and then used Malleable NoC for the evaluation.

Figure 4.8: Effect of Mapping on Malleable NoC

For workloads W-3, W-6, W-7 and W-8, the EDP with respect to intelligent mapping increased by 150%, 48%, 35% and 25%, respectively. This shows that it is important to use intelligent mapping with Malleable NoC to get the best results for workloads containing applications with heterogeneous MPKIs. However, for workloads with lower MPKIs, i.e., W-1, W-2 and W-4, intelligent mapping reduces the EDP only negligibly.

4.5 Conclusion

In this chapter we presented a low-power NoC fabric called Malleable NoC, which contains multiple VF-optimised routers at each node. At runtime, depending on the workload, a router from each mesh node is combined to form a NoC which can adapt to heterogeneous application requirements. We showed that for a variety of multi-programmed benchmarks executing on Malleable NoC, the Energy Delay Product (EDP) can be reduced by up to 46% for widely differing workloads. Malleable NoC is superior to darkNoC for executing heterogeneous applications, and for design scenarios where the NoC has to compete with other components for dark silicon area and hence VF-optimised routers cannot be replicated at every node.

Chapter 5

SuperNet: Multimode Interconnect Architecture for Manycore Chips

5.1 Introduction

The cost of designing a SoC has increased dramatically with shrinking node sizes as the laws of physics are challenged [171–173]. The cost trend of chip design is shown in Figure 5.1. A significant amount of financial and human resources is required to design and verify chips in smaller nodes. Therefore, by simple economics, the only way to achieve profitability and recover the initial design cost is to increase the number of units sold. To increase the sales volume, a chip has to target a wider set of applications. This means the SoC has to fulfil the design requirements of different usage scenarios. For example, designing a multimedia SoC for a battery-powered smart phone requires a special emphasis on low energy operation. On the other hand, using the same SoC for a home entertainment system requires special attention to high performance. A similar SoC used for engine management or a braking system in a car has to operate reliably throughout the car's lifetime. This

poses a serious challenge to designers, who must account for various operational scenarios in a single chip.

Figure 5.1: Hardware Design Cost Trend (Source: Semico Research)

It is important to reconsider the on-chip interconnect in the context of such complex multipurpose SoC designs. Packet-switched Networks-on-Chip (NoCs) are seen as a scalable interconnect to support the growing communication demand of SoC components. Like other on-chip components, it is essential to study the effects of downscaling trends on NoCs. Although NoCs provide a scalable communication backbone for tens of on-chip components, they also consume a considerable share of chip power (e.g., up to 18% and 33% in the Intel SCC [51] and RAW [160] architectures, respectively). Research studies show that leakage power is the major share of total NoC power, and this is expected to increase for smaller nodes [170]. On the other hand, deep transistor integration adversely affects the reliability of digital circuits. A soft error in the NoC can have adverse effects, such as partial disconnection of SoC components, erroneous execution of applications or system-level deadlock [174]. To maintain high performance, NoCs are designed for either worst-case or average-case scenarios. However, a design approach for worst-case delay can lead to wasted energy for less demanding applications [11].

The above-mentioned design problem becomes more complex when NoCs must be designed for multipurpose SoCs. Traditionally, NoCs are designed with a specific goal of either low-power operation, high performance [11] or high reliability [175]. For example, a NoC for a high performance chip can be designed with a wider channel width and higher clock frequency to fulfil the requirements of high bandwidth and low latency [11]. Such a high performance NoC would be impractical for low power, battery-powered devices. Similarly, a low-power NoC can be designed with narrow channels and the lowest possible voltage guard-bands. However, this NoC would neither provide the bandwidth needed by high performance applications, nor operate reliably under harsh operational conditions. Therefore, to the best of our knowledge, there is no single design proposal that covers all three possible operational requirements discussed earlier.

A very naive solution to the above-stated design problem is to integrate a NoC for each possible design goal and use those NoCs depending on the usage scenario. Such designs are not a suitable choice for two reasons. The first reason is that having multiple mutually exclusive NoCs will carry a higher silicon cost. Despite the fact that a large amount of dark silicon is available in future chips due to power budget limitations, NoCs still need to compete for this dark silicon area [131] with other on-chip components such as application accelerators. The second reason is that every on-chip component has to be extensively verified before fabrication [171], and therefore naively adding more NoCs can potentially increase the chip design cost. In this chapter we show that, in place of such an inefficient NoC architecture, we can design a system with a minimal number of NoCs that achieves the same goal through the adroit reuse of one set of NoCs for different operational modes.

Our solution to the above-stated problem is the SuperNet NoC architecture. The architecture consists of parallel, architecturally homogeneous, but voltage-frequency (VF) optimized heterogeneous NoCs. For the energy efficient mode, we leverage the two NoCs for dynamic voltage and frequency scaling (DVFS) by switching on only one NoC depending on application requirements or static configuration. In performance mode, we classify the network packets into critical and non-critical packets, and steer the packets accordingly into either the low VF or the high VF NoC. In the reliability mode, the two NoCs work in dual lock-step mode, where one NoC carries the actual data while the other NoC carries the error correction codes. This scheme improves reliability by mitigating the effects of soft errors in the control and data paths. We make the following contributions in this chapter:

• We introduce SuperNet, a novel multimode NoC for future multipurpose SoCs. SuperNet uses a simple control interface for switching between three different modes, i.e., low power mode, high performance mode and reliability mode.

• We describe the circuit design techniques and architectural innovations used to realize the SuperNet architecture. The process of switching between modes is completely autonomous, with minimal software intervention.

• Using real applications and a cycle-accurate simulation setup, we explore the efficacy of the different operational modes of SuperNet.

In Section 5.2 we provide a brief background on previous NoC proposals and compare them with the SuperNet architecture. Section 5.3 provides details about the SuperNet architecture and the three operational modes. In Section 5.4 we present our results from experiments with the SuperNet architecture. We present conclusions in Section 5.5.

5.2 Background and Related Work

Multiple NoCs: There are several commercial and academic chips that leverage multiple NoCs. Tilera's iMesh architecture [107] uses five parallel NoCs for different classes of traffic, i.e., cache coherence messages, memory controller access, I/O port access, and user-space message passing. The TRIPS [163] architecture uses multiple networks for operand and data forwarding. The reason for using multiple NoCs in the TRIPS and Tilera iMesh architectures is to provide isolation between different classes of traffic. Balfour and Dally [176] showed that adding parallel NoCs can potentially reduce the network latency and improve the bandwidth. Mishra et al. [11] extended the idea by using two heterogeneous networks, where one network is used for bandwidth-sensitive applications and the other for latency-sensitive applications. In contrast, in the performance mode of SuperNet, we divide the traffic from all the applications based on packet type rather than on the application to which the traffic belongs. In addition to performance improvement, SuperNet can also be used to provide reliability or energy efficiency using the same set of NoCs.

System Level Power Saving Techniques: Over the years, various system-level power management techniques have been introduced. DVFS for on-chip networks has been proposed at various granularities, such as link-level [113], router-level [114,115] and region-level [118,169]. However, the advantage of DVFS schemes has been shrinking due to the NoC's high leakage power in smaller nodes and the diminishing room for reducing supply voltages [17]. To improve the efficiency of DVFS schemes, in Chapter 3 we proposed integrating multiple NoCs, where each NoC is optimised using multi-Vt circuit techniques. At runtime, along with changing VF levels, a different NoC is activated. We use a similar technique for our energy efficient mode. However, we also use these multiple NoCs to improve system performance and reliability. Das et al. [128] proposed integrating four architecturally homogeneous NoCs for the purpose of energy efficiency. Depending on the network traffic, a subset of these NoCs is activated at runtime while the unused NoCs are power gated. Neither Das et al. [128] nor Bokhari et al. [170] addressed the issues of reliability and performance in their architectures.

Fault Tolerance in NoC: Many researchers have studied the topic of soft error detection and correction for NoCs¹. Soft errors are caused by random logic flips at the input/output of a logic gate. This leads to temporary glitches in a NoC router's data path and/or control path [175]. The BulletProof router [177] uses a Cyclic Redundancy Check (CRC) to detect and correct soft errors in the data path, and selective triple modular redundancy (TMR) to detect faults in the control paths of the routers. Prodromou et al. [174] proposed a scheme based on hardware invariance checking to detect faults in the router control path. Park et al. [178] explored the use of hop-by-hop error detection and correction using Single Error Correction - Double Error Detection (SECDED) codes. In the case of errors which were not correctable, flits are retransmitted from downstream routers. The commercial NoC vendor Arteris has also introduced an error resilience solution called the FlexNoC Resilience Package [179], which combines end-to-end data error correction and port checking. Murali et al. [180] analyzed different error detection and correction schemes and concluded that end-to-end fault detection, correction and retransmission based on time-outs is a good trade-off between performance, energy and area overhead. None of the techniques discussed above studied soft error resilience in the context of reusing existing multiple NoCs. In contrast to the previous solutions, we use the additional NoC to transmit a strong byte-based SECDED code for end-to-end error detection and correction, and use the control paths of the two NoCs for lock-step dual modular redundancy (DMR) protection.

¹We limit our discussion to mitigating soft errors. An overview of fault tolerance schemes for NoCs is presented by Radetzki et al. [175].

Figure 5.2: Overview of MPSoC with SuperNet Interconnect

5.3 SuperNet Architecture

5.3.1 Architecture

Our target architecture is a general-purpose tiled multi-core chip as shown in Figure 5.2, which closely matches commercial SoC chips such as the Intel Single Chip Cloud Computer (SCC) [51] architecture. The processing core is a simple in-order core with private L1 data and instruction caches. Cache load and store requests are injected into the NoC by the cores. Memory controllers are placed at the border routers, as shown in Figure 5.2, to serve cache load and store requests. We target an N×N mesh topology-based NoC architecture in this chapter, as it is commonly used by commercial and academic multi-core chips. Each core runs a different application; therefore, at a given time up to N² applications are executed in parallel.

5.3.1.1 Multiple VF Optimized NoCs

The SuperNet architecture consists of two parallel NoCs that are optimized for different VF levels using multi-Vt optimisation [170]. Commercial fabrication foundries such as TSMC and GlobalFoundries provide cell libraries with various gate threshold voltages (Vt). These library packages contain cells with Normal Vt (NVt), Low Vt (LVt) and High Vt (HVt). The availability of heterogeneous Vt cells provides an opportunity for designers to optimize the energy-delay characteristics of circuits under a given synthesis latency constraint. CAD tools exploit the energy-delay characteristics of the cells by inserting LVt cells (high leakage, faster switching) on logic paths with negative slack to meet the latency constraint, and by replacing normal cells with HVt cells (low leakage, slower switching) on paths with positive slack to save energy.
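As a toy illustration (not a real CAD flow) of this slack-driven trade-off, the mapping below assigns a cell flavour to one timing path based on its slack; the thresholds are invented for the example.

```python
def assign_vt(slack_ps):
    """Toy slack-to-Vt mapping for one timing path (thresholds invented)."""
    if slack_ps < 0:
        return "LVt"   # violating path: fast, leaky cells to close timing
    if slack_ps > 50:
        return "HVt"   # ample slack: trade speed for low leakage
    return "NVt"       # timing met with little margin: keep default cells
```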

We assume that two voltage supplies, V1 and V2, are available, where V1 > V2. These voltages are associated with two frequencies, f and f/2. At each node we integrate two parallel routers, as shown in Figure 5.2. Although both routers are architecturally homogeneous, each is optimised for a specific VF level using multi-Vt optimisation. One of the routers is designed for VF level [V1, f], whereas the other router is designed for [V2, f/2]. Using these routers, two parallel NoCs are realized as follows:

• NoC A: This NoC consists of routers designed for [V1, f]. NoC A can be operated at either [V1, f] or [V2, f/2] depending on the operating mode.

• NoC B: This NoC consists of routers designed for [V2, f/2]. NoC B can only be operated at [V2, f/2].

5.3.1.2 Description of NoC Components

Fabric Manager: The Fabric Manager (FM) is responsible for managing all the NoC components in the MPSoC. The FM uses the NoC channels for communication with the Local Managers and reads their responses through a 3-wire AND network, as shown in Figure 5.3(a). The FM can be implemented as specialized hardware, or it can be a part of existing chip management firmware. The FM autonomously coordinates with the Local Managers in the NoC to implement the different operational modes. This eases the burden of writing software routines to manage resources.

Local Manager: Local Managers (LMs) are responsible for managing the routers at each mesh node and for forwarding information from the local mesh node to the FM. The LM drives the power gating enable and disable signals for each router. The LM also enables and disables traffic injection from the local core's port. The LM monitors the buffer-empty condition, the application profiler status, and the error status from the lock-step comparator (more on this in Sections 5.3.3, 5.3.4 and 5.3.5). This information is communicated back to the FM through a simple 3-wire AND network, as shown in Figure 5.3(a). LMs receive their commands for setting up the operational mode from the FM through the NoC.

Figure 5.3: Details of SuperNet Components

Application Profiler Unit: Each core contains an Application Profiler Unit.

This profiler unit (marked PROF in Figure 5.2) keeps track of vital information on application behavior. Each profiler unit is augmented with two saturating counters: an instruction counter and an L1 cache miss counter. The profiler unit resets the counters after every preset control interval and saves the values from the previous interval. The counter information is used by the FM to implement either the Energy Efficient Mode or the Performance Mode.

Error Detection and Correction Unit: Every Network Interface (NI) is augmented with an Error Detection and Correction Unit (marked EDC in Figure 5.2) for use in the Reliability Mode. The EDC unit is responsible for encoding the outgoing data from the NI with a forward error correction (FEC) code (Figure 5.3(b)), and for detecting and correcting any errors in the incoming data (Figure 5.3(c)). We use a SECDED code for each 8-bit data byte, which produces 5 check bits. In our architecture each flit contains 64 bits of data, as shown in Figure 5.3(d). Therefore, for every 64-bit flit, we generate 40 bits of check code. On the receiver side, a decoder checks for, and tries to correct, errors in the flit data. Through the EDC unit, we can therefore correct up to eight single-bit errors (as long as they occur in separate data bytes) and detect up to 16 bit errors (a maximum of two errors in each byte). To save power, these units can be power gated by the LM when the Reliability Mode is not activated.
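As a behavioural illustration of this per-byte scheme, the sketch below implements a (13,8) SECDED code in Python using a standard extended-Hamming construction: 5 check bits per byte, so a 64-bit flit is paired with a 40-bit check word on the second NoC. The actual EDC hardware uses a (13,8) BCH code (Section 5.3.5); the bit layout and names here are assumptions for illustration only.

```python
DATA_POS = (3, 5, 6, 7, 9, 10, 11, 12)   # data bit positions, 1-indexed
HAMMING_POS = (1, 2, 4, 8)               # Hamming parity bit positions

def _to_word(byte, check):
    """Assemble a 13-bit word: index 0 = overall parity, 1..12 = Hamming."""
    word = [0] * 13
    for i, pos in enumerate(DATA_POS):
        word[pos] = (byte >> i) & 1
    for i, pos in enumerate(HAMMING_POS):
        word[pos] = (check >> (i + 1)) & 1
    word[0] = check & 1
    return word

def encode_byte(byte):
    """Return the 5 check bits for one data byte (bit 0 = overall parity)."""
    word = _to_word(byte, 0)
    check = 0
    for i, p in enumerate(HAMMING_POS):
        parity = 0
        for pos in range(1, 13):
            if (pos & p) and pos != p:
                parity ^= word[pos]
        word[p] = parity
        check |= parity << (i + 1)
    overall = 0
    for bit in word[1:]:
        overall ^= bit
    return check | overall

def decode_byte(byte, check):
    """Correct any single-bit error; flag double errors as uncorrectable."""
    word = _to_word(byte, check)
    syndrome = 0
    for p in HAMMING_POS:
        parity = 0
        for pos in range(1, 13):
            if pos & p:
                parity ^= word[pos]
        if parity:
            syndrome |= p
    overall = 0
    for bit in word:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                 # odd parity: exactly one bit flipped
        if syndrome:
            word[syndrome] ^= 1        # repair the indicated bit
        status = "corrected"
    else:                              # even parity, bad syndrome: 2 errors
        return None, "uncorrectable"   # triggers retransmission from source
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= word[pos] << i
    return data, status

def encode_flit(flit64):
    """5 check bits per byte: a 64-bit flit yields a 40-bit check word."""
    return sum(encode_byte((flit64 >> (8 * i)) & 0xFF) << (5 * i)
               for i in range(8))
```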

Equivalence Checker: Equivalence Checkers (the blocks with an '=' sign in Figure 5.2 and Figure 5.3(a)) are used for implementing the Reliability Mode. The checker circuit asserts that the control bits (Figure 5.3(d)) of the two NoC channels are equal. These checkers are placed at two locations: (1) each of the five outgoing router channels, and (2) the incoming channel of each Error Detection and Correction Unit. In case the control parts of the channels are not equal, the checker unit raises an error signal which is read by the LM.

5.3.2 Mode Selection

When the chip is switched on, the FM by default initializes SuperNet in the normal mode (i.e., only NoC A is powered on, at [V1, f]). Depending on various factors such as the power budget and reliability requirements, the chip user can decide to activate a certain SuperNet mode. At runtime, the user can execute the chip's firmware or an OS system call to write the mode change command to the FM's setting register. Once this happens, the FM autonomously implements the desired operational mode.

5.3.3 Energy Efficient Mode

We base our Energy Efficient Mode on the fact that not all applications need a low latency NoC. One important metric (measuring the dependence of applications on the NoC) is cache misses per kilo instructions (MPKI) [11]. Applications with higher MPKI values spend much time waiting for cache load misses to be served from the memory controller through the NoC. The MPKI graphs for two applications, h264enc and mpeg2enc, are shown in Figure 5.4. In our experiments we found that the MPKI value for h264enc is 40× the MPKI of mpeg2enc. Therefore, for an MPSoC on which h264enc is being executed, the NoC has to be clocked at a higher frequency. On the other hand, for low MPKI applications such as mpeg2enc, the NoC can be clocked at a lower frequency to save energy.

Figure 5.4: MPKI for h264enc and mpeg2enc

Furthermore, the MPKI values can fluctuate, depending on the application's behaviour. Therefore, depending on an application's current MPKI, the NoC can be clocked at different frequencies.

VF Selection Scheme: SuperNet implements a coarse-grain DVFS method with two VF levels. When [V1, f] is used, NoC A is operational and NoC B is power gated, whereas when [V2, f/2] is used, NoC B is used and NoC A is switched off. Bokhari et al. [170] showed that using a NoC that is optimised for a specific VF level is more energy efficient than using a scaled VF on a NoC that has been designed for a higher VF level. MPKIs are calculated for each application using the instruction and cache miss counters in the profiling unit. If the MPKI of the application falls below a pre-set threshold value, the profiling unit de-asserts the HI-LOAD signal, which is sensed by the LM and forwarded to the FM. If all applications in the system de-assert the signal, the FM decides to lower the NoC frequency to f/2. The value of the threshold can be chosen to mark a trade-off between performance and power. On the other hand, if any application in the system asserts the HI-LOAD signal (Figure 5.3(a)), the FM switches back to the higher frequency (f). Another possible policy that can be used by the FM is to keep the NoC running at the low frequency irrespective of application characteristics (i.e., static VF selection), in case of low power requirements, or if the performance mode has been running for too long and the chip is too hot.

Mode Setup: To start the Energy Efficient Mode, the FM sends control flits to each LM to stop any further transactions in the NoC and informs the LM about the initiation of the mode, along with the threshold value to be used. The FM then waits for all existing transactions to finish by monitoring the global BUFF-EMPTY signal (Figure 5.3(a)). At the start of the mode, the LMs switch on the routers for NoC A.

VF Switching Procedure: Whenever the FM decides to switch between VF levels, the currently active NoC has to be switched off after ensuring that there are no in-flight packets in the system. The FM starts the procedure by sending a Stop NI command to all LMs to stop any further injection of packets. The global BUFF-EMPTY signal (Figure 5.3(a)) is raised when there are no more packets in the NoC.

Then the FM sends the next batch of control flits to the LMs to activate the required NoC and power gate the existing NoC. The FM sends the final batch of control flits to the LMs to resume normal operation. This concludes the VF switch-over process.
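A behavioural sketch of one Energy Efficient Mode decision interval is given below; the broadcast/wait helpers stand in for the FM-to-LM control flits and the wired-AND status network, and all names are illustrative assumptions rather than the hardware protocol itself.

```python
def broadcast(cmd):          # placeholder for FM -> LM control flits
    print("FM:", cmd)

def wait_for(signal):        # placeholder for the 3-wire AND status network
    print("FM waits on", signal)

def hi_load(profiler, threshold):
    """Per-core HI-LOAD signal: asserted when the interval MPKI is high."""
    mpki = 1000 * profiler.l1_misses / max(profiler.instructions, 1)
    return mpki >= threshold

def fm_control_interval(cores, threshold, active_noc):
    """One decision: pick the NoC (and hence VF level) for the next interval."""
    # The AND network effectively answers: does any core assert HI-LOAD?
    want_fast = any(hi_load(c.profiler, threshold) for c in cores)
    target = "NoC_A" if want_fast else "NoC_B"     # [V1, f] vs [V2, f/2]
    if target != active_noc:
        broadcast("STOP_NI")                       # stop packet injection
        wait_for("BUFF_EMPTY")                     # drain in-flight packets
        broadcast("POWER_ON " + target)            # wake the other NoC
        broadcast("POWER_OFF " + active_noc)       # gate the old one
        broadcast("RESUME")                        # resume normal operation
    return target
```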

5.3.4 Performance Mode

The Performance Mode for SuperNet is based on the fact that not all packets in the NoC are critical to the performance of the application. We classify the possible packet types in Table 5.1. A LoadRequest packet is generated by the core when a memory load instruction causes a cache miss. The memory controller replies to the request by sending the required cache line in a LoadRequestReply packet. When a cache line is evicted, the core generates a StoreRequest packet for the memory controller. In response to the StoreRequest, the memory controller sends a StoreRequestAck packet as an acknowledgement. We assume an MPSoC with in-order cores, so a core stalls on a cache load miss. Thus, it is critical to deliver the LoadRequest and LoadRequestReply packets as quickly as possible. The process of sending an evicted cache line to the memory controller happens in parallel with execution, and therefore the StoreRequest and StoreRequestAck packets are not critical for application performance. When only one NoC is used, all packets compete for shared channel resources. Therefore, due to contention, packet delivery can be delayed. This delay can affect system performance if the delay for critical packets increases. The situation becomes even worse when applications have high MPKIs.

Type                 Length                          Critical?
Load Request         1 flit                          yes
Load Request Reply   (cache line size)/8 + 1 flits   yes
Store Request        (cache line size)/8 + 1 flits   no
Store Request Ack    1 flit                          no

Table 5.1: Packet Classification

Packet Steering Based on Criticality: We leverage the two NoCs available in SuperNet to implement the performance mode. The FM switches on both NoC A (with [V1, f]) and NoC B at the same time. At runtime, the cores inject LoadRequest packets into NoC A and StoreRequest packets into NoC B. The memory controller injects LoadRequestReply packets into NoC A and StoreRequestAck packets into NoC B. The intuition behind the steering scheme is to reduce the contention between critical and non-critical packets through physical separation. However, this scheme comes at the cost of increased NoC power compared to normal operation, because two NoCs are used instead of one. Moreover, the performance mode is only useful for applications where the network load is high due to a high application MPKI. Therefore, the FM can enable or disable the performance mode based on the power budget and application characteristics.

Mode Setup: To start the Performance Mode, the FM sends control flits to each LM to stop any further transactions in the NoC and informs the LM about the initiation of the mode. The FM then waits for all existing transactions to finish by monitoring the global BUFF-EMPTY signal (Figure 5.3(a)). In parallel, each LM also switches on the NoC A and NoC B routers in case one of them was not already switched on. The FM then sends control flits to each LM, in response to which the NI is enabled and configured to steer traffic into a specific NoC based on the traffic classification.
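A minimal sketch of the steering decision, following the classification of Table 5.1; the packet type names match those above, while the function and network labels are illustrative assumptions.

```python
CRITICAL = {"LoadRequest", "LoadRequestReply"}       # core stalls on these
NON_CRITICAL = {"StoreRequest", "StoreRequestAck"}   # overlap with execution

def steer(packet_type):
    """Pick the physical network for a packet in Performance Mode."""
    if packet_type in CRITICAL:
        return "NoC_A"   # fast network at [V1, f]
    if packet_type in NON_CRITICAL:
        return "NoC_B"   # slower network at [V2, f/2]
    raise ValueError("unknown packet type: " + packet_type)
```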

5.3.5 Reliability Mode

The Reliability Mode for SuperNet enables the detection and correction of soft errors in the NoC. The scheme provides protection against soft errors in both the data path and the control path. In this mode, both NoCs are used at the same time: the Reliability Mode operates NoC A and NoC B in parallel at [V2, f/2].

(EDC) is used to generate a (13,8) BCH code (8-bit data generating 5-bit code). In SuperNet we only encode the data at the time of packet injection and only try to check and correct the data at the receiver node. A more error resilient approach is to perform error encoding and decoding at each router in the path of the packet. However this scheme would increase the energy cost due to additional hardware at each router. Moreover, per-router encoding and decoding will increase the router’s pipeline stages, which will eventually result in an increase in network latency and hence performance penalty. The actual packets are injected in NoC A and the error correction codes are injected in NoC B simultaneously as shown in Figure 5.3(b). Control Path Protection: In SuperNet,weemploytheideaofdualmodular redundancy (DMR) and lock step processing to protect the router control path against the soft errors. Both NoCs are architecturally homogenous and are operated at the same frequency. Furthermore, during the setup of Reliability Mode, switch allocators for all the routers are reset to a known state. Therefore, routers from di↵erent NoCs at a given mesh node are expected to generate the same output. Each NoC channel consists of control bits and data bits as shown in Figure 5.3(d). We perform equivalence check on the control bit output on outgoing router channels in a given direction as shown in Figure 5.2 and Figure 5.3(c). Any soft error in the control circuit in either of the routers will result in in-correct data written on a port or no output at all. For example, an error in routing logic will route the flit to the wrong port. If the equivalence check fails, an error is raised for LM. Mode Setup: To start reliability mode, the FM sends control flits to each LM to stop any further transaction in the NoC. The FM then waits for all the existing transactions to finish by monitoring the global BUFF-EMPTY signal (Fig- ure 5.3(a)). Then, FM sends control flits to LM to switch on the routers for NoC A(at VF [V 2,f/2]) and NoC B.ThelastsetofcontrolflitsaresenttoLM to reset the router switch allocators to a known state and continue normal operation. Runtime Fault Recovery: At runtime, whenever a non-correctable error oc- curs in the data path or lock step router execution fails, an error is raised by the 5.4. EXPERIMENT AND RESULTS 119

Runtime Fault Recovery: At runtime, whenever a non-correctable error occurs in the data path or the lock-step router execution check fails, an error is raised by the LM for the FM, as shown in Figure 5.3(c). In response to the error, the FM sends control flits to each LM to stop any new transactions from the NI. Once the network is free of any traffic, the FM sends control flits to the LMs to reset the switch arbiters to a known state. The FM sends the last set of control flits to the LMs to continue normal operation under these conditions:

• If the core has sent a LoadRequest packet and the memory controller has not replied with a LoadRequestReply packet, the local NI sends the LoadRequest again (Table 5.1).

• If the core has sent a StoreRequest packet and the memory controller has not replied with a StoreRequestAck packet, the StoreRequest packet is sent again.

Performance Overhead: The main performance overhead of SuperNet's Reliability Mode is due to the fact that the NoC is operated at the reduced frequency of f/2, which increases the network latency. The performance penalty of the increased NoC latency is more significant for applications that have high MPKIs. Furthermore, a performance overhead is incurred when there is a non-correctable fault (in the data path or control path), which requires retransmission from the source.

5.4 Experiment and Results

5.4.1 Experiment Setup

NoC Synthesis: For the experiments, we use a two-stage pipelined wormhole-switched router with a 64-bit data channel width. Architectural details of the NoC router used in our study are given in Table 5.2. In the first stage, flits travel on the links and get written into the input buffer. The second stage consists of route computation, switch arbitration and switch traversal. The router description is written at the RTL level in Verilog.

Topology             8×8 mesh
Router Architecture  5-port router (4 neighbors + 1 local), 2-stage pipeline [LT+BW, RC+SA+ST]
Input Buffer         8 flits deep (no virtual channels)
Routing              Dimension Order XY
Link Flow Control    Wormhole switching with On/Off flow control
Link Width           64-bit data
Switch Arbiter       Matrix arbiter

Table 5.2: NoC architectural details.

Frequency       1 GHz     500 MHz
Voltage         0.9 V     0.81 V
Area [µm²]      33041     28976
HVt Cells [%]   45        82
NVt Cells [%]   7         6
LVt Cells [%]   48        12

Table 5.3: Synthesis results for a single router using Multi-Vt optimisation.

             ECC Decode   ECC Encode   Equivalence
Area [µm²]   1814         760          46

Table 5.4: Synthesis results for Error Detection & Correction Units

We synthesised the RTL using Synopsys Design Compiler version H-2013.03-SP4. We used commercial TSMC 45nm libraries that are characterised for Low Vt (LVt), Normal Vt (NVt) and High Vt (HVt). We enabled the leakage and dynamic power optimisation options, which automatically invoke multi-Vt optimisation. We target two VF levels for the router design in the experiments, i.e., [1GHz, 0.9V] and [500MHz, 0.81V]. We report the synthesis results for the router in Table 5.3. We also synthesized the error detection and correction hardware; the results are shown in Table 5.4.

MPSoC Details: We target a 64-core general-purpose MPSoC organized in an 8×8 2D-mesh. Architectural details for the MPSoC are included in Table 5.5. The processors are in-order Tensilica Xtensa cores with private L1 instruction and data caches.

Topology            8×8 mesh
Processors          Tensilica in-order LX4 @ 1GHz
L1 I/D Caches       16KB, 2-way set associative, 16 byte line size
Memory Controller   4 memory controllers with perfect L2 cache
NoC VF Levels       [1GHz, 0.9V], [500 MHz, 0.81V]

Table 5.5: MPSoC Configuration

The core stalls when there is an outstanding cache load miss. The frequency of the cores is fixed at 1GHz. For the baseline system, we assume that there are four memory controllers placed at the boundaries, as shown in Figure 5.2. To stress the NoC, we assume that the memory controllers are augmented with perfect L2 caches. This technique has been used previously for testing NoC architectures [130,181].

Benchmark Applications: We use a diverse set of benchmark applications from the MediaBench and SPEC2006 packages to evaluate our proposed scheme. We created multi-programmed workloads using the applications from these two suites by running a copy of the selected application on each core. Each core executes 50 million instructions on average in the experiments.

Simulator: We use an in-house cycle-accurate simulator for our study. The simulation is carried out in two steps. First, applications are executed on the Xtensa instruction set simulator, which generates time-stamped cache load/store miss traces. Then, in the full system simulation, these memory traces are replayed in a closed-loop simulator which accurately models the effect of NoC latency on application execution. Our simulator accurately models the low-level hardware details of NoC VF switching, the operation of the LMs and FM, and memory controller queuing delays.

VF Switching Setup: We explore designs with a dual voltage, coarse-grain VF selection scheme. The two possible VF levels for the NoC are [1GHz, 0.9V] and [500MHz, 0.81V]. For the Energy Efficient Mode, we used a control interval of 50K clock cycles for the VF selection algorithm, based on the analysis presented by Chen et al. [118]. For the power characterization of the routers, we passed the netlist generated by the synthesis tool to Synopsys PrimeTime. The netlist generated for the [1GHz, 0.9V] router was evaluated at [1GHz, 0.9V] and [500MHz, 0.81V], while the netlist for the [500MHz, 0.81V] router was evaluated at the design VF only. The energy values extracted from the PrimeTime tool are fed into the simulator to calculate the NoC energy.

Figure 5.5: Results for Performance Mode

5.4.2 Experiment Results

Performance Mode: The results for the experiments with the performance mode are shown in Figure 5.5. We report the values for NoC power and average IPC. All results are normalized to a system where only the [1GHz, 0.9V] NoC is used. The performance mode of SuperNet is clearly useful for applications with high MPKI. For the application h264enc, the average IPC is improved by 17%. Similarly, for another application, bzip2, the average IPC is improved by 14%. This improvement comes at the cost of an increase in NoC power of 13% and 11% for h264enc and bzip2, respectively. A small improvement of 3% is observed for the applications lbm and soplex, at the cost of a 14% increase in NoC power. No improvement was observed for the other applications (i.e., mpeg2enc, dijkstra, jpeg enc and sha), as the MPKIs for these applications are moderate to low.

Based on these results, we advocate the judicious use of SuperNet's Performance Mode for applications with very high MPKIs. The FM can therefore be enhanced to use the existing application profiling information to selectively switch on the performance mode for applications with high MPKIs.

Energy Efficient Mode: The results for the experiments with the energy efficient mode are shown in Figure 5.6. We report the values for the NoC power and the average IPC over all the cores. All results are normalised to a system where only the [1GHz, 0.9V] NoC is used with a fixed VF level. For comparison, we also report results for an architecture where a single VF-optimised NoC is used with DVFS. From Figure 5.6, it is evident that using multiple multi-Vt optimised NoCs provides better power savings than using DVFS with a single NoC. We performed a parameter sweep on the values of the threshold for the VF selection routine. From the results in Figure 5.6, a threshold value of 20 gives the best results across all benchmarks.

Figure 5.6: Results for Energy Efficient Mode with Different Thresholds

For example, by using the energy efficient mode of SuperNet we are able to reduce NoC power by 75% for mpeg2enc, 65% for jpeg enc, and 81.5% for sha. The power saving comes at the cost of a small decrease in IPC of 0.9%, 6% and 0.1% for the mpeg2enc, jpeg enc and sha applications, respectively. This is because the NoC is sometimes running at the lower frequency, which increases the application execution time. Note that these three applications have low MPKIs. Even for applications with high MPKIs, such as dijkstra, lbm and bzip2, the NoC power is reduced by 25%, 37% and 21%, respectively. This NoC power saving comes at the cost of a reduction in IPC of 3%, 4% and 0.6% for the dijkstra, lbm and bzip2 applications, respectively. A small saving in NoC power of 5% and 9% is observed for the h264enc and soplex applications, respectively.

It is very important to select an appropriate value for the threshold. We can observe from Figure 5.6 that choosing a low threshold value results in low power savings, whereas choosing a high threshold value also results in low power savings, together with considerable degradation in IPC. Therefore, it is expected that designers will profile the applications of interest to choose a suitable threshold value.

Figure 5.7: Results for Reliability Mode

Reliability Mode: The results for the experiments with the reliability mode are shown in Figure 5.7. We report the values for NoC power and average IPC. All results are normalized to a system where only the [1GHz, 0.9V] NoC is used without activating any reliability features. Running SuperNet in reliability mode can cause some degradation in chip performance for applications with moderate to high MPKIs. For example, for the applications mpeg2enc, jpeg enc and sha, the IPC is reduced by 1.3%, 9% and 0.3%, respectively. On the other hand, the performance of the high MPKI applications h264enc and bzip2 is strongly affected by the reliability mode, as the IPC drops by 35% for both applications. The reason for the drop in performance is that the NoC is operated at the lower frequency of 500MHz. Overall, the relative NoC power is low for two reasons: (1) the reduced VF level, and (2) the longer execution times.

5.4.3 Discussion

In our experiments, the SuperNet architecture is operated exclusively in one of its three possible modes. One way to reduce the performance overhead of the reliability mode would be to virtually partition SuperNet into reliable and non-reliable regions [179] and map only applications with high susceptibility to errors [182] onto the reliable NoC region. Similarly, regions where the reliability mode is not activated can be operated in either the performance mode or the energy efficient mode, depending on the power budget and application characteristics.

5.5 Conclusion

In this chapter we presented a novel multimode on-chip interconnect, SuperNet, for manycore SoC chips. The design of SuperNet is based on the observation that future SoC chips will be used in various scenarios with different performance, power and reliability requirements. SuperNet contains two parallel NoCs that can be configured at runtime to operate in either Reliability Mode, Performance Mode or Energy Efficient Mode. To the best of our knowledge, there is no previous proposal for a NoC that achieves such multimode operational diversity.

Chapter 6

Communication Optimisation Using Express Links for Throughput Constrained Streaming MPSoC

6.1 Introduction

Many applications which demand high computation and on-chip communication resources are emerging in the field of multimedia, such as H.264 and Multi View Coding (MVC). Such applications are generally referred to as streaming applications. These applications have to adhere to stringent performance constraints, and thus are often implemented on Multi Processor System on Chip (MPSoC) platforms where both the processing elements and the on-chip communication architecture are (highly) customised [43]. In this chapter we focus on customisation of the on-chip communication architecture of an MPSoC, where the pipeline-level parallelism of streaming applications (represented as Kahn Process Networks (KPNs) [183]) is exploited for high performance [159,184].


Figure 6.1: Two approaches for custom on-chip interconnect design

The design of custom on-chip interconnects can be classified into two categories (Figure 6.1): 1) complete application-specific; and 2) baseline with application-specific express links. The former approach designs on-chip communication architectures from scratch according to an application's needs, and thus results in highly optimised designs in terms of performance, area and energy. However, such designs are complex, and each design instance requires synthesis and floor planning for timing verification. These factors lead to a longer time-to-design and time-to-market for the MPSoC. Additionally, the use of dedicated, point-to-point links for all communication falls into this category; however, such a design might be overkill because of the need to instantiate multiple point-to-point links for shared data.

The latter approach augments an off-the-shelf baseline on-chip interconnect with dedicated high bandwidth express links. The express links provide better bandwidth than the baseline interconnect, and thus reduce traffic congestion. This approach avoids the complexity of designing the entire interconnect, and is thus more feasible when a lower time-to-design and time-to-market is intended. Additionally, this technique enables the re-use of interconnects from previous MPSoC designs (or helps when the tools necessary to design a complete interconnect are not available), and lets a designer choose the actual implementation of the express links. For example, many modern embedded processors support direct point-to-point links, such as the Tensilica

queue interface [81], MicroBlaze FSL [185], Xilinx Mailbox IP [186] and the Altera NIOS II external interface [187]. The express links can also be instantiated between the network interfaces (NIs) of the communicating processors. Alternatively, express links can be embedded in the baseline communication architecture itself by modifying the routers [188], or by overlaying the underlying architecture with these express links [189,190]. In these schemes the original characteristics of the baseline communication architecture are largely preserved [188–190], which eliminates the need for full re-verification of timing after the insertion of the express links. Hence, in this work we focus on customising a given baseline interconnect using express links such that the application's throughput constraint is met.

Several techniques for the insertion of express links have been proposed in the literature [188–191]. Those techniques use only the information of traffic volume between communicating processing elements. In addition, they evaluate the performance of the customised communication architecture in terms of average packet latency and on-chip bandwidth, rather than the throughput of the MPSoC, which is more important (and practical) in the context of streaming applications. Therefore, traditional techniques might result in over-design due to unnecessary insertion of express links. We explain this fact with an example.

6.1.1 Motivational Example

Let us assume a simple three-processor MPSoC as shown in Figure 6.2. The three processes of the application KPN, namely P1, P2 and P3, are mapped to CPU1, CPU2 and CPU3, respectively, to exploit pipeline-level parallelism. In pipelined execution, the different processing elements of an MPSoC are logically organised in a pipeline.

Figure 6.2: An example KPN and the target MPSoC

Each processing element in the pipeline performs some computation on the incoming data and then transfers it to the next processing element. In this way, multiple processing elements process different sets of input data in a pipelined fashion. For example, when CPU1 is processing data block i, CPU2 will be processing data block i-1. An iteration of the application is defined as the processing of one data block by all the processing elements, whereas the throughput of the MPSoC is governed by the latency of the slowest stage.

An analysis [192] of on-chip traffic in a pseudo-balanced pipeline shows that inter-processor communication periodically takes place at the boundaries of the iterations, resulting in a bursty traffic pattern. Additionally, temporal and spatial overlap of these traffic bursts results in collisions due to arbitration for the on-chip communication architecture. This effect leads to the formation of traffic hot spots, causing high communication delays. In such a scenario, even high-end communication architectures like Networks-on-Chip (NoCs) fail to sustain the required throughput [98]. Moreover, the bandwidth offered by the network will not be sufficient to meet the application's performance constraint. The addition of express links here provides alternative routes, which isolates some of the overlapping traffic, and hence reduces congestion in the baseline interconnect. Since traditional techniques for the insertion of express links use only traffic volume information, they overlook the problem of bursty traffic patterns, which is governed by the latency of the critical stage in the logical pipeline.

Let us assume the execution time for one iteration of process P1 is 1,300 cycles in Figure 6.2, while the other two processes consume 800 cycles each. Process P1 transfers 50 tokens of data to P2, and P2 transfers 100 tokens of data to P3, during their respective iterations. The processors can communicate with each other through either the on-chip baseline interconnect (BI) or express links (EL). Assume that the communication latencies for the baseline interconnect and the express links are 3 and 1 cycle/token, respectively. The processor with the highest latency is the critical processor, and the inverse of its latency is the MPSoC's throughput. For example, if both channels a and b are mapped to the BI, the latencies of the processors are 1,450 (1,300 + 50×3), 1,250 (800 + 50×3 + 100×3) and 1,100 (800 + 100×3), and therefore the MPSoC throughput is 1/1,450. Table 6.1 reports the MPSoC throughput for all the possible combinations of BI and EL for links a and b:

              Link a → BI       Link a → EL
Link b → BI   1/1450 (Case 1)   1/1350 (Case 2)
Link b → EL   1/1450 (Case 3)   1/1350 (Case 4)

Table 6.1: MPSoC throughput (BI: Baseline Interconnect, EL: Express Link)
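The four cases in Table 6.1 can be reproduced directly from the numbers of this example; the short script below (with the example's values hard-coded) recomputes each processor's iteration latency and the resulting throughput.

```python
COMPUTE = {"CPU1": 1300, "CPU2": 800, "CPU3": 800}   # cycles per iteration
TOKENS = {"a": 50, "b": 100}          # a: P1 -> P2, b: P2 -> P3
LATENCY = {"BI": 3, "EL": 1}          # cycles per token

def throughput(map_a, map_b):
    """Iteration latency per processor; throughput = 1 / max latency."""
    lat = {
        "CPU1": COMPUTE["CPU1"] + TOKENS["a"] * LATENCY[map_a],
        "CPU2": COMPUTE["CPU2"] + TOKENS["a"] * LATENCY[map_a]
                                + TOKENS["b"] * LATENCY[map_b],
        "CPU3": COMPUTE["CPU3"] + TOKENS["b"] * LATENCY[map_b],
    }
    return 1 / max(lat.values())

cases = [("BI", "BI"), ("EL", "BI"), ("BI", "EL"), ("EL", "EL")]
for case, (a, b) in enumerate(cases, 1):
    print("Case %d: a->%s, b->%s, throughput = 1/%d"
          % (case, a, b, round(1 / throughput(a, b))))
# Cases 1 and 3 both print 1/1450; cases 2 and 4 both print 1/1350, so the
# express link on channel b never changes the throughput.
```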

It is interesting to note that cases 1 and 3 have the same throughput, even with an additional express link in case 3. Likewise, cases 2 and 4 have the same throughput, but case 2 uses one fewer express link. Therefore, the fact that non-critical processors in a logical pipeline can tolerate communication delays without affecting the application throughput can be exploited to reduce the number of express links. If a traffic volume-based algorithm is used, then channel b will be mapped to an express link first, and channel a will be mapped to another express link. This results in case 4 as the final MPSoC design, which is a non-optimal solution. On the other hand, if the channel of the critical processor (CPU1 is critical when both channels are mapped to the baseline interconnect) is considered first, then only channel a will be mapped to an express link, resulting in the optimal design. Therefore, we propose that both the processor latencies and the traffic volume should be used to decide the number of express links, to avoid the unnecessary addition of express links.

6.1.2 Novel Contribution

This work proposes a framework that adds an optimal or near-optimal number of express links to the baseline interconnect of an MPSoC for inter-processor communication under a given throughput constraint. The addition of express links is oblivious to the baseline interconnect, and thus can be used with bus or NoC based communication architectures. For the experiments, we applied our framework to two different MPSoC interconnect topologies, a crossbar and a mesh NoC, and compared our heuristic to a traditional iterative heuristic adapted from [188]. The main contributions of this work are:

• A framework for the selective addition of a minimal number of express links to guarantee the throughput of a streaming application on an MPSoC with a baseline communication architecture.

• A novel heuristic which considers both processor latencies and traffic volume to efficiently determine the number of express links. The heuristic is designed to efficiently prune the design space and reach a solution quickly.

The rest of the chapter is organised as follows: Section 6.2 encompasses the state of the art, Section 6.3 formalises the problem, and Section 6.4 describes the proposed framework. The experimental results for the crossbar-based MPSoC are discussed in Section 6.5, and the results for the mesh NoC based MPSoC are discussed in Section 6.6.

6.2 Related Work

Over the past few years, researchers have expressed a keen interest in designing application-specific MPSoCs for streaming applications. The authors of [159] and [193] explore MPSoC architectures for throughput-constrained streaming applications which use hardware queues for all inter-processor communication. Similarly, the architecture proposed in [194] uses high speed memory buffers for all the communicating processors. We argue that mapping every processor-to-processor communication to a separate direct link may result in over-design, and hence a better technique for the selective addition of such links is needed. Complete synthesis of custom communication architectures based on NoCs and crossbars is discussed in [86] and [92], respectively. The authors of [195] have explored the use of either a crossbar or a P2P network for MPSoC synthesis. Recent work by Castrillon et al. [148] considers the option of having multiple on-chip communication primitives for MPSoCs, such as direct links or shared memory. However, they consider a predefined MPSoC platform with a fixed number of communication primitives. Thus, they focused on mapping the KPN of an application to the MPSoC to maximise its throughput under constraints on the number and the memory size of the communication primitives. Unlike [148], we minimise the area of express links under a throughput constraint, which is typical of real-time streaming applications [159].

Several techniques have been proposed to improve the performance of a baseline communication architecture by the addition of express links. Some classical works in this area add random links to improve the overall system performance [196, 197]. Chan et al. [190] introduced the idea of synthesising a NoC-based communication architecture that is a mix of a packet-switched network and direct point-to-point links. Another well-known work presented in [188] added a few long-range links within a NoC to improve the MPSoC performance, measured in terms of the "critical load point" of the system. Other similar techniques have been proposed in [191, 198]. Some futuristic works are proposed in [199] and [189], where the authors suggest the use of overlay point-to-point wireless links and high frequency RF links for the most frequently communicating tiles in a mesh based NoC architecture. All the techniques referred to above improve the performance of on-chip communication in terms of aggregate on-chip bandwidth and average data latency, which are not suitable metrics for streaming applications executing on MPSoCs. Our framework uses both the latencies of processors and the traffic volume to evaluate the need for additional express links under a throughput constraint, as throughput is the most important performance metric for streaming applications [159, 193].

6.3 Problem Definition

We target MPSoC architectures similar to the one presented in Figure 6.2. We assume that mapping of a streaming application's KPN has already been performed by using one of the tools proposed in the literature [200]. The communication in the MPSoC is represented as a directed graph G(P, C), where vertices P represent the processors and edges C describe the inter-processor communication. The total number of processors is |P| while the total number of edges is |C|. The edges of the graph are annotated with the amount of data exchanged between processors during an iteration of the application. The MPSoC topology along with the baseline communication architecture, MT, is provided by the designer, where all the communication edges are initially mapped to the baseline interconnect. Our goal is to achieve a particular throughput for the application (provided by the designer) through the addition of a minimal number of express links, C_EL. Since express links can be realised differently, we abstract their details with the use of a cost function CF(C), provided by the designer, which will return the gate cost of implementing an express link for a particular communication edge C. In this framework, we add an extra port to the processor's interface for communicating through the express link. Such a facility is provided in some of the more modern processor IPs, such as Tensilica's Xtensa processor range. The size of an express link's buffer is equal to the volume of the data transferred between processors in a single iteration [159], thus ensuring quality of service (QoS) on that particular express link under a bursty communication pattern. Therefore, the cost function is:

CF(C) = PortAreaCost + BufferAreaCost(V) + WiringCost    (1)

where V is the volume of data for communication edge C and BufferAreaCost(V) is the function that returns the area footprint of the express link's buffer. Now, we can formally define our problem:

Given:

G(P, C): Communication model of the MPSoC

CF(C): Express link cost model

MT: MPSoC topology with baseline interconnect

T_H: Application throughput constraint

Return:

MT′: MPSoC topology, which is an enhanced version of MT after the addition of the express link set C_EL.

Such That:

Application Throughput T ≥ T_H, and,

Minimize( Σ CF(C_EL) )

The throughput of the application, T, on the MPSoC is the inverse of the latency of the critical processor in the logical pipeline. That is,

T = 1/L_C

where L_C is the latency of the critical processor. The latency of a processor P, L_P, is calculated by subtracting the waiting time spent in a blocking read from, or a blocking write to, the communication buffer (baseline interconnect or express link) from the total time spent by the processor in an iteration. The latency L_P covers both the net computation and the communication time of the processor P [159]. Formally:

L_P = IterationTime − BlockingReadDelays − BlockingWriteDelays

The inter-processor communication time, CT, for a particular communication edge can be formally defined as:

CT(V, BW_B) = V/BW_B + H_B + β_B + d(s, t)

where BW_B and H_B are the bandwidth and intrinsic hop delay of the baseline communication architecture, and β_B is the software overhead of managing communication buffers. Note that V, BW_B, H_B and β_B are constants for a given application and communication architecture. The factor d(s, t) captures the delays incurred due to arbitration for shared resources of the baseline architecture (for example, routers, bus, memories), and is a function of time t and the number of hops s. The variable d(s, t) can be further decomposed into sub-factors, representing each hop resource i within the communication path:

d(s, t) = Σ_{i=0}^{n} d(s_i, t)

The variance of d(s, t) under different traffic conditions makes the communication time unpredictable. Introducing express links for a particular communication edge removes the contention and arbitration overhead (d(s, t) → 0), and hence reduces CT to:

CT(V, BW_EL) = V/BW_EL + H_EL + β_EL

where BW_EL, H_EL and β_EL represent the bandwidth, hop/pipelined delay and software overhead of the express links. In this framework we assume that there exist only two possible inter-processor communication primitives: the baseline communication architecture and express links. Moreover, we assume that data transfer through express links is always faster than through the baseline interconnect.
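To make the cost and timing models concrete, the minimal Python sketch below evaluates Equation 1 and the two CT expressions. All numeric constants and the linear buffer-area model are illustrative placeholders of our own, not values from this chapter; in the actual framework the buffer area would come from a tool such as CACTI and the overheads from the simulated platform.

    # Minimal sketch of the express link cost model (Equation 1) and the two
    # communication time expressions. All constants are hypothetical.

    PORT_AREA_COST = 0.010   # placeholder port area per express link
    WIRING_COST    = 0.005   # placeholder wiring cost

    def buffer_area_cost(v):
        """Area of a buffer sized to hold one iteration's data volume v."""
        return 1e-6 * v      # placeholder linear model (CACTI would be used here)

    def cf(v):
        """CF(C) = PortAreaCost + BufferAreaCost(V) + WiringCost."""
        return PORT_AREA_COST + buffer_area_cost(v) + WIRING_COST

    def ct_baseline(v, bw_b, h_b, beta_b, d_st):
        """CT over the baseline interconnect: V/BW_B + H_B + beta_B + d(s, t)."""
        return v / bw_b + h_b + beta_b + d_st

    def ct_express(v, bw_el, h_el, beta_el):
        """CT over an express link: the contention term d(s, t) vanishes."""
        return v / bw_el + h_el + beta_el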

6.4 Framework Overview

Figure 6.3 shows a typical design flow [195] for the creation of customised MPSoCs, and how our framework fits in it. A designer provides the baseline MPSoC architecture and the application KPN. The second phase involves mapping of the KPN onto the MPSoC, including selection of appropriate processing elements (which mimics customisation of the processing elements). Then the communication architecture of the MPSoC is customised. Note that customisation of the processing elements and communication architecture should be done simultaneously; however, such a customisation results in a prohibitively large design space, often with impractical exploration times [201]. Finally, the customised MPSoC is synthesised.

Our framework takes the baseline MPSoC architecture, throughput constraint and cost function for express links as the input. The framework then simulates the

MPSoC by mapping all the communication edges to the baseline interconnect. If the required throughput is achieved, the framework exits with a solution that does not contain any express links. However, if the throughput achieved in the previous step is less than the required throughput; then all the communication edges are mapped to individual express links and the MPSoC is simulated again. Under the assumption that express links always provide faster communication, if the MPSoC still fails to meet the throughput constraint, the framework exits with a fail report. On the other hand, if the throughput constraint is met by the MPSoC, then we can infer that there exists a solution where the required throughput can be achieved with the number of express links less than or equal to C . Here, the framework | | runs our novel heuristic for insertion of a minimal number of express links.
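This top-level flow can be summarised in the short sketch below. Here simulate() stands in for the cycle-accurate MPSoC simulation and xlink_heuristic() for the heuristic of Section 6.4.1.2; both names, and the exception used for the fail report, are our own illustrative choices rather than the framework's actual code.

    # Sketch of the framework's top-level flow, assuming simulate(mapping)
    # returns the achieved throughput for a given edge-to-interconnect mapping.

    def framework(edges, t_h, simulate, xlink_heuristic):
        mapping = {e: "BI" for e in edges}      # all edges on baseline interconnect
        if simulate(mapping) >= t_h:
            return mapping                      # no express links needed
        all_el = {e: "EL" for e in edges}       # upper bound: every edge express
        if simulate(all_el) < t_h:
            raise RuntimeError("fail: constraint unmet even with all express links")
        # A solution with at most |C| express links exists; search for a minimal one.
        return xlink_heuristic(mapping, t_h, simulate)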

We use cycle-accurate simulations of the MPSoC in our framework. We argue that on-chip shared resources such as the communication architecture, and the associated routers and memories, exhibit a complex, dynamic behaviour under different workloads, as described in Equation 1. Moreover, the data traffic patterns of various processors can interact in a non-deterministic manner [202]. Therefore, it is hard to develop a useful analytical performance model of the MPSoC targeted in this work, which has also been mentioned in [203]. Even though some of the frameworks presented in the literature do use analytical models, we advocate the use of real simulations for accurate computation of the MPSoC throughput. However, an analytical model can be incorporated in our framework as long as it can predict the MPSoC throughput with high accuracy.

6.4.1 Express Link Insertion Algorithms

6.4.1.1 Exhaustive Algorithm

The problem of insertion of express links can be naively solved by using an exhaustive algorithm, where all the possible additions of express links (the design space) are explored. For example, the exhaustive algorithm will simulate all four design points (cases) reported in Figure 2. Then, those design points that violate the throughput constraint are deleted. From the remaining design points, the algorithm chooses the one with minimal cost, computed using the express link cost function provided by the designer. The time complexity of this algorithm is given by O(2^|C|), where |C| is the total number of communication edges in the application communication graph. This algorithm has an exponential run time, which makes it impractical when there are more than a few communication edges.

Figure 6.3: Framework overview
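A direct rendering of this exhaustive search is sketched below; simulate() is the same illustrative stand-in for the cycle-accurate simulation used earlier, and cf(e) is assumed to return the express link cost of edge e.

    # Sketch of the exhaustive O(2^|C|) search: enumerate every subset of edges
    # mapped to express links, discard infeasible points, return the cheapest.

    from itertools import combinations

    def exhaustive(edges, t_h, simulate, cf):
        best, best_cost = None, float("inf")
        for k in range(len(edges) + 1):
            for subset in combinations(edges, k):
                mapping = {e: ("EL" if e in subset else "BI") for e in edges}
                if simulate(mapping) >= t_h:            # feasibility check
                    cost = sum(cf(e) for e in subset)   # express link area
                    if cost < best_cost:
                        best, best_cost = mapping, cost
        return best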

Algorithm 6.1: xLink Heuristic
1   Map all edges in G(P, C) to BI
2   T_H ← Application throughput constraint
3   T ← MPSoC throughput after simulation
4   if T < T_H then
5       if first iteration then
6           µ_P ← initial value from pre-framework profiling
7       else
8           L_1 ← Latency improvement for P_CH
9           L_2 ← Latency improvement for P_C
10          V ← Volume(Edge between P_CH and P_C)
11          µ_P ← min(L_1, L_2)/V
12      R_H ← list of all P where L_P > (1/T_H)
13      Sort R_H in descending order of L_P
14      P_C ← First processor in list R_H
15      R_L ← list of all P where L_P ≤ (1/T_H)
16      R_HC ← list of all members of R_H having a BI edge with P_C
17      if R_HC ≠ ø then
18          P_CH ← Processor in R_HC with highest latency
19          Map edge between P_CH and P_C to EL
20          goto Step 3
21      else
22          C_CL ← All BI edges from P_C to R_L
23          SF ← (T_H − T)/µ_P
24          C′_CL ← Edges in C_CL where Volume(C_CL) ≥ SF
25          Sort C′_CL in ascending order of data volume
26          Pick the first edge in C′_CL and map it to EL
27          P_CH ← Processor connected to P_C through the edge selected in Step 26
28          goto Step 3
29  else
30      return Solution

6.4.1.2 Express Link Insertion Heuristic (xLink)

We now introduce our heuristic, xLink, for the insertion of express links, which efficiently explores the design space and gives an optimal or near-optimal solution (Algorithm 6.1).

Our heuristic uses the latencies of the processors in the MPSoC for efficient pruning of the design space. The algorithm starts by mapping all communication edges to the baseline interconnect (BI). After simulating the MPSoC, the algorithm checks whether the required throughput has been achieved or not. If the throughput has been achieved, then the algorithm exits and returns the new MPSoC topology with the current number of express links. However, if the throughput is not met, then the algorithm has to find a candidate link that is currently mapped to the baseline interconnect. At this stage, the algorithm divides all the processors into two sets: R_H contains processors that have latency higher than 1/T_H (Step 12), where T_H is the required throughput; and R_L contains the processors with latency less than or equal to 1/T_H (Step 15). Here the heuristic uses the intuition that processors in the set R_L are non-critical, and the addition of an express link between them will not improve the MPSoC throughput. Therefore, preference is given to processors in set R_H. Additionally, selection of a BI edge that will reduce the latencies of more than one processor from set R_H at the same time will be more beneficial. Let processor P_C be the critical processor in the MPSoC with the highest latency (Steps 13-14). Then, the set R_HC is populated, containing all the processors from R_H that have a BI edge with processor P_C (Step 16). Let processor P_CH be the one that has the highest latency within the set R_HC. The edge between P_C and P_CH then represents the edge between the two most critical processors which is still mapped to BI, and hence the heuristic maps this edge to an EL (Steps 18-19). Steps 18-19 are based on the intuition that the processor with latency close to the latency of the critical processor has the potential to become the next critical processor.

It is possible that processor P_C does not have any BI edge with the processors in the set R_H (i.e. R_HC = ø). In this case, the heuristic looks for a BI edge between processor P_C and processors in the set R_L (Step 22). The intuition is to change an edge from BI to EL so that the latency of P_C falls just below 1/T_H, i.e., its throughput just meets the constraint. Therefore, we introduce a prediction factor µ_P, which predicts the approximate improvement in latency per unit data volume of a processor when one of its edges is changed from BI to EL. Generally speaking, the factor µ_P helps the heuristic predict the volume of the candidate edge that should be changed to EL, and thus leads to a better solution than simply picking the edge with the largest or the smallest data volume (Steps 22-26). The value of µ_P is set using the following method:

• For the first iteration of the algorithm, an initial value is selected based on short pre-framework profiling of the specific application (Steps 5-6, Algorithm 6.1).

• At the start of each subsequent iteration, a new value of µ_P is calculated based on the improvement in the latency of the processors from the previous iteration, when an edge between these processors was mapped from BI to EL (Steps 8-11, Algorithm 6.1).

The prediction factor µ_P can be time-dependent. The authors of [188] observed that adding long-range links to the network eased some of the paths in the network due to isolation of data streams, hence reducing the average packet latency. However, the complex interconnect behaviour can introduce errors in this estimation, which can affect the optimality of the heuristic. Our proposed heuristic has a linear run time O(|C|) because it changes only one communication edge during each iteration.
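Under the simplifying assumption that the cycle-accurate simulation is exposed through the callables simulate() and latency(), the xLink heuristic could be sketched as follows. The helper names, the fallback when no candidate edge exceeds the predicted volume SF, and the dimension-consistent form of SF (the latency excess of P_C divided by µ_P) are our own reading of Algorithm 6.1, not the thesis' implementation.

    # Sketch of the xLink heuristic (Algorithm 6.1). edges are (src, dst)
    # tuples, vol maps an edge to its data volume, and mu_p is the initial
    # prediction factor obtained from pre-framework profiling.

    def xlink(procs, edges, vol, t_h, simulate, latency, mu_p):
        mapping = {e: "BI" for e in edges}                      # step 1
        prev = None                                             # last changed edge
        while True:
            lat = {p: latency(p, mapping) for p in procs}
            if simulate(mapping) >= t_h:
                return mapping                                  # steps 29-30
            if prev is not None:                                # steps 8-11
                e_old, lat_old = prev
                gain = min(lat_old[e_old[0]] - lat[e_old[0]],
                           lat_old[e_old[1]] - lat[e_old[1]])
                mu_p = max(gain, 1e-9) / vol[e_old]             # improvement per volume
            r_h = sorted((p for p in procs if lat[p] > 1.0 / t_h),
                         key=lambda p: lat[p], reverse=True)    # steps 12-13
            p_c = r_h[0]                                        # step 14: critical CPU
            def bi_edges_to(group):                             # BI edges P_C <-> group
                return [e for e in edges if mapping[e] == "BI" and p_c in e
                        and (e[0] in group or e[1] in group)]
            r_hc = bi_edges_to(set(r_h) - {p_c})                # step 16
            if r_hc:                                            # steps 17-20
                e = max(r_hc, key=lambda e: lat[e[1] if e[0] == p_c else e[0]])
            else:                                               # steps 21-27
                cand = bi_edges_to(set(procs) - set(r_h))
                sf = (lat[p_c] - 1.0 / t_h) / mu_p              # predicted volume
                picked = [e for e in cand if vol[e] >= sf] or cand
                e = min(picked, key=lambda e: vol[e])           # steps 24-26
            mapping[e] = "EL"                                   # changeover to EL
            prev = (e, lat)                                     # latencies before change

The latencies recorded just before a changeover (prev) are reused at the top of the next iteration to refresh µ_P, mirroring Steps 8-11 of the listing; the sketch also assumes P_C always has at least one remaining BI edge, which the framework's feasibility pre-check guarantees.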

6.4.1.3 Iterative Algorithm for Comparison

For the sake of comparison, we also present a heuristic (Algorithm 6.2) that is adapted from [188]. We modified their heuristic by changing the objective function from maximising the critical load to minimising the express link area footprint, and the area constraint to a throughput constraint. This heuristic changes all the BI edges to EL edges one by one and simulates the MPSoC to observe the throughput improvement (Steps 4-8). When none of the individual EL edge changeovers meets the throughput constraint, the edge with the highest improvement/volume ratio is permanently changed to an EL (Step 10).

Algorithm 6.2: Iterative Heuristic adapted from [188]
1   Map all edges in G(P, C) to BI
2   T_H ← Application throughput constraint
3   T ← MPSoC throughput after simulation
4   if T < T_H then
5       foreach edge E currently mapped to BI do
6           Map E to EL, simulate the MPSoC and record the throughput
7           Revert E to BI
8       end
9       if no edge changeover guarantees T ≥ T_H then
10          Permanently map the edge with the highest improvement/volume ratio to EL
11      else
12          Permanently map the smallest-volume edge that meets T_H to EL
13      goto Step 3
14  else
15      return Solution

Otherwise, the BI edge with the smallest volume is changed to an EL (Step 12). This process is repeated iteratively until the throughput constraint is met. The worst-case run time of this heuristic is O(|C|²) because, in each iteration, the heuristic cycles through all the edges that are mapped to BI.
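A compact sketch of this iterative scheme, again using our illustrative simulate() stand-in for the cycle-accurate simulation:

    # Sketch of the iterative heuristic adapted from [188] (Algorithm 6.2).
    # Each round tries every remaining BI edge as an express link before
    # committing one changeover permanently.

    def iterative(edges, vol, t_h, simulate):
        mapping = {e: "BI" for e in edges}
        t = simulate(mapping)
        while t < t_h:
            trials = {}
            for e in [e for e in edges if mapping[e] == "BI"]:
                mapping[e] = "EL"                   # steps 5-8: trial changeover
                trials[e] = simulate(mapping)
                mapping[e] = "BI"                   # revert after the trial
            feasible = [e for e in trials if trials[e] >= t_h]
            if feasible:                            # step 12: cheapest feasible edge
                e = min(feasible, key=lambda e: vol[e])
            else:                                   # step 10: best improvement/volume
                e = max(trials, key=lambda e: (trials[e] - t) / vol[e])
            mapping[e] = "EL"                       # permanent changeover
            t = simulate(mapping)
        return mapping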

Note that the branch and bound algorithm proposed in [191] could also have been used for comparison. The algorithm in [191] limits the searchable design space by the number of ELs to be inserted (S) and the number of ways of inserting those ELs (B). The optimal solution is sought when these factors are set to |C|, in which case the complexity of the algorithm in [191] is the same as the exhaustive search. Since a designer cannot predetermine the best values of these factors, we believe that the use of arbitrary values would lead to an unfair comparison. Therefore, we did not compare xLink with [191].

                         JPEG   H.264   Media1   Media2   FFT   TDE
  # of Processors           6       6        8        9    12    13
  # of Edges                5       8       10       12    12    12
  Node Max Fan in/out     1/1     2/2      2/2      2/2   2/2   1/1

Table 6.2: Benchmarks for Crossbar NoC based MPSoC

6.5 Case Study: Crossbar NoC Based MPSoC

6.5.1 Experiment Setup

Our first case study uses a crossbar NoC based MPSoC. To verify our proposed framework, we used a commercial design environment from Tensilica [81]. XTSC, a SystemC based cycle-accurate simulator, is employed to simulate the MPSoC shown in Figure 6.4. SystemC based simulation models of Xtensa LX4 processors are used in this experiment. The baseline interconnect is a non-blocking crossbar NoC which is organised in a master-slave router configuration. The flit buffers of the routers are sized to store one cache line, while the hop latency of each router is two cycles. The slave routers follow round-robin arbitration for serving the incoming requests. A number of shared on-chip memory banks are distributed across the NoC to implement software managed FIFO buffers for inter-processor communication. The DRAM is used to serve the instruction and data caches. Further details of the MPSoC are included in Table 6.3. For experiments, we used two streaming applications, a JPEG encoder and an H.264 encoder, the FFT and TDE signal processing applications from StreamIt [204], and two synthetic streaming applications, Media1 and Media2, which are derived from the original H.264 application. The details of these applications, when mapped on the MPSoC, are provided in Table 6.2. All these benchmarks are a mixture of linear pipeline graphs and complex graphs, as depicted by the nodes' maximum fan in/out values. For simulations, we used an Intel Xeon based server, with 256GB RAM, and Ubuntu Server 10.10 OS.

We used hardware queues to realise express links between processors in our experiments. Xtensa LX4 provides queue ports for high bandwidth point-to-point communication.

Figure 6.4: Crossbar NoC based MPSoC for Experiments

These ports are directly interfaced with the ALU of the processor and can therefore be read and written using standard register access instructions. We used Tensilica's area tool to estimate the area footprint of the queue ports, and CACTI [205] for the area footprint of the queue buffer for a given 45nm technology.

  Benchmarks              JPEG   H.264   Media1   Media2   FFT   TDE
  Processor               Xtensa LX4 @ 1GHz
  I/D Cache Size          1, 4 KB / 2, 4, 6, 8, 16 KB
  Cache Ways              1, 2 Ways
  Custom Instruction      No     Yes     No       No       No    No
  Base Communication      Pipelined Crossbar with Master-Slave Routers
  Data Width              32-128 bit
  On-Chip SM Latency      4 cycles
  DRAM Latency            20 cycles

Table 6.3: Architectural Details for Crossbar NoC based MPSoC

Table 6.4: Detailed Experiment Results for Crossbar NoC based MPSoC (for each benchmark and throughput constraint #1-#5: the express link area overhead (mm²) of the optimal solution, the % area error of the xLink and Iterative heuristics, the design points explored, and the framework runtime of the exhaustive, xLink and Iterative algorithms)

6.5.2 Results and Discussion

The summary of results for both the xLink and the Iterative (adapted from [188]) heuristics for the crossbar NoC based MPSoC is reported in Figure 6.5. We provided different throughput constraints for each application and recorded the area footprint of the express links and the runtime of the framework. The area overhead of express links encompasses the cost of adding a port to the processor and the buffer cost associated with the express link. The quality of the solutions generated by the heuristics was compared to the optimal solutions from the exhaustive algorithm. Detailed results are included in Table 6.4. Major column 1 lists the area of the express links, while major columns 2 and 3 list the design points explored and the framework runtime, respectively.

Figures 6.5c and 6.5d compare the average and maximum % error in the area footprint results generated by the two heuristics. For the JPEG, TDE and H.264 applications, both heuristics generated optimal solutions. However, for the other three benchmarks, the maximum and average errors of the Iterative heuristic are higher. The Iterative heuristic generated a maximum error of 45% for Media1, 29% for FFT and 25% for Media2. In the case of xLink, there was a maximum error of 20% for the FFT benchmark. The source of this error is the condition when a critical processor has only BI links to non-critical processors, and xLink has to predict a link for changeover to EL (Steps 13-20, Algorithm 6.1). Such conditions are rare and only occur in MPSoCs with sparse communication patterns. Despite this, our results show that heuristics which do not take into account the latency information of the processors are more prone to reaching a local minimum.

The xLink heuristic also performed better than the Iterative heuristic in terms of the design space explored. On average, xLink explored 7× fewer design points than the Iterative heuristic (Figure 6.5a). Given the quality of the solutions generated by xLink, this is a significant improvement. Similarly, the xLink heuristic has a considerably lower runtime than both the Iterative heuristic and the exhaustive algorithm for all applications, as shown in Figure 6.5b (note that the y-axis scale is logarithmic). For example, the average runtime of xLink for the TDE application was 8 minutes. The Iterative heuristic consumed 1h 20m, while the exhaustive algorithm consumed 5 days for the same application. Similarly, for the other applications, the runtime of the xLink heuristic is in minutes, compared to hours for the Iterative heuristic and days for the exhaustive algorithm. This result signifies that the use of processor latencies can help in efficient pruning of the design space, and hence lead to the solution quickly. Thus, the xLink heuristic is more scalable than the Iterative heuristic and the exhaustive algorithm.

 6"1J@ & V`: 10V  6"1J@ & V`: 10V ,6.:%/ 10V      !V#1$J'(Q1J #'+6]CQ`VR   !0$$%%J 1IV$^I1J% V-_

(a) Average Design points explored (b) Average Framework Runtime  6"1J@ & V`: 10V  6"1J@ & V`: 10V         

   Q"#0V`:$V")``Q`" 

 Q"#:61I%I")``Q`  

(c) Average error in Express Link Area (d) Max error in Express Link Are

Figure 6.5: Summary of Results for Crossbar NoC based MPSoC. 6.6. CASE STUDY: MESH NOC BASED MPSOC 149

                         VOPD   TGFF-1   MWD
  # of Processors          12       16    12
  # of Edges               13       19    12
  Node Max Fan in/out     2/2      3/2   2/2

Table 6.5: Benchmarks for Mesh NoC based MPSoC

6.6 Case Study: Mesh NoC Based MPSoC

6.6.1 Experiment Setup

Our second case study is based on a 2D mesh NoC based MPSoC as shown in Figure 6.6. The mesh NoC architecture is modelled in Verilog HDL, while the processing nodes are modelled in SystemC.

  Benchmarks                VOPD           TGFF-1         MWD
  Processor                 Xtensa LX4 @ 1GHz
  Memories                  Private Instruction and Data Memories
  Network Topology          4×3 2D Mesh    4×4 2D Mesh    4×3 2D Mesh
  Network Router            2 Cycle Latency, 4 flit Buffer, Matrix Arbiter
  Channel Width             32 bit
  Network Interface         Bi-directional, 128 flit deep receiver FIFO
  End-to-End Flow Control   Software managed, credit based

Table 6.6: Architectural Details for Mesh NoC based MPSoC


Figure 6.6: Mesh NoC based MPSoC

We used Tensilica's SystemC tool, XTSC, and ModelSim for co-simulation of the processors and the interconnect. Processing elements in the MPSoC are Xtensa LX4 processors with local instruction and data memory. The network interface (NI) and the express links are directly interfaced with the ALU of the Xtensa processor for low-latency network access. Each NI has a 128 flit deep incoming data FIFO. We used a software-based credit exchange system for NI buffer management to avoid application-level deadlock [206]. The baseline interconnect is modelled as a 2D mesh-tiled architecture where each processing element is connected to a unique network router. The routers are connected to their neighbouring routers using bi-directional links. Each network router has a two cycle hop latency and a 4 flit buffer at each input port. We employ XY dimension routing as it is inherently deadlock free. Processing elements communicate directly with each other using a Message Passing Interface (MPI) like software API over the NoC. Other architectural details are included in Table 6.6. For benchmarking we used three application graphs: 1) Video Object Plane Decoder (VOPD) from [206], 2) the TGFF 1 application task graph generated from the TGFF tool [207], and 3) Multi Window Display (MWD) from [206]. The details of these applications, when mapped on the MPSoC, are provided in Table 6.5. We used signal processing kernels for the applications that mimic the behaviour of the original streaming applications. All these benchmarks consist of complex graphs with series and parallel pipeline stages, as depicted by the node fan-in/out values. An Intel Quad-core machine, with

8 GB of RAM and Ubuntu 10.10 operating system was used for simulations in this case study.


Figure 6.7: Experiment Tool Flow for Mesh NoC based MPSoC

The Verilog RTL for the complete interconnect (baseline interconnect and express links) was synthesised using the Synopsys Design Compiler tool (details in Figure 6.7). We used TSMC's 65-nm cell library for the NoC and express link synthesis, and Tensilica's area tool to estimate the area footprint of the hardware queue ports.

6.6.2 Results and Discussion

Figure 6.8: Interconnect Area for Mesh NoC + Express Links using xLink (normalized w.r.t Iterative)

Figure 6.9: Framework Runtime Speedup for NoC based MPSoC using xLink (normalized w.r.t Iterative)

                         Total Interconnect     Design Points              Framework
                         Area (K Gates)         Explored                   Runtime
          Constraint     xLink     Iterative    xLink  Iter.  Exhaustive   xLink   Iterative
  VOPD    #1             1087.9    1229.1         6     88      8192        12m     2h 56m
          #2              966.5     966.5         5     55                  10m     1h 50m
          #3              869.7     869.7         3     36                   6m     1h 12m
  TGFF-1  #1             1118.3    1167.1         7    143    524288         7m     2h 23m
          #2             1055.9    1142.1         4    143                   4m     2h 23m
          #3             1031.1    1030.8         2     48                   2m     48m
  MWD     #1              860.8     860.8         7     63      4096        14m     2h 6m
          #2              817.1     817.1         5     50                  10m     1h 40m
          #3              773.6     772.6         2     23                   4m     23m

Table 6.7: Detailed Experiment Results for Mesh NoC based MPSoC

The results of our experiments with the mesh NoC based MPSoC are summarised in Figure 6.8 and Figure 6.9. For each benchmark, we provided different throughput constraints to the framework and recorded the performance of the xLink and Iterative heuristics in terms of framework runtime and reported NoC area footprint (baseline mesh NoC and express links). The exhaustive algorithm did not complete in time; therefore, for the mesh NoC, we compare only the xLink and Iterative heuristics. Detailed results are included in Table 6.7. Major column 1 lists the total area of the interconnect, while major columns 2 and 3 list the design points explored and the framework runtime, respectively.

Figure 6.8 reports the final interconnect area for xLink, which includes the cost of the baseline interconnect and the express links. The data has been normalised with respect to the data reported by the Iterative heuristic. In 3 out of 9 cases the xLink heuristic reported a lower interconnect area than the Iterative heuristic. For example, the xLink heuristic saved 11% and 8% interconnect area for VOPD constraint 1 and TGFF 1 constraint 2 respectively, compared to the Iterative heuristic. Our results signify that heuristics which do not take into account the latency information of the processors are more prone to yielding an over-designed interconnect. Apart from the design cost, a relatively larger silicon area also results in higher power consumption of the MPSoC.

Figure 6.9 summarises the runtime speedup of xLink with respect to the Iterative heuristic. The xLink heuristic performed better than the Iterative heuristic in framework runtime for the mesh NoC based MPSoC. On average, xLink was 27× faster than the Iterative algorithm in the case of TGFF 1, 13× faster in the case of VOPD and 10× faster in the case of MWD. For example, for constraint 1 in VOPD, the framework runtime for xLink and the Iterative algorithm was 12m and 2h 56m respectively. This is a significant achievement given the accuracy of the xLink algorithm. The use of processor latencies enabled xLink to visit fewer design points (Table 6.7), which resulted in a shorter runtime.

6.7 Conclusion

In this chapter we proposed a framework for customising the on-chip communication of an application-specific MPSoC by augmenting a baseline interconnect with express links, under a throughput constraint for the executing streaming application. As part of the framework, we proposed a novel heuristic, xLink, that considers both processor latencies and traffic volume to efficiently prune the design space and quickly reach the solution, and compared it to a traditional iterative heuristic [188]. Using case studies of crossbar and mesh NoC based MPSoCs, we showed that the xLink heuristic is more accurate and faster than the traditional iterative heuristic. Our novel heuristic is oblivious to the baseline interconnect, which is illustrated by the application of our framework to two different MPSoC interconnects. The low run-time and high accuracy of our framework can enable MPSoC designers to quickly investigate possible designs for the baseline interconnect, along with different implementations of express links. At the end, designers can perform a cost-performance tradeoff analysis to select a suitable on-chip communication architecture. Thus, adopting our novel xLink heuristic in an express link insertion framework can help MPSoC designers to reduce the time-to-market without sacrificing accuracy.

Chapter 7

Conclusions and Future Direction

7.1 Thesis Summary

Multicore architectures were introduced as a solution to the design problems associated with high frequency single-core monolithic architectures, and the Network-on-Chip (NoC) is envisioned as a scalable communication fabric for multicore chips containing hundreds of cores. Under Moore's Law, we continuously added more computing power in the shape of computing cores. However, continuous transistor size scaling and poor power density scaling have resulted in an abundance of transistors that cannot be switched on due to power and thermal budget constraints. This phenomenon is called Dark Silicon. Previous research on Dark Silicon concentrated on improving the energy efficiency of computing cores by introducing heterogeneity at the architecture and system level and by better thermal and power management schemes, completely neglecting the effect of dark silicon on NoC design flows. In this thesis, we explore different NoC architectures that exploit dark silicon to improve power, performance and reliability metrics. To the best of our knowledge, this is the first comprehensive study on the interplay of dark silicon and various NoC metrics.

In Chapter 2 we introduced the reader to some basic concepts of NoC design. We briefly discussed some current and emerging power saving techniques and we

discussed in detail the application of these techniques to NoCs. We observed that leakage power in NoCs is a growing concern in the design community. Architectural details of some of the recent academic and commercial NoCs were discussed, pointing towards a general trend of integrating multiple NoCs to improve different metrics. We outlined some of the application mapping techniques for NoC based MPSoCs that try to optimize different system parameters such as performance and energy. In the end, we introduced some design flows that have been proposed over the years for designing application-specific communication architectures.

In Chapter 3, a novel NoC architecture called darkNoC is introduced. darkNoC consists of multiple NoCs, where each NoC is optimised at design time using multi-Vt optimisation for a specific VF level. The main purpose of the architecture is to improve the efficiency of the DVFS scheme. We show that instead of applying DVFS to a router that has been designed to operate at a higher VF level, using a router that has been designed for the lower VF level is more energy efficient. Based on this observation, we propose using dark silicon to integrate multiple NoCs, and propose a scheme to switch between these NoCs in parallel with switching between different VF levels. The switch-over mechanism is developed using a combination of a global Region Manager (dRM) and local Local Managers (dLMs) which coordinate autonomously for seamless transition between NoC planes. Through experiments, we show that the switch-over process can take approximately 400–1000 ns for an 8×8 mesh, depending on various factors. Our experiments with real applications show that the darkNoC architecture can provide up to 52% saving in NoC energy delay product (EDP) for certain benchmarks, whereas a state-of-the-art DVFS scheme only saved 15% EDP. This shows the efficacy of our proposed architecture. We also discuss the effect of application characteristics, the available dark silicon budget and DVFS schemes on the overall NoC EDP savings.

In Chapter 4, we propose the Malleable NoC architecture, which improves the energy efficiency of the NoC through a combination of multiple VF optimised routers and per-node VF selection. We point out two important facts about the interplay of applications and NoC based multicores. First, modern multicores are expected to execute a diverse set of applications, and each application is expected to impose different constraints on network and memory latency. Second, most multicore chips use a mesh topology for simplicity, with memory controllers placed on the periphery of the mesh. Therefore, the network latency component of memory accesses varies depending on the physical location of a core in the mesh. Based on these observations, we classify our applications on the basis of misses per kilo instructions (MPKI) and map applications with high MPKI closer to the memory controllers. To exploit such a mapping, we propose a NoC architecture where multiple VF optimised routers are integrated at each mesh node. At a given time, only one of the routers is switched on, depending on the VF level chosen for that node. At runtime, depending on an application's MPKI and the distance of the core from the memory controller, we choose a VF level for each node, and the system autonomously connects routers from each node to realise this VF map. Our experimental results show that a fine-grain VF and router selection scheme gives better EDP savings than the darkNoC architecture for workloads with heterogeneous MPKIs. For example, in one of the benchmark applications, darkNoC saved 26.4% NoC EDP whereas Malleable NoC saved 46% EDP. NoCs have to compete for dark silicon with other components such as application specific accelerators. Therefore, we also discussed results for partial Malleable NoC configurations, where only a subset of nodes contains routers for a lower VF level.

In Chapters 3 and 4 we optimised the energy consumption of NoCs by exploiting dark silicon. In Chapter 5 we presented a novel NoC architecture, SuperNet, that optimises energy, performance and reliability of the on-chip interconnect in exchange for dark silicon. We are motivated by the fact that chip design costs are rising and it is therefore becoming normal practice to design SoCs for multiple use scenarios. SuperNet consists of multiple parallel NoC planes that are optimised for the VF levels [V1, f] and [V2, f/2]. At a given time, SuperNet can be configured to run in energy efficient mode, reliability mode or performance mode, depending on the available power and optimisation goals. In energy efficient mode, depending on the MPKI values of all the cores, the DVFS manager decides to run the NoC at a low or high VF level and, depending on the VF level, the NoC designed for that VF is switched on while the other NoCs remain switched off. Our experiments show that the proposed mode can save up to 81.5% power for the sha benchmark with a small decrease of 0.1% in average IPC across all cores, whereas a single NoC with DVFS only saved 50% power. The performance mode is based on the fact that not all network traffic is critical for an application's performance. In general-purpose cache memory based cores, cache loads are on the critical path of the application's execution time, as an in-order processor stalls in the case of a cache load miss, whereas a cache store operation can be executed in parallel with normal execution. In performance mode, depending on the power budget, both NoCs are switched on at the same time. To avoid interference between critical and non-critical traffic, the data packets associated with cache load operations are injected into the high VF NoC, and data packets associated with cache store operations are injected into the low VF NoC. During our experimental evaluations, performance mode improved the average IPC for two applications, h264enc and bzip2, by 13% and 11% respectively, at the cost of a 14% increase in NoC power. To summarise, energy efficient mode is useful for low MPKI applications, whereas performance mode is more useful for high MPKI applications. The reliability mode defends against soft errors in both the data path elements and the control circuit of the NoCs. Both NoCs are operated at the lower VF level. For data path protection, the data is sent on the main NoC while byte-oriented BCH(8,5) error correction codes are sent on the secondary NoC. The correction bits are added at the source node and the data is checked for errors at the time of receiving at the destination node. The routers at each node are architecturally homogeneous and they are reset to a known state at the start of the mode. The control information part of every ejected flit from the ports of the two routers is checked for equivalence, essentially resulting in lock-step processing. However, this reliability comes at the cost of a drop in performance, as the NoC is always operated at the lower frequency.

Finally, in Chapter 6, we proposed a design flow for customising the on-chip interconnect of application-specific MPSoCs, targeting streaming applications. Streaming applications are normally realised using a set of pipelined processors, and data is passed between the different pipeline processors using on-chip communication. Streaming applications such as an H.264 decoder are often constrained by a minimum throughput requirement (e.g., 30 frames/s). We identified that previous studies on this topic only considered data volume and completely neglected the computational latencies of the cores. In our framework, we consider both data volume and pipeline latency to insert an optimal number of express communication links between certain cores, so that communication latency is reduced and the pipeline throughput meets a given constraint. The express links have a lower latency than the baseline interconnect but carry an additional area cost. Simulating every possible combination of express links and baseline communication is prohibitively time consuming because of the exponential design space. To reduce the runtime of the framework, we proposed a heuristic with linear time complexity. We compared our framework with a previous state-of-the-art heuristic and showed that our framework was up to 27× faster, while our proposed heuristic always resulted in optimal or near-optimal solutions in terms of the additional area required for express links.

7.2 Future Directions

In the darkNoC architecture, the routers used in all NoCs are architecturally homogeneous. However, it would be interesting to study the effect of heterogeneous router microarchitectures. Some of the router parameters that could be varied are channel width, pipeline stages, complexity of the switch allocators and buffer sizing. Heterogeneous NoC architectures can potentially provide better control over the dark silicon area and power saving tradeoff.

Another way of designing a heterogeneous darkNoC would be to use both buffered and bufferless NoC architectures. We discussed in Chapter 2 that bufferless NoCs offer potential energy savings at the cost of a significant performance penalty for network-bound applications. However, with a combination of buffered and bufferless NoCs, the bufferless NoC can be intelligently used only for application phases with low traffic injection.

In the Malleable NoC architecture, the routers at each node are switched only according to the applications' performance profiles. The transistor ageing phenomenon in modern multicores is attracting the attention of researchers [208]. An interesting future research direction would be to add router ageing awareness to the energy optimisation algorithm such that all routers age gracefully. This would require keeping track of the time for which a given router is activated. The chip user can opt for either an energy efficient policy or an ageing-aware policy, depending on the use scenario.

In the SuperNet architecture, the NoC is exclusively operated in one of the three possible modes. However, SuperNet can be easily modified to operate in multiple modes at the same time by dividing the physical fabric into multiple logical networks and operating each logical network in a mode chosen independently. For example, half of the chip can be used to map applications that are safety-critical, whereas the other half can be used to run normal applications.

Bibliography

[1] A. Shafaei Bejestan, Y. Wang, S. Ramadurgam, Y. Xue, P. Bogdan, and M. Pedram, "Analyzing the dark silicon phenomenon in a many-core chip multi-processor under deeply-scaled process technologies," in Proceedings of the 25th edition on Great Lakes Symposium on VLSI, pp. 127–132, ACM, 2015.

[2] W. Kim, M. Gupta, G.-Y. Wei, and D. Brooks, "System level analysis of fast, per-core dvfs using on-chip switching regulators," in High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on, pp. 123–134, 2008.

[3] G. E. Moore et al., "Cramming more components onto integrated circuits," Proceedings of the IEEE, vol. 86, no. 1, pp. 82–85, 1998.

[4] B. Towles, J. Grossman, B. Greskamp, and D. E. Shaw, "Unifying on-chip and inter-node switching within the anton 2 network," in Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on, pp. 1–12, IEEE, 2014.

[5] H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger, "Dark silicon and the end of multicore scaling," in Computer Architecture (ISCA), 2011 38th Annual International Symposium on, pp. 365–376, 2011.

[6] S. Borkar and A. A. Chien, "The future of microprocessors," Commun. ACM, vol. 54, pp. 67–77, May 2011.

[7] R. H. Dennard, V. Rideout, E. Bassous, and A. Leblanc, "Design of ion-implanted mosfet's with very small physical dimensions," Solid-State Circuits, IEEE Journal of, vol. 9, no. 5, pp. 256–268, 1974.

[8] B. K. Daya, C.-H. O. Chen, S. Subramanian, W.-C. Kwon, S. Park, T. Krishna, J. Holt, A. P. Chandrakasan, and L.-S. Peh, "Scorpio: a 36-core research chip demonstrating snoopy coherence on a scalable mesh noc with in-network ordering," in Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on, pp. 25–36, IEEE, 2014.

[9] R. Das, R. Ausavarungnirun, O. Mutlu, A. Kumar, and M. Azimi, "Application-to-core mapping policies to reduce memory system interference in multi-core systems," in High Performance Computer Architecture (HPCA 2013), 2013 IEEE 19th International Symposium on, pp. 107–118, Feb 2013.

[10] R. Das, Application-aware on-chip networks. PhD thesis, The Pennsylvania State University, 2010.

[11] A. K. Mishra, O. Mutlu, and C. R. Das, "A heterogeneous multiple network-on-chip design: an application-aware approach," in Proceedings of the 50th Annual Design Automation Conference, DAC '13, (New York, NY, USA), pp. 36:1–36:10, ACM, 2013.

[12] E. Salminen, V. Lahtinen, K. Kuusilinna, and T. Hamalainen, "Overview of bus-based system-on-chip interconnections," in Circuits and Systems, 2002. ISCAS 2002. IEEE International Symposium on, vol. 2, pp. II–372, IEEE, 2002.

[13] L. Peh and N. Jerger, "On-chip networks (synthesis lectures on computer architecture)," Morgan and Claypool, San Rafael, 2009.

[14] W. J. Dally and B. Towles, "Route packets, not wires: on-chip interconnection networks," in Design Automation Conference, 2001. Proceedings, pp. 684–689, IEEE, 2001.

[15] S. Borkar, "Design challenges of technology scaling," Micro, IEEE, vol. 19, no. 4, pp. 23–29, 1999.

[16] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in Proceedings of the 40th annual Design Automation Conference, pp. 338–342, ACM, 2003.

[17] S. Garg, D. Marculescu, R. Marculescu, and U. Ogras, "Technology-driven limits on dvfs controllability of multiple voltage-frequency island designs: A system-level perspective," in Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE, pp. 818–821, 2009.

[18] H. Esmaeilzadeh, Approximate Acceleration for a Post Multicore Era. PhD thesis, 2013.

[19] J. Henkel, H. Khdr, S. Pagani, and M. Shafique, "New trends in dark silicon," in Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE, pp. 1–6, IEEE, 2015.

[20] P. Bose, "Is dark silicon real?: technical perspective," Commun. ACM, vol. 56, pp. 92–92, Feb. 2013.

[21] M. Shafique, S. Garg, J. Henkel, and D. Marculescu, "The eda challenges in the dark silicon era," in Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, pp. 1–6, IEEE, 2014.

[22] N. Goulding-Hotta, J. Sampson, G. Venkatesh, S. Garcia, J. Auricchio, P. Huang, M. Arora, S. Nath, V. Bhatt, J. Babb, S. Swanson, and M. Taylor, "The greendroid mobile application processor: An architecture for silicon's dark future," Micro, IEEE, vol. 31, no. 2, pp. 86–95, 2011.

[23] G. Venkatesh, J. Sampson, N. Goulding-Hotta, S. K. Venkata, M. B. Taylor, and S. Swanson, "Qscores: Trading dark silicon for scalable energy efficiency with quasi-specific cores," in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 163–174, ACM, 2011.

[24] M. B. Taylor, "Is dark silicon useful?: Harnessing the four horsemen of the coming dark silicon apocalypse," in Proceedings of the 49th Annual Design Automation Conference, DAC '12, (New York, NY, USA), pp. 1131–1136, ACM, 2012.

[25] J. Allred, S. Roy, and K. Chakraborty, "Designing for dark silicon: a methodological perspective on energy efficient systems," in Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED '12, (New York, NY, USA), pp. 255–260, ACM, 2012.

[26] N. Pinckney, K. Sewell, R. Dreslinski, D. Fick, T. Mudge, D. Sylvester, and D. Blaauw, "Assessing the performance limits of parallelized near-threshold computing," in Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, pp. 1143–1148, 2012.

[27] K. Swaminathan, E. Kultursay, V. Saripalli, V. Narayanan, M. Kandemir, and S. Datta, "Steep slope devices: From dark to dim silicon," IEEE Micro, vol. 99, no. PrePrints, p. 1, 2013.

[28] K. Swaminathan, E. Kultursay, V. Saripalli, V. Narayanan, M. T. Kandemir, and S. Datta, "Steep-slope devices: From dark to dim silicon," Micro, IEEE, vol. 33, no. 5, pp. 50–59, 2013.

[29] F. Kriebel, S. Rehman, D. Sun, M. Shafique, and J. Henkel, "Aser: Adaptive soft error resilience for reliability-heterogeneous processors in the dark silicon era," in Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, pp. 1–6, IEEE, 2014.

[30] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Neural acceleration for general-purpose approximate programs," in Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 449–460, IEEE Computer Society, 2012.

[31] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar, "A 5-ghz mesh interconnect for a teraflops processor," IEEE Micro, no. 5, pp. 51–61, 2007.

[32] A. Carroll and G. Heiser, "An analysis of power consumption in a smartphone," in USENIX annual technical conference, vol. 14, 2010.

[33] D. A. Neamen, Semiconductor physics and devices. McGraw-Hill Higher Education, 2003.

[34] O. Coudert, "Gate sizing for constrained delay/power/area optimization," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 5, no. 4, pp. 465–472, 1997.

[35] M. R. Berkelaar and J. A. Jess, "Gate sizing in mos digital circuits with linear programming," in Proceedings of the conference on European design automation, pp. 217–221, IEEE Computer Society Press, 1990.

[36] T. Reimann, C. C. Sze, and R. Reis, "Gate sizing and threshold voltage assignment for high performance microprocessor designs," in Design Automation Conference (ASP-DAC), 2015 20th Asia and South Pacific, pp. 214–219, IEEE, 2015.

[37] M. Donno, A. Ivaldi, L. Benini, and E. Macii, "Clock-tree power optimization based on rtl clock-gating," in Design Automation Conference, 2003. Proceedings, pp. 622–627, IEEE, 2003.

[38] Q. Wu, M. Pedram, and X. Wu, "Clock-gating and its application to low power design of sequential circuits," Circuits and Systems I: Fundamental Theory and Applications, IEEE Transactions on, vol. 47, no. 3, pp. 415–420, 2000.

[39] Z. Hu, A. Buyuktosunoglu, V. Srinivasan, V. Zyuban, H. Jacobson, and P. Bose, "Microarchitectural techniques for power gating of execution units," in Proceedings of the 2004 international symposium on Low power electronics and design, pp. 32–37, ACM, 2004.

[40] A. Lungu, P. Bose, A. Buyuktosunoglu, and D. J. Sorin, "Dynamic power gating with quality guarantees," in Proceedings of the 2009 ACM/IEEE international symposium on Low power electronics and design, pp. 377–382, ACM, 2009.

[41] C. J. Akl, R. Ayoubi, M. Bayoumi, et al., "An effective staggered-phase damping technique for suppressing power-gating resonance noise during mode transition," in Quality of Electronic Design, 2009. ISQED 2009. Quality Electronic Design, pp. 116–119, IEEE, 2009.

[42] D. Liu and C. Svensson, "Trading speed for low power by choice of supply and threshold voltages," Solid-State Circuits, IEEE Journal of, vol. 28, no. 1, pp. 10–17, 1993.

[43] A. Jerraya and W. Wolf, Multiprocessor Systems-on-chips. Electronics & Electrical, Morgan Kaufmann, 2005.

[44] S. H. Kulkarni, A. N. Srivastava, and D. Sylvester, "A new algorithm for improved vdd assignment in low power dual vdd systems," in Low Power Electronics and Design, 2004. ISLPED'04. Proceedings of the 2004 International Symposium on, pp. 200–205, IEEE, 2004.

[45] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling (udvs) using sub-threshold operation and local voltage dithering," Solid-State Circuits, IEEE Journal of, vol. 41, no. 1, pp. 238–245, 2006.

[46] T. Miller, X. Pan, R. Thomas, N. Sedaghati, and R. Teodorescu, "Booster: Reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips," in High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on, pp. 1–12, Feb 2012.

[47] M. Takahashi, M. Hamada, T. Nishikawa, H. Arakida, T. Fujita, F. Hatori, S. Mita, K. Suzuki, A. Chiba, T. Terazawa, et al., "A 60-mw mpeg4 video codec using clustered voltage scaling with variable supply-voltage scheme," Solid-State Circuits, IEEE Journal of, vol. 33, no. 11, pp. 1772–1780, 1998.

[48] T. D. Burd, T. Pering, A. J. Stratakos, R. W. Brodersen, et al., "A dynamic voltage scaled microprocessor system," Solid-State Circuits, IEEE Journal of, vol. 35, no. 11, pp. 1571–1580, 2000.

[49] T. Kuroda, K. Suzuki, S. Mita, T. Fujita, F. Yamane, F. Sano, A. Chiba, Y. Watanabe, K. Matsuda, T. Maeda, et al., "Variable supply-voltage scheme for low-power high-speed cmos digital design," Solid-State Circuits, IEEE Journal of, vol. 33, no. 3, pp. 454–462, 1998.

[50] J. M. D. Peddersen, Run-time Energy-driven Optimisation of Embedded Systems: A Complete Solution. PhD thesis, UNSW Australia, 2008.

[51] P. Salihundam, S. Jain, T. Jacob, S. Kumar, V. Erraguntla, Y. Hoskote, S. R. Vangal, G. Ruhl, and N. Borkar, "A 2 tb/s 6 x 4 mesh network for a single-chip cloud computer with dvfs in 45 nm cmos," J. Solid-State Circuits, vol. 46, no. 4, pp. 757–766, 2011.

[52] M. Ang, R. Salem, and A. Taylor, "An on-chip voltage regulator using switched decoupling capacitors," in Solid-State Circuits Conference, 2000. Digest of Technical Papers. ISSCC. 2000 IEEE International, pp. 438–439, IEEE, 2000.

[53] A. Ansari, A. Mishra, J. Xu, and J. Torrellas, "Tangle: Route-oriented dynamic voltage minimization for variation-afflicted, energy-efficient on-chip networks," International Symposium on High Performance Computer Architecture, 2014.

[54] H. R. Ghasemi, A. A. Sinkar, M. J. Schulte, and N. S. Kim, "Cost-effective power delivery to support per-core voltage domains for power-constrained processors," in Proceedings of the 49th Annual Design Automation Conference, pp. 56–61, ACM, 2012.

[55] J. Kim, S. Yoo, and C.-M. Kyung, "Program phase-aware dynamic voltage scaling under variable computational workload and memory stall environment," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 30, no. 1, pp. 110–123, 2011.

[56] T. Nakamura, H. Matsutani, M. Koibuchi, K. Usami, and H. Amano, "Fine-grained power control using a multi-voltage variable pipeline router," in Embedded Multicore Socs (MCSoC), 2012 IEEE 6th International Symposium on, pp. 59–66, 2012.

[57] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-threshold computing: Reclaiming moore's law through energy efficient integrated circuits," Proceedings of the IEEE, vol. 98, no. 2, pp. 253–266, 2010.

[58] D. E. Nikonov, I. Young, et al., "Overview of beyond-cmos devices and a uniform methodology for their benchmarking," Proceedings of the IEEE, vol. 101, no. 12, pp. 2498–2533, 2013.

[59] A. M. Ionescu, L. De Michielis, N. Dagtekin, G. Salvatore, J. Cao, A. Rusu, and S. Bartsch, "Ultra low power: Emerging devices and their benefits for integrated circuits," in Electron Devices Meeting (IEDM), 2011 IEEE International, pp. 16–1, IEEE, 2011.

[60] S. Das, "Itrs assessment and benchmarking of emerging logic devices," Emerging Nanoelectronic Devices, pp. 405–416, 2014.

[61] A. C. Seabaugh and Q. Zhang, "Low-voltage tunnel transistors for beyond cmos logic," Proceedings of the IEEE, vol. 98, no. 12, pp. 2095–2110, 2010.

[62] J. Zhan, J. Ouyang, F. Ge, J. Zhao, and Y. Xie, “Dimnoc: A dim silicon approach towards power-efficient on-chip network,” in Design Automation Conference (DAC), ACM, 2015.

[63] C. W. Smullen, V. Mohan, A. Nigam, S. Gurumurthi, and M. R. Stan, “Relaxing non-volatility for fast and energy-efficient stt-ram caches,” in High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, pp. 50–61, IEEE, 2011.

[64] W. Wolf, “The future of multiprocessor systems-on-chips,” in Design Automation Conference, 2004. Proceedings. 41st, pp. 681–685, IEEE, 2004.

[65] A. Munir, S. Ranka, and A. Gordon-Ross, “High-performance energy-efficient multicore embedded computing,” Parallel and Distributed Systems, IEEE Transactions on, vol. 23, no. 4, pp. 684–700, 2012.

[66] E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, et al., “Baring it all to software: Raw machines,” Computer, vol. 30, no. 9, pp. 86–93, 1997.

[67] “EZChip TileGX Multicore Architecture.” http://www.tilera.com/products/?ezchip=585&spage=614.

[68] “Intel IXP4XX Network Processors.” http://www.intel.com/content/www/us/en/intelligent-systems/previous-generation/intel-ixp4xx-intel-network-processor-product-line.html.

[69] “IBM Cell Architecture Project.” http://researcher.watson.ibm.com/researcher/view_group.php?id=2649.

[70] “Texas Instruments KeyStone Multicore Architecture.” http://www.ti.com/lsds/ti/arm/keystone/overview.page?paramCriteria=no.

[71] “Nvidia Tegra Processors.” http://www.nvidia.com/object/tegra.html.

[72] “Qualcomm Snapdragon Platform.” https://www.qualcomm.com/products/snapdragon.

[73] “NXP TriMedia Processor.” http://www.nxp.com/products/media_processors/series/PNX17XX_SERIES.html.

[74] H. Javaid, A. Ignjatovic, and S. Parameswaran, “Rapid design space exploration of application specific heterogeneous pipelined multiprocessor systems,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 29, no. 11, pp. 1777–1789, 2010.

[75] H. G. Lee, N. Chang, U. Y. Ogras, and R. Marculescu, “On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 12, no. 3, p. 23, 2007.

[76] M. Gasteier and M. Glesner, “Bus-based communication synthesis on system-level,” in System Synthesis, 1996. Proceedings., 9th International Symposium on, pp. 65–70, IEEE, 1996.

[77] J.-M. Daveau, T. B. Ismail, and A. A. Jerraya, “Synthesis of system-level communication by an allocation-based approach,” in Proceedings of the 8th international symposium on System synthesis, pp. 150–155, ACM, 1995.

[78] R. Kumar, V. Zyuban, and D. M. Tullsen, “Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling,” in Computer Architecture, 2005. ISCA’05. Proceedings. 32nd International Symposium on, pp. 408–419, IEEE, 2005.

[79] “ARM AMBA Interconnect Specification.” http://www.arm.com/products/system-ip/amba-specifications.php.

[80] “IBM CoreConnect Bus Technology.” http://www.xilinx.com/products/intellectual-property/dr_pcentral_coreconnect.html.

[81] “Xtensa Processors.” http://ip.cadence.com/ipportfolio/tensilica-ip/xtensa-customizable.

[82] R. A. Shafik, P. Rosinger, and B. M. Al-Hashimi, “Mpeg-based performance comparison between network-on-chip and amba mpsoc,” 2008.

[83] S. Medardoni, Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip. PhD thesis, Università degli studi di Ferrara, 2009.

[84] G. Passas, M. Katevenis, and D. Pnevmatikatos, “Crossbar nocs are scalable beyond 100 nodes,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 31, no. 4, pp. 573–585, 2012.

[85] J. Yoo, D. Lee, S. Yoo, and K. Choi, “Communication architecture synthesis of cascaded bus matrix,” in Design Automation Conference, 2007. ASP-DAC’07. Asia and South Pacific, pp. 171–177, IEEE, 2007.

[86] S. Murali, L. Benini, and G. De Micheli, “An application-specific design methodology for on-chip crossbar generation,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 26, pp. 1283–1296, July 2007.

[87] Y. P. Zhang, T. Jeong, F. Chen, H. Wu, R. Nitzsche, and G. R. Gao, “A study of the on-chip interconnection network for the cyclops64 multi-core architecture,” in Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, pp. 10 pp., IEEE, 2006.

[88] S. Borkar, “Thousand core chips: a technology perspective,” in Proceedings of the 44th annual Design Automation Conference, pp. 746–749, ACM, 2007.

[89] S. Kumar, A. Jantsch, J.-P. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, and A. Hemani, “A network on chip architecture and design methodology,” in VLSI, 2002. Proceedings. IEEE Computer Society Annual Symposium on, pp. 105–112, IEEE, 2002.

[90] K. Aisopos, “Fault tolerant architectures for on-chip networks,” 2012.

[91] J. Chan, Energy-Aware Synthesis for Networks on Chip Architectures. PhD thesis, UNSW Australia, 2007.

[92] R. Beraha, I. Walter, I. Cidon, and A. Kolodny, “Leveraging application-level requirements in the design of a noc for a 4g soc - a case study,” in Design, Automation Test in Europe Conference Exhibition (DATE), 2010, pp. 1408–1413, March 2010.

[93] C. Seiculescu, S. Murali, L. Benini, and G. De Micheli, “Noc topology synthesis for supporting shutdown of voltage islands in socs,” in Design Automation Conference, 2009. DAC ’09. 46th ACM/IEEE, pp. 822–825, July 2009.

[94] D. Bertozzi, A. Jalabert, S. Murali, R. Tamhankar, S. Stergiou, L. Benini, and G. De Micheli, “Noc synthesis flow for customized domain specific multiprocessor systems-on-chip,” Parallel and Distributed Systems, IEEE Transactions on, vol. 16, no. 2, pp. 113–129, 2005.

[95] K. Srinivasan and K. S. Chatha, “A low complexity heuristic for design of custom network-on-chip architectures,” in Proceedings of the conference on Design, automation and test in Europe: Proceedings, pp. 130–135, European Design and Automation Association, 2006.

[96] S. Ma, N. Enright Jerger, and Z. Wang, “Dbar: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip,” in Proceedings of the 38th annual international symposium on Computer architecture, ISCA ’11, (New York, NY, USA), pp. 413–424, ACM, 2011.

[97] T. Nesson and S. L. Johnsson, “Romm routing on mesh and torus networks,” in Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, pp. 275–287, ACM, 1995.

[98] H.-L. Chao, Y.-R. Chen, S.-Y. Tung, P.-A. Hsiung, and S.-J. Chen, “Congestion-aware scheduling for noc-based reconfigurable systems,” in Design, Automation Test in Europe Conference Exhibition (DATE), 2012, pp. 1561–1566, March 2012.

[99] C.-L. Chou and R. Marculescu, “Contention-aware application mapping for network-on-chip communication architectures,” in Computer Design, 2008. ICCD 2008. IEEE International Conference on, pp. 164–169, Oct 2008.

[100] M. Palesi, R. Holsmark, S. Kumar, and V. Catania, “A methodology for design of application specific deadlock-free routing algorithms for noc systems,” in Proceedings of the 4th international conference on Hardware/software codesign and system synthesis, pp. 142–147, ACM, 2006.

[101] W. J. Dally and B. P. Towles, Principles and practices of interconnection networks. Elsevier, 2004.

[102] P. Kermani and L. Kleinrock, “Virtual cut-through: A new computer communication switching technique,” Computer Networks (1976), vol. 3, no. 4, pp. 267–286, 1979.

[103] C. Seiculescu, S. Murali, L. Benini, and G. De Micheli, “A method to remove deadlocks in networks-on-chips with wormhole flow control,” in Proceedings of the Conference on Design, Automation and Test in Europe, pp. 1625–1628, European Design and Automation Association, 2010.

[104] G. Dimitrakopoulos, A. Psarras, and I. Seitanidis, Microarchitecture of Network-on-Chip Routers. Springer, 2014.

[105] D. U. Becker, Efficient microarchitecture for network-on-chip routers. PhD thesis, Stanford University, 2012.

[106] S. Park, T. Krishna, C.-H. Chen, B. Daya, A. Chandrakasan, and L.-S. Peh, “Approaching the theoretical limits of a mesh noc with a 16-node chip prototype in 45nm soi,” in Proceedings of the 49th Annual Design Automation Conference, DAC ’12, (New York, NY, USA), pp. 398–405, ACM, 2012.

[107] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao, J. F. Brown III, and A. Agarwal, “On-chip interconnection architecture of the tile processor,” IEEE Micro, vol. 27, pp. 15–31, Sept. 2007.

[108] S. Phillips, “M7: Next generation sparc,” in Hot Chips: A Symposium on High Performance Chips, 2014.

[109] T. Krishna, C.-H. O. Chen, W.-C. Kwon, and L.-S. Peh, “Smart: Single-cycle multihop traversals over a shared network on chip,” Micro, IEEE, vol. 34, no. 3, pp. 43–56, 2014.

[110] J. Howard, S. Dighe, S. Vangal, G. Ruhl, N. Borkar, S. Jain, V. Erraguntla, M. Konow, M. Riepen, M. Gries, G. Droege, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, and R. Van Der Wijngaart, “A 48-core ia-32 processor in 45 nm cmos using on-die message-passing and dvfs for performance and power scaling,” Solid-State Circuits, IEEE Journal of, vol. 46, no. 1, pp. 173–183, 2011.

[111] “Sonics.” http://www.sonicsinc.com/.

[112] “Arteris NoC.” http://www.arteris.com/, 2015.

[113] L. Shang, L.-S. Peh, and N. Jha, “Dynamic voltage scaling with links for power optimization of interconnection networks,” in High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings. The Ninth International Symposium on, pp. 91–102, 2003.

[114] P. Bogdan, R. Marculescu, S. Jain, and R. Gavila, “An optimal control approach to power management for multi-voltage and frequency islands multiprocessor platforms under highly variable workloads,” in Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on, pp. 35–42, 2012.

[115] A. Mishra, R. Das, S. Eachempati, R. Iyer, N. Vijaykrishnan, and C. Das, “A case for dynamic frequency tuning in on-chip networks,” in Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on, pp. 292–303, 2009.

[116] U. Y. Ogras, R. Marculescu, D. Marculescu, and E. G. Jung, “Design and management of voltage-frequency island partitioned networks-on-chip,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 17, no. 3, pp. 330–341, 2009.

[117] R. Hesse and N. E. Jerger, “Improving dvfs in nocs with coherence prediction,” in Networks on Chip (NoCS), 2015 9th IEEE/ACM International Symposium on, 2015.

[118] X. Chen, Z. Xu, H. Kim, P. V. Gratz, J. Hu, M. Kishinevsky, U. Ogras, and R. Ayoub, “Dynamic voltage and frequency scaling for shared resources in multicore processor designs,” in Proceedings of the 50th Annual Design Automation Conference, DAC ’13, (New York, NY, USA), pp. 114:1–114:7, ACM, 2013.

[119] J. Zhan, N. Stoimenov, J. Ouyang, L. Thiele, V. Narayanan, and Y. Xie, “Designing energy-efficient noc for real-time embedded systems through slack optimization,” in Proceedings of the 50th Annual Design Automation Conference, p. 37, ACM, 2013.

[120] L. Chen, Design of low-power and resource-efficient on-chip networks. PhD thesis, University of Southern California, 2014.

[121] V. Soteriou and L.-S. Peh, “Exploring the design space of self-regulating power-aware on/off interconnection networks,” Parallel and Distributed Systems, IEEE Transactions on, vol. 18, no. 3, pp. 393–408, 2007.

[122] H. Matsutani, M. Koibuchi, D. Wang, and H. Amano, “Adding slow-silent virtual channels for low-power on-chip networks,” in Networks-on-Chip, 2008. NoCS 2008. Second ACM/IEEE International Symposium on, pp. 23–32, 2008.

[123] H. Matsutani, M. Koibuchi, D. Ikebuchi, K. Usami, H. Nakamura, and H. Amano, “Performance, area, and power evaluations of ultrafine-grained run-time power-gating routers for cmps,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 30, no. 4, pp. 520–533, 2011.

[124] G. Kim, J. Kim, and S. Yoo, “Flexibuffer: Reducing leakage power in on-chip network routers,” in Design Automation Conference (DAC), 2011 48th ACM/EDAC/IEEE, pp. 936–941, 2011.

[125] R. Parikh, R. Das, and V. Bertacco, “Power-aware nocs through routing and topology reconfiguration,” in Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, pp. 1–6, IEEE, 2014.

[126] A. Samih, R. Wang, A. Krishna, C. Maciocco, C. Tai, and Y. Solihin, “Energy-efficient interconnect via router parking,” in High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on, pp. 508–519, 2013.

[127] L. Chen and T. M. Pinkston, “Nord: Node-router decoupling for effective power-gating of on-chip routers,” in Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO ’12, (Washington, DC, USA), pp. 270–281, IEEE Computer Society, 2012.

[128] R. Das, S. Narayanasamy, S. K. Satpathy, and R. G. Dreslinski, “Catnap: energy proportional multiple network-on-chip,” in Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, (New York, NY, USA), pp. 320–331, ACM, 2013.

[129] T. Moscibroda and O. Mutlu, “A case for bufferless routing in on-chip networks,” SIGARCH Comput. Archit. News, vol. 37, pp. 196–207, June 2009.

[130] C. Fallin, C. Craik, and O. Mutlu, “Chipper: A low-complexity bufferless deflection router,” in High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, pp. 144–155, Feb 2011.

[131] M. Shafique, S. Garg, T. Mitra, S. Parameswaran, and J. Henkel, “Dark silicon as a challenge for hardware/software co-design: Invited special session paper,” in Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, CODES ’14, (New York, NY, USA), pp. 13:1–13:10, ACM, 2014.

[132] J. Cong, M. A. Ghodrat, M. Gill, B. Grigorian, and G. Reinman, “Architecture support for accelerator-rich cmps,” in Proceedings of the 49th Annual Design Automation Conference, pp. 843–849, ACM, 2012.

[133] Y. Turakhia, B. Raghunathan, S. Garg, and D. Marculescu, “Hades: architectural synthesis for heterogeneous dark silicon chip multi-processors,” in Proceedings of the 50th Annual Design Automation Conference, DAC ’13, (New York, NY, USA), pp. 173:1–173:7, ACM, 2013.

[134] D. Gnad, M. Shafique, F. Kriebel, S. Rehman, D. Sun, and J. Henkel, “Hayat: harnessing dark silicon and variability for aging deceleration and balancing,” in Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE, pp. 1–6, IEEE, 2015.

[135] M. Shafique, S. Garg, T. Mitra, S. Parameswaran, and J. Henkel, “Dark silicon as a challenge for hardware/software co-design: Invited special session paper,” in Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, p. 13, ACM, 2014.

[136] T. S. Muthukaruppan, M. Pricopi, V. Venkataramani, T. Mitra, and S. Vishin, “Hierarchical power management for asymmetric multi-core in dark silicon era,” in Proceedings of the 50th Annual Design Automation Conference, p. 174, ACM, 2013.

[137] S. Pagani, H. Khdr, W. Munawar, J.-J. Chen, M. Shafique, M. Li, and J. Henkel, “Tsp: thermal safe power: efficient power budgeting for many-core systems in dark silicon,” in Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, p. 10, ACM, 2014.

[138] H. Khdr, S. Pagani, M. Shafique, and J. Henkel, “Thermal constrained resource management for mixed ilp-tlp workloads in dark silicon chips,” in Proceedings of the 52nd Annual Design Automation Conference, p. 179, ACM, 2015.

[139] W. Huang, K. Rajamani, M. R. Stan, and K. Skadron, “Scaling with design constraints: Predicting the future of big chips,” IEEE Micro, no. 4, pp. 16–29, 2011.

[140] L. Thiele, I. Bacivarov, W. Haid, and K. Huang, “Mapping applications to tiled multiprocessor embedded systems,” in Application of Concurrency to System Design, 2007. ACSD 2007. Seventh International Conference on, pp. 29–40, July 2007.

[141] J. Teich, “Hardware/software codesign: The past, the present, and predicting the future,” Proceedings of the IEEE, vol. 100, pp. 1411–1430, May 2012.

[142] S. H. Bokhari, “On the mapping problem,” IEEE Trans. Comput., vol. 30, pp. 207–214, Mar. 1981.

[143] S. Stuijk, T. Basten, M. Geilen, and H. Corporaal, “Multiprocessor resource allocation for throughput-constrained synchronous dataflow graphs,” in Design Automation Conference, 2007. DAC ’07. 44th ACM/IEEE, pp. 777–782, June 2007.

[144] J. Zhu, I. Sander, and A. Jantsch, “Constrained global scheduling of streaming applications on mpsocs,” in Design Automation Conference (ASP-DAC), 2010 15th Asia and South Pacific, pp. 223–228, Jan 2010.

[145] P. K. F. Hölzenspies, J. L. Hurink, J. Kuper, and G. J. M. Smit, “Run-time spatial mapping of streaming applications to a heterogeneous multi-processor system-on-chip (mpsoc),” in Proceedings of the Conference on Design, Automation and Test in Europe, DATE ’08, (New York, NY, USA), pp. 212–217, ACM, 2008.

[146] M. Al Faruque, R. Krist, and J. Henkel, “Adam: Run-time agent-based distributed application mapping for on-chip communication,” in Design Automation Conference, 2008. DAC 2008. 45th ACM/IEEE, pp. 760–765, June 2008.

[147] H. Salamy and J. Ramanujam, “An effective solution to task scheduling and memory partitioning for multiprocessor system-on-chip,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 31, pp. 717–725, May 2012.

[148] J. Castrillon, A. Tretter, R. Leupers, and G. Ascheid, “Communication-aware mapping of kpn applications onto heterogeneous mpsocs,” in Proceedings of the 49th Annual Design Automation Conference, DAC ’12, (New York, NY, USA), pp. 1266–1271, ACM, 2012.

[149] S. Le Beux, G. Bois, G. Nicolescu, Y. Bouchebaba, M. Langevin, and P. Paulin, “Combining mapping and partitioning exploration for noc-based embedded systems,” J. Syst. Archit., vol. 56, pp. 223–232, July 2010.

[150] S. Murali, M. Coenen, A. Radulescu, K. Goossens, and G. De Micheli, “Mapping and configuration methods for multi-use-case networks on chips,” in Design Automation, 2006. Asia and South Pacific Conference on, pp. 6 pp.–, Jan 2006.

[151] C. Zimmer and F. Mueller, “Low contention mapping of real-time tasks onto tilepro 64 core processors,” in Proceedings of the 2012 IEEE 18th Real Time and Embedded Technology and Applications Symposium, RTAS ’12, (Washington, DC, USA), pp. 131–140, IEEE Computer Society, 2012.

[152] S. Le Beux, G. Nicolescu, G. Bois, Y. Bouchebaba, M. Langevin, and P. Paulin, “Optimizing configuration and application mapping for mpsoc architectures,” in Adaptive Hardware and Systems, 2009. AHS 2009. NASA/ESA Conference on, pp. 474–481, July 2009.

[153] Y. Jan and L. Józwiak, “Communication and memory architecture design of application-specific high-end multiprocessors,” VLSI Des., vol. 2012, pp. 12:12–12:12, Jan. 2012.

[154] K. Lahiri, A. Raghunathan, and S. Dey, “Design space exploration for optimizing on-chip communication architectures,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 23, pp. 952–961, June 2004.

[155] P.-K. Huang, M. Hashemi, and S. Ghiasi, “System-level performance estimation for application-specific mpsoc interconnect synthesis,” in Application Specific Processors, 2008. SASP 2008. Symposium on, pp. 95–100, June 2008.

[156] S. Stergiou, F. Angiolini, S. Carta, L. Raffo, D. Bertozzi, and G. De Micheli, “Xpipes lite: a synthesis oriented design library for networks on chips,” in Design, Automation and Test in Europe, 2005. Proceedings, pp. 1188–1193 Vol. 2, March 2005.

[157] J. Chan and S. Parameswaran, “Nocgen: a template based reuse methodology for networks on chip architecture,” in VLSI Design, 2004. Proceedings. 17th International Conference on, pp. 717–720, 2004.

[158] K. Goossens and A. Hansson, “The aethereal network on chip after ten years: Goals, evolution, lessons, and future,” in Proceedings of the 47th Design Automation Conference, DAC ’10, (New York, NY, USA), pp. 306–311, ACM, 2010.

[159] H. Javaid, X. He, A. Ignjatovic, and S. Parameswaran, “Optimal synthesis of latency and throughput constrained pipelined mpsocs targeting streaming applications,” in Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2010 IEEE/ACM/IFIP International Conference on, pp. 75–84, Oct 2010.

[160] J. S. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff, “Energy characterization of a tiled architecture processor with on-chip networks,” in Proceedings of the 2003 international symposium on Low power electronics and design, ISLPED ’03, (New York, NY, USA), pp. 424–427, ACM, 2003.

[161] M. Taylor, “A landscape of the new dark silicon design regime,” Micro, IEEE, vol. PP, no. 99, pp. 1–1, 2013.

[162] A. Pullini, F. Angiolini, S. Murali, D. Atienza, G. De Micheli, and L. Benini, “Bringing nocs to 65 nm,” Micro, IEEE, vol. 27, no. 5, pp. 75–85, 2007.

[163] K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. Keckler, and C. Moore, “Exploiting ilp, tlp, and dlp with the polymorphous trips architecture,” Micro, IEEE, vol. 23, no. 6, pp. 46–51, 2003.

[164] J. Allred, S. Roy, and K. Chakraborty, “Dark silicon aware multicore systems: Employing design automation with architectural insight,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. PP, no. 99, pp. 1–1, 2013.

[165] K. Chakraborty and S. Roy, “Architecturally homogeneous power-performance heterogeneous multicore systems,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 21, no. 4, pp. 670–679, 2013.

[166] A. Strano, D. Ludovici, and D. Bertozzi, “A library of dual-clock fifos for cost-effective and flexible mpsoc design,” in Embedded Computer Systems (SAMOS), 2010 International Conference on, pp. 20–27, 2010.

[167] A. Bakhoda, J. Kim, and T. Aamodt, “Throughput-effective on-chip networks for manycore accelerators,” in Microarchitecture (MICRO), 2010 43rd Annual IEEE/ACM International Symposium on, pp. 421–432, Dec 2010.

[168] A. K. Mishra, N. Vijaykrishnan, and C. R. Das, “A case for heterogeneous on-chip interconnects for cmps,” in Proceedings of the 38th annual international symposium on Computer architecture, ISCA ’11, (New York, NY, USA), pp. 389–400, ACM, 2011.

[169] G. Liang and A. Jantsch, “Adaptive power management for the on-chip communication network,” in Digital System Design: Architectures, Methods and Tools, 2006. DSD 2006. 9th EUROMICRO Conference on, pp. 649–656, 2006.

[170] H. Bokhari, H. Javaid, M. Shafique, J. Henkel, and S. Parameswaran, “darknoc: Designing energy-efficient network-on-chip with multi-vt cells for dark silicon,” in Proceedings of the 51st Annual Design Automation Conference, DAC ’14, (New York, NY, USA), pp. 161:1–161:6, ACM, 2014.

[171] Synopsys, “Design Solutions for 20nm and Beyond, White Paper.” http://www.synopsys.com/.

[172] J. Warnock, “Circuit design challenges at the 14nm technology node,” in Proceedings of the 48th Design Automation Conference, DAC ’11, (New York, NY, USA), pp. 464–467, ACM, 2011.

[173] “Semico Research - How Reducing Verification and Validation Time Can Improve Profitability.” http://www.semico.com/.

[174] A. Prodromou, A. Panteli, C. Nicopoulos, and Y. Sazeides, “Nocalert: An on-line and real-time fault detection mechanism for network-on-chip architectures,” in Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on, pp. 60–71, Dec 2012.

[175] M. Radetzki, C. Feng, X. Zhao, and A. Jantsch, “Methods for fault tolerance in networks-on-chip,” ACM Comput. Surv., vol. 46, pp. 8:1–8:38, July 2013.

[176] J. Balfour and W. J. Dally, “Design tradeoffs for tiled cmp on-chip networks,” in Proceedings of the 20th Annual International Conference on Supercomputing, ICS ’06, (New York, NY, USA), pp. 187–198, ACM, 2006.

[177] K. Constantinides, S. Plaza, J. Blome, B. Zhang, V. Bertacco, S. Mahlke, T. Austin, and M. Orshansky, “Bulletproof: a defect-tolerant cmp switch architecture,” in High-Performance Computer Architecture, 2006. The Twelfth International Symposium on, pp. 5–16, Feb 2006.

[178] D. Park, C. Nicopoulos, J. Kim, N. Vijaykrishnan, and C. Das, “Exploring fault-tolerant network-on-chip architectures,” in Dependable Systems and Networks, 2006. DSN 2006. International Conference on, pp. 93–104, June 2006.

[179] “Arteris Resilience Package.” http://www.arteris.com/flexnoc-resilience-package.

[180] S. Murali, T. Theocharides, N. Vijaykrishnan, M. J. Irwin, L. Benini, and G. D. Micheli, “Analysis of error recovery schemes for networks on chips,” IEEE Design & Test of Computers, vol. 22, no. 5, pp. 434–442, 2005.

[181] T. Moscibroda and O. Mutlu, “A case for bufferless routing in on-chip networks,” SIGARCH Comput. Archit. News, vol. 37, pp. 196–207, June 2009.

[182] T. Li, M. Shafique, J. A. Ambrose, S. Rehman, J. Henkel, and S. Parameswaran, “Raster: Runtime adaptive spatial/temporal error resiliency for embedded processors,” in Proceedings of the 50th Annual Design Automation Conference, DAC ’13, (New York, NY, USA), pp. 62:1–62:7, ACM, 2013.

[183] G. Kahn, “The semantics of a simple language for parallel programming,” in IFIP Congress, pp. 471–475, 1974.

[184] A. Alimonda, S. Carta, A. Acquaviva, A. Pisano, and L. Benini, “A feedback-based approach to dvfs in data-flow applications,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 28, pp. 1691–1704, Nov 2009.

[185] “MicroBlaze Processors.” http://www.xilinx.com/tools/microblaze.htm.

[186] “Xilinx Embedded Processing Peripheral IP Cores.” http://www.xilinx.com/ise/embedded/edk_ip.htm.

[187] “Altera Nios II Processors.” https://www.altera.com/products/processors.

[188] U. Ogras and R. Marculescu, “‘It’s a small world after all’: Noc performance optimization via long-range link insertion,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 14, pp. 693–706, July 2006.

[189] M. Chang, J. Cong, A. Kaplan, C. Liu, M. Naik, J. Premkumar, G. Reinman, E. Socher, and S.-W. Tam, “Power reduction of cmp communication networks via rf-interconnects,” in Microarchitecture, 2008. MICRO-41. 2008 41st IEEE/ACM International Symposium on, pp. 376–387, Nov 2008.

[190] J. Chan and S. Parameswaran, “Nocout: Noc topology generation with mixed packet-switched and point-to-point networks,” in Proceedings of the 2008 Asia and South Pacific Design Automation Conference, pp. 265–270, IEEE Computer Society Press, 2008.

[191] J. Jiao and Y. Fu, “B2rac: A physical express link addition methodology for network on chip,” in Proceedings of the 4th International Workshop on Network on Chip Architectures, NoCArc ’11, (New York, NY, USA), pp. 17–22, ACM, 2011.

[192] U. Y. Ogras and R. Marculescu, “Prediction-based flow control for network-on-chip traffic,” in Proceedings of the 43rd Annual Design Automation Conference, DAC ’06, (New York, NY, USA), pp. 839–844, ACM, 2006.

[193] S. Meijer, H. Nikolov, and T. Stefanov, “Throughput modeling to evaluate process merging transformations in polyhedral process networks,” in Design, Automation Test in Europe Conference Exhibition (DATE), 2010, pp. 747–752, March 2010.

[194] T. Kluter, P. Brisk, E. Charbon, and P. Ienne, “Mpsoc design using application-specific architecturally visible communication,” in High Performance Embedded Architectures and Compilers (A. Seznec, J. Emer, M. O’Boyle, M. Martonosi, and T. Ungerer, eds.), vol. 5409 of Lecture Notes in Computer Science, pp. 183–197, Springer Berlin Heidelberg, 2009.

[195] H. Nikolov, T. Stefanov, and E. Deprettere, “Systematic and automated multiprocessor system design, programming, and implementation,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 27, pp. 542–555, March 2008.

[196] W. J. Dally, “Express cubes: Improving the performance of k-ary n-cube interconnection networks,” IEEE Trans. Comput., vol. 40, pp. 1016–1023, Sept. 1991.

[197] S. Loucif, L. M. Mackenzie, and M. Ould-Khaoua, “The “express channel” concept in hypermeshes and k-ary n-cubes,” in Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP ’96), SPDP ’96, (Washington, DC, USA), pp. 566–, IEEE Computer Society, 1996.

[198] A. Kumar, L.-S. Peh, P. Kundu, and N. Jha, “Toward ideal on-chip communication using express virtual channels,” Micro, IEEE, vol. 28, pp. 80–90, Jan 2008.

[199] S. Deb, K. Chang, A. Ganguly, X. Yu, C. Teuscher, P. Pande, D. Heo, and B. Belzer, “Design of an efficient noc architecture using millimeter-wave wireless links,” in Quality Electronic Design (ISQED), 2012 13th International Symposium on, pp. 165–172, March 2012.

[200] J. Castrillon, R. Leupers, and G. Ascheid, “Maps: Mapping concurrent dataflow applications to heterogeneous mpsocs,” Industrial Informatics, IEEE Transactions on, vol. 9, pp. 527–545, Feb 2013.

[201] C. Lee, S. Kim, and S. Ha, “A systematic design space exploration of mpsoc based on synchronous data flow specification,” Journal of Signal Processing Systems, vol. 58, no. 2, pp. 193–213, 2010.

[202] S. Pasricha and N. Dutt, “A framework for cosynthesis of memory and communication architectures for mpsoc,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 26, pp. 408–420, March 2007.

[203] R. Marculescu, U. Y. Ogras, and N. H. Zamora, “Computation and communication refinement for multiprocessor soc design: A system-level perspective,” ACM Trans. Des. Autom. Electron. Syst., vol. 11, pp. 564–592, June 2004.

[204] M. I. Gordon, W. Thies, and S. Amarasinghe, “Exploiting coarse-grained task, data, and pipeline parallelism in stream programs,” SIGARCH Comput. Archit. News, vol. 34, pp. 151–162, Oct. 2006.

[205] “Cacti Memory Tool.” http://www.hpl.hp.com/research/cacti/.

[206] N. Concer, L. Bononi, M. Soulie, R. Locatelli, and L. Carloni, “Ctc: An end-to-end flow control protocol for multi-core systems-on-chip,” in Networks-on-Chip, 2009. NoCS 2009. 3rd ACM/IEEE International Symposium on, pp. 193–202, May 2009.

[207] R. Dick, D. Rhodes, and W. Wolf, “Tgff: task graphs for free,” in Hardware/Software Codesign, 1998. (CODES/CASHE ’98) Proceedings of the Sixth International Workshop on, pp. 97–101, Mar 1998.

[208] H. Kim, A. Vitkovskiy, P. V. Gratz, and V. Soteriou, “Use it or lose it: wearout and lifetime in future chip multiprocessors,” in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 136–147, ACM, 2013.