Embedded Networks on Chip for Field-Programmable Gate Arrays

by

Mohamed Saied Abdelfattah

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto

© Copyright 2016 by Mohamed Saied Abdelfattah

Abstract

Embedded Networks on Chip for Field-Programmable Gate Arrays

Mohamed Saied Abdelfattah
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2016

Modern field-programmable gate arrays (FPGAs) have a large capacity and a myriad of embedded blocks for computation, memory and I/O interfacing. This allows the implementation of ever-larger applications; however, the increase in application size comes with an inevitable increase in complexity, making it a challenge to implement on-chip communication. Today, it is a designer’s burden to create a customized communication circuit to interconnect an application, using the fine-grained FPGA fabric that has single-bit control over every wire segment and logic cell. Instead, we propose embedding a network-on-chip (NoC) to implement system-level communication on FPGAs. A prefabricated NoC improves communication efficiency, eases timing closure, and abstracts system-level communication on FPGAs, separating an application’s behaviour and communication, which makes the design of complex FPGA applications easier and faster. This thesis presents a complete embedded NoC solution, including the NoC architecture and interface, rules to guide its use with FPGA design styles, application case studies to showcase its advantages, and a computer-aided design (CAD) system to automatically interconnect applications using an embedded NoC.

We compare NoC components when implemented hard versus soft, then build on this component-level analysis to architect embedded NoCs and integrate them into the FPGA fabric; these NoCs are on average 20–23× smaller and 5–6× faster than soft NoCs. We design custom interfaces between the embedded NoC and the FPGA fabric to transport data efficiently without compromising on the FPGA’s configurability, and then we enumerate the necessary conditions to implement FPGA-compatible communication styles using our NoC. Next, our application case study with image compression shows that an embedded NoC improves frequency by 10–80% and reduces the utilization of scarce long wires by 40%. Additionally, we leverage our embedded NoC to create an Ethernet switch that has ~5× more bandwidth and ~3× lower area compared to other FPGA-based switches. Finally, we create a CAD system (LYNX) that automatically connects an application using an embedded NoC. We compare our LYNX + embedded NoC interconnection solution to a commercial CAD tool that generates a custom soft bus, and show that we improve both the efficiency and performance of most systems.

Acknowledgements

My sincerest thanks go to my advisor Prof. Vaughn Betz from whom I have learned so much over the past 5 years, technically, pedagogically and personally. Never have I met someone so knowledgeable, yet so modest and generous with his time and supervision; I will forever be indebted to him. I consider myself lucky to have worked with one of the world’s foremost authorities on FPGA technology – his guidance has greatly improved the quality of this work. I would like to thank Prof. Natalie Enright Jerger for her constant guidance on everything related to NoCs throughout my PhD – her feedback has been invaluable and I am deeply appreciative of our meetings and email exchanges.

I was also fortunate to meet many outstanding researchers, both at the University of Toronto and at Altera Corporation, whom I have regularly consulted and learned from. Thanks to Prof. Jason Anderson, Prof. Jonathan Rose, Prof. Paul Chow, Dr. Desh Singh, Dr. David Lewis, Dr. Mike Hutton and Dr. Dana How. Their expertise and feedback helped make this work more realistic and accurate. I would also like to thank Prof. James Hoe of Carnegie Mellon University for serving as external examiner during my final PhD defense, and for providing fresh insights and valuable comments on my work.

Special thanks to master’s graduate Andrew Bitar who joined my project and proved its viability through essential networking application case studies. Andrew’s energy accelerated our collaborative work, and contributed to some of the best parts of this thesis. Thanks to my lab mates and friends who were always patient in hearing me talk about my research, and often sparked clever additions and enhancements to it: Kevin Murray, Jeff Cassidy, Charles Chiasson, Tim Liu, Shehab Yomn and Mario Badr. I would also like to thank summer students Ange Yaghi, Harshita Huria and Aya ElSayed for contributing excellent work to my PhD project. I learned a lot by supervising you.

During my PhD, I have been fortunate to receive funding from Altera Corporation, the University of Toronto, the Connaught International Scholarship and the Vanier Canada Graduate Scholarship.

On a more personal note, I would like to thank my friends whose camaraderie made five years of PhD research a lot easier and much more fun. My warmest thanks go to my wife, Lina, a brilliant art historian who was never too bored or impatient when I rambled on and on about FPGAs and NoCs; on the contrary, she was always keen on hearing about my latest work details. Also, her opinions about creating good illustrations greatly improved the quality of the figures and graphs in this thesis. Without your care and love, nothing would be the same. Finally, I would like to thank my parents and siblings for their unconditional love and support which fueled my journey until I got here.

Preface

Work in this thesis is largely based on the following publications:

Peer-reviewed Conference Papers:

• Mohamed S Abdelfattah and Vaughn Betz. Design Tradeoffs for Hard and Soft FPGA-based Networks-on-Chip. In International Conference on Field-Programmable Technology (FPT), pages 95–103. IEEE, 2012

• Mohamed S Abdelfattah and Vaughn Betz. The Power of Communication: Energy-Efficient NoCs for FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL), pages 1–8. IEEE, 2013 (Stamatis Vassiliadis Best Paper Award)

• Mohamed S. Abdelfattah and Vaughn Betz. Augmenting FPGAs with Embedded Networks-on-Chip. In Workshop on the Intersections of Computer Architecture and Reconfigurable Logic (CARL), 2013

• Mohamed S. Abdelfattah, Andrew Bitar, and Vaughn Betz. Take the Highway: Design for Embedded NoCs on FPGAs. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 98–107. ACM, 2015 (Best Paper Award)

• M.S. Abdelfattah, A. Bitar, A. Yaghi, and V. Betz. Design and simulation tools for Embedded NoCs on FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2015. [Demonstration Abstract]

• Andrew Bitar, Mohamed S. Abdelfattah, and Vaughn Betz. Bringing Programmability to the Data Plane: Packet Processing with a NoC-Enhanced FPGA. In International Conference on Field-Programmable Technology (FPT). IEEE, 2015

• Mohamed S Abdelfattah and Vaughn Betz. LYNX: CAD for Embedded NoCs on FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2016 (Accepted)

Peer-reviewed Journal Papers:

• Mohamed S Abdelfattah and Vaughn Betz. The Case for Embedded Networks on Chip on FPGAs. IEEE Micro, 34(1):80–89, 2014

• Mohamed S Abdelfattah and Vaughn Betz. Networks-on-Chip for FPGAs: Hard, Soft or Mixed? ACM Transactions on Reconfigurable Technology and Systems (TRETS), 7(3):1–22, 2014 (Invited)

• Mohamed S Abdelfattah and Vaughn Betz. Power Analysis of Embedded NoCs on FPGAs and Comparison With Custom Buses. IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 24(1):165–177, 2016

• Mohamed S Abdelfattah, Andrew Bitar, and Vaughn Betz. Design and Applications for Embedded Networks-on-Chip on Field-Programmable Gate-Arrays. IEEE Transactions on Computers (TCOMP), 2016 (Submitted)

Book Chapter and Patent Application:

• Mohamed S Abdelfattah and Vaughn Betz. Embedded Networks-on-Chip for FPGAs. In Pierre-Emmanuel Gaillardon, editor, Reconfigurable Logic: Architecture, Tools and Applications, chapter 6, pages 149–184. CRC Press, 2016

• Mohamed S Abdelfattah and Vaughn Betz. Field Programmable Gate-Array with Network-on-Chip Hardware and Design Flow, 04 2015. US Patent Application 14/060,253

Contents

Abstract iii

Preface v

List of Tables x

List of Figures xiv

List of Abbreviations xvi

1 Introduction 1 1.1 Motivation ...... 1 1.2 Embedded NoCs for Future FPGAs ...... 3 1.3 Thesis Organization ...... 4

2 Background 5 2.1 Interconnection Problems and Solutions ...... 5 2.1.1 Scaling of Wires ...... 6 2.1.2 Interconnect-Aware Design ...... 7 2.1.3 Latency-Insensitive Design ...... 7 2.1.4 Networks-on-Chip ...... 9 2.2 FPGA Interconnection...... 12 2.2.1 FPGA Logic and Interconnect Scaling...... 12 2.2.2 FPGA System-Level Interconnect...... 14 2.3 FPGA-Based NoCs...... 19 2.3.1 Soft NoCs...... 19 2.3.2 Hard NoCs ...... 21 2.4 Summary ...... 24

I Architecture 25

3 27 3.1 Routers ...... 27 3.1.1 Input Module...... 29 3.1.2 Crossbar...... 30

3.1.3 Virtual Channel Allocator...... 30 3.1.4 Switch Allocator...... 31 3.1.5 Output Module...... 32 3.2 Links...... 32

4 Methodology 33 4.1 Routers ...... 33 4.1.1 FPGA CAD Flow ...... 34 4.1.2 ASIC CAD Flow...... 35 4.1.3 Power Simulation...... 35 4.1.4 Methodology Verification ...... 36 4.2 Links...... 37 4.2.1 FPGA CAD Flow ...... 37 4.2.2 ASIC CAD Flow...... 37

5 NoC Component Analysis 39 5.1 Routers ...... 39 5.1.1 Area and Speed...... 39 5.1.2 Dynamic Power...... 45 5.2 Links...... 49 5.2.1 Silicon Area...... 50 5.2.2 Metal Area ...... 50 5.2.3 Speed and Power...... 50

6 Embedded NoC Options 52 6.1 Soft NoC ...... 53 6.2 Mixed NoCs: Hard Routers and Soft Links ...... 54 6.2.1 Area and Speed...... 54 6.2.2 FPGA Silicon and Metal Budget...... 55 6.3 Hard NoCs: Hard Routers and Hard Links...... 56 6.3.1 Area and Speed...... 56 6.3.2 FPGA Silicon and Metal Budget...... 57 6.3.3 Low-Voltage Hard NoC ...... 57 6.4 System-Level Power Analysis ...... 57 6.4.1 Power-Aware NoC Design...... 57 6.4.2 FPGA Power Budget ...... 58 6.5 Comparing NoCs and FPGA Interconnect...... 59 6.5.1 Area per Bandwidth...... 59 6.5.2 Energy per Data...... 61 6.6 Summary of Mixed and Hard NoCs...... 63

7 Proposed Hard NoC 64 7.1 Hard or Mixed?...... 64 7.2 Design for I/O Bandwidth...... 65 7.3 NoC Design for 28-nm FPGAs ...... 66

II Design and Applications 68

8 FPGA–NoC Interfaces 70 8.1 FabricPort...... 70 8.1.1 FabricPort Input...... 72 8.1.2 FabricPort Output...... 73 8.2 IOLinks...... 74 8.2.1 DDR3/4 Memory IOLink Case Study ...... 76

9 Design Styles and Rules 80 9.1 Latency and ...... 80 9.2 Connectivity and Design Rules ...... 82 9.2.1 Packet Format ...... 82 9.2.2 Module Connectivity...... 83 9.2.3 Packet Ordering ...... 84 9.2.4 Dependencies and Deadlock...... 85 9.3 Design Styles ...... 86 9.3.1 Latency-Insensitive Design with a NoC...... 87 9.3.2 Latency-Sensitive Design with a NoC (Permapaths) ...... 87

10 Prototyping and Simulation Tools 90 10.1 NoC Designer ...... 90 10.2 RTL2Booksim ...... 92 10.3 Physical NoC Emulation...... 94

11 Application Case Studies 96 11.1 External DDR3 Memory...... 96 11.1.1 Design Effort...... 97 11.1.2 Area...... 98 11.1.3 Dynamic Power...... 100 11.2 Parallel JPEG Compression...... 100 11.2.1 Frequency...... 101 11.2.2 Interconnect Utilization ...... 103 11.3 Ethernet Switch ...... 103

III Computer-Aided Design 107

12 LYNX CAD System 109 12.1 Elaboration...... 112 12.2 Clustering...... 112 12.3 Mapping...... 112 12.3.1 FabricPort Configurability...... 113 12.3.2 LYNX Mapping ...... 113 12.4 Wrapper Insertion ...... 115

12.5 HDL Generation ...... 116 12.5.1 Mimic Flow: Simulation and Synthesis...... 116

13 Transaction Communication 117 13.1 Transaction System Components in NoCs ...... 118 13.1.1 Response Unit ...... 119 13.2 Multiple-Master Systems...... 120 13.2.1 Traffic Build Up (in NoCs) ...... 120 13.2.2 Credit-based Traffic Management...... 120 13.2.3 Latency Comparison: LYNX NoC vs. Qsys Bus ...... 122 13.2.4 Priorities and Arbitration Shares...... 123 13.3 Multiple-Slave Systems...... 124 13.3.1 Ordering in Multiple-Slave Systems...... 124 13.3.2 Three Traffic Managers for Multiple-Slave Systems...... 126 13.3.3 Traffic Managers Performance and Efficiency ...... 127 13.4 Limit Study...... 129 13.4.1 Area...... 129 13.4.2 Frequency...... 131 13.5 Transaction Systems Summary ...... 133

14 Summary and Future Work 135 14.1 Summary ...... 135 14.2 Future Work ...... 137 14.2.1 LYNX Enhancements ...... 137 14.2.2 Mimic Benchmark Set...... 138 14.2.3 Application Case Studies ...... 138 14.2.4 with Embedded NoCs...... 139 14.2.5 High-Level Synthesis with Embedded NoCs...... 139 14.2.6 Partial Reconfiguration and Parallel Compilation...... 139 14.2.7 Latency-Insensitive Design ...... 140 14.2.8 Multi-Chip Interconnect and Interposers...... 140

A LYNXML Syntax 141

Bibliography 143

List of Tables

4.1 Baseline and range of NoC parameters for our experiments...... 34 4.2 Estimated FPGA Resource Usage Area [145] ...... 34 4.3 Raytracer area and delay ratios...... 36

5.1 Summary of FPGA/ASIC (soft/hard) router area and delay ratios...... 40 5.2 Summary of FPGA/ASIC power ratios...... 45

6.1 Soft interconnect utilization for a 64-node 32-bit mixed NoC using either C4/R4 or C12/R20 wires on the largest Stratix III device...... 55 6.2 System-level area-per-bandwidth comparison of different FPGA-based NoCs and regular point-to-point links...... 61 6.3 System-level power, bandwidth and energy comparison of different FPGA-based NoCs and regular point-to-point links...... 62 6.4 Summary of mixed and hard FPGA NoC at 65 nm...... 63

7.1 NoC parameters and properties for 28 nm FPGAs...... 66

8.1 Typical DDR3/4 memory speeds and their quarter-rate conversion on FPGA...... 77 8.2 Altera’s DDR3 memory latency breakdown at quarter, half and full-rate memory controllers. 78 8.3 Read transaction latency comparison between a typical FPGA quarter-rate memory controller, and a full-rate memory controller connected directly to an embedded NoC link. . . 79

11.1 Design steps and interconnect overhead of different systems implemented with Qsys. . . . 98 11.2 Interconnect utilization for JPEG with 40 streams in “far” configuration...... 103 11.3 Hardware cost breakdown of an NoC-based 10-Gb Ethernet switch on a Stratix V device. 105

12.1 Possible FabricPort input/output configurations for a time-multiplexing ratio of 4 and 2 VCs...... 114

List of Figures

1.1 Block diagram of FPGAs twenty years ago and today...... 2 1.2 System-level interconnection with soft buses or an embedded NoC...... 3

2.1 Local intra-module wires scale across technology nodes, but long inter-module wires do not (adapted from [75])...... 6 2.2 Reachable (logical and physical) per clock cycle using global and semi-global wires across different process technologies (from [75])...... 6 2.3 Latency-insensitive design flow steps...... 8 2.4 NoCs consist of routers and links. Links transport data and routers switch data between links...... 10 2.5 Four paradigms of interconnect design to handle the increasing complexity of VLSI circuits, identified in Section 2.1...... 11 2.6 Island-style FPGA architecture logic and switch blocks. A sample pass-transistor-based implementation of a 4:1 routing is shown...... 12 2.7 Trend in Altera FPGAs to augment fine-grained fabric with hard blocks (from [33]). . . 13 2.8 A block diagram of a typical bus connecting multiple masters to multiple slaves. . . . . 15 2.9 Physical reach of different size connections (2–512 bits) at 400 MHz (from [24])...... 16 2.10 The ratio of I/O transceiver bandwidth and core bisection bandwidth across FPGA generations...... 17 2.11 FPGA logic capacity and speed over time. FPGA size is increasing at a much faster rate making CAD compilation time a challenge (from [106])...... 18 2.12 Architecture of a typical reconfigurable computing system...... 19 2.13 Early work proposed a hard NoC that connects both the core and I/O of FPGAs (from [71])...... 22 2.14 CoRAM architecture uses an NoC to abstract communication to on-chip and off-chip memory (from [46])...... 23

3.1 A VC router with 5 ports, 2 VCs and a 5-flit input buffer per VC...... 28 3.2 Input module for one router port and “V” virtual channels...... 29 3.3 A 5-port multiplexer-based ...... 30 3.4 Separable input-first VC allocator with “V” virtual channels and “P” input/output ports. 31 3.5 Separable input-first switch allocator with “V” virtual channels and “P” input/output ports...... 32

4.1 RC π wire model of a wire with rebuffering drivers...... 38

5.1 FPGA/ASIC (soft/hard) area ratios as a function of key router parameters...... 40 5.2 FPGA/ASIC (soft/hard) delay ratios as a function of key router parameters...... 41 5.3 FPGA silicon area of memory buffers implemented using three alternatives...... 41 5.4 FPGA (soft) router area composition by component...... 43 5.5 ASIC (hard) router area composition by component...... 44 5.6 FPGA/ASIC (soft/hard) power ratios as a function of key router parameters...... 46 5.7 FPGA (soft) router power composition by component and total router power at 50 MHz. Starting from the bottom (red): Input modules, crossbar, allocators and output modules. 47 5.8 ASIC (hard) router power composition by component and total router power at 50 MHz. Starting from the bottom (red): Input modules, crossbar (very small), allocators and output modules...... 48 5.9 Baseline router power at actual data injection rates relative to its power at maximum data injection...... 49 5.10 Hard and soft interconnect wires frequency...... 51 5.11 Hard and soft interconnect wires power consumption at 50 MHz and 15% toggle rate. . 51

6.1 Floor plan of a hard router with soft links embedded in the FPGA fabric...... 53 6.2 Examples of different topologies that can be implemented using the soft links in a mixed NoC...... 54 6.3 Floor plan of a hard router with hard links embedded in the FPGA fabric...... 56 6.4 Power of mixed and hard NoCs with varying width and number of routers at a constant aggregate bandwidth of 250 GB/s...... 58 6.5 Power percentage consumed by routers and links in a 64-node mixed/hard mesh NoC. . 59 6.6 Different types of on-chip communication...... 60

7.1 Embedded hard NoC connects to the FPGA fabric and hard I/O interfaces...... 65

8.1 Data width and protocol adaptation from an FPGA design to an embedded NoC. . . . . 71 8.2 Waveform of ready/valid signals between soft module → FabricPort input, or FabricPort output → soft module...... 72 8.3 FabricPort circuit...... 73 8.4 FabricPort output sorting flits of different packets...... 74 8.5 Two options for connecting NoC routers to I/O interfaces...... 75 8.6 Block diagram of a typical memory interface in a modern FPGA...... 76

9.1 Zero-load latency of the embedded NoC (including FabricPorts) at different fabric fre- quencies...... 81 9.2 Zero-load throughput of embedded NoC between any two nodes, normalized to sent data...... 82 9.3 NoC packet format...... 83 9.4 FabricPort output merging two packets from separate VCs in combine-data mode, to be able to output data for two modules in the same clock cycle...... 83 9.5 Deadlock can occur if a dependency exists between two message types going to the same port...... 85 9.6 Design styles and communication protocols...... 86

9.7 Mapping latency-sensitive and latency-insensitive systems onto an embedded NoC. . . . 86 9.8 Area and frequency of latency-insensitive wrappers from [115] (original), and optimized wrappers that take advantage of NoC buffering (NoC-compatible)...... 88

10.1 Screenshot of NoC Designer showing its three data analysis features...... 91 10.2 RTL2Booksim allows the cycle-accurate simulation of an NoC within an RTL simulation in Modelsim...... 92 10.3 Sample floorplan of an emulated embedded NoC on a Stratix V 5SGSED8K1F40C2. . . 95

11.1 Connecting multiple modules to DDR3 memory using bus-based interconnect or the proposed embedded NoC...... 97 11.2 Comparison of area and power of Qsys bus-based interconnect and embedded NoC with a varying number of modules accessing external memory...... 99 11.3 Single-stream JPEG block diagram...... 100 11.4 Frequency of the parallel JPEG compression application with and without an NoC. . . . 102 11.5 Frequency of parallel JPEG with 40 streams when we add 1-4 pipeline stages on the critical path...... 102 11.6 Heat map showing total wire utilization for the NoC version, and only long-wire utiliza- tion for the original version of the JPEG application with 40 streams when modules are spaced out in the “far” configuration...... 104 11.7 Block diagram of an Ethernet switch that uses a hard NoC for switching and buffering. 104 11.8 Functional block diagram of one path through our NoC Ethernet switch [11]...... 105 11.9 Latency vs. injection rate of the NoC-based Ethernet switch design given line rates of 10, 25, and 40 Gb/s [11], and compared to the Dai/Zhu 16×16 10 Gb/s FPGA switch fabric design [50]...... 106

12.1 Overview of the LYNX CAD flow for embedded NoCs...... 110 12.2 A FabricPort time-multiplexes wide data from the FPGA fabric to the narrower/faster embedded NoC. 1–4 different bundles can connect to the shown FabricPort by using one-or-more FabricPort Slots (FPSlots) depending on the width and number of VCs. . . 113 12.3 The simplest translator takes data/valid bits and produces an NoC packet...... 115

13.1 Transaction systems building blocks...... 118 13.2 System-level view of master-slave connections using the NoC...... 119 13.3 A response unit at a slave buffers the return address information (return destination router and VC) and optionally a tag, and attaches it to the slave response...... 119 13.4 Traffic build-up in a multiple-master system...... 120 13.5 Credits traffic manager to limit traffic between a master sharing a slave with other masters.121 13.6 Investigation of the ideal number of credits for multiple-master communication with 3, 6 and 9 masters...... 122 13.7 Ideal number of credits for NoC traffic managers to minimize request latency...... 123 13.8 Comparison of Lynx NoC and Qsys bus latencies in a high-throughput system...... 124 13.9 Changing the number of credits at a master increases its arbitration share thus giving it priority access to the shared slave...... 125 13.10 Requests to multiple slaves can result in out-of-order replies...... 125

13.11 Different traffic managers to manage communication between a master and multiple slaves. 126 13.12 Maximum master throughput in a multiple-slave system with more than 4 slaves. . . . . 128 13.13 Area of the different traffic managers with width=300 bits as we increase the maximum number of outstanding requests (credits)...... 129 13.14 Area of Qsys buses of varying number of modules and 128-bit width...... 130 13.15 Breakdown of Figure 13.14 including the wrappers area for traffic managers and response units that are required with the hard NoC...... 131 13.16 Comparison of NoC and bus frequency for 128-bit systems...... 132

List of Abbreviations

ACG application connectivity graph

aFIFO asynchronous first-in first-out memory

ALM adaptive logic module

ASIC application-specific integrated circuit

BRAM block random access memory

CAD computer-aided design

DCT discrete cosine transform

DDR3 double data rate version 3

DDRx double data rate version x

DFI DDR PHY interface

DPI direct programming interface

DSP digital signal processing unit

ECC error correcting codes

FIFO first-in first-out memory

FPGA field-programmable gate array

GPU graphics processing unit

HDL hardware description language

HLS high-level synthesis

HOL head of line

I/O input/output

IP intellectual property

LAB logic array block

LUT lookup table

LUTRAM lookup table random access memory

MPFE multi-port front end

NoC network on chip

OS operating system

PHY physical interface

RAM random access memory

RCF routing constraints file

RLE run-length encoding

RTL register-transfer level

SAMQ statically allocated multi-queue

TDM time-division multiplexing

TSMC Taiwan Semiconductor Manufacturing Company

VC virtual channel

VCD value change dump

VLSI very large-scale system integration

XML extensible markup language

Chapter 1

Introduction

Contents 1.1 Motivation...... 1 1.2 Embedded NoCs for Future FPGAs...... 3 1.3 Thesis Organization...... 4

1.1 Motivation

Field-programmable gate arrays (FPGAs) have seen an incredible advancement in their technology and applications over the past two decades. FPGAs were once small devices with hundreds of configurable logic elements, wires and multiplexers for interconnection, and simple input/output (I/O) buffers. At that time, FPGAs were mainly used to create small logic circuits to augment application-specific integrated circuit (ASIC) chips. Two decades later, thanks to semiconductor scaling, FPGAs now contain millions of logic elements and are used in a myriad of computing and communication applications. Additionally, the FPGA architecture itself is now significantly more complex. Instead of dominating the chip, logic elements now constitute only one third of the FPGA area [33]. The other two thirds of the FPGA include embedded block random access memory (BRAM), (floating-point) digital signal processing units (DSPs), multi-core processors and fast I/O controllers to connect to external memory, Ethernet and PCIe. Figure 1.1 illustrates the development of FPGAs.

Surprisingly, the FPGA interconnect has changed much less over the past twenty years; it still consists of different-length wires that are stitched together using statically-controlled multiplexers to form a connection between two parts of the FPGA. Even though the FPGA’s interconnect scaled relatively well, it has started to show signs of stress. Fundamentally, the delay of long wires increases with semiconductor scaling, so it is no surprise that the FPGA’s interconnect now dominates critical path delay. This problem manifests itself clearly in connecting to fast external memory interfaces – it is becoming increasingly difficult to close timing on soft buses that connect to memory interfaces, thus necessitating multiple time-consuming design iterations. I/O and external memory speeds have been increasing at a faster rate than the FPGA’s speed, so the soft buses used to distribute I/O data have to run at a slower speed and are therefore very wide, and consume much area and power.


Figure 1.1: Block diagram of FPGAs twenty years ago and today. Modern FPGAs contain many hard blocks to efficiently implement on-chip memory and arithmetic and to connect to I/O interfaces.

The current FPGA interconnect is configurable at a very low level; the user can specify exactly how each wire is used. This fine-grained control over the interconnect is important for creating the small and highly customized connections that are found within a single intellectual property (IP) module. However, modern FPGA applications now include multiple IP modules communicating through standard wide interfaces. In implementing these wide inter-module connections, the low-level configurability of the current FPGA interconnect becomes a burden. For example, consider two modules with a frequency of 400 MHz communicating through a simple 512-bit point-to-point connection. Using the traditional FPGA interconnect, each bit of this connection will be created using a combination of wire segments and multiplexers in the FPGA’s interconnect fabric – if 511 out of the 512 bits successfully operated at 400 MHz and only 1 bit failed to meet the timing constraint and ran at 350 MHz, the entire system would be governed by that lower speed. This brittleness in the performance of soft interconnect is common in FPGA applications, especially when connecting to high-bandwidth I/O interfaces, begging the question of whether there is a better way to implement system-level communication on FPGAs.

In addition to the inefficiency of soft buses and their time-consuming design, we believe that they are also a barrier to modular design, which is crucial for modern FPGAs. The physical implementation of a system interconnected with a soft bus is monolithic – it is difficult to independently optimize and compile the modules connected through a soft bus because timing paths extend beyond the modules and into the bus. This dependence forces most designs to be compiled in a single time-consuming compilation and makes it difficult to compile modules in parallel. Furthermore, the performance and size of a soft bus are hard to predict until compilation completes, making for a long compile-debug-recompile cycle. Additionally, partial reconfiguration is challenging to implement without clear interfaces to which a newly-configured module can connect without interrupting a running FPGA application. Finally, any soft bus is specific to the application and FPGA device for which it was designed and therefore not easily portable to other applications or devices. Motivated by the shortcomings of soft buses for system-level interconnection, we propose a new system-level interconnect for FPGAs.


Figure 1.2: System-level interconnection with soft buses or an embedded NoC.

1.2 Embedded NoCs for Future FPGAs

This thesis proposes augmenting the FPGA interconnect with an embedded network on chip (NoC) to implement system-level interconnection. This is similar to current trends in multiprocessor and ASIC chip design, where the relative increase in wire delay has pushed designs towards communication-based systems using NoCs. A prefabricated hard NoC will have a predictable speed and will connect directly to I/O interfaces, thus easing the timing closure iterations faced by designers of soft buses. Because the NoC is implemented in hard logic, it is typically more area and power efficient than soft bus-based interconnects, especially for high-bandwidth applications. By raising the abstraction of system-level interconnection, an NoC promotes the modular design of FPGA systems and clearly separates computation and communication in an FPGA application. This will enable avenues for the independent optimization and compilation of modules connected through the NoC, and better suits the increasing complexity of FPGA applications. An FPGA designer needs only to focus on the design of IP modules, and then uses the embedded NoC for interconnection (optionally through an automated computer-aided design (CAD) flow), which eases design and makes applications more portable to different FPGA devices.

Every sizable FPGA application needs some form of system-level interconnect to transport data among IP modules in the FPGA core and I/O. Therefore, it is worthwhile to embed a flexible system-level interconnect if it suits FPGA design styles and works with important FPGA applications. We believe an NoC can fulfill that purpose – Figure 1.2 compares system-level interconnection with soft buses or the proposed embedded NoC. The proposed NoC consists of hard routers and links to leverage the efficiency and performance gains of hardening. NoC links transport the data between routers, which arbitrate and switch the data to any region on the FPGA. Importantly, the NoC is designed to satisfy the bandwidth requirements of high-speed I/O and memory interfaces so that it can always transport data from these fast interfaces without the need for manual timing closure. Another key feature of the NoC is its interface to the FPGA fabric. This interface must be flexible in supporting different widths and frequencies, as well as different communication styles and protocols.

1.3 Thesis Organization

After presenting motivation and prior work in Chapter 2, this thesis is divided into three parts. Part I is an efficiency study of NoC implementation on FPGAs. We use the results from our NoC component-level analysis in Chapter 5 to build area, delay and power models for complete NoCs in Chapter 6. In Chapter 7, we prototype an embedded NoC that suits modern FPGAs. Part II investigates how to use an embedded NoC with FPGA designs. We design a flexible interface between the NoC routers and the FPGA fabric in Chapter 8. We also investigate the implications of connecting an embedded NoC directly to external I/O interfaces in the same chapter. Chapter 9 explains the design rules one must follow to implement different FPGA design styles on NoCs. To be able to test our embedded NoC, we develop simulation and prototyping tools in Chapter 10. Chapter 11 presents three application case studies that highlight the merits of embedded NoCs compared to soft buses. In Part III, we design a CAD system to automatically connect an FPGA application using an embedded NoC. We discuss the CAD flow in Chapter 12, and then focus on properly implementing transaction communication using our embedded NoC. In Chapter 13, we compare the efficiency and performance of our CAD system to those of Altera’s Qsys in implementing transaction systems. Finally, we conclude the thesis in Chapter 14 and enumerate future avenues for research building on our work.

Chapter 2

Background

Contents 2.1 Interconnection Problems and Solutions...... 5 2.1.1 Scaling of Wires...... 6 2.1.2 Interconnect-Aware Design...... 7 2.1.3 Latency-Insensitive Design...... 7 2.1.4 Networks-on-Chip...... 9 2.2 FPGA Interconnection...... 12 2.2.1 FPGA Logic and Interconnect Scaling ...... 12 2.2.2 FPGA System-Level Interconnect...... 14 2.3 FPGA-Based NoCs...... 19 2.3.1 Soft NoCs...... 19 2.3.2 Hard NoCs ...... 21 2.4 Summary...... 24

On-chip communication is becoming ever-more challenging as very large-scale system integration (VLSI) systems grow larger and more complex; wires alone are no longer a suitable interconnect. We summarize the problems facing modern VLSI systems, and the progression of resulting solutions. This has often led to the use of NoCs to handle communication in several types of chips. We focus on FPGA-specific challenges and solutions, and motivate the need for a new addition to the FPGA’s interconnect architecture – we propose an embedded NoC. Finally, we survey existing FPGA-based NoC work.

2.1 Interconnection Problems and Solutions

In this section, we investigate the problem of wire scaling in advanced semiconductor process nodes and how it led to a paradigm shift in how VLSI circuits are designed. First, wire delay was taken into account by making existing CAD flows interconnect-aware. Next, latency-insensitive design decoupled communication and computation to simplify the design of large and complex systems. That led to communication-based system-level design, in which on-chip communication is implemented by a dedicated and decoupled interconnection circuit such as an NoC. We end this section with a brief overview of work in the domain of NoCs.



Figure 2.1: Local intra-module wires scale across process technology nodes, but long inter-module wires do not (adapted from [75]).

2.1.1 Scaling of Wires

The number of transistors on a semiconductor chip doubles with every new process technology [110]. As transistors shrink in size, they become faster and generally consume less power [1]. This trend has been the driving force behind the exponential increase in computing resources on a single computer chip for all types of VLSI circuits including FPGAs. However, to make useful computations, we need to connect different parts of the chip together using metal wires. How do these metal wires scale, compared to the exponential scaling of transistors? Many have predicted that wires will quickly limit the performance of VLSI circuits [32, 51]. In their 2001 paper “The Future of Wires”, Ho et al. perform a detailed analysis of this problem [75]. They found wires scale well across process nodes – almost proportionally to transistors – when they span the same logical distance; that is, they connect the same number of transistors. This implies that wiring that is local to a specific design unit or module (as shown in Figure 2.1) will be able to keep up with the transistor speed if that design unit is implemented in a newer technology. However, if a wire were to span a specific physical distance, say 1 cm for example, it will not be able to keep up with transistor speed at a smaller process technology.

Figure 2.2: Reachable distance (logical and physical) per clock cycle using global and semi-global wires across different process technologies (from [75]). (a) Logical distance. (b) Physical distance.

Figure 2.2 summarizes the results succinctly. It shows that wires can traverse roughly the same logical distance in one clock cycle as technology scaling shrinks the transistor size. However, the physical distance reachable per clock cycle decreases drastically with each process node. Ho et al.’s findings were somewhat surprising compared to conventional wisdom at the time. They conclude that “...the real problem is not with the wire, but rather with the increasing complexity that scaling enables.” As transistors become smaller, and their numbers increase exponentially, it becomes difficult to handle the ever-increasing complexity. Wire scaling is just one manifestation of the problem. The problem does not lie within the wires themselves; rather, it lies in the way our designs use the wires. It is no longer possible to cross a modern semiconductor chip in one clock cycle using wires [75]. This is why we need to find better ways to cross longer logical distances on a chip; that is, we need to implement more scalable forms of interconnection for our increasingly complex designs. We will refer to this scalable chip-wide interconnection as “system-level interconnect”.
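To make this trend concrete, the short sketch below uses the standard distributed-RC approximation for an unbuffered wire, t ≈ 0.38·r·c·L², to estimate how far a signal can travel in one clock cycle. The per-millimetre resistance and capacitance values and the clock periods are invented ballpark numbers (not data from [75] or from this thesis); they only illustrate the effect described above: the delay of a wire of fixed physical length does not shrink along with the clock period, so the distance reachable per cycle drops every generation.

```python
# Illustrative only: made-up r, c and clock values chosen to show the trend.
import math

nodes = [
    # (label, r [ohm/mm], c [fF/mm], clock period [ps]) -- assumed numbers
    ("older process node", 100.0, 200.0, 1000.0),
    ("newer process node", 400.0, 200.0,  500.0),  # thinner wires -> higher r
]

for label, r_per_mm, c_ff_per_mm, t_clk_ps in nodes:
    rc = r_per_mm * c_ff_per_mm * 1e-15           # seconds per mm^2
    delay_1mm_ps = 0.38 * rc * 1.0**2 * 1e12      # distributed-RC delay of 1 mm
    # Physical length whose RC delay fills one clock period: 0.38*r*c*L^2 = T
    reach_mm = math.sqrt((t_clk_ps * 1e-12) / (0.38 * rc))
    print(f"{label}: 1 mm wire delay = {delay_1mm_ps:.1f} ps, "
          f"reach in one {t_clk_ps:.0f} ps cycle = {reach_mm:.1f} mm")
```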

2.1.2 Interconnect-Aware Design

For many years, wires were an afterthought in VLSI design: transistors were the key resource. Only at the later stages of design, during layout, were wires planned and optimized. However, poor wire scaling led to wires becoming a major (and eventually dominant) source of delay and sometimes area. This motivated a change in the VLSI design flow so that wire delay is accounted for early in the process. Electronic design automation companies such as Synopsys and Cadence hurried to implement so-called “physical synthesis” in their CAD flows. Physical synthesis accounted for interconnect delay by combining timing analysis and estimates of interconnect delay with synthesis early in the design process, thus allowing early optimizations to aid timing closure [100].

Cong developed an “interconnect-centric” design flow that explicitly accounts for wire delays early in the design process [48]. Instead of waiting until layout to plan wires, Cong proposed starting the design flow with “interconnect topology planning” [48]. This first step translates a design description into a physical design hierarchy that attempts to optimize delay primarily based on global wire delay. This is combined with partitioning and register retiming to optimize and estimate delay of the whole design, at which point feedback can be given to the designer on whether they can meet their performance targets. The designer can then change the design and try interconnect topology planning again until the timing requirements are met. When that happens, design modules are synthesized and placed within the generated physical hierarchy. Finally, the “interconnect synthesis” step runs before layout to explicitly optimize wire sizes and buffer insertion [48].

Cong’s work fell short of addressing all of the concerns that arise with increasing design complexity. His main focus was on optimizing wire parameters to minimize delay and improve signal integrity, and he succeeded in modifying the current VLSI CAD flow for that focus. However, when timing requirements were not met, Cong’s framework simply gives early feedback to the user to manually change the design architecture and try again [48] – this manual intervention has obvious productivity and efficiency drawbacks.

2.1.3 Latency-Insensitive Design

Around the same time period (the late 1990s), other researchers were already developing design flows that automate the insertion of pipeline registers on timing-critical paths – this “latency-insensitive design” methodology required both a change in the CAD tools and the design specification, but promised to solve the problem of wire delay. Researchers were moving towards communication-centric design flows in which a system-level interconnect was independently designed and used at the heart of a module-based design flow.


Figure 2.3: Latency-insensitive design flow steps: design specification, shell encapsulation, and relay insertion.

Timing Closure Problem

Timing closure means meeting the clock-cycle time constraints of a hardware circuit. The timing closure problem arises when we cannot meet those timing constraints, and it is significantly worsened by the wire delay scaling problem. Previously, we could neglect wire delay altogether, but now wire delay dominates critical path delay. The problem is that we have no way of knowing actual wire delay until the very last (layout) stage of the VLSI CAD flow – only then can we identify slow connections in the design. We then optimize these slow connections through a variety of techniques, but unfortunately some of these techniques cause significant changes to the design and are hence very time-consuming. Adding pipeline registers is a particularly effective technique; however, it may necessitate a major change to the design if its functionality was sensitive to the latency of each connection. The next time the designer compiles the design, there is no guarantee of timing closure either, creating a manual cycle of iterative improvement of design performance. This tedious, time-consuming design → compile → repipeline cycle lowers productivity and is in many cases suboptimal.

Latency-Insensitive Methodology

Carloni and Sangiovanni-Vincentelli proposed automating the insertion of pipeline registers to avoid the timing closure problem altogether [35, 36]. Their work essentially proposed independently designing the computation of a digital circuit and its communication. This separation of computation and communication allowed the independent optimization of either the compute modules or their communication architecture without changing the other. This enabled avenues for the automatic generation and optimization of the communication circuitry in a design.

Carloni and Sangiovanni-Vincentelli also proposed a concrete design flow to both enable and leverage the separation of computation and communication [35]. Figure 2.3 shows the main steps in the latency-insensitive design flow. The design is entered as a collection of IP modules connected together in a communication graph. This module-based design style was already the norm in VLSI systems since it promotes IP reuse and simplifies the design of large and complex systems; the only difference is that latency-insensitive design requires inter-module connections to be explicitly specified to differentiate them from intra-module connections. The next step encapsulates each design module with a “shell” to make it tolerant to latency [35]. The shell has two functions: first, it adds stalling capability to the module to be able to stop its operation if all of its inputs are not ready, and second, it buffers the received inputs until all inputs are ready. The shell basically makes the module “patient” or insensitive to the latency of the connections feeding it – it does not matter how many clock cycles it takes each connection to transfer its data to the module inputs, because the module will only accept the inputs when they are all ready. The final step in latency-insensitive design occurs after placement, routing and layout of the design. Any slow inter-module connections are pipelined using relay stations to improve their delay. The relay stations are pipeline registers with the ability to stall if their downstream component is not ready to accept data [35].
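As a concrete illustration of a relay station, here is a minimal cycle-level model in Python (my own sketch under assumed names and interfaces, not code from [35] or from this thesis). It models a pipeline stage on a long inter-module connection that forwards valid data downstream and propagates a stall (not-ready) signal upstream; two storage slots are used so that a downstream stall does not drop the word already in flight.

```python
# Minimal relay-station sketch (illustrative; all names are invented).
class RelayStation:
    def __init__(self):
        self.slots = []                      # up to two buffered data words

    def ready(self):
        """May the upstream producer send to us this cycle?"""
        return len(self.slots) < 2

    def cycle(self, up_valid, up_data, down_ready):
        """Advance one clock cycle; returns (valid, data) driven downstream."""
        accept = up_valid and len(self.slots) < 2     # sampled at cycle start
        if self.slots:                                # drive the oldest word
            valid_out, data_out = True, self.slots[0]
            if down_ready:                            # downstream consumed it
                self.slots.pop(0)
        else:
            valid_out, data_out = False, None
        if accept:                                    # latch the upstream word
            self.slots.append(up_data)
        return valid_out, data_out

# Usage: stream four words through while the consumer stalls every other cycle.
rs, stream, i = RelayStation(), [1, 2, 3, 4], 0
for cyc in range(8):
    send = i < len(stream) and rs.ready()
    v, d = rs.cycle(send, stream[i] if send else None, down_ready=(cyc % 2 == 0))
    i += 1 if send else 0
    print(f"cycle {cyc}: downstream valid={v} data={d}")
```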

Separate Computation and Communication

Latency-insensitive design succeeded in separating computation and communication, and allowed for the automatic improvement of communication delay with minimal changes to traditional VLSI CAD flows. Much research has since looked into the performance analysis and optimization of latency-insensitive systems [36, 102, 103]. Other work has added support for “soft connections” in Bluespec (a hardware description language (HDL)) to allow the specification of latency-insensitive connections in digital circuits [121]. In the context of FPGAs, Murray and Betz refined the components of latency-insensitive design for FPGAs and quantified their overall cost [115]. Fleming et al. use latency-insensitive first-in first-out memories (FIFOs) to abstract communication over multiple FPGAs [62]. In later work, Fleming et al. combine their FIFO communication abstraction with on-chip memory and processor management to create a programming environment, called LEAP, for FPGAs [61]. A very good reference for further reading on latency-insensitive design is Carloni’s retrospective paper “From Latency-Insensitive Design to Communication-Based System-level Design” [34]. In this paper, Carloni cites the principle of separate computation and communication as one of the important foundations upon which NoCs were created [34]. Indeed, much work has investigated the intersection of NoCs and latency-insensitive design. For example, Hemani et al. assert that the “...overhead of [NoC] switching [should be] comparable to that of latency-insensitive design” to be viable [72].

The communication in VLSI systems is now thought of as a separate entity that is designed and optimized independently from the computation. We believe that an NoC with a latency-insensitive communication protocol could potentially ease the design of FPGA systems and make it more efficient. We discuss the birth of NoCs in the following subsection, and survey relevant work in that field.

2.1.4 Networks-on-Chip

NoCs [29, 53, 88] have been proposed to overcome the complexity challenges faced by nano-scale electronic chips. An NoC is a dedicated communication substrate designed to transport data between IP modules using a latency-insensitive protocol.¹ NoCs typically structure the wiring of a circuit into a regular topology, making it much easier to control its timing behaviour – this may eliminate timing closure iterations that were caused by the ad-hoc global connections between modules in a circuit [53]. Additionally, the regularity of NoCs facilitates modular design and promotes the use of standard interfaces, which are both means to better design more complex systems [53].

¹Some NoCs, such as circuit-switched variants, may have a fixed latency, but typical NoCs have a variable latency for transporting data.


Figure 2.4: NoCs consist of routers and links. Links transport data and routers switch data between links.

NoCs draw their inspiration from macro-scale communication networks – both consist of routers and links, as shown in Figure 2.4. However, NoCs exhibit different design tradeoffs because, unlike macro-networks, they do not have a stringent limitation on the number of router I/O pins, but buffering is more costly. NoCs are intended for both (heterogeneous) multiprocessor-based architectures [53] and custom application-specific implementations of systems on a chip [29]. In the past two decades, NoCs have been both researched and commercially implemented [141]. In this subsection we summarize some of these findings, divided into four main points of interest.

Router Microarchitecture

The architecture of NoC routers is heavily studied [52]. In the past decade, much work focused on improving the performance of NoCs and reducing their area and power – we give three examples. Mullins et al. focus on reducing latency by optimizing the router control logic [112]. Kim uses crossbar partitioning [86] to reduce the switch area overhead. Additionally, the technique used in [111] was found to considerably reduce dynamic power consumption. An open-source state-of-the-art router implementation [54] incorporates many of the component variations of NoC routers. For instance, both separable and wavefront allocators [27] are implemented, thus allowing the comparison of different microarchitectural choices. We use this router in our experiments and implementations in this thesis, and we explain its detailed microarchitecture in Chapter 3.
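To make the allocator terminology concrete, the following Python sketch (my own illustrative code, not the router implementation of [54]) performs one cycle of separable input-first switch allocation: each input port first arbitrates among its requesting VCs, and each output port then arbitrates among the winning input ports. The round-robin arbiters and all names are assumptions for illustration.

```python
# Illustrative sketch of separable input-first switch allocation (invented names).
class RoundRobinArbiter:
    def __init__(self, n):
        self.n = n
        self.last = n - 1                        # rotating priority pointer

    def arbitrate(self, requests):
        """requests: list of n bools; returns the granted index or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None

def switch_allocate(requests, in_arbs, out_arbs, num_outputs):
    """requests[i][v] = output port wanted by VC v of input port i, or None."""
    # Stage 1: each input port selects one of its requesting VCs.
    stage1 = {}                                  # input -> (vc, requested output)
    for i, vc_reqs in enumerate(requests):
        vc = in_arbs[i].arbitrate([r is not None for r in vc_reqs])
        if vc is not None:
            stage1[i] = (vc, vc_reqs[vc])
    # Stage 2: each output port selects one of the competing input ports.
    grants = {}                                  # (input, vc) -> granted output
    for out in range(num_outputs):
        competing = [stage1.get(i, (None, None))[1] == out
                     for i in range(len(requests))]
        winner = out_arbs[out].arbitrate(competing)
        if winner is not None:
            grants[(winner, stage1[winner][0])] = out
    return grants

# Example: 3 input ports with 2 VCs each, 3 output ports.
in_arbs = [RoundRobinArbiter(2) for _ in range(3)]
out_arbs = [RoundRobinArbiter(3) for _ in range(3)]
requests = [[0, 1],       # input 0: VC0 wants output 0, VC1 wants output 1
            [0, None],    # input 1: VC0 wants output 0
            [None, 2]]    # input 2: VC1 wants output 2
print(switch_allocate(requests, in_arbs, out_arbs, num_outputs=3))
```

Running this example shows the well-known weakness of separable input-first allocation: input 1 commits to its VC0 request in stage 1 and then loses output 0 to input 0 in stage 2, leaving it unmatched for that cycle.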

Modeling and Analysis

Through quick access to NoC efficiency/performance metrics, designers can better optimize NoCs in shorter design cycles; furthermore, these models are necessary for the automatic synthesis or optimization of NoCs by CAD tools. Most previous work constructed mathematical models for area and power based on the subcomponents of NoCs [25, 82, 144, 151]. In addition, C-based NoC simulators have been developed to quickly characterize NoC performance [80]. Some papers discuss the power breakdown of NoCs by router components and links, and investigate how power varies with different data injection rates in an NoC [70, 111, 144]. Such power-breakdown analysis guides the design and optimization of NoC components. Other work focuses on complete systems and reports the power budgeted for communication using an NoC [90, 135]. From a system design viewpoint, these measurements are crucial as power dissipation is becoming more and more of a limiting factor in dense submicron chips.


Figure 2.5: Four paradigms of interconnect design to handle the increasing complexity of VLSI circuits, identified in Section 2.1: logic-centric design, interconnect-aware design, latency-insensitive design, and communication-based design.

CAD Tools

There are many degrees of freedom in NoC design, such as the topology, buffer size and number of virtual channels (VCs), to name a few. This has caused manually-designed NoCs to be overprovisioned and therefore unnecessarily large or power hungry; furthermore, the process of manually designing an NoC is time-consuming. This has motivated research into the automatic synthesis of NoCs [28, 67, 76, 77, 113, 114, 116]. Of special interest is the problem of mapping applications to regular NoC topologies. While some prior work uses a branch-and-bound algorithm to map an application to an NoC [77], other work models the mapping as a quadratic assignment problem and solves it using tabu search [113], or uses simulated annealing [105]. Sahu and Chattopadhyay provide a good survey of the numerous different mapping algorithms for NoCs [127].
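For illustration only, the toy Python sketch below maps application modules onto a small mesh with simulated annealing, in the spirit of the mapping work cited above but not taken from any of those papers. The cost function (bandwidth times Manhattan hop count), the move (swapping two modules), the assumption of one module per router, and every parameter value are arbitrary choices of mine.

```python
# Toy annealing-based mapper (illustrative; not from [105], [113] or [127]).
import math, random

def hops(a, b, cols):
    """Manhattan distance between two router indices on a cols-wide mesh."""
    return abs(a % cols - b % cols) + abs(a // cols - b // cols)

def cost(placement, edges, cols):
    """Sum over edges of communication bandwidth times mesh hop count."""
    return sum(bw * hops(placement[u], placement[v], cols)
               for (u, v, bw) in edges)

def anneal_map(edges, rows, cols, temp=5.0, cooling=0.999, iters=20000, seed=1):
    rng = random.Random(seed)
    placement = list(range(rows * cols))         # module index -> router index
    rng.shuffle(placement)
    cur = cost(placement, edges, cols)
    for _ in range(iters):
        i, j = rng.sample(range(len(placement)), 2)
        placement[i], placement[j] = placement[j], placement[i]      # propose swap
        new = cost(placement, edges, cols)
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new                            # accept (possibly uphill) move
        else:
            placement[i], placement[j] = placement[j], placement[i]  # undo
        temp *= cooling
    return placement, cur

# Example: 4 modules on a 2x2 mesh; module 0 talks heavily to modules 1 and 2.
edges = [(0, 1, 8.0), (0, 2, 8.0), (0, 3, 1.0), (1, 2, 2.0)]
print(anneal_map(edges, rows=2, cols=2))
```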

Comparison with Buses

Because NoCs are proposed as an alternative to multiplexed buses and point-to-point connections, much previous work has compared these three interconnect options [107, 130]. Comparing different types of interconnect is tricky because they exercise different tradeoffs; for example, a bus may have lower bandwidth and lower power compared to an NoC, so comparing either bandwidth or power by itself may give an incomplete picture. This is why some previous work has used architecture- and application-independent metrics, such as the energy-per-bit, to compare different kinds of interconnect [22, 23]. This captures both the power and the bandwidth of an interconnect in a representative efficiency metric for comparison. Buses are a suitable interconnect for small systems, and some researchers have investigated the question of when NoCs will become the preferred option [152]. By increasing the number of cores connected in a system, and measuring the frequency and latency of different traffic patterns, we can quantify the point at which NoCs become more efficient, or better performing, than buses [152]. Some work focused on exploring a single real application when using either an NoC, a bus or point-to-point links for interconnection [94], while other research focused on making synthetic traffic more realistic to more accurately compare the performance of NoCs and buses [142].
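As a back-of-the-envelope illustration of why such normalized metrics are useful, the snippet below computes energy-per-bit and area-per-bandwidth for two hypothetical interconnects; every number is an invented placeholder, not a measurement from this thesis or from [22, 23]. It shows how an interconnect with higher absolute power can still be the more efficient option once its much higher bandwidth is taken into account.

```python
# Back-of-the-envelope example of the metrics above; all numbers are invented
# placeholders, not measurements from this thesis or from [22, 23].
def energy_per_bit_pj(power_mw, bandwidth_gbps):
    return power_mw / bandwidth_gbps             # mW / (Gb/s) = pJ/bit

def area_per_bandwidth(area_mm2, bandwidth_gbps):
    return area_mm2 / bandwidth_gbps             # mm^2 per (Gb/s)

interconnects = {
    # name: (power [mW], aggregate bandwidth [Gb/s], area [mm^2]) -- assumed
    "soft bus": (120.0,  50.0, 2.0),
    "hard NoC": (180.0, 400.0, 1.0),
}
for name, (p, bw, a) in interconnects.items():
    print(f"{name}: {energy_per_bit_pj(p, bw):.2f} pJ/bit, "
          f"{area_per_bandwidth(a, bw) * 1000:.1f} mm^2 per Tb/s")
```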


Figure 2.6: Island-style FPGA architecture logic and switch blocks. A sample pass-transistor-based implementation of a 4:1 routing multiplexer is shown.

2.2 FPGA Interconnection

In the past two decades, VLSI circuit design has exhibited a paradigm shift to handle its increasing complexity and capacity, from logic-centric design to communication-based system-level design [34]. In Section 2.1 we showed how design is passing through four stages in this shift towards communication-based design using NoCs (summarized in Figure 2.5). In this section we focus on FPGAs. We investigate how FPGA interconnect architecture changed in the past two decades to accommodate the higher wire delay and design complexity. We also motivate the addition of embedded NoCs by looking at current FPGA system-level interconnects and their problems. We discuss how traditional soft buses make it challenging to connect to fast I/O interfaces, and are a barrier to much-needed modular design on FPGAs. Finally, we discuss how an embedded NoC can make design more modular on FPGAs, and we show that this is important and relevant for many FPGA design domains.

2.2.1 FPGA Logic and Interconnect Scaling

Figure 2.6 shows an FPGA’s fabric at a high level. The logic blocks contain lookup tables (LUTs) and registers that can implement any digital logic function. The programmable interconnect consists of wires and switch blocks – these are used to connect logic blocks together to create useful functions. Switch blocks allow programmable interconnection of the routing wires, and are now most commonly implemented with “direct-drive” multiplexers, which use pass-transistors followed by a buffer and driver as shown in Figure 2.6 [97].
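The tiny behavioural model below (my own simplification, not taken from [97]) captures the property of this fabric that matters for the rest of the chapter: the multiplexer select value is frozen by SRAM configuration bits when the bitstream is loaded, so each routing multiplexer statically forwards one chosen input wire rather than switching data dynamically at run time. The two-bit binary select encoding is an assumption; real devices use a variety of encodings and multiplexer sizes.

```python
# Behavioural sketch of a statically configured 4:1 routing multiplexer.
class RoutingMux4:
    def __init__(self, config_bits):
        # Two SRAM configuration bits pick one of the four input wires; the
        # choice is made once, at configuration time (assumed binary encoding).
        assert len(config_bits) == 2
        self.select = 2 * config_bits[0] + config_bits[1]

    def drive(self, input_wires):
        # In silicon: pass-transistor mux followed by a buffer and driver.
        # Behaviourally it is a static selection of one of the four inputs.
        return input_wires[self.select]

mux = RoutingMux4(config_bits=[1, 0])     # configured once, at bitstream load
print(mux.drive([0, 1, 1, 0]))            # always forwards input wire 2
```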

FPGA Logic Scaling

We look at the past two decades of FPGA development to investigate the trends in device architecture. Figure 2.7 shows the ratio of soft logic, I/Os, memory and hard IP in Altera FPGAs and illustrates how FPGAs have continuously added hard blocks to augment the soft fabric. First, embedded BRAMs were added to implement more efficient on-chip storage. Second, DSPs were added to implement arithmetic operations more efficiently. Next, FPGA vendors started to embed I/O interfaces such as memory controllers and PCIe interfaces due to their widespread use and efficiency when implemented hard.

Figure 2.7: Trend in Altera FPGAs to augment fine-grained fabric with hard blocks (from [33]).

Recent devices also contain multi-core embedded processors, and the DSPs have been upgraded to support single-precision floating-point arithmetic operations [91]. Even though the fraction of the die devoted to soft logic decreased over the past 20 years, the absolute number of logic blocks increased exponentially. Stratix I [97] was released in 2003 and Stratix 10 [99] in 2016, with Stratix II–V in between [95, 96, 98]. Between Stratix I and Stratix 10, the number of logic elements² grew from ~45,000 to ~3,000,000 – a ~65× increase in soft logic capacity.

FPGA Interconnect Scaling

Between Altera's Stratix I and Stratix V there were no fundamental changes to the FPGA's interconnect. One only needs to look at the publications about the Stratix-series FPGAs to confirm this. In the Stratix II paper, the authors say that "Stratix II uses a routing architecture that is similar to Stratix" [95]. The Stratix III–IV paper focuses on other techniques and on the embedded memory blocks, but does not mention the interconnect architecture at all [96]. In the Stratix V paper, the authors "...explore minor variations [in the interconnect architecture] that could keep pace with the increase in routing demand as well as obtain performance improvement" [98]. The "minor variations" that the FPGA's interconnect underwent comprise changes in channel width, wire length and buffering to work best with each device architecture. Xilinx FPGAs have not undergone any major changes to their interconnect either. In their latest paper, Xilinx's FPGA architects identify the problem of wire scaling, and their current solution is to "...aid placement and routing algorithms to achieve critical paths with fewer interconnect resources" [40]. Murray and Betz show that the FPGA's interconnect frequency has improved modestly for local communication covering the same logical distance, but deteriorated for global communication that spans a constant physical distance [115]. Indeed, interconnect consumes a large portion of the critical path delay on FPGAs because each connection consists of both wires and pass-transistor multiplexers [44]. However, designs are typically heavily pipelined on FPGAs due to the abundance of on-chip registers – as

long as the register-to-register delay was still acceptable, FPGA frequency could scale slightly from one generation to the next. This is reminiscent of the interconnect-aware design paradigm (see Figure 2.5) where the optimization of wires was enough to maintain scaling. Only in Stratix 10 was wire optimization no longer sufficient for interconnect scaling. Stratix 10 FPGAs include pipeline registers within their interconnect fabric (originally proposed two decades prior by Ebeling et al. [56], and Singh and Brown [136]) to enable finer-grained and lower-cost pipelining of connections [99]. Furthermore, the architects of Stratix 10 acknowledge that the design style can no longer remain logic-centric, and that users will "...perform some work to make [the design] amenable to pipelining" [99]. By adding registers to the interconnect fabric, pipelining becomes easier and more effective. This combination of a pipeline-amenable FPGA architecture and the change in design style indicates that FPGAs are moving to the third design paradigm of latency-insensitive design (see Figure 2.5). However, communication and computation are still coupled in the FPGA fabric since they are both synthesized, placed and routed in the same monolithic CAD flow. This coupling hampers modularity and the potential for independent design, optimization and implementation of IP blocks and the interconnect.

Ye and Rose pointed out that the fine-grained FPGA interconnect may not be suitable for wide circuits [150]. In their 2006 paper, they modified the FPGA architecture to partially use multibit routing so that multiple wires are controlled through a single configuration random access memory (RAM) cell [150]. They show that they can improve FPGA area by ~10%, but only for datapath-heavy circuits [150]. Their results emphasize the need for a coarser-grained interconnect to better suit the ever-more prevalent wide-datapath circuits that are implemented on FPGAs. Our main goal in this thesis is to augment modern FPGAs with an NoC to decouple communication and computation altogether, and move towards the paradigm of communication-based design. Because a circuit implemented in the FPGA fabric is ~35× larger than its ASIC equivalent, FPGAs lag 4 or 5 generations behind ASICs in logic capacity [89]. This is why we believe that even if it is not required now, an advanced communication-based FPGA architecture will become urgent in the future to (at least) handle the increasing design complexity. Additionally, a communication-based architecture – such as one based on a coarse-grained NoC – can be designed so as to better map wide-datapath applications onto FPGAs.

²FPGA vendors measure logic capacity in logic elements, which roughly refer to a 4-input LUT and a bypassable register.

2.2.2 FPGA System-Level Interconnect

In this section we present buses: the de facto standard for connecting systems on FPGAs. We focus on the connection to fast I/O interfaces, a common and very important sub-problem in FPGA application design, which is increasingly difficult to solve with soft buses. Additionally, we explore the need for modular design on FPGAs, motivated by FPGA compute acceleration, high-level synthesis (HLS) and partial reconfiguration. We believe that a dedicated system-level interconnect (such as an embedded NoC) will effectively decouple the design of each compute module in an application, making FPGA application development more modular and therefore easier, faster and more scalable.

Buses for System-level Interconnection

Figure 2.8 is a block diagram of a bus connecting multiple masters to multiple slaves. The bus consists of a large, centralized multiplexer controlled by an arbiter. Additionally, a bus will often contain pipeline registers at various locations to improve its speed [85].


Figure 2.8: A block diagram of a typical bus connecting multiple masters to multiple slaves.

If modules connecting to the same bus operate at different frequencies, the bus must also contain clock-crossing circuitry such as an asynchronous first-in first-out memory (aFIFO) to bridge clock domains. Similarly, if different-width modules connect through a single bus, width adapters are required to adapt the width accordingly. This style of bus-based interconnect is used by both Altera and Xilinx in their system-integration tools [49, 79].

System-level interconnection tools allow the designer to focus on designing the compute modules, and then the tool automatically connects the modules together to match a detailed designer specification. While this greatly helps the design of complex systems, the system integration tools do not solve the problems that are inherent to soft buses. In academia, Orthner implemented a tool that creates soft buses for system-level interconnection [85]. His work focused on automating the creation of multiplexer-based buses for implementing transaction communication. More recently, Rodionov et al. worked on GENIE – an academic interconnection tool that targets both fine-grained and coarse-grained connections [125, 126]. Rodionov et al. use a split/merge NoC since it is distributed and more scalable than centralized multiplexer-based interconnects. This trend of moving from centralized bus-based interconnects to distributed NoC-based ones is evident for both FPGAs and ASICs.

Soft bus frequency is unpredictable, making timing closure difficult and often requiring manual designer intervention in designing the bus itself. Additionally, soft buses consume much FPGA area, power and compilation time since they use the FPGA's soft fabric. In their physical implementation, there is no clear boundary between a system's compute modules and the system-level interconnect bus; consequently, the soft bus creates an undesirable coupling between compute modules. For example, the physical size and placement of the bus will influence where compute modules are placed on the FPGA. Additionally, timing paths exist between modules through the bus, which might force all components of a system to be co-optimized and compiled together in a slow and monolithic CAD step, thus hampering efforts to individually compile modules in a system. A prefabricated embedded NoC, as presented in this thesis, can decouple modules more effectively and greatly enhance modular design and CAD.

Figure 2.9: Physical reach of different size connections (2–512 bits) at 400 MHz (from [24]).

Connecting to I/O Interfaces

Connecting to I/O interfaces is among the most important functions of a system-level bus, wherein it distributes I/O data to/from external memory and fast transceivers. A prime example of such a bus is formed between a double data rate version 3 (DDR3) memory controller and the multiple masters that access it. A typical DDR3 controller can run at 1067 MHz (transferring data on both clock edges) with a 64-bit interface; this must be downconverted (at quarter rate) to ~267 MHz to operate at the FPGA fabric speed, and hence requires a bus that is 512 bits wide (this arithmetic is sketched below). This bus often spans a large portion of the FPGA chip to connect the DDR3 controller and the multiple masters accessing its data. Its large size makes it both difficult to design and inefficient.

Difficult design: This difficulty stems from the timing closure requirements. The bus must be designed so that its frequency is high enough to keep up with DDR3 bandwidth – 267 MHz in our example. Designing a bus that is both very wide and very fast is difficult on FPGAs and often requires the addition of multiple pipeline stages. This hurts productivity as the designer must compile the design (a multi-hour process for a large FPGA), check if the frequency target is met, and if not take corrective action such as adding pipeline registers and recompiling (the timing closure problem). In his 2016 talk [24], Greg Baeckler highlights this problem by showing the reach of different-size connections on a modern FPGA. As Figure 2.9 shows, wide 512-bit buses can only reach about one fifth of the FPGA when running at 400 MHz. This is due both to the accumulated delay of successive short wires stitched together in the FPGA's interconnect and to routing congestion.

Hardware and CAD inefficiency: These wide and fast buses use much of the FPGA's soft logic and interconnect because of their large width. They also lead to a large computer-aided design (CAD) problem, as the CAD system must synthesize, place, and route each LUT and register that together compose the bus; this exacerbates the long compile time of FPGA tools.
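As a quick sanity check of the bus-sizing arithmetic in the example above, the Python sketch below recomputes the required width. The controller and fabric figures mirror the example in the text; a real design would substitute the numbers of its own memory interface.

ddr_clock_mhz   = 1067        # DDR3 interface clock from the example above
bits_per_beat   = 64          # interface width
beats_per_clock = 2           # data transferred on both clock edges (DDR)

peak_bw_gbps = ddr_clock_mhz * 1e6 * beats_per_clock * bits_per_beat / 1e9
# ~136.6 Gb/s of raw DDR3 bandwidth

fabric_clock_mhz = ddr_clock_mhz / 4            # quarter rate, ~267 MHz
bus_width_bits   = peak_bw_gbps * 1e9 / (fabric_clock_mhz * 1e6)
print(f"{peak_bw_gbps:.1f} Gb/s -> {bus_width_bits:.0f}-bit bus at {fabric_clock_mhz:.0f} MHz")
# prints: 136.6 Gb/s -> 512-bit bus at 267 MHz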


Figure 2.10: The ratio of I/O transceiver bandwidth to core bisection bandwidth across FPGA generations. Stratix I–V data are shown for the 3 highest-bandwidth devices in each generation.

Hence, a major part of the challenge of system-level interconnect design is due to the construction of a coarse-grained wide connection using very fine-grained soft logic, raising the question of whether it can be done in a better way. The FPGA's soft logic and interconnect are very flexible and capable of creating highly customized circuits. However, FPGA architects have already realized the need for more efficient hard logic (multipliers, adders and processors), hard memory (BRAM) and hard I/O (embedded I/O controllers) – these hard blocks process wide data much more efficiently than the soft fabric. Similarly, this thesis proposes an embedded interconnect to augment the FPGA's fine-grained fabric, and to implement wide connections – which do not require fine-grained configuration – more efficiently. Figure 2.10 plots the ratio of FPGA I/O transceiver bandwidth to the bisection bandwidth of the FPGA's on-chip interconnect; the I/O transceiver bandwidth has been growing at a faster rate than the bandwidth of the FPGA's programmable interconnect. This highlights that moving all the I/O data across an FPGA with the traditional interconnect is becoming ever more challenging, and further motivates the addition of a high-bandwidth on-chip interconnect to augment the existing fabric and help keep up with the increasing I/O bandwidth demands.

The Need for Modular Design on FPGAs

An FPGA application consists of compute modules and a system-level interconnect for communication between the modules. Much work has been done to more easily create the compute modules on FPGAs; however, system-level interconnection was often a secondary issue. This is similar to how the FPGA architecture itself has increasingly included hard blocks (such as processors, DSPs and BRAM) to enhance computation, whereas the interconnect has seen less radical change. Improving computation is undoubtedly important for FPGA applications; however, we argue that the increase in design complexity necessitates a focus on on-chip communication as well. This section enumerates different ways in which the design of compute modules was made easier and more modular on FPGAs, and draws parallels for system-level interconnects.

Other than easing timing closure, we show how an embedded NoC can complement the advances in FPGA systems towards more modular, communication-based FPGA systems.

Figure 2.11: FPGA logic capacity and processor speed over time. FPGA size is increasing at a much faster rate, making CAD compilation time a challenge (from [106]).

Compile time: Long compilation times are a huge problem in FPGA CAD. Ludwin and Betz highlight this problem in Figure 2.11 by showing that FPGA size is increasing at a faster rate than processor speed, thus requiring big CAD flow innovations to maintain the same runtime [106] – particular emphasis was placed on making CAD steps more parallel [20]. From a design perspective, Lavin et al. proposed pre-compiling "hard macros" to speed up compilation of compute modules [92], but no work tackled the compilation of the system-level interconnect. A hard NoC can eliminate the compile time of the system-level interconnect since it is prefabricated in hard logic. Furthermore, modules communicating through the NoC will be disjoint and timing-decoupled, enabling their parallel and independent compilation, which may considerably improve CAD runtime and design productivity.

High-level Synthesis: Hardware designs are time-consuming and difficult to specify partly because they are written in low-level HDLs. HLS tackles this problem by compiling easy-to-use programming languages (such as C/C++) to hardware [63] – this makes the design of compute modules easier but does not tackle system-level interconnection. The existing system-interconnection tools are then used to create buses to connect the HLS compute modules in a system – we propose replacing the existing tools with ones that leverage embedded NoCs instead, thus alleviating the limitations of soft buses.

Data Center Computing: An important application of FPGA computing is the acceleration of data center applications such as Microsoft's acceleration of their search engine [122]. Additionally, other applications related to geoscience [101], financial analysis [45], video compression [42] and information filtering [41] have been accelerated using FPGAs while demonstrating order-of-magnitude gains in performance and power efficiency compared to multicore processors and graphics processing units (GPUs). In such systems, an important part of the FPGA system is the "communication infrastructure" (or "shell"), which connects the application compute kernels to each other and to I/O interfaces – Figure 2.12 shows the anatomy of a typical FPGA compute accelerator. Currently, the design of the shell using a custom soft bus is a painstaking exercise riddled with timing closure iterations because it typically connects to many fast I/O interfaces. Additionally, shells are not portable, so they need to be redesigned for each FPGA device. Instead, an embedded NoC can abstract communication to I/Os and make shells easier to design, more portable and more efficient.


Figure 2.12: Architecture of a typical reconfigurable computing system. An FPGA is connected through PCIe to a host processor and through DDRx controllers to external memory. Communication Infrastructure refers to the interconnect that is designed to connect the application kernels to external devices as well as on-chip memory.

Partial Reconfiguration: One of the main barriers to partial reconfiguration on FPGAs is the complexity of creating the interface to a newly-configured module, and connecting the new module to the rest of the (running) FPGA application. Both Xilinx's and Altera's partial reconfiguration flows lock down all interfaces to partially reconfigured modules [19, 147], and these interfaces are of fixed size and protocol. It is often the case that the system-level interconnect (connecting partially-reconfigurable modules together) is also locked down [19, 147]. An embedded NoC can become a prefabricated interconnect and interface that natively supports partial reconfiguration in FPGAs. A newly-configured module can connect to an NoC port and immediately gain access to communicate with the rest of the design. By decoupling the communication between modules in a design, an embedded NoC can more easily handle the swapping of modules in a partial reconfiguration system.

2.3 FPGA-Based NoCs

We motivated the need for embedded NoCs to handle the ever-increasing design size and complexity of FPGA applications. To put our embedded NoC proposition in context, this section summarizes important existing work on hard and soft FPGA-based NoCs.

2.3.1 Soft NoCs

Soft (overlay) NoCs have been extensively studied in the context of FPGAs, motivated by improved design modularity, better scalability and the use of standard interfaces. Some work used NoCs to implement multiprocessor systems on FPGAs [55, 128, 129]. However, most other work sought to create a general-purpose NoC for use as a system-level interconnect in FPGA applications – this work focused mainly on creating an efficient NoC implementation that is suitable for the FPGA fabric. In this section, we comment on some of these soft NoC implementations and compare them against each other.

Switching Methods

Motivated by area efficiency and design simplicity, different switching methods are implemented in FPGA NoCs. PNoC uses circuit switching because it minimizes hardware overhead due to the absence of buffers and the simple control logic [74]. Store-and-forward switching buffers a whole packet at each router before proceeding to the next hop. This is used in LiPaR because it greatly simplifies the arbitration logic, saving on arbitration delay, area and power compared to wormhole switching [133]. Virtual cut-through switching also requires buffering a full packet but, unlike store-and-forward, packets may progress to the next hop even if the entire packet is not yet buffered at the current router. TopAd uses virtual cut-through switching, motivated by long packets which are unsuitable for plain wormhole switching [26]. Wormhole switching allows flits to proceed to the next hop even if there is not enough space to buffer the entire packet. This reduces latency further but may lead to poor link utilization for long packets distributed across multiple routers due to head-of-line (HOL) blocking. CONNECT [117], OpSrc [57], RASoC [153] and NoCem [131] are all wormhole NoCs that make switching decisions based on a single flit rather than a whole packet, thus reducing buffering and improving throughput at the cost of more complex arbitration logic in each router.
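The sketch below contrasts the buffering condition each of these switching methods imposes before a head flit may advance. It is a deliberate simplification that ignores VCs, allocation and credit bookkeeping, and is meant only to summarize the survey above rather than to model any particular NoC.

def head_flit_may_advance(method: str, packet_len_flits: int,
                          downstream_free_flits: int,
                          flits_buffered_locally: int) -> bool:
    """Simplified forwarding condition for one packet at one router."""
    if method == "store_and_forward":
        # forward only after the full packet has arrived and fits downstream
        return (flits_buffered_locally >= packet_len_flits and
                downstream_free_flits >= packet_len_flits)
    if method == "virtual_cut_through":
        # flits may stream ahead of the tail, but the downstream buffer must
        # guarantee room for the whole packet
        return downstream_free_flits >= packet_len_flits
    if method == "wormhole":
        # one free flit slot suffices; a blocked packet may then occupy
        # buffers in several routers (head-of-line blocking risk)
        return downstream_free_flits >= 1
    raise ValueError(f"unknown switching method: {method}")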

Virtual Channels

A VC is a separate buffer queue at each router input port [120]. VCs can be used together with any switching method to increase throughput by alleviating the HOL blocking problem. If a packet's output is blocked, another packet can proceed through the same physical link on a separate VC. Schelle and Grunwald compared an instance of NoCem with no VCs to another with two VCs and reported more than twice the performance for a high-throughput application [131]. TopAd does not use VCs but its authors plan to implement a wormhole-switched network with VCs in their next iteration to maximize throughput [26]. CONNECT is a VC router that can be parameterized to include any number of VCs; it boasts comparable throughput to a state-of-the-art ASIC-intended router implemented on an FPGA [117]. Huan and DeHon argue that a radically different NoC architecture that uses no VCs better suits FPGAs [78, 84]. They replace routers with split and merge primitives and heavily pipeline their NoC, resulting in a 2–3× performance improvement over CONNECT [78]. Recently, split and merge blocks were used in an automated CAD tool to generate interconnect for FPGAs [125, 126].

Routing

Routing defines the path taken through the network. Adaptive routing takes network traffic into account by favoring low-utilization links, while deterministic routing will always send a packet on a predetermined path regardless of congestion. Most FPGA NoCs use deterministic routing because of its low complexity and low hardware overhead: a simple implementation uses a routing table in each node that tells a flit where to go next. This is used in CONNECT [117], OpSrc [57] and PNoC [74]. Routing tables grow quadratically with the number of network nodes, so they do not scale to large network sizes and can be replaced with simple routing logic, as implemented in LiPaR [133], NoCem [131] and RASoC [153]. TopAd implements operating system (OS) controlled adaptive routing combined with routing tables [26]. The OS continuously gathers traffic statistics on a dedicated low-cost NoC, then updates the routing tables in the data NoC routers such that traffic is evenly distributed [26]. This kind of adaptivity requires an OS and is less responsive to traffic variations compared to hardware-controlled adaptive routing, which is very scarce in the FPGA literature. Hoplite uses adaptive deflection routing in a very lightweight NoC architecture [83]. By removing all buffering and optimizing the crossbar, Hoplite is tiny compared to other NoCs; however, it is restricted to single-flit packets and suffers from very high data transmission latency in adversarial traffic patterns as a result of deflection routing [83].

Flow Control

Flow control specifies when a packet or flit can proceed to the next hop based on buffer space availability [120]. LiPaR [133] and RASoC [153] use request/acknowledge handshake signals. A common method, used in OpSrc, is on/off flow control, which stops packet transmission when a buffer is almost full [57, 84]. CONNECT uses credit-based or peek flow control; both keep track of buffer space availability in downstream ports [117]. Peek flow control has dedicated wires to relay buffer space availability, making use of the ample FPGA routing resources [117].

2.3.2 Hard NoCs

One of the main barriers to the adoption of soft NoCs is their large area overhead and poor operating frequency. A 16-node, 128-bit CONNECT NoC, for instance, uses 36% of a Xilinx LX240T FPGA's LUTs and can run at ~113 MHz – a 1.5× improvement compared to an NoC that is not tailored to FPGAs [117]. However, if we want to use this soft NoC to distribute data from fast external memory (for example, a 64-bit, 800-MHz DDR3 memory controller), it would need to be 4× wider and 2× faster to transfer DDR3 bandwidth across its links at quarter rate (a quick check of these factors is sketched below), which would almost consume the entire area of the LX240T FPGA. We argue that realistic high-bandwidth FPGA applications will not benefit from a soft NoC. These very applications are the ones that are difficult to design because of their high complexity, connections to fast I/O interfaces and tough timing closure. This has led to research into hard NoCs, where an NoC can ease the design of a complex FPGA application with much lower overhead and higher performance. However, work on hard NoCs has been somewhat scarce (which partly motivated the work in this thesis). In this section, we outline previous work that proposes the addition of a hard NoC to FPGAs.
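A back-of-the-envelope check of these scale factors, under the same quarter-rate assumption as the DDR3 example of Section 2.2.2; only the figures quoted above are used.

ddr_bw_gbps  = 800e6 * 2 * 64 / 1e9       # ~102.4 Gb/s from a 64-bit, 800 MHz DDR3
fabric_mhz   = 800 / 4                    # quarter rate: 200 MHz link clock needed
width_needed = ddr_bw_gbps * 1e9 / (fabric_mhz * 1e6)   # 512 bits

print(width_needed / 128)            # 4.0  -> 4x wider than the 128-bit CONNECT NoC
print(round(fabric_mhz / 113, 1))    # ~1.8 -> roughly 2x faster than ~113 MHz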

Early Work

In 2002, Marescaux et al. were the first, to the best of our knowledge, to propose adding a hard NoC to FPGAs [109]. Marescaux et al. leveraged FPGAs to implement an extensible processor that can dynamically configure and use custom hardware accelerators (for tasks such as encryption or image compression) [109]. They used a hard NoC to manage communication between the processor and the dynamically reconfigurable hardware accelerators. The hard NoC uses wormhole switching to reduce the amount of required buffering, two VCs to avoid deadlock, and standard interfaces to the accelerators called "net cells". For their prototype, a 16-bit, 4-node hard NoC occupied 34.8% of their FPGA. In 2003, Marescaux et al. refined their system, and proposed including three hard NoCs (for configuration, data and control) to properly synchronize messages between the operating system and accelerators [108].

Figure 2.13: Early work proposed a hard NoC that connects both the core and I/O of FPGAs (from [71]).

Marescaux's early work attempted to create a very specific hard NoC that only works for their reconfigurable processor. Because of FPGA size at the time, the overhead of a small NoC that only supports dynamically-reconfigurable systems – a niche application of FPGAs – was prohibitively expensive. Hecht et al. addressed these shortcomings by extending the NoC to work both for dynamic reconfiguration and as a general system-level interconnect [71]. They assert that "...a packet or message based inter-module communication with common interfaces improves design reuse and facilitates a building-block-based design especially in the case of network processing." [71]. Figure 2.13 shows their proposed NoC that connects both the FPGA's core and I/Os. After presenting the idea of a hard NoC, Hecht et al. presented a SystemC simulation model of a hard NoC to enable future research on the topic [71]. In their 2007 paper, Gindin et al. propose a semi-hard NoC with only part of the router implemented in hard logic, while the network interface in charge of routing was left soft to maximize reconfigurability [66]. Additionally, they propose dividing the FPGA into physical tiles, which either consist of "configurable" LUTs or "fixed" hard blocks, but not both together – this was mainly done to break down the compilation of a large design into smaller manageable pieces. Gindin et al. only discussed the vision of their hard NoC without any details of the architecture, then focused their paper on implementing a routing algorithm that is suited to an FPGA NoC [66].

NoC to Unify Configuration and Data

In 2008, Goossens et al. proposed to radically change the FPGA architecture by adding a hard NoC that is used both for configuration and data transport [69]. They aim to improve both global communication and dynamic partial reconfiguration by using the same NoC. To quantify the overall overhead of this NoC, the authors assume each 1000 LUTs – a reasonable IP module size – are connected to an NoC router, in which case a hard NoC would occupy 14% of an FPGA's area [69]. Additionally, a 32-bit hard NoC router provided 149× better performance per area compared to a soft implementation, thus motivating the need for a hard implementation [69]. In their paper, Goossens et al. detail how their NoC configures the FPGA and handles data movement; for example, they implement latency guarantees by using "...a virtual-circuit TDMA scheme" [69]. However, the impact of the proposed radical change to the FPGA fabric is unclear from their analysis. How does the new FPGA holistically compare to a conventional FPGA? Additionally, the authors did not discuss the CAD system for their proposed FPGA.

The complicated coupling of configuration and data movement is likely to make both the creation and use of a CAD system very difficult. However, one of the propositions set forth in Goossens' paper was adopted by FPGA vendors eight years later – configuration in Altera's Stratix 10 FPGAs is now done using a dedicated NoC [104].

Figure 2.14: The CoRAM architecture uses an NoC to abstract communication to on-chip and off-chip memory (from [46]).

Time-Multiplexed FPGA Wiring for NoCs

Francis and Moore propose changing the traditional FPGA interconnect architecture to better suit the implementation of hard NoCs. Their key observation is that a hard NoC will run faster than the FPGA's soft logic; consequently, they propose time-multiplexing the FPGA's interconnect to run as fast as hard blocks [64, 65]. In their first paper, Francis and Moore focus on changing the traditional FPGA interconnect to include time-multiplexed wiring, and they develop a scheduling algorithm to statically schedule a design's wires in time slots on a physical time-multiplexed wire [65]. By doing so, they were able to reduce the demand for FPGA interconnect wiring by ~70–80%. However, their results are likely optimistic due to unrealistic circuit assumptions; for example, they assumed that the time-multiplexed FPGA interconnect (at 90 nm) is bidirectional and can run at 2 GHz [65]. In their second paper, Francis and Moore propose a hard NoC in which the router connects through time-multiplexed wiring [64]. They compare the area of hard and soft NoCs implemented on both conventional and time-multiplexed FPGAs and find that 32-bit NoCs occupy less than 8% of an FPGA's area, with hard NoCs ~4× smaller than soft ones [64]. While the results seem promising, their limited scope (only 8-bit and 32-bit NoCs were studied) and unclear methodology (the paper did not state how the area numbers were computed) make it difficult to draw detailed conclusions. However, Francis and Moore's work was the first to use time-multiplexing to leverage the speed gap between a hard NoC and the slower FPGA fabric.

CoRAM

In their 2011 paper, Chung et al. used NoCs to abstract memory access on FPGAs [46]. Their goal was to create a memory architecture (called CoRAM) such that FPGA applications can easily access memory words through standard "load" and "store" commands without knowledge of the underlying memory organization and hierarchy. This would greatly ease the design of FPGA computing applications since it automatically creates the memory hierarchy for an application and presents a simple interface, thus relieving the burden of memory management from the FPGA application developer. Figure 2.14 shows the envisioned architecture of CoRAM [46]. An NoC connects both external and on-chip memory resources, wherein a control unit at each router executes memory operations. The NoC transfers memory requests from the control units to the memory resources, and sends back the correct data response [46]. Chung et al. advocate the use of a hard NoC for its efficiency and performance advantages over soft NoCs [46]. In later work, Chung et al. prototype the CoRAM memory architecture using hard NoCs and quantify its area overhead at 1.7% of a modern FPGA [47].

2.4 Summary

This chapter summarized some of the prior work relating to interconnect research in VLSI design. We started by investigating how VLSI circuit design progressed from a logic-centric methodology to a communication-centric one over the past few decades. The increase in complexity and size of VLSI systems spelled the end of the scaling of long chip-wide wires. This has ultimately led to the separate design of circuit behavior and communication, where the latter is often implemented as an NoC. Our investigation of NoCs informed the NoC architectural choices and evaluation methodologies that are used later in this thesis. We found that VC packet-switched routers are the most high-performance and versatile in dealing with different applications. We also outlined existing NoC modeling and CAD tools, one of which (Booksim) we used in simulating our NoC. A survey of NoC-versus-bus comparisons informed the methodology and metrics used when comparing an embedded NoC to soft buses. The second part of this chapter focused on FPGA interconnection. We emphasized that the FPGA interconnect has not changed significantly, even though the FPGA's logic has been constantly upgraded over the past two decades. We then explained the current system-interconnection challenges of FPGAs and motivated the need for more modular system-level design. Finally, we focused on FPGA-based NoCs, where we provided a summary of soft NoCs, and then delved into the details of hard NoC proposals on FPGAs. Prior work on hard NoCs on FPGAs was scarce and often lacked sufficient architectural and evaluation details – this is part of the motivation for the work in this thesis. We found that, like ASICs, system-level interconnection on FPGAs could benefit from an NoC. In the remainder of this thesis, we propose embedding NoCs on FPGAs, and we investigate the architecture, applications and CAD of such an embedded NoC.

Part I

Architecture

Table of Contents

3 Router Microarchitecture
  3.1 Routers
  3.2 Links

4 Methodology
  4.1 Routers
  4.2 Links

5 NoC Component Analysis
  5.1 Routers
  5.2 Links

6 Embedded NoC Options
  6.1 Soft NoC
  6.2 Mixed NoCs: Hard Routers and Soft Links
  6.3 Hard NoCs: Hard Routers and Hard Links
  6.4 System-Level Power Analysis
  6.5 Comparing NoCs and FPGA Interconnect
  6.6 Summary of Mixed and Hard NoCs

7 Proposed Hard NoC
  7.1 Hard or Mixed?
  7.2 Design for I/O Bandwidth
  7.3 NoC Design for 28-nm FPGAs

Chapter 3

Router Microarchitecture

Contents
3.1 Routers
  3.1.1 Input Module
  3.1.2 Crossbar
  3.1.3 Virtual Channel Allocator
  3.1.4 Switch Allocator
  3.1.5 Output Module
3.2 Links

Our first step in evaluating NoC use on FPGAs is to determine what parts of the NoC should be hard (implemented in dedicated silicon) versus soft (implemented from the existing FPGA fabric). To that end, we study hard and soft NoC implementations on FPGAs to understand and quantify their efficiency versus flexibility trade-off. In doing so, we need to evaluate different NoC architectural options to find a suitable implementation for FPGAs – this requires hardware design of the NoC routers. Instead of designing the hardware ourselves (a research project in itself), we use an open-source state-of-the-art implementation of NoC routers developed at Stanford University in the Computer Architecture Group [54]. The Stanford router is full-featured and parameterizable, and supports different implementation options for its components. We can vary the NoC link width, number of VCs, buffer depth per VC and number of ports among other important router parameters. Additionally, the router includes multiple implementation variations for its subcomponents, such as different types of allocators, different buffer organization schemes and different crossbar implementation options. We use this chapter to present and justify the implementation options and parameters that we chose. Additionally, we explain the microarchitecture of each router subcomponent as these details are important for understanding the router's operation, and the design tradeoffs between their hard and soft implementations.

3.1 Routers

We use a parametrized open-source state-of-the-art VC router [54]. We focus on this full-featured router for two reasons.


First, as FPGAs increase in capacity and hence contain larger applications, it will often be necessary to traverse many network nodes to communicate across the chip, and consequently we expect routers with low latency, such as the chosen router, to be important. As we show in Chapter 5, this router can also achieve high clock frequencies, helping it keep up with the throughput demands of the high-bandwidth I/O interfaces on modern FPGAs. Second, since our focus is quantitatively examining the speed and area of hard and soft implementations of a wide variety of NoC components, use of a more full-featured router with more components yields a more thorough study; simpler routers would use a subset of the components we study. For example, a router that does not support VCs will not contain a VC allocator, but it is included in our study for completeness.

Figure 3.1: A VC router with 5 ports, 2 VCs and a 5-flit input buffer per VC.

Instead of sacrificing NoC performance to adapt routers to FPGAs, we focus on using a complex but feature-rich router because of the following performance and flexibility advantages:

1. VCs: Other than their performance benefit, we believe that the flexibility of VCs will be beneficial for a reconfigurable platform such as FPGAs. We leverage VCs to adapt our NoC to FPGA design styles in Chapter 8.

2. Low latency: Many of the optimizations in this router ensure that it has very low latency, which is desirable on FPGAs where the designer typically has fine-grained control over every cycle of latency in their design.

3. High frequency: Other optimizations in this router include carefully-written hardware code that allows it to operate at a high frequency, improving both bandwidth and overall latency; this high frequency is obtained with a fairly short routing pipeline.

The router operates in a 3-stage pipeline that can be reduced to 2 stages if speculative VC allocation succeeds [54]. Ingress flits are stored in the input buffers and immediately bid for VC allocation; this is followed by switch allocation and switch traversal. When speculation is successful, both VC and switch allocation occur in parallel, saving one clock cycle of latency. Lookahead routing is done in parallel with switch allocation and is appended to the head flit immediately before it traverses the crossbar switch. Finally, flits are registered at the output modules and then traverse inter-router links.


Figure 3.2: Input module for one router port and “V” virtual channels. Not all connections are shown.

Figure 3.1 shows a block diagram of the router. For each of the presented router components there are different implementation variants. We restrict our discussion to architectures that satisfy current NoC cost limitations and bandwidth demands; that is, current implementation norms. For further reading on NoC microarchitecture, the following references provide an in-depth discussion of implementation variations for each component [27, 52, 59, 112, 139].

3.1.1 Input Module

The main function of the input module is to buffer incoming flits until routing and resource allocation are complete. Depending on the VC identifier of the packet, it is stored in a different part of the input buffer. Routing information, already computed by the preceding router hop, is decoded and forwarded to the VC and switch allocators to bid for VCs and switching resources. The flit remains in the buffer until both a VC is allocated and the switch is free for traversal, at which point route lookahead information is attached to the head flit and it is ejected from the input module onto the switch. Since route computation is done one hop earlier and in parallel with switch allocation, it does not impede router latency and effectively removes this step from the router pipeline [112]. Moreover, the low-overhead route computation logic is replicated for each VC to compute the route for all input ports simultaneously and support the queuing of multiple packets per VC [25]. A two-phase routing algorithm, known as Valiant's, routes first to a random intermediate node and then to the destination node to improve load balancing [143]. The algorithm used to route each of the two phases is dimension-ordered routing, which routes in each dimension sequentially and deterministically [52]. Dual-ported memory implements the input buffer. Internally, it is organized as a statically allocated multi-queue (SAMQ) buffer which divides the memory into equal portions for each VC [139]. Memory width is always the same as flit width to allow reading and writing flits in one cycle [25]. In this implementation, the memory buffer has one write port and two read ports to allow the queuing of more than one packet in a VC buffer.


This is required for simultaneous loading of a packet's tail flit and the next packet's head flit to update the destination port for the new route. Without this optimization, a bubble is introduced between packets.

Figure 3.2 shows a block diagram of the input module used in this study. The VC control units include the route-computation logic and state registers to keep track of the input VC status, the destination output port and the assigned output VC [25]. The SAMQ controller governs the read and write ports of the flit buffer. It selects the write and read addresses depending on the input VC and the granted output VC from switch allocation, respectively. A backpressure control unit tracks buffer space availability per VC and transmits credits to upstream router ports on dedicated flow control links.
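Before moving on to the crossbar, the Python sketch below models the routing policy just described: Valiant's two-phase routing with dimension-ordered (XY) routing inside each phase. It is an illustrative behavioural model only, not the router's actual lookahead route-computation logic, and the mesh coordinates in the example are hypothetical.

import random

def xy_next_hop(cur, dst):
    """Dimension-ordered routing: correct the X coordinate first, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return None  # arrived at the destination

def xy_path(src, dst):
    path, cur = [src], src
    while (nxt := xy_next_hop(cur, dst)) is not None:
        path.append(nxt)
        cur = nxt
    return path

def valiant_path(src, dst, mesh_w, mesh_h, rng=random):
    """Phase 1: route to a random intermediate node; phase 2: route to dst."""
    mid = (rng.randrange(mesh_w), rng.randrange(mesh_h))
    return xy_path(src, mid) + xy_path(mid, dst)[1:]

print(valiant_path((0, 0), (3, 2), mesh_w=4, mesh_h=4))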

3.1.2 Crossbar

The crossbar is multiplexer-based as depicted in Figure 3.3, rather than a tri-state buffer crossbar. Modern FPGAs can only implement crossbars with multiplexers, and modern ASIC crossbars are also usually implemented this way. In the best-case scenario, five flits may traverse the crossbar simultaneously if each of them is destined for a different output port. However, this rarely occurs, thus requiring the queuing of flits in the input module buffers until the output port is free and the flit can proceed. We found the ASIC crossbar area to be gate limited and not wire limited for the range of parameters considered here; other recent work has shown that crossbars as large as 128 ports are also gate limited [118].

Figure 3.3: A 5-port multiplexer-based crossbar switch.

3.1.3 Virtual Channel Allocator

VC allocation is performed at packet granularity. Like route computation, the body flits inherit the decision made for the head flit. The route computation unit selects an output port for a packet, and any available VC on that output port is a candidate VC for the packet. Figure 3.4 shows the separable input-first VC allocator used in our study. The first stage filters requests for output VCs by selecting a single output-VC request for each input VC. This ensures that each input VC will be allocated a maximum of one output VC, which is necessary for correctness. The second stage ensures that each output VC is not over-allocated. For each output VC, where P is the number of ports and V is the number of VCs per port, the second stage takes P×V requests from all input VCs and grants one request. A virtual connection is established from the granted input VC to the output VC for the duration of one packet.


Figure 3.4: Separable input-first VC allocator with “V” virtual channels and “P” input/output ports.

Arbiters grant requests in a round-robin fashion, which ensures that the most-recently granted requester has the lowest priority in the next round of arbitration [52]. For a large number of inputs, arbiters have long combinational delays. To circumvent this limitation, allocators are organized hierarchically as tree arbiters, trading area for delay [27].
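The behavioural sketch below illustrates this separable input-first allocation: a first stage of round-robin arbiters picks at most one output-VC request per input VC, and a second stage of round-robin arbiters resolves contention for each output VC. It models the arbitration policy only (the hardware performs both stages combinationally in a single cycle), and the small example at the end uses arbitrary VC counts.

class RoundRobinArbiter:
    """The most recently granted requester gets lowest priority next round."""
    def __init__(self, n_requesters):
        self.n = n_requesters
        self.last = n_requesters - 1          # so index 0 has priority first

    def grant(self, requests):
        """requests: list of n_requesters bools; returns winner index or None."""
        for i in range(1, self.n + 1):
            idx = (self.last + i) % self.n    # start just after the last winner
            if requests[idx]:
                self.last = idx
                return idx
        return None

def separable_input_first(requests, stage1, stage2):
    """requests[i]: output VCs wanted by input VC i (empty if none waiting).
    stage1[i]: arbiter over output VCs, one per input VC (first stage).
    stage2[o]: arbiter over input VCs, one per output VC (second stage)."""
    n_out = len(stage2)
    # Stage 1: each input VC keeps at most one of its output-VC requests.
    picks = []
    for i, wanted in enumerate(requests):
        req_vec = [o in wanted for o in range(n_out)]
        picks.append(stage1[i].grant(req_vec) if any(req_vec) else None)
    # Stage 2: each output VC grants at most one of the surviving requests.
    grants = {}                               # input VC index -> output VC index
    for o, arb in enumerate(stage2):
        winner = arb.grant([p == o for p in picks])
        if winner is not None:
            grants[winner] = o
    return grants

in_arbs  = [RoundRobinArbiter(2) for _ in range(4)]   # 4 input VCs, 2 output VCs
out_arbs = [RoundRobinArbiter(4) for _ in range(2)]
print(separable_input_first([[0], [0, 1], [], [1]], in_arbs, out_arbs))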

3.1.4 Switch Allocator

A switch allocator matches requests from P×V input VCs to P crossbar ports. Unlike routing and VC allocation, this is done at flit granularity in a wormhole-switched router [52]. Figure 3.5 shows a separable input-first switch allocator. In the first stage of switch allocation, VCs in a single input port compete among themselves for a crossbar input port. The granted VC forwards its requests to the second stage of arbitration, which selects between all input ports bidding for each of the output ports. This ensures that only one VC can attempt to access the crossbar from each input port, and that only one crossbar input can connect to a crossbar output at a given time. Speculative switch allocation can reduce the router latency from 3 cycles to 2 by performing VC and switch allocation together, provided that a VC is allocated in this cycle [119]. Architecturally, this duplicates the switch allocator; one instance handles non-speculative requests and the other handles speculative ones. Priority is given to non-speculative requests using reduction logic and selection circuitry [27].


Figure 3.5: Separable input-first switch allocator with “V” virtual channels and “P” input/output ports.

3.1.5 Output Module

The crossbar output can be connected directly to the outgoing wires and the downstream routers. However, to improve clock frequency a pipeline stage is placed at the crossbar outputs [25]. Furthermore, the output registers are replicated per VC to buffer an additional flit before proceeding downstream, allowing one extra flit to traverse the crossbar before receiving credits from the downstream router.

3.2 Links

Routers are connected to each other through unidirectional wires from a router output port to its downstream router input port. In some NoCs, these links can be pipelined to improve their frequency [52], but we do not opt for that option because the router typically contains the critical path delay for the parameters we investigate [2, 5]. In ASIC implementations, these point-to-point links are simply implemented as metal wires in the ASIC chip's metal stack. However, on FPGAs connections are made using the FPGA's programmable interconnect, which contains different-size wire segments and multiplexers between any two points on the FPGA. We investigate both ASIC and FPGA links as two implementation candidates for NoC links. While soft FPGA links are more configurable, they are expected to be less efficient compared to hard ASIC links – we explore this tradeoff in Chapter 5.

Chapter 4

Methodology

Contents
4.1 Routers
  4.1.1 FPGA CAD Flow
  4.1.2 ASIC CAD Flow
  4.1.3 Power Simulation
  4.1.4 Methodology Verification
4.2 Links
  4.2.1 FPGA CAD Flow
  4.2.2 ASIC CAD Flow

Our goal is to study embedded NoCs on FPGAs. The first step is to compare embedded hard NoCs to soft NoCs built from the existing FPGA fabric. For each NoC subcomponent, we investigate the hard (ASIC) and soft (FPGA) implementation to inform the decision of whether we should embed any of the NoC components in hard logic. To that end, we quantify the area, speed and power advantages of hardening each of the NoC components. This chapter presents the methodology for measuring these efficiency metrics on both the ASIC (hard implementation) and FPGA (soft implementation) platforms. Hard versus soft comparisons have been performed previously on a range of different circuits [89]. We follow the methodology of that previous work, and the common practices used by digital designers, to find realistic area and delay estimates from the ASIC and FPGA CAD flows. Finally, we verify our methodology against the area and frequency gap of a benchmark circuit used in previous work as an additional sanity check [89].

4.1 Routers

We perform a detailed component-level analysis of the NoC by breaking down the router into its subcomponents and measuring the area, delay and power. We also vary the four main NoC parameters to understand how efficiency and performance of the NoC changes with these parameters. These parameters are width, number of ports per router, number of VCs, and buffer depth in the router input ports. We list the baseline values of these parameters, and their ranges, in Table 4.1. We choose parameters that correspond to reasonable implementations of NoCs in previous work [52].


Table 4.1: Baseline and range of NoC parameters for our experiments.

            Width     Num. of Ports   Num. of VCs   Buffer Depth
Baseline    32        5               2             5/VC (10)
Range       16–256    2–15            1–10          5–65/VC

The router is implemented both on the largest Altera Stratix III FPGA (EP3SL340) and the 65 nm ASIC process technology from Taiwan Semiconductor Manufacturing Company (TSMC). This allows a direct FPGA vs. ASIC comparison since Stratix III devices are manufactured in the same 65 nm TSMC process technology [15]. Moreover, the area for Stratix III resources is publicly available which allows a direct head-to-head comparison [145].

4.1.1 FPGA CAD Flow

Table 4.2 shows the area, including interconnect, of the various FPGA blocks. We use these block areas to find the equivalent silicon area of NoC components implemented using soft logic on an FPGA.

Table 4.2: Estimated FPGA Resource Usage Area [145]

Resource           Relative Area (LAB)   Tile Area (mm²)
LAB                1                     0.0221
ALM                0.10                  0.0022
ALUT (half-ALM)    0.05                  0.0011
BRAM – 9 kbit      2.87                  0.0635
BRAM – 144 kbit    26.7                  0.5897
DSP Block          11.9                  0.2623

To implement router components on the FPGA, we use Altera Quartus II v11.1 with the highest optimization options. This excludes physical synthesis, which reduced critical path delay by 5% and increased area by 15% on the router components that we analyze. We set an impossible timing constraint of 1 GHz to force the tools to optimize for timing aggressively and report the maximum achievable frequency. While 1 GHz is clearly not a realizable constraint, we found it achieved timing and area optimization results comparable to those obtained with a timing constraint that is difficult, but just barely achievable. Clock jitter and on-die variation are modeled using the "derive clock uncertainty" command, which applies clock uncertainty constraints based on knowledge of the clock tree [132]. All the circuit I/Os, except the clock, are tied to LUTs using the "virtual pin" option. This mimics the actual placement of an NoC router, and avoids any placement, routing or timing analysis bias that could result from using actual FPGA I/O pins. For example, the placement of the NoC component could be highly distorted by a large number of connections to I/Os located around the device periphery, when in a real system the inputs and outputs of the component would be within the fabric. Additionally, the use of virtual pins allows the compilation of designs with more I/Os than are physically available on the FPGA. Resource utilization is used to calculate the occupied FPGA silicon area by multiplying the used resource count by its physical area in Table 4.2. Simply counting the used logic array blocks (LABs) or adaptive logic modules (ALMs) can overestimate the area required, as many LABs and ALMs are only partially occupied, and could accept more logic in a very full design. Instead, we use the post-routing area utilization from the resource section of the fitter report, which accounts for the porosity in packing, thereby giving a realistic area estimate for a highly utilized FPGA. Note that the LUTs used for "virtual pins" are subtracted out. The fastest FPGA speed grade for Stratix III devices corresponds to typical transistors, whereas the slowest FPGA speed grade matches the worst-case transistors of a process. For the best comparison, we use the fastest FPGA speed grade and the typical transistor model for the ASIC tools. For purely combinational modules, such as the crossbar, registers are placed on the inputs and outputs to force timing analysis. Maximum delay is extracted from the timing reports using the most pessimistic (slow, 85 °C) timing model, assuming a 1.1 V power supply.
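The area bookkeeping described above reduces to a weighted sum of resource counts and the per-block areas of Table 4.2. The sketch below shows that calculation; the resource counts in the example are placeholders, not results from this thesis.

TILE_AREA_MM2 = {            # per-block silicon areas from Table 4.2
    "LAB": 0.0221,
    "ALM": 0.0022,
    "ALUT": 0.0011,          # half-ALM
    "BRAM_9k": 0.0635,
    "BRAM_144k": 0.5897,
    "DSP": 0.2623,
}

def fpga_silicon_area_mm2(resource_counts: dict) -> float:
    """Equivalent silicon area of a soft design, from post-routing utilization."""
    return sum(TILE_AREA_MM2[r] * n for r, n in resource_counts.items())

# hypothetical post-routing utilization of one router instance
print(fpga_silicon_area_mm2({"ALM": 1500, "BRAM_9k": 4}))   # ~3.55 mm^2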

4.1.2 ASIC CAD Flow

Synopsys Design Compiler vF-2011.09-SP4 is used for synthesis, and for area and delay estimation. The general-purpose typical process library is used with standard threshold voltage and a 0.9 V supply voltage. Unlike the FPGA CAD flow, timing constraints and optimizations impact the ASIC area dramatically, as ASIC timing optimizations entail standard cell upsizing and buffer insertion whereas FPGA subcircuits are fixed. For this reason, a two-step compilation procedure, described below, is used to reach a realistic point in the large tradeoff space between area and delay. We perform compilation using a top-down flow, with "Ultra-effort" optimizations for both area and delay. This turns on all optimization options in the synthesis algorithm and accurately predicts post-layout critical path delay and area using topographical technology [138]. All registers are replaced with their scan-enabled equivalent to allow the necessary post-manufacturing testing for ASICs. We use a conservative wire model from TSMC in which the capacitance and resistance per unit wire length are used together with a fanout-dependent length model to estimate the wire delay. Additionally, these parameters are automatically adjusted based on the area of the design hierarchy spanned by each net. In step one, we perform an ultra-effort compilation with an impossible 0 ns timing constraint and extract the negative slack of the critical path from the timing report. Area numbers are bloated when trying to satisfy the impossible timing constraint and are discarded from this compilation. The negative slack from step one is used as the target clock period in step two of the ASIC compilation. This provides a reasonable target for the CAD tools and results in realistic cell upsizing, logic duplication and buffer insertion, and hence realistic area numbers. With the clock period adjusted, the design is recompiled and the implementation area and delay are extracted from the synthesis reports. Note that any positive or negative slack in this step is also added to the critical path delay measurement. Generally, ASICs are not routable if they are 100% filled with logic cells. To account for whitespace, buffers inserted during placement and routing, and wiring, we assume a 60% rule-of-thumb fill factor and so inflate the area results by 66.7%. Fill factors as low as 10% and as high as 90% have been used in the literature [89, 118], but we chose 60% to model the typical case after conversations with ASIC design engineers.
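The sketch below captures the flow logic of this two-step compilation and the fill-factor adjustment. run_synthesis() stands in for the actual dc_shell invocation and is not a real API; the report values returned by the stub are invented for illustration.

FILL_FACTOR = 0.60   # assume 60% of the placed-and-routed die is standard cells

def two_step_compile(run_synthesis):
    # Step 1: impossible 0 ns constraint; keep only the achieved critical path.
    step1 = run_synthesis(target_clock_ns=0.0)
    achievable_ns = 0.0 - step1["slack_ns"]       # negative slack -> real delay

    # Step 2: recompile against the achievable period for realistic sizing.
    step2 = run_synthesis(target_clock_ns=achievable_ns)
    delay_ns = achievable_ns - step2["slack_ns"]  # fold any remaining slack in

    # Inflate cell area for whitespace, routing and buffers added later.
    area_mm2 = step2["cell_area_mm2"] / FILL_FACTOR
    return delay_ns, area_mm2

def fake_run_synthesis(target_clock_ns):
    # stand-in for dc_shell; returns made-up report values for illustration
    return {"slack_ns": -1.2 if target_clock_ns == 0.0 else 0.05,
            "cell_area_mm2": 0.030}

print(two_step_compile(fake_run_synthesis))   # roughly (1.15 ns, 0.05 mm^2)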

4.1.3 Power Simulation

We generate the post-layout gate-level netlist from the FPGA CAD tools (Altera Quartus II v11.1) and the post-synthesis gate-level netlist from the ASIC CAD tools (Synopsys Design Compiler vF-2011.09-SP4), as outlined above.

For accurate dynamic power estimation, we first simulate these gate-level netlists with a testbench to extract realistic toggle rates for each synthesized block in the netlists. The testbench consists of data packet generators connected to all router inputs and flit sinks at each router output. The packet generator understands back-pressure signals from the router, so it stops sending flits if the input buffer is full. We attempt to inject random flits every cycle into all inputs and we accept flits every cycle from all outputs to maximize data contention in the router, thus modeling an upper bound of router power operating under worst-case synthetic traffic. We perform a timing simulation of the router in Modelsim for 10,000 cycles and record the resulting signal switching activity in a value change dump (VCD) file. Note that we disregard the first and last 200 cycles in the testbench so that we are only recording the toggle rates for the router at steady state, excluding the warm-up and cool-down periods. This simulation is very accurate for two main reasons. First, by simulating the gate-level netlist we obtain an individual toggle rate for each implemented circuit block. Second, we perform a timing simulation that takes all the delays of logic and interconnect into account; consequently the toggle rates are highly accurate and include realistic glitching. It is then a simple task for power analysis tools to measure the power of each synthesized block (LUTs, interconnect multiplexers or standard cells) by using their power-aware libraries and the simulated toggle rates on each block input and output. We use the extracted toggle rates to simulate dynamic power consumption, per router component, for both the FPGA and ASIC implementations using their respective design tools: Altera's PowerPlay Power Analyzer for the FPGA and Synopsys Power Compiler for the ASIC. The nominal supply voltage for the TSMC 65 nm technology library is 0.9 V compared to 1.1 V for the Stratix III FPGA. For that reason, we scale the ASIC dynamic power quadratically (by multiplying by $1.1^2/0.9^2$) when computing FPGA-to-ASIC power ratios. In all other power results, we explicitly state which voltage we are using.
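The quadratic scaling follows from the usual first-order dependence of dynamic power on supply voltage; written out, the adjustment applied to the ASIC numbers is:

\[
  P_{\mathrm{dyn}} \propto V_{DD}^{2}
  \quad\Longrightarrow\quad
  P_{\mathrm{ASIC}}^{1.1\,\mathrm{V}}
    = \frac{1.1^{2}}{0.9^{2}}\; P_{\mathrm{ASIC}}^{0.9\,\mathrm{V}}
    \approx 1.49\, P_{\mathrm{ASIC}}^{0.9\,\mathrm{V}}
\]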

4.1.4 Methodology Verification

To verify the methodology, we compare the results obtained with our methodology to those of Kuon and Rose on their largest benchmark, raytracer [89]. As shown in Table 4.3, the area and delay ratios are quite close; we expect some difference as our results are from a 65 nm process while theirs are from 90 nm.

Table 4.3: Raytracer area and delay ratios.

                  Kuon and Rose [89]   This Work
FPGA Device       Stratix II           Stratix III
ASIC Technology   90 nm                65 nm
Area Ratio        26.0                 25.6
Delay Ratio       3.5                  4.1

Note that we do not verify power ratios against previous work. This is mainly because we do not have access to the simulation testbench vectors which are necessary to obtain accurate toggle rates for the Raytracer benchmark. Furthermore, we believe that area and delay ratios are enough to improve confidence in our methodology, since power and area are typically highly correlated [89].

4.2 Links

NoC links consist of wires between router input ports and output ports. The two parameters that affect links are data width and distance between routers. We investigate wires on both the FPGA and ASIC platforms to quantify their efficiency and performance. In this section, we outline the methodology we followed in designing and evaluating these wires.

4.2.1 FPGA CAD Flow

Soft NoC links are implemented using the prefabricated FPGA “soft” interconnect. On Stratix III FPGAs, there are four wire types: vertical length four (C4) and length 12 (C12), and horizontal length four (R4) and length 20 (R20). We connect two registers using a single wire segment to measure the delay and dynamic power of this wire segment. Next, we investigate different connection lengths by connecting wire segments of the same type in series and measuring delay and power. Registers are manually placed using location constraints to define the wire endpoints, and the connection between the registers is manually routed by specifying exactly which wires are used in a routing constraints file (RCF). Wire delay is measured using the most pessimistic (slow, 85 °C) timing model. The dynamic power consumed by the wires is linearly proportional to the toggle rate: 0% means that the wire holds a constant value, while 100% means data toggles on each positive clock edge. For each simulated router instance, we extract the toggle rates at its inputs and outputs and use them to simulate the wire power. This ensures that the data toggle rates on the NoC links correctly match the router inputs and outputs to which the links are connected.
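For illustration, link power can be scaled from a single reference measurement, assuming the linear dependence on toggle rate described above; the reference power and toggle rates below are placeholder values.

def link_power_uw(p_ref_uw, ref_toggle, port_toggle):
    """Dynamic wire power is linear in toggle rate: 0% means a constant
    value, 100% means a transition on every positive clock edge."""
    return p_ref_uw * (port_toggle / ref_toggle)

# e.g. a wire measured at 20 uW with a 12.5% toggle rate, driven by a router
# port whose gate-level simulation showed a 15% toggle rate:
p = link_power_uw(20.0, 0.125, 0.15)  # 24 uW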

4.2.2 ASIC CAD Flow

We use TSMC’s metal properties to simulate lumped element models of wires, allowing us to measure the delay and power of ASIC NoC links. Metal resistance and capacitance are provided with the TSMC 65 nm technology library for each possible wire width and spacing on each metal layer. Metal layers are divided into three groups based on the metal thickness: local, intermediate and global.

1. Local wires: the lowest metal layer, used to connect transistors.

2. Intermediate wires: the middle 6 metal layers, used to construct buses and other short/long connections on chip.

3. Global wires: the fastest, widest wires on chip, available on the top 2 metal layers and used for clock networks or other fast chip-wide connections.

In our measurements, we use the intermediate wires because, unlike the alternatives, they are both abundant and reasonably fast. We calculate the resistance and capacitance of a wire using intermediate-metal-layer data from TSMC’s 65 nm technology library. We use Synopsys HSPICE vF-2011.09.SP1 to simulate a lumped element (π) model of hard wires [123]. Propagation delay is measured for both rising and falling edges of a square pulse signal, and the worst case is taken to represent the speed of this wire. Dynamic power is computed using the equation $P = \frac{1}{T}\int_0^T V\,I(t)\,dt$ and is scaled linearly to the routers’ toggle rates.

We design and optimize the ASIC interconnect wires to reach reasonably low delay and power comparable to FPGA wires by choosing:

Figure 4.1: RC π wire model of a wire with rebuffering drivers.

1. Wire width and spacing: This determines the parasitic capacitance and resistance of a wire segment (C and R in Figure 4.1), which in turn determine its delay and power dissipation.

2. Drive strength: The channel width of the transistors used in the interconnect driver; this affects speed and power.

3. Rebuffering: How often drivers are placed on a long wire.

Using the π wire model, we conducted a series of experiments using HSPICE to optimize our ASIC wire design. To match the FPGA experiments, the supply voltage was set to 1.1 V and the simulation temperature to 85 °C. We also repeated our analysis at 0.9 V for the low-power version of our hard NoC. We reached a reasonable design point with metal width and spacing of 0.6 µm, drive strength of 20-80× that of a minimum-width transistor (depending on total wire length) and rebuffering every 3 mm. If necessary, faster or lower power ASIC wires could be designed with further optimization or by using low-swing signaling techniques [53].
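As a rough illustration of how these choices interact, the sketch below estimates the delay and dynamic power of a rebuffered wire using an Elmore-style approximation of the π model in Figure 4.1. It is only a back-of-the-envelope stand-in for the HSPICE simulations: the per-mm resistance and capacitance and the driver resistance are placeholder values, not the TSMC data.

def wire_delay_ns(length_mm, r_ohm_per_mm, c_f_per_mm, r_driver_ohm, seg_mm=3.0):
    """Sum Elmore delays of pi segments, re-driven every seg_mm millimetres."""
    n_seg = max(1, round(length_mm / seg_mm))
    seg_len = length_mm / n_seg
    r_seg = r_ohm_per_mm * seg_len
    c_seg = c_f_per_mm * seg_len
    # each driver charges the full segment capacitance; the distributed
    # wire resistance adds roughly R*C/2 on top of that
    per_seg_s = 0.69 * (r_driver_ohm * c_seg + r_seg * c_seg / 2.0)
    return n_seg * per_seg_s * 1e9

def wire_dyn_power_uw(length_mm, c_f_per_mm, vdd_v, f_hz, toggle_rate):
    """P = toggle_rate * f * C_total * Vdd^2, ignoring driver capacitance."""
    return toggle_rate * f_hz * (c_f_per_mm * length_mm) * vdd_v**2 * 1e6

# e.g. a 2.5 mm link at 1.1 V, 50 MHz and a 15% toggle rate, with assumed
# R = 600 ohm/mm, C = 0.2 pF/mm and a 200 ohm driver:
d_ns = wire_delay_ns(2.5, 600.0, 0.2e-12, 200.0)
p_uw = wire_dyn_power_uw(2.5, 0.2e-12, 1.1, 50e6, 0.15)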

Chapter 5. NoC Component Analysis

Contents

5.1 Routers
    5.1.1 Area and Speed
    5.1.2 Dynamic Power
5.2 Links
    5.2.1 Silicon Area
    5.2.2 Metal Area
    5.2.3 Speed and Power

In this chapter we use the router microarchitecture from Chapter 3 and the methodology in Chapter 4 to perform a component-level analysis of NoC routers and links. We measure and compare the area, speed and power of routers (divided into 5 subcomponents) and links. We quantify the efficiency and performance gaps between hard and soft implementations to better understand the two options and to explore the design space of implementing NoCs on FPGAs.

5.1 Routers

We analyze the area, delay and power ratios of each router subcomponent in this section. We also combine our results to find overall router efficiency and performance differences between soft and hard implementations.

5.1.1 Area and Speed

Figures 5.1 and 5.2 show the FPGA/ASIC area and delay ratios for the router components as they vary with the four main router parameters: flit width (16-256), number of ports (2-15), number of VCs (1-10) and input buffer depth (5-65), as shown in Table 4.1. These results are summarized in Table 5.1, in which the minimum, maximum and geometric mean are given for each component. Additionally, we give the results for a complete VC router built out of those components. We choose a realistic range of router parameters, based on a study of the literature, such that the geometric average of the area or delay ratio is indicative of the FPGA-to-ASIC gap for an NoC that is likely to be constructed. On average, NoC routers use 30× less area and run 6× faster when embedded in hard logic compared to a soft implementation (see Table 5.1).

Table 5.1: Summary of FPGA/ASIC (soft/hard) router area and delay ratios.

Router Component     Area Ratio                    Delay Ratio
                     Min.    Max.    Geomean       Min.    Max.    Geomean
Input Module         8       36      17            2.2     4.0     2.9
Crossbar             57      169     85            3.3     6.9     4.4
VC Allocator         27      76      48            2.0     4.8     3.9
Switch Allocator     24      94      56            1.9     4.2     3.3
Output Module        30      47      39            3.1     3.7     3.4
Router               13      64      30            4.7     8.0     6.0

Figure 5.1: FPGA/ASIC (soft/hard) area ratios as a function of key router parameters.

Input Module

The input module consists of a memory buffer and logic for routing and control. To synthesize an efficient FPGA implementation, the memory buffer is modified to target the three variants of RAM on FPGAs: registers, lookup table random access memory (LUTRAM)¹ and BRAM. LUTRAM uses FPGA LUTs as small memory buffers, and BRAMs are dedicated hard memory blocks that support tens of kilobits.

¹ Stratix IV was used for LUTRAM experiments to avoid a bug in Stratix III LUTRAM.

Figure 5.2: FPGA/ASIC (soft/hard) delay ratios as a function of key router parameters.

Figure 5.3: FPGA silicon area of memory buffers implemented using three alternatives.

Both LUTRAM and BRAM can implement dual-ported memories (1r1w), and the second read port (2r1w) required for the input module is added by replicating the RAM module. Registers are more flexible and can construct multiple read ports by replicating the read port itself and not the storage registers. On ASICs, the memory buffer is always implemented using a 2D flip-flop array, which is the norm for building small memories.

Figure 5.3 shows the FPGA area of various buffers when implemented using the three alternatives mentioned. The minimum-area implementation is selected for the comparison against ASICs. In all the data points when varying the buffer depth, the BRAM-based implementation has the lowest area. In fact, the area remains constant since the 9-kbit BRAM can handle up to 256 memory words. LUTRAM is slightly less efficient than BRAM with shallow buffers, but the area increases rapidly whenever another LUTRAM is used to increase buffer depth. BRAMs and LUTRAMs have width limitations but can be grouped together to implement wider memories. This explains the linear area increase with width shown in Figure 5.3. When the memory buffers are built out of registers, there are no bits wasted. LUTRAM can only be instantiated in quantized steps of the LUTRAM size (640 bits); hence some bits can go unused. This is even more pronounced for BRAM, which can only be instantiated in steps of 9 kbits for the Stratix III architecture. Nevertheless, the bit density for a register-based memory buffer is 0.77 kbit/mm2 compared to 23 kbit/mm2 for a LUTRAM and 142 kbit/mm2 for a 9-kbit BRAM [145]. This means that a 9-kbit BRAM with only 16% of its bits used is just as area-efficient as a fully utilized LUTRAM on a Stratix III FPGA, explaining the lower BRAM area with very shallow buffers. Although prior work has gravitated towards the use of LUTRAM [117], when looking at it from a silicon perspective, the high density of BRAM makes it more area-efficient for most width×depth combinations. However, in architectures with deeper BRAM, such as Virtex 7, LUTRAM may be the more efficient alternative for shallow buffers.

As Table 5.1 shows, the input module has the lowest area and delay gaps of the presented components. The area gap varies from 8-36× with a geometric mean of 17×. The lower gap occurs when a deep buffer is used and the FPGA BRAM is well-utilized. Width has only a small effect on the input module area and delay ratios, but varying the number of VCs presents a more interesting result. The input module consists of both control logic, which is inefficient on FPGAs, and memory buffers implemented as compact hard blocks. As we vary the number of VCs in Figure 5.1, the FPGA implementation becomes twice as efficient between 1 and 6 VCs because we are able to pack more buffer space into the same BRAM module. However, as we increase the number of VCs further, the efficiency drops because the control logic for a large number of VCs becomes the dominant area component. The FPGA-to-ASIC delay ratio is 2.9× and is always limited by the logic component of the input module and not the fast BRAM memory component.
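The choice between the three buffer implementations can be approximated from the bit densities and block sizes quoted above. The sketch below considers only storage quantization; it ignores port-width limits and the read-port replication discussed in the text, so it is a simplified model rather than the measured data of Figure 5.3.

import math

KBIT_PER_MM2 = {"registers": 0.77, "lutram": 23.0, "bram": 142.0}  # Stratix III densities
BLOCK_BITS   = {"registers": 1, "lutram": 640, "bram": 9 * 1024}   # allocation quanta (assumed)

def buffer_area_mm2(width_bits, depth_words, impl):
    """Silicon area of one width x depth buffer, rounding storage up to whole blocks."""
    bits = width_bits * depth_words
    used_bits = math.ceil(bits / BLOCK_BITS[impl]) * BLOCK_BITS[impl]
    return (used_bits / 1024.0) / KBIT_PER_MM2[impl]

def cheapest(width_bits, depth_words):
    return min(KBIT_PER_MM2, key=lambda impl: buffer_area_mm2(width_bits, depth_words, impl))

# e.g. a 32-bit x 65-word buffer: a single 9-kbit BRAM is cheapest even though
# it holds only about a quarter of its capacity.
print(cheapest(32, 65))  # "bram"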

Crossbar

Crossbars show the largest area gap: a minimum of 57×, a maximum of 169× and a geometric average of 85×. It is worth noting that there is a 2× FPGA efficiency loss for crossbars with 10 or more ports. This is due to two causes. First, the required FPGA LUTs per multiplexer port increase faster than the ASIC gates per port. Second, there is an increased demand for interconnect ports at the logic module inputs, causing LUTs in that logic module to be unusable; the ratio of LUTs that are unused for this reason grows from 1% to 10% between 9 and 10 multiplexer ports. When the width is varied, however, we see very little variation between a 16-bit and a 256-bit wide crossbar; the variations in the width plot are due to better or worse mapping of different-size multiplexers onto the FPGA’s LUTs.

The crossbar delay gap grows significantly, from 3.3× to 6.9×, with increasing port count. This trend is due to the increase in FPGA area, which causes the multiplexers to be fragmented over multiple logic modules, thus extending the critical path. Overall, the average delay gap is 4.4× for the crossbar; the largest out of all the components.

The results show that crossbars are inefficient on FPGAs and their scaling behavior is also much worse than that of ASICs. This is a prime example of a circuit that would bring area and delay advantages if it were hardened on the FPGA. In this scenario, the crossbar can be overprovisioned with a large number of ports so that it can support different NoC organizations. If a small number of router ports are required but a large number of ports are available, the additional ports can be used towards crossbar speedup, which relieves crossbar traffic by allowing multiple VCs from the same input port to traverse the switch simultaneously. This also simplifies switch allocation [52].

Figure 5.4: FPGA (soft) router area composition by component. Starting from the bottom (red): Input module, crossbar, switch allocator, VC allocator and output module.

VC and Switch Allocators

Allocators are built out of arbiters, which consist of logic gates and some registers. Ideally the ratio of LUTs to registers should match the FPGA architecture; for Stratix III a 1:1 ratio would use the resources most efficiently. Deviation from this ratio means that some logic blocks will have either registers or LUTs used but not both. The unused part of the logic block is area overhead when compared to ASICs. Although there are other sources of FPGA inefficiencies, there is a direct correlation between the LUT-to-register ratio and the FPGA-to-ASIC area gap. For the VC allocator the average LUT-to-register ratio is 8:1 and the area gap is 48×, while the speculative switch allocator has an average LUT-to-register ratio of 20:1 and the area gap is higher, at approximately 56×. This difference between the two allocators is due to the selection logic, which is used in the speculative switch allocator and absent from the VC allocator.

Figure 5.5: ASIC (hard) router area composition by component. Starting from the bottom (red): Input module, crossbar, switch allocator, VC allocator and output module.

Allocator delay increases with circuit size for both hard and soft implementations; however, the delay rises more rapidly for the soft version. Consequently, the delay ratio of the allocators is proportional to the circuit size, and grows with increasing port or VC count. We suspect this is because the fixed FPGA fabric restricts the placement and routing optimizations that can be performed on large circuits, while ASIC flows have more options, such as upsizing cells or wires on critical paths. Overall, the delay gap is around 3.6× for the allocators.

Output Module

The output module is the smallest router component and is dominated by the output registers. Indeed, the LUT-to-register ratio is 0.6:1, contributing to its smaller area gap of 39× when compared to the allocators. The average delay ratio of 3.4× is also relatively low because the simple circuitry does not stress the FPGA interconnect.

Table 5.2: Summary of FPGA/ASIC power ratios.

Module           Min.    Max.    Geometric Mean
Input Module     3       23      10
Crossbar         15      194     64
Allocators       33      61      41
Output Module    14      19      16
Router           5       27      14

Router Area Composition on FPGA and ASIC

Figures 5.4 and 5.5 show the router area composition on FPGAs and ASICs respectively. Moreover, the total router area of select data points is given on the top axes. The main discrepancy between the FPGA and ASIC router composition is the proportion of the input modules and the crossbar. The input modules are the largest components for most router variants on both the soft and hard implementations. It follows from the area ratios that the input modules are relatively larger on ASICs than on FPGAs; in fact, they occupy 36-83% of the ASIC router area compared to 14-60% on the FPGA. The crossbar is the smallest component of an ASIC VC router. On FPGAs, however, it becomes a critical component with a wide datapath or a large number of ports, where it occupies up to 26% of the area.

With an increasing number of VCs, the VC allocator area dominates on both FPGAs and ASICs. Increasing the number of ports also increases the VC allocator area but to a lesser extent. This is due to the second stage of VC allocation, which occupies most of the area and is constructed out of P×V:1 arbiters. They require an additional P inputs per arbiter when the number of VCs is increased, whereas only V additional inputs are required when the number of ports is raised. Since P is larger than V for the baseline router, the VC allocator’s area grows more slowly with the number of ports than it does with the number of VCs. The speculative switch allocator also grows with increasing port and VC counts but is more affected by the number of ports. With 15 router ports, the switch allocator makes up 22% of the FPGA router area.

5.1.2 Dynamic Power

This section investigates the dynamic power of both hard and soft NoC components; only by understanding where power goes in various NoCs can we optimize it. Note that we do not investigate static power in this analysis. We divide the NoC into routers and links, and further divide the routers into four subcomponents. After sweeping four key design parameters (width, number of ports, number of virtual channels (VC) and buffer depth) we find the soft:hard power ratios for each router component as shown in Figure 5.6. We also investigate the percentage of power that is dissipated in each router component for both hard and soft implementations in Figures 5.7 and 5.8. Finally, we analyze the speed and power of NoC links whether they are constructed out of the FPGA’s soft interconnect or dedicated hard (ASIC) wires.

Figure 5.6: FPGA/ASIC (soft/hard) power ratios as a function of key router parameters.

Router Dynamic Power Ratios

As Table 5.2 shows, routers consume 14× less power when implemented hard compared to soft. When looking at the router components, the smallest power gap is 10× for input modules since they are implemented using efficient BRAMs on FPGAs. On the other hand, crossbars have the highest power gap (64×) between hard and soft. Note that there is a strong correlation between the FPGA:ASIC power ratios presented here and the previously presented NoC area ratios, while the power and delay ratios do not correlate well [2]. We believe this is because total area is a reasonable proxy for total capacitance, and charging and discharging capacitance is the dominant source of dynamic power.

Width: Figure 5.6 shows how the power gap between hard and soft routers varies with NoC parameters. The first plot shows that increasing the router’s flit width reduces the gap. For example, 16-bit soft crossbars consume 65× more power than hard crossbars, while that gap drops to approximately 40× at widths higher than 64 bits. The same is true for input modules, where the power gap drops from 18× to 12×. This indicates that the FPGA fabric is efficient in implementing wide components and encourages increasing flit width as a means to increase router bandwidth when implementing soft NoCs.

Number of Ports: Unlike width, increasing the number of router ports proved unfavorable for a soft router implementation. The allocators’ power gap is 57× at high port count compared to 35× at low port count. For crossbars, the power gap triples from 50× at six or fewer ports to 150× with a higher number of ports. This suggests that low-radix soft NoC topologies, such as rings or meshes, are more efficient on traditional FPGAs than high-radix and concentrated topologies.

Figure 5.7: FPGA (soft) router power composition by component and total router power at 50 MHz. Starting from the bottom (red): Input modules, crossbar, allocators and output modules.

Number of VCs and Buffer Depth: Increasing the number of VCs is another means to enhance router bandwidth because VCs reduce head-of-line blocking [52]. This requires multiple virtual FIFOs in the input buffers and more complex control and allocation logic. Because we use BRAMs for the input module buffers on FPGAs, we have enough buffer depth to support multiple large VCs. Conversely, ASIC buffers are built out of registers and multiplexers and are tailored to fit the required buffer size exactly. As a result, the input module power gap consistently becomes smaller as we increase the use of buffers by increasing either VC count or buffer depth, as shown in Figure 5.6. Allocators are composed of arbiters, which are entirely composed of logic gates and registers. Increasing the number of VCs increases both the number of arbiters and the width of each arbiter. The overall impact is a weak trend – the power ratio between soft and hard allocators narrows slightly as the number of virtual channels increases.

Router Power Composition

Figures 5.7 and 5.8 show the percentage of dynamic power consumed by each of the router components; the total router power is annotated on the top axes. Clearly, most of the power is consumed by the input modules, as shown by previous work [23, 70], but the effect is weaker in soft NoCs than in hard.

Figure 5.8: ASIC (hard) router power composition by component and total router power at 50 MHz. Starting from the bottom (red): Input modules, crossbar (very small), allocators and output modules.

This also conforms with the area composition of the routers; most of the router area is dedicated to buffering in the input modules, while the smallest router component is the crossbar [2]. Indeed, the crossbar power is very small compared to other router components, as shown in the figures.

Next we look at the power consumption trends when varying the four router parameters. As we increase width, the router datapath consumes more power while the allocator’s power remains constant. When increasing the number of ports or VCs, the proportion of power consumed by the allocators increases since there are more ports and VCs to arbitrate between. With deeper buffers, there is almost no change in the soft router’s total power or its power composition. This follows from the fact that the same FPGA BRAM used to implement a 5-flit deep buffer is used for a 65-flit deep buffer. However, on ASICs there is a steady increase of total power with buffer depth because deeper buffers require building new flip-flops and larger address decoders.

Figure 5.9: Baseline router power at actual data injection rates, relative to its power at maximum data injection. Attempted data injection is annotated on the plot.

Router Power as a Function of Data Injection Rate

Router power is not simply a function of area; it also depends very strongly on the amount of data traversing the router. A logical concern is that NoCs may dissipate more energy per unit of data under higher traffic. This stems from the fact that NoCs need to perform more (potentially power consuming) arbitration at higher contention levels, with no increase in data packets getting through. However, our measurements refute that belief. Figure 5.9 shows that router power is linear with the amount of data actually traversing the router, suggesting that higher congestion does not raise arbitration power; a simple linear model of this behaviour is sketched after the list below. We annotate the attempted data injection rate on the plot. For example, 100% means that we attempt to inject data on all router ports on each cycle, but the x-axis shows that only 28% of the cycles carry new data into the router. This is because pseudo-random traffic from two router input ports will often be destined to the same output port, causing one of the router inputs to stall. At zero data injection the router standby power, due to clock toggling, is 13% of the power at maximum data injection, suggesting that clock gating the routers is a useful power optimization [111]. Importantly, router parameters also affect the data injection rate at each port:

• Width: Increasing port width does not affect the data injection rate because switch contention does not change. However, bandwidth increases linearly with width.

• Number of ports: Increasing the number of ports raises switch contention; thus the data injection rate at each port drops from 38% at 3 ports to 19% at 15 ports.

• Number of VCs: At 1 VC, data can be injected in 22% of the cycles, and that increases to 32% at 4 VCs. Beyond 4 VCs, throughput saturates but multiple VCs can be used for assigning packet priorities and implementing guarantees [52].

• Buffer Depth: While deeper buffers increase the number of packets buffered at each router, they do not affect the steady-state switch contention or the rate of data injection.
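A minimal linear model consistent with Figure 5.9 is sketched here; the absolute power number in the example is a placeholder, while the 28% saturation injection rate and the 13% standby fraction are the values reported above.

def router_power_w(p_max_w, inj_rate, max_inj_rate=0.28, standby_frac=0.13):
    """Router power grows linearly with the fraction of cycles that actually
    deliver a new flit, from a clock-only standby floor up to p_max_w."""
    utilization = min(inj_rate / max_inj_rate, 1.0)
    return p_max_w * (standby_frac + (1.0 - standby_frac) * utilization)

# e.g. a router that consumes 0.9 W under saturating traffic, running at half
# of its saturation injection rate:
p = router_power_w(0.9, 0.14)  # ~0.51 W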

5.2 Links

Even though links are simply point-to-point connections between routers, they contain both a logic and metal component on both the FPGA and ASIC platforms. This is because a long wire implemented in an ASIC chip requires rebuffering using transistor drivers to maintain a fast and readable signal with good integrity. On FPGAs, a point-to-point connection is built out of the programmable interconnect and consists of short wire segments, multiplexers and buffers – it therefore also contains a silicon and a metal component. In this section we discuss the area overhead (both silicon and metal), speed and power gaps between hard and soft wires.

5.2.1 Silicon Area

To find the transistor area overhead of soft wires, we look at the area required to implement programmable multiplexers at LAB inputs and outputs, and assume that the type of multiplexers and interconnection flexibility used to connect routers to the programmable (soft) interconnect matches that of a LAB. 50% of a Stratix III LAB area (0.011 mm2) consists of programmable interconnect and supplies 52 input ports and 40 output ports [98]. This means that each input or output occupies 0.011 mm2 / 92 ≈ 120 µm2 of silicon area. For hard wires, the area overhead is lower as only CMOS drivers are required at wire ends, and no multiplexers are needed. We use the exact transistor layout parameters from TSMC to hand layout the CMOS drivers and measure total transistor area. The drivers we use with 3 mm wires are 20× the minimum transistor width (=2.4 µm), which makes the total area of one CMOS driver 13 µm2. Therefore hard wires are 9× more area efficient than soft wires, per router port, in terms of silicon area.
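The per-port arithmetic above, restated as a short sketch (the LAB figures are the ones quoted from [98]; the 13 µm2 driver area is the hand-layout result):

LAB_INTERCONNECT_MM2 = 0.011   # programmable-interconnect half of one Stratix III LAB
LAB_PORTS = 52 + 40            # input + output ports supplied by that interconnect

soft_um2_per_port = LAB_INTERCONNECT_MM2 * 1e6 / LAB_PORTS  # ~120 um^2
hard_um2_per_port = 13.0                                    # one hand-laid-out CMOS driver

area_ratio = soft_um2_per_port / hard_um2_per_port          # ~9x in favour of hard wires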

5.2.2 Metal Area

FPGAs use metal very heavily for interconnect; therefore it is a valuable and scarce resource, much like silicon, and must be studied to better understand the overhead of NoCs. Soft wires are spaced out and often partially shielded to ensure glitch-free operation despite cross-talk for any signal combination that can be programmed onto them. The neighboring signals on hard links, on the other hand, are always other bits of the same link. This makes cross-talk modeling much simpler and can enable closer wire spacing and less shielding and thus increased metal efficiency over soft wires. However, it is difficult to obtain precise FPGA metal spacing values, so we favour the FPGA platform and assume that hard and soft wires utilize metal equally for links that are the same length. In Chapter 6 we compute the fraction of the FPGA or ASIC metal budget that is used for creating the NoC links. For FPGAs, we find the utilization of different soft interconnect resources (e.g. C4 and R4) and use that to estimate the metal area overhead of soft links. For hard links, we know the exact metal width (0.6 µm) and spacing (0.6 µm), and we use those to calculate the exact metal utilization.

5.2.3 Speed and Power

Figures 5.10 and 5.11 show the speed and power of hard and soft wires. Soft wires connect to multiplexers, which increases their capacitive and resistive loading, making them slower and more power hungry. However, these multiplexers allow the soft interconnect to create different topologies between routers, and enable the reuse of the metal resources by other FPGA logic when unused by the NoC. We lose this reconfigurability with hard wires, but they are, on average, 2.4× faster and consume 1.4× less power than soft wires. We can also trade excess speed for power efficiency by using lower-voltage wires, as seen from the “Hard 0.9V” plots.

Figure 5.10: Hard and soft interconnect wires frequency.

Figure 5.11: Hard and soft interconnect wires power consumption at 50 MHz and 15% toggle rate.

A detailed look at the different soft wires shows that long wires (C12, R20) are faster, per mm, than short wires (C4, R4). Additionally, there is a directional bias for power, as the horizontal wires (R4, R20) consume more power per mm than vertical ones (C4, C12). An important metric is the distance that we can traverse between routers while maintaining the maximum possible NoC frequency. This determines how far we can space out NoC routers without compromising speed. In the case of soft links and a soft (programmable) clock network, the clock frequency on Stratix III is limited to 730 MHz. At this frequency, short wires can cross 3 mm while longer wires can traverse 6 mm of chip length between routers. When using hard links, we are only limited by the routers’ maximum frequency, which is approximately 900 MHz. At this frequency, hard links can traverse 9 mm at 1.1 V or 7 mm at 0.9 V. Although lower-voltage wires are slower, they conserve 40% dynamic power compared to wires running at the nominal FPGA voltage.

Chapter 6. Embedded NoC Options

Contents

6.1 Soft NoC
6.2 Mixed NoCs: Hard Routers and Soft Links
    6.2.1 Area and Speed
    6.2.2 FPGA Silicon and Metal Budget
6.3 Hard NoCs: Hard Routers and Hard Links
    6.3.1 Area and Speed
    6.3.2 FPGA Silicon and Metal Budget
    6.3.3 Low-Voltage Hard NoC
6.4 System-Level Power Analysis
    6.4.1 Power-Aware NoC Design
    6.4.2 FPGA Power Budget
6.5 Comparing NoCs and FPGA Interconnect
    6.5.1 Area per Bandwidth
    6.5.2 Energy per Data
6.6 Summary of Mixed and Hard NoCs

In this chapter we combine the component-level results from Chapter 5 to investigate complete NoC systems that are suitable for FPGAs. The first obvious choice is to implement the entire NoC using soft logic – we summarize the design recommendations that we found for soft NoCs. Another option is to harden some of the NoC components and leave others soft – we call this a mixed NoC. Finally, it is also possible to harden an entire NoC, leading to the most efficient but least configurable option. We discuss the tradeoffs between the three implementation options – soft, mixed and hard – and perform system-level analysis to quantify the overhead in a modern FPGA. We also compare the raw efficiency of our NoC options to the simplest form of soft FPGA point-to-point links. This allows us to understand how NoCs fare when they are used for the simplest of interconnection patterns on an FPGA. Also, by comparing to point-to-point links, we understand the lower bound of the cost of communication on FPGAs, and it is informative to know how different NoC options compare to that bound.


Figure 6.1: Floor plan of a hard router with soft links embedded in the FPGA fabric. Drawn to a realistic scale assuming the router occupies the area equivalent to 9 LABs.

6.1 Soft NoC

Soft NoCs require no architectural changes to the FPGA because they are configured out of the existing FPGA fabric. As such, the strengths of a soft NoC lie in its reconfigurability. We summarize the design recommendations for a soft NoC that makes efficient use of the FPGA’s silicon area:

1. BRAM was most efficient for memory buffer implementation even for shallow buffers, so buffer depth is free until the BRAM is full.

2. To increase bandwidth, it is more efficient to increase the flit width rather than the number of ports or VCs.

3. The number of ports and number of VCs scale poorly on FPGAs because of the quadratic increase of allocator and crossbar area.

However, because of their high area overhead and modest operating frequency, soft NoCs are unlikely to replace current interconnect solutions, such as buses or point-to-point links. This is especially true for high bandwidth and streaming applications where both throughput and latency are a concern. We now look at the gains of hardening NoC components. One viable option is to harden the crossbar and allocators and leave the input and output modules soft. This solution moves the critical path from the switch allocator to the input module, allowing the router to run at 386 MHz compared to 167 MHz for a fully soft implementation. Such a heterogeneous router occupies 2.34 mm2 for the baseline parameters. The 1.8× area improvement over a soft implementation is, however, unconvincing. Furthermore, we have not yet accounted for the area of the interconnect ports; that is, the switch and connection blocks that would route wires into, out of and around the hard component. For that reason we look more closely into using completely hard routers; the first (and less-invasive) option is to use hard routers with the programmable soft interconnect, and the other option is to build dedicated hard links to connect hard routers.

Figure 6.2: Examples of different topologies (mesh, ring, butterfly) that can be implemented using the soft links in a mixed NoC.

6.2 Mixed NoCs: Hard Routers and Soft Links

In this NoC architecture, we embed hard routers on the FPGA and connect them via the soft interconnect. While this NoC achieves a major increase in area-efficiency and performance versus a soft NoC, it remains highly configurable by virtue of the soft links. The soft interconnect can connect the routers together in any topology, as shown in Figure 6.2, subject only to the limitation that no router can exceed its (prefabricated) port count. To accommodate different NoC topologies, routing tables inside the router control units are simply reprogrammed by the FPGA CAD tools to match the new topology. Figure 6.1 shows a detailed illustration of an embedded router connected to the soft interconnect. Note that we must ensure that a sufficient number of interconnect wires intersect the hard router to connect to all of its inputs and outputs. This prevents any interconnection “hot spots” that would over-stress the FPGA’s wiring; we aim to have the same interconnection flexibility with NoC routers as we do with LABs. We achieve this by ensuring that:

• Connection blocks and switch blocks are only present on the router perimeter. For example, the router in Figure 6.1 can only connect to 8 connection/switch blocks because 8 LABs are on its perimeter (although its area equals 9 LABs). This makes physical layout much simpler.

• Hard routers do not over-stress the soft interconnect; they cannot have more inputs/outputs per unit of perimeter than regular LABs.

• Hard routers have interconnection flexibility equivalent to that of LABs. This implies using 2 levels of programmable multiplexers on each input and one level of programmable multiplexers between an output and a soft interconnect wire.

• Soft wires continue across hard routers but cannot start or end within the router area, only at its perimeter.

6.2.1 Area and Speed

Using the component-level results in Chapter 5, we compute the area and performance of an NoC with hard routers and soft links. Similarly to LABs or BRAMs on the FPGA, a hard router requires programmable multiplexers on each of its inputs and outputs to connect to the soft interconnect in a flexible way.

Table 6.1: Soft interconnect utilization for a 64-node 32-bit mixed NoC using either C4/R4 or C12/R20 wires on the largest Stratix III device. Router area is 10 LABs, floorplanned as two rows and five columns.

                          Short Wires               Long Wires
                          C4          R4            C12        R20
Demand                    14,688      19,584        4,896      4,896
Supply       Regional     166,400     159,936       8,320      4,704
             Chip-wide    639,360     1,074,944     34,336     34,944
Utilization  Regional     7.8%        10.9%         58.9%      104%
             Chip-wide    2.0%        1.6%          14.3%      14.0%

The baseline router has 32 data plus 2 backpressure inputs and outputs per port. We assume that we widen the port (using time-multiplexing) connecting to the FPGA fabric to 128 data plus 2 backpressure inputs and outputs, making the sum of input and output ports 532. Therefore, the total area of the router and the interconnect multiplexers is 0.14 mm2 + 532 × 120 µm2 = 0.21 mm2, which is equivalent to 9.5 LABs, rounded up to 10 LABs. Assuming that the hard router occupies the area of 5×2 LABs, its perimeter can connect to the equivalent of 10 LABs (or 520 inputs and 400 outputs). This is more than enough to supply the 266 inputs and 266 outputs required by the router, ensuring that the soft interconnect is not stressed; on the contrary, this router has lower interconnect stress than a regular LAB on the FPGA. Note that routers only become input/output pin limited when the data width is greater than 220 bits and 4:1 time-division multiplexing (TDM) is used. We repeat these area calculations for all of the design space and take the geometric average; there is a 20× area improvement over soft router implementations (as opposed to 30× when we excluded interconnect).

The speed of an NoC with hard routers and soft links is limited by the soft interconnect and the fabric clock network. We choose the FPGA maximum clock network frequency (730 MHz in Stratix III) as the target NoC frequency, and find that short wires can traverse ~2.5 mm at this speed while longer soft wires can traverse ~5 mm (see Figure 5.10). The FPGA’s core dimensions are ~21 mm in each dimension; therefore, an 8×8 mesh of hard routers using the soft interconnect would allow operation at the maximum frequency of 730 MHz even when using the slower short wires.
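The tile-area arithmetic for this mixed-NoC router, restated as a sketch (the 0.14 mm2 router area and the 120 µm2 per-port multiplexer area come from the component analysis above):

ROUTER_MM2 = 0.14            # baseline hard VC router
PORT_MUX_MM2 = 120e-6        # soft-interconnect multiplexers per input or output
FABRIC_PORTS = 266 + 266     # 4 x (32+2) router-to-router + (128+2) FabricPort, in and out

tile_mm2 = ROUTER_MM2 + FABRIC_PORTS * PORT_MUX_MM2
# ~0.20 mm2, which the text rounds to 0.21 mm2, i.e. roughly 10 LABs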

6.2.2 FPGA Silicon and Metal Budget

To see the cost of a complete system, consider hardening a 64-node NoC on the FPGA. Using baseline parameters, this will occupy area equivalent to 10 LABs × 64 nodes = 640 LABs. The total core area of the largest Stratix III FPGA is 412 mm2 [145]. Therefore, a 64-node hard NoC composed of state-of-the-art VC routers will occupy 3.3% of the core area (~2.2% of total chip area) of this FPGA, compared to 64% of the core area (~43% of total chip area) for a soft implementation. We now estimate the soft interconnect stress caused by this NoC. Table 6.1 compares the demand for soft wires by NoC links to both the total wire supply and the supply of wires in NoC regions. We define NoC regions as the interconnect channels between routers, and in this example we assume the routers are configured in a mesh topology. In this case NoC links can be constructed most efficiently in the “NoC regions” by concatenating interconnect resources in a straight vertical or horizontal path between routers. We evaluate two cases: making the soft links with short C4/R4 wires, or with the longer C12/R20 wires.

Figure 6.3: Floor plan of a hard router with hard links embedded in the FPGA fabric. Drawn to a realistic scale assuming the router occupies the area equivalent to 9 LABs.

Our baseline routers have 32 bidirectional links between channels and 4 bits of flow control for a total of 68 bit-links between two routers. If we only use short wires, each bit-link is constructed by stitching together 3 C4 wires in the vertical direction or 4 R4 wires in the horizontal direction. We compute the resulting total interconnect utilization in Table 6.1; note that the NoC links require less than 2% of the total C4/R4 interconnect. The interconnect stress is concentrated in the NoC regions but even there it is not excessive; Table 6.1 shows between 8% and 11% of the C4/R4 interconnect is used in these regions. It is difficult to implement NoC links on long wires exclusively; there are not enough R20 wires in NoC regions, as shown in Table 6.1.

6.3 Hard NoCs: Hard Routers and Hard Links

Both routers and links are implemented hard for this NoC architecture. Routers are connected to other routers using dedicated hard links; however, routers still interface to the FPGA through programmable multiplexers connected to the soft interconnect. When using hard links, the NoC topology is no longer configurable. However, the hard links save area (as they require no multiplexers) and run at higher speeds compared to soft links, allowing the NoC to run at the routers’ maximum frequency. Drivers at the ends of dedicated wires charge and discharge data bits onto the hard links as shown in Figure 6.3.

6.3.1 Area and Speed

We find the complete area overhead of a baseline router including its input and output ports. There are 272 hard links connecting a router to its neighbours and 260 soft interconnect ports connecting the router FabricPort to the FPGA fabric. Therefore, the total area of the router and hard and soft inputs/outputs equals 0.14 mm2 + 272 × 13 µm2 + 260 × 120 µm2 = 0.18 mm2, or 8.3 LABs, rounded up to 9 LABs. We repeat these calculations for all of the design space and take the geometric average; there is a 23× area improvement over soft router implementations (compared to 20× for hard routers and soft links). The maximum frequency of the baseline router is 943 MHz. At this frequency, hard wires can reach more than one third of the FPGA’s dimension (~8 mm) as measured in Figure 5.10. Unlike soft links, hard links do not limit the speed of the NoC to the 730 MHz fabric clock limit; rather, the routers can operate at their maximum frequencies and can be spaced out more if desired. This makes this NoC with hard links 20% faster than the hard NoC with soft links, and 6× faster than a completely soft NoC.

6.3.2 FPGA Silicon and Metal Budget

A 64-node NoC of hard routers and hard links occupies silicon area equivalent to 9 LABs × 64 nodes = 576 LABs, or 3.1% of the FPGA core area (~2.1% of total chip area). This is a marginal area improvement over the NoC with hard routers and soft links, but using hard links allows faster NoC operation as well, as we have shown. However, we should check not only the transistor area but also the metal utilization of hard links. Each hard wire is 2.5 mm long and has a pitch of 1.2 µm. The 64 routers of this 32-bit NoC require a total of 9792 wires; making the total metal area equal 9792 × 1.2 µm × 2.5 mm = 29.4 mm2. The FPGA core area is 412 mm2, and this is also the area of each metal layer on top of the FPGA core. If 2 metal layers are used for the NoC, then the utilization of each metal layer is only 3.6% for all 9792 wires used in a hard NoC.
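The metal-budget arithmetic for the hard links, restated as a sketch using the numbers above:

WIRES = 9792                 # hard-link wires in the 64-node, 32-bit NoC
PITCH_MM = 1.2e-3            # 0.6 um width + 0.6 um spacing
LINK_LENGTH_MM = 2.5         # wire length between routers
CORE_AREA_MM2 = 412.0        # Stratix III core area, also the area of one metal layer
NOC_METAL_LAYERS = 2

metal_mm2 = WIRES * PITCH_MM * LINK_LENGTH_MM                    # ~29.4 mm2
util_per_layer = metal_mm2 / (CORE_AREA_MM2 * NOC_METAL_LAYERS)  # ~3.6%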

6.3.3 Low-Voltage Hard NoC

With hard routers and hard links, an NoC is almost completely disjoint from the FPGA fabric, only connecting through router-to-fabric ports. This makes it easy to use a separate, lower voltage power grid for the NoC, allowing us to trade excess NoC speed for power efficiency. We therefore propose a low-power version of the hard NoC that operates at a lower voltage: for example, 65 nm Stratix III FPGAs use a supply voltage of 1.1 V, while our low-power NoC saves 33% dynamic power by operating at 0.9 V in the same process [4].

6.4 System-Level Power Analysis

This section investigates the power consumed by complete NoCs, especially the mixed and hard options. We investigate how the width of NoC links and spacing of NoC routers affect power consumption, and how that might influence the choice of NoC parameters. Additionally, we report how much of the FPGA’s power budget would be spent in these embedded NoCs under worst-case traffic, if they are used for global communication.

6.4.1 Power-Aware NoC Design

Figure 6.4 shows the total dynamic power of mixed and hard NoCs as we vary the width. When we increase the width of our links we also reduce the number of routers in the NoCs to keep the aggregate bandwidth constant at 250 GB/s. For example, a 64-node NoC with 32-bit links has the same total bandwidth as a 32-node NoC with 64-bit links. However, with fewer routers the links become longer so that the whole FPGA area is still reachable through the NoC, albeit with coarser granularity. We assume that our NoCs are implemented on an FPGA chip whose core is 21 mm in each dimension, as in the largest Stratix III device [145].

The power-optimal NoC link width varies by NoC type, as Figure 6.4 shows. The most power-efficient mixed NoC has 32-bit wide links and 64 nodes. However, for hard NoCs the optimum is at 128-bit width and 16 router nodes. The difference between the two NoC types is a result of the relative routers-to-links power.

Figure 6.4: Power of mixed and hard NoCs with varying width and number of routers at a constant aggregate bandwidth of 250 GB/s.

With fewer but wider nodes, the total router power drops as the control logic power in each router is amortized over more width and hence more data. However, the link power increases since longer wires are used between the more sparsely distributed router nodes. Because soft links consume more power than hard links, they start to dominate total NoC power earlier than hard links, as shown in Figure 6.4. Figure 6.5 shows the NoC power dissipated in routers compared to links for a 64-node NoC. On average, soft links consume 35% of total NoC power, while hard links consume 26%. For NoCs with fewer nodes (and hence longer links), the relative percentage of power in the links is higher.

6.4.2 FPGA Power Budget

We want to find the percentage of an FPGA’s power budget that would be used for global data communication on a hard NoC. We model a typical, almost-full¹ FPGA using the Early Power Estimator [14]. The largest Stratix III FPGA core consumes 20.7 W of power in this case, divided into 17.4 W dynamic power and 3.3 W static power. Note that 57% of this power is in the interconnect, while 43% is consumed by logic, memory and DSP blocks. Aggregate (or total) bandwidth is the sum of available data bandwidth over all NoC links accounting for worst-case contention. A 64-node mixed NoC can move 250 GB/s around the FPGA chip using 2.6 W, or 15% of the typical large FPGA dynamic power budget of 17 W. A hard NoC is more efficient and consumes 1.9 W or 11% at 1.1 V, and 1.3 W or 7% at 0.9 V. This implies that only 3-6% of the FPGA power budget is needed for each 100 GB/s of NoC communication bandwidth.

¹ Only core power is measured, excluding any I/Os. We assume that our full FPGA runs at 200 MHz, has a 12.5% toggle rate, and is logic-limited: 90% of the logic is used, and 60% of the BRAMs and DSPs.


Figure 6.5: Power percentage consumed by routers and links in a 64-node mixed/hard mesh NoC.

To put this in context, 250 GB/s is a large aggregate bandwidth. A single 64-bit DDR3 interface running at 933 MHz, the current maximum frequency supported by any FPGA, produces a maximum data rate of 14.6 GB/s. A PCIe Gen3 x8 interface produces 8.5 GB/s of data in each direction. If this data is transferred to various masters and slaves located throughout the entire FPGA, the average distance traveled is half the width or height of the chip, or 4 routers. Hence an aggregate NoC bandwidth of (14.6×4)+(8.5×2×4) = 126 GB/s can distribute the maximum data from these high-speed interfaces throughout the entire FPGA chip.
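The bandwidth arithmetic of this example, restated as a sketch:

DDR3_GBPS = 14.6     # one 64-bit DDR3 interface at 933 MHz
PCIE_GBPS = 8.5      # PCIe Gen3 x8, per direction
AVG_HOPS = 4         # ~half the chip width/height, measured in routers

aggregate_gbps = DDR3_GBPS * AVG_HOPS + PCIE_GBPS * 2 * AVG_HOPS  # ~126 GB/s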

6.5 Comparing NoCs and FPGA Interconnect

We suggest the use of NoCs to implement global communication on the FPGA; as such, we must compare to existing methods. There are two main types of communication that can be configured on the FPGA, as shown in Figure 6.6. The first uses only soft wires to implement a direct point-to-point connection between modules or to broadcast signals to multiple compute modules. The second type of communication uses wires, multiplexers and arbiters to construct logical buses. This is often used to connect multiple masters to a single slave, for example connecting multiple compute modules to external memory. Although the proposed NoCs can implement both of these communication requirements (point-to-point and arbitration), we compare our NoC area with the simplest alternative: point-to-point links that are equal in length to a single NoC link between two routers. Soft point-to-point links consist of FPGA wires that transport data a distance equivalent to a single NoC link. We assume large packets on the NoC, so that the overhead of a packet header is negligible. Nevertheless, this comparison favors the FPGA links, because NoCs can move data anywhere on the chip as well as perform arbitration, while the direct links are limited in length to an NoC link and can perform no arbitration or switching. This simplest form of communication serves as a lower bound of any communication overhead.

6.5.1 Area per Bandwidth

As a generalization of the area-delay product, we compute the area overhead of NoCs (or other communication methods) per supported data bandwidth. This figure of merit quantifies the area cost of each terabyte-per-second of data bandwidth on different communication architectures. The aggregate bandwidth of point-to-point links is simply the product of the link width and its speed.

Figure 6.6: Different types of on-chip communication.

To find the aggregate bandwidth of NoCs we perform a cycle-accurate simulation of NoC routers using ModelSim and attempt to inject packets randomly on each cycle at each port; this represents worst-case uniform-random traffic. Naturally the router stalls due to switch contention and limited buffer space, thus limiting bandwidth. We measure this steady-state worst-case bandwidth and report it for different NoC variants in Table 6.3.

We compute a lower bound of the area required for communicating data on a conventional FPGA by analyzing the simplest form of communication: point-to-point links using the soft interconnect. To move 250 GB/s of data, one could use 10,000 soft wires running at a reasonable 200 MHz clock frequency. Each unit of data must be transmitted 2.5 mm, the length of one NoC link, which requires stitching four R4 soft wires together. The silicon area overhead of these wires comprises the soft interconnect multiplexers in the switch blocks and logic block inputs. For 10,000 bit-links that each consist of 4 R4 wires, we require at least 200 switch blocks (10,000 bits × 4 wires / 200 wires of switch capacity), based on our estimate that each switch block has an achievable routing capacity of 200 signals. We then need to connect 10,000 wires to LABs, each of which has 52 inputs; therefore the input multiplexers of 193 LABs are also taken into account as area overhead. 50% of a LAB’s area is interconnect, with 30% being associated with input multiplexers and another 20% in the switch blocks and outputs [98]. Using this information we can then estimate the total soft interconnect area as 2.2 mm2. This translates into 8.8 mm2 of silicon area to support 1 TB/s of data bandwidth.

A completely soft NoC can be configured onto the FPGA fabric without any architectural changes, but a 64-node soft NoC consumes about half the area of an FPGA. Furthermore, it has a low aggregate bandwidth owing to its modest clock frequency, as shown in Table 6.3. This leads to the prohibitively high area-per-BW of 4960 mm2/TBps. Next, we look at hard NoCs. A hard NoC with soft links is limited to the maximum speed of the FPGA interconnect; nevertheless, this is enough to push this NoC’s aggregate bandwidth to 238 GB/s. The total area of this NoC is also greatly reduced compared to soft NoCs (~20×), making its area-per-BW 84× lower than soft NoCs, or 59.4 mm2/TBps. With hard routers and hard links the NoC can run as fast as the routers at 943 MHz, raising its aggregate bandwidth to 307 GB/s. The area-per-BW for this NoC is 1.6× lower than that of hard NoCs with soft links.
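The lower-bound arithmetic above, restated as a short sketch (the 200-signal switch-block capacity is the estimate stated in the text, and the 2.2 mm2 interconnect area is the resulting total):

WIRES = 10_000            # soft wires at 200 MHz carrying 250 GB/s
R4_PER_LINK = 4           # R4 segments stitched per 2.5 mm bit-link
SWITCH_CAPACITY = 200     # routable signals per switch block (estimate)
LAB_INPUTS = 52

switch_blocks = WIRES * R4_PER_LINK // SWITCH_CAPACITY   # 200
labs_for_inputs = -(-WIRES // LAB_INPUTS)                # 193 (ceiling division)

area_mm2 = 2.2            # total soft-interconnect area estimated above
area_per_bw = area_mm2 / 0.25                            # 8.8 mm2 per TB/s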

Table 6.2: System-level area-per-bandwidth comparison of different FPGA-based NoCs and regular point-to-point links.

FPGA-based NoCs:

NoC Type      | Description              | Area      | Bandwidth  | Area per BW
Soft 64-NoC   | 167 MHz, 32 bits, 2 VCs  | 269 mm²   | 54.4 GB/s  | 4960 mm²/TBps
Mixed 64-NoC  | 730 MHz, 32 bits, 2 VCs  | 14.1 mm²  | 238 GB/s   | 59.4 mm²/TBps
Hard 64-NoC   | 943 MHz, 32 bits, 2 VCs  | 11.3 mm²  | 307 GB/s   | 36.8 mm²/TBps
Hard 64-NoC   | 1035 MHz, 32 bits, 1 VC  | 7.07 mm²  | 236 GB/s   | 30.0 mm²/TBps
Hard 64-NoC   | 957 MHz, 64 bits, 1 VC   | 10.1 mm²  | 437 GB/s   | 23.1 mm²/TBps

Conventional point-to-point FPGA interconnect:

FPGA Interconnect | Description           | Area     | Bandwidth | Area per BW
C4 and R4         | 200 MHz, 10,000 bits  | 2.2 mm²  | 250 GB/s  | 8.8 mm²/TBps

Some have suggested that VCs consume area and power excessively [78]. We investigate a one-VC version of our hard NoC with hard links and find that it does, in fact, improve area-per-BW. Moving to one VC increases blocking at router ports, reducing aggregate bandwidth by 23%; however, area drops by 60%, resulting in a reduced area-per-BW of only 30 mm²/TBps. Compared to point-to-point links, a hard NoC with hard links and one VC therefore has ~3× the area-per-BW. Keep in mind that point-to-point links can perform no arbitration or switching and can only move data between two fixed points, while an FPGA-wide NoC can arbitrate between data packets and move them anywhere on the chip.

Finally, by increasing the flit data width of the NoC from 32 to 64 bits, we double its bandwidth while increasing area by only 61%. This improves area efficiency to 23.1 mm²/TBps, as the router control logic area is amortized over more data bits. This area-per-BW is only 2.6× higher than that of the conventional FPGA wires (8.8 mm²/TBps).

These results show that soft NoCs consume a large amount of area and are impractical for high-throughput applications on FPGAs; however, they may be useful for control-plane and low-throughput purposes. Hard NoCs are two orders of magnitude more efficient than soft NoCs, and fewer VCs and wider data paths make their implementation more efficient still. When compared against the overhead of point-to-point links, an efficient hard NoC is only 2.6× larger for the same supported bandwidth. This is by no means a head-to-head comparison because, unlike point-to-point links, NoCs are capable of switching data and arbitrating between multiple communicating modules. However, this comparison against the lower bound puts hard NoCs in perspective and strongly suggests that hard NoCs will exceed the efficiency of more complex types of soft interconnect that can also perform arbitration and switching.

6.5.2 Energy per Data

We calculate the energy per unit of data moved by NoCs as an important figure of merit. This is used to compare the energy efficiency of different hard and soft NoCs. We also compare the energy per data of NoCs to conventional point-to-point links on the FPGA. Although point-to-point links merely connect two modules and are incapable of arbitration and switching between many nodes, this comparison shows how the presented NoCs compare to best-case conventional interconnect on the FPGA. We show that we can design a hard NoC that uses approximately the same energy as regular (soft) point-to-point links on the FPGA.

Table 6.3: System-level power, bandwidth and energy comparison of different FPGA-based NoCs and regular point-to-point links.

FPGA-based NoCs:

NoC Type      | Description                      | Power   | Bandwidth  | Energy per Data
Soft 64-NoC   | 1.1 V, 167 MHz, 32 bits, 2 VCs   | 5.14 W  | 54.4 GB/s  | 94.5 mJ/GB
Mixed 64-NoC  | 1.1 V, 730 MHz, 32 bits, 2 VCs   | 2.47 W  | 238 GB/s   | 10.4 mJ/GB
Hard 64-NoC   | 1.1 V, 943 MHz², 32 bits, 2 VCs  | 2.67 W  | 307 GB/s   | 8.68 mJ/GB
Hard 64-NoC   | 0.9 V, 943 MHz, 32 bits, 2 VCs   | 1.78 W  | 307 GB/s   | 5.78 mJ/GB
Hard 64-NoC   | 0.9 V, 1035 MHz, 32 bits, 1 VC   | 1.21 W  | 236 GB/s   | 5.13 mJ/GB
Hard 64-NoC   | 0.9 V, 957 MHz, 64 bits, 1 VC    | 1.95 W  | 437 GB/s   | 4.47 mJ/GB

² 1.1 V routers can exceed 943 MHz, as this conservative frequency measurement is achieved at 0.9 V.

Conventional point-to-point FPGA interconnect:

FPGA Interconnect | Description                  | Power   | Bandwidth | Energy per Data
C4,12 and R4,20   | 1.1 V, 200 MHz, 10,000 bits  | 1.18 W  | 250 GB/s  | 4.73 mJ/GB

In this section we compare our NoC power consumption with the simplest FPGA point-to-point links. The FPGA point-to-point links consist of a mixture of different FPGA wires that are equal in length to a single NoC link; 10,000 wires running at 200 MHz can provide a total bandwidth of 250 GB/s. Table 6.3 shows the result of this comparison.

We start by looking at a completely soft NoC that can be configured on the FPGA without architectural changes. Under high traffic, this NoC consumes 5.1 W of power, or approximately one third of the FPGA's power budget. However, because its clock frequency is only 167 MHz, it has a relatively low aggregate bandwidth of 54 GB/s. This means that moving 1 GB of data on this soft NoC costs 95 mJ of energy. Conventional point-to-point links consume only 4.7 mJ/GB; soft NoCs are prohibitively more power-hungry in comparison.

Next, we look at mixed and hard NoCs. A mixed NoC is limited to 730 MHz because of the maximum speed of the FPGA interconnect; nevertheless, this is enough to push this NoC's aggregate bandwidth to 238 GB/s. Note that we calculate bandwidth from simulations, so we account for network contention in these bandwidth numbers. With hard routers and soft links, this NoC consumes 2.5 W or 10 mJ/GB, which is 2.2× that of point-to-point links. A hard NoC can run as fast as the routers at 943 MHz, raising the aggregate bandwidth to 307 GB/s. The energy per data for this NoC is 8.7 mJ/GB, 1.8× more than conventional FPGA links. In Section 6.3.3 we discussed that this completely hard NoC can run at a lower voltage than the FPGA; when the same hard NoC runs at 0.9 V instead of 1.1 V, the energy per data drops to 5.8 mJ/GB, only 22% higher than conventional FPGA wires.

Next, we look at the overhead of VCs by investigating a one-VC version of our hard NoC running at 0.9 V. Table 6.3 confirms that supporting multiple VCs does reduce energy efficiency. Moving to one VC increases blocking at router ports, reducing aggregate bandwidth by 23% to 236 GB/s. However, power drops by 35%, resulting in a reduced energy per data of only 5.1 mJ/GB, a mere 8% higher than the conventional FPGA wires. Finally, by increasing the flit width of the NoC from 32 to 64 bits, we double its bandwidth while increasing power by only 61%. This improves energy efficiency to 4.5 mJ/GB, as the router control logic power is amortized over more data bits. This energy per data is 6% lower than that of the conventional FPGA wires (4.7 mJ/GB).

These findings lead to two important conclusions. First, the most energy-efficient NoC avoids VCs, uses a wide flit width, has hard links and a reduced operating voltage. Second, an embedded hard NoC with hard links on the FPGA can match or even exceed the energy efficiency of the simplest FPGA point-to-point links. This means that a hard NoC, integrated within the FPGA fabric, can implement global communication more efficiently than any soft interconnect that includes arbitration and switching. Hard NoCs are not only area-efficient and fast [2], but energy efficient as well.
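Since energy per data is simply power divided by bandwidth, the mJ/GB column of Table 6.3 can be checked with a few lines of Python. This is only a sketch that re-divides the measured power and bandwidth figures from the table; it adds no new data.

    # Energy per data = power / bandwidth; reproduces the mJ/GB column of Table 6.3.
    nocs = {
        "Soft,  1.1 V, 32 b, 2 VCs": (5.14, 54.4),
        "Mixed, 1.1 V, 32 b, 2 VCs": (2.47, 238.0),
        "Hard,  1.1 V, 32 b, 2 VCs": (2.67, 307.0),
        "Hard,  0.9 V, 32 b, 2 VCs": (1.78, 307.0),
        "Hard,  0.9 V, 32 b, 1 VC ": (1.21, 236.0),
        "Hard,  0.9 V, 64 b, 1 VC ": (1.95, 437.0),
        "Point-to-point soft wires": (1.18, 250.0),
    }
    for name, (power_w, bw_GBps) in nocs.items():
        energy_mj_per_gb = power_w / bw_GBps * 1000   # W / (GB/s) = J/GB -> mJ/GB
        print(f"{name}: {energy_mj_per_gb:5.2f} mJ/GB")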

6.6 Summary of Mixed and Hard NoCs

Table 6.4 highlights the main characteristics of mixed and hard NoCs. For system-level results, we use a 64-node NoC with baseline router parameters (as in Table 4.1). Hard NoCs are generally more efficient and can support more communication bandwidth because of their higher speed. A further desirable feature of hard NoCs is that the routers are disjoint from the FPGA fabric. This allows us to run the NoC at a lower voltage, as discussed in Section 6.3.3. It also allows us to decouple the NoC's frequency from the FPGA fabric, so its timing closure can be completed by the FPGA architect, making it more predictable. Mixed NoCs have the advantage of a configurable topology, which allows greater flexibility in FPGA designs; however, their frequency is tied to the achievable frequency of the FPGA fabric.

Overall, both mixed and hard NoC options are quite efficient and consume only ~2% of the area of a modern FPGA. To transport 100 GB/s of data bandwidth, mixed and hard NoCs consume between 2.8–6% of the FPGA power budget, as summarized in Table 6.4.

Table 6.4: Summary of mixed and hard FPGA NoC at 65 nm.

                                                     | Mixed NoCs               | Hard NoCs
Description                                          | Hard routers, soft links | Hard routers, hard links
Special feature                                      | Configurable topology    | Low-voltage (low-V) mode
Area (vs. soft NoCs)                                 | 20× smaller              | 23× smaller
Speed (vs. soft NoCs)                                | 5× faster                | 6× faster
Power (vs. soft NoCs)                                | 9× less power            | 11× less power (15× in low-V mode)
Frequency                                            | 730 MHz                  | 943 MHz
Critical path                                        | Soft interconnect        | Switch allocator in router
Silicon area (64-node NoC, % of largest Stratix III) | 2.2%                     | 2.1%
Metal area (64-node NoC, % of largest Stratix III)   | 1.6% of short wires      | 3.6% of 2 metal layers
Max. power (% of largest Stratix III budget)         | 6% for 100 GB/s          | 4.4% for 100 GB/s (2.8% low-V)

Chapter 7

Proposed Hard NoC

Contents
7.1 Hard or Mixed?
7.2 Design for I/O Bandwidth
7.3 NoC Design for 28-nm FPGAs

We studied the tradeoffs between hard and soft NoC components, and presented complete NoCs that are entirely hard, entirely soft, or a mixture of both. In this chapter we leverage those results to develop a realistic NoC design that targets 28-nm FPGAs. We will use this proposed NoC for our application case studies and any further analyses in the remainder of this dissertation.

Figure 7.1 shows how a hard NoC is implemented on an FPGA device. In this illustration, both the routers and the links are implemented in hard logic. To connect to the FPGA, a "FabricPort" (discussed in Chapter 8) is used to adapt width and frequency. Additionally, the embedded NoC connects directly to I/O interfaces such as DDR3 memory interfaces and PCIe transceivers.

7.1 Hard or Mixed?

We advocate the hardening of NoC components on the FPGA because of the large potential efficiency gain compared to soft NoCs: 20–23× area, 5–6× speed and 9–15× power improvements. Furthermore, we believe that system-level communication, such as that offered by an NoC, exists in any sizable FPGA application, making an embedded NoC a desirable addition to FPGAs. In addition, an embedded NoC can potentially provide timing closure guarantees to important I/O interfaces such as double data rate (DDRx) memory and PCIe, making the use of these important I/O interfaces much easier. For these three reasons – efficiency, utility and ease-of-use – we believe that a carefully-designed embedded NoC can greatly improve the design of large systems on FPGAs.

We believe that a hard NoC is a better option than a mixed NoC. Other than the higher efficiency gain, a hard NoC is less coupled to the FPGA fabric. This decoupling allows the NoC's frequency to be completely independent of the FPGA design's placement and routing; this makes NoC timing closure guarantees possible, and also allows guaranteed timing closure between the NoC and high-speed I/O interfaces using techniques we explore in Section 8.2. Additionally, by hardening the "FabricPort" described in Section 8.1, we can ensure that any modules connected through the NoC are also timing-decoupled. We believe this simplifies the FPGA CAD problem significantly, since timing-disjoint application modules can be compiled and optimized independently.


Figure 7.1: Embedded hard NoC connects to the FPGA fabric and hard I/O interfaces.

7.2 Design for I/O Bandwidth

A hard NoC must be able to interconnect important I/O and memory interfaces to the FPGA fabric; we look at three of these I/O interfaces on a 28-nm FPGA to motivate the parameters of a viable hard NoC.

DDR3 Interfaces: The port width is typically 64 bits at double data rate (or 128 bits at single data rate), and it can run at 533 MHz, 800 MHz or 1067 MHz. The interface to the FPGA at full bandwidth is ~267 MHz and 512 bits wide.

PCIe Transceivers: A Gen-3 link can have 1, 2, 4 or 8 lanes, each running at 8 Gb/s. An 8-lane interface to the FPGA would run at 250 MHz and be 256 bits wide.

Ethernet Ports: 10 Gb/s Ethernet is deserialized on FPGAs into a configurable-width datapath of up to 64 bits at ~150 MHz.

Of the three, the interface that requires the highest bandwidth is the DDR3 interface when running at full throughput. To transport DDR3 data, we have two options. The first is to provision the NoC so that we can transport the entire bandwidth of DDR3 memory on one NoC link; this means we need only connect a single router port to the DDR3 memory interface, but our NoC links will be quite wide. The second is to use a fine-grained NoC, but then we must connect multiple router ports to a single DDR3 memory interface. This more complicated option might provide higher flexibility, but it complicates high-bandwidth communication considerably – a single DDR3 memory transfer would have to be segmented and reconstructed every time it is transported over a fine-grained NoC. We therefore opt for the first option: a wide, coarse-grained NoC that connects to each high-bandwidth interface using a single NoC link. This necessitates that each of the NoC's links must be able to transport the full bandwidth of DDR3 at 16.7 GB/s (1067 MHz × 64 bits × 2). However, our coarse-grained NoC is not intended exclusively for wide, high-bandwidth transfers – Chapter 8 shows how we build a flexible interface between the NoC and FPGA that allows any data width to be transported over the NoC.

Table 7.1: NoC parameters and properties for 28 nm FPGAs.

Link Width | # VCs | Buffer Depth | # Nodes  | Topology
150 bits   | 2     | 10 flits/VC  | 16 nodes | Mesh

Area†      | Area Fraction∗ | Frequency
528 LABs   | 1.3%           | 1.2 GHz

† LAB: area equivalent to a Stratix V logic cluster.
∗ Percentage of core area of a large Stratix V FPGA.

7.3 NoC Design for 28-nm FPGAs

We based most of our investigation in Part I on 65-nm FPGAs, mainly because the areas of different FPGA blocks were available at that node, making our analyses more accurate. However, newer FPGA devices are available at the time of writing this dissertation; from Altera, the 28-nm Stratix V FPGAs are available. Our NoC therefore targets these newer devices to make any further investigation more relevant. To scale area from 65-nm to 28-nm FPGA devices, we assume that our NoC components scale in the same manner as FPGA LABs. That is, if a router uses area equivalent to 10 LABs in a 65-nm Stratix III FPGA, we assume that it will also occupy an area equivalent to 10 LABs in a 28-nm Stratix V FPGA. As for frequency, we assume that a hard router's frequency will scale similarly to other hard blocks on the FPGA. We find that DSP blocks in Xilinx devices increase in frequency by 1.35× from the 65-nm Virtex 5 FPGAs to the 28-nm Virtex 7 FPGAs [146]; we use that 1.35× factor when scaling our NoC frequency from 65-nm to 28-nm devices.

In designing the embedded NoC, we must over-provision its resources, much like other FPGA interconnect resources, so that it can be used to connect any application. We design the NoC such that it can transport the entire bandwidth of a DDR3 interface on one of its links; therefore, we can connect to DDR3, or to one of the masters accessing it, using a single router port. Additionally, we must be able to transport the control information of DDR3 transfers, such as the address, alongside the data. We therefore choose a width of 150 bits for our NoC links and router ports, and we are able to run the NoC at 1.2 GHz. Multiplying width by frequency, each NoC link can transport a bandwidth of 22.5 GB/s. Table 7.1 summarizes the NoC parameters and properties.

We use 2 VCs in our NoC. A second VC reduces congestion by ~30% [4]. We also leverage VCs to avoid deadlock and to merge data streams, as we discuss in Chapter 8. Additionally, we believe that the capabilities offered by VCs – such as assigning priorities to different message types – will be useful in future FPGA designs. The buffer depth per VC is provisioned such that it is not a cause of throughput degradation – 10 buffer words suffice. With the given parameters, each embedded router occupies an area equivalent to 35 LABs, including the interface between the router and the FPGA fabric, and the wire drivers necessary for the hard NoC links [5]. We implement a mesh topology, as it is efficient to lay out and its wiring pattern matches that of the regular interconnect in an island-style FPGA, which we believe will also simplify its integration into the layout flow of an entire FPGA family. As Table 7.1 shows, the whole 16-node NoC occupies 528 LABs, a mere 1.3% of the core area of a large 28-nm Stratix V FPGA (excluding I/Os).
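As a quick sanity check of the provisioning argument above, the short sketch below recomputes the per-link NoC bandwidth and compares it against the DDR3 requirement; the 16.7 GB/s figure is taken directly from Section 7.2, and only the unit conversion is done here.

    # Check that one 150-bit, 1.2 GHz NoC link covers the full DDR3 bandwidth.
    LINK_WIDTH_BITS = 150
    NOC_FREQ_HZ     = 1.2e9

    link_bw_GBps = LINK_WIDTH_BITS * NOC_FREQ_HZ / 8 / 1e9   # 22.5 GB/s
    ddr3_bw_GBps = 16.7                                      # quoted full-rate DDR3 bandwidth

    print(f"NoC link: {link_bw_GBps:.1f} GB/s, DDR3: {ddr3_bw_GBps} GB/s, "
          f"headroom: {link_bw_GBps - ddr3_bw_GBps:.1f} GB/s")
    assert link_bw_GBps >= ddr3_bw_GBps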

Part II

Design and Applications

Table of Contents

8  FPGA–NoC Interfaces
   8.1 FabricPort
   8.2 IOLinks

9  Design Styles and Rules
   9.1 Latency and Throughput
   9.2 Connectivity and Design Rules
   9.3 Design Styles

10 Prototyping and Simulation Tools
   10.1 NoC Designer
   10.2 RTL2Booksim
   10.3 Physical NoC Emulation

11 Application Case Studies
   11.1 External DDR3 Memory
   11.2 Parallel JPEG Compression
   11.3 Ethernet Switch

Chapter 8

FPGA–NoC Interfaces

Contents
8.1 FabricPort
  8.1.1 FabricPort Input
  8.1.2 FabricPort Output
8.2 IOLinks
  8.2.1 DDR3/4 Memory IOLink Case Study

We studied the detailed efficiency of hard and soft NoC components in the previous chapters. From that architectural study, we concluded that a hard NoC can be a useful addition to FPGAs because of its efficiency in transporting high-bandwidth data across the chip. Additionally, a hard NoC is disjoint from the FPGA fabric, allowing us – among other things – to run the NoC at an independent (and very high) clock frequency. A major question is: how do we connect a hard NoC to an FPGA fabric running almost 4 times slower? More importantly, how can we do that while keeping the interface to the NoC flexible and programmable in a low-cost way?

This chapter presents the FabricPort: an incarnation of that FPGA–NoC interface that not only adapts data width and speed between the FPGA logic and a hard NoC, but does so in a way that is highly configurable. The FabricPort allows us to connect two design modules on the FPGA running at any independent width and frequency using an embedded NoC. Essentially, the FabricPort handles the width/frequency conversion and guarantees deadlock freedom and correct ordering of data transfers over an embedded NoC.

Connecting the NoC to I/O interfaces is a different challenge that we investigate in the second part of this chapter. We propose to connect the NoC directly to hard I/O interfaces using IOLinks. While this connection will be different for every I/O interface, we take external memory interfaces as a case study: we suggest an implementation for DDR3/4 memory, quantify its latency impact, and discuss the area and power savings at a high level.

8.1 FabricPort

We use the 16-node, 150-bit NoC that we proposed in Chapter 7. Each NoC port can sustain a maximum input bandwidth of 22.5 GB/s; however, this bandwidth is only available at the NoC's high operating frequency of 1.2 GHz.


Figure 8.1: Data width and protocol adaptation from an FPGA design to an embedded NoC. Data on the FPGA with any protocol can be translated into NoC flits using application-dependent soft logic (translator). A FabricPort then adapts width (1-4 flit width on fabric side and 1 flit width on NoC) and frequency (any frequency on fabric side and 1.2 GHz on NoC side) to inject/read flits into/from the NoC.

The main purpose of the FabricPort is therefore to give the FPGA fabric access to that communication bandwidth, at the range of frequencies at which FPGAs normally operate. How does one connect a module configured from the FPGA fabric to an embedded NoC running at a different width and frequency? Figure 8.1 illustrates the process of conditioning data from any FPGA module into NoC flits, and vice versa. A very simple translator takes incoming data and appends to it the necessary flit control information. For most cases, this translator consists only of wires that pack the data in the correct position and set the valid/head/tail bits from constants. Once data is formatted into flits, we can send between 0 and 4 flits in each fabric cycle; this is indicated by the valid bit on each flit. The FabricPort will then serialize the flits, one after the other, and inject the valid ones into the NoC at the NoC's frequency. When flits are received at the other end of the NoC, the frequency is again bridged, and the width adapted using a FabricPort; a translator then strips the control bits and injects the data into the receiving fabric module.

The FabricPort plays a pivotal role in adapting an embedded NoC to function on an FPGA. We must bridge the width and frequency while making sure that the FabricPort is never a source of throughput reduction; furthermore, the FabricPort must be able to interface to different VCs on the NoC, send/receive packets of different lengths and respond to backpressure coming from either the NoC or the FPGA fabric. We enumerate the essential properties that this component must have:

1. Rate Conversion: Match the NoC bandwidth to the fabric bandwidth. Because the NoC is embedded, it can run ~4× faster than the FPGA fabric [2,5]. We leverage that speed advantage to build a narrow-link-width NoC that connects to a wider but slower FPGA fabric.

2. Stallability: Accept/send data on every NoC cycle in the absence of stalls, and stall for the exact number of cycles when the fabric/NoC is not ready to send/receive data (as Figure 8.2 shows). The FabricPort itself should never be the source of throughput reduction.

3. Virtual Channels: Read/write data from/to multiple virtual channels in the NoC such that the FabricPort is never the cause of deadlock.

4. Packet Length: Send/receive packets of different lengths.

5. Backpressure Translation: Convert the NoC's credit-based flow-control system into the more FPGA-familiar ready/valid signals.

Figure 8.2: Waveform of ready/valid signals between a soft module → FabricPort input, or FabricPort output → soft module. After the "ready" signal becomes low, the receiver must accept one more cycle of valid data (data 2), after which the sender will have processed the "ready" signal and stopped sending more valid data.
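The ready/valid timing in Figure 8.2 implies a one-cycle "grace" transfer: after ready deasserts, the receiver must still accept one more valid word. Below is a minimal cycle-level model of that rule, purely illustrative and not the hardware itself; the one-cycle delay on the sender's view of ready is the assumption that forces the extra transfer.

    # Cycle-level sketch of the Figure 8.2 protocol: the sender reacts to 'ready'
    # one cycle late, so the receiver must accept one extra valid word after
    # deasserting ready (hence the "aux" register in the FabricPort input).

    def simulate(ready_trace, num_words=5):
        sent, accepted = 0, []
        ready_seen_by_sender = True          # sender samples 'ready' with a 1-cycle delay
        for cycle, ready in enumerate(ready_trace):
            valid = ready_seen_by_sender and sent < num_words
            if valid:
                # Receiver must accept: either it is ready, or this is the single
                # grace word sent in the cycle right after ready went low.
                accepted.append((cycle, f"data {sent}"))
                sent += 1
            ready_seen_by_sender = ready     # becomes visible to the sender next cycle
        return accepted

    # ready goes low in cycle 2 (as in the waveform): data 2 still transfers there.
    print(simulate([1, 1, 0, 0, 1, 1, 1, 1]))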

8.1.1 FabricPort Input

Figure 8.3 shows a schematic of the FabricPort with important control signals annotated. The FabricPort input (Figure 8.3a) connects the output of a module in the FPGA fabric to an embedded NoC input. Following the diagram from left to right: data is input to the time-domain multiplexing (TDM) circuitry on each fabric clock cycle and is buffered in the "main" register. The "aux" register is added to provide elasticity. Whenever the output of the TDM must stall, there is a clock cycle before the stall signal is processed by the fabric module; in that cycle, the incoming datum may still be valid, and it is therefore buffered in the "aux" registers. To clarify this ready-valid behavior, example waveforms are illustrated in Figure 8.2. Importantly, this stall protocol ensures that every stall (ready = 0) cycle stops the input for exactly one cycle, ensuring that the FabricPort input does not reduce throughput.

The TDM unit takes four flits of input on a slow fabric clock and outputs one flit at a time on a faster clock that is 4× as fast as the FPGA fabric – we call this the intermediate clock. This intermediate clock is only used in the FabricPort between the TDM unit and the aFIFO buffer. Because it is used only in this very localized region, this clock may be derived locally from the fabric clock by careful design of circuitry that multiplies the frequency of the clock by four. This is better than generating 16 different clocks globally through phase-locked loops and then building a different clock tree for each router's intermediate clock (a viable but more costly alternative). The output of the TDM unit is a new flit on each intermediate clock cycle. Because each flit has a valid bit, only those flits that are valid will actually be written into the aFIFO, ensuring that no invalid data propagates downstream, unnecessarily consuming power and bandwidth. The aFIFO bridges the frequency between the intermediate clock and the NoC clock, ensuring that the fabric clock can be completely independent from the NoC clock frequency and phase.

(a) FabricPort input: from the FPGA fabric to the embedded NoC.

(b) FabricPort output: from the embedded NoC to the FPGA fabric.

Figure 8.3: FabricPort circuit diagram. The FabricPort interfaces the FPGA fabric to an embedded NoC in a flexible way by bridging the different frequencies and widths as well as handling backpressure from both the FPGA fabric and the NoC.

The final component in the FabricPort input is the “NoC Writer”. This unit reads flits from the aFIFO and writes them to the downstream NoC router. The NoC Writer keeps track of the number of credits in the downstream router to interface to the credit-based backpressure system in the embedded NoC, and only sends flits when there are available credits. Note that credit-based flow control is by far the most-widely-used backpressure mechanism in NoCs because of its superior performance with limited buffering [52].
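Putting the FabricPort input pieces together, the sketch below gives a behavioural model of the TDM stage only (Python; it ignores the credit-based NoC Writer and the real clock-domain crossing, and the flit representation is an assumption made for illustration).

    # Behavioural model of the FabricPort input TDM stage: every fabric cycle
    # supplies up to four flits in parallel; the intermediate clock (4x faster)
    # emits them one at a time, and only valid flits are written into the aFIFO.
    from collections import deque

    def fabricport_input_tdm(fabric_words, afifo: deque):
        """fabric_words: list of 4-tuples of flits, one tuple per fabric cycle.
           Each flit is (valid, payload); invalid flits are dropped, not enqueued."""
        for word in fabric_words:                 # one iteration = one fabric clock
            for valid, payload in word:           # four intermediate-clock cycles
                if valid:                         # invalid flits never reach the aFIFO
                    afifo.append(payload)

    afifo = deque()
    fabricport_input_tdm(
        [((1, "A0"), (1, "A1"), (0, None), (0, None)),   # 2-flit packet this cycle
         ((1, "B0"), (1, "B1"), (1, "B2"), (1, "B3"))],  # full 4-flit packet
        afifo)
    print(list(afifo))   # ['A0', 'A1', 'B0', 'B1', 'B2', 'B3']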

8.1.2 FabricPort Output

Figure 8.3b details a FabricPort output: the connection from an NoC output port to the input of a module on the FPGA fabric. Following the diagram from right to left: the first component is the "NoC Reader". This unit is responsible for reading flits from an NoC router output port and writing them into the aFIFO. Note that separate FIFO queues must be kept for each VC; this is very important as it avoids scrambling data from two packets. Figure 8.4 clarifies this point: the upstream router may interleave flits from different packets if they are on different VCs. By maintaining separate queues in the NoC Reader, we can rearrange flits such that flits of the same packet are organized one after the other.

Figure 8.4: FabricPort output sorting flits of different packets. The "NoC Reader" sorts flits from each VC into a separate queue, thereby ensuring that flits of each packet are contiguous. The DEMUX then packs up to four flits together and writes them to the wide output port but never mixes flits of two packets (except if mixing packets is explicitly enabled in a special combine-data mode).

The NoC Reader is then responsible for arbitrating between the FIFO queues and forwarding one (entire) packet – one flit at a time – from each VC. We currently implement fair round-robin arbitration and make sure that there are no "dead" arbitration cycles; that is, as soon as the NoC Reader sees the tail flit of one packet, it has already computed the VC from which it will read next. The packet then enters the aFIFO, where it crosses clock domains between the NoC clock and the intermediate clock.

The final step in the FabricPort output is the time-domain demultiplexing (DEMUX). This unit reassembles packets (or packet fragments, if a packet is longer than 4 flits) by combining 1-4 flits into the wide output port. In doing so, the DEMUX does not combine flits of different packets; instead it inserts invalid zero flits to pad the end of any packet whose number of flits is not divisible by 4 (see Figure 8.4). This is necessary to present a simple interface to designers, allowing them to connect design modules to the FabricPort with minimal soft logic.
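The packing rule just described (group up to four flits, never mix packets, pad the last group with invalid flits) is captured by the short functional sketch below. It is illustrative only, assuming flits arrive as (payload, is_tail) pairs from a single, already packet-ordered VC queue.

    # Functional model of the FabricPort output DEMUX: pack up to 4 flits of ONE
    # packet into each wide fabric word, padding with invalid (None) flits so that
    # flits of two different packets never share a word.

    def demux_pack(flits, width=4):
        """flits: list of (payload, is_tail) for one VC, already packet-ordered."""
        words, current = [], []
        for payload, is_tail in flits:
            current.append(payload)
            # A tail flit always closes the word, even if the word is not full.
            if is_tail or len(current) == width:
                current += [None] * (width - len(current))   # pad a short tail group
                words.append(tuple(current))
                current = []
        return words

    # A 3-flit packet followed by a 5-flit packet:
    flits = [("p1f0", 0), ("p1f1", 0), ("p1f2", 1),
             ("p2f0", 0), ("p2f1", 0), ("p2f2", 0), ("p2f3", 0), ("p2f4", 1)]
    print(demux_pack(flits))
    # [('p1f0','p1f1','p1f2',None), ('p2f0','p2f1','p2f2','p2f3'), ('p2f4',None,None,None)]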

8.2 IOLinks

The FabricPort interfaces between the NoC and the FPGA in a flexible way. To connect to I/O interfaces, such as external memory interfaces, we could also connect through a regular FabricPort interface, by simply connecting the I/O interface to soft logic which then connects to a FabricPort, as shown in Fig. 8.5a. However, the soft logic between an I/O interface and the FabricPort may be difficult to design for several reasons:

- Fast I/O interfaces have very stringent timing requirements, making timing closure on any soft logic connecting to them very challenging.

- The NoC router may be physically placed far away from the I/O interface, so heavily-pipelined soft logic is required to connect the two. This may incur significant area and power overheads, as the data bandwidth of some I/O interfaces is very large, which translates into a wide data path in the slow FPGA logic. Furthermore, adding pipeline registers would improve timing but typically worsens latency – a critical parameter when transferring data over some I/Os.

- Any solution is specific to a certain FPGA device and will not be portable to another device architecture.

Figure 8.5: Two options for connecting NoC routers to I/O interfaces. (a) Connecting through the FabricPort. (b) Connecting directly using hard links.

These shortcomings are the same ones that we use to motivate the use of an embedded NoC instead of a soft bus to distribute I/O data. Therefore, if we connect to I/O interfaces through the FabricPort, we lose some of the advantages of embedding a system-level NoC. Instead, we propose connecting NoC routers directly to I/O interfaces using hard links as shown in Fig. 8.5b. In addition, clock-crossing circuitry (such as an aFIFO) will be required to bridge the frequency of the I/O interface and the embedded NoC since they will very likely differ. We call these direct I/O links with clock crossing “IOLinks”. This has many potential advantages:

- Uses fewer NoC resources, since it frees a FabricPort which can be used for something else.

- Reduces soft logic utilization, thus conserving area and power.

- Reduces data transfer latency between the NoC and I/O interfaces, because we avoid adding pipelined soft logic.


Figure 8.6: Block diagram of a typical memory interface in a modern FPGA.

One possible shortcoming of IOLinks is the loss of configurability. It is important to design these I/O links such that they do not rid the FPGA I/O interfaces of any of their built-in flexibility. We therefore advocate that any use of IOLinks be optional: multiplexers in the I/O interfaces can choose between our IOLinks and the traditional interface to the FPGA logic. This maintains the option of using I/O interfaces directly, without IOLinks, and thus preserves their configurability.

To be able to send packets to the directly-connected I/O interfaces, we need to extend NoC addressing to include the directly-connected I/Os. For example, our proposed NoC in Chapter 7 has 16 routers and therefore requires 4 addressing bits in its packet format. However, this NoC can connect up to 16 I/O interfaces along its perimeter (by extending the mesh topology at each side). We therefore have a maximum of 32 different addresses (16 routers and 16 direct I/Os) for this NoC, and we will require 5 address bits instead of 4.
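The extra address bit follows directly from the doubled address space; a trivial sketch (not part of the NoC hardware) makes the point:

    # Address bits needed once the 16 perimeter IOLinks join the 16 routers.
    from math import ceil, log2

    routers, iolinks = 16, 16
    print(ceil(log2(routers)))            # 4 bits for routers alone
    print(ceil(log2(routers + iolinks)))  # 5 bits once IOLinks are addressable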

8.2.1 DDR3/4 Memory IOLink Case Study

External memory interfaces, especially to DDRx memory, are some of the most important and highest bandwidth I/O interfaces on FPGAs. In this section we perform a detailed analysis and show how an IOLink can improve both the latency and area-utilization of external memory interfaces.

Memory Interface Components

Figure 8.6 shows a typical FPGA external memory interface. The physical interface (PHY) is used mainly for clocking, data alignment and calibration of the clock and data delays for reliable operation, and to translate double-rate data from memory to single-rate data on the FPGA. In modern FPGAs, especially those that support fast memory transfers, the PHY is typically embedded in hard logic. The PHY presents a standard¹ protocol called DFI to the next part of the external memory interface: the memory controller.

¹ Neither Altera nor Xilinx support the DDR PHY Interface (DFI) fully, but they have a very similar protocol to bridge the PHY and controller [18, 148].

Table 8.1: Typical DDR3/4 memory speeds and their quarter-rate conversion on FPGA.

Memory Frequency | Memory Width | Rate Conversion | FPGA Frequency | FPGA Width
667 MHz          | 64 bits      | quarter         | 167 MHz        | 512 bits
800 MHz          | 64 bits      | quarter         | 200 MHz        | 512 bits
933 MHz          | 64 bits      | quarter         | 233 MHz        | 512 bits
1067 MHz         | 64 bits      | quarter         | 267 MHz        | 512 bits
1200 MHz         | 64 bits      | quarter         | 300 MHz        | 512 bits
1333∗ MHz        | 64 bits      | quarter         | 333 MHz        | 512 bits

∗Supported for DDR4 memory in Xilinx Ultrascale+ at the highest speed grade and voltage only [149].

The memory controller (see Figure 8.6) is in charge of higher-level memory interfacing. This includes regularly refreshing external memory and computing error-correcting codes (ECC) if that option is enabled. Additionally, addresses are translated into bank, row and column components, which allows the controller to issue the correct memory access command based on the previously accessed memory word. Importantly, memory controllers also optimize the order of commands to external memory to minimize the number of costly accesses. An example optimization is the coalescing of memory commands that access the same memory bank or row [148]; this optimizes latency because it avoids the additional latency of switching between banks and rows. The memory controller is sometimes implemented hard and sometimes left soft, but the trend in new devices is to harden the memory controller to provide an out-of-the-box working memory solution [18]. Some designers may want to implement their own high-performance memory controllers, for instance to exploit patterns in their memory accesses; therefore, FPGA vendors always allow direct connection to the PHY, bypassing the hard memory controller. However, hard memory controllers are more efficient and much easier to use, making them a more compelling option for most users, especially as FPGAs start being used by software developers (in the context of HLS and data center computing) who do not have the expert hardware knowledge to design a custom memory controller. It is therefore our opinion that hard memory controllers will be more commonly used than their soft/custom counterparts.

The final component of a memory interface is the multi-port front end (MPFE). This component allows external memory to be accessed by multiple independent modules. It consists primarily of FIFO memory buffers to support burst transfers and arbitration logic to select among the commands bidding for memory access. The MPFE is also sometimes hardened on modern FPGAs. Beyond the MPFE, a soft bus is required to distribute memory data across the FPGA to any module that requires it.

Rate Conversion

One of the functions of an FPGA memory controller is rate conversion. This down-converts the data frequency from the high memory frequency (~1 GHz) to a lower FPGA-compatible frequency (~200 MHz). All modern memory controllers in FPGAs operate at quarter rate; that is, the memory frequency is down-converted 4× and the memory width is parallelized eightfold². Table 8.1 lists the typically-supported DDR3/4 memory frequencies and their corresponding quarter-rate conversion on FPGAs.

² Width is multiplied by 8 during quarter-rate conversion because DDRx memory data operates at double rate (both positive and negative clock edges) while the FPGA logic is synchronous to either a rising or falling clock edge.
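The quarter-rate rows of Table 8.1 follow mechanically from this rule; the sketch below regenerates them, assuming (as in the table) a 64-bit DDR interface and quarter-rate conversion.

    # Regenerate Table 8.1: quarter-rate conversion divides the memory clock by 4
    # and multiplies the 64-bit double-data-rate interface width by 8.
    MEM_WIDTH_BITS = 64
    RATE_DIVISOR   = 4          # quarter rate
    WIDTH_FACTOR   = 8          # 2 (double data rate) x 4 (rate conversion)

    for mem_mhz in (667, 800, 933, 1067, 1200, 1333):
        fpga_mhz   = mem_mhz / RATE_DIVISOR
        fpga_width = MEM_WIDTH_BITS * WIDTH_FACTOR
        print(f"DDR at {mem_mhz} MHz -> FPGA side: {fpga_mhz:.0f} MHz, {fpga_width} bits")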

Table 8.2: Altera’s DDR3 memory latency breakdown at quarter, half and full-rate memory controllers. Latency is shown in full-rate memory clock cycles (adapted from [18]).

Latency Component               | Quarter Rate | Half Rate | Full Rate
Controller Address and Command  | 20           | 10        | 5
PHY Address and Command         | 8-11         | 3-4       | 0
Memory Read                     | 5-11         | 5-11      | 5-11
PHY Read Return                 | 14-17        | 6-7       | 4
Controller Read Return          | 8            | 4         | 10
Round Trip                      | 57-67        | 28-36     | 24-30
Round Trip without Memory       | 52-56        | 23-25     | 19

This quarter-rate conversion is arguably necessary to be able to use fast DDRx memory on an FPGA – how else can we transport and process fast memory data at the modest FPGA speed? However, there are both performance and efficiency disadvantages that arise from quarter-rate conversion in the memory controller.

Area Overhead: Down-converting frequency means up-converting data width from 128 bits (at single data rate) to 512 bits. This 4× difference increases the area utilization of the memory controller, the MPFE (including its FIFOs), and any soft bus that distributes memory data on the FPGA.

Latency Overhead: Operating at the lower frequency increases memory transfer latency, mainly because each quarter-rate clock cycle is much slower (4× slower) than a full-rate equivalent. Table 8.2 shows the breakdown of memory read round-trip latency for Altera's DDR3 memory controller [18]. It shows that we can improve round-trip latency more than twofold if we use a full-rate memory controller instead of the currently-used quarter-rate memory controllers.

Proposed IOLink

As outlined at the beginning of this chapter, we propose directly connecting an NoC link to I/O interfaces. For the external memory interface, we propose connecting a direct NoC link to the AXI port after the hard memory controller (see Figure 8.6). Note that we also propose implementing a memory controller that supports full-rate memory operation, even at the highest memory speeds. This topology leverages the high speed and efficiency of a full-rate controller, and avoids the costly construction of an MPFE and a soft bus to transport the data. Instead, an efficient embedded NoC fulfills the function of both the MPFE and the soft bus in buffering and transporting DDRx commands and data; furthermore, it does so at full-rate memory speed with much lower latency.

Table 8.3 details the latency breakdown of a memory read transaction when fulfilled by a current typical memory interface, and an estimate of the latency when an embedded NoC is connected directly to a full-rate memory controller. We use the latency of the memory chip, PHY and controller from Table 8.2. For the MPFE, we estimate that it will take at least 2 system clock cycles³ (equivalent to 8 memory clock cycles) to buffer data in a burst adapter and read it back out – even though this is a rough estimate, it is also a conservative estimate of the latency of a hard MPFE, which performs both buffering and arbitration. As for the soft bus, we generate buses in Altera's Qsys system integration tool with different levels of pipelining. Only highly pipelined buses (3-5 stages of pipelining) can achieve timing closure for a sample 800 MHz memory speed [6]; the round-trip latency of these buses in the absence of any traffic is 6-11 system clock cycles, depending on the level of pipelining.

³ We define a "system clock cycle" to be equivalent to the quarter-rate speed of the memory controller in our examples.

Table 8.3: Read transaction latency comparison between a typical FPGA quarter-rate memory controller, and a full-rate memory controller connected directly to an embedded NoC link. Note that latency is measured in full-rate memory clock cycles.

Current System             |         | NoC-Enhanced System     |
Component                  | Latency | Component               | Latency
Memory                     | 5-11    | Memory                  | 5-11
PHY (quarter-rate)         | 22-28   | PHY (full-rate)         | 4
Controller (quarter-rate)  | 28      | Controller (full-rate)  | 15
MPFE                       | >8      | MPFE                    | –
Soft Bus                   | 24-44   | Hard NoC                | 32-68
Total                      | 87-119  | Total                   | 56-98

To estimate the embedded NoC latency in Table 8.3, we use the zero-load latency from Figure 9.1. The round-trip latency consists of the input FabricPort latency, the output FabricPort latency and twice the link traversal latency. At a 300 MHz fabric (system) frequency, the FabricPort input latency is ~2 cycles, the FabricPort output latency is 3 cycles and the link traversal latency ranges between 1.5-6 cycles depending on the number of routers traversed. This adds up to a round-trip latency of 8-17 system clock cycles.

As Table 8.3 shows, use of the embedded NoC can improve latency by approximately 1.5×. Even though the embedded NoC has a higher round-trip latency than soft buses, overall latency improves because we use a full-rate memory controller and avoid an MPFE. We directly transport the fast memory data using an NoC link, and only down-convert the data at a FabricPort output at the destination router where the memory data will be consumed. This undoubtedly reduces area utilization as well. Even more importantly, it avoids the time-consuming timing closure iterations that are necessary whenever we connect to an external memory interface. We quantify the design effort and efficiency (area and power) advantages of using an embedded NoC over a soft bus in Section 11.1.
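The totals in Table 8.3 are simple sums of the component latencies; the sketch below redoes the addition (all numbers in full-rate memory clock cycles, taken directly from Tables 8.2 and 8.3, with the NoC round trip converted from the 8-17 system-cycle estimate at 4 memory cycles per system cycle).

    # Re-derive the Table 8.3 totals (latencies in full-rate memory clock cycles).
    MEM_CYCLES_PER_SYS_CYCLE = 4      # quarter-rate system clock

    current = {"Memory": (5, 11), "PHY (quarter-rate)": (22, 28),
               "Controller (quarter-rate)": (28, 28), "MPFE": (8, 8),
               "Soft bus": (24, 44)}
    noc_rt_sys_cycles = (8, 17)       # FabricPorts + link traversal, round trip
    enhanced = {"Memory": (5, 11), "PHY (full-rate)": (4, 4),
                "Controller (full-rate)": (15, 15),
                "Hard NoC": tuple(c * MEM_CYCLES_PER_SYS_CYCLE for c in noc_rt_sys_cycles)}

    def total(breakdown):
        lo = sum(v[0] for v in breakdown.values())
        hi = sum(v[1] for v in breakdown.values())
        return lo, hi

    print("Current system:      %d-%d cycles" % total(current))    # 87-119
    print("NoC-enhanced system: %d-%d cycles" % total(enhanced))   # 56-98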

Chapter 9

Design Styles and Rules

Contents
9.1 Latency and Throughput
9.2 Connectivity and Design Rules
  9.2.1 Packet Format
  9.2.2 Module Connectivity
  9.2.3 Packet Ordering
  9.2.4 Dependencies and Deadlock
9.3 Design Styles
  9.3.1 Latency-Insensitive Design with a NoC
  9.3.2 Latency-Sensitive Design with a NoC (Permapaths)

Having presented the interfaces between the NoC and the FPGA, we can start to implement FPGA communication on embedded NoCs. We begin this chapter by evaluating the latency and throughput of our proposed NoC, including the FabricPort. Next, we discuss important connectivity and design rules that are required for correct NoC communication. Finally, we investigate FPGA design styles – both latency-sensitive and latency-insensitive design – in the context of embedded NoCs, and show how both design styles can be used with our NoC when the proper design rules are followed.

9.1 Latency and Throughput

Figure 9.1 plots the zero-load latency of the NoC (running at 1.2 GHz) for different fabric frequencies that are typical of FPGAs. We measure latency by sending a single 4-flit packet through the FabricPort input → NoC → FabricPort output. The NoC itself is running at a very fast speed, so even if each NoC hop incurs 3 NoC clock cycles, this translates to approximately 1 fabric clock cycle. However, the FabricPort latency is a major portion of the total latency of data transfers on the NoC; it accounts for 40%–85% of the latency in an unloaded embedded NoC. The reason for this latency is the flexibility offered by the FabricPort – we can connect a module of any operating frequency, but that incurs TDM, DEMUX and clock-crossing latency. Careful inspection of Figure 9.1 reveals that the FabricPort input always has a fixed latency for a given frequency, while the latency of the FabricPort output sometimes varies by one cycle – this is an artifact of having to wait for the next fabric (slow) clock cycle on which we can output data in the DEMUX unit.

Figure 9.1: Zero-load latency of the embedded NoC (including FabricPorts) at different fabric frequencies. Latency is reported as the number of cycles at each frequency. The number of hops varies from 1 hop (minimum) to 7 hops (maximum – chip diagonal).

Additionally, the latency of the FabricPort is lowest at 300 MHz. This is because 300 MHz is exactly one quarter of the NoC frequency, meaning that the intermediate clock is the same as the NoC clock and the aFIFO reads and writes flits at the same frequency; thus no additional clock-crossing latency is incurred.

Figure 9.2 plots the throughput between any source and destination on our NoC in the absence of contention. The NoC is running at 1.2 GHz with 1-flit width; therefore, if we send 1 flit each cycle at a frequency lower than 1.2 GHz, our throughput is always perfect – we’ll receive data at the same input rate (one flit per cycle) on the other end of the NoC path. The same is true for 2-flits (300 bits) at 600 MHz, 3 flits (450 bits) at 400 MHz or 4 flits (600 bits) at 300 MHz. As Figure 9.2 shows, the NoC can support the mentioned width–frequency combinations as each is a different way to utilize the NoC bandwidth.

In 28-nm FPGAs, we believe that very few wide-datapath designs will run faster than 300 MHz; therefore the NoC is very usable at all of its different width options. When the width–frequency product exceeds the NoC bandwidth, packet transfers are still correct; however, the throughput degrades and the NoC backpressure stalls the data sender. This results in the throughput reduction shown in Figure 9.2.

Figure 9.2: Zero-load throughput of an embedded NoC path between any two nodes, normalized to sent data. A throughput of "1" is the maximum; it means that we receive i flits per cycle, where i is the number of flits we insert into the FabricPort each cycle.
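The shape of Figure 9.2 follows from a simple saturation model: throughput is perfect until the offered width × frequency product exceeds the 22.5 GB/s link bandwidth, after which it rolls off proportionally. The sketch below is that model only (it ignores contention, as the zero-load figure does); it is not the simulation used to generate the figure.

    # Zero-load throughput model for one NoC path: perfect until the offered
    # bandwidth (flits/cycle x 150 bits x fabric frequency) exceeds the link's
    # capacity, then it rolls off as the ratio of link to offered bandwidth.
    LINK_BW_BITS_PER_S = 150 * 1.2e9

    def normalized_throughput(flits_per_cycle, fabric_freq_hz):
        offered = flits_per_cycle * 150 * fabric_freq_hz
        return min(1.0, LINK_BW_BITS_PER_S / offered)

    for flits in (1, 2, 3, 4):
        for freq_mhz in (100, 200, 300, 400, 600):
            t = normalized_throughput(flits, freq_mhz * 1e6)
            print(f"{flits} flit(s)/cycle at {freq_mhz} MHz: {t:.2f}")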

9.2 Connectivity and Design Rules

This section describes different aspects of the connection of, and communication between, modules on the NoC. We first describe the packet format before discussing module connectivity using the FabricPort. Next, we look at important design constraints for FPGA communication: we discuss the importance of correct data ordering and how it is enforced using our embedded NoC, and we then show how to ensure deadlock freedom when using the NoC and FabricPort. Note that the design conditions and rules¹ presented here are necessary, but not sufficient, for the intended operation of the NoC.

9.2.1 Packet Format

Figure 9.3 shows the format of flits on the NoC; each flit is 150 bits making flit width and NoC link width equivalent (as most on-chip networks do) [52]. One flit is the smallest unit that can be sent over the NoC, indicating that the NoC will be used for coarse-grained wide datapath transfers. This packet format puts no restriction on the number of flits that form a packet; each flit has two bits for “head” and “tail” to indicate the flit at the start of a packet, and the flit at the end of a packet. The VC identifier is required for proper virtual-channel flow control, and finally, the head flit must also contain the destination address so that the NoC knows where to send the packet. The remaining bits are data, making the control overhead quite small in comparison; for a 4-flit packet, control bits make up 3% of transported data.

¹ The design rules we present are meant more as guidelines on how an embedded NoC can be made to suit FPGA communication – there could be other methods covered in the NoC literature [52] to achieve the same intended operation.

Head flit (8 control bits + 142 data bits):      Valid | Head | Tail | VC ID | Destination | Data
Body/tail flit (4 control bits + 146 data bits): Valid | Head | Tail | VC ID | Data

Figure 9.3: NoC packet format. Packets consist of a head flit and zero or more body flits. The figure shows flits for a 16-node, 150-bit-width NoC with 2 VCs. Each flit has control data to indicate whether the flit is valid, and whether it is the head or tail flit (or both, for a 1-flit packet). Additionally, each flit must carry the VC number to which it is assigned, and a head flit must contain the destination address.
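A small sketch of packing and unpacking a head flit under this format is shown below. The bit positions are an assumption inferred from Figure 9.3 (valid = 149, head = 148, tail = 147, VC ID = 146, destination = bits 145-142, data = bits 141-0) for the 16-node, 2-VC NoC.

    # Pack/unpack a head flit for the 16-node, 2-VC, 150-bit NoC.
    # Assumed bit positions: valid=149, head=148, tail=147, vc=146,
    # destination=145..142 (4 bits), data=141..0 (142 bits).

    def pack_head_flit(dest, vc, data, head=1, tail=0, valid=1):
        assert 0 <= dest < 16 and vc in (0, 1) and data < (1 << 142)
        return (valid << 149) | (head << 148) | (tail << 147) | (vc << 146) \
               | (dest << 142) | data

    def unpack_head_flit(flit):
        return {"valid": (flit >> 149) & 1, "head": (flit >> 148) & 1,
                "tail":  (flit >> 147) & 1, "vc":   (flit >> 146) & 1,
                "dest":  (flit >> 142) & 0xF, "data": flit & ((1 << 142) - 1)}

    flit = pack_head_flit(dest=5, vc=1, data=0xDEADBEEF)
    print(unpack_head_flit(flit))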


Figure 9.4: FabricPort output merging two packets from separate VCs in combine-data mode, to be able to output data for two modules in the same clock cycle.

9.2.2 Module Connectivity

The FabricPort converts 22.5 GB/s of NoC link data bandwidth (150 bits at 1.2 GHz) to 600 bits at any fabric frequency on the fabric side. An FPGA designer can then use any fraction of that port width to send data across the NoC. However, the smallest NoC unit is the flit, so we can only send 1, 2, 3 or 4 flits each cycle. If the designer connects data that fits in one flit (150 bits or less), all the data transported by the NoC is useful data. However, if the designer wants to send data that fits in one-and-a-half flits (225 bits, for example), then the FabricPort will send two flits, and half of the second flit is overhead that adds to power consumption and worsens NoC congestion unnecessarily. Efficient "translator" modules (see Figure 8.1) will therefore try to take the flit width into account when injecting data into the NoC.

A limitation of the FabricPort output arises when connecting two modules. Even if each module only uses half the FabricPort's width (2 flits), only one module can receive data each cycle, because the DEMUX only outputs one packet at a time by default, as Figure 8.4 shows. To overcome this limitation, we create a combine-data mode, shown in Figure 9.4. In this mode, when two modules are connected to one FabricPort, data for each module must arrive on a different VC. The NoC Reader arbiter must strictly alternate between VCs, and the DEMUX can then group two packets (one from each VC) before data is output to the FPGA. This allows merging two streams without incurring serialization latency in the FabricPort.

Condition 1 To combine packets at a FabricPort output, each packet must arrive on a different VC. Chapter 9. Design Styles and Rules 84

With 2 VCs, we are limited to the merging of two packets, but we can merge up to four 1-flit packets if we increase the number of VCs to four in the embedded NoC.

9.2.3 Packet Ordering

Packet-switched NoCs like the one we are using were originally built for chip multiprocessors (CMPs). CMPs only perform memory-mapped communication; most transfers are cache lines or coherency messages. Furthermore, processors have built-in mechanisms for reordering received data, and NoCs are typically allowed to reorder packets. With FPGAs, memory-mapped communication is one of two main things: (1) control data from a soft processor that is low-bandwidth and latency-critical – a poor target for embedded NoCs, or (2) communication between design modules and on-chip or off-chip memory, or PCIe links – high-bandwidth data suitable for our proposed NoC. Additionally, FPGAs are very good at implementing streaming or data-flow applications such as packet switching, video processing, compression and encryption. These streams of data are also prime targets for our high-bandwidth embedded NoC.

Crucially, neither memory-mapped nor streaming applications tolerate packet reordering on FPGAs, nor do FPGAs natively support it. While it may be possible to design reordering logic for simple memory-mapped applications, it becomes impossible to build such logic for streaming applications without hurting performance – we therefore restrict the embedded NoC to perform in-order data transfers only. Specifically, the NoC is not allowed to reorder packets on a single connection.

Definition 1 A connection (s, d) exists between a single source (s) and its downstream destination (d) to which it sends data.

Definition 2 A path is the sequence of links from s to d that a flit takes in traversing an NoC.

There are two causes of packet reordering. First, an adaptive route-selection algorithm always attempts to choose the path of least contention through the NoC; therefore two packets with the same source and destination (the same connection) may take different paths and arrive out of order. Second, when packets of the same connection are sent on different VCs, they may be reordered even if they take the same path through the NoC. To solve the first problem, we only use routing algorithms in which routes are the same for all packets that belong to a connection.

Condition 2 The same path must be taken by all packets that belong to the same connection.

Deterministic routing algorithms such as dimension-ordered routing [52] fulfill Condition 2, as they always select the same route for packets on the same connection. Eliminating VCs altogether would fix the second ordering problem; however, this is not necessary. VCs can be used to break message deadlock, merge data streams (Figure 9.4), alleviate NoC congestion and assign packet priorities, adding extra configurability to our NoC – these properties are desirable. We therefore impose more specific constraints on VCs such that they may still be used on FPGA NoCs.

Condition 3 All packets belonging to the same connection must use the same VC.

(a) Standard FabricPort output: the module cannot accept packet 1 until it has processed packet 2, so packet 2 remains queued behind packet 1 forever.

(b) Deadlock-free FabricPort output: each packet type is in a separate VC, so packet 2 may progress even if packet 1 is waiting.

Figure 9.5: Deadlock can occur if a dependency exists between two message types going to the same port. By using separate VCs for each message type, this deadlock can be broken thus allowing two dependent message types to share a FabricPort output.

Doing this in NoC routers is simple. Normally, a packet may change VCs at every router hop – VC selection is done in a VC allocator [52]. We replace this VC allocator with a lightweight VC facilitator that cannot switch a packet between VCs; instead, it inspects a packet's input VC and stalls that packet until the downstream buffer of the same VC is available. At the same time, other connections may use other VCs in that router, thus taking advantage of multiple VCs.

9.2.4 Dependencies and Deadlock

Two message types may not share a standard FabricPort output (Figure 8.3b) if a dependency exists between them. An example of dependent message types can be seen in video processing IP cores: both control messages (that, for example, configure the IP to the correct resolution) and data messages (pixels of a video stream) are received on the same port [17]. An IP core may not be able to process the data messages correctly until it receives a control message.

Consider the deadlock scenario in Figure 9.5a. The module is expecting to receive packet 2 but gets packet 1 instead; therefore it stalls the FabricPort output, and packet 2 remains queued behind packet 1 forever. To avoid this deadlock, we can send each message type on a different VC [137]. Additionally, we created a deadlock-free FabricPort output that maintains separate paths for each VC – this means we duplicate the aFIFO and DEMUX units for each VC. There are now two separate "ready" signals, one for each VC, but there is still only one data bus feeding the module; the module can therefore read from either VC0 or VC1. Figure 9.5b shows that even if there is a dependency between different messages, they can share a FabricPort output provided each uses a different VC. Note that the use of VCs to break protocol-level deadlock will lead to the use of more than 2 VCs in embedded NoC architectures, to provide sufficient flexibility.

[Figure 9.6 diagram: design styles – latency-insensitive (latency-tolerant, stallable modules) and latency-sensitive (unstallable modules); communication protocols – streaming (data-flow, point-to-point) and transaction (memory-mapped, request/reply).]

Figure 9.6: Design styles and communication protocols.

[Figure 9.7 diagram: a latency-sensitive system (modules A, B, C) mapped onto Permapaths on the NoC, and a latency-insensitive system (modules D, E, F) encapsulated in wrappers and mapped onto the NoC on the FPGA.]

Figure 9.7: Mapping latency-sensitive and latency-insensitive systems onto an embedded NoC. We re- serve Permapaths on the NoC to guarantee a fixed latency and perfect throughput for a latency-sensitive application. For latency-insensitive systems, modules must be encapsulated with wrappers to add stall functionality. use of VCs to break protocol-level deadlock will lead to the use of more than 2 VCs in embedded NoC architectures to have sufficient flexibility.

Condition 4 When multiple message types can be sent to a FabricPort, and a dependency exists between the message types, each type must use a different VC.

9.3 Design Styles

Figure 9.6 shows the two synchronous design styles, as well as the two communication protocols that are common in FPGA designs. The two design styles are "latency-insensitive" and "latency-sensitive". In a latency-insensitive system, the design consists of patient modules that can be stalled, thus allowing the interconnect between those modules to have arbitrary delay [36]. Latency-sensitive design, on the other hand, does not tolerate variable latency on its connections, and assumes that its interconnect always has a fixed latency. Either design style can be used with each of the two communication protocols discussed below. FPGA communication can be broadly classified into either streaming transfers or transactions. Simply put, streaming transfers are one-way messages that are transmitted from a source to a sink. Transactions, however, consist of requests and replies. A master module sends a request message to a slave, which in turn issues a reply message back to the master – a transaction is only complete once the reply reaches the master. Transaction communication is considered a higher-level communication protocol than the simpler streaming communication, which is why typical communication solutions implement transactions as a composition of streaming transfers [85]. We therefore focus on streaming communication in this chapter, and we discuss transaction communication in Chapter 13 in the context of our NoC CAD system. Figure 9.7 illustrates a sample mapping of latency-sensitive and latency-insensitive systems on an FPGA, and highlights that soft logic wrappers may sometimes be necessary. The next sections discuss latency-sensitive and latency-insensitive design with our NoC, and we enumerate the rules and conditions required to implement each style of communication.

9.3.1 Latency-Insensitive Design with a NoC

Latency-insensitive design is a design methodology that decouples design modules from their interconnect by forcing each module to be patient; that is, to tolerate variable latency on its inputs [36]. This is typically done by encapsulating design modules with wrappers that can stall a module until its input data arrives. This means that a design remains functionally correct, by construction, regardless of the latency of data arriving at each module. The consequence of this latency tolerance is that a CAD tool can automatically add pipeline stages (called relay stations), invisibly to the circuit designer, late in design compilation, and thus improve frequency without extra effort from the designer [36]. Our embedded NoC is effectively a form of latency-insensitive interconnect; it is heavily pipelined and buffered and supports stalling. We can therefore leverage such an NoC to interconnect the patient modules of a latency-insensitive system as illustrated in Figure 9.7. Furthermore, we no longer need to add relay stations on connections that are mapped to NoC links, avoiding their overhead. Previous work that investigated the overhead of latency-insensitive design on FPGAs used FIFOs at the inputs of modules in the stall-wrappers to avoid throughput degradation whenever a stall occurs [115]. When the interconnect is an embedded NoC, however, we already have sufficient buffering in the NoC itself (and the FabricPorts) to avoid this throughput degradation, allowing us to replace this FIFO – which is a major portion of the wrapper area – with a single stage of registers. Figure 9.8 compares the area and frequency of the original latency-insensitive wrappers evaluated in [115] and our NoC-compatible wrappers, for wrappers that support one input and one output with a width between 100 bits and 600 bits. As Figure 9.8 shows, the lightweight NoC-compatible wrappers are 87% smaller and 47% faster. We envision a future latency-insensitive design flow targeting embedded NoCs on FPGAs. Given a set of modules that make up an application, they would first be encapsulated with wrappers, then mapped onto an NoC such that the performance of the system is maximized. In Chapter 12 we describe such a CAD flow.
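The sketch below illustrates what such a lightweight, NoC-compatible wrapper stage could look like: a single register stage with ready/valid backpressure in place of an input FIFO. It is a minimal sketch assuming the NoC and FabricPort provide the buffering discussed above; signal names and the exact structure are not those of the wrappers evaluated in [115].

// Minimal sketch of a single-register latency-insensitive wrapper stage.
// It holds at most one word: a new word is captured whenever the output
// register is empty or is being drained this cycle, and backpressure is
// passed upstream otherwise.
module li_wrapper_stage #(parameter WIDTH = 600) (
    input  logic             clk, rst,
    // from the NoC FabricPort output
    input  logic [WIDTH-1:0] data_in,
    input  logic             valid_in,
    output logic             ready_out,
    // to the wrapped (patient) module
    output logic [WIDTH-1:0] data_out,
    output logic             valid_out,
    input  logic             ready_in
);
    always_ff @(posedge clk) begin
        if (rst)
            valid_out <= 1'b0;
        else if (ready_in || !valid_out) begin
            data_out  <= data_in;
            valid_out <= valid_in;
        end
    end
    // We can accept a new word when the register is empty or being read.
    assign ready_out = ready_in || !valid_out;
endmodule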

9.3.2 Latency-Sensitive Design with a NoC (Permapaths)

Latency-sensitive design requires predictable latency on the connections between modules. That means that the interconnect is not allowed to insert/remove any cycles between successive data. Prior NoC literature has largely focused on using circuit switching to achieve quality-of-service guarantees, but could only provide a bound on latency rather than a guarantee of fixed latency [68]. We analyze the latency and throughput guarantees that can be attained from an NoC, and use those guarantees to determine the conditions under which a latency-sensitive system can be mapped onto a packet-switched embedded NoC.


Figure 9.8: Area and frequency of latency-insensitive wrappers from [115] (original), and optimized wrappers that take advantage of NoC buffering (NoC-compatible), for port widths from 100 to 600 bits.

Effectively, our methodology creates permanent paths with predictable latencies (Permapaths) on our packet-switched embedded NoC. To ensure that the NoC doesn't stall due to unavailable buffering, we size NoC buffers for maximum throughput, so that we never stall while waiting for backpressure signals within the NoC. This is well-studied in the literature and is done by sizing our router buffers to cover the credit round-trip latency [52] – for our system, a buffer depth of 10 flits suffices. The NoC connection acts as a simple pipelined wire; the number of pipeline stages is equivalent to the zero-load latency of an NoC path. However, this latency is irrelevant for high-throughput applications because it is only incurred once at the very beginning of data transmission, after which data arrives on each fabric clock cycle. We call this a Permapath through the NoC: a path that is free of contention and has perfect throughput. As Figure 9.2 shows, we can create Permapaths of larger widths provided that the input bandwidth of our connection does not exceed the NoC port bandwidth of 22.5 GB/s. This is why throughput is still perfect with 4 flits × 300 MHz in Figure 9.2, for instance. To create those Permapaths we must therefore ensure two things:

Condition 5 (Permapaths) The sending module data bandwidth must be less than or equal to the maximum FabricPort input bandwidth.

Condition 6 (Permapaths) No other data traffic may overlap the NoC links reserved for a Permapath to avoid congestion delays on those links.

Condition 6 can be checked statically since our routing algorithm is deterministic; therefore, the mapping of modules onto NoC routers is sufficient to identify which NoC links will be used by each module.
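As a worked check of Condition 5 for the Figure 9.2 example mentioned above (4 flits per fabric cycle at 300 MHz, with the 150-bit flit width and 22.5 GB/s port bandwidth used in this thesis):

\[ 4~\text{flits} \times 150~\tfrac{\text{bits}}{\text{flit}} \times 300~\text{MHz} = 180~\text{Gb/s} = 22.5~\text{GB/s}, \]

which exactly matches the maximum FabricPort input bandwidth, so Condition 5 holds (with equality) and the Permapath retains perfect throughput.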

There is one final constraint that is necessary to ensure error-free latency-sensitive transfers. It pertains to the "clock drift" that occurs between the intermediate clock and the NoC clock – these are, respectively, the read and write clocks of the aFIFO in the FabricPort (Fig. 8.3). If these clocks are different and drift apart because of their independence, data may not be latched correctly onto the synchronizing registers in the aFIFO, resulting in a missed clock edge. While this doesn't affect overall system throughput by any measurable amount, it may corrupt a latency-sensitive system where the exact number of cycles between data transfers is part of system correctness – Condition 7 circumvents this problem.
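To make the drift constraint concrete before stating it formally in Condition 7, here is an illustrative example (assuming, for the sake of the numbers, the 1.2 GHz embedded NoC clock used in Chapter 12):

\[ T_{\text{NoC}} = \frac{1}{1.2~\text{GHz}} \approx 0.83~\text{ns}, \qquad T_{\text{intermediate}} = N \cdot T_{\text{NoC}}, \quad N \in \{1, 2, 3, \ldots\} \]

Intermediate clocks of 1.2 GHz, 600 MHz, 400 MHz or 300 MHz therefore keep the intermediate clock edges consistently aligned to NoC clock edges, whereas a 500 MHz intermediate clock (period 2 ns, about 2.4 NoC periods) would slowly drift against the NoC clock.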

Condition 7 (Permapaths) The intermediate clock period must be an exact multiple of the NoC clock period, to avoid clock drift and to ensure that clock edges have a consistent alignment.

Chapter 10

Prototyping and Simulation Tools

Contents
10.1 NoC Designer
10.2 RTL2Booksim
10.3 Physical NoC Emulation

Before delving into application case studies, we present the tools and methodologies that we developed to properly emulate an embedded NoC. Our architecture prototyping tool, NoC Designer, reports the area, delay or power of an embedded NoC and automatically creates a floorplan of the NoC on a sample FPGA device. To simulate the cycle-accurate behaviour of an application connected to our NoC, we developed RTL2Booksim. This simulator connects Modelsim (a hardware simulator) and Booksim (an NoC simulator), allowing complete system-level cycle-accurate simulation. Additionally, we emulate the existence of NoCs on FPGAs to model the physical placement and routing consequences of connecting a design to an NoC in an FPGA device.

10.1 NoC Designer

NoC Designer1 is an online tool to prototype hard, soft or mixed NoCs on FPGAs. It provides a visual front-end for some of the data we have gathered in our research: area, frequency and power measurements from synthesized NoC implementations. We aim to provide our data so that fellow researchers can either use it in their own work or inspect the data backing our research publications. After a user specifies any NoC configuration, the tool consults its database of measured area, speed, power and bandwidth. If the configuration is not present in our measurements database, we compute a reasonable estimate using weighted linear interpolation: the tool searches the results database for all nearby measured points, interpolates from each of those points, and multiplies each contribution by a weight that depends on the distance from that point. We found that this gives a reasonable estimate that is suitable for prototyping purposes. Note that we clearly mark interpolated results in NoC Designer to differentiate them from exact measured results. NoC Designer is accessible at: http://www.eecg.utoronto.ca/~mohamed/noc_designer.

1NoC Designer was developed by summer student Ange Yaghi under my supervision.
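One plausible formulation of this distance-weighted estimate – an illustrative assumption, since the exact weighting function is not reproduced here – is inverse-distance weighting over the set N(c) of nearby measured configurations:

\[ \hat{v}(c) = \frac{\sum_{i \in N(c)} w_i\, v_i}{\sum_{i \in N(c)} w_i}, \qquad w_i = \frac{1}{d(c, c_i)}, \]

where $v_i$ is the measured value (area, frequency or power) at configuration $c_i$ and $d(c, c_i)$ is a distance in NoC-parameter space (e.g., over channel width, number of VCs and buffer depth).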


Figure 10.1: Screenshot of NoC Designer showing its three data analysis features.

Figure 10.1 shows a screenshot of NoC Designer and highlights its three data analysis views. The first is a system-level table in which a user can enter the NoC parameters: number of nodes, number of ports per router, channel width, number of VCs and buffer depth. The user can also select the implementation option: hard, soft or mixed. The table then computes system-level metrics such as total area and power, frequency of operation, aggregate bandwidth and other important metrics. This allows a quick comparison between NoCs, and makes it easy to spot the major system-level differences. The second feature in NoC Designer is an NoC visualization panel. In this panel, a user can visualize the floorplan of any of the NoCs specified in the system-level table. The visualized NoC is interactive: by clicking on one of its components, the user can find more detailed efficiency and performance information about it. The third and final feature is a detailed graphical analysis view in which the user can plot detailed component-level results such as those we presented in Chapter 5.

[Figure 10.2 diagram: Modelsim (HDL simulator) ↔ RTL Interface ↔ Booksim Interface (via SystemVerilog DPI) ↔ Booksim (NoC simulator) over Unix sockets, together with an NoC wrapper and an NoC configuration file.]

Figure 10.2: RTL2Booksim allows the cycle-accurate simulation of an NoC within an RTL simulation in Modelsim.

10.2 RTL2Booksim

In evaluating a system-level interconnect such as an NoC, it is very important to measure its performance in terms of latency and throughput. To do that accurately, we need to perform cycle-accurate simulations of hardware designs that are connected to an NoC. The conventional way to do this entails register-transfer level (RTL) hardware simulations using a simulator like Mentor Graphics' Modelsim. Designs are entered into these simulators using an HDL such as Verilog, SystemVerilog or VHDL. Therefore, we would need an HDL version of our NoC and FabricPort to properly simulate an embedded NoC. Furthermore, the HDL implementation of the NoC would have to be parameterizable (to be able to try out different NoCs) and fully verified (to avoid errors). We first tried borrowing such an open-source implementation [54], but we quickly found it very hard to use. Besides the intermittent bugs that we discovered, it was very challenging to properly match the interface of the NoC with our FabricPort, because every time we changed an NoC parameter, the packet format became noticeably different. The lack of documentation was also a barrier to the ease-of-use of this HDL NoC implementation. Our second approach culminated in the creation of RTL2Booksim. Instead of using an HDL implementation of the NoC, we used a cycle-accurate software simulator of NoCs called Booksim [80]. This is advantageous because Booksim provides the same cycle accuracy of simulation, but runs faster than an HDL model of the NoC and supports more NoC variations. Additionally, we are able to define our own packet format (see Figure 9.3), which greatly simplifies the interface to the NoC. Finally, it is much easier to extend Booksim with a new router architecture or special feature, making it a useful research tool for fine-tuning the NoC architecture. Booksim simulates an NoC from a trace file which describes the movement of packets in an NoC. However, our FabricPort and application case studies are still written in Verilog (an HDL). How do we connect our hardware Verilog components to a software simulator such as Booksim? This is the main purpose of RTL2Booksim – to interface HDL designs to Booksim. Figure 10.2 shows some details of this interface. The Booksim Interface2 is able to send/receive flits and credits to/from the NoC modeled by the Booksim simulator. This Booksim Interface communicates with the Booksim simulator through Unix sockets. Next, there is an RTL Interface that communicates with our RTL HDL design modules. The RTL Interface communicates with the Booksim Interface through a feature of SystemVerilog called the direct programming interface (DPI).

2The Booksim Interface was initially developed by Robert Hesse in Prof. Natalie Enright Jerger's research group and was generously shared with us. We modified it to send and receive data at the flit granularity instead of the packet granularity and we added support for transmitting credits from the NoC to handle backpressure outside of Booksim.

The DPI allows one to call software functions written in C/C++ from within a SystemVerilog design file. Through these two interfaces – the Booksim Interface and the RTL Interface – we can connect any hardware design to any NoC that is modeled by Booksim.
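For readers unfamiliar with DPI, the snippet below shows the general shape of such an interface; the imported function names and argument lists are hypothetical placeholders used for illustration, not the actual RTL2Booksim API.

// Sketch of SystemVerilog DPI imports that call into a C/C++ NoC model.
// The functions below are illustrative examples of the mechanism only.
import "DPI-C" function void model_send_flit(input int router_id,
                                             input int vc,
                                             input bit [149:0] flit);
import "DPI-C" function int  model_eject_flit(input int router_id,
                                              output bit [149:0] flit);

module dpi_usage_example (input logic clk);
    bit [149:0] rx_flit;
    int         got;
    always @(posedge clk) begin
        // Inject a flit into the software-simulated NoC at router 5 ...
        model_send_flit(5, 0, 150'h1);
        // ... and check whether router 2 has a flit ready to eject.
        got = model_eject_flit(2, rx_flit);
        if (got)
            $display("ejected flit %h", rx_flit);
    end
endmodule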

topology = mesh;               // NoC topology
n = 2;                         // number of dimensions in topology
k = 4;                         // number of routers in each dimension
flit_width = 150;              // flit width
num_vcs = 2;                   // number of virtual channels
vc_buf_size = 10;              // buffer depth per VC
routing_function = dim_order;  // routing algorithm
...

Listing 10.1: Sample Booksim NoC configuration file. Note that many more parameters are customizable in Booksim [80].

As Figure 10.2 shows, we can configure the NoC using a configuration file. Listing 10.1 shows an example of that file and highlights that it is very easy to change the NoC parameters. We made sure to have a simple NoC wrapper so that it acts exactly like an HDL NoC in an RTL simulation – snippets of the NoC wrapper are shown in Listing 10.2. RTL2Booksim is released as open-source and available for download at: http://www.eecg.utoronto.ca/~mohamed/rtl2booksim. The release includes push-button scripts that correctly start and end simulation for example designs using Modelsim and RTL2Booksim.

noc_wrapper #(
    .WIDTH_NOC    (150),    // channel width
    .N            (16),     // number of routers
    .NUM_VC       (2),      // number of virtual channels
    .DEPTH_PER_VC (10)      // buffer depth per VC
) noc_inst (
    .clk_noc      (...),    // NoC clock
    .rst          (...),    // NoC reset
    // router 5 input
    .r5_data_in   (...),    // data
    .r5_valid_in  (...),    // valid
    .r5_ready_out (...),    // ready
    // router 5 output
    .r5_data_out  (...),    // data
    .r5_valid_out (...),    // valid
    .r5_ready_in  (...),    // ready
    // other routers
    ...
);

Listing 10.2: The SystemVerilog component presented by RTL2Booksim. The designer simply includes this component in their RTL design, and communication to Booksim is managed automatically.

10.3 Physical NoC Emulation

To model the physical design repercussions (placement, routing, critical path delay) of using an embedded NoC, we emulate the existence of hard NoC routers on FPGAs. Using the floorplan produced by NoC Designer, we create a design partition for each router using Altera's Quartus II software. Each of the router partitions we create is essentially a locked region on the FPGA chip to which design modules can connect. By connecting the NoC-communicating modules of a design to these router partitions we can quantify some physical design metrics and answer related research questions:

• Routing congestion: Each router in our proposed NoC contains 600 inputs and 600 outputs through the FabricPort. Additionally, the router's area is quite small since it is implemented in hard logic. What happens when a design module connects to this router using the soft FPGA interconnect? Does it produce routing congestion?

• Critical timing paths: Does the concentrated utilization of wires around a hard router perimeter force connections in a design to use more FPGA wires? Does this increase the length of a critical timing path?

• Area: By having a synthesis, placement and routing flow that includes an NoC, we can accurately measure the area of a design implemented on our FPGA.

To create a partition that accurately models a router, we must ensure the following steps are taken in creating and compiling the design in Quartus II:

1. Router partitions must have a register on the path from a router input to a router output. This is to break the timing path just as it would be broken with an actual embedded router that has several pipeline registers and buffers.

2. To prevent the router from being synthesized away if it is unused, we place a "synthesis noprune" pragma on its inputs and outputs.

3. First compile the dummy router partitions without the application you want to test. This is necessary so that the dummy router ports do not get optimized based on the application; rather, they are always fixed to this first empty compile.

4. Router partitions must use "post place and route" netlists so the router pins cannot be moved by Quartus placement to best match a design, as this would not occur with a real hard block.

5. Use "LogicLock" to fix routers to their locations in the FPGA fabric.

6. Connect your application modules to the emulated NoC in the design HDL code.

7. All unused router I/Os must terminate at a register and then a (virtual) I/O. This means that if you use just the input port of router 2, then you have to connect the outputs of router 2 through registers to a (virtual) output. This is to ensure that timing analysis doesn't ignore these paths.

8. All top-level modules connected to the NoC must have a "synthesis noprune" pragma on them if they have no path from a chip-level input to output. This will happen with modules that are only connected to a router.

Figure 10.3 shows a sample floorplan of our embedded NoC (from Chapter 7) on a Stratix V FPGA. The "router partitions" are empty partitions with 600 inputs and 600 outputs and a register between the inputs and outputs, as shown in Listing 10.3. In the shown figure, the NoC is only connected to virtual I/Os around the router – virtual I/Os are LABs that are treated as I/Os by Quartus II.

Figure 10.3: Sample floorplan of an emulated embedded NoC on a Stratix V 5SGSED8K1F40C2.

module router_partition #(parameter WIDTH = 600)
(
    input                  clk,
    input  [WIDTH-1:0]     inputs  /* synthesis noprune */,
    output reg [WIDTH-1:0] outputs /* synthesis noprune */
);

always @(posedge clk)
    outputs[WIDTH-1:0] <= inputs[WIDTH-1:0];

endmodule

Listing 10.3: Verilog code for a router partition to emulate an embedded NoC.

Chapter 11

Application Case Studies

Contents
11.1 External DDR3 Memory
  11.1.1 Design Effort
  11.1.2 Area
  11.1.3 Dynamic Power
11.2 Parallel JPEG Compression
  11.2.1 Frequency
  11.2.2 Interconnect Utilization
11.3 Ethernet Switch

In this chapter, we use the developed NoC architecture and FabricPort interface, and our simulation and prototyping tools, to study three important applications. In the first case study we show that an embedded NoC can be used to distribute data from external memory throughout the FPGA with a much lower design effort than soft buses. We also show that an embedded NoC exceeds the efficiency of soft buses for most system sizes. Next, we show how a latency-sensitive image compression application benefits from our embedded NoC in terms of both improved frequency and reduced interconnect utilization. Finally, we show how an embedded NoC can be used to implement an Ethernet switch and achieve 5× more switching bandwidth than previously demonstrated on FPGAs.

11.1 External DDR3 Memory

FPGA designers currently use soft buses to integrate large systems, either by manual design or with system integration tools such as Altera's Qsys or Xilinx's XPS. The resulting hierarchical bus consists primarily of wide pipelined multiplexers configured out of FPGA logic blocks and soft interconnect. We propose changing the FPGA architecture to include an embedded hard NoC instead. Consequently, we must compare the efficiency of the soft bus-based interconnect to our proposed embedded NoC. Previous NoC versus bus comparisons have shown that NoCs become the more efficient interconnect only for large systems [152]; however, because we compare a hard NoC to a soft bus, we find that the NoC exceeds bus efficiency even for small systems. We also measure the design effort needed to close timing on soft buses by exposing the number of steps required to implement a soft bus using commercial tools – we believe that any design effort for timing closure to external interfaces can be greatly reduced or eliminated when using an embedded NoC. Note that we do not discuss latency here as we do a much more detailed analysis of the latency of transaction systems in Chapter 13.


[Figure 11.1 diagram: left, bus-based interconnect – a 512-bit, 200 MHz multiplexer/arbiter and demultiplexer with pipeline registers connects modules 1..n to a 64-bit 800 MHz DDR3 controller, and physical distance affects timing closure; right, embedded hard NoC – a 128-bit, 910 MHz NoC with aFIFOs connects the same modules (running at 200 or 250 MHz) to the DDR3 controller on the FPGA.]

Figure 11.1: Connecting multiple modules to DDR3 memory using bus-based interconnect or the proposed embedded NoC. We use random traffic generators for these modules to emulate an application's compute kernels or on-chip memory.

Figure 11.1 shows our experimental setup. We compare the efficiency of the application-tailored soft buses generated by Qsys (Altera's system integration tool) to our embedded NoC, when used to connect to external memory. By varying the number of modules accessing memory, we emulate different applications, and study interconnects of different sizes and interconnection capabilities. For instance, packet processing applications [81] usually have only two memory masters, while video applications [16] can range from 6 to 18. We also vary the physical distance between the traffic generators and the memory they are trying to access, from "close" together to "far" on opposite ends of the chip. There are two common reasons why system-level interconnect spans a large distance: either the FPGA is full and its modules are scattered across the chip, or I/Os on opposite ends of the chip are connected to one another. This physical remoteness makes it more difficult to generate the Qsys bus-based interconnect that ties everything together.

11.1.1 Design Effort

To highlight the designer effort necessary to implement different bus-based systems, Table 11.1 lists the steps taken to meet timing constraints using Qsys. A small system, with 3 modules located physically close to external memory, does not meet timing with default settings. Typically, designers would first switch on all optimization options in the CAD tools, causing a major increase in compile time to improve timing closure. Turning on extra optimization improved design frequency, but still fell short of the target by 6 MHz. We then enabled pipelining at various points in the bus-based interconnect between modules and external memory. Only with 3 pipeline stages did we meet timing, and this came at a 45% area penalty and 340% power increase. Placing the modules physically farther away from memory makes the bus physically larger and therefore more difficult to design and less efficient: more than 2× the area and power. Furthermore, larger systems with 9 modules are still more difficult and time-consuming to design even with sophisticated tools like Qsys. Even after enabling the maximum interconnect pipelining, we need to identify the critical path and manually insert an additional stage of pipeline registers to reach 200 MHz. This timing closure process is largely ad hoc, and relies on trial-and-error and designer experience. Our proposition eliminates this time-consuming process: an embedded NoC is predesigned to distribute data over the entire FPGA chip while meeting the timing and bandwidth requirements of interfaces like DDR3.

Table 11.1: Design steps and interconnect overhead of different systems implemented with Qsys.

System Size         Physical Proximity   Design Effort                               Frequency   Area       Power
Small (3 modules)   close                (default settings)                          187 MHz      95 LABs    17 mW
                                         Max tool effort (+ physical synthesis)      194 MHz      95 LABs    17 mW
                                         Auto interconnect pipelining (3 stages)     227 MHz     139 LABs    75 mW
                    far                  (default settings)                           92 MHz      96 LABs    48 mW
                                         Max tool effort (+ physical synthesis)       96 MHz      96 LABs    51 mW
                                         Auto interconnect pipelining (4 stages)     205 MHz     299 LABs   199 mW
Large (9 modules)   far                  (default settings)                           70 MHz     249 LABs    49 mW
                                         Max tool effort (+ physical synthesis)       70 MHz     250 LABs    49 mW
                                         Auto interconnect pipelining (4 stages)     198 MHz     757 LABs   302 mW
                                         Manual interconnect pipelining (1 stage)    216 MHz     801 LABs   307 mW

11.1.2 Area

Figure 11.2 shows how the area and power of bus-based interconnects and embedded NoCs vary with the number of modules. We compare a Qsys-generated 512-bit soft bus to a 16-node 128-bit hard NoC embedded onto the FPGA as shown in Figure 11.1. Chapter 4 details the methodology for finding the area/speed/power of hard NoCs; we follow the same methodology and build on it in this section. For instance, we compute hard NoC power for our benchmarks by first simulating the benchmarks to find the total data bandwidth flowing through the NoC; we then multiply the total bandwidth by the "Power per BW" metric that we computed for hard NoCs [4]. The plotted area of the hard NoC is independent of the design size up to 16 nodes, as we must prefabricate the entire NoC onto the FPGA chip and always pay its complete area cost; nevertheless, its entire area is less than that of the optimized bus-based interconnect for designs with more than 3 modules accessing DDR3 memory. On the other hand, buses that are generated by Qsys become larger and more difficult to design with more connected modules, consuming up to 6% of the logic blocks on large FPGAs to connect to a single external memory. It is possible to combine bus-based and NoC-based communication in one system, maintaining the high configurability crucial to FPGAs. For example, if more than 16 modules access an external memory, two modules may be combined using a small soft bus to share one NoC fabric port. Additionally, portions of a design with low bandwidth may also choose to use a bus for communication.

[Figure 11.2 plots: top, interconnect area (in equivalent logic blocks, and as a percentage of Stratix V core area) versus the number of modules; bottom, dynamic power [mW] versus the number of modules. The hard NoC appears as a roughly flat line in both plots. Applications corresponding to different module counts are annotated: packet processing, broadcast, video.]

Figure 11.2: Comparison of area and power of Qsys bus-based interconnect and embedded NoC with a varying number of modules accessing external memory. Example applications corresponding to the number of modules are annotated.

The embedded NoC includes clock-crossing and data-width adapters at its FabricPorts, allowing us to connect modules running at any independent clock frequency. This is common in commercial designs as IP modules may have any frequency within the limits of the FPGA, rather than matching the DDR3 fabric interface frequency of 200 MHz. Figure 11.2 also plots the area of Qsys buses when the connected modules operate at 250 MHz and thus require aFIFOs to connect to the bus. These aFIFOs are as wide as the bus (512 bits) and hence incur a large area overhead; in fact, the area required by a one-module Qsys system with clock crossing is approximately equal to the complete hard NoC area. Each link of the hard NoC can carry more than the entire bandwidth of an 800 MHz 64-bit DDR3 channel; the NoC is underutilized in our study. Consequently, it can support connectivity to multiple I/O interfaces at once with no additional area; this further widens its efficiency advantage over buses for more complex systems. For instance, an FPGA in a recent Maxeler system connects to 6 DDR3 controllers.
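As a rough sanity check of the link-bandwidth claim above (assuming the usual double-data-rate interpretation of an 800 MHz, 64-bit DDR3 interface, i.e. 1600 MT/s):

\[ 64~\text{bits} \times 1600~\text{MT/s} = 102.4~\text{Gb/s} = 12.8~\text{GB/s} \;<\; 22.5~\text{GB/s per NoC link}, \]

so a single NoC link indeed has headroom beyond one DDR3 channel's peak transfer rate.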

[Figure 11.3 diagram: pixel_in (8 bits) and a strobe enter the DCT; 12-bit data and a strobe pass from DCT to QNR and from QNR to RLE; code_out and out_valid leave the RLE.]

Figure 11.3: Single-stream JPEG block diagram.

11.1.3 Dynamic Power

Figure 11.2 shows that NoC dynamic power is also roughly constant versus the number of modules accessing one DDR3 interface; this is for two reasons. First, we always move the same bandwidth on the NoC, divided equally amongst all accessing modules, so the interconnect itself always transports the same total bandwidth. Second, we average the power for our systems when their modules are placed physically "close" and "far" from the DDR3 interface, meaning that the same bandwidth moves a similar average distance for all systems. We use the same assumptions for Qsys buses, but they become more power hungry with larger systems. This is because the bus itself uses more resources – such as wider multiplexers and more soft interconnect – when spanning a larger distance to connect to a higher number of modules. Furthermore, we increasingly need to add power-hungry pipeline registers for larger Qsys buses to meet the 200 MHz timing requirement. In contrast, the embedded NoC routers are unchanged when we increase the number of modules. As Figure 11.2 shows, the embedded NoC is consistently more power efficient than tailored soft buses. The power gap widens further when we include clock-crossing FIFOs in a Qsys-generated bus, whereas an embedded NoC already performs clock crossing in its FabricPorts.

11.2 Parallel JPEG Compression

The DDR3 interface case study focused on the advantages of embedded NoCs in easing design effort and improving efficiency. In this section, we investigate a streaming image compression application that is very well-suited to FPGAs. We evaluate the timing variability of such an application when it is subject to different I/O placement constraints, and we quantify its utilization of soft interconnect resources and compare it to our NoC-based implementation. We use a streaming JPEG compression design from [73]. The application consists of three modules as shown in Figure 11.3: discrete cosine transform (DCT), quantizer (QNR) and run-length encoding (RLE). The single pipeline shown in Figure 11.3 can accept one pixel per cycle along with a data strobe that indicates the start of 64 consecutive pixels forming one 8×8 block on which the algorithm operates [73]. The components of this system are therefore latency-sensitive as they rely on pixels arriving every cycle, and the modules do not respond to backpressure. We parallelize this application by instantiating multiple (10–40) JPEG pipelines in parallel, which means that the connection width between the DCT, QNR and RLE modules varies between 130 bits and 520 bits. Parallel JPEG compression is an important data-center application, as images are often compressed at multiple resolutions before being stored on data-center disk drives; this forms an important part of the back-end of large social networking websites and search engines. We implemented this parallel JPEG application using direct point-to-point links, then mapped the same design to use the embedded NoC between the modules using Permapaths, similar to Figure 9.7. Using the RTL2Booksim simulator, we connected the JPEG design modules through the FabricPorts to the embedded NoC and verified the functional correctness of the NoC-based JPEG. Additionally, we verified that throughput (in number of cycles) was the same for both the original and NoC versions; however, there are ~8 wasted cycles (equivalent to the zero-load latency of three hops) at the very beginning in the NoC version while the NoC link pipeline is getting populated with valid output data – these 8 cycles are of no consequence.
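The quoted connection widths are consistent with the single-stream pipeline of Figure 11.3, assuming each stream carries its 12-bit data word plus a 1-bit strobe between stages:

\[ 10 \times (12 + 1) = 130~\text{bits}, \qquad 40 \times (12 + 1) = 520~\text{bits}. \]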

11.2.1 Frequency

To model the physical design repercussions (placement, routing, critical path delay) of using an embedded NoC, we emulated embedded NoC routers on FPGAs by creating 16 design partitions in Quartus II that are of size 7×5 = 35 logic clusters – each of those partitions represents an embedded hard NoC router with its FabricPorts and interface to the FPGA (see Figure 11.6 for the chip plan). We then connected the JPEG design modules to this emulated NoC. Additionally, we varied the physical location of the QNR and RLE modules (through location constraints) from "close" together on the FPGA chip to "far" apart on opposite ends of the chip. Note that the DCT module wasn't placed in a partition as it was a very large module and used most of the FPGA's DSP blocks. Using location constraints, we investigated the result of a stretched critical path in an FPGA application. This could occur if the FPGA is highly utilized and it is difficult for the CAD tools to optimize the critical path as its endpoints are forced to be placed far apart, or when application modules connect to I/O interfaces and are therefore physically constrained far from one another. Figure 11.4 plots the frequency of the original parallel JPEG and the NoC version. In the "close" configuration, the frequency of the original JPEG is higher than that of the NoC version by ~5%. This is because the JPEG pipeline is well-suited to the FPGA's traditional row/column interconnect. With the NoC version, the wide point-to-point links must be connected to the smaller area of 7×5 logic clusters (the area of an embedded router), making the placement less regular and, on average, slightly lengthening the critical path. The advantage of the NoC is highlighted in the "far" configuration when the QNR and RLE modules are placed far apart, thus stretching the critical path across the chip diagonal. In the NoC version, we connect to the closest NoC router as shown in Figure 11.6 – on average, the frequency improved by ~80%. Whether in the "far" or "close" setup, the NoC version's frequency only varies by ~6%, as the error bars in Figure 11.4 show. By relying on the NoC's predictable frequency to connect modules together, the effects of the FPGA's utilization level and the modules' physical placement constraints become localized to each module instead of being a global effect over the entire design. Modules connected through the NoC become timing-independent, making for an easier CAD problem and allowing parallel compilation.

[Figure 11.4 plot: frequency [MHz] versus the number of parallel JPEG streams (design size, 10–40) for the original design in the "close" and "far" configurations and for the NoC version, annotated with the average frequency loss and average frequency gain when using the NoC.]

Figure 11.4: Frequency of the parallel JPEG compression application with and without an NoC. The plot “with NoC” is averaged for the two cases when it’s “close” and “far” with the standard deviation plotted as error bars. Results are averaged over 3 seeds.

[Figure 11.5 plot: frequency [MHz] versus the number of extra pipeline stages on the critical path (0–4), with and without register retiming; a horizontal line marks the JPEG frequency when using the NoC (no extra pipelining), roughly 20 MHz (10%) above the pipelined original.]

Figure 11.5: Frequency of parallel JPEG with 40 streams when we add 1-4 pipeline stages on the critical path. Frequency of the same application when connected to the NoC is plotted for comparison. Results are averaged over 3 seeds.

With additional design effort, a designer of the original (without NoC) system would identify the critical path and attempt to pipeline it so as to improve the design's frequency. This design→compile→repipeline cycle hurts designer productivity, as it can be unpredictable and compilation could take days for a large design [115]. We plot the frequency of our original JPEG with 40 streams in the "far" configuration after adding 1, 2, 3, and 4 pipeline registers on the critical path, both with and without register retiming optimizations, and we compare to the NoC version's frequency in Figure 11.5.

Table 11.2: Interconnect utilization for JPEG with 40 streams in “far” configuration. Relative difference between NoC version and the original version is reported.

Interconnect Resource         Difference   Geomean
Short   Vertical (C4)         +13.2%       +10.2%
        Horizontal (R3,R6)    +7.8%
Long    Vertical (C14)        -47.2%       -38.6%
        Horizontal (R24)      -31.6%
Wire naming convention: C = column, R = row, followed by the wire length in logic clusters.

The plot shows two things. First, the frequency of the pipelined version never becomes as good as that of the NoC version, even with 4 pipeline stages on the critical path – on average, there is a 10% difference in frequency. Second, it doesn't really matter how many pipeline registers we place on the critical path, nor does it matter much whether register retiming is enabled. This is because register retiming occurs before placement and routing in the CAD flow, and therefore has no physical awareness of where the register will actually be placed on the FPGA device.

11.2.2 Interconnect Utilization

Table 11.2 quantifies the FPGA interconnect utilization difference for the two versions of the 40-stream "far" JPEG. The NoC version reduces long-wire utilization by ~40% but increases short-wire utilization by ~10%. Note that long wires are scarce on FPGAs: for the Stratix V device we use, there are 25× more short wires than long wires. By offloading long connections onto an NoC, we conserve much of the valuable long-wire resource. Figure 11.6 shows wire utilization for the two versions of the 40-stream "far" JPEG and highlights that using the NoC does not produce any routing hot spots around the embedded routers. As the heat map shows, FPGA interconnect utilization does not exceed 40% in that case. Conversely, the original version utilizes long wires heavily on the long QNR→RLE connection, with utilization going up to 100% in hot spots at the terminals of the long connection, as shown in Figure 11.6.

11.3 Ethernet Switch

In this application case study1, we demonstrate the built-in switching and buffering capability of an embedded NoC. We show that the embedded NoC is not only a means of data transport, but also a very fast and efficient buffered crossbar. One of the most important and prevalent building blocks of communication networks is the Ethernet switch. The embedded NoC provides a natural backbone for an Ethernet switch design, as it includes (1) switching and (2) buffering within the NoC routers, and (3) a built-in backpressure mechanism for flow control. Recent work has revealed that an Ethernet switch achieves significant area and performance improvements when it leverages an NoC-enhanced FPGA [31]. We describe here how such an Ethernet switch can take full advantage of the embedded NoC, while demonstrating that it considerably outperforms the best previously proposed FPGA switch fabric design [50].

1Most of this case study was done by Master's graduate Andrew Bitar [21]. My contribution was in the experimental inception of the plot in Figure 11.9, and in connecting the Ethernet switch through the FabricPort and RTL2Booksim.

[Figure 11.6 heat maps: left, total wire utilization for the NoC version; right, long-wire utilization for the original version. The DCT, QNR and RLE module locations are marked, with a 0–100% utilization colour scale.]

Figure 11.6: Heat map showing total wire utilization for the NoC version, and only long-wire utilization for the original version of the JPEG application with 40 streams when modules are spaced out in the “far” configuration. In hot spots, utilization of scarce long wires in the original version goes up to 100%, while total wire utilization never exceeds 40% for the NoC version.

[Figure 11.7 block diagram: transceiver inputs pass through Ethernet MAC logic and input buffering into the hard NoC, which switches them to output buffering, MAC logic and the transceiver outputs on the FPGA.]

Figure 11.7: Block diagram of an Ethernet switch that uses a hard NoC for switching and buffering.

An Ethernet switch is a large buffered crossbar for switching Ethernet frames from a transceiver input to a transceiver output. It is one of the most important building blocks of communication networks but has largely been dominated by ASIC or custom implementations because of its high bandwidth demands. An Ethernet switch is latency-insensitive as data is already grouped into Ethernet packets that can take a variable number of cycles to cross the FPGA. It is an interesting form of streaming application: while the data is in the form of streams, the streams are being switched between multiple destinations.

[Figure 11.8 diagram: Rx path – transceiver input 1 → input queue (FIFO, avalon_st) → translator → FabricPort input (150-bit noc_flit) → NoC router at node A; Tx path – NoC router at node B → FabricPort output (up to 4×noc_flit, 600 bits) → translator → output queue (FIFO, avalon_st) → transceiver output 2.]

Figure 11.8: Functional block diagram of one path through our NoC Ethernet switch [11].

Table 11.3: Hardware cost breakdown of an NoC-based 10-Gb Ethernet switch on a Stratix V device.

         10GbE MACs   I/O Queues   Translators   Total
ALMs     24000        3707         3504          31211
M20Ks    0            192          0             192

Interestingly, FPGAs have a great deal of serial transceiver I/O bandwidth (nearly 3 Tb/s in the largest Virtex Ultrascale device [146]), but FPGAs are inefficient at implementing large crossbars with centralized arbitration, making them unable to effectively switch all of the I/O bandwidth they can send or receive. The highest-bandwidth Ethernet switch demonstrated on FPGAs has supported 160 Gb/s [50], while ASIC implementations have exceeded 3 Tb/s [39]. We use an embedded NoC in place of the switch crossbar to provide a more scalable, high-bandwidth yet efficient solution for Ethernet switching – Figure 11.7 shows a block diagram of such an implementation. Sixteen transceiver inputs and outputs are connected to the 16 routers in the hard NoC after going through media access control (MAC) logic and buffering, as shown in Figure 11.7. By leveraging the embedded NoC for switching and buffering, our NoC-based switch (soft logic plus hard NoC) is ~3× smaller in area than the best previously published implementation [50]. As described below, our switch also has more than 5× the switching bandwidth of [50], which means that it is over 15× more efficient at switching per unit area than a traditional FPGA. Finally, we consider the design of the NoC-based switch to be simpler than the alternative: timing closure is easier as the most complex (switching) functionality is done in the prefabricated NoC, and it is not necessary to use the clever but more complex hardware design techniques employed in [50]. Figure 11.8 shows the path between transceiver 1 and transceiver 2; in our 16×16 switch there are 256 such paths, one from each input to each output. On the receive path (Rx), Ethernet data is packed into NoC flits before being brought to the FabricPort input. The translator sets the NoC control bits such that one NoC packet corresponds to one Ethernet frame. For example, a 512-byte Ethernet frame is converted into 32 NoC flits. After the NoC receives the flit from the FabricPort, it steers the flit to its destination using dimension-ordered XY routing. On the transmit path (Tx), the NoC can output up to four flits (600 bits) from a packet in a single system clock cycle – this is demultiplexed in the output translator to the output queue width (150 bits). This demultiplexing accounts for most of the translators' area in Table 11.3. The translator also strips away the NoC control bits before inserting the Ethernet data into the output queue. The design is synthesized on a Stratix V device and a breakdown of its FPGA resource utilization is shown in Table 11.3. Because we take advantage of the NoC's switching and buffering, our switch is ~3× more area efficient than previous FPGA Ethernet switches [50], and 15× more efficient in area per unit of bandwidth.
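As a quick check of the frame-to-flit packing mentioned above, assuming each 150-bit flit carries 128 bits of Ethernet payload with the remaining bits used for control (an assumption consistent with the 153.6 Gb/s of usable Ethernet data per 180 Gb/s link quoted below):

\[ 512~\text{bytes} = 4096~\text{bits}, \qquad \frac{4096~\text{bits}}{128~\text{bits/flit}} = 32~\text{flits}. \]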

[Figure 11.9 plot: latency [ns] (0–2000) versus injection rate (as a fraction of line rate, 0.2–1.0) for the NoC-based switch at 10, 25 and 40 Gb/s line rates, and for the Dai–Zhu 10 Gb/s switch fabric.]

Figure 11.9: Latency vs. injection rate of the NoC-based Ethernet switch design given line rates of 10, 25, and 40 Gb/s [11], and compared to the Dai/Zhu 16×16 10 Gb/s FPGA switch fabric design [50].

Two important performance metrics for Ethernet switches are bandwidth and latency [58]. The bandwidth of our NoC-based Ethernet switch is limited by the supported bandwidth of the embedded NoC. Each NoC link has a bandwidth capacity of 22.5 GB/s (180 Gb/s), and since some of this bandwidth is used to transport packet control information, the NoC's links can support up to 153.6 Gb/s of Ethernet data. Analysis of the worst-case traffic in a 16-node mesh shows that the NoC can support a line rate of one third of its link capacity, i.e. 51.2 Gb/s [31]. While previous work on FPGA switch design has achieved up to 160 Gb/s of aggregate bandwidth [50], our switch design can achieve up to 51.2×16 = 819.2 Gb/s by leveraging the embedded NoC. We have therefore implemented a programmable Ethernet switch with 16 inputs/outputs that is capable of either 10 Gb/s, 25 Gb/s or 40 Gb/s line rates – three widely used Ethernet standards. With changes only to the soft logic of the design we can also support alternative switches, such as a 32-port 10 Gb/s Ethernet switch; we are not limited to a 16-port switch simply because we have a 16-node NoC. Figure 11.9 plots the latency of our Ethernet switch at its supported line rates of 10 Gb/s, 25 Gb/s and 40 Gb/s. At every injection rate, the NoC-based switch considerably outperforms the traditional FPGA switch [50]. By supporting these high line rates, our results show that an embedded NoC can push FPGAs into new communication network markets that are currently dominated by ASICs.

Part III

Computer-Aided Design

Table of Contents

12 LYNX CAD System
   12.1 Elaboration
   12.2 Clustering
   12.3 Mapping
   12.4 Wrapper Insertion
   12.5 HDL Generation

13 Transaction Communication
   13.1 Transaction System Components in NoCs
   13.2 Multiple-Master Systems
   13.3 Multiple-Slave Systems
   13.4 Limit Study
   13.5 Transaction Systems Summary

Chapter 12

LYNX CAD System

Contents
12.1 Elaboration
12.2 Clustering
12.3 Mapping
  12.3.1 FabricPort Configurability
  12.3.2 LYNX Mapping
12.4 Wrapper Insertion
12.5 HDL Generation
  12.5.1 Mimic Flow: Simulation and Synthesis

In Part II we made hard NoCs usable by enforcing specific design constraints and using the FabricPort. We tried to make embedded NoCs very easy to use by simplifying their interface; for instance, the only backpressure signals we use are "ready" and "valid" (instead of the more complicated credit-based flow control). However, to fully leverage the NoC, an application designer still needs some expert knowledge. For example: what is the best mapping of design modules to NoC routers? Which VC should I use? When do I use combine-data mode? How do I set the control bits of a head flit in a packet? In this chapter, we present a CAD tool – LYNX – that can automatically connect any user design using an embedded NoC with any parameters, and we outline the different steps in the LYNX CAD flow and how we implement them. LYNX is a CAD tool that automatically connects an FPGA application using an NoC. LYNX connects application modules to NoC routers, generates the soft-logic wrappers that are required for semantically correct and high-performance communication, sets the constant control bits in a data packet, and generates a Verilog model of the complete system. We start with an annotated application connectivity graph (ACG). In its most basic form, the ACG is simply a definition of the modules in a system and the connections between them, the width of these connections and their type (data, ready or valid). Using only this application metadata, LYNX implements the application's connections by attaching the application modules to the NoC – the FPGA designer does not need to know anything about how the NoC works to use LYNX.


[Figure 12.1 diagram: an application connectivity graph (XML) passes through five steps – 1. Elaboration (modules and connections are identified and classified as streaming or transaction connections), 2. Clustering (e.g., modules A and B are merged into cluster AB), 3. Mapping onto NoC routers and links guided by an NoC architecture XML file, 4. Wrapper Insertion (soft wrapper logic is added), and 5. HDL Generation (Verilog files for simulation and synthesis).]

Figure 12.1: Overview of the LYNX CAD flow for embedded NoCs. Chapter 12. LYNX CAD System 111

Figure 12.1 shows an overview of the LYNX CAD flow. An application is entered as a LYNXML1 file like the example shown in Listing 12.1. The application is then elaborated and an internal graph representation of the design is created; additionally, connections are labeled either "streaming" or "transaction" as they are treated differently in later stages of the flow. The next step clusters tight latency-critical feedback loops and marks them to be implemented with lightweight, low-latency soft connections to avoid throughput degradation [37]. Next, eligible modules or clusters are mapped to available NoC routers. Following mapping, soft-logic wrappers are added, primarily to abstract NoC communication details such as packetization or to manage traffic, as we discuss in Chapter 13. Finally, simulation and synthesis files are generated so that LYNX results can be used with traditional synthesis and simulation tools.


Listing 12.1: Sample LYNXML description of two modules connected by a 128-bit wide connection.

Before going through each of the LYNX CAD steps in detail, we define two of the terms we use to avoid ambiguity:

• A bundle is a collection of ports in an application module, and must have data, valid and ready signals.

• A connection (s, d) exists between a single source bundle (s) and a destination bundle (d) to which it sends data.

1LYNXML is an XML format that describes an ACG. We hope to standardize this format for system-level interconnect research and evaluation.

12.1 Elaboration

In this first step of the LYNX CAD flow, we parse the LYNXML description of the design and create an internal graph representation of the system. This graph representation resembles the LYNXML description very closely: it has a design object which contains a list of module objects, which in turn have bundles and ports. Connections are defined as a list of (start, end) bundle pairs in the design object. The elaboration step takes the design graph as input and classifies the connections into streaming and transaction connections. Streaming connections refer to unidirectional communication between modules, while transaction connections consist of both requests and replies. Elaboration also groups transaction connections to and from the same modules into a "connection group", as shown in the illustration of Figure 12.1, where modules A, B, C, D and E form one connection group. We create this grouping because the number of modules in a transaction system affects later steps in the CAD flow, as we discuss in Chapter 13. Programmatically, the elaboration step consists of a linear inspection of connections; when a transaction connection is identified, the design graph is traversed to group all related transaction connections together. Elaboration therefore runs very quickly (typically below 1 ms) with a worst-case complexity of O(n), where n is the number of connections.

12.2 Clustering

A major limitation of NoCs is that their latency cannot be reduced below the latency required to traverse 2 FabricPorts and 1 router, as shown in Figure 9.1 – in our embedded NoC, this latency is approximately 8 clock cycles. Consider an application that contains a feedback connection like the one illustrated in Figure 12.1. The higher the latency of the feedback connection, the lower the throughput. In fact, the throughput of a feedforward system with one feedback connection is equal to 1/latency, where latency is the latency of the feedback connection. This is not a new problem, and it has been studied extensively in the context of latency-insensitive systems, where we choose where to insert pipeline registers that add latency [37]. The authors use Tarjan's algorithm [140] before deciding where to insert pipeline registers to avoid adding any latency to feedback connections. Tarjan's algorithm identifies strongly-connected components – defined as nodes of a graph that are connected together in a cycle – and outputs them as a cluster. We also use Tarjan's algorithm to cluster modules that are involved in a feedback connection; however, we ignore transaction connections as they inherently contain a feedforward and a feedback connection, and we discuss how to implement these connections efficiently, even using multi-cycle-latency NoC links, in the following section. On the other hand, feedback streaming connections must be implemented using a low-latency, lightweight interconnect to avoid throughput degradation. LYNX conservatively clusters modules in a feedback cycle and directly connects ports in this cycle using soft point-to-point links.
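As a concrete illustration of why such feedback loops are kept off the NoC, using the ~8-cycle minimum NoC feedback latency quoted above (an idealized comparison against a single-cycle soft feedback link):

\[ \text{throughput}_{\text{NoC feedback}} = \frac{1}{\text{latency}} \approx \frac{1}{8} = 0.125~\text{transfers/cycle}, \qquad \text{throughput}_{\text{soft feedback}} \approx \frac{1}{1} = 1~\text{transfer/cycle}. \]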

12.3 Mapping

Mapping is the core algorithm of the LYNX flow – it connects application modules to NoC routers. As shown in Figure 12.1, mapping takes an NoC architecture file as input. By changing NoC parameters, we can optimize a soft NoC for an application. In the case of architecting an embedded NoC before FPGA manufacture, the system architect can use LYNX to try out different NoCs on important application benchmarks before deciding on the final architecture.

[Figure 12.2 diagram: four 150-bit FabricPort slots (FPSlots 1–4) on the fabric side are time-multiplexed by the FabricPort input onto a 150-bit, 1.2 GHz embedded NoC link, and demultiplexed back to four FPSlots by the FabricPort output.]

Figure 12.2: A FabricPort time-multiplexes wide data from the FPGA fabric to the narrower/faster em- bedded NoC. 1–4 different bundles can connect to the shown FabricPort by using one-or-more FabricPort Slots (FPSlots) depending on the width and number of VCs. manufacture, the system architect can use LYNX to try out different NoCs for important application benchmarks before deciding on the final architecture.

12.3.1 FabricPort Configurability

An embedded NoC is typically ~4× faster than the FPGA application. This is why we use a FabricPort to time-multiplex data from an application onto the embedded NoC for transport [11]. Figure 12.2 shows how data moves from a FabricPort input, across the NoC, then through a FabricPort output – a FabricPort exists at each NoC router to perform this width/frequency bridging. The NoC architecture file specifies a time-multiplexing ratio, so any FabricPort (or the absence of one) can be modeled in LYNX. Each FabricPort Slot (FPSlot) is equal to the NoC width (or flit width, which we set to 150 bits), and so each input FPSlot can be used independently by an application bundle. For example, if our bundles are less than 150 bits, we can connect 4 of them to the FabricPort input, each to one FPSlot. In this scenario, each bundle’s data will be sent as a 1-flit packet across the NoC. However, a wide 600-bit bundle will use all 4 FPSlots at a router, and it will transfer its data as a 4-flit packet on the NoC. At the FabricPort output, using the FPSlots independently imposes an additional constraint: each bundle connected at the FabricPort output must receive data on a different VC. Each VC can be stalled separately, so ensuring that each bundle uses a different VC effectively decouples the bundles completely: if two bundles are connected at a FabricPort output and one of them stalls, the other can continue to receive data on a different VC. This also ensures deadlock freedom [11] as described in Chapter 8. The maximum number of bundles possible at a FabricPort output is therefore equal to whichever is smaller: the time-multiplexing ratio or the number of VCs. To clarify, Table 12.1 shows the possible FabricPort configurations for an embedded NoC with a time-multiplexing ratio of 4 and 2 VCs. The two unavailable configurations at the FabricPort output would require more than 2 VCs to work.
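These constraints reduce to a small legality check. The sketch below assumes the 150-bit flit width, time-multiplexing ratio of 4 and 2 VCs described above; it only illustrates the rule and is not the LYNX implementation.

```python
import math

FLIT_BITS = 150  # FPSlot width = NoC flit width

def flits_needed(bundle_bits):
    """FPSlots (and flits per packet) occupied by one bundle."""
    return math.ceil(bundle_bits / FLIT_BITS)

def output_config_ok(bundle_widths, tm_ratio=4, num_vcs=2):
    """A set of bundles can share one FabricPort output only if their
    flits fit in one time-multiplexing window and each bundle can be
    given its own VC."""
    slots = sum(flits_needed(w) for w in bundle_widths)
    return slots <= tm_ratio and len(bundle_widths) <= min(tm_ratio, num_vcs)

# Examples corresponding to Table 12.1 (ratio 4, 2 VCs):
assert output_config_ok([600])           # 1 bundle of 4 flits
assert output_config_ok([300, 300])      # 2 bundles of 2 flits
assert not output_config_ok([150] * 4)   # 4 bundles would need 4 VCs
```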

12.3.2 LYNX Mapping

Mapping is the CAD step that assigns (maps) application modules onto NoC routers – more specifically, mapping assigns bundles to FPSlots. Wide bundles can use one or more FPSlots, while multiple narrow bundles can use FPSlots at the same router, effectively sharing the router. For the mapping to be legal, it only needs to obey the rules described in Section 12.3.1. LYNX uses simulated annealing to map an application to an NoC. Prior work has shown that many optimization algorithms can suit the mapping problem [127]; however, we use simulated annealing because of its flexibility and scalability to larger systems. Additionally, simulated annealing makes it easy to change the cost function or add legality constraints without much effort. Initially, all bundles are assigned to “off-NoC”, and then the high cost of off-NoC bundles quickly forces bundles to connect to NoC routers.

Table 12.1: Possible FabricPort input/output configurations for a time-multiplexing ratio of 4 and 2 VCs. Two modes are unavailable in the FabricPort output because we are limited by 2 VCs.

# Bundles × Width          | Input | Output
1 × 4 flits                |  yes  |  yes
2 × 2 flits                |  yes  |  yes
4 × 1 flit                 |  yes  |  no
1 × 2 flits + 2 × 1 flit   |  yes  |  no
1 × 3 flits + 1 × 1 flit   |  yes  |  (yes)*
*Possible, but not yet implemented in LYNX.

The mapping cost function has four components:

- Path Bandwidth: To avoid NoC congestion, we add the bandwidth utilization of each NoC link to the cost function if the utilization of a link is greater than 100%. The higher the overutilization of an NoC link, the more it contributes to the cost function. An overutilized NoC link inevitably results in stalling due to contention for resources. If overutilized NoC links remain after mapping is complete, warnings are printed out to the screen as this can result in throughput degradation.

- Latency: The zero-load latency of each connection is added to the cost function so that we minimize application latency.

- Multiple-router modules: If a module has bundles that are connected to more than one router, we penalize the cost function heavily as this mapping may result in highly constrained placement and routing.

- Off-NoC: We penalize bundles that are not yet mapped on the NoC depending on the number of connections using this bundle. If a bundle has many connections and is left off-NoC, it will require expensive soft logic to connect to the rest of the application. LYNX maximizes the use of an embedded NoC – to leverage that hard resource and minimize additional soft interconnect – by prioritizing the mapping of highly connected bundles on the NoC, and giving less-connected bundles lower priority.

Equation 12.1 is the simulated annealing cost function used in mapping.² W1–W6 are constant weights that control the contribution of each component – they reflect the importance of each cost component. Util is a function that returns the link utilization: the total bandwidth of connections that use that link divided by the link bandwidth capacity. Latency is a function that returns the number of cycles of a connection’s path on the NoC assuming zero traffic. OffNoC is a boolean function that specifies whether the connection’s start/end bundles are mapped on the NoC or not. Routers is a function that returns the number of routers to which this module is connected; ideally, each module should not connect to more than one router.

²Note that the final values for the cost function constants in Equation 12.1 are not included in this thesis as they are not yet fully determined. A larger benchmark set and further experiments are required before reasonable cost function constants are decided. The current values for W1–W6 are 0.25, 2, 1, 10000, 2000 and 1. Off-NoC connections and modules connecting to multiple routers are heavily penalized, and link overutilization is emphasized more than latency.

Figure 12.3: The simplest translator takes data/valid bits and produces an NoC packet (1-flit packet format: Valid, Head, Tail, VC ID, Dest, Data). LYNX determines the destination/VC bits statically if the sending output bundle has only one possible dest/VC as shown (destination = 5 and VC = 1); otherwise, the application logic has to set the dest/VC for each outgoing data.

\[
\mathrm{Cost} = W_1 \sum_{L_i \in \mathrm{Links}} \mathrm{Util}(L_i)^{W_2}
\;+\; \sum_{C_i \in \mathrm{Conns}} \left[\, W_3\,\mathrm{Latency}(C_i) + W_4\,\mathrm{OffNoC}(C_i) \,\right]
\;+\; W_5 \sum_{M_i \in \mathrm{Modules}} \mathrm{Routers}(M_i)^{W_6}
\tag{12.1}
\]
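A direct transcription of Equation 12.1 into code could look like the sketch below. The dictionary fields are hypothetical, and, following the path-bandwidth rule above, only links with utilization above 100% contribute to the first term.

```python
def mapping_cost(links, conns, modules, W):
    """Simulated-annealing cost of a candidate mapping (Equation 12.1).
    'links', 'conns' and 'modules' are lists of dicts with the fields
    used below; W holds the weights W1..W6. Names are illustrative."""
    cost = 0.0
    # Path bandwidth: only overutilized links are penalized, raised to
    # the power W2 so heavier congestion dominates the cost.
    for link in links:
        util = link["demand_bw"] / link["capacity_bw"]
        if util > 1.0:
            cost += W["W1"] * util ** W["W2"]
    # Zero-load latency and off-NoC penalty of every connection.
    for conn in conns:
        cost += W["W3"] * conn["zero_load_latency"]
        cost += W["W4"] * (1 if conn["off_noc"] else 0)
    # Penalize modules whose bundles are spread over several routers.
    for mod in modules:
        cost += W["W5"] * mod["num_routers"] ** W["W6"]
    return cost

# Current weight values quoted in the footnote:
W = {"W1": 0.25, "W2": 2, "W3": 1, "W4": 10000, "W5": 2000, "W6": 1}
```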

12.4 Wrapper Insertion

“Wrappers” encompass any soft logic required to make communication on the NoC possible and performant. LYNX currently generates three types of soft wrappers: translators, traffic managers and response units. Traffic managers and response units are only required for transaction systems and are discussed in detail in the following section. Translators are required between any bundle and an NoC router port to translate data and control signals into the format of an NoC packet. Most translators are very simple as they only need to put data and control bits in their correct positions in a packet, and sometimes append more control bits to a packet. For example, a translator automatically appends the destination router address and VC id if a bundle has only one connection to one destination as shown in Figure 12.3. However, if a bundle may send to one-of-many destinations, then the user logic has to specify the destination router and VC, and input them to a translator which will pack those control bits in their correct position in a packet. We currently have four variations of translators to properly interact with streaming/transaction connections, and with different traffic managers. LYNX determines which translator to instantiate based on the type of connection and traffic manager, and automatically connects it in the system.
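To illustrate the simplest translator, the sketch below packs a data word and its control bits into a single-flit packet following the field order of Figure 12.3 (valid, head, tail, VC ID, dest, data). The field widths are assumptions chosen for illustration and do not reflect the exact hardware encoding.

```python
def pack_flit(data, dest, vc, valid=1, head=1, tail=1,
              data_bits=128, dest_bits=4, vc_bits=2):
    """Pack a 1-flit packet into an integer bit vector.
    Field order (MSB to LSB): valid, head, tail, VC ID, dest, data.
    With these illustrative widths the packet fits in a 150-bit flit."""
    assert data < (1 << data_bits) and dest < (1 << dest_bits) and vc < (1 << vc_bits)
    flit = valid
    flit = (flit << 1) | head
    flit = (flit << 1) | tail
    flit = (flit << vc_bits) | vc
    flit = (flit << dest_bits) | dest
    flit = (flit << data_bits) | data
    return flit

# A translator for a bundle with a single destination hard-codes dest/VC
# (e.g. destination router 5, VC 1 as in Figure 12.3):
packet = pack_flit(data=0xDEADBEEF, dest=5, vc=1)
```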

12.5 HDL Generation

The HDL generation step outputs simulation and synthesis files that can interact with other CAD tools to evaluate the performance and efficiency of the LYNX NoC interconnect. Embedded NoCs do not currently exist on FPGA devices, or in FPGA vendor tools – how then do we simulate and synthesize designs with an embedded NoC? The simulation output connects the user design to a simulation model of the embedded NoC through RTL2Booksim, and generates scripts that simulate the entire system in Modelsim. For synthesis, we use Altera’s Quartus II tools. We lock down partitions that have the same size, location and port-width as embedded NoC routers, then we connect the user design to wrappers and to these router “partitions” to accurately measure the area and frequency of the design, and to model any physical design artifacts. This methodology is described in Chapter 10.

12.5.1 Mimic Flow: Simulation and Synthesis

In the beginning of this chapter, we asserted that we only need the ACG to be able to evaluate a candidate interconnect for an application – how can we evaluate an interconnect in the absence of the actual application modules that it connects? We simply instantiate dummy modules in steps that we call “MimicSim” and “MimicSyn”. In the simulation scenario, we instantiate dummy traffic generators and analyzers for each output and input bundle respectively. Additionally, these dummy modules also produce a trace file of all the packet transfers. The traffic generators can be parametrized to send data every n cycles, where n can be set by the user. Additionally, modules that both receive and send data can be configured to respect data dependency so that a module only sends data once all or some of its inputs have received data. We use these dummy “mimic” modules to simulate an application’s connectivity together with an embedded NoC or bus interconnect. The simulation results (trace files) identify the latency and throughput at each point of an ACG and can be used to evaluate the performance of the used interconnect, be it an NoC or a bus. In MimicSyn, we generate synthesis files using dummy modules that are heavily pipelined so as not to limit frequency. They contain a mix of logic, RAM and arithmetic where the ratio of logic/RAM/arithmetic is tunable by the user through parameters to better model the actual application being evaluated. The dummy modules are connected to the interconnect – NoC or bus – which is then synthesized using standard FPGA CAD tools like Quartus II. The synthesis results provide a reasonable estimate of frequency and identify whether the interconnect contributes to limiting overall system frequency. By using these “mimic” flows, we can evaluate and, more importantly, compare system-level interconnect using only the ACG, without the need for the actual application module implementation. This would better allow the fast investigation of different system-level interconnects without a set of complete applications as a prerequisite.
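As an illustration of the MimicSim idea, the following behavioural model mimics a module that emits a transfer every n cycles and, optionally, only after all of its inputs have received data. The real dummy modules are RTL, so this Python model and its parameter names are only a sketch.

```python
class MimicModule:
    """Behavioural stand-in for a MimicSim dummy module: sends one
    transfer every 'interval' cycles, optionally gated on having
    received data on all of its input bundles."""
    def __init__(self, name, interval=1, wait_for_inputs=False, num_inputs=0):
        self.name = name
        self.interval = interval
        self.wait_for_inputs = wait_for_inputs
        self.num_inputs = num_inputs
        self.inputs_seen = set()
        self.trace = []  # (cycle, event) pairs, like the generated trace file

    def receive(self, cycle, input_id):
        self.inputs_seen.add(input_id)
        self.trace.append((cycle, f"recv input {input_id}"))

    def try_send(self, cycle):
        deps_met = (not self.wait_for_inputs
                    or len(self.inputs_seen) == self.num_inputs)
        if deps_met and cycle % self.interval == 0:
            self.trace.append((cycle, "send"))
            return True
        return False
```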

Chapter 13

Transaction Communication

Contents
13.1 Transaction System Components in NoCs ...... 118
    13.1.1 Response Unit ...... 119
13.2 Multiple-Master Systems ...... 120
    13.2.1 Traffic Build Up (in NoCs) ...... 120
    13.2.2 Credit-based Traffic Management ...... 120
    13.2.3 Latency Comparison: LYNX NoC vs. Qsys Bus ...... 122
    13.2.4 Priorities and Arbitration Shares ...... 123
13.3 Multiple-Slave Systems ...... 124
    13.3.1 Ordering in Multiple-Slave Systems ...... 124
    13.3.2 Three Traffic Managers for Multiple-Slave Systems ...... 126
    13.3.3 Traffic Managers Performance and Efficiency ...... 127
13.4 Limit Study ...... 129
    13.4.1 Area ...... 129
    13.4.2 Frequency ...... 131
13.5 Transaction Systems Summary ...... 133

Communication in FPGA applications can be classified into two main types: streaming or transaction (sometimes referred to as “memory-mapped”) communication. Streaming is the simpler of the two, as data only flows in one direction from a source module to a sink module. In transaction communication, a request goes from a master to a slave, and then a reply comes back from the slave to the master. NoC communication is inherently streaming, because data is packetized and sent from one source to one destination. We implement transaction communication on NoCs using two underlying streaming transfers – one for the request, and another for the reply. Additionally, we found that we require careful orchestration of requests/replies using soft wrappers to implement transactions on NoCs efficiently. In this chapter, we perform an in-depth treatment of transaction communication, and show how LYNX implements transactions using our embedded NoC. Specifically, we discuss how to get reasonable performance and latency of transactions, how to implement priorities and change arbitration shares, and the role of transaction ordering and its high-performance implementation. Most of the techniques we discuss in this chapter are not specific to NoCs, and can be used with any system-level interconnect; however, our techniques are particularly effective in multi-cycle-latency interconnects such as NoCs. To present our results in the context of current systems, we compare the performance and efficiency of transaction systems implemented using LYNX + embedded NoC to soft buses generated by a commercial system integration tool: Altera Qsys. The embedded NoC we use has a 150-bit link width, 16 nodes, 4 VCs, 10 buffer words per VC and a time-multiplexing factor of 4 – this is the same NoC we proposed in Chapter 7 except that we increase the number of VCs to 4 for maximum flexibility in managing transaction traffic. In the general case, transaction communication occurs between any number of masters and any number of slaves. However, a multiple-master multiple-slave system can be constructed from its building blocks: multiple-master single-slave, and single-master multiple-slave systems, as depicted in Figure 13.1. After presenting the transaction system components, we discuss multiple-master and multiple-slave systems in Sections 13.2 and 13.3 respectively. The methods we present with each type of system are composable and can easily be used together in multiple-master multiple-slave systems.

Figure 13.1: Transaction systems building blocks: multiple masters and multiple slaves.

13.1 Transaction System Components in NoCs

Figure 13.2 shows how we connect masters and slaves using an NoC. At both the master side and the slave side, LYNX automatically generates wrappers to implement transactions. A master makes a request, and the request is only issued if a “traffic manager” allows it. We discuss traffic managers in detail in the following sections. A master request then goes through a translator that formats the request data and any control fields into an NoC packet. The request packet then traverses the NoC until it arrives at the slave where it goes through another translator that extracts the data/control fields from the request packet. A response unit then stores some request fields, such as return destination, and later attaches them to the reply issued by the slave.


Figure 13.2: System-level view of master-slave connections using the NoC.


Figure 13.3: A response unit at a slave buffers the return address information (return destination router and VC) and optionally a tag, and attaches it to the slave response.

13.1.1 Response Unit

Figure 13.3 shows an implementation for the response unit. A simple translator first inspects the master request and extracts the data, valid, return destination, return VC and tag (optional). Data and valid are forwarded to the slave module, while the return information (destination router, VC, tag¹) is stored in a FIFO. As soon as the slave issues the reply, the return information is automatically attached to the reply using a translator to form a reply packet which can traverse the NoC to the master. Note that the response unit FIFO must be as deep as the number of requests that the slave can handle at any given time so that the FIFO doesn’t ever overflow. Also note that this response unit assumes that all replies are issued by the slave in the same order as the requests; otherwise, the slave itself must contain logic to properly tag the replies or reorder them before sending them to the master.

¹A tag is an optional field to uniquely identify a request or a reply.
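Behaviourally, the response unit is simply a FIFO of return headers. The sketch below assumes in-order slave replies, exactly as stated above; the field names are illustrative.

```python
from collections import deque

class ResponseUnit:
    """Stores the return header of each request and attaches it to the
    slave's reply, assuming the slave replies in request order."""
    def __init__(self, depth):
        self.depth = depth  # must cover the slave's outstanding requests
        self.fifo = deque()

    def on_request(self, return_dest, return_vc, tag=None):
        assert len(self.fifo) < self.depth, "response unit FIFO overflow"
        self.fifo.append((return_dest, return_vc, tag))

    def on_reply(self, reply_data):
        dest, vc, tag = self.fifo.popleft()
        # A translator would pack these fields into the reply packet.
        return {"dest": dest, "vc": vc, "tag": tag, "data": reply_data}
```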


Figure 13.4: Traffic build-up in a multiple-master system. Requests can build-up quickly in a multiple- master interconnect without traffic management, because all masters can send requests at the same time to a shared slave (which can only process 1 request at a time).

13.2 Multiple-Master Systems

Multiple-master systems are very common in FPGA designs; an example is access to on-chip or off-chip memory, where multiple design modules on the FPGA share memory resources. In such systems it is important to keep latency low and throughput high (to make best use of the shared slave bandwidth). Furthermore, we often need to assign different priorities to different masters sharing the same slave. In this section, we’ll look at systems that have multiple masters and a single slave.

13.2.1 Traffic Build Up (in NoCs)

Before discussing our implementation, we present an important problem that exists in any pipelined multiple-master interconnect. Figure 13.4 shows 3 masters connected to 1 slave through FIFO buffers and a multiplexer – this is a simple but valid behavioural model for any multiple-master interconnect, bus or NoC. If every master is constantly sending requests to the slave, request traffic builds up quickly in the interconnect buffering resources because the slave can only process 1 request at a time. Therefore, at steady state, each new request injected into the interconnect effectively waits for every other request that is already buffered, resulting in a high latency equal to Number of Masters × FIFO Depth, where FIFO Depth is the number of pipeline stages or buffer locations between a master and the slave. To keep up with fast I/Os and ever-larger FPGAs, the level of pipelining (FIFO Depth) in a bus-based interconnect is constantly increasing to ensure that the bus has a high frequency. In NoCs, there are reasonably large buffers (10 flits deep in our case) at each router between a master and a slave, resulting in a very large FIFO Depth and a proportionally large latency if traffic builds up in these buffers. This is especially catastrophic in NoCs, which are a shared interconnect resource. When the buffers start becoming full, the latency of all packets that are using the same routers (even if they are not going to the congested slave module) increases very quickly. To mitigate this problem, we introduce traffic management schemes that avoid traffic build-up altogether.

13.2.2 Credit-based Traffic Management

One way to solve the traffic build-up problem is to employ a credit-based traffic management scheme. Figure 13.5 shows the Credits Traffic Manager, which is placed at each master to limit the number of outstanding requests. The counter in Figure 13.5 is set to a selected “number of credits”; whenever the master sends a request the credits are decremented by 1, and whenever a reply is received the credits are incremented by 1. Zero credits stalls the master, and this ensures that no new requests are made until replies for the outstanding requests are received.

Figure 13.5: Credits traffic manager to limit traffic between a master sharing a slave with other masters.

It is crucial to select the number of credits appropriately – too many credits and traffic will build up, too few credits and the slave will be underutilized. To better visualize this, see Figure 13.6: we vary the number of credits for different systems and plot request latency, which we want to keep low, and slave throughput, which we want to be equal to 1. As predicted, increasing the number of credits improves throughput until the slave is fully utilized, and increasing the credits beyond that point only worsens latency. The ideal number of credits at each master (circled in Figure 13.6) depends on the number of masters and the round-trip latency (between sending a request and receiving its reply), and is given by the following equation:

\[
\mathrm{Credits_{ideal}} = \frac{\mathrm{Latency_{roundtrip}}}{\mathrm{Number\ of\ Masters}}
\tag{13.1}
\]

To understand why Equation 13.1 works, consider several masters communicating with 1 slave. After Latency_roundtrip cycles, a master receives a reply and therefore increments its credits and is able to send another request. In a 1-master system, the number of credits should be equal to the Latency_roundtrip so that as soon as the master runs out of credits, a reply arrives and increments the credits by 1 – this ensures that the master is constantly sending requests and the slave bandwidth is fully utilized. Equation 13.1 effectively shares that slave bandwidth equally by dividing the Latency_roundtrip by the number of masters.
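Equation 13.1 translates directly into a credit-sizing rule. The sketch below rounds down to an integer number of credits, as done in our experiments; the 17-cycle roundtrip latency in the example is an assumed value chosen purely for illustration.

```python
import math

def ideal_credits(roundtrip_latency, num_masters):
    """Equation 13.1: divide the roundtrip latency equally among masters.
    The hardware counter only supports whole credits, so round down while
    keeping at least one credit per master."""
    return max(1, math.floor(roundtrip_latency / num_masters))

# Example (assumed 17-cycle roundtrip latency): with 11 masters,
# 17 / 11 = 1.54 credits, which rounds down to 1 credit per master.
print(ideal_credits(17, 11))  # -> 1
```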

Note that the average Latency_roundtrip can be used in the case that the Latency_roundtrip is different for each master; however, latency is typically very close for different NoC locations (as Figure 9.1 shows), so the resulting unfairness is likely very small. Figure 13.7 plots the ideal number of credits for multiple-master systems while varying the number of masters. The “simulation” data series in this plot was experimentally determined by trying out different numbers of credits. Traffic generators attempt to send data every cycle but are stalled if the number of credits is zero, or if backpressure is received from the NoC. Equation 13.1 is also plotted (dotted line) – the model agrees with our simulation results very closely, and the discrepancies only exist because our Credits Traffic Managers only support an integer number of credits. We include this model in LYNX, which automatically instantiates the traffic manager and sets the correct number of credits for any multiple-master system.

Figure 13.6: Investigation of the ideal number of credits for multiple-master communication with 3, 6 and 9 masters. Each panel plots slave throughput and request latency against the number of credits: (a) 3 masters, (b) 6 masters, (c) 9 masters.

13.2.3 Latency Comparison: LYNX NoC vs. Qsys Bus

We generate two pipelined Qsys bus variants for a fair comparison with our embedded NoC. In the variant labeled “without clock crossing” in Figure 13.8, all masters and slaves operate using the same global clock, whereas “with clock crossing” denotes a system in which all the masters use one clock, and the slave uses a different clock. Qsys generates asynchronous FIFOs to bridge between two clock domains, but this adds both area and latency. The embedded NoC contains clock-crossing circuitry built into the FabricPorts, so each master and slave can use an independent clock without additional soft logic. Figure 13.8 compares the roundtrip latency of multiple-master systems of different sizes. The latency of Qsys buses increases linearly with the number of masters because of the traffic build-up problem discussed in Section 13.2.1. However, when traffic managers are used in LYNX, the latency remains more-or-less constant² as we increase the number of masters. Even though the zero-load latency of Qsys buses is close to half that of our embedded NoC (8 cycles compared to 18), proper traffic management in the NoC results in a lower roundtrip latency in a high-throughput multiple-master system.

Figure 13.7: Ideal number of credits for NoC traffic managers to minimize request latency. Both the experimental evaluation from Figure 13.6 and the model from Equation 13.1 are shown.

13.2.4 Priorities and Arbitration Shares

Qsys instantiates a fair round-robin arbiter by default. However, the user can also assign “arbitration shares” to each master, specifying exactly how many requests to accept from each master in each round of arbitration, thus giving a higher priority to masters with more shares [49]. This unfair arbitration is easily implemented in the Qsys bus central arbiter. For NoCs: how do we implement these priorities where arbitration is not centralized in a single arbiter, but rather distributed among the routers in the NoC? We build on our credit-based traffic management scheme to assign priorities. We assign more credits to high-priority masters, and fewer credits to lower-priority masters, such that the total number of credits is still equal to Latency_roundtrip (or a value very close to it) to avoid traffic build-up. Figure 13.9 shows an experimental investigation of this priority scheme where we have 10 masters in the system, only one of which has a higher priority. The bars show the throughput of the priority master and a normal master as we vary the credits ratio between them. When the credits ratio is 1, they both have the same number of credits and therefore the same throughput – one tenth. However, as we increase the credits ratio, the priority master’s throughput increases while the normal masters’ throughput decreases. The line in Figure 13.9 plots the arbitration share of the priority master as it varies with the credits ratio, and it shows a linear relationship between the two. By adjusting the credits ratio at each master we can implement different arbitration shares in a distributed NoC interconnect, similarly to Qsys centralized buses.

²The fluctuations are due to the difference between the ideal and actual number of credits used. For example, the ideal number of credits for an 11-master system equals 1.54, but we round this value to 1 in our experiments.

Figure 13.8: Comparison of LYNX NoC and Qsys bus latencies in a high-throughput system.
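One simple way to realize such arbitration shares in code is to split a total credit budget of roughly Latency_roundtrip among the masters in proportion to their shares. This is only a sketch of the policy described above, with an assumed 18-cycle roundtrip latency; it is not the LYNX implementation.

```python
import math

def assign_priority_credits(roundtrip_latency, shares):
    """Split a credit budget of roughly 'roundtrip_latency' among masters
    in proportion to their arbitration shares (higher share = more
    credits). Rounding down keeps the total at or below the budget, which
    avoids the traffic build-up of Section 13.2.1."""
    total_share = sum(shares.values())
    return {m: max(1, math.floor(roundtrip_latency * s / total_share))
            for m, s in shares.items()}

# Example: 10 masters, one with 3x the share, 18-cycle roundtrip latency.
shares = {"m0": 3, **{f"m{i}": 1 for i in range(1, 10)}}
print(assign_priority_credits(18, shares))
# -> m0 gets 4 credits, every other master gets 1.
```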

13.3 Multiple-Slave Systems

An example of a multiple-slave system is a processor (master) controlling multiple slaves such as memory units, accelerators or I/O devices. In high-performance FPGA applications, a common multiple-slave system is a banked on-chip or off-chip memory, where a memory is divided into separate banks to provide fast and parallel data storage/access. In this section, we investigate how to build a high-performance multiple-slave system using NoCs while maintaining proper transaction ordering to avoid data dependency hazards.

13.3.1 Ordering in Multiple-Slave Systems

Figure 13.9: Changing the number of credits at a master increases its arbitration share, thus giving it priority access to the shared slave. The plot shows a linear relationship between the credits ratio and the throughput share. Experiments run on a 10-master system.

Figure 13.10: Requests to multiple slaves can result in out-of-order replies. Different slave processing latencies cause reply 2 to arrive at the master before reply 1.

Figure 13.10 shows the timing diagram of a master connected to two slaves. It sends request 1 to slave 1, and request 2 to slave 2, but receives reply 2 before reply 1 because slave 1 has a longer processing latency. Even if all slaves have equal processing latencies, replies can arrive out-of-order because of different interconnect latency or simply if the slave was busy when the request was sent. This out-of-order reply delivery can be problematic in an FPGA system where the master expects replies to arrive in order. It is our experience that this is a common assumption in FPGA systems, and that it is left to the interconnect to guarantee correct ordering of transactions. We therefore have to guarantee ordering in LYNX NoCs to qualify as a system-level interconnect and have the same correctness properties as existing buses such as those generated by Qsys – we present three ways to do this in the following sections.

Figure 13.11: Different traffic managers to manage communication between a master and multiple slaves: (a) Stall Traffic Manager, (b) VC Traffic Manager, (c) ROB Traffic Manager.

13.3.2 Three Traffic Managers for Multiple-Slave Systems

In this section we present three traffic managers to ensure ordering within multiple-slave systems. All three traffic managers include a Credits Traffic Manager (Section 13.2.2) to limit the number of outstanding requests destined to each slave – we have separate counters for each slave in the system. For each traffic manager we can give an equation for its maximum throughput in terms of:

- N_req1slave: Number of consecutive requests to the same slave.
- Latency: Roundtrip latency between master and slave.
- N_VC, N_slave, N_credits: Number of VCs, slaves, credits.

Stall Traffic Manager

A straightforward way to ensure ordering is to conservatively stall the master whenever the destination slave is changed until all outstanding replies are received. This Stall Traffic Manager – also used in Qsys buses – can hurt throughput considerably if the master switches slaves often; however, it is easy to implement and small. Figure 13.11 shows our implementation for the Stall Traffic Manager. A simple control unit keeps track of the current destination slave, and if the destination slave changes, the master request is stalled and buffered in a shallow FIFO until all outstanding replies arrive at the master.

\[
\mathrm{Throughput_{Stall}} = \frac{N_{req1slave}}{N_{req1slave} + \mathrm{Latency}}
\tag{13.2}
\]

VC Traffic Manager

We leverage VCs to avoid stalling every time the destination slave is changed. The VC Traffic Manager assigns a different VC to each slave, then chooses which VC to read replies from based on the order in which the requests were sent. The VC Traffic Manager in Figure 13.11 inspects the request destination, then allocates a VC for it. For the next request, if the destination is the same it uses the same VC; if the destination is different, a different VC is allocated. While requests are being sent out, the assigned VCs are stored in a FIFO; this tells the traffic manager the next VC it should read from for correct ordering. Note that if we have more slaves than VCs, then the traffic manager stalls until a VC becomes available (all its outstanding replies arrive).

 Nreq1slave NVC( ) NVC < Nslave Nreq1slave+Latency ThroughputVC = (13.3)  1 NVC ≥ Nslave

Reorder Buffer (ROB) Traffic Manager

The ROB Traffic Manager adds an 8-bit number (or “tag”) to each request it sends out, and it only stalls the master when it runs out of credits, similarly to the Credits Traffic Manager (Section 13.2.2). The response unit (Figure 13.3) ensures that the tag for each request is attached to the slave reply on the return path to the master. The ROB Traffic Manager then uses that tag to store the incoming reply in a unique location in a hash table, and these replies are read in the correct order in which they were sent. Note that the number of entries in the hash table must be equal to the number of credits so that there are never any collisions (writing replies to the same location in the hash table). This makes this traffic manager very tunable: the more credits we have, the better the throughput, but this also comes at the extra area cost of buffering in the hash table.

\[
\mathrm{Throughput_{ROB}} =
\begin{cases}
\dfrac{N_{credits}}{\mathrm{Latency}} & N_{credits} < \mathrm{Latency} \\[1ex]
1 & N_{credits} \ge \mathrm{Latency}
\end{cases}
\tag{13.4}
\]
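The three throughput expressions (Equations 13.2–13.4) are easy to compare side by side. The snippet below is only the analytical model; it assumes an 18-cycle roundtrip latency and caps Equation 13.3 at a throughput of 1.

```python
def throughput_stall(n_req_1slave, latency):
    # Equation 13.2: pay a full roundtrip latency on every slave switch.
    return n_req_1slave / (n_req_1slave + latency)

def throughput_vc(n_req_1slave, latency, n_vc, n_slave):
    # Equation 13.3: up to n_vc slaves can have requests in flight.
    if n_vc >= n_slave:
        return 1.0
    return min(1.0, n_vc * n_req_1slave / (n_req_1slave + latency))

def throughput_rob(n_credits, latency):
    # Equation 13.4: limited only by the number of outstanding requests.
    return min(1.0, n_credits / latency)

# Worst-case traffic (switch slave on every request), 18-cycle latency,
# 4 VCs, more than 4 slaves, and an ROB sized to the latency:
print(throughput_stall(1, 18))     # ~0.05 requests/cycle
print(throughput_vc(1, 18, 4, 8))  # ~0.21 requests/cycle
print(throughput_rob(18, 18))      # 1.0 requests/cycle
```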

13.3.3 Traffic Managers Performance and Efficiency

Figure 13.12 plots the master throughput in multiple-slave systems generated by LYNX and Qsys. On the x-axis of Figure 13.12, we vary the number of consecutive transfers to each slave. When this value is 1, the master changes the slave it sends to every request – this is the worst-case traffic pattern. In this case, the Stall Traffic Manager performs very poorly as it has to stall each time the slave is changed. It is worse for our embedded NoC compared to Qsys buses because of the higher roundtrip latency. The VC Traffic Manager (with 4 VCs) improves throughput fourfold but must stall because the number of slaves is greater than the number of VCs in Figure 13.12; however, if the number of slaves were 4 or less, the throughput would always be the maximum. Finally, an ROB Traffic Manager with N_credits = Latency always has the maximum throughput. With fewer credits, the master throughput decreases linearly until, with 1 credit, the ROB Traffic Manager becomes the same as the Stall Traffic Manager.

Figure 13.12: Maximum master throughput in a multiple-slave system with more than 4 slaves. The number of consecutive requests to the same slave is varied on the x-axis. Different traffic managers are analyzed.

Figure 13.13 compares the area of the three traffic managers as we vary the number of credits. We measure area in equivalent Altera logic clusters (LABs³). The Stall and VC Traffic Managers use ~23 LABs, mainly for the 2-word FIFO which gets implemented using FPGA flip-flops. The ROB Traffic Manager contains sizable hash tables that are implemented as block RAM, and is approximately 1.8× the size of the other traffic managers. All in all, the traffic managers needed for multi-slave communication are not large by the standard of today’s FPGAs – the smallest Stratix-V FPGA from Altera has 8900 LABs and 688 M20K BRAMs (or 11652 equivalent LABs), and the largest is 46480 equivalent LABs. The largest multiple-master multiple-slave system we can build using our NoC with a width of 300 bits (for example) will have 30 masters and 2 slaves. In this case we’ll need 30 TMs (one per master) for a total area of approximately 690 equivalent LABs – this absolute worst case uses between 1.3% and 8.0% of the FPGA’s LAB and BRAM area, depending on the FPGA size and the selected traffic manager. However, we stress that this pessimistic estimate of additional area overhead is only needed for multiple-slave systems that require a guarantee of ordering; for all other systems we instantiate our Credits Traffic Manager, which has a negligible area (less than 1 LAB each). Additionally, traffic managers of all types (for both multiple-master and multiple-slave systems) are distributed, with one unit placed near each master; therefore, traffic managers do not impact system scalability in terms of frequency or latency.

³To compute equivalent LABs, we add the logic area (number of LABs) and the block RAM area (number of BRAMs × 4), since each M20K BRAM is as big as 4 LABs [124].

Figure 13.13: Area of the different traffic managers with width = 300 bits as we increase the maximum number of outstanding requests (credits).
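The area bookkeeping above is simple enough to reproduce. The sketch below uses the equivalent-LAB rule from footnote 3 and the approximate traffic-manager and device sizes quoted in the text; the resulting percentages are only a ballpark check of the worst-case estimate.

```python
def equivalent_labs(labs, m20k_brams):
    """Footnote 3: each M20K BRAM is counted as 4 LABs."""
    return labs + 4 * m20k_brams

# Smallest Stratix-V quoted above: 8900 LABs + 688 M20K BRAMs.
smallest = equivalent_labs(8900, 688)   # = 11,652 equivalent LABs
largest = 46_480

# Worst case quoted above: 30 traffic managers at ~23 equivalent LABs each
# (the ROB TM is ~1.8x larger than that).
worst_case_area = 30 * 23               # ~690 equivalent LABs
for device in (smallest, largest):
    print(f"{100 * worst_case_area / device:.1f}% of device")
# -> roughly 5.9% of the smallest and 1.5% of the largest Stratix-V.
```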

13.4 Limit Study

We have shown how LYNX automatically connects an application to an (embedded) NoC, and by using traffic managers, embedded NoCs can implement higher-throughput and in many cases lower-latency transaction communication compared to Qsys buses. The LYNX CAD system is now comparable to Qsys since it implements most of its features (transactions, streaming, priorities, ordering). In this section, we compare the overall efficiency (area and frequency) of a LYNX interconnection solution using an embedded NoC, and a Qsys interconnection solution implemented as a bus. We use an NoC with 4 VCs, 150-bit width and 16 nodes – this embedded NoC’s area is equivalent to 800 LABs [13]. In different FabricPort modes (discussed in Section 12.3.1), we can connect up to 64 modules of 150-bit width, 32 modules of 300-bit width or 16 modules of 600-bit width. We quantify the efficiency of the first option in more detail in this section.

13.4.1 Area

Figure 13.14 shows the area of the embedded NoC as compared to any Qsys bus that can implement transaction systems that fit on our embedded NoC. The x-axis shows the total number of modules in a system. For example, a 32-module system has 31 masters and 1 slave in a multiple-master system, 1 master and 31 slaves in a multiple-slave system, or 16 masters and 16 slaves in a crossbar system. We also synthesize Qsys buses that have 2 clock domains – a realistic test case for modern FPGAs that can support tens of clocks. In the case of embedded NoCs, clock-crossing circuitry is already included at each router’s FabricPort, which allows each module to use a different clock [11]. Our 4-VC embedded NoC has an area equivalent to 1.7% of the largest Stratix-V device – this is smaller than most Qsys bus-based systems as shown in the figure. For relatively small buses that interconnect ~10 modules or less, a Qsys soft bus is smaller than the area of an embedded NoC (at best ~8× smaller). However, as the number of modules increases, Qsys buses grow beyond the area of the embedded NoC. Qsys-generated crossbars are especially huge; a 32×32 crossbar with 2 clock domains is larger than the entire area of the largest Stratix-V FPGA and 78× the area of the embedded NoC. This highlights both the infeasibility of large crossbars on FPGAs⁴, and the efficiency of a hard system-level interconnect such as an embedded NoC.

Figure 13.14: Area of Qsys buses of varying number of modules and 128-bit width. Different connectivity buses are shown: multiple masters accessing a single slave, a single master issuing requests to multiple slaves, and a fully connected crossbar.

Figure 13.14 only included the area of the embedded NoC and neglected the area overhead of wrappers. For completeness, Figure 13.15 plots the area of each type of system (multiple master, multiple slave or crossbar) separately, and adds the area of the NoC traffic managers and response units to the NoC area. The areas for the credits traffic manager and the response unit are each ~1 LAB, while the area of the stall traffic manager is ~15 LABs. For the multiple master system in Figure 13.15a there are 63 masters and 1 slave, so we need 63 credits traffic managers and 1 response unit. For the multiple slave system in Figure 13.15b, we need 1 stall traffic manager for the master and 63 response units for the slaves. The added area due to the wrappers is almost negligible for both the multiple master and multiple slave systems; however, for the crossbar system, we use 32 stall traffic managers to ensure that transaction ordering is correct, which increases area proportionally to system size as Figure 13.15c shows. Nevertheless, the system size only grows from 800 LABs in a 1-master, 1-slave system to 1344 LABs in a 32-master, 32-slave system; we conclude that the wrappers area is relatively small for realistic systems.

⁴Large crossbars can push FPGAs into new markets such as building an Ethernet switch [11, 31], or implementing hardware MapReduce [134].

Figure 13.15: Breakdown of Figure 13.14 including the wrappers area for traffic managers and response units that are required with the hard NoC: (a) multiple master systems, (b) multiple slave systems, (c) crossbar systems.

13.4.2 Frequency

Figure 13.16 shows the frequency of transaction systems connected by Qsys buses and the LYNX embedded NoC. We use the “MimicSyn” flow (see Section 12.5.1) by connecting dummy modules to the bus and embedded NoC, and measure the resulting overall frequency. The mimic module has a high frequency in isolation (~525 MHz) and it consists of a heavily pipelined array of soft logic (199 LABs), multipliers (7 DSP blocks) and BRAM (15 M20K BRAMs) – this mix of FPGA resources is meant to model an average-case realistic application module that does not limit overall application frequency.


Figure 13.16: Comparison of NoC and bus frequency for 128-bit systems. When using the embedded NoC, each module can run on an independent clock; the error bars show the range of frequencies for each module.

When connected to the embedded NoC, each module in the system operates at an independent clock – we plot the minimum, maximum and average of these module clock frequencies using error bars on the LYNX data series in Figure 13.16. In highly-connected systems, the minimum frequency will typically govern overall system performance because data will have to be processed by the slowest module – this is true for an application consisting of a cascade of streaming modules. However, in more decoupled systems where there are multiple independent modules processing data in parallel, the average speed of the modules affects performance. For example, if there is a master requesting data from two slave memory modules equally, and these slaves are running at 100 MHz, and 150 MHz, their effective frequency is the average (125 MHz) because half the requests will complete more quickly at 150 MHz, while others will run at the slower 100 MHz clock.

To model the physical design repercussions (placement, routing, critical path delay) of using an embedded NoC, we emulated embedded NoC routers on FPGAs by creating 16 design partitions in Quartus II that are of size 10×5 = 50 logic clusters; each one of those partitions represents an embedded hard NoC router with its FabricPorts and interface to the FPGA [11]. Figure 13.16 shows that the LYNX embedded NoC achieves approximately 1.5× the frequency of single-clock Qsys soft buses. Furthermore, the connection pattern does not influence frequency in the embedded NoC as it does for a Qsys bus – the NoC itself does not change, we are just using it in a different way; however, with Qsys buses, the generated bus is different depending on the number of masters and slaves.

13.5 Transaction Systems Summary

This chapter focused on the implementation of transaction communication using an embedded NoC. We specified the components of a transaction system connected through an NoC: a traffic manager is required at the master side, while a response unit is required at the slave side. We then delved into the circuit details of each of those components and found that different traffic managers are required for different systems. We used a credits traffic manager to avoid traffic build-up in NoCs in a multiple-master system and to create fair distributed arbitration. We also implemented unfair (priority) arbitration by adjusting the number of credits at each master. For multiple-slave systems we presented three traffic managers that ensure correct transaction ordering. By adding full transaction communication support to our LYNX CAD flow and embedded NoC interconnect, we have a complete solution that is directly comparable to current commercial system integration tools like Qsys. Our embedded NoC results outperform Altera’s Qsys soft customized buses in latency, throughput and area efficiency, especially for larger and higher-throughput systems.

Chapter 14

Summary and Future Work

Contents
14.1 Summary ...... 135
14.2 Future Work ...... 137
    14.2.1 LYNX Enhancements ...... 137
    14.2.2 Mimic Benchmark Set ...... 138
    14.2.3 Application Case Studies ...... 138
    14.2.4 Virtualization with Embedded NoCs ...... 139
    14.2.5 High-Level Synthesis with Embedded NoCs ...... 139
    14.2.6 Partial Reconfiguration and Parallel Compilation ...... 139
    14.2.7 Latency-Insensitive Design ...... 140
    14.2.8 Multi-Chip Interconnect and Interposers ...... 140

14.1 Summary

This thesis investigated the addition of embedded NoCs to FPGAs for system-level communication. In Part I we performed a detailed efficiency and performance analysis of NoC subcomponents. We measured the area, speed and power of the five router subcomponents and the NoC links when implemented either hard or soft. This quantified the design tradeoffs between hard and soft implementation of each NoC component, and we used these component-level results to architect complete NoCs that are suitable for FPGAs. We presented both mixed (hard routers and soft links) and hard (hard routers and links) NoCs, with particular attention to their floorplan in, and interface to, the existing FPGA fabric. Our architecture exploration recommended the use of a hard NoC in FPGAs because of its efficiency and performance in transporting data. Additionally, a hard NoC is completely disjoint from the FPGA fabric, making its speed and area known at design time – an important property that is not fulfilled by soft or mixed NoCs. We prototyped a hard 16-node NoC suitable for 28-nm FPGAs with the capability of transporting 150 Gb/s on each of its links, and we showed that it only consumes ~1.3% of the area of a modern FPGA.

After architecting the NoC and quantifying its area, speed and power in Part I, we defined how to design FPGA applications using the embedded NoC in Part II. We started by creating the FabricPort: a flexible interface between NoC routers and the FPGA fabric. The FabricPort is responsible for adapting the width and frequency between the FPGA and the NoC – this allows us to connect modules of any width and frequency while running the NoC at a fixed (and very fast) speed to minimize communication latency and maximize throughput. We also discussed IOLinks, which extend the NoC to connect directly to I/O interfaces and improved the latency of accessing external memory in our case study. Next, we defined the rules and constraints that are required for semantically-correct FPGA communication using an embedded NoC. We discussed important constraints related to data ordering and deadlock-freedom, and we investigated the requirements of both latency-sensitive and latency-insensitive design.

This provided all the information we required to implement complete applications using our NoC; however, we still lacked the tools to implement an FPGA system that includes an embedded NoC. This motivated the creation of simulation and prototyping tools that emulate an embedded NoC within an FPGA. We created NoC Designer to estimate the area, speed and power of NoCs, and RTL2Booksim to simulate an embedded NoC including FabricPorts with an application. We also emulated the existence of an embedded NoC in Altera’s Quartus II to investigate the physical design repercussions of using such an NoC. Using the tools we developed, we were able to implement complete applications using the embedded NoC, and we implemented three. We first showed that using a prefabricated NoC eased connection to external DDRx memory by eliminating time-consuming timing-closure iterations, and was also more efficient compared to a soft bus. The second application was parallel image compression, and it highlighted the predictable speed of connecting to an NoC router from anywhere on the FPGA chip. In addition, we showed that an embedded NoC improves soft interconnect utilization, and that there are no interconnect hot spots around NoC routers. The final application case study was an Ethernet switch – based on our embedded NoC – that supported 5× more switching bandwidth than any previously demonstrated FPGA packet switch while consuming only one third the area.

In Part III, we developed a CAD system called LYNX to automatically connect FPGA applications using an embedded NoC. LYNX uses the connectivity graph of an application to decide how and where modules will connect to the NoC. The responsibilities of LYNX include: application clustering, module–router mapping, assigning the VC, configuring the FabricPort, generating any soft wrappers required, and generating the overall system HDL files for simulation and synthesis. Additionally, the NoC is itself parameterizable in LYNX through an extensible markup language (XML) file, thus allowing NoC architecture exploration. The final topic we investigated in this thesis is transaction communication, and how to properly implement it using the embedded NoC. We automated the interconnection of transaction systems using LYNX, and compared our LYNX plus embedded NoC interconnection solution to Altera’s Qsys and showed that we can achieve better latency and throughput in many cases.

In summary, our dissertation developed the different aspects of using embedded NoCs in FPGAs. We presented compelling solutions for the architecture, design and CAD of NoC-enhanced FPGAs, and we proved our propositions through application case studies and comparisons to existing interconnection solutions. Our results show that our NoCs can be a very useful embedded system-level interconnect to augment current FPGAs. Finally, we also created open-source tools and methodologies for the simulation, prototyping and CAD of embedded NoCs to encourage further research in this area.

14.2 Future Work

In this section, we list the avenues for research – building on the work presented in this thesis – that we believe are most promising.

14.2.1 LYNX Enhancements

In the immediate future, there are many enhancements that could be added to the LYNX CAD flow to make it more complete.

Multicast support

LYNX currently does not support multicast (sending the same data from one location to many). However, related work has shown that multicast is important for some FPGA applications [125, 126]. There are two basic methods by which embedded NoCs can support multicast. The first (and simpler) way is to send the same data multiple times, but this incurs time or resource penalties. This can be done by using a single NoC input, and sending the data serially (one-after-the-other). Alternatively, multiple NoC inputs can be used and the multicast data can be sent in parallel at the same time. The second multicast implementation method is to leverage previous research that investigated adding multicast support to NoC routers [60]. By adding multicast support to the routers themselves, we can avoid the time or resource penalty that can arise from the first method.

Detailed Application Specification

LYNX currently optimizes latency and throughput globally so that the overall performance of an application is maximized. However, this might ignore some detailed information about the application modules. For instance, if a module only sends data every 2 cycles (its initiation interval is equal to 2), this information can be used to better optimize the application. Specifically, in this case, this would relax the throughput constraints on some NoC links during the mapping step, for example. Other detailed information includes latency requirements (or the lack thereof) on certain connections or the entire application. This would also govern how modules are mapped to NoC routers and may better guide the LYNX flow. Additionally, LYNX would then be able to report exactly whether each application constraint was achieved, or how close the tool came to fulfilling the requirements.

Benchmarks and Testing Infrastructure

We need to add more application benchmarks to LYNX. First, this is to track the performance and efficiency of LYNX’s interconnect solutions for these applications. Second, like any CAD software, LYNX needs to undergo comprehensive testing every time its code is modified – the larger the benchmark test set, the better the quality of these tests. Unit testing the classes and functions of LYNX is also imperative as these tests are currently lacking. Ideally, an integrated build/test system should be used so that whenever new code is added, all LYNX unit and benchmark tests would be run automatically.

NoC Architecture and I/Os

LYNX uses an NoC architecture that is entered through an XML file. This makes the NoC very configurable; however, not all NoC parameters are currently exposed to the user – for example, the routing function, NoC frequency and topology (we currently only support a mesh topology). Additionally, we have no way to specify the I/O interfaces to which the NoC connects. The ability to better model an NoC architecture in LYNX will allow more detailed results and will also allow us to better explore the performance/efficiency of our benchmarks on different NoC architectures.

14.2.2 Mimic Benchmark Set

We described LYNXML and the Mimic flows in Chapter 12. The main idea is that we only need some application meta-data to generate and evaluate a system-level interconnect. Using an annotated ACG, we were able to generate NoC interconnection solutions from LYNX and bus interconnects from Qsys, and we compared the two in Chapter 13. We hope to extend that idea and create a comprehensive set of Mimic benchmarks that target different aspects of system-level interconnection. For example, one group of benchmarks can target streaming applications, while another group of benchmarks can target transaction-based communication, and a third benchmark group can combine the two ideas. Some more detailed benchmarks can also target interconnect capabilities such as correct ordering of transfers or multicast capability. By running the resulting benchmark suite through a system-level interconnect CAD tool, one can easily determine the capabilities of that tool, and more importantly, one can compare that tool easily against others. We believe that this is the appropriate time to create such a comprehensive benchmark set for system-level interconnects. As FPGAs increase in size and complexity, interconnecting a complete application always entails some form of system-level interconnect. This is especially true for connecting fast I/Os and on-chip hard processors to applications configured using the FPGA fabric. Additionally, the research into system-level interconnection tools is on the rise. Among recent work is CONNECT [117], GENIE [125, 126], Hoplite [83] and our own tool LYNX. Each of those research tools rightfully claims its own merits – how do we compare them fairly to each other, and how do we compare them to industrial tools from Altera and Xilinx? We believe that a Mimic benchmark set is a good way to quantify the performance/efficiency merits of each CAD tool, and to easily compare the tools against each other.

14.2.3 Application Case Studies

In addition to the Mimic benchmarks, complete application case studies are required to evaluate embedded NoCs more accurately. They also provide the opportunity to showcase the importance of embedded NoCs for certain classes of applications, as we did for packet switching on FPGAs (Chapter 11) – in that case we showed that we can support more than 15× more bandwidth per unit area than previously demonstrated.

Hardware MapReduce

In the immediate future, candidate applications include hardware MapReduce [134]. This application requires a large amount of arbitrary data movement between “map” and “reduce” kernels, since it was designed with processor clusters in mind; it therefore assumes full connectivity between “map” and “reduce” kernels, which would map well to an embedded-NoC-based crossbar. To scale a MapReduce application, one can typically just increase the number of available “map” and “reduce” kernels, perhaps beyond the size of a single FPGA device. This would be a good opportunity to explore how the NoC abstraction may ease the design and interconnection of a multiple-FPGA application, and how that affects application performance.

Monte Carlo Simulation

Another important memory-intensive application is Monte Carlo simulation, where large amounts of data are constantly moved between on-chip compute kernels and to external memory [38]. In such an application, transaction ordering is not always required for correctness, so it would be interesting to evaluate a complete application without that constraint. Additionally, soft NoCs were previously tailored to implement Monte Carlo simulations efficiently [87] – it is important to compare our embedded NoC solution with such a tailored soft NoC. This will allow us to determine whether we have indeed created enough flexibility in our embedded NoC to suit different applications, and to compare the efficiency and performance of our full-featured embedded NoC against a tailored soft NoC.

14.2.4 Virtualization with Embedded NoCs

Virtualization of FPGAs is an increasingly important research topic as FPGAs are adopted in data centers [122]. We believe that NoCs have attractive properties that can ease the virtualization of FPGAs. For example, we can use an embedded NoC to manage all communication between FPGA logic and I/O interfaces such as DDRx, Ethernet and PCIe, making it easier to create the “shell” that every application needs. Importantly, transfers between applications running on the FPGA and its I/Os can be managed securely through the communication abstraction provided by the NoC. For example, when two different applications run on a virtualized FPGA, the NoC can use a firewall to guarantee that no application ever accesses data that belongs to a different application [93]. An embedded NoC can also encrypt all data transferred across its routers and links for extra security. Partial reconfiguration of applications onto virtualized FPGAs is another area that could be eased by the integration of an embedded NoC. We believe that the intersection of FPGA virtualization and embedded NoCs is a large and promising research space, and one that will grow in importance as FPGA use in data centers increases.

14.2.5 High-Level Synthesis with Embedded NoCs

HLS tools such as LegUp [63] generate hardware from a programming language such as C/C++. In doing so, the program functions become hardware “kernels” which are connected using a form of system-level interconnect. We hope to leverage LYNX to interconnect such HLS systems and to compare overall system performance and efficiency with current HLS interconnection solutions. This could be done by extending an open-source HLS tool such as LegUp with the option to use LYNX as its interconnect tool; similar work has previously used soft NoCs to interconnect CUDA-generated HLS systems [43].

14.2.6 Partial Reconfiguration and Parallel Compilation

One of the main advantages of using an embedded NoC is the modularity it brings to hardware design on FPGAs – modules that are only connected through the NoC are decoupled, as there are no timing paths between them. We believe this provides opportunities to make parallel compilation of these modules much easier. Furthermore, because these modules are disjoint, it would be easier to swap them with other modules using partial reconfiguration. These two areas require further research to quantify the merits of using an NoC in their contexts.

14.2.7 Latency-Insensitive Design

In addition to the switching and arbitration capabilities of embedded NoCs, they can be used simply to improve timing on critical connections in a design. In the traditional latency-insensitive design methodology [36], timing-critical connections are mitigated by inserting additional pipeline registers on them. Instead of re-pipelining a critical connection, we can map it onto an embedded NoC link.

14.2.8 Multi-Chip Interconnect and Interposers

Previous work has looked into easing the interconnection of multiple FPGAs by abstracting inter-FPGA communication using FIFOs [62]. We can extend that abstraction using an embedded NoC. Future research can create the design tools (or extend current ones such as LYNX) to partition a design over multiple FPGAs and connect the partitions through embedded NoCs that communicate via FPGA I/Os. Silicon interposers are also of high interest in this context, as FPGAs are among the first semiconductor chips to leverage interposers to connect multiple FPGA dice in the same package [146]. This finer-grained multiple-FPGA interconnection could also be abstracted using an embedded NoC; through the NoC abstraction, the only difference between intra- and inter-FPGA connections would be an increase in latency for connections between FPGA dice. In both the multi-FPGA context and the silicon interposer context, it would be interesting to compare an NoC-enhanced solution to current systems.

Appendix A

LYNXML Syntax

LYNXML is an XML description of an ACG. We use LYNXML to specify the connectivity between modules in a design, and we use that specification as input to our system-level interconnection tool, LYNX. We also hope that this syntax will be adopted by other system-level interconnection tools so that different tools can be compared easily. We used XML to keep the syntax simple and easily extensible; additionally, many XML parsers are available, making it easy to add LYNXML support to existing or new CAD tools. This appendix documents the current syntax of LYNXML. The main constructs in LYNXML are <module>, <bundle> and <connection> – as their names suggest, these XML tags describe design modules, their bundles, and the connections between bundles. In the following, we describe the XML tags and attributes that define the LYNXML specification.
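
Before describing each tag, the following minimal sketch shows how the pieces fit together for a two-module design with a single latency-insensitive connection. The module, port and bundle names are arbitrary, and the attribute value conventions shown here (e.g., global="true" and the module.bundle notation in connection endpoints) are illustrative assumptions rather than a definitive specification:

    <design name="toy_system">
      <!-- a producer module with one output bundle -->
      <module type="producer" name="prod0">
        <port name="clk" direction="input" type="clk"/>
        <port name="rst" direction="input" type="rst" global="true"/>
        <bundle name="out">
          <port name="out_data"  width="64" direction="output" type="data"/>
          <port name="out_valid" direction="output" type="valid"/>
          <port name="out_ready" direction="input"  type="ready"/>
        </bundle>
      </module>
      <!-- a consumer module with one input bundle -->
      <module type="consumer" name="cons0">
        <port name="clk" direction="input" type="clk"/>
        <port name="rst" direction="input" type="rst" global="true"/>
        <bundle name="in">
          <port name="in_data"  width="64" direction="input"  type="data"/>
          <port name="in_valid" direction="input"  type="valid"/>
          <port name="in_ready" direction="output" type="ready"/>
        </bundle>
      </module>
      <!-- connections always go from an output bundle to an input bundle -->
      <connection start="prod0.out" end="cons0.in"/>
    </design>

Given such a description, LYNX maps each bundle to a system-level interconnect endpoint and implements the connection between them.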

Tag: <design>
Nested tags: <module>, <connection>
Attributes: [name]
Description: A design encapsulates all modules and connections that describe an application. It can optionally have an arbitrary string as its name; this name becomes the top-level design name in HDL generation.

Tag: <module>
Nested tags: <port>, <bundle>
Attributes: type, name
Description: A module is a logical entity that contains part of a design. The module has two mandatory attributes. The type attribute denotes the module's hardware block type; for example, the module name in Verilog. Note that the type should match the Verilog/VHDL module name if an implementation for it exists. The name attribute must be unique in the scope of a design and can be any arbitrary string that uniquely identifies each module instance.

Tag: <port>
Nested tags: none
Attributes: name, [width], direction, type, [global]
Description: Ports correspond to the input and output ports that exist in HDLs such as Verilog/VHDL; a port is the basic communication primitive between modules. Within a module's scope, the name attribute is a unique string that identifies the port. Width is the port width in bits; if omitted, a default width of 1 is used. Direction specifies whether this is an input or an output port. Type is the port type; there are currently six types: clk, rst, ready, valid, data and dst (destination). The global attribute is used when a signal should be exported to the top-level design to become a chip-level I/O; a typical use is a global reset signal.

Tag: <bundle>
Nested tags: <port>
Attributes: name
Description: Bundles group data, ready and valid ports together. Any bundle must contain at least one data port, exactly one ready port and exactly one valid port. A bundle is a latency-insensitive communication endpoint between two modules. It must have a name attribute that is unique in the scope of the module to which it belongs.

Tag: <connection>
Nested tags: none
Attributes: start, end
Description: A connection specifies a pair of bundles that are connected to each other. Note that any connection must start at an output bundle and must end at an input bundle.

Bibliography

[1] International Technology Roadmap for Semiconductors (ITRS), 2013.

[2] Mohamed S Abdelfattah and Vaughn Betz. Design Tradeoffs for Hard and Soft FPGA-based Networks-on-Chip. In International Conference on Field-Programmable Technology (FPT), pages 95–103. IEEE, 2012.

[3] Mohamed S. Abdelfattah and Vaughn Betz. Augmenting FPGAs with Embedded Networks-on-Chip. In Workshop on the Intersections of Computer Architecture and Reconfigurable Logic (CARL), 2013.

[4] Mohamed S Abdelfattah and Vaughn Betz. The Power of Communication: Energy-Efficient NoCs for FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL), pages 1–8. IEEE, 2013.

[5] Mohamed S Abdelfattah and Vaughn Betz. Networks-on-Chip for FPGAs: Hard, Soft or Mixed? ACM Transactions on Reconfigurable Technology and Systems (TRETS), 7(3):1–22, 2014.

[6] Mohamed S Abdelfattah and Vaughn Betz. The Case for Embedded Networks on Chip on FPGAs. IEEE Micro, 34(1):80–89, 2014.

[7] Mohamed S Abdelfattah and Vaughn Betz. Field Programmable Gate-Array with Network-on-Chip Hardware and Design Flow, April 2015. US Patent Application 14/060,253.

[8] Mohamed S Abdelfattah and Vaughn Betz. Embedded Networks-on-Chip for FPGAs. In Pierre-Emmanuel Gaillardon, editor, Reconfigurable Logic: Architecture, Tools and Applications, chapter 6, pages 149–184. CRC Press, 2016.

[9] Mohamed S Abdelfattah and Vaughn Betz. LYNX: CAD for Embedded NoCs on FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2016.

[10] Mohamed S Abdelfattah and Vaughn Betz. Power Analysis of Embedded NoCs on FPGAs and Comparison With Custom Buses. IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 24(1):165–177, 2016.

[11] Mohamed S. Abdelfattah, Andrew Bitar, and Vaughn Betz. Take the Highway: Design for Embedded NoCs on FPGAs. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 98–107. ACM, 2015.

[12] Mohamed S Abdelfattah, Andrew Bitar, and Vaughn Betz. Design and Applications for Embedded Networks-on-Chip on Field-Programmable Gate-Arrays. IEEE Transactions on Computers (TCOMP), 2016.

[13] M.S. Abdelfattah, A. Bitar, A. Yaghi, and V. Betz. Design and simulation tools for Embedded NOCs on FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2015. [Demonstration Abstract].

[14] Altera Corp. Stratix PowerPlay Early Power Estimator.

[15] Altera Corp. Stratix III FPGA: Lowest Power, Highest Performance 65-nm FPGA. Press Release, 2007.

[16] Altera Corp. High-Definition Video Reference Design (UDX6), 2013.

[17] Altera Corp. Video and Image Processing Suite User Guide, 2014.

[18] Altera Corp. External Memory Interface Handbook, November 2015.

[19] Altera Corp. UGPARTRECON - Partial Reconfiguration IP Core, May 2015.

[20] M. An, J. G. Steffan, and V. Betz. Speeding Up FPGA Placement: Parallel Algorithms and Methods. In International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 178–185. IEEE, 2014.

[21] Andrew Bitar. Building Networking Applications from a NoC-Enhanced FPGA. Master’s thesis, University of Toronto, 2015.

[22] F. Angiolini, P. Meloni, S.M. Carta, L. Raffo, and L. Benini. A Layout-Aware Analysis of Networks-on-Chip and Traditional Interconnects for MPSoCs. volume 26, pages 421–434, 2007.

[23] Federico Angiolini, Paolo Meloni, Salvatore Carta, Luca Benini, and Luigi Raffo. Contrasting a NoC and a traditional interconnect fabric with layout awareness. In Design Automation and Test in Europe (DATE), pages 124–129. ACM, 2006.

[24] Gregg Baeckler. Conference Keynote: HyperPipelining of High-Speed Interface Logic. International Symposium on Field-Programmable Gate Arrays (FPGA), 2016.

[25] James Balfour and William J. Dally. Design Tradeoffs for Tiled CMP On-Chip Networks. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 187–198. ACM, 2006.

[26] T A Bartic, J Mignolet, V Nollet, T Marescaux, D Verkest, S Vernalde, and R Lauwereins. Topology adaptive network-on-chip design and implementation. IEEE Computers and Digital Techniques, 152(4):467–472, 2005.

[27] Daniel U Becker and William J Dally. Allocator Implementations for Network-on-Chip Routers. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 1–12. ACM/IEEE, 2009.

[28] Luca Benini. Application specific NoC design. pages 105–111. ACM, 2006.

[29] Luca Benini and G. De Micheli. Networks on Chips: A New SoC Paradigm. Computer, 35(1):70–78, 2002.

[30] Andrew Bitar, Mohamed S. Abdelfattah, and Vaughn Betz. Bringing Programmability to the Data Plane: Packet Processing with a NoC-Enhanced FPGA. In International Conference on Field-Programmable Technology (FPT). IEEE, 2015.

[31] Andrew Bitar, Jeffrey Cassidy, Natalie Enright Jerger, and Vaughn Betz. Efficient and programmable Ethernet switching with a NoC-enhanced FPGA. In Symposium on Architectures for Networking and Communications Systems (ANCS). ACM/IEEE, 2014.

[32] M. T. Bohr. Interconnect scaling-the real limiter to high performance ULSI. In International Electron Devices Meeting (IEDM), pages 241–244. IEEE, 1995.

[33] Misha Burich. Conference Workshop: FPGAs in 2032, Challenges and Opportunities in the next 20 years – Convergence of Programmable Solutions. International Symposium on Field-Programmable Gate Arrays (FPGA), 2012.

[34] L. P. Carloni. From Latency-Insensitive Design to Communication-Based System-Level Design. Proceedings of the IEEE, 103(11):2133–2151, 2015.

[35] L. P. Carloni, K. L. McMillan, A. Saldanha, and A. L. Sangiovanni-Vincentelli. A methodology for correct-by-construction latency insensitive design. In International Conference on Computer Aided Design (ICCAD), pages 309–315. IEEE, 1999.

[36] Luca Carloni and Alberto Sangiovanni-Vincentelli. Coping with latency in SOC design. IEEE Micro, 22(5):24–35, 2002.

[37] Luca P. Carloni and Alberto L. Sangiovanni-Vincentelli. Performance Analysis and Optimization of Latency Insensitive Systems. In Design Automation Conference (DAC), pages 361–367. IEEE, 2000.

[38] J. Cassidy, L. Lilge, and V. Betz. Fast, Power-Efficient Biophotonic Simulations for Cancer Treatment Using FPGAs. In International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 133–140. ACM, May 2014.

[39] Cavium. XPliant Ethernet Switch Product Family, 2014.

[40] Shant Chandrakar, Dinesh Gaitonde, and Trevor Bauer. Enhancements in UltraScale CLB Architecture. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 108–116. ACM, 2015.

[41] D. Chen and D. Singh. Using OpenCL to evaluate the efficiency of CPUs, GPUs and FPGAs for information filtering. In International Conference on Field-Programmable Logic and Applications (FPL), pages 5–12. IEEE, 2012.

[42] D. Chen and D. Singh. Fractal video compression in OpenCL: An evaluation of CPUs, GPUs, and FPGAs as acceleration platform. In Design Automation Conference (DAC), pages 297–304. IEEE, 2013.

[43] Y. Chen, S. T. Gurumani, Y. Liang, G. Li, D. Guo, K. Rupnow, and D. Chen. FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow. IEEE Transactions on Very Large Scale Integration Systems (TVLSI), PP(99):1–14, 2015.

[44] C. Chiasson and V. Betz. Should FPGAs abandon the pass-gate? In International Conference on Field-Programmable Logic and Applications (FPL), pages 1–8. IEEE, 2013.

[45] Gary Chun Tak Chow, Anson Hong Tak Tse, Qiwei Jin, Wayne Luk, Philip H.W. Leong, and David B. Thomas. A Mixed Precision Monte Carlo Methodology for Reconfigurable Accelerator Systems. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 57–66. ACM, 2012.

[46] Eric S. Chung, James C. Hoe, and Ken Mai. CoRAM: An In-Fabric Memory Architecture for FPGA-based Computing. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 97–106. ACM, 2011.

[47] Eric S. Chung, Michael K. Papamichael, Gabriel Weisz, James C. Hoe, and Ken Mai. Prototype and Evaluation of the CoRAM Memory Architecture for FPGA-based Computing. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 139–142. ACM, 2012.

[48] J. Cong. An interconnect-centric design flow for nanometer technologies. Proceedings of the IEEE, 89(4):505–528, 2001.

[49] Altera Corp. Altera Qsys.

[50] Zefu Dai and Jianwen Zhu. Saturating the Transceiver BW: Switch Fabric Design on FPGAs. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 67–75. ACM, 2012.

[51] W. J. Dally and S. Lacy. VLSI architecture: past, present, and future. In 20th Anniversary Conference on Advanced Research in VLSI, pages 232–241. IEEE, 1999.

[52] William James Dally and Brian Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers, Boston, MA, 2004.

[53] W.J. Dally and B. Towles. Route Packets, Not Wires: On-Chip Interconnection Networks. In Design Automation Conference (DAC), pages 684–689. IEEE, 2001.

[54] Daniel U. Becker. Efficient Microarchitecture for Network-on-Chip Router. PhD thesis, Stanford University, 2012.

[55] T. Dorta, J. Jiménez, J. L. Martín, U. Bidarte, and A. Astarloa. Overview of FPGA-Based Multiprocessor Systems. In International Conference on Reconfigurable Computing and FPGAs (ReConFig), pages 273–278. IEEE, 2009.

[56] Carl Ebeling, Darren C. Cronquist, and Paul Franklin. RaPiD - Reconfigurable Pipelined Datapath. In International Conference on Field-Programmable Logic and Applications (FPL), pages 126–135. Springer-Verlag, 1996.

[57] Andreas Ehliar and Dake Liu. An FPGA Based Open Source Network-on-Chip Architecture. In International Conference on Field-Programmable Logic and Applications (FPL), pages 800–803. IEEE, 2007.

[58] I. Elhanany, D. Chiou, V. Tabatabaee, R. Noro, and A. Poursepanj. The network processing forum switch fabric benchmark specifications: An overview. IEEE Network, 19(2):5–9, 2005.

[59] Natalie Enright Jerger and Li-Shiuan Peh. On-Chip Networks. Morgan Claypool, 2009.

[60] Natalie Enright Jerger, Li-Shiuan Peh, and Mikko Lipasti. Virtual circuit tree multicasting: A case for on-chip hardware multicast support. In International Symposium on Computer Architecture (ISCA), pages 229–240. ACM, 2008.

[61] K. Fleming, H. J. Yang, M. Adler, and J. Emer. The LEAP FPGA operating system. In International Conference on Field-Programmable Logic and Applications (FPL), pages 1–8. IEEE, 2014.

[62] Kermin Elliott Fleming, Michael Adler, Michael Pellauer, Angshuman Parashar, Arvind Mithal, and Joel Emer. Leveraging Latency-insensitivity to Ease Multiple FPGA Design. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 175–184. ACM, 2012.

[63] B. Fort, A. Canis, J. Choi, N. Calagar, Ruolong Lian, S. Hadjis, Yu Ting Chen, M. Hall, B. Syrowik, T. Czajkowski, S. Brown, and J. Anderson. Automating the Design of Processor/Accelerator Embedded Systems with LegUp High-Level Synthesis. In International Conference on Embedded and Ubiquitous Computing (EUC), pages 120–129, 2014.

[64] R. Francis and S. Moore. Exploring hard and soft networks-on-chip for FPGAs. In International Conference on Field-Programmable Technology (FPT), pages 261–264. IEEE, 2008.

[65] R. Francis, S. Moore, and R. Mullins. A Network of Time-Division Multiplexed Wiring for FPGAs. In International Symposium on Networks-on-Chip (NOCS), pages 35–44. ACM/IEEE, 2008.

[66] R. Gindin, I. Cidon, and I. Keidar. NoC-Based FPGA: Architecture and Routing. In International Symposium on Networks-on-Chip (NOCS), pages 253–264. ACM/IEEE, 2007.

[67] K. Goossens, A. Radulescu, and A. Hansson. A unified approach to constrained mapping and routing on network-on-chip architectures. In International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 75–80. IEEE/ACM/IFIP, 2005.

[68] Kees Goossens, John Dielissen, and Andrei Radulescu. Æthereal Network on Chip: Concepts, Architectures, and Implementations. IEEE Design and Test, 22(5), 2005.

[69] Kees Goossens et al. Hardwired Networks on Chip in FPGAs to Unify Functional and Configuration Interconnects. In International Symposium on Networks-on-Chip (NOCS), pages 45–54. ACM/IEEE, 2008.

[70] G. Guindani, C. Reinbrecht, T. Raupp, N. Calazans, and F.G. Moraes. NoC Power Estimation at the RTL Abstraction Level. In Computer Society Annual Symposium on VLSI (ISVLSI), pages 475–478. IEEE, 2008.

[71] R. Hecht, S. Kubisch, A. Herrholtz, and D. Timmermann. Dynamic reconfiguration with hardwired networks-on-chip on future FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL), pages 527–530. IEEE, 2005.

[72] Ahmed Hemani, Axel Jantsch, Shashi Kumar, Adam Postula, Jonny Oberg, Mikael Millberg, and Dan Lindqvist. Network on a Chip: An architecture for billion transistor era. In IEEE Norchip Conference, pages 1–8. IEEE, 2000.

[73] Andy Henson and Richard Herveille. Video Compression Systems. www.opencores.org/project, video_systems, 2008.

[74] Clint Hilton and Brent Nelson. A Flexible Circuit-Switched NoC For FPGA-based Systems. In International Conference on Field-Programmable Logic and Applications (FPL), pages 191–196. IEEE, 2006.

[75] R. Ho, K. W. Mai, and M. A. Horowitz. The future of wires. Proceedings of the IEEE, 89(4):490–504, 2001.

[76] Jincao Hu and Radu Marculescu. Application-specific buffer space allocation for networks-on-chip router design. In International Conference on Computer Aided Design (ICCAD), pages 354–361. IEEE, 2004.

[77] Jingcao Hu and Radu Marculescu. Exploiting the routing flexibility for energy/performance aware mapping of regular NoC architectures. Design Automation and Test in Europe (DATE), pages 688–693, 2003.

[78] Y. Huan and A. DeHon. FPGA Optimized Packet-Switched NoC using Split and Merge Primitives. In International Conference on Field-Programmable Technology (FPT), pages 47–52. IEEE, 2012.

[79] Xilinx Inc. Vivado IP Integrator.

[80] N. Jiang, J. Balfour, D. U. Becker, B. Towles, W. J. Dally, G. Michelogiannakis, and J. Kim. A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator. In International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 86–96. IEEE, 2013.

[81] W. Jiang and V. K. Prasanna. Large-scale wire-speed packet classification on FPGAs. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 219–228. ACM, 2009.

[82] Andrew B Kahng, Bin Li, Li-Shiuan Peh, and Kambiz Samadi. ORION 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration. In Design Automation and Test in Europe (DATE), pages 0–5. ACM, 2009.

[83] N. Kapre and J. Gray. Hoplite: Building austere overlay NoCs for FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL), pages 1–8. IEEE, 2015.

[84] N. Kapre, N. Mehta, M. deLorimier, R. Rubin, H. Barnor, M. J. Wilson, M. Wrighton, and A. DeHon. Packet switched vs. time multiplexed FPGA overlay networks. In International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 205–216. ACM, 2006.

[85] Kent J. Orthner. Packet-Based Transaction Interconnect Fabric for FPGA Systems on Chip. Master’s thesis, Carleton University, 2009.

[86] J. Kim. Low-cost router microarchitecture for on-chip networks. In International Symposium on Microarchitecture, pages 255–266. IEEE/ACM, 2009.

[87] P. J. Kinsman and N. Nicolici. NoC-Based FPGA Acceleration for Monte Carlo Simulations with Applications to SPECT Imaging. IEEE Transactions on Computers, 62(3):524–535, March 2013.

[88] S. Kumar, A. Jantsch, J. P. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, and A. Hemani. A network on chip architecture and design methodology. In Computer Society Annual Symposium on VLSI, pages 105–112. IEEE, 2002.

[89] Ian Kuon and Jonathan Rose. Measuring the Gap Between FPGAs and ASICs. IEEE Transactions on Computer-Aided Design of Integrated Circuits (TCAD), 26(2):203–215, 2007.

[90] A. Lambrechts, P. Raghavan, A. Leroy, G. Talavera, T.V. Aa, M. Jayapala, F. Catthoor, D. Verkest, G. Deconinck, H. Corporaal, F. Robert, and J. Carrabina. Power breakdown analysis for a heterogeneous NoC running a video application. In International Conference on Application-Specific Systems, Architecture Processors (ASAP), pages 179–184. IEEE, 2005.

[91] Martin Langhammer and Bogdan Pasca. Floating-Point DSP Block Architecture for FPGAs. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 117–125. ACM, 2015.

[92] C. Lavin, M. Padilla, S. Ghosh, B. Nelson, B. Hutchings, and M. Wirthlin. Using Hard Macros to Reduce FPGA Compilation Time. In International Conference on Field-Programmable Logic and Applications (FPL), pages 438–441. IEEE, 2010.

[93] Jean-Jacques Lecler and Gilles Baillieu. Application driven network-on-chip architecture exploration and refinement for a complex SoC. Design and Automation of Embedded Systems, 15(2):133–158, 2011.

[94] Hyung Gyu Lee, Naehyuck Chang, Umit Y. Ogras, and Radu Marculescu. On-chip Communication Architecture Exploration: A Quantitative Evaluation of Point-to-point, Bus, and Network-on-chip Approaches. ACM Transactions on Design Automation of Electronic Systems (TODAES), 12(3):23:1–23:20, 2008.

[95] David Lewis, Elias Ahmed, Gregg Baeckler, Vaughn Betz, Mark Bourgeault, David Cashman, David Galloway, Mike Hutton, Chris Lane, Andy Lee, Paul Leventis, Sandy Marquardt, Cameron McClintock, Ketan Padalia, Bruce Pedersen, Giles Powell, Boris Ratchev, Srinivas Reddy, Jay Schleicher, Kevin Stevens, Richard Yuan, Richard Cliff, and Jonathan Rose. The Stratix II Logic and Routing Architecture. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 14–20. ACM, 2005.

[96] David Lewis, Elias Ahmed, David Cashman, Tim Vanderhoek, Chris Lane, Andy Lee, and Philip Pan. Architectural Enhancements in Stratix-III and Stratix-IV. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 33–42. ACM, 2009.

[97] David Lewis, Vaughn Betz, David Jefferson, Andy Lee, Chris Lane, Paul Leventis, Sandy Marquardt, Cameron McClintock, Bruce Pedersen, Giles Powell, Srinivas Reddy, Chris Wysocki, Richard Cliff, and Jonathan Rose. The Stratix Routing and Logic Architecture. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 12–20. ACM, 2003.

[98] David Lewis, David Cashman, Mark Chan, Jeffery Chromczak, Gary Lai, Andy Lee, Tim Vanderhoek, and Haiming Yu. Architectural Enhancements in Stratix V. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 147–156. ACM, 2013.

[99] David Lewis, Gordon Chiu, Jeffrey Chromczak, David Galloway, Ben Gamsa, Valavan Manohararajah, Ian Milton, Tim Vanderhoek, and John Van Dyken. The Stratix 10 Highly Pipelined FPGA Architecture. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 159–168. ACM, 2016.

[100] Zhuo Li and Charles J. Alper. What is Physical Synthesis? ACM/SIGDA E-Newsletter, 41(1), 2011.

[101] O. Lindtjorn, R. Clapp, O. Pell, Haohuan Fu, M. Flynn, and Haohuan Fu. Beyond Traditional Microprocessors for Geoscience High-Performance Computing Applications. IEEE Micro, 31(2):41–49, 2011.

[102] Ruibing Lu and Cheng-Kok Koh. Performance optimization of latency insensitive systems through buffer queue sizing of communication channels. In International Conference on Computer Aided Design (ICCAD), pages 227–231. IEEE, 2003.

[103] Ruibing Lu and Cheng-Kok Koh. Performance analysis of latency-insensitive systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits (TCAD), 25(3):469–483, 2006.

[104] Ting Lu, Ryan Kenny, and Sean Atsatt. Stratix 10 Secure Device Manager Provides Best-in-Class FPGA and SoC Security. Technical report, Altera Corp., 06 2015.

[105] Z. Lu, L. Xia, and A. Jantsch. Cluster-based Simulated Annealing for Mapping Cores onto 2D Mesh Networks on Chip. In Design and Diagnostics of Electronic Circuits and Systems (DDECS), pages 1–6. IEEE, April 2008.

[106] Adrian Ludwin and Vaughn Betz. Efficient and Deterministic Parallel Placement for FPGAs. ACM Transactions on Design Automation of Electronic Systems (TODAES), 16(3):1–23, 2011.

[107] Terrence S. T. Mak, Pete Sedcole, Peter Y. K. Cheung, and Wayne Luk. On-FPGA Communication Architectures and Design Factors. In International Conference on Field-Programmable Logic and Applications (FPL), pages 1–8. IEEE, 2006.

[108] T. Marescaux, J-Y. Mignolet, A. Bartic, W. Moffat, D. Verkest, S. Vernalde, and R. Lauwereins. Networks on Chip as Hardware Components of an OS for Reconfigurable Systems. In International Conference on Field-Programmable Logic and Applications (FPL), pages 595–605. Springer, 2003.

[109] Theodore Marescaux, Andrei Bartic, Diderick Verkest, Serge Vernalde, and Rudy Lauwereins. Interconnection Networks Enable Fine-Grain Dynamic Multi-Tasking On FPGAs. In International Conference on Field-Programmable Logic and Applications (FPL), pages 795–805, 2002.

[110] G. E. Moore. Cramming More Components Onto Integrated Circuits. Proceedings of the IEEE, 86(1):82–85, 1998.

[111] R. Mullins. Minimising Dynamic Power Consumption in On-Chip Networks. In International Symposium on System-on-Chip (ISSoC), pages 1–4. IEEE, 2006.

[112] Robert Mullins, Andrew West, and Simon Moore. Low-Latency Virtual-Channel Routers for On-Chip Networks. In International Symposium on Computer Architecture (ISCA), pages 188–198. ACM, 2004.

[113] S. Murali, Luca Benini, and Giovanni De Micheli. Mapping and physical planning of networks-on-chip architectures with quality-of-service guarantees. In Asia and South Pacific Design Automation Conference (ASP-DAC), pages 27–32. IEEE, 2005.

[114] S. Murali and G. De Micheli. SUNMAP: a tool for automatic topology selection and generation for NoCs. In Design Automation Conference (DAC), pages 914–919. ACM, 2004.

[115] Kevin E. Murray and Vaughn Betz. Quantifying the Cost and Benefit of Latency Insensitive Communication on FPGAs. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 223–232. ACM, 2014.

[116] P.P. Pande, C. Grecu, A. Ivanov, R. Saleh, and G. De Micheli. Design, Synthesis, and Test of Networks on Chips. IEEE Design and Test of Computers, 22(5):404–413, May 2005.

[117] Michael K Papamichael and James C Hoe. CONNECT: Re-Examining Conventional Wisdom for Designing NoCs in the Context of FPGAs. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 37–46. ACM, 2012.

[118] Giorgos Passas, Manolis Katevenis, and Dionisios Pnevmatikatos. Crossbar NoCs Are Scalable Beyond 100 Nodes. IEEE Transactions on Computer-Aided Design of Integrated Circuits (TCAD), 31(4):573–585, 2012.

[119] Li-Shiuan Peh and William J. Dally. A Delay Model and Speculative Architecture for Pipelined Routers. In International Symposium on High Performance Computer Architecture (HPCA), pages 255–267. IEEE, 2001.

[120] Li-Shiuan Peh and Natalie Enright Jerger. On-Chip Networks. Morgan and Claypool Publishers, 2009.

[121] M. Pellauer, M. Adler, D. Chiou, and J. Emer. Soft connections: Addressing the hardware-design modularity problem. In Design Automation Conference (DAC), pages 276–281. IEEE, 2009.

[122] Andrew Putnam, Adrian Caulfield, Eric Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, Jim Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger. A reconfigurable fabric for accelerating large-scale datacenter services. In International Symposium on Computer Architecture (ISCA). ACM, June 2014.

[123] Jan Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital Integrated Circuits, A Design Perspective. Pearson Education, Inc., Upper Saddle River, NJ, 2nd edition, 2003.

[124] R. Rashid, J.G. Steffan, and V. Betz. Comparing performance, productivity and scalability of the TILT overlay processor to OpenCL HLS. In International Conference on Field-Programmable Technology (FPT), pages 20–27. IEEE, 2014.

[125] Alex Rodionov and Jonathan Rose. Automatic FPGA system and interconnect construction with multicast and customizable topology. In International Conference on Field-Programmable Technology (FPT), pages 72–79. IEEE, 2015.

[126] Alex Rodionov and Jonathan Rose. Fine-Grained Interconnect Synthesis. In International Sym- posium on Field-Programmable Gate Arrays (FPGA), pages 46–55. ACM, 2015.

[127] Pradip Sahu and Santanu Chattopadhyay. A survey on application mapping strategies for network-on-chip design. Journal of Systems Architecture, 59(1):60–76, 2013.

[128] Manuel Saldaña, Lesley Shannon, and Paul Chow. The Routability of Multiprocessor Network Topologies in FPGAs. In International Workshop on System-level Interconnect Prediction (SLIP), pages 49–56. ACM, 2006.

[129] E. Salminen, A. Kulmala, and T. D. Hamalainen. HIBI-based multiprocessor SoC on FPGA. In International Symposium on Circuits and Systems (ISCAS), pages 3351–3354. IEEE, 2005.

[130] E. Salminen, A. Kulmala, and T. D. Hamalainen. On network-on-chip comparison. In Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD), pages 503–510. IEEE, 2007.

[131] Graham Schelle and Dirk Grunwald. Exploring FPGA network on chip implementations across various application and network loads. In International Conference on Field-Programmable Logic and Applications (FPL), pages 41–46. IEEE, 2008.

[132] Ryan Scoville. TimeQuest User Guide, 2010.

[133] Balasubramanian Sethuraman, Prasun Bhattacharya, Jawad Khan, and Ranga Vemuri. LiPaR: A Light-Weight Parallel Router for FPGA-based Networks-on-Chip. In Great Lakes Symposium on VLSI (GLSVLSI), pages 452–457. ACM, 2005.

[134] Yi Shan, Bo Wang, Jing Yan, Yu Wang, Ningyi Xu, and Huazhong Yang. FPMR: MapReduce framework on FPGA. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 93–102. ACM, 2010.

[135] Akbar Sharifi, Asit K. Mishra, Shekhar Srikantaiah, Mahmut Kandemir, and Chita R. Das. PEPON: Performance-aware Hierarchical Power Budgeting for NoC Based Multicores. In International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 65–74. ACM, 2012.

[136] Deshanand P. Singh and Stephen D. Brown. The Case for Registered Routing Switches in Field Programmable Gate Arrays. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 161–169. ACM, 2001.

[137] Daniel J. Sorin, Mark D. Hill, and David A. Wood. A Primer on Memory Consistency and Cache Coherence. Morgan & Claypool Publishers, 1st edition, 2011.

[138] Synopsys Inc. Design Compiler Optimization Reference Manual. 2010.

[139] Yuval Tamir and Gregory L. Frazier. High-Performance Multi-Queue Buffers for VLSI Communication Switches. In International Symposium on Computer Architecture (ISCA), pages 343–354. ACM, 1988.

[140] Robert Tarjan. Depth-First Search and Linear Graph Algorithms. SIAM Journal on Computing, 1(2):146–160, 1972.

[141] M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B. Greenwald, H. Hoffman, P. Johnson, Jae-Wook Lee, W. Lee, A. Ma, A. Saraf, M. Seneski, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, and A. Agarwal. The Raw Microprocessor: A computational fabric for software circuits and general-purpose programs. IEEE Micro, 22(2):25–35, 2002.

[142] R. Thid, I. Sander, and A. Jantsch. Flexible Bus and NoC Performance Analysis with Configurable Synthetic Workloads. In Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD), pages 681–688. IEEE, 2006.

[143] L. G. Valiant and G. J. Brebner. Universal schemes for parallel communication. In ACM Symposium on Theory of Computing (STOC), pages 263–277. ACM, 1981.

[144] Hang-Sheng Wang, Li-Shiuan Peh, and S. Malik. A power model for routers: modeling Alpha 21364 and InfiniBand routers. IEEE Micro, 23(1):26–35, 2003.

[145] Henry Wong, Vaughn Betz, and Jonathan Rose. Comparing FPGA vs. Custom CMOS and the Impact on Processor Microarchitecture. In International Symposium on Field-Programmable Gate Arrays (FPGA), pages 5–14. ACM, 2011.

[146] Xilinx Inc. Virtex-5,6,7 Family Overview, 2009-2014.

[147] Xilinx Inc. UG702 - Partial Reconfiguration User Guide, April 2012.

[148] Xilinx Inc. UltraScale Architecture FPGAs Memory IP, November 2015.

[149] Xilinx Inc. xtp414 - Ultrascale Maximum Memory Performance Utility, September 2015.

[150] A. Ye and J. Rose. Using bus-based connections to improve field-programmable gate-array density for implementing datapath circuits. IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 14(5):462–473, 2006.

[151] T. T. Ye, L. Benini, and G. De Micheli. Analysis of power consumption on switch fabrics in network routers. In Design Automation Conference (DAC), pages 524–529. IEEE, 2002.

[152] C. A. Zeferino, M. E. Kreutz, L. Carro, and A. A. Susin. A Study on Communication Issues for Systems-on-Chip. In Symposium on Circuits and Systems Design (ICSD), pages 121–126. IEEE, 2002.

[153] Cesar Albenes Zeferino, Marcio Eduardo Kreutz, and Altamiro Amadeu Susin. RASoC: A Router Soft-Core for Networks-on-Chip. In Design Automation and Test in Europe (DATE), volume 3, pages 198–203. ACM, 2004.