HARDWARE ACCELERATORS FOR VLSI GLOBAL ROUTING
A Thesis
Presented to
The Faculty of Graduate Studies
of
The University of Guelph

by
MAHDI ELGHAZALI

In partial fulfilment of requirements
for the degree of
Master of Science
January, 2009

© Mahdi Elghazali, 2009

Library and Archives Canada
Published Heritage Branch
395 Wellington Street
Ottawa ON K1A 0N4
Canada

ISBN: 978-0-494-47764-9

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

ABSTRACT

HARDWARE ACCELERATORS FOR VLSI GLOBAL ROUTING

Mahdi Elghazali
University of Guelph, 2009
Advisor: Dr. Shawki Areibi

This thesis investigates three approaches to enhancing the performance of the global routing step in the physical design process. The first approach is based on a hardware/software co-design strategy, while the second is a custom hardware implementation using Handel-C [1]. The third is an application-specific instruction-set implementation that targets the Tensilica configurable processor. The experimental results show that all three approaches produce solutions of the same quality as the pure-software implementation. The co-design approach achieves an average speedup of 4.3x over the pure-software approach, while the custom hardware approach achieves an average speedup of 3.9x. The configurable processor approach obtains an average speedup of 33.6x over the pure-software approach, and speedups of 7.81x and 8.61x over the hardware/software co-design and the custom hardware implementations, respectively.

I hereby declare that I am the sole author of this thesis.

I authorize the University of Guelph to lend this thesis to other institutions or individuals for the purpose of scholarly research.

I further authorize the University of Guelph to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

The University of Guelph requires the signatures of all persons using or photocopying this thesis. Please sign below, and give address and date.
Acknowledgments

I would like to take this opportunity to express my sincere appreciation and thanks to my supervisor, Professor Shawki Areibi, for his great guidance and assistance, and for the help he provided throughout this Master's program. Many thanks to Professor Radu Muresan and Professor Gary Grewal for reviewing this thesis. I would also like to thank Adam Erb and Jon Spenceley for their help in this work.

I especially want to thank my father, my mother, my brothers and my sister for their continuous encouragement and support. Finally, many thanks to all my friends. Special thanks to Ahmed Saghaier and Ahmed Elhossini; I really enjoyed the time we spent together. Thanks to all the people who helped me by any means.

To my family for their support and encouragement.

Contents

1 Introduction
  1.1 Motivation
  1.2 Overall Methodology
  1.3 Contributions
  1.4 Thesis Organization
2 Background
  2.1 VLSI Design Process
    2.1.1 VLSI Physical Design Automation
  2.2 Global Routing
    2.2.1 Routing Problem Definition
    2.2.2 A Classification of Global Routing Algorithms
  2.3 Maze Routing Algorithms
    2.3.1 Lee's Algorithm
    2.3.2 Limitations of Lee's Algorithm for Large Circuits
    2.3.3 Reducing the Running Time
  2.4 Reconfigurable Computing Systems
    2.4.1 Hardware/Software Co-design in RCS
    2.4.2 Field-Programmable Gate Arrays (FPGAs)
  2.5 Application Specific Instruction-set Processors
    2.5.1 Tensilica Configurable Processors
  2.6 Benchmarks
  2.7 Summary
3 Literature Review
  3.1 Placement Based Hardware Accelerators
  3.2 Accelerators for FPGA Routers
    3.2.1 Distributed Workstations
    3.2.2 Pure Hardware Accelerators
  3.3 Accelerators for ASIC Routers
    3.3.1 General Purpose Processors
    3.3.2 ASIC-Based Implementations
    3.3.3 FPGA-Based Implementations
  3.4 Summary
4 Hardware/Software Co-design
  4.1 Methodology
  4.2 Design Flow of Lee's Algorithm
  4.3 A Pure-software Based Implementation
    4.3.1 Implementation on a MicroBlaze System
    4.3.2 Major Software Functions
    4.3.3 Multi-Terminal Nets Routing
    4.3.4 Profiling
    4.3.5 Framing Technique
  4.4 A Hardware/Software Co-Design Implementation
    4.4.1 Fast Simplex Link (FSL) Bus
    4.4.2 The Hardware Accelerator Module
  4.5 Results
    4.5.1 FPGA Usage
    4.5.2 Speedup
  4.6 Summary
5 A Handel-C Custom RTL Implementation
  5.1 DK Design Flow
  5.2 Design Constraints
  5.3 Design Details
    5.3.1 Parallelizing Lee's Algorithm
    5.3.2 Input/Output Data
  5.4 The Custom Hardware vs. The MicroBlaze Based Implementations
    5.4.1 Speedup
    5.4.2 FPGA Usage
  5.5 Summary
6 Configurable Processors Implementation
  6.1 Tensilica Configurable Processors
    6.1.1 Xtensa Processors
    6.1.2 Design Flow
  6.2 Design Details
    6.2.1 Design Environment and Overall Architecture
    6.2.2 Profiling
  6.3 Results
    6.3.1 Speed and Area
  6.4 Overall Comparison
    6.4.1 Speedup
    6.4.2 Area
  6.5 Summary
7 Conclusions
  7.1 Future Work
Bibliography
A Glossary
B AMIRIX AP1000 FPGA PCI Development Board
C RC10
D The Netlist and the Placement Files
  D.1 The Netlist File
  D.2 The Placement File

List of Tables

2.1 Benchmarks
3.1 Comparison between the three placement architectures
3.2 PE Commands
4.1 The Profiling Results
4.2 The FPGA Usage
4.3 The Consumed Clock Cycles and the Maximum Operating Frequency
4.4 The Obtained Speed up over pure-software
5.1 The Consumed Clock Cycles
5.2 The Actual Execution Time of the Three Implementations in Milliseconds
5.3 The FPGA Usage
6.1 Xtensa Processor Configuration Detail
6.2 The Profiling Results
6.3 The Consumed Clock Cycles and the Speed up Obtained over the Pure ISA Processor
6.4 The Consumed Clock Cycles
6.5 The Actual Execution Time of the Three Approaches in ms
6.6 The Speed up obtained by Tensilica Approach over the H/S and Handel-C Approaches

List of Figures

1.1 Interconnect and Gate Delay
1.2 The Overall Design Methodology
2.1 The VLSI Design Process
2.2 VLSI Physical Design Cycle
2.3 An Illustration of General Routing
2.4 The Classification of the Global Routing Algorithms
2.5 Lee's Algorithm: (a) The Wave Propagation Phase (b) The Retrace Phase (c) The Clean up Phase
2.6 Schemes to Reduce the Running Time of Lee's Algorithm: (a) Starting point selection (b) Double fan-out (c) Framing
2.7 A General FPGA Structure
2.8 A General Configurable Logic Block [2]
2.9 The Different Computing Approaches
3.1 Hardware Accelerators for CAD
3.2 The model of the partially reconfigurable dynamic system [3]
3.3 The Serial Architecture [3]
3.4 The Parallel Architecture [3]
3.5 The Serial Parallel Architecture [3]
3.6 HSRA T-Switch with Path-Search OR [4]
3.7 Maze Router General Architecture and Pipelined Processors
3.8 Basic structure of the wavefront machine
3.9 Block diagram of a single PE
3.10 L3 General Organization [5]
3.11 L4 Architecture [6]
4.1 The Design Methodology
4.2 The Flow Chart of Lee's Algorithm
4.3 The MicroBlaze System for the Pure-software Based Implementation
4.4 Assign the Source and the Target
4.5 The Wave Propagation Function
4.6 Retrace and Clean up Function
4.7 The Rip up Function
4.8 The Rip up Function Steps
4.9 Multi-Terminal Nets Routing
4.10 The Wave Propagation Function with Framing Technique
4.11 Framing 1 Technique
4.12 The MicroBlaze System for the Hardware/Software Co-design ..