Constraint Programming Techniques for Generating Efficient Hardware Architectures for Field Programmable Gate Arrays
Total Page:16
File Type:pdf, Size:1020Kb
Utah State University DigitalCommons@USU All Graduate Theses and Dissertations Graduate Studies 5-2010 Constraint Programming Techniques for Generating Efficient Hardware Architectures For Field Programmable Gate Arrays Atul Kumar Shah Utah State University Follow this and additional works at: https://digitalcommons.usu.edu/etd Part of the Computer Engineering Commons Recommended Citation Shah, Atul Kumar, "Constraint Programming Techniques for Generating Efficient Hardware Architectures For Field Programmable Gate Arrays" (2010). All Graduate Theses and Dissertations. 609. https://digitalcommons.usu.edu/etd/609 This Thesis is brought to you for free and open access by the Graduate Studies at DigitalCommons@USU. It has been accepted for inclusion in All Graduate Theses and Dissertations by an authorized administrator of DigitalCommons@USU. For more information, please contact [email protected]. CONSTRAINT PROGRAMMING TECHNIQUES FOR GENERATING EFFICIENT HARDWARE ARCHITECTURES FOR FIELD PROGRAMMABLE GATE ARRAYS by Atul Kumar Shah A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in Computer Engineering Approved: Dr.BrandonEames Dr.ArvindDasu MajorProfessor CommitteeMember Dr.EdmundSpencer Dr.ByronR.Burnham CommitteeMember DeanofGraduateStudies UTAH STATE UNIVERSITY Logan, Utah 2010 ii Copyright c Atul Kumar Shah 2010 All Rights Reserved iii Abstract Constraint Programming Techniques for Generating Efficient Hardware Architectures for Field Programmable Gate Arrays by Atul Kumar Shah, Master of Science Utah State University, 2010 Major Professor: Dr. Brandon Eames Department: Electrical and Computer Engineering This thesis presents an approach for modeling and generating efficient hardware archi- tectures using constraint programming techniques, targeting field programmable gate arrays (FPGAs). The focus of this thesis is the derivation of optimal or near-optimal schedules for streaming applications from data flow graphs (DFGs). The resulting schedules are then used to facilitate the architecture generation process. Most streaming applications, like digital singal processing (DSP) algorithms, are repetitive in nature: the same computation is performed on different data items. This repetitive nature of streaming applications can be used to expose additional parallelism available across different iterations, by creating multiple instances of the same computation. The replication of the single computation, when applied to high level synthesis (HLS), improves the performance of the design but re- quires additional area. The amount of additional area required for a replicated graph can be reduced through the use of pipelined functional units and the addition of some extra clock cycles beyond the critical path of the DFG. This thesis discusses the use of a constraint pro- gramming (CP)-based scheduler to generate optimal schedules based on designer-provided replication level and critical path relaxation. The scheduler is an integrated part of the design tool, called CHARGER, which analyzes the resulting schedules to allocate memory iv for storing intermediate data, creates the infrastructure necessary to efficiently execute the application, and finally generates a synthesizable Verilog/VHDL code for the controller. The performance of the architectures derived using the CP-based scheduler is compared with the architectures generated using a force directed scheduling (FDS)-based scheduler for algorithms selected from embedded/multimedia applications. The results show that our CP-based scheduler outperforms the FDS-based scheduler, both in terms of area and efficiency of the generated architectures. The results show average area saving of 39% and average performance improvement of 41%. (123 pages) v To my beloved family and friends vi Acknowledgments This research project would have never been completed without my advisor and mentor, Dr. Brandon Eames. This project is the outcome of his constant motivation and belief in me. I would have never been able to complete this research without his guidance, patience, and expertise in the field of constraint programming and embedded systems. His courses provided the foundation for my research and helped me become a good researcher. I am really grateful to him and would like to thank him for all his guidance, support, and precious time that he has devoted on this project. I would like to thank my committee members, Dr. Arvind Dasu and Dr. Edmund Spencer, for extending their support. I would like to acknowledge my friends here at Logan, especially Nagendra, Karthik, Amrita, Santhi, and Lalit, with whom I have spent the best times of my student life. I would also like to thank my friends at the ECE Department: Rohit Saraswat, Hari Krishna Samala, Shant Chandrakar, and Sravanthi Venigalla for helping me on various occasions. Especially I would like to thank Hari for sharing his expertise on FPGAs and helping me with the VHDL/Verilog Controller generator. And last, but not least, I would like to thank my father who always told me never to give up, and my mother who is the most important person in my life and the greatest strength behind all my successes. Atul Kumar Shah vii Contents Page Abstract ................................................... .... iii Acknowledgments ............................................... vi List of Tables ................................................... ix List of Figures .................................................. xi 1 Introduction ................................................. 1 2 Related Work ................................................ 5 2.1 Scheduling Using Heuristic Approaches. ....... 5 2.2 Scheduling Using Constraint Programming Techniques . .......... 8 2.3 High-LevelLanguagestoHDLs . 11 3 Background .................................................. 14 3.1 FiniteDomainConstraints. 14 3.2 FormulationandScheduling . 17 3.2.1 DataFlowGraphRepresentation . 17 3.2.2 Oz-BasedScheduler ........................... 18 3.3 CHARGERToolFlow .............................. 18 4 Constraint Model for Scheduling and Resource Allocation ........... 21 4.1 LatencyConstraint............................... 22 4.2 ResourceConstraint .............................. 22 4.3 PrecedenceConstraint . 23 4.4 PriorityConstraint .............................. 24 4.5 DistanceConstraint .............................. 24 4.6 PredecessorandSuccessorConstraint . ...... 28 4.7 DominanceConstraint . 30 4.8 AreaConstraint.................................. 32 4.9 DistributionandSearch . 33 4.10Summary ..................................... 35 5 Experimental Evaluation and Results ............................. 36 5.1 OzScheduler(Version1). 37 5.2 OzScheduler(Version2). 45 5.3 OzScheduler(Version3). 50 5.3.1 Impact of I/O Resources on Search Time: Version 3(a) . ...... 53 5.3.2 TwoPhaseSearch: Version3(b) . 54 5.3.3 Summary ................................. 64 viii 5.4 OzScheduler(Version4). 64 6 Area Estimation and Verilog Generator ........................... 73 6.1 AreaEstimation ................................. 74 6.2 ControllerGeneration . 75 6.3 ComparisonoftheEstimationApproach . ..... 78 7 Conclusion and Future Work .................................... 81 References ................................................... ... 84 Appendices ................................................... 87 AppendixA ArchitectureDescriptionFile . ..... 88 A.1 Architecture Description File for Data Flow Graphs . ..... 88 A.2 Data Flow Graph File for Discrete Cosine Transform . 89 Appendix B Sample FDS and Constraint Solver Scheduler Output....... 95 B.1 FDSGeneratedGeneratedScheduleforDCT . 95 B.2 The Constraint Solver Generated Generated Schedule for DCT . 98 Appendix C Sample Verilog Generated Files for DCT Data Flow Graph . 101 ix List of Tables Table Page 4.1 Notations used for specifying constraints. ......... 22 4.2 Additional notations used for specifying the distance constraints. 26 5.1 Basic structures of various data flow graphs. ........ 37 5.2 Specifications of the architectures generated based on Oz Scheduler (version 1)forDCTdataflowgraph............................ 38 5.3 Specifications of the architectures generated based on the modified FDS al- gorithmforDCTdataflowgraph. 39 5.4 Search times of the solver using the original and the modified compare func- tionsforDCTdataflowgraph. 41 5.5 Difference in WSDP and efficiency of the architectures generated based on the constraint solver (version 1) using original and modified compare functions forDCTdataflowgraph. ............................ 42 5.6 Difference in WSDP and efficiency of the architectures generated based on the constraint solver (version 1) using original and the modified compare functionsforFFTdataflowgraph. 42 5.7 Search time taken by version 1 and version 2 of the constraint solver for the DCTdataflowgraph. .............................. 47 5.8 Difference in WSDP and efficiency of the architectures generated based on the constraint solver (version 2) and the modified FDS algorithm for DCT dataflowgraph. ................................. 48 5.9 Difference in WSDP and efficiency of architectures based on the constraint solver (version 2) and the modified FDS algorithm for IDCT data flow graph. 49 5.10 Difference in the specifications of architectures generated by the constraint solver using bounded I/O and unbounded I/O for DCT data flow graph. 55 5.11 Comparison of #Input and #Output required in the schedules generated by vesion 2 and version 3(b) for DCT data flow graph. 57 x 5.12 Difference in specifications