High-level synthesis of dataflow programs for heterogeneous platforms: design flow tools and design space exploration

THÈSE NO 6653 (2015)

PRÉSENTÉE LE 29 MAI 2015
À LA FACULTÉ DES SCIENCES ET TECHNIQUES DE L'INGÉNIEUR
GROUPE SCI STI MM
PROGRAMME DOCTORAL EN MICROSYSTÈMES ET MICROÉLECTRONIQUE

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

POUR L'OBTENTION DU GRADE DE DOCTEUR ÈS SCIENCES

PAR

Endri BEZATI

acceptée sur proposition du jury:

Dr J.-M. Sallese, président du jury
Dr M. Mattavelli, directeur de thèse
Prof. G. De Micheli, rapporteur
Dr J. Janneck, rapporteur
Dr M. Raulet, rapporteur

Suisse 2015

The philosophers have only interpreted the world, in various ways. The point, however, is to change it. — Karl Marx

To my parents, for giving me the chance to pursue my dreams...

Acknowledgements

First of all, I would like to thank my parents, Dhimiter and Pavlina Bezati, for their strong efforts as immigrants to support me during my studies in France. I would like to thank my brother, Dr. Feliks Bezati, for his guidance throughout all those years. I thank my uncle, Dr. Anesti Duka, for proposing that I pursue my studies in France; without him, my path would have been completely different today. I would like to thank my partner, Franziska Thoms, for all those years of love, happiness, and support.

I wish to express my gratitude to my supervisor, Dr. MER Marco Mattavelli, for his guidance, support, advice, and criticism during the last four years. I would like to thank Dr. Mickaël Raulet for believing in me and recommending me to Marco for a PhD in his lab. I also wish to express my gratitude to my spiritual mentor, Dr. Jörn Janneck, for his help and guidance. I would like to thank Dr. Ghislain Roquier for his patience and the help he provided me during the first years of my thesis. I would also like to thank Dr. Matthieu Wipliez for the Orcc infrastructure, and Herve Yviquel and Antoine Lorence for their maintenance and advancement of Orcc. In addition, I would like to thank all previous developers of OpenForge for their excellent work, and Xilinx Inc. for open-sourcing it.

I would like to thank my partner in crime, Simone Casale Brunet; together we have built tools that work in perfect coordination. I would also like to thank all my past and present lab members.

Finally, I would like to thank the Swiss National Science Foundation for funding my research.

Lausanne, 29 April 2015 Endri Bezati


Abstract

The growing complexity of digital signal processing applications implemented in programmable logic and embedded processors makes a compelling case for the use of high-level methodologies in their design and implementation. Past research has shown that, for complex systems, raising the level of abstraction does not necessarily come at a cost in terms of performance or resource requirements. As a matter of fact, high-level synthesis tools supporting such a high abstraction often rival, and on occasion improve on, low-level design. In spite of these successes, high-level synthesis still relies on programs being written with the target, and often the synthesis process, in mind. In other words, imperative languages such as C or C++, the languages most used for high-level synthesis, are either modified or constrained to a subset in order to make parallelism explicit. In addition, a proper behavioral description that permits unified hardware and software design is still an elusive goal for heterogeneous platforms. A promising behavioral description capable of expressing both sequential and parallel applications is RVC-CAL. RVC-CAL is a dataflow programming language that permits design abstraction, modularity, and portability. The objective of this thesis is to provide a high-level synthesis solution for RVC-CAL dataflow programs and to provide an RVC-CAL design flow for heterogeneous platforms. The main contributions of this thesis are: a high-level synthesis infrastructure that supports the full specification of RVC-CAL; an action selection strategy supporting parallel reads and writes of lists of tokens in hardware synthesis; dynamic fine-grain profiling of synthesized dataflow programs; an iterative design space exploration framework that permits the performance estimation, analysis, and optimization of heterogeneous platforms; and, finally, a clock gating strategy that reduces dynamic power consumption. Experimental results on all stages of the provided design flow demonstrate the capabilities of the tools for high-level synthesis, software/hardware co-design, design space exploration, and power optimization for reconfigurable hardware. Consequently, this work proves the viability of complex system design and implementation using dataflow programming, not only for system-level simulation but also for real heterogeneous implementations.

Key words: High-Level Synthesis, Dataflow Programming, Clock-Gating, Co-Design, Design Flow, Design Space Exploration


Résumé

De nos jours, les applications de traitement numérique du signal sont de plus en plus complexes dans leurs mises en œuvre et leurs conceptions pour des implantations sur des processeurs embarqués contenant de la logique programmable. Ceci demande le développement de nouvelles méthodologies basées sur un langage à haut niveau d'abstraction. Des recherches antérieures ont montré que pour les systèmes complexes, l'élévation du niveau d'abstraction n'augmente pas nécessairement les coûts en termes de ressources ou la dégradation des performances. En général, des outils de synthèse haut niveau avec une telle abstraction, souvent concurrentiels, améliorent dans certains cas la conception de systèmes de niveau hiérarchique très bas. En dépit de ces succès, la synthèse haut niveau s'appuie toujours sur le principe que les programmes doivent s'adapter à la cible souhaitée sans oublier les particularités du système de synthèse. En d'autres termes, les langages impératifs tels que C ou C++, les plus utilisés pour la synthèse haut niveau, sont soit modifiés ou soit adaptés en un sous-ensemble de langages pour faire un parallélisme explicite. De plus, une description comportementale appropriée qui permet l'unification de conception de systèmes hétérogènes est toujours un objectif difficile à atteindre. Une des plus prometteuses, capable d'exprimer à la fois l'application séquentielle et parallèle, est exprimée en RVC-CAL. Ce dernier est donc un langage de programmation en flux de données qui permet l'abstraction de la conception, de la modularité et de la portabilité. Les objectifs de cette thèse sont de fournir une solution de synthèse haut niveau pour des programmes écrits en RVC-CAL et de fournir une solution d'implantation pour des plates-formes hétérogènes. Les principales contributions de cette thèse sont : une infrastructure de synthèse de haut niveau qui prend en charge la spécification complète de RVC-CAL, une stratégie de sélection d'action pour la synthèse haut niveau pour permettre la lecture et l'écriture en parallèle de la liste des jetons, un profilage dynamique à grain fin pour les programmes de flux de données qui sont synthétisés, une structure d'exploration de l'espace pour des programmes de flux de données qui permet l'estimation de performance, l'analyse et l'optimisation des plates-formes hétérogènes et enfin une stratégie de « clock-gating » qui réduit la consommation d'énergie dynamiquement. Des résultats expérimentaux sur toutes les étapes du déroulement de la conception démontrent les capacités des outils de synthèse haut niveau, le Co-Design des parties matériel et logiciel, l'exploration spatiale de l'application, et l'optimisation de puissance pour des plates-formes reconfigurables. En conclusion, ce travail prouve la viabilité de la conception et la mise en œuvre de systèmes complexes en utilisant la programmation de flux de données, pas seulement pour la simulation au niveau du système, mais aussi pour les implémentations hétérogènes.


Mots clefs : synthèse de haut niveau, flux de données, clock-gating, co-design, flot de conception, exploration de l'espace du design

Abbreviations

ALAP As Late As Possible

ANSI American National Standards Institute

ASAP As Soon As Possible

ASIC Application Specific Integrated Circuit

AST Abstract Syntax Tree

BDL Behavioral Description Language

BRAM Block RAM

CAD Computer Aided Design

CAL CAL Actor Language

CAM Computer Aided Manufacturing

CDFG Control-Data Flow Graph

CFG Control Flow Graph

CL Computational Load

CP Critical Path

CPU Central Processing Unit

CUDA Compute Unified Device Architecture

DSL Domain-Specific Language

DSP Digital Signal Processor

EMF Eclipse Modeling Framework

ETG Execution Trace Graph

FDS Force-Directed Scheduling


FF Flip-Flop Register

FPGA Field-Programmable Gate Array

FSM Finite State Machine

HDL Hardware Description Language

HLS High-Level Synthesis

HW Hardware

IDE Integrated Development Environment

ILP Integer Linear Programming

IP Intellectual Property

IR Intermediate Representation

ISPS Instruction Set Processor Specification

LLVM Low Level Virtual Machine

LUB Least Upper Bound

LUT Look-Up Table

LVA Live Variable Analysis

MDE Model-Driven Engineering

MoA Model of Architecture

MPEG Moving Picture Experts Group

Orcc Open RVC-CAL Compiler

QoR Quality of Results

RAM Random Access Memory

ROM Read Only Memory

RTL Register Transfer Level

RVC Reconfigurable Video Coding

SoC System on Chip

SSA Static Single Assignment

SW Software

UML Unified Modeling Language

VHDL VHSIC Hardware Description Language

VHSIC Very High-Speed Integrated Circuit

VLSI Very-Large-Scale Integration

XLIM XML Language Independent Model


Contents

Acknowledgements

Abstract (English/Français)

Abbreviations

List of figures

List of tables

1 Introduction
1.1 Design of Complex Systems
1.2 Problem Statement and Motivation
1.3 Design Flow for Dataflow Programs
1.4 Thesis Contributions and Organization

2 State of the Art
2.1 Introduction
2.2 Heterogeneous platforms
2.3 High-Level Synthesis
2.3.1 HLS tools evolution
2.3.2 Behavioral Description
2.4 Scheduling of Operators, Operators Pipelining and Power Optimization in HLS
2.4.1 Scheduling of Operators
2.4.2 Operators Pipelining
2.4.3 Power Optimization
2.5 Dataflow Design Flows for HW and SW Co-Design
2.6 Conclusion

3 CAL Dataflow Programming Language
3.1 Introduction
3.2 Process Networks
3.2.1 KPN
3.2.2 Dataflow Process Network
3.2.3 Actor Transition System and Composition


3.3 CAL Actor Language
3.3.1 CAL Program
3.3.2 Execution Model
3.3.3 CAL Syntax and Semantics
3.4 Standardization
3.5 RVC-CAL Compiler Infrastructure
3.6 Orcc Intermediate Representation
3.6.1 Dataflow IR
3.6.2 Procedural IR
3.6.3 Visitors for Dataflow and Procedural IR and IR Interpreter
3.7 Conclusion

4 High-Level Synthesis of Dataflow Programs: Xronos
4.1 Introduction
4.2 Advances on the Orcc compiler infrastructure for Hardware Synthesis
4.2.1 Control Flow Graph Construction
4.2.2 Dominance Graph
4.2.3 Reaching Definition
4.2.4 Live Variable Analysis
4.2.5 Single Static Assignment, Pruned Form
4.3 Procedural IR Transformations
4.3.1 Expression Evaluator/Simplification
4.3.2 Single Read and Write Register Optimization
4.3.3 Uninitialized Variables
4.3.4 Constant Folding/Propagation
4.3.5 Dead Code Elimination
4.3.6 Type Casting
4.3.7 Division and Modulo Implementation
4.4 Pipelining
4.5 Actor's Action Selection Procedure
4.5.1 Construction of the Action Selection Procedure
4.6 CDFG Representation of a Procedure
4.7 Language Independent Model (LIM)
4.7.1 Component
4.7.2 Primitives
4.7.3 Operation
4.7.4 Memory
4.7.5 Module
4.7.6 Design
4.7.7 Clock Domains
4.7.8 Scheduling
4.8 Mapping of Dataflow and Procedural IR to LIM

4.8.1 Network construction and Actor to Design
4.8.2 State variable to Memory Allocation
4.8.3 Action to Task
4.8.4 Operation to Node
4.8.5 Expression to CDFG
4.8.6 BlockBasic to Block
4.8.7 BlockIf to Branch and BlockWhile to Loop
4.8.8 CDFG to Block
4.8.9 Behavioral HDL Code Generation
4.9 Xronos SystemC Code Generation
4.9.1 SystemC Actor Template
4.9.2 SystemC Actor Composition Template
4.10 Xronos C++ Code Generation for Embedded Platforms
4.11 Mapping HW-SW and Interface Synthesis
4.12 TestBench Generation and Profiling Data Extraction
4.13 Experimental Results
4.13.1 StreamBench: a benchmark suite for streaming applications
4.13.2 Xronos versus state-of-the-art RVC-CAL to hardware synthesis
4.13.3 Multi-core performance on an embedded platform
4.13.4 Hardware and Software Co-Design on Heterogeneous platforms
4.14 Conclusion

5 Iterative Design Space Exploration for Xronos
5.1 Introduction
5.2 Profiling and Execution Trace Graph
5.3 Model of Architecture, Mapping and Constraints
5.4 ETG Analysis
5.4.1 Critical Path Evaluation
5.4.2 Impact Analysis
5.4.3 Queue Size Minimization
5.5 ETG Post-Processor
5.5.1 An event-based trace simulator
5.5.2 Performance Estimation
5.5.3 Mapping
5.6 Optimization by Design Refactoring in IDSE
5.6.1 Levels of parallelism
5.6.2 Complexity and issues of automating refactoring optimizations
5.6.3 A refactoring strategy using impact analysis
5.7 Experimental Results
5.8 Conclusion


6 Power Optimization
6.1 Introduction
6.2 Clock buffers on Xilinx FPGAs
6.3 Coarse-Grain Clock Gating Strategy
6.3.1 Clock enabling controller
6.3.2 Clock Enabler Circuit
6.4 Experimental Study
6.5 Conclusion

7 Conclusion and Future Work
7.1 Conclusion and Summary
7.2 Future Work
7.2.1 Component Library Database
7.2.2 SDC Scheduling for LIM
7.2.3 Integration of state of the art procedural optimizations
7.2.4 Memory Partitioning
7.2.5 Multi-Actor hierarchical memory management
7.2.6 Multiplexing and De-multiplexing queue channels for heterogeneous targets
7.2.7 Clock Gating on input conditions and Multi-Clock Domains Partitioning
7.2.8 Dataflow Machines: An alternative Intermediate Representation

Bibliography

Curriculum Vitae

List of Figures

1.1 Simplified design flow of a Hardware and Software heterogeneous system
1.2 RVC-CAL Design Flow. Two directional flows, in black top to down implementation and in grey the iterative feedback

2.1 Gajski's Y-Chart, for different types of synthesis
2.2 Generic FPGA architecture
2.3 Slice found on Virtex-4 FPGAs
2.4 Xilinx DSP48E1 (image courtesy of Xilinx Inc.)
2.5 Xilinx Zynq 7000 Architecture (image courtesy of Xilinx)
2.6 A generic High-Level Synthesis Flow
2.7 The CMU design system, one of the earliest HLS
2.8 Three of the most used third generation HLS in the market. All three focus on HLS for ASICs; Catapult and, recently, CyberWorkbench also offer FPGA support
2.9 FCUDA: CUDA to FPGA Flow
2.10 Koski, a UML-based Design Flow for HW-SW prototyping
2.11 Classification of the most known scheduling algorithms
2.12 Matlab HDLCoder HLS tool
2.13 Daedalus Design Flow, a unified environment for rapid system-level architectural exploration

3.1 RVC-CAL as the Behavioral Description in the Design Flow
3.2 Actor Composition and Actor Structure
3.3 Actor Execution Model
3.4 Reconfigurable Video Coding
3.5 Open RVC-CAL Compiler Infrastructure
3.6 Class tree for Blocks, Instruction and Expression classes of the Procedural IR
3.7 Class tree for Blocks, Instruction and Expression classes of the Procedural IR

4.1 Xronos in the Design Flow
4.2 Detailed Xronos Compiler Infrastructure; white boxes indicate personal contributions
4.3 Single Read and Write Register Optimization. Only a single read and a single write for the a, b, and c state variables


4.4 Constant Propagation (CP) and Constant Folding (CF)
4.5 Pipelining Optimization, from decision to generation
4.6 One clock per stage pipeline scheduling with chaining and without sharing resources
4.7 Finite State Machine of Action Selection
4.8 Foo actor Action Selection
4.9 A State that contains two transitions
4.10 Partial CDFG of the Listing 4.3
4.11 LIM Component
4.12 A Block with two components that are executed one after the other
4.13 A Decision Module
4.14 A Branch with a true and a false part, neither of which has an input port. The decision input is connected by a data dependency from the peer bus of the Branch input port to its input port. The Branch has an exit with two control dependencies: 0 from the Exit's Done of the true Block and 1 from the false Block
4.15 A Loop Module that contains a WhileBody
4.16 Design I/O Fifo Interface
4.17 Network representation of three Actors. Actor A's output port is connected to the input of Actor B and Actor C. It is worth mentioning that if an output port is connected to more than one input port, a fanout is added. As depicted, each connection has its proper queue
4.18 Internal Modules Representation of the Actor Acc of Listing 4.4
4.19 Header file of the SystemC inverse quantification actor
4.20 Action Selection process of the inverse quantification actor
4.21 Partitioning of a Design to FPGA and ARM CPU and Interface Synthesis
4.22 Header and the payload of the stored data in the serialization FIFO
4.23 Xronos TestBench and Profiling
4.24 Load and Store Instruction Reduction after Single Read and Write Register Procedural IR Optimization
4.25 Resource utilization on Xilinx Zynq 7045
4.26 The RVC MPEG-4 SP decoder and its partitioning from 1 to 4 cores
4.27 JPEG codec functional units and the partitioning for the platforms

5.1 Iterative Design Space Exploration on the RVC-CAL design flow by using TURNUS
5.2 Representation of the design space according to two constraints
5.3 Mapping from an application to an architecture. Constraints represent the feasible regions of the design space
5.4 Critical Path on partial execution trace graph
5.5 Representation of Post-Processor Actor I/O and Buffer Model I/O events
5.6 Atomic Actor FSM
5.7 Atomic Actor FSM
5.8 Post-Processor Mapping of a heterogeneous platform

5.9 Iterative Design Space Exploration methodology
5.10 TURNUS analysis results

6.1 Clock-Gating Strategy applied in the Design Flow
6.2 Power Reduction Strategies
6.3 View of an FPGA die, clocking trees and different clocking buffers found on a Xilinx 7 family
6.4 Xilinx BUFGCE primitive for user clock gating
6.5 Clock gating methodology strategy for Actor A with one output port. The Clock Enabler has as inputs the Almost Full and Full signals of each queue and a clock from a clock domain; as a result, it activates or deactivates the clock of Actor A depending on the FSM state of the controller
6.6 State machine of the clock enabling controller. The controller has two inputs, F for full and AF for almost full, and one output, en, as the enable signal
6.7 Clock Enabler Circuit in three different configurations
6.8 Activation rates of each CG clock for each design with all their actors being clock gated. Average values retrieved from different QCIF input stimuli for all designs


List of Tables

3.1 System-Level Requirements and Coverage (G: supported, Q: partially supported, H: not supported)
3.2 CAL lexical tokens

4.1 Xronos features versus the state of the art. The contributions of this thesis are related to the high-level synthesis of dataflow programs and are highlighted in bold
4.2 Algebraic identities for the Expression Simplifier, with ∧ the logical AND operator and ∨ the logical OR operator
4.3 Least Upper Bound on Types
4.4 Brief description and source of the Streambench benchmark RVC-CAL programs
4.5 Program Characteristics - 1
4.6 Program Characteristics - 2
4.7 Xronos HLS - Synthesis and Simulation Results
4.8 Xronos C++ Code generation Throughput results on Zynq 7045 ARM with a frequency of 999 MHz
4.9 Three-way comparison of the same RVC Intra MPEG-4 SP decoder on a Virtex 4 FPGA, using the old Orc2HDL framework with the M2M source-to-source transformation, Xronos with the M2M, and Xronos. (All results are post-place-and-route, using Xilinx XST.)
4.10 Xronos versus Orcc C backend + Vivado HLS, synthesis and throughput results on Virtex 4 FPGA
4.11 Framerate of the RVC MPEG-4 SP decoder at QCIF, SD and HD resolutions
4.12 Framerate of the JPEG codec with a 512x512 video resolution on P4080 and two FPGA boards with 2 different interfaces

5.1 Initial Critical Actions Ranking. E%: number of executions of the action as a percentage of the total number of steps in the profiled run; CL%: computational load as a share of the total load; CPE%: the number of executions of that action on the critical path as a share of its length; CPP%: the share of the computational load of those executions on the critical path relative to its total load
5.2 The modification steps by most critical actor on the MPEG-4 SP decoder. Synthesis results for Xilinx Virtex 4 FPGA


6.1 Synthesis and power results of three designs: a JPEG encoder, the RVC Intra MPEG-4 SP decoder, and a full serial MPEG-4 SP decoder. The dynamic power reduction is given as the clock gated design over the non clock gated one

1 Introduction

This thesis reports the research work done by the candidate with the aim of yielding a complete high-level synthesis (HLS) design flow entirely based on a dataflow computation paradigm, including functionality and optimizations that support heterogeneous system designs. Despite continued scaling of silicon technology, individual sequential processors are not becoming faster. Therefore, rather than building complex single processors, manufacturers have used the space gained from technology scaling to add more processors and incorporate reconfigurable parts onto a single chip, making multi-core and reconfigurable machines a nearly ubiquitous commodity in a full (and increasing) range of applications. The performance gains of modern heterogeneous platforms are thus primarily due to an increase in the available parallelism. However, one of the main obstacles that may prevent the efficient usage of heterogeneous platforms is the fact that the traditional sequential specification formalisms, and all existing software and IPs, the legacy of several years of continuous successes of sequential processor architectures, are not the most appropriate starting point for programming such parallel platforms. Moreover, such specifications are no longer suitable as unified specifications when targeting both processors and reconfigurable hardware components. Another problem is that the portability of applications across different platforms becomes a crucial issue, and this property is not appropriately supported by the traditional sequential behavioral descriptions and the associated methodologies.

1.1 Design of Complex Systems

The availability of heterogeneous parallel platforms that combine the processing features of FPGAs with multi-core CPUs in a single silicon die offers an amount of computing power for embedded designs that by far exceeds what was available in past years. However, this potential can only be fully exploited if design flows support heterogeneous architectures. Figure 1.1 depicts a typical co-design flow. The design flow starts by choosing a behavioral description for implementing a chosen algorithm. A preliminary analysis in such a design flow is to decide which parts of the algorithm should be ported to software and/or hardware processing elements. All relevant information has to be extracted from a behavioral description of several thousand source code lines. This initial step of the design flow has an enormous impact on the number of optimization iterations needed to satisfy the system constraints of the final target implementation.

Figure 1.1 – Simplified design flow of a hardware and software heterogeneous system.

The selection of the parts of the algorithm to be implemented by software or hardware components is a fundamental step that has serious consequences for the other steps of the design flow. In fact, algorithmic parts destined for SW processing elements need to be revised (i.e., rewritten) according to platform-specific SW optimization objectives. The other parts, selected for execution on hardware components (i.e., FPGA or ASIC), need to undergo different, even more tedious, heavy transformations, because specific timing constraints, which greatly affect the quality and required expressiveness of the HDL description, need to be explicitly introduced. In general, the optimization process of the hardware elements consists of reducing the critical path of the circuit. The critical path of a circuit is defined as the longest combinatorial path between two registers; the delay of this path determines the maximum clock frequency of the circuit. Once the timing constraints are satisfied, an interface needs to be implemented for connecting the hardware and software components. An interface is characterized by two factors: its availability on the processing element and its bandwidth. Once all the system parts are complete, the first iteration of co-simulation begins and can give indications of the achievable performance of the developed design.
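As a reminder of why critical-path reduction matters (a standard timing relation, not specific to this flow): neglecting clock skew, the maximum clock frequency is bounded by the critical-path delay plus the register setup time,

    f_max = 1 / (t_cp + t_su)

so halving the critical path roughly doubles the achievable clock frequency.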

If the co-simulation results do not satisfy the defined constraints of the system, then parts of the software and hardware elements should migrate from one side to the other. An additional problem may also occur with the interface bandwidth or its handling. As a consequence, the number of design space points explodes. In addition, if the initial partition is chosen poorly, the number of optimization iterations may increase rapidly. Finally, if a design space point that satisfies the constraints is found, the system is implemented on the chosen target.

1.2 Problem Statement and Motivation

As stated, porting the software and hardware parts is difficult and tedious work. A behavioral description must be capable of abstracting the architecture characteristics and must encompass both hardware and software design concepts. Today's design flows require completely different abstractions for each processing element.

In addition, the average design time necessary for developing and optimizing designs for heterogeneous platforms is, as a result, much higher due to the separation of work into hardware and software parts. A common practice is to partition "a priori" a part of the design to be executed by the CPU, and then discover that it does not satisfy the system constraints. In this case, it is necessary to rewrite part of the design from scratch so that it can be executed on the hardware component of the platform. The drawback of this approach is that the two successive specifications of the design, although expressing the same semantic behavior in terms of input and output data, have to make use of two different abstractions for execution on programmable hardware or on a processing unit. Not only is such work error-prone, but the functional design verification must also be performed by combining both parts. In summary, the main problems in heterogeneous system design are flexibility and maintainability, both of which can be expressed as Design Abstraction, Reuse, and Modularity.

• Design Abstraction: What level of abstraction should be used for the specification, the design, and the implementation? In the case of heterogeneous platforms, the question is not trivial given the diverse nature of the platform. Different levels of abstraction may be employed depending on the nature and level of requirements and constraints. Thus, behavioral descriptions should be able to seamlessly express both parallel and sequential computation paradigms.

• Modularity: In modular design, such as dataflow design, the system functionality is split into communicating components that divide the complexity of the overall application. The design abstraction should be able to support modularity as data and task parallelism.

• Reuse: Design abstraction and the modularity of an application should permit the reuse of components that can be described for hardware and software parts, and at the same time allow the reuse of parts across the same application family. As an example, many audio codecs share some of the same functional units, such as a discrete cosine transform, which makes modularity a necessity for a developer who maintains a library of audio codecs.

The essential difference between hardware and software is the execution model. In hardware, everything is executed concurrently; conversely, software follows a sequential, memory-based execution model derived from Turing machines. Sequential execution is very efficient in software but, in most cases, a bad choice for hardware. This has serious implications when software programmers design hardware: the algorithms they are familiar with are mostly expressed sequentially. Even though multi-core and many-core platforms are evolving and becoming mainstream in all computing devices (even low-cost ones), programmers have been coding sequentially for the last half century. C and C-like programming languages have conquered sequential software programming, but today there is disagreement about the preferred parallel model of computation that takes the different parallel architectures into consideration. In addition, C and C-like languages have become the mainstream programming languages for heterogeneous computing. Many vendors provide high-level synthesis solutions based on C for the hardware part of their heterogeneous toolchain. As stated in [1], C and C-like languages have the following problems:

• Sequentiality: One fundamental problem with C-like high-level synthesis tools today is that they encourage the programmer to write algorithms sequentially with the promise that the tool has techniques that will automatically expose the parallelism of the sequential code. Unfortunately, those techniques are limited to descriptions that contain few operation dependencies. Concurrency is either supported through libraries or pragmas or automatically detected by the tool, which requires using non-standard C types and dealing with different communication issues. Although a lot of research has been done to support C features and make it the default language for HLS, in the end all tools implement a different variant of C, except for some tools that support a subset of ANSI C. Although SystemC supports all the features above and is a useful language for high-level synthesis, it is not ideal for multi-core/many-core programming because its library is intended for circuit simulation. All these different implementations have led the research community to develop languages based on models of computation that adapt to the specifics of hardware development and parallel programming while also being portable across different platforms.

• Language Limitations: Due to the concurrency limits of the C language, most HLS tools extend C with statements and/or parallel constructs, pragmas, and libraries. Extending C with statements or adding non-standardized constructs introduces a fundamental and far-reaching change to the language, which makes it incompatible with standard C and with other C-based HLS tools. A less intrusive way is to use tool-specific pragmas. Finally, the use of tool-specific libraries locks the developer into a single tool, which makes it very difficult for a company to switch HLS tools.

• Bitwise Types: Each base type in C or C++ is one or more bytes stored in memory. C types can be implemented in hardware, but types smaller than a byte cannot be specified, except by giving the number of bits of a bit-field in a structure. The approach of HLS tools to supporting bitwise types is exactly the same as for concurrency: they either change the C language or provide proprietary libraries.

A potential candidate, which is not limited to hardware development only, is the CAL dataflow programming language. CAL offers the hardware developer the necessary constructs for expressing parallel and sequential code, bitwise types, a consistent memory model, and lossless communication between parallel tasks through queues. Thus, CAL can be used as a single behavioral description for SW and HW processing elements. Most of all, CAL comes with a model of computation that enables the programmer to express applications as networks of processes. The following points describe the properties of CAL (a short CAL sketch follows the list):

• Concurrency & Parallelism Scalability: In parallel programming, programs either scale with the size of the problem or with the size of the code. Developing concurrent parts of a program without much interference is a well-known problem for von Neumann architectures. As a solution, the explicit concurrency of the actor model provides a parallel composition mechanism that tends to lead to more parallelism as the size of an application grows.

• Modularity: The hierarchical structure and the encapsulation of actors provide a high potential for parallelism. In addition, changing an actor will not have an impact on other actors.

• Scheduling: Like procedural programming languages, actors offer full control over the order of execution of actions (i.e., the imperative part of actors).

• Portability: For heterogeneous platforms, portability is still an elusive goal. Using a single representation permits the maintainability and reuse of code between HW and SW processing elements.
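To make these properties concrete, here is a minimal RVC-CAL actor (an illustrative sketch, not taken from this thesis; the actor name and bit widths are arbitrary) showing bit-accurate types, guarded actions, and FIFO-based token communication:

    // Illustrative actor: clamps negative 12-bit samples to zero.
    // int(size=12) is a bit-accurate signed type; In and Out are FIFO ports.
    actor ClampNeg () int(size=12) In ==> int(size=12) Out :

        // Fires when the next input token is negative; emits zero.
        neg: action In:[ x ] ==> Out:[ 0 ]
        guard x < 0
        end

        // Fires otherwise; forwards the token unchanged.
        pos: action In:[ x ] ==> Out:[ x ]
        guard x >= 0
        end
    end

Because each action atomically consumes and produces tokens through queues, such actors compose into networks that run concurrently without shared state.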

CAL dataflow programming is a challenging programming paradigm: it offers a flexible development approach to deal with the increasing complexity of applications, and it offers a large degree of parallelism to exploit the massively parallel capabilities available in modern architectures. The use of a programming language based on the dynamic dataflow model is more advantageous than static approaches, since the developer can be more flexible when expressing a design. Dynamicity facilitates the conception of complex applications with non-constant data structures. Moreover, these dynamic dataflow languages offer a large expressive power along with a practical syntax, both of which are required for industrial-scale development.

Thesis: Dataflow programming contains the necessary features for heterogeneous computing and circumvents the limitations of the imperative model of computation (MoC) of C and C-like programming languages, because it offers abstraction, concurrency, modularity, analyzability, and portability across different processing elements.

To support this statement, this dissertation provides the following contributions:

1. A High-Level Synthesis solution of dataflow programs for heterogeneous platforms.

2. System methodologies for the optimization of implementations at a high abstraction level.

3. An iterative design space exploration with the purpose of minimizing the initial partitioning and assignments to software and hardware processing elements.

4. A design flow supporting all the above features.

Moreover, the thesis describes the implementation of the integrated high-level design flow that embeds new features, optimization tools, and methodological advances into an open source HLS solution for programming heterogeneous platforms. The complete environment, called Xronos, is available under an open source license at https://github.com/orcc/xronos and has been used by two European projects (ACTORS and VAMPA) and by research groups at INSA of Rennes, Lund University, University of Oulu, University of Cagliari, and Heriot-Watt University.

1.3 Design Flow for Dataflow Programs

The architecture of the high-level synthesis design flow and the associated design space exploration, which together offer a complete design flow methodology for programming heterogeneous platforms, is illustrated in Figure 1.2. The design flow contains a set of stages by which, from an abstract representation of the application (i.e., the dataflow program), it is possible to accomplish the synthesis of an integrated circuit, the implementation onto a SW processing unit, or any combination of both. Each stage consists of a single tool or an integration of several tools.

The RVC-CAL design flow is composed of eight stages, which are respectively:

1. Behavioral Description, Architecture, and Constraints: The design is expressed in the RVC-CAL dataflow programming language, based on process network principles. The architecture defines on which kind of platform the design is implemented; it contains operators, media, and links. An operator defines the type of a processing element, the media define the way the platform communicates, and the links are the connections between operators and media. In addition, the constraints are applied to the architecture and define the clocking of each operator as well as the input and output data consumption and production of the design.

Figure 1.2 – RVC-CAL Design Flow. Two directional flows: in black, the top-down implementation; in grey, the iterative feedback.

2. Compiler Infrastructure: The abstract design specification is verified for algorithmic correctness. The design can also be statically profiled for complexity analysis and for identifying the longest computational path occurring when a set of input vectors are processed.

3. Code Generation: According to the architecture defined at the abstract level, the code generation stage generates the source code for execution on the SW architecture and the HDL code for configuring the HW architectures, FPGAs and/or ASICs.

4. Synthesis or Compilation: The generated code is synthesized or compiled using standard tools to obtain software executables and/or hardware binary files/netlists for physical implementation.

5. Performance Estimation: At this stage, platform-specific software profilers and/or HDL testbenches are used to measure the performance of the individual dataflow processing components.

6. Analysis & Profiling: At this stage, the design bottlenecks are iteratively identified and analyzed. Initially, in the early phase of the design process, the buffer sizes of the different dataflow elements and the memory defined in the architecture are estimated and allocated, given the constraints. In a second phase, a more in-depth design space exploration including all dataflow components is applied to the design, and profiling information is extracted. For a software architecture, the design can be partitioned onto different processing units according to optimization objective functions for a given set of input vectors and design constraints. For hardware architectures, different multi-clock-domain partitionings are identified with the goal of reducing power dissipation while respecting throughput constraints. Finally, the performance of the composition of hardware and software architectures can be analyzed with the purpose of verifying that the overall system design constraints are satisfied.

7. Code Refactoring Directions and Compiler Directives: After the Analysis & Profiling stage, feedback is provided on how the dataflow program components, at a high abstraction layer, should be modified to satisfy the design constraints.

8. Implementation: Once a design space point satisfying the constraints is found, the design is implemented on the chosen target platform.

The structure of the design flow is composed of two main paths. The first is a direct top-to-bottom path linking the high-level dataflow program abstraction to the synthesized executable implementation; the second is the iterative system-level design exploration and optimization cycle.

1.4 Thesis Contributions and Organization

The main contributions and the publications (related to each original contribution) of the thesis can be summarized as follows:

1. A high-level synthesis compiler infrastructure that supports the full specification of the RVC-CAL dataflow programming language. The compiler infrastructure is based on two open source projects: Orcc for the RVC-CAL frontend and OpenForge for the backend. Research in [2, 3, 4, 5, 6] extends Orcc to a state-of-the-art compiler and provides the necessary transformations and optimizations to produce a close-to-hardware intermediate representation for OpenForge.

2. An action selection algorithm for the actor execution model. Research in [3, 4] provides support for RVC-CAL "repeat" statements and for guarded conditions on the values of an input list (see the sketch after this list). In addition, parallel reading and writing of list tokens in hardware synthesis accelerates the consumption and production of tokens.

3. Static and dynamic fine-grain profiling data extraction from RTL simulation of synthesized RVC-CAL dataflow programs [3, 7]. The structure of the generated code permits the extraction of timing profiling of the individual execution of each action.

4. A design flow that contains an iterative design exploration framework for RVC-CAL dataflow programs for heterogeneous embedded platforms (FPGA + multi-core CPU). Research in [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] offers a complete design flow for hardware and software from a single RVC-CAL behavioral description. In addition, the design flow includes an open source design space exploration tool that reduces the refactoring iterations for meeting the design constraints.

5. A clock gating strategy that reduces the dynamic power dissipation in synthesized circuits for processes that communicate through queues [19].
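As an illustration of contribution 2, the sketch below (hypothetical actor and port names, not code from the thesis) uses the RVC-CAL "repeat" clause to read four tokens at once, with a guard that peeks at values of the input list; the action selection presented in Chapter 4 maps such multi-token reads and writes to parallel queue accesses in hardware:

    // Illustrative actor: sums blocks of four samples whose first
    // element is non-negative, and silently discards other blocks.
    actor SumBlock () int(size=16) In ==> int(size=18) Out :

        // Reads a list x of 4 tokens; the guard inspects x[0] before firing.
        sum: action In:[ x ] repeat 4 ==> Out:[ x[0] + x[1] + x[2] + x[3] ]
        guard x[0] >= 0
        end

        // Consumes the block without producing any output.
        skip: action In:[ x ] repeat 4 ==>
        guard x[0] < 0
        end
    end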

This dissertation is divided into six parts, each representing a step of the design flow:

Chapter 2 describes the state of the art of high-level synthesis and of design flows for heterogeneous platforms. An introduction to the different types of synthesis and a brief description of heterogeneous architectures are given. Moreover, the definition of high-level synthesis is provided, as well as the evolution of high-level synthesis tools. In addition, the state of the art of the principal high-level synthesis building blocks, such as scheduling, pipelining, and power optimization, is given. Furthermore, the state of the art of design flows for co-design is also provided. The chapter concludes by justifying why the RVC-CAL dataflow programming language is chosen as the behavioral description of the proposed design flow.

Chapter 3 reports on CAL and on its standardized version, RVC-CAL. Examples that demonstrate the expressiveness of CAL are then given. Furthermore, Orcc, the compiler infrastructure for RVC-CAL, is presented, and its Dataflow and Procedural Intermediate Representations are analyzed in depth.

Chapter 4 describes Xronos, the high-level synthesis and embedded software synthesis tool for dataflow programs on heterogeneous platforms. The evolution of the Xronos tool is discussed. After that, the advancements and completions introduced into Orcc's compiler infrastructure to enable hardware synthesis are covered. An in-depth analysis is also given of the construction of the Action Selection procedure, which provides a hardware-friendly execution model of the actor transition system supporting the concurrent reading and writing of multi-tokens in actions. In addition, the Language Independent Model (LIM), a close-to-hardware intermediate representation, and the mapping of the Dataflow and Procedural IR to LIM are presented and examined, with the purpose of giving a detailed overview of the Xronos synthesis process. Subsequently, the embedded software and synthesizable SystemC code generation are described. Finally, experimental results demonstrate the capabilities of the tool for behavioral synthesis and for heterogeneous software and hardware synthesis.

Chapter 5 describes TURNUS, the tool for the iterative design space exploration of dataflow programs. The chapter focuses on design exploration and optimization functionalities. It also demonstrates how run-time profiling data is extracted from heterogeneous platforms. Design performance is estimated by the use of an execution trace post-processor, and it is illustrated how estimation results are used to guide the optimization heuristic during the exploration phases. Furthermore, the concept of the design space critical path is defined and used as the primary metric of the optimization heuristics incorporated in TURNUS. Finally, the iterative design space exploration methodology is presented, and experimental results demonstrate the tool's capabilities and its usefulness for optimizing the throughput of a dataflow program for hardware synthesis.

Chapter 6 describes a clock-gating strategy that reduces the dynamic power dissipation of systems described as processes that communicate through buffers. Moreover, the FPGA clocking architecture is presented, and the kind of clock buffers used for enabling coarse-grain clock gating is described. Finally, experimental results demonstrate that the throughput of a design is not modified by the clock-gating strategy, which does not affect the design flow at all.

Finally, Chapter 7 summarizes the thesis and highlights future work and potential improvements in the overall design flow for hardware, software, and interface synthesis.

2 State of the Art

2.1 Introduction

As the race to shrink the transistor footprint close to a single atom still goes on, the number of transistors on future chips will keep increasing. The need for new design automation methodologies at more abstract levels, where cost, time to market, functionality, and trade-offs are easier to comprehend, is a must for developing future chip generations. Today, VLSI technology has reached a mature level; it is well understood and no longer provides a competitive edge by itself [20]. The industry is now focusing on the product development cycle with the purpose of increasing productivity, and high-level synthesis (HLS) plays a central role there, enabling the automatic synthesis of high-level untimed or partially timed specifications into low-level cycle-accurate RTL specifications for efficient implementation in reconfigurable hardware.


Figure 2.1 – Gajski’s Y-Chart, for different types of synthesis.

Driven by this complexity, there has been a renewed interest (from 2000 onwards) in high-level synthesis of digital circuits from behavioral descriptions. A key change that has taken place since HLS was first explored in the 1970s is that RTL languages such as Verilog and VHDL are now widely accepted. Design synthesis is a process that translates a behavioral description into a structural one. In [21, 20], Gajski et al. used the Y-chart (Figure 2.1), a tripartite representation of design. The axes in the Y-chart represent three different domains of description: behavioral, structural, and physical. Gajski describes how the level of description becomes more abstract as we move farther away from the center of the Y-chart, so a design tool, and the information used by it, can be represented as arcs along a domain's axis or between the axes. Today, the industry masters solutions for both the structural and physical domains, but in the behavioral domain of high-level synthesis there are still open problems that have not yet been fully solved. Despite the past failure of early generations of commercial HLS tools, there is a demand for high-quality HLS solutions for:

• An increasing silicon capacity: requires a higher level of abstraction. Design abstraction is one of the most effective methods for controlling complexity, maintainability, reuse, and modularity, and for improving design productivity. A recent study [22] shows that code size can be reduced by 7 to 10 times when moving from an RTL to a high-level C specification.

• Embedded processors in SoC: FPGA market leaders offer heterogeneous processors and reconfigurable logic on the same die. In addition, the programmable logic part offers digital signal processors (DSPs), memories, and custom logic. That is to say, more software elements can be involved in the process of designing complex embedded heterogeneous systems. An HLS tool included in a design flow allows developers to specify functionality in a high-level behavioral description for both embedded software and reconfigurable hardware. In other words, developers can quickly experiment with software/hardware partitioning and explore trade-offs such as performance, area, and power from a single behavioral description.

• IP reuse: improves design productivity. As opposed to RTL Intellectual Property (IP), which has fixed interface protocols and a fixed micro-architecture, behavioral IP can be re-purposed to different technologies (a greater range of FPGA families and vendors) or system requirements.

The rest of this chapter summarizes the state of the art of HLS and of hardware and software design flows that partially solve the previous problems. The following sections review the generations of HLS tools through the years and cite previous work on each step of the high-level synthesis flow. In addition, several dataflow design flows that permit HW and SW co-design are also cited. This chapter concludes on the need for a new behavioral description programming language that fits not only HLS but also parallel architectures such as many-core/multi-core and hybrid hardware and software architectures.


2.2 Heterogeneous platforms

An FPGA, or Field-Programmable Gate Array, is an integrated circuit designed to be a reconfigurable circuit. The very first FPGA, the XC2064, was constructed by Xilinx and announced on November 1, 1985. An FPGA is essentially a matrix of logic cells called slices; a generic FPGA architecture is depicted in Figure 2.2. Slices, depicted in Figure 2.3, are connected among themselves and with Input/Output (IO) blocks through routing channels. They are distributed over the FPGA in horizontal and vertical form, and their connections are configured using a programmable switch matrix.


Figure 2.2 – Generic FPGA architecture.

Each slice contains Look-Up Tables (LUTs), several Flip-Flops (FFs), and programmable multiplexers (MUXs). These elements vary for each FPGA family. General parallel circuits and complex functions are implemented in an FPGA by associating these elements with one another. FPGAs also contain RAM blocks (BRAMs) that can be configured in different combinations and can also act as ROMs, as well as hard-wired arithmetic circuits. Newer Xilinx architectures have added the notion of the Configurable Logic Block, which contains two kinds of slices: SLICEM and SLICEL. SLICEL and SLICEM support LUTs, eight storage elements, wide-function multiplexers, and carry logic. In addition, SLICEM supports two additional functions: storing data using distributed RAM (memory in LUTs) and shifting data with 32-bit registers. In Altera FPGAs, the equivalent of the slice is called the ALM, or Adaptive Logic Module.
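For intuition about LUT-based logic (a generic property of LUTs, not tied to any vendor): a k-input LUT stores one configuration bit per input combination and can therefore implement any Boolean function of its k inputs,

    bits(k) = 2^k        functions(k) = 2^(2^k)

so the four-input LUT of Figure 2.3 stores 2^4 = 16 bits and can realize any of the 2^16 = 65536 four-input functions.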

Figure 2.3 – Slice found on Virtex-4 FPGAs.

All new Xilinx FPGAs incorporate DSP48 arithmetic modules. A DSP48 slice has a two-input multiplier connected to multiplexers and a three-input adder/subtractor. The multiplier accepts two 18-bit two's complement operands and produces a 36-bit two's complement result. The result is sign-extended to 48 bits and can optionally be fed to the adder/subtractor. The adder/subtractor accepts three 48-bit two's complement operands and produces a 48-bit two's complement result. In addition, Altera is adding hard-wired floating-point DSP blocks to its FPGAs.
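As a quick check of these bit widths (simple two's complement range arithmetic): an 18-bit two's complement operand lies in [-2^17, 2^17 - 1], so the largest possible product magnitude is

    (-2^17) * (-2^17) = 2^34

and since an n-bit two's complement word can hold positive values only up to 2^(n-1) - 1, representing 2^34 requires n = 36 bits, which is exactly the width of the DSP48 multiplier result.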

Figure 2.4 – Xilinx DSP48E1 (image courtesy of Xilinx Inc.)

A new trend at both Xilinx and Altera is to add ARM cores on the same die as the FPGA; the architecture of the Xilinx Zynq 7000 is depicted in Figure 2.5. Both vendors had already added PowerPC cores, but back then those FPGAs were too expensive and saw poor adoption. The new Altera SoCs and Zynq FPGAs have seen great adoption by the open source community and by companies for co-design applications. The design flow presented in Figure 1.2 fits these architectures perfectly, because a single behavioral description makes it possible to target both the CPU cores and the FPGA.

Figure 2.5 – Xilinx Zynq 7000 Architecture (image courtesy of Xilinx).

Finally, most commercial HLS tools support both FPGAs and Application Specific Integrated Circuits (ASICs) as target architectures. An ASIC is an integrated circuit customized for a particular use and is not reconfigurable. These ICs exist in different types, such as standard cell, full custom, and gate-array ASICs. It should be noted that the design flow of this thesis is oriented towards FPGAs, but it can also be applied to ASICs, even though this has not been verified by the author. The interested reader may consult the following references for more information on the ASIC architecture [23] and on HLS for ASICs [20, 24].

2.3 High-Level Synthesis


Figure 2.6 – A generic High-Level Synthesis Flow.

A High-Level Synthesis flow consists of a set of steps that from an abstract behavioral rep- resentation to a Register Transfer Layer netlist is created. The flow starts with a behavioral description, in which the designer specifies in a formal language the design algorithm. The next step is the compiler infrastructure that parses the behavioral description and produces a single or several Intermediate Representation(s) (IR) of it. In which the compiler is applying a

set of procedural optimizations that reduce and optimize the description; one of the most used procedural optimizations is pipelining. The next step is to create a graph called a CDFG (Control-Data Flow Graph), which captures all the control and data-flow dependencies of the given IR. Scheduling is then applied to this graph. Scheduling is the process that partitions the CDFG into subgraphs so that each subgraph is executed in one control step, taking into account the designer's constraints applied to the behavioral description. Scheduling thus converts the behavioral description into a set of register transfers that can be described by an FSM. From the FSM, the control step sequence and the conditions used to determine the following control step are derived; the datapath is derived from the register transfers assigned to each control step. The datapath is a netlist composed of three types of register transfer components: functional, storage, and interconnection units. Functional units, such as adders, subtractors, shifters, and multipliers, execute the operations specified in the behavioral description. Storage units, such as registers, RAMs, and ROMs, hold the values of variables generated and consumed during the execution. Interconnection units, such as buses and multiplexers, transport data between functional units and between storage units and functional units. Allocation and binding follow. Allocation consists of selecting the number and types of components to be used in the design. Binding maps the variables and operations in the scheduled CDFG onto functional, storage, and interconnection units. The next step, which is optional and not performed by all HLS tools, is power optimization, which targets the dynamic and, if an ASIC is the target architecture, the static power dissipation. Dynamic power dissipation is caused by transistor switching, as a result of which charges are moved along wires; static dissipation is the outcome of transistor leakage current. Power optimization addresses either both kinds of dissipation or only the dynamic one. Finally, an RTL netlist is generated by the HLS tool. The final step of the design is the interfacing of the RTL netlist, and logic synthesis is applied to obtain the final implementation.
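To make the allocation and binding steps concrete, the following toy sketch (a hypothetical illustration, not taken from any tool discussed in this thesis) binds the operations of an already scheduled CDFG for y = a*b + c*d onto allocated functional units:

# Binding a scheduled CDFG onto allocated units. With two multipliers
# allocated, m1 and m2 can share control step 1; the adder runs in step 2.

scheduled = {1: ["m1", "m2"], 2: ["a1"]}                # control step -> ops
kind      = {"m1": "mul", "m2": "mul", "a1": "add"}     # op -> resource type
allocated = {"mul": ["MUL0", "MUL1"], "add": ["ADD0"]}  # allocation result

binding = {}
for step, ops in scheduled.items():
    free = {k: list(units) for k, units in allocated.items()}  # free this step
    for op in ops:
        binding[op] = free[kind[op]].pop(0)                    # bind op to a unit
print(binding)  # {'m1': 'MUL0', 'm2': 'MUL1', 'a1': 'ADD0'}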

2.3.1 HLS tools evolution

Grant Martin and Gary Smith in [25] have divided the evolution of High-Level Synthesis into three generations plus a primal one. The current generation is the third, and most of its tools are C based (including C++ and SystemC). The late 1970s up until the 1980s is the primal period; the first generation runs from the 1980s to the 1990s, the second generation from the mid-1990s to the early 2000s, and the third from the early 2000s to today.

The first generation was oriented mainly toward data-path research. In the second generation, mainly commercial products were driven by hardware description languages (HDL). The third and current generation follows the trend of C-based HLS oriented toward datapath applications.

The pioneering tool of the "primal" period is the Carnegie-Mellon University design automation system, or CMU-DA. Its design flow is represented in Figure 2.7, and it was built at Carnegie Mellon University in the 1970s [26, 27]. The work focused on design specification, simulation, and


RTL synthesis. Common software code transformations used by today's tools, such as dead-code and redundant sub-expression elimination, constant propagation, and code motion, were used in the synthesis process. The instruction set processor specification (ISPS) language is used as the design specification [28]. The data-memory allocators perform a mapping from the algorithmic ISPS description to the data-path part of the hardware implementation. The module selection then binds the abstract components to a database of specified modules, and finally a controller for the components is produced.


Figure 2.7 – The CMU design system, one of the earliest HLS.

Although it was groundbreaking research, this early work had little impact on industrial design, because at that period large electronics companies were not yet using, or were only starting to adopt, CAD/CAM systems.

During the first generation, a plethora of tools for research and prototyping were built. MIMOLA [29, 30] is a design method whose purpose is to produce digital processors from high-level behavioral specifications; its design system combines both compiler construction and hardware-oriented concepts. Advanced Design AutoMation, or ADAM [31, 32], was a unified framework with a restricted natural-language interface (a dataflow graph representing the behavioral specification) that contained program tools that synthesized RTL designs from behavioral descriptions, and prediction tools that guide the designer in exploring the design space. In addition, the Sehwa [33] tool in ADAM can generate pipelined implementations by exploring the design space; it is able to synthesize and perform high-level estimates of the area-delay characteristics of designs and to determine the best design that meets the given constraints. Hardware ALlocator, or HAL [34], is a data path synthesis tool with three characteristics. Firstly, it analyzes the input data flow graph and attempts to evenly distribute operations with similar resources using a load balancing technique. Secondly, it provides a global pre-selection of operator cells to fulfill speed constraints, together with register and multiplexer optimizations. Finally, HAL proposed a well-known scheduling technique called force-directed scheduling in [35] and a conflict-graph coloring technique for sharing resources in the datapath [34]. Flamel [36], a Pascal-to-gates HLS, extracts parallelism through block-level transformations.

Hercules/Hebe [37, 38] is a C-to-gates HLS. Hercules introduced a method called Reference


Stack, a one-pass transformation on the parse tree that resolves variable/constant unfolding, conditional assignments, and multiple assignments. This transformation also provides information to structural synthesis that minimizes the number of registers. In addition, Hercules introduced an elegant way to handle operations with unbounded delay, called relative scheduling [39]. Hyper/Hyper-LP [40, 41] is a high-level synthesis system oriented toward power minimization using architectural and computational transformations. Cathedral/Cathedral-II was specially designed for the synthesis of digital signal processing hardware [42]. In addition, proprietary in-house tools from IBM [43], Philips [44], and Motorola [45] were also developed.

The second generation started when major EDA companies such as Cadence, Mentor, and Synopsys began to offer behavioral HDL to RTL synthesis; the most important offerings were Mentor's Monet [46] and Cadence's Visual Architect [47]. In [25] several reasons are highlighted for the failure of the second generation. First, it failed to replace established RTL design because behavioral HDLs were not popular among system designers and required a new, steep learning curve. In addition, the quality of results was often widely variable, unpredictable, and hard to validate, because no verification methods were available at the time; HLS produced poor results for control-dominated algorithms; and simulation times were almost as long as those of RTL synthesis.

Third-generation HLS tools focus on using C or C-like languages as behavioral descriptions. As discussed in the Introduction chapter, the difficulty of expressing parallelism in C motivated academics and companies to introduce additional language extensions and restrictions to make C more amenable to hardware synthesis. In this way, the developer is discouraged from using dynamic structures such as pointers with unknown bounds and recursive functions. The best-known third-generation tools are HardwareC [39], SpecC [48], Impulse-C [49], Forte's Cynthesizer [50] (since acquired by Cadence), Calypto's Catapult [51], NEC CyberWorkbench [52], Synopsys Synphony C [53], and AutoESL's AutoPilot, acquired by Xilinx and renamed Vivado HLS [54]. Tool vendors often restrict publication of benchmarking results, but indications are that third-generation HLS tools are achieving reasonable success. Third-generation tools improve on the second generation because they focus on application domains (dataflow and DSP design), achieving reasonably good designs. Furthermore, they provide the "right" input languages, better suited to application software developers (e.g., C-like languages and Matlab). Finally, the quality of results is better because HLS can take advantage of compiler-based (procedural) optimizations.

2.3.2 Behavioral Description

High-level languages make writing, debugging, and verifying complex applications more efficient. How high-level a programming language is considered to be depends on the context: for a VLSI engineer doing custom design, VHDL is high-level; for a firmware engineer, C is high-level; but for a web programmer, C is very low-level compared to JavaScript. Each programming language is built for a need. C was developed for


Figure 2.8 – Three of the most used third-generation HLS tools on the market: (a) Calypto Catapult, (b) Forte Cynthesizer, and (c) NEC CyberWorkbench. All three focus on HLS for ASICs; Catapult and, more recently, CyberWorkbench also offer FPGA support.

executing sequential code on a sequential processor, whereas VHDL and Verilog were designed to replace hand-written RTL designs.

Most programming languages today are sequential, although parallel processors are becoming the standard: multi-core and many-core CPUs, general-purpose GPUs, and

hybrid multi-cores alongside an FPGA. Currently there is no parallel language that targets all of them with a single representation. As discussed, for a classic programming language like C, libraries are used to exploit parallelism; only recently has native thread support appeared, with C++11. Currently only data parallelism is efficiently supported, by the OpenCL and CUDA APIs for GPGPUs. FPGAs are naturally parallel machines, and Verilog and VHDL exploit their full capability for task and data parallelism, but HDLs are unacceptable to most application software developers.

In this section, the state of the art of HLS languages and their corresponding tools is given, the majority of them being C based.

C-like HLS Languages

One of the earliest C-to-gates tools was Cones [55]. It synthesizes single functions, called cones, into combinational blocks. Those blocks were modeled in the C programming language using assignments, branching, loops, and iterative constructs. Ku and De Micheli developed HardwareC [56] as the input of the Olympus [38] synthesis system. HardwareC is a behavioral hardware language with C-like syntax and much greater expressive power than Cones: it has extensive support for hardware-like structure and hierarchy, and supports concurrency as well as structural and timing constraints. Transmogrifier C [57], now called FpgaC, is a small C subset that supports branches, loops, and preprocessor directives; as a disadvantage, it does not support multiplication, division, pointers, arrays, structures, or recursion. Celoxica's Handel-C extends the C language with constructs for parallel statements and Occam-like communications. The CompiLogic C2Verilog compiler, from C Level Design and now part of Synopsys, supports a broad subset of ANSI C and is capable of handling features such as pointers and dynamic memory allocation; the compiler and its transformations are described in detail in the patent [58]. The SpecC language by Gajski et al. [48] is a superset of ANSI C that includes many system and hardware constructs such as FSMs, concurrency, and pipelining. Although not all constructs of SpecC are synthesizable, the designer can manually refine a SpecC program into one that is. Bach C [59] from Sharp supports explicit concurrency and rendezvous communications; every operation is sequenced, and arrays are supported but not pointers.

C++ can also be used as an HLS language for synthesizing to RTL, and some compilers supporting it can even synthesize a subset of SystemC. SystemC is a C++ library that supports hardware and system modeling. An HLS tool specialized in SystemC is the Cynthesizer from Forte Design Systems, which was acquired by Cadence in 2014; Cynthesizer supports a strict subset of SystemC's TLM. Calypto's Catapult C, previously from Mentor Graphics, performs behavioral synthesis from a strict subset of ANSI C/C++ and SystemC. Vivado HLS, formerly AutoESL's AutoPilot, is one of the most recent HLS tools. Vivado HLS also synthesizes RTL from C, C++, and SystemC. It is based on the open source LLVM framework and uses Clang, the LLVM C frontend; as an advantage, with every iteration of LLVM, Vivado HLS naturally inherits the latest LLVM compilation optimizations. NEC CyberWorkbench targets behavioral synthesis and has been used in industry for many years. CyberWorkbench supports BDL [60], and even

though it deviates from C by adding support for I/O ports, specific types and operators, explicit clock cycles, and pragmas, it can also synthesize C++ and SystemC.

StepNP [61] is a system-level exploration framework based on SystemC and targeted at network processors. It enables rapid prototyping and architectural exploration and provides well-defined interfaces between processing cores, co-processors, and communication channels to allow the use of component models at different levels of abstraction. It enables the creation of multi-processor architectures with models of interconnects (functional channels, NoCs), processors (simple RISC), memories, and coprocessors.

BlueSpec [62] takes as input a SystemVerilog or SystemC subset and manipulates it with technology derived from term rewriting systems (TRS), initially developed at MIT by Arvind et al. It offers an environment to capture successive refinements of an initial high-level design that are guaranteed correct by the system.


Figure 2.9 – FCUDA: CUDA to FPGA Flow.

The CUDA and OpenCL general-purpose GPU development languages, both C-like, have expanded their capabilities toward hardware synthesis. FCUDA [63] adapts the CUDA programming model to an FPGA design flow that maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. The primary goal of FCUDA is to convert thread blocks into C functions and then use a C-to-gates HLS for synthesis. SOpenCL [64] generates hardware circuits from OpenCL programs, as FCUDA does. SOpenCL is based on a source-to-source code transformation step that coarsens the OpenCL fine-grained parallelism into a series of nested loops, and on a template-based hardware generation back-end that configures the accelerator based on the functionality and the application's performance and area requirements. Altera's OpenCL SDK lets developers test their algorithms on a personal computer; their OpenCL compiler then converts the OpenCL program into an FPGA bitstream.

Other programming Languages used for HLS

JHDL, or Just another Hardware Description Language [65], is an HLS language that focuses on designing circuits through an object-oriented approach; it synthesizes Java 1.1 without further language extensions. Sea Cucumber [66] is another Java HLS that permits developers

to describe a circuit's coarse-level parallelism as concurrent threads. Kiwi [67] is a C#-based HLS that accepts the intermediate language output of either the .NET or the open source Mono C# compiler and generates Verilog. Pebble [68] is a language for parameterized and reconfigurable hardware design with a simple block-structured syntax; its objective is to support the development of designs that can be reconfigured at run-time. Esterel [69] is a synchronous programming language for developing systems that react continuously to their environment.


Figure 2.10 – Koski, a UML-based Design Flow for HW-SW Prototyping.

The Unified Modeling Language (UML) is used in software engineering for designing large software programs. A complete design flow using UML for system modeling was achieved by Kukkala et al. and is called Koski; it is represented in Figure 2.10. The target of the Koski design flow [70] is multiprocessor System-on-Chip (SoC). It is a library-based method that hides unnecessary details from the high-level design phases but does not require a plethora of model abstractions. The design flow provides an automated path from UML design entry to FPGA prototyping, including functional verification, automated architecture exploration, and back annotation. The design of the architecture is based on the application model, resulting in an application-specific implementation. Hailpern et al. [71] highlighted that graphical languages are not well accepted because they are slower to use than writing code.


2.4 Scheduling of Operators, Operators Pipelining and Power Optimization in HLS

2.4.1 Scheduling of Operators

Scheduling is the process of splitting the IR into states and control steps, with the purpose of performing a temporal mapping of the given representation. Any behavioral description and its intermediate representation consist of a sequence of operations to be performed by the synthesized hardware. The task of scheduling consists of partitioning these operations into time steps such that each operation, or set of operations, is executed in one time step.

The most popular way of modeling operator scheduling is the Finite State Machine with Datapath (FSMD). As a matter of fact, FSMDs are used to describe digital systems at the register transfer level. An FSMD consists of an FSM, called the control unit, and a datapath. The datapath provides the storage and functional units necessary for the system. The FSM consists of a set of states that correspond to the time/control steps of the schedule, a set of transitions between the states, and a set of actions involving the datapath associated with each transition.
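As an illustration, a minimal FSMD can be sketched as a table mapping each control state to a datapath action and a successor state. The toy example below (a hypothetical sketch, not the representation used by the tools of this thesis) accumulates a + b over two control steps:

# state -> (datapath action over the registers, next state); None ends.
fsmd = {
    "S0": (lambda regs: regs.update(acc=regs["a"]),               "S1"),
    "S1": (lambda regs: regs.update(acc=regs["acc"] + regs["b"]), "S2"),
    "S2": None,
}

def run(fsmd, regs, state="S0"):
    while fsmd[state] is not None:
        action, state = fsmd[state]
        action(regs)             # one register transfer per control step
    return regs

print(run(fsmd, {"a": 2, "b": 3, "acc": 0}))  # acc == 5 after two steps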

Scheduling Algorithms


Figure 2.11 – Classification of the best-known scheduling algorithms.

In the following, the most used scheduling algorithms are described. Scheduling algorithms fall into three categories: basic algorithms such as ASAP and ALAP, time-constrained algorithms, and resource-constrained algorithms; algorithms that combine them are also described. Time-constrained scheduling is essential for designing real-time applications such as DSPs, where the main objective is to minimize the cost of the hardware. Resource-constrained scheduling has as its goal to produce schedules that give the best possible performance while still meeting the given resource constraints.

The most basic scheduling algorithms are As Soon As Possible (ASAP) and As Late As Possible (ALAP). The ASAP algorithm starts with the highest nodes in the CDFG and assigns time steps in increasing order as it proceeds downwards. ASAP considers that a successor node

can execute only after its parents have executed. The algorithm schedules in the least number of control steps, but it does not take resource constraints into account. Two examples of tools using ASAP scheduling are Facet [72] from CMU/Bell Laboratories and CATREE [73].

ALAP, contrary to ASAP, starts at the bottom of the CDFG and proceeds upwards. The algorithm gives the slowest possible schedule, taking the maximum number of control steps. This approach is a refinement of ASAP scheduling with conditional postponement of operations. ALAP is used in the MIMOLA [74] system to postpone concurrent operators when there are not sufficient functional units.
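Both algorithms can be sketched in a few lines. The following fragment (a hypothetical illustration with unit-delay operations, not code from this thesis) computes both schedules; their per-operation difference is the mobility used by several of the algorithms below:

# 'deps' maps each operation to the operations whose results it consumes.

def asap(deps):
    """Earliest control step allowed by each operation's parents."""
    step = {}
    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return step[op]
    for op in deps:
        visit(op)
    return step

def alap(deps, total_steps):
    """Latest control step that still meets the total_steps deadline."""
    succs = {op: [s for s in deps if op in deps[s]] for op in deps}
    step = {}
    def visit(op):
        if op not in step:
            step[op] = min((visit(s) - 1 for s in succs[op]), default=total_steps)
        return step[op]
    for op in deps:
        visit(op)
    return step

# y = (a*b) + (c*d); z = y*e  -- multiplies m1, m2, m3 and addition a1
deps = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": ["a1"]}
s_asap = asap(deps)                        # {'m1': 1, 'm2': 1, 'a1': 2, 'm3': 3}
s_alap = alap(deps, max(s_asap.values()))  # mobility = alap - asap per operation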

Integer Linear Programming (ILP) [75] tries to find an optimal schedule using a branch-and-bound algorithm. It also involves backtracking; that is, decisions made earlier may be changed later. The size of the ILP formulation grows rapidly with the number of control steps: each unit increase in the number of control steps adds n decision variables, so the execution time of the algorithm also increases rapidly. In practice, the ILP approach is applicable only to a limited set of applications. Heuristic methods, such as scheduling one operation at a time based on some criterion, can eliminate the ILP backtracking, saving considerable computation time.
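For illustration, a standard time-constrained ILP formulation found in the HLS literature (not necessarily the exact formulation of [75]) introduces binary variables $x_{i,s}$ that equal 1 when operation i is scheduled in control step s of its mobility range, and unit counts $N_k$ with cost $c_k$ per resource type k:

\[
\begin{aligned}
\min\quad & \sum_{k} c_k N_k \\
\text{s.t.}\quad & \sum_{s} x_{i,s} = 1 \quad \forall i
  && \text{(every operation scheduled exactly once)} \\
& \sum_{i:\,\mathrm{type}(i)=k} x_{i,s} \le N_k \quad \forall k, s
  && \text{(resource bound in every step)} \\
& \sum_{s} s\, x_{j,s} \;\ge\; \sum_{s} s\, x_{i,s} + 1 \quad \forall\, i \to j
  && \text{(data dependencies)}
\end{aligned}
\]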

Force-Directed Scheduling (FDS) [35] is popular for time-constrained scheduling. The main goal of the algorithm is to reduce the total number of functional units used in the implementation of the design. The algorithm achieves its objective by uniformly distributing operations of the same type (for example, multiplications) over all available states. This uniform distribution ensures that functional units allocated to perform operations in one control step are used efficiently in all other control steps, which leads to high unit utilization. Like ILP, the FDS algorithm uses both ASAP and ALAP to determine the range of control steps for every operation. The FDS algorithm never backtracks on its decisions and is hence classified as a constructive algorithm.
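The central data structure of FDS can be sketched as follows (a simplified, hypothetical illustration that computes only the distribution graph, not the forces themselves): each operation contributes uniformly over its mobility window between its ASAP and ALAP steps:

def distribution_graph(ops, s_asap, s_alap, steps):
    """DG[s] = sum over ops of 1/(window size) for each step in the window."""
    dg = {s: 0.0 for s in range(1, steps + 1)}
    for op in ops:
        window = range(s_asap[op], s_alap[op] + 1)
        for s in window:
            dg[s] += 1.0 / len(window)
    return dg

# Two multiplications: m1 fixed in step 1, m2 free to move between steps 1-2.
print(distribution_graph(["m1", "m2"], {"m1": 1, "m2": 1},
                         {"m1": 1, "m2": 2}, steps=2))
# {1: 1.5, 2: 0.5} -- FDS would place m2 in step 2 to balance the graph.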

Iterative Rescheduling [76] was developed due to the lack of a look-ahead scheme in the FDS algorithm, which may yield sub-optimal solutions. The idea is to take an initial schedule produced by some scheduling algorithm and to reschedule one operation at a time. An operation can be rescheduled into an earlier or a later step as long as it does not violate the data dependence constraints. The essential issues in rescheduling are the choice of a candidate for rescheduling, the rescheduling procedure, and the control of the improvement process. Iterative Rescheduling is based on the paradigm originally proposed for the graph bisection problem by Kernighan and Lin [77].

List-based scheduling [20] is a generalization of the ASAP algorithm. A list-based algorithm maintains a priority list of nodes whose predecessors have already been scheduled. A priority function, which resolves any resource contentions, sorts all the operations in the priority list. In each iteration, operations with higher priority are scheduled first, and lower-priority operations are deferred to later control steps. When an operation is scheduled into a control step, it may make other, previously non-ready operations ready; these operations are then inserted into the list

according to the priority function. The quality of results (QoR) of list-based scheduling depends on its priority function.
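A minimal resource-constrained list scheduler might look as follows (a hypothetical sketch with unit-delay operations; a real implementation would typically use a mobility-based priority function rather than the alphabetical one used here for brevity):

def list_schedule(deps, kind, avail, priority):
    schedule, done, step = {}, set(), 1
    while len(done) < len(deps):
        ready = [op for op in deps
                 if op not in done and all(p in done for p in deps[op])]
        ready.sort(key=priority)          # smaller priority value goes first
        used = {k: 0 for k in avail}
        for op in ready:                  # fill this control step
            if used[kind[op]] < avail[kind[op]]:
                schedule[op] = step
                used[kind[op]] += 1
        done.update(op for op, s in schedule.items() if s == step)
        step += 1
    return schedule

deps = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": ["a1"]}
kind = {"m1": "mul", "m2": "mul", "a1": "add", "m3": "mul"}
# With a single multiplier, m1 and m2 can no longer share a control step:
print(list_schedule(deps, kind, {"mul": 1, "add": 1}, priority=lambda op: op))
# -> {'m1': 1, 'm2': 2, 'a1': 3, 'm3': 4}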

2.4.2 Operators Pipelining

In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next [78, 79]. The elements of a pipeline are executed in a time-sliced fashion, with pipeline registers inserted between the elements (pipeline stages). The pipeline stage time has to be larger than the longest delay between pipeline stages. A pipelined system consumes more resources than one that executes one batch at a time, because its stages cannot reuse the resources of a previous stage.

Numerous languages and intermediate representations have been created for describing pipelines. The C programming language is widely used as the behavioral input of pipelining tools [80, 81, 82]. Pipelines can be represented as data flow graphs (DFG) [83], signal flow graphs [84, 85], transactional specifications [86], and other notations [87, 88, 89]. They can also be synthesized from binaries [81]. The concurrent algorithmic language CAL has been developed for representing pipelined networks of actors [90, 91].

A pipelined system is characterized by several parameters, such as clock cycle time, stage cycle time, number of pipeline stages, latency, data initiation interval, turnaround time, and throughput. A pipeline synthesis problem can be constrained by resources, by time, or by a combination of both. Given the available hardware, the objective of a scheduler is to find a pipeline schedule with maximum performance; given a constraint on throughput, the goal of a scheduler is to find a pipeline schedule consuming minimum hardware.
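The following small worked example (under illustrative assumptions only: balanced stages and a clock period equal to the stage time) relates some of these parameters:

def pipeline_metrics(stage_time_ns, n_stages, initiation_interval, n_items):
    latency = n_stages * stage_time_ns                 # time for one item
    # A new item enters every II cycles; turnaround time for n items:
    turnaround = (n_stages + (n_items - 1) * initiation_interval) * stage_time_ns
    throughput = 1e3 / (initiation_interval * stage_time_ns)  # items per us
    return latency, turnaround, throughput

# 5 stages of 2 ns each, one new item per cycle (II = 1), 1000 items:
print(pipeline_metrics(2.0, 5, 1, 1000))  # (10.0 ns, 2008.0 ns, 500.0 items/us)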

An important concept in circuit pipelining is retiming, which exploits the ability to move registers within the circuit in order to decrease the length of the longest path while preserving its functional behavior [92, 93, 94]. The aim of constrained min-area retiming is to minimize the number of registers for a target clock period, under the assumption that all registers have the same area. In the retiming problem, the objective function and constraints are linear, so linear programming techniques are used to solve it. Retiming assumes that the degree of functional pipelining has already been fixed and considers only the problem of adding pipeline buffers to improve the performance of a circuit.

Sehwa [33] can be considered the first pipeline synthesis program. For a given constraint on resources, it generates a pipelined data path with minimum latency. Sehwa minimizes the time delay using a modified list scheduling algorithm with a resource allocation table. The force-directed scheduling proposed in [35] and modified in [85, 95] performs time-constrained functional pipelining. ATOMICS [96] performs loop optimization, starting with an estimation of latency and inter-iteration precedences. The pipelined DSP data-path synthesis system SODAS [84] takes a signal flow graph as input and generates trade-offs between pipeline designs by changing synthesis parameters such as data initiation interval, clock cycle

time, and number of pipeline stages. In [97] an adaptation of ASAP list scheduling and an iterative modulo scheduling are used for design space exploration based on slow but area-efficient modules and fast but area-consuming modules. Speculative loop pipelining from binaries is proposed in [81]: it speculatively generates a pipeline netlist at compile time and modifies it according to the results of runtime analysis. The automatic pipelining in [86] takes user-specified pipeline-stage boundaries and synthesizes a pipeline that allows concurrent execution of multiple overlapped transactions in different stages.

Integer Linear Programming is a very popular formulation of pipeline optimization problems, although pipeline parameters can often be described precisely only by nonlinear functions. Spaid [98] finds a maximally parallel pattern using ILP. In [99] an ILP formulation and a reduction of it are used for rapid pipeline design space exploration; an ILP formulation of the minimization problem, where delay buffers may need to be introduced for synchronizing the data paths, is also proposed there.

Pipelining is an effective method for optimizing the execution of a loop with or without loop-carried dependencies. Highly concurrent implementations can be obtained by overlapping the execution of consecutive iterations. Forward and backward scheduling are used iteratively to minimize the delay, in order to free silicon area for allocating additional resources that in turn increase throughput. The loop winding method was proposed in Elf [100]. Percolation-based scheduling [101] deals with loop winding by starting from an optimal schedule [102] obtained without considering resource constraints. PLS pipelining is an effective method [72] to optimize the execution of a loop, especially for DSP. The rotation scheduling of loop pipelining by means of retiming processing nodes is introduced in [83]. In [80] a pipeline vectorization method is proposed that pipelines the innermost loops in a loop nest based on removing vector dependencies; known transformation techniques such as loop unrolling, tiling, merging, distribution, and interchange are adapted to pipeline vectorization.

2.4.3 Power Optimization

Due to the rapid growth of personal wireless communication, power optimization has attracted a lot of attention. Most research in power estimation and optimization of VLSI circuits has concentrated on the logic and lower levels of the design hierarchy [103, 104, 105]. Yet several research publications [106, 107, 108] have shown that the largest savings in power consumption are often obtained at the higher levels of the design hierarchy.

The first efforts on architectural power optimization were presented by Chandrakasan et al. in [106] and [107]. In [106], the authors use architectural parallelism, by means of data path replication and pipelining, to enable supply voltage scaling for power reduction. Another way of reducing power consumption, using compiler transformations, was introduced in [107]. In [109], the authors analyze activity metrics at a high level for adders and multipliers and derive architectural transformations for synthesizing low-power circuits, with

the goal of identifying data flow graph transformations that reduce overall circuit activity rather than accurately predicting power consumption.

Optimizing memory-dominated computations for power consumption was addressed in [110, 111, 112]. In [110] and [112], the crucial impact of memory-related power consumption on the global system power budget is shown, in particular for systems with intensive memory access patterns and multi-dimensional signal processing subsystems. In [111], reducing computational complexity at the algorithmic and architectural levels is shown to have a high impact on power reduction.

Methods for performing data path allocation and assignment with the aim of minimizing the switched capacitance in the data path were given in [113, 114, 115, 116]. In [113], an allocation method for low power is investigated that optimizes the controller to reduce data path power dissipation. The effects of hardware sharing on power dissipation, switched capacitance, and transition activity are presented in [114]. The authors of [115] formulate a minimum-cost clique covering to minimize the switching activity of registers that are shared by different data values. A design technique to reduce the energy dissipated in the switching of buses is proposed in [116]. A technique based on reducing the activity of functional units during high-level synthesis was proposed in [117]. The use of limited-weight codes to minimize power consumption in buses and I/O circuitry was described in [118].

Another method for reducing power consumption is the use of multiple supply voltages, which is well researched; several studies have appeared in the literature. The authors of [119] apply resource- and latency-constrained scheduling algorithms to minimize power/energy consumption when the resources operate at multiple voltages. A set of datapath scheduling algorithms for the simultaneous minimization of peak and average power is proposed in [120, 121]. In [122], an algorithm called MOVER (Multiple Operating Voltage Energy Reduction) is proposed to minimize datapath energy dissipation through the use of multiple supply voltages. An algorithm named MuVoF is proposed in [123] to perform multivoltage, multifrequency, low-energy high-level synthesis for functionally pipelined datapaths under resource and throughput constraints. A dynamic programming technique for solving the multiple supply voltage scheduling problem in both non-pipelined and functionally pipelined data-paths is presented in [124]. An ILP model in [125] offers variable scheduling techniques that consider, in turn, timing constraints alone, resource constraints alone, and timing and resource constraints together for design space exploration.

Compared to ASIC chips, FPGAs are perceived as power-inefficient because of the large number of transistors their reconfigurable architecture requires. As a consequence, FPGAs have been kept away from low-power applications. As FPGAs now have millions of gates, and given the increase in design complexity and the need to reduce design time for an early time-to-market, there is a need to estimate power consumption at a higher level.

Jha et al. [126] present an HLS approach for synthesizing power-optimized as well as area-optimized circuits from hierarchical data flow graphs under throughput constraints. The authors

of [127] explore the accuracy of applying Rent's rule [128] for wire length estimation during high-level synthesis for FPGA architectures. Then, given the importance of switching activity for power estimation, they adopt a fast switching-activity calculation algorithm [129]. They then build a simulated annealing engine whose cost function is the power estimate; during the annealing process, resource selection, functional unit binding, scheduling, register binding, and data path generation are carried out simultaneously. Finally, they apply a MUX optimization algorithm to further reduce the power consumption of the design. This approach considers neither multiple-clock designs nor clock gating.

Globally Asynchronous Locally Synchronous (GALS) systems consist of several locally synchronous components that communicate with each other asynchronously. Work on GALS can be divided into three categories: partitioning, communication devices, and dedicated architectures. Dataflow design modeling, exploration, and optimization for GALS-based designs has been studied previously by several authors. For example, Hemani et al. [130] proposed a GALS design partitioning method for high-performance and very large VLSI systems. The system is partitioned into an optimal configuration of synchronous blocks by exploring the relationship between power consumption and the number of synchronous blocks, which defines the granularity of the approach. In this case, the main limitation is that the synchronous blocks have fixed sizes that cannot be changed during the optimization process; moreover, this approach does not take system performance into account during optimization. Shen et al. [131] proposed a design and evaluation framework for modeling application-specific GALS-based dataflow architectures for cyclo-static applications, where system performance, e.g. throughput, is taken into account during optimization. Similarly, Wuu et al. [132] and Ghavami et al. [133] proposed methods for the automatic synthesis of asynchronous digital systems. These two approaches are developed for fine-grained dataflow graphs, where actors are primitives or combinational functions.

2.5 Dataflow Design Flows for HW and SW Co-Design

Hardware/Software Co-Design can be defined as the simultaneous design of both the hardware and the software implementing a desired application. It investigates the concurrent design of the hardware and software processing elements of complex electronic systems. It tries to exploit the synergy of hardware and software with the goal of optimizing and satisfying design constraints such as cost, throughput, and power of the final product, while at the same time reducing the time-to-market considerably. Several works have addressed HW-SW co-design flows that use dataflow programming languages or models as the behavioral description.

The CodeSign [134] framework uses Object Oriented Time Petri Nets (OOTPN) as its modeling language. This language is powerful enough to model real-time systems, and it can be analyzed mathematically; the notion of time allows performance evaluation at the early stages of the design. It produced the Moses Tool Suite, a tool for modeling, simulating, and

evaluating heterogeneous systems using Petri nets.

Figure 2.12 – Matlab HDLCoder HLS tool.

The Trotter design flow [135] enables rapid prototyping and design space exploration of applications specified using an internal graph representation. In the very early steps of the design flow, the framework provides useful metrics allowing the designer to evaluate the impact of algorithmic choices on resource requirements in terms of processing, control, memory bandwidth, and potential parallelism at different levels of granularity. The goal of this work is to perform the algorithmic exploration automatically and rapidly for the functions called from the event-based level. Tools are available for simulation, formal proof, and code generation at the event-based level, but they do not consider any path to hardware.

SynDEx [136] is a graphical and interactive software implementing the Algorithm Architecture Adequation (AAA) methodology. Within this environment, the designer defines an algorithm graph, an architecture graph, and system constraints. SynDEx is a computer-aided design software aiming at mapping an algorithm onto an architecture. The architectures taken into account are composed only of processors: hardware logic such as FPGAs cannot be considered in this flow, and the design space exploration is done according to a single criterion, throughput. Another framework based on the AAA principles is Preesm [137]; compared to SynDEx, it offers schedulability analysis and ensures deadlock freeness in the generated code.

Finally, schematic-based tools such as LabView and Matlab have recently enabled high-level synthesis in their design flows. National Instruments (NI) LabView FPGA hardware modules [138] permit the LabView graphical development tool to target FPGAs on specific hardware modules developed by NI. Matlab's HDLCoder [139] generates Verilog or VHDL from Simulink and Stateflow designs.


Figure 2.13 – Daedalus Design Flow a unified environment for rapid system-level architectural exploration.

Daedalus [140], depicted in Figure 2.13, provides a unified environment for rapid system-level architectural exploration, high-level synthesis, programming, and prototyping of multimedia MP-SoC architectures. The Daedalus framework is an automatic design flow starting from Kahn Process Networks or directly from C/C++ specifications. Kahn Process Networks are well suited for signal processing systems, but the modeling of interrupts is complicated because of the nature of the model, which limits the study of time-dependent systems.

Metropolis [141] is a framework allowing the description and refinement of a design at different levels of abstraction, and it integrates modeling, simulation, synthesis, and verification tools. The function of a system, such as the application, is modeled as a set of processes that communicate through media. Architecture building blocks are represented by performance models where events are annotated with the costs of interest. A mapping between the functional and architecture models is determined by a third network that correlates the two models by synchronizing events between them (using constraints).

The Mescal project [142] from the University of California at Berkeley aims at designing heterogeneous, application-specific, programmable (multi)processors. The goal is to allow the programmer to describe the application in any combination of models of computation that is natural for the application domain, and to find a disciplined, correct-by-construction abstraction path from the underlying micro-architecture to an efficient mapping between application and architecture.


The PeaCE environment [143] specifies a system-level design with a heterogeneous composition of three models of computation. It provides a seamless co-design flow from functional simulation to system synthesis, utilizing the features of the formal models maximally during the whole design process. This framework is based on the Ptolemy project [144]. When dealing with C/C++ specifications, however, the PeaCE approach does not propose an automatic procedure to transform the specification into dataflow graphs.

SystemCoDesigner [145] is an actor-oriented approach using a high-level language named SysteMoC, which is built on top of SystemC. It generates HW-SW SoCs with automatic design space exploration techniques. The model is translated into a behavioral SystemC model as a starting point for HW and SW synthesis. The HW synthesis is delegated to a commercial tool, viz. Forte's Cynthesizer, which generates RTL code from the SystemC intermediate model.

Hardware/Software Co-Design based on the RVC-CAL programming language has been studied in all steps of this thesis' design flow. A first approach to CAL and RVC-CAL simulation and hardware code generation was provided by the OpenDF [146, 147] framework developed by Xilinx. An alternative to OpenDF for software synthesis, called the Open RVC-CAL Compiler, was developed in [148], and it is the compiler infrastructure used in this thesis. A third, experimental compiler infrastructure for CAL, developed by Ericsson, is called Caltoopia and is described in [149]. Hardware and C++ software code generation for Orcc were first introduced in [2, 3] and further developed in this thesis. Moreover, interface synthesis for heterogeneous platforms is presented in [150]. Furthermore, design space exploration for RVC-CAL dataflow programs is discussed in [10, 7]. Finally, a complete Co-Design environment is presented in [9, 19].

Actor Machines and Dataflow Machines [151, 6] form a machine model for dataflow actors that focuses on minimizing the overhead of action selection for efficient implementations of static and dynamic dataflow programs. Actor machines are used to reduce the runtime testing of conditions; they can also be composed to eliminate the testing of port conditions. Actor Machines were not used in this thesis due to the late arrival of a compiler infrastructure supporting them. Finally, as described in the future work chapter, Dataflow Machines are going to replace the Orcc intermediate representation for better software and hardware code generation containing fewer tests in the action selection.

2.6 Conclusion

In this chapter, the state of the art of high-level synthesis tools and design flows for heterogeneous platforms was presented. It was shown that the behavioral descriptions of third-generation HLS tools are mainly C or C-like programming languages. C-like languages have an important limitation: they do not express parallelism. As discussed, to circumvent this obstacle, vendors and academics have either modified the C language structures, provided specialized libraries, or added pragmas recognized only by a single tool. Alternative languages for HLS either offer a block-structured syntax or the possibility to design systems

that react continuously to their environment.

Three building blocks found in almost all HLS tools were also introduced and discussed: scheduling of operators, pipelining, and power optimization. A list of the most used scheduling algorithms was given, and two types of scheduling problems were considered: time-constrained and resource-constrained scheduling. On one hand, the ILP approach solves the time-constrained problem, but it has long execution times; on the other hand, FDS finds a solution quickly, but optimality is not guaranteed. Iterative Rescheduling improves an initial schedule generated by one of the previous schedulers, while list scheduling addresses the resource-constrained problem. As discussed, pipelining optimization is a time- and resource-constrained scheduling problem whose purpose is to increase the operating frequency of the overall system. Power saving methodologies are related to the technology being used: most of the introduced methodologies and strategies target ASICs, but there is a growing interest in power saving techniques for FPGAs. In addition, there is no strategy that helps reduce the dynamic power dissipation of dynamic dataflow programs; existing power saving techniques are applied either to static or synchronous dataflow programs, or the power reduction is performed statically during synthesis. A solution that reduces the dynamic power dissipation caused by flip-flop switching activity and extends the state of the art in clock gating is given in Chapter 6.

Furthermore, design flows for hardware and software Co-Design were presented. Each design flow uses a high-level behavioral description for representing a design, mainly a derivative of the C language. In contrast, LabView's behavioral description is schematic-based; Matlab, SynDEx, Daedalus, and others also provide a graphical representation of the dataflow dependencies between the processes. In addition, some of the tools use a single dataflow model of computation, such as the Kahn Process Network, whereas Mescal uses a combination of models of computation. Finally, an RVC-CAL based flow was also introduced. In contrast with other design flows, the RVC-CAL design flow described and provided with this thesis permits the description of dynamic systems from a single representation. Moreover, it offers support for both hardware and software processing elements, and a design space exploration that permits performance estimation and provides refactoring directions that can be applied to reduce the system latency (Chapter 5).

3 CAL Dataflow Programming Language

3.1 Introduction


Figure 3.1 – RVC-CAL as the Behavioral Description in the Design Flow.

The emergence of parallel processing elements such as many-cores/multi-cores, FPGAs, and GPGPUs demands a rethinking of the way they are programmed. It is widely recognized that programming parallel platforms is difficult and tedious. In addition, heterogeneous platforms consisting of parallel processing elements are becoming standard in personal computers, which combine multi-core processors and massively parallel GPUs; likewise, the introduction of MPSoCs with programmable logic in industry demands a higher level of abstraction. A key to heterogeneous system-level design is the notion of models of computation (MoC) [152]. A MoC defines the semantics of the interactions between modules; it is the model, or the specification principle, of a design. MoCs relate strongly to the design style but do not necessarily refer to the implementation technology. Classes of MoCs include:

33 Chapter 3. CAL Dataflow Programming Language

Imperative, Finite State Machine, Discrete Event, Synchronous Languages, and Dataflow.

The imperative MoC executes the modules sequentially to accomplish a task. In the Finite State Machine MoC, an enumeration of a set of states specifies the steps to achieve a task. In the Discrete Event MoC, modules react to events that occur at a given time instant and produce other events at the same or some future time instant. In the Synchronous Languages MoC, modules simultaneously react to a set of input events and instantaneously produce output events.

A Dataflow MoC is conceptually represented as a directed graph where nodes, called actors, represent computational units, while edges describe communication channels on which tokens flow. A token is an atomic piece of data. Dataflow graphs are often used to represent data-dominated systems, like signal processing applications. Using a Dataflow MoC in such application domains often leads to behavioral descriptions that are much closer to the original conception of the algorithms than if an imperative MoC were used. Dataflow models date back to the early 1970s, starting with the seminal work of Dennis [153] and Kahn [88]. Several execution models that define the behavior of a dataflow program have been introduced in the literature [88, 152]. A Dataflow MoC may constrain the behavior of an actor, how actors are executed relative to each other, and aspects of their interaction with one another. As a result, different MoCs offer different degrees of analyzability and compile-time schedulability of the dataflow programs written in them, and permit different guarantees (such as the absence of deadlocks or the boundedness of buffers) to be inferred from them.

The first two MoCs are fundamentally sequential and the last three are concurrent. It is, in fact, possible to use the first two on parallel processing elements and the last three on sequential machines; thus, there is a distinction between MoCs and the way they are implemented, although efficiency might suffer as a consequence. In heterogeneous platforms there should be a separation of tasks depending on the properties of a design: modules that are sequential should preferably execute on sequential platforms, and parallel ones should execute on concurrent machines. In effect, for system-level design, either a mix of different MoCs should be used, or the properties and semantics of a single MoC should be rich enough to support heterogeneous designs.

A potential candidate for heterogeneous system-level design is RVC-CAL. RVC-CAL is a dataflow programming language that is based on the Dataflow MoC and has the property of expressing applications as networks of processes. In fact, it offers the parallelism scalability, modularity, scheduling by finite state machines, portability, and adaptivity properties that are necessary to unify system-level design for heterogeneous platforms. The MoC underlying the dataflow networks expressed using the CAL formal language is based on the dataflow process networks (DPN) model [152]. In addition to the properties of dataflow mentioned above, each DPN actor executes a sequence of discrete computational steps, called firings. In each step, an actor may (a) consume a finite number of input tokens, (b) produce a finite number of output tokens, and (c) modify its internal state, if it has any. In an actor language such as CAL [154]

and its subset RVC-CAL, this behavior is specified as one or more actions. Each action describes the conditions under which it may be fired (which may include the availability and values of input tokens, and the actor's state), and also what happens when it fires, i.e. how many tokens are consumed and produced at each port, the values of the output tokens, and how the actor state is modified. The execution of such an actor consists of two alternating phases: the determination of an action whose firing conditions are fulfilled (including a choice if there is more than one at some point), and the execution of that action itself.
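This two-phase execution can be sketched operationally as follows (a hypothetical Python rendering, not RVC-CAL syntax and not the machine model used by the tools of this thesis): each action declares how many tokens it needs, a guard over those tokens and the actor state, and a body producing output tokens.

class Action:
    def __init__(self, needs, guard, body):
        self.needs, self.guard, self.body = needs, guard, body

class Actor:
    """Repeatedly select one fireable action, then fire it atomically."""
    def __init__(self, actions, state):
        self.actions, self.state = actions, state

    def step(self, inputs, outputs):
        for a in self.actions:                    # phase 1: action selection
            if len(inputs) >= a.needs and a.guard(inputs[:a.needs], self.state):
                tokens = [inputs.pop(0) for _ in range(a.needs)]
                outputs.extend(a.body(tokens, self.state))  # phase 2: firing
                return True
        return False                              # no action is fireable

# A 'split' actor: passes non-negative tokens through, drops negative ones.
split = Actor([
    Action(1, lambda t, s: t[0] >= 0, lambda t, s: [t[0]]),
    Action(1, lambda t, s: t[0] < 0,  lambda t, s: []),
], state={})

inp, out = [3, -1, 4], []
while split.step(inp, out):
    pass
print(out)  # [3, 4]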

Table 3.1 – System-Level Requirements and Coverage, with G supported, Q partially supported, and H not supported.

                          C  C++  Java  VHDL  SpecC  Verilog  OpenCL  RVC-CAL  HardwareC
Behavioral hierarchy      H  H    Q     H     H      H        H       G        G
Structural hierarchy      H  H    H     H     G      G        G       G        G
Concurrency               H  Q    G     Q     G      G        G       G        G
Synchronization           H  H    G     Q     G      G        G       G        G
Exception handling        Q  G    G     G     G      G        H       G        H
Timing                    H  H    H     H     G      G        Q       G        H
State transitions         H  H    Q     H     H      H        H       G        G
Composition data types    G  G    G     G     Q      Q        H       G        Q
Heterogeneous & CoDesign  H  H    G     H     H      H        H       H        G
Fine-Grain Profiling      H  H    Q     H     H      H        H       H        G

Table 3.1 compares traditional languages against a set of language requirements; partial values are retrieved from [48]. RVC-CAL supports the following behavioral hierarchies: sequential execution inside actions, FSMs through the finite state machine of the actor, and concurrent and pipelined execution through the MoC. In addition, RVC-CAL supports structural hierarchy by actor composition: an actor composition may contain another actor composition. Furthermore, synchronization is provided by the FIFO queues through which the actors are interconnected. Exception handling is not currently supported. RVC-CAL is a high-level language that makes a total abstraction of time. Moreover, RVC-CAL supports list types, and future versions will also support composite types. Finally, this thesis demonstrates that RVC-CAL is a potential candidate for heterogeneous system-level design and that it supports fine-grain profiling for hardware and software processing elements.

Before describing the CAL programming language and its features, a brief introduction to Process Networks and the Models of Computation that CAL uses is given in the next section.


3.2 Process Networks

In this section, the Kahn Process Network and Dataflow Process Network MoCs are described; CAL inherits the properties of both models and extends them with the Actor Transition System. A Model of Computation, or MoC, is a formal representation of the operational semantics of a network of functional blocks describing a computation [155]; it allows one to specify the algorithm and the cost (i.e. time) of the operations.

3.2.1 KPN

Kahn Process Networks [88] are a formal model of concurrent computation first introduced by the French computer scientist Gilles Kahn in 1974. He introduced a language with simple semantics, with the goal of applying mathematical approaches to programming languages and system design. Kahn's model can naturally describe signal processing systems in which infinite streams of data samples are incrementally transformed by a collection of processes executing in sequence or in parallel.

In Kahn's model, a network of processes communicates via unbounded FIFO queues. Kahn describes his model as a set of Turing machines [156] connected via one-way tapes. Each process shares data with the others only through its input and output queues. Each queue may contain a possibly infinite sequence of tokens. Using the same notation as in [152], each sequence, or stream, is denoted $X = [x_1, x_2, x_3, \ldots]$, where each $x_i$ is a token drawn from some set. A token is an atomic data object that is written (produced) exactly once and read (consumed) exactly once. Writes to the queues are non-blocking, in the sense that they always succeed immediately, but reads from queues are blocking: if a process attempts to read a token from a queue and no data is available, it stalls (i.e. waits) until the queue has sufficient tokens to satisfy the read. In other words, it is not possible to test for the presence of input tokens.
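These semantics can be sketched directly with threads and queues. In the fragment below (a hypothetical illustration, with Python's queue.Queue standing in for an unbounded FIFO, and None used, by assumption, as an end-of-stream marker), get() blocks on an empty queue while put() always succeeds:

import threading
import queue

def scale(inq: queue.Queue, outq: queue.Queue, factor: int):
    """A Kahn process: repeatedly read one token, write one token."""
    while True:
        token = inq.get()          # blocks if the queue is empty
        if token is None:          # end-of-stream marker (an assumption)
            outq.put(None)
            return
        outq.put(factor * token)   # never blocks: the queue is unbounded

a, b = queue.Queue(), queue.Queue()
threading.Thread(target=scale, args=(a, b, 10), daemon=True).start()
for x in [1, 2, 3, None]:
    a.put(x)
while (y := b.get()) is not None:
    print(y)                       # 10, 20, 30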

Let $S^p$ denote the set of p-tuples of sequences, as in $X = \{X_1, X_2, \ldots, X_p\} \in S^p$. A Kahn process is then defined as a mapping from a set of input sequences to a set of output sequences:

$$F : S^p \to S^q \qquad (3.1)$$

The KPN process F has event semantics instead of state semantics (such as continuous time, which is used in some other domains). Furthermore, the only technical restriction is that F must be a continuous mapping function.

Considering a prefix ordering of sequences, the sequence X precedes the sequence Y (written $X \sqsubseteq Y$) if X is a prefix of (or is equal to) Y. For example, if $X = [x_1, x_2]$ and $Y = [x_1, x_2, x_3]$, then $X \sqsubseteq Y$. It is common to say that X approximates Y, since it provides partial information about Y. The empty sequence, denoted $\bot$, is a prefix of any other sequence.


An increasing chain (possibly infinite) of sequences is defined as $\chi = \{X_0, X_1, \ldots\}$ where $X_0 \sqsubseteq X_1 \sqsubseteq \ldots$. Such an increasing chain of sequences has one or more upper bounds $Y$, where $X_i \sqsubseteq Y$ for all $X_i \in \chi$. The least upper bound (LUB) $\sqcup\chi$ is an upper bound such that, for any other upper bound $Y$, $\sqcup\chi \sqsubseteq Y$. The LUB may be an infinite sequence.

Consider a functional process F and an increasing chain of sets of sequences $\chi$. As defined in Equation (3.1), F will map $\chi$ into another set of sequences $\Psi$ that may or may not be an increasing chain. Let $\sqcup\chi$ denote the LUB of the increasing chain $\chi$. Then F is said to be continuous if, for all such chains $\chi$, $\sqcup F(\chi)$ exists and:

$$F(\sqcup\chi) = \sqcup F(\chi) \qquad (3.2)$$

Networks of continuous processes have a more intuitive property called monotonicity. A process F is said to be monotonic if:

$$X \sqsubseteq Y \Rightarrow F(X) \sqsubseteq F(Y) \qquad (3.3)$$

A continuous process is monotonic; however, a monotonic process might not be continuous. A key consequence of these properties is that a process can be computed iteratively [157]. This means that, given a prefix of the final input sequences, it may be possible to compute part of the output sequences. In other words, a monotonic process is non-strict (its inputs need not be complete before it can begin computation). In addition, a continuous process will not wait forever before producing an output (i.e. it will not wait for the completion of an infinite input sequence). Networks of monotonic processes are determinate. The proof of KPN monotonicity is given in [152].

3.2.2 Dataflow Process Network

Dataflow Process Networks (DPN) [152] formally establish a particular case of KPN where the computational blocks are called actors. As for KPN processes, actors can communicate only through unidirectional and unbounded queues that can carry possibly infinite sequences of tokens. As for KPN, writes to queues are non-blocking. Contrarily to KPN, reading from queues is non-blocking, because an actor can test the presence of input tokens: if there are not enough input tokens, the read returns immediately and the actor need not be suspended, as it cannot read. This ability to test for the absence of tokens could introduce non-determinism, without requiring the actor to be nondeterminate.

DPN networks naturally extend KPN by embracing the notion of actor firing [153]. An actor firing can be defined as an indivisible (atomic) quantum of computation. The firings themselves can be described as functions, and their invocation is controlled by firing rules. Sequences of firings define a continuous Kahn process as the least fixed point of an appropriately constructed function, therefore formally establishing DPN as a particular case of KPN [158].
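As an illustration of the difference with KPN, the following sketch (class and method names are invented for this example, not taken from the thesis) shows an actor whose firing rule merely tests token availability; if the rule is not satisfied, fire() returns immediately instead of suspending the actor.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sketch of a DPN actor: the firing rule tests the queues, the firing
    // function is an indivisible quantum of computation.
    public class AddActorSketch {
        private final Deque<Integer> in1 = new ArrayDeque<>();
        private final Deque<Integer> in2 = new ArrayDeque<>();
        private final Deque<Integer> out = new ArrayDeque<>();

        // Firing rule R: one token available on each input port.
        boolean canFire() {
            return !in1.isEmpty() && !in2.isEmpty();
        }

        // Firing function f: consume one token per input, produce their sum.
        boolean fire() {
            if (!canFire()) {
                return false; // test only, never block
            }
            out.add(in1.poll() + in2.poll());
            return true;
        }
    }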


An actor with m inputs and n outputs is defined as a pair $\{f, R\}$, where:

• $f : S^m \to S^n$ is a function called the firing function

• $R \subseteq S^m$ is a set of finite sequences called the firing rules

• $f(r)$ is finite for all $r \in R$

• no two distinct $r, r' \in R$ are joinable, in the sense that they do not have a LUB

The Kahn process F defined in Equation (3.1), based on the actor $\{f, R\}$, has to be interpreted as the least-fixed-point function of the functional $\phi : (S^m \to S^n) \to (S^m \to S^n)$, defined as:

$$(\phi(F))(s) = \begin{cases} f(r) \bullet F(s') & \text{if there exists } r \in R \text{ such that } r \sqsubseteq s \text{ and } s = r \bullet s' \\ \Lambda & \text{otherwise} \end{cases} \qquad (3.4)$$

where $\bullet$ represents the concatenation operator, Λ the tuple of empty sequences and $(S^m \to S^n)$ the set of functions mapping $S^m$ to $S^n$. It is possible to demonstrate that φ is both a continuous and monotonic function. The firing function f does not need to be continuous; in fact, it might not even be monotonic. It merely needs to be a function, and its value must be finite for each of the firing rules [158].

3.2.3 Actor Transition System and Composition

Actor transition systems (ATS) [151] describe actors in terms of labeled transition systems. The ATS extends the notion of actor with firings by introducing the concepts of internal state, atomic step and priority. In an ATS, a step makes a transition from one state to another. An actor maintains and updates its internal variables: those are not sequences of tokens, but simple internal values that cannot be shared among actors. Moreover, the notion of priority allows actors to ascertain and react to the absence of tokens. However, priority can also make actors harder to analyze, and it may introduce unwanted non-determinism into a dataflow application.

Let Σ denote the non-empty actor state space, U the universe of tokens that can be exchanged between actors, and $S^n$ a finite and partially ordered sequence of n tokens over U. An n-to-m actor is a labeled transition system $\langle\sigma_0, \tau, \succ\rangle$, where:

• $\sigma_0 \in \Sigma$ is the actor initial state

• $\tau \subset \Sigma \times S^n \times S^m \times \Sigma$ defines the transition relation

• $\succ \subset \tau \times \tau$ defines a strict partial order over τ

Any $(\sigma, s, s', \sigma') \in \tau$ is called a transition, where $\sigma \in \Sigma$ is its source state, $s \in S^n$ its input tuple, $\sigma' \in \Sigma$ its destination state and $s' \in S^m$ its output tuple. It must be noted that $\succ$ is a non-reflexive, anti-symmetric and transitive partial order relation on τ, also called the priority relation. An equivalent and more compact notation for the transition $(\sigma, s, s', \sigma')$ is $\sigma \xrightarrow{s \to s'} \sigma'$.

Enabled transition and step of an actor

Intuitively, the priority relation determines that a transition cannot occur if some other transition is possible. This can be seen as the definition of a valid step of an actor, which is a transition such that two conditions are satisfied:

• the required input tokens must be present

• there must not be another transition that has priority

Given an n-to-m actor $\langle\sigma_0, \tau, \succ\rangle$, a state $\sigma \in \Sigma$ and an input tuple $v \in S^n$, a transition $\sigma \xrightarrow{s \to s'} \sigma'$ is enabled if and only if:

$$s \sqsubseteq v \;\wedge\; \nexists\, \sigma \xrightarrow{r \to r'} \sigma'' \in \tau \,:\, r \sqsubseteq v \,\wedge\, \sigma \xrightarrow{r \to r'} \sigma'' \succ \sigma \xrightarrow{s \to s'} \sigma' \qquad (3.5)$$

Hence, a step from state σ with input v is defined as any enabled transition $\sigma \xrightarrow{s \to s'} \sigma'$.
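A minimal operational reading of rule (3.5), under the simplifying assumption that the priority relation is encoded as an integer (it is a strict partial order in general, so this encoding loses generality), could look as follows; all names are illustrative:

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    // Sketch of step selection: keep the transitions whose input pattern is
    // satisfied by the available tokens (s ⊑ v), then discard any transition
    // dominated by a higher-priority satisfied one.
    public class StepSelectorSketch {
        record Transition(String name, int tokensNeeded, int priority) {}

        static Optional<Transition> enabled(List<Transition> tau, int tokensAvailable) {
            return tau.stream()
                      .filter(t -> t.tokensNeeded() <= tokensAvailable)
                      .max(Comparator.comparingInt(Transition::priority));
        }

        public static void main(String[] args) {
            List<Transition> tau = List.of(
                new Transition("A", 1, 2),
                new Transition("B", 1, 1));
            // Both transitions are satisfied, A has priority over B.
            System.out.println(enabled(tau, 1)); // selects A
        }
    }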

Actors composition

For any transition relation τ, its set of input ports $P_\tau^I$ and its set of output ports $P_\tau^O$ are defined as the ports from which at least one transition consumes input or to which it produces output:

$$P_\tau^I = \{p \in P \mid \exists\, \sigma \xrightarrow{s \to s'} \sigma' \in \tau : s(p) \neq \perp\}, \qquad P_\tau^O = \{p \in P \mid \exists\, \sigma \xrightarrow{s \to s'} \sigma' \in \tau : s'(p) \neq \perp\} \qquad (3.6)$$

where P is the set of input and output port names. It is assumed that an input port with name p and an output port of the same name are in no way related. In order to express complex functionality, actors are composed into a dataflow network such as the one depicted in Figure 3.2. The structure of a network can be represented by a partial function from (input) ports to (output) ports, mapping each input port in its domain to the output port that connects to it. Note that this implies the absence of fan-in (as every input port is connected to at most one output port), and it permits unconnected (open) input (and output) ports.


3.3 CAL Actor Language

The Cal Actor Language (CAL) [90] is a domain-specific language that provides useful abstrac- tions for dataflow programming with actors. The language directly captures the features of DPN [159] MoC by adding the notion of atomic action firings, also called steps.


Figure 3.2 – Actor Composition and Actor Structure.

Figure 3.2 illustrates the basic concepts of a CAL program. It represents a dataflow network composed of a set of actors and a set of first-in first-out (FIFO) queues. Each CAL actor is defined by a set of input ports, a set of output ports, a set of actions, and a set of internal variables. CAL also includes the possibility of defining an explicit finite state machine (FSM). This FSM captures the actor's stateful behavior and drives the action selection according to the actor's particular state, to the presence of input tokens, and to the values of the tokens as evaluated by language constructs called guards. Each action may capture only a part of the firing rule of the actor, together with the part of the firing function that pertains to the input/state combinations enabled by that partial rule defined by the FSM. An action is enabled according to its input patterns and guard expressions. While patterns are determined by the amount of data required on the input sequences, guards are boolean expressions on the current state and/or on the input sequences that need to be satisfied to enable the execution of an action.

In the following, a basic overview of the main concepts concerning the syntax and semantics of the CAL language is presented.

3.3.1 CAL Program

A CAL program network (actor composition) N is defined as a tuple (K , A,B) where:


• $K = \{\kappa_1, \kappa_2, \ldots, \kappa_{n_K}\}$ is a finite set of actor classes

• $A = \{a_1, a_2, \ldots, a_{n_A}\}$ is a finite set of actors

• $B = \{b_1, b_2, \ldots, b_{n_B}\}$ is a finite set of queues

A CAL actor class κ defines the program code template and the implementation behavior of the actor (i.e. the CAL source code). Different actors can instantiate the same class; however, each actor corresponds to a different object whose internal state cannot be shared.

A CAL actor a is defined as a tuple $(\kappa, P^{in}, P^{out}, \Lambda, V, FSM)$ where:

• κ is the actor class

• $P^{in} = \{p_1^{in}, p_2^{in}, \ldots, p_{n_I}^{in}\}$ is the finite set of input ports

• $P^{out} = \{p_1^{out}, p_2^{out}, \ldots, p_{n_O}^{out}\}$ is the finite set of output ports

• $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_{n_\Lambda}\}$ is the finite set of actions

• $V = \{v_1, v_2, \ldots, v_{n_V}\}$ is the finite set of internal variables

• FSM is the internal finite state machine

A CAL queue b is defined as a tuple $(a_s, p_s, a_t, p_t)$ where:

• $a_s \in A$ is the source actor (i.e. the one that produces the tokens)

• $p_s \in P^{out}_{a_s}$ is the output port of the source actor

• $a_t \in A$ is the target actor (i.e. the one that consumes the tokens from the queue)

• $p_t \in P^{in}_{a_t}$ is the input port of the target actor

3.3.2 Execution Model

In this thesis it is assumed that the firing of an action is performed following the serial execution of the four stages illustrated in Figure 3.3. Those stages are respectively:

• Action selection where the internal actor scheduler selects the next schedulable action (i.e. that satisfies all firing conditions). It should be noted that the action selection must wait until all the input tokens are available on the respective input queues (i.e. block-reading).

• Read from input queues where all the input tokens required by the algorithmic part of the action are read from the respective input queues.

41 Chapter 3. CAL Dataflow Programming Language

• Action execution when the algorithmic part of the action is executed.

• Write to output queues, when all the output tokens produced during the action execution are written to the respective output queues. It should be noted that, when an actor is implemented on either software or hardware processing elements, the action can only start writing to the output queues once enough space to accommodate all output tokens is available. A minimal code sketch of this four-stage loop is given after Figure 3.3.


Figure 3.3 – Actor Execution Model.
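A minimal sketch of this four-stage firing loop, with invented interface names (the actual Xronos/Orcc classes differ), is the following:

    import java.util.List;

    // Sketch of the serial four-stage execution model of Figure 3.3.
    interface Action {
        boolean isSchedulable(); // firing conditions: tokens, guards, FSM state
        void readInputs();       // consume the required tokens from input queues
        void execute();          // run the algorithmic part of the action
        void writeOutputs();     // produce tokens once output space is available
    }

    final class ActorScheduler {
        private final List<Action> actions;

        ActorScheduler(List<Action> actions) {
            this.actions = actions;
        }

        // One scheduling step: fire the first schedulable action, if any.
        boolean step() {
            for (Action a : actions) {
                if (a.isSchedulable()) { // stage 1: action selection
                    a.readInputs();      // stage 2: read from input queues
                    a.execute();         // stage 3: action execution
                    a.writeOutputs();    // stage 4: write to output queues
                    return true;
                }
            }
            return false;
        }
    }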

3.3.3 CAL Syntax and Semantics

Lexical tokens

Lexical tokens help the user understand the functionality provided by a language. They are strings of indivisible characters known as lexemes. The CAL lexical tokens, also summarized in Table 3.2, are described in the following:

• Keywords Keywords are a special type of identifiers: they are reserved by default in the programming language and can never be used as identifiers in the code. Some of them are action, actor, procedure, function, begin, if, else, end, foreach, while, do, in, list, int, uint, float, bool, true and false.

• Operators Operators usually represent mathematical, logical or algebraic operations. Operators are written as strings of the characters !, %, ^, &, *, /, +, -, =, <, >, ?, ~ and |.

• Delimiters Delimiters are used to indicate the start or the end of a syntactical element in CAL. The following elements are used as delimiters: (, ), [, ], { and }.

• Comments Comments in CAL language are the same as in Java, C and C++. Single line comments start with // and multiple line comments start with /* and end with */.

Actions

The simplest actor that can be described using CAL is the Passthrough actor defined in Listing 3.1. This actor copies a token from its input port and places it into its output port.


Table 3.2 – CAL lexical tokens

Kind        Symbols

Keywords    action, actor, procedure, function, begin, if, else, end, foreach, while, do, in, list, int, uint, float, bool, true, false
Operators   !, %, ^, &, *, /, +, -, =, <, >, ?, ~, |
Delimiters  (, ), [, ], {, }, ==>, ->, :=
Comments    //, /* ... */

The actor header is defined in the first line, which contains the actor name, followed by a list of parameters provided inside the () construct (empty, in this case), and the declaration of the input and output ports. The input ports are those in front of the ==> construct and the output ports are those after it. In this case, the input and output port sets are defined as $P^{in}_{Passthrough} = \{I\}$ and $P^{out}_{Passthrough} = \{O\}$ respectively. For each parameter and port, the data type is specified before the name (all defined with an int data type, in this case). This actor contains only one action, labeled pass, as defined in the second line. In this case, the action set is defined as $\Lambda_{Passthrough} = \{pass\}$. Action pass demonstrates how to specify token consumption and production. The part in front of the ==>, which defines the input patterns, specifies how many tokens to consume from which ports and what to call those tokens in the rest of the action. In this case, there is one input pattern: I:[v]. This pattern indicates that one token is to be read (i.e. consumed) from the input port I, and that the token is to be called v in the rest of the action. Such an input pattern also defines a condition that must be satisfied for this action to fire: if the required token is not present, this action will not be executed. Therefore, input patterns do the following:

• They define the number of tokens (for each port) that will be consumed when the action is executed (fired).

• They declare the variable symbols by which tokens consumed by an action firing will be referred to within the action.

• They define a firing condition for the action, i.e. a condition that must be met for the action to be able to fire.

The output patterns of an action are those defined after the ==> construct. They simply define the number and values of the output tokens that will be produced on each output port by each firing of the action. In this case, the output pattern O:[v] says that exactly one token will be generated at output port O, and its value is v.


Listing 3.1 – Passthrough.cal

    actor Passthrough() int I ==> int O:
        pass: action I:[v] ==> O:[v] end
    end

Action Guard

So far, the only firing condition for actions was that there be enough tokens to consume, as specified in their input patterns. However, in many cases it is possible to specify additional criteria that need to be satisfied for an action to fire. Conditions, for instance, that depend on the values of the tokens, or the state of the actor, or both. These conditions can be specified using guards, as for example in the Split actor, defined in Listing 3.2. This actor defines one input port I, two output ports O1 and O2, and two actions A and B. Those actions require the availability of one token in I. However their selection is guarded by the value of the input token val read from I. In this example, if val >= 0 then the action A is selected, otherwise action B.

Listing 3.2 – Split.cal

    actor Split() int I ==> int O1, int O2:

        A: action I:[val] ==> O1:[val]
        guard val >= 0 end

        B: action I:[val] ==> O2:[val]
        guard val < 0 end

    end

In the PingPongMerge actor, reported in Listing 3.3, a finite state machine schedule is used to sequence the two actions A and B. The schedule statement introduces two states s1 and s2. Contrarily, in the BiasedMerge actor, reported in Listing 3.4, the selection of which action to fire is not only determined by the availability of tokens, but also depends on the priority statement.

Actors composition

In CAL, it is possible to define a composition of actors, or a network of interconnected actors, such as the one illustrated in Figure 3.2. It is composed of three actors A, B and C, and of five queues b1, ..., b5. Two different representation approaches are supported: the first one is called the Functional unit Network Language (FNL) (see Listing 3.5); the second one is based


Listing 3.3 – PingPongMerge.cal

    actor PingPongMerge() T In1, T In2 ==> T O:

        A: action In1:[val] ==> O:[val] end

        B: action In2:[val] ==> O:[val] end

        schedule fsm s1:
            s1 (A) --> s2;
            s2 (B) --> s1;
        end
    end

Listing 3.4 – BiasedMerge.cal

    actor BiasedMerge() T In1, T In2 ==> T O:

        A: action In1:[val] ==> O:[val] end

        B: action In2:[val] ==> O:[val] end

        priority
            A > B;
        end
    end

on the eXtensible Markup Language (XML) (Listing 3.6), known as the XML Dataflow Format (XDF), which in most cases is edited with a visual editor.

Listing 3.5 – BasicNetwork.nl

    network BasicNetwork() int I ==> int O:

    entities
        A = ActorA(maxValue = 3);
        B = ActorB();
        C = ActorC();

    structure
        I --> A.I1;
        A.O --> B.I;
        B.O1 --> O;
        B.O2 --> C.I;
        C.O --> A.I2;
    end

3.4 Standardization

A subset of the CAL programming language has been standardized by the ISO MPEG committee and is called RVC-CAL. The purpose of MPEG Reconfigurable Video Coding (ISO/IEC 23001-4) is to offer a more flexible use of, and a faster path to innovation for, MPEG standards, in a way that is competitive in the current dynamic and evolving environment. MPEG standards thus gain an edge over competitors by substantially reducing the time between the development of a technology and the availability of the standard for market applications. The RVC initiative is based on the concept of reusing commonalities among different MPEG standards and providing possible extensions by using appropriate higher-level specification formalisms. Thus, the objective of the RVC standard is to describe current and future codecs in a way that makes such commonalities explicit, reducing the implementation burden by providing a specification whose starting point is closer to the final implementation. To achieve this objective, RVC provides the specification of new codecs by composing existing components and possibly new coding tools described in modular form.

The MPEG-B standard defines the language that is used to build the MPEG RVC framework. The RVC-CAL dataflow programming language is the core of the system; it is used to describe the behavior of each module, called a Functional Unit (FU). With the specification of the FU network topology, the functional behavior of a video decoder is specified. An FU network topology is also called an abstract decoder model. The term abstract refers to the fact that FUs are only characterized by their I/O behavior and the firing rules embedded in the RVC-CAL language. Thus, the interaction of each FU with other FUs is fully specified by abstracting time and by only defining the dependencies of data production and consumption. The MPEG-C standard defines the Video Tool Library, a library of video coding tools. Figure 3.4 illustrates the concept that any abstract decoder model, which is constituted by an FU network description,


Listing 3.6 – BasicNetwork.xdf
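The XML body below is a plausible reconstruction, assuming Orcc's usual XDF serialization conventions (element and attribute names may differ in detail); it describes the same network as Listing 3.5:

    <?xml version="1.0" encoding="UTF-8"?>
    <XDF name="BasicNetwork">
        <Port kind="Input" name="I">
            <Type name="int"/>
        </Port>
        <Port kind="Output" name="O">
            <Type name="int"/>
        </Port>
        <Instance id="A">
            <Class name="ActorA"/>
            <Parameter name="maxValue">
                <Expr kind="Literal" literal-kind="Integer" value="3"/>
            </Parameter>
        </Instance>
        <Instance id="B">
            <Class name="ActorB"/>
        </Instance>
        <Instance id="C">
            <Class name="ActorC"/>
        </Instance>
        <Connection dst="A" dst-port="I1" src="" src-port="I"/>
        <Connection dst="B" dst-port="I" src="A" src-port="O"/>
        <Connection dst="" dst-port="O" src="B" src-port="O1"/>
        <Connection dst="C" dst-port="I" src="B" src-port="O2"/>
        <Connection dst="A" dst-port="I2" src="C" src-port="O"/>
    </XDF>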



Figure 3.4 – Reconfigurable Video Coding.

can be implemented either in hardware or software.

3.5 RVC-CAL Compiler Infrastructure

A compiler supporting the compilation of programs written in the standard RVC-CAL language is Orcc [160]. Orcc stands for Open RVC-CAL Compiler, and it is a collaborative work between the INSA of Rennes and EPFL. Recalling the RVC-CAL design flow of Figure 1.2, Orcc provides the necessary tools for designing, simulating and generating source code for different targets.

Figure 3.5 represents the building blocks of the Orcc compiler infrastructure. The compilation flow primarily translates RVC-CAL into source code or into an intermediate representation (e.g. LLVM IR), instead of generating machine code as traditional compilers (e.g. GCC) do.

Orcc makes extensive use of Model-Driven Engineering (MDE) by representing the IR with meta-models. It uses the same MDE technologies that are employed in the Eclipse IDE: the Eclipse Modeling Framework (EMF), Xtext, and Xtend. The use of meta-models and MDE speeds up development by automating time-consuming and, most importantly, error-prone tasks. Meta-modeling improves the maintainability of the source code through the global homogeneity of a unique model shared by different meta-tools (in Orcc's case, Xtext and Xtend use the same EMF meta-model for Orcc's IR). A meta-model is also a source of documentation for the code, equivalent to UML.



Figure 3.5 – Open RVC-CAL Compiler Infrastructure.

The Orcc compiler infrastructure is organized as follows:

• Front-end: RVC-CAL is parsed and translated into an Abstract Syntax Tree (AST). The parsing is implemented using Xtext [161], a framework dedicated to the development of DSLs that automatically generates a parser, a linker and an editor from the behavioral description grammar. The AST is then transformed into an Intermediate Representation. During this step the front-end, by extending classes of the Xtext framework, performs semantic validation, type inference, and expression evaluation.

• Core: Defines the IR and the visitors for optimizing it. The IR is modeled using the Eclipse Modeling Framework [162, 163]. This framework offers many methods for manipulating the data structure, one of them being the containment relationship between objects. Furthermore, it provides the automatic serialization of the meta-model (IR), allowing incremental compilation.

• Simulation: Orcc offers a simulation of an RVC-CAL program by interpreting its IR. The simulation is type accurate, and it permits verifying the correct functionality of the RVC-CAL program before implementation.

• Back-end: Is the final block in the compilation flow. It applies target-specific optimizations (IR-to-IR transformations) before the code is generated. Orcc's back-ends translate the RVC-CAL program into a general-purpose programming language in order to benefit from the optimizations that those compilers offer. Whenever these optimizations are not enough, additional IR optimization passes are performed by Orcc's back-ends to meet the demands. To generate the code for a particular target, the Xtend [164] framework is


used. This framework provides template-based code generation that is flexible and easy to use. Xtend is a meta-language based on Java and is fully integrated into the Eclipse IDE.

3.6 Orcc Intermediate Representation

The Orcc IR has two parts: one represents the MoC of RVC-CAL, and the other is a procedural IR that represents the computational parts of actions.

3.6.1 Dataflow IR

Most compilers convert between different representations; a common scheme is to take linear code, represent its execution using a Control Flow Graph, and then go back to linear code. In dataflow programming it is necessary to describe the application as a graph that represents actors as nodes and connections as edges. Before describing the Dataflow IR meta-model, it is therefore important to define the Graph meta-model used to represent dataflow relations.

In the Graph meta-model the following classes are defined:

• Graph: Is an object that contains a list of vertices and a list of edges. Graph hierarchy (a graph containing sub-graphs) is naturally obtained because Graph inherits from the Vertex class.

• Vertex: Is an object of the Graph. It contains two references, one for incoming and one for outgoing edges, which make it possible to deduce the successors and predecessors of the Vertex.

• Edge: Is a directed edge that defines the source and target reference of vertices.

The Dataflow IR meta-model contains the following aspects of the RVC-CAL MoC: Network, Actor, Port, Connection and two helper classes Entity and Instance.

• Network (N): As introduced earlier in this chapter, actors can be composed. The Dataflow IR represents the composition of actors with the Network class. A Network has a name and contains two sets of input and output ports. Finally, a network includes a set of connections and a set of vertices representing Networks or Instances.

• Actor (A): Represents the basic component of the RVC-CAL MoC. It communicates through two sets of input ($P^{in}$) and output ($P^{out}$) ports. An actor contains a set of procedures (see Section 3.6.2); RVC-CAL Procedures and Functions are both expressed as procedures. It also includes a set of Actions that are ordered according to their priorities. The actor contains state variables, called stateVars (V), and actor parameters as constant variables. Finally, an actor may contain an FSM, which is used to schedule the actions.


Figure 3.6 – Class tree of the Dataflow IR (Network, Actor, Action, Port, Connection, FSM and related classes).

• Port: Implements the external interface for Network and Actor classes. It has a name and a type.

• Connection (B): Is an edge between Ports of Networks or Actors that models a queue. A connection has a source port, a target port, a type, and a size.

• Entity and Instance: Entity is the superclass of Actor, Network and Instance. An Instance can reference a single Network or Actor several times in one description without duplicating it.

• Action (Λ): Implements the firing function. An Action defines two procedures: the first one contains the body of the action, i.e. the firing function; the second one, called isSchedulable, expresses the guard condition, i.e. the firing condition. An Action contains three patterns: the input pattern, which specifies the number of tokens to be consumed from the action's input ports; the output pattern, which determines the number of tokens produced on the action's output ports; and the peek pattern, which corresponds to the token values that need to be evaluated by the actor scheduler.

• Pattern: Describes a mapping between the input/output Ports and the local variables of the action's procedural code. It also defines the number of tokens that are going to be consumed/produced and the number of input tokens that should be checked before firing an Action.

• FSM (FSM): Is a graph that implements the finite state machine of an Actor. It represents states by vertices and transitions, i.e. actions, by edges.

3.6.2 Procedural IR

As mentioned in the previous section, an action is composed of two procedures. In addition, RVC-CAL expresses Procedures and Functions as procedures. The computational part of the Orcc IR is called the Procedural IR; it describes computational steps in the imperative language paradigm, which is very close to general-purpose programming languages. The Procedural IR is composed of the following classes:

Figure 3.7 – Class tree for the Blocks, Instruction and Expression classes of the Procedural IR.

• Procedure: Is the top class that contains a sequence of sequential statements. A procedure has a name, a set of sequenced Blocks, a set of local Variables, and a set of Parameters.

• Variable: Implements the concept of a variable. A variable has a name, a type, the property of being assignable, and a define and use chain.

• Type: Describes the type kind of a Variable, a Parameter or a Port. Type is a superclass and its subclasses are:

– TypeInt: An Integer Type. – TypeUint: An Unsigned Integer Type. – TypeBool: A Boolean Type.


– TypeFloat: A Floating-point Type. – TypeString: A String Type for a sequence of characters. – TypeVoid: A Void Type, used for procedures that do not return a value.

• Parameter/Argument: Is used to parameterize procedures, actors and networks. The mechanism of parameterization is a standard construct used in many programming languages in order to increase code reuse.

• use/def: The use and define chains model the uses and definitions of variables. Once an Instruction assigns a value, a def is attributed to the variable; a use, on the other side, is created whenever an instruction requires the value of a variable. Use/Def chains are very useful for compiler dataflow analyses such as live variable analysis.

• Block: Describes a portion of sequential code and is the superclass of BlockBasic, BlockIf and BlockWhile. BlockBasic is a Block that contains a sequence of instructions. BlockIf is a branch Block that contains a true-branch and a false-branch list of blocks; the decision on which branch should be taken is given by its condition, which is always a boolean expression. Finally, BlockWhile is a loop block that contains a set of ordered blocks; it also has a boolean condition that defines whether the loop should continue iterating or exit.

• Instruction: Is a statement that is performed within a BlockBasic. Instruction is a superclass and its subclasses are:

– InstAssign: Describes the assignment of an expression to a local Variable. – InstLoad: Describes a read access from a state Variable to a local Variable. – InstStore: Describes a write access to a state Variable from an Expression. – InstCall: Describes the call of a Procedure given a set of Arguments. – InstReturn: Describes the return of an Expression when a Procedure returns a value. – InstPhi: Is a special instruction used only when the Procedural IR is in SSA form; it resolves the conditional assignment at the exit of a BlockIf or a BlockWhile.

• Expression: Is a combination of explicit values, constants, variables and operators. Expression is a superclass and its subclasses are:

– ExprVar: Models the evaluation of a variable. – ExprInt, ExprUint, ExprFloat, ExprBool: Expressions that contain a constant value of the corresponding Type. – ExprList: an Expression that contains a list of values of a given Type.


– ExprCall: an Expression that evaluates the call of a Procedure. – ExprUnary: a unary Expression that evaluates a unary Operator given a single expression. – ExprBinary: a binary Expression that evaluates a binary Operator given two expressions.

• Operator: An enumeration that defines the mathematical operation used in an ExprUnary and in an ExprBinary.

– OpUnary: A unary operation such as: !, not, ~, -. – OpBinary: A binary operation such as: +, -, /, %, and, or, <, >, <=, >=, and others.

The Procedural IR is very close to other general-purpose IRs such as the Low Level Virtual Machine (LLVM) IR. The IR is feature-rich, and it supports all the standardized RVC-CAL computational constructs.
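To make the interplay of these classes concrete, the following self-contained sketch (the record types are simplified stand-ins, not the real Orcc API) shows how a statement such as c := a + b over state variables is expressed as a BlockBasic instruction sequence:

    // Simplified stand-ins for the Procedural IR instruction classes.
    record InstLoad(String target, String source) {}
    record InstAssign(String target, String expr) {}
    record InstStore(String target, String value) {}

    public class LoweringSketch {
        public static void main(String[] args) {
            // c := a + b, with a, b, c state variables of the actor:
            Object[] blockBasic = {
                new InstLoad("local_a", "a"),               // read state variable a
                new InstLoad("local_b", "b"),               // read state variable b
                new InstAssign("tmp", "local_a + local_b"), // ExprBinary with OpBinary +
                new InstStore("c", "tmp")                   // write state variable c
            };
            for (Object inst : blockBasic) {
                System.out.println(inst);
            }
        }
    }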

3.6.3 Visitors for Dataflow and Procedural IR and IR Interpreter

The Gang of Four in Design Patterns [165] define a visitor as: "Represent an operation to be performed on elements of an object structure. A visitor lets you define a new operation without changing the classes of the elements on which it operates". In Orcc, visitor patterns are used to traverse the Dataflow and Procedural IR for code analysis, IR-to-IR transformations, IR optimization passes, and code generation.

The Orcc Interpreter is implemented as a Visitor that executes, for each object in the Dataflow and Procedural IR, a set of functions that represent the interpretation and execution of that IR object.
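A minimal example of this pattern (simplified stand-in classes, not the actual Orcc visitor API) is an analysis pass that counts InstLoad occurrences without modifying the IR classes themselves:

    // Simplified IR with an accept/visit double dispatch.
    interface Instruction {
        <T> T accept(Visitor<T> v);
    }

    final class Load implements Instruction {
        public <T> T accept(Visitor<T> v) { return v.caseLoad(this); }
    }

    final class Store implements Instruction {
        public <T> T accept(Visitor<T> v) { return v.caseStore(this); }
    }

    interface Visitor<T> {
        T caseLoad(Load load);
        T caseStore(Store store);
    }

    // A new operation (counting loads) added without touching the IR classes.
    final class LoadCounter implements Visitor<Integer> {
        private int count = 0;
        public Integer caseLoad(Load load) { return ++count; }
        public Integer caseStore(Store store) { return count; }
    }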

3.7 Conclusion

In this chapter, the notion of dataflow parallel programming has been illustrated. A formal behavioral description called CAL has then been introduced and presented through a collection of CAL source code examples. Concepts like actors, actor composition, actions, guards, priorities and finite state machines have been illustrated. As presented, CAL and its dataflow MoC have the property of expressing an application as a network of processes. Besides being inherently concurrent and modular, CAL offers scalable parallelism, no shared memory between processes, communication through queues, state encapsulation, sequential execution inside each process provided by finite state machines, and bitwise types. All these properties lead to the portability of CAL, which makes it a potential candidate for unifying system-level design for heterogeneous platforms.

A standardized subset of CAL called RVC-CAL, the RVC standard, and its compiler were also described, together with an in-depth illustration of the compiler infrastructure and its Dataflow and Procedural IR, the Open RVC-CAL Compiler. Although the Dataflow IR makes it possible to describe most dataflow MoCs, it has limitations due to early decisions made by its original developers. First of all, the mandatory ordering of the actions makes indeterministic actors deterministic. Secondly, the absence of an action selection class in the IR makes the construction of an action sequencer for each platform implementation tedious: to construct a new scheduler, a back-end developer should visit all the patterns of the actions and the FSM class of the actor, and should extract the guard part of the actions. Thirdly, the guard of an action is an expression whose transformation into a procedure makes the analysis of the firing condition more difficult. In the Conclusion chapter, a better IR structure will be discussed. Finally, this chapter described all the information on Orcc's Dataflow and Procedural IR that is used in the following chapter, which presents the RVC-CAL high-level synthesis tool called Xronos. This tool extends the Orcc IRs and implements missing compilation techniques that are necessary for high-level synthesis.


4 High-Level Synthesis of Dataflow Programs: Xronos

Njeriu në jetë ka nevoj për një filxhan shkencë, një shishe kujdes, dhe një oqean durim, "Man in life needs a cup of science, a bottle of care, and an ocean of patience" — Ismail Kadare

4.1 Introduction

The first implementation of a direct path to HW generation from a CAL dataflow program, called CAL2HDL, has been reported by Janneck et al. in [146]. CAL2HDL was part of the OpenDF [147] framework for developing CAL programs. First, it transforms a CAL actor into an intermediate representation called the XML Language Independent Model (XLIM). Secondly, OpenForge is used to generate a Verilog description from the XLIM. OpenForge was initially developed by Xilinx under the name Forge, until it became available as open source in 2008 and was renamed OpenForge. CAL2HDL supports only a subset of CAL (for instance, synthesis of unsigned integer types, procedures with arguments and action input/output patterns with multi-token lists is not supported). Thus, the unification of software and hardware development is limited: rewriting an elegant and compact piece of code into a version that can be synthesized is, for very complex applications (e.g. the parser of a video decoder), an especially resource-consuming and error-prone task.

As described in Section 3.4, a subset of CAL was standardized by ISO MPEG in ISO/IEC 23001-4, with the purpose of becoming the reference software code for video coding specifications. There was therefore a need to support the full specification of the RVC-CAL standard in order to facilitate the creation of video codec circuits. An early attempt was made in [166]; yet it lacked loop support and was therefore never finalized. In [2], an XLIM back-end for Orcc was developed to unify software and hardware code generation. However, support for multi-token input and output patterns was missing. In addition, the code generation process was very slow when dealing with complex designs such as an AVC/H.264 video decoder. Jerbi et al. then proposed an interesting solution for multi-token support in [167]. This approach was based on transforming the Orcc IR of actors by modifying the actor's finite state machine and by



Figure 4.1 – Xronos in the Design Flow.

introducing into the IR the equivalent of mono-token actions. Although this represented a solution to the problem, it resulted in an excessive sequentialization of equivalent actor states. As a result, overall latency increased, overall performance was reduced, and resource usage also increased.

Xronos has been coded from scratch, but it can be seen as an evolution of CAL2HDL and of the work done in [2]. Xronos was created with the purpose of resolving all the problems stated above and of accelerating code generation. It uses Orcc as the RVC-CAL front-end and OpenForge as a back-end for generating synthesizable Verilog code for each actor; to be precise, Xronos is the middle-end between these tools. Firstly, it transforms the Orcc IR with a set of transformations/optimizations. Secondly, it turns the IR into a Control-and-Data Flow Graph so that OpenForge can generate a Verilog representation of the actor. After that, Xronos produces C++ source code for general-purpose and embedded processing platforms. Finally, it generates the code for the hardware and software interfaces of heterogeneous platforms.

Table 4.1 summarizes the features and contributions of this thesis.


Table 4.1 – Xronos features versus the state of the art. The contributions of this thesis are related to the high-level synthesis of dataflow programs and are highlighted in bold.

The compared features (for Xronos, CAL2HDL, and Orcc HLS + Vivado HLS) are: CAL support, bit accuracy, unsigned integer support, procedures with arguments, the generators statement, the foreach statement, the repeat construct, parallel I/O read/write, pipelining, static resource analysis, dynamic profiling, testbench generation, TURNUS integration, synthesizable SystemC, HW & SW co-design, coarse-grain clock gating, and open-source availability (for Orcc HLS + Vivado HLS, only the HLS back-end is open source).

All features in bold have been incorporated by the author. Apart from the TURNUS integration (Chapter 5) and the coarse-grain clock gating (Chapter 6), all other features are outlined in this chapter.

Furthermore, this chapter describes in depth all the advances included in Xronos in comparison to the Orcc compiler infrastructure. After that, it explains in detail the OpenForge intermediate representation, called LIM, its scheduling and the hardware code generation. It then clarifies how the Control-and-Data Flow Graphs of an Orcc Procedure, of Actors, and of Actor compositions are mapped into the LIM. Moreover, it presents the development of synthesizable SystemC and of C++ for embedded software processing elements. Afterwards, the mapping, the interface synthesis between hardware and software processing components, and the profiling of heterogeneous platforms are explained. Finally, experimental results prove the capabilities of Xronos in hardware synthesis and in co-design for heterogeneous platforms.

4.2 Advances on the Orcc compiler infrastructure for Hardware Syn- thesis

Figure 4.2 depicts the Xronos compiler infrastructure. Xronos hardware code generation is based on OpenForge, whose intermediate representation is CDFG-based. To facilitate the translation of the Orcc IR to a CDFG for hardware synthesis, additional IR classes are inserted into the Procedural IR and the Dataflow IR. Furthermore, a number of classic compiler dataflow analysis algorithms have been implemented within Xronos for optimizing the code, such as Control Flow Graph construction, the Dominance Graph, Reaching Definitions and Live Variable Analysis. To minimize the number of local variables, the Procedural IR is transformed into a pruned SSA form; thus, fewer wires and registers are needed in the final RTL generation.


Figure 4.2 – Detailed Xronos Compiler Infrastructure; white boxes indicate personal contributions.

After the SSA transformation, a set of IR optimization passes is applied to the Procedure. This further minimizes the BlockBasic instructions and correctly handles the bit-size casting of local variables. In addition, operations such as division and modulo are transformed into synthesizable operators, and other BlockBasic IR passes are carried out.

The following classes are added to the Procedural IR meta-model:

• Instructions: A set of Instructions related to the ports of the actors, and a casting Instruction.

– InstCast: Is a casting Instruction, which casts a local variable to a given Type with a different bit size. – InstPortRead: Defines a single-token read from an input Port. – InstPortWrite: Defines a single-token write to an output Port. – InstPortPeek: Defines a peek (reading a value without consuming it) from an input Port. – InstPortStatus: Defines whether a single token is present on an input or output Port.

• BlockMutex: Defines a kind of Block that contains Blocks that are mutually exclusive (no dependencies) and can execute in parallel.


• CFG: Defines the Control Flow Graph, which has as vertices the Blocks of a Procedure and as edges the order of the Blocks.

• CDFG: The Control-and-Data Flow Graph; its vertices are operations (CDFG nodes), and it has two kinds of edges: control and data. Compared to the CFG, it contains all data and control dependencies, even inside Blocks.

4.2.1 Control Flow Graph Construction

A Control Flow Graph (CFG) models the flow of control between Basic Blocks in a program. A CFG is represented as a directed graph $G = (N, E)$, with each node $n \in N$ corresponding to a Basic Block (a BlockBasic in the Procedural IR) and each edge $e = (n_i, n_j) \in E$ corresponding to a possible path of control from block $n_i$ to block $n_j$.

For each Procedure in the Procedural IR, a CFG member is added. The CFG is a Graph and has two essential features: it identifies the beginning and the end of each basic block, and it connects the resulting blocks with "control" edges that describe the directed control transfer between blocks. A CFG may contain multiple entries and exits, but in RVC-CAL only a single beginning and a single end of a program are possible, which makes the construction of the CFG easier.

Algorithm 1 starts by creating an empty CFG. After that, it adds an empty BlockBasic as the entry node of the CFG. It then visits the Blocks of the Procedure. If a Block is a BlockBasic, it creates a new CFG node and makes it the last one (the visit function sets this node as the last). If a Block is a BlockIf, it first visits the true branch in order to create all of its CFG nodes, and then creates all the nodes of the false branch. To resolve the exit of the BlockIf, an empty join node is added, together with two incoming edges connected to it: one from the last node of the true branch and one from the last node of the false branch. In the case of a BlockWhile, the algorithm adds an edge from the last node to the node of the BlockWhile and stores this node. After that, it creates all the nodes from the Blocks of the BlockWhile and attributes a true flag to them. Once finished with those Blocks, it returns to the node of the BlockWhile as the last one; it should be mentioned that the previous operation inherits the false flag. Once all blocks have been processed, an empty BlockBasic is added to the graph and connected with an edge from the last node.

4.2.2 Dominance Graph

A dominance graph is an iterative data-flow analysis that is used by many optimization techniques. In a CFG with entry node $b_0$, node $b_i$ dominates node $b_j$, written $b_i \text{ dom } b_j$, if and only if $b_i$ lies on every path from $b_0$ to $b_j$. By definition, $b_i$ dominates itself: $b_i \text{ dom } b_i$. Xronos calculates dominance as follows:

$$Dom(n) = \{n\} \cup \left( \bigcap_{m \in preds(n)} Dom(m) \right) \qquad (4.1)$$

with the initial condition $Dom(n_0) = \{n_0\}$ and, $\forall n \neq n_0$, $Dom(n) = N$, where N is the set of all nodes in the CFG. Dom(n) is computed as a function of n's predecessors, preds(n); as a result, dominance is a forward data-flow problem.
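Equation (4.1) can be solved with a simple iterative fixpoint, as in the following sketch (the adjacency encoding and method names are assumptions of this example, not the Xronos implementation):

    import java.util.*;

    // Iterative dominance: Dom(n0) = {n0}; for n != n0, Dom(n) starts at N and
    // shrinks to {n} ∪ (intersection of Dom(m) over predecessors m) until stable.
    public class DominanceSketch {
        static Map<Integer, Set<Integer>> dominators(int entry, Map<Integer, List<Integer>> preds) {
            Set<Integer> allNodes = preds.keySet(); // every CFG node is a key
            Map<Integer, Set<Integer>> dom = new HashMap<>();
            for (int n : allNodes) {
                dom.put(n, n == entry ? new HashSet<>(Set.of(entry)) : new HashSet<>(allNodes));
            }
            boolean changed = true;
            while (changed) {
                changed = false;
                for (int n : allNodes) {
                    if (n == entry) continue;
                    Set<Integer> acc = new HashSet<>(allNodes);
                    for (int m : preds.get(n)) {
                        acc.retainAll(dom.get(m)); // intersection over predecessors
                    }
                    acc.add(n); // {n} ∪ ...
                    if (!acc.equals(dom.get(n))) {
                        dom.put(n, acc);
                        changed = true;
                    }
                }
            }
            return dom;
        }
    }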

Algorithm 1: Control Flow Graph construction, pseudocode visitor.

    class CfgConstruction(Procedure procedure):
        // fields: Cfg cfg; Vertex last; Boolean flag

        def caseProcedure(Procedure procedure):
            cfg := CreateCfg();
            procedure.setCfg(cfg);
            Vertex entry := createVertex();
            cfg.setEntry(entry);
            last := entry;
            last := visit(procedure.getBlocks());
            Vertex exit := createVertex();
            cfg.setExit(exit);
            addEdge(exit);

        def caseBlockBasic(BlockBasic block):
            Vertex node := addNode(block);
            if last != null then
                addEdge(node);
            return node;

        def caseBlockWhile(BlockWhile block):
            Vertex node := addNode(block);
            if last != null then
                addEdge(node);
            last := node;
            flag := true;
            last := visit(block.getBlocks());
            flag := true;
            addEdge(node);   // back edge to the loop header
            last := node;
            return node;

        def caseBlockIf(BlockIf block):
            Vertex node := addNode(block);
            if last != null then
                addEdge(node);
            Vertex join := addNode(block.getJoinBlock());
            join.setLabel("join");
            last := node;
            flag := true;
            last := visit(block.getThenBlocks());
            flag := false;
            addEdge(join);
            last := node;
            last := visit(block.getElseBlocks());
            addEdge(join);
            last := join;
            return join;

        def visit(List blocks):
            for block in blocks do
                last := visit(block);
            return last;

        def addEdge(CfgNode node):
            Edge edge := cfg.add(last, node);
            if flag then
                edge.label := true;
                flag := false;

        def addNode(Block block):
            Vertex node := createVertex(block);
            cfg.add(node);
            return node;

4.2.3 Reaching Definition

Given, for each statement $t \leftarrow \ldots$ with target t, a definition with label $d_l$, the definition $d_l$ is said to reach a statement $d_n$ in the program if there is some path of control-flow edges from $d_l$ to $d_n$ without an intervening assignment that modifies the target t. Reaching definitions can be solved as a forward data-flow problem:

$$ReachIn(n) = \bigcup_{p \in preds(n)} ReachOut(p) \qquad (4.2)$$

$$ReachOut(n) = Gen(n) \cup (ReachIn(n) - Kill(n)) \qquad (4.3)$$

$Gen[d : y \leftarrow f(x_1, \ldots, x_n)] = \{d\}$ and $Kill[d : y \leftarrow f(x_1, \ldots, x_n)] = Defs[y] - \{d\}$, where $Defs[y]$ is the set of all definitions that assign to the variable y.

Gen(m) contains those variables that are used in m before any redefinition in m. Kill(m) contains all the variables that are defined in m.

To solve these equations, the Gen and Kill sets must first be computed. In Xronos, reaching definitions is implemented as a Procedural IR visitor that visits the instructions of all blocks in the Procedure, as described by the pseudocode in Algorithm 2. All sets are stored inside the attributes of each BlockBasic.

4.2.4 Live Variable Analysis

Live Variable Analysis (LVA), or liveness, is a backward compiler dataflow analysis that calculates, for each "program point" (a statement or a Block), the variables that may potentially be read before their next write. It should be noted that compiler dataflow analysis is an entirely different concept from dataflow programming. A variable is live if it holds a value that may be needed in the future. A useful property of liveness is that it finds which variables are used and defined in each BlockBasic. Liveness also makes it possible to find executed code that has no overall effect on the program, as well as uninitialized variables.

A variable v is live at a point p if and only if there exists a path in the CFG from p to a use of v along which v is not redefined. Live information is computed for each Block b in the procedure; Gen(b) is the set of variables used in b before any redefinition in b. Coming back to the first property of LVA, each variable in $Gen(n_0)$ is a potentially uninitialized variable.


Algorithm 2: Creating the Gen and Kill sets.

    class Gen_Kill(Procedure procedure):
        // fields: Set gen; Set kill

        def caseBlockBasic(BlockBasic block):
            gen := new empty Set;
            kill := new empty Set;
            for instruction in block.getInstructions() do
                visit(instruction);
            block.setAttribute("Gen", gen);
            block.setAttribute("Kill", kill);

        def caseAssign(InstAssign assign):
            visit(assign.getValueExpression());
            Var target := assign.getTargetVariable();
            kill.add(target);

        def caseLoad(InstLoad load):
            for expr in load.getIndexesExpressions() do
                visit(expr);
            Var target := load.getTargetVariable();
            kill.add(target);

        def caseStore(InstStore store):
            visit(store.getValueExpression());
            for expr in store.getIndexesExpressions() do
                visit(expr);
            Var target := store.getTargetVariable();
            kill.add(target);

        def caseCall(InstCall call):
            for expr in call.getArgumentsExpressions() do
                visit(expr);
            if call.getTarget() != null then
                Var target := call.getTargetVariable();
                kill.add(target);

        def caseExprVar(ExprVar exprVar):
            Var var := exprVar.getVariable();
            gen.add(var);


The computation of LiveOut is the following: for each node n in the Procedure's CFG, the set Gen(n) contains the variables used in the block of node n before any redefinition. LiveOut(n) is defined by an equation that uses the LiveOut sets of n's successors in the CFG, as well as the two sets Gen(n) and Kill(n).

The dataflow equation is defined as follows:

$$LiveOut(n) = \bigcup_{m \in succ(n)} \big( Gen(m) \cup (LiveOut(m) - Kill(m)) \big) \qquad (4.4)$$

The Gen(m) and Kill(m) sets are calculated according to Algorithm 2. LiveOut(n) is just the union of those variables that are live at the head of some block m that immediately follows n.

The GetReversePostOrder function in Algorithm 3 gives the reverse postorder of the Procedure's CFG. A postorder traversal visits as many of a node n's children as possible before visiting n; a reverse postorder traversal is the opposite, visiting as many of a node n's predecessors as possible before visiting n. Two helper maps, liveInsP and liveOutsP, are used to test whether liveIns and liveOuts have changed after a traversal of all vertices in the CFG: if liveInsP is equal to liveIns and liveOutsP is equal to liveOuts, then the liveness analysis has terminated. Lines 19 to 23 calculate the rightmost part of LiveOut, $(LiveOut(m) - Kill(m))$, and lines 25 to 28 calculate the LiveOut of a vertex v.

4.2.5 Single Static Assignment, Pruned Form

Single Static Assignment (SSA) is an intermediate representation in which each variable has only one definition in the program text [168]. The SSA form represents both the data flow and the control flow in the IR; thus, it serves as a basis for a large set of transformations. Any operation $x \leftarrow \ldots$ is a definition of x, and any operation $\ldots \leftarrow x$ is a use of x. A procedure is in SSA form if the following constraints are satisfied: (1) every variable is defined only once, and (2) every use of a variable corresponds to a single definition.

To make sure that only one unique definition exists per variable, Global Value Numbering (GVN) is applied to the procedure. To transform the Procedural IR into SSA form, Xronos inserts φ-functions (InstPhi in the Procedural IR) at points where different control-flow paths merge, and renames variables as defined by the constraints above. Xronos constructs a pruned-SSA version, which means that all assignments by φ-functions to variables that are not live are eliminated; the live information of the variables is provided by the live variable analysis. Thus, fewer registers are used for loops and branches, because pruned SSA minimizes data communication between basic blocks. The construction of a pruned SSA is beyond the scope of this thesis; for more information the reader may refer to [168].


Algorithm 3: Live Variable Analysis, Procedural IR pseudo visitor.

    1  class Liveness(Procedure procedure):
    2      def caseProcedure(Procedure procedure):
    3          Boolean changed := true;
    4          Ordering rpo := GetReversePostOrder(cfg);
    5          List vertices := rpo.getVertices();
    6          Map liveIns := new Map(Vertex, Set(Var));
    7          Map liveOuts := new Map(Vertex, Set(Var));
    8          Map liveInsP := new Map(Vertex, Set(Var));
    9          Map liveOutsP := new Map(Vertex, Set(Var));
    10         while changed do
    11             changed := false;
    12             for vertex in vertices do
    13                 liveInsP.put(vertex, liveIns.get(vertex));
    14                 liveOutsP.put(vertex, liveOuts.get(vertex));
    15                 Set gen := vertex.getBlock().getAttribute("Gen");
    16                 Set kill := vertex.getBlock().getAttribute("Kill");
    17                 Set liveOut := liveOuts.get(vertex);
    18                 /* Get Live In */
    19                 Set calcIn := Set.copy(gen);
    20                 Set calcRemove := Set.copy(liveOut);
    21                 calcRemove.remove(kill);
    22                 calcIn.addAll(calcRemove);
    23                 liveIns.put(vertex, calcIn);
    24                 /* Get Live Out */
    25                 Set calcOut := new Set(Var);
    26                 for s in vertex.getSuccessors() do
    27                     Set in := liveIns.get(s);
    28                     calcOut.addAll(in);
    29                 liveOuts.put(vertex, calcOut);
    30             if !liveIns.equals(liveInsP) or !liveOuts.equals(liveOutsP) then
    31                 changed := true;
    32         /* Store liveIns and liveOuts for each vertex */
    33         for vertex in vertices do
    34             Block block := vertex.getBlock();
    35             block.setAttribute("LiveIn", liveIns.get(vertex));
    36             block.setAttribute("LiveOut", liveOuts.get(vertex));


4.3 Procedural IR Transformations

To prepare the IR for hardware synthesis, some additional transformations must be applied before converting the Procedural IR of each action into the OpenForge LIM IR. Thanks to the SSA representation of a Procedure, the constant propagation algorithm and dead code elimination are easy to implement. For hardware synthesis, variables and operations should have the correct bit size in order to reduce the resource footprint. Operations such as division or modulo are often not synthesizable for either ASICs or FPGAs; thus, Xronos transforms these operations into synthesizable components.

4.3.1 Expression Evaluator/Simplification

Expression evaluation/simplification is a helper visitor that finds algebraic identities to simplify expressions. For instance, the constant folding transformation described in Section 4.3 uses the Expression Evaluator for streamlining operations on constants.

Table 4.2 shows some identities that can be handled by the expression simplification even if the constant values of the expressions are unknown.

Table 4.2 – Algebraic identities for expression simplification, with ∧ the logical and operator and ∨ the logical or operator.

    a + 0 = a          a ∗ 1 = a           a >> 0 = a
    a − 0 = a          a ∗ 0 = 0           a << 0 = a
    a − a = 0          a ÷ 1 = a           a ∧ a = a
    a ∗ 2^n = a << n   a / 2^n = a >> n    a ∨ a = a

The Expression Evaluator checks whether the value of an ExprUnary or ExprBin is known. If it is, then, knowing the operation, the calculation is performed and the ExprUnary or ExprBin is replaced by a constant Expression such as ExprInt, ExprUint, etc.
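A minimal Python sketch of this behavior (the tuple-based expression encoding is an assumption for illustration, not the Orcc IR): constant operands are folded, otherwise the identities of Table 4.2 are tried.

    def simplify(op, a, b):
        # both operands known: fold into a constant (see Section 4.3.4)
        if isinstance(a, int) and isinstance(b, int):
            return {'+': a + b, '-': a - b, '*': a * b, '<<': a << b}[op]
        # algebraic identities of Table 4.2, applied even when a is unknown
        if op in ('+', '-') and b == 0: return a   # a + 0 = a, a - 0 = a
        if op == '*' and b == 1: return a          # a * 1 = a
        if op == '*' and b == 0: return 0          # a * 0 = 0
        if op == '>>' and b == 0: return a         # a >> 0 = a
        return (op, a, b)                          # not simplifiable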

4.3.2 Single Read and Write Register Optimization

Multiple reads and writes to the same memory are expensive operations: each requires at least one clock cycle for either retrieving or writing a variable value. Furthermore, multiple read/write operations increase the latency of the overall execution of the procedure. To minimize this latency, Xronos traverses the CFG of a procedure and stores all definitions and uses of the scalar state variables in a set called UsedVars. At the entry of the BlockBasic CFG node, Xronos inserts for each v in UsedVars a Load instruction with a target temp_v. After this operation it propagates the variable temp_v to all the uses of v and replaces load/store instructions with assignments. Finally, at the exit of the BlockBasic CFG node, it inserts store instructions for each v with the value of temp_v. It should be mentioned that this optimization is performed at the Procedural IR level and may increase the hardware critical path length of an action.


(a) Original Code:

    actor Actor() int IN ==> int OUT:
        int a := 0; int b := 0; int c := 0;
        action IN:[token] ==> OUT:[c]
        do
            b := token;
            b := a + b;
            c := a + b;
        end
    end

(b) Optimized Register Use:

    actor Actor() int IN ==> int OUT:
        int a := 0; int b := 0; int c := 0;
        action IN:[token] ==> OUT:[c]
        var int temp_a, int temp_b, int temp_c
        do
            temp_a := a;
            temp_b := b;
            temp_b := token;
            temp_b := temp_a + temp_b;
            temp_c := temp_a + temp_b;
            a := temp_a;
            b := temp_b;
            c := temp_c;
        end
    end

Figure 4.3 – Single Read and Write Register Optimization: only a single read and a single write for the a, b, and c state variables.

4.3.3 Uninitialized Variables

Listing 4.1 – An actor with an uninitialized local variable

    actor Uninitialized() ==> int O:

        Act0: action ==> O:[token]
        var int a, int token
        do
            token := 5 + a;
        end
    end

Uninitialized variables are trivial to find once the CFG and liveness have been computed. It is sufficient to retrieve the entry basic block n0 of the CFG and its liveness: every variable v that is not defined in n0 but is found in the liveness set of n0 is an uninitialized variable. By default, Xronos initializes those variables to zero or false (depending on the type of the variable) and emits a warning to the programmer. For Listing 4.1, Xronos will emit a warning that the local variable "a" in action Act0 is uninitialized.
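A sketch of this check, assuming the LiveIn sets computed by Algorithm 3 (names are illustrative):

    def find_uninitialized(entry_block, live_in, defined_in):
        # a variable that is live at the entry block but never defined there
        # is read before any definition, i.e. it is uninitialized
        return {v for v in live_in[entry_block]
                if v not in defined_in[entry_block]}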


Figure 4.4 – Constant Propagation (CP) and Constant Folding (CF): (a) Code; (b) CP Pass 1; (c) CF; (d) CP Pass 2.

4.3.4 Constant Folding/Propagation

Constant propagation searches for a constant value expression c that is assigned to a variable, d_l : t ← c, and another statement d_n that uses t, d_n : x ← t BinOp y. Thanks to the reaching definition analysis pass, it is known that t is constant in d_n if d_l reaches d_n and no other definitions of t reach d_n. Thus, after constant propagation, d_n : x ← c BinOp y.

Constant folding begins once constant propagation has terminated. This transformation searches for a binary or unary expression that contains only value expressions and, depending on the operator, calculates the new expression value and assigns this new expression to the target.

Figure 4.4 shows constant propagation and folding in action. For each procedure, constant propagation is relaunched until no further modification is possible.
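The following Python sketch (a toy SSA program encoded as a dictionary, not the Orcc IR) shows the propagate-then-fold loop relaunched to a fixpoint:

    def propagate_and_fold(prog):
        # prog maps an SSA target to ('const', value) or ('binop', op, a, b)
        changed = True
        while changed:
            changed = False
            consts = {t: ins[1] for t, ins in prog.items() if ins[0] == 'const'}
            for target, ins in list(prog.items()):
                if ins[0] == 'binop':
                    op = ins[1]
                    a = consts.get(ins[2], ins[2])   # propagate constants
                    b = consts.get(ins[3], ins[3])
                    if isinstance(a, int) and isinstance(b, int):   # fold
                        prog[target] = ('const', {'+': a + b, '-': a - b,
                                                  '*': a * b}[op])
                        changed = True
        return prog

    prog = {'x': ('const', 4),
            'y': ('binop', '+', 'x', 'x'),
            'z': ('binop', '*', 'y', 'y')}
    print(propagate_and_fold(prog))   # y becomes ('const', 8), z ('const', 64)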

4.3.5 Dead Code Elimination

Dead code elimination acts on Instructions, Blocks, and Actions. If an instruction t ← ... contained in a BlockBasic b has a target variable t such that t ∉ LiveOut(b), then this instruction can be removed. This can also be resolved by using the use/def chain within the SSA representation: if the variable t has no use, then the instruction can be removed.

Constant propagation also acts on the BlockIf condition. Thus, if the condition is a boolean expression with value true, the Then Blocks are copied to the container of the BlockIf and the BlockIf is removed. Conversely, if the condition is false, the Else Blocks are copied to the container Block and the BlockIf is removed; and if the condition of a BlockWhile is false, the BlockWhile is removed.

Dead code elimination is also applied on the Dataflow IR. If an actor is parameterized, it is possible that some actions contain a guard condition that includes the actor parameter value. Hence, if a guard of an Action evaluates to false, this action is eliminated from the dataflow model; and if this action is contained in a transition of the FSM of the actor, then this transition is also removed.
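A sketch of the use/def-chain variant on the same toy encoding (roots stands for the variables that are live out of the procedure; illustrative, not the Xronos pass). Removing an instruction can make others dead, so the pass iterates to a fixpoint:

    def eliminate_dead_code(prog, roots):
        changed = True
        while changed:
            changed = False
            used = set()
            for ins in prog.values():
                used.update(a for a in ins[1:] if isinstance(a, str))
            for target in list(prog):
                # no use and not observable: the definition is dead
                if target not in used and target not in roots:
                    del prog[target]
                    changed = True
        return prog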


4.3.6 Type Casting

Type casting is a necessity for having a bit-accurate execution of a program. It is applied to the parameters of a function and to assignments. Each parameter of a function should have the same type as the function's corresponding argument; if this is not the case, then a cast operation is necessary. The same applies when a variable with a different type, or the same type with a different bit size, is assigned (stored or loaded) to a left-hand variable in an instruction such as t ← .... For casting an expression, the Least Upper Bound (LUB) should be defined: the LUB represents the minimal type to which both Expressions a and b can be assigned. Xronos defines its LUB rules as given in Table 4.3. Type casting is applied as a visitor to the Procedural IR. All Block, Instruction, and Expression classes are iteratively visited until all LUB rules have been employed.

Table 4.3 – Least Upper Bound on Types

                                  BitSize b
    BitSize a   int                       uint                      float   string
    int         int max(a,b)              int max(a,b)   if a > b   float   X
                                          int max(a,b+1) if a < b
    uint        int max(a+1,b) if a > b   uint max(a,b)             float   X
                int max(a,b)   if a < b
    float       float                     float                     float   X
    string      X                         X                         X       string
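A Python sketch of the int/uint rows of Table 4.3 (types encoded as (sign, bit-size) pairs; the float and string rows are omitted; names are illustrative):

    def lub(ta, tb):
        (sa, wa), (sb, wb) = ta, tb
        if sa == sb:                       # int ∨ int or uint ∨ uint
            return (sa, max(wa, wb))
        if sa == 'int' and sb == 'uint':
            # the unsigned operand needs one extra bit once it becomes signed
            return ('int', max(wa, wb) if wa > wb else max(wa, wb + 1))
        return lub(tb, ta)                 # uint ∨ int: symmetric case

    print(lub(('int', 8), ('uint', 8)))    # ('int', 9)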

4.3.7 Division and Modulo Implementation

Division and modulo operations are supported by HDL simulators but not by logic synthesizers. In Xronos, those operations are replaced by synthesizable ones. The Types of the numerator and denominator play an important role in how the division is performed. If either the numerator or the denominator type is signed, then the unsigned one should be cast to a signed integer. If the bit size of the unsigned one is greater than that of the signed one, then both of them should be cast to signed integers and the bit size of the signed integer Type should be incremented by one.

As can be seen in Algorithm 4, the division is bit accurate and its output is an unsigned integer with a bit size = maxSize(num, den). The algorithm takes size clock cycles to finish the operation. This algorithm is implemented in Xronos as a Procedural IR visitor: if an ExprBin has a division or modulo operator, then num and den are extracted from the expression and two new variables are added to the procedure. Algorithm 4 is constructed programmatically and added to the instructions that contain the division/modulo expression. Algorithm 5 represents the integer version of the algorithm. To avoid cluttering


Algorithm 4: Unsigned Integer Division and Modulo replacement

    def DivisionModulo(isDivision, num, den):
        size := maxSize(num, den);
        uint(size) result := 0;
        uint(size) remainder := num;
        uint(size) mask := 1 << (size - 1);
        for i := 1 to size do
            uint(size) numer := remainder >> (size - i);
            if numer >= den then
                result := result or mask;
                remainder := remainder - (den << (size - i));
            mask := mask >> 1;
        if not isDivision then
            result := remainder;
        return result;

Algorithm 5: Integer Division and Modulo replacement

    def DivisionModulo(isDivision, num, den):
        size := maxSize(num, den);
        int(size) result := 0;
        int(size) remainder;
        int(size) mask := 1 << (size - 1);
        int(size) denom;
        int(size) numer;
        int flipResult := 0;
        if num < 0 then
            num := -num;
            flipResult := flipResult xor 1;
        if den < 0 then
            den := -den;
            flipResult := flipResult xor 1;
        remainder := num;
        denom := den and ((1 << size) - 1);
        for i := 1 to size do
            numer := remainder >> (size - i);
            if numer >= denom then
                result := result or mask;
                remainder := remainder - (den << (size - i));
            mask := (mask >> 1) and ((1 << size) - 1);
        if flipResult != 0 then
            result := -result;
        if not isDivision then
            result := remainder;
        return result;

the algorithms with more source code lines, the statement that consumes the result variable is omitted in both algorithms.
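As a sanity check (not part of the Xronos flow), the shift-subtract scheme of Algorithm 4 can be replayed in Python and compared against the built-in integer division; unsigned operands and den > 0 are assumed:

    def division_modulo(num, den, size):
        result, remainder, mask = 0, num, 1 << (size - 1)
        for i in range(1, size + 1):          # one quotient bit per iteration
            if (remainder >> (size - i)) >= den:
                result |= mask
                remainder -= den << (size - i)
            mask >>= 1
        return result, remainder

    for num in range(16):
        for den in range(1, 16):
            assert division_modulo(num, den, 4) == (num // den, num % den)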

4.4 Pipelining

A CAL network of actors can be seen as a pipeline structure. The design frequency of the pipeline is determined by the critical action, the one with the longest combinatorial critical path. In general, a design goal is to increase the operating frequency so that the overall data throughput increases as well. If the critical action is not in the critical path of the whole CAL program, then the goal can be achieved by dividing the action into smaller actions within the same actor, without pipelining. Otherwise, the critical action can be pipelined by extracting it from the original actor and by partitioning it into two or more smaller actors that implement the pipeline stages. To find the critical action, the CAL program is first synthesized to HDL and then to RTL, where the information on the combinatorial critical path can be obtained.


Figure 4.5 – Pipelining Optimization, from decision to generation.

The technique to extract an action from an actor that contains other actions into its own actor can be described as follows. First, the action has to be analyzed for its main input and output ports. Along with its original input port, the state variables read/used by the action also have to be received via input ports. If the action modifies a state variable, then the value has to be sent back as feedback via an output port to the other actions that require this variable. Whenever the input port of the critical action is also used by other actions in the original actor, the consumption of tokens from the port has to be accurately controlled by guard conditions so that two actions do not consume the tokens at the same time. As for the output port, when other actions in the original actor use the same port as the critical action, the port has to be multiplexed correctly so that only a single output token is taken from the port at a time.

The pipeline technique implemented in Xronos is described in [78]. First, initial as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) schedules are computed, and the corresponding mobility of operators is extracted from the CDFG graph (see Section 4.6). From this, an operator coloring technique is applied on conflict and nonconflict directed graphs using recursive functions and explicit stack mechanisms.



Figure 4.6 – One clock per stage pipeline scheduling with chaining and without sharing resources.

For each feasible number of pipeline stages, a pipeline schedule with minimum total register width is taken as an optimal coloring, which is then automatically transformed into an RVC-CAL description. An RVC-CAL developer can add a pipeline attribute to a selected action for it to be pipelined.

4.5 Actor’s Action Selection Procedure

An actor has a set of actions, each of which can read, write, and change state variables, as discussed in the previous chapter. If an actor has many actions, then either an FSM is defined or the control is exercised by the firing conditions (guards and token availability) of the actions. When implementing an actor in either hardware or software, an action selection procedure must be defined. This procedure handles the execution of an actor as mentioned in Section 3.3.2, and it is implemented as a finite state machine. In comparison with actor machines [151], here the action selection calculates all conditions in parallel and lets the extracted actor state machine choose which action should be executed.

Figure 4.7 represents the action selection finite state machine of an actor with four states. For clarity, the image does not contain multiple transitions (actions) for each state. Figure 4.7a depicts a general software-oriented action selection FSM, in which each state has a transition with three steps: first, the transition has a blocking read on the input ports; then, once the firing rules are satisfied, it fires the action; and finally, it has a blocking write on the transition output ports.

An action might read and write a list of tokens, but in hardware a FIFO queue implementation supports a single value in and a single value out, which poses an implementation problem. However, a list of tokens might be implemented in four ways: as a bus; as a concatenation of all list data into a single datum of a larger bit size; as a dual-port memory that acts as a queue; or as sequential reading and writing of single values from the input and output ports.


Figure 4.7 – Finite State Machine of Action Selection: (a) Software Action Selection; (b) Xronos Action Selection.

The first two approaches cannot be implemented in hardware for two reasons. Firstly, in the general case the actor's output port is connected to a successor actor that consumes data at a rate different from the one at which the predecessor produces it, so part of the data would be lost. Secondly, the guard of an actor might peek at an index that is not at the head of the queue and thus does not satisfy the firing condition; in other words, all values should be available before the firing condition is resolved. The third approach, an elegant implementation of which is given in [169], has the disadvantage of additional latency when both actors are writing and reading at the same time. Moreover, as mentioned in the same paper [169], for the same RVC-CAL application Xronos, which implements the fourth option, achieves a two times higher throughput.

The fourth approach, reading and writing single tokens one by one, is easier to implement by adding two more states to the actor's FSM: the READ and the WRITE state. However, it is necessary to have a local list for each input and output port, sized by the largest repeat found in the actor's actions. This might be considered a disadvantage, but it actually is not: by having small local memories, the FIFO queue size can be decreased. In addition, another significant advantage is that actions with multiple reads/writes on different ports can perform a parallel read on those input ports in the READ state and a parallel write on the output ports in the WRITE state.

This is possible because there are no dependencies between those operations.

Let us consider the example in Listing 4.2:

Listing 4.2 – Foo Actor

    actor Foo() int A, int B ==> int C:

        Act0: action A:[a] repeat 64, B:[b] repeat 32 ==> C:[a[0] + b[0]] end

        Act1: action A:[a] repeat 32 ==> C:[a] repeat 32 end

        schedule fsm s0:
            s0 (Act0) --> s1;
            s1 (Act1) --> s0;
        end
    end

Actor Foo has two actions, Act0 and Act1, both of which use the CAL keyword repeat. Act0 reads a list of 64 tokens from port A and 32 tokens from port B and outputs a single token. Act1 reads a list of 32 elements and writes the input list to its output C. Xronos modifies the FSM as depicted in Figure 4.8a by adding the READ and WRITE states. The READ state in Figure 4.8b starts by forking both reading paths on the input ports if the previous state was s0; only port A is read if it was s1. The WRITE state in Figure 4.8c writes, depending on the previous state, 1 token for s0 and 32 tokens for s1.

4.5.1 Construction of the Action Selection Procedure

Contrary to other Orcc code generators, Xronos does not pretty-print code: it transforms the Dataflow and Procedural IR into a close-to-hardware intermediate representation and then applies scheduling on it. Moreover, Orcc backends do not create a Procedure for the action selection; they just print it by visiting the actor's FSM and the actions' inputPatterns and outputPatterns. In Xronos, the choice was to first create an Orcc Procedure, which is then transformed into an object called Task in the LIM IR by the Xronos Procedural IR to LIM mapping process discussed in Section 4.8. This homogenizes and facilitates the overall transformation process of Procedures in the Procedural IR. Xronos applies the following steps/transformations for creating the Action Selection procedure:

1. ActionSelection Procedure: A new empty procedure with the name ActionSelection is created and added to the actor.

2. AddFSM: This transformation will add an FSM to the actor if it does not have one.


Figure 4.8 – Foo actor Action Selection: (a) Action Selection FSM; (b) State READ; (c) State WRITE, W=1 if Act0, W=32 if Act1.

Also, if an actor has an FSM but contains actions that are outside of the FSM, it will create a transition with the highest priority for each state and add it to the state. In addition, some actors may contain special actions called initialize, used for initializing variables as well as for any other initialization purpose. AddFSM will create an INIT state with a set of transitions, defined by the number of initialize actions in the code, add the INIT state, and set it as the initial state. This transformation is applied in order to homogenize the generation of a general actor.

3. WhileBlocks: This step creates a list of Blocks that will accommodate the Blocks created for the following 7 steps.

4. EnumerateStates: This step creates a BlockBasic which assigns an index to each state of the actor’s FSM;

5. IOLists: This transformation adds five state variables for each input and output port: tokenIndex, which represents the number of tokens read or written so far; MaxTokenIndex; RequestTokens, a state variable that indicates the number of input tokens that an action needs at a given state so that it can fire; SentTokens, which signifies the number of output tokens that should be written on the output port; and finally PortEnable, which controls, in the READ and WRITE states, whether a parallel reading/writing should be performed in those states. In addition, it adds a list state variable named portList for each port, with the maximum number of elements specified by the largest repeat CAL construct


in each action port. Finally, it modifies all the InstLoad instructions associated with an input port so that they target the corresponding portList, and it applies the same modification to the InstStore instructions associated with an output port. It should be mentioned that the association is given by the inputPattern and outputPattern of the actions.

6. LoadCurrentStateBlock: This step creates a BlockBasic with a single InstLoad instruc- tion that loads the actor’s initial state index to the FSM’s state variable called fsmState.

7. IsSchedulableBlocks: This step visits each action's isSchedulable Procedure. As described in Section 3.6.1, the isSchedulable procedure returns a boolean value that signifies that the firing condition is satisfied. IsSchedulableBlocks copies the body Blocks of all the isSchedulable procedures into a list of Blocks and then replaces the InstReturn instruction by an InstAssign that assigns the return value to the partialFiring local variable.

8. PortStatusBlock: This step creates a BlockBasic and adds an InstPortStatus instruction for every port of the actor. Thus, it is possible to check whether a token is available in an input queue and whether space is available in the output queue.

9. HasTokensBlock: This step creates a BlockBasic with only InstAssign instructions, which assign the boolean expression tokenIndex == RequestTokens that signifies the partial firing condition on the inputs. RequestTokens is changed when a state demands the number of tokens to be read from an input.

10. StateBlocks: For each state a BlockIf is constructed to accommodate the state transitions. If a state has a single transition, then: the BlockIf condition is an ExprVar that references the value of the variable created by IsSchedulableBlocks; the BlockIf's thenBlocks contain the call of the action's procedure and a change of state to the following state of the FSM; and the BlockIf's elseBlocks contain a BlockIf for checking whether it is necessary to read from the input port of the action, followed by the change to the READ state. If a state has many transitions, then a nested BlockIf is added on the elseBlocks of the transition that has the highest priority. Figure 4.9 depicts the flow graph of a state with two transitions. The first transition handles the call for action A and the second one the call for action B. Both of them read from an input PI and write to an output PO. If Guard A and Guard B are false, then both actions require input tokens from PI. If the actions have different consumption rates, then PI_requestTokens = max(repeat(A,PI), repeat(B,PI)). The subtraction PI_tokenIndex_read = PI_requestTokens − PI_tokenIndex helps to read only the necessary tokens for the next state.

11. InfiniteBlockWhile: An infinite BlockWhile is constructed and added to the Action Selection procedure. The condition of the BlockWhile is an ExprBool with the boolean value true. All the blocks of the WhileBlocks are added to the BlockWhile's blocks.



Figure 4.9 – A State that contains two transitions.

Once the Action Selection construction has finished, two Procedural IR optimizations are applied to it. The first one, BlockCombine, combines the different BlockBasic blocks into a single one. The second one, RedundantLoadElimination, eliminates all the redundant InstLoads from the state variables. Reading from a state variable has a minimum latency of one clock cycle, whether from a register or a memory. Thus, all guards using the same state variable for the firing condition are combined into a single InstLoad, and all following InstLoad instructions are replaced by InstAssign instructions to a local variable of the procedure.
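A sketch of the RedundantLoadElimination idea on a flat instruction list (the tuple encoding is illustrative, not the Orcc IR): the first load of each state variable is kept, and every later load becomes an assignment from the local copy.

    def eliminate_redundant_loads(instructions):
        local_copy = {}                      # state variable -> local temp
        out = []
        for inst in instructions:
            if inst[0] == 'load':            # ('load', target, state_var)
                _, target, state_var = inst
                if state_var in local_copy:  # already loaded once: reuse it
                    out.append(('assign', target, local_copy[state_var]))
                    continue
                local_copy[state_var] = target
            out.append(inst)
        return out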

4.6 CDFG Representation of a Procedure

Before explaining each mapping process, the Control and Dataflow Graph (CDFG) should be defined. A CDFG is a graph G(V, E) whose nodes V are either themselves CDFG graphs (three node kinds: Block CDFG, Branch CDFG, or Loop CDFG), special nodes (Entry, Init Entry, FeedBack (FB) Entry, and Exit), or operation nodes. The CDFG has two types of edges, E = {E_d, E_c}, with E_d a data dependency and E_c a control dependency. A data dependency signifies that a data value is passed from a node v_i to a node v_j. A control dependency, however, signifies the flow of control from node v_i to node v_j, i.e., v_i has finished its processing and hands control over to v_j. In the literature, CDFG nodes that contain subgraphs are also called Hierarchical Task Graphs (HTG) [170].


Compared to a CFG, a CDFG contains more information on how the operations of Blocks are interconnected. The construction of the CDFG takes a CFG graph as input and populates its nodes and edges by visiting the Instructions.
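An illustrative data structure for such a graph (the class names are assumptions, not the Xronos ones): nodes may carry a subgraph, and every edge is tagged as data or control.

    from dataclasses import dataclass, field

    @dataclass
    class CDFGNode:
        kind: str                  # 'operation', 'block', 'branch', 'loop', ...
        subgraph: 'CDFG' = None    # hierarchical nodes contain their own CDFG

    @dataclass
    class CDFG:
        nodes: list = field(default_factory=list)
        edges: list = field(default_factory=list)   # (type, src, dst)

        def add_edge(self, etype, src, dst):
            assert etype in ('data', 'control')
            self.edges.append((etype, src, dst))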

Let us consider the example in Listing 4.3:

Listing 4.3 – A Simple Procedure.

    procedure toto(int b)
    var
        int a
        int c
    begin
        a := b;
        while a < 10 do
            if b > 5 then
                c := 2;
            else
                c := 1;
            end
            a := a - c;
        end
        ...
    end

A partial CDFG of Listing 4.3 is given in Figure 4.10.

4.7 Language Independent Model (LIM)

The OpenForge IR is called the Language Independent Model (LIM). The LIM IR expresses operations in terms of graphical objects: nodes are components, and edges are control or data dependencies (as in a CDFG). Components are included in Modules and Modules in Tasks. Tasks can be executed sequentially or in parallel and are included in a Design, which is the top level of the program to be synthesized.

4.7.1 Component

Nodes in LIM are Components, represented in Figure 4.11. A Component is the base class of all components in the model. It is a node in a dataflow graph of operations and describes a module, or an arithmetic or memory operation.

A component receives its input data through an Entry and outputs all processed data through its Exit. The Entry specifies the input data Ports, and the Exit specifies the output data Buses.

• Entry: Specifies one possible set of inputs for a Component's Ports. For each Port, the inputs are specified as a collection of one or more Dependencies. When scheduling, the Dependencies of a given Entry must be scheduled so that they are all fulfilled simultaneously.



Figure 4.10 – Partial CDFG of Listing 4.3.

However, they may be scheduled without regard to the Dependencies of other Entry objects. Multiple Entries for the same Component are logically muxed.

• Exit: An Exit is a group of Buses that represent an exit condition from an Operation. In addition, an Exit includes the control signal as well as all the data values that are output at that point. A given Component may have multiple Exits.

• Port: Represents the receiving side of a data connection between components. At most one source Bus may be connected to a Port. A Port can be owned only by a single component, and graphically it is an anchor.



Figure 4.11 – LIM Component.

• Bus: Represents the source side of a data connection between components. One or more destination Ports may be connected to the Bus; these represent the receivers of the Bus's data. Each Bus is owned by a single component. In addition to Port connections, a Bus may have multiple logical dependent Ports, represented by Dependencies; it is from these that the final Port connections are ultimately derived. Each Bus has an underlying vector of Bits that comprise it. The number of bits in the vector is determined when the width of the Bus is established. Each of these Bits also designates the Bus as its owner; they can also be referenced by any value, but their number and identity within the Bus can never be changed once they are created. The Bus also has a value: an initial value is created along with the bit vector, containing just the bits of the vector. Other values may be set on the Bus containing Bits that are owned by other Buses.

• InBuf: An InBuf is used to bring a structural flow from the outside of a Module to its inside. A single InBuf is created automatically for each Module, and adding a data Port to the Module also adds a corresponding Bus to the InBuf. The InBuf itself has no Ports and no Entries. Logically, the Ports of the Module are continued internally by the InBuf's Buses. The InBuf's single Exit is created with the usual done Bus and at least two data Buses. The done Bus is the continuation of the Module's go Port, which is why it is also known as the "go Bus". The clock and reset signals of the owner Module are represented by the first two data Buses; this is the only case in which a clock or a reset exits a Component.

• OutBuf: An OutBuf is used to bring a structural flow from the inside of a component's owner (Module) to its outside. An OutBuf is created automatically for each Exit of the Module, and adding a data Bus to the Exit also adds a corresponding Port to the OutBuf. The OutBuf itself has no Exit and no Buses. Logically, the Buses of the Exit are the continuation of the OutBuf's input Ports, which are called peers.

• Dependency: A Dependency describes the relationship between a dependent Port and the Bus on which it depends. There are multiple types of dependency, and the type of dependency affects how it is handled during scheduling. Data Dependencies indicate that the port will receive a connection to the logical bus that is depended on, though it may be delayed or latched. Control Dependencies will cause the scheduler to find the


control signal that qualifies the dependent bus and will use that controlling signal to resolve the dependency.

• Control Ports and Buses: Each component has a Clock, a Reset, and a Go Port. Also, each component has a Done Exit, which includes the Done Bus. The Clock Port indicates the clocking signal, the Reset Port indicates the reset signal, and the Go Port indicates that the component is ready to be launched. The Done Bus indicates that the component has finished its processing.
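The wiring rules above can be summarized by a simplified Python model (an illustration, not OpenForge code): a Port has at most one source Bus, a Bus may drive many Ports, and every Component carries the go/done handshake alongside its data connections.

    class Bus:
        def __init__(self, owner, width):
            self.owner, self.width = owner, width
            self.receivers = []                  # Ports driven by this Bus

    class Port:
        def __init__(self, owner):
            self.owner, self.source = owner, None

        def connect(self, bus):
            assert self.source is None, "at most one source Bus per Port"
            self.source = bus
            bus.receivers.append(self)

    class Component:
        def __init__(self, n_in, n_out, width=32):
            self.go = Port(self)                             # control input
            self.data_ports = [Port(self) for _ in range(n_in)]
            self.done = Bus(self, 1)                         # control output
            self.data_buses = [Bus(self, width) for _ in range(n_out)]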

4.7.2 Primitives

A Primitive is a low-level component that performs a simple logic function. It has a single Exit, and the clock, reset, go, and done Buses are unused. The Primitive classes are the following:

• And: An And primitive accepts multiple 1-bit signals and performs an AND operation on all of them together to generate a 1-bit result.

• Mux: Accepts paired GO/Data signals, using each GO to select its related Data to be provided on the result Bus.

• Not: A Not primitive accepts multiple 1-bit signals and performs a NOT operation on all of them together to generate a 1-bit result.

• Or: An Or primitive accepts multiple 1-bit signals and performs an OR operation on all of them together to generate a 1-bit result.

• Reg: Reg models a register and can be constructed to match any of the configurations possible in a Xilinx FPGA. Specific accessor methods are available to retrieve the data, enable, set, reset, clear, and preset ports. Its Exit specifies that the Latency of this component is one. A Reg object always has ports for sync Set, sync Reset, Enable, async Preset, and async Clear.

4.7.3 Operation

An Operation is an executable Component. Operations are assembled into Modules. Operations are separated into three categories: ValueOp, UnaryOp, and BinaryOp.

• ValueOp: An operation that generates a single value (a constant).

• UnaryOp: Base class of all unary operations, which require only one operand and generate one result. The type-casting operation takes two parameters, size and signed: size signifies the bit size to be cast to, and signed whether the cast should be signed. Other unary operations are ComplementOp, MinusOp, PlusOp, and ReductionOp.


• BinaryOp: Base class of all binary operations, which require two operands and generate one result. OpenForge’s binary operations are: AddOp, AndOp, ConditionalAndOp, ConditionalOp, ConditionalOrOp, DivideOp, ModuloOp, MultiplyOp, OrOp, ShiftOp, SubstractOp, and XorOp.

4.7.4 Memory

LogicalMemory is a symbolic representation of a memory space in a Design. The two primary attributes of a LogicalMemory are its allocations and accesses.

• Allocation: Is a region of memory that is defined by the program via a variable decla- ration. Each allocation has associated with it a size (the number of bytes needed to represent the variable’s type) and an initial value. Allocations may be added to and deleted from a LogicalMemory, as well as queried. Together the allocations represent the memory space that is legally usable by the Design.

• Location: Designates a region of memory. It is a symbolic representation, and so only records the size of the region as a number of addressable units. The actual address of the Location is given by the LogicalMemory to which the Location belongs.

• LogicalValue: An instance of LogicalValue is the initial value of a given location in a LogicalMemory. This initial value is specified when allocating a new region of the memory. A LogicalValue can report its size in addressable units (stride) as well as the content (numerical value) of those units.

• Access: Is a read or write of LogicalMemory. Accesses are denoted by LValues (left hand values), each of which is either a read or a write. The LogicalMemory can record all the Locations that are referenced by a given LValue, and determine whether a given Location refers to all, part, or none of an existing allocation.

• MemoryAccess: It factors out functionality that is common among all accesses to mem- ory such as address port, done bus and methods that identify whether the node uses go or done. It is the base class of MemoryRead and MemoryWrite.

• AbsoluteMemoryRead: Is a fixed access to a LogicalMemory, in which the Location being accessed is fully specified at compile time and does not depend on a base or offset address. This module is populated with a MemoryRead and two constants. The first identifies the address being read and is a DeferredConstant based on the particular Allocation being accessed. The second is a fixed constant, indicating the number of bytes being accessed. It is used to read stored values from an array.

• AbsoluteMemoryWrite: Is a fixed access to a LogicalMemory, in which the Location being accessed is fully specified at compile time and does not depend on a base or offset address. This module is populated with a MemoryWrite and two constants. The first


identifies the address being written and is a DeferredConstant based on the particular Allocation being accessed. The second is a fixed constant, indicating the number of bytes being accessed. It is used to store values into an array.

4.7.5 Module

A Module is a component that may contain other Components. It is the abstract class that is extended by Block, Decision, Branch, Loop, LoopBody, Kicker, and Referee.

Block


Figure 4.12 – A Block with two components that execute one after the other.

Block is a Module that contains a sequence of Components that are logically executed one after the other. It represents a portion of code that executes statements sequentially, like basic blocks in a general compiler. All components' data and control dependencies are resolved during the construction of the Block. Finally, a Block contains an Entry with a set of input ports and an Exit with a set of output ports. As with every Component, a Block has go, reset, and clock input ports and a done output bus.

Latch

Latch is not a subclass of Reg. Rather, a Latch is the composition of a Reg with enable and reset input ports and a 2-to-1 Mux.

Decision

Decision is a refinement of Module for components that produce a truth value. It defines two Exits, one whose done bus signals true and one whose done bus signals false. On a true condition, the true bus is asserted and the false bus is deasserted; on a false condition, the true bus is deasserted and the false bus is asserted. The Decision is constructed with a Component representing the boolean value to be tested. A Decision produces a single data value.


Figure 4.13 – A Decision Module.

Branch


Figure 4.14 – A Branch with a true and a false part, neither of which has an input port. The decision input is connected by a data dependency from the peer bus of the Branch input port to its input port. The Branch has an Exit with two control dependencies: 0 from the Exit's Done of the true Block and 1 from that of the false Block.

A branch represents a choice of execution between two Components, one representing a path for a true condition, the other a path for a false condition. The choice is based on the output value of a Decision.


Loop and LoopBody

A Loop is a generic representation of a loop control structure. Its purpose is to execute the contents of a LoopBody iteratively. Internally, the Loop also creates data Registers and Latches to manage feedback data.


Figure 4.15 – A Loop Module that contains a WhileBody.

A LoopBody characterizes the body of a Loop. A LoopBody designates one of its Exits as a "feedback exit." When this exit is asserted, it indicates that another iteration of the loop should begin. The Buses of this exit are connected to the feedback inputs of data and control Registers. If the body will never iterate (for example, if it ends with a break), then this exit may be null. The Decision may also be null if it is known that it will never be reached (for instance, a do-loop that ends with a break). In addition, a LoopBody identifies the input Port that provides the initial value for each feedback data flow. Even though OpenForge provides three kinds of LoopBodies (ForBody, WhileBody, UntilBody), in Xronos only the WhileBody is used, because the Procedural IR contains only one kind of loop, the BlockWhile.

4.7.6 Design

Design is the top-level representation of an implementation in hardware. It defines a set of input and output ports, internal logical memories, clock domains, and its computational parts, the Tasks.

Input/Output Interface

Designs communicate with each other via an interface protocol, as depicted in Figure 4.16. This interface protocol is compatible with standard FIFO interfaces such as First Word Fall Through (FWFT). The FIFO size for each Design interconnection can be specified by the user or determined by various strategies; a strategy for minimal FIFO sizing is illustrated in Section 5.4.3.



Figure 4.16 – Design I/O FIFO Interface.

The interface has four signals: DATA, SEND, ACK, and RDY. DATA is an N-bit data signal containing the token value. SEND is a 1-bit signal indicating that the value on DATA is valid. ACK is a 1-bit acknowledge signal indicating that the token on DATA is being consumed. The token transaction occurs on the rising edge of the system clock when both SEND and ACK are asserted. Finally, RDY is a 1-bit ready signal, which is incorporated only on output interfaces: if ready is asserted, the consumer indicates that an ACK will be provided immediately (combinatorially) in response to the assertion of SEND. Xronos also provides asynchronous FIFOs with two clock domains, one for writing and one for reading.
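A cycle-level toy model of this handshake in Python (not the generated RTL) makes the rule explicit: a token moves only in a cycle where SEND and ACK are both asserted.

    def simulate(tokens, consumer_ack):
        transferred, idx = [], 0
        for cycle, ack in enumerate(consumer_ack):
            send = idx < len(tokens)    # producer asserts SEND while it has data
            if send and ack:            # rising edge with SEND and ACK: transfer
                transferred.append((cycle, tokens[idx]))
                idx += 1
        return transferred

    # consumer acknowledges on cycles 1, 2 and 4
    print(simulate([10, 20, 30], [False, True, True, False, True]))
    # -> [(1, 10), (2, 20), (4, 30)]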

Logical Memories

See Section 4.7.4 for the logical Memories that the Design supports.

4.7.7 Clock Domains

Each Design has by default a single clock domain, CLK. If a Network has been partitioned into different clock domains by the user, then this information is passed to the Design and the name of the clock domain is modified. The transition between two Designs belonging to different clock domains is handled by instantiating asynchronous FIFOs.

Task

A Task is a thread of execution within a Design. There exist two types of Tasks: masters and slaves. A master can call slaves, but a slave can call neither a master nor another slave, and a master cannot call itself. The executable contents of a Task are expressed as a Call to a Procedure (not to be confused with the Procedural IR's Procedure). A Task starts execution once its go signal has been activated and terminates by sending a done signal. If a Task can be executed combinatorially, then the done signal is connected directly to go, which means that it takes zero clock cycles to execute the Task.

• Call: Is a Reference to a Procedure. It represents a call by name, and it is expected that the HDL compilation will resolve the reference. A Call contains maps between the Call and the referent Procedure body for ports, exits, and buses.

• Procedure: Is a Component wrapper that allows the Component to be invoked via the execution of a Call. A Procedure represents the definition of a call-by-name. It is expected that the Procedure will be defined once in the output HDL and that the HDL compiler will link each call to that definition. Procedures are not in themselves shared; each Call represents a new instantiation of the hardware defined by the Procedure. A Procedure has a label and a Block.

4.7.8 Scheduling

LIM objects are scheduled by unconstrained As Soon As Possible (ASAP) scheduling. In general, Xronos operates as if there were an unlimited amount of available resources in an FPGA. The reason for this is that resource sharing (a single hardware unit implementing multiple operations locally in a block or a super-block) requires adding multiplexers to the input ports of the shared hardware unit, and multiplexers are costly to implement in FPGAs [171].

Let’s consider a Block that has N components. The data control-flow graph of the block is a directed acyclic graph G(V,E), with V the list of the components of the Block and E the data dependencies. Each v V represents an operation o in the behavioral description. A direct i ∈ i edge e from v V to v V exists in E if the data produced by operation o is consumed i j i ∈ j ∈ i by o j . In this case vi is an immediate predecessor of v j which is denoted by Predvi . The immediate predecessor in LIM is achieved by asking the entry of vi of its port dependency. In the same way v j is an immediate successor of vi denoted by Succvi and in LIM the successor of the component is given by the Exit data buses

Each component in LIM can be executed in D_i control steps, which are given by the control dependencies of the Block components. Each control step signifies a latency of zero or more clock cycles. Xronos assumes that each combinatorial operation has a latency D_i of zero; that means that a set of operations connected to each other is chained in the same control step. Modules, such as Blocks, Branches, and Loops, schedule their internal components first; their Latency is the accumulation of the Latencies of their internal components. Thus, there exists a scheduling function for each Component and for each class of Module (Block, Branch, Loop, LoopBody).

Considering the information given in the previous paragraphs, Xronos applies ASAP scheduling on a Block as defined in [20] or [24]. It should be noted that the LIM scheduler properly handles instructions with multi-cycle latencies, such as pipelined multipliers, memory accesses, and loops.
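A minimal ASAP scheduler over such a component DAG (unconstrained resources, zero-latency combinatorial chaining) can be sketched in a few lines of Python; the dictionary encoding of the graph is an assumption for illustration.

    from graphlib import TopologicalSorter

    def asap(preds, latency):
        # preds maps a node to its immediate predecessors (data dependencies)
        start = {}
        for v in TopologicalSorter(preds).static_order():
            # a node starts as soon as all of its predecessors have finished
            start[v] = max((start[p] + latency[p] for p in preds[v]), default=0)
        return start

    # two chained zero-latency adders land in the same control step,
    # while a one-cycle multiplier pushes its consumer to the next step
    preds = {'a': [], 'b': [], 'add': ['a', 'b'],
             'mul': ['add'], 'sub': ['mul', 'b']}
    lat = {'a': 0, 'b': 0, 'add': 0, 'mul': 1, 'sub': 0}
    print(asap(preds, lat))   # {'a': 0, 'b': 0, 'add': 0, 'mul': 0, 'sub': 1}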


4.8 Mapping of Dataflow and Procedural IR to LIM

The mapping of the Dataflow and Procedural IR to LIM is the "heart" of Xronos. The Xronos mapping process is implemented as a set of visitors that create, for each Network, Actor, state Var, Action, Procedure, Operation, Instruction, Expression, BlockBasic, BlockIf, and BlockWhile, an equivalent LIM object. For each Actor a Design object is generated, which represents the hardware implementation of the Actor.

4.8.1 Network construction and Actor to Design

The source code of a Network of the Dataflow IR is generated using an Xtend VHDL template, as explained in Section 4.8.9. For each Actor in the Network a LIM Design is created, and for each state variable Var a memory allocation is generated in the Design. After that, all Actions are visited, and a LIM slave Task is created and added to the Design. Finally, the Action Selection is mapped into a LIM master Task. Once the Design is complete, scheduling, allocation, and binding are performed, and a Verilog RTL file for each Actor is created. Algorithm 6 represents the mapping from an Actor to a Design.

Algorithm 6: Actor to Design
Input: Actor: a Dataflow IR Actor
Result: Design: a LIM Design

    design := create an empty design;
    // Map the actor's state variables to logical memories
    VarToLogicalMemory(state variables of Actor, design);
    // Create a slave Task from each of the actor's actions
    for a_i ∈ Actions of Actor do
        task := ActionToTask(a_i.body);
        add task to design;
    // Map the Action Selection to the master Task
    master_task := ActorToTask(Action Selection Procedure of Actor);
    set attribute master on master_task;
    add master_task to design;
    return design;

4.8.2 State variable to Memory Allocation

For each state variable (Var) in an actor, a memory allocation (LogicalMemory) is created in a Design. Tasks are not allowed to have list memories; thus, for every Action that contains list local variables, a visitor will visit each action procedure body and will: 1) create a state variable with the name of the local variable and the name of the action, 2) add this state variable to the actor, 3) visit all InstLoad and InstStore instructions and replace their source and target variable, respectively, with the variable previously created, and 4) delete the local variable of the Procedure.

All state variables that are not yet initialized will be initialized by a visitor with 0 or with false, depending on their type.


Algorithm 7: VarToLogicalMemory
Input: SV: the set of state variables
Input: Design: a design
Input: collocate: a boolean that signifies whether memories should be collocated
Data: M: a map with strides as keys and LogicalMemories as values

    for v_i ∈ SV do
        lv_i := create LogicalValue(v_i);
        stride := lv_i.getStride();
        m_i := M.get(stride);
        if m_i = ∅ then
            m_i := create LogicalMemory;
            m_i.name := v_i.name;
            design.add(m_i);
            if collocate then
                M.put(stride, m_i);
        m_i.allocate(lv_i);

Algorithm 7 starts by creating a LogicalValue for v_i, taking as stride the number of bits of the Type of v_i and as value its initial value. Then, it tests whether a memory with the same stride exists; if not, it creates one and adds it to the design. If the designer has chosen that memories should be collocated for memory consistency, then Xronos puts this LogicalMemory into the memories map.

4.8.3 Action to Task

Each Dataflow IR Action is mapped to a LIM Task and then added to the Design of the actor. The algorithm ActorToTask, which is not shown here due to its simplicity, takes as input a Procedure and returns as output a Task. It visits all Blocks of the Action body Procedure and creates a LIM Block. Finally, it adds the Block to the Task.

4.8.4 Operation to Node

A Procedural IR operation included in an Expression is represented as a single node v_op_i that is not a subgraph. A v_op_i has a set of input ports P_in(v_op_i) and, as output, a set with the single entry P_out(v_op_i). In addition, it has a member v_op_i.op that defines the kind of operator, and a member v_op_i.component that signifies the LIM component associated with v_op_i.

4.8.5 Expression to CDFG

An Expression is mapped to a Block, because an Expression may contain another expression. Thus, an expression in the CDFG is represented as a node v_e_i that contains another CDFG. A v_e_i has a set of input ports P_in(v_e_i) and, as output, a set with the single entry P_out(v_e_i). Algorithm 8 represents the process from a Procedural IR Expression to a CDFG. For each Expression, a CDFG graph is constructed. The procedure PROPAGATE_NODE_INPUTS takes all inputs of a v_e_0, creates a Port on the parent graph, and adds an edge between the Port and the v_e_0 input


Algorithm 8: Expression to CDFG
Input: E: Expression
Result: G: CDFG graph

    if E is ExprBinary then
        E0 := E.getExpressionE0();
        v_e0 := ExpressionToCDFG(E0);
        G.addNode(v_e0);
        E1 := E.getExpressionE1();
        v_e1 := ExpressionToCDFG(E1);
        G.addNode(v_e1);
        v_op0.op := BinaryOperation(E.OpBinary);
        G.addNode(v_op0);
        // Create data and control edges
        PROPAGATE_NODE_INPUTS(v_e0, G);
        G.addEdge("data", P_out(v_e0)(0), P_in(v_op0)(0));
        G.addEdge("data", P_out(v_e1)(0), P_in(v_op0)(1));
        G.addEdge("control", v_e0, v_op0);
        G.addEdge("control", v_e1, v_op0);
        PROPAGATE_CONTROL(v_op0, G);
        return G;
    else if E is ExprUnary then
        // Create a CDFG from the sub-expression, and add v_e0 to G
        E0 := E.getExpression();
        v_e0 := ExpressionToCDFG(E0);
        G.addNode(v_e0);
        // Create an operation node and add it to G
        v_op0.op := UnaryOperation(E.OpUnary);
        G.addNode(v_op0);
        // Create data and control edges
        PROPAGATE_NODE_INPUTS(v_e0, G);
        G.addEdge("data", P_out(v_e0)(0), P_in(v_op0)(0));
        G.addEdge("control", v_e0, v_op0);
        G.type := Block;
        PROPAGATE_CONTROL(v_op0, G);
        return G;
    else if E is ExprVar then
        v_op0.op := NoOperation(E.Var);
        P_in(v_op0) := E.Var;
        G.addNode(v_op0);
        PROPAGATE_NODE_INPUTS(v_op0, G);
        PROPAGATE_CONTROL(v_op0, G);
        return G;
    else if E is (ExprInt or ExprUint or ExprBool) then
        v_op0.op := Constant(E.Value, E.Type);
        P_in(v_op0) := ∅;
        G.addNode(v_op0);
        PROPAGATE_CONTROL(v_op0, G);
        return G;


port. The procedure PROPAGATE_CONTROL produces a control edge from a CDFG node to the CDFG. Internal data and control edges are resolved for each Expression class.

4.8.6 BlockBasic to Block

A Procedural IR BlockBasic is mapped to a LIM Block through the BlockBasicToBlock Algorithm 9. All of the BlockBasic's Instructions are mapped to CDFG nodes. The list of all instructions creates a CDFG graph that represents all data and control dependencies. Finally, the CDFG is mapped to the LIM Block by the CDFGToBlock Algorithm 10. For simplicity, only the CDFG node of the InstAssign is illustrated in Algorithm 9. The input and output Ports for each CDFG node are given by the Liveness analysis illustrated in Section 4.2.4. The procedure PROPAGATE_NODE_INPUTS(source, target) will create a CDFG node and will add a data dependency from the source to the target.

Algorithm 9: BlockBasicToBlock
Input: BlockBasic: a Procedural IR BlockBasic
Result: Block: a LIM Block
Data: G: a CDFG
Data: M_V: a map of defined Vars and CDFG Ports
Data: last_node: the last CDFG node

    for inst_i ∈ BlockBasic Instructions do
        if inst_i is InstAssign then
            E := inst_i.Value;
            v_e := ExpressionToCDFG(E);
            G.addNode(v_e);
            for p_in_j in P_in(v_e) do
                var := p_in_j get variable;
                if var ∈ LiveIn(BlockBasic) then
                    PROPAGATE_NODE_INPUTS(p_in_j, G);
                else
                    G.addEdge("data", M_V.get(var), p_in_j);
            var := inst_i.Target;
            if var ∈ LiveOut(BlockBasic) then
                PROPAGATE_NODE_OUTPUTS(P_out(v_e)(0), G);
            else
                M_V.add(var, P_out(v_e)(0));
            PROPAGATE_CONTROL(last_node, v_e);
            G.kind := "Block";
            last_node := v_e;
        else
            ...

    Block := CDFGToBlock(G);
    return Block;

4.8.7 BlockIf to Branch and BlockWhile to Loop

A visual mapping of the BlockIf and BlockWhile CDFGs is illustrated in Figure 4.10. For the BlockIf, the condition is first inserted into a BlockBasic (BB2) and extracted from the CDFG. The Then and Else Blocks CDFGs are extracted as well. Afterwards, the CDFG graph of the BlockIf is constructed as follows: an Entry node is inserted, and a control dependency is added from the Entry to the condition CDFG. If any data dependencies are required by the Then and the Else Blocks, they are inserted from the Entry to the Then and Else Blocks according to the LiveIn sets of the Blocks. In addition, an Exit node is inserted into the CDFG, and all control and data dependencies are propagated from the Then and Else Blocks to the Exit node. Finally, the BlockIf CDFG is mapped to a Branch by the CDFGToBlock Algorithm.

The BlockWhile CDFG has two Entry nodes, the Init Entry and the FeedBack (FB) Entry. The Init Entry propagates all the input data dependencies to the BlockWhile CDFG, whereas the FeedBack Entry propagates all feedback data dependencies to the Body CDFG node and to the BlockBasic CDFG node. The condition CDFG node has two control dependencies: a T control dependency, connected to the condition BlockBasic node, which signifies that the loop continues, and an F control dependency, attached to the Exit node, which signifies that the loop has completed.

4.8.8 CDFG to Block

Algorithm 10 represents the transformation from a CDFG graph to a LIM Block. First, all components of the Block are created, and then the data and control dependencies are resolved.

4.8.9 Behavioral HDL Code Generation

The Network of an RVC-CAL program is generated as a VHDL file. The entity has the same name as the Network's name, and for each input and output port additional VHDL ports are added to the entity as outlined in Section 4.7.6. The architecture of the Network declares all Designs (Actors) as components. Moreover, fanouts and queues are instantiated directly, with their fanout degree and queue size specified as parameters. Finally, all internal signals are declared in advance, and all Actors are then instantiated with correct port maps between fanouts and queues. An example of a Network dataflow architecture is depicted in Figure 4.17.

The OpenForge backend generates a Verilog Module file for each Design that represents an Actor. To clarify the code generation, the CAL source code of the Actor Acc is given in Listing 4.4. Figure 4.18 represents the Actor Acc Module, the internal signals, and the inner Modules. As described previously, an Actor Design has a master Task, which is the Action Selection, and a set of slave Tasks; the Actor Acc example contains only one, called Action. Each memory allocation (a state variable in the Procedural IR) is also a Module (StateVar acc). Each Design has two additional Modules: Global Reset and Kicker. Global Reset is used to synchronize the reset with the actor clock. The Kicker generates a pulse based on the synchronized reset signal and is used to activate the Action Selection Module.


Algorithm 10: CDFG to Block
Input: G: CDFG
Result: Block: a LIM Block
Data: C: a list of components
Data: Entry: entry of the Block
Data: Exit: exit of the component
Data: M: a map of nodes and components

 1  // Create the components
 2  for v_i ∈ G vertices do
 3      if v_i.type is Block then
 4          b_{v_i} = CDFGtoBlock(v_i);
 5          C.add(b_{v_i}); M.put(v_i, b_{v_i});
 6      else
 7          c_{v_i} = v_i.component;
 8          C.add(c_{v_i});
 9          M.put(v_i, c_{v_i});
10  // Create the Block and the components
11  Block = create a Block with C as the list of its components;
12  // Resolve data dependencies
13  for d_i ∈ G data edges do
14      Ps = d_i.source;
15      Pt = d_i.target;
16      if Ps and Pt not a Graph Input or Output Port then
17          Bus bus = Ps get Bus;
18          Port port = Pt get Port;
19          ADD_DATA_DEPENDENCY(bus, port);
20      else if Ps is a Graph Input Port then
21          P^in_Block = create a Block port;
22          Bus bus = P^in_Block get peer Bus;
23          Port port = GET_COMPONENT_PORT(Pt);
24          ADD_DATA_DEPENDENCY(bus, port);
25      else if Pt is a Graph Output Port then
26          P^out_Block = create a Block bus;
27          Bus bus = Ps get Bus;
28          Port port = P^out_Block get peer Port;
29          ADD_DATA_DEPENDENCY(bus, port);
30  // Resolve control dependencies
31  for c_i ∈ G control edges do
32      Ps = c_i.source;
33      Pt = c_i.target;
34      if Ps and Pt not a Graph Input or Output Port then
35          v_s = source node;
36          v_t = target node;
37          Bus bus = Ps get done Bus of v_s.component;
38          Port port = Pt get done Port of v_t.component;
39          ADD_CONTROL_DEPENDENCY(bus, port);
40      else if Ps is a Graph Input Port then
41          v_t = target node;
42          P^in_Block = get Block go Port;
43          Bus bus = P^in_Block get peer Bus;
44          Port port = Pt get done Port of v_t.component;
45          ADD_CONTROL_DEPENDENCY(bus, port);
46      else if Pt is a Graph Output Port then
47          v_s = source node;
48          P^out_Block = get Block done Bus;
49          Bus bus = Ps get done Bus of v_s.component;
50          Port port = P^out_Block get peer Port;
51          ADD_CONTROL_DEPENDENCY(bus, port);
52  return Block;


Figure 4.17 – Network representation of three Actors. The Actor A's output port is connected to the inputs of Actor B and Actor C. It is worth mentioning that if an output port is connected to more than one input port, a fanout is added. As depicted, each connection has its own queue.

In the case of multiple actions that read and write the same state variable, a memory referee is added to the Actor's Module. RAMs and ROMs are also Modules; OpenForge applies a binding algorithm that correctly instantiates Xilinx RAMs (LUT RAM and Dual/Single Port BRAM). If chosen by the user, Xronos can also instantiate generic memories.

Listing 4.4 – An accumulator actor

actor Acc() int IN ==> int OUT:

    int acc := 0;

    action IN:[token] ==> OUT:[acc]
    do
        acc := acc + token;
    end
end

4.9 Xronos SystemC Code Generation

SystemC is a modeling and simulation language based on standard C++ classes, expressly conceived for systems modeling and design. SystemC provides an event-driven simulation interface that enables the simulation of concurrent processes. Other relevant features of SystemC are the support for bit-accurate datatypes, timing primitives that can be specified by the user, and communication patterns among processes and modules. Such features make SystemC a more suitable starting point for high-level hardware synthesis than standard C or standard C++, owing to its more natural and expressive way of specifying concurrency. The IEEE SystemC standard was developed initially for the simulation and verification of complex Intellectual Property (IP) blocks, but it has later been used as a High-Level Synthesis (HLS) approach alternative to C and C++. The Open SystemC Initiative group is in the process of



Figure 4.18 – Internal Modules Representation of the Actor Acc of Listing 4.4.

standardizing a subset of Synthesizable SystemC (SSC) [172]. The current draft consists of the definition of a synthesizable subset of the SystemC language and provides the coding guidelines and the specification of which SystemC syntactic elements can be synthesized by HLS tools.

4.9.1 SystemC Actor Template

The SystemC code generator hierarchically generates C++ header and source files for the network and each actor. The network of actors is a header file that instantiates the queues and the actors. Each actor is a SystemC module. The SystemC module defines the actor's ports, the method declarations of the Procedural IR procedures, and the constructor, which registers the Action Selection method as a SystemC CThread process. The source file contains the state variables, which are declared as static, the implementation of each IR procedure, and the implementation of the Action Selection. Figure 4.19 represents the SystemC module declaration of the inverse quantification CAL actor in Figure 4.5.

The action selection is a process that is sensitive to the positive clock edge, and it can be reset. It is an infinite process that waits for the start signal before executing. This process acts on each input and output port and calculates all the action guard conditions (the boolean return value of each guard, expressed by the isSchedulable procedures) a priori, before selecting which action should be executed.

Actions can read automatically from FIFO ports, but in some cases a guard can depend not only on the head of the queue but also on tokens further inside it. Moreover, the SSC draft only supports primitive sc_fifo types, which do not provide peek support on queues. These two conditions require a supplementary READ state. This state reads the tokens from the input ports and stores them in a register or an array whose size is the maximum number of readable tokens over all


Listing 4.5 – Dequantizer.cal

actor Dequantizer() int(size=24) Block, uint(size=8) QT, int(size=16) SOI ==> int(size=13) Out:

    List(type: List(type: uint(size=8), size=64), size=2) quant;

    int count := 0;
    int comp;

    receive_QT: action QT:[q] repeat 130, SOI:[w, h] ==>
    guard count = 0
    do
        foreach int i in 0 .. 63 do
            quant[0][i] := q[i + 1];
            quant[1][i] := q[i + 66];
        end
        comp := 0;
        count := 6 * (w) * (h);
    end

    // Dequant and unzigzag.
    receive_block: action Block:[b] repeat 64 ==> Out:[out] repeat 64
    guard count != 0
    var int compType, int(size=24) out[64]
    do
        compType := comp >> 2;
        foreach int i in 0 .. 63 do
            out[inv_zigzag[i]] := b[i] * quant[compType][i];
        end

        count := count - 1;

        comp := (comp + 1) mod 6;
    end

    priority receive_QT > receive_block; end
end

actions that read from this particular port. A positive side of having a state for reading is that it enables reading multiple ports at the same time, because the reads are independent. To have this advantage also for writing on output ports, a WRITE state is inserted.

#include ".h" SC_MODULE(dequant){ // -- Control Ports sc_in clk; sc_in reset; sc_in start; // -- Queue Ports sc_fifo_in > Block; sc_fifo_in > QT; sc_fifo_in > Block; sc_fifo_out< sc_int<13> > Out;

// -- Action Body Declaration void r_QT(); void r_block(); void isSchedulable_r_QT(); void isSchedulable_r_block();

// -- Action Selection void action_selection(); SC_CTOR(dequant) :clk("clk") ,reset("reset") ,start("start") { // -- Actions Selection Registration SC_CTHREAD(action_selection, clk.pos()); reset_signal_is(reset, true); } };

Figure 4.19 – Header file of the SystemC inverse quantification actor.

4.9.2 SystemC Actor Composition Template

The Actor Composition SystemC Template is also an SC_MODULE. It defines the input and output ports as sc_fifo, and the constructor SC_CTOR instantiates all actors and internal queues and finally makes the interconnections between I/O ports, queues, and actors.
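As a minimal sketch, and assuming two hypothetical actors A and B generated with the port conventions of Section 4.9.1 (clk, reset and start control ports, and sc_fifo data ports named IN and OUT), such a composition module could look as follows; the internal queue depth of 512 is illustrative:

#include <systemc.h>
#include "A.h"  // hypothetical generated actors
#include "B.h"

SC_MODULE(composition) {
    // -- Control ports, shared by all actors
    sc_in<bool> clk;
    sc_in<bool> reset;
    sc_in<bool> start;
    // -- I/O queues of the composition
    sc_fifo_in< sc_int<32> >  I;
    sc_fifo_out< sc_int<32> > O;
    // -- Internal queue between A and B
    sc_fifo< sc_int<32> > fifo_0;
    // -- Actor instances
    A inst_A;
    B inst_B;

    SC_CTOR(composition)
        : fifo_0(512)        // internal queue depth (assumption)
        , inst_A("inst_A")
        , inst_B("inst_B")
    {
        // -- Bind the control signals to both actors
        inst_A.clk(clk);   inst_A.reset(reset);   inst_A.start(start);
        inst_B.clk(clk);   inst_B.reset(reset);   inst_B.start(start);
        // -- Interconnect I/O ports, the internal queue and the actors
        inst_A.IN(I);        // composition input feeds A
        inst_A.OUT(fifo_0);  // A writes to the internal queue
        inst_B.IN(fifo_0);   // B reads from the internal queue
        inst_B.OUT(O);       // B drives the composition output
    }
};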


void dequant::action_selection() {
    // -- Port indexes
    sc_uint<32> p_Block_index = 0;
    sc_uint<32> p_Block_index_read = 0;
    bool p_Block_consume = false;
    ...
    bool p_Out_produce = false;
    // -- Action guards
    bool guard_receive_QT;
    bool guard_receive_block;

    wait();
    state = s_dequant;
    old_state = s_dequant;
    // -- Wait for the start signal
    do { wait(); } while (!start.read());
    while (true) {
        // -- Calculate all guards
        guard_receive_QT = isSchedulable_receive_QT();
        guard_receive_block = isSchedulable_receive_block();

        switch (state) {
            case (s_READ):
                if (p_Block_consume && (Block.num_available() > 0)) {
                    for (int i = 0; i < p_Block_index_read; i++) {
                        p_Block[i] = Block.read();
                        p_Block_index++;
                        p_Block_consume = false;
                    }
                }
                if (p_QT_consume && (QT.num_available() > 0)) {
                    ...
                break;
            case (s_WRITE):
                if (p_Out_produce) {
                    for (int i = 0; i < p_Out_index_write; i++) {
                        Out.write(p_Out[i]);
                        p_Out_index++;
                    }
                    p_Out_produce = false;
                }
                state = old_state;
                break;
            case (s_dequant):
                if (guard_receive_QT && p_SOI_index == 2 && p_QT_index == 130) {
                    receive_QT();
                    p_SOI_index = 0;
                    p_QT_index = 0;
                    state = s_dequant;
                } else if (guard_receive_block && p_Block_index == 64) {
                    receive_block();
                    p_Block_index = 0;
                    p_Out_index_write = 64;
                    p_Out_produce = true;
                    old_state = s_dequant;
                    state = s_WRITE;
                } else {
                    if (p_SOI_index < 2) {
                        p_SOI_index_read = 2 - p_SOI_index;
                        p_SOI_consume = true;
                    }
                    if (p_Block_index < 64) {
                        ...
                    old_state = s_dequant;
                    state = s_READ;
                }
                break;
            default:
                state = s_dequant;
                break;
        }
        wait();
    }
}

Figure 4.20 – Action Selection process of the inverse quantification actor.


4.10 Xronos C++ Code Generation for Embedded Platforms

Xronos generates C++ source code for software processing elements. An Xtend class extends the Dataflow and Procedural IR and visits all Actors in the Dataflow IR in order to create a single-header C++ class file that represents the Actor model.

The structure of the C++ Actor is represented in Listing 4.6. To facilitate the understanding of how the C++ code is generated, the same CAL Dequantizer of Listing 4.5 is used here as well.

Listing 4.6 – Structure of C++ Actor

#ifndef ___H__
#define ___H__

#include
#include "actor.h"
#include "fifo.h"

class <ActorName> : public Actor {
public:
    <ActorName>() { ... }

    /* FIFO Declaration */
    Fifo<type, nbReaders>* port_<name>;
    ...

    /* Action Declaration: isSchedulable and Body */

    bool isSchedulable_<action>() { ... }

    void <action>() { ... }

    /* Action Selection */
    void action_selection(EStatus& status) { ... }

    /* Actor State Variables */
private:
    <state variables>;
    ...
};
#endif // ___H__

Actor I/O

The Actor’s Input and Output port are represented by the FIFO Class developed specifically for the Xronos C++ Code Generation. Each FIFO is defined by its type and by the number of readers: FIFO t ype,nbReader s . The FIFO is implemented as a circular buffer with < > 100 4.10. Xronos C++ Code Generation for Embedded Platforms two pointers read and write. Furthermore, the FIFO Class provides methods for retriev- ing the pointers read_address and wr i teaddress, by updating read and write pointers read_advance and wr i te_advance, available tokens count, and available space r ooms. Listing 4.7 represents the three input ports and one port of the actor Dequantizer.

Listing 4.7 – Input and Output Queues

public:
    Fifo<int, 1>* port_Block;
    Fifo<unsigned char, 1>* port_QT;
    Fifo<short, 1>* port_SOI;

    Fifo<short, 1>* port_Out;
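As an illustration of this interface, a minimal sketch of such a circular-buffer FIFO could look as follows; the method names mirror those mentioned above, while the internals are assumptions (the actual Xronos implementation may differ, e.g. in how wrap-around is handled):

#include <cstddef>

template <typename T, int nbReaders>
class Fifo {
public:
    explicit Fifo(std::size_t size = 4096) : data(new T[size]), size_(size), wr(0) {
        for (int i = 0; i < nbReaders; i++) rd[i] = 0;
    }
    ~Fifo() { delete[] data; }

    // Available tokens for a given reader.
    std::size_t count(int reader) const { return wr - rd[reader]; }

    // Available space, bounded by the slowest reader.
    std::size_t rooms() const {
        std::size_t min_rd = rd[0];
        for (int i = 1; i < nbReaders; i++)
            if (rd[i] < min_rd) min_rd = rd[i];
        return size_ - (wr - min_rd);
    }

    // Retrieve the read/write pointers (this sketch assumes a burst of n
    // tokens never crosses the wrap-around boundary).
    T* read_address(int reader, std::size_t n = 1) { return &data[rd[reader] % size_]; }
    T* write_address() { return &data[wr % size_]; }

    // Update the read/write pointers after the action body has used the data.
    void read_advance(int reader, std::size_t n) { rd[reader] += n; }
    void write_advance(std::size_t n) { wr += n; }

private:
    T* data;
    std::size_t size_;
    std::size_t wr;             // monotonically increasing write counter
    std::size_t rd[nbReaders];  // one read counter per reader
};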

State Variables

Actor state variables are C++ private members and are initialized by the class constructor.

Listing 4.8 – State Variables

...
private:
    unsigned char inv_zigzag[64];
    unsigned char quant[2][64];
    int count;
    int comp;
...

Actions

Action Body and isSchedulable procedures are translated into C++ function members. As illustrated in Listing 4.9, if an action accesses data from input and output ports, it retrieves the pointers of the FIFOs via the read_address and write_address methods. At the end of the function, it updates the FIFO indexes and its local token counters with the read_advance and write_advance methods. Contrary to the hardware code generation, the CAL repeat statement is supported by reading and writing directly through the FIFO pointers. Thus, there is no need for an individual state in the action selection for reading and writing.

Listing 4.9 – Action Body and isSchedulable Functions

private:
    bool isSchedulable_receive_block() {
        bool result;
        int local_count;
        local_count = count;
        result = (local_count != 0);
        return result;
    }

    void receive_block() {
        int* Block = port_Block->read_address(0, 64);
        short* Out = port_Out->write_address();
        int compType;
        int local_comp;
        int i;
        unsigned char tmp_inv_zigzag;
        int tmp_Block;
        unsigned char tmp_quant;
        int local_count;
        local_comp = comp;
        compType = (local_comp >> 2);
        i = 0;
        while (i <= 63) {
            tmp_inv_zigzag = jpeg_decoder_Dequantizer_inv_zigzag[i];
            tmp_Block = Block[i];
            tmp_quant = quant[compType][i];
            Out[tmp_inv_zigzag] = (tmp_Block * tmp_quant);
            i = (i + 1);
        }
        local_count = count;
        count = (local_count - 1);
        local_comp = comp;
        comp = ((local_comp + 1) % 6);
        port_Block->read_advance(0, 64);
        status_Block_ -= 64;
        port_Out->write_advance(64);
        status_Out_ -= 64;
    }

Action Selection

Compared to Xronos' hardware Action Selection procedure, the construction for the C++ code generation is done directly in the Xtend template. The action selection starts by getting the information on available tokens in the input queues and available space in the output queues. Then the firing conditions of the actor are tested inside a while statement. The condition of the while is given by the Boolean variable "res", which indicates whether an actor can still execute. For example, if no input tokens are available at a particular time, then "res" is false, and the action selection terminates its execution.

The firing condition is a combination of the available input tokens and the isSchedulable function return value. These conditions are expressed as an if condition. If the condition is true, then the actor fires the action by calling the "action body" function and by updating the status to hasExecuted.

Listing 4.10 – Action Selection

public:
    void action_selection(EStatus& status) {
        status_Block_ = port_Block->count(0);
        status_QT_ = port_QT->count(0);
        status_SOI_ = port_SOI->count(1);
        status_Out_ = port_Out->rooms();

        bool res = true;
        while (res) {
            res = false;
            if (status_QT_ >= 130 && status_SOI_ >= 2 && isSchedulable_receive_QT()) {
                receive_QT();
                res = true;
                status = hasExecuted;
            } else if (status_Block_ >= 64 && isSchedulable_receive_block()) {
                if (status_Out_ >= 64) {
                    receive_block();
                    res = true;
                    status = hasExecuted;
                }
            }
        }
    }

Actor Composition

The actor composition is a C++ header file where the FIFO queues and the actors are declared. If a processing element does not have as many cores as actors, then round-robin scheduling is applied. Hence, if a mapping configuration is given with n processing cores, then n threads, each with a partition of actors, are created (a sketch of such a multi-threaded scheduler is given after Listing 4.11). Listing 4.11 illustrates the actor composition of Figure 3.2. The EStatus is an enumeration that defines whether an actor has executed. The instantiation and the connection of actors and FIFOs are performed in the constructor. Moreover, the actor composition class is instantiated by a main function that is manually written by the developer. This is due to the differences in each platform and development board. The developer needs to make all system calls and interface interconnections before calling the run function of the actor composition class.

Listing 4.11 – C++ header of an Actor Composition of Figure 3.2

#include "fifo.h"
#include "actor.h"

#include "A.h"
#include "B.h"
#include "C.h"

class Composition {
private:
    A* inst_A;
    B* inst_B;
    C* inst_C;
    /* FIFO Instantiation */
    Fifo<int, 1>* fifo_0;
    Fifo<int, 1>* fifo_1;
    Fifo<int, 1>* fifo_2;

public:
    Composition() {
        inst_A = new A();
        inst_B = new B();
        inst_C = new C();
        fifo_0 = new Fifo<int, 1>();
        fifo_1 = new Fifo<int, 1>();
        fifo_2 = new Fifo<int, 1>();
        inst_A->port_I1 = I;
        inst_A->port_I2 = fifo_2;
        inst_A->port_O = fifo_0;
        inst_B->port_I = fifo_0;
        inst_B->port_O1 = O;
        inst_B->port_O2 = fifo_1;
        inst_C->port_I = fifo_1;
        inst_C->port_O = fifo_2;
    }
    ~Composition() {
        delete inst_A;
        ...
    }

    Fifo<int, 1>* I;
    Fifo<int, 1>* O;

    void run() {
        EStatus status = None;
        do {
            status = None;
            inst_A->action_selection(status);
            inst_B->action_selection(status);
            inst_C->action_selection(status);
        } while (status != None);
    }
};
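For the multi-core case mentioned above, a minimal sketch of the round-robin scheduling over n threads could look as follows. Actor, EStatus and None are assumed to come from the generated actor.h and are restated here to keep the sketch self-contained; termination handling is deliberately simplified:

#include <functional>
#include <thread>
#include <vector>

// Restated from the generated runtime (actor.h), see Listing 4.6.
enum EStatus { None, hasExecuted };
struct Actor {
    virtual void action_selection(EStatus& status) = 0;
    virtual ~Actor() {}
};

// Each thread applies round-robin scheduling to its own partition of actors.
void run_partition(std::vector<Actor*>& partition) {
    EStatus status;
    do {
        status = None;
        for (Actor* a : partition)
            a->action_selection(status);
        // Simplification: a real scheduler must not exit while another
        // partition can still produce tokens for this one.
    } while (status != None);
}

void run_mapping(std::vector<std::vector<Actor*>>& partitions) {
    std::vector<std::thread> threads;
    for (auto& p : partitions)               // one thread per processing core
        threads.emplace_back(run_partition, std::ref(p));
    for (auto& t : threads)
        t.join();
}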

4.11 Mapping HW-SW and Interface Synthesis

Different interfaces have been implemented and demonstrated in [9]. The HW-SW mapping results in partitioning the Network graph, with actors belonging to a processing element (i.e. FPGA or CPU). The process mainly consists of transforming the Network by inserting additional vertices that represent communications between partitions, using the appropriate media between processing elements from the architecture (see Section 5.3). Such a transformation introduces special vertices in the Network, which will encapsulate at a later stage the (de)serialization of data and the inclusion of the corresponding interfaces between partitions. This step is illustrated in Figure 4.1 (HW-SW Interface Wrappers), where (de)serialization (resp. Ser. and Des.) and interface vertices are inserted. An example of partitioning and the interface between partitions is depicted in Figure 4.21. Deserialization and serialization for CAL dataflow programs were formalized in [150].

The serialization aims to schedule, at runtime, the communication between actors that are allocated on different partitions. Whenever several edges of the dataflow graph are associated with a single medium of the architecture, the data needs to be interleaved in order to share the same underlying medium.

In the case of serialization, on the sender side, "virtual" FIFOs are used to connect the serializer to incoming actors. In contrast to conventional FIFOs, which store data and maintain the state (read/write counters), the "virtual" FIFOs keep only the state, while the data is directly stored into a single FIFO shared by the incoming actors.



Figure 4.21 – Partitioning of a Design to FPGA and ARM CPU and Interface Synthesis.

The idea behind such a procedure is to emulate the history of the FIFOs (emptiness, fullness) in order to schedule the data reasonably in the serialization FIFO. Data is scheduled by the actors themselves, without using any scheduling strategy in the serializer. In order to retrieve data on the receiver side, a header is placed at the beginning of each data block; it defines the destination FIFO and the size (in bytes) of the payload. This simple header is illustrated in Figure 4.22. On the receiver side, conventional FIFOs are used to connect the deserializer to outgoing actors. The deserializer is responsible for decoding the header and puts the payload into the appropriate destination FIFO.

Figure 4.22 – Header (Dest, Data Size) and payload of the data stored in the serialization FIFO.
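As a minimal sketch of this scheme over a hypothetical byte-oriented medium, the header fields follow Figure 4.22 but their widths are illustrative only; push_to_fifo is a hypothetical helper standing in for the write into the destination FIFO:

#include <cstdint>
#include <cstring>
#include <vector>

struct SerHeader {
    uint16_t dest;  // destination FIFO identifier
    uint16_t size;  // payload size in bytes
};

// Hypothetical helper: forward a payload to its destination FIFO.
void push_to_fifo(uint16_t dest, const uint8_t* data, uint16_t size);

// Sender side: the serializer prepends the header to each payload written
// into the single shared FIFO (here modeled as a byte vector).
void serialize(std::vector<uint8_t>& medium, uint16_t dest,
               const uint8_t* payload, uint16_t size) {
    SerHeader h{dest, size};
    const uint8_t* hp = reinterpret_cast<const uint8_t*>(&h);
    medium.insert(medium.end(), hp, hp + sizeof h);
    medium.insert(medium.end(), payload, payload + size);
}

// Receiver side: the deserializer decodes the header and puts the payload
// into the appropriate destination FIFO; returns the number of bytes consumed.
std::size_t deserialize(const uint8_t* medium, std::size_t available) {
    SerHeader h;
    if (available < sizeof h) return 0;
    std::memcpy(&h, medium, sizeof h);
    if (available < sizeof h + h.size) return 0;  // wait for the full payload
    push_to_fifo(h.dest, medium + sizeof h, h.size);
    return sizeof h + h.size;
}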

4.12 TestBench Generation and Profiling Data Extraction

Xronos creates for each actor and network a testbench that is used for functional verification and profiling. The input and output vectors of the testbench can be generated by Orcc's bit-accurate RVC-CAL simulator or are given by a golden reference. Each testbench takes an input and an output file (if any I/O has been defined for the actor or network) that represent the input and output tokens.



Figure 4.23 – Xronos TestBench and Profiling.

A driver module takes the input files and fills the input queues with their data. Then, a comparison module for each output file reads the golden reference and the output of the unit under test and compares the results. If an error occurs, the simulation stops and indicates which output port and sequence has an incorrect value. The simulation finishes once all tokens of the output ports have been compared with the golden references.

The Go/Done signals of each Task are extracted from RTL simulation. The clock-cycle difference between Done and Go indicates the latency of each Task execution. Thus, for each action firing it is possible to extract the number of clock cycles required for an execution. The Go/Done signal values for each action at every clock cycle are dumped from the VCD waveform and stored to a file. After that, Xronos retrieves the data, calculates the Done/Go difference for each action and stores them in a map. Finally, an XML file is generated that contains, for each actor, the overall firings of actions and the elapsed clock cycles. In addition, the min, mean and max execution time for each action is included in the file. An example of a profiling file is depicted in Listing 4.12.

Listing 4.12 – Example of Profiling Extraction File


...
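As an illustration of this bookkeeping, a minimal sketch that turns per-firing (Go, Done) cycle stamps into the reported statistics could look as follows; the stamps are assumed to be already parsed from the VCD dump, and the names are illustrative:

#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ActionStats {
    uint64_t firings = 0;
    uint64_t total = 0;            // sum of latencies, used for the mean
    uint64_t min = UINT64_MAX;
    uint64_t max = 0;
};

// One (go, done) clock-cycle pair per firing, keyed by action name.
using Traces = std::map<std::string, std::vector<std::pair<uint64_t, uint64_t>>>;

std::map<std::string, ActionStats> profile(const Traces& traces) {
    std::map<std::string, ActionStats> stats;
    for (const auto& entry : traces) {
        ActionStats& s = stats[entry.first];
        for (const auto& firing : entry.second) {
            uint64_t latency = firing.second - firing.first;  // Done - Go
            s.firings++;
            s.total += latency;
            if (latency < s.min) s.min = latency;
            if (latency > s.max) s.max = latency;
        }
    }
    return stats;  // mean = total / firings when writing the XML file
}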

For the Xronos C++ code generator, the same profiling information is extracted by using a profiling API such as the Performance API (PAPI) [173]. PAPI works on CPUs that contain hardware performance counters. Most recent x86 and ARM CPUs are supported by PAPI on Linux systems. A function call is profiled by adding a PAPI system call at the entry of the function, which starts the hardware counters; another PAPI system call is inserted at the exit of the function to stop the counters. As a result, the clock cycles of the function are retrieved. As discussed in Section 4.10, every Action Body is mapped into a C++ function. Thus, the clock cycles for each firing are retrieved for every executed function from the difference between the exit and entry counter values. Finally, an XML file is extracted, as previously, for profiling the generated C++ code.
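As a minimal sketch of this entry/exit instrumentation, using PAPI's real cycle counter (the actual Xronos instrumentation may use a different PAPI interface; the wrapper and accumulator names are illustrative):

#include <papi.h>

// Accumulators for one instrumented action.
static long long acc_cycles = 0;
static long long acc_firings = 0;

void receive_block();  // the generated action body (see Listing 4.9)

// Wrapper inserted around the action body: read the cycle counter at entry
// and exit, and accumulate the difference.
void receive_block_profiled() {
    long long entry = PAPI_get_real_cyc();  // counter value at function entry
    receive_block();
    long long exit_cyc = PAPI_get_real_cyc();  // counter value at function exit
    acc_cycles += exit_cyc - entry;
    acc_firings++;
}

// At program start-up, PAPI must be initialized once:
//   PAPI_library_init(PAPI_VER_CURRENT);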

4.13 Experimental Results

In this section, the experimental results are presented. First, StreamBench, a benchmark for RVC-CAL dataflow programs, is assembled and synthesized with Xronos. This provides a benchmark for high-level synthesis of dataflow programs, not only in different application domains but also with distinct degrees of algorithmic complexity. In addition, StreamBench was also built for comparing future advancements of Xronos or of alternative HLS tools for RVC-CAL. After that, Xronos is compared with another RVC-CAL HLS tool, which generates C code from an RVC-CAL program and afterwards calls Vivado HLS for the

synthesis. Moreover, an experiment on an embedded multi-core platform is conducted to demonstrate the scalability of RVC-CAL when more processing power is available. Finally, a proof-of-concept experiment of hardware and software co-design on two heterogeneous platforms is conducted.

4.13.1 StreamBench: a benchmark suite for streaming applications

The benchmark programs in StreamBench are drawn from widely-used real-world applications. Table 4.4 summarizes the brief description and the sources of the programs.

Table 4.4 – Brief description and source of the StreamBench benchmark RVC-CAL programs.

Domain      Name          Description                                      Source
Filter      FIR           A 4-tap FIR filter                               University of Oulu [174]
            IIR           A 1-tap IIR filter                               University of Oulu [174]
            LMS           An adaptive LMS filter                           University of Oulu [174]
Arithmetic  DFADD         Double precision floating-point addition         SoftFloat [175]
            DFMUL         Double precision floating-point multiplication   SoftFloat [175]
Media       ADPCM         ADPCM Encoder and Decoder                        SNU [176]
            GSM           Linear predictive coding analysis of GSM         MediaBench [177]
            JPEG          A JPEG Encoder                                   EPFL [2]
            MPEG4 SP      Serial MPEG-4 Simple Profile Decoder             Xilinx [146]
            RVC MPEG4 SP  Parallel MPEG-4 Simple Profile Decoder           MPEG

FIR: Is a Finite Impulse Response signal processing filter. The FIR implementation is 4-tap, which means that the order of the filter is 4 and it therefore has 4 coefficients. It is an 11-actor RVC-CAL design developed by the University of Oulu [174].

IIR: Is an Infinite Impulse Response signal processing filter. It is implemented as a first-order filter in RVC-CAL; it has also been developed by the University of Oulu.

LMS: Is a Least Mean Square adaptive filter used to mimic a desired filter by finding its coefficients. It is also developed by the University of Oulu.

DFADD: Implements the IEC/IEEE standard double-precision floating-point addition using 64-bit integer numbers. It is implemented without loops. The original code is authored by SoftFloat [175]. The version used for this benchmark was implemented in RVC-CAL from the C description in [178].

DFMUL: Implements IEC/IEEE standard double-precision floating point multiplication using 64-bit integer numbers. DFMUL has several common functions which are also used in DFADD. The design was rewritten into RVC-CAL from the C description in [178].

ADPCM: Is an algorithm for Adaptive Differential Pulse Code Modulation. It implements the CCITT G.722 ADPCM algorithm for voice compression. The design includes two actors, one for encoding and one for decoding. It was rewritten into RVC-CAL from the C description in [178].

GSM: This design contains a part of the Linear Predictive Coding analysis of the GSM communication protocol for mobile phones. Only the lossy sound compression of GSM is implemented. It was rewritten into RVC-CAL from the C description in [178].

JPEG: Is an RVC-CAL implementation [2] of the YUV 4:2:0 JPEG encoder standard ISO/IEC 10918. The RVC-CAL JPEG encoder is composed of 10 actors. First, the input image is transformed from a raster representation to YUV 4:2:0 macroblocks by the raster-to-macroblock actor. Second, a forward Discrete Cosine Transformation is applied by the FDCT actor, which is composed of 6 actors. Then the quantization and zigzag scan algorithms are applied in the same actor. The variable-length encoding (Huffman) is applied to each block, and finally the bitstream organizer and writer generates a 4:2:0 JPEG bitstream.

MPEG-4 SP: Is a CAL implementation of the full MPEG-4 4:2:0 Simple Profile decoder standard ISO/IEC 14496-2. The main functional blocks include a parser, a reconstruction block, a 2-D inverse discrete cosine transform (IDCT) block, and a motion compensator. All of these functional units are themselves hierarchical compositions of actors. The entire description of the decoder is composed of 31 actors. In the first place, the parser analyzes the incoming bitstream and extracts the data from it. Then it feeds the data into the rest of the decoder depending on where it is needed. It is worth mentioning that the parser is a single actor composed of 71 actions; it is therefore the most complicated actor in the entire decoder. The reconstruction block performs the decoding that exploits the correlation of pixels in neighboring blocks. The IDCT is the most resource-demanding actor, as it performs most of the computation of the decoder. Finally, the motion compensator selectively adds the blocks issued from the IDCT to blocks taken from the previous frame. Consequently, the motion compensator needs to store the entire previous frame of video data, which it needs to address with a certain degree of random access. In that case the entire frame is stored by the actor ddr. For implementation reasons, and in order to make it possible to fit into an FPGA, the ddr memory actor can only fit a CIF (384x288 pixels) image. In a real implementation, as performed in [146], this actor is replaced by a memory referee that communicates with an external memory.

RVC MPEG-4 SP: Is the RVC-standardized MPEG-4 SP decoder inspired by the previous decoder, with the main difference being that the Y, U, and V components have their own decoding units. Thus, the decoding is performed in parallel.

Table 4.5 and Table 4.6 summarize source-level characteristics such as the number of lines of RVC-CAL code, the number of actors, the number of actions and the number of Functions/Procedures. In addition, Procedural IR characteristics such as Operators and Block Statements are outlined. Furthermore, Table 4.6 contains the number of Load and Store instructions before and after the IR optimization.

Figure 4.24 illustrates the reduction in Load and Store instructions after the Single Read and Write Register optimization. Designs such as FIR, IIR, and LMS have a higher rate of Load reduction. These designs also have the greatest Store reduction, whereas DFADD has the second most significant Store reduction.


Table 4.5 – Program Characteristics - 1

                                               Operators
Name       Lines  Actors  Actions  Procedures  +/-    *    /  logic  shift  comp.
FIR          48     11      11         0          6    8    0     2      2      0
IIR          48      5       5         0          4    4    0     0      2      2
LMS          75     36      43         0         28   28    0    57      4      0
ADCPM       692      2       2        18        152   12    0     7     51     47
GSM         482      4       4        17         83    1    0    29     18     85
DFADD       550     14      32        12         55    8    0    94     28    161
DFMUL       489      7      17         8         22    8    0    54     20     66
JPEG       1557     10      94        14        351    4    0    20     40    141
MPEG4-SP   4276     31     478        34        401   15    2   343     83    419
RVC        3899     43     380        83       2768   45    3   539    266   1050

Table 4.6 – Program Characteristics - 2

               Statements           Nominal IR        Optimization
Name        if   while  calls     load     store     load     store
FIR          0     0      0         47       31         6        9
IIR          0     0      0         26       16         8        7
LMS          0     0      0        178      133        40       60
ADCPM       33    15     63        529      243       321      169
GSM         61    33    157        729      162       618      164
DFADD      165     0     37        343      327       193      127
FDMUL       75     0     31        230      180       195      142
JPEG        53    28     52        910      642       768      558
MPEG4-SP   404     6    433       2129     1217      1101      734
RVC        852    51    652       5074     2891      3798     2487

For FIR, IIR, and LMS, this reduction is due to the small number of instructions in each action. In these designs each action contains just one single operation, and only actors acting as delays have access to the registers. Thus, the majority of the reduced load and store instructions are in the Action Selection procedure. It should be noted that the RVC design, which is the biggest design in terms of resources and source code lines, has a non-negligible Load reduction of 26%.

Figure 4.25 illustrates the required hardware resources of the StreamBench designs when synthesized for a Xilinx Zynq 7045 (XC7Z045-2FFG900C). The graph is scaled from 0% to 20%, even though the MPEG-4 SP BRAM utilization is 96%, due to the memory required for the reference images used by inter prediction. RVC is the design with the second highest BRAM requirement. However, that is expected considering that each of the Y, U, and V components has its own framebuffer.


(a) Load Reductions (b) Store Reductions (y-axis: reduction in %)

Figure 4.24 – Load and Store Instruction Reduction after Single Read and Write Register Procedural IR Optimization.

Table 4.7 – Xronos HLS - Synthesis and Simulation Results.

Name       Max Freq.  Slice      Slice   BRAM  DSP  Input   Output   Cycles   Throughput
           (MHz)      Registers  LUTs               Data    Data              (Mbit/s)
FIR          265         512       683     -    -   16340    16340     43581      3032
IIR          190         200       351     -    -     128      128       516      1438
LMS          119        1751      1241     1   42   16340    16340    228760       259
DFADD         86        3789      7820     2    -      46       46       600       107
DFMUL        117        3732      4115    15   16      20       20       368       313
ADPCM         52        6304      8785     1  156     100      100      4865       121
GSM           74        4270      8096     -   43     160      160      3642        61
JPEG         137        4867      9899    14    5   38016     3357    219821        16
MPEG4 SP     162        4416      8816   527    7    6748   190080    690260       340
RVC          113       15239     32087    57   27    6748   190080    989766       165

In addition, the RVC design requires a considerable amount of LUTs (15%). This is due to the replication of the same actors for decoding the Y, U and V components in parallel. ADPCM is the design that requires the most DSP multipliers. In general terms, all designs except the two MPEG-4 SP decoders and ADPCM require less than 6% of the device. The BRAM usage of the decoders can be reduced significantly if the framebuffers are replaced by direct access to DRAM.

Table 4.7 contains the synthesis and simulation results of the different dataflow designs. As illustrated, the synthesis frequency varies for each design. ADPCM has the lowest frequency because of its insufficient design modularity: it contains only two actors, each with one single action. As the number of lines indicates, each actor contains approximately 350 lines of sequential code. Consequently, this demonstrates the Achilles heel of the LIM scheduling: considering a zero latency for each combinatorial operation, it minimizes the total latency while increasing the critical path of the action.


(bars per design: Registers %, LUT %, BRAM %, DSP %; x-axis: device utilization from 0 to 20%)

Figure 4.25 – Resource utilization on Xilinx Zynq 7045.

The GSM design, on the other hand, has exactly the same problem, but its synthesis frequency is slightly higher. As designs become more modular, the synthesis frequency increases, as can be observed for the rest of the designs. Finally, the FIR design has the highest frequency of them all. FIR is simple, and every actor acts either as a single delay, a multiplication, or an addition. Due to this functional decomposition, each actor is connected to a queue and the critical path is defined by the combinatorial latency of the operator.

Table 4.8 – Xronos C++ code generation throughput results on the Zynq 7045 ARM with a frequency of 999 MHz.

Name       Throughput (Mbit/s)
FIR           87
IIR            6.2
LMS            3.9
DFADD         38.1
DFMUL        209.3
ADPCM         17.5
GSM            0.3
JPEG           1
MPEG4 SP       9.2
RVC           31.5


The throughput for each design is reported in Mbit/s in the last column of Table 4.7. It depends on the frequency of the circuit and on the elapsed cycles taken for generating the output data. The slowest design in StreamBench is the JPEG encoder, because of the complexity of the Huffman actor. The MPEG-4 SP decoder, however, has a higher throughput than RVC due to a better IDCT implementation and to improved handling of the inter frame images. Considering a video sequence of 1280x720 at a frame rate of 30 images per second (720p), the video decoder should have a throughput of 316 Mbit/s (image width * image height * 1.5 * 8 * 30; 4:2:0 format, 8 bits for each of Y, U, and V, and 30 images per second). Consequently, the MPEG-4 SP decoder can decode 720p sequences, but only when the internal frame buffer is replaced by a DRAM. As a result, it is proved that with Xronos it is possible to generate circuits that can be used in real application domains such as video decoding, thanks to the modularity that the RVC-CAL programming language offers. Finally, due to their high frequency, the FIR and IIR filters have a very high throughput.
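For reference, the formula evaluates to $1280 \times 720 \times 1.5 \times 8 \times 30 = 331\,776\,000$ bit/s; the quoted figure of 316 Mbit/s follows when 1 Mbit is counted as $2^{20}$ bits, since $331\,776\,000 / 2^{20} \approx 316.4$.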

Table 4.8 illustrates the throughput of the StreamBench designs generated by the Xronos C++ backend. The generated code is compiled and run on a bare-metal ARM Cortex A9 core of the Zynq 7045 configured with a frequency of 999 MHz. The hardware throughput of Table 4.7 is higher for all designs. The DFMUL SW throughput is only 1.5 times slower than the hardware one, but for the other designs the throughput is slower by a factor of 2.8 up to 233. This is due to the number of actors of the StreamBench designs and to the fact that, for SW, there is a Network scheduler (as described for the Xronos C++ Actor Composition) that sequentializes the execution of actors, thus decreasing the performance. In addition, even though the ARM has a frequency of 999 MHz, memory and operation instructions take from 1 to hundreds of cycles for the calculation of a statement. Finally, the ARM core in the Zynq can be used as a co-processor for executing sequential actors, such as the entropy decoding of the RVC and MPEG-4 SP decoders, or for actors that require large memories that do not fit on an FPGA.

4.13.2 Xronos versus state-of-the-art RVC-CAL to hardware synthesis

In [3] the Action Selection procedure is compared with the Multi-to-Mono token transformation [167] on a Virtex 4 FPGA. Table 4.9 illustrates the synthesis and simulation results, which are retrieved from [3]. Synthesis results from the old Orc2HDL framework are also given. The values in the first column, "Orc2HDL M2M", are retrieved from the paper of Jerbi et al. [167]. The second column, "Xronos M2M", presents the evolution from the old Orc2HDL framework to Xronos with the M2M transformation activated. As depicted, Xronos has almost doubled the maximum synthesis frequency and achieves a speed-up of 3 compared to Orc2HDL. Moreover, the slices are reduced by 39.9% and the LUTs by 25.6%. The low frequency of Orc2HDL is due to the division operator used in the Inverse AC prediction of the Texture block. The third column, "Xronos", compares the Xronos support for multi-token, which is incorporated in the action selection procedure, with the M2M transformation. As with Xronos M2M, the results are better than the Orc2HDL M2M ones. Xronos gives a speed-up of 5.5 and uses almost 50% fewer hardware resources.


Table 4.9 – Three-way comparison of the same RVC Intra MPEG-4 SP decoder on a Virtex 4 FPGA, using the old Orc2HDL framework with the M2M source to source transformation, Xronos with the M2M and Xronos. (All results are post-place-and-route, using Xilinx XST.)

Hardware Generators   Orc2HDL M2M   Xronos M2M   Xronos
Slices                   45574         27388      19291
LUTs                     68962         51295      35315
BRAMs                       18            18         18
DSPs                        48            48         48
Max Freq. (MHz)           26.4            51         65
Max fps                   73.8           222        406
fps/Max Freq.              2.8          4.35       6.24
Slice Reduction %            -          39.9       57.7
LUT Reduction %              -          25.6       48.8
Speed Up                     -             3        5.5

In [169] the authors present an Orcc backend that generates C code for the Xilinx Vivado HLS high-level synthesis tool. Table 4.10 is retrieved from [169] and illustrates the synthesis and simulation results of the RVC MPEG-4 SP video decoder presented in the previous section. As illustrated, Xronos has, on the one hand, a higher throughput and, on the other hand, uses much fewer resources. In addition, it is depicted that Xronos uses 77% of the BRAMs on the Virtex 4. This is due to the default buffer size (512) that the authors of [169] have not modified; in general, this design requires buffers smaller than 512. Furthermore, Table 4.7 demonstrates that even if a different FPGA is used, the resource requirements will not be that high. Moreover, Xronos has a speed-up of 1.8 compared to the design synthesized by Vivado HLS. Here it should be noted that not all advantages of Vivado HLS have been exploited. Vivado HLS offers various optimizations, such as pipelining, loop unrolling, memory partitioning, and more advanced procedural optimizations than Xronos. However, they come at a cost: the developer needs to activate these optimizations by adding directives to the source code, and over-optimizing a function does not guarantee that the overall design will have a higher throughput. In the next chapter, it is demonstrated that a design space exploration combined with a high-level synthesis tool which does not contain all the optimizations that state-of-the-art C HLS tools have makes it possible to achieve great results, thanks to the model of computation of RVC-CAL.

Table 4.10 – Xronos versus Orcc C backend + Vivado HLS, synthesis and throughput results on Virtex 4 FPGA

RVC-CAL Hardware Generators   Xronos                Orcc HLS + Vivado HLS
Slices                        22823/67584 (42%)     142302/67584 (210%)
4-input LUTs                  51898/135168 (38%)    194583/135168 (143%)
BRAMs                         223/288 (77%)         150/288 (52%)
Throughput fps                232                   125
Simulation Freq.              50 MHz                50 MHz


4.13.3 Multi-core performance on an embedded platform

The goal of this experiment is to validate the software synthesis for a multi-core embedded platform. In the literature, it is possible to find different multicore implementations using CAL [179, 180]. The experiment reported here demonstrates the capabilities of the Xronos C++ backend in terms of speedup and throughput for an embedded platform. The execution platform is a Freescale P4080, using a PowerPC e500 processor with 8 cores at 1.2 GHz. Only a single executable is compiled, and four mapping files, partitioning from a single core to four cores, are given. Foreman (QCIF, 300 frames, 200 kbps), Crew (4CIF, 300 frames, 1 Mbps) and Stockholm (720p60, 604 frames, 20 Mbps) are the sequences used. The partitions are described in Figure 4.26. The blocks represented with a striped background (Entropy Decoding, Texture Y, and Texture V) are distributed over different partitions.

(a) MPEG-4 SP (1 Core) (b) 2-Core Partitioning (c) 3-Core Partitioning (d) 4-Core Partitioning

Figure 4.26 – The RVC MPEG 4 SP decoder and its partitioning from 1 to 4 cores.

The values of Table 4.11 are retrieved from [11]; they describe the framerate (in frames per second, fps) of the decoder from 1 to 4 cores and the resulting speedup. The results reveal that it is possible to achieve a significant speedup when more cores are available.

Table 4.11 – Framerate of the RVC MPEG-4 SP decoder at QCIF,SD and HD resolutions.

platform              resolution    framerate (# of cores)
                                       1      2      3      4
Freescale P4080       176x144         223    465    711    853
                      704x576          15     30     43     52
                      1280x720          5      9     13     18
Normalized Speed-Up                     1   2.08   3.18   3.86

In terms of speedup factor versus the single-core performance, the results are of the same order of magnitude as the ones presented in [180]. In terms of absolute throughput, this experiment provides a speedup of four compared to [180] for the P4080 when normalized to the same frequency.

It should be mentioned that the partitioning is performed manually and the speed-up is almost linear. Although the P4080 has eight processing elements, using additional cores has a negative impact on the speed-up. Consequently, either there is a limitation in the communication between cores, or the potential parallelism of the design is not higher than four.

4.13.4 Hardware and Software Co-Design on Heterogeneous platforms

The goal of this experimental part is to validate the portability of RVC-CAL programs to heterogeneous platforms. Therefore, in addition to the JPEG encoder described previously, a JPEG decoder was developed in RVC-CAL, creating the baseline profile JPEG codec represented in Figure 4.27. The codec is implemented on three different platforms made of FPGAs and embedded processors.

Encoder: Input Image → DCT → Quantization → ZigZag → VLC. Decoder: VLD → ZigZag⁻¹ → Quantization⁻¹ → DCT⁻¹ → Output Image.

Figure 4.27 – JPEG codec functional units and the partitioning for the platforms.

The JPEG decoder is implemented on the software part, whereas the JPEG encoder is implemented on the hardware part. As the software platform, an embedded low-power board with a Freescale P4080 PowerPC CPU was chosen. The P4080 board contains an Ethernet port and a PCI-Express connector. For the hardware part, two platforms were used: a proprietary Xilinx Spartan 3 FPGA board with an Ethernet connection, and a Xilinx University Program ML509 board that contains a Virtex-5 FPGA with an Ethernet port and a PCI-Express connector.

It should be noted that several partitioning configurations have been tested with success. This confirms that the portability objectives of the design flow are reached. In fact, a single actor can seamlessly be synthesized and mapped to general-purpose processors or FPGAs. A partition of the RVC-CAL dataflow network can be swapped from a SW to an HW implementation and vice versa, and all configurations yield functionally-equivalent implementations. For clarity, the results are given only for the meaningful partitioning separating the encoding and the decoding processes. More precisely, the partitioning of the application consists of assigning the whole encoding process to the specialized processing element, while the host performs the decoding process.

The results of the experiment are summarized in Table 4.12. Two communication media have been tested. The results obtained using the Virtex-5 FPGA and the PCI-Express interface are comparable with [181] and [145].


Table 4.12 – Framerate of the JPEG codec with a 512x512 video resolution on P4080 and two FPGA boards with 2 different interfaces.

                      Ethernet   PCIe
Spartan3                 4.3     N.A.
ML509 with Virtex5       4.6     10.2

On one hand, the encoder implemented on the Virtex-5 can encode around 4 Full HD frames per second at 80 MHz. On the other hand, the decoder on the P4080 can decode 512x512 frames at 135 fps. This result clearly indicates that either the interface bandwidth or the communication scheduling is a limiting factor for the design performance.

4.14 Conclusion

This chapter presented Xronos, a high-level synthesis tool for dynamic dataflow programs expressed in RVC-CAL. As illustrated, Xronos is a set of different tools. It uses Orcc for parsing RVC-CAL programs and OpenForge for generating Verilog HDL. In addition, Xronos extends Orcc in order to act as a compiler. As discussed in Chapter 3, Orcc works as a pretty-printer which constructs, from the RVC-CAL Abstract Syntax Tree, a Procedural intermediate representation that fits the C code generation. Xronos, however, adds the missing compiler building blocks, such as a Control Flow Graph representation for procedures and compiler dataflow analyses such as Dominance Graph, Reaching Definitions, and Live Variable analysis. In addition, Xronos transforms the Procedural IR to a Pruned Single Static Assignment form. As a result, the Procedural IR is more compact, and all control and dataflow information for each Basic Block is available thanks to the Single Static Assignment and Live Variable analyses. To represent the actor execution model, Xronos constructs the Action Selection procedure.

Moreover, it was presented how Xronos produces an abstract Control-and-Dataflow Graph for each Procedure. This is because the OpenForge Intermediate Representation represents Procedures as CDFG graphs with nodes as Components and edges as control and data dependencies. Additionally, the Language Intermediate Model (LIM), the OpenForge Intermediate Representation, and the Components, the basic building blocks that represent an operation, were discussed. As presented, the LIM is rich and represents Modules, Branches, Loops and Memory accesses in an elegant way. Moreover, the Achilles heel of OpenForge, and as a consequence of Xronos, namely the LIM operation scheduling, was discussed. The missing information on operator combinatorial latencies produces ASAP schedules that might reduce the overall maximum synthesis frequency of the design. In fact, if an RVC-CAL design is modular, the effect of the unconstrained scheduling disappears, as observed in the experimental results. Finally, it was presented how the Xronos CDFG is mapped onto the LIM one. Future work on OpenForge will consist of creating a library of FPGA families with their associated operator combinatorial latencies. Thus, new scheduling strategies under different constraints, as discussed in the Conclusion and Future Work Chapter, will reduce the current clock latencies and will enable constrained scheduling to be applied.


Furthermore, it was introduced how Xronos generates synthesizable SystemC code as an alternative to the OpenForge Verilog one. In fact, the available HLS tool for SystemC that was used, Vivado HLS from Xilinx, provided inconsistent results when Verilog or VHDL code was generated. Even though the generated Xronos SystemC code is functionally correct, and the Vivado HLS post-synthesis SystemC code generation and simulation give the right output, the generated Verilog or VHDL code, when simulated, produced faulty results, and the outputs differed between the two. Future work should, therefore, consist in using a different HLS tool such as Calypto Catapult or Cadence Cynthesizer.

Moreover, Xronos produces C++ code for embedded platforms. Even though Orcc generates ANSI C code, prior work on interface synthesis was developed in C++. As a result, Xronos generic C++ code has been tested on x86 workstation machines and also on embedded ARM and PowerPC CPUs. The generated C++ code provides the necessary user space for driving interfaces such as Ethernet and PCI-Express. Experimental results have demonstrated that the current interface implementation lacks efficiency.

Subsequently, the experimental results presented a benchmark for streaming applications called StreamBench. The behavioral synthesis was produced by Xronos, and RTL synthesis results for a Xilinx FPGA were analyzed. As a matter of fact, Xronos generates quality code for modular RVC-CAL programs and synthesizes complex applications such as full MPEG video decoders. In addition, results and a comparison with an alternative RVC-CAL to C for Vivado HLS backend were analyzed. As depicted, the results of Xronos compared to the C code synthesized by Vivado HLS are better in terms of throughput and resources. Moreover, a JPEG codec mapped to a heterogeneous platform proved that Xronos handles code generation for SW and HW, as well as the interface synthesis between the processing elements. The results demonstrate that the interface bandwidth is not fully exploited, and that the communication scheduling is limiting the design performance. Hence, future work consists of providing a better implementation of the currently supported interfaces, but also of adding support for the AXI interface. All new SoCs from Xilinx and Altera handle the communication between the programmable logic and the processing system (ARM) through AXI. Thus, it is mandatory to provide such an interface with Xronos for a broader support of heterogeneous platforms. Finally, Xronos has been used for synthesizing even more complex video decoders such as AVC and, during the writing of this thesis, for implementing an HEVC main profile video decoder description on a Xilinx Zynq 7045 SoC (dual-core ARM + FPGA).

5 Iterative Design Space Exploration for Xronos

5.1 Introduction


Figure 5.1 – Iterative Design Space Exploration on the RVC-CAL design flow by using TURNUS.

Complex software systems have many design points in terms of the selection of software components and hardware architectures for implementation. These choices create a large space of possible design solutions called the design space. The design process requires exploration of the design space in order to find valid design solutions before the actual implementation. The aim of Design Space Exploration (DSE) is to find design solutions that satisfy functional

performance constraints and/or optimize portions of the system (Figure 5.2). In addition, the heterogeneity of modern parallel architectures and the diverse requirements of target applications significantly complicate modern system designs. The development of efficient programs for this kind of platform requires design methodologies that can deal with system complexity and flexibility. This has led to the notion of system-level design, where key roles are played by aspects such as high-level modeling and simulation, and separation of concerns. In this context, the exploration of the design space becomes an essential step for implementing applications on heterogeneous and parallel platforms. This is due to the combinatorial explosion of design options when dealing with multiple concurrent processing units. In order to ensure an efficient implementation and integration process, the design has to be sufficiently modular and portable, without the need for any, or with only partial, manual rewriting.


Figure 5.2 – Representation of the design space according to two constraints.

Most of the HLS tools presented in the state-of-the-art Chapter 2 provide an estimation of clock cycles for a given function, extracted from the operator scheduling. When a parent function calls other inner functions, in the majority of cases (unless the developer chooses not to inline the inner functions) the estimated clock cycles are only given for the parent one. Thus, all information about the inner functions is lost. In Xronos, inlining is also performed for each action that calls CAL Procedures or Functions, but the hierarchy of Actions inside Actors is maintained, and the clock cycles for executing those actions are estimated by the operator scheduler if possible, or by RTL simulation.

Another performance estimation issue arises when multiple parallel C functions that communicate with each other are synthesized or, even worse, when some of them are executed on a different

processing element. How is it possible to estimate which of these functions is the critical one? On the one hand, for the hardware part, current HLS tools will first synthesize these functions, extract the hardware critical path, and either optimize or refactor the function with the lowest frequency. On the other hand, for the software part, it is possible to extract profiling information if the function is not nested, is simple (e.g. a filter), and its interface is fixed (e.g. arguments have fixed array sizes). Another important issue is the interface between hardware and software components. If the data traversing the interface are fixed and packed for optimizing the bandwidth, then the conditions are ideal, but this is not always the case. What happens when multiple connections traverse the same interface? What should the scheduling policy be? And if a schedule exists, can it be measured efficiently?

All the above questions are only partially answered by the state-of-the-art and, to the best of the author's knowledge, there is no tool that fully answers all of them. In this chapter, an iterative design space exploration tool, called TURNUS and integrated with Xronos, efficiently answers the first question. The other two questions are not yet answered due to the current interface limitations discussed in the previous chapter. The main contribution of the author to TURNUS is the performance estimation described in Section 5.5.

TURNUS efficiently gives a solution to the first question because of the dataflow MoC used in RVC-CAL. In fact, the hierarchy provided by RVC-CAL gives the possibility to extract fine-grain profiling information from actions and actors implemented as hardware components. The exploration methodology is the following. Once the behavioral description has been verified by the compiler's functional verification (Orcc Interpreter), and the target architecture and the design constraints have been defined, the compilation infrastructure generates the source code for the HW and SW parts. After that, synthesis and/or compilation is performed. The first performance estimation is either retrieved by platform simulation or by run-time profiling of the design on a real target, as explained in Section 4.12. If the implementation meets the design constraints, then the design goals have been achieved; otherwise, the design should be explored with TURNUS and refactored iteratively until the constraints are satisfied.

The Iterative Design Exploration consists of: 1) the extraction of the Execution Trace Graph, 2) the low-level profiling data extraction from the implemented design, 3) the overall performance estimation, 4) the Critical Path evaluation, 5) refactoring directions with Impact Analysis, 6) queue dimensioning, and 7) mapping by post-processing if the target is a parallel or heterogeneous platform. In the remainder of this chapter, each step is explained.

5.2 Profiling and Execution Trace Graph

The very first step of design space exploration is a functional, high-level and platform-independent profiled simulation [182, 7]. During this stage, an exhaustive analysis of the design under study is performed, leading to the definition of its fundamental structure and complexity. This initial analysis enables multidimensional design space explorations and helps to find bottlenecks and potentially unexploited parallelism. In the literature, several methods have been proposed to measure the complexity of an algorithm in terms of the execution of its building blocks.

Two main axes are typically defined: (a) the Computational Load (CL) and (b) the data transfer and storage load [183]. In this direction, the TURNUS profiler, which extends Orcc's Interpreter (see Section 3.6.3), evaluates each executed action for both (a) and (b). The "abstract" computational load is measured in terms of executed operators and control statements (i.e. comparison, logical, arithmetic and data movement instructions). Data transfers and storage loads are evaluated in terms of state variable, input/output port and queue utilization, and token production/consumption. During a profiled simulation, all the executed actions with their dependencies are stored as a data set that adequately describes the program's behavior. The profiler extracts the following dependencies: (a) State Variable Read/Write and Write/Read, (b) Finite State Machine, (c) Guard Enable and Disable, (d) Port Read/Read and Write/Write, and (e) Token or FIFO dependencies. A detailed explanation of these definitions can be found in [7, 10].

As defined in [182, 7], the ETG is a directed acyclic multi-graph $G(V, E)$. Each single firing of an action $\tau \in T$ is represented by a node $\upsilon_i \in V$. Thus, the set $V_\tau \subseteq V$ contains all the firings of the action $\tau$. Moreover, each single dependence between two fired actions is represented by a directed arc $e_n \equiv (\upsilon_i, \upsilon_j) \in E$. The latter defines an execution order $\upsilon_i \prec \upsilon_j$, meaning that the execution of $\upsilon_j$ depends on the execution of $\upsilon_i$. The previous considerations show that $V$ can be considered as a partially ordered set of executed actions. Indeed, constructing a consistent dependency set $E$ is fundamental in order to define constraints on the execution order between any couple of fired actions describing a platform-independent design behavior. The ETG needs to be built carefully to provide a solid basis that can produce quantified statements about a dataflow program execution. In fact, in the general case of a dynamic dataflow network, such as the one expressed by the RVC-CAL dataflow language, the dependency set can vary by changing the input stimulus. In other words, the explored states of a dynamic design can be dependent on the provided input. However, in the case of systems implementing several classes of signal processing applications (e.g. video or audio codecs, packet switching in communication networks), probabilistic approaches are a meaningful representation of the underlying processing model. Therefore, input sets that are sufficiently large with respect to the type of application can be used to generate statistically meaningful ETGs. Finally, the file format of the ETG is XML, and it can be reused for further analysis.
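To make the structure of the ETG concrete, the following minimal C++ sketch shows one possible in-memory representation of firings and their dependencies. The type and field names are illustrative assumptions and do not reproduce the actual TURNUS implementation.

#include <cstdint>
#include <string>
#include <vector>

// Kinds of dependencies tracked by the profiler, as enumerated above.
enum class DependencyKind {
    StateVariable,       // state variable read/write and write/read
    FiniteStateMachine,
    Guard,               // guard enable and disable
    Port,                // port read/read and write/write
    Token                // token (FIFO) dependencies
};

// A single firing of an action: a node v_i in V.
struct Firing {
    std::string actor;
    std::string action;
    uint64_t weight = 0;                 // computational load, e.g. clock cycles from Xronos
    std::vector<size_t> successors;      // indices of dependent firings (arcs in E)
    std::vector<DependencyKind> kinds;   // one dependency kind per outgoing arc
};

// The whole trace: nodes are stored in firing order, so every arc
// points "forward" and the graph is acyclic by construction.
struct ExecutionTraceGraph {
    std::vector<Firing> firings;
};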

5.3 Model of Architecture, Mapping and Constraints

The Model of Architecture (MoA) is a formal representation of the operational semantics of networks of functional blocks describing architectures [184, 155]. Depending on the modeling perspective, MoAs can be classified as abstract or executable descriptions [185]. Abstract models are used to represent performance symbolically. For example, they associate the required latency in clock cycles with each operation without executing any hardware description. On the other hand, executable specifications allow precise modeling of state-dependent behavior, such as the timing of caches and pipelines. In the context of this thesis, only abstract MoAs are analyzed, such as the ones proposed in [11, 186].

Figure 5.3 – Mapping from an application to an architecture. Constraints represent the feasible regions of the design space.

The mapping involves defining which part of the program is executed on a particular processing element, and which part of the communication structure is assigned to a particular medium. In the context of HW-SW co-design, the problem to be solved is coordinating the design of the parts of the system that need to be implemented as SW and those as HW blocks [187]. The primary requirement is to avoid HW/SW integration problems that can arise when heterogeneous platforms are used. Hence, a set of constraints should be imposed and respected. Figure 5.3 depicts this process: the application is mapped onto a target architecture if the set of constraints is fully satisfied. Constraints can be defined in terms of data types [184, 155] (e.g. an application that makes use of floating point can be mapped only to an architecture that supports this kind of operation), but also in terms of memory allocation, power consumption and, most importantly, latency.

5.4 ETG Analysis

From the Execution Trace Graph of a dataflow program, the Algorithmic Critical Path (CP) and the minimum queue size are estimated.

5.4.1 Critical Path Evaluation

Once the ETG has been post-processed and the computation weights have been calculated as described in the previous steps, the CP length $CP(x)$ is calculated using the linear-time algorithm proposed in [7] (a sketch of this computation is given after the list below). Once the CP is calculated, the following information is provided:

i A set of critical actors $T_C$, actions $A_C$ and queues $B_C$ (i.e. those that have at least one fired action or FIFO dependency along the CP)

ii The CP Participation of each action $cpp_\tau$ and actor $cpp_a$ (defined as the overall computational load contribution to the CP of each action and actor)


Figure 5.4 – Critical Path on partial execution trace graph.

iii The critical latency introduced by the scheduling or by reading/writing tokens. All this information is used during the exploration stages by the underlying optimization algorithms, such as the queue size optimization heuristic illustrated in Section 5.4.3

iv The maximum achievable design performance, from Equation 5.4, with an unbounded queue size configuration $x_\beta$ and when all actors execute in parallel, i.e. $cl^{x_\sigma}_{\upsilon_i} = 0$ and $cl^{x_\beta}_{\upsilon_i} = 0$. Moreover, the fact that all actors can run in parallel implies that no additional dependencies are added to the original ETG. The CP calculated in this way is represented as $CP_\infty$. In the following, $CP_\infty$ highlights the most serial and algorithmically complex part of the design. Due to this, the corresponding weighted ETG contains only the minimal set of dependencies, and the weights are defined only according to $x_\rho$ (i.e. the action execution time).

5.4.2 Impact Analysis

In order to highlight the actions (or actors) that reduce the overall design performance, each node of the trace is assigned a weight which corresponds to the CL required for executing the action. The CL is provided by Xronos and corresponds to the number of clock cycles needed for the action firing. The CP of the design is evaluated from this weighted ETG. A first metric provided by TURNUS is a ranked set of critical actions sorted by their CP Participation (CPP) value. The action that has the highest CPP value is said to be the most critical action [7], and is considered a target for optimization and refactoring. However, the reliability of this metric tends to decrease as the design becomes more parallel. In fact, for highly parallel designs, reducing the CL of the most critical action does not necessarily cause a substantial reduction of the CP. As illustrated in [7], in this case more than one CP might exist. Thus, in order to obtain more robust guidance for refactoring, the Impact Analysis is employed.


As explained in [7, 3, 188], the Impact Analysis exhibits the action λ which requires the least refactoring effort in order to maximally reduce the CP and, consequently, improve the overall throughput of the design.

5.4.3 Queue Size Minimization

The required memory size for a dataflow application consists of the sum of the actors' state variables, the local Lists of the actions, the program size, and the queues that interconnect actors. Empirically, the size of the state variables and the local lists in actions can be reduced by applying Procedural IR optimizations such as the one discussed in Section 4.2. In dynamic dataflow applications, however, the queues can be minimized only with respect to a finite overall set of possible input vectors (i.e. the reference video sequences that a video decoder should support). For such an input vector set of a dynamic dataflow program, it is possible to guarantee that the overall execution is deadlock-free.

Thus, minimizing the total queue size can be a critical optimization objective in order to reduce the resource costs on modern FPGAs and many-core platforms that have limited memory. In the domain of SDF and CSDF designs, which are typically implemented on memory-constrained hardware platforms, the queue minimization problem is an NP-complete scheduling problem [189]. Consequently, in a DPN application the design space exploration requires the use of heuristic algorithms when dimensioning the design queue size configuration. As a reference, in [7] the exploration process is split into two steps: (a) a close-to-minimal deadlock-free queue size configuration is evaluated; (b) successively, different queue size configurations are explored.

If $b_\beta$ is the queue size of $\beta \in B$, the objective of the queue minimization problem is to find a deadlock-free solution for:

$$\text{minimize} \sum_{\forall \beta \in B} b_\beta \quad \text{subject to} \quad b_\beta \le b_\beta^{max}, \; \forall \beta \in B \tag{5.1}$$

where $b_\beta^{max}$ defines the maximum queue size configuration imposed by the architecture. TURNUS computes a close-to-minimal configuration $x_\beta^{min}$ using the execution trace-walk based algorithm proposed in [12] (i.e. suitable for DPN designs).

5.5 ETG Post-Processor

The Execution Trace Graph Post-Processor is an event-based simulator of the ETG. The Post-Processor permits the performance estimation and the queue size dimensioning of a dataflow program.


5.5.1 An event-based trace simulator

The ETG Post-processor is based on Adevs [190]. Adevs is a simulator for models described in terms of the Discrete Event System Specification (DEVS) [191]. The key feature of models described in DEVS is that their dynamic behavior is defined by events. An event is any change that is significant within the context of the model being developed.


Figure 5.5 – Representation of Post-Processor Actor I/O and Buffer Model I/O events.

An atomic DEVS model, as defined in [191], is a tuple $M = (X, Y, S, t_a, \delta_{ext}, \delta_{int}, \lambda)$ where:

• X is the set of input events

• Y is the set of output events

• S is the set of sequential states (also called the set of partial states)

• $t_a : S \to T^\infty$ is the time advance function, which is used to determine the lifespan of a state

• $\delta_{ext} : Q \times X \to S$ is the external transition function, which defines how an input event changes a state of the system

• $\delta_{int} : S \to S$ is the internal transition function, which defines how a state of the system changes internally (i.e. when the elapsed time reaches the lifetime of the state)

• $\lambda : S \to Y^\phi$ is the output function, where $Y^\phi = Y \cup \phi$ and $\phi \notin Y$ is a silent or unobserved event. This function defines how a state of the system generates an output event (when the elapsed time reaches the lifetime of the state).

Here, $Q = \{(s, t_e) : s \in S, \; t_e \in (T \cap [0, t_a(s)])\}$ is the set of total states, $t_e$ is the elapsed time since the last event, and $T^\infty = [0, \infty]$ defines the extended time base, that is, the set of non-negative real numbers plus infinity [191].
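For illustration, the following C++ skeleton sketches how an atomic DEVS model is typically written against the Adevs interface (delta_int, delta_ext, delta_conf, output_func, ta). The class name and body are simplified assumptions, not the actual Post-Processor code.

#include <cfloat>
#include "adevs.h"

// Input/output events are (port, value) pairs.
typedef adevs::PortValue<int> IO_Type;

// A minimal atomic model: after receiving any input, it waits one
// time unit and then emits a value on port 0.
class Relay : public adevs::Atomic<IO_Type> {
public:
    Relay() : sigma(DBL_MAX) {}                   // passive until an input arrives
    double ta() { return sigma; }                 // t_a: lifespan of the current state
    void delta_int() { sigma = DBL_MAX; }         // delta_int: go passive after the output
    void delta_ext(double e, const adevs::Bag<IO_Type>& xb) {
        sigma = 1.0;                              // delta_ext: schedule an output in 1 time unit
    }
    void delta_conf(const adevs::Bag<IO_Type>& xb) {
        delta_int();
        delta_ext(0.0, xb);                       // confluent case: internal then external
    }
    void output_func(adevs::Bag<IO_Type>& yb) {
        yb.insert(IO_Type(0, 42));                // lambda: emit an event on port 0
    }
    void gc_output(adevs::Bag<IO_Type>&) {}       // no heap-allocated events to free
private:
    double sigma;
};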

The DEVS model used in the Post-Processor has three AtomicModel objects. The first one, AtomicActor, represents the Actor DEVS model; the second one, AtomicFifo, represents the FIFO DEVS model; and the third one, AtomicPartition, represents the Mapping DEVS model. To complete the DEVS model, an additional object called PortValue, which represents the value of an input or output event, is added.


AtomicActor

Each actor is modeled as an AtomicActor, which describes a DEVS atomic model. As illustrated in Figure 5.5, each actor output port $p_i^{out} \in P_a^{out}$ is modeled with the following four PortValue elements:

• OUT_DATA: used for sending tokens to the connecting queue

• ASK_SPACE: used for sending the number of tokens that should be sent to the connecting queue

• HAS_SPACE: used for receiving an acknowledgment from the queue that the requested number of tokens is available on the connecting queue

• OUT_DATA_RECEIVED: used for receiving an acknowledgment from the connecting queue when it receives and successfully stores in its internal memory all the sent tokens

Similarly, each actor input port $p_i^{in} \in P_a^{in}$ is modeled with the following two PortValue elements:

• IN_DATA: used for receiving the input tokens sent from the connecting queue

• ASK_TOKENS: used for sending the number of tokens that need to be consumed from the connecting queue

It must be noted that an input event is associated with each input port. Similarly, an output event is associated with each output port. The state transition system of this atomic model is depicted in Figure 5.6. It is composed of the following set of sequential states:

• BEGIN: the actor has been selected by the partition scheduler and selects its next firing from the ETG

• IDLE: the actor has finished the post-processing of its last firing and has not yet been selected by the partition scheduler

• WAIT_READ: the actor is waiting for the availability of input tokens from its input queues

• READ: the actor is consuming tokens from its input queues

• PROCESS: the actor is executing its algorithmic part

• READY_TO_SEND: the actor has finished the processing of its algorithmic part and is waiting until the output queues can accommodate the produced tokens

• SEND: the actor is sending the tokens to its output queues



Figure 5.6 – Atomic Actor FSM.

• FINISHED: there are no more firings of this actor that need to be post-processed

The transition conditions are the following:

• enable: If the instantiated actor is enabled by an AtomicPartition.

• hasInputs: If the current step needs to read a PortValue from an input event.

• hasOutputs: If the current step needs to write a PortValue to an output event.

• tokenAvailable: If a PortValue is available from an input event.

• spaceAvailable: If there is enough space to send a PortValue.

• hasSteps: If there are any steps in V left that have not yet been simulated.

The post-processing of the AtomicActor begins only when an external event enable is processed by the $\delta_{ext}$ function. It then retrieves the first step from the V list of the instantiated actor. If a step contains an input port dependency, it goes to the state WAIT_READ; otherwise, it goes directly to the PROCESS state. Once the state WAIT_READ is reached, the AtomicActor sends an output event $ASK\_TOKENS_{P_{in}}$ and simultaneously waits until it receives an input event $IN\_DATA_{P_{in}}$ (i.e. tokenAvailable is true). If tokenAvailable is true, it goes to state READ with the PortValue being read, and then the state changes to PROCESS. If the step has an output port dependency, the state changes to READY_TO_SEND; otherwise, it changes either to IDLE if there are still steps to be simulated, or to FINISHED as the simulation of this AtomicActor is terminated. However, if it goes to READY_TO_SEND, then the AtomicActor will first send an event to $ASK\_SPACE_{P_{out}}$ and it will then wait in the same state until it receives a $HAS\_SPACE_{P_{out}}$ (i.e. spaceAvailable is true). In the case that spaceAvailable is true, it will either change to state IDLE if there are still steps to be simulated, or to FINISHED as the simulation of this actor is terminated.

AtomicFifo

Each FIFO is modeled as an AtomicFifo, which describes a DEVS atomic model. As illustrated in Figure 5.5, each queue $b \in B$ is modeled with six PortValue elements. The following four PortValue elements model the connections with the source actor:

• IN_DATA: used to receive the tokens produced by the connecting source actor

• IN_REQUEST_SPACE: used to receive the number of tokens that the connecting source actor wants to store on the queue

• READY_TO_CONSUME: used to send the space availability acknowledgment to the connecting source actor that asked for storing the tokens

• IN_DATA_DONE: used to send an acknowledgment to the connecting source actor when all the received tokens have been processed and stored in the internal memory

Similarly, the following two PortValue elements model the connections with the target actor:

• OUT_DATA: used to send the tokens that need to be consumed by the connecting target actor

• REQUEST_TOKENS: used to receive the number of tokens that the connecting target actor wants to consume

It must be noted that an input event is associated with each input port. Similarly, an output event is associated with each output port. The state transition system of this atomic model is depicted in Figure 5.7. It can be seen how two independent transition systems are defined: one for receiving tokens (i.e. Rx) and one for transmitting tokens (i.e. Tx). The set of the Rx sequential states is:



Figure 5.7 – Atomic Fifo FSM.

• IDLE: the queue is waiting for some input tokens

• READY_TO_CONSUME: the queue is ready to receive the tokens produced by the actor

• START_RECEIVING: the queue starts receiving the tokens produced by the actor

• RECEIVING: the queue is receiving the tokens produced by the actor

• RECEIVING_DONE: the queue has just finished receiving the tokens produced by the actor

The transition conditions and variables are the following:

• enable: if the receiving part of the queue has been enabled by the AtomicPartition where it is mapped

• requestSpace: it contains the number of token places that the producing actor asked to accommodate on the queue

• hasSpace: if the queue can accommodate the tokens produced by the actor


• hasUnconsumedTokens: if there are tokens that the actor has sent to the queue but that are not yet stored in the internal memory of the queue

Similarly, the set of the Tx sequential states is:

• IDLE: the queue is waiting for a request to send tokens

• READY_TO_SEND: the queue is ready for sending tokens to the actor

• START_SENDING: the queue starts sending tokens to the actor

• SENDING: the queue is sending tokens to the actor

The transition conditions and variables are the following:

• enable: if the transmission part of the queue has been enabled by the AtomicPartition where it is mapped

• requestTokens: it contains the number of tokens that the consuming actor requested from the queue

• hasTokens: if the queue has the number of tokens required by the actor

• hasUnsendTokens: if there are still some tokens that should be sent to the actor

Mapping model

Each partition is modeled as an AtomicPartition, which extends a DEVS atomic model. For each actor and queue partition, the scheduling policy is modeled by activating the corresponding object. For each actor and queue atomic model, an ENABLE PortValue element is defined. Each actor has an ENABLE port, which is used by the partition scheduler to select the executable actor(s). Each queue has two ENABLE input ports: one for the input and one for the output. Those ports can be used asynchronously. This makes it possible to model, for example, queues that are on the boundary of two actor partitions, or queues that are used in a multi-clock domain architecture. As an example, Figure 5.8 illustrates the post-processing model of a simple network with three actors. In this case, the Producer and Filter actors are mapped onto the same partition, PartitionA, and the Consumer actor is mapped onto PartitionB. Each of those partitions has an actor and a queue scheduler. It must be noted how b1 is modeled as a synchronous queue (i.e. input and output are activated at the same time) and, contrarily, b2 is modeled as an asynchronous queue (i.e. the activation of the input and the output is decoupled).


5.5.2 Performance Estimation

For a design space configuration x, defined in Section 5.5.3, an associated design performance can be defined as a non-linear function:

$$T = f(x) \tag{5.2}$$

In order to minimize the exploration time and effort, the performance must be estimated using the design information provided by the application and architecture models (Section 5.3). In order to obtain reliable exploration results, the following condition must be satisfied:

$$||T - \hat{T}|| = ||f(x) - \hat{f}(x, p)|| \approx 0 \tag{5.3}$$

where $\hat{T}$ represents the estimated performance. Performance is evaluated using a platform model $\hat{f}(x, p)$ that can be refined by the enhanced profiling information $p$ obtained as illustrated in Section 4.12. Given the event-driven Post-Processor simulation run, the execution time required for each fired action contained in the ETG is estimated according to the mapping configuration $x$. Thus, the performance is estimated at a point $x$. To this end, a weight $w_{\upsilon_i}$ is assigned to each fired action $\upsilon_i \in V$ of the ETG. This weight is the computational load of each fired action as defined in [7]:

$$cl_{\upsilon_i}(x, p) = cl^{x_\sigma}_{\upsilon_i} + cl^{x_\rho}_{\upsilon_i} + cl^{x_\beta}_{\upsilon_i} \tag{5.4}$$

where

• $cl^{x_\sigma}_{\upsilon_i}$: represents the time overhead introduced by the scheduler $x_\sigma$ when scheduling $\upsilon_i$ (i.e. action selection);

• $cl^{x_\rho}_{\upsilon_i}$: represents the action computational time obtained with the mapping configuration $x_\rho$ (i.e. action execution);

• $cl^{x_\beta}_{\upsilon_i}$: represents the time overhead introduced by both queue reading and writing (i.e. read/write delay), which also contains the additional latency due to a full output queue.

In other words, this last term represents the time elapsed between the moment $\upsilon_i$ becomes schedulable and the one when all its output queues can receive the produced tokens.
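As a purely illustrative numeric example (the three values below are assumptions, not measured data), consider a firing whose action selection takes 3 cycles, whose execution takes 235 cycles, and which waits 12 cycles for space in a full output queue; Equation 5.4 then gives:

$$cl_{\upsilon_i}(x, p) = 3 + 235 + 12 = 250 \text{ clock cycles}$$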

Queue Size Dimensioning

With a minimal queue size configuration, the latencies introduced by the queues (i.e. $cl^{x_\beta}_{\upsilon_i}$ of Equation 5.4) are maximized. Thence, for a given mapping configuration $\langle x_\rho^*, x_\sigma^* \rangle$, the following relation arises:

$$CP_\infty \le CP(x_\rho^*, x_\sigma^*, x_\beta) \le CP(x_\rho^*, x_\sigma^*, x_\beta^{min}) \tag{5.5}$$

The design space then needs to be explored in order to find a queue configuration $x_\beta^*$ that both meets the performance requirements and respects the resource utilization constraints. The queue size dimensioning can be defined as a multi-objective minimization problem:

$$\text{minimize} \quad \begin{cases} CP(x_\rho^*, x_\sigma^*, x_\beta) \\ B = \sum_{\forall \beta \in B} b_\beta \end{cases} \quad \text{subject to} \quad B \le B^{max}, \quad b_\beta^{min} \le b_\beta \le b_\beta^{max}, \; \forall \beta \in B \tag{5.6}$$

The solution calculated in the TURNUS environment is obtained using the heuristic illustrated in [12]. The ETG is iteratively post-processed by trying different queue configurations. These are evaluated starting from $x_\beta^{min}$, and at each iteration step the most critical queue is calculated as illustrated in Section 5.2. Moreover, its size is incremented by the maximum number of blocked tokens.
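A possible shape of this iterative heuristic is sketched below in C++; the simulateTrace hook and the field names are hypothetical placeholders for the corresponding Post-Processor steps, not the actual TURNUS code.

#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct TraceRun {
    std::size_t criticalQueue;   // index of the most critical queue in this run
    uint64_t maxBlockedTokens;   // largest number of tokens blocked on it
};

// Iterative queue dimensioning: starting from the close-to-minimal
// configuration x_beta_min, re-simulate the ETG and grow the most
// critical queue by its maximum number of blocked tokens, while the
// total size stays within the memory budget B_max.
std::vector<uint64_t> dimensionQueues(
        std::vector<uint64_t> sizes, uint64_t budget,
        const std::function<TraceRun(const std::vector<uint64_t>&)>& simulateTrace) {
    for (;;) {
        TraceRun run = simulateTrace(sizes);
        uint64_t total = 0;
        for (uint64_t b : sizes) total += b;
        if (run.maxBlockedTokens == 0 ||
            total + run.maxBlockedTokens > budget)
            return sizes;  // no blocking left, or budget exhausted
        sizes[run.criticalQueue] += run.maxBlockedTokens;
    }
}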

5.5.3 Mapping

The design mapping links each CAL application component (i.e. actors, actions, ports, and queues) to the corresponding architectural component (i.e. processing elements, communication elements, and interfaces).


Figure 5.8 – Post-Processor Mapping of a heterogeneous platform.

The mapping is a file that is passed to the Compiler Infrastructure for defining which actors and queues are mapped onto a processing element defined in the architecture. In addition, further information, such as the queue and scheduling policies for each "software" processing element, can be attributed to the corresponding object in the file. The mapping configuration then defines a design space configuration point that is provided to the IDSE for another iteration of exploration.


A mapping configuration, as defined in [7], can be represented as a 3-tuple:

$$x = \langle x_\rho, x_\sigma, x_\beta \rangle \tag{5.7}$$

where $x_\rho$, $x_\sigma$ and $x_\beta$ respectively define a particular partitioning, scheduling and queue configuration. Thence, the objective of the post-processing of the ETG is to find a configuration $x^*$ that satisfies the input constraints.
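Concretely, a mapping configuration can be carried around as a small record; the following C++ sketch is an assumed representation of the 3-tuple of Equation 5.7, not the file format actually exchanged with the Compiler Infrastructure.

#include <cstdint>
#include <map>
#include <string>

enum class SchedulingPolicy { RoundRobin, NonPreemptive };  // illustrative options

// x = <x_rho, x_sigma, x_beta>
struct MappingConfiguration {
    std::map<std::string, std::string> partitioning;     // x_rho: actor -> processing element
    std::map<std::string, SchedulingPolicy> scheduling;  // x_sigma: partition -> policy
    std::map<std::string, uint64_t> queueSizes;          // x_beta: queue -> size in tokens
};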

5.6 Optimization by Design Refactoring in IDSE

The starting point of the exploration process is a simulation of the behavioral description. Once the ETG is constructed and the Critical Path and Impact Analysis are performed, a list of the most critical actions is provided, and the first action that should be modified is given by the Impact Analysis.

5.6.1 Levels of parallelism

Parallelism is a form of computation in which many calculations are executed simultaneously. Depending on the architecture, parallelism can be expressed in various forms, some of which can be applied automatically without the need for the developer's input. In the following, the most used kinds of parallelism are presented:

• Task-Level: refers to the execution of a given task in a concurrent manner, where the task is partitioned across several parallel subtasks. In order for subtasks to execute in parallel, no precedence relations are permitted. If there are dependencies between the subtasks, it is called pipeline parallelism.

• Data-Level: refers to the execution of several similar tasks in a concurrent manner, where the task is replicated several times. In this case, the input data is partitioned accordingly and sent to each of the replicated tasks.

• Loop and Instruction-Level: refers to the amount of parallelism available among instructions, which can exploit parallelism among iterations of a loop. It is a form of data parallelism that is performed inside a procedure.

5.6.2 Complexity and issues of automating refactoring optimizations

The refactoring techniques for parallelism and memory optimizations involve significant modifications of the design architecture. For RVC-CAL programs, the flexibility provided by the model makes the task of automatically generating a parallel or a memory-optimized architecture very difficult, especially when considering the number of design points that can be implemented. As a consequence, there is a high number of design styles, parameters, and controls that could be applied in a dataflow program. Some specific issues are summarized below:

• Data and Task Parallelism: One challenging part is to automatically create an actor for a partitioned/replicated action while considering all possible state variable dependencies. Moreover, this includes generating the relevant interfaces and control logic that will resolve the state variable dependencies. Finding a generic automatic technique for splitting actors for either data or task parallelism is considered a very difficult task.

• Loop and Instruction parallelism: these are well-known forms of parallelism that compilers can handle with ease in certain cases. The difficulty in applying such optimizations is to find in which part of the program they should be applied in order to minimize the CL. In addition, if this portion of code is not on the algorithmic CP, then the impact on throughput of optimizing this loop can be negligible or even null.

The problem of automatically finding a program structure can be related to the Kolmogorov complexity [192] or descriptive complexity theory. The main problem is to find a minimum description for a given string of output values, given by $C_f(x) = \min\{|p| : f(p) = x\}$, which aims to find the smallest $|p|$ in order to represent the output string $x$ using a computable function $f$. Programs with an input specification have an extension to a conditional Kolmogorov complexity with a Universal Turing Machine $U$ [156]. The conditional Kolmogorov complexity for $U$ is given by $C_U(x/y) = \min\{|p| : U(p, y) = x\}$, where the objective includes an input $y$. So if $x$ can be described as a function, then the minimum $p$ can be found. However, if $x$ has the notion of being random or incompressible, $x$ cannot be expressed as a function anymore, and $p$ can be proven to be incomputable [192]. Thus, the program structure $p$ cannot be refactored automatically, and the developer should modify it manually.

5.6.3 A refactoring strategy using impact analysis

The starting point of the co-exploration process is a simulation of the design similar to the one used throughout the development process for the functional validation. Here, however, it is used to obtain a scheduler-independent execution trace [182], which is used extensively by TURNUS during the analysis and optimization process. As discussed in Section 5.4.2, the Impact Analysis highlights the actions where the refactoring effort should be concentrated in order to reduce the overall CP length.

Figure 5.9 illustrates the iterative design space exploration methodology. At first, the developer provides the RVC-CAL program, the initial mapping, and the throughput constraints. After that, a clock-accurate simulation with large queue sizes (the queues should never be full) is performed, and the Execution Trace Graph is created. Then, the CP analysis is performed. At that stage, the CP analysis and the profiling data are handed to the Post-Processor. Thus, the first simulation of the ETG is carried out. From there on, the Impact Analysis is calculated,



Figure 5.9 – Iterative Design Space Exploration methodology.

and the most critical action λ is highlighted. Therefore, the developer needs to reduce the CL of that action λ. If it is not possible to reduce it, then the design should be refactored, and the process starts again. If action λ has been reduced, then only the Impact Analysis is relaunched, n times, until the throughput constraints are met. Finally, the Post-Processor is launched once more for retrieving the queue size dimensioning (Section 5.5.2).

5.7 Experimental Results

In this section, the analysis and implementation steps of a full RVC-CAL MPEG-4 SP decoder (ISO/IEC JTC1/SC29/WG11) are illustrated. The same RVC decoder as presented in Section 4.13.1 is used. As the first step of the optimization process, the initial RVC-CAL reference code was profiled with the TURNUS profiler. The purpose of this experiment was to optimize the decoder as much as possible during a single week.

Table 5.1 – Initial Critical Actions Ranking. E%: number of executions of the action as a percentage of the total number of steps in the profiled run, CL%: computational load as a share of the total load, CPE%: the number of executions of that action on the critical path as a share of its length, CPP%: the share of the computational load of those executions on the critical path relative to its total load.

Actor              Action      E%     CL%     CPE%    CPP%
IDCT (TEX Y)       untg_0      0.17   20.75   78.99   99.92
PARSER (P)         untg_1      4.45    1.58    7.03    0.03
SERIALIZE (P)      shift       5.43    0.96    6.75    0.01
IQ (TEX Y)         ac          0.17    5.63    0.03    0.01
SPLITTER_420 (P)   splity      0.17    3.83    0.03    0.01
IS (TEX Y)         read_write  7.71    2.74    2.20    0.01
FBUFF (MOT Y)      untg_0      0.29    4.21    0.03    0.01

At the beginning, the ETG is extracted by the profiler, and the weights for each action have been retrieved using the profiling information provided by Xronos. After that, the CP is calculated, and the initial critical actions ranking list is shown in Table 5.1. The most critical action is IDCT:untg_0: its CPP is 99.92%. However, since this is a highly parallel design, reducing only the CL of this action does not necessarily guarantee an improvement of the overall performance of the design.

According to Figure 5.10, where the results of the Impact Analysis are depicted, among all initial critical actions only IDCT:untg_0 needs to be initially optimized. TURNUS estimates that by improving the performance of this action the overall CP length can be reduced by 9.5%. It is therefore clear that the actions IDCT:untg_0, SPLITTER420:splity, MERGER420:read_y, FBUFF:untg_0, and DC_SPLIT:untg_0 should be successively refactored to increase the overall system throughput. It must be mentioned here that the action MERGER420:read_y does not appear in the initial critical actions ranking list. This is due to the fact that the design contains other parallel critical paths. From these results, the highlighted actions have been refactored by modifying the source code with the purpose of reducing their CL. The synthesis and simulation information obtained during the different refactoring stages is summarized in Table 5.2.


Figure 5.10 – TURNUS analysis results.

The Impact Analysis indicates that the IDCT actor is the first one to be modified. The original IDCT actor has a single action that reads 64 tokens from its input port and produces 64 tokens on its output, with a weight (latency) of 634 clock cycles. To reduce the action's latency, the pipeline optimization of Section 4.4 was used, reducing the latency to roughly 40% of its original value: an eight-stage pipeline was generated by the optimization, with a latency of 235 clock cycles for a block of 64 tokens. After the design refactoring, the system was first synthesized with the new IDCT actors and then simulated. The first Modifications column of Table 5.2 shows that the number of LUTs increased, which is normal due to the additional queues and actors, and that the speed-up increased as expected by the Impact Analysis.

Table 5.2 – The modification steps, by most critical actor, on the MPEG-4 SP decoder. Synthesis results for a Xilinx Virtex 4 FPGA.

Modifications         Original  IDCT   IQ     SPLITTER_420  MERGER_420  FBUFF  DC_SPLIT  IAP
Slices                27658     31062  30143  29966         26756       26024  25389     24835
LUTs                  50661     53394  51663  51329         45535       43994  42791     41779
Max Frequency (MHz)   49.9      49.9   49.9   49.9          50          50     50        85
Max fps               232       257    257    262           304         378    404       692
fps / Max Frequency   2.33      2.58   2.58   2.62          3.04        3.77   4.04      4.05
Speed Up              -         1.11   1.11   1.12          1.31        1.63   1.74      2.98

After the first modification, the ETG was extracted from the refactored design, and the Impact Analysis was re-run. The next actor to be modified is the MERGER_420. It should be noted that refactoring the actions of SPLITTER_420 or IQ, which, as indicated in Table 5.1, are the next ones with the highest CL, will not have an impact on the system throughput. To prove that modifying actions outside the scope of the Impact Analysis does not increase the system throughput, both actions were modified and, as illustrated in the fourth and fifth columns, the same throughput was maintained. As expected, the optimization of those actions did not lead to a performance improvement. For both actors, the input and output ports were not changed. The only difference was that the previous actions were consuming a set of 64 tokens and producing 64 tokens, and thus had a latency of 64 clock cycles for reading and 64 for processing and producing the tokens. As a result, the total latency was 128 clock cycles. The actors were modified so that they consume and produce a token at each firing, with a latency of one clock cycle.

As indicated by the Impact Analysis, the MERGER_420 was the correct actor to modify. This actor serially reads four luminance blocks (256 pixels) and restructures the pixels in raster form. The read input latency for this actor is 256 clock cycles; after that, the pixels are processed with nested loops for 256 clock cycles, and 256 more cycles are needed for producing the final raster form of the macroblock. This leads to a total latency of 768 clock cycles. The actor was optimized by adding more actions with the purpose of consuming and producing a token as soon as possible. Six actions were created: 1) an action that reads and directly produces a token, 2) an action that stores to a list of 56 elements, 3) an action that produces a token from the list of 56 elements, 4) an action that produces a token from the list of 56 elements and stores the next block of 56 elements, 5) an action with a guard counter on 8 elements, and 6) an action that stops a counter on 56 elements. To summarize, at the beginning 8 elements are read and sent directly, then 56 elements are stored; after that, 8 lines are again read and sent directly, and finally 56 elements are stored. Hence, the first 128 elements are consumed. The next action to be fired is the one that produces a token from the memory and stores a new one from the input. In this case, the total weight for producing and consuming the 256 luminance pixels is 368 clock cycles. Even though this is not the most optimal solution, the computational load of the actor is drastically reduced. As a consequence, the speed-up increased by 31%. After that, the Impact Analysis was applied once again, and the computational load of the untg_0 action of the FBUFF actor was reduced by rearranging the structure of the actor. As a matter of fact, that action was completely removed, and parts of it were ported to other actions, providing an additional speed-up improvement of 32%.

Once again, the structure of the decoder was changed. The ETG was regenerated once more, and new profiling data on the new actions was given by Xronos. The Impact Analysis indicated that DC_SPLIT:untg_1 should be modified. As before, the latency was high due to the repeat CAL statement. The actor structure was modified, and the speed-up increased by an additional 11%. Finally, a last iteration of the IDSE was conducted, and the Impact Analysis indicated that the CL of the action copy of the actor IAP needed to be reduced. This action contains a division operator that limited the frequency of the decoder (at 50 MHz, the hardware critical path indicated by the XST synthesis tool was in the copy Task Module of the actor DC_SPLIT). Thus, the division operation was replaced by a set of actions in which a finite state machine replaced the While statement of the division (see Section 4.3.7). The decoder therefore reached a higher synthesis frequency of 85 MHz, which led to a final speed-up of 2.98.

Finally, the buffer sizes for the decoder were given by the methodology introduced in [12]. The decoder could be improved even further if a specific constraint on the output were given, for example a constraint on the throughput of decoding pictures at 720p30, as the MPEG-4 SP decoder handles in Section 4.13.1. This experiment was carried out in only a week, and a speed-up of 3 was achieved.

5.8 Conclusion

In this chapter, an iterative design space exploration methodology based on TURNUS and Xronos was presented. It was illustrated how TURNUS fits in the RVC-CAL design flow and which input information is needed by the tool. Most C HLS tools, presented in the state-of-the-art Chapter 2, provide an estimation of clock cycles for a given function. However, if this function calls other inner functions, the clock cycle estimation is given only for the parent one. Another performance estimation problem arises when multiple parallel C functions that communicate with each other are synthesized. How is it possible to estimate which of those C functions is the critical one? Current HLS tools will first synthesize these functions and will then extract the hardware's critical path. They will either optimize or refactor the function with the slowest frequency. In the Dataflow MoC, however, each actor executes a sequence of discrete computational steps called firings. Xronos can extract the dynamic profiling information for each firing in terms of clock cycles, during RTL simulation for hardware or during software execution, as described in Section 4.12.

It has been proven that TURNUS efficiently answers the question raised about estimating the critical functions in parallel programs. To do so, TURNUS first extracts the Execution Trace Graph by simulating the RVC-CAL program with an input stimulus. Then, each step in the Execution Trace Graph is weighted by the profiling information provided by Xronos. Furthermore, a post-processing ETG scheduler provides an event-driven simulation of the Execution Trace Graph. This event-driven simulator permits the developer to accelerate the refactoring of a dataflow program when only the computational load of an action (i.e. the number of clock cycles needed for its task) has been changed. It also allows estimating the overall performance of the design and properly dimensioning the queues with a size that does not affect the overall throughput. Based on the Critical Path analysis of the Execution Trace Graph, a refactoring strategy called Impact Analysis is used for reducing the refactoring iterations by indicating to the developer which action should be modified for achieving the design goals. Experimental results have demonstrated the usefulness of the iterative design space exploration by improving the throughput of an MPEG-4 SP video decoder threefold in only a single week.

6 Power Optimization

6.1 Introduction


Figure 6.1 – Clock-Gating Strategy applied in the Design Flow.

Power dissipation is currently the major limitation of silicon computing devices. Reduced power consumption implies a lower need for cooling, greater longevity, longer battery life in the case of mobile devices and, of course, lower power costs. For these reasons, power also frequently affects the choice of the computing platform right at the outset. For example, Field-Programmable Gate Arrays (FPGAs) incur a higher power consumption per logic unit compared to an equivalent Application-Specific Integrated Circuit (ASIC), but often compare favorably to a conventional processor used for the same task.

For any silicon device, power dissipation can be broken down into two components: static and dynamic. Static power dissipation, also referred to as quiescent or standby power consumption, is the result of the leakage current of the transistors and is also affected by the ambient temperature. By contrast, dynamic power dissipation is caused by transistors being switched and charges being moved along wires. Dynamic power dissipation increases roughly linearly with circuit frequency, since the most significant source of dynamic power consumption in CMOS circuits is the charging and discharging of transistor capacitances. To counteract this, ASIC designers have been using clock gating (CG) techniques for the last twenty years [193, 194, 195].
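This relationship is commonly summarized by the standard first-order CMOS dynamic power model (recalled here as background; the symbols are the usual ones and are not taken from the thesis):

$$P_{dyn} \approx \alpha \, C \, V_{dd}^{2} \, f$$

where $\alpha$ is the switching activity, $C$ the switched capacitance, $V_{dd}$ the supply voltage, and $f$ the clock frequency; gating the clock of an idle block drives its $\alpha$ toward zero.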

Figure 6.2 – Power Reduction Strategies: (a) Multi Clock Domain; (b) Clock Gating.

Modern FPGAs have clock enable pins on the flip-flops in logic blocks. However, the use of such pins does not reduce the switching activity on the clock signal, and does not yield power savings on the clock network [196]. FPGAs can provide gating at the top level of the clock tree [197]. However, top-level gating shuts down the entire clock network and, therefore, cannot be applied when some clock loads are used in gated form and others are not, or when there exist multiple enabled domains for a single clock signal. In addition, clock gating can be applied to large IP blocks such as DSP blocks and multipliers.

Different strategies for optimizing power consumption on ASICs and FPGAs were discussed in the state-of-the-art in Section 2.4.3. All of them describe the impact of a chosen technology or a given architecture, but they do not describe how to reduce power at the level of the design abstraction. Adding power controllers to the behavioral description might reduce the portability of the code if the platform is changed during the development process. Moreover, it is difficult for HLS tools that use imperative MoCs to apply power optimizations extracted from the behavioral description. On the contrary, a dynamic Dataflow MoC such as the one used with RVC-CAL has two interesting properties that can be exploited for reducing the power consumption without modifying the behavioral description at all. Firstly, every actor is concurrent and has a blocking input read; secondly, every actor communicates through lossless FIFO queues. As a result, an actor may be stopped for a certain period if it is idle or if its output FIFO is full, without impacting the overall throughput of the design. In addition, the more dynamic a dataflow program is, compared to synchronous dataflow, the more efficient the power reduction can be. This is due to the fact that a synchronous dataflow program always consumes and produces a fixed amount of data, and the queue sizes are fixed. Thus, a synchronous dataflow design always consumes the same amount of power.

As a result, to reduce the power consumption of streaming applications, a coarse-grain clock-gating strategy was developed. This strategy consists of clock-gating an actor when its output queues become full. The idea is to select which local clock can be stopped when the circuit is in an idle state or when no output transition is taking place. The clock-gating strategy aims at significantly reducing the power consumption of streaming application designs based on asynchronously queued blocks. The approach, which is an elaboration of the ideas in [198], is based on controlling the top-level clock of the FPGA by using its clock buffers. Gating is coarse-grained because clocks are switched off for relatively large portions of the design. A similar effort targeting ASICs has been reported in [199]. Finally, the clock-gating strategy does not affect the Design Flow at all (see Figure 6.1), because it modifies just the actor composition (Network) by adding modules that control the clock of each actor.

6.2 Clock buffers on Xilinx FPGAs

In an ASIC design, the clock tree that distributes the clock to all clocked elements is individually routed for each different design. Therefore, there is the freedom of realizing clock trees with any logic circuits in the tree so that it is possible to gate only particular clocks or groups of clocks. In doing so, the clock tree is built to handle all delays related to the routing of the logic elements. FPGAs, on the other hand, have dedicated clock trees that are represented as nets and buffers that distribute the clock signals to all logic components of the chip. There is dedicated support for allowing a designer to divide a design into clock regions, to control the distribution of clock signals to these regions and their frequency, and also to shut them off completely.

For Xilinx FPGAs, the root of the clock tree is called a global buffer. All global clocks are driven by the Xilinx primitive BUFGCTRL; these global clocks are located in the middle of the die. Each BUFGCTRL drives a vertical spine in the center of the die that runs from the bottom to the top of the die, as can be seen in Figure 6.3. To use those clock buffers, the user needs to instantiate the Xilinx primitive BUFGCTRL or one of its derivatives, such as the BUFGCE. This primitive can be found on all of the latest Xilinx architectures. It represents a global clock buffer with a clock enable port and is used in the coarse-grain clock gating strategy. Finally, it allows the clock to be turned on and off dynamically.

As a result, dynamic clock gating is enabled for power savings and for the generation of decimated clocks by using the BUFGCE primitive. Furthermore, the "clock enable" signal is generated by the user's design. Moreover, the BUFGCE primitive can perform glitch- and runt-free clock gating if the clock enable is generated synchronously. In the 7 family, most of the FPGAs have 32 global



Figure 6.3 – View of an FPGA die, clocking trees and different clocking buffers found on a Xilinx 7 family.

Figure 6.4 – Xilinx BUFGCE primitive for user clock gating.

clock networks, except for the smallest Artix-7, which has 16, and the largest Virtex-7, which has 128 clock buffers.

In addition, there is also the BUFH clock buffer. These horizontal cells are the connection points between the vertical spines of the clock buffers and the horizontal clock nets that enter each clock region. They can also be driven with a clock enable primitive, the BUFHCE. These cells can gate the clock entering the clock region. There are 12 of them per clock region, but they have a strong restriction: all logic that uses such a gated clock must fit in one clock region. These clock buffers are not used by the depicted strategy due to the previously described restriction. Furthermore, the tools are not capable enough to cut and fit portions of the design into certain clock regions. However, when gating using any other resource, for example gating a clock with an LUT, the clock needs to leave the clock network, be routed using general routing to the LUT, and then be routed back to another clock buffer such as a BUFG/BUFH. As a consequence, this clock will arrive later than the other clocks on the clock tree.

In other words, this routing adds an extra amount of delay, which makes the design slow and is not recommended for FPGA design. It should be mentioned here that each flip-flop within the Xilinx FPGA has a clock enable pin. In some cases, it is possible to turn off the updating of some flip-flops when their value is not needed, which can be done automatically by the tool if the "intelligent clock gating" option is activated. More information on the intelligent clock gating can be found in [200], and on the clock buffers of the Xilinx 7 family in [197].

6.3 Coarse-Grain Clock Gating Strategy

This clock gating strategy works thanks to the Dataflow MoC. When the output buffers of an actor are full, the clock of that actor can be turned off: whenever the buffers are full, the actor is in any case idle, and thus switching off its clock will not have an impact on the throughput of the design. Even though RVC-CAL is used as the behavioral description, this clock gating strategy is more general and can be applied to any system that represents the execution of processes communicating through asynchronous FIFO buffers. The queues should be asynchronous for lossless communication when an actor is clock gated, and when a design has different input clock domains.

The strategy consists in adding a Clock Enabler circuit that activates the clock of the actors. This circuit contains a controller for each output port queue of the actor, combinatorial logic for the configuration of the output ports, and a clock buffer that enables the clock. A representation of an actor with a single output port being clock gated is illustrated in Figure 6.5. As depicted, the queues are asynchronous. Queues have two input clocks, one for consuming and one for producing tokens, and they have two output ports: AF for almost full and F for full. The actor's input clock is connected to the output of the Clock Enabler circuit. As described in Section 6.2, the clock enable input of the BUFGCE should be connected through a flip-flop to achieve glitch-free clock gating.

Note: the flip-flop will introduce a one-clock latency when the clock is switched off, but this additional clock cycle will not have an impact on actors that are on the critical path. Those actors are not clock gated, because the TURNUS dimensioning of the FIFO queues is based on the critical path analysis; thus, the overall performance is not impacted.



Figure 6.5 – Clock gating strategy for Actor A with one output port. The Clock Enabler has as inputs the Almost Full and Full signals of each queue and a clock from a clock domain; as a result, it activates or deactivates the clock of Actor A depending on the FSM state of the controller.

6.3.1 Clock enabling controller

The clock enabling controller is represented in Figure 6.6. The controller is implemented as a finite state machine (FSM) that has two inputs, F for full and AF for almost full, and an output EN for enable, with 5 states: S = {INIT, SPACE, AFULL_DISABLE, FULL, AFULL_ENABLE}. The controller starts in the INIT state and maintains the EN output at active high as long as F and AF are at active low. The active-high EN is maintained during the SPACE state, but once a queue is almost full, the state changes to AFULL_DISABLE. In this state, the EN output passes to active low. A conservative approach is taken in this state, due to the fact that the BUFGCE disables the output clock on the high-to-low transition and that the clock enable entering the BUFGCE should be synchronous with the input clock. Once the queue becomes full, the controller maintains EN at active low. When a token is consumed from the queue, the controller passes to the AFULL_ENABLE state, and it activates the clock. Then, depending on whether the queue becomes full again or frees up space, it moves to the FULL or SPACE state, respectively.
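A compact behavioral model of this controller is sketched below in C++, for simulation and illustration only; the synthesized controller is a Verilog module generated by Xronos. The transition choices follow the description above and Figure 6.6, with conservative assumptions where the description is ambiguous.

enum class CtrlState { INIT, SPACE, AFULL_DISABLE, FULL, AFULL_ENABLE };

// One synchronous step of the clock enabling controller.
// Inputs: f (queue full), af (queue almost full). Output: en (clock enable).
CtrlState controllerStep(CtrlState s, bool f, bool af, bool& en) {
    switch (s) {
    case CtrlState::INIT:
    case CtrlState::SPACE:
        en = true;                                // clock running while there is space
        if (af || f) return CtrlState::AFULL_DISABLE;
        return CtrlState::SPACE;
    case CtrlState::AFULL_DISABLE:
        en = false;                               // conservatively stop before the queue fills
        if (f) return CtrlState::FULL;
        if (!af) return CtrlState::AFULL_ENABLE;  // a token was consumed
        return CtrlState::AFULL_DISABLE;
    case CtrlState::FULL:
        en = false;
        if (!f) return CtrlState::AFULL_ENABLE;   // a token was consumed
        return CtrlState::FULL;
    case CtrlState::AFULL_ENABLE:
        en = true;                                // re-enable the clock
        if (f) return CtrlState::FULL;
        if (!af) return CtrlState::SPACE;
        return CtrlState::AFULL_ENABLE;
    }
    return CtrlState::INIT;
}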

6.3.2 Clock Enabler Circuit

The user can choose a mapping configuration that indicates which actors should be clock gated. To do so, an attribute is given to each actor. If an actor has been selected to be clock gated, then the F and AF signals of all of its output FIFO queues are connected to a clock enabler controller. Outputs can be connected through a fanout or directly to a queue. In the first case, the outputs of the controllers are connected to an AND logic gate. This is a safe approach because, as said previously, if one of the queues in the fanout is full, then the fanout should command the actor not to produce any token. In the second case, if an actor's output is connected directly to a queue without a fanout, the controller outputs should be connected to an OR logic gate. This is due to the possibility that the next actor may need to consume a certain

number of tokens to output a token. As a consequence, it may lead the system to a deadlock due to the unavailability of data. In the third case, if there is a combination of outputs with and without a fanout, then an n-input OR logic gate is inserted. Figure 6.7 depicts the previous configurations.

Figure 6.6 – State machine of the clock enabling controller. The controller has two inputs, F for full, AF for almost full, and one output en as the enable signal.

Note: AF is active high when there is only one space left in the FIFO queue.

[Circuit schematics of the Clock Enabler: (a) Single Output Port with a Fanout; (b) Two Different Output Ports; (c) A Single Output Port with a Fanout and Another Output Port]

Figure 6.7 – Clock Enabler Circuit in three different configurations.

These different cases are recognized automatically by Xronos, and the corresponding Verilog modules are generated at the same time as the actors' Verilog code. Finally, a flip-flop is connected between the final OR or AND gate and the BUFGCE to obtain glitch- and runt-free clock enabling. The output of the Clock Enabler circuit is a new clock that drives the actor, its fanouts, and the write clock of its output queues (CLK W). The Verilog template of the Clock Enabler circuit is given in Algorithm 11.


Algorithm 11: Clock Enabler circuit module creation
Input: actor, enable, clk_in, reset
Input: almost_full, for all p in P_out
Input: full, for all p in P_out
Output: clk_out

    module clock_enabler
        for p in P_out do
            wire ["sizeof(p.fanout)":0] "nameof(p)"_enable;
        reg clock_enable;
        wire buf_enable;
        for p in P_out do
            for idx in sizeof(p.fanout) do
                controller c_"nameof(p)"_"idx" (
                    .almost_full("nameof(p)"_almost_full["idx"]),
                    .full("nameof(p)"_full["idx"]),
                    .enable("nameof(p)"_enable["idx"]),
                    .clk(clk_in),
                    .reset(reset));
        always @(posedge clk_in) begin
            clock_enable <= for p in P_out SEPARATOR "|" do
                if sizeof(p.fanout) > 1 then
                    for idx in sizeof(p.fanout) SEPARATOR "&" do
                        "nameof(p)"_enable["idx"]
                else
                    "nameof(p)"_enable
        end
        assign buf_enable = enable ? clock_enable : 1;
        BUFGCE clock_enabling (.I(clk_in), .CE(buf_enable), .O(clk_out));
    endmodule

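As an illustration of the template, the following sketch shows what Algorithm 11 might emit for a single output port named out with a fanout of two; the port names follow the template's placeholders, but the exact code generated by Xronos may differ.

    // Hypothetical instantiation of the template: one output port "out",
    // fanout of two (the code actually emitted by Xronos may differ).
    module clock_enabler (
        input  wire       clk_in,
        input  wire       reset,
        input  wire       enable,            // global clock gating on/off switch
        input  wire [1:0] out_almost_full,   // AF of the two fanout queues
        input  wire [1:0] out_full,          // F of the two fanout queues
        output wire       clk_out
    );
      wire [1:0] out_enable;
      reg        clock_enable;
      wire       buf_enable;

      // One controller per queue of the fanout.
      controller c_out_0 (.almost_full(out_almost_full[0]), .full(out_full[0]),
                          .enable(out_enable[0]), .clk(clk_in), .reset(reset));
      controller c_out_1 (.almost_full(out_almost_full[1]), .full(out_full[1]),
                          .enable(out_enable[1]), .clk(clk_in), .reset(reset));

      // Fanout case: AND of the controller enables, registered for glitch-free gating.
      always @(posedge clk_in)
        clock_enable <= out_enable[0] & out_enable[1];

      assign buf_enable = enable ? clock_enable : 1'b1;
      BUFGCE clock_enabling (.I(clk_in), .CE(buf_enable), .O(clk_out));
    endmodule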

6.4 Experimental Study

In this section, the clock gating strategy is evaluated by applying it to three designs. In [201] the reader may find a large variety of RVC-CAL applications that can be synthesized with Xronos and used to test the proposed clock gating methodology. For all designs, the minimum buffer sizes were obtained with the minimization technique presented in Chapter 5. This information (an XML file containing all queue sizes) is given to Xronos so that it can generate the top HDL module of the designs with the estimated queue sizes. The designs used for the following experiments are: the JPEG encoder introduced in [2], the Serial MP4 MPEG-4 SP decoder used as a case study in [146], and the RVC Intra MPEG-4 SP decoder used as a case study in [3].

For the experimental results, a Virtex 7 XC7VX485T-2 FPGA (VC707 Evaluation Kit) has been used. First, the HDL code of each design is generated by Xronos and synthesized with the Vivado synthesizer. After synthesis, place and route are applied to produce the final netlist. This netlist is then simulated with ModelSim in order to extract the switching activity information (SAIF file) of the design. The Vivado power analyzer tool is used to obtain the power results; it takes as inputs the design netlist, the design constraints and the simulation activity SAIF file. It should be mentioned that all given results have a high confidence level, which means that at least 97% of the design nets are found in the SAIF file. Afterward, the activation rate of each clock-gated clock is extracted from the power analyzer as an XML file, and Xronos generates a new top HDL module for the design. Multiple XML files have been given to Xronos in order to simulate the designs with a set of video sequences of different image resolutions.

Table 6.1 reports the synthesis and power consumption results of the three designs. As can be observed, the designs are rather small: all of them use fewer than 50k slices. The JPEG encoder consumes more BRAMs than the video decoders because of its FIFO sizes. The RVC Intra is the most demanding design in terms of slices. This is due to the parallel processing of luminance and chrominance: the actors for the U and V chrominance are identical, and only slightly different for the Y luminance. All designs were given a clock period constraint of 10 ns. The Serial MP4 decoder, however, can sustain a higher clock frequency [146].

Two configurations are compared for each design: non-CG (clock gating deactivated) and CG (clock gating activated). Even though Table 6.1 does not report the synthesis results for the non-CG designs, the difference is less than 2k slices for each design, and only one clock buffer is used. In total, every design consumes from 290 mW up to 400 mW.


Table 6.1 – Synthesis and power results of three designs: a JPEG encoder, the RVC Intra MPEG-4 SP decoder and a full Serial MPEG-4 SP decoder. The dynamic power reduction is given as the clock gated design over the non clock gated one.

Logic              JPEG Encoder    Serial MP4    RVC Intra
Slices             19800           25497         42006
BRAMs              35              7             18
DSPs               5               7             18
BUFGs              11              30            31
Freq. (MHz)        100             100           100

Power (mW)         NON CG   CG    NON CG   CG    NON CG   CG
Clocks             53       46    108      97    159      123
Signals            10       10    22       14    17       15
Logic              8        7     15       10    14       12
BRAM               24       19    <0.1     <0.1  <0.1     <0.1
DSP                <0.1     <0.1  <0.1     <0.1  <0.1     <0.1
I/O                1        1     4        4     2        2
Dynamic            96       83    149      125   192      152
Static             207      207   207      207   207      207
Total              303      290   357      334   399      357
Dynamic Power
Reduction (%)          13.54%         16.1%          20.8%

The static power dissipation is the same for every design, as all simulations were given the same configuration (e.g. temperature, clock frequency). In contrast to the JPEG encoder, the BRAMs of the decoders consume a negligible amount of power. In fact, the BRAM power consumption of the JPEG encoder amounts to roughly a quarter of its total dynamic power dissipation; a refactoring of the design to decrease the number of BRAMs would therefore significantly reduce the JPEG encoder's dynamic power.

Comparing the non-CG configuration with the CG one, the largest power reduction comes from the clock power consumption. The reduction in clock power is 13.2% for the JPEG encoder, 10% for the Serial MP4 and 22% for the RVC Intra. Comparing these values with the overall dynamic power reductions of Table 6.1 reveals some differences. For the JPEG encoder and the RVC Intra decoder the differences are slight, but for the Serial MP4 an additional 6% comes from the power saved on signals and logic. This difference is due to the nature of the Serial MP4 decoder, which is composed of small actors that contain only one action executing in one clock cycle. As a result, the design is more parallel than the other two, and also more power hungry, because those actors are always activated. Overall, the CG methodology significantly reduces the power of rather different streaming applications.
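For clarity, the dynamic power reduction reported in the last row of Table 6.1 is the relative difference between the dynamic power of the two configurations. For the JPEG encoder, using the values of the table:

    \[
      \Delta P_{dyn} = \frac{P_{dyn}^{\text{non-CG}} - P_{dyn}^{\text{CG}}}{P_{dyn}^{\text{non-CG}}}
                     = \frac{96 - 83}{96} \approx 13.54\,\%
    \]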

Figure 6.8 shows the activation rate of the CG clocks for the three designs. Some clocks are always activated: 2 for the JPEG encoder, 4 for the Serial MP4 decoder and 10 for the RVC Intra. Those CG clocks should be removed from the designs, as they consume BUFGCE clock buffers and additional slices for the clock enabling controllers.


[Bar charts of the buffer-enable activation rate (%) of each CG clock: (a) JPEG Encoder; (b) Serial MP4; (c) RVC Intra MPEG 4 SP]

Figure 6.8 – Activation rate of each CG clock for each design, with all actors being clock gated. Average values retrieved from different QCIF input stimuli for all designs.

A future extension of this methodology is to find the activation rate threshold above which a CG clock should be removed. This threshold could be set to 90% or 75% for the JPEG encoder, 80% for the Serial MP4 decoder, and 75% for the RVC Intra decoder, saving 4, 17 and 9 BUFGCEs respectively (17 clock buffers represent more than 50% of the available clock buffers of a modern Xilinx FPGA).

6.5 Conclusion

This chapter presented a clock-gating strategy applied to dataflow designs, i.e. designs that consist of networks of computational kernels connected by FIFO queues. Even though the technique is discussed in the context of dataflow, using a dataflow application to validate and evaluate it, it can be applied to any processes that communicate through queues. Hence, it might conceivably be applied more broadly.

The results obtained are encouraging: the power saving is achieved at the price of only a small amount of control logic, without losing any throughput. Unsurprisingly, clock gating becomes particularly interesting in situations where the design is not used to full capacity, in which case it is a simple, automatic, and efficient way to recover power that would otherwise be lost in "idle" cycles. As a result, this technique is especially interesting in applications with dynamically varying performance requirements, where designing to a particular performance point is not an option, and where power is nonetheless at a premium.

Another strategy for reducing power dissipation is the use of multiple clock domains. The authors of [202] presented an alternative solution for reducing dynamic power by using a design methodology that partitions dynamic RVC-CAL dataflow applications over a multi-clock domain architecture without impacting the overall throughput. Their results cannot be directly compared with the proposed clock gating strategy, because a different FPGA was used and the nominal frequency is lower than the one used in the previous results section; nevertheless, the total reduction percentage is higher with the presented coarse clock-gating strategy.

Future work in clock gating consists in also taking into account the idleness of an actor and the emptiness of its input FIFOs. The emptiness condition alone is not sufficient to clock gate an actor, because the actor might be in a firing state. Thus, combining both conditions with the output fullness condition may further reduce the overall power consumption. Another interesting direction is to combine the clock-gating strategy with multiple clock domains: clocks with a frequency lower than the nominal one might reduce the overall power consumption.

An important limitation of this strategy is the fixed clock tree of FPGAs. Compared to ASICs, the clock tree is fixed and provides a limited number of clock buffers. In addition, those clock buffers are split in half between the upper and lower parts of the FPGA, which makes it very difficult to create place and route constraints for an application that uses 16 clock buffers (most recent Xilinx FPGAs have 32 clock buffers). Future FPGAs such as the not yet available Xilinx Virtex UltraScale have more than 100 clock buffers, which will make the proposed clock gating strategy easier to apply.

7 Conclusion and Future Work

The research work described in this dissertation introduces significant contributions to the state-of-the-art of dataflow program high-level synthesis for hardware descriptions. Such contributions can be summarized as follows:

• The development of a design flow based on a compiler infrastructure called Xronos, in order to generate behavioral HDL code and C++ source code for embedded heterogeneous platforms. (Chapter 4)

• The development of the Action Selection Procedure that represents the actor execution model. This procedure not only reduces the resources but also increases the throughput of actors for hardware synthesis compared to the alternative solutions available in the state-of-the-art. In addition, by means of this procedure, advanced RVC-CAL statements such as "repeat" are now synthesizable. (Section 4.5)

• The development of a fine-grain profiling methodology at RTL level that creates a direct correspondence between clock cycle accurate measures and abstract computations at the dataflow program level. The generated RTL performs the profiling without impacting the throughput of the design (i.e. without side effects). In addition, the same methodology is applied to software code generation by using a library that retrieves the hardware counters and timers of different CPU families. (Section 4.12)

• The development of analysis and optimization strategies for the hardware synthesis of dataflow programs, based on mapping clock cycle accurate measures onto structured (high-level) dataflow computation representations. The methodology is called "Impact Analysis" and is integrated in the TURNUS design space exploration tool, available as an open source project. (Section 5.6)

• The development and implementation, in the hardware synthesis stage, of a clock-gating strategy applicable in principle to any dynamic dataflow program based design, for reducing the circuit's dynamic power dissipation. The interest of the approach is that it is completely independent of the dataflow program semantics: it can thus be applied to any dataflow design, and is also generalizable to designs not directly derived from dynamic dataflow models of computation, but simply based on asynchronous units communicating with each other by means of queues. This strategy has a very low impact on resources and does not affect the throughput of the design. (Chapter 6)

7.1 Conclusion and Summary

The motivations that fostered this research work are driven by the typical problems that a system designer faces when developing applications for heterogeneous platforms. The new dataflow design flow and associated tools were built with the purpose of supporting, as much as possible, automated tool-assisted procedures along all the design steps, of finding (close to) optimal design space points, and of clearly identifying the issues and objectives of manual designer intervention, thus reducing the design resources (time) required to achieve quality implementation results on (massively) parallel heterogeneous platforms.

A fundamental issue when designing (complex) applications on heterogeneous platforms is the choice of the behavioral description. In Chapter 2 the state-of-the-art of high-level synthesis and design flows for heterogeneous platforms was critically reviewed. Most high-level synthesis tools use C-like languages, which presents several drawbacks. By adopting such an imperative language, they encourage the designer to develop algorithms using a sequential paradigm and, in general, non-analyzable operators and constructs, with the promise that the HLS tool provides all the technology needed to automatically restructure the program and expose the potential parallelism of the sequential code. Unfortunately, such technologies are limited, and the designer is confronted with the necessity to either refactor the code, manually add code annotations (i.e. pragmas), or use additional ad-hoc constructs (threads, semaphores, etc.). In fact, concurrency must in general be derived by specifically considering the semantics of the program, and is either supported by libraries or by user-inserted annotations. Thus, the resulting code is in general not portable or scalable to other design flows or even to other embedded SW platforms. Moreover, there is absolutely no guarantee that the code transformations/optimizations do not add undesired behaviors when executing concurrently (bugs!) that were not present in the formally correct sequential code.

In Chapter 3, a solution to the difficulties mentioned above is provided by the RVC-CAL programming language. As described in Chapter 3, CAL and its dataflow MoC have the property of expressing an application as a network of processes. Besides being inherently concurrent and modular, CAL offers parallelism scalability, no shared memory between processes, communication through queues, state encapsulation, execution of a sequence of steps cadenced by a finite state machine in each process, and bitwise types. All these properties lead to the portability of CAL, which makes it a candidate for unifying system-level design for heterogeneous platforms. It was also mentioned that a subset of CAL has been standardized by the MPEG committee with the purpose of providing the reference software of current and future video decoding standards. As a matter of fact, CAL or RVC-CAL applications are portable to hardware and software processing elements, as demonstrated in Section 4.13 and thanks to the work provided with this thesis.

To support the full specification of RVC-CAL for high-level synthesis, Xronos HLS was developed. In Chapter 4, Xronos is presented and analyzed. Prior work on HLS of CAL programs offered only the synthesis of a restricted subset of the language; Xronos, on the other hand, was built from scratch to circumvent these limitations. Previous toolchains used a set of XSLT transformations for transforming the intermediate representation of an actor into one close to hardware, so that code generation took tens of minutes or sometimes even hours for complex actors. With the developments of this research work included in Xronos, the compiler operations and all the transformations are executed in Java. As a result, the time for the behavioral synthesis stage of the actors that constitute a dataflow design has been reduced by a factor ranging between two and three orders of magnitude. This makes the synthesis and validation of manual refactorings practically interactive, even for very complex designs. Thanks to the Procedural IR, procedures with arguments, generator statements and foreach statements are naturally supported by Orcc, the RVC-CAL front-end of Xronos. Despite the advances offered by Orcc's Dataflow and Procedural IRs, further transformations were developed to optimize, reduce and adapt these representations for hardware synthesis, as discussed in Chapter 4. One of the most important transformations is the Pruned Static Single Assignment form. This form makes the extraction of the Control and Data Flow Graph of a procedure possible and, thanks to its pruned form, the number of local variables and data dependencies between Basic Blocks is minimal. Thus, the number of wires and registers in behavioral synthesis is also minimized. Furthermore, it was discussed how Xronos supports bit-accurate operations by casting instructions in the Procedural IR. Xronos has extended the state-of-the-art on HLS of RVC-CAL programs by creating an Action Selection Procedure that defines the actor execution model for hardware synthesis. Xronos' Action Selection regroups all the firing conditions and optimizes the memory accesses within it. Thus, if an actor does not contain multiple guards on the same list state variable, or a guard with a modulo or division operator, then the synthesized Action Selection task guarantees that the following firing step is scheduled in the following clock cycle if the input rules are satisfied. In addition, the Action Selection permits the parallel reading and writing of the list inputs and outputs of the "repeat" CAL statement, and hence accelerates the consumption and production of tokens. Experimental results in Section 4.13.2 have demonstrated that the Action Selection not only reduces the resources but also increases the throughput compared to other methodologies.

To achieve better QoR for the generated behavioral HDL code, more advanced and sophisticated Procedural IR optimizations should be applied. Commercial HLS tools provide, to some extent, more advanced procedural optimizations than Xronos; those tools, however, are the result of hundreds or even thousands of man-years of engineering work. Even though such procedural optimizations are not yet implemented in Xronos, because they would require considerable engineering efforts beyond what was available in this PhD work, the RVC-CAL model of computation abstraction offers several features which, in some cases, make it possible to outperform the HLS results of typical C flows. This observation provides the motivation for planning future work introducing some fundamental Procedural IR optimizations, as described in the next section. For instance, a more sophisticated scheduling strategy could easily increase the RTL synthesis frequency of the design and would permit adding constraints on dataflow actions. As a result, the gap (or the advantage) in some performance metrics between commercial HLS tools and the dataflow-based Xronos would decrease (or increase) even further.

In addition, Xronos also produces C++ code for embedded platforms, and therefore unifies system-level design for heterogeneous platforms. To ensure the communication between hardware and software processing elements, Xronos provides the necessary user space drivers for interfaces such as Ethernet and PCI-Express. Experimental results have demonstrated that the current interface implementation lacks efficiency. Future work could provide a better performing implementation of the currently supported interfaces and also add support for the AXI interface. All new SoCs from Xilinx and Altera handle the communication between the programmable logic and the processing system (ARM) through AXI. Thus, it is appropriate to provide such an interface with Xronos for a broader support of heterogeneous platforms.

So far, the design flow presented in this thesis has proved that the RVC-CAL dataflow programming language is an attractive alternative for heterogeneous platforms. To complete the design flow, an iterative design space exploration was added and presented in Chapter 5, to circumvent the limitations of HLS performance estimation. Most C HLS tools, presented in the state-of-the-art Chapter 2, provide a clock cycle estimation for a given function, but if this function calls other inner functions, the estimation is given only for the parent one. Another performance estimation problem arises when multiple parallel C functions that communicate with each other are synthesized: how is it possible to estimate which of those functions is the critical one? Current HLS tools will first synthesize these functions, extract the hardware critical path, and either optimize or refactor the function with the slowest frequency. As introduced in Chapter 3, in the dataflow MoC each actor executes a sequence of discrete computational steps called firings. Xronos, for both hardware and software, can extract the dynamic profiling information of each firing in terms of clock cycles during RTL simulation or software execution, as described in Section 4.12. As presented in Chapter 5, the Execution Trace Graph is first extracted by simulating the RVC-CAL program with an input stimulus. Then, each step of the Execution Trace Graph is weighted with the profiling information provided by Xronos. A post-processing ETG scheduler then provides an event-driven simulation of the execution trace graph. This event-driven simulation permits the developer to accelerate the refactoring of a dataflow program when only the computation load of an action, i.e. the clock cycles needed for its task, has been changed. It also allows estimating the overall performance of the design and properly dimensioning the queues with sizes that do not affect the overall throughput. Moreover, based on the critical path analysis of the Execution Trace Graph, a refactoring strategy called Impact Analysis is used to reduce the number of refactoring iterations by indicating to the developer which action should be modified to achieve the design goals. Experimental results demonstrate the usefulness of the iterative design space exploration, which allowed the throughput of an MPEG-4 SP video decoder to be optimized by a factor of three in only a single week.

Finally, this thesis extends the state-of-the-art and the presented design flow even further by providing a clock-gating strategy for process networks that communicate through queues (see Chapter 6). Even though the strategy is discussed in the context of RVC-CAL, using a dataflow program to validate and evaluate it, there is nothing about it that is particular to this kind of MoC, so it might conceivably be applied more broadly. The results obtained are encouraging: the power saving is achieved at the price of only a small amount of control logic, without losing any throughput. Unsurprisingly, clock gating becomes particularly interesting in situations where the design is not used to full capacity, in which case it is a simple, automatic, and efficient way to recover power that would otherwise be lost in "idle" cycles. As a result, this technique is especially interesting in applications with dynamically varying performance requirements, where designing to particular performance points is not an option, and where power is nonetheless at a premium. An important limitation of this strategy is the fixed clock tree of FPGAs: compared to ASICs, the clock tree is fixed and provides a limited number of clock buffers. In addition, those clock buffers are split in half between the upper and lower parts of the FPGA. This makes it very difficult to create the place and route constraints for an application that uses 16 clock buffers (most recent Xilinx FPGAs have 32 clock buffers). Future FPGA generations will offer more than 100 clock buffers, which will make the proposed clock gating strategy easier to apply and useful in power-restricted implementations.

7.2 Future Work

The proposed design flow has demonstrated its efficiency for high-level synthesis, software code generation, design space exploration for heterogeneous platforms, and dynamic power reduction through clock gating. Nevertheless, some problems have been identified and will need to be addressed in the future. As a result, future work will consist in exploiting the same procedural optimizations that commercial HLS tools and state-of-the-art compilers offer, but will also include methodologies and techniques that are not yet found in other tools (e.g. multi-actor hierarchical memory management).

7.2.1 Component Library Database

This part consists of exploring the FPGA architecture by retrieving the number of LUTs, the I/O ports, the sizes and types of the internal memories, and the combinatorial delay of the gates. Each FPGA vendor uses different names for its internal structures, so a general library database should be created that has a one-to-one correspondence with the LIM structure. Apart from device hardware characteristics that are fixed (e.g. the memory size of an FPGA), the gate delay varies depending on where the gates are placed and routed. Thus, the estimation of the gate propagation delay can only be close to the real delay.

157 Chapter 7. Conclusion and Future Work

This step is a necessity for better scheduling, allocation and binding. In addition, the component library database should also contain the characteristics of development boards. When the architecture model is described, the information on off-chip memories (DDR, SDRAM, etc.) and interfaces is then available to the developer.

The task itself can be implemented as TCL scripts that run various benchmarks. For each operation, registers should be added before its input and after its output port, which directly gives the gate delay of the operator. Once the information is retrieved, it can be added to Xronos' internal library, which can be represented as an equivalent of an SQL database.
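A minimal Verilog sketch of such a benchmark wrapper is shown below (the operator, here a 32-bit adder, and all names are illustrative): the registers isolate the operator, so that the register-to-register delay reported by the synthesis tool corresponds to the operator's propagation delay.

    // Illustrative benchmark wrapper: registers before the inputs and after
    // the output isolate the operator under characterization.
    module op_delay_bench #(parameter WIDTH = 32) (
        input  wire             clk,
        input  wire [WIDTH-1:0] a,
        input  wire [WIDTH-1:0] b,
        output reg  [WIDTH-1:0] y
    );
      reg [WIDTH-1:0] a_r, b_r;

      always @(posedge clk) begin
        a_r <= a;            // input registers
        b_r <= b;
        y   <= a_r + b_r;    // operator under test, followed by the output register
      end
    endmodule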

7.2.2 SDC Scheduling for LIM

Xronos' Achilles heel is that it does not support constraints. SDC (System of Difference Constraints) scheduling, proposed by Cong and Zhang in [203], performs a variety of optimizations under a unified mathematical programming framework. The advantage of using SDC is that it makes it possible to apply scheduling that supports constraints on resources, frequency, latency and relative timing.

The idea is to formulate the scheduling problem as a linear program that can be solved with a standard LP solver. As described in [204], each operation is assigned a variable that, after the program has been solved, holds the control step (clock cycle) in which the operation is scheduled.

Let us consider two operations $o_i$ and $o_j$, and let $CS_{o_i}$ and $CS_{o_j}$ denote the control steps (clock cycles) in which $o_i$ and $o_j$ are to be scheduled. The combinational delay $D_{o_i}$ of each operation is given by the library database described in the previous sub-section 7.2.1. Assuming that there is a control or data dependency from $o_i$ to $o_j$, the following difference constraint is added to the LP formulation: $CS_{o_j} - CS_{o_i} \ge 0$. Furthermore, a clock period constraint can be incorporated. Let $C_p$ be the target clock period and $C_{hain} = o_1 \rightarrow o_2 \rightarrow \dots \rightarrow o_n$ a chain of $n$ dependent combinational operations in a Block. The total estimated combinatorial delay of the chain is the sum $T_e = \sum_{i=1}^{n} D_{o_i}$, and the clock period constraint on the chain is $CS_{o_n} - CS_{o_1} \ge \lceil T_e / C_p \rceil - 1$. Such constraints determine which operators can be chained together in a clock cycle: chaining is only permitted if the target $C_p$ is met. For more information on SDC refer to [203].
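As a hypothetical illustration, consider a chain of three dependent operations with delays of 3, 4 and 5 ns and a target clock period $C_p = 10$ ns; the resulting SDC constraints are:

    \begin{align*}
      &\text{dependencies:} && CS_{o_2} - CS_{o_1} \ge 0, \qquad CS_{o_3} - CS_{o_2} \ge 0 \\
      &\text{chain delay:}  && T_e = 3 + 4 + 5 = 12~\text{ns} \\
      &\text{clock period:} && CS_{o_3} - CS_{o_1} \ge \lceil 12 / 10 \rceil - 1 = 1
    \end{align*}

The solver may thus chain $o_1$ and $o_2$ in the same control step ($3 + 4 = 7$ ns $\le 10$ ns), but must schedule $o_3$ at least one control step after $o_1$.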

The advantage of having a flexible scheduler on which constraints can be applied to actor ports and actions makes the CAL programming language even more attractive for HLS. As described in the design space exploration chapter, the impact analysis gives the number of clock cycles by which an action should be reduced in order to leave the algorithmic critical path. If the scheduler can resolve the latency constraints on the selected set of actors, TURNUS and Xronos can optimize a design automatically.

7.2.3 Integration of state of the art procedural optimizations

Commercial HLS tools such as Xilinx's Vivado HLS and Calypto's Catapult use C/C++/SystemC front-ends that are not developed by them. Vivado HLS, for example, uses Clang, the C front-end of LLVM, and takes advantage of the open source community's vast choice of procedural optimizations. Xronos needs loop and if-hyper-block optimizations: loop unrolling, merging, unswitching and loop-invariant code motion are necessary optimizations that should be implemented in Xronos.

7.2.4 Memory Partitioning

Memory partitioning is widely adopted to increase the memory bandwidth efficiently by using multiple memory banks and reducing data access conflicts. Xronos does not support this feature, and developers need to separate different memory banks manually by adding more actions and complexity to their existing code. An attractive solution that fits streaming applications is proposed in [205] and [206]. Wang et al. [206], building on the methodology of [205], propose an automatic memory partitioning scheme for multidimensional arrays based on a linear transformation, with the purpose of providing high data throughput of on-chip memories for loop pipelining in high-level synthesis.

7.2.5 Multi-Actor hierarchical memory management

In streaming video programs such as MPEG decoders, there is a need to store reference images in large memories that do not fit in the internal FPGA memories. The concept is to automatically create, given a Component Library Database of the supported boards, an external memory referee that manages the large memories used inside actors. Given the MoC of an actor, and especially the property that only one action can execute at a time, caching techniques should be investigated for pre-loading memory portions. In addition, this referee should manage read and write requests from other actors too.

7.2.6 Multiplexing and De-multiplexing Queue Channels for Heterogeneous Targets

As described in Section 4.13.4, the interface bandwidth and the communication scheduling have a huge impact on performance. In addition, in Section 5.6 the bandwidth presented a significant problem for finding a mapping between SW and HW processing elements. There is a need for a better interface implementation and for an analysis of the implemented communication scheduling. In addition, SoCs that contain an ARM processing element nowadays communicate through an AXI interface. AXI is an internal bus that is used to communicate with peripheral devices and memory, and offers high bandwidth and data locality. Supporting the AXI interface immediately gives broader support for vendor-independent SoCs.


7.2.7 Clock Gating on Input Conditions and Multi-Clock Domain Partitioning

In the proposed clock-gating strategy, the idleness of an actor and the emptiness of its input FIFOs were not taken into account. The emptiness condition alone is not enough to clock gate an actor, because the actor might be in a firing state. Thus, combining both conditions with the output fullness condition may further reduce the overall power consumption. A further interesting direction might be to combine the clock-gating strategy with multiple clock domains: clocks with a frequency lower than the nominal one might also reduce the overall power consumption, as illustrated in [202].

7.2.8 Dataflow Machines: An alternative Intermediate Representation

The Orcc Dataflow IR and Procedural IR present a set of limitations, as discussed in Section 3.7. A new alternative intermediate representation called Dataflow Machines [6], based on Actor Machines [151], represents a better model for stream programs, aimed at capturing their logical structure in a way that makes them amenable to analysis, composition, and hardware/software code generation for parallel implementations. Replacing the Orcc front-end of Xronos with Dataflow Machines will mainly consist of producing a CDFG from a Dataflow Machine model.

Bibliography

[1] S. Edwards, “The challenges of synthesizing hardware from c-like languages,” Design Test of Computers, IEEE, vol. 23, pp. 375–386, May 2006.

[2] E. Bezati, H. Yviquel, M. Raulet, and M. Mattavelli, “A unified hardware/software co-synthesis solution for signal processing systems,” in Design and Architectures for Signal and Image Processing (DASIP), 2011 Conference on, pp. 1–6, Nov. 2011.

[3] E. Bezati, S. Brunet, M. Mattavelli, and J. Janneck, “Synthesis and optimization of high-level stream programs,” in Electronic System Level Synthesis Conference (ESLsyn), 2013, pp. 1–6, May 2013.

[4] E. Bezati, M. Mattavelli, and J. Janneck, “High-level synthesis of dataflow programs for signal processing systems,” in Image and Signal Processing and Analysis (ISPA), 2013 8th International Symposium on, pp. 750–754, Sept 2013.

[5] E. Bezati, G. Roquier, and M. Mattavelli, “Live demonstration: High level software and hardware synthesis of dataflow programs,” in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, pp. 660–660, May 2013.

[6] J. Janneck, G. Cedersjo, E. Bezati, and S. Casale-Brunet, “Dataflow machines,” in Signals, Systems and Computers, 2014 Asilomar Conference on, Nov 2014.

[7] S. Casale-Brunet, A. Elguindy, E. Bezati, R. Thavot, G. Roquier, M. Mattavelli, and J. W. Janneck, “Methods to explore design space for mpeg rmc codec specifications,” Signal Processing: Image Communication, vol. 28, no. 10, pp. 1278–1294, 2013.

[8] G. Roquier, E. Bezati, R. Thavot, and M. Mattavelli, “Hardware/software co-design of dataflow programs for reconfigurable hardware and multi-core platforms,” in Design and Architectures for Signal and Image Processing (DASIP), 2011 Conference on, pp. 1–7, Nov 2011.

[9] G. Roquier, E. Bezati, and M. Mattavelli, “Hardware and software synthesis of heterogeneous systems from dataflow programs,” Journal of Electrical and Computer Engineering, vol. 2012, no. 484962, 2014.


[10] S. Casale Brunet, M. Mattavelli, and J. Janneck, “Profiling of dataflow programs using post mortem causation traces,” in Signal Processing Systems (SiPS), 2012 IEEE Workshop on, Oct. 2012.

[11] E. Bezati, R. Thavot, G. Roquier, and M. Mattavelli, “High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms,” Journal of Real-Time Image Processing, vol. 9, no. 1, pp. 251–262, 2014.

[12] S. Brunet, M. Mattavelli, and J. Janneck, “Buffer optimization based on critical path analysis of a dataflow program design,” in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, pp. 1384–1387, May 2013.

[13] S. Casale-Brunet, E. Bezati, C. Alberti, M. Mattavelli, E. Amaldi, and J. Janneck, “Multi-clock domain optimization for reconfigurable architectures in high-level dataflow applications,” in Signals, Systems and Computers, 2013 Asilomar Conference on, pp. 1796–1800, Nov 2013.

[14] S. Casale-Brunet, E. Bezati, C. Alberti, G. Roquier, M. Mattavelli, J. Janneck, and J. Boutellier, “Design space exploration and implementation of rvc-cal applications using the turnus framework,” in Design and Architectures for Signal and Image Processing (DASIP), 2013 Conference on, pp. 341–342, Oct 2013.

[15] S. Casale-Brunet, E. Bezati, M. Mattavelli, M. Canale, and J. Janneck, “Execution trace graph analysis of dataflow programs: bounded buffer scheduling and deadlock recovery using model predictive control,” in Proceedings of Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014.

[16] S. Casale-Brunet, M. Wiszniewska, E. Bezati, M. Mattavelli, J. Janneck, and M. Canale, “Turnus: an open-source design space exploration framework for dynamic stream programs,” in Proceedings of Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014.

[17] M. Canale, S. Casale-Brunet, E. Bezati, M. Mattavelli, and J. Janneck, “Dataflow programs analysis and optimization using model predictive control techniques: An example of bounded buffer scheduling,” in Signal Processing Systems (SiPS), 2014 IEEE Workshop on, pp. 1–6, Oct 2014.

[18] C. Sau, L. Raffo, F. Palumbo, E. Bezati, S. Casale-Brunet, and M. Mattavelli, “Automated design flow for coarse-grained reconfigurable platforms: An rvc-cal multi-standard decoder use-case,” in Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 International Conference on, pp. 59–66, July 2014.

[19] E. Bezati, S. Brunet, M. Mattavelli, and J. Janneck, “Coarse grain clock gating of streaming applications in programmable logic implementations,” in Electronic System Level Synthesis Conference (ESLsyn), Proceedings of the 2014, pp. 1–6, May 2014.


[20] D. D. Gajski, N. D. Dutt, A. C.-H. Wu, and S. Y.-L. Lin, High-Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers, 1st ed., 1992.

[21] D. Gajski and R. Kuhn, “Guest editors’ introduction: New vlsi tools,” Computer, vol. 16, pp. 11–14, Dec 1983.

[22] K. Wakabayashi, “C-based behavioral synthesis and verification analysis on industrial design examples,” in Proceedings of the 2004 Asia and South Pacific Design Automation Conference, ASP-DAC ’04, (Piscataway, NJ, USA), pp. 344–348, IEEE Press, 2004.

[23] M. J. S. Smith, Application-specific Integrated Circuits. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1997.

[24] G. D. Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill Higher Education, 1st ed., 1994.

[25] G. Martin and G. Smith, “High-level synthesis: Past, present, and future,” Design Test of Computers, IEEE, vol. 26, pp. 18–25, July 2009.

[26] A. Parker, D. Thomas, D. Siewiorek, M. Barbacci, L. Hafer, G. Leive, and J. Kim, “The cmu design automation system - an example of automated data path design,” in Design Automation, 1979. 16th Conference on, pp. 73–80, June 1979.

[27] A. Parker and S. Hayati, “Automating the vlsi design process using expert systems and silicon compilation,” Proceedings of the IEEE, vol. 75, pp. 777–785, June 1987.

[28] M. Barbacci, G. Barnes, R. Cattell, and D. Siewiorek, The ISPS Computer Description Language: The Symbolic Manipulation of Computer Descriptions. Carnegie-Mellon University, Computer Science Department, 1978.

[29] P. Marwedel, “The mimola design system: Detailed description of the software system,” in Design Automation, 1979. 16th Conference on, pp. 59–63, June 1979.

[30] P. Marwedel, “The mimola design system: Tools for the design of digital processors,” in Design Automation, 1984. 21st Conference on, pp. 587–593, June 1984.

[31] J. Granacki, D. Knapp, and A. Parker, “The adam advanced design automation system: Overview, planner and natural language interface,” in Design Automation, 1985. 22nd Conference on, pp. 727–730, June 1985.

[32] R. Jain, K. Kucukcakar, M. Mlinar, and A. Parker, “Experience with the adam synthesis system,” in Design Automation, 1989. 26th Conference on, pp. 56–61, June 1989.

[33] N. Park and A. Parker, “Sehwa: a software package for synthesis of pipelines from behavioral specifications,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 7, pp. 356–370, Mar 1988.


[34] P. Paulin, J. Knight, and E. Girczyc, “Hal: A multi-paradigm approach to automatic data path synthesis,” in Design Automation, 1986. 23rd Conference on, pp. 263–270, June 1986.

[35] P. Paulin and J. Knight, “Force-directed scheduling for the behavioral synthesis of asics,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 8, pp. 661–679, Jun 1989.

[36] H. Trickey, “Flamel: A high-level hardware compiler,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 6, pp. 259–269, March 1987.

[37] G. De Micheli and D. Ku, “Hercules-a system for high-level synthesis,” in Design Au- tomation Conference, 1988. Proceedings., 25th ACM/IEEE, pp. 483–488, June 1988.

[38] G. De Micheli, D. Ku, F. Mailhot, and T. Truong, “The olympus synthesis system,” Design Test of Computers, IEEE, vol. 7, pp. 37–53, Oct 1990.

[39] D. Ku and G. De Micheli, “Relative scheduling under timing constraints,” in Design Automation Conference, 1990. Proceedings., 27th ACM/IEEE, pp. 59–64, Jun 1990.

[40] A. Chandrakasan, M. Potkonjak, J. Rabaey, and R. Brodersen, “Hyper-lp: a system for power minimization using architectural transformations,” in Computer-Aided Design, 1992. ICCAD-92. Digest of Technical Papers., 1992 IEEE/ACM International Conference on, pp. 300–303, Nov 1992.

[41] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. Brodersen, “Optimizing power using transformations,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 14, pp. 12–31, Jan 1995.

[42] H. De Man, J. Rabaey, P.Six, and L. Claesen, “Cathedral-ii: A silicon compiler for digital signal processing,” Design Test of Computers, IEEE, vol. 3, pp. 13–25, Dec 1986.

[43] R. A. Bergamaschi, R. A. O’Connor, L. Stok, M. Z. Moricz, S. Prakash, A. Kuehlmann, and D. S. Rao, “High-level synthesis in an industrial environment,” IBM J. Res. Dev., vol. 39, pp. 131–148, Feb. 1995.

[44] P. Lippens, J. Van Meerbergen, A. van der Werf, W. Verhaegh, B. McSweeney, J. Huisken, and O. McArdle, “Phideo: a silicon compiler for high speed algorithms,” in Design Automation. EDAC., Proceedings of the European Conference on, pp. 436–441, Feb 1991.

[45] K. Kucukcakar, C.-T. Chen, J. Gong, W. Philipsen, and T. Tkacik, “Matisse: an architectural design tool for commodity ics,” Design Test of Computers, IEEE, vol. 15, pp. 22–33, Apr 1998.

[46] J. P. Elliott, Understanding Behavioral Synthesis: A Practical Guide to High-Level Design. Norwell, MA, USA: Kluwer Academic Publishers, 1999.


[47] A. Hemani, B. Karlsson, M. Fredriksson, K. Nordqvist, and B. Fjellborg, “Application of high-level synthesis in an industrial project,” in VLSI Design, 1994., Proceedings of the Seventh International Conference on, pp. 5–10, Jan 1994.

[48] D. Gajski, J. Zhu, R. Domer, A. Gerstlauer, and S. Zhao, SpecC: Specification Language and Methodology. Normwell, Ma:Springer, 1st ed., 2000.

[49] “Impulse.” http://www.impulseaccelerated.com/. Accessed: 12-2014.

[50] “Cadence Cynthesizer solution.” http://www.cadence.com/products/sd/cynthesizer/pages/default.aspx?CMP=MOSS5/. Accessed: 12-2014.

[51] “Calypto Catapult.” http://calypto.com/en/products/catapult/overview/. Accessed: 12-2014.

[52] “NEC Cyber WorkBench.” http://www.nec.com/en/global/prod/cwb/. Accessed: 12-2014.

[53] “Synopsys Synphony C Compiler.” http://www.synopsys.com/Tools/Implementation/RTLSynthesis/Pages/SynphonyC-Compiler.aspx. Accessed: 12-2014.

[54] “Xilinx Vivado High-level Synthesis.” http://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html. Accessed: 12-2014.

[55] C. E. Stroud, R. R. Munoz, and D. A. Pierce, “Behavioral model synthesis with cones,” IEEE Des. Test, vol. 5, pp. 22–30, May 1988.

[56] D. C. Ku and G. De Micheli, High Level Synthesis of ASICs Under Timing and Synchronization Constraints. Norwell, MA, USA: Kluwer Academic Publishers, 1992.

[57] D. Galloway, “The transmogrifier c hardware description language and compiler for fpgas,” in Proceedings of the IEEE Symposium on FPGA’s for Custom Computing Machines, FCCM ’95, (Washington, DC, USA), pp. 136–, IEEE Computer Society, 1995.

[58] Y. Panchul, D. Soderman, and D. Coleman, “System for converting hardware designs in high-level programming languages to hardware implementations,” Oct. 25 2001. US Patent App. 09/846,092.

[59] T. Kambe, A. Yamada, K. Nishida, K. Okada, M. Ohnishi, A. Kay, P. Boca, V. Zammit, and T. Nomura, “A c-based synthesis system, bach, and its application (invited talk),” in Proceedings of the 2001 Asia and South Pacific Design Automation Conference, ASP-DAC ’01, (New York, NY, USA), pp. 151–155, ACM, 2001.

[60] K. Wakabayashi and T. Okamoto, “C-based soc design flow and eda tools: An asic and system vendor perspective,” Trans. Comp.-Aided Des. Integ. Cir. Sys., vol. 19, pp. 1507– 1522, Nov. 2006.


[61] P. Paulin, C. Pilkington, and E. Bensoudane, “Stepnp: a system-level exploration platform for network processors,” Design Test of Computers, IEEE, vol. 19, pp. 17–26, Nov 2002.

[62] “Bluespec inc.” http://www.bluespec.com/. Accessed: 12-2014.

[63] A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong, and W.-M. Hwu, “Fcuda: Enabling efficient compilation of cuda kernels onto fpgas,” in Application Specific Processors, 2009. SASP ’09. IEEE 7th Symposium on, pp. 35–42, July 2009.

[64] M. Owaida, N. Bellas, K. Daloukas, and C. Antonopoulos, “Synthesis of platform architectures from opencl programs,” in Field-Programmable Custom Computing Machines (FCCM), 2011 IEEE 19th Annual International Symposium on, pp. 186–193, May 2011.

[65] P. Bellows and B. Hutchings, “Jhdl-an hdl for reconfigurable systems,” in FPGAs for Custom Computing Machines, 1998. Proceedings. IEEE Symposium on, pp. 175–184, Apr 1998.

[66] J. L. Tripp, P. A. Jackson, and B. Hutchings, “Sea cucumber: A synthesizing compiler for fpgas,” in Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications, FPL ’02, (London, UK, UK), pp. 875–885, Springer-Verlag, 2002.

[67] D. Greaves and S. Singh, “Using c sharp attributes to describe hardware artefacts within kiwi,” in Specification, Verification and Design Languages, 2008. FDL 2008. Forum on, pp. 239–240, Sept 2008.

[68] W. Luk and S. McKeever, “Pebble: A language for parametrised and reconfigurable hardware design,” in Field-Programmable Logic and Applications From FPGAs to Computing Paradigm, Springer Berlin Heidelberg, 1998.

[69] G. Berry and G. Gonthier, “The esterel synchronous programming language: Design, semantics, implementation,” Sci. Comput. Program., vol. 19, pp. 87–152, Nov. 1992.

[70] T. Kangas, P. Kukkala, H. Orsila, E. Salminen, M. Hännikäinen, T. D. Hämäläinen, J. Riihimäki, and K. Kuusilinna, “Uml-based multiprocessor soc design framework,” ACM Trans. Embed. Comput. Syst., vol. 5, pp. 281–320, May 2006.

[71] B. Hailpern and P. Tarr, “Model-driven development: The good, the bad, and the ugly,” IBM Syst. J., vol. 45, pp. 451–461, July 2006.

[72] C.-T. Hwang, Y.-C. Hsu, and Y.-L. Lin, “Pls: a scheduler for pipeline synthesis,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 12, pp. 1279–1286, Sep 1993.

[73] C. H. Gebotys and M. I. Elmasry, “Vlsi design synthesis with testability,” in Proceedings of the 25th ACM/IEEE Design Automation Conference, DAC ’88, (Los Alamitos, CA, USA), pp. 16–21, IEEE Computer Society Press, 1988.


[74] P. Marwedel, “A new synthesis algorithm for the mimola software system,” in Design Automation, 1986. 23rd Conference on, pp. 271–277, June 1986.

[75] J.-H. Lee, Y.-C. Hsu, and Y.-L. Lin, “A new integer linear programming formulation for the scheduling problem in data path synthesis,” in Computer-Aided Design, 1989. ICCAD-89. Digest of Technical Papers., 1989 IEEE International Conference on, pp. 20–23, Nov 1989.

[76] I.-C. Park and C.-M. Kyung, “Fast and near optimal scheduling in automatic data path synthesis,” in Proceedings of the 28th ACM/IEEE Design Automation Conference, DAC ’91, (New York, NY, USA), pp. 680–685, ACM, 1991.

[77] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell System Technical Journal, vol. 49, no. 2, pp. 291–307, 1970.

[78] A.-H. Ab Rahman, A. Prihozhy, and M. Mattavelli, “Pipeline synthesis and optimization of fpga-based video processing applications with cal,” EURASIP Journal on Image and Video Processing, vol. 2011, no. 1, 2011.

[79] L. Gao, D. Zaretsky, G. Mittal, D. Schonfeld, and P. Banerjee, “A software pipelining algorithm in high-level synthesis for fpga architectures,” in Quality of Electronic Design, 2009. ISQED 2009. Quality Electronic Design, pp. 297–302, March 2009.

[80] M. Weinhardt and W. Luk, “Pipeline vectorization,” Trans. Comp.-Aided Des. Integ. Cir. Sys., vol. 20, pp. 234–248, Nov. 2006.

[81] S. Oh, T. G. Kim, J. Cho, and E. Bozorgzadeh, “Speculative loop-pipelining in binary translation for hardware acceleration,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 27, pp. 409–422, March 2008.

[82] G. De Micheli, “Hardware synthesis from c/c++ models,” in Design, Automation and Test in Europe Conference and Exhibition 1999. Proceedings, pp. 382–383, 1999.

[83] L.-F. Chao, A. LaPaugh, and E.-M. Sha, “Rotation scheduling: a loop pipelining algorithm,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 16, pp. 229–239, Mar 1997.

[84] H.-S. Jun and S.-Y. Hwang, “Design of a pipelined datapath synthesis system for digital signal processing,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 2, pp. 292–303, Sept 1994.

[85] W. Verhaegh, P. Lippens, E. Aarts, J. Korst, J. Van Meerbergen, and A. van der Werf, “Improved force-directed scheduling in high-throughput digital signal processing,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 14, pp. 945–960, Aug 1995.

[86] E. Nurvitadhi, J. Hoe, T. Kam, and S. Lu, “Automatic pipelining from transactional datapath specifications,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 30, pp. 441–454, March 2011.


[87] E. Lee and D. Messerschmitt, “Synchronous data flow,” Proceedings of the IEEE, vol. 75, pp. 1235–1245, Sept 1987.

[88] G. Kahn, “The Semantics of Simple Language for Parallel Programming,” in IFIP Congress, pp. 471–475, 1974.

[89] A. Prihozhy, “Net scheduling in high-level synthesis,” Design Test of Computers, IEEE, vol. 13, pp. 26–35, Spring 1996.

[90] J. Eker and J. Janneck, “CAL Language Report,” Tech. Rep. ERL Technical Memo UCB/ERL M03/48, University of California at Berkeley, Dec. 2003.

[91] M. Mattavelli, I. Amer, and M. Raulet, “The Reconfigurable Video Coding Standard,” Signal Processing Magazine, IEEE, vol. 27, pp. 159 –167, May 2010.

[92] C. Leiserson and J. B. Saxe, “Optimizing synchronous systems,” in Foundations of Computer Science, 1981. SFCS ’81. 22nd Annual Symposium on, pp. 23–36, Oct 1981.

[93] S. Malik, K. Singh, R. Brayton, and A. Sangiovanni-Vincentelli, “Performance optimization of pipelined logic circuits using peripheral retiming and resynthesis,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 12, pp. 568–578, May 1993.

[94] N. Shenoy, “Retiming: Theory and practice,” Integr. VLSI J., vol. 22, pp. 1–21, Aug. 1997.

[95] K. Hwang, A. Casavant, C.-T. Chang, and M. d’Abreu, “Scheduling and hardware sharing in pipelined data paths,” in Computer-Aided Design, 1989. ICCAD-89. Digest of Technical Papers., 1989 IEEE International Conference on, pp. 24–27, Nov 1989.

[96] G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man, “An efficient microcode compiler for application specific dsp processors,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 9, pp. 925–937, Sep 1990.

[97] W. Sun, M. Wirthlin, and S. Neuendorffer, “Fpga pipeline synthesis design exploration using module selection and resource sharing,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 26, pp. 254–265, Feb 2007.

[98] B. Haroun and M. Elmasry, “Architectural synthesis for dsp silicon compilers,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 8, pp. 431–447, Apr 1989.

[99] H. Javaid, A. Ignjatovic, and S. Parameswaran, “Rapid design space exploration of application specific heterogeneous pipelined multiprocessor systems,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 29, pp. 1777–1789, Nov 2010.

[100] E. Girczyc, “Loop winding - a data flow approach to functional pipelining,” in ISCAS, pp. 382–385, 1987.


[101] R. Potasman, J. Lis, A. Nicolau, and D. Gajski, “Percolation based synthesis,” in Design Automation Conference, 1990. Proceedings., 27th ACM/IEEE, pp. 444–449, Jun 1990.

[102] A. Aiken and A. Nicolau, “Optimal loop parallelization,” in Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation, PLDI ’88, (New York, NY, USA), pp. 308–317, ACM, 1988.

[103] F. N. Najm, “Feedback, correlation, and delay concerns in the power estimation of vlsi circuits,” in Proceedings of the 32Nd Annual ACM/IEEE Design Automation Conference, DAC ’95, (New York, NY, USA), pp. 612–617, ACM, 1995.

[104] M. Pedram, “Power minimization in ic design: Principles and applications,” ACM Trans. Des. Autom. Electron. Syst., vol. 1, pp. 3–56, Jan. 1996.

[105] S. M. Srinivas Devadas, “A survey of optimization techniques targeting low power vlsi circuits,” in Design Automation, 1995. DAC ’95. 32nd Conference on, pp. 242–247, 1995.

[106] A. Chandrakasan, S. Sheng, and R. Brodersen, “Low-power cmos digital design,” Solid-State Circuits, IEEE Journal of, vol. 27, pp. 473–484, Apr 1992.

[107] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. Brodersen, “Optimizing power using transformations,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 14, pp. 12–31, Jan 1995.

[108] R. Mehra and J. Rabaey, “Behavioral level power estimation and exploration,” in Proc. Int. Wkshp. Low Power Design, pp. 197–202, 1994.

[109] A. Chatterjee and R. Roy, “Synthesis of low power linear dsp circuits using activity metrics,” in VLSI Design, 1994., Proceedings of the Seventh International Conference on, pp. 265–270, Jan 1994.

[110] F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. De Man, “Global communication and memory optimizing transformations for low power signal processing systems,” in VLSI Signal Processing, VII, 1994., [Workshop on], pp. 178–187, 1994.

[111] D. Lidsky and J. Rabaey, “Low-power design of memory intensive functions,” in Low Power Electronics, 1994. Digest of Technical Papers., IEEE Symposium, pp. 16–17, Oct 1994.

[112] P. Panda and N. Dutt, “Reducing address bus transition for low power memory mapping,” in European Design and Test Conference, 1996. ED TC 96. Proceedings, pp. 63–68, Mar 1996.

[113] A. Raghunathan and N. Jha, “Behavioral synthesis for low power,” in Computer Design: VLSI in Computers and Processors, 1994. ICCD ’94. Proceedings., IEEE International Conference on, pp. 318–322, Oct 1994.

[114] A. Raghunathan and N. Jha, “An ilp formulation for low power based on minimizing switched capacitance during data path allocation,” in Circuits and Systems, 1995. ISCAS ’95., 1995 IEEE International Symposium on, vol. 2, pp. 1069–1073, Apr 1995.

[115] J.-M. Chang and M. Pedram, “Register allocation and binding for low power,” in Design Automation, 1995. DAC ’95. 32nd Conference on, pp. 29–35, 1995.

[116] A. Dasgupta and R. Karri, “Simultaneous scheduling and binding for power minimization during microarchitecture synthesis,” in Proceedings of the 1995 International Symposium on Low Power Design, ISLPED ’95, (New York, NY, USA), pp. 69–74, ACM, 1995.

[117] E. Musoll and J. Cortadella, “High-level synthesis techniques for reducing the activity of functional units,” in Proceedings of the 1995 International Symposium on Low Power Design, ISLPED ’95, (New York, NY, USA), pp. 99–104, ACM, 1995.

[118] M. Stan and W. Burleson, “Bus-invert coding for low-power i/o,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 3, pp. 49–58, March 1995.

[119] A. Manzak and C. Chakrabarti, “A low power scheduling scheme with resources operating at multiple voltages,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 10, pp. 6–14, Feb 2002.

[120] S. Mohanty and N. Ranganathan, “Energy efficient scheduling for datapath synthesis,” in VLSI Design, 2003. Proceedings. 16th International Conference on, pp. 446–451, Jan 2003.

[121] S. Mohanty and N. Ranganathan, “Simultaneous peak and average power minimization during datapath scheduling,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 52, pp. 1157–1165, June 2005.

[122] M. C. Johnson and K. Roy, “Datapath scheduling with multiple supply voltages and level converters,” ACM Trans. Des. Autom. Electron. Syst., vol. 2, pp. 227–248, July 1997.

[123] X. Xing and C.-C. Jong, “Multivoltage multifrequency low-energy synthesis for functionally pipelined datapath,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 17, pp. 1348–1352, Sept 2009.

[124] W.-T. Chang, S. Ha, and E. A. Lee, “Heterogeneous simulation—mixing discrete-event models with dataflow,” J. VLSI Signal Process. Syst., vol. 15, pp. 127–144, Jan. 1997.

[125] Y.-R. Lin, C.-T. Hwang, and A. C.-H. Wu, “Scheduling techniques for variable voltage low power designs,” ACM Trans. Des. Autom. Electron. Syst., vol. 2, pp. 81–97, Apr. 1997.

[126] G. Lakshminarayana and N. Jha, “High-level synthesis of power-optimized and area-optimized circuits from hierarchical data-flow intensive behaviors,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 18, pp. 265–281, Mar 1999.

[127] D. Chen, J. Cong, and Y. Fan, “Low-power high-level synthesis for fpga architectures,” in Low Power Electronics and Design, 2003. ISLPED ’03. Proceedings of the 2003 International Symposium on, pp. 134–139, Aug 2003.

[128] B. Landman and R. L. Russo, “On a pin versus block relationship for partitions of logic graphs,” Computers, IEEE Transactions on, vol. C-20, pp. 1469–1479, Dec 1971.

[129] A. Bogliolo, L. Benini, B. Ricco, and G. De Micheli, “Efficient switching activity computation during high-level synthesis of control-dominated designs,” in Low Power Electronics and Design, 1999. Proceedings. 1999 International Symposium on, pp. 127–132, Aug 1999.

[130] A. Hemani, T. Meincke, S. Kumar, A. Postula, T. Olsson, P. Nilsson, J. Oberg, P. Ellervee, and D. Lundqvist, “Lowering power consumption in clock by using globally asynchronous locally synchronous design style,” in Design Automation Conference, 1999. Proceedings. 36th, pp. 873–878, 1999.

[131] S. Suhaib, D. Mathaikutty, and S. Shukla, “Dataflow architectures for GALS,” Electronic Notes in Theoretical Computer Science, vol. 200, no. 1, pp. 33–50, 2008.

[132] T.-Y. Wuu and S. B. K. Vrudhula, “Synthesis of asynchronous systems from data flow specification,” Research Report ISI/RR-93-366, University of Southern California, Information Sciences Institute, Dec 1993.

[133] B. Ghavami and H. Pedram, “High performance asynchronous design flow using a novel static performance analysis method,” Comput. Electr. Eng., vol. 35, pp. 920–941, Nov. 2009.

[134] M. Gerber and T. Gossi, “Parallel coprocessor architectures for molecular dynamics simulation: a case study in design space exploration,” in Circuits and Systems, 1998. ISCAS ’98. Proceedings of the 1998 IEEE International Symposium on, vol. 6, pp. 155–158, May 1998.

[135] Y. Le Moullec, J.-P. Diguet, N. Amor, T. Gourdeaux, and J.-L. Philippe, “Algorithmic-level specification and characterization of embedded multimedia applications with design trotter,” Journal of VLSI signal processing systems for signal, image and video technology, vol. 42, no. 2, pp. 185–208, 2006.

[136] “The SynDEx project.” http://www.syndex.org/. Accessed: 12-2014.

[137] M. Pelcat, K. Desnos, J. Heulot, C. Guy, J.-F. Nezan, and S. Aridhi, “Preesm: A dataflow-based rapid prototyping framework for simplifying multicore dsp programming,” in 2014 6th European Embedded Design in Education and Research Conference (EDERC), pp. 36–40, Sept 2014.

[138] “National Instruments FPGA.” http://www.ni.com/fpga/. Accessed: 12-2014.

[139] “MATLAB HDL Coder.” http://mathworks.com/products/hdl-coder/. Accessed: 12-2014.

[140] M. Thompson, H. Nikolov, T. Stefanov, A. D. Pimentel, C. Erbas, S. Polstra, and E. F. Deprettere, “A framework for rapid system-level exploration, synthesis, and programming of multimedia mp-socs,” in Proceedings of the 5th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS ’07, (New York, NY, USA), pp. 9–14, ACM, 2007.

[141] A. Davare, D. Densmore, T. Meyerowitz, A. Pinto, A. Sangiovanni-Vincentelli, G. Yang, H. Zeng, and Q. Zhu, “A next-generation design framework for platform-based design,” in DVCon 2007, February 2007.

[142] M. P. Howarth, P. Flegkas, G. Pavlou, N. Wang, P. Trimintzios, D. Griffin, J. Griem, M. Boucadair, P. Morand, A. Asgari, and P. Georgatsos, “Provisioning for interdomain quality of service: The mescal approach,” Comm. Mag., vol. 43, pp. 129–137, June 2005.

[143] S. Ha, S. Kim, C. Lee, Y. Yi, S. Kwon, and Y.-P. Joo, “PeaCE: A hardware-software codesign environment for multimedia embedded systems,” ACM Trans. Des. Autom. Electron. Syst., vol. 12, pp. 24:1–24:25, May 2008.

[144] “The ptolemy project.” http://ptolemy.eecs.berkeley.edu/. Accessed: 12-2014.

[145] J. Keinert, M. Streubühr, T. Schlichter, J. Falk, J. Gladigau, C. Haubelt, J. Teich, and M. Meredith, “SystemCoDesigner—an automatic esl synthesis approach by design space exploration and behavioral synthesis for streaming applications,” ACM Trans. Des. Autom. Electron. Syst., vol. 14, pp. 1:1–1:23, Jan. 2009.

[146] J. Janneck, I. Miller, D. Parlour, G. Roquier, M. Wipliez, and M. Raulet, “Synthesizing Hardware from Dataflow Programs: An MPEG-4 Simple Profile Decoder Case Study,” Journal of Signal Processing Systems, vol. 63, no. 2, pp. 241–249, 2009. 10.1007/s11265-009-0397-5.

[147] “Open Dataflow.” http://sourceforge.net/projects/opendf. Accessed: 12-2014.

[148] H. Yviquel, A. Lorence, K. Jerbi, G. Cocherel, A. Sanchez, and M. Raulet, “Orcc: Multimedia development made easy,” in Proceedings of the 21st ACM International Conference on Multimedia, MM ’13, pp. 863–866, ACM, 2013.

[149] “Caltoopia.” http://www.caltoopia.org/. Accessed: 12-2014.

[150] R. Thavot, High-level dataflow programming for complex digital systems. PhD thesis, STI, Lausanne, 2013.

[151] J. Janneck, “A machine model for dataflow actors and its applications,” in Signals, Systems and Computers (ASILOMAR), 2011 Conference Record of the Forty Fifth Asilomar Conference on, pp. 756–760, Nov 2011.

[152] E. Lee and T. Parks, “Dataflow process networks,” Proceedings of the IEEE, vol. 83, pp. 773–801, May 1995.

[153] J. B. Dennis, “First version of a data flow procedure language,” in Symposium on Programming, pp. 362–376, 1974.

[154] J. Eker and J. Janneck, “CAL Language Report,” Tech. Rep. ERL Technical Memo UCB/ERL M03/48, University of California at Berkeley, Dec. 2003.

[155] K. Keutzer, A. R. Newton, J. M. Rabaey, and A. Sangiovanni-Vincentelli, “System-level design: Orthogonalization of concerns and platform-based design,” Trans. Comp.-Aided Des. Integ. Cir. Sys., vol. 19, pp. 1523–1543, Dec. 2000.

[156] A. Turing, “On computable numbers with an application to the "Entscheidungsproblem",” Proceedings of the London Mathematical Society, 1936.

[157] D. McAllester, P. Panangaden, and V. Shanbhogue, “Nonexpressibility of fairness and signaling,” J. Comput. Syst. Sci., vol. 47, pp. 287–321, Oct. 1993.

[158] E. Lee and A. Sangiovanni-Vincentelli, “Comparing models of computation,” in Proceedings of the 1996 IEEE/ACM international conference on Computer-aided design, pp. 234–241, IEEE Computer Society, 1997.

[159] E. Lee and T. Parks, “Dataflow process networks,” Proceedings of the IEEE, pp. 773–799, 1995.

[160] “Open RVC-CAL compiler.” http://orcc.sourceforge.net/. Accessed: 12-2014.

[161] “Xtext: Language development made easy!” https://eclipse.org/Xtext/. Accessed: 12-2014.

[162] “Eclipse modeling framework.” http://eclipse.org/modeling/emf/. Accessed: 12-2014.

[163] D. Steinberg, F. Budinsky, M. Paternostro, and E. Merks, EMF: Eclipse Modeling Framework 2.0. Addison-Wesley Professional, 2nd ed., 2009.

[164] “Xtend: Modernized Java.” https://eclipse.org/xtend/. Accessed: 12-2014.

[165] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-oriented Software. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1995.

[166] N. Siret, M. Wipliez, J.-F. Nezan, and A. Rhatay, “Hardware code generation from dataflow programs,” in Design and Architectures for Signal and Image Processing (DASIP), 2010 Conference on, pp. 113–120, Oct 2010.

[167] K. Jerbi, M. Raulet, O. Deforges, and M. Abid, “Automatic generation of synthesizable hardware implementation from high level RVC-CAL description,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 1597–1600, March 2012.

[168] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, “Efficiently computing static single assignment form and the control dependence graph,” ACM Trans. Program. Lang. Syst., vol. 13, pp. 451–490, Oct. 1991.

[169] M. Abid, K. Jerbi, M. Raulet, O. Deforges, and M. Abid, “System level synthesis of dataflow programs: Hevc decoder case study,” in Electronic System Level Synthesis Conference (ESLsyn), 2013, pp. 1–6, May 2013.

[170] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Spark: a high-level synthesis framework for applying parallelizing compiler transformations,” in VLSI Design, 2003. Proceedings. 16th International Conference on, pp. 461–466, Jan 2003.

[171] Altera Inc., “FPGA performance benchmarking methodology.”

[172] OSCI, “Systemc synthesizable subset draft 1.3,” 2010.

[173] “Performance API (PAPI).” http://icl.cs.utk.edu/papi/. Accessed: 12-2014.

[174] A. Ghazi, J. Boutellier, M. Abdelaziz, X. Lu, L. Anttila, J. Cavallaro, S. Bhattacharyya, M. Valkama, and M. Juntti, “Low power implementation of digital predistortion filter on a heterogeneous application specific multiprocessor,” in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 8336–8340, May 2014.

[175] “Berkeley SoftFloat.” http://www.jhauser.us/arithmetic/SoftFloat.html. Accessed: 12-2014.

[176] “SNU real-time benchmarks.” http://www.cprover.org/goto-cc/examples/snu.html. Accessed: 12-2014.

[177] C. Lee, M. Potkonjak, and W. Mangione-Smith, “Mediabench: a tool for evaluating and synthesizing multimedia and communications systems,” in Microarchitecture, 1997. Proceedings., Thirtieth Annual IEEE/ACM International Symposium on, pp. 330–335, Dec 1997.

[178] Y. Hara, H. Tomiyama, S. Honda, and H. Takada, “Proposal and quantitative analysis of the chstone benchmark program suite for practical c-based high-level synthesis,” Journal of Information Processing, vol. 17, pp. 242–254, 2009.

[179] I. Amer, C. Lucarz, G. Roquier, M. Mattavelli, M. Raulet, J.-F. Nezan, and O. Déforges, “Reconfigurable video coding on multicore,” IEEE Signal Processing Magazine, vol. 26, pp. 113–123, November 2009.

[180] A. Carlsson, J. Eker, T. Olsson, and C. von Platen, “Scalable parallelism using dataflow programming,” Ericsson Review, on-line publication, www.ericsson.com, 2011.

[181] T. Stefanov, C. Zissulescu, A. Turjan, B. Kienhuis, and E. Deprettere, “System design using kahn process networks: The compaan/laura approach,” in Proceedings of the Conference on Design, Automation and Test in Europe - Volume 1, DATE ’04, (Washington, DC, USA), pp. 10340–, IEEE Computer Society, 2004.

[182] J. Janneck, I. Miller, and D. Parlour, “Profiling dataflow programs,” in Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 1065–1068, 2008.

[183] M. Ravasi, An automatic C-code instrumentation framework for high level algorithmic complexity analysis and system design. PhD thesis, STI, Lausanne, 2003.

[184] A. Kienhuis, Design Space Exploration of Stream-based Dataflow Architectures: Methods and Tools. PhD thesis, Delft University of Technology, The Netherlands, January 1999.

[185] M. Gries, “Methods for evaluating and covering the design space during early design development,” Integr. VLSI J., vol. 38, pp. 131–183, Dec. 2004.

[186] M. Pelcat, J. Nezan, J. Piat, J. Croizer, and S. Aridhi, “A System-Level Architecture Model for Rapid Prototyping of Heterogeneous Multicore Embedded Systems,” in Conference on Design and Architectures for Signal and Image Processing (DASIP) 2009, (Nice, France), 8 pages, Sep 2009.

[187] M. Sgroi, L. Lavagno, and A. Sangiovanni-Vincentelli, “Formal models for embedded system design,” IEEE Design & Test of Computers, vol. 17, no. 2, pp. 14–27, 2000.

[188] A. A. H. B. Ab Rahman, Optimizing Dataflow Programs for Hardware Synthesis. PhD thesis, STI, Lausanne, 2014.

[189] P. K. Murthy and S. S. Bhattacharyya, Memory Management for Synthesis of DSP Software. CRC Press, 2006.

[190] “A discrete event system simulator.” http://web.ornl.gov/~1qn/adevs/. Accessed: 12-2014.

[191] B. P. Zeigler, T. G. Kim, and H. Praehofer, Theory of Modeling and Simulation. Orlando, FL, USA: Academic Press, Inc., 2nd ed., 2000.

[192] J. Woodward, “Computable and incomputable functions and search algorithms,” in Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on, vol. 1, pp. 871–875, Nov 2009.

[193] M. Pedram, “Power minimization in ic design: Principles and applications,” ACM Trans. Des. Autom. Electron. Syst., vol. 1, pp. 3–56, Jan. 1996.

[194] Q. Wu, M. Pedram, and X. Wu, “Clock-gating and its application to low power design of sequential circuits,” Circuits and Systems I: Fundamental Theory and Applications, IEEE Transactions on, vol. 47, pp. 415–420, Mar 2000.

[195] G. Tellez, A. Farrahi, and M. Sarrafzadeh, “Activity-driven clock design for low power circuits,” in Computer-Aided Design, 1995. ICCAD-95. Digest of Technical Papers., 1995 IEEE/ACM International Conference on, pp. 62–65, Nov 1995.

[196] Xilinx, Virtex-7 T and XT FPGAs Data Sheet, November 2013. DS183.

[197] Xilinx, 7 Series FPGAs Clocking Resources, August 2013. UG472.

[198] D. B. Parlour, “Power control in a data flow processing architecture.” US Patent 7,437,582, October 2008.

[199] H. Prabhu, S. Thomas, J. Rodrigues, T. Olsson, and A. Carlsson, “A GALS ASIC implementation from a CAL dataflow description,” in NORCHIP 2011, Nov. 2011.

[200] Xilinx, Analysis of Power Savings from Intelligent Clock Gating, August 2012. XAPP790.

[201] “Open RVC-CAL Applications,” 2014. http://github.com/orcc/orc-apps. Accessed: 25-February-2014.

[202] S. Casale-Brunet, E. Bezati, C. Alberti, M. Mattavelli, E. Amaldi, and J. Janneck, “Partitioning and optimization of high level stream applications for multi clock domain architectures,” in Signal Processing Systems (SiPS), 2013 IEEE Workshop on, pp. 177–182, Oct 2013.

[203] J. Cong and Z. Zhang, “An efficient and versatile scheduling algorithm based on sdc formulation,” in Proceedings of the 43rd Annual Design Automation Conference, DAC ’06, (New York, NY, USA), pp. 433–438, ACM, 2006.

[204] A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, J. H. Anderson, S. Brown, and T. Czajkowski, “Legup: High-level synthesis for fpga-based processor/accelerator systems,” in Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’11, (New York, NY, USA), pp. 33–36, ACM, 2011.

[205] J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic memory partitioning and scheduling for throughput and power optimization,” in Computer-Aided Design - Digest of Technical Papers, 2009. ICCAD 2009. IEEE/ACM International Conference on, pp. 697–704, Nov 2009.

[206] Y. Wang, P. Li, P. Zhang, C. Zhang, and J. Cong, “Memory partitioning for multidimensional arrays in high-level synthesis,” in Proceedings of the 50th Annual Design Automation Conference, DAC ’13, (New York, NY, USA), pp. 12:1–12:8, ACM, 2013.

36 Rue St. Martin, 1005 Lausanne

Endri Bezati · (+41) 076.528.21.61 · [email protected]

Education

2007 - 2010  Engineer EII, INSA de Rennes (Institut National des Sciences Appliquées), France. Master degree in Electronics and Computer Engineering, with options in “Embedded Systems” and “Business Engineering”.
2005 - 2007  DUT GEII, University of Rennes 1, France. Bachelor in Electronics and Computer Engineering.
2004 - 2005  STPI – 1er Cycle, INSA de Rennes (Institut National des Sciences Appliquées). First year of the “Cycle Préparatoire” for engineering students in the EURINSA international section.
2003 - 2004  A-Level, 3rd High School of Piraeus, Greece. “APOLYTIRION”, A-Level equivalent in Mathematics and Engineering, with Honors.

Skills

Information Technology
Programming languages: C, C++, Java, CAL
Hardware languages: Verilog, VHDL, SystemC
Script languages: Bash, JavaScript
APIs: .NET, MFC, GTK, QT, SWT
OS: Windows, Linux, Mac OS X
Embedded OS: Linux RT, Android
IDEs: Visual Studio, Eclipse, Xcode
Image/Video standards: JPEG, MPEG-4, AVC/H.264, SVC, HEVC/H.265

Electronics
Conception tools: OrCAD, PSPICE, ModelSim, Quartus, Xilinx ISE/Vivado
Signal processing: Matlab, LabView, Scilab, Octave
E.D.A.: Altium Designer, Protel, Eagle
P.L.C.: Siemens, Schneider

Professional Experiences

2010  Master Thesis, EPFL SCI-STI-MM, Lausanne, Switzerland. Full-feature dataflow implementation of the CABAC (Context Adaptive Binary Arithmetic Coding) entropy decoder of the AVC/H.264 Main Profile decoder.
2009  Engineer Internship, I.E.T.R., Rennes, France. Dataflow modeling and programming of the MPEG SVC (H.264) decoder for its integration in the video tool library of the new MPEG RVC (Reconfigurable Video Coding) standard.
2007  Bachelor Internship, Thomson Grass Valley (now Technicolor), Rennes, France. Hardware (schematic design, component selection, routing) and software (.NET GUI, C++) design of an automatic digital signal cutter for testing the robustness of the Thomson Grass Valley high-definition video encoders.

Foreign Languages

English: Advanced (TOEIC 920/990)
French: Fluent
Albanian: Mother tongue
Greek: Mother tongue

Center of Interest

IT: Linux kernel, OpenCL, Android
Films: Science fiction, Action, Comedy
Books: Science fiction
Sports: Football, Body building

Publications

2015 A. Prihozhy, E. Bezati, A. Ab-Rahman, and M. Mattavelli, “Synthesis and Optimization of Pipelines for HW Implementations of Dataflow Programs,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, accepted for publication.

2014 E. Bezati, R. Thavot, G. Roquier, and M. Mattavelli, “High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms,” Journal of Real-Time Image Processing, vol. 9, no. 1, pp. 251–262, 2014.

E. Bezati, S. Brunet, M. Mattavelli, and J. Janneck, “Coarse grain clock gating of streaming applications in programmable logic implementations,” in Electronic System Level Synthesis Conference (ESLsyn), Proceedings of the 2014, pp. 1–6, May 2014.

S. Casale-Brunet, M. Wiszniewska, E. Bezati, M. Mattavelli, J. Janneck, and M. Canale, “Turnus: an open- source design space exploration framework for dynamic stream programs,” in Proceedings of Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014.

S. Casale-Brunet, E. Bezati, M. Mattavelli, M. Canale, and J. Janneck, “Execution trace graph analysis of dataflow programs: bounded buffer scheduling and deadlock recovery using model predictive control,” in Proceedings of Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014.

M. Canale, S. Casale-Brunet, E. Bezati, M. Mattavelli, and J. Janneck, “Dataflow programs analysis and optimization using model predictive control techniques: An example of bounded buffer scheduling,” in Signal Processing Systems (SiPS), 2014 IEEE Workshop on, pp. 1–6, Oct 2014.

C. Sau, L. Raffo, F. Palumbo, E. Bezati, S. Casale-Brunet, and M. Mattavelli, “Automated design flow for coarse-grained reconfigurable platforms: An RVC-Cal multi-standard decoder use-case,” in Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 International Conference on, pp. 59–66, July 2014.

J. Janneck, G. Cedersjo, E. Bezati, and S. Casale-Brunet, “Dataflow machines,” in Signals, Systems and Computers, 2014 Asilomar Conference on, Nov 2014.

2013 E. Bezati, S. Brunet, M. Mattavelli, and J. Janneck, “Synthesis and optimization of high-level stream programs,” in Electronic System Level Synthesis Conference (ESLsyn), 2013, pp. 1–6, May 2013.

E. Bezati, G. Roquier, and M. Mattavelli, “Live demonstration: High level software and hardware synthesis of dataflow programs,” in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, p. 660, May 2013.

E. Bezati, M. Mattavelli, and J. Janneck, “High-level synthesis of dataflow programs for signal processing systems,” in Image and Signal Processing and Analysis (ISPA), 2013 8th International Symposium on, pp. 750–754, Sept 2013.

S. Casale-Brunet, A. Elguindy, E. Bezati, R. Thavot, G. Roquier, M. Mattavelli, and J. W. Janneck, “Methods to explore design space for mpeg rmc codec specifications,” Signal Processing: Image Communication, vol. 28, no. 10, pp. 1278–1294, 2013.

S. Casale-Brunet, E. Bezati, C. Alberti, M. Mattavelli, E. Amaldi, and J. Janneck, “Partitioning and optimization of high level stream applications for multi clock domain architectures,” in Signal Processing Systems (SiPS), 2013 IEEE Workshop on, pp. 177–182, Oct 2013.

S. Casale-Brunet, E. Bezati, C. Alberti, G. Roquier, M. Mattavelli, J. Janneck, and J. Boutellier, “Design space exploration and implementation of rvc-cal applications using the turnus framework,” in Design and Architectures for Signal and Image Processing (DASIP), 2013 Conference on, pp. 341–342, Oct 2013.

S. Casale-Brunet, E. Bezati, C. Alberti, M. Mattavelli, E. Amaldi, and J. Janneck, “Multi-clock domain optimization for reconfigurable architectures in high-level dataflow applications,” in Signals, Systems and Computers, 2013 Asilomar Conference on, pp. 1796–1800, Nov 2013.

2012 G. Roquier, E. Bezati, and M. Mattavelli, “Hardware and software synthesis of heterogeneous systems from dataflow programs,” Journal of Electrical and Computer Engineering, vol. 2012, Article ID 484962, 2012.

A. Ab-Rahman, R. Thavot, S. Casale-Brunet, E. Bezati, and M. Mattavelli, “Design space exploration strategies for fpga implementation of signal processing systems using cal dataflow program,” in Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on, pp. 1–8, Oct 2012.

2011 E. Bezati, H. Yviquel, M. Raulet, and M. Mattavelli, “A unified hardware/software co-synthesis solution for signal processing systems,” in Design and Architectures for Signal and Image Processing (DASIP), 2011 Conference on, pp. 1–6, Nov. 2011.

G. Roquier, E. Bezati, R. Thavot, and M. Mattavelli, “Hardware/software co-design of dataflow programs for reconfigurable hardware and multi-core platforms,” in Design and Architectures for Signal and Image Processing (DASIP), 2011 Conference on, pp. 1–7, Nov 2011.

2010 E. Bezati, M. Mattavelli, and M. Raulet, “RVC-Cal dataflow implementations of mpeg avc/h.264 cabac decoding,” in Design and Architectures for Signal and Image Processing (DASIP), 2010 Conference on, pp. 207–213, Oct 2010.

MPEG Contribution: Dataflow CABAC Implementation in RVC-CAL, June 2010
