System-Level Hardware Synthesis of Dataflow Programs with HEVC As Study Use Case
Total Page:16
File Type:pdf, Size:1020Kb
THESE INSA Rennes présentée par sous le sceau de l’Université Bretagne Loire pour obtenir le titre de Mariem Abid DOCTEUR DE L’INSA RENNES ECOLE DOCTORALE : MATISSE Spécialité : Traitement du Signal et de l`image LABORATOIRE : IETR Thèse soutenue le 28.04.2016 System-Level Hardware devant le jury composé de : Synthesis of dataflow Mohamed Akil Professeur à l’ESIEE Paris (France) / Président et Rapporteur programs with HEVC as Ahmed Chiheb Ammari Associate Professor à Université du roi Abdulaziz à Jeddah (Arabie study use case Saoudite) / Rapporteur Mohamed Atri Maître de Conférences à Faculté des Sciences de Monastir (Tunisie) / Examinateur Audrey Queudet Maître de conférences à l’université de Nantes (France) / Examinateur Olivier Déforges Professeur à l’INSA de Rennes (France) / Directeur de thèse Mohamed Abid Professeur à l’Ecole Nationale d’Ingénieur de Sfax (Tunisie) / Directeur de thèse System-Level Hardware Synthesis of Dataflow Programs with HEVC as Study Use Case Mariem Abid En partenariat avec Document protégé par les droits d’auteur Dedication i ii Abstract Image and video processing applications are characterized by the processing of a huge amount of data. The design of such complex applications with traditional design methodologies at low- level of abstraction causes increasing development costs. In order to resolve the above mentioned challenges, Electronic System Level (ESL) synthesis or High-Level Synthesis (HLS) tools were proposed. The basic premise is to model the behavior of the entire system using high-level specifications, and to enable the automatic synthesis to low-level specifications for efficient im- plementation in Field-Programmable Gate Array (FPGA). However, the main downside of the HLS tools is the lack of the entire system consideration, i.e. the establishment of the communi- cations between these components to achieve the system-level is not yet considered. The purpose of this thesis is to raise the level of abstraction in the design of embedded sys- tems to the system-level. A novel design flow was proposed that enables an efficient hardware implementation of video processing applications described using a Domain Specific Language (DSL) for dataflow programming. The design flow combines a dataflow compiler for generating C-based HLS descriptions from a dataflow description and a C-to-gate synthesizer for generating Register-Transfer Level (RTL) descriptions. The challenge of implementing the communication channels of dataflow programs relying on Model of Computation (MoC) in FPGA is the mini- mization of the communication overhead. In this issue, we introduced a new interface synthesis approach that maps the large amounts of data that multimedia and image processing applica- tions process, to shared memories on the FPGA. This leads to a tremendous decrease in the latency and an increase in the throughput. These results were demonstrated upon the hardware synthesis of the emerging High-Efficiency Video Coding (HEVC) standard. R´esum´e Les applications de traitement d'image et vid´eosont caractris´eespar le traitement d'une grande quantit´ede donn´ees.La conception de ces applications complexes avec des m´ethodologies de con- ception traditionnelles bas niveau provoque l'augmentation des co^utsde d´eveloppement. Afin de r´esoudreces d´efis,des outils de synth`esehaut niveau ont ´et´epropos´es.Le principe de base est de mod´eliserle comportement de l'ensemble du syst`emeen utilisant des sp´ecificationshaut niveau afin de permettre la synth`eseautomatique vers des sp´ecificationsbas niveau pour impl´ementation efficace en FPGA. Cependant, l'inconv´enient principal de ces outils de synth`esehaut niveau est le manque de prise en compte de la totalit´edu syst`eme,c.-`a-d.la cr´eationde la communication entre les diff´erents composants pour atteindre le niveau syst`emen'est pas consid´er´ee. Le but de cette th`eseest d'´elever le niveau d'abstraction dans la conception des syst`emesem- barqu´esau niveau syst`eme. Nous proposons un flot de conception qui permet une synth`ese mat´erielleefficace des applications de traitement vid´eod´ecritesen utilisant un langage sp´ecifique `aun domaine pour la programmation flot-de-donn´ees.Le flot de conception combine un compi- lateur flot-de-donn´eespour g´en´ererdes descriptions `abase de code C et d'un synth´etiseurpour g´en´ererdes descriptions niveau de transfert de registre. Le d´efimajeur de l'impl´ementation en FPGA des canaux de communication des programmes flot-de-donn´eesbas´essur un mod`ele de calcul est la minimisation des frais g´en´erauxde la communication. Pour cel`a,nous avons introduit une nouvelle approche de synth`esede l'interface qui mappe les grandes quantit´esdes donn´eesvid´eo,`atravers des m´emoirespartag´eessur FPGA. Ce qui conduit `aune diminution consid´erablede la latence et une augmentation du d´ebit.Ces r´esultatsont ´et´ed´emontr´essur la synth`esemat´erielledu standard vid´eo´emergent High-Efficiency Video Coding (HEVC). ii Acknowledgment iii iv Contents List of Figures ix List of Tables xi Acronyms xv 1 Introduction 1 1.1 Context and Motivation . 1 1.2 Problem Statement and Contributions . 2 1.3 Outline . 3 1.4 Publications . 3 I BACKGROUND 5 2 Fundamentals of Embedded Systems Design 7 2.1 Introduction . 7 2.2 The Embedded Systems Design . 7 2.2.1 What is an embedded system? . 7 2.2.1.1 What is an Application-Specific Integrated Circuit (ASIC)? . 8 2.2.1.2 What is a Field-Programmable Gate Array (FPGA)? . 8 2.2.2 Video compression in FPGA ......................... 9 2.2.3 The embedded systems design challenges . 10 2.2.3.1 Design constraints . 10 2.2.3.2 Design productivity gap . 11 2.3 Hardware Design Methodologies . 11 2.3.1 Levels of abstraction . 12 2.3.2 Bottom-up methodology . 13 2.3.3 Top-down methodology . 13 2.3.4 System design process . 13 2.3.5 Design flow and taxonomy of synthesis . 14 2.4 Register-Transfer Level (RTL) Design . 15 2.4.1 What is a Hardware Description Language (HDL)? . 16 2.4.2 What is wrong with RTL design and HDLs? . 17 2.5 High-Level Synthesis (HLS) Design . 18 2.6 System-Level Design . 22 2.7 Conclusion . 24 Bibliography 27 3 Dataflow Programming in the Reconfigurable Video Coding (RVC) Frame- work 33 3.1 Introduction . 33 3.2 Dataflow Programming . 33 3.3 The RVC Standard . 35 3.3.1 Motivation and Objectives . 35 3.3.2 Structure of the Standard . 35 3.3.3 Instantiation Process of a RVC Abstract Decoder Model (ADM)..... 36 3.4 RVC-Caltrop Actor Language (CAL) Dataflow Programming . 37 v CONTENTS vi 3.4.1 RVC-CAL Language . 37 3.4.2 Representation of Different Model of Computations (MoCs) in RVC-CAL 39 3.5 RVC-CAL Code Generators and Related Work . 42 3.5.1 Open Dataflow environment (OpenDF)................... 42 3.5.2 Open RVC-CAL Compiler (Orcc)....................... 43 3.6 Conclusion . 45 Bibliography 47 II CONTRIBUTIONS 51 4 Toward Efficient Hardware Implementation of RVC-based Video Decoders 53 4.1 Introduction . 53 4.2 Limitations of the Current Solution and Problem Statement . 53 4.3 Rapid Prototyping Methodology . 55 4.3.1 Outline of the Prototyping Process . 55 4.3.2 System-Level Synthesis using Orcc ...................... 57 4.3.2.1 Vivado HLS Coding Style . 57 4.3.2.2 Automatic Communication Refinement . 58 4.3.2.3 Automatic Computation Refinement . 59 4.3.3 HLS using Vivado HLS ............................ 61 4.3.4 System-Level Integration . 62 4.3.5 Automatic Validation . 62 4.4 Rapid Prototyping Results: High-Efficiency Video Coding (HEVC) Decoder Case Study . 63 4.4.1 RVC-CAL Implementation of the HEVC Decoder . 63 4.4.1.1 The HEVC standard . 63 4.4.1.2 The RVC-CAL HEVC Decoder . 66 4.4.1.3 Test Sequences . 66 4.4.2 Optimization Metrics . 68 4.4.3 Experimental Setup . 68 4.4.4 Experimental Results . 69 4.4.4.1 The Main Still Picture profile of HEVC case study . 69 4.4.4.2 The IntraPrediction Functional Unit (FU) Case Study . 71 4.4.4.3 Design-Space Exploration (DSE) Through Optimizations Direc- tives . 71 4.5 Conclusion . 73 Bibliography 75 5 Toward Optimized Hardware Implementation of RVC-based Video Decoders 77 5.1 Introduction . 77 5.2 Issues with Explicit Streaming . 78 5.3 Interface Synthesis Optimization . 79 5.3.1 Shared-Memory Circular Buffer . 79 5.3.2 Scheduling Optimization . 81 5.3.3 Synthesis of Arrays . 82 5.3.4 System-Level Integration . 83 5.3.5 Test Infrastructure . 85 5.4 Experimental Results . 85 5.4.1 The Main Still Picture profile of HEVC case study . 85 5.4.2 The Main profile of HEVC case study . 86 5.4.2.1 Throughput Analysis . 87 5.4.2.2 Latency Analysis . 88 5.4.3 Task Parallelism Optimization . 89 5.4.4 Data parallelism optimization . 90 5.4.4.1 YUV-Parallel RVC-CAL HEVC decoder . 90 5.4.4.2 Ref Design vs. YUV Design .................... 90 CONTENTS vii 5.4.5 Comparison with Other Works . 92 5.4.5.1 System-level hardware synthesis versus hand-coded HDL of the HEVC decoder . 92 5.4.5.2 Comparison with other alternative HLS for RVC-CAL ...... 92 5.5 Conclusion . 93 Bibliography 95 6 Conclusion 97 III APPENDIX 101 Appendix A System-Level Design Flow: Tutorial 103 A.1 A User Guide for the C-HLS Backend: Steps and Requirements . 103 A.2 Write a Very Simple Network . 103 A.2.1 Setup a New Orcc Projet . 103 A.2.2 Implements Actors . 103 A.2.3 Build the Orcc Network .