System-Level Hardware Synthesis of Dataflow Programs with HEVC As Study Use Case

THESE INSA Rennes présentée par sous le sceau de l’Université Bretagne Loire pour obtenir le titre de Mariem Abid DOCTEUR DE L’INSA RENNES ECOLE DOCTORALE : MATISSE Spécialité : Traitement du Signal et de lìmage LABORATOIRE : IETR Thèse soutenue le 28.04.2016 System-Level Hardware devant le jury composé de : Synthesis of dataflow Mohamed Akil Professeur à l’ESIEE Paris (France) / Président et Rapporteur programs with HEVC as Ahmed Chiheb Ammari Associate Professor à Université du roi Abdulaziz à Jeddah (Arabie study use case Saoudite) / Rapporteur Mohamed Atri Maître de Conférences à Faculté des Sciences de Monastir (Tunisie) / Examinateur Audrey Queudet Maître de conférences à l’université de Nantes (France) / Examinateur Olivier Déforges Professeur à l’INSA de Rennes (France) / Directeur de thèse Mohamed Abid Professeur à l’Ecole Nationale d’Ingénieur de Sfax (Tunisie) / Directeur de thèse System-Level Hardware Synthesis of Dataflow Programs with HEVC as Study Use Case Mariem Abid En partenariat avec Document protégé par les droits d’auteur Dedication i ii Abstract Image and video processing applications are characterized by the processing of a huge amount of data. The design of such complex applications with traditional design methodologies at low- level of abstraction causes increasing development costs. In order to resolve the above mentioned challenges, Electronic System Level (ESL) synthesis or High-Level Synthesis (HLS) tools were proposed. The basic premise is to model the behavior of the entire system using high-level specifications, and to enable the automatic synthesis to low-level specifications for efficient implementation in Field-Programmable Gate Array (FPGA). However, the main downside of the HLS tools is the lack of the entire system consideration, i.e. the establishment of the communi- cations between these components to achieve the system-level is not yet considered. The purpose of this thesis is to raise the level of abstraction in the design of embedded systems to the system-level. A novel design flow was proposed that enables an efficient hardware implementation of video processing applications described using a Domain Specific Language (DSL) for dataflow programming. The design flow combines a dataflow compiler for generating C-based HLS descriptions from a dataflow description and a C-to-gate synthesizer for generating Register-Transfer Level (RTL) descriptions. The challenge of implementing the communication channels of dataflow programs relying on Model of Computation (MoC) in FPGA is the mini- mization of the communication overhead. In this issue, we introduced a new interface synthesis approach that maps the large amounts of data that multimedia and image processing applications process, to shared memories on the FPGA. This leads to a tremendous decrease in the latency and an increase in the throughput. These results were demonstrated upon the hardware synthesis of the emerging High-Efficiency Video Coding (HEVC) standard. Résumé Les applications de traitement d'image et vidéosont caractriséespar le traitement d'une grande quantitéde données.La conception de ces applications complexes avec des méthodologies de conception traditionnelles bas niveau provoque l'augmentation des coûtsde développement. Afin de résoudreces défis,des outils de synthèsehaut niveau ont étéproposés.Le principe de base est de modéliserle comportement de l'ensemble du systèmeen utilisant des spécificationshaut niveau afin de permettre la synthèseautomatique vers des spécificationsbas niveau pour implémentation efficace en FPGA. Cependant, l'inconvénient principal de ces outils de synthèsehaut niveau est le manque de prise en compte de la totalitédu système,c.-à-d.la créationde la communication entre les différents composants pour atteindre le niveau systèmen'est pas considérée. Le but de cette thèseest d'élever le niveau d'abstraction dans la conception des systèmesem- barquésau niveau système. Nous proposons un flot de conception qui permet une synthèse matérielleefficace des applications de traitement vidéodécritesen utilisant un langage spécifique àun domaine pour la programmation flot-de-données.Le flot de conception combine un compi- lateur flot-de-donnéespour générerdes descriptions àbase de code C et d'un synthétiseurpour générerdes descriptions niveau de transfert de registre. Le défimajeur de l'implémentation en FPGA des canaux de communication des programmes flot-de-donnéesbaséssur un modèle de calcul est la minimisation des frais générauxde la communication. Pour celà,nous avons introduit une nouvelle approche de synthèsede l'interface qui mappe les grandes quantitésdes donnéesvidéo,àtravers des mémoirespartagéessur FPGA. Ce qui conduit àune diminution considérablede la latence et une augmentation du débit.Ces résultatsont étédémontréssur la synthèsematérielledu standard vidéoémergent High-Efficiency Video Coding (HEVC). ii Acknowledgment iii iv Contents List of Figures ix List of Tables xi Acronyms xv 1 Introduction 1 1.1 Context and Motivation . 1 1.2 Problem Statement and Contributions . 2 1.3 Outline . 3 1.4 Publications . 3 I BACKGROUND 5 2 Fundamentals of Embedded Systems Design 7 2.1 Introduction . 7 2.2 The Embedded Systems Design . 7 2.2.1 What is an embedded system? . 7 2.2.1.1 What is an Application-Specific Integrated Circuit (ASIC)? . 8 2.2.1.2 What is a Field-Programmable Gate Array (FPGA)? . 8 2.2.2 Video compression in FPGA ......................... 9 2.2.3 The embedded systems design challenges . 10 2.2.3.1 Design constraints . 10 2.2.3.2 Design productivity gap . 11 2.3 Hardware Design Methodologies . 11 2.3.1 Levels of abstraction . 12 2.3.2 Bottom-up methodology . 13 2.3.3 Top-down methodology . 13 2.3.4 System design process . 13 2.3.5 Design flow and taxonomy of synthesis . 14 2.4 Register-Transfer Level (RTL) Design . 15 2.4.1 What is a Hardware Description Language (HDL)? . 16 2.4.2 What is wrong with RTL design and HDLs? . 17 2.5 High-Level Synthesis (HLS) Design . 18 2.6 System-Level Design . 22 2.7 Conclusion . 24 Bibliography 27 3 Dataflow Programming in the Reconfigurable Video Coding (RVC) Frame- work 33 3.1 Introduction . 33 3.2 Dataflow Programming . 33 3.3 The RVC Standard . 35 3.3.1 Motivation and Objectives . 35 3.3.2 Structure of the Standard . 35 3.3.3 Instantiation Process of a RVC Abstract Decoder Model (ADM)..... 36 3.4 RVC-Caltrop Actor Language (CAL) Dataflow Programming . 37 v CONTENTS vi 3.4.1 RVC-CAL Language . 37 3.4.2 Representation of Different Model of Computations (MoCs) in RVC-CAL 39 3.5 RVC-CAL Code Generators and Related Work . 42 3.5.1 Open Dataflow environment (OpenDF)................... 42 3.5.2 Open RVC-CAL Compiler (Orcc)....................... 43 3.6 Conclusion . 45 Bibliography 47 II CONTRIBUTIONS 51 4 Toward Efficient Hardware Implementation of RVC-based Video Decoders 53 4.1 Introduction . 53 4.2 Limitations of the Current Solution and Problem Statement . 53 4.3 Rapid Prototyping Methodology . 55 4.3.1 Outline of the Prototyping Process . 55 4.3.2 System-Level Synthesis using Orcc ...................... 57 4.3.2.1 Vivado HLS Coding Style . 57 4.3.2.2 Automatic Communication Refinement . 58 4.3.2.3 Automatic Computation Refinement . 59 4.3.3 HLS using Vivado HLS ............................ 61 4.3.4 System-Level Integration . 62 4.3.5 Automatic Validation . 62 4.4 Rapid Prototyping Results: High-Efficiency Video Coding (HEVC) Decoder Case Study . 63 4.4.1 RVC-CAL Implementation of the HEVC Decoder . 63 4.4.1.1 The HEVC standard . 63 4.4.1.2 The RVC-CAL HEVC Decoder . 66 4.4.1.3 Test Sequences . 66 4.4.2 Optimization Metrics . 68 4.4.3 Experimental Setup . 68 4.4.4 Experimental Results . 69 4.4.4.1 The Main Still Picture profile of HEVC case study . 69 4.4.4.2 The IntraPrediction Functional Unit (FU) Case Study . 71 4.4.4.3 Design-Space Exploration (DSE) Through Optimizations Direc- tives . 71 4.5 Conclusion . 73 Bibliography 75 5 Toward Optimized Hardware Implementation of RVC-based Video Decoders 77 5.1 Introduction . 77 5.2 Issues with Explicit Streaming . 78 5.3 Interface Synthesis Optimization . 79 5.3.1 Shared-Memory Circular Buffer . 79 5.3.2 Scheduling Optimization . 81 5.3.3 Synthesis of Arrays . 82 5.3.4 System-Level Integration . 83 5.3.5 Test Infrastructure . 85 5.4 Experimental Results . 85 5.4.1 The Main Still Picture profile of HEVC case study . 85 5.4.2 The Main profile of HEVC case study . 86 5.4.2.1 Throughput Analysis . 87 5.4.2.2 Latency Analysis . 88 5.4.3 Task Parallelism Optimization . 89 5.4.4 Data parallelism optimization . 90 5.4.4.1 YUV-Parallel RVC-CAL HEVC decoder . 90 5.4.4.2 Ref Design vs. YUV Design .................... 90 CONTENTS vii 5.4.5 Comparison with Other Works . 92 5.4.5.1 System-level hardware synthesis versus hand-coded HDL of the HEVC decoder . 92 5.4.5.2 Comparison with other alternative HLS for RVC-CAL ...... 92 5.5 Conclusion . 93 Bibliography 95 6 Conclusion 97 III APPENDIX 101 Appendix A System-Level Design Flow: Tutorial 103 A.1 A User Guide for the C-HLS Backend: Steps and Requirements . 103 A.2 Write a Very Simple Network . 103 A.2.1 Setup a New Orcc Projet . 103 A.2.2 Implements Actors . 103 A.2.3 Build the Orcc Network .

System-Level Hardware Synthesis of Dataflow Programs with HEVC As Study Use Case

Open Source Synthesis and Verification Tool for Fixed-To-Floating and Floating-To-Fixed Points Conversions

Xilinx Vivado Design Suite User Guide: Release Notes, Installation, And

Vivado Design Suite User Guide: High-Level Synthesis (UG902)

Vivado Tutorial

Xilinx Vivado Design Suite 7 Series FPGA and Zynq-7000 All Programmable Soc Libraries Guide (UG953)

Vivado Design Suite User Guide: Implementation

UG908 (V2019.2) October 30, 2019 Revision History

Xilinx Vivado Design Suite User Guide: Release Notes, Installation, and Licensing (IUG973)

Analýza Síťového Provozu Na Síťové Kartě Fpga Network Traffic Analysis on Fpga Network Card

A Parallel Bandit-Based Approach for Autotuning FPGA Compilation

Eee4120f Hpes

Xilinx Vivado Design Suite User Guide: Release Notes, Installation, And