High-Level Synthesis and Arithmetic Optimizations Yohann Uguen

High-level synthesis and arithmetic optimizations Yohann Uguen To cite this version: Yohann Uguen. High-level synthesis and arithmetic optimizations. Mechanics [physics.med-ph]. Uni- versité de Lyon, 2019. English. NNT : 2019LYSEI099. tel-02420901v2 HAL Id: tel-02420901 https://hal.archives-ouvertes.fr/tel-02420901v2 Submitted on 17 Jul 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. N°d'ordre NNT : 2019LYSEI099 THESE` de DOCTORAT DE L'UNIVERSITE´ DE LYON Opéréeau sein de : (INSA de Lyon, CITI lab) Ecole Doctorale InfoMaths EDA 512 (Informatique Mathématique) Spécialitéde doctorat : Informatique Soutenue publiquement le 13/11/2019, par : Yohann Uguen High-level synthesis and arithmetic optimizations Devant le jury composéde : FrédéricPétrot Président Professeur des Universités, TIMA, Grenoble, France Philippe Coussy Rapporteur Professeur des Universités, UBS, Lorient, France Olivier Sentieys Rapporteur Professeur des Universités, Univ. Rennes, Inria, IRISA, Rennes Laure Gonnord Examinatrice Ma^ıtrede conférence, UniversitéLyon 1, France Martin Kumm Examinateur Professeur des Universités, Universitéde Fulda, Allemagne Florent de Dinechin Directeur de thèse Professeur des Universités, INSA Lyon, France Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2019LYSEI099/these.pdf © [Y. Uguen], [2019], INSA Lyon, tous droits réservés Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2019LYSEI099/these.pdf © [Y. Uguen], [2019], INSA Lyon, tous droits réservés Résumé A` cause de la nature relativement jeune des outils de synthèsede haut-niveau (HLS), de nombreuses optimisations arithmétiquesn'y sont pas encore implémentées.Cette thèse propose des optimisations arithmétiquesse servant du contexte spécifiquedans lequel les opérateurs sont instanciés. Certaines optimisations sont de simples spécialisations d'opérateurs,respectant la sémantique du C. D'autres nécéssitent de s'éloignerde cette sémantique pour améliorerle compromis précision/coût/performance. Cette proposition est démontrésur des sommes de produits de nombres flottants. La somme est réalisée dans un format en virgule-fixe définipar son contexte. Quand trop peu d'informations sont disponibles pour définir ce format en virgule-fixe, une stratégieest de générerun accumulateur couvrant l'intégralitédu format flottant. Cette thèseexplore plusieurs implémentations d'un tel accumulateur. L'utilisation d'une représentation en complément àdeux permet de réduirele chemin critique de la boucle d'accumulation, ainsi que la quantitéde ressources utilisées. Un format alternatif aux nombres flottants, appeléposit, propose d'utiliser un encodage àprécisionvariable. De plus, ce format est augmentépar un accumulateur exact. Pour évaluer précisément le coûtmatériel de ce format, cette thèseprésente des architectures d'opérateursposits, implémentésavec le mêmedegréd'optimisation que celui de l'état de l'art des opérateursflottants. Une analyse détailléemontre que le coûtdes opérateurs posits est malgrétout bien plus élevéque celui de leurs équivalents flottants. Enfin, cette thèseprésente une couche de compatibilitéentre outils de HLS, permettant de viser plusieurs outils avec un seul code. Cette bibliothèqueimplémente un type d'entiers de taille variable, avec de plus une sémantique strictement typée,ainsi qu'un ensemble d'opérateursad-hoc optimisés. 1 Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2019LYSEI099/these.pdf © [Y. Uguen], [2019], INSA Lyon, tous droits réservés 2 Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2019LYSEI099/these.pdf © [Y. Uguen], [2019], INSA Lyon, tous droits réservés Abstract High-level synthesis (HLS) tools offer increased productivity regarding FPGA programming. However, due to their relatively young nature, they still lack many arithmetic optimizations. This thesis proposes safe arithmetic optimizations that should always be applied. These optimizations are simple operator specializations, following the C semantic. Other require to a lift the semantic embedded in high-level input program languages, which are inherited from software programming, for an improved accuracy/cost/performance ratio. To demonstrate this claim, the sum-of-product of floating-point numbers is used as a case study. The sum is performed on a fixed-point format, which is tailored to the application, according to the context in which the operator is instantiated. In some cases, there is not enough information about the input data to tailor the fixed- point accumulator. The fall-back strategy used in this thesis is to generate an accumulator covering the entire floating-point range. This thesis explores different strategies for implementing such a large accumulator, including new ones. The use of a 2's complement representation instead of a sign+magnitude is demonstrated to save resources and to reduce the accumulation loop delay. Based on a tapered precision scheme and an exact accumulator, the posit number systems claims to be a candidate to replace the IEEE floating-point format. A throughout analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators. Their cost remains much higher that their floating-point counterparts in terms of resource usage and performance. Finally, this thesis presents a compatibility layer for HLS tools that allows one code to be deployed on multiple tools. This library implements a strongly typed custom size integer type along side a set of optimized custom operators. 3 Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2019LYSEI099/these.pdf © [Y. Uguen], [2019], INSA Lyon, tous droits réservés 4 Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2019LYSEI099/these.pdf © [Y. Uguen], [2019], INSA Lyon, tous droits réservés CONTENTS Contents 1 Introduction 9 2 Context 13 2.1 Representing real numbers . 13 2.1.1 Fixed-point representation . 13 2.1.2 Floating-point representation . 15 2.2 Field Programmable Gate Arrays (FPGAs) . 20 2.2.1 FPGAs architecture . 20 2.2.2 FPGAs programming model . 23 3 Bridging high-level synthesis and application specific arithmetic 29 3.1 Optimization examples that do not change the program semantic . 31 3.1.1 Floating-point corner-case optimization . 31 3.1.2 Integer multiplication by a constant . 33 3.1.3 Integer division by small constants . 33 3.1.4 Floating-point multiplications and divisions by small constants . 36 3.1.5 Evaluation in context . 37 3.2 Optimization examples that change the program semantic . 38 3.2.1 High-level synthesis (HLS) faithful to the floats . 38 3.2.2 Towards HLS faithful to the reals . 39 3.2.3 The arithmetic side: application-specific accumulator support . 40 3.2.4 The compiler side: source-to-source transformation . 43 3.2.5 Evaluation in context . 48 3.3 Discussion . 51 4 Architecture exploration of exact floating-point accumulators 53 4.1 Parameters of an accumulator for exact sums and sums-of-products . 54 4.2 Kulisch-1: Long adder and long shift . 55 4.2.1 Kulisch-1 with the accumulator in sign+magnitude . 55 4.2.2 Kulisch-1 with the accumulator in 2's complement . 56 4.2.3 Kulisch-1 high-radix carry-save architecture . 57 4.3 Segmenting the accumulator into words . 58 4.3.1 Kulisch-2: Segmented accumulator in RAM . 58 4.3.2 Kulisch-3: Sub-adders with delayed carry propagation . 59 4.4 Recovering the accumulator values . 61 4.5 Evaluation of Kulisch accumulators . 63 4.5.1 Cost/frequency/latency trade-off . 64 5 Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2019LYSEI099/these.pdf © [Y. Uguen], [2019], INSA Lyon, tous droits réservés CONTENTS 4.5.2 Comparisons with plain floating-point accumulation . 65 4.6 Discussion . 66 5 Architecture exploration of operators for the posit number system 67 5.1 Posit representation . 68 5.1.1 Decoding posits . 68 5.1.2 Posits bounds and sizes . 69 5.2 Architecture . 70 5.2.1 Posit intermediate format (PIF) . 70 5.2.2 Posit to PIF decoder . 71 5.2.3 PIF to posit encoder . 72 5.2.4 PIF adder/subtracter and multiplier . 73 5.2.5 Quire . 75 5.3 Evaluation . 76 5.3.1 Comparison with state-of-the-art posit implementations . 76 5.3.2 Comparison with floating-point operators . 80 5.3.3 Quire evaluation . 81 5.4 Using posits as a storage format only . 82 5.5 Discussion . 85 6 A type-safe arbitrary precision arithmetic portability layer for HLS tools 87 6.1 Background and motivation . 88 6.1.1 Arbitrary-sized integers for HLS . 88 6.1.2 Type safety . 89 6.1.3 Core arithmetic primitives for floating-point and posits . 90 6.1.4 Fused arithmetic primitives . 90 6.1.5 Support of these primitives in HLS tools . 91 6.2 Type safety for arbitrary-precision integers in HLS . 91 6.2.1 Variable declaration . 91 6.2.2 Variable assignment . 91 6.2.3 Slicing . 92 6.2.4 Concatenation . 92 6.2.5 Others . 92 6.3 Portability to mainstream HLS tools .

High-Level Synthesis and Arithmetic Optimizations Yohann Uguen

Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions Mioara Joldes, Olivier Marty, Jean-Michel Muller, Valentina Popescu

Variable Precision in Modern Floating-Point Computing

Floating Points

Integers in C: an Open Invitation to Security Attacks?

Hacking in C 2020 the C Programming Language Thom Wiggers

Floating Point Formats

Understanding Integer Overflow in C/C++

5. Data Types

Extended Precision Floating Point Arithmetic

A Library for Interval Arithmetic Was Developed

Signedness-Agnostic Program Analysis: Precise Integer Bounds for Low-Level Code

A Variable Precision Hardware Acceleration for Scientific Computing Andrea Bocco