Acceleration of a Bioinformatics Application using High-Level Synthesis

Naeem Abbas

To cite this version:

Naeem Abbas. Acceleration of a bioinformatics application using high-level synthesis. Other [cs.OH]. École normale supérieure de Cachan - ENS Cachan, 2012. English. NNT: 2012DENS0019. tel-00847076

HAL Id: tel-00847076 https://tel.archives-ouvertes.fr/tel-00847076 Submitted on 22 Jul 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

THÈSE / ENS CACHAN - BRETAGNE
Presented by Naeem Abbas, under the seal of the Université européenne de Bretagne, to obtain the title of Docteur de l'École normale supérieure de Cachan, Mention: Informatique.
Prepared at the Unité Mixte de Recherche 6074, Institut de recherche en informatique et systèmes aléatoires, École doctorale MATISSE.

Acceleration of a Bioinformatics Application using High-Level Synthesis

Thesis defended on 22 May 2012 before a jury composed of:
Philippe COUSSY, Maître de conférences - Université de Bretagne Sud / rapporteur
Florent DE DINECHIN, Maître de conférences - ENS Lyon / rapporteur
Rumen ANDONOV, Professeur des universités - Université de Rennes 1 / examinateur
Tanguy RISSET, Professeur des universités - INSA de Lyon / examinateur
Steven DERRIEN, Maître de conférences - Université de Rennes 1 / directeur de thèse
Patrice QUINTON, Professeur des universités - ENS Cachan-Bretagne / directeur de thèse

Abstract

The revolutionary advancements in the field of bioinformatics have opened new horizons in biological and pharmaceutical research. However, the existing bioinformatics tools are unable to meet the computational demands, due to the recent exponential growth in biological data. There is therefore a pressing need to build future bioinformatics platforms incorporating modern parallel computation techniques.

In this work, we investigate FPGA based acceleration of these applications, using High-Level Synthesis. High-Level Synthesis tools enable automatic translation of abstract specifications to a hardware design, considerably reducing the design effort. However, the generation of efficient hardware with these tools is often a challenge for designers. Our research effort encompasses an exploration of the techniques and practices that can lead to the generation of an efficient design from these high-level synthesis tools.

We illustrate our methodology by accelerating an application widely used in the bioinformatics community, HMMER. HMMER is well known for its compute-intensive kernels and for data dependencies that lead to a sequential execution. We propose an original parallelization scheme based on rewriting its mathematical formulation, followed by an in-depth exploration of hardware mapping techniques for these kernels, and finally show on-board acceleration results.

Our work demonstrates the design of flexible hardware accelerators for bioinformatics applications, using design methodologies which are more efficient than the traditional ones, and whose resulting designs are scalable enough to meet future requirements and easily retargetable from one platform to another.

Contents

1 Introduction 1
1.1 High Performance Computing for Bioinformatics ...... 1
1.2 FPGA based Hardware Acceleration ...... 3
1.3 FPGA Design Flow ...... 3
1.3.1 High-Level Synthesis ...... 5
1.4 Parallelization with Reductions and Parallel Prefixes ...... 6
1.5 Contributions of this Thesis ...... 7

2 Introduction 9
2.1 High Performance Computing for Bioinformatics ...... 9
2.2 FPGA based Hardware Acceleration ...... 10
2.3 FPGA Design Flow ...... 11
2.3.1 High-level Synthesis ...... 13
2.4 Exploiting Parallelism with Reductions and Prefixes ...... 14
2.5 Contributions of this work ...... 15

3 An Introduction to Bioinformatics Algorithms 17
3.1 DNA, RNA & Proteins ...... 17
3.2 Sequence Alignment ...... 19
3.2.1 Pairwise Sequence Alignment ...... 20
3.2.2 Multiple Sequence Alignment ...... 23
3.2.3 The HMMER tool suite ...... 27
3.2.4 Computational Complexity ...... 29
3.3 RNA Structure Prediction ...... 30
3.3.1 The Nussinov Algorithm ...... 31
3.3.2 The Zuker Algorithm ...... 32
3.4 High Performance Bioinformatics ...... 33
3.5 Conclusion ...... 34

4 HLS Based Acceleration: From C to Circuit 35
4.1 Reconfigurable Computing ...... 35
4.2 Accelerators for Biocomputing ...... 38
4.3 High Level Synthesis ...... 39
4.3.1 Advantages of HLS over RTL coding ...... 39
4.4 HLS Design Steps ...... 41
4.4.1 Compilation ...... 41
4.4.2 Operation Scheduling ...... 41
4.4.3 Allocation & Binding ...... 47
4.4.4 Generation ...... 50
4.5 High Level Synthesis Tools: An Overview ...... 50
4.5.1 Impulse C ...... 51
4.5.2 Catapult C ...... 52
4.5.3 MMAlpha ...... 53
4.5.4 C2H ...... 54
4.6 Conclusion ...... 54

5 Efficient Hardware Generation with HLS 57
5.1 Bit-Level Transformations ...... 57
5.1.1 Bit-Width Narrowing ...... 58
5.1.2 Bit-level Optimization ...... 59
5.2 Instruction-level transformations ...... 60
5.2.1 Operator Strength Reduction ...... 60
5.2.2 Height Reduction ...... 61
5.2.3 Code Motion ...... 63
5.3 Loop Transformations ...... 65
5.3.1 Unrolling ...... 65
5.3.2 Loop Interchange ...... 66
5.3.3 Loop Shifting ...... 66
5.3.4 Loop Peeling ...... 68
5.3.5 Loop Skewing ...... 68
5.3.6 Loop Fusion ...... 68
5.3.7 C-Slowing ...... 69
5.3.8 Loop Tiling & Strip-mining ...... 70
5.3.9 Memory Splitting & Interleaving ...... 70
5.3.10 Data Replication, Reuse and Scalar Replacement ...... 71
5.3.11 Array Contraction ...... 73
5.3.12 Data Prefetching ...... 73
5.3.13 Memory Duplication ...... 74
5.4 Conclusion ...... 76

6 Extracting Parallelism in HMMER 79
6.1 Introduction ...... 79
6.2 Background ...... 80
6.2.1 Profile HMMs ...... 80
6.2.2 P7Viterbi Algorithm Description ...... 81
6.2.3 Look ahead Computations ...... 82
6.3 Related work ...... 83
6.3.1 Early Implementations ...... 83
6.3.2 Speculative Execution of the Viterbi Algorithm ...... 85
6.3.3 GPU Implementations of HMMER ...... 87
6.3.4 HMMER3 and the Multi Ungapped Segment Heuristic ...... 87
6.3.5 Accelerating the Complete HMMER3 Pipeline ...... 89
6.4 Rewriting the MSV Kernel ...... 89
6.5 Rewriting the P7Viterbi Kernel ...... 90
6.5.1 Finding Reductions ...... 91
6.5.2 Impact of the Data-Dependence Graph ...... 93
6.6 Parallel Prefix Networks ...... 94
6.7 Conclusion ...... 96

7 Hardware Mapping of HMMER 97
7.1 Hardware Mapping ...... 98
7.1.1 Architecture with a Single Combinational Datapath ...... 98
7.1.2 A C-slowed Pipelined Datapath ...... 98
7.1.3 Implementing the Max-Prefix Operator ...... 99
7.1.4 Managing Resource Constraints through Tiling ...... 100
7.1.5 Accelerating the Full HMMER Pipeline ...... 101
7.2 Implementation through High-Level Synthesis ...... 102
7.2.1 Loop Transformations ...... 102
7.2.2 Loop Unroll & Memory Partitioning ...... 103
7.2.3 Ping-Pong Memories ...... 103
7.2.4 Scalar Replication ...... 103
7.2.5 Memory Duplication ...... 104
7.3 Experimental results ...... 104
7.3.1 Area/Speed Results for the MSV Filter ...... 104
7.3.2 Area/Speed Results for Max-Prefix Networks ...... 104
7.3.3 Area/Speed Results for the P7Viterbi Filter ...... 105
7.3.4 System level performance ...... 107
7.3.5 A Complete System-Level Redesign ...... 108
7.3.6 Discussion ...... 110
7.4 Conclusion ...... 110

8 Conclusion & Future Perspectives 113
8.1 Conclusion ...... 113
8.2 Future Perspectives ...... 116

1 Introduction

1.1 High Performance Computing for Bioinformatics

Bioinformatics is a relatively young field, but one that has attracted growing interest from the scientific community over the last decade. It spans very diverse disciplines, including biology, genetics and computer science, but also mathematics. The primary goal of bioinformatics is to provide biologists with software tools for analyzing data derived from genetic sequences (e.g. DNA, RNA and/or proteins), in order to discover or predict the biological functions associated with these sequences. The problems addressed by bioinformatics are numerous; among them are gene discovery in DNA sequences, the prediction (and classification) of protein structures and functions, and the automatic construction of phylogenetic trees for the study of evolutionary relationships.

Moreover, the last decade has seen the emergence of high-throughput DNA sequencing techniques, which have enabled major advances (the complete sequencing of the human genome [VAM+01], plant genome annotation projects [SRV+07]). This progress has in turn translated into an explosion of the volume of genomic data (DNA, proteins) available to the community, as illustrated by Figure 1.1, which shows the growth of the NCBI GenBank [NCB11] (DNA) and UniProt [INT] (protein) databases. Note that new generations of sequencing technologies make the extraction of huge quantities of sequences even easier, and will certainly accentuate this exponential growth.

Researchers are therefore now facing a major challenge: extracting, from these gigantic volumes of data, information useful for understanding biological phenomena. The tools traditionally used by the bioinformatics community were not designed to operate on such masses of data, and the amount of computation involved in these analysis tools has grown to the point of becoming a bottleneck.


[Plot (a): Number of Base Pairs in GenBank, in billions]

Figure 1.1: The exponential growth of the (a) GenBank and (b) UniProtKB databases [NCB11, INT].

A large body of work has therefore focused on the use of parallel machines to reduce these computation times. While early efforts mostly targeted classical supercomputing architectures [SRG03, YHK09, CCSV04, GCBT10] (grids, clusters), the democratization of multi-core architectures [Edd, LBP+08] and the emergence of GPGPU¹ have made these approaches more popular. Besides this work on general-purpose programmable architectures, one must also mention the use of specialized hardware accelerators based on programmable logic [HMS+07, SKD06, DQ07, LST], which demonstrated that very high acceleration factors can be obtained while keeping power consumption, and hence maintenance costs, very reasonable.

The increase in density and speed of FPGA circuits has thus favored the emergence of reconfigurable hardware accelerators oriented toward high performance computing (HPC), with applications in financial computing [ZLH+05, WV08], weather simulation [AT01], video processing [LSK+05] and also bioinformatics [DQ07, SKD06]. FPGA accelerators have proved to be hardware architectures well suited to the implementation of bioinformatics workloads. These workloads often expose a significant level of fine-grain parallelism, which can then be exploited very efficiently by an FPGA implementation. A large share of bioinformatics algorithms indeed relies on dynamic programming techniques, among others for sequence comparison (Smith-Waterman [SW81], Needleman-Wunsch [NW70] and BLAST [AGM+90]), multiple sequence alignment (CLUSTALW [THG94]), profile search (HMMER [Edd]), RNA sequence folding (MFOLD [Zuk03]) and even the construction of phylogenetic trees (PHYLIP [Fel93]). The regular nature of the computations performed in these algorithms lends itself easily to parallelization on a regular array architecture with local communications.

¹ GPGPU aims at using the very large computing capabilities of graphics cards to accelerate scientific computations.

1.2 FPGA based Hardware Acceleration

FPGAs are essentially a huge matrix of programmable logic cells; they can therefore be configured to implement a large number of specialized hardware datapaths operating in parallel. Developers can thus directly implement a hardware accelerator dedicated to the application, and reap the performance gains brought by parallelism and specialization. On an FPGA, this parallelism can take many forms: task parallelism, by implementing several computation cores operating in parallel, or operation-level parallelism, through the use of complex pipelined datapaths.

Because FPGAs run at clock frequencies much lower than processors (on average by a factor of 10), they must compensate for their relative slowness by exploiting a massive level of parallelism within the circuit, while making sure that the circuit can be fed with data at a sufficient rate. One technique used to improve both the degree of parallelism and the operating frequency of circuits mapped on the FPGA is to use reduced-precision data encodings (arbitrary-precision integers, and fixed-point encoding in place of floating point). Here again, bioinformatics algorithms lend themselves very well to this kind of optimization (for example, a DNA base can be encoded on 2 bits instead of a full byte). These characteristics make them very good candidates for hardware acceleration on FPGA, especially compared to machines such as GPUs, which are more oriented toward floating-point computation.

A large body of work has therefore addressed the FPGA implementation of hardware accelerators for the most commonly used algorithms [HMS+07, SKD06, DQ07]. These implementations, which demonstrated very encouraging acceleration factors, are based on circuit specifications written in VHDL or Verilog and very heavily optimized for a given FPGA technology. This kind of approach raises portability problems, and moving from one FPGA accelerator to another often requires redesigning the circuit from scratch. The next section addresses this problem and discusses the relevance of high-level synthesis tools in this context.

1.3 FPGA Design Flow

The standard design flow for FPGA circuits is largely borrowed from that of ASICs. Its main steps are shown in Figure 1.2a; they only concern the hardware part of a hardware-software co-design, the embedded software being developed with classical compilation toolchains. The first step of this flow consists in defining the functional specification of the components in high-level languages (C, C++, Matlab), in order to determine the exact behavior of the system. Once this specification is validated, the designer must define a hardware architecture able to satisfy the performance, cost and power-consumption constraints imposed by the requirements.

Figure 1.2: FPGA design flows: (a) Traditional design flow based on hardware description languages (HDLs). Describing an application in HDL is delicate and requires a significant verification effort. (b) Flow based on high-level synthesis tools: the manual RTL design step is replaced by a high-level behavioral description, followed by an automatic generation of the RTL description.

Once the architecture is defined, designers must describe it at the RTL (Register Transfer Level) using hardware description languages (Verilog or VHDL) or schematic entry. This description is then validated through simulation, in order to guarantee its correctness. Once verified, the circuit description is synthesized, i.e. transformed into a representation based on the logic primitives of the target FPGA, called a netlist. This representation is then placed and routed on the target FPGA, which allows a bitstream file to be derived that will be used to configure the FPGA.

This design flow remains very complex, however, and often requires many iterations before an operational hardware configuration is obtained. The first difficulty is choosing the architectural target well (FPGA type, processing and memory capacity, etc.), since this choice conditions a large share of the subsequent design decisions. A poor initial choice can thus have a very significant impact on the overall design effort. The second (and main) difficulty is the RTL-level specification of the accelerator architecture, done with hardware description languages such as VHDL or Verilog. This step is very tedious and requires a very long debugging phase, with many iterations between the specification and validation steps.

The ever-increasing complexity of electronic systems, illustrated by the constant growth of the functionality integrated on a single FPGA, makes this RTL design step more and more critical [CD08]. As a matter of fact, the design tools used to implement 4G wireless communication systems are the same as for the GSM standard, despite the enormous complexity gap between these two standards. A large body of research has therefore addressed this problem by proposing to raise the level of abstraction used for component specification. The goal is to offer tools that automatically generate RTL descriptions from algorithmic specifications written in higher-level languages such as C or SystemC. Such tools are known as high-level synthesis tools.

1.3.1 High-Level Synthesis

High-Level Synthesis (HLS) tools primarily aim at reducing design time, by using specifications at a higher level than those offered by RTL-based approaches. Besides reducing the design time itself, HLS tools also strongly reduce verification time, by decreasing the number of iterations needed to obtain a functional component. Moreover, by freeing the designer from clock management, resource sharing and memory interfacing, these tools also reduce the risk of errors.

Porting an RTL specification from one technology to another often comes at the price of lower performance and of higher resource and power cost [Fin10]. On the contrary, because an HLS specification is written at the functional level, porting a hardware IP from one platform to another is simplified, since the HLS tool takes care of the technology mapping. However, hardware architectures generated automatically from a more abstract specification level are rarely as efficient as manual implementations. As a consequence, the poor performance obtained by a naive use of these tools limits the interest of FPGAs in a high-performance computing context. This poor performance is explained by the inability of these tools to extract a sufficiently high level of parallelism. Hardware accelerators produced by these tools therefore struggle to compete with GPU and multi-core architectures, all the more so as they must rely on a lower operating frequency.

This difficulty can be overcome by modifying the application source code directly, so as to expose a level of parallelism that the tool can exploit. This kind of technique is very effective when accelerating regular computations in the form of loop nests; indeed, it is possible to build on the large body of work produced by the automatic parallelization community [Wol90, Wol96]. Beyond the parallelization of the computations themselves, obtaining good performance also requires very careful handling of data across the different levels of the system memory hierarchy (host memory, local on-board memory, embedded memory). One contribution of this work is to present an overview of the key transformations that allow specialized hardware architectures, which efficiently exploit the capabilities of current FPGA accelerators, to be obtained with high-level synthesis tools.

Figure 1.3: Examples of Reduction (a) and Scan (b) are shown here, with a possible order of computation.

1.4 Parallelization with Reductions and Parallel Prefixes

The elementary algorithms used in linear algebra fall into two categories. In the first one, the size of the result of a computation is of the same order as the size of its operands; this is for instance the case when adding two vectors. In the second one, the size of the result is much smaller (in general a single scalar value), hence the term reduction coined by Iverson [Ive62]; summing the elements of a vector or a matrix is an example of such an operation.

In this work we are interested in two kinds of computations: reduction operations and prefix operations². These operations, which act on collections of objects, rely on an elementary operator with commutativity and associativity properties. Let ⊕ be the symbol denoting this elementary operator; a reduction over an operand vector (x_0, x_1, \dots, x_n) is written as:

    s = \bigoplus_{i=0}^{n} x_i = x_0 \oplus x_1 \oplus \cdots \oplus x_n    (1.1)

For the prefix operation, the size of the result is the same as that of the operand; it is defined, for an operand vector (x_0, x_1, \dots, x_n) and a result vector (s_0, s_1, \dots, s_n), as:

    s_k = \bigoplus_{i=0}^{k} x_i = x_0 \oplus x_1 \oplus \cdots \oplus x_k    (1.2)

Both kinds of operations are shown in Figure 1.3 for n = 16. It is important to note that these operations, a priori sequential in their definition, can be performed in parallel by reorganizing the computations in more or less complex ways. In particular, the efficient implementation of prefix operations on VLSI circuits is a topic that has received a great deal of attention³, ever since the early 1960s. Numerous hardware structures allowing trade-offs between speed and area cost have been proposed [LF80, BK82, KS73, HC87, Skl60]. A hardware implementation of an algorithm using prefix operations can benefit from these results, by exploring the different ways of realizing the computation in order to choose the most efficient one. This exploration is all the easier when the design is done at a high level of abstraction, for example using high-level synthesis tools.

The sequence alignment algorithms used in bioinformatics are based on dynamic programming, and expose computation patterns that lend themselves rather well to mathematical reformulations that bring out reduction and/or prefix operations. In Chapter 6, we show how some of the computations involved in the HMMER tool [Edd] can be reformulated as reduction and/or prefix operations, which allow a more efficient parallelization.

² Also known under the term scan.

1.5 Contributions of this Thesis

Chapter 3 gives a short introduction to the field of bioinformatics and its challenges. We detail in particular the main algorithms used for sequence alignment, comparison and folding, with an emphasis on their computational cost and on their ability to scale to large data volumes. We show in particular that most of the approaches used do not scale, and require hardware architectures exploiting significant levels of parallelism.

³ This interest is explained by the fact that binary addition is a prefix operation.

Chapter 4 then presents an overview of high-level synthesis techniques and tools. These tools derive a specialized hardware architecture directly from an algorithmic specification (for example in C), and thus drastically reduce design time. The chapter presents the different steps involved in an HLS flow, and proposes a state of the art of the techniques used in these tools. It ends with a review of the academic and commercial HLS tools currently available.

Chapter 5 focuses on code transformation techniques for improving the performance of architectures obtained through HLS. This part of the manuscript is particularly concerned with loop transformations for parallelization and with the optimization of memory accesses, which are crucial for obtaining efficient accelerators.

Chapters 6 & 7 present the contributions of this work, which deal with the use of complex program transformations for the hardware acceleration of the HMMER program. This tool, widely used in the bioinformatics community, relies on two computation kernels (MSV and P7Viterbi) that are reputedly hard to accelerate, because of data dependencies which, a priori, prevent any parallelization. In Chapter 6, we present the state of the art on the parallelization of HMMER on FPGA, and propose a reformulation of the MSV and P7Viterbi kernels that exposes a significant level of parallelism in the form of reduction and prefix operations.

Chapter 7 deals with the implementation, on an FPGA accelerator and with a commercial HLS tool, of a parallel co-processor architecture for HMMER. The originality of the approach comes from the use of a complex computation scheme, exploiting both fine-grain parallelism (vectorized loops) and coarse-grain parallelism (a task-level macro-pipeline). These schemes were implemented on an FPGA board (XtremeData), and allowed us to demonstrate interesting acceleration factors compared to an optimized implementation making very fine use of the SIMD extensions of Intel multi-core processors.

2 Introduction

2.1 High Performance Computing for Bioinformatics

Bioinformatics can be defined as an application of concepts from computer science, mathematics and statistics to analyze biological data (e.g. DNA, RNA and proteins) and to predict their functions and structures. The typical problems found in bioinformatics consist in finding genes in DNA sequences, analyzing new proteins, aligning similar proteins into families and generating phylogenetic trees to expose evolutionary relationships.

In the last decade, there has been a rapid growth in the amount of available digital biological data, brought about by advances in DNA sequencing techniques, and particularly by the success of projects such as the Human Genome Project [VAM+01] and genome annotation projects for plants [SRV+07]. Noticeable examples are the growth of DNA sequence information in NCBI's GenBank [NCB11] database and the growth of protein sequences in the UniProt [INT] database, as shown in Figure 2.1. Furthermore, next-generation sequencing technologies have enabled the extraction of genome sequence data in huge quantities, which will result in further growth of these databases.

Computer scientists and biomedical researchers are now facing the major challenge of transforming this enormous amount of genomic data into biological understanding. The traditional tools and algorithms in bioinformatics were designed to handle very small databases; hence a bottleneck in terms of computational time has arisen when scaling them up to facilitate analyses of large data-sets and databases.

Recently, a lot of research effort has been devoted to enabling modern bioinformatics tools to take advantage of parallel computing environments. Implementations of bioinformatics applications on modern multicore general-purpose processors [Edd, LBP+08], General Purpose Graphics Processors (GPGPU) [WBKC09b, VS11, MV08], grid technology [SRG03, YHK09, CCSV04, GCBT10] and reconfigurable platforms, such as field-programmable gate arrays (FPGAs) [HMS+07, SKD06, DQ07, LST], have shown promising acceleration and have significantly reduced the runtime of many biological algorithms operating on enormous databases.


[Plot (a): Number of Base Pairs in GenBank, in billions]

Figure 2.1: The exponential growth of the (a) GenBank and (b) UniProtKB databases [NCB11, INT].

The considerable increase in logic density and clock speed of FPGAs in recent years has in turn increased the trend of using FPGAs to implement compute-intensive algorithms from various domains, including finance [ZLH+05, WV08], weather forecasting [AT01], video encoding [LSK+05] and bioinformatics [DQ07, SKD06]. FPGAs are an attractive target architecture for bioinformatics applications, considering their cost-effectiveness as customized accelerators and their ability to exploit the fine-grain parallelism available in many bioinformatics applications. A large class of bioinformatics applications relies on dynamic programming algorithms, or on fast approximations of them, including sequence database search programs (Smith-Waterman [SW81], Needleman-Wunsch [NW70] and BLAST [AGM+90]), multiple sequence alignment programs (CLUSTALW [THG94]), profile-based search programs (HMMER [Edd]), RNA-folding programs (MFOLD [Zuk03]) and even phylogenetic inference programs (PHYLIP [Fel93]). The FPGA architecture is very well suited to such dynamic programming algorithms, since it has a regular structure, similar to the data dependencies in dynamic programming algorithms, with a communication network between close neighbors.

2.2 FPGA based Hardware Acceleration

FPGAs are essentially large fields of programmable gates, so they can be programmed into many parallel hardware execution paths. Due to their parallel nature, different processing operations do not have to compete for the same resources. The designer can map any number of task-specific cores on an FPGA, which all run as simultaneous parallel circuits. On an FPGA, a designer can express parallelism at a variety of computation granularities (i.e. fine- and coarse-grain parallelism), by pipelining long computation paths and through data parallelism. The granularity may range from very fine-grain computations (e.g. bit-level operations), to fine-grain operations as in a SIMD architecture (e.g. word- and instruction-level operations), to coarse-grain computations (e.g. many independent instances of a highly compute-intensive kernel operating in parallel).

Since FPGAs operate at much lower clock frequencies than CPUs (roughly 10 times lower), enough computations must be performed in parallel to outperform a CPU-based implementation. Hence compute-intensive applications with massive inherent parallelism (e.g. converting each pixel of a color image to grayscale) are highly suitable for FPGA-based implementation. Similarly, applications with reduced bit-width data are appropriate for FPGAs, due to their ability to compute on custom bit-widths. The majority of bioinformatics algorithms do not even require full integer precision, so the floating-point arithmetic of a modern CPU brings them little value; a DNA base, for instance, can be encoded on 2 bits instead of a full byte, as the sketch below illustrates. FPGA-based implementations of such applications can therefore exploit customizable precision and parallelism, resulting in improved speed and better utilization of the available resources.

These properties make bioinformatics applications viable candidates for FPGA-based acceleration in comparison with other acceleration approaches, such as clusters and GPUs, and a lot of research work has been done to accelerate these applications on FPGAs using traditional hardware description languages (VHDL and Verilog) [HMS+07, SKD06, DQ07]. The resulting implementations are very efficient and the obtained speedups are highly valuable. However, a few issues with FPGA-based implementation hinder designers from opting for it: the design flow is highly error-prone, and the lengthy verification phase often becomes the bottleneck of design projects. In the next section, we highlight these issues by discussing the traditional FPGA design flow, and a possible solution to them through high-level synthesis.
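As a concrete illustration of the custom bit-widths mentioned above, the following C sketch (ours, for illustration; the particular base-to-code mapping is an arbitrary choice, the thesis only notes that 2 bits suffice) packs a DNA string into 2 bits per base, four bases per byte:

```c
#include <stdint.h>
#include <stddef.h>

/* Map a nucleotide character to a 2-bit code: A=00, C=01, G=10, T=11.
 * Any fixed 2-bit mapping would work equally well. */
static uint8_t base_code(char b)
{
    switch (b) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    default:  return 3;   /* 'T' (and, in this sketch, anything else) */
    }
}

/* Pack a DNA string of length n into 2 bits per base.
 * The caller must provide an output buffer of at least (n+3)/4 bytes. */
void pack_dna(const char *seq, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        size_t byte = i / 4, shift = 2 * (i % 4);
        if (shift == 0) out[byte] = 0;   /* clear byte on its first base */
        out[byte] |= (uint8_t)(base_code(seq[i]) << shift);
    }
}
```

A 16-character read thus fits in four bytes, and on an FPGA the same 2-bit codes can feed comparators directly, which is exactly the kind of narrow datapath a hand-written RTL design would use.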

2.3 FPGA Design Flow

The standard design flow for FPGAs is borrowed from ASICs, as shown in Figure 2.2a. In practice, a design is usually partitioned into hardware and software parts. The steps shown in Figure 2.2a relate only to the implementation of the hardware blocks of such a design; the software blocks are implemented using standard software development techniques. The first step of the design flow is to define functional specifications in C, C++, Matlab or any other language, in order to validate and fine-tune the desired behavior. Once tested, the designer needs to define an architecture able to implement the desired functionality; this architecture selection sets the performance, area and power-consumption goals to be met. After the architecture is defined, the design team hand-codes these decisions in a hardware description language (Verilog or VHDL) or as a schematic design. At this stage, functional simulation is carried out to verify the correctness of the described functionality.

After functional verification, the design can be synthesized, i.e. boolean operators are mapped onto the lookup-table (LUT) modules shown in Figure 4.1b. The result of logic synthesis is called the netlist, a file describing the modules to be used for the implementation of the design and their interconnections. In the next step, the design is placed and routed on the FPGA: the operators (LUTs, flip-flops, multiplexers, etc.) described in the netlist are placed on the FPGA fabric and connected together through routing. This step is normally done by the CAD tool provided by the FPGA vendor. The CAD tool then generates a file called the bitstream, which describes all the bits to be configured: the LUTs, the interconnect matrices, the multiplexers and the I/O of the FPGA. By loading the bitstream file onto the FPGA, the hardware is configured according to the functional specifications of the application.

Figure 2.2: FPGA Design Flow: (a) Traditional FPGA design flow using hardware description languages (HDLs). The application description in HDL is very error-prone and requires a lot of verification effort; it is also not easy to port a design to other FPGA architectures. (b) High-level synthesis based FPGA design flow: the manual RTL design steps are replaced by a high-level behavioral description of the design, followed by an automatic generation of the RTL design.

However, the design flow is not that straightforward, and often involves many iterative development steps. The first problem is to find a suitable architecture, since the following design steps are closely tied to the selected architecture; an inadequate choice of underlying architecture greatly prolongs the development cycle. The biggest problem in the design flow is the manual RTL description: when the design is tested after the first implementation, bugs are reported, and a lot of development time is usually spent hunting down and fixing them individually. This iterative process of fixing bugs, generating new bugs and fixing them again prolongs the time-to-market.

One major issue with HDL-based implementation is the ever-increasing complexity of electronic designs. The increase in device capacity only exacerbates this issue, as programmers seek to map increasingly complex computations onto even larger devices [CD08]. The reality is that we are trying to develop 4G broadband modems and H264 decoders

with tools and methods inherited from an era when GSM and VGA controllers were the popular technologies [Fin10]. Eventually, creating the RTL design triggers bugs and causes the verification phase to be the bottleneck of any ASIC project. Many research efforts have tried to ameliorate this issue by offering higher-level programming abstractions, combined with automatic RTL generation from popular high-level languages such as C or Matlab; the corresponding tools are known as High-Level Synthesis (HLS) tools.

2.3.1 High-level Synthesis

High-level synthesis addresses the root cause of the problems posed by the HDL-based design flow, by providing an error-free path from abstract specification to RTL. HLS reduces the implementation time, while also reducing the overall verification effort. The high level of abstraction needs far less detail in the description, and the designer can focus solely on describing the desired behavior. With fewer lines of code, and with no such details as clocks, technology or micro-architecture specifications inside the sources, the risk of errors is greatly reduced. Similarly, with fewer blocks to verify, the design can be verified exhaustively.

The abstract functional specifications used in HLS also make design reuse more effective. Since the design sources are now an abstract specification of the design, retargeting to other architectures is easier. The concepts of IP and reuse, which have been promoted to address the design complexity challenge in RTL design, are often unhelpful: retargeting legacy RTL is usually done at the expense of power, performance and area [Fin10]. In HLS, on the contrary, we deal with pure functional specifications, and technology-specific information is added later by the HLS tool automatically. This makes IP reuse, and changes to existing functionality, easy to implement and verify. For biocomputing applications, an HLS framework simplifies the complex algorithmic description phase and also maximizes design portability.

However, the abstract specification of a design may lack several design optimization details, which also widens the range of possible hardware mappings. This can lead to a less efficient design being generated automatically, compared with the efficiency of a highly detailed manual RTL design. Consequently, the resulting performance of an HLS-based design is often not good enough to justify the use of FPGA-based acceleration. Most of the research effort in the development of HLS tools has been dedicated to the efficient translation of a given input C code into a hardware design, and this task has been accomplished quite effectively. However, there has been very little focus on the automatic extraction of parallelism from the input C code. The designer therefore needs to pay close attention to 'what' kind of C code will generate 'what' kind of circuit.

To tackle this problem, the HLS input needs to be rewritten so as to expose the parallelism hidden in the algorithm. This task can be accomplished by a prior dependency analysis of the input design, on the basis of which parallelism can be expressed with the help of modern high-performance compiler optimization techniques [Wol90, Wol96]; a small example follows below. The input code should also manage memory resources in an efficient way (i.e. minimizing data communication overhead and maximizing data reuse). Hence, there is a pressing need to identify, analyze and lay out the rules and guidelines a designer should keep in mind while designing hardware using high-level synthesis tools. The leitmotiv of this thesis consists in a critical analysis of state-of-the-art HLS tools, identifying their capabilities and shortcomings, formalizing techniques to craft efficient hardware using these tools, and exercising these strategies on a well-known, compute-intensive and naively sequential bioinformatics application (i.e. HMMER).

Figure 2.3: Examples of Reduction (a) and Scan (b) are shown here, with a possible order of computation.
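As a tiny illustration of such a source-level rewrite, the sketch below (ours, not from the thesis) transforms a sequential sum, whose single accumulator forms a dependence chain of length N, into four independent partial sums that an HLS tool can schedule in parallel once the loop body is unrolled. The rewrite is valid because addition is associative and commutative, the same algebraic property exploited in Section 2.4.

```c
/* Sequential form: one accumulator, so every iteration depends on the
 * previous one; an HLS tool can only schedule one addition per step. */
int sum_seq(const int x[64])
{
    int s = 0;
    for (int i = 0; i < 64; i++)
        s += x[i];
    return s;
}

/* Rewritten form: four independent partial sums break the dependence
 * chain, letting the tool instantiate four adders working in parallel
 * and combine their results in a short tree at the end. */
int sum_par(const int x[64])
{
    int p0 = 0, p1 = 0, p2 = 0, p3 = 0;
    for (int i = 0; i < 64; i += 4) {
        p0 += x[i];
        p1 += x[i + 1];
        p2 += x[i + 2];
        p3 += x[i + 3];
    }
    return (p0 + p1) + (p2 + p3);
}
```

Both functions compute the same value, but only the second exposes, at the source level, the instruction-level parallelism a tool would otherwise have to discover on its own.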

2.4 Exploiting Parallelism with Reductions and Prefixes

The basic algorithms of linear algebra and matrix computation fall into two broad classes. In the first one, the output of a computation is of the same size as, or bigger than, the input data; this is the case, for instance, for vector operations. In the second class, the output is much smaller than the input data, typically only one value, hence the name reduction coined by Iverson [Ive62].

Here, we are interested in two special kinds of such computations, namely reductions and scans (or prefix computations), where the underlying operation is associative and commutative. Let ⊕ represent such an operation; a reduction can then be defined, over an input vector (x_0, x_1, \dots, x_n), as:

    s = \bigoplus_{i=0}^{n} x_i = x_0 \oplus x_1 \oplus \cdots \oplus x_n    (2.1)

A prefix operation belongs to the first class of computations, where the output is exactly the same size as the input, and can be defined for an output vector (s_0, s_1, \dots, s_n) as:

    s_k = \bigoplus_{i=0}^{k} x_i = x_0 \oplus x_1 \oplus \cdots \oplus x_k    (2.2)

These operations are visualized in Figure 2.3 for n = 16. The possibility of computing them in parallel, under numerous orders of execution, is what gives them their significance: when targeting an FPGA, a designer can easily devise a compromise between speed and area. The parallel implementation of prefix networks has received a wealth of attention from the VLSI community, going back almost 50 years, and various network topologies have been proposed [LF80, BK82, KS73, HC87, Skl60]. These topologies allow a variety of hardware implementations of a prefix operation, managing different design trade-offs such as speed, area, wiring and fan-out. Thus, expressing parallelism in the form of prefix operations allows these previously developed network topologies to be reused. Furthermore, a high-level synthesis based implementation of such networks simplifies the design exploration task.

The sequence alignment techniques used in bioinformatics applications, based on dynamic programming algorithms, generally compute a best score for a comparison, and the computations involved usually possess the algebraic properties mentioned above. There is therefore a strong likelihood that reduction and prefix computations can be detected in these algorithms, leading to parallel implementations. In Chapter 6, we demonstrate how the algorithmic dependencies in HMMER [Edd] can be transformed into reductions and prefixes through algorithmic rewriting, which ultimately helps to accelerate the execution.
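To make definitions (2.1) and (2.2) concrete, here is a small C sketch (ours, for illustration) computing a max-reduction and a max-prefix (scan) over an array. The scan is also given in a Kogge-Stone-style form, whose log2(N) stages of mutually independent operations correspond directly to the parallel prefix networks cited above; the stage structure, not the C code itself, is what a hardware implementation exploits.

```c
#include <stdio.h>

#define N 16

static int max2(int a, int b) { return a > b ? a : b; }

/* Reduction: s = x[0] (+) ... (+) x[N-1], with (+) = max here. */
int max_reduce(const int x[N])
{
    int s = x[0];
    for (int i = 1; i < N; i++)
        s = max2(s, x[i]);
    return s;
}

/* Scan (prefix): s[k] = x[0] (+) ... (+) x[k], sequential form. */
void max_scan_seq(const int x[N], int s[N])
{
    s[0] = x[0];
    for (int k = 1; k < N; k++)
        s[k] = max2(s[k - 1], x[k]);
}

/* Same scan, Kogge-Stone style: log2(N) stages; within a stage all
 * operations are independent, which is what a parallel prefix network
 * realizes as one layer of operators in hardware. */
void max_scan_ks(const int x[N], int s[N])
{
    for (int i = 0; i < N; i++) s[i] = x[i];
    for (int d = 1; d < N; d *= 2) {
        int t[N];
        for (int i = 0; i < N; i++)
            t[i] = (i >= d) ? max2(s[i], s[i - d]) : s[i];
        for (int i = 0; i < N; i++) s[i] = t[i];
    }
}

int main(void)
{
    int x[N] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3};
    int s[N];
    max_scan_ks(x, s);
    /* The reduction result equals the last element of the inclusive scan. */
    printf("reduction = %d, last prefix = %d\n", max_reduce(x), s[N - 1]);
    return 0;
}
```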

2.5 Contributions of this work

Chapter 3 provides a brief introduction to the bioinformatics field and to common practices in it. We highlight some important algorithms for sequence alignment and RNA folding. A review of these algorithms gives a fair insight into their computational complexity, and also highlights the challenge being faced by biologists and computer scientists: exercising these algorithms on genome databases of constantly growing size is becoming time-prohibitive. There is a pressing need to take advantage of the advances in computing platforms and accelerate bioinformatics applications.

Chapter 4 discusses why bioinformatics applications are viable candidates for FPGA-based acceleration. It also explains the importance of high-level synthesis in FPGA-based implementation, in comparison with traditional RTL-based design. The chapter introduces the design flow inside an HLS tool and discusses the state-of-the-art techniques applied in each step of this flow. It also provides an overview of a few well-known HLS tools on the market, investigates their handling of input code, and identifies the causes of performance degradation.

Chapter 5 is dedicated to the design techniques and code transformations a designer needs to bear in mind while designing hardware from high-level specifications (i.e. C code). The central idea is to highlight 'what' kind of C code will be translated into 'what' kind of hardware, and 'what' kind of transformations may help to accomplish the design goals

(speed/area).

Chapters 6 & 7 present the research work carried out to accelerate the HMMER application, by exercising the previously discussed techniques for efficient hardware design using HLS. HMMER is a widely used tool in bioinformatics for sequence homology searching. Its computation kernels, namely MSV and P7Viterbi, are very compute-intensive, and their data dependencies, if interpreted naively, lead to a purely sequential execution. We propose an original parallelization scheme for HMMER based on rewriting its mathematical formulation, in order to expose hidden parallelization opportunities by transforming the computations into well-known structures, i.e. parallel prefix networks and reduction trees. Besides exploring fine-grain parallelization possibilities, we employ and compare coarse-grain parallelization through different system-level implementations of the complete execution pipeline, based on either several independent pipelines or one large aggregated pipeline. We implement our parallelization scheme on an FPGA, and then present and compare our speedup with the latest HMMER3 SSE version on a quad-core Intel Xeon machine. Our results show that a careful HLS-based implementation can fairly compete with an RTL-based design in terms of performance, and holds a definite edge in terms of time-to-market and design effort.

3 An Introduction to Bioinformatics Algorithms

Bioinformatics can be defined as the science of developing computer systems and algorithms for the purpose of speeding up and enhancing biological research [Aga08]. To understand bioinformatics in a meaningful way, it is necessary for a computer scientist to understand some basic biology. This chapter provides a short introduction to those fundamental concepts in biology and highlights some common algorithms used in bioinformatics.

3.1 DNA, RNA & Proteins

Cells are the smallest structural unit of life that has all the basic characteristics of a living organism, such as maintaining life and reproducing it [SEQb]. A cell contains all the necessary information, as well as the required equipment, not only to produce a replica of itself but also to help its offspring start functioning [JP04]. Each cell in a human body contains 23 pairs of chromosomes, which together comprise some 30,000 genes; there are around 10^12 cells in a body, and the DNA in each cell amounts to approximately 3 billion pairs of DNA bases [oEGP08]. Similarly, plant genome-sequencing projects report more than 40,000 genes in average plants [SRV+07].

The three primary types of molecules studied by biologists are DNA, RNA and proteins. The relationship between these molecules is the transfer of information from DNA to proteins through RNA, as shown in Figure 3.1: DNA encodes RNA, which produces the proteins, and proteins are responsible for managing and performing different biological processes inside the cell. The DNA within a cell holds the complete information describing the functionality of the cell. RNA transfers short pieces of this information to different places within the cell, where it is used to produce proteins [JP04]. DNA is a long molecule forming a chain, whose links are pieces called nucleotides, or 'bases', named 'A', 'C', 'G' and 'T'.


Figure 3.1: The relationship between DNA, RNA and Proteins, referred to as the central dogma of life. [Courtesy of NIGMS Image Gallery [Gal]]

DNA encodes the information necessary to build a cell. Most of the cell's activities, e.g. breaking down food as enzymes, building new cell fragments, cell signaling and signal transduction, are carried out by proteins. However, a DNA sequence must be decoded to make a protein, and the decoding process requires the creation of an RNA template [Wil03]. The creation of "messenger RNA", or mRNA, is called transcription, while the process of creating proteins from the mRNA is called translation. The discovery of DNA is probably the most influential discovery of the 20th century; it led to extraordinary breakthroughs in science and medicine, enabling the identification of genes, the diagnosis of diseases and the development of treatments for them.

Why Bioinformatics?

The information that biologists have collected about gene sequences needs to be processed in order to fully understand their functions and roles, e.g. how a specific gene is related to a specific disease, or what the functions of thousands of proteins are and how proteins can be classified according to their functionality. The field of bioinformatics is a collection of such tools and methods, used to collect, store, analyze and classify this huge amount of biological data.

As mentioned by Thampi [Tha09] regarding the history of bioinformatics, it began in the 1960s with the efforts of Margaret O. Dayhoff, Walter M. Fitch, Russell F. Doolittle and others. Since then it has evolved into a well-developed discipline with a strong influence on modern biology research. In 1970, Saul B. Needleman and Christian D. Wunsch [NW70] proposed the first DNA sequence matching algorithm.

Figure 3.2: An example of multiple sequence alignment: the region of convergence is the shaded part, where exact matches are found in all sequences.

However, a few major steps during the 1990s brought a revolution in bioinformatics, e.g. the start of the Human Genome Project, the availability of new analysis services, and the availability of data through the Internet. Huge databases, such as GenBank and EMBL, were designed to store, compare and analyze the biological sequence data being produced at an enormous rate. Today, the bioinformatics field involves structural and functional analysis of proteins and genes, drug development, and pre-clinical and clinical trials [Tha09].

The field of bioinformatics encompasses tools and techniques from three separate disciplines: the source of the data to be analyzed is molecular biology, the platforms and resources to analyze these data are borrowed from computer science, and the techniques and tools that analyze these data are based on data-analysis algorithms [Ric]. The common activities in bioinformatics are hence storing DNA and protein sequences; analyzing, aligning or comparing them; classifying protein families and finding new members; predicting the structures of RNAs; and constructing phylogenetic or evolutionary trees. In this chapter, we will focus on algorithms related to general sequence alignment [NW70, SW81, AGM+90, THG94, Edd11a] and RNA folding [NPGK78, ZS81].

3.2 Sequence Alignment

Sequence alignment is an arrangement of two sequences which shows where the two sequences are similar and where they differ. Sequence alignment techniques are used to discover the structural and functional properties of biological data and to characterize evolutionary relationships between sequences. Identical characters are identified as matches, while non-identical characters correspond to mismatches or gaps. Regions of identical characters are known as conserved regions, as shown in Figure 3.2. To discover this information it is important to obtain the "optimal" alignment, which is the one that exhibits the most significant similarities and the fewest differences. A similarity between two sequences suggests a similarity in the function or the structure of these sequences. Additionally, strong similarities between two sequences may also reveal the evolutionary relationship between them, assuming that there might be a common ancestor sequence: the alignment indicates the changes that could have occurred between the two homologous sequences w.r.t. a common ancestor sequence during evolution.

There are two types of sequence alignments. Global alignments try to align the sequences from end to end.

approximately the same length are suitable candidates for global alignment. On the contrary, local alignments search for segments of the two sequences that are similar. Local alignment does not force the entire sequence into alignment; instead it only aligns the regions with the highest density of matches. It hence generates one or more sub-alignments in the aligned sequences. Local alignments are more suitable for aligning sequences which differ in length, or sequences that have a strong conserved region that is not located at the same position in both sequences. In the following sections we show a comparison of both types of alignment and how, for the same sequence pair, the alignment result can differ. Sequences are usually aligned in a pairwise manner, i.e. through a pairwise sequence alignment, to compare and identify similarities in two sequences. In other cases, three or more sequences are aligned, i.e. through a multiple sequence alignment. The latter is used to show similarities conserved by most of the sequences and to construct families of these sequences. New members of such families can then be found by searching sequence databases for other sequences exhibiting these same conserved regions.

3.2.1 Pairwise Sequence Alignment

Pairwise alignment methods are used to find the optimal local or global alignment of two query sequences. The most common methods for pairwise alignment are dot matrix, dynamic programming and word (or k-tuple) methods. The most famous dynamic programming algorithms for pairwise alignment are the Smith-Waterman [SW81] and Needleman-Wunsch [NW70] algorithms. BLAST [AGM+90], one of the most widely used bioinformatics tools, is based on a word method.

Needleman-Wunsch: The Needleman-Wunsch algorithm performs global alignment for a pair of sequences. The algorithm was proposed in 1970 by Saul B. Needleman and Christian D. Wunsch [NW70], and was the first application of dynamic programming to biological sequence comparison. To find the alignment with the highest score, a two-dimensional array (or matrix) D is allocated. The entry in row i and column j is denoted by D_{i,j}. There is one column for each character in sequence A, and one row for each character in sequence B. Each cell of matrix D is computed using the following formula:

$$D_{i,j} = \max \begin{cases} D_{i-1,j-1} + \delta(A_i, B_j) \\ D_{i-1,j} + \delta(A_i, -) \\ D_{i,j-1} + \delta(-, B_j) \end{cases} \quad (3.1)$$

Figure 3.3a shows the initialized matrix and the data dependencies, as depicted by the formula above. The numbers in small font in the first row and first column give the gap penalty, while in the rest of the matrix they give the matching and penalty scores. The matching score δ(A_i, B_j) is equal to 1 when A_i and B_j are the same character; otherwise, the penalty is set to 0 for any mismatch. Figure 3.3b shows the global alignment obtained with the


Figure 3.3: Pair-wise sequence alignment: (a) matrix initialization & computation dependencies, (b) global alignment with Needleman-Wunsch, (c) local alignment with Smith-Waterman. The green trail in (b) and (c) shows the alignment. [Figures generated using Basic-Algorithms-of-Bioinformatics Applet [Cas]]

Needleman-Wunsch algorithm. The final alignment for this example:

-GCCACCGT | | | | | TGTTAC-GT
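As a concrete illustration of Eq. (3.1), the following minimal C sketch fills the matrix D with the scoring used in Figure 3.3 (match = 1, mismatch and gap = 0) and returns the global score. Function and constant names (needleman_wunsch, MAXLEN) are illustrative assumptions, and a real implementation would also keep traceback pointers to recover the alignment itself.

```c
#include <stdio.h>
#include <string.h>

#define MAXLEN 128

/* Scoring as in Figure 3.3a: +1 for a match, 0 for a mismatch or gap. */
static int delta(char a, char b) { return (a == b) ? 1 : 0; }

static int max3(int x, int y, int z) {
    int m = x > y ? x : y;
    return m > z ? m : z;
}

/* Fills the DP matrix of Eq. (3.1) and returns the global alignment score.
 * Row 0 and column 0 hold the gap-only prefixes (initialization). */
int needleman_wunsch(const char *A, const char *B) {
    int n = (int)strlen(A), m = (int)strlen(B);
    static int D[MAXLEN + 1][MAXLEN + 1];
    int gap = 0; /* gap penalty, 0 in the example of Figure 3.3 */

    for (int i = 0; i <= n; i++) D[i][0] = i * gap;
    for (int j = 0; j <= m; j++) D[0][j] = j * gap;

    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            D[i][j] = max3(D[i-1][j-1] + delta(A[i-1], B[j-1]), /* match/mismatch */
                           D[i-1][j]   + gap,                   /* gap in B */
                           D[i][j-1]   + gap);                  /* gap in A */
    return D[n][m];
}

int main(void) {
    /* The two sequences of the example above. */
    printf("score = %d\n", needleman_wunsch("GCCACCGT", "TGTTACGT"));
    return 0;
}
```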

Smith-Waterman: The Smith-Waterman algorithm, also based on dynamic programming techniques, computes the optimal local alignment of two sequences. Instead of looking at the entire sequence length, the Smith-Waterman algorithm compares only segments (of all possible lengths) of the input sequences and tries to optimize the similarity score. The main difference with Needleman-Wunsch is that Needleman-Wunsch allows negative scoring, whereas Smith-Waterman forces negative values to zero. This choice of positive scoring makes local alignments visible. The Smith-Waterman algorithm computes the matrix D as:

$$D_{i,j} = \max \begin{cases} D_{i-1,j-1} + \delta(A_i, B_j) \\ D_{i-1,j} + \delta(A_i, -) \\ D_{i,j-1} + \delta(-, B_j) \\ 0 \end{cases} \quad (3.2)$$

Figure 3.3c shows the local alignment, where the matching score δ(A_i, B_j) is set to 2 and all penalties are set to -1. The final alignment in this case is:

ACCGT | | | | AC-GT
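Eq. (3.2) requires only two changes to the previous sketch: the cell value is clamped at zero, and the score is read from the best cell anywhere in the matrix rather than from the corner. The hedged variant below reuses MAXLEN and max3 from the Needleman-Wunsch sketch and the scores of Figure 3.3c (match = 2, penalties = -1).

```c
/* Smith-Waterman variant of the previous sketch (Eq. (3.2)).
 * The best local score may end anywhere, so a running maximum is kept. */
int smith_waterman(const char *A, const char *B) {
    int n = (int)strlen(A), m = (int)strlen(B);
    static int D[MAXLEN + 1][MAXLEN + 1]; /* row/column 0 stay at 0 */
    int best = 0;

    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            int match = (A[i-1] == B[j-1]) ? 2 : -1;
            int s = max3(D[i-1][j-1] + match, /* match / mismatch */
                         D[i-1][j] - 1,       /* gap in B */
                         D[i][j-1] - 1);      /* gap in A */
            D[i][j] = s > 0 ? s : 0;          /* the extra 0 term of Eq. (3.2) */
            if (D[i][j] > best) best = D[i][j];
        }
    return best;
}
```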

Local vs. Global Alignment: From the above two alignments, it can be seen that global alignment aligns even less conserved regions, whereas local alignment only aligns the regions that are well conserved by the two sequences. Similarly, local

CAG - TTATGTGGGCCCAAATTG Global Alignment | | | | | | | GGGCCCAAATTG - CAGTTATGT

CAGTTATGTGGGCCCAAATTG Local Alignment |||||||||||| GGGCCCAAATTGCAGTTATGT

Figure 3.4: Local alignment aligns very significant regions, independently of their location in the two sequences, while global alignment aligns even small, not very significant, regions.

alignment can align well conserved regions, regardless of their location in the two sequences. The example in Figure 3.4 shows how different results can be obtained from global and local alignments. In this example, local alignment aligns the starting region of one sequence to the end region of the other sequence. On the other hand, global alignment aligns sequences from end to end, and the example demonstrates the “gappy” nature of global alignment when sequences are insufficiently similar. Global alignments are most useful when query sequences are similar and of roughly equal size, e.g. protein sequences from the same protein family are often very conserved, and hence have almost the same length [JP04]. A hybrid method, known as “glocal” (short for global-local), presented by Brudno et al. [BMP+03], attempts to combine features of both kinds of alignment. Glocal alignment aligns two sequences by transforming one sequence into the other through a series of operations. The set of supported operations is not limited to insertion, deletion and substitution, but also includes other possible types of mutations, e.g. inversion (a small segment of the sequence is first removed and then inserted back at the same location but in the opposite direction), translocation (a small segment is removed from one location and inserted into another, without changing the orientation) and duplication (a copy of a segment is inserted into the sequence without making any change to the original segment).

BLAST: The Basic Local Alignment Search Tool, or BLAST, was developed by Altschul et al. [AGM+90]. This method is widely used from the Web site of the National Center for Biotechnology Information at the National Library of Medicine in Washington, DC (http://www.ncbi.nlm.nih.gov/BLAST). The BLAST server is probably the most widely used sequence analysis facility, where alignments can be performed against all currently available sequences. BLAST is fast enough to search an entire database in a reasonable time. Before the development of fast algorithms such as BLAST and FASTA (another k-tuple based tool), database searches were very time consuming, because they had to rely on a full alignment procedure such as Smith-Waterman. The BLAST algorithm, however, emphasizes speed over sensitivity, in comparison with traditional tools. BLAST aligns two sequences by first searching for very short identical words (known as tuples or k-mers) and then by combining these words into an alignment. The length of the word is fixed at 3 for proteins and 11 for nucleic acids. In the first step, the

algorithm creates a word list from the query sequence and then refines the word list to keep only the most significant words, whose possible matching score is higher than a threshold, as shown in Figure 3.5a. Then, BLAST scans the database for exact matches of these high-scoring words, as described in Figure 3.5b. In the third step, these matches are extended in the right and left directions from the position of the match, as shown in Figure 3.5c. The extension process in each direction stops when the accumulated score stops increasing and begins to fall a small amount below the best score found for shorter extensions. This extension phase may find a larger stretch of sequence, known as a high-scoring segment pair (HSP), with a higher score than the original word. A newer version of BLAST, called BLAST2 [TM99], attempts to accelerate the alignment process by finding word pairs on the same diagonal, which are within a distance A from each other, as shown in Figure 3.6. It extends only such word pairs, instead of all words. In order to maintain the alignment sensitivity, BLAST2 lowers the initial threshold, which results in a greater number of candidate words. However, since the extension is done only on a few of them, the computation time of the overall alignment decreases.
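The hedged C sketch below illustrates the first step only, the construction of the word list: it enumerates the K-length words of a query and keeps the significant ones. The score() function stands in for a real substitution matrix (e.g. BLOSUM62), and the threshold value is arbitrary; both are assumptions made for illustration. With a real matrix, neighborhood words would also be scored against each query word and filtered by the same threshold.

```c
#include <stdio.h>
#include <string.h>

#define K 3          /* word length used by BLAST for proteins */
#define THRESHOLD 12 /* illustrative significance threshold T */

/* Placeholder for a substitution matrix lookup; an identity-style
 * score keeps the sketch self-contained. */
static int score(char a, char b) { return (a == b) ? 5 : -1; }

/* Step (a) of Figure 3.5: list every K-length word of the query whose
 * best possible matching score reaches the threshold. */
void build_word_list(const char *query) {
    int len = (int)strlen(query);
    for (int i = 0; i + K <= len; i++) {
        int self = 0;
        for (int j = 0; j < K; j++)
            self += score(query[i + j], query[i + j]);
        if (self >= THRESHOLD)
            printf("word %.*s at position %d (score %d)\n",
                   K, query + i, i, self);
    }
}

int main(void) {
    build_word_list("PQGEFG"); /* toy query sequence */
    return 0;
}
```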

3.2.2 Multiple Sequence Alignment

Given a family of functionally related biological sequences, searching for new homologous sequences is an important application in biocomputing. New members can be explored using pairwise alignments between family members and sequences from the database. However, this approach may fail to identify distantly related sequences, due to weak similarities to individual family members. A sequence having weak similarities with many family members is likely to belong to the family, but pairwise matching will be unable to detect it. A solution is to align the sequence to all family members at once. Multiple sequence alignment (MSA) is an extension of pairwise alignment that aligns more than two sequences at a time. Multiple alignment methods try to align all of the sequences in a given query set in order to identify conserved sequence regions across the group of sequences. In proteins, such regions may represent conserved functional or structural domains, thus such alignments can be used to identify and classify protein families. Computationally, MSA presents several difficult challenges. The optimal alignment of more than two sequences at the same time, considering all possible matches, insertions and deletions, is a difficult problem. Dynamic programming algorithms used for pairwise alignment can be extended for MSA, but for aligning n individual sequences of length l the search space increases exponentially and the computational complexity is O((2l)^n) [WP84]. Such algorithms can be used to align 3 sequences, in a cubical score matrix, or a small number of relatively short sequences [Mou04]. Other methods in use for multiple sequence alignment are (1) progressive alignment [THG94, WP84], (2) iterative alignment [MFDW98] and (3) statistical methods [KBM+94, Edd].


Figure 3.5: A graphical illustration of the BLAST algorithm: (a) In the first step, BLAST creates a list of words from the sequence, (b) In the second step, it searches against the database for exact word matches, (c) Then the third step extends the match in both directions. [Example borrowed from [SKD06]]

Figure 3.6: BLAST2: The X marks show the positions of the high-scoring words in the query-sequence vs. database-sequence plot. The elliptic region shows the newly joined HSP region (word pairs on the same diagonal, within distance A).


Figure 3.7: Progressive sequence alignment

Progressive Alignment: Progressive alignment techniques, based on the dynamic programming method, try to build an MSA by first aligning the most similar sequences and then progressively adding groups of sequences to the initial alignment, reducing the complexity to O(nl^2) [WP84]. Relationships among the sequences are modeled by an evolutionary tree in which the outer branches, or leaves, are the sequences. The most closely related sequences are aligned first and then aligned with other pairs at subsequent tree levels, as shown in Figure 3.7. The most notable program based on progressive methods is CLUSTALW [THG94]. Progressive alignments are not guaranteed to be globally optimal. The major problem is that when distantly related sequences are aligned during the first stage, errors can be made, which may propagate to the final result. A second problem with the progressive alignment method is the choice of suitable scoring matrices and gap penalties that apply to the set of sequences [Mou04].

Iterative Alignment: Iterative alignment methods attempt to correct the key issue of progressive methods, i.e. the fact that errors in the initial alignments propagate through the MSA. The problem is addressed by repeatedly realigning subgroups of the sequences and then aligning these subgroups into a global alignment of all of the sequences. The goal is to improve the overall alignment score [Mou04]. The DIALIGN program [MFDW98], based on an iterative alignment technique, performs MSA through segment-to-segment comparisons rather than residue-to-residue comparisons. Pairs of sequences are aligned by locating aligned regions, i.e. regions that do not include gaps, called “diagonals”. These diagonals are then used to generate an alignment. The alignment is generated using a greedy method: the diagonal with the highest weight is selected first, and then the next diagonal from the list is added iteratively to the alignment, provided the new diagonal is consistent with the existing alignment, i.e. there is no conflict due to the double presence of a single residue or cross-over assignments of residues. The algorithm proceeds until the whole list of diagonals has been processed. Hidden Markov Models (HMMs) [RJ86] are a popular machine learning approach used for sequence homology searching. An HMM is a statistical model that takes into account all possible combinations of matches, mismatches, and gaps in a set of query sequences, to generate an alignment. HMMs are widely used for finding homologous sequences by comparing a profile-HMM to either a single sequence or a database of sequences. The profile-HMM is first built with prior information about the sequence family and trained

Figure 3.8: Hidden Markov Model for sequence alignment, with match states (B, M1–M4, E), insert states (I0–I4) and delete states (D1–D4). [KBM+94]

with a set of data sequences. Comparing a query sequence against the profile-HMM allows one to find out whether the query sequence is an additional member of the family or not. A different profile HMM is produced for each set of sequences. The intuition behind profile-based matching is that the multiple alignment of a sequence family reveals regions that are more conserved by the family and regions that seem to tolerate insertion and deletion more than the conserved ones. Thus position-specific information must be utilized when searching for homologous sequences. Profile-based methods build position-specific scoring models from multiple alignments, e.g. there will be a higher penalty for insertion/deletion in a conserved region than in a region of tolerance. Krogh et al. [KBM+94] were the first to introduce HMMs for this purpose (see Figure 3.8). For each column of the multiple alignment, a ‘match’ state models the distribution of residues allowed in the column, while ‘insert’ and ‘delete’ states allow insertion and deletion of residues between two columns. Profile HMMs diverge from standard sequence alignment scoring by including non-affine gap penalty scores. Traditionally, an ‘insert’ of x residues is scored as a + b(x − 1), where a is the score of the first residue and b is the score for each subsequent residue in the insertion. In a profile-HMM, an insertion of x residues is modeled using state transitions from state ‘match’ to ‘insert’, from state ‘insert’ to ‘insert’, and from state ‘insert’ to ‘match’.
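The correspondence between the two scoring schemes can be made concrete with a small sketch. The two C functions below (names are illustrative assumptions) compute the classical affine score a + b(x − 1) and its profile-HMM counterpart expressed as a sum of log-probability transitions; in the HMM case each model position k carries its own transition values, which is what makes the penalty position-specific.

```c
/* Traditional affine gap score for an insertion of x residues:
 * a for the first residue, b for each subsequent one. */
double affine_gap_score(int x, double a, double b) {
    return a + b * (x - 1);
}

/* In a profile HMM, the same insertion is scored through transitions
 * M->I, then (x-1) times I->I, then I->M. With log-probabilities this
 * sum plays the role of a and b above, but the transition values are
 * specific to the model position where the insertion occurs. */
double hmm_gap_score(int x, double log_tMI, double log_tII, double log_tIM) {
    return log_tMI + (x - 1) * log_tII + log_tIM;
}
```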

Profile Hidden Markov Model Packages: There are several software packages implementing profile HMMs or HMM-like models. SAM [Hug96], HMMER [Edd10], PFTOOLS [BKMH96] and HMMpro [BCHM94] implement models based on the original profile HMMs of Krogh et al. [KBM+94], while PROBE [NLLL97] and BLOCKS [HPH98] assume different models, where alignments consist of one or more ungapped blocks separated by intervening random sequence blocks. In the next section, we discuss the HMMER tool suite in detail.


Figure 3.9: HMMER: (a) HMM Plan7 model, (b) MSV filter, (c) Example of MSV path in DP matrix. [Courtesy [Edd11a]]

3.2.3 The HMMER tool suite

One of the most commonly used programs for profile-HMM analysis is the open source software suite HMMER, developed at Washington University, St. Louis by Sean Eddy [Edd]. In comparison with other sequence alignment tools (e.g. BLAST and FASTA), HMMER intends to produce more accurate results and is able to detect more distant homologs, because it is based on a probability model instead of heuristic filters. Due to this additional sensitivity, HMMER was previously running about 100x slower than a comparable BLAST search. However, with HMMER3, this tool suite is now essentially as fast as BLAST [Edd10]. Figure 3.9a presents the Plan7 HMM model used by HMMER. The Plan7 HMM differs from the Krogh et al. HMM model in a few ways. The Plan7 HMM does not have D → I and I → D transitions, which reduces the transitions per node from 9 to 7, one of the origins of the name Plan7. Similarly, a feedback loop from state E, through J, to B can be seen, which isn't present in the Krogh et al. profile HMM model. The feedback loop gives HMMER the ability to perform multiple-hit alignments: more than one segment per sequence can be aligned to the core section of the model. The self-loop over J provides the separating sequence between two aligned segments. Figure 3.9c shows how a model can identify two high-scoring alignment segments, due to the presence of the feedback loop. In HMMER, the P7Viterbi kernel solves the Plan7 HMM model through the well-known Viterbi dynamic programming algorithm. The P7Viterbi kernel was the most time consuming kernel of HMMER. In order to accelerate the tool suite and to reduce the workload of P7Viterbi, a new heuristic filter (called Multiple ungapped Segment Viterbi, MSV) was designed, that feeds P7Viterbi with the most relevant sequences and filters out the redundant ones. The MSV filter, shown in Figure 3.9b, is an ungapped local alignment version of the P7Viterbi kernel, where delete and insert states

are removed. Figure 3.10 shows the execution pipeline of HMMER3, with the percentage of filtered query data at the input of each kernel. It can be seen that the MSV filter handles most of the input query requests, taking approximately 75% of the total execution time. The following sections discuss the P7Viterbi and the MSV algorithms in detail.

3.2.3.1 The Viterbi Algorithm

The architecture of the Plan7 model is shown in Figure 3.9a. The M (Match), I (Insertion) and D (Deletion) states constitute the core section of the model. States B and E are non-emitting states, representing the start and end of the model. The other states (S, N, C, T, J) are called “special states”. These special states, combined with entry and exit probabilities, control some algorithm-dependent features of the model. For example, they control the generation of different types of local and multi-hit alignments. The parameters are normally set by the user in order to specify an alignment style. The P7Viterbi algorithm is governed by the following equations:

$$M_i[k] = e_M(seq_i, k) + \max \begin{cases} M_{i-1}[k-1] + TMM[k] \\ I_{i-1}[k-1] + TIM[k] \\ D_{i-1}[k-1] + TDM[k] \\ B_{i-1} + TBM[k] \\ -\infty \end{cases} \quad (3.3)$$

$$I_i[k] = e_I(seq_i, k) + \max \begin{cases} M_{i-1}[k] + TMI[k] \\ I_{i-1}[k] + TII[k] \\ -\infty \end{cases} \quad (3.4)$$

$$D_i[k] = \max \begin{cases} M_i[k-1] + TMD[k] \\ D_i[k-1] + TDD[k] \\ -\infty \end{cases} \quad (3.5) \qquad E_i = \max \begin{cases} M_i[k] + TME[k] \\ -\infty \end{cases} \quad (3.6)$$

$$N_i = \max \begin{cases} N_{i-1} + t_{NN} \\ -\infty \end{cases} \quad (3.7) \qquad J_i = \max \begin{cases} E_i + t_{EJ} \\ J_{i-1} + t_{JJ} \\ -\infty \end{cases} \quad (3.8)$$

$$B_i = \max \begin{cases} N_i + t_{NB} \\ J_i + t_{JB} \\ -\infty \end{cases} \quad (3.9) \qquad C_i = \max \begin{cases} C_{i-1} + t_{CC} \\ E_i + t_{EC} \\ -\infty \end{cases} \quad (3.10)$$

Variables e_M, e_I, TMM, TIM, TDM, TBM, TMI, TII, TMD, TDD, TME are the transition memories (e.g. TIM[k] holds the transition value from state I to state M for column k), while t_{NN}, t_{EJ}, t_{JJ}, t_{NB}, t_{JB}, t_{CC}, t_{EC} are a set of constants. In Eq. (3.3) and Eq. (3.4), seq_i represents the current sequence character being aligned.

Figure 3.10: HMMER3 execution pipeline, with profiling data: the MSV filter (4M op. per base) sees 100% of the queries and takes about 75% of the execution time; it passes under 2% of them to the P7Viterbi filter (25M op. per base, 22% of execution time), which in turn passes about 1% to P7Forward (25M op. per base, 3% of execution time).


3.2.3.2 The MSV Kernel

As mentioned earlier, the main computation in the MSV kernel is a dynamic programming algorithm, computing only the match state (M_i[k], with i as the column index and k as the row index) together with boundary and special states. The values are computed iteratively, depending on values computed in the previous iteration, using the following equations:

$$M_i[k] = e_M(seq_i, k) + \max \begin{cases} M_{i-1}[k-1] \\ B_{i-1} + t_{BMk} \end{cases} \quad (3.11) \qquad E_i = \max_k \left( M_i[k], -\infty \right) \quad (3.12)$$

$$J_i = \max \left( J_{i-1} + t_{loop},\; E_i + t_{EJ} \right) \quad (3.13) \qquad C_i = \max \left( C_{i-1} + t_{loop},\; E_i + t_{EC} \right) \quad (3.14)$$

$$N_i = N_{i-1} + t_{loop} \quad (3.15) \qquad B_i = \max \left( N_i + t_{move},\; J_i + t_{move} \right) \quad (3.16)$$

Here, t_{loop}, t_{move}, t_{EJ}, t_{EC} and t_{BMk} are calculated based on the constant size of the current model and the length of the current input sequence. An MSV score is quite comparable to BLAST's score of one or more ungapped HSPs. Since the MSV score is computed directly by dynamic programming, and not by the heuristics used by BLAST (i.e. word hit and hit extension heuristics), it is claimed to be potentially more sensitive than BLAST's approach [Edd11a].
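A possible C rendering of one column of this recurrence is sketched below; the array layout, the in-place update of the special states and all names are assumptions made for illustration. Note that within a column the M_i[k] cells depend only on the previous column, which is precisely what makes the MSV kernel easier to parallelize than P7Viterbi.

```c
#include <math.h>

#define NEG_INF (-1.0e30f)

/* One column (sequence position i) of the MSV recurrence,
 * Eqs. (3.11)-(3.16). M_prev and M_cur hold rows k = 1..K of the match
 * scores; e[k] is the emission score e_M(seq_i, k) for the current
 * residue. E, J, C, N, B are passed in holding their values at i-1
 * and are updated in place. */
void msv_column(const float *e, const float *M_prev, float *M_cur, int K,
                float B_prev, float *E, float *J, float *C, float *N,
                float *B, float tloop, float tmove, float tEJ, float tEC,
                float tBMk)
{
    float Ei = NEG_INF;
    for (int k = 1; k <= K; k++) {
        float diag  = (k > 1) ? M_prev[k - 1] : NEG_INF; /* M_{i-1}[k-1] */
        float begin = B_prev + tBMk;                     /* B_{i-1} + tBMk */
        M_cur[k] = e[k] + fmaxf(diag, begin);            /* Eq. (3.11) */
        if (M_cur[k] > Ei) Ei = M_cur[k];                /* Eq. (3.12) */
    }
    *E = Ei;
    *J = fmaxf(*J + tloop, Ei + tEJ);    /* Eq. (3.13) */
    *C = fmaxf(*C + tloop, Ei + tEC);    /* Eq. (3.14) */
    *N = *N + tloop;                     /* Eq. (3.15) */
    *B = fmaxf(*N + tmove, *J + tmove);  /* Eq. (3.16) */
}
```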

3.2.4 Computational Complexity

Alignment methods can be compared on the basis of several criteria, as shown in Table 3.1. It is interesting to note that most of the global and local sequence alignment methods essentially have the same computational complexity of O(L_Q L_D), where L_Q and L_D are the lengths of the query and database sequences, respectively. Yet despite this, the algorithms have very different running times, with BLAST being the fastest and dynamic programming algorithms being the slowest. Using the statistically significant elimination of HSPs and words, BLAST significantly lowers the number of segments which need to be extended, thus making the algorithm faster than all the previous algorithms.

Method             Type      Accuracy     Search                Complexity
Needleman-Wunsch   Global    Exact        Dynamic Programming   O(L_Q L_D)
Smith-Waterman     Local     Exact        Dynamic Programming   O(L_Q L_D)
BLAST              Local     Approximate  Heuristic             O(L_Q L_D)
HMMER              Multiple  Approximate  Heuristic             O(L_Q L_D^2)
ClustalW           Multiple  Approximate  Heuristic             O(L_Q^2 L_D^2)

Table 3.1: Comparison of various sequence alignment methods, according to their type, accuracy, search method and the complexity [HAA11].

3.3 RNA Structure Prediction

RNA is a long chain molecule, although much shorter than DNA. Although an RNA is a linear sequence of the bases A, C, G and U, it exhibits intra-chain base pairing that produces structures known as secondary and tertiary structures, such as the one shown in Figure 3.11. Tertiary structures determine the biochemical activity of the RNA sequence.

(a) Secondary Structure of a tRNA. (b) The actual structure of a tRNA is a three-dimensional L shape.

Figure 3.11: An example of RNA secondary structure and its corresponding tertiary structure. [Courtesy [SEQc]]

Investigating such structures through X-ray diffraction or biochemical probes is extremely costly and time consuming. Biologists have therefore simplified the study of complex three-dimensional tertiary structures by focusing attention simply on which base pairs are involved in the secondary structure [ZS84]. A common problem for researchers working with RNA is thus to first predict the secondary structure of the molecule, in order to analyze the resulting tertiary structure. To predict secondary structures, algorithms based on dynamic programming compute the free energies of the different possible folded structures and attempt to find a structure with minimum free energy. The first algorithm, proposed by Nussinov et al. [NPGK78], tries to maximize the number of base pairs in the structure. Later on, Zuker and Stiegler [ZS81] refined this algorithm with a more accurate energy model. In this section we briefly describe these algorithms.


Figure 3.12: Nussinov data dependencies for the computation of X[1, 5], where N = 5. The cells pointed to by identical arrows are added together using the 4th term of Equation (3.17).

3.3.1 The Nussinov Algorithm

The Nussinov algorithm tries to maximize the number of base pairs in a given RNA sequence. The underlying assumption is that the higher the number of base pairs, the more stable the structure is. For a sequence S with N bases, the Nussinov algorithm attempts to maximize the base pair score X[i, j] of a folded structure of a subsequence S[i . . . j] using the following equation, as defined by Jacob et al. [JBC08]:

$$X[i,j] = \max \begin{cases} X[i+1, j] \\ X[i, j-1] \\ X[i+1, j-1] + \delta(i, j) \\ \max_{i<k<j} \left( X[i,k] + X[k+1,j] \right) \end{cases} \quad (3.17)$$

where δ(i, j) is 1 if bases S_i and S_j form a complementary base pair, and 0 otherwise.

After the matrix has been filled, the solution can be recovered by backtracking, beginning from the score of the best structure at X[1, N]. Sometimes there are several structures with the same number of base pairs; however, the backtracking algorithm only traces one of the best structures. The overall time complexity of this algorithm is O(N^3) and its space complexity is O(N^2).
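A minimal C sketch of the fill phase of Eq. (3.17) is given below (backtracking omitted); the pairs() predicate and the inclusion of G-U wobble pairs are illustrative assumptions. The three nested loops make the O(N^3) time and O(N^2) space complexities directly visible.

```c
#include <stdio.h>
#include <string.h>

#define MAXN 256

/* 1 if the bases form a Watson-Crick (or G-U wobble) pair, 0 otherwise. */
static int pairs(char a, char b) {
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
           (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
}

/* Fills X per Eq. (3.17) by increasing subsequence length and returns
 * the maximum number of base pairs (0-based indices in this sketch). */
int nussinov(const char *S) {
    int N = (int)strlen(S);
    static int X[MAXN][MAXN];
    memset(X, 0, sizeof X);

    for (int len = 2; len <= N; len++) {           /* subsequence length */
        for (int i = 0; i + len - 1 < N; i++) {
            int j = i + len - 1;
            int best = X[i + 1][j];                           /* i unpaired */
            if (X[i][j - 1] > best) best = X[i][j - 1];       /* j unpaired */
            int paired = X[i + 1][j - 1] + pairs(S[i], S[j]); /* i pairs j  */
            if (paired > best) best = paired;
            for (int k = i + 1; k < j; k++) {                 /* bifurcation */
                int split = X[i][k] + X[k + 1][j];
                if (split > best) best = split;
            }
            X[i][j] = best;
        }
    }
    return X[0][N - 1];
}

int main(void) {
    printf("max base pairs = %d\n", nussinov("GGGAAAUCC"));
    return 0;
}
```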

Figure 3.13: An example of an RNA folded into its secondary structure, showing different types of structural features: stacks, hairpin loops, bulges, internal loops, multi-loops and dangling ends. [Generated with Vienna RNA Websuite [GLB+08]]

3.3.2 The Zuker Algorithm

The Nussinov algorithm does not deal with pseudoknots, nor does it model most of the structural elements shown in Figure 3.13. Moreover, maximizing the number of base pairs is an overly simplistic criterion, which cannot give an accurate prediction.

The Zuker algorithm [ZS81] is also a dynamic programming algorithm; it works by identifying the globally minimal energy structure for a sequence. The Zuker algorithm is more sophisticated than the Nussinov algorithm, since for every structural element an individual energy is calculated, which then contributes to the overall energy of the structure.

For an RNA sequence S with N bases, the Zuker algorithm recursively computes three data variables W, V, and VBI according to the following equations, as defined by Jacob et al. [JBC10]:

$$W[i,j] = \min \begin{cases} W[i+1, j] + b \\ W[i, j-1] + b \\ V[i,j] + \delta[S_i, S_j] \\ \min_{i<k<j} \left\{ W[i,k] + W[k+1,j] \right\} \end{cases} \quad (3.19)$$

 ∞ if (S ,S ) is not a base pair  i j   eh(i, j) otherwise  V [i, j] = min V [i + 1, j − 1] + es(i, j) (3.20)   VBI[i, j]   min {W [i + 1, k] + W [k + 1, j − 1] + c}  i

$$VBI[i,j] = \min_{i < i' < j' < j} \left\{ V[i', j'] + ebi(i, j, i', j') \right\} \quad (3.21)$$

V[i, j] represents the minimum energy of a subsequence S_{i...j}, given that S_i and S_j form a base pair. The term eh represents a hairpin loop, es represents a stack, VBI represents an internal loop (or bulge), and W represents a multi-loop, as shown in Figure 3.13. The complexity is the same as for the Nussinov algorithm, namely O(N^3) time and O(N^2) space. Although the Zuker algorithm is a very powerful framework for RNA secondary structure prediction, it is still only an approximation of RNA folding, which involves more complicated processes and factors [Sch08]. The common tools in use for RNA secondary structure prediction are the Vienna RNA Web server [Hof03], RNAsoft [AAHCH03], pfold [KH03] and MFold [Zuk03].

3.4 High Performance Bioinformatics

The genetic sequence information in the National Center for Biotechnology Information GenBank database has nearly doubled in size every eighteen months, with more than 146 million sequence records as of December 2011 [NCB11]. Performing sequence alignment and homology searching at this large scale is thus becoming prohibitive. For example, Venter et al. [VAM+01] mention that the assembly of the human genome, from many short segments of sequence data, required approximately 10,000 CPU hours. Most of the tools currently being used in bioinformatics were not designed to deal with such enormous data sets, but for the very small databases of a few decades ago. As a result, tools which were adequate in the past are now very slow and incapable of a successful analysis. To cope with this problem, high performance computing techniques have been applied to bioinformatics algorithms. Many solutions have been presented during the last few years, involving a variety of computational frameworks, e.g. grid computing [CCSV04, SRG03, YHK09], cloud computing [QEB+09, MTF08, DTOG+10], SIMD-based algorithmic rewriting [Edd11b], and the usage of hardware accelerators such as GPUs [MV08, VS11, WBKC09b], FPGAs [HMS+07, SKD06, DQ07] and ASICs [GJL97, LL85]. BIOWIC [SEQa], a workflow for intensive bioinformatics computing, provides access to various acceleration platforms, consisting of clusters, FPGAs and GPUs, through a single framework interface.

3.5 Conclusion

The field of bioinformatics is going through a revolutionary phase. Advancements in bioinformatics improve our understanding of genes and of their function in diseases. This helps to identify new potential directions for the pharmaceutical industry. Although the field of bioinformatics is still at a developing stage, its importance is evident from the increasing dependence of modern biology and other related fields on it. Bioinformatics has shown real potential to lead and play a major role in future biological research. However, the traditional analysis tools in bioinformatics are unable to handle the exponential growth in the available digital biological data to be processed. This requires urgent attention from the parallel computing community, to develop platforms for bioinformatics that are fast enough to process the enormous biological databases in reasonable time and scalable enough to handle the constant expansion of datasets.

4 HLS Based Acceleration: From C to Circuit

The performance requirements of compute intensive applications, such as bioinformatics and image processing, have increased the demand for computation power. Many acceleration platforms, such as grid computing, cloud computing, GPUs, ASICs and FPGAs, are being actively used. In this research work, we focus on FPGA-based acceleration; FPGAs have been widely used as dedicated accelerators to satisfy the computation requirements of such applications. In this chapter, we discuss the characteristics of FPGA-based acceleration in general and show how FPGAs are well suited for bioinformatics applications. In Section 4.3 we support the idea of High-Level Synthesis based FPGA design, considering its fast and error-free design flow in comparison with the traditional RTL-based design flow. Section 4.4 discusses in detail the various steps involved in the automatic generation of an RTL design from abstract specifications, such as a C program.

4.1 Reconfigurable Computing

Reconfigurable computing (RC) has received a lot of attention from the research community, due to its potential to accelerate a wide variety of applications and its ability to fill the gap between hardware and software designs. It is believed that reconfigurable computing can achieve potentially much higher performance than a software design by performing computations in hardware, while retaining much of the flexibility of a software solution in comparison with rigid hardware designs such as ASICs (Application Specific Integrated Circuits). Reconfigurable devices, including Field-Programmable Gate Arrays (FPGAs), consist of an array of computational elements, usually known as logic blocks. The functionality of these logic blocks is programmable through configuration bits. The routing resources that connect these logic


blocks are also programmable. Thus the configurable logic blocks and routing resources together implement the functionality of the mapped application. The history of FPGAs dates back to the 1970s, with the commercial development of programmable logic array (PLA) devices. The first commercially successful FPGA was developed by Xilinx in 1985. Since then, FPGAs have increased considerably in logic capacity and clock speed. This in turn has increased the trend of using FPGAs for high performance computing applications. FPGAs have been extensively used for prototyping ASICs, which contain fixed hardware configurations whose implementation cannot be altered or updated if a bug is found. With their increasing logic density, FPGAs are now also a low-cost alternative to ASICs for mapping complex applications. Similarly, with their reprogrammability and massive parallel processing, FPGAs have emerged as a viable alternative to microprocessors. Indeed, FPGAs have brought the distinct domains of hardware and software close together and emerged as a platform combining the advantages of both ASICs and microprocessors [MoH05]. A general comparison between FPGAs and other technologies (ASICs and general purpose CPUs) shows that FPGAs are an intermediate option for different design goals, such as speed, power, flexibility and cost.

Speed: FPGAs are inherently parallel and well suited to take advantage of fine-grain parallelism. They operate at very low clock frequencies compared to CPUs, but they can sometimes perform tens of thousands of calculations per clock cycle and thus outperform CPUs in speed. The increased logic capacity of FPGAs has also added the ability to accommodate large complex algorithms, which wasn't possible for first-generation FPGAs. In terms of speed, the flexibility of programmable logic always results in a slower implementation, and an ASIC-based design can easily outperform an FPGA. It is acknowledged that an ASIC is typically 3 to 10 times faster than an FPGA for the same level of technology [KR07]. So, it can be concluded that FPGAs generally provide better speed than CPUs, but hardly perform better than ASICs.

Power: CPUs operate at high clock frequencies (1.5–2 GHz), which results in high power dissipation levels. Because FPGAs operate at low clock speeds, they consume only tens of watts of power while providing better speed. However, FPGAs cannot compete with ASICs on power consumption: FPGAs have been reported to consume 9–12 times more power than an ASIC-based design [KR07]. The main causes of this large disparity in power consumption are the increased capacitance and the larger number of transistors that must be switched.

Cost: The initial design and production cost of an FPGA unit is much lower than for an ASIC, since the non-recurring engineering (NRE) cost of an ASIC can reach millions of dollars. NRE is the one-time cost corresponding to the design and test of a new chip.


Figure 4.1: (a) An island-style FPGA architecture. The logic blocks (LBs) implement part of the application functionality, and the interconnects connect different LBs to implement the complete logic. (b) A simplified Altera Stratix logic element.

For lower production volumes, FPGAs can be more cost-effective than an ASIC design. However, high-volume production of custom circuit units can lower the cost per unit.

Flexibility: FPGAs are highly flexible due to the reconfigurable nature of the device, contrary to ASICs, where the functionality is fixed at the time of production. This is why FPGAs have been extensively used to prototype ASIC designs, in order to estimate performance and, more importantly, to fix bugs before tape-out. The reconfigurability of an FPGA can also be utilized to run different versions of a hardware architecture for different input data sets, as shown in Section 7.3.5.

Comparison with multi-core: As the demand for computing resources increases, central processing unit (CPU) development has relied on a combination of increased clock speed and changes in the microarchitecture to improve instruction-level parallelism. However, there is a growing performance gap between the capabilities of the microprocessor and the data transfer rate of memory. To fill this gap, techniques such as caching, pipelining, out-of-order execution, and branch prediction have been pushed to their limits. These microarchitectural changes have come at a cost, as running a system with extra circuitry and at a higher frequency consumes more power and dissipates more heat. As a result, CPU designers have moved toward multicore and many-core technologies to improve performance while keeping thermal dissipation within manageable limits. Unfortunately, multi-core architectures have not really addressed the memory bottleneck issue and have introduced a completely new set of problems. On the other hand, FPGAs, with reduced power consumption and heat dissipation,

can provide parallel access to on-chip memory banks and can address part of the memory bottleneck issue in multi-core systems. Similarly, many applications, such as bioinformatics applications for sequence matching, do not require the full integer precision available on modern CPUs, so exploiting this fact on FPGAs may result in a better utilization of the available resources.

4.2 Accelerators for Biocomputing

Sequence homology detection (or sequence alignment) is a pervasive compute operation carried out in almost all bioinformatics sequence analysis applications. However, performing this operation at a large scale is becoming prohibitive due to the exponentially growing sequence databases, doubling about every eighteen months [NCB11]. Bioinformatics applications such as Smith-Waterman (SW), Needleman-Wunsch (NW) and HMMER are very good candidates for hardware acceleration for several reasons:

1. Fine-grain Parallelism: SW, NW and a simplified version of HMMER (without the feedback loop) offer significant opportunities for acceleration through fine-grain parallelism, as anti-diagonal cells have no dependency on each other and can be computed in parallel.

2. Coarse-grain Parallelism: In practice, SW and NW are often executed by aligning a query sequence against a database. In such a scenario, multiple pairwise alignments can be done by performing several independent alignments of the input sequence against several sequences from the database in parallel. This coarse-grained level of parallelism achieves linear speedup. Similarly, HMMER aligns a database of sequences against a profile, and multiple sequences can be aligned in parallel on an FPGA.

3. Bit-width Optimization: A biological sequence is a string of characters, and on an FPGA a character can be represented, and subsequently operated on, with a reduced number of bits (e.g. 5 bits are sufficient to represent the 24 distinct characters in protein sequences), as the sketch below illustrates. This bit-level optimization enables substantial acceleration of alignment computations and permits the implementation of more coarse-grained processing nodes.
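A minimal sketch of this packing, assuming 5-bit residue codes and a 64-bit word; the names and the choice of 12 residues per word are illustrative. On an FPGA the same idea translates into narrow datapaths and wide parallel comparisons.

```c
#include <stdint.h>

/* With a protein alphabet of at most 32 symbols, 5 bits per residue
 * suffice; 12 residues fit into one 64-bit word. */
typedef uint64_t packed_seq_t;

/* Packs 12 pre-encoded 5-bit residue codes into a single word. */
packed_seq_t pack12(const uint8_t codes[12]) {
    packed_seq_t w = 0;
    for (int i = 0; i < 12; i++)
        w |= (packed_seq_t)(codes[i] & 0x1F) << (5 * i); /* 5 bits each */
    return w;
}

/* Extracts the i-th residue code from a packed word. */
uint8_t unpack(packed_seq_t w, int i) {
    return (uint8_t)((w >> (5 * i)) & 0x1F);
}
```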

The above reasons show that these applications are very well suited for FPGA-based acceleration. However, most of these algorithms were originally developed by biologists as software applications operating on very small datasets, focusing entirely on the accuracy of the results. In order to accelerate these algorithms, a fair amount of effort is required to first make them amenable to parallel implementation, and then to map them from software to hardware. A variety of hardware mapping techniques need to be experimented with to ensure an efficient utilization of the underlying architecture. Traditionally, FPGA designers would develop a behavioral register transfer level (RTL) description of the required circuit using a hardware description language (HDL), such as

VHDL or Verilog. So far, most of the acceleration efforts for bioinformatics applications have been made at the RTL level. Although the results have provided very good speedups, designing at the RTL level is laborious. Since RTL designs are often architecture specific, it is usually hard to retarget such designs. Additionally, at the HDL level designs become larger and more complicated, and it becomes more difficult to manage this complexity. Any modification in such complex designs is error-prone and may force a very long verification period. To meet these challenges, High-Level Synthesis (HLS) tools have been developed to make system-level design easier.

4.3 High Level Synthesis

Logic synthesis is the process of generating circuit implementations from descriptions of elementary circuit components. High-Level Synthesis (HLS) refers to circuit synthesis from an algorithmic or behavioral description. The input behavioral description is usually a program written in a high-level language (HLL), e.g. C, MATLAB, SystemC or any other imperative language. The generated circuit, as output, is an RTL design consisting of a datapath and a control unit in HDL.

4.3.1 Advantages of HLS over RTL coding

As noted by Michael Fingeroff [Fin10] about RTL: “We are still trying to develop 4G broadband modems with CAD tools whose principles date back from mid-90s, when GSM was a focus of research, or trying to design H264 decoders with languages used to design VGA controllers”. The steadily growing design sizes and increasing complexity of applications amplify the reasons to design with HLS. Here, we discuss a few important motives for adopting HLS over RTL.

High productivity: Design abstraction is generally helpful for the designer in order to control design complexity and improve design productivity. A study from NEC [Wak04] shows that a 1M-gate design typically requires about 300K lines of RTL code, compared with about 40K lines of code in a high-level design specification language. This means the productivity per line of code for a behavioral description is about 7.5× that of RTL coding.

Verification Efforts: With increasing application complexity, the designer's tools should also evolve. Designing modern applications through RTL will eventually trigger bugs, and problems will arise during the verification phase. In such a scenario, HLS addresses the root cause of this problem by enabling an abstract description of the design as input and providing an error-free path from functional specification to RTL. This simplifies the design verification phase.

IP reuse: RTL designs are targeted to a specific technology, and retargeting is usually done by compromising power, performance and area. Even small changes to existing RTL IP to create a derivative can be a tedious job, and a complete rewrite might be required to obtain reasonable performance [Fin10]. On the other hand, with a high-level specification, the design sources become truly generic. The functional specifications in HLS can be easily targeted to available technologies, and similarly, new functionalities can be added to existing IP and verified at the abstract level.

Architecture Exploration: High-level synthesis tools can provide quick feedback on the performance, area and power of the design, which allows designers to explore the different available architectures and finally pick the best one. As estimations are done at an abstract level, designers can explore different design partitioning opportunities, i.e. with very low effort compared to RTL design, designers can explore which parts of the application should be accelerated in hardware and which parts are better suited for a software implementation. High-level synthesis allows the designer to experiment with a variety of code transformations, individually or in combination, to observe their effect on circuit performance criteria. The code transformations range from bit-level to instruction- and loop-level transformations (discussed in detail in the next chapter).

Instruction-level transformations: Besides simple algebraic transformations, such as constant folding, constant propagation and common subexpression elimination, High-Level Synthesis tools perform transformations such as operator strength reduction and tree-height reduction. In the operator strength reduction transformation, some operations are replaced with a sequence of less expensive operations in order to reduce the area cost and operation delay. Similarly, height reduction rearranges the order of arithmetic operations by exploiting their commutative, associative and distributive properties. Tree-height reduction (THR) organizes the operations in the form of a tree to reduce the latency of the resulting hardware. In the best case, THR reduces the height from O(n) to O(log n), where n represents the number of computations in the expression [CD08].
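A small example of what THR does to a four-input addition (function names are illustrative). Both versions compute the same sum, but the second exposes two independent additions that an HLS scheduler can map to two adders in the same cycle:

```c
/* Before THR: the chain ((a+b)+c)+d has height 3, so a naive schedule
 * needs 3 dependent adder steps. */
int sum_chain(int a, int b, int c, int d) {
    return ((a + b) + c) + d;
}

/* After THR: (a+b)+(c+d) has height 2; the two inner additions are
 * independent and can execute in parallel, exploiting associativity. */
int sum_tree(int a, int b, int c, int d) {
    return (a + b) + (c + d);
}
```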

Loop-level transformations: Generally, the most time consuming parts of a program are within loops. The goal of loop transformations is to enable parallel execution, to improve data reuse and data locality (helping the memory hierarchy), or to help the compiler implement software pipelining. High-level synthesis provides the ability to quickly observe the impact of loop transformations on the resulting hardware implementation.
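As a simple illustration, the hedged sketch below shows a reduction loop before and after unrolling by 4 with separate accumulators, a transformation that shortens the loop-carried dependence chain; the names and unroll factor are arbitrary, and N is assumed to be a multiple of 4.

```c
#define N 1024

/* Original loop: the loop-carried dependence on 'acc' serializes
 * the additions, one per iteration. */
int sum(const int x[N]) {
    int acc = 0;
    for (int i = 0; i < N; i++) acc += x[i];
    return acc;
}

/* After unrolling by 4 with separate partial sums: the four
 * accumulator chains are independent, so an HLS tool can schedule
 * the four additions of one iteration in parallel and reduce the
 * partial sums at the end. */
int sum_unrolled(const int x[N]) {
    int a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    for (int i = 0; i < N; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    return (a0 + a1) + (a2 + a3);
}
```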

4.4 HLS Design Steps

4.4.1 Compilation

The compilation stage transforms the input code into a formal intermediate representation. During this stage, the compiler validates the syntax of the input program, applies syntactic and code transformations, e.g. function inlining, dead-code elimination, false data dependency elimination, constant folding and loop transformations [CGMT09], and then transforms the input program into an intermediate representation. The optimizations can be architecture-independent transformations (e.g. elimination of redundant memory accesses) or architecture-dependent transformations (e.g. array data partitioning). The most popular intermediate representations for high-level synthesis are data flow graphs (DFGs) and control-data flow graphs (CDFGs). However, some HLS tools use additional representations, e.g. SPARK [GGDN04] uses Hierarchical Task Graphs (HTGs) [GP92].

4.4.2 Operation Scheduling

Operation scheduling is one of the most important tasks in high-level synthesis. By scheduling operations in a design, the speed/area tradeoffs can be met; hence, an inappropriate schedule can prevent the system from exploiting its full potential. The input to a scheduler is a data flow graph (DFG) or control-data flow graph (CDFG). Operation scheduling techniques can be classified into two categories, namely resource constrained and time constrained. Given a DFG, a clock speed constraint, a total number of available resources and their associated delays, a resource-constrained scheduler tries to minimize the number of clock cycles needed to execute the set of operations in the DFG. Conversely, a scheduler with a timing constraint attempts to minimize the number of resources needed for a given number of clock cycles.

ASAP & ALAP: The simplest scheduling task consists in trying to minimize the latency in the presence of unlimited computing resources. In this scenario, an operation is scheduled as soon as all of its predecessors have completed, which gives the method its name, As Soon As Possible. The ASAP schedule provides a lower bound on the overall application latency. For a given latency, As Late As Possible scheduling tries to schedule each operation at the latest possible time step. The result of such a scheduling provides an upper bound on the starting time of each operation. The schedules derived from ASAP and ALAP together provide the scheduling range of each operation. Such a scheduling range is often used as an initial exploration step in more advanced scheduling methods [WGK08]. For instance, Hwang et al. [HLH91] and Lee et al. [LHL89] restrict the search space for the schedule of each operation using both ASAP and ALAP scheduling and use ILP to minimize the resource cost. In Figure 4.2, ASAP and ALAP schedules define the earliest and latest scheduling steps for each operation, also known as the mobility of an operation. For instance, operation 5 can be

Figure 4.2: An illustration of ASAP and ALAP scheduling. [Example borrowed from [PK87]]

scheduled at step 1 or step 2, and operations 8 and 10 can be scheduled between steps 1 and 3, which limits the scheduling search space for the subsequent ILP. In list-based scheduling, the mobility information is also used as a priority function for operation scheduling [Gaj92].
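The following self-contained C sketch computes ASAP, ALAP and mobility for the DFG of Figure 4.2, whose dependency edges are taken from the precedence relations (4.8)-(4.9) given later in this section; the fixed-point relaxation used here is an illustrative choice that is adequate for a small acyclic graph.

```c
#include <stdio.h>

#define NOPS   11  /* operations of the Figure 4.2 example */
#define NEDGES 8

/* Dependency edges (pred -> succ), 1-based operation ids. */
static const int edge[NEDGES][2] = {
    {1,3}, {2,3}, {3,4}, {4,7}, {5,6}, {6,7}, {8,9}, {10,11}
};

int main(void) {
    int asap[NOPS + 1], alap[NOPS + 1];
    int latency = 4; /* schedule length given by the ASAP result */

    /* ASAP: push each successor after its predecessors
     * (a longest-path relaxation over the acyclic graph). */
    for (int i = 1; i <= NOPS; i++) asap[i] = 1;
    for (int pass = 0; pass < NOPS; pass++)
        for (int e = 0; e < NEDGES; e++)
            if (asap[edge[e][0]] + 1 > asap[edge[e][1]])
                asap[edge[e][1]] = asap[edge[e][0]] + 1;

    /* ALAP: the same relaxation backwards from the last step. */
    for (int i = 1; i <= NOPS; i++) alap[i] = latency;
    for (int pass = 0; pass < NOPS; pass++)
        for (int e = 0; e < NEDGES; e++)
            if (alap[edge[e][1]] - 1 < alap[edge[e][0]])
                alap[edge[e][0]] = alap[edge[e][1]] - 1;

    /* Expected output: op 5 has mobility 1 (steps 1-2), ops 8 and 10
     * have mobility 2 (steps 1-3), matching the text. */
    for (int i = 1; i <= NOPS; i++)
        printf("op %2d: ASAP %d, ALAP %d, mobility %d\n",
               i, asap[i], alap[i], alap[i] - asap[i]);
    return 0;
}
```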

List-Based Scheduling: List scheduling is one of the most popular methods for resource-constrained scheduling. At each time step, list scheduling selects and schedules operations from a list of ready operations, based on their priority, as long as resources are available. A ready operation is one whose predecessors have all been scheduled at previous time steps. A ready operation which isn't scheduled, due to low priority and resource constraints, is deferred to later time steps. A newly scheduled operation may turn some other non-ready operations into ready ones, which are then added to the priority list. In the absence of resource constraints, list scheduling generates an ASAP schedule.

(a) DFG, (b) List scheduling with mobility, (c) List scheduling by [SD02]

Figure 4.3: List scheduling with different priority functions, with 2 multipliers and 1 adder.

The priority list is sorted according to a priority function; thus the quality of a list-based schedule largely depends on its priority function. A common priority function can

be the mobility of an operation, i.e. lower mobility yields a higher priority. However, a scheduler based only on mobility has no further decision information for operations with equal priority, and an incorrect ordering may generate a sub-optimal schedule. Sllame and Drabek [SD02] use mobility as the main priority function, and for operations that have equal priority the scheduler selects the operations that belong to the same path, using tree-id information collected in a preprocessing phase. Figure 4.3 shows a comparison of both priority functions, where a time step has been saved by giving priority to C over B, as it belongs to the same path as A. Similarly, Beidas et al. [BMZ11] performed register-pressure-aware list scheduling. In order to reduce the total number of required registers, they try to minimize the live range cut, i.e. the number of live values after each time step. The live range values are captured through an extended data flow graph, and the scheduler tries to minimize the sum of the lengths of the live range edges in the DFG. SPARK [GGDN04] uses priority-based global list scheduling, where the priority function tries to minimize the longest delay within the design. Priorities are assigned to each operation based on their distance from the primary outputs of the design: the output operations carry a priority of zero, operations whose results are read by output operations have a priority of one, and so on.

ILP: Integer Linear Programming (ILP) expresses the scheduling problem through a mathematical description and tries to solve the problem by minimizing or maximizing an objective cost function (e.g. resources or time). For a time-constrained schedule, a formulation can be derived as:

$$\min \left\{ \sum_{k=1}^{m} C_{t_k} \cdot M_{t_k} \right\} \quad (4.1)$$

subject to

$$\sum_{j=E_i}^{L_i} x_{i,j} = 1 \quad \text{for } 1 \le i \le n \quad (4.2)$$

where M_{t_k} is the number of resources of type t_k and C_{t_k} is the associated cost of the resource unit. The variable x_{i,j} is 1 only if operation o_i is scheduled at step j, and 0 otherwise; j is bounded by E_i ≤ j ≤ L_i, where E_i and L_i are the earliest and latest scheduling time steps for operation o_i, derived from the ASAP and ALAP schedules [LHL89].

In order to limit the number of resources to M_{t_k} at each time step, an additional constraint is:

$$\sum_{i=1}^{n} x_{i,j} \le M_{t_k} \quad (4.3)$$

The data and control dependencies between two operations o_i and o_k, i.e. o_i → o_k, can

be enforced by:

$$\sum_{j=E_i}^{L_i} (j \cdot x_{i,j}) - \sum_{j=E_k}^{L_k} (j \cdot x_{k,j}) \le -1 \quad (4.4)$$

The above set of equations defines the scheduling problem, and minimizing Equation (4.1) leads to the optimal schedule for the given problem. However, the complexity of the ILP formulation increases exponentially with the number of time steps, i.e. a unit increase in time steps leads to n additional x variables. This factor rapidly increases the algorithm execution time and limits the applicability of ILP to very small examples [WGK08].

Force-directed scheduling: Force-directed scheduling (FDS) [PK89] is a popular heuristic for time-constrained scheduling. The goal is to reduce the total number of functional units by distributing similar operations over all available time steps. The uniform distribution of operations leads to high utilization and low idle time for each resource. The algorithm is iterative, scheduling one operation at each iteration by balancing the distribution of operations within each time step. Similar to ILP and list-based scheduling, FDS relies on ASAP and ALAP to determine the time frame of each operation, from which the probability of scheduling an operation in a given time step is derived. An operation has a uniform probability of being scheduled into any time step within its mobility period, and zero outside this period. This probability is 1/(L_i − E_i + 1), where L_i and E_i are the latest and earliest possible time steps for operation o_i respectively; e.g. in Figure 4.2 operation 8 has a probability of 1/3 for time steps 1 to 3, and 0 for step 4. The next step is to construct the distribution graph by summing the probabilities of each type of operation for each time step. The distribution graph DG of operation type t_k for step i can be expressed as:

$$DG(i) = \sum_{t_k} Prob(op_n, i) \quad (4.5)$$

For the example previously discussed in Figure 4.2, Figure 4.4 shows the time frame of each operation and the DG for the multiplication operation. The DG provides an expected operator cost, i.e. the maximum of DG over all steps i. For example, the operator cost for the above example is 2.83 × Cost_mul. The FDS algorithm attempts to minimize this cost by distributing the probability over all possible states. The algorithm computes the force, or impact, of scheduling an operation at one of the possible time steps j as:

$$F(j) = \sum_{i=E_i}^{L_i} DG(i) \cdot x(i) \quad \text{where } E_i \le j \le L_i \quad (4.6)$$

(a) The dotted arrows show the mobility of an operation. (b) The bar graph shows the probability of scheduling a multiplication operation in each time step.

Figure 4.4: Distribution graphs for multiplication operation

where x(i) corresponds to the change in probability caused by scheduling the operation at step i; e.g. scheduling operation 5 at step 1 results in x(1) = 0.5 and x(2) = −0.5. A positive F(j) corresponds to an increase in operation concurrency and a negative one to a decrease; hence, for efficient sharing of functional units across all states, a negative F(j) is desired. For instance, scheduling operation 5 at step 1 gives F(1) = 0.25, due to the higher probability at step 1 in the DG, while scheduling it at step 2 gives F(2) = −0.75, which is more effective. FDS schedules an operation based on the already computed partial schedule and the force associated with the current operation, and iterates until all operations are scheduled.
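The hedged sketch below reproduces these numbers for the multiplier of Figure 4.2: it builds the distribution graph of Eq. (4.5) from the mobility windows of the six multiplications and evaluates the force of Eq. (4.6) for operation 5 at step 1. It should print DG = 2.83, 2.33, 0.83, 0.00 and F(1) = 0.25, matching the text.

```c
#include <stdio.h>

/* Scheduling probability of an operation at step s, given its
 * mobility window [E, L]: uniform inside, zero outside. */
static double prob(int E, int L, int s) {
    return (s >= E && s <= L) ? 1.0 / (L - E + 1) : 0.0;
}

int main(void) {
    /* Mobility windows of the multiplications (ops 1, 2, 3, 5, 6, 8):
     * 1 and 2 fixed at step 1, 3 at step 2, 5 in [1,2], 6 in [2,3],
     * 8 in [1,3]. */
    int E[] = {1, 1, 2, 1, 2, 1};
    int L[] = {1, 1, 2, 2, 3, 3};
    int nops = 6, nsteps = 4;

    /* Distribution graph of the multiplier, Eq. (4.5). */
    double DG[5] = {0};
    for (int s = 1; s <= nsteps; s++)
        for (int o = 0; o < nops; o++)
            DG[s] += prob(E[o], L[o], s);
    for (int s = 1; s <= nsteps; s++)
        printf("DG(%d) = %.2f\n", s, DG[s]);

    /* Force of fixing operation 5 (index 3) at step 1, Eq. (4.6);
     * x(i) is the change in its probability caused by the decision. */
    double F1 = 0.0;
    for (int s = E[3]; s <= L[3]; s++) {
        double x = ((s == 1) ? 1.0 : 0.0) - prob(E[3], L[3], s);
        F1 += DG[s] * x;
    }
    printf("F(1) for op 5 = %.2f\n", F1);
    return 0;
}
```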

Constraint based scheduling: Constraint Programming (CP) is increasingly used as a problem-solving tool for scheduling problems, due to its ability to define the problem precisely in the form of constraints, to compute the solution domain, and to select an optimal solution using a variety of algorithms [BPN01]. Kuchcinski [Kuc03] used CP over finite domains for scheduling and resource assignment in high-level designs. His prototype, JaCoP, combines a constraint solver, which can find different optimal and suboptimal solutions, with optimization algorithms. According to Kuchcinski, a constraint satisfaction problem (CSP) is a 3-tuple S = (V, D, C), where V is a finite set of variables (FDVs), D is a finite set of finite domains (FDs), one for each FDV, and C is a set of constraints defining the values that can be assigned to each variable. Here, we recall some of the scheduling functions defined by Kuchcinski [Kuc03], in order to show how constraint-based scheduling can be useful. The scheduling problem for a partially ordered task graph can be formulated by defining, for each task, FDVs for its starting time T, its delay D and its assigned resource R. The precedence relations can then be modeled by inequality constraints:

for each (i, j) ∈ Dependencies
impose Ti + Di ≤ Tj    (4.7)

For instance, in Figure 4.2 the precedence relations can be written as:

T1 + 1 ≤ T3 T2 + 1 ≤ T3 T5 + 1 ≤ T6 T8 + 1 ≤ T9 (4.8)

T10 + 1 ≤ T11 T3 + 1 ≤ T4 T4 + 1 ≤ T7 T6 + 1 ≤ T7 (4.9)

Similarly, resource sharing can be allowed only between tasks whose executions do not overlap. The following constraint ensures that a resource is never shared by two tasks simultaneously:

for each (i, j) where i ≠ j
impose Ti + Di ≤ Tj ∨ Tj + Dj ≤ Ti ∨ Ri ≠ Rj    (4.10)

In order to implement the resource constraint, the cumulative constraint can be used to express the limit on the available resources at any time instant. In the following constraint, ARi is the amount of resources required by task i and Limit is the amount of available resources:

cumulative([T1, ··· ,Tn], [D1, ··· ,Dn], [AR1, ··· , ARn], Limit) (4.11)

Furthermore, we can find an optimal solution within the solution domain by defining a design goal in terms of a cost function. A cost function can be the schedule length, the number of resources or the power consumption. For example, the schedule length can be expressed with the maximum constraint:

for each i

impose Ei = Ti + Di

impose maximum(EndTime, [E1, ··· ,En]) (4.12)

By minimizing the variable EndTime, the shortest schedule can be found. The above examples show the capability of constraint-based schedulers to precisely define the design dependencies, the design constraints and the cost functions for the design goals. More complex scheduling constraints, e.g. for pipelined or chained operations, can also be specified.
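To make the formulation concrete, the following self-contained C sketch, a naive exhaustive search rather than a real finite-domain solver such as JaCoP, enumerates start times over a four-step horizon, checks the precedence constraints of Equations 4.8 and 4.9 with unit delays, and reports the minimum EndTime of Equation 4.12:

#include <stdio.h>

#define N_TASKS 11
#define HORIZON 4

/* Precedence pairs (i, j): Ti + Di <= Tj, from Equations 4.8-4.9. */
static const int prec[][2] = {
    {1,3},{2,3},{5,6},{8,9},{10,11},{3,4},{4,7},{6,7}
};
static const int n_prec = sizeof prec / sizeof prec[0];

static int T[N_TASKS + 1];   /* start time of each task */
static int best = 1 << 30;   /* minimum EndTime found   */

static void search(int task) {
    if (task > N_TASKS) {
        int end = 0;         /* EndTime = max(Ti + Di), with all Di = 1 */
        for (int i = 1; i <= N_TASKS; i++)
            if (T[i] + 1 > end) end = T[i] + 1;
        if (end < best) best = end;
        return;
    }
    for (int t = 1; t <= HORIZON; t++) {
        T[task] = t;
        int ok = 1;
        /* check every precedence constraint that is already decidable */
        for (int p = 0; ok && p < n_prec; p++)
            if (prec[p][0] <= task && prec[p][1] <= task &&
                T[prec[p][0]] + 1 > T[prec[p][1]])
                ok = 0;
        if (ok) search(task + 1);
    }
}

int main(void) {
    search(1);
    printf("minimum EndTime: %d\n", best);  /* 5: the tasks fit in 4 steps */
    return 0;
}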

Pipelining: Pipelining [Lam88] is one of the most effective techniques for improving design throughput when a set of instructions is executed iteratively and little parallelism is available to execute the instructions concurrently. Most of the scheduling techniques discussed above can generate schedules for pipelined data flow graphs. Loop pipelining increases the throughput of a circuit defined by a loop, by initiating the next iteration of the loop before the current iteration has completed. The overlapped execution of subsequent iterations takes advantage of the parallelism across loop iterations. The amount of overlap is characterized by the Initiation Interval.

Figure 4.5: Impact of loop pipelining on the resulting architecture, for a loop body with 3 computation blocks or instructions: (a) an unpipelined loop: each loop iteration starts after the completion of the previous iteration, and only one computation block is active at any clock cycle; (b) the pipelined architectural view of the loop: the loop iterations are completely overlapped and II=1; the computation blocks are now separated by storage resources, and all of them are active at each clock cycle, operating on different loop iterations.

Loop pipelining goes through three phases: the prolog, when new iterations are entering the pipeline and no iteration has completed yet; the kernel, when the pipeline is in steady state, i.e. new iterations are entering while previous iterations are completing; and the epilog, when the pipeline is being flushed and no new iteration is started. Maximum instruction-level parallelism is obtained during the kernel phase; hence pipelining is most beneficial when most of the execution time is spent in this phase. The Initiation Interval (II) is the number of cycles between the starts of two successive iterations; thus II=1 means a new iteration is started every clock cycle, as shown in Figure 4.5. The demand for computation resources increases with the number of overlapping operations, and the demand for storage resources grows with the number of pipeline stages.
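The benefit of pipelining can be estimated with a standard back-of-the-envelope model: the first result takes the full pipeline depth (the prolog), after which one iteration completes every II cycles. A small C illustration, with purely illustrative numbers:

#include <stdio.h>

/* Total cycles of a pipelined loop of n iterations. */
static long pipelined_cycles(long n, long depth, long ii) {
    return depth + (n - 1) * ii;
}

int main(void) {
    /* 1000 iterations through the 3-stage loop body of Figure 4.5 */
    printf("II=1: %ld cycles\n", pipelined_cycles(1000, 3, 1)); /* 1002 */
    printf("II=3: %ld cycles\n", pipelined_cycles(1000, 3, 3)); /* 3000 */
    return 0;
}

With II=3 the total matches the unpipelined case of Figure 4.5a, where only one block is active per cycle; with II=1 the loop approaches one result per cycle.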

4.4.3 Allocation & Binding

Allocation defines the type and number of resources needed to satisfy the design constraints. The resources include computational, storage and connectivity components. These components are selected from the RTL component library. The RTL library also includes the characteristics (e.g. area, delay and power) of each component, which are used to construct an early estimate of the total area and cost of the design. The binding stage maps the variables and operations of the scheduled CDFG onto functional, storage and connectivity units. Every operation needs to be bound to a functional unit that can execute it. If a variable is used

across several time steps, a storage unit also needs to be bound to hold the data values during the variable lifetime. Finally, a set of connectivity components are required for every data transfer in the schedule. Scheduling, allocation and binding phases are deeply interrelated and decisions made in one phase impose constraints on the following phases. For instance, if a scheduler decides for a concurrent execution of two operations, then the binding phase will be constrained to allocate separate resources for these operations. Many research works perform the three tasks simultaneously [KAH97, CFH+05, LMD94] and others usually take the scheduled DFG as an input for the allocation and binding phase [RJDL92, SDLS11]. The allocation and binding steps are usually also formulated as an optimization problem, where the main goal is to minimize the number of resources (or the overall resource cost) while fulfilling the area/performance constraints.

ILP: Rim et al. and Landwehr et al. used ILP-based formulations to achieve optimal allocation and binding solutions [RJDL92, LMD94]. The binding problem can be defined with the following constraints:

\sum_{j=1}^{Res} B_{i,j} = 1 \quad \text{for } 1 \le i \le Ops \qquad (4.13)

\sum_{i=1}^{Ops} B_{i,j} \cdot S_{i,k} \le 1 \quad \text{for } 1 \le j \le Res;\; 1 \le k \le Time \qquad (4.14)

The first constraint ensures that each operation is assigned to exactly one resource, while the second imposes that at most one operation can be scheduled on a resource during any time step. The ILP-based problem definition can be extended in many ways to include other, more complex parameters. For instance, in order to reduce the wiring area, the following cost can be minimized [RJDL92]:

\sum_{j_1=1}^{Res} \sum_{j_2=1}^{Res} Cost_{j_1,j_2} \cdot Transfer_{j_1,j_2} \qquad (4.15)

Where Transfer_{j1,j2} ∈ {0, 1} is 1 if there is a value transfer from resource j1 to j2, and Cost_{j1,j2} is the associated cost of connecting those resources.

Compatibility Graphs: Allocation and binding can be defined as a graph problem, namely finding cliques in a compatibility graph. A compatibility graph expresses the resource sharing opportunities: two operations are compatible if they can be executed on the same resource and belong to different time steps of the schedule. A compatibility graph is a graph G(V, E) where the set V represents the operations and E is a set of edges representing compatibility among operations. To solve the binding problem using a compatibility graph, we have to find maximal sets of compatible operations by formulating a maximal clique partitioning problem, where

Figure 4.6: Binding example: (a) scheduled DFG, (b) binding results from WBM [HCLH90], (c) binding results from CPB [KL07], (d) binding results from [DSLS10]. [Example borrowed from [DSLS10]]

a clique is defined as a subgraph whose nodes are all connected to each other. A clique is said to be maximal if it is not contained in any other clique. An early work on clique partitioning is presented by Tseng et al. [TS86]. More recent binding algorithms [KL07, DSLS10, SDLS11] adopt a modified form of compatibility graph, the weighted and ordered compatibility graph (WOCG), where an edge u → v with weight wuv represents that operations u and v are compatible and that u is scheduled earlier than v. The weight wuv represents the strength of the compatibility. Kim and Liu [KL07] try to reduce the interconnect cost by reducing the number of multiplexer inputs. In their work, wuv captures whether there is a flow dependency between u and v and how many inputs the two operations have in common. This leads to binding operations holding a dependency or common inputs to the same functional unit. The weight is calculated as:

Wuv = αFuv + Nuv + 1 (4.16)

Fuv is a boolean variable which indicates whether there is a flow dependency between u and v, and Nuv is the number of common inputs of the two vertices. The coefficient α is used to tune the binding criteria. After generating WOCGs for all FU types, the algorithm iteratively searches for the longest path in the WOCG, removes its operations and associated edges from the graph, and binds the operations of the path to a single FU. Figures 4.6c and 4.7c show the binding results of their algorithm and a comparison with the results of other algorithms. Dhawan et al. extended this work with a modification of the operation compatibility criteria [DSLS10]. The modified weight function is given as:

Wuv = αFuv + βNuv + γRuv + 1 (4.17)

Where Fuv and Nuv have the same definitions as above. Ruv is a boolean variable that represents the possibility for operations u and v to store their output variables in the same register: Ruv is 1 if the lifetimes of the output variables do not overlap, and 0 otherwise. Figure 4.6d shows the binding results of this algorithm and the difference it makes with respect to the CPB results.
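The weight functions of Equations 4.16 and 4.17 are straightforward to compute from the scheduled DFG. A C sketch, with hypothetical data structures (two-input operations, output lifetimes given as step intervals):

struct op {
    int in[2];                 /* identifiers of the two input values */
    int out_birth, out_death;  /* lifetime of the output variable     */
};

/* Nuv: number of inputs shared by u and v. */
static int common_inputs(const struct op *u, const struct op *v) {
    int n = 0;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            if (u->in[i] == v->in[j]) n++;
    return n;
}

/* Wuv of Equation 4.17; with beta = 1 and gamma = 0 this degenerates
   to the weight of Equation 4.16. flow_dep (Fuv) is 1 when v consumes
   the result of u. */
static int weight(const struct op *u, const struct op *v, int flow_dep,
                  int alpha, int beta, int gamma) {
    /* Ruv = 1 when the output lifetimes do not overlap, i.e. the two
       outputs can share one register. */
    int r = (u->out_death <= v->out_birth) || (v->out_death <= u->out_birth);
    return alpha * flow_dep + beta * common_inputs(u, v) + gamma * r + 1;
}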

The longest path-based approach proposed by Kim et al. and Dhawan et al. [KL07, DSLS10] reduces the MUX input count, but can result in a design where a few FUs suffer from a large MUX input count. Sinha et al. [SDLS11] proposed an algorithm that divides the operations equally among the FUs. This may result in an increase in the number of MUXes, but due to the smaller MUX input count, a better critical path delay can be achieved. They use a weight relation similar to [DSLS10], but instead of following a path-based approach, they formulate an upper bound on the number of operations per FU. The upper bound is based on the maximum admissible FU+MUX delay when operations are equally distributed among the available FUs. Figure 4.7 shows a comparison of binding results with the existing techniques discussed earlier. The longest path search binds the last add operation to the left-side FU, which results in a MUX with 4 inputs, shown in Figure 4.7c. Shifting this operation to the right-side FU reduces the maximum MUX input count to 3, shown in Figure 4.7d.


Figure 4.7: Binding Example from [SDLS11]: (a) Scheduled DFG, (b) Binding results from WBM [HCLH90], (c) Binding results from CPB and ECPB [KL07, DSLS10], (d) Binding results from [SDLS11]. [Example borrowed from [SDLS11]]

4.4.4 Generation

During the RTL generation step, the decisions made by the previous steps, i.e. scheduling, allocation and binding, are applied to produce a synthesizable RTL model. A finite state machine (FSM) implements the scheduling decisions and controls which operations are executed in which state. The RTL design inside each state can be generated with different levels of binding detail. For example, Figure 4.8 shows different types of generated RTL code for an addition operation in state n of the FSM. Binding details can be completely omitted, partially assigned, e.g. specifying the storage location of each variable, or completely assigned, binding variables to storage locations and operations to specific operators. The binding tasks that are not performed in the generated design are carried out in the logic synthesis step that follows HLS.

4.5 High Level Synthesis Tools: An Overview

This section briefly discusses some state-of-the-art HLS tools, both commercial and academic.

-- Without any binding:
state(n): a = b + c; goto state(n+1);

-- With storage binding:
-- RF represents a storage location
state(n): RF(1) = RF(2) + RF(3); goto state(n+1);

-- With storage and FU binding:
-- RF represents a storage location and ALU is a functional unit
state(n): RF(1) = ALU1(+, RF(2), RF(3)); goto state(n+1);

Figure 4.8: RTL description written with different binding details. [example borrowed from [CGMT09]]

4.5.1 Impulse C

Impulse C from Impulse Accelerated [HLSa] uses an extended version of the standard ANSI C language and is based on the communicating sequential processes (CSP) programming model. An Impulse C program consists of processes and streams, where processes are concurrently operating segments of the application, and data flows from process to process through streams, constructed with FIFOs. A distinct configuration function instantiates all processes and connects them together. Impulse C provides predefined unsigned and signed integer data types for bit widths ranging from 1 to 64. However, the compiler does not include any bit-width analysis (discussed in the following chapter) for resource optimization. Besides ANSI C functions, Impulse C permits embedding custom hardware functions written in HDL languages. Impulse C also provides Platform Support Packages (PSPs) for various target platforms to simplify the creation of mixed software/hardware applications, by providing the necessary hardware/software interfaces for both the hardware (FPGA) and software (microprocessor) elements of the platform [PT05]. Impulse C provides pragmas for loop unrolling and pipelining. The limitations involved in Impulse C based hardware design are:

− Unrolling: Loop unrolling is limited to for loops with constant bounds.

− Pipelining: pipelining can only be applied to the inner-most loop of a loop nest. This can lead to inefficient solutions when dependencies cross multiple iterations of the innermost loop, or when the innermost loop has few iterations. In the latter case, the pipeline will mainly be in a prolog or epilog stage throughout the execution time.

− Partial unrolling: Partial unrolling, and subsequently memory partitioning, is not supported. These are among the most important transformations to expose adequate parallelism when a complete unroll is not possible due to resource constraints or non-constant loop bounds (a hand-written equivalent is sketched after this list).

− Array Configuration: Impulse C does not support automatic configuration of local array variables, i.e. an array being accessed twice per cycle will not be automatically configured as dual-port. The designer is supposed to manually configure each array variable with an array configuration function.

− Non-Recursive Memory Access: When an array is accessed multiple times, non-recursively, in a pipeline, e.g.: for(i=8;i>0;i--){ A[i+1]=A[i]; } the Impulse C compiler conservatively assumes false dependencies between the array accesses and will not allow parallel read and write operations on the same memory bank, hence increasing the Initiation Interval of the pipeline. In order to suppress these dependencies, the designer needs to state explicitly that there is no aliasing between memory accesses, i.e. that two addresses only known at runtime will never refer to the same memory location. Impulse C allows the non-recursive accesses of such memories to be specified with the help of a #pragma, e.g. #pragma CO NONRECURSIVE A will allow parallel access to array A. However, if the designer erroneously declares a memory non-recursive, the data inside the pipeline will be corrupted without any compiler warning.

− Loop Transformations: most of the loop transformations discussed in the following chapter have to be implemented manually.
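As a workaround for the missing partial unrolling, the designer can rewrite the loop by hand and split the array into banks so that the parallel reads do not conflict. A sketch (array and accumulator names are hypothetical, and N is assumed even):

#define N 1024

/* A[] has been manually split into two banks so that both reads can
   be issued in the same cycle:
   A_even[i] holds A[2*i], A_odd[i] holds A[2*i+1]. */
int sum_unrolled_by_2(const int A_even[N/2], const int A_odd[N/2]) {
    int acc0 = 0, acc1 = 0;
    for (int i = 0; i < N / 2; i++) {
        acc0 += A_even[i];
        acc1 += A_odd[i];
    }
    return acc0 + acc1;   /* recombine the partial sums */
}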

4.5.2 Catapult C

Catapult C, developed by Mentor Graphics [HLSb], is one of the most prominent tools on the market, holding a 50% share of the HLS market [HLS09a]. It supports both ANSI C++ and SystemC. Catapult C uses Mentor Graphics' Algorithmic C bit-accurate data types, and it also tries to optimize the bit width of variables declared wider than required. Catapult C supports full and partial loop unrolling for all kinds of loops, i.e. the inner-most loop or a loop containing a nested loop, including partial unrolling for loops with parametrized bounds. Similarly, pipelining is not restricted to the inner-most loop, in contrast with Impulse C. Loop merging can also be carried out automatically, if allowed by the designer. Partial loop unrolling combined with memory partitioning (in either block or interleaved manner) can be useful to express parallelism in scenarios where resources are limited or a full unroll is not desired. The designer can derive several solutions from the same input design, and Catapult C encourages the designer to iteratively improve the generated design by tuning different design constraints. However, one of the

limitations observed is that Catapult C takes a long time to schedule very large designs, which hinders iterative design refinement. In terms of loop pipelining, the data dependency analysis in Catapult C is rather conservative, enforcing false memory access dependencies and thus generating pipelines with low throughput. On the other hand, Catapult C allows nested loop pipelining with unknown loop bounds, but it may then generate a false schedule, and subsequently a corrupt design, a problem highlighted by Morvan et al. [MDQ11]. This arises, for example, for a loop nest whose bounds are only known at run time, as sketched below.
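A minimal sketch of the problematic pattern, with hypothetical names; the inner trip count n[i] is only known at run time, which is exactly the case where nested loop pipelining is attractive but where the schedule may be unsound [MDQ11]:

void kernel(int m, const int n[], int *data[]) {
    for (int i = 0; i < m; i++)
        /* bound n[i] unknown at compile time */
        for (int j = 0; j < n[i]; j++)
            data[i][j] = 2 * data[i][j];
}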

4.5.3 MMAlpha

MMAlpha is an academic HLS tool developed at IRISA, Rennes. It is aimed at compiling parallel circuits from high-level specifications. MMAlpha generates systolic-like architectures that are well suited to the regular computations found in signal processing applications. It manipulates and transforms Alpha programs; Alpha is a functional language developed for the synthesis of regular architectures from recurrence equations [LVMQ91]. The transformations implemented in MMAlpha are based on the research work on automatic synthesis of systolic arrays by Quinton and Robert [QR91]. Figure 4.9 shows the design flow with MMAlpha. A computation-intensive part of the original program (usually nested loops) is translated into an Alpha program (either manually or using automatic translators), which serves as input to MMAlpha. The initial specification is translated into an internal representation in the form of recurrence equations. Analyses based on the polyhedral model are used to check properties of the input equations translated from the original loop nests. The translation into the polyhedral model can be helpful to perform most of the loop transformations discussed in the next chapter, in order to expose parallelism. MMAlpha generates a hardware description at the Register Transfer Level, along with a C code to interface with the rest of the software program. Scheduling is performed by solving an integer linear problem using parametrized integer programming [DRQR08]. MMAlpha also allows the automatic derivation of an HDL test-bench for simulation and test purposes [GQRM00].

Figure 4.9: Design flow with MMAlpha: square shaded boxes represent programs in the various languages (C, Alpha, VHDL), and round boxes represent transformations (uniformization, scheduling/mapping, HDL derivation); the generated hardware targets the FPGA while the derived C code runs on the host. [Courtesy [DR00]]

4.5.4 C2H

C2H [HLS09b] by Altera Corporation is integrated into the development flow of the NIOS II embedded software development (EDS) tools. C2H enables the creation of a custom hardware accelerator (described as an ANSI-C function) that communicates with the NIOS II processor and other modules through the Avalon system interconnect fabric and FIFOs. C2H performs a rather direct syntactic translation to hardware and lacks most automatic code optimizations. For example, a scalar variable is mapped to a register and an array to a local memory, with no memory reorganization. The arithmetic and logical operations are mapped to hardware without resource sharing. Similarly, a very long expression is implemented by chaining these resources; C2H does not cut such paths into smaller ones with intermediate storage. C2H offers loop pipelining that is not limited to the innermost loop, but can also be applied to nested loops. A more detailed review can be found in the thesis work of Alexandru Plesco [Ple10].

4.6 Conclusion

In the last decade, FPGAs have greatly increased their computational power, with more computational resources operating at higher speed. This has led to the trend of using FPGAs for high-performance computing. Bioinformatics applications are good candidates for FPGA-based acceleration, since they can benefit from fine and coarse-grain parallelism on FPGAs. Similarly, bioinformatics applications usually require low bit-width representations, and an FPGA-based implementation of such an application can benefit from bit-width optimizations. Bit-level optimizations may help to reduce the required resources for a design and may increase the clock speed. Similarly, the reduced resource requirement may help to increase coarse-grain parallelism. However, the traditional hardware description languages (HDLs) are not efficient for designing today's large and complex circuits. HDL-based implementations are time consuming, architecture specific and highly error-prone, and the laborious verification phase becomes the bottleneck of the design cycle. High-Level Synthesis tools address these limitations by automating this manual and error-prone design path. The designer provides the abstract design specification and the tool can generate the error-free

HDL design automatically. Since the design specification is abstract and truly generic, it can easily be re-targeted to a different platform, and any changes to an existing design are easily manageable. A design description at the abstract level can lead to many possible hardware implementations. In the next chapter we show how different code transformations can help to generate an efficient hardware design.

5 Efficient Hardware Generation with HLS

The third generation of HLS tools has achieved reasonable success in comparison with previous generations. One of the several reasons for this success is that the tools can take advantage of research on compiler optimizations, rather than relying solely on HDL-driven improvements. Another strong reason is the rise of FPGAs over the same time period [MS09]. High-level synthesis targeting FPGAs quickly maps an algorithm to hardware and helps enormously to reduce the time-to-market. Although current HLS tools deliver a reliable quality of synthesis results, the outcome largely depends on the input functional description. The input language used by most HLS tools is a variant of the C language. An important point here is that C is not being used as a programming language, but as a sort of circuit description language. For efficient hardware generation, the designer needs to understand how an abstract description will be translated into hardware. Here, by efficient hardware generation, we mean an effort to maximize parallelism using minimum resources. As a correct functional description does not guarantee that an optimal hardware will be found, the designer needs to keep the target hardware in mind. This chapter covers the most common code transformations, which might help a designer to improve the quality of the synthesis results.

5.1 Bit-Level Transformations

In order to be consistent with RTL data types, many HLS tools provide predefined bit-accurate data types instead of the standard HLL data types. In this section we discuss some common bit-level transformations.


(a) for(int i=0; i<=31; i++){ ... }
(b) for(unsigned_int<5> i=0; i<=31; i++){ ... }

Figure 5.1: Bit-width narrowing example: (a) standard data type, (b) bit-accurate data type.

Figure 5.2: Data range propagation analysis with Bitwise: the forward propagation is denoted with ↓ and the backward propagation with ↑. The control information is used to collect the value ranges of the variables in each branch, and the backward analysis makes use of the array bound information to tighten the bounds. [Example borrowed from [SBA00]]

5.1.1 Bit-Width Narrowing

For a numerical data type, the declared bit-width should be consistent with the actual required bit-width in order to store the data correctly. In Figure 5.1a, the loop control variable is 32 bits wide, although only 5 bits are required to accommodate the possible values of i, i.e. 0 to 31. HLS tools allow the designer to explicitly define the bit-width of variables, as shown in Figure 5.1b. However, this feature still transfers the responsibility for an accurate bit-width analysis to the programmer. A few HLS compilers try to optimize the control structures, but most of the time it remains the responsibility of the programmer. An alternative approach is based on automatic bit-width analysis inside the HLS compiler. With such an analysis, the number of bits required to represent a variable can be estimated. Static bit-width analysis has been implemented in a variety of ways. Budiu et al. [BSWG00] provide a compiler algorithm that tries to determine the possible values of each individual bit. They formulate the problem as a data flow analysis, and propagate possible bit-width values both forward and backward iteratively. Another variant of bit-width analysis is value range analysis, which studies the data range of each variable and ensures that the design has enough bits to accommodate the range. Most of the research work on range analysis is based on Interval Analysis, invented by Moore in the 1960s [Moo66]. The Bitwise [SBA00] compiler performs the data range propagation both forward and backward over a program control

for(i=0; i<1024; i++){
  for(j=i; j<=i+2; j++){
    ... = x[j-i];
  }
}

Figure 5.3: Bit-width analysis under the polyhedral model: (a) a sample C code, (b) iteration domain of the memory read index k = j − i for the array access x[j-i].

flow graph, where all variables are in SSA form. The control-dependent information allows a more accurate collection of data ranges (see Figure 5.2). Similarly, the array bound information can be utilized in the backward analysis. In the example of Figure 5.2, a constant propagation replaces all occurrences of a3 with the constant 0. Value propagation analysis for operations inside loop bodies is carried out by finding a closed-form solution for each strongly connected component. Another very powerful bit-width analysis technique is based on the polyhedral model [MKFD11], where the iteration domain of each statement is taken into account. In Figure 5.3a, a naive interval analysis for k can lead to the range <−1024, 1024>, although from the iteration domain representation in Figure 5.3b it can be seen that the range of k is <0, 2>.

5.1.2 Bit-level Optimization

Bit-wise operations are used extensively in many application domains, e.g. cryptography and telecommunication. Bit-level optimization is an attempt to simplify the logic functions with traditional boolean minimization techniques. For example, the simple operation r = a|b&1 can be naively translated into an AND gate followed by an OR gate. However, based on the constant integer value 1, the compiler can simplify it to a 1-bit OR gate for the lowest bits of a and b, and simply wire the remaining bits of a to the result r. Another expressive example is bit reversal (Figure 5.4), where by fully unrolling the loop the compiler may greatly simplify the design. It can be noticed that bit-wise operations are awkward to express in C, using load/shift/mask/store instructions, but since hardware directly supports accessing and storing individual bits, such optimizations are easy at the RTL level. Zhang et al. [ZZZ+10] proposed a new intermediate representation for bit-wise operations, named bit-flow graph (BFG), which contains only basic logical, shift and conversion operations. They convert a simple DFG node into several BFG nodes, each one representing a bit operation of the original graph. After the BFG construction, redundant operations can be eliminated and the BFG can be converted back to the extended DFG for further processing.

for(i=0; i<32; i++){
  wordRev |= ((word>>i)&1) << (31-i);
}

Figure 5.4: Optimized implementation of the bit reversal function: (a) C code for the bit reversal function, (b) optimized implementation in hardware using only wires.

Algebraic Simplification                 Original                          Simplified
Multiplication/division by 1/−1          −1 × x                            −x
Multiplication by 0                      0 × x                             0
Addition of 0                            0 + x                             x
Common Subexpression Elimination         a = b × c + d; e = b × c + g      t = b × c; a = t + d; e = t + g
Constant Folding                         3 + (6 × (5 − 2))/2               12
Constant Propagation                     x = 4, y = x + 7                  x = 4, y = 11

Table 5.1: Simple Algebraic Transformations

5.2 Instruction-level transformations

Simple algebraic transformations, e.g. common subexpression elimination, constant folding and constant propagation, simplify the program and may improve the timing and reduce the resource usage of the design. The most common simplifications are listed in Table 5.1. In this section we discuss three key transformations which can be very helpful to expose parallelism for HLS.

5.2.1 Operator Strength Reduction

Strength reduction is an optimization technique where the compiler replaces expensive operators with a sequence of less expensive ones. The replacement can be performed either to reduce the time/area cost of the design, or to substitute a more suitable operator available in the hardware. For example, FPGA chips usually include dedicated blocks for particular bit-width multipliers, multiply-accumulate operators and FIR filters, and a compiler can perform strength reduction for a specific FPGA to utilize these accelerators. A typical example of strength reduction is the replacement of 2 ∗ x by x << 1, as a shift operation is less costly than a multiplication. Generally, multiplications and divisions by constants can be replaced with combinations of shift and add operations [MPPZ87], e.g. 5 ∗ x can be replaced by x + (x << 2). Strength reduction can also simplify the memory addressing inside a loop body, as shown in Figure 5.5: during each iteration, element a[i] is accessed by multiplying i by 4, but this multiplication can obviously be replaced by an addition.

(a) Original code:
i=0;
while(i<100){
  a[i] = 0;
  i = i+1;
}

(b) Symbolic IR before OSR:
i = 0;
L0: t1 = i < 100;
    IfZero(t1) Goto L1;
    t2 = 4*i;
    t3 = arr + t2;
    *(t3) = 0;
    i = i+1;
    Goto L0;
L1:

(c) Symbolic IR after OSR:
i = 0;
t0 = arr;
L0: t1 = i < 100;
    IfZero(t1) Goto L1;
    *(t0) = 0;
    t0 = t0 + 4;
    i = i+1;
    Goto L0;
L1:

Figure 5.5: Operator strength reduction for memory accesses: (a) original code, (b) symbolic IR before OSR, (c) symbolic IR after OSR.

OSR Scenario                           Original                                   Simplified
Multiplication by a constant           7 × x                                      (x << 3) − x
Multiplication by a power of 2         2 × x                                      x << 1
Division by a power of 2               ⌊x/2⌋                                      x >> 1
Square to multiplication               x²                                         x × x
Induction variable                     for(i=0;i<100;i++) j=i*15;                 for(i=0;i<100;i++) j=j+15;
Constant accumulation of form 2^N      Y[31:0] = X[31:0] + 2^16 (32-bit adder)    Y[15:0] = X[15:0]; Y[31:16] = X[31:16] + 1 (16-bit adder)

Table 5.2: Operator Strength Reduction
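The entries of Table 5.2 translate directly into C. A compiler performs these rewrites automatically, but they can also be applied by hand when a tool misses them:

unsigned times2(unsigned x) { return x << 1; }        /* 2*x            */
unsigned times5(unsigned x) { return x + (x << 2); }  /* 5*x = 4*x + x  */
unsigned times7(unsigned x) { return (x << 3) - x; }  /* 7*x = 8*x - x  */
unsigned halve (unsigned x) { return x >> 1; }        /* floor(x/2)     */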

5.2.2 Height Reduction

A well-known technique to increase ILP is height reduction, in which the compiler exploits the commutativity, associativity and distributivity of arithmetic operations to reduce the number of time steps required to compute an expression. The objective is to reduce the height of the expression tree by balancing the operation nodes, where the height represents the number of cycles required to compute the expression. Figure 5.6 shows how rearranging the additions reduces the data path height by one cycle. In tree-height reduction (THR), the compiler tries to minimize the expression tree height by exploiting the commutativity and associativity of arithmetic operations. A summation of N inputs can be reduced to log2(N) steps, forming a binary tree, as can be seen in Figure 5.7. However, such an optimization is only beneficial when the memory bandwidth can support the parallel data accesses. In Figure 5.7, if the memory can only be accessed once per cycle, then the THR transformation will produce a hardware implementation that is worse, in terms of required number of cycles, than the original one, since the memory accesses would have been pipelined otherwise.

Figure 5.6: Height reduction: (a) x = a + b ∗ c + d, (b) x = (a + d) + (b ∗ c).

for(i=0; i<4; i++){
  sum += a[i];
}

Figure 5.7: Tree height reduction: the associativity of addition allows the summation to be computed in parallel; in this manner, N steps can be reduced to log2(N) steps, provided the input data is available: (a) original code, (b) before THR, (c) after THR. [Example borrowed from [CD08]]

Exploiting the distributivity may increase the number of operations in an expression, but in some cases it may break dependencies, leading to a better schedule, or it may reduce the required resources in a resource-sharing scenario. In Figure 5.8, distributivity helps to remove the self-inserted dependency of the addition before the multiplication for x, which results in utilizing two multipliers instead of the three of the original version. Similarly, distributivity can expose common subexpressions in a set of expressions [CD08]. For instance, applying the distributivity transformation to the following example:

x = a ∗ (b + c + d) −→ x = a ∗ b + a ∗ c + a ∗ d
y = a ∗ (b − e) + a ∗ d −→ y = a ∗ b − a ∗ e + a ∗ d

reveals that the subexpression a ∗ b + a ∗ d is common to both expressions and can be computed one cycle earlier.

Figure 5.8: Exploiting distributivity: (a) x = a ∗ (b + c) + d and y = e ∗ (g + h) + e ∗ (b − c) before the transformation, (b) x = a ∗ b + a ∗ c + d and y = e ∗ (g + h) + e ∗ (b − c) after it. The transformation implements x and y with 2 multipliers and 2 add/sub blocks, whereas the original code needed 3 multipliers.

5.2.3 Code Motion

Code motion changes the order of execution of the instructions of a program in order to optimize both area and timing. It is often helpful for expressions inside a loop body whose value does not change from iteration to iteration; this is known as loop-invariant code motion. For instance, in Figure 5.9b, the computations c+d and b*b can be hoisted before the loop, as their values are constant throughout the loop iterations. This kind of code motion results in a shorter schedule for the loop body. However, code motion can sometimes increase the latency of a loop when memory accesses are moved [CD08]. For example, in Figure 5.9c the access to array c is loop-invariant for loop j, but assuming that memories b and c can be read in the same cycle, the loop-invariant code motion does not affect the schedule length. In Figure 5.9d, moving c[i] outside the inner loop reduces the number of memory accesses to c but increases the latency of the loop, as b and c will now be accessed in separate cycles for the first iteration of j. Code motion techniques, e.g. speculation and reverse speculation, have been extensively studied for control-flow branches in order to expose instruction-level parallelism across conditional blocks [GDGN03, RFJ95]. Code motion can be applied to move a code segment:

− upward from inside a branch block to a split block, i.e. speculation.

− downward from a split block to a branch block, i.e. reverse speculation.

− upward from a join block to branch blocks, i.e. conditional speculation or duplicating-up.

(a) Original code:
for(i=0; i<100; i++){
  b = c + d;
  a[i] = 2*i + (b*b);
}

(b) Transformed code:
b = c + d;
tmp = b*b;
for(i=0; i<100; i++){
  a[i] = 2*i + tmp;
}

(c) Original code:
for(i=0; i<100; i++){
  for(j=0; j<100; j++){
    a[i][j] = b[j]*c[i];
  }
}

(d) Transformed code:
for(i=0; i<100; i++){
  tmp = c[i];
  for(j=0; j<100; j++){
    a[i][j] = b[j]*tmp;
  }
}

Figure 5.9: Code motion examples: (a) original code, (b) transformed code: moving the loop-invariant computations c+d and b*b outside the loop shortens the critical path length, (c) original code: arrays b and c are mapped to disjoint memories and can be accessed during the same clock cycle, (d) transformed code: moving c[i] outside reduces the number of memory accesses to c, but increases the loop latency.


− upward across the conditional block, i.e. useful motion [RFJ95]. Figure 5.10 shows all four types of speculative code motion.

Figure 5.10: The various types of conditional block code motion: speculation (from a branch block up to the split block), reverse speculation (from the split block down into a branch block), duplicating-up (from the join block up into the branch blocks) and useful motion (upward across the whole conditional block).

Speculative code motion may shorten the critical path at the price of extra resources, as shown in Figure 5.11. Speculative code motion can also help to eliminate a common subexpression across a conditional block by moving the common expression upward, when the same expression exists inside one of the branch blocks and in the join block.

Figure 5.11: Speculative code motion shortens the critical path at the price of extra resources.

5.3 Loop Transformations

Since most applications spend a considerable amount of their execution time in loops, loop-level transformations are likely to generate abundant opportunities for parallelism. In the context of HLS from a C program, a loop transformation can be specified with the help of pragmas, e.g. loop unrolling, and most of them can be applied automatically. However, in the absence of a key transformation, or if the HLS tool generates an imperfect implementation, the designer may have to expose the parallelism himself in order to obtain an efficient RTL design from the HLS tool. A good understanding of these transformations is therefore essential to use HLS tools efficiently.

5.3.1 Unrolling

Loop unrolling is the most common transformation used to exhibit parallelism in the architecture. It replicates the statements inside the loop body, and in this manner provides the possibility to execute several loop iterations concurrently. This kind of execution is also known as doacross concurrent loop scheduling, where processor P1 executes the first iteration, P2 the second iteration, and so on [Wol90]. Loops with constant bounds can theoretically be fully unrolled to achieve maximum parallelism. The replication of the instructions increases the required computational and storage resources, but it also reduces the control overhead of the loop iterations. The concurrent execution of several iterations requires a higher data bandwidth, first to read the input data for all iterations being executed in parallel and finally to write back the computed results. Thus, loop unrolling might not be a profitable transformation in situations where data dependencies restrict the concurrent execution or where the available data bandwidth is

unable to sustain the increased pressure. Loop unrolling may also help the compiler to detect expressions subject to THR, when loop-carried dependencies are related to computations holding associative and commutative properties. After unrolling, such computations may be detected and optimized through THR.

(a) Initial loop:
for(i=0; i<4; i++){
  sum += A[i]*B[i];
}

(b) Fully unrolled loop:
sum += A[i]*B[i];
sum += A[i+1]*B[i+1];
sum += A[i+2]*B[i+2];
sum += A[i+3]*B[i+3];

Figure 5.12: Impact of loop unrolling on the resulting architecture: (a) initial loop, (b) fully unrolled loop, (c) architecture of the original rolled loop, (d) architecture of the fully unrolled loop after THR. [Example borrowed from [CD08]]

5.3.2 Loop Interchange

Loop interchange is the process of switching the inner and outer loops. It can be used to expose parallelism, to improve data locality and to reduce memory traffic. Memory traffic can be reduced by fetching the operands from memory at the beginning of the loop and reusing the data throughout the execution of the loop [Wol90]. A similar technique has been used for power saving, by minimizing the number of operand changes at a functional unit input [MC95]. Figures 5.13 and 5.14 show how interchange can help to expose common subexpressions, and how it can be used to move a loop without intra-dependency inward for unrolling.

5.3.3 Loop Shifting

Loop shifting is a form of circuit retiming, a well-known hardware technique based on relocating registers to reduce combinational rippling [LS91]. The transformation shifts the instructions of the loop in order to overcome the data dependencies within a loop iteration.

Figure 5.13: Loop interchange example: reducing the memory traffic with the help of loop interchange and CSE: (a) original code: array B is accessed N×N times, (b) reduced memory traffic: the B operand is invariant in the inner loop and can be kept in a register for the duration of the inner loop.
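A sketch consistent with the caption of Figure 5.13 (the array names and the operation are assumptions):

#define N 128

/* (a) original: B[j] is re-fetched from memory N*N times */
void before_interchange(int A[N][N], const int B[N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = A[i][j] * B[j];
}

/* (b) after interchange, B[j] is invariant in the inner loop
   and can live in a register for the whole inner loop */
void after_interchange(int A[N][N], const int B[N]) {
    for (int j = 0; j < N; j++) {
        int b = B[j];
        for (int i = 0; i < N; i++)
            A[i][j] = A[i][j] * b;
    }
}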

Figure 5.14: Loop interchange example: exposing parallelism by loop interchange, so that the inner loop can be parallel: (a) original code: inner loop unrolling is not beneficial due to the dependence i → i−1, (b) parallelism: loop interchange moves the dependency from the inner to the outer loop, making the innermost loop fully parallel.
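A sketch consistent with the caption of Figure 5.14 (again, names and the operation are assumptions):

#define N 128

/* (a) the inner loop carries the dependence i -> i-1 */
void inner_dependent(int A[N][N], const int B[N][N]) {
    for (int j = 0; j < N; j++)
        for (int i = 1; i < N; i++)
            A[i][j] = A[i-1][j] + B[i][j];
}

/* (b) after interchange the inner (j) loop is fully parallel */
void inner_parallel(int A[N][N], const int B[N][N]) {
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = A[i-1][j] + B[i][j];
}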

Figure 5.15 shows a code fragment where the instructions inside the loop body cannot be executed in parallel, due to the data dependency from array a to d through b; hence a loop unroll would not be beneficial. However, a simple retiming reduces the latency to 2 cycles.

(a) Original code:
for(i=1; i<100; i++){
  a[i] = f1(c[i-1]);
  b[i] = f2(a[i]);
  c[i] = f3(a[i]);
  d[i] = f4(b[i]);
}

(b) Shifted code:
a[1] = f1(c[0]);
for(i=1; i<100; i++){
  b[i] = f2(a[i]);
  c[i] = f3(a[i]);
  d[i] = f4(b[i]);
  if(i<99) a[i+1] = f1(c[i]);
}

Figure 5.15: Loop shifting example: the dependency a[i] → b[i] → d[i] is shifted inside the loop body; the dashed arrows show the dependency for the next loop iteration: (a) original code: the loop body requires at least 3 cycles to complete, (b) loop shifting: d[i] and a[i+1] can be computed in parallel.

5.3.4 Loop Peeling

The loop peeling transformation helps to eliminate dependencies established by early loop iterations, which restrict parallelization, by moving these early iterations out of the loop. The original code in Figure 5.16 contains a self-dependency for the first iteration, a[1]. By moving this iteration out of the loop, the loop body has no self-dependency and can be executed completely in parallel.

(a) Original code:
for(i=1; i<100; i++){
  a[i] = a[1] + b[i];
}

(b) Transformed code:
a[1] = a[1] + b[1];
for(i=2; i<100; i++){
  a[i] = a[1] + b[i];
}

Figure 5.16: Loop peeling removes the self-dependency of the loop body: (a) original code, (b) transformed code.

5.3.5 Loop Skewing

In a situation where the inner-most loop contains an intra-loop dependency, loop unrolling may not be beneficial. The loop skewing transformation helps to expose the parallelism by overlapping the outer loop iterations of such a loop nest, as shown in Fig. 5.17. In this example, loop skewing exposes the anti-diagonal parallelism of the loop nest, and the inner loop can then be completely unrolled and executed in parallel. The loop skewing transformation has been extensively used to parallelize bioinformatics algorithms like Smith-Waterman [SW81], Needleman-Wunsch [NW70] and simplified HMMER [OSJM06]. In the presence of diagonal dependencies only, loop skewing can be helpful to construct rectangular tiling (discussed later in Section 5.3.8).
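As an illustration of the transformation, consider a classical 2-D recurrence with horizontal and vertical dependencies (a hypothetical example in the spirit of Fig. 5.17):

#define N 64
#define M 64
#define MAX(a,b) ((a) > (b) ? (a) : (b))
#define MIN(a,b) ((a) < (b) ? (a) : (b))

/* Original: both loops carry a dependence, nothing can be unrolled. */
void original(int A[N][M]) {
    for (int i = 1; i < N; i++)
        for (int j = 1; j < M; j++)
            A[i][j] = A[i-1][j] + A[i][j-1];
}

/* Skewed: iterate over anti-diagonals k = i + j. All points of one
   anti-diagonal are independent, so the inner loop can be unrolled. */
void skewed(int A[N][M]) {
    for (int k = 2; k <= N + M - 2; k++)
        for (int i = MAX(1, k - (M - 1)); i <= MIN(N - 1, k - 1); i++) {
            int j = k - i;
            A[i][j] = A[i-1][j] + A[i][j-1];
        }
}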

5.3.6 Loop Fusion

Loop fusion, or loop merging, merges two loops into a single loop by concatenating the bodies of the original loops. Since the transformed loop executes the original loop bodies in an interleaved order, the transformation should be applied carefully to avoid violating data dependencies. Loop fusion is often an adequate transformation for efficient memory contraction (discussed later in Section 5.3.11). Fig. 5.18 shows an example where loop fusion allows an N-element array T to be reduced to a scalar variable t; a sketch in the same spirit follows. Loop fusion has been extensively used, in combination with other transformations, to improve parallelism and data locality [SXWL04]. Loop shifting, alone and in combination with loop peeling, has also been used to maximize loop fusion opportunities [Dar99, MA97, SXWL04].

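A sketch in the spirit of the Fig. 5.18 example described above (names are assumptions):

#define N 1024

/* Before fusion: T is a full N-element temporary array. */
void unfused(const int A[N], const int B[N], const int D[N], int C[N]) {
    int T[N];
    for (int i = 0; i < N; i++) T[i] = A[i] * B[i];
    for (int i = 0; i < N; i++) C[i] = T[i] + D[i];
}

/* After fusion: the temporary contracts to the scalar t. */
void fused(const int A[N], const int B[N], const int D[N], int C[N]) {
    for (int i = 0; i < N; i++) {
        int t = A[i] * B[i];
        C[i] = t + D[i];
    }
}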

5.3.7 C-Slowing

Loop pipelining becomes less effective, or even ineffective, when an inter-iteration dependency exists in the loop, forming a feedback path in the loop body: less effective because the initiation interval will always be greater than 1, and totally ineffective when the feedback goes precisely from the end to the start of the datapath, as shown in Fig. 5.19. In such a situation, a C-slow pipeline can be implemented, provided the design allows multiple independent input data instances to be processed. A C-slow pipelined loop nest accepts interleaved data from independent input streams; every pipeline stage is then active on each cycle, operating on disassociated data. Another way to view this transformation is to consider that we add an additional outer loop iterating over independent instances of the algorithm, and then perform a loop interchange to move this outer parallel loop innermost. Finally, the pipelining transformation is applied to implement the multiple independent instances on pipelined hardware.

Figure 5.19: Impact of pipelining with C-slow in the presence of a feedback loop: (a) pipelining without C-slow: with a single input data stream and a feedback loop from the end to the start, the pipeline is never completely filled, due to the dependencies on previous results; at each cycle only one logic block is active. (b) pipelining with C-slow: in the presence of a feedback loop, several independent interleaved input data streams (precisely as many as there are pipeline stages inside the feedback loop) make efficient use of the pipeline hardware; all logic blocks are now active at each cycle, operating on independent data streams.

5.3.8 Loop Tiling & Strip-mining

Loop tiling transforms the iteration space of a loop nest into smaller blocks (also called chunks or tiles) of iterations. It transforms n nested loops into 2 × n nested loops, where the outer loops are control loops that select which block of iterations is executed by the inner loops. Loop strip-mining is a special case of loop tiling, where tiling is applied only to the inner-most loop instead of the complete loop nest, as sketched below. Loop tiling and strip-mining increase data locality and coarse-grain parallelization opportunities. These transformations may also expose memory mapping opportunities, as each block of iterations may access disjoint memory banks, thus enabling the execution of several blocks in parallel. The tile shape and size should be chosen carefully to reduce the communication with external memories and to increase data reuse. For instance, in the presence of the diagonal data dependencies of Figure 5.20, rectangular tiles are not appropriate for data reuse, but if parallelogram tiles are shaped in the direction of the dependencies, one can benefit from the data reuse produced inside the tile.
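A strip-mining sketch for a single loop (S is the strip size; names are illustrative):

#define N 1000
#define S 64

void strip_mined(int A[N]) {
    /* the outer control loop selects a strip, the inner loop runs it;
       each strip touches a contiguous block of A, which can then be
       mapped to a local memory or to a separate bank */
    for (int ii = 0; ii < N; ii += S)
        for (int i = ii; i < ii + S && i < N; i++)
            A[i] = A[i] + 1;
}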

5.3.9 Memory Splitting & Interleaving

As loop unrolling replicates the loop instructions, the memory accesses in these instructions are also replicated. For a purely parallel execution of the unrolled loop, the memory banks would need to be accessed several times per cycle, but a memory bank can only be accessed in accordance with its number of input/output ports. This often hinders the parallel execution of multiple instructions. A solution to this problem is to split the array memory into disjoint data subsets, where each subset is mapped to a separate memory bank. Fig. 5.22 shows two types of memory splitting, using blocks or interleaving. Both techniques can be used to express fine and coarse-grain parallelism.

Figure 5.20: Loop tiling: impact of the tile shape on the data reuse opportunity [Ple10].

Figure 5.21: Impact of tiling and strip-mining on a vector multiplication example C[i] = A[i] ∗ B[i]: (a) original code: no data locality is implemented for the input/output memories, (b) strip-mining: data is accessed in strips of size 64 through local memories, (c) coarse-grained parallelization of the strip-mined code. [Example borrowed from [CD08]]

5.3.10 Data Replication, Reuse and Scalar Replacement

In many computations, particularly in dynamic programming algorithms, data values are reused. A compiler can sometimes identify the data reuse in a computation and keep the reused data in scalar variables, avoiding multiple memory accesses. Figure 5.23 shows an example of scalar replacement.

Figure 5.22: Memory splitting of an array memory into disjoint memory modules: (a) original code, (b) transformed code, (c) block splitting: A_0 holds A[0:7] and A_1 holds A[8:15], (d) interleave splitting: A_0 holds the even-indexed elements and A_1 the odd-indexed elements.
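A sketch of the interleaved variant of Figure 5.22d, copying a monolithic array into two banks (names are assumptions):

#define N 16

void interleave_split(const int A[N], int A_0[N/2], int A_1[N/2]) {
    for (int i = 0; i < N / 2; i++) {
        A_0[i] = A[2*i];      /* even addresses -> bank 0 */
        A_1[i] = A[2*i + 1];  /* odd addresses  -> bank 1 */
    }
}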

The original loop of Figure 5.23 requires three accesses to the external memory bank per loop iteration. With scalar replacement, the memory accesses are reduced to a single write operation per iteration. The use of local registers substantially decreases the data access latency and the number of external memory accesses.

(a) Original code:
for(i=2; i<N; i++){
  A[i] = f(A[i-1], A[i-2]);
}

(b) Scalar replacement:
tmp2 = A[0];
tmp1 = A[1];
for(i=2; i<N; i++){
  tmp0 = f(tmp1, tmp2);
  A[i] = tmp0;
  tmp2 = tmp1;
  tmp1 = tmp0;
}

Figure 5.23: Scalar replacement: (a) original code, (b) scalar replacement exploits the data reuse and reduces the memory accesses to a single write operation per iteration.

5.3.11 Array Contraction

Array contraction is a transformation that reduces the size of an array while preserving the correct output of the program. Typically, array contraction helps to contract, or convert to a scalar, a temporary array introduced by the designer to store intermediate computations used in several successive loops, as shown in Figure 5.18. The loop fusion and loop shifting transformations have been used to enable array contraction [DH02, SXWL04, GOST92, KM94]. An early work by Sarkar and Gao focused on finding the most suitable loop reversal transformation to enable array contraction [SG91]. For programs with affine array index functions and loop bounds, Alias et al. provide an array contraction algorithm for intra-array memory reuse [ABD07]. Intra-array memory reuse reduces the size of temporary arrays by reusing memory locations when the data they contain is not used later.

Figure 5.24: Array contraction example: (a) original code (int t[N][N], with N = 200), (b) memory contracted to 2 cells (int t[2]), (c) original code (int t[N][N]), (d) memory contracted to 2 columns (int t[2][N]).
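An illustrative 1-D instance of the transformation (not the exact code of Figure 5.24): each element of the temporary only reads the previous one, so the N-element temporary contracts to two cells used alternately:

#define N 200

void contracted(const int in[N], int out[N]) {
    int t[2];
    t[0] = in[0];
    out[0] = t[0];
    for (int i = 1; i < N; i++) {
        t[i % 2] = t[(i - 1) % 2] + in[i];  /* only t[i-1] is read */
        out[i] = t[i % 2];
    }
}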

5.3.12 Data Prefetching

A hardware accelerator accessing external memories, such as SDRAM, may suffer from serious performance degradation due to non-consecutive DDR accesses. For example, in Figure 5.25a, the arrays a, b and c are accessed in an interleaved manner, which slows down the system. Plesco [Ple10] derived a data fetch mechanism for such an architecture for the C2H HLS tool. Loop tiling is used to improve the data locality, and data reuse is increased with the help of local memories, in order to reduce the memory traffic.

for(i=0; i<=N; i++){
  for(j=0; j<=N; j++){
    for(k=0; k<=N; k++){
      c[i][j] += a[i][k]*b[k][j];
    }
  }
}

Figure 5.25: Data prefetching for a matrix multiplication example. [Example borrowed from [Ple10]]

Here, c[i][j] should be read once at the start of loop k and written once at the end of loop k. At the start of a tile, only the input data that is neither produced by a previous tile nor already loaded for a previous tile is fetched from the DDR. Similarly, only the data that is no longer needed by any future tile is stored back to the DDR. Later on, the local memories are optimized using array contraction and data lifetime analysis. Figure 5.25b shows the data inputs of a single tile as colored tiles; the arrows show the direction of reuse of this input data, i.e. the subsequent tiles in this direction use, as input, the same data already loaded from the DDR for the current tile.

5.3.13 Memory Duplication

In order to improve parallelism, memory duplication can be beneficial in many cases, by enabling parallel accesses and by removing read-write dependencies on the same memory. An important application is the scenario where tiling or strip-mining is done in the presence of a diagonal or vertical dependency, and a write operation is to be performed outside the current tile. For instance, array contraction reduces the memory t to 2 columns in Figures 5.24c and 5.24d. However, in the case of tiling, we can reduce it to a single column by using memory duplication only for the tile boundaries. The example in Figure 5.26a shows a loop nest where the inner-most loop is strip-mined and each strip is unrolled to execute in parallel. Due to the presence of the diagonal dependency (A[j]→A[j-1]), the input to the first computation of the strip is corrupted, as it was updated to a new value in the last cycle. The code in Figure 5.26b uses an extra cell tempL to store the temporary result of the boundary computation, and updates the original memory cell in the next cycle. Figure 5.27 gives a graphical illustration of such read/write operations for a single-column memory (self-read operations are not shown). Dashed arrows show the diagonal read operations and solid arrows show the write operations. The horizontal dash-dot lines show the boundary of a strip. All operations inside a strip are executed in parallel.

[Figure 5.26 code listings: (a) the strip-mined loop, beginning for(i=1;i...; (b) the corrected version using tempL; the bodies are lost to extraction.]

Figure 5.26: Memory duplication: The code in (a) is incorrect, since the old value of A[j] was overwritten in the last iteration as A[j+3]; a naive solution would be to duplicate the entire memory A. The code in (b) instead uses an additional register tempL to store only the newly computed value of the corner of the tile, and updates the original memory cell in the next cycle.


Figure 5.27: Memory duplication example: Memory duplication helps to reduce the memory layout from 2×N cells to N+1 cells. (a) Original layout: functionally incorrect, (b) Duplicated memory.

We can see that the first operation of a strip k, Fk, needs to read the previous value of Lk−1. However, since Lk−1 was updated in the last cycle, the input value read for Fk in the next cycle would be corrupted, as shown in Figure 5.27a. Figure 5.27b shows a solution to this problem, using a temporary memory Tk. Now, for the execution of any strip k, the cells from Lk−1 to Lk are read, the cells from Fk to Lk−1 are written along with the temporary memory Tk, and Tk updates Lk in the next cycle.
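A minimal software model of this mechanism is sketched below; the strip size S, the update function (a sum of the two old neighbor values) and the bookkeeping variables are illustrative assumptions, standing in for registers in the actual hardware.

#define M 64   /* array length (illustrative) */
#define S 4    /* strip size: S updates execute in parallel in hardware */

/* At each outer iteration, A[j] is recomputed from the previous
 * iteration's A[j-1] and A[j]. The boundary result of each strip is
 * parked in tempL and committed one cycle later, so the next strip
 * still reads the old boundary value. */
void diagonal_update(int A[M], int niter) {
    for (int it = 1; it < niter; it++) {
        int tempL = 0, last = -1;           /* pending boundary write  */
        for (int jj = 1; jj < M; jj += S) {
            int old[S + 1];
            for (int u = 0; u <= S && jj - 1 + u < M; u++)
                old[u] = A[jj - 1 + u];     /* snapshot strip inputs   */
            if (last >= 0) A[last] = tempL; /* commit previous corner  */
            for (int u = 0; u < S && jj + u < M; u++) {
                int v = old[u] + old[u + 1];       /* assumed update   */
                if (u == S - 1 || jj + u == M - 1) {
                    tempL = v; last = jj + u;      /* park corner write */
                } else {
                    A[jj + u] = v;
                }
            }
        }
        if (last >= 0) A[last] = tempL;     /* final commit            */
    }
}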

5.4 Conclusion

It has been extensively observed that the quality of the RTL design generated by an HLS tool largely depends on the quality of its input specification. This chapter has covered various code transformations that may help to improve the quality of the input code. Since most HLS tools use a C dialect, this chapter has shown how different versions of a C input may lead to different speed and area results at the hardware level. We have discussed such transformations at various levels, i.e. bit-, instruction- and loop-level transformations.

Bit-level transformations allow one to express custom bit-width storage and operations in hardware. Bit-width narrowing techniques help to find the data types with the exact required precision. The reduced bit-width representation may result in substantial resource savings and may improve the clock speed. Since bioinformatics algorithms mostly operate on char data types, a reduced bit-width implementation on FPGAs is helpful for accelerating the kernel computation. It also reduces the resource requirements, which allows more independent kernels to be embedded, amplifying the coarse-grain acceleration.

Instruction-level transformations simplify the mathematical computations; they can thus reduce the required resources and the number of execution cycles for the operations. The mathematical properties of operations, such as associativity, commutativity and distributivity, can be exploited to rearrange the computations so as to reduce the number of execution cycles, or to reduce the resource requirements through resource sharing. Similarly, expensive operations involving constants, e.g. multiplication and division, can be transformed into inexpensive shift and add operations.

Loop transformations play a vital role in the design, because the execution time of a computation kernel is mostly spent inside loops. Hence, improving the parallelism inside loops can greatly accelerate the kernel. In HLS, loop parallelism can either be expressed through unrolling, i.e. several loop iterations executed in parallel, or through pipelining, i.e. several loop iterations executed in an overlapped manner. Most of the other transformations, such as interchange, shifting, peeling and skewing, enable the code to be unrolled and/or pipelined in the most beneficial way. Similarly, transformations like memory splitting and interleaving, data replication and prefetching ensure data availability, avoiding memory access delays.

HLS-based design is rapid and less error-prone in comparison with RTL-based design.

However, there is still a lack of adoption by the professional community, as the designs generated by these tools are often poor in performance compared with a finely tuned manual design. This difference in performance can nevertheless be reduced by expressing the design more carefully: a judicious application of the code transformation techniques presented here can lead to an RTL design as efficient as a hand-coded one. In the coming chapters, we apply these techniques to a well-known compute-intensive bioinformatics application (HMMER), and show how efficient an HLS-based design can be. This will support our case that HLS-based FPGA development is fast, efficient and generic.

6 Extracting Parallelism in HMMER

6.1 Introduction

Sequence database homology searching is one of the most important applications in computational molecular biology. In this application, protein sequences of unknown characteristics are searched against a database of known sequences, in order to predict their functions and to classify them into families. Performing this operation at a large scale is however becoming time-prohibitive, due to the exponentially growing size of sequence databases, which double every eighteen months [NCB11]. Over the last few years, reconfigurable computing has proved to be an attractive solution for accelerating compute-intensive bioinformatics applications, such as Smith-Waterman [SW81] or BLAST [AGM+90]. The possibility of massive parallel processing, power efficiency and comparable price/performance make FPGA-based acceleration a practicable alternative to other supercomputing infrastructures such as vector computers or PC clusters. Sequence alignment techniques based on Hidden Markov Models (HMMs) [RJ86] have generated very good results [Edd98]. Profile HMMs, introduced to computational biology by Krogh et al. [KBM+94] for representing profiles of multiple sequence alignments, have been successfully applied to homology detection and protein classification [Edd98, HKMS93, JH98]. A profile HMM, built from multiple sequence alignments of a sequence family, concentrates on the features or key residues conserved by the family of sequences, so it can find even a remote sequence homolog which cannot be detected by pairwise alignment techniques (e.g. BLAST [AGM+90] or Smith-Waterman [SW81]). One of the most commonly used programs for HMM analysis is the open-source software suite HMMER, developed at Washington University, St. Louis by Sean Eddy [Edd]. HMMER involves very computationally demanding algorithms and accounts for a large amount of the time spent in biological sequence analysis.


Many authors have also investigated dedicated parallel hardware implementations, notably on FPGAs and GPUs [MBC+06, OSJM06, BVK08, JLBC07, OYS07, DQ07, TM09, SLG+09, EGMJdM10, HHH05, WBKC09a, GCBT10]. Recently, a new software implementation (HMMER3) has been released, and a great deal of effort has been put into improving its software execution through both fine-grain (SIMD extensions such as Altivec and SSE) and coarse-grain parallelization (using MPI or multi-threading) [Edd11b]. It has been shown that a dual-core SSE implementation of HMMER3 is faster than most previous FPGA and GPU versions of HMMER2. As currently defined and programmed, HMMER3 spends most of its time in two kernel functions called MSV and P7Viterbi. These two kernels contain so-called loop-carried dependencies (caused by the feedback path from the end to the beginning of the model) which restrict any kind of parallelism. We propose a technique to rewrite the computations in such a way that both kernels become very amenable to parallel implementation, while taking all the original dependencies into account. In this chapter we describe how the original dynamic programming equations of MSV and P7Viterbi can be rewritten so as to develop a new algorithm that admits a scalable parallelization scheme, at the price of a moderate, constant-factor increase in the algorithm's computational volume.

6.2 Background

6.2.1 Profile HMMs

HMMs are stochastic models that capture the statistical properties of observed data. A Hidden Markov Model is a finite set of states, each of which is associated with a probability distribution, and the transitions among the states are governed by a set of transition probabilities. A profile HMM of a family of biological sequences is built from multiple sequence alignments. For each column in the profile HMM, a match state models the allowed residues, while insert and delete states model the insertion of one or more residues, or the deletion of a residue. The multiple alignment of a sequence family shows its pattern of conservation, i.e. some regions are more conserved by the family, while other regions seem to tolerate insertions and deletions. This position-specific information shows the degree of conservation in some positions and the degree to which insertions and deletions are permitted. Profile HMMs use this information for position-specific scoring, e.g. there will be a higher penalty for an insertion/deletion in a conserved region than in a region of tolerance. Traditional pairwise alignments, like BLAST [AGM+90] or Smith-Waterman [SW81], use position-independent scoring (i.e. gap penalties are globally fixed) and do not consider the pattern of conservation in a sequence family. Several available software packages implement profile HMMs or HMM-like models. The HMMER tool suite uses the 'Plan7' HMM architecture shown in Fig. 6.1. The Plan7 HMM incorporates multiple features in a single model [Edd98]: local alignment with respect to the model (through B → M → E paths), local alignment with respect to the sequence (through flanking insert states), and more than one hit to the HMM per sequence (through the feedback loop E → J → B). HMMER implements the Plan7 HMM in the P7Viterbi kernel that we describe now.


Figure 6.1: Structure of a Plan7 HMM [Edd11b]

6.2.2 P7Viterbi Algorithm Description

P7Viterbi is the most time-consuming kernel inside the hmmsearch tool. This kernel solves Plan7 HMMs through the well-known Viterbi dynamic programming algorithm. The architecture of the Plan7 HMM model is shown in Fig. 6.1. The M (Matching), I (Insertion) and D (Deletion) states constitute the core section of the model, whose equations are:

$$M_i[k] = e_M(Seq_i, k) + \max\begin{cases} M_{i-1}[k-1] + TMM[k] \\ I_{i-1}[k-1] + TIM[k] \\ D_{i-1}[k-1] + TDM[k] \\ B_{i-1} + TBM[k] \\ -\infty \end{cases} \quad (6.1)$$

 (  Mi−1[k]+TMI[k]  eI (Seqi, k) + max Ii[k] = max Ii−1[k] +TII[k] (6.2)  −∞

$$D_i[k] = \max\begin{cases} M_i[k-1] + TMD[k] \\ D_i[k-1] + TDD[k] \\ -\infty \end{cases} \quad (6.3)$$

Seqi in Eq. (6.1) and Eq. (6.2) is the current sequence character being aligned. States N, B, E, C and J are called "control states". States B and E are dummy non-emitting states, representing the beginning and the end of the model:

$$E_i = \max_k\left(M_i[k] + TME[k],\; -\infty\right) \quad (6.4)$$

$$B_i = \max\left(N_i + t_{NB},\; J_i + t_{JB},\; -\infty\right) \quad (6.5)$$

States N, J and C are used to control algorithm-dependent features like local and multi-hit alignments:

$$N_i = \max\left(N_{i-1} + t_{NN},\; -\infty\right) \quad (6.6)$$

$$J_i = \max\left(E_i + t_{EJ},\; J_{i-1} + t_{JJ},\; -\infty\right) \quad (6.7)$$

$$C_i = \max\left(C_{i-1} + t_{CC},\; E_i + t_{EC},\; -\infty\right) \quad (6.8)$$

Here eM and eI are emission score memories, TMM, TIM, etc., are transition memories, while tNB, tJB, etc., are a set of constants. States E, J and B form a feedback loop in the model (i.e., a cycle in the graph), which rarely changes the value of M. Many hardware accelerators exploit this fact and ignore this feedback path, as will be shown in Section 6.3.1. On the other hand, this feedback loop gives HMMER the ability to perform multiple-hit alignments, i.e. more than one segment per sequence can be aligned to the core section of the model. The self-loop over J accounts for the separating sequence length between two aligned segments. Thus, from an algorithmic point of view, it is incorrect to ignore this edge of the model. The computations in (6.3) and (6.4) are the most crucial as far as extracting parallelism is concerned. Early implementations used to ignore the Bi−1 term in (6.1), which removes the inter-column cyclic dependency (i.e. Mi → Ei → Ji → Bi → Mi+1). In our work, we will rewrite (6.3) in such a way that the dependency Di[k] → Di[k−1] is transformed into a look-ahead computation, thus allowing the computations within a column to be done in parallel.

6.2.3 Look-ahead Computations

The feedback in a recursive algorithm often destroys the opportunity to pipeline or parallelize the execution. Consider a first-order recursion:

$$T_k = a_{k-1} \otimes T_{k-1} \oplus u_{k-1} \qquad \forall k : 1 \le k \le N \quad (6.9)$$

The dependence Tk → Tk−1 enforces a sequential execution of the computations. However, in order to obtain parallelism, look-ahead computation can be defined as an algorithmic transformation expressing Tk+m in terms of Tk, without directly depending on the values of Tk+m−1 ... Tk+1. This algorithmic transformation is based on the properties (commutativity and distributivity) of the algebraic functions involved, i.e.

(⊕, ⊗). Fettweis et al. [FTM90] define the algebraic structures amenable to look-ahead computations. Fettweis and Meyr [FM89, FM91] also apply a similar concept to a simplified Viterbi decoder and show the possibility of look-ahead computations in the presence of add/compare/select recursions in the decoder. In this chapter, we propose a similar look-ahead computation architecture for a more complex Viterbi decoder. The computation of Di[k] in (6.3) exhibits a similar self-dependency as the recursion in (6.9), and the algebraic functions involved in the computation, sum and max, also satisfy the above-mentioned algebraic properties. This makes (6.3) suitable for a look-ahead scheme. Our parallelization strategy is based on transforming the intra-column dependency of (6.3) into look-ahead computations, as described in Section 6.5.
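As a concrete illustration (a sketch of the general idea, not taken verbatim from the works cited above), unrolling (6.9) one step and using the distributivity of ⊗ over ⊕ gives a two-step look-ahead:

\begin{align*}
T_{k+2} &= a_{k+1} \otimes T_{k+1} \oplus u_{k+1} \\
        &= a_{k+1} \otimes (a_k \otimes T_k \oplus u_k) \oplus u_{k+1} \\
        &= (a_{k+1} \otimes a_k) \otimes T_k \,\oplus\, (a_{k+1} \otimes u_k \oplus u_{k+1})
\end{align*}

so that T_{k+2} depends on T_k only, the coefficients in parentheses being computable in parallel ahead of time. With (⊕, ⊗) = (max, +), which is the case for (6.3), this reads T_{k+2} = max(a_{k+1} + a_k + T_k, a_{k+1} + u_k, u_{k+1}).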

6.3 Related work

6.3.1 Early Implementations

HMMER has received a lot of attention from the high performance computing community, with several implementations either for standard parallel machines or for more heterogeneous architectures [WBKC09a, WBC05, LPAF08]. In the following we will focus on hardware implementations targeting ASIC or FPGA technologies.

Simplified Viterbi Implementations: Early proposals [MBC+06, OSJM06, BVK08, JLBC07] of hardware accelerators for profile-based similarity search considered an oversimplified version of the algorithm, in which the feedback loop was ignored, as shown in Figure 6.2. The simplification was based on the idea that the feedback loop has a relatively limited impact on the actual quality of the algorithm. Removing the feedback loop removes the inter-column dependency in the Viterbi algorithm and allows the column-wise computations to be overlapped through loop skewing, as discussed in Section 5.3.5. The feedback-free recurrence can be computed in N + L steps, compared to N × L steps for the original recurrence, where N is the size of the model and L is the length of the query sequence. However, as discussed in Section 6.2.2, the feedback loop is what detects multiple hits of a single motif; the feedback-free algorithm therefore generates false-negative results and reduces the sensitivity of the original algorithm.

Exact Viterbi Implementations: Since the presence of the feedback loop in the original Viterbi algorithm does not allow several cells to be computed in parallel, an exact implementation of Viterbi is considerably slower. However, researchers have taken advantage of the fact that, during each call to HMMER, a full database of independent input sequences needs to be processed. Thus, acceleration can be achieved by computing several instances of Viterbi in parallel. The first hardware implementation of the exact algorithm was proposed by Oliver et al. [OYS07]. By aligning several input sequences independently on separate PEs, they were able to fit 10 independent PEs on a Xilinx Spartan-3 board. However, one issue with their parallelization scheme is that all PEs need to access the same emission cost look-up tables for eM(Seqi, k) and eI(Seqi, k); see Equations (6.1) and (6.2).


Figure 6.2: A simplified HMM model, without the feedback loop from the end to the start of the model [OSJM06].

On the other hand, at each cycle, all PEs need to access only a subset of the tables, where the subset is defined by the unit increment in k and the random value of the sequence character Seqi. Since the amino-acid alphabet contains 24 symbols, the size of the subset is eM(0...23, k) and eI(0...23, k). Oliver et al. addressed this problem by implementing a 24-element wide data bus, each PE selecting its cost value using a 24-to-1 multiplexer. This solution suffers from severe scalability issues, which makes it impractical for massively parallel implementations. Another approach was proposed by Derrien and Quinton [DQ07]. It also uses the fact that many instances of the Viterbi algorithm can be processed in parallel. However, the parallelization scheme (based on polyhedral space-time transformations) is more sophisticated than that of Oliver et al. [OYS07]. In their approach, each PE operates on all input sequences, instead of each PE operating on an independent sequence. The proposed architecture does not need to access a shared memory for calculating the transition costs, since the emission tables are partitioned among the PEs and each PE only needs to access a subset of the emission table. Figure 6.3 shows a space-time mapping for an exemplary architecture, with M = N = 4 and L = 5. The computational dependencies of the Viterbi algorithm lie in the i-k plane and there is no dependency along the j axis (due to independent sequences). The space-time map shows an implementation of 2 PEs on the k-axis, where the black dots correspond to the iteration sub-space allocated to the first PE. The first PE starts 2 computations of the HMM model for every input sequence, and thus only needs to access a subset of the emission tables, i.e. eM(0...23, 0...1) and eI(0...23, 0...1), which is implemented as a separate RAM accessed by one single PE. The solid arrows show the iteration execution order of a single PE. In this approach, each PE operates on interleaved input sequences, in contrast to Oliver et al., where each PE operates on an independent sequence. It can be seen that sequence interleaving allows a delay of one cycle between the dependent computations and hence helps to pipeline the datapath.


Figure 6.3: Pipelined space-time mapping proposed by Derrien and Quinton [DQ07]: Here L denotes the length of the query sequence, M the size of the HMM model, and N the number of independent sequences processed in parallel. The blue arrows show the execution order of operations on a single PE. [Courtesy [DQ07]]

By changing the execution order, the delay can be adjusted flexibly. The proposed approach easily handles resource constraints by controlling the number of PEs in the architecture, and allows the datapath pipeline to be precisely tuned. However, the scalability of this approach is still somewhat limited, as the local storage requirements of the hardware implementation can be prohibitive. For example, a 64-element processing array with a 6-stage pipelined datapath would need more than 500 embedded memory blocks on an FPGA.

6.3.2 Speculative Execution of the Viterbi Algorithm

More recently, an approach for hardware acceleration based on speculative execution was proposed by Takagi et al. [TM09] and Sun et al. [SLG+09]. Their idea is to take advantage of the properties of the max operation, so as to speculatively ignore the dependency over the variable Bi in (6.1), since it very seldom contributes to the actual computation of Mi+1[k], Di+1[k] and Ii+1[k]. Ignoring this dependency results in a feedback-loop-free algorithm, which is very easy to parallelize.

Whenever it is observed that the actual value of Bi would have contributed to the actual value of Mi+1[0], all computations related to columns i′ such that i′ > i are discarded (flushed) and the computation must be re-executed so as to enforce the original algorithm dependencies, as shown in Figure 6.4. To do so, Takagi et al. propose a misspeculation detection mechanism which stores in a buffer the values of M, D and I computed at the beginning of the new column (together with their inputs) until the actual value of B is available (that is, Nprof cycles later).


Figure 6.4: Speculative execution of the Viterbi algorithm

The true values of M, D and I are then recomputed, and if they differ from the previous ones, it means that the speculation was wrong and that the previous results must be discarded. The main issue of such an approach is to estimate the probability and the cost of a misprediction. In this solution, whenever a misprediction occurs, the architecture has been running useless calculations during the last N cycles. Assuming a misspeculation probability p, the execution efficiency for a sequence of S amino-acid bases can then be written as:

$$e = \frac{S + N}{S + N + pSN}$$

As noticed by Takagi et al., the average observed value for p is 0.0001, which leads to an efficiency that varies between 94% and 99%, depending on the depth of the speculation (for instance, with illustrative values S = 10000, N = 100 and p = 0.0001, e = 10100/10200 ≈ 99%). It can also be observed that the overhead is larger for an architecture exhibiting a large level of parallelism (the depth of the speculation being deeper), and for long sequences matched against small profiles, for which the probability of observing a repetition grows with the sequence size. As an example, Takagi et al. report cases where the HMM profile characteristics lead to a poor efficiency (performance degradation by 85%). Very recently, Eusse Giraldo et al. [EGMJdM10] proposed another approach. They use a simplified Viterbi kernel (without the J state) as a filter and pass only sequences with significant scores to the original Viterbi kernel, along with data from a divergence algorithm [BBdM08]. The divergence algorithm data reduces the number of cells that must be calculated by the original Viterbi kernel, by providing the limits of the alignment region. The alignment region defines where the alignment starts and ends. This approach yields an acceleration of 5.8 GCUPS (Giga Cell Updates Per Second). However, the use of a simplified Viterbi algorithm as a filter may not detect multiple-hit alignments. As the filter also specifies the alignment region to the original Viterbi kernel, the original kernel will not try to align sequence segments lying outside the alignment region, which may produce false negatives. The paper [EGMJdM10] does not discuss issues with multiple-hit alignments, nor how this is handled inside the simplified Viterbi filter.

Figure 6.5: HMMER3 execution pipeline and MSV filter. (a) HMMER3 execution pipeline, with profiling data: the MSV filter (4M op. per base, 75% of execution time) passes <2% of the sequences to the P7Viterbi filter (25M op. per base, 22% of execution time), which in turn passes 1% to the P7Forward score (25M op. per base, 3% of execution time). (b) The MSV filter model [Edd11b].

6.3.3 GPU Implementations of HMMER

The HMMER search was implemented on graphics processing units by Horn et al. [HHH05] as ClawHMMER, and later by Walters et al. [WBKC09a]. The overall speedup reported by Walters is 15 to 35× on a single NVIDIA 8800 GTX Ultra GPU, in comparison with a software implementation. The GPU implementation does not reach a very high speedup, due to the extensive global memory accesses of the P7Viterbi algorithm. A very recent GPU implementation by Ganesan et al. [GCBT10] accelerates HMMER by breaking the chain of dependencies inside P7Viterbi. The reported speedup is 100+ times on 4 Tesla C1060 GPUs, in comparison with a software implementation of HMMER2. They converted the vertical cell dependency Di[k] → Di[k−1] into dependencies between equal-sized chunks of a column. During a first stage, the headers of the chunks are updated serially; then the intermediate values of each chunk are computed in parallel. Ganesan et al. follow a strategy similar to ours for breaking this dependency in the computation of D, but we convert this dependency into well-known parallel prefix networks. The parallel prefix network topologies provide the freedom to trade off delay and area costs according to the architecture requirements, as shown in later sections.

6.3.4 HMMER3 and the Multi Ungapped Segment Heuristic

The new version of HMMER, now available for use, is a radical rewrite of the original algorithm, with a clear emphasis on speed. The most noticeable difference in this new version lies in a new filtering heuristic (called Multi Ungapped Segment Viterbi, MSV) which serves as a prefiltering step; MSV is executed before the standard P7Viterbi in

the HMMER pipeline, as illustrated in Fig. 6.5. This algorithmic modification alone improves the speed by a factor of 10. It is important to note that MSV still contains the feedback loop (E → J → B), which prevents the computation of Mi+1[k] from starting until Mi[k] is finished. But, compared to Fig. 6.1, the computations of Di[k] and Ii[k] are removed, and hence Mi[k] no longer depends on them, which gives an opportunity to accelerate the computation.

Table 6.1: Performance of HMMER in GCUPS on a Quad-core Intel Xeon machine

                       HMM Profile Size N
HMMER         61        84        119       255
V2          ≈ 0.03    ≈ 0.03    ≈ 0.03    ≈ 0.03
V3-noSSE      0.3       0.26      0.3       0.37
V3-SSE        3.4       4.3       6.7      10.3

In addition to this filtering step, both the P7Viterbi and the MSV algorithms have also been redesigned to operate on short word lengths (8 bits for MSV and 16 bits for P7Viterbi), so as to fully benefit from the SIMD extensions (SSE, Altivec) available on all Intel/AMD CPUs. SSE allows up to 16 simultaneous operations for MSV, and 8 for P7Viterbi, to be computed using 128-bit vectors. Similarly, row-based shift and horizontal max operations reduce expensive data shuffling, respectively by aligning the data for the diagonal dependencies and by performing the max operation over the multiple values stored within a single register. These optimizations enable the new HMMER3 to run about as fast as BLAST: slightly faster than WU-BLAST and somewhat slower than NCBI BLAST [Edd09]. Table 6.1 shows the performance in GCUPS for the PfamB.fasta database on a Quad-core Intel Xeon machine. These results show that the combination of the MSV pre-filtering stage with SIMD has a huge impact on the overall software performance, which is improved by a factor of more than 100, and makes most previous FPGA-based accelerators slower than any recent Quad-core CPU machine, as shown by Table 6.2.

Table 6.2: Reported average performance for previous FPGA implementations of HMMER2

                                      Min GCUPS    Max GCUPS
Simplified Viterbi Implementation
  T. Oliver [OSJM06]                      -            5.3
  Benkrid [BVK08]                         -            5.2
Exact Viterbi Implementation
  T. Oliver [OYS07]                       -            0.7
  Derrien [DQ07]                        0.64           1.8
Speculative Viterbi Implementation
  Takagi [TM09]                         0.78           7.38
  Y. Sun [SLG+09]                       0.28           3.2

6.3.5 Accelerating the Complete HMMER3 Pipeline

As shown in Fig. 6.5, because the MSV algorithm is used as a prefiltering step, the P7Viterbi algorithm still contributes in a non-negligible way to the execution time. In other words, the global execution time cannot be significantly improved by accelerating the MSV kernel alone; there is still a need to efficiently accelerate the P7Viterbi algorithm. In the following sections we propose to rewrite both the MSV and P7Viterbi algorithms to make them amenable to hardware acceleration. We do so by using a simple reformulation of the MSV equations to expose reduction operations, and by using an adaptation of the technique proposed by Gautam and Rajopadhye [GR06] to detect scans and prefix computations in P7Viterbi. This exposes a new, previously unknown level of parallelism in both algorithms. We use the V3-SSE results in Table 6.1 as a baseline for performance comparison with our implementation.

6.4 Rewriting the MSV Kernel

As mentioned earlier, the main computation in the MSV kernel is a dynamic programming algorithm that follows the standard algorithmic technique of filling up one data table (called Mi[k] in this chapter, with i as the row index and k as the column index) together with some other auxiliary variables. The values of the table entries are determined from previously computed entries (with appropriate initializations) using the following formulas:

$$M_i[k] = MSC[k] + \max\begin{cases} M_{i-1}[k-1] \\ B_{i-1} + t_{BMk} \end{cases} \quad (6.10)$$

$$E_i = \max_k\,(M_i[k]) \quad (6.11)$$

$$J_i = \max(J_{i-1} + t_{loop},\; E_i + t_{EJ}) \quad (6.12)$$

$$C_i = \max(C_{i-1} + t_{loop},\; E_i + t_{EC}) \quad (6.13)$$

$$N_i = \max(N_{i-1} + t_{loop},\; -\infty) \quad (6.14)$$

$$B_i = \max(N_i + t_{move},\; J_i + t_{move}) \quad (6.15)$$

It can be observed that the computation of column Mi has a diagonal dependency on column Mi−1 and depends on Bi−1, which itself depends on all values of Mi−1. In other words, no computation for column Mi can start before all computations for column Mi−1 have finished, which imposes a column-wise sequential execution on the algorithm. On the other hand, all values of a given column Mi can be computed in parallel. Since the computation of Ei consists of a max-reduction operation, it can be realized using a max tree, as shown in Fig. 6.6, thus reducing the latency of the MSV architecture from O(N) to O(log2 N).
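A sequential reference of one rewritten MSV stage is sketched below, following equations (6.10)-(6.15); this is a sketch, not the actual hardware code. Parameter names mirror the thesis notation, and saturation w.r.t. −∞ is approximated with a large negative constant.

#include <limits.h>

#define NEG_INF (INT_MIN / 2)   /* "-infinity" that survives additions */

static int max2(int a, int b) { return a > b ? a : b; }

/* One stage (row i) of the rewritten MSV kernel. Arrays are indexed
 * 0..N, cell 0 being the boundary (the caller initializes Mprev[0] to
 * NEG_INF). In hardware, the k loop computing Mcur unrolls into N
 * parallel cells, and the E reduction becomes a max tree of depth
 * log2(N). */
void msv_stage(int N, const int MSC[], int tBMk, int tloop,
               int tEJ, int tEC, int tmove,
               const int Mprev[], int Mcur[], int Bprev,
               int *E, int *J, int *C, int *Nst, int *B) {
    Mcur[0] = NEG_INF;
    for (int k = 1; k <= N; k++)                     /* (6.10) */
        Mcur[k] = MSC[k] + max2(Mprev[k - 1], Bprev + tBMk);

    int e = NEG_INF;                                 /* (6.11) */
    for (int k = 1; k <= N; k++)
        e = max2(e, Mcur[k]);
    *E = e;

    *J   = max2(*J + tloop, e + tEJ);                /* (6.12) */
    *C   = max2(*C + tloop, e + tEC);                /* (6.13) */
    *Nst = *Nst + tloop;                             /* (6.14) */
    *B   = max2(*Nst + tmove, *J + tmove);           /* (6.15) */
}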


Figure 6.6: Dataflow dependencies for one stage of the MSV filter (N = 8) algorithm after rewriting

6.5 Rewriting the P7Viterbi Kernel

As shown in the previous Section, it is easy to rewrite the MSV algorithm recurrence equations so as to expose parallelism in the form of a simple max-reduction operation. In this Section, we show how it is also possible to use a similar (but more complex) transformation on the P7Viterbi kernel. Here again, the goal is to get rid of the current inherent sequential behavior caused by the so-called feedback loop. To do so, we replace the accumulation along the k index for one of the variables by a prefix-scan operation and replace the feedback loop by a simple max-reduction operation. This transformation leads to a modified dependence graph which is much better suited to a parallel hardware implementation. In the rest of the chapter, we express control states collectively as X, to emphasize the main part of the model:

$$M_i[k] = f_M(M_{i-1}[k-1],\; I_{i-1}[k-1],\; D_{i-1}[k-1],\; X_{i-1}) \quad (6.16)$$

$$I_i[k] = f_I(M_{i-1}[k],\; I_{i-1}[k]) \quad (6.17)$$

$$D_i[k] = f_D(M_i[k-1],\; D_i[k-1]) \quad (6.18)$$

$$X_i = f_X\!\left(\max_k\,(M_i[k] + E[k])\right) \quad (6.19)$$

The above equations are a simplified form of equations (6.1)-(6.4), highlighting the dependencies on dynamic computations. The key observations concerning the P7Viterbi formulas (6.16)-(6.19) are that

− there is a chain of dependences in increasing order of k in computing the values of D in any column;

− to compute the X of any column, we need all the values of M of that column, each of which needs a D from the previous column;

− finally, the value of X of a column is needed to compute any M in the next column.

Because of the above observations, there seems to be an inherent sequentiality in the algorithm, as noted by all previous work on this problem.

6.5.1 Finding Reductions

We now develop an alternate formulation of equations (6.16)-(6.19) in which no such chain of dependences exists, thus enabling a scalable parallelization of the computations on a hardware accelerator. More specifically, we show that equation (6.18) computing D can be replaced by a different equation in which such dependences either do not exist, or can be broken through well-known techniques. For our purposes, we shall focus on the function fD of equation (6.18), which is defined more precisely as follows:

$$D_i[k] = \begin{cases} M_i[0] + TMD[0] & k = 1 \\ \max(D_i[k-1] + TDD[k],\; M_i[k-1] + TMD[k-1]) & k > 1 \end{cases} \quad (6.20)$$

In order to emphasize the self-dependency of Di, we can represent the other variables as inputs, and thus equation (6.20) can be abstracted as follows:

$$D_i[k] = \begin{cases} a_0 & k = 1 \\ \max(D_i[k-1] + b_{k-1},\; a_{k-1}) & k > 1 \end{cases} \quad (6.21)$$

Now, if b is zero, the equation is a simple scan computation (also called prefix computation): $D_i[k] = \max_{j=1}^{k} a_{j-1}$. How to efficiently and scalably parallelize such scan computations is well known [LF80]. However, if b ≠ 0, the solution is not at all obvious. We show below how to obtain a scan-like structure for this case. If we expand out the individual terms, we see that:

\begin{align*}
D[1] &= a_0 \\
D[2] &= \max(a_0 + b_1,\; a_1) \\
D[3] &= \max(\max(a_0 + b_1,\; a_1) + b_2,\; a_2) \\
     &= \max(a_0 + b_1 + b_2,\; a_1 + b_2,\; a_2) \\
D[4] &= \max(a_0 + b_1 + b_2 + b_3,\; a_1 + b_2 + b_3,\; a_2 + b_3,\; a_3) \\
     &\;\;\vdots \\
D[k] &= \max(a_0 + b_1 + b_2 + \cdots + b_{k-1},\; a_1 + b_2 + \cdots + b_{k-1}, \\
     &\qquad\;\; a_2 + b_3 + \cdots + b_{k-1},\; \ldots,\; a_{k-2} + b_{k-1},\; a_{k-1})
\end{align*}

The last term can be written more visually as

$$D[k] = \max\left(a_{k-1},\; \max\left(\begin{pmatrix} b_1 + b_2 + b_3 + \cdots + b_{k-1} \\ b_2 + b_3 + \cdots + b_{k-1} \\ b_3 + \cdots + b_{k-1} \\ \vdots \\ b_{k-1} \end{pmatrix} + \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{k-2} \end{pmatrix}\right)\right)$$

or more compactly:

$$D[k] = \max\left(a_{k-1},\; \max_{j=1}^{k-1}\left(a_{j-1} + \sum_{i=j}^{k-1} b_i\right)\right) \quad (6.22)$$

In (6.22), one can easily identify a reduction operation over vector b, and a max-prefix over this reduction. But the reduction still depends on the inner loop index j, and we would like to get rid of it. To do so, we add and subtract the same term, which does not affect the computation and yields the following expression:

$$D[k] = \max\left(a_{k-1},\; \max_{j=1}^{k-1}\left(a_{j-1} + \sum_{i=j}^{k-1} b_i + \sum_{i=1}^{j-1} b_i - \sum_{i=1}^{j-1} b_i\right)\right)$$

The term $\sum_{i=1}^{k-1} b_i = \sum_{i=j}^{k-1} b_i + \sum_{i=1}^{j-1} b_i$ is independent of j, so it can be moved out of the max:

$$D[k] = \max\left(a_{k-1},\; \sum_{i=1}^{k-1} b_i + \max_{j=1}^{k-1}\left(a_{j-1} - \sum_{i=1}^{j-1} b_i\right)\right)$$

Let $b'_{j-1} = \sum_{i=1}^{j-1} b_i$. We note that $b'_{j-1}$ is a (sum-)scan of the b input, so

$$D[k] = \max\left(a_{k-1},\; b'_{k-1} + \max_{j=1}^{k-1}\,(a_{j-1} - b'_{j-1})\right) = \max\left(a_{k-1},\; b'_{k-1} + \max_{j=1}^{k-1} a'_{j-1}\right) \quad (6.23)$$

where $a'_j = a_j - b'_j$ is the element-wise difference of a and b'.

Now, the inner term is a max-scan of the a′ vector. Hence D[k], as specified in (6.23), can be computed in parallel using the following steps.



Figure 6.7: Dataflow dependencies for one stage of the P7Viterbi (N = 8) algorithm after rewriting. The dependency Di,k → Di,k−1 in equation (6.3) is converted to a max-prefix block, reducing the critical path from O(N) to O(log2 N) operations.

Step 1. Compute b′, the sum-prefix of the array b, i.e. $b'_{j-1} = \sum_{i=1}^{j-1} TDD[i]$ (Eq. 6.3). Note that in the Viterbi algorithm, b′ needs to be computed only once, since TDD is an input.

Step 2. Compute a′, the element-wise subtraction of b′ from a, where $a_{j-1} = M_i[j-1] + TMD[j-1]$ (Eq. 6.20).

Step 3. Compute a″, the max-prefix of a′. The max-prefix computation can be parallelized perfectly and scalably.

Step 4. Add b′ element-wise to a″ and compare (again element-wise) the result with the a input, retaining the larger one. This yields D, the desired answer.

We have rewritten the dependency Di[k] −→ Di[k − 1], and now the vector D can be computed in parallel by the above steps, where the computation path is converted into a max-prefix network rather than a strict intra-column dependency.
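A sequential reference implementation of these four steps is sketched below (assuming C99; the array naming is ours, and the scans are written as plain loops, whereas the hardware maps step 3 onto a parallel prefix network):

#include <limits.h>

static int max2(int x, int y) { return x > y ? x : y; }

/* Computes D[1..N] from equation (6.23). Inputs: a[0..N-1] with
 * a[j] = M_i[j] + TMD[j], and b[1..N-1] with b[k] = TDD[k] (b[0] is
 * unused); D must have N+1 cells. The VLA buffers are illustrative. */
void compute_D(int N, const int a[], const int b[], int D[]) {
    int bp[N], ap[N], app[N];           /* b', a', a'' */

    /* Step 1: b' = sum-prefix of b (done once, since TDD is an input) */
    bp[0] = 0;
    for (int j = 1; j < N; j++)
        bp[j] = bp[j - 1] + b[j];

    /* Step 2: a' = a - b', element-wise */
    for (int j = 0; j < N; j++)
        ap[j] = a[j] - bp[j];

    /* Step 3: a'' = max-prefix of a', the perfectly parallelizable part */
    app[0] = ap[0];
    for (int j = 1; j < N; j++)
        app[j] = max2(app[j - 1], ap[j]);

    /* Step 4: D[k] = max(a[k-1], b'[k-1] + a''[k-2]), per (6.23) */
    D[1] = a[0];
    for (int k = 2; k <= N; k++)
        D[k] = max2(a[k - 1], bp[k - 1] + app[k - 2]);
}

One can check, for instance, that D[3] = max(a[2], (b1+b2) + max(a0, a1−b1)) = max(a0+b1+b2, a1+b2, a2), matching the expansion given earlier.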

6.5.2 Impact of the Data-Dependence Graph

To help the reader understand the benefits of this rewriting transformation, we provide in Fig. 6.7 an illustration of the dataflow dependencies of the rewritten algorithm for a small problem size (profile size N = 8). In this dataflow graph, the functions fk, gk and hk are defined as follows:

$$f_k(w, x, y, z) = \max_4(w + bsc_k,\; x + TMM_k,\; y + TDM_k,\; z + TIM_k) + msc_k[dsq_i]$$

$$g_k(x, y) = \max_2(x + TII_k,\; y + TMI_k) + isc_k[dsq_i]$$

$$h_k(x, y) = \max_2(x + TMD_k,\; y + TDD_k)$$

In these expressions, the max and sum operators are saturated (w.r.t. −∞). It can be observed that there is no longer a chain of dependencies along the vertical axis in the dataflow graph, and that the longest critical path is now set by the depths of the parallel max tree and of the parallel max-prefix blocks, which is O(log2(N)).

Another consequence is that the update operations for Mi,k, Ii,k and Di,k can be executed in parallel for all values of k in the domain 0 ≤ k ≤ N. In the next section, we briefly introduce the general class of prefix computations and discuss various existing prefix network topologies that can be adopted after rewriting.

6.6 Parallel Prefix Networks

As mentioned in Section 6.5.1 (step 3), the rewritten version of the P7Viterbi algorithm exhibits a max-prefix pattern. Prefix computation is a very general class of computations which can be formally defined as follows: given an input vector xi with 0 ≤ i < N, we define its ⊕-prefix vector yi as:

$$y_i = \bigoplus_{k=0}^{i} x_k = x_0 \oplus x_1 \oplus \cdots \oplus x_i$$

where ⊕ is a binary associative operator (and possibly commutative, see [Kno99] for a more detailed definition). Because binary adders fall into this category, and since adders form one of the most important building blocks of digital circuits, there is a wealth of research going back almost 50 years dealing with fast (i.e. parallel) implementations of prefix adders [LF80, BK82, KS73, HC87, Skl60] using various interconnection network topologies. Blelloch [Ble90] presents a detailed list of parallel prefix applications in various domains, such as string comparison, polynomial evaluation, several sorting algorithms, the solution of tri-diagonal linear systems, and many others. Figure 6.8 shows some popular prefix network topologies. One of the most important aspects of these network topologies is that they allow the designer to explore the trade-off between speed (i.e. the critical path of the resulting circuit), area (the number of operators used to implement the prefix operation), and other metrics such as fan-out or wiring length. A comprehensive classification, describing the trade-offs among existing network topologies, is given in [Har03].

For example, a Brent-Kung [BK82] network computes the prefix in 2 log2 N − 1 stages with 2(N − 1) − log2 N operators, while a Sklansky network implements a faster circuit (log2 N stages) at the price of an increase in area ((N/2) log2 N operators). Similarly, the Han-Carlson network provides a trade-off between Brent-Kung and Kogge-Stone in delay and wiring.

Figure 6.8: Examples of parallel prefix implementations for N = 16: (a) Brent-Kung, (b) Sklansky, (c) Kogge-Stone, (d) Han-Carlson, (e) Ladner-Fischer.

By rewriting the algorithmic dependencies of P7Viterbi, we are able to express a naively sequential computation in the form of a max-prefix computation. This rewriting not only enables us to compute it in parallel, but also provides the ability to exploit the characteristics of the various existing prefix network topologies and to explore speed/area trade-offs. From Table 6.3, we can see that Di can now be computed in log2 N to 2 log2 N + 1 cycles, instead of N cycles, at a cost ranging from 2N − 2 − log2 N to N log2 N − N + 1 operators. In our case, since the max-prefix computation lies on the critical path, as shown in Figure 6.7, a faster prefix network is desirable; but minimal area consumption is also crucial, due to the limited resources of the target platform.

Table 6.3: Characteristics of various parallel-prefix networks

Method                  Delay           Cost
Sklansky [Skl60]        log2 N          (N/2) log2 N
Ladner-Fischer [LF80]   log2 N + 1      (N/4)(log2 N − 1) + N
Kogge-Stone [KS73]      log2 N          N log2 N − N + 1
Brent-Kung [BK82]       2 log2 N − 1    2N − 2 − log2 N
Han-Carlson [HC87]      log2 N + 1      (N/2) log2 N + N/4

Hence, it is interesting to investigate FPGA-based implementations of the different prefix topologies and to integrate the most suitable one into our design. In the next chapter, we present several implementations of these network topologies for different sizes of N, we compare their performance on FPGA, and we select one of them, after hardware mapping optimizations, for our final implementation.
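To fix ideas, a Brent-Kung-style max-prefix can be written in software as the classic up-sweep/down-sweep pair below; this is an illustrative sketch only, since the hardware versions discussed in the next chapter are generated through HLS rather than coded this way.

static int max2(int x, int y) { return x > y ? x : y; }

/* In-place Brent-Kung max-prefix over n = 2^m elements. Each level of
 * the two loops corresponds to one hardware stage whose iterations all
 * run in parallel, giving 2*log2(n) - 1 stages with O(n) max operators,
 * matching Table 6.3. */
void max_prefix_bk(int x[], int n) {     /* n must be a power of two */
    /* up-sweep: build partial maxima over power-of-two blocks */
    for (int d = 1; d < n; d <<= 1)
        for (int i = 2 * d - 1; i < n; i += 2 * d)
            x[i] = max2(x[i], x[i - d]);
    /* down-sweep: propagate block maxima to the remaining positions */
    for (int d = n >> 2; d >= 1; d >>= 1)
        for (int i = 3 * d - 1; i < n; i += 2 * d)
            x[i] = max2(x[i], x[i - d]);
}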

6.7 Conclusion

HMMER is a widely used tool in bioinformatics for sequence homology searches. The data dependencies of the HMMER kernels, namely MSV and P7Viterbi, lead to a purely sequential execution. All previous attempts to parallelize HMMER either exploit other sources of parallelism, such as independent calls to the kernel functions, simplify the algorithm to approximate the computation, or use other techniques such as speculation. In this chapter, we have proposed an original parallelization scheme for the new HMMER3 profile-based search software, which leverages a rewriting of the compute-intensive kernels inside HMMER, in order to transform the intra-column dependencies into reduction and prefix-scan computation patterns without modifying the semantics of the original algorithm. The modified algorithm allows us to exploit the large body of research already done in the domain of parallel prefix networks, to explore speed and area trade-offs, and to implement a faster HMMER application.

7 Hardware Mapping of HMMER

The rewritten versions of MSV and P7Viterbi in Chapter 6 expose a significant amount of parallelism. However, the feedback loop dependency still exists, and we cannot start the computation of column i + 1 before finishing all computations of the current column i. The critical path is shortened from O(N) to O(log2 N), but for larger values of N the delay can still slow down the circuit. In this chapter, we discuss the hardware implementation of the rewritten algorithms and we present various implementation schemes. Specifically, this chapter makes the following contributions.

− First, we propose several fine-grain parallelization strategies for efficiently implementing this improved algorithm on an FPGA-based high performance computing platform, and we discuss the performance that we obtain.

− Besides exploring fine-grain parallelism opportunities inside each computational kernel independently, we propose a system-level design approach, where the computational kernels are connected in an execution pipeline. We implemented these designs and present speed-up results.

We propose two system-level implementation strategies.

− A straightforward pipeline strategy that connects the computation kernels (MSV and P7Viterbi) through a filter. Coarse-grain parallelism is employed through multiple independent pipelines.

− A second pipeline strategy that utilizes the filtering characteristics more efficiently. We implement a single aggregated pipeline, instead of several disconnected pipelines, which enables load balancing along the execution path among several parallel pipelines.

The profile size in an HMMER database may vary from 50 to 650, and such differences in profile size cause a performance degradation, due to the fixed size of the hardware design.


We handle this issue by creating a library of preexisting configurations and by loading the optimal configuration for a given profile size. This chapter is organized as follows. Section 7.1 presents various hardware mapping opportunities. Section 7.2 discusses how the various hardware mapping techniques are implemented through a High-Level Synthesis tool. In Section 7.3 we present the speed/area performance results for each individual block, and also for our system-level implementations. Conclusions and future work directions are drawn in Section 7.4.

7.1 Hardware Mapping

Even though the rewritten versions of both the MSV and P7Viterbi algorithms exhibit a significant amount of hidden parallelism, deriving an efficient architecture from the modified dataflow graph is not straightforward. In this section we address the different challenges involved in this architectural mapping. We start by discussing efficient hardware implementations of the parallel prefix operation needed by P7Viterbi, and we then present two transformations (namely C-slowing and tiling) that we use to improve the architecture efficiency.

7.1.1 Architecture with a Single Combinational Datapath

It can easily be seen from Figure 6.6 and Figure 6.7 that, in both MSV and P7Viterbi, it is not possible to pipeline the execution of consecutive stages: all the results of the ith stage are needed before any value of the (i+1)th stage can be computed. As a consequence, and in spite of the fact that we replaced in both cases the initial chain of dependence of O(N) operations by a chain of O(log2(N)) operations, large values of N may induce a long critical path, which could lead to a poor clock frequency.

7.1.2 A C-slowed Pipelined Datapath

To obtain a datapath with a better clock speed, pipelining is always an adequate choice. Pipelining a datapath without a feedback loop results in fast and efficient designs; however, it becomes ineffective when there is a feedback loop in the pipelined datapath. Figure 7.1a illustrates this situation. One can observe that the pipeline is never completely filled, due to the dependence of logic block 'A' on the results of logic block 'C'. A new sample waits until the results of all previous samples are calculated; hence only one logic block is active at any clock cycle. As the HMMER hardware will always be executed on a set of independent sequences, a wise choice is to feed interleaved sequences to the pipeline, after slowing down the pipeline rate by a factor of C. The resulting architecture is shown in Fig. 7.1b. This method costs extra registers (i.e. (C−1)×stages), but in return our architecture becomes as efficient as a normal pipelined architecture without a feedback loop.

(a) Pipelining without C-slow: with a single input data stream and a feedback loop from the end to the start, the pipeline is never completely filled, due to the dependency on previous results; at each cycle only one logic block is active.

(b) Pipelining with C-slow (here 3-slow): in the presence of a feedback loop, several independent interleaved input data streams (precisely as many as there are pipeline stages inside the feedback loop) make efficient use of the pipeline hardware; all logic blocks are now active at each cycle.

Figure 7.1: Impact of pipelining with C-slow in the presence of a feedback loop

The same solution to improve the throughput of the hardware implementation has been used by Derrien and Quinton [DQ07], and also by Oliver et al. [OYS07]. On a loop representation of such a calculation, this transformation amounts to adding an additional outer loop iterating over independent instances of the algorithm, and then performing a loop interchange so as to move this parallel outer loop to the innermost level, implementing the multiple independent instances in parallel on a pipelined hardware. Using this idea, and assuming that S independent instances are interleaved, the ith stage only depends on the computation that was executed S stages earlier. This extra delay can then be used to pipeline the stage execution, as depicted in Fig. 7.2a. This of course implies the use of additional memories, as we must replicate all registers/memories of the architecture; but because the critical path remains O(log2 N), we only need a reasonably small slowing factor S to achieve the maximum throughput (compared to S ≈ O(N) in the approach of Derrien and Quinton).
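At the loop level, the transformation can be sketched as follows (illustrative C, with stage() standing for one full column update of the rewritten kernel; all names and sizes are ours):

#define S     8      /* C-slow factor = pipeline depth (illustrative)  */
#define NSEQ  64     /* number of independent sequences, multiple of S */
#define LEN   128    /* common sequence length, for simplicity         */
#define NPROF 32     /* profile size                                   */

/* Placeholder for one column update (Fig. 6.6/6.7); one state vector
 * is kept per sequence, since C-slowing replicates all registers. */
static void stage(int M[NPROF], char c) {
    for (int k = 0; k < NPROF; k++)
        M[k] += c;
}

void without_cslow(int M[NSEQ][NPROF], const char seq[NSEQ][LEN]) {
    /* one sequence at a time: iteration i+1 cannot start before
     * iteration i completes, so the pipeline is never full */
    for (int j = 0; j < NSEQ; j++)
        for (int i = 0; i < LEN; i++)
            stage(M[j], seq[j][i]);
}

void with_cslow(int M[NSEQ][NPROF], const char seq[NSEQ][LEN]) {
    /* C-slowed: the sequence loop is interchanged to the innermost
     * level, so two dependent stage() calls on the same sequence are
     * separated by S - 1 calls on independent sequences, keeping an
     * S-stage hardware pipeline full */
    for (int jj = 0; jj < NSEQ; jj += S)
        for (int i = 0; i < LEN; i++)
            for (int j = jj; j < jj + S; j++)
                stage(M[j], seq[j][i]);
}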

7.1.3 Implementing the Max-Prefix Operator

We have shown in the previous chapter that the max-prefix computation is part of the critical path. Although we need a fast network to reduce this critical path, resource minimization is also crucial in our case, since a smaller kernel block allows one to accommodate more coarse-grain parallelism. It would be very interesting to see how much a faster network, such as Sklansky or Kogge-Stone, can speed up our system, and to what extent a slower network, such as Brent-Kung, helps to reduce the resource consumption. Since the C-slowed pipelined architecture also allows the prefix network itself to be pipelined, we can hide the delay caused by the extra cycles required by a slow prefix network, and still benefit from the resource savings offered by such a network.

(a) Viterbi with C-slow: a Viterbi kernel with S pipeline stages receives S independent interleaved data streams, to overcome the feedback loop effect; the required number of processors is N, the profile size.

(b) Viterbi with C-slow and tiling: a Viterbi kernel implemented with S pipeline stages and S interleaved independent data inputs on N/T processors, through tiling.

Figure 7.2: Viterbi kernel implementation with a simple C-slowed pipeline and with a tiled C-slowed pipeline

Another important aspect is that most of the algorithmic explorations in the domain of prefix networks were done in a context where the operator is extremely fine grain (just a few Boolean gates, as in a half- or full-adder). Despite the fact that our computation scheme is based on the same prefix patterns as binary adders, our situation differs in two ways:

− The basic operation is not a bit-level but a more complex word-level operation (namely max).

− The size of the prefix can be very large (up to 256 input elements) which poses scalability issues in terms of routing.

To the best of our knowledge, there has been no systematic study of FPGA implementations of prefix computations. One reason is that the typical use of such circuits is in adders, where high-speed carry circuits are already provided by the FPGA vendors, and there are few applications that need coarse-grain, word-level operators. For the HMMER application, we implemented a number of max-prefix as well as max-reduce architectures. The performance comparisons are reported later in Section 7.3.2.

7.1.4 Managing Resource Constraints through Tiling

Both MSV and P7Viterbi dataflow graph sizes scale linearly1 with the target HMM profile size N. For large values of N (e.g., N > 100), the straightforward mapping of the complete dataflow graph to a hardware datapath quickly becomes out of reach of most FPGA platforms. However, since the computational pattern of both algorithms exhibits a lot of regularity, it is possible to apply a simple tiling transformation, which separates each dataflow of size

1The scaling is linear for the Brent-Kung architecture that we implemented. For the Ladner-Fischer architecture, the resource usage grows as n log n.

Figure 7.3: System-Level Implementation: First Approach. A DEMUX distributes input sequences over several parallel MSV filter → P7Viterbi pipelines (P iterations in parallel in MSV, P′ in P7Viterbi), and a MUX collects the results. (Stage annotations: 4M op. per base / 75% exec. time; 25M op. per base / 22% exec. time; 25M op. per base / 3% exec. time.)

N into P partitions (tiles), each of them calculating N/P consecutive values of the current column. This transformation, and its impact on the scheduling of the computations, is depicted in Fig. 7.2b. In the case of the MSV, the partitioned datapath should implement an N/P-wide max-reduction operator, whereas in the case of P7Viterbi, we need an N/P max-prefix operation (a sketch of the tiled max-reduction is given after Table 7.1). In summary, the characteristics of the various designs that we explored are listed in Table 7.1. It can be concluded that, in our case, an optimal throughput of O(N/P) can be obtained by combining the tiling and C-slowing techniques.

Table 7.1: A summary of the different architectural solutions, along with their space-time characteristics

Method           Area      Tclk              Throughput
Combinational    O(N)      O(log2 N)         O(N / log2 N)
Tiled            O(N/P)    O(log2(N/P))      O((N/P) / log2(N/P))
C-slow           O(N)      O(1)              O(N)
Tiled + C-slow   O(N/P)    O(1)              O(N/P)
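As a complement to Table 7.1, the sketch below models the tiled max-reduction used for MSV: the N cells of a column are processed as P sequential tiles of N/P cells, each of which would be reduced by a combinational word-level max tree in the datapath. This is a software model under illustrative names and sizes, not the Impulse C source.

    #include <limits.h>

    #define NCOL 256                         /* profile size N (illustrative)  */
    #define PT   8                           /* number of tiles P              */

    static inline int max2(int a, int b) { return a > b ? a : b; }

    int column_max(const int cell[NCOL])
    {
        int col_max = INT_MIN;
        for (int t = 0; t < PT; t++) {           /* tiles processed in sequence */
            int local = INT_MIN;
            for (int k = 0; k < NCOL / PT; k++)  /* N/P-wide parallel max tree  */
                local = max2(local, cell[t * (NCOL / PT) + k]);
            col_max = max2(col_max, local);      /* fold tile results           */
        }
        return col_max;
    }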

7.1.5 Accelerating the Full HMMER Pipeline

As mentioned in Section 6.3.5, improving the global performance requires that both MSV and P7Viterbi be accelerated in hardware. This can be done by streaming the output of the MSV to the input of the P7Viterbi, so as to map the complete HMMER3 pipeline to hardware. Special care must be given to the C-Slow factor of both accelerators, which must be the same to avoid a complex data reorganization engine between the two accelerators. In addition, depending on the available resources, it is even possible to instantiate several HMMER3 pipelines in parallel, as illustrated in Fig. 7.3. However, in order to optimize the hardware resource usage, we must also ensure that the pipeline workload is well distributed among the hardware accelerators. Let us quantify the total algorithm execution time Ttotal when the two task executions are pipelined; we have:

Ttotal = max(TMSV, α TViterbi)    (7.1)

where TMSV and TViterbi correspond to the average algorithm execution times of the two kernels, and where α is the filtering selectivity. Optimizing the performance therefore means ensuring that the

raw performance (in GCUPS) of the P7Viterbi accelerator is able to sustain the filtered output of the MSV accelerator; that is, its performance should be at least 1/50th that of MSV, following the filtering percentage of MSV in Fig. 6.5. Using this constraint, we can then define a set of pipeline configurations, by choosing distinct tiling parameters (i.e., partition sizes) for P7Viterbi and MSV such that the level of parallelism exposed in MSV is at least 50 times that of P7Viterbi.
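Spelling the balance condition out in compact form (taking the roughly 2% selectivity of the MSV filter reported in Fig. 6.5 as an assumed value):

    \[
      T_{\text{total}} = \max\bigl(T_{\text{MSV}},\; \alpha\, T_{\text{Viterbi}}\bigr),
      \qquad \alpha \approx 0.02 = \tfrac{1}{50},
    \]
    \[
      \text{balance:}\quad \alpha\, T_{\text{Viterbi}} \le T_{\text{MSV}}
      \;\Longleftrightarrow\; T_{\text{Viterbi}} \le 50\, T_{\text{MSV}},
    \]

so the P7Viterbi accelerator may run up to 50 times slower than MSV, and, conversely, the parallelism allocated to MSV should be at least 50 times that allocated to P7Viterbi.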

7.2 Implementation through High-Level Synthesis

Our design flow leverages high-level synthesis through a commercial C-to-hardware compiler (Impulse CoDeveloper C-to-FPGA) combined with the GeCos [rg] framework, a semi-automated source-to-source compiler targeted at program transformations for high-level synthesis. The combined use of these two tools allowed us to explore a very large architectural design space in a very reasonable amount of time. In this section we explain how the various hardware mapping techniques are implemented through Impulse C; we discuss the challenges raised by the HLS tools and the solutions that we adopted to generate an efficient hardware design. We have seen in Section 7.1.2 that, due to the C-Slow pipelining transformation, we now have a triply nested loop [j, i, k], instead of the previous doubly nested loops [i, k], for the MSV and P7Viterbi kernels described in Section 6.2.2 and Section 6.4. Since index j corresponds to independent input sequences, this loop can be executed in parallel on independent PEs. Similarly, as shown in Figure 6.7, calculations along index k can also be executed in parallel. However, for large profile sizes, the hardware resources may not be sufficient.

7.2.1 Loop Transformations

In our final implementation of MSV and P7Viterbi, we perform a loop interchange to interleave independent sequences to be processed on the same PE, i.e., [j, i, k] → [i, j, k]. For MSV, we implement a fully parallel loop k. However, in the case of P7Viterbi, a completely parallel implementation of loop k cannot fit on a single chip, so we perform strip-mining on the k axis to transform our loop nest into [j, i, k′, k″], where k″ is the parallel loop. Once we have an innermost parallel loop, we would like to pipeline the rest of the loop nest. However, since Impulse C allows only the innermost loop to be pipelined, the architecture would experience repeated pipeline starts and stops, and since the inner loop count is smaller than the outer one, the pipeline would spend most of its time in the epilog and prolog states. In order to avoid this situation, we need to coalesce the rest of the loop nest, [i, j] for MSV and [i, j, k′] for P7Viterbi, to obtain a regular and unbroken pipeline (i.e., the prolog of the next outer loop iteration overlaps the epilog of the current loop iteration). Since Impulse C does not support any of these loop transformations, we have to apply them through GeCos or by a manual rewriting of the code.
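The loop-nest shapes before and after these transformations can be sketched as follows. Here cell() stands for one dynamic-programming cell update, and the constants are illustrative placeholders rather than the actual Impulse C source.

    #define LEN 100   /* sequence length                       */
    #define S   8     /* C-slow factor (interleaved sequences) */
    #define N   256   /* profile size                          */
    #define P   8     /* number of tiles                       */

    /* Placeholder for one DP cell update. */
    static void cell(int j, int i, int k) { (void)j; (void)i; (void)k; }

    /* After interchange [j,i,k] -> [i,j,k]: the S interleaved sequences
       fill the pipeline between two dependent iterations of i.           */
    void msv_interchanged(void)
    {
        for (int i = 0; i < LEN; i++)
            for (int j = 0; j < S; j++)         /* C-slow interleaving    */
                for (int k = 0; k < N; k++)     /* fully parallel for MSV */
                    cell(j, i, k);
    }

    /* P7Viterbi: strip-mine k into [k', k''], then coalesce [i,j,k'] into
       one loop so the pipeline never drains between outer iterations.    */
    void p7viterbi_coalesced(void)
    {
        for (int c = 0; c < LEN * S * P; c++) {
            int i  = c / (S * P);               /* recovered loop indices */
            int r  = c % (S * P);
            int j  = r / P;
            int k1 = r % P;                     /* k'  : tile index       */
            for (int k2 = 0; k2 < N / P; k2++)  /* k'' : parallel in tile */
                cell(j, i, k1 * (N / P) + k2);
        }
    }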

7.2.2 Loop Unroll & Memory Partitioning

One big challenge when designing with Impulse C is the limitation of its loop unrolling feature. Impulse C can only perform full unrolling of the innermost loop and does not support partial unrolling, which is required for P7Viterbi. Although for MSV we could use this automated full unrolling of loop k, another constraint limits the design efficiency, namely the limitation of Impulse C on memory partitioning. Impulse C can only scalarize an array completely into variables, and does not support automatic memory partitioning into split blocks or interleaved blocks, as discussed in Section 5.3.9. Since the HMMER kernels need to access a lot of profile data, it is not a wise choice to map the profile database onto logic cells, as this also makes the design more complex, with muxes to access the right register. Thus, the limitation on memory partitioning prevents us from using the automated loop unrolling, and we have to implement both transformations (i.e., loop unrolling and memory partitioning) manually.
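A hand-written equivalent of partial unrolling with interleaved memory banks might look as follows: the profile array is split into U physical memories such that the U accesses of one iteration always hit distinct banks. The names and sizes (U, profile_bank, score) are illustrative, not the actual kernel code.

    #define U    8                   /* unroll factor = datapath width     */
    #define NPRO 256                 /* profile size (illustrative)        */

    /* Cyclic banking: element k lives in bank k % U at depth k / U, so a
       group of U consecutive elements touches U different memories.       */
    int profile_bank[U][NPRO / U];
    int score[NPRO];                 /* would be partitioned the same way  */

    static inline int max2(int a, int b) { return a > b ? a : b; }

    void unrolled_update(void)
    {
        for (int k = 0; k < NPRO; k += U)
            for (int u = 0; u < U; u++)   /* U independent ops per cycle   */
                score[k + u] = max2(score[k + u], profile_bank[u][k / U]);
    }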

7.2.3 Ping-Pong Memories

In our implementation, read/write accesses to memory locations in consecutive cycles belong to independent input sequences; hence a datum written in cycle t will be read in cycle t + S, where S is the slowing factor. These memories are accessed in a circular manner, and a data read and a data write have no dependency. However, the Impulse C compiler conservatively imposes a dependency on such accesses and does not allow parallel read and write accesses, which would increase the pipeline initiation interval. In order to cope with this constraint, we use ping-pong memories, where we read and write to these memories on alternate cycles (i.e., every cycle we read from one memory and write to the other, and vice versa). The use of ping-pong memories duplicates the required memory resources, but it also helps to improve the design throughput. On the other hand, since the C-Slow factor, which determines the size of each memory, is very small, the duplication of memories does not heavily affect the resource budget.
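A minimal model of the ping-pong scheme is sketched below, assuming the bank roles line up with the slowing factor (an odd S in this simplified version) and that both banks are preloaded; memA, memB and compute are illustrative names, not the actual design.

    #define SLOW  7                  /* slowing factor = memory depth      */
    #define STEPS 1000

    /* Placeholder for one pipelined cell update. */
    static int compute(int prev) { return prev + 1; }

    void ping_pong(int memA[SLOW], int memB[SLOW])
    {
        int ping = 0;
        for (int t = 0; t < STEPS; t++) {
            int idx  = t % SLOW;                      /* circular addressing  */
            int prev = ping ? memB[idx] : memA[idx];  /* written S cycles ago */
            int next = compute(prev);
            if (ping) memA[idx] = next;               /* write the other bank */
            else      memB[idx] = next;
            ping = !ping;            /* swap read/write roles every cycle     */
        }
    }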

7.2.4 Scalar Replacement

If, in a single cycle, the same memory location is accessed several times (without any intermediate write back to the same location), it is better to store the first read in a local register and reuse it, to avoid any additional memory latency. The Impulse C compiler does not support automatic scalar replacement of such multiple memory accesses. We implemented scalar replacement manually, by reading the memory cell into a register at its first use, and using this register for later accesses. Similarly, multiple memory write accesses are combined into a single access during the last write operation, while intermediate operations write to a register.
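The manual rewriting amounts to the following before/after pattern (mem, x and y are illustrative names and the arithmetic is a placeholder):

    /* Before: two reads and two writes on mem[i] in one iteration. */
    void update_naive(int *mem, int i, int x, int y)
    {
        mem[i] = mem[i] + x;         /* read 1, write 1                   */
        mem[i] = mem[i] * y;         /* read 2, write 2                   */
    }

    /* After scalar replacement: one read, one write. */
    void update_replaced(int *mem, int i, int x, int y)
    {
        int r = mem[i];              /* single read, at first use         */
        r = r + x;                   /* intermediates live in a register  */
        r = r * y;
        mem[i] = r;                  /* single write, at last use         */
    }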

7.2.5 Memory Duplication

In the presence of a tiled execution of size T, a read/write diagonal dependency results in reading a data set from locations [i, . . . , i + T ], computing, and writing back to locations [i + 1, . . . , i + T + 1], where location [i + T + 1] is accessed during the next cycle to read the previous data. Hence, a straightforward write operation would corrupt the data. One solution to this problem is to duplicate the complete memory and access the copies in a ping-pong manner. A better choice is to duplicate only the tile corners, and to perform the write operation to the duplicated cell, which can update the original corner location with a delay of one cycle.
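The mechanism can be sketched as follows: the write destined for the tile corner is buffered in a shadow register and committed one cycle later, once the old value has been consumed. This is a simplified model of the idea only; the exact read/write schedule of the real datapath is more involved, and v, g and the constants are illustrative.

    #define TSIZE 8                  /* tile size T                        */
    #define STEPS 100

    /* Placeholder for the computation feeding the corner cell. */
    static int g(int old) { return old + 1; }

    void corner_duplication(int v[STEPS + TSIZE + 2])
    {
        int shadow = 0, shadow_loc = 0, shadow_valid = 0;
        for (int i = 0; i < STEPS; i++) {
            int old_corner = v[i + TSIZE + 1];  /* old value, still intact */
            if (shadow_valid)
                v[shadow_loc] = shadow;         /* delayed commit of the
                                                   corner written last cycle */
            /* ... read v[i] .. v[i+T], compute, and write the interior
               cells v[i+1] .. v[i+T] directly ...                          */
            shadow       = g(old_corner);       /* new corner value         */
            shadow_loc   = i + TSIZE + 1;       /* committed next cycle     */
            shadow_valid = 1;
        }
    }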

7.3 Experimental results

In this section we provide an in-depth quantitative analysis of our proposed architectures, and we compare their performance with that of a state-of-the-art software implementation of HMMER3 on a CPU using the SSE SIMD implementation. Our target execution platform consists of a high-end FPGA accelerator from XtremeData (XD2000i-FSBFPGA), which has already been successfully used for implementing bioinformatics algorithms [ACL+09]. This platform contains two Stratix-III 260 FPGAs, a high-bandwidth local memory (8.5 GBytes/s) and a tight coupling to the host front side bus through Intel QuickAssist technology, providing a sustained 2 GBytes/s bandwidth between the FPGA and the host main system memory. The rest of this section is organized as follows: we first make a quantitative analysis of the speed/area results for the MSV accelerator; then we address the mapping of the max-prefix network implemented on the FPGA, along with the P7Viterbi implementation results. Finally, we discuss the system-level performance and compare the performance of our approach with that of a hypothetical, state-of-the-art GPU implementation.

7.3.1 Area/Speed Results for the MSV Filter

Table 7.2 summarizes the area and speed of the MSV hardware accelerators for different values of N and S (the MSV accelerator does not need tiling, as it fits in the FPGA for all profile sizes). It can be observed that, even though we use a C-to-hardware high-level synthesis tool, we are able to achieve remarkably high operating frequencies (up to 215 MHz). When compared1 with Table 6.1, these results indicate a speedup for a single accelerator varying between 3× and 6× depending on the profile size N.

7.3.2 Area/Speed Results for Max-Prefix Networks

As mentioned in Section 7.1, the P7Viterbi implementation uses a parallel max-prefix scheme, for which many implementations exist. As this computational pattern is at the core of the modified algorithm, we explored several alternative implementations to experimentally quantify their respective merits with respect to an FPGA implementation.

1This is a rough approximation, as we should also account for the time spent by the software in P7Viterbi (50% of the total execution time).

Table 7.2: Speed and resource usage of a single MSV kernel hardware implementation

N     C-Slow (S)   Logic Util.   M9K         MHz   GCUPS
64    7            10k / 5%      66 / 8%     215   14
128   8            19k / 9%      130 / 15%   201   26
256   9            37k / 19%     258 / 30%   175   45
512   10           69k / 34%     513 / 60%   160   82

Figure 7.4: Speed/Area results for combinational parallel max-prefix implementations on a Stratix-III FPGA.

We used an in-house Java-based RTL generator to generate these network topologies. The results provided in Fig. 7.4 show that, for large values of N, so-called fast implementations of parallel prefix such as Kogge-Stone or Ladner-Fischer provide only marginal speed improvements with respect to the Brent-Kung architecture. This can easily be explained by the long wires used in the first two approaches, which make the routing much more challenging on an FPGA. For our implementation, we decided to use the Brent-Kung architecture due to its minimal resource utilization, and we increased its speed by pipelining all stages of the network.

7.3.3 Area/Speed Results for the P7Viterbi Filter

Table 7.3 summarizes the area and speed of the P7Viterbi hardware accelerators for different values of N, S and P. It can be observed that the rewritten P7Viterbi kernel delivers quite promising speed with a log2(N) C-Slow factor. By fitting multiple instances of P7Viterbi on a chip, it can alone (i.e., without using the MSV filter) perform better than earlier implementations of HMMER2 reported in Table 6.2.

Table 7.3: Performance and area for a Single P7Viterbi implementation

P    N     Logic Util.    M9K           MHz   GCUPS
8    64    5.8K / 2.8%    69 / 8%       126   1
8    128   6.8K / 3.3%    128 / 14.8%   124   0.99
16   64    10.1K / 4.9%   112 / 13%     119   1.9
16   256   14K / 6.9%     170 / 19.7%   117   1.87
32   256   28.7k / 14%    332 / 38%     112   3.6

Table 7.4: Speed/Area results for our System-Level implementation

N     P    Logic Util.   M9K         MLAB   MHz   GCUPS
64    8    21K / 10%     54 / 6%     27Kb   99    7
64    16   26K / 13%     99 / 11%    29Kb   97    7.6
128   8    31K / 15%     57 / 7%     51Kb   94    12.7
128   16   38k / 19%     105 / 12%   55Kb   93    13
512   8    79K / 39%     675 / 78%   66Kb   89    45.7

Figure 7.5: Speed/Area results for a single HMMER3 pipeline implementation

7.3.4 System level performance

So far, we have provided area/performance results only for standalone accelerator modules, which should be integrated together into one or several complete HMMER3 computation pipelines. Following the constraints on pipeline workload balancing, and given the resources available for implementing the accelerator on a chip, we derived a set of pipeline configurations depending on the target profile size N. These configurations have parameters (C-Slow factor S, tiling parameter P) chosen to maximize the overall performance. The set of parameters for a given value of N is chosen as follows:

− First, we choose the C-slow factor S to enable fine grain pipelining (i.e. at the operator level) of the MSV accelerator. The same value is then used for the pipelining of the P7Viterbi accelerator.

− Second, we choose the tiling size P so that the P7Viterbi accelerator can sustain the MSV input throughput.

Table 7.4 reports Quartus place-and-route data for the pipeline configurations that we derived through this approach. The results presented correspond to a single pipeline implementation. They show that speedups of up to 4.5×, compared to HMMER V3-SSE in Table 6.1, can be achieved for a single execution pipeline implemented on one of the two FPGAs of the platform. By implementing multiple HMMER3 pipelines, we should also be able to improve the overall speed for smaller profile sizes (e.g., 64 and 128); indeed, more than 9× speedups could be achieved compared to our baseline QuadCore SSE implementation for smaller profile sizes. By using both FPGAs on the board, we could double the number of HMMER3 pipelines running on the XD2000i platform. However, due to firmware and device driver limitations, only one of the two FPGAs can be used at a time.

7.3.4.1 Discussion

So far, we have followed a typical approach for the system-level implementation, i.e., keeping all pipelines independent of each other and connecting each MSV block to its own P7Viterbi block, as shown in Fig. 7.3. At the end, results are collected from all P7Viterbi blocks. However, this approach lacks load balancing after the MSV filters. The initial demultiplexer distributes sequences evenly between the available MSV blocks, but there is no guarantee that the following P7Viterbi blocks will also receive the same amount of inputs to process, and this might result in an inefficient use of the P7Viterbi units. Similarly, when increasing coarse-grain parallelism, a complete HMMER3 pipeline needs to be instantiated, which might not fit inside the remaining resources and would then reduce the obtainable speedup, due to underutilization of the resources. Likewise, for S input sequences and N existing parallel pipelines, each additional MSV unit reduces the per-MSV load by S/N − S/(N+1) = S/(N(N+1)), while an additional P7Viterbi unit only reduces the per-P7Viterbi load by 0.02 × S/(N(N+1)).


Figure 7.6: System-level view of a complete HMMER3 hardware accelerator (I&C = Initialization & Configuration). The figure shows an HMMER3 pipeline with 5 computation blocks (i.e., 3 MSV blocks communicating with 2 P7Viterbi blocks). The Sequence DEMUX distributes the interleaved sequences between computation blocks. The Filter discards the sequences with scores lower than the threshold. The Sequence MUX receives interleaved sequences and outputs un-interleaved sequences to the Sequence DEMUX.

Hence, an additional P7Viterbi block is not as beneficial as an additional MSV block. In order to solve these issues, we propose a complete redesign at the system level in the next section.

7.3.5 A Complete System-Level Redesign

A better way to utilize both the available hardware resources and the HMMER filtering characteristics is to couple a larger number of MSV blocks to a smaller number of P7Viterbi blocks, as shown in Figure 7.6. However, this design requires a complex filtering step: collecting results and sequences from all MSV blocks, filtering out the sequences with scores below the specified threshold, interleaving the remaining sequences again, and distributing them to the P7Viterbi blocks. We organize this step into three modules: Filter, Sequence MUX and Sequence DEMUX. A dedicated Filter for each MSV block collects the results, performs the filtering step, and sends the surviving sequences to the Sequence MUX. The Sequence MUX receives inputs from all Filter blocks, un-interleaves all sequences and sends serialized sequences to the Sequence DEMUX, which interleaves the sequences once again and distributes them evenly among the P7Viterbi blocks. With this approach, the Sequence DEMUX block works as a load-balancing component, collecting the sequences and distributing them evenly between computation components. However, when selecting the numbers of MSV and P7Viterbi components, Eq. (7.1) should be taken into consideration to keep TMSV and TViterbi balanced. Table 7.5 shows the area and speed of the system-level implementation described in

Fig. 7.6. It can be seen that, by fitting multiple HMMER3 pipelines, a speedup of 13× can be achieved compared to the QuadCore SSE implementation for profile sizes of 64, and for higher profile sizes we reach a speedup of 5× over the software implementation. Since our FPGA platform is not a recent one, we limit our comparison to a QuadCore machine instead of an 8-core machine. A fair comparison with an 8-core machine implementation should be against the recent Stratix V FPGAs, which contain almost 4 times more resources than a Stratix III.

As the profile size N varies between 50 and 650 [DQ10], the design should be able to handle arbitrary values of N and sustain its speedup. This can be managed by taking advantage of the reconfigurable nature of FPGAs: for a given profile size, we can choose and load, from a set of predetermined bitstream configurations, the configuration which best suits that specific profile size. As hmmsearch compares an HMM profile against a database of sequences, the reconfiguration is only required at the beginning of a new profile-sequence comparison. To obtain real-life performance measurements, we benchmarked our implementation on representative profile and sequence data sets. The experimentally measured speedups were on average within 80% of the place-and-route estimated results. This discrepancy can be explained by the communication and initialization overhead. The amount of initialization overhead depends on the size of the HMM profile to be loaded; however, when a large sequence database is processed after the initialization, this overhead becomes negligible.

Figure 7.7: Speed/Area results for the multiple HMMER3 pipelines implementation

Table 7.5: Performance and area for our System-Level implementation

N     P   MSV   P7Vit   Logic Util.   M9K          MLAB    MHz   GCUPS
64    8   6     1       128K / 63%    864 / 100%   215Kb   103   40.0
64    8   7     1       143K / 70%    864 / 100%   269Kb   97    44.2
128   8   3     1       105K / 52%    864 / 100%   181Kb   95    37.2
256   8   2     1       147K / 72%    864 / 100%   203Kb   99    51.5
512   8   1     1       79K / 39%     675 / 78%    66Kb    89    45.7

7.3.6 Discussion

One question raised by our results is whether the implementation of a complex system-level architecture, such as that of Fig. 7.6, is beneficial or not. The area cost of the auxiliary components (e.g., MUX, DEMUX, Sequence Interleaver) becomes much higher than the area cost of such components in independent pipelines, and this cost restricts the overall speedup improvement in comparison with the independent pipelines of our first approach. It can be concluded that, when implementing a complex architecture in hardware, a high implementation effort may yield only a marginal speedup compared to a simple architecture involving fewer auxiliary components. Another important point of discussion is whether our FPGA would actually perform faster than an equivalent GPU implementation. This is an important point, as GPUs offer more flexibility at a much lower cost than a typical HPC FPGA platform, albeit at a higher cost in power/energy consumption. Unfortunately, there is currently no GPU version of HMMER3 available for comparing the two targets. We believe, however, that contrary to HMMER2, a GPU version of HMMER3 would only offer a marginal speedup with respect to the optimized SSE version. The GPU speedups for HMMER2 were reported to be in the order of 15–35× [WBKC09a] and 20–70× [GCBT10] for a single GPU over the software implementation, while the software implementation of HMMER3 gives a 340× performance improvement compared to that baseline. It is well known that very well optimized multi-core software implementations reduce the performance gap against a GPU implementation to only 2.5× on average [LKC+10]. When looking at the HMMER3 speedup results (given in Table 6.1), it turns out that the use of an optimized SSE software implementation alone brings up to a 27× speedup over the non-SSE version, a speedup somewhat comparable to the GPU implementations for HMMER2, and which is mostly due to the systematic use of sub-word parallelism, as discussed in Section 6.3.4. As GPUs do not support short-integer sub-word parallelism, it is very unlikely that they would do much better than the SSE implementation.

7.4 Conclusion

The rewritten version of HMMER has permitted us to obtain a full and efficient parallelization of the two kernels in hardware. In this chapter, we have combined the new parallelization scheme with an architectural design space exploration stage and we have

presented various fine-grain parallelization and resource management strategies. Finally, we have shown the implementation of each kernel individually as well as in combination in the HMMER execution pipeline. After determining the best performing architecture for a given HMMER profile size, taking into account the amount of available hardware resources and the pipeline workload balance, we have presented two system-level implementation schemes. An improvement would be to take advantage of data reusability. Because HMMER is often used in a context where a profile database is matched against a sequence database, reducing the bandwidth pressure by using the on-board memory of the accelerator to store part of the input data set seems an attractive option: the sequence database can be received at the beginning, stored in the on-board memory, and reused for each profile of the HMM profile database. Indeed, it can be observed that, for small HMM profiles (say N < 128), the combined throughput of a complete system-level accelerator (with several pipelines) gets very close to the maximum throughput supported by the board.

8 Conclusion & Future Perspectives

8.1 Conclusion

In recent years, the growing dependence of biological research and of the pharmaceutical industry on advancements in bioinformatics, and the exponential increase in the amount of available biological data to process, have urged the development of powerful computational platforms that can sustain the accumulating computational requirements. The resulting platforms can leverage the advancements in parallel computing. Among several alternatives, such as multi-cores, GPUs, grids and ASICs, FPGAs appear to be a viable target platform due to their reconfigurability along with their speed and power efficiency. It is worth noting that a power-optimized framework can considerably reduce the operating cost over a long life cycle. However, the major impediment to the widespread use of FPGAs is the design cost. Custom circuits on FPGAs are usually described at the Register Transfer Level, which provides very little abstraction. Such a low-level design description results in an error-prone design path, and a large part of the design time is usually spent in design verification efforts. Although the resulting design is very efficient, it is architecture-specific and offers little opportunity for code reuse. Thus, an FPGA-based accelerator with a long development cycle, resulting in a rigid design, can hardly be helpful for the bioinformatics community, which requires computing platforms that are powerful as well as scalable and flexible with respect to the constantly growing requirements. Fortunately, High-Level Synthesis tools can overcome the problems associated with the FPGA design flow. HLS tools transform the long, error-prone design path into an automated design step, where the designer's job is reduced to providing an abstract specification of the application's functionality, from which the tool generates an RTL design automatically. Since the abstract specification (usually a C program) is truly generic, the design can be


altered easily and can be retargeted to an up-to-date platform. HLS-based accelerators thus appear to satisfy the requirements of bioinformatics. In this research work, we investigated HLS-based accelerators for bioinformatics, specifically HMMER, a bioinformatics application well known for its compute-intensiveness and for its hard-to-parallelize data dependencies. In order to obtain an efficient design, we first explored the behavior of various HLS tools with different design inputs. It has been widely observed that the hardware generated by an HLS tool largely depends on the input C code. The higher abstraction level of the input design results in a huge design space that needs to be explored. Hence, an HLS-based designer needs to understand what kind of C code will be translated into what kind of hardware circuit. Chapter 5 covered the techniques that can lead to an efficient hardware design. We demonstrated these design practices for an HLS-based design on the acceleration of HMMER. The computation kernels of HMMER, namely MSV and P7Viterbi, are very compute-intensive, and their data dependencies, if interpreted naively, lead to a purely sequential execution. We proposed an original parallelization scheme for HMMER based on rewriting its mathematical formulation, in order to expose hidden parallelization opportunities. We discussed the different challenges involved in the architectural mapping and exercised various fine-grain parallelization techniques. In order to take full advantage of the available hardware resources, we employed and compared coarse-grain parallelization schemes through different system-level implementations of the complete execution pipeline, based either on several independent pipelines or on a large aggregated pipeline. Our parallelization scheme targeted FPGA technology, and our architecture can achieve up to 13 times speedup compared with the latest HMMER3 SSE version on a Quad-core Intel Xeon machine, without compromising the sensitivity of the original algorithm. This research shows that a reasonable amount of design effort (still at the abstract level) can make C-based hardware development as efficient as manual RTL design. HLS-based acceleration makes FPGAs very affordable target platforms. The reduced design time can now sustain fast-paced time-to-market requirements. Similarly, the reduced design effort allows complex algorithms to be implemented at the abstract level, and makes the FPGA design flow competitive with that of multi-cores and GPUs. Furthermore, the low power consumption of FPGAs, along with the faster design time and high design performance, can outperform other acceleration platforms. This research focuses on the acceleration of a bioinformatics application on FPGA using high-level synthesis; our approach shows promising speedup results and makes a strong case for using C-to-hardware tools for FPGA-based acceleration. Future challenges and some interesting aspects to be explored as short-term goals are as follows. FPGAs demonstrate considerable performance achievements with a low power budget; however, there are still many challenges to be addressed for a wide adoption in the high-performance computing community. High-Level Synthesis tools only partly answer one major concern: the programming model is simplified and HPC designers no longer need to program in an HDL. However, a hardware design still requires a different way of

thinking, and even with the most advanced HLS tool, software developers must be "system-aware" to produce good performance results. Since each HLS tool is equipped with its own set of design translation techniques, which might not be the same as any other tool's, a designer working with several HLS tools will not reach a uniform level of efficiency on each of them. Hence, the designer should understand the capabilities of each tool and provide the design specifications accordingly. Since FPGAs are normally used as coprocessors in a hardware-software co-design paradigm, the I/O bandwidth of such a design may become a performance bottleneck. Similarly, the manual mapping to a platform, e.g., HW-SW partitioning and designing the interface between the hardware and software parts, still hinders software designers from performing FPGA-based design.

In order to make FPGAs easier for designers to adopt, HLS tools need to make an extra effort to incorporate techniques related to application analysis and parallelization extraction. In other words, the techniques for an efficient hardware design discussed in Chapter 5 need to be embedded inside the HLS compilation framework. This would reduce the need for specialized skills, such as "thinking in hardware", and may attract more software designers to hardware-based acceleration. The research efforts towards front-end source-to-source compilers, such as [MKFD11, Ple10, RP08], seem to be an interesting way to extract parallelization automatically through these tools and feed the output source code to HLS tools.

Similarly, HLS tools need to provide support for more and more FPGA platforms through Platform Support Packages (PSPs), so that the tools can generate architecture-specific designs and hardware/software interfaces automatically.

In our research efforts, we have been limited to using one of the two FPGAs on the XD2000i. It would be interesting to use both FPGAs, after the release of a firmware update, and observe how the design sustains the doubled data bandwidth requirement. Similarly, our final system-level implementation of HMMER is an aggregated pipeline; it would be interesting to study the possibility of a single aggregated pipeline implemented across both FPGAs, so that a single load-balancing unit distributes inputs to all P7Viterbi blocks. Besides the exploration on FPGA, it would also be interesting to implement a SIMD-optimized version of the P7Viterbi block on an 8-core host machine and implement only the MSV blocks on the FPGA. The only concern is to keep the execution times of both kernels balanced, as discussed earlier. Since SIMD-based optimizations can accelerate P7Viterbi significantly, it would be interesting to see whether the performance of P7Viterbi on an 8-core machine can keep pace with the performance of the MSV blocks on the FPGA, given that P7Viterbi needs to deal with only 2% of the original data input to the MSV.

The algorithmic rewriting of HMMER has exposed many parallelization opportunities which were not visible before. It would be interesting to implement the rewritten HMMER on GPUs to establish a fair comparison with the FPGA's performance.

8.2 Future Perspectives

In this research work, we studied the feasibility of using High-Level Synthesis for the acceleration of bioinformatics applications and experimented with optimization techniques on the dynamic programming kernels inside HMMER. Bioinformatics comprises a large class of algorithms based on dynamic programming techniques. Many of these algorithms exhibit data dependencies that are similar to, or vary only slightly from, those of the P7Viterbi and MSV kernels, and it would be interesting to apply the techniques we have learnt to accelerate them. For example, the Smith-Waterman and Needleman-Wunsch algorithms have much simpler data dependencies than P7Viterbi, since there is no feedback loop causing an inter-column sequential execution. Similarly, the RNA-folding algorithms have similar, but more complex, data dependencies due to non-local memory accesses. One future perspective would be to accelerate these candidate algorithms by experimenting with techniques similar to those we used for HMMER. The rewriting of the HMMER kernels, based on look-ahead computations, helped to expose the hidden parallelism. It would be interesting to use similar rewriting techniques on other dynamic programming algorithms to expose their parallelism. For example, it is well known that the computations of the Smith-Waterman algorithm lying on the same diagonal can be computed in parallel. However, we can rewrite the computations and express the dependencies in terms of alternate diagonals. This kind of rewriting would enable us to compute every other diagonal, at the same time reusing the partial computations of the middle diagonal. Similarly, look-ahead computations may help to break the intra-diagonal dependencies in the Nussinov algorithm. Another interesting perspective is to automate the identification of parallel prefix networks for parallelization extraction. Reduction computations and the various types of prefix networks give the designer the freedom to trade speed against area in the resulting design. It would be interesting to automatically select the most suitable architecture for a given application. Such an analysis can be based on the time slack between the production of the inputs of such an architecture and the first consumption of its output; a sophisticated architecture can then be implemented by utilizing the available cycles to execute the computation.

Bibliography

[AAHCH03] Mirela Andronescu, Rosalía Aguirre-Hernández, Anne Condon, and Holger H. Hoos, RNAsoft: a Suite of RNA Secondary Structure Prediction and Design Software tools, Nucleic Acids Research 31 (2003), no. 13, 3416–3422.

[ABD07] Christophe Alias, Fabrice Baray, and Alain Darte, Bee+Cl@k: an Implementation of Lattice-based Array Contraction in the Source-to-Source Translator Rose, Languages, Compilers, and Tools for Embedded Systems, 2007, pp. 73–82.

[ACL+09] Jeffrey Allred, Jack Coyne, William Lynch, Vincent Natoli, Joseph Grecco, and Joel Morrissette, Smith-Waterman implementation on a FSB-FPGA module using the Intel Accelerator Abstraction Layer, IPDPS, 2009, pp. 1–4.

[Aga08] S.K. Agarwal, Bioinformatics, APH Publishing Corporation, 2008.

[AGM+90] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman, Basic Local Alignment Search Tool, Journal of Molecular Biology 215 (1990), no. 3, 403–410.

[AT01] G. Acosta and M. Tosini, A Firmware Digital Neural Network for Climate Prediction Applications, Intelligent Control, 2001 (ISIC '01), Proceedings of the 2001 IEEE International Symposium on, 2001, pp. 127–131.

[BBdM08] Rodolfo Bezerra Batista, Azzedine Boukerche, and Alba Cristina Magalhaes Alves de Melo, A parallel strategy for biological sequence alignment in restricted memory space, J. Parallel Distrib. Comput. 68 (2008), 548–561.

[BCHM94] P Baldi, Y Chauvin, T Hunkapiller, and M A McClure, Hidden Markov models of biological primary sequence information, Proceedings of the National Academy of Sciences 91 (1994), no. 3, 1059–1063.

[BK82] R.P. Brent and H.T. Kung, A Regular Layout for Parallel Adders, IEEE Transactions on Computers C-31 (1982), no. 3, 260–264.


[BKMH96] Philipp Bucher, Kevin Karplus, Nicolas Moeri, and Kay Hofmann, A flexible motif search technique based on generalized profiles, Computers & Chemistry 20 (1996), no. 1, 3–23.

[Ble90] Guy E. Blelloch, Prefix Sums and Their Applications, Tech. Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.

[BMP+03] Michael Brudno, Sanket Malde, Alexander Poliakov, Chuong B. Do, Olivier Couronne, Inna Dubchak, and Serafim Batzoglou, Glocal alignment: finding rearrangements during alignment, Eleventh International Conference on Intelligent Systems for Molecular Biology, Brisbane, Australia (2003).

[BMZ11] R. Beidas, Wai Sum Mong, and Jianwen Zhu, Register Pressure Aware Scheduling for High Level Synthesis, Design Automation Conference (ASP-DAC), 2011 16th Asia and South Pacific, Jan. 2011, pp. 461–466.

[BPN01] P. Baptiste, C.L. Pape, and W. Nuijten, Constraint-based Scheduling: Applying Constraint Programming to Scheduling Problems, International series in operations research & management science, Kluwer Academic, 2001.

[BSWG00] Mihai Budiu, Majd Sakr, Kip Walker, and Seth Copen Goldstein, Bit-Value Inference: Detecting and Exploiting Narrow Bitwidth Computations, Proceedings of the 6th International Euro-Par Conference on Parallel Processing (London, UK), Euro-Par '00, Springer-Verlag, 2000, pp. 969–979.

[BVK08] K. Benkrid, P. Velentzas, and S. Kasap, A High Performance Reconfigurable Core for Motif Searching Using Profile HMM, NASA/ESA Conference on Adaptive Hardware and Systems, June 2008, pp. 285–292.

[Cas] Norman Casagrande, Basic-Algorithms-of-Bioinformatics Applet, http://baba.sourceforge.net.

[CCSV04] Mario Cannataro, Carmela Comito, Filippo Lo Schiavo, and Pierangelo Veltri, Proteus, a Grid based Problem Solving Environment for Bioinformatics: Architecture and Experiments, IEEE Computational Intelligence Bulletin 3 (2004).

[CD08] João M.P. Cardoso and Pedro C. Diniz, Compilation Techniques for Reconfigurable Architectures, 1st ed., Springer Publishing Company, Incorporated, 2008.

[CFH+05] J. Cong, Yiping Fan, Guoling Han, Yizhou Lin, Junjuan Xu, Zhiru Zhang, and Xu Cheng, Bitwidth-aware Scheduling and Binding in High-Level Synthesis, Design Automation Conference, 2005, Proceedings of the ASP-DAC 2005, Asia and South Pacific, vol. 2, Jan. 2005, pp. 856–861.

[CGMT09] P. Coussy, D.D. Gajski, M. Meredith, and A. Takach, An Introduction to High-Level Synthesis, Design Test of Computers, IEEE 26 (2009), no. 4, 8–17.

[Dar99] Alain Darte, On the Complexity of Loop Fusion, Parallel Computing 26 (1999), 149–157.

[DH02] Alain Darte and Guillaume Huard, New Results on Array Contraction, Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors (Washington, DC, USA), ASAP ’02, IEEE Computer Society, 2002, pp. 359–.

[DQ07] Steven Derrien and Patrice Quinton, Parallelizing HMMER for hardware acceleration on FPGAs, ASAP 2007, 18th IEEE International Conference on Application-specific Systems, Architectures and Processors, July 2007.

[DQ10] Steven Derrien and Patrice Quinton, Hardware Acceleration of HMMER on FPGAs, Journal of Signal Processing Systems (2010), 53–67.

[DR00] Steven Derrien and Tanguy Risset, Interfacing compiled FPGA programs: the MMAlpha approach, International Workshop on Engineering of Reconfigurable Hardware/Software Objects, 2000.

[DRQR08] Steven Derrien, Sanjay Rajopadhye, Patrice Quinton, and Tanguy Risset, High-Level Synthesis of Loops Using the Polyhedral Model, High-Level Synthesis (Philippe Coussy and Adam Morawiec, eds.), Springer Netherlands, 2008, pp. 215–230.

[DSLS10] U. Dhawan, S. Sinha, Siew-Kei Lam, and T. Srikanthan, Extended Compatibility Path based Hardware Binding Algorithm for Area-Time Efficient Designs, Quality Electronic Design (ASQED), 2010 2nd Asia Symposium on, Aug. 2010, pp. 151–156.

[DTOG+10] Paolo Di Tommaso, Miquel Orobitg, Fernando Guirado, Fernando Cores, Toni Espinosa, and Cedric Notredame, Cloud-Coffee: Implementation of a Parallel Consistency-based Multiple Alignment Algorithm in the T-Coffee package and its Benchmarking on the Amazon Elastic-Cloud, Bioinformatics 26 (2010), no. 15, 1903–1904.

[Edd] Sean Eddy, HMMER3: a new generation of sequence homology search software, http://hmmer.janelia.org/.

[Edd98] Sean R. Eddy, Profile Hidden Markov Models, Bioinformatics 14 (1998), no. 9, 755–763.

[Edd09] Sean R. Eddy, A new generation of homology search tools based on probabilistic inference, International Conference on Genome Informatics 23 (2009), 205–211.

[Edd10] Sean R. Eddy, HMMER User's Guide: Biological sequence analysis using profile hidden Markov models, Janelia Farm Research Campus, 2010.

[Edd11a] Sean R. Eddy, Accelerated Profile HMM Searches, PLoS Computational Biology 7 (2011), no. 10, e1002195.

[Edd11b] Sean R. Eddy, Accelerated profile HMM searches (preprint), 2011.

[EGMJdM10] Juan Fernando Eusse Giraldo, Nahri Moreano, Ricardo Pezzuol Jacobi, and Alba Cristina Magalhaes Alves de Melo, A HMMER Hardware Accelerator using Divergences, Proceedings of the Conference on Design, Automation and Test in Europe, 2010, pp. 405–410.

[Fel93] J. Felsenstein, PHYLIP: phylogenetic inference package, version 3.5c., Seattle, 1993.

[Fin10] Michael Fingeroff, High-Level Synthesis Blue Book, Xlibris Corporation, 2010.

[FM89] G. Fettweis and H. Meyr, Parallel Viterbi algorithm implementation: breaking the ACS-bottleneck, Communications, IEEE Transactions on 37 (1989), no. 8, 785–790.

[FM91] G. Fettweis and H. Meyr, High-speed parallel Viterbi decoding: algorithm and VLSI-architecture, Communications Magazine, IEEE 29 (1991), no. 5, 46–55.

[FTM90] G. Fettweis, L. Thiele, and G. Meyr, Algorithm Transformations for Unlimited Parallelism, Circuits and Systems, 1990, IEEE International Symposium on, May 1990, pp. 1756–1759, vol. 3.

[Gaj92] Daniel D. Gajski, High-level Synthesis: Introduction to Chip and System Design, Kluwer Academic, 1992.

[Gal] NIGMS Image Gallery, Central Dogma of Life, http://images.nigms.nih.gov/.

[GCBT10] Narayan Ganesan, Roger D. Chamberlain, Jeremy Buhler, and Michela Taufer, Accelerating HMMER on GPUs by implementing hybrid data and task parallelism, Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology (New York, NY, USA), BCB '10, ACM, 2010, pp. 418–421.

[GDGN03] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations, VLSI Design, 2003, Proceedings, 16th International Conference on, Jan. 2003, pp. 461–466.

[GGDN04] Sumit Gupta, Rajesh Gupta, Nikil D. Dutt, and Alexandru Nicolau, SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits, Springer, 2004.

[GJL97] Pascale Guerdoux-Jamet and Dominique Lavenier, Samba: Hardware accelerator for biological sequence comparison, Computer Application in the Biosciences 13 (1997), no. 6, 609–615.

[GLB+08] Andreas R. Gruber, Ronny Lorenz, Stephan H. Bernhart, Richard Neuböck, and Ivo L. Hofacker, The Vienna RNA Websuite, Nucleic Acids Research 36 (2008), no. suppl 2, W70–W74.

[GOST92] Guang R. Gao, R. Olsen, Vivek Sarkar, and Radhika Thekkath, Collective Loop Fusion for Array Contraction, Languages and Compilers for Parallel Computing, 1992, pp. 281–295.

[GP92] M. Girkar and C.D. Polychronopoulos, Automatic Extraction of Functional Parallelism from Ordinary Programs, Parallel and Distributed Systems, IEEE Transactions on 3 (1992), no. 2, 166–178.

[GQRM00] A.-C. Guillou, P. Quinton, T. Risset, and D. Massicotte, Automatic design of VLSI pipelined LMS architectures, Parallel Computing in Electrical Engineering, 2000, PARELEC 2000, Proceedings, International Conference on, 2000, pp. 144–149.

[GR06] Gautam and S. Rajopadhye, Simplifying reductions, POPL ’06: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (New York, NY, USA), ACM, 2006, pp. 30–41.

[HAA11] Laiq Hasan and Zaid Al-Ars, An Overview of Hardware-Based Acceleration of Biological Sequence Alignment, Computational Biology and Applied Bioinformatics, InTech, 2011.

[Har03] D. Harris, A Taxonomy of Parallel Prefix Networks, Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov. 2003, pp. 2213–2217.

[HC87] T. Han and D. A. Carlson, Fast area-efficient VLSI adders, Proceedings of the 8th Symposium on Computer Arithmetic, IEEE, 1987, pp. 49–55.

[HCLH90] Chu-Yi Huang, Yen-Shen Chen, Yan-Long Lin, and Yu-Chin Hsu, Data Path Allocation based on Bipartite Weighted Matching, Design Automation Conference, 1990, Proceedings, 27th ACM/IEEE, June 1990, pp. 499–504.

[HHH05] D. R. Horn, M. Houston, and P. Hanrahan, ClawHMMER: A Streaming HMMer-Search Implementation, SC'05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005.

[HKMS93] D. Haussler, A. Krogh, I.S. Mian, and K. Sjolander, Protein Modeling using Hidden Markov Models: Analysis of Globins, Proceedings of the Twenty-Sixth Hawaii International Conference on System Sciences, vol. 1, Jan. 1993, pp. 792–802.

[HLH91] C.-T. Hwang, J.-H. Lee, and Y.-C. Hsu, A Formal Approach to the Scheduling Problem in High Level Synthesis, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on 10 (1991), no. 4, 464–475.

[HLSa] Impulse Accelerated Technologies, Impulse C, http://www.impulsec.com.

[HLSb] Mentor Graphics, Catapult C, http://www.mentor.com/catapult.

[HLS09a] Gary Smith EDA, Market Trends, 2009.

[HLS09b] NIOS II C2H Compiler User Guide, November 2009, http://www.altera.com.

[HMS+07] Martin C. Herbordt, Josh Model, Bharat Sukhwani, Yongfeng Gu, and Tom VanCourt, Single pass streaming BLAST on FPGAs, Parallel Computing 33 (2007), 741–756 (special issue on High-Performance Computing Using Accelerators).

[Hof03] Ivo L. Hofacker, Vienna RNA secondary structure server, Nucleic Acids Research 31 (2003), no. 13, 3429–3431.

[HPH98] Steven Henikoff, Shmuel Pietrokovski, and Jorja G. Henikoff, Superior Performance in Protein Homology Detection with the Blocks Database servers, Nucleic Acids Research 26 (1998), no. 1, 309–312.

[Hug96] Richard Hughey, Parallel Hardware for Sequence Comparison and Alignment, Computer applications in the biosciences: CABIOS 12 (1996), no. 6, 473–479.

[INT] UniProtKB/Swiss-Prot Protein Knowledgebase Release 2012-01 Statistics, http://web.expasy.org/docs/relnotes/relstat.html.

[Ive62] Kenneth E. Iverson, A Programming Language, John Wiley & Sons, 1962.

[JBC08] A. Jacob, J. Buhler, and R.D. Chamberlain, Accelerating Nussinov RNA secondary structure prediction with systolic arrays on FPGAs, Application-Specific Systems, Architectures and Processors, 2008, ASAP 2008, International Conference on, July 2008, pp. 191–196.

[JBC10] A.C. Jacob, J.D. Buhler, and R.D. Chamberlain, Rapid RNA Folding: Analysis and Acceleration of the Zuker Recurrence, Field-Programmable Custom Computing Machines (FCCM), 2010 18th IEEE Annual International Symposium on, May 2010, pp. 87–94.

[JH98] Tommi Jaakkola and David Haussler, Exploiting Generative Models in Discriminative Classifiers, Advances in Neural Information Processing Systems 11, MIT Press, 1998, pp. 487–493.

[JLBC07] Arpith C. Jacob, Joseph M. Lancaster, Jeremy D. Buhler, and Roger D. Chamberlain, Preliminary results in accelerating profile HMM search on FPGAs, Workshop on High Performance Computational Biology (HiCOMB), 2007.

[JP04] Neil C. Jones and Pavel A. Pevzner, An Introduction to Bioinformatics Algorithms (Computational Molecular Biology), The MIT Press, August 2004.

[KAH97] P. Kollig and B.M. Al-Hashimi, Simultaneous Scheduling, Allocation and Binding in High Level Synthesis, Electronics Letters 33 (1997), no. 18, 1516–1518.

[KBM+94] A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler, Hidden Markov Models in Computational Biology: Applications to Protein Modeling, Journal of Molecular Biology 235 (1994), 1501–1531.

[KH03] Bjarne Knudsen and Jotun Hein, Pfold: RNA secondary structure prediction using stochastic context-free grammars, Nucleic Acids Research 31 (2003), no. 13, 3423–3428.

[KL07] Taemin Kim and Xun Liu, Compatibility Path Based Binding Algorithm for Interconnect Reduction in High Level Synthesis, Computer-Aided Design, 2007, ICCAD 2007, IEEE/ACM International Conference on, Nov. 2007, pp. 435–441.

[KM94] Ken Kennedy and Kathryn S. McKinley, Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, Languages and Compilers for Parallel Computing, Springer-Verlag, 1994, pp. 301–320.

[Kno99] S. Knowles, A Family of Adders, ARITH ’99: Proceedings of the 14th IEEE Symposium on Computer Arithmetic (Washington, DC, USA), IEEE Computer Society, 1999, p. 30.

[KR07] I. Kuon and J. Rose, Measuring the Gap Between FPGAs and ASICs, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on 26 (2007), no. 2, 203–215.

[KS73] Peter M. Kogge and Harold S. Stone, A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations, IEEE Transcation on Computers 22 (1973), no. 8, 786–793.

[Kuc03] Krzysztof Kuchcinski, Constraints-driven Scheduling and Resource Assign- ment, ACM Trans. Des. Autom. Electron. Syst. 8 (2003), 355–383.

[Lam88] M. Lam, Software Pipelining: An Effective Scheduling Technique for VLIW Machines, Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation (New York, NY, USA), PLDI ’88, ACM, 1988, pp. 318–328.

[LBP+08] Heshan Lin, Pavan Balaji, Ruth Poole, Carlos Sosa, Xiaosong Ma, and Wu-chun Feng, Massively Parallel Genomic Sequence Search on the Blue Gene/P Architecture, Proceedings of the 2008 ACM/IEEE conference on Supercomputing (Piscataway, NJ, USA), SC ’08, IEEE Press, 2008, pp. 33:1–33:11.

[LF80] Richard E. Ladner and Michael J. Fischer, Parallel Prefix Computation, Journal of ACM 27 (1980), no. 4, 831–838.

[LHL89] Jiahn-Hung Lee, Yu-Chin Hsu, and Youn-Long Lin, A New Integer Linear Programming Formulation for the Scheduling Problem in Data Path Synthesis, Computer-Aided Design, 1989, ICCAD-89, Digest of Technical Papers, 1989 IEEE International Conference on, Nov. 1989, pp. 20–23.

[LKC+10] Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey, Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, Proceedings of the 37th annual international symposium on Computer architecture, ISCA ’10, 2010, pp. 451–460.

[LL85] R.J. Lipton and D. Lopresti, A systolic array for rapid string comparison, Chapel Hill Conference on VLSI, 1985, pp. 363–376.

[LMD94] Birger Landwehr, Peter Marwedel, and Rainer Dömer, OSCAR: Optimum Simultaneous Scheduling, Allocation and Resource Binding Based on Integer Programming, Proceedings of the conference on European design automation (Los Alamitos, CA, USA), EURO-DAC '94, IEEE Computer Society Press, 1994, pp. 90–95.

[LPAF08] Jizhu Lu, M. Perrone, K. Albayraktaroglu, and M. Franklin, HMMer-Cell: High Performance Protein Profile Searching on the Cell/B.E. Processor, IEEE International Symposium on Performance Analysis of Systems and Software, April 2008, pp. 223–232.

[LS91] Charles Leiserson and James Saxe, Retiming synchronous circuitry, Algorithmica 6 (1991), 5–35, doi:10.1007/BF01759032.

[LSK+05] O. Lehtoranta, E. Salminen, A. Kulmala, M. Hannikainen, and T.D. Hamalainen, A Parallel MPEG-4 Encoder for FPGA Based Multiprocessor SoC, Field Programmable Logic and Applications, 2005, International Conference on, Aug. 2005, pp. 380–385.

[LST] Isaac TS Li, Warren Shum, and Kevin Truong, 160-fold Acceleration of the Smith-Waterman Algorithm using a Field Programmable Gate Array (FPGA).

[LVMQ91] Hervé Le Verge, Christophe Mauras, and Patrice Quinton, The ALPHA language and its use for the design of systolic arrays, J. VLSI Signal Process. Syst. 3 (1991), 173–182.

[MA97] Naraig Manjikian and Tarek S. Abdelrahman, Fusion of Loops for Parallelism and Locality, IEEE Trans. Parallel Distrib. Syst. 8 (1997), 193–209.

[MBC+06] R. P. Maddimsetty, J. Buhler, R. D. Chamberlain, M. A. Franklin, and Brandon Harris, Accelerator Design for Protein Sequence HMM Search, Proceedings of the ACM International Conference on Supercomputing (Cairns, Australia), ACM, 2006.

[MC95] E. Musoll and J. Cortadella, High-Level Synthesis Techniques for Reducing the Activity of Functional Units, Proceedings of the 1995 international symposium on Low power design (New York, NY, USA), ISLPED ’95, ACM, 1995, pp. 99–104.

[MDQ11] A. Morvan, S. Derrien, and P. Quinton, Efficient Nested Loop Pipelining in High Level Synthesis using Polyhedral Bubble Insertion, Field-Programmable Technology (FPT), 2011 International Conference on, Dec. 2011, pp. 1–10.

[MFDW98] B Morgenstern, K Frech, A Dress, and T Werner, DIALIGN: finding local similarities by multiple sequence alignment, Bioinformatics 14 (1998), no. 3, 290–294.

[MKFD11] Antoine Morvan, Amit Kumar, Antoine Floch, and Steven Derrien, GeCoS: a source to source optimizing compiler for the automatic synthesis of parallel hardware accelerators, Poster in Designing for Embedded Parallel Computing Platforms at DATE, March 2011.

[MoH05] John Morrison, Padraig o'Dowd, and Philip Healy, An Investigation of the Applicability of Distributed FPGAs to High-Performance Computing, High-performance computing: paradigm and infrastructure, Wiley, 2005.

[Moo66] R.E. Moore, Interval Analysis, Prentice-Hall, Englewood Cliffs N. J., 1966.

[Mou04] David W. Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, 2004.

[MPPZ87] Daniel J. Magenheimer, Liz Peters, Karl Pettis, and Dan Zuras, Integer Multiplication and Division on the HP Precision Architecture, Proceedings of the second international conference on Architectural support for programming languages and operating systems (Los Alamitos, CA, USA), ASPLOS-II, IEEE Computer Society Press, 1987, pp. 90–99.

[MS09] G. Martin and G. Smith, High-Level Synthesis: Past, Present, and Future, Design Test of Computers, IEEE 26 (2009), no. 4, 18–25.

[MTF08] A. Matsunaga, M. Tsugawa, and J. Fortes, CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications, eScience, 2008, eScience '08, IEEE Fourth International Conference on, Dec. 2008, pp. 222–229.

[MV08] Svetlin Manavski and Giorgio Valle, CUDA Compatible GPU cards as Efficient Hardware Accelerators for Smith-Waterman Sequence Alignment, BMC Bioinformatics 9 (2008), no. Suppl 2, S10.

[NCB11] NCBI, GenBank Release Notes, August 2011.

[NLLL97] Andrew F. Neuwald, Jun S. Liu, David J. Lipman, and Charles E. Lawrence, Extracting Protein Alignment Models from the Sequence Database, Nucleic Acids Research 25 (1997), no. 9, 1665–1677.

[NPGK78] Ruth Nussinov, George Pieczenik, Jerrold R. Griggs, and Daniel J. Kleitman, Algorithms for Loop Matchings, SIAM Journal on Applied Mathematics 35 (1978), no. 1, 68–82.

[NW70] Saul B. Needleman and Christian D. Wunsch, A General Method Applicable to The Search for Similarities in The Amino Acid Sequence of Two Proteins, Journal of Molecular Biology 48 (1970), no. 3, 443–453.

[oEGP08] U.S. Department of Energy Genome Programs, The Science Behind the Human Genome Project, http://genomics.energy.gov, 2008.

[OSJM06] T. Oliver, B. Schmidt, Y. Jakop, and D. L. Maskell, Accelerating the Viterbi Algorithm for Profile Hidden Markov Models Using Reconfigurable Hardware, International Conference on Computational Science, 2006.

[OYS07] T. Oliver, L. Y. Yeow, and B. Schmidt, High Performance Database Searching with HMMer on FPGAs, HiCOMB 2007, Sixth IEEE International Workshop on High Performance Computational Biology, March 2007.

[PK87] P. G. Paulin and J. P. Knight, Force-Directed Scheduling in Automatic Data Path Synthesis, Proceedings of the 24th ACM/IEEE Design Automation Conference (New York, NY, USA), DAC ’87, ACM, 1987, pp. 195–202.

[PK89] P.G. Paulin and J.P. Knight, Force-Directed Scheduling for The Behavioral Synthesis of ASICs, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on 8 (1989), no. 6, 661–679.

[Ple10] Alexandre Plesco, Program Transformation and Memory Architecture Optimization for High-Level Synthesis of Hardware Accelerators, Ph.D. thesis, École Normale Supérieure de Lyon, 2010.

[PT05] David Pellerin and Scott Thibault, Practical FPGA Programming in C, Prentice Hall, 2005.

[QEB+09] Xiaohong Qiu, Jaliya Ekanayake, Scott Beason, Thilina Gunarathne, Geoffrey Fox, Roger Barga, and Dennis Gannon, Cloud Technologies for Bioinformatics Applications, Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers (New York, NY, USA), MTAGS '09, ACM, 2009, pp. 6:1–6:10.

[QR91] Patrice Quinton and Yves Robert, Systolic Algorithms and Architectures, Prentice Hall, 1991.

[RFJ95] Minjoong Rim, Yaw Fann, and Rajiv Jain, Global Scheduling with Code-Motions for High-Level Synthesis Applications, Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 3 (1995), no. 3, 379–392.

[rg] CAIRN-INRIA research group, GeCoS: The Generic Compiler Suite, http://gecos.gforge.inria.fr/.

[Ric] Allen B. Richon, A Short History of Bioinformatics, Network Science Corporation, http://www.netsci.org/Science/Bioinform/feature06.html.

[RJ86] L. Rabiner and B. Juang, An Introduction to Hidden Markov Models, IEEE ASSP Magazine 3 (1986), no. 1, 4–16.

[RJDL92] M. Rim, R. Jain, and R. De Leone, Optimal Allocation and Binding in High-Level Synthesis, Proceedings of the 29th ACM/IEEE Design Automation Conference (Los Alamitos, CA, USA), DAC '92, IEEE Computer Society Press, 1992, pp. 120–123.

[RP08] Tanguy Risset and Alexandru Plesco, Coupling Loop Transformations and High-Level Synthesis, Symposium en Architecture de machines (Sympa 2008) (Fribourg, Switzerland), 2008.

[SBA00] Mark Stephenson, Jonathan Babb, and Saman Amarasinghe, Bitwidth Analysis with Application to Silicon Compilation, Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, 2000, pp. 108–120.

[Sch08] Sophie Schneiderbauer, RNA Secondary Structure Prediction, Bachelor's Thesis, Cognitive Science, University of Osnabrück, 2008.

[SD02] A.M. Sllame and V. Drabek, An Efficient List-Based Scheduling Algorithm for High-Level Synthesis, Proceedings of the Euromicro Symposium on Digital System Design, 2002, pp. 316–323.

[SDLS11] S. Sinha, U. Dhawan, Siew Kei Lam, and T. Srikanthan, A Novel Binding Algorithm to Reduce Critical Path Delay During High Level Synthesis, VLSI (ISVLSI), 2011 IEEE Computer Society Annual Symposium on, July 2011, pp. 278–283.

[SEQa] BIOWIC: Bioinformatics Workflow for Intensive Computation, http://biowic.inria.fr/.

[SEQb] Longman Science Biology 9, Pearson Education India.

[SEQc] Review of the Universe, Unicellular Organisms, http://universe-review.ca.

[SG91] Vivek Sarkar and Guang R. Gao, Optimization of Array Accesses by Collective Loop Transformations, ICS '91, 1991, pp. 194–205.

[SKD06] E. Sotiriades, C. Kozanitis, and A. Dollas, FPGA based Architecture for DNA Sequence Comparison and Database Search, Parallel and Distributed Processing Symposium, 2006, IPDPS 2006, 20th International, April 2006, 8 pp.

[Skl60] J. Sklansky, Conditional-Sum Addition Logic, IRE Transactions on Electronic Computers EC-9 (1960), no. 2, 226–231.

[SLG+09] Yanteng Sun, Peng Li, Guochang Gu, Yuan Wen, Yuan Liu, and Dong Liu, HMMER Acceleration Using Systolic Array Based Reconfigurable Architecture, IEEE International Workshop on High Performance Computational Biology, 2009.

[SRG03] Robert D. Stevens, Alan J. Robinson, and Carole A. Goble, myGrid: Personalised Bioinformatics on the Information Grid, Bioinformatics 19 (2003), no. suppl 1, i302–i304.

[SRV+07] Lieven Sterck, Stephane Rombauts, Klaas Vandepoele, Pierre Rouzé, and Yves Van de Peer, How many genes are there in plants (... and why are they there)?, Current Opinion in Plant Biology (2007), 199–203.

[SW81] T.F. Smith and M.S. Waterman, Identification of Common Molecular Subsequences, Journal of Molecular Biology 147 (1981), no. 1, 195–197.

[SXWL04] Yonghong Song, Rong Xu, Cheng Wang, and Zhiyuan Li, Improving Data Locality by Array Contraction, IEEE Trans. Comput. 53 (2004), 1073–1084.

[Tha09] Sabu M. Thampi, Bioinformatics, 2009, http://arxiv.org/ftp/arxiv/papers/0911/0911.4230.pdf.

[THG94] Julie D. Thompson, Desmond G. Higgins, and Toby J. Gibson, CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment Through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice, Nucleic Acids Research 22 (1994), no. 22, 4673–4680.

[TM99] Tatiana A. Tatusova and Thomas L. Madden, BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences, FEMS Microbiology Letters 174 (1999), no. 2, 247–250.

[TM09] Toyokazu Takagi and Tsutomu Maruyama, Accelerating HMMER Search Using FPGA, International Conference on Field Programmable Logic and Applications, September 2009.

[TS86] Chia-Jeng Tseng and D.P. Siewiorek, Automated Synthesis of Data Paths in Digital Systems, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on 5 (1986), no. 3, 379–395.

[VAM+01] J. Craig Venter, Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, et al., The Sequence of the Human Genome, Science 291 (2001), no. 5507, 1304–1351.

[VS11] Panagiotis D. Vouzis and Nikolaos V. Sahinidis, GPU-BLAST: Using Graphics Processors to Accelerate Protein Sequence Alignment, Bioinformatics 27 (2011), no. 2, 182–188.

[Wak04] Kazutoshi Wakabayashi, C-based Behavioral Synthesis and Verification Analysis on Industrial Design Examples, Proceedings of the 2004 Asia and South Pacific Design Automation Conference (Piscataway, NJ, USA), ASP-DAC '04, IEEE Press, 2004, pp. 344–348.

[WBC05] Ben Wun, Jeremy Buhler, and Patrick Crowley, Exploiting Coarse-Grained Parallelism to Accelerate Protein Motif Finding with a Network Processor, PACT ’05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, 2005.

[WBKC09a] John Paul Walters, Vidyananth Balu, Suryaprakash Kompalli, and Vipin Chaudhary, Evaluating the use of GPUs in Liver Image Segmentation and HMMER Database Searches, IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (Washington, DC, USA), IEEE Computer Society, 2009, pp. 1–12.

[WBKC09b] J.P. Walters, V. Balu, S. Kompalli, and V. Chaudhary, Evaluating the use of GPUs in Liver Image Segmentation and HMMER Database Searches, IEEE International Symposium on Parallel and Distributed Processing (IPDPS '09), 2009.

[WGK08] Gang Wang, Wenrui Gong, and Ryan Kastner, Operation Scheduling: Algorithms and Applications, High-Level Synthesis (Philippe Coussy and Adam Morawiec, eds.), Springer Netherlands, 2008, pp. 231–255.

[Wil03] Steve Wiley, DNA to RNA to Protein, Tutorials - Biology for the Novice, 2003.

[Wol90] Michael Joseph Wolfe, Optimizing Supercompilers for Supercomputers, MIT Press, Cambridge, MA, USA, 1990.

[Wol96] M.J. Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, 1996.

[WP84] Michael S. Waterman and Marcela Perlwitz, Line Geometries for Sequence Comparisons, Bulletin of Mathematical Biology 46 (1984), 567–577, doi:10.1007/BF02459504.

[WV08] N.A. Woods and T. VanCourt, FPGA Acceleration of Quasi-Monte Carlo in Finance, Field Programmable Logic and Applications, 2008, FPL 2008, International Conference on, Sept. 2008, pp. 335–340.

[YHK09] Chao-Tung Yang, Tsu-Fen Han, and Heng-Chuan Kan, G-BLAST: a Grid-based Solution for mpiBLAST on Computational Grids, Concurrency and Computation: Practice and Experience 21 (2009), no. 2, 225–255.

[ZLH+05] G.L. Zhang, P.H.W. Leong, C.H. Ho, K.H. Tsoi, C.C.C. Cheung, D.-U. Lee, R.C.C. Cheung, and W. Luk, Reconfigurable Acceleration for Monte Carlo based Financial Simulation, Field-Programmable Technology, 2005, Proceedings of the 2005 IEEE International Conference on, Dec. 2005, pp. 215–222.

[ZS81] Michael Zuker and P. Stiegler, Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information, Nucleic Acids Research 9 (1981), no. 1, 133–148.

[ZS84] Michael Zuker and David Sankoff, RNA Secondary Structures and Their Prediction, Bulletin of Mathematical Biology 46 (1984), 591–621, doi:10.1007/BF02459506.

[Zuk03] Michael Zuker, Mfold web server for Nucleic Acid Folding and Hybridization Prediction, Nucleic Acids Research 31 (2003), no. 13, 3406–3415.

[ZZZ+10] Jiyu Zhang, Zhiru Zhang, Sheng Zhou, Mingxing Tan, Xianhua Liu, Xu Cheng, and Jason Cong, Bit-level Optimization for High-level Synthesis and FPGA-based Acceleration, Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays (New York, NY, USA), FPGA '10, ACM, 2010, pp. 59–68.