
A Hybrid-parallel Architecture for Applications in Bioinformatics

M.Sc. Jan Christian Kässens

Dissertation submitted in 2017 to the Faculty of Engineering of Christian-Albrechts-Universität zu Kiel in partial fulfillment of the requirements for the academic degree Doktor der Ingenieurwissenschaften (Dr.-Ing.)

Kiel Computer Science Series (KCSS) 2017/4, dated 2017-11-08
URN:NBN urn:nbn:de:gbv:8:1-zs-00000335-a3
ISSN 2193-6781 (print version)
ISSN 2194-6639 (electronic version)

Electronic version, updates, errata available via https://www.informatik.uni-kiel.de/kcss

The author can be contacted via [email protected]

Published by the Department of Computer Science, Kiel University
Computer Engineering Group

Please cite as:
Ź Jan Christian Kässens. A Hybrid-parallel Architecture for Applications in Bioinformatics. Number 2017/4 in Kiel Computer Science Series. Department of Computer Science, 2017. Dissertation, Faculty of Engineering, Kiel University.

@book{Kaessens17,
  author    = {Jan Christian K\"assens},
  title     = {A Hybrid-parallel Architecture for Applications in Bioinformatics},
  publisher = {Department of Computer Science, CAU Kiel},
  year      = {2017},
  number    = {2017/4},
  doi       = {10.21941/kcss/2017/4},
  series    = {Kiel Computer Science Series},
  note      = {Dissertation, Faculty of Engineering, Kiel University.}
}

© 2017 by Jan Christian Kässens

About this Series

The Kiel Computer Science Series (KCSS) covers dissertations, habilitation theses, lecture notes, textbooks, surveys, collections, handbooks, etc. written at the Department of Computer Science at Kiel University. It was initiated in 2011 to support authors in the dissemination of their work in electronic and printed form, without restricting their rights to their work. The series provides a unified appearance and aims at high-quality typography. The KCSS is an open access series; all series titles are electronically available free of charge at the department’s website. In addition, authors are encouraged to make printed copies available at a reasonable price, typically with a print-on-demand service. Please visit http://www.informatik.uni-kiel.de/kcss for more information, for instructions on how to publish in the KCSS, and for access to all existing publications.

First examiner: Prof. Dr. Manfred Schimmler, Christian-Albrechts-Universität Kiel

Second examiner: Prof. Dr. Bertil Schmidt, Johannes-Gutenberg-Universität Mainz

Date of the oral examination: 10 October 2017

Zusammenfassung

Since the introduction of Next Generation Sequencing (NGS) technology, the amount of data produced by genome sequencing has been growing particularly fast. The availability of these data in turn opens up new fields in molecular and cell biology as well as genetics, which again produce new data. On the other hand, the computing power available in data centers only grows linearly. In recent years, however, new special-purpose hardware has been finding its way into data centers, in particular graphics processing units (GPUs) and, less commonly, FPGAs (field-programmable gate arrays). Driven by the demand for performance, developers began porting standard software to these new systems in order to exploit their special capabilities. Systems that use GPUs and FPGAs together, however, are rarely found. Particular challenges are, on the one hand, the required in-depth expertise in two diametrically different programming paradigms and, on the other hand, the engineering knowledge needed to handle the communication bottleneck between the participating devices that is typical for heterogeneous systems. For this work, two algorithms from bioinformatics were selected for implementation on a new, hybrid-parallel computer architecture and software platform that exploits the advantages of GPUs, FPGAs and CPUs in a particularly efficient way. It is shown that such a development is not only possible but also exceeds the computing power of homogeneous FPGA or GPU systems of comparable size while requiring less energy. Both methods are used to evaluate case/control data from association studies and to analyze them for interactions between two or three genes. Especially for the latter, the newly gained computing power makes it possible for the first time to examine larger amounts of data without occupying entire data centers for weeks. The success of the architecture finally led to the development of a high-performance computer based on the presented concept.


Abstract

Since the advent of Next Generation Sequencing (NGS) technology, the amount of data from whole genome sequencing has been rising fast. In turn, the availability of these resources led to the tapping of whole new research fields in molecular and cellular biology, producing even more data. On the other hand, the available computational power is only increasing linearly. In recent years though, special-purpose high-performance devices have started to become prevalent in today’s scientific data centers, namely graphics processing units (GPUs) and, to a lesser extent, field-programmable gate arrays (FPGAs). Driven by the need for performance, developers started porting regular applications to GPU frameworks and FPGA configurations to exploit the special operations only these devices may perform in a timely manner. However, applications using both accelerator technologies are still rare. Major challenges in joint GPU/FPGA application development include the required deep knowledge of the associated programming paradigms and the efficient communication between both types of devices. In this work, two algorithms from bioinformatics are implemented on a custom hybrid-parallel hardware architecture and a highly concurrent software platform. It is shown that such a solution is not only possible to develop but that it also outperforms implementations on similar-sized GPU or FPGA clusters in terms of both performance and energy consumption. Both algorithms analyze case/control data from genome-wide association studies to find interactions between two or three genes with different methods. Especially in the latter case, the newly available calculation power and method enable analyses of large data sets for the first time without occupying whole data centers for weeks. The success of the hybrid-parallel architecture proposal led to the development of a high-end array of FPGA/GPU accelerator pairs to provide even better runtimes and more possibilities.

Acknowledgments

I would not have been able to complete this work without the guidance and support of a lot of people, and I would like to take this opportunity to express my gratitude. Probably the most important person during my academic career was my supervisor, mentor and leader of the Computer Engineering Group, Manfred Schimmler. From my second semester on, I regularly visited his tutoring lessons and got to know an enthusiastic lecturer who never failed to motivate me, and he obviously also didn’t fail this time. Besides his deep knowledge and his readiness to provide funds for prototype systems of uncertain value, this is probably the quality of the highest importance to me; thank you. On the administrative side, I wish to thank Brigitte Scheidemann, who did what can best be summarized as “always knowing what to do, even if or especially when nobody else did.” I further wish to thank Bertil Schmidt for his continued guidance in publishing articles and fruitful discussions about new technology, and Andre Franke and David Ellinghaus for supporting my research even though I should probably have done other things in my working time. Every scientist in academia is only as good as his team, and this is especially the case with me. Thank you for being my colleagues, Vasco Grossmann, Sven Koschnicke and Christoph Starke, for listening to my sometimes confused ideas and providing me with chocolate and sweets of all kinds. Special thanks go to my tutor, mentor, roommate and true friend (in temporal order) Lars Wienbrandt. A rather outstanding property of having him as a colleague and friend is the ability to discuss problems, both personal and professional, in a unique, in-depth, sometimes emotional and sometimes rough way, with him always being open and forgiving. Besides our relationship, he was the one who pulled me into working with FPGAs and eventually hybrid architectures. Thanks for what you have done so far. On the non-academic side, I would like to express my gratitude to my family who always covered my back, especially to my wife Patricia and my daughter Freya, who had to spend an awful lot of time without me, even when I was technically present, but never complained and always bore with me when things didn’t go well. I will never forget that.

Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Structure of This Work

2 Architecture Essentials
  2.1 Overview
    2.1.1 Central Processing Units (CPUs)
    2.1.2 Graphics Processing Units (GPUs)
    2.1.3 Field Programmable Gate Arrays (FPGAs)
    2.1.4 Comparison and Suitability Assessment
  2.2 Data Transfer and Communication
    2.2.1 Direct Memory Access
    2.2.2 PCI Express
  2.3 Parallel Programming
    2.3.1 Parallelism and Machine Models
    2.3.2 Systolic Arrays

3 Applications
  3.1 A Primer on Bioinformatics
    3.1.1 DNA and Chromosomes
    3.1.2 Genes
    3.1.3 Single Nucleotide Polymorphisms (SNPs)
    3.1.4 Genome-wide Association Studies (GWAS)
    3.1.5 Epistasis
    3.1.6 Acquiring Data
  3.2 Exhaustive Interaction Search
    3.2.1 Contingency Tables
    3.2.2 Second Order Interaction Measures


    3.2.3 Third Order Interaction Measures

4 The Hybrid Architecture Prototype
  4.1 Overview
    4.1.1 The Problem Statement
    4.1.2 Problem Partitioning
    4.1.3 System Set-up
    4.1.4 Software
  4.2 FPGA Configuration
    4.2.1 Data Flow and Structural Overview
    4.2.2 Initialization
    4.2.3 Processing Element Arrays
    4.2.4 Transmission Unit
  4.3 GPU Kernels
    4.3.1 Programming Model and Data Distribution
    4.3.2 Counter Reconstruction
    4.3.3 BOOST Measure Calculation
    4.3.4 Mutual Information Calculation
    4.3.5 Result Reporting
  4.4 Host Drivers
    4.4.1 Interfacing
    4.4.2 Data Transfer
    4.4.3 Application Programming Interface
  4.5 Host Application
    4.5.1 Overview
    4.5.2 Input Data Files and Conversion
    4.5.3 Memory Management
    4.5.4 FPGA Initialization
    4.5.5 Data Movement
    4.5.6 Result Collection and Processing

5 Evaluation
  5.1 Performance Metrics
    5.1.1 Measures and their Inaccuracies
  5.2 Architecture


    5.2.1 Throughput
    5.2.2 Runtimes
    5.2.3 Energy Consumption
  5.3 Competing Systems
    5.3.1 Second Order Interaction
    5.3.2 Third Order Interaction

6 Conclusion
  6.1 Summary of Results
  6.2 Future Work
    6.2.1 Direct Memory Access
    6.2.2 Mutual Information Deficiencies
    6.2.3 Scaling Prototypes
    6.2.4 Other Applications

A Command-Line Options
  A.1 adboost Help Output
  A.2 ad3way Help Output

B Curriculum Vitae

Bibliography


List of Figures

1.1 Human genome sequencing costs in USD [Wet16]

2.1 Intel Xeon Phi
2.2 Virtual memory translation
2.3 home entertainment system [Ken01]
2.4 NVIDIA GP100 processing hierarchy
2.5 Streaming multiprocessor overview
2.6 GPU overview
2.7 Full adder in elementary logic
2.8 Simplified CLB schematic
2.9 Block RAM device
2.10 example

3.1 DNA splice
3.2 Set of 23 chromosomes from a human male
3.3 Fluorescent emissions from a microarray
3.4 An unnormalized contingency table
3.5 Contingency tables for cases and controls
3.6 Venn diagram for mutual information and entropy

4.1 General system overview
4.2 Prototype overview
4.3 Data flow schematic
4.4 Constant initialization
4.5 Systolic processing element array
4.6 Table transmission format
4.7 A reduced 2 × 3 × 3 contingency table
4.8 3D contingency tables for cases and controls
4.9 KSA Approximation


4.10 GPU result formats
4.11 KC705 API threading diagram
4.12 Application data flow schematic
4.13 Binary PLINK example
4.14 SNPDB file format header
4.15 Distribution of SNPs in second-order interaction for four FPGAs
4.16 Right isosceles prism
4.17 Third order test result structure with single precision score
4.18 Min-max heap
4.19 Array representation of min-max heap

5.1 FPGA/Host link speeds
5.2 FPGA/Host link saturation
5.3 Min-max heap result insertions
5.4 FPGA generation speeds vs. GPU processing speed (third order)
5.5 FPGA generation speeds vs. GPU processing speed (second order)
5.6 Third order runtime graph

6.1 PCI Express lane layout in a multi- system

List of Tables

2.1 CPU technology timeline
2.2 Power dissipation of CPUs
2.3 One full adder truth table
2.4 Platform suitability overview
2.5 Data rates per lane by PCI Express generation

4.1 Prototype configuration: CPU and GPU
4.2 Prototype configuration: FPGAs
4.3 Genotype encoding
4.4 Table generation rates at 200 MHz and 5 000 samples
4.5 KC705 driver entry points

5.1 FPGA configuration characteristics
5.2 FPGA table generation and GPU processing speed comparison
5.3 Power consumption of hybrid computer components
5.4 BOOST comparison
5.5 Comparison of different two-way methods
5.6 Three-way interaction comparison


Chapter 1

Introduction

Regular desktop computers have been heterogeneous systems since the advent of integrated processors. Every computer chip within such a system has a clearly defined responsibility. For example, a microcontroller processes readings from fan speed and temperature sensors and moves them to a dedicated system management bus controller (SMBus). From there, they are moved to the central processor for evaluation. If an application decides that a temperature might be too high, a command is issued to the SMBus controller, which in turn instructs the sensor processing controller to raise the fan speed. Additionally, this change in temperature is recorded to the hard disk by first instructing the hard disk bus controllers to move data to the hard disk’s integrated firmware controller, which writes the data onto the physical platters through a multitude of intermediate devices.

Most parts of this ecosystem are integrated and encapsulated in a way that the user rarely even knows about their existence. Microcontrollers, bus controllers and other small-scale processors are usually dedicated to certain purposes and have just enough computational power to handle their workloads on time. Over time, accelerators were added to perform certain computations that could have been done on the CPU but that the accelerator is specifically designed to perform. The purpose of a CPU is to do all kinds of work, but specializations usually outperform generalizations on their respective operations. However, accelerators do not automatically speed up applications. Instead, these applications have to be specifically designed to exploit the available features.

Certainly the most prominent accelerator is the graphics processing unit (GPU). Originally, GPUs were introduced to help computer-aided design (CAD) software and games calculate transformations in affine and Euclidean 2D and 3D spaces and draw a projection of the vector space onto a rasterized display, such as a computer screen. GPU functions were soon enriched with texture filtering, lighting through radiosity, ray tracing and other complex processes in fixed pipelines. After more and more functions had been added, the fixed-function pipeline became too static and vendors chose to expose the GPU instruction set to the system; thus the concept of general-purpose graphics processing units (GPGPU) was born. Scientists started to move climate prediction models, protein folding methods and genome analyses to GPUs to exploit the vectored and parallelized instruction sets and achieved speed-ups that can reasonably be measured in orders of magnitude.

1.1 Motivation

Although GPUs open the world of high-performance computing to regular computers, their specialization still lies in graphics. The functions mentioned above typically require low-precision floating-point calculations in dot products, matrix multiplications, and generally multiply/add-heavy algorithms. Field-programmable gate arrays (FPGAs), although less popular and more complex, can be used as accelerators to enhance a vastly different type of computation. Their strength lies in the flexibility of configuration. No instruction sets, no scheduler and (almost) no fixed-function entities restrain the kind of work FPGAs can perform. The developer directly imbues combinational and/or sequential logic into the device to connect the different pins and logic elements with each other. This great flexibility comes at a cost. Development time and debugging difficulty tend to be very high while the achievable clock frequency is rather low when compared to GPUs or CPUs. However, this also allows developing applications that could not be run efficiently on CPUs and GPUs, such as systolic structures where large numbers of primitive nodes concurrently work on data streams. One popular example is the development of polynomial evaluators. Using Horner’s rule, building an FPGA configuration that evaluates a several-thousand-degree polynomial every clock cycle is trivial (see Section 2.3.2 for details), while it is hardly possible to do on GPUs and certainly impossible on CPUs in the same time frame. After all, the relatively low frequencies achievable on FPGAs also reduce energy consumption, and the flexibility and concurrency potential can easily make up for the lower clock rate, as this work shows. This, however, requires the developer to extensively exploit this potential. Plain sequential processes, complex arithmetical issues or dynamic programming solutions will not perform well, but if they can be translated into more suitable structures, FPGA designs become able to generate results faster than any other platform while still keeping the energy footprint low. Problems that can be partitioned into suitable workloads for GPUs, FPGAs and CPUs will therefore be extremely efficient when compared to implementations on any of these alone. Bioinformatics, for example, is a field where large amounts of data, typically DNA, are analyzed in pipelines with clearly defined stages and therefore provides a workable field for acceleration through the hybrid-parallel architecture proposed in this work.
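To make the Horner argument concrete, the following sketch shows the sequential form of the evaluation; the code is illustrative and not taken from the thesis. On an FPGA, each multiply/add step can be mapped to one pipelined cell of a systolic array (see Section 2.3.2), so that a new argument can enter the array every clock cycle.

/* Evaluate p(x) = c[0] + c[1]*x + ... + c[n]*x^n using Horner's rule.
 * Each loop iteration corresponds to one multiply/add cell when the
 * scheme is unrolled into hardware. */
double horner(const double *c, int n, double x)
{
    double acc = c[n];
    for (int i = n - 1; i >= 0; --i)
        acc = acc * x + c[i];
    return acc;
}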


Figure 1.1. Human genome sequencing costs in USD [Wet16]

Next-Generation Sequencing (NGS) revolutionized biotechnology by making fast and economical profiling of genomes possible. Since its introduction, the trend continues, as can be seen in Figure 1.1, and the acquired amounts of data rise faster than the available computing power. Empowered by these new methods, scientists in medicine, genetics and molecular biology discovered genetic features that might be responsible for certain diseases, creating the field of personalized medicine and, of course, completely new workloads for already busy data centers.

The rich field of bioinformatics is therefore a grateful target for the development of novel high-efficiency architectures. Although many methods and solutions exist that focus on data centers and GPU clusters, no single implementation is known to exploit the best of all three acceleration models. Graphics cards set high standards for computational power, their commercial and academic communities are vibrant and the programming interfaces are exceptionally well-designed. FPGA cards, however, do not provide standardized interfaces, simple programming mechanisms and communities with pre-built libraries for all kinds of tasks. The major challenges in developing a hybrid system to cherry-pick from all architectures are programmability and interconnection, and this is exactly what the work at hand focuses on. Moving data between threads or programs on a single device alone often becomes a bottleneck in an application’s performance. Moving data between devices requires sophisticated threading models, highly efficient buffer management, extensive knowledge of communication protocols and deep integration with all involved drivers. However, this work aims to prove that such a system is not only possible to implement. Instead, it is shown that the hybrid realization of two methods for the analysis of genome-wide association studies outperforms all known methods on performance-wise comparable, non-hybrid devices.

A genome-wide association study (GWAS) is often designed by selecting a particular disease, e.g. ulcerative colitis, and collecting human samples that are known to be healthy with respect to this disorder, the control group, and samples that do possess colitis, the case group. Each individual has its genome sampled at specific places, loci, where each locus has one of two or more types. The methods discussed and used in this work try to discover combinations of loci where certain values within the combination lead to a shift in the case/control distribution. These analyses are performed on contingency tables, i.e. multi-dimensional data points where one table shows the distribution of sampled values of a single locus with respect to the group membership. Therefore, a simple, first-order analysis requires P contingency tables to be created and evaluated. With the number of loci P quickly rising towards millions, even simple statistical tests can become a computational burden for conservative architectures. The superior power of the proposed hybrid system becomes obvious when not only analyzing single contingency tables but all two-combinations. The demand for computational resources rises exponentially in the order of interaction, i.e. P evaluations for single order become almost P² evaluations for second order, and so on. For third and higher order interactions, it is almost impossible to find established methods as only very few scientists have access to the computational power required to do these kinds of analyses. For comparison, a sufficiently modern CPU-based system requires a little more than 115 hours for our first method on a reference data set. The implementation of the same method on the hybrid system is faster by a factor of 595, requiring only 12 minutes. For third order analysis, a novel method to model gene-gene-gene interactions based on approaches from information theory is presented and implemented for comparison. Here, due to technical reasons, the speed-up is limited to 72 with respect to a highly optimized CPU-based implementation but should easily achieve higher numbers when these hardware limitations are lifted.
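The growth described above follows directly from the number of unordered locus combinations. With P loci, the second- and third-order searches have to build and evaluate

\binom{P}{2} = \frac{P(P-1)}{2} \approx \frac{P^2}{2}
\qquad \text{and} \qquad
\binom{P}{3} = \frac{P(P-1)(P-2)}{6} \approx \frac{P^3}{6}

contingency table sets, respectively; one million loci thus already imply roughly 5 × 10^11 pairwise tests.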

1.2 Related Work

Although hybrid architectures of this type have not been observed in these studies, they have already been proposed in other scientific fields. In 2014, Kocz et al. presented a similar system consisting of CPUs, FPGAs and GPUs to analyze cross-correlations between signals from numerous radio antennas used in astronomy [KGB+14]. The FPGA part in their system samples analog signals from antennas through external analog-to-digital converters and applies bandwidth filters through stream-oriented real-time Fourier transforms before sending data to a GPU for the creation of cross-correlation matrices. Architecture-wise, the design is similar to the proposed bioinformatics platform. Regarding the data flow, though, the throughput requirements are different. Most importantly, the FPGA nodes are solely used for data acquisition, yielding a real-time data rate requirement of less than 50 kiB/s, whereas the data rate between CPU and GPU reaches 51 Gbit/s (approx. 6 GiB/s). Due to this asymmetric design, both architectures are only comparable on a conceptual level.

In [IBS12], Inta et al. propose an interesting “Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform.” They show how algorithms might be implemented on hybrid platforms, e.g. Monte Carlo classification for approximations of π or the Fast Fourier Transform, and describe their hardware components in detail. Also, the interconnect between the individual components is presented as the central bottleneck when it comes to performance. Unfortunately, they decided to explicitly ignore this limitation, assuming that future communication solutions would solve this kind of problem. Therefore, performance is predicted under the assumption that the interconnect keeps up, although precisely the communication link has always been known to be the bottleneck in all scales of distributed systems. Additionally, the data and sampling rates are extrapolated from measurement time frames in the order of microseconds and nanoseconds without analyzing their uncertainty, making comparisons to actual implementations hard. The authors state that a high-performance backplane is being developed to directly connect FPGAs and GPUs, in addition to a suite of custom kernel drivers to provide an easy-to-use interface. However, at the time of this writing, no follow-up article has been published.

In the field of cryptanalysis, [KL10], clusters of FPGA/GPU computing nodes are used to crack a range of hashes, i.e. to find a plain text that leads to a specific cryptographic hash value. The cluster nodes are connected via the Message Passing Interface (MPI) and high-bandwidth InfiniBand links, allowing for Remote Direct Memory Access (RDMA). Regarding the concept, FPGAs and GPUs are both used to perform the same task while a CPU dispatches units of work to the locally connected FPGA and/or GPU accelerators on demand. In this work specifically, as well as in cryptanalysis generally, the typical workload consists of distributing a hash value to one or more computation devices and waiting for at least one node to produce a result. Hence, the communication between nodes is non-existent while the communication demand between host systems and their accelerators is at a low level. As with [IBS12], the authors state that the cluster is not fully operational yet, as only GPUs can be used due to problems with the FPGA designs. Therefore, no performance measurements or predictions have been published.

Another field with a high demand on computational power is physics simulation. In [KDH+06], Kelmelis et al. propose a desktop-type hybrid computer to model signal propagation through large three-dimensional meshes. The authors put much work into problem separation and achieved runtimes of two days where a standard computer “would have required over a month.” However, the problem data is transmitted to the devices in a slow transmission before the actual calculation starts and does not require device-level concurrency or communication while the calculation is running. As parallelism is only used within a single device, the higher-level process can better be described as sequential.

In [BRF14] and [Gil15], the focus is brought to a sub-aspect of hybrid computing, specifically the communication part that has not been discussed in great detail in the previously mentioned publications. The authors reverse-engineered the DMA modules of proprietary NVIDIA driver software, modified PCI Express drivers and implemented OpenCL-based interfaces to access the newly implemented functionality. Using these techniques, Gillert established a direct link between an FPGA and a GPU using standard motherboard features. Thanks to his highly specific knowledge of hardware and software drivers, the achieved data rates from FPGA to GPU and GPU to FPGA were 740 MiB/s and 525 MiB/s, respectively. The theoretically achievable net data rate in the corresponding technical specifications is approx. 3800 MiB/s in both directions (full duplex).

Regarding the field targeted by this work, bioinformatics, available computational resources are dominated by traditional CPU clusters, while CPU/GPU and GPU-only solutions are gaining momentum. In an extensive 2016 survey, [NCT+16] lists 12 CPU/GPU implementations of various methods from molecular biology over sequence alignment to systems medicine with speed-ups ranging from 5 to 50 with respect to a contemporary CPU-based computing node. GPU-only solutions in the same fields with comparable methods yield speed-ups of up to 39. With regard to FPGA/CPU hybrids, only few implementations exist, notably [CSM12]. Chen, Schmidt and Maskell proposed a short read mapping method based on the popular banded Needleman-Wunsch algorithm [NW70]. In optimal global alignments like this, the alignment step is a computationally intensive task where the transmission of data (i.e. nucleotide or amino acid sequences) is runtime-wise negligible.

In 2016, Lars Wienbrandt published an extensive evaluation of FPGAs in bioinformatics, providing implementations of sequence alignment, gene-gene interactions and SNP imputation [Wie16]. It is shown that FPGAs and clusters thereof are capable of outperforming CPU clusters both performance- and energy-wise. In fact, Wienbrandt covers the groundwork that led to the development of early prototypes of parts of the hybrid-parallel computer proposed in this work. Specifically, many thoughts and processes were implemented on the KC705 Development Board described in Section 4.1.3.

1.3 Structure of This Work

The remaining chapters of this thesis are summarized as follows. The second chapter, Architecture Essentials, aims to impart thorough knowledge about the architectural components that form the basis of the proposed hybrid computer. Therefore, a deep understanding of the programming of CPUs, GPUs and FPGAs is built up. As the focus of this work does not lie on these fundamentally individual devices, but on the actual interconnects, several programming paradigms and hardware interfaces are discussed. This includes several levels of parallel programming, from the lowest levels inside the CPU to mid-level many-core applications, and extends to GPU warps and to systolic structure descriptions on FPGAs. Hardware-wise, techniques are presented that enable the above components to interchange data in very efficient ways by using embedded controllers and low-overhead communication protocols, most notably Direct Memory Access (DMA) and PCI Express (PCIe), that form the backbone of the proposed system.


Before the actual implementation can be described, Chapter 3 provides an introduction to the biomedical background that is essential to an understanding of the methods involved. As this work covers two almost diametrically opposed fields of science, it begins with an introductory-level primer on genetics, specifically DNA, chromosomes and genes, and continues with allele configurations, association study design and types of interactions between genes. The second part describes an approach to extract information from said studies by employing methods from statistics and information theory, and therefore the nature of the applications that have been developed for the proposed hybrid computer. Furthermore, the computational challenges and burdens that need to be addressed are emphasized, such as the large number of statistical model evaluations.

This continues into the first part of the following main chapter, The Hybrid Architecture Prototype, where these challenges are projected to the architectural level. The general strategy in hybrid design, divide-and-conquer, and the actual problem partitioning model are presented, along with the problems that need to be faced, such as the high volumes of data that need to be moved just in time. This indeed becomes a major aspect that is present in all levels of the implementation, emphasizes the need for the most sophisticated communication handling and algorithms, and leads to the spawning of novel data structures such as the fixed-capacity fine binary min-max heap and the page-locked buffer stack. The overall structure of this chapter can best be described as bottom-up. First, the detailed implementations of the GPU kernels and FPGA cores are discussed, including the required transformations of mathematical models and technical structures. Second, the driver and API modules are described that provide low-level interfaces between the operating system kernel, GPUs, FPGAs and the application layer, which is presented in the final part. Here, all application subsystems are implemented, from the loading of data files to memory management and eventually the post-processing of results, where new data structures and algorithms are shown to be correct and efficient.

Chapter 5, Evaluation, begins with a discussion about how performance can actually be measured across platform boundaries. The various established metrics are presented, including the popular FLOPs (floating-point operations per second), and why these are not sufficient for the special case of hybrid-parallel FPGA/GPU computers. Very specific measurements are introduced to guide the evaluation of performance with respect and in relation to other components, methods and architectures. The proposed implementations are evaluated in parts to identify possible bottlenecks and establish results that allow deep analyses of the general concept of FPGA/GPU hybrid computing and the specific implementation of the second-order and third-order interaction methods. It is shown that although no other hybrid computer is able to maintain the achieved data rates, the usual bottleneck of communication does not need to be a boundary in hybrid computing when transmission handling is done in a thoughtful and sensible way, even when it comes to data rates in the order of 3 GiB/s. Moreover, significant performance increases could be observed over CPU-based reference implementations. However, due to the lack of available architectures that enable third-order interaction analysis, the prototype system is compared against a hypothetical high-performance system.

In the remaining chapter, the previously established results are summarized and the general suitability of hybrid computers for these types of problems is assessed. A preview is given on ongoing work with the successor of the prototype design, a server-grade computer with four high-end FPGA and GPU boards, large amounts of memory and two state-of-the-art 8-core SMT CPUs. Additionally, extensions and future work items are discussed that may enhance the driver interfaces and device communication patterns or raise the quality of results produced by the newly designed third-order implementation of mutual information.

Chapter 2

Architecture Essentials

In this section, an overview of the various architectures and their underlying technologies is provided. Before describing and analyzing the concepts and implementation of the methods, a certain knowledge about the hardware composition and the developed software is necessary. The following sections help provide this knowledge, focusing on the parts that are relevant for later chapters. Although most of the computational workload does not lie on the system’s CPU but on the GPUs and FPGAs, the CPU is responsible for managing the memory-intensive data flows, a highly sensitive task, as the overall performance superiority rises and falls with the stability and throughput of these data channels. Therefore, short historical overviews are given as an introduction to CPU, GPU and FPGA design, as well as to the more delicate matters such as Direct Memory Access, to understand why the selected devices are the devices of choice for this task and how they can be programmed to provide data operations and memory throughput in the range of several gigabytes per second. Additionally, a few methods and paradigms used in parallel programming are introduced to allow a much more thorough knowledge base on the general practices performed during the course of this work. Finally, more insights into the inner workings of the most important communication interface in the hybrid architecture, PCI Express, are presented.


2.1 Architectures Overview

2.1.1 Central Processing Units (CPUs)

When the first microprocessors for consumer electronics emerged, block diagrams describing all their functional units could be drawn on a single piece of paper. One representative example, Intel’s popular 8080 processor, released in 1974 and often described as the “first truly useful microprocessor”, only contained a register bank, an arithmetic logic unit (ALU), an instruction decoder and a few counters and flip-flops. Two bus systems, a data bus and an address bus, connect this CPU to external hardware. In the following architectures, the operating frequencies rose while more and more functionality was added. The most interesting extensions are shown in Table 2.1. Notable developments are the introduction of a memory management unit (MMU) in the mid-1980s, and the rise of vectoring instruction sets such as SSE and AVX from 1993 onward.

Until the year 2000, CPU core frequencies were increased. By then, processors had approached the 4 GHz line. Due to the switching characteristics of transistors, the power dissipation of a CPU is approximately proportional to the clock speed and (electrical) capacitance, and quadratic in its supply voltage [Cor04]. For achieving high clock rates, developers therefore tried to reduce the voltage and capacitance by choosing special low-power transistors and increasing the integration density, thus reducing the capacitance by narrowing and shortening wires. With higher densities though, more heat has to be moved away from the chip per unit of area. The total limits of stable operating frequencies are therefore given by the heat dissipation and conduction of the chip and its package as well as the lowest possible voltage for its transistors. As these limits have recently been approached, developers have moved from increasing the clock rates to reducing the number of clock cycles per instruction, increasing the number of instructions executed concurrently by other means than frequency, and increasing the data width per instruction. These techniques are explained in the following sections.


Table 2.1. CPU Technology Timeline

Year  Processor            New features
1974  Intel 8080           ALU, register bank
1976  Zilog Z80            DRAM refresh controller, interrupt handling
1982  Intel 80286          Microcoded multiplier, protected mode
1985  Intel 80386          Paged memory
1989  Intel i486           Floating-point unit
1993  Intel Pentium        Instruction-level parallelism (superscalarity)
1997  Intel Pentium MMX    SIMD instruction sets (MMX and SSE)
2000  Intel Netburst       SSE2, SSE3, multi-core, symmetric multithreading (SMT)
2008  Intel Nehalem        PCI Express and DDR3 SDRAM controllers, integrated GPU
2013  Intel Haswell        SSE4, BMI1–3, FMA3, AVX2 SIMD instruction sets, 3D accelerator
2015  Intel Skylake        USB 3.1, DDR4 controllers, integrated voltage regulators
2016  Intel Xeon Phi x200  Many-core architecture (up to 72 cores per chip), Hybrid Memory Cube (HMC) controller

Instruction Set Architecture (ISA)

General-purpose as well as special-purpose processors are generally classified and defined by their set of supported instructions. Before the year 2000, instruction set architectures could be divided into two design philosophies, Complex Instruction Set Computers (CISC) and Reduced Instruction Set Computers (RISC) [PS81]. RISC designs tried to keep instructions as simple as possible but allowed the scheduling of more than one instruction per clock cycle by the introduction of a superscalar instruction pipeline. Here, multiple independent operations are carried out in the same pipeline stage in a parallel way. The constraint of simplicity also mandated that memory-accessing instructions be separated from calculation instructions. Therefore, RISC-based architectures are often called Load/Store architectures [Fly95]. Complex Instruction Set Computers instead used instructions that combined these functions. For example, the ADD instruction in a CISC system typically allows the specification of a memory location as shown in Listings 2.1 and 2.2.

Listing 2.1. In-memory addition in RISC systems

MOV r1, [y]     ; load from memory locations y and z
MOV r2, [z]     ; into registers r1 and r2
ADD r3, r1, r2  ; r3 := r1 + r2
MOV [x], r3     ; store r3 into location x

Listing 2.2. In-memory addition in CISC systems

MOV r2, [y]      ; load from memory location y into r2
ADD r3, [z], r2  ; load first operand from z and add r2
MOV [x], r3      ; store r3 into location x

In this case, the CISC ADD instruction allows one operand to be a memory location (or a register containing a memory location), while this is generally not implemented in RISC systems. Although the CISC program is one instruction shorter, the added complexity of an instruction with memory access may or may not allow it to be processed in one clock cycle. Hence, the runtime or efficiency cannot simply be deduced from the number of instructions in a program. Modern processors, though, can usually not be clearly classified as RISC or CISC anymore. Many RISC processors implement more complex instructions and allow memory operations within arithmetic operations, while CISC processors support more simple but fast instructions and use RISC-like superscalarity concepts:

Ź Instructions are conceptually processed from a sequential instruction stream


Ź Processors check for data dependencies between eligible instructions during runtime (as opposed to compile-time checking)

Although these terms are still used in many recent articles, the lines between CISC and RISC have become increasingly blurred. This led to complex, fast and efficient processors that have built-in concepts for automatic parallelization through superscalarity, simple instructions fused together (e.g. fused multiply/add) and wide data instructions. In many applications, it is necessary to count the number of set bits in a bit vector, including cryptography/cryptanalysis and hash table operations. This number is also called the Hamming weight of a bit string, or population count. Although many more sophisticated algorithms exist to determine the Hamming weight, it might be implemented in the naïve way as shown in Listing 2.3.

Listing 2.3. Naïve implementation of Hamming Weight in C

unsigned NaiveHamming(unsigned value) {
    unsigned count;
    for (count = 0; value != 0; value >>= 1)
        count += value & 1;
    return count;
}

Listing 2.4. Naïve Hamming Weight (compiled)

      XOR %hamming, %hamming ; zero the Hamming weight counter
loop:
      AND $1, %value         ; stores the lowest bit in %result
      ADD %hamming, %result  ; add %result to %hamming
      SHR %value             ; shift %value by one to the right
      JNZ loop               ; jump to the beginning if %value is not zero

The assembly code shown in Listing 2.4 has been obtained by compiling the C code block in Listing 2.3 with GNU GCC 5.4 and all general and processor-specific optimizations enabled. For easier reading, the instructions have been annotated. It can be seen that even if we (quite unrealistically) assume that every instruction takes exactly one clock cycle, evaluating a 32-bit value would take 128 clock cycles. Better runtimes can be achieved by using binary counter trees, large look-up tables and other skillful implementations [TAOCP]. However, none of these methods reach the performance of the POPCNT instruction that has been introduced to current processors with the SSE 4.2 instruction set (see Sect. 2.1.1), taking only one clock cycle with a latency of two clock cycles for any 64-bit integer word on an Intel Skylake architecture [Fog16].

While POPCNT is one of the simpler instructions, SSE 4.2 (and any other version, for that matter) introduced much more complex ones, such as DPPS. To speed up matrix multiplication, which is often used in graphics processing, DPPS allows performing four fused multiply-and-add operations in a single instruction [IA32, page 293]. Many of these combined and/or vectored instructions make use of special vectoring registers. With the various versions of SSE and AVX, additional registers have been introduced to support 128-bit, 256-bit and 512-bit operands (xmm, ymm and zmm, respectively). Unfortunately, many compilers, including GCC 5.4, are not able to detect situations where the usage of these extensions would be appropriate, as can be seen in the above Listings 2.3 and 2.4. For these cases, most C/C++ compilers allow the emission of inline assembly code so the software developer may force the respective instruction’s usage. One large disadvantage is that the software developer is required to know the exact semantics and instruction sets of the underlying processor, whereas the compiler generally tries to keep the language abstract from these hardware details for the sake of platform independence. The resulting code is not only less readable and thus less maintainable, the program compiled from Listing 2.5 will also not run on processors pre-dating the SSE 4.2 introduction without changing the source code and recompiling.

Listing 2.5. The POPCNT instruction in inline assembly

unsigned InlinePopcount(unsigned value) {
    unsigned count = 0;

    __asm__ ("POPCNT %[value], %[count];"
             : [count] "=&r" (count)
             : [value] "r" (value));
    return count;
}

Like many others, the GCC authors introduced so-called intrinsics. These special functions are not kept pre-compiled in standard libraries as regular functions; instead, the compiler itself has an intimate knowledge about what these functions do and may create them on the fly. This allows for much better compiler integration and is one of the few ways to place a hint on the semantics of the source code.

Listing 2.6. Intrinsic population count function

unsigned IntrinsicPopcount(unsigned value) {
    return __builtin_popcount(value);
}

When Listing 2.6 is compiled by GCC 5.4, a POPCNT instruction is emitted if the target processor supports it, and a generic software implementation otherwise. Although this resolves the problem of processor dependency, it is still compiler-dependent. GCC supports a large number of built-in functions to aid its detection of semantics. Some of them are of the “instruction type” as POPCNT is, and others are functions that are not covered by library functions and/or language standards (a short combined usage sketch follows the list):

Ź __builtin_cpu_supports: allows querying for specific instruction set support, e.g. __builtin_cpu_supports("sse4.2").

Ź __builtin_ia32_crc32si: uses SSE 4.2’s CRC32 instruction to compute the 32-bit Cyclic Redundancy Check (CRC32) checksum

Ź __builtin_mul_overflow: multiplies two integers and reports whether the multiplication led to an integer overflow


Ź __builtin_expect: places a hint on a conditional statement to let the CPU’s branch prediction unit know what outcome a boolean expression is likely to have

Ź memcpy: copies memory from one region to another, exploiting vector copying instructions if possible. This is an example of a function shadowed by an intrinsic. It is non-standard behavior and the compiler will fall back to the library function if it is run in strict standards-compliance mode.
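The following minimal sketch combines a few of the built-ins listed above; it compiles with GCC 5.4 on x86, and the function name and constants are illustrative, not part of GCC or the thesis.

#include <stdio.h>

/* Multiply two sizes and report overflow; the branch hint tells the
 * compiler that overflow is the unlikely case. */
static int checked_mul(unsigned long a, unsigned long b, unsigned long *out)
{
    if (__builtin_expect(__builtin_mul_overflow(a, b, out), 0))
        return -1;   /* overflow occurred */
    return 0;
}

int main(void)
{
    unsigned long bytes;
    if (__builtin_cpu_supports("sse4.2"))
        puts("POPCNT and CRC32 are available on this CPU");
    if (checked_mul(1UL << 20, 4096UL, &bytes) == 0)
        printf("buffer size: %lu bytes, set bits in 0xF0F0: %d\n",
               bytes, __builtin_popcount(0xF0F0u));
    return 0;
}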

Although population count is a rather recent addition to modern instruction sets, it is also rather simple when compared to other recent additions. VRSQRT28PD from the “AVX-512” instruction set, for example, computes approximate reciprocals of the square roots of eight packed double-precision floating-point numbers in a single instruction. There also exist instructions for fused multiply/add operations, dot product computation, comparison and copying of 512-bit vectored numbers. For simplicity, this section only showed the usage of these instructions for POPCNT, but the same principles basically apply to any other (more complex) instruction.
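As a sketch of how such vectored operations are reached from C without hand-written assembly, the following uses the AVX/FMA3 compiler intrinsics (compile with -mavx -mfma); the function name is illustrative.

#include <immintrin.h>

/* One fused multiply/add over eight packed single-precision values:
 * acc[i] = a[i] * b[i] + acc[i], emitted as a single VFMADD instruction. */
void fma8(const float *a, const float *b, float *acc)
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_loadu_ps(acc);
    _mm256_storeu_ps(acc, _mm256_fmadd_ps(va, vb, vc));
}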

Simultaneous Multi-Threading and Many-core Architectures

In the previous section, the term “superscalarity” has been used to describe a type of instruction-level parallelism (ILP) where several instructions taken from a sequential stream are executed in parallel in the same pipeline stage in a processor. Due to the high complexity of implementing a superscalar instruction pipeline, efforts have been made to find alternatives to a hardware implementation.

Very Long Instruction Word (VLIW) processors support custom, variable-length instructions. These can be used by compiler software to analyze the instruction stream that would be executed and pack multiple instructions into a single instruction (instruction group) whose parts can be processed in parallel. This technique moves the burden of dependency checking and scheduling from the processor hardware to the compiler software. As simpler hardware is less expensive and may consume less energy, this type of architecture is becoming popular in embedded systems [Fis83]. Twelve years after the introduction of VLIW, a partnership of HP and Intel introduced the Explicitly Parallel Instruction Computing (EPIC) architecture [SR00]. Directly evolved from VLIW, EPIC overcomes some of its shortcomings and adds new features from other architectures. Today, as with CISC and RISC, VLIW and EPIC architectures have a large set of features in common and it is becoming increasingly hard to differentiate one from the other.

In general-purpose server and consumer hardware, VLIW or EPIC is not widely used. Instead, simultaneous multithreading (SMT) is popular among wide ranges of current processors. In this technique, a CPU core is presented to the operating system as two or more (to some extent) independent CPU cores. This happens in a way that it is generally not trivial for a computer program to detect whether the underlying hardware uses several physical CPUs or several virtual cores from a single physical CPU. Although all virtual cores provide the same functions and the same instruction sets, measurements under maximum load show that the overall performance gain over a single core is approx. 30 % with two-way SMT. This number highly depends on the workload, though [MBH+02]. The explanation is simple. Virtual cores are emulated by a single superscalar instruction pipeline. To support the high level of parallelism on the instruction level and the thread level, many hardware modules are available several times. On the other hand, being a single physical module, all simultaneous threads of execution share caches, main memory, system buses and external peripherals through a shared communication endpoint. Therefore, the highest performance outcome can be achieved by running programs on these cores that are as different as possible regarding their instruction characteristics. For example, one thread may be a memory-heavy program while the other focuses on arithmetic problems. Hence, a part of the speed-up through SMT usage may be accounted for by I/O and memory latency hiding.

Thread-level parallelism is also brought to a higher level by introducing additional physical processors which, in turn, provide superscalar instruction pipelines by themselves, and therefore more virtual cores. Most processor architectures that have been developed in the last few years have implemented more pipelines by housing several fully independent physical CPUs within the same package. Typically, a processor package also houses several controllers, instruction caches, data caches, and other peripherals.


Figure 2.1. Intel Xeon Phi standalone host processor [X200]

While cores in a multi-core processor are indeed independent, they usually share higher-level cache memories and bus interfaces. This separates multi-core processors from actual separate CPU chips where each package contains a full set of cache hierarchies, frequency and voltage regulators and memory controllers. The number of cores per package ranges up to several hundreds of cores, where multi-core processors become many-core processors [Vaj11]. One notable example is the Intel Xeon Phi many-core series. Having started as a line of PCI Express-connected add-on co-processor cards, in June 2016 Intel launched the “Xeon Phi x200” product line of standalone host processors [X200]. Each device, depending on the model, supports up to 72 cores with four-way SMT and a clock frequency of 1.5 GHz, resulting in 288 AVX-512-capable virtual cores. Additionally, it is typically possible to use more than one of these processors in a single system.
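How many virtual cores such a processor presents to the operating system can be queried at runtime; the following sketch assumes a Linux/glibc environment.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Number of virtual cores currently online, i.e. physical cores
     * multiplied by the SMT factor (e.g. up to 288 on a Xeon Phi x200). */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online virtual cores: %ld\n", cores);
    return 0;
}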

Memory

Most of the vectoring extensions and other forms of parallelism introduced earlier are useless if the choke point of the system is memory access. Hence, it is crucial to understand how and when memory can be accessed efficiently. This knowledge is discussed in the following section.

Most modern general-purpose computers support multi-tasking. Several processes share the same CPU and the same main memory, usually with a high degree of transparency to the application. Each process, or, more specifically, each thread of execution, possesses its own set of virtual CPU registers that can almost arbitrarily be saved and restored by the operating system’s task scheduler.



Figure 2.2. Virtual memory translation specifically, each thread of execution, possesses its own set of virtual CPU registers that can almost arbitrarily be saved and restored by the operating system’s task scheduler. These context switching operations are often di- rectly supported by specialized CPU instructions. Sharing main memory is different and conceptually more complex. However, the following sections require a thorough understanding of memory organization as the proposed components make heavy use of the various memory abstraction layers involved. Operating system kernels and CPUs that implement virtual memory allow the same level of transparency for memory as well. Each process is therefore presented its own, private main memory and may assume that neither other processes can read or manipulate these ranges, nor the process itself can manipulate foreign memory. Notable exceptions to the principle of process separation are threads, i.e. separate threads of execution that live within the same process context and thus share the memory range, and dynamic library injections, where shared objects (“DLLs” in ) are injected into a process context through debugging interfaces. Now that a process is allowed to assume that the whole memory is exclusively reserved, the operating system is required to employ a mech- anism to translate virtual memory addresses to actual physical addresses. Information on actual translations and memory mappings are stored into the respective process context block that resides in the scheduler. Different

21 2. Architecture Essentials

Figure 2.2 shows how two processes may share the system’s main memory. A central part of virtual memory translation is the per-process page table. Although memory is generally addressable with byte granularity, memory controllers and operating systems typically fetch larger regions at once for better efficiency and caching behavior. In many x86-based systems, the page size is set to 4 096 bytes, or 2^12 bytes. If a process issues a read request for a single byte, a larger section surrounding that byte is actually read and made ready for subsequent read requests without additional bus utilization. Translation tables therefore do not need to store single addresses but may hold whole pages of memory. Accordingly, access to physical memory might be implemented as follows (a short code sketch after the list illustrates the address arithmetic):

1. An application requests a block of memory from the operating system but does not actually access it yet. It is allocated a block defined by the virtual addresses 4096 to 8191 (4 096 bytes).

2. Address 5000 is written to.

3. Now that the memory is actually accessed, the processor shifts the address 5000 to the right by 12 bits as the system’s page size is 4 096 bytes. This yields 1 as the page table index which is then used to retrieve the corresponding page table entry.

4. The page table entry for index 1 is still unpopulated, i.e. no corresponding physical memory location has been assigned yet. This is now done.

5. The page table look-up yields a physical memory address with respect to the whole page. Hence, the offset into the page (5000 - 4096) is added to the physical address.

6. The newly calculated physical memory address is now issued to the memory controller.
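The address arithmetic in steps 3 and 5 reduces to a shift and a mask. The following minimal C sketch, assuming the 4 096-byte page size from the example, mirrors the calculation; the helper names are hypothetical.

#include <stdint.h>

#define PAGE_SHIFT 12u                      /* log2(4096) */
#define PAGE_SIZE  (1u << PAGE_SHIFT)       /* 4 096 bytes */

/* Split a virtual address into page-table index and in-page offset,
 * exactly as done in steps 3 and 5 above. */
static uint64_t page_index(uint64_t vaddr)  { return vaddr >> PAGE_SHIFT; }
static uint64_t page_offset(uint64_t vaddr) { return vaddr & (PAGE_SIZE - 1); }

/* Example: virtual address 5000 yields index 1 and offset 904, so the
 * physical address is page_table[1] + 904. */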

In this example, it is assumed that the CPU contains a memory management unit (MMU) into which the current process' page table is loaded upon the scheduler's context switch.

Otherwise, the translation would have to be done in software, incurring a large computational overhead.

The example also shows other relevant properties of virtual memory. Even if an application requests a large amount of memory, there does not need to be a corresponding physical location until it is actually accessed. Hence, there might be large differences between the allocated virtual memory size (VMS) and the actual resident set size (RSS). This allows for over-commitment among the processes in the system but can lead to out-of-memory situations even if the memory allocations themselves were successful.

Another use case of virtual memory not depicted in the above figure is paging. Operating systems keep lists of the most frequently used pages, the least recently used pages and a few more. These can be used to automatically optimize memory access behavior by re-ordering physical pages to match virtual pages if a sequential access pattern has been detected. If pages are found that have been allocated and used but not accessed for some time, they can be evicted from physical memory and instead stored on secondary storage, such as hard disks. This technique, called paging, is used to reduce memory fragmentation and to delay out-of-memory conditions on overcommitted memory systems. Moving a page from memory to hard disk is called paging out or swapping, while moving a page back into memory is called paging in. A virtual memory location may therefore be in one of the following states:

Ź Not allocated. On access, the application is terminated for illegally accessing un-allocated memory.

Ź Allocated but not backed by actual storage (physical memory or secondary storage).

Ź Allocated and backed by physical memory. These pages are said to be resident.

Ź Allocated and backed by secondary storage. These pages are not resident and have to be transparently paged in before they can be accessed.

Additionally, certain situations require memory to be allocated, resident and prevented from being moved or paged. These pages are called page-locked or pinned and, once locked, are bound to constant physical memory


locations until unlocked. This is often a central requirement in tightly coupled software/hardware constellations such as the applications that are presented in this work. Obviously, locked memory cannot benefit from memory reordering and overcommitment but will never suffer from the latencies generated by slow hard disk accesses during page-in and first-time physical memory allocations.
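On Linux, a user-space process can at least request such locking for its own buffers through the POSIX mlock interface; the following is a minimal sketch (device drivers use their own pinning mechanisms on top of this, and the call may fail if it exceeds the RLIMIT_MEMLOCK limit):

    #include <sys/mman.h>   // mlock, munlock
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const size_t size = 1 << 20;          // 1 MiB buffer
        void *buf = std::malloc(size);

        // Ask the kernel to make all pages of the buffer resident and to keep
        // them in physical memory until munlock() or process termination.
        if (mlock(buf, size) != 0) {
            std::perror("mlock");
            std::free(buf);
            return 1;
        }

        // ... use the buffer, e.g. hand it to a driver for DMA transfers ...

        munlock(buf, size);
        std::free(buf);
        return 0;
    }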

Energy Consumption

Energy consumption in CPUs is usually expressed through power dissipation. It is generally composed of dynamic power consumption, which originates from switching activity, short-circuit consumption during switching, and transistor leakage currents. The dynamic and short-circuit power consumed is approximately proportional to the clock frequency and to the square of the CPU operating voltage, while the leakage current only depends on the voltage [04]. An obvious solution to reduce power consumption is to lower the operating voltage. The minimum voltage required is mainly determined by the size of a transistor; therefore, lowering the structure size also lowers the minimum voltage. As a result, several vendors, such as Samsung Electronics and Intel, started commercial production of integrated circuits with transistor sizes of 10 nm and a target voltage of less than 0.5 Volts.

In data sheets of desktop processors, a measure often found is the Thermal Design Power (TDP). It expresses the maximum supported thermal dissipation which, however, does not necessarily correspond to the maximum power consumption but gives a good indication. While the TDP is certainly a useful specification when designing cooling systems and energy supplies, actual consumption rates under averaged work load conditions are rarely specified, possibly due to the lack of standardized measurement procedures. Table 2.2 gives an overview of selected processors showing the structure size, clock speed, core count and thermal design power from their respective data sheets. Evaluating the efficiency by these figures is almost impossible though, as different architectures may require a different number of clock cycles for a single instruction, rendering common ratios such as "Watts per GHz" virtually useless.
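This relation is commonly summarized by the textbook CMOS power model; the formula below is the standard approximation, not one taken from the cited data sheets:

    P \approx \underbrace{\alpha\, C\, V^2 f}_{\text{dynamic and short-circuit}} + \underbrace{V \cdot I_{\text{leak}}}_{\text{leakage}}

where α denotes the switching activity factor, C the switched capacitance, V the operating voltage, f the clock frequency and I_leak the leakage current.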


Table 2.2. Power dissipation of selected processors

Model                       Lithography   Clock Speed      TDP
Intel Pentium III 1400      130 nm        1.40 GHz         31 W
Intel Pentium M 780         90 nm         2.26 GHz         27 W
Intel Core 2 Quad Q6700     65 nm         4 × 2.66 GHz     95 W
Intel Core 2 Extreme X9100  45 nm         2 × 3.06 GHz     44 W
Intel Core i7-6950X         14 nm         10 × 3.00 GHz    140 W
Intel Atom D525             45 nm         2 × 1.83 GHz     13 W

Figure 2.3. Atari 2600 home entertainment system [Ken01]

2.1.2 Graphics Processing Units (GPUs)

Although the term Graphics Processing Unit was popularized when NVIDIA introduced its GeForce 256 gaming graphics accelerator card, the first dedicated video hardware dates back to the 1970s. Usually dated from 1978 to 1983, the "Golden Age of Arcade Video Games" took place, during which North America, Europe and Asia saw a rapid spread of video arcade games, both in arcade halls and home entertainment systems [Ken01], such as the popular Atari 2600 (Figure 2.3). Although conceptually the same, a multitude of video hardware emerged under various names, for example "Video Shifter", "Television Interface Adaptor" or "Video Display Controller." The idea was to


move the burden of memory manipulation, such as drawing simple shapes and textures into the video memory, off the CPU to conserve resources. Over time, video hardware became more and more complex and began including special function units for blitting, i.e. the copying of regions from a texture to the screen, color space transformation, and even 3D polygon rendering. Eventually, video controllers moved from simple display units to actual co-processors by exposing their own primitive instruction sets.

In the 1990s, when home computers were on the rise and graphical work environments such as Microsoft Windows became popular, graphics devices were also used to enhance and accelerate the drawing of 2D surfaces in day-to-day work environments in addition to specialized computer-aided design (CAD) or gaming tasks. By this time, GPUs introduced so-called transform & lighting (T&L) support, the foundation for 3D hardware acceleration and later developments. The T&L units were able to extract a viewport from a vectored 3D scene and transform it into a rasterized 2D image to be shown on screen. Such operations, from simple object rotations to complex virtual camera calibrations and ray casting in computer vision, revolve around matrix multiplications in the affine 3D space. Thus, the idea for general purpose arithmetic co-processing arose to exploit the computational resources on graphics hardware for non-graphics purposes.

Eventually, frameworks were created to specifically facilitate the utilization of graphics cards for general purpose computation. This mode of computation is often named general-purpose computing on graphics processing units (GPGPU). The most notable frameworks are the Khronos Group's OpenCL, Microsoft's DirectCompute and NVIDIA's CUDA [DWL+12]. While OpenCL and DirectCompute may run programs on almost all recent graphics cards, NVIDIA's proprietary, closed-source CUDA system is restricted to NVIDIA hardware. One notable advantage of this restriction is that no cross-vendor standardization is required. This way, specialized features can be exposed by the CUDA interface without requiring specific support from other vendors. In the evolution of OpenCL and its predecessors, this has constantly been a major hindrance and results in CUDA usually being faster when compared on supported hardware [DWL+12]. As the general graphics hardware architectures of various vendors have mostly converged, the following sections describe the functions and units of NVIDIA devices as well as the


CUDA programming model, without loss of generality. NVIDIA traditionally maintains three major product lines. Although the actual processing units are the same in all lines, the peripheral equipment, such as on-board memories, as well as the host system’s drivers may differ significantly.

GeForce The GeForce series is built for the low-cost consumer market and is mainly targeted at gaming computer setups. When compared to other series, its products are optimized towards high-performance low- precision computing as needed for gaming.

Quadro NVIDIA Quadro boards are intended to be used for workstations running professional computer-aided design (CAD) and digital content creation (DCC) software. The drivers and API modules are designed for tight interaction with design software suites and the cards themselves typically contain large amounts of memory.

Tesla The Tesla product brand features devices that are specifically built for general purpose computing and target the high performance computing market. Their physical format specifications allow operation in data centers. Compared to the GeForce series, Tesla devices often feature a four-fold increase in double-precision floating-point arithmetics performance and are therefore often used in scientific computing, for example in large scale weather simulations, machine learning applications and bioinformatics. With the exception of the Tesla C-Series, these devices do not even provide display ports.

The CUDA Programming Model

The NVIDIA CUDA framework aims at opening the GPU's instruction sets to established programming languages such as C, C++ and Fortran and can be seen as an extension to their compilers. Since the introduction of 3D space transformations, GPUs have evolved into capable multi-processing systems. In the NVIDIA Pascal microarchitecture, specifically the GP100 microprocessor used in the most recent Tesla series additions, 54 independent so-called streaming multiprocessors (SMs) are implemented.


Figure 2.4. NVIDIA GP100 processing hierarchy

Each SM further contains 64 cores, totalling a GPU-wide capability of 3 584 parallel threads of execution called CUDA cores. The architecture hierarchy with respect to computation units is shown in Figure 2.4. The CUDA cores in a streaming multiprocessor are organized in warps of fixed sizes between 32 and 64 threads, depending on the device model. Every warp receives a single instruction stream, which leads to a lock-step SIMD style of computation where each thread in the same warp has to execute the same instructions. Sometimes, this is not possible, especially with data-dependent branch instructions. Modern devices permit splitting up the instruction stream to allow a small number of branches to be executed in parallel but generally, the different branches are taken sequentially. Then, only the diverged threads are running while the others not taking this branch are stalled until the streams are joined again. This can lead to performance impacts when larger parts of a warp are idling. Therefore, when programming with massive parallelism on CUDA-enabled devices, keeping the warp divergence at a low level is essential.

Despite executing the same instructions, cores usually operate on different data and therefore require access to thread-local storage. To overcome this issue, each core is allocated a slice of the SM-local register file. This comes with the downside that the number of threads running in parallel is not only limited by the availability of physical cores but also by the size of the register file. If the number of required registers exceeds the amount of available registers, cores are left without scheduled instructions until the workload can be executed.
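A hypothetical CUDA kernel illustrating the divergence effect described above (all names are chosen for illustration only):

    __global__ void threshold(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        // Threads of one warp that disagree on this data-dependent branch are
        // serialized: first all threads taking the "if" path run, then the others.
        if (in[i] < 0.0f)
            out[i] = 0.0f;
        else
            out[i] = in[i] * in[i];
    }

Where possible, such branches can be replaced by predication or branch-free arithmetic so that all threads of a warp stay on the same instruction stream.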


Figure 2.5. Streaming multiprocessor overview

A streaming multiprocessor also keeps a certain amount of SM-local memory that is shared among threads and is thus called shared memory. Additionally, cores share special external units for double-precision floating-point arithmetics ("DP Units"), special function units ("SFUs") for transcendental instructions such as sine, cosine and others, and load/store units to access the GPU-global memory. Figure 2.5 gives a simplified overview of a single SM with only a few cores and peripherals.

A larger-scale overview is given in Figure 2.6 and shows how the streaming multiprocessors are embedded in the whole system. Before a program, the kernel, is launched to the GPU, a grid has to be defined. It contains information about how the programmer intends to run the program among the streaming multiprocessors. In particular, it defines how many threads are to be executed by each SM (the block size) and how many of such sets of threads should be executed (the grid size). In recent developments, if resources allow, several blocks may be scheduled on the same SM concurrently, much like in the CPU's simultaneous multi-threading engine (see Section 2.1.1).


Figure 2.6. GPU overview

The execution of a GPU program ("kernel") therefore works as follows:

1. A CUDA program is compiled on the CPU into device code. The programmer defines a grid to describe the ordering of threads for the global scheduler.

2. The kernel is launched to the GigaThread Engine, which schedules one or more blocks to the available and idle SMs. Often, more blocks are defined than SMs are available. In this case, they are scheduled in a first-in first-out (FIFO) manner and a streaming multiprocessor receives a new block as soon as the old block is processed. Each block is accompanied by the same kernel.

3. Each SM then multiplies the instruction stream according to the number of threads to execute as defined by the grid and allocates register bank slices to the individual cores.

4. The SM-local warp schedulers dispatch instructions to the threads which are then executed in a SIMD-style.
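Steps 1 and 2 correspond to only a few lines of host code with the CUDA runtime API; the kernel below is a hypothetical example, and the block size of 256 threads is an arbitrary but typical choice:

    #include <cuda_runtime.h>

    __global__ void saxpy(float a, const float *x, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    void launch_saxpy(float a, const float *d_x, float *d_y, int n) {
        int blockSize = 256;                              // threads per block
        int gridSize  = (n + blockSize - 1) / blockSize;  // number of blocks in the grid
        saxpy<<<gridSize, blockSize>>>(a, d_x, d_y, n);   // handed to the GigaThread Engine
        cudaDeviceSynchronize();                          // wait until all blocks have finished
    }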


Apart from registers and shared memories that are local to threads or streaming multiprocessors, threads may also access a global memory. Although many implementations provide one or more layers of data caching for global memory, only a very limited number of threads may access the memory concurrently. To prevent global memory accesses from becoming a bottleneck, several techniques have been developed. For example, the load/store units within a thread warp try to reduce the total number of memory transactions by coalescing requests. This is especially efficient if threads in a warp access consecutive addresses in the global memory and no single thread exceeds the memory transaction's maximum data width. In a similar approach, if memory addresses can be easily predicted, the programmer may copy a portion of global memory to the SM-local shared memory and hence reduce memory bus collisions when other SMs issue requests to memory.
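A sketch of this manual staging technique: one tile of global memory is loaded with coalesced accesses into shared memory and then reused by all threads of the block (kernel name, tile size and the final computation are illustrative only):

    #define TILE 256

    __global__ void stage_and_use(const float *g_in, float *g_out) {
        __shared__ float tile[TILE];
        int i = blockIdx.x * TILE + threadIdx.x;

        // Consecutive threads read consecutive global addresses -> coalesced access.
        tile[threadIdx.x] = g_in[i];
        __syncthreads();            // make the whole tile visible to all threads of the block

        // Arbitrary reuse of the staged data without further global memory traffic.
        g_out[i] = tile[threadIdx.x] + tile[(threadIdx.x + 1) % TILE];
    }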

Energy Consumption

Kasichayanula et al. show that while energy consumption on GPUs generally follows the same principles as in CPUs, current NVIDIA processors allow precise measurement and prediction of consumption rates with respect to the utilization of double precision units, memory accesses and general-purpose cores [KTL+12]. They further show that under optimal utilization, average power consumption is typically well within 20 % of the specified thermal design power while, due to efficient power management, idle power consumption quickly drops to 15 % in total.

2.1.3 Field Programmable Gate Arrays (FPGAs)

Field Programmable Gate Arrays are, on the architecture side, hardly comparable with CPUs or GPUs as there is no central unit that executes programs based on a defined instruction set. A program for an FPGA is called a configuration and does not contain sequential or parallel statements that are somehow executed in a certain order, but precisely describes how the circuitry shall connect external pins and internal devices. Hence, a more appropriate name for an FPGA program is hardware description. Being a much lower-level design paradigm than high-level languages such as C++


or Java, whole processors may certainly be implemented through hardware descriptions that do execute regular compiled program code. In fact, this is one of the reasons FPGAs exist.

When highly integrated circuits are designed, transistors and their interconnections are etched into silicon wafers in a highly complex and expensive process. In higher-level functions, circuitry tends to be recurring. For example, a 64 bit adder circuit might be built from 64 1 bit adders, each of them in turn built using two half-adders. A half-adder can be built out of an XOR and an AND gate, and these consist of a number of transistors. The sum of a 64 bit adder could be stored in a register, which consists of a number of flip-flops, and the pattern repeats down to the transistor again.

Adders and registers are fairly trivial and their developers can often identify semantic errors by taking a close look. Larger systems, such as microprocessors built of millions of transistors or logic gates, become virtually impossible to debug by the naked eye. Although sophisticated circuit design tools may provide design rule checks and electrical rule checks to rule out wiring errors, semantic and logic errors may only be found by full-coverage simulations. Instead of building circuits from part libraries, they might instead be described by a hardware description language, such as Verilog or VHDL, and then simulated by a suitable tool. The simulation scripts might be annotated with input values and expected output values to automatically detect errors. Waveform diagrams can be used to evaluate timing requirements on wires. When designs become complex enough, providing full code coverage through simulations also becomes increasingly difficult. Some errors even show up for the first time in the finished products, but etched silicon cannot be changed anymore. Indeed, early microprocessors have a long history of faulty hardware where operating system designers and compiler builders had to deploy workarounds, such as the infamous "Pentium FDIV Bug," where an error in the floating-point unit of the Intel Pentium CPU series yielded wrong or inaccurate results [Vet15]. Moreover, in-the-wild tests with real hardware may easily become very expensive due to the complex manufacturing process. At this point, reconfigurable logic emerged to address the issue. There are FPGAs, for example, that can be reconfigured any time and (virtually) arbitrarily often. How this intricate paradigm can be implemented is shown in the following.


Table 2.3. One bit full adder truth table

x  y  cin    cout  s
0  0  0      0     0
0  0  1      0     1
0  1  0      0     1
0  1  1      1     0
1  0  0      0     1
1  0  1      1     0
1  1  0      1     0
1  1  1      1     1


Figure 2.7. Full adder in elementary logic

Reconfigurable Logic

When designing formulas in Boolean algebra, the first step is to express the expected results as a function of the input values through a truth table, which can easily be implemented as a look-up table (LUT), as shown in Table 2.3. A full adder consists of three inputs, with one bit for each summand and one bit for an incoming carry value that might come from a previous adder or a negative number. The result is a one bit sum and an outgoing carry bit. Further analysis and synthesis yields an elementary-level schematic as shown in Figure 2.7. Instead of actually implementing a full adder through gate-level logic, the truth table can also be directly stored as-is using memory primitives.
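For reference, the schematic of Figure 2.7 realizes the standard full-adder equations:

    s = x \oplus y \oplus c_{in}
    c_{out} = (x \land y) \lor (c_{in} \land (x \oplus y))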


Figure 2.8. Simplified Configurable Logic Block with flip-flops and wide function generators

Here, an 8 × 2 bit memory may store the two bits of result while the three input wires may be used as address lines to that memory. Reading a value from the address 100 will yield the result 01, which is exactly the binary value of 1 + 0 + 0. Given enough memory, arbitrarily large functions can be expressed using storage as look-up tables. Being memory, its contents can also be replaced anytime, implementing completely new or corrected functions if necessary. This method is the fundamental approach used in FPGAs.
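The look-up principle can be mimicked in plain C++ with a small array indexed by the three input bits; this is only a software illustration of the idea, not FPGA code:

    #include <cstdio>

    // Truth table of Table 2.3, addressed by (x y cin) and storing (cout s).
    const unsigned lut[8] = { 0b00, 0b01, 0b01, 0b10,
                              0b01, 0b10, 0b10, 0b11 };

    int main() {
        unsigned x = 1, y = 0, cin = 0;
        unsigned address = (x << 2) | (y << 1) | cin; // address line "100" == 4
        unsigned result  = lut[address];              // yields 01: carry 0, sum 1
        std::printf("cout=%u s=%u\n", result >> 1, result & 1u);
        return 0;
    }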

Configurable Logic Blocks

On Xilinx FPGAs such as the XC6V690T used in this work, look-up tables are organized in so-called configurable logic blocks (CLBs). A coarse overview

is shown in Figure 2.8. A CLB consists of two 64 × 2 bit look-up memories tagged as LUT A and B on the left hand side of the schematic. Their output can be selected by asserting the 6 bit address line inputs A and B. The additional AX input for the lower side controls a multiplexer that is part of the wide function generator. Inputs to the F7 MUX are the upper bits of both LUTs, so when A and B are set to the same values, AX can be used as an additional address line, merging both LUTs into a 128 × 2 look-up table. This scheme can be followed further by using the F8 MUX to merge the local LUTs with the LUTs in the next CLB, each adding an additional address line. On the output side, the lines A and B directly forward the contents of the higher order bit of their respective look-up tables without intermediate logic. The AMUX and BMUX outputs can be configured through the upstream multiplexers that select the higher order bit, the lower order bit or the output of the wide function generator for output. AQ and BQ offer an additional flip-flop to break long signal paths and ease the meeting of timing requirements; it can also be configured as a latch instead of a flip-flop. The last functions shown are the BA and BB inputs that allow the flip-flops to be primed with a default value or used as simple registers for signals from outside this CLB. The most trivial function to implement using a CLB is its look-up table used as is, i.e. as a plain 64 × 2 bit memory called distributed RAM.

The CLB shown above, though, is simplified in that it only shows logic that is connected to the realization of arbitrarily wide functions by using look-up tables. The actual CLBs contain much more logic that, however, is not required to understand the principles of FPGAs. By using additional resources not shown, LUTs may be connected to build regular shift registers and barrel shifters. Also, additional logic is included to allow efficient implementation of adders and multipliers.

On-board Peripherals and Integrated Components

Not all functionality can be realized by using plain look-up tables. Some components may not be expressed by simple zero-latency idealized Boolean algebra, such as clock management. Although it could indeed be possible to design a configuration without using clock signal generators at all, hardly any can be used without synchronization to some low-jitter recurring signal.


Figure 2.9. Block RAM device

Besides CLBs, FPGAs contain a number of digital clock managers that can be configured to clean clock signals from an outside source and generate scaled and phase-shifted frequencies by setting phase angles, clock multipliers and dividers. These clock signals are then fed into specialized clock buffers that serve all on-chip components as clock sources.

Generating larger amounts of memory by using the CLB's distributed RAM is possible. However, this being such a common use case, dedicated memory blocks have been introduced to lift this burden off the logic resources: the Block RAM. A simplified Block RAM cell is shown in Figure 2.9. Each primitive exposes two individual ports to access the underlying 36 kilobit memory array; the two accessors do not even have to agree on a common clock signal as the Block RAM interface contains synchronizer logic. It can therefore also be used as two separate 18 kilobit memories by masking the respective bits on the address lines. Additionally, masking out the write enable lines turns the RAM block into a ROM block, which also allows further optimization by synthesis tools. Block RAM also features arbitrary coalescing into larger memories while keeping a single dual-port interface [UG473].

Another important component is the digital signal processor (DSP) tile. Integer arithmetics also being in frequent use, Xilinx introduced a number of dedicated units that implement typical operations that are also found in regular CPUs' and GPUs' arithmetic-logical units (ALUs), such as integer

additions, subtractions and bit-wise logical operations (AND, OR, XOR, NOT, etc.). Xilinx DSPs, however, extend this set [UG479]:

Ź 25 × 18 bit two's complement multipliers

Ź 48-bit adders with internal accumulator registers

Ź pre-adders for efficient finite impulse response (FIR) filtering applications

Ź SIMD operations for dual half-width (24 bit) or quad quarter-width (12 bit) operand addition, subtraction and accumulation

To ease integration, many FPGA boards feature additional off-chip hardware. These often include PCI Express connectors and the respective high frequency transceivers, memory controllers for DRAM, NAND flash modules for external configuration storage, and many others. In modern FPGAs, such as the recent Xilinx UltraScale series, some of these are already integrated into the FPGA package, such as transceivers for PCI Express usage, or DRAM memory controllers [DS890].

Energy Consumption

By reducing the structure size to the range of 28 nm and operating voltages to 1.0–0.85 V, FPGA vendors successfully keep their thermal design power typically below 30 W while allowing theoretical peak clock frequencies of 750 MHz. The actual power consumption heavily depends on the programmed configuration and therefore on the switching activity. Hence, a high utilization and often-changing look-up tables lead to high power consumption. Some peripherals, such as high-frequency transceivers for off-chip communication, require higher currents, as do clock distribution networks and DSP slices [15]. Fortunately, modern design and synthesis tools such as the Xilinx Power Estimator can estimate the energy consumption and heat dissipation based on the resource utilization within an error margin of less than 100 mW.


2.1.4 Comparison and Suitability Assessment

This work focuses on the partitioning of computational challenges into sub-components suitable for the introduced platforms and especially on their interaction. In this section, the platform suitability for certain tasks is therefore assessed.

CPUs contain a low number of independent execution units. They are clocked at comparatively high frequencies, have very potent branching and branch prediction units and are capable of executing higher arithmetics, transcendental functions and SIMD operations on register sizes of up to 512 bit. This makes CPUs ideal for heavily control-flow oriented applications and iterative processes with data-dependent loop conditions.

GPUs on the other hand contain a rather large number of cores clocked at medium frequencies of approx. 1 GHz. The capabilities of these cores are limited to rather simple ALU operations and low-precision floating-point representations. Co-processing units for cores are available in low numbers, typically in ratios between 1:4 and 1:16, such as special function units (SFUs) for transcendental functions or raised-precision floating-point operation units (DP units). High memory access speeds, sophisticated caching hierarchies and efficient hardware-implemented thread scheduling make GPUs very efficient in highly data-parallel applications with simple arithmetics and without divergence in branching decisions or data dependencies.

The nature of FPGAs, having neither a fixed execution pipeline nor sequential execution units, makes comparing them to more conventional processors difficult. However, the available resources and the internal representation of functions through configured CLBs allow the assumption that very fine-grained parallelism can be implemented. In fact, parallelism can be implemented on a multitude of application layers, from the lowest gate level to complex processors, without any external scheduling supervisor units and the associated overhead of an active scheduler. Typically, FPGAs can use several clock networks where each buffer might operate at frequencies of up to 750 MHz [DS182]. Despite the comparatively low frequencies, with the help of DSPs, Block RAM and the implementation of intermediate clock domains, deep pipelines and systolic fields can be constructed easily. Data word sizes can be specifically tailored towards the application's requirements as no native register sizes restrict the uses.


With careful design, several concurrently operating pipelines can generate large amounts of data as long as no data dependencies arise and the pipeline output can be moved off-chip on time. These findings are summarized in Table 2.4.

Table 2.4. Platform suitability overview

Task                                                 CPU        GPU        FPGA
Operate on data with many data inter-dependencies
  or data-controlled loop conditions                 very good  bad        bad
Perform simple arithmetics on standard data types    good       very good  very good
Working with custom data types                       bad        bad        very good
Simple floating-point operations                     good       very good  good
Higher arithmetics                                   good       medium     bad
SIMD programming style                               bad        very good  very good
Pipeline operation                                   bad        bad        very good
Operation on data streams                            very bad   good       very good
Memory access                                        medium     very good  good
Parallelism potential                                low        high       very high

2.2 Data Transfer and Communication

This section gives an overview of the communication technologies used in the implementation of the prototype system. Short historical and motivational statements are given, along with the technical details required for a thorough understanding of the design decisions relating to communication between the involved devices.


2.2.1 Direct Memory Access

In high-throughput hardware and software applications such as the proposed one, managing the data flow and processing protocol handshakes becomes a major burden that even modern processors struggle to handle. This section introduces state-of-the-art technology to move large amounts of data from one subsystem to another using dedicated hardware instead of general-purpose processors.

Until the introduction of DMA, around 1981, the only way of moving data between peripherals was Programmed I/O (PIO). In this transmission mode, the CPU is instructed to write to or read from a device and to wait for its completion. Typical peer devices in regular computers were PS/2 mice and keyboards, RS232 serial interfaces, sound cards and IDE/ATA hard disk drives. Moving the head assembly on the actuator arm inside a hard disk to a different track (seeking) easily takes several milliseconds, making the PIO instruction block the CPU for millions of clock cycles during which no other instruction can be executed. Therefore, no transmission to or from other devices may take place, and the current transmission's throughput is directly affected by the CPU's instruction processing time. Even in modern systems with multiple CPU cores and frequencies of several gigahertz, this can become a problem. High-end server-grade network cards often require bandwidths of 10 GBit per second and more; 2 GHz processors would certainly be put under heavy load. Even graphics cards on modern consumer-grade gaming workstations expect the system to deliver 10 GB/s and more.

In 1981, the IBM PC was launched containing an Intel 8088 CPU, one of the first consumer computer systems making use of Direct Memory Access (DMA). This technology allowed a microprocessor to offload the transmission handling to a separate DMA controller, in these early systems the Intel 8237. Now, the CPU set up the DMA controller with an offset and a length via PIO and started the transaction. While the DMA transaction was executing, the CPU could continue processing until the DMA controller notified the CPU that the work had been done. Indeed, the 8237 even supported four almost independent channels with transfer throughputs of 1.6 MiB/s per channel, a respectable speed for the early 1980s [Int93].


Since then, the use of DMA technology has evolved heavily. Today, most of the more data-intensive peripherals are connected via high-performance and DMA-capable links, such as SCSI, Serial ATA or PCI Express. Several new transport functions and modes have been added, such as scatter/gather operations, which play a major role in the transport mechanisms used in our hybrid system.

Scatter-Gather Operation

Modern operating systems organize physical memory in pages. When a process writes to virtual memory, the memory block is separated into (usually) fixed-sized memory pages whose virtual addresses are translated to physical addresses. For various reasons, such as fragmentation, pages from a virtual memory block are not necessarily contiguous in physical memory. The translation between virtual addresses and physical addresses is usually done with the help of page tables as shown in Fig. 2.2.

DMA controllers though, programmed with only an address, offset and length, typically expect a contiguous memory region. Naturally, the maximum buffer size for a DMA transfer is therefore capped by the ability of the operating system to provide such a region. Tests with a custom DMA driver in recent Linux versions have shown that it is hardly possible to acquire contiguous memory blocks larger than 4 MB, regardless of the available physical memory. This has several reasons. Firstly and most importantly, a user-space application has very little control over the virtual memory allocation modes. In particular, it cannot request memory that is contiguous as the Linux system calls responsible for memory management (brk, sbrk and mmap) [TOG13] purposely do not provide means for that. Therefore, the transmission buffer has to be allocated in the memory space reserved for the kernel, where memory can be allocated directly from the virtual memory subsystem instead of using system calls. Secondly, the kernel's view of memory is separated into zones. One of the zones, ZONE_DMA, is the memory region generally suitable for DMA. Unfortunately, many devices and buses in a computer do not support full-width addresses. ISA buses, for example, only allow DMA communication with 24 bit addresses [Int93], but the DMA zone claims that memory taken from it generally works for


DMA. Therefore, the DMA zone on an x86 system is 2^24 bytes, or 16 MiB, in size. Devices on the PCI bus (or one of its direct successors, such as PCI Express), however, do not have this limit and are able to work with memory located throughout the (physical) address space; the memory still has to be contiguous, though [LDD3, pp. 214–216].

Generally, when contiguous memory is required, the memory region has to reside in the kernel memory space. As user applications do not have access to kernel memory for security reasons, processes give a userspace-allocated buffer to the kernel which then has to copy the data back and forth. This conceptually unnecessary copy operation reduces the whole DMA operation to absurdity, as the CPU has to copy one word at a time from one buffer to the other. Although the intra-memory copying is certainly faster than moving data between devices, the CPU is utilized, which is exactly what DMA was designed to reduce.

The solution is DMA operation in scatter/gather mode. This DMA function requires a more complex setup in the kernel driver as well as in the DMA controller. In this mode, the userspace process may allocate an arbitrary block of memory by the usual means. The userspace buffer address is then handed over to the responsible driver. Contrary to the previously described DMA approach with kernel buffers (Streaming DMA), the driver translates the userspace buffer into a list of pages. Then, entries in the page list that correspond to physical memory regions next to each other are coalesced into single entries, yielding a list of contiguous memory regions called the scatterlist. Modern DMA controllers also allow programming with whole scatterlists instead of single (address, length, offset) tuples. For that purpose, the memory occupied by the scatterlist itself has to obey the same restrictions as if it were transferred with streaming DMA. Therefore, in the extreme case, the scatterlist has to fit into a single memory page. In current Linux systems, a (regular) page is 4 096 bytes. Assuming that addresses are 64 bit and length and offset are 32 bit each, a scatterlist entry is 16 bytes. Hence, a full hypothetical scatterlist with 256 entries may address only 1 MiB of memory when no coalescing is possible (e.g. due to busy or fragmented memory), so for every megabyte of data, a new DMA transaction has to be started. Although this is already a huge improvement over the streaming DMA mode, there is still room to further reduce the CPU utilization. One obvious

solution is to use larger pages. Modern CPUs and operating systems already support larger pages ("HugeTLBs" in Linux, "Large Pages" in Windows, "Super Pages" in BSD), but at least in Linux, there is no standardized way to make use of this feature. Another solution is scatterlist chaining. In this method, the last entry of a page-fit scatterlist is marked as "more are following". When the DMA controller has started processing the scatterlist entries one by one, it will eventually reach the last one. The flagged entry tells the controller that there are more entries, despite the header word which stated the list length, and that the next word in the list is a header of a new list. This way, several scatterlists can be chained together without interrupting the transaction. Some DMA controllers organize the scatterlist buffer in a ring buffer. Here, a new scatterlist can be loaded into the controller buffer as soon as the controller has started the second list, ultimately allowing arbitrary userspace buffers to be transmitted to and from peripheral devices with minimal computing overhead.

Latest developments show that this scale of performance in DMA-enabled communication links is not only required within a computing system, but also between computing systems. Several implementations of a technology called Remote DMA (RDMA) have evolved to move data between different computer systems' main memories without involving their operating systems and/or CPUs. This can be done, for example, by enabling both systems' interface cards to perform DMA transactions with their local main memory and connecting both cards with a direct link so that no routing is required. Common implementations of RDMA include InfiniBand, iWARP and RoCE (RDMA over Converged Ethernet).
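For concreteness, the scatterlist entry assumed in the size estimate above (64-bit address, 32-bit length and offset) can be sketched as follows; the real Linux struct scatterlist and real DMA controllers differ in detail, so this is only an illustration of the concept:

    #include <cstdint>

    // Hypothetical 16-byte scatterlist entry as assumed in the calculation above.
    struct ScatterlistEntry {
        uint64_t physical_address; // start of a contiguous physical region
        uint32_t length;           // length of the region in bytes
        uint32_t offset;           // offset into the first page; one bit could be
                                   // reused as the "more entries follow" chaining flag
    };

    static_assert(sizeof(ScatterlistEntry) == 16, "entry must match the 16-byte estimate");

    // 4 096-byte page / 16 bytes per entry  = 256 entries per page;
    // 256 entries * 4 096-byte pages        = 1 MiB addressable without coalescing.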

2.2.2 PCI Express

PCI Express (PCIe) is a standardized communication system in modern computer architectures that is used to interconnect peripherals, such as hard disk drives, network interface cards, and graphics accelerator cards. It is considered the successor to ISA, PCI and AGP, although they do not have much in common. At least the differences between AGP, PCI and PCI Express are usually abstracted in operating systems in a way that generic drivers, e.g. for sound cards, generally do not need to know the actual


communication architecture. Although often called a "bus", PCIe is actually a system of serial point-to-point connections, where every device is directly connected to a switching fabric, the PCI Express switch, which at this level functions much like its Ethernet equivalent. On the physical layer, also called "PCIe PHY", PCI Express devices are connected through an edge connector. While a standard PCI interface contains 124 connections, 56 of them solely for power supply and signal ground, a PCI Express connector requires only 36 connections, including 18 power supply and ground connections. This is mostly because PCI Express does not need bus arbitration and handshake pins (as it is not a bus), and data is transmitted serially instead of in parallel.

Line Code and Data Rates

On the physical layer, data communication is implemented by using lanes, full-duplex pairs of LVDS (low voltage differential signaling) wires, accompanied by the same number of ground connections, totaling eight additional wires per lane. All lanes together form a full-duplex logical link. For Generation 1, the data rates are specified as 2.5 GT/s, which is an informal description of the number of transfers per second and incompatible with the Système international d'unités (SI) due to its symbol name clash (where T is reserved for the Tesla unit) [TT08]. A more technically sound measure would be the symbol rate as it is regularly used in information theory [Bel53]. It is defined as the modulation rate of a signal, i.e. the number of distinct symbol changes per second. Its SI-conforming unit is the baud (Bd). Among many other communication protocols, PCI Express Gen 1 uses the 8b10b line code, where 8 data bits are encoded in 10 line bits. These line bits are generally called transfers, as in 2.5 Gigatransfers per second, and can therefore be written more precisely as 2.5 GBd/s. Now it is clear that a lane in PCI Express Gen 1 cannot transfer 2.5 Gb/s of net payload data but only 2.0 Gb/s due to the 8-in-10-bit encoding scheme. A data rate overhead of 20 % being relatively high, the line code for PCI Express Generations 3 and 4 has been changed to 128b/130b, reducing the line-code overhead to approx. 1.5 %. The data rates shown in Table 2.5 include line-code overhead but do not take PCI Express protocol overhead into account; the actual application-layer net throughput heavily depends on the application profile.


Table 2.5. Data rates per lane by PCI Express generation

Generation   Line Code   Line Rate     Net Speed
1.0/1.1      8b/10b      2.5 GBd/s     238 MiB/s
2.0/2.1      8b/10b      5.0 GBd/s     477 MiB/s
3.0/3.1      128b/130b   8.0 GBd/s     939 MiB/s
4.0          128b/130b   16.0 GBd/s    1 869 MiB/s

Regarding net data rates after protocol overhead, [BK08] states:

Like other high data rate serial interconnect systems, PCIe has a protocol and processing overhead due to the additional transfer robustness (CRC and acknowledgements). Long continuous unidirectional transfers (such as those typical in high-performance storage controllers) can approach >95 % of PCIe's raw (lane) data rate. These transfers also benefit the most from increased number of lanes (×2, ×4, etc.). But in more typical applications (such as a USB or Ethernet controller), the traffic profile is characterized as short data packets with frequent enforced acknowledgements. This type of traffic reduces the efficiency of the link, due to overhead from packet parsing and forced interrupts (either in the device's host interface or the PC's CPU). Being a protocol for devices connected to the same printed circuit board, it does not require the same tolerance for transmission errors as a protocol for communication over longer distances, and thus, this loss of efficiency is not particular to PCIe.
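The net speeds of Table 2.5 follow directly from the line rate and the line-code efficiency; as a worked example for a single Generation 1 lane:

    2.5\ \text{GBd/s} \cdot \tfrac{8}{10} = 2.0\ \text{Gbit/s} = 250\ \text{MB/s} \approx 238\ \text{MiB/s}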

DMA Usage

As stated earlier in Sect. 2.2.1, PCI Express devices support the use of DMA. Unlike older buses such as ISA, PCI and PCIe do not have a central DMA controller. Instead, any device may become "bus master" to directly initiate the transactions, effectively becoming a DMA controller for the


duration of the transaction. Especially in buses or architectures where several devices may become master at the same time, arbitration algorithms have to be employed. Several Quality of Service (QoS) parameters, such as traffic classes and virtual channels, are required for standards-conformant operation [BAS03]. This way, connections with real-time priorities, such as audio streams that are naturally transmitted via synchronous means, may be embedded in PCI Express' isochronous transactions.

Link Negotiation

PCI Express link management protocols are designed for highest compatibility in a way that most devices are fully backwards compatible. Therefore, many features that come with updated revisions are optional and negotiable. Newer cards may not only negotiate software peculiarities such as power management features, but even link widths and low-level encodings. For example, consumer motherboards often provide PCI Express female edge connectors with the physical length of 16-lane slots while their electrical wiring only supports four lanes. On the other hand, (physically) four-lane cards may be inserted in any slot where they physically fit, e.g. fully wired 16-lane slots. To help cope with these lane width compatibility features, several loop-back pins exist on these cards: after the first lane pair block and after the fourth, eighth and sixteenth lane pair block, two otherwise unused pins are short-circuited. Hence, the PCI Express controller can easily set up initial lane allocations without using the not-yet established communication links.

Unfortunately, many chip sets and processors do not provide enough lanes to satisfy a full allocation request, i.e. the motherboard provides more lanes on its sockets than the controllers support and all sockets contain devices with maximum lane count. In these situations, the computer's system software (BIOS, UEFI, etc.) and operating system may re-negotiate the lane allocations. Some systems even allow the end user to configure allocation limits for certain slots. During the initial negotiation and further re-negotiations, not only the lane allocations but also protocols and features can be changed and agreed or declined upon. Arguably, the most important feature for initial connections might be the PCI Express generation. Generally, the generation used defines the line code and transfer rates. The

current PCI Express 4.0 standard defines the generations listed in Table 2.5. Other features include advanced power management and error reporting and recovery mechanisms.

2.3 Parallel Computing

The implementations presented in this work are heavily built upon parallel programming building blocks. To fully grasp this matter, it is essential to develop a thorough mental model of the parallelization aspects of this architecture. This section aims at laying the foundation and introducing these building blocks.

2.3.1 Parallelism and Machine Models

Parallelism and parallelization may occur at different abstraction levels throughout an application. Many of these mechanisms happen automatically or implicitly, in a way that does not need any interaction with, or even knowledge by, users, programmers or software architects. Other methods have to be explicitly enabled and carefully used. The most important aspects of automatic parallelization and vectorization, such as pipelining and superscalarity, have been discussed in Sect. 2.1.1. Assisted parallelization and concurrency can be easily used through pre-built software libraries, such as OpenMP and MPI. More abstract and higher-level techniques can only be employed with deep knowledge of a specific problem, such as the problem partitioning used in the proposed work. In some cases, programmers and hardware designers even create new parallelization techniques and functions from scratch to be used for a certain class of applications. The applications and architectures developed here are carefully designed to include almost every layer of parallelism, from single instructions to the rough hardware component schematic, and an understanding of these layers is required to properly evaluate the nature of the hybrid-parallel prototype architecture.


Flynn's Taxonomy

When parallelization and parallel operations became popular in computer science, Michael J. Flynn proposed a classification scheme for parallel execution models later known as "Flynn's Taxonomy" [Fly72]. The original classes are defined as follows.

Single Instruction stream Single Data stream (SISD) describes the simplest execution mode. Each scheduled instruction is executed exactly one after another and processes exactly one datum before the next instruction is scheduled. This is a typical mode for consumer-line CPUs before the streaming instruction set extensions were introduced.

Single Instruction stream Multiple Data streams (SIMD) is a mode where a single instruction operates on multiple data streams as supported in modern processors since the Multimedia Extensions (MMX). Current instruction set extensions, such as Advanced Vector Extensions 512-bit (AVX-512) and Fused Multiply/Add (FMA), usually focus on SIMD instructions.

Multiple Instruction stream Single Data stream (MISD) is a rather exotic mode with low coverage in regular CPUs. Applications requiring high fault tolerance may schedule a number of instructions on the same data set where the different execution engines then have to agree on the correct result. Another example is a systolic array where a single data stream is processed by pipelined execution nodes. To some degree, the classification of systolic arrays in Flynn's Taxonomy is contested. This will be discussed in detail in Sect. 2.3.2.

Multiple Instruction stream Multiple Data stream (MIMD) became more popular with multiprogramming processors and superscalar systems. Operating systems schedule multiple instructions to be executed at the same time on different nodes, each one associated with its own data set. MIMD systems range from simple multi-core CPUs to distributed platforms. As opposed to SIMD, instructions need not be executed in lock-step but may run fully asynchronously.


When parallel processing and parallel architectures became a more popular field of research, some authors felt the need to extend or further subdivide Flynn's taxonomy to account for modern developments. An interesting example that is hard to classify with the mentioned terms is the Cell processor found in Sony's home entertainment system "PlayStation 3". It has been developed by Sony, Toshiba and IBM since 2001 and features a number of general-purpose processors, each one accompanied by eight service processors that are specialized towards data-intensive processing and floating-point calculations [GHF+06]. Service processors and main processors operate in a master/slave environment and run different programs at the same time but asynchronously. Although this seems to qualify for a MIMD classification, the slave processors' work units are directly scheduled by their master processor, not by some higher-level operating system, as opposed to the master processor itself. This kind of work-sharing construct would not be properly reflected without a subdivision of MIMD that is called Multiple Programs, Multiple Data Streams (MPMD) [GFB+04]. A more commonly used subcategory proposal for MIMD is Single Program, Multiple Data Streams (SPMD) [DGN+88; Dar01]. Analogous to MPMD, multiple instructions are executed in parallel with multiple data streams but within a single program. This is a typical class for all kinds of multi-threaded programs and therefore easily the most common style of parallel programming.

2.3.2 Systolic Arrays

Most processors such as CPUs and GPUs are either based on Von Neumann or Harvard sequential architectures and as such depend on complex, centrally orchestrated memory accesses, instruction execution pipelines, program counters, register files and others. Although vendors have recently added certain small-scale parallelism functionality, the underlying architecture is still tied to sequential processes. As a large part of the problem is implemented using a systolic array-based solution, a short introduction to systolic structures is given, using the simple example of polynomial evaluation.

A systolic array is a tightly coupled network of identical, primitive nodes. These nodes can be interconnected arbitrarily, but most often, linear arrays, two-dimensional meshes, hypercubes, and other popular network topologies are used.


Figure 2.10. Polynomial evaluation using systolic arrays

No central unit coordinates data or control flows and the overall operation is similar to pipelining. For example, polynomials can be computed efficiently using linear systolic arrays. A polynomial in its base form

y = a_0 + a_1 x + a_2 x^2 + \dots + a_k x^k

can be transformed into a suitable expression using Horner’s Rule:

y = (\dots((a_n x + a_{n-1}) \times x + a_{n-2}) \times x + \dots + a_1) \times x + a_0

The factorization now consists of a recursive sequence of multiply-and-add operations and can be implemented using a linear systolic array with inputs from two directions. The result is shown in Figure 2.10 using fused multiply/add units. The first node multiplies its input (0) by x and adds a_n, and forwards the result to the next node, which multiplies the value again by x and adds a_{n-1}. The first two nodes therefore calculate the subterm a_n x + a_{n-1}. Every successive node adds another pair. Once the first output has been generated after n cycles (with n being the array length), one polynomial can be evaluated every cycle with a latency of n cycles. As seen from a CPU-like standpoint, the network processes n MUL/ADD operations per cycle. As processing nodes only perform primitive operations, they also

consume far fewer resources in terms of logic gates and transistors than fully pipelined CPUs. Furthermore, due to their simplicity, they can also be cascaded into large networks and, with rising numbers of nodes, easily outperform even the most sophisticated vector extensions (e.g. AVX-512). While systolic arrays are very efficient by exploiting low-level parallelism, they can only be used with problems that can be expressed in a suitable and simple form. Additionally, common architectures such as CPUs and GPUs are not flexible enough for systolic chain implementations as inter-core communication can easily become a bottleneck. Naturally, FPGAs do provide the required flexibility and are a regular target for systolic array-based signal processing algorithms (convolution, Fourier transform, lattice filters), higher dimensional arithmetics (matrix multiplication, inversion, decomposition) and artificial neural networks [ZP02].
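As a functional reference (not the systolic implementation itself), the recursion that each node of Figure 2.10 computes in hardware can be written as sequential C++:

    #include <vector>

    // Evaluate y = a[n]*x^n + ... + a[1]*x + a[0] via Horner's rule.
    // Each loop iteration corresponds to one multiply/add node of the linear array.
    double horner(const std::vector<double>& a, double x) {
        double y = 0.0;                       // the constant 0 fed into the first node
        for (int i = static_cast<int>(a.size()) - 1; i >= 0; --i)
            y = y * x + a[i];                 // fused multiply/add per node
        return y;
    }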


Chapter 3

Applications

3.1 A Primer on Bioinformatics

With the hybrid-parallel architecture focusing on applications in bioinformatics, the following sections provide an overview of the biological and genetic backgrounds. This knowledge is required to fully understand how and why these applications and algorithms have been selected as well as the importance of the results and consequences.

3.1.1 DNA and Chromosomes

All known lifeforms on earth, from complex mammals such as humans to the simplest viruses and mycoplasma, store their hereditary information in special macromolecules called deoxyribonucleic acid (DNA) [AJL+15, Chap. 1]. DNA consists of a potentially long array of nucleotide pairs. Nucleotides contain deoxyribose (a sugar) as a fixture and one of four nucleobases, adenine, thymine, guanine or cytosine, as the actual genetic code, usually abbreviated by their first letter. These sugar fixtures are fused together with phosphates, creating a helix structure. The DNA, though, contains a second strand, anti-parallel to the first. By their molecular structure, adenine may only oppose thymine while guanine may only oppose cytosine. The two nucleobases are tied together by hydrogen bonds, forming what is called a base pair. Besides several biological implications, the double helix is more fault-tolerant than a single strand because of its redundant nature. A schematic overview of the DNA topology is shown in Fig. 3.1. From an information scientist's point of view, the DNA may act as a universal linear data storage.


Figure 3.1. A DNA schematic with nucleotides and sugar/phosphate fixture

The information itself is encoded with an alphabet of four different symbols, where every symbol is stored along with its inverse, introducing a certain level of redundancy. Eukaryotes, organisms with cell membranes and distinct cell organelles, contain their genetic material within a nucleus. During (regular) cell division, the DNA has to be replicated to provide a copy for the newly created cell. This process is called mitosis for regular cells and meiosis for gametes. In the very first step, the interphase, the DNA is duplicated. During the next phase, the prophase, the single long double helix is tightly condensed and cast into chromosomes, stabilized by specialized proteins organized into nucleosomes. A human cell typically contains 46 chromosomes of varying lengths, divided into two gonosomes and 44 autosomes. Like most mammals, humans are diploid: their cells contain 22 autosomes and one gonosome from their direct maternal ancestor and the same set from their paternal ancestor. Genetic information that will be used for purposes in bioinformatics is usually extracted during the interphase or the following metaphase, yielding chromosomes instead of a single double helix, as shown in Fig. 3.2. In literature and biobanks, chromosomes are sorted and indexed by length, 1 being the longest chromosome, 22 the shortest. Due to their special roles, gonosomes are not indexed by number but by X or Y, depending on their type. The data stored in DNA serves a multitude of purposes. Many regions encode the physical structure of proteins. When a protein is to be synthesized, a certain region of the DNA is copied and processed into messenger ribonucleic acid (mRNA), a single nucleotide strand similar to those known from DNA.


Figure 3.2. Set of 23 chromosomes from a human male

The mRNA encodes the protein structure by the use of codons, triplets of nucleotides. With four possible nucleotides, 64 possible codons exist, of which 61 are used to encode the 20 canonical amino acids proteins are created from. The remaining three codons mark the end of an encoding region and terminate the translation process. The synthesis product is a chain of amino acids whose exact types and overall ordering determine the physical structure and, therefore, the function.

3.1.2 Genes

A gene is a region in the DNA with a certain function. From an evolutionary point of view, a gene is a molecular unit of heredity [AJL+15], where its transmission to offspring often means the transmission of phenotypic traits. Although only parts of a DNA strand are currently considered to be genes, the whole DNA including its associated mechanisms (ribonucleic acid, genomes of mitochondria, etc.) and its non-coding regions is called genome or genetic material. Currently, approx. 20 000 genes are estimated to be present in the human genome [PS10]. Mutations are alterations to the genetic material of an organism. They may result from a multitude of causes. Popular examples are sunburns

as a form of direct DNA damage, where UV rays with relatively high energy impact the thymine's ability to bind with adenine, effectively breaking the base pair at this location [Goo01]. Since DNA is not a rigid crystalline formation but a flexible organic molecule, the strand may overlap itself. This may also happen with chromosomes overlapping each other or themselves. At these locations, one strand may simply cut the other, permanently reducing the information. This process is known as deletion and can affect any number of nucleotide pairs [Lew04]. Another source of mutation is the DNA replication and synthesis mechanism itself, where a number of base pairs may be inserted into the strand, deleted or replaced as a result of molecular decay or error-prone repair processes [AJL+15, Chap. 5].

3.1.3 Single Nucleotide Polymorphisms (SNPs)

For this work, however, only single nucleotide polymorphisms (SNPs) are of interest. As the name suggests, a single nucleotide pair is replaced with another consistent pair. Prior research suggests that many traits and (genetic) diseases can be explained by the characteristics of a single SNP [TKF+11], such as sickle-cell anemia [Ing56]. SNPs can be found throughout the genome, within genes and also within non-coding regions. A SNP is defined by its base position in the DNA, and its possible base pair variations are said to be alleles for this position. As explained in Sect. 3.1.1, the human genome consists of a maternal and a paternal set of chromosomes. Therefore, every SNP at a specific location in the genome has two related base pairs, of which usually only the first part of each pair is mentioned, as the second part is redundant. For example, if an individual has the guanine/cytosine pair at both the maternal and the paternal SNP location, this may be written as GG. If the variation G at the SNP's location is the predominant configuration (allele) in the individual's respective population, it is generally called wild type, as opposed to the variant type. Therefore, a SNP may have the following configurations:

Ź Homozygote wild, where both the maternal and paternal part contain the predominant allele in the individual’s population.

Ź Homozygote variant, where both parts possess a variant type allele.


Ź Heterozygote, where one part has a variant allele and the other part a wild type allele.

3.1.4 Genome-wide Association Studies (GWAS)

Association studies are examinations of potentially large numbers of genetic variations in different individuals to find correlations between genetic variants and certain traits like diseases. In a typical set-up, individuals are grouped into cases, where a certain trait is present, and controls, where it is not. Results from genome-wide association studies (GWAS) may therefore identify SNPs that are somehow associated with and influential to a disease, but cannot specify which genes actually cause the trait to manifest [Man10]. One of the more popular GWA studies, provided by the Wellcome Trust Case Control Consortium (WTCCC), is used extensively in this work. The study is designed around 2,000 sampled individuals for each of seven diseases: type 1 diabetes, type 2 diabetes, coronary heart disease, hypertension, bipolar disorder, rheumatoid arthritis and Crohn's disease. For the control group, 3,000 individuals have been sampled from the same population based in Great Britain [BCC+07].

3.1.5 Epistasis

In genetics, the term epistasis refers to the phenomenon where a gene is influenced by its genetic background. These influences may be dependencies on the presence or absence of one or more other genes called modifier genes. Other examples are influences between alleles in a single heterozygote or between a gene and the environment. Often, the combined expression cannot be explained solely by the sum of the effects of the single genes. Therefore, epistatic interaction is classified as follows:

Ź Additive Epistasis. Purely additive effects of the single genes. As most genes exhibit at least some level of epistatic interaction, this type is relatively rare [Kau93].

Ź Magnitudal Epistasis. The combined effect is larger than the sum of the genes' primary effects. It is suggested that magnitudal epistasis be


further divided into positive and negative [Phi08; BCP+04] or synergistic and antagonistic effects [CC10].

Ź Sign Epistasis. A genetic mutation has the opposite effect when in the presence of a certain other genetic mutation [WWC05].

It is therefore appropriate to search not only for single genes but also for combinations of genes.

3.1.6 Acquiring Data

When planning and organizing a genome-wide gene-gene association study, a large number of SNPs is analyzed whose positions in the genome are already known. For example, in the WTCCC1 study, approx. 500 000 SNPs have been analyzed, within approx. three billion base pairs. This subset of the genome is called the genotype. Before software tools are able to analyze SNPs and their alleles by investigating encodings of wild types and variant types, these first have to be translated from a biological representation, i.e. the actual DNA, to a digital representation that computers can process. One approach to digitize the SNPs' characteristics works by chemically marking the desired SNPs in the genetic material. Every marker individually identifies a certain SNP. Furthermore, the alleles in question are replaced with fluorescent equivalent base pairs. Then, the material flows over a special chip called a microarray. This device contains an array of probes where each probe allows one of the previously installed markers to bind. After removing the leftovers, determining the allele configurations becomes a matter of taking a photograph of the chip to capture the fluorescent emissions. This photograph can then easily be processed into digital formats. A fragment of a DNA microarray shot is shown in Figure 3.3. While the first commercially available microarrays only allowed very few SNPs to be processed, current technologies support anywhere from hundreds or thousands up to 4.5 million SNPs on a single array in the Illumina Inc. Omni series [Ill16]. With arrays becoming larger over time and epistasis analysis targeting higher orders instead of singular SNP effects, analysis in bioinformatics is becoming a major challenge in terms of available computational resources.


Figure 3.3. Fluorescent emissions from a microarray

3.2 Exhaustive Interaction Search

Data from genome-wide association studies are well suited for analyses of traits where the phenotypic outcome can be clearly distinguished into “affected” and “not affected”. These traits are termed binary traits. Even though single-SNP effects certainly account for a large number of traits, single-SNP analyses of even millions of SNPs are computationally trivial. However, research suggests that epistatic gene-gene interactions might play a major role in trait expression [Mah08; MAW10]. Single-SNP analysis methods may combine SNPs with strong effects, but then the SNP combination may only express additive epistasis. This method of interaction analysis is obviously not powerful enough to model interactions between SNPs that do not have primary (single-SNP) effects. Evaluating all actual combinations of two SNPs raises the computational complexity by an order of magnitude: from linear in the number of SNPs to quadratic. The exact number of tests can be calculated with the help of elementary combinatorics, specifically the binomial coefficient, by counting all 2-combinations without repetitions. Let n be the number of SNPs and k = 2 the combination order.

C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}

More formally, a k-combination of a set is a subset of k distinct elements of it. In this case, it yields all sets of SNP combinations of length 2 where the order does not matter, i.e. {i, j} = {j, i}, and a combination may not contain the same SNP twice, i.e. {i, i} does not exist. In this work, the terms “2-sets” and “pairs” are used interchangeably, although in the strictly mathematical sense a pair is a 2-tuple, and tuples are defined to be ordered sequences instead of unordered sets where the ordering of items is irrelevant. When evaluating the already introduced WTCCC1 data set, more than 500 000 SNPs have to be analyzed [BCC+07]. Although this is already a fairly large study, even by today's standards, the single-SNP evaluation time is certainly a matter of seconds, depending on the actual method used. With the power raised to second-order interactions, 500 000 evaluations become C(500000, 2) ≈ 125 × 10^9 evaluations. Whole data set analyses in these orders of magnitude are clearly a much more interesting problem when it comes to computational complexity and motivate the invention of specially designed hardware and software architectures. Recent research also suggests that even more complex traits exist, such as the expression of psoriasis, that seem to follow a larger network of genes and might even influence other serious diseases [ZSX+16; ALI+00; TE14]. These include inflammatory bowel diseases, hypertension, diabetes and others [KMS+07]. These findings show that the search for interactions of orders higher than two may reveal additional interactions. As a proof of concept, this work additionally presents an implementation of third-order epistatic interaction analysis. Considering the above data set, the number of evaluations grows to C(500000, 3) ≈ 2.1 × 10^16, i.e. exponentially in the order of interaction. Even if such an evaluation consumed as few as ten CPU clock cycles at the highest available frequencies, an exhaustive analysis would take just under two years of runtime.
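The following small C++ program (an illustrative calculation, not part of the thesis software) evaluates the binomial coefficient for the two cases discussed here, i.e. the number of pairwise and third-order tests for a WTCCC1-sized data set.

// Illustrative: number of exhaustive tests C(n, k) for k = 2 and k = 3.
#include <cstdint>
#include <cstdio>

// Exact C(n, k) for small k; the intermediate product is always divisible by i.
uint64_t binomial(uint64_t n, unsigned k) {
    uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}

int main() {
    const uint64_t n = 500000;   // approx. number of SNPs in WTCCC1
    std::printf("pairs:   %llu\n", (unsigned long long)binomial(n, 2)); // ~1.25e11
    std::printf("triples: %llu\n", (unsigned long long)binomial(n, 3)); // ~2.1e16
    return 0;
}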


               Cases (SNP A)             Controls (SNP A)
               w      h      v           w      h      v
SNP B    w     n000   n010   n020        n001   n011   n021
         h     n100   n110   n120        n101   n111   n121
         v     n200   n210   n220        n201   n211   n221

Figure 3.4. An unnormalized contingency table

Hence, only a suitably sized subset of WTCCC1 can be analyzed using the methods presented here.

3.2.1 Contingency Tables

To model forms of interaction other than plain additive combinations, such as magnitudal or sign epistasis, the interactions themselves have to be investigated. A common tool in survey research is the contingency table. It is a type of matrix that displays the (possibly multivariate) frequency distribution of the involved variables and can be used to assess the relations between them. Figure 3.4 shows a suitable contingency table format for two-SNP interaction. Each half of the displayed table describes the allele distribution of a SNP combination, with the case group on the left-hand side and the control group on the right-hand side. The “w/w” cell on the right side, for example, contains the number of control group individuals where the alleles of SNP A and SNP B are both of the wild type. Note that this type of table, where the cells contain actual counters, does not show a frequency distribution. When programmatically creating contingency tables, this is often an easier format but can trivially be processed into actual frequencies by normalizing the cell values to the respective group size. This unnormalized type of display has a few mathematical implications that will be important for the implemented methods in later chapters. For example, the column sum of SNP A's wild types in the case group equals the total number of wild types for SNP A's cases and is constant for every two-set of SNPs of which SNP A is a part. This observation holds true for any column and row in the table and will later be used to reduce the required contingency table storage by removing this redundancy.
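As a minimal sketch of how such a table is filled, the following C++ fragment counts the genotype combinations of one SNP pair into an unnormalized table. The type and function names as well as the in-memory genotype coding (0 = homozygote wild, 1 = heterozygote, 2 = homozygote variant) are illustrative assumptions and do not reflect the FPGA representation described later.

// Illustrative sketch: filling an unnormalized contingency table for one SNP pair.
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ContingencyTable2 {
    // counts[group][gA][gB], group 0 = cases, group 1 = controls
    std::array<std::array<std::array<uint32_t, 3>, 3>, 2> counts{};
};

ContingencyTable2 buildTable(const std::vector<uint8_t>& snpA,
                             const std::vector<uint8_t>& snpB,
                             const std::vector<uint8_t>& isControl) {
    ContingencyTable2 t;
    for (std::size_t s = 0; s < snpA.size(); ++s)
        ++t.counts[isControl[s]][snpA[s]][snpB[s]];   // one increment per sample
    return t;
}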


In principle, these kinds of contingency tables can be generated for any order. For humans, they become more difficult to evaluate or even write down, as Figure 3.5 suggests by showing a table for third-order interaction search. Unfortunately, raising the order does not only increase the number of required tables exponentially but also the size of each table. Specifically, each SNP in the combination contributes another factor of three genotype states per group; therefore, the number of counters in a table can be expressed as 2 × 3^k, with k being the order of interaction. Thus, second-order interaction requires 18 values, third order 54 values and fourth order 162 values per table. Operating on these large data sets poses a major challenge in all involved areas of computation, especially the communication paths, their associated stream handling, and the arithmetic units that calculate the respective measures.

3.2.2 Second Order Interaction Measures

To evaluate the degree of interaction between two SNPs, a large number of algorithms have been implemented in recent years. Most of them employ complex statistical models that measure the differences in the allele frequency distribution between the case group and the control group. One of the simpler measures is iLOCi [PNI+12]. In this method, the case group distribution is formulated as a mass function and compared to a ground truth constructed from the control group. A different approach is taken by GWIS [GRW+13], where classification models are proposed, based on receiver operating characteristics (ROC curves), which are often used in machine learning algorithms. Going further in this direction, many methods have evolved around random forest-based variable prediction, such as Random Forest Fishing [YC14] or Permuted Random Forests [LMA+16]. Generating ground truth assumptions through random permutation is also well-established and used in multi-factor dimensionality reduction-based methods, such as MB-MDR [JLS11] and its various descendants. Another complex algorithm takes a classical regression-based approach.


Cases (l = 0)                    SNP A
                             w       h       v
SNP B  w   SNP C  w       n0000   n0100   n0200
                  h       n0010   n0110   n0210
                  v       n0020   n0120   n0220
       h   SNP C  w       n1000   n1100   n1200
                  h       n1010   n1110   n1210
                  v       n1020   n1120   n1220
       v   SNP C  w       n2000   n2100   n2200
                  h       n2010   n2110   n2210
                  v       n2020   n2120   n2220

Controls (l = 1)                 SNP A
                             w       h       v
SNP B  w   SNP C  w       n0001   n0101   n0201
                  h       n0011   n0111   n0211
                  v       n0021   n0121   n0221
       h   SNP C  w       n1001   n1101   n1201
                  h       n1011   n1111   n1211
                  v       n1021   n1121   n1221
       v   SNP C  w       n2001   n2101   n2201
                  h       n2011   n2111   n2211
                  v       n2021   n2121   n2221

Figure 3.5. Contingency tables for cases and controls. nijkl reflects the number of occurrences of the corresponding genotype combination in a given SNP triple.

In BOOST [WYY+10], two logistic regression models M_H and M_S are obtained from a contingency table, where M_S is called saturated and is based on the joint probability mass function between the SNP variables and the case/control variable. The so-called homogeneous model M_H is used for regression analysis regarding the case/control group affiliation as a dependent variable, while the SNP variables are treated as predictor variables. The difference between both models' estimates describes the probability shifts in the case/control affiliation prediction as a function of the case and control allele frequency distributions. Due to its widespread use and high-quality results, this method has been chosen for implementation on the hybrid-parallel architecture.

3.2.3 Third Order Interaction Measures

Due to the computational power required for exhaustive third-order interaction search, only very few methods have been published and even fewer have been implemented to solve this issue [WYY+10; WLo11]. Guo et al. use dynamic clustering methods to reduce the measure complexity to clusters of SNPs [GMY+14]. Lacking established methods that are performed exhaustively on every single combination, a new measure is proposed. In the field of information theory, the mutual information I describes the mutual dependence between two or more random variables. In the previous section about second-order interaction methods, the model of BOOST has been introduced, where regression models are used to describe the difference in outcomes of a SNP combination with and without prior knowledge of the case/control group affiliation. Wan et al. show that it can be implemented by calculating the Kullback-Leibler divergence D_KL, a measure of the difference between two probability distributions. Mutual information describes a very similar measure but is defined through the entropy of its random variables [CT06, Chap. 2.1–2.3]:

I(X_1, X_2, X_3; Y) = H(X_1, X_2, X_3) + H(Y) - H(X_1, X_2, X_3, Y)    (3.2.1)

In discrete distributions such as contingency tables, the mutual information test is equal to the G-test, a well-known dependency test in statistics and the successor to Pearson's chi-squared test [Pea00; Hoe12].



Figure 3.6. Venn diagram for mutual information and entropy

Additionally, in the discrete case it can then be defined through the Kullback-Leibler divergence as used in BOOST, which therefore allows assumptions about its validity and expressiveness [CT06, Chap. 2.3]. Figure 3.6 depicts an interpretation of entropies and the mutual information. The left circle describes the entropy (uncertainty) of the random variable X and the right circle that of Y. The union of both is the joint entropy of X and Y, while the intersection of both is the mutual information. Therefore, it can be expressed through the difference between the sum of the individual entropies H(X) and H(Y) and the joint entropy H(X, Y), as shown in Equation 3.2.1. Regarding the SNP combinations, the joint entropy of all SNPs, H(X) := H(X_1, X_2, X_3), is intersected with the entropy of the case/control variable Y.


Chapter 4

The Hybrid Architecture Prototype

Parts of this chapter have been previously published in [GWK+15a; GKW+15; KGW+14; WKG+14; KWG+15; GWK+15b; GSK+14; WKH+17].

This chapter consists of a thorough description of the prototype used to implement the architecture. In particular, all components that were developed and engineered on all levels are presented, from top-level application threading protocols down to electrical interfaces and wiring.

4.1 Overview

4.1.1 The Problem Statement

The overall problem that this hybrid architecture is designed to solve is to create a larger platform that exploits the very different strengths of the individual architectures with respect to their power and potential of computation, as explained in Chap. 2. The combined platform targets a high efficiency in terms of energy and computational power, achieved by synergistic effects, and therefore aims to perform better than the sum of its components.


The system presented here is generally capable of solving all kinds of problems efficiently if they are reasonably divisible into several mostly self-contained parts. For this work, though, two interesting methods from the area of genetics and bioinformatics have been selected as a proof of concept. These methods are described in Sections 3.2.2 and 3.2.3. From an algorithmic point of view, these problems are quite similar, but their implementations require a focus on different aspects and therefore lead to differently sophisticated approaches, as the following sections show. When accelerating applications or algorithms through parallelization and concurrency, a common bottleneck is the communication between the participating components. Hence, this is declared the central challenge to solve, and great effort has been made to provide high-performance interfaces with as little set-up overhead and latency as possible.

4.1.2 Problem Partitioning

Both problems described in Sections 3.2.2 and 3.2.3 can be separated into the following steps:

1. The input data, a digital representation of genetic material, is read from storage and converted into a format suitable for the remaining parts.

2. SNP data is processed into contingency tables.

3. A statistical test is executed on each table.

4. Tables are filtered depending on their score.

5. Remaining SNP pair information and scores are written to storage.

The conversion step will naturally be executed on the CPU as it is directly attached to the storage and the step is heavily data-dependent. The formats typically used, as well as our application's native format, encode the SNP data, i.e. whether a certain SNP's locus in a certain individual is a heterozygous, homozygous wild or homozygous variant site, in a compressed stream for reasons of efficiency. Hence, its interpretation involves operations on single bits.



Figure 4.1. General system overview

As contingency tables are created by counting exactly these bit patterns, a highly efficient FPGA implementation could be developed. The mutual information and Kullback-Leibler measures both involve arithmetic on real numbers without data inter-dependencies and are, hence, an appropriate choice for GPU platforms. The filtering and reporting steps are implemented directly on the host system for the same reasons as the input data conversion: their “warp divergence potential” and their locality to the storage subsystem. An overview of the various data paths involved is depicted in Figure 4.1.

4.1.3 System Set-up

The hardware platform can be clearly divided into three parts: the GPU module, the FPGA module and the CPU which, among other tasks, serves as a host. Following the design principles described in Sect. 4.1.1 regarding communication bandwidth, PCI Express (see Sect. 2.2.2) has been chosen as the system's communication network for its wide availability and scalable performance as well as its simplicity. A system summary is shown in Tables 4.1 and 4.2. Besides the high operating frequency of up to 4.4 GHz, the Intel Core i7-4790K provides a PCI Express controller with 16 Generation-3 lanes with a total gross throughput of 128 Gb/s.


Table 4.1. Prototype configuration: CPU and GPU

                     CPU                              GPU
Platform             Intel Core i7-4790K              Nvidia GTX 780 Ti
Architecture         4 physical cores, SMT            2,880 CUDA cores
Frequency            800-4,400 MHz                    875-928 MHz
Main Memory          32 GB DDR3-RAM                   3 GB GDDR5-RAM
PCIe Connectivity    16 lanes Gen 3, 8 lanes Gen 2    16 lanes Gen 3
Compiler             G++ 5.3.1                        NVCC 7.5.17 / G++ 4.9.3
System Software      Linux 4.3.0                      CUDA 7.5

Furthermore, the processor features four physical cores supporting symmetric multi-threading for eight concurrently running threads [4790]. In addition to the CPU's PCI Express controller, the MSI Z97 XPOWER AC motherboard's chipset, an Intel DH82Z97, provides eight more PCI Express lanes (Generation 2) [Z97]. The 16 Gen-3 lanes are occupied by an NVIDIA GeForce 780 Ti graphics adapter, featuring a GK110B GPU with 2,880 CUDA cores distributed over 15 streaming multiprocessors. 3,072 MiB of GDDR5 memory are attached to the 384 bit-wide memory bus. The floating-point performance is stated as 5,040 GFLOPs, but this only holds for 32-bit IEEE 754 single-precision floating-point operations. Due to the low number of double-precision units in the NVIDIA Kepler microarchitecture, only 210 GFLOPs can be achieved with double-precision calculations [12]. The prototype architecture has been designed with two different FPGA types, a Xilinx Kintex 7K325T and a Xilinx Virtex 7V690T, two series that differ significantly in price/performance ratios. The Kintex FPGA is hosted on a Xilinx KC705 evaluation board. The FPGA is backed with 1 GiB DDR3 memory, an 8-lane PCI Express edge connector and further peripherals that are not relevant to this work. The FPGA chip itself provides approximately 325,000 logic cells, 840 DSP slices and 16 Mb Block-RAM distributed over 445 units with 36 Kb each. As shown in Table 4.2, the Virtex FPGA offers considerably more resources, 690,000 logic cells, 1,470 Block-RAM units totaling 52.9 Mb, and 3,600 DSP slices, providing resources for much more sophisticated designs.


Table 4.2. Prototype configuration: FPGAs

                     Xilinx Kintex 7              Xilinx Virtex 7
Platform             Xilinx KC705 Eval Board      Alpha Data ADM-PCIE-7V3
Logic Cells          326,080                      693,120
DSP Slices           840                          1,470
Block RAM (36 kb)    445                          1,470
Frequency            200-250 MHz                  200-250 MHz
Memory               1 GiB DDR3-SDRAM             16 GiB DDR3-SDRAM
PCIe Connectivity    8 lanes Gen 2                8 lanes Gen 2


Figure 4.2. Prototype overview

4.1.4 Software

Obviously, the coarse software architecture follows the hardware architecture in terms of work division as described in Sect. 4.1.2. To give a better overview, Figure 4.2 summarizes the prototype system set-up graphically. Between the three application parts, a lot of resource management takes place. On the host side, an operating system manages access to the devices on the PCI Express network. For the specific functions that the devices are to perform, for example moving data to and from the host application


or PCI Express parameter negotiation, drivers are implemented on both ends of the link. On the host side, the driver is a piece of software executed within the Linux operating system kernel. It can only be accessed by certain system calls and through the device nodes anchored in the file system hierarchy, as described in more detail in Sect. 4.4. These accessor functions are combined into shared libraries that make up the Application Programming Interfaces (APIs). Both FPGA boards in use require different drivers. The Xilinx evaluation board is bundled with an open-source driver module whose sole purpose was to conduct throughput tests. Although some work and principles could be re-used, large parts of the Xilinx driver had to be rewritten to support the intended use cases in an efficient way. The API encapsulating the Xilinx driver interface was completely written from scratch. The Alpha Data Virtex 7 board, though, came with a full-featured open-source kernel driver and APIs and required little work to adapt to our needs. Both boards' suppliers delivered ready-to-use PCI Express and DMA modules for the FPGA implementation. As CUDA applications are written in a relatively high-level language, there is no direct hardware or driver access, neither on the graphics hardware nor on the host system. NVIDIA provides well-documented but proprietary and closed-source APIs and drivers for the CUDA development environment. This brings several limitations, as the following sections will show.

4.2 FPGA Configuration

The FPGA's main task is creating the contingency tables that are delivered to the GPU part for further processing. Being one of the more complex parts, the FPGA configuration is described in detail in this section.

4.2.1 Data Flow and Structural Overview

In a rather abstract overview, the FPGA design is composed of several larger structural components as shown in Figure 4.3. After an initial configuration, the host system sends a certain set of genotypes to the FPGA, ordered by SNPs, where for each SNP the alleles of all samples at that specific



Figure 4.3. Data flow schematic

position are taken consecutively. The double-bounded symbol “Comm. API” describes the PCI Express/DMA endpoint where these genotypes arrive. From there, they are moved directly into the attached SD-RAM memory module. Once done, a streaming unit starts moving data to the contingency table unit where tables are created. The table creation is a rather complex process that will be explained in more detail in Section 4.2.3. For this overview, though, it is sufficient to know that the creation pipeline is constructed as a large systolic array of small and primitive processing nodes (see Sect. 2.3.2 for an introduction to systolic arrays). To ease the result collection from the nodes, several paths have been inserted where contingency tables are moved off the array. In the last step, the “Arbitration Unit” fetches the generated contingency tables from the several result paths and distributes them over the available DMA channels to send them back to the host for further processing. Naturally, the structure of the arbitration unit changes significantly with the ratio between return paths and available DMA channels and is therefore different in 2-way and 3-way interaction analyses.


Bits 255–192: synchronization sequence
Bits 191–160: total number of SNP data words
Bits 159–128: number of assigned SNPs
Bits 127–96:  number of samples
Bits 95–64:   number of cases
Bits 63–32:   number of 256-bit words per buffer
Bits 31–0:    number of 544-bit tables per buffer

Figure 4.4. Constant initialization


4.2.2 Initialization

Before the FPGA can be set to work, a two-phase initialization sequence has to be performed. In the first step, a set of integer constants is received.

Constants

When designing hardware, only a certain amount of resources is available. For every calculation that is to be done at a certain instant, hardware has to be provided. These allocations are made during a highly complex synthesis process at development time and cannot be changed afterwards without re-running this process. To achieve a high efficiency, it is therefore desirable to use the allocated hardware primitives as often as possible. In some cases, though, hardware would have to be created that is only used once during the initialization, without positive effects on the performance. One example would be the number of 544-bit words that fit into a single transmission buffer. Although this could be calculated directly from the number of 256-bit words that fit into a buffer, hardware would have to be spent to calculate this one value that does not change throughout the whole program run. As seen in Figure 4.4, both values are transmitted instead. Here, the host system calculates this value on the CPU. Due to the nature of using instruction-based processors instead of hard-wired

application-specific processors, the resources used for that computation are not allocated permanently. Therefore, the one-shot evaluation of small constant values is best done on CPUs and beneficial to the FPGA design. These constants are not necessarily constants in the traditional sense where “constants never change”. Instead, they are data-specific values that remain constant with respect (only) to the FPGA. They are computed and derived on the host system based on the nature of the input data, among other factors. The very first value in the constant block is a synchronization sequence. When an application does not properly clean up or is forcefully terminated by the operating system, data may still reside in intermediate buffers between the FPGA's PCI Express endpoint and the host application. Hence, this synchronization sequence can be used to synchronize to the actual start of the application. Any other data word is discarded silently until the defined state is reached by detecting this sequence. As genotype data is a largely unstructured data stream, further processing entities require certain parameters for correct operation. One example would be the data streamer, which moves data SNP-wise and is therefore required to know how large a SNP actually is. Furthermore, the process of creating contingency tables needs to separate control and case genotypes. The final two values indicate how many 256-bit words and how many (544-bit) contingency tables will fit into a single host-supplied transfer buffer. Although this information is not actually required by the FPGA, as the API endpoint operates in streaming mode, an additional constraint has to be followed where a transmission buffer must not contain incomplete tables.
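A host-side view of the constant block could look like the following C++ sketch. The field names are illustrative, and the actual byte order on the wire depends on the DMA engine and the PCI Express endpoint configuration, so this is an assumption rather than the actual driver code.

// Host-side view of the 256-bit constant block from Figure 4.4. Field names
// and byte order are illustrative assumptions.
#include <cstdint>

struct InitConstants {
    uint64_t syncSequence;      // bits 255..192: synchronization sequence
    uint32_t totalSnpWords;     // bits 191..160: total number of SNP data words
    uint32_t assignedSnps;      // bits 159..128: number of assigned SNPs
    uint32_t numSamples;        // bits 127..96:  number of samples
    uint32_t numCases;          // bits  95..64:  number of cases
    uint32_t wordsPerBuffer;    // bits  63..32:  256-bit words per transfer buffer
    uint32_t tablesPerBuffer;   // bits  31..0:   544-bit tables per transfer buffer
};
static_assert(sizeof(InitConstants) == 32, "constant block must be 256 bits");

As described above, tablesPerBuffer is derived from wordsPerBuffer on the CPU so that no FPGA hardware has to be spent on this one-shot calculation.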

Genotypes

In the second part of the initialization process, genotypes are copied from the host application to the FPGA-attached memory where the streaming unit will later fetch its SNP data. These genotypes are expected in SNP-major mode, i.e. all genotypes with respect to a single SNP are collected from all individuals. Similar to the constraint put on transmission buffers, the size of every SNP is required to be a multiple of the memory's word size, 512 bit.


Table 4.3. Genotype encoding

Genotype               Encoding
Homozygous Wild        00
Heterozygous           01
Homozygous Variant     10
Invalid/Padding        11

Every genotype is expected to take two bits of data, encoded as depicted in Table 4.3. If the number of genotypes does not completely fill the 512-bit words, the remaining space is filled with genotypes of the “Invalid/Padding” type. These will be ignored later when contingency tables are created from this data. Immediately after the last SNP data word has arrived, processing begins.
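The following C++ sketch illustrates the 2-bit encoding from Table 4.3 by packing the genotypes of one SNP into machine words and padding the remainder with the “Invalid/Padding” code. It uses 64-bit words and an LSB-first bit order for brevity; the actual 512-bit memory word layout and the parser interface are assumptions.

// Illustrative packer for the 2-bit genotype encoding of Table 4.3.
#include <cstddef>
#include <cstdint>
#include <vector>

enum Genotype : uint8_t {
    HomWild    = 0b00,
    Hetero     = 0b01,
    HomVariant = 0b10,
    Padding    = 0b11,
};

// Pack the genotypes of one SNP into 64-bit words (32 genotypes per word),
// filling the remainder of the last word with the padding code.
std::vector<uint64_t> packSnp(const std::vector<Genotype>& genotypes) {
    const std::size_t perWord = 32;
    std::vector<uint64_t> words((genotypes.size() + perWord - 1) / perWord, 0);
    for (std::size_t i = 0; i < words.size() * perWord; ++i) {
        uint64_t code = (i < genotypes.size()) ? genotypes[i] : Padding;
        words[i / perWord] |= code << (2 * (i % perWord));
    }
    return words;
}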

4.2.3 Processing Element Arrays

Figure 4.5 shows the table construction unit as a data flow diagram. This is valid for both the 2-way and the 3-way method, with the notable difference that a processing node for three-dimensional tables makes use of a second SNP buffer (dotted box). The principles stay the same. At the beginning of the data path, a data streaming unit (“SNP Streamer”) reads a SNP from memory and injects 8 genotypes per clock cycle into the array until the SNP has been completely transmitted. Then, the second SNP is sent. This continues until all SNPs have been sent. The very first node consumes all genotypes of the first SNP and stores them in its “SNP Buffer”, a Block RAM module of 16 kbit capable of storing 8 000 genotypes. While the second SNP is streamed, the second node does the same and stores it into its buffer without forwarding it to the next node. Concurrently, the first node now uses these genotypes on the fly and reads its own SNP buffer at the same pace to generate combined allele counters, resulting in a contingency table. Naturally, this only holds for two-dimensional tables. In third-order interaction, a node will also have to fill its second SNP buffer before a valid table can be created. When a SNP first arrives at the third node, the first node will also pick up the “current”



Figure 4.5. Systolic processing element array

SNP and continue to create contingency tables. The first node will create tables for the pairings 1/2 and 1/3, the second node creates 2/3 and the third node still waits for the fourth SNP. For the first few SNPs, the emission will be as follows:

1. 1/2

2. 1/3, 2/3

3. 1/4, 2/4, 3/4

4. 1/5, 2/5, 3/5, 4/5


For third-order interactions, the table generation is delayed by one additional SNP and is analogous to the previous example:

1. 1/2/3

2. 1/2/4, 2/3/4

3. 1/2/5, 2/3/5, 3/4/5

4. 1/2/6, 2/3/6, 3/4/6, 4/5/6

When the last SNP has been injected, all nodes invalidate their SNP buffers and the SNP stream restarts at the Pth SNP, with P being the number of nodes in the array. This results in the first newly created table being (P + 1)/(P + 2), and the emission continues in the same triangular pattern as before. This process is analogous for three-dimensional tables but requires many more streaming iterations to generate all possible triplets of SNPs.
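The triangular emission pattern of a single streaming pass can be reproduced with a small host-side simulation such as the following C++ sketch (illustrative only, not the FPGA logic): node p has buffered SNP p and combines it with every SNP s > p that passes by.

// Illustrative simulation of the pair emission order in one streaming pass.
#include <cstdio>

int main() {
    const int nodes = 4;     // array length P (tiny for illustration)
    const int snps  = 6;     // SNPs streamed in this pass
    for (int s = 2; s <= snps; ++s) {               // SNP s currently streamed
        std::printf("SNP %d arrives:", s);
        for (int p = 1; p < s && p <= nodes; ++p)   // nodes already holding a SNP
            std::printf(" %d/%d", p, s);
        std::printf("\n");
    }
    return 0;
}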

Table Transport

A typical data set widely used for benchmarking stems from the Wellcome Trust Case Control Consortium (WTCCC). Its first version consists of approx. 500 000 SNPs, 2 000 individuals in the case group and 3 000 individuals in the control group [BCC+07]. For this to work in the presented architecture, a cell in a contingency table requires a minimum width of ⌈log2 max(2000, 3000)⌉ = 12 bits. A whole contingency table therefore takes up 2 × 9 × 12 = 216 or 2 × 27 × 12 = 648 bits of space, respectively. Due to this low space requirement, it is stored in distributed memory instead of Block RAM. The maximum size of the node array depends on the amount of data to be processed. Generated tables are moved off the array on a transport bus, one half table (i.e. cases or controls) per clock cycle. For efficiency reasons, the table generation is done separately for case samples and control samples. With 2 000 case genotypes per SNP and 8 genotypes per clock cycle, a node has to be capable of creating a (half) contingency table within 250 clock cycles. This holds for every node in the array and therefore results in 250 contingency tables being generated in total under full bus load. Therefore,

the array length is capped as shown in Eq. 4.2.1, apart from available system resources. More nodes than this number would lead to bus congestion and eventually lower efficiency.

P = min{N_Cases, N_Controls} / (genotypes per cycle)    (4.2.1)
  = min{2000, 3000} / 8
  = 250

To overcome this limitation, a major structural change has been introduced. As shown in Figures 4.3 and 4.5, additional transport buses have been introduced that separate the contingency table streams into equally sized sub-arrays. With k = 4 partitions, each transport bus can now take a half table every cycle, which effectively quadruples the total transport bus capacity and allows array sizes of up to P · k = 1000 nodes. Due to routing channel congestion on the FPGA, the design achieves a core frequency of 200 MHz. A whole system of 1 000 two-way interaction nodes therefore generates a stream of 320 × 10^6 contingency tables per second, or 10.7 GiB/s. In Section 2.2.2, the transmission speeds of PCI Express communication systems have been described in some detail. Generation-2 interfaces with 8 lanes, such as on the Alpha Data FPGA board, support a maximum theoretical net throughput of 3 816 MiB/s. Even if all intermediate devices were able to meet this rate, the table rate would require threefold this throughput capability. This motivates further analysis of the nature of contingency tables with a reduction of volume in mind.

Counter Compression

As shown in Sections 3.2.2 and 3.2.3, the sum of all counters in a contingency table equals the total number of samples. More specifically, this also holds for each half of the table: the sum of all counters in the case part equals the total number of cases. These numbers, though, are constant with respect to the data set and are already known by all participating peripherals and subsystems. Their transmission is therefore redundant and can be left out, which is done by simply removing, and indeed not even counting, a single field per half table.


Table 4.4. Table generation rates at 200 MHz and 5 000 samples

                              2-way                      3-way
Max. group size           16 384        65 536          16 384
Array length              1 000         288             180
Partitions                4             2               3
Tables/s (per node)       320 × 10^3    320 × 10^3      320 × 10^3
Tables/s (per partition)  60 × 10^6     30 × 10^6       45 × 10^6
Tables/s (total)          320 × 10^6    92.2 × 10^6     57.6 × 10^6
Data rate (full tables)   10.7 GiB/s    3.1 GiB/s       5.8 GiB/s
Data rate (reduced)       4.9 GiB/s     1.4 GiB/s       4.0 GiB/s

This will not only losslessly reduce the required data rate but also allows reducing the size of the processing nodes in the array and eventually the transport bus widths, freeing resources to implement even more nodes. With the help of the host system's parser (see Sect. 4.5.2 for details), even more resources can be saved. In a two-dimensional contingency table such as in Figure 3.4, the sum of the leftmost column equals the total number of homozygote wild types of SNP A. If this number is known in advance, one additional value can be saved per column. On the downside, the parser is required to extract and store these counters. Fortunately, as the parsing process handles the input data genotype-by-genotype anyway, this results in very little computational burden. Furthermore, these few counters are stored for each SNP, not for each combination of SNPs, proving the efficiency of this approach. This scheme of counter saving continues and results in size-reduced contingency tables that only contain 8 values. The same principles apply to three-dimensional tables, where the number of counters can be reduced from 54 to 40. Table 4.4 gives an overview of the different configurations and shows that the table data rate could be reduced to 4.9 GiB/s (46 %) and 4.0 GiB/s (71 %) for two-dimensional and three-dimensional tables, respectively.


4.2.4 Transmission Unit

The largest difference between the FPGA-implemented two-way and three-way interaction methods lies in the Arbitration and Transmission Unit. For the third-order Mutual Information approach, the logic is designed for 180 nodes in three partitions, i.e. 60 nodes are serviced by one result channel. The PCI Express API endpoint is configured to use three DMA channels, so the channel mapping is trivial here. An FPGA configuration supporting up to 4 096 samples per group requires a field width of log2 4096 = 12 bit per counter. Therefore, a 40-counter table occupies 480 data bits of storage. Each table is accompanied by an ID triplet that uniquely identifies the table so that subsequent processing entities know which SNP combination this table comes from. For example, to allow up to two million SNPs, an ID field takes 21 bit, totaling 543 bit. The DMA channel's port width is 256 bit, requiring a 543-to-256 bit data width converter. This can be implemented by using a FIFO with asymmetric ports. Converting between widths requires a shift register with a size of the lowest common multiple of both port widths. In the case of 256 and 543, the lcm is 139 008. Resource-wise, a 543 × 256 bit shift register is rather inefficient. Increasing the table size to 544 bit by adding a padding bit reduces the odd (non-power-of-two) factor from 543 to 17. A 17 × 256 bit shift register is much more efficient at the low cost of a slightly reduced transmission speed of a single wasted bit per table. The resulting protocol is shown in Figure 4.6.
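The width-conversion argument can be verified with a few lines of C++17 (illustrative only): the shift register of an asymmetric FIFO needs lcm(table width, port width) bits.

// Quick check of the shift register sizes for 543-bit and 544-bit tables.
#include <cstdio>
#include <numeric>   // std::lcm (C++17)

int main() {
    std::printf("lcm(543, 256) = %d bits\n", std::lcm(543, 256)); // 139008
    std::printf("lcm(544, 256) = %d bits\n", std::lcm(544, 256)); // 4352 = 17 x 256
    return 0;
}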

4.3 GPU Kernels

The implementation for the GPU is much simpler than a hardware description for FPGAs due to its similarity to regular CPU-based programming. Furthermore, the input data as created by the FPGA implementations contain contingency tables that can be operated on without data inter-dependencies. This allows a programming model known as embarrassingly parallel. Nevertheless, a number of challenges arise in the face of the large amounts of data, where every snippet of code has to be analyzed down to the instruction level to provide the most efficient implementation.


Bits 543–480: header – one padding bit and the three 21-bit SNP IDs (ID A, ID B, ID C)
Bits 479–240: the 20 case counters (12 bit each), in the order n220, n212, n210, n202, n201, n200, n122, n121, n120, n112, n102, n101, n022, n021, n020, n012, n011, n010, n002, n001
Bits 239–0:   the 20 control counters (12 bit each), in the same order

Figure 4.6. Table transmission format

4.3.1 Programming Model and Data Distribution

Moving data from the FPGA or similar non-NVIDIA peripherals to the CUDA cores requires the contingency table data to be converted from a data stream into discrete data packets that are handled one by one. Typical sizes used in development are 256 MiB, 512 MiB and 1 024 MiB. In second-order analysis, a reduced contingency table as described in the FPGA chapters consists of eight values of 16 bit each, 16 bytes in total, which fits precisely into power-of-two-sized buffers. Three-dimensional contingency tables require 40 values of 12 bit each plus an additional 3 × 21 bit for their identifier, totaling 68 bytes. Unfortunately, powers of two are not divisible by 68 and therefore the buffers are padded to full size with invalid data. As explained in more detail in Section 2.1.2, before a kernel is launched on the GPU, the execution scheme has to be defined beforehand. Configurable parameters are the number of threads per block and the total number of blocks, known as the grid size. In the case of an NVIDIA GeForce 780 Ti, the graphics processing unit used in the given prototype, every streaming multiprocessor (SMX) contains a register file with 65 536 entries that are dynamically shared among all concurrently executed threads [12]. Therefore, the maximum size of a block is a function of the kernel's space complexity with respect to the number of registers allocated per thread. Within an


SMX, a block is divided into warps of 32 threads in a SIMD-style lock-step architecture. Every thread performs the same instruction at the same time. If one thread takes a branch while others in the same warp do not, the branch bodies are executed sequentially by their respective threads. This may lead to a single warp taking longer than other concurrently running warps whose threads did not branch. An SMX can run 192 threads in parallel, or 6 warps, and only a whole set of warps can be replaced, i.e. an execution takes as long as the longest-running warp. To counter the effects of high warp divergence, the block size can be reduced enough to allow running multiple blocks concurrently, removing the barrier functionality at the end of a warp at the cost of increased scheduling overhead. As these parameters are highly dependent on the actual workload, and the two applications' kernels differ significantly regarding warp divergence and occupancy, they will be described and discussed separately in the following sections. The distribution itself is implemented in the most efficient way possible for a GPU. If every thread on the GPU processes a single table from the buffer, there will be no data or control flow dependencies and therefore no time-consuming synchronization is necessary. Furthermore, accesses to the memory are strictly ordered. All 192 threads in an SMX access 192 consecutive memory locations. With this workload, 192 individual memory accesses can automatically be merged into the largest read size the respective memory controller supports. This technique is called memory coalescing and can reduce the memory bus load and increase cache hit ratios significantly.
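The one-thread-per-table distribution can be summarized by the following C++ sketch (illustrative, not the actual kernel code): consecutive threads address consecutive fixed-size table records, so their memory accesses fall on adjacent addresses and can be coalesced.

// Illustrative thread-to-table mapping for the second-order case.
#include <cstddef>
#include <cstdint>

struct ReducedTable2 { uint16_t counts[8]; };   // 16 bytes per reduced table

// Which table a given (blockIdx, threadIdx) pair would process in a buffer:
inline const ReducedTable2* tableForThread(const ReducedTable2* buffer,
                                           std::size_t blockIdx,
                                           std::size_t blockDim,
                                           std::size_t threadIdx) {
    std::size_t globalId = blockIdx * blockDim + threadIdx;
    return buffer + globalId;   // adjacent threads read adjacent 16-byte records
}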

4.3.2 Counter Reconstruction

To minimize the required bandwidth between the FPGA, the host system and the GPU, the FPGA configuration removes some counters from the contingency tables that are redundant. As an example, Figure 4.7 shows a 2 × 3 × 3 contingency table in its reduced form. The reduction scheme is explained in more detail in Sect. 4.2.3. Before a statistical test can be performed, the removed counters have to be reconstructed. Knowing the allele counts per SNP beforehand is required, as the following reconstruction equations show. Let L[s, a, k] be the number of alleles a ∈ {w, h, v} present in SNP s in the case or control group


               Cases (SNP A)             Controls (SNP A)
               w      h      v           w      h      v
SNP B    w     n000   n010   n020        n001   n011   n021
         h     n100   n110   n120        n101   n111   n121
         v     n200   n210   n220        n201   n211   n221

Figure 4.7. A reduced 2 × 3 × 3 contingency table

(k ∈ {cases, controls}).

n010k = L[B, w, k] − n000k − n020k
n210k = L[B, v, k] − n200k − n220k
n100k = L[A, w, k] − n000k − n200k
n110k = L[A, h, k] − n010k − n210k
n120k = L[A, v, k] − n020k − n220k

A three-dimensional contingency table as in Figure 4.8, which naturally contains 2 × 27 values, is reduced to 2 × 20 values and can be reconstructed analogously to the two-dimensional kind.

n000l = L[A, w, l] − Σ_{jk≠00} n0jkl
n100l = L[B, w, l] − Σ_{ij≠10} nij0l
n101l = L[C, w, l] − Σ_{ik≠11} ni0kl
n111l = L[A, h, l] − Σ_{jk≠11} n1jkl
n211l = L[B, h, l] − Σ_{ij≠21} nij1l
n212l = L[C, h, l] − Σ_{ik≠22} ni1kl


[Cases (l = 0) and controls (l = 1), with SNP A in the columns and SNP B and SNP C in nested rows; the layout is identical to Figure 3.5.]

Figure 4.8. Three-dimensional contingency tables for cases and controls.

n222l = L[A, v, l] − Σ_{jk≠22} n2jkl

Notably, these equations only show counters that can be recalculated with prior knowledge of the allele counters of single SNPs. These are acquired while parsing the initial data as a side product and therefore have a negligible impact on runtime. More counters could be left out with knowledge about allele occurrences in combinations of SNPs, but this knowledge is much harder to obtain as it basically requires solving the second-order interaction problem.
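The following C++ sketch applies the five reconstruction equations for the two-dimensional case to one half (case or control group) of a reduced table. The structure and function names are illustrative; only the arithmetic follows the equations above.

// Reconstruction of one half of a reduced second-order contingency table.
#include <cstdint>

struct FullHalfTable { uint32_t n[3][3]; };   // n[b][a] for one group

// The reduced form keeps only the four "corner" cells per group; the
// remaining five cells are recovered from the per-SNP allele totals.
FullHalfTable reconstructHalf(uint32_t n00, uint32_t n02,
                              uint32_t n20, uint32_t n22,
                              const uint32_t LA[3],   // SNP A totals (w, h, v)
                              const uint32_t LB[3]) { // SNP B totals (w, h, v)
    FullHalfTable t{};
    t.n[0][0] = n00;  t.n[0][2] = n02;
    t.n[2][0] = n20;  t.n[2][2] = n22;
    t.n[0][1] = LB[0] - n00 - n02;              // n010 = L[B,w] - n000 - n020
    t.n[2][1] = LB[2] - n20 - n22;              // n210 = L[B,v] - n200 - n220
    t.n[1][0] = LA[0] - n00 - n20;              // n100 = L[A,w] - n000 - n200
    t.n[1][1] = LA[1] - t.n[0][1] - t.n[2][1];  // n110 = L[A,h] - n010 - n210
    t.n[1][2] = LA[2] - n02 - n22;              // n120 = L[A,v] - n020 - n220
    return t;
}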


4.3.3 BOOST Measure Calculation

In BOOST [WYY+10], the strength of interaction between two SNPs is measured by the difference of the log-likelihood estimates \hat{L}_H and \hat{L}_S of the logistic regression models M_H and M_S, as described previously in Section 3.2.2. This difference is proportional to the difference between the joint distribution obtained from the saturated model and its factorization and can be expressed through the Kullback-Leibler divergence

\hat{L}_S - \hat{L}_H = \sum_{i,j,k} n_{ijk} \log \frac{n_{ijk}}{\mu_{ijk}}
                      = n \sum_{ijk} \hat{\pi}_{ijk} \log \frac{\hat{\pi}_{ijk}}{\hat{p}_{ijk}}
                      = n \cdot D_{KL}(\hat{\pi}_{ijk} \,\|\, \hat{p}_{ijk})

where \hat{\pi}_{ijk} and \hat{p}_{ijk} are the normalized versions of the joint distribution n_{ijk} and of the distribution \hat{\mu}_{ijk} constructed from its factorization, and D_{KL} is the Kullback-Leibler divergence as defined in [KL51]. Unfortunately, M_H does not have a closed-form solution and has to be calculated iteratively, e.g. using Newton's method. Iterative, data-dependent methods, though, are hard to compute efficiently. Therefore, Wan et al. propose to approximate M_H with the Kirkwood Superposition Approximation (KSA):

\hat{p}_{ijk}^{KSA} = \frac{1}{\eta} \cdot \frac{\pi_{ij\cdot}\,\pi_{i\cdot k}\,\pi_{\cdot jk}}{\pi_{i\cdot\cdot}\,\pi_{\cdot j\cdot}\,\pi_{\cdot\cdot k}},
\qquad
\eta = \sum_{ijk} \frac{\pi_{ij\cdot}\,\pi_{i\cdot k}\,\pi_{\cdot jk}}{\pi_{i\cdot\cdot}\,\pi_{\cdot j\cdot}\,\pi_{\cdot\cdot k}}

where η is a normalization factor. They further prove that this approximation using the KSA is an upper bound for M_H. The normalization factor 1/η is still a large computational burden. Seeking further computational simplification, a large number of tests has shown that the unnormalized KSA has always been an upper bound of the normalized KSA [GWK+15a].


[Plot: “Filter Hits per 10 000 Tests” (y-axis) versus “Threshold τ” (x-axis), comparing the normalized and the not normalized measure.]

Figure 4.9. Approximation of the Kirkwood Superposition Approximation

For this test, all contingency tables of 500 000 artificial, randomized SNPs and 5 000 samples, similar to the first WTCCC data set, have been evaluated. Figure 4.9 shows both measure methods with their respective filter pass counts over a range of thresholds. Though not mathematically proven yet, the unnormalized value is at least a good estimate of an upper bound of the normalized value. The test results have further shown that the unnormalized filter has a rather large type I error (false positives) with respect to the KSA measure, while the evaluation did not find a single type II error (false negative). Therefore, the additional approximation is a good candidate for a prefiltering system and is subsequently called the Kirkwood Superposition Approximation Superposition Approximation (KSASA). Assuming that the KSASA is indeed an upper bound of the KSA, the following filter chain can be constructed:

\hat{L}_{KSA} \leftarrow \frac{1}{\eta}\, L_{KSASA}

\hat{L}_S - \hat{L}_H \leq \hat{L}_S - \hat{L}_{KSA} \leq \hat{L}_S - \hat{L}_{KSASA}

The overall execution on the GPU follows the scheme given in Algorithm 1.

Once a contingency table passes the KSASA pre-filter, the computationally more expensive KSA measure is calculated and compared to its respective, user-supplied threshold. The original BOOST and GBOOST [YYW+11] implementations feature a downstream “log-linear test”, where the actual model M_H is evaluated iteratively. As the application's goal is to provide a drop-in replacement for the original software, the implementation presented here also features an M_H evaluation as the last and most expensive filter in the pipeline. For further processing, a positive score is only delivered to the system if it passes this final test; otherwise it is set to zero.

Algorithm 1: GPU Filter Chain

Input: Thresholds τ1, τ2 and τ3 for the log-linear test, KSA and KSASA; a contingency table T
Result: L_S − L_H as an interaction measure

Score ← 0.0
if KSASA(T) ≥ τ3 then
    if KSA(T) ≥ τ2 then
        Score ← LogLinear(T)
        if Score < τ1 then
            Score ← 0.0
return Score
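A host-side C++ sketch of the same chain is given below; KSASA(), KSA() and LogLinear() are hypothetical stand-ins for the respective measure evaluations and do not reflect the actual kernel interface.

// Illustrative filter chain mirroring Algorithm 1 (all names hypothetical).
struct ContingencyTable;                       // opaque table type
double KSASA(const ContingencyTable&);         // unnormalized pre-filter
double KSA(const ContingencyTable&);           // Kirkwood approximation
double LogLinear(const ContingencyTable&);     // iterative log-linear test

double filterChain(const ContingencyTable& T,
                   double tau1, double tau2, double tau3) {
    double score = 0.0;
    if (KSASA(T) >= tau3 && KSA(T) >= tau2) {
        score = LogLinear(T);        // most expensive test, evaluated last
        if (score < tau1)
            score = 0.0;             // zero scores are dropped downstream
    }
    return score;
}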

4.3.4 Mutual Information Calculation

The mutual information is a commonly used measure of the mutual dependence of two or more random variables. As described in Section 3.2.3, this implementation uses the mutual information to describe the interaction between three SNPs. It operates on 2 × 3 × 3 × 3 contingency tables generated by the FPGA and yields an “amount of information” about one random variable that can be obtained through another random variable. The result is also called relative entropy [CT06]. Regarding the actual application, one

random variable is the allele distribution in a SNP combination and the second variable is the actual outcome, i.e. whether the distribution relates to the case group or the control group. Without additional knowledge, the case/control variable is uniformly distributed with maximum uncertainty. Calculating the mutual information between these two variables reduces this uncertainty and might reveal a distribution bias, and can therefore give an indication whether a SNP combination influences the occurrence of the trait that has been used to divide individuals into cases and controls. The mutual information for this case is defined as in Section 3.2.3, but can trivially be transformed into equations that are easier on the resources. Here,

Ź n_l is the number of samples per group (with l = 0 for cases and l = 1 for controls)

Ź π_ijkl and n_ijkl are the relative allele frequencies and absolute allele occurrences, respectively, with respect to a contingency table. The field indices i, j, k and l ∈ {0, 1} are defined as previously used.

Ź H(X1, X2, X3) is the joint entropy of the allele frequency tables of three SNPs

Ź H(X1, X2, X3, Y) is the joint entropy of the allele frequencies of a combination of SNPs including the trait status (case or control)

\[
n_{ijkl} = \pi_{ijkl} \cdot n_l
\]
\[
I(X_1, X_2, X_3; Y) = H(X_1, X_2, X_3) + H(Y) - H(X_1, X_2, X_3, Y)
\]
\[
H(X_1, X_2, X_3) = -\sum_{ijk} \frac{n_{ijk0} + n_{ijk1}}{n} \log \frac{n_{ijk0} + n_{ijk1}}{n}
\]
\[
H(X_1, X_2, X_3, Y) = -\sum_{ijkl} \frac{n_{ijkl}}{n} \log \frac{n_{ijkl}}{n}
\]
\[
H(Y) = -\frac{n_0}{n} \log \frac{n_0}{n} - \frac{n_1}{n} \log \frac{n_1}{n}
\]


The actual implementation only calculates the joint entropies and their difference, i.e. H(X1, X2, X3) − H(X1, X2, X3, Y). The case/control entropy H(Y), on the other hand, consists only of values that are constant with respect to the data set. It is therefore a data-independent offset that does not influence the natural order of a sorted result list. Hence, the offset is only added to those few results that will eventually be reported back to the user, removing some computational burden from the GPU. It is worth noting that, as opposed to the factors in BOOST, no larger products over floating-point numbers are required, so higher result precision can be expected as the GPU uses a floating-point implementation based on IEEE 754. Additionally, calculating logarithms is usually implemented as an iterative Taylor-series approximation [Ori07] and is therefore a rather complex operation when compared to others. In the special case of contingency tables though, when factoring out the quotient n in the above equations, the logarithm only has to be calculated for integers in the range [0..n], which makes a look-up table feasible instead of occupying the limited number of special function units (SFUs) and waiting for complex arithmetic to finish. Hence, a look-up table with n + 1 entries of double precision values is pre-computed on the host system and then moved to a cache-efficient read-only memory section on the GPU that the CUDA cores may take advantage of. Furthermore, NVIDIA's logarithm implementation is known to produce results with rather low precision. Hence, the logarithm look-up can be expected to additionally increase the quality of the results.
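As an illustration of the look-up table idea, a host-side sketch in C might look as follows; function and variable names are made up for this example, and the production code runs on the GPU with the table placed in read-only device memory.

#include <math.h>
#include <stdlib.h>

/* Pre-compute log(c) for all integer counts 0..n once on the host.
 * Entry 0 is set to 0.0 so that terms of the form c*log(c) vanish for c = 0. */
static double *build_log_lut(unsigned n)
{
    double *lut = malloc((n + 1) * sizeof *lut);
    if (!lut)
        return NULL;
    lut[0] = 0.0;
    for (unsigned c = 1; c <= n; ++c)
        lut[c] = log((double)c);
    return lut;
}

/* H(X1,X2,X3) - H(X1,X2,X3,Y) from absolute counts. counts[l][i][j][k] is the
 * 2x3x3x3 contingency table (l = 0 cases, l = 1 controls), n the total number
 * of samples. After factoring out 1/n, only log(c) for integer counts c is
 * needed, which the look-up table provides; the log(n) offsets cancel in the
 * difference. */
static double entropy_difference(const unsigned counts[2][3][3][3],
                                 unsigned n, const double *log_lut)
{
    double sum_joint = 0.0;   /* sum of c*log(c) over all 2*27 cells          */
    double sum_snps  = 0.0;   /* sum of m*log(m) with m = n_ijk0 + n_ijk1     */

    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) {
                unsigned c0 = counts[0][i][j][k];
                unsigned c1 = counts[1][i][j][k];
                sum_joint += c0 * log_lut[c0] + c1 * log_lut[c1];
                sum_snps  += (c0 + c1) * log_lut[c0 + c1];
            }

    /* H(X1,X2,X3) - H(X1,X2,X3,Y) = (sum_joint - sum_snps) / n */
    return (sum_joint - sum_snps) / n;
}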

4.3.5 Result Reporting

Both implemented methods store their results in the same way. This allows the host system to operate on these data without distinguishing between the various methods and/or kernels that could have been used to create them. Both methods can be invoked with either IEEE 754 single precision or double precision floating-point numbers. The possible result frames that are delivered back to the host are shown in Figure 4.10. They always contain three SNP IDs that uniquely identify the underlying contingency table. This is required to enable the host to associate the score with an actual SNP triplet.


[Figure 4.10. GPU result formats: each frame holds the IDs of SNP A, SNP B and SNP C (32 bit each), followed by either a single-precision or a double-precision score.]

The fourth field contains the actual score, using 32 bit or 64 bit. A large number of tests has shown that, when ordered by score, the ordering of results does not change depending on the width of the result score. As BOOST and the Mutual Information measure are primarily designed as upstream screening applications, higher precision may be of little use. However, if results are required to have a precision higher than approximately 10^−6, runtime may be traded for precision, although data center GPU models with a large number of dedicated double precision floating-point units could make the difference in runtime negligible. On a CPU, data can be accessed most efficiently if their address is a multiple of the word size. For single precision, this is fairly trivial as the ID fields also come in widths of 32 bit, so no shifting could move the score field to an unaligned position, assuming the whole buffer starts at such a boundary. For double precision, the value should be aligned to a 64 bit boundary. At the cost of 32 wasted padding bits, the double precision score can be properly aligned. Once all blocks of the grid have been executed, the on-chip caches are flushed and the results are ready to be copied from the device memory into the host memory for final filtering and processing.
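On the host, the two frame layouts of Figure 4.10 can be mirrored with plain C structures; the field names and the position of the padding word are illustrative.

#include <stdint.h>

/* Single-precision result frame: three 32-bit SNP IDs and a 32-bit score,
 * 16 bytes in total; the score is naturally aligned to 4 bytes. */
struct result_sp {
    uint32_t snp_a;
    uint32_t snp_b;
    uint32_t snp_c;
    float    score;
};

/* Double-precision result frame: 32 padding bits keep the 64-bit score on an
 * 8-byte boundary, at the cost of 4 wasted bytes per result. */
struct result_dp {
    uint32_t snp_a;
    uint32_t snp_b;
    uint32_t snp_c;
    uint32_t padding;
    double   score;
};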


4.4 Host Drivers

On the host computer system, more specifically in its operating system, a layer of driver software connects the hardware with the kernel's abstraction layer. As each hardware module is unique and often does not implement standardized interfaces, the hybrid system makes use of a plethora of drivers, of which the most important ones will be described. The kernel's abstraction layer helps developers to write software for generic hardware, such as “sound cards”. Of course, the “sound cards” themselves are not generic, but the kernel tries to provide a common interface that makes specific sound cards work without the software requiring to know the exact brand and type. The functions specific to sound cards are kept together in a common interface. While this layer is certainly useful, it only makes sense for hardware devices that share a common set of functions. On many occasions, hardware exposes special functions that do not belong to this set, such as selecting equalizer presets in sound cards. Drivers for special-purpose hardware may even choose not to use the abstraction layer at all. To handle these non-standard functions, the respective drivers are usually delivered with an Application Programming Interface (API) that seeks to hide the complexity involved in driver interaction. The NVIDIA graphics card in this setup uses a closed-source hardware driver and an extensive closed-source API to be used by applications. Apart from the API interface specification, programming guides and rough hardware descriptions, no documentation about the inner workings is available to the public. For the Virtex 7-series FPGA board, Alpha Data provided an open-source kernel driver and an API, while the Kintex 7 board came with neither a driver nor an API. Although the Alpha Data driver exposes much more functionality, the custom driver and software interfaces follow the same principles. Therefore, only these will be described in the following sections. In Linux-based operating systems, programs can roughly be divided into two realms: those running in user-space and those running in kernel-space. User-space programs are run by users, terminated by the operating system in case of malfunctioning, and closed when done or when requested by the user.

The operating system provides various services to applications during set-up, during runtime and when the program terminates (for whatever reason), such as freeing claimed memory resources and closing left-open file descriptors. In general, care is taken that applications do not cause harm to other applications or to the system itself, and that resource exhaustion due to misbehaving or badly programmed software is limited. In kernel-space, driver modules resemble regular programs in that they may be loaded into and removed from the kernel during runtime. The major difference is that these modules are directly linked into the running kernel and are hence not stand-alone programs anymore. Instead of merely being associated with the kernel, they become a part of it. Therefore, the usual address space separation techniques along with their protection mechanisms do not apply anymore. There is no higher-layer entity that cleans up after a module or terminates it in case of memory corruption. When a module writes to a memory location that it is not supposed to write to, the whole system may lock up or, in case the memory location belongs to a file system or hard disk driver, silent data corruption may occur, which is usually seen as a far worse consequence than a full system lock-up. Due to this very low degree of protective measures, kernel driver development requires great care and responsibility, while only very few assumptions about the environment can be made as opposed to writing user-space programs. To support driver developers, Linux provides a large library of commonly used data structures and well-established algorithms and puts a strong focus on separation and encapsulation of problems. This leads to software designs that diverge heavily from regular user-space application design principles. Regular permissions that can be given to users and superusers do not apply to the kernel. Instead, it runs in a special environment with almost unrestricted access to other drivers, subsystems and bare hardware. For reasons of stability and responsibility, as stated above, drivers should only contain code that absolutely has to run within kernel-space. This leads to the development of user-space interfaces, so that software parts which do not require kernel privileges can be kept in well-defined and operating-system-protected APIs in user-space. Of course, this applies to the Kintex FPGA's driver as well.


4.4.1 Interfacing

To protect the running kernel from user-space applications, the only means of communication between both realms is the system call interface. System calls are functions exported by the kernel and callable by applications; they provide access to the file system (for example via open or read), the networking subsystem (bind, accept, send), scheduling, process management and many other areas. In Linux, these calls follow a very specific call semantic. For every exported function, a unique number is allocated. When an application wishes to issue a system call, it places this number and the call's parameters in certain processor registers and executes a special instruction, TRAP. This instruction transfers control of execution to the kernel and jumps to a fixed location where a system call dispatcher handles the request. After completion, control is returned to the application and execution continues at the instruction following TRAP [TB14, Chap. 1.6]. It is common among drivers to intercept system calls to the file system for direct user-space communication. Linux itself provides functionality to do so by supporting special device files that typically reside within the /dev/ directory in the file system tree. A driver module may therefore establish a special device file and associate a handler with it. When an application issues system calls that relate to that file, the dispatcher redirects the request to the driver module instead of the regular file system layer. The KC705 driver makes heavy use of this interface. Besides these file-system endpoints, the kernel expects modules to provide certain entry points for management. The complete set of defined entry points is summarized in Table 4.5. The init, exit and remove functions will not be further discussed as they only perform kernel-related housekeeping. When the driver is loaded into the kernel, it registers itself as a service operator for a given class of devices. Whenever such a device is plugged in, the kernel iterates over all modules with a matching class until the first module claims the yet unclaimed and therefore uninitialized device. This process, called probing, also takes place after a module has been loaded while unclaimed devices of the specified class exist. The probe function of the Kintex FPGA module claims a device if its reported vendor/device identifier is 10EE/7082.
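As a rough orientation, a PCI driver of this kind announces its device class and its file-system entry points roughly as in the following skeleton; the actual KC705 sources are not reproduced here, so all kc705_* names are illustrative and only the standard Linux interfaces themselves are real.

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/fs.h>

static int  kc705_probe(struct pci_dev *pdev, const struct pci_device_id *id);
static void kc705_remove(struct pci_dev *pdev);
static int  kc705_open(struct inode *inode, struct file *file);
static int  kc705_release(struct inode *inode, struct file *file);
static ssize_t kc705_read(struct file *file, char __user *buf, size_t len, loff_t *off);
static ssize_t kc705_write(struct file *file, const char __user *buf, size_t len, loff_t *off);
static long kc705_ioctl(struct file *file, unsigned int cmd, unsigned long arg);

/* Match the Xilinx default vendor/device identifier 10EE/7082. */
static const struct pci_device_id kc705_ids[] = {
    { PCI_DEVICE(0x10EE, 0x7082) },
    { 0 }
};
MODULE_DEVICE_TABLE(pci, kc705_ids);

/* User-space entry points behind the special device file (cf. Table 4.5);
 * this table is registered during probing together with the device file. */
static const struct file_operations kc705_fops = {
    .owner          = THIS_MODULE,
    .open           = kc705_open,
    .release        = kc705_release,
    .read           = kc705_read,
    .write          = kc705_write,
    .unlocked_ioctl = kc705_ioctl,
};

/* Kernel-facing entry points: probing and removal of matching devices. */
static struct pci_driver kc705_driver = {
    .name     = "kc705",
    .id_table = kc705_ids,
    .probe    = kc705_probe,
    .remove   = kc705_remove,
};
module_pci_driver(kc705_driver);
MODULE_LICENSE("GPL");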


Table 4.5. KC705 driver entry points

Entry point  Called by  Description
init         kernel     Performs module initialization and registration tasks,
                        called right after module insertion
exit         kernel     Clean-up and de-registration, called right before
                        module removal
probe        kernel     Called when a device has been inserted/found, is
                        unclaimed and matches the module's class. Performs
                        device-specific initialization such as buffer set-up
                        and installing new file system entry points.
remove       kernel     Called when a device has been removed from the system
open         user       Registers an application for exclusive access to the
                        FPGA device and establishes a handle
close        user       Releases an application's access handle
write        user       Schedules a user-supplied buffer for DMA transmission
                        to the FPGA
read         user       Queues a user-supplied target buffer for DMA
                        transmission from the FPGA
ioctl        user       Requests status information on read or write queues,
                        can set or get metadata for a transmission
inthandler   device     The driver is notified that a transmission has
                        completed

This vendor/device identifier is the default for generic, DMA-capable devices manufactured by Xilinx Corporation and is managed by a group of volunteers [PMV]. In the probing stage, the device is set up following a specific protocol (a simplified code sketch follows the list):

1. Formally enable the device. Basic communication structures are allocated and initialized by the PCI Express subsystem. The device is notified that it should start its initialization routines. The chip set and the device agree on initial PCI Express parameters, such as link width, frequencies and protocol generation as explained in Sect. 2.2.2. Although dependent on the actual hardware and kernel version, these are often set to the lowest permissible values, i.e. generation 1.0 and one single lane.

2. Enable bus-mastering. The device may now provide its DMA controller services to the system bus.

3. Initialize regions. The PCI Express controller in the FPGA design contains a memory region that is now exposed to the host system. It is used for configuration purposes, status reporting and DMA control. Furthermore, it provides a low-bandwidth side channel for data when a DMA channel is not yet available.

4. Allocate file system entry points for communication between a user-space application and the driver module. These entry points, already shown in Table 4.5, encapsulate a large portion of the driver complexity and are discussed in more detail in the next section.

5. Install and activate interrupt sources. The device may now actively send notifications to the driver. This is used to signal the end of a DMA transfer.

6. Re-negotiate link speed and lane count. The engine parameters are set to the highest values supported by all involved components of the hybrid prototype.
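Under the assumption that the standard Linux PCI helpers are used, this probing sequence roughly corresponds to the following fragment; error handling, roll-back and the character-device set-up are omitted, and all kc705_* identifiers are again illustrative.

#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t kc705_inthandler(int irq, void *dev);

static int kc705_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    void __iomem *regs;

    /* 1. Formally enable the device. */
    pci_enable_device(pdev);

    /* 2. Enable bus-mastering so the on-board DMA controller may drive the bus. */
    pci_set_master(pdev);

    /* 3. Initialize regions: map the configuration/status BAR of the endpoint. */
    pci_request_regions(pdev, "kc705");
    regs = pci_iomap(pdev, 0, 0);

    /* 4. The file system entry points of Table 4.5 would be installed here. */

    /* 5. Install and activate an interrupt source for DMA completion events. */
    pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI | PCI_IRQ_LEGACY);
    request_irq(pci_irq_vector(pdev, 0), kc705_inthandler, 0, "kc705", pdev);

    /* 6. Link re-negotiation is device specific and handled via the endpoint's
     *    configuration registers (not shown). */
    (void)regs;
    return 0;
}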

When these steps have been performed successfully, the device should be fully operational. In some situations, one or more of these steps may fail, for example due to hardware errors or resource exhaustion on the host system. In such a case, all operations that have already been performed have to be rolled back and the corresponding resources released to ensure a defined system state. The probe function then indicates an error and the device is left unclaimed.

4.4.2 Data Transfer

The most demanding function of the driver interface is high-speed data transmission between a user-space application and the FPGA device. For the application endpoint, a special device file is installed in the system. The KC705 module intercepts all accesses to this file and allows communication via the operating system's standard I/O facilities. This approach has already been summarized in Sect. 4.4.1. Before an application may use the module's and therefore the device's capabilities, the driver requires it to first issue an open system call on the device file. For our hybrid prototype, concurrent access by multiple processes is not desired. Therefore, a binary semaphore is established immediately, as file system entry points are generally re-entrant, i.e. many processes may open a file at the same time and the execution threads of the open method run interleaved or in parallel. As a binary semaphore only allows a single thread to continue, any other thread is denied access until the ownership of the device file is explicitly released. As exclusive access is now guaranteed, a device handle is issued to the process so that it can be uniquely identified in subsequent system calls. No other process can acquire a handle until the release call is invoked. Furthermore, no process may successfully call other functions on the file without providing a valid handle. This is particularly true for release. For the actual transmission of data, the device file also supports read and write operations that, as the following paragraphs will explain, are implemented in a very similar way. Before discussing the exact semantics, it is important to recall how the FPGA is supposed to operate the PCI Express endpoint. A somewhat simplified PCI Express/DMA interface on the FPGA is shown in Listing 4.1.


Listing 4.1. Simplified PCI Express/DMA endpoint

entity PciExpressEndpoint is
    port (
        s2c_valid    : in  std_logic;
        s2c_ready    : out std_logic;
        s2c_data     : in  std_logic_vector(255 downto 0);
        s2c_metadata : in  std_logic_vector(31 downto 0);
        c2s_valid    : out std_logic;
        c2s_ready    : in  std_logic;
        c2s_data     : out std_logic_vector(255 downto 0);
        c2s_metadata : out std_logic_vector(31 downto 0)
    );
end entity PciExpressEndpoint;

It can be seen that data that is to be sent or received is stored in 256 bit-wide registers. The transfer itself is managed by the handshake signals ready and valid that are asserted high whenever the terminal is ready to handle a new data word or the currently presented data word is actually valid, respectively. The overall behavior can therefore be described as a streaming operation with handshake. In programs written in imperative programming languages such as C or C++, it is not only cumbersome but highly inefficient to handle a large number of 256 bit-sized buffers. Instead, a single, large memory buffer should be supplied to the driver where the DMA engine may store 256 bit words at consecutive word addresses. In terms of handshake, s2c_valid and/or c2s_ready should be asserted high whenever the DMA engine has claimed a buffer for a (host-side) write or read operation, respectively. The “conversion” between stream-oriented and packet-oriented transmission has an undesired consequence: the FPGA may not write data to the host whenever it would be ready to. Instead, the host system is required to know exactly when to expect data from the FPGA, as it has to actively supply a target data buffer. The other direction is less counter-intuitive: whenever s2c_valid is high, the FPGA application “knows” that data is present.


The naïve implementation approach would be to supply a buffer to the DMA endpoint directly after a successful probe and provide it to the client application as soon as a read call is invoked. As previously mentioned, the kernel provides address space separation and transparent process management for all user-space applications, but not for code running within the kernel. Hence, while a driver can certainly allocate memory within the kernel, it cannot easily be moved to the application, or vice versa. Moving memory between kernel-space and user-space implies moving data between physical memory and virtual memory. The only way to move data here is a byte-by-byte copy mechanism provided by the virtual memory subsystem, which would introduce a tight performance bottleneck. The reason why we use DMA in the first place is to remove the burden of copying data, which is exactly the burden we would re-impose with kernel/user-space data movement. In Sections 2.1.1 and 2.2.1, the foundations of virtual memory and DMA transfers have been explained in detail. DMA scatterlists are lists of physical memory pages that ought to be transferred. Fortunately, the virtual memory subsystem also provides functions to inspect memory regions and page tables, which contain the mapping between virtual memory and physical memory. Note, though, that this description of page tables is highly simplified, but this abstract knowledge will be sufficient for our purposes. Therefore, a more sophisticated approach than kernel allocation and copying has been chosen. The KC705 module makes use of the complex get_user_pages kernel API function as it is defined in Listing 4.2.

Listing 4.2. Definition of get_user_pages from the Linux API documentation

int get_user_pages(struct task_struct *tsk,
                   struct mm_struct *mm,
                   unsigned long start,
                   int nr_pages,
                   int write,
                   int force,
                   struct page **pages,
                   struct vm_area_struct **vmas);


The Linux Kernel API Documentation for get_user_pages reads as follows:

Returns number of pages pinned. [. . .] get_user_pages walks a process’s page tables and takes a reference to each struct page that each user address corresponds to at a given instant. That is, it takes the page that would be accessed if a user thread accesses the given user virtual address at that instant.

Although the documentation, and certainly the idea of get_user_pages too, focus on the aspect of pinning, the function also returns the list of pages that have been mapped into the user memory block given via the mm parameter. Fortunately, as the memory is also pinned, it is guaranteed that every address that is accessible in the buffer is actually backed by a physical memory page. Otherwise, it could have been paged out to disk or not even have been allocated in case some memory pages had not been accessed before. The returned structures that store information about the process's pages also contain the physical memory address that is required to fill a DMA scatterlist. The pathway between a specific location in physical memory and a virtual memory address does not only involve page table translations but also makes heavy use of caching. Hence, after directly writing to the respective physical memory (i.e. from an FPGA), user page caches are required to be flushed to prevent lost updates or memory corruption in general. For the same reason, the user-space application may not access the memory block in any way until the memory is un-pinned and released. Not only could lost updates occur, but such accesses may also trigger new caching efforts of the operating system, possibly further corrupting data. Even read-only access therefore has to be suppressed completely. Depending on whether the requested operation is a read or a write, the list of memory pages is then queued up in the read page descriptor ring or the write page descriptor ring. Whenever the FPGA's DMA controller signals that it is ready to perform a transaction, and the respective ready or valid signals are asserted, the scatterlist is loaded with entries from one of the ring buffers. As the DMA engine proceeds to process one page after another, its scatterlist is refilled from the ring. By using this technique, only one long-running DMA transaction is required, at least theoretically.
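The pinning and scatterlist construction could be written roughly as follows; the exact get_user_pages variant and its signature differ between kernel versions, all error handling is omitted, and the function name is illustrative, so this is only a sketch.

#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

/* Pin a user-space buffer and build a scatterlist for DMA (illustrative only).
 * 'dev' is the device behind the PCIe endpoint, 'uaddr' and 'len' describe the
 * buffer handed to read() or write(). */
static int kc705_pin_and_map(struct device *dev, unsigned long uaddr, size_t len,
                             struct page **pages, struct sg_table *sgt,
                             enum dma_data_direction dir)
{
    unsigned int n_pages =
        (offset_in_page(uaddr) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;

    /* Walk the page tables and take a reference on every backing page,
     * thereby pinning the buffer in physical memory. */
    get_user_pages_fast(uaddr, n_pages,
                        dir == DMA_FROM_DEVICE ? FOLL_WRITE : 0, pages);

    /* Describe the pinned pages as a scatterlist and hand it to the DMA API,
     * which yields the bus addresses the DMA engine needs. */
    sg_alloc_table_from_pages(sgt, pages, n_pages,
                              offset_in_page(uaddr), len, GFP_KERNEL);
    dma_map_sg(dev, sgt->sgl, sgt->nents, dir);

    return 0;
}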

When processing stops, which may happen if the FPGA is not ready to take more data, the start pointer of the ring buffer is moved back to the last page that has not been successfully transmitted and the scatterlist is purged so that the system memory bus can be released by the controller. The scatterlist itself is stored into the DMA engine through the PCI Express configuration space (see Sect. 2.2.2). Every time the DMA engine has successfully processed a scatterlist entry or stopped processing, an interrupt is sent to the driver announcing a change of state. The state itself is then retrieved from the configuration space, and the driver decides whether the scatterlist should be refilled or the transfer should be partly rolled back or completely aborted. Historically, motherboard chip sets provided dedicated interrupt lines for every peripheral, so the CPU could be notified asynchronously. Without interrupt lines, peripherals had to be polled periodically for status changes, which is not only inefficient but also makes real-time behavior, as required by sound cards, hard or impossible to achieve. Interrupt lines are aggregated by an interrupt controller which in turn is connected to a CPU. On the software side, Linux therefore only needs to implement a single function to communicate with the interrupt controller. When a driver registers a function to handle interrupts, a pointer is placed into the interrupt handler list. On interrupt, each registered handler is called until the first handler indicates that the interrupt has been handled and can be acknowledged on the interrupt controller interface. A new interrupt may hence only arrive after the previous one has been acknowledged via controller handshake. Thus, interrupt processing blocks subsequent interrupts. In Linux, it is therefore advised to split an interrupt handler in half if it is considered a lengthy task that could seriously delay other, possibly timing-critical interrupts [LDD3, pages 275ff.]. In the so-called top half, the interrupt is acknowledged and timing-critical operations may be performed. It then declares a bottom half that performs lengthy operations such as data input/output to/from PCI Express configuration memories or page table processing. Before the top half returns control to the interrupt context, the system scheduler is instructed to execute the bottom half in a less critical execution context. This way, long-running tasks can be processed on interrupt events without delaying subsequent notifications, at the expense of much higher programming and development complexity.


When the full scatterlist associated with the user buffer has been processed by the DMA engine, the pages are un-pinned and the user-space application may use the released memory again. If an application calls write, waits until the data is written and then calls write again to send another memory buffer, some time may pass without transmitting data. To maximize data throughput, write and read have therefore been designed to operate non-blocking, i.e. they return control back to the application as soon as the memory buffer has been successfully enqueued in the respective transmission ring. Applications may then schedule more buffers until the transmission ring is full, theoretically keeping the DMA engines running for an unlimited amount of time. The downside is that the application is responsible for not corrupting the associated memory by accessing it and thereby activating caching mechanisms. As write and read cannot signal the transmission's completion, a special ioctl entry point has been implemented. ioctl is considered a universal file system I/O call with custom functionality. For clarification, its manual reads as follows:

int ioctl(int fd, unsigned long request, ...);

The ioctl() function manipulates the underlying device parameters of special files. [. . .] The second argument is a device-dependent request code. The third argument is an untyped pointer to memory. It's traditionally char *argp (from the days before void * was valid C), and will be so named for this discussion. An ioctl() request has encoded in it whether the argument is an in parameter or out parameter, and the size of the argument argp in bytes. [. . .] [It is] conforming to no single standard. Arguments, returns, and semantics of ioctl() vary according to the device driver in question (the call is used as a catch-all for operations that don't cleanly fit the stream I/O model).

In this driver, it has been used to query the status of both transmission rings. Therefore, the module defines the IOCTL_KC705_TX_STATUS and IOCTL_KC705_RX_STATUS constants as request identifiers.

The number of free transmission buffer entries is returned. As one entry describes a single memory page and a page is fixed to 4 096 bytes on the hybrid prototype's processor, the caller may calculate the actual amount of memory that may be queued. As a security precaution, a buffer may not be queued in parts. That is, even if some pages could be queued, queueing is rejected if there is not enough space to hold all pages associated with the buffer. Due to the sensitive nature of execution in kernel-space, several more measures are taken that relate to the safety and validity of supplied user data. A bad address supplied as a memory pointer might very well bring the whole system down or, worse, cause silent data corruption, as has been previously mentioned. A great amount of responsibility therefore lies on the programmer, specifically when designing user-space interfaces for driver modules. Furthermore, it is worth noting that write and read may be called concurrently. Their processing pathways and data structures in the kernel module are kept strictly separate to allow unhindered full-duplex transfers. At no time, though, may two writes or two reads be called concurrently. One of the two calls will return with a “Device or resource busy” error code, defined in the Linux standard libraries as EBUSY.
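From the user-space side, the status query could be issued as in the following fragment; the numeric request codes and the exact way the count is returned are not specified in this chapter, so both are assumptions.

#include <stdio.h>
#include <sys/ioctl.h>
/* #include "kc705_ioctl.h" -- hypothetical driver header defining
 *                             IOCTL_KC705_TX_STATUS and IOCTL_KC705_RX_STATUS */

/* Returns how many bytes can currently be queued for transmission, assuming
 * the driver reports the number of free ring entries (one 4 096 byte page
 * each) through the pointer argument. */
static long queueable_tx_bytes(int fd)
{
    int free_entries = 0;

    if (ioctl(fd, IOCTL_KC705_TX_STATUS, &free_entries) != 0) {
        perror("ioctl");
        return -1;
    }
    return (long)free_entries * 4096L;
}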

4.4.3 Application Programming Interface

To ease application development, an application programming interface (API) has been developed. Its main purpose is to hide the interface complexity of the driver, making development more approachable. This section describes the noteworthy internals and presents the interface that applications should use for FPGA connectivity. It is written in the C programming language, in its standard version ISO/IEC 9899:1999, also known as C99. The core functionality of the API is split into two parts: the client functions and the polling thread. The latter is responsible for polling the ioctl interface as described in the previous section. The former are the actual exported functions.

Listing 4.3. KC705 open and close function prototypes


int kc705_open(const char *device_filename, struct kc705_handle **new_handle, FILE *log);

void kc705_close(struct kc705_handle *hdl);

Before an application may communicate with the FPGA, a device handle has to be obtained by calling kc705_open as defined in Listing 4.3. On success, the FPGA device identified by the device file device_filename is opened and a handle is stored in the pointer given in the new_handle parameter. Optionally, a file handle may be supplied as a logging target; warnings and error messages are then written to the given descriptor. If no logging file is supplied, the respective messages are written to the standard output stream, usually the terminal from which the application has been started. Typical “open functions”, such as the Standard C Library's fopen, return the handle directly and use a separate global variable as an error indicator. Instead of polluting the variable namespace, every KC705 API function returns an error code, while actual values that might be of interest to the application are placed in pointed-to structures supplied by the caller. The struct kc705_handle is a complex structure containing many internals that are not of interest to the user. Therefore, the structure is realized as an opaque type, i.e. a type that is formally declared without data members or values. Hence, it is impossible for an application to access any of the structure's members. The actual list of members is declared within the non-public library headers, so the API implementation can make use of them anyway. Internally, the call to kc705_open creates a thread using the operating system's POSIX Threading library. Otherwise, it would be the application's responsibility to regularly poll the ioctl interface whenever a buffer is expected to be processed. How exactly the polling thread is implemented will be explained in the following paragraphs.

Listing 4.4. KC705 asynchronous read and write function prototypes

int kc705_write_async(
    struct kc705_handle *hdl,
    void *buffer,
    size_t buffer_length,
    unsigned long metadata,
    void (*callback)(void *buffer, size_t bytes_written,
                     void *user_data, unsigned long metadata),
    void *user_data);

int kc705_read_async(
    struct kc705_handle *hdl,
    void *buffer,
    size_t buffer_length,
    void (*callback)(void *buffer, size_t bytes_read,
                     void *user_data, unsigned long metadata),
    void *user_data);

Listing 4.4 shows the prototype definitions of the basic read and write functions of the exported KC705 API. The first parameter specifies a device handle that has been previously obtained by successfully calling kc705_open. buffer and buffer_length provide a memory region whose contents should be written to the FPGA or be written to by the FPGA. After calling one of these functions, the buffer address is given to the device driver, which pins the memory and queues it for DMA transfer. These functions do not wait until the transfer request has been processed. Their return status simply indicates whether the buffer has been successfully queued for transmission. If that is the case, the buffer must not be accessed until it is released by the driver. The callback parameter allows a user application to register a function that is called by the KC705 API when the supplied buffer has been processed. That is, the buffer may not be accessed until the callback notifies the completion (or failure) of the transfer. The callback function is given a set of parameters such as the buffer address, the number of bytes that have actually been written or read, and a user_data field that contains the arbitrary value supplied in the user_data parameter of the original API call. This allows the application to correlate the invocation of the callback with the respective API call more easily.
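A hypothetical application could drive the asynchronous interface as sketched below; the device path, the buffer handling and the assumption that a return value of 0 means success are illustrative, only the prototypes from Listings 4.3 and 4.4 are taken from the text.

#include <stdio.h>

/* Prototypes as introduced in Listings 4.3 and 4.4 (normally provided by a
 * library header). */
struct kc705_handle;
int  kc705_open(const char *device_filename, struct kc705_handle **new_handle, FILE *log);
void kc705_close(struct kc705_handle *hdl);
int  kc705_write_async(struct kc705_handle *hdl, void *buffer, size_t buffer_length,
                       unsigned long metadata,
                       void (*callback)(void *buffer, size_t bytes_written,
                                        void *user_data, unsigned long metadata),
                       void *user_data);

/* Called from the API's polling thread once the buffer has been transferred;
 * it must be short and hand real work over to the application's own threads. */
static void on_written(void *buffer, size_t bytes_written,
                       void *user_data, unsigned long metadata)
{
    (void)buffer; (void)metadata;
    fprintf((FILE *)user_data, "DMA write of %zu bytes completed\n", bytes_written);
}

int main(void)
{
    struct kc705_handle *hdl;
    static char buffer[1 << 20];   /* 1 MiB of SNP data (placeholder contents) */

    if (kc705_open("/dev/kc705", &hdl, stderr) != 0)   /* device path is assumed */
        return 1;

    /* The buffer must not be touched again until on_written() has fired. */
    kc705_write_async(hdl, buffer, sizeof buffer, /* metadata */ 0, on_written, stderr);

    /* ... queue further buffers, wait for the callbacks ... */
    kc705_close(hdl);
    return 0;
}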


[Figure 4.11. KC705 API threading diagram: the application thread calls kc705_open, kc705_write_async (passing a callback) and finally kc705_close, while the API thread invokes the application-specific callback once the transfer has been processed.]



In addition to these parameters, an application may pass a metadata value that is kept constant throughout the transfer and delivered in a side channel to the FPGA, where it is available in the first clock beat in which s2c_valid is asserted high. The FPGA interface also allows transmission of metadata to the host. Obviously, this value is not available until the data has been retrieved from the FPGA and therefore cannot be delivered in the API call. Instead, the registered callback function of kc705_read_async receives this value along with the actual data. This metadata functionality can be used to transfer higher-level protocol data or to encode other side-channel information. The callback functions are called by a special thread created in the kc705_open function call. Its purpose is to regularly inquire about the buffers' transmission statuses, relate page lists to buffers and eventually initiate callbacks to the application. It therefore consists of a delayed loop that calls ioctl to receive the list of pages in the transmission and reception rings. Then, the lists are compared against the list of user-supplied buffers. As soon as a buffer has been processed, i.e. the page list corresponding to a buffer has been completely removed from either the transmission or reception list, the associated callback function is executed as shown in the threading diagram in Figure 4.11. This function is executed in the KC705 API's thread context. The application is therefore required to establish its own thread communication mechanism. Furthermore, the function call delays the polling loop and should be as short as possible, similar to the kernel interrupt routines described earlier. To maximize throughput, several writes and reads can be initiated without waiting for the completion of previous transmissions. The respective callbacks are executed in order, which further emphasizes the focus on runtime efficiency of the callback functions. To reduce the amount of set-up code and to ease development, two convenience functions have been introduced that encapsulate the read/callback and write/callback mechanisms in synchronous forms. The corresponding prototypes are shown in Listing 4.5.

Listing 4.5. KC705 synchronous convenience wrappers for reading and writing


int kc705_write(struct kc705_handle *hdl, void *buffer, size_t buffer_length, size_t *bytes_written, unsigned long metadata);

int kc705_read(struct kc705_handle *hdl, void *buffer, size_t buffer_length, size_t *bytes_read, unsigned long *metadata);

These functions can be used much more intuitively and do not require a deep understanding of function pointers in the C programming language. While the asynchronous versions introduced earlier can lead to an unstable system if the user application accidentally or maliciously accesses a pinned buffer, the synchronous wrapper functions hide this complexity and render such corruption virtually impossible. For a prototype system though, where performance potential is more important than security, access to the asynchronous functions is fundamental, as it is not possible to queue up multiple buffers when using only the synchronous functions. As shown in Listing 4.6, the KC705 API uses internal callback handlers for reads and writes. To synchronize communication between the callback thread and the application thread, a mutual exclusion primitive (mutex) and a condition variable are employed. The former is implemented as a binary semaphore, which can only be locked by one thread at a time. If any other thread tries to lock the mutex, its execution is paused by the scheduler until the mutex is unlocked. Then, a waiting thread is selected and allowed to lock the mutex. While a thread is waiting for a lock, it does not consume CPU time. The condition variable provides the possibility to send a signal to one or more threads that are waiting on it. To wait for a signal, the corresponding mutex has to be locked first. Then, when calling wait, the lock is implicitly released and the calling thread is suspended. This allows more than one thread at a time to use the condition variable.

Another thread, such as our callback thread, can now lock the mutex, perform the condition variable's notify operation and then unlock it again. As soon as the mutex is unlocked, one waiting thread is selected to continue. In the case of the KC705 API, this is the application thread.

Listing 4.6. KC705 convenience wrapper for writing (simplified)

struct notify_handle {
    condition_variable cv;
    mutex mtx;
    unsigned bytes_written;
} nhandle;

initialize(nhandle.cv, nhandle.mtx);
lock(nhandle.mtx);
int status = kc705_write_async(device, buffer, buffer_length, metadata,
                               internal_write_callback, &nhandle);

if (status == OK) {
    wait(nhandle.cv);
    unlock(nhandle.mtx);
    return nhandle.bytes_written;
} else {
    unlock(nhandle.mtx);
    return FAILURE;
}
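With POSIX threads, the same wait/notify pattern could be realized as follows; this is a generic sketch of the principle, not the actual library code.

#include <pthread.h>
#include <stddef.h>

/* Shared between the caller of the synchronous wrapper and the callback
 * that the API thread executes on completion. */
struct pt_notify_handle {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int             done;
    size_t          bytes_written;
};

/* Executed in the API's polling thread context: record the result,
 * then wake the waiting application thread. */
static void internal_write_callback(void *buffer, size_t bytes_written,
                                    void *user_data, unsigned long metadata)
{
    struct pt_notify_handle *nh = user_data;
    (void)buffer; (void)metadata;

    pthread_mutex_lock(&nh->mtx);
    nh->bytes_written = bytes_written;
    nh->done = 1;
    pthread_cond_signal(&nh->cv);
    pthread_mutex_unlock(&nh->mtx);
}

/* Executed in the application thread: block until the callback has fired.
 * The predicate loop guards against spurious wake-ups. */
static size_t wait_for_completion(struct pt_notify_handle *nh)
{
    size_t n;

    pthread_mutex_lock(&nh->mtx);
    while (!nh->done)
        pthread_cond_wait(&nh->cv, &nh->mtx);
    n = nh->bytes_written;
    pthread_mutex_unlock(&nh->mtx);
    return n;
}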

4.5 Host Application

The host application takes up a special role in the hybrid system. Although the FPGAs and GPUs carry the major computational burden, they are not independent systems. Instead, they are peripherals to the CPU and to the applications and drivers running on it.


Therefore, it is the host system's responsibility to set up, instruct and direct almost all aspects and devices in the hybrid system's ecosystem. Some of these duties are carried out by the Linux kernel and its drivers as explained in Section 4.4, but a large part remains to be handled by the actual user-space software package.

4.5.1 Overview

The most demanding part of the host application is data flow handling. The size of data sets is usually measured in gigabytes. These data have to be parsed from a set of various formats (text-based, binary and/or compressed), transformed into a special encoding suitable for processing on FPGAs, and stored efficiently. The records are then distributed to a number of FPGAs. Each FPGA generates several high-throughput data streams that have to be served. Then, data is moved to a number of GPUs; results are fetched, parsed, filtered and converted to the desired output formats. The highest efficiency can only be achieved if sending and receiving the various data streams is done in parallel. For example, four FPGAs with three streams each require 12 concurrently running threads just to move data out of the FPGAs. It is not only critical to correctly orchestrate such a large number of threads but also to provide synchronization and data structures that are capable of handling these amounts of data in time. An overview of the data flows is given in Figure 4.12. Before data can be processed into contingency tables, they are first read from hard disk and converted into a suitable format. A scheduling algorithm distributes the resulting SNP data equally to the available FPGAs through the use of multiple threads. Each FPGA then uses a number of DMA channels (see Sect. 2.2.1) to move the contingency tables to the host's main memory. To reduce the time during which no reception buffer resides at the DMA endpoint, every channel is serviced by a dedicated thread. The work buffers are then stored in a concurrently used queue (the “Table Buffer Queue”) from which each GPU is supplied. Again, each GPU is serviced by a dedicated thread. A second work queue is then used to distribute GPU result sets to a number of result processors on the CPU. The number of threads used for result processing is determined by and adapted to the host's threading capabilities and the current backlog of work.

110 4.5. Host Application

[Figure 4.12. Application data flow schematic: SNP Data File → Data Conversion → FPGAs 0 … n (DMA channels 1–3 each) → Table Buffer Queue → GPUs 0 … n → Result Buffer Queue → Result Evaluators 0 … n → Top List → Result File.]

Once the backlog exceeds a certain size, additional threads are spawned for result processing. Here, GPU results are filtered either by a threshold or by top lists of user-defined length. Finally, the filtered lists are merged and written to disk. These steps will be explained in detail in the following sections.

4.5.2 Input Data Files and Conversion

In bioinformatics, especially among methods generating or using data from genome-wide association studies, apparently no consensus exists on data formats. Instead, every software package requires its own formats, so a large number of format converters have emerged. The host application supports a number of formats, of which the two most important ones are described in the following sections. Additionally, an efficient loading mechanism is presented. All formats contain at least the following information that is required to set up a GWAS analysis:

Ź a list of individuals (“samples”), where each record contains a pseudo- nym and whether they belong to the case or to the control group

Ź a list of SNP markers with distinct names

Ź for every SNP and individual, either two genotypes are given or a type classification (homozygous wild/variant or heterozygote)

As not only interaction searches are performed on GWAS data, the formats often contain additional information, such as individual pedigrees, chromosome numbers or genome positions. These are not used by the methods presented here.

The PLINK Data Set Format

The authors of PLINK, a large tool set for all kinds of operations on sample data, managed to establish a format that is supported by most current methods and data providers [CCT+15]. The basic format consists of two files, a pedigree file and a map file. Both are text-based formats where each record is stored on its own line.

112 4.5. Host Application

The pedigree file contains one line per sample. The first six space-separated columns contain arbitrary family, individual, paternal and maternal identifiers, a sex column and a phenotype, also described as “affection status”. From the seventh column onwards, bi-allelic genotypes are specified, i.e. two genotypes per SNP. All lines have to have the same number of fields, and the number of SNPs for which genotypes are provided has to match the number of SNPs specified in the map file. The map file contains definitions for the SNPs used in the pedigree file, one per line. Each line contains the chromosome, an arbitrary SNP identifier, and absolute and relative positions of the SNP in the genome. Examples of these files are shown in Listings 4.7 and 4.8. The example pedigree file defines three individuals MOMMY, DADDY and SON who belong to the same family FAM001. All individuals have three bi-allelic SNPs defined. According to the map file, the first SNP originates from chromosome 1 and is called rs123456, the second SNP resides on chromosome 20, while the third SNP's origin is not (yet) known.

Listing 4.7. PLINK pedigree file example

FAM001 MOMMY 0 0 2 2 A A G G A C
FAM001 DADDY 0 0 1 2 A A A G 0 0
FAM001 SON 1 2 1 1 A A G G A 0

Listing 4.8. PLINK SNP map file example

1 rs123456 0 1234555
20 rs234567 42 1234522
0 rs443322 0 0

Efficiency-wise, storing data in text formats is not optimal. A data set composed of 500 000 SNPs and 5 000 samples may easily require 10 GiB of storage space. The PLINK toolkit therefore supports a binary version of the above. In this more efficient format, the first six columns are left intact and moved to a family file. The remaining genotype columns are then encoded into two bits per SNP. Homozygous types are stored as 00 or 11 depending on the type, wild or variant, respectively. Heterozygotes are encoded as 10, no matter how the alleles are ordered.


Unknown or missing genotypes are encoded as 01. The very first genotype is stored in the least significant bits of a byte. In an eight-bit field (or byte), the first individual in the above pedigree file might therefore be encoded as 01101100. The two most significant bits are set to unknown as a fourth genotype does not exist. After the first individual has been encoded, the next individual starts at the next byte. Hence, if the number of SNPs is not divisible by four, every individual has to be padded to fill up the last byte.
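For illustration, the 2-bit genotypes can be unpacked with a few shift operations; the following small program decodes the example byte from above.

#include <stdio.h>

/* Genotype codes of the PLINK binary format as described above:
 * 00 homozygous wild, 11 homozygous variant, 10 heterozygous, 01 unknown. */
static const char *genotype_name(unsigned code)
{
    switch (code & 0x3u) {
    case 0x0: return "homozygous wild";
    case 0x3: return "homozygous variant";
    case 0x2: return "heterozygous";
    default:  return "unknown/missing";
    }
}

int main(void)
{
    unsigned char byte = 0x6C;  /* 01101100: first individual of Listing 4.7 */

    /* The first genotype sits in the least significant bits. */
    for (int g = 0; g < 4; ++g) {
        unsigned code = (byte >> (2 * g)) & 0x3u;
        printf("genotype %d: %s\n", g, genotype_name(code));
    }
    return 0;
}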

[Figure 4.13. PLINK binary pedigree file in individual-major format: a two-byte file identifier ('01101100 00011011') and a mode byte (00000000) are followed by one genotype byte per individual, here MOMMY '01101100', DADDY '01011000' and SON '01011100'.]

The actual implementation as seen in Figure 4.13 also includes three header bytes per file. The first two bytes identify the binary format and the third byte indicates whether the genotypes are stored as individual-major as described above, or SNP-major, where the pedigree file is effectively transposed. In addition to the binary format, a transposed version also exists for the text format.

Reading, Queuing and Parsing

When programs load data from hard disks or other non-memory or even non-local media, a large amount of time is typically spent in read system calls. Effectively, the currently running thread is suspended by the system scheduler until the request can be completed. In an efficient system, this time can be used to perform non-I/O work, such as parsing data that has been acquired previously. This requires the use of threads. The application therefore spawns two threads. The first thread merely extracts records (i.e. text lines, individuals or SNPs) and spends most of its time in a waiting state, while the second thread converts the records into an FPGA-compatible native format. Communication between threads, though, has been and still is a major research topic in operating system theory and parallel architectures [BO03, Chap. 13].

114 4.5. Host Application theory and parallel architectures [BO03, Chap. 13]. Simple synchronization schemes may be implemented by the use of semaphores but their use in high-frequency locking/unlocking is generally considered excessively demanding [ADK+12; OGK+12]. A rather obvious but nonetheless complex solution is to find data structures that do not require locking for shared access. In the previously mentioned work, the authors develop a lock- free single-producer single-consumer queue (in short, SPSC queue) first introduced by Leslie Lamport in 1979 [Lam79]. This queue structure has been implemented to move data from the reader thread to the parser thread for maximum efficiency and CPU saturation. The expected efficiency is heavily dependent on the input format. Cur- rently, plain-text PLINK transposed (“TPED/TFAM”), SNP-major PLINK binary (“BIM/BED/FAM”), the custom data file format used in the original BOOST software [WYY+10] and the native format described in 4.5.2. These formats put a different computational burden on the parser thread. Addi- tionally, all formats may also be compressed using gzip (an implementation of the Deflate method [Deu96]) or bzip2 (a Burrows-Wheeler-based Huff- man compressor [Sew]). The decompression is implemented as a reading filter, so the computational burden lies on the I/O thread. To further allow reconstruction of contingency table values that have been optimized out during generation (see Sect. 4.3.2 for details), the parsing thread also counts the number of wild and variant type homozygotes and heterozygotes, divided in cases and controls. These six counters are stored along by the SNPs. The parsing result, regardless of the input data format, is designed to be stored efficiently. As described in Section 4.2, the FPGA expects data in SNP-major ordering in a 2-bit encoding as it is described the native format introduction in Section 4.5.2. Before the first SNP is written to the internal database, the number of case and control samples and the number of SNPs have already been determined. Knowing these parameters, the exact size of the full data set can be calculated. Hence, the database does not need to grow with the data but can be allocated a constant size. This reduces memory fragmentation and avoids costly memory allocation re-sizing. The database structure itself consists of a single, continuous byte buffer and a number of accessor methods that allow efficient and concurrent random access to any


The expected efficiency is heavily dependent on the input format. Currently, plain-text PLINK transposed (“TPED/TFAM”), SNP-major PLINK binary (“BIM/BED/FAM”), the custom data file format used in the original BOOST software [WYY+10] and the native format described in Section 4.5.2 are supported. These formats put different computational burdens on the parser thread. Additionally, all formats may be compressed using gzip (an implementation of the Deflate method [Deu96]) or bzip2 (a Burrows-Wheeler-based Huffman compressor [Sew]). The decompression is implemented as a reading filter, so this computational burden lies on the I/O thread. To further allow reconstruction of contingency table values that have been optimized out during generation (see Sect. 4.3.2 for details), the parsing thread also counts the numbers of wild and variant type homozygotes and of heterozygotes, divided into cases and controls. These six counters are stored alongside the SNPs. The parsing result, regardless of the input data format, is designed to be stored efficiently. As described in Section 4.2, the FPGA expects data in SNP-major ordering in a 2-bit encoding as described in the native format introduction in Section 4.5.2. Before the first SNP is written to the internal database, the number of case and control samples and the number of SNPs have already been determined. Knowing these parameters, the exact size of the full data set can be calculated. Hence, the database does not need to grow with the data but can be allocated with a constant size. This reduces memory fragmentation and avoids costly re-allocations. The database structure itself consists of a single, continuous byte buffer and a number of accessor methods that allow efficient and concurrent random access to any range of SNPs. This is especially important when distributing SNP data to FPGAs. Additionally, when using certain command-line arguments (see Appendices A.1 and A.2), a native data dump can be generated for faster loading times in subsequent program calls.

The Native Format

The native format can be generated from any other input format to reduce subsequent loading times when working with the same data set, e.g. if just a threshold parameter is changed. As it directly resembles the internal database format, loading and storing does not require any expensive parsing or format conversion. Input files such as those in the PLINK format often include more information than required. These extra data are stripped off. Hence, many formats can be converted into the native format, but they cannot be regenerated from it; the conversion can be seen as a lossy, one-way transformation. The exact structure is described in the following paragraphs.

[Figure 4.14. SNPDB file format header: a type identifier 'SNPR' (bits 0–31), a “Names?” flag (bits 32–39), a version field (bits 40–47) and a reserved field (bits 48–63), followed by the number of SNPs, the number of case group samples and the number of control group samples.]

The binary layout begins with a header, which is shown in Figure 4.14. The first field is a type identifier that allows the application to identify the given file as a SNP database dump. The second field is a byte flag indicating whether the dump contains a list of SNP names or not. This is important because not every source format supports SNP names, such as the BOOST layout. During development, the file layout has undergone several changes. The next field therefore contains the format version. When loading, the format version is checked such that incompatible versions are rejected and compatible but older versions force a slower compatibility loading mode. The following three fields hold the total numbers of SNPs, case group samples and control group samples.

The actual data follow the header structure. First, allele counts are stored for every SNP. That is, the wild types, variant types and heterozygotes counted per case/control group during parsing are now stored within the file. These are required for the counter reconstruction done by the GPUs as explained in Section 4.3.2. Then, the full genotype database generated during the parsing process is written to the file, in SNP-major ordering. Depending on the “Names?” field in the header, an optional list of SNP identification names is stored.
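Interpreted as a C structure, the header of Figure 4.14 could be declared as shown below; the figure only fixes the layout of the first 64 bits, so the 64-bit width of the three count fields and the byte order are assumptions.

#include <stdint.h>

/* On-disk header of the native SNPDB format (packed); field widths follow
 * Figure 4.14, the count fields are assumed to be 64 bit wide. */
#pragma pack(push, 1)
struct snpdb_header {
    char     magic[4];      /* type identifier "SNPR"                  */
    uint8_t  has_names;     /* non-zero if a list of SNP names follows */
    uint8_t  version;       /* format version, checked on loading      */
    uint16_t reserved;
    uint64_t num_snps;      /* total number of SNPs                    */
    uint64_t num_cases;     /* samples in the case group               */
    uint64_t num_controls;  /* samples in the control group            */
};
#pragma pack(pop)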

4.5.3 Memory Management

From the point where the database has been created and populated, data will be extracted and moved to FPGAs, to GPUs and back to the host system as shown in the overview. Repeatedly allocating, pinning, un-pinning and releasing memory buffers can be a costly operation. To reduce these costs, a buffer management mechanism has been introduced. At compile time, i.e. by setting compiler macros, the buffer size and the maximum number of buffers can be set. An application thread may then request one of these buffers for transmission.

117 4. The Hybrid Architecture Prototype

Internally, references to these buffers are kept on a stack. Therefore, the last buffer that has been released by a thread will be the first buffer supplied to a requesting thread. It is expected that this further improves efficiency through better use of memory caching, though these effects might become less visible with larger buffer sizes due to the relatively small, fixed-size CPU cache and the sequential access patterns.
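With standard POSIX calls, the allocation, pre-faulting and pinning of one such buffer can be sketched as follows; the 4 096 byte page size is assumed as in the text, and the function name is illustrative.

#include <stdlib.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096UL

/* Allocate one DMA-friendly buffer: page-aligned, pre-faulted and pinned.
 * Returns NULL on failure. */
static void *alloc_pinned_buffer(size_t size)
{
    void *buf = NULL;

    /* Align the buffer to a page boundary so it starts at a full page. */
    if (posix_memalign(&buf, PAGE_SIZE, size) != 0)
        return NULL;

    /* Pre-fault: touch every page once so the kernel maps it to a
     * physical page before the first DMA transfer. */
    for (size_t off = 0; off < size; off += PAGE_SIZE)
        ((volatile char *)buf)[off] = 0;

    /* Pin the pages so they cannot be swapped out. */
    if (mlock(buf, size) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}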

4.5.4 FPGA Initialization

After buffer allocation, the API module for FPGA communication is initialized and queried for the number of FPGAs that are present and ready. While the KC705 API for the Xilinx Kintex FPGA board only supports a single full-duplex data channel, the Alpha Data API supports up to four simplex data channels that can be freely configured. The final implementation for Virtex devices uses one channel to send SNP data. In the BOOST implementation, a single result return channel is sufficient, while the Mutual Information implementation for 3-way interaction requires three channels to maximize data throughput. For every FPGA, that is, for every sending channel, one thread is created. These threads extract ranges of SNPs from the database and move them to their FPGAs. The actual range is determined by a number of factors, most importantly the number of FPGAs and GPUs available to the application. As the help outputs of both applications show (see Appendices A.1 and A.2), the number of FPGAs and GPUs to use can be explicitly specified to allow ratio tuning when the performance capabilities of the FPGA and GPU types differ significantly. The devices in the system prototype have been chosen to match in performance.

Data Distribution

In the trivial setting where only one FPGA/GPU combination is available, all SNPs are sent to the FPGA for processing. When using more FPGAs, the number of pairs or triplets should be divided equally among all FPGAs and the corresponding SNP ranges determined and sent. For kth order interaction, k ranges of SNPs are selected per FPGA.

118 4.5. Host Application

[Figure 4.15. Distribution of SNPs in second-order interaction for four FPGAs: the triangle of SNP pairs over the index range 0 … N−1 is partitioned at the positions a0, a0 + a1, … .]

In the example shown in Figure 4.15, it can be seen that for k = 2 each partition represents a trapezoid. For 2-way SNP interaction, the distribution can be achieved as follows. Let

Ź N > 0 be the total number of SNPs to be analyzed for interactions

Ź F > 0 be the number of FPGAs to be used

Ź 0 ≤ a_i < N with 0 < i ≤ F be the number of SNPs for FPGA i

Ź A* be the total number of pairs to be calculated, with A = A*/F pairs for each FPGA.

Then, the following recursive equations for a_i hold:
\[
A \leftarrow \frac{A^*}{F} \quad \text{with} \quad A^* = \binom{N}{2} = \frac{N^2 - N}{2}
\]
\[
a_0 = \sqrt{2A}
\]
\[
a_1 = \sqrt{a_0^2 + 2A} - a_0 \quad \text{(area of a right trapezoid)} \tag{4.5.1}
\]

Figure 4.16. A right isosceles prism representing the space of 3-combinations of SNPs

\[
a_1 = \sqrt{2A + 2A} - \sqrt{2A}, \qquad
a_{i+1} = \sqrt{\Bigl(\sum_{j=0}^{i} a_j\Bigr)^2 + 2A} - \sum_{j=0}^{i} a_j \tag{4.5.2}
\]

In these range calculations, the right triangle that represents the area of SNP pairings has been partitioned into F = 4 equally sized right trapezoids. The rightmost trapezoid has a zero-length right leg, making it an isosceles right triangle. As this is the only right leg known beforehand, it represents the starting point for the recursive definition in Equation 4.5.2. In Equation 4.5.1, the general area formula for a right trapezoid, A = a · h_2 − ½ (h_2 − h_1) · a, where a is the base and h_1, h_2 are the parallel legs at the right angles, is used. In our case, h_1 is the previously obtained value, i.e. a_0. Since A is already known, this can be transformed to solve for the next partition boundary, which leads to Equation 4.5.1 and eventually to the general recursive form in Equation 4.5.2. With higher orders of interaction, the area calculation of trapezoids is exchanged for its counterpart in the corresponding dimension. A third-order calculation is therefore done analogously using right trapezoidal prisms instead of planar right trapezoids. Figure 4.16 shows such a right isosceles prism representing the space of all possible 3-combinations of SNPs.
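A small sketch of the partitioning arithmetic for the two-way case, following the reconstruction of Equations (4.5.1) and (4.5.2) above: each new boundary is obtained from the square-root expression over the SNPs covered so far, and the widths are rounded to whole SNPs. Function and variable names are illustrative only, not taken from the actual host application.

    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Compute the strip widths a_0 .. a_{F-1} (in SNPs, counted from the
    // right-angle end of the combination triangle) for 2-way interaction,
    // so that each of the F FPGAs receives approximately A = A*/F pairs.
    std::vector<std::uint64_t> partitionWidths(std::uint64_t N, unsigned F) {
        const double Astar = 0.5 * (double(N) * double(N) - double(N)); // total pairs
        const double A = Astar / F;                                     // pairs per FPGA

        std::vector<std::uint64_t> widths;
        double covered = 0.0;                 // SNPs assigned so far
        for (unsigned i = 0; i < F; ++i) {
            // Next cumulative boundary: sqrt(covered^2 + 2A), cf. Eq. (4.5.2).
            const double next = std::sqrt(covered * covered + 2.0 * A);
            const double width = (i + 1 == F) ? (double(N) - covered)  // absorb rounding
                                              : (next - covered);
            widths.push_back(static_cast<std::uint64_t>(std::llround(width)));
            covered += width;
        }
        return widths;
    }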


4.5.5 Data Movement

After the FPGAs have been initialized with SNP data, the data path to the FPGA is not required anymore. Instead, the threads now prepare for reading back the results the FPGAs generate from the SNP streams. If necessary, each FPGA thread spawns enough additional threads to be able to service all available reader channels in parallel and to minimize idle times.

Listing 4.9. FPGA Reader Thread Operation (simplified)

    while (resultsRemaining(i) > 0) {
        buffer& ctables = buffers.request();
        status = fpga[i].read(ctables);
        if (status == SUCCESS) {
            tablequeue.push(ctables);
        }
    }

Each service thread will now read data from its respective channel as long as data is expected. The target buffers are requested from the buffer manager described in Section 4.5.3. When the reading call returns, two possible outcomes are expected. Firstly, the transaction may have been aborted. This may only happen if the user terminates the application by sending an interruption signal, i.e. by pressing Ctrl+C on the keyboard. Then, it is the application's responsibility to leave the API in a clean and defined state. This involves tearing down DMA transactions. Additionally, another error condition has been implemented: in cases where the FPGA behaves unexpectedly and stops transmission in the middle of a transfer, a configurable time-out actively aborts the transaction as if the user had pressed Ctrl+C. The common case is of course the successful transfer of a data buffer. In that case, a reference to the received buffer is submitted to a special Multiple Producer/Multiple Consumer Queue (MPMC Queue). In Figure 4.12, this is depicted as "Table Buffer Queue" and serves as a central storage for the distribution of work units to a number of GPUs. A simplified draft is displayed in Listing 4.9. Determining whether a channel is still expected to deliver data is not trivial. Unfortunately, there is no efficient way for the FPGA to let the host system know


that the transfer has been completed. One method could be to place a flag into the buffer. However, due to the nature of the data stream on the FPGA, the condition can only be detected when the stream actually ends. Therefore, the flag would have to be placed after the last valid data word in a buffer, which would require the host to scan through the whole buffer to find it. Another way of detection could be via side-channel data in the PCI Express configuration interface (see Section 2.2.2 for details), a very slow way of communication. Fortunately, processing data on an FPGA is always deterministic and predictable, if done right. Hence, knowing the number of processing elements, SNPs, processing element chains and processing elements per chain, the exact output volume per transmission channel can reliably be calculated. The application pre-calculates these numbers for every transmission channel so that servicing threads can efficiently query the number of remaining results without performing costly non-DMA I/O operations or memory scanning. As soon as a service thread detects that its work is done, it joins with the main application thread, i.e. it terminates and notifies the application of the status change.
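The "Table Buffer Queue" mentioned above can be realized with a mutex and a condition variable. The following is a minimal, illustrative MPMC queue sketch; the actual implementation may well use a different (e.g. lock-free) structure.

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    // Minimal multiple-producer/multiple-consumer queue: FPGA reader threads
    // push filled table buffers, GPU worker threads pop them.
    template <typename T>
    class MpmcQueue {
    public:
        void push(T item) {
            {
                std::lock_guard<std::mutex> lock(mtx_);
                q_.push(std::move(item));
            }
            cv_.notify_one();
        }

        // Blocks until an item is available.
        T pop() {
            std::unique_lock<std::mutex> lock(mtx_);
            cv_.wait(lock, [this] { return !q_.empty(); });
            T item = std::move(q_.front());
            q_.pop();
            return item;
        }

    private:
        std::mutex mtx_;
        std::condition_variable cv_;
        std::queue<T> q_;
    };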

Listing 4.10. GPU servicing thread operation (simplified)

    while (moreResultsComing == true) {
        buffer& tables = tablequeue.pop();   // wait if necessary
        buffer& results = buffers.request();

        status = gpu[i].runKernel(tables, results);
        if (status == SUCCESS) {
            resultqueue.push(results);
        }
        // table buffer implicitly released
    }

Right after the creation of the FPGA threads, servicing threads are similarly spawned to process table buffers from the aforementioned MPMC queue on the GPUs. For every available GPU, one thread is responsible for taking a table buffer from the table buffer queue and moving the tables to the GPU. As the GPU

requires another memory block to store the resulting scores, another buffer is requested and provided, as shown in Listing 4.10. It is worth noting that the table buffer that has been moved out of the table queue is now the sole reference to that memory block. This buffer reference leaves the lexical scope of the block defined by the while loop after it has been processed by the GPU. The reference count of the buffer becomes zero and hence triggers the deleter function described earlier in Section 4.5.3. The unused buffer, instead of being released to the operating system, is put back on the buffer stack inside the buffer manager. It becomes available again for requesting by other functions and threads. As soon as the GPU kernel has finished processing the table buffer, the result buffer is submitted to another MPMC queue, depicted as "Result Buffer Queue", for the final filtering and post-processing.

4.5.6 Result Collection and Processing

The final step is the post-processing of the results produced by the GPUs. Now that the CPU is required to process the results into the final output format, the buffer contents are defined as an array of structures as shown in Figure 4.17. For every table entry in the original table buffer, one score structure is generated. Both single precision (32 bits) and double precision (64 bits) as defined in IEEE 754 are supported. The structure has been carefully crafted so that each field can be accessed by the CPU in a single memory access. Generally, a field has to be placed at a memory address that is divisible by the size of the field itself. In this case, all 16 bit fields have been aligned to 16 bit addresses and the score field is aligned to a 64 bit address (and thus also to a 32 bit address). Without following this alignment rule, the CPU would have to read two adjacent words, mask the beginning and end of the first and second word, respectively, and shift the result into the right position before arithmetic operations can be performed correctly. Even though the structure is accessible in an efficient way, a single buffer of one gigabyte in the second-order interaction method contains more than 67 million test results, and a new buffer arrives every few hundred milliseconds. To avoid this buffer processing becoming a bottleneck, a configurable number of threads is created to process buffers in parallel.


Fields (16-bit boundaries at bits 0, 15, 31, 47, 63): SNP ID A | SNP ID B | SNP ID C | Score

Figure 4.17. Third order test result structure with single precision score
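A sketch of the result structure from Figure 4.17 as a C++ type. The explicit padding field that keeps the score 64-bit aligned is an assumption derived from the alignment rules described above, and the field names are illustrative.

    #include <cstddef>
    #include <cstdint>

    // Third-order test result as laid out in Figure 4.17 (sketch).
    // All 16-bit fields sit on 16-bit boundaries; the score is 64-bit aligned.
    struct ThirdOrderResult {
        std::uint16_t snpIdA;   // relative SNP index in the database
        std::uint16_t snpIdB;
        std::uint16_t snpIdC;
        std::uint16_t padding;  // assumed filler so the score starts at a 64-bit offset
        double        score;    // single precision (float) is supported as well
    };

    static_assert(sizeof(ThirdOrderResult) == 16, "unexpected layout");
    static_assert(offsetof(ThirdOrderResult, score) == 8, "score must be 64-bit aligned");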

As opposed to the I/O threads that service the FPGAs and GPUs, these threads are spawned to increase processing throughput on the CPU rather than to bridge transmission gaps. Although the number of threads may be specified by the user, it is also possible to let the application decide.

Adaptive Threading

Before actually creating threads in the adaptive method, the system topology is analyzed. The operating system is queried for the number of physical CPUs, the number of physical cores per processor and the degree of simultaneous multithreading ("hardware threads", see Section 2.1.1 for a deeper explanation). With these numbers known, an efficient threading policy can be set up. By default, two threads are created to process results. If a backlog develops, i.e. buffers queue up for result processing, another thread is spawned until the maximum number of hardware-supported threads or a user-configurable limit is reached. Let s be the number of physical CPUs, c the number of physical cores per processor and t the number of supported hardware threads per physical core. Then, the first s · c threads that are created are tied to the available cores, with at most one worker thread per core. This is done by setting the scheduler affinity in the operating system kernel. Although it is not guaranteed that the threads actually run on the desired cores, the scheduler adheres to this placement far more closely than without the specification. By using scheduling affinities, two effects can be exploited:

Ź The total computational power of all hardware threads sharing a single physical core is rarely larger than 130 % of a single thread running on that core [ECX+11]. By first distributing threads over cores instead of


hardware threads, the first c threads are able to utilize 100 % each. Every further thread only increases the load by up to 30 %.

Ź Threads with a set affinity have a much lower chance of being re-scheduled to another processor/core/hardware thread. This reduces the overhead from context switching and hence cache invalidation.

Additionally, when worker threads start becoming idle because they process buffers faster than they arrive, they are not terminated. Instead, they are put into sleep mode and are therefore skipped by the scheduler. This way, they can be reactivated faster than a new thread could be created, without consuming many system resources. This is also done by setting scheduling affinities and priorities.
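The core pinning described in this subsection can be expressed with the Linux scheduler affinity API; a minimal sketch, assuming cores are simply numbered consecutively (the mapping from worker index to core ID is simplified and GNU/Linux-specific).

    #include <pthread.h>
    #include <sched.h>
    #include <thread>

    // Bind a worker thread to one physical core (GNU/Linux; simplified sketch).
    // coreId is assumed to enumerate physical cores before hardware threads.
    void pinToCore(std::thread& worker, unsigned coreId) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(coreId, &set);
        // Affinity is a strong hint to the scheduler: it reduces migrations and,
        // with them, context switching overhead and cache invalidation.
        pthread_setaffinity_np(worker.native_handle(), sizeof(set), &set);
    }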

Result Processing

Every result worker now processes the structures shown in Figure 4.17 one by one. In second-order interaction search, the user supplies a threshold by which the results are filtered. Every structure is compared against this threshold, and those that feature a higher score than the threshold are written to the result output file. Since multiple threads write to the same output, the results are delivered unordered. In third-order interaction, an additional mode is supported. The user may instead request only the best results to be reported, with a configurable number of results. This is especially useful when evaluating new test methods, as a clear threshold cannot be given initially. Also, new methods may be implemented where the magnitude of the scores varies among data sets. Selecting the best results from particularly large data sets can be a computational and algorithmic challenge. This is evaluated in the following section.

List of Best Results

For both applications, it is desirable not only to filter values based on a certain fixed threshold but also to acquire a list of the m best results, where m is much smaller than the total number r of results generated, i.e. m ≪ r.


The choice of data structures and algorithms has to satisfy a number of requirements. First of all, it is clear that r results cannot be kept in memory. Even if every result of a 3-SNP interaction on 500 000 SNPs required only a single byte, almost 21 PiB of main memory would be needed. This restricts us to viewing the generation of results as a stream where only one-way sequential access is possible. A naïve implementation could be an array of fixed size m which is kept sorted at all times. When a new result is taken from the stream, it has to be compared to the smallest item in the array. Random access in arrays can be done in constant time, so neither reading the smallest element nor replacing it would create a bottleneck. Instead of re-sorting the list, which could easily take O(m · log m) using QuickSort or MergeSort, a binary search can be done to find the correct index for the new item in O(log m). Being an array, a new item cannot easily be placed between two adjacent items. Therefore, all items between the determined index and the end of the array would have to be moved one place to the right, so the runtime complexity is O(m + log m) for each result. Furthermore, every insertion would also require O(m + log m) accesses to the memory. The memory bus is already quite busy with the concurrent DMA transfers, so a lower memory access complexity is desirable. Aside from a simple array, a priority queue is a natural choice for this kind of work. Implemented as a max-heap in a binary tree, it can be stored in a regular array in consecutive memory locations. The largest element, being the root of the tree, can always be accessed in O(1), as the heap property ensures that a node is always larger than its children. Insertion, deletion and extraction of an extreme value can be done in O(log m) [CLR+07, Chap. 6]. This also applies to the memory accesses. Now, for every result, we have to access the minimum and compare it to the result in question to decide whether it makes sense to insert it into the heap at all. For such large result counts, even a logarithmic runtime w.r.t. m for this check is problematic. The solution and actual implementation is a Min-Max Fine Binary Heap. Here, a min-heap and a max-heap are combined in a single heap [ASS+86]. For this to work, the heap property is extended such that a max-heap node only has min-heap children and a min-heap node only has max-heap children. When viewed in level-order, every second level contains min-heap nodes and every other level contains max-heap nodes.

Figure 4.18. Min-max heap

An example is shown in Figure 4.18. When the root node is a min-heap node, it contains the minimum element of the whole heap, and one of its two children contains the maximum element. Therefore, accessing the minimum or maximum is in O(1) while maintaining a "regular" heap's runtime footprint. A further optimization has been proposed by Carlsson et al. in [CCM94], where for each data item an additional bit is stored that indicates whether the left or the right child is the smaller or larger one, depending on the node type. Although this does not change the complexity class, only one node needs to be loaded from memory instead of two, and searching for the minimum or maximum becomes obsolete as it is already known. Additionally, the list may only contain up to m elements. This is ensured as shown in Algorithm 2. Before an attempt to actually insert an element into a full list, it is first checked whether the element is larger than the current minimum of the top list. If it is not, it would become the (m + 1)th element in a list of the m largest elements and would thus be cut away again to maintain the correct list size. Therefore, FindMin is called for every candidate, whereas DeleteMin and Insert are only called for those elements that will actually become part of the list. As FindMin is constant in time complexity, as opposed to deletion and insertion, a large amount of runtime can be saved as only few elements

would actually be inserted. Unfortunately, the score distribution over all generated contingency tables is unknown, so no authoritative prediction can be made about the actual efficiency of this measure, though actual measurements clearly show its usefulness (see Section 5.2.1). The initial filtering against the minimum of the heap is implemented using AVX SIMD compare instructions (VCMPPS for eight single-precision or VCMPPD for four double-precision values per 256-bit register), which store a mask that allows identification of all values in the tuple that are larger than the heap's minimum and can therefore be inserted. An introduction to CPU SIMD instructions is given in Section 2.1.1.

Algorithm 2: Min-Max Fine Binary Heap Insertion
Data: a heap H with i elements and a capacity of m elements
Input: an item e to be inserted
Result: a heap with at most m elements

    1  if i = m then
    2      min ← FindMin(H);
    3      if e ≤ min then return;    // e would be cut away anyway
    4      DeleteMin(H);
    5  Insert(H, e);
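The same guard can be expressed compactly in C++. In the following sketch, std::priority_queue (a plain min-heap over the scores) stands in for the min-max fine binary heap, which is sufficient when only the m best results and quick access to their minimum are required; class and method names are illustrative.

    #include <cstddef>
    #include <functional>
    #include <queue>
    #include <vector>

    // Keep only the m largest scores seen so far (capacity-guarded insertion).
    class TopM {
    public:
        explicit TopM(std::size_t m) : m_(m) {}

        void offer(double score) {
            if (heap_.size() < m_) {            // capacity not reached: always insert
                heap_.push(score);
            } else if (score > heap_.top()) {   // cheap FindMin check first
                heap_.pop();                    // DeleteMin
                heap_.push(score);              // Insert
            }                                   // otherwise: discard the candidate
        }

    private:
        std::size_t m_;
        std::priority_queue<double, std::vector<double>, std::greater<double>> heap_;
    };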

An intuitive (and naïve) implementation would use pointers to store the child node addresses in their parents. While this certainly works, and is the de facto standard in functional programming languages, it is rather inefficient in procedural programming. Instead, the properties of a binary tree have been exploited to create an array-based implementation. Given the heap shown in Figure 4.18, the array representation is as depicted in Figure 4.19. The root node's value is stored in the first cell and its two children in the following two cells. The left-hand grandchildren are stored in cells four and five while the right-hand grandchildren are stored in cells six and seven. This construction rule can be generalized: Let I(i) be the i-th node in the tree, counted from top to bottom and left


Figure 4.19. Array representation of min-max heap

to right, with n being the total number of nodes and i ∈ ℕ₀ with i < n:

    I(0) = 0
    Parent(i) = ⌊(i − 1)/2⌋
    LeftChild(i) = 2 · i + 1
    RightChild(i) = 2 · i + 2

This rule yields a node storage in level-order and has several advantages over the pointer-based solution:

Ź Each node only requires storage for the datum itself. Assuming a pointer size of 8 bytes, the storage overhead per node is reduced from 16 bytes (two child pointers) to zero.

Ź With all nodes of a level stored consecutively, the structure is very cache friendly, as nodes in the direct neighborhood are often accessed next (good cache locality).

Ź Instead of one allocation per node, all nodes are stored in a single consecutive block of memory at successive addresses. As the size of the heap is constant and known in advance, the whole storage can be allocated with a single system call, making the insertion of new nodes very efficient. Due to the heap insertion mechanism, the binary tree is always optimally balanced; hence, the array is also compact, with no empty cells between nodes.
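The level-order index arithmetic translates directly into a few integer operations; a short sketch (integer division implements the floor):

    #include <cstddef>

    // Level-order storage of a binary heap in a plain array (sketch).
    inline std::size_t parent(std::size_t i)     { return (i - 1) / 2; }
    inline std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
    inline std::size_t rightChild(std::size_t i) { return 2 * i + 2; }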

Result Output

Finally, the results have to be written to a file to be of any use, whether they originate from a top list or from a threshold filter. Along the whole path from the database through the FPGAs and GPUs to the result processors, SNPs are only identified by their relative number in the database.


For the researcher, these numbers do not carry any meaningful information. Therefore, instead of SNP IDs, their names are retrieved from a large look-up table. Due to the nature of look-up tables, this access can be done in O(1) with respect to the runtime. The actual access to the file system is serialized among the result worker threads with mutual exclusion primitives to prevent data corruption. Furthermore, the results are not written to the file directly but stored in an intermediate buffer. Otherwise, for every score, a few bytes would be written through the write system call, which is then diverted to the respective file system driver. In addition to the high overhead of repeatedly issuing a system call for a small amount of data, many file systems store data in blocks of a certain size. For every write modification to a block that is smaller than the block size, a read-modify-write cycle is performed. On the other hand, if the data is buffered so that multiples of the file system's block size are written, no read or modify operations have to be performed. Unfortunately, Linux supports a broad variety of file systems but does not provide a standardized way of querying the block size. Usually though, block sizes are rather small powers of two, such as 512 bytes, 4 KiB or up to 512 KiB. Choosing a buffer size of a few megabytes certainly exceeds all common block sizes and is still a multiple thereof. This leaves the ResultWriter module with a relatively small system call overhead and efficient file system I/O. The output format is borrowed from the original BOOST publication [WYY+10]. An example of the output format for third-order interaction is shown in Listing 4.11. Two or three SNP names are displayed in whitespace-separated columns, depending on the order of the interaction, and the last column contains the score that has been assigned by the statistical method in use.

Listing 4.11. Third order interaction result output example

    SNP_A      SNP_B      SNP_C      SCORE
    rs123456   rs331234   rs443111   23.441345
    rs001337   rs313373   rs654321   42.55781
    ...
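A minimal sketch of the buffering strategy used by the ResultWriter as described above: formatted lines are accumulated in a multi-megabyte memory buffer and flushed in large writes that exceed common block sizes. The 4 MiB threshold and all names are example values, not taken from the actual implementation.

    #include <cstddef>
    #include <cstdio>
    #include <string>

    // Buffered result writer (sketch): accumulate formatted lines and flush
    // them in large chunks, avoiding one system call per score.
    class ResultWriter {
    public:
        explicit ResultWriter(std::FILE* out, std::size_t flushThreshold = 4u << 20)
            : out_(out), limit_(flushThreshold) { buf_.reserve(limit_); }

        ~ResultWriter() { flush(); }

        void write(const std::string& snpA, const std::string& snpB,
                   const std::string& snpC, double score) {
            char line[256];
            int n = std::snprintf(line, sizeof(line), "%s %s %s %f\n",
                                  snpA.c_str(), snpB.c_str(), snpC.c_str(), score);
            if (n > 0) buf_.append(line, static_cast<std::size_t>(n));
            if (buf_.size() >= limit_) flush();   // a few MiB exceeds common block sizes
        }

        void flush() {
            if (!buf_.empty()) {
                std::fwrite(buf_.data(), 1, buf_.size(), out_);
                buf_.clear();
            }
        }

    private:
        std::FILE* out_;
        std::size_t limit_;
        std::string buf_;
    };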

Chapter 5

Evaluation

Parts of this chapter have been previously published in [GWK+15a; GKW+15; KGW+14; WKG+14; KWG+15; GWK+15b; GSK+14; WKH+17].

5.1 Performance Metrics

Performance figures of computing systems are typically represented by a single value, the number of floating-point operations that can be processed per second (FLOPS). Traditionally, it is measured using the linear equation solving package LINPACK [DLP02]. This obviously focuses on multiply/add operations but also includes memory speed, system bus bandwidth, as well as operating system and networking overhead. In the biannually published TOP500.org [T500] ranking, the participating systems are built from one or more of the following components:

Ź NVIDIA graphics processors, such as Tesla K20X or

Ź Intel Xeon Phi manycore accelerators

Ź High-performance vector processors (e.g. Cray XC40)

Ź Intel CPUs and IBM Power processors


At the time of this writing, not a single listed system contains FPGA-based computing devices. As this chapter will show, FPGAs are not inferior regarding raw performance and, in fact, outperform any other architecture when it comes to power consumption. The reason they are not listed becomes obvious when the way FPGAs are used is taken into account. Their configuration is usually precisely engineered to perform one very specific computation, and the device will not be capable of doing anything else until reprogrammed. It could be designed to solve matrices and linear equations, but an FPGA cannot be seen as a general-purpose computing device, and the results would not be representative with respect to the actual power of these devices. In fact, this argument also holds for other architectures, which is a major point of critique when it comes to performance estimations. In graphics processing, and 3D rendering in particular, multiplying matrices is a large part of the computational burden. Hence, graphics processing units are geared towards efficient matrix multiplication, for example by implementing a highly parallel engine that can perform floating-point additions and multiplications very efficiently. Therefore, graphics processing units perform extraordinarily well in the TOP500 supercomputer list. Given these observations, the number of floating-point operations per second is not a good general performance estimate but merely a measure of the suitability for certain operations, such as the solving of linear equations.

5.1.1 Measures and their Inaccuracies

When evaluating and comparing the performance of a single method across various platforms and architectures, it is therefore reasonable to use a measure that indeed is a good performance indicator for the particular purpose. In the case of compound architectures such as the one proposed in this work, the performance is measured by the outcome of its subcomponents and the communication speeds. The overall performance is given in terms of elapsed time with respect to a reference data set. While runtimes show very little jitter on GPUs and FPGAs, the host system has more duties than just running the respective application: user logins are handled, mechanical hard disks spin up and down, services respond to network

requests, CPU caches are invalidated by other applications, and so on. To accommodate these disruptions and sources of jitter, the CPU time that has actually been allocated to the process is also given. This allows an extrapolation to 100 % utilization and therefore yields comparable results without dependencies on the actual system load. When it comes to the generation of contingency tables, only one measure can be representative for the FPGA's performance: the actual number of tables that can be delivered by the FPGA per second. Assuming perfect determinism, an operation's timing can be precisely predicted. As FPGAs are not "programmed" but merely "described," the designer can always guarantee the availability of a certain datum at a certain place within the accuracy of a single clock cycle. Hence, it is easy to predict the number of tables per second. However, the FPGA also reacts to events from outside. At any time, the memory controller may decide to perform DRAM refresh operations, causing a delay when the FPGA requests data from the memory module. Another source of timing uncertainty is the host platform itself. The host system moves data to and from the FPGA through various controllers and built-in devices, each of which adds to the uncertainty. The host application itself may not be able to provide a buffer on time because another application is accessing this very memory region. This behavior cannot be factored out by introducing utilization factors or bus load. Instead, the FPGA fills its small on-board interface buffers and then proceeds to stall the whole processing pipeline until a buffer is finally available. It then takes time to propagate this information through the system in order to resume operation. While the theoretical throughput of the FPGA design is certainly an indicator of its power, the previously mentioned effects significantly reduce the net data rate that can actually be delivered. In order to assess the transmission efficiency and the FPGA stalling-related delays, additional data is collected.

Ź On the FPGA, the number of cycles is counted where
  Ź data is ready to send but the DMA channels are not,
  Ź DMA channels are ready to accept data but no data is available (i.e. due to stalling/resumption delays),
  Ź both the DMA channel and the result channel are transmitting data,


  Ź the generation pipeline is stalled.

Ź On the host,

  Ź the time it takes to read a full buffer,
  Ź the time it takes to supply a new buffer once the previous buffer has been filled,
  Ź the time the application had to wait because no buffer was available that could be supplied.

With these numbers, a fairly accurate estimation regarding the table creation speed and the efficiency of the communication system can be given. When it comes to processing the table buffers on the GPU, other measures have proven to be useful. Unfortunately, the NVIDIA CUDA API provides only a limited set of profiling instrumentation, so some measures are derived from others.

Ź The net transmission bandwidth from and to the GPU

Ź The device utilization measured in active blocks vs. idle blocks

Ź The warp divergence, derived from statistics gathered through side-channel data. This includes coverage information, i.e. from the branches taken in the BOOST filter chain between KSASA, KSA and log-linear methods

Eventually, the host system performs a final filtering step where records are written to the file system if they satisfy one of the user-selectable conditions:

Ź The interaction score is larger than a user-supplied threshold

Ź The interaction score is part of the highest k records throughout the whole data set.

Clearly, the runtime of the threshold-based filter is directly dependent on the magnitude of the threshold and the magnitudes of the interaction scores. If a threshold is met or exceeded, the record in question is written

to the (buffered) result output file. The runtime hence not only depends on the decision but also on the file system performance and the underlying storage system. For normalization purposes, the time spent writing to disk is therefore also recorded. The second filter is based on a fixed-capacity MinMaxHeap implementation in which the best k records with respect to the interaction score so far are collected (see Section 4.5.6). Unlike the threshold-based version with a runtime of O(1), the heap implementation can only asymptotically process data in constant time, and its Insert method is still in O(log k), where k is the maximum heap capacity. To assess the actual efficiency of the heap, two key figures are additionally collected: the number of scores that were not inserted because they were smaller than the current minimum (while the capacity had been reached) and the number of scores that were accepted for insertion because they were larger than the minimal heap element (while the capacity had been reached). Obviously, the number of insertions before the capacity is reached is k and does not need to be collected. Finally, the overall performance is given in terms of runtimes for a reference data set and processed pairs or triplets per second, and a performance bias can be given by comparing the average net production rate of the FPGA against the average net processing rate of the GPU. For comparisons with other implementations, runtimes are taken from their respective publications and are interpolated with the expectation that their runtime scales linearly with the number of tables/pairs/triplets (i.e. linearly in the number of samples and, in the number of SNPs, polynomially with a degree equal to the interaction order).
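The extrapolation used for these cross-publication comparisons can be written down explicitly: the published runtime is scaled by the ratio of the number of tests (SNP combinations) multiplied by the number of samples. A sketch, assuming exactly this simple linear model and nothing else:

    #include <cstdint>

    // Number of k-combinations of n SNPs, i.e. the number of interaction tests.
    double combinations(std::uint64_t n, unsigned k) {
        double c = 1.0;
        for (unsigned i = 0; i < k; ++i) c *= double(n - i) / double(i + 1);
        return c;
    }

    // Scale a published runtime to a different data set, assuming the runtime is
    // linear in the number of tests and in the number of samples.
    double extrapolateRuntime(double runtime,
                              std::uint64_t snps, std::uint64_t samples,
                              std::uint64_t targetSnps, std::uint64_t targetSamples,
                              unsigned order) {
        const double tests       = combinations(snps, order) * double(samples);
        const double targetTests = combinations(targetSnps, order) * double(targetSamples);
        return runtime * targetTests / tests;
    }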

5.2 Architecture

In this chapter, in-depth analyses are done on the overall architecture as well as within and between its components. In the second part, the prototype implementation is compared against other hybrid computing platforms, implementations and methods regarding system performance and energy consumption.


Although the FPGAs in use do have a large amount of resources, they are fully exhausted by the application. The resource complexity of an array node is inversely proportional to the number of nodes that fit on the device and, in turn, to the generation rate. Therefore, configurations have been created to support specific maximum data set sizes. An overview of these different configurations is given in the following sections. For comparability, runtimes and resource usage are measured using the maximum supported data set size as well as a reference data set that can be processed by all available configurations and consists of exactly 3 000 control samples and 2 000 case samples. The number of SNPs does not affect the table generation and processing rates but merely the total runtime, which is defined by the number of contingency tables to create and process. Hence, this number is not part of the reference and is always given explicitly where applicable.

5.2.1 Throughput

The contingency table stream passes several components and interfaces and is later turned into a result score stream. To assess the potential and actual performance, each of these components and interfaces is examined.

Table Creation on the FPGA

For assessing the rate of contingency table creation, the number of nodes in the systolic array plays a major role, along with the frequency the nodes are clocked at and the widths of the genotype lanes. Once the array is under full load, i.e. the SNP buffers of all nodes have been filled, each node generates one contingency table per SNP that is streamed along. Assuming a SNP length of 5 000 genotypes, with eight genotypes per cycle, a contingency table is created in 5000/8 = 625 clock cycles. With a clock frequency of 200 MHz, 320 000 tables are created per node and second. In Table 5.1, the generation speeds of all FPGA configurations are shown, along with a short summary. Data rates are given with respect to the reference data set and, sample-wise, to the largest supported data set, indicated by "(max)".
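The per-node and total rates stated above and listed in Table 5.1 can be reproduced with a few lines; a sketch of the underlying arithmetic (parameter names are illustrative):

    #include <cstdint>

    // Contingency-table generation rate of a systolic array (sketch).
    // genotypesPerSnp / genotypesPerCycle cycles are needed per table and node.
    double tablesPerSecondPerNode(double clockHz, std::uint64_t genotypesPerSnp,
                                  unsigned genotypesPerCycle) {
        const double cyclesPerTable = double(genotypesPerSnp) / genotypesPerCycle;
        return clockHz / cyclesPerTable;   // e.g. 200e6 / (5000/8) = 320 000
    }

    double tablesPerSecondTotal(double perNode, std::uint64_t nodeCount) {
        return perNode * nodeCount;        // e.g. 320 000 * 1 000 = 320e6
    }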


Table 5.1. FPGA configuration characteristics

    Configuration              2-way 16k     2-way 64k     3-way 8k
    Node frequency             200 MHz       200 MHz       200 MHz
    Genotype supply            8/cycle       8/cycle       8/cycle
    Maximum group size         8 192         32 768        4 096
    Node count                 1 000         288           180
    Array partitions           4             2             3
    Rates per node:
      Tables/s (reference)     320k          320k          320k
      Tables/s (max)           97.7k         24.4k         195k
      Data rate (reference)    4.88 MiB/s    4.88 MiB/s    20.71 MiB/s
      Data rate (max)          1.49 MiB/s    0.37 MiB/s    12.64 MiB/s
    Total rates:
      Tables/s (reference)     320M          92M           58M
      Tables/s (max)           98M           7.0M          35M
      Data rate (reference)    4 883 MiB/s   1 406 MiB/s   3 728 MiB/s
      Data rate (max)          1 490 MiB/s   107 MiB/s     2 276 MiB/s

Due to the non-trivial ordering of contingency tables in the third-order approach, tables are accompanied by identifiers as discussed in Section 4.2.4 and Figure 4.6. Each identifier consists of three fields of 21 bits each and adds an overhead of 3 · 21 = 63 bits to each table. These fields are added to the data stream in the transmission unit and result in gross data rates of 4 161 MiB/s (reference) and 2 276 MiB/s (max), with a total overhead of 433 MiB/s and 264 MiB/s, respectively.

FPGA/Host Link

Figure 5.1 shows gross data rates that have been measured directly at the PCI Express link level, separated into the DMA channels that have been used in the third-order method. It can be seen that the transmission speed breaks down at regular intervals. These intervals tend to become smaller as the application proceeds, reaching their minimum at the very end. This can be explained by the way the systolic array processes data.

Figure 5.1. Channel-wise FPGA/host link speeds (third order)

When throughput is at a high level (between negative spikes), both SNP buffers are filled and each node generates one table per streamed SNP. Once the full set of SNPs has been streamed, streaming restarts, but the nodes have to replace their SNP buffer contents before new tables can be emitted. With every new streaming sequence, the SNP buffer replacement frequency rises as the combination space becomes smaller (see Section 4.2.3). Hence, the intervals become smaller as the SNP buffers are replaced more frequently, culminating at the last few SNPs where only very few tables are generated per streaming sequence. Data throughput is sampled once every 256 MiB. This makes the throughput between the spikes appear lower towards the end: the sequencing frequency rises above the throughput sampling frequency, so spikes are averaged into the samples and lower the measured throughput. Considering the data rates given in the configuration overview in Table 5.1 and the overall construction, genuinely lower per-node generation speeds are implausible. Although the measured transmission speeds already suggest it, Figure 5.2 shows another important statistic.

Figure 5.2. FPGA/Host link saturation

On the FPGA, the clock cycles have been counted in which each DMA channel signals that it is able to accept a new data word ("DMA input ready"), as well as the number of clock cycles in which data is available for sending. These counters are reset when a new buffer commences and are moved to the FPGA's configuration space to be read by the host as side-channel data. Measurements show that the average buffer transmission time in the three-way method is 317 ms. Therefore, the average buffer time in clock cycles, assuming a 250 MHz interface clock, is 80 × 10⁶. For simplicity, only statistics for the first DMA channel are shown, but the other two channels perform similarly. Both graphs appear to be mirrors of each other. When the DMA channel is ready for input and the table output presents valid data, transmission is started and the DMA channel changes its state to not ready. Hence, the DMA channel "ready count" graph essentially drops by the "data valid" counter value during transmission. Still, the DMA channel is ready for a large portion of the buffer cycle. Although no "transmission capacity" can be deduced from these data, as a single transmission may lead to several cycles in which the data is not valid, they do allow the conclusion that the transmission between the host system and the FPGA is not fully saturated.

Figure 5.3. Min-max heap result insertions


Result Processing

The host application can, if desired by the user, output only the highest-ranked results of the whole data set. The chosen data structure, a MinMaxHeap, has been modified to restrict the maximum number of stored elements and is therefore independent of the data set size (see Section 4.5.6). Instead, the required storage is bounded by the number of items the user desires to keep. Although insertion into the MinMaxHeap is fast, even O(log k) becomes significant when a very large number of items is queued for insertion. By extending the heap with a fixed-size property, the runtime can be reduced to asymptotically constant time by simply comparing the to-be-inserted item to the lowest recorded score in the heap and only

performing the actual insertion if it is larger than the current minimum. To prevent lock contention in busy situations, every result processor thread keeps a thread-local heap. When done, all heaps are merged into a single heap which is then written to disk. Figure 5.3 depicts the number of heap insertions for the first two heaps. The heap capacity used to generate the data for this diagram is 10⁶. It can be seen that the very first buffers (with a volume of approx. 8 × 10⁶ tables each) fill the heaps to a large extent while the actual insertion counts become smaller and smaller. Around the 80th buffer, only 0.00625 % of all contained tables are submitted for insertion. The anomalies around buffers 140 and 210 can be explained by single-SNP effects that raise the signal strength of all combinations in which that SNP participates. After buffer 250, no more than 20 tables (or 0.00025 %) per 8 × 10⁶ tables are inserted, reducing the measured CPU load on the result processing threads to less than 10 %. The full profiling run on the reference data set yields approx. 5 300 buffers, of which the first 500 are shown in the diagram.

Joint Application Performance

From Figures 5.4 and 5.5, it can be seen that the FPGAs and GPUs, at least in the configurations used in this application, perform equally well on their respective tasks. Especially in the third-order method, the average FPGA generation speed matches the average GPU processing speed within a margin of 130 MiB/s. In Figure 5.5, though, the variations in the GPU processing speed are much larger. This can be explained by varying degrees of warp divergence. GPU-based threads are organized in warps where each thread in the warp is supplied with the same instruction stream. More specifically, this disallows threads in a warp from diverging in the branches taken, for example in a conditional clause or a loop. If threads in a warp require a different path to be taken, all other threads are halted. Hence, all diverged branches are executed sequentially. In the case of the second-order filter chain, most results are discarded by the KSASA pre-filter. In data sets that contain SNPs with a strong influence by themselves, all combinations that include such a SNP may present an increased score. In turn, this might cause larger numbers of high-ranked consecutive combinations where more pairs happen to proceed further into the filter chain, into the computationally more expensive tests.

Figure 5.4. FPGA generation speeds vs. GPU processing speed (third order)

Table 5.2. FPGA table generation and GPU processing speed comparison

                     Second-Order (BOOST)    Third-Order (Mutual Information)
    FPGA average     2 749 MiB/s             2 799 MiB/s
    GPU average      2 502 MiB/s             2 929 MiB/s

Due to the lack of branching in the third-order method, warp divergence essentially does not occur, while the second-order method performs operations whose runtime is heavily data-dependent, causing throughput figures with higher variance. In the BOOST implementation, the throughput averages of FPGAs and GPUs differ slightly more than in the Mutual Information implementation. Here, the average GPU processing speed is approximately 247 MiB/s lower than the average FPGA generation speed, as shown in Table 5.2.

Figure 5.5. FPGA generation speeds vs. GPU processing speed (second order)

These results show that, with respect to their specific problems, GPUs and FPGAs are on par in computational power and can be combined to develop a highly efficient hybrid computer system. It is clear that the FPGAs involved process a workload that is close to their optimal problem structure, counting and pipelining being two of the most suitable tasks when it comes to reconfigurable logic. The same conclusion can be drawn for GPUs. These devices contain special hardware modules to accelerate operations on floating-point numbers and allow multiprocessing in a fixed but tunable and large computation grid, combined with efficient on-chip scheduling algorithms. Both parts, combined with a potent host system to control the data flow, yield a system with a high performance-per-watt ratio and a low amount of idle resources. The following section displays and discusses the proposed hybrid computer with respect to energy efficiency.

Figure 5.6. Theoretical runtime against actual runtime in third-order interaction

5.2.2 Runtimes

Figure 5.6 shows how the runtime of third-order interaction search rises with the number of SNPs. For reference, a plot of the theoretical runtime as computed from the contingency table transmission rates has been added. It can be seen that they differ by only a few seconds. The difference can be explained by input data loading times as well as result processing and writing. Furthermore, the parallel loading scheme as well as the MinMaxHeap-based approach to result selection reduce the runtime overhead to negligible values. The graph data has been collected from a data set consisting of 5 000 samples and SNP counts ranging from 500 to 7 500 in steps of 500. The result processing was set to gather the one million best results in a MinMaxHeap, using two threads for loading and eight threads for filtering. All threads have been bound to a specific CPU core to limit context switching overhead and cache invalidation.


5.2.3 Energy Consumption

In the era of "big data," bare computational power is still the most important aspect of a computing platform, but rising energy consumption is also becoming a problem. Especially in the domain of bioinformatics and personal medicine, algorithms and applications cannot always be processed at academic or commercial data centers, mainly due to restrictions when it comes to sensitive patient data. Therefore, hospitals, labs and research groups are often required to use local installations that cannot meet the requirements of computing clusters in terms of energy supply and housing. Additionally, energy consumption is tightly coupled to thermal discharge. As described earlier, thermal discharge has already significantly influenced the evolution of CPUs (see Section 2.1.1) and this trend is likely to continue, making it harder to further increase the "performance density." The development of heterogeneous systems made of efficient and specialized components not only offers large amounts of computational power but also increases the performance-per-watt ratio, leading to more sustainable developments. As the isolated energy consumption of computer peripherals cannot easily be measured, the worst-case power consumption of the FPGA devices is estimated with the tool set provided by the chip manufacturer Xilinx Inc., the Xilinx Power Estimator (XPE), based on the compiled VHDL output and the device feature mapping. As there is no scheduler involved in FPGA operation, an FPGA always operates under full load with only slight changes related to the PCI Express transmission interface. The only other non-negligible power consumers on the FPGA boards are the DDR3 memory modules; their idle and load consumption has been taken from the manufacturer's data sheets. The results are displayed in Table 5.3. They clearly show that the performance presented earlier can be delivered by regular desktop computers equipped with commercial off-the-shelf graphics hardware, consumer-grade power supply units and FPGA accelerators. With a peak power of around 530 watts, no high-end cooling system is required; instead, the hybrid computer could be placed under any clinician's or lab technician's desk.


Table 5.3. Power consumption of hybrid computer components

    Device                      Power (idle)   Power (load)   Estimated by
    Host System                 124 W          242 W          Energy meter
    NVIDIA GeForce 780 Ti       10 W           259 W          System monitor
    Xilinx KC705 Board
      Kintex 325T (2-way)       8.0 W          8.0 W          XPE
      Kintex 325T (3-way)       11.2 W         11.2 W         XPE
      DDR3 memory               0.5 W          2.3 W          [Mic10]
    Alpha Data Virtex Board
      Virtex 690T (2-way)       17.9 W         17.9 W         XPE
      Virtex 690T (3-way)       17.3 W         17.3 W         XPE
      DDR3 memory               0.9 W          5.1 W          [Mic11]
    Total (Virtex, 3-way)       152.2 W        523.4 W
    Total (Virtex, 2-way)       142.5 W        511.3 W

5.3 Competing Systems

To estimate how well the hybrid computer prototype performs, it is compared against several other systems and methods. First, conventional CPU-based approaches are analyzed. When evaluating large workloads, whole clusters and cloud services are often used. Also, a limited number of hybrid computers are included, although none of them exploits the benefits of two acceleration platforms but instead uses either FPGAs or GPUs. It is usually impossible to find a data set that all other publications agree upon. Hence, runtimes are extrapolated, assuming that the runtime scales linearly in the number of tests to be performed. Realistically, all computations incur enough overhead that no method truly scales linearly. Lacking other methods of comparison, this might be the only fair and representative solution. Runtimes that have been extrapolated are marked with an asterisk (*). Furthermore, none of the other methods were run locally, as other approaches may use hardware that is not available for individual measurement. Instead, runtimes are solely taken from their respective publications

and scaled to key data set sizes. The energy consumption of competing systems is rarely publicized. To provide energy measures, hardware details from the publications are looked up in data sheet archives. Then, the Thermal Design Power (TDP) is taken to give a rough estimate of the power under load, assuming the programming has been done to utilize as much of the available performance as possible. Hence, these results may not represent the actual state of energy efficiency but give an indicator that allows comparison among platforms. Furthermore, especially regarding the Mutual Information method, no publications are known that specifically implement this measure. As this work aims at establishing platforms rather than methods, the comparison is done across a multitude of tools that may not always produce results of the same quality but at least claim to find second-order and third-order interactions by exhaustively searching the space of SNP tuples.

5.3.1 Second Order Interaction

In Table 5.4, several implementations of the same method are shown and compared to each other with respect to the processing throughput in million tables per second and the total energy consumption in kWh. The reference data set used consists of approx. 5 000 samples and 500 000 SNPs (WTCCC), and the method parameters are comparable to the original BOOST implementation's defaults. For reference, the original implementation has been included. The bottom half of Table 5.4 contains the original CPU-based implementation of BOOST and the later published GPU-based version. The cluster-based reference implementation on a 32-node Intel Xeon E5 system delivers 46.3 million tables per second, a speed-up of 138 over the original CPU-based implementation when comparing whole systems. Assuming that the original implementation does not use multiple threads and, furthermore, that the cluster nodes each use 8 threads as supported by the processor, the per-core speed-up becomes approx. 0.6. This may be explained by the cluster node interconnects. Even with high-performance Infiniband links as they are used in the cluster system, a severe loss of computational power is to be expected. On the other hand, these results show that the original implementation is indeed well-written and optimized and is therefore suitable as a reference in comparisons.


Table 5.4. Performance comparison of different BOOST implementations

    Architecture                         Runtime         Throughput   Energy      Source
    FPGA Cluster                         6 min           348.0 MT/s   0.07 kWh    [GWK+15a]
      128× Spartan 6-LX150
    Multi-GPU Node                       10 min          208.8 MT/s   0.15 kWh    [GWK+15a]
      4× NVIDIA GTX Titan
    Hybrid System                        12 min          158.0 MT/s   0.10 kWh    [KWS+16]
      Virtex 7-690T, NVIDIA GTX 780 Ti
    Multi-GPU Node                       15 min          139.2 MT/s   0.18 kWh    [GWK+15a]
      4× NVIDIA Tesla K20m
    Single GPU                           37 min          56.4 MT/s    0.15 kWh    [GWK+15a]
      NVIDIA GTX Titan
    CPU Cluster                          45 min          46.3 MT/s    2.28 kWh    [GWK+15a]
      32× Intel Xeon E5-2660
    FPGA                                 51 min          41.3 MT/s    0.44 kWh    [Wie16]
      Kintex 7-325T
    Single GPU                           2 h 52 min*     12.1 MT/s    (unknown)   [YYW+11]
      original GBOOST
    CPU                                  115 h 44 min*   0.3 MT/s     (unknown)   [WYY+10]
      original BOOST
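The energy figures in Table 5.4 follow directly from average power draw and runtime; a short sketch of the conversion, assuming the 511 W load figure for the hybrid system from Table 5.3:

    // Energy in kWh from average power draw (W) and runtime (minutes).
    double energyKWh(double watts, double minutes) {
        return watts * (minutes / 60.0) / 1000.0;
    }
    // Example: energyKWh(511.3, 12.0) yields roughly 0.10 kWh,
    // matching the "Hybrid System" row above.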

Both multi-GPU systems implement the BOOST measure in single precision. In their dynamic workflow, data is split into work units and distributed among the systems' GPUs on demand. When comparing the "4× NVIDIA GTX Titan" node against the single-GPU node, a speed-up nearly linear in the number of devices can be observed (3.7). As the GTX Titan alone is specified with a power under load of 250 W, while the whole single-GPU system draws only 243 W, it can be assumed that neither the host system nor the GPU is able to exploit its full potential, though the margin seems to become narrower in the four-GPU version. It can further be seen that the hybrid-parallel prototype system (the "Hybrid System" row in Table 5.4) achieves a speed-up of 527 over the original CPU-based implementation [WYY+10] and 13 over the original GPU-based implementation [YYW+11].


Table 5.5. Performance comparison of different exhaustive pairwise interaction detectors

    Method            Architecture                Runtime      Throughput   Energy      Source
    iLOCi             FPGA Cluster                4 min        520.8 MT/s   0.05 kWh    [PNI+12]
    multiEpistSearch  2× NVIDIA Tesla K20m,       10 min       210.2 MT/s   (unknown)   [GKW+15]
                      1× Intel Xeon Phi
    BOOST             Hybrid System               12 min       158.0 MT/s   0.05 kWh
    EpiGPU            NVIDIA GeForce GTX 580      2 h 55 min   11.9 MT/s    (unknown)   [HTW+11]
    iLOCi             2× Intel Xeon quad-core     19 h         1.8 MT/s     4.94 kWh    [PNI+12]
                      (unspecified model)

An optimized version running on a compute cluster node with four NVIDIA GTX Titan devices, performance-wise comparable to the NVIDIA GTX 780 Ti used in this work, performs only 24 % better than the hybrid system but uses four computation devices instead of two. When compared to an FPGA cluster consisting of 128 Xilinx Spartan 6-LX150 FPGAs, 50 % of the table throughput is achieved. However, a single LX150 device contains a fifth of the logic resources of the Xilinx Virtex 7-690T. Additionally, an implementation of the contingency table generation and evaluation on a Xilinx Kintex 7-325T FPGA takes 51 min. This FPGA is technically equivalent to the Virtex 7-690T FPGA used in the prototype but possesses only half its resources. Still, the single-FPGA implementation achieved less than a quarter of the hybrid performance. It can therefore be concluded that a hybrid system with a GTX 780 Ti and a Virtex 7-690T can deliver more performance than a system with two of these graphics cards or two FPGA boards of this kind. Assuming that the performance scales linearly in the number of computing devices used, the hybrid prototype processes the data set in 35 % less time than a hypothetical multi-GPU node with two NVIDIA GTX Titan devices. Regarding energy efficiency, only the FPGA cluster solution beats the hybrid prototype. In Table 5.5, several two-way interaction analysis implementations are


compared that exhaustively cover the full WTCCC data set without pre-filtering. By far the fastest method is iLOCi, a model-free method that measures differences in the linkage disequilibrium (LD) of SNPs, i.e. the "equivalence" of SNPs [PNI+12]. Hemani et al. created EpiGPU, an OpenCL-based GPU-assisted approach that performs Fisher's dependency tests with 4 degrees of freedom [HTW+11; Lom07]. The second-best performance in this survey is delivered by multiEpistSearch, a logistic regression-based model similar to BOOST, implemented on GPU arrays using the Unified Parallel C++ compiler extension [GKW+15] and a novel block-oriented work distribution scheme. Unfortunately, energy consumption rates were not available for all methods. Comparing different methods by performance, though, may only be useful for initial screenings of newly acquired data sets, as the methods significantly differ in sensitivity and specificity, or in general, the quality of their results.

5.3.2 Third Order Interaction

Evaluation against three-way epistasis detection methods is not well established, probably due to the heavy computational burden. Therefore, a novel measure based on information-theoretic methods has been developed, the Mutual Information measure. A CPU-only, OpenMP-based parallel implementation has been used to provide a performance reference for determining the speed-up. The reference system features an Intel Core i7-4790K with four physical cores and 2-way simultaneous multithreading enabled (eight virtual cores, see Section 2.1.1). The clock frequency is 4.0 GHz and the system supports 32 GiB DDR4-SDRAM. Table 5.6 shows the results for a data set of 5 000 samples and the displayed number of SNPs. It can be seen that the overhead of setting up the FPGA/GPU pipelines is negligible, as the hybrid-parallel solution outperforms the CPU-only version after just over 100 SNPs and converges to a speed-up of approx. 72 at higher SNP numbers. One noteworthy implementation using a measure similar to the Mutual Information used in this work is GPU3SNP [GS15]. The authors use a configuration of four NVIDIA GTX Titan graphics accelerators to exhaustively evaluate all SNP 3-tuples using Mutual Information in 22 hours for 50 000 SNPs and 1 000 individuals.


Table 5.6. Three-way interaction comparison

    SNPs     Host Runtime    Hybrid Runtime   Speed-up
    100      0.8 s           3.2 s            0.25
    250      9.7 s           4.7 s            2.0
    500      1 min 10 s      6.9 s            10.1
    1 000    7 min 39 s      8 s              57.4
    2 000    1 h 2 min       53 s             70.7
    5 000    16 h 14 min     13 min 36 s      71.6
    10 000   5 d 10 h        1 h 49 min       71.8

Assuming a linear decrease in runtime w.r.t. the number of samples in the data set, the hybrid-parallel prototype system's evaluation runtime can be extrapolated to 45 hours and 20 minutes for the same data set, based on the runtime for 5 000 SNPs and 5 000 samples. However, earlier analyses suggest that the data channel capacity between the FPGA and GPU becomes a bottleneck when a large number of tables is generated in a short time frame (see Section 5.2.1). In the case of larger sample numbers, more time passes between the creation of two tables, reducing the actual data rates. Thus, interpolating from a data set run with 40 000 samples instead of 5 000 down to 1 000 samples yields a runtime of 31 hours by factoring out the transmission speed limit. The hypothetical multi-GPU node consisting of two NVIDIA GTX Titan boards as used in the BOOST evaluation would take 44 hours, resulting in a 42 % higher speed when using the hybrid architecture instead, a similar speed increase as observed in the two-way interaction set-up. To assess the viability of the hybrid prototype system with respect to a system solely consisting of FPGAs, the full analysis pipeline, i.e. the creation of contingency tables and the evaluation of the Mutual Information measure, has also been implemented on an FPGA. Unfortunately, the 3-way configuration presented in Table 5.1 occupies less than 50 % of the FPGA's available resources, as a fully occupied device would require higher data rates than the Generation 2 PCI Express link is currently capable of. Therefore, a Mutual Information pipeline could be added without reducing the number


Although the transmission link is an inherent component, and apparently a weakness, of the hybrid architecture, a more recent PCI Express interface on the FPGA side, such as Gen 3 or Gen 4, would support the assumption underlying the 31-hour interpolation above. Upgrading the link from Gen 2 to Gen 4 almost quadruples the available data rate; taking the table construction and processing rates from the earlier sections (i.e. Tables 5.1 and 5.2), the bottleneck is then unlikely to remain in the link. This in turn allows the FPGA’s resources to be exploited more fully, creating systolic arrays with more than twice the number of nodes and making the hybrid prototype viable again with respect to the standalone FPGA version. Unfortunately, aside from GPU3SNP, no other exhaustive search tool with comparable methods for third-order interaction search is available for comparison, although the authors of BOOST explicitly state that their method is generalizable to orders of interaction higher than two [WYY+10]. Regarding energy consumption, GPU3SNP could not be evaluated because of stability issues in longer runs, but as both its hardware and software architecture are similar to the four-node BOOST GPU system shown in Table 5.4, similar rates can be expected. The same similarity argument also applies to the hybrid-parallel prototype; the differences in energy consumption have been shown to be very small, as seen in Table 5.3.
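The 45-hour figure quoted above for the GPU3SNP data set follows directly from the measured 5 000-SNP run, under the stated assumptions that runtime scales linearly with the number of samples and with the number of evaluated 3-tuples, C(n,3), in the number of SNPs:

  t(50 000 SNPs, 1 000 samples) ≈ t(5 000 SNPs, 5 000 samples) × C(50 000, 3)/C(5 000, 3) × 1 000/5 000
                                ≈ 816 s × ≈1 000 × 0.2 ≈ 163 000 s ≈ 45 h 20 min.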

Chapter 6

Conclusion

6.1 Summary of Results

A working hybrid-parallel architecture has been developed with a focus on applications in bioinformatics, mainly large-scale analyses of genome-wide association studies. A general concept in computer science is to divide a problem into smaller, more manageable problems. This divide-and-conquer approach is also a central thought in the adaptation and development of new methods and tools for the hybrid system. Specialized devices, namely freely configurable FPGAs, highly parallel GPUs and flexible CPUs, are used to create a heterogeneous architecture in which every component is assigned a sub-problem specifically suited to the platform in question. In the case of gene-gene interactions in two and three dimensions, it has been shown that such a heterogeneous system is very efficient when compared to more conservative architectures. By the use of large systolic data structures, FPGAs can be programmed to generate contingency tables for later evaluation on GPUs, where more complex arithmetic and transcendental functions are required for the calculation. Through careful process orchestration by the CPU, both accelerators are able to work concurrently and exchange data with each other in very efficient ways. The hardware foundation has been explicitly selected to support high-throughput transfers, with current system bus controllers and beneficial PCI Express lane configurations. All peripherals are tied together by a custom driver infrastructure that makes heavy use of advanced Linux kernel features and hardware-assisted data transfers while providing a usable interface for applications, such as the presented two-way and three-way interaction analysis toolkit.


The suitability of the presented hybrid-parallel architecture is supported by the results presented in Chapter 5. In the case of two-way interaction analysis, the processing throughput of the proposed prototype has been shown to deliver results 35 % faster than two similar graphics cards and 25 % faster than two similar FPGA boards. In third-order interaction analysis, it is still 42 % faster than a two-GPU node but only as fast as a single-FPGA solution, with the transmission link being the limiting factor. Since the introduction of the Alpha Data Virtex-7 690T boards, better PCI Express endpoint modules (“drivers” on the FPGA) have become available that support Generation 3 and Generation 4 systems, doubling and quadrupling the available net data rates. These developments will certainly lift the transmission barrier. As the FPGA design in question consumes less than half of the available resources, fully exploiting the device will make the prototype faster than a two-FPGA system, or at least deliver on-par performance, given higher transport capabilities.

In general, the development and construction of the given hybrid-parallel architecture prototype yields a platform that supports applications in bioinformatics, specifically the exhaustive analysis of gene-gene interactions. The performance figures show a drastic decrease in runtime when compared to more homogeneous and conservative architectures such as single-CPU nodes, CPU-based clusters, and even GPU clusters. At the same time, the whole system fits into a single standard ATX desktop chassis without any expensive server-grade hardware or power-supply requirements beyond a GPU and an FPGA card. Instead of moving sensitive patient data to off-site data centers for computation, data sets can now be analyzed in-house by any lab technician, owing to simple platform interfaces, consumer-grade hardware requirements and low energy consumption.
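To make the division of labour summarized above more concrete, the following is a minimal sketch of the host-side orchestration: one thread drains contingency-table buffers from the FPGA, a second thread feeds them to the GPU and launches the evaluation kernels. All type and function names (FpgaDevice, GpuDevice, read_dma_buffer and so on) are placeholders and not the actual toolkit interface.

// Schematic of the host-side pipeline: the FPGA produces contingency-table
// buffers, the GPU consumes them, the CPU only moves data and keeps the
// pipeline filled. All names are placeholders for illustration.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct Buffer {};                                   // a block of contingency tables

struct FpgaDevice {                                 // placeholder for the FPGA DMA interface
    Buffer read_dma_buffer() { return Buffer{}; }
};
struct GpuDevice {                                  // placeholder for the GPU interface
    void upload(const Buffer&) {}                   // host -> GPU transfer
    void launch_evaluation_kernel() {}              // e.g. BOOST or Mutual Information kernel
};

class BlockingQueue {                               // STL queue plus locking, as in the prototype
    std::queue<Buffer> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(Buffer b) {
        { std::lock_guard<std::mutex> l(m_); q_.push(std::move(b)); }
        cv_.notify_one();
    }
    Buffer pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !q_.empty(); });
        Buffer b = std::move(q_.front());
        q_.pop();
        return b;
    }
};

int main() {
    FpgaDevice fpga;
    GpuDevice gpu;
    BlockingQueue queue;
    const int buffers = 1000;                       // arbitrary number of buffers for the sketch

    std::thread producer([&] {                      // services the FPGA
        for (int i = 0; i < buffers; ++i) queue.push(fpga.read_dma_buffer());
    });
    std::thread consumer([&] {                      // services the GPU
        for (int i = 0; i < buffers; ++i) {
            Buffer b = queue.pop();
            gpu.upload(b);
            gpu.launch_evaluation_kernel();
        }
    });
    producer.join();
    consumer.join();
    return 0;
}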

6.2 Future Work

Although the development process outlined and discussed in this work targeted a proof-of-concept system, a highly capable platform has evolved. During research and development, however, several optimizations became apparent that would have gone beyond the scope of this work and were therefore not implemented.


6.2.1 Direct Memory Access

Direct Memory Access is a core technology without which this prototype system could not have been developed. It allows transferring data between memories without involvement of the CPU, apart from set-up and tear-down, and therefore frees valuable computational resources for the application’s needs. In DMA scatter/gather transfers, the memory buffers scheduled for transmission are stored in lists in which each entry represents a 4 096-byte block of memory, also called a memory page. These lists have a fixed capacity and are processed and re-filled in a circular-buffer fashion. While the CPU does not need to transfer the data itself, it does have to prepare these scatterlists and feed them to the DMA controller. The presented applications use a number of buffers that are typically 256 MiB in size, or 65 536 pages per buffer. With transmission speeds measured in GiB/s, refilling these lists on time becomes a measurable part of the runtime. Newer hardware platforms, processors and operating systems support structures called Huge Pages on Linux, Super Pages on BSD and Large Pages on Windows. Generally, these are regular memory pages with sizes of 2 MiB or 1 GiB instead of the default 4 KiB, drastically reducing the effort of list management, provided that these page sizes are supported by all connected peripherals (see the sketch below). Currently, data buffers are moved off the FPGA to the host’s main memory using one DMA transaction, and a second DMA transaction is responsible for moving the same block of data from main memory to the GPU’s memory. Although both the NVIDIA CUDA interface and the Alpha Data interface explicitly allow transfers to “peer devices,” no transmission protocols have been published. Reverse engineering techniques could be used to develop such a protocol, remove the burden of buffer management entirely from the host, and reduce the number of required DMA transactions from two to one. Several research groups have already published findings regarding direct data transmission between FPGAs and NVIDIA GPUs, but none of them yet achieves the data rates required and presented in this work [BRF14; Sus14; Gil15].
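To give an impression of the bookkeeping involved: a 256 MiB buffer corresponds to 65 536 scatterlist entries with 4 KiB pages, but only to 128 entries with 2 MiB huge pages. The following minimal sketch allocates such a buffer from the Linux huge page pool via mmap with MAP_HUGETLB; it only illustrates the mechanism, is not code from the presented driver infrastructure, and assumes that huge pages have been reserved by the administrator.

#include <sys/mman.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const unsigned long buf_size = 256UL << 20;   // 256 MiB, as used by the applications
    // Scatterlist entries needed: 65 536 with 4 KiB pages vs. 128 with 2 MiB pages.
    std::printf("4 KiB pages: %lu entries, 2 MiB pages: %lu entries\n",
                buf_size / (4UL << 10), buf_size / (2UL << 20));

    // Allocate the buffer from the Linux huge page pool. This requires that huge
    // pages have been reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages.
    void* buf = mmap(nullptr, buf_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        std::perror("mmap with MAP_HUGETLB");
        return EXIT_FAILURE;
    }
    // ... register 'buf' with the DMA engine in 2 MiB chunks instead of 4 KiB pages ...
    munmap(buf, buf_size);
    return 0;
}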


6.2.2 Mutual Information Deficiencies

Mutual Information is a method from information theory that measures the mutual dependence of two random variables. In this work, the joint probability distribution of all SNPs used in a test is treated as a single random variable. While this allows efficient computation of the dependency between the SNP tuple and the case/control variable, the resulting score does not reflect only the combined effect of all SNPs: for a “good” score it is already sufficient that a single SNP has a high dependence (primary effect) or that two of the three SNPs have (secondary effect). It is not always desirable to let primary and secondary effects overshadow a possible tertiary effect. Additionally, when only a small number of best-list entries is kept, a single SNP with a strong influence may be present in a large number of results, raising the score of otherwise unremarkable SNP combinations. Although the primary and secondary effects could be removed from the final result by separately calculating the mutual information of all 1-tuples and 2-tuples of a 3-tuple and subtracting these (a sketch of such a correction is given below), implementing a different method in which primary and secondary effects play a lesser role might be a better solution. A promising candidate is Information Gain, where not only positive influence is modeled, but also negative (i.e. suppressing) influence. However, the Information Gain measure is much more complex than Mutual Information and can hardly be implemented on FPGAs in a resource-efficient way. Apart from yielding more precise results, this essentially removes the possibility of standalone FPGA solutions and emphasizes the role of this hybrid-parallel system for third-order interaction analyses. Research on Information Gain applications in third-order interaction analysis is currently being conducted and publications are in preparation.
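For reference, a minimal sketch of the basic quantity involved and of one conceivable form of the correction mentioned above. The table layout (one row per joint genotype, two columns for case/control) and the inclusion-exclusion form of the corrected score are illustrative assumptions, not the measure implemented on the prototype.

#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Mutual information I(G;Y) in bits between the joint genotype variable G of a
// SNP tuple (one row per genotype combination, e.g. 3^3 = 27 rows for a
// 3-tuple) and the binary case/control status Y (two columns).
double mutual_information(const std::vector<std::array<double, 2>>& counts) {
    std::vector<double> row(counts.size(), 0.0);
    std::array<double, 2> col{0.0, 0.0};
    double total = 0.0;
    for (std::size_t g = 0; g < counts.size(); ++g)
        for (int y = 0; y < 2; ++y) {
            row[g] += counts[g][y];
            col[y] += counts[g][y];
            total  += counts[g][y];
        }
    double mi = 0.0;
    for (std::size_t g = 0; g < counts.size(); ++g)
        for (int y = 0; y < 2; ++y) {
            const double pgy = counts[g][y] / total;
            if (pgy > 0.0)
                mi += pgy * std::log2(pgy / ((row[g] / total) * (col[y] / total)));
        }
    return mi;
}

// One conceivable form of the correction sketched above (an assumption, not the
// implemented measure): start from the 3-tuple score and remove the
// contributions of the contained pairs and singles by inclusion-exclusion, so
// that pure primary and secondary effects no longer dominate the ranking.
double corrected_score(double i_abc, double i_ab, double i_ac, double i_bc,
                       double i_a, double i_b, double i_c) {
    return i_abc - i_ab - i_ac - i_bc + i_a + i_b + i_c;
}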

6.2.3 Scaling Prototypes

An obvious follow-up target of this work is turning the prototype into a fully productive system in which multiple FPGAs, GPUs and CPUs work together. Although this possibility has already been considered in the design of the respective drivers and software packages, data flow management becomes much more complex.


Figure 6.1. PCI Express lane layout in a multi-processor system (two CPUs providing 40 PCI Express lanes each, coupled via QPI; endpoints FPGA 0–3 and GPU 0–3)

An increased number of data-moving threads will inevitably increase the lock contention ratio, i.e. the share of time that threads spend waiting for a shared resource to become available. Currently, with only a few producing threads and a single consumer, the buffering queue could be implemented using standard queue structures from the C++ Standard Template Library and semaphores. A larger number of producers and consumers in high-performance environments might require more sophisticated data structures. One notable candidate for implementation is the Parallel Flat-Combining Synchronous Queue, in which producers and consumers no longer operate on the work queue directly but are serviced by a dedicated thread pool that dispenses slot tickets [HIS+10]. These tickets can then be used to access slots in the queue without further synchronization with other, concurrently accessing threads, virtually eliminating lock contention on these central data structures.

Aside from the pure software implementation, multiple computing devices raise issues in the PCI Express connectivity. If a GPU requires 16 lanes, four GPUs require 64 lanes. Unfortunately, current processors do not provide more than 40 lanes. An obvious solution is to use a multi-processor system for more PCI Express bandwidth, as shown in Figure 6.1. Even in a dual-processor set-up, though, 40 lanes per processor are not sufficient to deliver maximum performance to all connected devices. In a shared FPGA/GPU solution as presented in this work, data only flows between FPGAs and GPUs.


It is therefore sufficient to downgrade the GPU link from 16 lanes to 8 lanes, the highest lane width current Xilinx FPGAs support natively. As Figure 6.1 suggests, additional care has to be taken not to cross CPU domain boundaries. For example, if FPGA 1 transmits data to GPU 2, the PCI Express connection is routed through the CPU interconnect QPI (Intel QuickPath Interconnect). The QPI is a shared medium over which the various CPU peripherals communicate with each other, i.e. memory controllers, thread management units, I/O devices and PCI Express controllers. It not only limits the maximum bandwidth of concurrently running (and boundary-crossing) FPGA/GPU transfers but may also severely reduce overall system performance through bus contention, for example with critical main memory transfers. For a successful implementation of a multi-device hybrid-parallel architecture, the exact CPU and PCI Express topologies therefore have to be examined. Additionally, great care has to be taken to bind execution threads that service a particular FPGA or GPU to the responsible CPU, in order to avoid costly thread migration between CPUs and/or physical cores (a sketch of such a binding is given below).

Currently, a follow-up system to this prototype is being installed at the Institute of Clinical Molecular Biology, Kiel University, generously funded by the Prof.-Dr.-Werner-Petersen-Stiftung. It features four high-end NVIDIA Tesla P100 accelerators, four Xilinx Kintex UltraScale KU115 devices, the largest in the Kintex UltraScale series, two 8-core Intel Xeon processors clocked at 3.2 GHz and 256 GiB DDR4-SDRAM. The presented BOOST method as well as the Mutual Information and Information Gain measures are currently being implemented for this system.
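A minimal sketch of such a binding using the GNU/Linux pthread affinity interface; the core lists are placeholders and would have to be derived from the actual CPU and PCI Express topology (e.g. with hwloc).

#include <pthread.h>
#include <sched.h>
#include <thread>
#include <vector>

// Pin the calling thread to a given set of physical cores so that a thread
// servicing a particular FPGA or GPU never migrates across the QPI boundary.
// Requires GNU/Linux (pthread_setaffinity_np is a GNU extension).
void pin_to_cores(const std::vector<int>& cores) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c : cores) CPU_SET(c, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    // Placeholder mapping: assume socket 0 owns cores 0-7 and socket 1 owns
    // cores 8-15; the real mapping must be read from the system topology.
    std::thread fpga_service([] { pin_to_cores({0, 1, 2, 3, 4, 5, 6, 7});        /* DMA service loop */ });
    std::thread gpu_service ([] { pin_to_cores({8, 9, 10, 11, 12, 13, 14, 15});  /* kernel launch loop */ });
    fpga_service.join();
    gpu_service.join();
    return 0;
}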

6.2.4 Other Applications

Analysis of genome-wide association studies is only one computationally demanding area in bioinformatics; the potential applications of such a hybrid system are manifold. Ongoing projects include the following, for which standalone FPGA implementations have already been developed [Wie16]:

Imputation refers to the statistical inference of genotypes that have not actually been observed. When using microarrays as explained in Section 3.1.6, only a limited set of SNPs is actually acquired. These, however, are supposed to supply enough information to infer SNP variants in non-typed regions based on statistical evaluations.

Sequence Alignment is the process of aligning two or more DNA sequences to each other to find the positioning with the largest identical overlap. Before a GWAS is conducted, the actual SNP locations on the genome have to be known. One method to find these locations is the alignment of a large number of sequences to a reference genome. If a location is found where all (or many) sequences match except for a single base pair, a SNP might have been found. Naïve sequence comparison typically runs in quadratic time with respect to the string lengths and therefore presents a demanding problem (a sketch of the classic dynamic-programming approach is given below).
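A minimal sketch of the classic quadratic-time global alignment recurrence in the spirit of Needleman and Wunsch [NW70]; the match, mismatch and gap scores are arbitrary placeholder values.

#include <algorithm>
#include <string>
#include <vector>

// Global alignment score of two DNA sequences in O(|a| * |b|) time and space,
// following the classic dynamic programming recurrence.
int alignment_score(const std::string& a, const std::string& b,
                    int match = 1, int mismatch = -1, int gap = -1) {
    std::vector<std::vector<int>> dp(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 0; i <= a.size(); ++i) dp[i][0] = static_cast<int>(i) * gap;
    for (std::size_t j = 0; j <= b.size(); ++j) dp[0][j] = static_cast<int>(j) * gap;
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int diag = dp[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            dp[i][j] = std::max({diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap});
        }
    return dp[a.size()][b.size()];
}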



Appendix A

Command-Line Options

A.1 adboost Help Output

Usage: adboost [options]

Regular options:
  -? [ --help ]                             produce this help message
  --version                                 prints version information
  --export-snpdb arg                        exports the SNP database to the given file
  -f [ --format ] arg (=plink_transposed)   use this input data format (only needed if not
                                            importing the native format). Allowed formats:
                                            plink_transposed, plink_binary, boost, native
  -o [ --output ] arg                       write SNP interaction data to this file
                                            instead of stdout
  -t [ --threshold ] arg (=15)              set threshold for KSA test
  --threshold-ll arg (=30)                  set threshold for log-linear test

Hidden options (only visible in debug mode):
  --debug                    produce lots of debug output
  --assume-fpgas arg (=-1)   set number of FPGAs to use (-1 for all)
  --assume-gpus arg (=-1)    set number of GPUs to use (-1 for all)
  --convert-only             do only format conversion, don’t do actual
                             contingency table processing
  --dump-buffers             dump received FPGA and GPU buffers to disk
  --timeout arg (=5000)      Timeout for FPGA transmissions (in ms)

This is version 1.00, compiled on Oct 10 2016, 11:10:22
Send bugs to Jan Christian Kaessens
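As an illustration only: with the options documented above, a run that reads a PLINK binary data set, uses a KSA threshold of 20 and writes the interaction results to a file could be started roughly as follows. How the input data set itself is passed to the program is not shown in the usage line above, so the exact invocation pattern is an assumption.

adboost -f plink_binary -t 20 -o interactions.txt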


A.2 ad3way Help Output

Usage: ad3way [options]

Regular options:
  -? [ --help ]                             produce this help message
  --version                                 prints version information
  --export-snpdb arg                        exports the SNP database to the given file
  -f [ --format ] arg (=plink_transposed)   use this input data format (only needed if not
                                            importing the native format). Allowed formats:
                                            plink_transposed, plink_binary, boost, native
  -o [ --output ] arg                       write SNP interaction data to this file
                                            instead of stdout
  -n [ --result-count ] arg (=1000)         set number of best results to keep

Hidden options (only visible in debug mode):
  --debug                    produce lots of debug output
  --assume-fpgas arg (=-1)   set number of FPGAs to use (-1 for all)
  --assume-gpus arg (=-1)    set number of GPUs to use (-1 for all)
  --convert-only             do only format conversion, don’t do actual
                             contingency table processing
  --dump-buffers             dump received FPGA and GPU buffers to disk
  --timeout arg (=5000)      Timeout for FPGA transmissions (in ms)

This is version 1.00, compiled on Oct 10 2016, 14:15:39
Send bugs to Jan Christian Kaessens


Appendix B

Curriculum Vitae

M.Sc. Jan Christian Kässens

Date of birth: October 3rd, 1984
Place of birth: Eckernförde, Germany
Nationality: German

Education

09/1995–07/2004 Jungmann Gymnasium Eckernförde, Abitur

Employment

10/2004–07/2005 Albert-Schweitzer-Schule, Eckernförde, Mandatory Social Service

07/2005–01/2007 syscOm automatisierungssysteme, Groß Wittensee, apprenticeship with focus on systems programming


since 02/2008 Freelancing with focus on application development

10/2012–03/2013 SciEngines GmbH, Kiel, software and hardware development, focus on systems and driver programming

04/2013–09/2016 CAU Kiel, Department of Computer Science, Technical Computer Science Group, research assistant, focus on high-performance applications in bioinformatics

since 10/2016 CAU Kiel, Institute of Clinical Molecular Biology, Bioinformatics Group, focus on high-performance applications in bioinformatics

Study and Research

04/2007–09/2010 CAU Kiel, Computer Science, Bachelor’s thesis “Model-based Detection of Highlights in Dense Light Field Samples”. Minor in psychology, cognition and learning

04/2010–12/2012 CAU Kiel, Computer Science, Master’s thesis “A Distributed Shared Memory Core for FPGA Clusters”. Minor in media pedagogics

since 03/2013 CAU Kiel, Computer Science, Dissertation “A Hybrid-parallel Architecture for Applications in Bioinformatics”

08/2013–02/2016 Research assistant in excellence cluster project “A Power-saving Benchtop Machine for Ultra-fast Genetic Data Analysis and Interpretation in the Inflammation Clinic”

07/2013–02/2016 Leading scientist in ZIM/AiF project “Development of a hybrid-parallel computing architecture for bioinformatics”

Teaching

04/2008 - 09/2009 Teaching assistant for “Organisation und Architektur von Rechnern” (Organization and Architecture of Computer Systems)

10/2009 - 09/2011 Student assistant at the Department of Computer Science (IT department)

10/2011 - 03/2013 Student assistant at the Media Education Institute (IT department)

04/2013 - 09/2016 Head teaching assistant for “Rechnergestützter Entwurf digitaler Systeme” (Computer-aided design of digital systems)

Mentoring

07/2015 Annika Pooch (Bachelor’s Thesis), “Entropieanalyse und Kompression von DNS” (Entropy Analysis and Compression of DNA)

Publications

Ź Jan Christian Kässens, Jorge González-Domínguez, Lars Wienbrandt, and Bertil Schmidt, UPC++ for Bioinformatics: A Case Study Using Genome-Wide Association Studies, 15th IEEE International Conference on Cluster Computing, Sep 2014.

Ź Jorge González-Domínguez, Bertil Schmidt, Jan Christian Kässens, and Lars Wienbrandt, Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS, 20th International European Conference on Parallel and Distributed Computing (Euro-Par 2014), Lecture Notes in Computer Science, vol. 8632, pp. 680–691

Ź Lars Wienbrandt, Jan Christian Kässens, Jorge González-Domínguez, Bertil Schmidt, David Ellinghaus, and Manfred Schimmler, FPGA-based


acceleration of detecting statistical epistasis in GWAS, Procedia Computer Science, vol. 29 (2014), pp. 220–230.

Ź Jorge González-Domínguez, Jan Christian Kässens, Lars Wienbrandt, and Bertil Schmidt, Large-Scale Genome-Wide Association Studies on a GPU Cluster Using a CUDA-Accelerated PGAS Programming Model, International Journal of High-Performance Computing Applications, vol. 29 no. 4, pp. 506–510

Ź Jan Christian Kässens, Lars Wienbrandt, Jorge González-Domínguez, Bertil Schmidt, Manfred Schimmler, High-Speed Exhaustive 3-locus Interaction Epistasis Analysis on FPGAs, Journal of Computational Science, vol. 9, pp. 131–136

Ź Michael Wittig, Jarl Anmarkrud, Jan Kässens, Simon Koch, Michael Forster, Eva Ellinghaus, Johannes Hov Roksund, Sascha Sauer, Manfred Schimmler, Malte Ziemann, Siegfried Görg, Frank Jacob, Tom Karlsen, Andre Franke, Development of a high resolution NGS-based HLA-typing and analysis pipeline, Nucleic Acids Research, 43(11):e70, 2015

Ź Jorge González-Domínguez, Lars Wienbrandt, Jan Christian Kässens, David Ellinghaus, Manfred Schimmler, and Bertil Schmidt, Parallelizing Epistasis Detection in GWAS on FPGA and GPU-accelerated Computing Systems, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12 (5), 2015, pp. 982–994

Ź Jan Christian Kässens, Lars Wienbrandt, Jorge González-Domínguez, Bertil Schmidt, Manfred Schimmler, Combining GPU and FPGA Technology for Efficient Exhaustive Interaction Analysis in GWAS, Proceedings of the 2016 IEEE 27th International Conference on Application Specific Systems, Architectures and Processors, 2016, pp. 170–175

Ź Sven Gundlach, Jan Christian Kässens and Lars Wienbrandt, Genome- Wide Association Interaction Studies with MB-MDR and maxT multiple testing correction on FPGAs, Procedia Computer Science, 2016, vol. 80, pp. 639– 649

Ź Lars Wienbrandt, Jan Christian Kässens, Matthias Hübenthal, and David Ellinghaus, Fast Genome-wide Third-order SNP Interaction Tests with Information Gain on a Low-cost Heterogeneous Parallel FPGA-GPU Computing Architecture, Procedia Computer Science, 2017, manuscript accepted for publication

Professional Activities and Memberships

Ź Member of the Program Committee of the Workshop on Parallel Computational Biology (PBC 2015). Krakow, Poland

Ź Member of the Program Committee and Reviewer of the ACM Platform for Advanced Scientific Computing (PASC 2016). Lausanne, Switzerland

Ź Reviewer for the IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Ź Reviewer for the International Journal of High Performance Computing (IJHPC)

Ź Member of the Program Committee and Reviewer of The International Conference on High Performance Computing & Simulation (HPCS 2017), Workshop on Exploitation of high performance Heterogeneous Architectures and Accelerators (WEHA 2017)


Bibliography

[LDD3] Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hart- man. “Linux Device Drivers”. In: 3rd ed. O’Reilly, Jan. 2005. Chap. 8. [Z97] Intel 9 Series Chipset Family Platform Controller Hub Datasheet. Intel Corporation. June 2015. [04] Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor. White Paper. 2004. [12] NVIDIA Kepler GK110 Architecture Whitepaper. NVIDIA Cor- poration. 2012. [15] Leveraging Power Leadership at 28 nm Xilinx 7 Series FPGAs (WP436). Xilinx Inc. Jan. 2015. [4790] 4th Generation Intel Core Processor Family Datasheet. Intel Cor- poration. Mar. 2015. [ADK+12] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, Mas- similiano Meneghin, and Massimo Torquati. “An Efficient Unbounded Lock-Free Queue for Multi-core Systems”. In: Euro-Par 2012 Parallel Processing: 18th International Conference, Euro-Par 2012, Rhodes Island, Greece, 8 27-31, 2012. Proceedings. Ed. by Christos Kaklamanis, Theodore Papatheodorou, and Paul G. Spirakis. Berlin, Heidelberg: Springer, 2012, pp. 662– 673. isbn: 978-3-642-32820-6. doi: 10.1007/978-3-642-32820-6_65. [AJL+15] Brude Alberts, Alexander Johnson, Julian Lewis, David Mor- gan, Martin Raff, Keith Roberts, and Peter Walter. Molecular Biology of the Cell. 6th. Garland Science, 2015.


[ALI+00] Kati Asumalahti, Tarja Laitinen, Raija Itkonen-Vatjus, Marja- Liisa Lokki, Sari Suomela, Erna Snellman, Ulpu Saarialho- Kere, and Juha Kere. “A Candidate Gene for Psoriasis near HLA-C, HCR (Pg8), is Highly Polymorphic with a Disease- associated Susceptibility Allele”. In: Human Molecular Genetics 9.10 (2000), pp. 1533–1542. doi: 10 . 1093 / hmg / 9 . 10 . 1533. eprint: http://hmg.oxfordjournals.org/content/9/10/1533.full.pdf+html. url: http://hmg.oxfordjournals.org/content/9/10/1533.abstract. [ASS+86] M. D. Atkinson, J.-R. Sack, N. Santori, and T.Strothotte. “Min- max Heaps and Generalized Priority Queues”. In: Communi- cations of the ACM 29.10 (Oct. 1986), pp. 996–1000. [BAS03] Ravi Budruk, Don Anderson, and Tom Shanley. “PCI Express System Architecture”. In: Addison-Wesley, 2003. Chap. 6, pp. 252–282. [BCC+07] P. R. Burton et al. “Genome-wide Association Study of 14,000 Cases of Seven Common Diseases and 3,000 Shared Controls”. In: Nature 447.7145 (June 2007), pp. 661–678. [BCP+04] Sebastian Bonhoeffer, Colombe Chappey, Neil T. Parkin, Jeanette M. Whitecomb, and Chrostos J. Petropoulos. “Evi- dence for Positive Epistasis in HIV-1”. In: Science 306.5701 (2004), pp. 1547–1550. [Bel53] David Arthur Bell. Information Theory and Its Engineering Ap- plications. Sir Issac Pitman & Sons, Ltd., 1953. [BK08] Pankaj Bhambri and Amit Kamra. “Computer Peripherals And Interfaces”. In: Technical Publications, 2008. Chap. Sys- tem Resources, pp. 1–35. [BO03] Randal E. Bryant and David O’Hallaron. Computer systems: a programmer’s perspective. Pearson Education, 2003. [BRF14] Ray Bittner, Erik Ruf, and Alessandro Forin. “Direct GPU/F- PGA Communication via PCI Express”. In: Cluster Computing 17.2 (2014), pp. 339–348. issn: 1573-7543. doi: 10.1007/s10586-013- 0280-9. url: http://dx.doi.org/10.1007/s10586-013-0280-9.


[CC10] Brian Charlesworth and Deborah Charlesworth. Elements of Evolutionary Genetics. Roberts & Company Publishers, 2010. [CCM94] Svante Carlsson, Jingsen Chen, and Christer Mattsson. “Heaps with Bits”. In: Algorithms and Computation: 5th International Symposium Proceedings. Ed. by Ding-Zhu Du and Xiang-Sun Zhang. Vol. 834. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1994, pp. 288–296. [CCT+15] Christopher C. Chang, Carson C. Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M. Purcell, and James J. Lee. “Second-generation PLINK: Rising to the Challenge of Larger and Richer Datasets”. In: GigaScience 4.1 (2015), p. 7. issn: 2047-217X. doi: 10.1186/s13742-015-0047-8. url: http://dx.doi.org/10. 1186/s13742-015-0047-8. [CLR+07] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. 2nd. The MIT Press, 2007. [Cor04] Intel Corporation. Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor. White Paper. Mar. 2004. [CSM12] Yupeng Chen, Vertil Schmidt, and Doublas L. Maskell. “A hy- brid short read mapping accelerator”. In: BMC Bioinformatics 14.67 (2012). [CT06] Thomas M. Cover and Joy A. Thomas. “Elements of Informa- tion Theory”. In: 2nd. John Wiley & Sons, Inc., 2006. Chap. 2, pp. 13–20. [Dar01] Frederica Darema. “The SPMD Model: Past, Present and Future”. In: Proceedings of the 8th European PVM/MPI Users’ Group Meeting on Recent Advances in Parallel and Message Passing Interface. London, UK, UK: Springer- Verlag, 2001, pp. 1–. isbn: 3-540-42609-4. url: http://dl.acm.org/ citation.cfm?id=648138.746808. [Deu96] Peter Deutsch. GZIP File Format Specification 4.3. RFC 1952. Engineering Task Force, Network Working Group, May 1996.


[DGN+88] Frederica Darema, David A. George, Alan Norton, and Gre- gory F. Pfister. “A Single-program-multiple-data Computa- tional Model for EPEX/FORTRAN.” In: Parallel Computing 7.1 (1988), pp. 11–24. [DLP02] Jack J. Dongarra, Piotr Luszczeck, and Antoine Petitet. The LINPACK : Past, Present, and Future. Tech. rep. Uni- versity of Tennessee, Department of Computer Science, Knox- ville, July 2002. [DS182] Kintex-7 FPGAs Data Sheet: DC and Switchting Characteristics. v2.15. Xilinx Inc. Nov. 2015. [DS890] UltraScale Architecture and Product Overview. v2.10. Xilinx Inc. Nov. 2016. [DWL+12] Peng Du, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gre- gory Peterson, and Jack Dongarra. “From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming”. In: Parallel Computing 38.8 (2012). {AP- PLICATION} {ACCELERATORS} {IN} {HPC}, pp. 391–407. issn: 0167-8191. doi: http://dx.doi.org/10.1016/j.parco.2011.10.002. url: http://www.sciencedirect.com/science/article/pii/S0167819111001335. [ECX+11] Hadi Esmaeilzadeh, Ting Cao, Yang Xi, Stephen M. Blackburn, and Kathryn S. McKinley. “Looking Back on the Language and Hardware Revolutions: Measured Power, Performance, and Scaling”. In: SIGARCH Comput. Archit. News 39.1 (Mar. 2011), pp. 319–332. issn: 0163-5964. doi: 10.1145/1961295.1950402. url: http://doi.acm.org/10.1145/1961295.1950402. [Fis83] Joseph A. Fisher. “Very Long Instruction Word Architectures and the ELI-512”. In: SIGARCH Comput. Archit. News 11.3 (June 1983), pp. 140–150. issn: 0163-5964. url: http://dl.acm. org/citation.cfm?id=1067651.801649. [Fly72] Michael J. Flynn. “Some Computer Organizations and Their Effectiveness”. In: IEEE Transactions on Computers C-21.9 (Sept. 1972), pp. 948–960.


[Fly95] Michael J. Flynn. “Computer Architecture: Pipelined and Parallel ”. In: Computer Science Series. Jones and Bartlett Publishers, Mar. 1995. Chap. 1, pp. 54–56. [Fog16] Agner Fog. Instruction Tables. Tech. rep. 4. Technical University of Denmark, 2016. [GFB+04] Edgar Gabriel et al. “Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation”. In: Proceedings, 11th European PVM/MPI Users’ Group Meeting. Budapest, Hun- gary, Sept. 2004, pp. 97–104. [GHF+06] Michael Gschwind, H. Peter Hofstee, Brian Flachs, Martin Hopkins, Yukio Watanabe, and Takeshi Yamazaki. “Syner- gistic Processing in Cell’s Multicore Architecture”. In: IEEE Micro 26.2 (Mar. 2006), pp. 10–24. issn: 0272-1732. doi: 10.1109/ MM.2006.41. url: http://dx.doi.org/10.1109/MM.2006.41. [Gil15] Alexander Gillert. “Direct GPU-FPGA Communication”. MA thesis. Technische Universität München, 2015. [GKW+15] Jorge González-Domínguez, Jan Christian Kässens, Lars Wien- brandt, and Bertil Schmidt. “Large-scale Genome-wide Asso- ciation Studies on a GPU Cluster using a CUDA-accelerated PGAS Programming Model”. In: The International Journal of High Performance Computing Applications 29.4 (2015), pp. 506– 510. doi: 10.1177/1094342015585846. eprint: http://dx.doi.org/10.1177/ 1094342015585846. url: http://dx.doi.org/10.1177/1094342015585846. [GMY+14] Xuan Guo, Yu Meng, Ning Yu, et al. “Cloud Computing for Detecting High-order Genome-wide Epistatic Interaction via Dynamic Clustering”. In: BMC Bioinformatics 15.1 (2014), p. 102. issn: 1471-2105. doi: 10.1186/1471- 2105- 15- 102. url: http: //www.biomedcentral.com/1471-2105/15/102. [Goo01] David S. Goodsell. “The Molecular Perspective: Ultraviolet Light and Pyrimidine Dimers”. In: The Oncologist 6.3 (June 2001), pp. 298–299.


[GRW+13] Benjamin Goudey, David Rawlinson, Qiao Wang, Fan Shi, Herman Ferra, Richard M. Campbell, Linda Stern, Michael T. Inouye, Cheng Soon Ong, and Adam Kowalczyk. “GWIS - model-free, fast and exhaustive search for epistatic interac- tions in case-control GWAS”. In: BMC Genomics 14.3 (2013), pp. 1–18. [GS15] Jorge González-Domínguez and Bertil Schmidt. “GPU-accel- erated Exhaustive Search for Third-order Epistatic Interac- tions in Case–control Studies”. In: Journal of Computational Science 8 (2015), pp. 93–100. issn: 1877-7503. doi: http : / / dx . doi.org/10.1016/j.jocs.2015.04.001. url: http://www.sciencedirect.com/ science/article/pii/S1877750315000393. [GSK+14] Jorge González-Domínguez, Bertil Schmidt, Jan C. Kässens, and Lars Wienbrandt. “Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS”. In: Euro- Par 2014 Parallel Processing: 20th International Conference, Porto, Portugal, 8 25-29, 2014. Proceedings. Ed. by Fernando Silva, Inês Dutra, and Vítor S. Costa. Springer International Publishing, 2014, pp. 680–691. [GWK+15a] Jorge Gonzalez-Dominguez, Lars Wienbrandt, Jan Christian Kässens, David Ellinghaus, Manfred Schimmler, and Bertil Schmidt. “Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems”. In: IEEE/ACM Trans Comput Biol Bioinform 12.5 (2015), pp. 982–994. [GWK+15b] Jorge González-Domínguez, Lars Wienbrandt, Jan C. Kässens, David Ellinghaus, Manfred Schimmler, and Bertil Schmidt. “Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems”. In: IEEE/ACM Trans- actions on Computational Biology and Bioinformatics 12.5 (Oct. 2015), pp. 982–994. [HIS+10] Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. “Scalable Flat-Combining Based Synchronous Queues”. MA thesis. Ben-Gurion University, Tel-Aviv University, and Sun Labs at Oracle, 2010.


[Hoe12] Jesse Hoey. The Two-Way Likelihood Ratio (G) Test and Compari- son to Two-Way Chi Squared Test. 2012. arXiv: 1206.4481 [stat.ME]. [HTW+11] G. Hemani, A. Theocharidis, W. Wei, and C. Haley. “EpiGPU: Exhaustive Pairwise Epistasis Scans Parallelized on Con- sumer Level Graphics Cards”. In: Bioinformatics 27.11 (June 2011), pp. 1462–1465. [IA32] Intel 64 and IA-32 Architectures Software Developer’s Manual. Intel Corporation. June 2016. [IBS12] Ra Inta, David J. Bowman, and Susan M. Scott. “The "Chime- ra": An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Comput- ing Platform”. In: International Journal of Reconfigurable Com- puting 2012 (2012). [Ill16] Illumina Inc. Human Infinium DNA Microarrays Information Sheet. Online, accessible at http://www.illumina.com/applications/ agriculture/consortia.html. Pub. No. 370-2015-003. June 2016. [Ing56] Vernan M. Ingram. “A Specific Chemical Difference Between the Globins of Norml Human and Sickle-cell Anaemia Hae- moglobin”. In: Nature 178.4537 (Oct. 1956), pp. 792–794. [Int93] 8237A High Performance Programmable DMA Controller (8237A- 5) Datasheet. Intel Corporation, Sept. 1993. [JLS11] Jestinah Mahachie John, François Van Lishout, and Kristel Van Steen. “Model-Based Multifactor Dimensionality Reduc- tion to Detect Epistasis for Quantitative Traits in the Presence of Error-free and Noisy Data”. In: European Journal of Human Genetics 19 (2011), pp. 696–703. [Kau93] Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, 1993. [KDH+06] Eric J. Kelmelis, James P. Durbano, John R. Humphrey, Fer- nando E. Ortiz, and Petersen F. Curt. “Modeling and simu- lation of nanoscale devices with a desktop supercomputer”. In: vol. 6328. 2006, pp. 4–12. doi: 10.1117/12.681085. url: http: //dx.doi.org/10.1117/12.681085.


[Ken01] Steven L. Kent. The Ultimate History of Video Games: From Pong to Pokémon. Three Rivers Press, 2001. [KGB+14] J. Kocz et al. “A scalable hybrid FPGA/GPU FX correla- tor”. In: Journal of Astronomical Instrumentation 3.1 (2014), p. 1450002. [KGW+14] J. C. Kässens, J. González-Domínguez, L. Wienbrandt, and B. Schmidt. “UPC++ for Bioinformatics: A Case Study Using Genome-Wide Association Studies”. In: IEEE CLUSTER. 2014, pp. 248–256. [KL10] Wolfgang Kastl and Thomas Loimayr. “A Parallel Comput- ing System with Specialized for Cryptanalytic Algorithms”. In: Sicherheit 2010: Sicherheit, Schutz und Zuver- lässigkeit, Beiträge der 5. Jahrestagung des Fachbereichs Sicherheit der Gesellschaft für Informatik e.V. (GI), 5.-7. Oktober 2010 in Berlin. 2010, pp. 73–84. url: http://subs.emis.de/LNI/Proceedings/ Proceedings170/article5766.html. [KL51] Solomon Kullback and Richard A. Leibler. “On Information and Succiciency”. In: Annals of Mathermatical Statistics 22.1 (Mar. 1951), pp. 79–86. [KMS+07] M. A. Khan, A. Mathieu, R. Sorrentino, and N. Akkoc. “The Pathogenetic Role of HLA-B27 and its Subtypes”. In: Autoim- mun Rev 6.3 (Jan. 2007), pp. 183–189. [KTL+12] Kiran Kasichayanula, Dan Terpstra, Piotr Luszcek, Stan To- mov, and Shirley Moore. Power Aware Computing on GPUs. Tech. rep. Innovative Computing Laboratory, University of Tennessee Knoxville, USA, 2012. [KWG+15] Jan Christian Kässens, Lars Wienbrandt, Jorge González- Domínguez, Bertil Schmidt, and Manfred Schimmler. “High- speed exhaustive 3-locus interaction epistasis analysis on FPGAs”. In: Journal of Computational Science 9 (2015), pp. 131– 136.


[KWS+16] J. C. Kässens, L. Wienbrandt, M. Schimmler, J. González- Domínguez, and B. Schmidt. “Combining GPU and FPGA Technology for Efficient Exhaustive Interaction Analysis in GWAS”. In: 2016 IEEE 27th International Conference on Ap- plication-specific˜ Systems, Architectures and Processors (ASAP). July 2016, pp. 170–175. doi: 10.1109/ASAP.2016.7760788. [Lam79] Leslie Lamport. “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs”. In: ACM Transactions on Computer Systems 28.9 (Sept. 1979), pp. 690– 691. [Lew04] Ricki Lewis. Human Genetics: Concepts and Applications. 6th. McGraw-Hill, 2004. [LMA+16] Jing Li, James D. Malley, Angeline S. Andre, Margaret R. Karagas, and Jason H. Moore. “Detecting Gene-gene Interac- tions using a Permutation-based Random Forest Method”. In: BioData Mining 9 (2016), p. 14. [Lom07] Richard G. Lomax. Statistical Concepts: A Second Course. 3rd ed. Lawrence Erlbaum Associates Inc., Apr. 2007. [Mah08] Brendan Maher. “Personal Genomes: The Case of the Missing Heritability”. In: Nature 456 (Nov. 2008), pp. 18–21. [Man10] Teri A. Manolio. “Genomewide Association Studies and As- sessment of the Risk of Disease”. In: New England Journal of Medicine 363.2 (2010), pp. 166–176. [MAW10] Jason H. Moore, Folkert W. Asselbergs, and Scott M. Williams. “Bioinformatics Challenges for Genome-wide Association Studies”. In: Bioinformatics 26.4 (2010), pp. 445–455. [MBH+02] Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, and Michael Upton. Hyper- Threading Technology Architecture and Microarchitecture. Tech. rep. Intel Corporation, 2002. [Mic10] Micron Technology Inc. MT8JTF12864HZ-1GB 204-Pin DDR3 SODIMM. Data sheet. Rev. G 5/13 EN. 2010.


[Mic11] Micron Technology Inc. 18KSF1G72HZ-8GB 204-Pin DDR3L SODIMM. Data sheet. Rev. I 8/15 EN. 2011. [NCT+16] Marco S. Nobile, Paolo Cazzaniga, Andrea Tangherloni, and Daniela Besozzi. “Graphics processing units in bioinformatics, computational biology and systems biology”. In: vol. 182. Briefings in Bioinformatics 4. July 2016, pp. 1–16. [NW70] Saul B. Needleman and Christian D. Wunsch. “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. In: Journal of Molecular Biology 48.3 (1970), pp. 443–453. [OGK+12] Daniel A. Orozco, Elkin Garcia, Rishi Khan, Kelly Livingston, and Guang Gao. “Toward High-Throughput Algorithms on Many-Core Architectures”. In: ACM Transactions on Architec- ture and Code Optimization 8.4 (2012), Article No. 49. [Ori07] Nikki Mirghafori Oriol Vinyals Gerald Friedland. Revisiting a Basic Function on Current CPUs: A Fast Logarithm Imple- mentation with Adjustable Accuracy. Tech. rep. International Computer Science Institute, University of California, June 2007. [Pea00] Karl Pearson. “On the criterion that a given system of deriva- tions from the probable in the case of a correlated System of variables is such that it can be reasonably supposed to have arisen from random sampling”. In: The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50.5 (1900), pp. 157–175. [Phi08] Patrick C. Phillips. “Epistasis – The Essential Role of Gene In- teractions in the Structure and Evolution of Genetic Systems”. In: Nature Reviews. Genetics 9.11 (2008), pp. 855–867. [PMV] Albert Pool, Martin Mares, and Michael Vaner. “The PCI ID Repository”. retrieved on 2016-09-20. url: http://pci-ids.ucw. cz/.


[PNI+12] Jittima Piriyapongsa, Chumpol Ngamphiw, Apichart Intara- panich, Supasak Kulawonganunchai, Anunchai Assawamakin, Chaiwat Bootchai, Philip J. Shaw, and Sissades Tongsima. “iLOCi: a SNP Interaction Prioritization Technique for Detect- ing Epistasis in Genome-wide Association Studies”. In: BMC Genomics 13.7 (2012), pp. 1–15. [PS10] Mihaela Pertea and Steven L. Salzberg. “Between a Chicken and a Grape: Estimating the Number of Human Genes”. In: Genome Biology 11.5 (2010), p. 206. issn: 1474-760X. doi: 10.1186/gb-2010-11-5-206. url: http://dx.doi.org/10.1186/gb-2010-11-5- 206. [PS81] David A. Patterson and Carlo H. Sequin. “RISC I: A Reduced Instruction Set VLSI Computer”. In: Proceedings of the 8th An- nual Symposium on Computer Architecture. ISCA ’81. Minneapo- lis, Minnesota, USA: IEEE Computer Society Press, 1981, pp. 443–457. url: http://dl.acm.org/citation.cfm?id=800052.801895. [Sew] Julian Seward. bzip2 Documentation. [SR00] Michael S. Schlansker and B. Ramakrishna Rau. EPIC: An Architecture for Instruction-Level Parallel Processors. Tech. rep. Compiler and Architecture Research, HP Laboratories Palo Alto: Hewlett Packard, Feb. 2000. [Sus14] David Susanto. “Parallelism for Computationally Intensive Algorithms with GPU/FPGA Interleaving”. MA thesis. Tech- nische Universität München, 2014. [T500] “Global Supercomputing Capacity Creeps Up as Petascale Systems Blanket Top 100”. In: TOP500 - The List (Nov. 2016). url: https://www.top500.org/. [TAOCP] Donald Ervin Knuth. The Art of . Vol. 4, Fascicle 1. Addison-Wesley Professional, 2009. [TB14] Andrew S. Tanenbaum and Herbert Bros. Modern Operating Systems. 4th ed. Prentice Hall, Mar. 2014.


[TE14] Matthew B. Taylor and Ian M. Ehrenreich. “Higher-order Ge- netic Interactions and their Contribution to Complex Traits”. In: Trends in Genetic 31.1 (Sept. 2014), pp. 34–40. [TKF+11] Philippe E. Thomas, Roman Klinger, Laura I. Furlong, Martin Hofmann-Apitius, and Christoph M. Friedrich. “Challenges in the Association of Human Single Nucleotide Polymor- phism Mentions with Unique Database Identifiers”. In: BMC Bioinformatics 12.Suppl 4 (2011), S4. [TOG13] The Open Group Base Specifications. 7th ed. vol. “System Inter- faces”. The Open Group. Oct. 2013. [TT08] Barry N. Taylor and Ambler Thompson, eds. The International System of Units (SI). 2008th ed. Vol. NIST Special Publication 330. An English translation of the Convocation de la Con- férence générale des poids et mesures, (25e réunion). National Institute of Standards and Technology, U.S. Department of Commerce, Mar. 2008. [UG473] 7 Series FPGAs Memory REsources User Guide. v1.12. Xilinx Inc. Sept. 2016. [UG479] 7 Series FPGAs DSP48E1 Slice. v1.9. Xilinx Inc. Sept. 2016. [Vaj11] András Vajda. “Programming Many-Core Chips”. In: 1st ed. Springer US, 2011. Chap. 1.2. [Vet15] Kishore Veturi. Intel Pentium Processor, Statistical Analysis of Floating Point Flaw: Intel White Paper. Tech. rep. Intel Corpora- tion, 2015. [Wet16] Kris A. Wetterstrand. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Dec. 2016. url: https://www.genome.gov/sequencingcostsdata. [Wie16] Lars Wienbrandt. FPGAs in Bioinformatics. Kiel Computer Sci- ence Series 2016/2. Dissertation, Faculty of Engineering, Kiel University. Department of Computer Science, Kiel University, 2016.


[WKG+14] Lars Wienbrandt, Jan Christian Kässens, Jorge González- Domínguez, Bertil Schmidt, David Ellinghaus, and Manfred Schimmler. “FPGA-based Acceleration of Detecting Statisti- cal Epistasis in GWAS”. In: Procedia Computer Science. Vol. 29. Elsevier, 2014, pp. 220–230. [WKH+17] Lars Wienbrandt, Jan C. Kässens, Matthias Hübenthal, and David Ellinghaus. “Fast Genome-Wide Third-order SNP In- teraction Tests with Information Gain on a Low-cost Hetero- geneous Parallel FPGA-GPU Computing Architecture”. In: Procedia Computer Science. manuscript accepted for publica- tion. 2017. [WLo11] Yue Wang, Guimei Liu, and Mengling Feng others. “An Em- pirical Comparison of Several Recent Epistatic Interaction Detection Methods”. In: Bioinformatics 27.21 (2011), pp. 2936– 2943. [WWC05] Daniel M. Weinreich, Richard A. Watson, and Lin Chao. “Per- spective: Sign Epistasis and Genetic Costraint on Evolution- ary Trajectories”. In: Evolution 59.6 (2005), pp. 1165–1174. issn: 1558-5646. doi: 10.1111/j.0014- 3820.2005.tb01768.x. url: http: //dx.doi.org/10.1111/j.0014-3820.2005.tb01768.x. [WYY+10] Xiang Wan, Can Yang, Qiang Yang, Hong Xue, Xiaodan Fan, Nelson L.S. Tang, and Weichuan Yu. “BOOST: A Fast Ap- proach to Detecting Gene-Gene Interactions in Genome-wide Case-Control Studies”. In: American Journal of Human Genetics 87.3 (Sept. 2010), pp. 325–340. doi: 10.1016/j.ajhg.2010.07.021. [X200] Intel Xeon Phi Product Brief. Intel Corp. 2016. [YC14] W. Yang and C. Charles Gu. “Random Forest Fishing: a Novel Approach to Identifying Organic Group of Risk Factors in Genome-wide Association Studies”. In: European Journal of Human Genetics 22.2 (Feb. 2014), pp. 254–259.


[YYW+11] Ling Sing Yung, Can Yang, Xiang Wan, and Weichuan Yu. “GBOOST: a GPU-based Tool for Detecting Gene-gene Inter- actions in Genome-wide Case Control Studies”. In: Bioinfor- matics 27.9 (2011), pp. 1309–1310. doi: 10.1093/bioinformatics/btr114. [ZP02] David Zhang and Sankar K. Pai, eds. Neural Networks and Sys- tolic Array Design. Series in Machine Perception and Artificial Intelligence. World Scientific Publishing Co. Pte. Ltd., July 2002. [ZSX+16] Fusheng Zhou et al. “Epigenome-wide Association Data Im- plicates DNA Methylation-mediated Genetic Risk in Psoria- sis”. In: Clinical Epigenetics 8.1 (2016), p. 131. issn: 1868-7083. doi: 10.1186/s13148-016-0297-z. url: http://dx.doi.org/10.1186/s13148- 016-0297-z.
