Fpgas in Bioinformatics

Total Page:16

File Type:pdf, Size:1020Kb

Fpgas in Bioinformatics FPGAs in Bioinformatics Implementation and Evaluation of Common Bioinformatics Algorithms in Reconfigurable Logic Dipl.-Inf. Lars Wienbrandt Dissertation zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) der Technischen Fakultät der Christian-Albrechts-Universität zu Kiel eingereicht im Jahr 2015 Kiel Computer Science Series (KCSS) 2016/2 v1.0 dated 2016-03-15 ISSN 2193-6781 (print version) ISSN 2194-6639 (electronic version) Electronic version, updates, errata available via https://www.informatik.uni-kiel.de/kcss The author can be contacted via http://www.techinf.informatik.uni-kiel.de Published by the Department of Computer Science, Kiel University Technical Computer Science Group Please cite as: Ź Lars Wienbrandt. FPGAs in Bioinformatics Number 2016/2 in Kiel Computer Science Series. Department of Computer Science, 2016. Dissertation, Faculty of Engineering, Kiel University. @book{Wienbrandt16, author = {Lars Wienbrandt}, title = {{FPGAs in Bioinformatics}}, publisher = {Department of Computer Science, Kiel University}, year = {2016}, number = {2016/2}, series = {Kiel Computer Science Series}, note = {Dissertation, Faculty of Engineering, Kiel University.} } © 2016 by Lars Wienbrandt ii About this Series The Kiel Computer Science Series (KCSS) covers dissertations, habilitation theses, lecture notes, textbooks, surveys, collections, handbooks, etc. written at the Department of Computer Science at Kiel University. It was initiated in 2011 to support authors in the dissemination of their work in electronic and printed form, without restricting their rights to their work. The series provides a unified appearance and aims at high-quality typography. The KCSS is an open access series; all series titles are electronically available free of charge at the department’s website. In addition, authors are encouraged to make printed copies available at a reasonable price, typically with a print-on-demand service. Please visit http://www.informatik.uni-kiel.de/kcss for more information, for instructions how to publish in the KCSS, and for access to all existing publications. iii 1. Gutachter: Prof. Dr. rer. nat. Manfred Schimmler Technical Computer Science Group Department of Computer Science Christian-Albrechts-University of Kiel 2. Gutachter: Prof. Dr. rer. nat. Andre Franke Genetics & Bioinformatics Group Institute of Clinical Molecular Biology Christian-Albrechts-University of Kiel 3. Gutachter: Prof. Dr. Olaf Wolkenhauer Systems Biology & Bioinformatics Group Institute of Computer Science University of Rostock Datum der mündlichen Prüfung: 4. März 2016 iv Acknowledgments FPGA technology has fascinated me already during my study of Computer Science. Soon, I got in touch with the area of bioinformatics and saw a great potential in the combination of both. After my graduation I continued research in this area and had the opportunities to publish many of my results. I was allowed to present most of them on several conferences around the world, and some of them even on invitation. Thus, first and foremost, I owe my special thanks to my supervisor Manfred Schimmler who always supported, promoted and motivated my work and gave me the freedom I needed including the approval for travel funding. He also encouraged me to apply for project fundings together with Andre Franke, whom I owe my grateful thanks for supporting me and my work in this manner. This successful project has finally funded my research in the last three years. And I owe my true appreciation to Bertil Schmidt as well, for the numerous lucrative publications and additional motivation and support in the second successfully raised collaboration project fundings that sustained the SNP interaction analysis project and financed Jan Kässens, who turned out to be my most cooperative colleague. His and my work perfectly complemented each other. With his knowledge in writing extremely efficient host code he always squeezed out the last bits of total system performance. With him developing the host part while I was working out the hardware design we perfectly minimized the development time for our applications. I sincerely thank him for not just being an invaluable colleague but also for being my friend. Furthermore, I would like to thank my colleagues and former colleagues for their constructive help, namely Daniel Siebert for his counterpart on the BLASTp application, Jorge González-Domínguez and Jost Bissel for their work in the SNP interaction analysis project, especially Jorge for creating the competitive solutions on GPUs, Ayman Abbas who provided me insight in the fruitful FPGA application area of cryptanalysis, and Vasco Gross- v Acknowledgments mann, Sven Koschnicke and Christoph Starke who tested FPGA technology in stock market analysis. I also sincerely thank Matthias Hübenthal and David Ellinghaus from the ICMB for their help on the mathematics of the SHAPEIT2 genotype phasing tool. My special thanks go to our secretary Brigitte Scheidemann. She helped me with all kinds of administration tasks especially when I again had a complicated travel cost refund application. Moreover, I truly appreciate the valuable return of results from all students whom I supervised for their theses. And thank you, Dirk Nowotka, Reinhard Koch and again Andre Franke for joining my examination committee. Last but not least, I want to thank my mother and my sister for unques- tionably supporting me in all kinds of situations, and, of course, I truly appreciate the backing of my wife. She always encouraged me to go on, when I ran into a situation that seemed to be a dead end, she always told me to keep my head up when I lost my self-confidence, and she always calmed me when I thought I was going insane. Thank you, Ilze, for being there and that I can always rely on you! And finally I do not want to forget to mention my little daughter Alina Annija, who is my sunshine, even when it is raining, and thus always reminds me of what is most important in life. This study makes use of data generated by the Wellcome Trust Case- Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113 and 085475. vi Zusammenfassung Das Leben. Sehr viel Aufwand wird getrieben um der Menschheit einen Einblick in dieses faszinierende und komplexe, aber fundamentale Thema zu erlauben. Um Zusammenhänge zu verstehen und Folgen ableiten zu können hat der Mensch begonnen sein Genom zu sequenzieren, d.h. seine DNA zu bestimmen um daraus Informationen, z.B. in Bezug auf Erbkrank- heiten folgern zu können. Der Prozess der DNA-Sequenzierung sowie die darauffolgenden Analysen sind schon allein wegen der riesigen Datenmen- gen eine Herausforderung für aktuelle Rechensysteme. Laufzeiten von über einen Tag für die Analyse einfacher Datensätze sind üblich, selbst wenn der Prozess bereits auf einem Computercluster ausgeführt wird. Diese Arbeit zeigt, wie dieses gängige Problem im Bereich der Bioin- formatik mit rekonfigurierbarer Hardware, speziell FPGAs, angegangen werden kann. Es werden drei rechenintensive Themengebiete hervorgeho- ben: Sequenzalignment, SNP-Interaktionsanalyse und Genotyp-Imputation. Beispielhaft wird im Bereich des Sequenzalignments die Software BLASTp für die Suche in Proteinsequenzdatenbanken vorgestellt, implemen- tiert und evaluiert. Die SNP-Interaktionsanalyse wird mit drei Verfahren zur vollständigen Suche von Interaktionen inklusive des dazugehörigen statistischen Tests vorgestellt: die Messung der Kullback-Leibler-Divergenz in BOOST, die r-Differenz in iLOCi und die Messung der Transinformation. Alle Verfahren werden auf FPGA-Hardware implementiert und evaluiert, mit einer bestechenden Beschleunigung im dreistelligen Bereich gegenüber Standard-Rechnern. Das letzte Gebiet der Genotyp-Imputierung ist ein zweiteiliges Verfahren bestehend aus dem Phasing und der eigentlichen Imputation. Der Schwer- punkt liegt im Phasing-Schritt, der mit dem SHAPEIT2-Tool adressiert wird. SHAPEIT2 wird ausführlich mit den zugrunde liegenden mathematischen Methoden diskutiert, und schließlich implementiert und evaluiert. Auch hier wird ein beachtlicher Speedup von 46 erreicht. vii Abstract Life. Much effort is taken to grant humanity a little insight in this fascinating and complex but fundamental topic. In order to understand the relations and to derive consequences humans have begun to sequence their genomes, i.e. to determine their DNA sequences to infer information, e.g. related to genetic diseases. The process of DNA sequencing as well as subsequent analysis presents a computational challenge for recent computing systems due to the large amounts of data alone. Runtimes of more than one day for analysis of simple datasets are common, even if the process is already run on a CPU cluster. This thesis shows how this general problem in the area of bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three com- pute intensive problems are highlighted: sequence alignment, SNP interaction analysis and genotype imputation. In the area of sequence alignment the software BLASTp for protein database searches is exemplarily presented, implemented and evaluated. SNP interaction analysis is presented with three applications performing an exhaustive search for interactions including the corresponding statistical tests: BOOST, iLOCi and the mutual information measurement. All appli- cations are implemented in FPGA-hardware and evaluated,
Recommended publications
  • Efficient Implementation of an Optimized Attack on a Reconfigurable Hardware Cluster
    Breaking ecc2-113: Efficient Implementation of an Optimized Attack on a Reconfigurable Hardware Cluster Susanne Engels Master’s Thesis. February 22, 2014. Chair for Embedded Security – Prof. Dr.-Ing. Christof Paar Advisor: Ralf Zimmermann EMSEC Abstract Elliptic curves have become widespread in cryptographic applications since they offer the same cryptographic functionality as public-key cryptosystems designed over integer rings while needing a much shorter bitlength. The resulting speedup in computation as well as the smaller storage needed for the keys, are reasons to favor elliptic curves. Nowadays, elliptic curves are employed in scenarios which affect the majority of people, such as protecting sensitive data on passports or securing the network communication used, for example, in online banking applications. This works analyzes the security of elliptic curves by practically attacking the very basis of its mathematical security — the Elliptic Curve Discrete Logarithm Problem (ECDLP) — of a binary field curve with a bitlength of 113. As our implementation platform, we choose the RIVYERA hardware consisting of multiple Field Programmable Gate Arrays (FPGAs) which will be united in order to perform the strongest attack known in literature to defeat generic curves: the parallel Pollard’s rho algorithm. Each FPGA will individually perform a what is called additive random walk until two of the walks collide, enabling us to recover the solution of the ECDLP in practice. We detail on our optimized VHDL implementation of dedicated parallel Pollard’s rho processing units with which we equip the individual FPGAs of our hardware cluster. The basic design criterion is to build a compact implementation where the amount of idling units — which deplete resources of the FPGA but contribute in only a fraction of the computations — is reduced to a minimum.
    [Show full text]
  • Current Status and Future Perspectives of Bioinformatics in Tanzania
    CURRENT STATUS AND FUTURE PERSPECTIVES OF BIOINFORMATICS IN TANZANIA Sylvester L. Lyantagaye Department of Molecular Biology and Biotechnology, College of Natural and Applied Sciences, University of Dar es Salaam, P.O. Box 35179, Dar es Salaam, Tanzania E-mail: [email protected], [email protected] ABSTRACT The main bottleneck in advancing genomics in present times is the lack of expertise in using bioinformatics tools and approaches for data mining in raw DNA sequences generated by modern high throughput technologies such as next generation sequencing. Although bioinformatics has been making major progress and contributing to the development in the rest of the world, it has still not yet fully integrated the tertiary education and research sector in Tanzania. This review aims to introduce a summary of recent achievements, trends and success stories of application of bioinformatics in biotechnology. The applications of bioinformatics in the fields such as molecular biology, biotechnology, medicine and agriculture, the global trend of bioinformatics, accessibility bioinformatics products in Tanzania, bioinformatics training initiatives in Tanzania, the future prospects of bioinformatics use in biotechnology globally and Tanzania in particular are reviewed. The paper is of interest and importance to rouse public awareness of the new opportunities that could be brought about by bioinformatics to address many research problems relevant to Tanzania and sub-Sahara Africa. Keywords: Bioinformatics, Biotechnology, Genomics, Tanzania. INTRODUCTION analysis, and management of biological Bioinformatics is conceptualising biology in information through computational terms of molecules (in the sense of physical techniques. It uses mathematics, biology, chemistry) and applying "informatics and computer science to understand the techniques" (derived from disciplines such biological importance of an extensive as applied mathematics, computer science variety of omics data (Reportlinker 2013).
    [Show full text]
  • High Performance Computing Zur Technischen Finanzmarktanalyse
    High Performance Computing zur technischen Finanzmarktanalyse Christoph Starke Dissertation zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) der Technischen Fakultät der Christian-Albrechts-Universität zu Kiel eingereicht im Jahr 2012 1. Gutachter: Prof. Dr. Manfred Schimmler Christian-Albrechts-Universität zu Kiel 2. Gutachter: Prof. Dr. Andreas Speck Christian-Albrechts-Universität zu Kiel Datum der mündlichen Prüfung: 24.9.2012 ii Zusammenfassung Auf Grundlagen der technischen Finanzmarktanalyse wird ein Algorith- mus für eine sicherheitsorientierte Wertpapierhandelsstrategie entwickelt. Maßgeblich für den Erfolg der Handelsstrategie ist dabei eine mög- lichst optimale Gewichtung mehrerer Indikatoren. Die Ermittlung dieser Gewichte erfolgt in einer sogenannten Kalibrierungsphase, die extrem rechenintensiv ist. Bei einer direkten Implementierung auf einem herkömmlichen High Performance PC würde diese Kalibrierungsphase zigtausend Jahre dauern. Deshalb wird eine parallele Version des Algorithmus entwickelt, die hervorragend für die massiv parallele, FPGA-basierte Rechnerarchitektur der RIVYERA geeignet ist, die am Lehrstuhl für technische Infor- matik der Christian-Albrechts-Universität zu Kiel entwickelt wurde. Durch mathematisch äquivalente Transformationen und Optimierungs- schritte aus verschiedenen Bereichen der Informatik gelingt eine FPGA- Implementierung mit einer im Vergleich zu dem PC mehr als 22.600-fach höheren Performance. Darauf aufbauend wird durch die zusätzliche Ent- wicklung eines
    [Show full text]
  • High-Performance Reconfigurable Computing
    International Journal of Reconfigurable Computing High-Performance Reconfigurable Computing Guest Editors: Khaled Benkrid, Esam El-Araby, Miaoqing Huang, Kentaro Sano, and Thomas Steinke High-Performance Reconfigurable Computing International Journal of Reconfigurable Computing High-Performance Reconfigurable Computing Guest Editors: Khaled Benkrid, Esam El-Araby, Miaoqing Huang, Kentaro Sano, and Thomas Steinke Copyright © 2012 Hindawi Publishing Corporation. All rights reserved. This is a special issue published in “International Journal of Reconfigurable Computing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, pro- vided the original work is properly cited. Editorial Board Cristinel Ababei, USA Paris Kitsos, Greece Mario Porrmann, Germany Neil Bergmann, Australia Chidamber Kulkarni, USA Viktor K. Prasanna, USA K. L. M. Bertels, The Netherlands Miriam Leeser, USA Leonardo Reyneri, Italy Christophe Bobda, Germany Guy Lemieux, Canada Teresa Riesgo, Spain Miodrag Bolic, Canada Heitor Silverio Lopes, Brazil Marco D. Santambrogio, USA Joao˜ Cardoso, Portugal Martin Margala, USA Ron Sass, USA Paul Chow, Canada Liam Marnane, Ireland Patrick R. Schaumont, USA Rene´ Cumplido, Mexico Eduardo Marques, Brazil Andrzej Sluzek, Singapore Aravind Dasu, USA Maire´ McLoone, UK Walter Stechele, Germany Claudia Feregrino, Mexico Seda Ogrenci Memik, USA Todor Stefanov, The Netherlands Andres D. Garcia, Mexico Gokhan Memik, USA Gregory Steffan, Canada Soheil Ghiasi, USA Daniel Mozos, Spain Gustavo Sutter, Spain Diana Gohringer,¨ Germany Nadia Nedjah, Brazil Lionel Torres, France Reiner Hartenstein, Germany Nik Rumzi Nik Idris, Malaysia Jim Torresen, Norway Scott Hauck, USA JoseNu´ nez-Ya˜ nez,˜ UK W. Vanderbauwhede, UK Michael Hubner,¨ Germany Fernando Pardo, Spain Mus¨¸tak E.
    [Show full text]
  • White Paper on CLC Read Mapper
    White Paper White paper on CLC read mapper October 10, 2012 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com [email protected] White paper: White paper on CLC read mapper 2 Contents 1 Abstract 3 2 Introduction 3 3 Performance Benchmark of CLC Genomics Workbench/Server 5 4 Computational Resource Requirements of CLC Assembly Cell 9 5 Accuracy Benchmark of CLC Assembly Cell 12 6 Algorithmic Details 18 7 Conclusions 19 8 Materials & Methods 19 8.1 Performance Benchmark of CLC Genomics Workbench/Server ........... 19 8.2 Computational Resource Requirements of CLC Assembly Cell ............ 20 8.3 Accuracy Benchmark of CLC Assembly Cell ...................... 20 8.4 Read Mappers ..................................... 22 8.5 Data Sets ....................................... 22 8.6 Benchmarking Hardware ............................... 23 8.7 bash related ...................................... 23 References 25 White paper: White paper on CLC read mapper 3 1 Abstract After decades of biological research relying on Sanger sequencing, massively parallel high throughput sequencing (HTS) techniques have created a broad range of new and exciting research applications by increasing the output sequencing data dramatically. Although allowing for hitherto unseen DNA and RNA sequencing and binding study designs, HTS technologies have created remarkable bioinformatic challenges. In typical resequencing studies read mappers are used to align sequence reads to reference genomes. Read mapping, while not too time consuming in the Sanger-century, quickly evolved into one of the most insistent bottlenecks. Nowadays, researchers can resort to a broad range of read mapping solutions and it becomes increasingly challenging to choose software that optimally meets given requirements.
    [Show full text]
  • A Hybrid-Parallel Architecture for Applications in Bioinformatics
    A Hybrid-parallel Architecture for Applications in Bioinformatics M.Sc. Jan Christian Kässens Dissertation zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) der Technischen Fakultät der Christian-Albrechts-Universität zu Kiel eingereicht im Jahr 2017 Kiel Computer Science Series (KCSS) 2017/4 dated 2017-11-08 URN:NBN urn:nbn:de:gbv:8:1-zs-00000335-a3 ISSN 2193-6781 (print version) ISSN 2194-6639 (electronic version) Electronic version, updates, errata available via https://www.informatik.uni-kiel.de/kcss The author can be contacted via [email protected] Published by the Department of Computer Science, Kiel University Computer Engineering Group Please cite as: Ź Jan Christian Kässens. A Hybrid-parallel Architecture for Applications in Bioinformatics Num- ber 2017/4 in Kiel Computer Science Series. Department of Computer Science, 2017. Dissertation, Faculty of Engineering, Kiel University. @book{Kaessens17, author = {Jan Christian K\"assens}, title = {A Hybrid-parallel Architecture for Applications in Bioinformatics}, publisher = {Department of Computer Science, CAU Kiel}, year = {2017}, number = {2017/4}, doi = {10.21941/kcss/2017/4}, series = {Kiel Computer Science Series}, note = {Dissertation, Faculty of Engineering, Kiel University.} } © 2017 by Jan Christian Kässens ii About this Series The Kiel Computer Science Series (KCSS) covers dissertations, habilitation theses, lecture notes, textbooks, surveys, collections, handbooks, etc. written at the Department of Computer Science at Kiel University. It was initiated in 2011 to support authors in the dissemination of their work in electronic and printed form, without restricting their rights to their work. The series provides a unified appearance and aims at high-quality typography. The KCSS is an open access series; all series titles are electronically available free of charge at the department’s website.
    [Show full text]
  • An Efficient VHDL Description and Hardware Implementation of The
    An Efficient VHDL Description and Hardware Implementation of the Triple DES Algorithm A thesis submitted to the Graduate School of the University of Cincinnati In partial fulfillment of the requirements for the degree of Master of Science In the Department of Electrical and Computer Engineering Of the College of Engineering and Applied Sciences June 2014 By Lathika SriDatha Namburi B.Tech, Electronics and Communications Engineering, Jawaharlal Nehru Technological University, Hyderabad, India, 2011 Thesis Advisor and Committee Chair: Dr. Carla Purdy ABSTRACT Data transfer is becoming more and more essential these days with applications ranging from everyday social networking to important banking transactions. The data that is being sent or received shouldn’t be in its original form but must be coded to avoid the risk of eavesdropping. A number of algorithms to encrypt and decrypt the data are available depending on the level of security to be achieved. Many of these algorithms require special hardware which makes them expensive for applications which require a low to medium level of data security. FPGAs are a cost effective way to implement such algorithms. We briefly survey several encryption/decryption algorithms and then focus on one of these, the Triple DES. This algorithm is currently used in the electronic payment industry as well as in applications such as Microsoft OneNote, Microsoft Outlook and Microsoft system center configuration manager to password protect user content and data. We implement the algorithm in a Hardware Description Language, specifically VHDL and deploy it on an Altera DE1 board which uses a NIOS II soft core processor. The algorithm takes input encoded using a software based Huffman encoding to reduce its redundancy and compress the data.
    [Show full text]
  • Secure Volunteer Computing for Distributed Cryptanalysis
    ysis SecureVolunteer Computing for Distributed Cryptanal Nils Kopal Secure Volunteer Computing for Distributed Cryptanalysis ISBN 978-3-7376-0426-0 kassel university 9 783737 604260 Nils Kopal press kassel university press ! "# $ %& & &'& #( )&&*+ , #()&- ( ./0 12.3 - 4 # 5 (!!&& & 6&( 7"#&7&12./ 5 -839,:,3:3/,2;1/,2% ' 5 -839,:,3:3/,2;13,3% ,' 05( (!!<& &!.2&.81..!")839:3:3/2;133 "=( (!!, #& !(( (2221,;2;13/ '12.97 # ?@7 & &, & ) ? “With magic, you can turn a frog into a prince. With science, you can turn a frog into a Ph.D. and you still have the frog you started with.” Terry Pratchett Abstract Volunteer computing offers researchers and developers the possibility to distribute their huge computational jobs to the computers of volunteers. So, most of the overall cost for computational power and maintenance is spread across the volunteers. This makes it possible to gain computing resources that otherwise only expensive grids, clusters, or supercomputers offer. Most volunteer computing solutions are based on a client-server model. The server manages the distribution of subjobs to the computers of volunteers, the clients, which in turn compute the subjobs and return the results to the server. The Berkeley Open Infrastructure for Network Computing (BOINC) is the most used middleware for volunteer computing. A drawback of any client-server architecture is the server being the single point of control and failure. To get rid of the single point of failure, we developed different distribution algorithms (epoch distribution algorithm, sliding-window distribution algorithm, and extended epoch distribution algorithm) based on unstructured peer-to-peer networks. These algorithms enable the researchers and developers to create volunteer computing networks without any central server.
    [Show full text]
  • Solving the Discrete Logarithm of a 113-Bit Koblitz Curve with an FPGA Cluster
    Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster Erich Wenger and Paul Wolfger Graz University of Technology Institute for Applied Information Processing and Communications Inffeldgasse 16a, 8010 Graz, Austria [email protected], [email protected] Abstract. Using FPGAs to compute the discrete logarithms of elliptic curves is a well-known method. However, until to date only CPU clus- ters succeeded in computing new elliptic curve discrete logarithm records. This work presents a high-speed FPGA implementation that was used to compute the discrete logarithm of a 113-bit Koblitz curve. The core of the design is a fully unrolled, highly pipelined, self-sufficient Pollard's rho iteration function. An 18-core Virtex-6 FPGA cluster computed the discrete logarithm of a 113-bit Koblitz curve in extrapolated 24 days. Until to date, no attack on such a large Koblitz curve succeeded using as little resources or in such a short time frame. Keywords: elliptic curve cryptography, discrete logarithm problem, Koblitz curve, hardware design, FPGA, discrete logarithm record. 1 Introduction It is possible to repeatedly fold a standard letter-sized sheet of paper at the midway point about six to seven times. In 2012, some MIT students [28] were able to fold an 1.2 kilometer long toilet paper 13 times. And every time the paper was folded, the number of layers on top of each other doubled. Therefore, the MIT students ended up with 213 = 8192 layers of paper on top of each other. And poor Eve's job was to manually count all layers one by one.
    [Show full text]
  • A Massively Parallel Architecture for Bioinformatics
    A Massively Parallel Architecture for Bioinformatics Gerd Pfeiffer1, Stefan Baumgart1, Jan Schr¨oder2, and Manfred Schimmler2 1 SciEngines GmbH, 24118 Kiel, Germany, WWW home page: http://www.sciengines.com 2 Christian-Albrechts Universit¨at, Department of Computer Science, Hermann-Rodewald-Str. 3, 24118 Kiel, Germany WWW home page: http://www.informatik.uni-kiel.de Abstract. Today’s general purpose computers lack in meeting the re- quirements on computing performance for standard applications in bioin- formatics like DNA sequence alignment, error correction for assembly, or TFBS finding. The size of DNA sequence databases doubles twice a year. On the other hand the advance in computing performance per unit cost only doubles every 2 years. Hence, ingenious approaches have been developed for putting this discrepancy in perspective by use of special purpose computing architectures like ASICs, GPUs, multicore CPUs or CPU Clusters. These approaches suffer either from being too applica- tion specific (ASIC and GPU) or too general (CPU-Cluster and multi- core CPUs). An alternative is the FPGA, which outperforms the solu- tions mentioned above in case of bioinformatic applications with respect to cost and power efficiency, flexibility and communication bandwidths. For making maximal use of the advantages, a new massively parallel architecture consisting of low-cost FPGAs is presented. 1 Introduction Bioinformatics algorithms are most demanding in scientific computing. Most of the times they are not NP-hard or even of high asymptotic complexity but the sheer masses of input data make their computation laborious. Furthermore, the life science thus bioinformatics research field is growing fast and so is the input data that wait to be processed.
    [Show full text]
  • Buying in to Bioinformatics: an Introduction to Commercial Sequence Analysis Software David Roy Smith
    BRIEFINGS IN BIOINFORMATICS. VOL 16. NO 4. 700 ^709 doi:10.1093/bib/bbu030 Advance Access published on 1 September 2014 Buying in to bioinformatics: an introduction to commercial sequence analysis software David Roy Smith Submitted: 25th June 2014; Received (in revised form) : 7th August 2014 Abstract Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bio- informatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right soft- ware for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, fea- tures and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software
    [Show full text]
  • CLC Sequence Viewer
    CLC Sequence Viewer USER MANUAL Manual for CLC Sequence Viewer 8.0.0 Windows, macOS and Linux June 1, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark Contents I Introduction7 1 Introduction to CLC Sequence Viewer 8 1.1 Contact information.................................9 1.2 Download and installation..............................9 1.3 System requirements................................ 11 1.4 When the program is installed: Getting started................... 12 1.5 Plugins........................................ 13 1.6 Network configuration................................ 16 1.7 Latest improvements................................ 16 II Core Functionalities 18 2 User interface 19 2.1 Navigation Area................................... 20 2.2 View Area....................................... 27 2.3 Zoom and selection in View Area.......................... 35 2.4 Toolbox and Status Bar............................... 37 2.5 Workspace...................................... 40 2.6 List of shortcuts................................... 41 3 User preferences and settings 44 3.1 General preferences................................. 44 3.2 View preferences................................... 46 3.3 Advanced preferences................................ 49 3.4 Export/import of preferences............................ 49 3 CONTENTS 4 3.5 View settings for the Side Panel.......................... 50 4 Printing 52 4.1 Selecting which part of the view to print...................... 53 4.2 Page setup.....................................
    [Show full text]