Accelerating Phylogeny-Aware Short DNA Read Alignment with Fpgas
Total Page:16
File Type:pdf, Size:1020Kb
Accelerating Phylogeny-Aware Short DNA Read Alignment with FPGAs Nikolaos Alachiotis, Simon Berger, Alexandros Stamatakis The Exelixis Lab, Scientific Computing Group Heidelberg Institute for Theoretical Studies Heidelberg, Germany Emails: {Nikolaos.Alachiotis, Simon.Berger, Alexandros.Stamatakis}@h-its.org Abstract—Recent advances in molecular sequencing tech- reads to a fixed reference alignment that also uses the in- nology have given rise to novel algorithms for simultaneously formation contained in the respective evolutionary reference aligning short sequence reads to reference sequence alignments tree. Although short reads can be aligned with respect to a and corresponding evolutionary reference trees. We present a complete hardware/software implementation for the accelera- reference alignment using the HMMALIGN tool (part of the tion of a program called PaPaRa, a newly introduced dynamic HMMER [5] tool suite), PaPaRa outperforms HMMER in programming algorithm for this purpose. the context of phylogenetic short read placement, because We verify the correctness of the proposed architecture on the tree structure (phylogeny) is also incorporated by Pa- a real FPGA and introduce a straight-forward communication PaRa [2] for the alignment process. Like most alignment protocol (using gigabit ethernet) for seamless integration with the encapsulating steering software that is executed on a methods, PaPaRa is based on a dynamic-programming al- PC processor. The hardware description and the software gorithm. The underlying principle is similar to the Smith- implementation are freely available for download. Waterman algorithm [6], with affine gap penalties [7]. How- When mapped to a Virtex 6 FPGA, our reconfigurable ar- ever, the specific alignment kernel in PaPaRa is used to chitecture can compute 133.4 billion cell updates per second for align a sequence (a short read) against an ancestral state the novel, tree-based alignment kernel of PaPaRa. Compared to PaPaRa, running on a 3.2GHz Intel Core i5 CPU, we obtain vector that is derived from varying positions (branches) in speedups for the alignment kernel, that range between 170 and the phylogenetic reference tree.To this end, compared to the 471. For the entire application, that is, the alignment kernel Smith-Waterman algorithm, PaPaRa implements a unique and the trace-back step, we obtain speedups between 74 and alignment kernel and scoring scheme. Moreover, the PaPaRa 118. alignment kernel is ’one-sided’, that is, gaps can only be Keywords-FPGA, dynamic programming, multiple align- inserted into the query sequence and not into the reference ment, phylogenetic inference alignment. The source code of PaPaRa is available as part of I. INTRODUCTION RAxML [8] (http://wwwkramer.in.tum.de/exelixis/papara. Significant advances in molecular wet-lab sequencing tar.gz), a popular likelihood-based phylogenetic inference techniques, that is, methods for determining the order of program. Profiling of the PaPaRa source code revealed nucleotides in a DNA molecule, have led to a tremendous that the scoring function (the alignment kernel) accounts biological data flood in recent years. The term short for 98% to 99.5% of overall execution time. To date, read refers to DNA sequence data that are produced by reconfigurable devices have already successfully been de- a new-generation sequencer. Current state-of-the-art pyro- ployed to accelerate Bioinformatics applications including sequencing technologies can generate between 100,000 to phylogenetic inference kernels [9] and a plethora of pair- 1,000,000 short reads. The read lengths typically vary be- wise sequence alignment algorithms [10]. Since PaPaRa uses tween 30 and 450 nucleotides. To allow for an efficient and phylogenetic inference and alignment kernels we explore accurate phylogenetic analysis of such short read samples, FPGA technology to accelerate the code. novel maximum likelihood-based methods [1] have recently The FPGA-based system we introduce here is able to been introduced ([2], [3], [4]). These approaches, known as reduce the execution time of the PaPaRa kernel by up to phylogenetic placement algorithms, assign the short reads 748 times compared to executing PaPaRa on an Intel Core i5 to a fixed, given reference phylogeny, that is, an unrooted CPU running at 3.2GHz. The underlying system architecture phylogenetic (evolutionary) tree which is based on a given of our design is to offload the scoring function (align- multiple sequence reference alignment. Before applying one ment kernel) calculations to dedicated hardware components of the above phylogenetic placement algorithms, all short (henceforth denoted as: Score Processing Units, SPUs) on reads must be aligned to the reference alignment. the FPGA. The software implementation on the PC side Here, we present a FPGA-based system for the accel- is used to orchestrate computations, collect the scores, and eration of PaPaRa [2] (PArsimony-based Phylogeny-Aware perform the final trace-back step of the alignment algorithm. short Read Alignment), a novel method for aligning short To allow for reproduction of all results in this paper the Ancestral State Vector software implementation and the hardware description of a(A) 1 0 1 1 1 1 1 0 0 1 0 the FPGA design are available as open-source code at: a(C) 0 1 0 1 1 0 1 0 0 1 0 http://wwwkramer.in.tum.de/exelixis/FPGA papara.tar.gz. a(G) 0 0 0 1 0 0 1 0 1 1 1 a(T) 0 0 0 0 0 0 1 1 0 0 0 The remainder of the paper is organized as follows: Section II describes the PaPaRa algorithm. In Section III, A 0 2 0 0 0 0 0 we review related work on FPGA-based accelerators for C 0 2 0 0 2 0 2 QS A 0 2 0 0 2 3 4 standard alignment algorithms. The reconfigurable architec- T 3 5 3 0 2 3 4 ture of the Score Processing Unit (SPU) is described in G 6 8 3 3 2 3 4 Section IV while Section V provides a detailed descrip- - - - A C - A T G - - tion of the verification process and the design of a fully aligned QS functional FPGA-based system. In Section VI we provide performance results for our FPGA accelerator. We conclude Figure 1. Example of a dyamic-programming matrix for aligning a QS against an ancestral state vector. The positions for which the QS characters in Section VII. match the bits in the ancestral state vector are highlighted. II. THE PAPARA ALGORITHM The algorithm aligns short reads, henceforth denoted as removing again) an additional node (’virtual root’) into each query sequences (QS), against ancestral state vectors derived edge of the RT. The ancestral state vector is then calculated from the reference multiple sequence alignment (RA) and for each virtual root (each edge of the RT) by combining the corresponding phylogenetic reference tree (RT) that has the ancestral states of the left and right internal nodes that been inferred using the RA. In a phylogenetic tree, known are attached to the branch under consideration. In addition sequences (e.g., nucleotide or amino-acid sequence data) to the ancestral state vectors, PaPaRa computes and uses of living species (taxa) are assigned to the leaves (tips) an additional, dedicated signal which provides information of the tree. The internal nodes correspond to hypothetical about the distribution of gaps in the RT. This signal consists common ancestors of the subtrees they define. Because the of a supplementary flag (CGAP) for each site i which real sequences at the ancestral nodes are not known, different is used to appropriately adapt the scoring function of the methods for representing the inherent uncertainty of ances- dynamic programming implementation. The CGAP signal is tral states (ancestral sequences) have been introduced in calculated along with the ancestral state vectors, according the context of functions for scoring alternative phylogenetic to the rules described in [2]. trees (different evolutionary hypotheses). The alignment scoring function is provided in equation 1. The maximum parsimony (MP) method is a widely used The equation recursively defines the score of the dynamic- optimality criterion for phylogenetic inference. Given a programming matrix cell Di,j in column i and row j for fixed multiple sequence alignment, the idea is to find the aligning site Ai of the ancestral state vector against site Bj phylogenetic tree which explains the alignment by the least in the QS. number of mutations. For nucleotide sequences (DNA/RNA) the alphabet contains four characters: A (Adenine), C (Cy- 10 if CGAP is set for site i CGi = tosine), G (Gouanine), and T (Thymine). The computation 0 otherwise and representation of ancestral state vectors under parsimony (2, 1) if CGi = 0 (GP i ,GP i ) = is described in [11]. The key point is that for DNA data, OE E (0, 0) otherwise b every edge (branch) in the RT can be represented by i j 1 n i,j 0 if A andB match a parsimony state vector Ab = Ab , ..., Ab , where each S = i 3 otherwise individual Ab represents the parsimony state of alignment i−1,j i i,j D + GPOE site i (i.e., column i) of the RA. The individual entries D = min −1 i E Di ,j GP i in the parsimony state vector consist of bit vectors A = E + E 4 b i i i i i−1,j−1 i,j i ab(A),ab(C),ab(G),ab(T ) ∈ [0, 1] , where the individual i,j D + S + CG i D = min i,j (1) bits in ab correspond to the four DNA characters. This is DE a key difference to the simpler case of pair-wise sequence alignment, where both input sequences can only have one For instance, an ancestral state Ai at site i with Ai = character per site. Here, every site in Ab can be either an [1100] means that, As and Cs in the QS under consideration A, C, G, or T, or any possible combinations thereof (i.e., can be matched against alignment site i of the ancestral an ancestral site can be an A or a T), to reflect the inherent state without incurring a mismatch penalty.