Research Article a Systolic Array-Based FPGA Parallel Architecture for the BLAST Algorithm

International Scholarly Research Network ISRN Bioinformatics Volume 2012, Article ID 195658, 11 pages doi:10.5402/2012/195658 Research Article A Systolic Array-Based FPGA Parallel Architecture for the BLAST Algorithm Xinyu Guo,1 Hong Wang,2 and Vijay Devabhaktuni1 1 Electrical Engineering and Computer Science Department, The University of Toledo, MS.308, 2801 W. Bancroft Street, Toledo, OH 43607, USA 2 Department of Engineering Technology, The University of Toledo, MS.402, 2801 W. Bancroft Street, Toledo, OH 43606, USA Correspondence should be addressed to Hong Wang, [email protected] Received 23 May 2012; Accepted 25 July 2012 Academic Editors: F. Couto, B. Haubold, and J. T. L. Wang Copyright © 2012 Xinyu Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. A design of systolic array-based Field Programmable Gate Array (FPGA) parallel architecture for Basic Local Alignment Search Tool (BLAST) Algorithm is proposed. BLAST is a heuristic biological sequence alignment algorithm which has been used by bioinformatics experts. In contrast to other designs that detect at most one hit in one-clock-cycle, our design applies a Multiple Hits Detection Module which is a pipelining systolic array to search multiple hits in a single-clock-cycle. Further, we designed a Hits Combination Block which combines overlapping hits from systolic array into one hit. These implementations completed the first and second step of BLAST architecture and achieved significant speedup comparing with previously published architectures. 1. Introduction the computational requirements using current platforms is becoming a difficult task. In the bioinformatics and computational biology (BCB) Compared to general-purpose PC microprocessors, Field domain, biological sequence alignment is a quite common Programmable Gate Arrays (FPGAs) are able to provide a task. It includes comparison of DNA (nucleotide), RNA much higher degree of bit-level data parallelism which leads (nucleotide), and protein (amino acid) sequences. In such to higher computational speed. In addition, FPGAs are re- comparison, a query sequence is aligned to subject sequences programmable [5]. Scientists have been investigating parallel from a large database to find similarities [1]. Scientists are architecture design for the BLAST algorithm on FPGAs to used to adopting sequence alignment to infer biological achieve higher computational efficiency. information of a newly discovered sequence from a set of previously known sequences. For instance, if a recently dis- This paper proposes a design and implementation of a covered sequence is similar to a known disease gene, then parallel architecture on FPGA for BLAST, namely, systolic the biological information about the functionality of the new array-based parallel architecture for the BLAST algorithm. sequence can be inferred. This is extremely meaningful in The design is implemented with Very High Speed Integrated early disease diagnosis and drug engineering [2]. In addition, Circuit Hardware Description Language (VHDL), making it biological sequence alignment plays an important role in the compatible across a number of FPGA architectures (e.g., study of evolutionary development and the history of species Xilinx, Altera). The rest of this paper is organized as follows. [3]. We first present the overview of the BLAST algorithm as well Being one of the most important tools applied in finding as the related work. Then, the design of our FPGA archi- biological sequence alignment, BLAST (Basic Local Align- tecture will be detailed. After that, architecture performance ment Search Tool) [4] has been widely implemented on including storage resource requirements, FPGA resource commodity PC clusters. But with the exponential growth consumption, and estimated speedups against other archi- of the biosequence databases (see Figure 1 [1]), meeting tectures will be analyzed based on synthesizing, functional, 2 ISRN Bioinformatics 120000 the sequence is m. After dividing the query sequence, these k-words plus a score matrix are used to find the possible 100000 matching words which have a score with at least the threshold 80000 value of T. Finally, subject sequences in the database are scanned one by one to find exact matches to those possible 60000 matching words. Previous work shows that this step could 40000 be finished by high-level application software running on a PC [10]. By designing an architecture which can implement 20000 every step of BLAST on a FPGA, the interface issue for software and hardware connection can be avoided, saving 0 execution time. Instead of finding exact matches to those 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 possible matching words, in this paper, the implementation Millions of nucleotides searches for the exact matches of those k-words directly on the FPGA. This is called “finding hits.” Figure 1: Exponential growth of biological sequence database on a yearly basis. (2) Ungapped Extension. In this step, each hit recorded in the last step is extended by aligning with a subject sequence in and timing simulation. Finally, conclusions and future work both directions without any gap. A score matrix, for example, are presented. BLOSUM50 [10], is used to compute residue pair alignment scores. The extension process is terminated when the total score of the alignment is below a threshold value. These 2. The BLAST Algorithm results from ungapped extension are called high-scoring segment pairs (HSPs). Biological sequence alignment algorithms can be exhaustive, meaning they give optimal alignments. Two examples are the Needleman-Wunsch [6] and Smith-Waterman [7]algo- (3) Gapped Extension. In this stage, the Smith-Waterman rithms. Algorithms like BLAST are heuristic and give sub- or Needleman-Wunsh algorithm is applied to perform the optimal results [8]. Although heuristic algorithms produce gapped alignment in both directions for those collected local alignments that are not always optimal, they are nor- HSPs. After this step, a collection of homologous sequences mally much quicker than exhaustive dynamic programming are assigned with different scores. algorithms. Exhaustive algorithms typically have a complex- We choose the Needleman-Wunsch algorithm for gapped ity of O(mn)[9]. The execution time of biological sequence extension implementation because of its ability to find opti- alignment is crucial to bioinformatics scientists, especially mal global gapped alignment between the query sequence due to the fact that the size of the biosequence databases and the subject sequence [8]. Suppose that there are two has grown exponentially over years. BLAST is a very popular sequences, Q and S, whose lengths are M and N,respectively. heuristic algorithm that provides high computational speed. Q[1 : i] denotes the first i characters in sequence Q and The three main steps of the BLAST algorithm are finding S[1 : j] denotes the first j characters in sequence S.The hits, ungapped extension, and gapped extension. These are similarity score between Q[1 : i]andS[1 : j] can be obtained described as follows. from the similarity scores computed for Q[1 : i − 1] and S[1 : j − 1]. A dynamic programming score matrix D is built, (1) Finding Hits. In this step, we use a protein sequence where each element of the matrix denotes the best alignment Q i Q S j which consists of eight residues (amino acids) as an example score between the [1 : ]segmentof and the [1 : ] S shown below: segment of . The first step of Needleman-Wunsch algorithm is to PQGEFGVY. create the score matrix D with M+1columnsandN +1rows. M and N are lengths of sequences Q and S. The first row and k The query sequence is divided into overlapping -words the first column are assigned the value zero. The computation k = as illustrated below with 3: complexity of the initialization step is O(M + N). D Word 0: PQG The second step is comprised of filling the matrix. i,j is used to express the cell (i, j) of the scoring matrix D. Di,j is Word 1: QGE defined as the maximum of the similarity scores computed Word 2: GEF in positions (i − 1, j − 1), (j, j − 1), and (i − 1, j) associated with gap penalty or Si j . The following equation is used to Word 3: EFG , compute the value of the cell (i, j): Word 4: FGV ⎧ Word 5: GVY. ⎪D i − j − S ⎨ 1, 1 + i,j , D i j = D i − i d Six words are extracted from this sequence. Generally speak- , max⎪ ( 1, ) + , (1) ⎩⎪ ing, (m − k) + 1 words would be extracted when the size of D i, j − 1 + d. ISRN Bioinformatics 3 In this equation, Si,j is the scoring matrix score for Xi TGGCAGTCCG C Y S = i Q and j . i,j 1 if the th character in is equal to the 0 0 0000000000 jth character in S (MATCH score); otherwise Si,j = 0 (MISMATCH penalty). d is the gap penalty, d = 0. D(i − T 0 1 1111111111 j X 1, ) + 1 is the score of an alignment between i and a gap in T 0 1 1111112222 Y. D(i, j − 1) + d is the score of an alignment between a gap in X and Yj . G 0 1 2222222223 The last step of this algorithm is trace back. Trace back C 0 1 2233333333 can form the best global alignment between Q and S. It starts from the bottom-right value in the matrix and ends at the A 0 1 2233444444 up-left value. For each step, the next alignment pair which T 0 1 2233445555 is added to the front of the alignment segment accords to 3 rules: (1) If the cell (i, j)wasderivedfromcell (i − 1, j − 1), G 0 1 2333455556 the pair of alignment Xi and Yj is added to the front of the alignment segment.

Research Article a Systolic Array-Based FPGA Parallel Architecture for the BLAST Algorithm

ECE 590: Digital Systems Design Using Hardware Description Language (VHDL) Systolic Implementation of Faddeev's Algorithm in V

On the Efficiency of Register File Versus Broadcast Interconnect For

Computer Architectures

Systolic Computing on Gpus for Productive Performance

EE 7722—GPU Microarchitecture

Delft University of Technology an Efficient GPU-Accelerated Implementation of Genomic Short Read Mapping with BWA-MEM

An FPGA-Based Systolic Array to Accelerate the BWA-MEM Genomic Mapping Algorithm

11 Systolic Arrays Types of Special-Purpose Computers: 1

Computer Architecture: SIMD and Gpus (Part III) (And Briefly VLIW, DAE, Systolic Arrays)

Generating Systolic Array Accelerators with Reusable Blocks

Computer Archicture

A Comprehensive Survey of Various Processor Types & Latest