High Performance Bioinformatics
May09-06: Bryan McCoy, Kinit Patel, Tyson Williams
Advisor/Client: Dr. Zhao Zhang

What is Bioinformatics? Bioinformatics is the use of information technology in the field of molecular biology. Bioinformatics is used to sequence DNA and fold proteins.

•Massive amounts of data.
•A large number of simple operations.
•Perfect for distributed computing.

The Problem: Current solutions for solving bioinformatics problems are not realistically feasible in most cases. This is because the current solutions:

•Are too expensive as they require a .
•Are too slow as they could take several days for some inputs.

There is a real need to develop high-speed, low-cost solutions.

Our Solution: Use MPI (Message Passing Interface) and multiple IBM Cell processors to achieve a significant improvement in runtime. This solution has several big advantages over traditional methods.

•High speed due to the Cell’s multiple cores.
•Low cost due to its availability in the PlayStation 3.
•Unique architecture.
•Ideal for SIMD (Single Instruction, Multiple Data) execution commands.
•Excellent fit for Bioinformatics processing.

Image sources: Imperial London College; IBM.

The Software: The specific piece of software we have ported is called DNAPenny. It is used to analyze DNA sequences from several different species to determine which ones are most similar. It does this through the use of a branch and bound search algorithm. Phase 1 (previous senior design group) successfully made the program work across a single Cell processor using multiple SPEs. Phase 2 (our group) has distributed the work across multiple Cell processors using MPI.

The Hardware: We are using 3 PlayStation 3s set up in a private cluster. The Cell processor has several unique factors that make it ideal for clustered computation.

•1 PowerPC-based processor called the PPE (Power Processing Element).
•6 other co-processors called SPEs (Synergistic Processing Elements).
•A bus made of 4 high-speed rings for processor communication.

Note: while the picture below shows 8 SPEs, only 6 are accessible to developers on the PlayStation 3.

Development: The main point of the development process was to use MPI to decrease the runtime by distributing the work. A profile of the code revealed that one function took 90% of the runtime. Phase 1 had distributed this function to the SPEs of a single Cell processor.
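The idea of distributing a branch and bound search can be illustrated with a minimal sketch. It is not the DNAPenny code and does not use MPI itself: it simulates ranks in plain Python, statically assigns top-level branches to ranks round-robin, and combines the per-rank results with a minimum, which is the role a reduction would play in an MPI program. The toy cost matrix, the shortest-path problem, and the names `search`, `worker`, and `run_cluster` are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy problem: COST[i][j] is the cost of visiting node j right after node i.
# The goal is the cheapest path that visits every node exactly once.
COST = [
    [0, 4, 9, 5],
    [4, 0, 3, 7],
    [9, 3, 0, 2],
    [5, 7, 2, 0],
]


def search(path: List[int], cost: float, best: list) -> None:
    """Depth-first branch and bound; best is [best_cost, best_path]."""
    if cost >= best[0]:
        return  # bound: this partial path cannot beat the best one found so far
    if len(path) == len(COST):
        best[0], best[1] = cost, list(path)
        return
    for nxt in range(len(COST)):
        if nxt not in path:
            path.append(nxt)
            # path[-2] is the node visited just before nxt
            search(path, cost + COST[path[-2]][nxt], best)
            path.pop()


def worker(rank: int, size: int) -> Tuple[float, Optional[List[int]]]:
    """Search only the top-level branches assigned to this (simulated) rank."""
    best = [float("inf"), None]
    for first in range(len(COST)):
        if first % size == rank:  # static round-robin assignment of branches
            search([first], 0, best)
    return best[0], best[1]


def run_cluster(size: int) -> Tuple[float, Optional[List[int]]]:
    """Run every simulated rank and keep the cheapest result (the reduction step)."""
    results = [worker(rank, size) for rank in range(size)]
    return min(results, key=lambda result: result[0])


if __name__ == "__main__":
    for size in (1, 3):
        print(size, run_cluster(size))
```

In this sketch each rank keeps its own bound, so ranks cannot prune using results found elsewhere. Sharing the best bound between ranks would prune more but requires extra communication.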

During our development, we discovered that this function and another one each had a bug that severely affected runtime. After we contacted Dr. Felsenstein, the original developer from the University of Washington, he rewrote both functions.

The new version executed 8.51x faster on a normal desktop computer and 7.47x faster on a PS3. Unfortunately, this negated all the work done by Phase 1, as their ported function no longer took a significant amount of the runtime.
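One way to see why the Phase 1 port lost its value is Amdahl's law, a standard model that the poster itself does not cite. The sketch below applies it to the 90% figure from the profile and to a purely hypothetical 10% figure for the rewritten code, since the actual fraction after the rewrite is not reported. It also assumes the offloaded function scales perfectly across 6 SPEs, which is an idealization.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on overall speedup when only parallel_fraction of the runtime is spread over workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    spes = 6  # SPEs accessible on one PlayStation 3, per the hardware description

    # 0.90 comes from the profile of the original code.
    print("before rewrite:", amdahl_speedup(0.90, spes))

    # 0.10 is a HYPOTHETICAL fraction for the rewritten code, used only for illustration.
    print("after rewrite (hypothetical):", amdahl_speedup(0.10, spes))
```

Under these assumptions the offload is bounded at about 4x when the function dominates the runtime, and at well under 1.1x once it no longer does.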

In order to again have code to execute on the SPEs, we decided to take a more traditional approach of assigning a branch of the search to each SPE.

Source: Lecture slides, Michael Terribilini
Source: IBM

The Results: Our project had 2 goals:

1. To improve the runtime of the DNAPenny algorithm through the use of MPI.
2. To show that the PS3 with its Cell processor is a viable alternative to expensive, traditional computers.

We achieved our first goal. Our ported version of DNAPenny runs 2.77x faster on a cluster of 3 PS3s than on a single PS3. Furthermore, we also contributed to two code improvements that resulted in a 7.47x speedup in the sequential version of DNAPenny on the PS3. We were also able to beat our original projections from last semester.

However, we did not show that the Cell is a viable alternative as we were not able to take advantage of the SPEs. At this time, the sequential version of DNAPenny executes faster on a desktop than on a cluster of 3 PS3s using MPI. If we were able to complete the port to the SPEs, we estimate a speedup of 6.42x compared to the fixed version running on a desktop, assuming that each SPE would perform at least as well as each PPE. This speedup would make the Cell cluster a viable alternative for DNAPenny.

Estimated Speedup for a Cluster of PS3s

Code revision                                      Runtime (sec)   X Speedup   X Speedup (compared to desktop)   # of available cores
Original (Core 2)                                  1861.66         -           0.116                             1
With Bug Fixes (Core 2)                            218.8           -           1                                 1
Original (1 PPE)                                   4953.51         1           0.044                             1
With Bug Fixes (1 PPE)                             662.70          7.47        0.330                             1
MPI with Bug Fixes (3 PPEs)                        238.57          20.76       0.917                             3
MPI with Bug Fixes (3 PPEs, 18 SPEs) (Projected)   34.08           145.35      6.420                             21
Original Projections                               334.82          14.79       0.653                             21
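The speedup figures can be re-derived from the reported runtimes with a short sketch. The runtimes are taken from the results table. The helper names are hypothetical, and the linear scaling from 3 to 21 cores is an assumption inferred from the numbers and from the stated condition that each SPE performs at least as well as a PPE.

```python
# Runtimes in seconds, copied from the results table.
RUNTIMES_SEC = {
    "Original (Core 2)": 1861.66,
    "With Bug Fixes (Core 2)": 218.8,
    "Original (1 PPE)": 4953.51,
    "With Bug Fixes (1 PPE)": 662.70,
    "MPI with Bug Fixes (3 PPEs)": 238.57,
}


def speedup(baseline_sec: float, new_sec: float) -> float:
    """How many times faster the new runtime is than the baseline."""
    return baseline_sec / new_sec


def projected_runtime(measured_sec: float, measured_cores: int, target_cores: int) -> float:
    """ASSUMPTION: runtime scales linearly with the number of equally fast cores."""
    return measured_sec * measured_cores / target_cores


if __name__ == "__main__":
    desktop_fixed = RUNTIMES_SEC["With Bug Fixes (Core 2)"]
    mpi_3_ppes = RUNTIMES_SEC["MPI with Bug Fixes (3 PPEs)"]

    print("bug fixes on desktop:",
          speedup(RUNTIMES_SEC["Original (Core 2)"], desktop_fixed))
    print("bug fixes on PS3:",
          speedup(RUNTIMES_SEC["Original (1 PPE)"], RUNTIMES_SEC["With Bug Fixes (1 PPE)"]))
    print("3 PS3s vs 1 PS3:",
          speedup(RUNTIMES_SEC["With Bug Fixes (1 PPE)"], mpi_3_ppes))

    # 3 PPEs + 18 SPEs = 21 cores, treated here as equally fast.
    projected = projected_runtime(mpi_3_ppes, measured_cores=3, target_cores=21)
    print("projected runtime on 21 cores (sec):", projected)
    print("projected speedup vs fixed desktop:", speedup(desktop_fixed, projected))
```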