Heterogeneous Computing in the Cloud: Crunching Big Data and Democratizing HPC Access for the Life Sciences

WHITE PAPER
Intel® Xeon® Processors
Intel® Xeon Phi™ Coprocessor
Life Sciences Computing

Ketan Paranjape, Director, Healthcare and Life Sciences, Intel Corporation
Steve Hebert, CEO, Nimbix
Bob Masson, Marketing Manager, Convey Computer

Table of Contents
A New Paradigm for Life Sciences Computing
Rising Demand for Big Data Computing
Application-Specific Computing
Intel Xeon Phi Coprocessor Advances Performance, Energy, and Programmability
Embracing Heterogeneous Computing: Jackson Laboratory
Other Advances: Improving Ease of Use and Development for Heterogeneous Computing
Democratizing HPC Access with Cloud Computing
Hiding Complexity from the User
Entering a New Era
Learn More

A New Paradigm for Life Sciences Computing

The combination of heterogeneous computing and cloud computing is emerging as a powerful new paradigm to meet the requirements for high-performance computing (HPC) and data throughput throughout the life sciences (LS) and healthcare value chains. Of course, neither cloud computing nor the use of innovative computing architectures is new, but the rise of big data as a defining feature of modern life sciences and the proliferation of vastly differing applications to mine the data have dramatically changed the landscape of LS computing requirements.

Heterogeneous cloud computing offers the potential to shift flexibly from one HPC architecture to another in a secure public or private cloud environment. As such, it meets two critical needs for life sciences computing:

• Crunch more data faster. As data sets have ballooned, HPC approaches have evolved and diversified to better match specific problems with their most effective HPC solutions. Indeed, some LS problems (for example, de novo genome assembly) are essentially intractable unless tackled with special-purpose solutions. Today, no single approach is adequate. Heterogeneous computing embodies the use of multiple approaches to achieve optimal throughput for each big data LS workload.

• Democratize access. While the scope and complexity of HPC resources have grown, the ability of research groups to identify, afford, and support them has diminished. Budget constraints are limiting access to necessary compute resources at the very time when the explosive growth in LS data makes access increasingly desirable. HPC-oriented clouds supporting the latest heterogeneous architectures can provide even small research groups with affordable access to diverse compute resources.

This paper discusses several trends and enablers of affordable, heterogeneous cloud computing for LS, including the new Intel® Xeon Phi™ coprocessor, based on Intel® Many Integrated Core Architecture (Intel® MIC Architecture). The Intel Xeon Phi coprocessor represents a breakthrough in heterogeneous computing by delivering exceptional throughput and energy efficiency without the high costs, inflexibility, and programming challenges that have plagued many previous approaches to heterogeneous computing. This paper also provides brief examples of heterogeneous computing innovators, such as Nimbix and Convey Computer, that are adopting or experimenting with heterogeneous computing approaches.

Rising Demand for Big Data Computing

Genomics is perhaps the clearest LS example where progress is accelerated by access to the right HPC resource and stymied when those resources are missing. Pioneering computational biologist Dr. Eric Schadt, currently director of the Institute for Genomics and Multiscale Biology at Mt. Sinai Hospital, captured the challenge well in a recent Nature Reviews Genetics paper. Dr. Schadt and his colleagues wrote:

“The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics…[Scientists must] master different types of computational environments that exist, such as cloud and heterogeneous computing, to successfully tackle our big data problems.

“In under a year genomics technologies will enable individual laboratories to generate terabyte or even petabyte scales of data at a reasonable cost. However, the computational infrastructure that is required to maintain and process these large-scale data sets, and to integrate them with other large-scale sets, is typically beyond the reach of small laboratories and is increasingly posing challenges even for large institutes.”¹

Cloud-based, heterogeneous computing represents a significant step toward solving these problems. Indeed, heterogeneous computing has become virtually a necessity in life sciences, where the output from next-generation sequencing (NGS) instruments represents a data tipping point. This data deluge has outpaced even the steady performance-doubling of Moore’s Law. New approaches based on specialized processors such as field-programmable gate arrays (FPGAs) and general-purpose computing on graphics processing units (GPGPUs), as well as innovative computing strategies such as Apache Hadoop*, are being pressed into service with impressive results.

Not surprisingly, heterogeneous approaches are being embraced by the bioinformatics community. “We just don’t have enough electricity, cooling, floor space, money, etc. for using standard clusters or parallel processing to handle the load,” said Dr. Harold “Skip” Garner, head of the Medical Informatics Systems Division and former executive director of the Virginia Bioinformatics Institute, both part of the Virginia Polytechnic Institute and State University (Virginia Tech). “Future bioinformatics centers, especially large computing centers, will have a mix of technologies in hardware that include standard processors, FPGAs, and GPGPUs, and new jobs will be designed for, implemented on, and steered to the appropriate processing environments.”²

Application-Specific Computing

It is still early days for heterogeneous HPC cloud computing. Advances in virtualization technology, the porting of key bioinformatics algorithms to special-purpose hardware, and ongoing evolution in standard microprocessor architecture, such as the Intel Xeon Phi coprocessor, are all important enablers of cloud-based, heterogeneous computing. While challenges remain (faster data transfer and data security concerns, for example), heterogeneous HPC in the cloud is demonstrating the potential to broadly enable researchers and clinicians.

GPU-based acceleration was an early success. First developed to speed graphics performance in gaming, GPUs’ excellent floating-point capability proved attractive in many applications. GPGPU-based systems were quickly adopted by the oil and gas industry, where they delivered dramatic speedups at reduced cost. Today, general-purpose GPU computing has spread to LS disciplines where floating-point performance is important (molecular modeling, for example, and some alignment algorithms).

More recently, systems based on FPGAs have gained momentum throughout genomics and life sciences. While it’s possible to use application-specific integrated circuits (ASICs), LS applications change so quickly that it’s impractical to spin up an ASIC for each algorithm for every HPC application. This approach […] do searching and alignment. Such applications rely on many independent and simple operations, and are thus highly parallelizable.

Unique architectures from innovative companies are emerging to help meet the demand. For example, Convey Computer, established in December 2006, has created a hybrid-core system that pairs classic Intel® processors with a coprocessor of FPGAs (see Figure 1). Particular algorithms (DNA sequence assembly, for example) are optimized and translated into code that’s loaded onto the FPGAs at runtime. The Convey architecture also features a highly parallel memory subsystem to further increase performance. Convey Computer’s approach provides very fast random access and single-word access to memory, which is very useful for the hashing functions so widely used in bioinformatics.

Figure 1: The Convey Computer hybrid-core system provides a reconfigurable, application-specific accelerator, but appears as a standard Intel® architecture-based server to the rest of the computing infrastructure. (Diagram labels: Intel® Xeon® Processor-Based Host; FPGA-Based Coprocessor; Xilinx FPGA*, four instances; Intel® Xeon® Processor(s); HCMI; Intel I/O Subsystem; Application-Specific Personalities; Cluster Interconnect Fabric; Memory; Hybrid-Core Globally Shared Memory (HCGSM).)

Intel Xeon Phi Coprocessor Advances Performance, Energy, and Programmability

Lively debate persists over the right HPC approach for various applications and disciplines. For example, FPGA advocates contend that if you lay out the gates on a chip to execute the specific aspect of an algorithm, it will by definition be more efficient than a series of instructions on a commodity processor. However, if getting an algorithm onto a new architecture requires that you use a new language, then
