January 2002 Life Sciences Group Compaq Computer Corporation

Compaq Computer Life Science Program

Abstract: This document presents an overview of Compaq in the Life Science discovery market. It includes a market overview, Compaq’s market position, reference sites and collaborations, and a description of the key components of Compaq’s solutions for Life Science discovery. The benefits of Compaq’s solutions for the Life Science industry are discussed.

Contents
DEFINITIONS
LIFE SCIENCE INDUSTRY OVERVIEW
COMPAQ’S MARKET POSITION: NUMBER ONE IN LIFE SCIENCE DISCOVERY
CELERA
THE WELLCOME TRUST SANGER INSTITUTE
THE WHITEHEAD INSTITUTE CENTER FOR GENOME RESEARCH
CENTRE NATIONAL DE SÉQUENÇAGE (GENOSCOPE)
GENEPROT, INC.
GENENTECH
THE INSTITUTE FOR GENOMIC RESEARCH (TIGR)
ENTELOS
ACADEMIA SINICA
STRUCTURAL GENOMIX
PITTSBURGH SUPERCOMPUTING CENTER (PSC)
COMPAQ MARKET POSITION: NUMBER ONE IN HPTC
COMPUTING POWER FOR THE POST-GENOMIC ERA
COMPAQ BIOCLUSTER
COMPAQ LIFE SCIENCE INVESTMENT PROGRAM
COMPAQ’S LIFE SCIENCES SOLUTIONS
FULL RANGE OF HIGH-PERFORMANCE COMPUTING SYSTEMS
TODAY: THE BEST ABSOLUTE PERFORMANCE WITH ALPHASERVER SYSTEMS
TOMORROW: THE COMPAQ/INTEL ANNOUNCEMENT ENHANCES COMPAQ’S HPTC ROADMAP AND SOLUTIONS
STORAGEWORKS TECHNOLOGY FOR HIGHLY RELIABLE, SCALABLE, MODULAR STORAGE SYSTEMS
APPLICATION PORTFOLIO
PROFESSIONAL SERVICES
COMPAQ FINANCIAL SERVICES
SUMMARY OF COMPAQ SOLUTION BENEFITS
CONTACT INFORMATION

Notice: Compaq Computer Corporation’s Life Science Program. Document prepared by the High Performance Technical Computing Group. First Edition, November 2001. The information in this publication is subject to change without notice and is provided “AS IS” WITHOUT WARRANTY OF ANY KIND. The entire risk arising out of the use of any information presented in this paper remains with the recipient. In no event shall Compaq be liable for any direct, consequential, incidental, special, punitive or other damages whatsoever (including without limitation, damages for loss of business profits, business interruption, or loss of use of information), even if Compaq has been advised of the possibility of such damages. The limited warranties for Compaq products are exclusively set forth in the documentation accompanying such products. Nothing herein should be construed as constituting a further or additional warranty. This publication does not constitute an endorsement of the product or products that were tested. The configuration or configurations tested or described may or may not be the only available solution. This test is not a determination of product quality or correctness, nor does it ensure compliance with any federal, state, or local requirements. Product names mentioned herein may be trademarks and/or registered trademarks of their respective companies. Compaq, the Compaq logo, AlphaServer, NonStop, OpenVMS, ProLiant, StorageWorks, Tru64 and TruCluster are trademarks of Compaq Information Technologies Group L.P. in the U.S. and/or other countries. UNIX is a trademark of The Open Group. Intel and Itanium are trademarks or registered trademarks of Intel Corporation. Linux is a registered trademark of Linus Torvalds. Windows is a registered trademark of Microsoft Corporation. PhysioLab is a trademark of Entelos, Inc. All other product names mentioned herein may be trademarks or registered trademarks of their respective companies. Copyright © 2001, 2002 Compaq Computer Corporation. All rights reserved.

Definitions
There are many definitions of Life Science. This paper focuses on the Life Science discovery sector: genomics, proteomics, functional genomics, comparative genomics, physiomics, systems biology, pathways and others. Life science is becoming increasingly quantitative as new technologies facilitate the collection and analysis of vast amounts of data, ranging from complete genomic sequences of organisms to three-dimensional structures and complete biological pathways. As a consequence, biomathematics, biostatistics and computational science are crucial technologies for studying complex models of biological processes, and high performance computing has become a requirement for this industry. Compaq Computer Corporation supplies high performance solutions for a wide range of Life Science organizations and is a critical partner for researchers who are working to improve human health.

Life Science Industry Overview
There is no doubt that the sequencing and initial annotation of the human genome, completed in April 2001, is one of the great scientific advancements in history. There is also no question that this breakthrough in biological research was made possible by advances in high performance computing technology. High-speed computers are necessary to analyze the hundreds of terabytes of raw sequence data and correctly order the 3.2 billion base pairs of DNA that compose the human genome. Compaq is at the forefront of this technological breakthrough, providing the solutions that companies, organizations and agencies require as they open new and exciting chapters in life science. The staggering quantity and complexity of data generated by the Human Genome Mapping Project (HGMP) has created a huge demand for high performance computing. Genetic databases are estimated to double in size every six to eight months, more than twice as fast as Moore’s Law predictions for microprocessor performance improvements.
Genomic research organizations such as Celera Genomics, the Whitehead Institute Center for Genome Research, and the Sanger Institute are already managing many terabytes of data, more than the Library of Congress holds. That data will grow exponentially over the next several years. The computing resources required for the HGMP are millions of times greater than those used to land a man on the moon. Today, separating the advances in biotechnology from advances in high performance computing is increasingly difficult. Many leading scientists believe that high-end computing is the future of biology and medicine: high performance computing has become the third leg of scientific research, alongside the traditional legs of theory and experimentation.
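To put the growth rates above in perspective, the short Python sketch below compares how much a quantity grows over a five-year horizon at the doubling periods quoted for genetic databases (six and eight months) with an 18-month doubling period. The 18-month figure is a commonly cited rule of thumb for Moore’s Law and is an assumption here, not a number taken from this paper; the five-year horizon is likewise chosen only for illustration.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Return how many times a quantity grows over `months` if it doubles
    every `doubling_period_months` (simple exponential model)."""
    return 2.0 ** (months / doubling_period_months)

HORIZON_MONTHS = 60  # five-year horizon, chosen for illustration only

# 6 and 8 months are the database doubling periods quoted in the text;
# 18 months is an ASSUMED, commonly cited Moore's Law doubling period.
scenarios = [
    ("database, 6-month doubling", 6),
    ("database, 8-month doubling", 8),
    ("Moore's Law, 18-month doubling (assumed)", 18),
]

for label, period in scenarios:
    print(f"{label}: x{growth_factor(HORIZON_MONTHS, period):,.0f}")
```

Under these assumptions the data grows by a factor of roughly 180 to 1,000 over five years while processor performance grows by roughly 10, so the gap compounds to about 18 to 100 times.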

The assembly and initial annotation are only the first step on a long road to understanding the human genome. Many companies, research institutes, universities and government laboratories are now rapidly moving on to the next steps: comparative genomics, functional genomics, proteomics, pathways, and whole organ modeling. Researchers are beginning the quest to determine exactly how each gene and protein functions and, more importantly, how they malfunction to trigger deadly illnesses, from heart disease and cancer to Alzheimer’s and Parkinson’s diseases. Increasingly powerful computers and software will be needed to gather, store, analyze, model and distribute information. The availability of biological data, coupled with powerful software analysis tools and computer systems, will allow scientists to develop new diagnostics, drug therapies, and new strategies and methods for identifying disease genes and, eventually, personalizing medicine. Agrochemical and traditional chemical companies are also investing in these technologies to develop new strains of seeds, herbicide-resistant crops and novel biomaterials. Bioinformatics is a large part of the R&D investment that agrochemical, chemical, pharmaceutical and biotech companies are making today as they strive to develop products and bring them to market faster. Industry analysts estimate up to 100 percent annual revenue growth for genomics companies, and information technology spending is expected to match or exceed that rate. Computing companies like Compaq are responding with ever more powerful, faster, reliable and cost-effective computing solutions.

Compaq’s Market Position: Number One in Life Science Discovery
Compaq stands as the leading supplier of computing systems for Life Science research, according to International Data Corporation (IDC). Compaq works with many of the major research centers worldwide, including the world’s largest genomic sequencing facilities at Celera Genomics in the USA and the Wellcome Trust Sanger Institute in the United Kingdom.
With Compaq working as a partner, the time needed to map the full human genome was dramatically compressed through more aggressive and large-scale application of sequencing and computing technology. Most of the leading commercial and public genomics and proteomics centers, such as Celera Genomics, the Sanger Institute, the Whitehead Institute Center for Genome Research, Genoscope and GeneProt, use Compaq Alpha systems either exclusively or primarily. Major pharmaceutical and biotechnology companies such as Genentech are also prominent Compaq customers.

Celera Genomics
Celera is a pioneer in human genomics and chose Compaq as its IT partner in developing the world’s largest private genomic sequencing and computing facility. Compaq designed and equipped Celera’s data center, which includes over 900 interconnected Alpha processors and Compaq StorageWorks™ systems managing a 100-terabyte database.

The Wellcome Trust Sanger Institute
The Sanger Institute is one of the world’s major genome research centers. The Sanger Institute’s vast bioinformatics computing resources include more than 770 Compaq AlphaServer™ systems running Tru64™ UNIX® software, as well as Compaq PCs running Linux® and over 60 TB of Compaq StorageWorks storage. Over the next five years, the Sanger Institute will require a greater than fivefold increase in IT resources and infrastructure.

The Whitehead Institute Center for Genome Research
The Whitehead Institute operates the largest public genomic sequencing center in the U.S. The Institute selected Compaq to supply its IT infrastructure, relying on 32 AlphaServer ES40, DS20 and DS10 systems and more than 21 TB of Compaq StorageWorks storage to manage and analyze genomic data. The Whitehead Institute plans to triple the size of its computing facility within one year. The Institute currently sequences 34 million lanes per year, soon increasing to 48 million lanes per year. Accomplishing this will require 7x24 availability of computer resources.

Centre National de Séquençage (Genoscope)
Genoscope is a non-profit organization, located in Evry, France, that manages the second largest genome sequencing facility in Europe. Genoscope chose Compaq AlphaServer systems and Compaq StorageWorks products as the basis for its IT architecture.

GeneProt, Inc.
In October 2000, GeneProt announced the selection of Compaq as the exclusive IT partner for the company’s two major proteomics research factories in Geneva, Switzerland, and North Brunswick, New Jersey. As part of the agreement, GeneProt will use Compaq Global Services, StorageWorks storage systems, and industry-leading AlphaServer systems running Tru64 UNIX to power and manage critical aspects of the company’s pioneering proteomic research facilities. The entire initial IT system is fully outsourced to Compaq and located at Compaq facilities in Geneva. In April 2001, GeneProt opened the world’s first large-scale proteomic discovery center in Geneva, Switzerland. The facility will enable GeneProt to become a world leader in proteomics, contributing to the discovery of new drugs and biomarkers based on the human body’s own proteins. The new center, which is managed by a world-class team of proteomics and bioinformatics pioneers, will run 20 hours a day and will use the supercomputing capabilities of Compaq’s systems to capture, store and analyze the huge volumes of data generated by GeneProt’s proteome analyses. The supercomputing technology includes 1,420 Compaq Alpha-based systems, each capable of performing more than a billion sequence comparisons per hour while offering increased sensitivity and performance in sequence similarity analysis. GeneProt will also deploy Compaq AlphaServer GS160 systems and over 50 TB of storage.

Genentech
Genentech is a biotechnology pioneer and relies on Compaq AlphaServer systems as a key element in its IT infrastructure, running everything from email to high performance bioinformatics, protein and molecular biology applications on Tru64 UNIX Alpha systems. The Genentech configuration is an eight-node TruCluster™ Server environment that consists of seven 4-processor SMP AlphaServer ES40 systems, each with 4 GB of memory, and one 10-processor AlphaServer 8400 system with 8 GB of memory. More than 2 TB of StorageWorks Fibre Channel storage is attached to the cluster, and the nodes are connected by the Memory Channel II interconnect.

The Institute for Genomic Research (TIGR)
TIGR is a leading non-profit genomics research institute that focuses on analyzing and describing entire genomes and making this valuable information available to the scientific community. TIGR moved to the 64-bit Compaq Alpha platform to alleviate memory limitations experienced with its 32-bit systems. TIGR has reduced response time by more than 90% with its Alpha systems.

Entelos
Entelos announced in August 2001 that it had adopted Compaq’s AlphaServer systems running Tru64 UNIX and Compaq ProLiant™ servers running Linux as the computational platforms for its current and next-generation high-throughput PhysioLab™ systems. PhysioLab technology provides a breakthrough platform for deterministic simulation: it is used to test experiments and hypotheses in silico, predict their results through computer-based simulation, and create knowledge from fragmented discovery and clinical data.

Academia SINICA
On September 1, 2001, Academia SINICA, Taiwan’s leading academic research institute, which has strong ties to the central government, announced that it would team with Compaq in a joint venture to help jump-start the island’s efforts to develop its biotechnology industry.

Structural GenomiX
On September 20, 2001, Structural GenomiX (SGX) announced a multi-million dollar agreement with Compaq to develop SGX’s high performance structural informatics systems. SGX will purchase a complete Compaq technology solution, including Compaq AlphaServer systems running Tru64 UNIX, Compaq ProLiant industry-standard servers running Linux, StorageWorks products and services. Compaq’s Alpha chip technology and leadership in building high performance computing systems will advance SGX’s drug discovery efforts by increasing SGX’s ability to run its high-throughput protein modeling, molecular dynamics, and chemical docking programs on a large scale.

Pittsburgh Supercomputing Center (PSC)
On August 3, 2000, the National Science Foundation awarded $45 million to PSC to provide “terascale” computing capability for U.S. researchers in all science and engineering disciplines. PSC collaborated with Compaq to create a new, extremely powerful system for the use of scientists and engineers nationwide. The six-teraFLOPS system is a network of 682 Compaq AlphaServer systems, each of which contains four Compaq Alpha microprocessors. Existing terascale systems rely on other processors, but extensive testing by PSC and others indicates that the Alpha processor offers superior performance over a range of scientific applications. In early 2001, researchers used a prototype of the Terascale Computing System (TCS) for a pioneering simulation of a mechanosensitive membrane protein (http://www.psc.edu/science/schulten2001.html).

Figure 1 - PSC's 6 TFlop system

0 to 4 TFlops in 29 days
On October 2, 2001, Dr. Michael Levine announced that PSC and Compaq had created a 4.1 TFlop AlphaServer SC system in 29 days.

Compaq Market Position: Number One in HPTC
In addition to its leadership position in supplying computing systems for life science, last year Compaq established industry leadership in high performance technical computing by winning an unprecedented series of “most powerful” supercomputing programs:

- World’s most powerful supercomputer: the 30 TeraOPS ASCI Q system at Los Alamos National Laboratory
- World’s most powerful supercomputer for non-military research: the Pittsburgh Supercomputing Center/National Science Foundation 6 TeraFlops system
- Europe’s most powerful supercomputer: the French Atomic Energy Commission’s (CEA) 5 TeraFlops system
- World’s most powerful Linux system: Sandia National Laboratories’ CPlant system
- Most powerful computer in the film and video industry: Blue Sky Studios’ 512 AlphaServer DS10L systems
- Most powerful university supercomputers in Australia: the Australian Partnership for Advanced Computing (APAC) and the Victorian Partnership for Advanced Computing (VPAC)
- Most powerful supercomputer in Japan: the Japan Atomic Energy Research Institute (JAERI)

These successes, combined with Compaq’s equal success in delivering smaller systems, earned Compaq the number one ranking from International Data Corporation for HPTC revenue in 2000.

Computing Power for the Post-Genomic Era
The challenges of the post-genomic sequencing era promise to be even more computationally intensive than those of the initial genome-mapping phase. As the Human Genome Mapping Project moves into annotation and analysis work, Compaq has launched several programs to further its commitment.

Compaq BioCluster
The public Human Genome Mapping Project lacked the computing capacity to complete the initial annotation work on schedule and requested Compaq’s assistance. Compaq agreed to help by providing the computing tools and storage to complete the annotation, contributing a cluster of AlphaServer ES40/667 systems with over one terabyte of storage located at Compaq’s Enterprise Systems Laboratory in Littleton, Massachusetts. The system is available to public sector research institutions to complete the annotation of the human genome: identifying where the genes are located on the human chromosomes and performing an initial analysis. In the first six months after it was set up in May 2000, more than one million jobs were run on the BioCluster. Researchers around the world access the cluster through the World Wide Web.

The BioCluster comprises 27 AlphaServer ES40 4-processor SMP nodes, each with 54 GB of local storage, networked using dual-switched 10/100 Ethernet. Twenty-five of the nodes have 4 GB of RAM and one has 16 GB of RAM. In addition, the system has a central file server with 1 TB of secondary storage. A standalone AlphaServer ES40 system is also available for testing scripts and new user code before running them on the main cluster. Platform Computing’s LSF (Load Sharing Facility) balances application loads across the nodes.

The Whitehead Institute has been the biggest user, employing the BioCluster to run RepeatMasker and BLAST queries on successive releases of the draft human sequence to catalog the repetitive DNA in the human genome. The human genome contains an estimated 30,000 to 40,000 genes; much of the remaining sequence is repetitive DNA whose purpose, although conceivably just as important, has yet to be determined. An MIT lab has also used the cluster to perform genome annotation. The University of California at Santa Cruz has used the BioCluster to lay out and order DNA fragments into the best version of the sequence. Researchers at Washington University in St. Louis also use the system for various gene expression and annotation work.

France’s Genoscope ran the first analysis of the complete draft of the human genome on the BioCluster. The results will provide highly accurate predictions of the total number of genes in the human genome. The analysis used the LASSAP (LArge Scale Sequence compArison Package) sequence comparison software. The complete analysis of the whole draft dataset took only 38 hours on the 100-CPU BioCluster.

Compaq Life Science Investment Program
On September 26, 2000, Compaq announced that it would invest an initial $100 million in early-stage life science companies in the areas of genomics, bioinformatics, and related market disciplines. The program involves a mix of direct investment in such companies and investment in venture capital funds targeting these areas. Compaq’s goal is to spur discovery and growth in life science companies through a combination of financial support and early access to Compaq’s industry-leading AlphaServer systems running the Tru64 UNIX operating system, StorageWorks systems, solutions and services.

Initial investments include an equity investment in GeneProt and an investment in the Cambridge, Massachusetts-based Applied Genomic Technology Capital Fund, L.P., a venture capital fund focusing on investments in genomics and bioinformatics.

Compaq’s Life Sciences Solutions
Compaq believes that a complete life science solution must include the very best hardware, system software, tools, applications and professional services, delivered at a competitive cost. Compaq and its selected partners can design a solution that fits a customer’s IT needs today and that is modular and scalable to meet future IT requirements as the business grows. Compaq has established relationships with some of the world’s best life science institutions and companies, which understand the IT challenges facing both small and large life science organizations.

Full Range of High-Performance Computing Systems
Compaq offers a complete range of HPTC systems (see Figure 2), from the low end to the high end, running the Tru64 UNIX, Linux, OpenVMS™ or Microsoft® Windows® operating systems. With its Alpha systems, Compaq is the leader in scalable 64-bit systems supporting very large memory database applications. Today’s AlphaServer DS, ES, GS and SC systems provide the performance of the latest Alpha microprocessors in scalable configurations, from the low-cost, single-processor AlphaServer DS10 to high-end, switched SMP systems such as the AlphaServer SC series supercomputer, which supports up to 128 nodes (512 processors). Custom AlphaServer SC systems can be designed and built with higher node counts. The Pittsburgh Supercomputing Center (PSC) built a 692-node (2,768 ES45 Alpha microprocessors) public supercomputer, and the US government ASCI Q project will connect over 12,000 processors (3,000 ES45 nodes) into a single machine. One AlphaServer ES45 rack (Figure 3) contains 5 nodes (20 Alpha 21264b CPUs) and delivers 40 GFlops of peak performance.

Figure 2 - AlphaServer Family
Figure 3 - AlphaServer ES45 rack
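The rack figures above can be broken down per node and per processor. The Python sketch below derives those ratios from the quoted numbers and then estimates aggregate peak performance for a given node count. It assumes, hypothetically, that every node is an ES45 with the same per-CPU rating and that peak performance scales linearly with processor count; sustained application performance will be lower, and a real SC system may use other node types.

```python
# Figures quoted in the text for one AlphaServer ES45 rack.
NODES_PER_RACK = 5
CPUS_PER_RACK = 20
RACK_PEAK_GFLOPS = 40.0

cpus_per_node = CPUS_PER_RACK // NODES_PER_RACK      # 4 CPUs per ES45 node
gflops_per_cpu = RACK_PEAK_GFLOPS / CPUS_PER_RACK   # 2.0 GFlops per CPU

def peak_gflops(nodes: int) -> float:
    """Aggregate *peak* GFlops for `nodes` ES45 nodes.

    Assumes (hypothetically) that peak simply scales with processor count;
    it says nothing about sustained application performance.
    """
    return nodes * cpus_per_node * gflops_per_cpu

print(cpus_per_node, gflops_per_cpu)  # 4 2.0
print(peak_gflops(128))               # 1024.0, i.e. about 1 TFlop peak
```

For example, a 128-node (512-processor) SC configuration built entirely from such nodes would have a theoretical peak of about 1 TFlop under these assumptions.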

Today: The Best Absolute Performance with AlphaServer Systems
AlphaServer systems deliver not only the best absolute performance but also the best sustained application performance available with Tru64 UNIX or Linux. Compaq Alpha technology continues to hold the performance pole position. In benchmark tests, Celera Genomics gave a large bioinformatics benchmark to all vendors. Only two vendors could run it. One ran it in 87 hours; the Compaq AlphaServer system ran it in under seven hours, an order of magnitude faster than the next fastest system. On the latest Compaq hardware, this benchmark now runs in less than two hours. Similarly, Genoscope found that Compaq’s AlphaServer BioCluster system took 25% less time to complete a LASSAP sequence comparison analysis run that was 2.5 times larger than any previous run made on any other vendor’s system. Figure 5 shows the results of a benchmark using a public version and an optimized version of HMMER¹. The benchmark was run on an AlphaServer DS20 system using the Pfam database as the target database. The input consisted of several random amino acid sequences.

Figure 5
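The benchmark comparisons above reduce to simple ratios. The Python sketch below computes them from the quoted times. Because the Compaq times are given as upper bounds (“under seven hours”, “less than two hours”), the resulting Celera speedups are lower bounds. For the Genoscope figure it assumes, as an interpretation rather than a stated fact, that “25% less time” and “2.5 times larger” are both measured against the same earlier reference run.

```python
import math

def speedup(baseline_hours: float, new_hours: float) -> float:
    """Ratio of two run times; a value above 1 means the new run was faster."""
    return baseline_hours / new_hours

# Celera benchmark: 87 hours on the competing system versus "under seven
# hours" (and later "less than two hours") on AlphaServer systems. The
# Compaq times are upper bounds, so these ratios are lower bounds.
celera_first = speedup(87, 7)
celera_latest = speedup(87, 2)

# Genoscope LASSAP run: 2.5x the work in 25% less time (0.75x the time),
# ASSUMING both figures refer to the same earlier reference run.
genoscope_throughput = 2.5 / 0.75

print(f"Celera, first result: at least {celera_first:.1f}x "
      f"({math.log10(celera_first):.2f} orders of magnitude)")
print(f"Celera, latest hardware: at least {celera_latest:.1f}x")
print(f"Genoscope: about {genoscope_throughput:.2f}x the work per unit of time")
```

The first Celera ratio, at least 12.4x, is slightly more than one order of magnitude, which is consistent with the claim in the text.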

Tomorrow: The Compaq/Intel Announcement Enhances Compaq’s HPTC Roadmap and Solutions
In June 2001, Compaq announced a strategic partnership with Intel Corporation designed to substantially enhance Compaq’s long-term product roadmap for High Performance Technical Computing. Beginning in mid-decade, Compaq will implement its 64-bit enterprise servers, including the SC family, using the Intel® Itanium™ microprocessor architecture.

The alliance draws upon the unique competencies of each company and will enable Compaq to deliver better, higher performing HPTC solutions that meet the needs of research scientists into the foreseeable future. Compaq will focus on designing and building advanced systems based on Intel’s microprocessor technology.

¹ Gary Montry, Southwest Parallel Software

StorageWorks Technology for Highly Reliable, Scalable, Modular Storage Systems
Over the past 20 years, scientists around the world have collected and generated an enormous quantity of complex life science data. The amount of genomic data is doubling every six to eight months, and proteomic data will grow exponentially over the next several years. Compaq’s StorageWorks storage area network (SAN; Figures 6 and 7) is the preferred solution to meet the challenges imposed by this explosive data growth and the need for more efficient data storage and management. Compaq’s SANs provide the industry-leading architecture and vision to deploy, manage and scale today’s networked enterprise storage environment. Compaq SANs significantly increase the value of the IT storage infrastructure by increasing its ability to respond to change, streamline operations and sustain performance during normal and critical conditions.

Figure 6 - Compaq SAN
Figure 7 - Storage Array

Application Portfolio
Compaq works closely with strategic commercial and public domain application developers to provide top-performing, scalable, reliable and highly available (7x24) solutions to Life Science customers today and into the future. This includes extensive experience implementing solutions with key partners such as Oracle Corporation and Platform Computing at a wide range of customer sites (research institutes, biotech startups and established pharmaceutical companies). In addition, Compaq has the facilities and experience to pre-configure and test solutions before implementing them at customer sites.

Key commercial developers include:

• Database Management − Oracle Corporation
• Distributed Resource Management − Platform Computing
• Data Analysis − Lion BioScience, Informax, Accelrys/GCG
• Data Integrators − Lion BioScience and Informax
• Data Providers − Celera Genomics, Incyte Genomics, Structural GenomiX, Entelos
• LIMS − Thermo Lab Systems and Applied Biosystems
• Grid Technologies − Platform Computing

Public Domain Applications
Over 60 public domain codes are available on the Alpha Tru64 UNIX platform, including BLAST, FastA, Phrap, HMMER, ClustalW, RepeatMasker, LASSAP, GeneHunter, GeneScan and GenomeScan. Many of these codes are optimized for Alpha. Any optimization work Compaq carries out on public domain software is returned to the author or curator for testing and quality control and then included in the next release.

Compaq works with Southwest Parallel Software to optimize key codes. For example, Gary Montry, CEO of Southwest Parallel Software, optimized Phrap to take advantage of multiple processors on Compaq’s SMP systems. As a result, a job can finish three times faster on a four-processor AlphaServer ES40 system than on a single-processor Alpha system.
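One way to read the Phrap result is through Amdahl’s law, a standard model (not one used in this paper) relating speedup to the fraction of run time that cannot be parallelized. The Python sketch below inverts the model to find the serial fraction implied by a threefold speedup on four processors, then extrapolates to a hypothetical eight-processor run. Real Phrap scaling also depends on memory, I/O and input data, so the eight-processor figure is a model estimate, not a measurement.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup on `processors` CPUs when `serial_fraction`
    of the single-processor run time cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

def serial_fraction_from_speedup(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: the serial fraction implied by a measured
    speedup on `processors` CPUs."""
    return (processors / speedup - 1.0) / (processors - 1)

# Measured figure from the text: Phrap ran 3x faster on 4 processors.
f = serial_fraction_from_speedup(3.0, 4)
print(f"implied serial fraction: {f:.3f}")                        # 0.111
print(f"parallel efficiency on 4 CPUs: {3.0 / 4:.0%}")            # 75%
# Hypothetical extrapolation, not a measured result:
print(f"model speedup on 8 CPUs: {amdahl_speedup(f, 8):.2f}")     # 4.50
```

Under this model, about one ninth of the single-processor run time stays serial, so doubling the processor count from four to eight would raise the speedup only from 3 to about 4.5.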

Professional Services
Compaq has assembled one of the largest state-of-the-art IT professional service organizations in the world. With over 40,000 service professionals worldwide, Compaq is poised to handle the multinational requirements of our customers and partners. Our professional service organization has the knowledge and real-world experience to set up a world-class program management office. A typical program office consists of solution architects and business personnel who design, pre-stage, test, implement and manage the day-to-day operation and growth of a system, with the goal of maximizing system availability.

Compaq Financial Services
Compaq Financial Services is the leasing arm of Compaq, offering global leasing solutions to ensure that assets are tracked and utilized to their fullest financial benefit. Key points include technology refresh, web-based asset tracking and flexible lease terms.

Summary of Compaq Solution Benefits
• Full range of systems, services and solutions
  o Intel and Alpha
  o Tru64 UNIX, Linux, OpenVMS and Windows
  o Single-processor, SMP, clusters and supercomputers
  o Highly reliable, scalable, modular computational and storage systems
  o Range of system and storage interconnects
• High level of performance
  o 1st 64-bit architecture
  o Best compilers and tools
  o Optimized applications
  o Leadership in supercomputer wins
• High-availability, NonStop™ computing
  o 7x24 reliability, availability and serviceability
  o Facility, services, training and support
  o Pre-staging and testing of complex systems

Contact Information
For more information on Compaq’s Life Science Program, including customer success stories, software partners and applications, white papers, and benchmarks, please go to http://www.compaq.com/hpc/life_sciences/index.html or contact your local Compaq account manager.