Resolving Bottlenecks: Target Elimination P.52 Strategic Insights: Regulatory Compliance P.61 www.bio-itworld.com

Winner of Two 2003 Neal Awards TECHNOLOGY FOR THE LIFE SCIENCES WorldAPRIL 2003 • VOL. 2, NO. 4 BioIT See Page 8

The Double Helix Turns 50 UNRAVELING Futurethe

• A conversation with James Watson P.28 • Beyond the Blueprint P.38 • First Base: The Doyens of DNA P.6 • The sound of genome music P.32 KAYOMI TUKIMOTO; MIRIAM CHUA/COLD SPRING HARBOR LABORATORY (INSET) EDC SHREDS PAPER TRAILS DEBATING DNA IN MONTEREY Electronic data capture finds data capture (EDC) to improve At Time conference, top The February meeting looked increased acceptance clinical trial management. Recently, brains spin genomic future both backward at the discovery of Biogen and Dana Farber/Partners amid potshots at pharma the double helix and forward to the CancerCare (DF/PCC) have gotten benefits and havoc DNA may BY AMANDA FOX in on the action, demonstrating that wreak. Genomics, clearly, isn’t for As the pressure rises to compress the EDC applies to both settings. BY MARK D. UEHLING only scientists anymore. Ordinary time it takes to conduct clinical tri- With an increasing number of “This whole meeting has been a citizens gawked at the speakers, als, commercial and research insti- federal clinical trial and drug mixture of exhilaration and ter- begging autographs and snapshots tutions alike are adopting electronic (CONTINUED ON PAGE 24) ror,” said Francis Collins, director of uber-nerds and scientists such of the National Human Genome as Leroy Hood, Ray Kurzweil, Research Institute. If Collins were Richard Dawkins, Stewart Brand, scared, having run the public effort and E.O. Wilson. The conference to sequence the human genome, featured a baroness, an ambas- imagine the reeling minds of sador, and two Nobelists. preachers, financiers, teachers, and As Collins sketched a number other civilians attending “The Fu- of April events to celebrate the fin- ture of Life,” a conference spon- ished human genetic sequence, in- sored by Time magazine in cluding a plan to dispatch 1,000 Monterey, Calif. (CONTINUED ON PAGE 20) 50 th Anniversary of the Double Helix BEYOND the As investigators celebrate the golden anniversary of the double helix, how will the wealth of data emanating from the human genome and allied technologies impact research on health and disease?

BY MALORYE BRANCA

ontemplating DNA’s inner beauty from blurry X-ray images and That reality is gaining credence as the “completed” sequence of the human and oth- cardboard cutouts, James Watson and Francis Crick could hard- er genomes, transcriptomes, and proteomes ly have imagined that someday, scientists would be surfing the become publicly available, along with power- double helix from their desktops, making discoveries with the ful new tools to investigate them. To commemorate the 50th anniversary of click of a mouse. the double helix, we asked 50 experts to share Since Watson and Crick’s landmark April 1953 publication in their views about the key developments, par- ticularly since the release of the first draft CNature, has become a mainstay of and devel- genome sequence in 2000, as well as speculate opment, culminating in the sequencing, and public sharing, of the human on future advances. The staggering progress in genomic medicine in the past few years is genome sequence. The announcement of the “substantially complete” human just a prelude of the excitement in store. genome sequence this month, coinciding with the 50th anniversary of the double The biggest shock in the human genome is the paucity of genes. Even the earlier esti- helix, heralds a new phase in the application of genome science to improving hu- mates of 30,000 may prove too high. This man health. begs the troubling question,“Where is the hu- man complexity coming from?” The answer “There has been a remarkable transformation in the way we think about bi- shifts the spotlight from the genome to the ology,” says Eric Lander, director of MIT’s Whitehead Institute Center for proteome (and beyond). Genome Research. “We think about biology now as information.” Aravinda Chakravarti of Johns Hop- In honor of the 50th anniversary / : of Crick and Watson’s Nature kins University agrees: “Today, my students take it for 50 50 paper describing the double helix, granted they can browse the genome. Someday, we’ll Reflections Bio•IT World presents an exclusive online series of interviews with 50 do all these studies — sequence analysis, proteomics, on the Double Helix experts in genomic medicine. genotyping — from a single desktop.” bio-itworld.com/news/reflections_index.html SCIENCE MUSEUM/SCIENCE & SOCIETY PICTURE LIBRARY (TOP)

38 Bio•IT World April 2003 www.bio-itworld.com Illustration by Kayomi Tukimoto www.tukimoto.com Blueprint

“This structure has novel features which are of considerable biological interest.”

— J. WATSON & F. CRICK, Nature, 25 April, 1953 ERIC LANDER director, Whitehead Institute Center for Genome Research

“Watson and Crick’s discovery represented the high point of the molecular biology revolution, in that they reduced biology to molecules. But that contained the seed of the next revolution, to reduce biology beyond molecules, to pure information.”

The proteome — the sum of all proteins — itive DNA is being expressed,” says Affymetrix is far larger than the genome. For example, al- President Steve Fodor. “This is remarkable, be- ternative gene splicing produces multiple pro- cause the lore would be that only 1.5 to 2 percent teins from a single gene, which are then of the genome would be expressed.” chemically decorated with various moieties, Either the gene tally is wrong, or RNA serves producing a bewildering array of protein forms. many more functions than a simple messenger. Even RNA — the molecular messenger be- “Besides the standard genes that code for pro- tween genes and proteins — may play more tein, there are also genes that code for small roles than previously thought. The recent dis- RNAs,”says Adrian Krainer at Cold Spring Har- covery that RNA interference (RNAi), an excit- bor Laboratory. “There is this whole new ma-

ing new tool for gene manipulation first chinery associated with RNAi and related 112. WITH PERMISSION ELSEVIER SCIENCE studied in plants, also works in mammalian processes. There appear to be hundreds of genes CELL cells has not only opened new avenues of in- coding for small RNAs that are part of this.” quiry but also may yield a new class of drugs With so much attention lavished on gene (see “The RNAi Revolution”). count, the critical issue of gene regulation — the And last year, a study by Affymetrix found role of non-coding DNA sequences, the mystery + Transcription – Transcription much more RNA than expected in a survey of all of accessing tightly bundled chromosomal DNA CHUBB, J.R. AND BICKMORE, W.A. Reading the genome: Chromosomes containing genes expressed from chromosomes 21 and 22. to switch on gene sequences — will be a fasci- active genes (red dots) loop out (left), whereas inactive “We found that 30 to 35 percent of the nonrepet- nating research vista in the coming years. chromosomes adopt a more condensed state (right).

The RNAi Revolution

he discovery of RNA interference (RNAi) most important new advance in biology is the to them all more quickly,” Sabatini says. could not have been more timely. RNAi approach,” says Tom Cech, president of the Early adopters of RNAi sound a cautionary T “Genomics generated a much larger uni- Howard Hughes Medical Institute and Nobel lau- note. “We’ve been using it for about five years,” verse of targets,” says Bristol-Myers Squibb’s reate for the discovery of RNA . “RNAi is says Geoffrey Duyk, president of R&D at Exelixis. Nicholas Dracopoli. “The newer targets, which vastly more powerful than anything we have “The dirty little secret of RNAi is that you are we don’t have much experience with, have had. Even people who don’t know how to spell knocking down messenger RNA to knock out pro- slowed down the industry’s success rate.” RNA can use this successfully in diverse biologi- tein. Because proteins have different turnover The standard tools to probe gene function cal systems.” rates, you have to have a good way to measure were either cumbersome, such as knockout mice, RNAi is already making an impact. Genome- protein level and activity.” or poorly informative, such as gene expression. wide knockdowns have been carried out in Even skittish venture capitalists are falling for RNAi was described in 1998 by Andy Fire and organisms including nematodes. Small interfer- RNAi and its potential therapeutic value, with Craig Mello at the Carnegie Institute in Washing- ing RNAs (siRNAs), which silence genes in mam- backing for companies such as Cenix and Alny- ton, D.C. In plants and lower organisms such as malian cells, are now being designed against as lam Pharmaceuticals. the nematode, RNAi — a process in which dou- many genes as possible. “The great thing about recombinant DNA and ble-stranded RNA fragments target and elimi- One promising approach is to spot cells monoclonal antibodies was that they gave actual nate specific messenger RNA molecules — expressing defined genes on microchips for the drugs right from the start,” says Christoph West- probably helps defend against viruses and other analysis (see Ziauddin, J. and Sabatini, D.M. phal, of Polaris Venture Partners. “With foreign molecules. Nature 411, 107-110: 2001). “The hard part is not genomics, it just wasn’t clear when it would Two years ago, a paper in Nature dramatically printing the chips or doing the experiment,” says develop a drug.” showed that RNAi also worked in mammalian David Sabatini, Whitehead Institute Center for Firms such as Polaris, which is funding Alny- cells (Elbashir, S.M. et al. Nature 411, 494-498: Genome Research Fellow and co-founder of lam, hope RNAi can fuel the next wave of 2001). The role of this process in humans remains Akceli. “It’s picking the right sequences.” biotech breakthroughs. “If we are very fortu- vague, but, regardless, it possesses immense Each siRNA contains 21 nucleotides, but some nate, siRNAs will make good drugs,” Westphal experimental — and therapeutic — potential. sequences stick better than others. For once, says. “If we are unlucky, we still have a whole Fast, easy, and inexpensive — that’s what an researchers welcome the intense competition. natural cellular machinery that is open to small- experimentalist likes to hear. “To my mind, the “Hopefully, we will cover different genes, and get molecule development.” —M. B.

40 Bio•IT World April 2003 www.bio-itworld.com BILL HASELTINE chairman and CEO, Human Genome Sciences

“The human genome sequence is not an end in itself — it is the creation of a set of tools.”

(CONTINUED FROM PAGE 40) trait and do our hardcore biology on those, in- Nadeau’s lab is using mouse genetics to study The analysis of genome variations, chiefly single stead of doing every possible technology that ex- heart development. Crossbreeding mouse strains nucleotide polymorphisms (SNPs), performed in ists across the whole genome on everybody.” with characteristic traits, such as high heart rate, concert with the sequencing project, has also been a Nadeau has produced a computational model of revelation. The SNP Consortium has documented The Evolution Solution the interrelationships between these cardiac prop- some 2 million SNPs in the human genome, which One of the salutary lessons of the human genome erties (see “Heart and cell”). Says Nadeau: “The can serve as valuable landmarks in the search for sequence is the insight that evolution affords to the first time I showed this [model], a physician in the disease genes.“We never expected that SNPs would assembly of the genetic parts list.“Evolution tries to audience said, ‘Yes, that’s the heart, but we already be present in such densities,” says Dahlia Cohen, hold on to things that are functionally important,” know how it works.’ Then someone pointed out head of functional genomics at Novartis. says Eric Green, the new intramural director of the that we had figured it out with computers and ge- Interestingly, SNPs in noncoding genome re- National Human Genome Research Institute. netics in just months, rather than the 200 years it’s gions may be more important than expected. “In- Green is designing algorithms that hunt for taken physiologists!” stead of changing the nature of the protein, “multispecies conserved sequences” (MCSs) to un- The mouse is also invaluable for pharmacoge- variations may subtly change the nomics, as different strains exhibit dif- amount of protein produced or the ferent drug sensitivities, offering a timing of its production,” says Uni- Left ventricular End End strategy to identify genes related to versity of Chicago’s Nancy Cox. mass/body diastolic systolic drug response. “Right now, we can’t weight dimension dimension Understanding diversity is where do genomewide studies in humans,” some of the major challenges lie. says Howard McLeod of Washington Left ventricular Heart Rate “Producing any draft genome se- mass University in St Louis. “We just don’t quence is trivial compared to what it have enough patients to do the studies is going to take to understand varia- Posterior wall Fractional we want and still have statistically thickness shortening tion,” says Anthony Brookes of the valid results.” Karolinska Institute. But many Septal wall Stroke volume The world of microbial genomics groups are making rapid progress in thickness is undergoing a revolution. “The field documenting that variation, identi- was moving along slowly, then, sud- Exercise Thickness of Cardiac Body weight fying shortcuts to land valuable dis- wall/radius of output denly everyone wanted to have their ease markers. Chromosomal DNA is ventricle cavity bacterium sequenced,” says Steven inherited in blocks, such that the in- Salzberg of The Institute for Genomic Lost Retained Not Measured

heritance of one SNP may be diag- SOURCE: JOSEPH NADEAU, CASE WESTERN RESERVE UNIVERSITY SCHOOL OF MEDICINE Research (TIGR). Since sequencing nostic for an entire series. This Heart and cell: Data on the effects of elevated calsequestrin — a protein the first two bacterial genomes suggests that researchers don’t have with cardiac implications — in inbred mice can be fed into a computational (Haemophilus influenzae and My- model of the heart, shown here, that predicts physiologic changes. The next to sift through a million or more step is to add gene expression and other genomic data to do the same type of coplasma genitalium) in 1995, TIGR SNPs to find markers of disease or predictions for cellular activity. has finished 20 more. “Studies of bac- drug response. teria have revealed the power of ge- The National Institute of Health’s International earth hard-to-recognize regulatory motifs. “Find- nomics better than anything else,”TIGR’s Jonathan HapMap Project aims to generate a genomewide ing these noncoding functional elements will tell us Eisen says.“Because those genomes are done.” map of SNP blocks, or haplotypes, in the next few a lot about how to make complex biological sys- Working in some cases with industry, Salzberg years. Genaissance already has its own haplotype tems,” Green says. His lab is currently hunting adds, “We’ve quickly turned this information into database, based on the analysis of about 7,000 MCSs across 12 diverse genomes. “A lot of people candidate drugs or vaccines.” With the om- genes. According to Sequenom Chief Scientific Of- thought that by sequencing the mouse, we would nipresent bioterrorism threat, more opportunities ficer Charles Cantor, “Having the draft sequence find all the important regions. Clearly that’s not will follow (see “Sequence Signatures and Home- has helped us do genetics far more efficiently. We true,”he says. land Security,” page 34). But just as wondrous is want to study just a few, really important, disease This is not to malign the mouse genome, the window this has opened onto evolution. “We genes ... Within the next six months, we will have which is (to some) as important as the human se- are starting to get a better picture about very early more than we can study.” quence.“Not only do we know the sequence of the events such as the origin of microbes,”Eisen says. Using wafer arrays that hold 12 billion mouse genome, we know the variations in se- oligonucleotide probes, Perlegen has produced a quence among strains,” says Joe Nadeau at Case Genome Glut haplotype map of 1.7 million SNPs, based on 25 Western Reserve University School of Medicine in As the Homo sapiens sequence moves into the individuals of diverse ethnic background. CEO Cleveland. Mouse strains can vary dramatically in “complete” column, it joins 100 genomes already David Cox’s strategy is to identify “the top list of their cognitive properties and susceptibility to ge- finished.“Enormous advances are being made fill- 100 to 200 regions in the genome involved with a netic and infectious diseases, such as anthrax. ing in the data, understanding where all the genes

42 Bio•IT World April 2003 www.bio-itworld.com GEORGE POSTE CEO, Health Technology Networks, and chairman, Orchid Biosciences

“In many instances, we are talking about not one, but a constellation or cassette of genes that are at the root of a disease.” are in these organisms, and going beyond that,” Nature 415, 180-183: 2002). Once again, this is genomewide bug. Gene-IT’s Biofacet software can says ’s Michael Snyder. A genera- just a prelude to similar studies in humans. compare so many sequences, “We haven’t found tion ago, researchers could study only a gene at a Using the yeast two-hybrid method, Hybrigen- an upper limit yet,” claims Richard Resnick, vice time: These days, a graduate student can study an ics is building libraries of validated protein-pro- president of services. Using the Dutch National entire genome. tein interactions, which can be viewed through Supercomputer TERAS (a 1,024-CPU SGI Origin Julie Ahringer’s team at the Wellcome maps called PIMRiders (PIM stands for Protein 3800), Biofacet took 520,000 CPU hours to per- Trust/Cancer Research Institute in Cambridge, Eng- Interaction Map). For example, the company re- form 70 million protein sequence alignments land, has systematically turned off more than 85 cently identified human proteins that interact across 82 organisms, searching for proteins that percent of genes in the nematode Caenorhabditis el- with HIV.“All the drugs available today are direct- are common in bacteria but not in humans, and egans using RNAi (Kamath, R.S. et al. Nature 421, ed against the viral protein,”explains CEO Donny hence might make good targets for antibiotics. 231-237: 2003). Nematodes grazed on a lawn of Strosberg. “But if you could target human pro- bacteria containing small interfering RNA (siRNA) teins that interact with the virus, you could side- Debugging Bioinformatics inhibitors designed against the nematode genome. step the problem of viral variability.” Despite the oft-cited “data deluge,” the ability of “This would have been inconceivable five years ago,” Suppliers are keeping up with the genomewide the best bioinformatics algorithms to predict se- marvels Cold Spring Harbor’s Lincoln Stein. The trend. Invitrogen is generating a human ORFeome, quences and structures leaves much to be desired, biggest surprise? “How many genes you can knock allowing customers to select ORFs over the Web. as ongoing efforts to determine the total number out without killing the worm,”Stein says. Applied Biosystems (ABI) has stockpiled about of genes attest (see “The Dark Side of Genomics”). In complementary research, Mark Vidal’s 120,000 SNP assays, while it continues polishing For example, it is standard practice to use DNA group at the Dana-Farber Cancer Institute is dili- Celera Genomics’ genome sequence data. “We’ve or protein sequence to predict a protein’s structure gently engineering DNA constructs for all the ne- had a big program to catalog the functional varia- and, possibly, function. But this remains an impre- matode genes, or open reading frames (ORFs). tion in the genome,” says Mark Adams, vice presi- cise science. “Once the similarity between two se- These clones purposely lack the flanking regulato- dent of informatics. quences drops below 30 percent, most of the ry DNA sequences that govern how and when the Software developers have also caught the proteins will change their function in sometimes gene is switched on, making them clean experi- mental tools. Vidal’s ambitious “ORFeome” pro- ject will confirm whether those ORFs predicted by The Dark Side of Genomics gene-hunting software are genuine, and settle the worm’s gene count. dentifying genes is one of the more glamorous genes” — those predicted by computer pro- At the Whitehead Institute, Richard Young has aspects of genomics, but it’s difficult picking grams but unverified by other techniques. developed “genomewide location analysis.” Iout the 1 percent to 2 percent of coding AnVil and Applied Biosystems (ABI) collaborat- Young’s group uses a combination of laboratory sequences amid all the nonsense and junk DNA. ed to shed light on the “dark” genome using PCR methods and informatics to understand how gene Most genes undergo alternative splicing, pro- (polymerase chain reaction) methods with Taq- expression is regulated by DNA-binding tran- ducing two or more different proteins. Then, Man, microarrays, and AnVil’s advanced analyt- scription factors, mapping the binding sites of there are pseudogenes — genes that look func- ics, to study about 10,000 putative genes. AnVil more than 100 of 141 such factors in baker’s yeast tional but aren’t. Some of these are “dead genes,” designed the analytical program that “deter- (Lee, T.I. et al. Science 298, 799-804: 2002). according to University of California at Santa Cruz mines whether the lab data are hard evidence bioinformatician David Haussler. “They once had that these are genes,” AnVil’s John McCarthy says. Young’s group is now turning its sights on the a function,” he says, “but they have accumulated Once “found again,” those genes will help final- 1,700 or so transcription factors in humans. So, enough mutations to slowly decay and become ize the actual number of genes. ABI reports that too, is Snyder:“This will tell you which genes these worthless.” Distinguishing functional genes from 2,400 to 2,500 genes were confirmed after factors regulate. The ultimate goal is to under- pseudogenes is not trivial. screening 9,500 predicted ones. Aside from the stand the regulatory circuitry.” Thanks in part to the mouse genome, com- inherent scientific interest, “The assays for those Protein interactions are popular targets for pleted last year (see Paper View, Feb. 2003 genes will make a nice addition to our offering,” drug development. “We now recognize that a lot Bio•IT World, page 46), the emerging consensus ABI’s Raymond Samaha says. of intercellular signaling is between proteins is that the total number of human genes is less Ultimately, identifying all genes will be a themselves, rather than just between proteins and than the initial 30,000 estimate. But uncertainty walk in the park compared to what lies ahead. small molecules,”says Walter Gilbert, Nobel laure- remains, and Haussler concedes that “computa- Eric Green of the National Human Genome ate at Harvard University. Last year, two indus- tional prediction of genes is simply very hard.” Research Center says: “We will find all the genes. try/academic consortia made impressive progress New programs such as TwinScan from Washing- But we aren’t even in diapers yet in terms of find- using mass spectrometry and other methods to ton University in St. Louis, and Genomix’s EXP6, ing all the other stuff — the functionally impor- map the yeast “interactome,”which involves prob- are assisting efforts to not only identify novel tant sequences and the regulatory elements.” ably 30,000 protein-protein interactions (Gavin genes but also confirm the existence of “dark —M. B. A-C. et al. Nature 415, 141-147: 2002; Ho, Y. et al.

www.bio-itworld.com April 2003 Bio•IT World 43 JANET THORNTON professor of biomolecular structure, University College, London

“It’s clear now that we have a relatively small set of protein folds. The complexity of life comes from how they go together and how they have evolved to perform different functions.” subtle or radical ways,”explains Janet Thornton of al (see www.magicad.org.uk). the European Bioinformatics Institute. Indeed, se- The explosion of genomewide data Functional Maps Genome, ORFeome or Proteome quences sharing 90 percent identity may have dif- from DNA microarray studies is strik- 1 2 3 ... n ferent functions. The leap from structure to ing (see “The Maturation of Microar- function is equally precarious. Thornton and col- rays,” page 46). But proteomics is Genotypes, Transcriptome Conditions leagues have documented a haphazard relation- catching up. “We’ve already seen excit- Standardized ship between the sequences, structures, and ing signs that proteomics can find bio- Phenome Phenotypes , 104, 333-339: 2001 functions of proteins containing one very com- markers, and this is a rich area of Proteome as CELL mon type of fold known as TIM barrels. research,” says Ruedi Aebersold of the Interactome Partner Since predictions about structure and function Institute for Systems Biology. Cell, Tissues, Localizome Development, Time are often filed alongside sequences in databases as Roland Eils, co-founder of bioinfor- Proteome as “annotation,”misinformation gets propagated. “We matics firm Phase-it (recently acquired Enzymome Substrates found a whole number of proteins that were anno- by Europroteome), concurs. At the Ger- With Various tated by virtue of their similarity to proteases, but man Cancer Research Center, Eils has Foldome Partner Pairs weren’t proteases,” says struc- been mining data generated from mouse ADAPTED WITH PERMISSION FROM VIDAL, M. Oodles of omics: Genomic data is interrelated in many tural biologist Gregory Petsko.“I have a terrible feel- breast tumors using Ciphergen’s Surface ways. Piecing together a single cellular effect requires multi- ing that the number of wrong annotations is huge.” Enhanced Laser/Desorption Ionization ple types of data. The difficulty of filtering critical data from (SELDI) system.“Mouse tumors are sim- background noise is affecting new fields such as ilar to human in terms of how they progress,” Eils R. Mark Adams of Variagenics (now part of Nu- metabonomics, the study of an organism’s says, “but lab mice are more homogeneous,” such velo) that predicts the effects of a given SNP on metabolites. “The metabonome for every organ- that differentially expressed proteins are more protein structure and function, which will help ism is different, and it changes with age,”says Jere- clearly defined. winnow the number of SNPs used in clinical tri- my Nicholson of Imperial College London, whose But these are baby steps. “The huge challenge als. A program called Multiprospector, written by group coined the term. (The metabolome, by con- ahead for proteomics and genomics has to do Jeffrey Skolnick (University of Buffalo), models with how we interpret, analyze, and interactions between proteins of undetermined store the data,”Aebersold says. “How do structure and “predicts the 3-D structure of the you interrelate the information from complex.” And EraGen Biosciences’ new “evolu- two types of experiments, such as mi- tionary proteomics tool” contains data from Gen- croarrays and proteomics?” Bank and other private databases clustered into Gene Myers, of the University of Cali- 150,000 protein families. The platform enables fornia at Berkeley, sees a more fundamen- several types of analysis, including multiple se- tal problem: “We are building bigger and quence alignment and evolutionary trees. bigger computers, but, frankly, we haven’t While the genome may never be 100 percent made the computer easier for biologists to complete, progress over the past two years is such use.”The software is often too difficult for that, “The sequence is now in much better shape,” “busy, senior molecular biologists” to says Richard Durbin of the Wellcome Trust learn, creating “a barrier to discovery.” Sanger Institute, “and we have better tools for Another impediment is data hoard- dealing with it.”But large tracts of repetitive DNA Family breakup: EraGen’s MasterCatalog clusters sequence ing. Many researchers argue that genom- remain “impossible to sequence with current databases into more evolutionary families than traditional ic data should be made public instantly, technologies,” says David Haussler of the Univer- methods, allowing better discrimination between related and unrelated proteins. even if this causes problems for scientists sity of California at Santa Cruz, host of the “Gold- who, naturally, seek first dibs on their en Path” genome portal. trast, is the full complement of metabolites in a own data. “People are scared of bioinformatics,” With the sequence now secure, there is a shift given cell.) There are 600 to 700 known major asserts Peter Weisner, a consultant and formerly of away from raw data generation to a more holistic metabolites, leading to vast numbers of potential LION Bioscience and Phase-it. “People would approach, called systems biology. Says Perlegen’s combinations and different sets in each cell. “The rather keep their data in their computers, because David Cox: “Everyone was told, ‘If you have the in- metabonome is so big, we may never know how they were afraid that good bioinformatics would formation, [drugs] will fall out like a gold nugget.’ big it really is,” Nicholson says. His group has de- find more information than they had.” They are still waiting for the clunk! Finding the ‘nee- veloped a metabonomics blood test that can non- But Steve Lincoln, of Invitrogen subsidiary In- dles’ in the data haystacks has proved hard. The new invasively diagnose coronary heart disease formax, says, “Bioinformatics isn’t magic; it’s just trend is to coalesce data into computer models of (Brindle, J.T. et al. Nature Medicine 8, 1439-1444: a tool like everything else.” Among the more known biological pathways and networks.” 2002). Patients are being recruited for a larger tri- promising tools is a new algorithm developed by (CONTINUED ON PAGE 46)

44 Bio•IT World April 2003 www.bio-itworld.com ALLEN ROSES senior vice president, GlaxoSmithKline

“I think a decade from now pharmacogenomic testing will be a requirement. Before you try something on the public, you’ll have to do genotyping.”

The Maturation of Microarrays

he advent of DNA chips in the late 1990s the Web: “A lot of statisticians like me Primary Tumors Metastases came, conveniently, just as genome- noticed and downloaded the data. I car- T sequencing projects were gaining ried those [results] around on my laptop momentum. At last, scientists could study the for a long time, trying to figure out expression or sequence of thousands of genes what was going on.” simultaneously. The breakthrough came when post- “Three years ago, people had the idea that doc Katie Kerr showed him a striking studying gene expression ... would somehow graph (see “Something fishy”). “With a open our eyes to disease pathways never seen data set like that, you need replication before,” says Anthony Brookes, of the Karolinska so you can sort out the signal from the Institute. But there were problems — in chip noise,” Churchill says. Kerr’s graph manufacture, in training, and particularly in showed that the two fluorescent dyes data analysis. “This turned part of biology into a typically used in microarray experi- data-rich area,” says Terry Speed, at the Universi- ments had different intensities, and that ty of California at Berkeley. “Where before you was skewing the data plot. “A lot of had a notebook to write your results in, now you excellent statisticians have migrated to have megabytes in the computer.” the field,” Goodman says, “but Churchill and Speed were among the first, and 0.4 pushed the hardest.” Both scientists 0.3 maintain useful Web sites (see

0.2 www.jax.org/staff/churchill/labsite/ index.html and stat-www.berkeley.edu/ 0.1 users/terry/zarray/html). 0 Results from commercial and home- Residual

–0.1 made DNA chips have improved dramati- Metastases Genes Primary Genes Tumor cally with experience. “The goal was to 33. © NATURE PUBLISHING GROUP. USED WITH PERMISSION. –0.2 get the noise from the chip to become –0.3 irrelevant,” says Affymetrix President NATURE GENETICS

–0.4 Steve Fodor. The payoff is coming, 6 6.5 7 7.5 8 8.5 9 9.5 10 10.5 11 although in a slightly different way than Fitted Value

COURTESY KATHLEEN KERR, UNIVERSITY OF WASHINGTON, SEATTLE anticipated. “Rather than a research tool LOW Normalized Expression HIGH

Something fishy: The fishtail pattern, that gets you to the primary cause [of a RAMASWAMY, S. ET AL. elicited by advanced statistical analysis, disease], it’s another type of phenotype,” Spot the difference: Expression of 128 genes alerted researchers to how the two dyes analyzed in 64 primary and 12 metastatic cancers Brookes says. In other words, chips reveal behave differently — just one of many reveals a subset of 64 genes overexpressed in key error promulgators lurking in micro- important differences, but not necessari- metastases (red, bottom right) and some primary array data. ly the reasons for those differences. tumors (where the arrow points). The past two years have witnessed a Biologists weren’t used to that, and it showed. stream of reports dissecting gene expression et al. Nature Genet. 33, 49-54: 2003). In the early days, says Nat Goodman of the Insti- “signatures” in cancer. At the Whitehead Insti- Further enhancements are on the horizon. “If tute for Systems Biology, researchers declared tute Center for Genome Research, Sridhar protein chips could really work, they could be gene expression as “significantly different if you Ramaswamy, Todd Golub, and colleagues very valuable,” says Scott Patterson, who just saw a twofold or threefold change, without any described a signature that predicts whether joined Farmal Biomedicine from Celera attention to replication or statistical criteria.” tumors are likely to metastasize. Genomics. “The real future lies in the integra- Gary Churchill, a staff scientist at the Jack- This study used data from a variety of DNA tion of data from many different sources — son Laboratory, chanced upon one of Stanford chip platforms, and several tumor types, show- genotypes, proteins, metabolites, and arrays,” University microarray pioneer Pat Brown’s ing how much more robust the data and the Churchill predicts. “It is really essential to find groundbreaking Science papers while surfing analytical tools have become (Ramaswamy, S. new ways to tie them all together.” —M. B.

46 Bio•IT World April 2003 www.bio-itworld.com LEROY HOOD founder, the Institute for Systems Biology

“The genome is the beginning of this wonderful new adventure into systems biology and towards revolutionizing medicine.”

(CONTINUED FROM PAGE 44) drug fared better than others. In a “We need computational techniques comparison of patients with differ- that will not only help us decipher ent responses, Paris revealed physi- genomes, but that can integrate the ologically important sets of genes many different levels of information that are turned off or on. Most im- coming out of the genome,” says Leroy portantly, the proteasome complex Hood, founder of the Institute for Sys- — Velcade’s cellular site of action tems Biology. He cites Eric Davidson’s — appeared to be working at dif- studies of sea urchin development, ferent levels in the patients. which “have transformed our under- In another approach to answer- standing of gene regulatory networks” ing this question, Millennium is (see “Major networks” and Davidson, evaluating a subset of 30 genes that E.H. et al. Science 295, 1669-1678: 2002). may indicateof response.“The idea Whitehead Institute Fellow Trey that there might be a genetic com- Ideker is building a systems model ponent to response [to Velcade] called Cytoscape. “We are starting to evolved with the understanding of get a glimpse of how genes and pro- the genome and microarray tech- teins interact with drugs and hor- nology,” says the drug’s developer, mones to dictate function,”Ideker says. Millennium’s Julian Adams. “That Simpler organisms — notably, yeast — was not in my consciousness when , 421. © NATURE PUBLISHING GROUP; USED WITH PERMISSION are the proving ground where Ideker this was begun.” and his ilk test their programs, plug- NATURE ging experimental data into the model Beyond the Genome to see if it can account for the observa- Whether systems biology lives up tions. The current success rate is about HOOD, L. & GALAS, D. to its promise, everyone wants to Major networks: a) Transcription factors control a cascade of gene regulation 40 percent. The big payoff, of course, (arrows, activation; ⊥ , inhibition. b) The regulatory region of gene endo16 know when the fruits of the will be if they can do this for multiple enlarged, showing 34 DNA-binding sites in six clusters. genome will be translated into data types and in humans.“Interaction new and better medicines. Signs data is the hot area of research now,”he says. tions of 500 genes and proteins. Bernhard Palsson of progress are subtle, but growing.“Everything in Human data are already being analyzed in and colleagues at the University of California at our pipeline is genomics-based,” says Human some places. Gene Networks Sciences has built a San Diego, meanwhile, have produced a model of Genome Sciences CEO Bill Haseltine. He means detailed model of a cancer cell, representing the ac- red blood cell metabolism. Analyzing sequence genomics in the new sense — woven in with the data from patients with hemolytic anemia, the rest of drug discovery and development. Other model accurately predicted whether particular companies hope to follow suit, from Celera to big SNPs were linked to severe or mild forms of this pharmaceuticals. GlaxoSmithKline still has a disorder. Palsson, a co-founder of Genomatica, heart-disease drug in Phase II trials that sprang cautions that “other cells are 20 to 50 times more from Smith-KlineBeecham’s groundbreaking complicated” than red blood cells. 1993 deal with Human Genome Sciences — one Proteasome The turning point for these models will be of the few genomics-derived drugs that hasn’t “when they advance from rediscovering pathways been dropped in early trials.

NFκB — finding what we already know — to making Even if progress is too incremental for Wall novel predictions we can then prove,” says Andrea Street, the future of genome-based science and Califano of First Genetic Trust. medicine is wondrous. “Clearly, we have not Millennium Pharmaceuticals has obtained hints peaked in appreciating the true value of the of such results from “Paris,” its pathways analysis genome projects,”says Tom Cech, president of the

TNF Path platform that includes literature, data analysis, and Howard Hughes Medical Institute. pathway visualization tools. Millennium scientist Fifty years from now, we may be looking back COURTESY MILLENNIUM PHARMACEUTICALS Paris match: Millennium scientists compared the George Mulligan recently used Paris to study pa- nostalgically at the genome revolution, just as signature of a nonresponder patient with that of tient responses to the cancer drug Velcade. Mulligan we have celebrated the 50th anniversary of the a group of good responders, revealing possible sifted through “an avalanche of data” — microarray double helix. Whatever breakthroughs lie ahead, molecular mechanisms affecting response, such as the proteasome complex, NFκB, and the results on tens of thousands of genes from about 50 they will owe a profound debt to this pair of his- TNF pathway. patients — to understand why some patients on the toric feats. ●

50 Bio•IT World April 2003 www.bio-itworld.com