NATURALISM IN THE COMPUTER AGE
A Field Guide to Computational Biology

Charles Darwin would never have predicted that his disciplinary descendants – biologists driven by a passion for exploration and observation of the natural world – would do their most productive work in an office. But a rapidly growing army of modern-day naturalists focuses on understanding the complex details of the biological world through an exploration instrument highly divergent from Darwin’s Beagle – the desktop computer.

Computers have revolutionized the way we meet, carry out business, tell jokes, share photographs, and pay our bills. It is no surprise that they have also radically changed the landscape of scientific research. Though not all experimentalists have packed up their benches to make room for rows of processors, a considerable number of scientists are relying on computers as research equipment. Burkhard Rost, a bioinformaticist at Columbia University, predicts that computational tools are so powerful that “no experimental lab will live without [them] by the end of this decade.”

The overarching field encompassing the application of computers to answer biological problems has been dubbed computational biology, although, as with all emerging disciplines, there is considerable murkiness surrounding the label. You would be hard pressed to find a 21st century scientific laboratory that does not rely heavily on computers. But searching PubMed online for the latest research paper or running a microscope from a desktop machine does not a computational biologist make. Computational biology, along with its close relative, bioinformatics, is a science whose practitioners use computerized theoretical models as their primary research tool.

The Computational Biology and Bioinformatics Discussion Group, one of a diverse set of groups that participates in the Academy’s Frontiers of Science program, never fails to intrigue its members, not least because of its unpredictability. The questions computational biologists seek to answer are as broad as the problems that constitute all of biology. The common link is not the nature of the questions, but the approach to answering them. Still, a handful of problems have emerged as the centerpieces of computational biology, and it is worth noting that each deals with a complex system and an enormous amount of data. For a brief tour of the state-of-the-science and a small taste of the past year’s talks, keep reading.

MINING THE TINIEST GEMS

When asked to name the key advantages of using computational approaches to answer biological questions, Cold Spring Harbor Laboratory molecular biologist Michael Zhang does not hesitate: “Speed, economy, and necessity.” Such an incisive response resonates with researchers working on one of today’s biggest biological problems: understanding human health and treating disease.

The past few years have brought great leaps forward in this endeavor. Most strikingly, the completion of the human genome project (which itself relied on serious computational prowess) has brought scientists a vast amount of information that holds the very secrets of humanity. Bioinformaticists – who specialize in the mathematical analysis of large data sets – are working feverishly to mine this data to discover the inestimable gems it holds. Although we have collected the information we need to understand life, it is encoded in layers of complexity – a triumvirate of sequences (DNA, RNA, and protein) stores the instructions for the molecules that regulate life processes.

One method of mining this information is to create sequence libraries – raw DNA from cancerous tumors, RNA coding regions, amino acid recipes for proteins – and enlist computers to chug through the vast amounts of data they hold to find patterns, repetitions, or other interesting anomalies.

Put simply, sequence libraries are enormous databases filled with the codes that underlie nature. Researchers who employ brute force approaches to crack them will likely fall short of a breakthrough. Making sense of this vast data requires creative, intelligent analysis, and, more often than not, the development of clever mathematical algorithms. Laxmi Parida, of the IBM T.J. Watson Research Center, developed a sophisticated mathematical technique for identifying important patterns in protein sequences. “Once biology went molecular it paved the way for computational approaches,” she noted,

18 Update November/December 2005 www.nyas.org

alluding to the sequences of biological molecules that can easily be converted into digital strings of 1s and 0s for computers to read. Her statistical method seeks to demystify the large data set of protein sequences, which determine the ultimate structures and functions of the molecules responsible for the diversity of life.

Life’s diversity is what inspires MIT’s Christopher Burge, who uses messenger RNA libraries to understand the elusive rules of gene expression. To shed light on how a small set of genes is carefully regulated to achieve a wide range of purposes, Burge’s lab set out to create a library of exonic splicing silencers – short strings of RNA that turn off a particular gene. Such a library provides scientists with a reference for understanding the vitally important RNA splicing code, which edits and translates DNA instructions into protein products. Powerful biological insight, like the peek at nature that Burge is privileged to view, has propelled scores of mathematicians and computer scientists to move into the world of biology, creating an explosion in the field of bioinformatics in recent years.

ILLUSTRATION: FRED OTNES – IMAGES.COM

MODEL MOLECULE

One of the classical rules of scientific research is that if you can replicate a phenomenon, then you truly understand it. Car mechanics well know that there is no better way to grasp the inner workings of an engine than to take one apart and then rebuild it. The most important machines in the world are proteins and nucleic acids, the molecules that direct life. Even though they are unimaginably small, they are orders of magnitude more intricate than any car engine. The size and complexity of objects worth understanding in biology render most of them impossible to comprehend through experimental tools.

This is where computers come in handy. With a few strokes on the keyboard – after some sophisticated programming – computational biologists can create models of interesting systems and simulate their activity. A movie of a protein folding into its native state can provide an incredible degree of insight into the function of a real protein, which can lead to the design of therapeutic drugs. Comments Tamar Schlick, an applied mathematician who directs the computational biology doctoral program at New York University: “Modeling and simulation allow us to experiment with situations that are unfeasible, too expensive, or too dangerous to explore in the wet laboratory.”

Molecular modeling is one of the cornerstones of computational biology, evidenced by the number of scientists who speak on this topic to the discussion group. Wilma Olson of Rutgers University uses computational modeling to understand the physical and chemical properties of the nucleosome, a deformed region of the chromosomal scaffold. A theoretical model of the electrostatic interactions in chromosomes can be worked out with a pencil and paper: DNA’s phosphate backbone is negatively charged and the amino acid side groups of histones (the nucleosome’s protein components) are positive. Everybody knows that opposites attract, but try scaling up to take into account hundreds of thousands of atomic interactions and you will quickly run out of paper and patience. Utilizing computing power to get to the bottom of the nucleosome’s architecture, Olson discovered that charge neutrality was not, as expected, the prevailing force dominating the deformation. In fact, the actual sequence of DNA seems to be encoded in order to best twist

into the stable nucleosome structure.

In his presentation at the Academy, Princeton University’s Ned Wingreen described a minimalist model he created to mimic protein folding. Where Olson focused on charge and the electrostatic forces inherent to nature’s building blocks, Wingreen took into account only the hydrophobic effect, looking at how different protein residues interact with water. His goal was to determine the designability of protein structures, and his substantial approximations were justified when he compared his theoretical results to the structures of known proteins.

Wingreen’s reductionism points to one of the key challenges in modeling: the three-way trade-off among the competing computational currencies of time, system size, and accuracy. Stating reasonable assumptions about the natural world that are accurate enough for your system of interest without being too computationally expensive is a central source of tension in computational biology. Just as Newton’s simple laws of motion can pinpoint the arrival time of a transatlantic flight by considering the speed, distance, and frictional forces such as air resistance, quantum mechanics provides us with a set of laws that can accurately predict the movement of molecules in the natural world. Unfortunately, the math that describes the interactions between particles is so intensive that it is not practical to simulate a biological system in all its glorious detail, unless you have several lifetimes to spare. Assumptions must be made, and determining what approximations are appropriate for answering a specific question about a particular system is sometimes more of an art than a science.

[Figure: Molecular modeling provides a view of molecules that are too tiny to see, even with a microscope. This protein-DNA complex (DNA polymerase beta with primer/template DNA) was simulated by Tamar Schlick’s computational biology group at NYU (by Ravi Radhakrishnan and Karunesh Arora) to understand at the atomic level the mechanisms that regulate fidelity, or the faithful duplication and repair of DNA.]

Computational modeling is artistic in other ways, as anyone who has seen a molecular movie will affirm. Such colorful renderings provide a visual image of chemical events under experimental scrutiny as well as a mathematical model to explain confounding results. Steven Schwartz, a theoretical chemist at the Albert Einstein College of Medicine, points out that computation does not only bolster experiment, but can also surpass discoveries at the bench. “There are some questions that just cannot be answered with experiment – the enzyme mechanism question is one where computation has brought basic new ideas to the field.”

Likewise, predicting protein structures, a task that often daunts experimentalists, presents a fertile opportunity for computational biologists interested in fighting diseases like cancer. Roland Dunbrack of the Fox Chase Cancer Center tries to predict the unknown structures of proteins through a technique known as comparative modeling. Dunbrack pilfers theoretical methods from physics, computer science, mathematics, robotics, and quantum field theory, and cleverly applies them to better understand the molecular basis of cancer.

PLAYING BATTLESHIP IN THE GENE AGE

One of the unparalleled successes of computational biology has been the development of the gene chip or microarray. Modeled on a silicon chip, this marvelous device separates individual genes or gene regions into the thousands of spots on a tiny grid that looks something like a Battleship gameboard. As researchers deposit DNA or RNA onto the chip, components line up specifically at the holes where similar DNA or RNA is located. This enables scientists to assess all the genome’s DNA or all the RNA transcripts simultaneously, providing a genome-wide picture of gene expression.

[Figure: This microarray contains 6,000 spots, each of which holds a small gene region of the parasite Plasmodium falciparum, the cause of the most deadly form of human malaria. Joseph DeRisi at UCSF analyzes the lit regions on the chip to understand how genes are regulated.]

A microarray view of an individual’s DNA can help decode which genes give rise to cancer or other diseases. Bioinformatics laboratories like that of Olga Troyanskaya at Princeton University use microarrays to identify mutations in tumor DNA. Specifically, Troyanskaya searches for copy-number changes – naturally arising amplifications or deletions of genes – which may cause disease.

Microarray data is analyzed through highly sophisticated mathematical algorithms, and many bioinformaticists are mathematicians and computer scientists who find their way into the messy world of biology. Columbia University’s Harmen Bussemaker, a one-time theoretical physicist, studies gene expression through RNA microarrays. At the Academy a graduate student in his lab, Barrett Foat, highlighted the group’s reverse engineering approach to identifying the noncoding RNA sequences that control genes, turning them on or off.

Since its development in the mid-’90s, the microarray has become the workhorse of bioinformatics research. It has also set a standard for data-driven computational biology, forcing scientists in more traditional fields to perk up and discover the power of computation. Chris Wiggins, a theoretical physicist-turned-computational biologist at Columbia, pointed to the transformative effect that microarray technology has had on the field: “I think microarrays have shown how well biology can be revealed through data, and have convinced biologists that they benefit from talking to numerically minded people. Now it’s time to take a ‘high throughput’ approach to the rest of biology – microscopy and other technologies should be amenable to these same data-driven approaches.”

COMPUTING COST AND BENEFIT

Microarrays, simulations, and sequence libraries have irreversibly altered biological research with stunning findings about nature and health. Often the most exciting discoveries in computational biology are not concrete results, but novel methods and algorithms whose power lies in their versatility to be applied to a variety of problems. As a result, newcomers to the field may find it dauntingly technical, requiring a higher level of mathematical literacy than many other biological disciplines.

NYU’s Tamar Schlick admits, “It’s much easier to be a mathematician, chemist, or computer scientist, than a computational biologist who must employ all these disciplines in biological research.” Yet the field is exciting enough to attract individuals with backgrounds in each of these distinct disciplines willing to learn the languages and cultures of all of the other research specialties. Chris Wiggins highlights the collaborative interdisciplinary appeal of computational biology: “Because we are all coming from different backgrounds, all of us not only speak multiple [scientific] languages, but we know how to communicate in a language that doesn’t presuppose that somebody else already knows ... [a] particular piece of technical jargon.” At the Academy, biologists from more traditional groups have been opening dialogs with computationalists, knowing that computers will no doubt foster the future of discovery.

The emergence of computational biology as a powerful interdisciplinary research area has vastly altered the biological sciences, opening new routes for the exploration of the natural world. As the field continues to evolve and more researchers learn computational techniques, scientists will be reminded that at the core, computational biologists are just naturalists for the 21st century.

–Kiryn Haslinger

2005 Highlights from the NYAS Computational Biology & Bioinformatics Discussion Group

For complete summaries and access to the talks, go to www.nyas.org/compbio

May 11, 2005
Sequence Signals for RNA Processing
- Searching for the right words: Computational approaches to understanding RNA stability regulation, Barrett Foat, Columbia University
- Genetic determinism and the central dogma: The path to an RNA splicing code, Christopher Burge, MIT

March 9, 2005
Seeing : Computational methods provide microscopic visions into biological systems
- The twists and turns of lipid bilayers, Richard Pastor, US Food and Drug Administration
- Lipid-Protein interactions viewed through an MD lens, Alan Grossfield, IBM T.J. Watson Research Center
- Antifreeze proteins and the angular structure of water, Kim Sharp, University of Pennsylvania

February 17, 2005
Base Camp: Developing strategies for predicting microRNA targets
- Worlds apart: microRNAs and their targets in higher plants and animals, David Bartel, MIT
- Looking for teamwork: predicting targets for microRNA cooperativity, Nikolaus Rajewsky, NYU
- Learning from validated microRNA/messenger RNA targets, Frank Slack, Yale University

January 12, 2005
Reaching for Biology’s Holy Grail: Novel methods for understanding protein sequence alignment
- Identifying co-conserved patterns in multiple-sequence alignments, Andrew Neuwald, Cold Spring Harbor Laboratory
- Upgrading the toolbox for protein structure prediction, Roland L. Dunbrack, Fox Chase Cancer Center

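For readers curious what it looks like when a computer is enlisted to “chug through” a sequence library in search of repetitions, as described under MINING THE TINIEST GEMS, here is a minimal sketch in Python. It simply counts every short substring of a fixed length (a “k-mer”) across a handful of sequences and reports the most frequent ones. This is a toy illustration only, not the technique of any researcher named in the article; the function names and the three-sequence library are invented for the example, and real pattern-discovery methods are far more statistically careful.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one position at a time.
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def most_common_kmers(sequences, k, top=3):
    """Return the `top` most frequent k-mers as (k-mer, count) pairs."""
    return count_kmers(sequences, k).most_common(top)


if __name__ == "__main__":
    # Hypothetical toy "library": three short DNA strings sharing one motif.
    library = ["ATGCGATATAGC", "GGATATACC", "TTGATATAAG"]
    for kmer, count in most_common_kmers(library, k=5, top=2):
        print(kmer, count)
```

In this toy library the 5-mers GATAT and ATATA each occur three times, once per sequence, while every other 5-mer occurs once. The exhaustive sliding window is exactly the kind of brute force the article warns will not scale on its own; it is shown here only because it makes the idea of "finding repetitions" concrete.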