A Field Guide to Computational Biology
NATURALISM IN THE COMPUTER AGE

Update, November/December 2005, www.nyas.org

Charles Darwin would never have predicted that his disciplinary descendants – biologists driven by a passion for exploration and observation of the natural world – would do their most productive work in an office. But a rapidly growing army of modern-day naturalists focuses on understanding the complex details of the biological world through an exploration instrument highly divergent from Darwin’s Beagle – the desktop computer.

Computers have revolutionized the way we meet, carry out business, tell jokes, share photographs, and pay our bills. It is no surprise that they have also radically changed the landscape of scientific research. Though not all experimentalists have packed up their benches to make room for rows of processors, a considerable number of scientists are relying on computers as research equipment. Burkhard Rost, a bioinformaticist at Columbia University, predicts that computational tools are so powerful that “no experimental lab will live without [them] by the end of this decade.”

The overarching field encompassing the application of computers to answer biological problems has been dubbed computational biology, although, as with all emerging disciplines, there is considerable murkiness surrounding the label. You would be hard-pressed to find a 21st-century scientific laboratory that does not rely heavily on computers. But searching PubMed online for the latest research paper or running a microscope from a desktop machine does not a computational biologist make. Computational biology, along with its close relative, bioinformatics, is a science whose practitioners use computerized theoretical models as their primary research tool.

The Computational Biology and Bioinformatics Discussion Group, one of a diverse set of groups that participates in the Academy’s Frontiers of Science program, never fails to intrigue its members, not least because of its unpredictability. The questions computational biologists seek to answer are as broad as the problems that constitute all of biology. The common link is not the nature of the questions, but the approach to answering them. Still, a handful of problems have emerged as the centerpieces of computational biology, and it is worth noting that each deals with a complex system and an enormous amount of data. For a brief tour of the state-of-the-science and a small taste of the past year’s talks, keep reading.

MINING THE TINIEST GEMS

When asked to name the key advantages of using computational approaches to answer biological questions, Cold Spring Harbor Laboratory molecular biologist Michael Zhang does not hesitate: “Speed, economy, and necessity.” Such an incisive response resonates with researchers working on one of today’s biggest biological problems: understanding human health and treating disease. The past few years have brought great leaps forward in this endeavor. Most strikingly, the completion of the Human Genome Project (which itself relied on serious computational prowess) has brought scientists a vast amount of information that holds the very secrets of humanity. Bioinformaticists – who specialize in the mathematical analysis of large data sets – are working feverishly to mine this data to discover the inestimable gems it holds.

Although we have collected the information we need to understand life, it is encoded in layers of complexity – a triumvirate of sequences (DNA, RNA, and protein) stores the instructions for the molecules that regulate life processes. One method of mining this information is to create sequence libraries – raw DNA from cancerous tumors, RNA coding regions, amino acid recipes for proteins – and enlist computers to chug through the vast amounts of data they hold to find patterns, repetitions, or other interesting anomalies.

Put simply, sequence libraries are enormous databases filled with the codes that underlie nature. Researchers who employ brute force approaches to crack them will likely fall short of a breakthrough. Analyzing this vast data requires creative, intelligent analysis, and, more often than not, the development of clever mathematical algorithms. Laxmi Parida, of the IBM T.J. Watson Research Center, developed a sophisticated mathematical technique for identifying important patterns in protein sequences. “Once biology went molecular it paved the way for computational approaches,” she noted, alluding to the sequences of biological molecules that can easily be converted into digital strings of 1s and 0s for computers to read. Her statistical method seeks to demystify the large data set of protein sequences, which determines the ultimate structures and functions of the molecules responsible for the diversity of life.

Life’s diversity is what inspires MIT’s Christopher Burge, who uses messenger RNA libraries to understand the elusive rules of gene expression. To shed light on how a small set of genes is carefully regulated to achieve a wide range of purposes, Burge’s lab set out to create a library of exonic splicing silencers – short strings of RNA that turn off a particular gene. Such a library provides scientists with a reference for understanding the vitally important RNA splicing code, which edits and translates DNA instructions into protein products. Powerful biological insight, like the peek at nature that Burge is privileged to view, has propelled scores of mathematicians and computer scientists to move into the world of biology, creating an explosion in the field of bioinformatics in recent years.

Illustration: Fred Otnes – Images.com

MODEL MOLECULE

One of the classical rules of scientific research is that if you can replicate a phenomenon, then you truly understand it. Car mechanics well know that there is no better way to grasp the inner workings of an engine than to take one apart and then rebuild it. The most important machines in the world are proteins and nucleic acids, the molecules that direct life. Even though they are unimaginably small, they are orders of magnitude more intricate than any car engine. The size and complexity of objects worth understanding in biology render most of them impossible to comprehend through experimental tools.

This is where computers come in handy. With a few strokes on the keyboard – after some sophisticated programming – computational biologists can create models of interesting systems and simulate their activity. A movie of a protein folding into its native state can provide an incredible degree of insight into the function of a real protein, which can lead to the design of therapeutic drugs. Comments Tamar Schlick, an applied mathematician who directs the computational biology doctoral program at New York University: “Modeling and simulation allow us to experiment with situations that are unfeasible, too expensive, or too dangerous to explore in the wet laboratory.”

Molecular modeling is one of the cornerstones of computational biology, as evidenced by the number of scientists who speak on this topic to the discussion group. Wilma Olson of Rutgers University uses computational modeling to understand the physical and chemical properties of the nucleosome, a deformed region of the chromosomal scaffold. A theoretical model of the electrostatic interactions in chromosomes can be worked out with a pencil and paper: DNA’s phosphate backbone is negatively charged and the amino acid side groups of histones (the nucleosome’s protein components) are positive. Everybody knows that opposites attract, but try scaling up to take into account hundreds of thousands of atomic interactions and you will quickly run out of paper and patience. Using computing power to get to the bottom of the nucleosome’s architecture, Olson discovered that charge neutrality was not, as expected, the prevailing force dominating the deformation. In fact, the actual sequence of DNA seems to be encoded in order to best twist into the stable nucleosome structure.

[...] quantum mechanics provides us with a set of laws that can accurately predict the movement of molecules in the natural world. Unfortunately, the math that describes the interactions between particles is so intensive that it is not practical to simulate a biological system in all its glorious detail, unless you have several lifetimes to spare. Assumptions must be made, and determining what approximations are appropriate for answering a specific question about a [...]

[Figure caption: Molecular modeling provides a view of molecules that are too tiny to see, even with a microscope. This protein-DNA complex (DNA polymerase beta with primer/template DNA) was simulated by Tamar Schlick’s computational biology group at NYU (by Ravi Radhakrishnan and Karunesh Arora) to understand at the atomic level the mechanisms that regulate fidelity, or the faithful duplication and repair of DNA.]

[...] interested in fighting diseases like cancer. Ronald Dunbrack of the Fox Chase Cancer Center tries to predict the unknown structures of proteins through a technique known as comparative modeling. Dunbrack pilfers theoretical methods from physics, computer science, mathematics, robotics, and quantum field theory, and cleverly applies them to better understand the molecular basis of cancer.

PLAYING BATTLESHIP IN THE GENE AGE

One of the unparalleled successes of computational biology has been the development of the gene chip or microarray. Modeled on a silicon chip, this marvelous device separates individual genes or gene regions into the thousands of spots on a tiny grid that looks something like a Battleship gameboard. As researchers deposit DNA or RNA onto the chip, components line up specifically at the holes where similar DNA or RNA is located. This enables scientists to assess all the genome’s DNA or all the RNA [...]