Profile of David Haussler
This is a Profile of a recently elected member of the National Academy of Sciences to accompany the member’s Inaugural Article on pages 14254–14261 of volume 105.
© 2008 by The National Academy of Sciences of the USA. PNAS | September 23, 2008 | vol. 105 | no. 38 | 14251–14253. www.pnas.org/cgi/doi/10.1073/pnas.0808284105

Sequencing the human genome was the final grand scientific achievement of the 20th century. One scientist who played a major part in organizing and analyzing the three billion base pairs of DNA that make up our genome is David Haussler, who was elected to the National Academy of Sciences in 2006. His Inaugural Article, on mathematically modeling the evolution of genomes, is published in this issue (1).

Haussler is currently a professor of biomolecular engineering at the University of California, Santa Cruz (UCSC), and a Howard Hughes Medical Institute investigator, but his career took a slight detour from his southern California roots to scientific prominence. He began his career in the humanities before settling in mathematics, but a keen ability to mix his interests in biology and genomics has led to his consistently applying math to biology.

A Restive Youth

[Photograph: David Haussler]

Haussler grew up in Los Angeles, where his father, an engineer, encouraged David’s and his brother Mark’s interest in science. David, however, did not follow a straight path to science. During high school he was more interested in art and psychology and, after graduating, enrolled in the Academy of Art in San Francisco, in 1971, where he studied painting for three months. He soon transferred to tiny, offbeat Immaculate Heart College (IHC) in Hollywood, where he studied gestalt therapy in the hope of becoming a practicing psychologist.

During this restless time, his brother helped him find his calling. After two years at IHC, David left in 1973 to major in mathematics at Connecticut College (New London, CT).

“I think the turning point came when I went to work in my brother’s lab. He’s 12 years older and was a biochemist at the University of Arizona,” Haussler recalls. “He said, ‘You want a summer job? Come to my lab and I’ll teach you how to do science.’ He gave me Lehninger’s book on biochemistry and said, ‘Read this first.’ I read the text and worked in his lab and it was a dream summer.”

“By the end of it we had measured the levels of the hormonal form of vitamin D in the human bloodstream for the first time, and we published a paper in Science, my first publication,” Haussler recalls. “Although my job was the lab work, I also ended up doing a key step in the analysis for the paper, because I was the math major. It was really a foundational experience for me and a harbinger of things to come” (2).

After completing his bachelor’s degree in 1975, Haussler received a master’s degree in applied mathematics in 1979 from California Polytechnic State University at San Luis Obispo. He then moved to the University of Colorado (Boulder, CO), where he obtained his Ph.D. in computer science in 1982. In 2005, he won the Classic Paper Award from the American Association of Artificial Intelligence (AAAI) for an earlier manuscript on learning algorithms (3). He has also won the Dickson Prize in science from Carnegie Mellon University (Pittsburgh, PA) and the Association for Computing Machinery/AAAI Allen Newell Award, and he is a Fellow of the California Academy of Sciences, the American Academy of Arts and Sciences, and the American Association for the Advancement of Science. He is 54 and married, with two children in college.

Haussler’s doctoral thesis was in pure math, reporting his study of formal language theory and the theory of computation, including Turing machines. It may seem far removed from the human genome, but Haussler explains that this abstract world of machine language describes how anything that is computable can be computed, using very simple rules, on strings written over finite alphabets, “so it could be DNA in one incarnation, and something quite different in another” (4).

Sequencing Pioneers

While at Boulder in the early 1980s, Haussler befriended fellow grad students Gene Myers and Gary Stormo, and the three of them worked toward sequencing DNA and making sense of genomes. They met in a seminar led by Haussler’s supervisor, Andrzej Ehrenfeucht.

Since their student days, the three have each had an enormous impact on genomics: Stormo developed fundamental methods of recognizing sequence motifs and other patterns in genomic data, Myers led the bioinformatics team at Celera, the private company that sequenced the human genome, and Haussler’s group provided bioinformatics for the public, government-funded Human Genome Project.

“In the 1980s, we were analyzing small snippets of the E. coli genome that were available and genomes of bacteriophages like phi-X 174. These represented the first products of the early sequencing efforts, driven by the introduction of recombinant DNA methodologies,” Haussler says. “We didn’t have much data to work with back then.” But that didn’t stop them from preparing the techniques that could be used to analyze the large quantities of DNA to come (5).

Later that decade, Haussler moved among computer science, artificial intelligence, and statistics. “I was interested in how brain-like algorithms could be built and what their limitations and strengths were,” he says. “I wanted to know what was theoretically learnable in a very general sense.”

Even while working in mathematics, Haussler kept his eye on genomics, a contemporary field that only began taking shape during the late 1970s. When more gene sequence data became available in the 1990s, he got back into the field, developing statistical models and algorithms that were later used in major genome projects (6).

“It was a long way from bacteriophage genomes to our first whole animal genome,” Haussler says. “Of course, the ultimate project was the human genome. We were recruited into the project because they wanted experts to find the genes in the DNA. We had developed a methodology for this using hidden Markov models.” He officially joined the public Human Genome Project in 1999.

“When we got there, the public project had just tiny snippets of DNA scattered all over the genome in GenBank files without any cohesive map or assembly to pull them together,” Haussler recalls. There were genetic maps of the genome measured in centimorgans, radiation hybrid maps, and physical maps made from restriction enzyme digest data obtained from the thousands of approximately 150,000-base pair artificial chromosomes that the project was sequencing. He remembers that the project’s original plans for assembling the draft genome data were not working and had to be reinvented.

“These maps were mutually inconsistent in places, the data were noisy, and trying to overlay all the sequence data (both genomic DNA snippets and cDNA sequences made from mRNAs) was just a huge jigsaw puzzle,” he says. “We couldn’t even start to find the genes until we had built long stretches of continuous DNA.”

[Photograph: David Haussler’s wet lab]

“Jim Kent, of my group, stepped in at the last minute and saved the day by writing an amazing assembly program that we call GigAssembler,” Haussler says (7). The program works at a scale of billions of bases of information, taking information from 13 sources, including the different maps and information on RNA transcripts.

Haussler says that without Kent, who wrote 20,000 lines of code in just a few months, the public project would not have caught up with Gene Myers’ team at Celera, which was well funded and boasted plenty of computational power.

“We ended up doing our first assembly on 100 desktop machines that the UCSC chancellor and dean of engineering hastily purchased for us. [...]

[...] You could download the human genome, for free, without restriction, from Santa Cruz that day. This was humanity’s first real glimpse at its own recipe.”

Since then, Haussler has been assembling and scanning other genomes to find the recipes of different species of living animals, and using the results to determine the difference between the DNA of various organisms and modern humans. His team has been a part of deciphering the mouse, chimpanzee, macaque, fruit fly, chicken, and rat genomes. All data from those projects [...]

[...] have put on more powerful search and interactive capabilities for access to an increasing variety of high-throughput genomic data” (8).

Along the way, the researchers have found ultra-conserved regions of the human genome that have remained unchanged for hundreds of millions of years, along with an RNA gene, expressed in early development of the brain’s neocortex, that has changed dramatically only recently and could play a role in human uniqueness (9–11).

Haussler’s current research is focused on the evolution of complete genomes. He wants to take the trees of life that have been created by comparisons of single genes and do the same with entire genomes, studying the changes that made species what they are, or were (12).

“The only hope for understanding the molecular evolution of life is to understand [...]

DNA Evolution

In his Inaugural Article, he and his