Interview Life, the Universe, and Everything: An Interview with David Haussler

Jane Gitschier* Departments of Medicine and Pediatrics and Institute for Human Genetics, University of California San Francisco, San Francisco, California, of America

Among the pantheon of computer looking at his picture, I realized he must scientists who have framed our capacity be your sibling. to interpret DNA sequences stands David Haussler: My only sibling is my Haussler of the University of California, brother. He was professor of biochemistry Santa Cruz (UCSC). Applying his prowess in and taught me in computer learning theory to the prob- how to do science. And he is really one of lems of protein modeling and gene the leading scientists in the world on structure prediction, Haussler emerged in , which was the subject of that the mid-1990s as a trail-blazer in the field first paper. of computational biology. He came to Gitschier: How much older is he? wider prominence in 2000 during the Haussler: Twelve years. frenetic race to produce a draft sequence Gitschier: So he was established when of the human genome by nucleating an you were just a kid. impassioned team of coders and engineers Haussler: Right. The summer after who assembled the sequence data and my freshman year [in college], I spent time launched the UCSC Genome Browser. in his lab. Every third week, I would Fittingly, his team’s contribution made sacrifice a chick that was raised without manifest the vision of Robert Sinsheimer, vitamin D. I would take out its intestines who as Chancellor of UCSC in 1985 for receptors for the hormonal form of convened a pivotal workshop to explore vitamin D, and we used those receptors in sequencing the human genome. a radio-receptor competitive binding assay Haussler (Image 1) now plays, by my to first measure the level of the hormonal count, at least half-a-dozen leadership form of vitamin D in the human blood- roles, including co-director of the Genome Image 1. David Haussler. Photograph by stream, in both normal and diseased 10 K project, coordinating committee Ron Jones, courtesy of the Center for Biomo- humans. By the end of the summer, we member of The Genome Atlas lecular Science and Engineering, University of had a paper in Science! You know, big project, and director of the Center for California Santa Cruz. breakthrough. doi:10.1371/journal.pgen.1003282.g001 Biomolecular Science and Engineering at Then I went back the next summer, and UCSC. He is easily spotted by his nothing worked. I remember my brother predilection for Hawaiian shirts, whose a discussion of his growing up in the town saying to me, ‘‘Now, this is how science informality, he suggests, fosters inter- of North Hills in the San Fernando Valley. really is.’’ But I was undaunted. disciplinary collaboration. Indeed, Hauss- Haussler: My dad went to Caltech Gitschier: Let’s talk about your tran- ler’s ken for machine learning and his and because of the economic pressures of sition to science, because I know your first quest for the meaning of life are so having a young family, decided not to college experience was in art. expansive that I was tempted to title this pursue pure science, but to pursue a Haussler: I did visual art mostly. piece ‘‘Deep Thought’’, a nod to the professional position in structural engi- Acrylic painting and metal sculpture were fictional computer in Douglas Adams’ neering. He worked on mathematical probably my favorites, although I did The Hitchhiker’s Guide to the Galaxy, but problems as a hobbyist and had a love of stone lithography and all kinds of fabulous chose a more subdued allusion instead. pure science. Both my brother and I ended things in the San Francisco Academy of I located Haussler on the upper reaches up living out his dream to be a scientist. Art. Then, I switched schools and into of the stunning UCSC campus in the My brother is a highly accomplished psychology. engineering building, a sleek structure of biochemist. Gitschier: And that was where? glass and aluminum, tucked into a red- Gitschier: I saw that your first paper Haussler: That was actually at a crazy wood grove that was still dripping and in the early ’70s was with a Haussler and little experimental college. You have to fragrant from the morning’s rain. The had assumed it was your father, but then, understand that this was the early ’70s… anteroom to his modest office was deco- rated with handsome prints from UCSC’s Citation: Gitschier J (2013) Life, the Universe, and Everything: An Interview with David Haussler. PLoS scientific illustration program as well as Genet 9(1): e1003282. doi:10.1371/journal.pgen.1003282 books on the genome project, and a box Published January 31, 2013 labeled ‘‘for the intron lounge’’ was piled Copyright: ß 2013 Jane Gitschier. This is an open-access article distributed under the terms of the Creative high with journals. Haussler swept in via Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, bicycle, swiftly signed a few documents, provided the original author and source are credited. and downed a cold drink as we began with * E-mail: [email protected]

PLOS Genetics | www.plosgenetics.org 1 January 2013 | Volume 9 | Issue 1 | e1003282 Gitschier: I do understand! [Haussler work in . That seems like Gitschier: I see. It was a tradition! and I were born the same year.] a logical transition to me. Haussler: My great grandfather, my Haussler: My mother was hoping I’d Haussler: Logical is the correct word. grandfather, my father always ran their go to UCLA, but I was a rebel and said After studying pure mathematics as an own businesses. Never had a boss. It was a ‘‘No, I want to go to a crazy place,’’ undergrad, I decided that the foundation crazy, fierce, independent kind of tradi- Immaculate Heart College [IHC] in for everything was logic. And I read tion. Hollywood. extensively before I went to graduate Gitschier: So this was instilled in you Gitschier: Immaculate Heart doesn’t school, but even after getting my under- very early. I’m now seeing the fuller sound so ‘‘crazy’’ on the surface. graduate degree in mathematics and a context! Haussler: It doesn’t, not at all, but minor in physics, I still hadn’t decided to Haussler: Right. I wasn’t going to the thought leader there was Sister pursue a life of science. play along with any institutional programs! Corita Kent, and you remember from Gitschier: What were you thinking Those were the days when you could be the ’60s, those love posters? A lot of the of—art, philosophy, psychology? anti every institution and get away with it. art movement and the philosophy that Haussler: I wanted to get at the heart Well, I look back at my writings from was expressed in art and posters in that of the meaning of life. that time and there was some very creative era actually came out of Sister Corita Gitschier: Wow. [I had to swallow the stuff but isolated from the bulk of the Kent and a number of other rebels. The answer, ‘‘42’’.] intellectual mainstream, it’s very hard to sisters at IHC were essentially kicked Haussler: Still this rebel spirit, I guess. make progress. So I was thrilled to get re- out of the Catholic Church for being I wasn’t convinced that I would find that engaged, just by taking advanced math radicals, and they had an extremely at traditional institutions. I spent about classes at Cal Poly [San Luis Obispo]. experimental college. So I, being the nine months wandering around Europe Applied mathematics was my major, contrarian I was, applied there. It was and then settled in San Luis Obispo on the but I took computer science classes as well. strong in art and music and psychology. family farm, kind of between generations. I seized on the question of what is We studied Fritz Perls and Carl Rogers My grandfather was aging and my father computable. What can be formalized by and all of these self-realization psychol- was active as an engineer, so there was no mathematics? And the answer, according ogy thinkers at the time. And I was one to take care of it. to Alan Turing, was that this is the same as extremely into that. We had intensive While I was there, I wanted to keep what can be computed on a very simple encounter groups and dug very deeply touch with my intellectual side, so my kind of machine. I was tremendously taken into personal interactions. friends and I—it was almost like a by that and by the fact that Turing and Gitschier: But you didn’t stick with commune—believed in working hard on Kurt Go¨del had established that there Immaculate Heart. the ranch during the day and then reading were things that were fundamentally Haussler: I got interested in science and discussing philosophy, history, litera- uncomputable; true but unprovable. It by working with my brother. I then ture, and psychology at night. appealed to my mystical side. I was always transferred to Connecticut College back Gitschier: Who were these people that interested in the unity of the universe and east. Again, I liked very intimate, individ- you recruited to the farm? the mystery of it. ual learning. This was part of my whole Haussler: Well, important people that Gitschier: Are you still? psychology background. I view essential I met in my life and in my travels. We read Haussler: I still am in many ways. human progress being made, including books and had wonderful discussions. I The mystery of ‘‘why life’’ and ‘‘is there a learning, within a very intensive, one-on- remember my favorite title was The Origin mathematical inevitability that there will one or small group interaction. of Consciousness in the Breakdown of the be life’’ are questions that I spend quite a Gitschier: When you went there, you Bicameral Mind. We were trying to build a bit of time thinking about. I don’t write knew you wanted to do math? non-traditional intellectual environment. much about them because I’m engaged in Haussler: Yes. During those two But size is a factor there. What was areas that are more immediately applied summers with my brother, the one thing missing at that time was the Internet. and have urgent impact, but I think a lot that mattered most was not the wet lab There was no way to get in touch with about them. experiments that I had done, but when it other people who had very specific And there’s a theme in my thinking and came to analyzing the data. Someone in interests except through the library and in my life that has been constant since the lab was showing concentration in through post. So it became a 19th century those days as a young adult searching for relation to a radioactive response curve gentleman-scholar kind of activity, which answers. I turned away from thinking and trying to fit that data with a linear has very limited impact. about that as a humanistic quest—to function. And I said, ‘‘Well you can’t use Gitschier: What happened to the farm understand my psychology and our inter- linear regression on this until you trans- after you left? actions—into an absolute quest for knowl- form the variables.’’ And they looked at Haussler: My father did retire there. edge about the universe. In a sense that is me and said, ‘‘Can you do that?’’ He and my mother had a spectacular the one thread that unites my entire adult And then I realized, hey wait a minute, retirement, raising organic fruit and selling life because I’ve been in so many different I can contribute on the math side and it’s a it at the farmers market. So I played an scientific areas. lot more fun than grinding up chicken important role in the family; I was the But life itself has always been something guts! I like the quote that ‘‘mathematics is bridge to that retirement and it allowed that fascinated me, life in the broadest the queen of sciences’’ [attributed to me close friendship and think time. sense, that spans everything from the Gauss]. Mathematics is the beautiful unity Gitschier: And what firm had your actual biological life that we observe on in the universe, and that’s what totally father worked for? this planet, to the abstract notion of life. captivated me. Haussler: Oh, in my family, we never Like in Conway’s Game of Life where you Gitschier: Then, you find yourself at worked for anybody else! Robert Haussler have a disarmingly simple mathematical the University of Colorado doing PhD Structural Engineering! system that nevertheless is sufficiently

PLOS Genetics | www.plosgenetics.org 2 January 2013 | Volume 9 | Issue 1 | e1003282 complex that it is naturally an incubator of the board and we would scribble on it. He who is now at University of Illinois, and self-reproducing patterns; you start with a also ran weekly seminars on campus, and several other scientists created a new random pattern, and you will have other graduate students would attend. scientific research group and area, the emergent forms that will be self-replicating Gitschier: I take it, Gene Myers, and field of computational learning theory. entities that interact, as in living systems. also Manfred Warmuth. Here we go: COLT ’88—[Haussler I went through a period where I was Haussler: Yes, and don’t forget Gary pulls a report from his bookshelf]— more fascinated with what we can formal- Stormo! Gary was one of the founders of computational learning theory. This was ize than with the messiness of real the field of . He was a the conference on computational learning biological life. I’m talking primarily about graduate student in , theory and they still have them today. the act of doing science via computer. while the rest of us were in computer The methodologies that we, the com- How powerful were the systems of logic science. The four of us all participated in munity of machine learning, developed in and mathematical calculation in terms of Andrzej’s weekly seminar where we would those decades in the ’80s and ’90s have revealing the nature and structure of the discuss any topic in science or mathemat- come to fruition in the last decade. For universe? And if we turned them loose, ics that was exciting, and the topics were example, we started with primitive speech would they actually produce things that all over the map. Anybody who was recognition where you could speak one of human minds are not able to? So I became engaged and wanted to could participate. ten words in a fixed vocabulary, and the obsessed with machine learning. Gitschier: In fact, you have an early speech recognition algorithm would tell Gitschier: Why don’t you define paper with Gary in 1986 about DNA you which of the ten words you spoke. ‘‘machine learning’’? sequence landscapes. This is your first Now there is Siri, and Siri doesn’t come Haussler: Machine learning is the foray into annotating sequences or think- easy! Siri is based on hidden Markov design of adaptive software systems for ing about sequences. And we’re not talking models, and that framework had to be the purposes of modeling the world or hidden Markov here yet. worked out and all the engineering had to achieving some action in response to Haussler: No, this is before the hidden be done on top of that. All of these ideas input, as in speech recognition, in a way Markov period. really were fermenting in the ’80s and that embraces the complexity of the At that time, we were exposed to some ’90s. Finally, we could pull together a phenomena being modeled and improves very exciting data. The first DNA se- conceptual framework for machine learn- with use. quences were coming in from bacterio- ing that was much richer than had existed Gitschier: And then, from there, that phage WX, T7, snippets of E. coli, and the before. you’ll be able to make predictions. whole recombinant DNA revolution was Gitschier: OK. So regarding hidden Haussler: Right. The prototypical occurring. The language of DNA was Markov models, let’s talk about their machine learning exercise is to take a starting to be decoded for the first time, application to DNA. I take it that you training set of examples, such as spoken and it seemed like an extraordinary and your post-doc were utterances coupled to the actual words opportunity from a mathematical point really the first to start applying this model being spoken, and to expose a computer of view. Gary’s thesis was on recognizing to biological problems. You have two algorithm to that training corpus, have the ribosome-binding sites in E. coli by com- papers in 1994—one on protein modeling computer algorithm adjust some parame- puter analysis of DNA sequences. Mean- and the other, which is one of my ters or actually invent or discover rules while, Gene Myers had worked out the favorites, on gene prediction in E. coli… within those data, and then encapsulate algorithms for RNA folding into its Haussler: Oh, thank you. the structure that it learns from these secondary structure. All of that was Gitschier: …’Cause I actually under- examples into some computer readable developed as part of Andrzej’s seminar. stood it! form. Then you test it by giving other Week after week I watched that happen Haussler: Yeah, that is the first paper examples where you are not supplying the and participated and kibitzed. There was that applies hidden Markov models to find answer, but asking it to predict what the no term ‘‘bioinformatics’’ or ‘‘’’. genes in DNA. Before this paper, there answer is. Machine learning algorithms These fields weren’t even named yet, but were a lot of ad hoc tools. People had model both data as well as decisions or we were seeing an inkling of what could be independently come upon pieces of this actions that are made in light of data. accomplished. theory, and they weren’t recognizing the So, I went to Colorado to study the I probably would have stayed with the whole. quantitative framework for logic and DNA analysis had there been more of a So the important contribution to the mathematics that explains emergent and capability to produce DNA data at the field here was to take what was considered adaptive phenomena, like life or the time, but months went by and there was a disparate toolbox of different methods complexity of systems. nothing new! We looked at every scrap of and approaches and to unify it under one Gitschier: And why did you choose DNA that existed. The field was essentially mathematical framework that was concep- Colorado? data-bound. It is interesting to think back tually clean and revealed the power and Haussler: I chose Colorado for one upon that today in light of the sequencing the central concepts behind these meth- reason: to work with Andrzej Ehrenfeucht. deluge that we are now seeing! odologies. That mathematical frame- He is very broad, a polymath. And his So the bulk of my time for the next work—the —had work appealed to me very fundamentally. decade was really spent on the machine already been developed and is much If I learned from my brother how to do learning question in general. broader than DNA or proteins; it is used science, from Andrzej I learned how to Gitschier: Including after your move very broadly in science in engineering. think deeply and rigorously and to unleash to Santa Cruz. Again, it takes a cross-disciplinary per- mathematical creativity. To work with a Haussler: Yeah, Manfred recruited spective to synthesize those things. real math genius is an intense interaction! me here, and he and I worked together It just struck me at one point that we We would spend whole Saturdays at his very intensively. We, with other col- really should start to think about doing this house, in his backyard. He would roll out leagues, Ron Rivest at MIT, Lenny Pitt on DNA and protein sequences. And that

PLOS Genetics | www.plosgenetics.org 3 January 2013 | Volume 9 | Issue 1 | e1003282 gelled. It really snowballed very quickly. produced a successor to his 1994 work, He wrote thousands of lines of code We really started to see that many things the GENIE program, which was used very to assemble the DNA for the first draft fit together. It was one of those ‘‘aha effectively by Celera to annotate the genes of the genome. And it’s such a compli- moments’’. in the fly genome in 1998. cated problem. There were 13 different Gitschier: Clearly by 1994, you are Gitschier: OK, right. But none of this types of input that you had to solve very well aware that there is a ton of was about assembly. this big constraint satisfaction problem sequence coming down the road. Haussler: No. We had not done any for. Basically you had the cloned frag- Haussler: Yes, exactly. Things started work on assembly. And we were not asked ments, you had the genetic maps, you cooking. The [sequencing] technology was to do any work on assembly. had the physical maps, you had RNA progressing at about a Moore’s law rate at Gitschier: OK, so was a sequences to order and orient the frag- that time. And it bore analogy to the graduate student here at UCSC, and he ments, all kinds of information about how computer world. It was clear that this had just produced a graphical presentation the DNA should go together and much of would be a revolutionary opportunity to of the splice sites in C. elegans, right? it contradictory! You had to adjudicate look at life by sequencing DNA, so we got Haussler: That’s right, ‘‘INTRO- those contradictions and deal with all very excited about the fact that there NERATOR’’. of these different pieces of information would be enough data for computers and Gitschier: Love that name. So Jim to achieve a single assembly that made machine learning to actually make a had already been coming up with these sense. contribution to molecular biology. computer-generated representations of Francis Collins had prearranged with It was thought in the early ’80s and even genes. And we’re anticipating the genome Craig Venter that we would declare a tie the early ’90s, some of the major compu- browser, here. with Celera and a meeting was to be held tational contributions would be in protein Haussler: Jim’s background is the at the White House to announce the first structure predictions, and we were excited computer game industry and other areas draft of the genome on June 26th, 2000. about that problem, but realized that it is of large-scale software design, as well as Without Jim’s efforts, it never would have fraught with intrinsic mathematical diffi- biology and mathematics. He had this happened! And it is such a complicated culties. I mean, that is a system that has so ability, which is unprecedented in many and tricky problem that a committee could many degrees of freedom that it is very, ways, to conceive of and quickly build argue about it for years, or a set of very hard. software that is incredibly complicated and programmers could design a modular But there are other problems where you rich. So that capability was huge. It was approach and each solve part of the can see that extraordinary gains will be clear that he had extraordinary talent from problem, which would take years to made by simply scaling them up. Hidden the beginning. implement! Only one person, a genius Markov models have the property that as So, we were asked to work on analysis of with everything in his head and incredible you increase the amount of data, the the biological content of the DNA se- skills would pull this off! And he did it in computer time increases only linearly, not quence. The problem was that the DNA about four weeks! I mean, it was unprec- exponentially. Therefore, if the computer sequence was in tiny pieces and the edented. power is increasing exponentially, you get assembly efforts—there were two; they Gitschier: He just started with this this enormous exponential leverage on the had hedged their bets—were not succeed- problem… problem. ing. So we had nothing to analyze. You Haussler: …And did it. So we were extremely excited about the can’t find genes from very small pieces of Jim delivered the assembly on June scaling, but I must say, back then, it wasn’t DNA, even a few thousand bases is not 22nd, and then we spent a weekend clear at all that we would have the enough. We need 50,000 bases or more at furiously analyzing it. You know, Francis sequencing technology that we have today. a shot. was on the email or the phone to It took an enormous amount of effort to get Gitschier: So then you had to start everybody and we were going back and the done. To scale addressing yourself to the assembly prob- forth. The entire international team had to from thousands of bases in the ’80 s to 3 lem. analyze it. And we found out from Gene billion bases at the turn of the millennium Haussler: We had to! I had a post-doc Myers, who was our counterpart from was just an enormous technological who was working on it, Nick Littlestone; Celera, that his team finished on June achievement. It cost roughly $300 million his design was elegant but didn’t scale 24th, two days before the conference, and just in machine time and reagents to get enough. This was totally a skunk-works. was going through exactly the same the first human genome. It wasn’t clear We didn’t have any funding, that wasn’t experience analyzing the Celera draft of where it was going to go from there. our mandate, but it became clear that the the human genome. Gitschier: We’ll come back to that, whole thing would fall apart unless Later, we had the advantage, of course, but first, how did you come to be involved somebody could assemble the genome. because Celera couldn’t really publicly in the Human Genome Project? Meanwhile, Jim was working on issues unveil its genome; you had to pay a Haussler: Eric Lander called me up in assuming the assembly would work, getting subscription for it. But we could, so we had 1999 and said, ‘‘We’re familiar with your more and more nervous every day that he the honor of posting it from UCSC on the work on hidden Markov models in gene wouldn’t have any input to what he was Internet on July 7th. Now that was really finding, and we expect to have sequence doing unless someone did the assembly! the greatest moment. for analysis sooner than we had previously And at one point Jim said, ‘‘I think I’m Gitschier: Now, when you posted it, it planned. We would like you to participate going to do it.’’ And I said, ‘‘Godspeed!’’ was in the framework of a genome in the analysis of it.’’ And of course I said, And he just kicked out the code. I mean browser? ‘‘Yes.’’ It was the opportunity of a lifetime he literally worked night and day. I Haussler: No, at that point it was raw to join the human genome project. I set remember visiting him and he had to DNA. about gathering a team, which included ice his wrists because he was coding so Gitschier: Oh, God. So when did the Jim Kent and David Kulp, who had furiously! browser become official?

PLOS Genetics | www.plosgenetics.org 4 January 2013 | Volume 9 | Issue 1 | e1003282 Haussler: So that was shortly after curve where DNA sequencing is improving that you have this kind of curve. I can’t that. Jim took the INTRONERATOR ten times every two years. So you have to think of another technology like this. For code and turned around and made the think about these numbers! Computers me, it’s an amazing experience, to remem- UCSC Human Genome Browser, and to continue to go along at most at two times ber the days when we would wait years for make a long story short, that’s been a huge improvement every two years, so for the another few thousand bases to appear—to success. last eight years, that’s 2626262, well that’s this! The cumulative effects of sequencing Gitschier: OK, and now let’s return to great! Computers are 16-fold more power- technology improvements are just unbe- this exponential growth in sequencing ful! DNA sequencing: 10610610610! lievable! It’s unbelievable! When you have capability since then. That is 10,000 times more powerful! It’s such a steep exponential growth law, you Haussler: Really, the ‘‘hyper’’ Moore’s huge! get a shocking change on time scales that Law [for DNA sequencing] started in about It’s incredibly disruptive technology. It are a blink of an eye in terms of science 2005, 2006, where suddenly we’re on a will affect everyone’s lives! It’s very seldom and society.

PLOS Genetics | www.plosgenetics.org 5 January 2013 | Volume 9 | Issue 1 | e1003282