William Stafford Noble
Department of Genome Sciences
University of Washington
2006–2011

Research

Since 2006, major research accomplishments include the publication and widespread adoption of a semi-supervised learning method for improving peptide identification from shotgun proteomics. The software, Percolator, published with Mike MacCoss in Nature Methods in 2007, is now distributed with every copy of Mascot, the most widely used MS/MS search engine, and will soon be distributed by Thermo as well. Also, last year, with Tony Blau, Stan Fields and Jay Shendure, we published in Nature a kilobase-resolution model of the 3D structure of the yeast genome in vivo. This year, Michael Hoffman, a postdoc in my lab, received K99 funding, and I received the University of Washington Postdoc Mentor of the Year award from the UWPA.

My lab is now approaching a major transition. All three of my current PhD students will finish by the end of August, and one of my postdocs and two master's students are also leaving. This leaves four postdocs and one programmer in my lab. I am currently advertising the following seven postdoc positions, which span most of the current research in my lab:

1. Structure of mammalian genomes: Last year, in collaboration with Tony Blau’s lab, we published a detailed description of the three-dimensional architecture of the yeast genome in vivo. We have recently received NIH funding to continue this work in mammalian systems. The postdoc involved in this project would work on developing and applying statistical methods for interpreting the raw sequencing data, for relating these data to known classes of functional elements, and for improving our ability to infer 3D structure from observed pairs of interactions. Funded by a new R01, with Tony Blau as PI.

2. Clonal populations in cancer: More recently, also in collaboration with Tony Blau, we have been developing next-generation sequencing strategies for characterizing the population of clones in a single cancer by assaying paired cancerous and non-cancerous samples. This project will employ dynamic Bayesian network models to infer the clonal population structure. Funded by Tony Blau.

3. Genomics and proteomics of Plasmodium: Our lab is about to embark on a new research direction, focusing on analyses of Plasmodium falciparum, the parasite responsible for the most lethal form of malaria. In collaboration with Karine Le Roch's lab at UC Riverside, we will investigate local and global DNA structure, with the goal of building a computational model of gene regulation in this organism. We will also apply our expertise in interpreting shotgun proteomics data to help shed light on the differences between RNA and protein expression. Funded by the Yeast Resource Center P41. I plan to submit an R01 in the fall on genome structure in yeast and Plasmodium, with two or three co-investigators (Karine Le Roch at UCR, Zhijun Duan in Hematology, and possibly Linda Breeden at the FHCRC).

4. Local chromatin structure and gene regulation: This project involves investigating the relationship between DNA sequence and chromatin structure in the human genome. Computational models, such as dynamic Bayesian networks or support vector machines, will be employed to investigate the competitive binding of proteins to nuclear DNA and to understand their collective influence on gene regulation. This project is a collaboration with Prof. Zhiping Weng at the University of Massachusetts Medical School. Funded by an NSF award, with Zhiping Weng as the PI.

5. Integration of functional genomics data: This project will be carried out in the context of the NIH ENCODE Consortium, the aim of which is to discover all of the functional elements in the human genome. Our lab's role in this consortium is to develop unsupervised and semi-supervised machine learning methods for identifying new instances and new types of functional elements. Funded by the ENCODE Data Analysis Center, with as PI.

6. Machine learning for mass spectrometry analysis: In collaboration with Mike MacCoss’s lab here in Genome Sciences, as well as Jeff Bilmes’ lab in Electrical Engineering, we have developed a series of machine learning and statistical methods for interpreting shotgun proteomics data sets. The postdoc working on this project will have opportunities to develop new methods for quantifying proteins, interpreting targeted proteomics data, identifying modified proteins, etc. Funded by my R01, with Jeff Bilmes as co-I.

7. Genomics and proteomics of auditory pathways: Dr. Ed Rubel's lab, in the UW Department of Otolaryngology, studies auditory pathways in the developing mouse brain. A collaboration involving Ed, Mike MacCoss and our lab will collect a series of RNA and protein samples from microdissected mouse brains at particular time points. These samples will be subjected to shotgun proteomics and RNA-seq analysis, with the goal of identifying genes and proteins involved in the development of these pathways. The postdoc working on this project would have the opportunity to work in any of the three collaborating labs. Funded by an R01, with Ed Rubel as PI.

In addition, I have an R01, with Tim Bailey as co-investigator, to maintain and develop the MEME Suite. This grant funds one senior programmer in my lab. I also have a pending R01 application with Christina Leslie at Memorial Sloan-Kettering as the PI.

Teaching

This year I taught part of GENOME 541 and managed the entire course. Last year, due to my sabbatical, I did no teaching, but the previous year I taught an entire 10-week undergraduate course (GENOME 373) in addition to part of GENOME 541. Finally, every quarter I help Martin Tompa, Larry Ruzzo and Joe Felsenstein run the CMB journal club (CS590C).

Service

I am currently serving on three editorial boards: PLoS , IEEE Transactions on Computational Biology and Journal of Bioinformatics and Computational Biology. In addition to extensive ad hoc reviewing and program committee memberships, I have served on five NIH review panels since 2006 and am slated for another in early July. I am just finishing a three-year term on the Board of Directors of the International Society for Computational Biology. With Job Dekker and Tony Blau, I will be leading a workshop on genome structure and function at the Pacific Symposium on Biocomputing in January. More details are provided in my attached CV.
