Janga-Phd-Thesis.Pdf (PDF, 9Mb)

Exploiting network-based approaches for understanding gene regulation and function

Sarath Chandra Janga

A dissertation submitted to the University of Cambridge in candidature for the degree of Doctorate of Philosophy

April 2010

Darwin College, University of Cambridge
MRC Laboratory of Molecular Biology
Cambridge, United Kingdom

Previous page: A portrait of the transcriptional regulatory network of the budding yeast, Saccharomyces cerevisiae. Each circle represents the network of transcriptional interconnections from all other chromosomes to one of the chromosomes. Evidently, all chromosomes are transcriptionally controlled by factors encoded on many of the 16 chromosomes in this organism, marked by the letters ‘a’ through ‘p’.

Declaration of originality

This dissertation describes work I carried out at the Medical Research Council Laboratory of Molecular Biology in Cambridge between January 2008 and April 2010. The contents are my original work, although much has been influenced by the collaborations in which I took part. I have not submitted the work in this dissertation for any other degree or qualification at any other university.

Sarath Chandra Janga
April, 2010
Cambridge, United Kingdom

Acknowledgements

First of all, I would like to express my gratitude to Dr. Madan Babu, without whose continuous support throughout my doctoral work it would have remained just a dream for me to carry out my thesis work at the MRC Laboratory of Molecular Biology. Madan has been not only an excellent supervisor but also a good friend who was always supportive of my research interests, allowing me to work independently on a wide range of problems during my stay here. He has been a source of great inspiration on various occasions and a great scientific colleague to work with. In short, I probably could not have had a more understanding and motivating supervisor.

I am also very grateful to Dr.
Sarah Teichmann, whose equally supportive words from time to time have been a motivation to finish my doctoral work in a short time. I have learnt from her the art of venturing into uncharted territories of molecular biology without fear. I am also thankful for the kind support and warm welcome that I received from Dr. Cyrus Chothia from the first day that I came to LMB.

I consider myself very fortunate to be in a wonderful lab with a lot of energetic and highly motivating people working on fundamental problems of molecular biology. Indeed, I must admit that I have learnt at least as much from my colleagues and seminars at LMB as I have learnt from reading books and papers, not to mention the fun that I had during numerous lunch and dinner breaks with various members of the lab and the TCB group in particular. I would especially like to thank A Wuster, B Lang, AJ Venkatakrishnan, D Hebenstreit, D Wilson, E Levy, G Chalancon, J Su, N Mittal, P Kota, R Janky, S De, T Perica, V Charoensawan and J Gsponer for making my stay at LMB a memorable experience.

I am also greatly indebted to all my scientific friends, collaborators and mentors, both in the past and during my PhD, for having helped me learn and venture into diverse areas of molecular biology. In no defined order, I would like to sincerely thank Agustino Martinez-Antonio (Irapuato, Mexico) for his confidence in my abilities, Ernesto Perez-Rueda (Cuernavaca, Mexico) for his kind hospitality during my visits to Mexico, Gabriel Moreno-Hagelsieb (Waterloo, Canada) for being a great mentor and an excellent scientific friend, Heladia Salgado (Cuernavaca, Mexico) for her energy and her patience with my requests for data, Andrew Emili (Toronto, Canada) for giving me the opportunity to work on an unsolved mystery, Denis Thieffry (Marseille, France) for teaching me to focus on important ideas, and many other colleagues for scientific discussions over the years which made me a mature and independent scientist.
I would also like to take this opportunity to offer my gratitude to all colleagues, administrative staff and heads of division, Venki Ramakrishnan and Kiyoshi Nagai, at LMB, whose continuous support has made it possible for me to develop a career in science. I am also grateful for the financial support that I received from the Cambridge Commonwealth Trust (CCT) and the Medical Research Council during my PhD.

Last, but not least, I am most indebted to my family (my parents and sister) as well as my near and dear ones, who have been continuously supportive of my adventures in science and have understood my reasons for being silent for months. My very presence on this planet would not have been possible if not for my mother, who passed away long before I knew what maths and science are all about. I dedicate this thesis in her name.

Abbreviations

3C  Chromosome Conformation Capture
ArcA  Aerobic respiration control protein A
BDBH  Bi-Directional Best Hits
BLAST  Basic Local Alignment Search Tool
cAMP  cyclic Adenosine MonoPhosphate
ChIP  Chromatin immunoprecipitation
CLIP  Cross Linking and Immuno-Precipitation
COGs  Clusters of Orthologous Groups
CRP  cAMP Receptor Protein
CT  Chromosomal Territory
DBTBS  DataBase of Transcriptional regulation in Bacillus Subtilis
DNA  DeoxyriboNucleic Acid
EC  Enzyme Commission
FDR  False Discovery Rate
FIS  Factor for Inversion Stimulation
FISH  Fluorescent In Situ Hybridization
FFL  Feed Forward Loop
FNR  regulator of Fumarate and Nitrate Reduction
GBA  Guilt By Association
GC  Genomic Context
GO  Gene Ontology
GR  Global Regulator
GRN  Gene Regulatory Network
HMM  Hidden Markov Model
hnRNP  heterogeneous nuclear RiboNucleoProtein
HNS  Histone-like Nucleoid Structuring protein
HU  Heat Unstable protein
IHF  Integration Host Factor
LAD  Lamina Associated Domain
LCMS  Liquid Chromatography-Mass Spectrometry
LCR  Locus Control Region
MALDI  Matrix-Assisted Laser Desorption/Ionization
MCL  Markov CLuster algorithm
mRNA  Messenger RNA
NAP  Nucleoid Associated Protein
PAB  PolyAdenylate-Binding protein
PI/PPI  Protein Interactions
PTM  Post-Translational Modification
PTN  Post-Transcriptional Network
PTS  PhosphoTransferase System
RBD  RNA Binding Domain
RBP  RNA Binding Protein
RIP  RNP ImmunoPrecipitation
RNA  RiboNucleic Acid
RNP  RiboNucleo Protein complex
RRM  RNA Recognition Motif
TAP  Tandem Affinity Purification
TF  Transcription Factor
TG  Target Gene
TPI  Target Proximity Index
TRN  Transcriptional Regulatory Network

Summary

It is becoming increasingly clear in the post-genomic era that proteins in a cell do not work in isolation but rather in the context of other proteins and cellular entities during their lifetime. This has led to the notion that cellular components can be visualized as wiring diagrams composed of different molecules such as proteins, DNA, RNA and metabolites. These systems approaches for quantitatively and qualitatively studying dynamic biological systems have provided us with unprecedented insights, at varying levels of detail, into cellular organization and the interplay between different processes. The work in this thesis attempts to use these systems or network-based approaches to understand the design principles governing different cellular processes and to elucidate the functional and evolutionary consequences of the observed principles.

Chapter 1 is an introduction to the concepts of networks and graph theory. It summarizes the various properties that are frequently studied in biological networks and gives an overview of the different kinds of cellular networks that are amenable to graph-theoretical analysis, with particular emphasis on transcriptional, post-transcriptional and functional networks.

In Chapter 2, I address two questions: how and why are genes organized in a particular fashion on bacterial genomes, and what constraints do bacterial transcriptional regulatory networks impose on their genomic organization?
I then extend this one step further to unravel the constraints imposed on the network of TF-TF interactions and relate it to the numerous phenotypes they can impart to growing bacterial populations.

Chapter 3 presents an overview of our current understanding of eukaryotic gene regulation at different levels and then shows evidence for the existence of a higher-order organization of genes across and within chromosomes that is constrained by transcriptional regulation. The results emphasize that a specific organization of genes across and within chromosomes, one that allows for efficient control of transcription within the nuclear space, has been selected during evolution.

Chapter 4 first summarizes different computational approaches for inferring the function of uncharacterized genes and then discusses network-based approaches currently employed for predicting function. I then present an overview of a recent high-throughput study performed to provide a ‘systems-wide’ functional blueprint of the bacterial model, Escherichia coli K-12, with insights into the biological and evolutionary significance of previously uncharacterized proteins.

In Chapter 5, I focus on post-transcriptional regulatory networks formed by RBPs. I discuss the sequence attributes and functional processes associated with RBPs and the methods used for the construction of the networks formed by them, and finally examine the structure and dynamics of these networks based on recent publicly available data. The results obtained here show that RBPs exhibit distinct gene expression dynamics compared to other classes of proteins in a eukaryotic cell.

Chapter 6 provides a summary of the important aspects of the findings presented in this thesis and their practical implications. Overall, this dissertation presents a framework which