Analysis of Genomic Variants for Investigating the Genetic Etiology of Disease

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF BIOMEDICAL INFORMATICS
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

Daniel Edmund Newburger
March 2015

© 2015 by Daniel Edmund Newburger. All Rights Reserved.
Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/kh271wr8164

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Serafim Batzoglou, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Jonathan Pritchard

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Arend Sidow

Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

The study of genomic variation within human populations is critical for elucidating the genetic factors that contribute to disease.
Identifying and characterizing the genetic architecture of disease advances clinical care by facilitating the development of novel diagnostic tools, the identification of new therapeutic targets, and the practice of personalized treatment for genetic syndromes. The massive volume of genetic data generated by modern genotyping technologies, combined with the informatics challenges of filtering and interpreting these noisy measurements, represents a significant obstacle to genomic research. These technical issues necessitate the development of computationally efficient methodologies that leverage raw genotype data for the comparative genomic analysis of complex phenotypes across human subpopulations.

In this dissertation, I describe my contributions towards the biomedical study of genetic syndromes using high-throughput genotyping technologies. First, I discuss methods for studying the genome evolution of pre-malignant cancer lesions during progression to breast cancer. Second, I describe algorithms for performing highly accurate variant validation in genomic studies using next-generation sequencing. Finally, I present methods for identifying novel disease susceptibility loci in complex diseases using identity-by-descent mapping in large case-control cohorts.

Acknowledgements

I would like to thank the truly extraordinary mentors, collaborators, and friends who have supported me through both the good times and the terrifying doldrums of graduate school. I simply cannot thank you enough for your patience, wisdom, and friendship.

Foremost, I would like to thank my thesis advisor, Serafim. I remain in awe of your ability to deconvolute the most tangled analytical problems into solvable components. You found elegant paths through so many technical obstacles in my research and have always been a wellspring of novel ideas. I am even more grateful to you for your unwavering encouragement and patience.
You gave me the freedom to explore far afield, and I feel privileged to be part of your group.

I am deeply grateful to Arend Sidow for his guidance, mentorship, and leadership. Arend, you brought vision and scientific rigor to every meeting, and you always managed to make time in your schedule to share your expertise. You taught me how to examine complex problems down to the finest detail, and your forthright advice and criticisms have been invaluable. You are one of the few people who will say what you really think, and yet you are always optimistic and generous in your feedback.

I am also indebted to the other members of my reading and orals committees: Rob West, Jonathan Pritchard, and Gavin Sherlock. Rob, your boundless knowledge of cancer genetics and histomorphology drove our cancer genomics projects forward, and I am grateful for all of the time you spent tutoring me in the field. Jonathan, although we didn't meet until the end of my graduate career, your advice has been penetrating and insightful. Gavin, thank you so much for chairing my defense committee and for your keen questions and suggestions.

My thesis would not have been possible without several other mentors. I would like to thank Hanlee Ji and Sivan Bercovici for their incredible generosity. Hanlee, you coached me through the first years of my PhD with wisdom, precision, and humor. Your relentless pursuit of scientific innovation and your mastery of genetics, oncology, and biotechnology continue to inspire me. Sivan, thanks are entirely inadequate to express my gratitude for your patience, your guidance, and the surfeit of brilliant ideas you contributed to our joint projects. Our meetings have been some of the funniest and most productive moments in my graduate work. I would also like to thank Atul Butte, who first introduced me to bioinformatics as an undergrad, and whose encouragement and counsel propelled me through the first few years of graduate school.

I am deeply indebted to my academic advisor, Russ Altman. Russ, your clairvoyant advice during our biannual meetings proved pivotal over and over again, and I can't thank you enough for ensuring that my meandering thesis evolved into a BMI dissertation. Likewise, I am deeply indebted to my colleague Alex Morgan, who has been exceptionally generous as a mentor. Whether proofreading my fellowship applications in first year or talking me through tough decisions in sixth year, you have always provided singularly thoughtful advice and gone far out of your way to render assistance. Without your help, I would still be floundering in my studies.

It has been a joy to be a member of the BMI program. Mary Jeanne, thank you so much for steering me through the tortuous process of navigating graduate school. We in BMI are incredibly lucky to have you at the helm of the BMI program, keeping us from running aground on rocky shores. I would like to thank all the other amazing people who have kept BMI afloat: Nancy Lennartsson, Steve Bagley, John DiMario, Betty Cheng, Larry Fagan, Carol Maxwell, and of course Darlene Vian. I would also like to thank my staunch compatriots in BMI, especially fellow classmates Linda Liu, Nick Tatonetti, and Rob Bruggner.

The Batzoglou lab has fostered some of the most amazing folks at Stanford, and I feel incredibly fortunate to call them friends and colleagues. I would especially like to thank Sarah Aerni, Marc Schaub, Tom Do, Sam Gross, Jesse Rodriguez, Sofia Kyriazopoulou-Panagiotopoulou, Anshul Kundaje, Lin Huang, Alex Bishara, and Yuling Liu. Marc and Sarah, your friendship and advice meant so much to me as I struggled to orient myself in the lab, and I deeply appreciate your generosity as mentors. Jesse, working with you and learning from you has been a blast. Alex, I still have not watched Clerks.

I have been privileged to work with incredible collaborators from outside the lab, as well.
I would like to thank Georges Natsoulis, John Bell, Sue Grimes, Patrick Flaherty, Sarah Garcia, Ziming Weng, Noah Spies, Alayne Brunner, Robert Sweeney, and Marina Sirota. Patrick, you inspired me with your commitment to scientific excellence and taught me how to evaluate my projects and research goals. John, I greatly enjoyed kvetching and swapping books during our much-needed coffee breaks.

I would like to give special thanks to a few friends without whom I would never have completed my graduate studies. I am incredibly grateful to Dorna Kashef-Haghighi, whose brilliance and hard work made our joint projects in the Batzoglou lab possible, and whose friendship made it fun. Working alongside you was the highlight of graduate school. Tiffany Chen and Tim Lee, I am exceptionally fortunate to be friends with you. You have been my most trusted confidants in matters ranging from research priorities to hunting for good eats in Cupertino. Tiffany, your insight and wisdom regarding matters of both research and career have been invaluable. Tim, your humor, consideration, and scientific advice have kept me sane during times of stress and failure. I hope you get another twelve-win arena run soon.

My family has been an inexhaustible source of love and support. Mom, you have always set the highest bar for hard work and dedication to research. I would never have made it through graduate school without your encouragement and, when necessary, admonishments. Dad, you first got me interested in science, and your sage and pragmatic advice has always helped me tackle questions of research and career. Maggie, I can always look to you for both encouragement and commiseration.

Finally, completing graduate school would have been inconceivable without the love and support of my wife, Melody. Mel, whether proofreading my papers at midnight, fixing my slides, celebrating victories, or providing consolation, you were always there for me, making life better than better; you make life great.

Contents

Abstract
Acknowledgements
1 Introduction
2 Background
  2.1 The genome and disease
    2.1.1 Genomic variation
    2.1.2 Technologies for genomic studies
    2.1.3 Cancer sequencing
3 Genome Evolution in Breast Cancer
  3.1 Abstract
  3.2 Introduction
  3.3 Results
    3.3.1 Whole-genome sequencing of early neoplasias and related carcinomas from archival material
    3.3.2 Somatic SNVs fall into a limited and highly structured set of classes