Lecture Notes in Bioinformatics 5541 Edited by S
										Recommended publications
									
								- 
Classifying Transport Proteins Using Profile Hidden Markov Models and Specificity Determining Sites. Qing Ye. A Thesis in The Department of Computer Science and Software Engineering, Presented in Partial Fulfillment of the Requirements for the Degree of Master of Computer Science (MCompSc) at Concordia University, Montréal, Québec, Canada, April 2019. © Qing Ye, 2019. CONCORDIA UNIVERSITY, School of Graduate Studies. This is to certify that the thesis prepared By: Qing Ye, Entitled: Classifying Transport Proteins Using Profile Hidden Markov Models and Specificity Determining Sites, and submitted in partial fulfillment of the requirements for the degree of Master of Computer Science (MCompSc) complies with the regulations of this University and meets the accepted standards with respect to originality and quality. Signed by the Final Examining Committee: Chair: Dr. T.-H. Chen; Examiner: Dr. T. Glatard; Examiner: Dr. A. Krzyzak; Supervisor: Dr. G. Butler. Approved by Martin D. Pugh, Chair, Department of Computer Science and Software Engineering, 2019; Amir Asif, Dean, Faculty of Engineering and Computer Science. Abstract: Classifying Transport Proteins Using Profile Hidden Markov Models and Specificity Determining Sites. Qing Ye. This thesis develops methods to classify the substrates transported across a membrane by a given transmembrane protein. Our methods use tools that predict specificity determining sites (SDS) after computing a multiple sequence alignment (MSA), and then building a profile Hidden Markov Model (HMM) using HMMER. In bioinformatics, HMMER is a set of widely used applications for sequence analysis based on profile HMM. Specificity determining sites (SDS) are the key positions in a protein sequence that play a crucial role in functional variation within the protein family during the course of evolution.
- 
BIOGRAPHICAL SKETCH
NAME: Berger, Bonnie
eRA COMMONS USER NAME (credential, e.g., agency login): BABERGER
POSITION TITLE: Simons Professor of Mathematics and Professor of Electrical Engineering and Computer Science
EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, include postdoctoral training and residency training if applicable. Add/delete rows as necessary.)
INSTITUTION AND LOCATION | DEGREE (if applicable) | Completion Date MM/YYYY | FIELD OF STUDY
Brandeis University, Waltham, MA | AB | 06/1983 | Computer Science
Massachusetts Institute of Technology | SM | 01/1986 | Computer Science
Massachusetts Institute of Technology | Ph.D. | 06/1990 | Computer Science
Massachusetts Institute of Technology | Postdoc | 06/1992 | Applied Mathematics
A. Personal Statement
Advances in modern biology revolve around automated data collection and sharing of the large resulting datasets. I am considered a pioneer in the area of bringing computer algorithms to the study of biological data, and a founder in this community that I have witnessed grow so profoundly over the last 26 years. I have made major contributions to many areas of computational biology and biomedicine, largely, though not exclusively through algorithmic innovations, as demonstrated by nearly twenty thousand citations to my scientific papers and widely-used software. In recognition of my success, I have just been elected to the National Academy of Sciences and in 2019 received the ISCB Senior Scientist Award, the pinnacle award in computational biology. My research group works on diverse challenges, including Computational Genomics, High-throughput Technology Analysis and Design, Biological Networks, Structural Bioinformatics, Population Genetics and Biomedical Privacy. I spearheaded research on analyzing large and complex biological data sets through topological and machine learning approaches; e.g.
- 
The Myth of Junk DNA. Jonathan Wells. Seattle: Discovery Institute Press, 2011. Description: According to a number of leading proponents of Darwin’s theory, “junk DNA,” the non-protein coding portion of DNA, provides decisive evidence for Darwinian evolution and against intelligent design, since an intelligent designer would presumably not have filled our genome with so much garbage. But in this provocative book, biologist Jonathan Wells exposes the claim that most of the genome is little more than junk as an anti-scientific myth that ignores the evidence, impedes research, and is based more on theological speculation than good science. Copyright Notice: Copyright © 2011 by Jonathan Wells. All Rights Reserved. Publisher’s Note: This book is part of a series published by the Center for Science & Culture at Discovery Institute in Seattle. Previous books include The Deniable Darwin by David Berlinski, In the Beginning and Other Essays on Intelligent Design by Granville Sewell, God and Evolution: Protestants, Catholics, and Jews Explore Darwin’s Challenge to Faith, edited by Jay Richards, and Darwin’s Conservatives: The Misguided Quest by John G. West. Library Cataloging Data: The Myth of Junk DNA by Jonathan Wells (1942– ). Illustrations by Ray Braun. 174 pages, 6 x 9 x 0.4 inches & 0.6 lb, 229 x 152 x 10 mm. & 0.26 kg. Library of Congress Control Number: 2011925471. BISAC: SCI029000 SCIENCE / Life Sciences / Genetics & Genomics. BISAC: SCI027000 SCIENCE / Life Sciences / Evolution. ISBN-13: 978-1-9365990-0-4 (paperback). Publisher Information: Discovery Institute Press, 208 Columbia Street, Seattle, WA 98104. Internet: http://www.discoveryinstitutepress.com/ Published in the United States of America on acid-free paper.
- 
BIOGRAPHICAL SKETCH
NAME: Bonnie Berger
POSITION TITLE: Simons Professor of Mathematics and Professor of Electrical Engineering & Computer Science
EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, include postdoctoral training and residency training if applicable. Add/delete rows as necessary.)
INSTITUTION AND LOCATION | DEGREE (if applicable) | Completion Date MM/YYYY | FIELD OF STUDY
Brandeis University, Waltham, MA | AB | 06/1983 | Computer Science
Massachusetts Institute of Technology | SM | 01/1986 | Computer Science
Massachusetts Institute of Technology | Ph.D. | 06/1990 | Computer Science
Massachusetts Institute of Technology | Postdoc | 06/1992 | Applied Mathematics
A. Personal Statement
Many advances in modern biology revolve around automated data collection and the large resulting data sets. I am considered a pioneer in the area of bringing computer algorithms to the study of biological data, and a founder in this community that I have witnessed grow so profoundly over the last 20 years. I have made major contributions to many areas of computational biology and biomedicine, largely, though not exclusively through algorithmic insights, as demonstrated by ten thousand citations to my scientific papers and widely-used software. My research group works on diverse challenges, including Computational Genomics, Structural Bioinformatics, High-throughput Technology Analysis and Design, Network Inference, and Data Privacy. We collaborate closely with biologists, MDs, and software engineers, implementing these new techniques in order to design experiments to maximally leverage the power of computation for biological exploration.
Over the past five years I have been particularly active analyzing large and complex biological data sets; for example, my lab has played integral roles in modENCODE (non-coding RNA analysis), MPEG (biological data compression standard), and the Broad Institute’s sequence analysis efforts.
- 
Genome Informatics. Joint Cold Spring Harbor Laboratory/Wellcome Trust Conference, GENOME INFORMATICS, September 15–September 19, 2010. Arranged by Inanc Birol, BC Cancer Agency, Canada; Michele Clamp, BioTeam, Inc.; James Kent, University of California, Santa Cruz, USA.
SCHEDULE AT A GLANCE
Wednesday 15th September 2010: 17.00-17.30 Registration – finger buffet dinner served from 17.30-19.30; 19.30-20.50 Session 1: Epigenomics and Gene Regulation; 20.50-21.10 Break; 21.10-22.30 Session 1, continued.
Thursday 16th September 2010: 07.30-09.00 Breakfast; 09.00-10.20 Session 2: Population and Statistical Genomics; 10.20-10.40 Morning Coffee; 10.40-12.00 Session 2, continued; 12.00-14.00 Lunch; 14.00-15.20 Session 3: Environmental and Medical Genomics; 15.20-15.40 Break; 15.40-17.00 Session 3, continued; 17.00-19.00 Poster Session I and Drinks Reception; 19.00-21.00 Dinner.
Friday 17th September 2010: 07.30-09.00 Breakfast; 09.00-10.20 Session 4: Databases, Data Mining, Visualization and Curation; 10.20-10.40 Morning Coffee; 10.40-12.00 Session 4, continued; 12.00-14.00 Lunch; 14.00-16.00 Free afternoon; 16.00-17.00 Keynote Speaker: Alex Bateman; 17.00-19.00 Poster Session II and Drinks Reception; 19.00-21.00 Dinner.
Saturday 18th September 2010: 07.30-09.00 Breakfast; 09.00-10.20 Session 5: Sequencing Pipelines and Assembly; 10.20-10.40
- 
BIOINFORMATICS, Vol. 30, ISMB 2014, pages i3–i8. doi:10.1093/bioinformatics/btu322
ISMB 2014 PROCEEDINGS PAPERS COMMITTEE
PROCEEDINGS PAPERS COMMITTEE CHAIRS: Serafim Batzoglou, Stanford University, United States; Russell Schwartz, Carnegie Mellon University, Pittsburgh, United States.
PROCEEDINGS PAPERS-AREA CHAIRS
A. Applied Bioinformatics: Thomas Lengauer, Max Planck Institute for Informatics, Saarbrucken, Germany; Lenore Cowen, Tufts University, Medford, United States.
B. Bioimaging and Data Visualization: Robert Murphy, Carnegie Mellon University, Pittsburgh, United States.
C. Databases and Ontologies and Text Mining: Hagit Shatkay, University of Delaware, Newark, United States; Alex Bateman, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom.
F. Gene Regulation and Transcriptomics: Alexander Hartemink, Duke University, Durham, United States; Zohar Yakhini, Agilent, Haifa, Israel.
G. Mass Spectrometry and Proteomics: Olga Vitek, Purdue University, West Lafayette, United States; Bill Noble, University of Washington, Seattle, United States.
H. Metabolic Networks: Jason Papin, University of Virginia, Charlottesville, United States.
I. Population Genomics: Eran Halperin, Tel-Aviv University, Israel; Itsik Pe’er, Columbia University, New York, United States.
J. Protein Interactions and Molecular Networks: Mona Singh, Princeton University, United States; Trey Ideker, UC San Diego, United States.
K. Protein Structure and Function: Jie Liang, University of Illinois at Chicago, United States; Jinbo Xu, Toyota Technical Institute,
- 
Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation. OVERVIEW. Received: 16 July 2019 | Accepted: 19 July 2019. DOI: 10.1002/humu.23876
Gaia Andreoletti (1) | Lipika R. Pal (2) | John Moult (2,3) | Steven E. Brenner (1,4)
(1) Department of Plant and Microbial Biology, University of California, Berkeley, California. (2) Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland. (3) Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland. (4) Center for Computational Biology, University of California, Berkeley, California.
Correspondence: John Moult, Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850. Email: [email protected]. Steven E. Brenner, Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720.
Abstract: Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'kā‐jē/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises
- 
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Article. Downloaded from genome.cshlp.org on October 3, 2021. Published by Cold Spring Harbor Laboratory Press. Elliott H. Margulies,2,7,8,21 Gregory M. Cooper,2,3,9 George Asimenos,2,10 Daryl J. Thomas,2,11,12 Colin N. Dewey,2,4,13 Adam Siepel,5,12 Ewan Birney,14 Damian Keefe,14 Ariel S. Schwartz,13 Minmei Hou,15 James Taylor,15 Sergey Nikolaev,16 Juan I. Montoya-Burgos,17 Ari Löytynoja,14 Simon Whelan,6,14 Fabio Pardi,14 Tim Massingham,14 James B. Brown,18 Peter Bickel,19 Ian Holmes,20 James C. Mullikin,8,21 Abel Ureta-Vidal,14 Benedict Paten,14 Eric A. Stone,9 Kate R. Rosenbloom,12 W. James Kent,11,12 NISC Comparative Sequencing Program,1,8,21 Baylor College of Medicine Human Genome Sequencing Center,1 Washington University Genome Sequencing Center,1 Broad Institute,1 UCSC Genome Browser Team,1 British Columbia Cancer Agency Genome Sciences Center,1 Stylianos E. Antonarakis,16 Serafim Batzoglou,10 Nick Goldman,14 Ross Hardison,22 David Haussler,11,12,24 Webb Miller,22 Lior Pachter,24 Eric D. Green,8,21 and Arend Sidow9,25. A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy).
- 
STRUCTURAL VARIATION DISCOVERY: THE EASY, THE HARD AND THE UGLY, by Fereydoun Hormozdiari. B.Sc., Sharif University of Technology, 2004; M.Sc., Simon Fraser University, 2007. A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the School of Computing Science, Faculty of Applied Science. © Fereydoun Hormozdiari 2011, SIMON FRASER UNIVERSITY, Fall 2011. All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for Fair Dealing. Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately. APPROVAL. Name: Fereydoun Hormozdiari. Degree: Doctor of Philosophy. Title of thesis: Structural Variation Discovery: the easy, the hard and the ugly. Examining Committee: Dr. Ramesh Krishnamurti, Chair; Dr. S. Cenk Sahinalp, Senior Supervisor, Computing Science, Simon Fraser University; Dr. Evan E. Eichler, Supervisor, Genome Sciences, University of Washington; Dr. Artem Cherkasov, Supervisor, Urologic Sciences, University of British Columbia; Dr. Inanc Birol, Supervisor, Bioinformatics group leader, GSC, BC Cancer Agency; Dr. Fiona Brinkman, Internal Examiner, Molecular Biology and Chemistry, Simon Fraser University; Dr. Serafim Batzoglou, External Examiner, Computer Science, Stanford University. Date Approved: August 22nd, 2011. Abstract: Comparison of human genomes shows that along with single nucleotide polymorphisms and small indels, larger structural variants (SVs) are common. Recent studies even suggest that more base pairs are altered as a result of structural variations (including copy number variations) than as a result of single nucleotide variations or small indels.
- 
ABSTRACT. Title of dissertation: GENOME ASSEMBLY AND VARIANT DETECTION USING EMERGING SEQUENCING TECHNOLOGIES AND GRAPH BASED METHODS. Jay Ghurye, Doctor of Philosophy, 2018. Dissertation directed by: Professor Mihai Pop, Department of Computer Science. The increased availability of genomic data and the increased ease and lower costs of DNA sequencing have revolutionized biomedical research. One of the critical steps in most bioinformatics analyses is the assembly of the genome sequence of an organism using the data generated from the sequencing machines. Despite the long length of sequences generated by third-generation sequencing technologies (tens of thousands of basepairs), the automated reconstruction of entire genomes continues to be a formidable computational task. Although long read technologies help in resolving highly repetitive regions, the contigs generated from long read assembly do not always span a complete chromosome or even an arm of the chromosome. Recently, new genomic technologies have been developed that can “bridge” across repeats or other genomic regions that are difficult to sequence or assemble and improve genome assemblies by “scaffolding” together large segments of the genome. The problem of scaffolding is vital in the context of both single genome assembly of large eukaryotic genomes and in metagenomics where the goal is to assemble multiple bacterial genomes in a sample simultaneously. First, we describe SALSA2, a method we developed to use interaction frequency between any two loci in the genome obtained using Hi-C technology to scaffold fragmented eukaryotic genome assemblies into chromosomes. SALSA2 can be used with either short or long read assembly to generate highly contiguous and accurate chromosome level assemblies.
- 
ENCODE MSA.pdf. Downloaded from www.genome.org on June 14, 2007. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Elliott H. Margulies, Gregory M. Cooper, George Asimenos, Daryl J. Thomas, Colin N. Dewey, Adam Siepel, Ewan Birney, Damian Keefe, Ariel S. Schwartz, Minmei Hou, James Taylor, Sergey Nikolaev, Juan I. Montoya-Burgos, Ari Löytynoja, Simon Whelan, Fabio Pardi, Tim Massingham, James B. Brown, Peter Bickel, Ian Holmes, James C. Mullikin, Abel Ureta-Vidal, Benedict Paten, Eric A. Stone, Kate R. Rosenbloom, W. James Kent, Gerard G. Bouffard, Xiaobin Guan, Nancy F. Hansen, Jacquelyn R. Idol, Valerie V.B. Maduro, Baishali Maskeri, Jennifer C. McDowell, Morgan Park, Pamela J. Thomas, Alice C. Young, Robert W. Blakesley, Donna M. Muzny, Erica Sodergren, David A. Wheeler, Kim C. Worley, Huaiyang Jiang, George M. Weinstock, Richard A. Gibbs, Tina Graves, Robert Fulton, Elaine R. Mardis, Richard K. Wilson, Michele Clamp, James Cuff, Sante Gnerre, David B. Jaffe, Jean L. Chang, Kerstin Lindblad-Toh, Eric S. Lander, Angie Hinrichs, Heather Trumbower, Hiram Clawson, Ann Zweig, Robert M. Kuhn, Galt Barber, Rachel Harte, Donna Karolchik, Matthew A. Field, Richard A. Moore, Carrie A. Matthewson, Jacqueline E. Schein, Marco A. Marra, Stylianos E. Antonarakis, Serafim Batzoglou, Nick Goldman, Ross Hardison, David Haussler, Webb Miller, Lior Pachter, Eric D. Green and Arend Sidow. Genome Res. 2007 17: 760-774. Access the most recent version at doi:10.1101/gr.6034307. Supplementary data: "Supplemental Reseach Data" http://www.genome.org/cgi/content/full/17/6/760/DC1. References: This article cites 68 articles, 33 of which can be accessed free at: http://www.genome.org/cgi/content/full/17/6/760#References. Article cited in: http://www.genome.org/cgi/content/full/17/6/760#otherarticles. Open Access: Freely available online through the Genome Research Open Access option.
- 
ALGORITHMS FOR BIOLOGICAL NETWORK ALIGNMENT. A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. Jason Flannick, August 2009. © Copyright by Jason Flannick 2009. All Rights Reserved. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Serafim Batzoglou) Principal Adviser. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Harley McAdams) I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (David Dill) Approved for the University Committee on Graduate Studies. Abstract: A major goal in the post-genomic era is to understand how genes and proteins organize to ultimately cause complex traits. There are multiple levels of biological organization, such as low-level interactions between pairs of molecules, higher-level metabolic pathways and molecular complexes, and ultimately high-level function of an organism. Interaction networks summarize interactions between pairs of molecules, which are the building blocks for higher levels of molecular organization. As databases of interaction networks continue to grow in size and complexity, new computational tools are needed to search them for these higher levels of organization.