ABSTRACT GENOME ASSEMBLY and VARIANT DETECTION USING EMERGING SEQUENCING TECHNOLOGIES and GRAPH BASED METHODS Jay Ghurye Doctor

Total Page:16

File Type:pdf, Size:1020Kb

ABSTRACT GENOME ASSEMBLY and VARIANT DETECTION USING EMERGING SEQUENCING TECHNOLOGIES and GRAPH BASED METHODS Jay Ghurye Doctor ABSTRACT Title of dissertation: GENOME ASSEMBLY AND VARIANT DETECTION USING EMERGING SEQUENCING TECHNOLOGIES AND GRAPH BASED METHODS Jay Ghurye Doctor of Philosophy, 2018 Dissertation directed by: Professor Mihai Pop Department of Computer Science The increased availability of genomic data and the increased ease and lower costs of DNA sequencing have revolutionized biomedical research. One of the critical steps in most bioinformatics analyses is the assembly of the genome sequence of an organ- ism using the data generated from the sequencing machines. Despite the long length of sequences generated by third-generation sequencing technologies (tens of thousands of basepairs), the automated reconstruction of entire genomes continues to be a formidable computational task. Although long read technologies help in resolving highly repetitive regions, the contigs generated from long read assembly do not always span a complete chromosome or even an arm of the chromosome. Recently, new genomic technologies have been developed that can “bridge” across repeats or other genomic regions that are difficult to sequence or assemble and improve genome assemblies by “scaffolding” to- gether large segments of the genome. The problem of scaffolding is vital in the context of both single genome assembly of large eukaryotic genomes and in metagenomics where the goal is to assemble multiple bacterial genomes in a sample simultaneously. First, we describe SALSA2, a method we developed to use interaction frequency between any two loci in the genome obtained using Hi-C technology to scaffold frag- mented eukaryotic genome assemblies into chromosomes. SALSA2 can be used with either short or long read assembly to generate highly contiguous and accurate chromo- some level assemblies. Hi-C data are known to introduce small inversion errors in the assembly, so we included the assembly graph in the scaffolding process and used the sequence overlap information to correct the orientation errors. Next, we present our contributions to metagenomics. We developed a scaffolding and variant detection method “MetaCarvel” for metagenomic datasets. Several factors such as the presence of inter-genomic repeats, coverage ambiguities, and polymorphic regions in the genomes complicate the task of scaffolding metagenomes. Variant detec- tion is also tricky in metagenomes because the different genomes within these complex samples are not known beforehand. We showed that MetaCarvel was able to generate accurate scaffolds and find genome-wide variations de novo in metagenomic datasets. Finally, we present EDIT, a tool for clustering millions of DNA sequence fragments originating from the highly conserved 16s rRNA gene in bacteria. We extended classi- cal Four Russians’ speed up to banded sequence alignment and showed that our method clusters highly similar sequences efficiently. This method can also be used to remove duplicates or near duplicate sequences from a dataset. With the increasing data being generated in different genomic and metagenomic studies using emerging sequencing technologies, our software tools and algorithms are well timed with the need of the community. GENOME ASSEMBLY AND VARIANT DETECTION USING EMERGING SEQUENCING TECHNOLOGIES AND GRAPH BASED METHODS by Jay Ghurye Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2018 Advisory Committee: Professor Mihai Pop, Chair/Advisor Dr. Adam M. Phillippy, Professor Hector´ Corrada Bravo, Professor Aravind Srinivasan, Professor Max Leiserson, Professor Michael Cummings, Dean’s Representative c Copyright by Jay Ghurye 2018 Preface The algorithms, software, and results in this dissertation have either been published in peer-reviewed journals and conferences or are currently under preparation for submis- sion. At the time of this writing, Chapters 2, 4, 5, and 7 have already been published or submitted for publication and are reformatted here. Chapter 3 and 6 are under preparation for submission. I am indebted to my co-authors on these projects - their dedication and knowledge in the areas of computer science, statistics, and biology have resulted in much stronger scientific papers. • Chapter 1 - Jay Ghurye, Victoria Cepeda-Espinoza, and Mihai Pop. “Focus: Microbiome: Metagenomic Assembly: Overview, Challenges and Applications.” The Yale jour- nal of biology and medicine 89.3 (2016): 353. My contributions to these works include (1) surveying the literature for recent and seminal results, and (2) writing the manuscripts. • Chapter 2 - Jay Ghurye and Mihai Pop. “Algorithms and technologies for uncovering genome structure,” Under review. My contributions to these works include (1) surveying the literature for recent and seminal results, and (2) writing the manuscripts. • Chapter 3 - Jay Ghurye, Mihai Pop, Sergey Koren, Derek Bickhart, and Chen-Shan Chin. ii “Scaffolding of long read assemblies using long range contact information.” BMC genomics 18, no. 1 (2017): 527. - Jay Ghurye, Arang Rhie, Brian P. Walenz, Anthony Schmitt, Siddarth Selvaraj, Mihai Pop, Adam M. Phillippy, and Sergey Koren. “Integrating Hi-C links with assembly graphs for chromosome-scale assembly.” bioRxiv (2018): 261149. My contributions to these works include (1) design and implementation of the al- gorithm, (2) doing software evaluation, and (3) writing the manuscript. • Chapter 4 - Jay Ghurye, Sergey Koren, Arang Rhie, Brian Walenz, Siddarth Selvaraj, An- thony Schmitt and Adam Phillippy “Effect of different Hi-C library preparation on eukaryotic genome scaffolding”, Under preparation My contributions to these works include (1) designing and running experiments, (2) writing the manuscript. • Chapter 5 - Jay Ghurye, and Mihai Pop. “Better Identification of Repeats in Metagenomic Scaffolding.” In International Workshop on Algorithms in Bioinformatics, pp. 174- 184. Springer, Cham, 2016. My contributions to these works include (1) design and implementation of the al- gorithm, (2) doing software evaluation, (3) writing the manuscript. • Chapter 6 - Jay Ghurye, Todd Treangen, Sergey Koren, Marcus Fedarko, W. Judson Hervey iii IV and Mihai Pop “MetaCarvel: linking assembly graph motifs to biological vari- ants”, Under preparation. My contributions to these works include (1) design and implementation of the al- gorithm, (2) doing software evaluation, and (3) writing the manuscript. • Chapter 7 - Brian Brubach∗, Jay Ghurye∗, Mihai Pop and Aravind Srinivasan, 2017. “Bet- ter Greedy Sequence Clustering with Fast Banded Alignment”. In LIPIcs-Leibniz International Proceedings in Informatics (Vol. 88). Schloss Dagstuhl-Leibniz- Zentrum fuer Informatik. (∗ denotes equal contribution) My contributions to these works include (1) design and implementation of the al- gorithm, (2) doing software evaluation, and (3) writing the manuscript. Here is the list of other publications I have been involved, either as a first author or a contributing author. • Jay Ghurye, Gautier Krings and Vanessa Frias-Martinez, 2016, June. “A Frame- work to Model Human Behavior at Large Scale during Natural Disasters”. In Mo- bile Data Management (MDM), 2016 17th IEEE International Conference on (Vol. 1, pp. 18-27). IEEE. • Brian Brubach∗, Jay Ghurye∗, 2018. “A Succinct Four Russian’s Speedup for Edit Distance Computation and One-against-many Banded Alignment”. In LIPIcs- Leibniz International Proceedings in Informatics (Vol. 105). Schloss Dagstuhl- Leibniz-Zentrum fuer Informatik (∗ denotes equal contribution). iv • Jay Ghurye, Sergey Koren, Scott Small, Adam Phillippy, and Nora Besansky. “De novo assembly of Anopheles funestus genome”. (Under preparation) • - Marcus Fedarko, Jay Ghurye, Todd Treagen, and Mihai Pop. “MetagenomeScope: Web-Based Hierarchical Visualization of Metagenome Assembly Graphs.” (2017): 630-632. (Under submission) • Nathan D. Olson, Todd J. Treangen, Christopher M. Hill, Victoria Cepeda-Espinoza, Jay Ghurye, Sergey Koren, and Mihai Pop. “Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.” Briefings in bioinformatics (2017). • Seth Commichaux, Nidhi Shah, Alexander Stoppel, Jay Ghurye, Michael Cum- mings and Mihai Pop. “A Critical Analysis of the Integrated Gene Catalog,” (Under preparation) v Dedication To my parents for their unconditional love and support vi Acknowledgments I owe my gratitude to all the people who have made this thesis possible and because of whom my graduate experience has been the one that I will cherish forever. First, I would like to thank my advisor, Professor Mihai Pop for giving me an in- valuable opportunity to work on challenging and exciting projects. It has been a pleasure to work with and learn from such an extraordinary individual. His advice has always helped me at various stages in the graduate program and helped me grow as a researcher and better person. I would also like to thank all of my other committee members - Michael Cum- mings, Aravind Srinivasan, Hector´ Corrada Bravo, Max Leiserson, and Adam Phillippy for being interested in my research. Your feedback has been instrumental in shaping this dissertation. I would like to thank all my collaborators who contributed to the work described in this dissertation. Primarily, I would want to thank Adam Phillippy for giving me an opportunity to work on exciting projects with his group at NIH. A special thank you to Sergey Koren and Arang Rhie for continuous
Recommended publications
  • ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus
    ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus Journal Information Article/Issue Information Journal ID (nlm-ta): F1000Res Self URI: f1000research-4-6464.pdf Journal ID (iso-abbrev): F1000Res Date accepted: 13 January 2015 Journal ID (pmc): F1000Research Publication date (electronic): 15 January 2015 Title: F1000Research Publication date (collection): 2015 ISSN (electronic): 2046-1402 Volume: 4 Publisher: F1000Research (London, UK) Electronic Location Identifier: 12 Article Id (accession): PMC4457108 Article Id (pmcid): PMC4457108 Article Id (pmc-uid): 4457108 PubMed ID: 26097686 DOI: 10.12688/f1000research.6038.1 Funding: The author(s) declared that no grants were involved in supporting this work. Categories Subject: Editorial Categories Subject: Articles Subject: Bioinformatics Subject: Theory & Simulation Subject: Tropical & Travel-Associated Diseases Subject: Viral Infections (without HIV) Subject: Virology ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus v1; ref status: not peer reviewed Peter D. Karp Bonnie Berger Diane Kovats Thomas Lengauer Michal Linial Pardis Sabeti Winston Hide Burkhard Rost 1 1International Society for Computational Biology, La Jolla, CA, USA 2 2SRI International, Menlo Park, CA, USA 3 3Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA 4 4Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbruecken, Germany 5 5Hebrew University & Institute of Advanced
    [Show full text]
  • Classifying Transport Proteins Using Profile Hidden Markov Models And
    Classifying Transport Proteins Using Profile Hidden Markov Models and Specificity Determining Sites Qing Ye A Thesis in The Department of Computer Science and Software Engineering Presented in Partial Fulfillment of the Requirements for the Degree of Master of Computer Science (MCompSc) at Concordia University Montréal, Québec, Canada April 2019 ⃝c Qing Ye, 2019 CONCORDIA UNIVERSITY School of Graduate Studies This is to certify that the thesis prepared By: Qing Ye Entitled: Classifying Transport Proteins Using Profile Hidden Markov Models and Specificity Determining Sites and submitted in partial fulfillment of the requirements for the degree of Master of Computer Science (MCompSc) complies with the regulations of this University and meets the accepted standards with respect to originality and quality. Signed by the Final Examining Committee: Chair Dr. T.-H. Chen Examiner Dr. T. Glatard Examiner Dr. A. Krzyzak Supervisor Dr. G. Butler Approved by Martin D. Pugh, Chair Department of Computer Science and Software Engineering 2019 Amir Asif, Dean Faculty of Engineering and Computer Science Abstract Classifying Transport Proteins Using Profile Hidden Markov Models and Specificity Determining Sites Qing Ye This thesis develops methods to classifiy the substrates transported across a membrane by a given transmembrane protein. Our methods use tools that predict specificity determining sites (SDS) after computing a multiple sequence alignment (MSA), and then building a profile Hidden Markov Model (HMM) using HMMER. In bioinformatics, HMMER is a set of widely used applications for sequence analysis based on profile HMM. Specificity determining sites (SDS) are the key positions in a protein sequence that play a crucial role in functional variation within the protein family during the course of evolution.
    [Show full text]
  • BIOGRAPHICAL SKETCH NAME: Berger
    BIOGRAPHICAL SKETCH NAME: Berger, Bonnie eRA COMMONS USER NAME (credential, e.g., agency login): BABERGER POSITION TITLE: Simons Professor of Mathematics and Professor of Electrical Engineering and Computer Science EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, include postdoctoral training and residency training if applicable. Add/delete rows as necessary.) EDUCATION/TRAINING DEGREE Completion (if Date FIELD OF STUDY INSTITUTION AND LOCATION applicable) MM/YYYY Brandeis University, Waltham, MA AB 06/1983 Computer Science Massachusetts Institute of Technology SM 01/1986 Computer Science Massachusetts Institute of Technology Ph.D. 06/1990 Computer Science Massachusetts Institute of Technology Postdoc 06/1992 Applied Mathematics A. Personal Statement Advances in modern biology revolve around automated data collection and sharing of the large resulting datasets. I am considered a pioneer in the area of bringing computer algorithms to the study of biological data, and a founder in this community that I have witnessed grow so profoundly over the last 26 years. I have made major contributions to many areas of computational biology and biomedicine, largely, though not exclusively through algorithmic innovations, as demonstrated by nearly twenty thousand citations to my scientific papers and widely-used software. In recognition of my success, I have just been elected to the National Academy of Sciences and in 2019 received the ISCB Senior Scientist Award, the pinnacle award in computational biology. My research group works on diverse challenges, including Computational Genomics, High-throughput Technology Analysis and Design, Biological Networks, Structural Bioinformatics, Population Genetics and Biomedical Privacy. I spearheaded research on analyzing large and complex biological data sets through topological and machine learning approaches; e.g.
    [Show full text]
  • Using Bayesian Networks to Analyze Expression Data
    JOURNAL OFCOMPUTATIONAL BIOLOGY Volume7, Numbers 3/4,2000 MaryAnn Liebert,Inc. Pp. 601–620 Using Bayesian Networks to Analyze Expression Data NIR FRIEDMAN, 1 MICHAL LINIAL, 2 IFTACH NACHMAN, 3 andDANA PE’ER 1 ABSTRACT DNAhybridization arrayssimultaneously measurethe expression level forthousands of genes.These measurementsprovide a “snapshot”of transcription levels within the cell. Ama- jorchallenge in computationalbiology is touncover ,fromsuch measurements,gene/ protein interactions andkey biological features ofcellular systems. In this paper,wepropose a new frameworkfor discovering interactions betweengenes based onmultiple expression mea- surements. This frameworkbuilds onthe use of Bayesian networks forrepresenting statistical dependencies. ABayesiannetwork is agraph-basedmodel of joint multivariateprobability distributions thatcaptures properties ofconditional independence betweenvariables. Such models areattractive for their ability todescribe complexstochastic processes andbecause theyprovide a clear methodologyfor learning from(noisy) observations.We start byshowing howBayesian networks can describe interactions betweengenes. W ethen describe amethod forrecovering gene interactions frommicroarray data using tools forlearning Bayesiannet- works.Finally, we demonstratethis methodon the S.cerevisiae cell-cycle measurementsof Spellman et al. (1998). Key words: geneexpression, microarrays, Bayesian methods. 1.INTRODUCTION centralgoal of molecularbiology isto understand the regulation of protein synthesis and its Areactionsto external
    [Show full text]
  • Download Link at Figshare.Com/Articles/Bint Fishmovie32 100 Mat/5009840)
    Distribution Agreement In presenting this thesis or dissertation as a partial fulfillment of the requirements for an advanced degree from Emory University, I hereby grant to Emory University and its agents the non-exclusive license to archive, make accessible, and display my thesis or dissertation in whole or in part in all forms of media, now or hereafter known, including display on the world wide web. I understand that I may select some access restrictions as part of the online submission of this thesis or dissertation. I retain all ownership rights to the copyright of the thesis or dissertation. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. Signature: Joseph L. Natale Date Inference, Dynamics, and Coarse-Graining of Large-Scale Biological Networks By Joseph L. Natale Doctor of Philosophy Physics Ilya Nemenman, Ph.D. Advisor Gordon Berman, Ph.D. Committee Member Avani Gadani, Ph.D. Committee Member H. George E. Hentschel, Ph.D. Committee Member Daniel Weissman, Ph.D. Committee Member Accepted: Lisa A. Tedesco, Ph.D. Dean of the James T. Laney School of Graduate Studies Date Inference, Dynamics, and Coarse-Graining of Large-Scale Biological Networks By Joseph L. Natale B.S., Stevens Institute of Technology, NJ, 2012 M.S., Stevens Institute of Technology, NJ, 2013 Advisor: Ilya Nemenman, Ph.D. An abstract of A dissertation submitted to the Faculty of the James T. Laney School of Graduate Studies of Emory University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Physics 2020 Inference, Dynamics, and Coarse-Graining of Large-Scale Biological Networks By Joseph L.
    [Show full text]
  • Dear Delegates,History of Productive Scientific Discussions of New Challenging Ideas and Participants Contributing from a Wide Range of Interdisciplinary fields
    3rd IS CB S t u d ent Co u ncil S ymp os ium Welcome To The 3rd ISCB Student Council Symposium! Welcome to the Student Council Symposium 3 (SCS3) in Vienna. The ISCB Student Council's mis- sion is to develop the next generation of computa- tional biologists. We would like to thank and ac- knowledge our sponsors and the ISCB organisers for their crucial support. The SCS3 provides an ex- citing environment for active scientific discussions and the opportunity to learn vital soft skills for a successful scientific career. In addition, the SCS3 is the biggest international event targeted to students in the field of Computational Biology. We would like to thank our hosts and participants for making this event educative and fun at the same time. Student Council meetings have had a rich Dear Delegates,history of productive scientific discussions of new challenging ideas and participants contributing from a wide range of interdisciplinary fields. Such meet- We are very happy to welcomeings have you proved all touseful the in ISCBproviding Student students Council and postdocs Symposium innovative inputsin Vienna. and an Afterincreased the network suc- cessful symposiums at ECCBof potential 2005 collaborators. in Madrid and at ISMB 2006 in Fortaleza we are determined to con- tinue our efforts to provide an event for students and young researchers in the Computational Biology community. Like in previousWe ar yearse extremely our excitedintention to have is toyou crhereatee and an the opportunity vibrant city of Vforienna students welcomes to you meet to our their SCS3 event. peers from all over the world for exchange of ideas and networking.
    [Show full text]
  • Ribonucleotide Reductase Genes Influence The
    RIBONUCLEOTIDE REDUCTASE GENES INFLUENCE THE BIOLOGY AND ECOLOGY OF MARINE VIRUSES by Amelia O. Harrison A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Master of Science in Marine Studies Fall 2019 c 2019 Amelia O. Harrison All Rights Reserved RIBONUCLEOTIDE REDUCTASE GENES INFLUENCE THE BIOLOGY AND ECOLOGY OF MARINE VIRUSES by Amelia O. Harrison Approved: K. Eric Wommack, Ph.D. Professor in charge of thesis on behalf of the Advisory Committee Approved: Mark A. Moline, Ph.D. Director of the School of Marine Science and Policy Approved: Estella A. Atekwana, Ph.D. Dean of the College of Earth, Ocean, and Environment Approved: Douglas J. Doren, Ph.D. Interim Vice Provost for Graduate and Professional Education and Dean of the Graduate College ACKNOWLEDGMENTS First, I would like to thank my advisor, Eric Wommack, for taking a chance on me as a freshman undergrad and again as a masters student. It has been a privilege to work with someone so passionate and excited about science. I would also like to thank Shawn Polson, who has also supported me and helped foster my love for research over the past six years. I would like to thank the final member of my committee, Jennifer Biddle, for her insight and general awesomeness. I am deeply thankful for all of my past and current labmates. Barb Ferrell who somehow always has time for everyone. Prasanna Joglekar, for always making me laugh. Rachel Marine, who took me on as an intern despite not having the time.
    [Show full text]
  • Methods for Global Organization of All Known Protein Sequences
    Metho ds for Global Organization of all Known Protein Sequences Thesis submitted for the degree \Do ctor of Philosophy" Golan Yona Submitted to the Senate of the Hebrew University in the year 1999 1 This work was carried out under the sup ervision of Prof. Nathan Linial, Prof. Naftali Tishby and Dr. Michal Linial. 2 Acknowledgments First, I would like to thank my advisors: Nati Linial and Tali Tishby of the computer science department and Michal Linial of the life science department at the Hebrew University. Within the last ve years we have come a long way together, and I'm grateful for their guidance, supp ort, warmth and generous help all these years. I would like to thank my colleagues and friends in the machine learning lab, with whom I sp ent most of the last ve years: Itay Gat, Lidror Troyansky, Elad Schneidman, Shlomo Dubnov, Shai Fine, Noam Slonim, Ofer Neiman and Ran El-Yaniv, for all their help and for b eing great company. A sp ecial thanks to Ran El-Yaniv, a dear friend who encouraged me to keep b elieving in myself. I very much enjoyed working with Gill Bejerano, with whom I am still working on fascinating new directions. I would next like to thank all of my colleagues and friends in the molecular genetics and biotechnology lab headed by Hanah Margalit: Ora Schueler, Eyal Nadir, Iddo Friedb erg, Yael Mandel and Yael Altuvia, for the great time I had every Thursday in our journal club group meetings. Besides the many pap ers I had in my bag after each such meeting, it was also great fun.
    [Show full text]
  • Shsearch: a Method for Fast Remote Homology Detection Baddar M
    SHsearch: a method for fast remote homology detection Baddar M. Faculty of Engineering, Alexandria University , Egypt Abstract—Remote homology detection is the problem of detecting homology in cases of low sequence similarity. It is a hard ​ computational problem with no approach that works well in all cases. Methods based on profile hidden Markov models (HMM) often exhibit relatively higher sensitivity for detecting remote homologies than commonly used approaches. However, calculating similarity scores in profile HMM methods is computationally intensive as they use dynamic programming algorithms. In this paper we introduce SHsearch: a new method for remote protein homology detection. Our method is implemented as a modification of HHsearch: a remote protein homology detection method based on comparing two profile HMMs. The motivation for modification was to reduce the run time of HHsearch significantly with minimal sensitivity loss. SHsearch focuses on comparing the important submodels of the query and database HMMs instead of comparing the complete models. Hence, SHsearch achieves a significant speedup over HHsearch with minimal loss in sensitivity. On SCOP 1.63, SHsearch achieved 88X speedup with 8.2% loss in sensitivity with respect to HHsearch at error rate of 10%, which deemed to be an acceptable tradeoff. Index Terms—biological sequence classification, hidden Markov models ​ —————————— ◆ —————————— ​ ​ ​ ​ 1 INTRODUCTION Analysis of large scale sequence data has become an important task in computational biology and comparative genomics, inspired in part by numerous scientific and technological applications such as the biomedical literature analysis or the analysis of biological sequences. Protein homology detection has attracted particular interest due to its critical role in many biological applications.
    [Show full text]
  • The Myth of Junk DNA
    The Myth of Junk DNA JoATN h A N W ells s eattle Discovery Institute Press 2011 Description According to a number of leading proponents of Darwin’s theory, “junk DNA”—the non-protein coding portion of DNA—provides decisive evidence for Darwinian evolution and against intelligent design, since an intelligent designer would presumably not have filled our genome with so much garbage. But in this provocative book, biologist Jonathan Wells exposes the claim that most of the genome is little more than junk as an anti-scientific myth that ignores the evidence, impedes research, and is based more on theological speculation than good science. Copyright Notice Copyright © 2011 by Jonathan Wells. All Rights Reserved. Publisher’s Note This book is part of a series published by the Center for Science & Culture at Discovery Institute in Seattle. Previous books include The Deniable Darwin by David Berlinski, In the Beginning and Other Essays on Intelligent Design by Granville Sewell, God and Evolution: Protestants, Catholics, and Jews Explore Darwin’s Challenge to Faith, edited by Jay Richards, and Darwin’s Conservatives: The Misguided Questby John G. West. Library Cataloging Data The Myth of Junk DNA by Jonathan Wells (1942– ) Illustrations by Ray Braun 174 pages, 6 x 9 x 0.4 inches & 0.6 lb, 229 x 152 x 10 mm. & 0.26 kg Library of Congress Control Number: 2011925471 BISAC: SCI029000 SCIENCE / Life Sciences / Genetics & Genomics BISAC: SCI027000 SCIENCE / Life Sciences / Evolution ISBN-13: 978-1-9365990-0-4 (paperback) Publisher Information Discovery Institute Press, 208 Columbia Street, Seattle, WA 98104 Internet: http://www.discoveryinstitutepress.com/ Published in the United States of America on acid-free paper.
    [Show full text]
  • BIOGRAPHICAL SKETCH NAME: Bonnie Berger POSITION TITLE
    BIOGRAPHICAL SKETCH NAME: Bonnie Berger POSITION TITLE: Simons Professor of Mathematics and Professor of Electrical Engineering & Computer Science EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, include postdoctoral training and residency training if applicable. Add/delete rows as necessary.) DEGREE Completion (if Date FIELD OF STUDY INSTITUTION AND LOCATION applicable) MM/YYYY Brandeis University, Waltham, MA AB 06/1983 Computer Science Massachusetts Institute of Technology SM 01/1986 Computer Science Massachusetts Institute of Technology Ph.D. 06/1990 Computer Science Massachusetts Institute of Technology Postdoc 06/1992 Applied Mathematics A. Personal Statement Many advances in modern biology revolve around automated data collection and the large resulting data sets. I am considered a pioneer in the area of bringing computer algorithms to the study of biological data, and a founder in this community that I have witnessed grow so profoundly over the last 20 years. I have made major contributions to many areas of computational biology and biomedicine, largely, though not exclusively through algorithmic insights, as demonstrated by ten thousand citations to my scientific papers and widely-used software. My research group works on diverse challenges, including and Computational Genomics, Structural Bioinformatics, High-throughput Technology Analysis and Design, Network Inference, and Data Privacy. We collaborate closely with biologists, MDs, and software engineers, implementing these new techniques in order to design experiments to maximally leverage the power of computation for biological exploration. Over the past five years I have been particularly active analyzing large and complex biological data sets; for example, my lab has played integral roles in modENCODE (non-coding RNA analysis), MPEG (biological data compression standard), and the Broad Institute’s sequence analysis efforts.
    [Show full text]
  • 4310586.Pdf (122.4Kb)
    ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Karp, Peter D., Bonnie Berger, Diane Kovats, Thomas Lengauer, Michal Linial, Pardis Sabeti, Winston Hide, and Burkhard Rost. 2015. “ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus.” PLoS Computational Biology 11 (1): e1004087. doi:10.1371/journal.pcbi.1004087. http:// dx.doi.org/10.1371/journal.pcbi.1004087. Published Version doi:10.1371/journal.pcbi.1004087 Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:14065320 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA MESSAGE FROM ISCB ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus Peter D. Karp1,2, Bonnie Berger1,3, Diane Kovats1*, Thomas Lengauer1,4, Michal Linial1,5, Pardis Sabeti6,7,8, Winston Hide9, Burkhard Rost1,10,11* 1 International Society for Computational Biology, La Jolla, California, United States of America, 2 SRI International, Menlo Park, California, United States of America, 3 Department of Mathematics, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, United States of America, 4 Computational Biology and Applied Algorithmics,
    [Show full text]