Characterization of Genomic Diversity at a Quantitative Disease Resistance Locus in Maize Using Improved Bioinformatic Tools


CHARACTERIZATION OF GENOMIC DIVERSITY AT A QUANTITATIVE DISEASE RESISTANCE LOCUS IN MAIZE USING IMPROVED BIOINFORMATIC TOOLS FOR TARGETED RESEQUENCING

by

Felix Francis

A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Systems Biology

Spring 2018

© 2018 Felix Francis
All Rights Reserved

Approved: Cathy H. Wu, Ph.D.
Chair of Bioinformatics & Computational Biology

Approved: Mark Rieger, Ph.D.
Dean of the College of Agriculture and Natural Resources

Approved: Ann L. Ardis, Ph.D.
Senior Vice Provost for Graduate and Professional Education

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Randall J. Wisser, Ph.D.
Professor in charge of dissertation

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: J. Antoni Rafalski, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Shawn W. Polson, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Blake C. Meyers, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Li Liao, Ph.D.
Member of dissertation committee

ACKNOWLEDGEMENTS

I would like to express my deep and sincere gratitude to my dissertation advisor, Dr. Randall J. Wisser, for the opportunity to pursue research in his lab. The experience has definitely helped me become a better scientist. In particular, I thank him for giving me the independence to explore my research ideas. Through this experience I have learned a lot, especially the importance of perseverance while dealing with challenging research problems.

I am extremely grateful to my dissertation committee members, Dr. Blake C. Meyers, Dr. J. Antoni Rafalski, Dr. Shawn W. Polson, and Dr. Li Liao, for their feedback and guidance, which greatly helped shape my research direction.

I thank the Wisser group for all the interesting discussions and for providing an enjoyable environment to learn. I greatly appreciate the assistance of Teclemariam Weldekidan, Michael Dumas and Scott Davis for various crucial validation and data generation work associated with this dissertation. I feel particularly lucky to have shared my time at the lab with Meredith Biedrzycki, Juliana Teixeira, Heather Manching and Terence Mhora, who continue to inspire me with their enthusiasm towards science and have been brilliant role models as early career scientists.

I would also like to thank the other collaborators on the NSF grant that provided the funding for this research, especially Dr. Rebecca Nelson and Dr. Tiffany Jamann, who provided valuable insights into the biological questions addressed in this dissertation.

I appreciate those who got me started in life and research, especially my parents, for their support and encouragement and for introducing me to scientific research. I thank my teachers and advisors during my Undergraduate and Masters programs for inspiring me to pursue science. I am especially thankful to my wife, Pratha Sah, for her support and patience throughout these six years and beyond; her encouragement and sacrifice made this happen.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1 INTRODUCTION
    1.1 Role of genomic diversity for crop improvement
    1.2 Challenges in plant genome sequencing projects
    1.3 Maize genomic diversity
    1.4 Complex traits and Quantitative disease resistance (QDR)

2 THERMOALIGN: A GENOME-AWARE PRIMER DESIGN TOOL FOR STANDARD PCR AND TILED AMPLICON RESEQUENCING
    2.1 Abstract
    2.2 Introduction
    2.3 Results
        2.3.1 Target Region Selection (TRS)
        2.3.2 Unique Oligo Design (UOD)
        2.3.3 Priming Specificity Evaluation (PSE)
        2.3.4 Primer Pair Selection (PPS)
        2.3.5 Empirical evaluation of priming specificity
    2.4 Discussion
    2.5 Methods
        2.5.1 ThermoAlign pipeline
        2.5.2 Target region selection (TRS)
        2.5.3 Unique oligonucleotide design (UOD)
        2.5.4 Priming specificity evaluation (PSE)
        2.5.5 Primer pair selection (PPS)
        2.5.6 PCR validation
        2.5.7 SMRT sequencing and analysis of long-range PCR amplicons
    2.6 Availability
    2.7 Acknowledgments
    2.8 Author contributions statement
    2.9 Additional information

3 CLUSTERING OF CIRCULAR CONSENSUS SEQUENCES: ACCURATE ERROR CORRECTION AND ASSEMBLY OF SINGLE MOLECULE REAL-TIME READS FROM MULTIPLEXED AMPLICON LIBRARIES
    3.1 Abstract
    3.2 Background
    3.3 Methods
        3.3.1 Sequence data
        3.3.2 Clustering of circular consensus sequences for long amplicon analysis
        3.3.3 Evaluating the accuracy of C3S-LAA
    3.4 Results and Discussion
    3.5 Conclusion
    3.6 Availability

4 RESEQUENCING OF A QUANTITATIVE DISEASE RESISTANCE LOCUS IN MAIZE PROVIDES BENCHMARK DATA AND INSIGHT INTO THE SPECTRUM OF SEQUENCE VARIATION AMONG INBRED LINES
    4.1 Introduction
    4.2 Methods
        4.2.1 Barcoded DNA amplification of the qNLB 1 25721468 23298 locus
        4.2.2 Sequencing, error correction and assembly of multiplexed amplicon libraries
        4.2.3 Sequence characterization
        4.2.4 Comparison to maize HapMap3
        4.2.5 Annotation of variant effects
        4.2.6 Association mapping
    4.3 Results
        4.3.1 Genomic diversity across the qNLB 1 25721468 23298 locus
        4.3.2 Comparison to maize HapMap3
        4.3.3 Analysis of the NLB susceptible, Tx303 haplotype
    4.4 Discussion

5 DISCUSSION AND CONCLUSIONS
    5.1 A ThermoAlign approach for targeted enrichment of repetitive genomes
    5.2 SMRT sequencing and assembly of multiplexed amplicon libraries from the maize genome
    5.3 Unravelling the genomic diversity at a maize quantitative disease resistance (QDR) locus using long molecule resequencing
    5.4 Future directions

BIBLIOGRAPHY

Appendix

A SUPPLEMENTARY INFORMATION FOR THERMOALIGN: A GENOME-AWARE PRIMER DESIGN TOOL FOR STANDARD PCR AND TILED AMPLICON RESEQUENCING
B SUPPLEMENTARY INFORMATION FOR: CLUSTERING OF CIRCULAR CONSENSUS SEQUENCES: ACCURATE ERROR CORRECTION AND ASSEMBLY OF SINGLE MOLECULE REAL-TIME READS FROM MULTIPLEXED AMPLICON LIBRARIES
C SUPPLEMENTARY INFORMATION FOR: UNRAVELLING THE GENOMIC DIVERSITY AT A MAIZE QUANTITATIVE DISEASE RESISTANCE LOCUS USING LONG MOLECULE RESEQUENCING
D PERMISSIONS

LIST OF TABLES

2.1 Results from BLASTn alignment of error corrected PacBio consensus sequences to the B73 genome.
3.1 Comparison of LAA and C3S-LAA consensus sequences for B73 amplicons.
3.2 The number of consensus sequences generated from the multiplex library, following barcode demultiplexing.
4.1 Quartiles of genotyping accuracy for maize HapMap3 at the qNLB 1 25721468 23298 locus.
A.1 Comparison of ThermoAlign to related primer design tools.
A.2 Effects of the amplicon size range parameter on the minimum tiling path primer design for the 24 kb target region described in the main text.
A.3 Eight genomic loci in maize B73 genome, selected for targeted enrichment