A Thesis Entitled SNPs and Indels Analysis in Human Genome Using


A Thesis

entitled

SNPs and Indels Analysis in Human Genome using Computer Simulation and Sequencing Data

by

Sharmistha Chakrabortty

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Sciences: Bioinformatics, Proteomics and Genomics

Dr. Alexei Fedorov, Committee Chair
Dr. Robert Blumenthal, Committee Member
Dr. Sadik Khuder, Committee Member
Dr. Amanda Bryant-Friedrich, Dean, College of Graduate Studies

The University of Toledo
August 2017

Copyright 2017, Sharmistha Chakrabortty

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

SNPs and Indels Analysis in Human Genome using Computer Simulation and Sequencing Data

by

Sharmistha Chakrabortty

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Sciences: Bioinformatics, Proteomics and Genomics

The University of Toledo
August 2017

Genetic variations are heritable changes in DNA caused by mutation and can be present in both coding and non-coding regions of the DNA. They provide great resources for the evolution of an organism in response to environmental and biological changes. Analysis of these variants (such as Single Nucleotide Polymorphisms (SNPs), Indels, and other structural variants like Copy Number Variations (CNVs)) thus has a wide range of potential applications. These include identification of causative variants and genes for genetic diseases, personalized genomics, population and evolutionary genetics, and forensic biology.
This study presents two such applications of human variant analysis (particularly the analysis of SNPs and Indels). In the first chapter, SNPs were analyzed to understand the correlation between recombination rate and genetic diversity in the human genome, using a computational modeling program. A simulated human population was used to study the effect of various population-level factors, such as natural selective forces and the type of mutations, on this correlation. In the second chapter, Next Generation Sequencing (in this case Whole Exome Sequencing) data and associated computational variant analysis tools and software were used to analyze both SNPs and Indels in human genomes to find a lead candidate genetic variant responsible for Inherited Retinal Dystrophy in a family.

I dedicate this work firstly to Lord Almighty for bestowing his kind blessings onto me at every stage of my life, and to my parents, Shri Arun Kumar Chakrabortty and Smt. Sikha Chakrabortty, and my brother Dr. Sudipto Kumar Chakrabortty, who have supported me all my life and encouraged me to chase my dreams, no matter how far-fetched and difficult they may seem. Finally, I would also like to dedicate this work to all my teachers and co-workers in the past who have inspired me and ignited my mind with curiosity and thirst for knowledge.

Acknowledgements

First and foremost, I would like to acknowledge the immense contribution of my parents Shri Arun Kumar Chakrabortty and Smt. Sikha Chakrabortty to the successful completion of my research, for it is they who kept me going despite numerous difficulties and always inspired me to reach my goals. They sacrificed their present to secure my future. Without the unwavering support and continuous encouragement of my brother Dr. Sudipto Kumar Chakrabortty, this work would never have seen the light of day. I would like to take this opportunity to thank my advisor Dr.
Alexei Fedorov for his immense patience and constant motivation while I took my baby steps towards the huge ocean of scientific knowledge. I will be forever indebted for his irreplaceable ideas of critical analysis and vital strategies to deal with insurmountable bioinformatics and algorithmic challenges. I am also deeply obligated to my teachers and committee members Dr. Robert Blumenthal, Dr. Sadik Khuder, and Dr. Robert Trumbly for their invaluable professional and personal lessons for a successful bioinformatics career; without their constant support, this degree would not have come to fruition. I also owe gratitude to my coworkers Rajib Dutta, Patrick Brennan and Basil Khuder for their inspiring ideas, their constant assistance while we worked as a team, and for fostering strong bonds of friendship and camaraderie. I would also like to thank Jo Anne Gray and all my colleagues for helping me at different stages of the graduate program.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
List of Symbols

1 Chapter 1. Correlation of recombination rate with genetic diversity in human genome
  1.1 Synopsis
  1.2 Introduction
    1.2.1 Recombination rate an important determinant of genetic diversity
    1.2.2 Recombination increases genetic diversity by reducing the effect of two main selective forces: Genetic Hitchhiking and Background Selection
    1.2.3 Recombination rate is positively correlated with genetic diversity in natural populations
  1.3 Materials and Methods
    1.3.1 GEMA computational modelling program
    1.3.2 Modes of GEMA program
    1.3.3 Fitness calculation
    1.3.4 Parameters used for GEMA modelling
      1.3.4.1 Recombination rate
      1.3.4.2 Modes of gene functionality
      1.3.4.3 Number of offspring
      1.3.4.4 Population size
      1.3.4.5 Gene size and gene length
      1.3.4.6 Mutation rate
      1.3.4.7 Distribution of Selection Coefficient in the population
  1.4 Results
    1.4.1 GEMA program under saturated mode
      1.4.1.1 GEMA in dominant mode of gene functionality
      1.4.1.2 GEMA in codominant mode of gene functionality
      1.4.1.3 GEMA in recessive mode of gene functionality
    1.4.2 GEMA program under unsaturated mode
      1.4.2.1 GEMA in dominant mode of gene functionality
      1.4.2.2 GEMA in codominant mode of gene functionality
      1.4.2.3 GEMA in recessive mode of gene functionality
    1.4.3 GEMA under no selection pressure
  1.5 Summary of conclusion

2 Chapter 2. Identification of rare genetic variant for Retinal Dystrophy in a family
  2.1 Synopsis
  2.2 Introduction
  2.3 Materials and Methods
    2.3.1 Filtering against 1000 Genome Phase 1 and Phase 3
    2.3.2 Filtering based upon Genotype
    2.3.3 Variant Analysis
      2.3.3.1 Variant analysis using IGV
      2.3.3.2 Variant analysis using database and literature survey
    2.3.4 Confirmation of unknown novel variant
  2.4 Results
  2.5 Summary of conclusion

References

A Appendix A