Uncovering the Variability, Regulatory Roles and Mutation Rates of Short Tandem Repeats by Thomas F

Total Page:16

File Type:pdf, Size:1020Kb

Uncovering the Variability, Regulatory Roles and Mutation Rates of Short Tandem Repeats by Thomas F Uncovering the variability, regulatory roles and mutation rates of short tandem repeats by Thomas F. Willems B.S. in Chemical Engineering, University of California, Berkeley (2011) Submitted to the Computational and Systems Biology Program in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2016 ○c Massachusetts Institute of Technology 2016. All rights reserved. Author............................................................................ Computational and Systems Biology Program April 26, 2016 Certified by . Yaniv Erlich, PhD Assistant Professor of Computer Science, Columbia University Thesis Supervisor Certified by . Manolis Kellis, PhD Associate Professor of Computer Science, MIT Thesis Supervisor Accepted by....................................................................... Christopher B. Burge, PhD Professor of Biology and Biological Engineering Director, Computational and Systems Biology Graduate Program 2 Uncovering the variability, regulatory roles and mutation rates of short tandem repeats by Thomas F. Willems Submitted to the Computational and Systems Biology Program on April 26, 2016, in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract Over the past decade, the advent of next-generation DNA sequencing technologies has ushered in an exciting era of biological research. Through large-scale sequencing projects, scientists have begun to unveil the variability and function of millions of DNA mutations called single nucleotide polymorphisms. Despite this rapid growth in understanding, short tandem repeats (STRs), ge- nomic elements consisting of a repeating pattern of 2-6 bases, have remained poorly understood. Mutating orders of magnitude more rapidly than most of the human genome, STRs have been identified as the causal variants in diseases such as Fragile X syndrome and Huntington’s disease. However, in spite of their potentially profound biological consequences, STRs remain systemat- ically understudied due to difficulties associated with obtaining accurate genotypes. To address this issue, we developed a series of bioinformatics approaches and applied them to population- scale whole-genome sequencing data sets. Using data from the 1000 Genomes Project, we performed the first genome-wide characterization of STR variability by analyzing over 700,000 loci in more than 1000 individuals. Next, we integrated these genotypes with expression data to assess the contribution of STRs to gene expression in humans, uncovering their substantial regulatory role. We then developed a state-of-the-art algorithm to genotype STRs, resulting in vastly improved accuracy and uncovering hundreds of replicable de novo mutations in a deeply sequenced trio. Lastly, we developed a novel approach to estimate mutation rates for STRs on the Y-chromosome (Y-STR), resulting in rates for hundreds of previously uncharacterized markers. Collectively, these analyses highlight the extreme variability of STRs and provide a framework for incorporating them into future studies. Thesis Supervisor: Yaniv Erlich, PhD Title: Assistant Professor of Computer Science, Columbia University Thesis Supervisor: Manolis Kellis, PhD Title: Associate Professor of Computer Science, MIT 3 4 Acknowledgments My journey through graduate school never would have started, let alone finished, if not for an enormous number of people that have supported and inspired me along the way. I sincerely want to thank the following people that have made the past five years a fantastic experience: ∙ Yaniv Erlich, for believing in me, inspiring me and teaching me how to do great science, and for introducing me to the exciting world of genomics. ∙ Manolis Kellis, for adopting me into his lab and for his limitless positive energy and sound scientific advice. ∙ David Housman and Laurie Boyer, for their terrific scientific guidance throughout my PhD. ∙ My labmates: Melissa Gymrek, for teaching me all she knows about STRs and bioinfor- matics and for being a great friend and colleague. Dina Zielinski, for lending her wet lab expertise and challenging me to swimming duels. Assaf Gordon, for his never ending wise programming advice and generosity. Sophie Zaaijer, for always welcoming me and entertaining me at the NYGC. And to all the other members of the Erlich lab that have come and gone, thank you for making it a fun and exciting place to work. ∙ The Whitehead Institute, for providing a fantastic environment for young scientists. ∙ Area Four coffee, for providing the fuel that has sustained me throughout graduate school. ∙ Eric Zhu, for being a sage and tireless friend. ∙ Rosanna Lim and Aanchal Jain, for much needed relaxation during Friday night dinners. ∙ Lionel Lam, for providing timely musical relief and humor. ∙ Xiao Su, for teaching me the value of a coffee or donut break. ∙ Mark, Sean, Steven, William, Karthik and David, for many much needed nights out in Boston. ∙ Jacquie Carota, for her tireless work as our program administrator. ∙ My brother Nick, my sister Julie and my brother-in-law Laurens, for their love and support throughout graduate school. ∙ My girlfriend Su Luo, whose love has carried me through difficult times and has inspired me to aim for new heights. 5 ∙ My parents Paul and Carine, for molding me into the person I am today and prioritizing my education. Mom and Dad, without your endless advice and support, none of this would ever have been possible. 6 Contents 1 Introduction 13 1.1 Overview...................................... 13 1.2 Defining an STR.................................. 15 1.3 STR applications.................................. 16 1.4 Population-scale STR variation.......................... 17 1.5 STRs and complex traits............................. 18 1.6 STR genotyping methods............................. 19 1.7 Y-STR mutation rates............................... 23 2 The landscape of human STR variation 25 2.1 Introduction.................................... 25 2.2 Results....................................... 27 2.2.1 Identifying STR loci in the human genome................ 27 2.2.2 Profiling STRs in 1000 Genomes samples................. 28 2.2.3 Quality assessment............................ 29 2.2.4 Validation using population genetics trends................ 34 2.2.5 Patterns of STR variation......................... 34 2.2.6 The prototypical STR........................... 37 2.2.7 STRs in the NCBI reference and LoF analysis.............. 38 2.2.8 Linkage disequilibrium between STRs and SNPs............. 39 2.3 Discussion..................................... 40 2.4 Methods...................................... 42 2.4.1 Call set generation............................. 42 2.4.2 Estimating the number of samples per locus and number of loci per sample 43 2.4.3 Saturation analysis............................ 43 2.4.4 Mendelian inheritance........................... 44 2.4.5 Capillary electrophoresis comparison................... 44 2.4.6 Heterozygosity calculations........................ 45 2.4.7 Summary statistic comparisons...................... 45 7 2.4.8 Comparison of population heterozygosity................. 45 2.4.9 Deviation of lobSTR calls from the NCBI reference........... 46 2.4.10 Sample clustering............................. 46 2.4.11 STR variability trends........................... 46 2.4.12 Extraction of orthologous chimp STR lengths.............. 47 2.4.13 Rst levels.................................. 47 2.4.14 Assessing linkage disequilibrium...................... 47 2.5 Acknowledgments................................. 48 2.6 Supplemental Text................................. 48 2.6.1 Finding putative STR loci in the human genome............. 48 2.6.2 Comparison of STR thresholds to prior studies.............. 49 2.6.3 Incorporation of annotated STRs..................... 49 2.6.4 Assessment of empirical score thresholds with a permissive call set... 50 2.6.5 Call set integration............................ 51 2.6.6 Homopolymer STRs............................ 52 2.7 Supplemental Tables................................ 53 2.8 Supplemental Figures............................... 57 3 Abundant contribution of short tandem repeats to gene expression variation in humans 67 3.1 Introduction.................................... 68 3.2 Results....................................... 69 3.2.1 Initial genome-wide discovery of eSTRs.................. 69 3.2.2 Partitioning the contribution of eSTR and nearby variants........ 72 3.2.3 The effect of eSTRs in the context of individual SNP eQTLs...... 73 3.2.4 Integrative genomic evidence for a functional role of eSTRs....... 74 3.2.5 The potential role of eSTRs in human conditions............ 77 3.3 Discussion..................................... 79 3.4 Acknowledgements................................. 80 3.5 Author contributions................................ 81 3.6 Methods...................................... 81 3.6.1 Genotype datasets............................. 81 3.6.2 Targeted sequencing of promoter region STRs.............. 81 3.6.3 Expression datasets............................ 82 8 3.6.4 eQTL association testing......................... 82 3.6.5 Controlling for gene-level false discovery rate............... 83 3.6.6 Partitioning heritability using linear mixed models............ 84 3.6.7 Comparing to the lead eSNP....................... 85 3.6.8 Conservation analysis........................... 86 3.6.9 Enrichment of STRs and eSTRs in predicted enhancers........
Recommended publications
  • Genome Signature-Based Dissection of Human Gut Metagenomes to Extract Subliminal Viral Sequences
    ARTICLE Received 16 Apr 2013 | Accepted 8 Aug 2013 | Published 16 Sep 2013 DOI: 10.1038/ncomms3420 OPEN Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences Lesley A. Ogilvie1, Lucas D. Bowler1, Jonathan Caplin2, Cinzia Dedi1, David Diston2,w, Elizabeth Cheek3, Huw Taylor2, James E. Ebdon2 & Brian V. Jones1 Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage- oriented dissection of metagenomic data sets, and host-range affiliation of recovered sequences. Here we demonstrate the application of a genome signature-based approach to interrogate conventional whole-community metagenomes and access subliminal, phylogen- etically targeted, phage sequences present within. We describe a portion of the biological dark matter extant in the human gut virome, and bring to light a population of potentially gut- specific Bacteroidales-like phage, poorly represented in existing virus like particle-derived viral metagenomes. These predominantly temperate phage were shown to encode functions of direct relevance to human health in the form of antibiotic resistance genes, and provided evidence for the existence of putative ‘viral-enterotypes’ among this fraction of the human gut virome. 1 Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton, Brighton BN2 4GJ, UK. 2 School of Environment and Technology, University of Brighton, Brighton BN2 4GJ, UK. 3 School of Computing, Engineering and Mathematics, University of Brighton, Brighton BN2 4GJ, UK. w Present address: Mikrobiologische and Biotechnologische Risiken Bundesamt fu¨r Gesundheit BAG, 3003 Bern, Switzerland.
    [Show full text]
  • The 4D Nucleome Project
    The 4D nucleome project The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Dekker, Job et al. “The 4D Nucleome Project.” Nature 549, no. 7671 (September 2017): 219–226 © 2017 Macmillan Publishers Limited, part of Springer Nature As Published http://dx.doi.org/10.1038/NATURE23884 Publisher Nature Publishing Group Version Author's final manuscript Citable link http://hdl.handle.net/1721.1/114838 Terms of Use Creative Commons Attribution-Noncommercial-Share Alike Detailed Terms http://creativecommons.org/licenses/by-nc-sa/4.0/ HHS Public Access Author manuscript Author ManuscriptAuthor Manuscript Author Nature. Manuscript Author Author manuscript; Manuscript Author available in PMC 2017 September 27. Published in final edited form as: Nature. 2017 September 13; 549(7671): 219–226. doi:10.1038/nature23884. The 4D Nucleome Project Job Dekker1, Andrew S. Belmont2, Mitchell Guttman3, Victor O. Leshyk4, John T. Lis5, Stavros Lomvardas6, Leonid A. Mirny7, Clodagh C. O’Shea8, Peter J. Park9, Bing Ren10, Joan C. Ritland Politz11, Jay Shendure12, Sheng Zhong4, and the 4D Nucleome Network13 1Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Howard Hughes Medical Institute, Worcester, MA 01605 2Department of Cell and Developmental Biology, University of Illinois, Urbana-Champaign, IL 61801 3Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125 4Department
    [Show full text]
  • Identification of the Binding Partners for Hspb2 and Cryab Reveals
    Brigham Young University BYU ScholarsArchive Theses and Dissertations 2013-12-12 Identification of the Binding arP tners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactions and Non- Redundant Roles for Small Heat Shock Proteins Kelsey Murphey Langston Brigham Young University - Provo Follow this and additional works at: https://scholarsarchive.byu.edu/etd Part of the Microbiology Commons BYU ScholarsArchive Citation Langston, Kelsey Murphey, "Identification of the Binding Partners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactions and Non-Redundant Roles for Small Heat Shock Proteins" (2013). Theses and Dissertations. 3822. https://scholarsarchive.byu.edu/etd/3822 This Thesis is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected], [email protected]. Identification of the Binding Partners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactions and Non-Redundant Roles for Small Heat Shock Proteins Kelsey Langston A thesis submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Master of Science Julianne H. Grose, Chair William R. McCleary Brian Poole Department of Microbiology and Molecular Biology Brigham Young University December 2013 Copyright © 2013 Kelsey Langston All Rights Reserved ABSTRACT Identification of the Binding Partners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactors and Non-Redundant Roles for Small Heat Shock Proteins Kelsey Langston Department of Microbiology and Molecular Biology, BYU Master of Science Small Heat Shock Proteins (sHSP) are molecular chaperones that play protective roles in cell survival and have been shown to possess chaperone activity.
    [Show full text]
  • The Contribution of Mitochondrial Metagenomics to Large-Scale Data
    Molecular Phylogenetics and Evolution 128 (2018) 1–11 Contents lists available at ScienceDirect Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev The contribution of mitochondrial metagenomics to large-scale data mining and phylogenetic analysis of Coleoptera T ⁎ Benjamin Linarda,i,j, , Alex Crampton-Platta,b,k, Jerome Morinierec, Martijn J.T.N. Timmermansa,d, Carmelo Andújara,b,l, Paula Arribasa,b,l, Kirsten E. Millera,b,m, Julia Lipeckia,n, Emeline Favreaua,o, Amie Huntera, Carola Gómez-Rodrígueze, Christopher Bartona,p, Ruie Niea,f, Conrad P.D.T. Gilletta,q, Thijmen Breeschotena,g,r, Ladislav Bocakh, Alfried P. Voglera,b a Department of Life Sciences, Natural History Museum, London SW7 5BD, United Kingdom b Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot SL5 7PY, United Kingdom c Bavarian State Collection of Zoology (SNSB – ZSM), Münchhausenstrasse 21, 81247 München, Germany d Department of Natural Sciences, Hendon Campus, Middlesex University, London NW4 4BT, United Kingdom e Departamento de Zoología, Facultad de Biología, Universidad de Santiago de Compostela, c/ Lope Gómez de Marzoa s/n, 15782 Santiago de Compostela, Spain f Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China g Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, The Netherlands h Department of Zoology, Faculty of Science, Palacky University, 17. listopadu 50, 771 46 Olomouc, Czech Republic i LIRMM, Univ. Montpellier,
    [Show full text]
  • The International Human Epigenome Consortium (IHEC): a Blueprint for Scientific Collaboration and Discovery
    The International Human Epigenome Consortium (IHEC): A Blueprint for Scientific Collaboration and Discovery Hendrik G. Stunnenberg1#, Martin Hirst2,3,# 1Department of Molecular Biology, Faculties of Science and Medicine, Radboud University, Nijmegen, The Netherlands 2Department of Microbiology and Immunology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada V6T 1Z4. 3Canada’s Michael Smith Genome Science Center, BC Cancer Agency, Vancouver, BC, Canada V5Z 4S6 #Corresponding authors [email protected] [email protected] Abstract The International Human Epigenome Consortium (IHEC) coordinates the generation of a catalogue of high-resolution reference epigenomes of major primary human cell types. The studies now presented (cell.com/XXXXXXX) highlight the coordinated achievements of IHEC teams to gather and interpret comprehensive epigenomic data sets to gain insights in the epigenetic control of cell states relevant for human health and disease. One of the great mysteries in developmental biology is how the same genome can be read by cellular machinery to generate the plethora of different cell types required for eukaryotic life. As appreciation grew for the central roles of transcriptional and epigenetic mechanisms in specification of cellular fates and functions, researchers around the world encouraged scientific funding agencies to develop an organized and standardized effort to exploit epigenomic assays to shed additional light on this process (Beck, Olek et al. 1999, Jones and Martienssen 2005, American Association for Cancer Research Human Epigenome Task and European Union 2008). In March 2009, leading scientists and international health research funding agency representatives were invited to a meeting in Bethesda (MD, USA) to gauge the level of interest in an international epigenomics project and to identify potential areas of focus.
    [Show full text]
  • 4-6 Weeks Old Female C57BL/6 Mice Obtained from Jackson Labs Were Used for Cell Isolation
    Methods Mice: 4-6 weeks old female C57BL/6 mice obtained from Jackson labs were used for cell isolation. Female Foxp3-IRES-GFP reporter mice (1), backcrossed to B6/C57 background for 10 generations, were used for the isolation of naïve CD4 and naïve CD8 cells for the RNAseq experiments. The mice were housed in pathogen-free animal facility in the La Jolla Institute for Allergy and Immunology and were used according to protocols approved by the Institutional Animal Care and use Committee. Preparation of cells: Subsets of thymocytes were isolated by cell sorting as previously described (2), after cell surface staining using CD4 (GK1.5), CD8 (53-6.7), CD3ε (145- 2C11), CD24 (M1/69) (all from Biolegend). DP cells: CD4+CD8 int/hi; CD4 SP cells: CD4CD3 hi, CD24 int/lo; CD8 SP cells: CD8 int/hi CD4 CD3 hi, CD24 int/lo (Fig S2). Peripheral subsets were isolated after pooling spleen and lymph nodes. T cells were enriched by negative isolation using Dynabeads (Dynabeads untouched mouse T cells, 11413D, Invitrogen). After surface staining for CD4 (GK1.5), CD8 (53-6.7), CD62L (MEL-14), CD25 (PC61) and CD44 (IM7), naïve CD4+CD62L hiCD25-CD44lo and naïve CD8+CD62L hiCD25-CD44lo were obtained by sorting (BD FACS Aria). Additionally, for the RNAseq experiments, CD4 and CD8 naïve cells were isolated by sorting T cells from the Foxp3- IRES-GFP mice: CD4+CD62LhiCD25–CD44lo GFP(FOXP3)– and CD8+CD62LhiCD25– CD44lo GFP(FOXP3)– (antibodies were from Biolegend). In some cases, naïve CD4 cells were cultured in vitro under Th1 or Th2 polarizing conditions (3, 4).
    [Show full text]
  • Investigation of Candidate Genes and Mechanisms Underlying Obesity
    Prashanth et al. BMC Endocrine Disorders (2021) 21:80 https://doi.org/10.1186/s12902-021-00718-5 RESEARCH ARTICLE Open Access Investigation of candidate genes and mechanisms underlying obesity associated type 2 diabetes mellitus using bioinformatics analysis and screening of small drug molecules G. Prashanth1 , Basavaraj Vastrad2 , Anandkumar Tengli3 , Chanabasayya Vastrad4* and Iranna Kotturshetti5 Abstract Background: Obesity associated type 2 diabetes mellitus is a metabolic disorder ; however, the etiology of obesity associated type 2 diabetes mellitus remains largely unknown. There is an urgent need to further broaden the understanding of the molecular mechanism associated in obesity associated type 2 diabetes mellitus. Methods: To screen the differentially expressed genes (DEGs) that might play essential roles in obesity associated type 2 diabetes mellitus, the publicly available expression profiling by high throughput sequencing data (GSE143319) was downloaded and screened for DEGs. Then, Gene Ontology (GO) and REACTOME pathway enrichment analysis were performed. The protein - protein interaction network, miRNA - target genes regulatory network and TF-target gene regulatory network were constructed and analyzed for identification of hub and target genes. The hub genes were validated by receiver operating characteristic (ROC) curve analysis and RT- PCR analysis. Finally, a molecular docking study was performed on over expressed proteins to predict the target small drug molecules. Results: A total of 820 DEGs were identified between
    [Show full text]
  • Supplementary Table 1: Adhesion Genes Data Set
    Supplementary Table 1: Adhesion genes data set PROBE Entrez Gene ID Celera Gene ID Gene_Symbol Gene_Name 160832 1 hCG201364.3 A1BG alpha-1-B glycoprotein 223658 1 hCG201364.3 A1BG alpha-1-B glycoprotein 212988 102 hCG40040.3 ADAM10 ADAM metallopeptidase domain 10 133411 4185 hCG28232.2 ADAM11 ADAM metallopeptidase domain 11 110695 8038 hCG40937.4 ADAM12 ADAM metallopeptidase domain 12 (meltrin alpha) 195222 8038 hCG40937.4 ADAM12 ADAM metallopeptidase domain 12 (meltrin alpha) 165344 8751 hCG20021.3 ADAM15 ADAM metallopeptidase domain 15 (metargidin) 189065 6868 null ADAM17 ADAM metallopeptidase domain 17 (tumor necrosis factor, alpha, converting enzyme) 108119 8728 hCG15398.4 ADAM19 ADAM metallopeptidase domain 19 (meltrin beta) 117763 8748 hCG20675.3 ADAM20 ADAM metallopeptidase domain 20 126448 8747 hCG1785634.2 ADAM21 ADAM metallopeptidase domain 21 208981 8747 hCG1785634.2|hCG2042897 ADAM21 ADAM metallopeptidase domain 21 180903 53616 hCG17212.4 ADAM22 ADAM metallopeptidase domain 22 177272 8745 hCG1811623.1 ADAM23 ADAM metallopeptidase domain 23 102384 10863 hCG1818505.1 ADAM28 ADAM metallopeptidase domain 28 119968 11086 hCG1786734.2 ADAM29 ADAM metallopeptidase domain 29 205542 11085 hCG1997196.1 ADAM30 ADAM metallopeptidase domain 30 148417 80332 hCG39255.4 ADAM33 ADAM metallopeptidase domain 33 140492 8756 hCG1789002.2 ADAM7 ADAM metallopeptidase domain 7 122603 101 hCG1816947.1 ADAM8 ADAM metallopeptidase domain 8 183965 8754 hCG1996391 ADAM9 ADAM metallopeptidase domain 9 (meltrin gamma) 129974 27299 hCG15447.3 ADAMDEC1 ADAM-like,
    [Show full text]
  • Genome-Contric Metagenomic Investigation of 134 Samples
    GENOME-CONTRIC METAGENOMIC INVESTIGATION OF 134 SAMPLES COLLECTED FROM BIOGAS REACTORS REVEALED IMPORTANT FUNCTIONAL ROLES FOR MICROBIAL SPECIES BELONGING TO UNDEREXPLORED TAXA Campanaro, S.1, Luo, G.2, Treu, L.3 Kougias, P.G.4, Zhu X.3, Angelidaki, I.3 1 University of Padova, Department of Biology, Via U. Bassi, 58b, 35131 Padova (PD), Italy 2 Fudan University, Department of Environmental Science and Engineering, Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3), 200433, Shanghai, China 3 Technical University of Denmark, Department of Environmental Engineering, Kgs. Lyngby DK-2800, Denmark 4 Institute of Animal Science, Hellenic Agricultural Organization DEMETER, Paralimni, Greece e-mail of the corresponding author: [email protected] Keywords: anaerobic digestion; metagenomics; metagenome-assembled genomes; taxonomy. INTRODUCTION In the past four years many samples collected from full-scale biogas plants and lab-scale reactors have been investigated using Illumina shotgun sequencing approaches and deposited in NCBI Sequence Read Archive database. In all these reactors, the Anaerobic Digestion (AD) process is performed by a plethora of different microorganisms organized in a complex functional network where different species have distinct roles in degradation of organic matter. Difficulties associated to cultivation of many prokaryotes present in natural and engineered ecosystems strongly limits the possibility to expand our knowledge regarding their physiology, genetics and function (Tringe and Rubin, 2005). Thanks to the bioinformatics approaches recently developed, which allow the reconstruction of microbial genomes starting from shotgun DNA sequences obtained from mixed cultures (Campanaro et al., 2016), it is now becoming possible to reveal the functional roles of microbial species resisting cultivation in axenic cultures.
    [Show full text]
  • Genome Sequences of Tropheus Moorii and Petrochromis Trewavasae, Two Eco‑Morphologically Divergent Cichlid Fshes Endemic to Lake Tanganyika C
    www.nature.com/scientificreports OPEN Genome sequences of Tropheus moorii and Petrochromis trewavasae, two eco‑morphologically divergent cichlid fshes endemic to Lake Tanganyika C. Fischer1,2, S. Koblmüller1, C. Börger1, G. Michelitsch3, S. Trajanoski3, C. Schlötterer4, C. Guelly3, G. G. Thallinger2,5* & C. Sturmbauer1,5* With more than 1000 species, East African cichlid fshes represent the fastest and most species‑rich vertebrate radiation known, providing an ideal model to tackle molecular mechanisms underlying recurrent adaptive diversifcation. We add high‑quality genome reconstructions for two phylogenetic key species of a lineage that diverged about ~ 3–9 million years ago (mya), representing the earliest split of the so‑called modern haplochromines that seeded additional radiations such as those in Lake Malawi and Victoria. Along with the annotated genomes we analysed discriminating genomic features of the study species, each representing an extreme trophic morphology, one being an algae browser and the other an algae grazer. The genomes of Tropheus moorii (TM) and Petrochromis trewavasae (PT) comprise 911 and 918 Mbp with 40,300 and 39,600 predicted genes, respectively. Our DNA sequence data are based on 5 and 6 individuals of TM and PT, and the transcriptomic sequences of one individual per species and sex, respectively. Concerning variation, on average we observed 1 variant per 220 bp (interspecifc), and 1 variant per 2540 bp (PT vs PT)/1561 bp (TM vs TM) (intraspecifc). GO enrichment analysis of gene regions afected by variants revealed several candidates which may infuence phenotype modifcations related to facial and jaw morphology, such as genes belonging to the Hedgehog pathway (SHH, SMO, WNT9A) and the BMP and GLI families.
    [Show full text]
  • Supplementary Table S4. FGA Co-Expressed Gene List in LUAD
    Supplementary Table S4. FGA co-expressed gene list in LUAD tumors Symbol R Locus Description FGG 0.919 4q28 fibrinogen gamma chain FGL1 0.635 8p22 fibrinogen-like 1 SLC7A2 0.536 8p22 solute carrier family 7 (cationic amino acid transporter, y+ system), member 2 DUSP4 0.521 8p12-p11 dual specificity phosphatase 4 HAL 0.51 12q22-q24.1histidine ammonia-lyase PDE4D 0.499 5q12 phosphodiesterase 4D, cAMP-specific FURIN 0.497 15q26.1 furin (paired basic amino acid cleaving enzyme) CPS1 0.49 2q35 carbamoyl-phosphate synthase 1, mitochondrial TESC 0.478 12q24.22 tescalcin INHA 0.465 2q35 inhibin, alpha S100P 0.461 4p16 S100 calcium binding protein P VPS37A 0.447 8p22 vacuolar protein sorting 37 homolog A (S. cerevisiae) SLC16A14 0.447 2q36.3 solute carrier family 16, member 14 PPARGC1A 0.443 4p15.1 peroxisome proliferator-activated receptor gamma, coactivator 1 alpha SIK1 0.435 21q22.3 salt-inducible kinase 1 IRS2 0.434 13q34 insulin receptor substrate 2 RND1 0.433 12q12 Rho family GTPase 1 HGD 0.433 3q13.33 homogentisate 1,2-dioxygenase PTP4A1 0.432 6q12 protein tyrosine phosphatase type IVA, member 1 C8orf4 0.428 8p11.2 chromosome 8 open reading frame 4 DDC 0.427 7p12.2 dopa decarboxylase (aromatic L-amino acid decarboxylase) TACC2 0.427 10q26 transforming, acidic coiled-coil containing protein 2 MUC13 0.422 3q21.2 mucin 13, cell surface associated C5 0.412 9q33-q34 complement component 5 NR4A2 0.412 2q22-q23 nuclear receptor subfamily 4, group A, member 2 EYS 0.411 6q12 eyes shut homolog (Drosophila) GPX2 0.406 14q24.1 glutathione peroxidase
    [Show full text]
  • Biallelic Alteration and Dysregulation of the Hippo Pathway in Mucinous Tubular and Spindle Cell Carcinoma of the Kidney
    Published OnlineFirst September 7, 2016; DOI: 10.1158/2159-8290.CD-16-0267 RESEARCH BRIEF Biallelic Alteration and Dysregulation of the Hippo Pathway in Mucinous Tubular and Spindle Cell Carcinoma of the Kidney Rohit Mehra 1 , 2 , 3 , Pankaj Vats 1 , 3 , 4 , Marcin Cieslik 3 , Xuhong Cao 3 , 5 , Fengyun Su 3 , Sudhanshu Shukla 3 , Aaron M. Udager 1 , Rui Wang 3 , Jincheng Pan 6 , Katayoon Kasaian 3 , Robert Lonigro 3 , Javed Siddiqui 3 , Kumpati Premkumar 4 , Ganesh Palapattu 7 , Alon Weizer 2 , 7 , Khaled S. Hafez 7 , J. Stuart Wolf Jr 7 , Ankur R. Sangoi 8 , Kiril Trpkov 9 , Adeboye O. Osunkoya 10 , Ming Zhou 11 , Giovanna A. Giannico 12 , Jesse K. McKenney 13 , Saravana M. Dhanasekaran 1 , 3 , and Arul M. Chinnaiyan 1 , 2 , 3 , 5 , 7 ABSTRACT Mucinous tubular and spindle cell carcinoma (MTSCC) is a relatively rare subtype of renal cell carcinoma (RCC) with distinctive morphologic and cytogenetic fea- tures. Here, we carry out whole-exome and transcriptome sequencing of a multi-institutional cohort of MTSCC (n = 22). We demonstrate the presence of either biallelic loss of Hippo pathway tumor sup- pressor genes (TSG) and/or evidence of alteration of Hippo pathway genes in 85% of samples. PTPN14 (31%) and NF2 (22%) were the most commonly implicated Hippo pathway genes, whereas other genes such as SAV1 and HIPK2 were also involved in a mutually exclusive fashion. Mutations in the context of recurrent chromosomal losses amounted to biallelic alterations in these TSGs. As a readout of Hippo pathway inactivation, a majority of cases (90%) exhibited increased nuclear YAP1 protein expression.
    [Show full text]