HEMATOPOIETIC CELL POPULATION SEGREGATION THROUGH FULL-LENGTH TRANSCRIPTOME SEQUENCING

A Dissertation submitted to the Faculty of the Graduate School of Arts and Sciences of Georgetown University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Tumor Biology

By Anne Deslattes Mays, M.Sc., M.Sc.

Washington, DC
July 17, 2015

Copyright 2015 by Anne Deslattes Mays
All Rights Reserved

Thesis Advisor: Anton Wellstein, M.D., Ph.D.

ABSTRACT

“Progress in science results from new technologies, new discoveries and new ideas, probably in that order.” Nobel Laureate Sydney Brenner (1927 - )

Sequencing the human genome was a critical first step in laying the groundwork for understanding the molecular programming involved in transforming a cell from a healthy to a cancerous state. The complexity of the cellular transcriptome has become increasingly apparent as technological advances have exposed its diversity. Full-length RNA sequencing is crucial for an unbiased analysis of this complexity, which arises from posttranscriptional processing of primary transcripts that yields a variety of isoforms from the same genomic locus. Distinct cell lineages are defined by their transcript isoform expression profiles, and cells can be annotated from the expression of transcript isoforms that can result in functionally different proteins. Alternate splice site utilization provides cells with a powerful mechanism for regulating gene expression that can alter the composition of the protein product and influence the rate of translation of transcripts from multi-exon genes.
The overall goal of this project was to delineate the hematopoietic transcriptome revealed by full-length sequencing and to assess the shortcomings of transcriptome reconstruction using fragmented-read sequencing. The aims were to (a) evaluate the complexity of the hematopoietic transcriptome using full-length RNA sequencing, (b) compare the full-length RNA-sequencing transcriptome with the transcriptome reconstructed from fragmented-read sequencing, and (c) evaluate whether hematopoietic cell subpopulations show distinct transcriptome patterns.

Reconstructing transcripts from fragmented-read sequencing has advanced our understanding of the transcriptome. Here we show that full-length transcriptome sequencing is necessary to faithfully expose the transcriptome and understand its complexities. Abundance information and pathway analysis support this. Full-length sequencing also reveals open reading frames that code for contiguous canonical or fusion proteins that can be validated with peptides. This transcriptome diversity is consistent with the distinct phenotypes of cell subpopulations present in tissues. Accurate transcriptome measurement builds a foundation that can be relied upon to ensure higher success rates for therapeutics and lower false discovery rates for biomarkers of disease.

The analysis of transcripts of a set of selected genes, together with the potential for posttranscriptional processing, predicts a highly complex transcriptome and an abundance of hitherto unknown protein isoforms. Classic approaches have not allowed full testing of this hypothesis because of limitations in sequencing read lengths. Full-length sequencing technology provides an opportunity to uncover transcripts that cannot be obtained through traditional transcript reconstruction techniques.
DEDICATION

Life and time are often inconvenient partners that challenge us to keep moving forward while circumstances do their best to derail us from our chosen paths. While I was heading up software development for Craig Venter and his team at Celera (sequencing the human genome), my husband suffered a serious stroke. A long and ongoing recovery period has followed. During this period my father was stricken with, and subsequently died of, lung cancer. Time passed all too quickly as I raised my daughter through adolescence while at the same time my mother began her slow drift into dementia. It was during this period, between my father's death and the early stages of my mother's decline, that I committed myself to pursuing my PhD. It was by no means an easy decision, and one that has tested the bounds of time, life and love for me and those around me.

I feel the need to complete the work that Craig Venter had asked us to do at Celera and to show my daughter what can be accomplished despite the adversity and circumstances that life and time present us. Craig challenged us first to sequence the human genome and then to cure cancer. We are not there yet, and I am committed to helping accomplish that task. I feel strongly that we must use the skills and resources we have, from biological assay, to mathematical algorithm, to complex computer infrastructure, to beat down the details in such a way that we can dissect the signals of disease and understand its origins. We must ask the right questions, use the right tools, and work to de-obfuscate the information we have.

To my daughter Katie, thank you for your love and support throughout these years. You inspire me and challenge me in unexpected and surprising ways. Thank you for revealing what you see with your eyes, hear with your ears and create with your unbelievable imagination. I am so grateful for your understanding and patience throughout this journey. This thesis is dedicated to you.
ACKNOWLEDGEMENTS

I would like to thank my mentor, Dr. Anton Wellstein. He gave me a project that he thought was right up my alley. The journey presented challenges unseen at the beginning, yet ultimately produced results well worth the effort. I would also like to thank Dr. Anna Tate Riegel for allowing me to enter the program. Her belief that I could do this and her willingness to help me navigate the process inspired me to try despite the obstacles and challenges.

I am grateful to my Thesis Committee Members: Dr. Michael Johnson, Dr. Anatoly Dritschilo, Dr. Habtom Resom, Dr. Yuri Gusev, and Dr. Christopher Loffredo. Their time, support and advice during the development of my project have been greatly appreciated. A special “thank you” to Dr. Terence Ryan for all his support throughout these years and for being my external committee member.

I wish to thank Dr. Eric Schadt; a brief encounter at Cold Spring Harbor started me on the path to the completion of my journey. Thank you for getting Dr. Robert Sebra to sequence that first sample for me. I wouldn't be writing these words had you not started that ball rolling. I wish to thank Dr. Mike Hunkapillar and Dr. Elizabeth Tseng and the Pacific Biosciences collaboration that made the completion of this work possible. Thank you, Liz, for your software, your friendship and your dedication. None of the work presented in this thesis would have been possible without it.

I would also like to thank present and past members of the Wellstein-Riegel lab, Drs. Sonia Rosenfeld and Elena Tassi, who shared differing lab corners over the years and who witnessed my trials and tribulations, supporting me with love and kindness without which I would not have made it to this end. To Garrett Graham, thank you for being an unofficial member of the lab. I am appreciative of the late-night and weekend discussions regarding GRanges, BioViz and all other things Bioconductor.
The spirit in this lab is without match. The program is a special one of dedication and striving for proper and correct science.

I would like to thank current and past members of KeyGene, especially Dr. Arjen van Tunen, Dr. Leo Zwinkels, Dr. Mark van Haaren, Dr. An Michiels, Dr. Jan van Oeveren, Mike Cariaso and Matthew McCoy, and most recently Dr. Fayaz Khazi, for your support throughout the years.

To the future Dr. Rutger van Bergem, it was wonderful to have a partner in these final 800 meters of the race. I am too slow a runner to beat you and Dr. Eveline Vietsch in a running race, but I guess I got to finish just a hair ahead of you in this PhD race! Thank you for your encouragement and for pushing me along.

Finally, I would like to say a special thank you to Dr. Marcel Schmidt, who taught me all I know about working at the bench and has been a staunch supporter and friend throughout these years.

INDEX

CHAPTER 1 - INTRODUCTION
A. Genome, Genomic Loci of Genes, mRNA and mRNA Isoforms
B. Technological Advances Drive Discoveries
C. RNA Sequencing
D. Transcriptional Measurement Limitations
E. Cancer Discoveries and Landmarks
F. Hematopoietic Transcriptome
G. Transcript Expression, Structure and Mutational Landscape
H. Hypothesis, Goal and Specific Aims