Development of Novel Analysis and Data Integration Systems to Understand Human Gene Regulation

Dissertation submitted for the doctoral degree Dr. rer. nat. of the Faculty of Mathematics and Computer Science of the Georg-August-Universität Göttingen, within the PhD Programme in Computer Science (PCS) of the Georg-August University School of Science (GAUSS)

submitted by Raza-Ur Rahman from Pakistan

Göttingen, April 2018

Thesis advisory committee:
  Prof. Dr. Stefan Bonn, Zentrum für Molekulare Neurobiologie (ZMNH), Institut für Medizinische Systembiologie, Hamburg
  Prof. Dr. Tim Beißbarth, Institut für Medizinische Statistik, Universitätsmedizin, Georg-August Universität, Göttingen
  Prof. Dr. Burkhard Morgenstern, Institut für Mikrobiologie und Genetik, Abtl. Bioinformatik, Georg-August Universität, Göttingen

Examination committee:
  Referee: Prof. Dr. Stefan Bonn, Zentrum für Molekulare Neurobiologie (ZMNH), Institut für Medizinische Systembiologie, Hamburg
  Co-referee: Prof. Dr. Tim Beißbarth, Institut für Medizinische Statistik, Universitätsmedizin, Georg-August Universität, Göttingen

Further members of the examination committee:
  Prof. Dr. Burkhard Morgenstern, Institut für Mikrobiologie und Genetik, Abtl. Bioinformatik, Georg-August Universität, Göttingen
  Prof. Dr. Carsten Damm, Institut für Informatik, Georg-August Universität, Göttingen
  Prof. Dr. Florentin Wörgötter, Physikalisches Institut Biophysik, Georg-August-Universität, Göttingen
  Prof. Dr. Stephan Waack, Institut für Informatik, Georg-August Universität, Göttingen

Date of the oral examination: 30 March 2018

Contents

List of Figures
Acknowledgements
Abstract
List of publications and software
Thesis structure
1 Biological Background Knowledge
  1.1 Deoxyribonucleic acid
  1.2 Gene expression
    1.2.1 Transcription start site
    1.2.2 RNA polymerase II
    1.2.3 Promoter
    1.2.4 Enhancers
    1.2.5 Transcription factors
  1.3 Alternative Splicing
  1.4 Small RNA (sRNA)
    1.4.1 MicroRNAs
    1.4.2 PIWI-interacting RNAs
    1.4.3 Small nucleolar RNAs
    1.4.4 Small interfering RNA
    1.4.5 Small nuclear RNAs
  1.5 Next generation sequencing
    1.5.1 RNA sequencing
      1.5.1.1 Method
2 Bioinformatics Background Knowledge
  2.1 Database management systems
    2.1.1 DBMS Architecture
  2.2 Types of databases
    2.2.1 Relational database systems
      2.2.1.1 Constraints
      2.2.1.2 Entity relationship model (ER model)
    2.2.2 Non-relational database systems
      2.2.2.1 Types of NoSQL databases
  2.3 Standard workflows for NGS data analysis
    2.3.1 Raw data (FASTQ)
    2.3.2 Quality control (QC)
      2.3.2.1 FastQC
    2.3.3 Adapter trimming
    2.3.4 Alignment and counting
    2.3.5 Differential expression (DE) analysis
  2.4 Biological ontologies
  2.5 Principles of supervised machine learning methods
    2.5.1 Classification
      2.5.1.1 Biological example
      2.5.1.2 Random forest
  2.6 Thesis related existing resources and research
    2.6.1 sRNA-seq analysis tools
      2.6.1.1 sRNA workbench
      2.6.1.2 CAP-miRSeq
      2.6.1.3 omiRas
      2.6.1.4 mirTools 2.0
      2.6.1.5 MAGI
      2.6.1.6 Chimira
      2.6.1.7 sRNAtoolbox
    2.6.2 sRNA expression databases
      2.6.2.1 miRmine
      2.6.2.2 DASHR
      2.6.2.3 Miratlas
      2.6.2.4 YM500v3
    2.6.3 Mutually exclusive splicing of exons
  2.7 Goals of the Thesis
    2.7.1 Online analysis of small RNA deep sequencing data (Oasis)
    2.7.2 sRNA expression atlas (SEA)
    2.7.3 Mutually exclusive splicing of exons
3 Results, Discussion and Outlook
  3.1 Online analysis of small RNA-seq data (Oasis 2)
    3.1.1 Oasis 2's module
    3.1.2 OasisCompressor
    3.1.3 Quality Control (QC)
    3.1.4 Functional enrichment analysis
  3.2 Small RNA expression atlas (SEA)
    3.2.1 System design
    3.2.2 Annotation tool
      3.2.2.1 Annotation criteria
    3.2.3 SEA web application
  3.3 Mutually exclusive splicing of exons
    3.3.1 Data sources
    3.3.2 Prediction of MXE candidates
    3.3.3 Validation of MXE candidates
    3.3.4 Spatio-temporal expression of MXEs
    3.3.5 Disease pathology prediction
  3.4 Conclusion and outlook
References
Appendices
  A Article 1
  B Article 2
  C Article 3

List of Figures

  1.1 DNA structure
  1.2 Gene expression
  1.3 Promoter, enhancers and TFs
  1.4 Forms of alternative splicing
  1.5 miRNA biogenesis
  1.6 piRNA biogenesis
  1.7 snoRNA biogenesis
  1.8 siRNA biogenesis
  1.9 RNA-seq library preparation workflow
  2.1 Three-level DBMS architecture
  2.2 DBMS architecture along with different ways of querying the DBMS
  2.3 ERD representation
  2.4 Standard workflow for NGS data analysis (RNA-seq, sRNA-seq)
  2.5 FastQ format
  2.6 FastQC per-base quality
  2.7 FastQC sequence quality
  2.8 Disease ontology
  2.9 Supervised machine learning
  2.10 Illustration of random forest algorithm
  3.1 Oasis 2 modules and workflow
  3.2 OasisCompressor
  3.3 Browser view of the primary output of sRNA detection module
  3.4 Assessment of Oasis 2's QC outlier detection
  3.5 SEA system architecture
  3.6 SEA data integration workflow
  3.7 Annotation tool
  3.8 SEA home page
  3.9 MXE illustration
  3.10 Spatio-temporal expression of MXEs
  3.11 MXE-ratio expression predicts disease pathology

Acknowledgements

First, I would like to thank Prof. Dr. Stefan Bonn for his guidance and helpful suggestions; he helped me expand my bioinformatics skills and taught me how to manage teams. I would also like to thank my Thesis Committee, Prof. Dr. Tim Beißbarth and Prof. Dr. Burkhard Morgenstern, who advised me on my various projects from time to time. I would like to thank the entire Bonn lab, who were very helpful and encouraging. I would especially like to thank Abhivyakti Gautam and Abdul Sattar, who helped me in the development of these projects. Finally, I would like to dedicate my PhD to my mother, Shams-un Nahar, for her ongoing love and support, and to my father, Atta Ur Rahman, who could not see this thesis completed.

Abstract

This thesis covers a broad range of bioinformatics methods, from the development of an analysis pipeline to data integration and the development of an expression atlas (database and web application development). In addition, an in silico method was developed to annotate the genome with novel features and to predict diseases based on expression profiles.

Development of online analysis of small RNA sequencing data

Small RNAs (sRNAs) are biomolecules that play important roles in organismal health and disease; as such, sRNA dysregulation can cause severe diseases. The modern method of choice for sRNA expression profiling is sRNA sequencing (sRNA-seq). Several sRNA-seq analysis platforms are available that differ in their analysis portfolio, performance, and user-friendliness. However, these analysis platforms lack one or more important features, such as disease biomarker identification, detection of viral and bacterial infections in sRNA-seq samples, storage of novel predicted miRNAs, multivariate differential expression (DE) analysis and automated submission