An Integrated Epigenomic–Transcriptomic Landscape of Lung

Total Page:16

File Type:pdf, Size:1020Kb

An Integrated Epigenomic–Transcriptomic Landscape of Lung 1 An integrated epigenomic–transcriptomic landscape of lung cancer reveals novel 2 methylation driver genes of diagnostic and therapeutic relevance 3 4 Xiwei Sun1#, Jiani Yi 1#, Juze Yang1#, Yi Han1, Xinyi Qian2, Yi Liu1, Jia Li1, Bingjian Lu2, 5 Jisong Zhang1, Xiaoqing Pan3, Yong Liu4, Mingyu Liang4, Enguo Chen1,5*, Pengyuan 6 Liu1,4,5*, Yan Lu2,5* 7 8 1Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University 9 School of Medicine, Hangzhou, Zhejiang 310016, China. 10 2Center for Uterine Cancer Diagnosis & Therapy Research of Zhejiang Province, Women's 11 Reproductive Health Key Laboratory of Zhejiang Province, Department of Gynecologic 12 Oncology, Women’s Hospital and Institute of Translational Medicine, School of Medicine, 13 Zhejiang University, Hangzhou, Zhejiang 310006, China. 14 3Department of Mathematics, Zhejiang University, Hangzhou, Zhejiang 310027, China. 15 4Center of Systems Molecular Medicine, Department of Physiology, Medical College of 16 Wisconsin, Milwaukee, WI 53226, USA 17 5Cancer center, Zhejiang University, Hangzhou, Zhejiang 310013, China. 18 19 # XS, JNY, and JZY equally. 20 Running title: Integrative epigenomic–transcriptomic landscape of lung cancer 21 Key words: DNA methylation; driver genes; epigenomics; lncRNA, lung cancer; miRNA, 22 reduced representation bisulfite sequencing; transcriptomics; transcription factor 23 *Correspondence: [email protected], [email protected] or [email protected] 1 24 Abstract 25 Background: Aberrant DNA methylation occurs commonly during carcinogenesis and is of 26 clinical value in human cancers. However, knowledge of the impact of DNA methylation 27 changes on lung carcinogenesis and progression remains limited. 28 Methods: Genome-wide DNA methylation profiles were surveyed in 18 pairs of tumors and 29 adjacent normal tissues from non-small cell lung cancer (NSCLC) patients using Reduced 30 Representation Bisulfite Sequencing (RRBS). An integrated epigenomic–transcriptomic 31 landscape of lung cancer was depicted using the multi-omics data integration method. 32 Results: We discovered a large number of hypermethylation events pre-marked by poised 33 promoter in embryonic stem cells, being a hallmark of lung cancer. These hypermethylation 34 events showed a high conservation across cancer types. Eight novel driver genes with 35 aberrant methylation (e.g., PCDH17 and IRX1) were identified by integrated analysis of 36 DNA methylation and transcriptomic data. Methylation level of the eight genes measured by 37 pyrosequencing can distinguish NSCLC patients from lung tissues with high sensitivity and 38 specificity in an independent cohort. Their tumor-suppressive roles were further 39 experimentally validated in lung cancer cells, which depend on promoter hypermethylation. 40 Similarly, 13 methylation-driven ncRNAs (including 8 lncRNAs and 5 miRNAs) were 41 identified, some of which were co-regulated with their host genes by the same promoter 42 hypermethylation. Finally, by analyzing the transcription factor (TF) binding motifs, we 43 uncovered sets of TFs driving the expression of epigenetically regulated genes and 44 highlighted the epigenetic regulation of gene expression of TCF21 through DNA methylation 45 of EGR1 binding motifs. 46 Conclusions: We discovered several novel methylation driver genes of diagnostic and 47 therapeutic relevance in lung cancer. Our findings revealed that DNA methylation in TF 48 binding motifs regulates target gene expression by affecting the binding ability of TFs. Our 49 study also provides a valuable epigenetic resource for identifying DNA methylation-based 50 diagnostic biomarkers, developing cancer drugs for epigenetic therapy and studying cancer 51 pathogenesis. 2 52 3 53 Introduction 54 Lung cancer is the leading cause of cancer mortality worldwide and non-small cell lung 55 cancers (NSCLCs) account for about 85% of lung cancers [1, 2]. Although there are new 56 agents for the treatment of NSCLC patients, the 5-year survival rate is still estimated at 15% 57 [1, 3, 4]. To reduce the high lethality of NSCLCs, many significant efforts have been made. 58 Previous studies have reported causal genetic alterations, from somatic point mutations to 59 large structure variations, involved in the carcinogenesis of NSCLCs [5-9]. Based on these 60 findings, various therapeutic strategies have been developed for NSCLC treatment, such as 61 combination cisplatin-based chemotherapy with the anti-angiogenic bevacizumab [10, 11], 62 the use of tyrosine kinase inhibitors to treat EGFR-mutated, ALK or ROS1-rearranged 63 NSCLC patients [12-15]. However, owing to the primary, adaptive and acquired resistance of 64 these drugs, the target treatment is not effective for all NSCLC patients [16]. DNA 65 methylation alterations commonly occur in cancer and are involved in tumor initiation and 66 progression. Aberrant promoter CpG methylation provides a selective mechanism for the 67 regulation of tumor suppressor genes and oncogenes in cancer instead of genetic mutations 68 [17]. However, these methylation changes could be inherently reversed by agents, in contrast 69 with genetic mutations [18]. Thus, identification of epigenetically modulated genes opens 70 new avenues for epigenetic therapies and for discovery of novel drug targets. 71 In NSCLC, CDKN2A was the first reported gene whose downregulated expression in 72 lung carcinogenesis was predominantly attributed to promoter hypermethylation [19]. 73 Afterwards, several studies described several other epigenetically regulated genes, such as 74 FHIT, DAPK and RASSF1A [20-26]. However, these earlier studies only investigated a 75 single gene or a small list of genes. The advent of high throughput next-generation 76 sequencing (NGS) for analyzing DNA methylome and transcriptome has offered the unique 77 ability to analyze methylation changes and detect methylation driver events at a genome-wide 78 scale [27-29]. 79 In this study, we analyzed genome-wide methylation profiles of ~ 2 million CpG sites 80 spanning more than 19,600 genes and noncoding regions using Reduced Representation 81 Bisulfite Sequencing (RRBS) technology [30]. RRBS combines restriction enzymes and 82 bisulfite sequencing to enrich for DNA segments with a high CpG content and regulatory 83 potentials. It is an efficient and cost-effective technique for analyzing the genome-wide 84 methylation profiles on a single nucleotide level. Using this technique and the multi-omics 85 data integration method, we pinpointed several novel methylation driver protein coding genes 86 and noncoding RNAs that could be potential targets for epigenetic therapy. Furthermore, we 87 detected sets of transcription factor (TF) binding motifs located in differentially methylated 88 regions (DMRs) which regulated target gene expression by affecting the binding ability of 89 TFs in lung cancer. 4 90 Materials and Methods 91 Reduced representation bisulfite sequencing (RRBS) 92 The study was approved by the institutional review board of Sir Run Run Shaw Hospital 93 at Zhejiang University (Hangzhou, China). Eighteen lung tumor tissues and adjacent normal 94 tissues were collected from NSCLC patients (Table S1). All samples used in the current study 95 were obtained at the time of diagnosis before any treatment was administered. Genomic DNA 96 of these tissues was extracted and then treated according to the RRBS library preparation 97 protocols as we previously described [30] with modifications to allow multiplexing [31]. 98 Paired-end sequencing with 100bp was performed on the Illumina HiSeq 2000 according to 99 manufacturer’s protocol. 100 Bioinformatics analyses of RRBS data 101 Bisulfite sequencing reads were pre-processed with Trim Galore 102 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Both adapters and 103 sequences with low quality (base quality < 20) were removed before the analysis. The 104 trimmed reads were then aligned to the human reference genome (hg19) and the methylation 105 status of each CpG was determined using Bismark (v0.14.1) with default parameters [32] 106 (Table S2). The unconverted cytosines at fill-in 3′ MspI sites of sequencing reads were used 107 to estimate the bisulfite conversion rate. For each CpG site with at least 5×coverage, the 108 methylation rate, C/(C+T), was calculated. We merged the 18 tumor-normal sample pairs 109 based on CpG coordinates, yielding 2,574,098 CpG sites that covered at least ten paired 110 samples. CpG sites in the heterosome and CpG sites overlapped with SNP (dbSNP build 142) 111 were filtered, and 2,166,853 were retained for subsequent analysis. The remaining missing 112 methylation values were imputed using k-nearest neighbors in the CpGs space 113 (http://bioconductor.org/packages/release/bioc/html/impute.html). 114 Next, we used metilene (Version 0.23) [33] to identify DMRs between tumor and 115 matched noncancerous tissues. These DMRs were further examined for significance using the 116 Wilcoxon rank sum test for a paired dataset. The final DMRs were determined using the 117 following threshold: at least ten CpGs in the DMRs, at least 10% differences in methylation, 118 and false discovery rate (FDR) < 0.05. We performed an unsupervised hierarchical cluster 119 analysis on the CpG sites’ methylation using ward linkage and Euclidean distance. The 120 Metascape software (http://metascape.org) was used to conduct Gene Ontology (GO) 121 analyses according to the standard protocol. 122 RNA-seq analysis 123 mRNA sequencing libraries were prepared for
Recommended publications
  • Gene Nomenclature System for Rice
    Rice (2008) 1:72–84 DOI 10.1007/s12284-008-9004-9 Gene Nomenclature System for Rice Susan R. McCouch & CGSNL (Committee on Gene Symbolization, Nomenclature and Linkage, Rice Genetics Cooperative) Received: 11 April 2008 /Accepted: 3 June 2008 /Published online: 15 August 2008 # The Author(s) 2008 Abstract The Committee on Gene Symbolization, Nomen- protein sequence information that have been annotated and clature and Linkage (CGSNL) of the Rice Genetics Cooper- mapped on the sequenced genome assemblies, as well as ative has revised the gene nomenclature system for rice those determined by biochemical characterization and/or (Oryza) to take advantage of the completion of the rice phenotype characterization by way of forward genetics. With genome sequence and the emergence of new methods for these revisions, we enhance the potential for structural, detecting, characterizing, and describing genes in the functional, and evolutionary comparisons across organisms biological community. This paper outlines a set of standard and seek to harmonize the rice gene nomenclature system procedures for describing genes based on DNA, RNA, and with that of other model organisms. Newly identified rice genes can now be registered on-line at http://shigen.lab.nig. ac.jp/rice/oryzabase_submission/gene_nomenclature/. The current ex-officio member list below is correct as of the date of this galley proof. The current ex-officio members of CGSNL (The Keywords Oryza sativa . Genome sequencing . Committee on Gene Symbolization, Nomenclature and Linkage) are: Gene symbolization Atsushi Yoshimura from the Faculty of Agriculture, Kyushu University, Fukuoka, 812-8581, Japan, email: [email protected] Guozhen Liu from the Beijing Institute of Genomics of the Chinese Introduction Academy of Sciences, Beijing, People’s Republic of China, email: [email protected] The biological community is moving towards a universal Yasuo Nagato from the Graduate School of Agricultural and Life system for the naming of genes.
    [Show full text]
  • Systematic Review of the Genetics of Sudden Unexpected Death in Epilepsy: Potential Overlap with Sudden Cardiac Death and Arrhythmia-Related Genes C
    SYSTEMATIC REVIEW AND META-ANALYSIS Systematic Review of the Genetics of Sudden Unexpected Death in Epilepsy: Potential Overlap With Sudden Cardiac Death and Arrhythmia-Related Genes C. Anwar A. Chahal, MBChB; Mohammad N. Salloum, MD; Fares Alahdab, MD; Joseph A. Gottwald, PharmD; David J. Tester, BS; Lucman A. Anwer, MD; Elson L. So, MD; Mohammad Hassan Murad, MD, MPH; Erik K. St Louis, MD, MS; Michael J. Ackerman, MD, PhD; Virend K. Somers, MD, PhD Background-—Sudden unexpected death in epilepsy (SUDEP) is the leading cause of epilepsy-related death. SUDEP shares many features with sudden cardiac death and sudden unexplained death in the young and may have a similar genetic contribution. We aim to systematically review the literature on the genetics of SUDEP. Methods and Results-—PubMed, MEDLINE Epub Ahead of Print, Ovid Medline In-Process & Other Non-Indexed Citations, MEDLINE, EMBASE, Cochrane Database of Systematic Reviews, and Scopus were searched through April 4, 2017. English language human studies analyzing SUDEP for known sudden death, ion channel and arrhythmia-related pathogenic variants, novel variant discovery, and copy number variant analyses were included. Aggregate descriptive statistics were generated; data were insufficient for meta- analysis. A total of 8 studies with 161 unique individuals were included; mean was age 29.0 (ÆSD 14.2) years; 61% males; ECG data were reported in 7.5% of cases; 50.7% were found prone and 58% of deaths were nocturnal. Cause included all types of epilepsy. Antemortem diagnosis of Dravet syndrome and autism (with duplication of chromosome 15) was associated with 11% and 9% of cases.
    [Show full text]
  • A Genetic Screen for Genes That Impact Peroxisomes in Drosophila Identifies
    bioRxiv preprint doi: https://doi.org/10.1101/704122; this version posted July 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. A genetic screen for genes that impact peroxisomes in Drosophila identifies candidate genes for human disease Hillary K. Graves1, Sharayu Jangam1, Kai Li Tan1, Antonella Pignata1, Elaine S. Seto1, Shinya Yamamoto1,2,3,4# and Michael F. Wangler1,3,4,# 1Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX 77030, USA 2Department of Neuroscience, BCM, Houston, TX 77030, USA 3Program in Developmental Biology, BCM, Houston, TX 77030, USA 4Jan and Dan Duncan Neurological Research Institute, Texas Children Hospital, Houston, TX 77030, USA #Corresponding authors: SY ([email protected]), MFW ([email protected]) Keywords: Drosophila, peroxisomes, BRD4, fs(1)h bioRxiv preprint doi: https://doi.org/10.1101/704122; this version posted July 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Abstract Peroxisomes are sub-cellular organelles that are essential for proper function of eukaryotic cells. In addition to being the sites of a variety of oxidative reactions, they are crucial regulators of lipid metabolism. Peroxisome loss or dysfunction leads to multi- system diseases in humans that strongly affects the nervous system.
    [Show full text]
  • Indicators of the Molecular Pathogenesis of Virulent Newcastle
    www.nature.com/scientificreports OPEN Indicators of the molecular pathogenesis of virulent Newcastle disease virus in chickens revealed by transcriptomic profling of spleen Mohammad Rabiei1*, Wai Yee Low2, Yan Ren2, Mohamad Indro Cahyono3, Phuong Thi Kim Doan1,4, Indi Dharmayanti3, Eleonora Dal Grande1 & Farhid Hemmatzadeh1,2 Newcastle disease virus (NDV) has caused signifcant outbreaks in South-East Asia, particularly in Indonesia in recent years. Recently emerged genotype VII NDVs (NDV-GVII) have shifted their tropism from gastrointestinal/respiratory tropism to a lymphotropic virus, invading lymphoid organs including spleen and bursa of Fabricius to cause profound lymphoid depletion. In this study, we aimed to identify candidate genes and biological pathways that contribute to the disease caused by this velogenic NDV-GVII. A transcriptomic analysis based on RNA-Seq of spleen was performed in chickens challenged with NDV-GVII and a control group. In total, 6361 genes were diferentially expressed that included 3506 up-regulated genes and 2855 down-regulated genes. Real-Time PCR of ten selected genes validated the RNA-Seq results as the correlation between them is 0.98. Functional and network analysis of Diferentially Expressed Genes (DEGs) showed altered regulation of ElF2 signalling, mTOR signalling, proliferation of cells of the lymphoid system, signalling by Rho family GTPases and synaptogenesis signalling in spleen. We have also identifed modifed expression of IFIT5, PI3K, AGT and PLP1 genes in NDV-GVII infected chickens. Our fndings in activation of autophagy-mediated cell death, lymphotropic and synaptogenesis signalling pathways provide new insights into the molecular pathogenesis of this newly emerged NDV-GVII. Newcastle disease virus (NDV) has a worldwide distribution.
    [Show full text]
  • Identification of Common Gene Networks Responsive To
    Cancer Gene Therapy (2014) 21, 542–548 © 2014 Nature America, Inc. All rights reserved 0929-1903/14 www.nature.com/cgt ORIGINAL ARTICLE Identification of common gene networks responsive to radiotherapy in human cancer cells D-L Hou1, L Chen2, B Liu1, L-N Song1 and T Fang1 Identification of the genes that are differentially expressed between radiosensitive and radioresistant cancers by global gene analysis may help to elucidate the mechanisms underlying tumor radioresistance and improve the efficacy of radiotherapy. An integrated analysis was conducted using publicly available GEO datasets to detect differentially expressed genes (DEGs) between cancer cells exhibiting radioresistance and cancer cells exhibiting radiosensitivity. Gene Ontology (GO) enrichment analyses, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and protein–protein interaction (PPI) networks analysis were also performed. Five GEO datasets including 16 samples of radiosensitive cancers and radioresistant cancers were obtained. A total of 688 DEGs across these studies were identified, of which 374 were upregulated and 314 were downregulated in radioresistant cancer cell. The most significantly enriched GO terms were regulation of transcription, DNA-dependent (GO: 0006355, P = 7.00E-09) for biological processes, while those for molecular functions was protein binding (GO: 0005515, P = 1.01E-28), and those for cellular component was cytoplasm (GO: 0005737, P = 2.81E-26). The most significantly enriched pathway in our KEGG analysis was Pathways in cancer (P = 4.20E-07). PPI network analysis showed that IFIH1 (Degree = 33) was selected as the most significant hub protein. This integrated analysis may help to predict responses to radiotherapy and may also provide insights into the development of individualized therapies and novel therapeutic targets.
    [Show full text]
  • The Prepattern Transcription Factor Irx3 Directs Nephron Segment Identity
    Downloaded from genesdev.cshlp.org on October 8, 2021 - Published by Cold Spring Harbor Laboratory Press The prepattern transcription factor Irx3 directs nephron segment identity Luca Reggiani,1 Daniela Raciti,1 Rannar Airik,2 Andreas Kispert,2 and André W. Brändli1,3 1Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, ETH Zurich, CH-8093 Zurich, Switzerland; 2Institute of Molecular Biology, Hannover Medical School, D-30625 Hannover, Germany The nephron, the basic structural and functional unit of the vertebrate kidney, is organized into discrete segments, which are composed of distinct renal epithelial cell types. Each cell type carries out highly specific physiological functions to regulate fluid balance, osmolarity, and metabolic waste excretion. To date, the genetic basis of regionalization of the nephron has remained largely unknown. Here we show that Irx3,a member of the Iroquois (Irx) gene family, acts as a master regulator of intermediate tubule fate. Comparative studies in Xenopus and mouse have identified Irx1, Irx2, and Irx3 as an evolutionary conserved subset of Irx genes, whose expression represents the earliest manifestation of intermediate compartment patterning in the developing vertebrate nephron discovered to date. Intermediate tubule progenitors will give rise to epithelia of Henle’s loop in mammals. Loss-of-function studies indicate that irx1 and irx2 are dispensable, whereas irx3 is necessary for intermediate tubule formation in Xenopus. Furthermore, we demonstrate that misexpression of irx3 is sufficient to direct ectopic development of intermediate tubules in the Xenopus mesoderm. Taken together, irx3 is the first gene known to be necessary and sufficient to specify nephron segment fate in vivo.
    [Show full text]
  • The Genetic Basis of Malformation of Cortical Development Syndromes: Primary Focus on Aicardi Syndrome
    The Genetic Basis of Malformation of Cortical Development Syndromes: Primary Focus on Aicardi Syndrome Thuong Thi Ha B. Sc, M. Bio Neurogenetics Research Group The University of Adelaide Thesis submitted for the degree of Doctor of Philosophy In Discipline of Genetics and Evolution School of Biological Sciences Faculty of Science The University of Adelaide June 2018 Table of Contents Abstract 6 Thesis Declaration 8 Acknowledgements 9 Publications 11 Abbreviations 12 CHAPTER I: Introduction 15 1.1 Overview of Malformations of Cortical Development (MCD) 15 1.2 Introduction into Aicardi Syndrome 16 1.3 Clinical Features of Aicardi Syndrome 20 1.3.1 Epidemiology 20 1.3.2 Clinical Diagnosis 21 1.3.3 Differential Diagnosis 23 1.3.4 Development & Prognosis 24 1.4 Treatment 28 1.5 Pathogenesis of Aicardi Syndrome 29 1.5.1 Prenatal or Intrauterine Disturbances 29 1.5.2 Genetic Predisposition 30 1.6 Hypothesis & Aims 41 1.6.1 Hypothesis 41 1.6.2 Research Aims 41 1.7 Expected Outcomes 42 CHAPTER II: Materials & Methods 43 2.1 Study Design 43 2.1.1 Cohort of Study 44 2.1.2 Inheritance-based Strategy 45 1 2.1.3 Ethics for human and animals 46 2.2 Computational Methods 46 2.2.1 Pre-Processing Raw Reads 46 2.2.2 Sequencing Coverage 47 2.2.3 Variant Discovery 49 2.2.4 Annotating Variants 53 2.2.5 Evaluating Variants 55 2.3 Biological Methods 58 2.3.1 Cell Culture 58 2.3.2 Genomic DNA Sequencing 61 2.3.3 Plasmid cloning 69 2.3.4 RNA, whole exome & Whole Genome Sequencing 74 2.3.5 TOPFlash Assay 75 2.3.6 Western Blot 76 2.3.7 Morpholino Knockdown in Zebrafish 79 CHAPTER III: A mutation in COL4A2 causes autosomal dominant porencephaly with cataracts.
    [Show full text]
  • Highlights of the 'Gene Nomenclature Across Species' Meeting
    MEETING REPORT Highlights of the ‘Gene Nomenclature Across Species’ Meeting Elspeth A. Bruford* Project Coordinator, HUGO Gene Nomenclature Committee (HGNC), EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK *Correspondence to: E-mail: [email protected] Date received (in revised form): 24th February, 2010 Abstract The first ‘Gene Nomenclature Across Species’ meeting was held on 12th and 13th October 2009, at the Møller Centre in Cambridge, UK. This meeting, organised and hosted by the HUGO Gene Nomenclature Committee (HGNC), brought together invited experts from the fields of gene nomenclature, phylogenetics and genome assembly and annotation. The central aim of the meeting was to discuss the issues of coordinating gene naming across vertebrates, culminating in the publication of recommendations for assigning nomenclature to genes across multiple species. Meeting summary coorganiser of the meeting, kicked off by discussing The meeting began with a welcome and outline of the current work of the HGNC, ‘An Essential the agenda from Elspeth Bruford, one of the Resource for the Human Genome’. Matt outlined meeting organisers and the group coordinator for the roles of the HGNC, including a summary of the HUGO Gene Nomenclature Committee the process of symbol assignment, and its current (HGNC). HGNC has been based at the European efforts in coordinating gene naming across ver- Bioinformatics Institute (EBI) at Hinxton, UK, tebrates. He also highlighted instances where the since 2007. Since its inception in 1979, the lack of approved gene nomenclature for most mam- HGNC has been assigning gene symbols and malian genomes has resulted in valuable published names to all human genes, including pseudogenes data for these species being absent or confused in and non-coding RNAs.
    [Show full text]
  • Abundant and Dynamically Expressed Mirnas, Pirnas, and Other Small Rnas in the Vertebrate Xenopus Tropicalis
    Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press Letter Abundant and dynamically expressed miRNAs, piRNAs, and other small RNAs in the vertebrate Xenopus tropicalis Javier Armisen,1,2,3 Michael J. Gilchrist,1,3 Anna Wilczynska,2 Nancy Standart,2 and Eric A. Miska1,2,4 1Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, The Henry Wellcome Building of Cancer and Developmental Biology, Cambridge CB2 1QN, United Kingdom; 2Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom Small regulatory RNAs have recently emerged as key regulators of eukaryotic gene expression. Here we used high- throughput sequencing to determine small RNA populations in the germline and soma of the African clawed frog Xenopus tropicalis. We identified a number of miRNAs that were expressed in the female germline. miRNA expression profiling revealed that miR-202-5p is an oocyte-enriched miRNA. We identified two novel miRNAs that were expressed in the soma. In addition, we sequenced large numbers of Piwi-associated RNAs (piRNAs) and other endogenous small RNAs, likely representing endogenous siRNAs (endo-siRNAs). Of these, only piRNAs were restricted to the germline, suggesting that endo-siRNAs are an abundant class of small RNAs in the vertebrate soma. In the germline, both endogenous small RNAs and piRNAs mapped to many high copy number loci. Furthermore, endogenous small RNAs mapped to the same specific subsets of repetitive elements in both the soma and the germline, suggesting that these RNAs might act to silence repetitive elements in both compartments. Data presented here suggest a conserved role for miRNAs in the vertebrate germline.
    [Show full text]
  • Downloaded from the CAVA Integration with Data Generated by Pre-NGS Methods Webpage [19]
    Münz et al. Genome Medicine (2015) 7:76 DOI 10.1186/s13073-015-0195-6 METHOD Open Access CSN and CAVA: variant annotation tools for rapid, robust next-generation sequencing analysis in the clinical setting Márton Münz1†, Elise Ruark2†, Anthony Renwick2, Emma Ramsay2, Matthew Clarke2, Shazia Mahamdallie2,3, Victoria Cloke3, Sheila Seal2,3, Ann Strydom2,3, Gerton Lunter1 and Nazneen Rahman2,3,4* Abstract Background: Next-generation sequencing (NGS) offers unprecedented opportunities to expand clinical genomics. It also presents challenges with respect to integration with data from other sequencing methods and historical data. Provision of consistent, clinically applicable variant annotation of NGS data has proved difficult, particularly of indels, an important variant class in clinical genomics. Annotation in relation to a reference genome sequence, the DNA strand of coding transcripts and potential alternative variant representations has not been well addressed. Here we present tools that address these challenges to provide rapid, standardized, clinically appropriate annotation of NGS data in line with existing clinical standards. Methods: We developed a clinical sequencing nomenclature (CSN), a fixed variant annotation consistent with the principles of the Human Genome Variation Society (HGVS) guidelines, optimized for automated variant annotation of NGS data. To deliver high-throughput CSN annotation we created CAVA (Clinical Annotation of VAriants), a fast, lightweight tool designed for easy incorporation into NGS pipelines. CAVA allows transcript specification, appropriately accommodates the strand of a gene transcript and flags variants with alternative annotations to facilitate clinical interpretation and comparison with other datasets. We evaluated CAVA in exome data and a clinical BRCA1/BRCA2 gene testing pipeline.
    [Show full text]
  • The Human Gamma-Glutamyltransferase Gene Family
    Hum Genet (2008) 123:321–332 DOI 10.1007/s00439-008-0487-7 REVIEW The human gamma-glutamyltransferase gene family Nora Heisterkamp · John GroVen · David Warburton · Tam P. Sneddon Received: 9 November 2007 / Accepted: 6 March 2008 / Published online: 21 March 2008 © Springer-Verlag 2008 Abstract Assays for gamma-glutamyl transferase related genes or sequences. These sequences were given (GGT1, EC 2.3.2.2) activity in blood are widely used in a multiple diVerent names, leading to inconsistencies and clinical setting to measure tissue damage. The well-charac- confusion. Here we systematically evaluated all human terized GGT1 is an extracellular enzyme that is anchored to sequences related to GGT1 using genomic and cDNA data- the plasma membrane of cells. There, it hydrolyzes and base searches and identiWed thirteen genes belonging to the transfers -glutamyl moieties from glutathione and other extended GGT family, of which at least six appear to be -glutamyl compounds to acceptors. As such, it has a critical active. In collaboration with the HUGO Gene Nomencla- function in the metabolism of glutathione and in the con- ture Committee (HGNC) we have designated possible version of the leukotriene LTC4 to LTD4. GGT deWciency active genes with nucleotide or amino acid sequence simi- in man is rare and for the few patients reported to date, larity to GGT1, as GGT5 (formerly GGL, GGTLA1/GGT- mutations in GGT1 have not been described. These patients rel), GGT6 (formerly rat ggt6 homologue) and GGT7 (for- do secrete glutathione in urine and fail to metabolize LTC4. merly GGTL3, GGT4). Two loci have the potential to Earlier pre-genome investigations had indicated that encode only the light chain portion of GGT and have now besides GGT1, the human genome contains additional been designated GGTLC1 (formerly GGTL6, GGTLA4) and GGTLC2.
    [Show full text]
  • Decision Support for Molecular Diagnostic Laboratories Using Interactive Biosoftware Alamut V1.2
    EuroGentest Unit 5 New technologies Decision support for molecular diagnostic laboratories using Interactive Biosoftware Alamut v1.2 November 2007- February 2008 Prepared by: NGRL Manchester As part of the activities of: EuroGentest Unit 5 Quality Management and Accreditation / Certification of Genetic Testing within the work package WP5.4. EuroGentest Project Co-ordinator: Professsor Jean-Jaques Cassiman K.U.Leuven Center for Human Genetics/ Centrum Menselijke Erfelijkheid Gasthuisberg O&N Herestraat 49 Box 602 3000 Leuven Belgium The views expressed in this document are those of the work package participants and do not necessarily reflect the policies of the institutions or companies they are affiliated to. © July 2008 Copyright EuroGentest Network of Excellence Project 2005 - EU Contract no. FP6-512148 Table of contents Summary ii Acknowledgements ii Introduction 1 Background 1 Description of tool 1 Evaluation 3 Aim 3 Methods 3 Results 5 User interface and usability 5 Data sources 5 Applicability to diagnostic testing 8 Validity and accuracy 8 Conclusions 15 Glossary 16 Appendix A Tested Gene Variants 18 i Summary Alamut is a decision-support software application developed by Interactive Biosoftware for mutation diagnostics in medical molecular genetics. It is a client- server application that integrates genetic information from different sources to describe variants using HGVS nomenclature and to help interpret their pathogenic status. We evaluated Alamut in four areas: its user interface and usability; the suitability of its data sources; its applicability to diagnostic testing; and its validity and accuracy. Laboratories found the software intuitive and easy to use, and well adapted to diagnostic testing. Over 400 variants from 14 genes were tested.
    [Show full text]